From steve.dower at python.org Thu Nov 1 11:38:24 2018 From: steve.dower at python.org (Steve Dower) Date: Thu, 1 Nov 2018 08:38:24 -0700 Subject: [Python-Dev] Rename Include/internals/ to Include/pycore/ In-Reply-To: References: Message-ID: <67172f5d-ca9e-c924-5970-b28d6876ce33@python.org> I assume the redundancy was there to handle the naming collision problem, but a better way to do that is to have only header files that users should ever use in "include/", and put the rest inside "include/python/" and "include/internal/" (and if possible, we don't distribute the internal directory as part of -develop packages, which only really impacts the Windows installer as far as we're concerned, but I assume some distros will care). That would mean a layout like: include/ Python.h typeslots.h Python/ abstract.h ... internal/ pycore_atomic.h ... End users/distutils should only -Iinclude, while our build can -Iinclude/Python and -Iinclude/internal as well. Or we can update all of our includes to specify the directory (which I'd prefer - #include "name.h" should look adjacent to the file first, while #include looks at search paths first, which should be able to deal with collisions). Any change here is an improvement, though. I'll be happy for Py_BUILD_CORE to go away (or at least only imply -Iinclude/internal, rather than actual preprocessor work), no matter where files land precisely :) Cheers, Steve On 31Oct2018 1835, Eric Snow wrote: > On the one hand dropping redundancy in the filename is fine.? On the > other having the names mirror the public header files is valuable.? How > about leaving the base names alone and change the directory to "pyinternal"? > > -eric > > On Wed, Oct 31, 2018, 17:36 Victor Stinner wrote: > > Ok, thanks. I decided to remove the redundant "py", so I renamed > "pystate.h" to "pycore_state.h" (single "py", instead of > "pycore_pystate.h", double "py py"). > > I updated my PR: > https://github.com/python/cpython/pull/10263 > > * Rename Include/internal/ header files: > > ? * pyatomic.h -> pycore_atomic.h > ? * ceval.h -> pycore_ceval.h > ? * condvar.h -> pycore_condvar.h > ? * context.h -> pycore_context.h > ? * pygetopt.h -> pycore_getopt.h > ? * gil.h -> pycore_gil.h > ? * hamt.h -> pycore_hamt.h > ? * hash.h -> pycore_hash.h > ? * mem.h -> pycore_mem.h > ? * pystate.h -> pycore_state.h > ? * warnings.h -> pycore_warnings.h > > * PCbuild project, Makefile.pre.in , > Modules/Setup: add the > ? Include/internal/ directory to the search paths of header files. > * Update include. For example, replace #include "internal/mem.h" > ? with #include "pycore_mem.h". > > Victor > Le mer. 31 oct. 2018 ? 23:38, Eric Snow > a ?crit : > > > > B > > On Wed, Oct 31, 2018 at 4:28 PM Victor Stinner > > wrote: > > > > > > Le mer. 31 oct. 2018 ? 22:19, Eric Snow > > a > ?crit : > > > > > Maybe we can keep "Include/internal/" directory name, but add > > > > > "pycore_" rather than "internal_" to header filenames? > > > > > > > > this sounds good to me.? thanks for chasing this down. > > > > > > What do you prefer? > > > > > > (A Include/internal/internal_pystate.h > > > (B) Include/internal/pycore_pystate.h > > > (C) Include/pycore/pycore_pystate.h > > > > > > Victor > From vstinner at redhat.com Thu Nov 1 14:19:51 2018 From: vstinner at redhat.com (Victor Stinner) Date: Thu, 1 Nov 2018 19:19:51 +0100 Subject: [Python-Dev] Rename Include/internals/ to Include/pycore/ In-Reply-To: <67172f5d-ca9e-c924-5970-b28d6876ce33@python.org> References: <67172f5d-ca9e-c924-5970-b28d6876ce33@python.org> Message-ID: Le jeu. 1 nov. 2018 ? 16:38, Steve Dower a ?crit : > I assume the redundancy was there to handle the naming collision > problem, but a better way to do that is to have only header files that > users should ever use in "include/", and put the rest inside > "include/python/" and "include/internal/" (and if possible, we don't > distribute the internal directory as part of -develop packages, which > only really impacts the Windows installer as far as we're concerned, but > I assume some distros will care). Aha, I didn't try this approach. I counted 41 Include/ header files which are not included by Python.h: Python-ast.h abstract.h asdl.h ast.h bitset.h bytes_methods.h bytesobject.h code.h codecs.h datetime.h dictobject.h dynamic_annotations.h errcode.h fileobject.h frameobject.h graminit.h grammar.h longintrepr.h marshal.h metagrammar.h node.h opcode.h osdefs.h parsetok.h patchlevel.h pgen.h pgenheaders.h py_curses.h pydtrace.h pyexpat.h pystrhex.h pythonrun.h pythread.h pytime.h setobject.h structmember.h symtable.h token.h ucnhash.h unicodeobject.h For Py_BUILD_CORE, there is at least datetime.h which may be a Include/internal/ twin header. Now I'm not sure how it's going to fit with the second step, "Move !Py_LIMITED_API to Include/pycapi/": https://bugs.python.org/issue35134 Victor From status at bugs.python.org Fri Nov 2 13:09:58 2018 From: status at bugs.python.org (Python tracker) Date: Fri, 2 Nov 2018 18:09:58 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20181102170958.3043456D39@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2018-10-26 - 2018-11-02) Python tracker at https://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 6828 ( +5) closed 40071 (+66) total 46899 (+71) Open issues with patches: 2720 Issues opened (41) ================== #35034: Add closing and iteration to threading.Queue https://bugs.python.org/issue35034 reopened by hemflit #35078: Allow customization of CSS class name of a month in calendar m https://bugs.python.org/issue35078 opened by thatiparthy #35081: Move internal headers to Include/internal/ https://bugs.python.org/issue35081 opened by vstinner #35082: Mock.__dir__ lists deleted attributes https://bugs.python.org/issue35082 opened by mariocj89 #35083: Fix documentation for __instancecheck__ https://bugs.python.org/issue35083 opened by joydiamond #35084: binascii.Error: Incorrect padding https://bugs.python.org/issue35084 opened by Tester #35091: Objects/listobject.c: gallop functions rely on signed integer https://bugs.python.org/issue35091 opened by izbyshev #35097: IDLE add doc subsection for editor windows https://bugs.python.org/issue35097 opened by terry.reedy #35099: Improve the IDLE - console differences doc https://bugs.python.org/issue35099 opened by terry.reedy #35100: urllib.parse.unquote_to_bytes: needs "escape plus" option https://bugs.python.org/issue35100 opened by Henry Zhu #35101: inspect.findsource breaks on class frame objects https://bugs.python.org/issue35101 opened by orlnub123 #35103: format_exception() doesn't work with PyErr_Fetch https://bugs.python.org/issue35103 opened by tomasheran #35104: IDLE: On macOS, Command-M minimizes & opens "Open Module..." https://bugs.python.org/issue35104 opened by taleinat #35105: Document that CPython accepts "invalid" identifiers https://bugs.python.org/issue35105 opened by vstinner #35107: untokenize() fails on tokenize output when a newline is missin https://bugs.python.org/issue35107 opened by gregory.p.smith #35108: inspect.getmembers passes exceptions from object's properties https://bugs.python.org/issue35108 opened by rominf #35109: Doctest in CI uses python binary built from master causing Dep https://bugs.python.org/issue35109 opened by xtreak #35111: Make Custom Object Classes JSON Serializable https://bugs.python.org/issue35111 opened by andrewchap #35113: inspect.getsource returns incorrect source for classes when cl https://bugs.python.org/issue35113 opened by xtreak #35114: ssl.RAND_status docs describe it as returning True/False; actu https://bugs.python.org/issue35114 opened by josh.r #35118: Add peek() or first() method in queue https://bugs.python.org/issue35118 opened by Windson Yang #35120: SSH tunnel support to ftp lib https://bugs.python.org/issue35120 opened by msharma #35121: Cookie domain check returns incorrect results https://bugs.python.org/issue35121 opened by ???????????? #35122: Process not exiting on unhandled exception when using multipro https://bugs.python.org/issue35122 opened by akhi singhania #35123: Add style guide for sentinel usage https://bugs.python.org/issue35123 opened by madman bob #35124: Add style guide for unit tests https://bugs.python.org/issue35124 opened by madman bob #35125: asyncio shield: remove inner callback on outer cancellation https://bugs.python.org/issue35125 opened by mainro #35126: Mistake in FAQ about converting number to string. https://bugs.python.org/issue35126 opened by grottrumsel #35127: pyurandom() fails if user does not have an entropy device https://bugs.python.org/issue35127 opened by pehdrah #35131: Cannot access to customized paths within .pth file https://bugs.python.org/issue35131 opened by Valentin Zhao #35132: python-gdb error: Python Exception Type https://bugs.python.org/issue35132 opened by Dylan Cali #35133: Bugs in concatenating string literals on different lines https://bugs.python.org/issue35133 opened by serhiy.storchaka #35134: Move !Py_LIMITED_API to Include/pycapi/ https://bugs.python.org/issue35134 opened by vstinner #35136: test_ssl fails in AMD64 FreeBSD CURRENT Shared 3.6 buildbot https://bugs.python.org/issue35136 opened by pablogsal #35138: timeit documentation should have example with function argumen https://bugs.python.org/issue35138 opened by davidak #35140: encoding problem: coding:gbk cause syntaxError https://bugs.python.org/issue35140 opened by anmikf #35143: Annotations future requires unparse, but not accessible from P https://bugs.python.org/issue35143 opened by kayhayen #35144: TemporaryDirectory can't be cleaned up if there are unsearchab https://bugs.python.org/issue35144 opened by lilydjwg #35145: sqlite3: "select *" should autoconvert datetime fields https://bugs.python.org/issue35145 opened by jondo #35147: _Py_NO_RETURN is always empty on GCC https://bugs.python.org/issue35147 opened by izbyshev #35148: cannot activate a venv environment on a Swiss German windows https://bugs.python.org/issue35148 opened by Martin Bijl-Schwab Most recent 15 issues with no replies (15) ========================================== #35148: cannot activate a venv environment on a Swiss German windows https://bugs.python.org/issue35148 #35143: Annotations future requires unparse, but not accessible from P https://bugs.python.org/issue35143 #35132: python-gdb error: Python Exception Type https://bugs.python.org/issue35132 #35131: Cannot access to customized paths within .pth file https://bugs.python.org/issue35131 #35127: pyurandom() fails if user does not have an entropy device https://bugs.python.org/issue35127 #35125: asyncio shield: remove inner callback on outer cancellation https://bugs.python.org/issue35125 #35114: ssl.RAND_status docs describe it as returning True/False; actu https://bugs.python.org/issue35114 #35103: format_exception() doesn't work with PyErr_Fetch https://bugs.python.org/issue35103 #35100: urllib.parse.unquote_to_bytes: needs "escape plus" option https://bugs.python.org/issue35100 #35082: Mock.__dir__ lists deleted attributes https://bugs.python.org/issue35082 #35063: Checking for abstractmethod implementation fails to consider M https://bugs.python.org/issue35063 #35061: Specify libffi.so soname for ctypes https://bugs.python.org/issue35061 #35048: Can't reassign __class__ despite the assigned class having ide https://bugs.python.org/issue35048 #35018: Sax parser provides no user access to lexical handlers https://bugs.python.org/issue35018 #35009: argparse throws UnicodeEncodeError for printing help with unic https://bugs.python.org/issue35009 Most recent 15 issues waiting for review (15) ============================================= #35147: _Py_NO_RETURN is always empty on GCC https://bugs.python.org/issue35147 #35138: timeit documentation should have example with function argumen https://bugs.python.org/issue35138 #35134: Move !Py_LIMITED_API to Include/pycapi/ https://bugs.python.org/issue35134 #35133: Bugs in concatenating string literals on different lines https://bugs.python.org/issue35133 #35125: asyncio shield: remove inner callback on outer cancellation https://bugs.python.org/issue35125 #35121: Cookie domain check returns incorrect results https://bugs.python.org/issue35121 #35118: Add peek() or first() method in queue https://bugs.python.org/issue35118 #35101: inspect.findsource breaks on class frame objects https://bugs.python.org/issue35101 #35097: IDLE add doc subsection for editor windows https://bugs.python.org/issue35097 #35091: Objects/listobject.c: gallop functions rely on signed integer https://bugs.python.org/issue35091 #35082: Mock.__dir__ lists deleted attributes https://bugs.python.org/issue35082 #35081: Move internal headers to Include/internal/ https://bugs.python.org/issue35081 #35078: Allow customization of CSS class name of a month in calendar m https://bugs.python.org/issue35078 #35077: Make TypeError message less ambiguous https://bugs.python.org/issue35077 #35065: Reading received data from a closed TCP stream using `StreamRe https://bugs.python.org/issue35065 Top 10 most discussed issues (10) ================================= #35081: Move internal headers to Include/internal/ https://bugs.python.org/issue35081 19 msgs #35070: test_posix fails on macOS 10.14 Mojave https://bugs.python.org/issue35070 14 msgs #34160: ElementTree not preserving attribute order https://bugs.python.org/issue34160 13 msgs #35059: Convert Py_INCREF() and PyObject_INIT() to inlined functions https://bugs.python.org/issue35059 13 msgs #35084: binascii.Error: Incorrect padding https://bugs.python.org/issue35084 12 msgs #1154351: add get_current_dir_name() to os module https://bugs.python.org/issue1154351 10 msgs #35140: encoding problem: coding:gbk cause syntaxError https://bugs.python.org/issue35140 8 msgs #35077: Make TypeError message less ambiguous https://bugs.python.org/issue35077 7 msgs #19376: document that strptime() does not support the Feb 29 if the fo https://bugs.python.org/issue19376 6 msgs #35052: Coverity scan: copy/paste error in Lib/xml/dom/minidom.py https://bugs.python.org/issue35052 6 msgs Issues closed (64) ================== #9263: Try to print repr() when an C-level assert fails (in the garba https://bugs.python.org/issue9263 closed by vstinner #27200: make doctest in CPython has failures https://bugs.python.org/issue27200 closed by mdk #27741: datetime.datetime.strptime functionality description incorrect https://bugs.python.org/issue27741 closed by vstinner #28015: configure --with-lto builds fail when CC=clang on Linux, requi https://bugs.python.org/issue28015 closed by vstinner #29174: 'NoneType' object is not callable in subprocess.py https://bugs.python.org/issue29174 closed by vstinner #31544: gettext.Catalog title is not flagged as a class https://bugs.python.org/issue31544 closed by serhiy.storchaka #31680: Expose curses library name and version on Python level https://bugs.python.org/issue31680 closed by serhiy.storchaka #31880: subprocess process interaction with IDLEX GUI causes pygnuplot https://bugs.python.org/issue31880 closed by roger.serwy #32804: urllib.retrieve documentation doesn't mention context paramete https://bugs.python.org/issue32804 closed by xiang.zhang #33138: Improve standard error for uncopyable types https://bugs.python.org/issue33138 closed by serhiy.storchaka #33236: MagicMock().__iter__.return_value is different from MagicMock( https://bugs.python.org/issue33236 closed by michael.foord #33237: Improve AttributeError message for partially initialized modul https://bugs.python.org/issue33237 closed by serhiy.storchaka #33331: Clean modules in the reversed order https://bugs.python.org/issue33331 closed by serhiy.storchaka #33710: Deprecate gettext.lgettext() https://bugs.python.org/issue33710 closed by serhiy.storchaka #33826: enable discovery of class source code in IPython interactively https://bugs.python.org/issue33826 closed by t-vi #33940: datetime.strptime have no directive to convert date values wit https://bugs.python.org/issue33940 closed by belopolsky #34198: Additional encoding options to tkinter.filedialog https://bugs.python.org/issue34198 closed by narito #34576: [EASY doc] http.server, SimpleHTTPServer: warn users on securi https://bugs.python.org/issue34576 closed by orsenthil #34794: Memory leak in Tkinter https://bugs.python.org/issue34794 closed by serhiy.storchaka #34866: CGI DOS vulnerability via long post list https://bugs.python.org/issue34866 closed by vstinner #34876: Python3.8 changes how decorators are traced https://bugs.python.org/issue34876 closed by serhiy.storchaka #34945: regression with ./python -m test and pdb https://bugs.python.org/issue34945 closed by pablogsal #35007: Minor change to weakref docs https://bugs.python.org/issue35007 closed by mdk #35042: Use the role :pep: for the PEP \d+ https://bugs.python.org/issue35042 closed by mdk #35047: Better error messages in unittest.mock https://bugs.python.org/issue35047 closed by vstinner #35054: Add more index entries for symbols https://bugs.python.org/issue35054 closed by serhiy.storchaka #35062: io.IncrementalNewlineDecoder assign out-of-range value to bitw https://bugs.python.org/issue35062 closed by xiang.zhang #35064: COUNT_ALLOCS build export inc_count() and dec_count() function https://bugs.python.org/issue35064 closed by pablogsal #35067: Use vswhere instead of _distutils_findvs https://bugs.python.org/issue35067 closed by steve.dower #35068: [2.7] Possible crashes due to incorrect error handling in pyex https://bugs.python.org/issue35068 closed by serhiy.storchaka #35074: source install [3.7.1] on debian jessie https://bugs.python.org/issue35074 closed by xtreak #35075: Doc: pprint example uses dead URL https://bugs.python.org/issue35075 closed by inada.naoki #35076: FAIL: test_min_max_version (test.test_ssl.ContextTests) with l https://bugs.python.org/issue35076 closed by terry.reedy #35079: difflib.SequenceMatcher.get_matching_blocks omits common strin https://bugs.python.org/issue35079 closed by terry.reedy #35080: The tests for the `dis` module can be too rigid when changing https://bugs.python.org/issue35080 closed by Maxime Belanger #35085: FileNotFoundError: [Errno 2] No such file or directory: https://bugs.python.org/issue35085 closed by steven.daprano #35086: tkinter docs: errors in A Simple Hello World Program https://bugs.python.org/issue35086 closed by serhiy.storchaka #35087: IDLE: update idlelib help files for current doc build https://bugs.python.org/issue35087 closed by terry.reedy #35088: Update idlelib.help.copy_string docstring https://bugs.python.org/issue35088 closed by terry.reedy #35089: Remove typing.io and typing.re from documentation https://bugs.python.org/issue35089 closed by levkivskyi #35090: Potential division by zero and integer overflow in allocator w https://bugs.python.org/issue35090 closed by vstinner #35092: test_socket fails in MacOS High Sierra when running with -Werr https://bugs.python.org/issue35092 closed by ned.deily #35093: IDLE: document the help document viewer https://bugs.python.org/issue35093 closed by terry.reedy #35094: Improved algorithms for random.sample https://bugs.python.org/issue35094 closed by rhettinger #35095: Implement pathlib.Path.append_bytes and pathlib.Path.append_te https://bugs.python.org/issue35095 closed by pablogsal #35096: Change _PY_VERSION to derive from sys.version_info in sysconfi https://bugs.python.org/issue35096 closed by ned.deily #35098: Deleting __new__ does not restore previous behavior https://bugs.python.org/issue35098 closed by josh.r #35102: Struct pack() https://bugs.python.org/issue35102 closed by mark.dickinson #35106: Add documentation for `type.__subclasses__` to docs.python.org https://bugs.python.org/issue35106 closed by joydiamond #35110: Fix more spaces around hyphens and dashes https://bugs.python.org/issue35110 closed by serhiy.storchaka #35112: SpooledTemporaryFile and seekable() method https://bugs.python.org/issue35112 closed by nubirstein #35115: UUID objects can't be casted by `hex()` https://bugs.python.org/issue35115 closed by fcurella #35116: Doc/library entries for cgi.FieldStorage max_num_fields https://bugs.python.org/issue35116 closed by vstinner #35117: set.discard should return True or False based on if element ex https://bugs.python.org/issue35117 closed by rhettinger #35119: Customizing module attribute access example raises RecursionEr https://bugs.python.org/issue35119 closed by levkivskyi #35128: warning.warn messages with spacing issues https://bugs.python.org/issue35128 closed by scorphus #35129: Different behaviour comparing versions (distutils.version.Loos https://bugs.python.org/issue35129 closed by eric.araujo #35130: add same file name to zipfile ,result two files and same md5 https://bugs.python.org/issue35130 closed by serhiy.storchaka #35135: pip install --download option does not exist https://bugs.python.org/issue35135 closed by zach.ware #35137: Exception in isinstance when __class__ property raises https://bugs.python.org/issue35137 closed by brett.cannon #35139: Statically linking pyexpat in Modules/Setup fails to compile o https://bugs.python.org/issue35139 closed by benjamin.peterson #35141: encoding problem: gbk https://bugs.python.org/issue35141 closed by xtreak #35142: html.entities.html5 should require a trailing semicolon https://bugs.python.org/issue35142 closed by serhiy.storchaka #35146: Bad Regular Expression Broke re.findall() https://bugs.python.org/issue35146 closed by steven.daprano From nas-python at arctrix.com Fri Nov 2 13:22:38 2018 From: nas-python at arctrix.com (Neil Schemenauer) Date: Fri, 2 Nov 2018 11:22:38 -0600 Subject: [Python-Dev] Rename Include/internals/ to Include/pycore/ In-Reply-To: <1540759207.1168671.1557524968.5CFA99F3@webmail.messagingengine.com> References: <1540759207.1168671.1557524968.5CFA99F3@webmail.messagingengine.com> Message-ID: <20181102172238.u7ysgkpj2b7aids5@python.ca> On 2018-10-28, Benjamin Peterson wrote: > I don't think more or less API should be magically included based > on whether Py_BUILD_CORE is defined or not. I agree. > If we want to have private headers, we should include them where > needed and not install them. Really, Py_BUILD_CORE should go away. > We should be moving away from monolithic includes like Python.h to > having each C file include exactly what it uses, private or not. It seems that is best practice (e.g. look at Linux kernel include file style). I wonder however what are the real benefits to having modular include files and directly using them as needed? Pros for modular includes: - speeds up build process if you have good dependency info in the build system. Right now, change Python.h and everything gets rebuilt. I'm not sure this is a huge advantage anymore. - makes it clearer where an API is implemented? Cons: - more work to include the correct headers - build system dependency definitions are more complicated. Other systems generally have automatic dependancy generates (i.e. parse C files and find used includes). A simple approach would be to introduce something like Python-internal.h. If you are a Python internal unit, you can include both Python.h and Python-internal.h. We could, over time, split Python-iternal.h into smaller modular includes. From vstinner at redhat.com Fri Nov 2 15:01:30 2018 From: vstinner at redhat.com (Victor Stinner) Date: Fri, 2 Nov 2018 20:01:30 +0100 Subject: [Python-Dev] Rename Include/internals/ to Include/pycore/ In-Reply-To: <20181102172238.u7ysgkpj2b7aids5@python.ca> References: <1540759207.1168671.1557524968.5CFA99F3@webmail.messagingengine.com> <20181102172238.u7ysgkpj2b7aids5@python.ca> Message-ID: Le ven. 2 nov. 2018 ? 18:32, Neil Schemenauer a ?crit : > A simple approach would be to introduce something like > Python-internal.h. If you are a Python internal unit, you can > include both Python.h and Python-internal.h. We could, over time, > split Python-iternal.h into smaller modular includes. Since this discussion, I already moved most Py_BUILD_CORE in Include/internal/. I added a lot of #include "pycore_xxx.h" in C files. I started to reach the limit with this PR which adds the pycore_object.h include to not less than 33 C files: https://github.com/python/cpython/pull/10272 I rewrote this PR to avoid the need to modify 33 C files: https://github.com/python/cpython/pull/10276/files I added the following code to Python.h: #ifdef Py_BUILD_CORE /* bpo-35081: Automatically include pycore_object.h in Python.h, to avoid to have to add an explicit #include "pycore_object.h" to each C file. */ # include "pycore_object.h" #endif I'm not sure of it's a temporary workaround or not :-) Maybe a Python-internal.h would be a better solution? We can identify the most common header files needed to access "Python internals" and put them in this "common" header file. For example, most C files using Python internals have to access _PyRuntime global variable (55 C files), so "pycore_state.h" is a good candidate. By the way, I don't understand the rationale to have _PyRuntime in pycore_state.h. IMHO pycore_state.h should only contain functions to get the Python thread and Python interpreter states. IMHO _PyRuntime is unrelated and should belong to a different header file, maybe pycore_runtime.h? I didn't do this change yet *because* I would have to modify the 55 files currently including it. Victor From brett at python.org Fri Nov 2 18:24:20 2018 From: brett at python.org (Brett Cannon) Date: Fri, 2 Nov 2018 15:24:20 -0700 Subject: [Python-Dev] PEP 543-conform TLS library In-Reply-To: <3A7C8D1D-FA3F-46AF-8EC8-540D407EC825@gmail.com> References: <3A7C8D1D-FA3F-46AF-8EC8-540D407EC825@gmail.com> Message-ID: In case you never received a reply, you can try emailing Christian and Cory directly for an answer. On Fri, 26 Oct 2018 at 13:20, Mathias Laurin wrote: > Hello Python Dev, > > > I posted the following to python-ideas but here may be > a more suitable place. I apologize if cross posting > bothers anyone. > > > I have implemented an (I believe) PEP 543-conform TLS library > and released TLS support in the latest version yesterday: > > https://github.com/Synss/python-mbedtls/tree/0.13.0 > https://pypi.org/project/python-mbedtls/0.13.0/ > > > As far as I know, I am the first one to follow PEP 543. So one > point is that the API works. However, I have a couple of > questions regarding the PEP: > > - I do not know what to do in `TLSWrappedBuffer.do_handshake()`. > The full TLS handshake requires writing to the server, reading > back, etc., (ClientHello, ServerHello, KeyExchange, etc.), > which cannot be accomplished in a single buffer. > > For now, I am doing the handshake in > `TLSWrappedSocket.do_handshake()`: I set the BIO to using the > socket directly, then perform the handshake on the socket thus > entirely bypassing the TLSWrappedBuffer. Once this is done, I > swap the BIO to using the buffer and go on encrypting and > decrypting from the buffer. That is, the encrypted > communication is buffered. > > - The PEP sometimes mentions an "input buffer" and an "output > buffer", and some other times just "the buffer". I believe > that both implementations are possible. That is, with two > different buffers for input and output, or a single one. > > I have implemented it with a single circular buffer (that is a > stream after all). What the PEP is expecting is nonetheless > not clear to me. > > > So, can anybody clarify these two points from the PEP? > > > Or should I just address Cory Benfield (who does not seem very > active anymore lately) and Christian Heimes directly? > > > Cheers, > Mathias > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at barrys-emacs.org Sat Nov 3 05:08:38 2018 From: barry at barrys-emacs.org (Barry Scott) Date: Sat, 3 Nov 2018 09:08:38 +0000 Subject: [Python-Dev] windows compiler list missing 3.7 details on wiki Message-ID: <16BCAA90-23DF-4B6A-8A78-DEB397109133@barrys-emacs.org> On https://wiki.python.org/moin/WindowsCompilers details for 3.7 are missing. I'm assuming its still VC V14 Barry From steve.dower at python.org Sat Nov 3 13:57:15 2018 From: steve.dower at python.org (Steve Dower) Date: Sat, 3 Nov 2018 10:57:15 -0700 Subject: [Python-Dev] windows compiler list missing 3.7 details on wiki In-Reply-To: <16BCAA90-23DF-4B6A-8A78-DEB397109133@barrys-emacs.org> References: <16BCAA90-23DF-4B6A-8A78-DEB397109133@barrys-emacs.org> Message-ID: Yes. Visual Studio 2015 or later can be used (and as this is the only way to get the compiler right now, I think it's fine to list that as the requirement - note that the "Visual Studio Build Tools" installer doesn't include the IDE itself). Feel free to update the wiki. Cheers, Steve On 03Nov2018 0208, Barry Scott wrote: > On https://wiki.python.org/moin/WindowsCompilers details for 3.7 are missing. > I'm assuming its still VC V14 > > Barry From debatem1 at gmail.com Sat Nov 3 20:59:18 2018 From: debatem1 at gmail.com (geremy condra) Date: Sat, 3 Nov 2018 17:59:18 -0700 Subject: [Python-Dev] PEP 543-conform TLS library In-Reply-To: References: <3A7C8D1D-FA3F-46AF-8EC8-540D407EC825@gmail.com> Message-ID: Not to derail this thread, but it may be worth looking into something like Android's network security config ( https://developer.android.com/training/articles/security-config) in relation to PEP 543. One of the key takeaways from their analysis of a large number of applications which touched TLS libraries was that their developers often wanted to do simple and sane things but wound up doing complicated and insane ones. To combat that, they created a fairly lightweight declarative syntax for allowing narrow deviations from best practice. This syntax does not have the full flexibility of their API, but is enough to satisfy the needs of lots of developers and prevent lots of mistakes along the way. In particular, I'd love to see some examples of how to achieve the same effects as the canonical network security config examples using a PEP 543 interface. If they're useful enough it may even be beneficial to wrap those up in a separate library, but at the very least it will help prove out that PEP 543 can do the most important things that developers will want it to do. If that already exists and I'm just ignorant of it, sorry for the noise. Geremy Condra On Fri, Nov 2, 2018 at 3:25 PM Brett Cannon wrote: > In case you never received a reply, you can try emailing Christian and > Cory directly for an answer. > > On Fri, 26 Oct 2018 at 13:20, Mathias Laurin > wrote: > >> Hello Python Dev, >> >> >> I posted the following to python-ideas but here may be >> a more suitable place. I apologize if cross posting >> bothers anyone. >> >> >> I have implemented an (I believe) PEP 543-conform TLS library >> and released TLS support in the latest version yesterday: >> >> https://github.com/Synss/python-mbedtls/tree/0.13.0 >> https://pypi.org/project/python-mbedtls/0.13.0/ >> >> >> As far as I know, I am the first one to follow PEP 543. So one >> point is that the API works. However, I have a couple of >> questions regarding the PEP: >> >> - I do not know what to do in `TLSWrappedBuffer.do_handshake()`. >> The full TLS handshake requires writing to the server, reading >> back, etc., (ClientHello, ServerHello, KeyExchange, etc.), >> which cannot be accomplished in a single buffer. >> >> For now, I am doing the handshake in >> `TLSWrappedSocket.do_handshake()`: I set the BIO to using the >> socket directly, then perform the handshake on the socket thus >> entirely bypassing the TLSWrappedBuffer. Once this is done, I >> swap the BIO to using the buffer and go on encrypting and >> decrypting from the buffer. That is, the encrypted >> communication is buffered. >> >> - The PEP sometimes mentions an "input buffer" and an "output >> buffer", and some other times just "the buffer". I believe >> that both implementations are possible. That is, with two >> different buffers for input and output, or a single one. >> >> I have implemented it with a single circular buffer (that is a >> stream after all). What the PEP is expecting is nonetheless >> not clear to me. >> >> >> So, can anybody clarify these two points from the PEP? >> >> >> Or should I just address Cory Benfield (who does not seem very >> active anymore lately) and Christian Heimes directly? >> >> >> Cheers, >> Mathias >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/brett%40python.org >> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/debatem1%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephane at wirtel.be Sun Nov 4 05:43:50 2018 From: stephane at wirtel.be (Stephane Wirtel) Date: Sun, 4 Nov 2018 11:43:50 +0100 Subject: [Python-Dev] Need discussion for a PR about memory and objects Message-ID: <20181104104350.GA12859@xps> In this PR [https://github.com/python/cpython/pull/3382] "Remove reference to address from the docs, as it only causes confusion", opened by Chris Angelico, there is a discussion about the right term to use for the address of an object in memory. If you are interested by the topic, you could comment it. If there is no comments then I think we could close the PR. Thank you St?phane -- St?phane Wirtel - https://wirtel.be - @matrixise From steve at pearwood.info Sun Nov 4 08:03:15 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 5 Nov 2018 00:03:15 +1100 Subject: [Python-Dev] Need discussion for a PR about memory and objects In-Reply-To: <20181104104350.GA12859@xps> References: <20181104104350.GA12859@xps> Message-ID: <20181104130314.GU3817@ando.pearwood.info> On Sun, Nov 04, 2018 at 11:43:50AM +0100, Stephane Wirtel wrote: > In this PR [https://github.com/python/cpython/pull/3382] "Remove reference > to > address from the docs, as it only causes confusion", opened by Chris > Angelico, there is a discussion about the right term to use for the > address of an object in memory. Why do we need to refer to the address of objects in memory? Python's execution model is not based on addresses. We can't get the address of an object or do anything with it (except via ctypes). At the Python layer, objects just exist, they don't logically exist at any addressable location. In fact, they might not exist in a single location -- compound objects are split across many locations or can share chunks of memory between multiple objects. Objects in Jython and IronPython can move about, those in PyPy can disappear from existence and reappear. The concept that every object has exactly one fixed location simply isn't correct. I understand that people wishing to understand the implementation details of CPython objects will need to think about C-level concepts like memory address, but at the Python level, "address" is not a very meaningful or useful concept. -- Steve From stephane at wirtel.be Sun Nov 4 08:38:27 2018 From: stephane at wirtel.be (Stephane Wirtel) Date: Sun, 4 Nov 2018 14:38:27 +0100 Subject: [Python-Dev] Get a running instance of the doc for a PR. Message-ID: <20181104133827.GA25586@xps> Hi all, When we receive a PR about the documentation, I think that could be interesting if we could have a running instance of the doc on a sub domain of python.org. For example, pr-10000-doc.python.org or whatever, but by this way the reviewers could see the result online. The workflow would be like that: New PR -> build the doc (done by Travis) -> publish it to a server -> once published, the PR is notified by "doc is available at URL". Once merged -> we remove the doc and the link (hello bedevere). I am interested by this feature and if you also interested, tell me. I would like discuss with Julien Palard and Ernest W. Durbin III for a solution as soon as possible. Have a nice day, St?phane -- St?phane Wirtel - https://wirtel.be - @matrixise From nad at python.org Sun Nov 4 09:15:22 2018 From: nad at python.org (Ned Deily) Date: Sun, 4 Nov 2018 09:15:22 -0500 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <20181104133827.GA25586@xps> References: <20181104133827.GA25586@xps> Message-ID: <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> On Nov 4, 2018, at 08:38, Stephane Wirtel wrote: > When we receive a PR about the documentation, I think that could be > interesting if we could have a running instance of the doc on a sub > domain of python.org. > > For example, pr-10000-doc.python.org or whatever, but by this way the > reviewers could see the result online. It's an interesting idea but I don't like essentially opening python.org as a publishing platform (OK, another publishing platform - yes, I know about wiki.python.org). Considering how easy it is to build and view the docs yourself, I don't think the benefits of such a service are worth the added complexity and potential abuse. cd Doc make venv # first time git pr checkout ... make html open build/html/index.html # depending on platform and code change -- Ned Deily nad at python.org -- [] From julien at palard.fr Sun Nov 4 10:00:35 2018 From: julien at palard.fr (Julien Palard) Date: Sun, 04 Nov 2018 15:00:35 +0000 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> Message-ID: Considering feedback from Ned, what about building this as an independent service? We don't really need to interface with python.org at all, we just need some hardware, a domain, some code to interface with github API and... to start it's probably enough? It would be a usefull POC. --? Julien Palard https://mdk.fr From stephane at wirtel.be Sun Nov 4 10:12:39 2018 From: stephane at wirtel.be (Stephane Wirtel) Date: Sun, 4 Nov 2018 16:12:39 +0100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> Message-ID: <20181104151239.GA14162@xps> On 11/04, Ned Deily wrote: >On Nov 4, 2018, at 08:38, Stephane Wirtel wrote: >> When we receive a PR about the documentation, I think that could be >> interesting if we could have a running instance of the doc on a sub >> domain of python.org. >> >> For example, pr-10000-doc.python.org or whatever, but by this way the >> reviewers could see the result online. > >It's an interesting idea but I don't like essentially opening >python.org as a publishing platform (OK, another publishing platform - >yes, I know about wiki.python.org). Considering how easy it is to >build and view the docs yourself, I don't think the benefits of such a >service are worth the added complexity and potential abuse. > >cd Doc >make venv # first time >git pr checkout ... >make html >open build/html/index.html # depending on platform and code change I am going to give your 3 counter-examples but I think you will find an other counter-example ;-) 1. Two weeks ago, I have opened a PR [https://github.com/python/cpython/pull/10009] about a rewording of a sentence in the doc. Paul Ganssle was a reviewer and in a sentence he said that would be nice to have a live preview of the docs on PR. See https://github.com/python/cpython/pull/10009#issuecomment-433082605 Maybe he was on his smartphone/tablet and not in front of his computer. 2. On this morning, I was reviewing a PR https://github.com/python/cpython/pull/10102, but the message from the reviewer was "LGTM. Thanks for the PR ;-)". I don't doubt about the reviewer but I want to check the result of sphinx and not just the text, sometimes, we can find an other issue, you never know. 3. On this after-noon, I have reviewed a PR, and I was in the same case, download the PR, build python, compile the doc and run the local server. My workflow is the following steps: git wpr XYZ cd ../cpython-XYZ ./configure --prefix=$PWD-build --with-pydebug --silent make -j 4 -s make PYTHON=../python -C Doc/ venv make -C Doc/ check suspicious html serve and run the browser on http://localhost:8000/ and check the result. 1. Because I am a dev I can do it easily 2. If you are not a dev, you have to learn a new step (download sources, compile sources, compile doc and check the result) We could check easily the result without this big step. git wpr -> wpr = !bash -c \"git fetch upstream pull/${1}/head:pr_${1} && git worktree add ../$(basename $(git rev-parse --show-toplevel))-pr-${1} pr_${1}\" - I think this feature would be really useful for the contributors, the reviewers and you, the core-dev. St?phane -- St?phane Wirtel - https://wirtel.be - @matrixise From rosuav at gmail.com Sun Nov 4 10:18:11 2018 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 5 Nov 2018 02:18:11 +1100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> Message-ID: On Mon, Nov 5, 2018 at 2:11 AM Julien Palard via Python-Dev wrote: > > Considering feedback from Ned, what about building this as an independent service? We don't really need to interface with python.org at all, we just need some hardware, a domain, some code to interface with github API and... to start it's probably enough? It would be a usefull POC. > After running 'make html', the build directory is the entire site as a set of static files, right? Maybe the easiest solution is to tie in with GitHub Pages. I already have a script that will push a directory up as the gh-pages branch of the current repo; it'd just need a tweak so it can push to a specific repo, which you could create on GitHub for the purpose. Not 100% automatic, but also not too difficult to automate, if needed. https://github.com/Rosuav/shed/blob/master/git-deploy ChrisA From stephane at wirtel.be Sun Nov 4 10:25:25 2018 From: stephane at wirtel.be (Stephane Wirtel) Date: Sun, 4 Nov 2018 16:25:25 +0100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> Message-ID: <20181104152525.GA12103@xps> On 11/04, Julien Palard wrote: >Considering feedback from Ned, what about building this as an >independent service? We don't really need to interface with python.org >at all, we just need some hardware, a domain, some code to interface >with github API and... to start it's probably enough? It would be a >usefull POC. +1 We have 2 options 1. We keep a domain (python.org, or another one) and for each build we would have a subdomain for each build by PR. doc-pr-XYZ.domain.tld 2. We could store on domain but with a subdirectory by PR domain.tld/doc-pr-XYZ In travis, I think we have the PR info, we could use it and upload to the server for the live preview. Once merged, we drop the temporary doc on the server. St?phane > >--? >Julien Palard >https://mdk.fr > -- St?phane Wirtel - https://wirtel.be - @matrixise From stephane at wirtel.be Sun Nov 4 10:32:11 2018 From: stephane at wirtel.be (Stephane Wirtel) Date: Sun, 4 Nov 2018 16:32:11 +0100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> Message-ID: <20181104153211.GB12103@xps> On 11/05, Chris Angelico wrote: >On Mon, Nov 5, 2018 at 2:11 AM Julien Palard via Python-Dev > wrote: >> >> Considering feedback from Ned, what about building this as an independent service? We don't really need to interface with python.org at all, we just need some hardware, a domain, some code to interface with github API and... to start it's probably enough? It would be a usefull POC. >> > >After running 'make html', the build directory is the entire site as a >set of static files, right? Maybe the easiest solution is to tie in >with GitHub Pages. I already have a script that will push a directory >up as the gh-pages branch of the current repo; it'd just need a tweak >so it can push to a specific repo, which you could create on GitHub >for the purpose. Not 100% automatic, but also not too difficult to >automate, if needed. > >https://github.com/Rosuav/shed/blob/master/git-deploy Nice idea, but I am not for that. 1. We will populate the git repository with a lot of gh-pages branches and I am not for this solution 2. This static doc is just temporary, once merged, we have to remove the link and the content on the server, with the gh-pages, that will be a commit where we drop the content, but it's a commit and we will consume the storage of github. 3. 1 repo has only one gh-pages, in our case, we need to have a lot of gh-pages for a repo. But thank you for your idea. -- St?phane Wirtel - https://wirtel.be - @matrixise From steve.dower at python.org Sun Nov 4 10:34:23 2018 From: steve.dower at python.org (Steve Dower) Date: Sun, 4 Nov 2018 07:34:23 -0800 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> Message-ID: <23b8f062-348b-9750-e31f-0b3dabf09b22@python.org> On 04Nov2018 0718, Chris Angelico wrote: > On Mon, Nov 5, 2018 at 2:11 AM Julien Palard via Python-Dev > wrote: >> >> Considering feedback from Ned, what about building this as an independent service? We don't really need to interface with python.org at all, we just need some hardware, a domain, some code to interface with github API and... to start it's probably enough? It would be a usefull POC. >> > > After running 'make html', the build directory is the entire site as a > set of static files, right? Maybe the easiest solution is to tie in > with GitHub Pages. I already have a script that will push a directory > up as the gh-pages branch of the current repo; it'd just need a tweak > so it can push to a specific repo, which you could create on GitHub > for the purpose. Not 100% automatic, but also not too difficult to > automate, if needed. I can trivially attach the built docs as a ZIP file to the Azure Pipelines build, though that doesn't help the "preview on my phone" scenario (unless your phone can extract and then open a directory of HTML files? Mine can't) But that's also easy enough to tie into a second step to deploy the files practically anywhere. Pushing them to a git repo based on the PR name is easy, and presumably it can be a single repo with directories for different PRs? It might need a separate job to periodically clean it up, but this seems very doable. Cheers, Steve From storchaka at gmail.com Sun Nov 4 10:40:29 2018 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 4 Nov 2018 17:40:29 +0200 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> Message-ID: 04.11.18 17:00, Julien Palard via Python-Dev ????: > Considering feedback from Ned, what about building this as an independent service? We don't really need to interface with python.org at all, we just need some hardware, a domain, some code to interface with github API and... to start it's probably enough? It would be a usefull POC. This will just move risks to this service. Ned mentioned potential abuse. We will host unchecked content. Malicious user can create a PR which replaces Python documentation with malicious content. The Doc/ directory includes Python scripts and Makefile which are used for building documentation. Malicious user can use this for executing arbitrary code on our server. From stephane at wirtel.be Sun Nov 4 10:46:42 2018 From: stephane at wirtel.be (Stephane Wirtel) Date: Sun, 4 Nov 2018 16:46:42 +0100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <23b8f062-348b-9750-e31f-0b3dabf09b22@python.org> References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> <23b8f062-348b-9750-e31f-0b3dabf09b22@python.org> Message-ID: <20181104154642.GC12103@xps> On 11/04, Steve Dower wrote: >On 04Nov2018 0718, Chris Angelico wrote: >>On Mon, Nov 5, 2018 at 2:11 AM Julien Palard via Python-Dev >> wrote: >>> >>>Considering feedback from Ned, what about building this as an independent service? We don't really need to interface with python.org at all, we just need some hardware, a domain, some code to interface with github API and... to start it's probably enough? It would be a usefull POC. >>> >> >>After running 'make html', the build directory is the entire site as a >>set of static files, right? Maybe the easiest solution is to tie in >>with GitHub Pages. I already have a script that will push a directory >>up as the gh-pages branch of the current repo; it'd just need a tweak >>so it can push to a specific repo, which you could create on GitHub >>for the purpose. Not 100% automatic, but also not too difficult to >>automate, if needed. > >I can trivially attach the built docs as a ZIP file to the Azure >Pipelines build, though that doesn't help the "preview on my phone" >scenario (unless your phone can extract and then open a directory of >HTML files? Mine can't) Or just upload the zip file to the server-live-preview-doc and unzip the result. > >But that's also easy enough to tie into a second step to deploy the >files practically anywhere. Pushing them to a git repo based on the PR >name is easy, and presumably it can be a single repo with directories >for different PRs? It might need a separate job to periodically clean >it up, but this seems very doable. for the subdirectory by PR I am not sure but we could have an issue with the cache of the browser because we will use the same domain, but this solution can be created in few hours. +1 For the subdomain, no problem with the cache, just update the DNS, but we can have an issue with the propagation of the DNS if the TTL is high. We could use Bedevere, not sure but I think it can receive an action about the merge of a PR. Once merge, we execute the clean process ;-) > >Cheers, >Steve >_______________________________________________ >Python-Dev mailing list >Python-Dev at python.org >https://mail.python.org/mailman/listinfo/python-dev >Unsubscribe: https://mail.python.org/mailman/options/python-dev/stephane%40wirtel.be -- St?phane Wirtel - https://wirtel.be - @matrixise From rosuav at gmail.com Sun Nov 4 10:48:25 2018 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 5 Nov 2018 02:48:25 +1100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <20181104153211.GB12103@xps> References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> <20181104153211.GB12103@xps> Message-ID: On Mon, Nov 5, 2018 at 2:33 AM Stephane Wirtel wrote: > > On 11/05, Chris Angelico wrote: > >On Mon, Nov 5, 2018 at 2:11 AM Julien Palard via Python-Dev > > wrote: > >> > >> Considering feedback from Ned, what about building this as an independent service? We don't really need to interface with python.org at all, we just need some hardware, a domain, some code to interface with github API and... to start it's probably enough? It would be a usefull POC. > >> > > > >After running 'make html', the build directory is the entire site as a > >set of static files, right? Maybe the easiest solution is to tie in > >with GitHub Pages. I already have a script that will push a directory > >up as the gh-pages branch of the current repo; it'd just need a tweak > >so it can push to a specific repo, which you could create on GitHub > >for the purpose. Not 100% automatic, but also not too difficult to > >automate, if needed. > > > >https://github.com/Rosuav/shed/blob/master/git-deploy > > Nice idea, but I am not for that. > > 1. We will populate the git repository with a lot of gh-pages branches > and I am not for this solution > 2. This static doc is just temporary, once merged, we have to remove the > link and the content on the server, with the gh-pages, that will be a > commit where we drop the content, but it's a commit and we will consume > the storage of github. > 3. 1 repo has only one gh-pages, in our case, we need to have a lot of > gh-pages for a repo. > Yeah, understood. I was thinking of having the individual patch authors create temporary GitHub repositories to push to. It'd be an optional step; if you want to show people a preview of your PR, just create a repository and push to it (using a script something like that). That way, you don't have to worry about malicious content (since it'll be hosted under the author's name - I'm sure GitHub have measures in place to deal with that, and it wouldn't be Python.org's problem), nor having lots of gh-pages branches sitting around (they'd be the responsibility of the author). > But thank you for your idea. No probs, and I don't mind if it's not adopted. Just wanted to put it out there. ChrisA From stephane at wirtel.be Sun Nov 4 10:49:57 2018 From: stephane at wirtel.be (Stephane Wirtel) Date: Sun, 4 Nov 2018 16:49:57 +0100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> Message-ID: <20181104154957.GD12103@xps> On 11/04, Serhiy Storchaka wrote: >04.11.18 17:00, Julien Palard via Python-Dev ????: >>Considering feedback from Ned, what about building this as an independent service? We don't really need to interface with python.org at all, we just need some hardware, a domain, some code to interface with github API and... to start it's probably enough? It would be a usefull POC. > >This will just move risks to this service. > >Ned mentioned potential abuse. We will host unchecked content. >Malicious user can create a PR which replaces Python documentation >with malicious content. The content will be generated by the build/html directory from Travis. If Travis is green we upload the doc, if Travis is red, we do not publish it. If there is an abuse, we close/drop the PR, maybe Bedevere can receive this notification via the webhooks and notify the server to remove the doc. > >The Doc/ directory includes Python scripts and Makefile which are used >for building documentation. Malicious user can use this for executing >arbitrary code on our server. Currently, we use Travis. The malicious code will be execute in the container of Travis, not on the server. We only copy the static files and if we use nginx/apache, we don't execute the .py files. Just serve the .html,.css,.js files > >_______________________________________________ >Python-Dev mailing list >Python-Dev at python.org >https://mail.python.org/mailman/listinfo/python-dev >Unsubscribe: https://mail.python.org/mailman/options/python-dev/stephane%40wirtel.be -- St?phane Wirtel - https://wirtel.be - @matrixise From steve at pearwood.info Sun Nov 4 10:50:19 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 5 Nov 2018 02:50:19 +1100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <20181104151239.GA14162@xps> References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> <20181104151239.GA14162@xps> Message-ID: <20181104155017.GX3817@ando.pearwood.info> On Sun, Nov 04, 2018 at 04:12:39PM +0100, Stephane Wirtel wrote: > My workflow is the following steps: > > git wpr XYZ > cd ../cpython-XYZ > ./configure --prefix=$PWD-build --with-pydebug --silent > make -j 4 -s > make PYTHON=../python -C Doc/ venv > make -C Doc/ check suspicious html serve > > and run the browser on http://localhost:8000/ and check the result. > > > 1. Because I am a dev I can do it easily > 2. If you are not a dev, you have to learn a new step (download sources, > compile sources, compile doc and check the result) If I am making doc patches, shouldn't I be doing that *before* I submit the PR? How else will I know that my changes haven't broken the docs? So surely I need to learn those steps regardless? (Not a rhetorical question.) > I think this feature would be really useful for the contributors, the > reviewers and you, the core-dev. Sure. But the usefulness has to be weighed against the extra complexity, the extra "one more thing that can break and needs to be maintained", and the risk of abuse. I have no opinion on whether the pluses outweigh the minuses. -- Steve From stephane at wirtel.be Sun Nov 4 10:58:36 2018 From: stephane at wirtel.be (Stephane Wirtel) Date: Sun, 4 Nov 2018 16:58:36 +0100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> <20181104153211.GB12103@xps> Message-ID: <20181104155836.GE12103@xps> On 11/05, Chris Angelico wrote: >On Mon, Nov 5, 2018 at 2:33 AM Stephane Wirtel wrote: >> >> On 11/05, Chris Angelico wrote: >> >On Mon, Nov 5, 2018 at 2:11 AM Julien Palard via Python-Dev >> > wrote: >> >> >> >> Considering feedback from Ned, what about building this as an independent service? We don't really need to interface with python.org at all, we just need some hardware, a domain, some code to interface with github API and... to start it's probably enough? It would be a usefull POC. >> >> >> > >> >After running 'make html', the build directory is the entire site as a >> >set of static files, right? Maybe the easiest solution is to tie in >> >with GitHub Pages. I already have a script that will push a directory >> >up as the gh-pages branch of the current repo; it'd just need a tweak >> >so it can push to a specific repo, which you could create on GitHub >> >for the purpose. Not 100% automatic, but also not too difficult to >> >automate, if needed. >> > >> >https://github.com/Rosuav/shed/blob/master/git-deploy >> >> Nice idea, but I am not for that. >> >> 1. We will populate the git repository with a lot of gh-pages branches >> and I am not for this solution >> 2. This static doc is just temporary, once merged, we have to remove the >> link and the content on the server, with the gh-pages, that will be a >> commit where we drop the content, but it's a commit and we will consume >> the storage of github. >> 3. 1 repo has only one gh-pages, in our case, we need to have a lot of >> gh-pages for a repo. >> > >Yeah, understood. I was thinking of having the individual patch >authors create temporary GitHub repositories to push to. It'd be an >optional step; if you want to show people a preview of your PR, just >create a repository and push to it (using a script something like >that). That way, you don't have to worry about malicious content >(since it'll be hosted under the author's name - I'm sure GitHub have >measures in place to deal with that, and it wouldn't be Python.org's >problem), nor having lots of gh-pages branches sitting around (they'd >be the responsibility of the author). > >> But thank you for your idea. > >No probs, and I don't mind if it's not adopted. Just wanted to put it out there. In fact, I was interested by your solution because we avoid the maintenance of the server, but in our case, we would host many Docs/build/html. Thanks again > >ChrisA >_______________________________________________ >Python-Dev mailing list >Python-Dev at python.org >https://mail.python.org/mailman/listinfo/python-dev >Unsubscribe: https://mail.python.org/mailman/options/python-dev/stephane%40wirtel.be -- St?phane Wirtel - https://wirtel.be - @matrixise From stephane at wirtel.be Sun Nov 4 11:05:07 2018 From: stephane at wirtel.be (Stephane Wirtel) Date: Sun, 4 Nov 2018 17:05:07 +0100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <20181104155017.GX3817@ando.pearwood.info> References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> <20181104151239.GA14162@xps> <20181104155017.GX3817@ando.pearwood.info> Message-ID: <20181104160507.GF12103@xps> On 11/05, Steven D'Aprano wrote: >On Sun, Nov 04, 2018 at 04:12:39PM +0100, Stephane Wirtel wrote: > >> My workflow is the following steps: >> >> git wpr XYZ >> cd ../cpython-XYZ >> ./configure --prefix=$PWD-build --with-pydebug --silent >> make -j 4 -s >> make PYTHON=../python -C Doc/ venv >> make -C Doc/ check suspicious html serve >> >> and run the browser on http://localhost:8000/ and check the result. >> >> >> 1. Because I am a dev I can do it easily >> 2. If you are not a dev, you have to learn a new step (download sources, >> compile sources, compile doc and check the result) > >If I am making doc patches, shouldn't I be doing that *before* I >submit the PR? How else will I know that my changes haven't broken the >docs? You can use the web interface of Github and just add/remove/modify a paragraph. > >So surely I need to learn those steps regardless? > >(Not a rhetorical question.) > > >> I think this feature would be really useful for the contributors, the >> reviewers and you, the core-dev. > >Sure. But the usefulness has to be weighed against the extra complexity, >the extra "one more thing that can break and needs to be maintained", >and the risk of abuse. Which kind of abuse? -- St?phane Wirtel - https://wirtel.be - @matrixise From julien at palard.fr Sun Nov 4 11:12:00 2018 From: julien at palard.fr (Julien Palard) Date: Sun, 04 Nov 2018 16:12:00 +0000 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <23b8f062-348b-9750-e31f-0b3dabf09b22@python.org> References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> <23b8f062-348b-9750-e31f-0b3dabf09b22@python.org> Message-ID: > I can trivially attach the built docs as a ZIP file to the Azure > Pipelines build, though that doesn't help the "preview on my phone" So one can build an HTTP server that gathers doc builds from Azure and expose them? --? Julien Palard https://mdk.fr From mariatta.wijaya at gmail.com Sun Nov 4 10:53:44 2018 From: mariatta.wijaya at gmail.com (Mariatta Wijaya) Date: Sun, 4 Nov 2018 07:53:44 -0800 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> Message-ID: I think the intent is just uploading the output HTML and static assets. I agree having the temporary output of PR docs build is useful, but I don't think a python.org domain is necessary. If it can be uploaded to any cloud storage service then that's good enough, just provide the link in the status check. The output can be cleared after it receive the PR closed webhook. On Sun, Nov 4, 2018, 7:43 AM Serhiy Storchaka 04.11.18 17:00, Julien Palard via Python-Dev ????: > > Considering feedback from Ned, what about building this as an > independent service? We don't really need to interface with python.org at > all, we just need some hardware, a domain, some code to interface with > github API and... to start it's probably enough? It would be a usefull POC. > > This will just move risks to this service. > > Ned mentioned potential abuse. We will host unchecked content. Malicious > user can create a PR which replaces Python documentation with malicious > content. > > The Doc/ directory includes Python scripts and Makefile which are used > for building documentation. Malicious user can use this for executing > arbitrary code on our server. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/mariatta%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mariatta.wijaya at gmail.com Sun Nov 4 11:02:49 2018 From: mariatta.wijaya at gmail.com (Mariatta Wijaya) Date: Sun, 4 Nov 2018 08:02:49 -0800 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <20181104155017.GX3817@ando.pearwood.info> References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> <20181104151239.GA14162@xps> <20181104155017.GX3817@ando.pearwood.info> Message-ID: This will make review turnout quicker, since I can potentially review and view the output from anywhere (phone while on a beach) instead of waiting until I'm back home, open a computer, and then verify the output myself. On Sun, Nov 4, 2018, 7:56 AM Steven D'Aprano On Sun, Nov 04, 2018 at 04:12:39PM +0100, Stephane Wirtel wrote: > > > My workflow is the following steps: > > > > git wpr XYZ > > cd ../cpython-XYZ > > ./configure --prefix=$PWD-build --with-pydebug --silent > > make -j 4 -s > > make PYTHON=../python -C Doc/ venv > > make -C Doc/ check suspicious html serve > > > > and run the browser on http://localhost:8000/ and check the result. > > > > > > 1. Because I am a dev I can do it easily > > 2. If you are not a dev, you have to learn a new step (download sources, > > compile sources, compile doc and check the result) > > If I am making doc patches, shouldn't I be doing that *before* I > submit the PR? How else will I know that my changes haven't broken the > docs? > > So surely I need to learn those steps regardless? > > (Not a rhetorical question.) > > > > I think this feature would be really useful for the contributors, the > > reviewers and you, the core-dev. > > Sure. But the usefulness has to be weighed against the extra complexity, > the extra "one more thing that can break and needs to be maintained", > and the risk of abuse. > > I have no opinion on whether the pluses outweigh the minuses. > > > -- > Steve > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/mariatta%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve.dower at python.org Sun Nov 4 11:20:04 2018 From: steve.dower at python.org (Steve Dower) Date: Sun, 4 Nov 2018 08:20:04 -0800 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> <23b8f062-348b-9750-e31f-0b3dabf09b22@python.org> Message-ID: <7d0df116-abc5-74c2-e9fd-d778da062ec9@python.org> On 04Nov2018 0812, Julien Palard wrote: >> I can trivially attach the built docs as a ZIP file to the Azure >> Pipelines build, though that doesn't help the "preview on my phone" > > So one can build an HTTP server that gathers doc builds from Azure and expose them? Either that, or Azure can push it directly to the server. (This way is simpler, since the processing power is already there and the trigger is trivial, compared to triggering yet another service from the build completion. Though it does mean putting credentials somewhere, which is why a github repo is the easiest option, as those are already there.) Cheers, Steve From stephane at wirtel.be Sun Nov 4 11:25:44 2018 From: stephane at wirtel.be (Stephane Wirtel) Date: Sun, 4 Nov 2018 17:25:44 +0100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> Message-ID: <20181104162544.GA23921@xps> On 11/04, Mariatta Wijaya wrote: >I think the intent is just uploading the output HTML and static assets. Yep, it's my idea, just upload the output html and static assets, nothing else. > >I agree having the temporary output of PR docs build is useful, but I don't >think a python.org domain is necessary. If it can be uploaded to any cloud >storage service then that's good enough, just provide the link in the >status check. The output can be cleared after it receive the PR closed >webhook. +1 And I think Bedevere is the best candidate for this feature (once closed, remove from the server) -- St?phane Wirtel - https://wirtel.be - @matrixise From tritium-list at sdamon.com Sun Nov 4 11:28:35 2018 From: tritium-list at sdamon.com (Alex Walters) Date: Sun, 4 Nov 2018 11:28:35 -0500 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <20181104133827.GA25586@xps> References: <20181104133827.GA25586@xps> Message-ID: <010401d4745b$6fec9a90$4fc5cfb0$@sdamon.com> Doesn't read the docs already do this for pull requests? Even if it doesn't, don't the core maintainers of read the docs go to pycon? I wouldn't suggest read the docs for primary docs hosting for python, but they are perfectly fine for live testing pull request documentation without having to roll our own. > -----Original Message----- > From: Python-Dev list=sdamon.com at python.org> On Behalf Of Stephane Wirtel > Sent: Sunday, November 4, 2018 8:38 AM > To: python-dev at python.org > Subject: [Python-Dev] Get a running instance of the doc for a PR. > > Hi all, > > When we receive a PR about the documentation, I think that could be > interesting if we could have a running instance of the doc on a sub > domain of python.org. > > For example, pr-10000-doc.python.org or whatever, but by this way the > reviewers could see the result online. > > The workflow would be like that: > > New PR -> build the doc (done by Travis) -> publish it to a server -> > once published, the PR is notified by "doc is available at URL". > > Once merged -> we remove the doc and the link (hello bedevere). > > I am interested by this feature and if you also interested, tell me. > I would like discuss with Julien Palard and Ernest W. Durbin III for a > solution as soon as possible. > > Have a nice day, > > St?phane > > -- > St?phane Wirtel - https://wirtel.be - @matrixise > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/tritium- > list%40sdamon.com From stephane at wirtel.be Sun Nov 4 11:32:38 2018 From: stephane at wirtel.be (Stephane Wirtel) Date: Sun, 4 Nov 2018 17:32:38 +0100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> <20181104151239.GA14162@xps> <20181104155017.GX3817@ando.pearwood.info> Message-ID: <20181104163238.GB23921@xps> On 11/04, Mariatta Wijaya wrote: >This will make review turnout quicker, since I can potentially review and >view the output from anywhere (phone while on a beach) instead of waiting >until I'm back home, open a computer, and then verify the output myself. Yep, last week I was at a dinner at PyCon.DE, I have seen a PR, I wanted to check the result and approve it or not. With this feature, I can check it asap and just approve via github. Once approved, miss-islington could merge the PR, once merged, Bedevere can notify the service and remove the documentation of this PR. -- St?phane Wirtel - https://wirtel.be - @matrixise From nad at python.org Sun Nov 4 11:40:25 2018 From: nad at python.org (Ned Deily) Date: Sun, 4 Nov 2018 11:40:25 -0500 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <20181104151239.GA14162@xps> References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> <20181104151239.GA14162@xps> Message-ID: On Nov 4, 2018, at 10:12, Stephane Wirtel wrote: > 3. On this after-noon, I have reviewed a PR, and I was in the same case, > download the PR, build python, compile the doc and run the local server. > > My workflow is the following steps: > > git wpr XYZ > cd ../cpython-XYZ > ./configure --prefix=$PWD-build --with-pydebug --silent > make -j 4 -s > make PYTHON=../python -C Doc/ venv > make -C Doc/ check suspicious html serve > > and run the browser on http://localhost:8000/ and check the result. To address one point, there's no need to run a web server to review the docs. The generated docs can be viewed directly by any modern web browser by opening one of the files in the web browser's file menu. Or perhaps easier, you can click on any of the generated html files; or on macOS you can use the open(1) command from the shell to open and view in the default web browser. $ open build/html/index.html -- Ned Deily nad at python.org -- [] From paul at ganssle.io Sun Nov 4 12:04:31 2018 From: paul at ganssle.io (Paul Ganssle) Date: Sun, 4 Nov 2018 12:04:31 -0500 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <010401d4745b$6fec9a90$4fc5cfb0$@sdamon.com> References: <20181104133827.GA25586@xps> <010401d4745b$6fec9a90$4fc5cfb0$@sdamon.com> Message-ID: <26398c8f-170d-c468-b603-ea79ca8fdcc2@ganssle.io> There is an open request for this on GH, but it's not currently done: https://github.com/rtfd/readthedocs.org/issues/1340 At the PyCon US sprints this year, we added documentation previews via netlify, and they have been super useful: https://github.com/pypa/setuptools/pull/1367 My understanding is that other projects do something similar with CircleCI. It's not amazingly /difficult/ for reviewers to fetch the submitter's branch, build the documentation and review it locally, but it's a decent number of extra steps for what /should/ be a very simple review. I think we all know that reviewer time and effort is one of the biggest bottlenecks in the CPython development workflow, and this could make it /much/ easier to do reviews. Some of the concerns about increasing the surface area I think are a bit overblown. I haven't seen any problems yet in the projects that do this, and I don't think it lends itself to abuse particularly well.//Considering that the rest of the CI suite lets you run arbitrary code on many platforms, I don't think it's particularly more dangerous to allow people to generate ephemeral static hosted web sites as well. Best. Paul On 11/4/18 11:28 AM, Alex Walters wrote: > Doesn't read the docs already do this for pull requests? Even if it doesn't, don't the core maintainers of read the docs go to pycon? I wouldn't suggest read the docs for primary docs hosting for python, but they are perfectly fine for live testing pull request documentation without having to roll our own. > >> -----Original Message----- >> From: Python-Dev > list=sdamon.com at python.org> On Behalf Of Stephane Wirtel >> Sent: Sunday, November 4, 2018 8:38 AM >> To: python-dev at python.org >> Subject: [Python-Dev] Get a running instance of the doc for a PR. >> >> Hi all, >> >> When we receive a PR about the documentation, I think that could be >> interesting if we could have a running instance of the doc on a sub >> domain of python.org. >> >> For example, pr-10000-doc.python.org or whatever, but by this way the >> reviewers could see the result online. >> >> The workflow would be like that: >> >> New PR -> build the doc (done by Travis) -> publish it to a server -> >> once published, the PR is notified by "doc is available at URL". >> >> Once merged -> we remove the doc and the link (hello bedevere). >> >> I am interested by this feature and if you also interested, tell me. >> I would like discuss with Julien Palard and Ernest W. Durbin III for a >> solution as soon as possible. >> >> Have a nice day, >> >> St?phane >> >> -- >> St?phane Wirtel - https://wirtel.be - @matrixise >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: https://mail.python.org/mailman/options/python-dev/tritium- >> list%40sdamon.com > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/paul%40ganssle.io -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From nad at python.org Sun Nov 4 12:16:14 2018 From: nad at python.org (Ned Deily) Date: Sun, 4 Nov 2018 12:16:14 -0500 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <26398c8f-170d-c468-b603-ea79ca8fdcc2@ganssle.io> References: <20181104133827.GA25586@xps> <010401d4745b$6fec9a90$4fc5cfb0$@sdamon.com> <26398c8f-170d-c468-b603-ea79ca8fdcc2@ganssle.io> Message-ID: <261F41E1-7678-47CC-B40B-829F3ADB57BD@python.org> On Nov 4, 2018, at 12:04, Paul Ganssle wrote: > Some of the concerns about increasing the surface area I think are a bit overblown. I haven't seen any problems yet in the projects that do this, and I don't think it lends itself to abuse particularly well. Considering that the rest of the CI suite lets you run arbitrary code on many platforms, I don't think it's particularly more dangerous to allow people to generate ephemeral static hosted web sites as well. The rest of the CI suite does not let you publish things on the python.org domain, unless I'm forgetting something; they're clearly under a CI environment like Travis or AppVeyor or Azure. That's really my main concern. -- Ned Deily nad at python.org -- [] From paul at ganssle.io Sun Nov 4 12:21:15 2018 From: paul at ganssle.io (Paul Ganssle) Date: Sun, 4 Nov 2018 12:21:15 -0500 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <261F41E1-7678-47CC-B40B-829F3ADB57BD@python.org> References: <20181104133827.GA25586@xps> <010401d4745b$6fec9a90$4fc5cfb0$@sdamon.com> <26398c8f-170d-c468-b603-ea79ca8fdcc2@ganssle.io> <261F41E1-7678-47CC-B40B-829F3ADB57BD@python.org> Message-ID: <99a88026-60c4-68df-c022-f7d5186afeae@ganssle.io> Oh, sorry if I misunderstood the concern. Yes, I agree that putting this under python.org would not be a good idea. Either hosting it on a hosting provider like netlify (or azure if that's possible) or a dedicated domain that could be created for the purpose (e.g. python-doc-ci.org) would be best. Alternatively, the domain could be skipped entirely and the github hooks could link directly to documentation by machine IP (though I suspect buying a domain for this purpose would be a lot easier than coordinating what's necessary to make direct links to a machine IP reasonable). Best, Paul On 11/4/18 12:16 PM, Ned Deily wrote: > On Nov 4, 2018, at 12:04, Paul Ganssle wrote: >> Some of the concerns about increasing the surface area I think are a bit overblown. I haven't seen any problems yet in the projects that do this, and I don't think it lends itself to abuse particularly well. Considering that the rest of the CI suite lets you run arbitrary code on many platforms, I don't think it's particularly more dangerous to allow people to generate ephemeral static hosted web sites as well. > The rest of the CI suite does not let you publish things on the python.org domain, unless I'm forgetting something; they're clearly under a CI environment like Travis or AppVeyor or Azure. That's really my main concern. > > > -- > Ned Deily > nad at python.org -- [] > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From stephane at wirtel.be Sun Nov 4 14:02:33 2018 From: stephane at wirtel.be (Stephane Wirtel) Date: Sun, 4 Nov 2018 20:02:33 +0100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <99a88026-60c4-68df-c022-f7d5186afeae@ganssle.io> References: <20181104133827.GA25586@xps> <010401d4745b$6fec9a90$4fc5cfb0$@sdamon.com> <26398c8f-170d-c468-b603-ea79ca8fdcc2@ganssle.io> <261F41E1-7678-47CC-B40B-829F3ADB57BD@python.org> <99a88026-60c4-68df-c022-f7d5186afeae@ganssle.io> Message-ID: <20181104190233.GA29252@xps> On 11/04, Paul Ganssle wrote: >Oh, sorry if I misunderstood the concern. Yes, I agree that putting this >under python.org would not be a good idea. > >Either hosting it on a hosting provider like netlify (or azure if that's >possible) or a dedicated domain that could be created for the purpose >(e.g. python-doc-ci.org) would be best. Alternatively, the domain could >be skipped entirely and the github hooks could link directly to >documentation by machine IP (though I suspect buying a domain for this >purpose would be a lot easier than coordinating what's necessary to make >direct links to a machine IP reasonable). Yep, I am fine with that. Thanks Paul > > >Best, >Paul > >On 11/4/18 12:16 PM, Ned Deily wrote: >> On Nov 4, 2018, at 12:04, Paul Ganssle wrote: >>> Some of the concerns about increasing the surface area I think are a bit overblown. I haven't seen any problems yet in the projects that do this, and I don't think it lends itself to abuse particularly well. Considering that the rest of the CI suite lets you run arbitrary code on many platforms, I don't think it's particularly more dangerous to allow people to generate ephemeral static hosted web sites as well. >> The rest of the CI suite does not let you publish things on the python.org domain, unless I'm forgetting something; they're clearly under a CI environment like Travis or AppVeyor or Azure. That's really my main concern. >> >> >> -- >> Ned Deily >> nad at python.org -- [] >> > >_______________________________________________ >Python-Dev mailing list >Python-Dev at python.org >https://mail.python.org/mailman/listinfo/python-dev >Unsubscribe: https://mail.python.org/mailman/options/python-dev/stephane%40wirtel.be -- St?phane Wirtel - https://wirtel.be - @matrixise From stephane at wirtel.be Sun Nov 4 14:05:31 2018 From: stephane at wirtel.be (Stephane Wirtel) Date: Sun, 4 Nov 2018 20:05:31 +0100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> <20181104151239.GA14162@xps> Message-ID: <20181104190531.GB29252@xps> On 11/04, Ned Deily wrote: >On Nov 4, 2018, at 10:12, Stephane Wirtel wrote: >> 3. On this after-noon, I have reviewed a PR, and I was in the same case, >> download the PR, build python, compile the doc and run the local server. >> >> My workflow is the following steps: >> >> git wpr XYZ >> cd ../cpython-XYZ >> ./configure --prefix=$PWD-build --with-pydebug --silent >> make -j 4 -s >> make PYTHON=../python -C Doc/ venv >> make -C Doc/ check suspicious html serve >> >> and run the browser on http://localhost:8000/ and check the result. > >To address one point, there's no need to run a web server to review the >docs. The generated docs can be viewed directly by any modern web >browser by opening one of the files in the web browser's file menu. Or >perhaps easier, you can click on any of the generated html files; or on >macOS you can use the open(1) command from the shell to open and view >in the default web browser. > >$ open build/html/index.html Ned, I agree with you about this point, but there is an advantage when you use localhost:8000, you only have one entrypoint. And when you switch between many PR, you just have to execute the make -C Doc/ serve command (because in my workflow and with the full URI, the path can change in function of the PR, I use git worktree for each PR). But yep, I am +1 for firefox build/html/index.html > > >-- > Ned Deily > nad at python.org -- [] > -- St?phane Wirtel - https://wirtel.be - @matrixise From mhroncok at redhat.com Sun Nov 4 16:58:49 2018 From: mhroncok at redhat.com (=?UTF-8?Q?Miro_Hron=c4=8dok?=) Date: Sun, 4 Nov 2018 22:58:49 +0100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <20181104154957.GD12103@xps> References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> <20181104154957.GD12103@xps> Message-ID: <22297099-29ed-dbb2-ee69-e8c23af1a844@redhat.com> On 04. 11. 18 16:49, Stephane Wirtel wrote: > On 11/04, Serhiy Storchaka wrote: >> 04.11.18 17:00, Julien Palard via Python-Dev ????: >>> Considering feedback from Ned, what about building this as an >>> independent service? We don't really need to interface with >>> python.org at all, we just need some hardware, a domain, some code to >>> interface with github API and... to start it's probably enough? It >>> would be a usefull POC. >> >> This will just move risks to this service. >> >> Ned mentioned potential abuse. We will host unchecked content. >> Malicious user can create a PR which replaces Python documentation >> with malicious content. > The content will be generated by the build/html directory from Travis. > If Travis is green we upload the doc, if Travis is red, we do not > publish it. If there is an abuse, we close/drop the PR, maybe Bedevere > can receive this notification via the webhooks and notify the server to > remove the doc. >> >> The Doc/ directory includes Python scripts and Makefile which are used >> for building documentation. Malicious user can use this for executing >> arbitrary code on our server. > Currently, we use Travis. The malicious code will be execute in the > container of Travis, not on the server. We only copy the static files > and if we use nginx/apache, we don't execute the .py files. Just serve > the .html,.css,.js files Yet you need to let Travis CI know how to upload the results (HTML files) somewhere. That usually involves some kind of credentials. You can secretly tell Travis the credentials, however somebody would be able to modify the Makefile (or anything else) to send the credentials to their webservice/e-mail/whatever. That is in fact not possible, for exactly this reason: Travis CI builds from PRs don't see the "secret" stuff. https://docs.travis-ci.com/user/environment-variables#defining-encrypted-variables-in-travisyml -- Miro Hron?ok -- Phone: +420777974800 IRC: mhroncok From steve at pearwood.info Sun Nov 4 17:38:36 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 5 Nov 2018 09:38:36 +1100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <261F41E1-7678-47CC-B40B-829F3ADB57BD@python.org> References: <20181104133827.GA25586@xps> <010401d4745b$6fec9a90$4fc5cfb0$@sdamon.com> <26398c8f-170d-c468-b603-ea79ca8fdcc2@ganssle.io> <261F41E1-7678-47CC-B40B-829F3ADB57BD@python.org> Message-ID: <20181104223836.GY3817@ando.pearwood.info> On Sun, Nov 04, 2018 at 12:16:14PM -0500, Ned Deily wrote: > On Nov 4, 2018, at 12:04, Paul Ganssle wrote: > > > Some of the concerns about increasing the surface area I think are a > > bit overblown. I haven't seen any problems yet in the projects that > > do this, You may or may not be right, but have you looked for problems or just assumed that because nobody has brought any to your attention, they don't exist? "I have seen nothing" != "there is nothing to see". > > and I don't think it lends itself to abuse particularly > > well. Considering that the rest of the CI suite lets you run > > arbitrary code on many platforms, I don't think it's particularly > > more dangerous to allow people to generate ephemeral static hosted > > web sites as well. > > The rest of the CI suite does not let you publish things on the > python.org domain, unless I'm forgetting something; they're clearly > under a CI environment like Travis or AppVeyor or Azure. That's > really my main concern. Sorry Ned, I don't follow you here. It sounds like you're saying that you're fine with spam or abusive content being hosted in our name, so long as its hosted by somebody else, rather than by us (python.org) ourselves. I trust I'm missing something, but I don't know what it is. -- Steve From steve at pearwood.info Sun Nov 4 17:48:42 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 5 Nov 2018 09:48:42 +1100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <20181104160507.GF12103@xps> References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> <20181104151239.GA14162@xps> <20181104155017.GX3817@ando.pearwood.info> <20181104160507.GF12103@xps> Message-ID: <20181104224842.GZ3817@ando.pearwood.info> On Sun, Nov 04, 2018 at 05:05:07PM +0100, Stephane Wirtel wrote: > >If I am making doc patches, shouldn't I be doing that *before* I > >submit the PR? How else will I know that my changes haven't broken the > >docs? > > You can use the web interface of Github and just add/remove/modify a > paragraph. Does Github show a preview? If not, then my question still stands: how do I know my changes aren't broken? If Github does show a preview, then couldn't the reviewer look at that too? -- Steve From paul at ganssle.io Sun Nov 4 18:01:30 2018 From: paul at ganssle.io (Paul Ganssle) Date: Sun, 4 Nov 2018 18:01:30 -0500 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <20181104223836.GY3817@ando.pearwood.info> References: <20181104133827.GA25586@xps> <010401d4745b$6fec9a90$4fc5cfb0$@sdamon.com> <26398c8f-170d-c468-b603-ea79ca8fdcc2@ganssle.io> <261F41E1-7678-47CC-B40B-829F3ADB57BD@python.org> <20181104223836.GY3817@ando.pearwood.info> Message-ID: <156b2748-5a38-9cee-7700-b4891351a851@ganssle.io> On 11/4/18 5:38 PM, Steven D'Aprano wrote: > On Sun, Nov 04, 2018 at 12:16:14PM -0500, Ned Deily wrote: > >> On Nov 4, 2018, at 12:04, Paul Ganssle wrote: >> >>> Some of the concerns about increasing the surface area I think are a >>> bit overblown. I haven't seen any problems yet in the projects that >>> do this, > You may or may not be right, but have you looked for problems or just > assumed that because nobody has brought any to your attention, they > don't exist? > > "I have seen nothing" != "there is nothing to see". > I can only speak from my experience with setuptools, but I do look at every setuptools PR and I've never seen anything even close to this. That said, I have also never seen anyone using my Travis or Appveyor instances to mine cryptocurrency, but I've been told that that happens. In any case, I think the standard should not be "this never happens" (otherwise you also can't run CI), but that it happens rarely enough that it's not a major problem and that you can deal with it when it does come up. Frankly, I think the much more likely target for these sorts of attacks is small, mostly abandoned projects with very few followers. If you post a spam site on some ephemeral domain via the CPython CI, it's likely that hundreds of people will notice it just because it's a very active project. You will be banned from the project for life and probably reported to github nearly instantly. Likely you have much more value for your time if you target some 1-star repo that set this up 2 years ago and is maintained by someone who hasn't committed to github in over a year. That said, big projects like CPython are probably more likely to attract the troll version of this, where the point isn't to get away with hosting some content or using the CI, but to annoy and disrupt the project itself by wasting our resources chasing down spam or whatever. I think if that isn't already happening with comment floods on the issue tracker, GH threads and mailing lists, it's not especially /more/ likely to happen because people can spin up a website with a PR. >>> and I don't think it lends itself to abuse particularly >>> well. Considering that the rest of the CI suite lets you run >>> arbitrary code on many platforms, I don't think it's particularly >>> more dangerous to allow people to generate ephemeral static hosted >>> web sites as well. >> The rest of the CI suite does not let you publish things on the >> python.org domain, unless I'm forgetting something; they're clearly >> under a CI environment like Travis or AppVeyor or Azure. That's >> really my main concern. > Sorry Ned, I don't follow you here. It sounds like you're saying that > you're fine with spam or abusive content being hosted in our name, so > long as its hosted by somebody else, rather than by us (python.org) > ourselves. > > I trust I'm missing something, but I don't know what it is. I think there are two concerns - one is that the python.org domain is generally (currently) used for official content. If people can put arbitrary websites on there, presumably they can exploit whatever trust people have put into this fact. Another is that - and I am not a web expert here - I think that the domain where content is hosted is used as a marker of trust between different pages, and many applications will consider anything on *.python.org to be first-party content from other *.python.org domains.? I believe this is the reason why readthedocs moved all hosted documentation from *.readthedocs.org to *.readthedocs.io. Similarly user-submitted content on PyPI is usually hosted under the pythonhosted.org domain, not pypi.org or pypi.python.org. You'll notice that GH also hosts user content under a githubusercontent.org domain. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From vstinner at redhat.com Sun Nov 4 18:48:42 2018 From: vstinner at redhat.com (Victor Stinner) Date: Mon, 5 Nov 2018 00:48:42 +0100 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <20181104133827.GA25586@xps> References: <20181104133827.GA25586@xps> Message-ID: OpenStack does that on review.openstack.org PRs. If I recall correctly, the CI produces files which are online on a static web server. Nothing crazy. And it works. Old files are removed, I don't know when exactly. I don't think that it matters where the static files are hosted. Victor Le dimanche 4 novembre 2018, Stephane Wirtel a ?crit : > Hi all, > > When we receive a PR about the documentation, I think that could be > interesting if we could have a running instance of the doc on a sub > domain of python.org. > > For example, pr-10000-doc.python.org or whatever, but by this way the > reviewers could see the result online. > > The workflow would be like that: > > New PR -> build the doc (done by Travis) -> publish it to a server -> > once published, the PR is notified by "doc is available at URL". > > Once merged -> we remove the doc and the link (hello bedevere). > > I am interested by this feature and if you also interested, tell me. > I would like discuss with Julien Palard and Ernest W. Durbin III for a > solution as soon as possible. > > Have a nice day, > > St?phane > > -- > St?phane Wirtel - https://wirtel.be - @matrixise > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/vstinner%40redhat.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sorin.sbarnea at gmail.com Sun Nov 4 19:39:32 2018 From: sorin.sbarnea at gmail.com (Sorin Sbarnea) Date: Mon, 5 Nov 2018 00:39:32 +0000 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: References: <20181104133827.GA25586@xps> Message-ID: <95112A17-AC0F-47F9-BC00-A431F02E2FA8@gmail.com> I can confirm that this is what OpenStack does. Sometimes the build artifacts (logs, docs ...) are rotated in two weeks but this is more than enough to perform a review. If I remember well retention is based on disk space and not hardcoded to a number of days, which is great. TBH, I don't really know how a human can check the docs if they cannot access them on a webserver. I hope to see the same on python too, very useful. > On 4 Nov 2018, at 23:48, Victor Stinner wrote: > > OpenStack does that on review.openstack.org PRs. If I recall correctly, the CI produces files which are online on a static web server. Nothing crazy. And it works. Old files are removed, I don't know when exactly. > > I don't think that it matters where the static files are hosted. > > Victor > > Le dimanche 4 novembre 2018, Stephane Wirtel > a ?crit : > > Hi all, > > > > When we receive a PR about the documentation, I think that could be > > interesting if we could have a running instance of the doc on a sub > > domain of python.org . > > > > For example, pr-10000-doc.python.org or whatever, but by this way the > > reviewers could see the result online. > > > > The workflow would be like that: > > > > New PR -> build the doc (done by Travis) -> publish it to a server -> > > once published, the PR is notified by "doc is available at URL". > > > > Once merged -> we remove the doc and the link (hello bedevere). > > > > I am interested by this feature and if you also interested, tell me. > > I would like discuss with Julien Palard and Ernest W. Durbin III for a > > solution as soon as possible. > > > > Have a nice day, > > > > St?phane > > > > -- > > St?phane Wirtel - https://wirtel.be - @matrixise > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: https://mail.python.org/mailman/options/python-dev/vstinner%40redhat.com > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/sorin.sbarnea%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From nad at python.org Sun Nov 4 21:11:49 2018 From: nad at python.org (Ned Deily) Date: Sun, 4 Nov 2018 21:11:49 -0500 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <95112A17-AC0F-47F9-BC00-A431F02E2FA8@gmail.com> References: <20181104133827.GA25586@xps> <95112A17-AC0F-47F9-BC00-A431F02E2FA8@gmail.com> Message-ID: <8064CA01-718F-4229-9772-96017B299476@python.org> Not to belabor the point but: On Nov 4, 2018, at 19:39, Sorin Sbarnea wrote: > TBH, I don't really know how a human can check the docs if they cannot access them on a webserver. It's actually trivially easy with the Python doc set because the docs were designed to also be downloadable and usable off-line. That is, the same files html, js, css, and other resources that get built and loaded onto the website can also be just browsed directly from a file system, like: firefox Doc/build/html/index.html or, say, using the Open File command in Safari or whatever. No web server is needed. The docsets for the heads of each of the active branches are built and packaged nightly and downloadable from python.org: https://docs.python.org/3/download.html Also, archive copies of the docset at the time of each release is downloadable from here: https://www.python.org/doc/versions/ There are built using the same Doc/Makefile found in the cpython repo branches and the resulting Doc/build/html directory contains the docset for that snapshot of the repo with every file in the proper location for the webbrowser to load directly. -- Ned Deily nad at python.org -- [] From pierre.glaser at inria.fr Mon Nov 5 06:30:39 2018 From: pierre.glaser at inria.fr (Pierre Glaser) Date: Mon, 5 Nov 2018 12:30:39 +0100 Subject: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data In-Reply-To: <34eaf991-fb6c-c540-ea6d-e7b6267df34f@gmail.com> References: <34eaf991-fb6c-c540-ea6d-e7b6267df34f@gmail.com> Message-ID: Hi All, As part of our scikit-learn development and our effort to provide better parallelism for python, we rely heavily on dynamic classes and functions pickling. For this usage we use cloudpickle, but it suffers from performance issues due to its pure python implementation. After long discussions with Olivier Grisel and Thomas Moreau, we ended up agreeing on the fact that the best solution for this problem would be to add those functionalities to the _pickle.c module. I am already quite familiar with the C/Python API, and can dedicate a lot of my time in the next couple months to make this happen. Serhiy, from this thread (https://mail.python.org/pipermail/python-dev/2018-March/152509.html), it seems that you already started implementing local classes pickling. I would be happy to use this work as a starting point and build from it. What do you think? Regards, Pierre From pierre.glaser at inria.fr Mon Nov 5 06:36:50 2018 From: pierre.glaser at inria.fr (Pierre Glaser) Date: Mon, 5 Nov 2018 12:36:50 +0100 Subject: [Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data In-Reply-To: <34eaf991-fb6c-c540-ea6d-e7b6267df34f@gmail.com> References: <34eaf991-fb6c-c540-ea6d-e7b6267df34f@gmail.com> Message-ID: <80f765f2-850f-76c7-3ba1-3a5ca2c87701@inria.fr> Hi All, As part of our scikit-learn development and our effort to provide better parallelism for python, we rely heavily on dynamic classes and functions pickling. For this usage we use cloudpickle, but it suffers from performance issues due to its pure python implementation. After long discussions with Olivier Grisel and Thomas Moreau, we ended up agreeing on the fact that the best solution for this problem would be to add those functionalities to the _pickle.c module. I am already quite familiar with the C/Python API, and can dedicate a lot of my time in the next couple months to make this happen. Serhiy, from this thread (https://mail.python.org/pipermail/python-dev/2018-March/152509.html), it seems that you already started implementing local classes pickling. I would be happy to use this work as a starting point and build from it. What do you think? Regards, Pierre From vstinner at redhat.com Tue Nov 6 10:09:09 2018 From: vstinner at redhat.com (Victor Stinner) Date: Tue, 6 Nov 2018 16:09:09 +0100 Subject: [Python-Dev] What is the difference between Py_BUILD_CORE and Py_BUILD_CORE_BUILTIN? Message-ID: Hi, I'm trying to cleanup the Python C API, especially strictly separate the public C API, the stable C API (Py_LIMITED_API), and the internal C API (Py_BUILD_CORE). Move internal headers to Include/internal/ : https://bugs.python.org/issue35081 Move !Py_LIMITED_API to Include/pycapi/: https://bugs.python.org/issue35134 I tried to ensure that Py_BUILD_CORE is defined when including pycore_xxx.h headers from Include/internal/, but the compilation of the _json module fails. Modules/_json.c contains: /* Core extension modules are built-in on some platforms (e.g. Windows). */ #ifdef Py_BUILD_CORE #define Py_BUILD_CORE_BUILTIN #undef Py_BUILD_CORE #endif I don't understand the difference between Py_BUILD_CORE and Py_BUILD_CORE_BUILTIN defines. Do we need to have two different defines? Can't we compile _json with Py_BUILD_CORE? _json.c uses pycore_accu.h: /* * A two-level accumulator of unicode objects that avoids both the overhead * of keeping a huge number of small separate objects, and the quadratic * behaviour of using a naive repeated concatenation scheme. */ Is it a problem of the visibility/scope of symbols in the python DLL on Windows? Victor From srinivasan.rns at gmail.com Tue Nov 6 13:07:40 2018 From: srinivasan.rns at gmail.com (srinivasan) Date: Tue, 6 Nov 2018 23:37:40 +0530 Subject: [Python-Dev] SyntaxError: can't assign to literal while using ""blkid -o export %s | grep 'TYPE' | cut -d"=" -f3" % (fs)" using subprocess module in Python Message-ID: Dear Python Experts Team, As am newbie to python development, I am trying to use the below function to get verify the filesystem type of the SD card parition using bash command in python using subprocess module, I ma seeing the below Error "SyntaxError: can't assign to literal" *CODE:* *====* import helper from os import path import subprocess import os import otg_ni class emmc(object): """ emmc getters and setters info: https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt """ def __init__(self): self._helper = helper.helper() self._otg_ni = otg_ni.otg_ni() *def get_fstype_of_mounted_partition(self, fs):* """ Get the filesystem type of the mounted partition. :partition_name : Partition path as string (e.g. /dev/mmcblk0p1) :return: filesystem type as string or None if not found """ * cmd = "blkid -o export %s | grep 'TYPE' | cut -d"=" -f3" % (fs)* *return self._helper.execute_cmd_output_string(cmd)* *def execute_cmd_output_string(self, cmd, enable_shell=False):* """ Execute a command and return its output as a string. :param cmd: abs path of the command with arguments :param enable_shell : force the cmd to be run as shell script :return: a string. """ try: result = subprocess.check_output(split(cmd), stderr=subprocess.STDOUT, shell=enable_shell) except subprocess.CalledProcessError as e: s = """While executing '{}' something went wrong. Return code == '{}' Return output:\n'{}' """.format(cmd, e.returncode, e.output, shell=enable_shell) raise AssertionError(s) return result.strip().decode("utf-8") *if __name__ == "__main__":* m = emmc() * m.get_fstype_of_mounted_partition("/dev/mmcblk0p1")* *Error:* *======* root:~/qa/test_library# python3 sd.py File "sd.py", line 99 * cmd = "blkid -o export %s | grep 'TYPE' | cut -d"=" -f3" % (fs)* * ^* *SyntaxError: can't assign to literal* root:~/qa/test_library# Kindly do the needful as early as possible, as am stuck with this issue from past 2 days no clues yet, please redirect me to the correct forum if this is not the right place for pasting python related queries Many Thanks in advance, Srini -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Tue Nov 6 13:13:19 2018 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 6 Nov 2018 11:13:19 -0700 Subject: [Python-Dev] Rename Include/internals/ to Include/pycore/ In-Reply-To: References: <1540759207.1168671.1557524968.5CFA99F3@webmail.messagingengine.com> <20181102172238.u7ysgkpj2b7aids5@python.ca> Message-ID: On Fri, Nov 2, 2018 at 1:02 PM Victor Stinner wrote: > > Le ven. 2 nov. 2018 ? 18:32, Neil Schemenauer a ?crit : > > A simple approach would be to introduce something like > > Python-internal.h. If you are a Python internal unit, you can > > include both Python.h and Python-internal.h. We could, over time, > > split Python-iternal.h into smaller modular includes. > > Since this discussion, I already moved most Py_BUILD_CORE in > Include/internal/. I added a lot of #include "pycore_xxx.h" in C > files. > > I started to reach the limit with this PR which adds the > pycore_object.h include to not less than 33 C files: > https://github.com/python/cpython/pull/10272 > > I rewrote this PR to avoid the need to modify 33 C files: > https://github.com/python/cpython/pull/10276/files > > I added the following code to Python.h: > > #ifdef Py_BUILD_CORE > /* bpo-35081: Automatically include pycore_object.h in Python.h, to avoid > to have to add an explicit #include "pycore_object.h" to each C file. */ > # include "pycore_object.h" > #endif > > I'm not sure of it's a temporary workaround or not :-) Maybe a > Python-internal.h would be a better solution? We can identify the most > common header files needed to access "Python internals" and put them > in this "common" header file. > > For example, most C files using Python internals have to access > _PyRuntime global variable (55 C files), so "pycore_state.h" is a good > candidate. FWIW, I'm still in favor of keeping the includes specific. For me personally it helps to narrow down dependencies, making the source easier to follow. > By the way, I don't understand the rationale to have _PyRuntime in > pycore_state.h. IMHO pycore_state.h should only contain functions to > get the Python thread and Python interpreter states. IMHO _PyRuntime > is unrelated and should belong to a different header file, maybe > pycore_runtime.h? I didn't do this change yet *because* I would have > to modify the 55 files currently including it. The runtime state isn't all that different from the thread state or the interpreter state. It's simply broader in scope, much like the interpreter state is broader in scope than the thread state. That's why I put it where I did. -eric From rosuav at gmail.com Tue Nov 6 13:15:21 2018 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 7 Nov 2018 05:15:21 +1100 Subject: [Python-Dev] SyntaxError: can't assign to literal while using ""blkid -o export %s | grep 'TYPE' | cut -d"=" -f3" % (fs)" using subprocess module in Python In-Reply-To: References: Message-ID: On Wed, Nov 7, 2018 at 5:11 AM srinivasan wrote: > As am newbie to python development, I am trying to use the below function to get verify the filesystem type of the SD card parition using bash command in python using subprocess module, I ma seeing the below Error "SyntaxError: can't assign to literal" > This is more appropriate for python-list than for python-dev, which is about the development OF Python. But your problem here is quite simple: > cmd = "blkid -o export %s | grep 'TYPE' | cut -d"=" -f3" % (fs) You're trying to use the same quotation mark inside the string and outside it. That's not going to work. Try removing your inner quotes. You may also want to consider using Python to do your searching and cutting, rather than a shell pipeline. For more information on that, ask on python-list; people will be very happy to help out. ChrisA From ericsnowcurrently at gmail.com Tue Nov 6 13:19:17 2018 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 6 Nov 2018 11:19:17 -0700 Subject: [Python-Dev] What is the difference between Py_BUILD_CORE and Py_BUILD_CORE_BUILTIN? In-Reply-To: References: Message-ID: On Tue, Nov 6, 2018 at 8:09 AM Victor Stinner wrote: > I don't understand the difference between Py_BUILD_CORE and > Py_BUILD_CORE_BUILTIN defines. Do we need to have two different > defines? Can't we compile _json with Py_BUILD_CORE? > > [snip] > > Is it a problem of the visibility/scope of symbols in the python DLL on Windows? Yep. I added Py_BUILD_CORE_BUILTIN as a workaround for building on Windows. For extension modules on Windows there's a conflict between Py_BUILD_CORE and some of the Windows symbols that get defined. My Windows build knowledge is pretty limited so that's about as far as I can go. :) Steve might be able to give you more info. -eric From david at graniteweb.com Tue Nov 6 13:33:59 2018 From: david at graniteweb.com (David Rock) Date: Tue, 6 Nov 2018 12:33:59 -0600 Subject: [Python-Dev] [Tutor] SyntaxError: can't assign to literal while using ""blkid -o export %s | grep 'TYPE' | cut -d"=" -f3" % (fs)" using subprocess module in Python In-Reply-To: References: Message-ID: > > *def get_fstype_of_mounted_partition(self, fs):* > """ > Get the filesystem type of the mounted partition. > > :partition_name : Partition path as string (e.g. /dev/mmcblk0p1) > :return: filesystem type as string or None if not found > """ > > * cmd = "blkid -o export %s | grep 'TYPE' | cut -d"=" -f3" % (fs)* > *return self._helper.execute_cmd_output_string(cmd)* > > > > root:~/qa/test_library# python3 sd.py > File "sd.py", line 99 > * cmd = "blkid -o export %s | grep 'TYPE' | cut -d"=" -f3" % (fs)* > * ^* > *SyntaxError: can't assign to literal* > root:~/qa/test_library# > looking at cmd = "blkid -o export %s | grep 'TYPE' | cut -d"=" -f3" % (fs)* It?s probably because you have ? characters that are inside ? characters and it can?t tell where the string ends. It looks like you are trying to do cmd = "blkid -o export %s | grep 'TYPE' | cut -d? = " -f3" % (fs)* which doesn?t make sense. Try using triple quotes instead so it?s clear what string you are trying to use. cmd = ?""blkid -o export %s | grep 'TYPE' | cut -d"=" -f3?"" % (fs) ? David Rock david at graniteweb.com From steve at pearwood.info Tue Nov 6 17:25:51 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 7 Nov 2018 09:25:51 +1100 Subject: [Python-Dev] SyntaxError: can't assign to literal while using ""blkid -o export %s | grep 'TYPE' | cut -d"=" -f3" % (fs)" using subprocess module in Python In-Reply-To: References: Message-ID: <20181106222551.GA4071@ando.pearwood.info> This list is for the development *of* the Python interpreter, not support for *using* Python. If you signed up to this mailing list via the web, it clearly says: Do not post general Python questions to this list. highlighted in a great big box in red. Was that not clear enough? What can we do to make that more clear? You can try many other places for support, such as Stackoverflow, Reddit's /r/LearnPython, or the Python tutor mailing list: https://mail.python.org/mailman/listinfo/tutor but I expect most of them will tell you the same thing. You should try to give a *minimum* example of your problem, not your entire script. Start by reading here: http://sscce.org/ As for your SyntaxError, the problem is line 99: > File "sd.py", line 99 > * cmd = "blkid -o export %s | grep 'TYPE' | cut -d"=" -f3" % (fs)* > * ^* > *SyntaxError: can't assign to literal* The arrow ^ is pointing to the wrong equals sign, but simplifying the code and adding spaces makes it clear: cmd = "spam" = "eggs" gives the same error. Perhaps you mean == (equals) or perhaps the second assignment shouldn't be there at all? -- Steve From justinarthur at gmail.com Wed Nov 7 23:24:55 2018 From: justinarthur at gmail.com (Justin Turner Arthur) Date: Wed, 7 Nov 2018 22:24:55 -0600 Subject: [Python-Dev] Implementing an awaitable Message-ID: I'm trying to figure out if our documentation on the new awaitable concept in Python 3.6+ is correct. It seems to imply that if an object's __await__ method returns an iterator, the object is awaitable. However, just returning an iterator doesn't seem to work with await in a coroutine or with the asyncio selector loop's run_until_complete method. If the awaitable is not a coroutine or future, it looks like we wrap it in a coroutine using sub-generator delegation, and therefore have to have an iterator that fits a very specific shape for the coroutine step process that isn't documented anywhere I could find. Am I missing something? If the definition of an awaitable is more than just an __await__ iterator, we may need to expand the documentation as well as the abstract base class. Here's what I tried in making a synchronous awaitable that resolves to the int 42: class MyAwaitable(Awaitable): def __await__(self): return iter((42,)) # RuntimeError: Task got bad yield: 42 class MyAwaitable(Awaitable): def __await__(self): yield 42 # RuntimeError: Task got bad yield: 42 class MyAwaitable(Awaitable): def __await__(self): return (i for i in (42,)) # RuntimeError: Task got bad yield: 42 class MyAwaitable(Awaitable): def __await__(self): return self def __next__(self): return 42 # RuntimeError: Task got bad yield: 42''' class MyAwaitable(Awaitable): def __await__(self): return iter(asyncio.coroutine(lambda: 42)()) # TypeError: __await__() returned a coroutine class MyAwaitable(Awaitable): def __await__(self): yield from asyncio.coroutine(lambda: 42)() # None class MyAwaitable(Awaitable): def __await__(self): return (yield from asyncio.coroutine(lambda: 42)()) # 42 async def await_things(): print(await MyAwaitable()) asyncio.get_event_loop().run_until_complete(await_things()) -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Nov 8 00:26:59 2018 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 7 Nov 2018 21:26:59 -0800 Subject: [Python-Dev] Implementing an awaitable In-Reply-To: References: Message-ID: "Awaitable" is a language-level concept. To actually use awaitables, you also need a coroutine runner library, and each library defines additional restrictions on the awaitables it works with. So e.g. when using asyncio as your coroutine runner, asyncio expects your awaitables to follow particular rules about what values they yield, what kinds of values they can handle being sent/thrown back in, etc. Different async libraries use different rules here. Asyncio's rules aren't documented, I guess because it's such a low-level thing that anyone who really needs to know is expected to read the source :-). (In particular asyncio/futures.py and asyncio/tasks.py.) But it's basically: the object returned by __await__ has to implement the generator interface (which is a superset of the iterator interface), the objects yielded by your iterator have to implement the Future interface, and then you're resumed either by sending back None when the Future completes, or else by having an exception thrown in. -n On Wed, Nov 7, 2018 at 8:24 PM, Justin Turner Arthur wrote: > I'm trying to figure out if our documentation on the new awaitable concept > in Python 3.6+ is correct. It seems to imply that if an object's __await__ > method returns an iterator, the object is awaitable. However, just returning > an iterator doesn't seem to work with await in a coroutine or with the > asyncio selector loop's run_until_complete method. > > If the awaitable is not a coroutine or future, it looks like we wrap it in a > coroutine using sub-generator delegation, and therefore have to have an > iterator that fits a very specific shape for the coroutine step process that > isn't documented anywhere I could find. Am I missing something? > > If the definition of an awaitable is more than just an __await__ iterator, > we may need to expand the documentation as well as the abstract base class. > > Here's what I tried in making a synchronous awaitable that resolves to the > int 42: > class MyAwaitable(Awaitable): > def __await__(self): > return iter((42,)) > # RuntimeError: Task got bad yield: 42 > > class MyAwaitable(Awaitable): > def __await__(self): > yield 42 > # RuntimeError: Task got bad yield: 42 > > class MyAwaitable(Awaitable): > def __await__(self): > return (i for i in (42,)) > # RuntimeError: Task got bad yield: 42 > > class MyAwaitable(Awaitable): > def __await__(self): > return self > def __next__(self): > return 42 > # RuntimeError: Task got bad yield: 42''' > > class MyAwaitable(Awaitable): > def __await__(self): > return iter(asyncio.coroutine(lambda: 42)()) > # TypeError: __await__() returned a coroutine > > class MyAwaitable(Awaitable): > def __await__(self): > yield from asyncio.coroutine(lambda: 42)() > # None > > class MyAwaitable(Awaitable): > def __await__(self): > return (yield from asyncio.coroutine(lambda: 42)()) > # 42 > > async def await_things(): > print(await MyAwaitable()) > > asyncio.get_event_loop().run_until_complete(await_things()) > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/njs%40pobox.com > -- Nathaniel J. Smith -- https://vorpus.org From justinarthur at gmail.com Thu Nov 8 01:05:26 2018 From: justinarthur at gmail.com (Justin Turner Arthur) Date: Thu, 8 Nov 2018 00:05:26 -0600 Subject: [Python-Dev] Implementing an awaitable In-Reply-To: References: Message-ID: Thanks for the thorough rundown, Nathaniel. I started to get an idea of the required shape only by looking at CPython code like you suggest. I wanted to create an awaitable compatible with asyncio and trio that could be awaited more than once unlike a coroutine, and not runner-specific like a Future or Deferred. Are coroutines the only common awaitable the various async libraries are going to have for now? I'll take Python documentation suggestions up with other channels. - Justin On Wed, Nov 7, 2018 at 11:27 PM Nathaniel Smith wrote: > "Awaitable" is a language-level concept. To actually use awaitables, > you also need a coroutine runner library, and each library defines > additional restrictions on the awaitables it works with. So e.g. when > using asyncio as your coroutine runner, asyncio expects your > awaitables to follow particular rules about what values they yield, > what kinds of values they can handle being sent/thrown back in, etc. > Different async libraries use different rules here. > > Asyncio's rules aren't documented, I guess because it's such a > low-level thing that anyone who really needs to know is expected to > read the source :-). (In particular asyncio/futures.py and > asyncio/tasks.py.) But it's basically: the object returned by > __await__ has to implement the generator interface (which is a > superset of the iterator interface), the objects yielded by your > iterator have to implement the Future interface, and then you're > resumed either by sending back None when the Future completes, or else > by having an exception thrown in. > > -n > > On Wed, Nov 7, 2018 at 8:24 PM, Justin Turner Arthur > wrote: > > I'm trying to figure out if our documentation on the new awaitable > concept > > in Python 3.6+ is correct. It seems to imply that if an object's > __await__ > > method returns an iterator, the object is awaitable. However, just > returning > > an iterator doesn't seem to work with await in a coroutine or with the > > asyncio selector loop's run_until_complete method. > > > > If the awaitable is not a coroutine or future, it looks like we wrap it > in a > > coroutine using sub-generator delegation, and therefore have to have an > > iterator that fits a very specific shape for the coroutine step process > that > > isn't documented anywhere I could find. Am I missing something? > > > > If the definition of an awaitable is more than just an __await__ > iterator, > > we may need to expand the documentation as well as the abstract base > class. > > > > Here's what I tried in making a synchronous awaitable that resolves to > the > > int 42: > > class MyAwaitable(Awaitable): > > def __await__(self): > > return iter((42,)) > > # RuntimeError: Task got bad yield: 42 > > > > class MyAwaitable(Awaitable): > > def __await__(self): > > yield 42 > > # RuntimeError: Task got bad yield: 42 > > > > class MyAwaitable(Awaitable): > > def __await__(self): > > return (i for i in (42,)) > > # RuntimeError: Task got bad yield: 42 > > > > class MyAwaitable(Awaitable): > > def __await__(self): > > return self > > def __next__(self): > > return 42 > > # RuntimeError: Task got bad yield: 42''' > > > > class MyAwaitable(Awaitable): > > def __await__(self): > > return iter(asyncio.coroutine(lambda: 42)()) > > # TypeError: __await__() returned a coroutine > > > > class MyAwaitable(Awaitable): > > def __await__(self): > > yield from asyncio.coroutine(lambda: 42)() > > # None > > > > class MyAwaitable(Awaitable): > > def __await__(self): > > return (yield from asyncio.coroutine(lambda: 42)()) > > # 42 > > > > async def await_things(): > > print(await MyAwaitable()) > > > > asyncio.get_event_loop().run_until_complete(await_things()) > > > > > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > > https://mail.python.org/mailman/options/python-dev/njs%40pobox.com > > > > > > -- > Nathaniel J. Smith -- https://vorpus.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri Nov 9 06:05:07 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 9 Nov 2018 22:05:07 +1100 Subject: [Python-Dev] Signalling NANs Message-ID: <20181109110507.GK4071@ando.pearwood.info> I'm trying to understand some unexpected behaviour with float NANs in Python 3.5. Background: in IEEE-754 maths, NANs (Not A Number) come in two flavours, so called "quiet NANs" and "signalling NANs". By default, arithmetic operations on qnans return a qnan; operations on snans "signal", which in Python terms means raising an exception. The original IEEE-754 standard didn't specify how to distinguish a qnan from a snan, but a de facto standard arose that bit 51 of the float was the "quiet bit", if it were set, the NAN was quiet. For the purposes of this email, I'm going to assume that standard is in place, even though technically speaking it is platform-dependent. According to my tests, it seems that we cannot create snans in Python. The float constructor doesn't recognise "snan", raising ValueError. Nor can we convert a Decimal snan into a float: py> from decimal import Decimal py> snan = Decimal('snan') py> float(dsnan) Traceback (most recent call last): File "", line 1, in ValueError: cannot convert signaling NaN to float But unexpectedly (to me at least), we apparently cannot even create a signalling NAN by casting 64 bits to a float. Here are the functions I use to do the cast: from struct import pack, unpack def cast_float2int(x): return unpack(' x = cast_int2float(0x7ff8000000000001) py> x nan py> hex(cast_float2int(x)) '0x7ff8000000000001' So far so good. But now let me try with a signalling NAN: py> x = cast_int2float(0x7ff0000000000001) py> x nan py> hex(cast_float2int(x)) '0x7ff8000000000001' So it seems that the "quiet" bit is automatically set, even when using the struct module, making it impossible to create snan floats. Is this intended? If so, why? Is this meant as a language feature? Thanks in advance, Steve From storchaka at gmail.com Fri Nov 9 06:46:12 2018 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 9 Nov 2018 13:46:12 +0200 Subject: [Python-Dev] Signalling NANs In-Reply-To: <20181109110507.GK4071@ando.pearwood.info> References: <20181109110507.GK4071@ando.pearwood.info> Message-ID: 09.11.18 13:05, Steven D'Aprano ????: > py> x = cast_int2float(0x7ff0000000000001) > py> x > nan > py> hex(cast_float2int(x)) > '0x7ff8000000000001' I got '0x7ff0000000000001'. From status at bugs.python.org Fri Nov 9 12:09:53 2018 From: status at bugs.python.org (Python tracker) Date: Fri, 9 Nov 2018 18:09:53 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20181109170953.393015789C@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2018-11-02 - 2018-11-09) Python tracker at https://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 6834 ( +6) closed 40118 (+47) total 46952 (+53) Open issues with patches: 2726 Issues opened (42) ================== #10536: Enhancements to gettext docs https://bugs.python.org/issue10536 reopened by serhiy.storchaka #32409: venv activate.bat is UTF-8 encoded but uses current console co https://bugs.python.org/issue32409 reopened by pablogsal #33486: regen autotools related files https://bugs.python.org/issue33486 reopened by ned.deily #34011: Default preference not given to venv DLL's https://bugs.python.org/issue34011 reopened by pablogsal #35145: sqlite3: "select *" should optionally sniff and autoconvert TE https://bugs.python.org/issue35145 reopened by matrixise #35149: pip3 show causing Error for ConfigParaser https://bugs.python.org/issue35149 opened by mdileep #35151: Python 2 xml.etree.ElementTree documentation tutorial uses und https://bugs.python.org/issue35151 opened by epakai #35153: Allow to set headers in xmlrpc.client.ServerProxy https://bugs.python.org/issue35153 opened by ced #35155: Clarify Protocol Handlers in urllib.request Docs https://bugs.python.org/issue35155 opened by Denton-L #35156: Consider revising documentation on Python Builds from source https://bugs.python.org/issue35156 opened by neyuru #35157: Missing pyconfig.h when building from source and pgo flag is e https://bugs.python.org/issue35157 opened by neyuru #35158: Fix PEP 3115 to NOT imply that the class dictionary is used in https://bugs.python.org/issue35158 opened by joydiamond #35163: locale: setlocale(..., 'eo') sets non-existing locale https://bugs.python.org/issue35163 opened by carmenbianca #35164: socket.getfqdn and socket.gethostbyname fail on MacOS https://bugs.python.org/issue35164 opened by ssbarnea #35165: Possible wrong method name in attribute references doc https://bugs.python.org/issue35165 opened by denis-osipov #35166: BUILD_MAP_UNPACK doesn't function as expected for dict subclas https://bugs.python.org/issue35166 opened by bup #35168: shlex punctuation_chars inconsistency https://bugs.python.org/issue35168 opened by tphh #35169: Improve error messages for assignment https://bugs.python.org/issue35169 opened by serhiy.storchaka #35172: Add support for other MSVC compiler versions to distutils. dis https://bugs.python.org/issue35172 opened by Ali Rizvi-Santiago #35173: Re-use already existing functionality to allow Python 2.7.x (b https://bugs.python.org/issue35173 opened by Ali Rizvi-Santiago #35174: Calling for super().__str__ seems to call self.__repr__ in lis https://bugs.python.org/issue35174 opened by Camion #35177: Add missing dependencies between AST/parser header files https://bugs.python.org/issue35177 opened by vstinner #35178: Typo/trivial mistake in warnings.py (may be related to 2.x to https://bugs.python.org/issue35178 opened by tashrifbillah #35181: Doc: Namespace Packages: Inconsistent documentation of __loade https://bugs.python.org/issue35181 opened by mdk #35182: Popen.communicate() breaks when child closes its side of pipe https://bugs.python.org/issue35182 opened by and800 #35183: os.path.splitext documentation needs typical example https://bugs.python.org/issue35183 opened by shaungriffith #35184: Makefile is not correctly generated when compiling pyextat wit https://bugs.python.org/issue35184 opened by mgmacias95 #35185: Logger race condition - loses lines if removeHandler called fr https://bugs.python.org/issue35185 opened by benspiller #35186: distutils.command.upload uses deprecated platform.dist with bd https://bugs.python.org/issue35186 opened by p-ganssle #35189: PEP 475: fnctl functions are not retried if interrupted by a s https://bugs.python.org/issue35189 opened by akeskimo #35190: collections.abc.Sequence cannot be used to test whether a clas https://bugs.python.org/issue35190 opened by hroncok #35191: socket.setblocking(x) treats multiples of 2**32 as False https://bugs.python.org/issue35191 opened by izbyshev #35192: pathlib mkdir throws FileExistsError when not supposed to https://bugs.python.org/issue35192 opened by Adam Dunlap #35193: Off by one error in peephole call to find_op on case RETURN_VA https://bugs.python.org/issue35193 opened by gregory.p.smith #35194: A typo in a constant in cp932 codec https://bugs.python.org/issue35194 opened by izbyshev #35195: Pandas read_csv() is 3.5X Slower on Python 3.7.1 vs Python 3.6 https://bugs.python.org/issue35195 opened by Dragoljub #35196: IDLE text squeezer is too aggressive and is slow https://bugs.python.org/issue35196 opened by rhettinger #35197: graminit.h defines very generic names like 'stmt' or 'test' https://bugs.python.org/issue35197 opened by vstinner #35198: Build issue while compiling cpp files in AIX https://bugs.python.org/issue35198 opened by Ayappan #35199: Convert PyTuple_GET_ITEM() macro to a function call with addit https://bugs.python.org/issue35199 opened by vstinner #35200: Range repr could be better https://bugs.python.org/issue35200 opened by mdk #35201: Recursive '**' matches non-existent directories. https://bugs.python.org/issue35201 opened by daniel Most recent 15 issues with no replies (15) ========================================== #35201: Recursive '**' matches non-existent directories. https://bugs.python.org/issue35201 #35198: Build issue while compiling cpp files in AIX https://bugs.python.org/issue35198 #35195: Pandas read_csv() is 3.5X Slower on Python 3.7.1 vs Python 3.6 https://bugs.python.org/issue35195 #35191: socket.setblocking(x) treats multiples of 2**32 as False https://bugs.python.org/issue35191 #35186: distutils.command.upload uses deprecated platform.dist with bd https://bugs.python.org/issue35186 #35185: Logger race condition - loses lines if removeHandler called fr https://bugs.python.org/issue35185 #35184: Makefile is not correctly generated when compiling pyextat wit https://bugs.python.org/issue35184 #35183: os.path.splitext documentation needs typical example https://bugs.python.org/issue35183 #35177: Add missing dependencies between AST/parser header files https://bugs.python.org/issue35177 #35173: Re-use already existing functionality to allow Python 2.7.x (b https://bugs.python.org/issue35173 #35172: Add support for other MSVC compiler versions to distutils. dis https://bugs.python.org/issue35172 #35169: Improve error messages for assignment https://bugs.python.org/issue35169 #35163: locale: setlocale(..., 'eo') sets non-existing locale https://bugs.python.org/issue35163 #35158: Fix PEP 3115 to NOT imply that the class dictionary is used in https://bugs.python.org/issue35158 #35155: Clarify Protocol Handlers in urllib.request Docs https://bugs.python.org/issue35155 Most recent 15 issues waiting for review (15) ============================================= #35200: Range repr could be better https://bugs.python.org/issue35200 #35199: Convert PyTuple_GET_ITEM() macro to a function call with addit https://bugs.python.org/issue35199 #35197: graminit.h defines very generic names like 'stmt' or 'test' https://bugs.python.org/issue35197 #35194: A typo in a constant in cp932 codec https://bugs.python.org/issue35194 #35193: Off by one error in peephole call to find_op on case RETURN_VA https://bugs.python.org/issue35193 #35191: socket.setblocking(x) treats multiples of 2**32 as False https://bugs.python.org/issue35191 #35189: PEP 475: fnctl functions are not retried if interrupted by a s https://bugs.python.org/issue35189 #35186: distutils.command.upload uses deprecated platform.dist with bd https://bugs.python.org/issue35186 #35181: Doc: Namespace Packages: Inconsistent documentation of __loade https://bugs.python.org/issue35181 #35177: Add missing dependencies between AST/parser header files https://bugs.python.org/issue35177 #35173: Re-use already existing functionality to allow Python 2.7.x (b https://bugs.python.org/issue35173 #35172: Add support for other MSVC compiler versions to distutils. dis https://bugs.python.org/issue35172 #35169: Improve error messages for assignment https://bugs.python.org/issue35169 #35155: Clarify Protocol Handlers in urllib.request Docs https://bugs.python.org/issue35155 #35153: Allow to set headers in xmlrpc.client.ServerProxy https://bugs.python.org/issue35153 Top 10 most discussed issues (10) ================================= #34805: Explicitly specify `MyClass.__subclasses__()` returns classes https://bugs.python.org/issue34805 11 msgs #35194: A typo in a constant in cp932 codec https://bugs.python.org/issue35194 10 msgs #35113: inspect.getsource returns incorrect source for classes when cl https://bugs.python.org/issue35113 9 msgs #35145: sqlite3: "select *" should optionally sniff and autoconvert TE https://bugs.python.org/issue35145 9 msgs #32409: venv activate.bat is UTF-8 encoded but uses current console co https://bugs.python.org/issue32409 8 msgs #34155: email.utils.parseaddr mistakenly parse an email https://bugs.python.org/issue34155 7 msgs #35105: Document that CPython accepts "invalid" identifiers https://bugs.python.org/issue35105 7 msgs #35131: Cannot access to customized paths within .pth file https://bugs.python.org/issue35131 7 msgs #35157: Missing pyconfig.h when building from source and pgo flag is e https://bugs.python.org/issue35157 7 msgs #35104: IDLE: On macOS, Command-M minimizes & opens "Open Module..." https://bugs.python.org/issue35104 6 msgs Issues closed (48) ================== #2504: Add gettext.pgettext() and variants support https://bugs.python.org/issue2504 closed by serhiy.storchaka #17560: problem using multiprocessing with really big objects? https://bugs.python.org/issue17560 closed by pitrou #19675: Pool dies with excessive workers, but does not cleanup https://bugs.python.org/issue19675 closed by mdk #21263: test_gdb failures on os x 10.9.2 https://bugs.python.org/issue21263 closed by ned.deily #25711: Rewrite zipimport from scratch https://bugs.python.org/issue25711 closed by serhiy.storchaka #29341: Missing accepting path-like object in docstrings of os module https://bugs.python.org/issue29341 closed by pablogsal #31553: Extend json.tool to handle jsonlines (with a flag) https://bugs.python.org/issue31553 closed by serhiy.storchaka #31583: 2to3 call for file in current directory yields error https://bugs.python.org/issue31583 closed by denis-osipov #32285: In `unicodedata`, it should be possible to check a unistr's no https://bugs.python.org/issue32285 closed by benjamin.peterson #32512: Add an option to profile to run library module as a script https://bugs.python.org/issue32512 closed by mariocj89 #33000: IDLE Doc: Text consumes unlimited RAM, consoles likely not https://bugs.python.org/issue33000 closed by terry.reedy #33275: glob.glob should explicitly note that results aren't sorted https://bugs.python.org/issue33275 closed by mdk #33462: reversible dict https://bugs.python.org/issue33462 closed by inada.naoki #33578: cjkcodecs missing getstate and setstate implementations https://bugs.python.org/issue33578 closed by inada.naoki #34205: Ansible: _PyImport_LoadDynamicModuleWithSpec() crash on an inv https://bugs.python.org/issue34205 closed by mdk #34547: Wsgiref server does not handle closed connections gracefully https://bugs.python.org/issue34547 closed by chris.jerdonek #34699: allow path-like objects in program arguments in Windows https://bugs.python.org/issue34699 closed by serhiy.storchaka #34726: Add support of checked hash-based pycs in zipimport https://bugs.python.org/issue34726 closed by serhiy.storchaka #34885: asyncio documention has lost its paragraph about cancellation https://bugs.python.org/issue34885 closed by asvetlov #34898: add mtime argument to gzip.compress https://bugs.python.org/issue34898 closed by serhiy.storchaka #34966: Pydoc: better support of method aliases https://bugs.python.org/issue34966 closed by serhiy.storchaka #34969: Add --fast, --best to the gzip CLI https://bugs.python.org/issue34969 closed by mdk #35015: availability directive breaks po files https://bugs.python.org/issue35015 closed by mdk #35032: Remove the videos from faq/Windows https://bugs.python.org/issue35032 closed by matrixise #35065: Reading received data from a closed TCP stream using `StreamRe https://bugs.python.org/issue35065 closed by asvetlov #35099: Improve the IDLE - console differences doc https://bugs.python.org/issue35099 closed by terry.reedy #35109: Doctest in CI uses python binary built from master causing Dep https://bugs.python.org/issue35109 closed by mdk #35118: Add peek() or first() method in queue https://bugs.python.org/issue35118 closed by rhettinger #35123: Add style guide for sentinel usage https://bugs.python.org/issue35123 closed by serhiy.storchaka #35133: Bugs in concatenating string literals on different lines https://bugs.python.org/issue35133 closed by serhiy.storchaka #35147: _Py_NO_RETURN is always empty on GCC https://bugs.python.org/issue35147 closed by vstinner #35148: cannot activate a venv environment on a Swiss German windows https://bugs.python.org/issue35148 closed by vinay.sajip #35150: Misc/README.valgrind out-of-date https://bugs.python.org/issue35150 closed by cheryl.sabella #35152: too small type for struct.pack/unpack in mutliprocessing.Conne https://bugs.python.org/issue35152 closed by serhiy.storchaka #35154: subprocess.list2cmdline() does not allow running some commands https://bugs.python.org/issue35154 closed by Roffild #35159: Add a link to the devguide in the sidebar of the documentation https://bugs.python.org/issue35159 closed by mdk #35160: PyObjects initialized with PyObject_New have uninitialized poi https://bugs.python.org/issue35160 closed by benjamin.peterson #35161: ASAN: stack-use-after-scope in grp.getgr{nam,gid} and pwd.getp https://bugs.python.org/issue35161 closed by serhiy.storchaka #35162: Inconsistent behaviour around __new__ https://bugs.python.org/issue35162 closed by serhiy.storchaka #35167: Specify program for gzip and json.tool command line options https://bugs.python.org/issue35167 closed by serhiy.storchaka #35170: 3.7.1 compile failure on CentOS 6.10; _ctypes did not build https://bugs.python.org/issue35170 closed by lana.deere #35171: test_TimeRE_recreation_timezone failure on systems with non-de https://bugs.python.org/issue35171 closed by benjamin.peterson #35175: Builtin function all() is handling dict() types in a weird way https://bugs.python.org/issue35175 closed by josh.r #35176: for loop range bug https://bugs.python.org/issue35176 closed by steven.daprano #35179: Limit max sendfile chunk to 0x7ffff000 https://bugs.python.org/issue35179 closed by asvetlov #35180: Ctypes segfault or TypeError tested for python2.7 and 3 https://bugs.python.org/issue35180 closed by josh.r #35187: a bug about np.arrange https://bugs.python.org/issue35187 closed by mark.dickinson #35188: something confused about numpy.arange https://bugs.python.org/issue35188 closed by xyl123 From chris.barker at noaa.gov Fri Nov 9 16:17:09 2018 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 9 Nov 2018 13:17:09 -0800 Subject: [Python-Dev] Signalling NANs In-Reply-To: References: <20181109110507.GK4071@ando.pearwood.info> Message-ID: works for me, too: In [9]: x = cast_int2float(0x7ff8000000000001) In [10]: hex(cast_float2int(x)) Out[10]: '0x7ff8000000000001' In [11]: x = cast_int2float(0x7ff0000000000001) In [12]: hex(cast_float2int(x)) Out[12]: '0x7ff0000000000001' OS-X, conda build: Python 3.7.0 | packaged by conda-forge | (default, Aug 27 2018, 17:24:52) [Clang 6.1.0 (clang-602.0.53)] on darwin I suspect it depends on the compiler's math library But neither is raising an exception: In [3]: x = cast_int2float(0x7ff0000000000001) In [4]: y = cast_int2float(0x7ff8000000000001) In [5]: 2.0 / x Out[5]: nan In [6]: 2.0 / y Out[6]: nan When should it? -CHB On Fri, Nov 9, 2018 at 3:46 AM, Serhiy Storchaka wrote: > 09.11.18 13:05, Steven D'Aprano ????: > >> py> x = cast_int2float(0x7ff0000000000001) >> py> x >> nan >> py> hex(cast_float2int(x)) >> '0x7ff8000000000001' >> > > I got '0x7ff0000000000001'. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/chris. > barker%40noaa.gov > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From vstinner at redhat.com Fri Nov 9 19:30:37 2018 From: vstinner at redhat.com (Victor Stinner) Date: Sat, 10 Nov 2018 01:30:37 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) Message-ID: Hi, The current C API of Python is both a strength and a weakness of the Python ecosystem as a whole. It's a strength because it allows to quickly reuse a huge number of existing libraries by writing a glue for them. It made numpy possible and this project is a big sucess! It's a weakness because of its cost on the maintenance, it prevents optimizations, and more generally it prevents to experiment modifying Python internals. For example, CPython cannot use tagged pointers, because the existing C API is heavily based on the ability to dereference a PyObject* object and access directly members of objects (like PyTupleObject). For example, Py_INCREF() modifies *directly* PyObject.ob_refcnt. It's not possible neither to use a Python compiled in debug mode on C extensions (compiled in release mode), because the ABI is different in debug mode. As a consequence, nobody uses the debug mode, whereas it is very helpful to develop C extensions and investigate bugs. I also consider that the C API gives too much work to PyPy (for their "cpyext" module). A better C API (not leaking implementation) details would make PyPy more efficient (and simplify its implementation in the long term, when the support for the old C API can be removed). For example, PyList_GetItem(list, 0) currently converts all items of the list to PyObject* in PyPy, it can waste memory if only the first item of the list is needed. PyPy has much more efficient storage than an array of PyObject* for lists. I wrote a website to explain all these issues with much more details: https://pythoncapi.readthedocs.io/ I identified "bad APIs" like using borrowed references or giving access to PyObject** (ex: PySequence_Fast_ITEMS). I already wrote an (incomplete) implementation of a new C API which doesn't leak implementation details: https://github.com/pythoncapi/pythoncapi It uses an opt-in option (Py_NEWCAPI define -- I'm not sure about the name) to get the new API. The current C API is unchanged. Ah, important points. I don't want to touch the current C API nor make it less efficient. And compatibility in both directions (current C API <=> new C API) is very important for me. There is no such plan as "Python 4" which would break the world and *force* everybody to upgrade to the new C API, or stay to Python 3 forever. No. The new C API must be an opt-in option, and current C API remains the default and not be changed. I have different ideas for the compatibility part, but I'm not sure of what are the best options yet. My short term for the new C API would be to ease the experimentation of projects like tagged pointers. Currently, I have to maintain the implementation of a new C API which is not really convenient. -- Today I tried to abuse the Py_DEBUG define for the new C API, but it seems to be a bad idea: https://github.com/python/cpython/pull/10435 A *new* define is needed to opt-in for the new C API. Victor From mike at selik.org Fri Nov 9 19:49:13 2018 From: mike at selik.org (Michael Selik) Date: Fri, 9 Nov 2018 16:49:13 -0800 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: On Fri, Nov 9, 2018 at 4:33 PM Victor Stinner wrote: > It uses an opt-in option (Py_NEWCAPI define -- I'm not sure about the > name) to get the new API. The current C API is unchanged. > While one can hope that this will be the only time the C API will be revised, it may be better to number it instead of calling it "NEW". 20 years from now, it won't feel new anymore. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vstinner at redhat.com Fri Nov 9 19:53:26 2018 From: vstinner at redhat.com (Victor Stinner) Date: Sat, 10 Nov 2018 01:53:26 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: To hide all implementation details, I propose to stop using macros and use function calls instead. For example, replace: #define PyTuple_GET_ITEM(op, i) \ (((PyTupleObject *)(op))->ob_item[i]) with: # define PyTuple_GET_ITEM(op, i) PyTuple_GetItem(op, i) With this change, C extensions using PyTuple_GET_ITEM() does no longer dereference PyObject* nor access PyTupleObject.ob_item. For example, PyPy doesn't have to convert all tuple items to PyObject, but only create one PyObject for the requested item. Another example is that it becomes possible to use a "CPython debug runtime" which checks at runtime that the first argument is a tuple and that the index is valid. For a longer explanation, see the idea of different "Python runtimes": https://pythoncapi.readthedocs.io/runtimes.html Replacing macros with function calls is only a first step. It doesn't solve the problem of borrowed references for example. Obviously, such change has a cost on performances. Sadly, I didn't run a benchmark yet. At this point, I mostly care about correctness and the feasibility of the whole project. I also hope that the new C API will allow to implement new optimizations which cannot even be imagined today, because of the backward compatibility. The question is if the performance balance is positive or not at the all :-) Hopefully, there is no urgency to take any decision at this point. The whole project is experimental and can be cancelled anytime. Victor From vstinner at redhat.com Fri Nov 9 19:56:02 2018 From: vstinner at redhat.com (Victor Stinner) Date: Sat, 10 Nov 2018 01:56:02 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: Le sam. 10 nov. 2018 ? 01:49, Michael Selik a ?crit : >> It uses an opt-in option (Py_NEWCAPI define -- I'm not sure about the >> name) to get the new API. The current C API is unchanged. > > While one can hope that this will be the only time the C API will be revised, it may be better to number it instead of calling it "NEW". 20 years from now, it won't feel new anymore. That's exactly why I dislike "New", it's like adding "Ex" or "2" to a function name :-) Well, before bikeshedding on the C define name, I would prefer to see if the overall idea of trying to push code for the new C API in the master branch is a good idea, or if it's too early and the experiment must continue in a fork. Victor From njs at pobox.com Fri Nov 9 20:50:04 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 9 Nov 2018 17:50:04 -0800 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: On Fri, Nov 9, 2018 at 4:30 PM, Victor Stinner wrote: > Ah, important points. I don't want to touch the current C API nor make > it less efficient. And compatibility in both directions (current C API > <=> new C API) is very important for me. There is no such plan as > "Python 4" which would break the world and *force* everybody to > upgrade to the new C API, or stay to Python 3 forever. No. The new C > API must be an opt-in option, and current C API remains the default > and not be changed. Doesn't this mean that you're just making the C API larger and more complicated, rather than simplifying it? You cite some benefits (tagged pointers, changing the layout of PyObject, making PyPy's life easier), but I don't see how you can do any of those things so long as the current C API remains supported. -n -- Nathaniel J. Smith -- https://vorpus.org From vstinner at redhat.com Fri Nov 9 21:03:03 2018 From: vstinner at redhat.com (Victor Stinner) Date: Sat, 10 Nov 2018 03:03:03 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: Le sam. 10 nov. 2018 ? 02:50, Nathaniel Smith a ?crit : > Doesn't this mean that you're just making the C API larger and more > complicated, rather than simplifying it? You cite some benefits > (tagged pointers, changing the layout of PyObject, making PyPy's life > easier), but I don't see how you can do any of those things so long as > the current C API remains supported. Tagged pointers and changing the layout of PyObject can only be experimented in a new different Python runtime which only supports C extensions compiled with the new C API. Technically, it can be CPython compiled with a different flag, as there is already python3-dbg (debug mode, ./configure --with-pydebug) and python3 (release mode). Or it can be CPython fork. I don't propose to experiment tagged pointer or changing the layout of PyObject in CPython. It may require too many changes and it's unclear if it's worth it or not. I only propose to implement the least controversial part of the new C API in the master branch, since maintaining this new C API in a fork is painful. I cannot promise that it will make PyPy's life easier. PyPy developers already told me that they already implemented the support of the current C API. The promise is that if you use the new C API, PyPy should be more efficient, because it would have less things to emulate. To be honest, I'm not sure at this point, I don't know PyPy internals. I also know that PyPy developers always complain when we *add new functions* to the C API, and there is a non-zero risk that I would like to add new functions, since current ones have issues :-) I am working with PyPy to involve them in the new C API. Victor From vstinner at redhat.com Fri Nov 9 21:14:33 2018 From: vstinner at redhat.com (Victor Stinner) Date: Sat, 10 Nov 2018 03:14:33 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: Sometimes, code is easier to understand than a long explanation, so here is a very simple example of modified function for the new C API: https://bugs.python.org/issue35206 https://github.com/python/cpython/pull/10443/files PyTuple_GET_ITEM() becomes a function call and the function implementation checks arguments at runtime if compiled in debug mode. Technically, the header file still uses a macro, to implicitly cast to PyObject*, since currently the macro accepts any type, and the new C API should not change that. Victor Le sam. 10 nov. 2018 ? 01:53, Victor Stinner a ?crit : > > To hide all implementation details, I propose to stop using macros and > use function calls instead. For example, replace: > > #define PyTuple_GET_ITEM(op, i) \ > (((PyTupleObject *)(op))->ob_item[i]) > > with: > > # define PyTuple_GET_ITEM(op, i) PyTuple_GetItem(op, i) > > With this change, C extensions using PyTuple_GET_ITEM() does no longer > dereference PyObject* nor access PyTupleObject.ob_item. For example, > PyPy doesn't have to convert all tuple items to PyObject, but only > create one PyObject for the requested item. Another example is that it > becomes possible to use a "CPython debug runtime" which checks at > runtime that the first argument is a tuple and that the index is > valid. For a longer explanation, see the idea of different "Python > runtimes": > > https://pythoncapi.readthedocs.io/runtimes.html > > Replacing macros with function calls is only a first step. It doesn't > solve the problem of borrowed references for example. > > Obviously, such change has a cost on performances. Sadly, I didn't run > a benchmark yet. At this point, I mostly care about correctness and > the feasibility of the whole project. I also hope that the new C API > will allow to implement new optimizations which cannot even be > imagined today, because of the backward compatibility. The question is > if the performance balance is positive or not at the all :-) > Hopefully, there is no urgency to take any decision at this point. The > whole project is experimental and can be cancelled anytime. > > Victor From njs at pobox.com Fri Nov 9 22:02:37 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 9 Nov 2018 19:02:37 -0800 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: On Fri, Nov 9, 2018 at 6:03 PM, Victor Stinner wrote: > Le sam. 10 nov. 2018 ? 02:50, Nathaniel Smith a ?crit : >> Doesn't this mean that you're just making the C API larger and more >> complicated, rather than simplifying it? You cite some benefits >> (tagged pointers, changing the layout of PyObject, making PyPy's life >> easier), but I don't see how you can do any of those things so long as >> the current C API remains supported. > > Tagged pointers and changing the layout of PyObject can only be > experimented in a new different Python runtime which only supports C > extensions compiled with the new C API. Technically, it can be CPython > compiled with a different flag, as there is already python3-dbg (debug > mode, ./configure --with-pydebug) and python3 (release mode). Or it > can be CPython fork. > > I don't propose to experiment tagged pointer or changing the layout of > PyObject in CPython. It may require too many changes and it's unclear > if it's worth it or not. I only propose to implement the least > controversial part of the new C API in the master branch, since > maintaining this new C API in a fork is painful. > > I cannot promise that it will make PyPy's life easier. PyPy developers > already told me that they already implemented the support of the > current C API. The promise is that if you use the new C API, PyPy > should be more efficient, because it would have less things to > emulate. To be honest, I'm not sure at this point, I don't know PyPy > internals. I also know that PyPy developers always complain when we > *add new functions* to the C API, and there is a non-zero risk that I > would like to add new functions, since current ones have issues :-) I > am working with PyPy to involve them in the new C API. So is it fair to say that your plan is that CPython will always use the current ("old") API internally, and the "new" API will be essentially an abstraction layer, that's designed to let people write C extensions that target the old API, while also being flexible enough to target PyPy and other "new different Python runtimes"? If so, then would it make more sense to develop this as an actual separate abstraction layer? That would have the huge advantage that it could be distributed and versioned separately from CPython, different packages could use different versions of the abstraction layer, PyPy isn't forced to immediately add a bunch of new APIs... -n -- Nathaniel J. Smith -- https://vorpus.org From dinoviehland at gmail.com Fri Nov 9 23:58:27 2018 From: dinoviehland at gmail.com (Dino Viehland) Date: Fri, 9 Nov 2018 20:58:27 -0800 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: > > That's exactly why I dislike "New", it's like adding "Ex" or "2" to a > function name :-) > > Well, before bikeshedding on the C define name, I would prefer to see > if the overall idea of trying to push code for the new C API in the > master branch is a good idea, or if it's too early and the experiment > must continue in a fork. Rather than adding yet another pre-processor directive for this I would suggest just adding a new header file that only has the new stable API. For example it could just be "py.h" or "pyapi.h". It would have all of the definitions for the stable API. While that would involve some duplication from the existing headers, I don't think it would be such a big deal - the idea is the API won't change, methods won't be removed, and occasionally new methods will get added in a very thoughtful manner. Having it be separate will force thought and conversation about it. It would also make it very easy to look and see what exactly is in the stable API as well. There's be a pretty flat list which can be consulted, and hopefully it ends up not being super huge either. BTW, thanks for continuing to push on this Victor, it seems like great progress! On Fri, Nov 9, 2018 at 4:57 PM Victor Stinner wrote: > Le sam. 10 nov. 2018 ? 01:49, Michael Selik a ?crit : > >> It uses an opt-in option (Py_NEWCAPI define -- I'm not sure about the > >> name) to get the new API. The current C API is unchanged. > > > > While one can hope that this will be the only time the C API will be > revised, it may be better to number it instead of calling it "NEW". 20 > years from now, it won't feel new anymore. > > That's exactly why I dislike "New", it's like adding "Ex" or "2" to a > function name :-) > > Well, before bikeshedding on the C define name, I would prefer to see > if the overall idea of trying to push code for the new C API in the > master branch is a good idea, or if it's too early and the experiment > must continue in a fork. > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/dinoviehland%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil at python.ca Sat Nov 10 14:22:48 2018 From: neil at python.ca (Neil Schemenauer) Date: Sat, 10 Nov 2018 13:22:48 -0600 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: <20181110192248.lk5ureidahc3yfzi@python.ca> On 2018-11-09, Dino Viehland wrote: > Rather than adding yet another pre-processor directive for this I would > suggest just adding a new header file that only has the new stable API. > For example it could just be "py.h" or "pyapi.h". It would have all of the > definitions for the stable API. I like this idea. It will be easier to define a minimal and clean API with this approach. I believe it can mostly be a subset of the current API. I think we could Dino's idea with Nathaniel's suggestion of developing it separate from CPython. Victor's C-API project is already attempting to provide backwards compatibility. I.e. you can have an extension module that uses the new API but compiles and runs with older versions of Python (e.g. 3.6). So, whatever is inside this new API, it must be possible to build it on top of the existing Python API. Regards, Neil From steve at pearwood.info Sat Nov 10 18:26:42 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 11 Nov 2018 10:26:42 +1100 Subject: [Python-Dev] Signalling NANs In-Reply-To: References: <20181109110507.GK4071@ando.pearwood.info> Message-ID: <20181110232642.GS4071@ando.pearwood.info> On Fri, Nov 09, 2018 at 01:17:09PM -0800, Chris Barker via Python-Dev wrote: > works for me, too: > > In [9]: x = cast_int2float(0x7ff8000000000001) > In [10]: hex(cast_float2int(x)) > Out[10]: '0x7ff8000000000001' > > In [11]: x = cast_int2float(0x7ff0000000000001) > In [12]: hex(cast_float2int(x)) > Out[12]: '0x7ff0000000000001' Fascinating. I borrowed a Debian system and tried it on there, and got the same results as you. So I wonder whether it is something unusual about my Red Hat system that it prevents the formation of signalling NANs? However, I don't think that explains why the float constructor doesn't allow Decimal('snan') to be converted to a float. > I suspect it depends on the compiler's math library Unfortunately that's probably true. > But neither is raising an exception: [...] > When should it? I think that, by default, any arithmetic operation, comparison or math library function call ought to raise if given a snan, if the underlying math library supports IEEE-754 signals. Which I imagine these days nearly all should do. So any of these should raise: snan + 1 math.sin(snan) snan == 0 -- Steve From njs at pobox.com Sat Nov 10 19:27:29 2018 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 10 Nov 2018 16:27:29 -0800 Subject: [Python-Dev] Signalling NANs In-Reply-To: <20181110232642.GS4071@ando.pearwood.info> References: <20181109110507.GK4071@ando.pearwood.info> <20181110232642.GS4071@ando.pearwood.info> Message-ID: On Sat, Nov 10, 2018 at 3:26 PM, Steven D'Aprano wrote: > On Fri, Nov 09, 2018 at 01:17:09PM -0800, Chris Barker via Python-Dev wrote: >> works for me, too: >> >> In [9]: x = cast_int2float(0x7ff8000000000001) >> In [10]: hex(cast_float2int(x)) >> Out[10]: '0x7ff8000000000001' >> >> In [11]: x = cast_int2float(0x7ff0000000000001) >> In [12]: hex(cast_float2int(x)) >> Out[12]: '0x7ff0000000000001' > > > Fascinating. I borrowed a Debian system and tried it on there, and got > the same results as you. So I wonder whether it is something unusual > about my Red Hat system that it prevents the formation of signalling > NANs? Apparently loading a sNaN into an x87 register silently converts it to a qNaN, and on Linux C compilers are allowed to do that at any point: https://stackoverflow.com/questions/22816095/signalling-nan-was-corrupted-when-returning-from-x86-function-flds-fstps-of-x87 So the Debian/RH difference may just be different register allocation in two slightly different compiler versions. Also, gcc doesn't even try to support sNaN correctly unless you pass -fsignaling-nans, and the docs warn that this isn't very well tested: https://www.cleancss.com/explain-command/gcc/5197 IEEE754 is a wonderful thing, and even imperfect implementations are still way better than what came before, but real systems are almost universally sloppy about details. I don't think any real libm even tries to set all the status flags correctly. And I'm not sure sNaN support is actually useful anyway... Of course if all you want is a value that raises an exception whenever it's used in an arithmetic expression, then Python does support that :-) sNaN = object() -n -- Nathaniel J. Smith -- https://vorpus.org From stephane at wirtel.be Sun Nov 11 09:34:45 2018 From: stephane at wirtel.be (Stephane Wirtel) Date: Sun, 11 Nov 2018 14:34:45 +0000 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> <20181104151239.GA14162@xps> <20181104155017.GX3817@ando.pearwood.info> Message-ID: <20181111143445.GA23982@xps> Mariatta, Do you think we could add a webhook for the build of the documentation for each PR where the build of the doc works? St?phane -- St?phane Wirtel - https://wirtel.be - @matrixise From steve at pearwood.info Sun Nov 11 16:50:23 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 12 Nov 2018 08:50:23 +1100 Subject: [Python-Dev] Signalling NANs In-Reply-To: References: <20181109110507.GK4071@ando.pearwood.info> <20181110232642.GS4071@ando.pearwood.info> Message-ID: <20181111215023.GZ4071@ando.pearwood.info> On Sat, Nov 10, 2018 at 04:27:29PM -0800, Nathaniel Smith wrote: > Apparently loading a sNaN into an x87 register silently converts it to > a qNaN, and on Linux C compilers are allowed to do that at any point: > > https://stackoverflow.com/questions/22816095/signalling-nan-was-corrupted-when-returning-from-x86-function-flds-fstps-of-x87 Thanks for finding that. > So the Debian/RH difference may just be different register allocation > in two slightly different compiler versions. The Debian box uses an ARM processor, so there's that difference too. -- Steve From greg.ewing at canterbury.ac.nz Sun Nov 11 17:05:19 2018 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 12 Nov 2018 11:05:19 +1300 Subject: [Python-Dev] Signalling NANs In-Reply-To: <20181111215023.GZ4071@ando.pearwood.info> References: <20181109110507.GK4071@ando.pearwood.info> <20181110232642.GS4071@ando.pearwood.info> <20181111215023.GZ4071@ando.pearwood.info> Message-ID: <5BE8A79F.70907@canterbury.ac.nz> Steven D'Aprano wrote: > The Debian box uses an ARM processor, so there's that difference too. FWIW, I tried this on MacOSX 10.6 with an Intel Xeon and it also seems to suppress sNaNs. -- Greg From vstinner at redhat.com Sun Nov 11 18:19:24 2018 From: vstinner at redhat.com (Victor Stinner) Date: Mon, 12 Nov 2018 00:19:24 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: Le sam. 10 nov. 2018 ? 04:02, Nathaniel Smith a ?crit : > So is it fair to say that your plan is that CPython will always use > the current ("old") API internally, and the "new" API will be > essentially an abstraction layer, that's designed to let people write > C extensions that target the old API, while also being flexible enough > to target PyPy and other "new different Python runtimes"? What we call is "Python C API" is not really an API. I mean, nobody tried to sit down and think about a proper API to access Python from C. We happened is that we had an implementation of Python written in C and it was cool to just "expose everything". By everything, I mean "everything". It's hard to find a secret CPython feature not exposed or leaked in the C API. The problem that I'm trying to solve to "fix the C API" to hide implementation details, with one constraint: don't create a new "Python 4". I don't want to break the backward compatibility without having a slow and smooth transition plan. The "old" and new C API must be supported in parallel, at least in the standard CPython, /usr/bin/python3. Writing a new API from scratch is nice, but it's harder to moving all existing C extensions from the "old" C API to a new one. Replacing macros with functions has little impact on backward compatibility. Most C extensions should still work if macros become functions. I'm not sure yet how far we should go towards a perfect API which doesn't leak everything. We have to move slowly, and make sure that we don't break major C extensions. We need to write tools to fully automate the conversion. If it's not possible, maybe the whole project will fail. I'm looking for a practical solutions based on the existing C API and the existing CPython code base. > If so, then would it make more sense to develop this as an actual > separate abstraction layer? That would have the huge advantage that it > could be distributed and versioned separately from CPython, different > packages could use different versions of the abstraction layer, PyPy > isn't forced to immediately add a bunch of new APIs... I didn't investigate this option. But I expect that you will have to write a full new API using a different prefix than "Py_". Otherwise, I'm not sure how you want to handle PyTuple_GET_ITEM() as a macro on one side (Include/tupleobject.h) and PyTuple_GET_ITEM() on the other side (hypotetical_new_api.h). Would it mean to duplicate all functions to get a different prefix? If you keep the "Py_" prefix, what I would like to ensure is that some functions are no longer accessible. How you remove PySequence_Fast_GET_ITEM() for example? For me, it seems simpler to modify CPython headers than starting on something new. It seems simpler to choose the proper level of compatibility. I start from an API 100% compatible (the current C API), and decide what is changed and how. Victor From greg at krypto.org Tue Nov 13 01:46:30 2018 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 12 Nov 2018 22:46:30 -0800 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: On Fri, Nov 9, 2018 at 5:50 PM Nathaniel Smith wrote: > On Fri, Nov 9, 2018 at 4:30 PM, Victor Stinner > wrote: > > Ah, important points. I don't want to touch the current C API nor make > > it less efficient. And compatibility in both directions (current C API > > <=> new C API) is very important for me. There is no such plan as > > "Python 4" which would break the world and *force* everybody to > > upgrade to the new C API, or stay to Python 3 forever. No. The new C > > API must be an opt-in option, and current C API remains the default > > and not be changed. > > Doesn't this mean that you're just making the C API larger and more > complicated, rather than simplifying it? You cite some benefits > (tagged pointers, changing the layout of PyObject, making PyPy's life > easier), but I don't see how you can do any of those things so long as > the current C API remains supported. > > -n > I believe the implied missing thing from Victor's description is this: Experimentation with new internal implementations can begin once we have a new C API by explicitly breaking the old C API with-in such experiments (as is required for most anything interesting). All code that is written to the new C API still works during this process, thus making the job of practical testing of such new VM internals easier. >From there, you can make decisions on how heavily to push the world towards adoption of the new C API and by when so that a runtime not supporting the old API can be realized with a list of enticing carrot tasting benefits. (devising any necessary pypy cpyext-like compatibility solutions for super long lived code or things that sadly want to use the 3.7 ABI for 10 years in the process - and similarly an extension to provide some or all of this API/ABI on top of older existing stable Python releases!) I'd *love* to get to a situation where the only valid ABI we support knows nothing about internal structs at all. Today, PyObject memory layout is exposed to the world and unchangable. :( This is a long process release wise (assume multiple stable releases go by before we could declare that). But *we've got to start* by defining what we want to provide as a seriously limited but functional API and ABI even if it doesn't perform as well as things compiled against our existing exposed-internals C API. For *most* extension modules, performance of this sort is not important. For the numpys of the world life is more complicated, we should work with them to figure out their C API needs. If it wasn't already obvious, you've got my support on this. :) -gps PS If the conversation devolves to arguing about "new" being a bad name, that's a good sign. I suggest calling it the Vorpal API after the bunny. Or be boring and just use a year number for the name. -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Tue Nov 13 02:13:19 2018 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 12 Nov 2018 23:13:19 -0800 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: On Sun, Nov 11, 2018 at 3:19 PM Victor Stinner wrote: > Le sam. 10 nov. 2018 ? 04:02, Nathaniel Smith a ?crit : > > So is it fair to say that your plan is that CPython will always use > > the current ("old") API internally, and the "new" API will be > > essentially an abstraction layer, that's designed to let people write > > C extensions that target the old API, while also being flexible enough > > to target PyPy and other "new different Python runtimes"? > > What we call is "Python C API" is not really an API. I mean, nobody > tried to sit down and think about a proper API to access Python from > C. We happened is that we had an implementation of Python written in C > and it was cool to just "expose everything". By everything, I mean > "everything". It's hard to find a secret CPython feature not exposed > or leaked in the C API. > > The problem that I'm trying to solve to "fix the C API" to hide > implementation details, with one constraint: don't create a new > "Python 4". I don't want to break the backward compatibility without > having a slow and smooth transition plan. The "old" and new C API must > be supported in parallel, at least in the standard CPython, > /usr/bin/python3. > > Writing a new API from scratch is nice, but it's harder to moving all > existing C extensions from the "old" C API to a new one. > > Replacing macros with functions has little impact on backward > compatibility. Most C extensions should still work if macros become > functions. > > I'm not sure yet how far we should go towards a perfect API which > doesn't leak everything. We have to move slowly, and make sure that we > don't break major C extensions. We need to write tools to fully > automate the conversion. If it's not possible, maybe the whole project > will fail. > > I'm looking for a practical solutions based on the existing C API and > the existing CPython code base. > > > If so, then would it make more sense to develop this as an actual > > separate abstraction layer? That would have the huge advantage that it > > could be distributed and versioned separately from CPython, different > > packages could use different versions of the abstraction layer, PyPy > > isn't forced to immediately add a bunch of new APIs... > I like where this thought is headed! I didn't investigate this option. But I expect that you will have to > write a full new API using a different prefix than "Py_". Otherwise, > I'm not sure how you want to handle PyTuple_GET_ITEM() as a macro on > one side (Include/tupleobject.h) and PyTuple_GET_ITEM() on the other > side (hypotetical_new_api.h). > > Would it mean to duplicate all functions to get a different prefix? > For strict C, yes, namespacing has always been manual collision avoidance/mangling. You'd need a new unique name for anything defined as a function that conflicts with the old C API function names. You could hide these from code if you are just reusing the name by using #define in the new API header of the old name to the new unique name... but that makes code a real challenge to read and understand at a glance. Without examining the header file imports the reader wouldn't know for example if the code is calling the API that borrows a reference (old api, evil) or one that gets its own reference (presumed new api). When things have only ever been macros (Py_INCREF, etc) the name can be reused if there has never been a function of that name in an old C API. But beware of reuse for anything where the semantics change to avoid misunderstandings about behavior from people familiar with the old API or googling API names to look up behavior. I suspect optimizing for ease of transition from code written to the existing C API to the new API by keeping names the same is the wrong thing to optimize for. Using entirely new names may actually be a good thing as it makes it immediately clear which way a given piece of code is written. It'd also be good for PyObject* the old C API thing be a different type from PythonHandle* (a new API thing who's name I just made up) such that they could not be passed around and exchanged for one another without a compiler complaint. Code written using both APIs should not be allowed to transit objects directly between different APIs. -gps > > If you keep the "Py_" prefix, what I would like to ensure is that some > functions are no longer accessible. How you remove > PySequence_Fast_GET_ITEM() for example? > > For me, it seems simpler to modify CPython headers than starting on > something new. It seems simpler to choose the proper level of > compatibility. I start from an API 100% compatible (the current C > API), and decide what is changed and how. > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/greg%40krypto.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vstinner at redhat.com Tue Nov 13 04:31:35 2018 From: vstinner at redhat.com (Victor Stinner) Date: Tue, 13 Nov 2018 10:31:35 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: Le mar. 13 nov. 2018 ? 08:13, Gregory P. Smith a ?crit : > When things have only ever been macros (Py_INCREF, etc) the name can be reused if there has never been a function of that name in an old C API. But beware of reuse for anything where the semantics change to avoid misunderstandings about behavior from people familiar with the old API or googling API names to look up behavior. My plan is to only keep an existing function if it has no flaw. If it has a flaw, it should be removed and maybe replaced with a new function (or suggest a replacement using existing APIs). I don't want to modify the behavior depending if it's the "old" or the "new" API. My plan reuses the same code base, I don't want to put the whole body of a function inside a "#ifdef NEWCAPI". > I suspect optimizing for ease of transition from code written to the existing C API to the new API by keeping names the same is the wrong thing to optimize for. Not all functions in the current C API are bad. Many functions are just fine. For example, PyObject_GetAttr() returns a strong reference. I don't see anything wrong with this API. Only a small portion of the C API is "bad". > Using entirely new names may actually be a good thing as it makes it immediately clear which way a given piece of code is written. It'd also be good for PyObject* the old C API thing be a different type from PythonHandle* (a new API thing who's name I just made up) such that they could not be passed around and exchanged for one another without a compiler complaint. Code written using both APIs should not be allowed to transit objects directly between different APIs. On Windows, the HANDLE type is just an integer, it's not a pointer. If it's a pointer, some developer may want to dereference it, whereas it must really be a dummy integer. Consider tagged pointers: you don't want to dereferenced a tagged pointer. But no, I don't plan to replace "PyObject*". Again, I want to reduce the number of changes. If the PyObject structure is not exposed, I don't think that it's an issue to keep "PyObject*" type. Example: --- #include typedef struct _object PyObject; PyObject* dummy(void) { return (PyObject *)NULL; } int main() { PyObject *obj = dummy(); return obj->ob_type; } --- This program is valid, except of the single line which attempts to dereference PyObject*: x.c: In function 'main': x.c:13:15: error: dereferencing pointer to incomplete type 'PyObject {aka struct _object}' return obj->ob_type; If I could restart from scratch, I would design the C API differently. For example, I'm not sure that I would use "global variables" (Python thread state) to store the current exception. I would use similar like Rust error handling: https://doc.rust-lang.org/book/first-edition/error-handling.html But that's not my plan. My plan is not to write a new bright world. My plan is to make a "small step" towards a better API to make PyPy more efficient and to allow to write a new more optimized CPython. I also plan to *iterate* on the API rather than having a frozen API. It's just that we cannot jump towards the perfect API at once. We need small steps and make sure that we don't break too many C extensions at each milestone. Maybe the new API should be versioned as Android NDK for example. Victor From nd at perlig.de Tue Nov 13 14:21:59 2018 From: nd at perlig.de (=?ISO-8859-1?Q?Andr=E9?= Malo) Date: Tue, 13 Nov 2018 20:21:59 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: <1614616.psXdYk7axO@chryseis> Victor Stinner wrote: > Replacing macros with functions has little impact on backward > compatibility. Most C extensions should still work if macros become > functions. As long as they are recompiled. However, they will lose a lot of performance. Both these points have been mentioned somewhere, I'm certain, but it cannot be stressed enough, IMHO. > > I'm not sure yet how far we should go towards a perfect API which > doesn't leak everything. We have to move slowly, and make sure that we > don't break major C extensions. We need to write tools to fully > automate the conversion. If it's not possible, maybe the whole project > will fail. I'm wondering, how you suggest to measure "major". I believe, every C extension, which is public and running in production somewhere, is major enough. Maybe "easiness to fix"? Lines of code? Cheers, -- > R?tselnd, was ein Anthroposoph mit Unterwerfung zu tun hat... ^^^^^^^^^^^^ [...] Dieses Wort gibt so viele Stellen f?r einen Spelling Flame her, und Du g?nnst einem keine einzige. -- Jean Claude und David Kastrup in dtl From nd at perlig.de Tue Nov 13 14:32:03 2018 From: nd at perlig.de (=?ISO-8859-1?Q?Andr=E9?= Malo) Date: Tue, 13 Nov 2018 20:32:03 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: <3346184.CCZ30flunD@chryseis> Victor Stinner wrote: > Replacing macros with functions has little impact on backward > compatibility. Most C extensions should still work if macros become > functions. As long as they are recompiled. However, they will lose a lot of performance. Both these points have been mentioned somewhere, I'm certain, but it cannot be stressed enough, IMHO. > > I'm not sure yet how far we should go towards a perfect API which > doesn't leak everything. We have to move slowly, and make sure that we > don't break major C extensions. We need to write tools to fully > automate the conversion. If it's not possible, maybe the whole project > will fail. I'm wondering, how you suggest to measure "major". I believe, every C extension, which is public and running in production somewhere, is major enough. Maybe "easiness to fix"? Lines of code? Cheers, -- > R?tselnd, was ein Anthroposoph mit Unterwerfung zu tun hat... ^^^^^^^^^^^^ Du g?nnst einem keine einzige. -- Jean Claude und David Kastrup in dtl[...] Dieses Wort gibt so viele Stellen f?r einen Spelling Flame her, und From vstinner at redhat.com Tue Nov 13 15:59:14 2018 From: vstinner at redhat.com (Victor Stinner) Date: Tue, 13 Nov 2018 21:59:14 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: <1614616.psXdYk7axO@chryseis> References: <1614616.psXdYk7axO@chryseis> Message-ID: Le mar. 13 nov. 2018 ? 20:32, Andr? Malo a ?crit : > As long as they are recompiled. However, they will lose a lot of performance. > Both these points have been mentioned somewhere, I'm certain, but it cannot be > stressed enough, IMHO. Somewhere is here: https://pythoncapi.readthedocs.io/performance.html > I'm wondering, how you suggest to measure "major". I believe, every C > extension, which is public and running in production somewhere, is major > enough. My plan is to select something like the top five most popular C extensions based on PyPI download statistics. I cannot test everything, I have to put practical limits. Victor From njs at pobox.com Tue Nov 13 21:24:46 2018 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 13 Nov 2018 18:24:46 -0800 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: On Mon, Nov 12, 2018 at 10:46 PM, Gregory P. Smith wrote: > > On Fri, Nov 9, 2018 at 5:50 PM Nathaniel Smith wrote: >> >> On Fri, Nov 9, 2018 at 4:30 PM, Victor Stinner >> wrote: >> > Ah, important points. I don't want to touch the current C API nor make >> > it less efficient. And compatibility in both directions (current C API >> > <=> new C API) is very important for me. There is no such plan as >> > "Python 4" which would break the world and *force* everybody to >> > upgrade to the new C API, or stay to Python 3 forever. No. The new C >> > API must be an opt-in option, and current C API remains the default >> > and not be changed. >> >> Doesn't this mean that you're just making the C API larger and more >> complicated, rather than simplifying it? You cite some benefits >> (tagged pointers, changing the layout of PyObject, making PyPy's life >> easier), but I don't see how you can do any of those things so long as >> the current C API remains supported. [...] > I'd love to get to a situation where the only valid ABI we support knows nothing about internal structs at all. Today, PyObject memory layout is exposed to the world and unchangable. :( > This is a long process release wise (assume multiple stable releases go by before we could declare that). It seems like the discussion so far is: Victor: "I know people when people hear 'new API' they get scared and think we're going to do a Python-3-like breaking transition, but don't worry, we're never going to do that." Nathaniel: "But then what does the new API add?" Greg: "It lets us do a Python-3-like breaking transition!" To make a new API work we need to *either* have some plan for how it will produce benefits without a big breaking transition, *or* some plan for how to make this kind of transition viable. These are both super super hard questions -- that's why this discussion has been dragging on for a decade now! But you do have to pick one or the other :-). > Experimentation with new internal implementations can begin once we have a new C API by explicitly breaking the old C API with-in such experiments (as is required for most anything interesting). All code that is written to the new C API still works during this process, thus making the job of practical testing of such new VM internals easier. So I think what you're saying is that your goal is to get a new/better/shinier VM, and the plan to accomplish that is: 1. Define a new C API. 2. Migrate projects to the new C API. 3. Build a new VM that gets benefits from only supporting the new API. This sounds exactly backwards to me? If you define the new API before you build the VM, then no-one is going to migrate, because why should they bother? You'd be asking overworked third-party maintainers to do a bunch of work with no benefit, except that maybe someday later something good might happen. And if you define the new API first, then when you start building the VM you're 100% guaranteed to discover that the new API isn't *quite* right for the optimizations you want to do, and have to change it again to make a new-new API. And then go back to the maintainers who you did convince to put their neck out and do work on spec, and explain that haha whoops actually they need to update their code *again*. There have been lots of Python VM projects at this point. They've faced many challenges, but I don't think any have failed because there just wasn't enough pure-Python code around to test the VM internals. If I were trying to build a new Python VM, that's not even in the top 10 of issues I'd be worried about... -n -- Nathaniel J. Smith -- https://vorpus.org From raymond.hettinger at gmail.com Tue Nov 13 22:06:35 2018 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 13 Nov 2018 19:06:35 -0800 Subject: [Python-Dev] General concerns about C API changes Message-ID: <62A87B15-0F0E-4242-BF44-9C58FE890D8B@gmail.com> Overall, I support the efforts to improve the C API, but over the last few weeks have become worried. I don't want to hold up progress with fear, uncertainty, and doubt. Yet, I would like to be more comfortable that we're all aware of what is occurring and what are the potential benefits and risks. * Inline functions are great. They provide true local variables, better separation of concerns, are far less kludgy than text based macro substitution, and will typically generate the same code as the equivalent macro. This is good tech when used with in a single source file where it has predictable results. However, I'm not at all confident about moving these into header files which are included in multiple target .c files which need be compiled into separate .o files and linked to other existing libraries. With a macro, I know for sure that the substitution is taking place. This happens at all levels of optimization and in a debug mode. The effects are 100% predictable and have a well-established track record in our mature battle-tested code base. With cross module function calls, I'm less confident about what is happening, partly because compilers are free to ignore inline directives and partly because the semantics of inlining are less clear when the crossing module boundaries. * Other categories of changes that we make tend to have only a shallow reach. However, these C API changes will likely touch every C extension that has ever been written, some of which is highly tuned but not actively re-examined. If any mistakes are make, they will likely be pervasive. Accordingly, caution is warranted. My expectation was that the changes would be conducted in experimental branches. But extensive changes are already being made (or about to be made) on the 3.8 master. If a year from now, we decide that the changes were destabilizing or that the promised benefits didn't materialize, they will be difficult to undo because there are so many of them and because they will be interleaved with other changes. The original motivation was to achieve a 2x speedup in return for significantly churning the C API. However, the current rearranging of the include files and macro-to-inline-function changes only give us churn. At the very best, they will be performance neutral. At worst, formerly cheap macro calls will become expensive in places that we haven't thought to run timings on. Given that compilers don't have to honor an inline directive, we can't really know for sure -- perhaps today it works out fine, and perhaps tomorrow the compilers opt for a different behavior. Maybe everything that is going on is fine. Maybe it's not. I am not expert enough to know for sure, but we should be careful before green-lighting such an extensive series of changes directly to master. Reasonable questions to ask are: 1) What are the risks to third party modules, 2) Do we really know that the macro-to-inline-function transformations are semantically neutral. 3) If there is no performance benefit (none has been seen so far, nor is any promised in the pending PRs), is it worth it? We do know that PyPy folks have had their share of issues with the C API, but I'm not sure that we can make any of this go away without changing the foundations of the whole ecosystem. It is inconvenient for a full GC environment to interact with the API for a reference counted environment -- I don't think we can make this challenge go away without giving up reference counting. It is inconvenient for a system that manifests objects on demand to interact with an API that assumes that objects have identity and never more once they are created -- I don't think we can make this go away either. It is inconvenient to a system that uses unboxed data to interact with our API where everything is an object that includes a type pointer and reference count -- We have provided an API for boxing and boxing, but the trip back-and-forth is inconveniently expensive -- I don't think we can make that go away either because too much of the ecosystem depends on that API. There are some things that can be mitigated such as challenges with borrowed references but that doesn't seem to have been the focus on any of the PRs. In short, I'm somewhat concerned about the extensive changes that are occurring. I do know they will touch substantially every C module in the entire ecosystem. I don't know whether they are safe or whether they will give any real benefit. FWIW, none of this is a criticism of the work being done. Someone needs to think deeply about the C API or else progress will never be made. That said, it is a high risk project with many PRs going directly into master, so it does warrant having buy in that the churn isn't destabilizing and will actually produce a benefit that is worth it. Raymond From njs at pobox.com Tue Nov 13 22:17:31 2018 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 13 Nov 2018 19:17:31 -0800 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: On Sun, Nov 11, 2018 at 3:19 PM, Victor Stinner wrote: > I'm not sure yet how far we should go towards a perfect API which > doesn't leak everything. We have to move slowly, and make sure that we > don't break major C extensions. We need to write tools to fully > automate the conversion. If it's not possible, maybe the whole project > will fail. This is why I'm nervous about adding this directly to CPython. If we're just talking about adding a few new API calls to replace old ones that are awkward to use, then that's fine, that's not very risky. But if you're talking about a large project that makes fundamental changes in the C API (e.g., disallowing pointer dereferences, like tagged pointers do), then yeah, there's a very large risk that that might fail. >> If so, then would it make more sense to develop this as an actual>> separate abstraction layer? That would have the huge advantage that it >> could be distributed and versioned separately from CPython, different >> packages could use different versions of the abstraction layer, PyPy >> isn't forced to immediately add a bunch of new APIs... > > I didn't investigate this option. But I expect that you will have to > write a full new API using a different prefix than "Py_". Otherwise, > I'm not sure how you want to handle PyTuple_GET_ITEM() as a macro on > one side (Include/tupleobject.h) and PyTuple_GET_ITEM() on the other > side (hypotetical_new_api.h). > > Would it mean to duplicate all functions to get a different prefix? > > If you keep the "Py_" prefix, what I would like to ensure is that some > functions are no longer accessible. How you remove > PySequence_Fast_GET_ITEM() for example? > > For me, it seems simpler to modify CPython headers than starting on > something new. It seems simpler to choose the proper level of > compatibility. I start from an API 100% compatible (the current C > API), and decide what is changed and how. It may be simpler, but it's hugely more risky. Once you add something to CPython, you can't take it back again without a huge amount of work. You said above that the whole project might fail. But if it's in CPython, failure is not acceptable! The whole problem you're trying to solve is that the C API is too big, but your proposed solution starts by making it bigger, so if your project fails then it makes the problem even bigger... I don't know if making it a separate project is the best approach or not, it was just an idea :-). But it would have the huge benefit that you can actually experiment and try things out without committing to supporting them forever. And I don't know the best answer to all your questions above, that's what experimenting is for :-). But it certainly is technically possible to make a new API that shares a common subset with the old API, e.g.: /* NewPython.h */ #include #define PyTuple_GET_ITEM PyTuple_Get_Item #undef PySequence_Fast_GET_ITEM -n -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Tue Nov 13 23:40:48 2018 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 13 Nov 2018 20:40:48 -0800 Subject: [Python-Dev] General concerns about C API changes In-Reply-To: <62A87B15-0F0E-4242-BF44-9C58FE890D8B@gmail.com> References: <62A87B15-0F0E-4242-BF44-9C58FE890D8B@gmail.com> Message-ID: To me, the "new C API" discussion and the "converting macros into inline functions" discussions are very different, almost unrelated. There are always lots of small C API changes happening, and AFAIK the macros->inline changes fall into that category. It sounds like you want to discuss whether inline functions are a good idea? Or are there other changes happening that you're worried about? Or is there some connection between inline functions and API breakage that I'm not aware of? Your email touches on a lot of different topics and I'm having trouble understanding how they fit together. (And I guess like most people here, I'm not watching every commit to the master branch so I may not even know what changes you're referring to.) On Tue, Nov 13, 2018 at 7:06 PM, Raymond Hettinger wrote: > Overall, I support the efforts to improve the C API, but over the last few weeks have become worried. I don't want to hold up progress with fear, uncertainty, and doubt. Yet, I would like to be more comfortable that we're all aware of what is occurring and what are the potential benefits and risks. > > * Inline functions are great. They provide true local variables, better separation of concerns, are far less kludgy than text based macro substitution, and will typically generate the same code as the equivalent macro. This is good tech when used with in a single source file where it has predictable results. > > However, I'm not at all confident about moving these into header files which are included in multiple target .c files which need be compiled into separate .o files and linked to other existing libraries. > > With a macro, I know for sure that the substitution is taking place. This happens at all levels of optimization and in a debug mode. The effects are 100% predictable and have a well-established track record in our mature battle-tested code base. With cross module function calls, I'm less confident about what is happening, partly because compilers are free to ignore inline directives and partly because the semantics of inlining are less clear when the crossing module boundaries. > > * Other categories of changes that we make tend to have only a shallow reach. However, these C API changes will likely touch every C extension that has ever been written, some of which is highly tuned but not actively re-examined. If any mistakes are make, they will likely be pervasive. Accordingly, caution is warranted. > > My expectation was that the changes would be conducted in experimental branches. But extensive changes are already being made (or about to be made) on the 3.8 master. If a year from now, we decide that the changes were destabilizing or that the promised benefits didn't materialize, they will be difficult to undo because there are so many of them and because they will be interleaved with other changes. > > The original motivation was to achieve a 2x speedup in return for significantly churning the C API. However, the current rearranging of the include files and macro-to-inline-function changes only give us churn. At the very best, they will be performance neutral. At worst, formerly cheap macro calls will become expensive in places that we haven't thought to run timings on. Given that compilers don't have to honor an inline directive, we can't really know for sure -- perhaps today it works out fine, and perhaps tomorrow the compilers opt for a different behavior. > > Maybe everything that is going on is fine. Maybe it's not. I am not expert enough to know for sure, but we should be careful before green-lighting such an extensive series of changes directly to master. Reasonable questions to ask are: 1) What are the risks to third party modules, 2) Do we really know that the macro-to-inline-function transformations are semantically neutral. 3) If there is no performance benefit (none has been seen so far, nor is any promised in the pending PRs), is it worth it? > > We do know that PyPy folks have had their share of issues with the C API, but I'm not sure that we can make any of this go away without changing the foundations of the whole ecosystem. It is inconvenient for a full GC environment to interact with the API for a reference counted environment -- I don't think we can make this challenge go away without giving up reference counting. It is inconvenient for a system that manifests objects on demand to interact with an API that assumes that objects have identity and never more once they are created -- I don't think we can make this go away either. It is inconvenient to a system that uses unboxed data to interact with our API where everything is an object that includes a type pointer and reference count -- We have provided an API for boxing and boxing, but the trip back-and-forth is inconveniently expensive -- I don't think we can make that go away either because too much of the ecosystem depends on that API. There are some things that ca > n be mitigated such as challenges with borrowed references but that doesn't seem to have been the focus on any of the PRs. > > In short, I'm somewhat concerned about the extensive changes that are occurring. I do know they will touch substantially every C module in the entire ecosystem. I don't know whether they are safe or whether they will give any real benefit. > > FWIW, none of this is a criticism of the work being done. Someone needs to think deeply about the C API or else progress will never be made. That said, it is a high risk project with many PRs going directly into master, so it does warrant having buy in that the churn isn't destabilizing and will actually produce a benefit that is worth it. > > > Raymond > > > > > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/njs%40pobox.com -- Nathaniel J. Smith -- https://vorpus.org From vstinner at redhat.com Wed Nov 14 05:03:49 2018 From: vstinner at redhat.com (Victor Stinner) Date: Wed, 14 Nov 2018 11:03:49 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: Le mer. 14 nov. 2018 ? 03:24, Nathaniel Smith a ?crit : > So I think what you're saying is that your goal is to get a > new/better/shinier VM, and the plan to accomplish that is: > > 1. Define a new C API. > 2. Migrate projects to the new C API. > 3. Build a new VM that gets benefits from only supporting the new API. > > This sounds exactly backwards to me? > > If you define the new API before you build the VM, then no-one is > going to migrate, because why should they bother? You'd be asking > overworked third-party maintainers to do a bunch of work with no > benefit, except that maybe someday later something good might happen. Oh, I should stop to promote my "CPython fork" idea. There is already an existing VM which is way faster than CPython but its performances are limited by the current C API. The VM is called... PyPy! The bet is that migrating to a new C API would make your C extension faster. Victor From solipsis at pitrou.net Wed Nov 14 05:20:55 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 14 Nov 2018 11:20:55 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) References: Message-ID: <20181114112055.2131d35b@fsol> On Wed, 14 Nov 2018 11:03:49 +0100 Victor Stinner wrote: > > Oh, I should stop to promote my "CPython fork" idea. > > There is already an existing VM which is way faster than CPython but > its performances are limited by the current C API. The VM is called... > PyPy! > > The bet is that migrating to a new C API would make your C extension faster. Faster on PyPy... but potentially slower on CPython. That's what we (you :-)) need to investigate and solve. Those macros and inline functions are actually important for many use cases. For example in PyArrow we use PySequence_Fast_GET_ITEM() (*) and even PyType_HasFeature() (**) (to quickly check for multiple base types with a single fetch and comparison). (*) https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/iterators.h#L39-L86 (**) https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/helpers.cc#L266-L299 Regards Antoine. From vstinner at redhat.com Wed Nov 14 05:48:15 2018 From: vstinner at redhat.com (Victor Stinner) Date: Wed, 14 Nov 2018 11:48:15 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: <20181114112055.2131d35b@fsol> References: <20181114112055.2131d35b@fsol> Message-ID: Le mer. 14 nov. 2018 ? 11:24, Antoine Pitrou a ?crit : > For example in PyArrow we use PySequence_Fast_GET_ITEM() (*) Maybe PyArrow is a kind of C extension which should have one implementation for the new C API (PyPy) and one implementation for the current C API (CPython)? Cython can be used to generate two different C code from the same source code using a different compilation mode. > and even > PyType_HasFeature() (**) (to quickly check for multiple base types with > a single fetch and comparison). I'm not sure that PyType_HasFeature() is an issue? Victor From vstinner at redhat.com Wed Nov 14 06:23:03 2018 From: vstinner at redhat.com (Victor Stinner) Date: Wed, 14 Nov 2018 12:23:03 +0100 Subject: [Python-Dev] General concerns about C API changes In-Reply-To: References: <62A87B15-0F0E-4242-BF44-9C58FE890D8B@gmail.com> Message-ID: Hi, I made many different kinds of changes to the C API last weeks. None of them should impact the backward compatibility. If it's the case, the changes causing the backward compatibility should be reverted. The most controversial changes are the conversion of macros to static inline functions, but these changes have been discussed in length and approved by multiple core developers. (*) Move private internal API outside Include/: continue the work started in Python 3.7 to move it to Include/internal/ This work is almost complete. The remaining issue is the _PyObject_GC_TRACK() function. The main change is that an explicit #include "pycore_
.h" is now required in C code. https://bugs.python.org/issue35081 https://mail.python.org/pipermail/python-dev/2018-October/155587.html https://mail.python.org/pipermail/python-dev/2018-November/155688.html (*) Move "unstable" API outside Include/: move "#ifndef Py_LIMITED_API" code to a new subdirectory This work didn't start. It's still being discussed to see how we should do it. I chose to try to finish Include/internal/ first. https://bugs.python.org/issue35134 (*) Add a *new* C API which doesn't leak implementation details This work didn't start. It's still under discussion, I'm not sure that it's going to happen in the master branch in the short term. https://pythoncapi.readthedocs.io/ https://bugs.python.org/issue35206 https://mail.python.org/pipermail/python-dev/2018-November/155702.html (*) Convert macros to static inline functions I guess that Raymond is worried by the impact on performance of these changes. So far, the following macros have been converted to static inline functions: * PyObject_INIT(), PyObject_INIT_VAR() * _Py_NewReference(), _Py_ForgetReference() * Py_INCREF(), Py_DECREF() * Py_XINCREF(), Py_XDECREF() * _Py_Dealloc() First, I attempted for "force inlining" (ex: __attribute__((always_inline)) for GCC/Clang), but it has been decided to not do that. Please read the discussion on the issue for the rationale. https://bugs.python.org/issue35059 I modified the Visual Studio project to ask the compiler to respect "inline" *hint* when CPython is compiled in debug mode: "Set InlineFunctionExpansion to OnlyExplicitInline ("/Ob1" option) on all projects (in pyproject.props) in Debug mode on Win32 and x64 platforms to expand functions marked as inline." This change spotted a bug in _decimal ("Add missing EXTINLINE in mpdecimal.h"). Macros have many pitfalls, and the intent here is to prevent these pitfalls.: https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html _Py_Dealloc() example: #ifdef COUNT_ALLOCS #define _Py_INC_TPFREES(OP) dec_count(Py_TYPE(OP)) #define _Py_COUNT_ALLOCS_COMMA , #else #define _Py_INC_TPFREES(OP) #define _Py_COUNT_ALLOCS_COMMA #endif /* COUNT_ALLOCS */ #define _Py_Dealloc(op) ( \ _Py_INC_TPFREES(op) _Py_COUNT_ALLOCS_COMMA \ (*Py_TYPE(op)->tp_dealloc)((PyObject *)(op))) _Py_Dealloc() produced something like "dealloc()" or "dec_count(), dealloc()" depending on compiler options on Python 3.7. On Python 3.8, it's now a well defined function call which returns "void" (no result): "[static inline] void _Py_Dealloc(PyObject *);" Another example: #define Py_DECREF(op) \ do { \ PyObject *_py_decref_tmp = (PyObject *)(op); \ if (_Py_DEC_REFTOTAL _Py_REF_DEBUG_COMMA \ --(_py_decref_tmp)->ob_refcnt != 0) \ _Py_CHECK_REFCNT(_py_decref_tmp) \ else \ _Py_Dealloc(_py_decref_tmp); \ } while (0) This macro required a temporary "_py_decref_tmp" variable in Python 3.7, to cast the argument to PyObject*. It's gone in Python 3.8: static inline void _Py_DECREF(const char *filename, int lineno, PyObject *op) { _Py_DEC_REFTOTAL; if (--op->ob_refcnt != 0) { #ifdef Py_REF_DEBUG if (op->ob_refcnt < 0) { _Py_NegativeRefcount(filename, lineno, op); } #endif } else { _Py_Dealloc(op); } } #define Py_DECREF(op) _Py_DECREF(__FILE__, __LINE__, (PyObject *)(op)) The cast is now done in Py_DECREF() macro. The Py_DECREF() contract is that it accepts basically any pointer. I chose to not change that (always require the exact PyObject* type and nothing else), since it would create a compiler warning on basically every usage of Py_DECREF() for no real benefit. Victor From vstinner at redhat.com Wed Nov 14 06:47:33 2018 From: vstinner at redhat.com (Victor Stinner) Date: Wed, 14 Nov 2018 12:47:33 +0100 Subject: [Python-Dev] General concerns about C API changes In-Reply-To: References: <62A87B15-0F0E-4242-BF44-9C58FE890D8B@gmail.com> Message-ID: > First, I attempted for "force inlining" (ex: > __attribute__((always_inline)) for GCC/Clang), but it has been decided > to not do that. Please read the discussion on the issue for the > rationale. > > https://bugs.python.org/issue35059 It has been decided to use "static inline" syntax since it's the one required by the PEP 7, but I'm open to switch to "inline" (without static) or something else. Honestly, I don't understand exactly all consequences of the exact syntax, so I relied on other core devs who understand C99 that better than me :-) Join the discussion on the issue ;-) Victor From solipsis at pitrou.net Wed Nov 14 07:10:08 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 14 Nov 2018 13:10:08 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <20181114112055.2131d35b@fsol> Message-ID: <20181114131008.6ec6c7e0@fsol> On Wed, 14 Nov 2018 11:48:15 +0100 Victor Stinner wrote: > Le mer. 14 nov. 2018 ? 11:24, Antoine Pitrou a ?crit : > > For example in PyArrow we use PySequence_Fast_GET_ITEM() (*) > > Maybe PyArrow is a kind of C extension which should have one > implementation for the new C API (PyPy) and one implementation for the > current C API (CPython)? Yes, maybe. I'm just pointing out that we're using those macros and removing them from the C API (or replacing them with non-inline functions) would hurt us. > > and even > > PyType_HasFeature() (**) (to quickly check for multiple base types with > > a single fetch and comparison). > > I'm not sure that PyType_HasFeature() is an issue? I don't know. You're the one who decides :-) cheers Antoine. From nd at perlig.de Wed Nov 14 08:11:00 2018 From: nd at perlig.de (=?ISO-8859-1?Q?Andr=E9?= Malo) Date: Wed, 14 Nov 2018 14:11:00 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <1614616.psXdYk7axO@chryseis> Message-ID: <220319772.FgY98erabj@finnegan> On Dienstag, 13. November 2018 21:59:14 CET Victor Stinner wrote: > Le mar. 13 nov. 2018 ? 20:32, Andr? Malo a ?crit : > > As long as they are recompiled. However, they will lose a lot of > > performance. Both these points have been mentioned somewhere, I'm > > certain, but it cannot be stressed enough, IMHO. > > Somewhere is here: > https://pythoncapi.readthedocs.io/performance.html > > I'm wondering, how you suggest to measure "major". I believe, every C > > extension, which is public and running in production somewhere, is major > > enough. > > My plan is to select something like the top five most popular C > extensions based on PyPI download statistics. I cannot test > everything, I have to put practical limits. You shouldn't. Chances are, that you don't even know them enough to do that. A scalable approach would be to talk to the projects and let them do it instead. No? Cheers, -- package Hacker::Perl::Another::Just;print qq~@{[reverse split/::/ =>__PACKAGE__]}~; # Andr? Malo # http://www.perlig.de # From p.f.moore at gmail.com Wed Nov 14 08:36:08 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 14 Nov 2018 13:36:08 +0000 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <1614616.psXdYk7axO@chryseis> Message-ID: On Tue, 13 Nov 2018 at 21:02, Victor Stinner wrote: > My plan is to select something like the top five most popular C > extensions based on PyPI download statistics. I cannot test > everything, I have to put practical limits. You should probably also consider embedding applications - these have the potential to be adversely affected too. One example would be vim, which embeds Python, and makes fairly heavy use of the API (in some relatively nonstandard ways, for better or worse). Paul PS What percentage does "top 5" translate to? In terms of both downloads and actual numbers of extensions? With only 5, it would be very easy (I suspect) to get only scientific packages, and (for example) miss out totally on database APIs, or web helpers. You'll likely get a broader sense of where issues lie if you cover a wide range of application domains. PPS I'd like to see a summary of your backward compatibility plan. I've not been following this thread so maybe I missed it (if so, a pointer would be appreciated), but I'd expect as a user that extensions and embedding applications would *not* need a major rewrite to work with Python 3.8 - that being the implication of "opt in". I'd also expect that to remain true for any future version of Python - assuming the experiment is successful, forced (as opposed to opt-in) migration to the new API would be handled in a gradual, backward compatibility respecting manner, exactly as any other changes to the C API are. A hard break like Python 3, even if limited to the C API, would be bad for users (for example, preventing adoption of Python 3.X until the scientific stack migrates to the new API and works out how to handle supporting old-API versions of Python...) From vstinner at redhat.com Wed Nov 14 09:28:19 2018 From: vstinner at redhat.com (Victor Stinner) Date: Wed, 14 Nov 2018 15:28:19 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <1614616.psXdYk7axO@chryseis> Message-ID: Le mer. 14 nov. 2018 ? 14:36, Paul Moore a ?crit : > PS What percentage does "top 5" translate to? In terms of both > downloads and actual numbers of extensions? With only 5, it would be > very easy (I suspect) to get only scientific packages, and (for > example) miss out totally on database APIs, or web helpers. You'll > likely get a broader sense of where issues lie if you cover a wide > range of application domains. I don't want to force anyone to move to a new experimental API. I don't want to propose patches to third party modules for example. I would like to ensure that I don't break too many C extensions, or that tools to convert C extensions to the new API work as expected :-) Everything is experimental. > PPS I'd like to see a summary of your backward compatibility plan. https://pythoncapi.readthedocs.io/backward_compatibility.html > assuming the experiment is successful, forced (as opposed to opt-in) > migration to the new API would be handled in a gradual, No, the current C API will remain available. No one is forced to do anything. That's not part of my plan. Victor From p.f.moore at gmail.com Wed Nov 14 09:39:11 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 14 Nov 2018 14:39:11 +0000 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <1614616.psXdYk7axO@chryseis> Message-ID: On Wed, 14 Nov 2018 at 14:28, Victor Stinner wrote: > > assuming the experiment is successful, forced (as opposed to opt-in) > > migration to the new API would be handled in a gradual, > > No, the current C API will remain available. No one is forced to do > anything. That's not part of my plan. Oh, cool. So current code will continue working indefinitely? What's the incentive for projects to switch to the new API in that case? Won't we just end up having to carry two APIs indefinitely? Sorry if this is all obvious, or was explained previously - as I said I've not been following precisely because I assumed it was all being handled on an "if you don't care you can ignore it and nothing will change" basis, but Raymond's comments plus your suggestion that you needed to test existing C extensions, made me wonder. If it is the case that there's no need for any 3rd party code to change in order to continue working with 3.8+, then I apologise for the interruption. Paul From p.f.moore at gmail.com Wed Nov 14 09:53:25 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 14 Nov 2018 14:53:25 +0000 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <1614616.psXdYk7axO@chryseis> Message-ID: On Wed, 14 Nov 2018 at 14:39, Paul Moore wrote: > If it is the case that there's no need for any 3rd party code to > change in order to continue working with 3.8+, then I apologise for > the interruption. This is where being able to edit posts, a la Discourse would be useful :-) It occurs to me that we may be talking at cross purposes. I noticed https://pythoncapi.readthedocs.io/backward_compatibility.html#forward-compatibility-with-python-3-8-and-newer which seems to be saying that 3rd party code *will* need to change for 3.8. You mention removed functions there, so I guess "stop using the removed functions and you'll work with 3.8+ and <=3.7" is the compatible approach - but it doesn't offer a way for projects that *need* the functionality that's been removed to move forward. That's the type of hard break that I was trying to ask about, and which I thought you said would not happen when you stated "I don't want to force anyone to move to a new experimental API", and "No, the current C API will remain available. No one is forced to do anything. That's not part of my plan". So to try to be clear, your proposal is that in 3.8: 1. The existing C API will remain 2. A new C API will be *added* that 3rd party projects can use should they wish to. And in 3.9 onwards, both C APIs will remain, maybe with gradual and incremental changes that move users of the existing C API closer and closer to the new one (via deprecations, replacement APIs etc as per our normal compatibility rules). Or is the intention that at *some* point there will be a compatibility break and the existing API will simply be removed in favour of the "new" API? Fundamentally, that's what I'm trying to get a clear picture of. The above is clear, but I don't see what incentive there is in that scenario for anyone to actually migrate to the new API... Paul From J.Demeyer at UGent.be Wed Nov 14 09:48:43 2018 From: J.Demeyer at UGent.be (Jeroen Demeyer) Date: Wed, 14 Nov 2018 15:48:43 +0100 Subject: [Python-Dev] General concerns about C API changes In-Reply-To: <60176b8d36074e1381b301a515467327@xmail203.UGent.be> References: <60176b8d36074e1381b301a515467327@xmail203.UGent.be> Message-ID: <5BEC35CB.9010101@UGent.be> On 2018-11-14 04:06, Raymond Hettinger wrote: > With cross module function calls, I'm less confident about what is happening If the functions are "static inline" (as opposed to plain "inline"), those aren't really cross-module function calls. Because the functions are "static" and defined in a header file, every module has its own copy of the function. If the function is not inlined in the end, this would inflate the compiled size because you end up with multiple compilations of the same code in the CPython library. It would not affect correct functioning in any way though. If the function *is* inlined, then the result should be no different from using a macro. Jeroen. From vstinner at redhat.com Wed Nov 14 11:00:35 2018 From: vstinner at redhat.com (Victor Stinner) Date: Wed, 14 Nov 2018 17:00:35 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <1614616.psXdYk7axO@chryseis> Message-ID: In short, you don't have to modify your C extensions and they will continue to work as before on Python 3.8. I only propose to add a new C API, don't touch the existing one in any way. Introducing backward incompatible changes to the existing C API is out of my plan. /usr/bin/python3.8 will support C extensions compiled with the old C API and C extensions compiled with the new C API. My plan also includes to be able to write C extensions compatible with the old and new C API in a single code base. As we have Python code working nicely on Python 2 and Python 3 (thanks to six, mostly). In my experience, having two branches or two repositories for two flavors of the same code is a nice recipe towards inconsistent code and painful workflow. Le mer. 14 nov. 2018 ? 15:53, Paul Moore a ?crit : > It occurs to me that we may be talking at cross purposes. I noticed > https://pythoncapi.readthedocs.io/backward_compatibility.html#forward-compatibility-with-python-3-8-and-newer > which seems to be saying that 3rd party code *will* need to change for > 3.8. Oh. It's badly explained in that case. This section is only about C extensions which really want to become compatible with the new C API. > You mention removed functions there, so I guess "stop using the > removed functions and you'll work with 3.8+ and <=3.7" is the > compatible approach - but it doesn't offer a way for projects that > *need* the functionality that's been removed to move forward. If you need a removed functions, don't use the new C API. > So to try to be clear, your proposal is that in 3.8: > > 1. The existing C API will remain > 2. A new C API will be *added* that 3rd party projects can use should > they wish to. Yes, that's it. Add a new API, don't touch existing API. > And in 3.9 onwards, both C APIs will remain, maybe with gradual and > incremental changes that move users of the existing C API closer and > closer to the new one (via deprecations, replacement APIs etc as per > our normal compatibility rules). Honestly, it's too early to say if we should modify the current C API in any way. I only plan to put advices in the *documentation*. Something like "this function is really broken, don't use it" :-) Or "you can use xxx instead which makes your code compatible with the new C API". But I don't plan it to modify the doc soon. It's too early at this point. > Or is the intention that at *some* > point there will be a compatibility break and the existing API will > simply be removed in favour of the "new" API? That's out of the scope of *my* plan. Maybe someone else will show up in 10 years and say "ok, let's deprecate the old C API". But in my experience, legacy stuff never goes away :-) (Python 2, anyone?) > The above is clear, but I don't see what incentive there is in that > scenario for anyone to actually migrate to the new API... https://pythoncapi.readthedocs.io/ tries to explain why you should want to be compatible with the new C API. The main advantage of the new C API is to compile your C extension once and use it on multiple runtimes: * use PyPy for better performances (better than with the old C API) * use a Python Debug Runtime which contains additional runtime checks to detect various kinds of bugs in your C extension * distribute a single binary working on multiple Python versions (compile on 3.8, use it on 3.9): "stable ABI" -- we are no there yet, I didn't check what should be done in practice for that I hope that "later" we will get a faster CPython using , only compatible with C extensions compiled with the new C API. My secret hope is that it should ease the experimentation of a (yet another) JIT compiler for CPython :-) Victor From p.f.moore at gmail.com Wed Nov 14 11:28:02 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 14 Nov 2018 16:28:02 +0000 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <1614616.psXdYk7axO@chryseis> Message-ID: On Wed, 14 Nov 2018 at 16:00, Victor Stinner wrote: > > In short, you don't have to modify your C extensions and they will > continue to work as before on Python 3.8. [...] > I hope that "later" we will get a faster CPython using new optimizations there>, only compatible with C extensions compiled > with the new C API. My secret hope is that it should ease the > experimentation of a (yet another) JIT compiler for CPython :-) OK, got it. Thanks for taking the time to clarify and respond to my concerns. Much appreciated. Paul From vstinner at redhat.com Wed Nov 14 12:25:09 2018 From: vstinner at redhat.com (Victor Stinner) Date: Wed, 14 Nov 2018 18:25:09 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <1614616.psXdYk7axO@chryseis> Message-ID: Le mer. 14 nov. 2018 ? 17:28, Paul Moore a ?crit : > OK, got it. Thanks for taking the time to clarify and respond to my > concerns. Much appreciated. I'm my fault. I am failing to explain my plan proplerly. It seems like I had to update my website to better explain :-) Victor From maik.riechert at arcor.de Wed Nov 14 12:44:18 2018 From: maik.riechert at arcor.de (Maik Riechert) Date: Wed, 14 Nov 2018 17:44:18 +0000 Subject: [Python-Dev] Clarify required Visual C++ compiler for binary extensions in 3.7 Message-ID: <35aa558e-9595-c0f4-24bd-9e8cfcc79d4d@arcor.de> Hi, I just wanted to point out that the page https://wiki.python.org/moin/WindowsCompilers doesn't mention Python 3.7 in the table. Is the compiler still Visual C++ 14.0? Thanks Maik From vstinner at redhat.com Wed Nov 14 18:20:56 2018 From: vstinner at redhat.com (Victor Stinner) Date: Thu, 15 Nov 2018 00:20:56 +0100 Subject: [Python-Dev] Clarify required Visual C++ compiler for binary extensions in 3.7 In-Reply-To: <35aa558e-9595-c0f4-24bd-9e8cfcc79d4d@arcor.de> References: <35aa558e-9595-c0f4-24bd-9e8cfcc79d4d@arcor.de> Message-ID: I maintain my own compatibility matrix: https://pythondev.readthedocs.io/windows.html#python-and-visual-studio-version-matrix Ned Deily asked to update the devguide from my table :-) https://github.com/python/devguide/issues/433 Victor Le mer. 14 nov. 2018 ? 18:56, Maik Riechert a ?crit : > > Hi, > > I just wanted to point out that the page > https://wiki.python.org/moin/WindowsCompilers doesn't mention Python 3.7 > in the table. Is the compiler still Visual C++ 14.0? > > Thanks > Maik > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/vstinner%40redhat.com From greg at krypto.org Wed Nov 14 19:03:09 2018 From: greg at krypto.org (Gregory P. Smith) Date: Wed, 14 Nov 2018 16:03:09 -0800 Subject: [Python-Dev] General concerns about C API changes In-Reply-To: <62A87B15-0F0E-4242-BF44-9C58FE890D8B@gmail.com> References: <62A87B15-0F0E-4242-BF44-9C58FE890D8B@gmail.com> Message-ID: On Tue, Nov 13, 2018 at 7:06 PM Raymond Hettinger < raymond.hettinger at gmail.com> wrote: > Overall, I support the efforts to improve the C API, but over the last few > weeks have become worried. I don't want to hold up progress with fear, > uncertainty, and doubt. Yet, I would like to be more comfortable that > we're all aware of what is occurring and what are the potential benefits > and risks. > > * Inline functions are great. They provide true local variables, better > separation of concerns, are far less kludgy than text based macro > substitution, and will typically generate the same code as the equivalent > macro. This is good tech when used with in a single source file where it > has predictable results. > > However, I'm not at all confident about moving these into header files > which are included in multiple target .c files which need be compiled into > separate .o files and linked to other existing libraries. > > With a macro, I know for sure that the substitution is taking place. This > happens at all levels of optimization and in a debug mode. The effects are > 100% predictable and have a well-established track record in our mature > battle-tested code base. With cross module function calls, I'm less > confident about what is happening, partly because compilers are free to > ignore inline directives and partly because the semantics of inlining are > less clear when the crossing module boundaries. > > * Other categories of changes that we make tend to have only a shallow > reach. However, these C API changes will likely touch every C extension > that has ever been written, some of which is highly tuned but not actively > re-examined. If any mistakes are make, they will likely be pervasive. > Accordingly, caution is warranted. > > My expectation was that the changes would be conducted in experimental > branches. But extensive changes are already being made (or about to be > made) on the 3.8 master. If a year from now, we decide that the changes > were destabilizing or that the promised benefits didn't materialize, they > will be difficult to undo because there are so many of them and because > they will be interleaved with other changes. > > The original motivation was to achieve a 2x speedup in return for > significantly churning the C API. However, the current rearranging of the > include files and macro-to-inline-function changes only give us churn. At > the very best, they will be performance neutral. At worst, formerly cheap > macro calls will become expensive in places that we haven't thought to run > timings on. Given that compilers don't have to honor an inline directive, > we can't really know for sure -- perhaps today it works out fine, and > perhaps tomorrow the compilers opt for a different behavior. > > Maybe everything that is going on is fine. Maybe it's not. I am not > expert enough to know for sure, but we should be careful before > green-lighting such an extensive series of changes directly to master. > Reasonable questions to ask are: 1) What are the risks to third party > modules, 2) Do we really know that the macro-to-inline-function > transformations are semantically neutral. 3) If there is no performance > benefit (none has been seen so far, nor is any promised in the pending > PRs), is it worth it? > > We do know that PyPy folks have had their share of issues with the C API, > but I'm not sure that we can make any of this go away without changing the > foundations of the whole ecosystem. It is inconvenient for a full GC > environment to interact with the API for a reference counted environment -- > I don't think we can make this challenge go away without giving up > reference counting. It is inconvenient for a system that manifests objects > on demand to interact with an API that assumes that objects have identity > and never more once they are created -- I don't think we can make this go > away either. It is inconvenient to a system that uses unboxed data to > interact with our API where everything is an object that includes a type > pointer and reference count -- We have provided an API for boxing and > boxing, but the trip back-and-forth is inconveniently expensive -- I don't > think we can make that go away either because too much of the ecosystem > depends on that API. There are some things that ca > n be mitigated such as challenges with borrowed references but that > doesn't seem to have been the focus on any of the PRs. > > In short, I'm somewhat concerned about the extensive changes that are > occurring. I do know they will touch substantially every C module in the > entire ecosystem. I don't know whether they are safe or whether they will > give any real benefit. > > FWIW, none of this is a criticism of the work being done. Someone needs > to think deeply about the C API or else progress will never be made. That > said, it is a high risk project with many PRs going directly into master, > so it does warrant having buy in that the churn isn't destabilizing and > will actually produce a benefit that is worth it. > > > Raymond > > While I haven't looked at *all* of the existing changes, glancing at an example one, the XINCREF and XDECREF macro -> static inline .h function change from https://github.com/python/cpython/commit/541497e6197268517b0d492856027774c43e0949, I don't have any concerns. Because I am confident that gcc and clang will behave well with this type of change. >From my point of view: A static inline function is a much nicer modern code style than a C preprocessor macro. I expect the largest visible impact may be that a profiler may now attribute CPU cycles takes by these code snippets to the function from the .h file rather than directly to the functions the macro expanded in in the past due to additional debug symbol info attribution. Just more data. Consider that a win. I do not believe any compiler will generate poor code from this on any meaningful system and compiler configuration (-O0 or -Og code seems irrelevant, nothing ships that way). Regardless, pablogsal appears to be doing the homework to double check which seems like a good plan before we release 3.8. :) -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Wed Nov 14 19:08:41 2018 From: greg at krypto.org (Gregory P. Smith) Date: Wed, 14 Nov 2018 16:08:41 -0800 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: > > It seems like the discussion so far is: > > Victor: "I know people when people hear 'new API' they get scared and > think we're going to do a Python-3-like breaking transition, but don't > worry, we're never going to do that." > Nathaniel: "But then what does the new API add?" > Greg: "It lets us do a Python-3-like breaking transition!" > That is not what I am proposing but it seems too easy for people to misunderstand it as such. Sorry. Between everything discussed across this thread I believe we have enough information to suggest that we can avoid an "everyone's afraid of a new 3" mistake by instead making a shim available with a proposed new API that works on top of existing Python VM(s) so that if we decide to drop the old API being public in the future, we could do so *without a breaking transition*. Given that, I suggest not worrying about defining a new C API within the CPython project and release itself (yet). Without an available benefit, little will use it (and given the function call overhead we want to isolate some concepts, we know it will perform worse on today's VMs). That "top-5" module using it idea? Maintain forks (hooray for git) of whatever your definition of "top-5" projects is that use the new API instead of the CPython API. If you attempt this on things like NumPy, you may be shocked at the states (plural on purpose) of their extension module code. That holds true for a lot of popular modules. Part of the point of this work is to demonstrate that non-incremental order of magnitude performance change can be had on a Python VM that only supports such an API can be done in its own fork of CPython, PyPy, VictorBikeshedPy, FbIsAfraidToReleaseANewGcVmPy, etc. implementation to help argue for figuring out a viable not-breaking-the-world transition plan to do such a C API change thing in CPython itself. -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From vstinner at redhat.com Wed Nov 14 20:28:19 2018 From: vstinner at redhat.com (Victor Stinner) Date: Thu, 15 Nov 2018 02:28:19 +0100 Subject: [Python-Dev] General concerns about C API changes In-Reply-To: References: <62A87B15-0F0E-4242-BF44-9C58FE890D8B@gmail.com> Message-ID: Le jeu. 15 nov. 2018 ? 01:06, Gregory P. Smith a ?crit : > I expect the largest visible impact may be that a profiler may now attribute CPU cycles takes by these code snippets to the function from the .h file rather than directly to the functions the macro expanded in in the past due to additional debug symbol info attribution. Just more data. Consider that a win. Oh. That's very interesting. I just tried gdb and I confirm that gdb understands well inlined function. When I debug Python, gdb moves into Py_INCREF() or Py_DECREF() when I use "next". I also tried perf record/perf report: if I annotate a function (assembler code of the function), perf shows me the C code of inlined Py_INCREF and Py_DECREF! That's nice! Victor From maik.riechert at arcor.de Thu Nov 15 03:43:35 2018 From: maik.riechert at arcor.de (Maik Riechert) Date: Thu, 15 Nov 2018 08:43:35 +0000 Subject: [Python-Dev] Clarify required Visual C++ compiler for binary extensions in 3.7 In-Reply-To: References: <35aa558e-9595-c0f4-24bd-9e8cfcc79d4d@arcor.de> Message-ID: <85d6be18-d4b3-8b9f-ec63-502721629afe@arcor.de> Thanks Victor. I noticed you write "For Python 3.7, the minimum is VS 2017 Community, minimum installer options:" but then in the table the version listed is 2015 or 2017, so the minimum then probably is 2015? Maybe it should be mentioned on this page (or a separate one) that those versions have to be used as well when building binary extensions for Python. Maik On 14/11/2018 23:20, Victor Stinner wrote: > I maintain my own compatibility matrix: > https://pythondev.readthedocs.io/windows.html#python-and-visual-studio-version-matrix > > Ned Deily asked to update the devguide from my table :-) > https://github.com/python/devguide/issues/433 > > Victor > Le mer. 14 nov. 2018 ? 18:56, Maik Riechert a ?crit : >> Hi, >> >> I just wanted to point out that the page >> https://wiki.python.org/moin/WindowsCompilers doesn't mention Python 3.7 >> in the table. Is the compiler still Visual C++ 14.0? >> >> Thanks >> Maik >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: https://mail.python.org/mailman/options/python-dev/vstinner%40redhat.com From status at bugs.python.org Fri Nov 16 12:10:03 2018 From: status at bugs.python.org (Python tracker) Date: Fri, 16 Nov 2018 18:10:03 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20181116171003.B8BAB13390C@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2018-11-09 - 2018-11-16) Python tracker at https://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 6863 (+29) closed 40154 (+36) total 47017 (+65) Open issues with patches: 2741 Issues opened (46) ================== #31354: Fixing a bug related to LTO only build https://bugs.python.org/issue31354 reopened by ned.deily #33826: enable discovery of class source code in IPython interactively https://bugs.python.org/issue33826 reopened by xtreak #35177: Add missing dependencies between AST/parser header files https://bugs.python.org/issue35177 reopened by vstinner #35202: Remove unused imports in standard library https://bugs.python.org/issue35202 opened by thatiparthy #35203: Windows Installer Ignores Launcher Installer Options Where The https://bugs.python.org/issue35203 opened by gr-dexterl #35206: [WIP] Add a new experimental _Py_CAPI2 API https://bugs.python.org/issue35206 opened by vstinner #35208: IDLE: Squeezed lines count ignores window width https://bugs.python.org/issue35208 opened by taleinat #35210: Use bytes + memoryview + resize instead of bytesarray + array https://bugs.python.org/issue35210 opened by tzickel #35212: Expressions with format specifiers in f-strings give wrong cod https://bugs.python.org/issue35212 opened by arminius #35213: IDLE: use 'macOS' where appropriate. https://bugs.python.org/issue35213 opened by terry.reedy #35214: Get the test suite passing with clang Memory Sanitizer enabled https://bugs.python.org/issue35214 opened by gregory.p.smith #35216: misleading error message from shutil.copy() https://bugs.python.org/issue35216 opened by cedricvanrompay #35217: REPL history is broken when python is invoked with cmd /c https://bugs.python.org/issue35217 opened by ????????? #35218: decompressing and then re-compressing zipfiles with Python 3 z https://bugs.python.org/issue35218 opened by keeely #35220: delete "how do I emulate os.kill" section in Windows FAQ https://bugs.python.org/issue35220 opened by deronnax #35221: Enhance venv activate commands readability https://bugs.python.org/issue35221 opened by mdk #35224: PEP 572: Assignment Expressions https://bugs.python.org/issue35224 opened by emilyemorehouse #35225: test_faulthandler fails under ubsan https://bugs.python.org/issue35225 opened by benjamin.peterson #35226: mock.call equality surprisingly broken https://bugs.python.org/issue35226 opened by cjw296 #35227: [RFE] tarfile: support adding file objects without prior known https://bugs.python.org/issue35227 opened by mgorny #35228: Index search in CHM help crashes viewer https://bugs.python.org/issue35228 opened by chrullrich #35231: make install may not call ldconfig on GNU/Linux when using --e https://bugs.python.org/issue35231 opened by pmpp #35232: Add `module`/`qualname` arguments to make_dataclass for pickla https://bugs.python.org/issue35232 opened by Antony.Lee #35234: ssl module falls over with internationalized domain names https://bugs.python.org/issue35234 opened by mcasadevall #35235: Access violation on alloc in Windows x86-64 python, pymalloc_a https://bugs.python.org/issue35235 opened by markind #35236: urllib.request.urlopen throws on some valid FTP files https://bugs.python.org/issue35236 opened by Ian Liu Rodrigues #35238: Alleviate memory reservation of fork_exec in subprocess.Popen https://bugs.python.org/issue35238 opened by oesteban #35240: Travis CI: xvfb-run: error: Xvfb failed to start https://bugs.python.org/issue35240 opened by vstinner #35242: multiprocessing.Queue in an inconsistent state and a traceback https://bugs.python.org/issue35242 opened by szobov #35243: readline timeout too long for async gfx use https://bugs.python.org/issue35243 opened by pmpp #35244: Allow to setup Clang as default compiler for modules build https://bugs.python.org/issue35244 opened by Jaime Torres #35246: asyncio.create_subprocess_exec doesn't accept pathlib.Path lik https://bugs.python.org/issue35246 opened by lilydjwg #35247: test.test_socket.RDSTest.testPeek hangs indefinitely https://bugs.python.org/issue35247 opened by markmcclain #35248: RawArray causes FileNotFoundError at cleanup https://bugs.python.org/issue35248 opened by Mathieu Lamarre #35250: Minor parameter documentation mismatch for turtle https://bugs.python.org/issue35250 opened by Shane Smith #35251: FTPHandler.ftp_open documentation error https://bugs.python.org/issue35251 opened by lys.nikolaou #35252: test_functools dead code after FIXME https://bugs.python.org/issue35252 opened by lys.nikolaou #35253: Linker warning LNK4281 https://bugs.python.org/issue35253 opened by neyuru #35255: delete "How do I extract the downloaded documentation" section https://bugs.python.org/issue35255 opened by deronnax #35257: Add LDFLAGS_NODIST for the LDFLAGS not intended for propagatio https://bugs.python.org/issue35257 opened by cstratak #35258: Consider enabling -Wmissing-prototypes https://bugs.python.org/issue35258 opened by izbyshev #35259: Py_FinalizeEx unconditionally exists in Py_LIMITED_API https://bugs.python.org/issue35259 opened by AJNeufeld #35263: Add None handling for get_saved() in IDLE https://bugs.python.org/issue35263 opened by rhettinger #35264: SSL Module build fails with OpenSSL 1.1.0 for Python 2.7 https://bugs.python.org/issue35264 opened by Alexandru Ardelean #35265: Internal C API: pass the memory allocator in a context https://bugs.python.org/issue35265 opened by vstinner #35266: Add _PyPreConfig and rework _PyCoreConfig and _PyMainInterpret https://bugs.python.org/issue35266 opened by vstinner Most recent 15 issues with no replies (15) ========================================== #35266: Add _PyPreConfig and rework _PyCoreConfig and _PyMainInterpret https://bugs.python.org/issue35266 #35265: Internal C API: pass the memory allocator in a context https://bugs.python.org/issue35265 #35264: SSL Module build fails with OpenSSL 1.1.0 for Python 2.7 https://bugs.python.org/issue35264 #35263: Add None handling for get_saved() in IDLE https://bugs.python.org/issue35263 #35259: Py_FinalizeEx unconditionally exists in Py_LIMITED_API https://bugs.python.org/issue35259 #35258: Consider enabling -Wmissing-prototypes https://bugs.python.org/issue35258 #35255: delete "How do I extract the downloaded documentation" section https://bugs.python.org/issue35255 #35251: FTPHandler.ftp_open documentation error https://bugs.python.org/issue35251 #35246: asyncio.create_subprocess_exec doesn't accept pathlib.Path lik https://bugs.python.org/issue35246 #35243: readline timeout too long for async gfx use https://bugs.python.org/issue35243 #35238: Alleviate memory reservation of fork_exec in subprocess.Popen https://bugs.python.org/issue35238 #35236: urllib.request.urlopen throws on some valid FTP files https://bugs.python.org/issue35236 #35232: Add `module`/`qualname` arguments to make_dataclass for pickla https://bugs.python.org/issue35232 #35231: make install may not call ldconfig on GNU/Linux when using --e https://bugs.python.org/issue35231 #35228: Index search in CHM help crashes viewer https://bugs.python.org/issue35228 Most recent 15 issues waiting for review (15) ============================================= #35266: Add _PyPreConfig and rework _PyCoreConfig and _PyMainInterpret https://bugs.python.org/issue35266 #35265: Internal C API: pass the memory allocator in a context https://bugs.python.org/issue35265 #35264: SSL Module build fails with OpenSSL 1.1.0 for Python 2.7 https://bugs.python.org/issue35264 #35263: Add None handling for get_saved() in IDLE https://bugs.python.org/issue35263 #35255: delete "How do I extract the downloaded documentation" section https://bugs.python.org/issue35255 #35252: test_functools dead code after FIXME https://bugs.python.org/issue35252 #35250: Minor parameter documentation mismatch for turtle https://bugs.python.org/issue35250 #35236: urllib.request.urlopen throws on some valid FTP files https://bugs.python.org/issue35236 #35226: mock.call equality surprisingly broken https://bugs.python.org/issue35226 #35224: PEP 572: Assignment Expressions https://bugs.python.org/issue35224 #35218: decompressing and then re-compressing zipfiles with Python 3 z https://bugs.python.org/issue35218 #35214: Get the test suite passing with clang Memory Sanitizer enabled https://bugs.python.org/issue35214 #35213: IDLE: use 'macOS' where appropriate. https://bugs.python.org/issue35213 #35210: Use bytes + memoryview + resize instead of bytesarray + array https://bugs.python.org/issue35210 #35208: IDLE: Squeezed lines count ignores window width https://bugs.python.org/issue35208 Top 10 most discussed issues (10) ================================= #34160: ElementTree not preserving attribute order https://bugs.python.org/issue34160 20 msgs #35195: [Windows] Python 3.7 initializes LC_CTYPE locale at startup, c https://bugs.python.org/issue35195 19 msgs #35214: Get the test suite passing with clang Memory Sanitizer enabled https://bugs.python.org/issue35214 16 msgs #35196: IDLE text squeezer is too aggressive and is slow https://bugs.python.org/issue35196 15 msgs #33725: Python crashes on macOS after fork with no exec https://bugs.python.org/issue33725 10 msgs #35202: Remove unused imports in standard library https://bugs.python.org/issue35202 10 msgs #35200: Range repr could be better https://bugs.python.org/issue35200 9 msgs #35213: IDLE: use 'macOS' where appropriate. https://bugs.python.org/issue35213 9 msgs #33196: multiprocessing: serialization must ensure that contexts are c https://bugs.python.org/issue33196 8 msgs #35081: Move internal headers to Include/internal/ https://bugs.python.org/issue35081 8 msgs Issues closed (38) ================== #23220: IDLE: Document how Shell displays user code output https://bugs.python.org/issue23220 closed by terry.reedy #23734: zipimport should not check pyc timestamps against zipped py fi https://bugs.python.org/issue23734 closed by gregory.p.smith #28709: PyStructSequence_NewType is broken; makes GC type without sett https://bugs.python.org/issue28709 closed by petr.viktorin #30064: BaseSelectorEventLoop.sock_{recv,sendall}() don't remove their https://bugs.python.org/issue30064 closed by asvetlov #31541: Mock called_with does not ensure self/cls argument is used https://bugs.python.org/issue31541 closed by pablogsal #32429: Outdated Modules/Setup warning is invisible https://bugs.python.org/issue32429 closed by mdk #32485: Multiprocessing dict sharing between forked processes https://bugs.python.org/issue32485 closed by pitrou #32613: Use PEP 397 py launcher in windows faq https://bugs.python.org/issue32613 closed by mdk #33052: Sporadic segmentation fault in test_datetime.test_check_arg_ty https://bugs.python.org/issue33052 closed by vstinner #33695: Have shutil.copytree(), copy() and copystat() use cached scand https://bugs.python.org/issue33695 closed by giampaolo.rodola #33699: Don't describe try's else clause in a footnote https://bugs.python.org/issue33699 closed by Mariatta #33878: Doc: Assignment statement to tuple or list: case missing. https://bugs.python.org/issue33878 closed by mdk #34864: In Idle, Mac tabs make editor status line disappear. https://bugs.python.org/issue34864 closed by terry.reedy #34949: ntpath.abspath no longer uses normpath https://bugs.python.org/issue34949 closed by eryksun #35193: Off by one error in peephole call to find_op on case RETURN_VA https://bugs.python.org/issue35193 closed by gregory.p.smith #35199: Convert PyTuple_GET_ITEM() macro to a function call with addit https://bugs.python.org/issue35199 closed by vstinner #35204: Disable thread and memory sanitizers for address_in_range() https://bugs.python.org/issue35204 closed by benjamin.peterson #35205: os.path.realpath preserves the trailing backslash on Windows i https://bugs.python.org/issue35205 closed by eryksun #35207: Disallow expressions like (a) = 42 https://bugs.python.org/issue35207 closed by pablogsal #35209: Crash with tkinter text on osx https://bugs.python.org/issue35209 closed by ned.deily #35211: Turtle Shutdown Problem(s) https://bugs.python.org/issue35211 closed by ned.deily #35215: Replacing CPython memory allocation https://bugs.python.org/issue35215 closed by pablogsal #35219: macOS 10.14 Mojave crashes in multiprocessing https://bugs.python.org/issue35219 closed by barry #35222: email.utils.formataddr is not exactly the reverse of email.uti https://bugs.python.org/issue35222 closed by r.david.murray #35223: Pathlib incorrectly merges strings. https://bugs.python.org/issue35223 closed by eryksun #35229: Deprecate _PyObject_GC_TRACK() in Python 3.6 https://bugs.python.org/issue35229 closed by vstinner #35230: Remove _Py_REF_DEBUG_COMMA https://bugs.python.org/issue35230 closed by inada.naoki #35233: _PyMainInterpreterConfig_Copy() doesn't copy install_signal_ha https://bugs.python.org/issue35233 closed by vstinner #35237: Allow Path instances in sys.path ? https://bugs.python.org/issue35237 closed by ammar2 #35239: _PySys_EndInit() doesn't copy main interpreter configuration https://bugs.python.org/issue35239 closed by vstinner #35241: [3.7] test_embed fails on MacOS buildbots https://bugs.python.org/issue35241 closed by vstinner #35245: list comprehension for flattened or nested list differ too muc https://bugs.python.org/issue35245 closed by steven.daprano #35249: Docs Makefile always rebuilds entire doc https://bugs.python.org/issue35249 closed by seluj78 #35254: Process finished with exit code -1073741795 (0xC000001D) https://bugs.python.org/issue35254 closed by eryksun #35256: The Console of Python 3.7.0. https://bugs.python.org/issue35256 closed by eric.smith #35260: 2to3 Parse Error on Python 3 print() with arguments https://bugs.python.org/issue35260 closed by mark.dickinson #35261: readline.c: PyOS_InputHook not protected against SIGWINCH https://bugs.python.org/issue35261 closed by vstinner #35262: There should be a list.get just like dict.get https://bugs.python.org/issue35262 closed by rhettinger From brett at python.org Fri Nov 16 12:17:50 2018 From: brett at python.org (Brett Cannon) Date: Fri, 16 Nov 2018 09:17:50 -0800 Subject: [Python-Dev] Get a running instance of the doc for a PR. In-Reply-To: <20181104224842.GZ3817@ando.pearwood.info> References: <20181104133827.GA25586@xps> <203B5254-326E-43EC-A80D-4A8147791A9D@python.org> <20181104151239.GA14162@xps> <20181104155017.GX3817@ando.pearwood.info> <20181104160507.GF12103@xps> <20181104224842.GZ3817@ando.pearwood.info> Message-ID: On Sun, 4 Nov 2018 at 14:49, Steven D'Aprano wrote: > On Sun, Nov 04, 2018 at 05:05:07PM +0100, Stephane Wirtel wrote: > > > >If I am making doc patches, shouldn't I be doing that *before* I > > >submit the PR? How else will I know that my changes haven't broken the > > >docs? > > > > You can use the web interface of Github and just add/remove/modify a > > paragraph. > > Does Github show a preview? If not, then my question still stands: how > do I know my changes aren't broken? > > If Github does show a preview, then couldn't the reviewer look at that > too? > GitHub provides a "View" button for files while viewing a diff which will do a basic render for reST files. It won't change all the references like sealso and such, but basic stuff like literals and links will show up appropriately. I think baby steps are probably best here, so getting a zip file as Steve pointed out to start is still an improvement than the status quo and much easier to implement. And any further discussion should probably happen over on core-workflow. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Fri Nov 16 12:46:36 2018 From: brett at python.org (Brett Cannon) Date: Fri, 16 Nov 2018 09:46:36 -0800 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: On Wed, 14 Nov 2018 at 16:09, Gregory P. Smith wrote: > It seems like the discussion so far is: >> >> Victor: "I know people when people hear 'new API' they get scared and >> think we're going to do a Python-3-like breaking transition, but don't >> worry, we're never going to do that." >> Nathaniel: "But then what does the new API add?" >> Greg: "It lets us do a Python-3-like breaking transition!" >> > > That is not what I am proposing but it seems too easy for people to > misunderstand it as such. Sorry. > > Between everything discussed across this thread I believe we have enough > information to suggest that we can avoid an "everyone's afraid of a new 3" > mistake by instead making a shim available with a proposed new API that > works on top of existing Python VM(s) so that if we decide to drop the old > API being public in the future, we could do so *without a breaking > transition*. > I know that has always been my hope, especially if any new API is actually going to be more restrictive instead of broader. > > Given that, I suggest not worrying about defining a new C API within the > CPython project and release itself (yet). > +1 from me. Until we have a PEP outlining the actual proposed API I'm not ready to have it go into 'master'. Helping show the shape of the API by wrapping pre-existing APIs I think that's going to be the way to sell it. > > Without an available benefit, little will use it (and given the function > call overhead we want to isolate some concepts, we know it will perform > worse on today's VMs). > > That "top-5" module using it idea? Maintain forks (hooray for git) of > whatever your definition of "top-5" projects is that use the new API > instead of the CPython API. If you attempt this on things like NumPy, you > may be shocked at the states (plural on purpose) of their extension module > code. That holds true for a lot of popular modules. > > Part of the point of this work is to demonstrate that non-incremental > order of magnitude performance change can be had on a Python VM that only > supports such an API can be done in its own fork of CPython, PyPy, > VictorBikeshedPy, FbIsAfraidToReleaseANewGcVmPy, etc. implementation to > help argue for figuring out a viable not-breaking-the-world transition plan > to do such a C API change thing in CPython itself. > I think part of the challenge here (and I believe it has been brought up elsewhere) is no one knows what kind of API is necessary for some faster VM other than PyPy. To me, the only C API that would could potentially start working toward and promoting **today** is one which is stripped to its bare bones and worst mirrors Python syntax. For instance, I have seen PyTuple_GET_ITEM() brought up a couple of times. But that's not syntax in Python, so I wouldn't feel comfortable including that in a simplified API. You really only need attribute access and object calling to make object indexing work, although for simplicity I can see wanting to provide an indexing API. But otherwise I think we are making assumptions here. For me, unless we are trying to trim the C API down to just what is syntactically supported in Python and in such a way that it hides all C-level details I feel like we are guessing at what's best for other VMs, both today and in the future, until they can tell us that e.g. tuple indexing is actually not a problem performance-wise. And Just to be clear, I totally support coming up with a totally stripped-down C API as I have outlined above as that shouldn't be controversial for any VM that wants to have a C-level API. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vstinner at redhat.com Fri Nov 16 13:03:12 2018 From: vstinner at redhat.com (Victor Stinner) Date: Fri, 16 Nov 2018 19:03:12 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: Brett: > But otherwise I think we are making assumptions here. For me, unless we are trying to trim the C API down to just what is syntactically supported in Python and in such a way that it hides all C-level details I feel like we are guessing at what's best for other VMs, both today and in the future, until they can tell us that e.g. tuple indexing is actually not a problem performance-wise. The current API of PyTuple_GET_ITEM() allows to write: PyObject **items = &PyTuple_GET_ITEM(tuple, 0); to access PyTupleObject.ob_item. Not only it's possible, but it's used commonly in the CPython code base. Last week I replaced &PyTuple_GET_ITEM() pattern with a new _PyTuple_ITEMS() macro which is private. To be able to return PyObject**, you have to convert the full tuple into PyObject* objects which is inefficient if your VM uses something different (again, PyPy doesn't use PyObject* at all). More generally, I like to use PyTuple_GET_ITEM() example, just because it's easy to understand this macro. But it's maybe not a good example :-) Victor From p.f.moore at gmail.com Fri Nov 16 13:10:59 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 16 Nov 2018 18:10:59 +0000 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: On Fri, 16 Nov 2018 at 17:49, Brett Cannon wrote: > And Just to be clear, I totally support coming up with a totally stripped-down C API as I have outlined above as that shouldn't be controversial for any VM that wants to have a C-level API. If a stripped down API like this is intended as "use this and you get compatibility across multiple Python interpreters and multiple Python versions" (essentially a much stronger and more effective version of the stable ABI) then I'm solidly in favour (and such an API has clear trade-offs that allow people to judge whether it's the right choice for them). Having this alongside the existing API, which would still be supported for projects that need low-level access or backward compatibility (or simply don't have the resources to change), but which will remain CPython-specific, seems like a perfectly fine idea. Paul From nas-python at arctrix.com Fri Nov 16 18:10:24 2018 From: nas-python at arctrix.com (Neil Schemenauer) Date: Fri, 16 Nov 2018 17:10:24 -0600 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: <20181116231024.bqkn67s2anhdgfk5@python.ca> On 2018-11-16, Brett Cannon wrote: > I think part of the challenge here (and I believe it has been > brought up elsewhere) is no one knows what kind of API is > necessary for some faster VM other than PyPy. I think we have some pretty good ideas as to what are the problematic parts of the current API. Victor's C-API web site has details[1]. We can ask other implementors which parts are hard to support. Here are my thoughts about some desired changes: - We are *not* getting rid of refcounting for extension modules. That would require a whole new API. We might as well start from scratch with Python 4. No one wants that. However, it is likely different VMs use a different GC internally and only use refcounting for objects passed through the C-API. Using refcounted handles is the usual implementation approach. We can make some changes to make that easier. I think making PyObject an opaque pointer would help. - Borrowed references are a problem. However, because they are so commonly used and because the source code changes needed to change to a non-borrowed API is non-trivial, I don't think we should try to change this. Maybe we could just discourage their use? For CPython, using a borrowed reference API is faster. For other Python implementations, it is likely slower and maybe much slower. So, if you are an extension module that wants to work well with other VMs, you should avoid those APIs. - It would be nice to make PyTypeObject an opaque pointer as well. I think that's a lot more difficult than making PyObject opaque. So, I don't think we should attempt it in the near future. Maybe we could make a half-way step and discourage accessing ob_type directly. We would provide functions (probably inline) to do what you would otherwise do by using op->ob_type->. One reason you want to discourage access to ob_type is that internally there is not necessarily one PyTypeObject structure for each Python level type. E.g. the VM might have specialized types for certain sub-domains. This is like the different flavours of strings, depending on the set of characters stored in them. Or, you could have different list types. One type of list if all values are ints, for example. Basically, with CPython op->ob_type is super fast. For other VMs, it could be a lot slower. By accessing ob_type you are saying "give me all possible type information for this object pointer". By using functions to get just what you need, you could be putting less burden on the VM. E.g. "is this object an instance of some type" is faster to compute. - APIs that return pointers to the internals of objects are a problem. E.g. PySequence_Fast_ITEMS(). For CPython, this is really fast because it is just exposing the internal details of the layout that is already in the correct format. For other VMs, that API could be expensive to emulate. E.g. you have a list to store only ints. If someone calls PySequence_Fast_ITEMS(), you have to create real PyObjects for all of the list elements. - Reducing the size of the API seems helpful. E.g. we don't need PyObject_CallObject() *and* PyObject_Call(). Also, do we really need all the type specific APIs, PyList_GetItem() vs PyObject_GetItem()? In some cases maybe we can justify the bigger API due to performance. To add a new API, someone should have a benchmark that shows a real speedup (not just that they imagine it makes a difference). I don't think we should change CPython internals to try to use this new API. E.g. we know that getting ob_type is fast so just leave the code that does that alone. Maybe in the far distant future, if we have successfully got extension modules to switch to using the new API, we could consider changing CPython internals. There would have to be a big benefit though to justify the code churn. E.g. if my tagged pointers experiment shows significant performance gains (it hasn't yet). I like Nathaniel Smith's idea of doing the new API as a separate project, outside the cpython repo. It is possible that in that effort, we would like some minor changes to cpython in order to make the new API more efficient, for example. Those should be pretty limited changes because we are hoping that the new API will work on top of old Python versions, e.g. 3.6. To avoid exposing APIs that should be hidden, re-organizing include files is an idea. However, that doesn't help for old versions of Python. So, I'm thinking that Dino's idea of just duplicating the prototypes would be better. We would like a minimal API and so the number of duplicated prototypes shouldn't be too large. Victor's recent work in changing some macros to inline functions is not really related to the new API project, IMHO. I don't think there is a problem to leave an existing macro as a macro. If we need to introduce new APIs, e.g. to help hide PyTypeObject, those APIs could use inline functions. That way, if using CPython then using the new API would be just as fast as accessing ob_type directly. You are getting an essentially zero cost abstraction. For the limited API builds, maybe it would be okay to change the inline functions into non-inlined versions (same function name). If the new API is going to be successful, it needs to be realatively easy to change extension source code to use it. E.g. replacing one function with another is pretty easy (PyObject_GetItem vs PyList_GetItem). If too many difficult changes are required, extensions are never going to get ported. The ported extension *must* be usable with older Python versions. That's a big mistake we made with the Python 2 to 3 migration. Let's not repeat it. Also, the extension module should not take a big performance hit. So, you can't change all APIs that were macros into non-inlined functions. People are not going to accept that and rightly so. However, it could be that we introduce a new ifdef like Py_LIMITED_API that gives a stable ABI. E.g. when that's enabled, most everything would turn into non-inline functions. In exchange for the performance hit, your extension would become ABI compatible between a range of CPython releases. That would be a nice feature. Basically a more useful version of Py_LIMITED_API. Regards, Neil 1. https://pythoncapi.readthedocs.io/bad_api.html From njs at pobox.com Fri Nov 16 18:27:50 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 16 Nov 2018 15:27:50 -0800 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: <20181116231024.bqkn67s2anhdgfk5@python.ca> References: <20181116231024.bqkn67s2anhdgfk5@python.ca> Message-ID: On Fri, Nov 16, 2018 at 3:12 PM Neil Schemenauer wrote: > Also, the extension module should not take a big performance hit. > So, you can't change all APIs that were macros into non-inlined > functions. People are not going to accept that and rightly so. > However, it could be that we introduce a new ifdef like > Py_LIMITED_API that gives a stable ABI. E.g. when that's enabled, > most everything would turn into non-inline functions. In exchange > for the performance hit, your extension would become ABI compatible > between a range of CPython releases. That would be a nice feature. > Basically a more useful version of Py_LIMITED_API. It seems like a lot of the things being talked about here actually *are* features of Py_LIMITED_API. E.g. it does a lot of work to hide the internal layout of PyTypeObject, and of course the whole selling point is that it's stable across multiple Python versions. If that's the kind of ABI you're looking for, then it seems like you should investigate (a) whether you can make Py_LIMITED_API *be* that API, instead of having two different ifdefs, (b) why no popular extension modules actually use Py_LIMITED_API. I'm guessing it's partly due to limits of the API, but also things like: lack of docs and examples, lack of py2 support, ... -n -- Nathaniel J. Smith -- https://vorpus.org From nas-python at arctrix.com Fri Nov 16 19:40:23 2018 From: nas-python at arctrix.com (Neil Schemenauer) Date: Fri, 16 Nov 2018 18:40:23 -0600 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <20181116231024.bqkn67s2anhdgfk5@python.ca> Message-ID: <20181117004023.kwitipxsotj6sv62@python.ca> On 2018-11-16, Nathaniel Smith wrote: > [..] it seems like you should investigate (a) whether you can make > Py_LIMITED_API *be* that API, instead of having two different > ifdefs That might be a good idea. One problem is that we might like to make backwards incompatible changes to Py_LIMITED_API. Maybe it doesn't matter if no extensions actually use Py_LIMITED_API. Keeping API and ABI compatibility with the existing Py_LIMITED_API could be difficult. What would be the downside of using a new CPP define? We could deprecate Py_LIMITED_API and the new API could do the job. Also, I think extensions should have to option to turn the ABI compatibility off. For some extensions, they will not want to convert if there is a big performance hit (some macros turn into non-inlined functions, call functions rather than access a non-opaque structure). Maybe there is a reason my toggling idea won't work. If we can use a CPP define to toggle between inline and non-inline functions, I think it should work. Maybe it will get complicated. Providing ABI compatibility like Py_LIMITED_API is a different goal than making the API more friendly to alternative Python VMs. So, maybe it is a mistake to try to tackle both goals at once. However, the goals seem closely related and so it would be a shame to do a bunch of work and not achieve both. Regards, Neil From ncoghlan at gmail.com Sun Nov 18 07:32:35 2018 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 18 Nov 2018 22:32:35 +1000 Subject: [Python-Dev] Need discussion for a PR about memory and objects In-Reply-To: <20181104130314.GU3817@ando.pearwood.info> References: <20181104104350.GA12859@xps> <20181104130314.GU3817@ando.pearwood.info> Message-ID: On Sun, 4 Nov 2018 at 23:33, Steven D'Aprano wrote: > > On Sun, Nov 04, 2018 at 11:43:50AM +0100, Stephane Wirtel wrote: > > In this PR [https://github.com/python/cpython/pull/3382] "Remove reference > > to > > address from the docs, as it only causes confusion", opened by Chris > > Angelico, there is a discussion about the right term to use for the > > address of an object in memory. > > Why do we need to refer to the address of objects in memory? Following up on this discussion from a couple of weeks ago, note that Stephane misstated Chris's question/proposal from the PR slightly. The context is that the data model documentation for objects currently describes them as having an identity, a type, and a value, and then uses "address in memory" as an analogy for the properties that the object identity has (i.e. only one object can have a given identifier at any particular point in time, but identifiers can be re-used over time as objects are created and destroyed). That analogy is problematic, since it encourages the "object identities are memory addresses" mindset that happens to be true in CPython, but isn't true for Python implementations in general, and also isn't helpful for folks that have never learned a lower level language where you're manipulating pointers directly. However, simply removing the analogy entirely leaves that paragraph in the documentation feeling incomplete, so it would be valuable to replace it with a different real world analogy that will make sense to a broad audience. Chris's initial suggestion was to use "license number" or "social security number" (i.e. numbers governments assign to people), but I'm thinking a better comparison might be to vehicle registration numbers, since that analogy can be extended to the type and value characteristics in a fairly straightforward way: - the object identity is like the registration number or license plate (unique within the particular system of registering vehicles, but not unique across systems, and may sometimes be transferred to a new vehicle after the old one is destroyed) - the object type is like the make and model (e.g. a 2007 Toyota Corolla Ascent Sedan) - the object value is a specific car (e.g. "that white Corolla over there with 89000 km on the odometer") On the other hand, we're talking about the language reference here, not the tutorial, and understanding memory addressing seems like a reasonable assumed pre-requisite in that context. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stefan_ml at behnel.de Sun Nov 18 10:53:19 2018 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 18 Nov 2018 16:53:19 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: <20181116231024.bqkn67s2anhdgfk5@python.ca> References: <20181116231024.bqkn67s2anhdgfk5@python.ca> Message-ID: Neil Schemenauer schrieb am 17.11.18 um 00:10: > I think making PyObject an opaque pointer would help. ... well, as long as type checks are still as fast as with "ob_type", and visible to the C compiler so that it can eliminate redundant ones, I wouldn't mind. :) > - Borrowed references are a problem. However, because they are so > commonly used and because the source code changes needed to change > to a non-borrowed API is non-trivial, I don't think we should try > to change this. Maybe we could just discourage their use? FWIW, the code that Cython generates has a macro guard [1] that makes it avoid borrowed references where possible, e.g. when it detects compilation under PyPy. That's definitely doable already, right now. > - It would be nice to make PyTypeObject an opaque pointer as well. > I think that's a lot more difficult than making PyObject opaque. > So, I don't think we should attempt it in the near future. Maybe > we could make a half-way step and discourage accessing ob_type > directly. We would provide functions (probably inline) to do what > you would otherwise do by using op->ob_type->. I've sometimes been annoyed by the fact that protocol checks require two pointer indirections in CPython (or even three in some cases), so that the C compiler is essentially prevented from making any assumptions, and the CPU branch prediction is also stretched a bit more than necessary. At least, the slot check usually comes right before the call, so that the lookups are not wasted. Inline functions are unlikely to improve that situation, but at least they shouldn't make it worse, and they would be more explicit. Needless to say that Cython also has a macro guard in [1] that disables direct slot access and makes it fall back to C-API calls, for users and Python implementations where direct slot support is not wanted/available. > One reason you want to discourage access to ob_type is that > internally there is not necessarily one PyTypeObject structure for > each Python level type. E.g. the VM might have specialized types > for certain sub-domains. This is like the different flavours of > strings, depending on the set of characters stored in them. Or, > you could have different list types. One type of list if all > values are ints, for example. An implementation like this could also be based on the buffer protocol. It's already supported by the array.array type (which people probably also just use when they have a need like this and don't want to resort to NumPy). > Basically, with CPython op->ob_type is super fast. For other VMs, > it could be a lot slower. By accessing ob_type you are saying > "give me all possible type information for this object pointer". > By using functions to get just what you need, you could be putting > less burden on the VM. E.g. "is this object an instance of some > type" is faster to compute. Agreed. I think that inline functions (well, or macros, because why not?) that check for certain protocols explicitly could be helpful. > - APIs that return pointers to the internals of objects are a > problem. E.g. PySequence_Fast_ITEMS(). For CPython, this is > really fast because it is just exposing the internal details of > the layout that is already in the correct format. For other VMs, > that API could be expensive to emulate. E.g. you have a list to > store only ints. If someone calls PySequence_Fast_ITEMS(), you > have to create real PyObjects for all of the list elements. But that's intended by the caller, right? They want a flat serial representation of the sequence, with potential conversion to a (list) array if necessary. They might be a bit badly named, but that's exactly the contract of the "PySequence_Fast_*()" line of functions. In Cython, we completely avoid these functions, because they are way too generic for optimisation purposes. Direct type checks and code specialisation are much more effective. > - Reducing the size of the API seems helpful. E.g. we don't need > PyObject_CallObject() *and* PyObject_Call(). Also, do we really > need all the type specific APIs, PyList_GetItem() vs > PyObject_GetItem()? In some cases maybe we can justify the bigger > API due to performance. To add a new API, someone should have a > benchmark that shows a real speedup (not just that they imagine it > makes a difference). So, in Cython, we use macros wherever possible, and often avoid generic protocols in favour of type specialisations. We sometimes keep local copies of C-API helper functions, because inlining them allows the C compiler to strip down and streamline the implementation at compile time, rather than jumping through generic code. (Also, it's sometimes required in order to backport new CPython features to Py2.7+.) PyPy's cpyext often just maps type specific C-API functions to the same generic code, obviously, but in CPython, having a way to bypass protocols and going straight to the type is really nice. I've sometimes been thinking about how to get access to the actual implementations in CPython, because hopping through slots when Cython already knows that, say, "float_add()" will be called in the end is just wasteful. I actually wonder if a more low-level C call interface [2] could also apply to protocols. It would be great to have an iteration protocol, for example, that would allow its user to choose whether she wants objects, integers or C doubles as return values, and then adapt accordingly, depending on the runtime data. So, "reducing the size of the API", maybe not. Rather, make it more specialised and provide a more low-level C integration of Python's protocols that avoids the current object indirections. That would also help projects like PyPy or Numba where things are inherently low-level internally, but currently have to go through static signatures that require wrapping and unwrapping things in inefficient containers. > E.g. if my tagged pointers experiment shows significant performance > gains (it hasn't yet). In case it ever does, I'd certainly like to see the internals exposed in order to make direct use of them. :) > Victor's recent work in changing some macros to inline functions is > not really related to the new API project, IMHO. I don't think > there is a problem to leave an existing macro as a macro. Yeah, it feels more like code churn. There are some things that cannot be done in macros (such as deciding whether to handle an error or to return a result unchanged), in which case an inline function is nice. For simple things, macros are just as good as inline functions. Minus some compile time type checking, perhaps. > However, it could be that we introduce a new ifdef like > Py_LIMITED_API that gives a stable ABI. E.g. when that's enabled, > most everything would turn into non-inline functions. In exchange > for the performance hit, your extension would become ABI compatible > between a range of CPython releases. That would be a nice feature. > Basically a more useful version of Py_LIMITED_API. I actually wouldn't mind adding such a binary compatibility mode to Cython. Probably some work, but in the end, it would just be another macro guard for us. The overall size of the generated C files has rarely been a matter of debates. :) Stefan [1] https://github.com/cython/cython/blob/f158e490b9e8515cf47cf301f996c1b7e631eebb/Cython/Utility/ModuleSetupCode.c#L43-L191 [2] https://github.com/cython/peps/blob/master/pep-ccalls.rst From stefan_ml at behnel.de Sun Nov 18 11:50:44 2018 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 18 Nov 2018 17:50:44 +0100 Subject: [Python-Dev] General concerns about C API changes In-Reply-To: References: <62A87B15-0F0E-4242-BF44-9C58FE890D8B@gmail.com> Message-ID: Gregory P. Smith schrieb am 15.11.18 um 01:03: > From my point of view: A static inline function is a much nicer modern code > style than a C preprocessor macro. It's also slower to compile, given that function inlining happens at a much later point in the compiler pipeline than macro expansion. The C compiler won't even get to see macros in fact, whereas whether to inline a function or not is a dedicated decision during the optimisation phase based on metrics collected in earlier stages. For something as ubiquitous as Py_INCREF/Py_DECREF, it might even be visible in the compilation times. Oh, BTW, I don't know if this was mentioned in the discussion before, but transitive inlining can easily be impacted by the switch from a macro to an inline function. Since inlining happens long before the final CPU code generation, the C compiler needs to uses heuristics for estimating the eventual "code weight" of an inline function, and then sums up all weights within a calling function to decide whether to also inline that calling function into the transitive callers or not. Now imagine that you have an inline function that executes several Py_INCREF/Py_DECREF call cycles, and the C compiler happens to slightly overestimate the weights of these two. Then it might end up deciding against inlining the function now, whereas it previously might have decided for it since it was able to see the exact source code expanded from the macros. I think that's what Raymond meant with his concerns regarding changing macros into inline functions. C compilers might be smart enough to always inline CPython's new inline functions themselves, but the style change can still have unexpected transitive impacts on code that uses them. I agree with Raymond that as long as there is no clear gain in this code churn, we should not underestimate the risk of degarding code on user side. Stefan From Richard at Damon-Family.org Sun Nov 18 14:00:08 2018 From: Richard at Damon-Family.org (Richard Damon) Date: Sun, 18 Nov 2018 14:00:08 -0500 Subject: [Python-Dev] Need discussion for a PR about memory and objects In-Reply-To: References: <20181104104350.GA12859@xps> <20181104130314.GU3817@ando.pearwood.info> Message-ID: <022f855a-671d-1de6-e01c-24eda52006c7@Damon-Family.org> On 11/18/18 7:32 AM, Nick Coghlan wrote: > On Sun, 4 Nov 2018 at 23:33, Steven D'Aprano wrote: >> On Sun, Nov 04, 2018 at 11:43:50AM +0100, Stephane Wirtel wrote: >>> In this PR [https://github.com/python/cpython/pull/3382] "Remove reference >>> to >>> address from the docs, as it only causes confusion", opened by Chris >>> Angelico, there is a discussion about the right term to use for the >>> address of an object in memory. >> Why do we need to refer to the address of objects in memory? > Following up on this discussion from a couple of weeks ago, note that > Stephane misstated Chris's question/proposal from the PR slightly. > > The context is that the data model documentation for objects currently > describes them as having an identity, a type, and a value, and then > uses "address in memory" as an analogy for the properties that the > object identity has (i.e. only one object can have a given identifier > at any particular point in time, but identifiers can be re-used over > time as objects are created and destroyed). > > That analogy is problematic, since it encourages the "object > identities are memory addresses" mindset that happens to be true in > CPython, but isn't true for Python implementations in general, and > also isn't helpful for folks that have never learned a lower level > language where you're manipulating pointers directly. > > However, simply removing the analogy entirely leaves that paragraph in > the documentation feeling incomplete, so it would be valuable to > replace it with a different real world analogy that will make sense to > a broad audience. > > Chris's initial suggestion was to use "license number" or "social > security number" (i.e. numbers governments assign to people), but I'm > thinking a better comparison might be to vehicle registration numbers, > since that analogy can be extended to the type and value > characteristics in a fairly straightforward way: > > - the object identity is like the registration number or license plate > (unique within the particular system of registering vehicles, but not > unique across systems, and may sometimes be transferred to a new > vehicle after the old one is destroyed) > - the object type is like the make and model (e.g. a 2007 Toyota > Corolla Ascent Sedan) > - the object value is a specific car (e.g. "that white Corolla over > there with 89000 km on the odometer") > > On the other hand, we're talking about the language reference here, > not the tutorial, and understanding memory addressing seems like a > reasonable assumed pre-requisite in that context. > > Cheers, > Nick. One issue with things like vehicle registration numbers is that the VIN of a vehicle is really a UUID, it is globally unique no other vehicle will every have that same ID number, and people may not think of the fact that some other ID numbers, like the SSN do get reused. Since the Python Object Identities can get reused after the object goes away, the analogy really needs to keep that clear, and not give the other extreme of a false impression that the ID won't get reused after the object goes away. -- Richard Damon From rosuav at gmail.com Sun Nov 18 14:05:14 2018 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 19 Nov 2018 06:05:14 +1100 Subject: [Python-Dev] Need discussion for a PR about memory and objects In-Reply-To: <022f855a-671d-1de6-e01c-24eda52006c7@Damon-Family.org> References: <20181104104350.GA12859@xps> <20181104130314.GU3817@ando.pearwood.info> <022f855a-671d-1de6-e01c-24eda52006c7@Damon-Family.org> Message-ID: On Mon, Nov 19, 2018 at 6:01 AM Richard Damon wrote: > One issue with things like vehicle registration numbers is that the VIN > of a vehicle is really a UUID, it is globally unique no other vehicle > will every have that same ID number, and people may not think of the > fact that some other ID numbers, like the SSN do get reused. Since the > Python Object Identities can get reused after the object goes away, the > analogy really needs to keep that clear, and not give the other extreme > of a false impression that the ID won't get reused after the object goes > away. Licence plate numbers do get reused. ChrisA From brett at python.org Sun Nov 18 14:06:15 2018 From: brett at python.org (Brett Cannon) Date: Sun, 18 Nov 2018 11:06:15 -0800 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: Message-ID: On Fri, 16 Nov 2018 at 10:11, Paul Moore wrote: > On Fri, 16 Nov 2018 at 17:49, Brett Cannon wrote: > > And Just to be clear, I totally support coming up with a totally > stripped-down C API as I have outlined above as that shouldn't be > controversial for any VM that wants to have a C-level API. > > If a stripped down API like this is intended as "use this and you get > compatibility across multiple Python interpreters and multiple Python > versions" (essentially a much stronger and more effective version of > the stable ABI) then I'm solidly in favour (and such an API has clear > trade-offs that allow people to judge whether it's the right choice > for them). > Yes, that's what I'm getting at. Basically we have to approach this from the "start with nothing and build up until we have _just_ enough and thus we know **everyone** now and into the future can support it", or we approach with "take what we have now and start peeling back until we _think_ it's good enough". Personally, I think the former is more future-proof. > > Having this alongside the existing API, which would still be supported > for projects that need low-level access or backward compatibility (or > simply don't have the resources to change), but which will remain > CPython-specific, seems like a perfectly fine idea. > And it can be done as wrappers around the current C API and as an external project to start. As Nathaniel pointed out in another thread, this is somewhat like what Py_LIMITED_API was meant to be, but I think we all admit we slightly messed up by making it opt-out instead of opt-in and so we didn't explicitly control that API as well as we probably should have (I know I have probably screwed up by accidentally including import functions by forgetting it was opt-out). I also don't think it was necessarily designed from a minimalist perspective to begin with as it defines things in terms of what's _not_ in Py_LIMITED_API instead of explicitly listing what _is_. So it may (or may not) lead to a different set of APIs in the end when you have to explicitly list every API to include. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ja.py at farowl.co.uk Sun Nov 18 13:58:04 2018 From: ja.py at farowl.co.uk (Jeff Allen) Date: Sun, 18 Nov 2018 18:58:04 +0000 Subject: [Python-Dev] Need discussion for a PR about memory and objects In-Reply-To: References: <20181104104350.GA12859@xps> <20181104130314.GU3817@ando.pearwood.info> Message-ID: <029f5d5f-7cab-d932-0556-bbfe9eb61f8d@farowl.co.uk> I found this (very good) summary ended in a surprising conclusion. On 18/11/2018 12:32, Nick Coghlan wrote: > On Sun, 4 Nov 2018 at 23:33, Steven D'Aprano wrote: >> On Sun, Nov 04, 2018 at 11:43:50AM +0100, Stephane Wirtel wrote: >>> In this PR [https://github.com/python/cpython/pull/3382] "Remove reference >>> to >>> address from the docs, as it only causes confusion", opened by Chris >>> Angelico, there is a discussion about the right term to use for the >>> address of an object in memory. >> Why do we need to refer to the address of objects in memory? > ... > Chris's initial suggestion was to use "license number" or "social > security number" (i.e. numbers governments assign to people), but I'm > thinking a better comparison might be to vehicle registration numbers, > ... > On the other hand, we're talking about the language reference here, > not the tutorial, and understanding memory addressing seems like a > reasonable assumed pre-requisite in that context. > > Cheers, > Nick. It is a good point that this is in the language reference, not a tutorial. Could we not expect readers of that to be prepared for a notion of object identity as the abstraction of what we mean by "the same object" vs "a distinct object"? If it were necessary to be explicit about what Python means by it, one could unpack the idea by its properties: distinct names may be given to the same object (is-operator); distinct objects may have the same value (==-operator); an object may change in value (if allowed by its type) while keeping its identity. And then there is the id() function. That is an imperfect reflection of the identity. id() guarantees that for a given object (identity) it will always return the same integer during the life of that object, and a different integer for any distinct object (distinct identity) with an overlapping lifetime. We note that, in an implementation of Python where objects are fixed in memory for life, a conformant id() may return the object's address. Jeff Allen -------------- next part -------------- An HTML attachment was scrubbed... URL: From hugo.fisher at gmail.com Sun Nov 18 15:51:22 2018 From: hugo.fisher at gmail.com (Hugh Fisher) Date: Mon, 19 Nov 2018 07:51:22 +1100 Subject: [Python-Dev] Need discussion for a PR about memory and > objects In-Reply-To: References: Message-ID: > Date: Sun, 18 Nov 2018 22:32:35 +1000 > From: Nick Coghlan > To: "Steven D'Aprano" > Cc: python-dev > Subject: Re: [Python-Dev] Need discussion for a PR about memory and > objects [ munch background ] > > Chris's initial suggestion was to use "license number" or "social > security number" (i.e. numbers governments assign to people), but I'm > thinking a better comparison might be to vehicle registration numbers, > since that analogy can be extended to the type and value > characteristics in a fairly straightforward way: > > - the object identity is like the registration number or license plate > (unique within the particular system of registering vehicles, but not > unique across systems, and may sometimes be transferred to a new > vehicle after the old one is destroyed) > - the object type is like the make and model (e.g. a 2007 Toyota > Corolla Ascent Sedan) > - the object value is a specific car (e.g. "that white Corolla over > there with 89000 km on the odometer") > > On the other hand, we're talking about the language reference here, > not the tutorial, and understanding memory addressing seems like a > reasonable assumed pre-requisite in that context. "Handle" has been used since the 1980s among Macintosh and Win32 programmers as "unique identifier of some object but isn't the memory address". The usage within those APIs seems to match what's being proposed for the new Python C API, in that programmers used functions to ask "what type are you?" "what value do you have?" but couldn't, or at least shouldn't, rely on actual memory layout. I suggest that for the language reference, use the license plate or registration analogy to introduce "handle" and after that use handle throughout. It's short, distinctive, and either will match up with what the programmer already knows or won't clash if or when they encounter handles elsewhere. -- cheers, Hugh Fisher From njs at pobox.com Sun Nov 18 16:53:54 2018 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 18 Nov 2018 13:53:54 -0800 Subject: [Python-Dev] General concerns about C API changes In-Reply-To: References: <62A87B15-0F0E-4242-BF44-9C58FE890D8B@gmail.com> Message-ID: On Sun, Nov 18, 2018 at 8:52 AM Stefan Behnel wrote: > > Gregory P. Smith schrieb am 15.11.18 um 01:03: > > From my point of view: A static inline function is a much nicer modern code > > style than a C preprocessor macro. > > It's also slower to compile, given that function inlining happens at a much > later point in the compiler pipeline than macro expansion. The C compiler > won't even get to see macros in fact, whereas whether to inline a function > or not is a dedicated decision during the optimisation phase based on > metrics collected in earlier stages. For something as ubiquitous as > Py_INCREF/Py_DECREF, it might even be visible in the compilation times. Have you measured this? I had the opposite intuition, that macros on average will be slower to compile because they increase the amount of code that the frontend has to process. But I've never checked... -n -- Nathaniel J. Smith -- https://vorpus.org From greg.ewing at canterbury.ac.nz Sun Nov 18 17:39:50 2018 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 19 Nov 2018 11:39:50 +1300 Subject: [Python-Dev] Need discussion for a PR about memory and objects In-Reply-To: References: <20181104104350.GA12859@xps> <20181104130314.GU3817@ando.pearwood.info> Message-ID: <5BF1EA36.1040603@canterbury.ac.nz> Nick Coghlan wrote: > - the object identity is like the registration number or license plate > (unique within the particular system of registering vehicles, but not > unique across systems, and may sometimes be transferred to a new > vehicle after the old one is destroyed) > - the object type is like the make and model (e.g. a 2007 Toyota > Corolla Ascent Sedan) > - the object value is a specific car (e.g. "that white Corolla over > there with 89000 km on the odometer") A bit confusing, because "that white Corolla over there" is referring to its identity. -- Greg From greg.ewing at canterbury.ac.nz Sun Nov 18 18:21:05 2018 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 19 Nov 2018 12:21:05 +1300 Subject: [Python-Dev] Need discussion for a PR about memory and objects In-Reply-To: References: <20181104104350.GA12859@xps> <20181104130314.GU3817@ando.pearwood.info> <022f855a-671d-1de6-e01c-24eda52006c7@Damon-Family.org> Message-ID: <5BF1F3E1.6070804@canterbury.ac.nz> Chris Angelico wrote: > Licence plate numbers do get reused. And they can change, e.g. if you get a personalised plate. -- Greg From solipsis at pitrou.net Mon Nov 19 04:37:36 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 19 Nov 2018 10:37:36 +0100 Subject: [Python-Dev] Need discussion for a PR about memory and objects References: <20181104104350.GA12859@xps> <20181104130314.GU3817@ando.pearwood.info> Message-ID: <20181119103736.690ef0d7@fsol> On Sun, 18 Nov 2018 22:32:35 +1000 Nick Coghlan wrote: > > Chris's initial suggestion was to use "license number" or "social > security number" (i.e. numbers governments assign to people), but I'm > thinking a better comparison might be to vehicle registration numbers, > since that analogy can be extended to the type and value > characteristics in a fairly straightforward way: > > - the object identity is like the registration number or license plate > (unique within the particular system of registering vehicles, but not > unique across systems, and may sometimes be transferred to a new > vehicle after the old one is destroyed) > - the object type is like the make and model (e.g. a 2007 Toyota > Corolla Ascent Sedan) > - the object value is a specific car (e.g. "that white Corolla over > there with 89000 km on the odometer") > > On the other hand, we're talking about the language reference here, > not the tutorial, and understanding memory addressing seems like a > reasonable assumed pre-requisite in that context. I'd rather keep the reference to memory addressing than start doing car analogies in the reference documentation. Regards Antoine. From solipsis at pitrou.net Mon Nov 19 04:44:22 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 19 Nov 2018 10:44:22 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) References: Message-ID: <20181119104422.46d4bb29@fsol> On Fri, 16 Nov 2018 09:46:36 -0800 Brett Cannon wrote: > > I think part of the challenge here (and I believe it has been brought up > elsewhere) is no one knows what kind of API is necessary for some faster VM > other than PyPy. To me, the only C API that would could potentially start > working toward and promoting **today** is one which is stripped to its bare > bones and worst mirrors Python syntax. For instance, I have seen > PyTuple_GET_ITEM() brought up a couple of times. But that's not syntax in > Python, so I wouldn't feel comfortable including that in a simplified API. > You really only need attribute access and object calling to make object > indexing work, although for simplicity I can see wanting to provide an > indexing API. If the C API only provides Python-level semantics, then it will roughly have the speed of pure Python (modulo bytecode execution). There are important use cases for the C API where it is desired to have fast type-specific access to Python objects such as tuples, ints, strings, etc. This is relied upon by modules such as _json and _pickle, and third-party extensions as well. Regards Antoine. From solipsis at pitrou.net Mon Nov 19 04:50:58 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 19 Nov 2018 10:50:58 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) References: <20181116231024.bqkn67s2anhdgfk5@python.ca> Message-ID: <20181119105058.1f95eb58@fsol> On Sun, 18 Nov 2018 16:53:19 +0100 Stefan Behnel wrote: > > So, in Cython, we use macros wherever possible, and often avoid generic > protocols in favour of type specialisations. We sometimes keep local copies > of C-API helper functions, because inlining them allows the C compiler to > strip down and streamline the implementation at compile time, rather than > jumping through generic code. (Also, it's sometimes required in order to > backport new CPython features to Py2.7+.) Also this approach allows those ballooning compile times that are part of Cython's charm and appeal ;-) (sorry, couldn't resist) Regards Antoine. From solipsis at pitrou.net Mon Nov 19 04:52:55 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 19 Nov 2018 10:52:55 +0100 Subject: [Python-Dev] General concerns about C API changes References: <62A87B15-0F0E-4242-BF44-9C58FE890D8B@gmail.com> Message-ID: <20181119105255.63201e93@fsol> On Sun, 18 Nov 2018 13:53:54 -0800 Nathaniel Smith wrote: > On Sun, Nov 18, 2018 at 8:52 AM Stefan Behnel wrote: > > > > Gregory P. Smith schrieb am 15.11.18 um 01:03: > > > From my point of view: A static inline function is a much nicer modern code > > > style than a C preprocessor macro. > > > > It's also slower to compile, given that function inlining happens at a much > > later point in the compiler pipeline than macro expansion. The C compiler > > won't even get to see macros in fact, whereas whether to inline a function > > or not is a dedicated decision during the optimisation phase based on > > metrics collected in earlier stages. For something as ubiquitous as > > Py_INCREF/Py_DECREF, it might even be visible in the compilation times. > > Have you measured this? I had the opposite intuition, that macros on > average will be slower to compile because they increase the amount of > code that the frontend has to process. But I've never checked... It will certainly depend on how much code the macro expands to. Py_INCREF is an extremely simple macro, so expanding everywhere doesn't sound like a problem. On the other hand, modern "macros" that are C++ templates can inline vast amounts of code at the call site, and that's a common cause of slow C++ compiles. Regards Antoine. From vstinner at redhat.com Mon Nov 19 05:28:46 2018 From: vstinner at redhat.com (Victor Stinner) Date: Mon, 19 Nov 2018 11:28:46 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: <20181119104422.46d4bb29@fsol> References: <20181119104422.46d4bb29@fsol> Message-ID: Le lun. 19 nov. 2018 ? 10:48, Antoine Pitrou a ?crit : > If the C API only provides Python-level semantics, then it will > roughly have the speed of pure Python (modulo bytecode execution). > > There are important use cases for the C API where it is desired to have > fast type-specific access to Python objects such as tuples, ints, > strings, etc. This is relied upon by modules such as _json and _pickle, > and third-party extensions as well. Are you sure that using PyDict_GetItem() is really way faster than PyObject_GetItem()? Did someone run a benchmark to have numbers? I would expect that the most common source of speed up of a C extension is the removal of the cost of bytecode evaluation (ceval.c loop). Python internals rely on internals to implement further optimizations, than modifying an "immutable" tuple, bytes or str object, because you can do that at the C level. But I'm not sure that I would like 3rd party extensions to rely on such things. For example, unicodeobject.c uses the following function to check if a str object can be modified in-place, or if a new str object must be created: #ifdef Py_DEBUG static int unicode_is_singleton(PyObject *unicode) { PyASCIIObject *ascii = (PyASCIIObject *)unicode; if (unicode == unicode_empty) return 1; if (ascii->state.kind != PyUnicode_WCHAR_KIND && ascii->length == 1) { Py_UCS4 ch = PyUnicode_READ_CHAR(unicode, 0); if (ch < 256 && unicode_latin1[ch] == unicode) return 1; } return 0; } #endif static int unicode_modifiable(PyObject *unicode) { assert(_PyUnicode_CHECK(unicode)); if (Py_REFCNT(unicode) != 1) return 0; if (_PyUnicode_HASH(unicode) != -1) return 0; if (PyUnicode_CHECK_INTERNED(unicode)) return 0; if (!PyUnicode_CheckExact(unicode)) return 0; #ifdef Py_DEBUG /* singleton refcount is greater than 1 */ assert(!unicode_is_singleton(unicode)); #endif return 1; } Victor From solipsis at pitrou.net Mon Nov 19 05:53:42 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 19 Nov 2018 11:53:42 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <20181119104422.46d4bb29@fsol> Message-ID: <20181119115342.5a538507@fsol> On Mon, 19 Nov 2018 11:28:46 +0100 Victor Stinner wrote: > I would expect that the most common source of speed up of a C > extension is the removal of the cost of bytecode evaluation (ceval.c > loop). Well, I don't. All previous experiments showed that simply compiling Python code to C code using the "generic" C API yielded a 30% improvement. Conversely, the C _pickle module can be 100x faster than the pure Python pickle module. It's doing it *not* by using the generic C API, but by special-casing access to concrete types. You don't get that level of performance simply by removing the cost of bytecode evaluation: # C version $ python3 -m timeit -s "import pickle; x = list(range(1000))" "pickle.dumps(x)" 100000 loops, best of 3: 19 usec per loop # Python version $ python3 -m timeit -s "import pickle; x = list(range(1000))" "pickle._dumps(x)" 100 loops, best of 3: 2.25 msec per loop So, the numbers are on my side. So is the abundant experience of experts such as the Cython developers. > Python internals rely on internals to implement further optimizations, > than modifying an "immutable" tuple, bytes or str object, because you > can do that at the C level. But I'm not sure that I would like 3rd > party extensions to rely on such things. I'm not even talking about *modifying* tuples or str objects, I'm talking about *accessing* their value without going through an abstract API that does slot lookups, indirect function calls and object unboxing. For example, people may need a fast way to access the UTF-8 representation of a unicode object. Without making indirect function calls, and ideally without making a copy of the data either. How do you do that using the generic C API? Regards Antoine. From mal at egenix.com Mon Nov 19 05:59:21 2018 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 19 Nov 2018 11:59:21 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: <20181119115342.5a538507@fsol> References: <20181119104422.46d4bb29@fsol> <20181119115342.5a538507@fsol> Message-ID: <460193ad-6ae0-c54a-4335-8422268db84e@egenix.com> On 19.11.2018 11:53, Antoine Pitrou wrote: > On Mon, 19 Nov 2018 11:28:46 +0100 > Victor Stinner wrote: >> Python internals rely on internals to implement further optimizations, >> than modifying an "immutable" tuple, bytes or str object, because you >> can do that at the C level. But I'm not sure that I would like 3rd >> party extensions to rely on such things. > > I'm not even talking about *modifying* tuples or str objects, I'm > talking about *accessing* their value without going through an abstract > API that does slot lookups, indirect function calls and object unboxing. > > For example, people may need a fast way to access the UTF-8 > representation of a unicode object. Without making indirect function > calls, and ideally without making a copy of the data either. How do > you do that using the generic C API? Something else you need to consider is creating instances of types, e.g. a tuple. In C you will have to be able to put values into the data structure before it is passed outside the function in order to build the tuple. If you remove this possibility to have to copy data all the time, losing the advantages of having a rich C API. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Nov 19 2018) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From solipsis at pitrou.net Mon Nov 19 06:10:19 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 19 Nov 2018 12:10:19 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) References: <20181119104422.46d4bb29@fsol> <20181119115342.5a538507@fsol> Message-ID: <20181119121019.69f7ba63@fsol> On Mon, 19 Nov 2018 11:53:42 +0100 Antoine Pitrou wrote: > On Mon, 19 Nov 2018 11:28:46 +0100 > Victor Stinner wrote: > > I would expect that the most common source of speed up of a C > > extension is the removal of the cost of bytecode evaluation (ceval.c > > loop). > > Well, I don't. All previous experiments showed that simply compiling > Python code to C code using the "generic" C API yielded a 30% > improvement. > > Conversely, the C _pickle module can be 100x faster than the pure > Python pickle module. It's doing it *not* by using the generic C > API, but by special-casing access to concrete types. You don't get > that level of performance simply by removing the cost of bytecode > evaluation: > > # C version > $ python3 -m timeit -s "import pickle; x = list(range(1000))" > "pickle.dumps(x)" 100000 loops, best of 3: 19 usec per loop > > # Python version > $ python3 -m timeit -s "import pickle; x = list(range(1000))" > "pickle._dumps(x)" 100 loops, best of 3: 2.25 msec per loop And to show that this is important for third-party C extensions as well, PyArrow (*) has comparable performance using similar techniques: $ python -m timeit -s "import pyarrow as pa; x = list(range(1000))" "pa.array(x, type=pa.int64())" 10000 loops, best of 5: 27.2 usec per loop (*) https://arrow.apache.org/docs/python/ Regards Antoine. From vstinner at redhat.com Mon Nov 19 06:14:52 2018 From: vstinner at redhat.com (Victor Stinner) Date: Mon, 19 Nov 2018 12:14:52 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: <460193ad-6ae0-c54a-4335-8422268db84e@egenix.com> References: <20181119104422.46d4bb29@fsol> <20181119115342.5a538507@fsol> <460193ad-6ae0-c54a-4335-8422268db84e@egenix.com> Message-ID: To design a new C API, I see 3 options: (1) add more functions to the existing Py_LIMITED_API (2) "fork" the current public C API: remove functions and hide as much implementation details as possible (3) write a new C API from scratch, based on the current C API. Something like #define newcapi_Object_GetItem PyObject_GetItem"? Sorry, but "#undef " doesn't work. Only very few functions are defined using "#define ...". I dislike (1) because it's too far from what is currently used in practice. Moreover, I failed to find anyone who can explain me how the C API is used in the wild, which functions are important or not, what is the C API, etc. I propose (2). We control how much changes we do at each milestone, and we start from the maximum compatibility with current C API. Each change can be discussed and experimented to define what is the C API, what we want, etc. I'm working on this approach for 1 year, that's why many discussions popped up around specific changes :-) Some people recently proposed (3) on python-dev. I dislike this option because it starts by breaking the backward compatibility. It looks like (1), but worse. The goal and the implementation are unclear to me. -- Replacing PyDict_GetItem() (specialized call) with PyObject_Dict() (generic API) is not part of my short term plan. I wrote it in the roadmap, but as I wrote before, each change should be discusssed, experimented, benchmarked, etc. Victor Le lun. 19 nov. 2018 ? 12:02, M.-A. Lemburg a ?crit : > > On 19.11.2018 11:53, Antoine Pitrou wrote: > > On Mon, 19 Nov 2018 11:28:46 +0100 > > Victor Stinner wrote: > >> Python internals rely on internals to implement further optimizations, > >> than modifying an "immutable" tuple, bytes or str object, because you > >> can do that at the C level. But I'm not sure that I would like 3rd > >> party extensions to rely on such things. > > > > I'm not even talking about *modifying* tuples or str objects, I'm > > talking about *accessing* their value without going through an abstract > > API that does slot lookups, indirect function calls and object unboxing. > > > > For example, people may need a fast way to access the UTF-8 > > representation of a unicode object. Without making indirect function > > calls, and ideally without making a copy of the data either. How do > > you do that using the generic C API? > > Something else you need to consider is creating instances of > types, e.g. a tuple. In C you will have to be able to put > values into the data structure before it is passed outside > the function in order to build the tuple. > > If you remove this possibility to have to copy data all the > time, losing the advantages of having a rich C API. > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Experts (#1, Nov 19 2018) > >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ > >>> Python Database Interfaces ... http://products.egenix.com/ > >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ > ________________________________________________________________________ > > ::: We implement business ideas - efficiently in both time and costs ::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > http://www.malemburg.com/ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/vstinner%40redhat.com From stefan at bytereef.org Mon Nov 19 06:59:34 2018 From: stefan at bytereef.org (Stefan Krah) Date: Mon, 19 Nov 2018 12:59:34 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) Message-ID: <20181119115933.GA4918@bytereef.org> Victor Stinner wrote: > Moreover, I failed to find anyone who can explain me how the C API is used > in the wild, which functions are important or not, what is the C API, etc. In practice people desperately *have* to use whatever is there, including functions with underscores that are not even officially in the C-API. I have to use _PyFloat_Pack* in order to be compatible with CPython, I need PySlice_Unpack() etc., I need PyUnicode_KIND(), need PyUnicode_AsUTF8AndSize(), I *wish* there were PyUnicode_AsAsciiAndSize(). In general, in daily use of the C-API I wish it were *larger* and not smaller. I often want functions that return C instead of Python values ot functions that take C instead of Python values. The ideal situation for me would be a lower layer library, say libcpython.a that has all those functions like _PyFloat_Pack*. It would be an enormous amount of work though, especially since the status quo kind of works. Stefan Krah From vstinner at redhat.com Mon Nov 19 10:08:07 2018 From: vstinner at redhat.com (Victor Stinner) Date: Mon, 19 Nov 2018 16:08:07 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: <20181119115933.GA4918@bytereef.org> References: <20181119115933.GA4918@bytereef.org> Message-ID: Hi Stefan, Le lun. 19 nov. 2018 ? 13:18, Stefan Krah a ?crit : > In practice people desperately *have* to use whatever is there, including > functions with underscores that are not even officially in the C-API. > > I have to use _PyFloat_Pack* in order to be compatible with CPython, Oh, I never used this function. These functions are private (name prefixed by "_") and excluded from the limited API. For me, the limited API should be functions available on all Python implementations. Does it make sense to provide PyFloat_Pack4() in MicroPython, Jython, IronPython and PyPy? Or is it something more specific to CPython? I don't know the answer. If yes, open an issue to propose to make this function public? > I need PyUnicode_KIND() IMHO this one should not be part of the public API. The only usage would be to micro-optimize, but such API is very specific to one Python implementation. For example, PyPy doesn't use "compact string" but UTF-8 internally. If you use PyUnicode_KIND(), your code becomes incompatible with PyPy. What is your use case? I would prefer to expose the "_PyUnicodeWriter" API than PyUnicode_KIND(). > need PyUnicode_AsUTF8AndSize(), Again, that's a micro-optimization and it's very specific to CPython: result cached in the "immutable" str object. I don't want to put it in a public API. PyUnicode_AsUTF8String() is better since it doesn't require an internal cache. > I *wish* there were PyUnicode_AsAsciiAndSize(). PyUnicode_AsASCIIString() looks good to me. Sadly, it doesn't return the length, but usually the length is not needed. Victor From nas-python at arctrix.com Mon Nov 19 13:13:30 2018 From: nas-python at arctrix.com (Neil Schemenauer) Date: Mon, 19 Nov 2018 12:13:30 -0600 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <20181119104422.46d4bb29@fsol> <20181119115342.5a538507@fsol> <460193ad-6ae0-c54a-4335-8422268db84e@egenix.com> Message-ID: <20181119181330.fb26vkhhkbyxsokd@python.ca> On 2018-11-19, Victor Stinner wrote: > Moreover, I failed to find anyone who can explain me how the C API > is used in the wild, which functions are important or not, what is > the C API, etc. One idea is to download a large sample of extension modules from PyPI and then analyze them with some automated tool (maybe libclang). I guess it is possible there is a large non-public set of extensions that we would miss. Regards, Neil From ja.py at farowl.co.uk Mon Nov 19 14:16:21 2018 From: ja.py at farowl.co.uk (Jeff Allen) Date: Mon, 19 Nov 2018 19:16:21 +0000 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <20181119115933.GA4918@bytereef.org> Message-ID: <377de449-f028-fdb1-992c-7d96e8d1cea8@farowl.co.uk> On 19/11/2018 15:08, Victor Stinner wrote: > ... > For me, the limited API should be functions available on all Python > implementations. Does it make sense to provide PyFloat_Pack4() in > ..., Jython, ... ? Or is it something more > specific to CPython? I don't know the answer. I'd say it's a CPython thing. It is helpful to copy a lot of things from the reference implementation, but generally the lexical conventions of the C-API would seem ludicrous in Java, where scope is already provided by a class. And then there's the impossibility of a C-like pointer to byte. Names related to C-API have mnemonic value, though, in translation. Maybe "static void PyFloat.pack4(double, ByteBuffer, boolean)" would do the trick. It makes sense for JyNI to supply it by the exact C API name, and all other API that C extensions are likely to use. Jeff Allen -------------- next part -------------- An HTML attachment was scrubbed... URL: From nas-python at arctrix.com Mon Nov 19 17:03:19 2018 From: nas-python at arctrix.com (Neil Schemenauer) Date: Mon, 19 Nov 2018 16:03:19 -0600 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: <20181119104422.46d4bb29@fsol> References: <20181119104422.46d4bb29@fsol> Message-ID: <20181119220319.nzbpaglcatwbj2uj@python.ca> On 2018-11-19, Antoine Pitrou wrote: > There are important use cases for the C API where it is desired to have > fast type-specific access to Python objects such as tuples, ints, > strings, etc. This is relied upon by modules such as _json and _pickle, > and third-party extensions as well. Thank you for pointing this out. The feedback from Stefan on what Cython would like (e.g. more access to functions that are currently "internal") is useful too. Keeping our dreams tied to reality is important. ;-P It seems to me that we can't "have our cake and eat it too". I.e. on the one hand hide CPython implementation internals but on the other hand allow extensions that want to take advantage of those internals to provide the best performance. Maybe we could have a multiple levels of API: A) maximum portability (Py_LIMITED_API) B) source portability (non-stable ABI, inlined functions) C) portability but poor performance on non-CPython VMs (PySequence_Fast_ITEMS, borrowed refs, etc) D) non-portability, CPython specific (access to more internals like Stefan was asking for). The extension would have to be re-implemented on each VM or provide a pure Python alternative. I think it would be nice if the extension module could explicitly choose which level of API it wants to use. It would be interesting to do a census on what extensions are out there. If they mostly fall into wanting level "C" then I think this API overhaul is not going to work out too well. Level C is mostly what we have now. No point in putting the effort into A and B if no one will use them. From chris.barker at noaa.gov Mon Nov 19 19:14:03 2018 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 19 Nov 2018 16:14:03 -0800 Subject: [Python-Dev] Need discussion for a PR about memory and objects In-Reply-To: <20181119103736.690ef0d7@fsol> References: <20181104104350.GA12859@xps> <20181104130314.GU3817@ando.pearwood.info> <20181119103736.690ef0d7@fsol> Message-ID: On Mon, Nov 19, 2018 at 1:41 AM Antoine Pitrou wrote: > I'd rather keep the reference to memory addressing than start doing car > analogies in the reference documentation. > I agree -- and any of the car analogies will probably be only valid in some jurisdictions, anyway. I think being a bit more explicit about what properties an ID has, and how the id() function works, and we may not need an anlogy at all, it's not that difficult a concept. And methions that in c_python the id is (currently) the memory address is a good idea for those that will wonder about it, and if there is enough explanation, folks that don't know about memory addresses will not get confused. This is what's in the docs now (3.8.0a0): """ Every object has an identity, a type and a value. An object?s identity never changes once it has been created; you may think of it as the object?s address in memory. The ?is? operator compares the identity of two objects; the id() function returns an integer representing its identity. **CPython implementation detail:** For CPython, id(x) is the memory address where x is stored. """ I suggest something like the following: """ Every object has an identity, a type and a value. An object?s identity uniquely identifies the object. It will remain the same as long as that object exists. No two different objects will have the same id at the same time, but the same id may be re-used for future objects once one has been deleted. The ?is? operator compares the identity of two objects; the id() function returns an integer representing its identity. ``id(object_a) == id(object_b)`` if and only if they are the same object. **CPython implementation detail:** For CPython, id(x) is the memory address where x is stored. """ -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Mon Nov 19 19:39:31 2018 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 19 Nov 2018 16:39:31 -0800 Subject: [Python-Dev] Need discussion for a PR about memory and objects In-Reply-To: References: <20181104104350.GA12859@xps> <20181104130314.GU3817@ando.pearwood.info> <20181119103736.690ef0d7@fsol> Message-ID: <3573cffd-ea04-3178-0cf3-f0353ce59fed@g.nevcal.com> On 11/19/2018 4:14 PM, Chris Barker via Python-Dev wrote: > On Mon, Nov 19, 2018 at 1:41 AM Antoine Pitrou > wrote: > > I'd rather keep the reference to memory addressing than start > doing car > analogies in the reference documentation. > > > I agree -- and any of the car analogies will probably be only valid in > some jurisdictions, anyway. > > I think being a bit more explicit about what properties an ID has, and > how the id() function works, and we may not need an anlogy at all, > it's not that difficult a concept. And methions that in c_python the > id is (currently) the memory address is a good idea for those that > will wonder about it, and if there is enough explanation, folks that > don't know about memory addresses will not get confused. > > This is what's in the docs now (3.8.0a0): > > """ > Every object has an identity, a type and a value. An object?s identity > never changes once it has been created; you may think of it as the > object?s address in memory. The ?is? operator compares the identity of > two objects; the id() function returns an integer representing its > identity. > > **CPython implementation detail:** For CPython, id(x) is the memory > address where x is stored. > """ > > I suggest something like the following: > > """ > Every object has an identity, a type and a value. An object?s identity > uniquely identifies the object. It will remain the same as long as > that object exists. No two different objects will have the same id at > the same time, but the same id may be re-used for future objects once > one has been deleted. The ?is? operator compares the identity of two > objects; the id() function returns an integer representing its > identity. ``id(object_a) == id(object_b)`` if and only if they are the > same object. > > **CPython implementation detail:** For CPython, id(x) is the memory > address where x is stored. > """ > Well re-worded in my opinion. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bcannon at gmail.com Mon Nov 19 21:17:03 2018 From: bcannon at gmail.com (Brett Cannon) Date: Mon, 19 Nov 2018 18:17:03 -0800 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: <20181119220319.nzbpaglcatwbj2uj@python.ca> References: <20181119104422.46d4bb29@fsol> <20181119220319.nzbpaglcatwbj2uj@python.ca> Message-ID: On Mon., Nov. 19, 2018, 14:04 Neil Schemenauer On 2018-11-19, Antoine Pitrou wrote: > > There are important use cases for the C API where it is desired to have > > fast type-specific access to Python objects such as tuples, ints, > > strings, etc. This is relied upon by modules such as _json and _pickle, > > and third-party extensions as well. > > Thank you for pointing this out. The feedback from Stefan on what > Cython would like (e.g. more access to functions that are currently > "internal") is useful too. Keeping our dreams tied to reality > is important. ;-P > > It seems to me that we can't "have our cake and eat it too". I.e. on > the one hand hide CPython implementation internals but on the other > hand allow extensions that want to take advantage of those internals > to provide the best performance. > No, but those are different APIs as well. E.g. no one is saying CPython has to do away with any of its API. What I and some others have said is the CPython API is too broad to be called "universal". > Maybe we could have a multiple levels of API: > > A) maximum portability (Py_LIMITED_API) > > B) source portability (non-stable ABI, inlined functions) > > C) portability but poor performance on non-CPython VMs > (PySequence_Fast_ITEMS, borrowed refs, etc) > I don't know own how doable that is as e.g. borrowed refs are not pleasant to simulate. > D) non-portability, CPython specific (access to more internals like > Stefan was asking for). The extension would have to be > re-implemented on each VM or provide a pure Python > alternative. > I think it would be nice if the extension module could explicitly > choose which level of API it wants to use. > Yes, and I thought we were working towards nesting our header files so you very clearly opted into your level of compatibility. In my head there's: - bare minimum, cross-VM, gets you FFI - CPython API for more performance that we're willing to maintain - Everything open for e.g. CPython with no compatibility guarantees Due note my first point isn't necessarily worrying about crazy performance to start. I would assume an alternative VM would help make up for this with a faster runtime where dropping into C is more about FFI than performance (we all know PyPy, for instance, wished people just wrote more Python code). Otherwise we're back to the idea of standardizing on some Cython solution to help make perfect easier without tying oneself to the C API (like Julia's FFI solution). > It would be interesting to do a census on what extensions are out > there. If they mostly fall into wanting level "C" then I think this > API overhaul is not going to work out too well. Level C is mostly > what we have now. No point in putting the effort into A and B if no > one will use them. It won't until someone can show benefits for switching. This is very much a chicken-and-egg problem. _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Nov 19 22:07:27 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 20 Nov 2018 14:07:27 +1100 Subject: [Python-Dev] Need discussion for a PR about memory and objects In-Reply-To: References: <20181104104350.GA12859@xps> <20181104130314.GU3817@ando.pearwood.info> <20181119103736.690ef0d7@fsol> Message-ID: <20181120030725.GB5054@ando.pearwood.info> Responding to a few points out of order. On Mon, Nov 19, 2018 at 04:14:03PM -0800, Chris Barker via Python-Dev wrote: > I think being a bit more explicit about what properties an ID has, and how > the id() function works, and we may not need an anlogy at all, it's not > that difficult a concept. [...] > I suggest something like the following: > > """ > Every object has an identity, a type and a value. An object?s identity > uniquely identifies the object. It will remain the same as long as that > object exists. No two different objects will have the same id at the same > time, but the same id may be re-used for future objects once one has been > deleted. The ?is? operator compares the identity of two objects; the id() > function returns an integer representing its identity. ``id(object_a) == > id(object_b)`` if and only if they are the same object. That looks good to me. However... > And methions that in c_python the id is > (currently) the memory address is a good idea for those that will wonder > about it, and if there is enough explanation, folks that don't know about > memory addresses will not get confused. I don't think that the problem is that people don't understand the "memory address as ID" implementation. I think the problem is people who can't separate the implementation from the interface. jThere is a small minority of developers, not just beginners, who insist that (paraphrasing) "the id() function returns the object's memory address", which leads others asking how to dereference the ID to get access to the object. E.g. I recently saw somebody on Reddit asking how to delete an object given its address in Python. I admit that this is just a minor point of confusion. We're not exactly innundated with dozens of requests for pointer arithmetic and PEEK/POKE commands *wink* but if we can reduce the confusion even further, that would be nice. We've had 20+ years of telling people that the C memory address of the object is an implementation detail, and some folks still don't get it. I'd like to that we reduce the emphasis on memory address in the docs. Perhaps all the way to zero :-) As you quoted, we currently we have a note in the docs that says: **CPython implementation detail:** For CPython, id(x) is the memory address where x is stored. I'd like to banish that note to the C-API docs (I'm not sure where), the FAQs (which apparently nobody ever reads *wink*) or perhaps just link to the source and let those who care read if for themselves. Instead, I'd like a more comprehensive comment directly in the description of id, something like: There are no guarantees made for the ID number except as above. For example, Python implementations are known to take IDs from a sequential series of integers (1, 2, 3, ...), or use arbitrary implementation-defined values (263459012). Any such integer is permitted so long as the ID is constant and unique for the lifespan of the object. -- Steve From ja.py at farowl.co.uk Tue Nov 20 03:49:16 2018 From: ja.py at farowl.co.uk (Jeff Allen) Date: Tue, 20 Nov 2018 08:49:16 +0000 Subject: [Python-Dev] Need discussion for a PR about memory and objects In-Reply-To: References: <20181104104350.GA12859@xps> <20181104130314.GU3817@ando.pearwood.info> <20181119103736.690ef0d7@fsol> Message-ID: On 20/11/2018 00:14, Chris Barker via Python-Dev wrote: > On Mon, Nov 19, 2018 at 1:41 AM Antoine Pitrou > wrote: > > I'd rather keep the reference to memory addressing than start > doing car > analogies in the reference documentation. > > > I agree -- and any of the car analogies will probably be only valid in > some jurisdictions, anyway. > > I think being a bit more explicit about what properties an ID has, and > how the id() function works, and we may not need an anlogy at all, > it's not that difficult a concept. And methions that in c_python the > id is (currently) the memory address is a good idea for those that > will wonder about it, and if there is enough explanation, folks that > don't know about memory addresses will not get confused. ... > I suggest something like the following: > > """ > Every object has an identity, a type and a value. An object?s identity > uniquely identifies the object. It will remain the same as long as > that object exists. No two different objects will have the same id at > the same time, but the same id may be re-used for future objects once > one has been deleted. The ?is? operator compares the identity of two > objects; the id() function returns an integer representing its > identity. ``id(object_a) == id(object_b)`` if and only if they are the > same object. > > **CPython implementation detail:** For CPython, id(x) is the memory > address where x is stored. > """ I agree that readers of a language reference should be able to manage without the analogies. I want to suggest that identity and id() are different things. The notion of identity in Python is what we access in phrases like "two different objects" and "the same object" in the text above. For me it defies definition, although one may make statements about it. A new object, wherever stored, is identically different from all objects preceding it. Any Python has to implement the concept of identity in order to refer, without confusion, to objects in structures and bound by names. In practice, Python need only identify an object while the object exists in the interpreter, and the object exists as long as something refers to it in this way. To this extent, the identifier in the implementation need not be unique for all time. The id() function returns an integer approximating this second idea. There is no mechanism to reach the object itself from the result, since it does not keep the object in existence, and worse, it may now be the id of a different object. In defining id(), I think it is confusing to imply that this number *is* the identity of the object that provided it. Jeff Allen -------------- next part -------------- An HTML attachment was scrubbed... URL: From encukou at gmail.com Tue Nov 20 04:32:37 2018 From: encukou at gmail.com (Petr Viktorin) Date: Tue, 20 Nov 2018 10:32:37 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <20181119104422.46d4bb29@fsol> <20181119115342.5a538507@fsol> <460193ad-6ae0-c54a-4335-8422268db84e@egenix.com> Message-ID: <7bac006f-3682-4418-d979-404407ed5659@gmail.com> On 11/19/18 12:14 PM, Victor Stinner wrote: > To design a new C API, I see 3 options: > > (1) add more functions to the existing Py_LIMITED_API > (2) "fork" the current public C API: remove functions and hide as much > implementation details as possible > (3) write a new C API from scratch, based on the current C API. > Something like #define newcapi_Object_GetItem PyObject_GetItem"? > Sorry, but "#undef " doesn't work. Only very few > functions are defined using "#define ...". > > I dislike (1) because it's too far from what is currently used in > practice. Moreover, I failed to find anyone who can explain me how the > C API is used in the wild, which functions are important or not, what > is the C API, etc. One big, complex project that now uses the limited API is PySide. They do some workarounds, but the limited API works. Here's a writeup of the troubles they have with it: https://github.com/pyside/pyside2-setup/blob/5.11/sources/shiboken2/libshiboken/pep384impl_doc.rst > I propose (2). We control how much changes we do at each milestone, > and we start from the maximum compatibility with current C API. Each > change can be discussed and experimented to define what is the C API, > what we want, etc. I'm working on this approach for 1 year, that's why > many discussions popped up around specific changes :-) I hope the new C API will be improvements (and clarifications) of the stable ABI, rather than a completely new thing. My ideal would be that Python 4.0 would keep the same API (with questionable things emulated & deprecated), but break *ABI*. The "new C API" would become that new stable ABI -- and this time it'd be something we'd really want to support, without reservations. One thing that did not work with the stable ABI was that it's "opt-out"; I think we can agree that a new one must be "opt-in" from the start. I'd also like the "new API" to be a *strict subset* of the stable ABI: if a new function needs to be added, it should be added to both. > Some people recently proposed (3) on python-dev. I dislike this option > because it starts by breaking the backward compatibility. It looks > like (1), but worse. The goal and the implementation are unclear to > me. > > -- > > Replacing PyDict_GetItem() (specialized call) with PyObject_Dict() > (generic API) is not part of my short term plan. I wrote it in the > roadmap, but as I wrote before, each change should be discusssed, > experimented, benchmarked, etc. > > Victor > Le lun. 19 nov. 2018 ? 12:02, M.-A. Lemburg a ?crit : >> >> On 19.11.2018 11:53, Antoine Pitrou wrote: >>> On Mon, 19 Nov 2018 11:28:46 +0100 >>> Victor Stinner wrote: >>>> Python internals rely on internals to implement further optimizations, >>>> than modifying an "immutable" tuple, bytes or str object, because you >>>> can do that at the C level. But I'm not sure that I would like 3rd >>>> party extensions to rely on such things. >>> >>> I'm not even talking about *modifying* tuples or str objects, I'm >>> talking about *accessing* their value without going through an abstract >>> API that does slot lookups, indirect function calls and object unboxing. >>> >>> For example, people may need a fast way to access the UTF-8 >>> representation of a unicode object. Without making indirect function >>> calls, and ideally without making a copy of the data either. How do >>> you do that using the generic C API? >> >> Something else you need to consider is creating instances of >> types, e.g. a tuple. In C you will have to be able to put >> values into the data structure before it is passed outside >> the function in order to build the tuple. >> >> If you remove this possibility to have to copy data all the >> time, losing the advantages of having a rich C API. >> -- >> Marc-Andre Lemburg >> eGenix.com >> >> Professional Python Services directly from the Experts (#1, Nov 19 2018) >>>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>>>> Python Database Interfaces ... http://products.egenix.com/ >>>>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ >> ________________________________________________________________________ >> >> ::: We implement business ideas - efficiently in both time and costs ::: >> >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg >> Registered at Amtsgericht Duesseldorf: HRB 46611 >> http://www.egenix.com/company/contact/ >> http://www.malemburg.com/ >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: https://mail.python.org/mailman/options/python-dev/vstinner%40redhat.com > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/encukou%40gmail.com > From njs at pobox.com Tue Nov 20 06:12:59 2018 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 20 Nov 2018 03:12:59 -0800 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: <7bac006f-3682-4418-d979-404407ed5659@gmail.com> References: <20181119104422.46d4bb29@fsol> <20181119115342.5a538507@fsol> <460193ad-6ae0-c54a-4335-8422268db84e@egenix.com> <7bac006f-3682-4418-d979-404407ed5659@gmail.com> Message-ID: On Tue, Nov 20, 2018 at 1:34 AM Petr Viktorin wrote: > > On 11/19/18 12:14 PM, Victor Stinner wrote: > > To design a new C API, I see 3 options: > > > > (1) add more functions to the existing Py_LIMITED_API > > (2) "fork" the current public C API: remove functions and hide as much > > implementation details as possible > > (3) write a new C API from scratch, based on the current C API. > > Something like #define newcapi_Object_GetItem PyObject_GetItem"? > > Sorry, but "#undef " doesn't work. Only very few > > functions are defined using "#define ...". > > > > I dislike (1) because it's too far from what is currently used in > > practice. Moreover, I failed to find anyone who can explain me how the > > C API is used in the wild, which functions are important or not, what > > is the C API, etc. > > One big, complex project that now uses the limited API is PySide. They > do some workarounds, but the limited API works. Here's a writeup of the > troubles they have with it: > https://github.com/pyside/pyside2-setup/blob/5.11/sources/shiboken2/libshiboken/pep384impl_doc.rst AFAIK the only two projects that use the limited API are PySide-generated modules and cffi-generated modules. I guess if there is some cleanup needed to remove stuff that snuck into the limited API, then that will be fine as long as you make sure they aren't used by either of those two projects. For the regular C API, I guess the PyPy folks, and especially Matti Picus, probably know more than anyone else about what parts are actually used in the wild, since they've spent way more time digging into real projects. (Do you want to know about the exact conditions in which real projects rely on being able to skip calling PyType_Ready on a statically allocated PyTypeObject? Matti knows...) > I hope the new C API will be improvements (and clarifications) of the > stable ABI, rather than a completely new thing. > My ideal would be that Python 4.0 would keep the same API (with > questionable things emulated & deprecated), but break *ABI*. The "new C > API" would become that new stable ABI -- and this time it'd be something > we'd really want to support, without reservations. We already break ABI with every feature release ? at least for the main ABI. The limited ABI supposedly doesn't, but probably does, and as noted above it has such limited use that it's probably still possible to fix any stuff that's leaked out accidentally. -n -- Nathaniel J. Smith -- https://vorpus.org From bgerrity at ucdavis.edu Tue Nov 20 04:11:54 2018 From: bgerrity at ucdavis.edu (Brendan Gerrity) Date: Tue, 20 Nov 2018 04:11:54 -0500 Subject: [Python-Dev] bpo-34532 status Message-ID: Just wanted to check on bpo-34532/pr#9039. The requested changes were submitted as a commit to the PR. Best, Brendan -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Tue Nov 20 13:02:29 2018 From: brett at python.org (Brett Cannon) Date: Tue, 20 Nov 2018 10:02:29 -0800 Subject: [Python-Dev] bpo-34532 status In-Reply-To: References: Message-ID: To provide context, https://bugs.python.org/issue34532 is about making the Python launcher on Windows not return an error condition when using `py -0` (and probably `py --list` and `py --list-paths`). On Tue, 20 Nov 2018 at 08:29, Brendan Gerrity wrote: > Just wanted to check on bpo-34532/pr#9039. The requested changes were > submitted as a commit to the PR. > > Best, > Brendan > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve.dower at python.org Tue Nov 20 16:29:50 2018 From: steve.dower at python.org (Steve Dower) Date: Tue, 20 Nov 2018 13:29:50 -0800 Subject: [Python-Dev] bpo-34532 status In-Reply-To: References: Message-ID: It's merged now. Sorry for not seeing the update (I get far too many github notifications/emails to be able to pay attention to them - pinging on the issue tracker generally works best). Cheers, Steve On 20Nov2018 1002, Brett Cannon wrote: > To provide context, https://bugs.python.org/issue34532 is about making > the Python launcher on Windows not return an error condition when using > `py -0` (and probably `py --list` and `py --list-paths`). > > On Tue, 20 Nov 2018 at 08:29, Brendan Gerrity > wrote: > > Just wanted to check on?bpo-34532/pr#9039. The requested changes > were submitted as a commit to the PR. > > Best, > Brendan From stefan at bytereef.org Tue Nov 20 17:04:39 2018 From: stefan at bytereef.org (Stefan Krah) Date: Tue, 20 Nov 2018 23:04:39 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <20181119115933.GA4918@bytereef.org> Message-ID: <20181120220439.GA2765@bytereef.org> On Mon, Nov 19, 2018 at 04:08:07PM +0100, Victor Stinner wrote: > Le lun. 19 nov. 2018 ? 13:18, Stefan Krah a ?crit : > > In practice people desperately *have* to use whatever is there, including > > functions with underscores that are not even officially in the C-API. > > > > I have to use _PyFloat_Pack* in order to be compatible with CPython, > > Oh, I never used this function. These functions are private (name > prefixed by "_") and excluded from the limited API. > > For me, the limited API should be functions available on all Python > implementations. Does it make sense to provide PyFloat_Pack4() in > MicroPython, Jython, IronPython and PyPy? Or is it something more > specific to CPython? I don't know the answer. If yes, open an issue to > propose to make this function public? It depends on what the goal is: If PyPy wants to be able to use as many C extensions as possible, then yes. The function is just one example of what people have to use to be 100% compatible with CPython (or copy these functions and maintain them ...). Intuitively, it should probably not be part of a limited API, but I never quite understood the purpose of this API, because I regularly need any function that I can get my hands on. > > I need PyUnicode_KIND() > > IMHO this one should not be part of the public API. The only usage > would be to micro-optimize, but such API is very specific to one > Python implementation. For example, PyPy doesn't use "compact string" > but UTF-8 internally. If you use PyUnicode_KIND(), your code becomes > incompatible with PyPy. > > What is your use case? Reading typed strings directly into an array with minimal overhead. > I would prefer to expose the "_PyUnicodeWriter" API than PyUnicode_KIND(). > > > need PyUnicode_AsUTF8AndSize(), > > Again, that's a micro-optimization and it's very specific to CPython: > result cached in the "immutable" str object. I don't want to put it in > a public API. PyUnicode_AsUTF8String() is better since it doesn't > require an internal cache. > > > I *wish* there were PyUnicode_AsAsciiAndSize(). > > PyUnicode_AsASCIIString() looks good to me. Sadly, it doesn't return > the length, but usually the length is not needed. Yes, these are all just examples. It's also very useful to be able to do PyLong_Type.tp_as_number->nb_multiply or grab as_integer_ratio from the float PyMethodDef. The latter two cases are for speed reasons but also because sometimes you *don't* want a method from a subclass (Serhiy was very good in finding corner cases :-). Most C modules that I've seen have some internals. Psycopg2: PyDateTime_DELTA_GET_MICROSECONDS PyDateTime_DELTA_GET_DAYS PyDateTime_DELTA_GET_SECONDS PyList_GET_ITEM Bytes_GET_SIZE Py_BEGIN_ALLOW_THREADS Py_END_ALLOW_THREADS floatobject.h and longintrepr.h are also popular. Stefan Krah From vstinner at redhat.com Tue Nov 20 17:17:05 2018 From: vstinner at redhat.com (Victor Stinner) Date: Tue, 20 Nov 2018 23:17:05 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: <20181120220439.GA2765@bytereef.org> References: <20181119115933.GA4918@bytereef.org> <20181120220439.GA2765@bytereef.org> Message-ID: Le mar. 20 nov. 2018 ? 23:08, Stefan Krah a ?crit : > Intuitively, it should probably not be part of a limited API, but I never > quite understood the purpose of this API, because I regularly need any > function that I can get my hands on. > (...) > Reading typed strings directly into an array with minimal overhead. IMHO performance and hiding implementation details are exclusive. You should either use the C API with impl. details for best performances, or use a "limited" C API for best compatibility. Since I would like to not touch the C API with impl. details, you can imagine to have two compilation modes: one for best performances on CPython, one for best compatibility (ex: compatible with PyPy). I'm not sure how the "compilation mode" will be selected. Victor From v+python at g.nevcal.com Tue Nov 20 21:02:57 2018 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 20 Nov 2018 18:02:57 -0800 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <20181119115933.GA4918@bytereef.org> <20181120220439.GA2765@bytereef.org> Message-ID: On 11/20/2018 2:17 PM, Victor Stinner wrote: > Le mar. 20 nov. 2018 ? 23:08, Stefan Krah a ?crit : >> Intuitively, it should probably not be part of a limited API, but I never >> quite understood the purpose of this API, because I regularly need any >> function that I can get my hands on. >> (...) >> Reading typed strings directly into an array with minimal overhead. > IMHO performance and hiding implementation details are exclusive. You > should either use the C API with impl. details for best performances, > or use a "limited" C API for best compatibility. The "limited" C API concept would seem to be quite sufficient for extensions that want to extend Python functionality to include new system calls, etc. (pywin32, pyMIDI, pySide, etc.) whereas the numpy and decimal might want best performance. > Since I would like to not touch the C API with impl. details, you can > imagine to have two compilation modes: one for best performances on > CPython, one for best compatibility (ex: compatible with PyPy). I'm > not sure how the "compilation mode" will be selected. The nicest interface from a compilation point of view would be to have two #include files: One to import the limited API, and one to import the performance API. Importing both should be allowed and should work. If you import the performance API, you have to learn more, and be more careful. Of course, there might be appropriate subsets of each API, having multiple include files, to avoid including everything, but that is a refinement. > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/v%2Bpython%40g.nevcal.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Nov 21 01:33:27 2018 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 20 Nov 2018 22:33:27 -0800 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <20181119115933.GA4918@bytereef.org> <20181120220439.GA2765@bytereef.org> Message-ID: On Tue, Nov 20, 2018 at 6:05 PM Glenn Linderman wrote: > > On 11/20/2018 2:17 PM, Victor Stinner wrote: >> IMHO performance and hiding implementation details are exclusive. You >> should either use the C API with impl. details for best performances, >> or use a "limited" C API for best compatibility. > > The "limited" C API concept would seem to be quite sufficient for extensions that want to extend Python functionality to include new system calls, etc. (pywin32, pyMIDI, pySide, etc.) whereas the numpy and decimal might want best performance. To make things more complicated: numpy and decimal are in a category of modules where if you want them to perform well on JIT-based VMs, then there's no possible C API that can achieve that. To get the benefits of a JIT on code using numpy or decimal, the JIT has to be able to see into their internals to do inlining etc., which means they can't be written in C at all [1], at which point the C API becomes irrelevant. It's not clear to me how this affects any of the discussion in CPython, since supporting JITs might not be part of the goal of a new C API, and I'm not sure how many packages fall between the numpy/decimal side and the pure-ffi side. -n [1] Well, there's also the option of teaching your Python JIT to handle LLVM bitcode as a source language, which is the approach that Graal is experimenting with. It seems completely wacky to me to hope you could write a C API emulation layer like PyPy's cpyext, and compile that + C extension code to LLVM bitcode, translate the LLVM bitcode to JVM bytecode, inline the whole mess into your Python JIT, and then fold everything away to produce something reasonable. But I could be wrong, and Oracle is throwing a lot of money at Graal so I guess we'll find out. -- Nathaniel J. Smith -- https://vorpus.org From v+python at g.nevcal.com Wed Nov 21 02:28:44 2018 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 20 Nov 2018 23:28:44 -0800 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: References: <20181119115933.GA4918@bytereef.org> <20181120220439.GA2765@bytereef.org> Message-ID: <0f594899-192c-a6cc-7f91-ddd2772b6187@g.nevcal.com> On 11/20/2018 10:33 PM, Nathaniel Smith wrote: > On Tue, Nov 20, 2018 at 6:05 PM Glenn Linderman wrote: >> On 11/20/2018 2:17 PM, Victor Stinner wrote: >>> IMHO performance and hiding implementation details are exclusive. You >>> should either use the C API with impl. details for best performances, >>> or use a "limited" C API for best compatibility. >> The "limited" C API concept would seem to be quite sufficient for extensions that want to extend Python functionality to include new system calls, etc. (pywin32, pyMIDI, pySide, etc.) whereas the numpy and decimal might want best performance. > To make things more complicated: numpy and decimal are in a category > of modules where if you want them to perform well on JIT-based VMs, > then there's no possible C API that can achieve that. To get the > benefits of a JIT on code using numpy or decimal, the JIT has to be > able to see into their internals to do inlining etc., which means they > can't be written in C at all [1], at which point the C API becomes > irrelevant. > > It's not clear to me how this affects any of the discussion in > CPython, since supporting JITs might not be part of the goal of a new > C API, and I'm not sure how many packages fall between the > numpy/decimal side and the pure-ffi side. > > -n > > [1] Well, there's also the option of teaching your Python JIT to > handle LLVM bitcode as a source language, which is the approach that > Graal is experimenting with. It seems completely wacky to me to hope > you could write a C API emulation layer like PyPy's cpyext, and > compile that + C extension code to LLVM bitcode, translate the LLVM > bitcode to JVM bytecode, inline the whole mess into your Python JIT, > and then fold everything away to produce something reasonable. But I > could be wrong, and Oracle is throwing a lot of money at Graal so I > guess we'll find out. > Interesting, thanks for the introduction to wacky. I was quite content with the idea that numpy, and other modules that would choose to use the unlimited API, would be sacrificing portability to non-CPython implementations... except by providing a Python equivalent (decimal, and some others do that, IIRC). Regarding JIT in general, though, it would seem that "precompiled" extensions like numpy would not need to be re-compiled by the JIT. But if it does, then the JIT better understand/support C syntax, but JVM JITs probably don't! so that leads to the scenario you describe. -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Nov 21 06:08:06 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 21 Nov 2018 12:08:06 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) References: <20181119115933.GA4918@bytereef.org> <20181120220439.GA2765@bytereef.org> Message-ID: <20181121120806.65c9fbcf@fsol> On Tue, 20 Nov 2018 23:17:05 +0100 Victor Stinner wrote: > Le mar. 20 nov. 2018 ? 23:08, Stefan Krah a ?crit : > > Intuitively, it should probably not be part of a limited API, but I never > > quite understood the purpose of this API, because I regularly need any > > function that I can get my hands on. > > (...) > > Reading typed strings directly into an array with minimal overhead. > > IMHO performance and hiding implementation details are exclusive. You > should either use the C API with impl. details for best performances, > or use a "limited" C API for best compatibility. > > Since I would like to not touch the C API with impl. details, you can > imagine to have two compilation modes: one for best performances on > CPython, one for best compatibility (ex: compatible with PyPy). I'm > not sure how the "compilation mode" will be selected. You mean the same API can compile to two different things depending on a configuration? I expect it to be error-prone. For example, let's suppose I want to compile in a given mode, but I also use Numpy's C API. Will the compile mode "leak" to Numpy as well? What if a third-party header includes "Python.h" before I do the "#define" that's necessary? Regards Antoine. From vstinner at redhat.com Wed Nov 21 06:52:34 2018 From: vstinner at redhat.com (Victor Stinner) Date: Wed, 21 Nov 2018 12:52:34 +0100 Subject: [Python-Dev] Experiment an opt-in new C API for Python? (leave current API unchanged) In-Reply-To: <20181121120806.65c9fbcf@fsol> References: <20181119115933.GA4918@bytereef.org> <20181120220439.GA2765@bytereef.org> <20181121120806.65c9fbcf@fsol> Message-ID: Le mer. 21 nov. 2018 ? 12:11, Antoine Pitrou a ?crit : > You mean the same API can compile to two different things depending on > a configuration? Yes, my current plan is to keep #include but have an opt-in define to switch to the new C API. > I expect it to be error-prone. For example, let's suppose I want to > compile in a given mode, but I also use Numpy's C API. Will the > compile mode "leak" to Numpy as well? For example, if we continue to use Py_LIMITED_API: I don't think that Numpy currently uses #ifdef Py_LIMITED_API, nor plans to do that. If we add a new define (ex: my current proof-of-concept uses Py_NEWCAPI), we can make sure that it's not already used by Numpy :-) > What if a third-party header > includes "Python.h" before I do the "#define" that's necessary? IMHO the define should be added by distutils directly, using -D in compiler flags. I wouldn't suggest: #define #include But the two APIs should diverge, so your C extension should also use a define to decide to use the old or the new API. So something will break if you mess up in the compilation :-) Victor From mcepl at cepl.eu Wed Nov 21 09:38:04 2018 From: mcepl at cepl.eu (=?UTF-8?Q?Mat=C4=9Bj?= Cepl) Date: Wed, 21 Nov 2018 15:38:04 +0100 Subject: [Python-Dev] Missing functions [Was: Re: Experiment an opt-in new C API for Python? (leave current API unchanged)] References: <20181119115933.GA4918@bytereef.org> Message-ID: On 2018-11-19, 11:59 GMT, Stefan Krah wrote: > In practice people desperately *have* to use whatever is > there, including functions with underscores that are not even > officially in the C-API. Yes, there are some functions which evaporated and I have never heard a reason why and how I am supposed to overcome their removal. E.g., when porting M2Crypto to Python3 I had to reimplement my own (bad) version of `FILE* PyFile_AsFile(PyObject *pyfile)` function (https://is.gd/tgQGDw). I think it is obvious why it is necessary for C bindings, and I have never found a way how to get the underlying FILE handle from the Python File object properly. Just my ?0.02. Mat?j -- https://matej.ceplovi.cz/blog/, Jabber: mcepl at ceplovi.cz GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8 All of us could take a lesson from the weather. It pays no attention to criticism. -- somewhere on the Intenret From benjamin at python.org Wed Nov 21 09:54:19 2018 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 21 Nov 2018 09:54:19 -0500 Subject: [Python-Dev] =?utf-8?q?Missing_functions_=5BWas=3A_Re=3A_Experim?= =?utf-8?q?ent_an_opt-in_new_C_API_for_Python=3F_=28leave_current_API_unch?= =?utf-8?q?anged=29=5D?= In-Reply-To: References: <20181119115933.GA4918@bytereef.org> Message-ID: <7d26c00b-7cac-48c4-8cbd-807448c1df7f@sloti1d3t01> On Wed, Nov 21, 2018, at 06:53, Mat?j Cepl wrote: > On 2018-11-19, 11:59 GMT, Stefan Krah wrote: > > In practice people desperately *have* to use whatever is > > there, including functions with underscores that are not even > > officially in the C-API. > > Yes, there are some functions which evaporated and I have never > heard a reason why and how I am supposed to overcome their > removal. E.g., when porting M2Crypto to Python3 I had to > reimplement my own (bad) version of `FILE* > PyFile_AsFile(PyObject *pyfile)` function > (https://is.gd/tgQGDw). I think it is obvious why it is > necessary for C bindings, and I have never found a way how to > get the underlying FILE handle from the Python File object > properly. In Python 3, there is no underlying FILE* because the io module is implemented using fds directly rather than C stdio. From vstinner at redhat.com Wed Nov 21 10:07:12 2018 From: vstinner at redhat.com (Victor Stinner) Date: Wed, 21 Nov 2018 16:07:12 +0100 Subject: [Python-Dev] Missing functions [Was: Re: Experiment an opt-in new C API for Python? (leave current API unchanged)] In-Reply-To: References: <20181119115933.GA4918@bytereef.org> Message-ID: Please open a bug report once you have such issue ;-) Victor Le mer. 21 nov. 2018 ? 15:56, Mat?j Cepl a ?crit : > > On 2018-11-19, 11:59 GMT, Stefan Krah wrote: > > In practice people desperately *have* to use whatever is > > there, including functions with underscores that are not even > > officially in the C-API. > > Yes, there are some functions which evaporated and I have never > heard a reason why and how I am supposed to overcome their > removal. E.g., when porting M2Crypto to Python3 I had to > reimplement my own (bad) version of `FILE* > PyFile_AsFile(PyObject *pyfile)` function > (https://is.gd/tgQGDw). I think it is obvious why it is > necessary for C bindings, and I have never found a way how to > get the underlying FILE handle from the Python File object > properly. > > Just my ?0.02. > > Mat?j > -- > https://matej.ceplovi.cz/blog/, Jabber: mcepl at ceplovi.cz > GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8 > > All of us could take a lesson from the weather. It pays no > attention to criticism. > -- somewhere on the Intenret > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/vstinner%40redhat.com From mcepl at cepl.eu Wed Nov 21 10:11:54 2018 From: mcepl at cepl.eu (=?UTF-8?Q?Mat=C4=9Bj?= Cepl) Date: Wed, 21 Nov 2018 16:11:54 +0100 Subject: [Python-Dev] Missing functions [Was: Re: Experiment an opt-in new C API for Python? (leave current API unchanged)] References: <20181119115933.GA4918@bytereef.org> <7d26c00b-7cac-48c4-8cbd-807448c1df7f@sloti1d3t01> Message-ID: On 2018-11-21, 14:54 GMT, Benjamin Peterson wrote: > In Python 3, there is no underlying FILE* because the io > module is implemented using fds directly rather than C stdio. OK, so the proper solution is to kill all functions which expect FILE, and if you are anal retentive about stability of API, then you have to fake it by creating FILE structure around the underlying fd handler as I did in M2Crypto, right? Best, Mat?j -- https://matej.ceplovi.cz/blog/, Jabber: mcepl at ceplovi.cz GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8 If you have a problem and you think awk(1) is the solution, then you have two problems. -- David Tilbrook (at least 1989, source of the later famous jwz rant on regular expressions). From encukou at gmail.com Thu Nov 22 05:13:00 2018 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 22 Nov 2018 11:13:00 +0100 Subject: [Python-Dev] Missing functions [Was: Re: Experiment an opt-in new C API for Python? (leave current API unchanged)] In-Reply-To: References: <20181119115933.GA4918@bytereef.org> <7d26c00b-7cac-48c4-8cbd-807448c1df7f@sloti1d3t01> Message-ID: <23f8b8b8-ce78-bc79-bb33-41b31a77f376@gmail.com> On 11/21/18 4:11 PM, Mat?j Cepl wrote: > On 2018-11-21, 14:54 GMT, Benjamin Peterson wrote: >> In Python 3, there is no underlying FILE* because the io >> module is implemented using fds directly rather than C stdio. > > OK, so the proper solution is to kill all functions which expect > FILE Indeed. This has another side to it: there are file-like objects that aren't backed by FILE*. In most case, being a "real" file is an unnecessary distinction, like that between the old `int` vs. `long`. "Fits in the machine register" is a detail from a level below Python, and so is "the kernel treats this as a file". Of course, this is not how C libraries work -- so, sadly, it makes wrappers harder to write. And a perfect solution might require adding more generic I/O to the C library. > and if you are anal retentive about stability of API, then > you have to fake it by creating FILE structure around the > underlying fd handler as I did in M2Crypto, right? Yes, AFAIK that is the least bad solution. I did something very similar here: https://github.com/encukou/py3c/blob/master/include/py3c/fileshim.h From mcepl at cepl.eu Thu Nov 22 06:43:13 2018 From: mcepl at cepl.eu (=?UTF-8?Q?Mat=C4=9Bj?= Cepl) Date: Thu, 22 Nov 2018 12:43:13 +0100 Subject: [Python-Dev] Missing functions [Was: Re: Experiment an opt-in new C API for Python? (leave current API unchanged)] References: <20181119115933.GA4918@bytereef.org> <7d26c00b-7cac-48c4-8cbd-807448c1df7f@sloti1d3t01> <23f8b8b8-ce78-bc79-bb33-41b31a77f376@gmail.com> Message-ID: On 2018-11-22, 10:13 GMT, Petr Viktorin wrote: > Yes, AFAIK that is the least bad solution. > I did something very similar here: > https://github.com/encukou/py3c/blob/master/include/py3c/fileshim.h Thank you. Mat?j From vstinner at redhat.com Fri Nov 23 07:58:24 2018 From: vstinner at redhat.com (Victor Stinner) Date: Fri, 23 Nov 2018 13:58:24 +0100 Subject: [Python-Dev] General concerns about C API changes In-Reply-To: References: <62A87B15-0F0E-4242-BF44-9C58FE890D8B@gmail.com> Message-ID: Le dim. 18 nov. 2018 ? 17:54, Stefan Behnel a ?crit : > It's also slower to compile, given that function inlining happens at a much > later point in the compiler pipeline than macro expansion. The C compiler > won't even get to see macros in fact, whereas whether to inline a function > or not is a dedicated decision during the optimisation phase based on > metrics collected in earlier stages. For something as ubiquitous as > Py_INCREF/Py_DECREF, it might even be visible in the compilation times. I ran a benchmark: there is no significant slowdown (+4 seconds, 6% slower, in the worst case). https://bugs.python.org/issue35059#msg330316 > Now imagine that you have an inline function that executes several > Py_INCREF/Py_DECREF call cycles, and the C compiler happens to slightly > overestimate the weights of these two. Then it might end up deciding > against inlining the function now, whereas it previously might have decided > for it since it was able to see the exact source code expanded from the > macros. I think that's what Raymond meant with his concerns regarding > changing macros into inline functions. C compilers might be smart enough to > always inline CPython's new inline functions themselves, but the style > change can still have unexpected transitive impacts on code that uses them. I ran the performance benchmark suite to compare C macros to static inline functions: there is no significant impact on performance. https://bugs.python.org/issue35059#msg330302 > I agree with Raymond that as long as there is no clear gain in this code > churn, we should not underestimate the risk of degarding code on user side. I don't understand how what you mean with "degarding code on user side". If you are talking about performance, again, my changes have no significant impact on performance (not on compilation time nor runtime performance). > "there is no clear gain in this code churn" There are multiple advantages: * Better development and debugging experience: tools understand inlined functions much better than C macros: gdb, Linux perf, etc. * Better API: arguments now have a type and the function has a return type. In practice, some macros still cast their argument to PyObject* to not introduce new compiler warnings in Python 3.8. For example, even if Py_INCREF() is documented (*) as a function expecting PyObject*, it accepts any pointer type (PyTupleObject*, PyUnicodeObject*, etc.). Technically, it also accepts PyObject** which is a bug, but that's a different story ;-) * Much better code, just plain regular C. C macros are ugly: "do { ... } while (0)" workaround, additional parenthesis around each argument, strange "expr1, expr2" syntax of "macro expression" which returns a value (inline function just uses regular "return" and ";" at the end of instructions), strange indentation, etc. * No more "macro pitfals": https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html * Local variables no longer need a magic name to avoid risk of name conflict, and have a clearly defined scope. Py_DECREF() and _Py_XINCREF() no longer need a local variable since it's argument already has a clearly defined type: PyObject*. I introduced a new variable in _Py_Dealloc() to fix a possible race condition. Previously, the variable was probably avoided because it's tricky use variables in macros. * #ifdef can now be used inside the inline function: it makes the code easier to understand. * etc. Are you aware that Python had macros like: #define _Py_REF_DEBUG_COMMA , #define _Py_CHECK_REFCNT(OP) /* a semicolon */; I let you judge the quality of this macro: #define _Py_NewReference(op) ( \ _Py_INC_TPALLOCS(op) _Py_COUNT_ALLOCS_COMMA \ _Py_INC_REFTOTAL _Py_REF_DEBUG_COMMA \ Py_REFCNT(op) = 1) Is it an expression? Can it be used in "if (test) _Py_NewReference(op);"? It doesn't use the "do { ... } while (0)" protection against macro pitfals. (*) Py_INCREF doc: https://docs.python.org/dev/c-api/refcounting.html#c.Py_INCREF Victor From armin.rigo at gmail.com Fri Nov 23 08:15:27 2018 From: armin.rigo at gmail.com (Armin Rigo) Date: Fri, 23 Nov 2018 15:15:27 +0200 Subject: [Python-Dev] C API changes In-Reply-To: References: Message-ID: Hi Hugo, hi all, On Sun, 18 Nov 2018 at 22:53, Hugh Fisher wrote: > I suggest that for the language reference, use the license plate > or registration analogy to introduce "handle" and after that use > handle throughout. It's short, distinctive, and either will match > up with what the programmer already knows or won't clash if > or when they encounter handles elsewhere. FWIW, a "handle" is typically something that users of an API store and pass around, and which can be used to do all operations on some object. It is whatever a specific implementation needs to describe references to an object. In the CPython C API, this is ``PyObject*``. I think that using "handle" for something more abstract is just going to create confusion. Also FWIW, my own 2 cents on the topic of changing the C API: let's entirely drop ``PyObject *`` and instead use more opaque handles---like a ``PyHandle`` that is defined as a pointer-sized C type but is not actually directly a pointer. The main difference this would make is that the user of the API cannot dereference anything from the opaque handle, nor directly compare handles with each other to learn about object identity. They would work exactly like Windows handles or POSIX file descriptors. These handles would be returned by C API calls, and would need to be closed when no longer used. Several different handles may refer to the same object, which stays alive for at least as long as there are open handles to it. Doing it this way would untangle the notion of objects from their actual implementation. In CPython objects would internally use reference counting, a handle is really just a PyObject pointer in disguise, and closing a handle decreases the reference counter. In PyPy we'd have a global table of "open objects", and a handle would be an index in that table; closing a handle means writing NULL into that table entry. No emulated reference counting needed: we simply use the existing GC to keep alive objects that are referenced from one or more table entries. The cost is limited to a single indirection. The C API would change a lot, so it's not reasonable to do that in the CPython repo. But it could be a third-party project, attempting to define an API like this and implement it well on top of both CPython and PyPy. IMHO this might be a better idea than just changing the API of functions defined long ago to make them more regular (e.g. stop returning borrowed references); by now this would mostly mean creating more work for the PyPy team to track and adapt to the changes, with no real benefits. A bient?t, Armin. From status at bugs.python.org Fri Nov 23 12:10:08 2018 From: status at bugs.python.org (Python tracker) Date: Fri, 23 Nov 2018 18:10:08 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20181123171008.15C765657E@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2018-11-16 - 2018-11-23) Python tracker at https://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 6865 ( +2) closed 40187 (+33) total 47052 (+35) Open issues with patches: 2735 Issues opened (27) ================== #35148: cannot activate a venv environment on a Swiss German windows https://bugs.python.org/issue35148 reopened by vinay.sajip #35267: reproducible deadlock with multiprocessing.Pool https://bugs.python.org/issue35267 opened by dzhu #35268: Windows asyncio reading continously stdin and stdout Stockfish https://bugs.python.org/issue35268 opened by Cezary.Wagner #35270: Cmd.complete does not handle cmd=None https://bugs.python.org/issue35270 opened by blueyed #35271: venv creates pyvenv.cfg with wrong home https://bugs.python.org/issue35271 opened by wvxvw #35272: sqlite3 get the connected database url https://bugs.python.org/issue35272 opened by midnio #35276: Document thread safety https://bugs.python.org/issue35276 opened by vstinner #35277: Upgrade bundled pip/setuptools https://bugs.python.org/issue35277 opened by dstufft #35278: [security] directory traversal in tempfile prefix https://bugs.python.org/issue35278 opened by Yusuke Endoh #35279: asyncio uses too many threads by default https://bugs.python.org/issue35279 opened by Vojt??ch Bo??ek #35280: Interactive shell overwrites history https://bugs.python.org/issue35280 opened by dingens #35281: Allow access to unittest.TestSuite tests https://bugs.python.org/issue35281 opened by lbenezriravin #35282: Add a return value to lib2to3.refactor.refactor_file and refac https://bugs.python.org/issue35282 opened by martindemello #35283: "threading._DummyThread" redefines "is_alive" but forgets "isA https://bugs.python.org/issue35283 opened by dmaurer #35284: Incomplete error handling in Python/compile.c:compiler_call() https://bugs.python.org/issue35284 opened by ZackerySpytz #35285: Make Proactor api extensible for reasonably any file handle https://bugs.python.org/issue35285 opened by Ignas Bra??i??kis #35286: wrong result for difflib.SequenceMatcher https://bugs.python.org/issue35286 opened by Boris Yang #35291: duplicate of memoryview from io.BufferedWriter leaks https://bugs.python.org/issue35291 opened by jmadden #35292: Make SimpleHTTPRequestHandler load mimetypes lazily https://bugs.python.org/issue35292 opened by steve.dower #35293: make doctest (Sphinx) emits a lot of warnings https://bugs.python.org/issue35293 opened by vstinner #35294: Race condition involving SocketServer.TCPServer https://bugs.python.org/issue35294 opened by Ruslan Dautkhanov #35295: Please clarify whether PyUnicode_AsUTF8AndSize() or PyUnicode_ https://bugs.python.org/issue35295 opened by Marcin Kowalczyk #35297: untokenize documentation is not correct https://bugs.python.org/issue35297 opened by csernazs #35298: Segfault in _PyObject_GenericGetAttrWithDict https://bugs.python.org/issue35298 opened by gilado #35299: LGHT0091: Duplicate symbol 'File:include_pyconfig.h' found https://bugs.python.org/issue35299 opened by neyuru #35300: Bug with memoization and mutable objects https://bugs.python.org/issue35300 opened by bolorsociedad #35301: python.exe crashes - lzma? https://bugs.python.org/issue35301 opened by jonathan-lp Most recent 15 issues with no replies (15) ========================================== #35301: python.exe crashes - lzma? https://bugs.python.org/issue35301 #35299: LGHT0091: Duplicate symbol 'File:include_pyconfig.h' found https://bugs.python.org/issue35299 #35298: Segfault in _PyObject_GenericGetAttrWithDict https://bugs.python.org/issue35298 #35297: untokenize documentation is not correct https://bugs.python.org/issue35297 #35295: Please clarify whether PyUnicode_AsUTF8AndSize() or PyUnicode_ https://bugs.python.org/issue35295 #35294: Race condition involving SocketServer.TCPServer https://bugs.python.org/issue35294 #35291: duplicate of memoryview from io.BufferedWriter leaks https://bugs.python.org/issue35291 #35285: Make Proactor api extensible for reasonably any file handle https://bugs.python.org/issue35285 #35284: Incomplete error handling in Python/compile.c:compiler_call() https://bugs.python.org/issue35284 #35282: Add a return value to lib2to3.refactor.refactor_file and refac https://bugs.python.org/issue35282 #35280: Interactive shell overwrites history https://bugs.python.org/issue35280 #35279: asyncio uses too many threads by default https://bugs.python.org/issue35279 #35270: Cmd.complete does not handle cmd=None https://bugs.python.org/issue35270 #35264: SSL Module build fails with OpenSSL 1.1.0 for Python 2.7 https://bugs.python.org/issue35264 #35263: Add None handling for get_saved() in IDLE https://bugs.python.org/issue35263 Most recent 15 issues waiting for review (15) ============================================= #35284: Incomplete error handling in Python/compile.c:compiler_call() https://bugs.python.org/issue35284 #35283: "threading._DummyThread" redefines "is_alive" but forgets "isA https://bugs.python.org/issue35283 #35282: Add a return value to lib2to3.refactor.refactor_file and refac https://bugs.python.org/issue35282 #35278: [security] directory traversal in tempfile prefix https://bugs.python.org/issue35278 #35277: Upgrade bundled pip/setuptools https://bugs.python.org/issue35277 #35270: Cmd.complete does not handle cmd=None https://bugs.python.org/issue35270 #35266: Add _PyPreConfig and rework _PyCoreConfig and _PyMainInterpret https://bugs.python.org/issue35266 #35264: SSL Module build fails with OpenSSL 1.1.0 for Python 2.7 https://bugs.python.org/issue35264 #35263: Add None handling for get_saved() in IDLE https://bugs.python.org/issue35263 #35259: Py_FinalizeEx unconditionally exists in Py_LIMITED_API https://bugs.python.org/issue35259 #35255: delete "How do I extract the downloaded documentation" section https://bugs.python.org/issue35255 #35252: test_functools dead code after FIXME https://bugs.python.org/issue35252 #35250: Minor parameter documentation mismatch for turtle https://bugs.python.org/issue35250 #35236: urllib.request.urlopen throws on some valid FTP files https://bugs.python.org/issue35236 #35226: mock.call equality surprisingly broken https://bugs.python.org/issue35226 Top 10 most discussed issues (10) ================================= #35134: Add a new Include/cpython/ subdirectory for the "CPython API" https://bugs.python.org/issue35134 14 msgs #35283: "threading._DummyThread" redefines "is_alive" but forgets "isA https://bugs.python.org/issue35283 13 msgs #34995: functools.cached_property does not maintain the wrapped method https://bugs.python.org/issue34995 9 msgs #35276: Document thread safety https://bugs.python.org/issue35276 5 msgs #22121: IDLE should start with HOME as the initial working directory https://bugs.python.org/issue22121 4 msgs #35021: Assertion failures in datetimemodule.c. https://bugs.python.org/issue35021 4 msgs #35266: Add _PyPreConfig and rework _PyCoreConfig and _PyMainInterpret https://bugs.python.org/issue35266 4 msgs #24658: open().write() fails on 2 GB+ data (OS X) https://bugs.python.org/issue24658 3 msgs #24746: doctest 'fancy diff' formats incorrectly strip trailing whites https://bugs.python.org/issue24746 3 msgs #33416: Add endline and endcolumn to every AST node https://bugs.python.org/issue33416 3 msgs Issues closed (33) ================== #9842: Document ... used in recursive repr of containers https://bugs.python.org/issue9842 closed by serhiy.storchaka #15245: ast.literal_eval fails on some literals https://bugs.python.org/issue15245 closed by serhiy.storchaka #18407: Fix compiler warnings in pythoncore for Win64 https://bugs.python.org/issue18407 closed by vstinner #18859: README.valgrind should mention --with-valgrind https://bugs.python.org/issue18859 closed by lukasz.langa #25438: document what codec PyMemberDef T_STRING decodes the char * as https://bugs.python.org/issue25438 closed by gregory.p.smith #28401: Don't support the PEP384 stable ABI in pydebug builds https://bugs.python.org/issue28401 closed by brett.cannon #28604: Exception raised by python3.5 when using en_GB locale https://bugs.python.org/issue28604 closed by vstinner #31115: Py3.6 threading/reference counting issues with `numexpr` https://bugs.python.org/issue31115 closed by serhiy.storchaka #31146: Docs: On non-public translations, language picker fallback to https://bugs.python.org/issue31146 closed by mdk #32281: bdist_rpm fails if no rpm command and rpmbuild is not in /usr/ https://bugs.python.org/issue32281 closed by ned.deily #32892: Remove specific constant AST types in favor of ast.Constant https://bugs.python.org/issue32892 closed by serhiy.storchaka #33211: lineno and col_offset are wrong on function definitions with d https://bugs.python.org/issue33211 closed by levkivskyi #33325: Optimize sequences of constants in the compiler https://bugs.python.org/issue33325 closed by serhiy.storchaka #34532: Windows launcher exits with error after listing available vers https://bugs.python.org/issue34532 closed by steve.dower #35035: Documentation for email.utils is named email.util.rst https://bugs.python.org/issue35035 closed by mdk #35059: Convert Py_INCREF() and PyObject_INIT() to inlined functions https://bugs.python.org/issue35059 closed by vstinner #35081: Move internal headers to Include/internal/ https://bugs.python.org/issue35081 closed by vstinner #35169: Improve error messages for assignment https://bugs.python.org/issue35169 closed by serhiy.storchaka #35200: Make the half-open range behaviour easier to teach https://bugs.python.org/issue35200 closed by mdk #35202: Remove unused imports in standard library https://bugs.python.org/issue35202 closed by terry.reedy #35206: [WIP] Add a new experimental _Py_CAPI2 API https://bugs.python.org/issue35206 closed by vstinner #35220: delete "how do I emulate os.kill" section in Windows FAQ https://bugs.python.org/issue35220 closed by deronnax #35221: Enhance venv activate commands readability https://bugs.python.org/issue35221 closed by rhettinger #35265: Internal C API: pass the memory allocator in a context https://bugs.python.org/issue35265 closed by vstinner #35269: A possible segfault involving a newly-created coroutine https://bugs.python.org/issue35269 closed by asvetlov #35273: 'eval' in generator expression behave different in dict from l https://bugs.python.org/issue35273 closed by rhettinger #35274: Running print("\x98") then freeze in Interpreter https://bugs.python.org/issue35274 closed by vstinner #35275: Reading umask (thread-safe) https://bugs.python.org/issue35275 closed by serhiy.storchaka #35287: wrong result for difflib.SequenceMatcher https://bugs.python.org/issue35287 closed by Boris Yang #35288: wrong result for difflib.SequenceMatcher https://bugs.python.org/issue35288 closed by Boris Yang #35289: wrong result for difflib.SequenceMatcher https://bugs.python.org/issue35289 closed by Boris Yang #35290: [FreeBSD] test_c_locale_coercion doesn't support new C.UTF-8 l https://bugs.python.org/issue35290 closed by vstinner #35296: Install Include/internal/ header files https://bugs.python.org/issue35296 closed by vstinner From stefan at bytereef.org Fri Nov 23 14:09:06 2018 From: stefan at bytereef.org (Stefan Krah) Date: Fri, 23 Nov 2018 20:09:06 +0100 Subject: [Python-Dev] C API changes Message-ID: <20181123190906.GA3114@bytereef.org> Armin Rigo wrote: > The C API would change a lot, so it's not reasonable to do that in the > CPython repo. But it could be a third-party project, attempting to > define an API like this and implement it well on top of both CPython > and PyPy. IMHO this might be a better idea than just changing the API > of functions defined long ago to make them more regular (e.g. stop > returning borrowed references); by now this would mostly mean creating > more work for the PyPy team to track and adapt to the changes, with no > real benefits. I like this idea. For example, when writing two versions of a C module, one that uses CPython internals indiscriminately and another that uses a "clean" API, such a third-party project would help. I'd also be more motivated to write two versions if I know that the project is supported by PyPy devs. Do you think that such an API might be faster than CFFI on PyPy? Stefan Krah From ncoghlan at gmail.com Sat Nov 24 12:27:34 2018 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 25 Nov 2018 03:27:34 +1000 Subject: [Python-Dev] C API changes In-Reply-To: References: Message-ID: On Fri, 23 Nov 2018 at 23:24, Armin Rigo wrote (regarding opaque "handles" in the C API): > The C API would change a lot, so it's not reasonable to do that in the > CPython repo. But it could be a third-party project, attempting to > define an API like this and implement it well on top of both CPython > and PyPy. IMHO this might be a better idea than just changing the API > of functions defined long ago to make them more regular (e.g. stop > returning borrowed references); by now this would mostly mean creating > more work for the PyPy team to track and adapt to the changes, with no > real benefits. And the nice thing about doing it as a shim is that it can be applied to *existing* versions of CPython, rather than having to wait for new ones. Node.js started switching over to doing things this way last year, and its a good way to go about it: https://medium.com/the-node-js-collection/n-api-next-generation-node-js-apis-for-native-modules-169af5235b06 While this would still be a difficult project to pursue, and would suffer from many of the same barriers to adoption as CPython's native stable ABI, it does offer a concrete benefit to 3rd party module authors: being able to create single wheel files that can be shared across multiple Python versions, rather than needing to be built separately for each one. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From remi.lapeyre at henki.fr Sat Nov 24 14:32:40 2018 From: remi.lapeyre at henki.fr (=?UTF-8?Q?R=C3=A9mi_Lapeyre?=) Date: Sat, 24 Nov 2018 14:32:40 -0500 Subject: [Python-Dev] Usage of tafile.copyfileobj Message-ID: Hi, I?m working on the tarfile module to add support for file objects whose size is not know beforehand (https://bugs.python.org/issue35227). In doing so, I need to adapt `tarfile.copyfileobj` to return the length of the file after it has been copied. Calling this function with `length=None` currently leads to data being copied but without adding the necessary padding. This seems weird to me, I do not understand why this would be needed and this behaviour is currently not used. This function is not documented in Python documentation so nobody is probably using it. Can I safely change `tarfile.copyfileobj` to make it write the padding when `length=None`? Thanks, R?mi From stefan_ml at behnel.de Sat Nov 24 15:15:36 2018 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 24 Nov 2018 21:15:36 +0100 Subject: [Python-Dev] C API changes In-Reply-To: References: Message-ID: Armin Rigo schrieb am 23.11.18 um 14:15: > In PyPy we'd have a global table of > "open objects", and a handle would be an index in that table; closing > a handle means writing NULL into that table entry. No emulated > reference counting needed: we simply use the existing GC to keep alive > objects that are referenced from one or more table entries. The cost > is limited to a single indirection. Couldn't this also be achieved via reference counting? Count only in C space, and delete the "open object" when the refcount goes to 0? Stefan From armin.rigo at gmail.com Sun Nov 25 00:15:03 2018 From: armin.rigo at gmail.com (Armin Rigo) Date: Sun, 25 Nov 2018 07:15:03 +0200 Subject: [Python-Dev] C API changes In-Reply-To: References: Message-ID: Hi Stefan, On Sat, 24 Nov 2018 at 22:17, Stefan Behnel wrote: > Couldn't this also be achieved via reference counting? Count only in C > space, and delete the "open object" when the refcount goes to 0? The point is to remove the need to return the same handle to C code if the object is the same one. This saves one of the largest costs of the C API emulation, which is looking up the object in a big dictionary to know if there is already a ``PyObject *`` that corresponds to it or not---for *all* objects that go from Python to C. Once we do that, then there is no need for a refcount any more. Yes, you could add your custom refcount code in C, but in practice it is rarely done. For example, with POSIX file descriptors, when you would need to "incref" a file descriptor, you instead use dup(). This gives you a different file descriptor which can be closed independently of the original one, but they both refer to the same file. A bient?t, Armin. From stefan_ml at behnel.de Sun Nov 25 03:13:18 2018 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 25 Nov 2018 09:13:18 +0100 Subject: [Python-Dev] C API changes In-Reply-To: References: Message-ID: Hi Armin, Armin Rigo schrieb am 25.11.18 um 06:15: > On Sat, 24 Nov 2018 at 22:17, Stefan Behnel wrote: >> Couldn't this also be achieved via reference counting? Count only in C >> space, and delete the "open object" when the refcount goes to 0? > > The point is to remove the need to return the same handle to C code if > the object is the same one. This saves one of the largest costs of > the C API emulation, which is looking up the object in a big > dictionary to know if there is already a ``PyObject *`` that > corresponds to it or not---for *all* objects that go from Python to C. Ok, got it. And since the handle is a simple integer, there's also no additional cost for memory allocation on the way out. > Once we do that, then there is no need for a refcount any more. Yes, > you could add your custom refcount code in C, but in practice it is > rarely done. For example, with POSIX file descriptors, when you would > need to "incref" a file descriptor, you instead use dup(). This gives > you a different file descriptor which can be closed independently of > the original one, but they both refer to the same file. Ok, then an INCREF() would be replaced by such a dup() call that creates and returns a new handle. In CPython, it would just INCREF and return the PyObject*, which is as fast as the current Py_INCREF(). For PyPy, however, that means that increfs become more costly. One of the outcomes of a recent experiment with tagged pointers for integers was that they make increfs and decrefs more expensive, and (IIUC) that reduced the overall performance quite visibly. In the case of pointers, it's literally just adding a tiny condition that makes this so much slower. In the case of handles, it would add a lookup and a reference copy in the handles array. That's way more costly already than just the simple condition. Now, it's unclear if this performance degredation is specific to CPython (where PyObject* is native), or if it would also apply to PyPy. But I guess the only way to find this out would be to try it. IIUC, the only thing that is needed is to replace Py_INCREF(obj); with obj = Py_NEWREF(obj); which CPython would implement as #define Py_NEWREF(obj) (Py_INCREF(obj), obj) Py_DECREF() would then just invalidate and clean up the handle under the hood. There are probably some places in user code where this would end up leaking a reference by accident because of unclean reference handling (it could overwrite the old handle in the case of a temporary INCREF/DECREF cycle), but it might still be enough for trying it out. We could definitely switch to this pattern in Cython (in fact, we already use such a NEWREF macro in a couple of places, since it's a common pattern). Overall, this seems like something that PyPy could try out as an experiment, by just taking a simple extension module and replacing all increfs with newref assignments. And obviously implementing the whole thing for the C-API, but IIUC, you might be able to tweak that into your cpyext wrapping layer somehow, without manually rewriting all C-API functions? Stefan From armin.rigo at gmail.com Mon Nov 26 00:37:37 2018 From: armin.rigo at gmail.com (Armin Rigo) Date: Mon, 26 Nov 2018 07:37:37 +0200 Subject: [Python-Dev] C API changes In-Reply-To: References: Message-ID: Hi, On Sun, 25 Nov 2018 at 10:15, Stefan Behnel wrote: > Overall, this seems like something that PyPy could try out as an > experiment, by just taking a simple extension module and replacing all > increfs with newref assignments. And obviously implementing the whole thing > for the C-API Just to be clear, I suggested making a new API, not just tweaking Py_INCREF() and hoping that all the rest works as it is. I'm skeptical about that. To start with, a ``Py_NEWREF()`` like you describe *will* lead people just renaming all ``Py_INCREF()`` to ``Py_NEWREF()`` ignoring the return value, because that's the easiest change and it would work fine on CPython. A bient?t, Armin. From erik.m.bray at gmail.com Mon Nov 26 03:53:32 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Mon, 26 Nov 2018 09:53:32 +0100 Subject: [Python-Dev] Ping on PR #8712 Message-ID: Hi folks, I've had a PR open for nearly 3 months now with no review at: https://github.com/python/cpython/pull/8712 I know everyone is overextended so normally I wouldn't fuss about it. But I would still like to remain committed to providing better Cygwin (and to a lesser extent, personally, MinGW) support in CPython. I have had a buildbot chugging along rather uselessly due to the blocker issue that the above PR fixes: https://buildbot.python.org/all/#/builders/164 Only when the above issue is fixed will it be possible to get some semi-useful builds and test runs on this buildbot. The issue that is fixed is really a general bug, it just happens to only affect builds on those platforms that implement a POSIX layer on top of Windows. Specifically, modules that are built into the libpython DLL are not linked properly. This is a regression that was introduced by https://bugs.python.org/issue30860 The fix I've proposed is simple and undisruptive on unaffected platforms. Thanks for having a look! From erik.m.bray at gmail.com Mon Nov 26 04:03:41 2018 From: erik.m.bray at gmail.com (E. Madison Bray) Date: Mon, 26 Nov 2018 10:03:41 +0100 Subject: [Python-Dev] C API changes In-Reply-To: References: Message-ID: On Fri, Nov 23, 2018 at 2:22 PM Armin Rigo wrote: > > Hi Hugo, hi all, > > On Sun, 18 Nov 2018 at 22:53, Hugh Fisher wrote: > > I suggest that for the language reference, use the license plate > > or registration analogy to introduce "handle" and after that use > > handle throughout. It's short, distinctive, and either will match > > up with what the programmer already knows or won't clash if > > or when they encounter handles elsewhere. > > FWIW, a "handle" is typically something that users of an API store and > pass around, and which can be used to do all operations on some > object. It is whatever a specific implementation needs to describe > references to an object. In the CPython C API, this is ``PyObject*``. > I think that using "handle" for something more abstract is just going > to create confusion. > > Also FWIW, my own 2 cents on the topic of changing the C API: let's > entirely drop ``PyObject *`` and instead use more opaque > handles---like a ``PyHandle`` that is defined as a pointer-sized C > type but is not actually directly a pointer. The main difference this > would make is that the user of the API cannot dereference anything > from the opaque handle, nor directly compare handles with each other > to learn about object identity. They would work exactly like Windows > handles or POSIX file descriptors. These handles would be returned by > C API calls, and would need to be closed when no longer used. Several > different handles may refer to the same object, which stays alive for > at least as long as there are open handles to it. Doing it this way > would untangle the notion of objects from their actual implementation. > In CPython objects would internally use reference counting, a handle > is really just a PyObject pointer in disguise, and closing a handle > decreases the reference counter. In PyPy we'd have a global table of > "open objects", and a handle would be an index in that table; closing > a handle means writing NULL into that table entry. No emulated > reference counting needed: we simply use the existing GC to keep alive > objects that are referenced from one or more table entries. The cost > is limited to a single indirection. +1 As another point of reference, if you're interested, I've been working lately on the special purpose computer algebra system GAP. It also uses an approach like this: Objects are referenced throughout via an opaque "Obj" type (which is really just a typedef of "Bag", the internal storage reference handle of its "GASMAN" garbage collector [1]). A nice benefit of this, along with the others discussed above, is that it has being relatively easy to replace the garbage collector in GAP--there are options for it to use Boehm-GC, as well as Julia's GC. GAP has its own problems, but it's relatively simple and has been inspiring to look at; I was coincidentally wondering just recently if there's anything Python could take from it (conversely, I'm trying to bring some things I've learned from Python to improve GAP...). [1] https://github.com/gap-system/gap/blob/master/src/gasman.c From stefan_ml at behnel.de Mon Nov 26 12:06:16 2018 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 26 Nov 2018 18:06:16 +0100 Subject: [Python-Dev] C API changes In-Reply-To: References: Message-ID: Armin Rigo schrieb am 26.11.18 um 06:37: > On Sun, 25 Nov 2018 at 10:15, Stefan Behnel wrote: >> Overall, this seems like something that PyPy could try out as an >> experiment, by just taking a simple extension module and replacing all >> increfs with newref assignments. And obviously implementing the whole thing >> for the C-API > > Just to be clear, I suggested making a new API, not just tweaking > Py_INCREF() and hoping that all the rest works as it is. I'm > skeptical about that. Oh, I'm not skeptical at all. I'm actually sure that it's not that easy. I would guess that such an automatic transformation should work in something like 70% of the cases. Another 25% should be trivial to fix manually, and the remaining 5% ? well. They can probably still be changed with some thinking and refactoring. That also involves cases where pointer equality is used to detect object identity. Having a macro for that might be a good idea. Overall, relatively easy. And therefore not unlikely to happen. The lower the bar, the more likely we will see adoption. Also note that explicit Py_INCREF() calls are actually not that common. I just checked and found only 465 calls in 124K lines of Cython generated C code for Cython itself, and 725 calls in 348K C lines of lxml. Not exactly a snap, but definitely not huge. All other objects originate from the C-API in one way or another, which you control. > To start with, a ``Py_NEWREF()`` like you describe *will* lead people > just renaming all ``Py_INCREF()`` to ``Py_NEWREF()`` ignoring the > return value, because that's the easiest change and it would work fine > on CPython. First of all, as long as Py_INCREF() is not going away, they probably won't change anything. Therefore, before we discuss how laziness will hinder the adoption, I would rather like to see an actual motivation for them to do it. And since this change seems to have zero advantages in CPython, but adds a tiny bit of complexity, I think it's now up to PyPy to show that this added complexity has an advantage that is large enough to motivates it. If you could come up with a prototype that demonstrates the advantage (or at least uncovers the problems we'd face), we could actually discuss about real solutions rather than uncertain ideas. Stefan From larry at hastings.org Mon Nov 26 19:08:55 2018 From: larry at hastings.org (Larry Hastings) Date: Mon, 26 Nov 2018 16:08:55 -0800 Subject: [Python-Dev] C API changes In-Reply-To: References: Message-ID: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> On 11/23/18 5:15 AM, Armin Rigo wrote: > Also FWIW, my own 2 cents on the topic of changing the C API: let's > entirely drop ``PyObject *`` and instead use more opaque > handles---like a ``PyHandle`` that is defined as a pointer-sized C > type but is not actually directly a pointer. The main difference this > would make is that the user of the API cannot dereference anything > from the opaque handle, nor directly compare handles with each other > to learn about object identity. They would work exactly like Windows > handles or POSIX file descriptors. Why would this be better than simply returning the pointer? Sure, it prevents ever dereferencing the pointer and messing with the object, it is true.? So naughty people would be prevented from messing with the object directly instead of using the API as they should.? But my understanding is that the implementation would be slightly slower--there'd be all that looking up objects based on handles, and managing the handle namespace too.? I'm not convinced the nice-to-have of "you can't dereference the pointer anymore" is worth this runtime overhead. Or maybe you have something pretty cheap in mind, e.g. "handle = pointer ^ 49"?? Or even "handle = pointer ^ (random odd number picked at startup)" to punish the extra-naughty? //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Mon Nov 26 20:26:09 2018 From: eric at trueblade.com (Eric V. Smith) Date: Mon, 26 Nov 2018 20:26:09 -0500 Subject: [Python-Dev] C API changes In-Reply-To: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> References: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> Message-ID: On 11/26/2018 7:08 PM, Larry Hastings wrote: > On 11/23/18 5:15 AM, Armin Rigo wrote: >> Also FWIW, my own 2 cents on the topic of changing the C API: let's >> entirely drop ``PyObject *`` and instead use more opaque >> handles---like a ``PyHandle`` that is defined as a pointer-sized C >> type but is not actually directly a pointer. The main difference this >> would make is that the user of the API cannot dereference anything >> from the opaque handle, nor directly compare handles with each other >> to learn about object identity. They would work exactly like Windows >> handles or POSIX file descriptors. > > Why would this be better than simply returning the pointer? Sure, it > prevents ever dereferencing the pointer and messing with the object, it > is true.? So naughty people would be prevented from messing with the > object directly instead of using the API as they should.? But my > understanding is that the implementation would be slightly > slower--there'd be all that looking up objects based on handles, and > managing the handle namespace too.? I'm not convinced the nice-to-have > of "you can't dereference the pointer anymore" is worth this runtime > overhead. > > Or maybe you have something pretty cheap in mind, e.g. "handle = pointer > ^ 49"?? Or even "handle = pointer ^ (random odd number picked at > startup)" to punish the extra-naughty? I thought the important part of the proposal was to have multiple PyHandles that point to the same PyObject (you couldn't "directly compare handles with each other to learn about object identity"). But I'll admit I'm not sure why this would be a win. Then of course they couldn't be regular pointers. Eric From python at mrabarnett.plus.com Mon Nov 26 22:40:22 2018 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 27 Nov 2018 03:40:22 +0000 Subject: [Python-Dev] C API changes In-Reply-To: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> References: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> Message-ID: On 2018-11-27 00:08, Larry Hastings wrote: > On 11/23/18 5:15 AM, Armin Rigo wrote: >> Also FWIW, my own 2 cents on the topic of changing the C API: let's >> entirely drop ``PyObject *`` and instead use more opaque >> handles---like a ``PyHandle`` that is defined as a pointer-sized C >> type but is not actually directly a pointer. The main difference this >> would make is that the user of the API cannot dereference anything >> from the opaque handle, nor directly compare handles with each other >> to learn about object identity. They would work exactly like Windows >> handles or POSIX file descriptors. > > Why would this be better than simply returning the pointer? Sure, it > prevents ever dereferencing the pointer and messing with the object, it > is true.? So naughty people would be prevented from messing with the > object directly instead of using the API as they should.? But my > understanding is that the implementation would be slightly > slower--there'd be all that looking up objects based on handles, and > managing the handle namespace too.? I'm not convinced the nice-to-have > of "you can't dereference the pointer anymore" is worth this runtime > overhead. > > Or maybe you have something pretty cheap in mind, e.g. "handle = pointer > ^ 49"?? Or even "handle = pointer ^ (random odd number picked at > startup)" to punish the extra-naughty? > An advantage would be that objects could be moved to reduce memory fragmentation. From njs at pobox.com Mon Nov 26 23:15:10 2018 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 26 Nov 2018 20:15:10 -0800 Subject: [Python-Dev] C API changes In-Reply-To: References: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> Message-ID: On Mon, Nov 26, 2018 at 6:12 PM Eric V. Smith wrote: > I thought the important part of the proposal was to have multiple > PyHandles that point to the same PyObject (you couldn't "directly > compare handles with each other to learn about object identity"). But > I'll admit I'm not sure why this would be a win. Then of course they > couldn't be regular pointers. Whenever PyPy passes an object from PyPy -> C, then it has to invent a "PyObject*" to represent the PyPy object. 0.1% of the time, the C code will use C pointer comparison to implement an "is" check on this PyObject*. But PyPy doesn't know which 0.1% of the time this will happen, so 100% of the time an object goes from PyPy -> C, PyPy has to check and update some global intern table to figure out whether this particular object has ever made the transition before and use the same PyObject*. 99.9% of the time, this is pure overhead, and it slows down one of *the* most common operations C extension code does. If C extensions checked object identity using some explicit operation like PyObject_Is() instead of comparing pointers, then PyPy could defer the expensive stuff until someone actually called PyObject_Is(). Note: numbers are made up and I have no idea how much overhead this actually adds. But I'm pretty sure this is the basic idea that Armin's talking about. -n -- Nathaniel J. Smith -- https://vorpus.org From vstinner at redhat.com Tue Nov 27 09:09:41 2018 From: vstinner at redhat.com (Victor Stinner) Date: Tue, 27 Nov 2018 15:09:41 +0100 Subject: [Python-Dev] C API changes In-Reply-To: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> References: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> Message-ID: Le mar. 27 nov. 2018 ? 01:13, Larry Hastings a ?crit : > (...) I'm not convinced the nice-to-have of "you can't dereference the pointer anymore" is worth this runtime overhead. About the general idea of a new C API. If you only look at CPython in release mode, there is no benefit. But you should consider the overall picture: * ability to distribute a single binary for CPython in release mode, CPython in debug mode, PyPy, and maybe some new more funky Python runtimes * better performance on PyPy The question is if we can implement new optimizations in CPython (like tagged pointer) which would move the overall performance impact to at least "not significant" (not slower, not faster), or maybe even to "faster". Note: Again, in my plan, the new C API would be an opt-in API. The old C API would remain unchanged and fully supported. So there is no impact on performance if you consider to use the old C API. Victor From steve.dower at python.org Tue Nov 27 12:12:31 2018 From: steve.dower at python.org (Steve Dower) Date: Tue, 27 Nov 2018 09:12:31 -0800 Subject: [Python-Dev] C API changes In-Reply-To: References: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> Message-ID: <99b32440-1c8a-8a92-7729-c037648fbbce@python.org> On 27Nov2018 0609, Victor Stinner wrote: > Note: Again, in my plan, the new C API would be an opt-in API. The old > C API would remain unchanged and fully supported. So there is no > impact on performance if you consider to use the old C API. This is one of the things that makes me think your plan is not feasible. I *hope* that remaining on the old C API eventually has a performance impact, since the whole point is to enable new optimizations that currently require tricky emulation to remain compatible with the old API. If we never have to add any emulation for the old API, we haven't added anything useful for the new one. Over time, the old C API's performance (not functionality) should degrade as the new C API's performance increases. If the increase isn't significantly better than the degradation, the whole project can be declared a failure, as we would have been better off leaving the API alone and not changing anything. But this is great discussion. Looking forward to seeing some of it turn into reality :) Cheers, Steve From greg at krypto.org Tue Nov 27 19:24:31 2018 From: greg at krypto.org (Gregory P. Smith) Date: Tue, 27 Nov 2018 16:24:31 -0800 Subject: [Python-Dev] C API changes In-Reply-To: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> References: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> Message-ID: On Mon, Nov 26, 2018 at 4:10 PM Larry Hastings wrote: > On 11/23/18 5:15 AM, Armin Rigo wrote: > > Also FWIW, my own 2 cents on the topic of changing the C API: let's > entirely drop ``PyObject *`` and instead use more opaque > handles---like a ``PyHandle`` that is defined as a pointer-sized C > type but is not actually directly a pointer. The main difference this > would make is that the user of the API cannot dereference anything > from the opaque handle, nor directly compare handles with each other > to learn about object identity. They would work exactly like Windows > handles or POSIX file descriptors. > > > Why would this be better than simply returning the pointer? Sure, it > prevents ever dereferencing the pointer and messing with the object, it is > true. So naughty people would be prevented from messing with the object > directly instead of using the API as they should. But my understanding is > that the implementation would be slightly slower--there'd be all that > looking up objects based on handles, and managing the handle namespace > too. I'm not convinced the nice-to-have of "you can't dereference the > pointer anymore" is worth this runtime overhead. > > Or maybe you have something pretty cheap in mind, e.g. "handle = pointer ^ > 49"? Or even "handle = pointer ^ (random odd number picked at startup)" to > punish the extra-naughty? > Heck, it'd be find if someones implementation (such as a simple shim for CPython's existing API) wants to internally keep a PyObject structure and have PyHandle's implementation just be a typecast from PyObject* to PyHandle. The real point is that a handle is opaque and cannot be depended on by any API _user_ as being a pointer. What it means behind the scenes of a given VM is left entirely up to the VM. When an API returns a handle, that is an implicit internal INCREF if a VM is reference counting. When code calls an API that consumes a handle by taking ownership of it for itself (Py_DECREF could be considered one of these if you have a Py_DECREF equivalent API) that means "I can no longer using this handle". Comparisons get documented as being invalid, pointing to the API to call for an identity check, but it is up to each implementation to decide if it wants to force the handles to be unique. Anyone depending on that behavior is being bad and should not be supported. -gps PS ... use C++ and you could actually make handle identity comparisons do the right thing... -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.underwood at gmail.com Wed Nov 28 05:28:19 2018 From: jonathan.underwood at gmail.com (Jonathan Underwood) Date: Wed, 28 Nov 2018 10:28:19 +0000 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? Message-ID: Hi, I have for sometime maintained the Python bindings to the LZ4 compression library[0, 1]: I am wondering if there is interest in having these bindings move to the standard library to sit alongside the gzip, lzma etc bindings? Obviously the code would need to be modified to fit the coding guidelines etc. I'm following the guidelines [2] and asking here first before committing any work to this endeavour, but if folks think this would be a useful addition, I would be willing to put in the hours to create a PR and continued maintenance. Would welcome any thoughts. Cheers, Jonathan. [0] https://github.com/python-lz4/python-lz4 [1] https://python-lz4.readthedocs.io/en/stable/ [2] https://devguide.python.org/stdlibchanges/ From solipsis at pitrou.net Wed Nov 28 11:43:43 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 28 Nov 2018 17:43:43 +0100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? References: Message-ID: <20181128174343.6bc8760b@fsol> On Wed, 28 Nov 2018 10:28:19 +0000 Jonathan Underwood wrote: > Hi, > > I have for sometime maintained the Python bindings to the LZ4 > compression library[0, 1]: > > I am wondering if there is interest in having these bindings move to > the standard library to sit alongside the gzip, lzma etc bindings? > Obviously the code would need to be modified to fit the coding > guidelines etc. Personally I would find it useful indeed. LZ4 is very attractive when (de)compression speed is a primary factor, for example when sending data over a fast network link or a fast local SSD. Another compressor worth including is Zstandard (by the same author as LZ4). Actually, Zstandard and LZ4 cover most of the (speed / compression ratio) range quite well. Informative graphs below: https://gregoryszorc.com/blog/2017/03/07/better-compression-with-zstandard/ Regards Antoine. From brett at python.org Wed Nov 28 12:51:57 2018 From: brett at python.org (Brett Cannon) Date: Wed, 28 Nov 2018 09:51:57 -0800 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: <20181128174343.6bc8760b@fsol> References: <20181128174343.6bc8760b@fsol> Message-ID: Are we getting to the point that we want a compresslib like hashlib if we are going to be adding more compression algorithms? On Wed, 28 Nov 2018 at 08:44, Antoine Pitrou wrote: > On Wed, 28 Nov 2018 10:28:19 +0000 > Jonathan Underwood wrote: > > Hi, > > > > I have for sometime maintained the Python bindings to the LZ4 > > compression library[0, 1]: > > > > I am wondering if there is interest in having these bindings move to > > the standard library to sit alongside the gzip, lzma etc bindings? > > Obviously the code would need to be modified to fit the coding > > guidelines etc. > > Personally I would find it useful indeed. LZ4 is very attractive > when (de)compression speed is a primary factor, for example when > sending data over a fast network link or a fast local SSD. > > Another compressor worth including is Zstandard (by the same author as > LZ4). Actually, Zstandard and LZ4 cover most of the (speed / > compression ratio) range quite well. Informative graphs below: > https://gregoryszorc.com/blog/2017/03/07/better-compression-with-zstandard/ > > Regards > > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Wed Nov 28 12:59:26 2018 From: mertz at gnosis.cx (David Mertz) Date: Wed, 28 Nov 2018 12:59:26 -0500 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> Message-ID: +1 to Brett's idea. It's hard to have a good mental model already of which compression algorithms are and are not in stdlib. A package to contain them all would help a lot. On Wed, Nov 28, 2018, 12:56 PM Brett Cannon Are we getting to the point that we want a compresslib like hashlib if we > are going to be adding more compression algorithms? > > On Wed, 28 Nov 2018 at 08:44, Antoine Pitrou wrote: > >> On Wed, 28 Nov 2018 10:28:19 +0000 >> Jonathan Underwood wrote: >> > Hi, >> > >> > I have for sometime maintained the Python bindings to the LZ4 >> > compression library[0, 1]: >> > >> > I am wondering if there is interest in having these bindings move to >> > the standard library to sit alongside the gzip, lzma etc bindings? >> > Obviously the code would need to be modified to fit the coding >> > guidelines etc. >> >> Personally I would find it useful indeed. LZ4 is very attractive >> when (de)compression speed is a primary factor, for example when >> sending data over a fast network link or a fast local SSD. >> >> Another compressor worth including is Zstandard (by the same author as >> LZ4). Actually, Zstandard and LZ4 cover most of the (speed / >> compression ratio) range quite well. Informative graphs below: >> >> https://gregoryszorc.com/blog/2017/03/07/better-compression-with-zstandard/ >> >> Regards >> >> Antoine. >> >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/brett%40python.org >> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/mertz%40gnosis.cx > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Nov 28 12:59:08 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 28 Nov 2018 18:59:08 +0100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> Message-ID: <20181128185908.0e7fac1d@fsol> On Wed, 28 Nov 2018 09:51:57 -0800 Brett Cannon wrote: > Are we getting to the point that we want a compresslib like hashlib if we > are going to be adding more compression algorithms? It may be useful as a generic abstraction wrapper for simple usage but some compression libraries have custom facilities that would still require a dedicated interface. For example, LZ4 has two formats: a raw format and a framed format. Zstandard allows you to pass a custom dictionary to optimize compression of small data. I believe lzma has many tunables. Regards Antoine. From greg at krypto.org Wed Nov 28 13:43:04 2018 From: greg at krypto.org (Gregory P. Smith) Date: Wed, 28 Nov 2018 10:43:04 -0800 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> Message-ID: On Wed, Nov 28, 2018 at 9:52 AM Brett Cannon wrote: > Are we getting to the point that we want a compresslib like hashlib if we > are going to be adding more compression algorithms? > Lets avoid the lib suffix when unnecessary. I used the name hashlib because the name hash was already taken by a builtin that people normally shouldn't be using. zlib gets a lib suffix because a one letter name is evil and it matches the project name. ;) "compress" sounds nicer. ... looking on PyPI to see if that name is taken: https://pypi.org/project/compress/ exists and is already effectively what you are describing. (never used it or seen it used, no idea about quality) I don't think adding lz4 to the stdlib is worthwhile. It isn't required for core functionality as zlib is (lowest common denominator zip support). I'd argue that bz2 doesn't even belong in the stdlib, but we shouldn't go removing things. PyPI makes getting more algorithms easy. If anything, it'd be nice to standardize on some stdlib namespaces that others could plug their modules into. Create a compress in the stdlib with zlib and bz2 in it, and a way for extension modules to add themselves in a managed manner instead of requiring a top level name? Opening up a designated namespace to third party modules is not something we've done as a project in the past though. It requires care. I haven't thought that through. -gps > > On Wed, 28 Nov 2018 at 08:44, Antoine Pitrou wrote: > >> On Wed, 28 Nov 2018 10:28:19 +0000 >> Jonathan Underwood wrote: >> > Hi, >> > >> > I have for sometime maintained the Python bindings to the LZ4 >> > compression library[0, 1]: >> > >> > I am wondering if there is interest in having these bindings move to >> > the standard library to sit alongside the gzip, lzma etc bindings? >> > Obviously the code would need to be modified to fit the coding >> > guidelines etc. >> >> Personally I would find it useful indeed. LZ4 is very attractive >> when (de)compression speed is a primary factor, for example when >> sending data over a fast network link or a fast local SSD. >> >> Another compressor worth including is Zstandard (by the same author as >> LZ4). Actually, Zstandard and LZ4 cover most of the (speed / >> compression ratio) range quite well. Informative graphs below: >> >> https://gregoryszorc.com/blog/2017/03/07/better-compression-with-zstandard/ >> >> Regards >> >> Antoine. >> >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/brett%40python.org >> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/greg%40krypto.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Nov 28 13:54:44 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 28 Nov 2018 19:54:44 +0100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> Message-ID: <20181128195444.727883c3@fsol> On Wed, 28 Nov 2018 10:43:04 -0800 "Gregory P. Smith" wrote: > On Wed, Nov 28, 2018 at 9:52 AM Brett Cannon wrote: > > > Are we getting to the point that we want a compresslib like hashlib if we > > are going to be adding more compression algorithms? > > > > Lets avoid the lib suffix when unnecessary. I used the name hashlib > because the name hash was already taken by a builtin that people normally > shouldn't be using. zlib gets a lib suffix because a one letter name is > evil and it matches the project name. ;) "compress" sounds nicer. > > ... looking on PyPI to see if that name is taken: > https://pypi.org/project/compress/ exists and is already effectively what > you are describing. (never used it or seen it used, no idea about quality) > > I don't think adding lz4 to the stdlib is worthwhile. It isn't required > for core functionality as zlib is (lowest common denominator zip support). Actually, if some people are interested in compressing .pyc files, lz4 is probably the best candidate (will yield significant compression benefits with very little CPU overhead). Regards Antoine. From jonathan.underwood at gmail.com Wed Nov 28 14:35:31 2018 From: jonathan.underwood at gmail.com (Jonathan Underwood) Date: Wed, 28 Nov 2018 19:35:31 +0000 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: <20181128195444.727883c3@fsol> References: <20181128174343.6bc8760b@fsol> <20181128195444.727883c3@fsol> Message-ID: On Wed, 28 Nov 2018 at 18:57, Antoine Pitrou wrote: > > On Wed, 28 Nov 2018 10:43:04 -0800 > "Gregory P. Smith" wrote: [snip] > > I don't think adding lz4 to the stdlib is worthwhile. It isn't required > > for core functionality as zlib is (lowest common denominator zip support). > > Actually, if some people are interested in compressing .pyc files, lz4 > is probably the best candidate (will yield significant compression > benefits with very little CPU overhead). > It's interesting to note that there's an outstanding feature request to enable "import modules from a library.tar.lz4", justified on the basis that it would be helpful to the python-for-android project: https://github.com/python-lz4/python-lz4/issues/45 Cheers, Jonathan From solipsis at pitrou.net Wed Nov 28 15:06:12 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 28 Nov 2018 21:06:12 +0100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? References: <20181128174343.6bc8760b@fsol> <20181128195444.727883c3@fsol> Message-ID: <20181128210612.58ecd96a@fsol> On Wed, 28 Nov 2018 19:35:31 +0000 Jonathan Underwood wrote: > On Wed, 28 Nov 2018 at 18:57, Antoine Pitrou wrote: > > > > On Wed, 28 Nov 2018 10:43:04 -0800 > > "Gregory P. Smith" wrote: > [snip] > > > I don't think adding lz4 to the stdlib is worthwhile. It isn't required > > > for core functionality as zlib is (lowest common denominator zip support). > > > > Actually, if some people are interested in compressing .pyc files, lz4 > > is probably the best candidate (will yield significant compression > > benefits with very little CPU overhead). > > > > It's interesting to note that there's an outstanding feature request > to enable "import modules from a library.tar.lz4", justified on the > basis that it would be helpful to the python-for-android project: > > https://github.com/python-lz4/python-lz4/issues/45 Interesting. The tar format isn't adequate for this: the whole tar file will be compressed at once, so you need to uncompress it all even to import a single module. The zip format is more adapted, but it doesn't seem to have LZ4 in its registered codecs. At least for pyc files, though, this could be done at the marshal level rather than at the importlib level. Regards Antoine. From steve at pearwood.info Wed Nov 28 16:27:37 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 29 Nov 2018 08:27:37 +1100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> Message-ID: <20181128212736.GW4319@ando.pearwood.info> On Wed, Nov 28, 2018 at 10:43:04AM -0800, Gregory P. Smith wrote: > PyPI makes getting more algorithms easy. Can we please stop over-generalising like this? PyPI makes getting more algorithms easy for *SOME* people. (Sorry for shouting, but you just pressed one of my buttons.) PyPI might as well not exist for those who cannot, for technical or policy reasons, install addition software beyond the std lib on the computers they use. (I hesitate to say "their computers".) In many school or corporate networks, installing unapproved software can get you expelled or fired. And getting approval may be effectively impossible, or take months of considerable effort navigating some complex bureaucratic process. This is not an argument either for or against adding LZ4, I have no opinion either way. But it is a reminder that "just get it from PyPI" represents an extremely privileged position that not all Python users are capable of taking, and we shouldn't be so blase about abandoning those who can't to future std lib improvements. -- Steve From njs at pobox.com Wed Nov 28 16:37:33 2018 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 28 Nov 2018 13:37:33 -0800 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: <20181128210612.58ecd96a@fsol> References: <20181128174343.6bc8760b@fsol> <20181128195444.727883c3@fsol> <20181128210612.58ecd96a@fsol> Message-ID: On Wed, Nov 28, 2018, 12:08 Antoine Pitrou On Wed, 28 Nov 2018 19:35:31 +0000 > Jonathan Underwood wrote: > > On Wed, 28 Nov 2018 at 18:57, Antoine Pitrou > wrote: > > > > > > On Wed, 28 Nov 2018 10:43:04 -0800 > > > "Gregory P. Smith" wrote: > > [snip] > > > > I don't think adding lz4 to the stdlib is worthwhile. It isn't > required > > > > for core functionality as zlib is (lowest common denominator zip > support). > > > > > > Actually, if some people are interested in compressing .pyc files, lz4 > > > is probably the best candidate (will yield significant compression > > > benefits with very little CPU overhead). > > > > > > > It's interesting to note that there's an outstanding feature request > > to enable "import modules from a library.tar.lz4", justified on the > > basis that it would be helpful to the python-for-android project: > > > > https://github.com/python-lz4/python-lz4/issues/45 > > Interesting. The tar format isn't adequate for this: the whole tar > file will be compressed at once, so you need to uncompress it all even > to import a single module. The zip format is more adapted, but it > doesn't seem to have LZ4 in its registered codecs. > Zip can be used without compression as a simple container for files, though, and there's nothing that says those can't be .pyc.lz4 files. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Nov 28 22:14:03 2018 From: brett at python.org (Brett Cannon) Date: Wed, 28 Nov 2018 19:14:03 -0800 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: <20181128212736.GW4319@ando.pearwood.info> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> Message-ID: On Wed, 28 Nov 2018 at 13:29, Steven D'Aprano wrote: > On Wed, Nov 28, 2018 at 10:43:04AM -0800, Gregory P. Smith wrote: > > > PyPI makes getting more algorithms easy. > > Can we please stop over-generalising like this? PyPI makes getting > more algorithms easy for *SOME* people. (Sorry for shouting, but you > just pressed one of my buttons.) > Is shouting necessary to begin with, though? I understand people relying on PyPI more and more can be troublesome for some and a sticking point, but if you know it's a trigger for you then waiting until you didn't feel like shouting seems like a reasonable course of action while still getting your point across. > > PyPI might as well not exist for those who cannot, for technical or > policy reasons, install addition software beyond the std lib on the > computers they use. (I hesitate to say "their computers".) > > In many school or corporate networks, installing unapproved software can > get you expelled or fired. And getting approval may be effectively > impossible, or take months of considerable effort navigating some > complex bureaucratic process. > > This is not an argument either for or against adding LZ4, I have no > opinion either way. > But it is a reminder that "just get it from PyPI" > represents an extremely privileged position that not all Python users > are capable of taking, and we shouldn't be so blase about abandoning > those who can't to future std lib improvements. > We have never really had a discussion about how we want to guide the stdlib going forward (e.g. how much does PyPI influence things, focus/theme, etc.). Maybe we should consider finally having that discussion once the governance model is chosen and before we consider adding a new module as things like people's inability to access PyPI come up pretty consistently (e.g. I know Paul Moore also brings this up regularly). -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Nov 29 00:02:29 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 29 Nov 2018 16:02:29 +1100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> Message-ID: <20181129050229.GA4319@ando.pearwood.info> On Wed, Nov 28, 2018 at 07:14:03PM -0800, Brett Cannon wrote: > On Wed, 28 Nov 2018 at 13:29, Steven D'Aprano wrote: > > > On Wed, Nov 28, 2018 at 10:43:04AM -0800, Gregory P. Smith wrote: > > > > > PyPI makes getting more algorithms easy. > > > > Can we please stop over-generalising like this? PyPI makes getting > > more algorithms easy for *SOME* people. (Sorry for shouting, but you > > just pressed one of my buttons.) > > Is shouting necessary to begin with, though? Yes. My apology was a polite fiction, a left over from the old Victorian British "stiff upper lip" attitude that showing emotion in public is Not The Done Thing. I should stop making those faux apologies, it is a bad habit. We aren't robots, we're human beings and we shouldn't apologise for showing our emotions. Nothing important ever got done without people having, and showing, strong emotions either for or against it. Of course I'm not genuinely sorry for showing my strength of feeling over this issue. Its just a figure of speech: all-caps is used to give emphasis in a plain text medium, it is not literal shouting. In any case, I retract the faux apology, it was a silly thing for me to say that undermines my own message as well as reinforcing the pernicious message that expressing the strength of emotional feeling about an issue is a bad thing that needs to be surpressed. -- Steve From armin.rigo at gmail.com Thu Nov 29 01:08:01 2018 From: armin.rigo at gmail.com (Armin Rigo) Date: Thu, 29 Nov 2018 08:08:01 +0200 Subject: [Python-Dev] C API changes In-Reply-To: <99b32440-1c8a-8a92-7729-c037648fbbce@python.org> References: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> <99b32440-1c8a-8a92-7729-c037648fbbce@python.org> Message-ID: Hi Steve, On Tue, 27 Nov 2018 at 19:14, Steve Dower wrote: > On 27Nov2018 0609, Victor Stinner wrote: > > Note: Again, in my plan, the new C API would be an opt-in API. The old > > C API would remain unchanged and fully supported. So there is no > > impact on performance if you consider to use the old C API. > > This is one of the things that makes me think your plan is not feasible. I can easily imagine the new API having two different implementations even for CPython: A) you can use the generic implementation, which produces a cross-python-compatible .so. All function calls go through the API at runtime. The same .so works on any version of CPython or PyPy. B) you can use a different set of headers or a #define or something, and you get a higher-performance version of your unmodified code---with the issue that the .so only runs on the exact version of CPython. This is done by defining some of the functions as macros. I would expect this version to be of similar speed than the current C API in most cases. This might give a way forward: people would initially port their extensions hoping to use the option B; once that is done, they can easily measure---not guess--- the extra performance costs of the option A, and decide based on actual data if the difference is really worth the additional troubles of distributing many versions. Even if it is, they can distribute an A version for PyPy and for unsupported CPython versions, and add a few B versions on top of that. ...Also, although I'm discussing it here, I think the whole approach would be better if done as a third-party extension for now, without requiring changes to CPython---just use the existing C API to implement the CPython version. The B option discussed above can even be mostly *just* a set of macros, with a bit of runtime that we might as well include in the produced .so in order to make it a standalone, regular CPython C extension module. A bient?t, Armin. PS: on CPython could use ``typedef struct { PyObject *_obj; } PyHandle;``. This works like a pointer, but you can't use ``==`` to compare them. From rosuav at gmail.com Thu Nov 29 01:19:28 2018 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 29 Nov 2018 17:19:28 +1100 Subject: [Python-Dev] C API changes In-Reply-To: References: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> <99b32440-1c8a-8a92-7729-c037648fbbce@python.org> Message-ID: On Thu, Nov 29, 2018 at 5:10 PM Armin Rigo wrote: > PS: on CPython could use ``typedef struct { PyObject *_obj; } > PyHandle;``. This works like a pointer, but you can't use ``==`` to > compare them. And then you could have a macro or inline function to compare them, simply by looking at that private member, and it should compile down to the exact same machine code as comparing the original pointers directly. It'd be a not-unreasonable migration path, should you want to work that way - zero run-time cost. ChrisA From greg at krypto.org Thu Nov 29 04:10:01 2018 From: greg at krypto.org (Gregory P. Smith) Date: Thu, 29 Nov 2018 01:10:01 -0800 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> Message-ID: On Wed, Nov 28, 2018 at 10:43 AM Gregory P. Smith wrote: > > On Wed, Nov 28, 2018 at 9:52 AM Brett Cannon wrote: > >> Are we getting to the point that we want a compresslib like hashlib if we >> are going to be adding more compression algorithms? >> > > Lets avoid the lib suffix when unnecessary. I used the name hashlib > because the name hash was already taken by a builtin that people normally > shouldn't be using. zlib gets a lib suffix because a one letter name is > evil and it matches the project name. ;) "compress" sounds nicer. > > ... looking on PyPI to see if that name is taken: > https://pypi.org/project/compress/ exists and is already effectively what > you are describing. (never used it or seen it used, no idea about quality) > > I don't think adding lz4 to the stdlib is worthwhile. It isn't required > for core functionality as zlib is (lowest common denominator zip support). > I'd argue that bz2 doesn't even belong in the stdlib, but we shouldn't go > removing things. PyPI makes getting more algorithms easy. > > If anything, it'd be nice to standardize on some stdlib namespaces that > others could plug their modules into. Create a compress in the stdlib with > zlib and bz2 in it, and a way for extension modules to add themselves in a > managed manner instead of requiring a top level name? Opening up a > designated namespace to third party modules is not something we've done as > a project in the past though. It requires care. I haven't thought that > through. > > -gps > While my gut reaction was to say "no" to adding lz4 to the stdlib above... I'm finding myself reconsidering and not against adding lz4 to the stdlib. I just want us to have a good reason if we do. This type of extension module tends to be very easy to maintain (and you are volunteering). A good reason in the past has been the algorithm being widely used. Obviously the case with zlib (gzip and zipfile), bz2, and lzma (.xz). Those are all slower and tighter though. lz4 is extremely fast, especially for decompression. It could make a nice addition as that is an area our standard library offers nothing. So change my -1 to a +0.5. Q: Are there other popular alternatives to fill that niche that we should strongly consider instead or as well? 5 years ago the answer would've been Snappy. 15 years ago the answer would've been LZO. I suggest not rabbit-holing this on whether we should adopt a top level namespace for these such as "compress". A good question to ask, but we can resolve that larger topic on its own without blocking anything. lz4 has claimed the global pypi lz4 module namespace today so moving it to the stdlib under that name is normal - A pretty transparent transition. If we do that, the PyPI version of lz4 should remain for use on older CPython versions, but effectively be frozen, never to gain new features once lz4 has landed in its first actual CPython release. -gps > >> >> On Wed, 28 Nov 2018 at 08:44, Antoine Pitrou wrote: >> >>> On Wed, 28 Nov 2018 10:28:19 +0000 >>> Jonathan Underwood wrote: >>> > Hi, >>> > >>> > I have for sometime maintained the Python bindings to the LZ4 >>> > compression library[0, 1]: >>> > >>> > I am wondering if there is interest in having these bindings move to >>> > the standard library to sit alongside the gzip, lzma etc bindings? >>> > Obviously the code would need to be modified to fit the coding >>> > guidelines etc. >>> >>> Personally I would find it useful indeed. LZ4 is very attractive >>> when (de)compression speed is a primary factor, for example when >>> sending data over a fast network link or a fast local SSD. >>> >>> Another compressor worth including is Zstandard (by the same author as >>> LZ4). Actually, Zstandard and LZ4 cover most of the (speed / >>> compression ratio) range quite well. Informative graphs below: >>> >>> https://gregoryszorc.com/blog/2017/03/07/better-compression-with-zstandard/ >>> >>> Regards >>> >>> Antoine. >>> >>> >>> _______________________________________________ >>> Python-Dev mailing list >>> Python-Dev at python.org >>> https://mail.python.org/mailman/listinfo/python-dev >>> Unsubscribe: >>> https://mail.python.org/mailman/options/python-dev/brett%40python.org >>> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/greg%40krypto.org >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.underwood at gmail.com Thu Nov 29 04:39:45 2018 From: jonathan.underwood at gmail.com (Jonathan Underwood) Date: Thu, 29 Nov 2018 09:39:45 +0000 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> Message-ID: On Thu, 29 Nov 2018 at 09:13, Gregory P. Smith wrote: > Q: Are there other popular alternatives to fill that niche that we should strongly consider instead or as well? > > 5 years ago the answer would've been Snappy. 15 years ago the answer would've been LZO. > Today LZ4 hits a sweet spot for fast compression and decompression at the lower compression ratio end of the spectrum, offering significantly faster compression and decompression than zlib or bz2, but not as high compression ratios (at usable speeds). It's also had time to stabilize, and a standard frame format for compressed data has been adopted by the community. The other main contenders in town are zstd, which was mentioned earlier in the thread, and brotli. Both are based on dictionary compression. Zstd is very impressive, offering high compression ratios, but is being very actively developed at present, so is a bit more of a moving target.Brotli is in the same ballpark as Zstd. They both cover the higher compression end of the spectrum than lz4. Some nice visualizations are here (although the data is now a bit out of date - lz4 has had some speed improvements at the higher compression ratio end): https://gregoryszorc.com/blog/2017/03/07/better-compression-with-zstandard/ > I suggest not rabbit-holing this on whether we should adopt a top level namespace for these such as "compress". A good question to ask, but we can resolve that larger topic on its own without blocking anything. > It's funny, but I had gone around in that loop in my head ahead of sending my email. My thinking was: there's a real need for some unification and simplification in the compression space, but I'll work on integrating LZ4, and in the process look at opportunities for the new interface design. I'm a fan of learning through iteration, rather than spending 5 years designing the ultimate compression abstraction and then finding a corner case that it doesn't fit. > lz4 has claimed the global pypi lz4 module namespace today so moving it to the stdlib under that name is normal - A pretty transparent transition. If we do that, the PyPI version of lz4 should remain for use on older CPython versions, but effectively be frozen, never to gain new features once lz4 has landed in its first actual CPython release. > Yes, that was what I was presuming would be the path forward. Cheers, Jonathan From p.f.moore at gmail.com Thu Nov 29 04:52:29 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 29 Nov 2018 09:52:29 +0000 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> Message-ID: [This is getting off-topic, so I'll limit my comments to this one email] On Thu, 29 Nov 2018 at 03:17, Brett Cannon wrote: > We have never really had a discussion about how we want to guide the stdlib going forward (e.g. how much does PyPI influence things, focus/theme, etc.). Maybe we should consider finally having that discussion once the governance model is chosen and before we consider adding a new module as things like people's inability to access PyPI come up pretty consistently (e.g. I know Paul Moore also brings this up regularly). I'm not sure a formal discussion on this matter will help much - my feeling is that most people have relatively fixed views on how they would like things to go (large stdlib/batteries included vs external modules/PyPI/slim stdlib). The "problem" isn't so much with people having different views (as a group, we're pretty good at achieving workable compromises in the face of differing views) as it is about people forgetting that their experience isn't the only reality, which causes unnecessary frustration in discussions. That's more of a people problem than a technical one. What would be nice would be if we could persuade people *not* to assume "adding an external dependency is easy" in all cases, and frame their arguments in a way that doesn't make that mistaken assumption. The arguments/debates are fine, what's not so fine is having to repeatedly explain how people are making fundamentally unsound assumptions. Having said that, what *is* difficult to know, is how *many* people are in situations where adding an external dependency is hard, and how hard it is in practice. The sorts of environment where PyPI access is hard are also the sorts of environment where participation in open source discussions is rare, so getting good information is hard. Paul From songofacandy at gmail.com Thu Nov 29 05:01:55 2018 From: songofacandy at gmail.com (INADA Naoki) Date: Thu, 29 Nov 2018 19:01:55 +0900 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: <20181128212736.GW4319@ando.pearwood.info> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> Message-ID: On Thu, Nov 29, 2018 at 6:27 AM Steven D'Aprano wrote: > > On Wed, Nov 28, 2018 at 10:43:04AM -0800, Gregory P. Smith wrote: > > > PyPI makes getting more algorithms easy. > > Can we please stop over-generalising like this? PyPI makes getting > more algorithms easy for *SOME* people. (Sorry for shouting, but you > just pressed one of my buttons.) I don't think this is over-generalising. If "get it from PyPI" is not easy enough, why not adding hundreds of famous libraries? Because we can't maintain all of them well. When considering adding new format (not only compression, but also serialization like toml), I think it should be stable, widely used, and will be used widely for a long time. If we want to use the format in Python core or Python stdlib, it's good reasoning too. gzip and json are good example. When we say "we can use PyPI", it means "are there enough reasons make the package special enough to add to stdlib?" We don't mean "everyone can use PyPI." Regards, -- INADA Naoki From solipsis at pitrou.net Thu Nov 29 05:54:15 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 29 Nov 2018 11:54:15 +0100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> Message-ID: <20181129115415.0ee3e1f1@fsol> On Thu, 29 Nov 2018 09:52:29 +0000 Paul Moore wrote: > [This is getting off-topic, so I'll limit my comments to this one email] > > On Thu, 29 Nov 2018 at 03:17, Brett Cannon wrote: > > We have never really had a discussion about how we want to guide the stdlib going forward (e.g. how much does PyPI influence things, focus/theme, etc.). Maybe we should consider finally having that discussion once the governance model is chosen and before we consider adding a new module as things like people's inability to access PyPI come up pretty consistently (e.g. I know Paul Moore also brings this up regularly). > > I'm not sure a formal discussion on this matter will help much - my > feeling is that most people have relatively fixed views on how they > would like things to go (large stdlib/batteries included vs external > modules/PyPI/slim stdlib). The "problem" isn't so much with people > having different views (as a group, we're pretty good at achieving > workable compromises in the face of differing views) as it is about > people forgetting that their experience isn't the only reality, which > causes unnecessary frustration in discussions. That's more of a people > problem than a technical one. I'd like to point the discussion is asymmetric here. On the one hand, people who don't have access to PyPI would _really_ benefit from a larger stdlib with more batteries included. On the other hand, people who have access to PyPI _don't_ benefit from having a slim stdlib. There's nothing virtuous or advantageous about having _less_ batteries included. Python doesn't become magically faster or more powerful by including less in its standard distribution: the best it does is make the distribution slightly smaller. So there's really one bunch of people arguing for practical benefits, and another bunch of people arguing for mostly aesthetical or philosophical reasons. Regards Antoine. From andrew.svetlov at gmail.com Thu Nov 29 05:55:07 2018 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Thu, 29 Nov 2018 12:55:07 +0200 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> Message-ID: 5 cents about lz4 alternatives: Broli (mentioned above) is widely supported by web. https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding mentions it along with gzip and deflate methods. I don't recall lz4 or Zstd metioning in this context. Both Chrome/Chromium and Firefox accepts it by default (didn't check Microsoft products yet). P.S. I worked with lz4 python binding a year ago. It sometimes crashed to core dump when used in multithreaded environment (we used to run compressor/decompresson with asyncio by loop.run_in_executor() call). I hope the bug is fixed now, have no update for the current state. On Thu, Nov 29, 2018 at 12:04 PM INADA Naoki wrote: > On Thu, Nov 29, 2018 at 6:27 AM Steven D'Aprano > wrote: > > > > On Wed, Nov 28, 2018 at 10:43:04AM -0800, Gregory P. Smith wrote: > > > > > PyPI makes getting more algorithms easy. > > > > Can we please stop over-generalising like this? PyPI makes getting > > more algorithms easy for *SOME* people. (Sorry for shouting, but you > > just pressed one of my buttons.) > > I don't think this is over-generalising. > > If "get it from PyPI" is not easy enough, why not adding hundreds of > famous libraries? > Because we can't maintain all of them well. > > When considering adding new format (not only compression, but also > serialization like toml), I think it should be stable, widely used, and > will > be used widely for a long time. If we want to use the format in Python > core > or Python stdlib, it's good reasoning too. gzip and json are good example. > > When we say "we can use PyPI", it means "are there enough reasons > make the package special enough to add to stdlib?" We don't mean > "everyone can use PyPI." > > Regards, > -- > INADA Naoki > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/andrew.svetlov%40gmail.com > -- Thanks, Andrew Svetlov -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Nov 29 06:01:07 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 29 Nov 2018 12:01:07 +0100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129050229.GA4319@ando.pearwood.info> Message-ID: <20181129120107.5bd6e6b1@fsol> On Thu, 29 Nov 2018 16:02:29 +1100 Steven D'Aprano wrote: > On Wed, Nov 28, 2018 at 07:14:03PM -0800, Brett Cannon wrote: > > On Wed, 28 Nov 2018 at 13:29, Steven D'Aprano wrote: > > > > > On Wed, Nov 28, 2018 at 10:43:04AM -0800, Gregory P. Smith wrote: > > > > > > > PyPI makes getting more algorithms easy. > > > > > > Can we please stop over-generalising like this? PyPI makes getting > > > more algorithms easy for *SOME* people. (Sorry for shouting, but you > > > just pressed one of my buttons.) > > > > Is shouting necessary to begin with, though? > > Yes. > > My apology was a polite fiction, a left over from the old Victorian > British "stiff upper lip" attitude that showing emotion in public is Not > The Done Thing. I should stop making those faux apologies, it is a bad > habit. > > We aren't robots, we're human beings and we shouldn't apologise for > showing our emotions. Nothing important ever got done without people > having, and showing, strong emotions either for or against it. +1. Not that this is very important to discuss, but I don't think putting a single word in caps is offensive or agressive behavior. Let's cut people some slack :-) Regards Antoine. From jonathan.underwood at gmail.com Thu Nov 29 08:45:34 2018 From: jonathan.underwood at gmail.com (Jonathan Underwood) Date: Thu, 29 Nov 2018 13:45:34 +0000 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> Message-ID: On Thu, 29 Nov 2018 at 11:00, Andrew Svetlov wrote: > I worked with lz4 python binding a year ago. > It sometimes crashed to core dump when used in multithreaded environment (we used to run compressor/decompresson with asyncio by loop.run_in_executor() call). > I hope the bug is fixed now, have no update for the current state. Andrew and I discussed this off-list. The upshot is that this was happening in a situation where a (de)compression context was being used simultaneously by multiple threads, which is not supported. I'll look at making this use case not crash, though. From benjamin at python.org Thu Nov 29 09:36:51 2018 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 29 Nov 2018 09:36:51 -0500 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: <20181129115415.0ee3e1f1@fsol> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> Message-ID: <52341a7b-7299-44fa-abfd-be0765277a35@sloti1d3t01> On Thu, Nov 29, 2018, at 04:54, Antoine Pitrou wrote: > On Thu, 29 Nov 2018 09:52:29 +0000 > Paul Moore wrote: > > [This is getting off-topic, so I'll limit my comments to this one email] > > > > On Thu, 29 Nov 2018 at 03:17, Brett Cannon wrote: > > > We have never really had a discussion about how we want to guide the stdlib going forward (e.g. how much does PyPI influence things, focus/theme, etc.). Maybe we should consider finally having that discussion once the governance model is chosen and before we consider adding a new module as things like people's inability to access PyPI come up pretty consistently (e.g. I know Paul Moore also brings this up regularly). > > > > I'm not sure a formal discussion on this matter will help much - my > > feeling is that most people have relatively fixed views on how they > > would like things to go (large stdlib/batteries included vs external > > modules/PyPI/slim stdlib). The "problem" isn't so much with people > > having different views (as a group, we're pretty good at achieving > > workable compromises in the face of differing views) as it is about > > people forgetting that their experience isn't the only reality, which > > causes unnecessary frustration in discussions. That's more of a people > > problem than a technical one. > > I'd like to point the discussion is asymmetric here. > > On the one hand, people who don't have access to PyPI would _really_ > benefit from a larger stdlib with more batteries included. > > On the other hand, people who have access to PyPI _don't_ benefit from > having a slim stdlib. There's nothing virtuous or advantageous about > having _less_ batteries included. Python doesn't become magically > faster or more powerful by including less in its standard > distribution: the best it does is make the distribution slightly > smaller. > > So there's really one bunch of people arguing for practical benefits, > and another bunch of people arguing for mostly aesthetical or > philosophical reasons. I don't think it's asymmetric. People have raised several practical problems with a large stdlib in this thread. These include: - The evelopment of stdlib modules slows to the rate of the Python release schedule. - stdlib modules become a permanent maintenance burden to CPython core developers. - The blessed status of stdlib modules means that users might use a substandard stdlib modules when a better thirdparty alternative exists. From antoine at python.org Thu Nov 29 09:45:39 2018 From: antoine at python.org (Antoine Pitrou) Date: Thu, 29 Nov 2018 15:45:39 +0100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: <52341a7b-7299-44fa-abfd-be0765277a35@sloti1d3t01> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <52341a7b-7299-44fa-abfd-be0765277a35@sloti1d3t01> Message-ID: <4380075d-c8cd-8b11-56b2-e194ed4d41f8@python.org> Le 29/11/2018 ? 15:36, Benjamin Peterson a ?crit?: >> >> I'd like to point the discussion is asymmetric here. >> >> On the one hand, people who don't have access to PyPI would _really_ >> benefit from a larger stdlib with more batteries included. >> >> On the other hand, people who have access to PyPI _don't_ benefit from >> having a slim stdlib. There's nothing virtuous or advantageous about >> having _less_ batteries included. Python doesn't become magically >> faster or more powerful by including less in its standard >> distribution: the best it does is make the distribution slightly >> smaller. >> >> So there's really one bunch of people arguing for practical benefits, >> and another bunch of people arguing for mostly aesthetical or >> philosophical reasons. > > I don't think it's asymmetric. People have raised several practical problems with a large stdlib in this thread. These include: > - The evelopment of stdlib modules slows to the rate of the Python release schedule. Can you explain why that would be the case? As a matter of fact, the Python release schedule seems to remain largely the same even though we accumulate more stdlib modules and functionality. The only risk is the prematurate inclusion of an unstable or ill-written module which may lengthen the stabilization cycle. > - stdlib modules become a permanent maintenance burden to CPython core developers. This is true. We should only accept modules that we think are useful for a reasonable part of the user base. > - The blessed status of stdlib modules means that users might use a substandard stdlib modules when a better thirdparty alternative exists. We can always mention the better thirdparty alternative in the docs where desired. But there are many cases where the stdlib module is good enough for what people use it. For example, the upstream simplejson module may have slightly more advanced functionality, but for most purposes the stdlib json module fits the bill and spares you from tracking a separate dependency. Regards Antoine. From benjamin at python.org Thu Nov 29 09:53:30 2018 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 29 Nov 2018 09:53:30 -0500 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: <20181128212736.GW4319@ando.pearwood.info> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> Message-ID: On Wed, Nov 28, 2018, at 15:27, Steven D'Aprano wrote: > On Wed, Nov 28, 2018 at 10:43:04AM -0800, Gregory P. Smith wrote: > > > PyPI makes getting more algorithms easy. > > Can we please stop over-generalising like this? PyPI makes getting > more algorithms easy for *SOME* people. (Sorry for shouting, but you > just pressed one of my buttons.) > > PyPI might as well not exist for those who cannot, for technical or > policy reasons, install addition software beyond the std lib on the > computers they use. (I hesitate to say "their computers".) > > In many school or corporate networks, installing unapproved software can > get you expelled or fired. And getting approval may be effectively > impossible, or take months of considerable effort navigating some > complex bureaucratic process. > > This is not an argument either for or against adding LZ4, I have no > opinion either way. But it is a reminder that "just get it from PyPI" > represents an extremely privileged position that not all Python users > are capable of taking, and we shouldn't be so blase about abandoning > those who can't to future std lib improvements. While I'm sympathetic to users in such situations, I'm not sure how much we can really help them. These are the sorts of users who are likely to still be stuck using Python 2.6. Any stdlib improvements we discuss and implement today are easily a decade away from benefiting users in restrictive environments. On that kind of timescale, it's very hard to know what to do, especially since, as Paul says, we don't hear much feedback from such users. The model these users are living in is increasingly at odds with how software is written and used these days. Browsers and Rust are updated every month. The node.js world is a frenzy of change. Are these users' environments running 5 year old browsers? I hope not for security's sake. At some point, languorous corporate environments will have to catch up to how modern software development is done to avoid seriously hampering their employees' effectiveness (and security!). From benjamin at python.org Thu Nov 29 10:05:47 2018 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 29 Nov 2018 10:05:47 -0500 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: <4380075d-c8cd-8b11-56b2-e194ed4d41f8@python.org> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <52341a7b-7299-44fa-abfd-be0765277a35@sloti1d3t01> <4380075d-c8cd-8b11-56b2-e194ed4d41f8@python.org> Message-ID: <171a9da9-559b-4c80-a2b3-3e8d286eb988@sloti1d3t01> On Thu, Nov 29, 2018, at 08:45, Antoine Pitrou wrote: > > Le 29/11/2018 ? 15:36, Benjamin Peterson a ?crit : > >> > >> I'd like to point the discussion is asymmetric here. > >> > >> On the one hand, people who don't have access to PyPI would _really_ > >> benefit from a larger stdlib with more batteries included. > >> > >> On the other hand, people who have access to PyPI _don't_ benefit from > >> having a slim stdlib. There's nothing virtuous or advantageous about > >> having _less_ batteries included. Python doesn't become magically > >> faster or more powerful by including less in its standard > >> distribution: the best it does is make the distribution slightly > >> smaller. > >> > >> So there's really one bunch of people arguing for practical benefits, > >> and another bunch of people arguing for mostly aesthetical or > >> philosophical reasons. > > > > I don't think it's asymmetric. People have raised several practical problems with a large stdlib in this thread. These include: > > - The evelopment of stdlib modules slows to the rate of the Python release schedule. > > Can you explain why that would be the case? As a matter of fact, the > Python release schedule seems to remain largely the same even though we > accumulate more stdlib modules and functionality. The problem is the length of the Python release schedule. It means, in the extreme case, that stdlib modifications won't see the light of day for 2 years (feature freeze + 1.5 year release schedule). And that's only if people update Python immediately after a release. PyPI modules can evolve much more rapidly. CPython releases come in one big chunk. Even if you want to use improvements to one stdlib module in a new Python version, you may be hampered by breakages in your code from language changes or changes in a different stdlib module. Modules on PyPI, being decoupled, don't have this problem. > > The only risk is the prematurate inclusion of an unstable or ill-written > module which may lengthen the stabilization cycle. > > > - stdlib modules become a permanent maintenance burden to CPython core developers. > > This is true. We should only accept modules that we think are useful > for a reasonable part of the user base. We agree with in the ultimate conclusion then. > > > - The blessed status of stdlib modules means that users might use a substandard stdlib modules when a better thirdparty alternative exists. > > We can always mention the better thirdparty alternative in the docs > where desired. But there are many cases where the stdlib module is good > enough for what people use it. > > For example, the upstream simplejson module may have slightly more > advanced functionality, but for most purposes the stdlib json module > fits the bill and spares you from tracking a separate dependency. Yes, I think json was a fine addition to the stdlib. However, we also have stdlib modules where there is a universally better alternative. For example, requests is better than urllib.* both in simple and complex cases. From steve at pearwood.info Thu Nov 29 10:07:35 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 30 Nov 2018 02:07:35 +1100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: <52341a7b-7299-44fa-abfd-be0765277a35@sloti1d3t01> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <52341a7b-7299-44fa-abfd-be0765277a35@sloti1d3t01> Message-ID: <20181129150734.GE4319@ando.pearwood.info> On Thu, Nov 29, 2018 at 09:36:51AM -0500, Benjamin Peterson wrote: > I don't think it's asymmetric. People have raised several practical > problems with a large stdlib in this thread. These include: > > - The evelopment of stdlib modules slows to the rate of the Python > release schedule. That's not a bug, that's a feature :-) Of course that's a concern for rapidly changing libraries, but they won't be considered for the stdlib because they are rapidly changing. > - stdlib modules become a permanent maintenance burden to CPython core > developers. That's a concern, of course. Every proposed library needs to convince that the potential benefit outweighs the potential costs. On the other hand mature, stable software can survive with little or no maintenance for a very long time. The burden is not necessarily high. > - The blessed status of stdlib modules means that users might use a > substandard stdlib modules when a better thirdparty alternative > exists. Or they might re-invent the wheel and write something worse than either. I don't think it is productive to try to guess what users will do and protect them from making the "wrong" choice. Wrong choice according to whose cost-benefit analysis? -- Steve From a.badger at gmail.com Thu Nov 29 10:10:53 2018 From: a.badger at gmail.com (Toshio Kuratomi) Date: Thu, 29 Nov 2018 07:10:53 -0800 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> Message-ID: On Thu, Nov 29, 2018, 6:56 AM Benjamin Peterson > > On Wed, Nov 28, 2018, at 15:27, Steven D'Aprano wrote: > > On Wed, Nov 28, 2018 at 10:43:04AM -0800, Gregory P. Smith wrote: > > > > > PyPI makes getting more algorithms easy. > > > > Can we please stop over-generalising like this? PyPI makes getting > > more algorithms easy for *SOME* people. (Sorry for shouting, but you > > just pressed one of my buttons.) > > > > PyPI might as well not exist for those who cannot, for technical or > > policy reasons, install addition software beyond the std lib on the > > computers they use. (I hesitate to say "their computers".) > > > > In many school or corporate networks, installing unapproved software can > > get you expelled or fired. And getting approval may be effectively > > impossible, or take months of considerable effort navigating some > > complex bureaucratic process. > > > > This is not an argument either for or against adding LZ4, I have no > > opinion either way. But it is a reminder that "just get it from PyPI" > > represents an extremely privileged position that not all Python users > > are capable of taking, and we shouldn't be so blase about abandoning > > those who can't to future std lib improvements. > > While I'm sympathetic to users in such situations, I'm not sure how much > we can really help them. These are the sorts of users who are likely to > still be stuck using Python 2.6. Any stdlib improvements we discuss and > implement today are easily a decade away from benefiting users in > restrictive environments. On that kind of timescale, it's very hard to know > what to do, especially since, as Paul says, we don't hear much feedback > from such users. > As a developer of software that has to run in such environments, having a library be in the stdlib is helpful as it is easier to convince the rest of the team to bundle a backport of something that's in a future stdlib than a random package from pypi. Stdlib inclusion gives the library a known future and a (perhaps illusory, perhaps real) blessing from the core devs that helps to sell the library as the preferred solution. -Toshio > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Nov 29 10:32:53 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 29 Nov 2018 16:32:53 +0100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <52341a7b-7299-44fa-abfd-be0765277a35@sloti1d3t01> <4380075d-c8cd-8b11-56b2-e194ed4d41f8@python.org> <171a9da9-559b-4c80-a2b3-3e8d286eb988@sloti1d3t01> Message-ID: <20181129163253.06297d9d@fsol> On Thu, 29 Nov 2018 10:05:47 -0500 "Benjamin Peterson" wrote: > On Thu, Nov 29, 2018, at 08:45, Antoine Pitrou wrote: > > > > Le 29/11/2018 ? 15:36, Benjamin Peterson a ?crit : > > >> > > >> I'd like to point the discussion is asymmetric here. > > >> > > >> On the one hand, people who don't have access to PyPI would _really_ > > >> benefit from a larger stdlib with more batteries included. > > >> > > >> On the other hand, people who have access to PyPI _don't_ benefit from > > >> having a slim stdlib. There's nothing virtuous or advantageous about > > >> having _less_ batteries included. Python doesn't become magically > > >> faster or more powerful by including less in its standard > > >> distribution: the best it does is make the distribution slightly > > >> smaller. > > >> > > >> So there's really one bunch of people arguing for practical benefits, > > >> and another bunch of people arguing for mostly aesthetical or > > >> philosophical reasons. > > > > > > I don't think it's asymmetric. People have raised several practical problems with a large stdlib in this thread. These include: > > > - The evelopment of stdlib modules slows to the rate of the Python release schedule. > > > > Can you explain why that would be the case? As a matter of fact, the > > Python release schedule seems to remain largely the same even though we > > accumulate more stdlib modules and functionality. > > The problem is the length of the Python release schedule. It means, in the extreme case, that stdlib modifications won't see the light of day for 2 years (feature freeze + 1.5 year release schedule). And that's only if people update Python immediately after a release. PyPI modules can evolve much more rapidly. Ah, I realize I had misread as "the development of stdlib modules slows the rate of the Python release schedule" (instead of "slows *to* the rate"). Regards Antoine. From steve at pearwood.info Thu Nov 29 10:39:24 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 30 Nov 2018 02:39:24 +1100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> Message-ID: <20181129153923.GF4319@ando.pearwood.info> On Thu, Nov 29, 2018 at 09:53:30AM -0500, Benjamin Peterson wrote: [...] > > This is not an argument either for or against adding LZ4, I have no > > opinion either way. But it is a reminder that "just get it from PyPI" > > represents an extremely privileged position that not all Python users > > are capable of taking, and we shouldn't be so blase about abandoning > > those who can't to future std lib improvements. > > While I'm sympathetic to users in such situations, I'm not sure how > much we can really help them. These are the sorts of users who are > likely to still be stuck using Python 2.6. Not necessarily. They may have approval for the latest approved vendor software, but not for third party packages. Nick has been quiet lately, it would be good to hear from him, because I expect Red Hat's corporate userbase will be an informative example. I hear Red Hat has now moved to Python 3 for its system Python, so I expect that many of their corporate users will be, or will soon be, running Python 3.x too. In any case, it's not the users stuck on 2.6 *now* that I'm so concerned about, since as you say there's nothing we can do for them. Its the users stuck on 3.9 ten years from now, who are missing out on functionality because we keep saying "its easy to get it from PyPI". (To be clear, this isn't a plea to add everything to the stdlib.) > Any stdlib improvements we > discuss and implement today are easily a decade away from benefiting > users in restrictive environments. On that kind of timescale, it's > very hard to know what to do, especially since, as Paul says, we don't > hear much feedback from such users. Indeed, it isn't easy to know where to draw the line. That's what makes the PyPI argument all the more pernicious: it makes it *seem* easy, "just get it from PyPI", because it is easy for we privileged users who control the computers we use. > The model these users are living in is increasingly at odds with how > software is written and used these days. Browsers and Rust are updated > every month. The node.js world is a frenzy of change. The node.js world is hardly the paragon of excellence in software development we ought to emulate. > Are these users' > environments running 5 year old browsers? I hope not for security's > sake. At some point, languorous corporate environments will have to > catch up to how modern software development is done to avoid seriously > hampering their employees' effectiveness (and security!). Arguably, modern software development will have to slow down and stop introducing a dozen zero-day exploits every week. (Only half tongue in cheek.) Browsers are at the interface of the most hostile, exposed part of the computer ecosystem. Not all software operates in such an exposed position, and *stability* remains important for many uses and users. I don't see that this is really relevant to Python though, not unless you're thinking about accelerating the pace of new releases. -- Steve From p.f.moore at gmail.com Thu Nov 29 10:44:54 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 29 Nov 2018 15:44:54 +0000 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> Message-ID: On Thu, 29 Nov 2018 at 14:56, Benjamin Peterson wrote: > While I'm sympathetic to users in such situations, I'm not sure how much we can really help them. These are the sorts of users who are likely to still be stuck using Python 2.6. Any stdlib improvements we discuss and implement today are easily a decade away from benefiting users in restrictive environments. On that kind of timescale, it's very hard to know what to do, especially since, as Paul says, we don't hear much feedback from such users. As a user in that situation, I can confirm that there *are* situations where I am stuck using older versions of Python (2.6? Ha - luxury! I have to use 2.4 on some of our systems!) But there are also situations where I can do a one-off install of the latest version of Python (typically by copying install files from one machine to another, until I get it to a machine with no internet access) but installing additional modules, while possible (by similar means) is too painful to be worthwhile. Also, when sharing scripts with other users who are able to handle "download and run the installer from python.org", but for whom "pip install" (including the proxy configuration to let pip see the internet, which isn't needed for the installer because our browsers auto-configure) is impossibly hard. So "latest Python, no PyPI access" is not an unrealistic model for me to work to. I can't offer more than one person's feedback, but my experience is a real-life situation. And it's one where I've been able to push Python over other languages (such as Perl) *precisely* because the stdlib provides a big chunk of built in functionality as part of the base install. If we hadn't been able to use Python with its stdlib, I'd have probably had to argue for Java (for the same "large stdlib" reasons). > The model these users are living in is increasingly at odds with how software is written and used these days. I'd dispute that. It's increasingly at odds with how *open source* software, and modern web applications are written (and my experience is just as limited in the opposite direction, so I'm sure this statement is just as inaccurate as the one I'm disputing ;-)). But from what I can see there's a huge base of people in "enterprise" companies (the people who 20 years ago were still writing COBOL) who are just starting to use "modern" tools like Python, usually in a "grass roots" fashion, but whose corporate infrastructure hasn't got a clue how to deal with the sort of "everything is downloaded from the web as needed" environment that web-based development is embracing. > Browsers and Rust are updated every month. The node.js world is a frenzy of change. Are these users' environments running 5 year old browsers? I hope not for security's sake. For people in those environments, I hope not as well. For people in locked down environments where upgrading internal applications is a massive project, and upgrading browsers without suffcient testing could potentially break old but key business applications, and for whom "limit internet access" is a draconian but mostly effective solution, slow change and limited connectivity is a practical solution. It frustrates me enormously, and I hate having to argue that it's the right solution, but it's certainly *not* as utterly wrong-headed as some people try to argue. Priorities differ. > At some point, languorous corporate environments will have to catch up to how modern software development is done to avoid seriously hampering their employees' effectiveness (and security!). And employees making a grass-roots effort to do so by gradually introducing modern tools like Python are part of that process. Making it harder to demonstrate benefits without needing infrastructure-level changes is not helping them do so. It's not necessarily Python's role to help that process, admittedly - but I personally have that goal, and therefore I'll continue to argue that the benefits of having a comprehensive stdlib are worth it. Paul From phd at phdru.name Thu Nov 29 10:17:51 2018 From: phd at phdru.name (Oleg Broytman) Date: Thu, 29 Nov 2018 16:17:51 +0100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: <52341a7b-7299-44fa-abfd-be0765277a35@sloti1d3t01> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <52341a7b-7299-44fa-abfd-be0765277a35@sloti1d3t01> Message-ID: <20181129151751.pd23uxuvl2wbtvsx@phdru.name> On Thu, Nov 29, 2018 at 09:36:51AM -0500, Benjamin Peterson wrote: > - stdlib modules become a permanent maintenance burden to CPython core developers. Add ditributions maintainers here. Oleg. -- Oleg Broytman https://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From p.f.moore at gmail.com Thu Nov 29 10:57:25 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 29 Nov 2018 15:57:25 +0000 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: <20181129151751.pd23uxuvl2wbtvsx@phdru.name> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <52341a7b-7299-44fa-abfd-be0765277a35@sloti1d3t01> <20181129151751.pd23uxuvl2wbtvsx@phdru.name> Message-ID: On Thu, 29 Nov 2018 at 15:52, Oleg Broytman wrote: > > On Thu, Nov 29, 2018 at 09:36:51AM -0500, Benjamin Peterson wrote: > > - stdlib modules become a permanent maintenance burden to CPython core developers. > > Add ditributions maintainers here. Well, given that "you shouldn't use pip install on your distribution-installed Python", I'm not sure that's as clear cut a factor as it first seems. Yes, I know people should use virtual environments, or use "--user" installs, but distributions still have a maintenance burden providing distro-packaged modules, whether they are in the stdlib or not. If unittest were removed from the stdlib, I'd confidently expect that distributions would still provide a python3-unittest package, for example... Paul From steve.dower at python.org Thu Nov 29 11:19:49 2018 From: steve.dower at python.org (Steve Dower) Date: Thu, 29 Nov 2018 08:19:49 -0800 Subject: [Python-Dev] C API changes In-Reply-To: References: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> <99b32440-1c8a-8a92-7729-c037648fbbce@python.org> Message-ID: On 28Nov2018 2208, Armin Rigo wrote: > Hi Steve, > > On Tue, 27 Nov 2018 at 19:14, Steve Dower wrote: >> On 27Nov2018 0609, Victor Stinner wrote: >>> Note: Again, in my plan, the new C API would be an opt-in API. The old >>> C API would remain unchanged and fully supported. So there is no >>> impact on performance if you consider to use the old C API. >> >> This is one of the things that makes me think your plan is not feasible. > > I can easily imagine the new API having two different implementations > even for CPython: > > A) you can use the generic implementation, which produces a > cross-python-compatible .so. All function calls go through the API at > runtime. The same .so works on any version of CPython or PyPy. > > B) you can use a different set of headers or a #define or something, > and you get a higher-performance version of your unmodified > code---with the issue that the .so only runs on the exact version of > CPython. This is done by defining some of the functions as macros. I > would expect this version to be of similar speed than the current C > API in most cases. > > This might give a way forward: people would initially port their > extensions hoping to use the option B; once that is done, they can > easily measure---not guess--- the extra performance costs of the > option A, and decide based on actual data if the difference is really > worth the additional troubles of distributing many versions. Even if > it is, they can distribute an A version for PyPy and for unsupported > CPython versions, and add a few B versions on top of that. This makes sense, but unless it results in PyPy drastically gaining popularity as a production runtime, it basically leaves us in the status quo. We continue to not be able to change CPython internals at all, since that will break people using option B. Though potentially if we picked an official option for A, we could deprecate the stability of option B (over a few releases) and require people using it to thoroughly test, update and #ifdef their code for each version. That would allow us to make changes to the runtime while preserving option A as the reliable version. You might want to have a look at https://github.com/Microsoft/xlang/ which is not yet ready for showtime (in particular, there's no "make it look Pythonic" support yet), but is going to extend our existing cross-language ABI to Python (alongside C++/.NET/JS) and non-Windows platforms. It's been in use for years in Windows and has been just fine. (Sample generated output at https://github.com/devhawk/pywinrt-output/tree/master/generated/pyrt/src but the design docs at the first link are probably most interesting.) Cheers, Steve From steve.dower at python.org Thu Nov 29 11:25:38 2018 From: steve.dower at python.org (Steve Dower) Date: Thu, 29 Nov 2018 08:25:38 -0800 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: <20181129115415.0ee3e1f1@fsol> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> Message-ID: <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> On 29Nov2018 0254, Antoine Pitrou wrote: > I'd like to point the discussion is asymmetric here. > > On the one hand, people who don't have access to PyPI would _really_ > benefit from a larger stdlib with more batteries included. > > On the other hand, people who have access to PyPI _don't_ benefit from > having a slim stdlib. There's nothing virtuous or advantageous about > having _less_ batteries included. Python doesn't become magically > faster or more powerful by including less in its standard > distribution: the best it does is make the distribution slightly > smaller. > > So there's really one bunch of people arguing for practical benefits, > and another bunch of people arguing for mostly aesthetical or > philosophical reasons. My experience is that the first group would benefit from a larger _standard distribution_, which is not necessarily the same thing as a larger stdlib. I'm firmly on the "smaller core, larger distribution" side of things, where we as the core team take responsibility for the bare minimum needed to be an effective language and split more functionality out to individual libraries. We then also prepare/recommend a standard distribution that bundles many of these libraries by default (Anaconda style), as well as a minimal one that is a better starting point for low-footprint systems (Miniconda style) or embedding into other apps. Cheers, Steve From antoine at python.org Thu Nov 29 11:32:31 2018 From: antoine at python.org (Antoine Pitrou) Date: Thu, 29 Nov 2018 17:32:31 +0100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> Message-ID: Le 29/11/2018 ? 17:25, Steve Dower a ?crit?: > > My experience is that the first group would benefit from a larger > _standard distribution_, which is not necessarily the same thing as a > larger stdlib. > > I'm firmly on the "smaller core, larger distribution" side of things, > where we as the core team take responsibility for the bare minimum > needed to be an effective language and split more functionality out to > individual libraries. We may ask ourselves if there is really a large difference between a "standard distribution" and a "standard library". The primary difference seems to be that the distribution is modular, while the stdlib is not. As for reviews and CI, though, we would probably want a standard distribution to be high-quality, so to go through a review process and be tested by the same buildbot fleet as the stdlib. Regards Antoine. From christian at python.org Thu Nov 29 11:45:46 2018 From: christian at python.org (Christian Heimes) Date: Thu, 29 Nov 2018 17:45:46 +0100 Subject: [Python-Dev] Extended Python distribution [was: Inclusion of lz4 bindings in stdlib?] In-Reply-To: <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> Message-ID: <24f7db41-319a-0cdd-228e-789cb749f62a@python.org> On 29/11/2018 17.25, Steve Dower wrote: > On 29Nov2018 0254, Antoine Pitrou wrote: >> I'd like to point the discussion is asymmetric here. >> >> On the one hand, people who don't have access to PyPI would _really_ >> benefit from a larger stdlib with more batteries included. >> >> On the other hand, people who have access to PyPI _don't_ benefit from >> having a slim stdlib.? There's nothing virtuous or advantageous about >> having _less_ batteries included.? Python doesn't become magically >> faster or more powerful by including less in its standard >> distribution: the best it does is make the distribution slightly >> smaller. >> >> So there's really one bunch of people arguing for practical benefits, >> and another bunch of people arguing for mostly aesthetical or >> philosophical reasons. > > My experience is that the first group would benefit from a larger > _standard distribution_, which is not necessarily the same thing as a > larger stdlib. > > I'm firmly on the "smaller core, larger distribution" side of things, > where we as the core team take responsibility for the bare minimum > needed to be an effective language and split more functionality out to > individual libraries. We then also prepare/recommend a standard > distribution that bundles many of these libraries by default (Anaconda > style), as well as a minimal one that is a better starting point for > low-footprint systems (Miniconda style) or embedding into other apps. I was about to suggest the same proposal. Thanks, Steve! The core dev team is already busy to keep Python running and evolving. Pulling more modules into the stdlib has multiple downsides. An extended Python distribution would solve the issue for users on restricted machines. It's also something that can be handled by a working group instead of a the core dev team. A SIG can do the selection of packages, deal with legal issues, politics and create releases. An extended Python distribution can also be updated outside the release cycle of CPython. This allows out-of-band security updates of libraries that are bundled with an extended distribution. Christian Christian From christian at python.org Thu Nov 29 12:17:10 2018 From: christian at python.org (Christian Heimes) Date: Thu, 29 Nov 2018 18:17:10 +0100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> Message-ID: On 29/11/2018 17.32, Antoine Pitrou wrote: > We may ask ourselves if there is really a large difference between a > "standard distribution" and a "standard library". The primary > difference seems to be that the distribution is modular, while the > stdlib is not. Yes, there is a huge difference between a larger distribution and a stdlib. I'm going to ignore all legal issues and human drama of "choose your favorite kid", but start with a simple example. I'm sure we can all agree that the requests library should be part of an extended Python distribution. After all it's the best and easiest HTTP library for Python. I'm using it on a daily basis. However I would require some changes to align it with other stdlib modules. Among others requests and urllib3 would have to obey sys.flags.ignore_environment and accept an SSLContext object. Requests depends on certifi to provide a CA trust store. I would definitely veto against the inclusion of certifi, because it's not just dangerous but also breaks my code. If we would keep the standard distribution of Python as it is and just have a Python SIG offer an additional extended distribution on python.org, then I don't have to care about the quality and security of additional code. The Python core team would neither own the code nor takes responsibility of the code. Instead the extended distribution SIG merely set the quality standards and leave the maintance burden to the original owners. In case a library doesn't keep up or has severe flaws, the SIG may even decide to remove a package from the extended distribution. Christian From greg at krypto.org Thu Nov 29 12:20:15 2018 From: greg at krypto.org (Gregory P. Smith) Date: Thu, 29 Nov 2018 09:20:15 -0800 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> Message-ID: meta: I'm not participating in this sub-thread. Just changing the subject line. On Thu, Nov 29, 2018 at 9:17 AM Christian Heimes wrote: > On 29/11/2018 17.32, Antoine Pitrou wrote: > > We may ask ourselves if there is really a large difference between a > > "standard distribution" and a "standard library". The primary > > difference seems to be that the distribution is modular, while the > > stdlib is not. > > Yes, there is a huge difference between a larger distribution and a > stdlib. I'm going to ignore all legal issues and human drama of "choose > your favorite kid", but start with a simple example. > > I'm sure we can all agree that the requests library should be part of an > extended Python distribution. After all it's the best and easiest HTTP > library for Python. I'm using it on a daily basis. However I would > require some changes to align it with other stdlib modules. Among others > requests and urllib3 would have to obey sys.flags.ignore_environment and > accept an SSLContext object. Requests depends on certifi to provide a CA > trust store. I would definitely veto against the inclusion of certifi, > because it's not just dangerous but also breaks my code. > > If we would keep the standard distribution of Python as it is and just > have a Python SIG offer an additional extended distribution on > python.org, then I don't have to care about the quality and security of > additional code. The Python core team would neither own the code nor > takes responsibility of the code. Instead the extended distribution SIG > merely set the quality standards and leave the maintance burden to the > original owners. In case a library doesn't keep up or has severe flaws, > the SIG may even decide to remove a package from the extended distribution. > > Christian > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/greg%40krypto.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From antoine at python.org Thu Nov 29 12:23:51 2018 From: antoine at python.org (Antoine Pitrou) Date: Thu, 29 Nov 2018 18:23:51 +0100 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> Message-ID: <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> Le 29/11/2018 ? 18:17, Christian Heimes a ?crit?: > > If we would keep the standard distribution of Python as it is and just > have a Python SIG offer an additional extended distribution on > python.org, then I don't have to care about the quality and security of > additional code. The Python core team would neither own the code nor > takes responsibility of the code. Then it's an argument against the extended distribution. If its security and quality are not up to Python's quality standards, those people who don't want to install from PyPI may not accept the extended distribution either. And we may not want to offer those downloads from the main python.org page either, lest it taints the project's reputation. I think the whole argument amounts to hand waving anyway. You are inventing an extended distribution which doesn't exist (except as Anaconda) to justify that we shouldn't accept more modules in the stdlib. But obviously maintaining an extended distribution is a lot more work than accepting a single module in the stdlib, and that's why you don't see anyone doing it, even though people have been floating the idea for years. Regards Antoine. From njs at pobox.com Thu Nov 29 12:49:32 2018 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 29 Nov 2018 09:49:32 -0800 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> Message-ID: On Thu, Nov 29, 2018, 08:34 Antoine Pitrou > Le 29/11/2018 ? 17:25, Steve Dower a ?crit : > > > > My experience is that the first group would benefit from a larger > > _standard distribution_, which is not necessarily the same thing as a > > larger stdlib. > > > > I'm firmly on the "smaller core, larger distribution" side of things, > > where we as the core team take responsibility for the bare minimum > > needed to be an effective language and split more functionality out to > > individual libraries. > > We may ask ourselves if there is really a large difference between a > "standard distribution" and a "standard library". The primary > difference seems to be that the distribution is modular, while the > stdlib is not. > Some differences that come to mind: - a standard distribution provides a clear path for creating and managing subsets, which is useful when disk/download weight is an issue. (This is another situation that only affects some people and is easy for most of us to forget about.) I guess this is your "modular" point? - a standard distribution lets those who *do* have internet access install upgrades incrementally as needed. - This may be controversial, but my impression is that keeping package development outside the stdlib almost always produces better packages in the long run. You get faster feedback cycles, more responsive maintainers, and it's just easier to find people to help maintain one focused package in an area they care about than it is to get new core devs on board. Of course there's probably some survivorship bias here too (urllib is worse than requests, but is it worse than the average third-party package http client?). But that's my impression. Concretely: requests, pip, numpy, setuptools are all examples of packages that should *obviously* be included in any self-respecting set of batteries, but where we're not willing to put them in the stdlib. Obviously we aren't doing a great job of supporting offline users, regardless of whether we add lz4. There are a lot of challenges to switching to a "standard distribution" model. I'm not certain it's the best option. But what I like about it is that it could potentially reduce the conflict between what our different user groups need, instead of playing zero-sum tug-of-war every time this comes up. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve.dower at python.org Thu Nov 29 13:07:13 2018 From: steve.dower at python.org (Steve Dower) Date: Thu, 29 Nov 2018 10:07:13 -0800 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> Message-ID: On 29Nov2018 0923, Antoine Pitrou wrote: > I think the whole argument amounts to hand waving anyway. You are > inventing an extended distribution which doesn't exist (except as > Anaconda) to justify that we shouldn't accept more modules in the > stdlib. But obviously maintaining an extended distribution is a lot > more work than accepting a single module in the stdlib, and that's why > you don't see anyone doing it, even though people have been floating the > idea for years. https://anaconda.com/ https://www.activestate.com/products/activepython/ http://winpython.github.io/ http://python-xy.github.io/ https://www.enthought.com/product/canopy/ https://software.intel.com/en-us/distribution-for-python http://every-linux-distro-ever.example.com Do I need to keep going? Accepting a module in the stdlib means accepting the full development and maintenance burden. Maintaining a list of "we recommend these so strongly here's an installer that will give them to you" is a very different kind of burden, and one that is significantly easier to bear. Cheers, Steve From christian at python.org Thu Nov 29 13:08:56 2018 From: christian at python.org (Christian Heimes) Date: Thu, 29 Nov 2018 19:08:56 +0100 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> Message-ID: On 29/11/2018 18.23, Antoine Pitrou wrote: > > Le 29/11/2018 ? 18:17, Christian Heimes a ?crit?: >> >> If we would keep the standard distribution of Python as it is and just >> have a Python SIG offer an additional extended distribution on >> python.org, then I don't have to care about the quality and security of >> additional code. The Python core team would neither own the code nor >> takes responsibility of the code. > > Then it's an argument against the extended distribution. > If its security and quality are not up to Python's quality standards, > those people who don't want to install from PyPI may not accept the > extended distribution either. And we may not want to offer those > downloads from the main python.org page either, lest it taints the > project's reputation. You are assuming that you can convince or force upstream developers to change their project and development style. Speaking from personal experience, that is even unrealistic for projects that are already developed and promoted by officially acknowledged and PSF approved Python authorities. The owners and developers of these projects set their own terms and don't follow the same rigorous CI, backwards compatibility and security policies as Python core. You can't force projects to work differently. Christian From antoine at python.org Thu Nov 29 13:20:01 2018 From: antoine at python.org (Antoine Pitrou) Date: Thu, 29 Nov 2018 19:20:01 +0100 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> Message-ID: Le 29/11/2018 ? 19:07, Steve Dower a ?crit?: > On 29Nov2018 0923, Antoine Pitrou wrote: >> I think the whole argument amounts to hand waving anyway. You are >> inventing an extended distribution which doesn't exist (except as >> Anaconda) to justify that we shouldn't accept more modules in the >> stdlib. But obviously maintaining an extended distribution is a lot >> more work than accepting a single module in the stdlib, and that's why >> you don't see anyone doing it, even though people have been floating the >> idea for years. > > https://anaconda.com/ > https://www.activestate.com/products/activepython/ > http://winpython.github.io/ > http://python-xy.github.io/ > https://www.enthought.com/product/canopy/ > https://software.intel.com/en-us/distribution-for-python > http://every-linux-distro-ever.example.com > > Do I need to keep going? I'm sure you could. So what? The point is that it's a lot of work to maintain if you want to do it seriously and with quality standards that would actually _satisfy_ the people for whom PyPI is not an option. Notice how the serious efforts in the list above are from commercial companies. Not small groups of volunteers. Yes, it's not unheard to have distributions maintained by volunteers (Debian?). It's just _hard_ and an awful lot of work, and apparently you're not volunteering to start it. So saying "we should make an extended distribution" if you're just waiting for others to do the job doesn't sound convincing to me, it just feels like you are derailing the discussion. Regards Antoine. From solipsis at pitrou.net Thu Nov 29 13:23:55 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 29 Nov 2018 19:23:55 +0100 Subject: [Python-Dev] Standard library vs Standard distribution? References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> Message-ID: <20181129192355.48c2665b@fsol> On Thu, 29 Nov 2018 19:08:56 +0100 Christian Heimes wrote: > > The owners and developers of these projects set their own terms and > don't follow the same rigorous CI, backwards compatibility and security > policies as Python core. You can't force projects to work differently. Who's talking about forcing anyone? We only include modules in the stdlib when authors are _willing_ for them to go into the stdlib. Regards Antoine. From solipsis at pitrou.net Thu Nov 29 13:30:07 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 29 Nov 2018 19:30:07 +0100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> Message-ID: <20181129193007.156b7c1a@fsol> On Thu, 29 Nov 2018 09:49:32 -0800 Nathaniel Smith wrote: > > There are a lot of challenges to switching to a "standard distribution" > model. I'm not certain it's the best option. But what I like about it is > that it could potentially reduce the conflict between what our different > user groups need, instead of playing zero-sum tug-of-war every time this > comes up. There is no conflict between what different _user_ groups need. Including lz4 in the stdlib will not create a conflict for people who prefer numpy or cryptography. Regards Antoine. From p.f.moore at gmail.com Thu Nov 29 13:40:42 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 29 Nov 2018 18:40:42 +0000 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> Message-ID: On Thu, 29 Nov 2018 at 16:28, Steve Dower wrote: > My experience is that the first group would benefit from a larger > _standard distribution_, which is not necessarily the same thing as a > larger stdlib. > > I'm firmly on the "smaller core, larger distribution" side of things, > where we as the core team take responsibility for the bare minimum > needed to be an effective language and split more functionality out to > individual libraries. We then also prepare/recommend a standard > distribution that bundles many of these libraries by default (Anaconda > style), as well as a minimal one that is a better starting point for > low-footprint systems (Miniconda style) or embedding into other apps. For the environments I'm familiar with, a "large standard distribution" would be just as acceptable as a "large standard library". However, we have to be *very* careful that we understand each other if we're going to make a fine distinction like this. So, specifically, my requirements are: 1. The installers shipped from python.org as the "official" Windows builds would need to be the "standard distribution", not a stripped down core. People not familiar with the nuances (colleagues who want to use a Python script I wrote, corporate auditors making decisions on "what's allowed", etc) can't be expected to deal with "Python, but not the one at python.org, this one instead" or even "Python, but make sure you get this non-default download".[1] 2. The various other distributions (Anaconda, ActiveState, ...) would need to buy into, and ship, the "standard distribution" (having to deal with debates around "I have Python", "No you don't, that distribution has different libraries" is problematic for grass-roots adoption - well, for any sort of consistent experience). This is probably both the most difficult requirement to achieve, and the one I'd have the best chance of being somewhat flexible over. But only "somewhat" flexible - we'd rapidly get bogged down in debating questions on a module-by-module basis... 3. The distribution needs to be versioned as a whole. A key point of a "standard set of modules" is not to have to deal with a combinatorial explosion of version management issues. Even "Python x.y.z with library a.b.c" offers some risks, unless the library is pretty stable (slow pace of change is, of course, one of the problems with the stdlib that proponents of a decoupled library hope to solve...) At this point, I've talked myself into a position where I don't see any practical difference between a stdlib and a standard distribution. So what I think I need is for someone to describe a concrete proposal for a "standard distribution", and explain clearly how it differs from the stdlib, and where it *fails* to meet the above criteria. Then, and only then, can I form a clear view on whether I would be OK with their version of a "standard distribution". Paul From steve.dower at python.org Thu Nov 29 13:46:23 2018 From: steve.dower at python.org (Steve Dower) Date: Thu, 29 Nov 2018 10:46:23 -0800 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> Message-ID: On 29Nov2018 1020, Antoine Pitrou wrote: > It's just _hard_ > and an awful lot of work, and apparently you're not volunteering to > start it. So saying "we should make an extended distribution" if you're > just waiting for others to do the job doesn't sound convincing to me, it > just feels like you are derailing the discussion. The problem with volunteering is that I'll immediately get told to just go off and do it as a separate thing, when the condition under which I will contribute to selecting and maintaining a set of bundled-but-not-stdlib packages is that we're actively trying to reduce the stdlib and hence the core maintenance burden. Without it being a core project, it's just more work with no benefit. I've previously volunteered to move certain modules to their own PyPI packages and bundle them (omitting package names to avoid upsetting people again), and I've done various analyses of which modules can be moved out. I've also deliberately designed functionality into the Windows installer to be able to bundle and install arbitrary packages whenever we like (there's an example in https://github.com/python/cpython/blob/master/Tools/msi/bundle/packagegroups/packageinstall.wxs). Plus I've been involved in packaging longer than I've been involved in core development. I find it highly embarrassing, but there are people out there who publicly credit me with "making it possible to use any packages at all on Windows". Please don't accuse me of throwing out ideas in this area without doing any work. When the discussion is about getting Python modules onto people's machines, discussing ways to get Python modules onto people's machines is actually keeping it on topic. Cheers, Steve From njs at pobox.com Thu Nov 29 14:05:28 2018 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 29 Nov 2018 11:05:28 -0800 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: <20181129193007.156b7c1a@fsol> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <20181129193007.156b7c1a@fsol> Message-ID: On Thu, Nov 29, 2018, 10:32 Antoine Pitrou On Thu, 29 Nov 2018 09:49:32 -0800 > Nathaniel Smith wrote: > > > > There are a lot of challenges to switching to a "standard distribution" > > model. I'm not certain it's the best option. But what I like about it is > > that it could potentially reduce the conflict between what our different > > user groups need, instead of playing zero-sum tug-of-war every time this > > comes up. > > There is no conflict between what different _user_ groups need. > Including lz4 in the stdlib will not create a conflict for people who > prefer numpy or cryptography. Some users need as much functionality as possible in the standard download. Some users need the best quality, most up to date software. The current stdlib design makes it impossible to serve both sets of users well. The conflict is less extreme for software that's stable, tightly scoped, and ubiquitous, like zlib or json; maybe lz4 is in the same place. But every time we talk about adding a new package, it turns into a flashpoint for these underlying tensions. I'm talking about the underlying issue, not about lz4 in particular. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From antoine at python.org Thu Nov 29 14:17:13 2018 From: antoine at python.org (Antoine Pitrou) Date: Thu, 29 Nov 2018 20:17:13 +0100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <20181129193007.156b7c1a@fsol> Message-ID: <5044da26-389d-3488-404e-ed4e81c07dc3@python.org> Le 29/11/2018 ? 20:05, Nathaniel Smith a ?crit?: > On Thu, Nov 29, 2018, 10:32 Antoine Pitrou wrote: > > On Thu, 29 Nov 2018 09:49:32 -0800 > Nathaniel Smith > wrote: > > > > There are a lot of challenges to switching to a "standard > distribution" > > model. I'm not certain it's the best option. But what I like about > it is > > that it could potentially reduce the conflict between what our > different > > user groups need, instead of playing zero-sum tug-of-war every > time this > > comes up. > > There is no conflict between what different _user_ groups need. > Including lz4 in the stdlib will not create a conflict for people who > prefer numpy or cryptography. > > Some users need as much functionality as possible in the standard > download. Some users need the best quality, most up to date software. > The current stdlib design makes it impossible to serve both sets of > users well. But it doesn't have to. Obviously not every need can be solved by the stdlib, unless every Python package that has at least one current user is put in the stdlib. So, yes, there's a discussion for each concretely proposed package about whether it's sufficiently useful (and stable etc.) to be put in the stdlib. Every time it's a balancing act, and obviously it's an imperfect decision. That doesn't mean it cannot be done. You'd actually end up having the same discussions if designing a distribution, only the overall bar would probably be lower. By the way, I'll argue that putting modules in the stdlib does produce better quality modules. For example, putting multiprocessing in the stdlib uncovered many defects that were previously largely unseen (thanks in part to the extended CI that Python runs on). Improving it was a sometimes painful process, but today multiprocessing's quality is far beyond what it was when originally included, and it can be reasonably considered rock-solid within its intended domain. One argument against putting lz4 in the stdlib is that compression algorithms come by and, unless there's a large use base already, can disappear quickly if they get outdated by some newer and better algorithm. I'm surprised I haven't seen that argument yet. Regards Antoine. From p.f.moore at gmail.com Thu Nov 29 14:52:47 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 29 Nov 2018 19:52:47 +0000 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <20181129193007.156b7c1a@fsol> Message-ID: On Thu, 29 Nov 2018 at 19:08, Nathaniel Smith wrote: > > On Thu, Nov 29, 2018, 10:32 Antoine Pitrou > >> On Thu, 29 Nov 2018 09:49:32 -0800 >> Nathaniel Smith wrote: >> > >> > There are a lot of challenges to switching to a "standard distribution" >> > model. I'm not certain it's the best option. But what I like about it is >> > that it could potentially reduce the conflict between what our different >> > user groups need, instead of playing zero-sum tug-of-war every time this >> > comes up. >> >> There is no conflict between what different _user_ groups need. >> Including lz4 in the stdlib will not create a conflict for people who >> prefer numpy or cryptography. > > > Some users need as much functionality as possible in the standard download. Some users need the best quality, most up to date software. The current stdlib design makes it impossible to serve both sets of users well. ... and some users need a single, unambiguous choice for the "official, complete" distribution. Which need the current stdlib serves extremely well. > The conflict is less extreme for software that's stable, tightly scoped, and ubiquitous, like zlib or json; maybe lz4 is in the same place. But every time we talk about adding a new package, it turns into a flashpoint for these underlying tensions. I'm talking about the underlying issue, not about lz4 in particular. Agreed, there are functional areas that are less foundational, and more open to debate. But historically, we've been very good at achieving a balance in those areas. In exploring alternatives, let's not lose sight of the fact that the stdlib has been a huge success, so we know we *can* deliver an extremely successful distribution based on that model, no matter how much it might trigger regular debates :-) Paul From p.f.moore at gmail.com Thu Nov 29 15:00:19 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 29 Nov 2018 20:00:19 +0000 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> Message-ID: On Thu, 29 Nov 2018 at 17:52, Nathaniel Smith wrote: > > On Thu, Nov 29, 2018, 08:34 Antoine Pitrou > >> >> Le 29/11/2018 ? 17:25, Steve Dower a ?crit : >> > >> > My experience is that the first group would benefit from a larger >> > _standard distribution_, which is not necessarily the same thing as a >> > larger stdlib. >> > >> > I'm firmly on the "smaller core, larger distribution" side of things, >> > where we as the core team take responsibility for the bare minimum >> > needed to be an effective language and split more functionality out to >> > individual libraries. >> >> We may ask ourselves if there is really a large difference between a >> "standard distribution" and a "standard library". The primary >> difference seems to be that the distribution is modular, while the >> stdlib is not. > > > Some differences that come to mind: Ha. That arrived after I'd sent my other email. I'm not going to discuss these points case by case, except to say that there are some definite aspects of your interpretation of a "standard distribution" that are in conflict with what would work best for my use case.[1] Regardless of the details that we discuss here, it seems self-evident to me that any proposal to move away from the current "core+stdlib" distribution would be a significant change and would absolutely require a PEP, which would have to cover all of these trade-offs in detail. But until there is such a PEP, that clearly explains the precise variant of "standard distribution" model that is being proposed, I don't see much point in getting into details. There's simply too much likelihood of people talking past each other because of differing assumptions. Paul [1] Which, to reiterate, remains just one particular environment, plus a lot of extrapolation that there are "many other enterprise environments like mine"[2] [2] Oh, I really hope not - pity the poor souls who would work there :-) From mertz at gnosis.cx Thu Nov 29 15:09:47 2018 From: mertz at gnosis.cx (David Mertz) Date: Thu, 29 Nov 2018 15:09:47 -0500 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <20181129193007.156b7c1a@fsol> Message-ID: On Thu, Nov 29, 2018, 2:55 PM Paul Moore ... and some users need a single, unambiguous choice for the > "official, complete" distribution. Which need the current stdlib > serves extremely well. > Except it doesn't. At least not for a large swatch of users. 10 years ago, what I wanted in Python was pretty much entirely in the stdlib. The contents of stdlib haven't changed that much since then, but MY needs have. For what I do personally, a distribution without NumPy, Pandas, Numba, scikit-learn, and matplotlib is unusably incomplete. On the other hand, I rarely care about Django, Twisted, Whoosh, or Sphinx. But some users need those things, and even lots of supporting packages in their ecosystems. What makes a "complete distribution?" It really depends on context. The stdlib is an extremely good compromise, but it absolutely is a compromise. I feel like there is plenty of room for different purpose-driven supersets of the stdlib to make different compromises. Steve Dower lists 10 or so such distros; what they have in common is that SOMEONE, decided to curate a collection... which does not need any approval from the PSF or the core developers. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Nov 29 15:29:49 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 29 Nov 2018 20:29:49 +0000 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> Message-ID: On Thu, 29 Nov 2018 at 18:09, Steve Dower wrote: > > On 29Nov2018 0923, Antoine Pitrou wrote: > > I think the whole argument amounts to hand waving anyway. You are > > inventing an extended distribution which doesn't exist (except as > > Anaconda) to justify that we shouldn't accept more modules in the > > stdlib. But obviously maintaining an extended distribution is a lot > > more work than accepting a single module in the stdlib, and that's why > > you don't see anyone doing it, even though people have been floating the > > idea for years. > > https://anaconda.com/ > https://www.activestate.com/products/activepython/ > http://winpython.github.io/ > http://python-xy.github.io/ > https://www.enthought.com/product/canopy/ > https://software.intel.com/en-us/distribution-for-python > http://every-linux-distro-ever.example.com > > Do I need to keep going? Nope, you've already demonstrated the problem with recommending external distributions :-) Which one do I promote within my company? Suppose I say "the Linux distro packaged version", Then Windows clients are out of the picture. If I say "Anaconda" for them, they will use (for example) numpy, and the resulting scripts won't work on the Linux machines. Choice is not always an advantage. Every single one of those distributions includes the stdlib. If we remove the stdlib, what will end up as the lowest common denominator functionality that all Python scripts can assume? Obviously at least initially, inertia will mean the stdlib will still be present, but how long will it be before someone removes urllib in favour of the (better, but with an incompatible API) requests library? And how then can a "generic" Python script get a resource from the web? > Accepting a module in the stdlib means accepting the full development > and maintenance burden. Absolutely. And yes, that's a significant cost that we pay. > Maintaining a list of "we recommend these so > strongly here's an installer that will give them to you" is a very > different kind of burden, and one that is significantly easier to bear. OK, so that reduces our costs. But what about our users? Does it increase their costs, offer a benefit to them, or is it cost-neutral? Obviously it depends on the user, but I contend that overall, it's a cost for our user base (even users who have easy access to PyPI will still incur overheads for an extra external dependency). So we're asking our users to pay the cost for a benefit to us. That may be reasonable, but let's at least be clear about it. Alternatively, if you *do* see it as a benefit for our users, I'd like to know how, because I'm missing that point. Paul From greg at krypto.org Thu Nov 29 15:30:09 2018 From: greg at krypto.org (Gregory P. Smith) Date: Thu, 29 Nov 2018 12:30:09 -0800 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> Message-ID: On Thu, Nov 29, 2018 at 2:58 AM Andrew Svetlov wrote: > 5 cents about lz4 alternatives: Broli (mentioned above) is widely > supported by web. > > https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding > mentions it along with gzip and deflate methods. > I don't recall lz4 or Zstd metioning in this context. > > Both Chrome/Chromium and Firefox accepts it by default (didn't check > Microsoft products yet). > Acceptance by multiple popular browsers is a good reason to *also* propose brotli support in the stdlib. Though it'd probably make sense to actually _support_ Accept-Encoding based on available compression modules within the stdlib http.client (and potentially server) as a prerequisite for that reasoning. https://github.com/python/cpython/blob/master/Lib/http/client.py#L1168. -gps > P.S. > I worked with lz4 python binding a year ago. > It sometimes crashed to core dump when used in multithreaded environment > (we used to run compressor/decompresson with asyncio by > loop.run_in_executor() call). > I hope the bug is fixed now, have no update for the current state. > > On Thu, Nov 29, 2018 at 12:04 PM INADA Naoki > wrote: > >> On Thu, Nov 29, 2018 at 6:27 AM Steven D'Aprano >> wrote: >> > >> > On Wed, Nov 28, 2018 at 10:43:04AM -0800, Gregory P. Smith wrote: >> > >> > > PyPI makes getting more algorithms easy. >> > >> > Can we please stop over-generalising like this? PyPI makes getting >> > more algorithms easy for *SOME* people. (Sorry for shouting, but you >> > just pressed one of my buttons.) >> >> I don't think this is over-generalising. >> >> If "get it from PyPI" is not easy enough, why not adding hundreds of >> famous libraries? >> Because we can't maintain all of them well. >> >> When considering adding new format (not only compression, but also >> serialization like toml), I think it should be stable, widely used, and >> will >> be used widely for a long time. If we want to use the format in Python >> core >> or Python stdlib, it's good reasoning too. gzip and json are good >> example. >> >> When we say "we can use PyPI", it means "are there enough reasons >> make the package special enough to add to stdlib?" We don't mean >> "everyone can use PyPI." >> >> Regards, >> -- >> INADA Naoki >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/andrew.svetlov%40gmail.com >> > > > -- > Thanks, > Andrew Svetlov > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/greg%40krypto.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Nov 29 16:08:35 2018 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 29 Nov 2018 13:08:35 -0800 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> Message-ID: On Thu, Nov 29, 2018 at 10:11 AM Christian Heimes wrote: > You are assuming that you can convince or force upstream developers to > change their project and development style. Speaking from personal > experience, that is even unrealistic for projects that are already > developed and promoted by officially acknowledged and PSF approved > Python authorities. > > The owners and developers of these projects set their own terms and > don't follow the same rigorous CI, backwards compatibility and security > policies as Python core. You can't force projects to work differently. Python core does an excellent job at CI, backcompat, security, etc., and everyone who works to make that happen should be proud. But... I think we should be careful not to frame this as Python core just having higher standards than everyone else. For some projects that's probably true. But for the major projects where I have some knowledge of the development process -- like requests, urllib3, numpy, pip, setuptools, cryptography -- the main blocker to putting them in the stdlib is that the maintainers don't think a stdlibized version could meet their quality standards. -n -- Nathaniel J. Smith -- https://vorpus.org From steve.dower at python.org Thu Nov 29 16:18:46 2018 From: steve.dower at python.org (Steve Dower) Date: Thu, 29 Nov 2018 13:18:46 -0800 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> Message-ID: On 29Nov2018 1229, Paul Moore wrote: > On Thu, 29 Nov 2018 at 18:09, Steve Dower wrote: >> Maintaining a list of "we recommend these so >> strongly here's an installer that will give them to you" is a very >> different kind of burden, and one that is significantly easier to bear. > > OK, so that reduces our costs. But what about our users? Does it > increase their costs, offer a benefit to them, or is it cost-neutral? > Obviously it depends on the user, but I contend that overall, it's a > cost for our user base (even users who have easy access to PyPI will > still incur overheads for an extra external dependency). So we're > asking our users to pay the cost for a benefit to us. That may be > reasonable, but let's at least be clear about it. Alternatively, if > you *do* see it as a benefit for our users, I'd like to know how, > because I'm missing that point. Probably an assumption I'm making (because I've argued the case previously) is that anything we remove from the current stdlib becomes a pip installable package that is preinstalled with the main distro. Perhaps our distro doesn't even grow from what it is today - it simply gets rearranged a bit on disk. The benefits for users is now backports are on the same footing as core libraries, as are per-package updates. The "core+precise dependencies" model for deployment could drastically improve install times in some circumstances (particularly Windows, but hey, that's my area so I care about it :) ). A number of core packages aren't really tied to the version of Python they ship with, and so users could safely backport all fixes and improvements at any time. Longer term, if something happens like "the core only includes a very high-level HTTPS API and 'socket' is an extra module if you need that API", then we can use the OS APIs and give proper proxy/TLS behaviour in core for a narrower set of uses (and sure, maybe the Linux core still requires socket and OpenSSL, but other platforms don't have to require them for functionality provided by the OS). Of course, any churn has a risk of causing new issues and so it has a cost both to us and users. There will certainly be new shadowing concerns, and code changes to unwind tricky dependencies could lead to new bugs. I think the upsides are worth it in the long run, but obviously that's not (yet) the consensus or we'd be doing it already :) Cheers, Steve From steve.dower at python.org Thu Nov 29 16:22:24 2018 From: steve.dower at python.org (Steve Dower) Date: Thu, 29 Nov 2018 13:22:24 -0800 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> Message-ID: <83fbba69-d862-40d4-38a8-8e96c8e7f93b@python.org> On 29Nov2018 1230, Gregory P. Smith wrote: > > On Thu, Nov 29, 2018 at 2:58 AM Andrew Svetlov > wrote: > > 5 cents about lz4 alternatives: Broli (mentioned above) is widely > supported by web. > > https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding > mentions it along with gzip and deflate methods. > I don't recall lz4 or Zstd metioning in this context. > > Both Chrome/Chromium and Firefox accepts it by default (didn't check > Microsoft products yet). > > > Acceptance by multiple popular browsers is a good reason to /also/ > propose brotli support in the stdlib. Though it'd probably make sense to > actually _support_ Accept-Encoding based on available compression > modules within the stdlib http.client (and potentially server) as a > prerequisite for that reasoning. > https://github.com/python/cpython/blob/master/Lib/http/client.py#L1168. FWIW, Brotli has been supported in Microsoft Edge since early last year: https://blogs.windows.com/msedgedev/2016/12/20/introducing-brotli-compression/ > -gps From njs at pobox.com Thu Nov 29 16:30:28 2018 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 29 Nov 2018 13:30:28 -0800 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> Message-ID: On Thu, Nov 29, 2018 at 10:22 AM Antoine Pitrou wrote: > > Le 29/11/2018 ? 19:07, Steve Dower a ?crit : > > On 29Nov2018 0923, Antoine Pitrou wrote: > >> I think the whole argument amounts to hand waving anyway. You are > >> inventing an extended distribution which doesn't exist (except as > >> Anaconda) to justify that we shouldn't accept more modules in the > >> stdlib. But obviously maintaining an extended distribution is a lot > >> more work than accepting a single module in the stdlib, and that's why > >> you don't see anyone doing it, even though people have been floating the > >> idea for years. > > > > https://anaconda.com/ > > https://www.activestate.com/products/activepython/ > > http://winpython.github.io/ > > http://python-xy.github.io/ > > https://www.enthought.com/product/canopy/ > > https://software.intel.com/en-us/distribution-for-python > > http://every-linux-distro-ever.example.com > > > > Do I need to keep going? > > I'm sure you could. So what? The point is that it's a lot of work to > maintain if you want to do it seriously and with quality standards that > would actually _satisfy_ the people for whom PyPI is not an option. Yeah, I draw two conclusions from the list above: - Paul expressed uncertainty about how many people are in his position of needing a single download with all the batteries included, but obviously he's not alone. So many people want a single-box-of-batteries that whole businesses are being built on fulfilling that need. - Currently, our single-box-of-batteries is doing such a lousy job of solving Paul's problem, that people are building whole businesses on our failure. If Python core wants to be in the business of providing a single-box-of-batteries that solves Paul's problem, then we need to rethink how the stdlib works. Or, we could decide we want to leave that to the distros that are better at it, and focus on our core strengths like the language and interpreter. But if the stdlib isn't a single-box-of-batteries, then what is it? It's really hard to tell whether specific packages would be good or bad additions to the stdlib, when we don't even know what the stdlib is supposed to be. -n -- Nathaniel J. Smith -- https://vorpus.org From steve.dower at python.org Thu Nov 29 16:46:56 2018 From: steve.dower at python.org (Steve Dower) Date: Thu, 29 Nov 2018 13:46:56 -0800 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> Message-ID: <5bfea260-42d0-f091-5a84-385b7cc213f3@python.org> On 29Nov2018 1330, Nathaniel Smith wrote: > On Thu, Nov 29, 2018 at 10:22 AM Antoine Pitrou wrote: >> >> Le 29/11/2018 ? 19:07, Steve Dower a ?crit : >>> On 29Nov2018 0923, Antoine Pitrou wrote: >>>> I think the whole argument amounts to hand waving anyway. You are >>>> inventing an extended distribution which doesn't exist (except as >>>> Anaconda) to justify that we shouldn't accept more modules in the >>>> stdlib. But obviously maintaining an extended distribution is a lot >>>> more work than accepting a single module in the stdlib, and that's why >>>> you don't see anyone doing it, even though people have been floating the >>>> idea for years. >>> >>> https://anaconda.com/ >>> https://www.activestate.com/products/activepython/ >>> http://winpython.github.io/ >>> http://python-xy.github.io/ >>> https://www.enthought.com/product/canopy/ >>> https://software.intel.com/en-us/distribution-for-python >>> http://every-linux-distro-ever.example.com >>> >>> Do I need to keep going? >> >> I'm sure you could. So what? The point is that it's a lot of work to >> maintain if you want to do it seriously and with quality standards that >> would actually _satisfy_ the people for whom PyPI is not an option. > > Yeah, I draw two conclusions from the list above: > > - Paul expressed uncertainty about how many people are in his position > of needing a single download with all the batteries included, but > obviously he's not alone. So many people want a > single-box-of-batteries that whole businesses are being built on > fulfilling that need. > > - Currently, our single-box-of-batteries is doing such a lousy job of > solving Paul's problem, that people are building whole businesses on > our failure. I agree with these two conclusions, and for what it's worth, I don't think we would do anything to put these out of business - they each have their own value that we wouldn't reproduce. Python's box of batteries these days are really the building blocks needed to ensure the libraries in these bigger distributions can communicate with each other. For example, asyncio has its place in the stdlib for this reason, as does pathlib (though perhaps the __fspath__ protocol makes the latter one a little less compelling). If the stdlib was to grow a fundamental data type somehow shared by numpy/scipy/pandas/etc. then I'd consider that a very good candidate for the stdlib to help those libraries share data, even if none of those packages were in the standard distro. At the same time, why carry that data type into your embedded app if it won't be used? Why design it in such a way that it can't be used with earlier versions of Python even if it reasonably could be? We already do backports for many new stdlib modules, so why aren't we just installing the backport module in the newer version by default and running it on its own release schedule? I forget which one, but there was one backport that ended up being recommended in place of its stdlib counterpart for a while because it got critical bugfixes sooner. Installing pip in this manner hasn't broken anyone's world (that I'm aware of). There's plenty of space for businesses to be built on being "the best Python distro for ", and if the stdlib is the best layer for enabling third-party libraries to agree on how to work with each other, then we make the entire ecosystem stronger. (That simple agreement got longer than I intended :) ) Cheers, Steve From christian at python.org Thu Nov 29 17:04:15 2018 From: christian at python.org (Christian Heimes) Date: Thu, 29 Nov 2018 23:04:15 +0100 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> Message-ID: <9c0071c2-a455-53cd-756c-4ae95a487a41@python.org> On 29/11/2018 22.08, Nathaniel Smith wrote: > On Thu, Nov 29, 2018 at 10:11 AM Christian Heimes wrote: >> You are assuming that you can convince or force upstream developers to >> change their project and development style. Speaking from personal >> experience, that is even unrealistic for projects that are already >> developed and promoted by officially acknowledged and PSF approved >> Python authorities. >> >> The owners and developers of these projects set their own terms and >> don't follow the same rigorous CI, backwards compatibility and security >> policies as Python core. You can't force projects to work differently. > > Python core does an excellent job at CI, backcompat, security, etc., > and everyone who works to make that happen should be proud. > > But... I think we should be careful not to frame this as Python core > just having higher standards than everyone else. For some projects > that's probably true. But for the major projects where I have some > knowledge of the development process -- like requests, urllib3, numpy, > pip, setuptools, cryptography -- the main blocker to putting them in > the stdlib is that the maintainers don't think a stdlibized version > could meet their quality standards. It looks like I phrased my statement unclear. CPython's backwards compatibility and security policy isn't necessarily superior to that of other projects. Our policy is just different and more "enterprisy". Let's take cryptography as an example. I contribute to cryptography once in a while and I maintain the official Fedora and RHEL packages. - Alex and Paul release a new version about a day after each OpenSSL security release. That's much better than CPython's policy, because CPython only updates OpenSSL with its regularly scheduled updates. - However they don't release fixes for older minor releases while CPython has multiple maintained minor releases that get backports of fixes. That means I have to manually backport security fixes for cryptography because I can't just introduce new features or API changes in RHEL. The policy is different from CPython's policy. Some aspects are better, in other aspects CPython and cryptography follow a different philosophy. Christian From andrew.svetlov at gmail.com Thu Nov 29 17:10:48 2018 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Fri, 30 Nov 2018 00:10:48 +0200 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> Message-ID: Neither http.client nor http.server doesn't support compression (gzip/compress/deflate) at all. I doubt if we want to add this feature: for client better to use requests or, well, aiohttp. The same for servers: almost any production ready web server from PyPI supports compression. I don't insist on adding brotli to standard library. There is officiall brotli library on PyPI from google, binary wheels are provided. Unfortunately installing from a tarball requires C++ compiler On other hand writing a binding in pure C looks very easy. On Thu, Nov 29, 2018 at 10:30 PM Gregory P. Smith wrote: > > On Thu, Nov 29, 2018 at 2:58 AM Andrew Svetlov > wrote: > >> 5 cents about lz4 alternatives: Broli (mentioned above) is widely >> supported by web. >> >> https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding >> mentions it along with gzip and deflate methods. >> I don't recall lz4 or Zstd metioning in this context. >> >> Both Chrome/Chromium and Firefox accepts it by default (didn't check >> Microsoft products yet). >> > > Acceptance by multiple popular browsers is a good reason to *also* > propose brotli support in the stdlib. Though it'd probably make sense to > actually _support_ Accept-Encoding based on available compression modules > within the stdlib http.client (and potentially server) as a prerequisite > for that reasoning. > https://github.com/python/cpython/blob/master/Lib/http/client.py#L1168. > > -gps > > >> P.S. >> I worked with lz4 python binding a year ago. >> It sometimes crashed to core dump when used in multithreaded environment >> (we used to run compressor/decompresson with asyncio by >> loop.run_in_executor() call). >> I hope the bug is fixed now, have no update for the current state. >> >> On Thu, Nov 29, 2018 at 12:04 PM INADA Naoki >> wrote: >> >>> On Thu, Nov 29, 2018 at 6:27 AM Steven D'Aprano >>> wrote: >>> > >>> > On Wed, Nov 28, 2018 at 10:43:04AM -0800, Gregory P. Smith wrote: >>> > >>> > > PyPI makes getting more algorithms easy. >>> > >>> > Can we please stop over-generalising like this? PyPI makes getting >>> > more algorithms easy for *SOME* people. (Sorry for shouting, but you >>> > just pressed one of my buttons.) >>> >>> I don't think this is over-generalising. >>> >>> If "get it from PyPI" is not easy enough, why not adding hundreds of >>> famous libraries? >>> Because we can't maintain all of them well. >>> >>> When considering adding new format (not only compression, but also >>> serialization like toml), I think it should be stable, widely used, and >>> will >>> be used widely for a long time. If we want to use the format in Python >>> core >>> or Python stdlib, it's good reasoning too. gzip and json are good >>> example. >>> >>> When we say "we can use PyPI", it means "are there enough reasons >>> make the package special enough to add to stdlib?" We don't mean >>> "everyone can use PyPI." >>> >>> Regards, >>> -- >>> INADA Naoki >>> _______________________________________________ >>> Python-Dev mailing list >>> Python-Dev at python.org >>> https://mail.python.org/mailman/listinfo/python-dev >>> Unsubscribe: >>> https://mail.python.org/mailman/options/python-dev/andrew.svetlov%40gmail.com >>> >> >> >> -- >> Thanks, >> Andrew Svetlov >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/greg%40krypto.org >> > -- Thanks, Andrew Svetlov -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew.svetlov at gmail.com Thu Nov 29 17:11:37 2018 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Fri, 30 Nov 2018 00:11:37 +0200 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: <83fbba69-d862-40d4-38a8-8e96c8e7f93b@python.org> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <83fbba69-d862-40d4-38a8-8e96c8e7f93b@python.org> Message-ID: On Thu, Nov 29, 2018 at 11:22 PM Steve Dower wrote: > FWIW, Brotli has been supported in Microsoft Edge since early last year: > > > https://blogs.windows.com/msedgedev/2016/12/20/introducing-brotli-compression/ > > Thanks, good to know. -- Thanks, Andrew Svetlov -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Nov 29 17:29:33 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 29 Nov 2018 22:29:33 +0000 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> Message-ID: On Thu, 29 Nov 2018 at 21:33, Nathaniel Smith wrote: > > On Thu, Nov 29, 2018 at 10:22 AM Antoine Pitrou wrote: > > > > Le 29/11/2018 ? 19:07, Steve Dower a ?crit : > > > On 29Nov2018 0923, Antoine Pitrou wrote: > > >> I think the whole argument amounts to hand waving anyway. You are > > >> inventing an extended distribution which doesn't exist (except as > > >> Anaconda) to justify that we shouldn't accept more modules in the > > >> stdlib. But obviously maintaining an extended distribution is a lot > > >> more work than accepting a single module in the stdlib, and that's why > > >> you don't see anyone doing it, even though people have been floating the > > >> idea for years. > > > > > > https://anaconda.com/ > > > https://www.activestate.com/products/activepython/ > > > http://winpython.github.io/ > > > http://python-xy.github.io/ > > > https://www.enthought.com/product/canopy/ > > > https://software.intel.com/en-us/distribution-for-python > > > http://every-linux-distro-ever.example.com > > > > > > Do I need to keep going? > > > > I'm sure you could. So what? The point is that it's a lot of work to > > maintain if you want to do it seriously and with quality standards that > > would actually _satisfy_ the people for whom PyPI is not an option. > > Yeah, I draw two conclusions from the list above: > > - Paul expressed uncertainty about how many people are in his position > of needing a single download with all the batteries included, but > obviously he's not alone. So many people want a > single-box-of-batteries that whole businesses are being built on > fulfilling that need. > > - Currently, our single-box-of-batteries is doing such a lousy job of > solving Paul's problem, that people are building whole businesses on > our failure. Ouch. Congratulations on neatly using my own arguments to reverse my position :-) Those are both very good points. However... > If Python core wants to be in the business of providing a > single-box-of-batteries that solves Paul's problem, then we need to > rethink how the stdlib works. Or, we could decide we want to leave > that to the distros that are better at it, and focus on our core > strengths like the language and interpreter. But if the stdlib isn't a > single-box-of-batteries, then what is it? IMO, the CPython stdlib is the *reference* library. It's what you can rely on if you want to publish code that "just works" *everywhere*. By "publish" here, I don't particularly mean "distribute code as .py files". I'm thinking much more of StackOverflow answers to "how do I do X in Python" questions, books and tutorials teaching Python, etc etc. It solves my problem, even if other distributions also do - and it has the advantage of being the solution that every other solution is a superset of, so I can say "this works on Python", and know that my statement encompasses "this works on Anaconda", "this works on ActiveState Python", "this works on your distribution's Python", ... - whereas the converse is *not* true. In the environment I work in, various groups and environments may get management buy in (or more often, just management who are willing to not object) for using Python. But they don't always end up using the same version (the data science people use Anaconda, the automation people use the distro Python, the integration guys like me use the python.org Python, ...), so having that common denominator means we can still share code.[1] Steve Dower's explanation of how he sees "splitting up the stdlib" working strikes me as potentially a good way of removing some of the maintenance cost of the stdlib *without* losing the "this is stuff you can expect to be available in every version of Python" aspect of the stdlib. But I'd want to see a more concrete proposal on how it would work, and how we could ensure that (for example) we retain the ability for *every* Python implementation to get data from a URL, even if it's clunky old urllib and the cool distributions with requests only supply it "for compatibility", before I'd be completely sold on the idea. > It's really hard to tell whether specific packages would be good or > bad additions to the stdlib, when we don't even know what the stdlib > is supposed to be. Agreed. But I think you're making things sound worse than they are. We (collectively) probably *do* know what the stdlib is, even if it's not easy to articulate. It's what we confidently expect to be present in any Python environment we sit down at. Much like you'd expect every Linux distribution to include grep, even though newer tools like ag or rg may be much better and what you'd prefer to use in practice. And just like Linux isn't much if all you have is the kernel, so Python is more than just the core language and builtins. Paul [1] Actually, things are currently not that far advanced - the various groups don't interact at all, yet. So focusing on the stdlib for me is a way of preparing for the time when we *do* start sharing, and making that transition relatively effortless and hence an argument in favour of Python rather than demonstrating that "these special-purpose languages just create silos". From brett at python.org Thu Nov 29 19:12:29 2018 From: brett at python.org (Brett Cannon) Date: Thu, 29 Nov 2018 16:12:29 -0800 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> Message-ID: On Thu, 29 Nov 2018 at 14:12, Andrew Svetlov wrote: > Neither http.client nor http.server doesn't support compression > (gzip/compress/deflate) at all. > I doubt if we want to add this feature: for client better to use requests > or, well, aiohttp. > The same for servers: almost any production ready web server from PyPI > supports compression. > There was actually a PR to add compressions support to http.server but I closed it in the name of maintainability since http.server, as you said, isn't for production use so compression isn't critical. -Brett > > I don't insist on adding brotli to standard library. > There is officiall brotli library on PyPI from google, binary wheels are > provided. > Unfortunately installing from a tarball requires C++ compiler > On other hand writing a binding in pure C looks very easy. > > On Thu, Nov 29, 2018 at 10:30 PM Gregory P. Smith wrote: > >> >> On Thu, Nov 29, 2018 at 2:58 AM Andrew Svetlov >> wrote: >> >>> 5 cents about lz4 alternatives: Broli (mentioned above) is widely >>> supported by web. >>> >>> https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding >>> mentions it along with gzip and deflate methods. >>> I don't recall lz4 or Zstd metioning in this context. >>> >>> Both Chrome/Chromium and Firefox accepts it by default (didn't check >>> Microsoft products yet). >>> >> >> Acceptance by multiple popular browsers is a good reason to *also* >> propose brotli support in the stdlib. Though it'd probably make sense to >> actually _support_ Accept-Encoding based on available compression modules >> within the stdlib http.client (and potentially server) as a prerequisite >> for that reasoning. >> https://github.com/python/cpython/blob/master/Lib/http/client.py#L1168. >> >> -gps >> >> >>> P.S. >>> I worked with lz4 python binding a year ago. >>> It sometimes crashed to core dump when used in multithreaded environment >>> (we used to run compressor/decompresson with asyncio by >>> loop.run_in_executor() call). >>> I hope the bug is fixed now, have no update for the current state. >>> >>> On Thu, Nov 29, 2018 at 12:04 PM INADA Naoki >>> wrote: >>> >>>> On Thu, Nov 29, 2018 at 6:27 AM Steven D'Aprano >>>> wrote: >>>> > >>>> > On Wed, Nov 28, 2018 at 10:43:04AM -0800, Gregory P. Smith wrote: >>>> > >>>> > > PyPI makes getting more algorithms easy. >>>> > >>>> > Can we please stop over-generalising like this? PyPI makes getting >>>> > more algorithms easy for *SOME* people. (Sorry for shouting, but you >>>> > just pressed one of my buttons.) >>>> >>>> I don't think this is over-generalising. >>>> >>>> If "get it from PyPI" is not easy enough, why not adding hundreds of >>>> famous libraries? >>>> Because we can't maintain all of them well. >>>> >>>> When considering adding new format (not only compression, but also >>>> serialization like toml), I think it should be stable, widely used, and >>>> will >>>> be used widely for a long time. If we want to use the format in Python >>>> core >>>> or Python stdlib, it's good reasoning too. gzip and json are good >>>> example. >>>> >>>> When we say "we can use PyPI", it means "are there enough reasons >>>> make the package special enough to add to stdlib?" We don't mean >>>> "everyone can use PyPI." >>>> >>>> Regards, >>>> -- >>>> INADA Naoki >>>> _______________________________________________ >>>> Python-Dev mailing list >>>> Python-Dev at python.org >>>> https://mail.python.org/mailman/listinfo/python-dev >>>> Unsubscribe: >>>> https://mail.python.org/mailman/options/python-dev/andrew.svetlov%40gmail.com >>>> >>> >>> >>> -- >>> Thanks, >>> Andrew Svetlov >>> _______________________________________________ >>> Python-Dev mailing list >>> Python-Dev at python.org >>> https://mail.python.org/mailman/listinfo/python-dev >>> Unsubscribe: >>> https://mail.python.org/mailman/options/python-dev/greg%40krypto.org >>> >> > > -- > Thanks, > Andrew Svetlov > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Nov 29 19:14:47 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 30 Nov 2018 11:14:47 +1100 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: References: <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> Message-ID: <20181130001447.GG4319@ando.pearwood.info> On Thu, Nov 29, 2018 at 01:30:28PM -0800, Nathaniel Smith wrote: [...] > > > https://anaconda.com/ > > > https://www.activestate.com/products/activepython/ > > > http://winpython.github.io/ > > > http://python-xy.github.io/ > > > https://www.enthought.com/product/canopy/ > > > https://software.intel.com/en-us/distribution-for-python > > > http://every-linux-distro-ever.example.com > Yeah, I draw two conclusions from the list above: > > - Paul expressed uncertainty about how many people are in his position > of needing a single download with all the batteries included, but > obviously he's not alone. So many people want a > single-box-of-batteries that whole businesses are being built on > fulfilling that need. I think that's inaccurate: at least some of those are not "box of batteries" but "nuclear reactor" distros, aimed at adding significant value to above and beyond anything that the stdlib can practically include. Things like numpy/scipy, or a powerful IDE. I'm confident that they're ALL likely to be nuclear reactor distros, for different values of nuclear reactors, but just in case one of them is not, I'll say "at least some" *wink* Another nuclear reactor is packaging itself. Despite pip, installing third-party packages is still enough of a burden and annoyance that some people are willing to pay money to have a third-party deal with the installation hassles. That's a data point going against the "just get it from PyPI" mindset. > - Currently, our single-box-of-batteries is doing such a lousy job of > solving Paul's problem, that people are building whole businesses on > our failure. That's an unfairly derogatory way of describing it. Nobody has suggested that the stdlib could be, or ought to be, the one solution for *everyone*. That would be impossible. Even Java has a rich ecosystem of third-party add-ons. No matter what we have in the stdlib, there's always going to be opportunities for people to attempt to build businesses on the gaps left over. And that is how it should be, and we should not conclude that the stdlib is doing "such a lousy job" at solving problems. Especially not *Paul's* problems, as I understand he personally is reasonably satisfied with the stdlib and doesn't use any of those third-party distros. (Paul, did I get that right?) We don't know how niche the above distros are. We don't know how successful their businesses are. All we know is: (1) they fill at least some gaps in the stdlib; (2) such gaps are inevitable, no matter how small or large the stdlib is, it can't be all things to all people; (3) and this is a sign of a healthy Python ecosystem, not a sign of failure of the stdlib. > If Python core wants to be in the business of providing a > single-box-of-batteries that solves Paul's problem, then we need to > rethink how the stdlib works. No we don't "need" to rethink anything. The current model has worked fine for 25+ years and grew Python from a tiny one-person skunkworks project in Guido's home to one of the top ten most popular programming languages in the world, however you measure popularity. And alone(?) among them, Python is the only one without either an ISO standard or a major corporate backer pushing it. We don't absolutely know that Python's success was due to the "batteries included" policy, but it seems pretty likely. We ought to believe people when they say that they were drawn to Python because of the stdlib. There are plenty of other languages that come with a tiny stdlib and leave everything else to third parties. Outside of those like Javascript, which has a privileged position due to it being the standard browser scripting language (and is backed by an ISO standard and at least one major companies vigourously driving it), how is that working out for them? The current model for the stdlib seems to be working well, and we mess with it at our peril. -- Steve From oscar.j.benjamin at gmail.com Thu Nov 29 20:45:17 2018 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Fri, 30 Nov 2018 01:45:17 +0000 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: <20181130001447.GG4319@ando.pearwood.info> References: <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> <20181130001447.GG4319@ando.pearwood.info> Message-ID: On Fri, 30 Nov 2018 at 00:17, Steven D'Aprano wrote: > > On Thu, Nov 29, 2018 at 01:30:28PM -0800, Nathaniel Smith wrote: > > [...] > > > > https://anaconda.com/ > > > > https://www.activestate.com/products/activepython/ > > > > http://winpython.github.io/ > > > > http://python-xy.github.io/ > > > > https://www.enthought.com/product/canopy/ > > > > https://software.intel.com/en-us/distribution-for-python > > > > http://every-linux-distro-ever.example.com > > > Yeah, I draw two conclusions from the list above: > > > > - Paul expressed uncertainty about how many people are in his position > > of needing a single download with all the batteries included, but > > obviously he's not alone. So many people want a > > single-box-of-batteries that whole businesses are being built on > > fulfilling that need. > > I think that's inaccurate: at least some of those are not "box of > batteries" but "nuclear reactor" distros, aimed at adding significant > value to above and beyond anything that the stdlib can practically > include. Things like numpy/scipy, or a powerful IDE. Of those listed above I have used Canopy, Anaconda and Python-xy. All three fulfil the same need from my perspective which is that they include the missing batteries needed even for basic scientific computing. In the past I've generally solved that problem for myself by using Linux and having e.g. apt install the other pieces. Since I can't expect my students to do the same I've always recommended for them to use something like Anaconda which we/they can for free. They do include much more than we need so I'd agree that Anaconda is a nuclear reactor. The things I want from it are not though. Numpy is not a nuclear reactor: at it's core it's just providing a multidimensional array. Some form of multidimensional array is included as standard in many programming languages. In large part numpy is the most frequency cited dependency on PyPI because of this omission from the stdlib. > Another nuclear reactor is packaging itself. Despite pip, installing > third-party packages is still enough of a burden and annoyance that some > people are willing to pay money to have a third-party deal with the > installation hassles. That's a data point going against the "just get it > from PyPI" mindset. Not really. Each of those distributions predates the point where pip was usable for installing basic extension modules like numpy. They arose out of necessity because particularly on Windows it would be very difficult (impossible for a novice) for someone to compile the various packages themselves. There have been significant improvements in pip, pypi and the whole packaging ecosystem in recent years thanks to the efforts of many including Paul. I've been pushing students and others to Anaconda simply because I knew that at minimum they would need numpy, scipy and matplotlib and that pip would fail to install those. That has changed in the last year or two: it is now possible to pip install binaries for those packages and many others on the 3 major OSes. I don't see any reason to recommend Anaconda to my students now. It's not a huge problem for my students but I think an important thing missing from all of this is a GUI for pip. The situation Paul described is that you can instruct someone to install Python using the python.org installer but cannot then instruct them to use pip from a terminal and I can certainly relate to that. If installing Python gave you a GUI from the programs menu that could install the other pieces that would be a significant improvement. -- Oscar From mike at selik.org Thu Nov 29 20:51:23 2018 From: mike at selik.org (Michael Selik) Date: Thu, 29 Nov 2018 17:51:23 -0800 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> Message-ID: On Thu, Nov 29, 2018 at 1:33 PM Nathaniel Smith wrote: > > > https://anaconda.com/ > > > https://www.activestate.com/products/activepython/ > > > http://winpython.github.io/ > > > http://python-xy.github.io/ > > > https://www.enthought.com/product/canopy/ > > > https://software.intel.com/en-us/distribution-for-python > > > http://every-linux-distro-ever.example.com > > - Currently, our single-box-of-batteries is doing such a lousy job of > solving Paul's problem, that people are building whole businesses on > our failure. > Would you say that the Linux community made the same failure, since companies like Red Hat and Canonical exist? One reason a company might purchase a distribution is simply to have someone to sue, maybe indemnification against licensing issues. Even if there were an official free distribution, companies might still choose to purchase one. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mike at selik.org Thu Nov 29 21:02:10 2018 From: mike at selik.org (Michael Selik) Date: Thu, 29 Nov 2018 18:02:10 -0800 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: References: <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> <20181130001447.GG4319@ando.pearwood.info> Message-ID: On Thu, Nov 29, 2018 at 5:48 PM Oscar Benjamin wrote: > There have been significant improvements in pip, pypi and the whole > packaging ecosystem in recent years thanks to the efforts of many > including Paul. I've been pushing students and others to Anaconda > simply because I knew that at minimum they would need numpy, scipy and > matplotlib and that pip would fail to install those. That has changed > in the last year or two: it is now possible to pip install binaries > for those packages and many others on the 3 major OSes. I don't see > any reason to recommend Anaconda to my students now. > One advantage of conda over pip is avoidance of typosquatting on PyPI. It's not a huge problem for my students but I think an important thing > missing from all of this is a GUI for pip. The situation Paul > described is that you can instruct someone to install Python using the > python.org installer but cannot then instruct them to use pip from a > terminal and I can certainly relate to that. If installing Python gave > you a GUI from the programs menu that could install the other pieces > that would be a significant improvement. > I haven't checked in about a year, but the Windows installer from python.org wouldn't set the PATH such that python or pip were not valid executables at the DOS prompt. Fixing that one (simple?) thing would also be a significant improvement. -------------- next part -------------- An HTML attachment was scrubbed... URL: From armin.rigo at gmail.com Fri Nov 30 01:06:44 2018 From: armin.rigo at gmail.com (Armin Rigo) Date: Fri, 30 Nov 2018 08:06:44 +0200 Subject: [Python-Dev] C API changes In-Reply-To: References: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> <99b32440-1c8a-8a92-7729-c037648fbbce@python.org> Message-ID: Hi, On Thu, 29 Nov 2018 at 18:19, Steve Dower wrote: > quo. We continue to not be able to change CPython internals at all, > since that will break people using option B. No? That will only break users if they only have an option-B ``foo.cpython-318m-x86_64-linux-gnu.so``, no option-A .so and no source code, and want to run it elsewhere than CPython 3.18. That's the same as today. If you want option-B .so for N versions of CPython, recompile the source code N times. Just to be clear, if done correctly there should be no need for #ifdefs in the source code of the extension module. A bient?t, Armin. From p.f.moore at gmail.com Fri Nov 30 04:00:35 2018 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 30 Nov 2018 09:00:35 +0000 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: <20181130001447.GG4319@ando.pearwood.info> References: <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> <20181130001447.GG4319@ando.pearwood.info> Message-ID: On Fri, 30 Nov 2018 at 00:17, Steven D'Aprano wrote: > Especially not *Paul's* problems, as I understand he personally is > reasonably satisfied with the stdlib and doesn't use any of those > third-party distros. (Paul, did I get that right?) That's correct. And not just "reasonably satisfied" - I strongly prefer the python.org distribution (with "just" the stdlib) over those others. But we should be careful making comparisons like this. The benefit of the stdlib is as a "common denominator" core functionality. It's not that I don't need other modules. I use data analysis packages a lot, but I still prefer the core distribution (plus PyPI) over Anaconda, because when I'm *not* using data analysis packages (which I use in a virtualenv) my "base" environment is something that exists in *every* Python installation, no matter what distribution. And the core is available in places where I *can't* use extra modules. Paul From andrew.svetlov at gmail.com Fri Nov 30 05:17:17 2018 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Fri, 30 Nov 2018 12:17:17 +0200 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> Message-ID: On Fri, Nov 30, 2018 at 2:12 AM Brett Cannon wrote: > > > On Thu, 29 Nov 2018 at 14:12, Andrew Svetlov > wrote: > >> Neither http.client nor http.server doesn't support compression >> (gzip/compress/deflate) at all. >> I doubt if we want to add this feature: for client better to use requests >> or, well, aiohttp. >> The same for servers: almost any production ready web server from PyPI >> supports compression. >> > > There was actually a PR to add compressions support to http.server but I > closed it in the name of maintainability since http.server, as you said, > isn't for production use so compression isn't critical. > > I remember this PR and agree with your decision. -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Fri Nov 30 05:38:28 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 30 Nov 2018 11:38:28 +0100 Subject: [Python-Dev] Standard library vs Standard distribution? References: <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> <20181130001447.GG4319@ando.pearwood.info> Message-ID: <20181130113828.599861bf@fsol> On Fri, 30 Nov 2018 11:14:47 +1100 Steven D'Aprano wrote: > > There are plenty of other languages that come with a tiny stdlib and > leave everything else to third parties. Outside of those like > Javascript, which has a privileged position due to it being the standard > browser scripting language (and is backed by an ISO standard and at > least one major companies vigourously driving it), how is that working > out for them? And even for Javascript, that seems to be a problem, with the myriad of dependencies JS apps seem to have for almost trivial matters, and the security issues that come with relying on so many (sometimes ill-maintained) third-party libraries. Actually, PyPI is also been targeted these days, even though hopefully it didn't (yet?) have the ramifications such attacks have had in the JS world (see e.g. the recent "event-stream" incident: https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident ) I agree with you that the stdlib's "batteries included" is a major feature of Python. Regards Antoine. From steve at pearwood.info Fri Nov 30 07:24:38 2018 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 30 Nov 2018 23:24:38 +1100 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: References: <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> <20181130001447.GG4319@ando.pearwood.info> Message-ID: <20181130122436.GI4319@ando.pearwood.info> On Fri, Nov 30, 2018 at 01:45:17AM +0000, Oscar Benjamin wrote: > They do include much more than we need so I'd agree that Anaconda is a > nuclear reactor. The things I want from it are not though. Numpy is > not a nuclear reactor: at it's core it's just providing a > multidimensional array. And a metric tonne of functions in linear algebra, root finding, special functions, polynomials, statistics, generation of random numbers, datetime calculations, FFTs, interfacing with C, and including *eight* different specialist variations of sorting alone. Just because some people don't use the entire output of the nuclear reactor, doesn't mean it isn't one. -- Steve From status at bugs.python.org Fri Nov 30 12:10:07 2018 From: status at bugs.python.org (Python tracker) Date: Fri, 30 Nov 2018 18:10:07 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20181130171007.0180D15366C@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2018-11-23 - 2018-11-30) Python tracker at https://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 6880 (+15) closed 40234 (+47) total 47114 (+62) Open issues with patches: 2736 Issues opened (41) ================== #33015: Fix function cast warning in thread_pthread.h https://bugs.python.org/issue33015 reopened by vstinner #35302: create_connection with local_addr misses valid socket bindings https://bugs.python.org/issue35302 opened by kyuupichan #35305: subprocess.Popen(['/sbin/ldconfig', '-p'], stdin=PIPE) itself https://bugs.python.org/issue35305 opened by Henrik Bengtsson #35306: OSError [WinError 123] when testing if pathlib.Path('*') (aste https://bugs.python.org/issue35306 opened by jimbo1qaz_ #35307: Command line help example is missing "--prompt" option https://bugs.python.org/issue35307 opened by daftwullie #35310: select which was interrupted by EINTR isn't re-run if the time https://bugs.python.org/issue35310 opened by Brian Maissy #35311: exception unpickling error causes `multiprocessing.Pool` to ha https://bugs.python.org/issue35311 opened by Anthony Sottile #35312: lib2to3.pgen2.parse.ParseError is not roundtrip pickleable https://bugs.python.org/issue35312 opened by Anthony Sottile #35316: test_eintr fails randomly on macOS https://bugs.python.org/issue35316 opened by vstinner #35318: Check accuracy of str() doc string for its encoding argument https://bugs.python.org/issue35318 opened by rhettinger #35319: pkgutil.get_data() is a wrapper for a deprecated class https://bugs.python.org/issue35319 opened by Kevin.Norris #35321: None _frozen_importlib.__spec__.origin attribute https://bugs.python.org/issue35321 opened by maggyero #35324: ssl: FileNotFoundError when do handshake https://bugs.python.org/issue35324 opened by josephshi at yahoo.com #35325: imp.find_module() return value documentation discrepancy https://bugs.python.org/issue35325 opened by StefanBauerTT #35327: Using skipTest with subTest gives a misleading UI. https://bugs.python.org/issue35327 opened by p-ganssle #35328: Set a environment variable for venv prompt https://bugs.python.org/issue35328 opened by batisteo #35329: Documentation - capitalization issue https://bugs.python.org/issue35329 opened by Strijker #35330: When using mock to wrap an existing object, side_effect requir https://bugs.python.org/issue35330 opened by noamraph #35331: Incorrect __module__ attribute for _struct.Struct and perhaps https://bugs.python.org/issue35331 opened by bup #35332: shutil.rmtree(..., ignore_errors=True) doesn't ignore errors f https://bugs.python.org/issue35332 opened by rabraham #35333: Rename private structs to use names closer to types https://bugs.python.org/issue35333 opened by vstinner #35335: msgfmt should be able to merge more than one po file https://bugs.python.org/issue35335 opened by s-ball #35337: Check index in PyTuple_GET_ITEM/PyTuple_SET_ITEM in debug mode https://bugs.python.org/issue35337 opened by vstinner #35338: set union/intersection/difference could accept zero arguments https://bugs.python.org/issue35338 opened by carandraug #35341: Add generic version of OrderedDict to typing module https://bugs.python.org/issue35341 opened by itoijala #35342: email "default" policy raises exception iterating over unparse https://bugs.python.org/issue35342 opened by rptb1 #35344: platform: get macOS version rather than darwin version? https://bugs.python.org/issue35344 opened by vstinner #35346: Modernize Lib/platform.py code https://bugs.python.org/issue35346 opened by vstinner #35348: Problems with handling the file command output in platform.arc https://bugs.python.org/issue35348 opened by serhiy.storchaka #35350: importing "ctypes" immediately causes a segmentation fault https://bugs.python.org/issue35350 opened by n0s69z #35351: LTOFLAGS are passed to BASECFLAGS when using LTO https://bugs.python.org/issue35351 opened by cstratak #35352: test_asyncio fails on RHEL8 https://bugs.python.org/issue35352 opened by cstratak #35353: Add frame command to pdb https://bugs.python.org/issue35353 opened by cnklein #35354: Generator functions stack overflow https://bugs.python.org/issue35354 opened by asdwqii #35357: unittest.mock.call can't represent calls to a method called 'p https://bugs.python.org/issue35357 opened by cjw296 #35358: avoid '-' in importlib.import_module and builtins.__import__ https://bugs.python.org/issue35358 opened by matrixise #35359: [2.7][Windows] Define _CRT_SECURE_NO_WARNINGS to build Modules https://bugs.python.org/issue35359 opened by vstinner #35360: [Windows] Update SQLite dependency https://bugs.python.org/issue35360 opened by vstinner #35361: Update libffi dependency to 3.2.1? https://bugs.python.org/issue35361 opened by vstinner #35362: list inheritor with abstract content does not raise https://bugs.python.org/issue35362 opened by vaslem #35363: test_eintr: test_open() hangs randomly on x86-64 El Capitan 3. https://bugs.python.org/issue35363 opened by vstinner Most recent 15 issues with no replies (15) ========================================== #35362: list inheritor with abstract content does not raise https://bugs.python.org/issue35362 #35361: Update libffi dependency to 3.2.1? https://bugs.python.org/issue35361 #35359: [2.7][Windows] Define _CRT_SECURE_NO_WARNINGS to build Modules https://bugs.python.org/issue35359 #35357: unittest.mock.call can't represent calls to a method called 'p https://bugs.python.org/issue35357 #35353: Add frame command to pdb https://bugs.python.org/issue35353 #35351: LTOFLAGS are passed to BASECFLAGS when using LTO https://bugs.python.org/issue35351 #35332: shutil.rmtree(..., ignore_errors=True) doesn't ignore errors f https://bugs.python.org/issue35332 #35331: Incorrect __module__ attribute for _struct.Struct and perhaps https://bugs.python.org/issue35331 #35329: Documentation - capitalization issue https://bugs.python.org/issue35329 #35324: ssl: FileNotFoundError when do handshake https://bugs.python.org/issue35324 #35302: create_connection with local_addr misses valid socket bindings https://bugs.python.org/issue35302 #35298: Segfault in _PyObject_GenericGetAttrWithDict https://bugs.python.org/issue35298 #35297: untokenize documentation is not correct https://bugs.python.org/issue35297 #35295: Please clarify whether PyUnicode_AsUTF8AndSize() or PyUnicode_ https://bugs.python.org/issue35295 #35294: Race condition involving SocketServer.TCPServer https://bugs.python.org/issue35294 Most recent 15 issues waiting for review (15) ============================================= #35359: [2.7][Windows] Define _CRT_SECURE_NO_WARNINGS to build Modules https://bugs.python.org/issue35359 #35352: test_asyncio fails on RHEL8 https://bugs.python.org/issue35352 #35351: LTOFLAGS are passed to BASECFLAGS when using LTO https://bugs.python.org/issue35351 #35346: Modernize Lib/platform.py code https://bugs.python.org/issue35346 #35344: platform: get macOS version rather than darwin version? https://bugs.python.org/issue35344 #35337: Check index in PyTuple_GET_ITEM/PyTuple_SET_ITEM in debug mode https://bugs.python.org/issue35337 #35328: Set a environment variable for venv prompt https://bugs.python.org/issue35328 #35316: test_eintr fails randomly on macOS https://bugs.python.org/issue35316 #35312: lib2to3.pgen2.parse.ParseError is not roundtrip pickleable https://bugs.python.org/issue35312 #35310: select which was interrupted by EINTR isn't re-run if the time https://bugs.python.org/issue35310 #35307: Command line help example is missing "--prompt" option https://bugs.python.org/issue35307 #35284: Incomplete error handling in Python/compile.c:compiler_call() https://bugs.python.org/issue35284 #35283: "threading._DummyThread" redefines "is_alive" but forgets "isA https://bugs.python.org/issue35283 #35282: Add a return value to lib2to3.refactor.refactor_file and refac https://bugs.python.org/issue35282 #35278: [security] directory traversal in tempfile prefix https://bugs.python.org/issue35278 Top 10 most discussed issues (10) ================================= #35134: Add a new Include/cpython/ subdirectory for the "CPython API" https://bugs.python.org/issue35134 11 msgs #33930: Segfault with deep recursion into object().__dir__ https://bugs.python.org/issue33930 8 msgs #35338: set union/intersection/difference could accept zero arguments https://bugs.python.org/issue35338 8 msgs #35352: test_asyncio fails on RHEL8 https://bugs.python.org/issue35352 8 msgs #33015: Fix function cast warning in thread_pthread.h https://bugs.python.org/issue33015 7 msgs #35316: test_eintr fails randomly on macOS https://bugs.python.org/issue35316 7 msgs #35346: Modernize Lib/platform.py code https://bugs.python.org/issue35346 7 msgs #35344: platform: get macOS version rather than darwin version? https://bugs.python.org/issue35344 6 msgs #35327: Using skipTest with subTest gives a misleading UI. https://bugs.python.org/issue35327 5 msgs #35337: Check index in PyTuple_GET_ITEM/PyTuple_SET_ITEM in debug mode https://bugs.python.org/issue35337 5 msgs Issues closed (46) ================== #27903: Avoid ResourceWarnings from platform._dist_try_harder https://bugs.python.org/issue27903 closed by xtreak #29877: compileall hangs when accessing urandom even if number of work https://bugs.python.org/issue29877 closed by inada.naoki #30167: site.main() does not work on Python 3.6 and superior if PYTHON https://bugs.python.org/issue30167 closed by inada.naoki #31241: ast col_offset wrong for list comprehensions, generators and t https://bugs.python.org/issue31241 closed by serhiy.storchaka #31628: test_emails failure on FreeBSD https://bugs.python.org/issue31628 closed by vstinner #31900: localeconv() should decode numeric fields from LC_NUMERIC enco https://bugs.python.org/issue31900 closed by vstinner #32035: Documentation of zipfile.ZipFile().writestr() fails to mention https://bugs.python.org/issue32035 closed by serhiy.storchaka #33012: Invalid function cast warnings with gcc 8 for METH_NOARGS https://bugs.python.org/issue33012 closed by serhiy.storchaka #33029: Invalid function cast warnings with gcc 8 for getter and sette https://bugs.python.org/issue33029 closed by serhiy.storchaka #33954: float.__format__('n') fails with _PyUnicode_CheckConsistency a https://bugs.python.org/issue33954 closed by vstinner #34021: [2.7] test_regrtest: test_env_changed() fails on x86 Windows X https://bugs.python.org/issue34021 closed by vstinner #34022: 6 tests fail using SOURCE_DATE_EPOCH env var https://bugs.python.org/issue34022 closed by vstinner #34100: Same constants in tuples are not merged while compile() https://bugs.python.org/issue34100 closed by inada.naoki #34212: Cygwin link failure with builtin modules since issue30860 https://bugs.python.org/issue34212 closed by benjamin.peterson #34279: RFC: issue a warning in regrtest when no tests have been execu https://bugs.python.org/issue34279 closed by vstinner #34812: [Security] support.args_from_interpreter_flags() doesn't inher https://bugs.python.org/issue34812 closed by vstinner #34893: Add 2to3 fixer to change send and recv methods of socket objec https://bugs.python.org/issue34893 closed by benjamin.peterson #34978: check type of object in fix_dict.py in 2to3 https://bugs.python.org/issue34978 closed by benjamin.peterson #35189: PEP 475: fnctl functions are not retried if interrupted by a s https://bugs.python.org/issue35189 closed by vstinner #35217: REPL history is broken when python is invoked with cmd /c https://bugs.python.org/issue35217 closed by eryksun #35240: Travis CI: xvfb-run: error: Xvfb failed to start https://bugs.python.org/issue35240 closed by vstinner #35255: delete "How do I extract the downloaded documentation" section https://bugs.python.org/issue35255 closed by mdk #35281: Allow access to unittest.TestSuite tests https://bugs.python.org/issue35281 closed by brett.cannon #35300: Document what functions are suitable for use with the lru_cach https://bugs.python.org/issue35300 closed by rhettinger #35303: A reference leak in _operator.c's methodcaller_repr() https://bugs.python.org/issue35303 closed by serhiy.storchaka #35304: The return of truncate(size=None) (file io) is unexpected https://bugs.python.org/issue35304 closed by martin.panter #35308: webbrowser regression: BROWSER env var not respected https://bugs.python.org/issue35308 closed by serhiy.storchaka #35309: Typo in urllib.request warning: cpath ??? capath https://bugs.python.org/issue35309 closed by benjamin.peterson #35313: test_embed fails in travis CI when tests are executed from a v https://bugs.python.org/issue35313 closed by vstinner #35314: fnmatch failed with leading caret (^) https://bugs.python.org/issue35314 closed by serhiy.storchaka #35315: Error (Segmentation fault - core dumped) while installing Pyth https://bugs.python.org/issue35315 closed by christian.heimes #35317: test_email: test_localtime_daylight_false_dst_true() fails dep https://bugs.python.org/issue35317 closed by vstinner #35320: Writable __spec__.has_location attribute https://bugs.python.org/issue35320 closed by brett.cannon #35322: test_datetime leaks memory on Windows https://bugs.python.org/issue35322 closed by vstinner #35323: Windows x86 executable installer can't install https://bugs.python.org/issue35323 closed by outofthink #35326: ctypes cast and from_address cause crash on Windows 10 https://bugs.python.org/issue35326 closed by eryksun #35334: urllib3 fails with type error exception, when cannot reach PyP https://bugs.python.org/issue35334 closed by christian.heimes #35336: Bug in C locale coercion with PYTHONCOERCECLOCALE=1 https://bugs.python.org/issue35336 closed by vstinner #35339: Populating instances of class automatically https://bugs.python.org/issue35339 closed by steven.daprano #35340: global symbol "freegrammar" should be made static ore renamed https://bugs.python.org/issue35340 closed by benjamin.peterson #35343: Importlib.reload has different behaviour in script and interac https://bugs.python.org/issue35343 closed by brett.cannon #35345: Remove platform.popen(), deprecated since Python 3.3 https://bugs.python.org/issue35345 closed by vstinner #35347: test_socket: NonBlockingTCPTests.testRecv() uses a weak 100 ms https://bugs.python.org/issue35347 closed by vstinner #35349: collections.abc.Sequence isn't an abstract base class for arra https://bugs.python.org/issue35349 closed by serhiy.storchaka #35355: Some Errors involving formatted strings aren't reported in the https://bugs.python.org/issue35355 closed by terry.reedy #35356: A possible reference leak in the nis module https://bugs.python.org/issue35356 closed by serhiy.storchaka From steve.dower at python.org Fri Nov 30 12:22:30 2018 From: steve.dower at python.org (Steve Dower) Date: Fri, 30 Nov 2018 09:22:30 -0800 Subject: [Python-Dev] C API changes In-Reply-To: References: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> <99b32440-1c8a-8a92-7729-c037648fbbce@python.org> Message-ID: <550a5261-c4de-7cfb-eac4-f25035505301@python.org> On 29Nov2018 2206, Armin Rigo wrote: > On Thu, 29 Nov 2018 at 18:19, Steve Dower wrote: >> quo. We continue to not be able to change CPython internals at all, >> since that will break people using option B. > > No? That will only break users if they only have an option-B > ``foo.cpython-318m-x86_64-linux-gnu.so``, no option-A .so and no > source code, and want to run it elsewhere than CPython 3.18. That's > the same as today. If you want option-B .so for N versions of > CPython, recompile the source code N times. > > Just to be clear, if done correctly there should be no need for > #ifdefs in the source code of the extension module. The problem is that if option B remains as compatible as it is today, we can't make option A faster enough to be attractive. The marketing pitch for this looks like: "rewrite all your existing code to be slower but works with PyPy, or don't rewrite your existing code and it'll be fastest with CPython and won't break in the future". This is status quo (where option A today is something like CFFI or Cython), and we can already see how many people have made the switch (FWIW, I totally prefer Cython over pure C for my own projects :) ). My proposed marketing pitch is: "rewrite your existing code to be forward-compatible today and faster in the future without more work, or be prepared to rewrite/update your source code for each CPython release to remain compatible with the low level API". The promise of "faster in the future" needs to be justified (and I think there's plenty of precedent in PyPy, Larry's Gilectomy and the various JavaScript VMs to assume that we can do it). We've already done enough investigation to know that making the runtime faster requires changing the low level APIs, and otherwise we're stuck in a local optima. Offering a stable, loosely coupled option A and then *planning* to break the low level APIs each version in the name of performance is the only realistic way to change what we're currently doing. Cheers, Steve From solipsis at pitrou.net Fri Nov 30 12:32:08 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 30 Nov 2018 18:32:08 +0100 Subject: [Python-Dev] C API changes References: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> <99b32440-1c8a-8a92-7729-c037648fbbce@python.org> <550a5261-c4de-7cfb-eac4-f25035505301@python.org> Message-ID: <20181130183208.4b4d6264@fsol> On Fri, 30 Nov 2018 09:22:30 -0800 Steve Dower wrote: > > My proposed marketing pitch is: "rewrite your existing code to be > forward-compatible today and faster in the future without more work, or > be prepared to rewrite/update your source code for each CPython release > to remain compatible with the low level API". The promise of "faster in > the future" needs to be justified (and I think there's plenty of > precedent in PyPy, Larry's Gilectomy and the various JavaScript VMs to > assume that we can do it). I think that should be qualified. Technically it's certainly possible to have a faster CPython with different internals. Socially and organisationally I'm not sure we're equipped to achieve it. Regards Antoine. From stefan at bytereef.org Fri Nov 30 13:10:21 2018 From: stefan at bytereef.org (Stefan Krah) Date: Fri, 30 Nov 2018 19:10:21 +0100 Subject: [Python-Dev] C API changes Message-ID: <20181130181021.GA15863@bytereef.org> Steve Dower wrote: > My proposed marketing pitch is: "rewrite your existing code to be > forward-compatible today and faster in the future without more work, or > be prepared to rewrite/update your source code for each CPython release > to remain compatible with the low level API". The promise of "faster in > the future" needs to be justified (and I think there's plenty of > precedent in PyPy, Larry's Gilectomy and the various JavaScript VMs to > assume that we can do it). It's hard to discuss this in the abstract without knowing how big the breakage between each version is going to be. But for the scientific ecosystem this sounds a bit like a potential Python-4.0 breakage, which was universally rejected (so far). In the extreme case I can imagine people staying on 3.7. But it really depends on the nature of the changes. Stefan Krah From nas-python at arctrix.com Fri Nov 30 14:06:11 2018 From: nas-python at arctrix.com (Neil Schemenauer) Date: Fri, 30 Nov 2018 13:06:11 -0600 Subject: [Python-Dev] C API changes In-Reply-To: References: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> <99b32440-1c8a-8a92-7729-c037648fbbce@python.org> Message-ID: <20181130190611.tvk3xlv5qb3wlpyj@python.ca> On 2018-11-29, Armin Rigo wrote: > ...Also, although I'm discussing it here, I think the whole approach > would be better if done as a third-party extension for now, without > requiring changes to CPython---just use the existing C API to > implement the CPython version. Hello Armin, Thank you for providing your input on this subject. I too like the idea of an API "shim layer" as a separate project. What do you think of writing the shim layer in C++? I'm not a C++ programmer but my understanding is that modern C++ compilers are much better than years ago. Using C++ would allow us to provide a higher level API with smaller runtime costs. However, it would require that any project using the shim layer would have to be compiled with a C++ compiler (CPython and PyPy could still expose a C compatible API). Perhaps it is a bad idea. If someone does create such a shim layer, it will already be challenging to convince extension authors to move to it. If it requires them to switch to using a C++ compiler rather than a C compiler, maybe that's too much effort. OTOH, with C++ I think you could do things like use smart pointers to automatically handle refcounts on the handles. Or maybe we should just skip C++ and implement the layer in Rust. Then the Rust borrow checker can handle the refcounts. ;-) Regards, Neil From armin.rigo at gmail.com Fri Nov 30 14:14:22 2018 From: armin.rigo at gmail.com (Armin Rigo) Date: Fri, 30 Nov 2018 21:14:22 +0200 Subject: [Python-Dev] C API changes In-Reply-To: <550a5261-c4de-7cfb-eac4-f25035505301@python.org> References: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> <99b32440-1c8a-8a92-7729-c037648fbbce@python.org> <550a5261-c4de-7cfb-eac4-f25035505301@python.org> Message-ID: Hi Steve, On 30/11/2018, Steve Dower wrote: > On 29Nov2018 2206, Armin Rigo wrote: >> On Thu, 29 Nov 2018 at 18:19, Steve Dower wrote: >>> quo. We continue to not be able to change CPython internals at all, >>> since that will break people using option B. >> >> No? That will only break users if they only have an option-B >> ``foo.cpython-318m-x86_64-linux-gnu.so``, no option-A .so and no >> source code, and want to run it elsewhere than CPython 3.18. That's >> the same as today. If you want option-B .so for N versions of >> CPython, recompile the source code N times. >> >> Just to be clear, if done correctly there should be no need for >> #ifdefs in the source code of the extension module. > > The problem is that if option B remains as compatible as it is today, we > can't make option A faster enough to be attractive. The marketing pitch > for this looks like: "rewrite all your existing code to be slower but > works with PyPy, or don't rewrite your existing code and it'll be > fastest with CPython and won't break in the future". This is status quo > (where option A today is something like CFFI or Cython), and we can > already see how many people have made the switch (FWIW, I totally prefer > Cython over pure C for my own projects :) ). > > My proposed marketing pitch is: "rewrite your existing code to be > forward-compatible today and faster in the future without more work, or > be prepared to rewrite/update your source code for each CPython release > to remain compatible with the low level API". > (...) Discussing marketing pitches on python-dev is not one of my favorite past-times, so I'll excuse myself out of this conversation. Instead, I might try to implement the basics, check out the performance on CPython and on PyPy, and seek out interest---I'm thinking about Cython, for example, which might relatively easily be adapted to generate that kind of code. This might be a solution for the poor performance of Cython on PyPy... If everything works out, maybe I'll come back here at some point, with the argument "the CPython C API is blocking CPython from evolution more and more? Here's one possible path forward." A bient?t, Armin. From solipsis at pitrou.net Fri Nov 30 14:33:49 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 30 Nov 2018 20:33:49 +0100 Subject: [Python-Dev] C API changes References: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> <99b32440-1c8a-8a92-7729-c037648fbbce@python.org> <20181130190611.tvk3xlv5qb3wlpyj@python.ca> Message-ID: <20181130203349.1ac12768@fsol> On Fri, 30 Nov 2018 13:06:11 -0600 Neil Schemenauer wrote: > On 2018-11-29, Armin Rigo wrote: > > ...Also, although I'm discussing it here, I think the whole approach > > would be better if done as a third-party extension for now, without > > requiring changes to CPython---just use the existing C API to > > implement the CPython version. > > Hello Armin, > > Thank you for providing your input on this subject. I too like the > idea of an API "shim layer" as a separate project. > > What do you think of writing the shim layer in C++? I'm not a C++ > programmer but my understanding is that modern C++ compilers are > much better than years ago. Using C++ would allow us to provide a > higher level API with smaller runtime costs. The main problem with exposing a C++ *API* is that all people implementing that API suddenly must understand and implement the C++ *ABI* (with itself varies from platform to platform :-)). That's trivially easy if your implementation is itself written in C++, but not if it's written in something else such as RPython, Java, Rust, etc. C is really the lingua franca when exposing an interface that can be understood, implemented and/or interfaced with from many different languages. So I'd turn the proposal on its head: you can implement the internals of your interpreter or object layer in C++ (and indeed I think it would be crazy to start writing a new Python VM in raw C), but you should still expose a C-compatible API for third-party providers and consumers. Regards Antoine. From steve.dower at python.org Fri Nov 30 14:45:20 2018 From: steve.dower at python.org (Steve Dower) Date: Fri, 30 Nov 2018 11:45:20 -0800 Subject: [Python-Dev] C API changes In-Reply-To: <20181130203349.1ac12768@fsol> References: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> <99b32440-1c8a-8a92-7729-c037648fbbce@python.org> <20181130190611.tvk3xlv5qb3wlpyj@python.ca> <20181130203349.1ac12768@fsol> Message-ID: <85e2ff3f-0f00-3544-5ab4-d7072059c521@python.org> On 30Nov2018 1133, Antoine Pitrou wrote: > On Fri, 30 Nov 2018 13:06:11 -0600 > Neil Schemenauer wrote: >> On 2018-11-29, Armin Rigo wrote: >>> ...Also, although I'm discussing it here, I think the whole approach >>> would be better if done as a third-party extension for now, without >>> requiring changes to CPython---just use the existing C API to >>> implement the CPython version. >> >> Hello Armin, >> >> Thank you for providing your input on this subject. I too like the >> idea of an API "shim layer" as a separate project. >> >> What do you think of writing the shim layer in C++? I'm not a C++ >> programmer but my understanding is that modern C++ compilers are >> much better than years ago. Using C++ would allow us to provide a >> higher level API with smaller runtime costs. > > The main problem with exposing a C++ *API* is that all people > implementing that API suddenly must understand and implement the C++ > *ABI* (with itself varies from platform to platform :-)). That's > trivially easy if your implementation is itself written in C++, but not > if it's written in something else such as RPython, Java, Rust, etc. > > C is really the lingua franca when exposing an interface that can be > understood, implemented and/or interfaced with from many different > languages. > > So I'd turn the proposal on its head: you can implement the internals > of your interpreter or object layer in C++ (and indeed I think it > would be crazy to start writing a new Python VM in raw C), but you > should still expose a C-compatible API for third-party providers and > consumers. I totally agree with Antoine here. C++ is great for internals, but not the public interfaces. The one additional point I'd add is that there are other ABIs that C++ can use (such as xlang, Corba and COM), which can provide stability in ways the plain-old C++ ABI does not. So we wouldn't necessarily have to design a new C-based ABI for this, we could adopt an existing one that is already proven and already has supporting tools. Cheers, Steve From v+python at g.nevcal.com Fri Nov 30 16:00:37 2018 From: v+python at g.nevcal.com (Glenn Linderman) Date: Fri, 30 Nov 2018 13:00:37 -0800 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? In-Reply-To: References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> Message-ID: <1921b844-f1a9-6434-b367-7aa118ee187e@g.nevcal.com> On 11/29/2018 2:10 PM, Andrew Svetlov wrote: > Neither http.client nor http.server doesn't support compression > (gzip/compress/deflate) at all. > I doubt if we want to add this feature: for client better to use > requests or, well, aiohttp. > The same for servers: almost any production ready web server from PyPI > supports compression. > What production ready web servers exist on PyPi? Are there any that don't bring lots of baggage, their own enhanced way of doing things? The nice thing about the http.server is that it does things in a standard-conforming way, the bad thing about it is that it doesn't implement all the standards, and isn't maintained very well. From just reading PyPi, it is hard to discover whether a particular package is production-ready or not. I had used CherryPy for a while, but at the time it didn't support Python 3, and to use the same scripts behind CherryPy or Apache CGI (my deployment target, because that was what web hosts provided) became difficult for complex scripts.... so I reverted to http.server with a few private extensions (private because no one merged the bugs I reported some 3 versions of Python-development-process ago; back then I submitted patches, but I haven't had time to keep up with the churn of technologies Pythondev has used since Python 3 came out, which is when I started using Python, and I'm sure the submitted patches have bit-rotted by now). When I google "python web server" the first hit is the doc page for http.server, the second is a wiki page that mentions CherryPy and a bunch of others, but the descriptions, while terse, mostly point out some special capabilities of the server, making it seem like you not only get a web server, but a philosophy. I just want a web server. The last one, Waitress, is the only one that doesn't seem to have a philosophy in its description. So it would be nice if http.server and http.client could get some basic improvements to be complete, or if the docs could point to a replacement that is a complete server, but without a philosophy or framework (bloatware) to have to learn and/or work around. Glenn -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Fri Nov 30 16:13:43 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 30 Nov 2018 22:13:43 +0100 Subject: [Python-Dev] Inclusion of lz4 bindings in stdlib? References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <1921b844-f1a9-6434-b367-7aa118ee187e@g.nevcal.com> Message-ID: <20181130221343.0c08dcbe@fsol> On Fri, 30 Nov 2018 13:00:37 -0800 Glenn Linderman wrote: > > So it would be nice if http.server and http.client could get some basic > improvements to be complete, or if the docs could point to a replacement > that is a complete server, but without a philosophy or framework > (bloatware) to have to learn and/or work around. Why do you think http.server is any different? If you want to implement your own Web service with http.server you must implement your own handler class, which is not very different from writing a handler for a third-party HTTP server. Maybe you're so used to this way of doing that you find it "natural", but in reality http.server is as opinionated as any other HTTP server implementation. By the way, there is a framework in the stdlib, it's called asyncio. And I'm sure you'll find production-ready HTTP servers written for it :-) Regards Antoine. From vstinner at redhat.com Fri Nov 30 17:10:50 2018 From: vstinner at redhat.com (Victor Stinner) Date: Fri, 30 Nov 2018 23:10:50 +0100 Subject: [Python-Dev] C API changes In-Reply-To: <85e2ff3f-0f00-3544-5ab4-d7072059c521@python.org> References: <83fca743-5874-ee27-9717-b407ccdcb60d@hastings.org> <99b32440-1c8a-8a92-7729-c037648fbbce@python.org> <20181130190611.tvk3xlv5qb3wlpyj@python.ca> <20181130203349.1ac12768@fsol> <85e2ff3f-0f00-3544-5ab4-d7072059c521@python.org> Message-ID: I just would want to say that I'm very happy to read the discussions about the C API finally happening on python-dev :-) The discussion is very interesting! > C is really the lingua franca Sorry, but why not writing the API directly in french? Victor From solipsis at pitrou.net Fri Nov 30 17:35:30 2018 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 30 Nov 2018 23:35:30 +0100 Subject: [Python-Dev] Standard library vs Standard distribution? References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> Message-ID: <20181130233530.2cb0ef42@fsol> On Thu, 29 Nov 2018 10:46:23 -0800 Steve Dower wrote: > > I've previously volunteered to move certain modules to their own PyPI > packages and bundle them (omitting package names to avoid upsetting > people again), and I've done various analyses of which modules can be > moved out. I've also deliberately designed functionality into the > Windows installer to be able to bundle and install arbitrary packages > whenever we like (there's an example in > https://github.com/python/cpython/blob/master/Tools/msi/bundle/packagegroups/packageinstall.wxs). > > Plus I've been involved in packaging longer than I've been involved in > core development. I find it highly embarrassing, but there are people > out there who publicly credit me with "making it possible to use any > packages at all on Windows". Please don't accuse me of throwing out > ideas in this area without doing any work. Sorry. I've been unfair and unduly antagonistic. Regards Antoine. From steve.dower at python.org Fri Nov 30 18:14:29 2018 From: steve.dower at python.org (Steve Dower) Date: Fri, 30 Nov 2018 15:14:29 -0800 Subject: [Python-Dev] Standard library vs Standard distribution? In-Reply-To: <20181130233530.2cb0ef42@fsol> References: <20181128174343.6bc8760b@fsol> <20181128212736.GW4319@ando.pearwood.info> <20181129115415.0ee3e1f1@fsol> <34544e1d-fc87-32ab-7b4c-40cb1e59c228@python.org> <29a69029-3e16-4717-21c6-8084d2d7eae4@python.org> <20181130233530.2cb0ef42@fsol> Message-ID: <127cfde6-1033-a1d0-b95f-fe523e1bc0b0@python.org> On 30Nov2018 1435, Antoine Pitrou wrote: > Sorry. I've been unfair and unduly antagonistic. Apology is totally accepted and all is forgiven. Thank you. Cheers, Steve