From eric at trueblade.com Fri Dec 1 02:49:35 2017 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 1 Dec 2017 02:49:35 -0500 Subject: [Python-Dev] Third and hopefully final post: PEP 557, Data Classes In-Reply-To: <25d8d06d-ac8b-865e-6e58-b16c1e91e61e@trueblade.com> References: <516a9e5b-e89e-0dd8-64c5-8a712b58f1ca@trueblade.com> <20171130101658.7f7e5807@fsol> <20171130125903.26fc1054@fsol> <13c3f7dc-ec8d-14ea-af2c-a8156e1e1936@trueblade.com> <25d8d06d-ac8b-865e-6e58-b16c1e91e61e@trueblade.com> Message-ID: <814458c0-933f-60b6-abb2-a174a7c082ee@trueblade.com> On 11/30/2017 7:22 PM, Eric V. Smith wrote: > On 11/30/2017 1:30 PM, Brett Cannon wrote: >> >> On Thu, 30 Nov 2017 at 05:00 Eric V. Smith wrote: >> >> On 11/30/2017 6:59 AM, Antoine Pitrou wrote: >> > >> > Or, simply, is_dataclass_instance(), which is even longer, but far more >> > readable thanks to explicit word boundaries :-) >> >> That actually doesn't bother me. I think this API will be used rarely, if ever. Or more realistically, it should be used rarely: what actually happens will no doubt surprise me. >> >> So I'm okay with is_dataclass_instance() and is_dataclass_class(). >> >> But then I'm also okay with dropping the API entirely. namedtuple has lived for years without it, although Raymond's advice there is that if you really want to know, look for _fields. See >> https://bugs.python.org/issue7796#msg99869 and the following discussion. >> >> >> My question was going to be whether this is even necessary. :) Perhaps >> we just drop it for now and add it in if we find there's a public need >> for it? > > That's what I'm leaning toward. I've been trying to figure out what > attr.has() or hasattr(obj, '_fields') are actually used for. The attrs > version is hard to search for, and while I see the question about > namedtuples asked fairly often on SO, I haven't seen an actual use case. 
> > It's easy enough for someone to write their own isdataclass(), > admittedly using an undocumented feature. Actually there's a supported way to write your own isdataclass(): call dataclasses.fields(obj). If it throws a TypeError, it's not a dataclass instance or class. I'll add a note to the PEP. Eric. > > So I'm thinking let's drop it and then gauge the demand for it, if any. > > Eric. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/eric%2Ba-python-dev%40trueblade.com > From eric at trueblade.com Fri Dec 1 02:59:37 2017 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 1 Dec 2017 02:59:37 -0500 Subject: [Python-Dev] Third and hopefully final post: PEP 557, Data Classes In-Reply-To: <5d98fa5b-fc53-b4bb-119e-57e569403ec2@oddbird.net> References: <516a9e5b-e89e-0dd8-64c5-8a712b58f1ca@trueblade.com> <5d98fa5b-fc53-b4bb-119e-57e569403ec2@oddbird.net> Message-ID: <79d5b2e9-000e-bfff-d42c-57a43dcf92c1@trueblade.com> On 11/30/2017 3:35 PM, Carl Meyer wrote: > On 11/29/2017 05:02 PM, Guido van Rossum wrote: >> I tried to look up the discussion but didn't find much except that you >> flagged this as an issue. To repeat, your concern is that isdataclass() >> applies to *instances*, not classes, which is how Eric has designed it, >> but you worry that either through the name or just because people don't >> read the docs it will be confusing. What do you suppose we do? I think >> making it work for classes as well as for instances would cause another >> category of bugs (confusion between cases where a class is needed vs. an >> instance abound in other situations -- we don't want to add to that). >> Maybe it should raise TypeError when passed a class (unless its >> metaclass is a dataclass)? Maybe it should be renamed to >> isdataclassinstance()? 
That's a mouthful, but I don't know how common >> the need to call this is, and people who call it a lot can define their >> own shorter alias. > > Yeah, I didn't propose a specific fix because I think there are several > options (all mentioned in this thread already), and I don't really have > strong feelings about them: > > 1) Keep the existing function and name, let it handle either classes or > instances. I agree that this is probably not the best option available, > though IMO it's still marginally better than the status quo. > > 2) Punt the problem by removing the function; don't add it to the public > API at all until we have demonstrated demand. > > 3) Rename it to "is_dataclass_instance" (and maybe also keep a separate > "is_dataclass" for testing classes directly). (Then there's also the > choice about raising TypeError vs just returning False if a function is > given the wrong type; I think TypeError is better.) In that case, you can spell "is_dataclass_instance" as:

def is_dataclass_instance(obj):
    dataclasses.fields(obj)  # raises TypeError for non-dataclass
                             # classes or instances
    if isinstance(obj, type):
        raise TypeError('not an instance')
    return True

Since this is easy enough to do in your own code, and I still don't see a use case, I'll just add a note to the PEP and delete isdataclass(). Plus, you can decide for yourself how to deal with the question of returning true for classes or instances or both. Eric. From steve at pearwood.info Fri Dec 1 05:31:05 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 1 Dec 2017 21:31:05 +1100 Subject: [Python-Dev] What's the status of PEP 505: None-aware operators? 
In-Reply-To: <1512104079.2959729.1190270136.08EB0A92@webmail.messagingengine.com> References: <28D91255-56A9-4CC2-B45D-F83ECD715544@langa.pl> <50C74ECC-D462-4FAC-8A9C-A12BC939ADEB@gmail.com> <5D409985-1D45-4957-9A27-B41C6311BA8B@python.org> <5A20858A.3070807@canterbury.ac.nz> <1512104079.2959729.1190270136.08EB0A92@webmail.messagingengine.com> Message-ID: <20171201103105.GK22248@ando.pearwood.info> On Thu, Nov 30, 2017 at 11:54:39PM -0500, Random832 wrote: > The OP isn't confusing anything; it's Eric who is confused. The quoted > paragraph of the PEP clearly and unambiguously claims that the sequence > is "arguments -> function -> call", meaning that something happens after > the "function" stage [i.e. a None check] cannot short-circuit the > "arguments" stage. But in fact the sequence is "function -> arguments -> > call". I'm more confused than ever. You seem to be arguing that Python functions CAN short-circuit their arguments and avoid evaluating them. Is that the case? If not, then I fail to see the difference between "arguments -> function -> call" "function -> arguments -> call" In *both cases* the arguments are fully evaluated before the function is called, and so there is nothing the function can do to delay evaluating its arguments. If this is merely about when the name "function" is looked up, then I don't see why that's relevant to the PEP. What am I missing? -- Steve From steve at holdenweb.com Fri Dec 1 07:01:01 2017 From: steve at holdenweb.com (Steve Holden) Date: Fri, 1 Dec 2017 12:01:01 +0000 Subject: [Python-Dev] What's the status of PEP 505: None-aware operators? 
In-Reply-To: <20171201103105.GK22248@ando.pearwood.info> References: <28D91255-56A9-4CC2-B45D-F83ECD715544@langa.pl> <50C74ECC-D462-4FAC-8A9C-A12BC939ADEB@gmail.com> <5D409985-1D45-4957-9A27-B41C6311BA8B@python.org> <5A20858A.3070807@canterbury.ac.nz> <1512104079.2959729.1190270136.08EB0A92@webmail.messagingengine.com> <20171201103105.GK22248@ando.pearwood.info> Message-ID: On Fri, Dec 1, 2017 at 10:31 AM, Steven D'Aprano wrote: > On Thu, Nov 30, 2017 at 11:54:39PM -0500, Random832 wrote: > > > The OP isn't confusing anything; it's Eric who is confused. The quoted > > paragraph of the PEP clearly and unambiguously claims that the sequence > > is "arguments -> function -> call", meaning that something happens after > > the "function" stage [i.e. a None check] cannot short-circuit the > > "arguments" stage. But in fact the sequence is "function -> arguments -> > > call". > > I'm more confused than ever. You seem to be arguing that Python > functions CAN short-circuit their arguments and avoid evaluating them. > Is that the case? > > If not, then I fail to see the difference between > > "arguments -> function -> call" > > "function -> arguments -> call" > > In *both cases* the arguments are fully evaluated before the function is > called, and so there is nothing the function can do to delay evaluating > its arguments. > > If this is merely about when the name "function" is looked up, then I > don't see why that's relevant to the PEP. > > What am I missing? > > I guess it's possible that if computing the function (i.e., evaluating the expression immediately to the left of the argument list) and/or the arguments has side effects, then the evaluation order will affect the outcome. Intuitively it seems more straightforward to compute the function first. If this expression were to raise an exception, of course, then the arguments would not then be evaluated. Or vice versa. It would be best if the specification matches current CPython behaviour. 
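The order being debated here is easy to observe directly. A minimal sketch (not part of the original messages) that gives both the function expression and the argument expression side effects:

```python
order = []

def get_func():
    order.append("function")   # runs while evaluating the expression left of the ()
    return lambda arg: arg

def get_arg():
    order.append("argument")   # runs while evaluating the argument expression
    return 42

result = get_func()(get_arg())
print(order)   # CPython evaluates the function expression first:
               # ['function', 'argument']
```

If get_func() raised an exception, get_arg() would never run, which matches the observation above that an exception while computing the function prevents argument evaluation.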
From random832 at fastmail.com Fri Dec 1 08:24:05 2017 From: random832 at fastmail.com (Random832) Date: Fri, 01 Dec 2017 08:24:05 -0500 Subject: [Python-Dev] What's the status of PEP 505: None-aware operators? In-Reply-To: <20171201103105.GK22248@ando.pearwood.info> References: <28D91255-56A9-4CC2-B45D-F83ECD715544@langa.pl> <50C74ECC-D462-4FAC-8A9C-A12BC939ADEB@gmail.com> <5D409985-1D45-4957-9A27-B41C6311BA8B@python.org> <5A20858A.3070807@canterbury.ac.nz> <1512104079.2959729.1190270136.08EB0A92@webmail.messagingengine.com> <20171201103105.GK22248@ando.pearwood.info> Message-ID: <1512134645.3087079.1190627376.7505C5A6@webmail.messagingengine.com> On Fri, Dec 1, 2017, at 05:31, Steven D'Aprano wrote: > I'm more confused than ever. You seem to be arguing that Python > functions CAN short-circuit their arguments and avoid evaluating them. > Is that the case? > If this is merely about when the name "function" is looked up, then I > don't see why that's relevant to the PEP. > > What am I missing? You're completely missing the context of the discussion, which was the supposed reason that a *new* function call operator, with the proposed syntax function?(args), that would short-circuit (based on the 'function' being None) could not be implemented. The whole thing doesn't make sense to me anyway, since a new operator could have its own sequence different from the existing one if necessary. From eric at trueblade.com Fri Dec 1 08:33:35 2017 From: eric at trueblade.com (Eric V. 
Smith) Date: Fri, 1 Dec 2017 08:33:35 -0500 Subject: [Python-Dev] Third and hopefully final post: PEP 557, Data Classes In-Reply-To: <79d5b2e9-000e-bfff-d42c-57a43dcf92c1@trueblade.com> References: <516a9e5b-e89e-0dd8-64c5-8a712b58f1ca@trueblade.com> <5d98fa5b-fc53-b4bb-119e-57e569403ec2@oddbird.net> <79d5b2e9-000e-bfff-d42c-57a43dcf92c1@trueblade.com> Message-ID: <72b7d31b-5089-9eca-ee41-cecc74e8a993@trueblade.com> > Since this is easy enough to do in your own code, and I still don't see > a use case, I'll just add a note to the PEP and delete isdataclass(). > > Plus, you can decide for yourself how to deal with the question of > returning true for classes or instances or both. I've updated the PEP and reposted it. The only change is removing isdataclass(). Eric. From agriff at tin.it Fri Dec 1 09:03:19 2017 From: agriff at tin.it (Andrea Griffini) Date: Fri, 1 Dec 2017 15:03:19 +0100 Subject: [Python-Dev] What's the status of PEP 505: None-aware operators? In-Reply-To: <20171201103105.GK22248@ando.pearwood.info> References: <28D91255-56A9-4CC2-B45D-F83ECD715544@langa.pl> <50C74ECC-D462-4FAC-8A9C-A12BC939ADEB@gmail.com> <5D409985-1D45-4957-9A27-B41C6311BA8B@python.org> <5A20858A.3070807@canterbury.ac.nz> <1512104079.2959729.1190270136.08EB0A92@webmail.messagingengine.com> <20171201103105.GK22248@ando.pearwood.info> Message-ID: The PEP says that a None-aware function call operator (e.g. "f?(x, y)") would break the rule of Python that arguments are evaluated before the function, but this is not correct. In Python the function is evaluated before the arguments (but of course the CALL is made after the evaluation of the arguments). A None-aware function call operator ?(...) wouldn't break this order of evaluation rule: 1) evaluate the function, 2) only if it's not None evaluate arguments and make the call. In bytecode the None-aware function call would simply require an extra "JNONE" jump at the end:

    ... evaluate the function ...
    JNONE skip
    ... evaluate arguments ...
    CALL n
skip:

Note that I'm not saying this would be a good thing, just that the reason the PEP uses to dismiss this option is actually wrong because Python doesn't work the way the PEP says it does. Andrea On Fri, Dec 1, 2017 at 11:31 AM, Steven D'Aprano wrote: > On Thu, Nov 30, 2017 at 11:54:39PM -0500, Random832 wrote: > > > The OP isn't confusing anything; it's Eric who is confused. The quoted > > paragraph of the PEP clearly and unambiguously claims that the sequence > > is "arguments -> function -> call", meaning that something happens after > > the "function" stage [i.e. a None check] cannot short-circuit the > > "arguments" stage. But in fact the sequence is "function -> arguments -> > > call". > > I'm more confused than ever. You seem to be arguing that Python > functions CAN short-circuit their arguments and avoid evaluating them. > Is that the case? > > If not, then I fail to see the difference between > > "arguments -> function -> call" > > "function -> arguments -> call" > > In *both cases* the arguments are fully evaluated before the function is > called, and so there is nothing the function can do to delay evaluating > its arguments. > > If this is merely about when the name "function" is looked up, then I > don't see why that's relevant to the PEP. > > What am I missing? > > > -- > Steve > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From status at bugs.python.org Fri Dec 1 12:09:55 2017 From: status at bugs.python.org (Python tracker) Date: Fri, 1 Dec 2017 18:09:55 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20171201170955.13B6E5666C@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2017-11-24 - 2017-12-01) Python tracker at https://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 6281 (+14) closed 37665 (+55) total 43946 (+69) Open issues with patches: 2414 Issues opened (42) ================== #26856: android does not have pwd.getpwall() https://bugs.python.org/issue26856 reopened by xdegaye #30487: DOC: automatically create a venv and install Sphinx when runni https://bugs.python.org/issue30487 reopened by ned.deily #30657: [security] CVE-2017-1000158: Unsafe arithmetic in PyString_Dec https://bugs.python.org/issue30657 reopened by vstinner #32128: test_nntplib: test_article_head_body() fails in SSL mode https://bugs.python.org/issue32128 opened by vstinner #32129: Icon on macOS https://bugs.python.org/issue32129 opened by wordtech #32130: xml.sax parser validation sometimes fails when obtaining DTDs https://bugs.python.org/issue32130 opened by failys #32131: Missing encoding parameter in urllib/parse.py https://bugs.python.org/issue32131 opened by jmbc #32133: documentation: numbers module nitpick https://bugs.python.org/issue32133 opened by abcdef #32137: Stack overflow in repr of deeply nested dicts https://bugs.python.org/issue32137 opened by serhiy.storchaka #32140: IDLE debugger fails with non-trivial __new__ super call https://bugs.python.org/issue32140 opened by Camion #32141: configure with Spaces in Directory Name on macOS https://bugs.python.org/issue32141 opened by philthompson10 #32142: heapq.heappop - documentation misleading or doesn't work https://bugs.python.org/issue32142 opened by scooter4j #32143: os.statvfs lacks f_fsid 
https://bugs.python.org/issue32143 opened by gscrivano #32145: Wrong ExitStack Callback recipe https://bugs.python.org/issue32145 opened by Denaun #32146: multiprocessing freeze_support needed outside win32 https://bugs.python.org/issue32146 opened by dancol #32147: improve performance of binascii.unhexlify() by using conversio https://bugs.python.org/issue32147 opened by sir-sigurd #32152: Add pid to .cover filename in lib/trace.py https://bugs.python.org/issue32152 opened by nikhilh #32153: mock.create_autospec fails if an attribute is a partial functi https://bugs.python.org/issue32153 opened by cbelu #32156: Fix flake8 warning F401: ... imported but unused https://bugs.python.org/issue32156 opened by vstinner #32160: lzma documentation: example to XZ compress file on disk https://bugs.python.org/issue32160 opened by dhimmel #32162: typing.Generic breaks __init_subclass__ https://bugs.python.org/issue32162 opened by Ilya.Kulakov #32165: PyEval_InitThreads is called before Py_Initialize in LoadPytho https://bugs.python.org/issue32165 opened by mrkn #32170: Contrary to documentation, ZipFile.extract does not extract ti https://bugs.python.org/issue32170 opened by Malcolm Smith #32173: linecache.py add lazycache to __all__ and use dict.clear to cl https://bugs.python.org/issue32173 opened by ganziqim #32174: nonASCII punctuation characters can not display in python363.c https://bugs.python.org/issue32174 opened by zaazbb #32175: Add hash auto-randomization https://bugs.python.org/issue32175 opened by bjarvis #32176: Zero argument super is broken in 3.6 for methods with a hacked https://bugs.python.org/issue32176 opened by bup #32177: spammers mine emails from bugs.python.org https://bugs.python.org/issue32177 opened by joern #32178: Some invalid email address groups cause an IndexError instead https://bugs.python.org/issue32178 opened by mtorromeo #32179: Empty email address in headers triggers an IndexError https://bugs.python.org/issue32179 opened by mtorromeo 
#32180: bool() vs len() > 0 on lists https://bugs.python.org/issue32180 opened by dilyan.palauzov #32181: runaway Tasks with Task.cancel() ignored. https://bugs.python.org/issue32181 opened by Oleg K2 #32182: Infinite recursion in email.message.as_string() https://bugs.python.org/issue32182 opened by Silla Rizzoli #32183: Coverity: CID 1423264: Insecure data handling (TAINTED_SCALA https://bugs.python.org/issue32183 opened by vstinner #32185: SSLContext.wrap_socket sends SNI Extension when server_hostnam https://bugs.python.org/issue32185 opened by nitzmahone #32186: io.FileIO hang all threads if fstat blocks on inaccessible NFS https://bugs.python.org/issue32186 opened by nirs #32188: ImpImporter.find_modules removes symlinks in paths https://bugs.python.org/issue32188 opened by Henk-Jaap Wagenaar #32189: SyntaxError for yield expressions inside comprehensions & gene https://bugs.python.org/issue32189 opened by ncoghlan #32190: Separate out legacy introspection APIs in the inspect docs https://bugs.python.org/issue32190 opened by ncoghlan #32192: Provide importlib.util.lazy_import helper function https://bugs.python.org/issue32192 opened by ncoghlan #32193: Convert asyncio to async/await https://bugs.python.org/issue32193 opened by asvetlov #32195: datetime.strftime with %Y no longer outputs leading zeros https://bugs.python.org/issue32195 opened by davechallis Most recent 15 issues with no replies (15) ========================================== #32195: datetime.strftime with %Y no longer outputs leading zeros https://bugs.python.org/issue32195 #32192: Provide importlib.util.lazy_import helper function https://bugs.python.org/issue32192 #32190: Separate out legacy introspection APIs in the inspect docs https://bugs.python.org/issue32190 #32189: SyntaxError for yield expressions inside comprehensions & gene https://bugs.python.org/issue32189 #32183: Coverity: CID 1423264: Insecure data handling (TAINTED_SCALA https://bugs.python.org/issue32183 #32181: runaway Tasks 
with Task.cancel() ignored. https://bugs.python.org/issue32181 #32179: Empty email address in headers triggers an IndexError https://bugs.python.org/issue32179 #32174: nonASCII punctuation characters can not display in python363.c https://bugs.python.org/issue32174 #32173: linecache.py add lazycache to __all__ and use dict.clear to cl https://bugs.python.org/issue32173 #32165: PyEval_InitThreads is called before Py_Initialize in LoadPytho https://bugs.python.org/issue32165 #32153: mock.create_autospec fails if an attribute is a partial functi https://bugs.python.org/issue32153 #32146: multiprocessing freeze_support needed outside win32 https://bugs.python.org/issue32146 #32141: configure with Spaces in Directory Name on macOS https://bugs.python.org/issue32141 #32137: Stack overflow in repr of deeply nested dicts https://bugs.python.org/issue32137 #32133: documentation: numbers module nitpick https://bugs.python.org/issue32133 Most recent 15 issues waiting for review (15) ============================================= #32189: SyntaxError for yield expressions inside comprehensions & gene https://bugs.python.org/issue32189 #32186: io.FileIO hang all threads if fstat blocks on inaccessible NFS https://bugs.python.org/issue32186 #32178: Some invalid email address groups cause an IndexError instead https://bugs.python.org/issue32178 #32175: Add hash auto-randomization https://bugs.python.org/issue32175 #32173: linecache.py add lazycache to __all__ and use dict.clear to cl https://bugs.python.org/issue32173 #32156: Fix flake8 warning F401: ... 
imported but unused https://bugs.python.org/issue32156 #32147: improve performance of binascii.unhexlify() by using conversio https://bugs.python.org/issue32147 #32143: os.statvfs lacks f_fsid https://bugs.python.org/issue32143 #32137: Stack overflow in repr of deeply nested dicts https://bugs.python.org/issue32137 #32129: Icon on macOS https://bugs.python.org/issue32129 #32128: test_nntplib: test_article_head_body() fails in SSL mode https://bugs.python.org/issue32128 #32124: Document functions safe to be called before Py_Initialize() https://bugs.python.org/issue32124 #32118: Doc for comparison of sequences with non-orderable elements https://bugs.python.org/issue32118 #32117: Tuple unpacking in return and yield statements https://bugs.python.org/issue32117 #32114: The get_event_loop change in bpo28613 did not update the docum https://bugs.python.org/issue32114 Top 10 most discussed issues (10) ================================= #16487: Allow ssl certificates to be specified from memory rather than https://bugs.python.org/issue16487 12 msgs #30487: DOC: automatically create a venv and install Sphinx when runni https://bugs.python.org/issue30487 9 msgs #32124: Document functions safe to be called before Py_Initialize() https://bugs.python.org/issue32124 7 msgs #30657: [security] CVE-2017-1000158: Unsafe arithmetic in PyString_Dec https://bugs.python.org/issue30657 6 msgs #32030: PEP 432: Rewrite Py_Main() https://bugs.python.org/issue32030 6 msgs #27172: Undeprecate inspect.getfullargspec() https://bugs.python.org/issue27172 5 msgs #30855: [2.7] test_tk: test_use() of test_tkinter.test_widgets randoml https://bugs.python.org/issue30855 5 msgs #32129: Icon on macOS https://bugs.python.org/issue32129 5 msgs #32142: heapq.heappop - documentation misleading or doesn't work https://bugs.python.org/issue32142 5 msgs #32180: bool() vs len() > 0 on lists https://bugs.python.org/issue32180 5 msgs Issues closed (54) ================== #10544: yield expression inside 
generator expression does nothing https://bugs.python.org/issue10544 closed by ncoghlan #20891: PyGILState_Ensure on non-Python thread causes fatal error https://bugs.python.org/issue20891 closed by vstinner #23033: Disallow support for a*.example.net, *a.example.net, and a*b.e https://bugs.python.org/issue23033 closed by Mariatta #24641: Log type of unserializable value when raising JSON TypeError https://bugs.python.org/issue24641 closed by serhiy.storchaka #25394: CoroWrapper breaks gen.throw https://bugs.python.org/issue25394 closed by vstinner #27535: Ignored ResourceWarning warnings leak memory in warnings regis https://bugs.python.org/issue27535 closed by vstinner #27606: Android cross-built for armv5te with clang and '-mthumb' crash https://bugs.python.org/issue27606 closed by xdegaye #28334: netrc does not work if $HOME is not set https://bugs.python.org/issue28334 closed by berker.peksag #29879: typing.Text not available in python 3.5.1 https://bugs.python.org/issue29879 closed by Mariatta #29885: Allow GMT timezones to be used in datetime. https://bugs.python.org/issue29885 closed by Decorater #30004: in regex-howto, improve example on grouping https://bugs.python.org/issue30004 closed by Mariatta #30396: Document the PyClassMethod* C API functions. 
https://bugs.python.org/issue30396 closed by Decorater #31705: test_sha256 from test_socket fails on ppc64le arch https://bugs.python.org/issue31705 closed by vstinner #31854: Add mmap.ACCESS_DEFAULT to namespace https://bugs.python.org/issue31854 closed by berker.peksag #32051: Possible issue in multiprocessing doc https://bugs.python.org/issue32051 closed by berker.peksag #32059: detect_modules() in setup.py must also search the sysroot path https://bugs.python.org/issue32059 closed by xdegaye #32071: Add py.test-like "-k" test selection to unittest https://bugs.python.org/issue32071 closed by pitrou #32101: Add PYTHONDEVMODE=1 to enable the developer mode https://bugs.python.org/issue32101 closed by vstinner #32107: Improve MAC address calculation and fix test_uuid.py https://bugs.python.org/issue32107 closed by barry #32110: Make codecs.StreamReader.read() more compatible with read() of https://bugs.python.org/issue32110 closed by serhiy.storchaka #32116: CSV import and export simplified https://bugs.python.org/issue32116 closed by rhettinger #32121: tracemalloc.Traceback.format() should have an option to revers https://bugs.python.org/issue32121 closed by vstinner #32125: Remove global configuration variable Py_UseClassExceptionsFlag https://bugs.python.org/issue32125 closed by vstinner #32126: [asyncio] test failure when the platform lacks a functional s https://bugs.python.org/issue32126 closed by xdegaye #32127: tutorial on dictionaries has error in example https://bugs.python.org/issue32127 closed by tberla #32132: Android5 https://bugs.python.org/issue32132 closed by berker.peksag #32134: Crash on OSX https://bugs.python.org/issue32134 closed by terry.reedy #32135: Dict creation with update will result to NoneType https://bugs.python.org/issue32135 closed by eric.smith #32136: Move embedding tests to their own test module https://bugs.python.org/issue32136 closed by ncoghlan #32138: android: test_faulthandler fails also on API 24 
https://bugs.python.org/issue32138 closed by xdegaye #32139: android: locale is modified by test_strftime https://bugs.python.org/issue32139 closed by xdegaye #32144: email.policy.SMTP and SMTPUTF8 doesn't honor linesep's value https://bugs.python.org/issue32144 closed by r.david.murray #32148: Python 2.7.14 has Tkinter with big T letter. https://bugs.python.org/issue32148 closed by serhiy.storchaka #32149: bolen-dmg-3.x: compiled failed with: blurb: command not found https://bugs.python.org/issue32149 closed by ned.deily #32150: Expand tabs to spaces in C files https://bugs.python.org/issue32150 closed by serhiy.storchaka #32151: -mvenv vs minor python version updates https://bugs.python.org/issue32151 closed by ronaldoussoren #32154: asyncio: Don't export selectors and _overlapped in asyncio nam https://bugs.python.org/issue32154 closed by vstinner #32155: Fix flake8 warning F841: local variable ... is assigned to but https://bugs.python.org/issue32155 closed by vstinner #32157: Remove explicit quotes around %r and {!r} https://bugs.python.org/issue32157 closed by serhiy.storchaka #32158: Suppress (and other contextlib context managers) should work a https://bugs.python.org/issue32158 closed by jason.coombs #32159: Remove tools for CVS and Subversion https://bugs.python.org/issue32159 closed by vstinner #32161: Python 2.7.14 installation on Ubuntu 16.04/GCC 5.4 throws "int https://bugs.python.org/issue32161 closed by vstinner #32163: getattr() returns None even when default is given https://bugs.python.org/issue32163 closed by rhettinger #32164: IDLE: delete tabbedpages.py https://bugs.python.org/issue32164 closed by terry.reedy #32166: Drop python 3.4 code from asyncio.coroutines and asyncio.unix_ https://bugs.python.org/issue32166 closed by asvetlov #32167: Improve random.choice function for FiFa purpose https://bugs.python.org/issue32167 closed by duphan #32168: Mutable instance variables don't get new references with args. 
https://bugs.python.org/issue32168 closed by eric.smith #32169: Drop python 3.4-3.5 code from asyncio.unix_events https://bugs.python.org/issue32169 closed by asvetlov #32171: Inconsistent results for fractional power of -infinity https://bugs.python.org/issue32171 closed by mark.dickinson #32172: Add length counter for iterables https://bugs.python.org/issue32172 closed by steven.daprano #32184: pdb/ipdb is not usable on Linux (which works on Windows) from https://bugs.python.org/issue32184 closed by nartes #32187: tab completion fails in pdb/ipdb/ipython for python3.7 https://bugs.python.org/issue32187 closed by nartes #32191: TypeError does not work when function with type hint https://bugs.python.org/issue32191 closed by Kang #32194: When creating list of dictionaries and updating datetime objec https://bugs.python.org/issue32194 closed by gdr at garethrees.org From chris.barker at noaa.gov Fri Dec 1 12:47:30 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 1 Dec 2017 09:47:30 -0800 Subject: [Python-Dev] iso8601 parsing In-Reply-To: <4AC956F5-9FD1-40BC-972D-619DC91AC600@ganssle.io> References: <01e69881-3710-87c8-f47a-dfc427ec65b5@mgmiller.net> <4AC956F5-9FD1-40BC-972D-619DC91AC600@ganssle.io> Message-ID: On Wed, Nov 29, 2017 at 4:19 PM, Paul G wrote: > I can write at least a pure Python implementation in the next few days, if > not a full C implementation. Shouldn't be too hard since I've got a few > different Cython implementations sitting around anyway. > > Thanks! -CHB > On November 29, 2017 7:06:58 PM EST, Alexander Belopolsky < > alexander.belopolsky at gmail.com> wrote: >> >> >> >> On Wed, Nov 29, 2017 at 6:42 PM, Chris Barker >> wrote: >> >>> >>> indeed what is the holdup? I don't recall anyone saying it was a bad >>> idea in the last discussion. >>> >>> Do we just need an implementation? >>> >>> Is the one in the Bug Report not up to snuff? If not, then what's wrong >>> with it? This is just not that hard a problem to solve. 
>>> >> >> See my comment from over a year ago: <https://bugs.python.org/issue15873#msg273609>. The proposed patch did not have a C >> implementation, but we can use the same approach as with strptime and call >> Python code from C. If users will start complaining about performance, we >> can speed it up in later releases. Also the new method needs to be >> documented. Overall, it does not seem to require more than an hour of work >> from a motivated developer, but the people who contributed to the issue in >> the past seem to have lost their interest. >> > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul at ganssle.io Fri Dec 1 12:51:29 2017 From: paul at ganssle.io (Paul G) Date: Fri, 1 Dec 2017 12:51:29 -0500 Subject: [Python-Dev] iso8601 parsing In-Reply-To: References: <01e69881-3710-87c8-f47a-dfc427ec65b5@mgmiller.net> <4AC956F5-9FD1-40BC-972D-619DC91AC600@ganssle.io> Message-ID: <2667a3e2-3d61-3633-5edd-78295cb0f337@ganssle.io> As an update, I have the C version done and basically tested as an extension (I "cheated" on the tests by using hypothesis, so I still need to write unittest-style tests), just writing the Python version with tests now. I know there is a feature freeze coming in soon, is there a strict deadline here if we want this for Python 3.7? Best, Paul On 12/01/2017 12:47 PM, Chris Barker wrote: > On Wed, Nov 29, 2017 at 4:19 PM, Paul G wrote: > >> I can write at least a pure Python implementation in the next few days, if >> not a full C implementation. Shouldn't be too hard since I've got a few >> different Cython implementations sitting around anyway. >> >> > Thanks! 
> > -CHB > > > > >> On November 29, 2017 7:06:58 PM EST, Alexander Belopolsky < >> alexander.belopolsky at gmail.com> wrote: >>> >>> >>> >>> On Wed, Nov 29, 2017 at 6:42 PM, Chris Barker >>> wrote: >>> >>>> >>>> indeed what is the holdup? I don't recall anyone saying it was a bad >>>> idea in the last discussion. >>>> >>>> Do we just need an implementation? >>>> >>>> Is the one in the Bug Report not up to snuff? If not, then what's wrong >>>> with it? This is just not that hard a problem to solve. >>>> >>> >>> >>> See my comment from over a year ago: >> issue15873#msg273609>. The proposed patch did not have a C >>> implementation, but we can use the same approach as with strptime and call >>> Python code from C. If users will start complaining about performance, we >>> can speed it up in later releases. Also the new method needs to be >>> documented. Overall, it does not seem to require more than an hour of work >>> from a motivated developer, but the people who contributed to the issue in >>> the past seem to have lost their interest. >>> >> > > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From ericfahlgren at gmail.com Fri Dec 1 12:52:41 2017 From: ericfahlgren at gmail.com (Eric Fahlgren) Date: Fri, 1 Dec 2017 09:52:41 -0800 Subject: [Python-Dev] What's the status of PEP 505: None-aware operators? 
In-Reply-To: <1512134645.3087079.1190627376.7505C5A6@webmail.messagingengine.com> References: <28D91255-56A9-4CC2-B45D-F83ECD715544@langa.pl> <50C74ECC-D462-4FAC-8A9C-A12BC939ADEB@gmail.com> <5D409985-1D45-4957-9A27-B41C6311BA8B@python.org> <5A20858A.3070807@canterbury.ac.nz> <1512104079.2959729.1190270136.08EB0A92@webmail.messagingengine.com> <20171201103105.GK22248@ando.pearwood.info> <1512134645.3087079.1190627376.7505C5A6@webmail.messagingengine.com> Message-ID: On Fri, Dec 1, 2017 at 5:24 AM, Random832 wrote: > You're completely missing the context of the discussion, which was the > supposed reason that a *new* function call operator, with the proposed > syntax function?(args), that would short-circuit (based on the > 'function' being None) could not be implemented. The whole thing doesn't > make sense to me anyway, since a new operator could have its own > sequence different from the existing one if necessary. > ?Right, I was clearly misinterpreting the wording in the PEP. It's a bit ambiguous and should probably make explicit that "evaluate the function" isn't just the common vernacular for "call the function". -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Fri Dec 1 14:41:57 2017 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 1 Dec 2017 14:41:57 -0500 Subject: [Python-Dev] iso8601 parsing In-Reply-To: <2667a3e2-3d61-3633-5edd-78295cb0f337@ganssle.io> References: <01e69881-3710-87c8-f47a-dfc427ec65b5@mgmiller.net> <4AC956F5-9FD1-40BC-972D-619DC91AC600@ganssle.io> <2667a3e2-3d61-3633-5edd-78295cb0f337@ganssle.io> Message-ID: > is there a strict deadline here if we want this for Python 3.7? The deadline for the new features is the date of the first beta currently scheduled for 2018-01-29, but if you can get this in before the last alpha (2018-01-08) it will be best. See PEP 537 (https://www.python.org/dev/peps/pep-0537) for details and updates. 
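As an aside for readers following the isoformat-parsing thread: the pure-Python approach discussed here can be sketched with `strptime` in a few lines. This is an illustration only — not Paul's implementation or the patch on the tracker — and it ignores time zone offsets and the truncated ISO 8601 forms:

```python
from datetime import datetime

def parse_isoformat(s):
    # Sketch: accept the "YYYY-MM-DD[THH:MM:SS[.ffffff]]" strings that
    # datetime.isoformat() emits; time zone offsets are not handled here.
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue
    raise ValueError("not an isoformat() string: %r" % s)

print(parse_isoformat("2017-12-01T09:47:30"))  # 2017-12-01 09:47:30
```

A C implementation, as Alexander notes, could simply call such Python code the way `strptime` already does.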
-------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Fri Dec 1 16:00:35 2017 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 1 Dec 2017 21:00:35 +0000 Subject: [Python-Dev] What's the status of PEP 505: None-aware operators? In-Reply-To: <1512134645.3087079.1190627376.7505C5A6@webmail.messagingengine.com> References: <28D91255-56A9-4CC2-B45D-F83ECD715544@langa.pl> <50C74ECC-D462-4FAC-8A9C-A12BC939ADEB@gmail.com> <5D409985-1D45-4957-9A27-B41C6311BA8B@python.org> <5A20858A.3070807@canterbury.ac.nz> <1512104079.2959729.1190270136.08EB0A92@webmail.messagingengine.com> <20171201103105.GK22248@ando.pearwood.info> <1512134645.3087079.1190627376.7505C5A6@webmail.messagingengine.com> Message-ID: <4c3494cd-3a15-81b7-e816-f1b426cdef10@mrabarnett.plus.com> On 2017-12-01 13:24, Random832 wrote: > On Fri, Dec 1, 2017, at 05:31, Steven D'Aprano wrote: >> I'm more confused than ever. You seem to be arguing that Python >> functions CAN short-circuit their arguments and avoid evaluating them. >> Is that the case? > >> If this is merely about when the name "function" is looked up, then I >> don't see why that's relevant to the PEP. >> >> What am I missing? > > You're completely missing the context of the discussion, which was the > supposed reason that a *new* function call operator, with the proposed > syntax function?(args), that would short-circuit (based on the > 'function' being None) could not be implemented. The whole thing doesn't > make sense to me anyway, since a new operator could have its own > sequence different from the existing one if necessary. > The code: function?(args) would be equivalent to: None if function is None else function(args) where 'function' would be evaluated once. If function is None, the arguments would not be evaluated. From eric at trueblade.com Fri Dec 1 17:15:33 2017 From: eric at trueblade.com (Eric V. 
Smith) Date: Fri, 1 Dec 2017 17:15:33 -0500 Subject: [Python-Dev] Third and hopefully final post: PEP 557, Data Classes In-Reply-To: <516a9e5b-e89e-0dd8-64c5-8a712b58f1ca@trueblade.com> References: <516a9e5b-e89e-0dd8-64c5-8a712b58f1ca@trueblade.com> Message-ID: <43e293da-75e3-75b5-d7dc-66ccf9a5f1ff@trueblade.com> See https://github.com/ericvsmith/dataclasses/issues/104 for a discussion on making order=False the default. This matches regular classes in Python 3, which cannot be ordered. Eric. From steve at pearwood.info Fri Dec 1 21:01:39 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 2 Dec 2017 13:01:39 +1100 Subject: [Python-Dev] What's the status of PEP 505: None-aware operators? In-Reply-To: <1512134645.3087079.1190627376.7505C5A6@webmail.messagingengine.com> References: <28D91255-56A9-4CC2-B45D-F83ECD715544@langa.pl> <50C74ECC-D462-4FAC-8A9C-A12BC939ADEB@gmail.com> <5D409985-1D45-4957-9A27-B41C6311BA8B@python.org> <5A20858A.3070807@canterbury.ac.nz> <1512104079.2959729.1190270136.08EB0A92@webmail.messagingengine.com> <20171201103105.GK22248@ando.pearwood.info> <1512134645.3087079.1190627376.7505C5A6@webmail.messagingengine.com> Message-ID: <20171202020139.GR22248@ando.pearwood.info> On Fri, Dec 01, 2017 at 08:24:05AM -0500, Random832 wrote: > On Fri, Dec 1, 2017, at 05:31, Steven D'Aprano wrote: > > I'm more confused than ever. You seem to be arguing that Python > > functions CAN short-circuit their arguments and avoid evaluating them. > > Is that the case? > > > If this is merely about when the name "function" is looked up, then I > > don't see why that's relevant to the PEP. > > > > What am I missing? > > You're completely missing the context of the discussion, Yes I am. That's why I asked. > which was the > supposed reason that a *new* function call operator, with the proposed > syntax function?(args), that would short-circuit (based on the > 'function' being None) could not be implemented. 
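As an editorial aside, the short-circuit semantics being described for the hypothetical `function?(args)` operator can be approximated today with an ordinary helper (a sketch only; note that a plain call still evaluates its arguments eagerly, which is one reason an operator was proposed in the first place):

```python
def maybe_call(function, *args, **kwargs):
    # Evaluate ``function`` once; if it is None, skip the call entirely
    # and return None -- roughly the proposed ``function?(args)``.
    return None if function is None else function(*args, **kwargs)

print(maybe_call(None))        # None: no call is made
print(maybe_call(len, "abc"))  # 3
```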
Given that neither your post (which I replied to) nor the post you were replying to mentioned anything about function?() syntax, perhaps I might be forgiven for having no idea what you were talking about? The PEP only mentions function?() as a rejected idea, so I don't know why we're even talking about it. The PEP is deferred, with considerable opposition and luke-warm support, even the PEP author has said he's not going to push for it, and we're arguing about a pedantic point related to a part of the PEP which is rejected... :-) -- Steve From eric at trueblade.com Sat Dec 2 09:02:37 2017 From: eric at trueblade.com (Eric V. Smith) Date: Sat, 2 Dec 2017 09:02:37 -0500 Subject: [Python-Dev] PEP 557 Data Classes 5th posting Message-ID: I've pushed another version of PEP 557. The only difference is changing the default value of "order" to False instead of True. This matches regular classes: instances can be tested for equality, but are unordered. Discussion at https://github.com/ericvsmith/dataclasses/issues/104 It's already available at https://www.python.org/dev/peps/pep-0557/ I've updated the implementation on PyPI to reflect this change: https://pypi.python.org/pypi/dataclasses/0.3 Eric. From steve.dower at python.org Sat Dec 2 18:27:40 2017 From: steve.dower at python.org (Steve Dower) Date: Sat, 2 Dec 2017 15:27:40 -0800 Subject: [Python-Dev] PEPs: ``.. code:: python`` or ``::`` (syntax highlighting) In-Reply-To: References: Message-ID: I tried using code blocks while writing PEP 551 but they weren't highlighted on the python.org rendering. Personally I think it would be great, provided a good colour scheme is available (some default schemes are... not always works of art). I'm not sure who is responsible for that side of things though - presumably it's just a case of installing pygments. (FWIW, I'm not keen on going back and modifying old PEPs, but I won't stop someone doing it on their own terms.)
Cheers, Steve Top-posted from my Windows phone at North Bay Python From: Wes Turner Sent: Thursday, November 30, 2017 5:06 To: Python-Dev Subject: [Python-Dev] PEPs: ``.. code:: python`` or ``::`` (syntax highlighting) In ReStructuredText, this gets syntax highlighted because of the code directive [1][2][3]:

.. code:: python

   import this
   def func(*args, **kwargs):
       pass

This also gets syntax highlighted as python [3]:

.. code:: python

   import this
   def func(*args, **kwargs):
       pass

This does not::

   import this
   def func(*args, **kwargs):
       pass

Syntax highlighting in Docutils 0.9+ is powered by Pygments. If Pygments is not installed, or there is a syntax error, syntax highlighting is absent. GitHub does show Pygments syntax highlighting in .. code:: blocks for .rst and .restructuredtext documents [4]

1. Does the python.org PEP view support .. code:: blocks? [5]
2. Syntax highlighting is an advantage for writers, editors, and readers.
3. Should PEPs use .. code:: blocks to provide this advantage?

[1] http://docutils.sourceforge.net/docs/ref/rst/directives.html#code
[2] http://www.sphinx-doc.org/en/stable/markup/code.html
[3] http://www.sphinx-doc.org/en/stable/config.html#confval-highlight_language
[4] https://github.com/python/peps/blob/master/pep-0557.rst
[5] https://www.python.org/dev/peps/pep-0557/ https://www.python.org/dev/peps/pep-0458/

From mariatta.wijaya at gmail.com Sat Dec 2 19:34:33 2017 From: mariatta.wijaya at gmail.com (Mariatta Wijaya) Date: Sat, 2 Dec 2017 16:34:33 -0800 Subject: [Python-Dev] PEPs: ``.. code:: python`` or ``::`` (syntax highlighting) In-Reply-To: References: Message-ID: If we were to add Pygments support, it is to be done in the pythondotorg project. I recalled the decision was to get PEPs rendered using Sphinx and host it at Read The Docs, so we don't have to worry about updating pythondotorg.
Mariatta Wijaya -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Dec 2 21:30:02 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 3 Dec 2017 12:30:02 +1000 Subject: [Python-Dev] Third and hopefully final post: PEP 557, Data Classes In-Reply-To: <43e293da-75e3-75b5-d7dc-66ccf9a5f1ff@trueblade.com> References: <516a9e5b-e89e-0dd8-64c5-8a712b58f1ca@trueblade.com> <43e293da-75e3-75b5-d7dc-66ccf9a5f1ff@trueblade.com> Message-ID: On 2 December 2017 at 08:15, Eric V. Smith wrote: > See https://github.com/ericvsmith/dataclasses/issues/104 for a discussion on > making order=False the default. This matches regular classes in Python 3, > which cannot be ordered. +1 for making "order=True" be explicitly opt-in. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From wes.turner at gmail.com Sat Dec 2 21:32:18 2017 From: wes.turner at gmail.com (Wes Turner) Date: Sat, 2 Dec 2017 21:32:18 -0500 Subject: [Python-Dev] PEPs: ``.. code:: python`` or ``::`` (syntax highlighting) Message-ID: Pending a transition of PEPs to ReadTheDocs (with HTTPS on a custom domain? and redirects?) (is there a gh issue for this task?), for the pythondotorg project is it as simple as `pip install pygments` and rebuilding each .rst with docutils with pygments installed? On Saturday, December 2, 2017, Mariatta Wijaya wrote: > If we were to add Pygments support, it is to be done in pythondotorg > project. > > I recalled the decision was to get PEPs rendered using Sphinx and host it > at Read The Docs, so we don't have to worry about updating pythondotorg. > > Mariatta Wijaya > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Dec 2 21:39:02 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 3 Dec 2017 12:39:02 +1000 Subject: [Python-Dev] What's the status of PEP 505: None-aware operators? 
In-Reply-To: <20171202020139.GR22248@ando.pearwood.info> References: <28D91255-56A9-4CC2-B45D-F83ECD715544@langa.pl> <50C74ECC-D462-4FAC-8A9C-A12BC939ADEB@gmail.com> <5D409985-1D45-4957-9A27-B41C6311BA8B@python.org> <5A20858A.3070807@canterbury.ac.nz> <1512104079.2959729.1190270136.08EB0A92@webmail.messagingengine.com> <20171201103105.GK22248@ando.pearwood.info> <1512134645.3087079.1190627376.7505C5A6@webmail.messagingengine.com> <20171202020139.GR22248@ando.pearwood.info> Message-ID: On 2 December 2017 at 12:01, Steven D'Aprano wrote: > The PEP only mentions function?() as a rejected idea, do I don't know > why we're even talking about it. The PEP is deferred, with considerable > opposition and luke-warm support, even the PEP author has said he's not > going to push for it, and we're arguing about a pedantic point related > to a part of the PEP which is rejected... Nevertheless, I've fixed the rationale for that decision so folks don't get hung up on the mistake in the previously noted rationale: https://github.com/python/peps/commit/966dd426787e6de8ec6218955cec57f65086c5b4 Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Dec 2 21:42:43 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 3 Dec 2017 12:42:43 +1000 Subject: [Python-Dev] PEPs: ``.. code:: python`` or ``::`` (syntax highlighting) In-Reply-To: References: Message-ID: On 3 December 2017 at 12:32, Wes Turner wrote: > Pending a transition of PEPs to ReadTheDocs (with HTTPS on a custom domain? > and redirects?) (is there a gh issue for this task?), See https://github.com/python/peps/projects/1 and https://github.com/python/core-workflow/issues/5 Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From wes.turner at gmail.com Sat Dec 2 23:49:49 2017 From: wes.turner at gmail.com (Wes Turner) Date: Sat, 2 Dec 2017 23:49:49 -0500 Subject: [Python-Dev] PEPs: ``.. 
code:: python`` or ``::`` (syntax highlighting) In-Reply-To: References: Message-ID: Add pygments for ``.. code::`` directive PEP syntax highlighting #1206 https://github.com/python/pythondotorg/issues/1206 Syntax highlighting is an advantage for writers, editors, and readers. reStructuredText PEPs are rendered into HTML with docutils. Syntax highlighting in Docutils 0.9+ is powered by Pygments. If Pygments is not installed, or there is a syntax error, syntax highlighting is absent. Docutils renders ``.. code::`` blocks with Python syntax highlighting by default. You can specify ``.. code:: python`` or ``.. code:: python3``. - GitHub shows Pygments syntax highlighting for ``.. code::`` directives for .rst and .restructuredtext documents - PEPs may eventually be hosted on ReadTheDocs with Sphinx (which installs docutils and pygments as install_requires in setup.py). https://github.com/python/peps/issues/2 https://github.com/python/core-workflow/issues/5 In order to use pygments with pythondotorg-hosted PEPs, a few things need to happen: - [ ] Include ``pygments`` in ``base-requirements.txt`` - [ ] Pick a pygments theme - Should we use the sphinx_rtd_theme default for consistency with the eventual RTD-hosted PEPs? - [ ] Include the necessary pygments CSS in the PEPs django template - [ ] rebuild the PEPs - Start using code directives in new PEPs - Manually review existing PEPs after adding code directives PEPs may use ``.. code::`` blocks instead of ``::`` so that code is syntax highlighted. On Saturday, December 2, 2017, Nick Coghlan wrote: > On 3 December 2017 at 12:32, Wes Turner > wrote: > > Pending a transition of PEPs to ReadTheDocs (with HTTPS on a custom > domain? > > and redirects?) (is there a gh issue for this task?), > > See https://github.com/python/peps/projects/1 and > https://github.com/python/core-workflow/issues/5 > > Cheers, > Nick. 
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, > Australia From storchaka at gmail.com Sun Dec 3 01:52:44 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 3 Dec 2017 08:52:44 +0200 Subject: [Python-Dev] PEPs: ``.. code:: python`` or ``::`` (syntax highlighting) In-Reply-To: References: Message-ID: 30.11.17 15:00, Wes Turner wrote:
> In ReStructuredText, this gets syntax highlighted
> because of the code directive [1][2][3]:
>
> .. code:: python
>
>    import this
>    def func(*args, **kwargs):
>        pass
>
> This also gets syntax highlighted as python [3]:
>
> .. code:: python
>
>    import this
>    def func(*args, **kwargs):
>        pass
>
> This does not::
>
>    import this
>    def func(*args, **kwargs):
>        pass
>
> Syntax highlighting in Docutils 0.9+ is powered by Pygments.
> If Pygments is not installed, or there is a syntax error,
> syntax highlighting is absent.
>
> GitHub does show Pygments syntax highlighting
> in .. code:: blocks for .rst and .restructuredtext documents [4]
>
> 1. Does the python.org PEP view support .. code:: blocks? [5]
> 2. Syntax highlighting is an advantage for writers, editors, and readers.
> 3. Should PEPs use .. code:: blocks to provide this advantage?
This was discussed when PEPs were converted to the .rst format. At that time this didn't work. I'm sure there is an open issue about adding support of Pygments. If there isn't, open one. From eric at trueblade.com Sun Dec 3 09:55:10 2017 From: eric at trueblade.com (Eric V. Smith) Date: Sun, 3 Dec 2017 09:55:10 -0500 Subject: [Python-Dev] PEP 557 Data Classes 5th posting In-Reply-To: References: Message-ID: I've made a minor change: the return type of fields() is now a tuple, it was a list. Eric. On 12/2/2017 9:02 AM, Eric V. Smith wrote: > I've pushed another version of PEP 557.
The only difference is changing > the default value of "order" to False instead of True. This matches > regular classes: instances can be tested for equality, but are unordered. > > Discussion at https://github.com/ericvsmith/dataclasses/issues/104 > > It's already available at https://www.python.org/dev/peps/pep-0557/ > > I've updated the implementation on PyPI to reflect this change: > https://pypi.python.org/pypi/dataclasses/0.3 > > Eric. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/eric%2Ba-python-dev%40trueblade.com > From guido at python.org Sun Dec 3 11:56:15 2017 From: guido at python.org (Guido van Rossum) Date: Sun, 3 Dec 2017 08:56:15 -0800 Subject: [Python-Dev] PEP 557 Data Classes 5th posting In-Reply-To: References: Message-ID: Not sure I like that better. It's an open-ended sequence of homogeneous types. What's the advantage of a tuple? I don't want to blindly follow existing APIs. On Sun, Dec 3, 2017 at 6:55 AM, Eric V. Smith wrote: > I've made a minor change: the return type of fields() is now a tuple, it > was a list. > > Eric. > > On 12/2/2017 9:02 AM, Eric V. Smith wrote: > >> I've pushed another version of PEP 557. The only difference is changing >> the default value of "order" to False instead of True. This matches regular >> classes: instances can be tested for equality, but are unordered. >> >> Discussion at https://github.com/ericvsmith/dataclasses/issues/104 >> >> It's already available at https://www.python.org/dev/peps/pep-0557/ >> >> I've updated the implementation on PyPI to reflect this change: >> https://pypi.python.org/pypi/dataclasses/0.3 >> >> Eric. 
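For reference, the behaviour being settled in this thread is what `dataclasses` eventually shipped with: `fields()` returns a tuple, `asdict()`/`astuple()` keep their underscore-free names, and with `order=False` as the default, instances compare equal but are unordered. A quick sketch against the stdlib module as it landed in 3.7, not the 0.3 PyPI preview under discussion:

```python
from dataclasses import dataclass, fields, asdict, astuple

@dataclass
class Point:
    x: int
    y: int

p = Point(1, 2)
print(isinstance(fields(Point), tuple))  # True: fields() returns a tuple
print(asdict(p))                         # {'x': 1, 'y': 2}
print(astuple(p))                        # (1, 2)
print(p == Point(1, 2))                  # True: eq=True by default

try:
    p < Point(3, 4)
except TypeError:
    print("unordered: order=False is the default")
```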
>> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: https://mail.python.org/mailman/options/python-dev/eric%2Ba- >> python-dev%40trueblade.com >> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido% > 40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Sun Dec 3 12:07:15 2017 From: eric at trueblade.com (Eric V. Smith) Date: Sun, 3 Dec 2017 12:07:15 -0500 Subject: [Python-Dev] PEP 557 Data Classes 5th posting In-Reply-To: References: Message-ID: <1ea4e3be-d2ae-8467-0ba4-b382542cc283@trueblade.com> On 12/3/2017 11:56 AM, Guido van Rossum wrote: > Not sure I like that better. It's an open-ended sequence of homogeneous > types. What's the advantage of a tuple? I don't want to blindly follow > existing APIs. So people don't modify it, but consenting adults would say "don't do that". I currently return a new tuple in each call to fields(), but in the future I might return the same one every time (per class). I really don't care so much. The only reason I made any change was because the implementation was returning an OrderedDict, so I was changing the tests anyway. I'm happy to change it back to a list, based on the convention of homogeneous types being in a list. Eric. > > On Sun, Dec 3, 2017 at 6:55 AM, Eric V. Smith > wrote: > > I've made a minor change: the return type of fields() is now a > tuple, it was a list. > > Eric. > > On 12/2/2017 9:02 AM, Eric V. Smith wrote: > > I've pushed another version of PEP 557. The only difference is > changing the default value of "order" to False instead of True. 
> This matches regular classes: instances can be tested for > equality, but are unordered. > > Discussion at > https://github.com/ericvsmith/dataclasses/issues/104 > > > It's already available at > https://www.python.org/dev/peps/pep-0557/ > > > I've updated the implementation on PyPI to reflect this change: > https://pypi.python.org/pypi/dataclasses/0.3 > > > Eric. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/eric%2Ba-python-dev%40trueblade.com > > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > > > > > -- > --Guido van Rossum (python.org/~guido ) From guido at python.org Sun Dec 3 15:02:24 2017 From: guido at python.org (Guido van Rossum) Date: Sun, 3 Dec 2017 12:02:24 -0800 Subject: [Python-Dev] PEP 557 Data Classes 5th posting In-Reply-To: <1ea4e3be-d2ae-8467-0ba4-b382542cc283@trueblade.com> References: <1ea4e3be-d2ae-8467-0ba4-b382542cc283@trueblade.com> Message-ID: On second thought I don't care that much. On Dec 3, 2017 9:07 AM, "Eric V. Smith" wrote: > On 12/3/2017 11:56 AM, Guido van Rossum wrote: > >> Not sure I like that better. It's an open-ended sequence of homogeneous >> types. What's the advantage of a tuple? I don't want to blindly follow >> existing APIs. >> > > So people don't modify it, but consenting adults would say "don't do > that". I currently return a new tuple in each call to fields(), but in the > future I might return the same one every time (per class). > > I really don't care so much. The only reason I made any change was because > the implementation was returning an OrderedDict, so I was changing the > tests anyway. 
I'm happy to change it back to a list, based on the > convention of homogeneous types being in a list. > > Eric. > > >> On Sun, Dec 3, 2017 at 6:55 AM, Eric V. Smith > > wrote: >> >> I've made a minor change: the return type of fields() is now a >> tuple, it was a list. >> >> Eric. >> >> On 12/2/2017 9:02 AM, Eric V. Smith wrote: >> >> I've pushed another version of PEP 557. The only difference is >> changing the default value of "order" to False instead of True. >> This matches regular classes: instances can be tested for >> equality, but are unordered. >> >> Discussion at >> https://github.com/ericvsmith/dataclasses/issues/104 >> >> >> It's already available at >> https://www.python.org/dev/peps/pep-0557/ >> >> >> I've updated the implementation on PyPI to reflect this change: >> https://pypi.python.org/pypi/dataclasses/0.3 >> >> >> Eric. >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/eric%2Ba- >> python-dev%40trueblade.com >> > 2Ba-python-dev%40trueblade.com> >> >> >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/guido%40python.org >> >> >> >> >> >> -- >> --Guido van Rossum (python.org/~guido ) >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sun Dec 3 15:33:32 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 3 Dec 2017 21:33:32 +0100 Subject: [Python-Dev] PEP 557 Data Classes 5th posting References: Message-ID: <20171203213332.6f315523@fsol> On Sat, 2 Dec 2017 09:02:37 -0500 "Eric V. Smith" wrote: > I've pushed another version of PEP 557. 
The only difference is changing > the default value of "order" to False instead of True. This matches > regular classes: instances can be tested for equality, but are unordered. > > Discussion at https://github.com/ericvsmith/dataclasses/issues/104 > > It's already available at https://www.python.org/dev/peps/pep-0557/ Thanks. I have to ask: why don't "asdict" and "astuple" respect PEP 8 naming? Regards Antoine. From eric at trueblade.com Sun Dec 3 16:00:45 2017 From: eric at trueblade.com (Eric V. Smith) Date: Sun, 3 Dec 2017 16:00:45 -0500 Subject: [Python-Dev] PEP 557 Data Classes 5th posting In-Reply-To: References: <1ea4e3be-d2ae-8467-0ba4-b382542cc283@trueblade.com> Message-ID: Me, either. So I'm going to leave it as a tuple. Unless I find something while reviewing it tonight, I'm done. Eric. On 12/3/2017 3:02 PM, Guido van Rossum wrote:
> On second thought I don't care that much.
>
> On Dec 3, 2017 9:07 AM, "Eric V. Smith" wrote:
>> On 12/3/2017 11:56 AM, Guido van Rossum wrote:
>>> Not sure I like that better. It's an open-ended sequence of
>>> homogeneous types. What's the advantage of a tuple? I don't want
>>> to blindly follow existing APIs.
>>
>> So people don't modify it, but consenting adults would say "don't do
>> that". I currently return a new tuple in each call to fields(), but
>> in the future I might return the same one every time (per class).
>>
>> I really don't care so much. The only reason I made any change was
>> because the implementation was returning an OrderedDict, so I was
>> changing the tests anyway. I'm happy to change it back to a list,
>> based on the convention of homogeneous types being in a list.
>>
>> Eric.
>>
>> On Sun, Dec 3, 2017 at 6:55 AM, Eric V. Smith wrote:
>>> I've made a minor change: the return type of fields() is now a
>>> tuple, it was a list.
>>>
>>> Eric.
>>>
>>> On 12/2/2017 9:02 AM, Eric V. Smith wrote:
>>>> I've pushed another version of PEP 557. The only difference is
>>>> changing the default value of "order" to False instead of True.
>>>> This matches regular classes: instances can be tested for
>>>> equality, but are unordered.
>>>>
>>>> Discussion at https://github.com/ericvsmith/dataclasses/issues/104
>>>>
>>>> It's already available at https://www.python.org/dev/peps/pep-0557/
>>>>
>>>> I've updated the implementation on PyPI to reflect this change:
>>>> https://pypi.python.org/pypi/dataclasses/0.3
>>>>
>>>> Eric.
> -- --Guido van Rossum (python.org/~guido)
From eric at trueblade.com Sun Dec 3 16:28:37 2017 From: eric at trueblade.com (Eric V. Smith) Date: Sun, 3 Dec 2017 16:28:37 -0500 Subject: [Python-Dev] PEP 557 Data Classes 5th posting In-Reply-To: <20171203213332.6f315523@fsol> References: <20171203213332.6f315523@fsol> Message-ID: <5db84927-4135-1cb9-425e-7ea59e0da4f5@trueblade.com> On 12/3/2017 3:33 PM, Antoine Pitrou wrote: > On Sat, 2 Dec 2017 09:02:37 -0500 > "Eric V. Smith" wrote: >> I've pushed another version of PEP 557. The only difference is changing >> the default value of "order" to False instead of True. This matches >> regular classes: instances can be tested for equality, but are unordered.
>> >> Discussion at https://github.com/ericvsmith/dataclasses/issues/104 >> >> It's already available at https://www.python.org/dev/peps/pep-0557/ > > Thanks. I have to ask: why don't "asdict" and "astuple" respect PEP 8 > naming? I guess it depends if you think the underscore is needed to improve readability. "Function names should be lowercase, with words separated by underscores as necessary to improve readability." I don't feel strongly enough about it to change it, but part of that is because I'm burned out on the PEP, so I might not be a good one to judge at this point. I guess if I clear my head and I were doing it from scratch again I'd make them as_dict and as_tuple, so maybe I should brush aside inertia and make the change. Eric. From guido at python.org Sun Dec 3 20:31:13 2017 From: guido at python.org (Guido van Rossum) Date: Sun, 3 Dec 2017 17:31:13 -0800 Subject: [Python-Dev] PEP 557 Data Classes 5th posting In-Reply-To: <5db84927-4135-1cb9-425e-7ea59e0da4f5@trueblade.com> References: <20171203213332.6f315523@fsol> <5db84927-4135-1cb9-425e-7ea59e0da4f5@trueblade.com> Message-ID: On Sun, Dec 3, 2017 at 1:28 PM, Eric V. Smith wrote: > On 12/3/2017 3:33 PM, Antoine Pitrou wrote: > >> On Sat, 2 Dec 2017 09:02:37 -0500 >> "Eric V. Smith" wrote: >> >>> I've pushed another version of PEP 557. The only difference is changing >>> the default value of "order" to False instead of True. This matches >>> regular classes: instances can be tested for equality, but are unordered. >>> >>> Discussion at https://github.com/ericvsmith/dataclasses/issues/104 >>> >>> It's already available at https://www.python.org/dev/peps/pep-0557/ >>> >> >> Thanks. I have to ask: why don't "asdict" and "astuple" respect PEP 8 >> naming? >> > > I guess it depends if you think the underscore is needed to improve > readability. "Function names should be lowercase, with words separated by > underscores as necessary to improve readability." 
> I don't feel strongly enough about it to change it, but part of that is
> because I'm burned out on the PEP, so I might not be a good one to judge at
> this point. I guess if I clear my head and I were doing it from scratch
> again I'd make them as_dict and as_tuple, so maybe I should brush aside
> inertia and make the change.

The Python stdlib is incredibly inconsistent where it comes to inserting
underscores. I think in this case it matches `namedtuple._asdict()` and
that's good enough for me.

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From eric at trueblade.com  Sun Dec  3 21:07:42 2017
From: eric at trueblade.com (Eric V. Smith)
Date: Sun, 3 Dec 2017 21:07:42 -0500
Subject: [Python-Dev] PEP 557 Data Classes 5th posting
In-Reply-To:
References: <20171203213332.6f315523@fsol>
 <5db84927-4135-1cb9-425e-7ea59e0da4f5@trueblade.com>
Message-ID: <45c7f8be-7acd-0f66-e1b3-4a5df5def515@trueblade.com>

On 12/3/2017 8:31 PM, Guido van Rossum wrote:
> On Sun, Dec 3, 2017 at 1:28 PM, Eric V. Smith wrote:
>> On 12/3/2017 3:33 PM, Antoine Pitrou wrote:
>>> Thanks. I have to ask: why don't "asdict" and "astuple" respect
>>> PEP 8 naming?
>>
>> I guess it depends if you think the underscore is needed to improve
>> readability. "Function names should be lowercase, with words
>> separated by underscores as necessary to improve readability."
>>
>> I don't feel strongly enough about it to change it, but part of that
>> is because I'm burned out on the PEP, so I might not be a good one
>> to judge at this point. I guess if I clear my head and I were doing
>> it from scratch again I'd make them as_dict and as_tuple, so maybe I
>> should brush aside inertia and make the change.
>
> The Python stdlib is incredibly inconsistent where it comes to inserting
> underscores. I think in this case it matches `namedtuple._asdict()` and
> that's good enough for me.
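For the record, the naming parallel is easy to check side by side. A small illustrative sketch, using only the spellings discussed in this thread (Python 3.7+, or the `dataclasses` backport on PyPI):

```python
from collections import namedtuple
from dataclasses import dataclass, asdict, astuple

Point = namedtuple('Point', ['x', 'y'])

@dataclass
class DPoint:
    x: int
    y: int

# namedtuple hides its helper behind a leading underscore to avoid
# clashing with field names; dataclasses instead exposes module-level
# functions, matching attr.asdict() from the attrs project.
print(dict(Point(1, 2)._asdict()))  # {'x': 1, 'y': 2}
print(asdict(DPoint(1, 2)))         # {'x': 1, 'y': 2}
print(astuple(DPoint(1, 2)))        # (1, 2)
```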
It also matches `attrs.asdict()`, which is what originally inspired it.

Eric.

From eric at trueblade.com  Sun Dec  3 21:11:01 2017
From: eric at trueblade.com (Eric V. Smith)
Date: Sun, 3 Dec 2017 21:11:01 -0500
Subject: [Python-Dev] PEP 557 Data Classes 5th posting
In-Reply-To: <45c7f8be-7acd-0f66-e1b3-4a5df5def515@trueblade.com>
References: <20171203213332.6f315523@fsol>
 <5db84927-4135-1cb9-425e-7ea59e0da4f5@trueblade.com>
 <45c7f8be-7acd-0f66-e1b3-4a5df5def515@trueblade.com>
Message-ID: <21408bb5-7a37-7329-c0b2-e7ff436d5737@trueblade.com>

On 12/3/2017 9:07 PM, Eric V. Smith wrote:
> It also matches `attrs.asdict()`, which is what originally inspired it.

Make that `attr.asdict()`. So easy to get that wrong.

Eric.

From eric at trueblade.com  Mon Dec  4 03:58:02 2017
From: eric at trueblade.com (Eric V. Smith)
Date: Mon, 4 Dec 2017 03:58:02 -0500
Subject: [Python-Dev] PEP 557 Data Classes 5th posting
In-Reply-To: <45c7f8be-7acd-0f66-e1b3-4a5df5def515@trueblade.com>
References: <20171203213332.6f315523@fsol>
 <5db84927-4135-1cb9-425e-7ea59e0da4f5@trueblade.com>
 <45c7f8be-7acd-0f66-e1b3-4a5df5def515@trueblade.com>
Message-ID: <573264df-1f5e-eade-be1c-bfd2f07ddcbd@trueblade.com>

On 12/3/2017 9:07 PM, Eric V. Smith wrote:
> On 12/3/2017 8:31 PM, Guido van Rossum wrote:
>> On Sun, Dec 3, 2017 at 1:28 PM, Eric V. Smith wrote:
>>
>>     On 12/3/2017 3:33 PM, Antoine Pitrou wrote:
>>
>>         Thanks. I have to ask: why don't "asdict" and "astuple"
>>         respect PEP 8 naming?
>>
>>     I guess it depends if you think the underscore is needed to improve
>>     readability. "Function names should be lowercase, with words
>>     separated by underscores as necessary to improve readability."
>>
>>     I don't feel strongly enough about it to change it, but part of that
>>     is because I'm burned out on the PEP, so I might not be a good one
>>     to judge at this point. I guess if I clear my head and I were doing
>>     it from scratch again I'd make them as_dict and as_tuple, so maybe I
>>     should brush aside inertia and make the change.
>>
>> The Python stdlib is incredibly inconsistent where it comes to
>> inserting underscores. I think in this case it matches
>> `namedtuple._asdict()` and that's good enough for me.
>
> It also matches `attrs.asdict()`, which is what originally inspired it.

After a brief discussion at
https://github.com/ericvsmith/dataclasses/issues/110, the decision is to
leave the function names as-is, without underscores, to be consistent
with namedtuples and attrs. I'll add a note in the PEP's discussion
section.

Eric.

From guido at python.org  Mon Dec  4 11:58:11 2017
From: guido at python.org (Guido van Rossum)
Date: Mon, 4 Dec 2017 08:58:11 -0800
Subject: [Python-Dev] Accepting PEP 562 -- Module __getattr__ and __dir__
Message-ID:

Ivan,

Congrats on your PEP. I believe the outstanding issues are now resolved
and I am hereby accepting it.

PS. Sorry, Larry, PEP 549 is rejected. But that happened a while ago.

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From guido at python.org  Mon Dec  4 12:17:25 2017
From: guido at python.org (Guido van Rossum)
Date: Mon, 4 Dec 2017 09:17:25 -0800
Subject: [Python-Dev] PEP 557 Data Classes 5th posting
In-Reply-To: <573264df-1f5e-eade-be1c-bfd2f07ddcbd@trueblade.com>
References: <20171203213332.6f315523@fsol>
 <5db84927-4135-1cb9-425e-7ea59e0da4f5@trueblade.com>
 <45c7f8be-7acd-0f66-e1b3-4a5df5def515@trueblade.com>
 <573264df-1f5e-eade-be1c-bfd2f07ddcbd@trueblade.com>
Message-ID:

And with this, I'm accepting PEP 557, Data Classes.

Eric, congrats with your efforts in proposing and implementing this PEP
and guiding it through the discussion! It's been great to see this idea
come to fruition.
Thanks also to the many people who reviewed drafts or implementation code, including the very generous authors and maintainers of "attrs", from which this has taken many ideas. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Dec 4 12:34:57 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 04 Dec 2017 09:34:57 -0800 Subject: [Python-Dev] PEP 557 Data Classes 5th posting In-Reply-To: References: <20171203213332.6f315523@fsol> <5db84927-4135-1cb9-425e-7ea59e0da4f5@trueblade.com> <45c7f8be-7acd-0f66-e1b3-4a5df5def515@trueblade.com> <573264df-1f5e-eade-be1c-bfd2f07ddcbd@trueblade.com> Message-ID: <5A258741.2080602@stoneleaf.us> On 12/04/2017 09:17 AM, Guido van Rossum wrote: > And with this, I'm accepting PEP 557, Data Classes. Congratulations, Eric! Data Classes will be a handy thing to have. :) -- ~Ethan~ From victor.stinner at gmail.com Mon Dec 4 12:46:10 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 4 Dec 2017 18:46:10 +0100 Subject: [Python-Dev] Accepting PEP 562 -- Module __getattr__ and __dir__ In-Reply-To: References: Message-ID: Link for lazy people like me: https://www.python.org/dev/peps/pep-0562/ I changed the PEP status to fix a typo in the abstract: https://github.com/python/peps/commit/a87417b22bf15bc4382daeaef6d32886c687ad19 Victor 2017-12-04 17:58 GMT+01:00 Guido van Rossum : > Ivan, > > Congrats on your PEP. I believe the outstanding issues are now resolved and > I am hereby accepting it. > > PS. Sorry, Larry, PEP 549 is rejected. But that happened a while ago. 
>
> -- 
> --Guido van Rossum (python.org/~guido)

From eric at trueblade.com  Mon Dec  4 12:56:30 2017
From: eric at trueblade.com (Eric V. Smith)
Date: Mon, 4 Dec 2017 12:56:30 -0500
Subject: [Python-Dev] PEP 557 Data Classes 5th posting
In-Reply-To:
References: <20171203213332.6f315523@fsol>
 <5db84927-4135-1cb9-425e-7ea59e0da4f5@trueblade.com>
 <45c7f8be-7acd-0f66-e1b3-4a5df5def515@trueblade.com>
 <573264df-1f5e-eade-be1c-bfd2f07ddcbd@trueblade.com>
Message-ID: <80CD3DCB-28C6-4444-A7DA-6CCE1C13FDE0@trueblade.com>

Thanks, Guido. And indeed, thanks to everyone else who provided
inspiration and feedback. I too would like to thank Hynek and the other
authors of "attrs". I'll get the implementation committed in the next
day or so.

-- 
Eric.

> On Dec 4, 2017, at 12:17 PM, Guido van Rossum wrote:
>
> And with this, I'm accepting PEP 557, Data Classes.
>
> Eric, congrats with your efforts in proposing and implementing this
> PEP and guiding it through the discussion! It's been great to see this
> idea come to fruition. Thanks also to the many people who reviewed
> drafts or implementation code, including the very generous authors and
> maintainers of "attrs", from which this has taken many ideas.
>
> -- 
> --Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From guido at python.org Mon Dec 4 11:56:41 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 4 Dec 2017 08:56:41 -0800 Subject: [Python-Dev] Accepting PEP 560 -- Core support for typing module and generic types Message-ID: Ivan, Congrats on your PEP. I believe the outstanding issues are now resolved and I am hereby accepting it. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Mon Dec 4 13:52:34 2017 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 4 Dec 2017 10:52:34 -0800 Subject: [Python-Dev] Accepting PEP 562 -- Module __getattr__ and __dir__ In-Reply-To: References: Message-ID: <7b03ba4a-9739-c1d8-1970-26d2a7f2f060@g.nevcal.com> The word "a" is extraneous/confusing-in-grammar in the first line of the abstract also. I can't fix it. On 12/4/2017 9:46 AM, Victor Stinner wrote: > Link for lazy people like me: > https://www.python.org/dev/peps/pep-0562/ > > I changed the PEP status to fix a typo in the abstract: > https://github.com/python/peps/commit/a87417b22bf15bc4382daeaef6d32886c687ad19 > > Victor > > > 2017-12-04 17:58 GMT+01:00 Guido van Rossum : >> Ivan, >> >> Congrats on your PEP. I believe the outstanding issues are now resolved and >> I am hereby accepting it. >> >> PS. Sorry, Larry, PEP 549 is rejected. But that happened a while ago. 
>> >> -- >> --Guido van Rossum (python.org/~guido) >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com >> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/v%2Bpython%40g.nevcal.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nad at python.org Mon Dec 4 14:27:29 2017 From: nad at python.org (Ned Deily) Date: Mon, 4 Dec 2017 14:27:29 -0500 Subject: [Python-Dev] 3.7.0a3 still open Message-ID: <3E7162BE-F5C0-4DDF-8D1B-25DB0EB13BC2@python.org> Congratulations to the owners of the newly accepted PEPs. The code cutoff for 3.7.0 alpha 3 is scheduled for today (along with 3.6.3rc1). I know at least one of the PEPs has code ready to commit. I will hold off on tagging 3.7.0a3 for another 6 hours or so. If you feel your code is adequately reviewed and ready to go, go for it; likewise for normal bug fixes and doc changes. But keep in the mind that there is still one more alpha preview release coming prior to the beta 1 feature code freeze, so no need to panic. -- Ned Deily nad at python.org -- [] From guido at python.org Mon Dec 4 14:34:01 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 4 Dec 2017 11:34:01 -0800 Subject: [Python-Dev] Accepting PEP 562 -- Module __getattr__ and __dir__ In-Reply-To: <7b03ba4a-9739-c1d8-1970-26d2a7f2f060@g.nevcal.com> References: <7b03ba4a-9739-c1d8-1970-26d2a7f2f060@g.nevcal.com> Message-ID: Thanks, I fixed it. Will be live in 15-60 minutes. On Mon, Dec 4, 2017 at 10:52 AM, Glenn Linderman wrote: > The word "a" is extraneous/confusing-in-grammar in the first line of the > abstract also. I can't fix it. 
>
>
> On 12/4/2017 9:46 AM, Victor Stinner wrote:
>> Link for lazy people like me:
>> https://www.python.org/dev/peps/pep-0562/
>>
>> I changed the PEP status to fix a typo in the abstract:
>> https://github.com/python/peps/commit/a87417b22bf15bc4382daeaef6d32886c687ad19
>>
>> Victor
>>
>> 2017-12-04 17:58 GMT+01:00 Guido van Rossum :
>>> Ivan,
>>>
>>> Congrats on your PEP. I believe the outstanding issues are now resolved
>>> and I am hereby accepting it.
>>>
>>> PS. Sorry, Larry, PEP 549 is rejected. But that happened a while ago.
>>>
>>> -- 
>>> --Guido van Rossum (python.org/~guido)

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From guido at python.org  Mon Dec  4 11:42:38 2017
From: guido at python.org (Guido van Rossum)
Date: Mon, 4 Dec 2017 08:42:38 -0800
Subject: [Python-Dev] PEP 563: Postponed Evaluation of Annotations (Draft 3)
In-Reply-To:
References:
Message-ID:

Łukasz,

I am hereby accepting your PEP. This will be a great improvement in the
experience of users annotating large complex codebases. Congrats on the
design and implementation and on your shepherding the PEP through the
discussion phase.
Also a special thanks to Serhiy for thoroughly reviewing and
contributing to the ast-expr-stringification code.

--Guido

PS. I have some editorial quibbles (mostly suggestions to make the
exposition clearer in a few places) but they don't affect acceptance of
the PEP and I will contact you at a later time with these.

On Tue, Nov 21, 2017 at 4:26 PM, Lukasz Langa wrote:

> Based on the feedback I gathered in early November,
> I'm publishing the third draft for consideration on python-dev.
> I hope you like it!
>
> A nicely formatted rendering is available here:
> https://www.python.org/dev/peps/pep-0563/
>
> The full list of changes between this version and the previous draft
> can be found here:
> https://github.com/ambv/static-annotations/compare/
> python-dev1...python-dev2
>
> - Ł
>
>
>
> PEP: 563
> Title: Postponed Evaluation of Annotations
> Version: $Revision$
> Last-Modified: $Date$
> Author: Łukasz Langa
> Discussions-To: Python-Dev
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 8-Sep-2017
> Python-Version: 3.7
> Post-History: 1-Nov-2017, 21-Nov-2017
> Resolution:
>
>
> Abstract
> ========
>
> PEP 3107 introduced syntax for function annotations, but the semantics
> were deliberately left undefined. PEP 484 introduced a standard meaning
> to annotations: type hints. PEP 526 defined variable annotations,
> explicitly tying them with the type hinting use case.
>
> This PEP proposes changing function annotations and variable annotations
> so that they are no longer evaluated at function definition time.
> Instead, they are preserved in ``__annotations__`` in string form.
>
> This change is going to be introduced gradually, starting with a new
> ``__future__`` import in Python 3.7.
>
>
> Rationale and Goals
> ===================
>
> PEP 3107 added support for arbitrary annotations on parts of a function
> definition. Just like default values, annotations are evaluated at
> function definition time.
This creates a number of issues for the type > hinting use case: > > * forward references: when a type hint contains names that have not been > defined yet, that definition needs to be expressed as a string > literal; > > * type hints are executed at module import time, which is not > computationally free. > > Postponing the evaluation of annotations solves both problems. > > Non-goals > --------- > > Just like in PEP 484 and PEP 526, it should be emphasized that **Python > will remain a dynamically typed language, and the authors have no desire > to ever make type hints mandatory, even by convention.** > > This PEP is meant to solve the problem of forward references in type > annotations. There are still cases outside of annotations where > forward references will require usage of string literals. Those are > listed in a later section of this document. > > Annotations without forced evaluation enable opportunities to improve > the syntax of type hints. This idea will require its own separate PEP > and is not discussed further in this document. > > Non-typing usage of annotations > ------------------------------- > > While annotations are still available for arbitrary use besides type > checking, it is worth mentioning that the design of this PEP, as well > as its precursors (PEP 484 and PEP 526), is predominantly motivated by > the type hinting use case. > > In Python 3.8 PEP 484 will graduate from provisional status. Other > enhancements to the Python programming language like PEP 544, PEP 557, > or PEP 560, are already being built on this basis as they depend on > type annotations and the ``typing`` module as defined by PEP 484. > In fact, the reason PEP 484 is staying provisional in Python 3.7 is to > enable rapid evolution for another release cycle that some of the > aforementioned enhancements require. > > With this in mind, uses for annotations incompatible with the > aforementioned PEPs should be considered deprecated. 
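The forward-reference problem motivating the PEP fits in a few lines. An illustrative sketch (standard library only, Python 3.7+ with the proposed future import):

```python
from __future__ import annotations
from typing import List

class Tree:
    # Without the __future__ import this hint would have to be the
    # string literal "List[Tree]", because the name Tree is not bound
    # yet while the class body is being evaluated.
    def children(self) -> List[Tree]:
        return []

# The annotation was never evaluated at definition time; it is kept
# in __annotations__ as a plain string.
print(Tree.children.__annotations__['return'])
```

With the future import enabled, importing the module also no longer pays the cost of evaluating any annotation expressions.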
> > > Implementation > ============== > > In Python 4.0, function and variable annotations will no longer be > evaluated at definition time. Instead, a string form will be preserved > in the respective ``__annotations__`` dictionary. Static type checkers > will see no difference in behavior, whereas tools using annotations at > runtime will have to perform postponed evaluation. > > The string form is obtained from the AST during the compilation step, > which means that the string form might not preserve the exact formatting > of the source. Note: if an annotation was a string literal already, it > will still be wrapped in a string. > > Annotations need to be syntactically valid Python expressions, also when > passed as literal strings (i.e. ``compile(literal, '', 'eval')``). > Annotations can only use names present in the module scope as postponed > evaluation using local names is not reliable (with the sole exception of > class-level names resolved by ``typing.get_type_hints()``). > > Note that as per PEP 526, local variable annotations are not evaluated > at all since they are not accessible outside of the function's closure. > > Enabling the future behavior in Python 3.7 > ------------------------------------------ > > The functionality described above can be enabled starting from Python > 3.7 using the following special import:: > > from __future__ import annotations > > A reference implementation of this functionality is available > `on GitHub `_. > > > Resolving Type Hints at Runtime > =============================== > > To resolve an annotation at runtime from its string form to the result > of the enclosed expression, user code needs to evaluate the string. > > For code that uses type hints, the > ``typing.get_type_hints(obj, globalns=None, localns=None)`` function > correctly evaluates expressions back from its string form. 
Note that > all valid code currently using ``__annotations__`` should already be > doing that since a type annotation can be expressed as a string literal. > > For code which uses annotations for other purposes, a regular > ``eval(ann, globals, locals)`` call is enough to resolve the > annotation. > > In both cases it's important to consider how globals and locals affect > the postponed evaluation. An annotation is no longer evaluated at the > time of definition and, more importantly, *in the same scope* where it > was defined. Consequently, using local state in annotations is no > longer possible in general. As for globals, the module where the > annotation was defined is the correct context for postponed evaluation. > > The ``get_type_hints()`` function automatically resolves the correct > value of ``globalns`` for functions and classes. It also automatically > provides the correct ``localns`` for classes. > > When running ``eval()``, > the value of globals can be gathered in the following way: > > * function objects hold a reference to their respective globals in an > attribute called ``__globals__``; > > * classes hold the name of the module they were defined in, this can be > used to retrieve the respective globals:: > > cls_globals = vars(sys.modules[SomeClass.__module__]) > > Note that this needs to be repeated for base classes to evaluate all > ``__annotations__``. > > * modules should use their own ``__dict__``. > > The value of ``localns`` cannot be reliably retrieved for functions > because in all likelihood the stack frame at the time of the call no > longer exists. > > For classes, ``localns`` can be composed by chaining vars of the given > class and its base classes (in the method resolution order). Since slots > can only be filled after the class was defined, we don't need to consult > them for this purpose. 
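The round trip described in this section can be sketched compactly (Python 3.7+; `get_type_hints()` supplies the correct `globalns` automatically):

```python
from __future__ import annotations
from typing import Optional, get_type_hints

class Node:
    next: Optional[Node] = None  # forward reference, no quotes needed

# The stored form is the unevaluated string...
assert isinstance(Node.__annotations__['next'], str)

# ...and get_type_hints() evaluates it back in the defining module's
# globals, yielding the real typing object.
hints = get_type_hints(Node)
print(hints['next'] == Optional[Node])
```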
> > Runtime annotation resolution and class decorators > -------------------------------------------------- > > Metaclasses and class decorators that need to resolve annotations for > the current class will fail for annotations that use the name of the > current class. Example:: > > def class_decorator(cls): > annotations = get_type_hints(cls) # raises NameError on 'C' > print(f'Annotations for {cls}: {annotations}') > return cls > > @class_decorator > class C: > singleton: 'C' = None > > This was already true before this PEP. The class decorator acts on > the class before it's assigned a name in the current definition scope. > > Runtime annotation resolution and ``TYPE_CHECKING`` > --------------------------------------------------- > > Sometimes there's code that must be seen by a type checker but should > not be executed. For such situations the ``typing`` module defines a > constant, ``TYPE_CHECKING``, that is considered ``True`` during type > checking but ``False`` at runtime. Example:: > > import typing > > if typing.TYPE_CHECKING: > import expensive_mod > > def a_func(arg: expensive_mod.SomeClass) -> None: > a_var: expensive_mod.SomeClass = arg > ... > > This approach is also useful when handling import cycles. > > Trying to resolve annotations of ``a_func`` at runtime using > ``typing.get_type_hints()`` will fail since the name ``expensive_mod`` > is not defined (``TYPE_CHECKING`` variable being ``False`` at runtime). > This was already true before this PEP. > > > Backwards Compatibility > ======================= > > This is a backwards incompatible change. Applications depending on > arbitrary objects to be directly present in annotations will break > if they are not using ``typing.get_type_hints()`` or ``eval()``. > > Annotations that depend on locals at the time of the function > definition will not be resolvable later. Example:: > > def generate(): > A = Optional[int] > class C: > field: A = 1 > def method(self, arg: A) -> None: ... 
> return C
> X = generate()
>
> Trying to resolve annotations of ``X`` later by using
> ``get_type_hints(X)`` will fail because ``A`` and its enclosing scope no
> longer exist. Python will make no attempt to disallow such annotations
> since they can often still be successfully statically analyzed, which is
> the predominant use case for annotations.
>
> Annotations using nested classes and their respective state are still
> valid. They can use local names or the fully qualified name. Example::
>
>     class C:
>         field = 'c_field'
>         def method(self) -> C.field:  # this is OK
>             ...
>
>         def method(self) -> field:  # this is OK
>             ...
>
>         def method(self) -> C.D:  # this is OK
>             ...
>
>         def method(self) -> D:  # this is OK
>             ...
>
>         class D:
>             field2 = 'd_field'
>             def method(self) -> C.D.field2:  # this is OK
>                 ...
>
>             def method(self) -> D.field2:  # this is OK
>                 ...
>
>             def method(self) -> field2:  # this is OK
>                 ...
>
>             def method(self) -> field:  # this FAILS, class D doesn't
>                 ...                     # see C's attributes. This was
>                                         # already true before this PEP.
>
> In the presence of an annotation that isn't a syntactically valid
> expression, SyntaxError is raised at compile time. However, since names
> aren't resolved at that time, no attempt is made to validate whether
> used names are correct or not.
>
> Deprecation policy
> ------------------
>
> Starting with Python 3.7, a ``__future__`` import is required to use the
> described functionality. No warnings are raised.
>
> In Python 3.8 a ``PendingDeprecationWarning`` is raised by the
> compiler in the presence of type annotations in modules without the
> ``__future__`` import.
>
> Starting with Python 3.9 the warning becomes a ``DeprecationWarning``.
>
> In Python 4.0 this will become the default behavior. Use of annotations
> incompatible with this PEP is no longer supported.
>
>
> Forward References
> ==================
>
> Deliberately using a name before it was defined in the module is called
> a forward reference.
For the purpose of this section, we'll call
> any name imported or defined within an ``if TYPE_CHECKING:`` block
> a forward reference, too.
>
> This PEP addresses the issue of forward references in *type annotations*.
> The use of string literals will no longer be required in this case.
> However, there are APIs in the ``typing`` module that use other syntactic
> constructs of the language, and those will still require working around
> forward references with string literals. The list includes:
>
> * type definitions::
>
>     T = TypeVar('T', bound='<type>')
>     UserId = NewType('UserId', '<type>')
>     Employee = NamedTuple('Employee', [('name', '<type>'), ('id', '<type>')])
>
> * aliases::
>
>     Alias = Optional['<type>']
>     AnotherAlias = Union['<type>', '<type>']
>     YetAnotherAlias = '<type>'
>
> * casting::
>
>     cast('<type>', value)
>
> * base classes::
>
>     class C(Tuple['<type>', '<type>']): ...
>
> Depending on the specific case, some of the cases listed above might be
> worked around by placing the usage in an ``if TYPE_CHECKING:`` block.
> This will not work for any code that needs to be available at runtime,
> notably for base classes and casting. For named tuples, using the new
> class definition syntax introduced in Python 3.6 solves the issue.
>
> In general, fixing the issue for *all* forward references requires
> changing how module instantiation is performed in Python, from the
> current single-pass top-down model. This would be a major change in the
> language and is out of scope for this PEP.
>
>
> Rejected Ideas
> ==============
>
> Keeping the ability to use function local state when defining annotations
> -------------------------------------------------------------------------
>
> With postponed evaluation, this would require keeping a reference to
> the frame in which an annotation got created. This could be achieved
> for example by storing all annotations as lambdas instead of strings.
>
> This would be prohibitively expensive for highly annotated code as the
> frames would keep all their objects alive.
That includes predominantly
> objects that won't ever be accessed again.
>
> To be able to address class-level scope, the lambda approach would
> require a new kind of cell in the interpreter. This would proliferate
> the number of types that can appear in ``__annotations__``, and the
> result wouldn't be as introspectable as strings.
>
> Note that in the case of nested classes, the functionality to get the
> effective "globals" and "locals" at definition time is provided by
> ``typing.get_type_hints()``.
>
> If a function generates a class or a function with annotations that
> have to use local variables, it can populate the given generated
> object's ``__annotations__`` dictionary directly, without relying on
> the compiler.
>
> Disallowing local state usage for classes, too
> ----------------------------------------------
>
> This PEP originally proposed limiting names within annotations to only
> allow names from the module-level scope, including for classes. The
> author argued this makes name resolution unambiguous, including in cases
> of conflicts between local names and module-level names.
>
> This idea was ultimately rejected in case of classes. Instead,
> ``typing.get_type_hints()`` got modified to populate the local namespace
> correctly if class-level annotations are needed.
>
> The reasons for rejecting the idea were that it goes against the
> intuition of how scoping works in Python, and would break enough
> existing type annotations to make the transition cumbersome. Finally,
> local scope access is required for class decorators to be able to
> evaluate type annotations. This is because class decorators are applied
> before the class receives its name in the outer scope.
>
> Introducing a new dictionary for the string literal form instead
> ----------------------------------------------------------------
>
> Yury Selivanov shared the following idea:
>
> 1. Add a new special attribute to functions: ``__annotations_text__``.
>
> 2.
Make ``__annotations__`` a lazy dynamic mapping, evaluating
> expressions from the corresponding key in ``__annotations_text__``
> just-in-time.
>
> This idea is supposed to solve the backwards compatibility issue,
> removing the need for a new ``__future__`` import. Sadly, this is not
> enough. Postponed evaluation changes which state the annotation has
> access to. While postponed evaluation fixes the forward reference
> problem, it also makes it impossible to access function-level locals
> anymore. This alone is a source of backwards incompatibility which
> justifies a deprecation period.
>
> A ``__future__`` import is an obvious and explicit indicator of opting
> in for the new functionality. It also makes it trivial for external
> tools to recognize the difference between Python files using the old
> or the new approach. In the former case, that tool would recognize that
> local state access is allowed, whereas in the latter case it would
> recognize that forward references are allowed.
>
> Finally, just-in-time evaluation in ``__annotations__`` is an
> unnecessary step if ``get_type_hints()`` is used later.
>
> Dropping annotations with -O
> ----------------------------
>
> There are two reasons this is not satisfying for the purpose of this
> PEP.
>
> First, this only addresses runtime cost, not forward references; those
> still cannot be safely used in source code. A library maintainer would
> never be able to use forward references since that would force the
> library users to use this new hypothetical -O switch.
>
> Second, this throws the baby out with the bath water. Now *no* runtime
> annotation use can be performed. PEP 557 is one example of a recent
> development where evaluating type annotations at runtime is useful.
>
> All that being said, a granular -O option to drop annotations is
> a possibility in the future, as it's conceptually compatible with
> existing -O behavior (dropping docstrings and assert statements).
This > PEP does not invalidate the idea. > > Pass string literals in annotations verbatim to ``__annotations__`` > ------------------------------------------------------------------- > > This PEP originally suggested directly storing the contents of a string > literal under its respective key in ``__annotations__``. This was > meant to simplify support for runtime type checkers. > > Mark Shannon pointed out this idea was flawed since it wasn't handling > situations where strings are only part of a type annotation. > > The inconsistency of it was always apparent but given that it doesn't > fully prevent cases of double-wrapping strings anyway, it is not worth > it. > > Make the name of the future import more verbose > ----------------------------------------------- > > Instead of requiring the following import:: > > from __future__ import annotations > > the PEP could call the feature more explicitly, for example > ``string_annotations``, ``stringify_annotations``, > ``annotation_strings``, ``annotations_as_strings``, ``lazy_annotations``, > ``static_annotations``, etc. > > The problem with those names is that they are very verbose. Each of > them besides ``lazy_annotations`` would constitute the longest future > feature name in Python. They are long to type and harder to remember > than the single-word form. > > There is precedent for a future import name that sounds overly generic > but in practice was obvious to users as to what it does:: > > from __future__ import division > > > Prior discussion > ================ > > In PEP 484 > ---------- > > The forward reference problem was discussed when PEP 484 was originally > drafted, leading to the following statement in the document: > > A compromise is possible where a ``__future__`` import could enable > turning *all* annotations in a given module into string literals, as > follows:: > > from __future__ import annotations > > class ImSet: > def add(self, a: ImSet) -> List[ImSet]: ...
> > assert ImSet.add.__annotations__ == { > 'a': 'ImSet', 'return': 'List[ImSet]' > } > > Such a ``__future__`` import statement may be proposed in a separate > PEP. > > python/typing#400 > ----------------- > > The problem was discussed at length on the typing module's GitHub > project, under `Issue 400 `_. > The problem statement there includes critique of generic types requiring > imports from ``typing``. This tends to be confusing to > beginners: > > Why this:: > > from typing import List, Set > def dir(o: object = ...) -> List[str]: ... > def add_friends(friends: Set[Friend]) -> None: ... > > But not this:: > > def dir(o: object = ...) -> list[str]: ... > def add_friends(friends: set[Friend]) -> None ... > > Why this:: > > up_to_ten = list(range(10)) > friends = set() > > But not this:: > > from typing import List, Set > up_to_ten = List[int](range(10)) > friends = Set[Friend]() > > While typing usability is an interesting problem, it is out of scope > of this PEP. Specifically, any extensions of the typing syntax > standardized in PEP 484 will require their own respective PEPs and > approval. > > Issue 400 ultimately suggests postponing evaluation of annotations and > keeping them as strings in ``__annotations__``, just like this PEP > specifies. This idea was received well. Ivan Levkivskyi supported > using the ``__future__`` import and suggested unparsing the AST in > ``compile.c``. Jukka Lehtosalo pointed out that there are some cases > of forward references where types are used outside of annotations and > postponed evaluation will not help those. For those cases using the > string literal notation would still be required. Those cases are > discussed briefly in the "Forward References" section of this PEP. > > The biggest controversy on the issue was Guido van Rossum's concern > that untokenizing annotation expressions back to their string form has > no precedent in the Python programming language and feels like a hacky > workaround. 
He said: > > One thing that comes to mind is that it's a very random change to > the language. It might be useful to have a more compact way to > indicate deferred execution of expressions (using less syntax than > ``lambda:``). But why would the use case of type annotations be so > all-important to change the language to do it there first (rather > than proposing a more general solution), given that there's already > a solution for this particular use case that requires very minimal > syntax? > > Eventually, Ethan Smith and schollii voiced that feedback gathered > during PyCon US suggests that the state of forward references needs > fixing. Guido van Rossum suggested coming back to the ``__future__`` > idea, pointing out that to prevent abuse, it's important for the > annotations to be kept both syntactically valid and evaluating correctly > at runtime. > > First draft discussion on python-ideas > -------------------------------------- > > Discussion happened largely in two threads, `the original announcement > September/thread.html#47031>`_ > and a follow-up called `PEP 563 and expensive backwards compatibility > September/thread.html#47108>`_. > > The PEP received rather warm feedback (4 strongly in favor, > 2 in favor with concerns, 2 against). The biggest voice of concern on > the former thread being Steven D'Aprano's review stating that the > problem definition of the PEP doesn't justify breaking backwards > compatibility. In this response Steven seemed mostly concerned about > Python no longer supporting evaluation of annotations that depended on > local function/class state. > > A few people voiced concerns that there are libraries using annotations > for non-typing purposes. However, none of the named libraries would be > invalidated by this PEP. They do require adapting to the new > requirement to call ``eval()`` on the annotation with the correct > ``globals`` and ``locals`` set. 
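A minimal sketch of that ``eval()`` adaptation (the names here are illustrative, not taken from any of the named libraries):

```python
from typing import List

# Under postponed evaluation, the annotation arrives as a plain string:
ann = "List[int]"

# A consumer evaluates it against the namespace the annotation was
# written in -- here the module globals, where List has been imported:
resolved = eval(ann, globals(), None)
assert resolved == List[int]
```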
> > This detail about ``globals`` and ``locals`` having to be correct was > picked up by a number of commenters. Nick Coghlan benchmarked turning > annotations into lambdas instead of strings; sadly, this proved to be > much slower at runtime than the current situation. > > The latter thread was started by Jim J. Jewett, who stressed that > the ability to properly evaluate annotations is an important requirement > and backwards compatibility in that regard is valuable. After some > discussion he admitted that side effects in annotations are a code smell > and modal support to either perform or not perform evaluation is > a messy solution. His biggest concern remained loss of functionality > stemming from the evaluation restrictions on global and local scope. > > Nick Coghlan pointed out that some of those evaluation restrictions from > the PEP could be lifted by a clever implementation of an evaluation > helper, which could solve self-referencing classes even in the form of a > class decorator. He suggested the PEP should provide this helper > function in the standard library. > > Second draft discussion on python-dev > ------------------------------------- > > Discussion happened mainly in the `announcement thread < > https://mail.python.org/pipermail/python-dev/2017-November/150062.html>`_, > followed by a brief discussion under Mark Shannon's post. > > Steven D'Aprano was concerned whether it's acceptable for typos to be > allowed in annotations after the change proposed by the PEP. Brett > Cannon responded that type checkers and other static analyzers (like > linters or programming text editors) will catch this type of error. > Jukka Lehtosalo added that this situation is analogous to how names in > function bodies are not resolved until the function is called.
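Jukka's analogy can be demonstrated directly: a function body may freely reference a name that does not exist, and the typo only surfaces when the function is actually called:

```python
def f():
    return undefined_name  # not resolved at definition time

# Defining f raises nothing; only calling it triggers the NameError,
# just as a typo in a postponed annotation only surfaces when the
# annotation string is eventually evaluated:
try:
    f()
except NameError:
    print("resolved only at call time")
```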
> > A major topic of discussion was Nick Coghlan's suggestion to store > annotations in "thunk form", in other words as a specialized lambda > which would be able to access class-level scope (and allow for scope > customization at call time). He presented a possible design for it > (indirect attribute cells). > This was later seen as equivalent to "special forms" in Lisp. Guido van > Rossum expressed worry that this sort of feature cannot be safely > implemented in twelve weeks (i.e. in time before the Python 3.7 beta > freeze). > > After a while it became clear that the point of division between > supporters of the string form vs. supporters of the thunk form is > actually about whether annotations should be perceived as a general > syntactic element vs. something tied to the type checking use case. > > Finally, Guido van Rossum declared he's rejecting the thunk idea > based on the fact that it would require a new building block in the > interpreter. This block would be exposed in annotations, multiplying > possible types of values stored in ``__annotations__`` (arbitrary > objects, strings, and now thunks). Moreover, thunks aren't as > introspectable as strings. Most importantly, Guido van Rossum > explicitly stated interest in gradually restricting the use of > annotations to static typing (with an optional runtime component). > > Nick Coghlan came around to PEP 563, too, promptly beginning > the mandatory bike shedding session on the name of the ``__future__`` > import. Many debaters agreed that ``annotations`` seems like > an overly broad name for the feature. Guido van Rossum briefly > decided to call it ``string_annotations`` but then changed his mind, > arguing that ``division`` is a precedent of a broad name with a clear > meaning. > > The final improvement to the PEP suggested in the discussion by Mark > Shannon was the rejection of the temptation to pass string literals > through to ``__annotations__`` verbatim.
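A sketch of the double-wrapping situation Mark Shannon identified, and of how ``typing.get_type_hints()`` copes with it (the class name here is hypothetical):

```python
from typing import List, get_type_hints

class Node:
    # The inner quotes make the string only *part* of the annotation;
    # stringifying the whole annotation yields 'List["Node"]', i.e. a
    # string wrapped inside a string.
    children: 'List["Node"]'

# get_type_hints() evaluates the outer string, then resolves the
# nested forward reference as well:
assert get_type_hints(Node)["children"] == List[Node]
```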
> > A side-thread of discussion started around the runtime penalty of > static typing, with topic like the import time of the ``typing`` > module (which is comparable to ``re`` without dependencies, and > three times as heavy as ``re`` when counting dependencies). > > > Acknowledgements > ================ > > This document could not be completed without valuable input, > encouragement and advice from Guido van Rossum, Jukka Lehtosalo, and > Ivan Levkivskyi. > > > Copyright > ========= > > This document has been placed in the public domain. > > > > .. > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > guido%40python.org > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From raymond.hettinger at gmail.com Mon Dec 4 15:33:16 2017 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 4 Dec 2017 12:33:16 -0800 Subject: [Python-Dev] PEP 557 Data Classes 5th posting In-Reply-To: References: <20171203213332.6f315523@fsol> <5db84927-4135-1cb9-425e-7ea59e0da4f5@trueblade.com> <45c7f8be-7acd-0f66-e1b3-4a5df5def515@trueblade.com> <573264df-1f5e-eade-be1c-bfd2f07ddcbd@trueblade.com> Message-ID: <6BA83D33-719F-4F2A-9354-67512857DEF9@gmail.com> > On Dec 4, 2017, at 9:17 AM, Guido van Rossum wrote: > > And with this, I'm accepting PEP 557, Data Classes. Woohoo! I think everyone was looking forward to this moment. 
Raymond From levkivskyi at gmail.com Mon Dec 4 17:19:51 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Mon, 4 Dec 2017 23:19:51 +0100 Subject: [Python-Dev] PEP 557 Data Classes 5th posting In-Reply-To: References: <20171203213332.6f315523@fsol> <5db84927-4135-1cb9-425e-7ea59e0da4f5@trueblade.com> <45c7f8be-7acd-0f66-e1b3-4a5df5def515@trueblade.com> <573264df-1f5e-eade-be1c-bfd2f07ddcbd@trueblade.com> Message-ID: Congratulations, Eric! This is a great PEP and I am looking forward to implement support for it in mypy ;-) -- Ivan On 4 December 2017 at 18:17, Guido van Rossum wrote: > And with this, I'm accepting PEP 557, Data Classes. > > Eric, congrats with your efforts in proposing and implementing this PEP > and guiding it through the discussion! It's been great to see this idea > come to fruition. Thanks also to the many people who reviewed drafts or > implementation code, including the very generous authors and maintainers of > "attrs", from which this has taken many ideas. > > -- > --Guido van Rossum (python.org/~guido) > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Mon Dec 4 17:32:31 2017 From: eric at trueblade.com (Eric V. Smith) Date: Mon, 4 Dec 2017 17:32:31 -0500 Subject: [Python-Dev] PEP 557 Data Classes 5th posting In-Reply-To: References: <20171203213332.6f315523@fsol> <5db84927-4135-1cb9-425e-7ea59e0da4f5@trueblade.com> <45c7f8be-7acd-0f66-e1b3-4a5df5def515@trueblade.com> <573264df-1f5e-eade-be1c-bfd2f07ddcbd@trueblade.com> Message-ID: On 12/4/2017 5:19 PM, Ivan Levkivskyi wrote: > Congratulations, Eric! This is a great PEP and I am looking forward to > implement support for it in mypy ;-) Thanks for all of your help, Ivan, especially for design decisions that help interoperability with mypy. I'm looking forward to mypy support, too! Eric. 
From levkivskyi at gmail.com Mon Dec 4 17:14:22 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Mon, 4 Dec 2017 23:14:22 +0100 Subject: [Python-Dev] Accepting PEP 562 -- Module __getattr__ and __dir__ In-Reply-To: References: Message-ID: Thank you Guido! And thanks everyone for help, discussions, and ideas (in particular Larry who started this discussion). I will submit a PR with implementation soon. -- Ivan On 4 December 2017 at 17:58, Guido van Rossum wrote: > Ivan, > > Congrats on your PEP. I believe the outstanding issues are now resolved > and I am hereby accepting it. > > PS. Sorry, Larry, PEP 549 is rejected. But that happened a while ago. > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Mon Dec 4 17:18:05 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Mon, 4 Dec 2017 23:18:05 +0100 Subject: [Python-Dev] Accepting PEP 560 -- Core support for typing module and generic types In-Reply-To: References: Message-ID: Thank you! It looks like we have a bunch of accepted PEPs today. It is great to see all this! Thanks everyone who participated in discussions here, on python-ideas and on typing tracker. Special thanks to Mark who started this discussion. -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Mon Dec 4 18:21:09 2017 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 4 Dec 2017 23:21:09 +0000 Subject: [Python-Dev] Zero-width matching in regexes Message-ID: <16f7437d-3e11-6a19-3569-e4d55a370744@mrabarnett.plus.com> I've finally come to a conclusion as to what the "correct" behaviour of zero-width matches should be: """always return the first match, but never a zero-width match that is joined to a previous zero-width match""". 
If it's about to return a zero-width match that's joined to a previous zero-width match, then backtrack and keep on looking for a match. Example: >>> print([m.span() for m in re.finditer(r'|.', 'a')]) [(0, 0), (0, 1), (1, 1)] re.findall, re.split and re.sub should work accordingly. If re.finditer finds n matches, then re.split should return a list of n+1 strings and re.sub should make n replacements (excepting maxsplit, etc.). From levkivskyi at gmail.com Mon Dec 4 17:22:24 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Mon, 4 Dec 2017 23:22:24 +0100 Subject: [Python-Dev] PEP 563: Postponed Evaluation of Annotations (Draft 3) In-Reply-To: References: Message-ID: Congratulations, Łukasz! -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ja.py at farowl.co.uk Tue Dec 5 01:11:32 2017 From: ja.py at farowl.co.uk (Jeff Allen) Date: Tue, 5 Dec 2017 06:11:32 +0000 Subject: [Python-Dev] PEPs: ``.. code:: python`` or ``::`` (syntax highlighting) In-Reply-To: References: Message-ID: <33f5fdb0-5de7-92b3-7994-c9f35ec12ae8@farowl.co.uk>
There are other limitations. Browsing the devguide source [8] there gives a good idea what the GitHub can and cannot represent in this view. [6] https://devguide.python.org/documenting/#showing-code-examples [7] https://docs.readthedocs.io/en/latest/faq.html#i-want-to-use-the-blue-default-sphinx-theme [8] https://github.com/python/devguide Jeff Allen On 03/12/2017 04:49, Wes Turner wrote: > Add pygments for ``.. code::`` directive PEP syntax highlighting #1206 > https://github.com/python/pythondotorg/issues/1206 > > Syntax highlighting is an advantage for writers, editors, and readers. > > reStructuredText PEPs are rendered into HTML with docutils. Syntax > highlighting in Docutils 0.9+ is powered by Pygments. If Pygments is > not installed, or there is a syntax error, syntax highlighting is > absent. Docutils renders ``.. code::`` blocks with Python syntax > highlighting by default. You can specify ``.. code:: python`` or ``.. > code:: python3``. > > - GitHub shows Pygments syntax highlighting > for ``.. code::`` directives for .rst and .restructuredtext documents > - PEPs may eventually be hosted on ReadTheDocs with Sphinx (which > installs docutils and pygments as install_requires in setup.py). > https://github.com/python/peps/issues/2 > https://github.com/python/core-workflow/issues/5 > > In order to use pygments with pythondotorg-hosted PEPs, a few things > need to happen: > > - [ ] Include ``pygments`` in ``base-requirements.txt`` > - [ ] Pick a pygments theme > ? - Should we use the sphinx_rtd_theme default for consistency with > the eventual RTD-hosted PEPs? > - [ ] Include the necessary pygments CSS in the PEPs django template > - [ ] rebuild the PEPs > - Start using code directives in new PEPs > - Manually review existing PEPs after adding code directives > > PEPs may use ``.. code::`` blocks instead of ``::`` so that code is > syntax highlighted. 
> > On Saturday, December 2, 2017, Nick Coghlan > wrote: > > On 3 December 2017 at 12:32, Wes Turner > wrote: > > Pending a transition of PEPs to ReadTheDocs (with HTTPS on a > custom domain? > > and redirects?) (is there a gh issue for this task?), > > See https://github.com/python/peps/projects/1 > and > https://github.com/python/core-workflow/issues/5 > > > Cheers, > Nick. > > -- > Nick Coghlan? ?| ncoghlan at gmail.com ?|? ?Brisbane, > Australia > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ja.py%40farowl.co.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Tue Dec 5 09:25:51 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 5 Dec 2017 15:25:51 +0100 Subject: [Python-Dev] =?utf-8?q?=22CPython_loves_your_Pull_Requests=22_ta?= =?utf-8?q?lk_by_St=C3=A9phane_Wirtel?= Message-ID: Hi, St?phane Wirtel gave a talk last month at Pycon CA about CPython pull requests. His slides: https://speakerdeck.com/matrixise/cpython-loves-your-pull-requests He produced interesting statistics that we didn't have before on pull requests (PR), from February 2017 to October 2017: * total number of merged PR: 4204 * number of contributors: 586 !!! (96%) * number of core developers: 27 (4%) * Time to merge a PR: 3 days in average, good! * etc. It would be nice to get these statistics updated regularly on a service running somewhere. By the way, I'm also looking for statistics on reviews on GitHub. Does someone know how to do that? 
Victor From mariatta.wijaya at gmail.com Tue Dec 5 10:25:11 2017 From: mariatta.wijaya at gmail.com (Mariatta Wijaya) Date: Tue, 5 Dec 2017 07:25:11 -0800 Subject: [Python-Dev] =?utf-8?q?=22CPython_loves_your_Pull_Requests=22_ta?= =?utf-8?q?lk_by_St=C3=A9phane_Wirtel?= In-Reply-To: References: Message-ID: I saw the talk in person :) Congrats Stéphane! You can get the reviews from a specific PR using the API: https://developer.github.com/v3/pulls/reviews/#list-reviews-on-a-pull-request For example, for reviews made to CPython PR number 1: https://api.github.com/repos/python/cpython/pulls/1/reviews * Time to merge a PR: 3 days in average, good! Regarding the average time to merge PR, I'm interested to know the average time to merge for PRs not made by Python Core Devs. Mariatta Wijaya -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Dec 5 10:50:34 2017 From: guido at python.org (Guido van Rossum) Date: Tue, 5 Dec 2017 07:50:34 -0800 Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__ In-Reply-To: References: Message-ID:
URL: From victor.stinner at gmail.com Tue Dec 5 10:52:32 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 5 Dec 2017 16:52:32 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode Message-ID: Hi, Since it's the PEP Acceptance Week, I try my luck! Here is my very long PEP to propose a tiny change. The PEP is very long to explain the rationale and limitations. Inaccurate tl; dr with the UTF-8 mode, Unicode "just works" as expected. Reminder: INADA Naoki was nominated as the BDFL-Delegate. https://www.python.org/dev/peps/pep-0540/ Full-text below. Victor PEP: 540 Title: Add a new UTF-8 mode Version: $Revision$ Last-Modified: $Date$ Author: Victor Stinner , Nick Coghlan BDFL-Delegate: INADA Naoki Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 5-January-2016 Python-Version: 3.7 Abstract ======== Add a new UTF-8 mode, enabled by default in the POSIX locale, to ignore the locale and force the usage of the UTF-8 encoding for external operating system interfaces, including the standard IO streams. Essentially, the UTF-8 mode behaves as Python 2 and other C based applications on \*nix systems: it aims to process text as best it can, but it errs on the side of producing or propagating mojibake to subsequent components in a processing pipeline rather than requiring strictly valid encodings at every step in the process. The UTF-8 mode can be configured as strict to reduce the risk of producing or propagating mojibake. A new ``-X utf8`` command line option and ``PYTHONUTF8`` environment variable are added to explicitly control the UTF-8 mode (including turning it off entirely, even in the POSIX locale). Rationale ========= "It's not a bug, you must fix your locale" is not an acceptable answer ---------------------------------------------------------------------- Since Python 3.0 was released in 2008, the usual answer to users getting Unicode errors is to ask developers to fix their code to handle Unicode properly. 
Most applications and Python modules were fixed, but users kept reporting Unicode errors regularly: see the long list of issues in the `Links`_ section below. In fact, a second class of bugs comes from a locale which is not properly configured. The usual answer to such a bug report is: "it is not a bug, you must fix your locale". Technically, the answer is correct, but from a practical point of view, the answer is not acceptable. In many cases, "fixing the issue" is a hard task. Moreover, sometimes, the usage of the POSIX locale is deliberate. A good example of a concrete issue is build systems which create a fresh environment for each build using a chroot, a container, a virtual machine or something else to get reproducible builds. Such a setup usually uses the POSIX locale. To get 100% reproducible builds, the POSIX locale is a good choice: see the `Locales section of reproducible-builds.org `_. PEP 538 lists additional problems related to the use of Linux containers to run network services and command line applications. UNIX users don't expect Unicode errors, since the common command line tools like ``cat``, ``grep`` or ``sed`` never fail with Unicode errors - they produce mostly-readable text instead. These users similarly expect that tools written in Python 3 (including those updated from Python 2), continue to tolerate locale misconfigurations and avoid bothering them with text encoding details. From their point of view, the bug is not their locale but is obviously Python 3 ("Everything else works, including Python 2, so what's wrong with Python 3?"). Since Python 2 handles data as bytes, similar to system utilities written in C and C++, it's rarer in Python 2 compared to Python 3 to get explicit Unicode errors. It also contributes significantly to why many affected users perceive Python 3 as the root cause of their Unicode errors.
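The contrast can be sketched in a few lines (the sample bytes are chosen purely for illustration):

```python
# 'café' encoded as Latin-1: valid text, but neither ASCII nor UTF-8
data = b'caf\xe9'

# A strict ASCII locale turns this into an explicit error in Python 3,
# where a bytes-oriented tool would simply pass the bytes through:
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    print('explicit Unicode error:', exc.reason)

# With the right encoding, the text is recovered intact:
assert data.decode('latin-1') == 'café'
```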
At the same time, the stricter text handling model was deliberately introduced into Python 3 to reduce the frequency of data corruption bugs arising in production services due to mismatched assumptions regarding text encodings. It's one thing to emit mojibake to a user's terminal while listing a directory, but something else entirely to store that in a system manifest in a database, or to send it to a remote client attempting to retrieve files from the system. Since different group of users have different expectations, there is no silver bullet which solves all issues at once. Last but not least, backward compatibility should be preserved whenever possible. Locale and operating system data -------------------------------- .. _operating system data: Python uses an encoding called the "filesystem encoding" to decide how to encode and decode data from/to the operating system: * file content * command line arguments: ``sys.argv`` * standard streams: ``sys.stdin``, ``sys.stdout``, ``sys.stderr`` * environment variables: ``os.environ`` * filenames: ``os.listdir(str)`` for example * pipes: ``subprocess.Popen`` using ``subprocess.PIPE`` for example * error messages: ``os.strerror(code)`` for example * user and terminal names: ``os``, ``grp`` and ``pwd`` modules * host name, UNIX socket path: see the ``socket`` module * etc. At startup, Python calls ``setlocale(LC_CTYPE, "")`` to use the user ``LC_CTYPE`` locale and then store the locale encoding as the "filesystem error". It's possible to get this encoding using ``sys.getfilesystemencoding()``. In the whole lifetime of a Python process, the same encoding and error handler are used to encode and decode data from/to the operating system. The ``os.fsdecode()`` and ``os.fsencode()`` functions can be used to decode and encode operating system data. These functions use the filesystem error handler: ``sys.getfilesystemencodeerrors()``. .. 
note:: In some corner cases, the *current* ``LC_CTYPE`` locale must be used instead of ``sys.getfilesystemencoding()``. For example, the ``time`` module uses the *current* ``LC_CTYPE`` locale to decode timezone names. The POSIX locale and its encoding --------------------------------- The following environment variables are used to configure the locale, in this preference order: * ``LC_ALL``, most important variable * ``LC_CTYPE`` * ``LANG`` The POSIX locale, also known as "the C locale", is used: * if the first set variable is set to ``"C"`` * if all these variables are unset, for example when a program is started in an empty environment. The encoding of the POSIX locale must be ASCII or a superset of ASCII. On Linux, the POSIX locale uses the ASCII encoding. On FreeBSD and Solaris, ``nl_langinfo(CODESET)`` announces an alias of the ASCII encoding, whereas ``mbstowcs()`` and ``wcstombs()`` functions use the ISO 8859-1 encoding (Latin1) in practice. The problem is that ``os.fsencode()`` and ``os.fsdecode()`` use the ``locale.getpreferredencoding()`` codec. For example, if command line arguments are decoded by ``mbstowcs()`` and encoded back by ``os.fsencode()``, a ``UnicodeEncodeError`` exception is raised instead of retrieving the original byte string. To fix this issue, since Python 3.4 Python checks whether ``mbstowcs()`` really uses the ASCII encoding if the ``LC_CTYPE`` uses the POSIX locale and ``nl_langinfo(CODESET)`` returns ``"ASCII"`` (or an alias to ASCII). If not (the effective encoding is not ASCII), Python uses its own ASCII codec instead of using ``mbstowcs()`` and ``wcstombs()`` functions for `operating system data`_. See the `POSIX locale (2016 Edition) `_. POSIX locale used by mistake ---------------------------- In many cases, the POSIX locale is not really expected by users who get it by mistake.
Examples: * program started in an empty environment * User forcing LANG=C to get messages in English * LANG=C used for bad reasons, without being aware of the ASCII encoding * SSH shell * Linux installed with no configured locale * chroot environment, Docker image, container, ... with no locale configured * User locale set to a non-existing locale, typo in the locale name for example C.UTF-8 and C.utf8 locales -------------------------- Some UNIX operating systems provide a variant of the POSIX locale using the UTF-8 encoding: * Fedora 25: ``"C.utf8"`` or ``"C.UTF-8"`` * Debian (eglibc 2.13-1, 2011), Ubuntu: ``"C.UTF-8"`` * HP-UX: ``"C.utf8"`` It was proposed to add a ``C.UTF-8`` locale to the glibc: `glibc C.UTF-8 proposal `_. It is not planned to add such a locale to BSD systems. Popularity of the UTF-8 encoding -------------------------------- Python 3 uses UTF-8 by default for Python source files. On Mac OS X, Windows and Android, Python always uses UTF-8 for operating system data. For Windows, see the `PEP 529`_: "Change Windows filesystem encoding to UTF-8". On Linux, UTF-8 became the de facto standard encoding, replacing legacy encodings like ISO 8859-1 or ShiftJIS. For example, using different encodings for filenames and standard streams is likely to create mojibake, so UTF-8 is now used *everywhere* (at least for modern distributions using their default settings). The UTF-8 encoding is the default encoding of the XML and JSON file formats. In January 2017, UTF-8 was used in `more than 88% of web pages `_ (HTML, Javascript, CSS, etc.). See `utf8everywhere.org `_ for more general information on the UTF-8 codec. .. note:: Some applications and operating systems (especially Windows) use Byte Order Marks (BOM) to indicate the used Unicode encoding: UTF-7, UTF-8, UTF-16-LE, etc. BOMs are not well supported and rarely used in Python.
Old data stored in different encodings and surrogateescape
----------------------------------------------------------

Even though UTF-8 became the de facto standard, there are still systems in
the wild which don't use UTF-8, and a lot of data is stored in other
encodings. An example: an old USB key using the ext3 filesystem with
filenames encoded to ISO 8859-1.

The Linux kernel and the libc don't decode filenames: a filename is used as
a raw array of bytes. The common solution to support any filename is to
store filenames as bytes and not try to decode them. When displayed on
stdout, mojibake appears if the filename and the terminal don't use the
same encoding.

Python 3 promotes Unicode everywhere, including filenames. A solution to
support filenames not decodable from the locale encoding was found: the
``surrogateescape`` error handler (`PEP 383`_), which stores undecodable
bytes as surrogate characters. This error handler is used by default for
`operating system data`_, by ``os.fsdecode()`` and ``os.fsencode()`` for
example (except on Windows, which uses the ``strict`` error handler).

Standard streams
----------------

Python uses the locale encoding for the standard streams: stdin, stdout and
stderr. The ``strict`` error handler is used by stdin and stdout to prevent
mojibake.

The ``backslashreplace`` error handler is used by stderr to avoid Unicode
encode errors when displaying non-ASCII text. It is especially useful when
the POSIX locale is used, because this locale usually uses the ASCII
encoding.

The problem is that `operating system data`_ like filenames are decoded
using the ``surrogateescape`` error handler (`PEP 383`_). Displaying a
filename on stdout raises a Unicode encode error if the filename contains
an undecodable byte stored as a surrogate character.

Python 3.5+ now uses ``surrogateescape`` for stdin and stdout if the POSIX
locale is used: `issue #19977 `_.
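The ``surrogateescape`` behaviour described above can be demonstrated
directly with the codec machinery. This is a minimal sketch; the byte
``0xE9`` (``é`` in Latin1) stands in for any byte that is invalid UTF-8.

```python
# A Latin-1 encoded filename, not decodable from UTF-8:
raw = b"caf\xe9"

# surrogateescape maps the undecodable byte 0xE9 to the surrogate U+DCE9...
text = raw.decode("utf-8", "surrogateescape")
assert text == "caf\udce9"

# ...and encoding back with the same error handler restores the bytes.
assert text.encode("utf-8", "surrogateescape") == raw

# The strict error handler, by contrast, refuses the data outright.
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    pass
```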
The idea is to pass through `operating system data`_ even if it means
mojibake, because most UNIX applications work like that. Such UNIX
applications often store filenames as bytes, in many cases because their
basic design principles (or those of the language they're implemented in)
were laid down half a century ago, when it was still a feat for computers
to handle English text correctly, rather than humans having to work with
raw numeric indexes.

.. note::
   The encoding and/or the error handler of the standard streams can be
   overridden with the ``PYTHONIOENCODING`` environment variable.

Proposal
========

Changes
-------

Add a new UTF-8 mode, enabled by default in the POSIX locale but otherwise
disabled by default, to ignore the locale and force the usage of the UTF-8
encoding with the ``surrogateescape`` error handler, instead of the locale
encoding (with the ``strict`` or ``surrogateescape`` error handler,
depending on the case).

The "normal" UTF-8 mode uses ``surrogateescape`` on the standard input and
output streams and on opened files, as well as on all operating system
interfaces. This is the mode implicitly activated by the POSIX locale.

The "strict" UTF-8 mode reduces the risk of producing or propagating
mojibake: the UTF-8 encoding is used with the ``strict`` error handler for
inputs and outputs, but the ``surrogateescape`` error handler is still used
for `operating system data`_. This mode is never activated implicitly, but
can be requested explicitly.

The new ``-X utf8`` command line option and ``PYTHONUTF8`` environment
variable are added to control the UTF-8 mode. The UTF-8 mode is enabled by
``-X utf8`` or ``PYTHONUTF8=1``. The UTF-8 Strict mode is configured by
``-X utf8=strict`` or ``PYTHONUTF8=strict``.

The POSIX locale enables the UTF-8 mode. In this case, the UTF-8 mode can
be explicitly disabled by ``-X utf8=0`` or ``PYTHONUTF8=0``.

Other option values fail with an error.
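A quick way to check which mode a child interpreter ends up in. This sketch
assumes a Python version implementing this PEP (3.7 or later), where the
accepted implementation exposes the mode as ``sys.flags.utf8_mode``; that
attribute name is not part of the PEP text above.

```python
import subprocess
import sys

def utf8_mode(*args):
    """Run a child interpreter and report its UTF-8 mode flag as a string."""
    out = subprocess.run(
        [sys.executable, *args, "-c", "import sys; print(sys.flags.utf8_mode)"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

print(utf8_mode("-X", "utf8"))    # "1": mode enabled on the command line
print(utf8_mode("-X", "utf8=0"))  # "0": mode explicitly disabled
```

The command line option wins over the ``PYTHONUTF8`` environment variable,
matching the priority order given below.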
Options priority for the UTF-8 mode, highest priority first:

* ``PYTHONLEGACYWINDOWSFSENCODING``
* ``-X utf8``
* ``PYTHONUTF8``
* POSIX locale

For example, ``PYTHONUTF8=0 python3 -X utf8`` enables the UTF-8 mode,
whereas ``LC_ALL=C python3.7 -X utf8=0`` disables the UTF-8 mode and so
uses the encoding of the POSIX locale.

Encodings used by ``open()``, highest priority first:

* *encoding* and *errors* parameters (if set)
* UTF-8 mode
* ``os.device_encoding(fd)``
* ``locale.getpreferredencoding(False)``

Encoding and error handler
--------------------------

The UTF-8 mode changes the default encoding and error handler used by
``open()``, ``os.fsdecode()``, ``os.fsencode()``, ``sys.stdin``,
``sys.stdout`` and ``sys.stderr``:

============================  =======================  ==========================  ==========================
Function                      Default                  UTF-8 mode or POSIX locale  UTF-8 Strict mode
============================  =======================  ==========================  ==========================
open()                        locale/strict            **UTF-8/surrogateescape**   **UTF-8**/strict
os.fsdecode(), os.fsencode()  locale/surrogateescape   **UTF-8**/surrogateescape   **UTF-8**/surrogateescape
sys.stdin, sys.stdout         locale/strict            **UTF-8/surrogateescape**   **UTF-8**/strict
sys.stderr                    locale/backslashreplace  **UTF-8**/backslashreplace  **UTF-8**/backslashreplace
============================  =======================  ==========================  ==========================

By comparison, Python 3.6 uses:

============================  =======================  ==========================
Function                      Default                  POSIX locale
============================  =======================  ==========================
open()                        locale/strict            locale/strict
os.fsdecode(), os.fsencode()  locale/surrogateescape   locale/surrogateescape
sys.stdin, sys.stdout         locale/strict            locale/**surrogateescape**
sys.stderr                    locale/backslashreplace  locale/backslashreplace
============================  =======================  ==========================

The UTF-8 mode uses the ``surrogateescape``
error handler instead of the ``strict`` error handler for consistency with
other standard \*nix operating system components: the idea is that data not
encoded to UTF-8 are passed through Python without being modified, as raw
bytes.

The ``PYTHONIOENCODING`` environment variable has priority over the UTF-8
mode for the standard streams. For example, ``PYTHONIOENCODING=latin1
python3 -X utf8`` uses the Latin1 encoding for stdin, stdout and stderr.

Encoding and error handler on Windows
-------------------------------------

On Windows, the encodings and error handlers are different:

============================  =======================  ==========================  =========================  ==========================
Function                      Default                  Legacy Windows FS encoding  UTF-8 mode                 UTF-8 Strict mode
============================  =======================  ==========================  =========================  ==========================
open()                        mbcs/strict              mbcs/strict                 **UTF-8/surrogateescape**  **UTF-8**/strict
os.fsdecode(), os.fsencode()  UTF-8/surrogatepass      **mbcs/replace**            UTF-8/surrogatepass        UTF-8/surrogatepass
sys.stdin, sys.stdout         UTF-8/surrogateescape    UTF-8/surrogateescape       UTF-8/surrogateescape      **UTF-8/strict**
sys.stderr                    UTF-8/backslashreplace   UTF-8/backslashreplace      UTF-8/backslashreplace     UTF-8/backslashreplace
============================  =======================  ==========================  =========================  ==========================

By comparison, Python 3.6 uses:

============================  =======================  ==========================
Function                      Default                  Legacy Windows FS encoding
============================  =======================  ==========================
open()                        mbcs/strict              mbcs/strict
os.fsdecode(), os.fsencode()  UTF-8/surrogatepass      **mbcs/replace**
sys.stdin, sys.stdout         UTF-8/surrogateescape    UTF-8/surrogateescape
sys.stderr                    UTF-8/backslashreplace   UTF-8/backslashreplace
============================  =======================  ==========================

The "Legacy Windows FS
encoding" is enabled by setting the ``PYTHONLEGACYWINDOWSFSENCODING``
environment variable to ``1``, as specified in `PEP 529`_. Enabling the
legacy Windows filesystem encoding disables the UTF-8 mode (as
``-X utf8=0``).

If stdin and/or stdout is redirected to a pipe, ``sys.stdin`` and/or
``sys.stdout`` use the ``mbcs`` encoding by default rather than UTF-8. But
in the UTF-8 mode, ``sys.stdin`` and ``sys.stdout`` always use the UTF-8
encoding.

There is no POSIX locale on Windows. The ANSI code page is used as the
locale encoding, and this code page never uses the ASCII encoding.

Rationale
---------

The UTF-8 mode is disabled by default to keep hard Unicode errors when
encoding or decoding `operating system data`_ fails, and to keep backward
compatibility. The user is responsible for explicitly enabling the UTF-8
mode, and so is better prepared for mojibake than if the UTF-8 mode were
enabled *by default*.

The UTF-8 mode should be used on systems known to be configured with UTF-8
where most applications speak UTF-8. It prevents Unicode errors if the user
overrides a locale *by mistake* or if a Python program is started with no
locale configured (and so with the POSIX locale).

Most UNIX applications handle `operating system data`_ as bytes, so the
``LC_ALL``, ``LC_CTYPE`` and ``LANG`` environment variables have a limited
impact on how these data are handled by the application.

The Python UTF-8 mode should help to make Python more interoperable with
the other UNIX applications on the system, assuming that *UTF-8* is used
everywhere and that users *expect* UTF-8.

Ignoring the ``LC_ALL``, ``LC_CTYPE`` and ``LANG`` environment variables in
Python is more convenient, since they are more commonly misconfigured *by
mistake* (configured to use an encoding different from UTF-8, whereas the
system uses UTF-8), rather than being misconfigured by intent.
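The "hard Unicode errors" mentioned in the Rationale above come from the
default ``strict`` error handler, which rejects the lone surrogates that
``surrogateescape`` produces. A minimal sketch:

```python
# A filename containing a byte not decodable from the locale encoding
# ends up with a surrogate character after surrogateescape:
name = b"caf\xe9".decode("utf-8", "surrogateescape")  # 'caf\udce9'

try:
    name.encode("utf-8")  # strict: the lone surrogate U+DCE9 is rejected
    error = None
except UnicodeEncodeError as exc:
    error = exc.reason

print(error)  # reason: "surrogates not allowed"
```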
Expected mojibake and surrogate character issues
------------------------------------------------

The UTF-8 mode only affects code running directly in Python, especially
code written in pure Python. The other code, called "external code" here,
is not aware of this mode. Examples:

* C libraries called by Python modules, like OpenSSL
* the application code when Python is embedded in an application

In the UTF-8 mode, Python uses the ``surrogateescape`` error handler, which
stores bytes not decodable from UTF-8 as surrogate characters.

If the external code uses the locale and the locale encoding is UTF-8, it
should work fine.

External code using bytes
^^^^^^^^^^^^^^^^^^^^^^^^^

If the external code processes data as bytes, surrogate characters are not
an issue, since they are only used inside Python. Python encodes surrogate
characters back to bytes at the edges, before calling external code.

The UTF-8 mode can produce mojibake, since Python and the external code
don't handle invalid bytes the same way, but that is a deliberate choice.
The UTF-8 mode can be configured as strict to prevent mojibake and fail
early when data is not decodable from UTF-8 or not encodable to UTF-8.

External code using text
^^^^^^^^^^^^^^^^^^^^^^^^

If the external code uses a text API, for example using the ``wchar_t*`` C
type, mojibake should not occur, but the external code can fail on
surrogate characters.

Use Cases
=========

The following use cases were written to help understand the impact of
chosen encodings and error handlers on concrete examples.

The "Exception?" column shows the potential benefit of having a UTF-8 mode
which is closer to the traditional Python 2 behaviour of passing along raw
binary data even if it isn't valid UTF-8.

The "Mojibake" column shows that ignoring the locale causes a practical
issue: the UTF-8 mode produces mojibake if the terminal doesn't use the
UTF-8 encoding.
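What mojibake looks like in practice: a producer writes UTF-8, while the
consumer (for example a terminal) interprets the bytes as Latin1. This is
a minimal sketch of such an encoding mismatch, not tied to any particular
tool.

```python
# The producer writes UTF-8:
data = "h\u00e9llo".encode("utf-8")  # b'h\xc3\xa9llo'

# ...but the consumer decodes the bytes as Latin1:
garbled = data.decode("latin-1")
assert garbled == "h\xc3\xa9llo"  # two junk characters instead of 'é'
```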
The ideal configuration is "No exception, no risk of mojibake", but that
isn't always possible in the presence of non-UTF-8 encoded binary data.

List a directory into stdout
----------------------------

Script listing the content of the current directory into stdout::

    import os
    for name in os.listdir(os.curdir):
        print(name)

Result:

========================  ==========  =========
Python                    Exception?  Mojibake?
========================  ==========  =========
Python 2                  No          **Yes**
Python 3                  **Yes**     No
Python 3.5, POSIX locale  No          **Yes**
UTF-8 mode                No          **Yes**
UTF-8 Strict mode         **Yes**     No
========================  ==========  =========

"Exception?" means that the script can fail on decoding or encoding a
filename, depending on the locale or the filename. To be able to never fail
that way, the program must be able to produce mojibake. For automated and
interactive processes, mojibake is often more user friendly than an error
with a truncated or empty output, since it confines the problem to the
affected entry, rather than aborting the whole task.

Example with a directory which contains a file called ``b'xxx\xff'`` (the
byte ``0xFF`` is invalid in UTF-8).

The default mode and the UTF-8 Strict mode fail on ``print()`` with an
encode error::

    $ python3.7 ../ls.py
    Traceback (most recent call last):
      File "../ls.py", line 5, in <module>
        print(name)
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' ...

    $ python3.7 -X utf8=strict ../ls.py
    Traceback (most recent call last):
      File "../ls.py", line 5, in <module>
        print(name)
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' ...

The UTF-8 mode, the POSIX locale, Python 2 and the UNIX ``ls`` command work
but display mojibake::

    $ python3.7 -X utf8 ../ls.py
    xxx?

    $ LC_ALL=C /python3.6 ../ls.py
    xxx?

    $ python2 ../ls.py
    xxx?
    $ ls
    'xxx'$'\377'

List a directory into a text file
---------------------------------

Similar to the previous example, except that the listing is written into a
text file::

    import os
    names = os.listdir(os.curdir)
    with open("/tmp/content.txt", "w") as fp:
        for name in names:
            fp.write("%s\n" % name)

Result:

========================  ==========  =========
Python                    Exception?  Mojibake?
========================  ==========  =========
Python 2                  No          **Yes**
Python 3                  **Yes**     No
Python 3.5, POSIX locale  **Yes**     No
UTF-8 mode                No          **Yes**
UTF-8 Strict mode         **Yes**     No
========================  ==========  =========

Again, never throwing an exception requires that mojibake can be produced,
while preventing mojibake means that the script can fail on decoding or
encoding a filename, depending on the locale or the filename.

Typical error::

    $ LC_ALL=C python3 test.py
    Traceback (most recent call last):
      File "test.py", line 5, in <module>
        fp.write("%s\n" % name)
    UnicodeEncodeError: 'ascii' codec can't encode characters in
    position 0-1: ordinal not in range(128)

Compared with native system tools::

    $ ls > /tmp/content.txt
    $ cat /tmp/content.txt
    xxx?

Display Unicode characters into stdout
--------------------------------------

A very basic example used to illustrate a common issue: display the euro
sign (U+20AC: €)::

    print("euro: \u20ac")

Result:

========================  ==========  =========
Python                    Exception?  Mojibake?
========================  ==========  =========
Python 2                  **Yes**     No
Python 3                  **Yes**     No
Python 3.5, POSIX locale  **Yes**     No
UTF-8 mode                No          **Yes**
UTF-8 Strict mode         No          **Yes**
========================  ==========  =========

The UTF-8 and UTF-8 Strict modes will always encode the euro sign as UTF-8.
If the terminal uses a different encoding, we get mojibake.
For example, using ``iconv`` to emulate a GB-18030 terminal inside a UTF-8
one::

    $ python3 -c 'print("euro: \u20ac")' | iconv -f gb18030 -t utf8
    euro: ?iconv: illegal input sequence at position 8

The misencoding also corrupts the trailing newline, such that the output
stream isn't actually a valid GB-18030 sequence; hence the error message
after the euro symbol is misinterpreted as a hanzi character.

Replace a word in a text
------------------------

The following script replaces the word "apple" with "orange". It reads
input from stdin and writes the output into stdout::

    import sys
    text = sys.stdin.read()
    sys.stdout.write(text.replace("apple", "orange"))

Result:

========================  ==========  =========
Python                    Exception?  Mojibake?
========================  ==========  =========
Python 2                  No          **Yes**
Python 3                  **Yes**     No
Python 3.5, POSIX locale  No          **Yes**
UTF-8 mode                No          **Yes**
UTF-8 Strict mode         **Yes**     No
========================  ==========  =========

This is a case where passing along the raw bytes (by way of the
``surrogateescape`` error handler) will bring Python 3's behaviour back
into line with standard operating system tools like ``sed`` and ``awk``.

Producer-consumer model using pipes
-----------------------------------

Let's say that we have a "producer" program which writes data to its stdout
and a "consumer" program which reads data from its stdin.

On a shell, such programs are run with the command::

    producer | consumer

The question is whether these programs will work with any data and any
locale. UNIX users don't expect Unicode errors, and so expect that such
programs "just work", in the sense that Unicode errors may cause problems
in the data stream, but won't cause the entire stream processing *itself*
to abort.

If the producer only produces ASCII output, no error should occur. Let's
say that the producer writes at least one non-ASCII character (at least one
byte in the range ``0x80..0xff``).
To simplify the problem, let's say that the consumer has no output (doesn't
write results into a file or to stdout).

A "Bytes producer" is an application which cannot fail with a Unicode error
and which produces bytes on stdout.

Let's say that a "Bytes consumer" does not decode stdin but stores data as
bytes: such a consumer always works. Common UNIX command line tools like
``cat``, ``grep`` or ``sed`` are in this category. Many Python 2
applications are also in this category, as are applications that work with
the lower level binary input and output streams in Python 3 rather than the
default text mode streams.

"Python producer" and "Python consumer" are a producer and a consumer
implemented in Python using the default text mode input and output streams.

Bytes producer, Bytes consumer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This won't throw exceptions, but it is out of the scope of this PEP since
it doesn't involve Python's default text mode input and output streams.

Python producer, Bytes consumer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Python producer::

    print("euro: \u20ac")

Result:

========================  ==========  =========
Python                    Exception?  Mojibake?
========================  ==========  =========
Python 2                  **Yes**     No
Python 3                  **Yes**     No
Python 3.5, POSIX locale  **Yes**     No
UTF-8 mode                No          **Yes**
UTF-8 Strict mode         No          **Yes**
========================  ==========  =========

The question here is not whether the consumer is able to decode the input,
but whether Python is able to produce its output. So it's similar to the
`Display Unicode characters into stdout`_ case.

The UTF-8 modes work with any locale, since the consumer doesn't try to
decode its stdin.

Bytes producer, Python consumer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Python consumer::

    import sys
    text = sys.stdin.read()
    result = text.replace("apple", "orange")
    # ignore the result

Result:

========================  ==========  =========
Python                    Exception?  Mojibake?
========================  ==========  =========
Python 2                  No          **Yes**
Python 3                  **Yes**     No
Python 3.5, POSIX locale  No          **Yes**
UTF-8 mode                No          **Yes**
UTF-8 Strict mode         **Yes**     No
========================  ==========  =========

Python 3 may throw an exception on decoding stdin, depending on the input
and the locale.

Python producer, Python consumer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Python producer::

    print("euro: \u20ac")

Python consumer::

    import sys
    text = sys.stdin.read()
    result = text.replace("apple", "orange")
    # ignore the result

Result, with the same Python version used for the producer and the
consumer:

========================  ==========  =========
Python                    Exception?  Mojibake?
========================  ==========  =========
Python 2                  **Yes**     No
Python 3                  **Yes**     No
Python 3.5, POSIX locale  **Yes**     No
UTF-8 mode                No          No(!)
UTF-8 Strict mode         No          No(!)
========================  ==========  =========

This case combines a Python producer with a Python consumer, and the result
is mainly the same as that for `Python producer, Bytes consumer`_, since
the consumer can't read what the producer can't emit.

However, the behaviour of the "UTF-8" and "UTF-8 Strict" modes in this
configuration is notable: they don't produce an exception, *and* they
shouldn't produce mojibake, as both the producer and the consumer are
making *consistent* assumptions regarding the text encoding used on the
pipe between them (i.e. UTF-8). Any mojibake generated would only be in the
interfaces between the consuming component and the outside world (e.g. the
terminal, or when writing to a file).

Backward Compatibility
======================

The main backward incompatible change is that the UTF-8 encoding is now
used by default if the locale is POSIX. Since the UTF-8 encoding is used
with the ``surrogateescape`` error handler, encoding errors should not
occur, and so the change should not break applications.
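That claim can be checked exhaustively for single bytes: with
``surrogateescape``, every possible byte value decodes without error and
round-trips unchanged. A short sketch:

```python
# All 256 byte values, including every value that is invalid in UTF-8:
data = bytes(range(256))

text = data.decode("utf-8", "surrogateescape")  # never raises
assert text.encode("utf-8", "surrogateescape") == data

# Bytes >= 0x80 that are invalid UTF-8 here become lone surrogates
# in the range U+DC80..U+DCFF:
assert text[0x80] == "\udc80"
```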
The UTF-8 encoding is also quite restrictive regarding where it allows
plain ASCII code points to appear in the byte stream, so even for
ASCII-incompatible encodings, such byte values will often be escaped rather
than being processed as ASCII characters.

The more likely source of trouble comes from external libraries. Python can
successfully decode data from UTF-8, but a library using the locale
encoding can fail to encode the decoded text back to bytes. For example,
GNU readline currently has problems on Android due to the mismatch between
CPython's encoding assumptions there (always UTF-8) and GNU readline's
encoding assumptions (which are based on the nominal locale).

The PEP only changes the default behaviour if the locale is POSIX. For
other locales, the *default* behaviour is unchanged.

PEP 538 is a follow-up to this PEP that extends CPython's assumptions to
other locale-aware components in the same process by explicitly coercing
the POSIX locale to something more suitable for modern text processing. See
that PEP for further details.

Alternatives
============

Don't modify the encoding of the POSIX locale
---------------------------------------------

A first version of the PEP did not change the encoding and error handler
used for the POSIX locale.

The problem is that adding the ``-X utf8`` command line option or setting
the ``PYTHONUTF8`` environment variable is not possible in some cases, or
at least not convenient.

Moreover, many users simply expect that Python 3 behaves like Python 2:
that it doesn't bother them with encodings and "just works" in all cases.
These users don't worry about mojibake, or even expect mojibake because of
complex documents using multiple incompatible encodings.

Always use UTF-8
----------------

Python already always uses the UTF-8 encoding on Mac OS X, Android and
Windows. Since UTF-8 became the de facto encoding, it makes sense to always
use it on all platforms with any locale.
The problem with this approach is that Python is also used extensively in
desktop environments, and it is often a practical or even legal requirement
to support locale encodings other than UTF-8 (for example, GB-18030 in
China, and Shift JIS or ISO-2022-JP in Japan).

Force UTF-8 for the POSIX locale
--------------------------------

An alternative to always using UTF-8 in any case is to only use UTF-8 when
the ``LC_CTYPE`` locale is the POSIX locale.

`PEP 538`_, "Coercing the legacy C locale to C.UTF-8", by Nick Coghlan,
proposes to implement that using the ``C.UTF-8`` locale.

Use the strict error handler for operating system data
------------------------------------------------------

Using the ``surrogateescape`` error handler for `operating system data`_
creates surprising surrogate characters. No Python codec (except
``utf-7``) accepts surrogates, and so encoding text coming from the
operating system is likely to raise an error. The problem is that the error
comes late, very far from where the data was read.

The ``strict`` error handler can be used instead to decode
(``os.fsdecode()``) and encode (``os.fsencode()``) operating system data,
to raise encoding errors as soon as possible. It helps to find bugs more
quickly.

The main drawback of this strategy is that it doesn't work in practice.
Python 3 is designed on top of Unicode strings. Most functions expect
Unicode and produce Unicode. Even if many operating system functions have
two flavors, bytes and Unicode, the Unicode flavor is used in most cases.
There are good reasons for that: Unicode is more convenient in Python 3,
and using Unicode helps to support the full Unicode Character Set (UCS) on
Windows (even if Python now uses UTF-8 since Python 3.6, see `PEP 528`_ and
`PEP 529`_).

For example, if ``os.fsdecode()`` uses ``utf8/strict``, ``os.listdir(str)``
fails to list the filenames of a directory if a single filename is not
decodable from UTF-8.
As a consequence, ``shutil.rmtree(str)`` fails to remove a directory. Undecodable filenames, environment variables, etc. are simply too common to make this alternative viable. Links ===== PEPs: * `PEP 538 `_: "Coercing the legacy C locale to C.UTF-8" * `PEP 529 `_: "Change Windows filesystem encoding to UTF-8" * `PEP 528 `_: "Change Windows console encoding to UTF-8" * `PEP 383 `_: "Non-decodable Bytes in System Character Interfaces" Main Python issues: * `Issue #29240: Implementation of the PEP 540: Add a new UTF-8 mode `_ * `Issue #28180: sys.getfilesystemencoding() should default to utf-8 `_ * `Issue #19977: Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale `_ * `Issue #19847: Setting the default filesystem-encoding `_ * `Issue #8622: Add PYTHONFSENCODING environment variable `_: added but reverted because of many issues, read the `Inconsistencies if locale and filesystem encodings are different `_ thread on the python-dev mailing list Incomplete list of Python issues related to Unicode errors, especially with the POSIX locale: * 2016-12-22: `LANG=C python3 -c "import os; os.path.exists('\xff')" `_ * 2014-07-20: `issue #22016: Add a new 'surrogatereplace' output only error handler `_ * 2014-04-27: `Issue #21368: Check for systemd locale on startup if current locale is set to POSIX `_ -- read manually /etc/locale.conf when the locale is POSIX * 2014-01-21: `Issue #20329: zipfile.extractall fails in Posix shell with utf-8 filename `_ * 2013-11-30: `Issue #19846: Python 3 raises Unicode errors with the C locale `_ * 2010-05-04: `Issue #8610: Python3/POSIX: errors if file system encoding is None `_ * 2013-08-12: `Issue #18713: Clearly document the use of PYTHONIOENCODING to set surrogateescape `_ * 2013-09-27: `Issue #19100: Use backslashreplace in pprint `_ * 2012-01-05: `Issue #13717: os.walk() + print fails with UnicodeEncodeError `_ * 2011-12-20: `Issue #13643: 'ascii' is a bad filesystem default encoding `_ * 2011-03-16: 
`issue #11574: TextIOWrapper should use UTF-8 by default for the POSIX
locale `_, thread on python-dev: `Low-Level Encoding Behavior on Python 3
`_
* 2010-04-26: `Issue #8533: regrtest: use backslashreplace error handler
  for stdout `_, regrtest fails with a Unicode encode error if the locale
  is POSIX

Some issues are real bugs in applications which must explicitly set the
encoding. Well, it just works in the common case (locale configured
correctly), so what? The program "suddenly" fails when the POSIX locale is
used (probably for bad reasons). Such bugs are not well understood by
users. Examples of such issues:

* 2013-11-21: `pip: open() uses the locale encoding to parse Python script,
  instead of the encoding cookie `_ -- pip must use the encoding cookie to
  read a Python source code file
* 2011-01-21: `IDLE 3.x can crash decoding recent file list `_

Prior Art
=========

Perl has a ``-C`` command line option and a ``PERL_UNICODE`` environment
variable to force UTF-8: see `perlrun `_. It is possible to configure UTF-8
per standard stream, on input and output streams, etc.

Post History
============

* 2017-04: `[Python-Dev] Proposed BDFL Delegate update for PEPs 538 & 540
  (assuming UTF-8 for *nix system boundaries) `_
* 2017-01: `[Python-ideas] PEP 540: Add a new UTF-8 mode `_
* 2017-01: `bpo-28180: Implementation of the PEP 538: coerce C locale to
  C.utf-8 (msg284764) `_
* 2016-08-17: `bpo-27781: Change sys.getfilesystemencoding() on Windows to
  UTF-8 (msg272916) `_ -- Victor proposed ``-X utf8`` for the :pep:`529`
  (Change Windows filesystem encoding to UTF-8)

Copyright
=========

This document has been placed in the public domain.
From senthil at uthcode.com Tue Dec 5 11:25:03 2017
From: senthil at uthcode.com (Senthil Kumaran)
Date: Tue, 5 Dec 2017 08:25:03 -0800
Subject: [Python-Dev] =?utf-8?q?=22CPython_loves_your_Pull_Requests=22_ta?= =?utf-8?q?lk_by_St=C3=A9phane_Wirtel?=
In-Reply-To: 
References: 
Message-ID: 

On Tue, Dec 5, 2017 at 6:25 AM, Victor Stinner wrote:

> Stéphane Wirtel gave a talk last month at Pycon CA about CPython pull
> requests. His slides:
>
> https://speakerdeck.com/matrixise/cpython-loves-your-pull-requests
>
> He produced interesting statistics that we didn't have before on pull
> requests (PR), from February 2017 to October 2017:
>
> * total number of merged PR: 4204
> * number of contributors: 586 !!! (96%)
> * number of core developers: 27 (4%)
> * Time to merge a PR: 3 days on average, good!
> * etc.
>

Plus, the slides were entertaining.
Congrats and thanks for those stats, Stéphane.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brett at python.org Tue Dec 5 12:43:05 2017
From: brett at python.org (Brett Cannon)
Date: Tue, 05 Dec 2017 17:43:05 +0000
Subject: [Python-Dev] =?utf-8?q?=22CPython_loves_your_Pull_Requests=22_ta?= =?utf-8?q?lk_by_St=C3=A9phane_Wirtel?=
In-Reply-To: 
References: 
Message-ID: 

On Tue, 5 Dec 2017 at 08:31 Senthil Kumaran wrote:

> On Tue, Dec 5, 2017 at 6:25 AM, Victor Stinner wrote:
>
>> Stéphane Wirtel gave a talk last month at Pycon CA about CPython pull
>> requests. His slides:
>>
>> https://speakerdeck.com/matrixise/cpython-loves-your-pull-requests
>>
>> He produced interesting statistics that we didn't have before on pull
>> requests (PR), from February 2017 to October 2017:
>>
>> * total number of merged PR: 4204
>> * number of contributors: 586 !!! (96%)
>> * number of core developers: 27 (4%)
>> * Time to merge a PR: 3 days on average, good!
>> * etc.
>>
>
> Plus, the slides were entertaining.
> Congrats and thanks for those stats, Stéphane.
> Like Mariatta I was also in the audience and Stephane was entertaining himself as he really got into it (although he also gave me a bottle of Belgian white wine in the talk so I may be biased :) . -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Tue Dec 5 12:59:49 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 5 Dec 2017 18:59:49 +0100 Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__ In-Reply-To: References: Message-ID: 2017-12-05 16:50 GMT+01:00 Guido van Rossum : > Honestly, I didn't completely follow what Victor thinks of the PEP -- his > post seemed mostly about promoting his own -X dev flag. -X dev is similar (but different) than -W default: show warnings which are hidden by default otherwise. -W default works on Python 2.7 and 3.6. > I have nothing > against that flag but I don't see how its existence is relevant to the PEP, > which is about giving users who don't even know they are Python developers a > hint when they are using deprecated features (for which there always must be > a shiny new replacement!). I disagree that *users* of an application is supposed to "handle" deprecation warnings: report them to the developer, or even try to fix them. IHMO these warnings (hidden by default) were introduced for developers of the application. My point is that I prefer to keep the status quo: continue to hide deprecation warnings, but promote existing solutions like -W default to display these warnings, teach to developers how to see and fix these warnings. Even for developers, I'm not sure that only showing warnings in __main__ is useful, since more and more application use a __main__ module which is a thin entry point : import + function call (ex: "from app import main; main()"). > Therefore I am planning to accept it by the end of this week unless more objections are voiced. It's ok if we disagree. 
I just wanted to share my opinion on this issue ;-) Victor From guido at python.org Tue Dec 5 13:24:43 2017 From: guido at python.org (Guido van Rossum) Date: Tue, 5 Dec 2017 10:24:43 -0800 Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__ In-Reply-To: References: Message-ID: On Tue, Dec 5, 2017 at 9:59 AM, Victor Stinner wrote: > 2017-12-05 16:50 GMT+01:00 Guido van Rossum : > > Honestly, I didn't completely follow what Victor thinks of the PEP -- his > > post seemed mostly about promoting his own -X dev flag. > > -X dev is similar (but different) than -W default: show warnings which > are hidden by default otherwise. -W default works on Python 2.7 and > 3.6. > > > I have nothing > > against that flag but I don't see how its existence is relevant to the > PEP, > > which is about giving users who don't even know they are Python > developers a > > hint when they are using deprecated features (for which there always > must be > > a shiny new replacement!). > > I disagree that *users* of an application is supposed to "handle" > deprecation warnings: report them to the developer, or even try to fix > them. IHMO these warnings (hidden by default) were introduced for > developers of the application. > But the whole point of the PEP is that it only warns about deprecations in code over which the user has control -- likely __main__ is their own code, and they *can* handle it. > My point is that I prefer to keep the status quo: continue to hide > deprecation warnings, but promote existing solutions like -W default > to display these warnings, teach to developers how to see and fix > these warnings. > If they import a 3rd party module which does something deprecated, the user won't see any deprecation warnings. > Even for developers, I'm not sure that only showing warnings in > __main__ is useful, since more and more application use a __main__ > module which is a thin entry point : import + function call (ex: "from > app import main; main()"). 
And that's intentional -- such developers are supposed to actively test
their code, and deprecations will be shown to them by the unittest
framework. Having a minimal __main__ implies that their users won't see
any deprecation warnings.

> > Therefore I am planning to accept it by the end of this week unless
> > more objections are voiced.
>
> It's ok if we disagree. I just wanted to share my opinion on this
> issue ;-)

I just worry that your opinion might be based on a misunderstanding of
the proposal. If we agree on what the PEP actually proposes to change,
and how that will affect different categories of users and developers,
I am okay with disagreement.

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stephane at wirtel.be Tue Dec 5 13:46:39 2017
From: stephane at wirtel.be (Stephane Wirtel)
Date: Tue, 5 Dec 2017 19:46:39 +0100
Subject: [Python-Dev] "CPython loves your Pull Requests" talk by Stéphane Wirtel
In-Reply-To: 
References: 
Message-ID: 

Hi,

Thank you for this post to python-dev.

About my talk, it was a real pleasure to give it at PyCon Canada, and I
hope I can propose it to PyCon US for a larger audience. But the goal
behind this talk was to show that we have a good community, firstly via
the external contributors and secondly via the core developers.

For the statistics, I used the REST API v3 of GitHub and matplotlib.
For the future, I would like to update them via a service running on my
own server and maybe submit it to the Python Software Foundation,
because I think it's a good indicator for the future contributors of
the project.

But seriously, I was surprised by the number of Pull Requests and by
the number of contributors from Feb 2017 to Oct 2017.

Here is my graph for October and November 2017. I will share my scripts
on GitHub and if you want to help me with some good ideas, you are
welcome.

Stéphane

Le 05/12/17 à
15:25, Victor Stinner a écrit :
> Hi,
>
> Stéphane Wirtel gave a talk last month at Pycon CA about CPython pull
> requests. His slides:
>
> https://speakerdeck.com/matrixise/cpython-loves-your-pull-requests
>
> He produced interesting statistics that we didn't have before on pull
> requests (PR), from February 2017 to October 2017:
>
> * total number of merged PR: 4204
> * number of contributors: 586 !!! (96%)
> * number of core developers: 27 (4%)
> * Time to merge a PR: 3 days in average, good!
> * etc.
>
> It would be nice to get these statistics updated regularly on a
> service running somewhere.
>
> By the way, I'm also looking for statistics on reviews on GitHub. Does
> someone know how to do that?
>
> Victor
-------------- next part --------------
A non-text attachment was scrubbed...
Name: period-last-quarter.png
Type: image/png
Size: 39110 bytes
Desc: not available
URL: 

From stephane at wirtel.be Tue Dec 5 13:49:01 2017
From: stephane at wirtel.be (Stephane Wirtel)
Date: Tue, 5 Dec 2017 19:49:01 +0100
Subject: [Python-Dev] "CPython loves your Pull Requests" talk by Stéphane Wirtel
In-Reply-To: 
References: 
Message-ID: <5f8176db-a894-2121-b1ea-aa8208eea3f7@wirtel.be>

Hi Mariatta,

Thank you, I was really happy to see you at my talk, usually this kind
of talk is boring ;-) just kidding, but usually I prefer a technical
talk.

Le 05/12/17 à 16:25, Mariatta Wijaya a écrit :
> I saw the talk in person :) Congrats Stéphane!
>
> You can get the reviews from a specific PR using the API:
> https://developer.github.com/v3/pulls/reviews/#list-reviews-on-a-pull-request
>
> For example, for reviews made to CPython PR number 1:
>
> https://api.github.com/repos/python/cpython/pulls/1/reviews
>
> * Time to merge a PR: 3 days in average, good!
>
> Regarding the average time to merge PR, I'm interested to know the
> average time to merge for PRs not made by Python Core Devs.

+1 I could add this point in my scripts.
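The endpoint Mariatta links above can be queried with nothing but the
standard library; a quick sketch (the helper names and the token
handling are mine, not an official client -- unauthenticated requests
are heavily rate-limited by GitHub):

```python
import json
from urllib.request import Request, urlopen

API_ROOT = "https://api.github.com"

def reviews_url(owner, repo, pr_number):
    # The "list reviews on a pull request" endpoint mentioned above.
    return "{}/repos/{}/{}/pulls/{}/reviews".format(
        API_ROOT, owner, repo, pr_number)

def fetch_reviews(owner, repo, pr_number, token=None):
    headers = {"Accept": "application/vnd.github.v3+json"}
    if token:
        # A personal access token raises the rate limit considerably.
        headers["Authorization"] = "token " + token
    request = Request(reviews_url(owner, repo, pr_number), headers=headers)
    with urlopen(request) as resp:
        return json.load(resp)

# For example, reviews on CPython PR #1:
print(reviews_url("python", "cpython", 1))
# https://api.github.com/repos/python/cpython/pulls/1/reviews
```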
Have a nice day and thank you for your feedback.

Stéphane

From solipsis at pitrou.net Tue Dec 5 13:55:45 2017
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 5 Dec 2017 19:55:45 +0100
Subject: [Python-Dev] "CPython loves your Pull Requests" talk by Stéphane Wirtel
References: 
Message-ID: <20171205195545.6dad7c34@fsol>

On Tue, 5 Dec 2017 19:46:39 +0100
Stephane Wirtel via Python-Dev wrote:
> But seriously, I was surprised by the number of Pull Requests and by the
> number of contributors from Feb 2017 to Oct 2017.
>
> Here is my graph for October and November 2017.

Is it possible to compare those numbers with the number of commits /
month before the Github migration?

Regards

Antoine.

From barry at python.org Tue Dec 5 14:11:58 2017
From: barry at python.org (Barry Warsaw)
Date: Tue, 5 Dec 2017 14:11:58 -0500
Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__
In-Reply-To: 
References: 
Message-ID: 

On Dec 5, 2017, at 13:24, Guido van Rossum wrote:

> But the whole point of the PEP is that it only warns about deprecations
> in code over which the user has control -- likely __main__ is their own
> code, and they *can* handle it.

I'm not so sure how true that is. I have no sense of the relative
popularity of hand crafted dunder-mains vs entry point crafted ones. I
know that in my own applications, I tend to use the latter (although
pkg_resources performance issues bum me out). But then you have
applications like pex that use fairly complex hand crafted dunder-mains
in their zip files. In either case I don't want consumers of my
applications to have to worry about DeprecationWarnings, since *they*
really can't do anything about them.

All that to say I really don't know what the right thing to do here is.
All of our fiddling with the reporting of DeprecationWarnings, not to
mention PendingDeprecationWarnings and FutureWarnings feels like
experimental shots in the dark, and I suspect we won't really know if
PEP 565 will be helpful, harmful, or neutral until it's out in the wild
for a while. I suspect either that what we're trying to accomplish
really can't be done, or that we really don't have a good understanding
of the problem and we're just chipping away at the edges.

I know that's unhelpful in deciding whether to accept the PEP or not.
In the absence of any clear consensus, I'm happy to trust Guido's
instincts or keep the status quo.

-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: Message signed with OpenPGP
URL: 

From guido at python.org Tue Dec 5 14:32:27 2017
From: guido at python.org (Guido van Rossum)
Date: Tue, 5 Dec 2017 11:32:27 -0800
Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__
In-Reply-To: 
References: 
Message-ID: 

On Tue, Dec 5, 2017 at 11:11 AM, Barry Warsaw wrote:

> On Dec 5, 2017, at 13:24, Guido van Rossum wrote:
>
> > But the whole point of the PEP is that it only warns about deprecations
> > in code over which the user has control -- likely __main__ is their own
> > code, and they *can* handle it.
>
> I'm not so sure how true that is. I have no sense of the relative
> popularity of hand crafted dunder-mains vs entry point crafted ones. I
> know that in my own applications, I tend to use the latter (although
> pkg_resources performance issues bum me out). But then you have
> applications like pex that use fairly complex hand crafted dunder-mains in
> their zip files. In either case I don't want consumers of my applications
> to have to worry about DeprecationWarnings, since *they* really can't do
> anything about them.
This makes it the responsibility of the pex developers to at least test
for deprecation errors in their __main__. I don't know what pex is, but
presumably they have some QA and they can test their zips or at least
their __main__ with specific Python versions before distributing them.
I am confident that it's not going to be a problem for pex developers
or users.

> All that to say I really don't know what the right thing to do here is.
> All of our fiddling with the reporting of DeprecationWarnings, not to
> mention PendingDeprecationWarnings and FutureWarnings feels like
> experimental shots in the dark, and I suspect we won't really know if PEP
> 565 will be helpful, harmful, or neutral until it's out in the wild for a
> while. I suspect either that what we're trying to accomplish really can't
> be done, or that we really don't have a good understanding of the problem
> and we're just chipping away at the edges.
>
> I know that's unhelpful in deciding whether to accept the PEP or not. In
> the absence of any clear consensus, I'm happy to trust Guido's instincts or
> keep the status quo.

I also expect that this PEP will only have a small effect. It is a
compromise, but I'm okay with that. There seems to be no way to find
out what effect changes in this area will really have, because
small-scale experiments where some development team starts paying
attention to deprecations don't translate into what it will mean for
the large majority who aren't particularly interested in them (but who
may still be affected when the deprecations finally take effect and
some feature disappears).

The main category of users who are going to be affected are "casual"
users -- it's they who have the most code in __main__ files and at the
same time have the least idea of where Python is headed. Yes, they will
occasionally be annoyed. But they will also occasionally be glad that
we point out a deprecation to them.
And, unlike the situation before Python 2.7, they won't be annoyed by warnings in code they can't update -- they'll get warnings only about their own scripts. All in all, I think that for professional developers and users of professionally developed Python packages, at worst not much will change and at best there will be some small benefits; while for casual developers and users there will be some benefits and those will outweigh the downsides. In 5 years or so we can reevaluate. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Tue Dec 5 15:10:22 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 5 Dec 2017 21:10:22 +0100 Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__ In-Reply-To: References: Message-ID: 2017-12-05 19:24 GMT+01:00 Guido van Rossum : >> I disagree that *users* of an application is supposed to "handle" >> deprecation warnings: report them to the developer, or even try to fix >> them. IHMO these warnings (hidden by default) were introduced for >> developers of the application. > > But the whole point of the PEP is that it only warns about deprecations in > code over which the user has control -- likely __main__ is their own code, > and they *can* handle it. IMHO the core of the PEP 565 is to propose a compromise to separate "own code" and "external code" (cannot be modified). I'm unhappy with this suboptimal compromise: "only __main__ is my own code". Maybe we need something to declare the code that we own, to enable warnings on them. Just a simple helper on top of warnings.filterwarnings()? 
Or maybe I'm already in the "the better is the enemy of the good" grey
area :-)

Victor

From guido at python.org Tue Dec 5 15:15:08 2017
From: guido at python.org (Guido van Rossum)
Date: Tue, 5 Dec 2017 12:15:08 -0800
Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__
In-Reply-To: 
References: 
Message-ID: 

On Tue, Dec 5, 2017 at 12:10 PM, Victor Stinner wrote:

> 2017-12-05 19:24 GMT+01:00 Guido van Rossum :
> >> I disagree that *users* of an application is supposed to "handle"
> >> deprecation warnings: report them to the developer, or even try to fix
> >> them. IHMO these warnings (hidden by default) were introduced for
> >> developers of the application.
> >
> > But the whole point of the PEP is that it only warns about deprecations
> > in code over which the user has control -- likely __main__ is their own
> > code, and they *can* handle it.
>
> IMHO the core of the PEP 565 is to propose a compromise to separate
> "own code" and "external code" (cannot be modified).
>
> I'm unhappy with this suboptimal compromise: "only __main__ is my own
> code". Maybe we need something to declare the code that we own, to
> enable warnings on them. Just a simple helper on top of
> warnings.filterwarnings()? Or maybe I'm already in the "the better is
> the enemy of the good" greay area :-)

This is a very good characterization of the dilemma. But IMO we don't
have a better proxy for "my own code" (it's tempting to try and define
it based on the current directory but this makes too many assumptions
about development styles).

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From storchaka at gmail.com Tue Dec 5 15:21:39 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Tue, 5 Dec 2017 22:21:39 +0200
Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__
In-Reply-To: 
References: 
Message-ID: 

05.12.17 22:10, Victor Stinner пише:
> Maybe we need something to declare the code that we own, to
> enable warnings on them.

Just compare __author__ with the name of the user running a script. ;-)

From tjreedy at udel.edu Tue Dec 5 15:26:04 2017
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 5 Dec 2017 15:26:04 -0500
Subject: [Python-Dev] Zero-width matching in regexes
In-Reply-To: <16f7437d-3e11-6a19-3569-e4d55a370744@mrabarnett.plus.com>
References: <16f7437d-3e11-6a19-3569-e4d55a370744@mrabarnett.plus.com>
Message-ID: 

On 12/4/2017 6:21 PM, MRAB wrote:
> I've finally come to a conclusion as to what the "correct" behaviour of
> zero-width matches should be: """always return the first match, but
> never a zero-width match that is joined to a previous zero-width
> match""".

Is this different from current re or regex?

> If it's about to return a zero-width match that's joined to a previous
> zero-width match, then backtrack and keep on looking for a match.
>
> Example:
>
> >>> print([m.span() for m in re.finditer(r'|.', 'a')])
> [(0, 0), (0, 1), (1, 1)]
>
> re.findall, re.split and re.sub should work accordingly.
>
> If re.finditer finds n matches, then re.split should return a list of
> n+1 strings and re.sub should make n replacements (excepting maxsplit,
> etc.).

-- 
Terry Jan Reedy

From victor.stinner at gmail.com Tue Dec 5 15:43:34 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 5 Dec 2017 21:43:34 +0100
Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__
In-Reply-To: 
References: 
Message-ID: 

2017-12-05 21:21 GMT+01:00 Serhiy Storchaka :
> 05.12.17 22:10, Victor Stinner пише:
>>
>> Maybe we need something to declare the code that we own, to
>> enable warnings on them.
>
> Just compare __author__ with the name of the user running a script. ;-)

I was thinking of something like: enable_warnings_on('app') which would
enable warnings on app.py and app/* including subdirectories
(app/tools.py but also app/submodule/thing.py).

But the drawback of this idea compared to PEP 565 is that it requires
modifying each application, whereas PEP 565 changes the default
behaviour.

Warnings are a complex beast. If you think about ResourceWarning: the
warning can be emitted "in the stdlib", whereas the file was opened in
"your own code". ResourceWarning is basically emitted *anywhere*
because of its weird nature. It's often the case that these warnings
are emitted on a garbage collection, and this can happen anytime,
usually far from where the resource was allocated. (Since Python 3.6,
if you enable tracemalloc, the ResourceWarning is now logged with the
traceback where the resource was allocated!!!)

-b and -bb options are global options to change the behaviour of
BytesWarning: it's not possible to only make BytesWarning strict "in
your own code". (I'm not sure either that all code uses the warnings
API properly to identify correctly the caller where the warning should
be emitted.)

...

For all these reasons, I prefer to not try to *guess* what is the
"owned code", but simply always emit warnings everywhere :-) That's why
I suggest to use the -W default option or PYTHONWARNINGS=default (or
the new -X dev option or PYTHONDEVMODE=1).

In my experience, it's fine to get warnings in third party code. I'm
lucky, I'm only working on open source software, so I can report and
even fix (!) warnings in the code that I don't own :-)

By the way, if you are annoyed by one very specific warning in external
code (which prevents you from fixing other warnings "further" in the
code), you can ignore it using a filter. You can specify a module name,
and even a line number of a module.
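Such a targeted filter can look like this (the module and message here
are made up for the example; every other DeprecationWarning stays
visible when running under -W default):

```python
import warnings

# Silence one specific deprecation coming from one third-party module,
# while leaving every other DeprecationWarning alone.
warnings.filterwarnings(
    "ignore",
    message=r"frobnicate\(\) is deprecated",   # hypothetical message
    category=DeprecationWarning,
    module=r"thirdparty\.legacy",              # hypothetical module
    # lineno=123,  # optionally pin the filter to one line of the module
)
```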
https://docs.python.org/dev/library/warnings.html#warnings.filterwarnings

In the end, I'm not sure that PEP 565 is really needed or would help
anyone. Again, I understand that there is no perfect solution, and that
PEP 565 is a compromise.

Victor

From guido at python.org Tue Dec 5 16:18:47 2017
From: guido at python.org (Guido van Rossum)
Date: Tue, 5 Dec 2017 13:18:47 -0800
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode
In-Reply-To: 
References: 
Message-ID: 

I've been discussing this PEP offline with Victor, but he suggested we
should discuss it in public instead.

I am very worried about this long and rambling PEP, and I propose that
it not be accepted without a major rewrite to focus on clarity of the
specification. The "Unicode just works" summary is more a wish than a
proper summary of the PEP.

For others interested in reviewing this, the implementation link is
hidden in the long list of links; it is
http://bugs.python.org/issue29240.

FWIW the relationship with PEP 538 is also pretty unclear. (Or maybe
that's another case of the forest and the trees.) And that PEP (while
already accepted) also comes across as rambling and vague, and I have
no idea what it actually does. And it seems to mention PEP 540 quite a
few times.

So I guess PEP acceptance week is over. :-(

On Tue, Dec 5, 2017 at 7:52 AM, Victor Stinner wrote:

> Hi,
>
> Since it's the PEP Acceptance Week, I try my luck! Here is my very
> long PEP to propose a tiny change. The PEP is very long to explain the
> rationale and limitations.
>
> Inaccurate tl;dr: with the UTF-8 mode, Unicode "just works" as expected.
>
> Reminder: INADA Naoki was nominated as the BDFL-Delegate.
>
> https://www.python.org/dev/peps/pep-0540/
>
> Full-text below.
> > Victor > > > PEP: 540 > Title: Add a new UTF-8 mode > Version: $Revision$ > Last-Modified: $Date$ > Author: Victor Stinner , > Nick Coghlan > BDFL-Delegate: INADA Naoki > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 5-January-2016 > Python-Version: 3.7 > > > Abstract > ======== > > Add a new UTF-8 mode, enabled by default in the POSIX locale, to ignore > the locale and force the usage of the UTF-8 encoding for external > operating system interfaces, including the standard IO streams. > > Essentially, the UTF-8 mode behaves as Python 2 and other C based > applications on \*nix systems: it aims to process text as best it can, > but it errs on the side of producing or propagating mojibake to > subsequent components in a processing pipeline rather than requiring > strictly valid encodings at every step in the process. > > The UTF-8 mode can be configured as strict to reduce the risk of > producing or propagating mojibake. > > A new ``-X utf8`` command line option and ``PYTHONUTF8`` environment > variable are added to explicitly control the UTF-8 mode (including > turning it off entirely, even in the POSIX locale). > > > Rationale > ========= > > "It's not a bug, you must fix your locale" is not an acceptable answer > ---------------------------------------------------------------------- > > Since Python 3.0 was released in 2008, the usual answer to users getting > Unicode errors is to ask developers to fix their code to handle Unicode > properly. Most applications and Python modules were fixed, but users > kept reporting Unicode errors regularly: see the long list of issues in > the `Links`_ section below. > > In fact, a second class of bugs comes from a locale which is not properly > configured. The usual answer to such a bug report is: "it is not a bug, > you must fix your locale". > > Technically, the answer is correct, but from a practical point of view, > the answer is not acceptable. 
In many cases, "fixing the issue" is a > hard task. Moreover, sometimes, the usage of the POSIX locale is > deliberate. > > A good example of a concrete issue are build systems which create a > fresh environment for each build using a chroot, a container, a virtual > machine or something else to get reproducible builds. Such a setup > usually uses the POSIX locale. To get 100% reproducible builds, the > POSIX locale is a good choice: see the `Locales section of > reproducible-builds.org > `_. > > PEP 538 lists additional problems related to the use of Linux containers to > run network services and command line applications. > > UNIX users don't expect Unicode errors, since the common command lines > tools like ``cat``, ``grep`` or ``sed`` never fail with Unicode errors - > they produce mostly-readable text instead. > > These users similarly expect that tools written in Python 3 (including > those updated from Python 2), continue to tolerate locale > misconfigurations and avoid bothering them with text encoding details. > From their point of the view, the bug is not their locale but is > obviously Python 3 ("Everything else works, including Python 2, so > what's wrong with Python 3?"). > > Since Python 2 handles data as bytes, similar to system utilities > written in C and C++, it's rarer in Python 2 compared to Python 3 to get > explicit Unicode errors. It also contributes significantly to why many > affected users perceive Python 3 as the root cause of their Unicode > errors. > > At the same time, the stricter text handling model was deliberately > introduced into Python 3 to reduce the frequency of data corruption bugs > arising in production services due to mismatched assumptions regarding > text encodings. It's one thing to emit mojibake to a user's terminal > while listing a directory, but something else entirely to store that in > a system manifest in a database, or to send it to a remote client > attempting to retrieve files from the system. 
> > Since different group of users have different expectations, there is no > silver bullet which solves all issues at once. Last but not least, > backward compatibility should be preserved whenever possible. > > Locale and operating system data > -------------------------------- > > .. _operating system data: > > Python uses an encoding called the "filesystem encoding" to decide how > to encode and decode data from/to the operating system: > > * file content > * command line arguments: ``sys.argv`` > * standard streams: ``sys.stdin``, ``sys.stdout``, ``sys.stderr`` > * environment variables: ``os.environ`` > * filenames: ``os.listdir(str)`` for example > * pipes: ``subprocess.Popen`` using ``subprocess.PIPE`` for example > * error messages: ``os.strerror(code)`` for example > * user and terminal names: ``os``, ``grp`` and ``pwd`` modules > * host name, UNIX socket path: see the ``socket`` module > * etc. > > At startup, Python calls ``setlocale(LC_CTYPE, "")`` to use the user > ``LC_CTYPE`` locale and then store the locale encoding as the > "filesystem error". It's possible to get this encoding using > ``sys.getfilesystemencoding()``. In the whole lifetime of a Python > process, the same encoding and error handler are used to encode and > decode data from/to the operating system. > > The ``os.fsdecode()`` and ``os.fsencode()`` functions can be used to > decode and encode operating system data. These functions use the > filesystem error handler: ``sys.getfilesystemencodeerrors()``. > > .. note:: > In some corner cases, the *current* ``LC_CTYPE`` locale must be used > instead of ``sys.getfilesystemencoding()``. For example, the ``time`` > module uses the *current* ``LC_CTYPE`` locale to decode timezone > names. 
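The ``os.fsdecode()`` / ``os.fsencode()`` round-trip described above
can be illustrated with a short snippet (the byte string is a made-up
example; the behaviour shown assumes a POSIX system with an ASCII or
UTF-8 locale):

```python
import os

# A filename created on an old ISO 8859-1 filesystem: the trailing byte
# 0xE9 ("é" in Latin-1) is not valid UTF-8.
raw = b"caf\xe9"

# With the surrogateescape error handler, the undecodable byte survives
# as a lone surrogate (U+DCE9) instead of raising UnicodeDecodeError...
name = os.fsdecode(raw)
print(repr(name))  # 'caf\udce9' under an ASCII or UTF-8 locale

# ...so encoding it back recovers the original bytes exactly:
assert os.fsencode(name) == raw
```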
>
>
> The POSIX locale and its encoding
> ---------------------------------
>
> The following environment variables are used to configure the locale, in
> this preference order:
>
> * ``LC_ALL``, most important variable
> * ``LC_CTYPE``
> * ``LANG``
>
> The POSIX locale, also known as "the C locale", is used:
>
> * if the first set variable is set to ``"C"``
> * if all these variables are unset, for example when a program is
>   started in an empty environment.
>
> The encoding of the POSIX locale must be ASCII or a superset of ASCII.
>
> On Linux, the POSIX locale uses the ASCII encoding.
>
> On FreeBSD and Solaris, ``nl_langinfo(CODESET)`` announces an alias of
> the ASCII encoding, whereas ``mbstowcs()`` and ``wcstombs()`` functions
> use the ISO 8859-1 encoding (Latin1) in practice. The problem is that
> ``os.fsencode()`` and ``os.fsdecode()`` use the
> ``locale.getpreferredencoding()`` codec. For example, if command line
> arguments are decoded by ``mbstowcs()`` and encoded back by
> ``os.fsencode()``, an ``UnicodeEncodeError`` exception is raised instead
> of retrieving the original byte string.
>
> To fix this issue, Python checks since Python 3.4 if ``mbstowcs()``
> really uses the ASCII encoding if the ``LC_CTYPE`` uses the
> POSIX locale and ``nl_langinfo(CODESET)`` returns ``"ASCII"`` (or an
> alias to ASCII). If not (the effective encoding is not ASCII), Python
> uses its own ASCII codec instead of using ``mbstowcs()`` and
> ``wcstombs()`` functions for `operating system data`_.
>
> See the `POSIX locale (2016 Edition)
> `_.
>
>
> POSIX locale used by mistake
> ----------------------------
>
> In many cases, the POSIX locale is not really expected by users who get
> it by mistake.
Examples: > > * program started in an empty environment > * User forcing LANG=C to get messages in English > * LANG=C used for bad reasons, without being aware of the ASCII encoding > * SSH shell > * Linux installed with no configured locale > * chroot environment, Docker image, container, ... with no locale is > configured > * User locale set to a non-existing locale, typo in the locale name for > example > > > C.UTF-8 and C.utf8 locales > -------------------------- > > Some UNIX operating systems provide a variant of the POSIX locale using > the UTF-8 encoding: > > * Fedora 25: ``"C.utf8"`` or ``"C.UTF-8"`` > * Debian (eglibc 2.13-1, 2011), Ubuntu: ``"C.UTF-8"`` > * HP-UX: ``"C.utf8"`` > > It was proposed to add a ``C.UTF-8`` locale to the glibc: `glibc C.UTF-8 > proposal `_. > > It is not planned to add such locale to BSD systems. > > > Popularity of the UTF-8 encoding > -------------------------------- > > Python 3 uses UTF-8 by default for Python source files. > > On Mac OS X, Windows and Android, Python always use UTF-8 for operating > system data. For Windows, see the `PEP 529`_: "Change Windows filesystem > encoding to UTF-8". > > On Linux, UTF-8 became the de facto standard encoding, > replacing legacy encodings like ISO 8859-1 or ShiftJIS. For example, > using different encodings for filenames and standard streams is likely > to create mojibake, so UTF-8 is now used *everywhere* (at least for > modern > distributions using their default settings). > > The UTF-8 encoding is the default encoding of XML and JSON file format. > In January 2017, UTF-8 was used in `more than 88% of web pages > `_ (HTML, > Javascript, CSS, etc.). > > See `utf8everywhere.org `_ for more general > information on the UTF-8 codec. > > .. note:: > Some applications and operating systems (especially Windows) use Byte > Order Markers (BOM) to indicate the used Unicode encoding: UTF-7, > UTF-8, UTF-16-LE, etc. BOM are not well supported and rarely used in > Python. 
> > > Old data stored in different encodings and surrogateescape > ---------------------------------------------------------- > > Even if UTF-8 became the de facto standard, there are still systems in > the wild which don't use UTF-8. And there are a lot of data stored in > different encodings. For example, an old USB key using the ext3 > filesystem with filenames encoded to ISO 8859-1. > > The Linux kernel and libc don't decode filenames: a filename is used > as a raw array of bytes. The common solution to support any filename is > to store filenames as bytes and don't try to decode them. When displayed > to stdout, mojibake is displayed if the filename and the terminal don't > use the same encoding. > > Python 3 promotes Unicode everywhere including filenames. A solution to > support filenames not decodable from the locale encoding was found: the > ``surrogateescape`` error handler (`PEP 383`_), store undecodable bytes > as surrogate characters. This error handler is used by default for > `operating system data`_, by ``os.fsdecode()`` and ``os.fsencode()`` for > example (except on Windows which uses the ``strict`` error handler). > > > Standard streams > ---------------- > > Python uses the locale encoding for standard streams: stdin, stdout and > stderr. The ``strict`` error handler is used by stdin and stdout to > prevent mojibake. > > The ``backslashreplace`` error handler is used by stderr to avoid > Unicode encode errors when displaying non-ASCII text. It is especially > useful when the POSIX locale is used, because this locale usually uses > the ASCII encoding. > > The problem is that `operating system data`_ like filenames are decoded > using the ``surrogateescape`` error handler (`PEP 383`_). Displaying a > filename to stdout raises a Unicode encode error if the filename > contains an undecoded byte stored as a surrogate character. > > Python 3.5+ now uses ``surrogateescape`` for stdin and stdout if the > POSIX locale is used: `issue #19977 > `_. 
> The idea is to pass through
> `operating system data`_ even if it means mojibake, because most UNIX
> applications work like that. Such UNIX applications often store
> filenames as bytes, in many cases because their basic design principles
> (or those of the language they're implemented in) were laid down half a
> century ago when it was still a feat for computers to handle English
> text correctly, rather than humans having to work with raw numeric
> indexes.
>
> .. note::
>    The encoding and/or the error handler of standard streams can be
>    overridden with the ``PYTHONIOENCODING`` environment variable.
>
>
> Proposal
> ========
>
> Changes
> -------
>
> Add a new UTF-8 mode, enabled by default in the POSIX locale, but
> otherwise disabled by default, to ignore the locale and force the usage
> of the UTF-8 encoding with the ``surrogateescape`` error handler,
> instead of using the locale encoding (with ``strict`` or
> ``surrogateescape`` error handler depending on the case).
>
> The "normal" UTF-8 mode uses ``surrogateescape`` on the standard input
> and output streams and opened files, as well as on all operating
> system interfaces. This is the mode implicitly activated by the POSIX
> locale.
>
> The "strict" UTF-8 mode reduces the risk of producing or propagating
> mojibake: the UTF-8 encoding is used with the ``strict`` error handler
> for inputs and outputs, but the ``surrogateescape`` error handler is
> still used for `operating system data`_. This mode is never activated
> implicitly, but can be requested explicitly.
>
> The new ``-X utf8`` command line option and ``PYTHONUTF8`` environment
> variable are added to control the UTF-8 mode.
>
> The UTF-8 mode is enabled by ``-X utf8`` or ``PYTHONUTF8=1``.
>
> The UTF-8 Strict mode is configured by ``-X utf8=strict`` or
> ``PYTHONUTF8=strict``.
>
> The POSIX locale enables the UTF-8 mode. In this case, the UTF-8 mode
> can be explicitly disabled by ``-X utf8=0`` or ``PYTHONUTF8=0``.
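On an interpreter that implements the PEP (the ``-X utf8`` option and
``PYTHONUTF8`` variable shipped in Python 3.7), the priority rules just
described can be checked from a shell; a sketch:

```shell
# Enable the UTF-8 mode via the environment (Python 3.7+):
PYTHONUTF8=1 python3 -c 'import sys; print(sys.flags.utf8_mode, sys.getfilesystemencoding())'
# 1 utf-8

# The command line option has priority over the environment variable,
# so this still runs in UTF-8 mode:
PYTHONUTF8=0 python3 -X utf8 -c 'import sys; print(sys.flags.utf8_mode)'
# 1
```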
>
> Other option values fail with an error.
>
> Options priority for the UTF-8 mode, highest priority first:
>
> * ``PYTHONLEGACYWINDOWSFSENCODING``
> * ``-X utf8``
> * ``PYTHONUTF8``
> * POSIX locale
>
> For example, ``PYTHONUTF8=0 python3 -X utf8`` enables the UTF-8 mode,
> whereas ``LC_ALL=C python3.7 -X utf8=0`` disables the UTF-8 mode and so
> uses the encoding of the POSIX locale.
>
> Encodings used by ``open()``, highest priority first:
>
> * *encoding* and *errors* parameters (if set)
> * UTF-8 mode
> * ``os.device_encoding(fd)``
> * ``locale.getpreferredencoding(False)``
>
>
> Encoding and error handler
> --------------------------
>
> The UTF-8 mode changes the default encoding and error handler used by
> ``open()``, ``os.fsdecode()``, ``os.fsencode()``, ``sys.stdin``,
> ``sys.stdout`` and ``sys.stderr``:
>
> ============================ ======================= ========================== ==========================
> Function                     Default                 UTF-8 mode or POSIX locale UTF-8 Strict mode
> ============================ ======================= ========================== ==========================
> open()                       locale/strict           **UTF-8/surrogateescape**  **UTF-8**/strict
> os.fsdecode(), os.fsencode() locale/surrogateescape  **UTF-8**/surrogateescape  **UTF-8**/surrogateescape
> sys.stdin, sys.stdout        locale/strict           **UTF-8/surrogateescape**  **UTF-8**/strict
> sys.stderr                   locale/backslashreplace **UTF-8**/backslashreplace **UTF-8**/backslashreplace
> ============================ ======================= ========================== ==========================
>
> By comparison, Python 3.6 uses:
>
> ============================ ======================= ==========================
> Function                     Default                 POSIX locale
> ============================ ======================= ==========================
> open()                       locale/strict           locale/strict
> os.fsdecode(), os.fsencode() locale/surrogateescape  locale/surrogateescape
> sys.stdin, sys.stdout        locale/strict           locale/**surrogateescape**
> sys.stderr                   locale/backslashreplace locale/backslashreplace
> ============================ ======================= ==========================
>
> The UTF-8 mode uses the ``surrogateescape`` error handler instead of
> the ``strict`` error handler for consistency with other standard \*nix
> operating system components: the idea is that data not encoded to
> UTF-8 is passed through "Python" without being modified, as raw bytes.
>
> The ``PYTHONIOENCODING`` environment variable has priority over the
> UTF-8 mode for standard streams. For example, ``PYTHONIOENCODING=latin1
> python3 -X utf8`` uses the Latin1 encoding for stdin, stdout and stderr.
>
> Encoding and error handler on Windows
> -------------------------------------
>
> On Windows, the encodings and error handlers are different:
>
> ============================ ======================= ========================== ========================== ==========================
> Function                     Default                 Legacy Windows FS encoding UTF-8 mode                 UTF-8 Strict mode
> ============================ ======================= ========================== ========================== ==========================
> open()                       mbcs/strict             mbcs/strict                **UTF-8/surrogateescape**  **UTF-8**/strict
> os.fsdecode(), os.fsencode() UTF-8/surrogatepass     **mbcs/replace**           UTF-8/surrogatepass        UTF-8/surrogatepass
> sys.stdin, sys.stdout        UTF-8/surrogateescape   UTF-8/surrogateescape      UTF-8/surrogateescape      **UTF-8/strict**
> sys.stderr                   UTF-8/backslashreplace  UTF-8/backslashreplace     UTF-8/backslashreplace     UTF-8/backslashreplace
> ============================ ======================= ========================== ========================== ==========================
>
> By comparison, Python 3.6 uses:
>
> ============================ ======================= ==========================
> Function                     Default                 Legacy Windows FS encoding
> ============================ ======================= ==========================
> open()                       mbcs/strict             mbcs/strict
> os.fsdecode(), os.fsencode() UTF-8/surrogatepass     **mbcs/replace**
> sys.stdin, sys.stdout        UTF-8/surrogateescape   UTF-8/surrogateescape
> sys.stderr                   UTF-8/backslashreplace  UTF-8/backslashreplace
> ============================ ======================= ==========================
>
> The "Legacy Windows FS encoding" is enabled by setting the
> ``PYTHONLEGACYWINDOWSFSENCODING`` environment variable to ``1``, as
> specified in `PEP 529`_.
>
> Enabling the legacy Windows filesystem encoding disables the UTF-8 mode
> (as ``-X utf8=0``).
>
> If stdin and/or stdout is redirected to a pipe, ``sys.stdin`` and/or
> ``sys.stdout`` use the ``mbcs`` encoding by default rather than UTF-8.
> But with the UTF-8 mode, ``sys.stdin`` and ``sys.stdout`` always use
> the UTF-8 encoding.
>
> There is no POSIX locale on Windows. The ANSI code page is used as the
> locale encoding, and this code page never uses the ASCII encoding.
>
>
> Rationale
> ---------
>
> The UTF-8 mode is disabled by default to keep raising hard Unicode
> errors when encoding or decoding `operating system data`_ fails, and
> to preserve backward compatibility. The user is responsible for
> enabling the UTF-8 mode explicitly, and so is better prepared for
> mojibake than if the UTF-8 mode were enabled *by default*.
>
> The UTF-8 mode should be used on systems known to be configured with
> UTF-8 where most applications speak UTF-8. It prevents Unicode errors
> if the user overrides a locale *by mistake* or if a Python program is
> started with no locale configured (and so with the POSIX locale).
>
> Most UNIX applications handle `operating system data`_ as bytes, so the
> ``LC_ALL``, ``LC_CTYPE`` and ``LANG`` environment variables have a
> limited impact on how this data is handled by the application.
>
> The Python UTF-8 mode should help to make Python more interoperable
> with the other UNIX applications in the system, assuming that *UTF-8*
> is used everywhere and that users *expect* UTF-8.
>
> Ignoring ``LC_ALL``, ``LC_CTYPE`` and ``LANG`` environment variables in
> Python is more convenient, since they are more commonly misconfigured
> *by mistake* (configured to use an encoding different from UTF-8,
> whereas the system uses UTF-8), rather than being misconfigured by
> intent.
>
> Expected mojibake and surrogate character issues
> ------------------------------------------------
>
> The UTF-8 mode only affects code running directly in Python, especially
> code written in pure Python. Other code, called "external code"
> here, is not aware of this mode. Examples:
>
> * C libraries called by Python modules, like OpenSSL
> * The application code when Python is embedded in an application
>
> In the UTF-8 mode, Python uses the ``surrogateescape`` error handler,
> which stores bytes not decodable from UTF-8 as surrogate characters.
>
> If the external code uses the locale and the locale encoding is UTF-8,
> it should work fine.
>
> External code using bytes
> ^^^^^^^^^^^^^^^^^^^^^^^^^
>
> If the external code processes data as bytes, surrogate characters are
> not an issue since they are only used inside Python. Python encodes
> surrogate characters back to bytes at the edges, before calling
> external code.
>
> The UTF-8 mode can produce mojibake since Python and the external code
> don't agree on how to interpret invalid bytes, but this is a
> deliberate choice. The UTF-8 mode can be configured as strict to
> prevent mojibake and fail early when data is not decodable from UTF-8
> or not encodable to UTF-8.
>
> External code using text
> ^^^^^^^^^^^^^^^^^^^^^^^^
>
> If the external code uses a text API, for example using the
> ``wchar_t*`` C type, mojibake should not occur, but the external code
> can fail on surrogate characters.
>
>
> Use Cases
> =========
>
> The following use cases were written to help understand the impact of
> the chosen encodings and error handlers on concrete examples.
>
> The "Exception?"
column shows the potential benefit of having a UTF-8
> mode which is closer to the traditional Python 2 behaviour of passing
> along raw binary data even if it isn't valid UTF-8.
>
> The "Mojibake" column shows that ignoring the locale causes a practical
> issue: the UTF-8 mode produces mojibake if the terminal doesn't use the
> UTF-8 encoding.
>
> The ideal configuration is "No exception, no risk of mojibake", but
> that isn't always possible in the presence of non-UTF-8 encoded binary
> data.
>
> List a directory into stdout
> ----------------------------
>
> Script listing the content of the current directory into stdout::
>
>     import os
>     for name in os.listdir(os.curdir):
>         print(name)
>
> Result:
>
> ======================== ========== =========
> Python                   Exception? Mojibake?
> ======================== ========== =========
> Python 2                 No         **Yes**
> Python 3                 **Yes**    No
> Python 3.5, POSIX locale No         **Yes**
> UTF-8 mode               No         **Yes**
> UTF-8 Strict mode        **Yes**    No
> ======================== ========== =========
>
> "Exception?" means that the script can fail on decoding or encoding a
> filename depending on the locale or the filename.
>
> To be able to never fail that way, the program must be able to produce
> mojibake. For both automated and interactive processes, mojibake is
> often more user friendly than an error with a truncated or empty
> output, since it confines the problem to the affected entry, rather
> than aborting the whole task.
>
> Example with a directory which contains a file called ``b'xxx\xff'``
> (the byte ``0xFF`` is invalid in UTF-8).
>
> The default mode and the UTF-8 Strict mode fail on ``print()`` with an
> encode error::
>
>     $ python3.7 ../ls.py
>     Traceback (most recent call last):
>       File "../ls.py", line 5, in <module>
>         print(name)
>     UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' ...
>
>     $ python3.7 -X utf8=strict ../ls.py
>     Traceback (most recent call last):
>       File "../ls.py", line 5, in <module>
>         print(name)
>     UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' ...
>
> The UTF-8 mode, POSIX locale, Python 2 and the UNIX ``ls`` command work
> but display mojibake::
>
>     $ python3.7 -X utf8 ../ls.py
>     xxx?
>
>     $ LC_ALL=C /python3.6 ../ls.py
>     xxx?
>
>     $ python2 ../ls.py
>     xxx?
>
>     $ ls
>     'xxx'$'\377'
>
>
> List a directory into a text file
> ---------------------------------
>
> Similar to the previous example, except that the listing is written
> into a text file::
>
>     import os
>     names = os.listdir(os.curdir)
>     with open("/tmp/content.txt", "w") as fp:
>         for name in names:
>             fp.write("%s\n" % name)
>
> Result:
>
> ======================== ========== =========
> Python                   Exception? Mojibake?
> ======================== ========== =========
> Python 2                 No         **Yes**
> Python 3                 **Yes**    No
> Python 3.5, POSIX locale **Yes**    No
> UTF-8 mode               No         **Yes**
> UTF-8 Strict mode        **Yes**    No
> ======================== ========== =========
>
> Again, never throwing an exception requires that mojibake can be
> produced, while preventing mojibake means that the script can fail on
> decoding or encoding a filename depending on the locale or the
> filename. Typical error::
>
>     $ LC_ALL=C python3 test.py
>     Traceback (most recent call last):
>       File "test.py", line 5, in <module>
>         fp.write("%s\n" % name)
>     UnicodeEncodeError: 'ascii' codec can't encode characters in
>     position 0-1: ordinal not in range(128)
>
> Compared with native system tools::
>
>     $ ls > /tmp/content.txt
>     $ cat /tmp/content.txt
>     xxx?
>
>
> Display Unicode characters into stdout
> --------------------------------------
>
> Very basic example used to illustrate a common issue: display the euro
> sign (U+20AC: €)::
>
>     print("euro: \u20ac")
>
> Result:
>
> ======================== ========== =========
> Python                   Exception? Mojibake?
> ======================== ========== =========
> Python 2                 **Yes**    No
> Python 3                 **Yes**    No
> Python 3.5, POSIX locale **Yes**    No
> UTF-8 mode               No         **Yes**
> UTF-8 Strict mode        No         **Yes**
> ======================== ========== =========
>
> The UTF-8 and UTF-8 Strict modes will always encode the euro sign as
> UTF-8. If the terminal uses a different encoding, we get mojibake.
>
> For example, using ``iconv`` to emulate a GB-18030 terminal inside a
> UTF-8 one::
>
>     $ python3 -c 'print("euro: \u20ac")' | iconv -f gb18030 -t utf8
>     euro: ?iconv: illegal input sequence at position 8
>
> The misencoding also corrupts the trailing newline, such that the
> output stream isn't actually a valid GB-18030 sequence; hence the
> error message after the euro symbol, which is itself misinterpreted as
> a hanzi character.
>
>
> Replace a word in a text
> ------------------------
>
> The following script replaces the word "apple" with "orange". It
> reads input from stdin and writes the output into stdout::
>
>     import sys
>     text = sys.stdin.read()
>     sys.stdout.write(text.replace("apple", "orange"))
>
> Result:
>
> ======================== ========== =========
> Python                   Exception? Mojibake?
> ======================== ========== =========
> Python 2                 No         **Yes**
> Python 3                 **Yes**    No
> Python 3.5, POSIX locale No         **Yes**
> UTF-8 mode               No         **Yes**
> UTF-8 Strict mode        **Yes**    No
> ======================== ========== =========
>
> This is a case where passing along the raw bytes (by way of the
> ``surrogateescape`` error handler) will bring Python 3's behaviour back
> into line with standard operating system tools like ``sed`` and
> ``awk``.
>
>
> Producer-consumer model using pipes
> -----------------------------------
>
> Let's say that we have a "producer" program which writes data into its
> stdout and a "consumer" program which reads data from its stdin.
>
> In a shell, such programs are run with the command::
>
>     producer | consumer
>
> The question is whether these programs will work with any data and any
> locale.
> UNIX users don't expect Unicode errors, and so expect that such
> programs "just work", in the sense that Unicode errors may cause
> problems in the data stream, but won't cause the entire stream
> processing *itself* to abort.
>
> If the producer only produces ASCII output, no error should occur.
> Let's say that the producer writes at least one non-ASCII character
> (at least one byte in the range ``0x80..0xff``).
>
> To simplify the problem, let's say that the consumer has no output
> (doesn't write results into a file or stdout).
>
> A "Bytes producer" is an application which cannot fail with a Unicode
> error and produces bytes on stdout.
>
> Let's say that a "Bytes consumer" does not decode stdin but stores
> data as bytes: such a consumer always works. Common UNIX command line
> tools like ``cat``, ``grep`` or ``sed`` are in this category. Many
> Python 2 applications are also in this category, as are applications
> that work with the lower level binary input and output streams in
> Python 3 rather than the default text mode streams.
>
> "Python producer" and "Python consumer" are a producer and consumer
> implemented in Python using the default text mode input and output
> streams.
>
> Bytes producer, Bytes consumer
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> This won't throw exceptions, but it is out of the scope of this PEP
> since it doesn't involve Python's default text mode input and output
> streams.
>
> Python producer, Bytes consumer
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Python producer::
>
>     print("euro: \u20ac")
>
> Result:
>
> ======================== ========== =========
> Python                   Exception? Mojibake?
> ======================== ========== =========
> Python 2                 **Yes**    No
> Python 3                 **Yes**    No
> Python 3.5, POSIX locale **Yes**    No
> UTF-8 mode               No         **Yes**
> UTF-8 Strict mode        No         **Yes**
> ======================== ========== =========
>
> The question here is not whether the consumer is able to decode the
> input, but whether Python is able to produce its output.
So it's similar to the > `Display Unicode characters into stdout`_ case. > > UTF-8 modes work with any locale since the consumer doesn't try to > decode its stdin. > > Bytes producer, Python consumer > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Python consumer:: > > import sys > text = sys.stdin.read() > result = text.replace("apple", "orange") > # ignore the result > > Result: > > ======================== ========== ========= > Python Exception? Mojibake? > ======================== ========== ========= > Python 2 No **Yes** > Python 3 **Yes** No > Python 3.5, POSIX locale No **Yes** > UTF-8 mode No **Yes** > UTF-8 Strict mode **Yes** No > ======================== ========== ========= > > Python 3 may throw an exception on decoding stdin depending on the input > and the locale. > > > Python producer, Python consumer > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Python producer:: > > print("euro: \u20ac") > > Python consumer:: > > import sys > text = sys.stdin.read() > result = text.replace("apple", "orange") > # ignore the result > > Result, same Python version used for the producer and the consumer: > > ======================== ========== ========= > Python Exception? Mojibake? > ======================== ========== ========= > Python 2 **Yes** No > Python 3 **Yes** No > Python 3.5, POSIX locale **Yes** No > UTF-8 mode No No(!) > UTF-8 Strict mode No No(!) > ======================== ========== ========= > > This case combines a Python producer with a Python consumer, and the > result is mainly the same as that for `Python producer, Bytes > consumer`_, since the consumer can't read what the producer can't emit. > > However, the behaviour of the "UTF-8" and "UTF-8 Strict" modes in this > configuration is notable: they don't produce an exception, *and* they > shouldn't produce mojibake, as both the producer and the consumer are > making *consistent* assumptions regarding the text encoding used on the > pipe between them (i.e. UTF-8). 
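[Editorial illustration, not part of the PEP text: the "Python producer, Python consumer" case above can be sketched with two child interpreters connected by a pipe, forcing the POSIX locale while enabling the UTF-8 mode via ``PYTHONUTF8=1`` (Python 3.7+).]

```python
import os
import subprocess
import sys

# Force the POSIX locale but enable the UTF-8 mode in both children.
env = dict(os.environ, LC_ALL="C", PYTHONUTF8="1")

# Producer writes a non-ASCII character; consumer reads and transforms it.
producer = subprocess.Popen(
    [sys.executable, "-c", r"print('euro: \u20ac')"],
    stdout=subprocess.PIPE, env=env)
consumer = subprocess.run(
    [sys.executable, "-c",
     "import sys; text = sys.stdin.read(); "
     "print(text.replace('apple', 'orange'), end='')"],
    stdin=producer.stdout, capture_output=True, env=env)
producer.stdout.close()
producer.wait()

# Both sides assume UTF-8 on the pipe: no exception, no mojibake.
print(producer.returncode, consumer.returncode)   # 0 0
print(consumer.stdout.decode("utf-8"), end="")    # euro: €
```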
>
> Any mojibake generated would only be in the interfaces between the
> consuming component and the outside world (e.g. the terminal, or when
> writing to a file).
>
> Backward Compatibility
> ======================
>
> The main backward incompatible change is that the UTF-8 encoding is
> now used by default if the locale is POSIX. Since the UTF-8 encoding
> is used with the ``surrogateescape`` error handler, encoding errors
> should not occur, and so the change should not break applications.
>
> The UTF-8 encoding is also quite restrictive regarding where it allows
> plain ASCII code points to appear in the byte stream, so even for
> ASCII-incompatible encodings, such byte values will often be escaped
> rather than being processed as ASCII characters.
>
> The more likely source of trouble comes from external libraries.
> Python can successfully decode data from UTF-8, but a library using
> the locale encoding can fail to encode the decoded text back to bytes.
> For example, GNU readline currently has problems on Android due to the
> mismatch between CPython's encoding assumptions there (always UTF-8)
> and GNU readline's encoding assumptions (which are based on the
> nominal locale).
>
> The PEP only changes the default behaviour if the locale is POSIX. For
> other locales, the *default* behaviour is unchanged.
>
> PEP 538 is a follow-up to this PEP that extends CPython's assumptions
> to other locale-aware components in the same process by explicitly
> coercing the POSIX locale to something more suitable for modern text
> processing. See that PEP for further details.
>
>
> Alternatives
> ============
>
> Don't modify the encoding of the POSIX locale
> ---------------------------------------------
>
> A first version of the PEP did not change the encoding and error
> handler used by the POSIX locale.
>
> The problem is that adding the ``-X utf8`` command line option or
> setting the ``PYTHONUTF8`` environment variable is not possible in
> some cases, or at least not convenient.
>
> Moreover, many users simply expect Python 3 to behave like Python 2:
> don't bother them with encodings, and "just work" in all cases. These
> users don't worry about mojibake, or even expect mojibake because of
> complex documents using multiple incompatible encodings.
>
>
> Always use UTF-8
> ----------------
>
> Python already always uses the UTF-8 encoding on Mac OS X, Android and
> Windows. Since UTF-8 became the de facto encoding, it makes sense to
> always use it on all platforms with any locale.
>
> The problem with this approach is that Python is also used extensively
> in desktop environments, and it is often a practical or even legal
> requirement to support locale encodings other than UTF-8 (for example,
> GB-18030 in China, and Shift-JIS or ISO-2022-JP in Japan).
>
> Force UTF-8 for the POSIX locale
> --------------------------------
>
> An alternative to always using UTF-8 in any case is to only use UTF-8
> when the ``LC_CTYPE`` locale is the POSIX locale.
>
> Nick Coghlan's `PEP 538`_, "Coercing the legacy C locale to C.UTF-8",
> proposes to implement that using the ``C.UTF-8`` locale.
>
>
> Use the strict error handler for operating system data
> ------------------------------------------------------
>
> Using the ``surrogateescape`` error handler for `operating system
> data`_ creates surprising surrogate characters. No Python codec
> (except ``utf-7``) accepts surrogates, and so encoding text coming
> from the operating system is likely to raise an error. The problem is
> that the error comes late, very far from where the data was read.
>
> The ``strict`` error handler can be used instead to decode
> (``os.fsdecode()``) and encode (``os.fsencode()``) operating system
> data, to raise encoding errors as soon as possible.
It helps find bugs more quickly.
>
> The main drawback of this strategy is that it doesn't work in
> practice. Python 3 is designed on top of Unicode strings. Most
> functions expect Unicode and produce Unicode. Even if many operating
> system functions have two flavors, bytes and Unicode, the Unicode
> flavor is used in most cases. There are good reasons for that: Unicode
> is more convenient in Python 3, and using Unicode helps to support the
> full Unicode Character Set (UCS) on Windows (even though Python has
> used UTF-8 there since Python 3.6; see `PEP 528`_ and `PEP 529`_).
>
> For example, if ``os.fsdecode()`` uses ``utf8/strict``,
> ``os.listdir(str)`` fails to list the filenames of a directory if a
> single filename is not decodable from UTF-8. As a consequence,
> ``shutil.rmtree(str)`` fails to remove a directory. Undecodable
> filenames, environment variables, etc. are simply too common to make
> this alternative viable.
>
>
> Links
> =====
>
> PEPs:
>
> * `PEP 538 `_:
>   "Coercing the legacy C locale to C.UTF-8"
> * `PEP 529 `_:
>   "Change Windows filesystem encoding to UTF-8"
> * `PEP 528 `_:
>   "Change Windows console encoding to UTF-8"
> * `PEP 383 `_:
>   "Non-decodable Bytes in System Character Interfaces"
>
> Main Python issues:
>
> * `Issue #29240: Implementation of the PEP 540: Add a new UTF-8 mode
>   `_
> * `Issue #28180: sys.getfilesystemencoding() should default to utf-8
>   `_
> * `Issue #19977: Use "surrogateescape" error handler for sys.stdin and
>   sys.stdout on UNIX for the C locale
>   `_
> * `Issue #19847: Setting the default filesystem-encoding
>   `_
> * `Issue #8622: Add PYTHONFSENCODING environment variable
>   `_: added but reverted because of
>   many issues; read the `Inconsistencies if locale and filesystem
>   encodings are different `_ thread on the python-dev mailing
>   list
>
> Incomplete list of Python issues related to Unicode errors, especially
> with the POSIX locale:
>
> * 2016-12-22: `LANG=C python3 -c "import os;
os.path.exists('\xff')"
>   `_
> * 2014-07-20: `issue #22016: Add a new 'surrogatereplace' output only
>   error handler `_
> * 2014-04-27: `Issue #21368: Check for systemd locale on startup if
>   current locale is set to POSIX `_
>   -- manually read /etc/locale.conf when the locale is POSIX
> * 2014-01-21: `Issue #20329: zipfile.extractall fails in Posix shell
>   with utf-8 filename `_
> * 2013-11-30: `Issue #19846: Python 3 raises Unicode errors with the C
>   locale `_
> * 2010-05-04: `Issue #8610: Python3/POSIX: errors if file system
>   encoding is None `_
> * 2013-08-12: `Issue #18713: Clearly document the use of
>   PYTHONIOENCODING to set surrogateescape
>   `_
> * 2013-09-27: `Issue #19100: Use backslashreplace in pprint
>   `_
> * 2012-01-05: `Issue #13717: os.walk() + print fails with
>   UnicodeEncodeError
>   `_
> * 2011-12-20: `Issue #13643: 'ascii' is a bad filesystem default
>   encoding `_
> * 2011-03-16: `issue #11574: TextIOWrapper should use UTF-8 by default
>   for the POSIX locale `_, thread on
>   python-dev: `Low-Level Encoding Behavior on Python 3
>   `_
> * 2010-04-26: `Issue #8533: regrtest: use backslashreplace error
>   handler for stdout `_, regrtest
>   fails with a Unicode encode error if the locale is POSIX
>
> Some issues are real bugs in applications which must explicitly set
> the encoding. Well, it just works in the common case (locale
> configured correctly), so what? The program "suddenly" fails when the
> POSIX locale is used (probably for bad reasons). Such bugs are not
> well understood by users. Examples of such issues:
>
> * 2013-11-21: `pip: open() uses the locale encoding to parse Python
>   script, instead of the encoding cookie
>   `_ -- pip must use the encoding
>   cookie to read a Python source code file
> * 2011-01-21: `IDLE 3.x can crash decoding recent file list
>   `_
>
>
> Prior Art
> =========
>
> Perl has a ``-C`` command line option and a ``PERL_UNICODE``
> environment variable to force UTF-8: see `perlrun
> `_.
It is possible to configure
> UTF-8 per standard stream, on input and output streams, etc.
>
>
> Post History
> ============
>
> * 2017-04: `[Python-Dev] Proposed BDFL Delegate update for PEPs 538 &
>   540 (assuming UTF-8 for *nix system boundaries)
>   `_
> * 2017-01: `[Python-ideas] PEP 540: Add a new UTF-8 mode
>   `_
> * 2017-01: `bpo-28180: Implementation of the PEP 538: coerce C locale
>   to C.utf-8 (msg284764) `_
> * 2016-08-17: `bpo-27781: Change sys.getfilesystemencoding() on
>   Windows to UTF-8 (msg272916) `_
>   -- Victor proposed ``-X utf8`` for the :pep:`529` (Change Windows
>   filesystem encoding to UTF-8)
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> guido%40python.org
>

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From victor.stinner at gmail.com Tue Dec 5 16:33:56 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 5 Dec 2017 22:33:56 +0100
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode
In-Reply-To: 
References: 
Message-ID: 

2017-12-05 22:18 GMT+01:00 Guido van Rossum :
> So I guess PEP acceptance week is over. :-(

My bad, Barry wrote "PEP Acceptance Day", not week, on twitter ;-)
https://twitter.com/pumpichank/status/937770805076905990

Victor

From chris.barker at noaa.gov Tue Dec 5 17:38:10 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 5 Dec 2017 14:38:10 -0800
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode
In-Reply-To: 
References: 
Message-ID: 

On Tue, Dec 5, 2017 at 1:18 PM, Guido van Rossum  wrote:
>
> I am very worried about this long and rambling PEP,

FWIW, I read the PEP on the bus this morning on a phone, and while
long, I didn't find it too rambling.
And this topic has been very often discussed in very long and rambling
mailing list threads, etc. So I think a long (if not rambling) PEP is
in order.

This is a very important topic for Python -- the py2-3 transition got a
LOT of flak, to the point of people claiming that it was easier to
learn a whole new language than to convert to py3 -- and THIS
particular issue was a big part of it:

The truth is that any system that does not use a clearly defined
encoding for filenames (and everything else) is broken, plain and
simple. But the other truth is (as talked about in the PEP) that some
*nix systems are that broken, because C code that simply passed around
char* still works fine. And no matter how you slice it, telling people
that they need to fix their broken system in order for your software to
run is not a popular option.

When Python added surrogateescape to its Unicode implementation, the
tools were there to work with broken (OK, I'll be charitable:
misconfigured) systems. Now we just need some easier defaults.

OK, now I'm getting long and rambling....

TL;DR -- The proposal in the PEP is an important step forward, and the
issue is fraught with enough history and controversy that a long PEP is
probably a good idea. So with the addition of a better summary of the
specification up at the top, and editing of the rest, we could have a
good PEP. Too late for this release, but what can you do?

> The "Unicode just works" summary is more a wish than a proper summary
> of the PEP.

well, yeah.

> FWIW the relationship with PEP 538 is also pretty unclear. (Or maybe
> that's another case of the forest and the trees.) And that PEP (while
> already accepted) also comes across as rambling and vague, and I have
> no idea what it actually does. And it seems to mention PEP 540 quite a
> few times.

I just took another look at 538 -- and yes, the relationship between
the two is really unclear. In particular, with 538, why do we need 540?
I honestly don't know.
-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From victor.stinner at gmail.com Tue Dec 5 17:50:57 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 5 Dec 2017 23:50:57 +0100
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode
In-Reply-To: 
References: 
Message-ID: 

Chris:
> I just took another look at 538 -- and yes, the relationship between
> the two is really unclear. In particular, with 538, why do we need
> 540? I honestly don't know.

The PEP 538 only impacts platforms which provide the C.UTF-8 locale or
a variant: only a few recent Linux distributions. I know Fedora has it;
maybe a few others have it?

FreeBSD and macOS are completely ignored by the PEP 538. The PEP 540
uses the UTF-8 encoding for the POSIX locale on *all* platforms.

Moreover, the PEP 538 only concerns the POSIX locale (locale "C"),
whereas the PEP 540 is usable with any locale. For example, using the
"fr_FR.iso88591" locale, the encoding is Latin1. But if you enable the
UTF-8 mode with this locale, Python will use UTF-8.

The other difference is that the PEP 538 is implemented with
setlocale(LC_CTYPE, "C.UTF-8"), whereas the PEP 540 is implemented in
Python internals and ignores the locale. The PEP 540 scope is limited
to Python: non-Python code running in the same process is not aware of
the "Python UTF-8 mode".
Victor From python at mrabarnett.plus.com Tue Dec 5 19:17:12 2017 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 6 Dec 2017 00:17:12 +0000 Subject: [Python-Dev] Zero-width matching in regexes In-Reply-To: References: <16f7437d-3e11-6a19-3569-e4d55a370744@mrabarnett.plus.com> Message-ID: On 2017-12-05 20:26, Terry Reedy wrote: > On 12/4/2017 6:21 PM, MRAB wrote: >> I've finally come to a conclusion as to what the "correct" behaviour of >> zero-width matches should be: """always return the first match, but >> never a zero-width match that is joined to a previous zero-width match""". > > Is this different from current re or regex? > Sometimes yes. It's difficult to know how a zero-width match should be handled. The normal way that, say, findall works is that it searches for a match and then continues from where it left off. If at any point it matched an empty string, it would stall because the end of the match is also the start of the match. How should that be handled? The old buggy behaviour of the re module was to just advance by one character after a zero-width match, which can result in a character being skipped and going missing. A solution is to prohibit a zero-width match that's joined to the previous match, but I'm not convinced that that's correct. >> If it's about to return a zero-width match that's joined to a previous >> zero-width match, then backtrack and keep on looking for a match. >> >> Example: >> >> >>> print([m.span() for m in re.finditer(r'|.', 'a')]) >> [(0, 0), (0, 1), (1, 1)] >> >> re.findall, re.split and re.sub should work accordingly. >> >> If re.finditer finds n matches, then re.split should return a list of >> n+1 strings and re.sub should make n replacements (excepting maxsplit, >> etc.). 
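[Editorial illustration, not part of the original message: the semantics the stdlib ``re`` module adopted for Python 3.7 are close to what is proposed above -- an empty match adjacent to a previous empty match is skipped, an empty match adjacent to a non-empty match is kept, and ``findall``, ``split`` and ``sub`` stay consistent with ``finditer``.]

```python
import re

# a* matches: an empty string before 'b', then 'aa', then an empty
# string at the end (kept because the previous match was non-empty).
spans = [m.span() for m in re.finditer(r"a*", "baa")]
print(spans)                        # [(0, 0), (1, 3), (3, 3)]

# n matches -> n items from findall, n replacements from sub, and
# n+1 pieces from split.
print(re.findall(r"a*", "baa"))     # ['', 'aa', '']
print(re.sub(r"a*", "-", "baa"))    # -b--
print(re.split(r"a*", "baa"))       # ['', 'b', '', '']
```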
> From victor.stinner at gmail.com Tue Dec 5 19:49:41 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 6 Dec 2017 01:49:41 +0100
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)
Message-ID: 

Hi,

I knew that I had to rewrite my PEP 540, but I was too lazy. Since Guido explicitly requested a shorter PEP, here it is!

https://www.python.org/dev/peps/pep-0540/

Trust me, it's the same PEP, but focused on the most important information and with a shorter rationale ;-) Full text below.

Victor

PEP: 540
Title: Add a new UTF-8 mode
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner
BDFL-Delegate: INADA Naoki
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 5-January-2016
Python-Version: 3.7

Abstract
========

Add a new UTF-8 mode to ignore the locale and use the UTF-8 encoding with the ``surrogateescape`` error handler. This mode is enabled by default in the POSIX locale, but otherwise disabled by default.

Also add a "strict" UTF-8 mode which uses the ``strict`` error handler, instead of ``surrogateescape``, with the UTF-8 encoding.

The new ``-X utf8`` command line option and ``PYTHONUTF8`` environment variable are added to control the UTF-8 mode.

Rationale
=========

Locale encoding and UTF-8
-------------------------

Python 3.6 uses the locale encoding for filenames, environment variables, standard streams, etc. The locale encoding is inherited from the locale; the encoding and the locale are tightly coupled.

Many users inherit the ASCII encoding from the POSIX locale, aka the "C" locale, but are unable to change the locale for various reasons. This encoding is very limited in terms of Unicode support: any non-ASCII character is likely to cause trouble. For example, the Alpine Linux distribution became popular thanks to Docker containers, but it uses the POSIX locale by default.

It is not easy to get the expected locale. Locales don't have the exact same name on all Linux distributions, FreeBSD, macOS, etc.
Some locales, like the recent ``C.UTF-8`` locale, are only supported by a few platforms. For example, an SSH connection can use a different encoding than the filesystem or terminal encoding of the local host.

On the other hand, Python 3.6 is already using UTF-8 by default on macOS, Android and Windows (PEP 529) for most functions, except for ``open()``. UTF-8 is also the default encoding of Python scripts, XML and JSON file formats. The Go programming language uses UTF-8 for strings.

When all data is stored as UTF-8 but the locale is often misconfigured, an obvious solution is to ignore the locale and use UTF-8.

Passthrough of undecodable bytes: surrogateescape
-------------------------------------------------

Using UTF-8 is nice, until you read the first file encoded in a different encoding. When using the ``strict`` error handler, which is the default, Python 3 raises a ``UnicodeDecodeError`` on the first undecodable byte.

Unix command line tools like ``cat`` or ``grep`` and most Python 2 applications simply do not have this class of bugs: they don't decode data, but process data as a raw byte sequence.

Python 3 already has a solution to behave like Unix tools and Python 2: the ``surrogateescape`` error handler (:pep:`383`). It allows processing data "as bytes" while using Unicode in practice (undecodable bytes are stored as surrogate characters).

For an application written as a Unix "pipe" tool like ``grep``, taking input on stdin and writing output to stdout, ``surrogateescape`` allows undecodable bytes to "pass through". The UTF-8 encoding used with the ``surrogateescape`` error handler is a compromise between correctness and usability.

Strict UTF-8 for correctness
----------------------------

When correctness matters more than usability, the ``strict`` error handler is preferred over ``surrogateescape`` to raise an encoding error at the first undecodable byte or unencodable character.
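The ``surrogateescape`` round trip described above can be demonstrated in a few lines (a minimal illustration; the byte 0xE9 is 'é' in Latin-1 and is not valid UTF-8):

```python
data = b'h\xe9llo'  # Latin-1 bytes; 0xE9 is not valid UTF-8

# The default 'strict' handler refuses the data outright.
try:
    data.decode('utf-8')
except UnicodeDecodeError as exc:
    print('strict handler:', exc.reason)

# 'surrogateescape' smuggles each undecodable byte through as a lone
# surrogate in the range U+DC80-U+DCFF (here 0xE9 -> U+DCE9) ...
text = data.decode('utf-8', 'surrogateescape')
print(ascii(text))  # 'h\udce9llo'

# ... and encoding with the same handler restores the original bytes,
# which is what lets a "pipe" tool pass undecodable data through intact.
assert text.encode('utf-8', 'surrogateescape') == data
```

This is exactly the PEP 383 mechanism the PEP builds on: the data survives a decode/encode round trip even though it was never valid UTF-8.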
No change by default for best backward compatibility
----------------------------------------------------

While UTF-8 is perfect in most cases, sometimes the locale encoding is actually the best encoding.

This PEP changes the behaviour for the POSIX locale, since this locale usually gives the ASCII encoding, whereas UTF-8 is a much better choice. It does not change the behaviour for other locales, to prevent any risk of regression.

Since users must explicitly enable the new UTF-8 mode, they are responsible for any potential mojibake issues caused by this mode.

Proposal
========

Add a new UTF-8 mode to ignore the locale and use the UTF-8 encoding with the ``surrogateescape`` error handler. This mode is enabled by default in the POSIX locale, but otherwise disabled by default.

Also add a "strict" UTF-8 mode which uses the ``strict`` error handler, instead of ``surrogateescape``, with the UTF-8 encoding.

The new ``-X utf8`` command line option and ``PYTHONUTF8`` environment variable are added to control the UTF-8 mode:

* The UTF-8 mode is enabled by ``-X utf8`` or ``PYTHONUTF8=1``
* The strict UTF-8 mode is configured by ``-X utf8=strict`` or ``PYTHONUTF8=strict``

The POSIX locale enables the UTF-8 mode. In this case, the UTF-8 mode can be explicitly disabled by ``-X utf8=0`` or ``PYTHONUTF8=0``.

For standard streams, the ``PYTHONIOENCODING`` environment variable has priority over the UTF-8 mode. On Windows, the ``PYTHONLEGACYWINDOWSFSENCODING`` environment variable (:pep:`529`) has priority over the UTF-8 mode.

Backward Compatibility
======================

The only backward incompatible change is that the UTF-8 encoding is now used for the POSIX locale.

Annex: Encodings And Error Handlers
===================================

The UTF-8 mode changes the default encoding and error handler used by ``open()``, ``os.fsdecode()``, ``os.fsencode()``, ``sys.stdin``, ``sys.stdout`` and ``sys.stderr``.
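As ultimately shipped in Python 3.7, the mode is visible from inside the interpreter via ``sys.flags.utf8_mode``. A small probe that compares a child interpreter with and without ``-X utf8`` (this assumes a 3.7+ ``python``; the default result depends on your locale and environment):

```python
import subprocess
import sys

# Ask a child interpreter whether the UTF-8 mode is active.
probe = "import sys; print(sys.flags.utf8_mode)"

for extra in ([], ["-X", "utf8"]):
    result = subprocess.run(
        [sys.executable, *extra, "-c", probe],
        capture_output=True, text=True,
    )
    print(extra or "(default)", "-> utf8_mode =", result.stdout.strip())
```

With ``-X utf8`` the child always reports ``1``; without it, the value depends on the locale (the POSIX locale enables the mode) and on ``PYTHONUTF8``.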
Encoding and error handler
--------------------------

============================ ======================= ========================== ==========================
Function                     Default                 UTF-8 mode or POSIX locale Strict UTF-8 mode
============================ ======================= ========================== ==========================
open()                       locale/strict           **UTF-8/surrogateescape**  **UTF-8**/strict
os.fsdecode(), os.fsencode() locale/surrogateescape  **UTF-8**/surrogateescape  **UTF-8**/surrogateescape
sys.stdin, sys.stdout        locale/strict           **UTF-8/surrogateescape**  **UTF-8**/strict
sys.stderr                   locale/backslashreplace **UTF-8**/backslashreplace **UTF-8**/backslashreplace
============================ ======================= ========================== ==========================

By comparison, Python 3.6 uses:

============================ ======================= ==========================
Function                     Default                 POSIX locale
============================ ======================= ==========================
open()                       locale/strict           locale/strict
os.fsdecode(), os.fsencode() locale/surrogateescape  locale/surrogateescape
sys.stdin, sys.stdout        locale/strict           locale/**surrogateescape**
sys.stderr                   locale/backslashreplace locale/backslashreplace
============================ ======================= ==========================

Encoding and error handler on Windows
-------------------------------------

On Windows, the encodings and error handlers are different:

============================ ======================= ========================== ========================== ==========================
Function                     Default                 Legacy Windows FS encoding UTF-8 mode                 Strict UTF-8 mode
============================ ======================= ========================== ========================== ==========================
open()                       mbcs/strict             mbcs/strict                **UTF-8/surrogateescape**  **UTF-8**/strict
os.fsdecode(), os.fsencode() UTF-8/surrogatepass     **mbcs/replace**           UTF-8/surrogatepass        UTF-8/surrogatepass
sys.stdin, sys.stdout        UTF-8/surrogateescape   UTF-8/surrogateescape      UTF-8/surrogateescape      **UTF-8/strict**
sys.stderr                   UTF-8/backslashreplace  UTF-8/backslashreplace     UTF-8/backslashreplace     UTF-8/backslashreplace
============================ ======================= ========================== ========================== ==========================

By comparison, Python 3.6 uses:

============================ ======================= ==========================
Function                     Default                 Legacy Windows FS encoding
============================ ======================= ==========================
open()                       mbcs/strict             mbcs/strict
os.fsdecode(), os.fsencode() UTF-8/surrogatepass     **mbcs/replace**
sys.stdin, sys.stdout        UTF-8/surrogateescape   UTF-8/surrogateescape
sys.stderr                   UTF-8/backslashreplace  UTF-8/backslashreplace
============================ ======================= ==========================

The "Legacy Windows FS encoding" is enabled by the ``PYTHONLEGACYWINDOWSFSENCODING`` environment variable.

If stdin and/or stdout is redirected to a pipe, ``sys.stdin`` and/or ``sys.stdout`` use the ``mbcs`` encoding by default rather than UTF-8. But in the UTF-8 mode, ``sys.stdin`` and ``sys.stdout`` always use the UTF-8 encoding.

.. note::
   There is no POSIX locale on Windows. The ANSI code page is used as the locale encoding, and this code page never uses the ASCII encoding.

Annex: Differences between the PEP 538 and the PEP 540
======================================================

The PEP 538 uses the "C.UTF-8" locale, which is quite new and only supported by a few Linux distributions; this locale is not currently supported by FreeBSD or macOS, for example. This PEP 540 supports all operating systems.

The PEP 538 only changes the behaviour for the POSIX locale. While the new UTF-8 mode of this PEP is only enabled by the POSIX locale, it can be enabled manually for any other locale.

The PEP 538 is implemented with ``setlocale(LC_CTYPE, "C.UTF-8")``: any non-Python code running in the process is impacted by this change.
This PEP is implemented in Python internals and ignores the locale: non-Python running in the same process is not aware of the "Python UTF-8 mode". Links ===== * `bpo-29240: Implementation of the PEP 540: Add a new UTF-8 mode `_ * `PEP 538 `_: "Coercing the legacy C locale to C.UTF-8" * `PEP 529 `_: "Change Windows filesystem encoding to UTF-8" * `PEP 528 `_: "Change Windows console encoding to UTF-8" * `PEP 383 `_: "Non-decodable Bytes in System Character Interfaces" Post History ============ * 2017-12: `[Python-Dev] PEP 540: Add a new UTF-8 mode `_ * 2017-04: `[Python-Dev] Proposed BDFL Delegate update for PEPs 538 & 540 (assuming UTF-8 for *nix system boundaries) `_ * 2017-01: `[Python-ideas] PEP 540: Add a new UTF-8 mode `_ * 2017-01: `bpo-28180: Implementation of the PEP 538: coerce C locale to C.utf-8 (msg284764) `_ * 2016-08-17: `bpo-27781: Change sys.getfilesystemencoding() on Windows to UTF-8 (msg272916) `_ -- Victor proposed ``-X utf8`` for the :pep:`529` (Change Windows filesystem encoding to UTF-8) Copyright ========= This document has been placed in the public domain. From victor.stinner at gmail.com Tue Dec 5 20:01:28 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 6 Dec 2017 02:01:28 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: Message-ID: > Annex: Differences between the PEP 538 and the PEP 540 > ====================================================== > > The PEP 538 uses the "C.UTF-8" locale which is quite new and only > supported by a few Linux distributions; this locale is not currently > supported by FreeBSD or macOS for example. This PEP 540 supports all > operating systems. > > The PEP 538 only changes the behaviour for the POSIX locale. While the > new UTF-8 mode of this PEP is only enabled by the POSIX locale, it can > be enabled manually for any other locale. 
>
> The PEP 538 is implemented with ``setlocale(LC_CTYPE, "C.UTF-8")``: any
> non-Python code running in the process is impacted by this change. This
> PEP is implemented in Python internals and ignores the locale:
> non-Python running in the same process is not aware of the "Python UTF-8
> mode".

The main advantage of the PEP 538 *over* the PEP 540 is that, for the POSIX locale, non-Python code running in the same process gets the UTF-8 encoding.

To be honest, I'm not sure that there is a lot of code in the wild which uses "text" types like the C type wchar_t* and relies on the locale encoding. Almost all C libraries handle data as bytes using the char* type, like filenames and environment variables.

At first I understood that the PEP 538 changed the locale encoding using an environment variable. But no, it's implemented with setlocale(LC_CTYPE, "C.UTF-8"), which only impacts the current process and is not inherited by child processes. So I'm not sure anymore that PEP 538 and PEP 540 are really complementary.

I'm not sure how PyGTK interacts with the PEP 538, for example. Does it use UTF-8 with the POSIX locale?

Victor

From ncoghlan at gmail.com Tue Dec 5 21:31:12 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 6 Dec 2017 12:31:12 +1000
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)
In-Reply-To: 
References: 
Message-ID: 

On 6 December 2017 at 11:01, Victor Stinner wrote:
>> Annex: Differences between the PEP 538 and the PEP 540
>> ======================================================
>>
>> The PEP 538 uses the "C.UTF-8" locale which is quite new and only
>> supported by a few Linux distributions; this locale is not currently
>> supported by FreeBSD or macOS for example. This PEP 540 supports all
>> operating systems.
>>
>> The PEP 538 only changes the behaviour for the POSIX locale. While the
>> new UTF-8 mode of this PEP is only enabled by the POSIX locale, it can
>> be enabled manually for any other locale.
>> >> The PEP 538 is implemented with ``setlocale(LC_CTYPE, "C.UTF-8")``: any >> non-Python code running in the process is impacted by this change. This >> PEP is implemented in Python internals and ignores the locale: >> non-Python running in the same process is not aware of the "Python UTF-8 >> mode". I submitted a PR to reword this part: https://github.com/python/peps/pull/493 > The main advantage of the PEP 538 ?over* the PEP 540 is that, for the > POSIX locale, non-Python code running in the same process gets the > UTF-8 encoding. > > To be honest, I'm not sure that there is a lot of code in the wild > which uses "text" types like the C type wchar_t* and rely on the > locale encoding. Almost all C library handle data as bytes using the > char* type, like filenames and environment variables. At the very least, GNU readline breaks if you don't change the locale setting: https://www.python.org/dev/peps/pep-0538/#considering-locale-coercion-independently-of-utf-8-mode Given that we found an example of this directly in the standard library, I assume that there are plenty more in third party extension modules (especially once we take C++ extensions into account, not just C ones). > First I understood that the PEP 538 changed the locale encoding using > an environment variable. But no, it's implemented with > setlocale(LC_CTYPE, "C.UTF-8") which only impacts the current process > and is not inherited by child processes. So I'm not sure anymore that > PEP 538 and PEP 540 are really complementary. It sets the LC_CTYPE environment variable as well: https://www.python.org/dev/peps/pep-0538/#explicitly-setting-lc-ctype-for-utf-8-locale-coercion The relevant code is in _coerce_default_locale_settings (currently at https://github.com/python/cpython/blob/master/Python/pylifecycle.c#L448) > I'm not sure how PyGTK interacts with the PEP 538 for example. Does it > use UTF-8 with the POSIX locale? 
Desktop environments aim not to get into this situation in the first place by ensuring they're using a more appropriate locale :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From nad at python.org Tue Dec 5 21:29:55 2017 From: nad at python.org (Ned Deily) Date: Tue, 5 Dec 2017 21:29:55 -0500 Subject: [Python-Dev] [RELEASE] Python 3.6.4rc1 and 3.7.0a3 now available for testing Message-ID: Announcing the immediate availability of Python 3.6.4 release candidate 1 and of Python 3.7.0 alpha 3! Python 3.6.4rc1 is the first release candidate for Python 3.6.4, the next maintenance release of Python 3.6. While 3.6.4rc1 is a preview release and, thus, not intended for production environments, we encourage you to explore it and provide feedback via the Python bug tracker (https://bugs.python.org). 3.6.4 is planned for final release on 2017-12-18 with the next maintenance release expected to follow in about 3 months. You can find Python 3.6.4rc1 and more information here: https://www.python.org/downloads/release/python-364rc1/ Python 3.7.0a3 is the third of four planned alpha releases of Python 3.7, the next feature release of Python. During the alpha phase, Python 3.7 remains under heavy development: additional features will be added and existing features may be modified or deleted. Please keep in mind that this is a preview release and its use is not recommended for production environments. The next preview release, 3.7.0a4, is planned for 2018-01-08. You can find Python 3.7.0a3 and more information here: https://www.python.org/downloads/release/python-370a3/ -- Ned Deily nad at python.org -- [] From ericsnowcurrently at gmail.com Tue Dec 5 21:51:17 2017 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 5 Dec 2017 19:51:17 -0700 Subject: [Python-Dev] PEP 554 v4 (new interpreters module) Message-ID: Hi all, I've finally updated PEP 554. Feedback would be most welcome. 
The PEP is in a pretty good place now and I hope we're close to a decision to accept it. :)

In addition to resolving the open questions, I've also made the following changes to the PEP:

* put an API summary at the top and moved the full API description down
* added the "is_shareable()" function to indicate if an object can be shared
* added None as a shareable object

Regarding the open questions:

* "Leaking exceptions across interpreters"

I chose to go with an approach that effectively creates a traceback.TracebackException proxy of the original exception, wraps that in a RuntimeError, and raises that in the calling interpreter. Raising an exception that safely preserves the original exception and traceback seems like the most intuitive behavior (to me, as a user). The only alternative that made sense is to fully duplicate the exception and traceback (minus stack frames) in the calling interpreter, which is probably overkill and likely to be confusing.

* "Initial support for buffers in channels"

I chose to add a "SendChannel.send_buffer(obj)" method for this. Supporting buffer objects from the beginning makes sense, opening good experimentation opportunities for a valuable set of users. Supporting buffer objects separately and explicitly helps set clear expectations for users. I decided not to go with a separate class (e.g. MemChannel) as it didn't seem like there's enough difference to warrant keeping them strictly separate.

FWIW, I'm still strongly in favor of support for passing (copies of) bytes objects via channels. Passing objects to SendChannel.send() is obvious. Limiting it, for now, to bytes (and None) helps us avoid tying ourselves strongly to any particular implementation (it seems like all the reservations were relative to the implementation). So I do not see a reason to wait.

* "Pass channels explicitly to run()?"

I've applied the suggested solution (make "channels" an explicit keyword argument).
-eric

I've included the latest full text (https://www.python.org/dev/peps/pep-0554/) below:

+++++++++++++++++++++++++++++++++++++++++++++++++

PEP: 554
Title: Multiple Interpreters in the Stdlib
Author: Eric Snow
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2017-09-05
Python-Version: 3.7
Post-History: 07-Sep-2017, 08-Sep-2017, 13-Sep-2017, 05-Dec-2017

Abstract
========

CPython has supported multiple interpreters in the same process (AKA "subinterpreters") since version 1.5. The feature has been available via the C-API. [c-api]_ Subinterpreters operate in `relative isolation from one another `_, which provides the basis for an `alternative concurrency model `_.

This proposal introduces the stdlib ``interpreters`` module. The module will be `provisional `_. It exposes the basic functionality of subinterpreters already provided by the C-API, along with new functionality for sharing data between interpreters.

Proposal
========

The ``interpreters`` module will be added to the stdlib. It will provide a high-level interface to subinterpreters and wrap a new low-level ``_interpreters`` (in the same way as the ``threading`` module). See the `Examples`_ section for concrete usage and use cases.

Along with exposing the existing (in CPython) subinterpreter support, the module will also provide a mechanism for sharing data between interpreters. This mechanism centers around "channels", which are similar to queues and pipes.

Note that *objects* are not shared between interpreters since they are tied to the interpreter in which they were created. Instead, the objects' *data* is passed between interpreters. See the `Shared data`_ section for more details about sharing between interpreters.

At first only the following types will be supported for sharing:

* None
* bytes
* PEP 3118 buffer objects (via ``send_buffer()``)

Support for other basic types (e.g. int, Ellipsis) will be added later.
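Since the ``interpreters`` module does not exist yet, the channel semantics above (copying *data* rather than sharing objects, and restricting payloads to shareable types) can be sketched today with ordinary threads and a ``queue.Queue``. The class and method names below merely mirror the PEP; this is an illustrative stand-in, not the real API, and the real module would run the receiver in a separate interpreter:

```python
import queue
import threading

SHAREABLE = (bytes, type(None))  # the PEP's initial shareable types

class SendChannel:
    """Toy stand-in for the proposed SendChannel: data is copied, not shared."""
    def __init__(self, q):
        self._q = q

    def send(self, obj):
        if not isinstance(obj, SHAREABLE):
            raise TypeError(f"not shareable: {type(obj).__name__}")
        # Copy the payload so the receiver never touches the sender's object.
        self._q.put(None if obj is None else bytes(obj))

class RecvChannel:
    """Toy stand-in for the proposed RecvChannel."""
    def __init__(self, q):
        self._q = q

    def recv(self):
        return self._q.get()  # blocks until something is sent

def create_channel():
    q = queue.Queue()
    return RecvChannel(q), SendChannel(q)

r, s = create_channel()
t = threading.Thread(target=lambda: print("received:", r.recv()))
t.start()
s.send(b"ping")      # bytes are shareable
t.join()
try:
    s.send([1, 2, 3])  # a list is not
except TypeError as exc:
    print(exc)
```

The type check plays the role of the PEP's ``is_shareable()``, and the explicit ``bytes(obj)`` copy mirrors the rule that only an object's *data* crosses the channel.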
API summary for interpreters module ----------------------------------- Here is a summary of the API for the ``interpreters`` module. For a more in-depth explanation of the proposed classes and functions, see the `"interpreters" Module API`_ section below. For creating and using interpreters: +------------------------------+----------------------------------------------+ | signature | description | +============================+=+==============================================+ | list_all() -> [Intepreter] | Get all existing interpreters. | +------------------------------+----------------------------------------------+ | get_current() -> Interpreter | Get the currently running interpreter. | +------------------------------+----------------------------------------------+ | create() -> Interpreter | Initialize a new (idle) Python interpreter. | +------------------------------+----------------------------------------------+ | +-----------------------+-----------------------------------------------------+ | signature | description | +=======================+=====================================================+ | class Interpreter(id) | A single interpreter. | +-----------------------+-----------------------------------------------------+ | .id | The interpreter's ID (read-only). | +-----------------------+-----------------------------------------------------+ | .is_running() -> Bool | Is the interpreter currently executing code? | +-----------------------+-----------------------------------------------------+ | .destroy() | Finalize and destroy the interpreter. | +-----------------------+-----------------------------------------------------+ | .run(src_str, /, \*, | | Run the given source code in the interpreter. | | channels=None) | | (This blocks the current thread until done.) 
| +-----------------------+-----------------------------------------------------+ For sharing data between interpreters: +--------------------------------+--------------------------------------------+ | signature | description | +================================+============================================+ | is_shareable(obj) -> Bool | | Can the object's data be shared | | | | between interpreters? | +--------------------------------+--------------------------------------------+ | create_channel() -> | | Create a new channel for passing | | (RecvChannel, SendChannel) | | data between interpreters. | +--------------------------------+--------------------------------------------+ | list_all_channels() -> | Get all open channels. | | [(RecvChannel, SendChannel)] | | +--------------------------------+--------------------------------------------+ | +-------------------------------+-----------------------------------------------+ | signature | description | +===============================+===============================================+ | class RecvChannel(id) | The receiving end of a channel. | +-------------------------------+-----------------------------------------------+ | .id | The channel's unique ID. | +-------------------------------+-----------------------------------------------+ | .interpreters | The list of associated interpreters. | +-------------------------------+-----------------------------------------------+ | .recv() -> object | | Get the next object from the channel, | | | | and wait if none have been sent. | | | | Associate the interpreter with the channel. | +-------------------------------+-----------------------------------------------+ | .recv_nowait(default=None) -> | | Like recv(), but return the default | | object | | instead of waiting. | +-------------------------------+-----------------------------------------------+ | .close() | | No longer associate the current interpreter | | | | with the channel (on the receiving end). 
| +-------------------------------+-----------------------------------------------+ | +---------------------------+-------------------------------------------------+ | signature | description | +===========================+=================================================+ | class SendChannel(id) | The sending end of a channel. | +---------------------------+-------------------------------------------------+ | .id | The channel's unique ID. | +---------------------------+-------------------------------------------------+ | .interpreters | The list of associated interpreters. | +---------------------------+-------------------------------------------------+ | .send(obj) | | Send the object (i.e. its data) to the | | | | receiving end of the channel and wait. | | | | Associate the interpreter with the channel. | +---------------------------+-------------------------------------------------+ | .send_nowait(obj) | | Like send(), but Fail if not received. | +---------------------------+-------------------------------------------------+ | .send_buffer(obj) | | Send the object's (PEP 3118) buffer to the | | | | receiving end of the channel and wait. | | | | Associate the interpreter with the channel. | +---------------------------+-------------------------------------------------+ | .send_buffer_nowait(obj) | | Like send_buffer(), but fail if not received. | +---------------------------+-------------------------------------------------+ | .close() | | No longer associate the current interpreter | | | | with the channel (on the sending end). 
| +---------------------------+-------------------------------------------------+ Examples ======== Run isolated code ----------------- :: interp = interpreters.create() print('before') interp.run('print("during")') print('after') Run in a thread --------------- :: interp = interpreters.create() def run(): interp.run('print("during")') t = threading.Thread(target=run) print('before') t.start() print('after') Pre-populate an interpreter --------------------------- :: interp = interpreters.create() interp.run(tw.dedent(""" import some_lib import an_expensive_module some_lib.set_up() """)) wait_for_request() interp.run(tw.dedent(""" some_lib.handle_request() """)) Handling an exception --------------------- :: interp = interpreters.create() try: interp.run(tw.dedent(""" raise KeyError """)) except KeyError: print("got the error from the subinterpreter") Synchronize using a channel --------------------------- :: interp = interpreters.create() r, s = interpreters.create_channel() def run(): interp.run(tw.dedent(""" reader.recv() print("during") reader.close() """), reader=r)) t = threading.Thread(target=run) print('before') t.start() print('after') s.send(b'') s.close() Sharing a file descriptor ------------------------- :: interp = interpreters.create() r1, s1 = interpreters.create_channel() r2, s2 = interpreters.create_channel() def run(): interp.run(tw.dedent(""" fd = int.from_bytes( reader.recv(), 'big') for line in os.fdopen(fd): print(line) writer.send(b'') """), reader=r1, writer=s2) t = threading.Thread(target=run) t.start() with open('spamspamspam') as infile: fd = infile.fileno().to_bytes(1, 'big') s.send(fd) r.recv() Passing objects via marshal --------------------------- :: interp = interpreters.create() r, s = interpreters.create_fifo() interp.run(tw.dedent(""" import marshal """), reader=r) def run(): interp.run(tw.dedent(""" data = reader.recv() while data: obj = marshal.loads(data) do_something(obj) data = reader.recv() reader.close() """), reader=r) t = 
threading.Thread(target=run) t.start() for obj in input: data = marshal.dumps(obj) s.send(data) s.send(b'')

Passing objects via pickle
--------------------------

::

    interp = interpreters.create()
    r, s = interpreters.create_channel()
    interp.run(tw.dedent("""
        import pickle
        """),
        reader=r)

    def run():
        interp.run(tw.dedent("""
            data = reader.recv()
            while data:
                obj = pickle.loads(data)
                do_something(obj)
                data = reader.recv()
            reader.close()
            """),
            reader=r)
    t = threading.Thread(target=run)
    t.start()

    for obj in input:
        data = pickle.dumps(obj)
        s.send(data)
    s.send(b'')

Running a module
----------------

::

    interp = interpreters.create()
    main_module = mod_name
    interp.run(f'import runpy; runpy.run_module({main_module!r})')

Running as script (including zip archives & directories)
--------------------------------------------------------

::

    interp = interpreters.create()
    main_script = path_name
    interp.run(f"import runpy; runpy.run_path({main_script!r})")

Running in a thread pool executor
---------------------------------

::

    interps = [interpreters.create() for i in range(5)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(interps)) as pool:
        print('before')
        for interp in interps:
            pool.submit(interp.run, 'print("starting"); print("stopping")')
        print('after')

Rationale
=========

Running code in multiple interpreters provides a useful level of isolation within the same process. This can be leveraged in a number of ways. Furthermore, subinterpreters provide a well-defined framework in which such isolation may be extended.

Nick Coghlan explained some of the benefits through a comparison with multi-processing [benefits]_::

    [I] expect that communicating between subinterpreters is going to end up looking an awful lot like communicating between subprocesses via shared memory.
    The trade-off between the two models will then be that one still just looks like a single process from the point of view of the outside world, and hence doesn't place any extra demands on the underlying OS beyond those required to run CPython with a single interpreter, while the other gives much stricter isolation (including isolating C globals in extension modules), but also demands much more from the OS when it comes to its IPC capabilities.

    The security risk profiles of the two approaches will also be quite different, since using subinterpreters won't require deliberately poking holes in the process isolation that operating systems give you by default.

CPython has supported subinterpreters, with increasing levels of support, since version 1.5. While the feature has the potential to be a powerful tool, subinterpreters have suffered from neglect because they are not available directly from Python. Exposing the existing functionality in the stdlib will help reverse the situation.

This proposal is focused on enabling the fundamental capability of multiple isolated interpreters in the same Python process. This is a new area for Python, so there is relative uncertainty about the best tools to provide as companions to subinterpreters. Thus we minimize the functionality we add in the proposal as much as possible.

Concerns
--------

* "subinterpreters are not worth the trouble"

Some have argued that subinterpreters do not add sufficient benefit to justify making them an official part of Python. Adding features to the language (or stdlib) has a cost in increasing the size of the language. So an addition must pay for itself.

In this case, subinterpreters provide a novel concurrency model focused on isolated threads of execution. Furthermore, they provide an opportunity for changes in CPython that will allow simultaneous use of multiple CPU cores (currently prevented by the GIL).

Alternatives to subinterpreters include threading, async, and multiprocessing.
Threading is limited by the GIL and async isn't the right solution for
every problem (nor for every person).  Multiprocessing is likewise
valuable in some but not all situations.  Direct IPC (rather than via
the multiprocessing module) provides similar benefits but with the
same caveat.

Notably, subinterpreters are not intended as a replacement for any of
the above.  Certainly they overlap in some areas, but the benefits of
subinterpreters include isolation and (potentially) performance.  In
particular, subinterpreters provide a direct route to an alternate
concurrency model (e.g. CSP) which has found success elsewhere and
will appeal to some Python users.  That is the core value that the
``interpreters`` module will provide.

* "stdlib support for subinterpreters adds extra burden on C extension
  authors"

In the `Interpreter Isolation`_ section below we identify ways in
which isolation in CPython's subinterpreters is incomplete.  Most
notable are extension modules that use C globals to store internal
state.  PEP 3121 and PEP 489 provide a solution for most of the
problem, but one still remains. [petr-c-ext]_  Until that is resolved,
C extension authors will face extra difficulty in supporting
subinterpreters.

Consequently, projects that publish extension modules may face an
increased maintenance burden as their users start using
subinterpreters, where their modules may break.  This situation is
limited to modules that use C globals (or use libraries that use C
globals) to store internal state.  For numpy, the reported-bug rate is
one every 6 months. [bug-rate]_

Ultimately this comes down to a question of how often it will be a
problem in practice: how many projects would be affected, how often
their users will be affected, what the additional maintenance burden
will be for projects, and what the overall benefit of subinterpreters
is to offset those costs.
The position of this PEP is that the actual extra maintenance burden
will be small and well below the threshold at which subinterpreters
are worth it.

About Subinterpreters
=====================

Concurrency
-----------

Concurrency is a challenging area of software development.  Decades of
research and practice have led to a wide variety of concurrency
models, each with different goals.  Most center on correctness and
usability.

One class of concurrency models focuses on isolated threads of
execution that interoperate through some message passing scheme.  A
notable example is `Communicating Sequential Processes`_ (CSP), upon
which Go's concurrency is based.  The isolation inherent to
subinterpreters makes them well-suited to this approach.

Shared data
-----------

Subinterpreters are inherently isolated (with caveats explained
below), in contrast to threads.  So the same
communicate-via-shared-memory approach doesn't work.  Without an
alternative, effective use of concurrency via subinterpreters is
significantly limited.

The key challenge here is that sharing objects between interpreters
faces complexity due to various constraints on object ownership,
visibility, and mutability.  At a conceptual level it's easier to
reason about concurrency when objects only exist in one interpreter
at a time.  At a technical level, CPython's current memory model
limits how Python *objects* may be shared safely between
interpreters; effectively objects are bound to the interpreter in
which they were created.  Furthermore the complexity of *object*
sharing increases as subinterpreters become more isolated, e.g. after
GIL removal.  Consequently, the mechanism for sharing needs to be
carefully considered.

There are a number of valid solutions, several of which may be
appropriate to support in Python.  This proposal provides a single
basic solution: "channels".  Ultimately, any other solution will look
similar to the proposed one, which will set the precedent.
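Since the proposed channels do not exist yet, the
communicate-via-messages (rather than share-via-memory) style they
enable can only be approximated today.  The following sketch uses a
thread plus a ``queue.Queue`` as a rough stand-in for a channel; note
the analogy is imperfect, since queues buffer, channels would not, and
real isolation requires separate interpreters rather than threads::

```python
import queue
import threading

def worker(inbox, results):
    # Receive serialized data until an empty sentinel arrives,
    # mirroring the channel recv() loops in the earlier examples.
    while True:
        data = inbox.get()
        if not data:
            break
        results.append(data.decode())

inbox = queue.Queue()
results = []
t = threading.Thread(target=worker, args=(inbox, results))
t.start()

for msg in (b'spam', b'eggs'):
    inbox.put(msg)
inbox.put(b'')  # sentinel: no more data
t.join()
print(results)
```

The point of the pattern is that the worker only ever sees data handed
to it explicitly, which is the discipline channels would enforce.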
Note that the implementation of ``Interpreter.run()`` can be done in
a way that allows for multiple solutions to coexist, but doing so is
not technically a part of the proposal here.

Regarding the proposed solution, "channels", it is a basic, opt-in
data sharing mechanism that draws inspiration from pipes, queues, and
CSP's channels. [fifo]_

As simply described earlier by the API summary, channels have two
operations: send and receive.  A key characteristic of those
operations is that channels transmit data derived from Python objects
rather than the objects themselves.  When objects are sent, their data
is extracted.  When the "object" is received in the other interpreter,
the data is converted back into an object.

To make this work, the mutable shared state will be managed by the
Python runtime, not by any of the interpreters.  Initially we will
support only one type of object for shared state: the channels
provided by ``create_channel()``.  Channels, in turn, will carefully
manage passing objects between interpreters.

This approach, including keeping the API minimal, helps us avoid
further exposing any underlying complexity to Python users.  Along
those same lines, we will initially restrict the types that may be
passed through channels to the following:

* None
* bytes
* PEP 3118 buffer objects (via ``send_buffer()``)

Limiting the initial shareable types is a practical matter, reducing
the potential complexity of the initial implementation.  There are a
number of strategies we may pursue in the future to expand supported
objects and object sharing strategies.

Interpreter Isolation
---------------------

CPython's interpreters are intended to be strictly isolated from each
other.  Each interpreter has its own copy of all modules, classes,
functions, and variables.  The same applies to state in C, including
in extension modules.  The CPython C-API docs explain more. [caveats]_

However, there are ways in which interpreters share some state.
First of all, some process-global state remains shared:

* file descriptors
* builtin types (e.g. dict, bytes)
* singletons (e.g. None)
* underlying static module data (e.g. functions) for
  builtin/extension/frozen modules

There are no plans to change this.

Second, some isolation is faulty due to bugs or implementations that
did not take subinterpreters into account.  This includes things like
extension modules that rely on C globals. [cryptography]_  In these
cases bugs should be opened (some are already):

* readline module hook functions (http://bugs.python.org/issue4202)
* memory leaks on re-init (http://bugs.python.org/issue21387)

Finally, some potential isolation is missing due to the current design
of CPython.  Improvements are currently going on to address gaps in
this area:

* interpreters share the GIL
* interpreters share memory management (e.g. allocators, gc)
* GC is not run per-interpreter [global-gc]_
* at-exit handlers are not run per-interpreter [global-atexit]_
* extensions using the ``PyGILState_*`` API are incompatible
  [gilstate]_

Existing Usage
--------------

Subinterpreters are not a widely used feature.  In fact, the only
documented cases of wide-spread usage are `mod_wsgi `_ and `JEP `_.
On the one hand, these cases provide confidence that existing
subinterpreter support is relatively stable.  On the other hand, there
isn't much of a sample size from which to judge the utility of the
feature.

Provisional Status
==================

The new ``interpreters`` module will be added with "provisional"
status (see PEP 411).  This allows Python users to experiment with the
feature and provide feedback while still allowing us to adjust to that
feedback.  The module will be provisional in Python 3.7 and we will
make a decision before the 3.8 release whether to keep it provisional,
graduate it, or remove it.
Alternate Python Implementations
================================

I'll be soliciting feedback from the different Python implementors
about subinterpreter support.

Multiple-interpreter support in the major Python implementations: TBD

* jython: yes [jython]_
* ironpython: yes?
* pypy: maybe not? [pypy]_
* micropython: ???

"interpreters" Module API
=========================

The module provides the following functions:

``list_all()``::

   Return a list of all existing interpreters.

``get_current()``::

   Return the currently running interpreter.

``create()``::

   Initialize a new Python interpreter and return it.  The
   interpreter will be created in the current thread and will remain
   idle until something is run in it.  The interpreter may be used in
   any thread and will run in whichever thread calls ``interp.run()``.

The module also provides the following class:

``Interpreter(id)``::

   id:

      The interpreter's ID (read-only).

   is_running():

      Return whether or not the interpreter is currently executing
      code.  Calling this on the current interpreter will always
      return True.

   destroy():

      Finalize and destroy the interpreter.

      This may not be called on an already running interpreter.
      Doing so results in a RuntimeError.

   run(source_str, /, *, channels=None):

      Run the provided Python source code in the interpreter.  If the
      "channels" keyword argument is provided (and is a mapping of
      attribute names to channels) then it is added to the
      interpreter's execution namespace (the interpreter's "__main__"
      module).  If any of the values are not RecvChannel or
      SendChannel instances then ValueError gets raised.

      This may not be called on an already running interpreter.
      Doing so results in a RuntimeError.

      A "run()" call is similar to a function call.  Once it completes,
      the code that called "run()" continues executing (in the
      original interpreter).  Likewise, if there is any uncaught
      exception then it effectively (see below) propagates into the
      code where ``run()`` was called.
      However, unlike function calls (but like threads), there is no
      return value.  If any value is needed, pass it out via a
      channel.

      The big difference is that "run()" executes the code in an
      entirely different interpreter, with entirely separate state.
      The state of the current interpreter in the current OS thread
      is swapped out with the state of the target interpreter (the
      one that will execute the code).  When the target finishes
      executing, the original interpreter gets swapped back in and
      its execution resumes.

      So calling "run()" will effectively cause the current Python
      thread to pause.  Sometimes you won't want that pause, in which
      case you should make the "run()" call in another thread.  To do
      so, add a function that calls "run()" and then run that
      function in a normal "threading.Thread".

      Note that the interpreter's state is never reset, neither
      before "run()" executes the code nor after.  Thus the
      interpreter state is preserved between calls to "run()".  This
      includes "sys.modules", the "builtins" module, and the internal
      state of C extension modules.

      Also note that "run()" executes in the namespace of the
      "__main__" module, just like scripts, the REPL, "-m", and "-c".
      Just as the interpreter's state is never reset, the "__main__"
      module is never reset.  You can imagine concatenating the code
      from each "run()" call into one long script.  This is the same
      as how the REPL operates.

      Regarding uncaught exceptions, we noted that they are
      "effectively" propagated into the code where ``run()`` was
      called.  To prevent leaking exceptions (and tracebacks) between
      interpreters, we create a surrogate of the exception and its
      traceback (see ``traceback.TracebackException``), wrap it in a
      RuntimeError, and raise that.

      Supported code: source text.

API for sharing data
--------------------

Subinterpreters are less useful without a mechanism for sharing data
between them.
Sharing actual Python objects between interpreters, however, has
enough potential problems that we are avoiding support for that here.
Instead, only a minimum set of types will be supported.  Initially
this will include ``bytes`` and channels.  Further types may be
supported later.

The ``interpreters`` module provides a way for users to determine
whether an object is shareable or not:

``is_shareable(obj)``::

   Return True if the object may be shared between interpreters.  This
   does not necessarily mean that the actual objects will be shared.
   Instead, it means that the objects' underlying data will be shared
   in a cross-interpreter way, whether via a proxy, a copy, or some
   other means.

This proposal provides two ways to share such objects between
interpreters.

First, shareable objects may be passed to ``run()`` as keyword
arguments, where they are effectively injected into the target
interpreter's ``__main__`` module.  This is mainly intended for
sharing meta-objects (e.g. channels) between interpreters, as it is
less useful to pass other objects (like ``bytes``) to ``run``.

Second, the main mechanism for sharing objects (i.e. their data)
between interpreters is through channels.  A channel is a simplex FIFO
similar to a pipe.  The main difference is that channels can be
associated with zero or more interpreters on either end.  Unlike
queues, which are also many-to-many, channels have no buffer.

``create_channel()``::

   Create a new channel and return (recv, send), the RecvChannel and
   SendChannel corresponding to the ends of the channel.  The channel
   is not closed and destroyed (i.e. garbage-collected) until the
   number of associated interpreters returns to 0.

   An interpreter gets associated with a channel by calling its
   "send()" or "recv()" method.  That association gets dropped by
   calling "close()" on the channel.

   Both ends of the channel are supported "shared" objects (i.e. may
   be safely shared by different interpreters).
   Thus they may be passed as keyword arguments to
   "Interpreter.run()".

``list_all_channels()``::

   Return a list of all open (RecvChannel, SendChannel) pairs.

``RecvChannel(id)``::

   The receiving end of a channel.  An interpreter may use this to
   receive objects from another interpreter.  At first only bytes
   will be supported.

   id:

      The channel's unique ID.

   interpreters:

      The list of associated interpreters: those that have called
      the "recv()" or "__next__()" methods and haven't called
      "close()".

   recv():

      Return the next object (i.e. the data from the sent object)
      from the channel.  If none have been sent then wait until the
      next send.  This associates the current interpreter with the
      channel.

      If the channel is already closed (see the close() method)
      then raise EOFError.  If the channel isn't closed, but the
      current interpreter already called the "close()" method (which
      drops its association with the channel) then raise ValueError.

   recv_nowait(default=None):

      Return the next object from the channel.  If none have been
      sent then return the default.  Otherwise, this is the same as
      the "recv()" method.

   close():

      No longer associate the current interpreter with the channel
      (on the receiving end) and block future association (via the
      "recv()" method).  If the interpreter was never associated with
      the channel then still block future association.  Once an
      interpreter is no longer associated with the channel,
      subsequent (or current) send() and recv() calls from that
      interpreter will raise ValueError (or EOFError if the channel
      is actually marked as closed).

      Once the number of associated interpreters on both ends drops
      to 0, the channel is actually marked as closed.  The Python
      runtime will garbage collect all closed channels, though not
      necessarily immediately.  Note that "close()" is automatically
      called on behalf of the current interpreter when the channel is
      no longer used (i.e. has no references) in that interpreter.

      This operation is idempotent.
      Return True if "close()" has not been called before by the
      current interpreter.

``SendChannel(id)``::

   The sending end of a channel.  An interpreter may use this to send
   objects to another interpreter.  At first only bytes will be
   supported.

   id:

      The channel's unique ID.

   interpreters:

      The list of associated interpreters (those that have called
      the "send()" method).

   send(obj):

      Send the object (i.e. its data) to the receiving end of the
      channel.  Wait until the object is received.  If the object is
      not shareable then ValueError is raised.  Currently only bytes
      are supported.

      If the channel is already closed (see the close() method)
      then raise EOFError.  If the channel isn't closed, but the
      current interpreter already called the "close()" method (which
      drops its association with the channel) then raise ValueError.

   send_nowait(obj):

      Send the object to the receiving end of the channel.  If the
      other end is not currently receiving then raise RuntimeError.
      Otherwise this is the same as "send()".

   send_buffer(obj):

      Send a MemoryView of the object rather than the object.
      Otherwise this is the same as send().  Note that the object
      must implement the PEP 3118 buffer protocol.

   send_buffer_nowait(obj):

      Send a MemoryView of the object rather than the object.  If
      the other end is not currently receiving then raise
      RuntimeError.  Otherwise this is the same as "send_buffer()".

   close():

      This is the same as "RecvChannel.close()", but applied to the
      sending end of the channel.

Note that ``send_buffer()`` is similar to how
``multiprocessing.Connection`` works. [mp-conn]_

Open Questions
==============

None

Open Implementation Questions
=============================

Does every interpreter think that their thread is the "main" thread?
--------------------------------------------------------------------

(This is more of an implementation detail than an issue for the PEP.)

CPython's interpreter implementation identifies the OS thread in which
it was started as the "main" thread.
The interpreter then has slightly different behavior depending on
whether the current thread is the main one or not.  This presents a
problem in cases where "main thread" is meant to imply "main thread in
the main interpreter" [main-thread]_, where the main interpreter is
the initial one.

Disallow subinterpreters in the main thread?
--------------------------------------------

(This is more of an implementation detail than an issue for the PEP.)

This is a specific case of the above issue.  Currently in CPython,
"we need a main \*thread\* in order to sensibly manage the way signal
handling works across different platforms". [main-thread]_

Since signal handlers are part of the interpreter state, running a
subinterpreter in the main thread means that the main interpreter can
no longer properly handle signals (since it's effectively paused).
Furthermore, running a subinterpreter in the main thread would
conceivably allow setting signal handlers on that interpreter, which
would likewise impact signal handling when that interpreter isn't
running or is running in a different thread.

Ultimately, running subinterpreters in the main OS thread introduces
complications to the signal handling implementation.  So it may make
the most sense to disallow running subinterpreters in the main thread.
Support for it could be considered later.  The downside is that folks
wanting to try out subinterpreters would be required to take the extra
step of using threads.  This could slow adoption and experimentation,
whereas without the restriction there's less of an obstacle.

Deferred Functionality
======================

In the interest of keeping this proposal minimal, the following
functionality has been left out for future consideration.  Note that
this is not a judgement against any of said capability, but rather a
deferment.  That said, each is arguably valid.

Interpreter.call()
------------------

It would be convenient to run existing functions in subinterpreters
directly.
``Interpreter.run()`` could be adjusted to support this or a
``call()`` method could be added::

   Interpreter.call(f, *args, **kwargs)

This suffers from the same problem as sharing objects between
interpreters via queues.  The minimal solution (running a source
string) is sufficient for us to get the feature out where it can be
explored.

timeout arg to recv() and send()
--------------------------------

Typically functions that have a ``block`` argument also have a
``timeout`` argument.  It sometimes makes sense to do likewise for
functions that otherwise block, like the channel ``recv()`` and
``send()`` methods.  We can add it later if needed.

get_main()
----------

CPython has a concept of a "main" interpreter.  This is the initial
interpreter created during CPython's runtime initialization.  It may
be useful to identify the main interpreter.  For instance, the main
interpreter should not be destroyed.  However, for the basic
functionality of a high-level API a ``get_main()`` function is not
necessary.  Furthermore, there is no requirement that a Python
implementation have a concept of a main interpreter.  So until
there's a clear need we'll leave ``get_main()`` out.

Interpreter.run_in_thread()
---------------------------

This method would make a ``run()`` call for you in a thread.  Doing
this using only ``threading.Thread`` and ``run()`` is relatively
trivial so we've left it out.

Synchronization Primitives
--------------------------

The ``threading`` module provides a number of synchronization
primitives for coordinating concurrent operations.  This is especially
necessary due to the shared-state nature of threading.  In contrast,
subinterpreters do not share state.  Data sharing is restricted to
channels, which do away with the need for explicit synchronization.
If any sort of opt-in shared state support is added to subinterpreters
in the future, that same effort can introduce synchronization
primitives to meet that need.
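The "relatively trivial" pattern referred to under
``Interpreter.run_in_thread()`` above might look like the sketch
below.  Since the ``interpreters`` module does not exist yet, the
``FakeInterp`` class here is a hypothetical stand-in used only so the
helper can be exercised; the helper itself only assumes an object with
a ``run(source)`` method::

```python
import threading

def run_in_thread(interp, source):
    # Wrap interp.run() in a thread so the calling interpreter does
    # not pause while the target interpreter executes the code.
    t = threading.Thread(target=interp.run, args=(source,))
    t.start()
    return t

# Hypothetical stand-in for an Interpreter object, purely for
# illustration; it just records what it was asked to run.
class FakeInterp:
    def __init__(self):
        self.ran = []
    def run(self, source):
        self.ran.append(source)

interp = FakeInterp()
t = run_in_thread(interp, 'print("hello")')
t.join()
print(interp.ran)
```

The caller keeps the returned ``Thread`` and can ``join()`` it when
the result (passed out via a channel) is needed.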
CSP Library
-----------

A ``csp`` module would not be a large step away from the functionality
provided by this PEP.  However, adding such a module is outside the
minimalist goals of this proposal.

Syntactic Support
-----------------

The ``Go`` language provides a concurrency model based on CSP, so
it's similar to the concurrency model that subinterpreters support.
``Go`` provides syntactic support, as well as several builtin
concurrency primitives, to make concurrency a first-class feature.
Conceivably, similar syntactic (and builtin) support could be added
to Python using subinterpreters.  However, that is *way* outside the
scope of this PEP!

Multiprocessing
---------------

The ``multiprocessing`` module could support subinterpreters in the
same way it supports threads and processes.  In fact, the module's
maintainer, Davin Potts, has indicated this is a reasonable feature
request.  However, it is outside the narrow scope of this PEP.

C-extension opt-in/opt-out
--------------------------

By using the ``PyModuleDef_Slot`` introduced by PEP 489, we could
easily add a mechanism by which C-extension modules could opt out of
support for subinterpreters.  Then the import machinery, when
operating in a subinterpreter, would need to check the module for
support.  It would raise an ImportError if unsupported.

Alternately we could support opting in to subinterpreter support.
However, that would probably exclude many more modules
(unnecessarily) than the opt-out approach.

The scope of adding the ModuleDef slot and fixing up the import
machinery is non-trivial, but could be worth it.  It all depends on
how many extension modules break under subinterpreters.  Given the
relatively few cases we know of through mod_wsgi, we can leave this
for later.

Poisoning channels
------------------

CSP has the concept of poisoning a channel.
Once a channel has been poisoned, any ``send()`` or ``recv()`` call on
it will raise a special exception, effectively ending execution in the
interpreter that tried to use the poisoned channel.

This could be accomplished by adding a ``poison()`` method to both
ends of the channel.  The ``close()`` method could work if it had a
``force`` option to force the channel closed.  Regardless, these
semantics are relatively specialized and can wait.

Sending channels over channels
------------------------------

Some advanced usage of subinterpreters could take advantage of the
ability to send channels over channels, in addition to bytes.  Given
that channels will already be multi-interpreter safe, supporting them
in ``RecvChannel.recv()`` wouldn't be a big change.  However, this can
wait until the basic functionality has been ironed out.

Resetting __main__
------------------

As proposed, every call to ``Interpreter.run()`` will execute in the
namespace of the interpreter's existing ``__main__`` module.  This
means that data persists there between ``run()`` calls.  Sometimes
this isn't desirable and you want to execute in a fresh ``__main__``.
Also, you don't necessarily want to leak objects there that you
aren't using any more.

Note that the following won't work right because it will clear too
much (e.g. ``__name__`` and the other "__dunder__" attributes)::

   interp.run('globals().clear()')

Possible solutions include:

* a ``create()`` arg to indicate resetting ``__main__`` after each
  ``run`` call
* an ``Interpreter.reset_main`` flag to support opting in or out
  after the fact
* an ``Interpreter.reset_main()`` method to opt in when desired
* ``importlib.util.reset_globals()`` [reset_globals]_

Also note that resetting ``__main__`` does nothing about state stored
in other modules.  So any solution would have to be clear about the
scope of what is being reset.
Conceivably we could invent a mechanism by which any (or every) module
could be reset, unlike ``reload()`` which does not clear the module
before loading into it.  Regardless, since ``__main__`` is the
execution namespace of the interpreter, resetting it has a much more
direct correlation to interpreters and their dynamic state than does
resetting other modules.  So a more generic module reset mechanism may
prove unnecessary.

This isn't a critical feature initially.  It can wait until later if
desirable.

Support passing ints in channels
--------------------------------

Passing ints around should be fine and ultimately is probably
desirable.  However, we can get by with serializing them as bytes for
now.  The goal is a minimal API for the sake of basic functionality at
first.

File descriptors and sockets in channels
----------------------------------------

Given that file descriptors and sockets are process-global resources,
support for passing them through channels is a reasonable idea.  They
would be a good candidate for the first effort at expanding the types
that channels support.  They aren't strictly necessary for the initial
API.

Integration with async
----------------------

Per Antoine Pitrou [async]_::

   Has any thought been given to how FIFOs could integrate with async
   code driven by an event loop (e.g. asyncio)?  I think the model of
   executing several asyncio (or Tornado) applications each in their
   own subinterpreter may prove quite interesting to reconcile multi-
   core concurrency with ease of programming.

   That would require the FIFOs to be able to synchronize on something
   an event loop can wait on (probably a file descriptor?).

A possible solution is to provide async implementations of the
blocking channel methods (``__next__()``, ``recv()``, and ``send()``).
However, the basic functionality of subinterpreters does not depend on
async and can be added later.
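One shape such async support could take, absent file-descriptor-based
signaling, is deferring the blocking call to an executor thread so the
event loop stays free.  The ``BlockingChannel`` class below is a
hypothetical stand-in (a plain queue), since the real channel types do
not exist yet; only its blocking ``recv()`` matters to the sketch::

```python
import asyncio
import queue

class BlockingChannel:
    # Hypothetical stand-in for the proposed RecvChannel/SendChannel
    # pair; recv() blocks until something has been sent.
    def __init__(self):
        self._q = queue.Queue()
    def send(self, data):
        self._q.put(data)
    def recv(self):
        return self._q.get()

async def async_recv(chan):
    # Hand the blocking recv() off to a worker thread so the event
    # loop can keep running other tasks in the meantime.
    loop = asyncio.get_event_loop()
    return await loop.run_in_executor(None, chan.recv)

chan = BlockingChannel()
chan.send(b'ping')
loop = asyncio.new_event_loop()
result = loop.run_until_complete(async_recv(chan))
loop.close()
print(result)
```

A real implementation would more likely wait on a file descriptor, as
Antoine suggests, rather than burn an executor thread per pending
``recv()``.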
Support for iteration
---------------------

Supporting iteration on ``RecvChannel`` (via ``__iter__()`` or
``__next__()``) may be useful.  A trivial implementation would use the
``recv()`` method, similar to how files do iteration.  Since this
isn't a fundamental capability and has a simple analog, adding
iteration support can wait until later.

Channel context managers
------------------------

Context manager support on ``RecvChannel`` and ``SendChannel`` may be
helpful.  The implementation would be simple, wrapping a call to
``close()`` like files do.  As with iteration, this can wait.

Pipes and Queues
----------------

With the proposed object passing mechanism of "channels", other
similar basic types aren't required to achieve the minimal useful
functionality of subinterpreters.  Such types include pipes (like
channels, but one-to-one) and queues (like channels, but buffered).
See below in `Rejected Ideas`_ for more information.

Even though these types aren't part of this proposal, they may still
be useful in the context of concurrency.  Adding them later is
entirely reasonable.  They could be trivially implemented as wrappers
around channels.  Alternatively they could be implemented for
efficiency at the same low level as channels.

interpreters.RunFailedError
---------------------------

As currently proposed, ``Interpreter.run()`` offers you no way to
distinguish an error coming from the subinterpreter from any other
error in the current interpreter.  Your only option would be to
explicitly wrap your ``run()`` call in a
``try: ... except RuntimeError:`` (since we wrap a proxy of the
original exception in a RuntimeError and raise that).

If this is a problem in practice then we could add something like
``interpreters.RunFailedError`` (subclassing RuntimeError) and raise
that in ``run()``.

Return a lock from send()
-------------------------

When sending an object through a channel, you don't have a way of
knowing when the object gets received on the other end.
One way to work around this is to return a locked ``threading.Lock``
from ``SendChannel.send()`` that unlocks once the object is received.

This matters for buffered channels (i.e. queues).  For unbuffered
channels it is a non-issue.  So this can be dealt with once channels
support buffering.

Rejected Ideas
==============

Explicit channel association
----------------------------

Interpreters are implicitly associated with channels upon ``recv()``
and ``send()`` calls.  They are de-associated with ``close()`` calls.
The alternative would be explicit methods.  It would be either
``add_channel()`` and ``remove_channel()`` methods on ``Interpreter``
objects or something similar on channel objects.

In practice, this level of management shouldn't be necessary for
users.  So adding more explicit support would only add clutter to the
API.

Use pipes instead of channels
-----------------------------

A pipe would be a simplex FIFO between exactly two interpreters.  For
most use cases this would be sufficient.  It could potentially
simplify the implementation as well.  However, it isn't a big step to
supporting a many-to-many simplex FIFO via channels.  Also, with pipes
the API ends up being slightly more complicated, requiring naming the
pipes.

Use queues instead of channels
------------------------------

The main difference between queues and channels is that queues support
buffering.  This would complicate the blocking semantics of ``recv()``
and ``send()``.  Also, queues can be built on top of channels.

"enumerate"
-----------

The ``list_all()`` function provides the list of all interpreters.  In
the threading module, which partly inspired the proposed API, the
function is called ``enumerate()``.  The name is different here to
avoid confusing Python users that are not already familiar with the
threading API.  For them "enumerate" is rather unclear, whereas
"list_all" is clear.
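The surrogate-and-wrap behavior described under ``run()`` can be
sketched with today's stdlib.  The ``wrap_uncaught()`` helper below is
hypothetical (it is not part of the proposed API); it only illustrates
how ``traceback.TracebackException`` yields a safe,
interpreter-independent summary that holds no references to the
original exception or its frames::

```python
import traceback

def wrap_uncaught(exc):
    # Build a surrogate that captures the rendered exception and
    # traceback without referencing the original frames, then wrap
    # its text in a RuntimeError for the calling interpreter.
    surrogate = traceback.TracebackException.from_exception(exc)
    return RuntimeError(''.join(surrogate.format()))

try:
    1 / 0
except ZeroDivisionError as exc:
    wrapped = wrap_uncaught(exc)

print(type(wrapped).__name__)
```

The calling interpreter sees only the ``RuntimeError``; nothing from
the subinterpreter's object graph crosses the boundary.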
Alternate solutions to prevent leaking exceptions across interpreters
---------------------------------------------------------------------

In function calls, uncaught exceptions propagate to the calling frame.
The same approach could be taken with ``run()``.  However, this would
mean that exception objects would leak across the inter-interpreter
boundary.  Likewise, the frames in the traceback would potentially
leak.

While that might not be a problem currently, it would be a problem
once interpreters get better isolation relative to memory management
(which is necessary to stop sharing the GIL between interpreters).
We've resolved the semantics of how the exceptions propagate by
raising a RuntimeError instead, which wraps a safe proxy for the
original exception and traceback.

Rejected possible solutions:

* set the RuntimeError's __cause__ to the proxy of the original
  exception
* reproduce the exception and traceback in the original interpreter
  and raise that
* convert at the boundary (a la ``subprocess.CalledProcessError``)
  (requires a cross-interpreter representation)
* support customization via ``Interpreter.excepthook``
  (requires a cross-interpreter representation)
* wrap in a proxy at the boundary (including with support for
  something like ``err.raise()`` to propagate the traceback)
* return the exception (or its proxy) from ``run()`` instead of
  raising it
* return a result object (like ``subprocess`` does) [result-object]_
  (unnecessary complexity?)
* throw the exception away and expect users to deal with unhandled
  exceptions explicitly in the script they pass to ``run()`` (they
  can pass error info out via channels); with threads you have to do
  something similar

References
==========

.. [c-api]
   https://docs.python.org/3/c-api/init.html#sub-interpreter-support

.. _Communicating Sequential Processes:

.. [CSP]
   https://en.wikipedia.org/wiki/Communicating_sequential_processes
   https://github.com/futurecore/python-csp

..
[fifo] https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Pipe https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue https://docs.python.org/3/library/queue.html#module-queue http://stackless.readthedocs.io/en/2.7-slp/library/stackless/channels.html https://golang.org/doc/effective_go.html#sharing http://www.jtolds.com/writing/2016/03/go-channels-are-bad-and-you-should-feel-bad/ .. [caveats] https://docs.python.org/3/c-api/init.html#bugs-and-caveats .. [petr-c-ext] https://mail.python.org/pipermail/import-sig/2016-June/001062.html https://mail.python.org/pipermail/python-ideas/2016-April/039748.html .. [cryptography] https://github.com/pyca/cryptography/issues/2299 .. [global-gc] http://bugs.python.org/issue24554 .. [gilstate] https://bugs.python.org/issue10915 http://bugs.python.org/issue15751 .. [global-atexit] https://bugs.python.org/issue6531 .. [mp-conn] https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Connection .. [bug-rate] https://mail.python.org/pipermail/python-ideas/2017-September/047094.html .. [benefits] https://mail.python.org/pipermail/python-ideas/2017-September/047122.html .. [main-thread] https://mail.python.org/pipermail/python-ideas/2017-September/047144.html https://mail.python.org/pipermail/python-dev/2017-September/149566.html .. [reset_globals] https://mail.python.org/pipermail/python-dev/2017-September/149545.html .. [async] https://mail.python.org/pipermail/python-dev/2017-September/149420.html https://mail.python.org/pipermail/python-dev/2017-September/149585.html .. [result-object] https://mail.python.org/pipermail/python-dev/2017-September/149562.html .. [jython] https://mail.python.org/pipermail/python-ideas/2017-May/045771.html .. [pypy] https://mail.python.org/pipermail/python-ideas/2017-September/046973.html Copyright ========= This document has been placed in the public domain. 
From songofacandy at gmail.com Tue Dec 5 22:17:12 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Wed, 6 Dec 2017 12:17:12 +0900 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: Message-ID: I'm sorry about my laziness. I've been very busy these months, but I'm back to the OSS world from today. While I should review carefully again, I think I'm close to accepting PEP 540. * PEP 540 really helps containers and old Linux machines where PEP 538 doesn't work. And containers are really important these days. Many new Pythonistas who are not Linux experts start using containers. * In recent years, UTF-8 has fixed many mojibakes. Now UnicodeError is more of a usability problem for many Python users. So I agree opt-out UTF-8 mode is better than opt-in on the POSIX locale. I don't have enough time to read all the mails in the ML archive. So if someone has an opposing opinion, please remind me by this weekend. Regards, From ncoghlan at gmail.com Tue Dec 5 22:57:39 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 Dec 2017 13:57:39 +1000 Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__ In-Reply-To: References: Message-ID: On 6 December 2017 at 05:11, Barry Warsaw wrote: > On Dec 5, 2017, at 13:24, Guido van Rossum wrote: > >> But the whole point of the PEP is that it only warns about deprecations in code over which the user has control -- likely __main__ is their own code, and they *can* handle it. > > I'm not so sure how true that is. I have no sense of the relative popularity of hand crafted dunder-mains vs entry point crafted ones. I know that in my own applications, I tend to use the latter (although pkg_resources performance issues bum me out). But then you have applications like pex that use fairly complex hand crafted dunder-mains in their zip files. In either case I don't want consumers of my applications to have to worry about DeprecationWarnings, since *they* really can't do anything about them.
For something like pex, it would likely be appropriate to add the following to __main__.py:

    import sys, warnings
    if not sys.warnoptions:
        warnings.simplefilter("ignore")

We don't currently provide that snippet anywhere in the docs, so the PEP suggests we add it: https://www.python.org/dev/peps/pep-0565/#other-documentation-updates > All that to say I really don't know what the right thing to do here is. All of our fiddling with the reporting of DeprecationWarnings, not to mention PendingDeprecationWarnings and FutureWarnings feels like experimental shots in the dark, and I suspect we won't really know if PEP 565 will be helpful, harmful, or neutral until it's out in the wild for a while. I suspect either that what we're trying to accomplish really can't be done, or that we really don't have a good understanding of the problem and we're just chipping away at the edges. I'm entirely comfortable with telling app developers "Include this 3 line snippet if you don't want your users to see any warnings by default". Historically, we've never provided good documentation on how to do that, so apps have tended to rely on the default filters, and we've then had ongoing arguments about who wins and who loses in deciding what the defaults should be. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From songofacandy at gmail.com Tue Dec 5 23:07:18 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Wed, 6 Dec 2017 13:07:18 +0900 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: Message-ID: Oh, the revised version is really short! And I have one worrying point. With UTF-8 mode, open()'s default encoding/error handler is UTF-8/surrogateescape. Containers are really growing. PyCharm supports Docker and many new Python developers use Docker instead of installing Python directly on their system, especially on Windows. And opening a binary file without the "b" option is a very common mistake of new developers.
If the default error handler is surrogateescape, they lose a chance to notice their bug. On the other hand, it helps some use cases where users want byte-transparent behavior, without modifying code to use "surrogateescape" explicitly. Which is the more important scenario? Does anyone have an opinion about it? Are there any rationales and use cases I'm missing? Regards, INADA Naoki On Wed, Dec 6, 2017 at 12:17 PM, INADA Naoki wrote: > I'm sorry about my laziness. > I've very busy these months, but I'm back to OSS world from today. > > While I should review carefully again, I think I'm close to accept PEP 540. > > * PEP 540 really helps containers and old Linux machines PEP 538 doesn't work. > And containers is really important for these days. Many new > Pythonistas who is > not Linux experts start using containers. > > * In recent years, UTF-8 fixed many mojibakes. Now UnicodeError is > more usability > problem for many Python users. So I agree opt-out UTF-8 mode is > better than opt-in > on POSIX locale. > > I don't have enough time to read all mails in ML archive. > So if someone have opposite opinion, please remind me by this weekend. > > Regards, From ncoghlan at gmail.com Tue Dec 5 23:34:36 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 Dec 2017 14:34:36 +1000 Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__ In-Reply-To: References: Message-ID: On 6 December 2017 at 06:43, Victor Stinner wrote: > At the end, I'm not sure that the PEP 565 is really needed or would help anyone. Folks, I'd really appreciate it if you could comment on the merits of the PEP without implicitly claiming that I don't exist, and that Linux system administrators don't exist :) Right now, Linux distro Python library upgrades (including CPython updates to a new feature release) may result in *hard compatibility breaks*, as previously deprecated features disappear without warning.
With PEP 565, ad hoc personal scripts will at least get a DeprecationWarning when APIs they're using are at risk of disappearing as a result of a distro package or version update. Right now, they don't get a warning at all. There is *no way* for any opt-in flag to help these users. Now, PEP 565 being rejected wouldn't be the end of the world on that front - we can apply comparable changes as a downstream patch. However, these problems aren't unique to Linux, and they aren't unique to any specific distro: they apply whenever folks are using Python for personal workflow automation, rather than for redistributable applications and libraries. So it makes more sense to me to do this upstream, rather than having the default warnings handling be a redistributor dependent behaviour. That said, I do agree we could offer easier to use APIs to app developers that just want to hide warnings from their users, so I've filed https://bugs.python.org/issue32229 to propose a straightforward "warnings.hide_warnings()" API that encapsulates things like checking for a non-empty sys.warnoptions list. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Dec 5 23:50:36 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 Dec 2017 14:50:36 +1000 Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__ In-Reply-To: References: Message-ID: On 6 December 2017 at 14:34, Nick Coghlan wrote: > That said, I go agree we could offer easier to use APIs to app > developers that just want to hide warnings from their users, so I've > filed https://bugs.python.org/issue32229 to propose a straightforward > "warnings.hide_warnings()" API that encapsulates things like checking > for a non-empty sys.warnoptions list. I've updated the "Limitations" section of the PEP to mention that separate proposal: https://github.com/python/peps/commit/6e93c8d2e6ad698834578d4077b92a8fc84a70f5 Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Dec 6 00:46:17 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 Dec 2017 15:46:17 +1000 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: Message-ID: Something I've just noticed that needs to be clarified: on Linux, "C" locale and "POSIX" locale are aliases, but this isn't true in general (e.g. it's not the case on *BSD systems, including Mac OS X). To handle that in PEP 538, I made it clear that everything is keyed specifically off the "C" locale, since that's what you actually get by default. So if PEP 540 is going to implicitly trigger switching encodings, it needs to specify whether it's going to look for the C locale or the POSIX locale (I'd suggest C locale, since that's the actual default that causes problems). The precedence relationship with locale coercion also needs to be spelled out: successful locale coercion should skip implicitly enabling UTF-8 mode (for opt-in UTF-8 mode, we'd still try to coerce the locale setting as appropriate, so extension modules are more likely to behave themselves). On 6 December 2017 at 14:07, INADA Naoki wrote: > Oh, revised version is really short! > > And I have one worrying point. > With UTF-8 mode, open()'s default encoding/error handler is > UTF-8/surrogateescape. > > Containers are really growing. PyCharm supports Docker and many new Python > developers use Docker instead of installing Python directly on their system, > especially on Windows. > > And opening binary file without "b" option is very common mistake of new > developers. If default error handler is surrogateescape, they lose a chance > to notice their bug. > > On the other hand, it helps some use cases when user want byte-transparent > behavior, without modifying code to use "surrogateescape" explicitly. > > Which is more important scenario? Anyone has opinion about it? > Are there any rationals and use cases I missing?
For platforms that offer a C.UTF-8 locale, I'd like "LC_CTYPE=C.UTF-8 python" and "PYTHONCOERCECLOCALE=0 LC_CTYPE=C PYTHONUTF8=1" to be equivalent (aside from the known limitation that extension modules may not do the right thing in the latter case). For the locale coercion case, the default error handler for `open` remains as "strict", which means I'd be in favour of keeping it as "strict" by default in UTF-8 mode as well. That would flip the toggle in the PEP: "strict UTF-8" would be the default selection for "PYTHONUTF8=1", and you'd choose the more relaxed option via "PYTHONUTF8=permissive". That way, the combination of PEPs 538 and 540 would give us the following situation in the C locale: 1. Our preferred approach is to coerce LC_CTYPE in the C locale to a UTF-8 based equivalent 2. Only if that fails (e.g. as it will on CentOS 7) do we resort to implicitly enabling CPython's internal UTF-8 mode (which should behave like C.UTF-8, *except* for the fact extension modules won't respect it) That way, the ideal outcome is that a UTF-8 based locale exists, and we use it automatically when needed. UTF-8 mode then lets us cope with older platforms where neither C.UTF-8 nor an equivalent exists. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Wed Dec 6 00:59:31 2017 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 6 Dec 2017 16:59:31 +1100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: Message-ID: On Wed, Dec 6, 2017 at 4:46 PM, Nick Coghlan wrote: > Something I've just noticed that needs to be clarified: on Linux, "C" > locale and "POSIX" locale are aliases, but this isn't true in general > (e.g. it's not the case on *BSD systems, including Mac OS X). For those of us with little to no BSD/MacOS experience, can you give a quick run-down of the differences between "C" and "POSIX"?
ChrisA From ncoghlan at gmail.com Wed Dec 6 01:15:29 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 Dec 2017 16:15:29 +1000 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: Message-ID: On 6 December 2017 at 15:59, Chris Angelico wrote: > On Wed, Dec 6, 2017 at 4:46 PM, Nick Coghlan wrote: >> Something I've just noticed that needs to be clarified: on Linux, "C" >> locale and "POSIX" locale are aliases, but this isn't true in general >> (e.g. it's not the case on *BSD systems, including Mac OS X). > > For those of us with little to no BSD/MacOS experience, can you give a > quick run-down of the differences between "C" and "POSIX"? The one that's relevant to default locale detection is just the string that "setlocale(LC_CTYPE, NULL)" returns. On Linux (or, more accurately, with glibc), after setting "LC_CTYPE=POSIX", that call still returns "C" (since the "POSIX" locale is defined as an alias for the "C" locale). By contrast, on *BSD, it will return "POSIX" (since "POSIX" is actually a distinct locale there). Beyond that, I don't know what the actual functional differences are. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From v+python at g.nevcal.com Wed Dec 6 01:18:06 2017 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 5 Dec 2017 22:18:06 -0800 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: Message-ID: <64dbe7e0-47c9-4938-24d7-7befce2d0ad4@g.nevcal.com> On 12/5/2017 8:07 PM, INADA Naoki wrote: > Oh, revised version is really short! > > And I have one worrying point. > With UTF-8 mode, open()'s default encoding/error handler is > UTF-8/surrogateescape. > > Containers are really growing. PyCharm supports Docker and many new Python > developers use Docker instead of installing Python directly on their system, > especially on Windows. > > And opening binary file without "b" option is very common mistake of new > developers. 
If default error handler is surrogateescape, they lose a chance > to notice their bug. "b" mostly matters on Windows, correct? And Windows doesn't use C or POSIX locale, correct? And if these are correct, then is this an issue? And if so, why? > On the other hand, it helps some use cases when user want byte-transparent > behavior, without modifying code to use "surrogateescape" explicitly. > > Which is more important scenario? Anyone has opinion about it? > Are there any rationals and use cases I missing? > > Regards, > > INADA Naoki > > > On Wed, Dec 6, 2017 at 12:17 PM, INADA Naoki wrote: >> I'm sorry about my laziness. >> I've very busy these months, but I'm back to OSS world from today. >> >> While I should review carefully again, I think I'm close to accept PEP 540. >> >> * PEP 540 really helps containers and old Linux machines PEP 538 doesn't work. >> And containers is really important for these days. Many new >> Pythonistas who is >> not Linux experts start using containers. >> >> * In recent years, UTF-8 fixed many mojibakes. Now UnicodeError is >> more usability >> problem for many Python users. So I agree opt-out UTF-8 mode is >> better than opt-in >> on POSIX locale. >> >> I don't have enough time to read all mails in ML archive. >> So if someone have opposite opinion, please remind me by this weekend. >> >> Regards, > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/v%2Bpython%40g.nevcal.com > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From encukou at gmail.com Wed Dec 6 01:20:36 2017 From: encukou at gmail.com (Petr Viktorin) Date: Wed, 6 Dec 2017 07:20:36 +0100 Subject: [Python-Dev] PEP 554 v4 (new interpreters module) In-Reply-To: References: Message-ID: <9ccb96d9-4189-cf46-fe1a-d4ef12a2bac9@gmail.com> On 12/06/2017 03:51 AM, Eric Snow wrote: > Hi all, > > I've finally updated PEP 554. Feedback would be most welcome. The > PEP is in a pretty good place now and I hope to we're close to a > decision to accept it. :) [...] > C-extension opt-in/opt-out > -------------------------- > > By using the ``PyModuleDef_Slot`` introduced by PEP 489, we could easily > add a mechanism by which C-extension modules could opt out of support > for subinterpreters. Then the import machinery, when operating in > a subinterpreter, would need to check the module for support. It would > raise an ImportError if unsupported. > > Alternately we could support opting in to subinterpreter support. > However, that would probably exclude many more modules (unnecessarily) > than the opt-out approach. Currently it's already opt-in, as modules that use PyModuleDef are expected to support subinterpreters: https://www.python.org/dev/peps/pep-0489/#subinterpreters-and-interpreter-reloading [...] > .. [global-atexit] > https://bugs.python.org/issue6531 Oh dear; there's now also https://bugs.python.org/issue31901 From tjreedy at udel.edu Wed Dec 6 01:26:08 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 6 Dec 2017 01:26:08 -0500 Subject: [Python-Dev] =?utf-8?q?=22CPython_loves_your_Pull_Requests=22_ta?= =?utf-8?q?lk_by_St=C3=A9phane_Wirtel?= In-Reply-To: References: Message-ID: On 12/5/2017 10:25 AM, Mariatta Wijaya wrote: > * Time to merge a PR: 3 days in average, good! Slide said 2.98 days, another said 4.4% by developers. > Regarding the average time to merge PR, I'm interested to know the > average time to merge for PRs not made by Python Core Devs. 
Trivially different: assume 0 days for all dev PRs, then average would be 2.96 / .956 = 3.12. But any average that includes backports, which I suspect the above does, is skewed way down because backports are mostly merged soon after tests complete. So I think 6 days average may be more realistic for master branch (3.7) PRs. The average may be skewed down even more because some of the PRs that have been open a month or more will eventually be merged. On the other hand, the mean, certainly by itself, is the *wrong* statistic for this data. The argument for the value of using means depends on the data having a distribution that is at least roughly gaussian (misleadingly called 'normal'). The waiting times for merging appear to be negative exponential (slide 75). Long waiting times have an oversized influence on the mean. So one should either calculate and report the median time or possibly the mean log(wait), converted back to waiting time. In other words, exp(mean(map(log, waits))). For the latter, one should probably start the clock after the initial CI tests finish, which is at least 1/2 hour. -- Terry Jan Reedy (retired statistician) From ncoghlan at gmail.com Wed Dec 6 01:28:11 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 Dec 2017 16:28:11 +1000 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: <64dbe7e0-47c9-4938-24d7-7befce2d0ad4@g.nevcal.com> References: <64dbe7e0-47c9-4938-24d7-7befce2d0ad4@g.nevcal.com> Message-ID: On 6 December 2017 at 16:18, Glenn Linderman wrote: > "b" mostly matters on Windows, correct? And Windows doesn't use C or POSIX > locale, correct? And if these are correct, then is this an issue? And if so, > why? In Python 3, "b" matters everywhere, since it controls whether the stream gets wrapped in TextIOWrapper or not. It's only in Python 2 that the distinction is Windows-specific (where it controls how "\r\n" sequences get handled). Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Dec 6 01:49:07 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 Dec 2017 16:49:07 +1000 Subject: [Python-Dev] PEP 554 v4 (new interpreters module) In-Reply-To: References: Message-ID: On 6 December 2017 at 12:51, Eric Snow wrote: > Hi all, > > I've finally updated PEP 554. Feedback would be most welcome. The > PEP is in a pretty good place now and I hope to we're close to a > decision to accept it. :) Nice updates! I like this version. > In addition to resolving the open questions, I've also made the > following changes to the PEP: > > * put an API summary at the top and moved the full API description down > * add the "is_shareable()" function to indicate if an object can be shared > * added None as a shareable object > > Regarding the open questions: > > * "Leaking exceptions across interpreters" > > I chose to go with an approach that effectively creates a > traceback.TracebackException proxy of the original exception, wraps > that in a RuntimeError, and raises that in the calling interpreter. > Raising an exception that safely preserves the original exception and > traceback seems like the most intuitive behavior (to me, as a user). > The only alternative that made sense is to fully duplicate the > exception and traceback (minus stack frames) in the calling > interpreter, which is probably overkill and likely to be confusing. My one suggestion here would be to consider a dedicated exception type like "interpreters.SubinterpreterError", rather than re-using RuntimeError directly. That way you can put the extracted traceback on a named attribute, and retain the option of potentially adding subinterpreter awareness to the traceback module in the future. > * "Initial support for buffers in channels" > > I chose to add a "SendChannel.send_buffer(obj)" method for this. 
> Supporting buffer objects from the beginning makes sense, opening good > experimentation opportunities for a valuable set of users. Supporting > buffer objects separately and explicitly helps set clear expectations > for users. I decided not to go with a separate class (e.g. > MemChannel) as it didn't seem like there's enough difference to > warrant keeping them strictly separate. > > FWIW, I'm still strongly in favor of support for passing (copies of) > bytes objects via channels. Passing objects to SendChannel.send() is > obvious. Limiting it, for now, to bytes (and None) helps us avoid > tying ourselves strongly to any particular implementation (it seems > like all the reservations were relative to the implementation). So I > do not see a reason to wait. Aye, the split sending API with a merged receive API works for me. > * "Pass channels explicitly to run()?" > > I've applied the suggested solution (make "channels" an explicit > keyword argument). Cool. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ericsnowcurrently at gmail.com Wed Dec 6 03:10:31 2017 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 6 Dec 2017 01:10:31 -0700 Subject: [Python-Dev] PEP 554 v4 (new interpreters module) In-Reply-To: References: Message-ID: On Dec 5, 2017 23:49, "Nick Coghlan" wrote: Nice updates! I like this version. Great! :) My one suggestion here would be to consider a dedicated exception type like "interpreters.SubinterpreterError", rather than re-using RuntimeError directly. That way you can put the extracted traceback on a named attribute, and retain the option of potentially adding subinterpreter awareness to the traceback module in the future. Yeah, I already have a deferred idea item for this. :). TBH, I was on the fence about a dedicated exception type, so you've nudged me on board. :) -eric -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Wed Dec 6 03:15:01 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 Dec 2017 18:15:01 +1000 Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__ In-Reply-To: References: Message-ID: On 6 December 2017 at 14:50, Nick Coghlan wrote: > On 6 December 2017 at 14:34, Nick Coghlan wrote: >> That said, I go agree we could offer easier to use APIs to app >> developers that just want to hide warnings from their users, so I've >> filed https://bugs.python.org/issue32229 to propose a straightforward >> "warnings.hide_warnings()" API that encapsulates things like checking >> for a non-empty sys.warnoptions list. > > I've updated the "Limitations" section of the PEP to mention that > separate proposal: > https://github.com/python/peps/commit/6e93c8d2e6ad698834578d4077b92a8fc84a70f5 Having rebased the PEP 565 patch atop the "-X dev" changes, I think that if we don't change some of the details of how `-X dev` is implemented, `warnings.hide_warnings` (or a comparable convenience API) is going to be a requirement to help app developers effectively manage their default warnings settings in 3.7+. 
The problem is that devmode doesn't currently behave the same way `-Wd` does when it comes to sys.warnoptions:

    $ ./python -Wd -c "import sys; print(sys.warnoptions); print(sys.flags.dev_mode)"
    ['d']
    False
    $ ./python -X dev -c "import sys; print(sys.warnoptions); print(sys.flags.dev_mode)"
    []
    True

As currently implemented, the warnings module actually checks `sys.flags.dev_mode` directly during startup (or `sys._xoptions` in the case of the pure Python fallback), and populates the warnings filter differently depending on what it finds:

    $ ./python -c "import warnings; print('\n'.join(map(str, warnings.filters)))"
    ('default', None, , '__main__', 0)
    ('ignore', None, , None, 0)
    ('ignore', None, , None, 0)
    ('ignore', None, , None, 0)
    ('ignore', None, , None, 0)
    ('ignore', None, , None, 0)
    $ ./python -X dev -c "import warnings; print('\n'.join(map(str, warnings.filters)))"
    ('ignore', None, , None, 0)
    ('default', None, , None, 0)
    ('default', None, , None, 0)
    $ ./python -Wd -c "import warnings; print('\n'.join(map(str, warnings.filters)))"
    ('default', None, , None, 0)
    ('default', None, , '__main__', 0)
    ('ignore', None, , None, 0)
    ('ignore', None, , None, 0)
    ('ignore', None, , None, 0)
    ('ignore', None, , None, 0)
    ('ignore', None, , None, 0)

This means the app development snippet proposed in the PEP will no longer do the right thing, since it will ignore the dev mode flag:

    if not sys.warnoptions:
        # This still runs for `-X dev`
        warnings.simplefilter("ignore")

My main suggested fix would be to adjust the way `-X dev` is implemented to include `sys.warnoptions.append('default')` (and remove the direct dev_mode query from the warnings module code).
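The first-match behaviour of the filter list being discussed here can be illustrated in pure Python: ``warnings.simplefilter()`` puts its entry at the front of ``warnings.filters``, and the first entry that matches a warning decides its fate. A small self-contained sketch:

```python
import warnings

# The first matching entry in warnings.filters wins, which is why the
# relative ordering of the "-W", "-X dev" and default entries matters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("ignore")        # entry goes to the front
    warnings.warn("old API", DeprecationWarning)
assert not caught                          # the "ignore" entry matched first

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn("old API", DeprecationWarning)
assert caught[0].category is DeprecationWarning
print("filter ordering demo passed")
```

``catch_warnings(record=True)`` restores the original filter list on exit, so the demo does not disturb global state.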
However, another possible way to go would be to make the correct Python 3.7+-only snippet look like this:

    import warnings
    warnings.hide_warnings()

And have the forward-compatible snippet look like this:

    import warnings
    if hasattr(warnings, "hide_warnings"):
        # Accounts for `-W`, `-X dev`, and any other implementation specific settings
        warnings.hide_warnings()
    else:
        # Only accounts for `-W`
        import sys
        if not sys.warnoptions:
            warnings.simplefilter("ignore")

(We can also do both, of course) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From victor.stinner at gmail.com Wed Dec 6 04:42:41 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 6 Dec 2017 10:42:41 +0100 Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__ In-Reply-To: References: Message-ID: Let's discuss the -Xdev implementation issue at https://bugs.python.org/issue32230 In short, -Xdev must add its warning at the end to respect BytesWarning, whereas that's not possible with the -W option :-( Victor Le 6 déc. 2017 09:15, "Nick Coghlan" a écrit : On 6 December 2017 at 14:50, Nick Coghlan wrote: > On 6 December 2017 at 14:34, Nick Coghlan wrote: >> That said, I go agree we could offer easier to use APIs to app >> developers that just want to hide warnings from their users, so I've >> filed https://bugs.python.org/issue32229 to propose a straightforward >> "warnings.hide_warnings()" API that encapsulates things like checking >> for a non-empty sys.warnoptions list. > > I've updated the "Limitations" section of the PEP to mention that > separate proposal: > https://github.com/python/peps/commit/6e93c8d2e6ad698834578d4077b92a8fc84a70f5 Having rebased the PEP 565 patch atop the "-X dev" changes, I think that if we don't change some of the details of how `-X dev` is implemented, `warnings.hide_warnings` (or a comparable convenience API) is going to be a requirement to help app developers effectively manage their default warnings settings in 3.7+.
The problem is that devmode doesn't currently behave the same way `-Wd` does when it comes to sys.warnoptions: $ ./python -Wd -c "import sys; print(sys.warnoptions); print(sys.flags.dev_mode)" ['d'] False $ ./python -X dev -c "import sys; print(sys.warnoptions); print(sys.flags.dev_mode)" [] True As currently implemented, the warnings module actually checks `sys.flags.dev_mode` directly during startup (or `sys._xoptions` in the case of the pure Python fallback), and populates the warnings filter differently depending on what it finds: $ ./python -c "import warnings; print('\n'.join(map(str, warnings.filters)))" ('default', None, , '__main__', 0) ('ignore', None, , None, 0) ('ignore', None, , None, 0) ('ignore', None, , None, 0) ('ignore', None, , None, 0) ('ignore', None, , None, 0) $ ./python -X dev -c "import warnings; print('\n'.join(map(str, warnings.filters)))" ('ignore', None, , None, 0) ('default', None, , None, 0) ('default', None, , None, 0) $ ./python -Wd -c "import warnings; print('\n'.join(map(str, warnings.filters)))" ('default', None, , None, 0) ('default', None, , '__main__', 0) ('ignore', None, , None, 0) ('ignore', None, , None, 0) ('ignore', None, , None, 0) ('ignore', None, , None, 0) ('ignore', None, , None, 0) This means the app development snippet proposed in the PEP will no longer do the right thing, since it will ignore the dev mode flag: if not sys.warnoptions: # This still runs for `-X dev` warnings.simplefilter("ignore") My main suggested fix would be to adjust the way `-X dev` is implemented to include `sys.warnoptions.append('default')` (and remove the direct dev_mode query from the warnings module code). 
However, another possible way to go would be to make the correct Python 3.7+-only snippet look like this: import warnings warnings.hide_warnings() And have the forward-compatible snippet look like this: import warnings: if hasattr(warnings, "hide_warnings"): # Accounts for `-W`, `-X dev`, and any other implementation specific settings warnings.hide_warnings() else: # Only accounts for `-W` import sys if not sys.warnoptions: warnings.simplefilter("ignore") (We can also do both, of course) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Wed Dec 6 05:34:59 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 6 Dec 2017 11:34:59 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: Message-ID: Hi Naoki, 2017-12-06 5:07 GMT+01:00 INADA Naoki : > Oh, revised version is really short! > > And I have one worrying point. > With UTF-8 mode, open()'s default encoding/error handler is > UTF-8/surrogateescape. The Strict UTF-8 Mode is for you if you prioritize correctness over usability. In the very first version of my PEP/idea, I wanted to use UTF-8/strict. But then I started to play with the implementation and I got many "practical" issues. Using UTF-8/strict, you quickly get encoding errors. For example, you become unable to read undecodable bytes from stdin. stdin.read() only gives you an error, without letting you decide how to handle these "invalid" data. Same issue with stdout. 
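The failure mode described above is easy to see without touching stdin at all: with the default "strict" handler, arbitrary bytes raise UnicodeDecodeError, while "surrogateescape" lets them through as lone surrogate code points. A minimal illustration:

```python
data = b"\xff\xfe spam"   # bytes that are not valid UTF-8

# The "strict" default refuses the data outright:
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

# surrogateescape maps each undecodable byte 0xNN to U+DCNN,
# so the read succeeds and no data is lost:
text = data.decode("utf-8", "surrogateescape")
print("surrogateescape:", ascii(text))
```

With a strict-mode ``sys.stdin``, the equivalent of that first branch is all the user gets; with surrogateescape, they receive the second form and can decide themselves how to handle the "invalid" data.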
Compare encodings of the UTF-8 mode and the Strict UTF-8 Mode:
https://www.python.org/dev/peps/pep-0540/#encoding-and-error-handler

I tried to summarize all these kinds of issues in the second short
subsection of the rationale:
https://www.python.org/dev/peps/pep-0540/#passthough-undecodable-bytes-surrogateescape

In the old long version of the PEP, I tried to explain UTF-8/strict
issues with very concrete examples, the removed "Use Cases" section:
https://github.com/python/peps/blob/f92b5fbdc2bcd9b182c1541da5a0f4ce32195fb6/pep-0540.txt#L490

Tell me if I should rephrase the rationale of PEP 540 to better justify
the usage of surrogateescape.

Maybe the "UTF-8 Mode" should be renamed to "UTF-8 with surrogateescape,
or backslashreplace for stderr, or surrogatepass for fsencode/fsdecode
on Windows, or strict for Strict UTF-8 Mode"... But the PEP title would
be too long, no? :-)

> And opening binary file without "b" option is very common mistake of new
> developers. If default error handler is surrogateescape, they lose a chance
> to notice their bug.

When open() is used in text mode to read "binary data", usually the
developer would only notice when getting the POSIX locale (ASCII
encoding). But PEP 538 already changed that by using the C.UTF-8 locale
(and so the UTF-8 encoding, instead of the ASCII encoding).

I'm not sure that locales are the best way to detect this class of bugs.
I suggest using the -b or -bb option to detect such bugs without having
to care about the locale.

> On the other hand, it helps some use cases when user want byte-transparent
> behavior, without modifying code to use "surrogateescape" explicitly.
>
> Which is more important scenario? Anyone has opinion about it?
> Are there any rationals and use cases I missing?

Usually users expect that Python 3 "just works" and doesn't bother them
with the locale (that nobody understands).
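The "passthrough" behavior that surrogateescape gives (and that the strict
handler denies) is easy to see in a short round-trip; an illustration, not
part of the PEP:

```python
# 'café' encoded as Latin-1 is not valid UTF-8.
data = b'caf\xe9'

try:
    data.decode('utf-8')  # errors='strict' is the default
    text = None
except UnicodeDecodeError:
    # surrogateescape smuggles the bad byte through as a lone surrogate
    text = data.decode('utf-8', errors='surrogateescape')

assert text == 'caf\udce9'
# ...and the original bytes can be recovered exactly:
assert text.encode('utf-8', errors='surrogateescape') == data
```

This round-trip property is exactly what makes surrogateescape attractive
for OS boundaries (filenames, command line, stdio) where the data may not
really be text.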
The old version of the PEP contains a long list of issues: https://github.com/python/peps/blob/f92b5fbdc2bcd9b182c1541da5a0f4ce32195fb6/pep-0540.txt#L924-L986 I already replaced the strict error handler with surrogateescape for sys.stdin and sys.stdout on the POSIX locale in Python 3.5: https://bugs.python.org/issue19977 For the rationale, read for example these comments: * https://bugs.python.org/issue19846#msg205727 "As I would state it, the problem is that python's boundary with the OS is not yet uniform. (...) Note that currently, input() and sys.stdin.read() won't read undecodable data so this is somewhat symmetrical but it seems to me that saying "everything that interfaces with the OS except the standard streams will use surrogateescape on undecodable bytes" is drawing a line in an unintuitive location." * https://bugs.python.org/issue19977#msg206141 "My impression was that python3 was supposed to help get rid of UnicodeError tracebacks, not mojibake. If mojibake was the problem then we should never have gone down the surrogateescape path for input." * https://bugs.python.org/issue19846#msg205646 "For example I'm using [LANG=C] for testcases to set the language uncomplicated to english." In bug reports, to get the user expectations, just ignore all core developers comments :-) Users set the locale to C to get messages in english and still expects "Unicode" to work properly. Only Python 3 is so strict about encodings. Most other programming languages, like Python 2, "just works", since they process data as bytes. 
Victor

From victor.stinner at gmail.com  Wed Dec  6 05:38:40 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 6 Dec 2017 11:38:40 +0100
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)
In-Reply-To: 
References: 
Message-ID: 

Nick:
> So if PEP 540 is going to implicitly trigger switching encodings, it
> needs to specify whether it's going to look for the C locale or the
> POSIX locale (I'd suggest C locale, since that's the actual default
> that causes problems).

I'm thinking of the test already used by check_force_ascii() (a function
checking whether the LC_CTYPE locale uses the ASCII encoding or something
else):

    loc = setlocale(LC_CTYPE, NULL);
    if (loc == NULL)
        goto error;
    if (strcmp(loc, "C") != 0) {
        /* the LC_CTYPE locale is different than C */
        return 0;
    }

Victor

From storchaka at gmail.com  Wed Dec  6 06:58:27 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 6 Dec 2017 13:58:27 +0200
Subject: [Python-Dev] Zero-width matching in regexes
In-Reply-To: <16f7437d-3e11-6a19-3569-e4d55a370744@mrabarnett.plus.com>
References: <16f7437d-3e11-6a19-3569-e4d55a370744@mrabarnett.plus.com>
Message-ID: 

05.12.17 01:21, MRAB wrote:
> I've finally come to a conclusion as to what the "correct" behaviour of
> zero-width matches should be: """always return the first match, but
> never a zero-width match that is joined to a previous zero-width match""".
>
> If it's about to return a zero-width match that's joined to a previous
> zero-width match, then backtrack and keep on looking for a match.

Isn't this how sub(), findall() and finditer() work in regex with
VERSION1? I agree that this behavior looks most logical and
self-consistent.

Unfortunately the different behavior of re.sub() is documented
explicitly: "Empty matches for the pattern are replaced only when not
adjacent to a previous match, so sub('x*', '-', 'abc') returns
'-a-b-c-'." And there is a test specifically guarding this behavior.
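The documented re.sub() behavior quoted above can be checked directly
(current CPython keeps this result for runs of all-empty matches):

```python
import re

# Every match of 'x*' in 'abc' is zero-width, and an empty match
# immediately adjacent to the previous (empty) match is skipped,
# so each position is replaced exactly once:
assert re.sub('x*', '-', 'abc') == '-a-b-c-'
```

Without the adjacency rule, each gap between characters would collect two
replacements instead of one.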
One time the behavior was changed when the re implementation was changed from pre to sre, but the older behavior was restored. [1] [2] [1] https://bugs.python.org/issue462270 [2] https://github.com/python/cpython/commit/21009b9c6fc40b25fcb30ee60d6108f235733e40 From ncoghlan at gmail.com Wed Dec 6 07:58:00 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 6 Dec 2017 22:58:00 +1000 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: Message-ID: On 6 December 2017 at 20:38, Victor Stinner wrote: > Nick: >> So if PEP 540 is going to implicitly trigger switching encodings, it >> needs to specify whether it's going to look for the C locale or the >> POSIX locale (I'd suggest C locale, since that's the actual default >> that causes problems). > > I'm thinking at the test already used by check_force_ascii() (function > checking if the LC_CTYPE uses the ASCII encoding or something else): > > loc = setlocale(LC_CTYPE, NULL); > if (loc == NULL) > goto error; > if (strcmp(loc, "C") != 0) { > /* the LC_CTYPE locale is different than C */ > return 0; > } Yeah, the locale coercion code changes the locale multiple times to make sure we have a coercion target that will actually work (and then checks nl_langinfo as well, since that sometimes breaks on BSD systems, even if the original setlocale() call claimed to work). Once we've found a locale that appears to work though, then we configure the LC_CTYPE environment variable, and reload the locale from the environment. It's all annoyingly convoluted and arcane, but it works well enough for https://github.com/python/cpython/blob/master/Lib/test/test_c_locale_coercion.py to pass across the full BuildBot fleet :) Cheers, Nick. 
--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From storchaka at gmail.com  Wed Dec  6 08:13:46 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 6 Dec 2017 15:13:46 +0200
Subject: [Python-Dev] Zero-width matching in regexes
In-Reply-To: 
References: <16f7437d-3e11-6a19-3569-e4d55a370744@mrabarnett.plus.com>
Message-ID: 

05.12.17 22:26, Terry Reedy wrote:
> On 12/4/2017 6:21 PM, MRAB wrote:
>> I've finally come to a conclusion as to what the "correct" behaviour
>> of zero-width matches should be: """always return the first match, but
>> never a zero-width match that is joined to a previous zero-width
>> match""".
>
> Is this different from current re or regex?

Partially. There are different ways of handling the problem of repeated
zero-width searching.

1. The one formulated by Matthew. This is the behavior of findall() and
finditer() in regex in both VERSION0 and VERSION1 modes, sub() in regex
in the VERSION1 mode, and findall() and finditer() in re since 3.7.

2. Prohibit a zero-width match that is joined to a previous match
(independent of its width). This is the behavior of sub() in re and in
regex in the VERSION0 mode, and split() in regex in the VERSION1 mode.
This is the only correctly documented and explicitly tested behavior in re.

3. Prohibit a zero-width match (always). This is the behavior of split()
in re in 3.4 and older (deprecated since 3.5) and in regex in VERSION0
mode.

4. Exclude the character following a zero-width match from following
matches. This is the behavior of findall() and finditer() in 3.6 and
older.

Case 4 is definitely incorrect. It leads to excluding characters from
matching: re.findall(r'^|\w+', 'two words') returns ['', 'wo', 'words'].

Case 3 is pretty useless. It disallows splitting on useful zero-width
patterns like `\b` and makes `\s*` just equal to `\s+`.

The difference between cases 1 and 2 is subtle.
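The character loss of behavior (4), versus behavior (1), can be shown with
Serhiy's own example; the result below is from a 3.7+ re, i.e. behavior (1):

```python
import re

# 3.6 and older (behavior 4) returned ['', 'wo', 'words']: the 't'
# following the zero-width '^' match was silently skipped.
# Since 3.7 (behavior 1) no character is lost:
assert re.findall(r'^|\w+', 'two words') == ['', 'two', 'words']
```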
The case 1 looks more logical and matches the behavior of Perl and PCRE, but the case 2 is explicitly documented and tested. This behavior is kept for compatibility with an ancient re implementation. From p.f.moore at gmail.com Wed Dec 6 08:37:47 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 6 Dec 2017 13:37:47 +0000 Subject: [Python-Dev] Zero-width matching in regexes In-Reply-To: References: <16f7437d-3e11-6a19-3569-e4d55a370744@mrabarnett.plus.com> Message-ID: On 6 December 2017 at 13:13, Serhiy Storchaka wrote: > 05.12.17 22:26, Terry Reedy ????: >> >> On 12/4/2017 6:21 PM, MRAB wrote: >>> >>> I've finally come to a conclusion as to what the "correct" behaviour of >>> zero-width matches should be: """always return the first match, but never a >>> zero-width match that is joined to a previous zero-width match""". >> >> >> Is this different from current re or regex? > > > Partially. There are different ways of handling the problem of repeated > zero-width searching. > > 1. The one formulated by Matthew. This is the behavior of findall() and > finditer() in regex in both VERSION0 and VERSION1 modes, sub() in regex in > the VERSION1 mode, and findall() and finditer() in re since 3.7. > > 2. Prohibit a zero-width match that is joined to a previous match > (independent from its width). This is the behavior of sub() in re and in > regex in the VERSION0 mode, and split() in regex in the VERSION1 mode. This > is the only correctly documented and explicitly tested behavior in re. > > 3. Prohibit a zero-width match (always). This is the behavior of split() in > re in 3.4 and older (deprecated since 3.5) and in regex in VERSION0 mode. > > 4. Exclude the character following a zero-width match from following > matches. This is the behavior of findall() and finditer() in 3.6 and older. > > The case 4 is definitely incorrect. It leads to excluding characters from > matching. re.findall(r'^|\w+', 'two words') returns ['', 'wo', 'words']. 
>
> The case 3 is pretty useless. It disallows splitting on useful zero-width
> patterns like `\b` and makes `\s*` just equal to `\s+`.
>
> The difference between cases 1 and 2 is subtle. The case 1 looks more
> logical and matches the behavior of Perl and PCRE, but the case 2 is
> explicitly documented and tested. This behavior is kept for compatibility
> with an ancient re implementation.

Behaviour (1) means that we'd get

    >>> regex.sub(r'\w*', 'x', 'hello world', flags=regex.VERSION1)
    'xx xx'

(because \w* matches the empty string after each word, as well as each
word itself). I just tested in Perl, and that is indeed what happens
there as well.

On that basis, I have to say that I find behaviour (2) more intuitive
and (arguably) "correct":

    >>> regex.sub(r'\w*', 'x', 'hello world', flags=regex.VERSION0)
    'x x'
    >>> re.sub(r'\w*', 'x', 'hello world')
    'x x'

Paul

From songofacandy at gmail.com  Wed Dec  6 09:02:16 2017
From: songofacandy at gmail.com (INADA Naoki)
Date: Wed, 6 Dec 2017 23:02:16 +0900
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)
In-Reply-To: 
References: 
Message-ID: 

>> And I have one worrying point.
>> With UTF-8 mode, open()'s default encoding/error handler is
>> UTF-8/surrogateescape.
>
> The Strict UTF-8 Mode is for you if you prioritize correctness over usability.

Yes, but as I said, I care about inexperienced developers who don't
know what UTF-8 mode is.

>
> In the very first version of my PEP/idea, I wanted to use
> UTF-8/strict. But then I started to play with the implementation and I
> got many "practical" issues. Using UTF-8/strict, you quickly get
> encoding errors. For example, you become unable to read undecodable
> bytes from stdin. stdin.read() only gives you an error, without
> letting you decide how to handle these "invalid" data. Same issue with
> stdout.
>

I don't care about stdio, because PEP 538 uses surrogateescape for
stdio/error:
https://www.python.org/dev/peps/pep-0538/#changes-to-the-default-error-handling-on-the-standard-streams

I care only about the builtin open()'s behavior. PEP 538 doesn't change
the default error handler of open().

I think PEP 538 and PEP 540 should behave almost identically, except for
changing the locale or not. So I need a very strong reason if PEP 540
changes the default error handler of open().

> In the old long version of the PEP, I tried to explain UTF-8/strict
> issues with very concrete examples, the removed "Use Cases" section:
> https://github.com/python/peps/blob/f92b5fbdc2bcd9b182c1541da5a0f4ce32195fb6/pep-0540.txt#L490
>
> Tell me if I should rephrase the rationale of the PEP 540 to better
> justify the usage of surrogateescape.

OK, the "List a directory into a text file" example demonstrates why
surrogateescape is used for open(). If os.listdir() returns
surrogateescaped data, file.write() will fail. All other examples are
about stdio. But we should achieve a good balance between correctness
and usability of the default behavior.

>
> Maybe the "UTF-8 Mode" should be renamed to "UTF-8 with
> surrogateescape, or backslashreplace for stderr, or surrogatepass for
> fsencode/fsencode on Windows, or strict for Strict UTF-8 Mode"... But
> the PEP title would be too long, no? :-)
>

I feel the short name is enough.

>
>> And opening binary file without "b" option is very common mistake of new
>> developers. If default error handler is surrogateescape, they lose a chance
>> to notice their bug.
>
> When open() is used in text mode to read "binary data", usually the
> developer would only notice when getting the POSIX locale (ASCII
> encoding). But PEP 538 already changed that by using the C.UTF-8
> locale (and so the UTF-8 encoding, instead of the ASCII encoding).
>

With PEP 538 (C.UTF-8 locale), open() uses UTF-8/strict, not
UTF-8/surrogateescape.
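The difference INADA is pointing at, strict open() catching a missing "b"
mode versus surrogateescape hiding it, can be illustrated with explicit
`errors=` arguments standing in for the two proposed modes (a sketch, not
either PEP's implementation):

```python
import os
import tempfile

# Write a few bytes that are not valid UTF-8 (e.g. the start of a JPEG).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b'\xff\xd8\xff')
    path = f.name

# UTF-8/strict (what open() does under PEP 538's C.UTF-8 locale):
# the forgotten 'b' mode is caught immediately.
try:
    with open(path, encoding='utf-8') as f:  # errors='strict' by default
        f.read()
    caught = False
except UnicodeDecodeError:
    caught = True
assert caught

# UTF-8/surrogateescape (what PEP 540's UTF-8 mode proposes for open()):
# the read "succeeds", hiding the bug behind lone surrogates.
with open(path, encoding='utf-8', errors='surrogateescape') as f:
    assert f.read() == '\udcff\udcd8\udcff'

os.unlink(path)
```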
For example, this code raises UnicodeDecodeError with PEP 538 if the
file is a JPEG file:

    with open(fn) as f:
        f.read()

> I'm not sure that locales are the best way to detect this class of
> bugs. I suggest using the -b or -bb option to detect such bugs without
> having to care about the locale.
>

But many new developers don't use or know the -b or -bb option.

>
>> On the other hand, it helps some use cases when user want byte-transparent
>> behavior, without modifying code to use "surrogateescape" explicitly.
>>
>> Which is more important scenario? Anyone has opinion about it?
>> Are there any rationals and use cases I missing?
>
> Usually users expect that Python 3 "just works" and doesn't bother them
> with the locale (that nobody understands).
>
> The old version of the PEP contains a long list of issues:
> https://github.com/python/peps/blob/f92b5fbdc2bcd9b182c1541da5a0f4ce32195fb6/pep-0540.txt#L924-L986
>
> I already replaced the strict error handler with surrogateescape for
> sys.stdin and sys.stdout on the POSIX locale in Python 3.5:
> https://bugs.python.org/issue19977
>
> For the rationale, read for example these comments:
> [snip]

OK, I'll read them and think again about open()'s default behavior.
But I still hope open()'s behavior is consistent with PEP 538 and PEP 540.

Regards,

From storchaka at gmail.com  Wed Dec  6 09:15:12 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 6 Dec 2017 16:15:12 +0200
Subject: [Python-Dev] Zero-width matching in regexes
In-Reply-To: 
References: <16f7437d-3e11-6a19-3569-e4d55a370744@mrabarnett.plus.com>
Message-ID: 

06.12.17 15:37, Paul Moore wrote:
> Behaviour (1) means that we'd get
>
> >>> regex.sub(r'\w*', 'x', 'hello world', flags=regex.VERSION1)
> 'xx xx'
>
> (because \w* matches the empty string after each word, as well as each
> word itself). I just tested in Perl, and that is indeed what happens
> there as well.

Yes, because in this case you need to use `\w+`, not `\w*`.
No CPython tests fail if re.sub() is changed to behaviour (2), except
the tests just added in 3.7 and the one test specifically intended to
guard the old behavior. But I don't know how much third party code
would be broken by this change.

> On that basis, I have to say that I find behaviour (2) more intuitive
> and (arguably) "correct":
>
> >>> regex.sub(r'\w*', 'x', 'hello world', flags=regex.VERSION0)
> 'x x'
> >>> re.sub(r'\w*', 'x', 'hello world')
> 'x x'

The actual behavior of re.sub() and regex.sub() in the VERSION0 mode
was a weird behavior (4):

    >>> regex.sub(r'(\b|\w+)', r'[\1]', 'hello world', flags=regex.VERSION0)
    '[]h[ello] []w[orld]'
    >>> regex.sub(r'(\b|\w+)', r'[\1]', 'hello world', flags=regex.VERSION1)
    '[][hello][] [][world][]'
    >>> re.sub(r'(\b|\w+)', r'[\1]', 'hello world')  # 3.6, behavior (4)
    '[]h[ello] []w[orld]'
    >>> re.sub(r'(\b|\w+)', r'[\1]', 'hello world')  # 3.7, behavior (2)
    '[][hello] [][world]'

From guido at python.org  Wed Dec  6 10:50:28 2017
From: guido at python.org (Guido van Rossum)
Date: Wed, 6 Dec 2017 07:50:28 -0800
Subject: [Python-Dev] PEP 554 v4 (new interpreters module)
In-Reply-To: 
References: 
Message-ID: 

Sorry to burst your bubble, but I have not followed any of the
discussion and I am actually very worried about this topic. I don't
think I will be able to make time for this before the 3.7b1 feature
freeze.

On Tue, Dec 5, 2017 at 6:51 PM, Eric Snow wrote:
> Hi all,
>
> I've finally updated PEP 554. Feedback would be most welcome. The
> PEP is in a pretty good place now and I hope we're close to a
> decision to accept it.
:) > > In addition to resolving the open questions, I've also made the > following changes to the PEP: > > * put an API summary at the top and moved the full API description down > * add the "is_shareable()" function to indicate if an object can be shared > * added None as a shareable object > > Regarding the open questions: > > * "Leaking exceptions across interpreters" > > I chose to go with an approach that effectively creates a > traceback.TracebackException proxy of the original exception, wraps > that in a RuntimeError, and raises that in the calling interpreter. > Raising an exception that safely preserves the original exception and > traceback seems like the most intuitive behavior (to me, as a user). > The only alternative that made sense is to fully duplicate the > exception and traceback (minus stack frames) in the calling > interpreter, which is probably overkill and likely to be confusing. > > * "Initial support for buffers in channels" > > I chose to add a "SendChannel.send_buffer(obj)" method for this. > Supporting buffer objects from the beginning makes sense, opening good > experimentation opportunities for a valuable set of users. Supporting > buffer objects separately and explicitly helps set clear expectations > for users. I decided not to go with a separate class (e.g. > MemChannel) as it didn't seem like there's enough difference to > warrant keeping them strictly separate. > > FWIW, I'm still strongly in favor of support for passing (copies of) > bytes objects via channels. Passing objects to SendChannel.send() is > obvious. Limiting it, for now, to bytes (and None) helps us avoid > tying ourselves strongly to any particular implementation (it seems > like all the reservations were relative to the implementation). So I > do not see a reason to wait. > > * "Pass channels explicitly to run()?" > > I've applied the suggested solution (make "channels" an explicit > keyword argument). 
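Eric's chosen approach for exceptions leaking across interpreters — a
traceback.TracebackException proxy of the original, wrapped in a
RuntimeError — can be sketched outside the PEP's actual implementation
(the helper name here is made up for illustration):

```python
import traceback

def proxy_exception(exc):
    # Snapshot the exception and traceback without holding on to stack
    # frames, then wrap the rendered form in a RuntimeError suitable for
    # re-raising in the calling interpreter.
    te = traceback.TracebackException.from_exception(exc)
    return RuntimeError(''.join(te.format()))

try:
    raise KeyError('spam')
except KeyError as exc:
    proxy = proxy_exception(exc)

assert isinstance(proxy, RuntimeError)
assert 'KeyError' in str(proxy)
```

TracebackException (available since 3.5) is attractive here precisely
because it captures the rendered traceback without keeping frames alive.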
> > -eric
>
>
> I've included the latest full text
> (https://www.python.org/dev/peps/pep-0554/) below:
>
> +++++++++++++++++++++++++++++++++++++++++++++++++
>
> PEP: 554
> Title: Multiple Interpreters in the Stdlib
> Author: Eric Snow
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2017-09-05
> Python-Version: 3.7
> Post-History: 07-Sep-2017, 08-Sep-2017, 13-Sep-2017, 05-Dec-2017
>
>
> Abstract
> ========
>
> CPython has supported multiple interpreters in the same process (AKA
> "subinterpreters") since version 1.5.  The feature has been available
> via the C-API. [c-api]_  Subinterpreters operate in
> `relative isolation from one another `_, which
> provides the basis for an
> `alternative concurrency model `_.
>
> This proposal introduces the stdlib ``interpreters`` module.  The module
> will be `provisional `_.  It exposes the basic
> functionality of subinterpreters already provided by the C-API, along
> with new functionality for sharing data between interpreters.
>
>
> Proposal
> ========
>
> The ``interpreters`` module will be added to the stdlib.  It will
> provide a high-level interface to subinterpreters and wrap a new
> low-level ``_interpreters`` module (in the same way as the ``threading``
> module).  See the `Examples`_ section for concrete usage and use cases.
>
> Along with exposing the existing (in CPython) subinterpreter support,
> the module will also provide a mechanism for sharing data between
> interpreters.  This mechanism centers around "channels", which are
> similar to queues and pipes.
>
> Note that *objects* are not shared between interpreters since they are
> tied to the interpreter in which they were created.  Instead, the
> objects' *data* is passed between interpreters.  See the `Shared data`_
> section for more details about sharing between interpreters.
> > At first only the following types will be supported for sharing: > > * None > * bytes > * PEP 3118 buffer objects (via ``send_buffer()``) > > Support for other basic types (e.g. int, Ellipsis) will be added later. > > API summary for interpreters module > ----------------------------------- > > Here is a summary of the API for the ``interpreters`` module. For a > more in-depth explanation of the proposed classes and functions, see > the `"interpreters" Module API`_ section below. > > For creating and using interpreters: > > +------------------------------+---------------------------- > ------------------+ > | signature | description > | > +============================+=+============================ > ==================+ > | list_all() -> [Intepreter] | Get all existing interpreters. > | > +------------------------------+---------------------------- > ------------------+ > | get_current() -> Interpreter | Get the currently running interpreter. > | > +------------------------------+---------------------------- > ------------------+ > | create() -> Interpreter | Initialize a new (idle) Python > interpreter. | > +------------------------------+---------------------------- > ------------------+ > > | > > +-----------------------+----------------------------------- > ------------------+ > | signature | description > | > +=======================+=================================== > ==================+ > | class Interpreter(id) | A single interpreter. > | > +-----------------------+----------------------------------- > ------------------+ > | .id | The interpreter's ID (read-only). > | > +-----------------------+----------------------------------- > ------------------+ > | .is_running() -> Bool | Is the interpreter currently executing code? > | > +-----------------------+----------------------------------- > ------------------+ > | .destroy() | Finalize and destroy the interpreter. 
> | > +-----------------------+----------------------------------- > ------------------+ > | .run(src_str, /, \*, | | Run the given source code in the interpreter. > | > | channels=None) | | (This blocks the current thread until done.) > | > +-----------------------+----------------------------------- > ------------------+ > > For sharing data between interpreters: > > +--------------------------------+-------------------------- > ------------------+ > | signature | description > | > +================================+========================== > ==================+ > | is_shareable(obj) -> Bool | | Can the object's data be shared > | > | | | between interpreters? > | > +--------------------------------+-------------------------- > ------------------+ > | create_channel() -> | | Create a new channel for passing > | > | (RecvChannel, SendChannel) | | data between interpreters. > | > +--------------------------------+-------------------------- > ------------------+ > | list_all_channels() -> | Get all open channels. > | > | [(RecvChannel, SendChannel)] | > | > +--------------------------------+-------------------------- > ------------------+ > > | > > +-------------------------------+--------------------------- > --------------------+ > | signature | description > | > +===============================+=========================== > ====================+ > | class RecvChannel(id) | The receiving end of a channel. > | > +-------------------------------+--------------------------- > --------------------+ > | .id | The channel's unique ID. > | > +-------------------------------+--------------------------- > --------------------+ > | .interpreters | The list of associated interpreters. > | > +-------------------------------+--------------------------- > --------------------+ > | .recv() -> object | | Get the next object from the > channel, | > | | | and wait if none have been sent. > | > | | | Associate the interpreter with the > channel. 
| > +-------------------------------+--------------------------- > --------------------+ > | .recv_nowait(default=None) -> | | Like recv(), but return the > default | > | object | | instead of waiting. > | > +-------------------------------+--------------------------- > --------------------+ > | .close() | | No longer associate the current > interpreter | > | | | with the channel (on the receiving > end). | > +-------------------------------+--------------------------- > --------------------+ > > | > > +---------------------------+------------------------------- > ------------------+ > | signature | description > | > +===========================+=============================== > ==================+ > | class SendChannel(id) | The sending end of a channel. > | > +---------------------------+------------------------------- > ------------------+ > | .id | The channel's unique ID. > | > +---------------------------+------------------------------- > ------------------+ > | .interpreters | The list of associated interpreters. > | > +---------------------------+------------------------------- > ------------------+ > | .send(obj) | | Send the object (i.e. its data) to the > | > | | | receiving end of the channel and wait. > | > | | | Associate the interpreter with the > channel. | > +---------------------------+------------------------------- > ------------------+ > | .send_nowait(obj) | | Like send(), but Fail if not received. > | > +---------------------------+------------------------------- > ------------------+ > | .send_buffer(obj) | | Send the object's (PEP 3118) buffer to > the | > | | | receiving end of the channel and wait. > | > | | | Associate the interpreter with the > channel. | > +---------------------------+------------------------------- > ------------------+ > | .send_buffer_nowait(obj) | | Like send_buffer(), but fail if not > received. 
| > +---------------------------+------------------------------- > ------------------+ > | .close() | | No longer associate the current > interpreter | > | | | with the channel (on the sending end). > | > +---------------------------+------------------------------- > ------------------+ > > > Examples > ======== > > Run isolated code > ----------------- > > :: > > interp = interpreters.create() > print('before') > interp.run('print("during")') > print('after') > > Run in a thread > --------------- > > :: > > interp = interpreters.create() > def run(): > interp.run('print("during")') > t = threading.Thread(target=run) > print('before') > t.start() > print('after') > > Pre-populate an interpreter > --------------------------- > > :: > > interp = interpreters.create() > interp.run(tw.dedent(""" > import some_lib > import an_expensive_module > some_lib.set_up() > """)) > wait_for_request() > interp.run(tw.dedent(""" > some_lib.handle_request() > """)) > > Handling an exception > --------------------- > > :: > > interp = interpreters.create() > try: > interp.run(tw.dedent(""" > raise KeyError > """)) > except KeyError: > print("got the error from the subinterpreter") > > Synchronize using a channel > --------------------------- > > :: > > interp = interpreters.create() > r, s = interpreters.create_channel() > def run(): > interp.run(tw.dedent(""" > reader.recv() > print("during") > reader.close() > """), > reader=r)) > t = threading.Thread(target=run) > print('before') > t.start() > print('after') > s.send(b'') > s.close() > > Sharing a file descriptor > ------------------------- > > :: > > interp = interpreters.create() > r1, s1 = interpreters.create_channel() > r2, s2 = interpreters.create_channel() > def run(): > interp.run(tw.dedent(""" > fd = int.from_bytes( > reader.recv(), 'big') > for line in os.fdopen(fd): > print(line) > writer.send(b'') > """), > reader=r1, writer=s2) > t = threading.Thread(target=run) > t.start() > with open('spamspamspam') as infile: > fd = 
infile.fileno().to_bytes(1, 'big') > s.send(fd) > r.recv() > > Passing objects via marshal > --------------------------- > > :: > > interp = interpreters.create() > r, s = interpreters.create_fifo() > interp.run(tw.dedent(""" > import marshal > """), > reader=r) > def run(): > interp.run(tw.dedent(""" > data = reader.recv() > while data: > obj = marshal.loads(data) > do_something(obj) > data = reader.recv() > reader.close() > """), > reader=r) > t = threading.Thread(target=run) > t.start() > for obj in input: > data = marshal.dumps(obj) > s.send(data) > s.send(b'') > > Passing objects via pickle > -------------------------- > > :: > > interp = interpreters.create() > r, s = interpreters.create_channel() > interp.run(tw.dedent(""" > import pickle > """), > reader=r) > def run(): > interp.run(tw.dedent(""" > data = reader.recv() > while data: > obj = pickle.loads(data) > do_something(obj) > data = reader.recv() > reader.close() > """), > reader=r) > t = threading.Thread(target=run) > t.start() > for obj in input: > data = pickle.dumps(obj) > s.send(data) > s.send(b'') > > Running a module > ---------------- > > :: > > interp = interpreters.create() > main_module = mod_name > interp.run(f'import runpy; runpy.run_module({main_module!r})') > > Running as script (including zip archives & directories) > -------------------------------------------------------- > > :: > > interp = interpreters.create() > main_script = path_name > interp.run(f"import runpy; runpy.run_path({main_script!r})") > > Running in a thread pool executor > --------------------------------- > > :: > > interps = [interpreters.create() for i in range(5)] > with concurrent.futures.ThreadPoolExecutor(max_workers=len(interps)) > as pool: > print('before') > for interp in interps: > pool.submit(interp.run, 'print("starting"); print("stopping")' > print('after') > > > Rationale > ========= > > Running code in multiple interpreters provides a useful level of > isolation within the same process. 
> This can be leveraged in a number
> of ways.  Furthermore, subinterpreters provide a well-defined framework
> in which such isolation may be extended.
>
> Nick Coghlan explained some of the benefits through a comparison with
> multi-processing [benefits]_::
>
>    [I] expect that communicating between subinterpreters is going
>    to end up looking an awful lot like communicating between
>    subprocesses via shared memory.
>
>    The trade-off between the two models will then be that one still
>    just looks like a single process from the point of view of the
>    outside world, and hence doesn't place any extra demands on the
>    underlying OS beyond those required to run CPython with a single
>    interpreter, while the other gives much stricter isolation
>    (including isolating C globals in extension modules), but also
>    demands much more from the OS when it comes to its IPC
>    capabilities.
>
>    The security risk profiles of the two approaches will also be quite
>    different, since using subinterpreters won't require deliberately
>    poking holes in the process isolation that operating systems give
>    you by default.
>
> CPython has supported subinterpreters, with increasing levels of
> support, since version 1.5.  While the feature has the potential
> to be a powerful tool, subinterpreters have suffered from neglect
> because they are not available directly from Python.  Exposing the
> existing functionality in the stdlib will help reverse the situation.
>
> This proposal is focused on enabling the fundamental capability of
> multiple isolated interpreters in the same Python process.  This is a
> new area for Python so there is relative uncertainty about the best
> tools to provide as companions to subinterpreters.  Thus we minimize
> the functionality we add in the proposal as much as possible.
>
> Concerns
> --------
>
> * "subinterpreters are not worth the trouble"
>
> Some have argued that subinterpreters do not add sufficient benefit
> to justify making them an official part of Python.
Adding features
> to the language (or stdlib) has a cost in increasing the size of
> the language. So an addition must pay for itself. In this case,
> subinterpreters provide a novel concurrency model focused on isolated
> threads of execution. Furthermore, they provide an opportunity for
> changes in CPython that will allow simultaneous use of multiple CPU
> cores (currently prevented by the GIL).
>
> Alternatives to subinterpreters include threading, async, and
> multiprocessing. Threading is limited by the GIL and async isn't
> the right solution for every problem (nor for every person).
> Multiprocessing is likewise valuable in some but not all situations.
> Direct IPC (rather than via the multiprocessing module) provides
> similar benefits but with the same caveat.
>
> Notably, subinterpreters are not intended as a replacement for any of
> the above. Certainly they overlap in some areas, but the benefits of
> subinterpreters include isolation and (potentially) performance. In
> particular, subinterpreters provide a direct route to an alternate
> concurrency model (e.g. CSP) which has found success elsewhere and
> will appeal to some Python users. That is the core value that the
> ``interpreters`` module will provide.
>
> * "stdlib support for subinterpreters adds extra burden
> on C extension authors"
>
> In the `Interpreter Isolation`_ section below we identify ways in
> which isolation in CPython's subinterpreters is incomplete. Most
> notable is extension modules that use C globals to store internal
> state. PEP 3121 and PEP 489 provide a solution for most of the
> problem, but one still remains. [petr-c-ext]_ Until that is resolved,
> C extension authors will face extra difficulty to support
> subinterpreters.
>
> Consequently, projects that publish extension modules may face an
> increased maintenance burden as their users start using subinterpreters,
> where their modules may break.
This situation is limited to modules > that use C globals (or use libraries that use C globals) to store > internal state. For numpy, the reported-bug rate is one every 6 > months. [bug-rate]_ > > Ultimately this comes down to a question of how often it will be a > problem in practice: how many projects would be affected, how often > their users will be affected, what the additional maintenance burden > will be for projects, and what the overall benefit of subinterpreters > is to offset those costs. The position of this PEP is that the actual > extra maintenance burden will be small and well below the threshold at > which subinterpreters are worth it. > > > About Subinterpreters > ===================== > > Concurrency > ----------- > > Concurrency is a challenging area of software development. Decades of > research and practice have led to a wide variety of concurrency models, > each with different goals. Most center on correctness and usability. > > One class of concurrency models focuses on isolated threads of > execution that interoperate through some message passing scheme. A > notable example is `Communicating Sequential Processes`_ (CSP), upon > which Go's concurrency is based. The isolation inherent to > subinterpreters makes them well-suited to this approach. > > Shared data > ----------- > > Subinterpreters are inherently isolated (with caveats explained below), > in contrast to threads. So the same communicate-via-shared-memory > approach doesn't work. Without an alternative, effective use of > concurrency via subinterpreters is significantly limited. > > The key challenge here is that sharing objects between interpreters > faces complexity due to various constraints on object ownership, > visibility, and mutability. At a conceptual level it's easier to > reason about concurrency when objects only exist in one interpreter > at a time. 
At a technical level, CPython's current memory model
> limits how Python *objects* may be shared safely between interpreters;
> effectively objects are bound to the interpreter in which they were
> created. Furthermore the complexity of *object* sharing increases as
> subinterpreters become more isolated, e.g. after GIL removal.
>
> Consequently, the mechanism for sharing needs to be carefully considered.
> There are a number of valid solutions, several of which may be
> appropriate to support in Python. This proposal provides a single basic
> solution: "channels". Ultimately, any other solution will look similar
> to the proposed one, which will set the precedent. Note that the
> implementation of ``Interpreter.run()`` can be done in a way that allows
> for multiple solutions to coexist, but doing so is not technically
> a part of the proposal here.
>
> Regarding the proposed solution, "channels", it is a basic, opt-in data
> sharing mechanism that draws inspiration from pipes, queues, and CSP's
> channels. [fifo]_
>
> As simply described earlier by the API summary,
> channels have two operations: send and receive. A key characteristic
> of those operations is that channels transmit data derived from Python
> objects rather than the objects themselves. When objects are sent,
> their data is extracted. When the "object" is received in the other
> interpreter, the data is converted back into an object.
>
> To make this work, the mutable shared state will be managed by the
> Python runtime, not by any of the interpreters. Initially we will
> support only one type of object for shared state: the channels provided
> by ``create_channel()``. Channels, in turn, will carefully manage
> passing objects between interpreters.
>
> This approach, including keeping the API minimal, helps us avoid further
> exposing any underlying complexity to Python users.
Along those same > lines, we will initially restrict the types that may be passed through > channels to the following: > > * None > * bytes > * PEP 3118 buffer objects (via ``send_buffer()``) > > Limiting the initial shareable types is a practical matter, reducing > the potential complexity of the initial implementation. There are a > number of strategies we may pursue in the future to expand supported > objects and object sharing strategies. > > Interpreter Isolation > --------------------- > > CPython's interpreters are intended to be strictly isolated from each > other. Each interpreter has its own copy of all modules, classes, > functions, and variables. The same applies to state in C, including in > extension modules. The CPython C-API docs explain more. [caveats]_ > > However, there are ways in which interpreters share some state. First > of all, some process-global state remains shared: > > * file descriptors > * builtin types (e.g. dict, bytes) > * singletons (e.g. None) > * underlying static module data (e.g. functions) for > builtin/extension/frozen modules > > There are no plans to change this. > > Second, some isolation is faulty due to bugs or implementations that did > not take subinterpreters into account. This includes things like > extension modules that rely on C globals. [cryptography]_ In these > cases bugs should be opened (some are already): > > * readline module hook functions (http://bugs.python.org/issue4202) > * memory leaks on re-init (http://bugs.python.org/issue21387) > > Finally, some potential isolation is missing due to the current design > of CPython. Improvements are currently going on to address gaps in this > area: > > * interpreters share the GIL > * interpreters share memory management (e.g. 
allocators, gc)
> * GC is not run per-interpreter [global-gc]_
> * at-exit handlers are not run per-interpreter [global-atexit]_
> * extensions using the ``PyGILState_*`` API are incompatible [gilstate]_
>
> Existing Usage
> --------------
>
> Subinterpreters are not a widely used feature. In fact, the only
> documented cases of wide-spread usage are
> `mod_wsgi `_ and
> `JEP `_. On the one hand, these cases
> provide confidence that existing subinterpreter support is relatively
> stable. On the other hand, there isn't much of a sample size from which
> to judge the utility of the feature.
>
>
> Provisional Status
> ==================
>
> The new ``interpreters`` module will be added with "provisional" status
> (see PEP 411). This allows Python users to experiment with the feature
> and provide feedback while still allowing us to adjust to that feedback.
> The module will be provisional in Python 3.7 and we will make a decision
> before the 3.8 release whether to keep it provisional, graduate it, or
> remove it.
>
>
> Alternate Python Implementations
> ================================
>
> I'll be soliciting feedback from the different Python implementors about
> subinterpreter support.
>
> Multiple-interpreter support in the major Python implementations:
>
> TBD
>
> * jython: yes [jython]_
> * ironpython: yes?
> * pypy: maybe not? [pypy]_
> * micropython: ???
>
>
> "interpreters" Module API
> =========================
>
> The module provides the following functions:
>
> ``list_all()``::
>
> Return a list of all existing interpreters.
>
> ``get_current()``::
>
> Return the currently running interpreter.
>
> ``create()``::
>
> Initialize a new Python interpreter and return it. The
> interpreter will be created in the current thread and will remain
> idle until something is run in it. The interpreter may be used
> in any thread and will run in whichever thread calls
> ``interp.run()``.
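Since the proposed ``interpreters`` module does not exist yet, here is a toy, single-process stand-in illustrating the shape of the API described above (``create()``, ``list_all()``, and a ``run()`` that executes source text in a persistent per-interpreter ``__main__`` namespace). It is illustrative only: ``exec()`` stands in for real interpreter execution, so there is no actual isolation, and the class and names are hypothetical.

```python
# Toy stand-in for the proposed "interpreters" module API.
# NOT the real implementation: run() is emulated with exec() in a
# persistent namespace, so nothing here is actually isolated.
_interpreters = []

class ToyInterpreter:
    def __init__(self, id):
        self.id = id
        # Emulates the interpreter's __main__ namespace, which the
        # proposal says is never reset between run() calls.
        self._main = {'__name__': '__main__'}

    def run(self, source):
        # Execute source text in this "interpreter's" namespace.
        exec(source, self._main)

def create():
    """Emulate interpreters.create(): make a new idle interpreter."""
    interp = ToyInterpreter(len(_interpreters))
    _interpreters.append(interp)
    return interp

def list_all():
    """Emulate interpreters.list_all()."""
    return list(_interpreters)

interp = create()
interp.run('x = 1')
interp.run('x += 1')       # state persists across run() calls
print(interp._main['x'])   # 2
```

Note how the second ``run()`` call sees the state left by the first, mirroring the "never reset" semantics described for the real API below.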
>
>
> The module also provides the following class:
>
> ``Interpreter(id)``::
>
> id:
>
> The interpreter's ID (read-only).
>
> is_running():
>
> Return whether or not the interpreter is currently executing code.
> Calling this on the current interpreter will always return True.
>
> destroy():
>
> Finalize and destroy the interpreter.
>
> This may not be called on an already running interpreter. Doing
> so results in a RuntimeError.
>
> run(source_str, /, *, channels=None):
>
> Run the provided Python source code in the interpreter. If the
> "channels" keyword argument is provided (and is a mapping of
> attribute names to channels) then it is added to the interpreter's
> execution namespace (the interpreter's "__main__" module). If any
> of the values are not RecvChannel or SendChannel instances
> then ValueError gets raised.
>
> This may not be called on an already running interpreter. Doing
> so results in a RuntimeError.
>
> A "run()" call is similar to a function call. Once it completes,
> the code that called "run()" continues executing (in the original
> interpreter). Likewise, if there is any uncaught exception then
> it effectively (see below) propagates into the code where
> ``run()`` was called. However, unlike function calls (but like
> threads), there is no return value. If any value is needed, pass
> it out via a channel.
>
> The big difference is that "run()" executes the code in an
> entirely different interpreter, with entirely separate state.
> The state of the current interpreter in the current OS thread
> is swapped out with the state of the target interpreter (the one
> that will execute the code). When the target finishes executing,
> the original interpreter gets swapped back in and its execution
> resumes.
>
> So calling "run()" will effectively cause the current Python
> thread to pause. Sometimes you won't want that pause, in which
> case you should make the "run()" call in another thread.
To do
> so, add a function that calls "run()" and then run that function
> in a normal "threading.Thread".
>
> Note that the interpreter's state is never reset, neither before
> "run()" executes the code nor after. Thus the interpreter
> state is preserved between calls to "run()". This includes
> "sys.modules", the "builtins" module, and the internal state
> of C extension modules.
>
> Also note that "run()" executes in the namespace of the "__main__"
> module, just like scripts, the REPL, "-m", and "-c". Just as
> the interpreter's state is not ever reset, the "__main__" module
> is never reset. You can imagine concatenating the code from each
> "run()" call into one long script. This is the same as how the
> REPL operates.
>
> Regarding uncaught exceptions, we noted that they are
> "effectively" propagated into the code where ``run()`` was called.
> To prevent leaking exceptions (and tracebacks) between
> interpreters, we create a surrogate of the exception and its
> traceback (see ``traceback.TracebackException``), wrap it in a
> RuntimeError, and raise that.
>
> Supported code: source text.
>
>
> API for sharing data
> --------------------
>
> Subinterpreters are less useful without a mechanism for sharing data
> between them. Sharing actual Python objects between interpreters,
> however, has enough potential problems that we are avoiding support
> for that here. Instead, only a minimum set of types will be supported.
> Initially this will include ``bytes`` and channels. Further types may
> be supported later.
>
> The ``interpreters`` module provides a way for users to determine
> whether an object is shareable or not:
>
> ``is_shareable(obj)``::
>
> Return True if the object may be shared between interpreters. This
> does not necessarily mean that the actual objects will be shared.
> Instead, it means that the objects' underlying data will be shared in
> a cross-interpreter way, whether via a proxy, a copy, or some other
> means.
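As a rough illustration of the semantics described for ``is_shareable()``, the check amounts to membership in a small allow-list of types whose underlying data can be transmitted. This sketch is NOT the actual implementation (which would live in the runtime, in C), and the ``channel_types`` parameter is hypothetical, standing in for the channel-end classes:

```python
def is_shareable(obj, channel_types=()):
    """Sketch of the is_shareable() predicate: an allow-list check
    over the initially supported types (None, bytes, channel ends).
    Hypothetical pure-Python approximation, not the real thing."""
    if obj is None:
        return True
    # bytes are shareable: their underlying data is copied across.
    if isinstance(obj, bytes):
        return True
    # Channel ends (RecvChannel/SendChannel) would match here; they
    # are represented by the hypothetical channel_types parameter.
    return isinstance(obj, tuple(channel_types))

print(is_shareable(b'spam'))   # True
print(is_shareable(None))      # True
print(is_shareable('spam'))    # False: str is not initially supported
```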
>
> This proposal provides two ways to share such objects between
> interpreters.
>
> First, shareable objects may be passed to ``run()`` as keyword arguments,
> where they are effectively injected into the target interpreter's
> ``__main__`` module. This is mainly intended for sharing meta-objects
> (e.g. channels) between interpreters, as it is less useful to pass other
> objects (like ``bytes``) to ``run``.
>
> Second, the main mechanism for sharing objects (i.e. their data) between
> interpreters is through channels. A channel is a simplex FIFO similar
> to a pipe. The main difference is that channels can be associated with
> zero or more interpreters on either end. Unlike queues, which are also
> many-to-many, channels have no buffer.
>
> ``create_channel()``::
>
> Create a new channel and return (recv, send), the RecvChannel and
> SendChannel corresponding to the ends of the channel. The channel
> is not closed and destroyed (i.e. garbage-collected) until the number
> of associated interpreters returns to 0.
>
> An interpreter gets associated with a channel by calling its "send()"
> or "recv()" method. That association gets dropped by calling
> "close()" on the channel.
>
> Both ends of the channel are supported "shared" objects (i.e. may be
> safely shared by different interpreters). Thus they may be passed as
> keyword arguments to "Interpreter.run()".
>
> ``list_all_channels()``::
>
> Return a list of all open (RecvChannel, SendChannel) pairs.
>
>
> ``RecvChannel(id)``::
>
> The receiving end of a channel. An interpreter may use this to
> receive objects from another interpreter. At first only bytes will
> be supported.
>
> id:
>
> The channel's unique ID.
>
> interpreters:
>
> The list of associated interpreters: those that have called
> the "recv()" or "__next__()" methods and haven't called "close()".
>
> recv():
>
> Return the next object (i.e. the data from the sent object) from
> the channel. If none have been sent then wait until the next
> send.
This associates the current interpreter with the channel.
>
> If the channel is already closed (see the close() method)
> then raise EOFError. If the channel isn't closed, but the current
> interpreter already called the "close()" method (which drops its
> association with the channel) then raise ValueError.
>
> recv_nowait(default=None):
>
> Return the next object from the channel. If none have been sent
> then return the default. Otherwise, this is the same as the
> "recv()" method.
>
> close():
>
> No longer associate the current interpreter with the channel (on
> the receiving end) and block future association (via the "recv()"
> method). If the interpreter was never associated with the channel
> then still block future association. Once an interpreter is no
> longer associated with the channel, subsequent (or current) send()
> and recv() calls from that interpreter will raise ValueError
> (or EOFError if the channel is actually marked as closed).
>
> Once the number of associated interpreters on both ends drops
> to 0, the channel is actually marked as closed. The Python
> runtime will garbage collect all closed channels, though it may
> not happen immediately. Note that "close()" is automatically called
> on behalf of the current interpreter when the channel is no longer
> used (i.e. has no references) in that interpreter.
>
> This operation is idempotent. Return True if "close()" has not
> been called before by the current interpreter.
>
>
> ``SendChannel(id)``::
>
> The sending end of a channel. An interpreter may use this to send
> objects to another interpreter. At first only bytes will be
> supported.
>
> id:
>
> The channel's unique ID.
>
> interpreters:
>
> The list of associated interpreters (those that have called
> the "send()" method).
>
> send(obj):
>
> Send the object (i.e. its data) to the receiving end of the
> channel. Wait until the object is received. If the
> object is not shareable then ValueError is raised.
Currently
> only bytes are supported.
>
> If the channel is already closed (see the close() method)
> then raise EOFError. If the channel isn't closed, but the current
> interpreter already called the "close()" method (which drops its
> association with the channel) then raise ValueError.
>
> send_nowait(obj):
>
> Send the object to the receiving end of the channel. If the other
> end is not currently receiving then raise RuntimeError. Otherwise
> this is the same as "send()".
>
> send_buffer(obj):
>
> Send a MemoryView of the object rather than the object. Otherwise
> this is the same as send(). Note that the object must implement
> the PEP 3118 buffer protocol.
>
> send_buffer_nowait(obj):
>
> Send a MemoryView of the object rather than the object. If the
> other end is not currently receiving then raise RuntimeError.
> Otherwise this is the same as "send_buffer()".
>
> close():
>
> This is the same as "RecvChannel.close()", but applied to the
> sending end of the channel.
>
> Note that ``send_buffer()`` is similar to how
> ``multiprocessing.Connection`` works. [mp-conn]_
>
>
> Open Questions
> ==============
>
> None
>
>
> Open Implementation Questions
> =============================
>
> Does every interpreter think that their thread is the "main" thread?
> --------------------------------------------------------------------
>
> (This is more of an implementation detail than an issue for the PEP.)
>
> CPython's interpreter implementation identifies the OS thread in which
> it was started as the "main" thread. The interpreter then has slightly
> different behavior depending on if the current thread is the main one
> or not. This presents a problem in cases where "main thread" is meant
> to imply "main thread in the main interpreter" [main-thread]_, where
> the main interpreter is the initial one.
>
> Disallow subinterpreters in the main thread?
> --------------------------------------------
>
> (This is more of an implementation detail than an issue for the PEP.)
> > This is a specific case of the above issue. Currently in CPython, > "we need a main \*thread\* in order to sensibly manage the way signal > handling works across different platforms". [main-thread]_ > > Since signal handlers are part of the interpreter state, running a > subinterpreter in the main thread means that the main interpreter > can no longer properly handle signals (since it's effectively paused). > > Furthermore, running a subinterpreter in the main thread would > conceivably allow setting signal handlers on that interpreter, which > would likewise impact signal handling when that interpreter isn't > running or is running in a different thread. > > Ultimately, running subinterpreters in the main OS thread introduces > complications to the signal handling implementation. So it may make > the most sense to disallow running subinterpreters in the main thread. > Support for it could be considered later. The downside is that folks > wanting to try out subinterpreters would be required to take the extra > step of using threads. This could slow adoption and experimentation, > whereas without the restriction there's less of an obstacle. > > > Deferred Functionality > ====================== > > In the interest of keeping this proposal minimal, the following > functionality has been left out for future consideration. Note that > this is not a judgement against any of said capability, but rather a > deferment. That said, each is arguably valid. > > Interpreter.call() > ------------------ > > It would be convenient to run existing functions in subinterpreters > directly. ``Interpreter.run()`` could be adjusted to support this or > a ``call()`` method could be added:: > > Interpreter.call(f, *args, **kwargs) > > This suffers from the same problem as sharing objects between > interpreters via queues. The minimal solution (running a source string) > is sufficient for us to get the feature out where it can be explored. 
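The "minimal solution" above (running a source string) can in fact emulate a ``call()``-style API by smuggling pickled arguments into the source text. In this sketch, a plain ``exec`` into a namespace stands in for ``Interpreter.run()``, and ``call_via_source`` is a hypothetical helper, not part of the proposal:

```python
import pickle
import textwrap

def call_via_source(run, func_module, func_name, *args):
    """Sketch: emulate Interpreter.call(f, *args) on top of a
    run-a-source-string primitive. The args are pickled into the
    source text; the result lands in the target namespace."""
    payload = pickle.dumps(args)
    run(textwrap.dedent(f"""
        import pickle
        from {func_module} import {func_name}
        args = pickle.loads({payload!r})
        result = {func_name}(*args)
    """))

# exec() into a dict stands in for running source in a subinterpreter.
ns = {}
call_via_source(lambda src: exec(src, ns), 'math', 'gcd', 12, 18)
print(ns['result'])  # 6
```

This also shows why the source-string primitive is "sufficient": anything a ``call()`` method would do can be expressed in terms of it, at the cost of serializing the arguments.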
> > timeout arg to recv() and send() > -------------------------------- > > Typically functions that have a ``block`` argument also have a > ``timeout`` argument. It sometimes makes sense to do likewise for > functions that otherwise block, like the channel ``recv()`` and > ``send()`` methods. We can add it later if needed. > > get_main() > ---------- > > CPython has a concept of a "main" interpreter. This is the initial > interpreter created during CPython's runtime initialization. It may > be useful to identify the main interpreter. For instance, the main > interpreter should not be destroyed. However, for the basic > functionality of a high-level API a ``get_main()`` function is not > necessary. Furthermore, there is no requirement that a Python > implementation have a concept of a main interpreter. So until there's > a clear need we'll leave ``get_main()`` out. > > Interpreter.run_in_thread() > --------------------------- > > This method would make a ``run()`` call for you in a thread. Doing this > using only ``threading.Thread`` and ``run()`` is relatively trivial so > we've left it out. > > Synchronization Primitives > -------------------------- > > The ``threading`` module provides a number of synchronization primitives > for coordinating concurrent operations. This is especially necessary > due to the shared-state nature of threading. In contrast, > subinterpreters do not share state. Data sharing is restricted to > channels, which do away with the need for explicit synchronization. If > any sort of opt-in shared state support is added to subinterpreters in > the future, that same effort can introduce synchronization primitives > to meet that need. > > CSP Library > ----------- > > A ``csp`` module would not be a large step away from the functionality > provided by this PEP. However, adding such a module is outside the > minimalist goals of this proposal. 
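To make the CSP connection concrete, here is a minimal sketch of an unbuffered (rendezvous) channel, the core primitive such a ``csp`` module would offer. It is built on plain threads rather than subinterpreters, and the class name is hypothetical; the point is only the blocking semantics: ``send()`` does not return until a ``recv()`` takes the item.

```python
import threading

class RendezvousChannel:
    """Minimal CSP-style unbuffered channel sketch (threads, not
    subinterpreters): every transfer is a sender/receiver rendezvous."""
    def __init__(self):
        self._send_lock = threading.Lock()    # one in-flight send at a time
        self._ready = threading.Semaphore(0)  # an item is available
        self._taken = threading.Semaphore(0)  # the item was consumed
        self._item = None

    def send(self, obj):
        with self._send_lock:
            self._item = obj
            self._ready.release()
            self._taken.acquire()  # block until a receiver takes it

    def recv(self):
        self._ready.acquire()      # block until a sender arrives
        obj = self._item
        self._taken.release()
        return obj

ch = RendezvousChannel()
received = []
t = threading.Thread(target=lambda: received.append(ch.recv()))
t.start()
ch.send(b'spam')  # returns only once the receiver has the item
t.join()
print(received)   # [b'spam']
```

Buffered queues (see `Rejected Ideas` below) relax exactly this rendezvous property, which is why they complicate the blocking semantics of ``send()`` and ``recv()``.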
>
> Syntactic Support
> -----------------
>
> The ``Go`` language provides a concurrency model based on CSP, so
> it's similar to the concurrency model that subinterpreters support.
> ``Go`` provides syntactic support, as well as several builtin concurrency
> primitives, to make concurrency a first-class feature. Conceivably,
> similar syntactic (and builtin) support could be added to Python using
> subinterpreters. However, that is *way* outside the scope of this PEP!
>
> Multiprocessing
> ---------------
>
> The ``multiprocessing`` module could support subinterpreters in the same
> way it supports threads and processes. In fact, the module's
> maintainer, Davin Potts, has indicated this is a reasonable feature
> request. However, it is outside the narrow scope of this PEP.
>
> C-extension opt-in/opt-out
> --------------------------
>
> By using the ``PyModuleDef_Slot`` introduced by PEP 489, we could easily
> add a mechanism by which C-extension modules could opt out of support
> for subinterpreters. Then the import machinery, when operating in
> a subinterpreter, would need to check the module for support. It would
> raise an ImportError if unsupported.
>
> Alternately we could support opting in to subinterpreter support.
> However, that would probably exclude many more modules (unnecessarily)
> than the opt-out approach.
>
> The scope of adding the ModuleDef slot and fixing up the import
> machinery is non-trivial, but could be worth it. It all depends on
> how many extension modules break under subinterpreters. Given the
> relatively few cases we know of through mod_wsgi, we can leave this
> for later.
>
> Poisoning channels
> ------------------
>
> CSP has the concept of poisoning a channel. Once a channel has been
> poisoned, any ``send()`` or ``recv()`` call on it will raise a special
> exception, effectively ending execution in the interpreter that tried
> to use the poisoned channel.
>
> This could be accomplished by adding a ``poison()`` method to both ends
> of the channel. The ``close()`` method could work if it had a ``force``
> option to force the channel closed. Regardless, these semantics are
> relatively specialized and can wait.
>
> Sending channels over channels
> ------------------------------
>
> Some advanced usage of subinterpreters could take advantage of the
> ability to send channels over channels, in addition to bytes. Given
> that channels will already be multi-interpreter safe, supporting them
> in ``RecvChannel.recv()`` wouldn't be a big change. However, this can
> wait until the basic functionality has been ironed out.
>
> Resetting __main__
> ------------------
>
> As proposed, every call to ``Interpreter.run()`` will execute in the
> namespace of the interpreter's existing ``__main__`` module. This means
> that data persists there between ``run()`` calls. Sometimes this isn't
> desirable and you want to execute in a fresh ``__main__``. Also,
> you don't necessarily want to leak objects there that you aren't using
> any more.
>
> Note that the following won't work right because it will clear too much
> (e.g. ``__name__`` and the other "__dunder__" attributes)::
>
> interp.run('globals().clear()')
>
> Possible solutions include:
>
> * a ``create()`` arg to indicate resetting ``__main__`` after each
> ``run`` call
> * an ``Interpreter.reset_main`` flag to support opting in or out
> after the fact
> * an ``Interpreter.reset_main()`` method to opt in when desired
> * ``importlib.util.reset_globals()`` [reset_globals]_
>
> Also note that resetting ``__main__`` does nothing about state stored
> in other modules. So any solution would have to be clear about the
> scope of what is being reset. Conceivably we could invent a mechanism
> by which any (or every) module could be reset, unlike ``reload()``
> which does not clear the module before loading into it.
Regardless,
> since ``__main__`` is the execution namespace of the interpreter,
> resetting it has a much more direct correlation to interpreters and
> their dynamic state than does resetting other modules. So a more
> generic module reset mechanism may prove unnecessary.
>
> This isn't a critical feature initially. It can wait until later
> if desirable.
>
> Support passing ints in channels
> --------------------------------
>
> Passing ints around should be fine and ultimately is probably
> desirable. However, we can get by with serializing them as bytes
> for now. The goal is a minimal API for the sake of basic
> functionality at first.
>
> File descriptors and sockets in channels
> ----------------------------------------
>
> Given that file descriptors and sockets are process-global resources,
> support for passing them through channels is a reasonable idea. They
> would be a good candidate for the first effort at expanding the types
> that channels support. They aren't strictly necessary for the initial
> API.
>
> Integration with async
> ----------------------
>
> Per Antoine Pitrou [async]_::
>
> Has any thought been given to how FIFOs could integrate with async
> code driven by an event loop (e.g. asyncio)? I think the model of
> executing several asyncio (or Tornado) applications each in their
> own subinterpreter may prove quite interesting to reconcile multi-
> core concurrency with ease of programming. That would require the
> FIFOs to be able to synchronize on something an event loop can wait
> on (probably a file descriptor?).
>
> A possible solution is to provide async implementations of the blocking
> channel methods (``__next__()``, ``recv()``, and ``send()``). However,
> the basic functionality of subinterpreters does not depend on async and
> can be added later.
>
> Support for iteration
> ---------------------
>
> Supporting iteration on ``RecvChannel`` (via ``__iter__()`` or
> ``__next__()``) may be useful.
A trivial implementation would use the
> ``recv()`` method, similar to how files do iteration. Since this isn't
> a fundamental capability and has a simple analog, adding iteration
> support can wait until later.
>
> Channel context managers
> ------------------------
>
> Context manager support on ``RecvChannel`` and ``SendChannel`` may be
> helpful. The implementation would be simple, wrapping a call to
> ``close()`` like files do. As with iteration, this can wait.
>
> Pipes and Queues
> ----------------
>
> With the proposed object passing mechanism of "channels", other similar
> basic types aren't required to achieve the minimal useful functionality
> of subinterpreters. Such types include pipes (like channels, but
> one-to-one) and queues (like channels, but buffered). See below in
> `Rejected Ideas` for more information.
>
> Even though these types aren't part of this proposal, they may still
> be useful in the context of concurrency. Adding them later is entirely
> reasonable. They could be trivially implemented as wrappers around
> channels. Alternatively they could be implemented for efficiency at the
> same low level as channels.
>
> interpreters.RunFailedError
> ---------------------------
>
> As currently proposed, ``Interpreter.run()`` offers you no way to
> distinguish an error coming from the subinterpreter from any other
> error in the current interpreter. Your only option would be to
> explicitly wrap your ``run()`` call in a
> ``try: ... except RuntimeError:`` (since we wrap a proxy of the original
> exception in a RuntimeError and raise that).
>
> If this is a problem in practice then we could add something like
> ``interpreters.RunFailedError`` (subclassing RuntimeError) and raise that
> in ``run()``.
>
> Return a lock from send()
> -------------------------
>
> When sending an object through a channel, you don't have a way of knowing
> when the object gets received on the other end.
One way to work around > this is to return a locked ``threading.Lock`` from ``SendChannel.send()`` > that unlocks once the object is received. > > This matters for buffered channels (i.e. queues). For unbuffered > channels it is a non-issue. So this can be dealt with once channels > support buffering. > > > Rejected Ideas > ============== > > Explicit channel association > ---------------------------- > > Interpreters are implicitly associated with channels upon ``recv()`` and > ``send()`` calls. They are de-associated with ``close()`` calls. The > alternative would be explicit methods. It would be either > ``add_channel()`` and ``remove_channel()`` methods on ``Interpreter`` > objects or something similar on channel objects. > > In practice, this level of management shouldn't be necessary for users. > So adding more explicit support would only add clutter to the API. > > Use pipes instead of channels > ----------------------------- > > A pipe would be a simplex FIFO between exactly two interpreters. For > most use cases this would be sufficient. It could potentially simplify > the implementation as well. However, it isn't a big step to supporting > a many-to-many simplex FIFO via channels. Also, with pipes the API > ends up being slightly more complicated, requiring naming the pipes. > > Use queues instead of channels > ------------------------------ > > The main difference between queues and channels is that queues support > buffering. This would complicate the blocking semantics of ``recv()`` > and ``send()``. Also, queues can be built on top of channels. > > "enumerate" > ----------- > > The ``list_all()`` function provides the list of all interpreters. > In the threading module, which partly inspired the proposed API, the > function is called ``enumerate()``. The name is different here to > avoid confusing Python users that are not already familiar with the > threading API. For them "enumerate" is rather unclear, whereas > "list_all" is clear. 
> > Alternate solutions to prevent leaking exceptions across interpreters > --------------------------------------------------------------------- > > In function calls, uncaught exceptions propagate to the calling frame. > The same approach could be taken with ``run()``. However, this would > mean that exception objects would leak across the inter-interpreter > boundary. Likewise, the frames in the traceback would potentially leak. > > While that might not be a problem currently, it would be a problem once > interpreters get better isolation relative to memory management (which > is necessary to stop sharing the GIL between interpreters). We've > resolved the semantics of how the exceptions propagate by raising a > RuntimeError instead, which wraps a safe proxy for the original > exception and traceback. > > Rejected possible solutions: > > * set the RuntimeError's __cause__ to the proxy of the original > exception > * reproduce the exception and traceback in the original interpreter > and raise that. > * convert at the boundary (a la ``subprocess.CalledProcessError``) > (requires a cross-interpreter representation) > * support customization via ``Interpreter.excepthook`` > (requires a cross-interpreter representation) > * wrap in a proxy at the boundary (including with support for > something like ``err.raise()`` to propagate the traceback). > * return the exception (or its proxy) from ``run()`` instead of > raising it > * return a result object (like ``subprocess`` does) [result-object]_ > (unnecessary complexity?) > * throw the exception away and expect users to deal with unhandled > exceptions explicitly in the script they pass to ``run()`` > (they can pass error info out via channels); with threads you have > to do something similar > > > References > ========== > > .. [c-api] > https://docs.python.org/3/c-api/init.html#sub-interpreter-support > > .. _Communicating Sequential Processes: > > ..
[CSP] > https://en.wikipedia.org/wiki/Communicating_sequential_processes > https://github.com/futurecore/python-csp > > .. [fifo] > https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Pipe > https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue > https://docs.python.org/3/library/queue.html#module-queue > http://stackless.readthedocs.io/en/2.7-slp/library/stackless/channels.html > https://golang.org/doc/effective_go.html#sharing > http://www.jtolds.com/writing/2016/03/go-channels-are-bad-and-you-should-feel-bad/ > > .. [caveats] > https://docs.python.org/3/c-api/init.html#bugs-and-caveats > > .. [petr-c-ext] > https://mail.python.org/pipermail/import-sig/2016-June/001062.html > https://mail.python.org/pipermail/python-ideas/2016-April/039748.html > > .. [cryptography] > https://github.com/pyca/cryptography/issues/2299 > > .. [global-gc] > http://bugs.python.org/issue24554 > > .. [gilstate] > https://bugs.python.org/issue10915 > http://bugs.python.org/issue15751 > > .. [global-atexit] > https://bugs.python.org/issue6531 > > .. [mp-conn] > https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Connection > > .. [bug-rate] > https://mail.python.org/pipermail/python-ideas/2017-September/047094.html > > .. [benefits] > https://mail.python.org/pipermail/python-ideas/2017-September/047122.html > > .. [main-thread] > https://mail.python.org/pipermail/python-ideas/2017-September/047144.html > https://mail.python.org/pipermail/python-dev/2017-September/149566.html > > .. [reset_globals] > https://mail.python.org/pipermail/python-dev/2017-September/149545.html > > .. [async] > https://mail.python.org/pipermail/python-dev/2017-September/149420.html > https://mail.python.org/pipermail/python-dev/2017-September/149585.html > > .. [result-object] > https://mail.python.org/pipermail/python-dev/2017-September/149562.html > > ..
[jython] > https://mail.python.org/pipermail/python-ideas/2017-May/045771.html > > .. [pypy] > https://mail.python.org/pipermail/python-ideas/2017-September/046973.html > > > Copyright > ========= > > This document has been placed in the public domain. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From robb at datalogics.com Wed Dec 6 10:22:06 2017 From: robb at datalogics.com (Rob Boehne) Date: Wed, 6 Dec 2017 15:22:06 +0000 Subject: [Python-Dev] HP-UX pr not feeling the love Message-ID: <9C117CD2-6987-46E1-84EC-DC0324B8A41C@datalogics.com> Hello, Back in June I was fired up to get my diverse set of platforms all running Python 3, but quickly ran into issues and submitted a PR. https://github.com/python/cpython/pull/2519 It seems as though this HP-UX specific change isn't getting much consideration, which probably isn't a big deal. What may be more important is that I've stopped trying to contribute, and if I really need Python 3 on HP-UX, AIX, Sparc Solaris or other operating systems, I'll have to hack it together myself and maintain my own fork, while presumably others do the same. At the same time I'm working hard to convince management that we shouldn't create technical debt by maintaining patches to all the tools we use, and that we should get these changes accepted into the upstream repos. Could someone have a look at this PR and possibly merge? Thanks, Rob Boehne -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jwilk at jwilk.net Wed Dec 6 10:59:44 2017 From: jwilk at jwilk.net (Jakub Wilk) Date: Wed, 6 Dec 2017 16:59:44 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: Message-ID: <20171206155943.vppcjgnq3pqxapcq@jwilk.net> * Nick Coghlan , 2017-12-06, 16:15: >>>Something I've just noticed that needs to be clarified: on Linux, "C" >>>locale and "POSIX" locale are aliases, but this isn't true in general >>>(e.g. it's not the case on *BSD systems, including Mac OS X). >>For those of us with little to no BSD/MacOS experience, can you give a >>quick run-down of the differences between "C" and "POSIX"? POSIX says that "C" and "POSIX" are equivalent[0]. >The one that's relevant to default locale detection is just the string >that "setlocale(LC_CTYPE, NULL)" returns. POSIX doesn't require any particular return value for setlocale() calls. It's only guaranteed that the returned string can be used in subsequent setlocale() calls to restore the original locale. So in the POSIX locale, a compliant setlocale() implementation could return "C", or "POSIX", or even something entirely different. >Beyond that, I don't know what the actual functional differences are. I don't believe there are any. [0] http://pubs.opengroup.org/onlinepubs/9699919799/functions/setlocale.html -- Jakub Wilk From lukasz at langa.pl Wed Dec 6 12:45:17 2017 From: lukasz at langa.pl (Lukasz Langa) Date: Wed, 6 Dec 2017 09:45:17 -0800 Subject: [Python-Dev] HP-UX pr not feeling the love In-Reply-To: <9C117CD2-6987-46E1-84EC-DC0324B8A41C@datalogics.com> References: <9C117CD2-6987-46E1-84EC-DC0324B8A41C@datalogics.com> Message-ID: Hi Rob, thanks for your patch. CPython core developers, as volunteers, have limited resources available to maintain Python. Those resources are not only time, they are also mental resources necessary to make a change in Python as well as actual physical resources. Supporting a platform requires all three: 1. 
You need time to make a platform work initially, and then continuous effort to keep it working, fixing regressions, including this platform in new features, etc. 2. You need mental resources to manage additional complexity that comes from #ifdef sprinkled through the code, cryptic configure/Makefile machinery, etc. 3. You need access to machines running the given operating system to be able to test if your changes are compatible. This is why we are keeping the list of supported platforms relatively short. In fact, in time we're cutting support for less popular platforms that we couldn't keep running. Details in https://www.python.org/dev/peps/pep-0011/ . Look, just in 3.7 we're dropping IRIX and systems without threads. As you're saying, while your current PR is relatively innocent, more are needed to make it work. If those require more drastic changes in our codebase, we won't be able to accept them due to reasons stated above. I understand where you're coming from. If you're serious about this, we would need to see the full extent of changes required to make Python 3.7 work on HP UX, preferably minimal. We would also need a buildbot added to our fleet (see http://buildbot.python.org/ ) that would ensure the build stays green. Finally, we would need you to think whether you could provide the patches that keep the build green for a significant period of time (counted in years). - Ł > On Dec 6, 2017, at 7:22 AM, Rob Boehne wrote: > > Hello, > > Back in June I was fired up to get my diverse set of platforms all running Python 3, but quickly ran into issues and submitted a PR. > > https://github.com/python/cpython/pull/2519 > > It seems as though this HP-UX specific change isn't getting much consideration, which probably isn't a big deal.
What may be more important is that I've stopped trying to contribute, and if I really need Python 3 on HP-UX, AIX, Sparc Solaris or other operating systems, I'll have to hack it together myself and maintain my own fork, while presumably others do the same. At the same time I'm working hard to convince management that we shouldn't create technical debt by maintaining patches to all the tools we use, and that we should get these changes accepted into the upstream repos. > > Could someone have a look at this PR and possibly merge? > > Thanks, > > Rob Boehne > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/lukasz%40langa.pl -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 874 bytes Desc: Message signed with OpenPGP URL: From brett at python.org Wed Dec 6 13:23:46 2017 From: brett at python.org (Brett Cannon) Date: Wed, 06 Dec 2017 18:23:46 +0000 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: Message-ID: On Wed, 6 Dec 2017 at 06:10 INADA Naoki wrote: > >> And I have one worrying point. > >> With UTF-8 mode, open()'s default encoding/error handler is > >> UTF-8/surrogateescape. > > > > The Strict UTF-8 Mode is for you if you prioritize correctness over > usability. > > Yes, but as I said, I care about not experienced developer > who doesn't know what UTF-8 mode is. > > > > > In the very first version of my PEP/idea, I wanted to use > > UTF-8/strict. But then I started to play with the implementation and I > > got many "practical" issues. Using UTF-8/strict, you quickly get > > encoding errors. For example, you become unable to read undecodable > > bytes from stdin.
stdin.read() only gives you an error, without > > letting you decide how to handle these "invalid" data. Same issue with > > stdout. > > > > I don't care about stdio, because PEP 538 uses surrogateescape for > stdio/error > > https://www.python.org/dev/peps/pep-0538/#changes-to-the-default-error-handling-on-the-standard-streams > > I care only about builtin open()'s behavior. > PEP 538 doesn't change default error handler of open(). > > I think PEP 538 and PEP 540 should behave almost identical except > changing locale > or not. So I need very strong reason if PEP 540 changes default error > handler of open(). > I don't have enough locale experience to weigh in as an expert, but I already was leaning towards INADA-san's logic of not wanting to change open() and this makes me really not want to change it. -Brett > > > > In the old long version of the PEP, I tried to explain UTF-8/strict > > issues with very concrete examples, the removed "Use Cases" section: > > > https://github.com/python/peps/blob/f92b5fbdc2bcd9b182c1541da5a0f4ce32195fb6/pep-0540.txt#L490 > > > > Tell me if I should rephrase the rationale of the PEP 540 to better > > justify the usage of surrogateescape. > > OK, "List a directory into a text file" example demonstrates why > surrogateescape > is used for open(). If os.listdir() returns surrogateescaped data, > file.write() will > fail. > All other examples are about stdio. > > But we should achieve good balance between correctness and usability of > default behavior. > > > > > Maybe the "UTF-8 Mode" should be renamed to "UTF-8 with > > surrogateescape, or backslashreplace for stderr, or surrogatepass for > > fsencode/fsdecode on Windows, or strict for Strict UTF-8 Mode"... But > > the PEP title would be too long, no? :-) > > > > I feel short name is enough. > > > > >> And opening binary file without "b" option is very common mistake of new > >> developers.
If default error handler is surrogateescape, they lose a > chance > >> to notice their bug. > > When open() is used in text mode to read "binary data", usually the > developer would only notice when getting the POSIX locale (ASCII > encoding). But the PEP 538 already changed that by using the C.UTF-8 > locale (and so the UTF-8 encoding, instead of the ASCII encoding). > > With PEP 538 (C.UTF-8 locale), open() uses UTF-8/strict, not > UTF-8/surrogateescape. > > For example, this code raises UnicodeDecodeError with PEP 538 if the > file is a JPEG file. > > with open(fn) as f: > f.read() > > > > I'm not sure that locales are the best way to detect such class of > > bytes. I suggest to use -b or -bb option to detect such bugs without > > having to care of the locale. > > > > But many new developers doesn't use/know -b or -bb option. > > > > >> On the other hand, it helps some use cases when user want > byte-transparent > >> behavior, without modifying code to use "surrogateescape" explicitly. > >> > >> Which is more important scenario? Anyone has opinion about it? > >> Are there any rationales and use cases I'm missing? > > > > Usually users expect that Python 3 "just works" and don't bother them > > with the locale (that nobody understands). > > > > The old version of the PEP contains a long list of issues: > > > https://github.com/python/peps/blob/f92b5fbdc2bcd9b182c1541da5a0f4ce32195fb6/pep-0540.txt#L924-L986 > > > > I already replaced the strict error handler with surrogateescape for > > sys.stdin and sys.stdout on the POSIX locale in Python 3.5: > > https://bugs.python.org/issue19977 > > > > For the rationale, read for example these comments: > > > [snip] > > OK, I'll read them and think again about open()'s default behavior. > But I still hope open()'s behavior is consistent with PEP 538 and PEP 540.
> > Regards, > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robb at datalogics.com Wed Dec 6 13:27:57 2017 From: robb at datalogics.com (Rob Boehne) Date: Wed, 6 Dec 2017 18:27:57 +0000 Subject: [Python-Dev] HP-UX pr not feeling the love In-Reply-To: References: <9C117CD2-6987-46E1-84EC-DC0324B8A41C@datalogics.com> Message-ID: Thanks! I'm personally comfortable with dropping support for systems that people can't buy support for, like IRIX, ULTRIX, SCO etc. It's hard to envision those being anything other than hobbyist platforms. Here at Datalogics, we are selling and supporting software on HP-UX, for both pa-risc and Itanium, and now that SCons is beginning to support Python 3.x, I am attempting to use some of my spare time to get this platform (and others we support) to build, and then to run, on the development branch for Python 3. So rather than taking a large block of my time to port and fix any problems, I'm going to submit PRs in a trickle, as time permits. I'm picking HP-UX because it's probably the most obscure thing we use, and likely would take the most effort. From: Lukasz Langa Date: Wednesday, December 6, 2017 at 11:45 AM To: Rob Boehne Cc: "python-dev at python.org" Subject: Re: [Python-Dev] HP-UX pr not feeling the love Hi Rob, thanks for your patch. CPython core developers, as volunteers, have limited resources available to maintain Python. Those resources are not only time, they are also mental resources necessary to make a change in Python as well as actual physical resources. Supporting a platform requires all three: 1.
You need time to make a platform work initially, and then continuous effort to keep it working, fixing regressions, including this platform in new features, etc. 2. You need mental resources to manage additional complexity that comes from #ifdef sprinkled through the code, cryptic configure/Makefile machinery, etc. 3. You need access to machines running the given operating system to be able to test if your changes are compatible. This is why we are keeping the list of supported platforms relatively short. In fact, in time we're cutting support for less popular platforms that we couldn't keep running. Details in https://www.python.org/dev/peps/pep-0011/. Look, just in 3.7 we're dropping IRIX and systems without threads. As you're saying, while your current PR is relatively innocent, more are needed to make it work. If those require more drastic changes in our codebase, we won't be able to accept them due to reasons stated above. I understand where you're coming from. If you're serious about this, we would need to see the full extent of changes required to make Python 3.7 work on HP UX, preferably minimal. We would also need a buildbot added to our fleet (see http://buildbot.python.org/) that would ensure the build stays green. Finally, we would need you to think whether you could provide the patches that keep the build green for a significant period of time (counted in years). - Ł On Dec 6, 2017, at 7:22 AM, Rob Boehne > wrote: Hello, Back in June I was fired up to get my diverse set of platforms all running Python 3, but quickly ran into issues and submitted a PR. https://github.com/python/cpython/pull/2519 It seems as though this HP-UX specific change isn't getting much consideration, which probably isn't a big deal.
What may be more important is that I've stopped trying to contribute, and if I really need Python 3 on HP-UX, AIX, Sparc Solaris or other operating systems, I'll have to hack it together myself and maintain my own fork, while presumably others do the same. At the same time I'm working hard to convince management that we shouldn't create technical debt by maintaining patches to all the tools we use, and that we should get these changes accepted into the upstream repos. Could someone have a look at this PR and possibly merge? Thanks, Rob Boehne _______________________________________________ Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/lukasz%40langa.pl -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at barrys-emacs.org Wed Dec 6 14:54:34 2017 From: barry at barrys-emacs.org (Barry Scott) Date: Wed, 6 Dec 2017 19:54:34 +0000 Subject: [Python-Dev] iso8601 parsing In-Reply-To: References: <01e69881-3710-87c8-f47a-dfc427ec65b5@mgmiller.net> <11362716.0I2SPu8sME@hammer.magicstack.net> <0fba01d34dca$47ad7940$d7086bc0$@sdamon.com> <20171025213056.GP9068@ando.pearwood.info> Message-ID: > On 26 Oct 2017, at 17:45, Chris Barker wrote: > > This is a key point that I hope is obvious: > > If an ISO string has NO offset or timezone indicator, then a naive datetime should be created. > > (I say, I "hope" it's obvious, because the numpy datetime64 implementation initially (and for years) would apply the machine local timezone to a bare iso string -- which was a f-ing nightmare!) I hope the other obvious thing is that if there is an offset then a datetime that is *not* naive can be created as it describes an unambiguous point in time. We just cannot know what political timezone to choose. I'd guess that it should use the UTC timezone in that case.
Barry -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Wed Dec 6 16:48:38 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 07 Dec 2017 10:48:38 +1300 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: Message-ID: <5A2865B6.1000302@canterbury.ac.nz> Victor Stinner wrote: > Maybe the "UTF-8 Mode" should be renamed to "UTF-8 with > surrogateescape, or backslashreplace for stderr, or surrogatepass for > fsencode/fsdecode on Windows, or strict for Strict UTF-8 Mode"... But > the PEP title would be too long, no? :-) Relaxed UTF-8 Mode? UTF8-Yeah-I'm-Fine-With-That mode? -- Greg From solipsis at pitrou.net Wed Dec 6 17:07:54 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 6 Dec 2017 23:07:54 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) References: Message-ID: <20171206230754.5abbdd2f@fsol> On Wed, 6 Dec 2017 01:49:41 +0100 Victor Stinner wrote: > Hi, > > I knew that I had to rewrite my PEP 540, but I was too lazy. Since > Guido explicitly requested a shorter PEP, here you have! > > https://www.python.org/dev/peps/pep-0540/ > > Trust me, it's the same PEP, but focused on the most important > information and with a shorter rationale ;-) Congrats on the rewriting! The shortening is appreciated :-) One question: how do you plan to test for the POSIX locale? Apparently you need to check at least for the "C" and "POSIX" strings, but perhaps other aliases as well? Regards Antoine. From victor.stinner at gmail.com Wed Dec 6 17:20:41 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 6 Dec 2017 23:20:41 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: <20171206230754.5abbdd2f@fsol> References: <20171206230754.5abbdd2f@fsol> Message-ID: 2017-12-06 23:07 GMT+01:00 Antoine Pitrou : > One question: how do you plan to test for the POSIX locale? I'm not sure.
I will probably rely on Nick for that ;-) Nick already implemented this exact check for his PEP 538 which is already implemented in Python 3.7. I already implemented the PEP 540: https://bugs.python.org/issue29240 https://github.com/python/cpython/pull/855 Right now, my implementation uses: char *ctype = _PyMem_RawStrdup(setlocale(LC_CTYPE, "")); ... if (strcmp(ctype, "C") == 0) ... Victor From solipsis at pitrou.net Wed Dec 6 17:36:14 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 6 Dec 2017 23:36:14 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: <20171206230754.5abbdd2f@fsol> Message-ID: <20171206233614.199f593d@fsol> On Wed, 6 Dec 2017 23:20:41 +0100 Victor Stinner wrote: > 2017-12-06 23:07 GMT+01:00 Antoine Pitrou : > > One question: how do you plan to test for the POSIX locale? > > I'm not sure. I will probably rely on Nick for that ;-) Nick already > implemented this exact check for his PEP 538 which is already > implemented in Python 3.7. Other than that, +1 on the PEP. Regards Antoine. From victor.stinner at gmail.com Wed Dec 6 18:22:52 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 7 Dec 2017 00:22:52 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: <20171206233614.199f593d@fsol> References: <20171206230754.5abbdd2f@fsol> <20171206233614.199f593d@fsol> Message-ID: 2017-12-06 23:36 GMT+01:00 Antoine Pitrou : > Other than that, +1 on the PEP. Naoki doesn't seem to be comfortable with the usage of the surrogateescape error handler by default for open(). Are you ok with that? If yes, would you mind to explain why?
:-) Victor From pganssle at gmail.com Wed Dec 6 18:07:13 2017 From: pganssle at gmail.com (Paul Ganssle) Date: Wed, 6 Dec 2017 23:07:13 +0000 Subject: [Python-Dev] iso8601 parsing In-Reply-To: References: <01e69881-3710-87c8-f47a-dfc427ec65b5@mgmiller.net> <11362716.0I2SPu8sME@hammer.magicstack.net> <0fba01d34dca$47ad7940$d7086bc0$@sdamon.com> <20171025213056.GP9068@ando.pearwood.info> Message-ID: Here is the PR I've submitted: https://github.com/python/cpython/pull/4699 The contract that I'm supporting (and, I think it can be argued, the only reasonable contract in the initial implementation) is the following: dtstr = dt.isoformat(*args, **kwargs) dt_rt = datetime.fromisoformat(dtstr) assert dt_rt == dt # The two points represent the same absolute time assert dt_rt.replace(tzinfo=None) == dt.replace(tzinfo=None) # And the same wall time For all valid values of `dt`, `args` and `kwargs`. A corollary of the `dt_rt == dt` invariant is that you can perfectly recreate the original `datetime` with the following additional step: dt_rt = dt_rt if dt.tzinfo is None else dt_rt.astimezone(dt.tzinfo) There is no way for us to guarantee that `dt_rt.tzinfo == dt.tzinfo` or that `dt_rt.tzinfo is dt.tzinfo`, because `isoformat()` is slightly lossy (it loses the political zone), but this is not an issue because lossless round trips just require you to serialize the political zone, which is generally simple enough. On 12/06/2017 07:54 PM, Barry Scott wrote: > > >> On 26 Oct 2017, at 17:45, Chris Barker wrote: >> >> This is a key point that I hope is obvious: >> >> If an ISO string has NO offset or timezone indicator, then a naive datetime should be created. >> >> (I say, I "hope" it's obvious, because the numpy datetime64 implementation initially (and for years) would apply the machine local timezone to a bare iso string -- which was a f-ing nightmare!)
> > I hope the other obvious thing is that if there is an offset then a datetime that is *not* naive can be created > as it describes an unambiguous point in time. We just cannot know what political timezone to choose. > I'd guess that it should use the UTC timezone in that case. > > Barry > > > > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/paul%40ganssle.io > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From solipsis at pitrou.net Wed Dec 6 18:28:42 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 7 Dec 2017 00:28:42 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: <20171206230754.5abbdd2f@fsol> <20171206233614.199f593d@fsol> Message-ID: <20171207002842.5837d5d1@fsol> On Thu, 7 Dec 2017 00:22:52 +0100 Victor Stinner wrote: > 2017-12-06 23:36 GMT+01:00 Antoine Pitrou : > > Other than that, +1 on the PEP. > > Naoki doesn't seem to be comfortable with the usage of the > surrogateescape error handler by default for open(). Are you ok with > that? If yes, would you mind to explain why? :-) Sorry, I had missed that objection. I agree with Inada Naoki: it's better to keep it strict. Regards Antoine.
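The trade-off debated in this thread can be seen in a few lines of Python. This is only an illustrative sketch (the file and its contents are invented for the demo, and nothing here is part of PEP 538 or PEP 540): with the default strict error handler an undecodable byte fails loudly, while surrogateescape smuggles it through as a lone surrogate and preserves a byte-for-byte round trip.

```python
import os
import tempfile

# A file holding bytes that are valid Latin-1 but invalid UTF-8
# (hypothetical data, made up for this demo).
fd, path = tempfile.mkstemp()
with open(fd, "wb") as f:
    f.write(b"caf\xe9\n")

# errors='strict' (the default): decoding fails loudly, which is what
# catches bugs like opening a binary file in text mode.
try:
    with open(path, encoding="utf-8") as f:
        f.read()
except UnicodeDecodeError:
    print("strict: UnicodeDecodeError")

# errors='surrogateescape': the undecodable byte 0xE9 is smuggled
# through as the lone surrogate U+DCE9 and survives a round trip.
with open(path, encoding="utf-8", errors="surrogateescape") as f:
    text = f.read()

assert text == "caf\udce9\n"
assert text.encode("utf-8", "surrogateescape") == b"caf\xe9\n"

os.remove(path)
```

The asserts show why surrogateescape helps byte-transparent scripts, and the exception shows why strict catches the "forgot the b flag" class of bugs the thread worries about.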
From ncoghlan at gmail.com Wed Dec 6 19:31:04 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 7 Dec 2017 10:31:04 +1000 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: <20171206155943.vppcjgnq3pqxapcq@jwilk.net> References: <20171206155943.vppcjgnq3pqxapcq@jwilk.net> Message-ID: On 7 December 2017 at 01:59, Jakub Wilk wrote: > * Nick Coghlan , 2017-12-06, 16:15: >> The one that's relevant to default locale detection is just the string >> that "setlocale(LC_CTYPE, NULL)" returns. > > POSIX doesn't require any particular return value for setlocale() calls. > It's only guaranteed that the returned string can be used in subsequent > setlocale() calls to restore the original locale. > > So in the POSIX locale, a compliant setlocale() implementation could return > "C", or "POSIX", or even something entirely different. Thanks. I'd been wondering if we should also handle the "POSIX" case in the legacy locale detection logic, and you've convinced me that we should. Issue filed for that here: https://bugs.python.org/issue32238 Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Dec 6 19:37:04 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 7 Dec 2017 10:37:04 +1000 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: <20171206230754.5abbdd2f@fsol> Message-ID: On 7 December 2017 at 08:20, Victor Stinner wrote: > 2017-12-06 23:07 GMT+01:00 Antoine Pitrou : >> One question: how do you plan to test for the POSIX locale? > > I'm not sure. I will probably rely on Nick for that ;-) Nick already > implemented this exact check for his PEP 538 which is already > implemented in Python 3.7. > > I already implemented the PEP 540: > > https://bugs.python.org/issue29240 > https://github.com/python/cpython/pull/855 > > Right now, my implementation uses: > > char *ctype = _PyMem_RawStrdup(setlocale(LC_CTYPE, "")); > ... > if (strcmp(ctype, "C") == 0) ... 
We have a private helper for this as a result of the PEP 538 implementation: _Py_LegacyLocaleDetected() Details are in the source code at https://github.com/python/cpython/blob/master/Python/pylifecycle.c#L345 As per my comment there, and Jakub Wilk's post to this thread, we're missing a case to also check for the string "POSIX" (which will fix several of the current locale coercion discrepancies between Linux and *BSD systems). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Dec 6 20:04:20 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 7 Dec 2017 11:04:20 +1000 Subject: [Python-Dev] PEP 554 v4 (new interpreters module) In-Reply-To: References: Message-ID: On 7 December 2017 at 01:50, Guido van Rossum wrote: > Sorry to burst your bubble, but I have not followed any of the discussion > and I am actually very worried about this topic. I don't think I will be > able to make time for this before the 3.7b1 feature freeze. I think that will be OK, as it will encourage us to refactor Eric's branch into two distinct pieces in the meantime: exposing any needed C API elements that aren't currently visible as "nominally-private-but-linkable-if-you're-prepared-to-cope-with-potential-instability" interfaces, and then a pip-installable extension module that adds the Python level API. We won't be able to experiment with ideas like removing GIL sharing between subinterpreters that way, but we'll be able to work on the semantics of the user facing API design, and enable experimentation with things like CSP and Actor-based programming backed by stronger memory separation than is offered by Python threads. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Wed Dec 6 21:46:51 2017 From: guido at python.org (Guido van Rossum) Date: Wed, 6 Dec 2017 18:46:51 -0800 Subject: [Python-Dev] PEP 554 v4 (new interpreters module) In-Reply-To: References: Message-ID: So you're okay with putting this off till (at least) 3.8? That sounds good to me, given that I'd like to go on vacation soon. On Wed, Dec 6, 2017 at 5:04 PM, Nick Coghlan wrote: > On 7 December 2017 at 01:50, Guido van Rossum wrote: > > Sorry to burst your bubble, but I have not followed any of the discussion > > and I am actually very worried about this topic. I don't think I will be > > able to make time for this before the 3.7b1 feature freeze. > > I think that will be OK, as it will encourage us to refactor Eric's > branch into two distinct pieces in the meantime: exposing any needed C > API elements that aren't currently visible as > "nominally-private-but-linkable-if-you're-prepared-to-cope-with-potential- > instability" > interfaces, and then a pip-installable extension module that adds the > Python level API. > > We won't be able to experiment with ideas like removing GIL sharing > between subinterpreters that way, but we'll be able to work on the > semantics of the user facing API design, and enable experimentation > with things like CSP and Actor-based programming backed by stronger > memory separation than is offered by Python threads. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Wed Dec 6 22:22:01 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 7 Dec 2017 13:22:01 +1000 Subject: [Python-Dev] PEP 554 v4 (new interpreters module) In-Reply-To: References: Message-ID: On 7 December 2017 at 12:46, Guido van Rossum wrote: > So you're okay with putting this off till (at least) 3.8? That sounds good > to me, given that I'd like to go on vacation soon. Eric reminded me off-list that we'd like to at least add the lower level _interpreters API for the benefit of the test suite - right now, all of our subinterpreter testing needs to be run through either test_embed or test_capi, which is annoying enough that we end up simply not testing the subinterpreter functionality properly (in practice, we're relying heavily on the regression test suites for mod_wsgi and JEP to find any problems we inadvertently introduce when refactoring CPython's internals). If we were to put that under test.support._interpreters for 3.7, we'd be able to make it clear that we're in "Even more experimental than provisional API status would account for" territory, while still enabling the improved testing and accessibility for experimentation that we're after in order to make some better informed API design proposals for Python 3.8. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Wed Dec 6 22:31:14 2017 From: guido at python.org (Guido van Rossum) Date: Wed, 6 Dec 2017 19:31:14 -0800 Subject: [Python-Dev] PEP 554 v4 (new interpreters module) In-Reply-To: References: Message-ID: If the point is just to be able to test the existing API better, no PEP is needed, right? It would be an unsupported, undocumented API. On Wed, Dec 6, 2017 at 7:22 PM, Nick Coghlan wrote: > On 7 December 2017 at 12:46, Guido van Rossum wrote: > > So you're okay with putting this off till (at least) 3.8? That sounds > good > > to me, given that I'd like to go on vacation soon. 
> > Eric reminded me off-list that we'd like to at least add the lower > level _interpreters API for the benefit of the test suite - right now, > all of our subinterpreter testing needs to be run through either > test_embed or test_capi, which is annoying enough that we end up > simply not testing the subinterpreter functionality properly (in > practice, we're relying heavily on the regression test suites for > mod_wsgi and JEP to find any problems we inadvertently introduce when > refactoring CPython's internals). > > If we were to put that under test.support._interpreters for 3.7, we'd > be able to make it clear that we're in "Even more experimental than > provisional API status would account for" territory, while still > enabling the improved testing and accessibility for experimentation > that we're after in order to make some better informed API design > proposals for Python 3.8. > > Regards, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Wed Dec 6 22:57:55 2017 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 6 Dec 2017 20:57:55 -0700 Subject: [Python-Dev] PEP 554 v4 (new interpreters module) In-Reply-To: References: Message-ID: On Dec 6, 2017 20:31, "Guido van Rossum" wrote: If the point is just to be able to test the existing API better, no PEP is needed, right? It would be an unsupported, undocumented API. In the short term that's one major goal. In the long term the functionality provided by the PEP is a prerequisite for other concurrency-related features, and targeting 3.8 for that is fine. :) -eric -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Wed Dec 6 23:14:06 2017 From: guido at python.org (Guido van Rossum) Date: Wed, 6 Dec 2017 20:14:06 -0800 Subject: [Python-Dev] PEP 554 v4 (new interpreters module) In-Reply-To: References: Message-ID: OK, then please just change the PEP's Version: header to 3.8. On Wed, Dec 6, 2017 at 7:57 PM, Eric Snow wrote: > > > On Dec 6, 2017 20:31, "Guido van Rossum" wrote: > > If the point is just to be able to test the existing API better, no PEP is > needed, right? It would be an unsupported, undocumented API. > > > In the short term that's one major goal. In the long term the > functionality provided by the PEP is a prerequisite for other > concurrency-related features, and targeting 3.8 for that is fine. :) > > -eric > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Wed Dec 6 23:16:01 2017 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 6 Dec 2017 21:16:01 -0700 Subject: [Python-Dev] PEP 554 v4 (new interpreters module) In-Reply-To: References: Message-ID: On Dec 6, 2017 21:14, "Guido van Rossum" wrote: OK, then please just change the PEP's Version: header to 3.8. Will do. Have a nice vacation! :) -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From songofacandy at gmail.com Thu Dec 7 01:49:30 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Thu, 7 Dec 2017 15:49:30 +0900 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: Message-ID: > I care only about builtin open()'s behavior. > PEP 538 doesn't change default error handler of open(). > > I think PEP 538 and PEP 540 should behave almost identical except > changing locale > or not. So I need very strong reason if PEP 540 changes default error > handler of open(). > I just came up with a crazy idea: changing the default error handler of open() to "surrogateescape" only when the open mode is "w" or "a".
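As background for the proposal above, the ``surrogateescape`` handler smuggles undecodable bytes through str as lone surrogates and restores them exactly on encoding; a minimal, self-contained illustration:

```python
# "surrogateescape" round-trip: the undecodable byte 0xE9 survives a
# decode/encode cycle unchanged, stored as the lone surrogate U+DCE9.
raw = b"caf\xe9"  # Latin-1 bytes; 0xE9 is invalid as UTF-8

text = raw.decode("utf-8", errors="surrogateescape")
print(repr(text))  # 'caf\udce9'

# Encoding with the same handler recovers the original bytes exactly.
assert text.encode("utf-8", errors="surrogateescape") == raw
```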
When reading, the "surrogateescape" error handler is dangerous because it can produce arbitrarily broken unicode strings by mistake. On the other hand, the "surrogateescape" error handler for writing is not so dangerous if the encoding is UTF-8. When writing a normal unicode string, it doesn't create broken data. When writing a string containing surrogateescaped data, the data was already (partially) broken before writing. This idea allows the following code:

    with open("files.txt", "w") as f:
        for fn in os.listdir():  # may return surrogateescaped strings
            f.write(fn + '\n')

And it doesn't allow the following code:

    with open("image.jpg", "r") as f:  # Binary data, not UTF-8
        return f.read()

I'm not sure whether this is a good idea. And I don't know when it would be right to change the write error handler; only when PEP 538 or PEP 540 is used? Or always when os.fsencoding() is UTF-8? Any thoughts? INADA Naoki From barry at python.org Thu Dec 7 12:48:07 2017 From: barry at python.org (Barry Warsaw) Date: Thu, 7 Dec 2017 12:48:07 -0500 Subject: [Python-Dev] Announcing importlib_resources 0.1 Message-ID: Brett and I have been working on a little skunkworks project for a few weeks, and it's now time to announce the first release. We're calling it importlib_resources and its intent is to replace the "Basic Resource Access" APIs of pkg_resources with more efficient implementations based directly on importlib. importlib_resources 0.1 provides support for Python 2.7, and 3.4-3.7. It defines an ABC that loaders can implement to provide direct access to resources inside packages. importlib_resources has fallbacks for file system and zip file loaders, so it should work out of the box in most of the places that pkg_resources is currently used. We even have a migration guide for folks who want to drop pkg_resources altogether and adopt importlib_resources. importlib_resources explicitly does not support pkg_resources features like entry points, working sets, etc.
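The basic resource-access usage being replaced can be sketched as follows (hedged: this uses the API as it later landed in Python 3.7's ``importlib.resources``, which the backport mirrors; the package and file names are invented for the demo):

```python
# Sketch: reading a data file bundled inside a package without
# pkg_resources. A throwaway package is built on the fly so the
# example is self-contained; "demo_pkg"/"data.txt" are made-up names.
import os
import sys
import tempfile
from importlib import resources  # stdlib as of Python 3.7

tmp = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp, "demo_pkg")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "data.txt"), "w") as f:
    f.write("hello resources")
sys.path.insert(0, tmp)

# Resource access goes through the import system, not the file system
# directly -- which is what lets zip-imported packages work too.
text = resources.read_text("demo_pkg", "data.txt")
print(text)  # hello resources
```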
Still, we think the APIs provided will be good enough for most current use cases. http://importlib-resources.readthedocs.io/ We are calling it "importlib_resources" because we intend to port this into Python 3.7 under a new importlib.resources subpackage, so starting with Python 3.7, you will get this for free. The API is going to officially be provisional, but I've already done an experimental port of at least one big application (I'll let you guess which one :) and it's fairly straightforward, if not completely mechanical unfortunately. Take a look at the migration guide for details: http://importlib-resources.readthedocs.io/en/latest/migration.html We also intend to include the ABC in Python 3.7: http://importlib-resources.readthedocs.io/en/latest/abc.html You can of course `pip install importlib_resources`. We're hosting the project on GitLab, and welcome feedback, bug fixes, improvements, etc!

* Project home: https://gitlab.com/python-devs/importlib_resources
* Report bugs at: https://gitlab.com/python-devs/importlib_resources/issues
* Code hosting: https://gitlab.com/python-devs/importlib_resources.git
* Documentation: http://importlib_resources.readthedocs.io/

Cheers. -Barry and Brett -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From chris.barker at noaa.gov Thu Dec 7 15:12:54 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 7 Dec 2017 12:12:54 -0800 Subject: [Python-Dev] iso8601 parsing In-Reply-To: References: <01e69881-3710-87c8-f47a-dfc427ec65b5@mgmiller.net> <11362716.0I2SPu8sME@hammer.magicstack.net> <0fba01d34dca$47ad7940$d7086bc0$@sdamon.com> <20171025213056.GP9068@ando.pearwood.info> Message-ID: On Wed, Dec 6, 2017 at 3:07 PM, Paul Ganssle wrote: > Here is the PR I've submitted: > > https://github.com/python/cpython/pull/4699 > > The contract that I'm supporting (and, I think it can be argued, the only > reasonable contract in the initial implementation) is the following: > > dtstr = dt.isoformat(*args, **kwargs) > dt_rt = datetime.fromisoformat(dtstr) > assert dt_rt == dt # The two points represent the > same absolute time > assert dt_rt.replace(tzinfo=None) == dt.replace(tzinfo=None) # And > the same wall time > that looks good. And I'm sorry, I got a bit lost in the PR, but you are attaching an "offset" tzinfo, when parsing an iso string that has one, yes? I see this in the comments in the PR: """ This does not support parsing arbitrary ISO 8601 strings - it is only intended as the inverse operation of :meth:`datetime.isoformat` """ I fully agree that that's the MVP -- but is it that hard to parse arbitrary ISO8601 strings once you've gotten this far? It's a bit uglier than I'd like, but not THAT bad a spec. What ISO8601-compatible features are not supported? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed...
URL: From raymond.hettinger at gmail.com Thu Dec 7 15:27:52 2017 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Thu, 7 Dec 2017 12:27:52 -0800 Subject: [Python-Dev] Issues with PEP 526 Variable Notation at the class level Message-ID: Both typing.NamedTuple and dataclasses.dataclass use the somewhat beautiful PEP 526 variable notations at the class level:

    @dataclasses.dataclass
    class Color:
        hue: int
        saturation: float
        lightness: float = 0.5

and

    class Color(typing.NamedTuple):
        hue: int
        saturation: float
        lightness: float = 0.5

I'm looking for guidance or workarounds for two issues that have arisen. First, the use of default values seems to completely preclude the use of __slots__. For example, this raises a ValueError:

    class A:
        __slots__ = ['x', 'y']
        x: int = 10
        y: int = 20

The second issue is that the different annotations give different signatures than would be produced for manually written classes. It is unclear what the best practice is for where to put the annotations and their associated docstrings. In Pydoc for example, this class:

    class A:
        'Class docstring. x is distance in miles'
        x: int
        y: int

gives a different signature and docstring than for this class:

    class A:
        'Class docstring'
        def __init__(self, x: int, y: int):
            'x is distance in kilometers'
            pass

or for this class:

    class A:
        'Class docstring'
        def __new__(cls, x: int, y: int) -> A:
            '''x is distance in inches

            A is a singleton (one instance per x,y)
            '''
            if (x, y) in cache:
                return cache[x, y]
            return object.__new__(cls, x, y)

The distinction is important because the dataclass decorator allows you to suppress the generation of __init__ when you need more control than dataclass offers or when you need a __new__ method. I'm unclear on where the docstring and signature for the class are supposed to go so that we get useful signatures and matching docstrings. From eric at trueblade.com Thu Dec 7 15:47:38 2017 From: eric at trueblade.com (Eric V.
Smith) Date: Thu, 7 Dec 2017 15:47:38 -0500 Subject: [Python-Dev] Issues with PEP 526 Variable Notation at the class level In-Reply-To: References: Message-ID: <90322c40-0566-60ee-c389-bcce61d8114b@trueblade.com> On 12/7/17 3:27 PM, Raymond Hettinger wrote: ... > I'm looking for guidance or workarounds for two issues that have arisen. > > First, the use of default values seems to completely preclude the use of __slots__. For example, this raises a ValueError:

    > class A:
    >     __slots__ = ['x', 'y']
    >     x: int = 10
    >     y: int = 20

Hmm, I wasn't aware of that. I'm not sure I understand why that's an error. Maybe it could be fixed? Otherwise, I have a decorator that takes a dataclass and returns a new class with slots set:

    >>> from dataclasses import dataclass
    >>> from dataclass_tools import add_slots
    >>> @add_slots
    ... @dataclass
    ... class C:
    ...     x: int = 0
    ...     y: int = 0
    ...
    >>> c = C()
    >>> c
    C(x=0, y=0)
    >>> c.z = 3
    Traceback (most recent call last):
      File "", line 1, in
    AttributeError: 'C' object has no attribute 'z'

This doesn't help the general case (your class A), but it does at least solve it for dataclasses. Whether it should be actually included, and what the interface would look like, can be (and I'm sure will be!) argued. The reason I didn't include it (as @dataclass(slots=True)) is because it has to return a new class, and the rest of the dataclass features just modify the given class in place. I wanted to maintain that conceptual simplicity. But this might be a reason to abandon that. For what it's worth, attrs does have an @attr.s(slots=True) that returns a new class with __slots__ set. > The second issue is that the different annotations give different signatures than would be produced for manually written classes. It is unclear what the best practice is for where to put the annotations and their associated docstrings. I don't have any suggestions here. Eric.
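For reference, a decorator along the lines Eric describes can be sketched like this (hedged: a reimplementation of the idea, not his actual dataclass_tools code):

```python
import dataclasses

def add_slots(cls):
    # __slots__ must be present at class creation time, so a brand-new
    # class has to be built -- the decorator can't modify cls in place.
    cls_dict = dict(cls.__dict__)
    field_names = tuple(f.name for f in dataclasses.fields(cls))
    cls_dict["__slots__"] = field_names
    for name in field_names:
        # Drop the class-level defaults: they would conflict with the
        # slot descriptors (the very error Raymond reports).
        cls_dict.pop(name, None)
    # Don't carry over the old __dict__/__weakref__ descriptors.
    cls_dict.pop("__dict__", None)
    cls_dict.pop("__weakref__", None)
    qualname = getattr(cls, "__qualname__", None)
    new_cls = type(cls)(cls.__name__, cls.__bases__, cls_dict)
    if qualname is not None:
        new_cls.__qualname__ = qualname
    return new_cls

@add_slots
@dataclasses.dataclass
class C:
    x: int = 0
    y: int = 0

c = C()
print(c)  # C(x=0, y=0)
```

The generated `__init__` keeps working because the field defaults were already baked into its signature before the class attributes were removed.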
From paul at ganssle.io Thu Dec 7 17:52:23 2017 From: paul at ganssle.io (Paul G) Date: Thu, 7 Dec 2017 22:52:23 +0000 Subject: [Python-Dev] iso8601 parsing In-Reply-To: References: <01e69881-3710-87c8-f47a-dfc427ec65b5@mgmiller.net> <11362716.0I2SPu8sME@hammer.magicstack.net> <0fba01d34dca$47ad7940$d7086bc0$@sdamon.com> <20171025213056.GP9068@ando.pearwood.info> Message-ID: <8766d45c-3d52-c281-bb1a-576ed04f6351@ganssle.io> > And I'm sorry, I got a bit lost in the PR, but you are attaching an > "offset" tzinfo, when parsing an iso string that has one, yes? Yes, a fixed offset time zone (since the original zone information is lost):

    >>> from dateutil import tz
    >>> from datetime import datetime
    >>> datetime(2014, 12, 11, 9, 30, tzinfo=tz.gettz('US/Eastern'))
    datetime.datetime(2014, 12, 11, 9, 30, tzinfo=tzfile('/usr/share/zoneinfo/US/Eastern'))
    >>> datetime(2014, 12, 11, 9, 30, tzinfo=tz.gettz('US/Eastern')).isoformat()
    '2014-12-11T09:30:00-05:00'
    >>> datetime.fromisoformat('2014-12-11T09:30:00-05:00')
    datetime.datetime(2014, 12, 11, 9, 30, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=68400)))

> I fully agree that that's the MVP -- but is it that hard to parse arbitrary > ISO8601 strings once you've gotten this far? It's a bit uglier than I'd > like, but not THAT bad a spec. No, and in fact this PR is adapted from a *more general* ISO-8601 parser that I wrote (which is now merged into master on python-dateutil). In the CPython PR I deliberately limited it to be the inverse of `isoformat()` for two major reasons: 1. It allows us to get something out there that everyone can agree on - not only would we have to agree on whether to support arcane ISO8601 formats like YYYY-Www-D, but we also have to then discuss whether we want to be strict and disallow YYYYMM like ISO-8601 does, do we want fractional minute support?
What about different variations (we're already supporting replacing T with any character in `.isoformat()` and outputting time zones in the form hh:mm:ss, so what other non-compliant variations do we want to add... and then maintain)? We can have these discussions later if we want, but we might as well start with the part everyone can agree on - if it comes out of `isoformat()` it should be able to go back in through `fromisoformat()`. 2. It makes it *much* easier to understand what formats are supported. You can say, "This function is for reading in dates serialized with `.isoformat()`", and you *immediately* know how to create compliant dates. Not to mention, the specifics of the formats emitted by `isoformat()` can be written very cleanly as: YYYY-MM-DD[*[HH[:MM[:SS[.mmm[mmm]]]]][+HH:MM]] (where * means any character). ISO 8601 supports YYYY-MM-DD and YYYYMMDD but not YYYY-MMDD or YYYYMM-DD. So, basically, it's not that it's amazingly hard to write a fully-featured ISO-8601 parser, it's more that it doesn't seem like a great match for the problem this is intended to solve at this point. Best, Paul On 12/07/2017 08:12 PM, Chris Barker wrote: > >> Here is the PR I've submitted: >> >> https://github.com/python/cpython/pull/4699 >> >> The contract that I'm supporting (and, I think it can be argued, the only >> reasonable contract in the initial implementation) is the following: >> >> dtstr = dt.isoformat(*args, **kwargs) >> dt_rt = datetime.fromisoformat(dtstr) >> assert dt_rt == dt # The two points represent the >> same absolute time >> assert dt_rt.replace(tzinfo=None) == dt.replace(tzinfo=None) # And >> the same wall time >>
> > -CHB > > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From victor.stinner at gmail.com Thu Dec 7 17:57:48 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 7 Dec 2017 23:57:48 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: Message-ID: While I'm not strongly convinced that open() error handler must be changed for surrogateescape, first I would like to make sure that it's really a very bad idea before changing it :-) 2017-12-07 7:49 GMT+01:00 INADA Naoki : > I just came up with crazy idea; changing default error handler of open() > to "surrogateescape" only when open mode is "w" or "a". The idea is tempting but I'm not sure that it's a good idea. Moreover, what about "r+" and "w+" modes? I dislike getting a different behaviour for inputs and outputs. The motivation for surrogateescape is to "pass through" undecodable bytes: you need to handle them on the input side and on the output side. That's why I decided to not only change sys.stdin error handler to surrogateescape for the POSIX locale, but also sys.stdout: https://bugs.python.org/issue19977 > When reading, "surrogateescape" error handler is dangerous because > it can produce arbitrary broken unicode string by mistake. I'm fine with that. I wouldn't say that it's the purpose of the PEP, but sadly it's an expected, known and documented side effect. You get the same behaviour with Unix command line tools and most Python 2 applications (processing data as bytes). Nothing new under the sun. The PEP 540 allows users to write applications behaving like Unix tools/Python 2 with the power of the Python 3 language and stdlib. Again, use the Strict UTF8 mode if you prioritize *correctness* over *usability*.
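The correctness/usability trade-off being debated can be seen with a short, self-contained example: a text file that is valid UTF-8 except for one stray Latin-1 byte (the temp file here is invented for the demo):

```python
# A file that is valid UTF-8 except for one stray Latin-1 byte -- the
# "mostly correctly encoded text file" case discussed in this thread.
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"good line\ncaf\xe9\n")  # 0xE9 is not valid UTF-8

# Default errors="strict": reading fails on the first undecodable byte.
try:
    with open(path, encoding="utf-8") as f:
        f.read()
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# errors="surrogateescape": the stray byte passes through as U+DCE9.
with open(path, encoding="utf-8", errors="surrogateescape") as f:
    lines = f.read().splitlines()

print(decode_failed)   # True
print(repr(lines[1]))  # 'caf\udce9'
os.remove(path)
```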
Honestly, I'm not even sure that the Strict UTF-8 mode is *usable* in practice, since we are all surrounded by old documents encoded to various "legacy" encodings (where legacy means: "not UTF-8", like Latin1 or ShiftJIS). The first non-ASCII character which is not encoded to UTF-8 is going to "crash" the application (big traceback with a unicode error). Maybe the problem is the feature name: "UTF-8 mode". Users may think of "strict" when they read "UTF-8", since UTF-8 is known to be a strict encoding. For example, UTF-8 is much stricter than latin1, which is unable to tell if a document was encoded latin1 or whatever else. UTF-8 is able to tell if a document was actually encoded to UTF-8 or not, thanks to the design of the encoding itself. > And it doesn't allow following code: > > with open("image.jpg", "r") as f: # Binary data, not UTF-8 > return f.read() Using a JPEG image, the example is obviously wrong. But using surrogateescape on open() is written to read *text files* which are mostly correctly encoded to UTF-8, except a few bytes. I'm not sure how to explain the issue. The Mercurial wiki page has a good example of this issue that they call the "Makefile problem": https://www.mercurial-scm.org/wiki/EncodingStrategy#The_.22makefile_problem.22 While it's not exactly the discussed issue, it gives you an idea of the kind of issues that you have when you use open(filename, encoding="utf-8", errors="strict") versus open(filename, encoding="utf-8", errors="surrogateescape") > I'm not sure about this is good idea. And I don't know when is good for > changing write error handler; only when PEP 538 or PEP 540 is used? > Or always when os.fsencoding() is UTF-8? > > Any thoughts? The PEP 538 doesn't affect the error handler. The PEP 540 only changes the error handler for the POSIX locale, it's a deliberate choice. The PEP 538 is only enabled for the POSIX locale, and the PEP 540 will also be enabled by default by this locale.
I dislike the idea of changing the error handler if the filesystem encoding is UTF-8. The UTF-8 mode must be enabled explicitly on purpose, to reduce any risk of regression and to prepare users who enable it for any potential issue. Victor From victor.stinner at gmail.com Thu Dec 7 18:02:28 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 8 Dec 2017 00:02:28 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: Message-ID: 2017-12-06 5:07 GMT+01:00 INADA Naoki : > And opening binary file without "b" option is very common mistake of new > developers. If default error handler is surrogateescape, they lose a chance > to notice their bug. To come back to your original point, I didn't know that it was a common mistake to open binary files in text mode. Honestly, I didn't try recently. How does Python behave when you do that? Is it possible to write a full binary parser using the text mode? You should quickly get issues pointing you to your mistake, no? Victor From guido at python.org Thu Dec 7 18:26:52 2017 From: guido at python.org (Guido van Rossum) Date: Thu, 7 Dec 2017 15:26:52 -0800 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: Message-ID: On Thu, Dec 7, 2017 at 3:02 PM, Victor Stinner wrote: > 2017-12-06 5:07 GMT+01:00 INADA Naoki : > > And opening binary file without "b" option is very common mistake of new > > developers. If default error handler is surrogateescape, they lose a > chance > > to notice their bug. > > To come back to your original point, I didn't know that it was a > common mistake to open binary files in text mode. > It probably is because in Python 2 it makes no difference on UNIX, and on Windows the only difference is that binary mode preserves \r. > Honestly, I didn't try recently. How does Python behave when you do that? > > Is it possible to write a full binary parser using the text mode? You > should quickly get issues pointing you to your mistake, no?
> You will quickly get decoding errors, and that is INADA's point. (Unless you use encoding='Latin-1'.) His worry is that the surrogateescape error handler makes it so that you won't get decoding errors, and then the failure mode is much harder to debug. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Thu Dec 7 19:48:25 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 8 Dec 2017 01:48:25 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2) In-Reply-To: References: Message-ID: 2017-12-08 0:26 GMT+01:00 Guido van Rossum : > You will quickly get decoding errors, and that is INADA's point. (Unless you > use encoding='Latin-1'.) His worry is that the surrogateescape error handler > makes it so that you won't get decoding errors, and then the failure mode is > much harder to debug. Hum, my question was more to know if Python fails because of an operation failing with strings whereas bytes were expected, or if Python fails with a decoding error... But now I'm not sure anymore that this level of detail really matters. Let me think out loud. To explain unicode issues, I like to use filenames, since it's something that users view commonly, handle directly and can modify (and so enter many non-ASCII characters like diacritics and emojis ;-)). Filenames can be found on the command line, in environment variables (PYTHONSTARTUP), stdin (read a list of files from stdin), stdout (write the list of files into stdout), but also in text files (the Mercurial "makefile problem"). I consider that the command line and environment variables should "just work" and so use surrogateescape. It would be too annoying to not even be able to *start* Python because of a Unicode error. For example, it wouldn't be easy to identify which environment variable causes the issue. Hopefully, the UTF-8 mode doesn't change anything here: surrogateescape is already used since Python 3.3 for the command line and environment variables.
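The command line/environment variable round-trip mentioned here is the same ``surrogateescape`` round-trip that ``os.fsencode()``/``os.fsdecode()`` perform for filenames on POSIX; a minimal illustration:

```python
# On POSIX, os.fsdecode() decodes with errors="surrogateescape", so
# undecodable filename bytes survive a decode/encode round trip exactly.
import os

raw = b"caf\xe9"          # a filename that is not valid UTF-8
name = os.fsdecode(raw)   # the bad byte becomes a lone surrogate
assert os.fsencode(name) == raw
print(repr(name))
```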
Hopefully, the UTF-8 doesn't change anything here: surrogateescape is already used since Python 3.3 for the command line and environment variables. For stdin/stdout, I think that the main motivation here is to write Unix command line tools using Python 3: pass-through undecodable bytes without bugging the user with Unicode. Users don't use stdin and stdout as regular files, they are more used as pipes to pass data between programs with the Unix pipe in a shell like "producer | consumer". Sometimes stdout is redirected to a file, but I consider that it is expected to behave as a pipe and the regular TTY stdout. IMHO we are still in the safe surrogateescape area (for the specific case of the UTF-8 mode). Ok, now comes the real question, open(). For open(), I used the example of a code snippet *writing* the content of a directory (os.listdir) into a text file. Another example is to read filenames from a text files but pass-through undecodable bytes thanks to surrogateescape. But Naoki explained that open() is commonly misused to open binary files and Python should somehow fail badly to notify the developer of their mistake. If I should make a choice between the two categories of usage of open(), "read undecodable bytes in UTF-8 from a text file" versus "misuse open() on binary file", I expect that the later is more common that that open() shouldn't use surrogateescape by default. While stdin and stdout are usually associated to Unix pipes and Unix tools working on bytes, files are more commonly associated to important data that must not be lost nor corrupted. Python is expected to "help" the developer to use the proper options to read content from a file and to write content into a file. So I understand that open() should use the "strict" error handler in the UTF-8 mode, rather than "surrogateescape". I can survive to this "tiny" change to my PEP. I just posted a 3rd version of my PEP where open() error handler remains strict (is no more changed by the PEP). 
Victor From victor.stinner at gmail.com Thu Dec 7 19:50:15 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 8 Dec 2017 01:50:15 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3) Message-ID: Hi, I made the following two changes to the PEP 540: * open() error handler remains "strict" * remove the "Strict UTF8 mode" which doesn't make much sense anymore I wrote the Strict UTF-8 mode when open() used the surrogateescape error handler in the UTF-8 mode. I don't think that a Strict UTF-8 mode is required just to change the error handler of stdin and stdout. Well, read the "Passthrough undecodable bytes: surrogateescape" section of the PEP rationale :-) https://www.python.org/dev/peps/pep-0540/ Victor

PEP: 540
Title: Add a new UTF-8 mode
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner
BDFL-Delegate: INADA Naoki
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 5-January-2016
Python-Version: 3.7

Abstract
========

Add a new UTF-8 mode to ignore the locale, use the UTF-8 encoding, and change ``stdin`` and ``stdout`` error handlers to ``surrogateescape``. This mode is enabled by default in the POSIX locale, but otherwise disabled by default. The new ``-X utf8`` command line option and ``PYTHONUTF8`` environment variable are added to control the UTF-8 mode.

Rationale
=========

Locale encoding and UTF-8
-------------------------

Python 3.6 uses the locale encoding for filenames, environment variables, standard streams, etc. The locale encoding is inherited from the locale; the encoding and the locale are tightly coupled. Many users inherit the ASCII encoding from the POSIX locale, aka the "C" locale, but are unable to change the locale for different reasons. This encoding is very limited in terms of Unicode support: any non-ASCII character is likely to cause trouble. It is not easy to get the expected locale. Locales don't get the exact same name on all Linux distributions, FreeBSD, macOS, etc.
Some locales, like the recent ``C.UTF-8`` locale, are only supported by a few platforms. For example, an SSH connection can use a different encoding than the filesystem or terminal encoding of the local host. On the other hand, Python 3.6 is already using UTF-8 by default on macOS, Android and Windows (PEP 529) for most functions, except ``open()``. UTF-8 is also the default encoding of Python scripts, XML and JSON file formats. The Go programming language uses UTF-8 for strings. When all data are stored as UTF-8 but the locale is often misconfigured, an obvious solution is to ignore the locale and use UTF-8. PEP 538 attempts to mitigate this problem by coercing the C locale to a UTF-8 based locale when one is available, but that isn't a universal solution. For example, CentOS 7's container images default to the POSIX locale, and don't include the C.UTF-8 locale, so PEP 538's locale coercion is ineffective.

Passthrough undecodable bytes: surrogateescape
----------------------------------------------

When decoding bytes from UTF-8 using the ``strict`` error handler, which is the default, Python 3 raises a ``UnicodeDecodeError`` on the first undecodable byte. Unix command line tools like ``cat`` or ``grep`` and most Python 2 applications simply do not have this class of bugs: they don't decode data, but process data as a raw bytes sequence. Python 3 already has a solution to behave like Unix tools and Python 2: the ``surrogateescape`` error handler (:pep:`383`). It allows processing data "as bytes" but uses Unicode in practice (undecodable bytes are stored as surrogate characters). The UTF-8 mode uses the ``surrogateescape`` error handler for ``stdin`` and ``stdout`` since these streams are commonly associated with Unix command line tools. However, users have a different expectation on files. Files are expected to be properly encoded. Python is expected to fail early when ``open()`` is called with the wrong options, like opening a JPEG picture in text mode.
The ``open()`` default error handler remains ``strict`` for these
reasons.

No change by default for best backward compatibility
----------------------------------------------------

While UTF-8 is perfect in most cases, sometimes the locale encoding is
actually the best encoding.

This PEP changes the behaviour for the POSIX locale since this locale
usually gives the ASCII encoding, whereas UTF-8 is a much better
choice. It does not change the behaviour for other locales to prevent
any risk of regression.

Since users must explicitly enable the new UTF-8 mode, they are
responsible for any potential mojibake issues caused by this mode.

Proposal
========

Add a new UTF-8 mode to ignore the locale, use the UTF-8 encoding, and
change ``stdin`` and ``stdout`` error handlers to ``surrogateescape``.
This mode is enabled by default in the POSIX locale, but otherwise
disabled by default.

The new ``-X utf8`` command line option and ``PYTHONUTF8`` environment
variable are added. The UTF-8 mode is enabled by ``-X utf8`` or
``PYTHONUTF8=1``.

The POSIX locale enables the UTF-8 mode. In this case, the UTF-8 mode
can be explicitly disabled by ``-X utf8=0`` or ``PYTHONUTF8=0``.

For standard streams, the ``PYTHONIOENCODING`` environment variable has
priority over the UTF-8 mode. On Windows, the
``PYTHONLEGACYWINDOWSFSENCODING`` environment variable (:pep:`529`) has
priority over the UTF-8 mode.

Backward Compatibility
======================

The only backward incompatible change is that the UTF-8 encoding is now
used for the POSIX locale.

Annex: Encodings And Error Handlers
===================================

The UTF-8 mode changes the default encoding and error handler used by
``open()``, ``os.fsdecode()``, ``os.fsencode()``, ``sys.stdin``,
``sys.stdout`` and ``sys.stderr``.
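The strict-versus-surrogateescape behaviour described in the rationale can be sketched in a few lines of Python (an illustration added here, not part of the PEP text; the file name ``caf\xe9`` is invented, and the undecodable-name step assumes a POSIX filesystem such as Linux that accepts arbitrary bytes in names):

```python
import os
import tempfile

# "strict" (the default) fails on the first undecodable byte, while
# "surrogateescape" smuggles it through as a lone surrogate and
# restores the original byte when encoding back.
data = b"caf\xe9"  # Latin-1 style byte, not valid UTF-8

try:
    data.decode("utf-8")  # errors="strict" is the default
    raised = False
except UnicodeDecodeError:
    raised = True
assert raised  # Python 3 fails early, as described above

text = data.decode("utf-8", errors="surrogateescape")
assert text == "caf\udce9"  # byte 0xE9 became surrogate U+DCE9
assert text.encode("utf-8", errors="surrogateescape") == data  # round-trip

# The same mechanism lets a directory listing containing an undecodable
# file name pass through a text stream unchanged (POSIX file names are
# byte strings, so such a name can exist on e.g. Linux):
d = tempfile.mkdtemp()
open(os.path.join(os.fsencode(d), b"caf\xe9"), "wb").close()
names = os.listdir(d)  # the bad byte arrives as a surrogate

listing = os.path.join(d, "listing.txt")
with open(listing, "w", encoding="utf-8", errors="surrogateescape") as f:
    f.write("\n".join(names))
with open(listing, encoding="utf-8", errors="surrogateescape") as f:
    assert f.read().split("\n") == names  # nothing was lost
```

This is exactly the "process data as bytes" trick the rationale credits to :pep:`383`: the undecodable byte survives a decode/encode round trip instead of raising.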
Encoding and error handler
--------------------------

============================  =======================  ==========================
Function                      Default                  UTF-8 mode or POSIX locale
============================  =======================  ==========================
open()                        locale/strict            **UTF-8**/strict
os.fsdecode(), os.fsencode()  locale/surrogateescape   **UTF-8**/surrogateescape
sys.stdin, sys.stdout         locale/strict            **UTF-8/surrogateescape**
sys.stderr                    locale/backslashreplace  **UTF-8**/backslashreplace
============================  =======================  ==========================

By comparison, Python 3.6 uses:

============================  =======================  ==========================
Function                      Default                  POSIX locale
============================  =======================  ==========================
open()                        locale/strict            locale/strict
os.fsdecode(), os.fsencode()  locale/surrogateescape   locale/surrogateescape
sys.stdin, sys.stdout         locale/strict            locale/**surrogateescape**
sys.stderr                    locale/backslashreplace  locale/backslashreplace
============================  =======================  ==========================

Encoding and error handler on Windows
-------------------------------------

On Windows, the encodings and error handlers are different:

============================  ======================  ==========================  ======================
Function                      Default                 Legacy Windows FS encoding  UTF-8 mode
============================  ======================  ==========================  ======================
open()                        mbcs/strict             mbcs/strict                 **UTF-8**/strict
os.fsdecode(), os.fsencode()  UTF-8/surrogatepass     **mbcs/replace**            UTF-8/surrogatepass
sys.stdin, sys.stdout         UTF-8/surrogateescape   UTF-8/surrogateescape       UTF-8/surrogateescape
sys.stderr                    UTF-8/backslashreplace  UTF-8/backslashreplace      UTF-8/backslashreplace
============================  ======================  ==========================  ======================

By comparison, Python 3.6 uses:

============================  ======================  ==========================
Function                      Default                 Legacy Windows FS encoding
============================  ======================  ==========================
open()                        mbcs/strict             mbcs/strict
os.fsdecode(), os.fsencode()  UTF-8/surrogatepass     **mbcs/replace**
sys.stdin, sys.stdout         UTF-8/surrogateescape   UTF-8/surrogateescape
sys.stderr                    UTF-8/backslashreplace  UTF-8/backslashreplace
============================  ======================  ==========================

The "Legacy Windows FS encoding" is enabled by the
``PYTHONLEGACYWINDOWSFSENCODING`` environment variable.

If stdin and/or stdout is redirected to a pipe, ``sys.stdin`` and/or
``sys.stdout`` use the ``mbcs`` encoding by default rather than UTF-8.
But in the UTF-8 mode, ``sys.stdin`` and ``sys.stdout`` always use the
UTF-8 encoding.

.. note::
   There is no POSIX locale on Windows. The ANSI code page is used as
   the locale encoding, and this code page never uses the ASCII
   encoding.

Annex: Differences between PEP 538 and PEP 540
==============================================

PEP 538's locale coercion is only effective if a suitable UTF-8 based
locale is available as a coercion target. PEP 540's UTF-8 mode can be
enabled even for operating systems that don't provide a suitable
platform locale (such as CentOS 7).

PEP 538 only changes the interpreter's behaviour for the C locale.
While the new UTF-8 mode of this PEP is only enabled by default in the
C locale, it can also be enabled manually for any other locale.

PEP 538 is implemented with ``setlocale(LC_CTYPE, "")`` and
``setenv("LC_CTYPE", "")``, so any non-Python code running in the
process and any subprocesses that inherit the environment are impacted
by the change. PEP 540 is implemented in Python internals and ignores
the locale: non-Python code running in the same process is not aware of
the "Python UTF-8 mode".
The benefit of the PEP 538 approach is that it helps ensure that
encoding handling in binary extension modules and subprocesses is
consistent with CPython's encoding handling. The upside of the PEP 540
approach is that it allows an embedding application to change the
interpreter's behaviour without having to change the process global
locale settings.

Links
=====

* `bpo-29240: Implementation of the PEP 540: Add a new UTF-8 mode `_
* `PEP 538 `_: "Coercing the legacy C locale to C.UTF-8"
* `PEP 529 `_: "Change Windows filesystem encoding to UTF-8"
* `PEP 528 `_: "Change Windows console encoding to UTF-8"
* `PEP 383 `_: "Non-decodable Bytes in System Character Interfaces"

Post History
============

* 2017-12: `[Python-Dev] PEP 540: Add a new UTF-8 mode `_
* 2017-04: `[Python-Dev] Proposed BDFL Delegate update for PEPs 538 &
  540 (assuming UTF-8 for *nix system boundaries) `_
* 2017-01: `[Python-ideas] PEP 540: Add a new UTF-8 mode `_
* 2017-01: `bpo-28180: Implementation of the PEP 538: coerce C locale
  to C.utf-8 (msg284764) `_
* 2016-08-17: `bpo-27781: Change sys.getfilesystemencoding() on Windows
  to UTF-8 (msg272916) `_ -- Victor proposed ``-X utf8`` for the
  :pep:`529` (Change Windows filesystem encoding to UTF-8)

Copyright
=========

This document has been placed in the public domain.

From v+python at g.nevcal.com  Thu Dec  7 20:15:45 2017
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Thu, 7 Dec 2017 17:15:45 -0800
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)
In-Reply-To: 
References: 
Message-ID: <2b7f0a87-eb82-4f0f-b79c-196f96fb89b7@g.nevcal.com>

On 12/7/2017 4:48 PM, Victor Stinner wrote:
> Ok, now comes the real question, open().
>
> For open(), I used the example of a code snippet *writing* the content
> of a directory (os.listdir) into a text file. Another example is to
> read filenames from a text file but pass-through undecodable bytes
> thanks to surrogateescape.
> But Naoki explained that open() is commonly misused to open binary
> files and Python should somehow fail badly to notify the developer of
> their mistake.

So the real problem here is that open has a default mode of text.
Instead of forcing the user to specify either "text" or "binary" when
opening, text is used as a default, binary as an option to be
specified.

I understand that default has a long history in Unix-land, dating at
least as far back as 1977, when I first learned how to use the Unix
open() function. And now it would be an incompatible change to change
it.

The real question is whether or not it is a good idea to change it...
at this point in time, with Unicode and UTF-8 so prevalent, text and
binary modes are far different than back in 1977, when they mostly just
documented that this was a binary file that was being opened, and that
one could more likely expect to see read() than fgets() in the
following code.

If it were to be changed, one could add a text-mode option in 3.7, say
"t" in the mode string, and a PendingDeprecationWarning for open calls
without the specification of either t or b in the mode string. In 3.8,
the warning would be changed to DeprecationWarning. In 3.9, all open
calls would need to have either t or b, or would fail.

Meanwhile, back on the PEP 540 ranch, text mode open calls could
immediately use surrogateescape, binary mode open calls would not, and
unspecified open calls would not.
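The text/binary distinction being debated here can be made concrete with a small sketch (a throwaway temporary file, not code from the thread): text mode is the default, "b" selects binary, and "t" may be given explicitly but is currently redundant.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "wb") as f:        # binary mode: raw bytes, no decoding
    f.write(b"\xff\xfe")

with open(path, "rb") as f:        # binary read returns the bytes as-is
    assert f.read() == b"\xff\xfe"

try:
    with open(path, encoding="utf-8") as f:  # text mode is the default
        f.read()                   # 0xFF is invalid UTF-8
    raised = False
except UnicodeDecodeError:
    raised = True
assert raised                      # Python fails early, as Naoki wants

# "rt" and plain "r" open identical text streams today:
with open(path, "w") as f:
    f.write("hello")
with open(path, "rt") as explicit, open(path, "r") as implicit:
    assert explicit.read() == implicit.read() == "hello"

os.unlink(path)
```

The last part is the status quo Jonathan points out below: "t" is accepted in the mode string but is essentially ignored, since text is already the default.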
From jcgoble3 at gmail.com  Thu Dec  7 20:45:46 2017
From: jcgoble3 at gmail.com (Jonathan Goble)
Date: Fri, 08 Dec 2017 01:45:46 +0000
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)
In-Reply-To: <2b7f0a87-eb82-4f0f-b79c-196f96fb89b7@g.nevcal.com>
References: <2b7f0a87-eb82-4f0f-b79c-196f96fb89b7@g.nevcal.com>
Message-ID: 

On Thu, Dec 7, 2017 at 8:38 PM Glenn Linderman wrote:
> If it were to be changed, one could add a text-mode option in 3.7, say
> "t" in the mode string, and a PendingDeprecationWarning for open calls
> without the specification of either t or b in the mode string.

"t" is already supported in open()'s mode argument [1] as a way to
explicitly request text mode, though it's essentially ignored right now
since text is the default anyway. So since the option is already
present, the only thing needed at this stage for your plan would be to
begin deprecating not using it.

*goes back to lurking*

[1] https://docs.python.org/3/library/functions.html#open

From v+python at g.nevcal.com  Thu Dec  7 20:49:43 2017
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Thu, 7 Dec 2017 17:49:43 -0800
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)
In-Reply-To: 
References: <2b7f0a87-eb82-4f0f-b79c-196f96fb89b7@g.nevcal.com>
Message-ID: 

On 12/7/2017 5:45 PM, Jonathan Goble wrote:
> On Thu, Dec 7, 2017 at 8:38 PM Glenn Linderman wrote:
>
> If it were to be changed, one could add a text-mode option in 3.7,
> say "t" in the mode string, and a PendingDeprecationWarning for open
> calls without the specification of either t or b in the mode string.
>
> "t" is already supported in open()'s mode argument [1] as a way to
> explicitly request text mode, though it's essentially ignored right
> now since text is the default anyway. So since the option is already
> present, the only thing needed at this stage for your plan would be
> to begin deprecating not using it.
> *goes back to lurking*
>
> [1] https://docs.python.org/3/library/functions.html#open

Thanks for briefly de-lurking. So then for PEP 540... use
surrogateescape immediately for t mode. Then, when the user encounters
an encoding error, there would be three solutions: switch to t mode,
explicitly switch to surrogateescape, or fix the locale.

From chris.barker at noaa.gov  Thu Dec  7 20:57:23 2017
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Thu, 7 Dec 2017 17:57:23 -0800
Subject: [Python-Dev] iso8601 parsing
In-Reply-To: <8766d45c-3d52-c281-bb1a-576ed04f6351@ganssle.io>
References: <01e69881-3710-87c8-f47a-dfc427ec65b5@mgmiller.net>
 <11362716.0I2SPu8sME@hammer.magicstack.net>
 <0fba01d34dca$47ad7940$d7086bc0$@sdamon.com>
 <20171025213056.GP9068@ando.pearwood.info>
 <8766d45c-3d52-c281-bb1a-576ed04f6351@ganssle.io>
Message-ID: <6231266026602906357@unknownmsgid>

> but is it that hard to parse arbitrary ISO8601 strings once you've
> gotten this far? It's a bit uglier than I'd like, but not THAT bad a
> spec.

No, and in fact this PR is adapted from a *more general* ISO-8601
parser that I wrote (which is now merged into master on
python-dateutil). In the CPython PR I deliberately limited it to be the
inverse of `isoformat()` for two major reasons:

1. It allows us to get something out there that everyone can agree on -
not only would we have to agree on whether to support arcane ISO 8601
formats like YYYY-Www-D,

I don't know - would anyone complain about it supporting too arcane a
format? Also, "most ISO compliant" date time strings would get us a
long way.

but we also have to then discuss whether we want to be strict and
disallow YYYYMM like ISO-8601 does,

Well, I think disallowing something has little utility - we really
don't want this to be a validator.

do we want fractional minute support?
What about different variations (we're already supporting replacing T
with any character in `.isoformat()` and outputting time zones in the
form hh:mm:ss), so what other non-compliant variations do we want to
add...

Wait - does datetime.isoformat() put out non-compliant strings?

Anyway, supporting all of what .isoformat() puts out, plus most of
ISO 8601, would be a great start.

- if it comes out of `isoformat()` it should be able to go back in
through `fromisoformat()`.

Yup. But had anyone raised objections to it being more flexible?

2. It makes it *much* easier to understand what formats are supported.
You can say, "This function is for reading in dates serialized with
`.isoformat()`", and you *immediately* know how to create compliant
dates.

We could still document that as the preferred form. You're writing the
code, and I don't have time to help, so by all means do what you think
is best. But if you've got code that's more flexible, I can't imagine
anyone complaining about a more flexible parser. Though I have a
limited imagination about such things.

But I hope it will at least accept both with and without the T.

Thanks for working on this.

-Chris

On 12/07/2017 08:12 PM, Chris Barker wrote:

Here is the PR I've submitted:
https://github.com/python/cpython/pull/4699

The contract that I'm supporting (and, I think it can be argued, the
only reasonable contract in the initial implementation) is the
following:

    dtstr = dt.isoformat(*args, **kwargs)
    dt_rt = datetime.fromisoformat(dtstr)
    assert dt_rt == dt  # The two points represent the same absolute time
    assert dt_rt.replace(tzinfo=None) == dt.replace(tzinfo=None)  # And the same wall time

that looks good. I see this in the comments in the PR:

"""
This does not support parsing arbitrary ISO 8601 strings - it is only
intended as the inverse operation of :meth:`datetime.isoformat`
"""

what ISO8601 compatible features are not supported?

-CHB
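The round-trip contract quoted above can be exercised directly on Python 3.7+, where `datetime.fromisoformat()` from this PR is available; the sample datetime below is arbitrary:

```python
from datetime import datetime, timedelta, timezone

# An aware datetime with a fixed UTC offset (values chosen arbitrarily).
dt = datetime(2017, 12, 7, 17, 57, 23, tzinfo=timezone(timedelta(hours=-8)))

dtstr = dt.isoformat()
assert dtstr == "2017-12-07T17:57:23-08:00"

# fromisoformat() is the inverse of isoformat():
dt_rt = datetime.fromisoformat(dtstr)
assert dt_rt == dt  # same absolute time
assert dt_rt.replace(tzinfo=None) == dt.replace(tzinfo=None)  # same wall time
```

Strings that `isoformat()` never emits (e.g. a trailing "Z", or the week-date form YYYY-Www-D) are exactly the arbitrary ISO 8601 inputs the PR comment warns it does not promise to parse.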
From chris.barker at noaa.gov  Thu Dec  7 21:10:36 2017
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Thu, 7 Dec 2017 18:10:36 -0800
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)
In-Reply-To: 
References: <2b7f0a87-eb82-4f0f-b79c-196f96fb89b7@g.nevcal.com>
Message-ID: <372515396893022654@unknownmsgid>

I'm a bit confused: File names and the like are one thing, and the
CONTENTS of files is quite another.

I get that there is theoretically a "default" encoding for the contents
of text files, but that is SO likely to be wrong as to be ignorable.

open() already defaults to utf-8. Which is a fine default if you are
going to have one, but it seems a bad idea to have it default to
surrogateescape EVER, regardless of the locale or anything else. If the
file is binary, or a different encoding, or simply broken, it's much
better to get an encoding error as soon as possible.

Why does this have anything to do with the PEP? Perhaps the issue of
reading a filename from the system, writing it to a file, then reading
it back in again. I actually do that a lot - but mostly so I can pass
that file to another system, so I really don't want broken encoding in
it anyway.

-CHB

Sent from my iPhone

On Dec 7, 2017, at 5:53 PM, Glenn Linderman wrote:

On 12/7/2017 5:45 PM, Jonathan Goble wrote:

On Thu, Dec 7, 2017 at 8:38 PM Glenn Linderman wrote:

> If it were to be changed, one could add a text-mode option in 3.7, say
> "t" in the mode string, and a PendingDeprecationWarning for open calls
> without the specification of either t or b in the mode string.

"t" is already supported in open()'s mode argument [1] as a way to
explicitly request text mode, though it's essentially ignored right now
since text is the default anyway. So since the option is already
present, the only thing needed at this stage for your plan would be to
begin deprecating not using it.
*goes back to lurking*

[1] https://docs.python.org/3/library/functions.html#open

Thanks for briefly de-lurking. So then for PEP 540... use
surrogateescape immediately for t mode. Then, when the user encounters
an encoding error, there would be three solutions: switch to t mode,
explicitly switch to surrogateescape, or fix the locale.

_______________________________________________
Python-Dev mailing list
Python-Dev at python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov

From chris.barker at noaa.gov  Thu Dec  7 21:12:19 2017
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Thu, 7 Dec 2017 18:12:19 -0800
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3)
In-Reply-To: 
References: 
Message-ID: <4921580886527508700@unknownmsgid>

I made the following two changes to the PEP 540:

* open() error handler remains "strict"
* remove the "Strict UTF8 mode" which doesn't make much sense anymore

+1 - ignore my previous note.

-CHB

I wrote the Strict UTF-8 mode when open() used surrogateescape error
handler in the UTF-8 mode. I don't think that a Strict UTF-8 mode is
required just to change the error handler of stdin and stdout. Well,
read the "Passthrough undecodable bytes: surrogateescape" section of
the PEP rationale :-)

https://www.python.org/dev/peps/pep-0540/

Victor

PEP: 540
Title: Add a new UTF-8 mode
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner
BDFL-Delegate: INADA Naoki
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 5-January-2016
Python-Version: 3.7

Abstract
========

Add a new UTF-8 mode to ignore the locale, use the UTF-8 encoding, and
change ``stdin`` and ``stdout`` error handlers to ``surrogateescape``.
This mode is enabled by default in the POSIX locale, but otherwise
disabled by default.
[... remainder of the quoted PEP 540 text snipped; it is identical to
Victor's post above ...]
_______________________________________________
Python-Dev mailing list
Python-Dev at python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov

From python-dev at mgmiller.net  Thu Dec  7 22:38:00 2017
From: python-dev at mgmiller.net (Mike Miller)
Date: Thu, 7 Dec 2017 19:38:00 -0800
Subject: [Python-Dev] iso8601 parsing
In-Reply-To: <6231266026602906357@unknownmsgid>
References: <01e69881-3710-87c8-f47a-dfc427ec65b5@mgmiller.net>
 <11362716.0I2SPu8sME@hammer.magicstack.net>
 <0fba01d34dca$47ad7940$d7086bc0$@sdamon.com>
 <20171025213056.GP9068@ando.pearwood.info>
 <8766d45c-3d52-c281-bb1a-576ed04f6351@ganssle.io>
 <6231266026602906357@unknownmsgid>
Message-ID: <8bf55eb4-e986-cbbd-65d7-472f9b149ce8@mgmiller.net>

Guess the argument for limiting what it accepts would be that every
funky variation will need to be supported until the end times, even
those of little use or utility. On the other hand, it might be good to
keep the two implementations the same for consistency reasons.

Thanks either way,
-Mike

On 2017-12-07 17:57, Chris Barker - NOAA Federal wrote:

From songofacandy at gmail.com  Fri Dec  8 00:02:23 2017
From: songofacandy at gmail.com (INADA Naoki)
Date: Fri, 8 Dec 2017 14:02:23 +0900
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3)
In-Reply-To: 
References: 
Message-ID: 

Looks nice.

But I want to clarify more about the difference/relationship between
PEP 538 and 540.

If I understand correctly: Both PEP 538 (locale coercion) and PEP 540
(UTF-8 mode) share the same logic to detect the POSIX locale.

When the POSIX locale is detected, locale coercion is tried first. And
if locale coercion succeeds, the UTF-8 mode is not used, because the
locale is not POSIX anymore.

If locale coercion is disabled or failed, the UTF-8 mode is used
automatically, unless it is disabled explicitly.
UTF-8 mode is similar to C.UTF-8 or other locale coercion target
locales. But UTF-8 mode is different from the C.UTF-8 locale in these
ways, because the actual locale is not changed:

* Libraries using the locale (e.g. readline) work as in the POSIX
  locale, so UTF-8 cannot be used in such libraries.

* locale.getpreferredencoding() returns 'ASCII' instead of 'UTF-8', so
  libraries depending on locale.getpreferredencoding() may raise
  UnicodeErrors.

Am I correct? Or does locale.getpreferredencoding() return UTF-8 in the
UTF-8 mode too?

INADA Naoki

On Fri, Dec 8, 2017 at 9:50 AM, Victor Stinner wrote:
> Hi,
>
> I made the following two changes to the PEP 540:
>
> * open() error handler remains "strict"
> * remove the "Strict UTF8 mode" which doesn't make much sense anymore
>
> I wrote the Strict UTF-8 mode when open() used surrogateescape error
> handler in the UTF-8 mode. I don't think that a Strict UTF-8 mode is
> required just to change the error handler of stdin and stdout. Well,
> read the "Passthrough undecodable bytes: surrogateescape" section of
> the PEP rationale :-)
>
> https://www.python.org/dev/peps/pep-0540/
>
> Victor
>
> PEP: 540
> Title: Add a new UTF-8 mode
> Version: $Revision$
> Last-Modified: $Date$
> Author: Victor Stinner
> BDFL-Delegate: INADA Naoki
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 5-January-2016
> Python-Version: 3.7
>
> Abstract
> ========
>
> Add a new UTF-8 mode to ignore the locale, use the UTF-8 encoding, and
> change ``stdin`` and ``stdout`` error handlers to ``surrogateescape``.
> This mode is enabled by default in the POSIX locale, but otherwise
> disabled by default.
>
> The new ``-X utf8`` command line option and ``PYTHONUTF8`` environment
> variable are added to control the UTF-8 mode.
>
> Rationale
> =========
>
> Locale encoding and UTF-8
> -------------------------
>
> Python 3.6 uses the locale encoding for filenames, environment
> variables, standard streams, etc.
The locale encoding is inherited from
> the locale; the encoding and the locale are tightly coupled.
>
> Many users inherit the ASCII encoding from the POSIX locale, aka the "C"
> locale, but are unable to change the locale for various reasons. This
> encoding is very limited in terms of Unicode support: any non-ASCII
> character is likely to cause trouble.
>
> It is not easy to get the expected locale. Locales don't get the exact
> same name on all Linux distributions, FreeBSD, macOS, etc. Some
> locales, like the recent ``C.UTF-8`` locale, are only supported by a few
> platforms. For example, an SSH connection can use a different encoding
> than the filesystem or terminal encoding of the local host.
>
> On the other hand, Python 3.6 already uses UTF-8 by default on
> macOS, Android and Windows (PEP 529) for most functions, except for
> ``open()``. UTF-8 is also the default encoding of Python scripts, XML
> and JSON file formats. The Go programming language uses UTF-8 for
> strings.
>
> When all data is stored as UTF-8 but the locale is often misconfigured,
> an obvious solution is to ignore the locale and use UTF-8.
>
> PEP 538 attempts to mitigate this problem by coercing the C locale
> to a UTF-8 based locale when one is available, but that isn't a
> universal solution. For example, CentOS 7's container images default
> to the POSIX locale, and don't include the C.UTF-8 locale, so PEP 538's
> locale coercion is ineffective.
>
>
> Passthrough undecodable bytes: surrogateescape
> ----------------------------------------------
>
> When decoding bytes from UTF-8 using the ``strict`` error handler, which
> is the default, Python 3 raises a ``UnicodeDecodeError`` on the first
> undecodable byte.
>
> Unix command line tools like ``cat`` or ``grep`` and most Python 2
> applications simply do not have this class of bugs: they don't decode
> data, but process data as a raw bytes sequence.
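The ``surrogateescape`` mechanism that the PEP relies on for this "process bytes without decoding" behaviour can be seen directly in a few lines of plain Python (no UTF-8 mode required):

```python
# A bytes sequence ending in a byte that is invalid as UTF-8 (a lone 0xE9):
data = b"caf\xe9"

# strict (the default error handler) refuses it:
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

# surrogateescape smuggles the bad byte through as the lone surrogate U+DCE9:
text = data.decode("utf-8", "surrogateescape")
print(ascii(text))  # 'caf\udce9'

# ...and re-encoding restores the original bytes, losslessly:
assert text.encode("utf-8", "surrogateescape") == data
```

This round trip is what lets a Python 3 program behave like ``cat`` or ``grep``: undecodable bytes survive the str detour unchanged.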
>
> Python 3 already has a solution to behave like Unix tools and Python 2:
> the ``surrogateescape`` error handler (:pep:`383`). It allows processing
> data "as bytes" while using Unicode in practice (undecodable bytes are
> stored as surrogate characters).
>
> The UTF-8 mode uses the ``surrogateescape`` error handler for ``stdin``
> and ``stdout`` since these streams are commonly associated with Unix
> command line tools.
>
> However, users have a different expectation for files. Files are expected
> to be properly encoded. Python is expected to fail early when ``open()``
> is called with the wrong options, like opening a JPEG picture in text
> mode. The ``open()`` default error handler remains ``strict`` for these
> reasons.
>
>
> No change by default for best backward compatibility
> ----------------------------------------------------
>
> While UTF-8 is perfect in most cases, sometimes the locale encoding is
> actually the best encoding.
>
> This PEP changes the behaviour for the POSIX locale since this locale
> usually gives the ASCII encoding, whereas UTF-8 is a much better choice.
> It does not change the behaviour for other locales, to avoid any risk of
> regression.
>
> Since users must explicitly enable the new UTF-8 mode, they
> are responsible for any potential mojibake issues caused by this mode.
>
>
> Proposal
> ========
>
> Add a new UTF-8 mode to ignore the locale, use the UTF-8 encoding, and
> change ``stdin`` and ``stdout`` error handlers to ``surrogateescape``.
> This mode is enabled by default in the POSIX locale, but otherwise
> disabled by default.
>
> The new ``-X utf8`` command line option and ``PYTHONUTF8`` environment
> variable are added. The UTF-8 mode is enabled by ``-X utf8`` or
> ``PYTHONUTF8=1``.
>
> The POSIX locale enables the UTF-8 mode. In this case, the UTF-8 mode
> can be explicitly disabled by ``-X utf8=0`` or ``PYTHONUTF8=0``.
>
> For standard streams, the ``PYTHONIOENCODING`` environment variable has
> priority over the UTF-8 mode.
>
> On Windows, the ``PYTHONLEGACYWINDOWSFSENCODING`` environment variable
> (:pep:`529`) has priority over the UTF-8 mode.
>
>
> Backward Compatibility
> ======================
>
> The only backward incompatible change is that the UTF-8 encoding is now
> used for the POSIX locale.
>
>
> Annex: Encodings And Error Handlers
> ===================================
>
> The UTF-8 mode changes the default encoding and error handler used by
> ``open()``, ``os.fsdecode()``, ``os.fsencode()``, ``sys.stdin``,
> ``sys.stdout`` and ``sys.stderr``.
>
> Encoding and error handler
> --------------------------
>
> ============================ ======================= ==========================
> Function                     Default                 UTF-8 mode or POSIX locale
> ============================ ======================= ==========================
> open()                       locale/strict           **UTF-8**/strict
> os.fsdecode(), os.fsencode() locale/surrogateescape  **UTF-8**/surrogateescape
> sys.stdin, sys.stdout        locale/strict           **UTF-8/surrogateescape**
> sys.stderr                   locale/backslashreplace **UTF-8**/backslashreplace
> ============================ ======================= ==========================
>
> By comparison, Python 3.6 uses:
>
> ============================ ======================= ==========================
> Function                     Default                 POSIX locale
> ============================ ======================= ==========================
> open()                       locale/strict           locale/strict
> os.fsdecode(), os.fsencode() locale/surrogateescape  locale/surrogateescape
> sys.stdin, sys.stdout        locale/strict           locale/**surrogateescape**
> sys.stderr                   locale/backslashreplace locale/backslashreplace
> ============================ ======================= ==========================
>
> Encoding and error handler on Windows
> -------------------------------------
>
> On Windows, the encodings and error handlers are different:
>
> ============================ ====================== ========================== ======================
> Function                     Default                Legacy Windows FS encoding UTF-8 mode
> ============================ ====================== ========================== ======================
> open()                       mbcs/strict            mbcs/strict                **UTF-8**/strict
> os.fsdecode(), os.fsencode() UTF-8/surrogatepass    **mbcs/replace**           UTF-8/surrogatepass
> sys.stdin, sys.stdout        UTF-8/surrogateescape  UTF-8/surrogateescape      UTF-8/surrogateescape
> sys.stderr                   UTF-8/backslashreplace UTF-8/backslashreplace     UTF-8/backslashreplace
> ============================ ====================== ========================== ======================
>
> By comparison, Python 3.6 uses:
>
> ============================ ====================== ==========================
> Function                     Default                Legacy Windows FS encoding
> ============================ ====================== ==========================
> open()                       mbcs/strict            mbcs/strict
> os.fsdecode(), os.fsencode() UTF-8/surrogatepass    **mbcs/replace**
> sys.stdin, sys.stdout        UTF-8/surrogateescape  UTF-8/surrogateescape
> sys.stderr                   UTF-8/backslashreplace UTF-8/backslashreplace
> ============================ ====================== ==========================
>
> The "Legacy Windows FS encoding" is enabled by the
> ``PYTHONLEGACYWINDOWSFSENCODING`` environment variable.
>
> If stdin and/or stdout is redirected to a pipe, ``sys.stdin`` and/or
> ``sys.stdout`` use the ``mbcs`` encoding by default rather than UTF-8. But
> in the UTF-8 mode, ``sys.stdin`` and ``sys.stdout`` always use the UTF-8
> encoding.
>
> .. note::
>    There is no POSIX locale on Windows. The ANSI code page is used as the
>    locale encoding, and this code page never uses the ASCII encoding.
>
>
> Annex: Differences between PEP 538 and PEP 540
> ==============================================
>
> PEP 538's locale coercion is only effective if a suitable UTF-8
> based locale is available as a coercion target.
PEP 540's
> UTF-8 mode can be enabled even for operating systems that don't
> provide a suitable platform locale (such as CentOS 7).
>
> PEP 538 only changes the interpreter's behaviour for the C locale. While the
> new UTF-8 mode of this PEP is only enabled by default in the C locale, it can
> also be enabled manually for any other locale.
>
> PEP 538 is implemented with ``setlocale(LC_CTYPE, "")`` and
> ``setenv("LC_CTYPE", "")``, so any non-Python code running
> in the process and any subprocesses that inherit the environment are impacted
> by the change. PEP 540 is implemented in Python internals and ignores the
> locale: non-Python code running in the same process is not aware of the
> "Python UTF-8 mode". The benefit of the PEP 538 approach is that it helps
> ensure that encoding handling in binary extension modules and subprocesses
> is consistent with CPython's encoding handling. The upside of the PEP 540
> approach is that it allows an embedding application to change the
> interpreter's behaviour without having to change the process global
> locale settings.
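The error handlers named in the tables earlier in the PEP behave quite differently on the same undecodable input; a quick comparison on a lone 0xFF byte (which can never appear in valid UTF-8):

```python
raw = b"\xff"

# surrogateescape: the bad byte becomes the lone surrogate U+DCFF
# (printed via ascii() because a lone surrogate cannot be encoded to stdout)
print(ascii(raw.decode("utf-8", "surrogateescape")))

# backslashreplace: the bad byte becomes the four literal characters \xff
print(raw.decode("utf-8", "backslashreplace"))

# strict: the default, raises on the first undecodable byte
try:
    raw.decode("utf-8", "strict")
except UnicodeDecodeError as exc:
    print("strict raises:", exc.reason)
```

Only ``surrogateescape`` round-trips: re-encoding the surrogate with the same handler restores the original byte, which is why the PEP picks it for stdin/stdout but not for ``open()``.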
> > > Links > ===== > > * `bpo-29240: Implementation of the PEP 540: Add a new UTF-8 mode > `_ > * `PEP 538 `_: > "Coercing the legacy C locale to C.UTF-8" > * `PEP 529 `_: > "Change Windows filesystem encoding to UTF-8" > * `PEP 528 `_: > "Change Windows console encoding to UTF-8" > * `PEP 383 `_: > "Non-decodable Bytes in System Character Interfaces" > > > Post History > ============ > > * 2017-12: `[Python-Dev] PEP 540: Add a new UTF-8 mode > `_ > * 2017-04: `[Python-Dev] Proposed BDFL Delegate update for PEPs 538 & > 540 (assuming UTF-8 for *nix system boundaries) > `_ > * 2017-01: `[Python-ideas] PEP 540: Add a new UTF-8 mode > `_ > * 2017-01: `bpo-28180: Implementation of the PEP 538: coerce C locale to > C.utf-8 (msg284764) `_ > * 2016-08-17: `bpo-27781: Change sys.getfilesystemencoding() on Windows > to UTF-8 (msg272916) `_ > -- Victor proposed ``-X utf8`` for the :pep:`529` (Change Windows > filesystem encoding to UTF-8) > > > Copyright > ========= > > This document has been placed in the public domain. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/songofacandy%40gmail.com From songofacandy at gmail.com Fri Dec 8 00:11:14 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Fri, 8 Dec 2017 14:11:14 +0900 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3) In-Reply-To: References: Message-ID: > Or locale.getpreferredencoding() returns UTF-8 in UTF-8 mode too? Or should we change loale.getpreferredencoding() to return UTF-8 instead of ASCII always, regardless of PEP 538 and 540? 
INADA Naoki

From greg.ewing at canterbury.ac.nz Fri Dec 8 00:20:49 2017
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 08 Dec 2017 18:20:49 +1300
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)
In-Reply-To: References: Message-ID: <5A2A2131.9050504@canterbury.ac.nz>

Victor Stinner wrote:
> Users don't use stdin and stdout as regular files, they are more used as
> pipes to pass data between programs with the Unix pipe in a shell like
> "producer | consumer". Sometimes stdout is redirected to a file, but I
> consider that it is expected to behave as a pipe and the regular TTY stdout.

It seems weird to me to make a distinction between stdin/stdout connected to a file and accessing the file some other way. It would be surprising, for example, if the following two commands behaved differently with respect to encoding:

   cat foo | sort
   cat < foo | sort

> But Naoki explained that open() is commonly misused to open binary
> files and Python should somehow fail badly to notify the developer of
> their mistake.

Maybe if you *explicitly* open the file in text mode it should default to surrogateescape, but use strict if text mode is being used by default? I.e.

   open("foo", "rt")  --> surrogateescape
   open("foo")        --> strict

That way you can easily open a file in a way that's compatible with the way stdin/stdout behave, but you will get bitten if you mistakenly open a binary file as text.

-- Greg

From victor.stinner at gmail.com Fri Dec 8 05:22:25 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 8 Dec 2017 11:22:25 +0100
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3)
In-Reply-To: References: Message-ID:

Hi,

Oh, locale.getpreferredencoding(), that's a good question :-)

2017-12-08 6:02 GMT+01:00 INADA Naoki :
> But I want to clarify more about difference/relationship between PEP
> 538 and 540.
>
> If I understand correctly:
>
> Both of PEP 538 (locale coercion) and PEP 540 (UTF-8 mode) shares
> same logic to detect POSIX locale.
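Greg Ewing's suggestion above — explicit text mode opts into pipe-like ``surrogateescape``, implicit text mode stays ``strict`` — can be sketched as a small rule. This is a sketch of his *proposal* only (hypothetical function name; not what PEP 540 ultimately adopted, since the PEP keeps ``strict`` for ``open()``):

```python
def open_error_handler(mode="r"):
    # Sketch of Greg Ewing's proposal: an explicit "t" in the mode string
    # ("rt", "wt", ...) selects surrogateescape, matching stdin/stdout;
    # the implicit default text mode ("r", "w", ...) stays strict.
    if "b" in mode:
        raise ValueError("binary mode does not use a text error handler")
    return "surrogateescape" if "t" in mode else "strict"
```

Under this rule, `open_error_handler("rt")` would give `"surrogateescape"` while the plain `open_error_handler("r")` stays `"strict"`, so a mistakenly text-opened binary file still fails early.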
> > When the POSIX locale is detected, locale coercion is tried first. And if
> > locale coercion succeeds, UTF-8 mode is not used because the locale is
> > not POSIX anymore.

No, I would like to enable the UTF-8 mode as well in this case.

In short, locale coercion and UTF-8 mode will both be enabled by the POSIX locale.

> If locale coercion is disabled or fails, UTF-8 mode is used automatically,
> unless it is disabled explicitly.

PEP 540 is always enabled if the POSIX locale is detected. Only PYTHONUTF8=0 or -X utf8=0 disables it in this case. Disabling locale coercion doesn't disable PEP 540.

> UTF-8 mode is similar to C.UTF-8 or other locale coercion target locales.
> But UTF-8 mode is different from the C.UTF-8 locale in these ways because
> the actual locale is not changed:
>
> * Libraries using the locale (e.g. readline) work as in the POSIX locale.
>   So UTF-8 cannot be used in such libraries.

My assumption is that very few C libraries rely on the locale encoding. The wchar_t* type is rarely used. You may only get issues if Python passes a UTF-8 encoded string to a C library which tries to decode it from a locale encoding which is not UTF-8.

For example, with the POSIX locale, if the locale encoding is ASCII, you can get a decoding error if a C library tries to decode a UTF-8 encoded string coming from Python.

But the encoding problem is not restricted to the current process. For the "producer | consumer" model, if the producer is a Python 3.7 application using UTF-8 mode and so encoding text to UTF-8 on stdout, an application may be unable to decode the UTF-8 data.

Here we enter the grey area of encodings. Which applications use the locale encoding? Which applications always use UTF-8? Do some applications try UTF-8 first, then fall back on the locale encoding? (OpenSSL does that for filenames, for example, as does glib if I recall correctly.)

Until we know exactly how UTF-8 is used in the "wild", I chose to make the UTF-8 mode an opt-in option for locales other than POSIX.
I expect a few bug reports later which will help us to adjust our encodings.

> * locale.getpreferredencoding() returns 'ASCII' instead of 'UTF-8'. So
> libraries depending on locale.getpreferredencoding() may raise
> UnicodeErrors.

Right.

> Or locale.getpreferredencoding() returns UTF-8 in UTF-8 mode too?

Here is where PEP 538 plays very nicely with PEP 540. On platforms where the locale coercion is supported (Fedora, macOS, FreeBSD, maybe other Linux distributions), in the POSIX locale, locale.getpreferredencoding() will return UTF-8 and functions like mbstowcs() will use the UTF-8 encoding internally.

Currently, in the implementation of my PEP 540, I chose to modify open() to use UTF-8 if the UTF-8 mode is used, rather than using locale.getpreferredencoding().

Victor

From victor.stinner at gmail.com Fri Dec 8 07:58:33 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 8 Dec 2017 13:58:33 +0100
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3)
In-Reply-To: References: Message-ID:

2017-12-08 6:11 GMT+01:00 INADA Naoki :
> Or should we change locale.getpreferredencoding() to return UTF-8
> instead of ASCII always, regardless of PEP 538 and 540?

On the POSIX locale, if the locale coercion works (PEP 538), locale.getpreferredencoding() returns UTF-8. We are good.

The question is for platforms like CentOS 7 where the locale coercion (PEP 538) doesn't work and so Python uses UTF-8 (PEP 540), whereas the locale probably uses ASCII (or maybe Latin-1).

My current implementation of the PEP 540 is cheating for open(): if sys.flags.utf8_mode is non-zero, use the UTF-8 encoding rather than calling locale.getpreferredencoding().
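The open() behaviour Victor describes can be sketched in a few lines. This is an illustrative sketch with a hypothetical function name — the real logic lives in CPython's C implementation of ``open()`` — but it uses only real, documented attributes (``sys.flags.utf8_mode`` exists since Python 3.7):

```python
import locale
import sys

def default_open_encoding():
    # Sketch of the behaviour described above: when the UTF-8 mode is
    # active, open() uses UTF-8 directly instead of consulting the locale.
    if getattr(sys.flags, "utf8_mode", 0):
        return "utf-8"
    return locale.getpreferredencoding(False)

print(default_open_encoding())
```

On a machine with a UTF-8 locale the two branches happen to agree; the difference only shows up under a misconfigured locale such as plain POSIX/C.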
I checked the stdlib, and I found many places where locale.getpreferredencoding() is used to get the user's preferred encoding:

* builtin open(): default encoding
* cgi.FieldStorage: encode the query string
* encoding._alias_mbcs(): check if the requested encoding is the ANSI code page
* gettext.GNUTranslations: lgettext() and lngettext() methods
* xml.etree.ElementTree: ElementTree.write(encoding='unicode')

In the UTF-8 mode, I would expect that cgi, gettext and xml.etree all use the UTF-8 encoding by default. So locale.getpreferredencoding() should return UTF-8 if the UTF-8 mode is enabled.

The private _alias_mbcs() method can be modified to call _locale._getdefaultlocale()[1] directly to get the ANSI code page.

Question: do we need to add an option to getpreferredencoding() to return the locale encoding even if the UTF-8 mode is enabled? If yes, what should be the API? locale.getpreferredencoding(utf8_mode=False)?

Victor

From songofacandy at gmail.com Fri Dec 8 09:01:01 2017
From: songofacandy at gmail.com (INADA Naoki)
Date: Fri, 8 Dec 2017 23:01:01 +0900
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3)
In-Reply-To: References: Message-ID:

On Fri, Dec 8, 2017 at 7:22 PM, Victor Stinner wrote:
>>
>> Both of PEP 538 (locale coercion) and PEP 540 (UTF-8 mode) share the
>> same logic to detect the POSIX locale.
>>
>> When the POSIX locale is detected, locale coercion is tried first. And if
>> locale coercion succeeds, UTF-8 mode is not used because the locale is
>> not POSIX anymore.
>
> No, I would like to enable the UTF-8 mode as well in this case.
>
> In short, locale coercion and UTF-8 mode will both be enabled by the
> POSIX locale.

Hm, it is a bit surprising because I thought UTF-8 mode was a fallback for locale coercion when coercion fails or is disabled.

As described in PEP 538 [1], all coercion target locales use surrogateescape for stdin and stdout. So, do you mean "UTF-8 mode enabled at the flag level, but it has no real effects"?
[1]: https://www.python.org/dev/peps/pep-0538/#changes-to-the-default-error-handling-on-the-standard-streams

Since coercion target locales and UTF-8 mode do the same thing, I think this is not a big issue. But I want it clarified in the PEP.

Regards,
---
INADA Naoki

From victor.stinner at gmail.com Fri Dec 8 10:18:29 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 8 Dec 2017 16:18:29 +0100
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3)
In-Reply-To: References: Message-ID:

2017-12-08 15:01 GMT+01:00 INADA Naoki :
>> In short, locale coercion and UTF-8 mode will both be enabled by the
>> POSIX locale.
>
> Hm, it is a bit surprising because I thought UTF-8 mode was a fallback
> for locale coercion when coercion fails or is disabled.

I rewrote the "differences between the PEP 538 and the PEP 540" as a new section "Relationship with the locale coercion (PEP 538)".

https://www.python.org/dev/peps/pep-0540/#relationship-with-the-locale-coercion-pep-538

"""
Relationship with the locale coercion (PEP 538)
===============================================

The POSIX locale enables the locale coercion (PEP 538) and the UTF-8 mode (PEP 540). When the locale coercion is enabled, enabling the UTF-8 mode has no (additional) effect.

Locale coercion only impacts non-Python code like C libraries, whereas the Python UTF-8 Mode only impacts Python code: the two PEPs are complementary.

On platforms where locale coercion is not supported, like CentOS 7, the POSIX locale only enables the UTF-8 Mode. In this case, Python code uses the UTF-8 encoding and ignores the locale encoding, whereas non-Python code uses the locale encoding, which is usually ASCII for the POSIX locale.

While the UTF-8 Mode is supported on all platforms and can be enabled with any locale, the locale coercion is not supported by all platforms and is restricted to the POSIX locale.
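The ``PYTHONUTF8`` environment variable discussed in this section can be observed from a child interpreter. A small demonstration, assuming Python 3.7 or later (where ``sys.flags.utf8_mode`` and the variable exist):

```python
import os
import subprocess
import sys

# Force the UTF-8 mode on in a child interpreter via the environment
# variable, then have the child report its own view of the configuration.
env = dict(os.environ, PYTHONUTF8="1")
out = subprocess.check_output(
    [sys.executable, "-c",
     "import sys, locale; "
     "print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"],
    env=env,
)
print(out.decode().strip())  # e.g. "1 UTF-8"
```

Note the environment-variable form affects every Python child process that inherits the environment, which is exactly the propagation difference from ``-X utf8`` that the quoted section describes.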
The UTF-8 Mode has only an impact on Python child processes when the ``PYTHONUTF8`` environment variable is set to ``1``, whereas the locale coercion sets the ``LC_CTYPE`` environment variables which impacts all child processes. The benefit of the locale coercion approach is that it helps ensure that encoding handling in binary extension modules and child processes is consistent with Python's encoding handling. The upside of the UTF-8 Mode approach is that it allows an embedding application to change the interpreter's behaviour without having to change the process global locale settings. """ I hope that it's now better explained. In short, the two PEPs are really complementary. > As PEP 538 [1], all coercion target locales uses surrogateescape > for stdin and stdout. > So, do you mean "UTF-8 mode enabled as flag level, but it has no > real effects"? Right and it was a deliberate choice of Nick Coghlan when he designed the PEP 538, to make sure that the two PEPs are complementary and "compatible". Victor From victor.stinner at gmail.com Fri Dec 8 10:22:35 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 8 Dec 2017 16:22:35 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3) In-Reply-To: References: Message-ID: I updated my PEP: in the 4th version, locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8 Mode. https://www.python.org/dev/peps/pep-0540/ I also clarified the direct effects of the UTF-8 Mode, but also listed the most user visible changes as "Side effects". """ Effects of the UTF-8 Mode: * ``sys.getfilesystemencoding()`` returns ``'UTF-8'``. * ``locale.getpreferredencoding()`` returns ``UTF-8``, its *do_setlocale* argument and the locale encoding are ignored. * ``sys.stdin`` and ``sys.stdout`` error handler is set to ``surrogateescape`` Side effects: * ``open()`` uses the UTF-8 encoding by default. * ``os.fsdecode()`` and ``os.fsencode()`` use the UTF-8 encoding. 
* Command line arguments, environment variables and filenames use the UTF-8 encoding. """

Thank you Naoki INADA for your quick feedback, it was very helpful and I really like how the PEP evolves! IMHO the PEP 540 version 4 is just perfect and ready for pronouncement! (... until someone finds another flaw, obviously!)

Victor

2017-12-08 13:58 GMT+01:00 Victor Stinner :
> 2017-12-08 6:11 GMT+01:00 INADA Naoki :
>> Or should we change locale.getpreferredencoding() to return UTF-8
>> instead of ASCII always, regardless of PEP 538 and 540?
>
> On the POSIX locale, if the locale coercion works (PEP 538),
> locale.getpreferredencoding() returns UTF-8. We are good.
>
> The question is for platforms like CentOS 7 where the locale coercion
> (PEP 538) doesn't work and so Python uses UTF-8 (PEP 540), whereas the
> locale probably uses ASCII (or maybe Latin-1).
>
> My current implementation of the PEP 540 is cheating for open(): if
> sys.flags.utf8_mode is non-zero, use the UTF-8 encoding rather than
> calling locale.getpreferredencoding().
>
> I checked the stdlib, and I found many places where
> locale.getpreferredencoding() is used to get the user's preferred
> encoding:
>
> * builtin open(): default encoding
> * cgi.FieldStorage: encode the query string
> * encoding._alias_mbcs(): check if the requested encoding is the ANSI code page
> * gettext.GNUTranslations: lgettext() and lngettext() methods
> * xml.etree.ElementTree: ElementTree.write(encoding='unicode')
>
> In the UTF-8 mode, I would expect that cgi, gettext and xml.etree all
> use the UTF-8 encoding by default. So locale.getpreferredencoding()
> should return UTF-8 if the UTF-8 mode is enabled.
>
> The private _alias_mbcs() method can be modified to call
> _locale._getdefaultlocale()[1] directly to get the ANSI code page.
>
> Question: do we need to add an option to getpreferredencoding() to
> return the locale encoding even if the UTF-8 mode is enabled? If yes,
> what should be the API?
locale.getpreferredencoding(utf8_mode=False)? > > Victor From victor.stinner at gmail.com Fri Dec 8 10:23:42 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 8 Dec 2017 16:23:42 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3) In-Reply-To: References: Message-ID: 2017-12-08 16:22 GMT+01:00 Victor Stinner : > I updated my PEP: in the 4th version, locale.getpreferredencoding() > now returns 'UTF-8' in the UTF-8 Mode. Sorry, I forgot to mention that I already updated the implementation to the latest version of the PEP: https://github.com/python/cpython/pull/855 Victor From ethan at stoneleaf.us Fri Dec 8 11:29:31 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 08 Dec 2017 08:29:31 -0800 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3) In-Reply-To: References: Message-ID: <5A2ABDEB.8020502@stoneleaf.us> There were some concerns about open() earlier: On Wed, 6 Dec 2017 at 06:10 INADA Naoki wrote: > I think PEP 538 and PEP 540 should behave almost identical except > changing locale or not. So I need very strong reason if PEP 540 > changes default error handler of open(). Brett replied: > I don't have enough locale experience to weigh in as an expert, > but I already was leaning towards INADA-san's logic of not wanting > to change open() and this makes me really not want to change it. On 12/08/2017 07:22 AM, Victor Stinner wrote: > """ > Effects of the UTF-8 Mode: [...] > Side effects: > > * ``open()`` uses the UTF-8 encoding by default. For those of us trying to follow along, is this change to open() one that Inada-san was worried about? Has something else changed? 
-- ~Ethan~ From victor.stinner at gmail.com Fri Dec 8 11:46:42 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 8 Dec 2017 17:46:42 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3) In-Reply-To: <5A2ABDEB.8020502@stoneleaf.us> References: <5A2ABDEB.8020502@stoneleaf.us> Message-ID: 2017-12-08 17:29 GMT+01:00 Ethan Furman : > For those of us trying to follow along, is this change to open() one that > Inada-san was worried about? Has something else changed? I agree that my PEP is evolving quickly, that's why I added a "Version History" at the end: https://www.python.org/dev/peps/pep-0540/#version-history """ Version History =============== * Version 4: ``locale.getpreferredencoding()`` now returns ``'UTF-8'`` in the UTF-8 Mode. * Version 3: The UTF-8 Mode does not change the ``open()`` default error handler (``strict``) anymore, and the Strict UTF-8 Mode has been removed. * Version 2: Rewrite the PEP from scratch to make it much shorter and easier to understand. * Version 1: First version posted to python-dev. """ Naoki disliked the usage of the surrogateescape error handler for open(). I "fixed" this in the PEP version 3: open() error handler is not modified by the PEP. Victor From status at bugs.python.org Fri Dec 8 12:09:54 2017 From: status at bugs.python.org (Python tracker) Date: Fri, 8 Dec 2017 18:09:54 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20171208170954.540D011A86A@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2017-12-01 - 2017-12-08) Python tracker at https://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. 
Issues counts and deltas: open 6315 (+34) closed 37691 (+26) total 44006 (+60) Open issues with patches: 2434 Issues opened (49) ================== #20891: PyGILState_Ensure on non-Python thread causes fatal error https://bugs.python.org/issue20891 reopened by vstinner #30213: ZipFile from 'a'ppend-mode file generates invalid zip https://bugs.python.org/issue30213 reopened by serhiy.storchaka #32107: Improve MAC address calculation and fix test_uuid.py https://bugs.python.org/issue32107 reopened by xdegaye #32196: Rewrite plistlib with functional style https://bugs.python.org/issue32196 opened by serhiy.storchaka #32198: \b reports false-positives in Indic strings involving combinin https://bugs.python.org/issue32198 opened by jamadagni #32202: [ctypes] all long double tests fail on android-24-x86_64 https://bugs.python.org/issue32202 opened by xdegaye #32203: [ctypes] test_struct_by_value fails on android-24-arm64 https://bugs.python.org/issue32203 opened by xdegaye #32206: Run modules with pdb https://bugs.python.org/issue32206 opened by mariocj89 #32208: Improve semaphore documentation https://bugs.python.org/issue32208 opened by Garrett Berg #32209: Crash in set_traverse Within the Garbage Collector's collect_g https://bugs.python.org/issue32209 opened by connorwfitzgerald #32210: Add platform.android_ver() to test.pythoninfo for Android pla https://bugs.python.org/issue32210 opened by xdegaye #32211: Document the bug in re.findall() and re.finditer() in 2.7 and https://bugs.python.org/issue32211 opened by serhiy.storchaka #32212: few discrepancy between source and docs in logging https://bugs.python.org/issue32212 opened by Michal Plichta #32215: sqlite3 400x-600x slower depending on formatting of an UPDATE https://bugs.python.org/issue32215 opened by bforst #32216: Document PEP 557 Data Classes https://bugs.python.org/issue32216 opened by eric.smith #32217: freeze.py fails to work. 
https://bugs.python.org/issue32217 opened by Decorater #32218: add __iter__ to enum.Flag members https://bugs.python.org/issue32218 opened by Guy Gangemi #32219: SSLWantWriteError being raised by blocking SSL socket https://bugs.python.org/issue32219 opened by njs #32220: multiprocessing: passing file descriptor using reduction break https://bugs.python.org/issue32220 opened by frickenate #32221: Converting ipv6 address to string representation using getname https://bugs.python.org/issue32221 opened by socketpair #32222: pygettext doesn't extract docstrings for functions with type a https://bugs.python.org/issue32222 opened by Tobotimus #32223: distutils doesn't correctly read UTF-8 content from config fil https://bugs.python.org/issue32223 opened by delivrance #32224: socket.create_connection needs to support full IPv6 argument https://bugs.python.org/issue32224 opened by Matthew Stoltenberg #32225: Implement PEP 562: module __getattr__ and __dir__ https://bugs.python.org/issue32225 opened by levkivskyi #32226: Implement PEP 560: Core support for typing module and generic https://bugs.python.org/issue32226 opened by levkivskyi #32227: singledispatch support for type annotations https://bugs.python.org/issue32227 opened by lukasz.langa #32228: truncate() changes current stream position https://bugs.python.org/issue32228 opened by andreymal #32229: Simplify hiding developer warnings in user facing applications https://bugs.python.org/issue32229 opened by ncoghlan #32230: -X dev doesn't set sys.warnoptions https://bugs.python.org/issue32230 opened by ncoghlan #32231: -bb option should override -W options https://bugs.python.org/issue32231 opened by ncoghlan #32232: building extensions as builtins is broken in 3.7 https://bugs.python.org/issue32232 opened by doko #32234: Add context management to mailbox.Mailbox https://bugs.python.org/issue32234 opened by sblondon #32235: test_xml_etree test_xml_etree_c failures with 2.7 and 3.6 bran 
https://bugs.python.org/issue32235 opened by doko #32236: open() shouldn't silently ignore buffering=1 in binary mode https://bugs.python.org/issue32236 opened by izbyshev #32237: test_xml_etree leaked [1, 1, 1] references, sum=3 https://bugs.python.org/issue32237 opened by vstinner #32238: Handle "POSIX" in the legacy locale detection https://bugs.python.org/issue32238 opened by ncoghlan #32240: Add the const qualifier for PyObject* array arguments https://bugs.python.org/issue32240 opened by serhiy.storchaka #32241: Add the const qualifier for char and wchar_t pointers to unmod https://bugs.python.org/issue32241 opened by serhiy.storchaka #32243: Tests that set aggressive switch interval hang in Cygwin on a https://bugs.python.org/issue32243 opened by erik.bray #32244: Multiprocessing: multiprocessing.connection.Listener.accept() https://bugs.python.org/issue32244 opened by Tom Cook #32245: OSError: raw write() returned invalid length on latest Win 10 https://bugs.python.org/issue32245 opened by Simon Depiets #32246: test_regrtest alters the execution environment on Android https://bugs.python.org/issue32246 opened by xdegaye #32248: Port importlib_resources (module and ABC) to Python 3.7 https://bugs.python.org/issue32248 opened by barry #32250: Add loop.current_task() and loop.all_tasks() methods https://bugs.python.org/issue32250 opened by asvetlov #32251: Add asyncio.BufferedProtocol https://bugs.python.org/issue32251 opened by yselivanov #32252: test_regrtest leaves a test_python_* directory in TEMPDIR https://bugs.python.org/issue32252 opened by xdegaye #32253: Deprecate old-style locking in asyncio/locks.py https://bugs.python.org/issue32253 opened by asvetlov #32254: documentation builds (even local ones) refer to https://docs.p https://bugs.python.org/issue32254 opened by doko #32255: csv.writer converts None to '""\n' when it is first line, othe https://bugs.python.org/issue32255 opened by licht-t Most recent 15 issues with no replies (15) 
==========================================

#32255: csv.writer converts None to '""\n' when it is first line, othe
https://bugs.python.org/issue32255

#32253: Deprecate old-style locking in asyncio/locks.py
https://bugs.python.org/issue32253

#32250: Add loop.current_task() and loop.all_tasks() methods
https://bugs.python.org/issue32250

#32248: Port importlib_resources (module and ABC) to Python 3.7
https://bugs.python.org/issue32248

#32245: OSError: raw write() returned invalid length on latest Win 10
https://bugs.python.org/issue32245

#32243: Tests that set aggressive switch interval hang in Cygwin on a
https://bugs.python.org/issue32243

#32241: Add the const qualifier for char and wchar_t pointers to unmod
https://bugs.python.org/issue32241

#32236: open() shouldn't silently ignore buffering=1 in binary mode
https://bugs.python.org/issue32236

#32228: truncate() changes current stream position
https://bugs.python.org/issue32228

#32226: Implement PEP 560: Core support for typing module and generic
https://bugs.python.org/issue32226

#32225: Implement PEP 562: module __getattr__ and __dir__
https://bugs.python.org/issue32225

#32221: Converting ipv6 address to string representation using getname
https://bugs.python.org/issue32221

#32218: add __iter__ to enum.Flag members
https://bugs.python.org/issue32218

#32216: Document PEP 557 Data Classes
https://bugs.python.org/issue32216

#32211: Document the bug in re.findall() and re.finditer() in 2.7 and
https://bugs.python.org/issue32211

Most recent 15 issues waiting for review (15)
=============================================

#32251: Add asyncio.BufferedProtocol
https://bugs.python.org/issue32251

#32241: Add the const qualifier for char and wchar_t pointers to unmod
https://bugs.python.org/issue32241

#32240: Add the const qualifier for PyObject* array arguments
https://bugs.python.org/issue32240

#32237: test_xml_etree leaked [1, 1, 1] references, sum=3
https://bugs.python.org/issue32237

#32232: building extensions as builtins is broken in 3.7
https://bugs.python.org/issue32232

#32230: -X dev doesn't set sys.warnoptions
https://bugs.python.org/issue32230

#32227: singledispatch support for type annotations
https://bugs.python.org/issue32227

#32226: Implement PEP 560: Core support for typing module and generic
https://bugs.python.org/issue32226

#32225: Implement PEP 562: module __getattr__ and __dir__
https://bugs.python.org/issue32225

#32222: pygettext doesn't extract docstrings for functions with type a
https://bugs.python.org/issue32222

#32221: Converting ipv6 address to string representation using getname
https://bugs.python.org/issue32221

#32217: freeze.py fails to work.
https://bugs.python.org/issue32217

#32211: Document the bug in re.findall() and re.finditer() in 2.7 and
https://bugs.python.org/issue32211

#32208: Improve semaphore documentation
https://bugs.python.org/issue32208

#32206: Run modules with pdb
https://bugs.python.org/issue32206

Top 10 most discussed issues (10)
=================================

#17611: Move unwinding of stack for "pseudo exceptions" from interpret
https://bugs.python.org/issue17611  20 msgs

#32230: -X dev doesn't set sys.warnoptions
https://bugs.python.org/issue32230  14 msgs

#32030: PEP 432: Rewrite Py_Main()
https://bugs.python.org/issue32030  13 msgs

#32107: Improve MAC address calculation and fix test_uuid.py
https://bugs.python.org/issue32107  10 msgs

#20891: PyGILState_Ensure on non-Python thread causes fatal error
https://bugs.python.org/issue20891  9 msgs

#25054: Capturing start of line '^'
https://bugs.python.org/issue25054  8 msgs

#31589: Links for French documentation PDF is broken: LaTeX issue with
https://bugs.python.org/issue31589  8 msgs

#15873: datetime: add ability to parse RFC 3339 dates and times
https://bugs.python.org/issue15873  7 msgs

#28791: update SQLite libraries for Windows and macOS installers
https://bugs.python.org/issue28791  7 msgs

#32208: Improve semaphore documentation
https://bugs.python.org/issue32208  6 msgs

Issues closed
(27)
==================

#21621: Add note to 3.x What's New re Idle changes in bugfix releases
https://bugs.python.org/issue21621  closed by terry.reedy

#22589: mimetypes uses image/x-ms-bmp as the type for bmp files
https://bugs.python.org/issue22589  closed by r.david.murray

#27240: 'UnstructuredTokenList' object has no attribute '_fold_as_ew'
https://bugs.python.org/issue27240  closed by r.david.murray

#30788: email.policy.SMTP.fold() issue for long filenames with spaces
https://bugs.python.org/issue30788  closed by r.david.murray

#31380: test_undecodable_filename() in Lib/test/test_httpservers.py br
https://bugs.python.org/issue31380  closed by ned.deily

#31430: [Windows][2.7] Python 2.7 compilation fails on mt.exe crashing
https://bugs.python.org/issue31430  closed by zach.ware

#31619: Strange error when convert hexadecimal with underscores to int
https://bugs.python.org/issue31619  closed by serhiy.storchaka

#31831: EmailMessage.add_attachment(filename="long or spécial") crash
https://bugs.python.org/issue31831  closed by r.david.murray

#32098: Hardcoded value in Lib/test/test_os.py:L1324:URandomTests.get_
https://bugs.python.org/issue32098  closed by vstinner

#32175: Add hash auto-randomization
https://bugs.python.org/issue32175  closed by rhettinger

#32176: Zero argument super is broken in 3.6 for methods with a hacked
https://bugs.python.org/issue32176  closed by ncoghlan

#32182: Infinite recursion in email.message.as_string()
https://bugs.python.org/issue32182  closed by r.david.murray

#32195: datetime.strftime with %Y no longer outputs leading zeros
https://bugs.python.org/issue32195  closed by ned.deily

#32197: Compiling against master branch fails; error: expected express
https://bugs.python.org/issue32197  closed by vstinner

#32199: uuid.getnode() should return the MAC address on Android
https://bugs.python.org/issue32199  closed by xdegaye

#32200: Full docs build of 3.6 and 3.7 failing since 2017-10-15
https://bugs.python.org/issue32200  closed by ned.deily

#32201: Python uuids may not be persistent across time
https://bugs.python.org/issue32201  closed by xdegaye

#32204: async/await performance is very low
https://bugs.python.org/issue32204  closed by yselivanov

#32205: test.pythoninfo does not print the cross-built sysconfig data
https://bugs.python.org/issue32205  closed by xdegaye

#32207: IDLE: run's tk update adds context traceback on callback error
https://bugs.python.org/issue32207  closed by terry.reedy

#32213: assertRaises and subTest context managers cannot be nested
https://bugs.python.org/issue32213  closed by p-ganssle

#32214: Implement PEP 557: Data Classes
https://bugs.python.org/issue32214  closed by eric.smith

#32233: [3.7 Regression] build with --with-system-libmpdec is broken
https://bugs.python.org/issue32233  closed by skrah

#32239: decimal module exception args incorrect for c module
https://bugs.python.org/issue32239  closed by skrah

#32242: loop in loop with with 'zip'ped object misbehaves in py3.6
https://bugs.python.org/issue32242  closed by serhiy.storchaka

#32247: shutil-copytree: Create dst folder only if it doesn't exist
https://bugs.python.org/issue32247  closed by rst0py

#32249: Document handler.cancelled()
https://bugs.python.org/issue32249  closed by asvetlov

From guido at python.org Fri Dec 8 13:16:25 2017
From: guido at python.org (Guido van Rossum)
Date: Fri, 8 Dec 2017 10:16:25 -0800
Subject: [Python-Dev] Issues with PEP 526 Variable Notation at the class level
In-Reply-To: <90322c40-0566-60ee-c389-bcce61d8114b@trueblade.com>
References: <90322c40-0566-60ee-c389-bcce61d8114b@trueblade.com>
Message-ID: 

Yes, I think this is a reasonable argument for adding a 'slots' option (off
by default) for @dataclass(). However I don't think we need to rush it in.
I'm not very happy with the general idea of slots any more; I think it's
probably being overused. At the same time I expect that there are a lot of
classes with a slots declaration that still have a dict as well, because
they inherit from a class without slots.

I'm not sure what to do about docstrings -- I'm not a big user of pydoc and
I find help() often too verbose (I usually read the source). Maybe we could
add a 'doc' option to field()? That's similar to what we offer for
property().

On Thu, Dec 7, 2017 at 12:47 PM, Eric V. Smith wrote:

> On 12/7/17 3:27 PM, Raymond Hettinger wrote:
> ...
>
>> I'm looking for guidance or workarounds for two issues that have arisen.
>>
>> First, the use of default values seems to completely preclude the use of
>> __slots__. For example, this raises a ValueError:
>>
>> class A:
>>     __slots__ = ['x', 'y']
>>     x: int = 10
>>     y: int = 20
>
> Hmm, I wasn't aware of that. I'm not sure I understand why that's an
> error. Maybe it could be fixed?
>
> Otherwise, I have a decorator that takes a dataclass and returns a new
> class with slots set:
>
> >>> from dataclasses import dataclass
> >>> from dataclass_tools import add_slots
> >>> @add_slots
> ... @dataclass
> ... class C:
> ...     x: int = 0
> ...     y: int = 0
> ...
> >>> c = C()
> >>> c
> C(x=0, y=0)
> >>> c.z = 3
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'C' object has no attribute 'z'
>
> This doesn't help the general case (your class A), but it does at least
> solve it for dataclasses. Whether it should actually be included, and what
> the interface would look like, can be (and I'm sure will be!) argued.
>
> The reason I didn't include it (as @dataclass(slots=True)) is because it
> has to return a new class, and the rest of the dataclass features just
> modifies the given class in place. I wanted to maintain that conceptual
> simplicity. But this might be a reason to abandon that.
> For what it's worth, attrs does have an @attr.s(slots=True) that returns
> a new class with __slots__ set.
>
>> The second issue is that the different annotations give different
>> signatures than would be produced for manually written classes. It is
>> unclear what the best practice is for where to put the annotations and
>> their associated docstrings.
>
> I don't have any suggestions here.
>
> Eric.
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/guido%40python.org

--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From raymond.hettinger at gmail.com Fri Dec 8 13:28:53 2017
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Fri, 8 Dec 2017 10:28:53 -0800
Subject: [Python-Dev] Issues with PEP 526 Variable Notation at the class level
In-Reply-To: <90322c40-0566-60ee-c389-bcce61d8114b@trueblade.com>
References: <90322c40-0566-60ee-c389-bcce61d8114b@trueblade.com>
Message-ID: 

> On Dec 7, 2017, at 12:47 PM, Eric V. Smith wrote:
>
> On 12/7/17 3:27 PM, Raymond Hettinger wrote:
> ...
>
>> I'm looking for guidance or workarounds for two issues that have arisen.
>>
>> First, the use of default values seems to completely preclude the use of
>> __slots__. For example, this raises a ValueError:
>>
>> class A:
>>     __slots__ = ['x', 'y']
>>     x: int = 10
>>     y: int = 20
>
> Hmm, I wasn't aware of that. I'm not sure I understand why that's an
> error. Maybe it could be fixed?

The way __slots__ works is that the type() metaclass automatically assigns
member-objects to the class variables 'x' and 'y'. Member objects are
descriptors that do the actual lookup.

So, I don't think the language limitation can be "fixed". Essentially,
we're wanting to use the class variables 'x' and 'y' to hold both member
objects and a default value.

> This doesn't help the general case (your class A), but it does at least
> solve it for dataclasses. Whether it should actually be included, and what
> the interface would look like, can be (and I'm sure will be!) argued.
>
> The reason I didn't include it (as @dataclass(slots=True)) is because it
> has to return a new class, and the rest of the dataclass features just
> modifies the given class in place. I wanted to maintain that conceptual
> simplicity. But this might be a reason to abandon that. For what it's
> worth, attrs does have an @attr.s(slots=True) that returns a new class
> with __slots__ set.

I recommend that you follow the path taken by attrs and return a new class.
Otherwise, we're leaving users with a devil's choice. You can have default
values or you can have slots, but you can't have both.

The slots are pretty important. With slots, a three attribute instance is
only 64 bytes. Without slots, it is 296 bytes.

>> The second issue is that the different annotations give different
>> signatures than would be produced for manually written classes. It is
>> unclear what the best practice is for where to put the annotations and
>> their associated docstrings.
>
> I don't have any suggestions here.

I'm hoping the typing experts will chime in here. The question is
straight-forward. Where should we look for the signature and docstring for
constructing instances? Should they be attached to the class, to
__init__(), or to __new__() when it is used.

It would be nice to have an official position on that before it gets set in
stone through arbitrary choices made by pycharm, pydoc, mypy,
typing.NamedTuple, and dataclasses.dataclass.
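Both halves of Raymond's point can be checked directly. The following is a minimal sketch (not from the thread): it triggers the ValueError he quotes, and then does a rough size comparison; the exact byte counts vary by CPython version and platform, so only the direction of the gap is asserted.

```python
import sys

# A name listed in __slots__ may not also be assigned as a class variable,
# because type() needs that name for the slot descriptor it creates.
try:
    class A:
        __slots__ = ['x', 'y']
        x: int = 10
        y: int = 20
except ValueError as e:
    print(e)  # 'x' in __slots__ conflicts with class variable

# Rough accounting of the memory gap for a three-attribute instance:
class WithSlots:
    __slots__ = ('x', 'y', 'z')
    def __init__(self):
        self.x, self.y, self.z = 1, 2, 3

class WithDict:
    def __init__(self):
        self.x, self.y, self.z = 1, 2, 3

slotted = sys.getsizeof(WithSlots())
dicted = sys.getsizeof(WithDict()) + sys.getsizeof(WithDict().__dict__)
print(slotted, dicted)
assert slotted < dicted
```

Note that the annotation alone (`x: int`) is fine next to `__slots__`; it is the assignment of a default that puts `x` into the class namespace and triggers the conflict.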
Raymond

From larry at hastings.org Fri Dec 8 17:10:45 2017
From: larry at hastings.org (Larry Hastings)
Date: Fri, 8 Dec 2017 14:10:45 -0800
Subject: [Python-Dev] Proposed schedule for next 3.4 and 3.5 releases - end of January / early February
Message-ID: 

Howdy howdy. I know nobody's excited by the prospect of 3.4 and 3.5
releases--I mean, fer gosh sakes, neither of those versions even has
f-strings! But we're about due. I prefer to release roughly every six
months, and the current releases came out in early August.

Here's my proposed schedule:

Sun Jan 21 2018 - release 3.4.8rc1 and 3.5.5rc1
Sun Feb 04 2018 - release 3.4.8 final and 3.5.5 final

Unless I'm presented with good reasons to change it, that'll be the
schedule. I'll update the PEPs with the final release dates in about a
week.

Just for fun, I'll remind everybody here that 3.4 and 3.5 are both in
security-fixes-only mode. This means two things:

1. These will be source-code-only releases; the Python core dev team won't
release any more binary installers for 3.4 or 3.5.

2. I'm the only person permitted to accept PRs for 3.4 and 3.5. If you
have security fixes for either of those versions, please add me as a
reviewer.

Happy holidays to you and yours,

//arry/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From eric at trueblade.com Fri Dec 8 18:44:41 2017
From: eric at trueblade.com (Eric V. Smith)
Date: Fri, 8 Dec 2017 18:44:41 -0500
Subject: [Python-Dev] Issues with PEP 526 Variable Notation at the class level
In-Reply-To: 
References: <90322c40-0566-60ee-c389-bcce61d8114b@trueblade.com>
Message-ID: 

On 12/8/2017 1:28 PM, Raymond Hettinger wrote:
>
>> On Dec 7, 2017, at 12:47 PM, Eric V. Smith wrote:
>>
>> On 12/7/17 3:27 PM, Raymond Hettinger wrote:
>> ...
>>
>>> I'm looking for guidance or workarounds for two issues that have arisen.
>>>
>>> First, the use of default values seems to completely preclude the use of
>>> __slots__.
>>> For example, this raises a ValueError:
>>>
>>> class A:
>>>     __slots__ = ['x', 'y']
>>>     x: int = 10
>>>     y: int = 20
>>
>> Hmm, I wasn't aware of that. I'm not sure I understand why that's an
>> error. Maybe it could be fixed?
>
> The way __slots__ works is that the type() metaclass automatically
> assigns member-objects to the class variables 'x' and 'y'. Member objects
> are descriptors that do the actual lookup.
>
> So, I don't think the language limitation can be "fixed". Essentially,
> we're wanting to use the class variables 'x' and 'y' to hold both member
> objects and a default value.

Thanks. I figured this out after doing some research. Here's a thread
"__slots__ and default values" from 14+ years ago from some guy named
Hettinger:
https://mail.python.org/pipermail/python-dev/2003-May/035575.html

As to whether we add slots=True to @dataclasses, I'll let Guido decide.

The code already exists as a separate decorator here:
https://github.com/ericvsmith/dataclasses/blob/master/dataclass_tools.py#L3,
if you want to play with it.

Usage:

>>> @add_slots
... @dataclass
... class A:
...     x: int = 10
...     y: int = 20
...
>>> a = A()
>>> a
A(x=10, y=20)
>>> a.x = 15
>>> a
A(x=15, y=20)
>>> a.z = 30
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute 'z'

Folding it in to @dataclass is easy enough. On the other hand, since it
just uses the dataclasses public API, it's not strictly required to be in
@dataclass.

>>> The second issue is that the different annotations give different
>>> signatures than would be produced for manually written classes. It is
>>> unclear what the best practice is for where to put the annotations and
>>> their associated docstrings.
>>
>> I don't have any suggestions here.
>
> I'm hoping the typing experts will chime in here. The question is
> straight-forward. Where should we look for the signature and docstring
> for constructing instances? Should they be attached to the class, to
> __init__(), or to __new__() when it is used.
> It would be nice to have an official position on that before it gets set
> in stone through arbitrary choices made by pycharm, pydoc, mypy,
> typing.NamedTuple, and dataclasses.dataclass.

I'm not sure I see why this would relate specifically to typing, since I
don't think they'd inspect docstrings. But yes, it would be good to come to
an agreement.

Eric.

From chris.barker at noaa.gov Fri Dec 8 18:50:29 2017
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Fri, 8 Dec 2017 15:50:29 -0800
Subject: [Python-Dev] iso8601 parsing
In-Reply-To: <8bf55eb4-e986-cbbd-65d7-472f9b149ce8@mgmiller.net>
References: <01e69881-3710-87c8-f47a-dfc427ec65b5@mgmiller.net>
 <11362716.0I2SPu8sME@hammer.magicstack.net>
 <0fba01d34dca$47ad7940$d7086bc0$@sdamon.com>
 <20171025213056.GP9068@ando.pearwood.info>
 <8766d45c-3d52-c281-bb1a-576ed04f6351@ganssle.io>
 <6231266026602906357@unknownmsgid>
 <8bf55eb4-e986-cbbd-65d7-472f9b149ce8@mgmiller.net>
Message-ID: <-1320753456472990863@unknownmsgid>

On Dec 7, 2017, at 7:52 PM, Mike Miller wrote:

Guess the argument for limiting what it accepts would be that every funky
variation will need to be supported until the endtimes, even those of
little use or utility.

I suppose so, but not that hard once implemented and tests in place.

How about this for a "practicality beats purity" approach:

.fromiso() will parse the most commonly used iso8601 compliant date time
strings.

It is guaranteed to properly parse the output of .isoformat().

It is Not a validator -- it may accept non-iso compliant strings, and may
give surprising results when passed such.

In any case, I sure hope it will accept iso strings both with and without
the "T".

But again: Paul, do whatever you think is best.

-CHB

On the other hand, it might be good to keep the two implementations the
same for consistency reasons.
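The contract Chris sketches here is, in essence, what shipped in Python 3.7 as `datetime.fromisoformat()`: it is guaranteed to round-trip `isoformat()` output, but it is not a general ISO 8601 validator. A quick sketch of that behaviour (not from the thread; the `.fromiso()` name under discussion did not survive):

```python
from datetime import datetime

# fromisoformat() round-trips isoformat() output exactly.
dt = datetime(2017, 12, 8, 15, 50, 29)
assert datetime.fromisoformat(dt.isoformat()) == dt

# The date/time separator may be "T" or a space:
assert datetime.fromisoformat("2017-12-08T15:50:29") == dt
assert datetime.fromisoformat("2017-12-08 15:50:29") == dt
print(dt.isoformat())  # 2017-12-08T15:50:29
```

As hoped for above, both the "T" and the space separator are accepted.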
Thanks either way, -Mike On 2017-12-07 17:57, Chris Barker - NOAA Federal wrote: _______________________________________________ Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Dec 8 18:56:17 2017 From: guido at python.org (Guido van Rossum) Date: Fri, 8 Dec 2017 15:56:17 -0800 Subject: [Python-Dev] Issues with PEP 526 Variable Notation at the class level In-Reply-To: References: <90322c40-0566-60ee-c389-bcce61d8114b@trueblade.com> Message-ID: On Fri, Dec 8, 2017 at 3:44 PM, Eric V. Smith wrote: > On 12/8/2017 1:28 PM, Raymond Hettinger wrote: > >> >> >> On Dec 7, 2017, at 12:47 PM, Eric V. Smith wrote: >>> >>> On 12/7/17 3:27 PM, Raymond Hettinger wrote: >>> ... >>> >>> I'm looking for guidance or workarounds for two issues that have arisen. >>>> >>>> First, the use of default values seems to completely preclude the use >>>> of __slots__. For example, this raises a ValueError: >>>> >>>> class A: >>>> __slots__ = ['x', 'y'] >>>> x: int = 10 >>>> y: int = 20 >>>> >>> >>> Hmm, I wasn't aware of that. I'm not sure I understand why that's an >>> error. Maybe it could be fixed? >>> >> >> The way __slots__ works is that the type() metaclass automatically >> assigns member-objects to the class variables 'x' and 'y'. Member objects >> are descriptors that do the actual lookup. >> >> So, I don't think the language limitation can be "fixed". Essentially, >> we're wanting to use the class variables 'x' and 'y' to hold both member >> objects and a default value. >> > > Thanks. I figured this out after doing some research. 
> Here's a thread "__slots__ and default values" from 14+ years ago from
> some guy named Hettinger:
> https://mail.python.org/pipermail/python-dev/2003-May/035575.html
>
> As to whether we add slots=True to @dataclasses, I'll let Guido decide.
>
> The code already exists as a separate decorator here:
> https://github.com/ericvsmith/dataclasses/blob/master/dataclass_tools.py#L3,
> if you want to play with it.
>
> Usage:
>
> >>> @add_slots
> ... @dataclass
> ... class A:
> ...     x: int = 10
> ...     y: int = 20
> ...
> >>> a = A()
> >>> a
> A(x=10, y=20)
> >>> a.x = 15
> >>> a
> A(x=15, y=20)
> >>> a.z = 30
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'A' object has no attribute 'z'
>
> Folding it in to @dataclass is easy enough. On the other hand, since it
> just uses the dataclasses public API, it's not strictly required to be in
> @dataclass.

Let's do it. For most people the new class is an uninteresting
implementation detail; for the rest we can document clearly that it is
special.

>>>> The second issue is that the different annotations give different
>>>> signatures than would be produced for manually written classes. It is
>>>> unclear what the best practice is for where to put the annotations and
>>>> their associated docstrings.
>>>
>>> I don't have any suggestions here.
>>
>> I'm hoping the typing experts will chime in here. The question is
>> straight-forward. Where should we look for the signature and docstring
>> for constructing instances? Should they be attached to the class, to
>> __init__(), or to __new__() when it is used.
>>
>> It would be nice to have an official position on that before it gets set
>> in stone through arbitrary choices made by pycharm, pydoc, mypy,
>> typing.NamedTuple, and dataclasses.dataclass.

I'm not sure I see why this would relate specifically to typing, since I
don't think they'd inspect docstrings. But yes, it would be good to come
to an agreement.
> I don't recall in detail what all these tools and classes do with docstrings. Maybe if someone summarizes the status quo and explains how PEP 557 changes that it will be simple to decide. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Dec 8 21:14:58 2017 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 8 Dec 2017 18:14:58 -0800 Subject: [Python-Dev] Issues with PEP 526 Variable Notation at the class level In-Reply-To: <90322c40-0566-60ee-c389-bcce61d8114b@trueblade.com> References: <90322c40-0566-60ee-c389-bcce61d8114b@trueblade.com> Message-ID: On Dec 7, 2017 12:49, "Eric V. Smith" wrote: The reason I didn't include it (as @dataclass(slots=True)) is because it has to return a new class, and the rest of the dataclass features just modifies the given class in place. I wanted to maintain that conceptual simplicity. But this might be a reason to abandon that. For what it's worth, attrs does have an @attr.s(slots=True) that returns a new class with __slots__ set. They actually switched to always returning a new class, regardless of whether slots is set: https://github.com/python-attrs/attrs/pull/260 You'd have to ask Hynek to get the full rationale, but I believe it was both for consistency with slot classes, and for consistency with regular class definition. For example, type.__new__ actually does different things depending on whether it sees an __eq__ method, so adding a method after the fact led to weird bugs with hashing. That class of bug goes away if you always set up the autogenerated methods and then call type.__new__. -n -------------- next part -------------- An HTML attachment was scrubbed... 
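The `type.__new__` inference Nathaniel refers to is easy to see directly. A minimal sketch (not from the thread) of both sides of it: defining `__eq__` in the class body makes the metaclass set `__hash__ = None`, while injecting `__eq__` after class creation bypasses that inference and leaves a class whose instances compare equal but hash like distinct objects.

```python
# __eq__ in the class body: type.__new__ sets __hash__ to None, so
# instances are unhashable unless __hash__ is also defined.
class InBody:
    def __eq__(self, other):
        return isinstance(other, InBody)

assert InBody.__hash__ is None

# __eq__ injected after class creation: the inference never runs, so the
# class keeps object.__hash__ -- the "weird bugs with hashing" case.
class Patched:
    pass

Patched.__eq__ = lambda self, other: isinstance(other, Patched)
assert Patched.__hash__ is object.__hash__
```

This is why a decorator that injects `__eq__` into an existing class must also reimplement the `__hash__ = None` rule itself, whereas building a fresh class via `type.__new__` gets it for free.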
URL: 

From ncoghlan at gmail.com Sat Dec 9 02:54:43 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 9 Dec 2017 17:54:43 +1000
Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3)
In-Reply-To: 
References: 
Message-ID: 

On 9 December 2017 at 01:22, Victor Stinner wrote:
> I updated my PEP: in the 4th version, locale.getpreferredencoding()
> now returns 'UTF-8' in the UTF-8 Mode.

+1, that's a good change, since it brings the "locale coercion failed" case
even closer to the "locale coercion succeeded" behaviour.

To continue with the CentOS 7 example: that actually does use a UTF-8 based
locale by default, it's just en_US.UTF-8 rather than C.UTF-8. Earlier
versions of PEP 538 thus included "en_US.UTF-8" on the candidate target
locale list, but that turned out to cause assorted problems due to the
"C -> en_US" part of the coercion.

Cheers,
Nick.

P.S. Thinking back on the history of the changes though, it may be worth
revisiting the idea of "en_US.UTF-8" as a potential coercion locale: it was
dropped as a potential coercion target back when the PEP still set both
LANG & LC_ALL, whereas it now changes only LC_CTYPE. That means setting it
won't mess with LC_COLLATE, or any of the other locale categories. That
said, I'm not sure if there are behavioural differences between
"LC_CTYPE=C.UTF-8" and "LC_CTYPE=en_US.UTF-8", so I'm inclined to leave
that alone for now.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From eric at trueblade.com Sat Dec 9 08:52:15 2017
From: eric at trueblade.com (Eric V. Smith)
Date: Sat, 9 Dec 2017 08:52:15 -0500
Subject: [Python-Dev] Issues with PEP 526 Variable Notation at the class level
In-Reply-To: 
References: <90322c40-0566-60ee-c389-bcce61d8114b@trueblade.com>
Message-ID: 

On 12/8/2017 9:14 PM, Nathaniel Smith wrote:
> On Dec 7, 2017 12:49, "Eric V.
Smith" > wrote: > > The reason I didn't include it (as @dataclass(slots=True)) is > because it has to return a new class, and the rest of the dataclass > features just modifies the given class in place. I wanted to > maintain that conceptual simplicity. But this might be a reason to > abandon that. For what it's worth, attrs does have an > @attr.s(slots=True) that returns a new class with __slots__ set. > > > They actually switched to always returning a new class, regardless of > whether slots is set: > > https://github.com/python-attrs/attrs/pull/260 In the end, it looks like that PR ended up just refactoring things, and the decision to always return a new class was deferred. I still haven't finished evaluating exactly what the refactoring does, though. Eric. > You'd have to ask Hynek to get the full rationale, but I believe it was > both for consistency with slot classes, and for consistency with regular > class definition. For example, type.__new__ actually does different > things depending on whether it sees an __eq__ method, so adding a method > after the fact led to weird bugs with hashing. That class of bug goes > away if you always set up the autogenerated methods and then call > type.__new__. They have a bunch of test cases that I'll have to review, too. Eric. From levkivskyi at gmail.com Sat Dec 9 09:55:14 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Sat, 9 Dec 2017 15:55:14 +0100 Subject: [Python-Dev] Issues with PEP 526 Variable Notation at the class level In-Reply-To: References: <90322c40-0566-60ee-c389-bcce61d8114b@trueblade.com> Message-ID: On 8 December 2017 at 19:28, Raymond Hettinger wrote: > > I'm hoping the typing experts will chime in here. The question is > straight-forward. Where should we look for the signature and docstring for > constructing instances? Should they be attached to the class, to > __init__(), or to __new__() when it used. 
> > It would be nice to have an official position on that before, it gets set > in stone through arbitrary choices made by pycharm, pydoc, mypy, > typing.NamedTuple, and dataclasses.dataclass. > > Here are some thoughts about this: 1. Instance variables are given very little attention in pydoc. Consider this example: >>> class C: ... x: int = 1 ... def meth(self, y: int) -> None: ... ... >>> help(C) Help on class C in module __main__: class C(builtins.object) | Methods defined here: | | meth(self, y: int) -> None | | ---------------------------------------------------------------------- | Data descriptors defined here: | | __dict__ | dictionary for instance variables (if defined) | | __weakref__ | list of weak references to the object (if defined) | | ---------------------------------------------------------------------- | Data and other attributes defined here: | | __annotations__ = {'x': } | | x = 1 The methods defined are listed first and are nicely formatted, while variables together with __annotations__ are left at the very end. I think that a line like x: int = 1 should appear for every instance variable should appear first, even before methods, since this is how people write (and read) classes. See also https://bugs.python.org/issue28519 for another problem with pydoc. 2. pydoc already extracts the signature of class from __init__ and __new__ (giving the preference to later if both are present) including the type annotations. I think this can be kept as is, but the special constructs like NamedTuple and dataclass that auto-generate methods should add annotations to them. For example, there is an issue to add annotations to __new__ by NamedTuple, see https://bugs.python.org/issue31006 and https://github.com/python/typing/issues/454 -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From aaronchall at yahoo.com Sat Dec 9 20:16:41 2017
From: aaronchall at yahoo.com (Aaron Hall)
Date: Sun, 10 Dec 2017 01:16:41 +0000 (UTC)
Subject: [Python-Dev] Issues with PEP 526 Variable Notation at the class level
In-Reply-To: 
References: <90322c40-0566-60ee-c389-bcce61d8114b@trueblade.com>
Message-ID: <1673884616.2001600.1512868601475@mail.yahoo.com>

I'm not a typing expert, but I want to second Raymond's concerns, and
perhaps I'm qualified to do so as I gave the PyCon USA __slots__ talk this
year and I have a highly voted answer describing them on Stack Overflow.

Beautiful thing we're doing here with the dataclasses, by the way. I think
addressing the slots issue could be a killer feature of dataclasses.

I hope this doesn't muddy the water: If I could change a couple of things
about __slots__ it would be:

1. to allow multiple inheritance with multiple parents with nonempty slots
(raises "TypeError: multiple bases have instance lay-out conflict"), and

2. to avoid creating redundant slots if extant in a parent (but maybe we
should do this in the C level for all classes?).

It seems to me that Dataclasses could (and should) help us avoid the second
issue regardless (should be trivial to look in the bases for preexisting
slots, right?).

My workaround for the first issue is to inherit from ABCs with empty slots,
but you need cooperative multiple inheritance for this - and you need to
track the expected attributes (easy if you use abstract properties, which
slots provide for). Maybe not all users of Dataclasses are advanced enough
to do this?

So, maybe this is crazy (please don't call the nice men in white coats on
me) - it came to me as I was responding, and it's definitely outside the
box here - but perhaps we could make the decorated dataclass be the
abstract parent of the instantiated class?
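The "instance lay-out conflict" named above is straightforward to reproduce, as is the empty-slots workaround. A minimal sketch (not from the thread):

```python
class A:
    __slots__ = ('x',)

class B:
    __slots__ = ('y',)

# Two bases that each carry non-empty __slots__ cannot be combined:
try:
    class C(A, B):
        __slots__ = ()
except TypeError as e:
    print(e)  # multiple bases have instance lay-out conflict

# The workaround: keep real slots in at most one base; every other base
# (mixin/ABC) declares empty slots so the layouts stay compatible.
class Mixin:
    __slots__ = ()

class D(A, Mixin):
    __slots__ = ('z',)

d = D()
d.x, d.z = 1, 2
print(d.x, d.z)
```

The conflict arises because each non-empty `__slots__` fixes the C-level instance layout, and two fixed layouts cannot be merged; empty slots add no layout of their own, which is what makes the ABC trick work.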
Thanks, Aaron Hall On Friday, December 8, 2017, 1:31:44 PM EST, Raymond Hettinger wrote: The way __slots__ works is that the type() metaclass automatically assigns member-objects to the class variables 'x' and 'y'. Member objects are descriptors that do the actual lookup. So, I don't think the language limitation can be "fixed". Essentially, we're wanting to use the class variables 'x' and 'y' to hold both member objects and a default value. I recommend that you follow the path taken by attrs and return a new class. Otherwise, we're leaving users with a devil's choice. You can have default values or you can have slots, but you can't have both. The slots are pretty important. With slots, a three-attribute instance is only 64 bytes. Without slots, it is 296 bytes. I'm hoping the typing experts will chime in here. The question is straightforward. Where should we look for the signature and docstring for constructing instances? Should they be attached to the class, to __init__(), or to __new__() when it is used? It would be nice to have an official position on that before it gets set in stone through arbitrary choices made by pycharm, pydoc, mypy, typing.NamedTuple, and dataclasses.dataclass. Raymond _______________________________________________ Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/aaronchall%40yahoo.com -------------- next part -------------- An HTML attachment was scrubbed... 
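Both halves of Raymond's point can be checked directly: type() really does turn slot names into member descriptors (so a slot name cannot also hold a class-level default), and the size difference is easy to measure with sys.getsizeof (the exact byte counts vary by platform and Python version):

```python
import sys

class Slotted:
    __slots__ = ('x', 'y', 'z')

# type() replaced each slot name with a member descriptor:
print(type(Slotted.__dict__['x']).__name__)   # member_descriptor

# The same class variable cannot hold both the member object and a default:
try:
    class Bad:
        __slots__ = ('x',)
        x = 1
except ValueError as e:
    print(e)    # 'x' in __slots__ conflicts with class variable

class Plain:
    pass

s, p = Slotted(), Plain()
s.x = s.y = s.z = 1
p.x = p.y = p.z = 1

slotted = sys.getsizeof(s)
plain = sys.getsizeof(p) + sys.getsizeof(p.__dict__)
assert slotted < plain    # slots win by a wide margin
```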
URL: From ncoghlan at gmail.com Sat Dec 9 22:25:58 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 10 Dec 2017 13:25:58 +1000 Subject: [Python-Dev] Issues with PEP 526 Variable Notation at the class level In-Reply-To: References: <90322c40-0566-60ee-c389-bcce61d8114b@trueblade.com> Message-ID: On 9 December 2017 at 12:14, Nathaniel Smith wrote: > You'd have to ask Hynek to get the full rationale, but I believe it was both > for consistency with slot classes, and for consistency with regular class > definition. For example, type.__new__ actually does different things > depending on whether it sees an __eq__ method, so adding a method after the > fact led to weird bugs with hashing. That class of bug goes away if you > always set up the autogenerated methods and then call type.__new__. The main case I'm aware of where we do method inference in type.__new__ is setting "__hash__ = None" if "__eq__" is set. The main *problem* that arises with type replacement is that it currently interacts pretty badly with zero-argument super, since we don't make it easy to find and remap all the "__class__" references to the new class object. So right now, I think this trade-off tilts heavily in favour of "Keep the same class, but reimplement any required method inference logic when injecting methods". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From songofacandy at gmail.com Sat Dec 9 23:47:48 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Sun, 10 Dec 2017 13:47:48 +0900 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3) In-Reply-To: References: Message-ID: Now I'm OK to accept the PEP, except one nitpick. > > Locale coercion only impacts non-Python code like C libraries, whereas > the Python UTF-8 Mode only impacts Python code: the two PEPs are > complementary. > This sentence seems bit misleading. If UTF-8 mode is disabled explicitly, locale coercion affects Python code too. 
locale.getpreferredencoding() is UTF-8, open()'s default encoding is UTF-8, and stdio is UTF-8/surrogateescape. So shouldn't this sentence be: "Locale coercion impacts both of Python code and non-Python code like C libraries, whereas ..."? INADA Naoki From songofacandy at gmail.com Sat Dec 9 23:50:36 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Sun, 10 Dec 2017 13:50:36 +0900 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3) In-Reply-To: References: Message-ID: > Earlier versions of PEP 538 thus included "en_US.UTF-8" on the > candidate target locale list, but that turned out to cause assorted > problems due to the "C -> en_US" part of the coercion. Hm, but PEP 538 says: > this PEP instead proposes to extend the "surrogateescape" default for stdin and stderr error handling to also apply to the three potential coercion target locales. https://www.python.org/dev/peps/pep-0538/#defaulting-to-surrogateescape-error-handling-on-the-standard-io-streams I don't think en_US.UTF-8 should use surrogateescape error handler. Regards, INADA Naoki From victor.stinner at gmail.com Sun Dec 10 04:57:32 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 10 Dec 2017 10:57:32 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3) In-Reply-To: References: Message-ID: Hi, Le 10 déc. 2017 05:48, "INADA Naoki" a écrit : Now I'm OK to accept the PEP, except one nitpick. I got a private email about the same issue. I don't think that it's nitpicking, since many people were confused about the relationship between PEP 538 and PEP 540. So it seems like I was confused as well :-) I was also confused because my PEP evolved quickly. With the additional locale.getpreferredencoding() change in my PEP, the two PEPs became even more similar. > Locale coercion only impacts non-Python code like C libraries, whereas > the Python UTF-8 Mode only impacts Python code: the two PEPs are > complementary. > This sentence seems bit misleading.
If UTF-8 mode is disabled explicitly, locale coercion affects Python code too. locale.getpreferredencoding() is UTF-8, open()' s default encoding is UTF-8, and stdio is UTF-8/surrogateescape. So shouldn't this sentence is: "Locale coercion impacts both of Python code and non-Python code like C libraries, whereas ..."? Right. I will rephrase it. Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From xdegaye at gmail.com Sun Dec 10 09:19:11 2017 From: xdegaye at gmail.com (Xavier de Gaye) Date: Sun, 10 Dec 2017 15:19:11 +0100 Subject: [Python-Dev] Support of the Android platform Message-ID: The following note is a proposal to add the support of the Android platform. The note is easier to read with clickable links at https://github.com/xdegaye/cagibi/blob/master/doc/android_support.rst Motivations =========== * Android is ubiquitous. * This would be the first platform supported by Python that is cross-compiled, thanks to many contributors. * Although the Android operating system is linux, it is different from most linux platforms, for example it does not use GNU libc and runs SELinux in enforcing mode. Therefore supporting this platform would make Python more robust and also would allow testing it on arm 64-bit processors. * Python running on Android is also a handheld calculator, a successor of the slide rule and the `HP 41`_. Current status ============== * The Python test suite succeeds when run on Android emulators using buildbot strenuous settings with the following architectures on API 24: x86, x86_64, armv7 and arm64. * The `Android build system`_ is described in another section. * The `buildmaster-config PR 26`_ proposes to update ``master.cfg`` to enable buildbots to run a given Android API and architecture on the emulators. * The Android emulator is actually ``qemu``, so the test suites for x86 and x86_64 last about the same time as the test suite run natively when the processor of the build system is of the x86 family. 
The test suites for the arm architectures last much longer: about 8 hours for arm64 and 10 hours for armv7 on a four-year-old laptop. * The changes that have been made to achieve this status are listed in `bpo-26865`_, the Android meta-issue. * Given the cpu resources required to run the test suite on the arm emulators, it may be difficult to find a contributed buildbot worker. So it remains to find the hardware to run these buildbots. Proposal ======== Support the Android platform on API 24 [1]_ for the x86_64, armv7 and arm64 architectures built with NDK 14b. *API 24* * API 21 is the first version to provide usable support for wide characters and where SELinux is run in enforcing mode. * API 22 introduces an annoying bug in the linker that prints something like this when python is started:: ``WARNING: linker: libpython3.6m.so.1.0: unused DT entry: type 0x6ffffffe arg 0x14554``. The `termux`_ Android terminal emulator describes this problem at the end of its `termux-packages`_ gitlab page and has implemented a ``termux-elf-cleaner`` tool to strip the useless entries from the ELF header of executables. * API 24 is the first version where the `adb`_ shell is run on the emulator as a ``shell`` user instead of the ``root`` user previously, and the first version that supports arm64. *x86_64* It seems that no handheld device exists using that architecture. It is supported because the x86_64 Android emulator runs fast and therefore is a good candidate as a buildbot worker. *NDK 14b* This release of the NDK is the first one to use `Unified headers`_, fixing numerous problems that until now had been worked around by updating the Python configure script (those workarounds have since been reverted). Android idiosyncrasies ====================== * The default shell is ``/system/bin/sh``. * The file system layout is not a traditional unix layout, there is no ``/tmp`` for example. Most directories have user restricted access, ``/sdcard`` is mounted as ``noexec`` for example.
* The (java) applications are allocated a unix user id and a subdirectory on ``/data/data``. * SELinux is run in enforcing mode. * Shared memory and semaphores are not supported. * The default encoding is UTF-8. Android build system ==================== The Android build system is implemented at `bpo-30386`_ with `PR 1629`_ and is documented by its `README`_. It provides the following features: * To build a distribution for a device or an emulator with a given API level and a given architecture. * To start the emulator and + install the distribution + start a remote interactive shell + or run remotely a python command + or run remotely the buildbottest * Run gdb on the python process that is running on the emulator with python pretty-printing. The build system adds the ``Android/`` directory and the ``configure-android`` script to the root of the Python source directory on the master branch without modifying any other file. The build system can be installed, upgraded (i.e. the SDK and NDK) and run remotely, through ssh for example. The following external libraries, when they are configured in the build system, are downloaded from the internet and cross-compiled (only once, on the first run of the build system) before the cross-compilation of the extension modules: * ``ncurses`` * ``readline`` * ``sqlite`` * ``libffi`` * ``openssl``, the cross-compilation of openssl fails on x86_64 and arm64 and this step is skipped on those architectures. The following extension modules are disabled by adding them to the ``*disabled*`` section of ``Modules/Setup``: * ``_uuid``, Android has no uuid/uuid.h header. * ``grp`` some grp.h functions are not declared. * ``_crypt``, Android does not have crypt.h. * ``_ctypes`` on x86_64 where all long double tests fail (`bpo-32202`_) and on arm64 (see `bpo-32203`_). .. [1] On Wikipedia `Android version history`_ lists the correspondence between API level, commercial name and version for each release. 
It also provides information on the global Android version distribution, see the two charts on top. .. _`README`: https://github.com/xdegaye/cpython/blob/bpo-30386/Android/README.rst .. _`bpo-26865`: https://bugs.python.org/issue26865 .. _`bpo-30386`: https://bugs.python.org/issue30386 .. _`bpo-32202`: https://bugs.python.org/issue32202 .. _`bpo-32203`: https://bugs.python.org/issue32203 .. _`PR 1629`: https://github.com/python/cpython/pull/1629 .. _`buildmaster-config PR 26`: https://github.com/python/buildmaster-config/pull/26 .. _`Android version history`: https://en.wikipedia.org/wiki/Android_version_history .. _`termux`: https://termux.com/ .. _`termux-packages`: https://gitlab.com/jbwhips883/termux-packages .. _`adb`: https://developer.android.com/studio/command-line/adb.html .. _`Unified headers`: https://android.googlesource.com/platform/ndk.git/+/ndk-r14-release/docs/UnifiedHeaders.md .. _`HP 41`: https://en.wikipedia.org/wiki/HP-41C .. vim:filetype=rst:tw=78:ts=8:sts=2:sw=2:et: From victor.stinner at gmail.com Sun Dec 10 12:21:50 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 10 Dec 2017 18:21:50 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3) In-Reply-To: References: Message-ID: Ok, I fixed the effects of the locale coercion (PEP 538). Does it now look good to you, Naoki? https://www.python.org/dev/peps/pep-0540/#relationship-with-the-locale-coercion-pep-538 The commit: https://github.com/python/peps/commit/71cda51fbb622ece63f7a9d3c8fa6cd33ce06b58 diff --git a/pep-0540.txt b/pep-0540.txt index 0a9cbc1e..c163916d 100644 --- a/pep-0540.txt +++ b/pep-0540.txt @@ -144,9 +144,15 @@ The POSIX locale enables the locale coercion (PEP 538) and the UTF-8 mode (PEP 540). When the locale coercion is enabled, enabling the UTF-8 mode has no (additional) effect. -Locale coercion only impacts non-Python code like C libraries, whereas -the Python UTF-8 Mode only impacts Python code: the two PEPs are -complementary. 
+The UTF-8 has the same effect than locale coercion: +``sys.getfilesystemencoding()`` returns ``'UTF-8'``, +``locale.getpreferredencoding()`` returns ``UTF-8``, ``sys.stdin`` and +``sys.stdout`` error handler set to ``surrogateescape``. These changes +only affect Python code. But the locale coercion has addiditonal +effects: the ``LC_CTYPE`` environment variable and the ``LC_CTYPE`` +locale are set to a UTF-8 locale like ``C.UTF-8``. The side effect is +that non-Python code is also impacted by the locale coercion. The two +PEPs are complementary. On platforms where locale coercion is not supported like Centos 7, the POSIX locale only enables the UTF-8 Mode. In this case, Python code uses Victor 2017-12-10 5:47 GMT+01:00 INADA Naoki : > Now I'm OK to accept the PEP, except one nitpick. > >> >> Locale coercion only impacts non-Python code like C libraries, whereas >> the Python UTF-8 Mode only impacts Python code: the two PEPs are >> complementary. >> > > This sentence seems bit misleading. > If UTF-8 mode is disabled explicitly, locale coercion affects Python code too. > locale.getpreferredencoding() is UTF-8, open()' s default encoding is UTF-8, > and stdio is UTF-8/surrogateescape. > > So shouldn't this sentence is: "Locale coercion impacts both of Python code > and non-Python code like C libraries, whereas ..."? > > INADA Naoki From songofacandy at gmail.com Sun Dec 10 12:46:35 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Mon, 11 Dec 2017 02:46:35 +0900 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3) In-Reply-To: References: Message-ID: Except one typo I commented on Github, I accept PEP 540. Well done, Victor and Nick for PEP 540 and 538. Python 3.7 will be most UTF-8 friendly Python 3 than ever. INADA Naoki On Mon, Dec 11, 2017 at 2:21 AM, Victor Stinner wrote: > Ok, I fixed the effects of the locale coercion (PEP 538). Does it now > look good to you, Naoki? 
> > https://www.python.org/dev/peps/pep-0540/#relationship-with-the-locale-coercion-pep-538 > > The commit: > > https://github.com/python/peps/commit/71cda51fbb622ece63f7a9d3c8fa6cd33ce06b58 > > diff --git a/pep-0540.txt b/pep-0540.txt > index 0a9cbc1e..c163916d 100644 > --- a/pep-0540.txt > +++ b/pep-0540.txt > @@ -144,9 +144,15 @@ The POSIX locale enables the locale coercion (PEP > 538) and the UTF-8 > mode (PEP 540). When the locale coercion is enabled, enabling the UTF-8 > mode has no (additional) effect. > > -Locale coercion only impacts non-Python code like C libraries, whereas > -the Python UTF-8 Mode only impacts Python code: the two PEPs are > -complementary. > +The UTF-8 has the same effect than locale coercion: > +``sys.getfilesystemencoding()`` returns ``'UTF-8'``, > +``locale.getpreferredencoding()`` returns ``UTF-8``, ``sys.stdin`` and > +``sys.stdout`` error handler set to ``surrogateescape``. These changes > +only affect Python code. But the locale coercion has addiditonal > +effects: the ``LC_CTYPE`` environment variable and the ``LC_CTYPE`` > +locale are set to a UTF-8 locale like ``C.UTF-8``. The side effect is > +that non-Python code is also impacted by the locale coercion. The two > +PEPs are complementary. > > On platforms where locale coercion is not supported like Centos 7, the > POSIX locale only enables the UTF-8 Mode. In this case, Python code uses > > Victor > > > 2017-12-10 5:47 GMT+01:00 INADA Naoki : >> Now I'm OK to accept the PEP, except one nitpick. >> >>> >>> Locale coercion only impacts non-Python code like C libraries, whereas >>> the Python UTF-8 Mode only impacts Python code: the two PEPs are >>> complementary. >>> >> >> This sentence seems bit misleading. >> If UTF-8 mode is disabled explicitly, locale coercion affects Python code too. >> locale.getpreferredencoding() is UTF-8, open()' s default encoding is UTF-8, >> and stdio is UTF-8/surrogateescape. 
>> >> So shouldn't this sentence is: "Locale coercion impacts both of Python code >> and non-Python code like C libraries, whereas ..."? >> >> INADA Naoki From victor.stinner at gmail.com Sun Dec 10 12:55:35 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 10 Dec 2017 18:55:35 +0100 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3) In-Reply-To: References: Message-ID: 2017-12-10 18:46 GMT+01:00 INADA Naoki : > Except one typo I commented on Github, Fixed: https://github.com/python/peps/commit/08224bf6bdf16b539fb6f8136061877e5924476d > I accept PEP 540. Wow, thank you :-) Again, thank you for your very useful feedback which helped to make the PEP 540 much better than its initial version. > Well done, Victor and Nick for PEP 540 and 538. > Python 3.7 will be most UTF-8 friendly Python 3 than ever. Yep. Once the PEP 540 will be implemented, we will need need to test them as much as possible before 3.7 final! https://bugs.python.org/issue29240 https://github.com/python/cpython/pull/855 Victor From a.badger at gmail.com Sun Dec 10 13:23:33 2017 From: a.badger at gmail.com (Toshio Kuratomi) Date: Sun, 10 Dec 2017 10:23:33 -0800 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3) In-Reply-To: References: Message-ID: On Dec 9, 2017 8:53 PM, "INADA Naoki" wrote: > Earlier versions of PEP 538 thus included "en_US.UTF-8" on the > candidate target locale list, but that turned out to cause assorted > problems due to the "C -> en_US" part of the coercion. Hm, but PEP 538 says: > this PEP instead proposes to extend the "surrogateescape" default for stdin and stderr error handling to also apply to the three potential coercion target locales. https://www.python.org/dev/peps/pep-0538/#defaulting-to- surrogateescape-error-handling-on-the-standard-io-streams I don't think en_US.UTF-8 should use surrogateescape error handler. Could you explain why not? 
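As a refresher on what the surrogateescape handler actually does, independent of either PEP: bytes that are not valid UTF-8 survive a decode/encode round-trip as lone surrogates.

```python
data = b'caf\xe9'    # 'café' in Latin-1; not valid UTF-8

# Strict decoding rejects it...
try:
    data.decode('utf-8')
except UnicodeDecodeError:
    pass

# ...while surrogateescape maps the stray byte 0xE9 to U+DCE9...
s = data.decode('utf-8', 'surrogateescape')
assert s == 'caf\udce9'

# ...and encoding with the same handler restores the original bytes.
assert s.encode('utf-8', 'surrogateescape') == data
```

That lossless round-trip is why the PEPs choose surrogateescape for the standard streams: undecodable data can still pass through Python strings unharmed.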
utf-8 seems like the common thread for using surrogateescape so I'm not sure what would make en_US.UTF-8 different than C.UTF-8. -Toshio -------------- next part -------------- An HTML attachment was scrubbed... URL: From tinchester at gmail.com Sun Dec 10 14:17:25 2017 From: tinchester at gmail.com (=?UTF-8?Q?Tin_Tvrtkovi=C4=87?=) Date: Sun, 10 Dec 2017 19:17:25 +0000 Subject: [Python-Dev] Issues with PEP 526 Variable Notation at the class level In-Reply-To: References: Message-ID: Hello, I'm one of the attrs contributors, and the person who initially wrote the slots functionality there. We've given up on returning a new class always since this can conflict with certain metaclasses (have you noticed you can't make a slots attrs class inheriting from Generic[T]?) and with PEP 487. I think with PEP 487 it's becoming especially evident class creation is not necessarily an idempotent operation. I'm currently brainstorming alternative APIs for slots. The best solution would be for Python to actually offer a way to add slotness to a class after it's been defined, and Guido has expressed approval ( https://github.com/ericvsmith/dataclasses/issues/60#issuecomment-348719029). Personally I really like slot classes, both for their memory characteristics and the fact the attributes need to be enumerated beforehand and typos get turned into errors, so I'd welcome any development on this front. :) Date: Sat, 9 Dec 2017 08:52:15 -0500 > From: "Eric V. Smith" > To: Nathaniel Smith > Cc: Python Dev > Subject: Re: [Python-Dev] Issues with PEP 526 Variable Notation at the > class level > Message-ID: > Content-Type: text/plain; charset=utf-8; format=flowed > > On 12/8/2017 9:14 PM, Nathaniel Smith wrote: > > On Dec 7, 2017 12:49, "Eric V. Smith" > > wrote: > > > > The reason I didn't include it (as @dataclass(slots=True)) is > > because it has to return a new class, and the rest of the dataclass > > features just modifies the given class in place. 
I wanted to > > maintain that conceptual simplicity. But this might be a reason to > > abandon that. For what it's worth, attrs does have an > > @attr.s(slots=True) that returns a new class with __slots__ set. > > > > > > They actually switched to always returning a new class, regardless of > > whether slots is set: > > > > https://github.com/python-attrs/attrs/pull/260 > > In the end, it looks like that PR ended up just refactoring things, and > the decision to always return a new class was deferred. I still haven't > finished evaluating exactly what the refactoring does, though. > > Eric. > > > You'd have to ask Hynek to get the full rationale, but I believe it was > > both for consistency with slot classes, and for consistency with regular > > class definition. For example, type.__new__ actually does different > > things depending on whether it sees an __eq__ method, so adding a method > > after the fact led to weird bugs with hashing. That class of bug goes > > away if you always set up the autogenerated methods and then call > > type.__new__. > > They have a bunch of test cases that I'll have to review, too. > > Eric. > - > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sun Dec 10 14:47:45 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 10 Dec 2017 20:47:45 +0100 Subject: [Python-Dev] Issues with PEP 526 Variable Notation at the class level References: Message-ID: <20171210204745.2bdd6a90@fsol> Hi, On Sun, 10 Dec 2017 19:17:25 +0000 Tin Tvrtkovi? wrote: > Hello, > > I'm one of the attrs contributors, and the person who initially wrote the > slots functionality there. > > We've given up on returning a new class always since this can conflict with > certain metaclasses (have you noticed you can't make a slots attrs class > inheriting from Generic[T]?) and with PEP 487. I think with PEP 487 it's > becoming especially evident class creation is not necessarily an idempotent > operation. 
Hmm... I understand you may be restricted by backwards compatibility here. But dataclasses don't have that issue, so we could decide we're incompatible with certain dataclasses from day 1. > I'm currently brainstorming alternative APIs for slots. The best solution > would be for Python to actually offer a way to add slotness to a class > after it's been defined, and Guido has expressed approval ( > https://github.com/ericvsmith/dataclasses/issues/60#issuecomment-348719029). The problem is less the API than the implementation. As Guido pointed out, this means the instance layout may now change after class definition. One possibility would be to allow changing the layout before the first instance (or even subclass) is instantiated, after which it would raise an error. Regards Antoine. From brett at python.org Sun Dec 10 15:27:06 2017 From: brett at python.org (Brett Cannon) Date: Sun, 10 Dec 2017 20:27:06 +0000 Subject: [Python-Dev] Support of the Android platform In-Reply-To: References: Message-ID: While the note is interesting from a technical standpoint, Xavier, I don't quite see what needs to be done to support Android at this point. Are you simply asking we add Android API 24 as an official platform? Or permission to add your note to the Misc/ directory? Basically what are you wanting to see happen? :) On Sun, 10 Dec 2017 at 06:19 Xavier de Gaye wrote: > The following note is a proposal to add the support of the Android > platform. > > The note is easier to read with clickable links at > https://github.com/xdegaye/cagibi/blob/master/doc/android_support.rst > > Motivations > =========== > > * Android is ubiquitous.
> [The remainder of Xavier's Android support note, quoted in full in his original message above, is elided here.] > ..
_`termux-packages`: https://gitlab.com/jbwhips883/termux-packages > .. _`adb`: https://developer.android.com/studio/command-line/adb.html > .. _`Unified headers`: > https://android.googlesource.com/platform/ndk.git/+/ndk-r14-release/docs/UnifiedHeaders.md > .. _`HP 41`: https://en.wikipedia.org/wiki/HP-41C > .. vim:filetype=rst:tw=78:ts=8:sts=2:sw=2:et: > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sun Dec 10 16:15:01 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 10 Dec 2017 22:15:01 +0100 Subject: [Python-Dev] Issues with PEP 526 Variable Notation at the class level References: <20171210204745.2bdd6a90@fsol> Message-ID: <20171210221501.40233db0@fsol> On Sun, 10 Dec 2017 20:47:45 +0100 Antoine Pitrou wrote: > Hi, > > On Sun, 10 Dec 2017 19:17:25 +0000 > Tin Tvrtkovi? wrote: > > Hello, > > > > I'm one of the attrs contributors, and the person who initially wrote the > > slots functionality there. > > > > We've given up on returning a new class always since this can conflict with > > certain metaclasses (have you noticed you can't make a slots attrs class > > inheriting from Generic[T]?) and with PEP 487. I think with PEP 487 it's > > becoming especially evident class creation is not necessarily an idempotent > > operation. > > Hmm... I understand you may be restricted by backwards compatibility > here. But dataclasses don't have that issue, so we could decide we're > incompatible with certain dataclasses from day 1. Sorry... make that "incompatible with certain metaclasses" ;-) Regards Antoine. 
From raymond.hettinger at gmail.com Sun Dec 10 16:24:17 2017
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Sun, 10 Dec 2017 13:24:17 -0800
Subject: [Python-Dev] Is static typing still optional?
Message-ID: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>

The make_dataclass() factory function in the dataclasses module currently
requires type declarations. It would be nice if the type declarations were
optional.

With typing (currently works):

    Point = NamedTuple('Point', [('x', float), ('y', float), ('z', float)])
    Point = make_dataclass('Point', [('x', float), ('y', float), ('z', float)])

Without typing (only the first currently works):

    Point = namedtuple('Point', ['x', 'y', 'z'])       # underlying store is a tuple
    Point = make_dataclass('Point', ['x', 'y', 'z'])   # underlying store is an instance dict

This proposal would make it easy to cleanly switch between the immutable
tuple-based container and the instance-dict-based optionally-frozen
container. The proposal would make it possible for instructors to teach
dataclasses without having to teach typing as a prerequisite. And, it would
make dataclasses usable for projects that have elected not to use static
typing.

Raymond

From gvanrossum at gmail.com Sun Dec 10 16:24:26 2017
From: gvanrossum at gmail.com (Guido van Rossum)
Date: Sun, 10 Dec 2017 13:24:26 -0800
Subject: [Python-Dev] Issues with PEP 526 Variable Notation at the class level
In-Reply-To: <20171210221501.40233db0@fsol>
References: <20171210204745.2bdd6a90@fsol> <20171210221501.40233db0@fsol>
Message-ID:

OTOH dataclass is a decorator for *better* metaclass compatibility.

On Dec 10, 2017 13:17, "Antoine Pitrou" wrote:

> On Sun, 10 Dec 2017 20:47:45 +0100
> Antoine Pitrou wrote:
>
> > Hi,
> >
> > On Sun, 10 Dec 2017 19:17:25 +0000
> > Tin Tvrtković wrote:
> > > Hello,
> > >
> > > I'm one of the attrs contributors, and the person who initially wrote
> > > the slots functionality there.
> > >
> > > We've given up on returning a new class always since this can conflict
> > > with certain metaclasses (have you noticed you can't make a slots attrs
> > > class inheriting from Generic[T]?) and with PEP 487. I think with PEP 487
> > > it's becoming especially evident class creation is not necessarily an
> > > idempotent operation.
> >
> > Hmm... I understand you may be restricted by backwards compatibility
> > here. But dataclasses don't have that issue, so we could decide we're
> > incompatible with certain dataclasses from day 1.
>
> Sorry... make that "incompatible with certain metaclasses" ;-)
>
> Regards
>
> Antoine.

From gvanrossum at gmail.com Sun Dec 10 16:26:54 2017
From: gvanrossum at gmail.com (Guido van Rossum)
Date: Sun, 10 Dec 2017 13:26:54 -0800
Subject: [Python-Dev] Support of the Android platform
In-Reply-To:
References:
Message-ID:

Maybe it should be a PEP?

On Dec 10, 2017 12:29, "Brett Cannon" wrote:

> While the note from a technical standpoint is interesting, Xavier, I don't
> quite see what needs to be done to support Android at this point. Are you
> simply asking we add Android API 24 as an official platform? Or permission
> to add your note to the Misc/ directory? Basically what are you wanting to
> see happen? :)
>
> On Sun, 10 Dec 2017 at 06:19 Xavier de Gaye wrote:
>
>> The following note is a proposal to add the support of the Android
>> platform.
>>
>> The note is easier to read with clickable links at
>> https://github.com/xdegaye/cagibi/blob/master/doc/android_support.rst
>>
>> Motivations
>> ===========
>>
>> * Android is ubiquitous.
>> [...]

From skip.montanaro at gmail.com Sun Dec 10 16:27:15 2017
From: skip.montanaro at gmail.com (Skip Montanaro)
Date: Sun, 10 Dec 2017 15:27:15 -0600
Subject: [Python-Dev] Support of the Android platform
In-Reply-To:
References:
Message-ID:

I'm not familiar with software development on/for Android, but wouldn't
official support also involve suitable package creation or does that just
fall out for free from the build-for-emulator PR?

Skip

From levkivskyi at gmail.com Sun Dec 10 16:29:49 2017
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Sun, 10 Dec 2017 22:29:49 +0100
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
Message-ID:

On 10 December 2017 at 22:24, Raymond Hettinger wrote:

> Without typing (only the first currently works):
>
>     Point = namedtuple('Point', ['x', 'y', 'z'])       # underlying store is a tuple
>     Point = make_dataclass('Point', ['x', 'y', 'z'])   # underlying store is an instance dict

Hm, I think this is a bug in the implementation. The second form should
also work.

--
Ivan

From eric at trueblade.com Sun Dec 10 16:37:35 2017
From: eric at trueblade.com (Eric V. Smith)
Date: Sun, 10 Dec 2017 16:37:35 -0500
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To:
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
Message-ID: <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com>

On 12/10/2017 4:29 PM, Ivan Levkivskyi wrote:
> On 10 December 2017 at 22:24, Raymond Hettinger wrote:
>
>     Without typing (only the first currently works):
>
>     Point = namedtuple('Point', ['x', 'y', 'z'])       # underlying store is a tuple
>     Point = make_dataclass('Point', ['x', 'y', 'z'])   # underlying store is an instance dict
>
> Hm, I think this is a bug in the implementation. The second form should
> also work.

Agreed.

I have a bunch of pending changes for dataclasses. I'll add this.

Eric.

From raymond.hettinger at gmail.com Sun Dec 10 17:00:56 2017
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Sun, 10 Dec 2017 14:00:56 -0800
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To: <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com>
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com>
Message-ID: <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com>

> On Dec 10, 2017, at 1:37 PM, Eric V.
Smith wrote:
>
> On 12/10/2017 4:29 PM, Ivan Levkivskyi wrote:
>> On 10 December 2017 at 22:24, Raymond Hettinger wrote:
>> Without typing (only the first currently works):
>>     Point = namedtuple('Point', ['x', 'y', 'z'])       # underlying store is a tuple
>>     Point = make_dataclass('Point', ['x', 'y', 'z'])   # underlying store is an instance dict
>> Hm, I think this is a bug in the implementation. The second form should also work.
>
> Agreed.
>
> I have a bunch of pending changes for dataclasses. I'll add this.
>
> Eric.

Thanks Eric and Ivan. You're both very responsive. I appreciate the
enormous efforts you're putting in to getting this right.

I suggest two other fix-ups:

1) Let make_dataclass() pass through keyword arguments to _process_class(),
so that this will work:

    Point = make_dataclass('Point', ['x', 'y', 'z'], order=True)

2) Change the default value for "hash" from "None" to "False". This might
take a little effort because there is currently an oddity where setting
hash=False causes it to be hashable. I'm pretty sure this wasn't intended ;-)

Raymond

From nad at python.org Sun Dec 10 17:07:43 2017
From: nad at python.org (Ned Deily)
Date: Sun, 10 Dec 2017 17:07:43 -0500
Subject: [Python-Dev] Support of the Android platform
In-Reply-To:
References:
Message-ID: <8495B954-5762-43FA-BAA6-681C45EEF2CD@python.org>

On Dec 10, 2017, at 16:26, Guido van Rossum wrote:
> On Dec 10, 2017 12:29, "Brett Cannon" wrote:
>> On Sun, 10 Dec 2017 at 06:19 Xavier de Gaye wrote:
>>> The following note is a proposal to add the support of the Android
>>> platform.
>>> [...]
>> While the note from a technical standpoint is interesting, Xavier, I
>> don't quite see what needs to be done to support Android at this point.
>> Are you simply asking we add Android API 24 as an official platform? Or
>> permission to add your note to the Misc/ directory? Basically what are
>> you wanting to see happen? :)
> Maybe it should be a PEP?

Yes, I agree there needs to be a PEP for this.
I have conflicting thoughts about formalizing Android support. On the one hand, it would be nice to have. But on the other, it does add a large non-zero burden to all core developers and to the release teams, to the minimum extent of trying to make sure that all ongoing changes don't break platform support. At a minimum a PEP needs to address the minimum platform support requirement outlined in PEP 11 (https://www.python.org/dev/peps/pep-0011/#supporting-platforms). As long as Xavier is willing to keep supporting the platform, the first requirement, having a core developer, should be met. But for a platform that, understandably, has as many special requirements as Android does, the second requirement, having a stable buildbot, seems to me to be an absolute necessity, and the PEP needs to address exactly what sort of buildbot requirements make sense here: emulators, SDKs, etc. Otherwise, we run the risk of ending up with an ongoing maintenance headache and unhappy users, as has been the case in the past with support for other platforms. -- Ned Deily nad at python.org -- [] From guido at python.org Sun Dec 10 17:20:49 2017 From: guido at python.org (Guido van Rossum) Date: Sun, 10 Dec 2017 14:20:49 -0800 Subject: [Python-Dev] Can Python guarantee the order of keyword-only parameters? In-Reply-To: <8b00b41d-2476-ff53-3fc7-edc9a5f49f0c@hastings.org> References: <90a367bb-c1c5-1bc1-f5f6-a537332290ea@hastings.org> <8b00b41d-2476-ff53-3fc7-edc9a5f49f0c@hastings.org> Message-ID: Sure. I think it's a good idea to make this a guaranteed language behavior, and it doesn't need a PEP. On Sun, Dec 10, 2017 at 1:52 PM, Larry Hastings wrote: > > Can I get a ruling on this? I got +1s from the community, but as it's a > (minor) language thing I feel like you're the only one who can actually > okay it. > > > */arry* > > > -------- Forwarded Message -------- > Subject: Can Python guarantee the order of keyword-only parameters? 
> Date: Mon, 27 Nov 2017 09:05:57 -0800 > From: Larry Hastings > To: Python-Dev > > > > First, a thirty-second refresher, so we're all using the same terminology: > > A *parameter* is a declared input variable to a function. > An *argument* is a value passed into a function. (*Arguments* are stored > in *parameters.*) > > So in the example "def foo(clonk): pass; foo(3)", clonk is a parameter, > and 3 is an argument. ++ > > > Keyword-only arguments were conceived of as being unordered. They're > stored in a dictionary--by convention called **kwargs--and dictionaries > didn't preserve order. But knowing the order of arguments is occasionally > very useful. PEP 468 proposed that Python preserve the order of > keyword-only arguments in kwargs. This became easy with the > order-preserving dictionaries added to Python 3.6. I don't recall the > order of events, but in the end PEP 468 was accepted, and as of 3.6 Python > guarantees order in **kwargs. > > But that's arguments. What about parameters? > > Although this isn't as directly impactful, the order of keyword-only > parameters *is* visible to the programmer. The best way to see a > function's parameters is with inspect.signature, although there's also the > deprecated inspect.getfullargspec; in CPython you can also directly examine > fn.__code__.co_varnames. Two of these methods present their data in a way > that preserves order for all parameters, including keyword-only > parameters--and the third one is deprecated. > > Python must (and does) guarantee the order of positional and > positional-or-keyword parameters, because it uses position to map arguments > to parameters when the function is called. But conceptually this isn't > necessary for keyword-only parameters because their position is > irrelevant. I only see one place in the language & library that addresses > the ordering of keyword-only parameters, by way of omission. 
The PEP for > inspect.signature (PEP 362) says that when comparing two signatures for > equality, their positional and positional-or-keyword parameters must be in > the same order. It makes a point of *not* requiring that the two > functions' keyword-only parameters be in the same order. > > For every currently supported version of Python 3, inspect.signature and > fn.__code__.co_varnames preserve the order of keyword-only parameters. > This isn't surprising; it's basically the same code path implementing those > as the two types of positional-relevant parameters, so the most > straightforward implementation would naturally preserve their order. It's > just not guaranteed. > > I'd like inspect.signature to guarantee that the order of keyword-only > parameters always matches the order they were declared in. Technically > this isn't a language feature, it's a library feature. But making this > guarantee would require that CPython internally cooperate, so it's kind of > a language feature too. > > Does this sound reasonable? Would it need a PEP? I'm hoping for "yes" > and "no", respectively. > > > Three final notes: > > - Yes, I do have a use case. I'm using inspect.signature metadata to > mechanically map arguments from an external domain (command-line arguments) > to a Python function. Relying on the declaration order of keyword-only > parameters would elegantly solve one small problem. > - I asked Armin Rigo about PyPy's support for Python 3. He said it > should already maintain the order of keyword-only parameters, and if I ever > catch it not maintaining them in order I should file a bug. I assert that > making this guarantee would be nearly zero effort for any Python > implementation--I bet they all already behave this way, all they need is a > test case and some documentation. > - One can extend this concept to functools.partial and > inspect.Signature.bind: should its transformations of keyword-only > parameters also maintain order in a consistent way? 
I suspect the answer > there is much the same--there's an obvious way it should behave, it almost > certainly already behaves that way, but it doesn't guarantee it. I don't > think I need this for my use case. > > > > */arry* > > ++ Yes, that means "Argument Clinic" should really have been called > "Parameter Clinic". But the "Parameter Clinic" sketch is nowhere near as > funny. > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Dec 10 17:25:03 2017 From: guido at python.org (Guido van Rossum) Date: Sun, 10 Dec 2017 14:25:03 -0800 Subject: [Python-Dev] Support of the Android platform In-Reply-To: <8495B954-5762-43FA-BAA6-681C45EEF2CD@python.org> References: <8495B954-5762-43FA-BAA6-681C45EEF2CD@python.org> Message-ID: I think someone may have to mentor Xavier on how to get this accepted. The note already looks a bit like a PEP, but I suspect that Xavier is not sufficiently familiar with our process to realize the difference. On Sun, Dec 10, 2017 at 2:07 PM, Ned Deily wrote: > On Dec 10, 2017, at 16:26, Guido van Rossum wrote: > > On Dec 10, 2017 12:29, "Brett Cannon" wrote: > >> On Sun, 10 Dec 2017 at 06:19 Xavier de Gaye wrote: > >>> The following note is a proposal to add the support of the Android > platform. > >>> [...] > >> While the note from a technical standpoint is interest, Xavier, I don't > quite see what needs to be done to support Android at this point. Are you > simply asking we add Android API 24 as an official platform? Or permission > to add your note to the Misc/ directory? Basically what are you wanting to > see happen? :) > > Maybe it should be a PEP? > > Yes, I agree there needs to be a PEP for this. I have conflicting > thoughts about formalizing Android support. On the one hand, it would be > nice to have. 
But on the other, it does add a large non-zero burden to all > core developers and to the release teams, to the minimum extent of trying > to make sure that all ongoing changes don't break platform support. At a > minimum a PEP needs to address the minimum platform support requirement > outlined in PEP 11 (https://www.python.org/dev/peps/pep-0011/#supporting- > platforms). As long as Xavier is willing to keep supporting the > platform, the first requirement, having a core developer, should be met. > But for a platform that, understandably, has as many special requirements > as Android does, the second requirement, having a stable buildbot, seems to > me to be an absolute necessity, and the PEP needs to address exactly what > sort of buildbot requirements make sense here: emulators, SDKs, etc. > Otherwise, we run the risk of ending up with an ongoing maintenance > headache and unhappy users, as has been the case in the past with support > for other platforms. > > -- > Ned Deily > nad at python.org -- [] > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Sun Dec 10 18:23:30 2017 From: eric at trueblade.com (Eric V. Smith) Date: Sun, 10 Dec 2017 18:23:30 -0500 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> Message-ID: <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> On 12/10/2017 5:00 PM, Raymond Hettinger wrote: > > >> On Dec 10, 2017, at 1:37 PM, Eric V. 
Smith wrote:
>>
>> On 12/10/2017 4:29 PM, Ivan Levkivskyi wrote:
>>> On 10 December 2017 at 22:24, Raymond Hettinger wrote:
>>> Without typing (only the first currently works):
>>>     Point = namedtuple('Point', ['x', 'y', 'z'])       # underlying store is a tuple
>>>     Point = make_dataclass('Point', ['x', 'y', 'z'])   # underlying store is an instance dict
>>> Hm, I think this is a bug in the implementation. The second form should
>>> also work.
>>
>> Agreed.
>>
>> I have a bunch of pending changes for dataclasses. I'll add this.
>>
>> Eric.
>
> Thanks Eric and Ivan. You're both very responsive. I appreciate the
> enormous efforts you're putting in to getting this right.

Thank you for your feedback. It's very helpful.

I see a couple of options:

1a: Use a default type annotation if one is not supplied. typing.Any would
presumably make the most sense.

1b: Use None if no type is supplied.

2: Rework the code to not require annotations at all.

I think I'd prefer 1a, since it's easy. However, typing is not currently
imported by dataclasses.py. There's an argument that it really needs to be,
and I should just bite the bullet and live with it. Possibly with Ivan's
PEP 560 work my concern on importing typing goes away.

1b would be easy, but I don't like using non-types for annotations.

2 would be okay, but then that would be the only time __annotations__
wouldn't be set on a dataclass.

> I suggest two other fix-ups:
>
> 1) Let make_dataclass() pass through keyword arguments to
> _process_class(), so that this will work:
>
>     Point = make_dataclass('Point', ['x', 'y', 'z'], order=True)

Agreed.

> 2) Change the default value for "hash" from "None" to "False". This might
> take a little effort because there is currently an oddity where setting
> hash=False causes it to be hashable. I'm pretty sure this wasn't
> intended ;-)

It's sufficiently confusing that I need to sit down when I have some free
time and noodle this through. But it's still on my radar.

Eric.
From victor.stinner at gmail.com Sun Dec 10 18:24:48 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 11 Dec 2017 00:24:48 +0100 Subject: [Python-Dev] Support of the Android platform In-Reply-To: References: Message-ID: Xavier is working on fixing random issues specific to Android since 2 years. He is almost done, the last step is just to add a build infra to get a buildbot. https://github.com/python/cpython/pull/1629 https://bugs.python.org/issue30386 It's a set of scripts to cross compile Python from Linux to Android. >From the ones who missed it, Xavier is a core dev and will maintain this stuff ;-) Since these changes add a new directory without touching the rest of the code, I don't see a reason to not add it. Victor Le 10 d?c. 2017 21:29, "Brett Cannon" a ?crit : > While the note from a technical standpoint is interest, Xavier, I don't > quite see what needs to be done to support Android at this point. Are you > simply asking we add Android API 24 as an official platform? Or permission > to add your note to the Misc/ directory? Basically what are you wanting to > see happen? :) > > On Sun, 10 Dec 2017 at 06:19 Xavier de Gaye wrote: > >> The following note is a proposal to add the support of the Android >> platform. >> >> The note is easier to read with clickable links at >> https://github.com/xdegaye/cagibi/blob/master/doc/android_support.rst >> >> Motivations >> =========== >> >> * Android is ubiquitous. >> * This would be the first platform supported by Python that is >> cross-compiled, >> thanks to many contributors. >> * Although the Android operating system is linux, it is different from >> most >> linux platforms, for example it does not use GNU libc and runs SELinux >> in >> enforcing mode. Therefore supporting this platform would make Python >> more >> robust and also would allow testing it on arm 64-bit processors. >> * Python running on Android is also a handheld calculator, a successor of >> the >> slide rule and the `HP 41`_. 
>> >> Current status >> ============== >> >> * The Python test suite succeeds when run on Android emulators using >> buildbot >> strenuous settings with the following architectures on API 24: x86, >> x86_64, >> armv7 and arm64. >> * The `Android build system`_ is described in another section. >> * The `buildmaster-config PR 26`_ proposes to update ``master.cfg`` to >> enable >> buildbots to run a given Android API and architecture on the emulators. >> * The Android emulator is actually ``qemu``, so the test suites for x86 >> and >> x86_64 last about the same time as the test suite run natively when the >> processor of the build system is of the x86 family. The test suites >> for the >> arm architectures last much longer: about 8 hours for arm64 and 10 >> hours for >> armv7 on a four years old laptop. >> * The changes that have been made to achieve this status are listed in >> `bpo-26865`_, the Android meta-issue. >> * Given the cpu resources required to run the test suite on the arm >> emulators, >> it may be difficult to find a contributed buildbot worker. So it >> remains to >> find the hardware to run these buildbots. >> >> Proposal >> ======== >> >> Support the Android platform on API 24 [1]_ for the x86_64, armv7 and >> arm64 >> architectures built with NDK 14b. >> >> *API 24* >> * API 21 is the first version to provide usable support for wide >> characters >> and where SELinux is run in enforcing mode. >> >> * API 22 introduces an annoying bug on the linker that prints >> something like >> this when python is started:: >> >> ``WARNING: linker: libpython3.6m.so.1.0: unused DT entry: type >> 0x6ffffffe arg 0x14554``. >> >> The `termux`_ Android terminal emulator describes this problem at >> the end >> of its `termux-packages`_ gitlab page and has implemented a >> ``termux-elf-cleaner`` tool to strip the useless entries from the ELF >> header of executables. 
>> >> * API 24 is the first version where the `adb`_ shell is run on the >> emulator >> as a ``shell`` user instead of the ``root`` user previously, and the >> first >> version that supports arm64. >> >> *x86_64* >> It seems that no handheld device exists using that architecture. It is >> supported because the x86_64 Android emulator runs fast and therefore >> is a >> good candidate as a buildbot worker. >> >> *NDK 14b* >> This release of the NDK is the first one to use `Unified headers`_ >> fixing >> numerous problems that had been fixed by updating the Python configure >> script >> until now (those changes have been reverted by now). >> >> Android idiosyncrasies >> ====================== >> >> * The default shell is ``/system/bin/sh``. >> * The file system layout is not a traditional unix layout, there is no >> ``/tmp`` for example. Most directories have user restricted access, >> ``/sdcard`` is mounted as ``noexec`` for example. >> * The (java) applications are allocated a unix user id and a subdirectory >> on >> ``/data/data``. >> * SELinux is run in enforcing mode. >> * Shared memory and semaphores are not supported. >> * The default encoding is UTF-8. >> >> Android build system >> ==================== >> >> The Android build system is implemented at `bpo-30386`_ with `PR 1629`_ >> and >> is documented by its `README`_. It provides the following features: >> >> * To build a distribution for a device or an emulator with a given API >> level >> and a given architecture. >> * To start the emulator and >> + install the distribution >> + start a remote interactive shell >> + or run remotely a python command >> + or run remotely the buildbottest >> * Run gdb on the python process that is running on the emulator with >> python >> pretty-printing. >> >> The build system adds the ``Android/`` directory and the >> ``configure-android`` >> script to the root of the Python source directory on the master branch >> without >> modifying any other file. 
The build system can be installed, upgraded >> (i.e. the >> SDK and NDK) and run remotely, through ssh for example. >> >> The following external libraries, when they are configured in the build >> system, >> are downloaded from the internet and cross-compiled (only once, on the >> first >> run of the build system) before the cross-compilation of the extension >> modules: >> >> * ``ncurses`` >> * ``readline`` >> * ``sqlite`` >> * ``libffi`` >> * ``openssl``, the cross-compilation of openssl fails on x86_64 and arm64, >> and >> this step is skipped on those architectures. >> >> The following extension modules are disabled by adding them to the >> ``*disabled*`` section of ``Modules/Setup``: >> >> * ``_uuid``, Android has no uuid/uuid.h header. >> * ``grp``, some grp.h functions are not declared. >> * ``_crypt``, Android does not have crypt.h. >> * ``_ctypes`` on x86_64 where all long double tests fail (`bpo-32202`_) >> and on >> arm64 (see `bpo-32203`_). >> >> .. [1] On Wikipedia, `Android version history`_ lists the correspondence >> between >> API level, commercial name and version for each release. It also >> provides >> information on the global Android version distribution; see the two >> charts >> at the top. >> >> .. _`README`: https://github.com/xdegaye/cpython/blob/bpo-30386/ >> Android/README.rst >> .. _`bpo-26865`: https://bugs.python.org/issue26865 >> .. _`bpo-30386`: https://bugs.python.org/issue30386 >> .. _`bpo-32202`: https://bugs.python.org/issue32202 >> .. _`bpo-32203`: https://bugs.python.org/issue32203 >> .. _`PR 1629`: https://github.com/python/cpython/pull/1629 >> .. _`buildmaster-config PR 26`: https://github.com/python/ >> buildmaster-config/pull/26 >> .. _`Android version history`: https://en.wikipedia.org/wiki/ >> Android_version_history >> .. _`termux`: https://termux.com/ >> .. _`termux-packages`: https://gitlab.com/jbwhips883/termux-packages >> .. _`adb`: https://developer.android.com/studio/command-line/adb.html >> ..
_`Unified headers`: https://android.googlesource. >> com/platform/ndk.git/+/ndk-r14-release/docs/UnifiedHeaders.md >> .. _`HP 41`: https://en.wikipedia.org/wiki/HP-41C >> .. vim:filetype=rst:tw=78:ts=8:sts=2:sw=2:et: >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: https://mail.python.org/mailman/options/python-dev/ >> brett%40python.org >> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > victor.stinner%40gmail.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Sun Dec 10 18:29:14 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 11 Dec 2017 00:29:14 +0100 Subject: [Python-Dev] Support of the Android platform In-Reply-To: <8495B954-5762-43FA-BAA6-681C45EEF2CD@python.org> References: <8495B954-5762-43FA-BAA6-681C45EEF2CD@python.org> Message-ID: On 10 Dec 2017 at 23:10, "Ned Deily" wrote: > But on the other, it does add a large non-zero burden to all core developers (...) I reviewed some of the Android pull requests. Most changes are small and self-contained. > As long as Xavier is willing to keep supporting the platform, the first requirement, having a core developer, should be met. But for a platform that, understandably, has as many special requirements as Android does, the second requirement, having a stable buildbot, seems to me to be an absolute necessity, and the PEP needs to address exactly what sort of buildbot requirements make sense here: emulators, SDKs, etc. Xavier is a core dev and wants to add a buildbot to finish the support of Android. Victor -------------- next part -------------- An HTML attachment was scrubbed...
URL: From songofacandy at gmail.com Sun Dec 10 19:00:47 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Mon, 11 Dec 2017 09:00:47 +0900 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3) In-Reply-To: References: Message-ID: > > Could you explain why not? utf-8 seems like the common thread for using > surrogateescape so I'm not sure what would make en_US.UTF-8 different than > C.UTF-8. > Because there are many lang_COUNTRY.UTF-8 locales: ja_JP.UTF-8, zh_TW.UTF-8, fr_FR.UTF-8, etc... If only en_US.UTF-8 should use surrogateescape, it may make a confusing situation like: "This script works on an English Linux desktop, but doesn't work on a Japanese Linux desktop!" I accepted PEP 540. So even if coercing the locale fails, it is better than Python 3.6. Regards, INADA Naoki From barry at python.org Sun Dec 10 19:13:53 2017 From: barry at python.org (Barry Warsaw) Date: Sun, 10 Dec 2017 19:13:53 -0500 Subject: [Python-Dev] Mostly Official Python Development Container Image Message-ID: <1E8F20B3-EB57-4DDA-B635-D018C2D94534@python.org> As part of our work on importlib_resources, and with some fantastic help from Abhilash Raj, we now have a mostly official Python development container image that you can use for CI testing and other development purposes. This image is based on Ubuntu 16.04 LTS and provides the latest stable releases of Python 2.7, and 3.4-3.6, along with a mostly up-to-date git checkout of master, currently Python 3.7 of course. Once 3.7 is released to beta, we intend to track its release tarballs too. Note that these are pristine builds of upstream releases, so they don't have any of the Ubuntu or Debian platform changes. We also install a few other commonly needed tools, like pip, git, unzip, wget, mypy, and tox. We do *not* recommend this image for application deployment purposes; use this for development and testing only, please.
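[Editor's note: a minimal ``.gitlab-ci.yml`` pulling such an image might look like the sketch below. The image path and job layout here are assumptions, not the project's published configuration; the real repo and a working example are linked in the rest of this message.]

```yaml
# Sketch only: the image path is an assumption -- use the image
# actually published to quay.io by the ci-images project.
image: quay.io/python-devs/ci-image:latest

test:
  script:
    # tox is preinstalled in the image
    - tox -e py36
```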
Here's the project repo: https://gitlab.com/python-devs/ci-images We're publishing this image automatically to quay.io, so you can pull the image in a .gitlab-ci.yml file to run tests against all these versions of Python. Here's an example from the importlib_resources project: https://gitlab.com/python-devs/importlib_resources/blob/master/.gitlab-ci.yml We welcome contributors on the ci-images GitLab project! Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From rymg19 at gmail.com Sun Dec 10 19:36:02 2017 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sun, 10 Dec 2017 18:36:02 -0600 Subject: [Python-Dev] Mostly Official Python Development Container Image In-Reply-To: <1E8F20B3-EB57-4DDA-B635-D018C2D94534@python.org> References: <1E8F20B3-EB57-4DDA-B635-D018C2D94534@python.org> Message-ID: <4EF6749E-F7D7-44B3-8159-DFE8880211A8@gmail.com> Question: why is this using GitLab while CPython itself is using GitHub + Travis? On December 10, 2017 6:13:53 PM CST, Barry Warsaw wrote: >As part of our work on importlib_resources, and with some fantastic >help from Abhilash Raj, we now have a mostly official Python >development container image that you can use for CI testing and other >development purposes. > >This image is based on Ubuntu 16.04 LTS and provides the latest stable >releases of Python 2.7, and 3.4-3.6, along with a mostly up-to-date git >checkout of master, currently Python 3.7 of course. Once 3.7 is >released to beta, we intend to track its release tarballs too. Note >that these are pristine builds of upstream releases, so they don't have >any of the Ubuntu or Debian platform changes. > >We also install a few other commonly needed tools, like pip, git, >unzip, wget, mypy, and tox. > >We do *not* recommend this image for application deployment purposes; >use this for development and testing only, please.
> >Here's the project repo: > >https://gitlab.com/python-devs/ci-images > >We're publishing this image automatically to quay.io, so you can pull >the image in a .gitlab-ci.yml file to run tests against all these >versions of Python. Here's an example from the importlib_resources >project: > >https://gitlab.com/python-devs/importlib_resources/blob/master/.gitlab-ci.yml > >We welcome contributors on the ci-images GitLab project! > >Cheers, >-Barry -- Ryan (????) Yoko Shimomura, ryo (supercell/EGOIST), Hiroyuki Sawano >> everyone else https://refi64.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Sun Dec 10 20:55:51 2017 From: barry at python.org (Barry Warsaw) Date: Sun, 10 Dec 2017 20:55:51 -0500 Subject: [Python-Dev] Mostly Official Python Development Container Image In-Reply-To: <4EF6749E-F7D7-44B3-8159-DFE8880211A8@gmail.com> References: <1E8F20B3-EB57-4DDA-B635-D018C2D94534@python.org> <4EF6749E-F7D7-44B3-8159-DFE8880211A8@gmail.com> Message-ID: On Dec 10, 2017, at 19:36, Ryan Gonzalez wrote: > > Question: why is this using GitLab while CPython itself is using GitHub + Travis? Mostly because Brett gave me the option to use GitLab for importlib_resources, and this grew out of that. Enjoy! overturn-pep-507-ly y'rs, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From victor.stinner at gmail.com Mon Dec 11 06:56:08 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 11 Dec 2017 12:56:08 +0100 Subject: [Python-Dev] Support of the Android platform In-Reply-To: References: Message-ID: 2017-12-10 15:19 GMT+01:00 Xavier de Gaye : > Motivations > =========== > > * Android is ubiquitous. > * This would be the first platform supported by Python that is > cross-compiled, > thanks to many contributors.
> * Although the Android operating system is linux, it is different from most > linux platforms, for example it does not use GNU libc and runs SELinux in > enforcing mode. Therefore supporting this platform would make Python more > robust and also would allow testing it on arm 64-bit processors. > * Python running on Android is also a handheld calculator, a successor of > the > slide rule and the `HP 41`_. I still don't understand what "Android" is. What is the license of Android? > * The Python test suite succeeds when run on Android emulators using > buildbot Great achievement! Congrats! I know that it has been a long journey to reach this point! (Fix each individual test failure, fix many tiny things.) > * Given the CPU resources required to run the test suite on the arm > emulators, > it may be difficult to find a contributed buildbot worker. So it remains > to > find the hardware to run these buildbots. Do you have the hardware to host such a worker? Or are you looking for a host somewhere? What kind of hardware are you looking for? CPU, memory, network bandwidth, etc. > *API 24* > * API 21 is the first version to provide usable support for wide > characters > and where SELinux is run in enforcing mode. Some people are looking for API 19 support. Would it be doable, or would it require too many changes? I know that people are running heavily patched Python 2.7 and 3.5 on Android with API 19. I'm not asking for "full support" for API 19, but rather whether it would be possible to get a "best effort" level of support, like accepting patches if someone writes them. > The following extension modules are disabled by adding them to the > ``*disabled*`` section of ``Modules/Setup``: > > * ``_uuid``, Android has no uuid/uuid.h header. > * ``grp``, some grp.h functions are not declared. > * ``_crypt``, Android does not have crypt.h. > * ``_ctypes`` on x86_64 where all long double tests fail (`bpo-32202`_) and > on > arm64 (see `bpo-32203`_). That's a very short list, it's ok.
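[Editor's note: for readers unfamiliar with the mechanism, modules listed after the ``*disabled*`` marker in ``Modules/Setup`` are excluded from the build, so the section described above looks roughly like this sketch.]

```
# Modules/Setup excerpt (sketch): names listed after *disabled* are not built.
*disabled*
_uuid
grp
_crypt
_ctypes
```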
It's not the most popular modules of the stdlib :-) ctypes would be nice to have, but it can be done later. Victor From xdegaye at gmail.com Mon Dec 11 07:52:01 2017 From: xdegaye at gmail.com (Xavier de Gaye) Date: Mon, 11 Dec 2017 13:52:01 +0100 Subject: [Python-Dev] Support of the Android platform In-Reply-To: References: Message-ID: On 12/10/2017 09:27 PM, Brett Cannon wrote: > While the note from a technical standpoint is interest, Xavier, I don't quite see what needs to be done to support Android at this point. Are you simply asking we add Android API 24 as an official > platform? Or permission to add your note to the Misc/ directory? Basically what are you wanting to see happen? :) This is mainly a proposal written in the form of a PEP and open for discussion, not a PEP because it is not required by PEP 11 but it may become one. I guess the discussion may be about the choice of the API level, or whether the buildbot should run on emulators or on devices (not currently supported by the build system) or on any other subject. What is left to be done to support Android at this point is to merge the PR that implements the build system, make the change in the buildbot buildmaster-config repo and test it on a worker and then find the hardware to run all the workers for these architectures and set it up. Xavier From xdegaye at gmail.com Mon Dec 11 08:00:00 2017 From: xdegaye at gmail.com (Xavier de Gaye) Date: Mon, 11 Dec 2017 14:00:00 +0100 Subject: [Python-Dev] Support of the Android platform In-Reply-To: References: Message-ID: <6e8feb1c-3a28-4927-deff-7aac97050cc6@gmail.com> On 12/10/2017 10:27 PM, Skip Montanaro wrote: > I'm not familiar with software development on/for Android, but > wouldn't official support also involve suitable package creation or > does that just fall out for free from the build-for-emulator PR? 
> > Skip > The build-for-emulator PR allows building for a device but the packaging of an Android application into an installable APK (a zip file actually) is the responsability of the java application developer. Xavier From xdegaye at gmail.com Mon Dec 11 08:13:06 2017 From: xdegaye at gmail.com (Xavier de Gaye) Date: Mon, 11 Dec 2017 14:13:06 +0100 Subject: [Python-Dev] Support of the Android platform In-Reply-To: <8495B954-5762-43FA-BAA6-681C45EEF2CD@python.org> References: <8495B954-5762-43FA-BAA6-681C45EEF2CD@python.org> Message-ID: On 12/10/2017 11:07 PM, Ned Deily wrote: > On the one hand, it would be nice to have. But on the other, it does add a large non-zero burden to all core developers and to the release teams, to the minimum extent of trying to make sure that all ongoing changes don't break platform support. Yes this platform has few quirks especially when SELinux is involved and some tests are harder to write because of that :-( Xavier From xdegaye at gmail.com Mon Dec 11 08:58:52 2017 From: xdegaye at gmail.com (Xavier de Gaye) Date: Mon, 11 Dec 2017 14:58:52 +0100 Subject: [Python-Dev] Support of the Android platform In-Reply-To: References: Message-ID: On 12/11/2017 12:56 PM, Victor Stinner wrote: > 2017-12-10 15:19 GMT+01:00 Xavier de Gaye : >> * Given the cpu resources required to run the test suite on the arm >> emulators, >> it may be difficult to find a contributed buildbot worker. So it remains >> to >> find the hardware to run these buildbots. > > Do you have the hardware to host such worker? Or are you looking for a > host somewhere? Which kind of hardware are you looking for? CPU, > memory, network bandwidth, etc. I cannot host the buildbots or any buildbot for that matter. I can maintain them. The host running the buildbots must be able to run 6 (i.e. 3 x (version 3.x + maintenance version)) emulators simultaneously, so with an eight core cpu, that will be 6 cores running at 100%. 
The armv7 and arm64 buildbot may be set to run only daily but the tests last a long time on these architectures anyway. >> *API 24* >> * API 21 is the first version to provide usable support for wide >> characters >> and where SELinux is run in enforcing mode. > > Some people are looking for API 19 support. Would it be doable, or > would it require too many changes? I know that people are running > heavily patched Python 2.7 and 3.5 on Android with API 19. > > I'm not asking for a "full support" for API 19, but more if it would > be possible to get a "best effort" level of support, like accept > patches if someone writes them. Not sure that python can be built on API 19. What I remember about API 19 at the time I started this project, is that wide characters support is not usable. If you look at the Android version history [1] on Wikipedia referred to in my initial post, the Kit Kat (API 19) share is 16 % now and will probably be 8 % next year. Another point to consider is that working on a change specific to Android is tedious: the test case must be ok on the build platform and on the emulator. The emulator must be started and an installation made from scratch, and after few file modifications on the emulator there is no 'git status' command to tell exactly what change you are running and you must re-install from scratch. Is there a way to browse these patches to get a better idea of the changes involved ? Xavier [1] https://en.wikipedia.org/wiki/Android_version_history From victor.stinner at gmail.com Mon Dec 11 09:40:19 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 11 Dec 2017 15:40:19 +0100 Subject: [Python-Dev] Support of the Android platform In-Reply-To: References: Message-ID: 2017-12-11 14:58 GMT+01:00 Xavier de Gaye : > The host running the buildbots must be able to run 6 (i.e. 3 x (version 3.x > + maintenance version)) emulators simultaneously, so with an eight core cpu, > that will be 6 cores running at 100%. 
The armv7 and arm64 buildbot may be > set to run only daily but the tests last a long time on these architectures > anyway. What do you mean by "maintenance version"? Do you want to add Android support to Python 2.7 and 3.6 as well? I would prefer to only support Android since the master branch. Would it be possible to only run an emulator to run a test, and then stop it? So we could test many combo in the same host? Victor From victor.stinner at gmail.com Mon Dec 11 10:14:07 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 11 Dec 2017 16:14:07 +0100 Subject: [Python-Dev] Support of the Android platform In-Reply-To: References: Message-ID: I'm asking for precise hardware specifications since Red Hat may be able to provide one through the https://osci.io/ program. Victor 2017-12-11 15:40 GMT+01:00 Victor Stinner : > 2017-12-11 14:58 GMT+01:00 Xavier de Gaye : >> The host running the buildbots must be able to run 6 (i.e. 3 x (version 3.x >> + maintenance version)) emulators simultaneously, so with an eight core cpu, >> that will be 6 cores running at 100%. The armv7 and arm64 buildbot may be >> set to run only daily but the tests last a long time on these architectures >> anyway. > > What do you mean by "maintenance version"? Do you want to add Android > support to Python 2.7 and 3.6 as well? I would prefer to only support > Android since the master branch. > > Would it be possible to only run an emulator to run a test, and then > stop it? So we could test many combo in the same host? > > Victor From xdegaye at gmail.com Mon Dec 11 10:40:18 2017 From: xdegaye at gmail.com (Xavier de Gaye) Date: Mon, 11 Dec 2017 16:40:18 +0100 Subject: [Python-Dev] Support of the Android platform In-Reply-To: References: Message-ID: <97a8b0e7-d58c-8fa8-57b7-49fa442ec732@gmail.com> On 12/11/2017 03:40 PM, Victor Stinner wrote: > 2017-12-11 14:58 GMT+01:00 Xavier de Gaye : >> The host running the buildbots must be able to run 6 (i.e. 
3 x (version 3.x >> + maintenance version)) emulators simultaneously, so with an eight core cpu, >> that will be 6 cores running at 100%. The armv7 and arm64 buildbot may be >> set to run only daily but the tests last a long time on these architectures >> anyway. > > What do you mean by "maintenance version"? Do you want to add Android > support to Python 2.7 and 3.6 as well? I would prefer to only support > Android since the master branch. Yes, today only the master branch, and when 3.8 is released then both 3.7 and 3.8 would be supported. That is what I (implicitly) meant. > Would it be possible to only run an emulator to run a test, and then > stop it? So we could test many combo in the same host? I may not understand your question. An emulator runs with an AVD image (Android Virtual Device) that is specific to the API and architecture (armv7, x86_64, ...), and that contains the data (file systems) and the configuration. The AVD image is created once and for all for each of the combinations (API, architecture). You start the emulator with the AVD that matches the type of the built python (API, architecture) and install that build. Multiple emulators can run simultaneously except that the build system imposes the restriction that no two identical (API, architecture) emulators can run concurrently except if they are not used for the same Python version (emulators are identified by their port numbers by the Android tools and the build system allocates those ports statically). Xavier From xdegaye at gmail.com Mon Dec 11 11:41:42 2017 From: xdegaye at gmail.com (Xavier de Gaye) Date: Mon, 11 Dec 2017 17:41:42 +0100 Subject: [Python-Dev] What does Android support mean? In-Reply-To: References: Message-ID: On 12/11/2017 03:58 PM, Carl Bordum Hansen wrote: > > I've been lurking at your progress with android support for about a year, and now that it is closing in I simply have to ask: what does it actually mean that android is supported? 
That Android apps > will be easy to develop in Python? That I can easily run Python script or a REPL on my phone? Hello Carl, It means that we will make sure that changes made to Python to fix it or to improve it will not break on Android for the supported API level and architectures. One can already run Python scripts or the interactive interpreter and install packages with pip using termux, a great application. Kivy can be used to develop in Python apps that work on Android and you may be interested in the Python Mobile SIG [1] mailing list or in the VOC [2] transpiler that converts Python code into Java bytecode. Xavier [1] https://www.python.org/community/sigs/current/mobile-sig/ [2] https://github.com/pybee/voc From chris.barker at noaa.gov Mon Dec 11 12:10:57 2017 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 11 Dec 2017 09:10:57 -0800 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> Message-ID: <3418511732122395686@unknownmsgid> . I see a couple of options: 1a: Use a default type annotation, if one is not is supplied. typing.Any would presumably make the most sense. 1b: Use None if not type is supplied. 2: Rework the code to not require annotations at all. I think I'd prefer 1a, since it's easy. 2) would be great :-) I find this bit of ?typing creep? makes me nervous? Typing should Never be required! I understand that the intent here is that the user could ignore typing and have it all still work. But I?d rather is was not still there under the hood. Just because standardized way to do something is included in core Python doesn?t mean the standard library has to use it. However, typing is not currently imported by dataclasses.py. 
And there you have an actual reason besides my uneasiness :-) - CHB -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Dec 11 14:48:40 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 11 Dec 2017 11:48:40 -0800 Subject: [Python-Dev] Last call for PEP approvals before the holidays Message-ID: After this Friday I'm going to go on vacation for a couple of weeks. Anyone who has a PEP for which they're awaiting acceptance and that acceptance needs to happen before the 3.7 feature freeze (a.k.a. 3.7b1, scheduled for January 29 2018, see PEP 537) please ping me with a pointer to the thread on python-dev (or occasionally on another list). If acceptance waits until the new year it's less likely to happen in time. OTOH if your PEP is not time-sensitive it's probably better to wait until after the 3.7b1 release. PEPs I am already aware of: - 565 (DeprecationWarning) -- I'm going to accept it - 550 (Context) -- I'm waiting for a new (simpler) draft by Yury - 561 (type checker module search) -- not time-sensitive, I'm confident that the work is going well regardless of acceptance - 554 (multiple interpreters) -- postponed to 3.8 - 558 (locals()) -- I think this will have to be postponed -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Dec 11 15:10:04 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 11 Dec 2017 12:10:04 -0800 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3) In-Reply-To: References: Message-ID: Congrats Victor! Thanks mr. Inada for reviewing this PEP (and 538). Thanks everyone else who participated in the lively discussion! On Sun, Dec 10, 2017 at 4:00 PM, INADA Naoki wrote: > > > > Could you explain why not? utf-8 seems like the common thread for using > > surrogateescape so I'm not sure what would make en_US.UTF-8 different > than > > C.UTF-8. 
> > > > Because there are many lang_COUNTRY.UTF-8 locales: > ja_JP.UTF-8, zh_TW.UTF-8, fr_FR.UTF-8, etc... > > If only en_US.UTF-8 should use surrogateescape, it may make confusing > situation > like: "This script works in English Linux desktop, but doesn't work in > Japanese Linux > desktop!" > > I accepted PEP 540. So even if failed to coerce locale, it is better > than Python 3.6. > > Regards, > > INADA Naoki > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Mon Dec 11 15:01:31 2017 From: eric at trueblade.com (Eric V. Smith) Date: Mon, 11 Dec 2017 15:01:31 -0500 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> Message-ID: On 12/10/2017 5:00 PM, Raymond Hettinger wrote: > > >> On Dec 10, 2017, at 1:37 PM, Eric V. Smith wrote: >> >> On 12/10/2017 4:29 PM, Ivan Levkivskyi wrote: >>> On 10 December 2017 at 22:24, Raymond Hettinger > wrote: >>> Without typing (only the first currently works): >>> Point = namedtuple('Point', ['x', 'y', 'z']) # >>> underlying store is a tuple >>> Point = make_dataclass('Point', ['x', 'y', 'z']) # >>> underlying store is an instance dict >>> Hm, I think this is a bug in implementation. The second form should also work. >> >> Agreed. >> >> I have a bunch of pending changes for dataclasses. I'll add this. This is bpo-32278. 
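[Editor's note: for reference, this is how both spellings behave once these bugs were fixed, as they were in the dataclasses module released with Python 3.7, where string field names default to a ``typing.Any`` annotation; a quick sketch:]

```python
from collections import namedtuple
from dataclasses import make_dataclass, fields

# Underlying store is a tuple:
PointNT = namedtuple('Point', ['x', 'y', 'z'])

# Underlying store is an instance dict; order=True also exercises the
# keyword pass-through of bpo-32279:
PointDC = make_dataclass('Point', ['x', 'y', 'z'], order=True)

p = PointDC(1, 2, 3)
assert (p.x, p.y, p.z) == (1, 2, 3)
assert [f.name for f in fields(PointDC)] == ['x', 'y', 'z']
assert PointDC(1, 2, 3) < PointDC(1, 2, 4)  # order=True adds comparisons
```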
> I suggest two other fix-ups: > > 1) Let make_dataclass() pass through keyword arguments to _process_class(), so that this will work: > > Point = make_dataclass('Point', ['x', 'y', 'z'], order=True) This is bpo-32279. > > 2) Change the default value for "hash" from "None" to "False". This might take a little effort because there is currently an oddity where setting hash=False causes it to be hashable. I'm pretty sure this wasn't intended ;-) No time for this one yet. Soon! Eric. From levkivskyi at gmail.com Mon Dec 11 15:15:34 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Mon, 11 Dec 2017 21:15:34 +0100 Subject: [Python-Dev] Last call for PEP approvals before the holidays In-Reply-To: References: Message-ID: There is also PEP 544 (Structural subtyping, a.k.a. static duck typing), but I think we discussed off-list that it is also not time-sensitive, given the (limited) provisional status of the typing module. (Also mypy already supports it, so the question is mainly when this support is official after polishing a few remaining issues.) -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Dec 11 21:18:49 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 12 Dec 2017 12:18:49 +1000 Subject: [Python-Dev] Last call for PEP approvals before the holidays In-Reply-To: References: Message-ID: On 12 Dec. 2017 8:52 am, "Guido van Rossum" wrote: - 558 (locals()) -- I think this will have to be postponed +1 for deferring that one to 3.8. While I like where it's heading now, it isn't urgent, and we already have quite a few refactorings of low-level internals landing in 3.7. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Dec 11 21:25:17 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 12 Dec 2017 12:25:17 +1000 Subject: [Python-Dev] Is static typing still optional?
In-Reply-To: <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> Message-ID: On 11 Dec. 2017 12:26 pm, "Eric V. Smith" wrote: I see a couple of options: 1a: Use a default type annotation, if one is not supplied. typing.Any would presumably make the most sense. 1b: Use None if no type is supplied. 2: Rework the code to not require annotations at all. 1c: annotate with the string "typing.Any" (this may require a tweak to the rules for evaluating lazy annotations, though) Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From desmoulinmichel at gmail.com Tue Dec 12 03:39:17 2017 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Tue, 12 Dec 2017 09:39:17 +0100 Subject: [Python-Dev] What's the status of PEP 505: None-aware operators? In-Reply-To: <718169B1-C1AE-4FCA-92E7-D4E246096612@python.org> References: <28D91255-56A9-4CC2-B45D-F83ECD715544@langa.pl> <50C74ECC-D462-4FAC-8A9C-A12BC939ADEB@gmail.com> <1511974989.903268.1188352080.36BF6ADC@webmail.messagingengine.com> <718169B1-C1AE-4FCA-92E7-D4E246096612@python.org> Message-ID: <1072df21-2646-371b-3c11-6329793d29cb@gmail.com> On 29/11/2017 at 19:02, Barry Warsaw wrote: > On Nov 29, 2017, at 12:40, David Mertz wrote: >> I think some syntax could be possible to only "catch" some exceptions and let others propagate. Maybe: >> >> val = name.strip()[4:].upper() except (AttributeError, KeyError): -1 >> >> I don't really like throwing a colon in an expression though. Perhaps some other word or symbol could work instead.
How does this read: >> >> val = name.strip()[4:].upper() except -1 in (AttributeError, KeyError) > > I don't know whether I like any of this but I think a more natural spelling would be: > > val = name.strip()[4:].upper() except (AttributeError, KeyError) as -1 > > which could devolve into: > > val = name.strip()[4:].upper() except KeyError as -1 > > or: > > val = name.strip()[4:].upper() except KeyError # Implicit `as None` > > I would *not* add any spelling for an explicit bare-except equivalent. You would have to write: > > val = name.strip()[4:].upper() except Exception as -1 > > Cheers, > -Barry > I really like this one. It's way more general. I can see a use for IndexError as well (lists don't have the dict.get() method). Also I would prefer not to use "as" this way. In the context of an exception, "as" already binds the exception to a variable, so it's confusing. What about: val = name.strip()[4:].upper() except Exception: -1 From rosuav at gmail.com Tue Dec 12 03:48:42 2017 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 12 Dec 2017 19:48:42 +1100 Subject: [Python-Dev] What's the status of PEP 505: None-aware operators? In-Reply-To: <1072df21-2646-371b-3c11-6329793d29cb@gmail.com> References: <28D91255-56A9-4CC2-B45D-F83ECD715544@langa.pl> <50C74ECC-D462-4FAC-8A9C-A12BC939ADEB@gmail.com> <1511974989.903268.1188352080.36BF6ADC@webmail.messagingengine.com> <718169B1-C1AE-4FCA-92E7-D4E246096612@python.org> <1072df21-2646-371b-3c11-6329793d29cb@gmail.com> Message-ID: On Tue, Dec 12, 2017 at 7:39 PM, Michel Desmoulin wrote: > > > On 29/11/2017 at 19:02, Barry Warsaw wrote: >> On Nov 29, 2017, at 12:40, David Mertz wrote: >>> I think some syntax could be possible to only "catch" some exceptions and let others propagate. Maybe: >>> >>> val = name.strip()[4:].upper() except (AttributeError, KeyError): -1 >>> >>> I don't really like throwing a colon in an expression though. Perhaps some other word or symbol could work instead.
>>> How does this read:
>>>
>>> val = name.strip()[4:].upper() except -1 in (AttributeError, KeyError)
>>
>> I don't know whether I like any of this but I think a more natural spelling would be:
>>
>> val = name.strip()[4:].upper() except (AttributeError, KeyError) as -1
>>
>> which could devolve into:
>>
>> val = name.strip()[4:].upper() except KeyError as -1
>>
>> or:
>>
>> val = name.strip()[4:].upper() except KeyError  # Implicit `as None`
>>
>> I would *not* add any spelling for an explicit bare-except equivalent. You would have to write:
>>
>> val = name.strip()[4:].upper() except Exception as -1
>>
>> Cheers,
>> -Barry
>>
>
> I really like this one. It's way more general. I can see a use for
> IndexError as well (lists don't have the dict.get() method).
>
> Also I would prefer not to use "as" this way. In the context of an
> exception, "as" already binds the exception to a variable so it's confusing.
>
> What about:
>
>
> val = name.strip()[4:].upper() except Exception: -1

That happens to be the exact syntax recommended by PEP 463 (modulo some distinguishing parentheses).

https://www.python.org/dev/peps/pep-0463/

ChrisA

From eric at trueblade.com Tue Dec 12 04:46:34 2017
From: eric at trueblade.com (Eric V. Smith)
Date: Tue, 12 Dec 2017 04:46:34 -0500
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To: 
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com>
Message-ID: <71e5af1a-5213-44d8-7b84-e86bb602999d@trueblade.com>

On 12/11/2017 9:25 PM, Nick Coghlan wrote:
> On 11 Dec. 2017 12:26 pm, "Eric V. Smith" wrote:
>
> I see a couple of options:
> 1a: Use a default type annotation, if one is not supplied. typing.Any would presumably make the most sense.
> 1b: Use None if no type is supplied.
> 2: Rework the code to not require annotations at all.
> > 1c: annotate with the string "typing.Any" (this may require a tweak to
> > the rules for evaluating lazy annotations, though)

Good idea, since it needs to be supported anyway, especially in light of PEP 563.

Eric.

From xdegaye at gmail.com Tue Dec 12 04:50:53 2017
From: xdegaye at gmail.com (Xavier de Gaye)
Date: Tue, 12 Dec 2017 10:50:53 +0100
Subject: [Python-Dev] Support of the Android platform
In-Reply-To: 
References: 
Message-ID: 

On 12/11/2017 04:14 PM, Victor Stinner wrote:
> I'm asking for precise hardware specifications since Red Hat may be
> able to provide one through the https://osci.io/ program.

Is it acceptable to run the arm buildbots only on a weekly basis?

For all the architectures (x86_64, armv7 and arm64), the tests are run with the same Python code, built with the same tools, and on the same operating system running on emulators with the same configuration. If we do not take into account the ctypes issue, there is only one issue (issue 26939 [1]) among all the Android issues that is specific to arm, and it is not even really an arm issue but one related to the slowness of the emulator, which has been fixed by increasing the switch interval in one test with sys.setswitchinterval().

Xavier

[1] https://bugs.python.org/issue26939

From solipsis at pitrou.net Tue Dec 12 09:02:14 2017
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 12 Dec 2017 15:02:14 +0100
Subject: [Python-Dev] Support of the Android platform
References: 
Message-ID: <20171212150214.7439783d@fsol>

On Tue, 12 Dec 2017 10:50:53 +0100
Xavier de Gaye wrote:
> On 12/11/2017 04:14 PM, Victor Stinner wrote:
> > I'm asking for precise hardware specifications since Red Hat may be
> > able to provide one through the https://osci.io/ program.
>
> Is it acceptable to run the arm buildbots only on a weekly basis?

It sounds reasonable to me, as long as someone is monitoring their results (that would probably be you).

Regards

Antoine.
From victor.stinner at gmail.com Tue Dec 12 10:36:47 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 12 Dec 2017 16:36:47 +0100
Subject: [Python-Dev] Support of the Android platform
In-Reply-To: <20171212150214.7439783d@fsol>
References: <20171212150214.7439783d@fsol>
Message-ID: 

2017-12-12 15:02 GMT+01:00 Antoine Pitrou :
> It sounds reasonable to me, as long as someone is monitoring their
> results (that would probably be you).

There is now the buildbot-status list where failures are reported. Trust me, I like making a lot of noise when something is broken :-)

FYI I'm trying https://github.com/python/cpython/pull/1629 : I managed to "cross-compile" a binary on Linux for Android. I'm now downloading an ISO to boot an x86-64 Android VM to test this binary ;-)

Victor

From yselivanov.ml at gmail.com Tue Dec 12 12:33:24 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 12 Dec 2017 12:33:24 -0500
Subject: [Python-Dev] PEP 567 -- Context Variables
Message-ID: 

Hi,

This is a new proposal to implement context storage in Python. It's a successor to PEP 550 and builds on some of its API ideas and data structures. Contrary to PEP 550, though, this proposal only focuses on adding new APIs and implementing support for them in asyncio. There are no changes to the interpreter or to the behaviour of generator or coroutine objects.

PEP: 567
Title: Context Variables
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Dec-2017
Python-Version: 3.7
Post-History: 12-Dec-2017

Abstract
========

This PEP proposes the new ``contextvars`` module and a set of new CPython C APIs to support context variables. This concept is similar to thread-local variables but, unlike TLS, it allows correctly keeping track of values per asynchronous task, e.g. ``asyncio.Task``.

This proposal builds directly upon concepts originally introduced in :pep:`550`.
The key difference is that this PEP is only concerned with solving the case for asynchronous tasks, not generators. There are no proposed modifications to any built-in types or to the interpreter.

Rationale
=========

Thread-local variables are insufficient for asynchronous tasks that execute concurrently in the same OS thread. Any context manager that needs to save and restore a context value and uses ``threading.local()`` will have its context values bleed to other code unexpectedly when used in async/await code.

A few examples where having a working context-local storage for asynchronous code is desired:

* Context managers like decimal contexts and ``numpy.errstate``.

* Request-related data, such as security tokens and request data in web applications, language context for ``gettext``, etc.

* Profiling, tracing, and logging in large code bases.

Introduction
============

The PEP proposes a new mechanism for managing context variables. The key classes involved in this mechanism are ``contextvars.Context`` and ``contextvars.ContextVar``. The PEP also proposes some policies for using the mechanism around asynchronous tasks.

The proposed mechanism for accessing context variables uses the ``ContextVar`` class. A module (such as decimal) that wishes to store a context variable should:

* declare a module-global variable holding a ``ContextVar`` to serve as a "key";

* access the current value via the ``get()`` method on the key variable;

* modify the current value via the ``set()`` method on the key variable.

The notion of "current value" deserves special consideration: different asynchronous tasks that exist and execute concurrently may have different values. This idea is well-known from thread-local storage, but in this case the value is not necessarily local to a thread. Instead, there is the notion of the "current ``Context``", which is stored in thread-local storage and is accessed via the ``contextvars.get_context()`` function.
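To make the "bleeding" problem from the Rationale concrete, the following minimal sketch (an editorial illustration, not part of the PEP) shows a ``threading.local()`` value leaking between two asyncio tasks that share one OS thread:

```python
# Editorial illustration (not part of the PEP): a threading.local()
# value "bleeds" between asyncio tasks because they share one OS thread.
import asyncio
import threading

state = threading.local()
observed = []

async def handler(name, value):
    state.value = value        # this task's "context" value
    await asyncio.sleep(0)     # yield control to the event loop
    # Another task has run in the meantime and overwritten the
    # thread-local, so both tasks now see the same (wrong) value.
    observed.append((name, state.value))

async def main():
    await asyncio.gather(handler("a", 1), handler("b", 2))

asyncio.run(main())
print(observed)  # [('a', 2), ('b', 2)] -- task "a" sees task "b"'s value
```

With the ``contextvars`` API proposed here, each task runs in its own ``Context``, so the two handlers would no longer interfere.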
Manipulation of the current ``Context`` is the responsibility of the task framework, e.g. asyncio.

A ``Context`` is conceptually a mapping, implemented using an immutable dictionary. The ``ContextVar.get()`` method does a lookup in the current ``Context`` with ``self`` as a key, raising a ``LookupError`` or returning a default value specified in the constructor.

The ``ContextVar.set(value)`` method clones the current ``Context``, assigns the ``value`` to it with ``self`` as a key, and sets the new ``Context`` as the new current one. Because ``Context`` uses an immutable dictionary, cloning it is O(1).

Specification
=============

A new standard library module ``contextvars`` is added with the following APIs:

1. The ``get_context() -> Context`` function is used to get the current ``Context`` object for the current OS thread.

2. The ``ContextVar`` class is used to declare and access context variables.

3. The ``Context`` class encapsulates context state. Every OS thread stores a reference to its current ``Context`` instance. It is not possible to control that reference manually. Instead, the ``Context.run(callable, *args)`` method is used to run Python code in another context.

contextvars.ContextVar
----------------------

The ``ContextVar`` class has the following constructor signature: ``ContextVar(name, *, default=no_default)``. The ``name`` parameter is used only for introspection and debug purposes. The ``default`` parameter is optional. Example::

    # Declare a context variable 'var' with the default value 42.
    var = ContextVar('var', default=42)

``ContextVar.get()`` returns the value of the context variable from the current ``Context``::

    # Get the value of `var`.
    var.get()

``ContextVar.set(value) -> Token`` is used to set a new value for the context variable in the current ``Context``::

    # Set the variable 'var' to 1 in the current context.
    var.set(1)

``contextvars.Token`` is an opaque object that should be used to restore the ``ContextVar`` to its previous value, or to remove it from the context if it was not set before. The ``ContextVar.reset(Token)`` method is used for that::

    old = var.set(1)
    try:
        ...
    finally:
        var.reset(old)

The ``Token`` API exists to make the current proposal forward compatible with :pep:`550`, in case there is demand to support context variables in generators and asynchronous generators in the future.

The ``ContextVar`` design allows for a fast implementation of ``ContextVar.get()``, which is particularly important for modules like ``decimal`` and ``numpy``.

contextvars.Context
-------------------

``Context`` objects are mappings of ``ContextVar`` to values.

To get the current ``Context`` for the current OS thread, use the ``contextvars.get_context()`` function::

    ctx = contextvars.get_context()

To run Python code in some ``Context``, use the ``Context.run()`` method::

    ctx.run(function)

Any changes to any context variables that ``function`` causes will be contained in the ``ctx`` context::

    var = ContextVar('var')
    var.set('spam')

    def function():
        assert var.get() == 'spam'

        var.set('ham')
        assert var.get() == 'ham'

    ctx = get_context()
    ctx.run(function)

    assert var.get('spam')

Any changes to the context will be contained in, and persisted by, the ``Context`` object on which ``run()`` is called.

``Context`` objects implement the ``collections.abc.Mapping`` ABC. This can be used to introspect context objects::

    ctx = contextvars.get_context()

    # Print all context variables and their values in 'ctx':
    print(ctx.items())

    # Print the value of 'some_variable' in context 'ctx':
    print(ctx[some_variable])

asyncio
-------

``asyncio`` uses ``Loop.call_soon()``, ``Loop.call_later()``, and ``Loop.call_at()`` to schedule the asynchronous execution of a function. ``asyncio.Task`` uses ``call_soon()`` to run the wrapped coroutine.
We modify ``Loop.call_{at,later,soon}`` to accept the new optional *context* keyword-only argument, which defaults to the current context::

    def call_soon(self, callback, *args, context=None):
        if context is None:
            context = contextvars.get_context()

        # ... some time later
        context.run(callback, *args)

Tasks in asyncio need to maintain their own isolated context. ``asyncio.Task`` is modified as follows::

    class Task:
        def __init__(self, coro):
            ...
            # Get the current context snapshot.
            self._context = contextvars.get_context()
            self._loop.call_soon(self._step, context=self._context)

        def _step(self, exc=None):
            ...
            # Every advance of the wrapped coroutine is done in
            # the task's context.
            self._loop.call_soon(self._step, context=self._context)
            ...

CPython C API
-------------

TBD

Implementation
==============

This section explains high-level implementation details in pseudo-code. Some optimizations are omitted to keep this section short and clear.

The internal immutable dictionary for ``Context`` is implemented using Hash Array Mapped Tries (HAMT). They allow for an O(log N) ``set`` operation and an O(1) ``get_context()`` function. For the purposes of this section, we implement an immutable dictionary using ``dict.copy()``::

    class _ContextData:

        def __init__(self):
            self.__mapping = dict()

        def get(self, key):
            return self.__mapping[key]

        def set(self, key, value):
            copy = _ContextData()
            copy.__mapping = self.__mapping.copy()
            copy.__mapping[key] = value
            return copy

        def delete(self, key):
            copy = _ContextData()
            copy.__mapping = self.__mapping.copy()
            del copy.__mapping[key]
            return copy

Every OS thread has a reference to the current ``_ContextData``.
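The copy-on-write behaviour of this ``dict.copy()`` sketch can be checked directly — earlier snapshots must not observe later ``set()`` calls (an editorial example, not part of the PEP):

```python
# A runnable check of the copy-on-write semantics sketched above.
# (Editorial illustration; the real implementation uses a HAMT, but
# the observable behaviour of the dict.copy() version is the same.)

class _ContextData:
    def __init__(self):
        self.__mapping = {}

    def get(self, key):
        return self.__mapping[key]

    def set(self, key, value):
        copy = _ContextData()
        copy.__mapping = self.__mapping.copy()
        copy.__mapping[key] = value
        return copy  # a new snapshot; `self` is left untouched

data1 = _ContextData().set("var", "spam")
data2 = data1.set("var", "ham")

print(data1.get("var"))  # spam -- the earlier snapshot is unaffected
print(data2.get("var"))  # ham
```

This immutability is what makes ``ContextVar.set()`` O(1)-cheap to snapshot: each ``set()`` produces a new mapping, and previously captured contexts keep seeing their old values.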
``PyThreadState`` is updated with a new ``context_data`` field that points to a ``_ContextData`` object::

    PyThreadState:
        context_data : _ContextData

``contextvars.get_context()`` is implemented as follows::

    def get_context():
        ts : PyThreadState = PyThreadState_Get()

        if ts.context_data is None:
            ts.context_data = _ContextData()

        ctx = Context()
        ctx.__data = ts.context_data
        return ctx

``contextvars.Context`` is a wrapper around ``_ContextData``::

    class Context(collections.abc.Mapping):

        def __init__(self):
            self.__data = _ContextData()

        def run(self, callable, *args):
            ts : PyThreadState = PyThreadState_Get()
            saved_data : _ContextData = ts.context_data

            try:
                ts.context_data = self.__data
                callable(*args)
            finally:
                self.__data = ts.context_data
                ts.context_data = saved_data

        # Mapping API methods are implemented by delegating
        # `get()` and other Mapping calls to `self.__data`.

``contextvars.ContextVar`` interacts with ``PyThreadState.context_data`` directly::

    class ContextVar:

        def __init__(self, name, *, default=NO_DEFAULT):
            self.__name = name
            self.__default = default

        @property
        def name(self):
            return self.__name

        def get(self, default=NO_DEFAULT):
            ts : PyThreadState = PyThreadState_Get()
            data : _ContextData = ts.context_data

            try:
                return data.get(self)
            except KeyError:
                pass

            if default is not NO_DEFAULT:
                return default

            if self.__default is not NO_DEFAULT:
                return self.__default

            raise LookupError

        def set(self, value):
            ts : PyThreadState = PyThreadState_Get()
            data : _ContextData = ts.context_data

            try:
                old_value = data.get(self)
            except KeyError:
                old_value = NO_VALUE

            ts.context_data = data.set(self, value)
            return Token(self, old_value)

        def reset(self, token):
            if token.__used:
                return

            ts : PyThreadState = PyThreadState_Get()
            data : _ContextData = ts.context_data

            if token.__old_value is NO_VALUE:
                ts.context_data = data.delete(token.__var)
            else:
                ts.context_data = data.set(token.__var, token.__old_value)

            token.__used = True

    class Token:

        def __init__(self, var, old_value):
            self.__var = var
            self.__old_value = old_value
            self.__used = False

Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.

Libraries that use ``threading.local()`` to store context-related values currently work correctly only for synchronous code. Switching them to use the proposed API will keep their behavior for synchronous code unmodified, but will automatically enable support for asynchronous code.

Appendix: HAMT Performance Analysis
===================================

.. figure:: pep-0550-hamt_vs_dict-v2.png
   :align: center
   :width: 100%

   Figure 1.  Benchmark code can be found here: [1]_.

The above chart demonstrates that:

* HAMT displays near O(1) performance for all benchmarked dictionary sizes.

* ``dict.copy()`` becomes very slow around 100 items.

.. figure:: pep-0550-lookup_hamt.png
   :align: center
   :width: 100%

   Figure 2.  Benchmark code can be found here: [2]_.

Figure 2 compares the lookup costs of ``dict`` versus a HAMT-based immutable mapping. HAMT lookup time is 30-40% slower than Python dict lookups on average, which is a very good result, considering that the latter is very well optimized.

The reference implementation of HAMT for CPython can be found here: [3]_.

References
==========

.. [1] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd

.. [2] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e

.. [3] https://github.com/1st1/cpython/tree/hamt

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From guido at python.org Tue Dec 12 14:03:46 2017
From: guido at python.org (Guido van Rossum)
Date: Tue, 12 Dec 2017 11:03:46 -0800
Subject: [Python-Dev] What's the status of PEP 505: None-aware operators?
In-Reply-To: 
References: <28D91255-56A9-4CC2-B45D-F83ECD715544@langa.pl> <50C74ECC-D462-4FAC-8A9C-A12BC939ADEB@gmail.com> <1511974989.903268.1188352080.36BF6ADC@webmail.messagingengine.com> <718169B1-C1AE-4FCA-92E7-D4E246096612@python.org> <1072df21-2646-371b-3c11-6329793d29cb@gmail.com>
Message-ID: 

And I'll never approve syntax to make it easier to just ignore all exceptions without looking at them.

On Tue, Dec 12, 2017 at 12:48 AM, Chris Angelico wrote:

> On Tue, Dec 12, 2017 at 7:39 PM, Michel Desmoulin wrote:
> >
> >
> > Le 29/11/2017 à 19:02, Barry Warsaw a écrit :
> >> On Nov 29, 2017, at 12:40, David Mertz wrote:
> >>
> >>> I think some syntax could be possible to only "catch" some exceptions
> and let others propagate. Maybe:
> >>>
> >>> val = name.strip()[4:].upper() except (AttributeError, KeyError): -1
> >>>
> >>> I don't really like throwing a colon in an expression though. Perhaps
> some other word or symbol could work instead. How does this read:
> >>>
> >>> val = name.strip()[4:].upper() except -1 in (AttributeError,
> KeyError)
> >>
> >> I don't know whether I like any of this but I think a more
> natural spelling would be:
> >>
> >> val = name.strip()[4:].upper() except (AttributeError, KeyError) as
> -1
> >>
> >> which could devolve into:
> >>
> >> val = name.strip()[4:].upper() except KeyError as -1
> >>
> >> or:
> >>
> >> val = name.strip()[4:].upper() except KeyError  # Implicit `as None`
> >>
> >> I would *not* add any spelling for an explicit bare-except equivalent.
> You would have to write:
> >>
> >> val = name.strip()[4:].upper() except Exception as -1
> >>
> >> Cheers,
> >> -Barry
> >>
> >
> > I really like this one. It's way more general. I can see a use for
> > IndexError as well (lists don't have the dict.get() method).
> >
> > Also I would prefer not to use "as" this way. In the context of an
> > exception, "as" already binds the exception to a variable so it's
> confusing.
> >
> > What about:
> >
> >
> > val = name.strip()[4:].upper() except Exception: -1
>
> That happens to be the exact syntax recommended by PEP 463 (modulo
> some distinguishing parentheses).
>
> https://www.python.org/dev/peps/pep-0463/
>
> ChrisA
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> guido%40python.org

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rosuav at gmail.com Tue Dec 12 14:13:14 2017
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 13 Dec 2017 06:13:14 +1100
Subject: [Python-Dev] What's the status of PEP 505: None-aware operators?
In-Reply-To: 
References: <28D91255-56A9-4CC2-B45D-F83ECD715544@langa.pl> <50C74ECC-D462-4FAC-8A9C-A12BC939ADEB@gmail.com> <1511974989.903268.1188352080.36BF6ADC@webmail.messagingengine.com> <718169B1-C1AE-4FCA-92E7-D4E246096612@python.org> <1072df21-2646-371b-3c11-6329793d29cb@gmail.com>
Message-ID: 

On Wed, Dec 13, 2017 at 6:03 AM, Guido van Rossum wrote:
> And I'll never approve syntax to make it easier to just ignore all
> exceptions without looking at them.

Well, I certainly wouldn't advocate "except Exception: -1", but the syntax is the same as "except KeyError: -1", which is less unreasonable. But PEP 463 was rejected, which means that any proposals along these lines need to first deal with the objections to that PEP, else there's not a lot of point discussing them.

ChrisA

From guido at python.org Tue Dec 12 15:21:32 2017
From: guido at python.org (Guido van Rossum)
Date: Tue, 12 Dec 2017 12:21:32 -0800
Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__
In-Reply-To: 
References: 
Message-ID: 

Nick and Victor, I'm still hoping to accept this PEP, but I don't have time to wrap my head around -Xdev ("devmode"?)
which appears to be Victor's latest pet project. Should PEP 565 be changed to align with devmode's behavior, or the other way around, or should they just ignore each other? It is not clear to me what the status of the mention in PEP 565 of -Xdev is -- normative or informational? I really don't want to have to learn how devmode works in order to be able to accept PEP 565 (or send it back for revision), so I am asking you two to let me know.

On Wed, Dec 6, 2017 at 1:42 AM, Victor Stinner wrote:

> Let's discuss the -Xdev implementation issue at https://bugs.python.org/
> issue32230
>
> In short, -Xdev must add its warning at the end to respect BytesWarning,
> whereas that's not possible with the -W option :-(
>
> Victor
>
> Le 6 déc. 2017 09:15, "Nick Coghlan" a écrit :
>
> On 6 December 2017 at 14:50, Nick Coghlan wrote:
> > On 6 December 2017 at 14:34, Nick Coghlan wrote:
> >> That said, I do agree we could offer easier-to-use APIs to app
> >> developers that just want to hide warnings from their users, so I've
> >> filed https://bugs.python.org/issue32229 to propose a straightforward
> >> "warnings.hide_warnings()" API that encapsulates things like checking
> >> for a non-empty sys.warnoptions list.
> >
> > I've updated the "Limitations" section of the PEP to mention that
> > separate proposal:
> > https://github.com/python/peps/commit/6e93c8d2e6ad698834578d
> 4077b92a8fc84a70f5
>
> Having rebased the PEP 565 patch atop the "-X dev" changes, I think
> that if we don't change some of the details of how `-X dev` is
> implemented, `warnings.hide_warnings` (or a comparable convenience
> API) is going to be a requirement to help app developers effectively
> manage their default warnings settings in 3.7+.
> The problem is that devmode doesn't currently behave the same way
> `-Wd` does when it comes to sys.warnoptions:
>
>     $ ./python -Wd -c "import sys; print(sys.warnoptions); print(sys.flags.dev_mode)"
>     ['d']
>     False
>     $ ./python -X dev -c "import sys; print(sys.warnoptions); print(sys.flags.dev_mode)"
>     []
>     True
>
> As currently implemented, the warnings module actually checks
> `sys.flags.dev_mode` directly during startup (or `sys._xoptions` in
> the case of the pure Python fallback), and populates the warnings
> filter differently depending on what it finds:
>
>     $ ./python -c "import warnings; print('\n'.join(map(str, warnings.filters)))"
>     ('default', None, , '__main__', 0)
>     ('ignore', None, , None, 0)
>     ('ignore', None, , None, 0)
>     ('ignore', None, , None, 0)
>     ('ignore', None, , None, 0)
>     ('ignore', None, , None, 0)
>
>     $ ./python -X dev -c "import warnings; print('\n'.join(map(str, warnings.filters)))"
>     ('ignore', None, , None, 0)
>     ('default', None, , None, 0)
>     ('default', None, , None, 0)
>
>     $ ./python -Wd -c "import warnings; print('\n'.join(map(str, warnings.filters)))"
>     ('default', None, , None, 0)
>     ('default', None, , '__main__', 0)
>     ('ignore', None, , None, 0)
>     ('ignore', None, , None, 0)
>     ('ignore', None, , None, 0)
>     ('ignore', None, , None, 0)
>     ('ignore', None, , None, 0)
>
> This means the app development snippet proposed in the PEP will no
> longer do the right thing, since it will ignore the dev mode flag:
>
>     if not sys.warnoptions:
>         # This still runs for `-X dev`
>         warnings.simplefilter("ignore")
>
> My main suggested fix would be to adjust the way `-X dev` is
> implemented to include `sys.warnoptions.append('default')` (and remove
> the direct dev_mode query from the warnings module code).
> However, another possible way to go would be to make the correct
> Python 3.7+-only snippet look like this:
>
>     import warnings
>     warnings.hide_warnings()
>
> And have the forward-compatible snippet look like this:
>
>     import warnings
>     if hasattr(warnings, "hide_warnings"):
>         # Accounts for `-W`, `-X dev`, and any other
>         # implementation-specific settings
>         warnings.hide_warnings()
>     else:
>         # Only accounts for `-W`
>         import sys
>         if not sys.warnoptions:
>             warnings.simplefilter("ignore")
>
> (We can also do both, of course)
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> guido%40python.org

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From victor.stinner at gmail.com Tue Dec 12 17:58:21 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 12 Dec 2017 23:58:21 +0100
Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__
In-Reply-To: 
References: 
Message-ID: 

Hi,

2017-12-12 21:21 GMT+01:00 Guido van Rossum :
> I'm still hoping to accept this PEP, but I don't have time to wrap my head
> around -Xdev ("devmode"?) which appears to be Victor's latest pet project.
> Should PEP 565 be changed to align with devmode's behavior, or the other way
> around, or should they just ignore each other? It is not clear to me what
> the status of the mention in PEP 565 of -Xdev is -- normative or
> informational? I really don't want to have to learn how devmode works in
> order to be able to accept PEP 565 (or send it back for revision), so I am
> asking you two to let me know.

The warnings filters had a few corner cases. Nick and I discussed fixing them to make them simpler.
We agreed on these priorities for command line options and environment variables:

    -b and -bb > -W > PYTHONWARNINGS > -X dev > default filters

In release mode, the default filters became:

    ignore::DeprecationWarning
    ignore::PendingDeprecationWarning
    ignore::ImportWarning
    ignore::ResourceWarning

ignore::BytesWarning is gone. We now rely on the fact that BytesWarning should not be emitted without -b or -bb in practice.

It has been implemented in https://bugs.python.org/issue32230 ! (I just merged Nick's PR.)

Now -X dev behaves again as in my initial proposal: for warnings, "-X dev" simply behaves as "-W default". (Previously, I had to hack the code to respect the -b and -bb options, but I don't think that it's worth it to explain that here; it doesn't matter anymore ;-))

PEP 565 is still different: it doesn't behave as "-W default", but as "-W default::DeprecationWarning:__main__". Only DeprecationWarning warnings are shown, whereas -X dev shows DeprecationWarning, but also PendingDeprecationWarning, ResourceWarning and ImportWarning. Moreover, -X dev shows warnings in all modules, not only __main__.

You may see -X dev as a builtin linter, whereas PEP 565 seems to be specific to a single issue: displaying deprecation warnings, but only in the __main__ module.

Does it help you to understand the difference?

Note: I still dislike PEP 565, but well, that's just my opinion ;-)

Victor

From guido at python.org Tue Dec 12 18:24:52 2017
From: guido at python.org (Guido van Rossum)
Date: Tue, 12 Dec 2017 15:24:52 -0800
Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__
In-Reply-To: 
References: 
Message-ID: 

OK, in that case I'll just pronounce approval right here. Considered disagreement is acceptable. Nick, congrats on PEP 565! Please update the PEP to mark it as approved, with a link to this message as the resolution, and let's get the implementation into 3.7a4!
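For reference, the behaviour being approved here — a `default::DeprecationWarning:__main__` entry sitting ahead of the blanket `ignore::DeprecationWarning` entry — can be emulated with the stock `warnings` machinery. This is an illustrative sketch of the filter semantics, not the actual interpreter patch:

```python
# Illustrative sketch (not the actual CPython patch): emulating the
# PEP 565 default filters, where `default::DeprecationWarning:__main__`
# sits ahead of the blanket `ignore::DeprecationWarning` entry.
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.resetwarnings()
    # Each filterwarnings() call prepends, so the __main__ rule
    # ends up *before* the blanket ignore, as in the PEP.
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    warnings.filterwarnings("default", category=DeprecationWarning,
                            module="__main__")

    # A DeprecationWarning attributed to library code stays hidden...
    warnings.warn_explicit("dep in a library", DeprecationWarning,
                           filename="somelib.py", lineno=1,
                           module="somelib")
    # ...while the same warning attributed to __main__ is shown.
    warnings.warn_explicit("dep in __main__", DeprecationWarning,
                           filename="script.py", lineno=1,
                           module="__main__")

print([str(w.message) for w in caught])  # ['dep in __main__']
```

In other words, ad-hoc scripts and REPL sessions see their own deprecation warnings again, while warnings triggered inside imported libraries remain silenced by default.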
On Tue, Dec 12, 2017 at 2:58 PM, Victor Stinner wrote:

> Hi,
>
> 2017-12-12 21:21 GMT+01:00 Guido van Rossum :
> > I'm still hoping to accept this PEP, but I don't have time to wrap my head
> > around -Xdev ("devmode"?) which appears to be Victor's latest pet project.
> > Should PEP 565 be changed to align with devmode's behavior, or the other way
> > around, or should they just ignore each other? It is not clear to me what
> > the status of the mention in PEP 565 of -Xdev is -- normative or
> > informational? I really don't want to have to learn how devmode works in
> > order to be able to accept PEP 565 (or send it back for revision), so I am
> > asking you two to let me know.
>
> The warnings filters had a few corner cases. Nick and I discussed fixing
> them to make them simpler. We agreed on these priorities for
> command line options and environment variables:
>
>     -b and -bb > -W > PYTHONWARNINGS > -X dev > default filters
>
> In release mode, the default filters became:
>
>     ignore::DeprecationWarning
>     ignore::PendingDeprecationWarning
>     ignore::ImportWarning
>     ignore::ResourceWarning
>
> ignore::BytesWarning is gone. We now rely on the fact that BytesWarning
> should not be emitted without -b or -bb in practice.
>
> It has been implemented in https://bugs.python.org/issue32230 ! (I
> just merged Nick's PR.)
>
> Now -X dev behaves again as in my initial proposal: for warnings, "-X
> dev" simply behaves as "-W default". (Previously, I had to hack the
> code to respect the -b and -bb options, but I don't think that it's worth
> it to explain that here; it doesn't matter anymore ;-))
>
> PEP 565 is still different: it doesn't behave as "-W default",
> but as "-W default::DeprecationWarning:__main__". Only DeprecationWarning
> warnings are shown, whereas -X dev shows DeprecationWarning, but also
> PendingDeprecationWarning, ResourceWarning and ImportWarning.
> Moreover, -X dev shows warnings in all modules, not only __main__.
> You may see -X dev as a builtin linter, whereas PEP 565 seems to be
> specific to a single issue: displaying deprecation warnings, but
> only in the __main__ module.
>
> Does it help you to understand the difference?
>
> Note: I still dislike PEP 565, but well, that's just my opinion ;-)
>
> Victor

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From victor.stinner at gmail.com Tue Dec 12 18:49:53 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 13 Dec 2017 00:49:53 +0100
Subject: [Python-Dev] PEP 567 -- Context Variables
In-Reply-To: 
References: 
Message-ID: 

Hi Yury,

I like the overall idea and I prefer this PEP over PEP 550 since it's shorter and easier to read :-)

Question: Is there an API to list all context variables?

Would it be possible to have a very short summary of the changes in the PEP? I propose:

"""
* Added the contextvars module with ContextVar, Context and Token classes, and a get_context() function.

* asyncio: Added a keyword-only context parameter to the call_at(), call_later() and call_soon() methods of event loops and to Future.add_done_callback(); Tasks are modified internally to maintain their own isolated context.
"""

Each get_context() call returns a new Context object. It may be worth mentioning that. I understand why, but it's surprising that "assert get_context() is not get_context()" fails. Maybe it's a naming issue? Maybe rename it to contextvars.context()?

> Abstract: ... This concept is similar to thread-local variables but, unlike TLS, ...

nitpick: please write "Thread Local Storage (TLS)". When I read TLS, I understand HTTPS (Transport Layer Security) :-)

Your PEP seems to be written for asyncio. Maybe it would help understanding to make that more explicit in the abstract... even if I understand perfectly that it's not strictly specific to asyncio ;-)

> # Declare a context variable 'var' with the default value 42.
> var = ContextVar('var', default=42) nitpick: I suggest using 'name' rather than 'var', to make it obvious that the first parameter is the variable name. > ``contextvars.Token`` is an opaque object that should be used to > restore the ``ContextVar`` to its previous value, or remove it from > the context if it was not set before. The ``ContextVar.reset(Token)`` > is used for that:: > > old = var.set(1) > try: > ... > finally: > var.reset(old) I don't see where the token is in this example. Does set() return a token object? Yes, according to the ContextVar pseudo-code below. When I read "old", I understand that set() returns the old value, not an opaque token. Maybe rename "old" to "token"? > The ``Token`` API exists to make the current proposal forward > compatible with :pep:`550`, in case there is demand to support > context variables in generators and asynchronous generators in the > future. Cool. I like the idea of starting with something simple in Python 3.7, then extending it in Python 3.8 or later (to support generators), if it becomes popular, once the first simple (but "incomplete", without generators) implementation is battle-tested. > Any changes to any context variables that ``function`` causes, will > be contained in the ``ctx`` context:: > > var = ContextVar('var') > var.set('spam') > > def function(): > assert var.get() == 'spam' > > var.set('ham') > assert var.get() == 'ham' > > ctx = get_context() > ctx.run(function) > > assert var.get('spam') Should I read assert var.get() == 'spam' here? On first read, I understood that ctx.run() creates a new temporary context which is removed once ctx.run() returns. Now I understand that context variable values are restored to their previous values once run() completes. Am I right? Maybe add a short comment to explain that?
# Call function() in the context ctx # and then restore the context variables of ctx to their previous values ctx.run(function) > Backwards Compatibility > ======================= > > This proposal preserves 100% backwards compatibility. Ok. > Libraries that use ``threading.local()`` to store context-related > values, currently work correctly only for synchronous code. Switching > them to use the proposed API will keep their behavior for synchronous > code unmodified, but will automatically enable support for > asynchronous code. I'm confused by this sentence. I suggest removing it :-) Converting code to contextvars makes it immediately backward incompatible, no? I'm not sure that it's a good idea to suggest it in this section. Victor From victor.stinner at gmail.com Tue Dec 12 18:53:15 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 13 Dec 2017 00:53:15 +0100 Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__ In-Reply-To: References: Message-ID: 2017-12-13 0:24 GMT+01:00 Guido van Rossum : > Considered disagreement is acceptable. Sure, I'm fine with that ;-) > Nick, congrats with PEP 565! Please update the PEP to mark it as approved > with a link to this message as the resolution, and let's get the > implementation into 3.7a4! Nick wrote that he will be away, so I updated his PEP: https://github.com/python/peps/commit/355eced94cf4117492c9e1eee8f950f08e53ec90 Victor
Context implements abc.Mapping, so 'get_context().keys()' will give you a list of all ContextVars in the current context. > > Would it be possible to have a very summary of the changes in the PEP? > I propose: > > """ > * Added contextvars module with ContextVar, Context and Token classes, > and a get_context() function > * asyncio: Added keyword-only context parameter to call_at(), > call_later(), call_soon() methods of event loops and > Future.add_done_callback(); Task are modified internally to maintain > their own isolated context. > """ Added. > > Each get_context() call returns a new Context object. It may be worth > to mention it. I understand why, but it's surprising that "assert > get_context() is not get_context()" fails. Maybe it's a naming issue? > Maybe rename it to contextvars.context()? I think the name is fine. While get_context() will return a new instance every time you call it, those instances will have the same context variables/values in them, so I don't think it's a problem. > > >> Abstract: ... This concept is similar to thread-local variables but, unlike TLS, ... > > nitpick: please write "Thread Local Storage (TLS)". When I read TLS, I > understand HTTPS (Transport Layer Security) :-) Fixed. [..] >> ``contextvars.Token`` is an opaque object that should be used to >> restore the ``ContextVar`` to its previous value, or remove it from >> the context if it was not set before. The ``ContextVar.reset(Token)`` >> is used for that:: >> >> old = var.set(1) >> try: >> ... >> finally: >> var.reset(old) > > I don't see where is the token in this example. Does set() return a > token object? Yes according to ContextVar pseudo-code below. > > When I read "old", I understand that set() returns the old value, not > an opaque token. Maybe rename "old" to "token"? Fixed. 
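[Editor's note: for reference, here is the set()/reset() pattern under discussion, runnable against the contextvars module as it eventually shipped in Python 3.7, using the clearer 'token' name adopted above:]

```python
import contextvars

var = contextvars.ContextVar("var", default=42)

token = var.set(1)       # set() returns an opaque Token, not the previous value
try:
    assert var.get() == 1
finally:
    var.reset(token)     # restore the variable's previous state

assert var.get() == 42   # 'var' was never set before, so the default applies again
```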
> > >> The ``Token`` API exists to make the current proposal forward >> compatible with :pep:`550`, in case there is demand to support >> context variables in generators and asynchronous generators in the >> future. > > Cool. I like the idea of starting with something simple in Python 3.7. > Then extend it in Python 3.8 or later (support generators), if it > becomes popular, once the first simple (but "incomplete", without > generators) implementation is battle-tested. > > >> Any changes to any context variables that ``function`` causes, will >> be contained in the ``ctx`` context:: >> >> var = ContextVar('var') >> var.set('spam') >> >> def function(): >> assert var.get() == 'spam' >> >> var.set('ham') >> assert var.get() == 'ham' >> >> ctx = get_context() >> ctx.run(function) >> >> assert var.get('spam') > > Should I read assert var.get() == 'spam' here? Yes, fixed. > > At the first read, I understood that that ctx.run() creates a new > temporary context which is removed once ctx.run() returns. > > Now I understand that context variable values are restored to their > previous values once run() completes. Am I right? ctx.run(func) runs 'func' in the 'ctx' context. Any changes to ContextVars that func makes will stay isolated to the 'ctx' context. > > Maybe add a short comment to explain that? Added. > > # Call function() in the context ctx > # and then restores context variables of ctx to their previous values > ctx.run(function) > > >> Backwards Compatibility >> ======================= >> >> This proposal preserves 100% backwards compatibility. > > Ok. > >> Libraries that use ``threading.local()`` to store context-related >> values, currently work correctly only for synchronous code. Switching >> them to use the proposed API will keep their behavior for synchronous >> code unmodified, but will automatically enable support for >> asynchronous code. > > I'm confused by this sentence. 
I suggest removing it :-) > > Converting code to contextvars makes it immediately backward > incompatible, no? I'm not sure that it's a good idea to suggest it in > this section. If we update decimal to use ContextVars internally, decimal will stay 100% backwards compatible. Yury From rosuav at gmail.com Tue Dec 12 21:07:20 2017 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 13 Dec 2017 13:07:20 +1100 Subject: [Python-Dev] [python/peps] PEP 567 review and copyedits (#503) In-Reply-To: References: Message-ID: Redirecting comments from the PR to the ML. Everything that was tightly bound to the PR has been dropped. On Wed, Dec 13, 2017 at 12:15 PM, Yury Selivanov wrote: > Most of your questions should be asked on python-dev. I'll answer them here, but if you have any follow-ups, please raise them on the ML. > >> What happens if you set, set again, then reset from the first one's token? > > The context will be reset to the state it was in before the first set, w.r.t. that variable's value.
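[Editor's note: that answer can be checked against the contextvars module as it eventually shipped in Python 3.7 -- resetting with the first token rewinds the variable past both set() calls; the variable names here are made up for the example:]

```python
import contextvars

v = contextvars.ContextVar("v")

t1 = v.set("first")
t2 = v.set("second")   # a second, later token
assert v.get() == "second"

# Resetting with the *first* token rewinds this variable to its state
# before the first set() -- here, "not set at all" -- despite the later set().
v.reset(t1)
try:
    v.get()
    was_set = True
except LookupError:    # no default was declared, so the variable is now unset
    was_set = False
assert not was_set
```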
> > __ is used to highlight the fact that all those attributes are private and inaccessible for Python code. Just to clarify, then: this is an artifact of the Python reference implementation being unable to perfectly represent the behaviour of C code, but had you been writing this for actual implementation in the stdlib, you'd have used a single underscore? >> The HAMT for Context boasts O(log N) 'set' operations. What is N? Number of variables used? Number of times they get set? Number of times set w/o being reset? > > Number of ContextVars set in the context. Thanks. Might be worth mentioning that; with the semantically-equivalent Python code, it's obvious that the cost of the copy scales with the number of unique variables, but without knowing what the actual HAMT does, it's not so obvious that that's true there too. The net result is that repeatedly setting the same variable doesn't increase the cost - right? Most of my questions were fairly straightforwardly answered in Yury's response on the PR. One question about formatting has been subsequently fixed, so all's well on that front too. The PEP looks pretty good to me. ChrisA From yselivanov.ml at gmail.com Tue Dec 12 21:24:25 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 12 Dec 2017 21:24:25 -0500 Subject: [Python-Dev] [python/peps] PEP 567 review and copyedits (#503) In-Reply-To: References: Message-ID: On Tue, Dec 12, 2017 at 9:07 PM, Chris Angelico wrote: > Redirecting comments from the PR to the ML. Everything that was > tightly bound to the PR has been dropped. > > On Wed, Dec 13, 2017 at 12:15 PM, Yury Selivanov > wrote: >> Most of your questions should be asked on python-dev. I'll answer them here, but if you have any follow-ups, please raise the on the ml. >> >>> What happens if you set, set again, then reset from the first one's token? >> >> The context will be reset to the state it was in before the first set, w.r.t. that variable's value. 
>> >>> A peek at the implementation shows that it simply resets the value, so aside from having magic that allows it to represent "no value", the token buys nothing that you couldn't get by simply returning the old value - which is a valuable API to have. Am I reading this correctly? >> >> "no value" is the main feature of Token, it what makes the PEP future compatible with PEP 550. > > A lot of APIs are built to return the old value, not wrapped in any > sort of opaque token. If the "no value" magic were to be exposed > (contextvars.NO_VALUE as a specific sentinel value), this would allow > deliberate use of "previous value" as an actual part of the API. Does > futureproofing require that the token be opaque? To get the previous value just use 'ContextVar.get()' before you call 'ContextVar.set()'. I don't see a lot of value in further enhancing "Token". Future-proofing requires us to have no ContextVar.delete() method, and that's why we have the set/reset API. > >>> Implementation, _ContextData class - I don't often see "self.__mapping" in PEPs unless the name mangling is actually needed. Is it used here? Would "self._mapping" be as effective? >> >> __ is used to highlight the fact that all those attributes are private and inaccessible for Python code. > > Just to clarify, then: this is an artifact of the Python reference > implementation being unable to perfectly represent the behaviour of C > code, but had you been writing this for actual implementation in the > stdlib, you'd have used a single underscore? I'd still use the dunder prefix, there's nothing wrong with it IMO. > >>> The HAMT for Context boasts O(log N) 'set' operations. What is N? Number of variables used? Number of times they get set? Number of times set w/o being reset? >> >> Number of ContextVars set in the context. > > Thanks. Might be worth mentioning that; Yeah, I'll update the PEP. 
> with the > semantically-equivalent Python code, it's obvious that the cost of the > copy scales with the number of unique variables, but without knowing > what the actual HAMT does, it's not so obvious that that's true there > too. The net result is that repeatedly setting the same variable > doesn't increase the cost - right? Yes! I had to balance the implementation around the following constraints: 1. get_context() must be fast, as it will be used in asyncio.call_soon() all the time. 2. ContextVar.get() must be fast, as modules like numpy and decimal won't use it otherwise. 3. ContextVar.set() must not be slow, or become slower and slower as we have more and more vars on the context. Having an immutable dict (as opposed to using 'dict.copy()') allows us to have a super fast 'get_context()'. We could move 'dict.copy()' to 'ContextVar.set()', but then it would make it an O(N) operation, which isn't acceptable either. HAMT is a way to implement an immutable mapping with a fast O(log N) 'set()' operation. > The PEP looks pretty good to me. Thank you, Chris. Yury From guido at python.org Tue Dec 12 21:55:30 2017 From: guido at python.org (Guido van Rossum) Date: Tue, 12 Dec 2017 18:55:30 -0800 Subject: [Python-Dev] PEP 567 -- Context Variables In-Reply-To: References: Message-ID: On Tue, Dec 12, 2017 at 5:35 PM, Yury Selivanov wrote: > On Tue, Dec 12, 2017 at 6:49 PM, Victor Stinner > wrote: > > I like the overall idea and I prefer this PEP over PEP 550 since it's > > shorter and easier to read :-) > > > > Question: Is there an API to list all context variables? > > Context implements abc.Mapping, so 'get_context().keys()' will give > you a list of all ContextVars in the current context. > This was hinted at in the PEP, but maybe an explicit example would be nice. > > Each get_context() call returns a new Context object. It may be worth > > to mention it. I understand why, but it's surprising that "assert > > get_context() is not get_context()" fails. 
Maybe it's a naming issue? > > Maybe rename it to contextvars.context()? > > I think the name is fine. While get_context() will return a new instance > every time you call it, those instances will have the same context > variables/values in them, so I don't think it's a problem. > I'm fine with this, but perhaps == should be supported so that those two are guaranteed to be considered equal? (Otherwise an awkward idiom to compare contexts using expensive dict() copies would be needed to properly compare two contexts for equality.) > > At the first read, I understood that that ctx.run() creates a new > > temporary context which is removed once ctx.run() returns. > > > > Now I understand that context variable values are restored to their > > previous values once run() completes. Am I right? > > ctx.run(func) runs 'func' in the 'ctx' context. Any changes to > ContextVars that func makes will stay isolated to the 'ctx' context. > > > > > Maybe add a short comment to explain that? > > Added. > The PEP still contains the following paragraph: > Any changes to the context will be contained and persisted in the > ``Context`` object on which ``run()`` is called on. This phrase is confusing; it could be read as implying that context changes made by the function *will* get propagated back to the caller of run(), contradicting what was said earlier. Maybe it's best to just delete it? Otherwise if you intend it to add something it needs to be rephrased. Maybe "persisted" is the key word causing confusion? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yselivanov.ml at gmail.com Tue Dec 12 22:12:42 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 12 Dec 2017 22:12:42 -0500 Subject: [Python-Dev] PEP 567 -- Context Variables In-Reply-To: References: Message-ID: On Tue, Dec 12, 2017 at 9:55 PM, Guido van Rossum wrote: > On Tue, Dec 12, 2017 at 5:35 PM, Yury Selivanov > wrote: >> >> On Tue, Dec 12, 2017 at 6:49 PM, Victor Stinner >> wrote: >> > I like the overall idea and I prefer this PEP over PEP 550 since it's >> > shorter and easier to read :-) >> > >> > Question: Is there an API to list all context variables? >> >> Context implements abc.Mapping, so 'get_context().keys()' will give >> you a list of all ContextVars in the current context. > > > This was hinted at in the PEP, but maybe an explicit example would be nice. Sure. > >> >> > Each get_context() call returns a new Context object. It may be worth >> > to mention it. I understand why, but it's surprising that "assert >> > get_context() is not get_context()" fails. Maybe it's a naming issue? >> > Maybe rename it to contextvars.context()? >> >> I think the name is fine. While get_context() will return a new instance >> every time you call it, those instances will have the same context >> variables/values in them, so I don't think it's a problem. > > > I'm fine with this, but perhaps == should be supported so that those two are > guaranteed to be considered equal? (Otherwise an awkward idiom to compare > contexts using expensive dict() copies would be needed to properly compare > two contexts for equality.) I've no problem with implementing 'Context.__eq__'. I think abc.Mapping also implements it. > >> >> > At the first read, I understood that that ctx.run() creates a new >> > temporary context which is removed once ctx.run() returns. >> > >> > Now I understand that context variable values are restored to their >> > previous values once run() completes. Am I right? >> >> ctx.run(func) runs 'func' in the 'ctx' context. 
Any changes to >> ContextVars that func makes will stay isolated to the 'ctx' context. >> >> > >> > Maybe add a short comment to explain that? >> >> Added. > > > The PEP still contains the following paragraph: > >> Any changes to the context will be contained and persisted in the >> ``Context`` object on which ``run()`` is called on. > > This phrase is confusing; it could be read as implying that context changes > made by the function *will* get propagated back to the caller of run(), > contradicting what was said earlier. Maybe it's best to just delete it? > Otherwise if you intend it to add something it needs to be rephrased. Maybe > "persisted" is the key word causing confusion? I'll remove "persisted" now, I agree it adds more confusion than clarity. Victor is also confused with how 'Context.run()' is currently explained, I'll try to make it clearer. Thank you, Yury From guido at python.org Tue Dec 12 22:36:37 2017 From: guido at python.org (Guido van Rossum) Date: Tue, 12 Dec 2017 19:36:37 -0800 Subject: [Python-Dev] PEP 567 -- Context Variables In-Reply-To: References: Message-ID: Some more feedback: > This proposal builds directly upon concepts originally introduced > in :pep:`550`. The phrase "builds upon" typically implies that the other resource must be read and understood first. I don't think that we should require PEP 550 for understanding of PEP 567. Maybe "This proposal is a simplified version of :pep:`550`." ? > The notion of "current value" deserves special consideration: > different asynchronous tasks that exist and execute concurrently > may have different values. This idea is well-known from thread-local > storage but in this case the locality of the value is not always > necessarily to a thread. Instead, there is the notion of the > "current ``Context``" which is stored in thread-local storage, and > is accessed via ``contextvars.get_context()`` function. 
> Manipulation of the current ``Context`` is the responsibility of the > task framework, e.g. asyncio. This begs two (related) questions: - If it's stored in TLS, why isn't it equivalent to TLS? - If it's read-only (as mentioned in the next paragraph) how can the framework modify it? I realize the answers are clear, but at this point in the exposition you haven't given the reader enough information to answer them, so this paragraph may confuse readers. > Specification > ============= > [points 1, 2, 3] Shouldn't this also list Token? (It must be a class defined here so users can declare the type of variables/arguments in their code representing these tokens.) > The ``ContextVar`` class has the following constructor signature: > ``ContextVar(name, *, default=no_default)``. I think a word or two about the provenance of `no_default` would be good. (I think it's an internal singleton right?) Ditto for NO_DEFAULT in the C implementation sketch. > class Task: > def __init__(self, coro): Do we need a keyword arg 'context=None' here too? (I'm not sure what would be the use case, but somehow it stands out in comparison to call_later() etc.) > CPython C API > ------------- > TBD Yeah, what about it? :-) > The internal immutable dictionary for ``Context`` is implemented > using Hash Array Mapped Tries (HAMT). They allow for O(log N) ``set`` > operation, and for O(1) ``get_context()`` function. [...] I wonder if we can keep the HAMT out of the discussion at this point. I have nothing against it, but given that you already say you're leaving out optimizations and nothing in the pseudo code given here depends on them I wonder if they shouldn't be mentioned later. (Also the appendix with the perf analysis is the one thing that I think we can safely leave out, just reference PEP 550 for this.) > class _ContextData Since this isn't a real class anyway I think the __mapping attribute might as well be named _mapping. Ditto for other __variables later. 
-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Tue Dec 12 23:20:07 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 12 Dec 2017 23:20:07 -0500 Subject: [Python-Dev] PEP 567 -- Context Variables In-Reply-To: References: Message-ID: On Tue, Dec 12, 2017 at 10:36 PM, Guido van Rossum wrote: > Some more feedback: > >> This proposal builds directly upon concepts originally introduced >> in :pep:`550`. > > The phrase "builds upon" typically implies that the other resource must be > read and understood first. I don't think that we should require PEP 550 for > understanding of PEP 567. Maybe "This proposal is a simplified version of > :pep:`550`." ? I agree, "simplified version" is better. > >> The notion of "current value" deserves special consideration: >> different asynchronous tasks that exist and execute concurrently >> may have different values. This idea is well-known from thread-local >> storage but in this case the locality of the value is not always >> necessarily to a thread. Instead, there is the notion of the >> "current ``Context``" which is stored in thread-local storage, and >> is accessed via ``contextvars.get_context()`` function. >> Manipulation of the current ``Context`` is the responsibility of the >> task framework, e.g. asyncio. > > This begs two (related) questions: > - If it's stored in TLS, why isn't it equivalent to TLS? > - If it's read-only (as mentioned in the next paragraph) how can the > framework modify it? > > I realize the answers are clear, but at this point in the exposition you > haven't given the reader enough information to answer them, so this > paragraph may confuse readers. I'll think how to rephrase it. > >> Specification >> ============= >> [points 1, 2, 3] > > Shouldn't this also list Token? 
(It must be a class defined here so users > can declare the type of variables/arguments in their code representing these > tokens.) > >> The ``ContextVar`` class has the following constructor signature: >> ``ContextVar(name, *, default=no_default)``. > > I think a word or two about the provenance of `no_default` would be good. (I > think it's an internal singleton right?) Ditto for NO_DEFAULT in the C > implementation sketch. Fixed. > >> class Task: >> def __init__(self, coro): > > Do we need a keyword arg 'context=None' here too? (I'm not sure what would > be the use case, but somehow it stands out in comparison to call_later() > etc.) call_later() is low-level and it needs the 'context' argument as Task and Future use it in their implementation. It would be easy to add 'context' parameter to Task and loop.create_task(), but I don't know about any concrete use-case for that just yet. > >> CPython C API >> ------------- >> TBD > > Yeah, what about it? :-) I've added it: https://github.com/python/peps/pull/508/files I didn't want to get into too much detail about the C API until I have a working PR. Although I feel that the one I describe in the PEP now is very close to what we'll have. > >> The internal immutable dictionary for ``Context`` is implemented >> using Hash Array Mapped Tries (HAMT). They allow for O(log N) ``set`` >> operation, and for O(1) ``get_context()`` function. [...] > > I wonder if we can keep the HAMT out of the discussion at this point. I have > nothing against it, but given that you already say you're leaving out > optimizations and nothing in the pseudo code given here depends on them I > wonder if they shouldn't be mentioned later. (Also the appendix with the > perf analysis is the one thing that I think we can safely leave out, just > reference PEP 550 for this.) I've added a new section "Implementation Notes" that mentions HAMT and ContextVar.get() cache. Both refer to PEP 550's lengthy explanations. 
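[Editor's note: to make those implementation notes concrete, the pseudo-code's immutable mapping can be modelled with a plain dict and copy-on-write; the HAMT only replaces the O(N) dict copy with an O(log N) structural-sharing update. A rough sketch, not the PEP's exact pseudo-code:]

```python
# Copy-on-write mapping with a plain dict; the stdlib's HAMT turns the
# O(N) dict copy in set() into an O(log N) structural-sharing update.
class _ContextData:
    """Immutable mapping: set() returns a new instance, never mutates."""

    def __init__(self, mapping=None):
        self._mapping = mapping if mapping is not None else {}

    def get(self, key):
        return self._mapping[key]      # raises KeyError if the key is unset

    def set(self, key, value):
        new = dict(self._mapping)      # the step a HAMT makes O(log N)
        new[key] = value
        return _ContextData(new)

data1 = _ContextData()
data2 = data1.set("var", 1)
assert data2.get("var") == 1           # the new snapshot sees the value...
assert "var" not in data1._mapping     # ...while the old snapshot is untouched
```

[This is why get_context() can hand out a snapshot in O(1): later set() calls build new mappings instead of mutating the one the snapshot holds.]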
> >> class _ContextData > > Since this isn't a real class anyway I think the __mapping attribute might > as well be named _mapping. Ditto for other __variables later. Done. Yury From dimaqq at gmail.com Wed Dec 13 01:39:35 2017 From: dimaqq at gmail.com (Dima Tisnek) Date: Wed, 13 Dec 2017 14:39:35 +0800 Subject: [Python-Dev] PEP 567 -- Context Variables In-Reply-To: References: Message-ID: My 2c: TL;DR The PEP specifies the implementation in some detail, but doesn't show how the proposed change can or should be used. get()/set(value)/delete() methods: Python provides syntax sugar for these, let's use it. (dict: d["k"]/d["k"] = value/del d["k"]; attrs: obj.k/obj.k = value/del obj.k; inheriting threading.local) This PEP and 550 describe why TLS is inadequate, but don't seem to specify how the proposed context behaves in the async world. I'd be most interested in how it appears to work to the user of the new library. Consider the case of an asynchronous cache: async def actual_lookup(name): ... def cached_lookup(name, cache={}): if name not in cache: cache[name] = shield(ensure_future(actual_lookup(name))) return cache[name] Unrelated (or related) asynchronous processes end up waiting on the same future: async def called_with_user_context(): ... await cached_lookup(...) ... Which context is propagated to actual_lookup()? The PEP doesn't seem to state that clearly. It appears to be the first caller's context. Is it a copy or a reference? If the first caller is cancelled, the context remains alive. The token is fragile; I believe the PEP should propose a working context manager instead. Btw., isn't a token really a reference to the state of the context before it's cloned and modified? On 13 December 2017 at 01:33, Yury Selivanov wrote: > Hi, > > This is a new proposal to implement context storage in Python. > > It's a successor of PEP 550 and builds on some of its API ideas and > data structures. Contrary to PEP 550 though, this proposal only focuses > on adding new APIs and implementing support for it in asyncio.
There > are no changes to the interpreter or to the behaviour of generator or > coroutine objects. > > > PEP: 567 > Title: Context Variables > Version: $Revision$ > Last-Modified: $Date$ > Author: Yury Selivanov > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 12-Dec-2017 > Python-Version: 3.7 > Post-History: 12-Dec-2017 > > > Abstract > ======== > > This PEP proposes the new ``contextvars`` module and a set of new > CPython C APIs to support context variables. This concept is > similar to thread-local variables but, unlike TLS, it allows > correctly keeping track of values per asynchronous task, e.g. > ``asyncio.Task``. > > This proposal builds directly upon concepts originally introduced > in :pep:`550`. The key difference is that this PEP is only concerned > with solving the case for asynchronous tasks, and not generators. > There are no proposed modifications to any built-in types or to the > interpreter. > > > Rationale > ========= > > Thread-local variables are insufficient for asynchronous tasks which > execute concurrently in the same OS thread. Any context manager that > needs to save and restore a context value and uses > ``threading.local()``, will have its context values bleed to other > code unexpectedly when used in async/await code. > > A few examples where having a working context local storage for > asynchronous code is desired: > > * Context managers like decimal contexts and ``numpy.errstate``. > > * Request-related data, such as security tokens and request > data in web applications, language context for ``gettext`` etc. > > * Profiling, tracing, and logging in large code bases. > > > Introduction > ============ > > The PEP proposes a new mechanism for managing context variables. > The key classes involved in this mechanism are ``contextvars.Context`` > and ``contextvars.ContextVar``. The PEP also proposes some policies > for using the mechanism around asynchronous tasks. 
> > The proposed mechanism for accessing context variables uses the > ``ContextVar`` class. A module (such as decimal) that wishes to > store a context variable should: > > * declare a module-global variable holding a ``ContextVar`` to > serve as a "key"; > > * access the current value via the ``get()`` method on the > key variable; > > * modify the current value via the ``set()`` method on the > key variable. > > The notion of "current value" deserves special consideration: > different asynchronous tasks that exist and execute concurrently > may have different values. This idea is well-known from thread-local > storage but in this case the locality of the value is not always > necessarily to a thread. Instead, there is the notion of the > "current ``Context``" which is stored in thread-local storage, and > is accessed via ``contextvars.get_context()`` function. > Manipulation of the current ``Context`` is the responsibility of the > task framework, e.g. asyncio. > > A ``Context`` is conceptually a mapping, implemented using an > immutable dictionary. The ``ContextVar.get()`` method does a > lookup in the current ``Context`` with ``self`` as a key, raising a > ``LookupError`` or returning a default value specified in > the constructor. > > The ``ContextVar.set(value)`` method clones the current ``Context``, > assigns the ``value`` to it with ``self`` as a key, and sets the > new ``Context`` as a new current. Because ``Context`` uses an > immutable dictionary, cloning it is O(1). > > > Specification > ============= > > A new standard library module ``contextvars`` is added with the > following APIs: > > 1. ``get_context() -> Context`` function is used to get the current > ``Context`` object for the current OS thread. > > 2. ``ContextVar`` class to declare and access context variables. > > 3. ``Context`` class encapsulates context state. Every OS thread > stores a reference to its current ``Context`` instance. > It is not possible to control that reference manually. 
> Instead, the ``Context.run(callable, *args)`` method is used to run
> Python code in another context.
>
>
> contextvars.ContextVar
> ----------------------
>
> The ``ContextVar`` class has the following constructor signature:
> ``ContextVar(name, *, default=no_default)``.  The ``name`` parameter
> is used only for introspection and debug purposes.  The ``default``
> parameter is optional.  Example::
>
>     # Declare a context variable 'var' with the default value 42.
>     var = ContextVar('var', default=42)
>
> ``ContextVar.get()`` returns the value of the context variable in the
> current ``Context``::
>
>     # Get the value of `var`.
>     var.get()
>
> ``ContextVar.set(value) -> Token`` is used to set a new value for
> the context variable in the current ``Context``::
>
>     # Set the variable 'var' to 1 in the current context.
>     var.set(1)
>
> ``contextvars.Token`` is an opaque object that should be used to
> restore the ``ContextVar`` to its previous value, or remove it from
> the context if it was not set before.  The ``ContextVar.reset(Token)``
> method is used for that::
>
>     old = var.set(1)
>     try:
>         ...
>     finally:
>         var.reset(old)
>
> The ``Token`` API exists to make the current proposal forward
> compatible with :pep:`550`, in case there is demand to support
> context variables in generators and asynchronous generators in the
> future.
>
> The ``ContextVar`` design allows for a fast implementation of
> ``ContextVar.get()``, which is particularly important for modules
> like ``decimal`` and ``numpy``.
>
>
> contextvars.Context
> -------------------
>
> ``Context`` objects are mappings of ``ContextVar`` to values.
>
> To get the current ``Context`` for the current OS thread, use the
> ``contextvars.get_context()`` function::
>
>     ctx = contextvars.get_context()
>
> To run Python code in some ``Context``, use the ``Context.run()``
> method::
>
>     ctx.run(function)
>
> Any changes to any context variables that ``function`` causes will
> be contained in the ``ctx`` context::
>
>     var = ContextVar('var')
>     var.set('spam')
>
>     def function():
>         assert var.get() == 'spam'
>
>         var.set('ham')
>         assert var.get() == 'ham'
>
>     ctx = get_context()
>     ctx.run(function)
>
>     assert var.get('spam')
>
> Any changes to the context will be contained and persisted in the
> ``Context`` object on which ``run()`` is called.
>
> ``Context`` objects implement the ``collections.abc.Mapping`` ABC.
> This can be used to introspect context objects::
>
>     ctx = contextvars.get_context()
>
>     # Print all context variables and their values in 'ctx':
>     print(ctx.items())
>
>     # Print the value of 'some_variable' in context 'ctx':
>     print(ctx[some_variable])
>
>
> asyncio
> -------
>
> ``asyncio`` uses ``Loop.call_soon()``, ``Loop.call_later()``,
> and ``Loop.call_at()`` to schedule the asynchronous execution of a
> function.  ``asyncio.Task`` uses ``call_soon()`` to run the
> wrapped coroutine.
>
> We modify ``Loop.call_{at,later,soon}`` to accept the new
> optional *context* keyword-only argument, which defaults to
> the current context::
>
>     def call_soon(self, callback, *args, context=None):
>         if context is None:
>             context = contextvars.get_context()
>
>         # ... some time later
>         context.run(callback, *args)
>
> Tasks in asyncio need to maintain their own isolated context.
> ``asyncio.Task`` is modified as follows::
>
>     class Task:
>         def __init__(self, coro):
>             ...
>             # Get the current context snapshot.
>             self._context = contextvars.get_context()
>             self._loop.call_soon(self._step, context=self._context)
>
>         def _step(self, exc=None):
>             ...
>             # Every advance of the wrapped coroutine is done in
>             # the task's context.
> self._loop.call_soon(self._step, context=self._context) > ... > > > CPython C API > ------------- > > TBD > > > Implementation > ============== > > This section explains high-level implementation details in > pseudo-code. Some optimizations are omitted to keep this section > short and clear. > > The internal immutable dictionary for ``Context`` is implemented > using Hash Array Mapped Tries (HAMT). They allow for O(log N) ``set`` > operation, and for O(1) ``get_context()`` function. For the purposes > of this section, we implement an immutable dictionary using > ``dict.copy()``:: > > class _ContextData: > > def __init__(self): > self.__mapping = dict() > > def get(self, key): > return self.__mapping[key] > > def set(self, key, value): > copy = _ContextData() > copy.__mapping = self.__mapping.copy() > copy.__mapping[key] = value > return copy > > def delete(self, key): > copy = _ContextData() > copy.__mapping = self.__mapping.copy() > del copy.__mapping[key] > return copy > > Every OS thread has a reference to the current ``_ContextData``. 
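As an aside, the copy-on-write behaviour of the ``_ContextData`` pseudo-code above is easy to check directly.  The snippet below simply restates that pseudo-code as runnable Python -- it is a toy stand-in for the real HAMT-based implementation, and the snapshot/value names are invented for illustration:

```python
class _ContextData:
    """Immutable-mapping sketch from the text above (dict.copy() based)."""

    def __init__(self):
        self.__mapping = dict()

    def get(self, key):
        return self.__mapping[key]

    def set(self, key, value):
        # Never mutate in place: return a new snapshot with the update.
        copy = _ContextData()
        copy.__mapping = self.__mapping.copy()
        copy.__mapping[key] = value
        return copy

    def delete(self, key):
        copy = _ContextData()
        copy.__mapping = self.__mapping.copy()
        del copy.__mapping[key]
        return copy


d1 = _ContextData()
d2 = d1.set('var', 'spam')
d3 = d2.set('var', 'ham')

# Earlier snapshots are unaffected by later set() calls:
print(d2.get('var'))   # spam
print(d3.get('var'))   # ham
```

Because snapshots are never mutated, restoring a previous context is just a matter of holding a reference to an old ``_ContextData``.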
> ``PyThreadState`` is updated with a new ``context_data`` field that
> points to a ``_ContextData`` object::
>
>     PyThreadState:
>         context_data : _ContextData
>
> ``contextvars.get_context()`` is implemented as follows::
>
>     def get_context():
>         ts : PyThreadState = PyThreadState_Get()
>
>         if ts.context_data is None:
>             ts.context_data = _ContextData()
>
>         ctx = Context()
>         ctx.__data = ts.context_data
>         return ctx
>
> ``contextvars.Context`` is a wrapper around ``_ContextData``::
>
>     class Context(collections.abc.Mapping):
>
>         def __init__(self):
>             self.__data = _ContextData()
>
>         def run(self, callable, *args):
>             ts : PyThreadState = PyThreadState_Get()
>             saved_data : _ContextData = ts.context_data
>
>             try:
>                 ts.context_data = self.__data
>                 callable(*args)
>             finally:
>                 self.__data = ts.context_data
>                 ts.context_data = saved_data
>
>         # Mapping API methods are implemented by delegating
>         # `get()` and other Mapping calls to `self.__data`.
>
> ``contextvars.ContextVar`` interacts with
> ``PyThreadState.context_data`` directly::
>
>     class ContextVar:
>
>         def __init__(self, name, *, default=NO_DEFAULT):
>             self.__name = name
>             self.__default = default
>
>         @property
>         def name(self):
>             return self.__name
>
>         def get(self, default=NO_DEFAULT):
>             ts : PyThreadState = PyThreadState_Get()
>             data : _ContextData = ts.context_data
>
>             try:
>                 return data.get(self)
>             except KeyError:
>                 pass
>
>             if default is not NO_DEFAULT:
>                 return default
>
>             if self.__default is not NO_DEFAULT:
>                 return self.__default
>
>             raise LookupError
>
>         def set(self, value):
>             ts : PyThreadState = PyThreadState_Get()
>             data : _ContextData = ts.context_data
>
>             try:
>                 old_value = data.get(self)
>             except KeyError:
>                 old_value = NO_VALUE
>
>             ts.context_data = data.set(self, value)
>             return Token(self, old_value)
>
>         def reset(self, token):
>             if token.__used:
>                 return
>
>             ts : PyThreadState = PyThreadState_Get()
>             data : _ContextData = ts.context_data
>
>             if token.__old_value is NO_VALUE:
>                 ts.context_data = data.delete(token.__var)
>             else:
>                 ts.context_data = data.set(token.__var,
>                                             token.__old_value)
>
>             token.__used = True
>
>
>     class Token:
>
>         def __init__(self, var, old_value):
>             self.__var = var
>             self.__old_value = old_value
>             self.__used = False
>
>
> Backwards Compatibility
> =======================
>
> This proposal preserves 100% backwards compatibility.
>
> Libraries that use ``threading.local()`` to store context-related
> values currently work correctly only for synchronous code.  Switching
> them to use the proposed API will keep their behavior for synchronous
> code unmodified, but will automatically enable support for
> asynchronous code.
>
>
> Appendix: HAMT Performance Analysis
> ===================================
>
> .. figure:: pep-0550-hamt_vs_dict-v2.png
>    :align: center
>    :width: 100%
>
>    Figure 1.  Benchmark code can be found here: [1]_.
>
> The above chart demonstrates that:
>
> * HAMT displays near O(1) performance for all benchmarked
>   dictionary sizes.
>
> * ``dict.copy()`` becomes very slow around 100 items.
>
> .. figure:: pep-0550-lookup_hamt.png
>    :align: center
>    :width: 100%
>
>    Figure 2.  Benchmark code can be found here: [2]_.
>
> Figure 2 compares the lookup costs of ``dict`` versus a HAMT-based
> immutable mapping.  HAMT lookup time is 30-40% slower than Python dict
> lookups on average, which is a very good result, considering that the
> latter is very well optimized.
>
> The reference implementation of HAMT for CPython can be found here:
> [3]_.
>
>
> References
> ==========
>
> .. [1] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd
>
> .. [2] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e
>
> .. [3] https://github.com/1st1/cpython/tree/hamt
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> ..
> Local Variables:
> mode: indented-text
> indent-tabs-mode: nil
> sentence-end-double-space: t
> fill-column: 70
> coding: utf-8
> End:
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/dimaqq%40gmail.com

From njs at pobox.com  Wed Dec 13 05:23:44 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 13 Dec 2017 02:23:44 -0800
Subject: [Python-Dev] PEP 567 -- Context Variables
In-Reply-To:
References:
Message-ID:

On Tue, Dec 12, 2017 at 10:39 PM, Dima Tisnek wrote:
> My 2c:
> TL;DR The PEP specifies the implementation in some detail, but doesn't
> show how the proposed change can or should be used.
>
>
>
> get()/set(value)/delete() methods: Python provides syntax sugar for
> these, let's use it.
> (dict: d["k"]/d["k"] = value/del d["k"]; attrs: obj.k/obj.k = value/del
> obj.k; inheriting threading.Local)

This was already discussed to death in the PEP 550 threads... what
most users want is a single value, and routing get/set through a
ContextVar object allows for important optimizations and a simpler
implementation. Also, remember that 99% of users will never use these
objects directly; it's a low-level API mostly useful to framework
implementers.

> This PEP and 550 describe why TLS is inadequate, but don't seem to
> specify how the proposed context behaves in the async world. I'd be most
> interested in how it appears to work to the user of the new library.
>
> Consider the case of an asynchronous cache:
>
>     async def actual_lookup(name):
>         ...
>
>     def cached_lookup(name, cache={}):
>         if name not in cache:
>             cache[name] = shield(ensure_future(actual_lookup(name)))
>         return cache[name]
>
> Unrelated (or related) asynchronous processes end up waiting on the same future:
>
>     async def called_with_user_context():
>         ...
>         await cached_lookup(...)
>         ...
>
> Which context is propagated to actual_lookup()?
> The PEP doesn't seem to state that clearly.
> It appears to be the first caller's context.

Yes.

> Is it a copy or a reference?

It's a copy, as returned by get_context().

> If the first caller is cancelled, the context remains alive.
>
>
>
> token is fragile, I believe the PEP should propose a working context
> manager instead.
> Btw., isn't a token really a reference to
> state-of-context-before-it's-cloned-and-modified?

No, a Token only represents the value of one ContextVar, not the
whole Context. This could maybe be clearer in the PEP, but it has to
be this way or you'd get weird behavior from code like:

    with decimal.localcontext(...):  # sets and then restores
        numpy.seterr(...)  # sets without any plan to restore
    # after the 'with' block, the decimal ContextVar gets restored
    # but this shouldn't affect the numpy.seterr ContextVar

-n

--
Nathaniel J. Smith -- https://vorpus.org

From victor.stinner at gmail.com  Wed Dec 13 05:48:14 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 13 Dec 2017 11:48:14 +0100
Subject: [Python-Dev] PEP 567 -- Context Variables
In-Reply-To:
References:
Message-ID:

Hi Dima,

2017-12-13 7:39 GMT+01:00 Dima Tisnek :
> get()/set(value)/delete() methods: Python provides syntax sugar for
> these, let's use it.
> (dict: d["k"]/d["k"] = value/del d["k"]; attrs: obj.k/obj.k = value/del
> obj.k; inheriting threading.Local)

I was trapped by Context, which is described as "a mapping". Usually,
when I read "mapping", I associate it with a mutable dictionary. But
in fact Context is a *read-only* mapping. Yury changed the
Introduction to add "read-only", but not the Context section:
https://www.python.org/dev/peps/pep-0567/#contextvars-context

Only a single ContextVar variable can be modified. This object is a
container for a *single* value, not a mapping: you cannot write
"var = value", you have to write "var.set(value)", and
"var['key'] = value" doesn't make sense.
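A tiny sketch of that distinction, using only the ``ContextVar`` part of the API and assuming the ``contextvars`` module lands as specified (the variable name and values are invented for illustration):

```python
from contextvars import ContextVar

# A ContextVar holds a *single* value; it is not a mapping.
var = ContextVar('var', default=42)

print(var.get())    # 42 -- nothing set yet, so the default is returned

token = var.set(1)  # set() returns a Token, not the old value itself
print(var.get())    # 1

var.reset(token)    # back to the "not set" state
print(var.get())    # 42 again
```

The read-only mapping is the Context; the only write operations go through individual ContextVar objects.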
> This PEP and 550 describe why TLS is inadequate, but don't seem to
> specify how the proposed context behaves in the async world. I'd be most
> interested in how it appears to work to the user of the new library.

In short, the context is inherited automatically: you have nothing to
do :-) Put anything you want into a context, and it will transparently
follow your asynchronous code.

The answer is in the sentence: "Tasks in asyncio need to maintain
their own context that they inherit from the point they were created
at."

You may want to use a task context to pass data from an HTTP request:
user name, cookie, IP address, etc. If you save data into the "current
context", in practice, the context is inherited by tasks and
callbacks, and so even if your code is made of multiple tasks, you
still "inherit" the context as expected.

Only tasks have to manually "save/restore" the context, since only
tasks use "await" in their code, not callbacks called by call_soon()
and co.

> token is fragile, I believe the PEP should propose a working context
> manager instead.

Why is it fragile? In asyncio, you cannot use a context manager
because of the design of tasks.

Victor

From storchaka at gmail.com  Wed Dec 13 10:26:11 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 13 Dec 2017 17:26:11 +0200
Subject: [Python-Dev] Zero-width matching in regexes
In-Reply-To: <16f7437d-3e11-6a19-3569-e4d55a370744@mrabarnett.plus.com>
References: <16f7437d-3e11-6a19-3569-e4d55a370744@mrabarnett.plus.com>
Message-ID: <70b43b21-2551-6301-75a1-cd50460d4911@gmail.com>

05.12.17 01:21, MRAB wrote:
> I've finally come to a conclusion as to what the "correct" behaviour of
> zero-width matches should be: """always return the first match, but
> never a zero-width match that is joined to a previous zero-width match""".
>
> If it's about to return a zero-width match that's joined to a previous
> zero-width match, then backtrack and keep on looking for a match.
>
> Example:
>
> >>> print([m.span() for m in re.finditer(r'|.', 'a')])
> [(0, 0), (0, 1), (1, 1)]
>
> re.findall, re.split and re.sub should work accordingly.
>
> If re.finditer finds n matches, then re.split should return a list of
> n+1 strings and re.sub should make n replacements (excepting maxsplit,
> etc.).

We now have a good opportunity to change a long-standing behavior of
re.sub(). Currently empty matches are prohibited if adjacent to a
previous match. For consistency with re.finditer() and re.findall(),
with regex.sub() under the VERSION1 flag, and with Perl, PCRE and
other engines, they should be prohibited only if adjacent to a
previous *empty* match.

Currently re.sub('x*', '-', 'abxc') returns '-a-b-c-', but will return
'-a-b--c-' if we change the behavior.

This behavior was already changed unintentionally between 2.1 and 2.2,
when the underlying implementation of re was changed from PCRE to SRE.
But the former behavior was quickly restored (see
https://bugs.python.org/issue462270). Ironically, the behavior of the
current PCRE is different.

Possible options:

1. Change the behavior right now.
2. Start emitting a FutureWarning and change the behavior in a future
version.
3. Keep the status quo forever.

We need to make a decision now, since in the first two cases we would
have to change the behavior of re.split() right away. Its behavior
changes in 3.7 in any case, and it is better to change the behavior
once than to break it in two different releases. The changed detail is
so subtle that no regular expressions in the stdlib or tests are
affected, except the special-purpose test added to guard the current
behavior.
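For reference, here is how the examples above come out under the proposed semantics (a quick sketch; these are the post-change results, not the current ones, reusing the pattern and string from the example):

```python
import re

# Proposed rule: an empty match is suppressed only when adjacent to a
# previous *empty* match.  'x*' then matches 'abxc' at five positions:
# (0, 0), (1, 1), (2, 3), (3, 3), (4, 4).
print(re.sub('x*', '-', 'abxc'))   # -a-b--c-
print(re.split('x*', 'abxc'))      # ['', 'a', 'b', '', 'c', '']
```

Note that this keeps the invariant quoted above: 5 matches, 5 replacements in sub(), and 5 + 1 = 6 pieces from split().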
From solipsis at pitrou.net  Wed Dec 13 15:15:40 2017
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 13 Dec 2017 21:15:40 +0100
Subject: [Python-Dev] PEP 489: module m_traverse called with NULL module state
Message-ID: <20171213211540.6b92975a@fsol>


Hello,

After debugging a crash on AppVeyor for a submitter's PR (see
https://github.com/python/cpython/pull/4611 ), I came to the following
diagnosis: converting the "atexit" module (which is a built-in C
extension) to PEP 489 multiphase initialization can lead to its
m_traverse function (and presumably also m_clear and m_free) being
called while no module state is yet registered: that is,
`PyModule_GetState(self)` when called from m_traverse returns NULL!

Is that an expected or known subtlety?

Regards

Antoine.

From ericsnowcurrently at gmail.com  Wed Dec 13 15:59:36 2017
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 13 Dec 2017 13:59:36 -0700
Subject: [Python-Dev] PEP 567 -- Context Variables
In-Reply-To:
References:
Message-ID:

Overall, I like this PEP.  It's definitely easier to follow
conceptually than PEP 550.  Thanks for taking the time to re-think the
idea.  I have a few comments in-line below.

-eric

On Tue, Dec 12, 2017 at 10:33 AM, Yury Selivanov wrote:
> This is a new proposal to implement context storage in Python.

+1

This is something I've had on my back burner for years.  Getting this
right is non-trivial, so having a stdlib implementation will help open
up clean solutions in a number of use cases that are currently
addressed in more error-prone ways.

>
> It's a successor of PEP 550 and builds on some of its API ideas and
> datastructures. Contrary to PEP 550 though, this proposal only focuses
> on adding new APIs and implementing support for it in asyncio. There
> are no changes to the interpreter or to the behaviour of generator or
> coroutine objects.

Do you have any plans to revisit extension of the concept to
generators and coroutine objects?
I agree they can be addressed separately, if necessary. TBH, I'd expect this PEP to provide an approach that allows such applications of the concept to effectively be implementation details that can be supported later. > Abstract > ======== > > This PEP proposes the new ``contextvars`` module and a set of new > CPython C APIs to support context variables. This concept is > similar to thread-local variables but, unlike TLS, it allows s/it allows/it also allows/ > correctly keeping track of values per asynchronous task, e.g. > ``asyncio.Task``. > > [snip] > > Rationale > ========= > > Thread-local variables are insufficient for asynchronous tasks which > execute concurrently in the same OS thread. Any context manager that > needs to save and restore a context value and uses > ``threading.local()``, will have its context values bleed to other > code unexpectedly when used in async/await code. FWIW, I'd consider the concept to extend to all execution contexts in the interpreter, of which threads and async/await are the only kinds we have currently. That said, I don't see us adding any new kinds of execution context so what you've said is entirely satisfactory. :) > > [snip] > > Introduction > ============ > > [snip] > > Specification > ============= > > A new standard library module ``contextvars`` is added Why not add this to contextlib instead of adding a new module? IIRC this was discussed relative to PEP 550, but I don't remember the reason. Regardless, it would be worth mentioning somewhere in the PEP. > with the > following APIs: > > 1. ``get_context() -> Context`` function is used to get the current > ``Context`` object for the current OS thread. > > 2. ``ContextVar`` class to declare and access context variables. It may be worth explaining somewhere in the PEP the reason why you've chosen to add ContextVar instead of adding a new keyword (e.g. "context", a la global and nonlocal) to do roughly the same thing. 
Consider that execution contexts are very much a language-level
concept, a close sibling to scope.  Driving that via a keyword would
be a reasonable approach, particularly since it introduces less
coupling between a language-level feature and a stdlib module.
(Making it a builtin would sort of help with that too, but a keyword
would seem like a better fit.)  A keyword would obviate the need for
explicitly calling .get() and .set().

FWIW, I agree with not adding a new keyword.  To me context variables
are a low-level tool for library authors to implement their high-level
APIs.  ContextVar, with its explicit .get() and .set() methods, is a
good fit for that and better communicates the conceptual intent of the
feature.  However, it would still be worth explicitly mentioning the
alternate keyword-based approach in the PEP.

>
> 3. ``Context`` class encapsulates context state.  Every OS thread
>    stores a reference to its current ``Context`` instance.
>    It is not possible to control that reference manually.
>    Instead, the ``Context.run(callable, *args)`` method is used to run
>    Python code in another context.

I'd call that "Context.call()" since it's for callables.  Did you have
a specific reason for calling it "run" instead?

>
>

FWIW, I think there are some helpers you could add that library
authors would appreciate.  However, they aren't critical so I'll hold
off and maybe post about them later. :)

> contextvars.ContextVar
> ----------------------
>
> The ``ContextVar`` class has the following constructor signature:
> ``ContextVar(name, *, default=no_default)``.  The ``name`` parameter
> is used only for introspection and debug purposes.

It doesn't need to be required then, right?

> [snip]
>
> ``ContextVar.set(value) -> Token`` is used to set a new value for
> the context variable in the current ``Context``::
>
>     # Set the variable 'var' to 1 in the current context.
>     var.set(1)
>
> ``contextvars.Token`` is an opaque object that should be used to
> restore the ``ContextVar`` to its previous value, or remove it from
> the context if it was not set before.  The ``ContextVar.reset(Token)``
> method is used for that::
>
>     old = var.set(1)
>     try:
>         ...
>     finally:
>         var.reset(old)
>
> The ``Token`` API exists to make the current proposal forward
> compatible with :pep:`550`, in case there is demand to support
> context variables in generators and asynchronous generators in the
> future.

The "restoring values" focus is valuable on its own.  It emphasizes a
specific usage pattern to users (though a context manager would
achieve the same).  The token + reset() approach means that users
don't need to think about "not set" when restoring values.  That said,
is there otherwise any value to the "not set" concept?  If so,
"is_set()" (not strictly necessary) and "unset()" methods may be
warranted.

Also, there's a strong context manager vibe here.  Some sort of
context manager support would be nice.  However, with the token coming
out of .set() and with no alternative (e.g. "get_token()"), I'm not
sure what an intuitive CM interface would be here.

> [snip]
>
> contextvars.Context
> -------------------
>
> [snip]
>
> Any changes to any context variables that ``function`` causes will
> be contained in the ``ctx`` context::
>
>     var = ContextVar('var')
>     var.set('spam')
>
>     def function():
>         assert var.get() == 'spam'
>
>         var.set('ham')
>         assert var.get() == 'ham'
>
>     ctx = get_context()
>     ctx.run(function)
>
>     assert var.get('spam')

Shouldn't this be "assert var.get() == 'spam'"?

>
> Any changes to the context will be contained and persisted in the
> ``Context`` object on which ``run()`` is called.

For me this would be more clear if it could be spelled like this:

    with ctx:
        function()

Also, let's say I want to run a function under a custom context,
whether a fresh one or an adaptation of an existing one.  How can I
compose such a Context?
AFAICS, the only way to modify a context is by using ContextVar.set() (and reset()), which modifies the current context. It might be useful if there were a more direct way, like a "Context.add(*var) -> Context" and "Context.remove(*var) -> Context" and maybe even a "Context.set(var, value) -> Context" and "Context.unset(var) -> Context". From ericsnowcurrently at gmail.com Wed Dec 13 16:03:59 2017 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 13 Dec 2017 14:03:59 -0700 Subject: [Python-Dev] PEP 567 -- Context Variables In-Reply-To: References: Message-ID: On Tue, Dec 12, 2017 at 4:49 PM, Victor Stinner wrote: >> The ``Token`` API exists to make the current proposal forward >> compatible with :pep:`550`, in case there is demand to support >> context variables in generators and asynchronous generators in the >> future. > > Cool. I like the idea of starting with something simple in Python 3.7. > Then extend it in Python 3.8 or later (support generators), if it > becomes popular, once the first simple (but "incomplete", without > generators) implementation is battle-tested. +1 for starting with a basic API and building on that. -eric From yselivanov.ml at gmail.com Wed Dec 13 16:35:59 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 13 Dec 2017 16:35:59 -0500 Subject: [Python-Dev] PEP 567 -- Context Variables In-Reply-To: References: Message-ID: Hi Eric, Thanks for a detailed review! On Wed, Dec 13, 2017 at 3:59 PM, Eric Snow wrote: > Overall, I like this PEP. It's definitely easier to follow > conceptually than PEP 550. Thanks for taking the time to re-think the > idea. I have a few comments in-line below. > > -eric > > On Tue, Dec 12, 2017 at 10:33 AM, Yury Selivanov > wrote: >> This is a new proposal to implement context storage in Python. > > +1 > > This is something I've had on my back burner for years. 
Getting this
> right is non-trivial, so having a stdlib implementation will help open
> up clean solutions in a number of use cases that are currently
> addressed in more error-prone ways.

Right!

>
>>
>> It's a successor of PEP 550 and builds on some of its API ideas and
>> datastructures. Contrary to PEP 550 though, this proposal only focuses
>> on adding new APIs and implementing support for it in asyncio. There
>> are no changes to the interpreter or to the behaviour of generator or
>> coroutine objects.
>
> Do you have any plans to revisit extension of the concept to
> generators and coroutine objects?  I agree they can be addressed
> separately, if necessary.  TBH, I'd expect this PEP to provide an
> approach that allows such applications of the concept to effectively
> be implementation details that can be supported later.

Maybe we'll extend the concept to work for generators in Python 3.8,
but that's a pretty remote topic to discuss (and we'll need a new PEP
for that). In case we decide to do that, PEP 550 provides a good
implementation plan, and PEP 567 is forward-compatible with it.

>
>> Abstract
>> ========
>>
>> This PEP proposes the new ``contextvars`` module and a set of new
>> CPython C APIs to support context variables. This concept is
>> similar to thread-local variables but, unlike TLS, it allows
>
> s/it allows/it also allows/

Will fix it.

[..]

>> A new standard library module ``contextvars`` is added
>
> Why not add this to contextlib instead of adding a new module?  IIRC
> this was discussed relative to PEP 550, but I don't remember the
> reason.  Regardless, it would be worth mentioning somewhere in the
> PEP.
>

The mechanism is generic and isn't directly related to context
managers. Context managers can (and in many cases should) use the new
APIs to store global state, but the contextvars APIs do not depend on
context managers or require them.
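For instance, a library can trivially build a context manager on top of a ContextVar.  A sketch, assuming the module lands as specified (the "precision" variable is invented here, in the spirit of decimal's context):

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Hypothetical library state, analogous to decimal's precision setting.
precision = ContextVar('precision', default=28)

@contextmanager
def local_precision(prec):
    token = precision.set(prec)
    try:
        yield
    finally:
        precision.reset(token)  # restore on exit, even on error

print(precision.get())      # 28
with local_precision(10):
    print(precision.get())  # 10
print(precision.get())      # 28
```

So the context-manager convenience lives in the library, built from set()/reset(), while contextvars itself stays independent of contextlib.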
I also feel that contextlib is a big module already, so having the new
APIs in their separate module and having a separate documentation page
makes it more approachable.

>> with the
>> following APIs:
>>
>> 1. ``get_context() -> Context`` function is used to get the current
>> ``Context`` object for the current OS thread.
>>
>> 2. ``ContextVar`` class to declare and access context variables.
>
> It may be worth explaining somewhere in the PEP the reason why you've
> chosen to add ContextVar instead of adding a new keyword (e.g.
> "context", a la global and nonlocal) to do roughly the same thing.
> Consider that execution contexts are very much a language-level
> concept, a close sibling to scope.  Driving that via a keyword would
> be a reasonable approach, particularly since it introduces less
> coupling between a language-level feature and a stdlib module.
> (Making it a builtin would sort of help with that too, but a keyword
> would seem like a better fit.)  A keyword would obviate the need for
> explicitly calling .get() and .set().
>
> FWIW, I agree with not adding a new keyword.  To me context variables
> are a low-level tool for library authors to implement their high-level
> APIs.  ContextVar, with its explicit .get() and .set() methods, is a
> good fit for that and better communicates the conceptual intent of the
> feature.  However, it would still be worth explicitly mentioning the
> alternate keyword-based approach in the PEP.

Yeah, adding keywords is way harder than adding a new module.  It
would require a change in Grammar, new opcodes, changes to
frameobject, etc.  I also don't think that ContextVars will be popular
enough to have their own syntax -- how many threadlocals do you see
every day?

For PEP 567/550 a keyword isn't really needed, we can implement the
concept with a ContextVar class.

>>
>> 3. ``Context`` class encapsulates context state.  Every OS thread
>> stores a reference to its current ``Context`` instance.
>> It is not possible to control that reference manually.
>> Instead, the ``Context.run(callable, *args)`` method is used to run
>> Python code in another context.
>
> I'd call that "Context.call()" since it's for callables.  Did you have
> a specific reason for calling it "run" instead?

We have a bunch of run() methods in asyncio, and as I'm actively
working on its codebase I might be biased here, but ".run()" reads
better for me personally than ".call()".

> FWIW, I think there are some helpers you could add that library
> authors would appreciate.  However, they aren't critical so I'll hold
> off and maybe post about them later. :)

My goal with this PEP is to keep the API to its bare minimum, but if
you have some ideas please share!

>
>> contextvars.ContextVar
>> ----------------------
>>
>> The ``ContextVar`` class has the following constructor signature:
>> ``ContextVar(name, *, default=no_default)``.  The ``name`` parameter
>> is used only for introspection and debug purposes.
>
> It doesn't need to be required then, right?

If it's not required then people won't use it.  And then when you want
to introspect the context, you'll see a bunch of anonymous variables.
So as with namedtuple(), I think there'd be no harm in requiring the
name parameter.

>
>> [snip]
>>
>> ``ContextVar.set(value) -> Token`` is used to set a new value for
>> the context variable in the current ``Context``::
>>
>>     # Set the variable 'var' to 1 in the current context.
>>     var.set(1)
>>
>> ``contextvars.Token`` is an opaque object that should be used to
>> restore the ``ContextVar`` to its previous value, or remove it from
>> the context if it was not set before.  The ``ContextVar.reset(Token)``
>> method is used for that::
>>
>>     old = var.set(1)
>>     try:
>>         ...
>>     finally:
>>         var.reset(old)
>>
>> The ``Token`` API exists to make the current proposal forward
>> compatible with :pep:`550`, in case there is demand to support
>> context variables in generators and asynchronous generators in the
>> future.
>
> The "restoring values" focus is valuable on its own.  It emphasizes a
> specific usage pattern to users (though a context manager would
> achieve the same).  The token + reset() approach means that users
> don't need to think about "not set" when restoring values.  That said,
> is there otherwise any value to the "not set" concept?  If so,
> "is_set()" (not strictly necessary) and "unset()" methods may be
> warranted.

"unset()" would be incompatible with PEP 550, which has a chained
execution context model.  When you have a chain of contexts, unset()
becomes ambiguous.

"is_set()" is trivially implemented via "var.get(default=marker) is
marker", but I don't think people will need it enough to justify
adding this method now.

[..]

>> Any changes to the context will be contained and persisted in the
>> ``Context`` object on which ``run()`` is called.
>
> For me this would be more clear if it could be spelled like this:
>
>     with ctx:
>         function()

But we would still need "run()" to use in asyncio.  Context managers
are slower than a single method call.  Also, context management like
this is a *very* low-level API intended to be used by
framework/library authors in very few places.

Again, I'd really prefer to keep the API to the minimum in 3.7.

> Also, let's say I want to run a function under a custom context,
> whether a fresh one or an adaptation of an existing one.  How can I
> compose such a Context?  AFAICS, the only way to modify a context is
> by using ContextVar.set() (and reset()), which modifies the current
> context.  It might be useful if there were a more direct way, like a
> "Context.add(*var) -> Context" and "Context.remove(*var) -> Context"
> and maybe even a "Context.set(var, value) -> Context" and
> "Context.unset(var) -> Context".

Again, this would be a shortcut for a very limited number of
use-cases.  I just can't come up with a good real-world example where
you want to add many context variables to the context and run
something in it.
But even if you want that, you can always just wrap your function: def set_and_call(var, val, func): var.set(val) return func() context.run(set_and_call, var, val, func) Yury From victor.stinner at gmail.com Wed Dec 13 16:56:10 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 13 Dec 2017 22:56:10 +0100 Subject: [Python-Dev] Support of the Android platform In-Reply-To: References: Message-ID: Hi Xavier, I looked at your scripts to build Android but I failed to use them. Anyway, I'm not sure why these scripts have to be part of the CPython git repository. Technically, is there a reason to put it alongside the source code and Unix build scripts (configure/Makefile/setup)? Your https://github.com/python/cpython/pull/1629 only adds new files without touching existing files. I suggest creating a new Git project. It may be in the python organization, or you may start with your GitHub account. Cross-compilation is hard, and I'm not sure that it's possible to build a single recipe for all Android API versions, all configurations, any set of libraries, etc. For Android, it seems like each developer might want a subtly different configuration which might not be easy to support. Having a separate Git project would allow people to contribute more easily, experiment with their forks, etc. What do you think? I'm only talking about the proposed Android/ directory and https://github.com/python/cpython/pull/1629. Everything else is fine :-) Victor 2017-12-10 15:19 GMT+01:00 Xavier de Gaye : > The following note is a proposal to add support for the Android platform. > > The note is easier to read with clickable links at > https://github.com/xdegaye/cagibi/blob/master/doc/android_support.rst > > Motivations > =========== > > * Android is ubiquitous. > * This would be the first platform supported by Python that is > cross-compiled, > thanks to many contributors.
> * Although the Android operating system is Linux, it is different from most > Linux platforms, for example it does not use GNU libc and runs SELinux in > enforcing mode. Therefore supporting this platform would make Python more > robust and also would allow testing it on arm 64-bit processors. > * Python running on Android is also a handheld calculator, a successor of > the > slide rule and the `HP 41`_. > > Current status > ============== > > * The Python test suite succeeds when run on Android emulators using > buildbot > strenuous settings with the following architectures on API 24: x86, > x86_64, > armv7 and arm64. > * The `Android build system`_ is described in another section. > * The `buildmaster-config PR 26`_ proposes to update ``master.cfg`` to > enable > buildbots to run a given Android API and architecture on the emulators. > * The Android emulator is actually ``qemu``, so the test suites for x86 and > x86_64 last about the same time as the test suite run natively when the > processor of the build system is of the x86 family. The test suites for > the > arm architectures last much longer: about 8 hours for arm64 and 10 hours > for > armv7 on a four-year-old laptop. > * The changes that have been made to achieve this status are listed in > `bpo-26865`_, the Android meta-issue. > * Given the CPU resources required to run the test suite on the arm > emulators, > it may be difficult to find a contributed buildbot worker. So it remains > to > find the hardware to run these buildbots. > > Proposal > ======== > > Support the Android platform on API 24 [1]_ for the x86_64, armv7 and arm64 > architectures built with NDK 14b. > > *API 24* > * API 21 is the first version to provide usable support for wide > characters > and where SELinux is run in enforcing mode.
> > * API 22 introduces an annoying bug in the linker that prints something > like > this when python is started:: > > ``WARNING: linker: libpython3.6m.so.1.0: unused DT entry: type > 0x6ffffffe arg 0x14554``. > > The `termux`_ Android terminal emulator describes this problem at the > end > of its `termux-packages`_ gitlab page and has implemented a > ``termux-elf-cleaner`` tool to strip the useless entries from the ELF > header of executables. > > * API 24 is the first version where the `adb`_ shell is run on the > emulator > as a ``shell`` user instead of the ``root`` user as previously, and the > first > version that supports arm64. > > *x86_64* > It seems that no handheld device exists using that architecture. It is > supported because the x86_64 Android emulator runs fast and therefore is a > good candidate as a buildbot worker. > > *NDK 14b* > This release of the NDK is the first one to use `Unified headers`_, fixing > numerous problems that had previously been worked around by patching the > Python configure script > (those workarounds have since been reverted). > > Android idiosyncrasies > ====================== > > * The default shell is ``/system/bin/sh``. > * The file system layout is not a traditional unix layout, there is no > ``/tmp`` for example. Most directories have user restricted access, > ``/sdcard`` is mounted as ``noexec`` for example. > * The (java) applications are allocated a unix user id and a subdirectory on > ``/data/data``. > * SELinux is run in enforcing mode. > * Shared memory and semaphores are not supported. > * The default encoding is UTF-8. > > Android build system > ==================== > > The Android build system is implemented at `bpo-30386`_ with `PR 1629`_ and > is documented by its `README`_. It provides the following features: > > * To build a distribution for a device or an emulator with a given API level > and a given architecture.
> * To start the emulator and > + install the distribution > + start a remote interactive shell > + or run remotely a python command > + or run remotely the buildbottest > * Run gdb on the python process that is running on the emulator with python > pretty-printing. > > The build system adds the ``Android/`` directory and the > ``configure-android`` > script to the root of the Python source directory on the master branch > without > modifying any other file. The build system can be installed, upgraded (i.e. > the > SDK and NDK) and run remotely, through ssh for example. > > The following external libraries, when they are configured in the build > system, > are downloaded from the internet and cross-compiled (only once, on the first > run of the build system) before the cross-compilation of the extension > modules: > > * ``ncurses`` > * ``readline`` > * ``sqlite`` > * ``libffi`` > * ``openssl``, the cross-compilation of openssl fails on x86_64 and arm64 > and > this step is skipped on those architectures. > > The following extension modules are disabled by adding them to the > ``*disabled*`` section of ``Modules/Setup``: > > * ``_uuid``, Android has no uuid/uuid.h header. > * ``grp`` some grp.h functions are not declared. > * ``_crypt``, Android does not have crypt.h. > * ``_ctypes`` on x86_64 where all long double tests fail (`bpo-32202`_) and > on > arm64 (see `bpo-32203`_). > > .. [1] On Wikipedia `Android version history`_ lists the correspondence > between > API level, commercial name and version for each release. It also provides > information on the global Android version distribution, see the two > charts > on top. > > .. _`README`: > https://github.com/xdegaye/cpython/blob/bpo-30386/Android/README.rst > .. _`bpo-26865`: https://bugs.python.org/issue26865 > .. _`bpo-30386`: https://bugs.python.org/issue30386 > .. _`bpo-32202`: https://bugs.python.org/issue32202 > .. _`bpo-32203`: https://bugs.python.org/issue32203 > .. 
_`PR 1629`: https://github.com/python/cpython/pull/1629 > .. _`buildmaster-config PR 26`: > https://github.com/python/buildmaster-config/pull/26 > .. _`Android version history`: > https://en.wikipedia.org/wiki/Android_version_history > .. _`termux`: https://termux.com/ > .. _`termux-packages`: https://gitlab.com/jbwhips883/termux-packages > .. _`adb`: https://developer.android.com/studio/command-line/adb.html > .. _`Unified headers`: > https://android.googlesource.com/platform/ndk.git/+/ndk-r14-release/docs/UnifiedHeaders.md > .. _`HP 41`: https://en.wikipedia.org/wiki/HP-41C > .. vim:filetype=rst:tw=78:ts=8:sts=2:sw=2:et: > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com From ncoghlan at gmail.com Thu Dec 14 01:48:25 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 14 Dec 2017 16:48:25 +1000 Subject: [Python-Dev] PEP 540: Add a new UTF-8 mode (v3) In-Reply-To: References: Message-ID: On 11 Dec. 2017 6:50 am, "INADA Naoki" wrote: Except one typo I commented on Github, I accept PEP 540. Well done, Victor and Nick, for PEPs 540 and 538. Python 3.7 will be the most UTF-8 friendly Python 3 ever. And thank you for all of your work on reviewing them! The appropriate trade-offs between ease of use in common scenarios and an increased chance of emitting mojibake are hard to figure out, but I like where we've ended up :) Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Dec 14 01:54:40 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 14 Dec 2017 16:54:40 +1000 Subject: [Python-Dev] PEP 565: Show DeprecationWarning in __main__ In-Reply-To: References: Message-ID: On 13 Dec.
2017 12:53 pm, "Victor Stinner" wrote: 2017-12-13 0:24 GMT+01:00 Guido van Rossum : > Considered disagreement is acceptable. Sure, I'm fine with that ;-) > Nick, congrats with PEP 565! Please update the PEP to mark it as approved > with a link to this message as the resolution, and let's get the > implementation into 3.7a4! Nick wrote that he will be away, since I update his PEP: https://github.com/python/peps/commit/355eced94cf4117492c9e1eee8f950 f08e53ec90 Thanks Guido for the approval, and Victor for explaining the dev mode connection updating the PEP status! I'll get the implementation updated & merged in the first week of January (my phone is my only client device for most of the time until then). Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Dec 14 02:00:10 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 14 Dec 2017 17:00:10 +1000 Subject: [Python-Dev] PEP 489: module m_traverse called with NULL module state In-Reply-To: <20171213211540.6b92975a@fsol> References: <20171213211540.6b92975a@fsol> Message-ID: On 14 Dec. 2017 9:19 am, "Antoine Pitrou" wrote: Hello, After debugging a crash on AppVeyor for a submitter's PR (see https://github.com/python/cpython/pull/4611 ), I came to the following diagnosis: converting the "atexit" module (which is a built-in C extension) to PEP 489 multiphase initialization can lead to its m_traverse function (and presumably also m_clear and m_free) to be called while not module state is yet registered: that is, `PyModule_GetState(self)` when called from m_traverse returns NULL! Is that an expected or known subtlety? Not that I'm aware of, so I'd be inclined to classify it as a bug in the way we're handling multi-phase initialisation unless/until we determine there's no way to preserve the existing invariant from the single phase case. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From xdegaye at gmail.com Thu Dec 14 04:26:19 2017 From: xdegaye at gmail.com (Xavier de Gaye) Date: Thu, 14 Dec 2017 10:26:19 +0100 Subject: [Python-Dev] Support of the Android platform In-Reply-To: References: Message-ID: On 12/13/2017 10:56 PM, Victor Stinner wrote: > I looked at your scripts to build Android but I failed to use them. You failed because you did not read the README and tried to improvise. I will change the documentation and the build process to make it simpler for those that do not have the time to RTFM :-) I have documented the correct steps in the PR especially for you (they are just a summary of what is in the README) after you reported that failure. So you can follow these steps any time now, use the build system correctly and report the results here. This would be only fair, I think. BTW, you keep saying they are scripts when the build is actually driven by Makefiles (with the '.mk' extension). > Anyway, I'm not sure why these scripts have to be part of the CPython > git repository. > > Technically, is there a reason to put it aside the source code and > Unix build scripts (configure/Makefile/setup)? > Your https://github.com/python/cpython/pull/1629 only adds new files > without touching existing files. > > I suggest to create new Git project. It may be in the python > organization, or you may start with your GitHub account. The 'Mac' build system has its own subdirectory in the source tree and it makes sense as it is the reference build system for this platform. I do not see why this should be different for Android. > Cross-compilation is hard, and I'm not sure that it's possible to > build a single recipe for all Android API versions, all configuration, > any set of libraries, etc. For Android, it seems like each developer > might want a subtle different configuration which might not be easy to > support.
You are mistaken, this proposal does not suggest that we are going to support "all Android API versions, all configuration, any set of libraries, etc.", quite the opposite actually. The proposal is an Android build system for a specific API, a set of architectures using the NDK r14 toolchain and a set of optional external libraries. The build system enforces the use of NDK r14 for example (as you have painfully experienced). Python is tested on emulators so that there is no interference with vendor-specific additions found on the Android devices or with installed PlayStore applications. > Having a separated Git project would allow people to contribute more > easily, experiment their fork, etc. > > What do you think? Certainly not. We, core-devs, are very happy that no one is experimenting with our build system, it is complex enough as it is. The same goes for this Android build system. Your suggestion seems to be driven by the failure you have experienced with this new build system and the fact that a user is also reporting a failure. The origin of this other failure is unclear because I cannot reproduce it even though all the components used for the build are well defined and identical for everyone: the NDK includes the clang compiler, the libraries and the headers, the external libraries are downloaded by the build system, all the users use identical tools (same versions) and the same source code, and the only difference may be with some utility tools such as sed, awk, etc. This is a bad start for this proposal and it would have been fair to inform me that you were collaborating on IRC with this other user in testing the build system. On the other hand, these problems may have some positive consequences since they make us aware of the fact that the bpo audience may change if we support Android and that this may be a problem.
Android attracts all kinds of developers who do not have the average expertise of Unix developers and, more importantly, who do not have the same motivations and the same etiquette. I am now concerned by the fact that the quality of the bug reports on bpo may dramatically decrease if we adopt this proposal. Xavier From solipsis at pitrou.net Thu Dec 14 06:00:34 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 14 Dec 2017 12:00:34 +0100 Subject: [Python-Dev] PEP 489: module m_traverse called with NULL module state In-Reply-To: References: <20171213211540.6b92975a@fsol> Message-ID: <20171214120034.063e27c2@fsol> On Thu, 14 Dec 2017 17:00:10 +1000 Nick Coghlan wrote: > On 14 Dec. 2017 9:19 am, "Antoine Pitrou" wrote: > > > Hello, > > After debugging a crash on AppVeyor for a submitter's PR > (see https://github.com/python/cpython/pull/4611 ), I came to the > following diagnosis: converting the "atexit" module (which is a > built-in C extension) to PEP 489 multiphase initialization can lead to > its m_traverse function (and presumably also m_clear and m_free) to be > called while not module state is yet registered: that is, > `PyModule_GetState(self)` when called from m_traverse returns NULL! > > Is that an expected or known subtlety? > > > Not that I'm aware of, so I'd be inclined to classify it as a bug in the > way we're handling multi-phase initialisation unless/until we determine > there's no way to preserve the existing invariant from the single phase > case. Speaking of which, the doc is not very clear: is PEP 489 required for multi-interpreter support or is PyModule_GetState() sufficient? Regards Antoine.
From solipsis at pitrou.net Thu Dec 14 07:42:02 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 14 Dec 2017 13:42:02 +0100 Subject: [Python-Dev] sqlite3 module needs maintainer Message-ID: <20171214134202.584204e3@fsol> Hello, After noticing that many issues were opened for it and it was lacking maintenance, I contacted the sqlite3 module's historical author and maintainer, Gerhard Häring (for the record, Gerhard didn't make any changes to the sqlite3 module since 2011... and the Python 2-only, third-party "pysqlite" module, which he maintains as well, did not receive many changes lately). He answered me that he was OK to declare the sqlite3 module as officially unmaintained. Since sqlite3 is such a useful and widely-used standard library module, it probably deserves someone competent and motivated to maintain it. Berker Peksağ is also interested in sqlite3 and in helping maintain it (I'm trying to channel his private words here... I hope I don't misrepresent his position). Regards Antoine. From encukou at gmail.com Thu Dec 14 09:05:02 2017 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 14 Dec 2017 15:05:02 +0100 Subject: [Python-Dev] PEP 489: module m_traverse called with NULL module state In-Reply-To: <20171214120034.063e27c2@fsol> References: <20171213211540.6b92975a@fsol> <20171214120034.063e27c2@fsol> Message-ID: <7197623f-d9cb-48ac-bb08-03bbf4e30058@gmail.com> On 12/14/2017 12:00 PM, Antoine Pitrou wrote: > On Thu, 14 Dec 2017 17:00:10 +1000 > Nick Coghlan wrote: >> On 14 Dec.
2017 9:19 am, "Antoine Pitrou" wrote: >> >> >> Hello, >> >> After debugging a crash on AppVeyor for a submitter's PR >> (see https://github.com/python/cpython/pull/4611 ), I came to the >> following diagnosis: converting the "atexit" module (which is a >> built-in C extension) to PEP 489 multiphase initialization can lead to >> its m_traverse function (and presumably also m_clear and m_free) to be >> called while not module state is yet registered: that is, >> `PyModule_GetState(self)` when called from m_traverse returns NULL! >> >> Is that an expected or known subtlety? Thank you for looking into this, Antoine! >> Not that I'm aware of, so I'd be inclined to classify it as a bug in the >> way we're handling multi-phase initialisation unless/until we determine >> there's no way to preserve the existing invariant from the single phase >> case. Yes, it's a bug ? at least in documentation. From initial investigation, the problem is that between the two phases of multi-phase init, module state is NULL, and Python code can run. This is expected, so I'm thinking m_traverse for all modules using multi-phase init should have a check for NULL. And this should be documented. Let's have Marcel run with this a bit further. > Speaking of which, the doc is not very clear: is PEP 489 required for > multi-interpreter support or is PyModule_GetState() sufficient? I'm not exactly sure what you're asking; which doc are you referring to? PEP 489 gives you good defaults, if you use it and avoid global state (roughly: C-level mutable static variables), then you should get multi-interpreter support for free in simple cases. It's also possible to use PyModule_GetState() and other APIs directly. However, I'd like to avoid solving subinterpreter support separately (and slightly differently) in each module. For a slightly bigger picture: as a part-time internship, Marcel is identifying where PEP 489 is inadequate, and solving the problems for the complex cases. 
This is part of improving subinterpreter support in general. Going with a PEP 489-based solution with atexit would help us in that effort. I'm assuming fixing the atexit bug from 2009 [0] can be delayed a bit as issues with PEP 489 are investigated & solved. Does that sound fair? [0] https://bugs.python.org/issue6531 From victor.stinner at gmail.com Thu Dec 14 08:59:24 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 14 Dec 2017 14:59:24 +0100 Subject: [Python-Dev] Support of the Android platform In-Reply-To: References: Message-ID: 2017-12-14 10:26 GMT+01:00 Xavier de Gaye : > The 'Mac' build system has its own subdirectory in the source tree and it > makes sense as it is the reference build system for this platform. I do not > see why this should be different for Android. Hum, Mac/ is mostly the recipe to build the installer and a .dmg image, no? macOS uses the same configure/Makefile as Linux, no? Building Python on macOS is quite simple, there is nothing special about macOS. And macOS is used on a *very* limited set of hardware, only sold by a single vendor, Apple. Android is quite the opposite. The proposed Android/ directory is a complex recipe for cross-compiling for a wide range of hardware, and likely different usages of Python. Some people may want a REPL, some people may only be interested in a GUI (like Kivy or Panda3D?), some people probably want to do something else. All Python modules work on macOS, since macOS is just one flavor of POSIX. While Android is compatible with POSIX, from the bug reports that I saw, many modules don't work and will not work on Android. So supporting Android is much more complex than supporting macOS.
Since we are talking about the future, I would like to remain open to supporting a wider range of Android API versions and configurations. As I wrote, I don't require supporting all configurations and all API versions, but just providing best-effort support, and only fully supporting one specific API version and one specific config. > The build system enforces the use of NDK r14 for example (as you > have painfuly experienced). It seems like Android is evolving quickly, I would say quicker than Python releases. I'm asking if it's a good idea to put a recipe alongside the Python source code for one specific Android API version. Would it still make sense to build for NDK v14 in 2 or 5 years? > This is a bad start for this proposal and it would have been fair to inform > me that you were working in irc collaboration with this other user in > testing the build system. Right, I'm working with Paul Peny (pmpp on IRC). He is helping me to test your PR. Paul understands Android much better than I do. For me, it's still a huge black box. Basically, I don't understand what I'm doing :-) Victor From solipsis at pitrou.net Thu Dec 14 09:11:11 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 14 Dec 2017 15:11:11 +0100 Subject: [Python-Dev] PEP 489: module m_traverse called with NULL module state References: <20171213211540.6b92975a@fsol> <20171214120034.063e27c2@fsol> <7197623f-d9cb-48ac-bb08-03bbf4e30058@gmail.com> Message-ID: <20171214151111.01953845@fsol> On Thu, 14 Dec 2017 15:05:02 +0100 Petr Viktorin wrote: > > PEP 489 gives you good defaults, if you use it and avoid global state > (roughly: C-level mutable static variables), then you should get > multi-interpreter support for free in simple cases. > It's also possible to use PyModule_GetState() and other APIs directly. > However, I'd like to avoid solving subinterpreter support separately > (and slightly differently) in each module. My question is: can you get multi-interpreter support *without* PEP 489?
That is, using single-phase initialization and PyModule_GetState(). > For a slightly bigger picture: as a part-time internship, Marcel is > identifying where PEP 489 is inadequate, and solving the problems for > the complex cases. Is Marcel mentored by anyone in particular? > I'm assuming fixing the atexit bug from 2009 [0] can be delayed a bit as > issues with PEP 489 are investigated & solved. > Does that sound fair? Probably, but the atexit bug deserves fixing in itself. If a fix is ready, it would be a pity not to let it in. Regards Antoine. From victor.stinner at gmail.com Thu Dec 14 10:16:15 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 14 Dec 2017 16:16:15 +0100 Subject: [Python-Dev] PEP 432 progress: Python initialization Message-ID: Hi, Serhiy Storchaka seems to be worried by the high number of commits in https://bugs.python.org/issue32030 "PEP 432: Rewrite Py_Main()", so let me explain the context of this work :-) To prepare CPython to implement my UTF-8 Mode PEP (PEP 540), I worked on the implementation of Nick Coghlan's PEP 432: PEP 432 -- Restructuring the CPython startup sequence https://www.python.org/dev/peps/pep-0432/ The startup sequence is a big pile of code made of multiple functions: main(), Py_Main(), Py_Initialize(), Py_Finalize()... and a lot of tiny "configuration" functions like Py_SetPath(). Over the years, many configuration options were added in the middle of the code. The priority of configuration options is not always correct between command line options, environment variables, configuration files (like "pyenv.cfg"), etc. For technical reasons, it's hard to properly implement the -E option (ignore PYTHON* environment variables). For example, the new PYTHONCOERCECLOCALE environment variable (of PEP 538) doesn't handle -E properly (it ignores -E), because it was too complex to support -E. -- I'm working on fixing this.
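[Editor's note: the -E semantics described here can be observed from a released Python. A quick demonstration using one arbitrary PYTHON* variable; this is an illustrative sketch, not part of the patch series being discussed.]

```python
import os
import subprocess
import sys

env = dict(os.environ, PYTHONDONTWRITEBYTECODE="1")
probe = "import sys; print(sys.dont_write_bytecode)"

# Without -E, the PYTHON* environment variable is honoured...
out = subprocess.run([sys.executable, "-c", probe],
                     env=env, capture_output=True, text=True)
assert out.stdout.strip() == "True"

# ...while -E makes the interpreter ignore all PYTHON* variables.
out = subprocess.run([sys.executable, "-E", "-c", probe],
                     env=env, capture_output=True, text=True)
assert out.stdout.strip() == "False"
```

PEP 540's PYTHONUTF8 variable has to obey the same rule, which is part of why the configuration-reading code needs to be centralized.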
In recent weeks, I mostly worked on the Py_Main() function, Modules/getpath.c and PC/getpathp.c, to "refactor" the code: * Split big functions (300 to 500 lines) into multiple small functions (50 lines or less), to make the control flow easy to follow and to allow code to be moved around more easily * Replace static and global variables with memory allocated on the heap. * Reorganize how the configuration is read: populate a first temporary structure (_PyMain using wchar_t*), then create Python objects (_PyMainInterpreterConfig) to finish with the real configuration (like setting attributes of the sys module). The goal is to centralize all code reading configuration to fix the priority and to simplify the code. My motivation was to write a correct implementation of the UTF-8 Mode (PEP 540). Nick's motivation is to make CPython easier to embed. His plan for Python 3.8 is to give access to the new _PyCoreConfig and _PyMainInterpreterConfig structures to: * easily give access to most (if not all?) configuration options to "embedders" * allow configuring Python without environment variables, command line options or configuration files, but only using these structures * allow configuring Python using Python objects (PyObject*) rather than C types (like wchar_t*) (I'm not sure that I understood correctly, so please read the PEP 432 ;-)) IMHO the most visible change of PEP 432 is to split Python initialization in two parts: * Core: the strict minimum to use the Python C API * Main: everything else The goal is to introduce the opportunity to configure Python between Core and Main. The implementation is currently a work-in-progress. Neither Nick nor I will have the bandwidth to update his PEP and finish the implementation before Python 3.7. So this work remains private until at least Python 3.8. Another part of the work is to enhance the documentation.
You can for example now find an explicit list of C functions which can be called before Py_Initialize(): https://docs.python.org/dev/c-api/init.html#before-python-initialization And also a list of functions that must not be called before Py_Initialize(), even though you might want to call them :-) Victor From victor.stinner at gmail.com Thu Dec 14 10:31:41 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 14 Dec 2017 16:31:41 +0100 Subject: [Python-Dev] PEP 432 progress: Python initialization In-Reply-To: References: Message-ID: Currently, we have the following configuration options: typedef struct { int ignore_environment; /* -E */ int use_hash_seed; /* PYTHONHASHSEED=x */ unsigned long hash_seed; int _disable_importlib; /* Needed by freeze_importlib */ const char *allocator; /* Memory allocator: _PyMem_SetupAllocators() */ int dev_mode; /* -X dev */ int faulthandler; /* -X faulthandler */ int tracemalloc; /* -X tracemalloc=N */ int import_time; /* -X importtime */ int show_ref_count; /* -X showrefcount */ int show_alloc_count; /* -X showalloccount */ int dump_refs; /* PYTHONDUMPREFS */ int malloc_stats; /* PYTHONMALLOCSTATS */ int utf8_mode; /* -X utf8 or PYTHONUTF8 environment variable */ wchar_t *module_search_path_env; /* PYTHONPATH environment variable */ wchar_t *home; /* PYTHONHOME environment variable, see also Py_SetPythonHome().
*/ wchar_t *program_name; /* Program name, see also Py_GetProgramName() */ } _PyCoreConfig; and typedef struct { int install_signal_handlers; PyObject *argv; /* sys.argv list, can be NULL */ PyObject *module_search_path; /* sys.path list */ PyObject *warnoptions; /* sys.warnoptions list, can be NULL */ PyObject *xoptions; /* sys._xoptions dict, can be NULL */ } _PyMainInterpreterConfig; Victor 2017-12-14 16:16 GMT+01:00 Victor Stinner : > Hi, > > Serhiy Storchaka seems to be worried by the high numbers of commits in > https://bugs.python.org/issue32030 "PEP 432: Rewrite Py_Main()", so > let me explain the context of this work :-) > > To prepare CPython to implement my UTF-8 Mode PEP (PEP 540), I worked > on the implementation of Nick Coghlan's PEP 432: > > PEP 432 -- Restructuring the CPython startup sequence > https://www.python.org/dev/peps/pep-0432/ > > The startup sequence is a big pile of code made of multiple functions: > main(), Py_Main(), Py_Initialize(), Py_Finalize()... and a lot of tiny > "configuration" functions like Py_SetPath(). > > Over the years, many configuration options were added in the middle of > the code. The priority of configuration options is not always correct > between command line options, envrionment variables, configuration > files (like "pyenv.cfg"), etc. For technical reasons, it's hard to > impement properly the -E option (ignore PYTHON* environment > variables). > > For example, the new PYTHONCOERCECLOCALE environment variable (of PEP > 538) doesn't handle properly -E (it ignores -E), because it was too > complex to support -E. -- I'm working on fixing this. 
> > Last weeks, I mostly worked on the Py_Main() function, > Modules/getpath.c and PC/getpathp.c, to "refactor" the code: > > * Split big functions (300 to 500 lines) into multiple small functions > (50 lines or less), to make it easily to follow the control flow and > to allow to more easily move code > > * Replace static and global variables with memory allocated on the heap. > > * Reorganize how the configuration is read: populate a first temporary > structure (_PyMain using wchar_t*), then create Python objects > (_PyMainInterpreterConfig) to finish with the real configuration (like > setting attributes of the sys module). The goal is to centralize all > code reading configuration to fix the priority and to simplify the > code. > > My motivation was to write a correct implementation of the UTF-8 Mode (PEP 540). > > Nick's motivation is to make CPython easily to embed. His plan for > Python 3.8 is to give access to the new _PyCoreConfig and > _PyMainInterpreterConfig structures to: > > * easily give access to most (if not all?) configuration options to "embedders" > * allow to configure Python without environment variables, command > line options, configuration files, but only using these structures > * allow to configure Python using Python objects (PyObject*) rather > than C types (like wchar_t*) > > (I'm not sure that I understood correctly, so please read the PEP 432 ;-)) > > IMHO the most visible change of the PEP 432 is to split Python > initialization in two parts: > > * Core: strict minimum to use the Python C API > * Main: everything else > > The goal is to introduce the opportunity to configure Python between > Core and Main. > > The implementation is currently a work-in-progress. Nick will not have > the bandwidth, neither do I, to update his PEP and finish the > implementation, before Python 3.7. So this work remains private until > at least Python 3.8. > > Another part of the work is to enhance the documentation. 
> You can, for example, now find an explicit list of C functions which can be called before Py_Initialize():
>
> https://docs.python.org/dev/c-api/init.html#before-python-initialization
>
> And also a list of functions that must not be called before Py_Initialize(), whereas you might want to call them :-)
>
> Victor

From guido at python.org  Thu Dec 14 12:04:43 2017
From: guido at python.org (Guido van Rossum)
Date: Thu, 14 Dec 2017 09:04:43 -0800
Subject: [Python-Dev] sqlite3 module needs maintainer
In-Reply-To: <20171214134202.584204e3@fsol>
References: <20171214134202.584204e3@fsol>
Message-ID: 

SGTM. It's one of my favorite stdlib modules wrapping an external library -- I use it for a variety of tasks to which it is well-suited. Go Berker!

On Thu, Dec 14, 2017 at 4:42 AM, Antoine Pitrou wrote:
>
> Hello,
>
> After noticing that many issues were opened for it and it was lacking maintenance, I contacted the sqlite3 module's historical author and maintainer, Gerhard Häring (for the record, Gerhard hasn't made any changes to the sqlite3 module since 2011... and the Python 2-only, third-party "pysqlite" module, which he maintains as well, has not received many changes lately). He answered me that he was ok with declaring the sqlite3 module officially unmaintained.
>
> Since sqlite3 is such a useful and widely-used standard library module, it probably deserves someone competent and motivated to maintain it. Berker Peksağ is also interested in sqlite3 and in helping maintain it (I'm trying to channel his private words here... I hope I don't misrepresent his position).
>
> Regards
>
> Antoine.
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org

--
--Guido van Rossum (python.org/~guido)

From tjreedy at udel.edu  Thu Dec 14 16:54:52 2017
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 14 Dec 2017 16:54:52 -0500
Subject: [Python-Dev] PEP 432 progress: Python initialization
In-Reply-To: 
References: 
Message-ID: 

On 12/14/2017 10:16 AM, Victor Stinner wrote:
> Hi,
>
> Serhiy Storchaka seems to be worried by the high number of commits in https://bugs.python.org/issue32030 "PEP 432: Rewrite Py_Main()", so let me explain the context of this work :-)

You could have made (and still could make) that a master issue with multiple dependencies. Last summer, I merged at least 20 patches for one idlelib file. I split them up among 1 master issue and about 6 dependency issues. That was essential because most of the patches were written by one of 3 new contributors and needed separate discussions about the strategy for a particular patch.

I completely agree with keeping PRs to a reviewable size.

--
Terry Jan Reedy

From victor.stinner at gmail.com  Thu Dec 14 17:25:39 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 14 Dec 2017 23:25:39 +0100
Subject: [Python-Dev] PEP 432 progress: Python initialization
In-Reply-To: 
References: 
Message-ID: 

2017-12-14 22:54 GMT+01:00 Terry Reedy :
> You could have made (and still could make) that a master issue with multiple dependencies. Last summer, I merged at least 20 patches for one idlelib file. I split them up among 1 master issue and about 6 dependency issues. That was essential because most of the patches were written by one of 3 new contributors and needed separate discussions about the strategy for a particular patch.
> I completely agree with keeping PRs to a reviewable size.

I'm not sure that multiple issues are needed, since all these changes are related to Py_Main() or are very close to Py_Main(), and they implement what is defined in PEP 432.

Technically, I could push a single giant commit, but it would be impossible to review, even for myself, whereas this way I'm reading each change multiple times. I'm testing each change on Windows, macOS, Linux and FreeBSD to make sure that everything is fine. Py_Main() has a few functions specific to one platform like Windows or macOS. I also had to "iterate" on the code to move the code slowly, step by step.

I'm not really proud of all these refactoring changes :-( But I hope that "at the end", the code will be much easier to understand and to maintain.

Moreover, as I wrote, my intent is also to fix all the code handling configuration. For example, I just fixed the code to define sys.argv earlier. Now, sys.argv is defined very early in Python initialization. Previously, sys.argv was only defined after Py_Initialize() completed, so for example the site module could not access sys.argv:

Traceback (most recent call last):
  File "/home/vstinner/prog/python/3.6/Lib/site.py", line 600, in
    print(sys.argv)
AttributeError: module 'sys' has no attribute 'argv'

I'm not sure that it's useful, but I was surprised that sys was only partially initialized before the site module was loaded.

Victor

From yselivanov.ml at gmail.com  Thu Dec 14 18:49:40 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Thu, 14 Dec 2017 18:49:40 -0500
Subject: [Python-Dev] Accepting PEP 560 -- Core support for typing module and generic types
In-Reply-To: 
References: 
Message-ID: 

Ivan, Guido,

Would it be possible to add a slot so that types defined in C can implement __class_getitem__?
static PyClassMethodDef class_methods = {
    foo_class_getitem    /* cm_class_getitem */
};

static PyTypeObject Foo = {
    .tp_class_methods = class_methods
};

Yury

On Mon, Dec 4, 2017 at 5:18 PM, Ivan Levkivskyi wrote:
> Thank you! It looks like we have a bunch of accepted PEPs today. It is great to see all this! Thanks everyone who participated in the discussions here, on python-ideas and on the typing tracker. Special thanks to Mark who started this discussion.
>
> --
> Ivan

From guido at python.org  Thu Dec 14 19:03:48 2017
From: guido at python.org (Guido van Rossum)
Date: Thu, 14 Dec 2017 16:03:48 -0800
Subject: [Python-Dev] Accepting PEP 560 -- Core support for typing module and generic types
In-Reply-To: 
References: 
Message-ID: 

A slot is pretty expensive, as *every* class in existence will be another 8 bytes larger (and possibly more due to malloc rounding). So unless we find that there's a significant performance benefit I think we should hold back on this. IIRC Ivan has already measured an order of magnitude's speedup (well, 7x), so we may not need it. :-)

On Thu, Dec 14, 2017 at 3:49 PM, Yury Selivanov wrote:
> Ivan, Guido,
>
> Would it be possible to add a slot so that types defined in C can implement __class_getitem__?
>
> static PyClassMethodDef class_methods = {
>     foo_class_getitem    /* cm_class_getitem */
> };
>
> static PyTypeObject Foo = {
>     .tp_class_methods = class_methods
> };
>
> Yury
>
> On Mon, Dec 4, 2017 at 5:18 PM, Ivan Levkivskyi wrote:
>> Thank you! It looks like we have a bunch of accepted PEPs today. It is great to see all this! Thanks everyone who participated in the discussions here, on python-ideas and on the typing tracker. Special thanks to Mark who started this discussion.
>> --
>> Ivan

--
--Guido van Rossum (python.org/~guido)

From yselivanov.ml at gmail.com  Thu Dec 14 19:25:58 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Thu, 14 Dec 2017 19:25:58 -0500
Subject: [Python-Dev] Accepting PEP 560 -- Core support for typing module and generic types
In-Reply-To: 
References: 
Message-ID: 

On Thu, Dec 14, 2017 at 7:03 PM, Guido van Rossum wrote:
> A slot is pretty expensive, as *every* class in existence will be another 8 bytes larger (and possibly more due to malloc rounding). So unless we find that there's a significant performance benefit I think we should hold back on this. IIRC Ivan has already measured an order of magnitude's speedup (well, 7x), so we may not need it. :-)

My motivation to add the slot wasn't the performance: it's just not possible to have a class-level __getitem__ on types defined in C. The only way is to define a base class in C and then extend it in pure Python. This isn't too hard usually, though.

BTW that slot could also host the new __mro_entries__ method and, potentially, other magic methods like __subclasscheck__ and __instancecheck__.

Yury

From solipsis at pitrou.net  Thu Dec 14 19:33:18 2017
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 15 Dec 2017 01:33:18 +0100
Subject: [Python-Dev] Accepting PEP 560 -- Core support for typing module and generic types
References: 
Message-ID: <20171215013318.6de20a16@fsol>

On Thu, 14 Dec 2017 16:03:48 -0800 Guido van Rossum wrote:
> A slot is pretty expensive, as *every* class in existence will be another 8 bytes larger (and possibly more due to malloc rounding).
I'm always surprised by the discussions about class object size. Even imagining you have 10000 classes in memory (a pretty large number, though I'm sure you can reach that number with a lot of dependencies), we're talking about a total 800 kB memory growth (let's recall that each of those classes will probably have code objects, docstrings and what not attached to it -- i.e. you don't often create empty classes).

Is it really an important concern?

Regards

Antoine.

PS: simple experiment at an IPython prompt, trying to load every large third-party package I have lying around (I may be forgetting some).

>>> def count_classes():
...:    types = [object]
...:    seen = set(types)
...:    while types:
...:        types = [c for c in itertools.chain.from_iterable(type.__subclasses__(c) for c in types)
...:                 if c not in seen]
...:        seen.update(types)
...:    return len(seen)
...:

>>> import numpy, asyncio, cython, requests, pandas, curio
>>> import django.apps, django.contrib, django.db, django.forms, django.http, django.middleware, django.views
>>> import twisted.internet.reactor, twisted.web
>>> import tornado.ioloop, tornado.gen, tornado.locks

>>> len(sys.modules)
1668
>>> count_classes()
6130

At this point, the IPython process uses 113 MB RSS, which adding an 8-byte slot to each of those 6130 classes would increase by a mere 49 kB. And I'm not even doing anything useful (no user data) with all those modules, so an actual application using those modules would weigh much more.
(and for the curious, the actual list of classes: https://gist.github.com/pitrou/8bd03dbb480f5acbc3abbe6782df5ebd)

From shoyer at gmail.com  Thu Dec 14 19:36:11 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Fri, 15 Dec 2017 00:36:11 +0000
Subject: [Python-Dev] Accepting PEP 560 -- Core support for typing module and generic types
In-Reply-To: 
References: 
Message-ID: 

On Thu, Dec 14, 2017 at 4:29 PM Yury Selivanov wrote:
> On Thu, Dec 14, 2017 at 7:03 PM, Guido van Rossum wrote:
> My motivation to add the slot wasn't the performance: it's just not possible to have a class-level __getitem__ on types defined in C. The only way is to define a base class in C and then extend it in pure Python. This isn't too hard usually, though.

This could potentially make it much more complicated to add typing support to NumPy. numpy.ndarray is defined in C.

From guido at python.org  Thu Dec 14 21:00:30 2017
From: guido at python.org (Guido van Rossum)
Date: Thu, 14 Dec 2017 18:00:30 -0800
Subject: [Python-Dev] Accepting PEP 560 -- Core support for typing module and generic types
In-Reply-To: 
References: 
Message-ID: 

In the light of Antoine's and Stephan's feedback I think this can be reconsidered -- while I want to take a cautious stance about resource consumption I don't want to stand in the way of progress.

On Thu, Dec 14, 2017 at 4:36 PM, Stephan Hoyer wrote:
> On Thu, Dec 14, 2017 at 4:29 PM Yury Selivanov wrote:
>> My motivation to add the slot wasn't the performance: it's just not possible to have a class-level __getitem__ on types defined in C. The only way is to define a base class in C and then extend it in pure Python. This isn't too hard usually, though.
>
> This could potentially make it much more complicated to add typing support to NumPy. numpy.ndarray is defined in C.
--
--Guido van Rossum (python.org/~guido)

From songofacandy at gmail.com  Thu Dec 14 21:03:42 2017
From: songofacandy at gmail.com (INADA Naoki)
Date: Fri, 15 Dec 2017 11:03:42 +0900
Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To: 
References: <20171104173013.GA4005@bytereef.org>
Message-ID: 

Hi, folks.

TLDR, was the final decision made already?

If "dict keeps insertion order" is not language spec and we continue to recommend people to use OrderedDict to keep order, I want to optimize OrderedDict for creation/iteration and memory usage. (See https://bugs.python.org/issue31265#msg301942 )

If dict ordering is language spec, I'll stop the effort and use the remaining time for other optimizations.

My thought is, +1 to make it language spec.

* PHP (the PHP 7.2 interpreter is faster than Python) keeps insertion order. So even if we make it language spec, I think we have enough room to optimize.

* It can stop discussions like "Does X keep insertion order? Is it language spec? What about Y? Z?". Everything on top of dict keeps insertion order. It's simple to learn and explain.

Regards,
INADA Naoki

On Sun, Nov 5, 2017 at 3:35 AM, Guido van Rossum wrote:
> This sounds reasonable -- I think when we introduced this in 3.6 we were worried that other implementations (e.g. Jython) would have a problem with this, but AFAIK they've reported back that they can do this just fine. So let's just document this as a language guarantee.
>
> On Sat, Nov 4, 2017 at 10:30 AM, Stefan Krah wrote:
>> Hello,
>>
>> would it be possible to guarantee that dict literals are ordered in v3.7?
>>
>> The issue is well-known and the workarounds are tedious, example:
>>
>> https://mail.python.org/pipermail/python-ideas/2015-December/037423.html
>>
>> If the feature is guaranteed now, people can rely on it around v3.9.
>>
>> Stefan Krah

--
--Guido van Rossum (python.org/~guido)

From guido at python.org  Thu Dec 14 21:20:49 2017
From: guido at python.org (Guido van Rossum)
Date: Thu, 14 Dec 2017 18:20:49 -0800
Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To: 
References: <20171104173013.GA4005@bytereef.org>
Message-ID: 

I'm in favor of stating that dict keeps order as part of the language spec.

However, re-reading https://mail.python.org/pipermail/python-dev/2017-November/150381.html there's still a point of debate: should it be allowed if dict reorders after deletion (presumably as a result of a rehash)? I'm in favor of prescribing that the order should be preserved even in that case, but I realize there's additional implementation work to be done. Inada-san, what do you think of this?

--Guido

On Thu, Dec 14, 2017 at 6:03 PM, INADA Naoki wrote:
> Hi, folks.
>
> TLDR, was the final decision made already?
>
> If "dict keeps insertion order" is not language spec and we continue to recommend people to use OrderedDict to keep order, I want to optimize OrderedDict for creation/iteration and memory usage. (See https://bugs.python.org/issue31265#msg301942 )
>
> If dict ordering is language spec, I'll stop the effort and use the remaining time for other optimizations.
>
> My thought is, +1 to make it language spec.
>
> * PHP (the PHP 7.2 interpreter is faster than Python) keeps insertion order.
> So even if we make it language spec, I think we have enough room to optimize.
>
> * It can stop discussions like "Does X keep insertion order? Is it language spec? What about Y? Z?". Everything on top of dict keeps insertion order. It's simple to learn and explain.
>
> Regards,
> INADA Naoki
>
> On Sun, Nov 5, 2017 at 3:35 AM, Guido van Rossum wrote:
>> This sounds reasonable -- I think when we introduced this in 3.6 we were worried that other implementations (e.g. Jython) would have a problem with this, but AFAIK they've reported back that they can do this just fine. So let's just document this as a language guarantee.
>>
>> On Sat, Nov 4, 2017 at 10:30 AM, Stefan Krah wrote:
>>> Hello,
>>>
>>> would it be possible to guarantee that dict literals are ordered in v3.7?
>>>
>>> The issue is well-known and the workarounds are tedious, example:
>>>
>>> https://mail.python.org/pipermail/python-ideas/2015-December/037423.html
>>>
>>> If the feature is guaranteed now, people can rely on it around v3.9.
>>>
>>> Stefan Krah

--
--Guido van Rossum (python.org/~guido)
From guido at python.org  Thu Dec 14 21:24:50 2017
From: guido at python.org (Guido van Rossum)
Date: Thu, 14 Dec 2017 18:24:50 -0800
Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To: 
References: <20171104173013.GA4005@bytereef.org>
Message-ID: 

Oh, I just found https://mail.python.org/pipermail/python-dev/2017-November/150323.html so I already know what you think: we should go with "dicts preserve insertion order" rather than "dicts preserve insertion order until the first deletion". I guess we should wait for Serhiy to confirm that he's okay with this.

On Thu, Dec 14, 2017 at 6:20 PM, Guido van Rossum wrote:
> I'm in favor of stating that dict keeps order as part of the language spec.
>
> However, re-reading https://mail.python.org/pipermail/python-dev/2017-November/150381.html there's still a point of debate: should it be allowed if dict reorders after deletion (presumably as a result of a rehash)? I'm in favor of prescribing that the order should be preserved even in that case, but I realize there's additional implementation work to be done. Inada-san, what do you think of this?
>
> --Guido
>
> On Thu, Dec 14, 2017 at 6:03 PM, INADA Naoki wrote:
>> Hi, folks.
>>
>> TLDR, was the final decision made already?
>>
>> If "dict keeps insertion order" is not language spec and we continue to recommend people to use OrderedDict to keep order, I want to optimize OrderedDict for creation/iteration and memory usage. (See https://bugs.python.org/issue31265#msg301942 )
>>
>> If dict ordering is language spec, I'll stop the effort and use the remaining time for other optimizations.
>>
>> My thought is, +1 to make it language spec.
>>
>> * PHP (the PHP 7.2 interpreter is faster than Python) keeps insertion order. So even if we make it language spec, I think we have enough room to optimize.
>>
>> * It can stop discussions like "Does X keep insertion order? Is it language spec? What about Y? Z?".
>> Everything on top of dict keeps insertion order. It's simple to learn and explain.
>>
>> Regards,
>> INADA Naoki
>>
>> On Sun, Nov 5, 2017 at 3:35 AM, Guido van Rossum wrote:
>>> This sounds reasonable -- I think when we introduced this in 3.6 we were worried that other implementations (e.g. Jython) would have a problem with this, but AFAIK they've reported back that they can do this just fine. So let's just document this as a language guarantee.
>>>
>>> On Sat, Nov 4, 2017 at 10:30 AM, Stefan Krah wrote:
>>>> Hello,
>>>>
>>>> would it be possible to guarantee that dict literals are ordered in v3.7?
>>>>
>>>> The issue is well-known and the workarounds are tedious, example:
>>>>
>>>> https://mail.python.org/pipermail/python-ideas/2015-December/037423.html
>>>>
>>>> If the feature is guaranteed now, people can rely on it around v3.9.
>>>>
>>>> Stefan Krah
>>>
>>> --
>>> --Guido van Rossum (python.org/~guido)

--
--Guido van Rossum (python.org/~guido)
From raymond.hettinger at gmail.com  Fri Dec 15 00:28:48 2017
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Thu, 14 Dec 2017 21:28:48 -0800
Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To: 
References: <20171104173013.GA4005@bytereef.org>
Message-ID: 

> On Dec 14, 2017, at 6:03 PM, INADA Naoki wrote:
>
> If "dict keeps insertion order" is not language spec and we continue to recommend people to use OrderedDict to keep order, I want to optimize OrderedDict for creation/iteration and memory usage. (See https://bugs.python.org/issue31265#msg301942 )

I support having regular dicts maintain insertion order, but am opposed to Inada changing the implementation of collections.OrderedDict. We can have the first without having the second.

Over the holidays, I hope to have time to do further analysis and create convincing demonstrations of why we want to keep the doubly-linked list implementation for collections.OrderedDict().

The current regular dictionary is based on the design I proposed several years ago. The primary goals of that design were compactness and faster iteration over the dense arrays of keys and values. Maintaining order was an artifact rather than a design goal. The design can maintain order but that is not its specialty.

In contrast, I gave collections.OrderedDict a different design (later coded in C by Eric Snow). The primary goal was to have efficient maintenance of order even for severe workloads such as that imposed by the lru_cache, which frequently alters order without touching the underlying dict. Intentionally, the OrderedDict has a design that prioritizes ordering capabilities at the expense of additional memory overhead and a constant-factor worse insertion time.

It is still my goal to have collections.OrderedDict have a different design with different performance characteristics than regular dicts.
It has some order-specific methods that regular dicts don't have (such as a move_to_end() and a popitem() that pops efficiently from either end). The OrderedDict needs to be good at those operations because that is what differentiates it from regular dicts.

The tracker issue https://bugs.python.org/issue31265 is assigned to me and I currently do not approve of it going forward. The sentiment is nice but it undoes very intentional design decisions. In the upcoming months, I will give it additional study and will be open-minded, but it is not cool to use a python-dev post as a way to do an end-run around my objections.

Back to the original topic of ordering, it is my feeling that it was inevitable that sooner or later we would guarantee ordering for regular dicts. Once we had a performant implementation, the decision would be dominated by how convenient it is for users. Also, a single guarantee is simpler for everyone and is better than having a hodgepodge of rules stating that X and Y are guaranteed while Z is not.

I think an ordering guarantee for regular dicts would be a nice Christmas present for our users and developers.

Cheers,

Raymond

From njs at pobox.com  Fri Dec 15 01:04:19 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 14 Dec 2017 22:04:19 -0800
Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To: 
References: <20171104173013.GA4005@bytereef.org>
Message-ID: 

On Dec 14, 2017 21:30, "Raymond Hettinger" wrote:

> On Dec 14, 2017, at 6:03 PM, INADA Naoki wrote:
>
> If "dict keeps insertion order" is not language spec and we continue to recommend people to use OrderedDict to keep order, I want to optimize OrderedDict for creation/iteration and memory usage. (See https://bugs.python.org/issue31265#msg301942 )

I support having regular dicts maintain insertion order, but am opposed to Inada changing the implementation of collections.OrderedDict. We can have the first without having the second.
It seems like the two quoted paragraphs are in vociferous agreement.

-n

From raymond.hettinger at gmail.com  Fri Dec 15 01:31:15 2017
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Thu, 14 Dec 2017 22:31:15 -0800
Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To: 
References: <20171104173013.GA4005@bytereef.org>
Message-ID: <0FE3900F-1EB0-44CB-A422-FC16F5162A6F@gmail.com>

> I support having regular dicts maintain insertion order, but am opposed to Inada changing the implementation of collections.OrderedDict. We can have the first without having the second.
>
> It seems like the two quoted paragraphs are in vociferous agreement.

The referenced tracker entry proposes, "Issue31265: Remove doubly-linked list from C OrderedDict". I don't think that should go forward regardless of whether regular dict order is guaranteed. Inada presented a compound proposition: either guarantee regular dict order or let him rip out the core design of OrderedDicts against my wishes.

Raymond

From yselivanov.ml at gmail.com  Fri Dec 15 01:50:47 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Fri, 15 Dec 2017 01:50:47 -0500
Subject: [Python-Dev] Accepting PEP 560 -- Core support for typing module and generic types
In-Reply-To: 
References: 
Message-ID: 

On Thu, Dec 14, 2017 at 9:00 PM, Guido van Rossum wrote:
> In the light of Antoine's and Stephan's feedback I think this can be reconsidered -- while I want to take a cautious stance about resource consumption I don't want to stand in the way of progress.
I've created an issue to discuss this further: https://bugs.python.org/issue32332

Yury

From storchaka at gmail.com  Fri Dec 15 02:46:25 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 15 Dec 2017 09:46:25 +0200
Subject: [Python-Dev] Accepting PEP 560 -- Core support for typing module and generic types
In-Reply-To: 
References: 
Message-ID: 

15.12.17 02:25, Yury Selivanov wrote:
> My motivation to add the slot wasn't the performance: it's just not possible to have a class-level __getitem__ on types defined in C. The only way is to define a base class in C and then extend it in pure Python. This isn't too hard usually, though.

What are the problems? How does this differ from __sizeof__ and __getstate__?

From storchaka at gmail.com  Fri Dec 15 03:03:32 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 15 Dec 2017 10:03:32 +0200
Subject: [Python-Dev] Accepting PEP 560 -- Core support for typing module and generic types
In-Reply-To: <20171215013318.6de20a16@fsol>
References: <20171215013318.6de20a16@fsol>
Message-ID: 

15.12.17 02:33, Antoine Pitrou wrote:
> On Thu, 14 Dec 2017 16:03:48 -0800 Guido van Rossum wrote:
>> A slot is pretty expensive, as *every* class in existence will be another 8 bytes larger (and possibly more due to malloc rounding).
>
> I'm always surprised by the discussions about class object size. Even imagining you have 10000 classes in memory (a pretty large number, though I'm sure you can reach that number with a lot of dependencies), we're talking about a total 800 kB memory growth (let's recall that each of those classes will probably have code objects, docstrings and what not attached to it -- i.e. you don't often create empty classes).
>
> Is it really an important concern?

The increased memory consumption is not the only cost. Initializing new slots takes time. You have to spend time for all class objects, not only for those that have the corresponding methods.
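[Archive note: a minimal sketch of the pure-Python workaround Yury describes in the quoted message above -- define the type in C, then extend it in pure Python and add the class-level method there. This is not code from the thread; it uses the built-in dict as a stand-in for a C-implemented type, and relies on __class_getitem__ as PEP 560 shipped it in Python 3.7+. A real generic type would return a proper alias object instead of a tuple.]

```python
# Sketch of the workaround: the built-in dict (a C type) cannot grow a
# class-level __getitem__ itself, but a thin pure-Python subclass can
# add __class_getitem__, the hook introduced by PEP 560 (Python 3.7+).
class TypedMapping(dict):
    def __class_getitem__(cls, item):
        # A real generic would build a proper alias object here;
        # a plain tuple keeps the sketch easy to inspect.
        return (cls, item)

alias = TypedMapping[str]    # subscribing the class calls __class_getitem__
print(alias == (TypedMapping, str))   # True
```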
In the case of a complex hierarchy the cost is larger, because you need to look up methods in all parent classes. This increases the startup time and increases the cost of creating local classes.

From p.f.moore at gmail.com  Fri Dec 15 03:52:29 2017
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 15 Dec 2017 08:52:29 +0000
Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To: 
References: <20171104173013.GA4005@bytereef.org>
Message-ID: 

On 15 December 2017 at 05:28, Raymond Hettinger wrote:
> In contrast, I gave collections.OrderedDict a different design (later coded in C by Eric Snow). The primary goal was to have efficient maintenance of order even for severe workloads such as that imposed by the lru_cache, which frequently alters order without touching the underlying dict. Intentionally, the OrderedDict has a design that prioritizes ordering capabilities at the expense of additional memory overhead and a constant-factor worse insertion time.

That's interesting information - I wasn't aware of the different performance goals. I'd suggest adding a discussion of these goals to the OrderedDict documentation. Now that dictionaries preserve order (whether we make that a language guarantee or keep it an implementation detail), having clear information on the intended performance trade-offs of an OrderedDict would help people understand why they might choose one over the other.

Paul

From songofacandy at gmail.com  Fri Dec 15 05:30:47 2017
From: songofacandy at gmail.com (INADA Naoki)
Date: Fri, 15 Dec 2017 19:30:47 +0900
Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To: 
References: <20171104173013.GA4005@bytereef.org>
Message-ID: 

> That's interesting information - I wasn't aware of the different performance goals.
FYI, the performance characteristics of my POC implementation of OrderedDict based on dict order are:

* 50% less memory usage
* 15% faster creation
* 100% (2x) faster iteration
* 20% slower move_to_end
* 40% slower comparison

(copied from https://bugs.python.org/issue31265#msg301942 )

Comparison is very unoptimized at the moment and I believe it can be made faster. On the other hand, I'm not sure whether I can optimize move_to_end() any further.

If OrderedDict is recommended just for keeping insertion order, I feel 1/2 memory usage and 2x faster iteration are more important than 20% slower move_to_end().

But if either "dict keeps insertion order" or "dict keeps insertion order until deletion" is language spec, there is no reason to spend energy and time discussing the OrderedDict implementation.

Regards,
INADA Naoki

From steve at holdenweb.com  Fri Dec 15 05:56:28 2017
From: steve at holdenweb.com (Steve Holden)
Date: Fri, 15 Dec 2017 10:56:28 +0000
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To: <3418511732122395686@unknownmsgid>
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid>
Message-ID: 

On Mon, Dec 11, 2017 at 5:10 PM, Chris Barker - NOAA Federal <chris.barker at noaa.gov> wrote:
> I see a couple of options:
> 1a: Use a default type annotation, if one is not supplied. typing.Any would presumably make the most sense.
> 1b: Use None if no type is supplied.
> 2: Rework the code to not require annotations at all.
>
> I think I'd prefer 1a, since it's easy.
>
> 2) would be great :-)
>
> I find this bit of "typing creep" makes me nervous... Typing should Never be required!

+1

> I understand that the intent here is that the user could ignore typing and have it all still work. But I'd rather it was not still there under the hood.
>
> Just because a standardized way to do something is included in core Python doesn't mean the standard library has to use it.
> > Just because standardized way to do something is included in core Python > doesn?t mean the standard library has to use it. > > ?I trust my repetition of the point that the stdlib is an important learning resource isn't unduly harping on the subject. Python is in danger of becoming pretty arcane rather too rapidly for my own liking?, though I confess to being mostly a consumer of Python. > However, typing is not currently imported by dataclasses.py. > > > And there you have an actual reason besides my uneasiness :-) > > - CHB > > ?hmm...? -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Fri Dec 15 06:04:01 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 15 Dec 2017 12:04:01 +0100 Subject: [Python-Dev] [OT] Re: Is static typing still optional? References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> Message-ID: <20171215120401.378b7404@fsol> I'm not sure what Mail User Agent each of you is using, but it is quite impossible (here) to make out who is saying what in your latest messages. See plain text rendering here: https://mail.python.org/pipermail/python-dev/2017-December/151274.html Regards Antoine. On Fri, 15 Dec 2017 10:56:28 +0000 Steve Holden wrote: > On Mon, Dec 11, 2017 at 5:10 PM, Chris Barker - NOAA Federal < > chris.barker at noaa.gov> wrote: > > > . > > > > I see a couple of options: > > 1a: Use a default type annotation, if one is not is supplied. typing.Any > > would presumably make the most sense. > > 1b: Use None if not type is supplied. > > 2: Rework the code to not require annotations at all. > > > > I think I'd prefer 1a, since it's easy. > > > > > > 2) would be great :-) > > > > I find this bit of ?typing creep? makes me nervous? Typing should Never be > > required! 
> > > > ?+1 > ? > > > > I understand that the intent here is that the user could ignore typing and > > have it all still work. But I?d rather is was not still there under the > > hood. > > > > Just because standardized way to do something is included in core Python > > doesn?t mean the standard library has to use it. > > > > ?I trust my repetition of the point that the stdlib is an important > learning resource isn't unduly harping on the subject. Python is in danger > of becoming pretty arcane rather too rapidly for my own liking?, though I > confess to being mostly a consumer of Python. > > > However, typing is not currently imported by dataclasses.py. > > > > > > And there you have an actual reason besides my uneasiness :-) > > > > - CHB > > > > ?hmm...? > From eric at trueblade.com Fri Dec 15 06:22:09 2017 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 15 Dec 2017 06:22:09 -0500 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> Message-ID: On 12/15/2017 5:56 AM, Steve Holden wrote: > On Mon, Dec 11, 2017 at 5:10 PM, Chris Barker - NOAA Federal > > wrote: ... >> However, typing is not currently imported by dataclasses.py. > > And there you have an actual reason besides my uneasiness :-) > > - CHB > > ?hmm...? [Agreed with Antoine on the MUA and quoting being confusing.] The only reason typing isn't imported is performance. I hope that once PEP 560 is complete this will no longer be an issue, and dataclasses will always import typing. But of course typing will still not be needed for most uses of @dataclass or make_dataclass(). This is explained in the PEP. Eric. 
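Eric's point that most uses of @dataclass don't need the typing module can be illustrated with a short sketch (a hypothetical example, not from the thread; it assumes the dataclasses module as it eventually shipped in Python 3.7):

```python
# Hypothetical example: a dataclass annotated only with built-in types,
# so user code never imports typing.
from dataclasses import dataclass, field

@dataclass
class Point:
    x: int = 0                                # plain built-in annotation
    y: int = 0
    tags: list = field(default_factory=list)  # mutable default via factory

p = Point(1, 2)
print(p)   # Point(x=1, y=2, tags=[])
```

Annotations like `typing.List[str]` would work too, but nothing forces their use; the generated __init__, __repr__, and __eq__ are the same either way.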
From storchaka at gmail.com  Fri Dec 15 07:05:46 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 15 Dec 2017 14:05:46 +0200
Subject: [Python-Dev] Accepting PEP 560 -- Core support for typing
 module and generic types
In-Reply-To:
References:
Message-ID:

15.12.17 04:00, Guido van Rossum wrote:
> In the light of Antoine's and Stephan's feedback I think this can be
> reconsidered -- while I want to take a cautious stance about resource
> consumption I don't want to stand in the way of progress.

I don't see any problems with implementing this on types defined in C.
This isn't harder than implementing __sizeof__ or pickling support, and
NumPy classes already have implemented both. Maybe Yury forgot about
METH_STATIC and METH_CLASS?

The cost of adding new slots:

1. Increased memory consumption. This increases the size of *every*
class, even if they don't implement this feature.

2. Increased class initialization time. For every class, for every slot,
we need to look up the corresponding methods in the dictionaries of the
class itself and all its parents (caching doesn't work well at this
stage). A significant part of class initialization time is spent on
initializing slots. This will increase the startup time and the time of
creating local classes. The relative overhead is more significant in
Cython.

3. We need to add a new type feature flag Py_TPFLAGS_HAVE_*. The number
of possible flags is limited, and most bits already are used. We can add
only a limited number of new slots, and should not spend this resource
without a strong need.

4. Increased complexity. Currently the code related to PEP 560 is
located in a few places. With supporting new slots we will need to touch
more delicate code not related directly to PEP 560. It is hard to review
and to test such changes. I can't guarantee the correctness.
Some libraries can create new classes on demand (see for example
https://rhye.org/post/python-cassandra-namedtuple-performance/ and the
proposition, not accepted, in https://bugs.python.org/issue13299). This
increases the cost of items 1 and 2.

From yselivanov.ml at gmail.com  Fri Dec 15 09:55:04 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Fri, 15 Dec 2017 09:55:04 -0500
Subject: [Python-Dev] Accepting PEP 560 -- Core support for typing
 module and generic types
In-Reply-To:
References:
Message-ID:

> I don't see any problems with implementing this on types defined in C.
> This isn't harder than implementing __sizeof__ or pickling support, and
> NumPy classes already have implemented both. Maybe Yury forgot about
> METH_STATIC and METH_CLASS?

I just tested __class_getitem__ defined via METH_STATIC and it works.
This means we don't need to add slots. Thanks for the hint, Serhiy!

Ivan, this might be worth mentioning in the PEP or in the docs.

Yury

From storchaka at gmail.com  Fri Dec 15 10:00:23 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 15 Dec 2017 17:00:23 +0200
Subject: [Python-Dev] Make __class_getitem__ a class method
Message-ID:

The class itself is always passed as the first argument to
__class_getitem__():

    cls.__class_getitem__(cls, item)

I propose to make __class_getitem__ a class method. This will make it
simpler to implement in C. Currently it should be declared with the
flags METH_VARARGS|METH_STATIC and implemented as

static PyObject *
generic_class_getitem(PyObject *Py_UNUSED(self), PyObject *args)
{
    PyObject *type, *item;
    if (!PyArg_UnpackTuple(args, "__class_getitem__", 2, 2, &type, &item)) {
        return NULL;
    }
    ...
}

Note the unused parameter and the need to unpack the arguments manually.

If implemented as a class method, it should be declared with the flags
METH_O|METH_CLASS and implemented as

static PyObject *
generic_class_getitem(PyObject *type, PyObject *item)
{
    ...
}

See https://github.com/python/cpython/pull/4883 for a sample.
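At the Python level, the behavior Serhiy and Yury are discussing can be sketched in a few lines (a minimal example assuming the PEP 560 semantics as they eventually shipped in Python 3.7, where the interpreter passes the class implicitly; the MyContainer name is made up for illustration):

```python
# Minimal sketch of __class_getitem__: the interpreter calls it with the
# class as the first argument, with no @classmethod decorator required.
class MyContainer:
    def __class_getitem__(cls, item):
        # cls is MyContainer; item is whatever appeared inside the brackets
        return f"{cls.__name__}[{item.__name__}]"

print(MyContainer[int])   # MyContainer[int]
print(MyContainer[str])   # MyContainer[str]
```

The C-level question in the thread is how to get this same implicit-class-argument behavior for extension types, hence the METH_STATIC versus METH_CLASS discussion.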
From levkivskyi at gmail.com  Fri Dec 15 10:10:53 2017
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Fri, 15 Dec 2017 16:10:53 +0100
Subject: [Python-Dev] Accepting PEP 560 -- Core support for typing
 module and generic types
In-Reply-To:
References:
Message-ID:

On 15 December 2017 at 15:55, Yury Selivanov wrote:

> > I don't see any problems with implementing this on types defined in C.
> > This isn't harder than implementing __sizeof__ or pickling support, and
> > NumPy classes already have implemented both. Maybe Yury forgot about
> > METH_STATIC and METH_CLASS?
>
> I just tested __class_getitem__ defined via METH_STATIC and it works.
> This means we don't need to add slots. Thanks for the hint, Serhiy!
>
> Ivan, this might be worth mentioning in the PEP or in the docs.

I think it should be added to the PEP. Somehow I didn't think about C
extensions, but now I see that it is important; also, recently there has
been a lot of interest in better support of static typing for NumPy
arrays and the numeric stack. I will make a PR a bit later.

-- Ivan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From levkivskyi at gmail.com  Fri Dec 15 10:25:17 2017
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Fri, 15 Dec 2017 16:25:17 +0100
Subject: [Python-Dev] Make __class_getitem__ a class method
In-Reply-To:
References:
Message-ID:

I like this idea. I have a few suggestions for the test cases you added;
I will add them a bit later in the PR.

-- Ivan

On 15 December 2017 at 16:00, Serhiy Storchaka wrote:

> The class itself is always passed as the first argument to
> __class_getitem__():
>
>     cls.__class_getitem__(cls, item)
>
> I propose to make __class_getitem__ a class method. This will make it
> simpler to implement in C. Currently it should be declared with the
> flags METH_VARARGS|METH_STATIC and implemented as
>
> static PyObject *
> generic_class_getitem(PyObject *Py_UNUSED(self), PyObject *args)
> {
>     PyObject *type, *item;
>     if (!PyArg_UnpackTuple(args, "__class_getitem__", 2, 2, &type, &item)) {
>         return NULL;
>     }
>     ...
> }
>
> Note the unused parameter and the need to unpack the arguments manually.
>
> If implemented as a class method, it should be declared with the flags
> METH_O|METH_CLASS and implemented as
>
> static PyObject *
> generic_class_getitem(PyObject *type, PyObject *item)
> {
>     ...
> }
>
> See https://github.com/python/cpython/pull/4883 for a sample.
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/levkivskyi%40gmail.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wagherculano at hotmail.com  Thu Dec 14 18:59:53 2017
From: wagherculano at hotmail.com (Wagner Herculano)
Date: Thu, 14 Dec 2017 23:59:53 +0000
Subject: [Python-Dev] f-strings
In-Reply-To:
References:
Message-ID:

Good evening,
I'm Wagner Herculano from Brazil.
I was trying to do a multiplication-table exercise with the number 5 and
could not find how to format the spacing in the PEP 498 documentation.
Finally I found a way; if possible, please include this example in the
documentation.

Below is my script with the desired formatting for the table of 5.

n = 5
for i in range(1,11):
    print(f'{n} x {i:>2} = {n*i:>2}')

Result:

5 x  1 =  5
5 x  2 = 10
5 x  3 = 15
5 x  4 = 20
5 x  5 = 25
5 x  6 = 30
5 x  7 = 35
5 x  8 = 40
5 x  9 = 45
5 x 10 = 50

-----------
Sorry for my English, I needed to use Google Translate.

Best Regards,
Wagner Herculano
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From guido at python.org  Fri Dec 15 10:53:40 2017
From: guido at python.org (Guido van Rossum)
Date: Fri, 15 Dec 2017 07:53:40 -0800
Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To:
References: <20171104173013.GA4005@bytereef.org>
Message-ID:

Make it so. "Dict keeps insertion order" is the ruling. Thanks!

On Fri, Dec 15, 2017 at 2:30 AM, INADA Naoki wrote:

> > That's interesting information - I wasn't aware of the different
> > performance goals.
>
> FYI, performance characteristics of my POC implementation of
> OrderedDict based on dict order are:
>
> * 50% less memory usage
> * 15% faster creation
> * 100% (2x) faster iteration
> * 20% slower move_to_end
> * 40% slower comparison
>
> (copied from https://bugs.python.org/issue31265#msg301942 )
>
> Comparison is very unoptimized at the moment and I believe it can be
> made faster.
> On the other hand, I'm not sure whether I can optimize move_to_end()
> further.
>
> If OrderedDict is recommended to be used for just keeping insertion order,
> I feel 1/2 memory usage and 2x faster iteration are more important than
> 20% slower move_to_end().
>
> But if either "dict keeps insertion order" or "dict keeps insertion order
> until deletion" is part of the language spec, there is no reason to spend
> energy and time discussing the OrderedDict implementation.
>
> Regards,
>
> INADA Naoki
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org
>

--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
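The guarantee Guido ratifies here can be demonstrated in a few lines (a sketch written against CPython 3.6+, where plain dicts already behave this way; from 3.7 onward it is a language guarantee):

```python
# A plain dict preserves insertion order; re-inserting a deleted key
# moves it to the end rather than restoring its old position.
d = {}
d["b"] = 1
d["a"] = 2
d["c"] = 3
assert list(d) == ["b", "a", "c"]   # insertion order, not sorted order

del d["a"]
d["a"] = 4
assert list(d) == ["b", "c", "a"]   # reinsertion appends at the end
```

The second assertion shows why "dict keeps insertion order" and "dict keeps insertion order until deletion" were discussed as distinct possible specs earlier in the thread.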
URL:

From mariatta.wijaya at gmail.com  Fri Dec 15 11:23:54 2017
From: mariatta.wijaya at gmail.com (Mariatta Wijaya)
Date: Fri, 15 Dec 2017 08:23:54 -0800
Subject: [Python-Dev] f-strings
In-Reply-To:
References:
Message-ID:

That's covered under "format specifiers", I think. The PEP mentions this:
https://www.python.org/dev/peps/pep-0498/#format-specifiers

That specific example is not mentioned in the docs, but there are other
examples of using format specifiers with f-strings:
https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals

On Dec 15, 2017 7:39 AM, "Wagner Herculano" wrote:

> Good evening,
> I'm Wagner Herculano from Brazil.
> I was trying to do a multiplication-table exercise with the number 5 and
> could not find how to format the spacing in the PEP 498 documentation.
> Finally I found a way; if possible, please include this example in the
> documentation.
>
> Below is my script with the desired formatting for the table of 5.
>
> n = 5
> for i in range(1,11):
>     print(f'{n} x {i:>2} = {n*i:>2}')
>
> Result:
>
> 5 x  1 =  5
> 5 x  2 = 10
> 5 x  3 = 15
> 5 x  4 = 20
> 5 x  5 = 25
> 5 x  6 = 30
> 5 x  7 = 35
> 5 x  8 = 40
> 5 x  9 = 45
> 5 x 10 = 50
>
> -----------
> Sorry for my English, I needed to use Google Translate.
>
> Best Regards,
> Wagner Herculano
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/mariatta.wijaya%40gmail.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From xdegaye at gmail.com  Fri Dec 15 11:29:20 2017
From: xdegaye at gmail.com (Xavier de Gaye)
Date: Fri, 15 Dec 2017 17:29:20 +0100
Subject: [Python-Dev] Support of the Android platform
In-Reply-To:
References:
Message-ID: <2063a4ed-d741-7ea0-4b8e-5e79c8c9e5d5@gmail.com>

On 12/14/2017 02:59 PM, Victor Stinner wrote:
> It seems like Android is evolving quickly, I would say quicker than
> Python releases. I'm asking if it's a good idea to put a recipe
> alongside the Python source code for one specific Android API version?
> Would it still make sense to build for NDK v14 in 2 or 5 years?

NDK 14 was released in March 2017 and the latest release is NDK 16.
There are sometimes major changes between releases, and I think it is
critical to ensure that the builds all use the same NDK release for that
reason. Supporting another NDK release is just a substitution in one of
the files of the build system, and I am sure that in 2 or 5 years there
will be a core developer smart enough to make that substitution (this
could even be me, I will only be 71 years old in 5 years :-)). Anyway,
if this is a problem, it should have been discussed in a review of the
PR.

There are concerns, including a concern raised by me, about supporting
Android with that build system, or about supporting Android at all. It
has been interesting and gratifying to work on this build system and to
get the Python test suite running on Android without failures. Given
these concerns and the lack of interest in the support of Android, it is
time for me to switch to something else; maybe improve the bdb module,
why not?

Xavier

From raymond.hettinger at gmail.com  Fri Dec 15 11:32:35 2017
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Fri, 15 Dec 2017 08:32:35 -0800
Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To:
References: <20171104173013.GA4005@bytereef.org>
Message-ID:

> On Dec 15, 2017, at 7:53 AM, Guido van Rossum wrote:
>
> Make it so. "Dict keeps insertion order" is the ruling. Thanks!

Thank you. That is wonderful news :-)

Would it be reasonable to replace some of the OrderedDict() uses in the
standard library with dict()? For example, have namedtuple's _asdict()
go back to returning a plain dict as it did in its original incarnation.
Also, it looks like argparse could save an import by using a regular
dict.

Raymond

From solipsis at pitrou.net  Fri Dec 15 11:36:30 2017
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 15 Dec 2017 17:36:30 +0100
Subject: [Python-Dev] Accepting PEP 560 -- Core support for typing
 module and generic types
References:
Message-ID: <20171215173630.4cff46e8@fsol>

On Fri, 15 Dec 2017 14:05:46 +0200
Serhiy Storchaka wrote:
> 15.12.17 04:00, Guido van Rossum wrote:
> > In the light of Antoine's and Stephan's feedback I think this can be
> > reconsidered -- while I want to take a cautious stance about resource
> > consumption I don't want to stand in the way of progress.
>
> I don't see any problems with implementing this on types defined in C.
> This isn't harder than implementing __sizeof__ or pickling support, and
> NumPy classes already have implemented both. Maybe Yury forgot about
> METH_STATIC and METH_CLASS?
>
> The cost of adding new slots:
>
> 1. Increased memory consumption. This increases the size of *every*
> class, even if they don't implement this feature.
>
> 2. Increased class initialization time. For every class, for every slot,
> we need to look up the corresponding methods in the dictionaries of the
> class itself and all its parents (caching doesn't work well at this
> stage). A significant part of class initialization time is spent on
> initializing slots. This will increase the startup time and the time of
> creating local classes. The relative overhead is more significant in
> Cython.
>
> 3. We need to add a new type feature flag Py_TPFLAGS_HAVE_*. The number
> of possible flags is limited, and most bits already are used. We can add
> only a limited number of new slots, and should not spend this resource
> without a strong need.
>
> 4. Increased complexity. Currently the code related to PEP 560 is
> located in a few places. With supporting new slots we will need to touch
> more delicate code not related directly to PEP 560. It is hard to review
> and to test such changes. I can't guarantee the correctness.

These are all very good points (except #1, which I think is a red
herring; see my posted example).

Do you have any general idea how to speed up class creation?

Regards

Antoine.

From yselivanov.ml at gmail.com  Fri Dec 15 11:47:12 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Fri, 15 Dec 2017 11:47:12 -0500
Subject: [Python-Dev] Make __class_getitem__ a class method
In-Reply-To:
References:
Message-ID:

Shouldn't we optimize the usability for pure Python first, and then for
the C API?

Right now we have the '__new__' magic method, which isn't a
@classmethod. Making '__class_getitem__' a @classmethod will confuse
regular Python users. For example:

    class Foo:
        def __new__(cls, ...): pass

        @classmethod
        def __class_getitem__(cls, item): pass

To me it makes sense that type methods that are supposed to be called
on the type by the Python interpreter don't need the classmethod
decorator.

METH_STATIC is a public working API, and in my opinion it's totally
fine if we use it. It's not even hard to use it, it's just *mildly*
inconvenient at most.

Yury

From chris.barker at noaa.gov  Fri Dec 15 11:49:08 2017
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Fri, 15 Dec 2017 08:49:08 -0800
Subject: [Python-Dev] f-strings
In-Reply-To:
References:
Message-ID: <7809465429117446362@unknownmsgid>

> That's covered under "format specifiers", I think. The PEP mentions this:
> https://www.python.org/dev/peps/pep-0498/#format-specifiers

I can see how a newbie might not realize that that means that f-strings
use the same formatting language as the .format() method, or where to
find documentation for it.

So somewhere in the docs, making that really clear, with a link to the
format-spec documentation, would be good. Not sure where, though - a PEP
is not designed to be user documentation.

-CHB

> That specific example is not mentioned in the docs, but there are other
> examples of using format specifiers with f-strings:
> https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals
>
> On Dec 15, 2017 7:39 AM, "Wagner Herculano" wrote:
>
> > Good evening,
> > I'm Wagner Herculano from Brazil.
> > I was trying to do a multiplication-table exercise with the number 5 and
> > could not find how to format the spacing in the PEP 498 documentation.
> > Finally I found a way; if possible, please include this example in the
> > documentation.
> >
> > Below is my script with the desired formatting for the table of 5.
> >
> > n = 5
> > for i in range(1,11):
> >     print(f'{n} x {i:>2} = {n*i:>2}')
> >
> > Result:
> >
> > 5 x  1 =  5
> > 5 x  2 = 10
> > 5 x  3 = 15
> > 5 x  4 = 20
> > 5 x  5 = 25
> > 5 x  6 = 30
> > 5 x  7 = 35
> > 5 x  8 = 40
> > 5 x  9 = 45
> > 5 x 10 = 50
> >
> > -----------
> > Sorry for my English, I needed to use Google Translate.
> >
> > Best Regards,
> > Wagner Herculano

_______________________________________________
Python-Dev mailing list
Python-Dev at python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
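Chris's point that f-strings reuse the same format-spec mini-language as str.format() can be checked directly (a small sketch, not from the thread, assuming Python 3.6+ for f-string support):

```python
# The text after the colon in an f-string replacement field is the same
# format-spec mini-language that str.format() uses.
value = 3.14159
assert f"{value:>10.2f}" == "{:>10.2f}".format(value)
assert f"{value:>10.2f}" == "      3.14"   # width 10, right-aligned

# Wagner's table alignment works for the same reason:
n, i = 5, 10
assert f"{n} x {i:>2} = {n * i:>2}" == "5 x 10 = 50"
```

So the format-spec reference in the str.format() documentation applies to f-strings unchanged, which is the link Chris suggests the docs should make explicit.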
URL: From guido at python.org Fri Dec 15 11:55:28 2017 From: guido at python.org (Guido van Rossum) Date: Fri, 15 Dec 2017 08:55:28 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <20171104173013.GA4005@bytereef.org> Message-ID: On Fri, Dec 15, 2017 at 8:32 AM, Raymond Hettinger < raymond.hettinger at gmail.com> wrote: > > > On Dec 15, 2017, at 7:53 AM, Guido van Rossum wrote: > > > > Make it so. "Dict keeps insertion order" is the ruling. Thanks! > > Thank you. That is wonderful news :-) > > Would it be reasonable to replace some of the OrderedDict() uses in the > standard library with dict()? For example, have namedtuples's _asdict() go > back to returning a plain dict as it did in its original incarnation. Also, > it looks like argparse could save an import by using a regular dict. > If it's documented as OrderedDict that would be backwards incompatible, since that has additional methods. Even if not documented it's likely to break some code. So, I'm not sure about this (though I agree with the sentiment that OrderedDict is much less important now). -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Fri Dec 15 12:04:38 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Fri, 15 Dec 2017 18:04:38 +0100 Subject: [Python-Dev] Make __class_getitem__ a class method In-Reply-To: References: Message-ID: On 15 December 2017 at 17:47, Yury Selivanov wrote: > Shouldn't we optimize the usability for pure-Python first, and then for C > API? > > Right now we have the '__new__' magic method, which isn't a > @classmethod. Making '__class_getitem__' a @classmethod will confuse > regular Python users. 
For example: > > class Foo: > def __new__(cls, ...): pass > > @classmethod > def __class_getitem__(cls, item): pass > > To me it makes sense that type methods that are supposed to be called > on type by the Python interpreter don't need the classmethod > decorator. > Good point! Pure Python will be the primary use case and we have another precedent for "automatic" class method: __init_subclass__ (it does not need to be decorated). > METH_STATIC is a public working API, and in my opinion it's totally > fine if we use it. It's not even hard to use it, it's just *mildly* > inconvenient at most. > OK, then documenting this "recipe" (METH_STATIC plus tuple unpacking) should be sufficient. -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From status at bugs.python.org Fri Dec 15 12:09:51 2017 From: status at bugs.python.org (Python tracker) Date: Fri, 15 Dec 2017 18:09:51 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20171215170951.2204656A54@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2017-12-08 - 2017-12-15) Python tracker at https://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. 
Issues counts and deltas: open 6330 (+15) closed 37755 (+64) total 44085 (+79) Open issues with patches: 2448 Issues opened (51) ================== #17852: Built-in module _io can lose data from buffered files at exit https://bugs.python.org/issue17852 reopened by pitrou #32256: Make patchcheck.py work for out-of-tree builds https://bugs.python.org/issue32256 opened by izbyshev #32257: Support Disabling Renegotiation for SSLContext https://bugs.python.org/issue32257 opened by chuq #32259: Misleading "not iterable" Error Message when generator return https://bugs.python.org/issue32259 opened by Camion #32261: Online doc does not include inspect.classify_class_attrs https://bugs.python.org/issue32261 opened by csabella #32263: Template string docs refer to "normal %-based substitutions" https://bugs.python.org/issue32263 opened by v+python #32266: test_pathlib fails if current path has junctions https://bugs.python.org/issue32266 opened by Ivan.Pozdeev #32267: strptime misparses offsets with microsecond format https://bugs.python.org/issue32267 opened by mariocj89 #32270: subprocess closes redirected fds even if they are in pass_fds https://bugs.python.org/issue32270 opened by izbyshev #32275: SSL socket methods don't retry on EINTR? https://bugs.python.org/issue32275 opened by pitrou #32276: there is no way to make tempfile reproducible (i.e. seed the u https://bugs.python.org/issue32276 opened by Yaroslav.Halchenko #32278: Allow dataclasses.make_dataclass() to omit type information https://bugs.python.org/issue32278 opened by eric.smith #32279: Pass keyword arguments from dataclasses.make_dataclass() to @d https://bugs.python.org/issue32279 opened by eric.smith #32280: Expose `_PyRuntime` through a section name https://bugs.python.org/issue32280 opened by Maxime Belanger #32281: bdist_rpm v.s. 
the Macintosh https://bugs.python.org/issue32281 opened by bhyde #32282: When using a Windows XP compatible toolset, `socketmodule.c` f https://bugs.python.org/issue32282 opened by Maxime Belanger #32283: Cmd.onecmd documentation is misleading https://bugs.python.org/issue32283 opened by lyda #32285: In `unicodedata`, it should be possible to check a unistr's no https://bugs.python.org/issue32285 opened by Maxime Belanger #32287: Import of _pyio module failed on cygwin https://bugs.python.org/issue32287 opened by Mat???? Valo #32288: Inconsistent behavior with slice assignment? https://bugs.python.org/issue32288 opened by Massimiliano Culpo #32289: Glossary does not define "extended slicing" https://bugs.python.org/issue32289 opened by steven.daprano #32290: bolen-dmg-3.6: compilation failed with OSError: [Errno 23] Too https://bugs.python.org/issue32290 opened by vstinner #32291: Value error for string shared memory in multiprocessing https://bugs.python.org/issue32291 opened by magu #32295: User friendly message when invoking bdist_wheel sans wheel pac https://bugs.python.org/issue32295 opened by EWDurbin #32296: Implement asyncio._get_running_loop() and get_event_loop() in https://bugs.python.org/issue32296 opened by yselivanov #32299: unittest.mock.patch.dict.__enter__ should return the dict https://bugs.python.org/issue32299 opened by Allen Li #32300: print(os.environ.keys()) should only print the keys https://bugs.python.org/issue32300 opened by Aaron.Meurer #32303: Namespace packages have inconsistent __loader__ and __spec__.l https://bugs.python.org/issue32303 opened by barry #32304: Upload failed (400): Digests do not match on .tar.gz ending wi https://bugs.python.org/issue32304 opened by llecaroz #32305: Namespace packages have inconsistent __file__ and __spec__.ori https://bugs.python.org/issue32305 opened by barry #32306: Clarify map API in concurrent.futures https://bugs.python.org/issue32306 opened by David Luke?? 
#32307: Bad assumption on thread stack size makes python crash with mu https://bugs.python.org/issue32307 opened by Natanael Copa #32308: Replace empty matches adjacent to a previous non-empty match i https://bugs.python.org/issue32308 opened by serhiy.storchaka #32309: Implement asyncio.run_in_executor shortcut https://bugs.python.org/issue32309 opened by asvetlov #32310: Remove _Py_PyAtExit from Python.h https://bugs.python.org/issue32310 opened by nascheme #32312: Create Py_AtExitRegister C API https://bugs.python.org/issue32312 opened by nascheme #32313: Wrong inspect.getsource for datetime https://bugs.python.org/issue32313 opened by Aaron.Meurer #32315: can't run any scripts with 2.7.x, 32 and 64-bit https://bugs.python.org/issue32315 opened by DoctorEvil #32317: sys.exc_clear() clears exception in other stack frames https://bugs.python.org/issue32317 opened by Garrett Berg #32318: Remove "globals()" call from "socket.accept()" https://bugs.python.org/issue32318 opened by yselivanov #32320: Add default value support to collections.namedtuple() https://bugs.python.org/issue32320 opened by rhettinger #32321: functools.reduce has a redundant guard or needs a pure Python https://bugs.python.org/issue32321 opened by steven.daprano #32322: Heap type with Py_TPFLAGS_HAVE_GC leads to segfault due to not https://bugs.python.org/issue32322 opened by rkond #32323: urllib.parse.urlsplit() must not lowercase() IPv6 scope value https://bugs.python.org/issue32323 opened by socketpair #32324: [Security] "python3 directory" inserts "directory" at sys.path https://bugs.python.org/issue32324 opened by vstinner #32326: Update Build projects to version 10.0.16299.0 of the Windows 1 https://bugs.python.org/issue32326 opened by Decorater #32328: ttk.Treeview: _tkinter.TclError: list element in quotes follow https://bugs.python.org/issue32328 opened by kumba #32330: Email parser creates a message object that can't be flattened https://bugs.python.org/issue32330 opened by msapiro 
#32331: apply SOCK_TYPE_MASK to socket.type on Linux https://bugs.python.org/issue32331 opened by yselivanov #32333: test_smtplib: dangling threads on x86 Gentoo Non-Debug with X https://bugs.python.org/issue32333 opened by vstinner #32334: test_configparser left @test_2876_tmp temporary file on x86 Wi https://bugs.python.org/issue32334 opened by vstinner Most recent 15 issues with no replies (15) ========================================== #32334: test_configparser left @test_2876_tmp temporary file on x86 Wi https://bugs.python.org/issue32334 #32333: test_smtplib: dangling threads on x86 Gentoo Non-Debug with X https://bugs.python.org/issue32333 #32328: ttk.Treeview: _tkinter.TclError: list element in quotes follow https://bugs.python.org/issue32328 #32322: Heap type with Py_TPFLAGS_HAVE_GC leads to segfault due to not https://bugs.python.org/issue32322 #32321: functools.reduce has a redundant guard or needs a pure Python https://bugs.python.org/issue32321 #32320: Add default value support to collections.namedtuple() https://bugs.python.org/issue32320 #32315: can't run any scripts with 2.7.x, 32 and 64-bit https://bugs.python.org/issue32315 #32313: Wrong inspect.getsource for datetime https://bugs.python.org/issue32313 #32310: Remove _Py_PyAtExit from Python.h https://bugs.python.org/issue32310 #32308: Replace empty matches adjacent to a previous non-empty match i https://bugs.python.org/issue32308 #32306: Clarify map API in concurrent.futures https://bugs.python.org/issue32306 #32305: Namespace packages have inconsistent __file__ and __spec__.ori https://bugs.python.org/issue32305 #32304: Upload failed (400): Digests do not match on .tar.gz ending wi https://bugs.python.org/issue32304 #32303: Namespace packages have inconsistent __loader__ and __spec__.l https://bugs.python.org/issue32303 #32299: unittest.mock.patch.dict.__enter__ should return the dict https://bugs.python.org/issue32299 Most recent 15 issues waiting for review (15) 
=============================================

#32331: apply SOCK_TYPE_MASK to socket.type on Linux
     https://bugs.python.org/issue32331
#32323: urllib.parse.urlsplit() must not lowercase() IPv6 scope value
     https://bugs.python.org/issue32323
#32320: Add default value support to collections.namedtuple()
     https://bugs.python.org/issue32320
#32318: Remove "globals()" call from "socket.accept()"
     https://bugs.python.org/issue32318
#32310: Remove _Py_PyAtExit from Python.h
     https://bugs.python.org/issue32310
#32309: Implement asyncio.run_in_executor shortcut
     https://bugs.python.org/issue32309
#32308: Replace empty matches adjacent to a previous non-empty match i
     https://bugs.python.org/issue32308
#32299: unittest.mock.patch.dict.__enter__ should return the dict
     https://bugs.python.org/issue32299
#32296: Implement asyncio._get_running_loop() and get_event_loop() in
     https://bugs.python.org/issue32296
#32285: In `unicodedata`, it should be possible to check a unistr's no
     https://bugs.python.org/issue32285
#32282: When using a Windows XP compatible toolset, `socketmodule.c` f
     https://bugs.python.org/issue32282
#32280: Expose `_PyRuntime` through a section name
     https://bugs.python.org/issue32280
#32267: strptime misparses offsets with microsecond format
     https://bugs.python.org/issue32267
#32266: test_pathlib fails if current path has junctions
     https://bugs.python.org/issue32266
#32259: Misleading "not iterable" Error Message when generator return
     https://bugs.python.org/issue32259

Top 10 most discussed issues (10)
=================================

#32259: Misleading "not iterable" Error Message when generator return
     https://bugs.python.org/issue32259  21 msgs
#32300: print(os.environ.keys()) should only print the keys
     https://bugs.python.org/issue32300  19 msgs
#32257: Support Disabling Renegotiation for SSLContext
     https://bugs.python.org/issue32257  16 msgs
#32226: Implement PEP 560: Core support for typing module and generic
     https://bugs.python.org/issue32226  12 msgs
#32252:
test_regrtest leaves a test_python_* directory in TEMPDIR
     https://bugs.python.org/issue32252  11 msgs
#32251: Add asyncio.BufferedProtocol
     https://bugs.python.org/issue32251  10 msgs
#30491: Add a lightweight mechanism for detecting un-awaited coroutine
     https://bugs.python.org/issue30491  9 msgs
#32030: PEP 432: Rewrite Py_Main()
     https://bugs.python.org/issue32030  9 msgs
#19431: Document PyFrame_FastToLocals() and PyFrame_FastToLocalsWithEr
     https://bugs.python.org/issue19431  8 msgs
#30050: Please provide a way to disable the warning printed if the sig
     https://bugs.python.org/issue30050  7 msgs

Issues closed (63)
==================

#11123: problem with packaged dependency extracter script, pdeps
     https://bugs.python.org/issue11123  closed by csabella
#20361: -W command line options and PYTHONWARNINGS environmental varia
     https://bugs.python.org/issue20361  closed by vstinner
#22091: __debug__ in compile(optimize=1)
     https://bugs.python.org/issue22091  closed by serhiy.storchaka
#22671: Typo in class io.BufferedIOBase docs
     https://bugs.python.org/issue22671  closed by vstinner
#26259: Memleak when repeated calls to asyncio.queue.Queue.get is perf
     https://bugs.python.org/issue26259  closed by asvetlov
#26549: co_stacksize is calculated from unoptimized code
     https://bugs.python.org/issue26549  closed by serhiy.storchaka
#27169: __debug__ is not optimized out at compile time for anything bu
     https://bugs.python.org/issue27169  closed by serhiy.storchaka
#27695: Long constant calculations stall compilation
     https://bugs.python.org/issue27695  closed by serhiy.storchaka
#28185: Tabs in C source code
     https://bugs.python.org/issue28185  closed by martin.panter
#28393: Update encoding lookup docs wrt #27938
     https://bugs.python.org/issue28393  closed by vstinner
#28813: Remove unneeded folded consts after peephole
     https://bugs.python.org/issue28813  closed by serhiy.storchaka
#29469: AST-level Constant folding
     https://bugs.python.org/issue29469  closed by inada.naoki
#30241: Add
contextlib.AbstractAsyncContextManager
     https://bugs.python.org/issue30241  closed by yselivanov
#31265: Remove doubly-linked list from C OrderedDict
     https://bugs.python.org/issue31265  closed by rhettinger
#31620: asyncio.Queue leaks memory if the queue is empty and consumers
     https://bugs.python.org/issue31620  closed by asvetlov
#31650: implement PEP 552
     https://bugs.python.org/issue31650  closed by benjamin.peterson
#31942: Document that support of start and stop parameters in the Sequ
     https://bugs.python.org/issue31942  closed by vstinner
#31964: [3.4][3.5] pyexpat: compilaton of libexpat fails with: ISO C90
     https://bugs.python.org/issue31964  closed by vstinner
#31967: [Windows] test_distutils: fatal error LNK1158: cannot run 'rc.
     https://bugs.python.org/issue31967  closed by vstinner
#32114: The get_event_loop change in bpo28613 did not update the docum
     https://bugs.python.org/issue32114  closed by yselivanov
#32119: test_notify_all() of test_multiprocessing_forkserver failed on
     https://bugs.python.org/issue32119  closed by vstinner
#32124: Document functions safe to be called before Py_Initialize()
     https://bugs.python.org/issue32124  closed by vstinner
#32143: os.statvfs lacks f_fsid
     https://bugs.python.org/issue32143  closed by fdrake
#32186: io.FileIO hang all threads if fstat blocks on inaccessible NFS
     https://bugs.python.org/issue32186  closed by berker.peksag
#32193: Convert asyncio to async/await
     https://bugs.python.org/issue32193  closed by asvetlov
#32208: Improve semaphore documentation
     https://bugs.python.org/issue32208  closed by berker.peksag
#32225: Implement PEP 562: module __getattr__ and __dir__
     https://bugs.python.org/issue32225  closed by levkivskyi
#32230: -X dev doesn't set sys.warnoptions
     https://bugs.python.org/issue32230  closed by vstinner
#32237: test_xml_etree leaked [1, 1, 1] references, sum=3
     https://bugs.python.org/issue32237  closed by berker.peksag
#32240: Add the const qualifier for PyObject* array arguments
     https://bugs.python.org/issue32240
     closed by serhiy.storchaka
#32241: Add the const qualifier for char and wchar_t pointers to unmod
     https://bugs.python.org/issue32241  closed by serhiy.storchaka
#32253: Deprecate old-style locking in asyncio/locks.py
     https://bugs.python.org/issue32253  closed by asvetlov
#32255: csv.writer converts None to '""\n' when it is first line, othe
     https://bugs.python.org/issue32255  closed by serhiy.storchaka
#32258: Rewrite asyncio docs to use async/await syntax
     https://bugs.python.org/issue32258  closed by asvetlov
#32260: siphash shouldn't byte swap the keys
     https://bugs.python.org/issue32260  closed by benjamin.peterson
#32262: Fix linting errors in asyncio code; use f-strings consistently
     https://bugs.python.org/issue32262  closed by yselivanov
#32264: move pygetopt.h into internal/
     https://bugs.python.org/issue32264  closed by benjamin.peterson
#32265: Correctly classify builtin static and class methods
     https://bugs.python.org/issue32265  closed by serhiy.storchaka
#32268: quopri.decode(): string argument expected, got 'bytes'
     https://bugs.python.org/issue32268  closed by r.david.murray
#32269: Add `asyncio.get_running_loop()` function
     https://bugs.python.org/issue32269  closed by yselivanov
#32271: test_ssl test failed on Fedora 27
     https://bugs.python.org/issue32271  closed by christian.heimes
#32272: Remove asyncio.async function
     https://bugs.python.org/issue32272  closed by yselivanov
#32273: Remove asyncio.test_utils
     https://bugs.python.org/issue32273  closed by yselivanov
#32274: Potential leak in pysqlite_connection_init()
     https://bugs.python.org/issue32274  closed by lelit
#32277: SystemError via chmod(symlink, ..., follow_symlinks=False)
     https://bugs.python.org/issue32277  closed by serhiy.storchaka
#32284: typing.TextIO and BinaryIO are not aliases of IO[...]
     https://bugs.python.org/issue32284  closed by asvetlov
#32286: python 2.7 cannot parse ''
     https://bugs.python.org/issue32286  closed by vstinner
#32292: Building fails on Windows
     https://bugs.python.org/issue32292  closed by pitrou
#32293: macos pkg fails 10.13.2
     https://bugs.python.org/issue32293  closed by Lloyd Vancil
#32294: test_semaphore_tracker() of test_multiprocessing_spawn fails w
     https://bugs.python.org/issue32294  closed by vstinner
#32297: Few misspellings found in Python source code comments.
     https://bugs.python.org/issue32297  closed by asvetlov
#32298: Email.quopriprime over-encodes characters
     https://bugs.python.org/issue32298  closed by r.david.murray
#32301: Typo in array documentation
     https://bugs.python.org/issue32301  closed by steven.daprano
#32302: test_distutils: test_get_exe_bytes() failure on AppVeyor
     https://bugs.python.org/issue32302  closed by vstinner
#32311: Implement asyncio.create_task() shortcut
     https://bugs.python.org/issue32311  closed by asvetlov
#32314: Implement asyncio.run()
     https://bugs.python.org/issue32314  closed by yselivanov
#32316: [3.6] make regen-all fails on Travis CI on "python3.6" command
     https://bugs.python.org/issue32316  closed by vstinner
#32319: re fullmatch error with non greedy modifier
     https://bugs.python.org/issue32319  closed by serhiy.storchaka
#32325: C API should use 'const char *' instead of 'char *'
     https://bugs.python.org/issue32325  closed by serhiy.storchaka
#32327: Make asyncio methods documented as coroutines - coroutines.
     https://bugs.python.org/issue32327  closed by yselivanov
#32329: PYTHONHASHSEED=0 python3 -R should enable hash randomization
     https://bugs.python.org/issue32329  closed by vstinner
#32332: Implement slots support for magic methods added in PEP 560
     https://bugs.python.org/issue32332  closed by yselivanov
#1346238: A constant folding optimization pass for the AST
     https://bugs.python.org/issue1346238  closed by serhiy.storchaka

From storchaka at gmail.com  Fri Dec 15 12:18:11 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 15 Dec 2017 19:18:11 +0200
Subject: [Python-Dev] Accepting PEP 560 -- Core support for typing module
 and generic types
In-Reply-To: <20171215173630.4cff46e8@fsol>
References: <20171215173630.4cff46e8@fsol>
Message-ID:

15.12.17 18:36, Antoine Pitrou ????:
> Do you have any general idea how to speed up class creation?

Some work was done in [https://bugs.python.org/issue31336]. Currently I
have no ideas.

Creating a class is 1-2 orders slower than creating a function. And
adding parent classes slows it down significantly.

$ ./python -m perf timeit --duplicate=100 'def f(s): pass'
.....................
Mean +- std dev: 50.4 ns +- 0.8 ns

$ ./python -m perf timeit --duplicate=100 'class C: pass'
.....................
Mean +- std dev: 6.80 us +- 0.14 us

$ ./python -m perf timeit --duplicate=100 'class C:' '    def m(s): pass'
.....................
Mean +- std dev: 7.11 us +- 0.11 us

$ ./python -m perf timeit --duplicate=100 'class C(str): pass'
.....................
Mean +- std dev: 8.47 us +- 0.34 us

I'm surprised that creating a method is much slower (6 times!) than
creating a function. Maybe due to __set_name__ or other magic.

It isn't surprising that creating an enum or namedtuple class is much
slower than creating a regular class. The latter was much worse before 3.7.

$ ./python -m perf timeit -s 'from enum import Enum' --duplicate=100 'class E(Enum): A = 1'
.....................
Mean +- std dev: 45.9 us +- 0.8 us

$ ./python -m perf timeit -s 'from collections import namedtuple' --duplicate=100 'P = namedtuple("P", ("x",))'
.....................
Mean +- std dev: 44.7 us +- 0.6 us

From eric at trueblade.com  Fri Dec 15 12:37:13 2017
From: eric at trueblade.com (Eric V. Smith)
Date: Fri, 15 Dec 2017 12:37:13 -0500
Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To:
References: <20171104173013.GA4005@bytereef.org>
Message-ID:

On 12/15/2017 11:55 AM, Guido van Rossum wrote:
> On Fri, Dec 15, 2017 at 8:32 AM, Raymond Hettinger wrote:
>
>> > On Dec 15, 2017, at 7:53 AM, Guido van Rossum wrote:
>> >
>> > Make it so. "Dict keeps insertion order" is the ruling. Thanks!
>>
>> Thank you.  That is wonderful news :-)
>>
>> Would it be reasonable to replace some of the OrderedDict() uses in
>> the standard library with dict()?  For example, have namedtuple's
>> _asdict() go back to returning a plain dict as it did in its
>> original incarnation. Also, it looks like argparse could save an
>> import by using a regular dict.
>
> If it's documented as OrderedDict that would be backwards incompatible,
> since that has additional methods. Even if not documented it's likely
> to break some code. So, I'm not sure about this (though I agree with
> the sentiment that OrderedDict is much less important now).

For dataclasses, I'll change from OrderedDict to dict, since there's no
backward compatibility concern. But I need to remember to not do that
when I put the 3.6 version on PyPI.

Eric.

From mariatta.wijaya at gmail.com  Fri Dec 15 12:39:19 2017
From: mariatta.wijaya at gmail.com (Mariatta Wijaya)
Date: Fri, 15 Dec 2017 09:39:19 -0800
Subject: [Python-Dev] f-strings
In-Reply-To: <7809465429117446362@unknownmsgid>
References: <7809465429117446362@unknownmsgid>
Message-ID:

I agree it's useful info :)

I went ahead and made a PR [1]. In my PR, I simply linked to the Format
Specification Mini Language [2] from the f-strings documentation [3].

Not sure about updating PEP 498 at this point..
Not sure about updating PEP 498 at this point.. [1] https://github.com/python/cpython/pull/4888 [2] https://docs.python.org/3.6/library/string.html#format-s pecification-mini-language [3] https://docs.python.org/3/reference/lexical_analysis.html#f-strings -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Fri Dec 15 12:40:06 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 15 Dec 2017 19:40:06 +0200 Subject: [Python-Dev] __init_subclass__ is a class method (Was: Make __class_getitem__ a class method) In-Reply-To: References: Message-ID: 15.12.17 19:04, Ivan Levkivskyi ????: > Good point! Pure Python will be the primary use case and we have another > precedent > for "automatic" class method: __init_subclass__ (it does not need to be > decorated). __init_subclass__ is very different beast, and parallels with it can be confusing. It is automatically decorated with classmethod if it is a regular function implemented in C. The following two examples are totally equivalent: class A: def __init_subclass__(cls): pass class B: @classmethod def __init_subclass__(cls): pass help(A) shows __init_subclass__() as a class method (in 3.7). But if you implement the class in C you need to make __init_subclass__ a class method. I think __init_subclass__ should be documented as a class method since it is a class method. From storchaka at gmail.com Fri Dec 15 12:45:24 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 15 Dec 2017 19:45:24 +0200 Subject: [Python-Dev] Make __class_getitem__ a class method In-Reply-To: References: Message-ID: 15.12.17 18:47, Yury Selivanov ????: > Shouldn't we optimize the usability for pure-Python first, and then for C API? > > Right now we have the '__new__' magic method, which isn't a > @classmethod. Making '__class_getitem__' a @classmethod will confuse > regular Python users. 
> For example:
>
> class Foo:
>     def __new__(cls, ...): pass
>
>     @classmethod
>     def __class_getitem__(cls, item): pass
>
> To me it makes sense that type methods that are supposed to be called
> on type by the Python interpreter don't need the classmethod
> decorator.
>
> METH_STATIC is a public working API, and in my opinion it's totally
> fine if we use it. It's not even hard to use it, it's just *mildly*
> inconvenient at most.

__new__ is not a class method, it is an "automatic" static method.

>>> class C:
...     def __new__(cls): return object.__new__(cls)
...
>>> C().__new__ is C.__new__
True
>>> C.__dict__['__new__']
<staticmethod object at 0x...>

The following two declarations are equivalent:

class A:
    def __new__(cls): return cls.__name__

class B:
    @staticmethod
    def __new__(cls): return cls.__name__

From solipsis at pitrou.net  Fri Dec 15 12:50:58 2017
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 15 Dec 2017 18:50:58 +0100
Subject: [Python-Dev] Type creation speed
References: <20171215173630.4cff46e8@fsol>
Message-ID: <20171215185058.7e4ad2c0@fsol>

On Fri, 15 Dec 2017 19:18:11 +0200
Serhiy Storchaka wrote:
> 15.12.17 18:36, Antoine Pitrou ????:
> > Do you have any general idea how to speed up class creation?
>
> Some work was done in [https://bugs.python.org/issue31336]. Currently
> I have no ideas.
>
> Creating a class is 1-2 orders slower than creating a function. And
> adding parent classes slows it down significantly.

I made simple, approximate measurements with an empty class:
- fixup_slot_dispatchers() takes 78% of the time (!)
- __build_class__() takes 5%
- computing the default __qualname__, __module__, __doc__ takes 3%
- set_names() takes 2.5%
- init_subclass() takes 2.5%

Regards

Antoine.
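For readers without perf installed, the function-vs-class creation gap discussed above can be reproduced (roughly) with nothing but the stdlib timeit module. This is a crude sketch, not the perf-based methodology used in the thread, and the absolute numbers will vary by machine and Python version:

```python
import timeit

N = 20_000

# Time creating a trivial function, an empty class, and a class with a
# parent class. Each statement is compiled once and executed N times.
func_time = timeit.timeit("def f(s): pass", number=N)
cls_time = timeit.timeit("class C: pass", number=N)
sub_time = timeit.timeit("class C(str): pass", number=N)

print(f"def f:        {func_time / N * 1e9:8.0f} ns")
print(f"class C:      {cls_time / N * 1e9:8.0f} ns")
print(f"class C(str): {sub_time / N * 1e9:8.0f} ns")
```

On CPython the class timings come out one to two orders of magnitude above the function timing, consistent with the perf figures quoted above.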
From yselivanov.ml at gmail.com  Fri Dec 15 13:02:53 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Fri, 15 Dec 2017 13:02:53 -0500
Subject: [Python-Dev] Make __class_getitem__ a class method
In-Reply-To:
References:
Message-ID:

On Fri, Dec 15, 2017 at 12:45 PM, Serhiy Storchaka wrote:
> 15.12.17 18:47, Yury Selivanov ????:
>>
>> Shouldn't we optimize the usability for pure-Python first, and then
>> for C API?
>>
>> Right now we have the '__new__' magic method, which isn't a
>> @classmethod. Making '__class_getitem__' a @classmethod will confuse
>> regular Python users. For example:
>>
>> class Foo:
>>     def __new__(cls, ...): pass
>>
>>     @classmethod
>>     def __class_getitem__(cls, item): pass
>>
>> To me it makes sense that type methods that are supposed to be called
>> on type by the Python interpreter don't need the classmethod
>> decorator.
>>
>> METH_STATIC is a public working API, and in my opinion it's totally
>> fine if we use it. It's not even hard to use it, it's just *mildly*
>> inconvenient at most.
>
> __new__ is not a class method, it is an "automatic" static method.

I never said that __new__ is a class method :)

> The following two declarations are equivalent:
>
> class A:
>     def __new__(cls): return cls.__name__
>
> class B:
>     @staticmethod
>     def __new__(cls): return cls.__name__

But nobody decorates __new__ with a @staticmethod. And making
__class_getitem__ a @classmethod will only confuse users -- that's all
I'm saying.

So I'm +1 to keep the things exactly as they are now. It would be
great to document that in order to implement __class_getitem__ in C
one should add it as METH_STATIC. I also think we should merge your
PR that tests that it works the way it's expected.

Yury

From storchaka at gmail.com  Fri Dec 15 13:19:00 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 15 Dec 2017 20:19:00 +0200
Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To:
References: <20171104173013.GA4005@bytereef.org>
Message-ID:

15.12.17 17:53, Guido van Rossum ????:
> Make it so. "Dict keeps insertion order" is the ruling. Thanks!

What should dict.popitem() return? The first item, the last item, or
unspecified?

From guido at python.org  Fri Dec 15 13:28:29 2017
From: guido at python.org (Guido van Rossum)
Date: Fri, 15 Dec 2017 10:28:29 -0800
Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To:
References: <20171104173013.GA4005@bytereef.org>
Message-ID:

Whatever it does in 3.6.

On Fri, Dec 15, 2017 at 10:19 AM, Serhiy Storchaka wrote:
> 15.12.17 17:53, Guido van Rossum ????:
>> Make it so. "Dict keeps insertion order" is the ruling. Thanks!
>
> What should dict.popitem() return? The first item, the last item, or
> unspecified?
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org

--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From p.f.moore at gmail.com  Fri Dec 15 13:29:18 2017
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 15 Dec 2017 18:29:18 +0000
Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To:
References: <20171104173013.GA4005@bytereef.org>
Message-ID:

On 15 December 2017 at 18:19, Serhiy Storchaka wrote:
> 15.12.17 17:53, Guido van Rossum ????:
>>
>> Make it so. "Dict keeps insertion order" is the ruling. Thanks!
>
> What should dict.popitem() return? The first item, the last item, or
> unspecified?

I'd say leave it as unspecified.

Paul

From ericsnowcurrently at gmail.com  Fri Dec 15 13:30:52 2017
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Fri, 15 Dec 2017 11:30:52 -0700
Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To:
References: <20171104173013.GA4005@bytereef.org>
Message-ID:

On Fri, Dec 15, 2017 at 8:53 AM, Guido van Rossum wrote:
> Make it so. "Dict keeps insertion order" is the ruling. Thanks!

Does that include preserving order after deletion?

-eric

From v+python at g.nevcal.com  Fri Dec 15 13:06:21 2017
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Fri, 15 Dec 2017 10:06:21 -0800
Subject: [Python-Dev] Support of the Android platform
In-Reply-To: <2063a4ed-d741-7ea0-4b8e-5e79c8c9e5d5@gmail.com>
References: <2063a4ed-d741-7ea0-4b8e-5e79c8c9e5d5@gmail.com>
Message-ID:

On 12/15/2017 8:29 AM, Xavier de Gaye wrote:
> On 12/14/2017 02:59 PM, Victor Stinner wrote:
> > It seems like Android is evolving quickly, would say quicker than
> > Python releases. I'm asking if it's a good idea to put a recipe aside
> > the Python source code for one specific Android API version? Would it
> > still make sense to build for NDK v14 in 2 or 5 years?
>
> NDK 14 has been released in March 2017 and the latest release is NDK
> 16. There are sometimes major changes between releases and I think it
> is critical to ensure that the builds all use the same NDK release for
> that reason. Supporting another NDK release is just a substitution in
> one of the files of the build system and I am sure that in 2 or 5
> years there would have been a core developer smart enough to make that
> substitution (this could even have been me, I will only be 71 years
> old in 5 years :-)). Anyway if this is a problem, this should have
> been discussed in a review of the PR.
>
> There are concerns, including a concern raised by me, about supporting
> Android with that build system or to supporting Android at all. It has
> been interesting and gratifying to work on this build system and to
> get the Python test suite running on Android without failures.
> Given these concerns and the lack of interest in the support of
> Android it is time for me to switch to something else, maybe improve
> the bdb module, why not?
>
> Xavier

I, for one, would love to see Android become a supported platform. My
understanding is that APIs are generally backward compatible, so that
programs created with one API continue to work on future APIs... there
may be new features they don't use and maybe can't support, because
they don't know about newer APIs, but that is true of Windows and Linux
and Mac also.

It doesn't seem like it would be necessary to "support" or "release" an
official Python for every new Android version, but it would be nice to
have ongoing support for a recent version each time Python is released,
if Xavier or someone can do it. It is certainly a mainstream platform
for development these days.

I see a variety of Python apps on Android, but I have no idea how well
they work, or what their source is, or if they are supported, or what
features of Python or its standard library might not be available, etc.
I've been too busy on other projects to take time to investigate them.

Glenn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From storchaka at gmail.com  Fri Dec 15 13:35:46 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 15 Dec 2017 20:35:46 +0200
Subject: [Python-Dev] Make __class_getitem__ a class method
In this case I suggest to make __class_getitem__ an automatic class method like __init_subclass__. The number of special cases bothers me. From guido at python.org Fri Dec 15 13:35:27 2017 From: guido at python.org (Guido van Rossum) Date: Fri, 15 Dec 2017 10:35:27 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <20171104173013.GA4005@bytereef.org> Message-ID: On Fri, Dec 15, 2017 at 10:30 AM, Eric Snow wrote: > On Fri, Dec 15, 2017 at 8:53 AM, Guido van Rossum > wrote: > > Make it so. "Dict keeps insertion order" is the ruling. Thanks! > > Does that include preserving order after deletion? Yes, that's what the rest of the thread was about. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Fri Dec 15 13:41:24 2017 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 15 Dec 2017 11:41:24 -0700 Subject: [Python-Dev] Make __class_getitem__ a class method In-Reply-To: References: Message-ID: On Fri, Dec 15, 2017 at 11:35 AM, Serhiy Storchaka wrote: > In this case I suggest to make __class_getitem__ an automatic class method > like __init_subclass__. +1 I was just about to suggest the same thing. -eric From tim.peters at gmail.com Fri Dec 15 13:47:54 2017 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 15 Dec 2017 12:47:54 -0600 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <20171104173013.GA4005@bytereef.org> Message-ID: [Eric Snow ] > Does that include preserving order after deletion? Given that we're blessing current behavior: - At any moment, iteration order is from oldest to newest. So, "yes" to your question. - While iteration starts with the oldest, .popitem() returns the youngest. This is analogous to how lists work, viewing a dict similarly ordered "left to right" (iteration starts at the left, .pop() at the right, for lists and dicts). 
From python-dev at mgmiller.net  Fri Dec 15 14:07:15 2017
From: python-dev at mgmiller.net (Mike Miller)
Date: Fri, 15 Dec 2017 11:07:15 -0800
Subject: [Python-Dev] Support of the Android platform
In-Reply-To:
References: <2063a4ed-d741-7ea0-4b8e-5e79c8c9e5d5@gmail.com>
Message-ID:

I've used Kivy with buildozer on Android and it generally works well,
with a few issues. Currently it uses the Crystax NDK for Python 3
support. Does anyone know how this development will affect it?

-Mike

On 2017-12-15 10:06, Glenn Linderman wrote:
> I see a variety of Python apps on Android, but I have no idea how well
> they work, or what their source is, or if they are supported, or what
> features of Python or its standard library might not be available,
> etc. I've been too busy on other projects to take time to investigate
> them.

From python at mrabarnett.plus.com  Fri Dec 15 14:47:23 2017
From: python at mrabarnett.plus.com (MRAB)
Date: Fri, 15 Dec 2017 19:47:23 +0000
Subject: [Python-Dev] Accepting PEP 560 -- Core support for typing module
 and generic types
In-Reply-To: <20171215173630.4cff46e8@fsol>
References: <20171215173630.4cff46e8@fsol>
Message-ID:

On 2017-12-15 16:36, Antoine Pitrou wrote:
> On Fri, 15 Dec 2017 14:05:46 +0200
> Serhiy Storchaka wrote:
>> 15.12.17 04:00, Guido van Rossum ????:
>> > In the light of Antoine's and Stephan's feedback I think this can be
>> > reconsidered -- while I want to take a cautious stance about resource
>> > consumption I don't want to stand in the way of progress.
>>
>> I don't see any problems with implementing this on types defined in C.
>> This isn't harder than implementing __sizeof__ or pickling support,
>> and NumPy classes already have implemented both. Maybe Yury forgot
>> about METH_STATIC and METH_CLASS?
>>
>> The cost of adding new slots:
>>
>> 1. Increased memory consumption. This increases the size of *every*
>> class, even if they don't implement this feature.
>>
>> 2. Increased class initialization time.
>> For every class, for every slot,
>> we need to look up corresponding methods in dictionaries of the class
>> itself and all its parents (caching doesn't work fine at this stage).
>> A significant part of class initialization time is spent on
>> initializing slots. This will increase the startup time and the time
>> of creating local classes. The relative overhead is more significant
>> in Cython.
>>
>> 3. We need to add a new type feature flag Py_TPFLAGS_HAVE_*. The
>> number of possible flags is limited, and most bits already are used.
>> We can add a limited number of new slots, and should not spend this
>> resource without large need.
>>
>> 4. Increased complexity. Currently the code related to PEP 560 is
>> located in few places. With supporting new slots we will need to
>> touch more delicate code not related directly to PEP 560. It is hard
>> to review and to test such kind of changes. I can't guarantee the
>> correctness.
>
> These are all very good points (except #1 which I think is a red
> herring, see my posted example). Do you have any general idea how to
> speed up class creation?

Re the flags, could a flag be used to indicate that there are
additional flags?
From barry at python.org  Fri Dec 15 14:55:19 2017
From: barry at python.org (Barry Warsaw)
Date: Fri, 15 Dec 2017 14:55:19 -0500
Subject: [Python-Dev] New crash in test_embed on macOS 10.12
Message-ID: <1DB90047-32AD-4DCA-BB38-CFA2A202AD3D@python.org>

I haven't bisected this yet, but with git head, built and tested on
macOS 10.12.6 and Xcode 9.2, I'm seeing this crash in test_embed:

======================================================================
FAIL: test_bpo20891 (test.test_embed.EmbeddingTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/barry/projects/python/cpython/Lib/test/test_embed.py", line 207, in test_bpo20891
    out, err = self.run_embedded_interpreter("bpo20891")
  File "/Users/barry/projects/python/cpython/Lib/test/test_embed.py", line 59, in run_embedded_interpreter
    (p.returncode, err))
AssertionError: -6 != 0 : bad returncode -6, stderr is 'Fatal Python
error: PyEval_SaveThread: NULL tstate\n\nCurrent thread
0x00007fffcb58a3c0 (most recent call first):\n'

Seems reproducible across different machines (all running 10.12.6 and
Xcode 9.2), even after a make clean and configure. I don't see the same
failure on Debian, and I don't see the crashes on the buildbots.

Can anyone verify?

-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: Message signed with OpenPGP
URL:

From chris.barker at noaa.gov  Fri Dec 15 15:07:18 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 15 Dec 2017 12:07:18 -0800
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To:
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com>
 <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com>
 <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com>
 <3418511732122395686@unknownmsgid>
Message-ID:

Sorry about the email mangling -- I do a lot of my listserve work on
the bus on an iPhone, with the built-in mail client -- and it REALLY
sucks for doing interspersed email replying -- highly encouraging the
dreaded top posting...

But anyway, I think both Steve and I were expressing concerns about
"Typing Creep". Typing should always be optional in Python, and while
this PEP does keep it optional, Steve's point was that the code in the
standard library serves not only as a library, but as examples of how
to write "robust" python code.

The rest of this note is me -- I'm not pretending to speak for Steve.

Reading the PEP, this text makes me uneasy:

"A field is defined as any variable identified in __annotations__.
That is, a variable that has a type annotation."

And if I understand the rest of the PEP, while typing itself is
optional, the use of type annotation is not -- it is exactly what's
being used to generate the fields the user wants. And the examples are
all using typing -- granted, primarily the built-in types, but still:

@dataclass
class C:
    a: int      # 'a' has no default value
    b: int = 0  # assign a default value for 'b'

This sure LOOKS like typing is required. It also makes me nervous
because, as I understand it, the types aren't actually used in the
implementation (presumably they would be by mypy and the like?).

So I think for folks that aren't using typing and a type checker in
their development process, it would be pretty confusing what this means
and what it actually does. Particularly folks that are coming from a
background of a statically typed language.

Then I see:

"""
Field objects describe each defined field. ... Its documented
attributes are:

name: The name of the field.
type: The type of the field. ... """ So again, typing looks to be pretty baked into the whole concept. and then: """ One place where dataclass actually inspects the type of a field is to determine if a field is a class variable as defined in PEP 526. """ and """ The other place where dataclass inspects a type annotation is to determine if a field is an init-only variable. It does this by seeing if the type of a field is of type dataclasses.InitVar. """ """ Data Classes will raise a TypeError if it detects a default parameter of type list, dict, or set. """ So: it seems that type hinting, while not required to use Data Classes, is very much baked into the implementation and examples. As I said -- this makes me uneasy -- It's a very big step that essentially promotes the type hinting to a new place in Python -- you will not be able to use a standard library class without at least a little thought about types and typing. I note this: """ This discussion started on python-ideas [9] and was moved to a GitHub repo [10] for further discussion. As part of this discussion, we made the decision to use PEP 526 syntax to drive the discovery of fields. """ I confess I only vaguely followed that discussion -- in fact, mostly I thought that the concept of Data Classes was a good one, and was glad to see SOMETHING get implemented, and didn't think I had much to contribute to the details of how it was done. So these issues may have already been raised and considered, so carry on. But: NOTE: from PEP 526: "Python will remain a dynamically typed language, and the authors have no desire to ever make type hints mandatory, even by convention. " The Data Classes implementation is not making it mandatory by any means, but it is making it a more "standard" part of the language that can not simply be ignored anymore. And it seems some features of dataclasses can only be accessed via actual typing, in addition to the requirement of type annotations. 
If nothing else, the documentation should make it very clear that the typing aspects of Data Classes are indeed optional, and preferably give some untyped examples, something like: @dataclass class C: a: None # 'a' has no default value b: None = 0 # assign a default value for 'b' If, in fact, that would be the way to do it. -Chris On Fri, Dec 15, 2017 at 3:22 AM, Eric V. Smith wrote: > On 12/15/2017 5:56 AM, Steve Holden wrote: > >> On Mon, Dec 11, 2017 at 5:10 PM, Chris Barker - NOAA Federal < >> chris.barker at noaa.gov > wrote: >> > ... > >> However, typing is not currently imported by dataclasses.py. >>> >> > > >> And there you have an actual reason besides my uneasiness :-) >> >> - CHB >> >> "hmm..." >> > > [Agreed with Antoine on the MUA and quoting being confusing.] > > The only reason typing isn't imported is performance. I hope that once PEP > 560 is complete this will no longer be an issue, and dataclasses will > always import typing. But of course typing will still not be needed for > most uses of @dataclass or make_dataclass(). This is explained in the PEP. > > Eric. > > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Dec 15 15:19:56 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 15 Dec 2017 12:19:56 -0800 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> Message-ID: One other note (see my last message). 
The PEP should include a summary of the discussion of the decision to use the type annotation syntax vs other options. I just looked through all the GitHub issues and found nothing, and started to look at the python-ideas list archive and got overwhelmed. So having that justification in the PEP would be good. -CHB On Fri, Dec 15, 2017 at 12:07 PM, Chris Barker wrote: > Sorry about the email mangling -- I do a lot of my listserve work on the > bus on an iPhone, with the built-in mail client -- and it REALLY sucks for > doing interspersed email replying -- highly encouraging the dreaded top > posting... > > [...] -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From raymond.hettinger at gmail.com Fri Dec 15 15:14:54 2017 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Fri, 15 Dec 2017 12:14:54 -0800 Subject: [Python-Dev] New crash in test_embed on macOS 10.12 In-Reply-To: <1DB90047-32AD-4DCA-BB38-CFA2A202AD3D@python.org> References: <1DB90047-32AD-4DCA-BB38-CFA2A202AD3D@python.org> Message-ID: <330219AB-B024-4306-A239-8812F2A77134@gmail.com> > On Dec 15, 2017, at 11:55 AM, Barry Warsaw wrote: > > I haven?t bisected this yet, but with git head, built and tested on macOS 10.12.6 and Xcode 9.2, I?m seeing this crash in test_embed: > > ====================================================================== > FAIL: test_bpo20891 (test.test_embed.EmbeddingTests) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/Users/barry/projects/python/cpython/Lib/test/test_embed.py", line 207, in test_bpo20891 > out, err = self.run_embedded_interpreter("bpo20891") > File "/Users/barry/projects/python/cpython/Lib/test/test_embed.py", line 59, in run_embedded_interpreter > (p.returncode, err)) > AssertionError: -6 != 0 : bad returncode -6, stderr is 'Fatal Python error: PyEval_SaveThread: NULL tstate\n\nCurrent thread 0x00007fffcb58a3c0 (most recent call first):\n' > > Seems reproducible across different machines (all running 10.12.6 and Xcode 9.2), even after a make clean and configure. I don?t see the same failure on Debian, and I don?t see the crashes on the buildbots. > > Can anyone verify? I saw this same test failure. After a "make distclean", it went away. 
Raymond From barry at python.org Fri Dec 15 15:42:12 2017 From: barry at python.org (Barry Warsaw) Date: Fri, 15 Dec 2017 15:42:12 -0500 Subject: [Python-Dev] New crash in test_embed on macOS 10.12 In-Reply-To: <330219AB-B024-4306-A239-8812F2A77134@gmail.com> References: <1DB90047-32AD-4DCA-BB38-CFA2A202AD3D@python.org> <330219AB-B024-4306-A239-8812F2A77134@gmail.com> Message-ID: On Dec 15, 2017, at 15:14, Raymond Hettinger wrote: > > I saw this same test failure. After a "make distclean", it went away. Dang, not for me. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From njs at pobox.com Fri Dec 15 15:44:46 2017 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 15 Dec 2017 12:44:46 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <20171104173013.GA4005@bytereef.org> Message-ID: On Dec 15, 2017 10:50, "Tim Peters" wrote: [Eric Snow ] > Does that include preserving order after deletion? Given that we're blessing current behavior: - At any moment, iteration order is from oldest to newest. So, "yes" to your question. - While iteration starts with the oldest, .popitem() returns the youngest. This is analogous to how lists work, viewing a dict similarly ordered "left to right" (iteration starts at the left, .pop() at the right, for lists and dicts). Fortunately, this also matches OrderedDict.popitem(). It'd be nice if we could also support dict.popitem(last=False) to get the other behavior, again matching OrderedDict. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From raymond.hettinger at gmail.com Fri Dec 15 15:45:27 2017 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Fri, 15 Dec 2017 12:45:27 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? 
In-Reply-To: References: <20171104173013.GA4005@bytereef.org> Message-ID: > On Dec 15, 2017, at 7:53 AM, Guido van Rossum wrote: > > Make it so. "Dict keeps insertion order" is the ruling. On Twitter, someone raised an interesting question. Is the guarantee just for 3.7 and later? Or will the blessing also cover 3.6 where it is already true. The 3.6 guidance is to use OrderedDict() when ordering is required. As of now, that guidance seems superfluous and may no longer be a sensible practice. For example, it would be nice for Eric Smith when he does his 3.6 dataclasses backport to not have to put OrderedDict back in the code. Do you still have the keys to the time machine? Raymond From chris.barker at noaa.gov Fri Dec 15 15:47:23 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 15 Dec 2017 12:47:23 -0800 Subject: [Python-Dev] f-strings In-Reply-To: References: <7809465429117446362@unknownmsgid> Message-ID: On Fri, Dec 15, 2017 at 9:39 AM, Mariatta Wijaya wrote: > I agree it's useful info :) > > I went ahead and made a PR [1]. > Thanks! I added a couple comments to that PR. > Not sure about updating PEP 498 at this point.. > A little clarification text would be nice. I made a PR for that: https://github.com/python/peps/pull/514 -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From victor.stinner at gmail.com Fri Dec 15 16:04:30 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 15 Dec 2017 22:04:30 +0100 Subject: [Python-Dev] New crash in test_embed on macOS 10.12 In-Reply-To: <1DB90047-32AD-4DCA-BB38-CFA2A202AD3D@python.org> References: <1DB90047-32AD-4DCA-BB38-CFA2A202AD3D@python.org> Message-ID: Hi, 2017-12-15 20:55 GMT+01:00 Barry Warsaw : > I haven?t bisected this yet, but with git head, built and tested on macOS 10.12.6 and Xcode 9.2, I?m seeing this crash in test_embed: > > ====================================================================== > FAIL: test_bpo20891 (test.test_embed.EmbeddingTests) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/Users/barry/projects/python/cpython/Lib/test/test_embed.py", line 207, in test_bpo20891 > out, err = self.run_embedded_interpreter("bpo20891") > File "/Users/barry/projects/python/cpython/Lib/test/test_embed.py", line 59, in run_embedded_interpreter > (p.returncode, err)) > AssertionError: -6 != 0 : bad returncode -6, stderr is 'Fatal Python error: PyEval_SaveThread: NULL tstate\n\nCurrent thread 0x00007fffcb58a3c0 (most recent call first):\n' > > Seems reproducible across different machines (all running 10.12.6 and Xcode 9.2), even after a make clean and configure. I don?t see the same failure on Debian, and I don?t see the crashes on the buildbots. > > Can anyone verify? It's a known issue. The hint is in the method name: test_bpo20891 :-) https://bugs.python.org/issue20891#msg307553 I fixed a bug and added a test for it, but the test showed an unknown race condition. I wrote a fix for the race condition, but I was asked to run a benchmark and I didn't run it yet. https://github.com/python/cpython/pull/4700 I hesitate to skip the test until I fix the second bug. 
Victor From p.f.moore at gmail.com Fri Dec 15 16:14:34 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 15 Dec 2017 21:14:34 +0000 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> Message-ID: On 15 December 2017 at 20:07, Chris Barker wrote: > And if I understand the rest of the PEP, while typing itself is optional, > the use of type Annotation is not -- it is exactly what's being used to > generate the fields the user wants. > > And the examples are all using typing -- granted, primarily the built in > types, but still: > > > @dataclass > class C: > a: int # 'a' has no default value > b: int = 0 # assign a default value for 'b' > > > This sure LOOKS like typing is required. It also makes me nervous because, > as I understand it, the types aren't actually used in the implementation > (presumable they would be by mypy and the like?). So I think for folks that > aren't using typing and a type checker in their development process, it > would be pretty confusing that this means and what it actually does. > Particularly folks that are coming from a background of a statically typed > language. I actually don't have any problem with this. It looks natural to me, reads perfectly fine, and is a far better way of defining fields than many of the other approaches that I've seen in the past (that don't use annotations). The one thing I would find surprising is that the actual type used is ignored. @dataclass class C: a: str = 0 AIUI this is valid, but it looks weird to me. There's an easy answer, though - just don't do that. > Then I see: > > """ > Field objects describe each defined field. > ... > Its documented attributes are: > > name: The name of the field. > type: The type of the field. > ... 
> """ > > So again, typing looks to be pretty baked in to the whole concept. Well, being able to see the type the class author intended is a feature. I don't know I'd consider that as meaning typing is "baked in". It's useful but ignorable data. > and then: > > """ > One place where dataclass actually inspects the type of a field is to > determine if a field is a class variable as defined in PEP 526. > """ > > and > > """ > The other place where dataclass inspects a type annotation is to determine > if a field is an init-only variable. It does this by seeing if the type of a > field is of type dataclasses.InitVar. > """ Those are somewhat more explicit cases of directly using type annotations as declarations. But what alternative would you propose? It still seems fine to me. > """ > Data Classes will raise a TypeError if it detects a default parameter of > type list, dict, or set. > """ Doesn't that mean that @dataclass class C: a: int = [] raises an error? The problem here is the same as that of mutable function default parameters - we don't want every instance of C to share a single list object as their default value for a. It's got nothing to do with the annotation (that's why I used the deliberately-inconsistent annotation of int here). I'm a strong +1 on making this an error, as it's likely to be an easy mistake to make, and quite hard to debug. > So: it seems that type hinting, while not required to use Data Classes, is > very much baked into the implementation an examples. Annotations and the annotation syntax are fundamental to the design. But that's core Python syntax. But I wouldn't describe types as being that significant to the design, it's more "if you supply them we'll make use of them". Don't forget, function parameter annotations were around long before typing. Variable annotations weren't, but they could have been - it's just that typing exposed a use case for them. Data classes could just as easily have been the motivating use case for PEP 526. 
> As I said -- this makes me uneasy -- It's a very big step that essentially > promotes the type hinting to a new place in Python -- you will not be able > to use a standard library class without at least a little thought about > types and typing. I will say that while I don't use typing or mypy at all in my code, I don't have any particular dislike of the idea of typing, or the syntax for declaring annotations. So I find it hard to understand your concerns here. My personal uneasiness is actually somewhat the opposite - I find it disconcerting that if I annotate a variable/parameter as having type int, nothing stops me assigning a string to it. But that's *precisely* what typing being optional means, so while it seems odd to my static typing instincts, it's entirely within the spirit of not forcing typing onto Python. > If nothing else, the documentation should make it very clear that the typing > aspects of Data Classes is indeed optional, and preferably give some untyped > examples, something like: > > @dataclass > class C: > a: None # 'a' has no default value > b: None = 0 # assign a default value for 'b' This does seem like a reasonable option to note. Something along the lines of "If you don't use type annotations in your code, and you want to avoid introducing them, using None as a placeholder for the type is sufficient". However, I suspect that using None as a "I don't really want to assign a type" value might well confuse mypy - I don't know. But using typing.Any (which is what mypy would expect) clearly doesn't meet the "avoid typing totally" requirement here. Maybe (mis-)using string annotations, like @dataclass class C: a: 'variable' # 'a' has no default value b: 'variable' = 0 # assign a default value for 'b' would work? But honestly, this feels like jumping through hoops purely to avoid using int "because it means I've bought into the idea of typing". 
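For what it's worth, the decorator never evaluates the annotation itself, so any object serves as a placeholder — a quick sketch of the "untyped" style discussed here (the annotations are deliberately meaningless):

```python
from dataclasses import dataclass, fields

@dataclass
class Point:
    x: ...                # Ellipsis as a "don't care" annotation
    y: "anything" = 0     # an arbitrary string works too

p = Point(1)
assert (p.x, p.y) == (1, 0)
assert [f.name for f in fields(Point)] == ["x", "y"]
```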
I guess if you're that adamant about never wanting to use typing in your code, data classes would make you uncomfortable. But conversely, I don't see the value in making data classes clumsier than they need to be out of a purist principle to not use a perfectly valid Python syntax. Paul From guido at python.org Fri Dec 15 16:47:04 2017 From: guido at python.org (Guido van Rossum) Date: Fri, 15 Dec 2017 13:47:04 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <20171104173013.GA4005@bytereef.org> Message-ID: On Fri, Dec 15, 2017 at 12:45 PM, Raymond Hettinger < raymond.hettinger at gmail.com> wrote: > > > On Dec 15, 2017, at 7:53 AM, Guido van Rossum wrote: > > > > Make it so. "Dict keeps insertion order" is the ruling. > > On Twitter, someone raised an interesting question. > > Is the guarantee just for 3.7 and later? Or will the blessing also cover > 3.6 where it is already true. > > The 3.6 guidance is to use OrderedDict() when ordering is required. As of > now, that guidance seems superfluous and may no longer be a sensible > practice. For example, it would be nice for Eric Smith when he does his > 3.6 dataclasses backport to not have to put OrderedDict back in the code. > For 3.6 we can't change the language specs, we can just document how it works in CPython. I don't know what other Python implementations do in their version that's supposed to be compatible with 3.6 but I don't want to retroactively declare them non-conforming. (However for 3.7 they have to follow suit.) I also don't think that the "it stays ordered across deletions" part of the ruling is true in CPython 3.6. I don't know what guidance to give Eric, because I don't know what other implementations do nor whether Eric cares about being compatible with those. 
IIUC micropython does not guarantee this currently, but I don't know if they claim Python 3.6 compatibility -- in fact I can't find any document that specifies the Python version they're compatible with more precisely than "Python 3". -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Fri Dec 15 17:48:10 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Fri, 15 Dec 2017 23:48:10 +0100 Subject: [Python-Dev] __init_subclass__ is a class method (Was: Make __class_getitem__ a class method) In-Reply-To: References: Message-ID: On 15 December 2017 at 18:40, Serhiy Storchaka wrote: > 15.12.17 19:04, Ivan Levkivskyi writes: > >> Good point! Pure Python will be the primary use case and we have another >> precedent >> for "automatic" class method: __init_subclass__ (it does not need to be >> decorated). >> > > __init_subclass__ is a very different beast, and parallels with it can be > confusing. It is automatically decorated with classmethod if it is a > regular function implemented in Python. The following two examples are totally > equivalent: > > class A: > def __init_subclass__(cls): pass > > class B: > @classmethod > def __init_subclass__(cls): pass > > help(A) shows __init_subclass__() as a class method (in 3.7). > > But if you implement the class in C you need to make __init_subclass__ a > class method. > > I think __init_subclass__ should be documented as a class method since it > is a class method. > Thank you for clarification! Actually documentation https://docs.python.org/3.6/reference/datamodel.html#customizing-class-creation already says `classmethod object.__init_subclass__(cls)` I am not an expert in this, so I am not sure if the docs can be improved here (maybe we can add how this works with C API?) -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... 
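Serhiy's equivalence claim is easy to observe from Python: type.__new__ wraps a plain __init_subclass__ in classmethod automatically (a sketch for CPython 3.6+; the `registered` attribute is illustrative):

```python
class A:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.registered = True  # runs once for each new subclass

class B(A):
    pass

# Even though no decorator was written, the stored object is a classmethod:
assert isinstance(A.__dict__["__init_subclass__"], classmethod)
assert B.registered
```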
URL: From levkivskyi at gmail.com Fri Dec 15 18:00:47 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Sat, 16 Dec 2017 00:00:47 +0100 Subject: [Python-Dev] Make __class_getitem__ a class method In-Reply-To: References: Message-ID: On 15 December 2017 at 19:35, Serhiy Storchaka wrote: > 15.12.17 20:02, Yury Selivanov writes: > >> But nobody decorates __new__ with a @staticmethod. And making >> __class_getitem__ a @classmethod will only confuse users -- that's all >> I'm saying. >> >> So I'm +1 to keep the things exactly as they are now. It would be >> great to document that in order to implement __class_getitem__ in C >> one should add it as METH_STATIC. I also think we should merge your >> PR that tests that it works the way it's expected. >> > > In this case I suggest to make __class_getitem__ an automatic class method > like __init_subclass__. > > The number of special cases bothers me. > I just want to clarify what is proposed. As I understand: * From the point of view of a pure Python class there will be no difference with the current behaviour, one just writes class C: def __class_getitem__(cls, item): ... * In `type_new`, `__class_getitem__` will be wrapped in classmethod * From the point of view of C extensions one will use METH_CLASS and no tuple unpacking If this is true, then this looks reasonable. If no-one is against, then I can make a PR. The only downside to this that I see is that `type.__new__` will be slightly slower. -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... 
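For reference, this is what Ivan's proposal means for a pure Python class — it landed this way in 3.7, where __class_getitem__, like __init_subclass__, is implicitly converted to a class method (the Registry name is illustrative):

```python
class Registry:
    def __class_getitem__(cls, item):
        # Called for Registry[int]; receives the class, not an instance.
        return (cls, item)

assert Registry[int] == (Registry, int)
# No decorator was needed; type.__new__ wrapped it in classmethod:
assert isinstance(Registry.__dict__["__class_getitem__"], classmethod)
```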
URL: From storchaka at gmail.com Fri Dec 15 18:31:52 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 16 Dec 2017 01:31:52 +0200 Subject: [Python-Dev] __init_subclass__ is a class method (Was: Make __class_getitem__ a class method) In-Reply-To: References: Message-ID: 16.12.17 00:48, Ivan Levkivskyi writes: > Actually documentation > https://docs.python.org/3.6/reference/datamodel.html#customizing-class-creation > already says `classmethod object.__init_subclass__(cls)` > I am not an expert in this, so I am not sure if the docs can be improved > here (maybe we can add how this works with C API?) Sorry, I had written my previous message before reading the documentation, just after reading the sources. Now I have discovered that the special behavior of __init_subclass__ is already documented. I don't know whether it is needed to add something more. From raymond.hettinger at gmail.com Fri Dec 15 18:36:27 2017 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Fri, 15 Dec 2017 15:36:27 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <20171104173013.GA4005@bytereef.org> Message-ID: <628E027D-7D8D-46F4-B4F9-778F7648FFB3@gmail.com> > On Dec 15, 2017, at 1:47 PM, Guido van Rossum wrote: > > On Fri, Dec 15, 2017 at 12:45 PM, Raymond Hettinger wrote: > > > On Dec 15, 2017, at 7:53 AM, Guido van Rossum wrote: > > > > Make it so. "Dict keeps insertion order" is the ruling. > > On Twitter, someone raised an interesting question. > > Is the guarantee just for 3.7 and later? Or will the blessing also cover 3.6 where it is already true. > > The 3.6 guidance is to use OrderedDict() when ordering is required. As of now, that guidance seems superfluous and may no longer be a sensible practice. For example, it would be nice for Eric Smith when he does his 3.6 dataclasses backport to not have to put OrderedDict back in the code. 
> > For 3.6 we can't change the language specs, we can just document how it works in CPython. I don't know what other Python implementations do in their version that's supposed to be compatible with 3.6 but I don't want to retroactively declare them non-conforming. (However for 3.7 they have to follow suit.) I also don't think that the "it stays ordered across deletions" part of the ruling is true in CPython 3.6. FWIW, the regular dict does stay ordered across deletions in CPython 3.6: >>> d = dict(a=1, b=2, c=3, d=4) >>> del d['b'] >>> d['b'] = 5 >>> d {'a': 1, 'c': 3, 'd': 4, 'b': 5} Here's a more interesting demonstration: from random import randrange, shuffle from collections import OrderedDict population = 1000000 s = list(range(population // 4)) shuffle(s) d = dict.fromkeys(s) od = OrderedDict.fromkeys(s) for i in range(500000): k = randrange(population) d[k] = i od[k] = i k = randrange(population) if k in d: del d[k] del od[k] assert list(d.items()) == list(od.items()) The dict object insertion logic just appends to the arrays of keys, values, and hashvalues. When the number of usable elements decreases to zero (reaching the limit of the most recent array allocation), the dict is resized (compacted) left-to-right so that order is preserved. Here are some of the relevant sections from the 3.6 source tree: Objects/dictobject.c line 89: Preserving insertion order It's simple for combined table. Since dk_entries is mostly append only, we can get insertion order by just iterating dk_entries. One exception is .popitem(). It removes last item in dk_entries and decrement dk_nentries to achieve amortized O(1). Since there are DKIX_DUMMY remains in dk_indices, we can't increment dk_usable even though dk_nentries is decremented. In split table, inserting into pending entry is allowed only for dk_entries[ix] where ix == mp->ma_used. Inserting into other index and deleting item cause converting the dict to the combined table. 
Objects/dictobject.c::insertdict() line 1140: if (mp->ma_keys->dk_usable <= 0) { /* Need to resize. */ if (insertion_resize(mp) < 0) { Py_DECREF(value); return -1; } hashpos = find_empty_slot(mp->ma_keys, key, hash); } Objects/dictobject.c::dictresize() line 1282: PyDictKeyEntry *ep = oldentries; for (Py_ssize_t i = 0; i < numentries; i++) { while (ep->me_value == NULL) ep++; newentries[i] = *ep++; } > > I don't know what guidance to give Eric, because I don't know what other implementations do nor whether Eric cares about being compatible with those. IIUC micropython does not guarantee this currently, but I don't know if they claim Python 3.6 compatibility -- in fact I can't find any document that specifies the Python version they're compatible with more precisely than "Python 3". I did a little research and here's what I found: "MicroPython aims to implement the Python 3.4 standard (with selected features from later versions)" -- http://docs.micropython.org/en/latest/pyboard/reference/index.html "PyPy is a fast, compliant alternative implementation of the Python language (2.7.13 and 3.5.3)." -- http://pypy.org/ "Jython 2.7.0 Final Released (May 2015)" -- http://www.jython.org/ "IronPython 2.7.7 released on 2016-12-07" -- http://ironpython.net/ So, it looks like you could say 3.6 does whatever CPython 3.6 already does and not worry about leaving other implementations behind. (And PyPy is actually ahead of us here, having compact and order-preserving dicts for quite a while). Cheers, Raymond From guido at python.org Fri Dec 15 19:51:59 2017 From: guido at python.org (Guido van Rossum) Date: Fri, 15 Dec 2017 16:51:59 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: <628E027D-7D8D-46F4-B4F9-778F7648FFB3@gmail.com> References: <20171104173013.GA4005@bytereef.org> <628E027D-7D8D-46F4-B4F9-778F7648FFB3@gmail.com> Message-ID: Cool, thanks! I'm curious why this was brought up at all then... 
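Tim's summary earlier in the thread — iteration runs oldest to newest, while .popitem() removes the youngest entry, like list.pop() — is easy to check against both dict and OrderedDict (a small sketch; the dict behavior shown is the CPython 3.6 behavior being blessed here):

```python
from collections import OrderedDict

d = {"a": 1, "b": 2}
d["c"] = 3
assert list(d) == ["a", "b", "c"]          # iteration: oldest to newest
assert d.popitem() == ("c", 3)             # popitem(): youngest first

od = OrderedDict(a=1, b=2, c=3)
assert od.popitem(last=False) == ("a", 1)  # the dict.popitem(last=False) Nathaniel asks for
```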
On Dec 15, 2017 3:36 PM, "Raymond Hettinger" wrote: > > > > On Dec 15, 2017, at 1:47 PM, Guido van Rossum wrote: > > > > On Fri, Dec 15, 2017 at 12:45 PM, Raymond Hettinger < > raymond.hettinger at gmail.com> wrote: > > > > > On Dec 15, 2017, at 7:53 AM, Guido van Rossum > wrote: > > > > > > Make it so. "Dict keeps insertion order" is the ruling. > > > > On Twitter, someone raised an interesting question. > > > > Is the guarantee just for 3.7 and later? Or will the blessing also > cover 3.6 where it is already true. > > > > The 3.6 guidance is to use OrderedDict() when ordering is required. As > of now, that guidance seems superfluous and may no longer be a sensible > practice. For example, it would be nice for Eric Smith when he does his > 3.6 dataclasses backport to not have to put OrderedDict back in the code. > > > > For 3.6 we can't change the language specs, we can just document how it > works in CPython. I don't know what other Python implementations do in > their version that's supposed to be compatible with 3.6 but I don't want to > retroactively declare them non-conforming. (However for 3.7 they have to > follow suit.) I also don't think that the "it stays ordered across > deletions" part of the ruling is true in CPython 3.6. > > FWIW, the regular dict does stay ordered across deletions in CPython3.6: > > >>> d = dict(a=1, b=2, c=3, d=4) > >>> del d['b'] > >>> d['b'] = 5 > >>> d > {'a': 1, 'c': 3, 'd': 4, 'b': 5} > > Here's are more interesting demonstration: > > from random import randrange, shuffle > from collections import OrderedDict > > population = 1000000 > s = list(range(population // 4)) > shuffle(s) > d = dict.fromkeys(s) > od = OrderedDict.fromkeys(s) > for i in range(500000): > k = randrange(population) > d[k] = i > od[k] = i > k = randrange(population) > if k in d: > del d[k] > del od[k] > assert list(d.items()) == list(od.items()) > > The dict object insertion logic just appends to the arrays of keys, > values, and hashvalues. 
When the number of usable elements decreases to > zero (reaching the limit of the most recent array allocation), the dict is > resized (compacted) left-to-right so that order is preserved. > > Here are some of the relevant sections from the 3.6 source tree: > > Objects/dictobject.c line 89: > > Preserving insertion order > > It's simple for combined table. Since dk_entries is mostly append > only, we can > get insertion order by just iterating dk_entries. > > One exception is .popitem(). It removes last item in dk_entries and > decrement > dk_nentries to achieve amortized O(1). Since there are DKIX_DUMMY > remains in > dk_indices, we can't increment dk_usable even though dk_nentries is > decremented. > > In split table, inserting into pending entry is allowed only for > dk_entries[ix] > where ix == mp->ma_used. Inserting into other index and deleting item > cause > converting the dict to the combined table. > > Objects/dictobject.c::insertdict() line 1140: > > if (mp->ma_keys->dk_usable <= 0) { > /* Need to resize. */ > if (insertion_resize(mp) < 0) { > Py_DECREF(value); > return -1; > } > hashpos = find_empty_slot(mp->ma_keys, key, hash); > } > > Objects/dictobject.c::dictresize() line 1282: > > PyDictKeyEntry *ep = oldentries; > for (Py_ssize_t i = 0; i < numentries; i++) { > while (ep->me_value == NULL) > ep++; > newentries[i] = *ep++; > } > > > > > I don't know what guidance to give Eric, because I don't know what other > implementations do nor whether Eric cares about being compatible with > those. IIUC micropython does not guarantee this currently, but I don't know > if they claim Python 3.6 compatibility -- in fact I can't find any document > that specifies the Python version they're compatible with more precisely > than "Python 3". 
> > > I did a little research and here' what I found: > > "MicroPython aims to implement the Python 3.4 standard (with selected > features from later versions)" > -- http://docs.micropython.org/en/latest/pyboard/reference/index.html > > "PyPy is a fast, compliant alternative implementation of the Python > language (2.7.13 and 3.5.3)." > -- http://pypy.org/ > > "Jython 2.7.0 Final Released (May 2015)" > -- http://www.jython.org/ > > "IronPython 2.7.7 released on 2016-12-07" > -- http://ironpython.net/ > > So, it looks like your could say 3.6 does whatever CPython 3.6 already > does and not worry about leaving other implementations behind. (And PyPy > is actually ahead of us here, having compact and order-preserving dicts for > quite a while). > > Cheers, > > > Raymond > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wagherculano at hotmail.com Fri Dec 15 20:08:49 2017 From: wagherculano at hotmail.com (Wagner Herculano) Date: Sat, 16 Dec 2017 01:08:49 +0000 Subject: [Python-Dev] f-strings In-Reply-To: <7809465429117446362@unknownmsgid> References: , <7809465429117446362@unknownmsgid> Message-ID: Hi, Thank you all for your time. In fact I'm really a newbie and I need to study so hard (including English) to become someone like you all. I don't know if it is possible, but I need to try. Once again, thank you all for your time, and I'm sorry if I'm bothering both of you. Kind regards, Wagner Herculano. ________________________________ From: Chris Barker - NOAA Federal Sent: Friday, December 15, 2017 2:49:08 PM To: Mariatta Wijaya Cc: Wagner Herculano; Python Dev Subject: Re: [Python-Dev] f-strings That's covered under "format specifiers" I think. The PEP mentions this: https://www.python.org/dev/peps/pep-0498/#format-specifiers I can see how a newbie might not realize that this means f-strings use the same formatting language as the .format() method, or where to find documentation for it.
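That shared formatting language is easy to demonstrate with a quick sketch building on Wagner's example (the values and widths here are illustrative only):

```python
n, i = 5, 10
# The text after the ':' in an f-string replacement field is the same
# format-spec mini-language used by str.format(); '>2' right-aligns
# the value in a field of width 2.
assert f'{i:>2}' == '{:>2}'.format(i) == '10'
assert f'{n} x {i:>2} = {n*i:>3}' == '5 x 10 =  50'
```

The same spec string works in either spelling, which is why the docs for one can simply link to the other.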
So somewhere in the docs making that really clear, with a link to the formatting spec documentation, would be good. Not sure where though -- a PEP is not designed to be user documentation. -CHB That specific example is not mentioned in the docs, but there are other examples of using format specifiers with f-strings. https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals On Dec 15, 2017 7:39 AM, "Wagner Herculano" > wrote: Good evening, I'm Wagner Herculano from Brazil. I was trying to do a table exercise with the number 5 and tried formatting spaces and did not find it in the PEP 498 documentation. Finally I found a way; if possible, please include this example in the documentation. Below is my script with the desired formatting for the table of 5. n = 5 for i in range(1,11): print(f'{n} x {i:>2} = {n*i:>2}') Result 5 x 1 = 5 5 x 2 = 10 5 x 3 = 15 5 x 4 = 20 5 x 5 = 25 5 x 6 = 30 5 x 7 = 35 5 x 8 = 40 5 x 9 = 45 5 x 10 = 50 ----------- Sorry for my English, I needed to use Google Translate Best Regards, Wagner Herculano _______________________________________________ Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/mariatta.wijaya%40gmail.com _______________________________________________ Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sat Dec 16 08:22:57 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 16 Dec 2017 14:22:57 +0100 Subject: [Python-Dev] Usefulness of binary compatibility accross Python versions?
Message-ID: <20171216142257.2a0c978c@fsol> Hello, Nowadays we have an official mechanism for third-party C extensions to be binary-compatible across feature releases of Python: the stable ABI. But, for non-stable ABI-using C extensions, there are also mechanisms in place to *try* and ensure binary compatibility. One of them is the way in which we add tp_ slots to the PyTypeObject structure. Typically, when adding a tp_XXX slot, you also need to add a Py_TPFLAGS_HAVE_XXX type flag to signal those static type structures that have been compiled against a recent enough PyTypeObject definition. This way, extensions compiled against Python N-1 are supposed to "still work": as they don't have Py_TPFLAGS_HAVE_XXX set, the core Python runtime won't try to access the (non-existing) tp_XXX member. However, besides internal code complication, it means you need to add a new Py_TPFLAGS_HAVE_XXX each time we add a slot. Since we have only 32 such bits available (many of them already taken), it is a very limited resource. Is it worth it? (*) Can an extension compiled against Python N-1 really claim to be compatible with Python N, despite other possible differences? (*) we can't extend the tp_flags field to 64 bits, precisely because of the binary compatibility problem... Regards Antoine. From guido at python.org Sat Dec 16 12:34:02 2017 From: guido at python.org (Guido van Rossum) Date: Sat, 16 Dec 2017 09:34:02 -0800 Subject: [Python-Dev] Usefulness of binary compatibility accross Python versions? In-Reply-To: <20171216142257.2a0c978c@fsol> References: <20171216142257.2a0c978c@fsol> Message-ID: I think it's more acceptable to require matching versions now than it was 10 years ago -- people are much more likely to use installer tools like pip and conda that can check version compatibility.
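For readers following along, the flag-gating idea Antoine describes can be modeled in a few lines of Python. This is an illustrative sketch only -- the flag name and the dict-based "type" are stand-ins, not the actual CPython C code:

```python
# Simplified model: a "type" carries a flags bitmask, and the runtime
# only dereferences a newer slot when the corresponding flag is set.
HAVE_NEW_SLOT = 1 << 0  # stand-in for a Py_TPFLAGS_HAVE_XXX bit

def call_new_slot(tp):
    # Types built against older headers never set the bit, so the
    # runtime never reads past the end of their (shorter) struct.
    if tp['flags'] & HAVE_NEW_SLOT:
        return tp['new_slot']()
    return 'slot absent, skipped'

old_type = {'flags': 0}  # "compiled" against Python N-1: no flag, no slot
new_type = {'flags': HAVE_NEW_SLOT, 'new_slot': lambda: 'called'}
assert call_new_slot(old_type) == 'slot absent, skipped'
assert call_new_slot(new_type) == 'called'
```

The limited resource in the real mechanism is that each such bit permanently occupies one of the 32 positions in tp_flags.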
I think I'd be okay with dropping the flag-based mechanism you describe if we were to introduce a clear mechanism that always rejected a dynamically loaded module if it was compiled for a different Python version. This should happen without any cooperation from the module. Perhaps in Python.h we can introduce a reference to a variable whose name varies by version (major.minor, I think) and which is defined only by the interpreter itself. Or perhaps the version should be based on a separate ABI version. On Sat, Dec 16, 2017 at 5:22 AM, Antoine Pitrou wrote: > > Hello, > > Nowadays we have an official mechanism for third-party C extensions to > be binary-compatible accross feature releases of Python: the stable ABI. > > But, for non-stable ABI-using C extensions, there are also mechanisms > in place to *try* and ensure binary compatibility. One of them is the > way in which we add tp_ slots to the PyTypeObject structure. > > Typically, when adding a tp_XXX slot, you also need to add a > Py_TPFLAGS_HAVE_XXX type flag to signal those static type structures > that have been compiled against a recent enough PyTypeObject > definition. This way, extensions compiled against Python N-1 are > supposed to "still work": as they don't have Py_TPFLAGS_HAVE_XXX set, > the core Python runtime won't try to access the (non-existing) tp_XXX > member. > > However, beside internal code complication, it means you need to add a > new Py_TPFLAGS_HAVE_XXX each time we add a slot. Since we have only 32 > such bits available (many of them already taken), it is a very limited > resource. Is it worth it? (*) Can an extension compiled against Python > N-1 really claim to be compatible with Python N, despite other possible > differences? > > (*) we can't extend the tp_flags field to 64 bits, precisely because of > the binary compatibility problem... > > Regards > > Antoine. 
> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sat Dec 16 13:37:54 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 16 Dec 2017 19:37:54 +0100 Subject: [Python-Dev] Usefulness of binary compatibility accross Python versions? In-Reply-To: References: <20171216142257.2a0c978c@fsol> Message-ID: <20171216193754.501aa370@fsol> On Sat, 16 Dec 2017 09:34:02 -0800 Guido van Rossum wrote: > I think it's more acceptable to require matching versions now than it was > 10 years ago -- people are much more likely to use installer tools like pip > and conda that can check version compatibility. > > I think I'd be okay with dropping the flag-based mechanism you describe if > we were to introduce a clear mechanism that always rejected a dynamically > loaded module if it was compiled for a different Python version. This > should happen without any cooperation from the module. Perhaps in Python.h > we can introduce a reference to a variable whose name varies by version > (major.minor, I think) and which is defined only by the interpreter itself. > Or perhaps the version should be based on a separate ABI version. Interestingly, Python 2 had such an API version check (though it would only emit a warning); it was removed as part of PEP 3121: https://github.com/python/cpython/commit/1a21451b1d73b65af949193208372e86bf308411#diff-4664e4ea04dc636b18070ba01cf42d06L39 I haven't been able to find the pre-approval discussion around PEP 3121, so I'm not sure why the API check was removed. The PEP (quite short!) also says nothing about it.
Currently, you can pass a `module_api_version` to PyModule_Create2(), but that function is for specialists only :-) ("""Most uses of this function should be using PyModule_Create() instead; only use this if you are sure you need it.""") And the new multi-phase initialization API doesn't seem to support passing an API version: https://docs.python.org/3/c-api/module.html#multi-phase-initialization Fortunately, nowadays all major platforms (Windows, Linux, macOS) tag C extension filenames with the interpreter version (*), e.g.: - "XXX.cpython-35m-darwin.so" on macOS - "XXX.cp35-win32.pyd" on Windows - "XXX.cpython-37dm-x86_64-linux-gnu.so" on Linux and others So in practice there should be little potential for confusion, except when renaming the extension file? (*) references: https://bugs.python.org/issue22980 https://docs.python.org/3.7/whatsnew/3.5.html#build-and-c-api-changes Regards Antoine. From solipsis at pitrou.net Sat Dec 16 14:14:06 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 16 Dec 2017 20:14:06 +0100 Subject: [Python-Dev] Usefulness of binary compatibility accross Python versions? References: <20171216142257.2a0c978c@fsol> <20171216193754.501aa370@fsol> Message-ID: <20171216201406.366788d2@fsol> On Sat, 16 Dec 2017 19:37:54 +0100 Antoine Pitrou wrote: > > Currently, you can pass a `module_api_version` to PyModule_Create2(), > but that function is for specialists only :-) > > ("""Most uses of this function should be using PyModule_Create() > instead; only use this if you are sure you need it.""") Ah, it turns out I misunderstood that piece of documentation and also what PEP 3121 really did w.r.t the module API check. PyModule_Create() is actually a *macro* calling PyModule_Create2() with the version number it was compiled against!
#ifdef Py_LIMITED_API #define PyModule_Create(module) \ PyModule_Create2(module, PYTHON_ABI_VERSION) #else #define PyModule_Create(module) \ PyModule_Create2(module, PYTHON_API_VERSION) #endif And there's already a check for that version number in moduleobject.c: https://github.com/python/cpython/blob/master/Objects/moduleobject.c#L114 That check is always invoked when calling PyModule_Create() and PyModule_Create2(). Currently it merely issues a warning, but we can easily turn that into an error. (with apologies to Martin von Löwis for not fully understanding what he did at the time :-)) Regards Antoine. From guido at python.org Sat Dec 16 14:42:15 2017 From: guido at python.org (Guido van Rossum) Date: Sat, 16 Dec 2017 11:42:15 -0800 Subject: [Python-Dev] Usefulness of binary compatibility accross Python versions? In-Reply-To: <20171216201406.366788d2@fsol> References: <20171216142257.2a0c978c@fsol> <20171216193754.501aa370@fsol> <20171216201406.366788d2@fsol> Message-ID: On Sat, Dec 16, 2017 at 11:14 AM, Antoine Pitrou wrote: > On Sat, 16 Dec 2017 19:37:54 +0100 > Antoine Pitrou wrote: > > > > Currently, you can pass a `module_api_version` to PyModule_Create2(), > > but that function is for specialists only :-) > > > > ("""Most uses of this function should be using PyModule_Create() > > instead; only use this if you are sure you need it.""") > > Ah, it turns out I misunderstood that piece of documentation and also > what PEP 3121 really did w.r.t the module API check. > > PyModule_Create() is actually a *macro* calling PyModule_Create2() with > the version number is was compiled against!
> > #ifdef Py_LIMITED_API > #define PyModule_Create(module) \ > PyModule_Create2(module, PYTHON_ABI_VERSION) > #else > #define PyModule_Create(module) \ > PyModule_Create2(module, PYTHON_API_VERSION) > #endif > > And there's already a check for that version number in moduleobject.c: > https://github.com/python/cpython/blob/master/Objects/moduleobject.c#L114 > > That check is always invoked when calling PyModule_Create() and > PyModule_Create2(). Currently it merely invokes a warning, but we can > easily turn that into an error. > > (with apologies to Martin von L?wis for not fully understanding what he > did at the time :-)) > If it's only a warning, I worry that if we stop checking the flag bits it can cause wild pointer following. This sounds like it would be a potential security issue (load a module, ignore the warning, try to use a certain API on a class it defines, boom). Also, could there still be 3rd party modules out there that haven't been recompiled in a really long time and use some older backwards compatible module initialization API? (I guess we could stop supporting that and let them fail hard.) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sat Dec 16 18:49:32 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 17 Dec 2017 00:49:32 +0100 Subject: [Python-Dev] Usefulness of binary compatibility accross Python versions? In-Reply-To: References: <20171216142257.2a0c978c@fsol> <20171216193754.501aa370@fsol> <20171216201406.366788d2@fsol> Message-ID: <20171217004932.08386178@fsol> On Sat, 16 Dec 2017 11:42:15 -0800 Guido van Rossum wrote: > > If it's only a warning, I worry that if we stop checking the flag bits it > can cause wild pointer following. This sounds like it would be a potential > security issue (load a module, ignore the warning, try to use a certain API > on a class it defines, boom). 
Also, could there still be 3rd party modules > out there that haven't been recompiled in a really long time and use some > older backwards compatible module initialization API? (I guess we could > stop supporting that and let them fail hard.) As far as I can tell, all the legacy APIs were removed when PEP 3121 was implemented (Python 3 allowed us to do a clean break here). Regards Antoine. From guido at python.org Sat Dec 16 18:57:23 2017 From: guido at python.org (Guido van Rossum) Date: Sat, 16 Dec 2017 15:57:23 -0800 Subject: [Python-Dev] Usefulness of binary compatibility accross Python versions? In-Reply-To: <20171217004932.08386178@fsol> References: <20171216142257.2a0c978c@fsol> <20171216193754.501aa370@fsol> <20171216201406.366788d2@fsol> <20171217004932.08386178@fsol> Message-ID: Well then, maybe you can propose some specific set of changes? (I'm about to go on vacation and I'd like to focus on other things for the next two weeks though, so don't count on me too much.) On Sat, Dec 16, 2017 at 3:49 PM, Antoine Pitrou wrote: > On Sat, 16 Dec 2017 11:42:15 -0800 > Guido van Rossum wrote: > > > > If it's only a warning, I worry that if we stop checking the flag bits it > > can cause wild pointer following. This sounds like it would be a > potential > > security issue (load a module, ignore the warning, try to use a certain > API > > on a class it defines, boom). Also, could there still be 3rd party > modules > > out there that haven't been recompiled in a really long time and use some > > older backwards compatible module initialization API? (I guess we could > > stop supporting that and let them fail hard.) > > As far as I can tell, all the legacy APIs were removed when PEP 3121 > was implemented (Python 3 allowed us to do a clean break here). > > Regards > > Antoine. 
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From shangdahao at gmail.com Sun Dec 17 01:04:52 2017 From: shangdahao at gmail.com (=?UTF-8?B?5bCa6L6J?=) Date: Sun, 17 Dec 2017 14:04:52 +0800 Subject: [Python-Dev] Decision of having a deprecation period or not for changing csv.DictReader returning type. Message-ID: Hi, guys In https://github.com/python/cpython/pull/4904, I made csv.DictReader return a regular dict instead of an OrderedDict. But this change could break existing code that relies on methods like move_to_end() which are present in OrderedDict() but not in dict(). As rhettinger suggested, such code is either unlikely or rare, so it would be net less disruptive for users to just go forward with this patch. So, would we just go forward with this patch, or have a deprecation period first? Any help is much appreciated. -- Best Regards . shangdahao -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sun Dec 17 05:29:34 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 17 Dec 2017 11:29:34 +0100 Subject: [Python-Dev] Decision of having a deprecation period or not for changing csv.DictReader returning type. References: Message-ID: <20171217112934.49dc8bf1@fsol> On Sun, 17 Dec 2017 14:04:52 +0800 尚辉 (shangdahao) wrote: > Hi, guys > > In https://github.com/python/cpython/pull/4904, I made csv.DictReader > returning regular dict instead of OrderedDict. But this code could break > existing code that relied on methods like move_to_end() which are present > in OrderedDict() but not in dict(). What is the motivation to return a regular dict? Is it actually faster on some benchmark?
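For context, the behavioral difference at stake can be shown in a few lines. This is a sketch; the row type returned by csv.DictReader depends on the Python version (OrderedDict in 3.6/3.7, plain dict under the proposed change), but column order is preserved either way -- what disappears are the OrderedDict-only methods:

```python
import csv
import io
from collections import OrderedDict

# First data row of a tiny in-memory CSV file.
row = next(csv.DictReader(io.StringIO("a,b\n1,2\n")))
# Both plain dict and OrderedDict preserve column order here.
assert list(row.items()) == [('a', '1'), ('b', '2')]

# What breaks if the row becomes a plain dict: OrderedDict-only API.
od = OrderedDict(row)
od.move_to_end('a')
assert list(od) == ['b', 'a']
assert not hasattr(dict(row), 'move_to_end')
```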
PS: apparently the decision to return an OrderedDict was made in... Python 3.6. Regards Antoine. From listes at salort.eu Sun Dec 17 05:11:35 2017 From: listes at salort.eu (Julien Salort) Date: Sun, 17 Dec 2017 11:11:35 +0100 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> Message-ID: <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> Le 15/12/2017 à 22:14, Paul Moore a écrit : > Annotations and the annotation syntax are fundamental to the design. > But that's core Python syntax. But I wouldn't describe types as being > that significant to the design, it's more "if you supply them we'll > make use of them". Naive question from a lurker: does it mean that it also works if one annotates with something that is not a type, e.g. a comment: @dataclass class C: a: "This represents the amplitude" = 0.0 b: "This is an offset" = 0.0 From guido at python.org Sun Dec 17 10:46:45 2017 From: guido at python.org (Guido van Rossum) Date: Sun, 17 Dec 2017 07:46:45 -0800 Subject: [Python-Dev] Decision of having a deprecation period or not for changing csv.DictReader returning type. In-Reply-To: References: Message-ID: My gut tells me not to do this (neither here nor in other similar cases). I doubt there's much of a performance benefit anyway. On Sat, Dec 16, 2017 at 10:04 PM, 尚辉 (shangdahao) wrote: > Hi, guys > > In https://github.com/python/cpython/pull/4904, I made csv.DictReader > returning regular dict instead of OrderedDict. But this code could break > existing code that relied on methods like move_to_end() which are present > in OrderedDict() but not in dict(). > > As rhettinger suggested, such code is either unlikely or rare, so it would > be net less disruptive for users to just go forward with this patch.
> > So, would we just go forward with this patch or having a deprecation > period before this patch? > > Any help is big thanks. > > -- > > > Best Regards . > > shangdahao > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > guido%40python.org > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Dec 17 11:22:53 2017 From: guido at python.org (Guido van Rossum) Date: Sun, 17 Dec 2017 08:22:53 -0800 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> Message-ID: On Sun, Dec 17, 2017 at 2:11 AM, Julien Salort wrote: > Le 15/12/2017 à 22:14, Paul Moore a écrit : > > Annotations and the annotation syntax are fundamental to the design. >> But that's core Python syntax. But I wouldn't describe types as being >> that significant to the design, it's more "if you supply them we'll >> make use of them". >> > Naive question from a lurker: does it mean that it works also if one > annotates with something that is not a type, e.g. a comment, > > @dataclass > class C: > a: "This represents the amplitude" = 0.0 > b: "This is an offset" = 0.0 I would personally not use the notation for this, but it is legal code. However static type checkers like mypy won't be happy with this. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed...
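Guido's point is easy to verify at runtime: the dataclass machinery records the annotation but never evaluates or checks it. A quick sketch, assuming Python 3.7+ (or the dataclasses backport):

```python
from dataclasses import dataclass, fields

@dataclass
class C:
    a: "This represents the amplitude" = 0.0
    b: "This is an offset" = 0.0

c = C(a=1.5)
assert (c.a, c.b) == (1.5, 0.0)  # works fine at runtime
# The annotation is stored verbatim as the field's "type":
assert fields(C)[0].type == "This represents the amplitude"
```

A static checker, by contrast, would reject the string as a type annotation.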
URL: From ben at bendarnell.com Sun Dec 17 10:38:49 2017 From: ben at bendarnell.com (Ben Darnell) Date: Sun, 17 Dec 2017 15:38:49 +0000 Subject: [Python-Dev] PEP 567 -- Context Variables In-Reply-To: References: Message-ID: On Tue, Dec 12, 2017 at 12:34 PM Yury Selivanov wrote: > Hi, > > This is a new proposal to implement context storage in Python. > > It's a successor of PEP 550 and builds on some of its API ideas and > datastructures. Contrary to PEP 550 though, this proposal only focuses > on adding new APIs and implementing support for it in asyncio. There > are no changes to the interpreter or to the behaviour of generator or > coroutine objects. > I like this proposal. Tornado has a more general implementation of a similar idea ( https://github.com/tornadoweb/tornado/blob/branch4.5/tornado/stack_context.py), but it also tried to solve the problem of exception handling of callback-based code so it had a significant performance cost (to interpose try/except blocks all over the place). Limiting the interface to coroutine-local variables should keep the performance impact minimal. If the contextvars package were published on pypi (and backported to older pythons), I'd deprecate Tornado's stack_context and use it instead (even if there's not an official backport, I'll probably move towards whatever interface is defined in this PEP if it is accepted). One caveat based on Tornado's experience with stack_context: There are times when the automatic propagation of contexts won't do the right thing (for example, a database client with a connection pool may end up hanging on to the context from the request that created the connection instead of picking up a new context for each query). Compatibility with this feature will require testing and possible fixes with many libraries in the asyncio ecosystem before it can be relied upon. 
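Ben's point about per-task isolation is the heart of the proposal. With the contextvars module as it eventually shipped in Python 3.7 (whose ContextVar API matches the PEP text quoted below, though the PEP's get_context() was later renamed copy_context()), the isolation looks like this:

```python
import asyncio
import contextvars

request_id = contextvars.ContextVar('request_id', default='none')

async def handler(rid):
    request_id.set(rid)       # change is confined to this task's context
    await asyncio.sleep(0)    # yield to the other task
    return request_id.get()

async def main():
    results = await asyncio.gather(handler('a'), handler('b'))
    assert results == ['a', 'b']        # no bleed between concurrent tasks
    assert request_id.get() == 'none'   # the outer context is untouched

asyncio.run(main())
```

Each asyncio.Task snapshots the current context when it is created, which is exactly why a pooled connection created under one request's context can hang on to it, as Ben warns.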
-Ben > > > PEP: 567 > Title: Context Variables > Version: $Revision$ > Last-Modified: $Date$ > Author: Yury Selivanov > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 12-Dec-2017 > Python-Version: 3.7 > Post-History: 12-Dec-2017 > > > Abstract > ======== > > This PEP proposes the new ``contextvars`` module and a set of new > CPython C APIs to support context variables. This concept is > similar to thread-local variables but, unlike TLS, it allows > correctly keeping track of values per asynchronous task, e.g. > ``asyncio.Task``. > > This proposal builds directly upon concepts originally introduced > in :pep:`550`. The key difference is that this PEP is only concerned > with solving the case for asynchronous tasks, and not generators. > There are no proposed modifications to any built-in types or to the > interpreter. > > > Rationale > ========= > > Thread-local variables are insufficient for asynchronous tasks which > execute concurrently in the same OS thread. Any context manager that > needs to save and restore a context value and uses > ``threading.local()``, will have its context values bleed to other > code unexpectedly when used in async/await code. > > A few examples where having a working context local storage for > asynchronous code is desired: > > * Context managers like decimal contexts and ``numpy.errstate``. > > * Request-related data, such as security tokens and request > data in web applications, language context for ``gettext`` etc. > > * Profiling, tracing, and logging in large code bases. > > > Introduction > ============ > > The PEP proposes a new mechanism for managing context variables. > The key classes involved in this mechanism are ``contextvars.Context`` > and ``contextvars.ContextVar``. The PEP also proposes some policies > for using the mechanism around asynchronous tasks. > > The proposed mechanism for accessing context variables uses the > ``ContextVar`` class. 
A module (such as decimal) that wishes to > store a context variable should: > > * declare a module-global variable holding a ``ContextVar`` to > serve as a "key"; > > * access the current value via the ``get()`` method on the > key variable; > > * modify the current value via the ``set()`` method on the > key variable. > > The notion of "current value" deserves special consideration: > different asynchronous tasks that exist and execute concurrently > may have different values. This idea is well-known from thread-local > storage but in this case the locality of the value is not always > necessarily to a thread. Instead, there is the notion of the > "current ``Context``" which is stored in thread-local storage, and > is accessed via ``contextvars.get_context()`` function. > Manipulation of the current ``Context`` is the responsibility of the > task framework, e.g. asyncio. > > A ``Context`` is conceptually a mapping, implemented using an > immutable dictionary. The ``ContextVar.get()`` method does a > lookup in the current ``Context`` with ``self`` as a key, raising a > ``LookupError`` or returning a default value specified in > the constructor. > > The ``ContextVar.set(value)`` method clones the current ``Context``, > assigns the ``value`` to it with ``self`` as a key, and sets the > new ``Context`` as a new current. Because ``Context`` uses an > immutable dictionary, cloning it is O(1). > > > Specification > ============= > > A new standard library module ``contextvars`` is added with the > following APIs: > > 1. ``get_context() -> Context`` function is used to get the current > ``Context`` object for the current OS thread. > > 2. ``ContextVar`` class to declare and access context variables. > > 3. ``Context`` class encapsulates context state. Every OS thread > stores a reference to its current ``Context`` instance. > It is not possible to control that reference manually. 
> Instead, the ``Context.run(callable, *args)`` method is used to run > Python code in another context. > > > contextvars.ContextVar > ---------------------- > > The ``ContextVar`` class has the following constructor signature: > ``ContextVar(name, *, default=no_default)``. The ``name`` parameter > is used only for introspection and debug purposes. The ``default`` > parameter is optional. Example:: > > # Declare a context variable 'var' with the default value 42. > var = ContextVar('var', default=42) > > ``ContextVar.get()`` returns the value for the context variable from the > current ``Context``:: > > # Get the value of `var`. > var.get() > > ``ContextVar.set(value) -> Token`` is used to set a new value for > the context variable in the current ``Context``:: > > # Set the variable 'var' to 1 in the current context. > var.set(1) > > ``contextvars.Token`` is an opaque object that should be used to > restore the ``ContextVar`` to its previous value, or remove it from > the context if it was not set before. The ``ContextVar.reset(Token)`` > is used for that:: > > old = var.set(1) > try: > ... > finally: > var.reset(old) > > The ``Token`` API exists to make the current proposal forward > compatible with :pep:`550`, in case there is demand to support > context variables in generators and asynchronous generators in the > future. > > ``ContextVar`` design allows for a fast implementation of > ``ContextVar.get()``, which is particularly important for modules > like ``decimal`` and ``numpy``. > > > contextvars.Context > ------------------- > > ``Context`` objects are mappings of ``ContextVar`` to values.
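The save/restore idiom above is runnable as-is against the stdlib contextvars module that Python 3.7 eventually shipped with this API:

```python
import contextvars

var = contextvars.ContextVar('var', default=42)
assert var.get() == 42        # nothing set yet: the default is returned

token = var.set(1)
assert var.get() == 1
var.reset(token)              # restore the state from before set()
assert var.get() == 42        # back to "not set", so the default again
```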
> > To get the current ``Context`` for the current OS thread, use the > ``contextvars.get_context()`` function:: > > ctx = contextvars.get_context() > > To run Python code in some ``Context``, use the ``Context.run()`` > method:: > > ctx.run(function) > > Any changes that ``function`` makes to context variables will > be contained in the ``ctx`` context:: > > var = ContextVar('var') > var.set('spam') > > def function(): > assert var.get() == 'spam' > > var.set('ham') > assert var.get() == 'ham' > > ctx = get_context() > ctx.run(function) > > assert var.get() == 'spam' > > Any changes to the context will be contained and persisted in the > ``Context`` object on which ``run()`` is called. > > ``Context`` objects implement the ``collections.abc.Mapping`` ABC. > This can be used to introspect context objects:: > > ctx = contextvars.get_context() > > # Print all context variables and their values in 'ctx': > print(ctx.items()) > > # Print the value of 'some_variable' in context 'ctx': > print(ctx[some_variable]) > > > asyncio > ------- > > ``asyncio`` uses ``Loop.call_soon()``, ``Loop.call_later()``, > and ``Loop.call_at()`` to schedule the asynchronous execution of a > function. ``asyncio.Task`` uses ``call_soon()`` to run the > wrapped coroutine. > > We modify ``Loop.call_{at,later,soon}`` to accept the new > optional *context* keyword-only argument, which defaults to > the current context:: > > def call_soon(self, callback, *args, context=None): > if context is None: > context = contextvars.get_context() > > # ... some time later > context.run(callback, *args) > > Tasks in asyncio need to maintain their own isolated context. > ``asyncio.Task`` is modified as follows:: > > class Task: > def __init__(self, coro): > ... > # Get the current context snapshot. > self._context = contextvars.get_context() > self._loop.call_soon(self._step, context=self._context) > > def _step(self, exc=None): > ... > # Every advance of the wrapped coroutine is done in > # the task's context. 
> self._loop.call_soon(self._step, context=self._context) > ... > > > CPython C API > ------------- > > TBD > > > Implementation > ============== > > This section explains high-level implementation details in > pseudo-code. Some optimizations are omitted to keep this section > short and clear. > > The internal immutable dictionary for ``Context`` is implemented > using Hash Array Mapped Tries (HAMT). They allow for O(log N) ``set`` > operation, and for O(1) ``get_context()`` function. For the purposes > of this section, we implement an immutable dictionary using > ``dict.copy()``:: > > class _ContextData: > > def __init__(self): > self.__mapping = dict() > > def get(self, key): > return self.__mapping[key] > > def set(self, key, value): > copy = _ContextData() > copy.__mapping = self.__mapping.copy() > copy.__mapping[key] = value > return copy > > def delete(self, key): > copy = _ContextData() > copy.__mapping = self.__mapping.copy() > del copy.__mapping[key] > return copy > > Every OS thread has a reference to the current ``_ContextData``. 
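The ``_ContextData`` pseudo-code above is nearly runnable as-is; a small self-contained sketch (using a plain ``_mapping`` attribute instead of the name-mangled ``__mapping``) of the copy-on-write behaviour that ``Context.run()`` relies on:

```python
class _ContextData:
    def __init__(self):
        self._mapping = {}

    def get(self, key):
        return self._mapping[key]

    def set(self, key, value):
        # Copy-on-write: build a new _ContextData, never mutate self.
        copy = _ContextData()
        copy._mapping = {**self._mapping, key: value}
        return copy

    def delete(self, key):
        copy = _ContextData()
        copy._mapping = {k: v for k, v in self._mapping.items() if k != key}
        return copy


d1 = _ContextData()
d2 = d1.set('var', 'spam')
d3 = d2.set('var', 'ham')

assert d2.get('var') == 'spam'       # unaffected by the later set()
assert d3.get('var') == 'ham'
assert 'var' not in d1._mapping      # the original is still empty
```

Because every mutation produces a fresh object, saving and restoring ``ts.context_data`` around ``Context.run()`` is enough to isolate changes.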
> ``PyThreadState`` is updated with a new ``context_data`` field that > points to a ``_ContextData`` object:: > > PyThreadState: > context_data : _ContextData > > ``contextvars.get_context()`` is implemented as follows:: > > def get_context(): > ts : PyThreadState = PyThreadState_Get() > > if ts.context_data is None: > ts.context_data = _ContextData() > > ctx = Context() > ctx.__data = ts.context_data > return ctx > > ``contextvars.Context`` is a wrapper around ``_ContextData``:: > > class Context(collections.abc.Mapping): > > def __init__(self): > self.__data = _ContextData() > > def run(self, callable, *args): > ts : PyThreadState = PyThreadState_Get() > saved_data : _ContextData = ts.context_data > > try: > ts.context_data = self.__data > callable(*args) > finally: > self.__data = ts.context_data > ts.context_data = saved_data > > # Mapping API methods are implemented by delegating > # `get()` and other Mapping calls to `self.__data`. > > ``contextvars.ContextVar`` interacts with > ``PyThreadState.context_data`` directly:: > > class ContextVar: > > def __init__(self, name, *, default=NO_DEFAULT): > self.__name = name > self.__default = default > > @property > def name(self): > return self.__name > > def get(self, default=NO_DEFAULT): > ts : PyThreadState = PyThreadState_Get() > data : _ContextData = ts.context_data > > try: > return data.get(self) > except KeyError: > pass > > if default is not NO_DEFAULT: > return default > > if self.__default is not NO_DEFAULT: > return self.__default > > raise LookupError > > def set(self, value): > ts : PyThreadState = PyThreadState_Get() > data : _ContextData = ts.context_data > > try: > old_value = data.get(self) > except KeyError: > old_value = NO_VALUE > > ts.context_data = data.set(self, value) > return Token(self, old_value) > > def reset(self, token): > if token.__used: > return > > ts : PyThreadState = PyThreadState_Get() > data : _ContextData = ts.context_data > > if token.__old_value is NO_VALUE: > ts.context_data = data.delete(token.__var) > else: > ts.context_data = data.set(token.__var, > 
token.__old_value) > > token.__used = True > > > class Token: > > def __init__(self, var, old_value): > self.__var = var > self.__old_value = old_value > self.__used = False > > > Backwards Compatibility > ======================= > > This proposal preserves 100% backwards compatibility. > > Libraries that use ``threading.local()`` to store context-related > values, currently work correctly only for synchronous code. Switching > them to use the proposed API will keep their behavior for synchronous > code unmodified, but will automatically enable support for > asynchronous code. > > > Appendix: HAMT Performance Analysis > =================================== > > .. figure:: pep-0550-hamt_vs_dict-v2.png > :align: center > :width: 100% > > Figure 1. Benchmark code can be found here: [1]_. > > The above chart demonstrates that: > > * HAMT displays near O(1) performance for all benchmarked > dictionary sizes. > > * ``dict.copy()`` becomes very slow around 100 items. > > .. figure:: pep-0550-lookup_hamt.png > :align: center > :width: 100% > > Figure 2. Benchmark code can be found here: [2]_. > > Figure 2 compares the lookup costs of ``dict`` versus a HAMT-based > immutable mapping. HAMT lookup time is 30-40% slower than Python dict > lookups on average, which is a very good result, considering that the > latter is very well optimized. > > The reference implementation of HAMT for CPython can be found here: > [3]_. > > > References > ========== > > .. [1] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd > > .. [2] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e > > .. [3] https://github.com/1st1/cpython/tree/hamt > > > Copyright > ========= > > This document has been placed in the public domain. > > > .. 
> Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/ben%40bendarnell.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sun Dec 17 11:54:04 2017 From: mertz at gnosis.cx (David Mertz) Date: Sun, 17 Dec 2017 08:54:04 -0800 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> Message-ID: On Sun, Dec 17, 2017 at 8:22 AM, Guido van Rossum wrote: > On Sun, Dec 17, 2017 at 2:11 AM, Julien Salort wrote: > >> Naive question from a lurker: does it mean that it works also if one >> annotates with something that is not a type, e.g. a comment, >> >> @dataclass >> class C: >> a: "This represents the amplitude" = 0.0 >> b: "This is an offset" = 0.0 > > > I would personally not use the notation for this, but it is legal code. > However static type checkers like mypy won't be happy with this. > Mypy definitely won't like that use of annotation, but documentation systems might. For example, in a hover tooltip in an IDE/editor, it's probably more helpful to see the descriptive message than "int" or "float" for the attribute. What about data that isn't built-in scalars? Does this look right to people (and will mypy be happy with it)? 
@dataclass class C: a:numpy.ndarray = numpy.random.random((3,3)) b:MyCustomClass = MyCustomClass("foo", 37.2, 1+2j) I don't think those look terrible, but I think this looks better: @dataclass class C: a:Infer = np.random.random((3,3)) b:Infer = MyCustomClass("foo", 37.2, 1+2j) Where the name 'Infer' (or some other spelling) was a name defined in the `dataclasses` module. In this case, I don't want to use `typing.Any` since I really do want "the type of thing the default value has." -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From songofacandy at gmail.com Sun Dec 17 13:24:05 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Mon, 18 Dec 2017 03:24:05 +0900 Subject: [Python-Dev] Decision of having a deprecation period or not for changing csv.DictReader returning type. In-Reply-To: References: Message-ID: On Mon, Dec 18, 2017 at 12:46 AM, Guido van Rossum wrote: > My gut suggests me not to do this (neither here nor in other similar cases). > I doubt there's much of a performance benefit anyway. OrderedDict uses about 2x more memory than dict. So it affects the memory usage of applications loading large CSV files with DictReader. While I think applications should use tuples when memory consumption matters, there is still a significant benefit. INADA Naoki From solipsis at pitrou.net Sun Dec 17 13:57:39 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 17 Dec 2017 19:57:39 +0100 Subject: [Python-Dev] Decision of having a deprecation period or not for changing csv.DictReader returning type. 
References: Message-ID: <20171217195739.020fe1cb@fsol> On Mon, 18 Dec 2017 03:24:05 +0900 INADA Naoki wrote: > On Mon, Dec 18, 2017 at 12:46 AM, Guido van Rossum wrote: > > My gut suggests me not to do this (neither here nor in other similar cases). > > I doubt there's much of a performance benefit anyway. > > OrderedDict uses 2x memory than dict. > So it affects memory usage of applications loading large CSV with DictReader. > > While I think application should use tuple when memory consumption is > matter, there is significant benefit. Or they should use a dict-of-lists instead of a list-of-dicts. Or they should simply switch to Pandas. Simply put, I doubt using DictReader is a sensible decision if you're writing performance-sensitive code. Regards Antoine. From yselivanov.ml at gmail.com Sun Dec 17 14:49:26 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sun, 17 Dec 2017 14:49:26 -0500 Subject: [Python-Dev] PEP 567 -- Context Variables In-Reply-To: References: Message-ID: Hi Ben, On Sun, Dec 17, 2017 at 10:38 AM, Ben Darnell wrote: > On Tue, Dec 12, 2017 at 12:34 PM Yury Selivanov > wrote: >> >> Hi, >> >> This is a new proposal to implement context storage in Python. >> >> It's a successor of PEP 550 and builds on some of its API ideas and >> datastructures. Contrary to PEP 550 though, this proposal only focuses >> on adding new APIs and implementing support for it in asyncio. There >> are no changes to the interpreter or to the behaviour of generator or >> coroutine objects. > > > I like this proposal. Tornado has a more general implementation of a similar > idea > (https://github.com/tornadoweb/tornado/blob/branch4.5/tornado/stack_context.py), > but it also tried to solve the problem of exception handling of > callback-based code so it had a significant performance cost (to interpose > try/except blocks all over the place). Limiting the interface to > coroutine-local variables should keep the performance impact minimal. Thank you, Ben! 
Yes, task local API of PEP 567 has no impact on generators/coroutines. Impact on asyncio should be well within 1-2% slowdown, visible only in microbenchmarks (and asyncio will be 3-6% faster in 3.7 at least due to some asyncio.Task C optimizations). [..] > One caveat based on Tornado's experience with stack_context: There are times > when the automatic propagation of contexts won't do the right thing (for > example, a database client with a connection pool may end up hanging on to > the context from the request that created the connection instead of picking > up a new context for each query). I can see two scenarios that could lead to that: 1. The connection pool explicitly captures the context with 'get_context()' at the point where it was created. It later schedules all of its code within the captured context with Context.run(). 2. The connection pool calls ContextVar.get() once and _caches_ it. Both (1) and (2) are anti-patterns. The documentation of asyncio and contextvars module will explain that users are supposed to simply call ContextVar.get() whenever they need to get a context value (e.g. there's no need to cache/persist it) and that they should not manage the context manually (just trust asyncio to do that for you). Thank you, Yury From njs at pobox.com Sun Dec 17 15:05:39 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 17 Dec 2017 12:05:39 -0800 Subject: [Python-Dev] Usefulness of binary compatibility accross Python versions? 
In-Reply-To: References: <20171216142257.2a0c978c@fsol> <20171216193754.501aa370@fsol> <20171216201406.366788d2@fsol> Message-ID: On Dec 16, 2017 11:44 AM, "Guido van Rossum" wrote: On Sat, Dec 16, 2017 at 11:14 AM, Antoine Pitrou wrote: > On Sat, 16 Dec 2017 19:37:54 +0100 > Antoine Pitrou wrote: > > > > Currently, you can pass a `module_api_version` to PyModule_Create2(), > > but that function is for specialists only :-) > > > > ("""Most uses of this function should be using PyModule_Create() > > instead; only use this if you are sure you need it.""") > > Ah, it turns out I misunderstood that piece of documentation and also > what PEP 3121 really did w.r.t. the module API check. > > PyModule_Create() is actually a *macro* calling PyModule_Create2() with > the version number it was compiled against! > > #ifdef Py_LIMITED_API > #define PyModule_Create(module) \ > PyModule_Create2(module, PYTHON_ABI_VERSION) > #else > #define PyModule_Create(module) \ > PyModule_Create2(module, PYTHON_API_VERSION) > #endif > > And there's already a check for that version number in moduleobject.c: > https://github.com/python/cpython/blob/master/Objects/moduleobject.c#L114 > > That check is always invoked when calling PyModule_Create() and > PyModule_Create2(). Currently it merely invokes a warning, but we can > easily turn that into an error. > > (with apologies to Martin von Löwis for not fully understanding what he > did at the time :-)) > If it's only a warning, I worry that if we stop checking the flag bits it can cause wild pointer following. This sounds like it would be a potential security issue (load a module, ignore the warning, try to use a certain API on a class it defines, boom). Also, could there still be 3rd party modules out there that haven't been recompiled in a really long time and use some older backwards compatible module initialization API? (I guess we could stop supporting that and let them fail hard.) 
I think there's a pretty simple way to avoid this kind of problem. Since PEP 3149 (Python 3.2), the import system has (IIUC) checked for: foo.cpython-XYm.so foo.abi3.so foo.so If we drop foo.so from this list, then we're pretty much guaranteed not to load anything into a python that it wasn't intended for. How disruptive would this be? AFAICT there hasn't been any standard way to build python extensions named like 'foo.so' since 3.2 was released, so we're talking about modules from 3.1 and earlier (or else people who are manually hacking around the compatibility checking system, who can presumably take care of themselves). We've at a minimum been issuing warnings about these modules for 5 versions now (based on Antoine's analysis above), and I'd be really surprised if a module built for 3.1 works on 3.7 anyway. So this change seems pretty reasonable to me. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Sun Dec 17 21:07:02 2017 From: random832 at fastmail.com (Random832) Date: Sun, 17 Dec 2017 21:07:02 -0500 Subject: [Python-Dev] Usefulness of binary compatibility accross Python versions? In-Reply-To: <20171216142257.2a0c978c@fsol> References: <20171216142257.2a0c978c@fsol> Message-ID: <1513562822.11518.1208219672.2A3C0470@webmail.messagingengine.com> On Sat, Dec 16, 2017, at 08:22, Antoine Pitrou wrote: > Typically, when adding a tp_XXX slot, you also need to add a > Py_TPFLAGS_HAVE_XXX type flag to signal those static type structures > that have been compiled against a recent enough PyTypeObject > definition. This way, extensions compiled against Python N-1 are > supposed to "still work": as they don't have Py_TPFLAGS_HAVE_XXX set, > the core Python runtime won't try to access the (non-existing) tp_XXX > member. Is there any practical case for having the flag off for one slot and on for another slot that's been added later? 
Could this be replaced (that is, a slot for such a thing added before it's too late) with a simple counter that goes up with each version, and any "unused" slot should have NULL or some other sentinel value? If it really is important to have the flags themselves, just add another set of flags - Py_TPFLAGS_HAVE_MORE_FLAGS. From shangdahao at gmail.com Mon Dec 18 01:30:25 2017 From: shangdahao at gmail.com (=?UTF-8?B?5bCa6L6J?=) Date: Mon, 18 Dec 2017 14:30:25 +0800 Subject: [Python-Dev] Decision of having a deprecation period or not for changing csv.DictReader returning type. In-Reply-To: References: Message-ID: Since regular dicts are ordered in 3.7, it might be cleaner to return a regular dict instead of OrderedDict? -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Mon Dec 18 05:23:31 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 18 Dec 2017 11:23:31 +0100 Subject: [Python-Dev] Usefulness of binary compatibility accross Python versions? References: <20171216142257.2a0c978c@fsol> <1513562822.11518.1208219672.2A3C0470@webmail.messagingengine.com> Message-ID: <20171218112331.2a08c2f7@fsol> On Sun, 17 Dec 2017 21:07:02 -0500 Random832 wrote: > > Is there any practical for of having the flag off for one slot and on > for another slot that's been added later? > > Could this be replaced (that is, a slot for such a thing added before > it's too late) with a simple counter that goes up with each version, and > any "unused" slot should have NULL or some other sentinel value? Any replacement here would break binary compatibility, which is what those flags are precisely meant to avoid. > If it > really is important to have the flags themselves, just add another set > of flags - Py_TPFLAGS_HAVE_MORE_FLAGS. Yes, we could... but it's more complication again. That said, if we decide to drop cross-version compatibility, we then are allowed to enlarge the tp_flags field to, say, 64 bits. Regards Antoine. 
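The flag-testing scheme discussed in this thread is visible even from pure Python: a type's ``tp_flags`` word is exposed as ``__flags__``, and individual feature bits are tested with a bitwise AND. A sketch (the bit value matches CPython's ``object.h``, but treat the constant here as illustrative):

```python
# Py_TPFLAGS_LONG_SUBCLASS is (1UL << 24) in CPython's object.h.
Py_TPFLAGS_LONG_SUBCLASS = 1 << 24

# bool subclasses int, so its type flags have the bit set; float's do not.
assert bool.__flags__ & Py_TPFLAGS_LONG_SUBCLASS
assert not (float.__flags__ & Py_TPFLAGS_LONG_SUBCLASS)
```

An extension compiled against an older ``PyTypeObject`` never sets a newer ``Py_TPFLAGS_HAVE_XXX`` bit, so the runtime skips the corresponding (non-existent) slot instead of reading past the end of the static type struct.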
From victor.stinner at gmail.com Mon Dec 18 09:55:17 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 18 Dec 2017 15:55:17 +0100 Subject: [Python-Dev] Broken svn lookup? Message-ID: Hi, I was looking at old issues. In https://bugs.python.org/issue8610#msg105805 I found the link: http://svn.python.org/view?view=revision&revision=81190 Sadly, the link fails with HTTP 404 Not Found :-( Is the service down? Is it no longer possible to browse the old Subversion repository? Victor From steve at holdenweb.com Mon Dec 18 10:23:12 2017 From: steve at holdenweb.com (Steve Holden) Date: Mon, 18 Dec 2017 15:23:12 +0000 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <20171104173013.GA4005@bytereef.org> Message-ID: On Fri, Dec 15, 2017 at 9:47 PM, Guido van Rossum wrote: > [...] > > stays ordered across deletions" part of the ruling is true in CPython 3.6. > > I don't know what guidance to give Eric, because I don't know what other > implementations do nor whether Eric cares about being compatible with > those. IIUC micropython does not guarantee this currently, but I don't know > if they claim Python 3.6 compatibility -- in fact I can't find any document > that specifies the Python version they're compatible with more precisely > than "Python 3". > They currently specify 3.4+. Specifically, https://github.com/micropython/micropython includes: """ MicroPython implements the entire Python 3.4 syntax (including exceptions, with, yield from, etc., and additionally async/await keywords from Python 3.5). The following core datatypes are provided: str (including basic Unicode support), bytes, bytearray, tuple, list, dict, set, frozenset, array.array, collections.namedtuple, classes and instances. Builtin modules include sys, time, and struct, etc. Select ports have support for _thread module (multithreading). *Note that only a subset of Python 3 functionality is implemented for the data types and modules*. 
""" Note the emphasis I added on the last sentence. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Dec 18 10:24:16 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 18 Dec 2017 07:24:16 -0800 Subject: [Python-Dev] Decision of having a deprecation period or not for changing csv.DictReader returning type. In-Reply-To: References: Message-ID: Let's not do this. Once it's documented as returning an OrderedDict there's an additional promise about the API, e.g. move_to_end() exists. On Sun, Dec 17, 2017 at 10:30 PM, ?? wrote: > Since regular dicts are ordered in 3.7, it might be cleaner to returning regular dict instead of OrderedDict? > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at holdenweb.com Mon Dec 18 10:29:54 2017 From: steve at holdenweb.com (Steve Holden) Date: Mon, 18 Dec 2017 15:29:54 +0000 Subject: [Python-Dev] Decision of having a deprecation period or not for changing csv.DictReader returning type. In-Reply-To: References: Message-ID: I submitted the 3.6 patch that Raymond committed. The purpose of the change was to allow access to the ordering of the columns. It doesn't use any of the OrderedDict-only methods, and I'd be very surprised if a reversion to using dict in 3.7 would cause any tests to fail. regards Steve Steve Holden On Mon, Dec 18, 2017 at 6:30 AM, ?? wrote: > Since regular dicts are ordered in 3.7, it might be cleaner to returning regular dict instead of OrderedDict? > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > steve%40holdenweb.com > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jcgoble3 at gmail.com Mon Dec 18 12:54:47 2017 From: jcgoble3 at gmail.com (Jonathan Goble) Date: Mon, 18 Dec 2017 17:54:47 +0000 Subject: [Python-Dev] Broken svn lookup? In-Reply-To: References: Message-ID: On Mon, Dec 18, 2017 at 9:57 AM Victor Stinner wrote: > Hi, > > I was looking at old issues. In > https://bugs.python.org/issue8610#msg105805 I found the link: > > http://svn.python.org/view?view=revision&revision=81190 > > Sadly, the link fails with HTTP 404 Not Found :-( > > Is the service down? It's no more possible to browse the old > Subversion repository? > I don't get a 404 response. I get a Firefox dialog box asking for my username and password, and on clicking "Cancel", I then get an HTTP 401 Unauthorized response. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Dec 18 12:54:29 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 18 Dec 2017 09:54:29 -0800 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> Message-ID: Good Bad or Neutral, this discussion makes my point: Using typing annotation as a necessary part of a standard library module is injecting typing into "ordinary" python in a new way. It is no longer going to appear to be completely optional, and only of concern to those that choose to use it (and mypy or similar). And I do think it is really bad UI to have something like: @dataclass class C: a: Int = 1 b: float = 1.0 be the recommended (and shown in all the examples, and really be almost the only way) to define a dataclass, when the type will in fact be completely ignored by the implementation. 
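That claim is easy to verify (a sketch against the ``dataclasses`` module as it shipped in Python 3.7; the annotations drive field discovery but are never checked at runtime):

```python
from dataclasses import dataclass

@dataclass
class C:
    a: int = 1
    b: float = 1.0

# Nothing enforces the annotated types when the class is used:
c = C(a="not an int")
assert c.a == "not an int"

# The annotations are simply recorded on the class:
assert C.__annotations__ == {'a': int, 'b': float}
```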
Newbies are going to be confused by this -- they really are. Anyway, clearly I personally don't think this is a very good idea, but I see that annotations are a natural and easy way to express the fields without adding any new syntax. But most importantly I don't think this should become standard without consideration of the impact and a deliberate decision to do so. A note: I don't know who everyone is that was engaged in the GitHub discussion working out the details, but at least a few core folks are very engaged in the introduction of type hinting to Python in general -- so I think a certain perspective may have been over-represented. Are there other options?? plain old: @dataclass class C: a = 1 b = 1.0 would work, though then there would be no way to express fields without defaults: @dataclass class C: a = 1 b = None almost -- but then, is there "no default" or is the default None? Would it be impossible to use the annotation syntax, but with the type optional: @dataclass class C: a : = 1 # field with default value b : # field with no default This is not legal python now, but are there barriers other than not wanting to make yet more changes to it being legal (i.e. hard/impossible to unambiguously parse, etc.)? Maybe this can all be addressed by more "Untyped" examples in the docs. -CHB On Sun, Dec 17, 2017 at 8:54 AM, David Mertz wrote: > On Sun, Dec 17, 2017 at 8:22 AM, Guido van Rossum > wrote: > >> On Sun, Dec 17, 2017 at 2:11 AM, Julien Salort wrote: >> >>> Naive question from a lurker: does it mean that it works also if one >>> annotates with something that is not a type, e.g. a comment, >>> >>> @dataclass >>> class C: >>> a: "This represents the amplitude" = 0.0 >>> b: "This is an offset" = 0.0 >> >> >> I would personally not use the notation for this, but it is legal code. >> However static type checkers like mypy won't be happy with this. >> > > Mypy definitely won't like that use of annotation, but documentation > systems might. 
For example, in a hover tooltip in an IDE/editor, it's > probably more helpful to see the descriptive message than "int" or "float" > for the attribute. > > What about data that isn't built-in scalars? Does this look right to > people (and will mypy be happy with it)? > > @dataclass > class C: > a:numpy.ndarray = numpy.random.random((3,3)) > b:MyCustomClass = MyCustomClass("foo", 37.2, 1+2j) > > I don't think those look terrible, but I think this looks better: > > @dataclass > class C: > a:Infer = np.random.random((3,3)) > b:Infer = MyCustomClass("foo", 37.2, 1+2j) > > Where the name 'Infer' (or some other spelling) was a name defined in the > `dataclasses` module. In this case, I don't want to use `typing.Any` since > I really do want "the type of thing the default value has." > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > chris.barker%40noaa.gov > > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Mon Dec 18 14:24:47 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Mon, 18 Dec 2017 20:24:47 +0100 Subject: [Python-Dev] Is static typing still optional? 
In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> Message-ID: @David What you propose as the `Infer` annotation was proposed some time ago (not only for dataclasses, there are other use cases). The discussion is here https://github.com/python/typing/issues/276 @Chris People are still allowed not to use dataclasses if they really don't like type hints :-) Seriously however, annotations are just syntax. In this sense PEP 526 is more like PEP 3107, and less like PEP 484. People are still free to write: @dataclass class C: x: "first coordinate" y: "second coordinate" plus: "I don't like types" or @dataclass class C: x: ... y: ... I don't see a big difference between hypothesis (a testing lib) using annotations for its own purposes and the situation with dataclasses. It is true that the syntax was chosen to simplify support in static type checkers (partially because users were often asking for such a feature), but not more than this. If you don't use type checkers, there is no problem in using one of the above forms. If you have ideas about how to improve the dataclass docs, this can be discussed in the issue https://bugs.python.org/issue32216 > ... the type will in fact be completely ignored by the implementation. > Newbies are going to be confused by this -- they really are. This is no different from def f(x: int): pass f("What") # OK which has existed since Python 3.0. Although I agree this is confusing, the way forward could be just explaining this better in the docs. 
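The "annotations are just syntax" point can be checked directly (a sketch, assuming Python 3.7+ with the ``dataclasses`` module):

```python
from dataclasses import dataclass

@dataclass
class C:
    x: "first coordinate" = 0.0
    y: "second coordinate" = 0.0

c = C(1.5, 2.5)
assert (c.x, c.y) == (1.5, 2.5)

# The "types" are stored verbatim; nothing interprets them at runtime:
assert C.__annotations__['x'] == "first coordinate"
```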
If you want my personal opinion about the current situation with type hints _in general_, then I can say that I have seen many cases where people use type hints where they are not needed (for example in 10 line scripts or in highly polymorphic functions), so I agree that some community style guidance (like PEP 8) may be helpful. I had started such a project at the end of last year (it was called pep-555, but I didn't have time to work on this and this number is already taken). -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Dec 18 14:38:50 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 19 Dec 2017 05:38:50 +1000 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> Message-ID: On 19 Dec. 2017 7:00 am, "Chris Barker" wrote: Are there other options?? plain old: @dataclass class C: a = 1 b = 1.0 would work, though then there would be no way to express fields without defaults: The PEP already supports using "a = field(); b = field()" (etc) to declare untyped fields without a default value. This annotation-free spelling may not be clearly covered in the current module docs, though. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Mon Dec 18 14:55:12 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Mon, 18 Dec 2017 20:55:12 +0100 Subject: [Python-Dev] Is static typing still optional? 
In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu>
Message-ID: 

On 18 December 2017 at 20:38, Nick Coghlan wrote:
>
> On 19 Dec. 2017 7:00 am, "Chris Barker" wrote:
>
> Are there other options?
>
> plain old:
>
> @dataclass
> class C:
>     a = 1
>     b = 1.0
>
> would work, though then there would be no way to express fields without
> defaults:
>
> The PEP already supports using "a = field(); b = field()" (etc) to declare
> untyped fields without a default value.
>

The PEP is not 100% clear on this, but it is currently not the case, and this may be intentional (one obvious way to do it). I just tried, and this does not work:

@dataclass
class C:
    x = field()

generates `__init__` etc. with no arguments. I think however that it is better to generate an error than to silently ignore it. (Or, if this is a bug in the implementation, it should just be fixed.)

--
Ivan

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From eric at trueblade.com  Mon Dec 18 15:33:00 2017
From: eric at trueblade.com (Eric V. Smith)
Date: Mon, 18 Dec 2017 15:33:00 -0500
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu>
Message-ID: <68b6a94a-c2b1-6bc1-4de1-8e063ab97d2c@trueblade.com>

On 12/18/2017 2:55 PM, Ivan Levkivskyi wrote:
> On 18 December 2017 at 20:38, Nick Coghlan wrote:
>
>> On 19 Dec. 2017 7:00 am, "Chris Barker" wrote:
>>
>> Are there other options?
>>
>> plain old:
>>
>> @dataclass
>> class C:
>>     a = 1
>>     b = 1.0
>>
>> would work, though then there would be no way to express fields
>> without defaults:
>>
>> The PEP already supports using "a = field(); b = field()" (etc) to
>> declare untyped fields without a default value.
>
> The PEP is not 100% clear on this, but it is currently not the case, and
> this may be intentional (one obvious way to do it).
> I just tried, and this does not work:
>
> @dataclass
> class C:
>     x = field()
>
> generates `__init__` etc. with no arguments. I think however that it is
> better to generate an error than to silently ignore it.
> (Or, if this is a bug in the implementation, it should just be fixed.)

Hmm, not sure why that doesn't generate an error. I think it's a bug that should be fixed. Or, we could make the same change we're making in make_dataclass(), where we'll use "typing.Any" (as a string) if the type is omitted. See https://bugs.python.org/issue32278.

From levkivskyi at gmail.com  Mon Dec 18 18:00:23 2017
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Tue, 19 Dec 2017 00:00:23 +0100
Subject: [Python-Dev] PEP 567 -- Context Variables
In-Reply-To: References: Message-ID: 

On 13 December 2017 at 22:35, Yury Selivanov wrote:
> [..]
> >> A new standard library module ``contextvars`` is added
> >
> > Why not add this to contextlib instead of adding a new module? IIRC
> > this was discussed relative to PEP 550, but I don't remember the
> > reason. Regardless, it would be worth mentioning somewhere in the
> > PEP.
>
> The mechanism is generic and isn't directly related to context
> managers. Context managers can (and in many cases should) use the new
> APIs to store global state, but the contextvars APIs do not depend on
> context managers or require them.
>

This was the main point of confusion for me when reading the PEP. The concept of TLS is independent of context managers, but using the word "context" everywhere leads to doubts like "Am I getting everything right?"
I think just adding the two quoted sentences will clarify the intent.

Otherwise the PEP is easy to read, the proposed API looks simple, and this definitely will be a useful feature.

--
Ivan

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brett at python.org  Mon Dec 18 16:55:35 2017
From: brett at python.org (Brett Cannon)
Date: Mon, 18 Dec 2017 21:55:35 +0000
Subject: [Python-Dev] Decision of having a deprecation period or not for changing csv.DictReader returning type.
In-Reply-To: References: Message-ID: 

On Mon, 18 Dec 2017 at 07:30 Steve Holden wrote:

> I submitted the 3.6 patch that Raymond committed. The purpose of the
> change was to allow access to the ordering of the columns.
>
> It doesn't use any of the OrderedDict-only methods, and I'd be very
> surprised if a reversion to using dict in 3.7 would cause any tests to fail.

But as Guido pointed out, the expected interface of what gets returned would be broken. I say just leave it as-is.

-Brett

> regards
> Steve
>
> Steve Holden
>
> On Mon, Dec 18, 2017 at 6:30 AM, ?? wrote:
>
>> Since regular dicts are ordered in 3.7, it might be cleaner to return a regular dict instead of OrderedDict?
>>
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev at python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe:
>> https://mail.python.org/mailman/options/python-dev/steve%40holdenweb.com
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/brett%40python.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From benjamin at python.org  Mon Dec 18 19:20:47 2017
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 18 Dec 2017 16:20:47 -0800
Subject: [Python-Dev] Broken svn lookup?
In-Reply-To: References: Message-ID: <1513642847.1986466.1209415816.430806DE@webmail.messagingengine.com>

I turned viewvc off a few months ago because subversion is highly deprecated at this point. In fact, now that Windows build dependencies have moved off, I'm probably going to shut it off for good soon.

On Mon, Dec 18, 2017, at 06:55, Victor Stinner wrote:
> Hi,
>
> I was looking at old issues. In
> https://bugs.python.org/issue8610#msg105805 I found the link:
>
> http://svn.python.org/view?view=revision&revision=81190
>
> Sadly, the link fails with HTTP 404 Not Found :-(
>
> Is the service down? Is it no longer possible to browse the old
> Subversion repository?
>
> Victor

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From victor.stinner at gmail.com  Mon Dec 18 19:29:05 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 19 Dec 2017 01:29:05 +0100
Subject: [Python-Dev] Broken svn lookup?
In-Reply-To: <1513642847.1986466.1209415816.430806DE@webmail.messagingengine.com>
References: <1513642847.1986466.1209415816.430806DE@webmail.messagingengine.com>
Message-ID: 

I don't need Subversion. I only need an online service redirecting me to the change. The bug tracker points to http://hg.python.org/lookup/r81190 which redirects to http://svn.python.org/view?view=revision&revision=81190

In the master branch, there is the Misc/svnmap.txt file which maps Subversion commits to Mercurial commits. Hum. Now we need Git commits :-)

Well, it's not really a blocker issue. It would just be "nice to have", but sadly I don't have the bandwidth to work on that :-(

Victor

2017-12-19 1:20 GMT+01:00 Benjamin Peterson :
> I turned viewvc off a few months ago because subversion is highly deprecated
> at this point. In fact, now that Windows build dependencies have moved off,
> I'm probably going to shut it off for good soon.
>
> On Mon, Dec 18, 2017, at 06:55, Victor Stinner wrote:
>> Hi,
>>
>> I was looking at old issues.
In
>> https://bugs.python.org/issue8610#msg105805 I found the link:
>>
>> http://svn.python.org/view?view=revision&revision=81190
>>
>> Sadly, the link fails with HTTP 404 Not Found :-(
>>
>> Is the service down? Is it no longer possible to browse the old
>> Subversion repository?
>>
>> Victor

From yselivanov.ml at gmail.com  Mon Dec 18 20:37:44 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Mon, 18 Dec 2017 20:37:44 -0500
Subject: [Python-Dev] PEP 567 -- Context Variables
In-Reply-To: References: Message-ID: 

> 3. The connection pool has a queue, and creates a task for each connection to serve requests from that queue. Naively, each task could inherit the context of the request that caused it to be created, but the task would outlive the request and go on to serve other requests. The connection pool would need to specifically suppress the caller's context when creating its worker tasks.

I haven't used this pattern myself, but it looks like a good case for adding a keyword-only 'context' argument to `loop.create_task()`. This way the pool can capture the context when some API method is called and pass it down to the queue along with the request. The queue task can then run connection code in that context.

Yury

From yselivanov.ml at gmail.com  Mon Dec 18 20:38:13 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Mon, 18 Dec 2017 20:38:13 -0500
Subject: [Python-Dev] PEP 567 -- Context Variables
In-Reply-To: References: Message-ID: 

On Mon, Dec 18, 2017 at 6:00 PM, Ivan Levkivskyi wrote:
> On 13 December 2017 at 22:35, Yury Selivanov wrote:
>>
>> [..]
>> >> A new standard library module ``contextvars`` is added
>> >
>> > Why not add this to contextlib instead of adding a new module? IIRC
>> > this was discussed relative to PEP 550, but I don't remember the
>> > reason. Regardless, it would be worth mentioning somewhere in the
>> > PEP.
>>
>> The mechanism is generic and isn't directly related to context
>> managers. Context managers can (and in many cases should) use the new
>> APIs to store global state, but the contextvars APIs do not depend on
>> context managers or require them.
>
> This was the main point of confusion for me when reading the PEP.
> The concept of TLS is independent of context managers, but using the word
> "context" everywhere leads to doubts like "Am I getting everything right?"
> I think just adding the two quoted sentences will clarify the intent.

I'll try to clarify this in the Abstract section.

> Otherwise the PEP is easy to read, the proposed API looks simple, and this
> definitely will be a useful feature.

Thanks, Ivan!

Yury

From ben at bendarnell.com  Mon Dec 18 21:21:12 2017
From: ben at bendarnell.com (Ben Darnell)
Date: Tue, 19 Dec 2017 01:21:12 +0000
Subject: [Python-Dev] PEP 567 -- Context Variables
In-Reply-To: References: Message-ID: 

On Sun, Dec 17, 2017 at 2:49 PM Yury Selivanov wrote:
> > One caveat based on Tornado's experience with stack_context: There are times
> > when the automatic propagation of contexts won't do the right thing (for
> > example, a database client with a connection pool may end up hanging on to
> > the context from the request that created the connection instead of picking
> > up a new context for each query).
>
> I can see two scenarios that could lead to that:
>
> 1. The connection pool explicitly captures the context with 'get_context()' at
> the point where it was created. It later schedules all of its code within the
> captured context with Context.run().
>
> 2. The connection pool calls ContextVar.get() once and _caches_ it.
>

3. The connection pool has a queue, and creates a task for each connection to serve requests from that queue. Naively, each task could inherit the context of the request that caused it to be created, but the task would outlive the request and go on to serve other requests.
The connection pool would need to specifically suppress the caller's context when creating its worker tasks. The situation was more complicated for Tornado since we were trying to support callback-based workflows as well. Limiting this to coroutines closes off a lot of the potential issues - most of the specific examples I can think of would not be possible in a coroutine-only world. -Ben -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Dec 18 21:11:05 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 18 Dec 2017 18:11:05 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <20171104173013.GA4005@bytereef.org> Message-ID: Now that dicts are order-preserving, maybe we should change prettyprint: In [7]: d = {'one':1, 'two':2, 'three':3} In [8]: print(d) {'one': 1, 'two': 2, 'three': 3} order preserved. In [9]: pprint.pprint(d) {'one': 1, 'three': 3, 'two': 2} order not preserved ( sorted, I presume? ) With arbitrary order, it made sense to sort, so as to always give the same "pretty" representation. But now that order is "part of" the dict itself, it seems prettyprint should present the preserved order of the dict. NOTE: I discovered this making examples for an intro to Python class -- I was updating the part where I teach that dicts do not preserve order. I was using iPython, which, unbeknownst to me, was using pprint under the hood, so got a different order depending on whether I simply displayed the dict (which used pprint) or called str() or repr() on it. Pretty confusing. Will changing pprint be considered a breaking change? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.barker at noaa.gov Mon Dec 18 21:41:44 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 18 Dec 2017 18:41:44 -0800 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> Message-ID: I'm really surprised no one seems to get my point here. TL;DR: My point is that having type annotation syntax required for something in the stdlib is a significant step toward "normalizing" type hinting in Python. Whether that's a good idea or not is a judgement call, but it IS a big step. @Chris > People are still allowed not to use dataclasses if they really don't like > type hints :-) > Seriously however, annotations are just syntax. In this sense PEP 526 is > more like PEP 3107, > and less like PEP 484. People are still free to write: > > @dataclass > class C: > x: "first coordinate" > y: "second coordinate" > plus: "I don't like types" > Well, yes, of course, but this is not like PEP 3107, as it introduces a requirement for annotations (maybe not *type* annotations per se) in the std lib. Again, that may be the best way to go -- but it should be done deliberately. @dataclass > class C: > x: ... > y: ... > Ah! I had no idea you could use ellipses to indicate no type. That actually helps a lot. We really should have that prominent in the docs. And in the dataclass docs, not just the type hinting docs -- again, people will want to use these that may not have any interest in nor prior knowledge of type hints. > I don't see so big difference between hypothesis (testing lib) using > annotations for their purposes > from the situation with dataclasses. > The big difference is that hypothesis is not in the standard library. 
Also, I didn't know about hypothesis until just now, but their very first example in the quick start does not use annotation syntax, so it's not as baked in as it is with dataclasses.

> If you have ideas about how to improve the dataclass docs, this can be
> discussed in the issue https://bugs.python.org/issue32216

I'll try to find time to contribute there -- though maybe better to have the doc draft in GitHub?

> ... the type will in fact be completely ignored by the implementation.
> > Newbies are going to be confused by this -- they really are.
>
> This is not different from
>
> def f(x: int):
>     pass
>
> f("What")  # OK
>
> that exists starting from Python 3.0. Although I agree this is confusing,
> the way forward could be just explaining this better in the docs.

Again the difference is that EVERY introduction to defining Python functions doesn't use the type hint. And even more to the point, you CAN define a function without any annotations.

But frankly, I think as type hinting becomes more common, we're going to see a lot of confusion :-(

> If you want my personal opinion about the current situation about type
> hints _in general_, then I can say that I have seen many cases where
> people use type hints where they are not needed (for example in 10 line
> scripts or in highly polymorphic functions), so I agree that some
> community style guidance (like PEP 8) may be helpful.

It's going to get worse before it gets better :-(

> @dataclass
> class C:
>     x = field()

that does require that `field` be imported, so not as nice. I kinda like the ellipses better. But it's good to have a way.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From barry at python.org  Mon Dec 18 22:01:33 2017
From: barry at python.org (Barry Warsaw)
Date: Mon, 18 Dec 2017 22:01:33 -0500
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu>
Message-ID: 

On Dec 18, 2017, at 21:41, Chris Barker wrote:
>
> TL;DR:
> My point is that having type annotation syntax required for something in the stdlib is a significant step toward "normalizing" type hinting in Python. Whether that's a good idea or not is a judgement call, but it IS a big step.

This is something we're discussing for importlib.resources: https://bugs.python.org/issue32248#msg308495

In the standalone version, we're using annotations for the Python 3 bits. It would make our lives easier if we kept them for the stdlib version (applying diffs and keeping them in sync would be easier). Brett says in the follow up: "As for the type hints, I thought it was lifted such that new code could include it but we wouldn't be taking PRs to add them to pre-existing code?"

So, what's the deal?

-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: Message signed with OpenPGP
URL: 

From barry at python.org  Mon Dec 18 22:02:38 2017
From: barry at python.org (Barry Warsaw)
Date: Mon, 18 Dec 2017 22:02:38 -0500
Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To: References: <20171104173013.GA4005@bytereef.org>
Message-ID: <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org>

On Dec 18, 2017, at 21:11, Chris Barker wrote:

> Will changing pprint be considered a breaking change?

Yes, definitely.
-Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From njs at pobox.com Mon Dec 18 22:37:03 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 18 Dec 2017 19:37:03 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> References: <20171104173013.GA4005@bytereef.org> <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> Message-ID: On Mon, Dec 18, 2017 at 7:02 PM, Barry Warsaw wrote: > On Dec 18, 2017, at 21:11, Chris Barker wrote: > >> Will changing pprint be considered a breaking change? > > Yes, definitely. Wait, what? Why would changing pprint (so that it accurately reflects dict's new underlying semantics!) be a breaking change? Are you suggesting it shouldn't be changed in 3.7? -n -- Nathaniel J. Smith -- https://vorpus.org From ben at bendarnell.com Mon Dec 18 21:37:56 2017 From: ben at bendarnell.com (Ben Darnell) Date: Tue, 19 Dec 2017 02:37:56 +0000 Subject: [Python-Dev] PEP 567 -- Context Variables In-Reply-To: References: Message-ID: On Mon, Dec 18, 2017 at 8:37 PM Yury Selivanov wrote: > > 3. The connection pool has a queue, and creates a task for each > connection to serve requests from that queue. Naively, each task could > inherit the context of the request that caused it to be created, but the > task would outlive the request and go on to serve other requests. The > connection pool would need to specifically suppress the caller's context > when creating its worker tasks. > > I haven't used this pattern myself, but it looks like a good case for > adding a keyword-only 'context' rgument to `loop.create_task()`. This > way the pool can capture the context when some API method is called > and pass it down to the queue along with the request. The queue task > can then run connection code in that context. > > Yes, that would be useful. 
-Ben -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Dec 18 22:41:55 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 19 Dec 2017 14:41:55 +1100 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: Message-ID: <20171219034154.GE16230@ando.pearwood.info> On Mon, Dec 18, 2017 at 06:11:05PM -0800, Chris Barker wrote: > Now that dicts are order-preserving, maybe we should change prettyprint: > > In [7]: d = {'one':1, 'two':2, 'three':3} > > In [8]: print(d) > {'one': 1, 'two': 2, 'three': 3} > > order preserved. > > In [9]: pprint.pprint(d) > {'one': 1, 'three': 3, 'two': 2} > > order not preserved ( sorted, I presume? ) Indeed. pprint.PrettyPrinter has separate methods for OrderedDict and regular dicts, and the method for printing dicts calls sorted() while the other does not. > With arbitrary order, it made sense to sort, so as to always give the same > "pretty" representation. But now that order is "part of" the dict itself, > it seems prettyprint should present the preserved order of the dict. I disagree. Many uses of dicts are still conceptually unordered, even if the dict now preserves insertion order. For those use-cases, insertion order is of no interest whatsoever, and sorting is still "prettier". -- Steve From steve at pearwood.info Mon Dec 18 22:58:18 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 19 Dec 2017 14:58:18 +1100 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> Message-ID: <20171219035818.GF16230@ando.pearwood.info> On Mon, Dec 18, 2017 at 07:37:03PM -0800, Nathaniel Smith wrote: > On Mon, Dec 18, 2017 at 7:02 PM, Barry Warsaw wrote: > > On Dec 18, 2017, at 21:11, Chris Barker wrote: > > > >> Will changing pprint be considered a breaking change? > > > > Yes, definitely. > > Wait, what? 
Why would changing pprint (so that it accurately reflects > dict's new underlying semantics!) be a breaking change? I have a script which today prints data like so: {'Aaron': 62, 'Anne': 51, 'Bob': 23, 'George': 30, 'Karen': 45, 'Sue': 17, 'Sylvester': 34} Tomorrow, it will suddenly start printing: {'Bob': 23, 'Karen': 45, 'Sue': 17, 'George': 30, 'Aaron': 62, 'Anne': 51, 'Sylvester': 34} and my users will yell at me that my script is broken because the data is now in random order. Now, maybe that's my own damn fault for using pprint instead of writing my own pretty printer... but surely the point of pprint is so I don't have to write my own? Besides, the docs say very prominently: "Dictionaries are sorted by key before the display is computed." https://docs.python.org/3/library/pprint.html so I think I can be excused having relied on that feature. -- Steve From chris.barker at noaa.gov Mon Dec 18 23:28:52 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 18 Dec 2017 20:28:52 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: <20171219034154.GE16230@ando.pearwood.info> References: <20171219034154.GE16230@ando.pearwood.info> Message-ID: On Mon, Dec 18, 2017 at 7:41 PM, Steven D'Aprano wrote: > > With arbitrary order, it made sense to sort, so as to always give the > same > > "pretty" representation. But now that order is "part of" the dict itself, > > it seems prettyprint should present the preserved order of the dict. > > I disagree. Many uses of dicts are still conceptually unordered, even if > the dict now preserves insertion order. For those use-cases, insertion > order is of no interest whatsoever, and sorting is still "prettier". > and many uses of dicts have "sorted" order as completely irrelevant, and sorting them arbitrarily is not necessarily pretty (you can't provide a sort key can you? 
-- so yes, it's arbitrary) I'm not necessarily saying we should break things, but I won't agree that pprint sorting dicts is the "right" interface for what is actually an order-preserving mapping. I would think it was only the right choice in the first place in order (get it?) to get a consistent representation, not because sorting was a good thing per se. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Dec 18 23:49:54 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 18 Dec 2017 20:49:54 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: <20171219035818.GF16230@ando.pearwood.info> References: <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> <20171219035818.GF16230@ando.pearwood.info> Message-ID: On Mon, Dec 18, 2017 at 7:58 PM, Steven D'Aprano wrote: > On Mon, Dec 18, 2017 at 07:37:03PM -0800, Nathaniel Smith wrote: >> On Mon, Dec 18, 2017 at 7:02 PM, Barry Warsaw wrote: >> > On Dec 18, 2017, at 21:11, Chris Barker wrote: >> > >> >> Will changing pprint be considered a breaking change? >> > >> > Yes, definitely. >> >> Wait, what? Why would changing pprint (so that it accurately reflects >> dict's new underlying semantics!) be a breaking change? > > I have a script which today prints data like so: > > {'Aaron': 62, > 'Anne': 51, > 'Bob': 23, > 'George': 30, > 'Karen': 45, > 'Sue': 17, > 'Sylvester': 34} > > Tomorrow, it will suddenly start printing: > > {'Bob': 23, > 'Karen': 45, > 'Sue': 17, > 'George': 30, > 'Aaron': 62, > 'Anne': 51, > 'Sylvester': 34} > > > and my users will yell at me that my script is broken because the data > is now in random order. 
To make sure I understand, do you actually have a script like this, or is this hypothetical?

> Now, maybe that's my own damn fault for using
> pprint instead of writing my own pretty printer... but surely the point
> of pprint is so I don't have to write my own?
>
> Besides, the docs say very prominently:
>
> "Dictionaries are sorted by key before the display is computed."
>
> https://docs.python.org/3/library/pprint.html
>
> so I think I can be excused having relied on that feature.

No need to get aggro -- I asked a question, it wasn't a personal attack.

At a high level, pprint's job is to "pretty-print arbitrary Python data structures in a form which can be used as input to the interpreter" (quoting the first sentence of its documentation), i.e., like repr() it's fundamentally intended as a debugging tool that's supposed to match how Python works, not any particular externally imposed output format. Now, how Python works has changed. Previously dict order was arbitrary, so picking the arbitrary order that happened to be sorted was a nice convenience. Now, dict order isn't arbitrary, and sorting dicts both obscures the actual structure of the Python objects, and also breaks round-tripping through pprint.

Given that pprint's overarching documented contract of "represent Python objects" now conflicts with the more-specific documented contract of "sort dict keys", something has to give. My feeling is that we should preserve the overarching contract, not the details of how dicts were handled.

Here's another example of a teacher struggling with this: https://mastodon.social/@aparrish/13011522

But I would be in favor of adding a kwarg to let people opt in to the old behavior like:

from pprint import PrettyPrinter
pprint = PrettyPrinter(sortdict=True).pprint

-n

--
Nathaniel J.
Smith -- https://vorpus.org From steve at pearwood.info Tue Dec 19 02:09:28 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 19 Dec 2017 18:09:28 +1100 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <20171219034154.GE16230@ando.pearwood.info> Message-ID: <20171219070928.GG16230@ando.pearwood.info> On Mon, Dec 18, 2017 at 08:28:52PM -0800, Chris Barker wrote: > On Mon, Dec 18, 2017 at 7:41 PM, Steven D'Aprano > wrote: > > > > With arbitrary order, it made sense to sort, so as to always give the > > same > > > "pretty" representation. But now that order is "part of" the dict itself, > > > it seems prettyprint should present the preserved order of the dict. > > > > I disagree. Many uses of dicts are still conceptually unordered, even if > > the dict now preserves insertion order. For those use-cases, insertion > > order is of no interest whatsoever, and sorting is still "prettier". > > > > and many uses of dicts have "sorted" order as completely irrelevant, and > sorting them arbitrarily is not necessarily pretty (you can't provide a > sort key can you? -- so yes, it's arbitrary) I completely agree. We might argue that it was a mistake to sort dicts in the first place, or at least a mistake to *always* sort them without allowing the caller to provide a sort key. But what's done is done: the fact that dicts are sorted by pprint is not merely an implementation detail, but a documented behaviour. > I'm not necessarily saying we should break things, but I won't agree that > pprint sorting dicts is the "right" interface for what is actually an > order-preserving mapping. If sorting dicts was the "right" behaviour in Python 3.4, it remains the right behaviour -- at least for use-cases that don't care about insertion order. Anyone using pprint on dicts *now* doesn't care about insertion order. If they did, they would be using OrderedDict. 
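The dict-vs-OrderedDict distinction described above is easy to demonstrate (a small illustrative sketch; note that pprint still sorts plain dict keys by default in current CPython, while an OrderedDict is shown in insertion order):

```python
from collections import OrderedDict
from pprint import pformat

d = {'b': 2, 'a': 1}
od = OrderedDict([('b', 2), ('a', 1)])

# A plain dict is sorted by key before display, per pprint's documented contract...
print(pformat(d))   # {'a': 1, 'b': 2}

# ...while an OrderedDict is shown with 'b' before 'a', in insertion order.
print(pformat(od))
```

So a caller who cares about insertion order can signal it through the type, exactly as argued here.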
That will change in the future, but even in the future there are lots of use-cases for dicts where insertion order might as well be random. The order that some dict happen to be constructed may not be "pretty" or significant or even consistent from one dict to the next. (If your key/values pairs are coming in from an external source, they might not always come in the same order.) I'm not denying that sometimes it would be nice to see dicts in insertion order. Right now, those use-cases are handled by OrderedDict but in the future many of them will be taken over by regular dicts. So we have a conflict: - for some use-cases, insertion order is the "right" way for pprint to display the dict; - but for others, sorting by keys is the "pretty" way for pprint to display the dict; - and there's no way for pprint to know which is which just by inspecting the dict. How to break this tie? Backwards compatibility trumps all. If we want to change the default behaviour of pprint, we need to go through a deprecation period. Or add a flag sorted=True, and let the caller decide. > I would think it was only the right choice in the first place in order (get > it?) to get a consistent representation, not because sorting was a good > thing per se. *shrug* That's arguable. As you said yourself, dicts were sorted by key to give a "pretty" representation. I'm not so sure that consistency is the justification. What does that even mean? If you print the same dict twice, with no modifications, it will print the same whether you sort first or not. If you print two different dicts, who is to say that they were constructed in the same order? But the point is moot: whatever the justification, the fact that pprint sorts dicts by key is the defined behaviour, and even if it was a mistake to guarantee it, we can't just change it without a deprecation period. 
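As a historical footnote to the flag idea: later CPython releases added exactly this opt-in. pprint gained a sort_dicts keyword argument in Python 3.8, with sorting kept as the default for the backwards-compatibility reasons argued here. A short sketch (requires Python 3.8+):

```python
import pprint

d = {'one': 1, 'two': 2, 'three': 3}

# Default behaviour: keys are sorted, as documented since pprint's beginning.
print(pprint.pformat(d))                    # {'one': 1, 'three': 3, 'two': 2}

# Opt in to insertion order (available since Python 3.8):
print(pprint.pformat(d, sort_dicts=False))  # {'one': 1, 'two': 2, 'three': 3}
```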
-- Steve From rosuav at gmail.com Tue Dec 19 02:37:20 2017 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 19 Dec 2017 18:37:20 +1100 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: <20171219070928.GG16230@ando.pearwood.info> References: <20171219034154.GE16230@ando.pearwood.info> <20171219070928.GG16230@ando.pearwood.info> Message-ID: On Tue, Dec 19, 2017 at 6:09 PM, Steven D'Aprano wrote: > I completely agree. We might argue that it was a mistake to sort dicts > in the first place, or at least a mistake to *always* sort them without > allowing the caller to provide a sort key. But what's done is done: the > fact that dicts are sorted by pprint is not merely an implementation > detail, but a documented behaviour. Personally, I think it's a good default behaviour. Frequently you have a dictionary that, in normal code, you look up by key rather than iterating over; but for debugging, you print out everything. (Example: HTTP request/response headers. In your code, you'll query the headers dict for "content-type", but it's nice to be able to say "show me every header".) Insertion order is meaningless, and it's nice to be able to read them in some kind of sane order. Simply sorting the keys in the default way is good enough for a LOT of use-cases. > I'm not denying that sometimes it would be nice to see dicts in > insertion order. Right now, those use-cases are handled by OrderedDict > but in the future many of them will be taken over by regular dicts. So > we have a conflict: > > - for some use-cases, insertion order is the "right" way for pprint > to display the dict; > > - but for others, sorting by keys is the "pretty" way for pprint to > display the dict; > > - and there's no way for pprint to know which is which just by > inspecting the dict. > > How to break this tie? Backwards compatibility trumps all. If we want > to change the default behaviour of pprint, we need to go through a > deprecation period. 
> > Or add a flag sorted=True, and let the caller decide. Agreed, except for the last bit. I'd be inclined to kill two birds with one stone: add a flag sort_key=DEFAULT or sort_key=IDENTITY, which will sort the keys by themselves; you can provide a key function to change the way they're sorted, or you can pass sort_key=None to get them in insertion order. >> I would think it was only the right choice in the first place in order (get >> it?) to get a consistent representation, not because sorting was a good >> thing per se. > > *shrug* That's arguable. As you said yourself, dicts were sorted by key > to give a "pretty" representation. I'm not so sure that consistency is > the justification. What does that even mean? If you print the same dict > twice, with no modifications, it will print the same whether you sort > first or not. If you print two different dicts, who is to say that they > were constructed in the same order? In old versions of Python, this code would always produce the same result: d = {} d["qwer"] = 1 d["asdf"] = 2 d["zxcv"] = 3 print(d) import pprint; pprint.pprint(d) Then along came hash randomization, and the output would change - but pprint would still be consistent and tidy. Now we have insertion order retained, and both the repr and the pprint are consistent. Both of them are sane. They're just different sameness. > But the point is moot: whatever the justification, the fact that pprint > sorts dicts by key is the defined behaviour, and even if it was a > mistake to guarantee it, we can't just change it without a deprecation > period. This is really the clincher. But IMO the current behaviour isn't *just* for backcompat; it's good, useful behaviour as it is. I wouldn't want to see it changed even _with_ deprecation. ChrisA From steve.dower at python.org Tue Dec 19 02:38:18 2017 From: steve.dower at python.org (Steve Dower) Date: Mon, 18 Dec 2017 23:38:18 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? 
In-Reply-To: <20171219070928.GG16230@ando.pearwood.info> References: <20171219034154.GE16230@ando.pearwood.info> <20171219070928.GG16230@ando.pearwood.info> Message-ID: On 18Dec2017 2309, Steven D'Aprano wrote: > [A LOT OF THINGS I AGREE WITH] I agree completely with Steven's reasoning here, and it bothers me that what is an irrelevant change to many users (dict becoming ordered) seems to imply that all users of dict have to be updated. I have never needed OrderedDict before, and dict now also being ordered doesn't mean that when I reach for it I'm doing it because I need an ordered dict - I probably just need a regular dict. *Nothing* about dict should change for me between versions. Adding an option to pprint to explicitly control sorting without changing the default is fine. Please stop assuming that everyone wants an OrderedDict when they say dict. It's an invalid assumption. Cheers, Steve From eric at trueblade.com Tue Dec 19 02:49:05 2017 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 19 Dec 2017 02:49:05 -0500 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> Message-ID: On 12/18/2017 9:41 PM, Chris Barker wrote: > I'm really surprised no one seems to get my point here. > > TL;DR: > My point is that having type annotation syntax required for something in > the stdlib is a significant step toward "normalizing" type hinting in > Python. Whether that's a good idea or not is a judgement call, but it IS > a big step. I get your point, I'm just not concerned about it. I also don't think it's surprising that you can put misleading information (including non-types) in type annotations. 
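The point is easy to check: annotations are recorded on the function object, but nothing enforces them at call time (a minimal sketch of standard CPython behaviour):

```python
def double(x: int) -> int:
    return x * 2

# The annotation is stored on the function object...
print(double.__annotations__)  # {'x': <class 'int'>, 'return': <class 'int'>}

# ...but nothing checks it when the function is called: a str works fine.
print(double("ha"))  # haha
```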
All of the documentation and discussions are quite clear that type information is ignored at runtime. It _is_ true that @dataclass does actually inspect the type at runtime, but those uses are very rare. And if you do need them, the actual type T used by ClassVar[T] and InitVar[T] is still ignored. Data Classes is also not the first use of type annotations in the stdlib: https://docs.python.org/3/library/typing.html#typing.NamedTuple When I say that "typing is optional", I mean importing the typing module, not that annotations are optional. Eric. > @Chris > > People are still allowed not to use dataclasses if they really don't > like type hints :-) > Seriously however, annotations are just syntax. In this sense PEP > 526 is more like PEP 3107, > and less like PEP 484. People are still free to write: > > @dataclass > class C: >     x: "first coordinate" >     y: "second coordinate" >     plus: "I don't like types" > > > Well, yes, of course, but this is not like PEP 3107, as it introduces a > requirement for annotations (maybe not *type* annotations per se) in the > std lib. Again, that may be the best way to go -- but it should be done > deliberately. > > @dataclass > > class C: >     x: ... >     y: ... > > > Ah! I had no idea you could use ellipses to indicate no type. That > actually helps a lot. We really should have that prominent in the docs. > And in the dataclass docs, not just the type hinting docs -- again, > people will want to use these who may not have any interest in or > prior knowledge of type hints. > > I don't see such a big difference between hypothesis (a testing lib) using > annotations for its purposes > and the situation with dataclasses. > > > The big difference is that hypothesis is not in the standard library. > Also, I didn't know about hypothesis until just now, but their very > first example in the quick start does not use annotation syntax, so it's > not as baked in as it is with dataclasses.
> > If you have ideas about how to improve the dataclass docs, this can > be discussed in the issue https://bugs.python.org/issue32216 > > > > I'll try to find time to contribute there -- though maybe better to have > the doc draft in GitHub? > > > ... the type will in fact be completely ignored by the > implementation. > > Newbies are going to be confused by this -- they really are. > > This is not different from > > def f(x: int): >     pass > > f("What")  # OK > > that exists starting from Python 3.0. Although I agree this is > confusing, the way forward could be just explaining this better in > the docs. > > > Again the difference is that EVERY introduction to defining Python > functions doesn't use the type hint. And even more to the point, you CAN > define a function without any annotations. > > But frankly, I think as type hinting becomes more common, we're going to > see a lot of confusion :-( > > If you want my personal opinion about the current situation about > type hints _in general_, then I can say that > I have seen many cases where people use type hints where they are > not needed > (for example in 10 line scripts or in highly polymorphic functions), > so I agree that some community > style guidance (like PEP 8) may be helpful. > > > It's going to get worse before it gets better :-( > > @dataclass > class C: >     x = field() > > > that does require that `field` be imported, so not as nice. I kinda like > the ellipses better. > > but good to have a way. > > -Chris > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R            (206) 526-6959   voice > 7600 Sand Point Way NE   (206) 526-6329   fax > Seattle, WA  98115       (206) 526-6317
main reception > > Chris.Barker at noaa.gov > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/eric%2Ba-python-dev%40trueblade.com > From eric at trueblade.com Tue Dec 19 02:53:47 2017 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 19 Dec 2017 02:53:47 -0500 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <20171219034154.GE16230@ando.pearwood.info> <20171219070928.GG16230@ando.pearwood.info> Message-ID: On 12/19/2017 2:38 AM, Steve Dower wrote: > On 18Dec2017 2309, Steven D'Aprano wrote: >> [A LOT OF THINGS I AGREE WITH] > I agree completely with Steven's reasoning here, and it bothers me that > what is an irrelevant change to many users (dict becoming ordered) seems > to imply that all users of dict have to be updated. > > I have never needed OrderedDict before, and dict now also being ordered > doesn't mean that when I reach for it I'm doing it because I need an > ordered dict - I probably just need a regular dict. *Nothing* about dict > should change for me between versions. > > Adding an option to pprint to explicitly control sorting without > changing the default is fine. Please stop assuming that everyone wants > an OrderedDict when they say dict. It's an invalid assumption. Well said, Steve and Steven. I completely agree. Eric. From njs at pobox.com Tue Dec 19 03:27:33 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 19 Dec 2017 00:27:33 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? 
In-Reply-To: References: <20171219034154.GE16230@ando.pearwood.info> <20171219070928.GG16230@ando.pearwood.info> Message-ID: On Mon, Dec 18, 2017 at 11:38 PM, Steve Dower wrote: > On 18Dec2017 2309, Steven D'Aprano wrote: >> [A LOT OF THINGS I AGREE WITH] > I agree completely with Steven's reasoning here, and it bothers me that > what is an irrelevant change to many users (dict becoming ordered) seems > to imply that all users of dict have to be updated. Can we all take a deep breath and lay off the hyperbole? The only point under discussion in this subthread is whether pprint -- our module for producing nicely-formatted reprs -- should continue to sort keys, or should switch to providing an accurate repr. There are reasonable arguments for both positions, but no-one's suggesting anything in the same solar system as "all users of dict have to be updated". Am I missing some underlying nerve that this is hitting for some reason? -n -- Nathaniel J. Smith -- https://vorpus.org From p.f.moore at gmail.com Tue Dec 19 05:53:20 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 19 Dec 2017 10:53:20 +0000 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> Message-ID: On 19 December 2017 at 07:49, Eric V.
Smith wrote: > Data Classes is also not the first use of type annotations in the stdlib: > https://docs.python.org/3/library/typing.html#typing.NamedTuple > Also, the fact that no-one raised this issue during the whole time the PEP was being discussed (at least as far as I recollect) and that Guido (who of all of us should be most aware of what is and isn't acceptable use of annotations in the stdlib) approved the PEP, suggests to me that this isn't that big a deal. The only thing that has surprised me in this discussion is that the actual type used in the annotation makes no difference. And once someone reminded me that types are never enforced at runtime (you can call f(x: int) with f('haha')) that seemed fine. Paul From p.f.moore at gmail.com Tue Dec 19 06:04:34 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 19 Dec 2017 11:04:34 +0000 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <20171219034154.GE16230@ando.pearwood.info> <20171219070928.GG16230@ando.pearwood.info> Message-ID: On 19 December 2017 at 08:27, Nathaniel Smith wrote: > On Mon, Dec 18, 2017 at 11:38 PM, Steve Dower wrote: >> On 18Dec2017 2309, Steven D'Aprano wrote: >>> [A LOT OF THINGS I AGREE WITH] >> I agree completely with Steven's reasoning here, and it bothers me that >> what is an irrelevant change to many users (dict becoming ordered) seems >> to imply that all users of dict have to be updated. > > Can we all take a deep breath and lay off the hyperbole? The only > point under discussion in this subthread is whether pprint -- our > module for producing nicely-formatted-reprs -- should continue to sort > keys, or should continue to provide an accurate repr. There are > reasonable arguments for both positions, but no-one's suggesting > anything in the same solar system as "all users of dict have to be > updated". > > Am I missing some underlying nerve that this is hitting for some reason? 
IMO, the key thing is that people appear to be talking as if changing documented behaviour without deprecation is acceptable. Or even if they are OK with a deprecation period, they are still talking about changing documented behaviour for a trivial reason (as Steve Dower said, most people will still use dict for its dict behaviour, not for its orderedness). People (including me) are getting irritated because backward compatibility, which is normally a key principle, is being treated so lightly here. (It happened in another thread as well - changing the output of csv.DictReader from OrderedDict to dict - again suggesting that breaking a documented behaviour was OK, just because dicts are now ordered). Paul From erik.m.bray at gmail.com Tue Dec 19 06:12:01 2017 From: erik.m.bray at gmail.com (Erik Bray) Date: Tue, 19 Dec 2017 12:12:01 +0100 Subject: [Python-Dev] Question on a seemingly useless doctest Message-ID: Hi all, I have a ticket [1] that's hung up on a failure in one doctest in the form of sage.doctest.sources.FileDocTestSource._test_enough_doctests. This test has been there, it seems, for as long as the current doctest framework has been in place, and nobody seems to have questioned it. Its expected output is generated from the Sage sources themselves, and can change when tests are added to or removed from any module (if any of those tests should be "skipped"). Over the years the expected output of this test has just been updated as necessary. But in taking a closer look at the test--and I could be mistaken--it's not even a useful test. It's *attempting* to validate that the doctest parser skips tests when it's supposed to. But it performs this validation by...implementing its own, less robust doctest parser, and comparing the results of that to the results of the real doctest parser. Sometimes--in fact often--the comparison is wrong (as the test itself acknowledges).
This doesn't seem to me a correct or useful way to validate the doctest parser. If there are cases that the real doctest parser should be tested against, then unit tests/regression tests should be written that simply test the real doctest parser against those cases and check the results. Having essentially a real doctest parser, and a "fake" one that's incorrect doesn't make sense to me, unless there's something about this I'm misunderstanding. I would propose to just remove the test. If there are any actual regressions it's responsible for catching, then more focused regression tests should be written for those cases. Erik [1] https://trac.sagemath.org/ticket/24261#comment:24 From erik.m.bray at gmail.com Tue Dec 19 06:17:16 2017 From: erik.m.bray at gmail.com (Erik Bray) Date: Tue, 19 Dec 2017 12:17:16 +0100 Subject: [Python-Dev] Question on a seemingly useless doctest In-Reply-To: References: Message-ID: Sorry, completely fat-fingered my autocomplete and sent it to the wrong list. On Tue, Dec 19, 2017 at 12:12 PM, Erik Bray wrote: > Hi all, > > I have a ticket [1] that's hung up on a failure in one doctest in the > form of sage.doctest.sources.FileDocTestSource._test_enough_doctests. > > This test has been there, it seems, for as long as the current > doctest framework has been in place, and nobody seems to have > questioned it. Its expected output is generated from the Sage sources > themselves, and can change when tests are added to or removed from any > module (if any of those tests should be "skipped"). Over the years > the expected output of this test has just been updated as necessary. > > But in taking a closer look at the test--and I could be mistaken-- > it's not even a useful test. It's *attempting* to validate that the > doctest parser skips tests when it's supposed to. But it performs > this validation by...implementing its own, less robust doctest parser, > and comparing the results of that to the results of the real doctest > parser. Sometimes--in fact often--the comparison is wrong (as the > test itself acknowledges).
> > This doesn't seem to me a correct or useful way to validate the > doctest parser. If there are cases that the real doctest parser > should be tested against, then unit tests/regression tests should be > written that simply test the real doctest parser against those cases > and check the results. Having essentially a real doctest parser, and > a "fake" one that's incorrect doesn't make sense to me, unless there's > something about this I'm misunderstanding. > > I would propose to just remove the test. If there are any actual > regressions it's responsible for catching, then more focused regression > tests should be written for those cases. > > Erik > > > [1] https://trac.sagemath.org/ticket/24261#comment:24 From encukou at gmail.com Tue Dec 19 10:10:06 2017 From: encukou at gmail.com (Petr Viktorin) Date: Tue, 19 Dec 2017 16:10:06 +0100 Subject: [Python-Dev] PEP 489: module m_traverse called with NULL module state In-Reply-To: <20171214120034.063e27c2@fsol> References: <20171213211540.6b92975a@fsol> <20171214120034.063e27c2@fsol> Message-ID: On Thu, Dec 14, 2017 at 12:00 PM, Antoine Pitrou wrote: > On Thu, 14 Dec 2017 17:00:10 +1000 > Nick Coghlan wrote: >> On 14 Dec. 2017 9:19 am, "Antoine Pitrou" wrote: >> >> >> Hello, >> >> After debugging a crash on AppVeyor for a submitter's PR >> (see https://github.com/python/cpython/pull/4611 ), I came to the >> following diagnosis: converting the "atexit" module (which is a >> built-in C extension) to PEP 489 multiphase initialization can lead to >> its m_traverse function (and presumably also m_clear and m_free) being >> called while no module state is yet registered: that is, >> `PyModule_GetState(self)` when called from m_traverse returns NULL! >> >> Is that an expected or known subtlety?
>> >> >> Not that I'm aware of, so I'd be inclined to classify it as a bug in the >> way we're handling multi-phase initialisation unless/until we determine >> there's no way to preserve the existing invariant from the single phase >> case. > > Speaking of which, the doc is not very clear: is PEP 489 required for > multi-interpreter support or is PyModule_GetState() sufficient? Yes, it is possible to have proper subinterpreter support without multi-phase init. From solipsis at pitrou.net Tue Dec 19 10:19:26 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 19 Dec 2017 16:19:26 +0100 Subject: [Python-Dev] PEP 489: module m_traverse called with NULL module state In-Reply-To: References: <20171213211540.6b92975a@fsol> <20171214120034.063e27c2@fsol> Message-ID: <20171219161926.4ff9114e@fsol> On Tue, 19 Dec 2017 16:10:06 +0100 Petr Viktorin wrote: > > > > Speaking of which, the doc is not very clear: is PEP 489 required for > > multi-interpreter support or is PyModule_GetState() sufficient? > > Yes, it is possible to have proper subinterpreter support without > multi-phase init. Thanks. I guess the C API docs need a user-friendly section laying out the various methods for initializing a module, and their various advantages :-) Regards Antoine. From encukou at gmail.com Tue Dec 19 10:53:49 2017 From: encukou at gmail.com (Petr Viktorin) Date: Tue, 19 Dec 2017 16:53:49 +0100 Subject: [Python-Dev] PEP 489: module m_traverse called with NULL module state In-Reply-To: <20171219161926.4ff9114e@fsol> References: <20171213211540.6b92975a@fsol> <20171214120034.063e27c2@fsol> <20171219161926.4ff9114e@fsol> Message-ID: On 12/19/2017 04:19 PM, Antoine Pitrou wrote: > On Tue, 19 Dec 2017 16:10:06 +0100 > Petr Viktorin wrote: >>> >>> Speaking of which, the doc is not very clear: is PEP 489 required for >>> multi-interpreter support or is PyModule_GetState() sufficient? >> >> Yes, it is possible to have proper subinterpreter support without >> multi-phase init. > > Thanks. 
I guess the C API docs need a user-friendly section laying out > the various methods for initializing a module, and their various > advantages :-) That, or eventually remove multi-phase init's disadvantages, and have just one way to do it :) From nad at python.org Tue Dec 19 03:42:35 2017 From: nad at python.org (Ned Deily) Date: Tue, 19 Dec 2017 03:42:35 -0500 Subject: [Python-Dev] [RELEASE] Python 3.6.4 is now available Message-ID: <24E5A059-9558-4D15-B846-82A771FEC188@python.org> On behalf of the Python development community and the Python 3.6 release team, I am happy to announce the availability of Python 3.6.4, the fourth maintenance release of Python 3.6. Detailed information about the changes made in 3.6.4 can be found in the change log here: https://docs.python.org/3.6/whatsnew/changelog.html#python-3-6-4-final Please see "What's New In Python 3.6" for more information about the new features in Python 3.6: https://docs.python.org/3.6/whatsnew/3.6.html You can download Python 3.6.4 here: https://www.python.org/downloads/release/python-364/ The next maintenance release of Python 3.6 is expected to follow in about 3 months, around the end of 2018-03. More information about the 3.6 release schedule can be found here: https://www.python.org/dev/peps/pep-0494/ Enjoy! -- Ned Deily nad at python.org -- [] From barry at python.org Tue Dec 19 11:14:07 2017 From: barry at python.org (Barry Warsaw) Date: Tue, 19 Dec 2017 11:14:07 -0500 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <20171104173013.GA4005@bytereef.org> <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> Message-ID: On Dec 18, 2017, at 22:37, Nathaniel Smith wrote: > Wait, what? Why would changing pprint (so that it accurately reflects > dict's new underlying semantics!) be a breaking change? Are you > suggesting it shouldn't be changed in 3.7? As others have pointed out, exactly because the current behavior is documented.
And we all know that if it's documented (and often even if it's not, but that's beside the point here) it will be relied upon. So we can't change the default behavior. But I have no problems conceptually with giving users options. The devil is in the details though, e.g. should we special case dictionary sorting only? Should we use a sort `key` to mirror sorted() and list.sort()? We can figure those things out and whether it's even worth doing. I don't think that's PEP-worthy, so if someone is sufficiently motivated, please open an issue on bpo. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Dec 19 11:49:36 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Wed, 20 Dec 2017 01:49:36 +0900 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> <20171219035818.GF16230@ando.pearwood.info> Message-ID: <23097.17184.943235.782649@turnbull.sk.tsukuba.ac.jp> Nathaniel Smith writes: > To make sure I understand, do you actually have a script like this, or > is this hypothetical? I have a couple of doctests that assume that pprint will sort by key, yes. It makes the tests look quite a bit nicer by pprinting the output, and I get sorting (which matters for some older Pythons) for free. (I admit I don't actually use those tests with older Pythons, but the principle stands.) I don't see why we don't do the obvious, namely add the option to use "native" order to the PrettyPrinter class, with the default being backward compatible.
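For reference, the option being requested here did not exist at the time; the interface that was eventually added (the sort_dicts keyword argument, Python 3.8 and later) behaves exactly as proposed, with a backward-compatible default:

```python
import pprint

d = {"zebra": 1, "apple": 2}

# Default stays backward compatible: keys sorted.
pprint.pprint(d)                    # {'apple': 2, 'zebra': 1}

# Opt in to "native" (insertion) order; requires Python 3.8+.
pprint.pprint(d, sort_dicts=False)  # {'zebra': 1, 'apple': 2}
```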
-- Associate Professor Division of Policy and Planning Science http://turnbull/sk.tsukuba.ac.jp/ Faculty of Systems and Information Email: turnbull at sk.tsukuba.ac.jp University of Tsukuba Tel: 029-853-5175 Tennodai 1-1-1, Tsukuba 305-8573 JAPAN From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Dec 19 11:47:28 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Wed, 20 Dec 2017 01:47:28 +0900 Subject: [Python-Dev] f-strings In-Reply-To: References: <7809465429117446362@unknownmsgid> Message-ID: <23097.17056.197997.743640@turnbull.sk.tsukuba.ac.jp> Mariatta Wijaya writes: > I agree it's useful info :) > > I went ahead and made a PR [1]. > In my PR, I simply linked to the Format Specification Mini Language[2] from > f-strings documentation[3]. > > Not sure about updating PEP 498 at this point.. I don't see any reason not to document tips and tricks with f-strings, and this is a nice and useful example. But it looks like TOOWTDI to me. The syntax is documented (6.1.3.1 in the Library Reference), along with a specific relevant example ("Aligning the text and specifying a width" in 6.1.3.2). So -1 on putting the recipe in the reference docs. I really don't think this kind of information belongs in a PEP for sure, and probably not even in the Library Reference. The Tutorial might be a good place for it, though. If I were Bach, I'd compose a more-itertools-like module to be named Variations_on_the_F_String. :-) Steve From steve at holdenweb.com Tue Dec 19 12:30:45 2017 From: steve at holdenweb.com (Steve Holden) Date: Tue, 19 Dec 2017 17:30:45 +0000 Subject: [Python-Dev] Is static typing still optional? 
In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> Message-ID: On Tue, Dec 19, 2017 at 10:53 AM, Paul Moore wrote: > On 19 December 2017 at 07:49, Eric V. Smith wrote: > > Data Classes is also not the first use of type annotations in the stdlib: > > https://docs.python.org/3/library/typing.html#typing.NamedTuple > > > > Also, the fact that no-one raised this issue during the whole time the > PEP was being discussed (at least as far as I recollect) and that > Guido (who of all of us should be most aware of what is and isn't > acceptable use of annotations in the stdlib) approved the PEP, > suggests to me that this isn't that big a deal. > > The only thing that has surprised me in this discussion is that the > actual type used in the annotation makes no difference. And once > someone reminded me that types are never enforced at runtime (you can > call f(x: int) with f('haha')) that seemed fine. > "If anything, this makes things more difficult for the learner." The fact that annotations are formally undefined as to anything but syntax is sensible but can be misleading (as the example above clearly shows). In the typing module it's logical to see annotations, I guess. But I really hope they aren't sprinkled around willy-nilly. Sooner or later there will be significant demand for annotated libraries, even though CPython will perform exactly as it does with non-annotated code. I can see the value of annotations in other environments and for different purposes, but it would be a pity if this were to unnecessarily complicate the stdlib. regards Steve -------------- next part -------------- An HTML attachment was scrubbed...
URL: From steve at holdenweb.com Tue Dec 19 12:57:17 2017 From: steve at holdenweb.com (Steve Holden) Date: Tue, 19 Dec 2017 17:57:17 +0000 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: <23097.17184.943235.782649@turnbull.sk.tsukuba.ac.jp> References: <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> <20171219035818.GF16230@ando.pearwood.info> <23097.17184.943235.782649@turnbull.sk.tsukuba.ac.jp> Message-ID: On Tue, Dec 19, 2017 at 4:49 PM, Stephen J. Turnbull < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: > Nathaniel Smith writes: > > > To make sure I understand, do you actually have a script like this, or > > is this hypothetical? > > I have a couple of doctests that assume that pprint will sort by key, > yes. It makes the tests look quite a bit nicer by pprinting the > output, and I get sorting (which matters for some older Pythons) for > free. (I admit I don't actually use those tests with older Pythons, > but the principle stands.) > > I don't see why we don't do the obvious, namely add the option to use > "native" order to the PrettyPrinter class, with the default being > backward compatible. > Perhaps now key ordering has been pronounced we could either add a "sorted" method to dicts equivalent to the following code. def sorted(self): return {k: self[k] for k in sorted(self.keys())} Alternatively the sorted built-in could be modified to handle dicts in this way. Though I still find the assumption of any ordering at all a bit weird, I suppose I'll grow used to it. regards Steve -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Dec 19 13:04:45 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 19 Dec 2017 10:04:45 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To: References: <20171104173013.GA4005@bytereef.org> <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> Message-ID: On Tue, Dec 19, 2017 at 8:14 AM, Barry Warsaw wrote: > On Dec 18, 2017, at 22:37, Nathaniel Smith wrote: > > > Wait, what? Why would changing pprint (so that it accurately reflects > > dict's new underlying semantics!) be a breaking change? Are you > > suggesting it shouldn't be changed in 3.7? > > As others have pointed out, exactly because the current behavior is > documented. And we all know that if it's documented (and often even if > it's not, but that's beside the point here) it will be relied upon. > Nathaniel Smith has pointed out that eval(pprint(a_dict)) is supposed to return the same dict -- so documented behavior may already be broken. (though I assume order is still ignored when comparing dicts, so: eval(pprint(a_dict)) == a_dict will still hold.) But practicality beats purity, and a number of folks have already posted use-cases where they rely on sorted order, so there you go. > So we can't change the default behavior. But I have no problems > conceptually with giving users options. The devil is in the details > though, e.g. should we special case dictionary sorting only? > Should we use a sort `key` to mirror sorted() and list.sort()? > That would be a nice feature! If anything is done, I think we should allow a key function, and maybe have key=None as "unsorted" -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Tue Dec 19 13:47:24 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 19 Dec 2017 18:47:24 +0000 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To: References: <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> <20171219035818.GF16230@ando.pearwood.info> <23097.17184.943235.782649@turnbull.sk.tsukuba.ac.jp> Message-ID: On 19 December 2017 at 17:57, Steve Holden wrote: > On Tue, Dec 19, 2017 at 4:49 PM, Stephen J. Turnbull > wrote: >> >> Nathaniel Smith writes: >> >> > To make sure I understand, do you actually have a script like this, or >> > is this hypothetical? >> >> I have a couple of doctests that assume that pprint will sort by key, >> yes. It makes the tests look quite a bit nicer by pprinting the >> output, and I get sorting (which matters for some older Pythons) for >> free. (I admit I don't actually use those tests with older Pythons, >> but the principle stands.) >> >> I don't see why we don't do the obvious, namely add the option to use >> "native" order to the PrettyPrinter class, with the default being >> backward compatible. > > > Perhaps now key ordering has been pronounced we could add a "sorted" > method to dicts equivalent to the following code. > > def sorted(self): > return {k: self[k] for k in sorted(self.keys())} > > Alternatively the sorted built-in could be modified to handle dicts in this > way. I don't think there's any need for this. > Though I still find the assumption of any ordering at all a bit weird I > suppose I'll grow used to it. As far as I'm concerned, dictionaries are still exactly as they were before - key-value mappings with no inherent order. None of my code makes any assumption about the ordering of dictionaries, so it'll be 100% unaffected by this change. I find this whole debate about the "consequences" of mandating insertion order to be completely out of proportion. As far as I'm concerned, the only practical impact is that when you iterate over things like dictionary displays, **kw arguments, etc, you get the "obvious" order, and it's not a lucky accident that you do so.
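The "obvious" order described above can be checked directly; a quick sketch (dict insertion order is guaranteed from Python 3.7, and **kwargs order by PEP 468 since 3.6):

```python
def collect(**kwargs):
    # PEP 468: **kwargs preserves the order keyword arguments were passed in
    return list(kwargs)

# A dict display iterates in the order its entries were written
d = {"b": 2, "a": 1, "c": 3}
print(list(d))                 # ['b', 'a', 'c'] -- insertion order, not sorted
print(collect(z=1, m=2, a=3))  # ['z', 'm', 'a']
```

No code that already treated dicts as unordered mappings needs to change; the guarantee only makes this previously accidental behavior reliable.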
Certainly, with the order guaranteed, people who currently use OrderedDict will be able to simply use a dict in future - although I'd expect very few people will take advantage of this in the immediate future, as by doing so they'll be restricting their code to Python 3.7+ only, for no significant benefit. And let's not forget that OrderedDict is used far less frequently than plain dictionaries, so we're talking about a small percentage of a tiny percentage of uses of mapping objects in the wild that will in *any* way be affected by this change. Paul From chris.barker at noaa.gov Tue Dec 19 15:11:55 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 19 Dec 2017 12:11:55 -0800 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <460940d5-48cb-4726-7f6f-e6391495f2bd@trueblade.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> Message-ID: On Mon, Dec 18, 2017 at 11:49 PM, Eric V. Smith wrote: > I also don't think it's surprising that you can put misleading information > (including non-types) in type annotations. All of the documentation and > discussions are quite clear that type information is ignored at runtime. > Sure -- but that's documentation of type annotations -- someone uninterested in typing, or completely unaware of it, will not be reading those docs. > Data Classes is also not the first use of type annotations in the stdlib: > https://docs.python.org/3/library/typing.html#typing.NamedTuple That's in the typing package, yes? collections.namedtuple is unchanged. So yes, obviously the entire typing package is about typing. This is something that has nothing to do with typing, but does use the typing syntax. It really is different. 
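For reference, the contrast being drawn here between the two namedtuple spellings -- only the typing variant uses annotation syntax (a sketch; the Point names are made up):

```python
from collections import namedtuple
from typing import NamedTuple

# The classic factory: field names are plain strings, no annotations involved
PointA = namedtuple("PointA", ["x", "y"])

# The typing variant: fields are declared entirely via type annotations
class PointB(NamedTuple):
    x: int
    y: int

# Both produce real tuples, so instances even compare equal
print(PointA(1, 2) == PointB(1, 2))  # True
```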
I haven't started teaching typing to newbies yet -- but I imagine I will have to some day -- and when I do, it will be in the context of: here is an optional feature that you can use along with a static type checker. And I can make it clear that the annotations only apply to the static type checker, and not run-time behavior. But using type annotations for something other than providing information to a static type checker, in a stdlib module, changes that introduction. And people don't read all the docs -- they read to the first example of how to use it, and away they go. And if that example is something like:

@dataclass
class C:
    a: int
    b: float = 0.0

There WILL be confusion. Paul Moore wrote: > Also, the fact that no-one raised this issue during the whole time the > PEP was being discussed (at least as far as I recollect) and that > Guido (who of all of us should be most aware of what is and isn't > acceptable use of annotations in the stdlib) approved the PEP, > suggests to me that this isn't that big a deal. That suggests to me that the people involved in discussing the PEP may not be representative of the bulk of Python users. There are a number of us that are uncomfortable with static typing in general, and the python-dev community has been criticised for doing too much, moving too fast, and complicating the language unnecessarily. The PEP's been accepted, so let's move forward, but please be aware of these issues with the documentation and examples. I'll try to contribute to that discussion as well. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed...
URL: From chris.barker at noaa.gov Tue Dec 19 15:32:03 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 19 Dec 2017 12:32:03 -0800 Subject: [Python-Dev] f-strings In-Reply-To: <23097.17056.197997.743640@turnbull.sk.tsukuba.ac.jp> References: <7809465429117446362@unknownmsgid> <23097.17056.197997.743640@turnbull.sk.tsukuba.ac.jp> Message-ID: On Tue, Dec 19, 2017 at 8:47 AM, Stephen J. Turnbull < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: > I don't see any reason not to document tips and tricks with f-strings, > and this is a nice and useful example. But it looks like TOOWTDI to > me. The syntax is documented (6.1.3.1 in the Library Reference), > along with a specific relevant example ("Aligning the text and > specifying a width" in 6.1.3.2). > > So -1 on putting the recipe in the reference docs. I really don't > think this kind of information belongs in a PEP for sure, and probably > not even in the Library Reference. The docs (and I think PEP) have been updated to clearly state that f-strings use the same formatting specifiers as .format(), and have links to those docs. So I think we're good. > The Tutorial might be a good place > for it, though. > yup. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericfahlgren at gmail.com Tue Dec 19 15:34:26 2017 From: ericfahlgren at gmail.com (Eric Fahlgren) Date: Tue, 19 Dec 2017 12:34:26 -0800 Subject: [Python-Dev] f-strings In-Reply-To: <23097.17056.197997.743640@turnbull.sk.tsukuba.ac.jp> References: <7809465429117446362@unknownmsgid> <23097.17056.197997.743640@turnbull.sk.tsukuba.ac.jp> Message-ID: On Tue, Dec 19, 2017 at 8:47 AM, Stephen J. 
Turnbull < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: > If I were Bach, I'd compose a more-itertools-like module to be named > Variations_on_the_F_String. :-) > Would that be P.D.Q. Bach to whom you are referring? -------------- next part -------------- An HTML attachment was scrubbed... URL: From rob.cliffe at btinternet.com Tue Dec 19 18:35:20 2017 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Tue, 19 Dec 2017 23:35:20 +0000 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> Message-ID: <310ffb27-54dd-8465-6020-c12dc3cbffa3@btinternet.com> On 19/12/2017 20:11, Chris Barker wrote: > There are a number of us that are uncomfortable with static typing in > general, +1 > and the python-dev community has been criticised for doing too much, > moving too fast, and complicating the language unnecessarily. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve.dower at python.org Tue Dec 19 19:56:16 2017 From: steve.dower at python.org (Steve Dower) Date: Tue, 19 Dec 2017 16:56:16 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <20171104173013.GA4005@bytereef.org> <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> Message-ID: <651ed3ba-79c4-2f4a-1299-0ca560e8996f@python.org> On 19Dec2017 1004, Chris Barker wrote: > Nathaniel Smith has pointed out that eval(pprint(a_dict)) is supposed to > return the same dict -- so documented behavior may already be broken. Two relevant quotes from the pprint module docs: >>> The pprint module provides a capability to "pretty-print" arbitrary Python data structures in a form which can be used as input to the interpreter >>> Dictionaries are sorted by key before the display is computed.
It says nothing about the resulting dict being the same as the original one, just that it can be used as input. So these are both still true (until someone deliberately breaks the latter). In any case, there are so many ways to spoil the first point for yourself that it's hardly worth treating as an important constraint. > (though I assume order is still ignored when comparing dicts, so: > eval(pprint(a_dict)) == a_dict will still hold. Order had better be ignored when comparing dicts, or plenty of code will break. For example: >>> {'a': 1, 'b': 2} == {'b': 2, 'a': 1} True Saying that "iter(dict)" will produce keys in the same order as they were inserted is not the same as saying that "dict" is an ordered mapping. As far as I understand, we've only said the first part. (And the "nerve" here is that I disagreed with even the first part, but didn't fight it too strongly because I never relied on the iteration order of dict. However, I *do* rely on nobody else relying on the iteration order of dict either, and so proposals to change existing semantics that were previously independent of insertion order to make them rely on insertion order will affect me. So now I'm pushing back.) Cheers, Steve From njs at pobox.com Tue Dec 19 20:32:52 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 19 Dec 2017 17:32:52 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: <651ed3ba-79c4-2f4a-1299-0ca560e8996f@python.org> References: <20171104173013.GA4005@bytereef.org> <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> <651ed3ba-79c4-2f4a-1299-0ca560e8996f@python.org> Message-ID: On Tue, Dec 19, 2017 at 4:56 PM, Steve Dower wrote: > On 19Dec2017 1004, Chris Barker wrote: >> >> Nathaniel Smith has pointed out that eval(pprint(a_dict)) is supposed to >> return the same dict -- so documented behavior may already be broken. > > > Two relevant quotes from the pprint module docs: > >>>> The pprint module provides a capability to "pretty-print"
arbitrary >>>> Python data structures in a form which can be used as input to the >>>> interpreter > >>>> Dictionaries are sorted by key before the display is computed. > > It says nothing about the resulting dict being the same as the original one, > just that it can be used as input. So these are both still true (until > someone deliberately breaks the latter). This is a pretty fine hair to be splitting... I'm sure you wouldn't argue that it would be valid to display the dict {"a": 1} as '["hello"]', just because '["hello"]' is a valid input to the interpreter (that happens to produce a different object than the original one) :-). I think we can assume that pprint's output is supposed to let you reconstruct the original data structures, at least in simple cases, even if that isn't explicitly stated. > In any case, there are so many ways > to spoil the first point for yourself that it's hardly worth treating as an > important constraint. I guess the underlying issue here is partly the question of what the pprint module is for. In my understanding, it's primarily a tool for debugging/introspecting Python programs, and the reason it talks about "valid input to the interpreter" isn't because we want anyone to actually feed the data back into the interpreter, but to emphasize that it provides an accurate what-you-see-is-what's-really-there view into how the interpreter understands a given object. It also emphasizes that this is not intended for display to end users; making the output format be "Python code" suggests that the main intended audience is people who know how to read, well, Python code, and therefore can be expected to care about Python's semantics. >> (though I assume order is still ignored when comparing dicts, so: >> eval(pprint(a_dict)) == a_dict will still hold. > > > Order had better be ignored when comparing dicts, or plenty of code will > break. 
For example: > >>>> {'a': 1, 'b': 2} == {'b': 2, 'a': 1} > True Yes, this is never going to change -- I expect that in the long run, the only semantic difference between dict and OrderedDict will be in their __eq__ methods. > Saying that "iter(dict)" will produce keys in the same order as they were > inserted is not the same as saying that "dict" is an ordered mapping. As far > as I understand, we've only said the first part. > > (And the "nerve" here is that I disagreed with even the first part, but > didn't fight it too strongly because I never relied on the iteration order > of dict. However, I *do* rely on nobody else relying on the iteration order > of dict either, and so proposals to change existing semantics that were > previously independent of insertion order to make them rely on insertion > order will affect me. So now I'm pushing back.) I mean, I don't want to be a jerk about this, and we still need to examine things on a case-by-case basis but... Guido has pronounced that Python dict preserves order. If your code "rel[ies] on nobody else relying on the iteration order", then starting in 3.7 your code is no longer Python. Obviously I like that change more than you, but to some extent it's just something we have to live with, and even if I disagreed with the new semantics I'd still rather the standard library handle them consistently rather than being half-one-thing-and-half-another. -n -- Nathaniel J. Smith -- https://vorpus.org From barry at python.org Tue Dec 19 21:56:13 2017 From: barry at python.org (Barry Warsaw) Date: Tue, 19 Dec 2017 21:56:13 -0500 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? 
In-Reply-To: References: <20171104173013.GA4005@bytereef.org> <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> <651ed3ba-79c4-2f4a-1299-0ca560e8996f@python.org> Message-ID: <4E15949C-A446-4867-8EE8-11945C0383D2@python.org> On Dec 19, 2017, at 20:32, Nathaniel Smith wrote: > I guess the underlying issue here is partly the question of what the > pprint module is for. In my understanding, it's primarily a tool for > debugging/introspecting Python programs, and the reason it talks about > "valid input to the interpreter" isn't because we want anyone to > actually feed the data back into the interpreter, but to emphasize > that it provides an accurate what-you-see-is-what's-really-there view > into how the interpreter understands a given object. It also > emphasizes that this is not intended for display to end users; making > the output format be "Python code" suggests that the main intended > audience is people who know how to read, well, Python code, and > therefore can be expected to care about Python's semantics. pprint.pprint() is indeed mostly for debugging, but not always. As an example of what will break if you change the sorting guarantee: in Mailman 3 the REST etag is calculated from the pprint.pformat() of the result dictionary before it's JSON-ified. If the order is changed, then it's possible a client will have an incorrect etag for a structure that is effectively the same. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From v+python at g.nevcal.com Wed Dec 20 00:15:24 2017 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 19 Dec 2017 21:15:24 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7?
In-Reply-To: References: <20171104173013.GA4005@bytereef.org> <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> <651ed3ba-79c4-2f4a-1299-0ca560e8996f@python.org> Message-ID: On 12/19/2017 5:32 PM, Nathaniel Smith wrote: > On Tue, Dec 19, 2017 at 4:56 PM, Steve Dower wrote: >> On 19Dec2017 1004, Chris Barker wrote: >>> Nathaniel Smith has pointed out that eval(pprint(a_dict)) is supposed to >>> return the same dict -- so documented behavior may already be broken. >> >> Two relevant quotes from the pprint module docs: >> >>>>> The pprint module provides a capability to "pretty-print" arbitrary >>>>> Python data structures in a form which can be used as input to the >>>>> interpreter >>>>> Dictionaries are sorted by key before the display is computed. >> It says nothing about the resulting dict being the same as the original one, >> just that it can be used as input. So these are both still true (until >> someone deliberately breaks the latter). > This is a pretty fine hair to be splitting... I'm sure you wouldn't > argue that it would be valid to display the dict {"a": 1} as > '["hello"]', just because '["hello"]' is a valid input to the > interpreter (that happens to produce a different object than the > original one) :-). I think we can assume that pprint's output is > supposed to let you reconstruct the original data structures, at least > in simple cases, even if that isn't explicitly stated. Any dict object read in from pprint is going to be a different object, not the original one. And, unless the original insertion order was sorted by the same key as pprint uses to sort, the iteration order will be different from the original. As pointed out below, it will compare equal to the original dict. pprint has always allowed you to reconstruct the original data structures, but not the iteration order of dicts. With the new insertion order guarantee, nothing has changed here.
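A short sketch of the point above -- the structure round-trips through pprint, but the iteration order does not (this relies on pprint's default key-sorting behavior):

```python
from pprint import pformat

d = {"b": 2, "a": 1}             # inserted as b, then a
text = pformat(d)                # pprint sorts keys: "{'a': 1, 'b': 2}"
rebuilt = eval(text)

print(rebuilt == d)              # True  -- dict equality ignores order
print(list(rebuilt), list(d))    # ['a', 'b'] ['b', 'a'] -- iteration order lost
```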
A far more interesting question than what pprint does to dict order is what marshal and pickle do (and have done) with the dict order, although I can't figure that out from the documentation. > >> In any case, there are so many ways >> to spoil the first point for yourself that it's hardly worth treating as an >> important constraint. > I guess the underlying issue here is partly the question of what the > pprint module is for. In my understanding, it's primarily a tool for > debugging/introspecting Python programs, and the reason it talks about > "valid input to the interpreter" isn't because we want anyone to > actually feed the data back into the interpreter, but to emphasize > that it provides an accurate what-you-see-is-what's-really-there view > into how the interpreter understands a given object. It also > emphasizes that this is not intended for display to end users; making > the output format be "Python code" suggests that the main intended > audience is people who know how to read, well, Python code, and > therefore can be expected to care about Python's semantics. > >>> (though I assume order is still ignored when comparing dicts, so: >>> eval(pprint(a_dict)) == a_dict will still hold. >> >> Order had better be ignored when comparing dicts, or plenty of code will >> break. For example: >> >>>>> {'a': 1, 'b': 2} == {'b': 2, 'a': 1} >> True > Yes, this is never going to change -- I expect that in the long run, > the only semantic difference between dict and OrderedDict will be in > their __eq__ methods. > >> Saying that "iter(dict)" will produce keys in the same order as they were >> inserted is not the same as saying that "dict" is an ordered mapping. As far >> as I understand, we've only said the first part. >> >> (And the "nerve" here is that I disagreed with even the first part, but >> didn't fight it too strongly because I never relied on the iteration order >> of dict. 
However, I *do* rely on nobody else relying on the iteration order >> of dict either, and so proposals to change existing semantics that were >> previously independent of insertion order to make them rely on insertion >> order will affect me. So now I'm pushing back.) > I mean, I don't want to be a jerk about this, and we still need to > examine things on a case-by-case basis but... Guido has pronounced > that Python dict preserves order. If your code "rel[ies] on nobody > else relying on the iteration order", then starting in 3.7 your code > is no longer Python. > > Obviously I like that change more than you, but to some extent it's > just something we have to live with, and even if I disagreed with the > new semantics I'd still rather the standard library handle them > consistently rather than being half-one-thing-and-half-another. > > -n > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed Dec 20 00:36:03 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 20 Dec 2017 16:36:03 +1100 Subject: [Python-Dev] Revisiting old enhancement requests Message-ID: <20171220053602.GC4215@ando.pearwood.info> What is the best practice for revisiting old enhancement requests on the tracker, if I believe that the time is right to revisit a rejected issue from many years ago? (Nearly a decade.) Should I raise a new enhancement request and link back to the old one, or re-open the original? 
Thanks, -- Steve From tjreedy at udel.edu Wed Dec 20 01:52:34 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 20 Dec 2017 01:52:34 -0500 Subject: [Python-Dev] Revisiting old enhancement requests In-Reply-To: <20171220053602.GC4215@ando.pearwood.info> References: <20171220053602.GC4215@ando.pearwood.info> Message-ID: On 12/20/2017 12:36 AM, Steven D'Aprano wrote: > What is the best practice for revisiting old enhancement requests on the > tracker, if I believe that the time is right to revisit a rejected issue > from many years ago? (Nearly a decade.) I have been thinking about the opposite: revisiting old enhancement requests that have been open for a decade, that I think have no chance of ever happening and should be closed. > Should I raise a new enhancement request and link back to the old one, > or re-open the original? I think the answer for both is to consider posting on python-ideas. -- Terry Jan Reedy From steve at pearwood.info Wed Dec 20 03:43:19 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 20 Dec 2017 19:43:19 +1100 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: <651ed3ba-79c4-2f4a-1299-0ca560e8996f@python.org> References: <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> <651ed3ba-79c4-2f4a-1299-0ca560e8996f@python.org> Message-ID: <20171220084319.GD4215@ando.pearwood.info> On Tue, Dec 19, 2017 at 04:56:16PM -0800, Steve Dower wrote: > On 19Dec2017 1004, Chris Barker wrote: > >(though I assume order is still ignored when comparing dicts, so: > >eval(pprint(a_dict)) == a_dict will still hold. > > Order had better be ignored when comparing dicts, or plenty of code will > break. For example: > > >>> {'a': 1, 'b': 2} == {'b': 2, 'a': 1} > True > > Saying that "iter(dict)" will produce keys in the same order as they > were inserted is not the same as saying that "dict" is an ordered > mapping. As far as I understand, we've only said the first part. Indeed.
Regular dicts preserve insertion order; they don't take insertion order into account for the purposes of equality. See the example here: https://docs.python.org/3.7/library/stdtypes.html#mapping-types-dict and the description of mapping equality: https://docs.python.org/3.7/reference/expressions.html#value-comparisons "Mappings (instances of dict) compare equal if and only if they have equal (key, value) pairs. Equality comparison of the keys and values enforces reflexivity." Changing that would be a *huge* backwards-compatibility breaking change. Aside: I've just noticed that mapping equality is not transitive: a == b and b == c does not imply that a == c.

py> from collections import OrderedDict as OD
py> a, b, c = OD.fromkeys('xyz'), dict.fromkeys('xyz'), OD.fromkeys('zyx')
py> a == b == c
True
py> a == c
False

-- Steve From solipsis at pitrou.net Wed Dec 20 04:24:04 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 20 Dec 2017 10:24:04 +0100 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? References: <20171104173013.GA4005@bytereef.org> <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> <651ed3ba-79c4-2f4a-1299-0ca560e8996f@python.org> Message-ID: <20171220102404.1fb93a4e@fsol> On Tue, 19 Dec 2017 17:32:52 -0800 Nathaniel Smith wrote: > > > In any case, there are so many ways > > to spoil the first point for yourself that it's hardly worth treating as an > > important constraint. > > I guess the underlying issue here is partly the question of what the > pprint module is for. In my understanding, it's primarily a tool for > debugging/introspecting Python programs, and the reason it talks about > "valid input to the interpreter" isn't because we want anyone to > actually feed the data back into the interpreter, [...] Actually, when you want to include a large constant in a Python program, pprint() can be useful to get a nicer formatting for your source code. That said, I do think that pprint() should continue sorting dicts by default.
Even though dicts may be ordered *now*, most uses of dict don't expect any particular order. (I also think the pprint() example shows the potential confusion issues with making dict ordered by default, as the user and implementor of an API may not agree whether dict order is significant...) Regards Antoine. From steve at pearwood.info Wed Dec 20 05:31:20 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 20 Dec 2017 21:31:20 +1100 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> <20171219035818.GF16230@ando.pearwood.info> Message-ID: <20171220103119.GE4215@ando.pearwood.info> On Mon, Dec 18, 2017 at 08:49:54PM -0800, Nathaniel Smith wrote: > On Mon, Dec 18, 2017 at 7:58 PM, Steven D'Aprano wrote: > > I have a script which today prints data like so: [...] > To make sure I understand, do you actually have a script like this, or > is this hypothetical? The details are much simplified, but basically, and my users probably won't literally yell at me, but yes I do. But does it matter? The thing about backwards-compatibility guarantees is that we have to proceed as if somebody does have such a script. We don't know who, we don't know why, but we have to assume that they are relying on whatever guarantees we've given and will be greatly inconvenienced by any change without sufficient notice. > > Now, maybe that's my own damn fault for using > > pprint [...] > > so I think I can be excused having relied on that feature. > > No need to get aggro -- I asked a question, it wasn't a personal attack. I didn't interpret it as an attack. Sorry for any confusion, I was trying to be funny -- at least, it sounded funny in my own head. 
> At a high-level, pprint's job is to "pretty-print arbitrary Python data > structures in a form which can be used as input to the interpreter" > (quoting the first sentence of its documentation), i.e., like repr() The *high* level purpose of pprint is to *pretty-print* values, like the name says. If all we wanted was something that outputs an eval()'able representation, we already had that: repr(). But even that requirement that output can be used as input to the interpreter is a non-core promise. There are plenty of exceptions: recursive data structures, functions, any object with the default repr, etc. Even when it works, the guarantee is quite weak. For instance, even the object type is not preserved:

py> class MyDict(dict):
...     pass
...
py> d = MyDict()
py> x = eval(repr(d))
py> assert d == x
py> assert type(d) == type(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

So the "promise" that eval(repr(obj)) will round-trip needs to be understood as being one of those Nice To Have non-core promises, not an actual guaranteed feature. (The bold print giveth, and the fine print taketh away.) So the fact that the output of pprint doesn't preserve the order of the dict won't be breaking any documented language guarantees. (It is probably worth documenting explicitly though, rather than just letting it be implied by the sorted keys guarantee.) > it's fundamentally intended as a debugging tool that's supposed to > match how Python works, not any particular externally imposed output > format. The point of pprint is not merely to duplicate what repr() already does, but to output an aesthetically pleasing view of the data structure. There is no reason to think that is only for the purposes of debugging. pprint is listed in the docs under Data Types, not Debugging: https://docs.python.org/3/library/datatypes.html https://docs.python.org/3/library/debug.html > Now, how Python works has changed.
Previously dict order was > arbitrary, so picking the arbitrary order that happened to be sorted > was a nice convenience. Beware of promising a feature for convenience, because people will come to rely on it. In any case, lexicographic (the default sorting) order is in some ways the very opposite of "arbitrary order". > Now, dict order isn't arbitrary, No, we can't say that. Dicts *preserve insertion order*, that is all. There is no requirement that the insertion order be meaningful or significant in any way: it may be completely arbitrary. If I build a mapping of (say) product to price: d = {'hammer': 5, 'screwdriver': 3, 'ladder': 116} the order the items are inserted is arbitrary, probably representing the historical accident of when they were added to the database/catalog or when I thought of them while typing in the dict. The most we can say is that for *some* cases, dict order *may* be meaningful. We're under no obligation to break backwards-compatibility guarantees in order for pretty printing to reflect a feature of dicts which may or may not be of any significance to the user. > and sorting dicts both obscures the actual structure of the > Python objects, You can't see the actual structure of Python objects via pprint. For example, you can't see whether the dict is a split table (shared keys) or combined table. You can only see the parts of the public interface which the repr, or pprint, chooses to show. That's always been the case so nothing changes here. If pprint were new to 3.7, I daresay there would be a good argument to have it display keys in insertion order, but given backwards compatibility, that's not tenable without either an opt-in switch, or a period of deprecation. > and also breaks round-tripping through pprint. Round-tripping need not promise to preserve order, since dicts don't care about order for the purposes of equality. 
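The equality point is worth making concrete -- plain dicts ignore order in ==, and only OrderedDict treats it as significant (a small sketch):

```python
from collections import OrderedDict

a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}

print(a == b)                            # True: dict == ignores order
print(OrderedDict(a) == OrderedDict(b))  # False: OrderedDict == is order-sensitive
print(list(a) == list(b))                # False: iteration order still differs
```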
Round-tripping already is a lossy operation: - object identity is always lost (apart from a few cached objects like small ints and singletons like None); - in some cases, the type of objects can be lost; - any attribute of the object which is not both reflected in its repr and set by its constructor will be lost; (e.g. x = something(); x.extra_attribute = 'spam') - many objects don't round-trip at all, e.g. functions and recursive data structures. So the failure of pprint to preserve such insertion order by default is just one more example. > Given that pprint's > overarching documented contract of "represent Python objects" now > conflicts with the more-specific documented contract of "sort dict > keys", something has to give. I believe the overarching contract is to pretty print. Anything else is a Nice To Have. [...] > But I would be in favor of adding a kwarg to let people opt-in to the > old behavior like: > > from pprint import PrettyPrinter > pprint = PrettyPrinter(sortdict=True).pprint It would have to be the other way: opt-out of the current behaviour. -- Steve From solipsis at pitrou.net Wed Dec 20 12:30:55 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 20 Dec 2017 18:30:55 +0100 Subject: [Python-Dev] Usefulness of binary compatibility accross Python versions? References: <20171216142257.2a0c978c@fsol> Message-ID: <20171220183055.6db7c6b4@fsol> Following this discussion, I opened two issues: * https://bugs.python.org/issue32387: "Disallow untagged C extension import on major platforms" * https://bugs.python.org/issue32388: "Remove cross-version binary compatibility" Regards Antoine. On Sat, 16 Dec 2017 14:22:57 +0100 Antoine Pitrou wrote: > Hello, > > Nowadays we have an official mechanism for third-party C extensions to > be binary-compatible accross feature releases of Python: the stable ABI. > > But, for non-stable ABI-using C extensions, there are also mechanisms > in place to *try* and ensure binary compatibility. 
One of them is the > way in which we add tp_ slots to the PyTypeObject structure. > > Typically, when adding a tp_XXX slot, you also need to add a > Py_TPFLAGS_HAVE_XXX type flag to signal those static type structures > that have been compiled against a recent enough PyTypeObject > definition. This way, extensions compiled against Python N-1 are > supposed to "still work": as they don't have Py_TPFLAGS_HAVE_XXX set, > the core Python runtime won't try to access the (non-existing) tp_XXX > member. > > However, beside internal code complication, it means you need to add a > new Py_TPFLAGS_HAVE_XXX each time we add a slot. Since we have only 32 > such bits available (many of them already taken), it is a very limited > resource. Is it worth it? (*) Can an extension compiled against Python > N-1 really claim to be compatible with Python N, despite other possible > differences? > > (*) we can't extend the tp_flags field to 64 bits, precisely because of > the binary compatibility problem... > > Regards > > Antoine. > > From random832 at fastmail.com Wed Dec 20 15:17:57 2017 From: random832 at fastmail.com (Random832) Date: Wed, 20 Dec 2017 15:17:57 -0500 Subject: [Python-Dev] Usefulness of binary compatibility accross Python versions? In-Reply-To: <20171218112331.2a08c2f7@fsol> References: <20171216142257.2a0c978c@fsol> <1513562822.11518.1208219672.2A3C0470@webmail.messagingengine.com> <20171218112331.2a08c2f7@fsol> Message-ID: <1513801077.175073.1211584368.235D85A2@webmail.messagingengine.com> On Mon, Dec 18, 2017, at 05:23, Antoine Pitrou wrote: > On Sun, 17 Dec 2017 21:07:02 -0500 > Random832 wrote: > > > > Is there any practical for of having the flag off for one slot and on > > for another slot that's been added later? > > > > Could this be replaced (that is, a slot for such a thing added before > > it's too late) with a simple counter that goes up with each version, and > > any "unused" slot should have NULL or some other sentinel value? 
> > Any replacement here would break binary compatibility, which is what > those flags are precisely meant to avoid. I meant replacing the mechanism for new fields, rather than existing ones. > > If it > > really is important to have the flags themselves, just add another set > > of flags - Py_TPFLAGS_HAVE_MORE_FLAGS. > > Yes, we could... but it's more complication again. Hmm, maybe that could be eased with macros... /* Doing this preprocessor trick because if we used a ternary * operator, dummy macros needed to prevent compile errors may * become an attractive nuisance */ #define Py_TPFLAG_CHK(tp, flagname) \ Py__TPFCHK_##flagname(tp, flagname) #define Py__TPFCHK_OLD(tp, flagname) \ ((tp).tp_flags & Py_TPFLAGS_##flagname) #define Py__TPFCHK_NEW(tp, flagname) \ ((tp).tp_flags & Py_TPFLAGS_TPFLAGVER \ && (tp).tp_flagver >= Py_TPFLAGVER_##flagname \ && Py_TPFLAGCHK_##flagname(tp)) #define Py__TPFCHK_HEAPTYPE Py__TPFCHK_OLD #define Py_TPFLAGS_HEAPTYPE (1UL<<9) #define Py__TPFCHK_TPFLAGVER Py__TPFCHK_OLD #define Py_TPFLAGS_TPFLAGVER (1UL<<31) #define Py__TPFCHK_NEWFLD Py__TPFCHK_NEW #define Py_TPFLAGVER_NEWFLD 32 #define Py_TPFLAGCHK_NEWFLD(tp) ((tp).tp_newfld != NULL) So to check heap type you get Py_TPFLAG_CHK(tp, HEAPTYPE) ((tp).tp_flags & (1UL<<9)) And to check newfld1 you get Py_TPFLAG_CHK(tp, NEWFLD) ((tp).tp_flags & (1UL<<31) && (tp).tp_flagver >= 32 && ((tp).tp_newfld != ((void *)0))) Or in a "more flags" scenario: #define Py__TPFCHK_NEW(tp, flagname) \ ((tp).tp_flags & Py_TPFLAGS_TPMOREFLAGS \ && (tp).tp_moreflag & PY_TPMOREFLAGS_##flagname) From steve at pearwood.info Wed Dec 20 18:50:03 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 21 Dec 2017 10:50:03 +1100 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? 
In-Reply-To: References: <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> <20171219035818.GF16230@ando.pearwood.info> <20171220103119.GE4215@ando.pearwood.info> Message-ID: <20171220235003.GG4215@ando.pearwood.info> On Wed, Dec 20, 2017 at 03:23:16PM -0800, Chris Barker wrote: > On Wed, Dec 20, 2017 at 2:31 AM, Steven D'Aprano > wrote: > > Even when it works, the guarantee is quite weak. For instance, even > > the object type is not preserved: > > > > py> class MyDict(dict): > > ... pass > > ... > > py> d = MyDict() > > py> x = eval(repr(d)) > > py> assert d == x > > py> assert type(d) == type(x) > > Traceback (most recent call last): > > File "", line 1, in > > AssertionError > > > > Oh come on! If you subclass, and don't override __repr__ -- you're written > a (very slightly) broken class (OK, a class with a broken __repr__). Why is it broken? Is it documented somewhere that every subclass MUST override __repr__? If there's a bug here, and I'm not sure that there is, the bug is in dict itself, for having a repr which isn't friendly to subclasses. But in practice, why would I care? Obviously sometimes I do care, and for debugging it is often good to have a custom repr for subclasses, but it isn't mandatory or even always useful. Especially since in practice, it isn't that common to round-trip repr though eval (apart from the REPL itself, of course). -- Steve From python-dev at mgmiller.net Wed Dec 20 18:57:19 2017 From: python-dev at mgmiller.net (Mike Miller) Date: Wed, 20 Dec 2017 15:57:19 -0800 Subject: [Python-Dev] Is static typing still optional? 
In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <3ECA48D2-90FB-4AED-B87C-251951ABCF7F@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> Message-ID: On 2017-12-19 02:53, Paul Moore wrote: > Also, the fact that no-one raised this issue during the whole time the > PEP was being discussed (at least as far as I recollect) and that > Guido (who of all of us should be most aware of what is and isn't > acceptable use of annotations in the stdlib) approved the PEP, > suggests to me that this isn't that big a deal. Hi, I asked about this in the first posting of the PEP and agree with Chris. https://mail.python.org/pipermail/python-dev/2017-September/149406.html There is definitely a passive bias towards using types with dataclasses in that the Eric (the author) doesn't appear to want an example without them in the pep/docs. It seems that typing proponents are sufficiently enamored with them that they can't imagine anyone else feeling differently, haha. Personally, I wouldn't use types with Python unless I was leading a large project with a large team of folks with different levels of experience. That's where types shine, and those folks might be better served by Java or Kotlin. So we hearing that "types are optional" while the docs may imply the opposite. Liked the ellipsis since None is often used as a sentinel value and an extra import is a drag. -Mike From eric at trueblade.com Wed Dec 20 20:13:55 2017 From: eric at trueblade.com (Eric V. Smith) Date: Wed, 20 Dec 2017 20:13:55 -0500 Subject: [Python-Dev] Is static typing still optional? 
In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <95799b46-94a3-d34b-34ba-2e37ba5779b5@trueblade.com> <3418511732122395686@unknownmsgid> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> Message-ID: <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> On 12/20/2017 6:57 PM, Mike Miller wrote: > On 2017-12-19 02:53, Paul Moore wrote: >> Also, the fact that no-one raised this issue during the whole time the >> PEP was being discussed (at least as far as I recollect) and that >> Guido (who of all of us should be most aware of what is and isn't >> acceptable use of annotations in the stdlib) approved the PEP, >> suggests to me that this isn't that big a deal. > > > Hi, I asked about this in the first posting of the PEP and agree with > Chris. > > > https://mail.python.org/pipermail/python-dev/2017-September/149406.html > > > There is definitely a passive bias towards using types with dataclasses > in that the Eric (the author) doesn't appear to want an example without > them in the pep/docs. I'm not sure what such an example would look like. Do you mean without annotations? Or do you mean without specifying the "correct" type, like: @dataclass class C: x: int = 'hello world' ? Or something else? Can you provide an example of what you'd like to see? > It seems that typing proponents are sufficiently enamored with them that > they can't imagine anyone else feeling differently, haha. I've never used typing or mypy, so you're not talking about me. I do like the conciseness that annotations bring to dataclasses, though. If you buy that (and you might not), then I don't see the point of not using a correct type annotation. Eric. From eric at trueblade.com Wed Dec 20 20:29:41 2017 From: eric at trueblade.com (Eric V. Smith) Date: Wed, 20 Dec 2017 20:29:41 -0500 Subject: [Python-Dev] Is static typing still optional? 
In-Reply-To: <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <3418511732122395686@unknownmsgid>
 <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu>
 <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
Message-ID: 

On 12/20/2017 8:13 PM, Eric V. Smith wrote:
>> There is definitely a passive bias towards using types with
>> dataclasses in that the Eric (the author) doesn't appear to want an
>> example without them in the pep/docs.
>
> I'm not sure what such an example would look like. Do you mean without
> annotations? Or do you mean without specifying the "correct" type, like:
>
> @dataclass
> class C:
>     x: int = 'hello world'
>
> ?
>
> Or something else?
>
> Can you provide an example of what you'd like to see?

Re-reading my post you referenced, is it just an example using
typing.Any? I'm okay with that in the docs, I just didn't want to focus
on it in the PEP. I want the PEP to only have the one reference to
typing, for typing.ClassVar. I figure the people reading the PEP can
extrapolate to all of the possible uses for annotations that they don't
need to see a typing.Any example.

Eric.

From chris.barker at noaa.gov Thu Dec 21 01:46:01 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Wed, 20 Dec 2017 22:46:01 -0800
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To: 
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <3418511732122395686@unknownmsgid>
 <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu>
 <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
Message-ID: 

On Wed, Dec 20, 2017 at 5:29 PM, Eric V. Smith wrote:

> There is definitely a passive bias towards using types with dataclasses in
> that the Eric (the author) doesn't appear to want an example without them
> in the pep/docs.
>
>> I'm not sure what such an example would look like. Do you mean without
>> annotations?

IIUC, there is no way to make a dataclass without annotations, yes?
That is, using annotations to determine the fields is the one and only
way the decorator works. So it's impossible to give an example without
annotations, yes?

> Or do you mean without specifying the "correct" type, like:
>
>> @dataclass
>> class C:
>>     x: int = 'hello world'

It may be a good idea to have an example like that in the docs (but
probably not the PEP) to make it clear that the type is not used in any
way at run time. But I don't think that anyone is suggesting that would
be a recommended practice.

I suggest that it be clear in the docs, and ideally in the PEP, that the
dataclass decorator is using the *annotation* syntax, and that the only
relevant part it uses is that an annotation exists, but the value of the
annotation is essentially (completely?) ignored. So we should have
examples like:

@dataclass
class C:
    a: ...  # field with no default
    b: ... = 0  # field with a default value

Then maybe:

@dataclass
class C:
    a: "the a parameter"  # field with no default
    b: "another, different parameter" = 0.0  # field with a default

Then the docs can go on to say that if the user wants to specify a type
for use with a static type-checking pre-processor, they can do it like so:

@dataclass
class C:
    a: int  # integer field with no default
    b: float = 0.0  # float field with a default

And the types will be recognized by type checkers such as mypy.

And I think the non-typed examples should go first in the docs.

This is completely analogous to how all the other parts of Python are
taught. Would anyone suggest that the very first example of a function
definition that a newbie sees would be:

def func(a: int, b: float = 0.0):
    body_of_function

Then, _maybe_ way down on the page, you mention that oh, by the way,
those types are completely ignored by Python. And not even give any
examples without types?

> Re-reading my post you referenced, is it just an example using typing.Any?
I actually think that is exactly the wrong point -- typing.Any is still using type hinting -- it's an explicit way to say, "any type will do", but it's only relevant if you are using a type checker. We really need examples for folks that don't know or care about type hinting at all. typing.Any is for use by people that are explicitly adding type hinting, and should be discussed in type hinting documentation. > I'm okay with that in the docs, I just didn't want to focus on it in the PEP. I want the PEP to only > have the one reference to typing, for typing.ClassVar. I figure the people reading the PEP can > extrapolate to all of the possible uses for annotations that they don't need to see a typing.Any > example. no they don't, but they DO need to see examples without type hints at all. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Thu Dec 21 04:22:35 2017 From: eric at trueblade.com (Eric V. Smith) Date: Thu, 21 Dec 2017 04:22:35 -0500 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> Message-ID: <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> On 12/21/2017 1:46 AM, Chris Barker wrote: > On Wed, Dec 20, 2017 at 5:29 PM, Eric V. Smith > wrote: > > There is definitely a passive bias towards using types with > dataclasses in that the Eric (the author) doesn't appear to want an > example without them in the pep/docs. > > > I'm not sure what such an example would look like. Do you mean > without annotations? > > > IIUC, there is not way to make a dataclass without annotations, yes? 
> That is, using annotations to determine the fields is the one and only
> way the decorator works. So it's impossible to give an example without
> annotations, yes?

Correct. Well, you will be able to use make_dataclass() without type
information after I fix bpo-32278, but most users won't be using that.

> I suggest that it be clear in the docs, and ideally in the PEP, that the
> dataclass decorator is using the *annotation* syntax, and that the
> only relevant part it uses is that an annotation exists, but the value
> of the annotation is essentially (completely?) ignored.

I think the PEP is very clear about this: "The dataclass decorator
examines the class to find fields. A field is defined as any variable
identified in __annotations__. That is, a variable that has a type
annotation. With two exceptions described below, none of the Data Class
machinery examines the type specified in the annotation."

I agree the docs should also be clear about this.

> So we should have examples like:
>
> @dataclass
> class C:
>     a: ...  # field with no default
>     b: ... = 0  # field with a default value
>
> Then maybe:
>
> @dataclass
> class C:
>     a: "the a parameter"  # field with no default
>     b: "another, different parameter" = 0.0  # field with a default
>
> Then the docs can go on to say that if the user wants to specify a type for
> use with a static type-checking pre-processor, they can do it like so:
>
> @dataclass
> class C:
>     a: int  # integer field with no default
>     b: float = 0.0  # float field with a default
>
> And the types will be recognized by type checkers such as mypy.
>
> And I think the non-typed examples should go first in the docs.

I'll leave this for others to decide. The docs, and how approachable
they are to various audiences, isn't my area of expertise.

> This is completely analogous to how all the other parts of python are
Would anyone suggest that the very first example of a function > definition that a newbie sees would be: > > def func(a: int, b:float = 0.0): > ??? body_of_function > > Then, _maybe_ way down on the page, you mention that oh, by the way, > those types are completely ignored by Python. And not even give any > examples without types? > > > > Re-reading my post you referenced, is it just an example using > typing.Any? > > I actually think that is exactly the wrong point -- typing.Any is still > using type hinting -- it's an explicit way to say, "any type will do", > but it's only relevant if you are using a type checker. We really need > examples for folks that don't know or care about type hinting at all. > > typing.Any is for use by people that are explicitly adding type hinting, > and should be discussed in type hinting documentation. > > >? I'm okay with that in the docs, I just didn't want to focus on it in > the PEP. I want the PEP to only > > have the one reference to typing, for typing.ClassVar. I figure the > people reading the PEP can > > extrapolate to all of the possible uses for annotations that they > don't need to see a typing.Any > > example. > > no they don't, but they DO need to see examples without type hints at all. I'm not opposed to this in the documentation. Maybe we should decide on a convention on what to use to convey "don't care". I've seen typing.Any, None, ellipsis, strings, etc. all used. Eric. From tjreedy at udel.edu Thu Dec 21 05:22:27 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 21 Dec 2017 05:22:27 -0500 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> Message-ID: On 12/21/2017 4:22 AM, Eric V. 
Smith wrote:
> On 12/21/2017 1:46 AM, Chris Barker wrote:
>> I suggest that it be clear in the docs, and ideally in the PEP, that
>> the dataclass decorator is using the *annotation* syntax, and that the
>> only relevant part it uses is that an annotation exists, but the
>> value of the annotation is essentially (completely?) ignored.
>
> I think the PEP is very clear about this: "The dataclass decorator
> examines the class to find fields. A field is defined as any variable
> identified in __annotations__. That is, a variable that has a type
> annotation. With two exceptions described below, none of the Data Class
> machinery examines the type specified in the annotation."

This seems clear enough. It could come after describing what a dataclass *is*.

> I agree the docs should also be clear about this.

>> So we should have examples like:
>>
>> @dataclass
>> class C:
>>     a: ...  # field with no default
>>     b: ... = 0  # field with a default value
>>
>> Then maybe:
>>
>> @dataclass
>> class C:
>>     a: "the a parameter"  # field with no default
>>     b: "another, different parameter" = 0.0  # field with a default
>>
>> Then the docs can go on to say that if the user wants to specify a type
>> for use with a static type-checking pre-processor, they can do it like
>> so:
>>
>> @dataclass
>> class C:
>>     a: int  # integer field with no default
>>     b: float = 0.0  # float field with a default
>>
>> And the types will be recognized by type checkers such as mypy.
>>
>> And I think the non-typed examples should go first in the docs.

Modulo some bike-shedding, the above seems pretty good to me.

> I'll leave this for others to decide. The docs, and how approachable
> they are to various audiences, isn't my area of expertise.

-- 
Terry Jan Reedy

From srkunze at mail.de Thu Dec 21 06:25:41 2017
From: srkunze at mail.de (Sven R. Kunze)
Date: Thu, 21 Dec 2017 12:25:41 +0100
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> Message-ID: <7f6de77e-d27f-742d-cab0-6c9fdf66f541@mail.de> On 21.12.2017 11:22, Terry Reedy wrote: > >>> @dataclass >>> class C: >>> ???? a: int # integer field with no default >>> ???? b: float = 0.0 # float field with a default >>> >>> And the types will be recognized by type checkers such as mypy. >>> >>> And I think the non-typed examples should go first in the docs. > I still don't understand why "I don't care" can be defined by "leaving out" @dataclass class C: ???? b = 0.0 # float field with a default For non-default fields, I like ellipsis too. Cheer, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Thu Dec 21 06:36:22 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Thu, 21 Dec 2017 12:36:22 +0100 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> Message-ID: On 21 December 2017 at 11:22, Terry Reedy wrote: > On 12/21/2017 4:22 AM, Eric V. Smith wrote: > >> On 12/21/2017 1:46 AM, Chris Barker wrote: >> > > I suggest that it be clear in the docs, and ideally in the PEP, that the >>> dataclass decorator is using the *annotation" syntax, and that the the only >>> relevant part it uses is that an annotation exists, but the value of the >>> annotation is essentially (completely?) ignored. >>> >> >> I think the PEP is very clear about this: "The dataclass decorator >> examines the class to find fields. A field is defined as any variable >> identified in __annotations__. That is, a variable that has a type >> annotation. 
With two exceptions described below, none of the Data Class >> machinery examines the type specified in the annotation." >> > > This seems clear enough. It could come after describing what a dataclass > *is*. > > I agree the docs should also be clear about this. >> > > > So we should have examples like: >>> >>> @dataclass >>> class C: >>> a: ... # field with no default >>> b: ... = 0 # filed with a default value >>> >>> Then maybe: >>> >>> @dataclass >>> class C: >>> a: "the a parameter" # field with no default >>> b: "another, different parameter" = 0.0 # field with a default >>> >>> Then the docs can go to say that if the user wants to specify a type for >>> use with a static type checking pre-processor, they can do it like so: >>> >>> @dataclass >>> class C: >>> a: int # integer field with no default >>> b: float = 0.0 # float field with a default >>> >>> And the types will be recognized by type checkers such as mypy. >>> >>> And I think the non-typed examples should go first in the docs. >>> >> > Module some bike-shedding, the above seems pretty good to me. > For me, the three options for "don't care" have a bit different meaning: * typing.Any: this class is supposed to be used with static type checkers, but this field is too dynamic * ... (ellipsis): this class may or may not be used with static type checkers, use the inferred type in the latter case * "field docstring": this class should not be used with static type checkers Assuming this, the second option would be the "real" "don't care". If this makes sense, then we can go the way proposed in https://github.com/python/typing/issues/276 and make ellipsis semantics "official" in PEP 484. (pending Guido's approval) -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Thu Dec 21 09:23:12 2017 From: eric at trueblade.com (Eric V. Smith) Date: Thu, 21 Dec 2017 09:23:12 -0500 Subject: [Python-Dev] Is static typing still optional? 
In-Reply-To: <7f6de77e-d27f-742d-cab0-6c9fdf66f541@mail.de> References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> <7f6de77e-d27f-742d-cab0-6c9fdf66f541@mail.de> Message-ID: <5f2f3c20-ba10-96fc-ae7a-438e5d50a7b0@trueblade.com> On 12/21/17 6:25 AM, Sven R. Kunze wrote: > On 21.12.2017 11:22, Terry Reedy wrote: >> >>>> @dataclass >>>> class C: >>>> a: int # integer field with no default >>>> b: float = 0.0 # float field with a default >>>> >>>> And the types will be recognized by type checkers such as mypy. >>>> >>>> And I think the non-typed examples should go first in the docs. >> > > I still don't understand why "I don't care" can be defined by "leaving out" > > @dataclass > class C: > b = 0.0 # float field with a default Because you can't know the order that x and y are defined in this example: class C: x: int y = 0 'x' is not in C.__dict__, and 'y' is not in C.__annotations__. Someone will suggest a metaclass, but that has its own problems. Mainly, interfering with other metaclasses. Eric. > > > For non-default fields, I like ellipsis too. > > Cheer, > Sven > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/eric%2Ba-python-dev%40trueblade.com > From tjreedy at udel.edu Thu Dec 21 14:55:17 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 21 Dec 2017 14:55:17 -0500 Subject: [Python-Dev] Is static typing still optional? 
In-Reply-To: <5f2f3c20-ba10-96fc-ae7a-438e5d50a7b0@trueblade.com>
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu>
 <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
 <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com>
 <7f6de77e-d27f-742d-cab0-6c9fdf66f541@mail.de>
 <5f2f3c20-ba10-96fc-ae7a-438e5d50a7b0@trueblade.com>
Message-ID: 

On 12/21/2017 9:23 AM, Eric V. Smith wrote:
> On 12/21/17 6:25 AM, Sven R. Kunze wrote:
>> On 21.12.2017 11:22, Terry Reedy wrote:
>>>>> @dataclass
>>>>> class C:
>>>>>     a: int  # integer field with no default
>>>>>     b: float = 0.0  # float field with a default
>>>>>
>>>>> And the types will be recognized by type checkers such as mypy.
>>>>>
>>>>> And I think the non-typed examples should go first in the docs.
>>
>> I still don't understand why "I don't care" can be defined by "leaving
>> out"
>>
>> @dataclass
>> class C:
>>     b = 0.0  # float field with a default
>
> Because you can't know the order that x and y are defined in this example:
>
> class C:
>     x: int
>     y = 0
>
> 'x' is not in C.__dict__, and 'y' is not in C.__annotations__.

I think the understanding problem with this feature arises from two
factors: using annotations to define possibly un-initialized slots is
non-obvious; a new use of annotations for something other than static
typing is a bit of a reversal of the recent pronouncement 'annotations
should only be used for static typing'. Therefore, getting the
permanent doc 'right' is important.

The following naively plausible alternative does not work and cannot
sensibly be made to work because the bare 'x' in the class scope, as
opposed to a similar error within a method, causes NameError before the
class is created.

@dataclass
class C:
    x
    y = 0

I think the doc should explicitly say that uninitialized fields require
annotation with something (anything, not necessarily a type) simply to
avoid NameError during class creation.
It may not be obvious to some readers why x:'anything' does not also
raise NameError, but that was a different PEP, and the dataclass doc
could here link to wherever name:annotation in bodies is explained.

-- 
Terry Jan Reedy

From steve at holdenweb.com Thu Dec 21 16:46:03 2017
From: steve at holdenweb.com (Steve Holden)
Date: Thu, 21 Dec 2017 21:46:03 +0000
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To: 
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu>
 <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
 <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com>
 <7f6de77e-d27f-742d-cab0-6c9fdf66f541@mail.de>
 <5f2f3c20-ba10-96fc-ae7a-438e5d50a7b0@trueblade.com>
Message-ID: 

On Thu, Dec 21, 2017 at 7:55 PM, Terry Reedy wrote:

> On 12/21/2017 9:23 AM, Eric V. Smith wrote:
>> On 12/21/17 6:25 AM, Sven R. Kunze wrote:
>>> On 21.12.2017 11:22, Terry Reedy wrote:
>>>>>> @dataclass
>>>>>> class C:
>>>>>>     a: int  # integer field with no default
>>>>>>     b: float = 0.0  # float field with a default
>>>>>>
>>>>>> And the types will be recognized by type checkers such as mypy.
>>>>>>
>>>>>> And I think the non-typed examples should go first in the docs.
>>>
>>> I still don't understand why "I don't care" can be defined by "leaving
>>> out"
>>>
>>> @dataclass
>>> class C:
>>>     b = 0.0  # float field with a default
>>
>> Because you can't know the order that x and y are defined in this example:
>>
>> class C:
>>     x: int
>>     y = 0
>>
>> 'x' is not in C.__dict__, and 'y' is not in C.__annotations__.

Solely because, annotations being optional, the interpreter is not
allowed to infer from its presence that an annotated name should be
allocated an entry in __dict__, and clearly the value associated with it
would be problematical.
> I think the understanding problem with this feature arises from two
> factors: using annotations to define possibly un-initialized slots is
> non-obvious; a new use of annotations for something other than static
> typing is a bit of a reversal of the recent pronouncement 'annotations
> should only be used for static typing'. Therefore, getting the permanent
> doc 'right' is important.

Indeed. So annotations are optional, except where they aren't?

> The following naively plausible alternative does not work and cannot
> sensibly be made to work because the bare 'x' in the class scope, as
> opposed to a similar error within a method, causes NameError before the
> class is created.
>
> @dataclass
> class C:
>     x
>     y = 0

Quite. Could this be handled the same way not-yet initialised slots are?
(Pardon my ignorance.)

> I think the doc should explicitly say that uninitialized fields require
> annotation with something (anything, not necessarily a type) simply to
> avoid NameError during class creation. It may not be obvious to some
> readers why x:'anything' does not also raise NameError, but that was a
> different PEP, and the dataclass doc could here link to wherever
> name:annotation in bodies is explained.

This contortion is why I feel a better solution would be desirable. Alas
I do not have one to hand.

regards
Steve

From chris.barker at noaa.gov Thu Dec 21 17:45:48 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 21 Dec 2017 14:45:48 -0800
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To: 
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu>
 <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
 <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com>
 <7f6de77e-d27f-742d-cab0-6c9fdf66f541@mail.de>
 <5f2f3c20-ba10-96fc-ae7a-438e5d50a7b0@trueblade.com>
Message-ID: 

On Thu, Dec 21, 2017 at 11:55 AM, Terry Reedy wrote:

> I think the understanding problem with this feature arises from two
> factors: using annotations to define possibly un-initialized slots
> is non-obvious; a new use of annotations for something other than
> static typing is a bit of a reversal of the recent pronouncement
> 'annotations should only be used for static typing'.

you know, that may be where part of my confusion came from -- all the
talk lately has been about "type hints" and "type annotations" -- the
idea of "arbitrary annotations" has been lost.

> Therefore, getting the permanent doc 'right' is important.

yup.

> @dataclass
> class C:
>     x
>     y = 0
>
> I think the doc should explicitly say that uninitialized fields
> require annotation with something (anything, not necessarily a type)
> simply to avoid NameError during class creation.

would this be possible?

@dataclass
class C:
    x:
    y: = 0

That is -- the colon indicates an annotation, but in this case, it's a
"nothing" annotation.

It's a syntax error now, but would it be possible to change that? Or
would the parsing be ambiguous? particularly in other contexts.

of course, then we'd need something to store in as a "nothing"
annotation -- empty string? None? (but None might mean something) create
yet another type for "nothing_annotation"

Hmm, I may have talked myself out of it....

-CHB

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From python at mrabarnett.plus.com  Thu Dec 21 18:10:35 2017
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 21 Dec 2017 23:10:35 +0000
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To: 
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
 <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com>
 <7f6de77e-d27f-742d-cab0-6c9fdf66f541@mail.de>
 <5f2f3c20-ba10-96fc-ae7a-438e5d50a7b0@trueblade.com>
Message-ID: <410a4b16-c8da-dec8-f172-f00a6fc2c2da@mrabarnett.plus.com>

On 2017-12-21 22:45, Chris Barker wrote:
> On Thu, Dec 21, 2017 at 11:55 AM, Terry Reedy wrote:
>
>     I think the understanding problem with this feature arises from two
>     factors: using annotations to define possibly un-initialized slots
>     is non-obvious; a new use of annotations for something other than
>     static typing is a bit of a reversal of the recent pronouncement
>     'annotations should only be used for static typing'.
>
> you know, that may be where part of my confusion came from -- all the
> talk lately has been about "type hints" and "type annotations" -- the
> idea of "arbitrary annotations" has been lost.
>
>     Therefore, getting the permanent doc 'right' is important.
>
> yup.
>
>     @dataclass
>     class C:
>         x
>         y = 0
>
>     I think the doc should explicitly say that uninitialized fields
>     require annotation with something (anything, not necessarily a type)
>     simply to avoid NameError during class creation.
>
> would this be possible?
>
> @dataclass
> class C:
>     x:
>     y: = 0
>
> That is -- the colon indicates an annotation, but in this case, it's a
> "nothing" annotation.
>
"..." or "pass", perhaps?

@dataclass
class C:
    x: ...
    y: ... = 0

or:

@dataclass
class C:
    x: pass
    y: pass = 0

> It's a syntax error now, but would it be possible to change that? Or
> would the parsing be ambiguous? particularly in other contexts.
>
> of course, then we'd need something to store as a "nothing"
> annotation -- empty string? None? (but None might mean something) create
> yet another type for "nothing_annotation"
>
> Hmm, I may have talked myself out of it....
>

From greg at krypto.org  Thu Dec 21 18:21:30 2017
From: greg at krypto.org (Gregory P. Smith)
Date: Thu, 21 Dec 2017 23:21:30 +0000
Subject: [Python-Dev] pep-0557 dataclasses top level module vs part of
 collections?
Message-ID: 

It seems a suggested use is "from dataclasses import dataclass"

But people are already familiar with "from collections import namedtuple"
which suggests to me that "from collections import dataclass" would be a
more natural sounding API addition.

But the dataclasses module has additional APIs beyond @dataclass which
clearly do not belong at the top level in collections.

Idea: How about making the current dataclasses.dataclass decorator
function instead be a callable class instance (ie: it still functions
properly; today's dataclasses.dataclass becomes
collections.dataclass.__call__) with all of the current contents of the
dataclasses module as attributes of a collections.dataclass
class/instance singleton?

It feels like a more natural API to me:

from collections import dataclass

@dataclass
class ...

and the following APIs show up on dataclass itself:

dataclass.Field, dataclass.field, dataclass.fields, dataclass.make,
dataclass.astuple, dataclass.replace, dataclass.asdict,
dataclass.FrozenInstanceError, dataclass.InitVar

instead of being in a separate dataclasses module and being a different
style of thing to import than namedtuple.

[ if this was discussed earlier for this pep and rejected and I missed it,
my apologies, just drop me a reference to that thread if you've got one ]

This isn't a blocker for me.
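The callable-singleton idea above is easy to prototype against the dataclasses module as it stands. This is a hypothetical sketch only, not anything PEP 557 specifies; the `_DataclassNamespace` name and the particular attribute selection are invented for illustration:

```python
import dataclasses

class _DataclassNamespace:
    """Hypothetical sketch: the @dataclass decorator as a callable
    singleton that also carries the rest of the dataclasses API as
    attributes, so it could live at e.g. collections.dataclass."""

    # Re-export the module's helpers as attributes of the singleton.
    field = staticmethod(dataclasses.field)
    fields = staticmethod(dataclasses.fields)
    asdict = staticmethod(dataclasses.asdict)
    astuple = staticmethod(dataclasses.astuple)
    replace = staticmethod(dataclasses.replace)
    Field = dataclasses.Field
    FrozenInstanceError = dataclasses.FrozenInstanceError

    def __call__(self, cls=None, **kwargs):
        # Support both bare @dataclass and parameterized @dataclass(...)
        # usage by delegating to the real decorator.
        if cls is None:
            return lambda c: dataclasses.dataclass(c, **kwargs)
        return dataclasses.dataclass(cls, **kwargs)

dataclass = _DataclassNamespace()

@dataclass
class Point:
    x: int
    y: int = 0

p = Point(1)
print(dataclass.asdict(p))   # {'x': 1, 'y': 0}
print(dataclass.astuple(p))  # (1, 0)
```

As the thread goes on to note, the dataclasses API turned out to be large enough that a separate top-level module won out over hanging everything off one collections attribute.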
I like having a dataclass implementation no matter how we arrange it. If
we go with what's checked in today, a top level dataclasses module, so be
it. I'm not going to bikeshed this to death; it just feels odd to have
such an API outside of collections, but figured it was worth suggesting.
Part of me just doesn't like the plural dataclasses module name. I can get
over that.

-gps
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From greg at krypto.org  Thu Dec 21 18:36:08 2017
From: greg at krypto.org (Gregory P. Smith)
Date: Thu, 21 Dec 2017 23:36:08 +0000
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To: <410a4b16-c8da-dec8-f172-f00a6fc2c2da@mrabarnett.plus.com>
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
 <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com>
 <7f6de77e-d27f-742d-cab0-6c9fdf66f541@mail.de>
 <5f2f3c20-ba10-96fc-ae7a-438e5d50a7b0@trueblade.com>
 <410a4b16-c8da-dec8-f172-f00a6fc2c2da@mrabarnett.plus.com>
Message-ID: 

On Thu, Dec 21, 2017 at 3:10 PM MRAB wrote:

> On 2017-12-21 22:45, Chris Barker wrote:
> > On Thu, Dec 21, 2017 at 11:55 AM, Terry Reedy wrote:
> >
> >     I think the understanding problem with this feature arises from two
> >     factors: using annotations to define possibly un-initialized slots
> >     is non-obvious; a new use of annotations for something other than
> >     static typing is a bit of a reversal of the recent pronouncement
> >     'annotations should only be used for static typing'.
> >
> > you know, that may be where part of my confusion came from -- all the
> > talk lately has been about "type hints" and "type annotations" -- the
> > idea of "arbitrary annotations" has been lost.
> >
> >     Therefore, getting the permanent doc 'right' is important.
> >
> > yup.
> >
> >     @dataclass
> >     class C:
> >         x
> >         y = 0
> >
> >     I think the doc should explicitly say that uninitialized fields
> >     require annotation with something (anything, not necessarily a type)
> >     simply to avoid NameError during class creation.
> >
> > would this be possible?
> >
> > @dataclass
> > class C:
> >     x:
> >     y: = 0
> >
> > That is -- the colon indicates an annotation, but in this case, it's a
> > "nothing" annotation.
> >
> "..." or "pass", perhaps?
>
> @dataclass
> class C:
>     x: ...
>     y: ... = 0
>
> or:
>
> @dataclass
> class C:
>     x: pass
>     y: pass = 0

pass does not currently parse in that context. Otherwise I was thinking
the same thing. But we already have ... which does - so I'd suggest that
for people who are averse to importing anything from typing and using the
also quite readable Any. (ie: document this as the expected practice with
both having the same meaning)

While I consider the annotation to be a good feature of data classes, it
seems worth documenting that people not running a type analyzer should
avoid declaring a type. A worse thing than no-type being specified is a
wrong type being specified. That appearing in a library will break people
who need their code to pass the analyzer, and pytype, mypy, et al. could
be forced to implement a typeshed.pypi of sorts containing blacklists of
known bad annotations in public libraries and/or actually correct type
specification overrides for them.

As for problems with order, if we were to accept

@dataclass
class Spam:
    beans = True
    ham: bool

style instead, would it be objectionable to require keyword arguments
only for dataclass __init__ methods? That'd get rid of the need to care
about order. (but would annoy people with small 2-3 element data
classes... so I'm assuming this idea is already rejected)

-gps

> > It's a syntax error now, but would it be possible to change that? Or
> > would the parsing be ambiguous? particularly in other contexts.
> > of course, then we'd need something to store as a "nothing"
> > annotation -- empty string? None? (but None might mean something) create
> > yet another type for "nothing_annotation"
> >
> > Hmm, I may have talked myself out of it....
> >
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/greg%40krypto.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chris.barker at noaa.gov  Thu Dec 21 19:07:46 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 21 Dec 2017 16:07:46 -0800
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To: 
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
 <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com>
 <7f6de77e-d27f-742d-cab0-6c9fdf66f541@mail.de>
 <5f2f3c20-ba10-96fc-ae7a-438e5d50a7b0@trueblade.com>
 <410a4b16-c8da-dec8-f172-f00a6fc2c2da@mrabarnett.plus.com>
Message-ID: 

On Thu, Dec 21, 2017 at 3:36 PM, Gregory P. Smith wrote:

> But we already have ... which does - so I'd suggest that for people who
> are averse to importing anything from typing and using the also quite
> readable Any. (ie: document this as the expected practice with both having
> the same meaning)

I don't think they do, actually - I haven't been following the typing
discussions, but someone in this thread said that ... means "use the type
of the default" or something like that.

> While I consider the annotation to be a good feature of data classes, it
> seems worth documenting that people not running a type analyzer should
> avoid declaring a type.

+1 !

> A worse thing than no-type being specified is a wrong type being
> specified. That appearing in a library will break people who need their
> code to pass the analyzer and pytype, mypy, et al.
> could be forced to
> implement a typeshed.pypi of sorts containing blacklists of known bad
> annotations in public libraries and/or actually correct type specification
> overrides for them.

and the wrong type could be very common -- folks using "int", when float
would do just fine, or "list" when any iterable would do, the list goes on
and on. Typing is actually pretty complex in Python -- it's hard to get
right, and if you aren't actually running a type checker, you'd never know.

One challenge here is that annotations, per se, aren't only for typing.
But it would be nice if a type checker could see whatever "non-type" is
recommended for dataclasses as "type not specified". Does an ellipsis
spell that? or None? or anything that doesn't have to be imported from
typing :-)

> As for problems with order, if we were to accept
>
> @dataclass
> class Spam:
>     beans = True
>     ham: bool
>
> style instead, would it be objectionable to require keyword arguments only
> for dataclass __init__ methods? That'd get rid of the need to care about
> order.

wouldn't that make the "ham: bool" legal -- i.e. no default?

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From greg at krypto.org  Thu Dec 21 19:19:29 2017
From: greg at krypto.org (Gregory P. Smith)
Date: Fri, 22 Dec 2017 00:19:29 +0000
Subject: [Python-Dev] is typing optional in dataclasses?
In-Reply-To: 
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
 <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com>
 <7f6de77e-d27f-742d-cab0-6c9fdf66f541@mail.de>
 <5f2f3c20-ba10-96fc-ae7a-438e5d50a7b0@trueblade.com>
 <410a4b16-c8da-dec8-f172-f00a6fc2c2da@mrabarnett.plus.com>
Message-ID: 

(subject for this sub-thread updated)

On Thu, Dec 21, 2017 at 4:08 PM Chris Barker wrote:

> On Thu, Dec 21, 2017 at 3:36 PM, Gregory P. Smith wrote:
>
>> But we already have ... which does - so I'd suggest that for people who
>> are averse to importing anything from typing and using the also quite
>> readable Any. (ie: document this as the expected practice with both having
>> the same meaning)
>
> I don't think they do, actually - I haven't been following the typing
> discussions, but someone in this thread said that ... means "use the type
> of the default" or something like that.

indeed, they may not. though if that is the definition, is it reasonable
to say that type analyzers recognize the potential recursive meaning when
the _default_ is ... and treat that as Any?

another option that crossed my mind was "a: 10" without using =. but that
really abuses __annotations__ by sticking the default value in there,
which the @dataclass decorator would presumably immediately need to undo
and fix up before returning the class. but I don't find assigning a value
without an = sign to be pythonic, so please let's not do that! :)

>> While I consider the annotation to be a good feature of data classes, it
>> seems worth documenting that people not running a type analyzer should
>> avoid declaring a type.
>
> +1 !
>
>> A worse thing than no-type being specified is a wrong type being
>> specified. That appearing in a library will break people who need their
>> code to pass the analyzer and pytype, mypy, et al.
>> could be forced to
>> implement a typeshed.pypi of sorts containing blacklists of known bad
>> annotations in public libraries and/or actually correct type specification
>> overrides for them.
>
> and the wrong type could be very common -- folks using "int", when float
> would do just fine, or "list" when any iterable would do, the list goes on
> and on. Typing is actually pretty complex in Python -- it's hard to get
> right, and if you aren't actually running a type checker, you'd never know.

Yeah, that is true. int vs float vs Number, etc. It suggests we shouldn't
worry about this problem at all for the PEP 557 dataclasses
implementation. Type analyzers by that definition are going to need to
deal with incorrect annotations in data classes as a result no matter
what, so they'll deal with that regardless of how we say dataclasses
should work.

-gps

> One challenge here is that annotations, per se, aren't only for typing.
> But it would be nice if a type checker could see whatever "non-type" is
> recommended for dataclasses as "type not specified". Does an ellipsis
> spell that? or None? or anything that doesn't have to be imported from
> typing :-)
>
> As for problems with order, if we were to accept
>
>> @dataclass
>> class Spam:
>>     beans = True
>>     ham: bool
>>
>> style instead, would it be objectionable to require keyword arguments
>> only for dataclass __init__ methods? That'd get rid of the need to care
>> about order.
>
> wouldn't that make the "ham: bool" legal -- i.e. no default?
>
> -CHB
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From python at mrabarnett.plus.com Thu Dec 21 19:55:18 2017 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 22 Dec 2017 00:55:18 +0000 Subject: [Python-Dev] is typing optional in dataclasses? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> <7f6de77e-d27f-742d-cab0-6c9fdf66f541@mail.de> <5f2f3c20-ba10-96fc-ae7a-438e5d50a7b0@trueblade.com> <410a4b16-c8da-dec8-f172-f00a6fc2c2da@mrabarnett.plus.com> Message-ID: <7f6c352b-efac-5322-9320-13b00478e133@mrabarnett.plus.com> On 2017-12-22 00:19, Gregory P. Smith wrote: > (subject for this sub-thread updated) > > On Thu, Dec 21, 2017 at 4:08 PM Chris Barker > wrote: > > On Thu, Dec 21, 2017 at 3:36 PM, Gregory P. Smith > wrote: > > ?But we already have ... which does - so I'd suggest that for > people who are averse to importing anything from typing and > using the also quite readable Any.? (ie: document this as the > expected practice with both having the same meaning) > > > I don't think they do, actually - I haven't been following the > typing discussions, but someone in this thread said that ... means > "use the type of teh default" or something like that. > > > indeed, they may not.? though if that is the definition is it > reasonable to say that type analyzers recognize the potential > recursive meaning when the _default_ is ... and treat that as Any? > > another option that crossed my mind was "a: 10" without using =.? but > that really abuses __attributes__ by sticking the default value in > there which the @dataclass decorator would presumably immediately need > to undo and fix up before returning the class.? but I don't find > assigning a value without an = sign to be pythonic so please lets not > do that! :) > If you allowed "a: 10" (an int value), then you might also allow "a: 'foo'" (a string value), but wouldn't that be interpreted as a type called "foo"? 
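The question under discussion, what happens when an annotation is an arbitrary object like 10 or 'foo', can be checked empirically: at runtime Python evaluates whatever expression follows the colon and stores the result in `__annotations__`, and the @dataclass decorator only checks that a name is annotated at all. A hypothetical illustration (a type checker would reject all three annotations):

```python
import dataclasses

@dataclasses.dataclass
class C:
    a: 10           # an int object as the "annotation" -- legal at runtime
    b: 'foo'        # a string; a type checker would read this as a forward
                    #   reference to a (nonexistent) type named foo
    c: ... = 0      # Ellipsis as a "don't care" annotation, with a default

# The raw objects are stored as-is in __annotations__:
print(C.__annotations__)                        # {'a': 10, 'b': 'foo', 'c': Ellipsis}
# ...and all three names became fields with a generated __init__:
print([f.name for f in dataclasses.fields(C)])  # ['a', 'b', 'c']
print(C(1, 2))                                  # C(a=1, b=2, c=0)
```

Note that under PEP 563's `from __future__ import annotations`, the stored values would all become strings instead, which is the complication Eric Smith raises below.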
If you can't have a string value, then you shouldn't have an int value either. [snip] From eric at trueblade.com Thu Dec 21 20:15:20 2017 From: eric at trueblade.com (Eric V. Smith) Date: Thu, 21 Dec 2017 20:15:20 -0500 Subject: [Python-Dev] is typing optional in dataclasses? In-Reply-To: <7f6c352b-efac-5322-9320-13b00478e133@mrabarnett.plus.com> References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> <7f6de77e-d27f-742d-cab0-6c9fdf66f541@mail.de> <5f2f3c20-ba10-96fc-ae7a-438e5d50a7b0@trueblade.com> <410a4b16-c8da-dec8-f172-f00a6fc2c2da@mrabarnett.plus.com> <7f6c352b-efac-5322-9320-13b00478e133@mrabarnett.plus.com> Message-ID: On 12/21/2017 7:55 PM, MRAB wrote: > On 2017-12-22 00:19, Gregory P. Smith wrote: >> (subject for this sub-thread updated) >> >> On Thu, Dec 21, 2017 at 4:08 PM Chris Barker > > wrote: >> >> ??? On Thu, Dec 21, 2017 at 3:36 PM, Gregory P. Smith > ??? > wrote: >> >> ??????? ?But we already have ... which does - so I'd suggest that for >> ??????? people who are averse to importing anything from typing and >> ??????? using the also quite readable Any.? (ie: document this as the >> ??????? expected practice with both having the same meaning) >> >> >> ??? I don't think they do, actually - I haven't been following the >> ??? typing discussions, but someone in this thread said that ... means >> ??? "use the type of teh default" or something like that. >> >> >> indeed, they may not.? though if that is the definition is it >> reasonable to say that type analyzers recognize the potential >> recursive meaning when the _default_ is ... and treat that as Any? >> >> another option that crossed my mind was "a: 10" without using =.? but >> that really abuses __attributes__ by sticking the default value in >> there which the @dataclass decorator would presumably immediately need >> to undo and fix up before returning the class.? 
>> but I don't find
>> assigning a value without an = sign to be pythonic so please lets not
>> do that! :)
>
> If you allowed "a: 10" (an int value), then you might also allow "a:
> 'foo'" (a string value), but wouldn't that be interpreted as a type
> called "foo"?

As far as dataclasses are concerned, both of these are allowed, and since
neither is ClassVar or InitVar, they're ignored. Type checkers would
object to the int, and I assume also the string unless there was a type
foo defined. See
https://www.python.org/dev/peps/pep-0484/#the-problem-of-forward-declarations
and typing.get_type_hints().

It's a bug that dataclasses currently does not inspect string annotations
to see if they're actually ClassVar or InitVar declarations. PEP 563 makes
it critical (and not merely important) to look at the string annotations.
Whether that involves typing.get_type_hints() or not, I haven't yet
decided. I'm waiting for PEPs 563 and 560 to be implemented before taking
another look at it.

Eric.

> If you can't have a string value, then you shouldn't have an int value
> either.
>
> [snip]
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/eric%2Ba-python-dev%40trueblade.com

From ericsnowcurrently at gmail.com  Thu Dec 21 21:32:38 2017
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 21 Dec 2017 19:32:38 -0700
Subject: [Python-Dev] pep-0557 dataclasses top level module vs part of
 collections?
In-Reply-To: 
References: 
Message-ID: 

On Thu, Dec 21, 2017 at 4:21 PM, Gregory P. Smith wrote:
> It seems a suggested use is "from dataclasses import dataclass"
>
> But people are already familiar with "from collections import namedtuple"
> which suggests to me that "from collections import dataclass" would be a
> more natural sounding API addition.
FWIW, I'd consider this a good time to add a new top-level classtools/classutils module (a la functools). There are plenty of other things that would fit there that we've shoved into other places. -eric From turnbull.stephen.fw at u.tsukuba.ac.jp Thu Dec 21 22:51:36 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 22 Dec 2017 12:51:36 +0900 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <20171104173013.GA4005@bytereef.org> <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> Message-ID: <23100.33096.202887.23177@turnbull.sk.tsukuba.ac.jp> Chris Barker writes: > Nathaniel Smith has pointed out that eval(pprint(a_dict)) is > supposed to return the same dict -- so documented behavior may > already be broken. Sure, but that's because we put shoes on a snake. Why anybody expects no impediment to slithering, I don't know! I understand the motivation to guarantee order, but it's a programmer convenience that has nothing to do with the idea of mapping, and the particular (insertion) order is very special and usually neither relevant nor reproducible. I have no problem whatsoever with just documenting any failure to preserve order while reproducing dicts, *except* that a process that inserts keys in the same order had better result in the same insertion order. Steve From turnbull.stephen.fw at u.tsukuba.ac.jp Thu Dec 21 22:52:21 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 22 Dec 2017 12:52:21 +0900 Subject: [Python-Dev] f-strings In-Reply-To: References: <7809465429117446362@unknownmsgid> <23097.17056.197997.743640@turnbull.sk.tsukuba.ac.jp> Message-ID: <23100.33141.629509.14351@turnbull.sk.tsukuba.ac.jp> Eric Fahlgren writes: > On Tue, Dec 19, 2017 at 8:47 AM, Stephen J. Turnbull < > turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: > > > If I were Bach, I'd compose a more-itertools-like module to be named > > Variations_on_the_F_String. 
:-) > > > > Would that be P.D.Q. Bach to whom you are referring? It's a curried function, supply your own arguments. "No, I won't!" Steve From raymond.hettinger at gmail.com Fri Dec 22 01:47:12 2017 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Thu, 21 Dec 2017 22:47:12 -0800 Subject: [Python-Dev] pep-0557 dataclasses top level module vs part of collections? In-Reply-To: References: Message-ID: > On Dec 21, 2017, at 3:21 PM, Gregory P. Smith wrote: > > It seems a suggested use is "from dataclasses import dataclass" > > But people are already familiar with "from collections import namedtuple" which suggests to me that "from collections import dataclass" would be a more natural sounding API addition. This might make sense if it were a single self contained function. But dataclasses are their own little ecosystem that warrants its own module namespace: >>> import dataclasses >>> dataclasses.__all__ ['dataclass', 'field', 'FrozenInstanceError', 'InitVar', 'fields', 'asdict', 'astuple', 'make_dataclass', 'replace'] Also, remember that dataclasses have a dual role as a data holder (which is collection-like) and as a generator of boilerplate code (which is more like functools.total_ordering). I support Eric's decision to make this a separate module. Raymond From ericfahlgren at gmail.com Fri Dec 22 10:44:25 2017 From: ericfahlgren at gmail.com (Eric Fahlgren) Date: Fri, 22 Dec 2017 07:44:25 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: <23100.33096.202887.23177@turnbull.sk.tsukuba.ac.jp> References: <20171104173013.GA4005@bytereef.org> <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> <23100.33096.202887.23177@turnbull.sk.tsukuba.ac.jp> Message-ID: On Thu, Dec 21, 2017 at 7:51 PM, Stephen J. 
Turnbull < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:

> I understand the motivation to guarantee order, but it's a programmer
> convenience that has nothing to do with the idea of mapping, and the
> particular (insertion) order is very special and usually neither
> relevant nor reproducible. I have no problem whatsoever with just
> documenting any failure to preserve order while reproducing dicts,
> *except* that a process that inserts keys in the same order had better
> result in the same insertion order.

json, pickle == png, i.e., guaranteed lossless.

repr, pprint == jpg, lossy for very specific motivating reasons.

In particular, I use pprint output in regression baselines, and if the
long documented sort-by-key behavior changed, I would not be happy.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From status at bugs.python.org  Fri Dec 22 12:09:29 2017
From: status at bugs.python.org (Python tracker)
Date: Fri, 22 Dec 2017 18:09:29 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20171222170929.1068B11A87D@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2017-12-15 - 2017-12-22)
Python tracker at https://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.
Issues counts and deltas: open 6342 (+12) closed 37819 (+64) total 44161 (+76) Open issues with patches: 2456 Issues opened (52) ================== #30697: segfault in PyErr_NormalizeException() after memory exhaustion https://bugs.python.org/issue30697 reopened by brett.cannon #32335: Failed Python build on Fedora 27 https://bugs.python.org/issue32335 opened by amitg-b14 #32336: Save OrderedDict import in argparse https://bugs.python.org/issue32336 opened by rhettinger #32337: Dict order is now guaranteed, so add tests and doc for it https://bugs.python.org/issue32337 opened by rhettinger #32338: Save OrderedDict import in re https://bugs.python.org/issue32338 opened by serhiy.storchaka #32339: Make the dict type used in csv.DictReader configurable https://bugs.python.org/issue32339 opened by serhiy.storchaka #32343: Leak Sanitizer reports memory leaks while building using ASAN https://bugs.python.org/issue32343 opened by kirit1193 #32345: EIO from write() is only fatal if print() contains a newline https://bugs.python.org/issue32345 opened by Creideiki #32346: Speed up slot lookup for class creation https://bugs.python.org/issue32346 opened by pitrou #32347: System Integrity Protection breaks shutil.copystat() https://bugs.python.org/issue32347 opened by Ryan Govostes #32352: `inspect.getfullargspec` doesn't work fine for some builtin ca https://bugs.python.org/issue32352 opened by thautwarm #32353: Add docs about Embedding with an frozen module limitation. https://bugs.python.org/issue32353 opened by Decorater #32354: Unclear intention of deprecating Py_UNICODE_TOLOWER / Py_UNICO https://bugs.python.org/issue32354 opened by ideasman42 #32358: json.dump: fp must be a text file object https://bugs.python.org/issue32358 opened by qingyunha #32359: Add getters for all SSLContext internal configuration https://bugs.python.org/issue32359 opened by njs #32360: Save OrderedDict imports in various stdlibs. 
     https://bugs.python.org/issue32360  opened by inada.naoki
#32361: global / nonlocal interference : is this a bug, a feature or a
     https://bugs.python.org/issue32361  opened by Camion
#32362: multiprocessing.connection.Connection misdocumented as multipr
     https://bugs.python.org/issue32362  opened by Amery
#32363: Deprecate task.set_result() and task.set_exception()
     https://bugs.python.org/issue32363  opened by asvetlov
#32364: Add AbstractFuture and AbstractTask
     https://bugs.python.org/issue32364  opened by asvetlov
#32367: [Security] CVE-2017-17522: webbrowser.py in Python does not va
     https://bugs.python.org/issue32367  opened by vstinner
#32368: Segfault when compiling many conditional expressions
     https://bugs.python.org/issue32368  opened by snordhausen
#32370: Wrong ANSI encoding used by subprocess for some locales
     https://bugs.python.org/issue32370  opened by Segev Finer
#32371: Delay-loading of python dll is impossible when using some C ma
     https://bugs.python.org/issue32371  opened by Pierre Chatelier
#32372: Optimize out __debug__ at the AST level
     https://bugs.python.org/issue32372  opened by serhiy.storchaka
#32373: Add socket.getblocking() method
     https://bugs.python.org/issue32373  opened by yselivanov
#32374: Document that m_traverse for multi-phase initialized modules c
     https://bugs.python.org/issue32374  opened by encukou
#32375: Compilation warnings in getpath.c with gcc on Ubuntu / -D_FORT
     https://bugs.python.org/issue32375  opened by pitrou
#32378: test_npn_protocols broken with LibreSSL 2.6.1+
     https://bugs.python.org/issue32378  opened by christian.heimes
#32380: functools.singledispatch interacts poorly with methods
     https://bugs.python.org/issue32380  opened by Ethan Smith
#32381: Python 3.6 cannot reopen .pyc file with non-ASCII path
     https://bugs.python.org/issue32381  opened by tianjg
#32384: Generator tests is broken in non-CPython implementation
     https://bugs.python.org/issue32384  opened by isaiahp
#32387: Disallow untagged C extension import on major platforms
     https://bugs.python.org/issue32387  opened by pitrou
#32388: Remove cross-version binary compatibility
     https://bugs.python.org/issue32388  opened by pitrou
#32390: AIX (xlc_r) compile error with Modules/posixmodule.c: Function
     https://bugs.python.org/issue32390  opened by Michael.Felt
#32391: Add StreamWriter.wait_closed()
     https://bugs.python.org/issue32391  opened by asvetlov
#32392: subprocess.run documentation does not have **kwargs
     https://bugs.python.org/issue32392  opened by oprypin
#32393: nav menu jitter in old documentation
     https://bugs.python.org/issue32393  opened by Joseph Hendrey
#32394: socket lib beahavior change in 3.6.4
     https://bugs.python.org/issue32394  opened by skn78
#32395: asyncio.StreamReader.readuntil is not general enough
     https://bugs.python.org/issue32395  opened by Bruce Merry
#32396: Implement method to write/read to serials without blocking on
     https://bugs.python.org/issue32396  opened by jabdoa
#32397: textwrap output may change if you wrap a paragraph twice
     https://bugs.python.org/issue32397  opened by larry
#32398: OSX C++ linking workaround in distutils breaks other packages
     https://bugs.python.org/issue32398  opened by esuarezsantana
#32399: _uuidmodule.c cannot build on AIX - different typedefs of uuid
     https://bugs.python.org/issue32399  opened by Michael.Felt
#32400: inspect.isdatadescriptor false negative
     https://bugs.python.org/issue32400  opened by chnlior
#32401: No module named '_ctypes'
     https://bugs.python.org/issue32401  opened by YoSTEALTH
#32402: Coverity: CID 1426868/1426867:  Null pointer dereferences in t
     https://bugs.python.org/issue32402  opened by vstinner
#32403: date, time and datetime alternate constructors should take fas
     https://bugs.python.org/issue32403  opened by p-ganssle
#32404: fromtimestamp does not call __new__ in datetime subclasses
     https://bugs.python.org/issue32404  opened by p-ganssle
#32408: Performance regression in urllib.proxy_bypass_environment
     https://bugs.python.org/issue32408  opened by xiang.zhang
#32409: venv activate.bat is UTF-8 encoded but uses current console co
     https://bugs.python.org/issue32409  opened by Jac0
#32410: Implement loop.sock_sendfile method
     https://bugs.python.org/issue32410  opened by asvetlov


Most recent 15 issues with no replies (15)
==========================================

#32410: Implement loop.sock_sendfile method
     https://bugs.python.org/issue32410
#32408: Performance regression in urllib.proxy_bypass_environment
     https://bugs.python.org/issue32408
#32404: fromtimestamp does not call __new__ in datetime subclasses
     https://bugs.python.org/issue32404
#32403: date, time and datetime alternate constructors should take fas
     https://bugs.python.org/issue32403
#32402: Coverity: CID 1426868/1426867:  Null pointer dereferences in t
     https://bugs.python.org/issue32402
#32400: inspect.isdatadescriptor false negative
     https://bugs.python.org/issue32400
#32393: nav menu jitter in old documentation
     https://bugs.python.org/issue32393
#32384: Generator tests is broken in non-CPython implementation
     https://bugs.python.org/issue32384
#32380: functools.singledispatch interacts poorly with methods
     https://bugs.python.org/issue32380
#32378: test_npn_protocols broken with LibreSSL 2.6.1+
     https://bugs.python.org/issue32378
#32372: Optimize out __debug__ at the AST level
     https://bugs.python.org/issue32372
#32371: Delay-loading of python dll is impossible when using some C ma
     https://bugs.python.org/issue32371
#32362: multiprocessing.connection.Connection misdocumented as multipr
     https://bugs.python.org/issue32362
#32360: Save OrderedDict imports in various stdlibs.
     https://bugs.python.org/issue32360
#32358: json.dump: fp must be a text file object
     https://bugs.python.org/issue32358


Most recent 15 issues waiting for review (15)
=============================================

#32410: Implement loop.sock_sendfile method
     https://bugs.python.org/issue32410
#32402: Coverity: CID 1426868/1426867:  Null pointer dereferences in t
     https://bugs.python.org/issue32402
#32399: _uuidmodule.c cannot build on AIX - different typedefs of uuid
     https://bugs.python.org/issue32399
#32390: AIX (xlc_r) compile error with Modules/posixmodule.c: Function
     https://bugs.python.org/issue32390
#32388: Remove cross-version binary compatibility
     https://bugs.python.org/issue32388
#32387: Disallow untagged C extension import on major platforms
     https://bugs.python.org/issue32387
#32384: Generator tests is broken in non-CPython implementation
     https://bugs.python.org/issue32384
#32378: test_npn_protocols broken with LibreSSL 2.6.1+
     https://bugs.python.org/issue32378
#32374: Document that m_traverse for multi-phase initialized modules c
     https://bugs.python.org/issue32374
#32373: Add socket.getblocking() method
     https://bugs.python.org/issue32373
#32372: Optimize out __debug__ at the AST level
     https://bugs.python.org/issue32372
#32363: Deprecate task.set_result() and task.set_exception()
     https://bugs.python.org/issue32363
#32353: Add docs about Embedding with an frozen module limitation.
     https://bugs.python.org/issue32353
#32347: System Integrity Protection breaks shutil.copystat()
     https://bugs.python.org/issue32347
#32346: Speed up slot lookup for class creation
     https://bugs.python.org/issue32346


Top 10 most discussed issues (10)
=================================

#32361: global / nonlocal interference : is this a bug, a feature or a
     https://bugs.python.org/issue32361  20 msgs
#32030: PEP 432: Rewrite Py_Main()
     https://bugs.python.org/issue32030  15 msgs
#32259: Misleading "not iterable" Error Message when generator return
     https://bugs.python.org/issue32259  14 msgs
#32387: Disallow untagged C extension import on major platforms
     https://bugs.python.org/issue32387  11 msgs
#25749: asyncio.Server class documented but not exported
     https://bugs.python.org/issue25749  9 msgs
#32394: socket lib beahavior change in 3.6.4
     https://bugs.python.org/issue32394  9 msgs
#17852: Built-in module _io can lose data from buffered files at exit
     https://bugs.python.org/issue17852  8 msgs
#32221: Converting ipv6 address to string representation using getname
     https://bugs.python.org/issue32221  8 msgs
#32338: Save OrderedDict import in re
     https://bugs.python.org/issue32338  7 msgs
#32346: Speed up slot lookup for class creation
     https://bugs.python.org/issue32346  7 msgs


Issues closed (61)
==================

#15216: Add encoding & errors parameters to TextIOWrapper.reconfigure(
     https://bugs.python.org/issue15216  closed by inada.naoki
#15852: typos in curses argument error messages
     https://bugs.python.org/issue15852  closed by asvetlov
#19764: subprocess: use PROC_THREAD_ATTRIBUTE_HANDLE_LIST with STARTUP
     https://bugs.python.org/issue19764  closed by vstinner
#20493: select module: loop if the timeout is too large (OverflowError
     https://bugs.python.org/issue20493  closed by asvetlov
#24539: StreamReaderProtocol.eof_received() should return True to keep
     https://bugs.python.org/issue24539  closed by asvetlov
#24795: Make event loops with statement context managers
     https://bugs.python.org/issue24795  closed by asvetlov
#25074: Bind logger and waninigs modules for asyncio __del__ methods
     https://bugs.python.org/issue25074  closed by asvetlov
#25675: doc for BaseEventLoop.run_in_executor() says its a coroutine,
     https://bugs.python.org/issue25675  closed by asvetlov
#26188: Provide more helpful error message when `await` is called insi
     https://bugs.python.org/issue26188  closed by asvetlov
#26357: asyncio.wait loses coroutine return value
     https://bugs.python.org/issue26357  closed by asvetlov
#27746: ResourceWarnings in test_asyncio
     https://bugs.python.org/issue27746  closed by asvetlov
#28212: Closing server in asyncio is not efficient
     https://bugs.python.org/issue28212  closed by asvetlov
#28697: asyncio.Lock, Condition, Semaphore docs don't mention `async w
     https://bugs.python.org/issue28697  closed by asvetlov
#28942: await expressions in f-strings
     https://bugs.python.org/issue28942  closed by yselivanov
#29042: os.path.exists should not throw "Embedded NUL character" excep
     https://bugs.python.org/issue29042  closed by vstinner
#29344: sock_recv not detected a coroutine
     https://bugs.python.org/issue29344  closed by asvetlov
#29517: "Can't pickle local object" when uses functools.partial with m
     https://bugs.python.org/issue29517  closed by yselivanov
#29558: Provide run_until_complete inside loop
     https://bugs.python.org/issue29558  closed by asvetlov
#29689: Asyncio-namespace helpers for async_generators
     https://bugs.python.org/issue29689  closed by asvetlov
#29711: When you use stop_serving in proactor loop it's kill all liste
     https://bugs.python.org/issue29711  closed by yselivanov
#29889: test_asyncio fails always
     https://bugs.python.org/issue29889  closed by asvetlov
#29970: Severe open file leakage running asyncio SSL server
     https://bugs.python.org/issue29970  closed by asvetlov
#30014: Speedup DefaultSelectors.modify() by 2x
     https://bugs.python.org/issue30014  closed by asvetlov
#30050: Please provide a way to disable the warning printed if the sig
     https://bugs.python.org/issue30050  closed by yselivanov
#30539: Make Proactor public in asyncio.ProactorEventLoop
     https://bugs.python.org/issue30539  closed by yselivanov
#31059: asyncio.StreamReader.read hangs if n<0
     https://bugs.python.org/issue31059  closed by asvetlov
#31094: asyncio: get list of connected clients
     https://bugs.python.org/issue31094  closed by asvetlov
#31901: atexit callbacks should be run at subinterpreter shutdown
     https://bugs.python.org/issue31901  closed by pitrou
#32250: Add asyncio.current_task() and asyncio.all_tasks() funcitons
     https://bugs.python.org/issue32250  closed by asvetlov
#32276: there is no way to make tempfile reproducible (i.e. seed the u
     https://bugs.python.org/issue32276  closed by serhiy.storchaka
#32287: Import of _pyio module failed on cygwin
     https://bugs.python.org/issue32287  closed by erik.bray
#32306: Clarify map API in concurrent.futures
     https://bugs.python.org/issue32306  closed by pitrou
#32315: can't run any scripts with 2.7.x, 32 and 64-bit
     https://bugs.python.org/issue32315  closed by terry.reedy
#32318: Remove "globals()" call from "socket.accept()"
     https://bugs.python.org/issue32318  closed by yselivanov
#32323: urllib.parse.urlsplit() must not lowercase() IPv6 scope value
     https://bugs.python.org/issue32323  closed by asvetlov
#32331: Fix socket.type on OSes with SOCK_NONBLOCK
     https://bugs.python.org/issue32331  closed by yselivanov
#32340: ValueError: time data 'N/A' does not match format '%Y%m%d'
     https://bugs.python.org/issue32340  closed by r.david.murray
#32341: itertools "generators"?
     https://bugs.python.org/issue32341  closed by rhettinger
#32342: safe_power():  CID 1426161:  Integer handling issues  (DIVIDE_B
     https://bugs.python.org/issue32342  closed by vstinner
#32344: Explore whether peephole.c tuple of constants folding can be a
     https://bugs.python.org/issue32344  closed by serhiy.storchaka
#32348: Optimize asyncio.Future schedule/add/remove callback
     https://bugs.python.org/issue32348  closed by yselivanov
#32349: Add detailed return value information for set.intersection fun
     https://bugs.python.org/issue32349  closed by rhettinger
#32350: pip can't handle MSVC warnings containing special characters
     https://bugs.python.org/issue32350  closed by paul.moore
#32351: Use fastpath in asyncio.sleep if delay<0
     https://bugs.python.org/issue32351  closed by asvetlov
#32355: Optimize asyncio.gather()
     https://bugs.python.org/issue32355  closed by yselivanov
#32356: asyncio: Make transport.pause_reading()/resume_reading() idemp
     https://bugs.python.org/issue32356  closed by yselivanov
#32357: Optimize asyncio.iscoroutine() and loop.create_task() for non-
     https://bugs.python.org/issue32357  closed by yselivanov
#32365: Reference leak: test_ast test_builtin test_compile
     https://bugs.python.org/issue32365  closed by serhiy.storchaka
#32366: suggestion:html.escape(s, quote=True) escape \n to
     https://bugs.python.org/issue32366  closed by r.david.murray
#32369: test_subprocess: last part of test_close_fds() doesn't check w
     https://bugs.python.org/issue32369  closed by izbyshev
#32376: Unusable syntax error reported when Python keyword in a f-stri
     https://bugs.python.org/issue32376  closed by r.david.murray
#32377: Difference in ressurrection behavior with __del__ in py2 vs. p
     https://bugs.python.org/issue32377  closed by pitrou
#32379: MRO computation could be faster
     https://bugs.python.org/issue32379  closed by pitrou
#32382: Python mulitiprocessing.Queue fail to get according to correct
     https://bugs.python.org/issue32382  closed by pitrou
#32383: subprocess.Popen() is slower than subprocess.run()
     https://bugs.python.org/issue32383  closed by ?????????????? ??????????????
#32385: Clean up the C3 MRO algorithm implementation.
     https://bugs.python.org/issue32385  closed by serhiy.storchaka
#32386: dynload_next.c is obsolete
     https://bugs.python.org/issue32386  closed by benjamin.peterson
#32389: urllib3 wrong computation of 'Content-Length' for file upload
     https://bugs.python.org/issue32389  closed by r.david.murray
#32405: clr: AttributeError: 'module' object has no attribute 'AddRefe
     https://bugs.python.org/issue32405  closed by r.david.murray
#32406: Doc: The new dataclasses module is not documented
     https://bugs.python.org/issue32406  closed by eric.smith
#32407: lib2to3 doesn't work when zipped
     https://bugs.python.org/issue32407  closed by r.david.murray

From brett at python.org  Fri Dec 22 11:49:50 2017
From: brett at python.org (Brett Cannon)
Date: Fri, 22 Dec 2017 16:49:50 +0000
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To:
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu>
 <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
 <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com>
Message-ID:

On Thu, Dec 21, 2017, 03:37 Ivan Levkivskyi, wrote:

> On 21 December 2017 at 11:22, Terry Reedy wrote:
>
>> On 12/21/2017 4:22 AM, Eric V. Smith wrote:
>>
>>> On 12/21/2017 1:46 AM, Chris Barker wrote:
>>>
>>>> I suggest that it be clear in the docs, and ideally in the PEP, that
>>>> the dataclass decorator is using the *annotation* syntax, and that the
>>>> only relevant part it uses is that an annotation exists, but the value
>>>> of the annotation is essentially (completely?) ignored.
>>>
>>> I think the PEP is very clear about this: "The dataclass decorator
>>> examines the class to find fields. A field is defined as any variable
>>> identified in __annotations__. That is, a variable that has a type
>>> annotation. With two exceptions described below, none of the Data Class
>>> machinery examines the type specified in the annotation."
>>
>> This seems clear enough. It could come after describing what a dataclass
>> *is*.
>>
>>> I agree the docs should also be clear about this.
>>
>>>> So we should have examples like:
>>>>
>>>> @dataclass
>>>> class C:
>>>>     a: ...        # field with no default
>>>>     b: ... = 0    # field with a default value
>>>>
>>>> Then maybe:
>>>>
>>>> @dataclass
>>>> class C:
>>>>     a: "the a parameter"                      # field with no default
>>>>     b: "another, different parameter" = 0.0   # field with a default
>>>>
>>>> Then the docs can go on to say that if the user wants to specify a
>>>> type for use with a static type checking pre-processor, they can do it
>>>> like so:
>>>>
>>>> @dataclass
>>>> class C:
>>>>     a: int            # integer field with no default
>>>>     b: float = 0.0    # float field with a default
>>>>
>>>> And the types will be recognized by type checkers such as mypy.
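[Editor's note: the PEP behavior quoted above -- that only the *presence* of an annotation matters, not its value -- can be checked at runtime. A minimal sketch (Python 3.7+, where dataclasses landed); the class `C` and its field names are illustrative only:

```python
from dataclasses import dataclass, fields

@dataclass
class C:
    a: "the a parameter"   # annotation value is an arbitrary string...
    b: float = 0.0         # ...or a real type, for type checkers

# The decorator only cares that an annotation exists for each field:
print([f.name for f in fields(C)])  # -> ['a', 'b']

obj = C(1)
print(obj.a, obj.b)  # -> 1 0.0
```

Either way, `C` behaves identically at runtime; only static checkers see a difference between the two annotation styles.]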
>>>> And I think the non-typed examples should go first in the docs.
>>
>> Modulo some bike-shedding, the above seems pretty good to me.
>
> For me, the three options for "don't care" have a bit different meaning:
>
> * typing.Any: this class is supposed to be used with static type
>   checkers, but this field is too dynamic
> * ... (ellipsis): this class may or may not be used with static type
>   checkers, use the inferred type in the latter case
> * "field docstring": this class should not be used with static type
>   checkers
>
> Assuming this, the second option would be the "real" "don't care". If
> this makes sense, then we can go the way proposed in
> https://github.com/python/typing/issues/276 and make ellipsis semantics
> "official" in PEP 484. (pending Guido's approval)

I vote for option 2 as well.

I think it's worth reminding people that if they don't like the fact that
dataclasses (ab)use type hints for their succinct syntax, they can always
use attrs instead to avoid type hints. Otherwise, whichever approach we
agree to from Ivan's suggestions will take care of this.

As for those who feel dataclasses will force them to teach type hints and
they simply don't want to, maybe we could help land protocols, and then
maybe you can use dataclasses as an opportunity to explicitly teach duck
typing?

But I think the key point I want to make is that Guido chose dataclasses
to support using the type hints syntax specifically over how attrs does
things, so I don't see this thread trying to work around that going
anywhere at this point, since I haven't seen a solid alternative be
proposed after all of this debating.

-brett

> --
> Ivan
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/brett%40python.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From shoyer at gmail.com  Fri Dec 22 13:10:51 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Fri, 22 Dec 2017 18:10:51 +0000
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To:
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu>
 <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
 <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com>
Message-ID:

On Thu, Dec 21, 2017 at 6:39 AM Ivan Levkivskyi wrote:

> For me, the three options for "don't care" have a bit different meaning:
>
> * typing.Any: this class is supposed to be used with static type
>   checkers, but this field is too dynamic
> * ... (ellipsis): this class may or may not be used with static type
>   checkers, use the inferred type in the latter case
> * "field docstring": this class should not be used with static type
>   checkers
>
> Assuming this, the second option would be the "real" "don't care". If
> this makes sense, then we can go the way proposed in
> https://github.com/python/typing/issues/276 and make ellipsis semantics
> "official" in PEP 484. (pending Guido's approval)

I am a little nervous about using "..." for inferred types, because it
could potentially cause confusion with other uses of ellipsis in typing.
Ellipsis already has a special meaning for Tuple, so an annotation like
MyClass[int, ...] could mean either a tuple subclass with integer elements
or a two-argument generic type where the second type is inferred.
Actually, it's ambiguous even for Tuple.

Ellipsis could also make a lot of sense for typing multi-dimensional
arrays, similar to how it's used in indexing to denote "any number of
dimensions." Again, the semantics for "..." might differ from "an inferred
size."
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chris.barker at noaa.gov  Fri Dec 22 14:38:06 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 22 Dec 2017 11:38:06 -0800
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To:
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu>
 <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
 <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com>
Message-ID:

On Fri, Dec 22, 2017 at 8:49 AM, Brett Cannon wrote:

> I think it's worth reminding people that if they don't like the fact that
> dataclasses (ab)use) type hints for their succinct syntax, they can
> always use attrs instead to avoid type hints.

Sure -- but this doesn't really address the issue. The whole reason this
is even a discussion is because dataclasses is going into the standard
library. Third party packages can do whatever they want, of course.

And the concern is that people (in particular newbies) will get confused /
the wrong impression / other negative responses from the (semi) use of
typing in a standard library module.

> As for those who feel dataclasses will force them to teach type hints
> and they simply don't want to, maybe we could help land protocols

Could you please clarify what this is about?

> But I think the key point I want to make is Guido chose dataclasses to
> support using the type hints syntax specifically over how attrs does
> things, so I don't see this thread trying to work around that going
> anywhere at this point since I haven't seen a solid alternative be
> proposed after all of this debating.

And the PEP has been approved. So the actionable things are:

- Writing good docs
- Converging on a "recommended" way to do non-typed dataclass fields

And that should be decided in order to write the docs (and probably should
be in the PEP).

-CHB

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From greg at krypto.org  Fri Dec 22 14:50:48 2017
From: greg at krypto.org (Gregory P. Smith)
Date: Fri, 22 Dec 2017 19:50:48 +0000
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To:
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu>
 <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
 <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com>
Message-ID:

On Fri, Dec 22, 2017 at 11:40 AM Chris Barker wrote:

> On Fri, Dec 22, 2017 at 8:49 AM, Brett Cannon wrote:
>
>> But I think the key point I want to make is Guido chose dataclasses to
>> support using the type hints syntax specifically over how attrs does
>> things, so I don't see this thread trying to work around that going
>> anywhere at this point since I haven't seen a solid alternative be
>> proposed after all of this debating.
>
> And the PEP has been approved. So the actionable things are:
>
> - Writing good docs
> - Converging on a "recommended" way to do non-typed dataclass fields

My preference for this is "just use Any" for anyone not concerned about
the type. But if we wanted to make it more opaque, so that people need not
realize that they are actually type annotations, I suggest adding an alias
for Any in the dataclasses module (dataclasses.Data = typing.Any):

    from dataclasses import dataclass, Data

    @dataclass
    class Swallow:
        weight_in_oz: Data = 5
        laden: Data = False
        species: Data = SwallowSpecies.AFRICAN

The word "Data" is friendlier than "Any" in this context for people who
don't need to care about the typing module.
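[Editor's note: a runnable sketch of the alias idea proposed above. `dataclasses.Data` is hypothetical (this proposal was never adopted, so we define the alias ourselves), and the `species` field is omitted because `SwallowSpecies` isn't defined here:

```python
from dataclasses import dataclass, fields
from typing import Any

Data = Any  # stand-in for the proposed (hypothetical) dataclasses.Data

@dataclass
class Swallow:
    weight_in_oz: Data = 5
    laden: Data = False

# To a static type checker these fields are just Any; at runtime they are
# ordinary dataclass fields with defaults.
print([f.name for f in fields(Swallow)])  # -> ['weight_in_oz', 'laden']
print(Swallow())                          # -> Swallow(weight_in_oz=5, laden=False)
```

Since `Data` is only an alias, type checkers treat it exactly like `typing.Any`; the only difference is the name the reader sees.]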
We could go further and have Data not be an alias for Any if desired (so
that its repr wouldn't be confusing, not that anyone should be looking at
its repr ever).

-gps

> And that should be decided in order to write the docs (and probably
> should be in the PEP).
>
> -CHB
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/greg%40krypto.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From greg at krypto.org  Fri Dec 22 14:59:27 2017
From: greg at krypto.org (Gregory P. Smith)
Date: Fri, 22 Dec 2017 19:59:27 +0000
Subject: [Python-Dev] pep-0557 dataclasses top level module vs part of
 collections?
In-Reply-To:
References:
Message-ID:

On Thu, Dec 21, 2017 at 10:47 PM Raymond Hettinger <
raymond.hettinger at gmail.com> wrote:

>> On Dec 21, 2017, at 3:21 PM, Gregory P. Smith wrote:
>>
>> It seems a suggested use is "from dataclasses import dataclass"
>>
>> But people are already familiar with "from collections import
>> namedtuple", which suggests to me that "from collections import
>> dataclass" would be a more natural sounding API addition.
>
> This might make sense if it were a single self-contained function. But
> dataclasses are their own little ecosystem that warrants its own module
> namespace:
>
>     >>> import dataclasses
>     >>> dataclasses.__all__
>     ['dataclass', 'field', 'FrozenInstanceError', 'InitVar', 'fields',
>      'asdict', 'astuple', 'make_dataclass', 'replace']
>
> Also, remember that dataclasses have a dual role as a data holder (which
> is collection-like) and as a generator of boilerplate code (which is more
> like functools.total_ordering).
>
> I support Eric's decision to make this a separate module.

Sounds good, let's leave it that way: dataclasses it is.

If we were further along in figuring out how to remove the distinction
between a class and a module as a namespace, I'd suggest the module name
itself be dataclass with a __call__ method, so that the module could be
the decorator and we could avoid the antipattern of importing a name from
a module into your local namespace. But we're not, so we can't. :)

-gps

> Raymond

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From brett at python.org  Fri Dec 22 14:55:38 2017
From: brett at python.org (Brett Cannon)
Date: Fri, 22 Dec 2017 19:55:38 +0000
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To:
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu>
 <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
 <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com>
Message-ID:

On Fri, Dec 22, 2017, 11:38 Chris Barker, wrote:

> On Fri, Dec 22, 2017 at 8:49 AM, Brett Cannon wrote:
>
>> I think it's worth reminding people that if they don't like the fact
>> that dataclasses (ab)use type hints for their succinct syntax, they can
>> always use attrs instead to avoid type hints.
>
> Sure -- but this doesn't really address the issue. The whole reason this
> is even a discussion is because dataclasses is going into the standard
> library. Third party packages can do whatever they want, of course.
> And the concern is that people (in particular newbies) will get confused
> / the wrong impression / other negative responses from the (semi) use of
> typing in a standard library module.

I'm still not worried. Type hints are part of the syntax and so are no
worse off than async/await and asyncio IMO.

>> As for those who feel dataclasses will force them to teach type hints
>> and they simply don't want to, maybe we could help land protocols
>
> Could you please clarify what this is about?

There's a PEP by Ivan (on my phone else I would look up the number).

-Brett

>> But I think the key point I want to make is Guido chose dataclasses to
>> support using the type hints syntax specifically over how attrs does
>> things, so I don't see this thread trying to work around that going
>> anywhere at this point since I haven't seen a solid alternative be
>> proposed after all of this debating.
>
> And the PEP has been approved. So the actionable things are:
>
> - Writing good docs
> - Converging on a "recommended" way to do non-typed dataclass fields
>
> And that should be decided in order to write the docs (and probably
> should be in the PEP).
>
> -CHB
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From p.f.moore at gmail.com  Fri Dec 22 15:11:38 2017
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 22 Dec 2017 20:11:38 +0000
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To:
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu>
 <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
 <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com>
Message-ID:

On 22 December 2017 at 19:50, Gregory P. Smith wrote:

> My preference for this is "just use Any" for anyone not concerned about
> the type. But if we wanted to make it more opaque, so that people need
> not realize that they are actually type annotations, I suggest adding an
> alias for Any in the dataclasses module (dataclasses.Data = typing.Any):
>
>     from dataclasses import dataclass, Data
>
>     @dataclass
>     class Swallow:
>         weight_in_oz: Data = 5
>         laden: Data = False
>         species: Data = SwallowSpecies.AFRICAN
>
> The word "Data" is friendlier than "Any" in this context for people who
> don't need to care about the typing module.
>
> We could go further and have Data not be an alias for Any if desired (so
> that its repr wouldn't be confusing, not that anyone should be looking
> at its repr ever).

That sounds like a nice simple proposal. +1 from me.

Documentation can say that variables should be annotated with "Data" to
be recognised by the decorator, and if people are using type annotations
an actual type can be used in place of "Data" (which acts the same as
typing.Any). That seems to me to describe the feature in a suitably
type-hinting-neutral way, while still making it clear how data classes
interact with type annotations.

Paul

From chris.barker at noaa.gov  Fri Dec 22 15:15:36 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 22 Dec 2017 12:15:36 -0800
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To:
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu>
 <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
 <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com>
Message-ID:

On Fri, Dec 22, 2017 at 10:10 AM, Stephan Hoyer wrote:

> On Thu, Dec 21, 2017 at 6:39 AM Ivan Levkivskyi wrote:
>
>> * ... (ellipsis): this class may or may not be used with static type
>>   checkers, use the inferred type in the latter case
>> * "field docstring": this class should not be used with static type
>>   checkers
>>
>> Assuming this, the second option would be the "real" "don't care". If
>> this makes sense, then we can go the way proposed in
>> https://github.com/python/typing/issues/276 and make ellipsis semantics
>> "official" in PEP 484. (pending Guido's approval)
>
> I am a little nervous about using "..." for inferred types, because it
> could potentially cause confusion with other uses of ellipsis in typing.

Isn't that what "make ellipsis semantics 'official'" means -- i.e. making
it clear how they are used in typing?

The core problem is that generic annotations are used in dataclasses
without the "type hints" use-case. But:

1) Python is moving to make (PEP 484) type hints be THE recommended usage
   for annotations.
2) We want the annotations in dataclasses to be "proper" PEP 484 type
   hints if they are there.

The challenge is:

- Annotations require a value.
- Any value used might be interpreted by a static type checker.

So we need a way to spell "no type specified" that will not be
mis-interpreted by type checkers, is in the builtin namespace, and will
seem natural to users with no knowledge or interest in static typing.

The ellipsis is tempting, because it's a literal that doesn't have any
other obvious meaning in this context. But if it has an incompatible
meaning in PEP 484, then we're stuck. Is there another obscure literal
that would work?

- I assume None means "the None type" to type checkers, yes?
- The empty string is one option -- or more to the point, any string --
  so then it could be used as docs as well.
- Is there another obscure literal that would work? (or a not-so-obscure
  one that doesn't have another meaning to type checkers)

Would it be crazy to bring typing.Any into the builtin namespace?
    @dataclass
    class C:
        a: Any
        b: Any = 34
        c: int = 0

That reads pretty well to me.... And having Any available in the builtin
namespace may help in other cases where type hints are getting introduced
into code that isn't really being properly type checked.

I don't LOVE it -- to me, Any means "any type will do", or "I don't care
what type this is", and what we really want is "no type specified" --
i.e. the same thing as plain old Python code without type hints. But
practically speaking, it has the same effect, yes?

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mertz at gnosis.cx  Fri Dec 22 15:29:48 2017
From: mertz at gnosis.cx (David Mertz)
Date: Fri, 22 Dec 2017 12:29:48 -0800
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To:
References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com>
 <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu>
 <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com>
 <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com>
Message-ID:

The name Data seems very intuitive to me without suggesting a type
declaration as Any does (but it can still be treated as a synonym by
actual type checkers).

On Dec 22, 2017 12:12 PM, "Paul Moore" wrote:

> On 22 December 2017 at 19:50, Gregory P. Smith wrote:
>
>> My preference for this is "just use Any" for anyone not concerned about
>> the type.
>> But if we wanted to make it more opaque, so that people need not
>> realize that they are actually type annotations, I suggest adding an
>> alias for Any in the dataclasses module (dataclasses.Data = typing.Any):
>>
>>     from dataclasses import dataclass, Data
>>
>>     @dataclass
>>     class Swallow:
>>         weight_in_oz: Data = 5
>>         laden: Data = False
>>         species: Data = SwallowSpecies.AFRICAN
>>
>> The word "Data" is friendlier than "Any" in this context for people who
>> don't need to care about the typing module.
>>
>> We could go further and have Data not be an alias for Any if desired
>> (so that its repr wouldn't be confusing, not that anyone should be
>> looking at its repr ever).
>
> That sounds like a nice simple proposal. +1 from me.
>
> Documentation can say that variables should be annotated with "Data" to
> be recognised by the decorator, and if people are using type annotations
> an actual type can be used in place of "Data" (which acts the same as
> typing.Any). That seems to me to describe the feature in a suitably
> type-hinting-neutral way, while still making it clear how data classes
> interact with type annotations.
>
> Paul
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/mertz%40gnosis.cx
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From python-dev at mgmiller.net  Fri Dec 22 16:02:38 2017
From: python-dev at mgmiller.net (Mike Miller)
Date: Fri, 22 Dec 2017 13:02:38 -0800
Subject: [Python-Dev] Is static typing still optional?
In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> Message-ID: <05050a77-4693-a75a-5aaf-d2dbc6eb89d0@mgmiller.net> On 2017-12-22 12:15, Chris Barker wrote: > Would it be crazy to bring typing.Any into the builtin namespace? > > @dataclass: >     a: Any >     b: Any = 34 >     c: int = 0 > > That reads pretty well to me.... > And having Any available in the built in namespace may help in other cases where There is already an "any" function in the builtins. It looks fine but not sure how it will interact with type checkers. The "dataclass.Data" idea mentioned in a sibling thread is a good alternative, though just wordy enough to make ... a shortcut. -Mike
From python at mrabarnett.plus.com Fri Dec 22 16:18:20 2017 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 22 Dec 2017 21:18:20 +0000 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: <05050a77-4693-a75a-5aaf-d2dbc6eb89d0@mgmiller.net> References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> <05050a77-4693-a75a-5aaf-d2dbc6eb89d0@mgmiller.net> Message-ID: On 2017-12-22 21:02, Mike Miller wrote: > > On 2017-12-22 12:15, Chris Barker wrote: >> Would it be crazy to bring typing.Any into the builtin namespace? >> >> @dataclass: >>     a: Any >>     b: Any = 34 >>     c: int = 0 >> >> That reads pretty well to me.... > > And having Any available in the built in namespace may help in other cases where > > There is already an "any" function in the builtins. It looks fine but not sure > how it will interact with type checkers. > > The "dataclass.Data" idea mentioned in a sibling thread is a good alternative, > though just wordy enough to make ... a shortcut.
> The function is "any", the type is "Any", and "any" != "Any", although I wonder how many people will be caught out by that... From chris.barker at noaa.gov Fri Dec 22 17:17:02 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 22 Dec 2017 14:17:02 -0800 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> <05050a77-4693-a75a-5aaf-d2dbc6eb89d0@mgmiller.net> Message-ID: On Fri, Dec 22, 2017 at 1:18 PM, MRAB wrote: > >> The function is "any", the type is "Any", and "any" != "Any", although I > wonder how many people will be caught out by that... enough that it's a bad idea.... oh well. -CHB > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/chris. > barker%40noaa.gov > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Dec 22 19:12:31 2017 From: guido at python.org (Guido van Rossum) Date: Fri, 22 Dec 2017 16:12:31 -0800 Subject: [Python-Dev] Guarantee ordered dict literals in v3.7? In-Reply-To: References: <20171104173013.GA4005@bytereef.org> <4C2C51D6-FBB9-44DA-946A-8EDE9FFEA95C@python.org> <23100.33096.202887.23177@turnbull.sk.tsukuba.ac.jp> Message-ID: Let's not change pprint. On Fri, Dec 22, 2017 at 7:44 AM, Eric Fahlgren wrote: > On Thu, Dec 21, 2017 at 7:51 PM, Stephen J. 
Turnbull < > turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: > >> I understand the motivation to guarantee order, but it's a programmer >> convenience that has nothing to do with the idea of mapping, and the >> particular (insertion) order is very special and usually neither >> relevant nor reproducible. I have no problem whatsoever with just >> documenting any failure to preserve order while reproducing dicts, >> *except* that a process that inserts keys in the same order had better >> result in the same insertion order. >> > > json, pickle == png, i.e., guaranteed lossless. > repr, pprint == jpg, lossy for very specific motivating reasons. > > In particular, I use pprint output in regression baselines, and if the > long documented sort-by-key behavior changed, I would not be happy. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL:
From ncoghlan at gmail.com Sat Dec 23 20:54:29 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 24 Dec 2017 11:54:29 +1000 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> Message-ID: On 23 Dec. 2017 9:37 am, "David Mertz" wrote: The name Data seems very intuitive to me without suggesting a type declaration as Any does (but it can still be treated as a synonym by actual type checkers) Type checkers would also be free to interpret it as "infer the type from the default value", rather than necessarily treating it as Any.
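For concreteness, here is a minimal runnable sketch of the alias being discussed. The `Data` name is hypothetical -- the dataclasses module does not actually export it -- so it is simply aliased to `typing.Any` here:

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical alias standing in for the proposed dataclasses.Data;
# a type checker would be free to treat it exactly like typing.Any
# (or to infer the type from the default value, as suggested above).
Data = Any

@dataclass
class Swallow:
    weight_in_oz: Data = 5
    laden: Data = False

s = Swallow(laden=True)
print(s)  # Swallow(weight_in_oz=5, laden=True)
```

Since `Data` is just `Any` at runtime, the decorator only cares that an annotation exists; the generated `__init__`, `__repr__`, and `__eq__` are identical whichever name is used.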
I still wonder about the "fields *must* be annotated" constraint though. I can understand a constraint that the style be *consistent* (i.e. all fields as annotations, or all fields as field instances), since that's needed to determine the field order, but I don't see the problem with the "no annotations" style otherwise. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at ethanhs.me Sun Dec 24 21:32:08 2017 From: ethan at ethanhs.me (Ethan Smith) Date: Sun, 24 Dec 2017 18:32:08 -0800 Subject: [Python-Dev] Supporting functools.singledispatch with classes. Message-ID: Hello all, In https://bugs.python.org/issue32380 I was hoping to add support for singledispatch with methods. Unfortunately, this cannot be achieved internally without ugly attribute or stack hacks. Therefore, I was thinking it would be nice if singledispatch supported a keyword argument of the argument index to dispatch on, thus one can say: class A: @singledispatch(arg=1) def method(self, a): return 'base' @method.register(int) def method(self, a): return 'int' The other option that could work is to define a special decorator for methods def methodsingledispatch(func): """Single-dispatch generic method decorator.""" wrapped = singledispatch(func) def wrapper(*args, **kw): return wrapped.dispatch(args[1].__class__)(*args, **kw) wrapper.register = wrapped.register update_wrapper(wrapper, func) return wrapper Since this is an API change, Ivan said I should post here to get feedback. I prefer the first design, as it is more generic and flexible. There is also the issue of classmethod and staticmethod. Since these are descriptors, I'm not sure they will work with singledispatch at all. if you do @classmethod @singledispatch def foo(cls, arg): ... You lose register on foo, breaking everything. I believe this would require changing classmethod thus is a non-starter. If you do @singledispatch @classmethod def foo(arg): ... 
The wrapper in singledispatch needs to be called as the __func__ in classmethod, but __func__ is readonly. So at the moment, I don't think it is possible to implement singledispatch on classmethod or staticmethod decorated functions. I look forward to people's thoughts on these issues. Cheers, Ethan Smith -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Dec 25 19:41:59 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 26 Dec 2017 10:41:59 +1000 Subject: [Python-Dev] Supporting functools.singledispatch with classes. In-Reply-To: References: Message-ID: On 25 December 2017 at 12:32, Ethan Smith wrote: > So at the moment, I don't think it is possible to implement singledispatch > on classmethod or staticmethod decorated functions. I've posted this to the PR, but adding it here as well: I think this is a situation very similar to the case with functools.partialmethod, where you're going to need to write a separate functools.singledispatchmethod class that's aware of the descriptor protocol, rather than trying to add the functionality directly to functools.singledispatch. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From yogev at intsights.com Tue Dec 26 02:01:48 2017 From: yogev at intsights.com (Yogev Hendel) Date: Tue, 26 Dec 2017 07:01:48 +0000 Subject: [Python-Dev] (no subject) Message-ID: I don't know if this is the right place to put this, but I've found the following lines of code results in an incredibly long processing time. Perhaps it might be of use to someone. *import re* *pat = re.compile('^/?(?:\\w+)/(?:[%\\w-]+/?)+/?$')* *pat.match('/t/a-aa-aa-aaaaa-aa-aa-aa-aa-aa-aa./')* -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hugo.fisher at gmail.com Tue Dec 26 05:16:03 2017 From: hugo.fisher at gmail.com (Hugh Fisher) Date: Tue, 26 Dec 2017 21:16:03 +1100 Subject: [Python-Dev] Heap allocate type structs in native extension modules? Message-ID: I have a Python program which generates the boilerplate code for native extension modules from a Python source definition. (http://bitbucket.org/hugh_fisher/fullofeels if interested.) The examples in the Python doco and the "Python Essential Reference" book all use a statically declared PyTypeObject struct and PyType_Ready in the module init func, so I'm doing the same. Then Python 3.5 added a check for statically allocated types inheriting from heap types, which broke a couple of my classes. And now I'm trying to add a __dict__ to native classes so end users can add their own attributes, and this is turning out to be painful with static PyTypeObject structs Would it be better to use dynamically allocated type structs in native modules? -- cheers, Hugh Fisher From benjamin at python.org Tue Dec 26 09:00:55 2017 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 26 Dec 2017 06:00:55 -0800 Subject: [Python-Dev] Heap allocate type structs in native extension modules? In-Reply-To: References: Message-ID: <1514296855.2976189.1216128192.7E577D55@webmail.messagingengine.com> I imagine Cython already takes care of this? On Tue, Dec 26, 2017, at 02:16, Hugh Fisher wrote: > I have a Python program which generates the boilerplate code for > native extension modules from a Python source definition. > (http://bitbucket.org/hugh_fisher/fullofeels if interested.) > > The examples in the Python doco and the "Python Essential Reference" > book all use a statically declared PyTypeObject struct and > PyType_Ready in the module init func, so I'm doing the same. Then > Python 3.5 added a check for statically allocated types inheriting > from heap types, which broke a couple of my classes. 
And now I'm > trying to add a __dict__ to native classes so end users can add their > own attributes, and this is turning out to be painful with static > PyTypeObject structs > > Would it be better to use dynamically allocated type structs in native > modules? > > -- > > cheers, > Hugh Fisher > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/benjamin%40python.org -------------- next part -------------- An HTML attachment was scrubbed... URL:
From chris.barker at noaa.gov Tue Dec 26 13:49:53 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 26 Dec 2017 10:49:53 -0800 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> Message-ID: On Sat, Dec 23, 2017 at 5:54 PM, Nick Coghlan wrote: > > I still wonder about the "fields *must* be annotated" constraint though. I > can understand a constraint that the style be *consistent* (i.e. all fields > as annotations, or all fields as field instances), since that's needed to > determine the field order, but I don't see the problem with the "no > annotations" style otherwise. > IIUC, without annotations, there is no way to set a field with no default. And supporting both approaches violates "only one way to do it" in, I think, a confusing manner -- particularly if you can't mix and match them. Also, does using class attributes without annotations make a mess when subclassing? -- no I haven't thought that out yet. -CHB > > Cheers, > Nick. > > > > > -- Christopher Barker, Ph.D.
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL:
From python at mrabarnett.plus.com Tue Dec 26 15:15:18 2017 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 26 Dec 2017 20:15:18 +0000 Subject: [Python-Dev] (no subject) In-Reply-To: References: Message-ID: On 2017-12-26 07:01, Yogev Hendel wrote: > > I don't know if this is the right place to put this, > but I've found the following lines of code results in an incredibly long > processing time. > Perhaps it might be of use to someone. > > /import re/ > /pat = re.compile('^/?(?:\\w+)/(?:[%\\w-]+/?)+/?$')/ > /pat.match('/t/a-aa-aa-aaaaa-aa-aa-aa-aa-aa-aa./')/ > The pattern has a repeated repeat, which results in catastrophic backtracking. As an example, think about how the pattern (?:a+)+b would try to match the string 'aaac'. Match 'aaa', but not 'c'. Match 'aa' and 'a', but not 'c'. Match 'a' and 'aa', but not 'c'. Match 'a' and 'a' and 'a', but not 'c'. That's 4 failed attempts. Now try to match the string 'aaaac'. Match 'aaaa', but not 'c'. Match 'aaa' and 'a', but not 'c'. Match 'aa' and 'aa', but not 'c'. Match 'aa' and 'a' and 'a', but not 'c'. Match 'a' and 'aaa', but not 'c'. Match 'a' and 'aa' and 'a', but not 'c'. Match 'a' and 'a' and 'aa', but not 'c'. Match 'a' and 'a' and 'a' and 'a', but not 'c'. That's 8 failed attempts. Each additional 'a' in the string to match will double the number of attempts. Your pattern has (?:[%\w-]+/?)+, and the '/' is optional. The string has a '.', which the pattern can't match, but it'll keep trying until it finally fails. If you add just 1 more 'a' or '-' to the string, it'll take twice as long as it does now. You need to think more carefully about how the pattern matches and what it'll do when it doesn't match.
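The doubling described above is easy to reproduce. Here is a small timing sketch using the simplified pattern from the explanation, kept to short inputs so it still finishes quickly:

```python
import re
import time

pat = re.compile(r'(?:a+)+b')  # a repeated repeat, as in the example above

for n in (12, 14, 16, 18):
    start = time.perf_counter()
    # The trailing 'c' forces the match to fail, so the engine must try
    # every way of splitting the run of 'a's before giving up.
    assert pat.match('a' * n + 'c') is None
    print(n, f'{time.perf_counter() - start:.4f}s')
# Each additional 'a' roughly doubles the time spent backtracking before
# the engine can report that no match exists.
```

A successful match (e.g. against 'aaab') returns almost immediately; only the failing case triggers the exponential search.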
From levkivskyi at gmail.com Tue Dec 26 18:29:08 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 27 Dec 2017 00:29:08 +0100 Subject: [Python-Dev] Supporting functools.singledispatch with classes. In-Reply-To: References: Message-ID: On 26 December 2017 at 01:41, Nick Coghlan wrote: > On 25 December 2017 at 12:32, Ethan Smith wrote: > > So at the moment, I don't think it is possible to implement > singledispatch > > on classmethod or staticmethod decorated functions. > > I've posted this to the PR, but adding it here as well: I think this > is a situation very similar to the case with functools.partialmethod, > where you're going to need to write a separate > functools.singledispatchmethod class that's aware of the descriptor > protocol, rather than trying to add the functionality directly to > functools.singledispatch. > I agree with Nick here. Adding a separate decorator looks like the right approach, especially taking into account the precedent of @partialmethod. -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Tue Dec 26 18:43:42 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 27 Dec 2017 00:43:42 +0100 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> Message-ID: On 22 December 2017 at 20:55, Brett Cannon wrote: > > > On Fri, Dec 22, 2017, 11:38 Chris Barker, wrote: > >> On Fri, Dec 22, 2017 at 8:49 AM, Brett Cannon wrote: >> >>> I think it's worth reminding people that if they don't like the fact >>>> dataclasses (ab)use type hints for their succinct syntax that you can >>>> always use attrs instead to avoid type hints. 
>>>> >>> >> sure -- but this doesn't really address the issue, the whole reason this >> is even a discussion is because dataclasses is going into the standard >> library. Third party packages can do whatever they want, of course. >> >> And the concern is that people (in particular newbies) will get confused >> / the wrong impression / other-negative-response by the (semi) use of >> typing in a standard library module. >> > > I'm still not worried. Type hints are part of the syntax and so are no > worse off than async/await and asyncio IMO. > > >> >>> As for those who feel dataclasses will force them to teach type hints >>> and they simply don't want to, maybe we could help land protocols >>> >> >> Could you please clarify what this is about ??? >> > > There's a PEP by Ivan (on my phone else I would look up the number). > > If anyone is curious this is PEP 544. It is actually already fully supported by mypy, so that one can play with it (you will need to also install typing_extensions, where Protocol class lives until the PEP is approved). -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Tue Dec 26 21:20:45 2017 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 26 Dec 2017 21:20:45 -0500 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <799224d8-0134-74a3-5c44-544adec1e00a@salort.eu> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> Message-ID: <2bf97cba-a4e8-9f95-3cf3-bb0471c04f94@trueblade.com> On 12/21/2017 6:36 AM, Ivan Levkivskyi wrote: > On 21 December 2017 at 11:22, Terry Reedy > wrote: > > On 12/21/2017 4:22 AM, Eric V. 
Smith wrote: > > On 12/21/2017 1:46 AM, Chris Barker wrote: > > > I suggest that it be clear in the docs, and ideally in the > PEP, that the dataclass decorator is using the *annotation* > syntax, and that the only relevant part it uses is that > an annotation exists, but the value of the annotation is > essentially (completely?) ignored. > > > I think the PEP is very clear about this: "The dataclass > decorator examines the class to find fields. A field is defined > as any variable identified in __annotations__. That is, a > variable that has a type annotation. With two exceptions > described below, none of the Data Class machinery examines the > type specified in the annotation." > > > This seems clear enough. It could come after describing what a > dataclass *is*. > > I agree the docs should also be clear about this. > > > > So we should have examples like: > > @dataclass > class C: >     a: ...  # field with no default >     b: ... = 0 # field with a default value > > Then maybe: > > @dataclass > class C: >     a: "the a parameter" # field with no default >     b: "another, different parameter" = 0.0 # field with a > default > > Then the docs can go on to say that if the user wants to > specify a type for use with a static type checking > pre-processor, they can do it like so: > > @dataclass > class C: >     a: int # integer field with no default >     b: float = 0.0 # float field with a default > > And the types will be recognized by type checkers such as mypy. > > And I think the non-typed examples should go first in the docs. > > > Modulo some bike-shedding, the above seems pretty good to me. > > > For me, the three options for "don't care" have a bit different meaning: > > * typing.Any: this class is supposed to be used with static type > checkers, but this field is too dynamic > * ...
(ellipsis): this class may or may not be used with static type > checkers, use the inferred type in the latter case > * "field docstring": this class should not be used with static type checkers > > Assuming this, the second option would be the "real" "don't care". If > this makes sense, > then we can go the way proposed in > https://github.com/python/typing/issues/276 and make ellipsis semantics > "official" in PEP 484. > (pending Guido's approval) In https://github.com/ericvsmith/dataclasses/issues/2#issuecomment-353918024, Guido has suggested using `object`, which has the benefit of not needing an import. And to me, it communicates the "don't care" aspect well enough. I do understand the difference if you're using a type checker (see for example https://stackoverflow.com/questions/39817081/typing-any-vs-object), but if you care about that, use typing.Any. Eric. From ned at nedbatchelder.com Wed Dec 27 00:00:03 2017 From: ned at nedbatchelder.com (Ned Batchelder) Date: Wed, 27 Dec 2017 00:00:03 -0500 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> Message-ID: On 12/26/17 1:49 PM, Chris Barker wrote: > On Sat, Dec 23, 2017 at 5:54 PM, Nick Coghlan > wrote: > > > I still wonder about the "fields *must* be annotated" constraint > though. I can understand a constraint that the style be > *consistent* (i.e. all fields as annotations, or all fields as > field instances), since that's needed to determine the field > order, but I don't see the problem with the "no annotations" style > otherwise. > > > IIUC, without annotations, there is no way to set a field with no default. > > And supporting both approaches violates "only one way to do it" in, I > think, a confusing manner -- particularly if you can't mix and match them. 
> > Also, does using class attributes without annotations make a > mess when subclassing? -- no I haven't thought that out yet. > > I have not been following the design of dataclasses, and maybe I'm misunderstanding the state of the work. My impression is that attrs was a thing, and lots of people loved it, so we wanted something like it in the stdlib. Data Classes is that thing, but it is a new thing being designed from scratch. There are still big questions about how it should work, but it is already a part of 3.7. Often when people propose new things, we say, "Put it on PyPI first, and let's see how people like it." Why isn't that the path for Data Classes? Why are they already part of 3.7 when we have no practical experience with them? Wouldn't it be better to let the design mature with real experience? Especially since some of the questions being asked are about how it interrelates with another large new feature with little practical use yet (typing)? --Ned. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From leewangzhong+python at gmail.com Wed Dec 27 01:55:58 2017 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Wed, 27 Dec 2017 01:55:58 -0500 Subject: [Python-Dev] (no subject) In-Reply-To: References: Message-ID: On Tue, Dec 26, 2017 at 2:01 AM, Yogev Hendel wrote: > > I don't know if this is the right place to put this, > but I've found the following lines of code results in an incredibly long > processing time. > Perhaps it might be of use to someone. > > import re > pat = re.compile('^/?(?:\\w+)/(?:[%\\w-]+/?)+/?$') > pat.match('/t/a-aa-aa-aaaaa-aa-aa-aa-aa-aa-aa./') (I think the correct place is python-list. python-dev is primarily for the developers of Python itself. python-ideas is for proposing new features and changes to the language. python-list is for general discussion. Bug reports and feature requests belong in https://bugs.python.org/ (where your post could also have gone).)
The textbook regular expression algorithm (which I believe grep uses) runs in linear time with respect to the text length. The algorithm used by Perl, Java, Python, JavaScript, Ruby, and many other languages instead use a backtracking algorithm, which can run up to exponential time with respect to text length. This worst-case is in fact necessary (assuming P != NP): Perl allows (introduced?) backreferences, which are NP-hard[1]. Perl also added some other features which complicate things, but backreferences are enough. The user-level solution is to understand how regexes are executed, and to work around it. Here are library-level solutions for your example: 1. Perl now has a regex optimizer, which will eliminate some redundancies. Something similar can be added to Python, at first as a third-party library. 2. In theory, we can use the textbook algorithm when possible, and the backtracking algorithm when necessary. However, the textbook version won't necessarily be faster, and may take more time to create, so there's a tradeoff here. 3. To go even further, I believe it's possible to use the textbook algorithm for subexpressions, while the overall expression uses backtracking, internally iterating through the matches of the textbook algorithm. There's a series of articles by Russ Cox that try to get us back to the textbook (see [2]). He and others implemented the ideas in the C++ library RE2[3], which has Python bindings[4]. RE2 was made for and used on Google Code Search[5] (described in his articles), a (now discontinued) search engine for open-source repos which allowed regular expressions in the queries. You can get a whiff of the limitations of the textbook algorithm by checking out RE2's syntax[6] and seeing what features aren't supported, though some features may be unsupported for different reasons (such as being redundant syntax). 
- Backreferences and lookaround assertions don't have a known solution.[7] - Bounded repetition is only supported up to a limit (1000), because each possible repetition needs its own set of states. - Possessive quantifiers aren't supported. Greedy and reluctant quantifiers are. - Groups and named groups _are_ supported. See the second and third Russ Cox articles, with the term "submatch".[2] (Apologies: I am making up reference syntax on-the-fly.) [1] "Perl Regular Expression Matching is NP-Hard" https://perl.plover.com/NPC/ [2] "Regular Expression Matching Can Be Simple And Fast" https://swtch.com/~rsc/regexp/regexp1.html "Regular Expression Matching: the Virtual Machine Approach" https://swtch.com/~rsc/regexp/regexp2.html "Regular Expression Matching in the Wild" https://swtch.com/~rsc/regexp/regexp3.html "Regular Expression Matching with a Trigram Index" https://swtch.com/~rsc/regexp/regexp4.html [3] RE2: https://github.com/google/re2 [4] pyre2: https://github.com/facebook/pyre2/ Also see re2 and re3 on PyPI, which intend to be a drop-in replacement. re3 is a Py3-compatible fork of re2, which last updated in 2015. [5] https://en.wikipedia.org/wiki/Google_Code_Search [6] https://github.com/google/re2/wiki/Syntax [7] Quote: "As a matter of principle, RE2 does not support constructs for which only backtracking solutions are known to exist. Thus, backreferences and look-around assertions are not supported." https://github.com/google/re2/wiki/WhyRE2 From jonathan.underwood at gmail.com Wed Dec 27 09:19:16 2017 From: jonathan.underwood at gmail.com (Jonathan Underwood) Date: Wed, 27 Dec 2017 14:19:16 +0000 Subject: [Python-Dev] When val=b'', but val == b'' returns False - bytes initialization Message-ID: Hello, I am not sure if this is expected behaviour, or a bug. 
In a C extension module, if I create and return an empty bytes object like this: val = PyBytes_FromStringAndSize (NULL, 20); Py_SIZE(val) = 0; Then from the Python interpreter's perspective: isinstance(val, bytes) returns True print(val) returns b'' print(repr(val)) returns b'' BUT val == b'' returns False. On the other hand, initializing the underlying memory: val = PyBytes_FromStringAndSize (NULL, 20); char *c = PyBytes_AS_STRING (val); c[0] = '\0'; Py_SIZE(val) = 0; Then, from the Python interpreter, val == b'' returns True, as expected. So, my question is: is this the expected behaviour, or a bug? I was slightly surprised to have to initialize the storage. On the other hand, I can perhaps also see it might be expected, since the docs do say that PyBytes_FromStringAndSize will not initialize the underlying storage. Please cc me on any replies - am not subscribed to the list. Many thanks, Jonathan
From solipsis at pitrou.net Wed Dec 27 11:28:41 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 27 Dec 2017 17:28:41 +0100 Subject: [Python-Dev] When val=b'', but val == b'' returns False - bytes initialization References: Message-ID: <20171227172841.1c81dc7b@fsol> On Wed, 27 Dec 2017 14:19:16 +0000 Jonathan Underwood wrote: > Hello, > > I am not sure if this is expected behaviour, or a bug. > > In a C extension module, if I create and return an empty bytes object like this: > > val = PyBytes_FromStringAndSize (NULL, 20); > Py_SIZE(val) = 0; I wouldn't call it "expected", but bytes objects are supposed to be NULL-terminated internally, so your code technically creates an invalid bytes object. The NULL-terminated constraint may be relied on by some code, for example when the string gets passed to a third-party C function. Perhaps that should be mentioned in the C API docs. Regards Antoine.
From barry at python.org Wed Dec 27 11:41:36 2017 From: barry at python.org (Barry Warsaw) Date: Wed, 27 Dec 2017 11:41:36 -0500 Subject: [Python-Dev] Documenting types outside of typing Message-ID: <3AF16B8B-1AF4-4560-9B2E-E027BC920EFD@python.org> In his review of PR#4911, Antoine points to the documentation of two type definitions in importlib.resources, Package and Resource. https://github.com/python/cpython/pull/4911/files#diff-2a479c407f7177f3d7cb876f244e47bcR804 One question is what markup to use for type definitions. I'm using class:: because that's what's used in typing and there doesn't seem to be any better alternative. More to the point, Antoine questions whether these two types should be documented at all: https://github.com/python/cpython/pull/4911#discussion_r158801065 "What I mean is that a class is supposed to specify concrete behaviour, but being a type, Package doesn't have any methods or attributes of its own. So I don't see the point of mentioning it in the docs." I suggest that they are worth documenting because they help to organize the discussion about what API is expected from the arguments to the functions, without having to duplicate that information in every function description. I also think that since you'll see those types in the code, they are worth documenting. I don't think you *lose* anything by including their documentation. But Antoine makes a good point that we probably don't have a lot of precedent here, so suggests we discuss it on python-dev to come up with some useful conventions. I haven't kept up on the dataclasses discussion, but given that types are important in that API too, have the same issues come up there and if so, how are they being handled? Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From solipsis at pitrou.net Wed Dec 27 12:51:15 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 27 Dec 2017 18:51:15 +0100 Subject: [Python-Dev] When val=b'', but val == b'' returns False - bytes initialization References: <20171227172841.1c81dc7b@fsol> Message-ID: <20171227185115.35d7e47c@fsol> On Wed, 27 Dec 2017 17:28:41 +0100 Antoine Pitrou wrote: > On Wed, 27 Dec 2017 14:19:16 +0000 > Jonathan Underwood wrote: > > Hello, > > > > I am not sure if this is expected behaviour, or a bug. > > > > In a C extension module, if I create and return an empty bytes object like this: > > > > val = PyBytes_FromStringAndSize (NULL, 20); > > Py_SIZE(val) = 0; > > I wouldn't call it "expected", but bytes objects are supposed to be > NULL-terminated internally, so your code technically creates an invalid > bytes object. The NULL-terminated constraint may be relied on by some > code, for example when the string gets passed to a third-party C > function. Note this is really happening because you're allocating a 20-long bytes object and then shortening it to 0 bytes. PyBytes_FromStringAndSize() already stored a NULL byte in the 21st place. Regards Antoine. From ethan at ethanhs.me Wed Dec 27 13:22:38 2017 From: ethan at ethanhs.me (Ethan Smith) Date: Wed, 27 Dec 2017 10:22:38 -0800 Subject: [Python-Dev] Supporting functools.singledispatch with classes. In-Reply-To: References: Message-ID: Okay, if there is no further feedback, I will work on a singledispatchmethod decorator like partialmethod. For the future perhaps, would it not be possible to tell that the passed argument is a descriptor/function and dispatch to the correct implementation, thus not needing two functions for essentially the same thing? It seems more straightforward to make the implementation a bit more complex to provide a single, simple API to users. 
Cheers,
Ethan

On Tue, Dec 26, 2017 at 3:29 PM, Ivan Levkivskyi wrote:
> On 26 December 2017 at 01:41, Nick Coghlan wrote:
>> On 25 December 2017 at 12:32, Ethan Smith wrote:
>> > So at the moment, I don't think it is possible to implement
>> > singledispatch on classmethod or staticmethod decorated functions.
>>
>> I've posted this to the PR, but adding it here as well: I think this
>> is a situation very similar to the case with functools.partialmethod,
>> where you're going to need to write a separate
>> functools.singledispatchmethod class that's aware of the descriptor
>> protocol, rather than trying to add the functionality directly to
>> functools.singledispatch.
>
> I agree with Nick here. Adding a separate decorator looks like the right
> approach, especially taking into account the precedent of @partialmethod.
>
> --
> Ivan

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From levkivskyi at gmail.com Wed Dec 27 18:59:01 2017
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Thu, 28 Dec 2017 00:59:01 +0100
Subject: [Python-Dev] Documenting types outside of typing
In-Reply-To: <3AF16B8B-1AF4-4560-9B2E-E027BC920EFD@python.org>
References: <3AF16B8B-1AF4-4560-9B2E-E027BC920EFD@python.org>
Message-ID: 

FWIW, the same problem was discussed a year ago when documenting typing. At
that time the discussion was not conclusive, so some types use the class::
directive while others use the data:: directive. At that time Guido was in
favour of data::, and now, in view of PEP 560, many types in typing will stop
being class objects and will be just (compact) objects. Therefore, my
understanding is that all special forms like Union, Any, ClassVar, etc. will
use data:: in the docs.

Concerning the question of whether it makes sense to document types, I think
it does if it is a publicly available type (or type alias) that will be
useful to annotate user code.
--
Ivan

On 27 December 2017 at 17:41, Barry Warsaw wrote:
> In his review of PR#4911, Antoine points to the documentation of two type
> definitions in importlib.resources, Package and Resource. [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From yselivanov.ml at gmail.com Thu Dec 28 01:08:13 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Thu, 28 Dec 2017 01:08:13 -0500
Subject: [Python-Dev] PEP 567 v2
Message-ID: 

This is a second version of PEP 567.

A few things have changed:

1. I now have a reference implementation:
https://github.com/python/cpython/pull/5027

2. The C API was updated to match the implementation.

3. The get_context() function was renamed to copy_context() to better
reflect what it is really doing.

4. A few clarifications/edits here and there to address earlier feedback.

Yury

PEP: 567
Title: Context Variables
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Dec-2017
Python-Version: 3.7
Post-History: 12-Dec-2017, 28-Dec-2017

Abstract
========

This PEP proposes a new ``contextvars`` module and a set of new CPython C
APIs to support context variables. This concept is similar to thread-local
storage (TLS), but, unlike TLS, it also allows correctly keeping track of
values per asynchronous task, e.g. ``asyncio.Task``.

This proposal is a simplified version of :pep:`550`. The key difference is
that this PEP is concerned only with solving the case for asynchronous
tasks, not for generators. There are no proposed modifications to any
built-in types or to the interpreter.

This proposal is not strictly related to Python Context Managers, although
it does provide a mechanism that can be used by Context Managers to store
their state.

Rationale
=========

Thread-local variables are insufficient for asynchronous tasks that execute
concurrently in the same OS thread. Any context manager that saves and
restores a context value using ``threading.local()`` will have its context
values bleed to other code unexpectedly when used in async/await code.
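To make the Rationale concrete, here is a minimal sketch of the bleeding problem (the ``handler`` coroutine, names, and delays are hypothetical illustrations): two tasks share one OS thread, so a value one task stashes in ``threading.local()`` is visible to, and clobbered by, the other:

```python
import asyncio
import threading

state = threading.local()

async def handler(name, delay):
    state.user = name            # stash a "per-task" value in TLS
    await asyncio.sleep(delay)   # other tasks run on this same thread here
    return state.user            # may now hold another task's value

async def main():
    return await asyncio.gather(handler("alice", 0.05),
                                handler("bob", 0.01))

# Both results are "bob": the second task overwrote the shared
# thread-local value while the first one was suspended.
print(asyncio.run(main()))
```

With the proposed ``ContextVar`` in place of ``threading.local()``, each task would see its own value because every task runs in its own copied ``Context``.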
A few examples where having a working context local storage for asynchronous
code is desirable:

* Context managers like ``decimal`` contexts and ``numpy.errstate``.

* Request-related data, such as security tokens and request data in web
  applications, language context for ``gettext``, etc.

* Profiling, tracing, and logging in large code bases.

Introduction
============

The PEP proposes a new mechanism for managing context variables. The key
classes involved in this mechanism are ``contextvars.Context`` and
``contextvars.ContextVar``. The PEP also proposes some policies for using
the mechanism around asynchronous tasks.

The proposed mechanism for accessing context variables uses the
``ContextVar`` class. A module (such as ``decimal``) that wishes to store
a context variable should:

* declare a module-global variable holding a ``ContextVar`` to serve
  as a key;

* access the current value via the ``get()`` method on the key variable;

* modify the current value via the ``set()`` method on the key variable.

The notion of "current value" deserves special consideration: different
asynchronous tasks that exist and execute concurrently may have different
values for the same key. This idea is well-known from thread-local storage
but in this case the locality of the value is not necessarily bound to a
thread. Instead, there is the notion of the "current ``Context``", which is
stored in thread-local storage and is accessed via the
``contextvars.copy_context()`` function. Manipulation of the current
``Context`` is the responsibility of the task framework, e.g. asyncio.

A ``Context`` is conceptually a read-only mapping, implemented using an
immutable dictionary. The ``ContextVar.get()`` method does a lookup in the
current ``Context`` with ``self`` as a key, raising a ``LookupError`` or
returning a default value specified in the constructor.
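The recommended module pattern above can be sketched with the API exactly as specified in this PEP (a ``decimal``-style precision variable is used as a stand-in example; the helper name is illustrative):

```python
from contextvars import ContextVar

# Module-global key holding this module's context variable,
# as the PEP recommends for modules like decimal.
precision = ContextVar("precision", default=28)

def current_precision():
    # get() looks the key up in the current Context and falls back
    # to the constructor default when the variable is unset there.
    return precision.get()

assert current_precision() == 28
precision.set(50)              # rebinds the key in the current Context only
assert current_precision() == 50
```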
The ``ContextVar.set(value)`` method clones the current ``Context``, assigns
the ``value`` to it with ``self`` as a key, and sets the new ``Context`` as
the new current ``Context``.

Specification
=============

A new standard library module ``contextvars`` is added with the following
APIs:

1. The ``copy_context() -> Context`` function is used to get a copy of the
   current ``Context`` object for the current OS thread.

2. The ``ContextVar`` class is used to declare and access context variables.

3. The ``Context`` class encapsulates context state. Every OS thread stores
   a reference to its current ``Context`` instance. It is not possible to
   control that reference manually. Instead, the
   ``Context.run(callable, *args, **kwargs)`` method is used to run Python
   code in another context.

contextvars.ContextVar
----------------------

The ``ContextVar`` class has the following constructor signature:
``ContextVar(name, *, default=_NO_DEFAULT)``. The ``name`` parameter is used
only for introspection and debug purposes, and is exposed as a read-only
``ContextVar.name`` attribute. The ``default`` parameter is optional.
Example::

    # Declare a context variable 'var' with the default value 42.
    var = ContextVar('var', default=42)

(``_NO_DEFAULT`` is an internal sentinel object used to detect if the
default value was provided.)

``ContextVar.get()`` returns a value for the context variable from the
current ``Context``::

    # Get the value of `var`.
    var.get()

``ContextVar.set(value) -> Token`` is used to set a new value for the
context variable in the current ``Context``::

    # Set the variable 'var' to 1 in the current context.
    var.set(1)

``ContextVar.reset(token)`` is used to reset the variable in the current
context to the value it had before the ``set()`` operation that created the
``token``::

    assert var.get(None) is None

    token = var.set(1)
    try:
        ...
    finally:
        var.reset(token)

    assert var.get(None) is None

The ``ContextVar.reset()`` method is idempotent and can be called multiple
times on the same Token object: second and later calls will be no-ops.

contextvars.Token
-----------------

``contextvars.Token`` is an opaque object that should be used to restore the
``ContextVar`` to its previous value, or to remove it from the context if
the variable was not set before. It can be created only by calling
``ContextVar.set()``.

For debug and introspection purposes it has:

* a read-only attribute ``Token.var`` pointing to the variable that created
  the token;

* a read-only attribute ``Token.old_value`` set to the value the variable
  had before the ``set()`` call, or to ``Token.MISSING`` if the variable
  wasn't set before.

Having ``ContextVar.set()`` return a ``Token`` object, together with the
``ContextVar.reset(token)`` method, allows context variables to be removed
from the context if they were not in it before the ``set()`` call.

contextvars.Context
-------------------

A ``Context`` object is a mapping of context variables to values.

``Context()`` creates an empty context. To get a copy of the current
``Context`` for the current OS thread, use the
``contextvars.copy_context()`` function::

    ctx = contextvars.copy_context()

To run Python code in some ``Context``, use the ``Context.run()`` method::

    ctx.run(function)

Any changes to any context variables that ``function`` causes will be
contained in the ``ctx`` context::

    var = ContextVar('var')
    var.set('spam')

    def function():
        assert var.get() == 'spam'

        var.set('ham')
        assert var.get() == 'ham'

    ctx = copy_context()

    # Any changes that 'function' makes to 'var' will stay
    # isolated in the 'ctx'.
    ctx.run(function)

    assert var.get() == 'spam'

Any changes to the context will be contained in the ``Context`` object on
which ``run()`` is called. ``Context.run()`` is used to control in which
context asyncio callbacks and Tasks are executed.
It can also be used to run some code in a different thread in the context of
the current thread::

    executor = ThreadPoolExecutor()
    current_context = contextvars.copy_context()

    executor.submit(
        lambda: current_context.run(some_function))

``Context`` objects implement the ``collections.abc.Mapping`` ABC. This can
be used to introspect context objects::

    ctx = contextvars.copy_context()

    # Print all context variables and their values in 'ctx':
    print(ctx.items())

    # Print the value of 'some_variable' in context 'ctx':
    print(ctx[some_variable])

asyncio
-------

``asyncio`` uses ``Loop.call_soon()``, ``Loop.call_later()``, and
``Loop.call_at()`` to schedule the asynchronous execution of a function.
``asyncio.Task`` uses ``call_soon()`` to run the wrapped coroutine.

We modify ``Loop.call_{at,later,soon}`` and ``Future.add_done_callback()``
to accept the new optional *context* keyword-only argument, which defaults
to the current context::

    def call_soon(self, callback, *args, context=None):
        if context is None:
            context = contextvars.copy_context()

        # ... some time later
        context.run(callback, *args)

Tasks in asyncio need to maintain their own context that they inherit from
the point they were created at. ``asyncio.Task`` is modified as follows::

    class Task:
        def __init__(self, coro):
            ...
            # Get the current context snapshot.
            self._context = contextvars.copy_context()
            self._loop.call_soon(self._step, context=self._context)

        def _step(self, exc=None):
            ...
            # Every advance of the wrapped coroutine is done in
            # the task's context.
            self._loop.call_soon(self._step, context=self._context)
            ...

C API
-----

1. ``PyContextVar * PyContextVar_New(char *name, PyObject *default)``:
   create a ``ContextVar`` object.

2. ``int PyContextVar_Get(PyContextVar *, PyObject *default_value,
   PyObject **value)``: return ``-1`` if an error occurs during the lookup,
   ``0`` otherwise. If a value for the context variable is found, it will
   be set to the ``value`` pointer.
Otherwise, ``value`` will be set to ``default_value`` when it is not
``NULL``. If ``default_value`` is ``NULL``, ``value`` will be set to the
default value of the variable, which can be ``NULL`` too. ``value`` is
always a borrowed reference.

3. ``PyContextToken * PyContextVar_Set(PyContextVar *, PyObject *)``: set
   the value of the variable in the current context.

4. ``PyContextVar_Reset(PyContextVar *, PyContextToken *)``: reset the
   value of the context variable.

5. ``PyContext * PyContext_New()``: create a new empty context.

6. ``PyContext * PyContext_Copy()``: get a copy of the current context.

7. ``int PyContext_Enter(PyContext *)`` and
   ``int PyContext_Exit(PyContext *)`` allow setting and restoring the
   context for the current OS thread. It is required to always restore the
   previous context::

       PyContext *old_ctx = PyContext_Copy();
       if (old_ctx == NULL) goto error;

       if (PyContext_Enter(new_ctx)) goto error;

       // run some code

       if (PyContext_Exit(old_ctx)) goto error;

Implementation
==============

This section explains high-level implementation details in pseudo-code.
Some optimizations are omitted to keep this section short and clear.

For the purposes of this section, we implement an immutable dictionary
using ``dict.copy()``::

    class _ContextData:

        def __init__(self):
            self._mapping = dict()

        def get(self, key):
            return self._mapping[key]

        def set(self, key, value):
            copy = _ContextData()
            copy._mapping = self._mapping.copy()
            copy._mapping[key] = value
            return copy

        def delete(self, key):
            copy = _ContextData()
            copy._mapping = self._mapping.copy()
            del copy._mapping[key]
            return copy

Every OS thread has a reference to the current ``_ContextData``.
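The copy-on-write behaviour of this pseudo-code can be exercised directly; the class body below is the sketch from the PEP, and the snapshot names are illustrative:

```python
class _ContextData:
    # Immutable-mapping sketch from the PEP, based on dict.copy().

    def __init__(self):
        self._mapping = dict()

    def get(self, key):
        return self._mapping[key]

    def set(self, key, value):
        copy = _ContextData()
        copy._mapping = self._mapping.copy()
        copy._mapping[key] = value
        return copy

    def delete(self, key):
        copy = _ContextData()
        copy._mapping = self._mapping.copy()
        del copy._mapping[key]
        return copy

# Each set()/delete() returns a *new* snapshot; existing snapshots never
# change, so saving "the current context" is just keeping a reference.
d0 = _ContextData()
d1 = d0.set("var", 1)
d2 = d1.set("var", 2)
assert d1.get("var") == 1     # the older snapshot is untouched
assert d2.get("var") == 2
assert "var" not in d0._mapping
```

This immutability is what makes ``copy_context()`` cheap: copying a context is just copying a pointer to the current snapshot.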
``PyThreadState`` is updated with a new ``context_data`` field that points
to a ``_ContextData`` object::

    class PyThreadState:
        context_data: _ContextData

``contextvars.copy_context()`` is implemented as follows::

    def copy_context():
        ts : PyThreadState = PyThreadState_Get()

        if ts.context_data is None:
            ts.context_data = _ContextData()

        ctx = Context()
        ctx._data = ts.context_data
        return ctx

``contextvars.Context`` is a wrapper around ``_ContextData``::

    class Context(collections.abc.Mapping):

        def __init__(self):
            self._data = _ContextData()

        def run(self, callable, *args, **kwargs):
            ts : PyThreadState = PyThreadState_Get()
            saved_data : _ContextData = ts.context_data

            try:
                ts.context_data = self._data
                return callable(*args, **kwargs)
            finally:
                self._data = ts.context_data
                ts.context_data = saved_data

        # Mapping API methods are implemented by delegating
        # `get()` and other Mapping calls to `self._data`.

``contextvars.ContextVar`` interacts with ``PyThreadState.context_data``
directly::

    class ContextVar:

        def __init__(self, name, *, default=_NO_DEFAULT):
            self._name = name
            self._default = default

        @property
        def name(self):
            return self._name

        def get(self, default=_NO_DEFAULT):
            ts : PyThreadState = PyThreadState_Get()
            data : _ContextData = ts.context_data

            try:
                return data.get(self)
            except KeyError:
                pass

            if default is not _NO_DEFAULT:
                return default

            if self._default is not _NO_DEFAULT:
                return self._default

            raise LookupError

        def set(self, value):
            ts : PyThreadState = PyThreadState_Get()
            data : _ContextData = ts.context_data

            try:
                old_value = data.get(self)
            except KeyError:
                old_value = Token.MISSING

            ts.context_data = data.set(self, value)
            return Token(self, old_value)

        def reset(self, token):
            if token._used:
                return

            ts : PyThreadState = PyThreadState_Get()
            data : _ContextData = ts.context_data

            if token._old_value is Token.MISSING:
                ts.context_data = data.delete(token._var)
            else:
                ts.context_data = data.set(token._var,
                                           token._old_value)

            token._used = True

    class Token:

        MISSING = object()

        def __init__(self, var, old_value):
            self._var = var
            self._old_value = old_value
            self._used = False

        @property
        def var(self):
            return self._var

        @property
        def old_value(self):
            return self._old_value

Implementation Notes
====================

* The internal immutable dictionary for ``Context`` is implemented using
  Hash Array Mapped Tries (HAMT). They allow for an O(log N) ``set``
  operation and an O(1) ``copy_context()`` function, where *N* is the
  number of items in the dictionary. For a detailed analysis of HAMT
  performance, please refer to :pep:`550` [1]_.

* ``ContextVar.get()`` has an internal cache for the most recent value,
  which makes it possible to bypass a hash lookup. This is similar to the
  optimization the ``decimal`` module implements to retrieve its context
  from ``PyThreadState_GetDict()``. See :pep:`550`, which explains the
  implementation of the cache in great detail.

Summary of the New APIs
=======================

* A new ``contextvars`` module with ``ContextVar``, ``Context``, and
  ``Token`` classes, and a ``copy_context()`` function.

* ``asyncio.Loop.call_at()``, ``asyncio.Loop.call_later()``,
  ``asyncio.Loop.call_soon()``, and ``asyncio.Future.add_done_callback()``
  run callback functions in the context they were called in. A new
  *context* keyword-only parameter can be used to specify a custom context.

* ``asyncio.Task`` is modified internally to maintain its own context.

Design Considerations
=====================

Why contextvars.Token and not ContextVar.unset()?
-------------------------------------------------

The Token API avoids the need for a ``ContextVar.unset()`` method, which
would be incompatible with the chained-contexts design of :pep:`550`.
Future compatibility with :pep:`550` is desired (at least for Python 3.7)
in case there is demand to support context variables in generators and
asynchronous generators.

The Token API also offers better usability: the user does not have to
special-case the absence of a value.
Compare::

    token = cv.set(blah)
    try:
        # code
    finally:
        cv.reset(token)

with::

    _deleted = object()
    old = cv.get(default=_deleted)
    try:
        cv.set(blah)
        # code
    finally:
        if old is _deleted:
            cv.unset()
        else:
            cv.set(old)

Rejected Ideas
==============

Replication of threading.local() interface
------------------------------------------

Please refer to :pep:`550` where this topic is covered in detail: [2]_.

Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.

Libraries that use ``threading.local()`` to store context-related values
currently work correctly only for synchronous code. Switching them to use
the proposed API will keep their behavior for synchronous code unmodified,
but will automatically enable support for asynchronous code.

Reference Implementation
========================

The reference implementation can be found here: [3]_.

References
==========

.. [1] https://www.python.org/dev/peps/pep-0550/#appendix-hamt-performance-analysis

.. [2] https://www.python.org/dev/peps/pep-0550/#replication-of-threading-local-interface

.. [3] https://github.com/python/cpython/pull/5027

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From victor.stinner at gmail.com Thu Dec 28 04:51:52 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 28 Dec 2017 10:51:52 +0100
Subject: [Python-Dev] PEP 567 v2
In-Reply-To: 
References: 
Message-ID: 

Hi,

I like the new version of the PEP using "read only mapping" and
copy_context(). It's easier to understand.

I'm ok with seeing a context as a mapping, but I am confused about treating
a context variable as a mapping item. I still see a context variable as a
variable, i.e. something which either has a value or not. I just propose to
rename the default parameter of the ContextVar constructor.

On Dec 28,
2017 7:10 AM, "Yury Selivanov" wrote:

ContextVar
----------------------

The ``ContextVar`` class has the following constructor signature:
``ContextVar(name, *, default=_NO_DEFAULT)``. The ``name`` parameter is
used only for introspection and debug purposes, and is exposed as a
read-only ``ContextVar.name`` attribute. The ``default`` parameter is
optional. Example::

    # Declare a context variable 'var' with the default value 42.
    var = ContextVar('var', default=42)

In terms of API, the "default" parameter name is strange. Why not simply
call it "value"?

    var = ContextVar('var', default=42)

and:

    var = ContextVar('var')
    var.set(42)

behave the same, no?

The implementation explains where the "default" name comes from, but IMHO
"value" is a better name.

(The ``_NO_DEFAULT`` is an internal sentinel object used to detect if the
default value was provided.)

I would call it _NOT_SET.

* a read-only attribute ``Token.old_value`` set to the value the variable
  had before the ``set()`` call, or to ``Token.MISSING`` if the variable
  wasn't set before.

Hmm, I also suggest renaming Token.MISSING to Token.NOT_SET. It would be
more consistent with the last sentence.

C API
-----

Would it be possible to make this API private?

2. ``int PyContextVar_Get(PyContextVar *, PyObject *default_value,
   PyObject **value)``: (...) ``value`` is always a borrowed reference.

I'm not sure that it's a good idea to add a new public C function which
returns a borrowed reference. I would prefer to only use (regular) strong
references in the public API. I don't want to elaborate here; you may see:
http://vstinner.readthedocs.io/python_new_stable_api.html

Internally, I don't care, do whatever you want for best performance :-)

Victor
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From njs at pobox.com Thu Dec 28 05:20:05 2017 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 28 Dec 2017 02:20:05 -0800 Subject: [Python-Dev] PEP 567 v2 In-Reply-To: References: Message-ID: On Thu, Dec 28, 2017 at 1:51 AM, Victor Stinner wrote: > var = ContextVar('var', default=42) > > and: > > var = ContextVar('var') > var.set (42) > > behaves the same, no? No, they're different. The second sets the value in the current context. The first sets the value in all contexts that currently exist, and all empty contexts created in the future. -n -- Nathaniel J. Smith -- https://vorpus.org From chris.jerdonek at gmail.com Thu Dec 28 05:28:28 2017 From: chris.jerdonek at gmail.com (Chris Jerdonek) Date: Thu, 28 Dec 2017 02:28:28 -0800 Subject: [Python-Dev] PEP 567 v2 In-Reply-To: References: Message-ID: I have a couple basic questions around how this API could be used in practice. Both of my questions are for the Python API as applied to Tasks in asyncio. 1) Would this API support looking up the value of a context variable for **another** Task? For example, if you're managing multiple tasks using asyncio.wait() and there is an exception in some task, you might want to examine and report the value of a context variable for that task. 2) Would an appropriate use of this API be to assign a unique task id to each task? Or can that be handled more simply? I'm wondering because I recently thought this would be useful, and it doesn't seem like asyncio means for one to subclass Task (though I could be wrong). Thanks, --Chris On Wed, Dec 27, 2017 at 10:08 PM, Yury Selivanov wrote: > This is a second version of PEP 567. > > A few things have changed: > > 1. I now have a reference implementation: > https://github.com/python/cpython/pull/5027 > > 2. The C API was updated to match the implementation. > > 3. The get_context() function was renamed to copy_context() to better > reflect what it is really doing. > > 4. 
Few clarifications/edits here and there to address earlier feedback. > > > Yury > > > PEP: 567 > Title: Context Variables > Version: $Revision$ > Last-Modified: $Date$ > Author: Yury Selivanov > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 12-Dec-2017 > Python-Version: 3.7 > Post-History: 12-Dec-2017, 28-Dec-2017 > > > Abstract > ======== > > This PEP proposes a new ``contextvars`` module and a set of new > CPython C APIs to support context variables. This concept is > similar to thread-local storage (TLS), but, unlike TLS, it also allows > correctly keeping track of values per asynchronous task, e.g. > ``asyncio.Task``. > > This proposal is a simplified version of :pep:`550`. The key > difference is that this PEP is concerned only with solving the case > for asynchronous tasks, not for generators. There are no proposed > modifications to any built-in types or to the interpreter. > > This proposal is not strictly related to Python Context Managers. > Although it does provide a mechanism that can be used by Context > Managers to store their state. > > > Rationale > ========= > > Thread-local variables are insufficient for asynchronous tasks that > execute concurrently in the same OS thread. Any context manager that > saves and restores a context value using ``threading.local()`` will > have its context values bleed to other code unexpectedly when used > in async/await code. > > A few examples where having a working context local storage for > asynchronous code is desirable: > > * Context managers like ``decimal`` contexts and ``numpy.errstate``. > > * Request-related data, such as security tokens and request > data in web applications, language context for ``gettext``, etc. > > * Profiling, tracing, and logging in large code bases. > > > Introduction > ============ > > The PEP proposes a new mechanism for managing context variables. 
> The key classes involved in this mechanism are ``contextvars.Context`` > and ``contextvars.ContextVar``. The PEP also proposes some policies > for using the mechanism around asynchronous tasks. > > The proposed mechanism for accessing context variables uses the > ``ContextVar`` class. A module (such as ``decimal``) that wishes to > store a context variable should: > > * declare a module-global variable holding a ``ContextVar`` to > serve as a key; > > * access the current value via the ``get()`` method on the > key variable; > > * modify the current value via the ``set()`` method on the > key variable. > > The notion of "current value" deserves special consideration: > different asynchronous tasks that exist and execute concurrently > may have different values for the same key. This idea is well-known > from thread-local storage but in this case the locality of the value is > not necessarily bound to a thread. Instead, there is the notion of the > "current ``Context``" which is stored in thread-local storage, and > is accessed via ``contextvars.copy_context()`` function. > Manipulation of the current ``Context`` is the responsibility of the > task framework, e.g. asyncio. > > A ``Context`` is conceptually a read-only mapping, implemented using > an immutable dictionary. The ``ContextVar.get()`` method does a > lookup in the current ``Context`` with ``self`` as a key, raising a > ``LookupError`` or returning a default value specified in > the constructor. > > The ``ContextVar.set(value)`` method clones the current ``Context``, > assigns the ``value`` to it with ``self`` as a key, and sets the > new ``Context`` as the new current ``Context``. > > > Specification > ============= > > A new standard library module ``contextvars`` is added with the > following APIs: > > 1. ``copy_context() -> Context`` function is used to get a copy of > the current ``Context`` object for the current OS thread. > > 2. ``ContextVar`` class to declare and access context variables. > > 3. 
``Context`` class encapsulates context state. Every OS thread > stores a reference to its current ``Context`` instance. > It is not possible to control that reference manually. > Instead, the ``Context.run(callable, *args, **kwargs)`` method is > used to run Python code in another context. > > > contextvars.ContextVar > ---------------------- > > The ``ContextVar`` class has the following constructor signature: > ``ContextVar(name, *, default=_NO_DEFAULT)``. The ``name`` parameter > is used only for introspection and debug purposes, and is exposed > as a read-only ``ContextVar.name`` attribute. The ``default`` > parameter is optional. Example:: > > # Declare a context variable 'var' with the default value 42. > var = ContextVar('var', default=42) > > (The ``_NO_DEFAULT`` is an internal sentinel object used to > detect if the default value was provided.) > > ``ContextVar.get()`` returns a value for context variable from the > current ``Context``:: > > # Get the value of `var`. > var.get() > > ``ContextVar.set(value) -> Token`` is used to set a new value for > the context variable in the current ``Context``:: > > # Set the variable 'var' to 1 in the current context. > var.set(1) > > ``ContextVar.reset(token)`` is used to reset the variable in the > current context to the value it had before the ``set()`` operation > that created the ``token``:: > > assert var.get(None) is None > > token = var.set(1) > try: > ... > finally: > var.reset(token) > > assert var.get(None) is None > > ``ContextVar.reset()`` method is idempotent and can be called > multiple times on the same Token object: second and later calls > will be no-ops. > > > contextvars.Token > ----------------- > > ``contextvars.Token`` is an opaque object that should be used to > restore the ``ContextVar`` to its previous value, or remove it from > the context if the variable was not set before. It can be created > only by calling ``ContextVar.set()``. 
> > For debug and introspection purposes it has: > > * a read-only attribute ``Token.var`` pointing to the variable > that created the token; > > * a read-only attribute ``Token.old_value`` set to the value the > variable had before the ``set()`` call, or to ``Token.MISSING`` > if the variable wasn't set before. > > Having the ``ContextVar.set()`` method returning a ``Token`` object > and the ``ContextVar.reset(token)`` method, allows context variables > to be removed from the context if they were not in it before the > ``set()`` call. > > > contextvars.Context > ------------------- > > ``Context`` object is a mapping of context variables to values. > > ``Context()`` creates an empty context. To get a copy of the current > ``Context`` for the current OS thread, use the > ``contextvars.copy_context()`` method:: > > ctx = contextvars.copy_context() > > To run Python code in some ``Context``, use ``Context.run()`` > method:: > > ctx.run(function) > > Any changes to any context variables that ``function`` causes will > be contained in the ``ctx`` context:: > > var = ContextVar('var') > var.set('spam') > > def function(): > assert var.get() == 'spam' > > var.set('ham') > assert var.get() == 'ham' > > ctx = copy_context() > > # Any changes that 'function' makes to 'var' will stay > # isolated in the 'ctx'. > ctx.run(function) > > assert var.get() == 'spam' > > Any changes to the context will be contained in the ``Context`` > object on which ``run()`` is called on. > > ``Context.run()`` is used to control in which context asyncio > callbacks and Tasks are executed. It can also be used to run some > code in a different thread in the context of the current thread:: > > executor = ThreadPoolExecutor() > current_context = contextvars.copy_context() > > executor.submit( > lambda: current_context.run(some_function)) > > ``Context`` objects implement the ``collections.abc.Mapping`` ABC. 
> This can be used to introspect context objects:: > > ctx = contextvars.copy_context() > > # Print all context variables and their values in 'ctx': > print(ctx.items()) > > # Print the value of 'some_variable' in context 'ctx': > print(ctx[some_variable]) > > > asyncio > ------- > > ``asyncio`` uses ``Loop.call_soon()``, ``Loop.call_later()``, > and ``Loop.call_at()`` to schedule the asynchronous execution of a > function. ``asyncio.Task`` uses ``call_soon()`` to run the > wrapped coroutine. > > We modify ``Loop.call_{at,later,soon}`` and > ``Future.add_done_callback()`` to accept the new optional *context* > keyword-only argument, which defaults to the current context:: > > def call_soon(self, callback, *args, context=None): > if context is None: > context = contextvars.copy_context() > > # ... some time later > context.run(callback, *args) > > Tasks in asyncio need to maintain their own context that they inherit > from the point they were created at. ``asyncio.Task`` is modified > as follows:: > > class Task: > def __init__(self, coro): > ... > # Get the current context snapshot. > self._context = contextvars.copy_context() > self._loop.call_soon(self._step, context=self._context) > > def _step(self, exc=None): > ... > # Every advance of the wrapped coroutine is done in > # the task's context. > self._loop.call_soon(self._step, context=self._context) > ... > > > C API > ----- > > 1. ``PyContextVar * PyContextVar_New(char *name, PyObject *default)``: > create a ``ContextVar`` object. > > 2. ``int PyContextVar_Get(PyContextVar *, PyObject *default_value, > PyObject **value)``: > return ``-1`` if an error occurs during the lookup, ``0`` otherwise. > If a value for the context variable is found, it will be set to the > ``value`` pointer. Otherwise, ``value`` will be set to > ``default_value`` when it is not ``NULL``. If ``default_value`` is > ``NULL``, ``value`` will be set to the default value of the > variable, which can be ``NULL`` too. 
``value`` is always a borrowed > reference. > > 3. ``PyContextToken * PyContextVar_Set(PyContextVar *, PyObject *)``: > set the value of the variable in the current context. > > 4. ``PyContextVar_Reset(PyContextVar *, PyContextToken *)``: > reset the value of the context variable. > > 5. ``PyContext * PyContext_New()``: create a new empty context. > > 6. ``PyContext * PyContext_Copy()``: get a copy of the current context. > > 7. ``int PyContext_Enter(PyContext *)`` and > ``int PyContext_Exit(PyContext *)`` allow to set and restore > the context for the current OS thread. It is required to always > restore the previous context:: > > PyContext *old_ctx = PyContext_Copy(); > if (old_ctx == NULL) goto error; > > if (PyContext_Enter(new_ctx)) goto error; > > // run some code > > if (PyContext_Exit(old_ctx)) goto error; > > > Implementation > ============== > > This section explains high-level implementation details in > pseudo-code. Some optimizations are omitted to keep this section > short and clear. > > For the purposes of this section, we implement an immutable dictionary > using ``dict.copy()``:: > > class _ContextData: > > def __init__(self): > self._mapping = dict() > > def get(self, key): > return self._mapping[key] > > def set(self, key, value): > copy = _ContextData() > copy._mapping = self._mapping.copy() > copy._mapping[key] = value > return copy > > def delete(self, key): > copy = _ContextData() > copy._mapping = self._mapping.copy() > del copy._mapping[key] > return copy > > Every OS thread has a reference to the current ``_ContextData``. 
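The essential property of the ``_ContextData`` sketch above is that ``set()`` and ``delete()`` never mutate an existing snapshot; they always return a new one. A self-contained check (the class is repeated verbatim so the snippet runs on its own):

```python
class _ContextData:
    # Same naive immutable mapping as in the sketch above.
    def __init__(self):
        self._mapping = dict()

    def get(self, key):
        return self._mapping[key]

    def set(self, key, value):
        copy = _ContextData()
        copy._mapping = self._mapping.copy()
        copy._mapping[key] = value
        return copy

    def delete(self, key):
        copy = _ContextData()
        copy._mapping = self._mapping.copy()
        del copy._mapping[key]
        return copy


empty = _ContextData()
d1 = empty.set('var', 'spam')
d2 = d1.set('var', 'ham')

assert d1.get('var') == 'spam'   # older snapshots are unaffected
assert d2.get('var') == 'ham'
assert empty._mapping == {}      # the original is still empty

d3 = d2.delete('var')
assert d2.get('var') == 'ham'    # delete() also copies, never mutates
```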
> ``PyThreadState`` is updated with a new ``context_data`` field that > points to a ``_ContextData`` object:: > > class PyThreadState: > context_data: _ContextData > > ``contextvars.copy_context()`` is implemented as follows:: > > def copy_context(): > ts : PyThreadState = PyThreadState_Get() > > if ts.context_data is None: > ts.context_data = _ContextData() > > ctx = Context() > ctx._data = ts.context_data > return ctx > > ``contextvars.Context`` is a wrapper around ``_ContextData``:: > > class Context(collections.abc.Mapping): > > def __init__(self): > self._data = _ContextData() > > def run(self, callable, *args, **kwargs): > ts : PyThreadState = PyThreadState_Get() > saved_data : _ContextData = ts.context_data > > try: > ts.context_data = self._data > return callable(*args, **kwargs) > finally: > self._data = ts.context_data > ts.context_data = saved_data > > # Mapping API methods are implemented by delegating > # `get()` and other Mapping calls to `self._data`. > > ``contextvars.ContextVar`` interacts with > ``PyThreadState.context_data`` directly:: > > class ContextVar: > > def __init__(self, name, *, default=_NO_DEFAULT): > self._name = name > self._default = default > > @property > def name(self): > return self._name > > def get(self, default=_NO_DEFAULT): > ts : PyThreadState = PyThreadState_Get() > data : _ContextData = ts.context_data > > try: > return data.get(self) > except KeyError: > pass > > if default is not _NO_DEFAULT: > return default > > if self._default is not _NO_DEFAULT: > return self._default > > raise LookupError > > def set(self, value): > ts : PyThreadState = PyThreadState_Get() > data : _ContextData = ts.context_data > > try: > old_value = data.get(self) > except KeyError: > old_value = Token.MISSING > > ts.context_data = data.set(self, value) > return Token(self, old_value) > > def reset(self, token): > if token._used: > return > > ts : PyThreadState = PyThreadState_Get() > data : _ContextData = ts.context_data > > if token._old_value is Token.MISSING: > ts.context_data = data.delete(token._var) > else: >
ts.context_data = data.set(token._var, > token._old_value) > > token._used = True > > > class Token: > > MISSING = object() > > def __init__(self, var, old_value): > self._var = var > self._old_value = old_value > self._used = False > > @property > def var(self): > return self._var > > @property > def old_value(self): > return self._old_value > > > Implementation Notes > ==================== > > * The internal immutable dictionary for ``Context`` is implemented > using Hash Array Mapped Tries (HAMT). They allow for an O(log N) > ``set`` operation and an O(1) ``copy_context()`` function, where > *N* is the number of items in the dictionary. For a detailed > analysis of HAMT performance, please refer to :pep:`550` [1]_. > > * ``ContextVar.get()`` has an internal cache for the most recent > value, which makes it possible to bypass a hash lookup. This is similar > to the optimization the ``decimal`` module implements to > retrieve its context from ``PyThreadState_GetDict()``. > See :pep:`550`, which explains the implementation of the cache > in great detail. > > > Summary of the New APIs > ======================= > > * A new ``contextvars`` module with ``ContextVar``, ``Context``, > and ``Token`` classes, and a ``copy_context()`` function. > > * ``asyncio.Loop.call_at()``, ``asyncio.Loop.call_later()``, > ``asyncio.Loop.call_soon()``, and > ``asyncio.Future.add_done_callback()`` run callback functions in > the context they were called in. A new *context* keyword-only > parameter can be used to specify a custom context. > > * ``asyncio.Task`` is modified internally to maintain its own > context. > > > Design Considerations > ===================== > > Why contextvars.Token and not ContextVar.unset()? > ------------------------------------------------- > > The Token API makes it possible to avoid having a ``ContextVar.unset()`` > method, which would be incompatible with the chained-contexts design of > :pep:`550`.
Future compatibility with :pep:`550` is desired > (at least for Python 3.7) in case there is demand to support > context variables in generators and asynchronous generators. > > The Token API also offers better usability: the user does not have > to special-case absence of a value. Compare:: > > token = cv.get() > try: > cv.set(blah) > # code > finally: > cv.reset(token) > > with:: > > _deleted = object() > old = cv.get(default=_deleted) > try: > cv.set(blah) > # code > finally: > if old is _deleted: > cv.unset() > else: > cv.set(old) > > > Rejected Ideas > ============== > > Replication of threading.local() interface > ------------------------------------------ > > Please refer to :pep:`550` where this topic is covered in detail: [2]_. > > > Backwards Compatibility > ======================= > > This proposal preserves 100% backwards compatibility. > > Libraries that use ``threading.local()`` to store context-related > values, currently work correctly only for synchronous code. Switching > them to use the proposed API will keep their behavior for synchronous > code unmodified, but will automatically enable support for > asynchronous code. > > > Reference Implementation > ======================== > > The reference implementation can be found here: [3]_. > > > References > ========== > > .. [1] https://www.python.org/dev/peps/pep-0550/#appendix-hamt- > performance-analysis > > .. [2] https://www.python.org/dev/peps/pep-0550/#replication-of- > threading-local-interface > > .. [3] https://github.com/python/cpython/pull/5027 > > > Copyright > ========= > > This document has been placed in the public domain. > > > .. 
> Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > chris.jerdonek%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Thu Dec 28 05:21:53 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 28 Dec 2017 12:21:53 +0200 Subject: [Python-Dev] 'continue'/'break'/'return' inside 'finally' clause Message-ID: Currently 'continue' is prohibited inside 'finally' clause, but 'break' and 'return' are allowed. What is the reason for this? 'continue' was prohibited in https://bugs.python.org/issue1542451. Should we also prohibit 'break' and 'return', or allow 'continue'? 'break' and 'return' are never used inside 'finally' clause in the stdlib. From erik.m.bray at gmail.com Thu Dec 28 06:29:10 2017 From: erik.m.bray at gmail.com (Erik Bray) Date: Thu, 28 Dec 2017 12:29:10 +0100 Subject: [Python-Dev] Heap allocate type structs in native extension modules? In-Reply-To: <1514296855.2976189.1216128192.7E577D55@webmail.messagingengine.com> References: <1514296855.2976189.1216128192.7E577D55@webmail.messagingengine.com> Message-ID: On Tue, Dec 26, 2017 at 3:00 PM, Benjamin Peterson wrote: > I imagine Cython already takes care of this? This appears to have a distinct purpose, albeit not unrelated to Cython. The OP's program would generate boilerplate C code for extension types, the rest of which would perhaps be implemented by hand in C. Cython does this as well to an extent, but the generated code contains quite a bit of Cython-specific cruft and is not really meant to be edited by hand or read by humans in most cases. Anyways I don't think this answers the OP's question.
> On Tue, Dec 26, 2017, at 02:16, Hugh Fisher wrote: >> I have a Python program which generates the boilerplate code for >> native extension modules from a Python source definition. >> (http://bitbucket.org/hugh_fisher/fullofeels if interested.) >> >> The examples in the Python doco and the "Python Essential Reference" >> book all use a statically declared PyTypeObject struct and >> PyType_Ready in the module init func, so I'm doing the same. Then >> Python 3.5 added a check for statically allocated types inheriting >> from heap types, which broke a couple of my classes. And now I'm >> trying to add a __dict__ to native classes so end users can add their >> own attributes, and this is turning out to be painful with static >> PyTypeObject structs. >> >> Would it be better to use dynamically allocated type structs in native >> modules? From gvanrossum at gmail.com Thu Dec 28 09:38:15 2017 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu, 28 Dec 2017 06:38:15 -0800 Subject: [Python-Dev] 'continue'/'break'/'return' inside 'finally' clause In-Reply-To: References: Message-ID: Looks to me the prohibition was to prevent a crash. It makes more sense to fix it. On Dec 28, 2017 03:39, "Serhiy Storchaka" wrote: Currently 'continue' is prohibited inside 'finally' clause, but 'break' and 'return' are allowed. What is the reason for this? 'continue' was prohibited in https://bugs.python.org/issue1542451. Should we also prohibit 'break' and 'return', or allow 'continue'? 'break' and 'return' are never used inside 'finally' clause in the stdlib. _______________________________________________ Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido% 40python.org -------------- next part -------------- An HTML attachment was scrubbed...
URL: From barry at python.org Thu Dec 28 10:14:05 2017 From: barry at python.org (Barry Warsaw) Date: Thu, 28 Dec 2017 10:14:05 -0500 Subject: [Python-Dev] Documenting types outside of typing In-Reply-To: References: <3AF16B8B-1AF4-4560-9B2E-E027BC920EFD@python.org> Message-ID: On Dec 27, 2017, at 18:59, Ivan Levkivskyi wrote: > > FWIW the same problem was discussed a year ago when documenting typing. At that time the discussion was not conclusive, > so some types use the class:: directive while others use the data:: directive. At that time Guido was in favour of data:: and now in view of > PEP 560 many types in typing will stop being class objects, and will be just (compact) objects. Therefore, my understanding is that > all special forms like Union, Any, ClassVar, etc. will use data:: in the docs. Thanks. I see that typing.rst has been updated to use data:: so I'll change my branch accordingly. > Concerning the question of whether it makes sense to document types, I think it makes sense if it is a publicly available type (or type alias) > that will be useful to annotate user code. Thanks, that's my feeling about it too. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From yselivanov.ml at gmail.com Thu Dec 28 10:36:37 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 28 Dec 2017 10:36:37 -0500 Subject: [Python-Dev] PEP 567 v2 In-Reply-To: References: Message-ID: On Thu, Dec 28, 2017 at 5:28 AM, Chris Jerdonek wrote: > I have a couple basic questions around how this API could be used in > practice. Both of my questions are for the Python API as applied to Tasks in > asyncio. > > 1) Would this API support looking up the value of a context variable for > **another** Task?
For example, if you're managing multiple tasks using > asyncio.wait() and there is an exception in some task, you might want to > examine and report the value of a context variable for that task. No, unless that other Task explicitly shares the value or captures its context and shares it. Same as with threading.local. > > 2) Would an appropriate use of this API be to assign a unique task id to > each task? Or can that be handled more simply? I'm wondering because I > recently thought this would be useful, and it doesn't seem like asyncio > means for one to subclass Task (though I could be wrong). The API should be used to share one ID between a Task and tasks it creates. You can use it to store individual Task IDs, but a combination of a WeakKeyDictionary and Task.current_task() seems to be a better/easier option. Yury From yselivanov.ml at gmail.com Thu Dec 28 10:42:28 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 28 Dec 2017 10:42:28 -0500 Subject: [Python-Dev] PEP 567 v2 In-Reply-To: References: Message-ID: On Thu, Dec 28, 2017 at 4:51 AM, Victor Stinner wrote: > Hi, > > I like the new version of the PEP using "read only mapping" and > copy_context(). It's easier to understand. Thanks, Victor! > > I'm ok with seeing a context as a mapping, but I am confused about a context > variable considered as a mapping item. I still see a context variable as a > variable, so something which has a value or not. I just propose to rename > the default parameter of the ContextVar constructor. As Nathaniel already explained, a 'default' for ContextVars is literally a default -- the default value returned when a ContextVar hasn't been assigned a value in a context. So my opinion on this is that 'default' is the less ambiguous name here. [..] > > * a read-only attribute ``Token.old_value`` set to the value the > variable had before the ``set()`` call, or to ``Token.MISSING`` > if the variable wasn't set before.
> > > Hum, I also suggest to rename Token.MISSING to Token.NOT_SET. It would be > more consistent with the last sentence. I like MISSING more than NOT_SET, but this is very subjective, of course. If Guido wants to rename it, I'll rename it. > C API > ----- > > Would it be possible to make this API private? We want _decimal and numpy to use the new API, and they will call ContextVar.get() on basically all operations, so it needs to be as fast as possible. asyncio/uvloop also want the fastest copy_context() and Context.run() possible, as they use them for *every* callback. So I think it's OK for us to add new C APIs here. > > 2. ``int PyContextVar_Get(PyContextVar *, PyObject *default_value, > PyObject **value)``: > (...) ``value`` is always a borrowed > reference. > > > I'm not sure that it's a good idea to add a new public C function which > returns a borrowed reference. I would prefer to only use (regular) strong > references in the public API. Sure, I'll change it. Yury From gvanrossum at gmail.com Thu Dec 28 11:00:43 2017 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu, 28 Dec 2017 08:00:43 -0800 Subject: [Python-Dev] PEP 567 v2 In-Reply-To: References: Message-ID: Keep MISSING. On Dec 28, 2017 8:44 AM, "Yury Selivanov" wrote: > On Thu, Dec 28, 2017 at 4:51 AM, Victor Stinner > wrote: > > Hi, > > > > I like the new version of the PEP using "read only mapping" and > > copy_context(). It's easier to understand. > > Thanks, Victor! > > > > > I'm ok with seeing a context as a mapping, but I am confused about a > context > > variable considered as a mapping item. I still see a context variable as > a > > variable, so something which has a value or not. I just propose to rename > > the default parameter of the ContextVar constructor. > > As Nathaniel already explained, a 'default' for ContextVars is > literally a default -- the default value returned when a ContextVar hasn't > been assigned a value in a context.
So my opinion on this is that > 'default' is the less ambiguous name here. > > [..] > > > > * a read-only attribute ``Token.old_value`` set to the value the > > variable had before the ``set()`` call, or to ``Token.MISSING`` > > if the variable wasn't set before. > > > > > > Hum, I also suggest to rename Token.MISSING to Token.NOT_SET. It would be > > more consistent with the last sentence. > > I like MISSING more than NOT_SET, but this is very subjective, of > course. If Guido wants to rename it, I'll rename it. > > > > C API > > ----- > > > > > > Would it be possible to make this API private? > > We want _decimal and numpy to use the new API, and they will call > ContextVar.get() on basically all operations, so it needs to be as > fast as possible. asyncio/uvloop also want the fastest copy_context() > and Context.run() possible, as they use them for *every* callback. So > I think it's OK for us to add new C APIs here. > > > > > > 2. ``int PyContextVar_Get(PyContextVar *, PyObject *default_value, > > PyObject **value)``: > > (...) ``value`` is always a borrowed > > reference. > > > > > > I'm not sure that it's a good idea to add a new public C function which > > returns a borrowed reference. I would prefer to only use (regular) strong > > references in the public API. > > Sure, I'll change it. > > Yury > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > guido%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Thu Dec 28 13:36:35 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 28 Dec 2017 20:36:35 +0200 Subject: [Python-Dev] 'continue'/'break'/'return' inside 'finally' clause In-Reply-To: References: Message-ID: 28.12.17 16:38, Guido van Rossum wrote: > Looks to me the prohibition was to prevent a crash.
It makes more sense > to fix it. The crash can be fixed by just removing the check after finishing issue17611. But is there any use case for 'continue'/'break'/'return' inside 'finally' clause? The code like try: return 1 finally: return 2 or try: continue finally: break looks at least confusing. Currently 'break' and 'return' are never used inside 'finally' clause in the stdlib. I would want to see third-party code that uses them. From gvanrossum at gmail.com Thu Dec 28 15:30:19 2017 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu, 28 Dec 2017 12:30:19 -0800 Subject: [Python-Dev] 'continue'/'break'/'return' inside 'finally' clause In-Reply-To: References: Message-ID: I don't think the language definition should be judgmental here. The semantics are unambiguous. On Dec 28, 2017 11:38 AM, "Serhiy Storchaka" wrote: > 28.12.17 16:38, Guido van Rossum wrote: > >> Looks to me the prohibition was to prevent a crash. It makes more sense >> to fix it. >> > > The crash can be fixed by just removing the check after finishing > issue17611. > > But is there any use case for 'continue'/'break'/'return' inside 'finally' > clause? The code like > > try: > return 1 > finally: > return 2 > > or > > try: > continue > finally: > break > > looks at least confusing. Currently 'break' and 'return' are never used > inside 'finally' clause in the stdlib. I would want to see third-party > code that uses them. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido% > 40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Thu Dec 28 18:48:00 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 29 Dec 2017 00:48:00 +0100 Subject: [Python-Dev] PEP 567 v2 In-Reply-To: References: Message-ID: Le 28 déc.
2017 11:20 AM, "Nathaniel Smith" a écrit : On Thu, Dec 28, 2017 at 1:51 AM, Victor Stinner wrote: > var = ContextVar('var', default=42) > > and: > > var = ContextVar('var') > var.set(42) > > behaves the same, no? No, they're different. The second sets the value in the current context. The first sets the value in all contexts that currently exist, and all empty contexts created in the future. Oh, that's important information. In this case, "default" is the best name. The PEP could be more explicit about the effect on all contexts. Proposed documentation: "The optional *default* parameter is the default value in all contexts. If the variable is not set in the current context, it is returned by context[var_name] and by var.get(), when get() is called without the default parameter." Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Thu Dec 28 22:45:30 2017 From: brett at python.org (Brett Cannon) Date: Fri, 29 Dec 2017 03:45:30 +0000 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> Message-ID: On Tue, 26 Dec 2017 at 21:00 Ned Batchelder wrote: > On 12/26/17 1:49 PM, Chris Barker wrote: > > On Sat, Dec 23, 2017 at 5:54 PM, Nick Coghlan wrote: > >> >> I still wonder about the "fields *must* be annotated" constraint though.
> > Also, could does using class attributes without annotations make a mess > when subclassing? -- no I haven't thought that out yet. > > > > I have not been following the design of dataclasses, and maybe I'm > misunderstanding the state of the work. My impression is that attrs was a > thing, and lots of people loved it, so we wanted something like it in the > stdlib. > Yes. > Data Classes is that thing, but it is a new thing being designed from > scratch. There are still big questions about how it should work, but it is > already a part of 3.7. > I wouldn't characterize it as "big questions". For some people there's a question as to how to make them work without type hints, but otherwise how they function is settled. > > Often when people propose new things, we say, "Put it on PyPI first, and > let's see how people like it." Why isn't that the path for Data Classes? > Why are they already part of 3.7 when we have no practical experience with > them? Wouldn't it be better to let the design mature with real > experience? Especially since some of the questions being asked are about > how it interrelates with another large new feature with little practical > use yet (typing)? > The short answer: "Guido said so". :) The long answer (based on my understanding, which could be wrong :) : Guido liked the idea of an attrs-like thing in the stdlib, but not attrs itself as Guido was after a different API. Eric V. Smith volunteered to work on a solution, and so Guido, Hynek, and Eric got together and discussed things at PyCon US. A design was hashed out, Eric went away and implemented it, and that led to the current solution. The only thing left is some people don't like type hints and so they don't want a stdlib module that requires them to function (there's no issue with how they relate *to* type hints, just how to make dataclasses work *without* type hints). So right now we are trying to decide what should represent the "don't care" type hint. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin at python.org Fri Dec 29 02:00:47 2017 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 28 Dec 2017 23:00:47 -0800 Subject: [Python-Dev] Heap allocate type structs in native extension modules? In-Reply-To: References: <1514296855.2976189.1216128192.7E577D55@webmail.messagingengine.com> Message-ID: <1514530847.3669370.1218537368.10E68DAE@webmail.messagingengine.com> On Thu, Dec 28, 2017, at 03:29, Erik Bray wrote: > On Tue, Dec 26, 2017 at 3:00 PM, Benjamin Peterson wrote: > > I imagine Cython already takes care of this? > > This appears to have a distinct purpose, albeit not unrelated to > Cython. The OP's program would generate boilerplate C code for > extension types the rest of which would perhaps be implemented by hand > in C. Cython does this as well to an extent, but the generated code > contains quite a bit of Cython-specific cruft and is not really meant > to be edited by hand or read by humans in most cases. It still seems the OP is likely to reinvent a lot of Cython. One option is to write a bunch of "pure" .c and then only have the Python bindings in Cython. > > Anyways I don't think this answers the OP's question. I think this belongs on python-list anyway. From storchaka at gmail.com Fri Dec 29 04:25:14 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 29 Dec 2017 11:25:14 +0200 Subject: [Python-Dev] Heap allocate type structs in native extension modules? In-Reply-To: References: Message-ID: 26.12.17 12:16, Hugh Fisher ????: > I have a Python program which generates the boilerplate code for > native extension modules from a Python source definition. > (http://bitbucket.org/hugh_fisher/fullofeels if interested.) > > The examples in the Python doco and the "Python Essential Reference" > book all use a statically declared PyTypeObject struct and > PyType_Ready in the module init func, so I'm doing the same. 
Then > Python 3.5 added a check for statically allocated types inheriting > from heap types, which broke a couple of my classes. And now I'm > trying to add a __dict__ to native classes so end users can add their > own attributes, and this is turning out to be painful with static > PyTypeObject structs > > Would it be better to use dynamically allocated type structs in native modules? Yes, you can create heap types by using PyType_FromSpecWithBases(). But be aware of caveats (https://bugs.python.org/issue26979). From ethan at ethanhs.me Fri Dec 29 05:23:56 2017 From: ethan at ethanhs.me (Ethan Smith) Date: Fri, 29 Dec 2017 02:23:56 -0800 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: References: Message-ID: Hello all, I've recently been experimenting with dataclasses. They totally rock! A lot of the boilerplate for the AST I've designed in Python is automatically taken care of, it's really great! However, I have a few concerns about the implementation. In a few cases I want to override the repr of the AST nodes. I wrote a __repr__ and ran the code but lo and behold I got a type error. I couldn't override it. I quickly learned that one needs to pass a keyword to the dataclass decorator to tell it *not* to auto generate methods you override. I have two usability concerns with the current implementation. I emailed Eric about the first, and he said I should ask for thoughts here. The second I found after a couple of days sitting on this message. The first is that needing both a keyword and method is duplicative and unnecessary. Eric agreed it was a hassle, but felt it was justified considering someone may accidentally override a dataclass method. I disagree with this point of view as dataclasses are billed as providing automatic methods. Overriding via method definition is very natural and idiomatic. 
I don't really see how someone could accidentally override a dataclass method if the dataclass decorator simply did not generate methods that are already defined in the class at definition time. The second concern, which I came across more recently, is that if I have a base class, and dataclasses inherit from this base class, the inherited __repr__ & co are silently overridden by dataclass. This is both unexpected and means I need to pass repr=False to each subclass' decorator to get correct behavior, which somewhat defeats the utility of subclassing. I'm not sure a whole lot can be done about this, though.
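A small sketch of both concerns, assuming the decorator behaves as described in this thread (generated methods replace inherited ones unless the corresponding keyword is passed); the class names are illustrative only:

```python
from dataclasses import dataclass

class Base:
    def __repr__(self):
        return f'<{type(self).__name__}>'

# Without repr=False, the generated __repr__ silently shadows Base's.
@dataclass
class Auto(Base):
    value: int = 0

# Passing repr=False preserves the inherited __repr__.
@dataclass(repr=False)
class Manual(Base):
    value: int = 0

assert repr(Auto()).endswith('Auto(value=0)')  # Base.__repr__ overridden
assert repr(Manual()) == '<Manual>'            # Base.__repr__ preserved
```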
If you really meant that, you should have avoided rewarding him with a wordy explanation. ;-) N.B. I've bookmarked yours, it's excellent! Thank you very much!

I think even in this case an experienced community member probably should report on the tracker, though. If all they got was, "WONTFIX: that's how the algorithm works" there, *then* they go to python-list.

> (assuming P != NP): Perl allows (introduced?) backreferences, which I've been using in Emacsen since 1987 or so. > are NP-hard[1]. Perl also added some other features which complicate > things, but backreferences are enough. > > The user-level solution is to understand how regexes are executed, and > to work around it.

This is the Pythonic approach<0.5 wink/>, in the sense that we haven't gone to the trouble of trying to improve the algorithm yet.

> 2. In theory, we can use the textbook algorithm when possible, and the > backtracking algorithm when necessary. However, the textbook version > won't necessarily be faster, and may take more time to create, so > there's a tradeoff here.

There may be a question of the difficulty of maintenance, as well. The linear-time algorithms are somewhat more delicate, and we're still occasionally discussing what semantics a regular expression should have (null matches), let alone whether they're implemented correctly.

From solipsis at pitrou.net Fri Dec 29 05:45:21 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 29 Dec 2017 11:45:21 +0100 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses References: Message-ID: <20171229114521.707864ba@fsol>

On Fri, 29 Dec 2017 02:23:56 -0800 Ethan Smith wrote: > > In a few cases I want to override the repr of the AST nodes. I wrote a > __repr__ and ran the code but lo and behold I got a type error. I couldn't > override it. I quickly learned that one needs to pass a keyword to the > dataclass decorator to tell it *not* to auto generate methods you override.
> > I have two usability concerns with the current implementation. I emailed > Eric about the first, and he said I should ask for thoughts here. The > second I found after a couple of days sitting on this message. > > The first is that needing both a keyword and method is duplicative and > unnecessary. Eric agreed it was a hassle, but felt it was justified > considering someone may accidentally override a dataclass method. I > disagree with this point of view as dataclasses are billed as providing > automatic methods. Overriding via method definition is very natural and > idiomatic.

Agreed. We shouldn't take magic too far just for the sake of protecting users against their own (alleged) mistakes. And I'm not sure how you "accidentally" override a dataclass method (if I'm implementing a __repr__ I'm doing so deliberately :-)).

> The second concern, which I came across more recently, is if I have a base > class, and dataclasses inherit from this base class, inherited __repr__ & > co are silently overridden by dataclass. This is both unexpected, and also > means I need to pass a repr=False to each subclass' decorator to get > correct behavior, which somewhat defeats the utility of subclassing. Im not > as sure a whole lot can be done about this though.

Agreed as well. If I make the effort of having a dataclass inherit from a base class, I probably don't want the base class' methods to be silently overridden by machine-generated methods. Of course, that can be worked around by using multiple inheritance, you just need to be careful and add a small amount of class definition boilerplate.

I would expect dataclass parameters such as `repr` to be tri-state:

* repr=None (the default): only provide a machine-generated implementation if none is already defined (either on a base class or in the dataclass namespace... ignoring runtime-provided defaults such as object.__repr__)
* repr=False: never provide a machine-generated implementation
* repr=True: always provide a machine-generated implementation, even overriding a previous user-defined implementation

Regards Antoine.

From hugo.fisher at gmail.com Fri Dec 29 08:35:05 2017 From: hugo.fisher at gmail.com (Hugh Fisher) Date: Sat, 30 Dec 2017 00:35:05 +1100 Subject: [Python-Dev] Heap allocate type structs in native extension modules? Message-ID:

> Date: Fri, 29 Dec 2017 11:25:14 +0200 > From: Serhiy Storchaka > To: python-dev at python.org > Subject: Re: [Python-Dev] Heap allocate type structs in native > extension modules? [ munch ] > Yes, you can create heap types by using PyType_FromSpecWithBases(). > > But be aware of caveats (https://bugs.python.org/issue26979).

Thanks! I'll give it a go. I've already run into and solved the tp_new issue, so that won't be a problem.

-- cheers, Hugh Fisher

From turnbull.stephen.fw at u.tsukuba.ac.jp Fri Dec 29 09:10:17 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 29 Dec 2017 23:10:17 +0900 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> Message-ID: <23110.19657.979865.681437@turnbull.sk.tsukuba.ac.jp>

Brett Cannon writes: > I wouldn't characterize it as "big questions". For some people there's a > question as to how to make them work without type hints, but otherwise how > they function is settled.

Recently a question has been raised about the decorator overriding methods defined in the class (especially __repr__). People feel that if the class defines a method, the decorator should not override it. The current API requires passing "repr=False" to the decorator.
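The tri-state default proposed above turns on detecting whether a user-defined __repr__ already exists anywhere on the class, including bases. A sketch of that check (a hypothetical helper, not actual dataclasses code):

```python
def has_custom_repr(cls):
    # object.__repr__ is the runtime-provided default and doesn't count;
    # anything defined in a class's own __dict__ along the MRO does.
    return any('__repr__' in c.__dict__ for c in cls.__mro__ if c is not object)

class Base:
    def __repr__(self):
        return 'Base()'

class WithInherited(Base):
    pass

class Plain:
    pass

print(has_custom_repr(WithInherited))  # True  -- inherited from Base
print(has_custom_repr(Plain))          # False -- only object.__repr__
```

Under the proposed repr=None default, the decorator would generate a __repr__ only when this kind of check comes back False.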
From ncoghlan at gmail.com Fri Dec 29 10:02:19 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 30 Dec 2017 01:02:19 +1000 Subject: [Python-Dev] Supporting functools.singledispatch with classes. In-Reply-To: References: Message-ID: On 28 December 2017 at 04:22, Ethan Smith wrote: > Okay, if there is no further feedback, I will work on a singledispatchmethod > decorator like partialmethod. > > For the future perhaps, would it not be possible to tell that the passed > argument is a descriptor/function and dispatch to the correct > implementation, thus not needing two functions for essentially the same > thing? > > It seems more straightforward to make the implementation a bit more complex > to provide a single, simple API to users. "Add 'method' to the decorator name when decorating a method" is a pretty simple rule to remember - it's much easier than "Add 'arg_index=1'" (which is a comparatively arbitrary adjustment that requires a fairly in depth understanding of both the descriptor protocol and type-based function dispatch to follow). And you need the change to be explicitly opt-in *somehow*, in order to avoid breaking any existing code that relies on methods decorated with "singledispatch" dispatching on the bound class. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From turnbull.stephen.fw at u.tsukuba.ac.jp Fri Dec 29 11:31:58 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Sat, 30 Dec 2017 01:31:58 +0900 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> Message-ID: <23110.28158.856530.502446@turnbull.sk.tsukuba.ac.jp> Brett Cannon writes: > I wouldn't characterize it as "big questions". 
For some people there's a > question as to how to make them work without type hints, but otherwise how > they function is settled. Recently a question has been raised about the decorator overriding methods defined in the class (especially __repr__). From status at bugs.python.org Fri Dec 29 12:09:53 2017 From: status at bugs.python.org (Python tracker) Date: Fri, 29 Dec 2017 18:09:53 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20171229170953.066E45C7EC@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2017-12-22 - 2017-12-29) Python tracker at https://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 6355 (+13) closed 37843 (+24) total 44198 (+37) Open issues with patches: 2472 Issues opened (31) ================== #30722: Tools/demo/redemo.py broken https://bugs.python.org/issue30722 reopened by serhiy.storchaka #32411: Idlelib.browser: stop sorting dicts created by pyclbr https://bugs.python.org/issue32411 opened by terry.reedy #32412: help() of bitwise operators should mention sets as well https://bugs.python.org/issue32412 opened by steven.daprano #32413: Document that locals() may return globals() https://bugs.python.org/issue32413 opened by steven.daprano #32414: PyCapsule_Import fails when name is in the form 'package.modul https://bugs.python.org/issue32414 opened by lekma #32417: fromutc does not respect datetime subclasses https://bugs.python.org/issue32417 opened by p-ganssle #32418: Implement Server.get_loop() method https://bugs.python.org/issue32418 opened by asvetlov #32419: Add unittest support for pyc projects https://bugs.python.org/issue32419 opened by brgirgis #32420: LookupError : unknown encoding : [0x7FF092395AD0] ANOMALY https://bugs.python.org/issue32420 opened by Kitamura #32421: Keeping an exception in cache can segfault the interpreter https://bugs.python.org/issue32421 opened by zunger #32423: The Windows 
SDK version 10.0.15063.0 was not found https://bugs.python.org/issue32423 opened by isuruf #32424: Synchronize copy methods between Python and C implementations https://bugs.python.org/issue32424 opened by gphemsley #32425: Allow non-default XML parsers to take advantage of a _parse_wh https://bugs.python.org/issue32425 opened by gphemsley #32426: Tkinter.ttk Widget does not define wich option exists to set t https://bugs.python.org/issue32426 opened by alex.75 #32427: Rename and expose dataclasses._MISSING https://bugs.python.org/issue32427 opened by eric.smith #32428: dataclasses: make it an error to have initialized non-fields i https://bugs.python.org/issue32428 opened by eric.smith #32429: Outdated Modules/Setup warning is invisible https://bugs.python.org/issue32429 opened by mdk #32430: Simplify Modules/Setup{,.dist,.local} https://bugs.python.org/issue32430 opened by mdk #32431: Two bytes objects of zero length don't compare equal https://bugs.python.org/issue32431 opened by jonathanunderwood #32433: Provide optimized HMAC digest https://bugs.python.org/issue32433 opened by christian.heimes #32434: pathlib.WindowsPath.reslove(strict=False) returns absoulte pat https://bugs.python.org/issue32434 opened by mliska #32435: tarfile recognizes .gz file as tar https://bugs.python.org/issue32435 opened by spetrunin #32436: Implement PEP 567 https://bugs.python.org/issue32436 opened by yselivanov #32438: PyLong_ API cleanup https://bugs.python.org/issue32438 opened by erik.bray #32439: Clean up the code for compiling comparison expressions https://bugs.python.org/issue32439 opened by serhiy.storchaka #32441: os.dup2 should return the new fd https://bugs.python.org/issue32441 opened by benjamin.peterson #32443: Add Linux's signalfd() to the signal module https://bugs.python.org/issue32443 opened by gregory.p.smith #32444: python -m venv symlink dependency on how python binary is call https://bugs.python.org/issue32444 opened by seliger #32445: Skip creating redundant 
wrapper functions in ExitStack.callbac https://bugs.python.org/issue32445 opened by ncoghlan #32446: ResourceLoader.get_data() should accept a PathLike https://bugs.python.org/issue32446 opened by barry #32447: IDLE shell won't open on Mac OS 10.13.1 https://bugs.python.org/issue32447 opened by sm1979 Most recent 15 issues with no replies (15) ========================================== #32447: IDLE shell won't open on Mac OS 10.13.1 https://bugs.python.org/issue32447 #32446: ResourceLoader.get_data() should accept a PathLike https://bugs.python.org/issue32446 #32445: Skip creating redundant wrapper functions in ExitStack.callbac https://bugs.python.org/issue32445 #32441: os.dup2 should return the new fd https://bugs.python.org/issue32441 #32439: Clean up the code for compiling comparison expressions https://bugs.python.org/issue32439 #32436: Implement PEP 567 https://bugs.python.org/issue32436 #32433: Provide optimized HMAC digest https://bugs.python.org/issue32433 #32427: Rename and expose dataclasses._MISSING https://bugs.python.org/issue32427 #32426: Tkinter.ttk Widget does not define wich option exists to set t https://bugs.python.org/issue32426 #32423: The Windows SDK version 10.0.15063.0 was not found https://bugs.python.org/issue32423 #32418: Implement Server.get_loop() method https://bugs.python.org/issue32418 #32410: Implement loop.sock_sendfile method https://bugs.python.org/issue32410 #32404: fromtimestamp does not call __new__ in datetime subclasses https://bugs.python.org/issue32404 #32403: date, time and datetime alternate constructors should take fas https://bugs.python.org/issue32403 #32400: inspect.isdatadescriptor false negative https://bugs.python.org/issue32400 Most recent 15 issues waiting for review (15) ============================================= #32441: os.dup2 should return the new fd https://bugs.python.org/issue32441 #32439: Clean up the code for compiling comparison expressions https://bugs.python.org/issue32439 #32436: Implement PEP 
567 https://bugs.python.org/issue32436 #32433: Provide optimized HMAC digest https://bugs.python.org/issue32433 #32431: Two bytes objects of zero length don't compare equal https://bugs.python.org/issue32431 #32429: Outdated Modules/Setup warning is invisible https://bugs.python.org/issue32429 #32427: Rename and expose dataclasses._MISSING https://bugs.python.org/issue32427 #32424: Synchronize copy methods between Python and C implementations https://bugs.python.org/issue32424 #32418: Implement Server.get_loop() method https://bugs.python.org/issue32418 #32414: PyCapsule_Import fails when name is in the form 'package.modul https://bugs.python.org/issue32414 #32413: Document that locals() may return globals() https://bugs.python.org/issue32413 #32411: Idlelib.browser: stop sorting dicts created by pyclbr https://bugs.python.org/issue32411 #32410: Implement loop.sock_sendfile method https://bugs.python.org/issue32410 #32404: fromtimestamp does not call __new__ in datetime subclasses https://bugs.python.org/issue32404 #32403: date, time and datetime alternate constructors should take fas https://bugs.python.org/issue32403 Top 10 most discussed issues (10) ================================= #32424: Synchronize copy methods between Python and C implementations https://bugs.python.org/issue32424 16 msgs #17611: Move unwinding of stack for "pseudo exceptions" from interpret https://bugs.python.org/issue17611 11 msgs #32429: Outdated Modules/Setup warning is invisible https://bugs.python.org/issue32429 9 msgs #31639: http.server and SimpleHTTPServer hang after a few requests https://bugs.python.org/issue31639 7 msgs #32431: Two bytes objects of zero length don't compare equal https://bugs.python.org/issue32431 6 msgs #32360: Save OrderedDict imports in various stdlibs. 
https://bugs.python.org/issue32360 5 msgs #32420: LookupError : unknown encoding : [0x7FF092395AD0] ANOMALY https://bugs.python.org/issue32420 5 msgs #21288: hashlib.pbkdf2_hmac Hash Constructor https://bugs.python.org/issue21288 4 msgs #32145: Wrong ExitStack Callback recipe https://bugs.python.org/issue32145 4 msgs #32419: Add unittest support for pyc projects https://bugs.python.org/issue32419 4 msgs Issues closed (24) ================== #24960: Can't use lib2to3 with embeddable zip file. https://bugs.python.org/issue24960 closed by benjamin.peterson #26133: asyncio: ugly error related to signal handlers at exit if the https://bugs.python.org/issue26133 closed by asvetlov #26666: File object hook to modify select(ors) event mask https://bugs.python.org/issue26666 closed by asvetlov #28236: In xml.etree.ElementTree Element can be created with empty and https://bugs.python.org/issue28236 closed by rhettinger #29084: C API of OrderedDict https://bugs.python.org/issue29084 closed by serhiy.storchaka #29504: blake2: compile error with -march=bdver2 https://bugs.python.org/issue29504 closed by christian.heimes #29780: Interpreter hang on self._epoll.poll(timeout, max_ev) https://bugs.python.org/issue29780 closed by asvetlov #31721: assertion failure in FutureObj_finalize() after setting _log_t https://bugs.python.org/issue31721 closed by asvetlov #31983: Officially add Py_SETREF and Py_XSETREF https://bugs.python.org/issue31983 closed by serhiy.storchaka #31988: Saving bytearray to binary plist file doesn't work https://bugs.python.org/issue31988 closed by serhiy.storchaka #32261: Online doc does not include inspect.classify_class_attrs https://bugs.python.org/issue32261 closed by csabella #32324: [Security] "python3 directory" inserts "directory" at sys.path https://bugs.python.org/issue32324 closed by ncoghlan #32335: Failed Python build on Fedora 27 https://bugs.python.org/issue32335 closed by amitg-b14 #32363: Deprecate task.set_result() and task.set_exception() 
https://bugs.python.org/issue32363 closed by yselivanov #32372: Optimize out __debug__ at the AST level https://bugs.python.org/issue32372 closed by serhiy.storchaka #32401: No module named '_ctypes' https://bugs.python.org/issue32401 closed by YoSTEALTH #32402: Coverity: CID 1426868/1426867: Null pointer dereferences in t https://bugs.python.org/issue32402 closed by inada.naoki #32415: Add Task.get_loop() and Future.get_loop() https://bugs.python.org/issue32415 closed by yselivanov #32416: Refactor and add new tests for the f_lineno setter https://bugs.python.org/issue32416 closed by serhiy.storchaka #32422: Reduce lru_cache memory overhead. https://bugs.python.org/issue32422 closed by serhiy.storchaka #32432: [BUG] Python vs Macbook High Sierra 10.13.2 https://bugs.python.org/issue32432 closed by Felipe Filgueira Barral #32437: UnicodeError: 'IDNA does not round-trip' https://bugs.python.org/issue32437 closed by berker.peksag #32440: Use HTTPS in help() https://bugs.python.org/issue32440 closed by Mariatta #32442: Result of pathlib.Path.resolve() with UNC path is not very use https://bugs.python.org/issue32442 closed by uranusjr From ethan at ethanhs.me Fri Dec 29 13:55:01 2017 From: ethan at ethanhs.me (Ethan Smith) Date: Fri, 29 Dec 2017 10:55:01 -0800 Subject: [Python-Dev] Supporting functools.singledispatch with classes. In-Reply-To: References: Message-ID: On Fri, Dec 29, 2017 at 7:02 AM, Nick Coghlan wrote: > On 28 December 2017 at 04:22, Ethan Smith wrote: > > Okay, if there is no further feedback, I will work on a > singledispatchmethod > > decorator like partialmethod. > > > > For the future perhaps, would it not be possible to tell that the passed > > argument is a descriptor/function and dispatch to the correct > > implementation, thus not needing two functions for essentially the same > > thing? > > > > It seems more straightforward to make the implementation a bit more > complex > > to provide a single, simple API to users. 
> > "Add 'method' to the decorator name when decorating a method" is a > pretty simple rule to remember - it's much easier than "Add > 'arg_index=1'" (which is a comparatively arbitrary adjustment that > requires a fairly in depth understanding of both the descriptor > protocol and type-based function dispatch to follow). > > And you need the change to be explicitly opt-in *somehow*, in order to > avoid breaking any existing code that relies on methods decorated with > "singledispatch" dispatching on the bound class. > Good points. I will start working on the singledispatchmethod implementation. ~>Ethan Smith > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Dec 29 13:59:52 2017 From: guido at python.org (Guido van Rossum) Date: Fri, 29 Dec 2017 11:59:52 -0700 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: <23110.19657.979865.681437@turnbull.sk.tsukuba.ac.jp> References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <8490e88d-ae58-b418-f1c1-5429f0681999@trueblade.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> <23110.19657.979865.681437@turnbull.sk.tsukuba.ac.jp> Message-ID: On Fri, Dec 29, 2017 at 7:10 AM, Stephen J. Turnbull < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: > Recently a question has been raised about the decorator overriding > methods defined in the class (especially __repr__). People feel that > if the class defines a method, the decorator should not override it. > The current API requires passing "repr=false" to the decorator. > I think this is a reasonable question, though I'm not sure how "big" it is. Note that if the *base* class defines __repr__ the decorator should still override it (unless repr=False), since there's always object.__repr__ (and same for most other dunders). 
We should also (like we did with most questions big and small) look at what attrs does and why. Regarding whether this should live on PyPI first, in this case that would not be helpful, since attrs is already the category killer on PyPI. So we are IMO taking the best course possible given that we want something in the stdlib but not exactly attrs. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at ethanhs.me Fri Dec 29 14:12:11 2017 From: ethan at ethanhs.me (Ethan Smith) Date: Fri, 29 Dec 2017 11:12:11 -0800 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: <20171229114521.707864ba@fsol> References: <20171229114521.707864ba@fsol> Message-ID: On Fri, Dec 29, 2017 at 2:45 AM, Antoine Pitrou wrote: > On Fri, 29 Dec 2017 02:23:56 -0800 > Ethan Smith wrote: > > > > In a few cases I want to override the repr of the AST nodes. I wrote a > > __repr__ and ran the code but lo and behold I got a type error. I > couldn't > > override it. I quickly learned that one needs to pass a keyword to the > > dataclass decorator to tell it *not* to auto generate methods you > override. > > > > I have two usability concerns with the current implementation. I emailed > > Eric about the first, and he said I should ask for thoughts here. The > > second I found after a couple of days sitting on this message. > > > > The first is that needing both a keyword and method is duplicative and > > unnecessary. Eric agreed it was a hassle, but felt it was justified > > considering someone may accidentally override a dataclass method. I > > disagree with this point of view as dataclasses are billed as providing > > automatic methods. Overriding via method definition is very natural and > > idiomatic. > > Agreed. We shouldn't take magic too far just for the sake of > protecting users against their own (alleged) mistakes. 
And I'm not > sure how you "accidentally" override a dataclass method (if I'm > implementing a __repr__ I'm doing so deliberately :-)). > My thinking exactly. > > > The second concern, which I came across more recently, is if I have a > base > > class, and dataclasses inherit from this base class, inherited __repr__ & > > co are silently overridden by dataclass. This is both unexpected, and > also > > means I need to pass a repr=False to each subclass' decorator to get > > correct behavior, which somewhat defeats the utility of subclassing. Im > not > > as sure a whole lot can be done about this though. > > Agreed as well. If I make the effort of having a dataclass inherit > from a base class, I probably don't want the base class' methods to be > silently overriden by machine-generated methods. Of course, that can > be worked around by using multiple inheritance, you just need to be > careful and add a small amount of class definition boilerplate. > I am not sure exactly what you mean by "worked around by using multiple inheritance". Do you mean you think the dataclass decorator should be made into a dataclass base class for a dataclass class, or that it should look at the MRO? > > I would expect dataclass parameters such as `repr` to be tri-state: > > * repr=None (the default): only provide a machine-generated > implementation if none is already defined (either on a base class or > in the dataclass namespace... ignoring runtime-provided defaults such > as object.__repr__) > * repr=False: never provide a machine-generated implementation > * repr=True: always provide a machine-generated implementation, even > overriding a previous user-defined implementation > This is sensible to me. Thanks for your thoughts! ~>Ethan Smith > Regards > > Antoine. 
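The inherited-__repr__ behavior being objected to is easy to reproduce with the decorator as it stands: a base class's __repr__ is silently replaced by the generated one unless repr=False is passed (a minimal sketch):

```python
from dataclasses import dataclass

class Base:
    def __repr__(self):
        return 'Base repr'

@dataclass
class Child(Base):
    x: int = 0

@dataclass(repr=False)
class QuietChild(Base):
    x: int = 0

print(repr(Child()))       # Child(x=0) -- inherited __repr__ was replaced
print(repr(QuietChild()))  # Base repr  -- repr=False keeps the inherited one
```

Under Antoine's tri-state proposal, the Child case would keep Base's __repr__ by default, since a user-defined implementation already exists on the MRO.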
> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > ethan%40ethanhs.me > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gvanrossum at gmail.com Fri Dec 29 14:14:49 2017 From: gvanrossum at gmail.com (Guido van Rossum) Date: Fri, 29 Dec 2017 12:14:49 -0700 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: <20171229114521.707864ba@fsol> References: <20171229114521.707864ba@fsol> Message-ID: But you always inherit __repr__, from object. The base class might also itself be a dataclass. I think it should only skip when the decorated class itself defines __repr__. On Dec 29, 2017 3:47 AM, "Antoine Pitrou" wrote: > On Fri, 29 Dec 2017 02:23:56 -0800 > Ethan Smith wrote: > > > > In a few cases I want to override the repr of the AST nodes. I wrote a > > __repr__ and ran the code but lo and behold I got a type error. I > couldn't > > override it. I quickly learned that one needs to pass a keyword to the > > dataclass decorator to tell it *not* to auto generate methods you > override. > > > > I have two usability concerns with the current implementation. I emailed > > Eric about the first, and he said I should ask for thoughts here. The > > second I found after a couple of days sitting on this message. > > > > The first is that needing both a keyword and method is duplicative and > > unnecessary. Eric agreed it was a hassle, but felt it was justified > > considering someone may accidentally override a dataclass method. I > > disagree with this point of view as dataclasses are billed as providing > > automatic methods. Overriding via method definition is very natural and > > idiomatic. > > Agreed. We shouldn't take magic too far just for the sake of > protecting users against their own (alleged) mistakes. 
And I'm not > sure how you "accidentally" override a dataclass method (if I'm > implementing a __repr__ I'm doing so deliberately :-)). > > > The second concern, which I came across more recently, is if I have a > base > > class, and dataclasses inherit from this base class, inherited __repr__ & > > co are silently overridden by dataclass. This is both unexpected, and > also > > means I need to pass a repr=False to each subclass' decorator to get > > correct behavior, which somewhat defeats the utility of subclassing. Im > not > > as sure a whole lot can be done about this though. > > Agreed as well. If I make the effort of having a dataclass inherit > from a base class, I probably don't want the base class' methods to be > silently overriden by machine-generated methods. Of course, that can > be worked around by using multiple inheritance, you just need to be > careful and add a small amount of class definition boilerplate. > > I would expect dataclass parameters such as `repr` to be tri-state: > > * repr=None (the default): only provide a machine-generated > implementation if none is already defined (either on a base class or > in the dataclass namespace... ignoring runtime-provided defaults such > as object.__repr__) > * repr=False: never provide a machine-generated implementation > * repr=True: always provide a machine-generated implementation, even > overriding a previous user-defined implementation > > Regards > > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > guido%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Fri Dec 29 14:18:43 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 29 Dec 2017 20:18:43 +0100 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: References: <20171229114521.707864ba@fsol> Message-ID: <20171229201843.2ee7da4e@fsol>

On Fri, 29 Dec 2017 11:12:11 -0800 Ethan Smith wrote: > > > Agreed as well. If I make the effort of having a dataclass inherit > > from a base class, I probably don't want the base class' methods to be > > silently overriden by machine-generated methods. Of course, that can > > be worked around by using multiple inheritance, you just need to be > > careful and add a small amount of class definition boilerplate. > > I am not sure exactly what you mean by "worked around by using multiple > inheritance".

I mean you can write:

    class _BaseClass:
        def __repr__(self):
            ...  # hand-written __repr__

    @dataclass
    class _DataclassMixin:
        ...  # your attribute definitions here

    class FinalClass(_BaseClass, _DataclassMixin):
        pass

Yes, it's tedious and verbose :-)

Regards Antoine.

From ethan at stoneleaf.us Fri Dec 29 14:37:38 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 29 Dec 2017 11:37:38 -0800 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: References: Message-ID: <5A469982.5040205@stoneleaf.us>

On 12/29/2017 02:23 AM, Ethan Smith wrote: > The first is that needing both a keyword and method is duplicative and unnecessary. Eric agreed it was a hassle, but > felt it was justified considering someone may accidentally override a dataclass method. I disagree with this point of > view as dataclasses are billed as providing automatic methods. Overriding via method definition is very natural and > idiomatic. I don't really see how someone could accidentally override a dataclass method if methods were not generated > by the dataclass decorator that are already defined in the class at definition time.
Accidental or not, the decorator should not be replacing methods defined by the class. > The second concern, which I came across more recently, is if I have a base class, and dataclasses inherit from this base > class, inherited __repr__ & co are silently overridden by dataclass. This is both unexpected, and also means I need to > pass a repr=False to each subclass' decorator to get correct behavior, which somewhat defeats the utility of > subclassing. Im not as sure a whole lot can be done about this though. It is possible to determine whether an existing __repr__ is from 'object' or not, and only provide one if that is the case. I think that should be the default, with 'repr = True' for those cases where a new, auto-generated, __repr__ is desired. -- ~Ethan~ From ethan at ethanhs.me Fri Dec 29 14:55:38 2017 From: ethan at ethanhs.me (Ethan Smith) Date: Fri, 29 Dec 2017 11:55:38 -0800 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: <5A469982.5040205@stoneleaf.us> References: <5A469982.5040205@stoneleaf.us> Message-ID: On Fri, Dec 29, 2017 at 11:37 AM, Ethan Furman wrote: > On 12/29/2017 02:23 AM, Ethan Smith wrote: > > The first is that needing both a keyword and method is duplicative and >> unnecessary. Eric agreed it was a hassle, but >> felt it was justified considering someone may accidentally override a >> dataclass method. I disagree with this point of >> view as dataclasses are billed as providing automatic methods. Overriding >> via method definition is very natural and >> idiomatic. I don't really see how someone could accidentally override a >> dataclass method if methods were not generated >> by the dataclass decorator that are already defined in the class at >> definition time. >> > > Accidental or not, the decorator should not be replacing methods defined > by the class. 
> > The second concern, which I came across more recently, is if I have a base >> class, and dataclasses inherit from this base >> class, inherited __repr__ & co are silently overridden by dataclass. This >> is both unexpected, and also means I need to >> pass a repr=False to each subclass' decorator to get correct behavior, >> which somewhat defeats the utility of >> subclassing. Im not as sure a whole lot can be done about this though. >> > > It is possible to determine whether an existing __repr__ is from 'object' > or not, and only provide one if that is the case. I think that should be > the default, with 'repr = True' for those cases where a new, > auto-generated, __repr__ is desired. > The only other thing you'd want to handle is to cover inheriting from another dataclass. e.g., if I have dataclass with attribute a, and subclass it to add attribute b, I'd want both in the repr. ~>Ethan Smith > > -- > ~Ethan~ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ethan% > 40ethanhs.me > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Fri Dec 29 15:30:52 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 29 Dec 2017 12:30:52 -0800 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: References: <5A469982.5040205@stoneleaf.us> Message-ID: <5A46A5FC.8050407@stoneleaf.us> On 12/29/2017 11:55 AM, Ethan Smith wrote: > On Fri, Dec 29, 2017 at 11:37 AM, Ethan Furman wrote: >> It is possible to determine whether an existing __repr__ is from 'object' >> or not, and only provide one if that is the case. I think that should be >> the default, with 'repr = True' for those cases where a new, auto-generated, >> __repr__ is desired. 
> > The only other thing you'd want to handle is to cover inheriting from another dataclass. e.g., if I have dataclass with > attribute a, and subclass it to add attribute b, I'd want both in the repr. Good point. So auto-generate a new __repr__ if: - one is not provided, and - existing __repr__ is either: - object.__repr__, or - a previous dataclass __repr__ And if the auto default doesn't work for one's use-case, use the keyword parameter to specify what you want. -- ~Ethan~ From christian at python.org Fri Dec 29 15:54:46 2017 From: christian at python.org (Christian Heimes) Date: Fri, 29 Dec 2017 21:54:46 +0100 Subject: [Python-Dev] [ssl] The weird case of IDNA Message-ID: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> Hi, tl;dr This mail is about internationalized domain names and TLS/SSL. It doesn't concern you if you live in ASCII-land. A couple of other developers and I would like to change the ssl module in a backwards-incompatible way to fix IDN support for TLS/SSL. Simply speaking, the IDNA standards (internationalized domain names for applications) describe how to encode non-ASCII domain names. The DNS system and X.509 certificates cannot handle non-ASCII host names. Any non-ASCII part of a hostname is punycode-encoded. For example the host name 'www.bücher.de' (books) is translated into 'www.xn--bcher-kva.de'. In IDNA terms, 'www.bücher.de' is called an IDN U-label (unicode) and 'www.xn--bcher-kva.de' an IDN A-label (ASCII). Please refer to the TR46 document [1] for more information. In a perfect world, it would be very simple. We'd only have one IDNA standard. However, there are multiple standards that are incompatible with each other. The German TLD .de demands IDNA-2008 with UTS#46 compatibility mapping. The hostname 'www.straße.de' maps to 'www.xn--strae-oqa.de'. However, in the older IDNA 2003 standard, 'www.straße.de' maps to 'www.strasse.de', but 'strasse.de' is a totally different domain! CPython has only support for IDNA 2003.
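The IDNA 2003 behavior described above is easy to reproduce with the codec that ships in CPython (an illustrative aside, not part of the original mail):

```python
# CPython's built-in 'idna' codec implements IDNA 2003: nameprep folds
# the German eszett to 'ss', so 'straße.de' silently becomes a
# different domain instead of being punycode-encoded.
print('straße.de'.encode('idna'))   # b'strasse.de'

# Characters with no such folding are punycode-encoded as expected.
print('bücher.de'.encode('idna'))   # b'xn--bcher-kva.de'
```

Under IDNA 2008 rules, 'straße.de' would instead encode to 'xn--strae-oqa.de', which is exactly the incompatibility being discussed.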
It's less of an issue for the socket module. It only converts text to IDNA bytes on the way in. All functions support bytes and text. Since IDNA encoding does not change ASCII and IDNA-encoded data is ASCII, it is also no problem to pass IDNA2008-encoded text or bytes to all socket functions. Example: >>> import socket >>> import idna # from PyPI >>> names = ['straße.de', b'strasse.de', idna.encode('straße.de'), idna.encode('straße.de').decode('ascii')] >>> for name in names: ... print(name, socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM, 0, socket.AI_CANONNAME)[0][3:5]) ... straße.de ('strasse.de', ('89.31.143.1', 0)) b'strasse.de' ('strasse.de', ('89.31.143.1', 0)) b'xn--strae-oqa.de' ('xn--strae-oqa.de', ('81.169.145.78', 0)) xn--strae-oqa.de ('xn--strae-oqa.de', ('81.169.145.78', 0)) As you can see, 'straße.de' is canonicalized as 'strasse.de'. The IDNA 2008 encoded hostname maps to a different IP address. On the other hand, the ssl module is currently completely broken. It converts hostnames from bytes to text with the 'idna' codec in some places, but not in all. The SSLSocket.server_hostname attribute and callback function SSLContext.set_servername_callback() are decoded as U-label. Certificate's common name and subject alternative name fields are not decoded and are therefore A-labels. They *must* stay A-labels because hostname verification is only defined in terms of A-labels. We even had a security issue once, because a partial wildcard like 'xn*.example.org' must not match IDN hosts like 'xn--bcher-kva.example.org'. In issue [2] and PR [3], we all agreed that the only sensible fix is to make 'SSLContext.server_hostname' an ASCII text A-label. But this is a backwards-incompatible fix. On the other hand, IDNA is totally broken without the fix. Also, in my opinion, PR [3] is not going far enough. Since we have to break backwards compatibility anyway, I'd like to modify SSLContext.set_servername_callback() at the same time.
Questions: - Is everybody OK with breaking backwards compatibility? The risk is small. ASCII-only domains are not affected and IDNA users are broken anyway. - Should I only fix 3.7 or should we consider a backport to 3.6, too? Regards, Christian [1] https://www.unicode.org/reports/tr46/ [2] https://bugs.python.org/issue28414 [3] https://github.com/python/cpython/pull/3010 From ethan at ethanhs.me Fri Dec 29 19:35:32 2017 From: ethan at ethanhs.me (Ethan Smith) Date: Fri, 29 Dec 2017 16:35:32 -0800 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: <5A46A5FC.8050407@stoneleaf.us> References: <5A469982.5040205@stoneleaf.us> <5A46A5FC.8050407@stoneleaf.us> Message-ID: On Fri, Dec 29, 2017 at 12:30 PM, Ethan Furman wrote: > On 12/29/2017 11:55 AM, Ethan Smith wrote: > >> On Fri, Dec 29, 2017 at 11:37 AM, Ethan Furman wrote: >> > > It is possible to determine whether an existing __repr__ is from 'object' >>> >> >> or not, and only provide one if that is the case. I think that should > be > >> the default, with 'repr = True' for those cases where a new, > auto-generated, > >> __repr__ is desired. >>> >> >> The only other thing you'd want to handle is to cover inheriting from >> another dataclass. e.g., if I have dataclass with >> attribute a, and subclass it to add attribute b, I'd want both in the >> repr. >> > > Good point. So auto-generate a new __repr__ if: > > - one is not provided, and > - existing __repr__ is either: > - object.__repr__, or > - a previous dataclass __repr__ > > And if the auto default doesn't work for one's use-case, use the keyword > parameter to specify what you want. 
> > > -- > ~Ethan~ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ethan% > 40ethanhs.me > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Dec 29 19:38:44 2017 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 29 Dec 2017 16:38:44 -0800 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: <5A46A5FC.8050407@stoneleaf.us> References: <5A469982.5040205@stoneleaf.us> <5A46A5FC.8050407@stoneleaf.us> Message-ID: On Fri, Dec 29, 2017 at 12:30 PM, Ethan Furman wrote: > Good point. So auto-generate a new __repr__ if: > > - one is not provided, and > - existing __repr__ is either: > - object.__repr__, or > - a previous dataclass __repr__ > > And if the auto default doesn't work for one's use-case, use the keyword > parameter to specify what you want. What does attrs do here? -n -- Nathaniel J. Smith -- https://vorpus.org From guido at python.org Fri Dec 29 19:52:31 2017 From: guido at python.org (Guido van Rossum) Date: Fri, 29 Dec 2017 17:52:31 -0700 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: References: <5A469982.5040205@stoneleaf.us> <5A46A5FC.8050407@stoneleaf.us> Message-ID: I still think it should override anything that's just inherited but nothing that's defined in the class being decorated. On Dec 29, 2017 5:43 PM, "Nathaniel Smith" wrote: > On Fri, Dec 29, 2017 at 12:30 PM, Ethan Furman wrote: > > Good point. So auto-generate a new __repr__ if: > > > > - one is not provided, and > > - existing __repr__ is either: > > - object.__repr__, or > > - a previous dataclass __repr__ > > > > And if the auto default doesn't work for one's use-case, use the keyword > > parameter to specify what you want. > > What does attrs do here?
> > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > guido%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at ethanhs.me Fri Dec 29 19:52:31 2017 From: ethan at ethanhs.me (Ethan Smith) Date: Fri, 29 Dec 2017 16:52:31 -0800 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: References: <5A469982.5040205@stoneleaf.us> <5A46A5FC.8050407@stoneleaf.us> Message-ID: attrs just silently overwrites any user provided __repr__ unless you provide repr=False to attr.s. I think we can all agree that if nothing else, silently overwriting unconditionally is not what we want for dataclasses. On Fri, Dec 29, 2017 at 4:38 PM, Nathaniel Smith wrote: > On Fri, Dec 29, 2017 at 12:30 PM, Ethan Furman wrote: > > Good point. So auto-generate a new __repr__ if: > > > > - one is not provided, and > > - existing __repr__ is either: > > - object.__repr__, or > > - a previous dataclass __repr__ > > > > And if the auto default doesn't work for one's use-case, use the keyword > > parameter to specify what you want. > > What does attrs do here? > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > ethan%40ethanhs.me > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at ethanhs.me Fri Dec 29 19:58:09 2017 From: ethan at ethanhs.me (Ethan Smith) Date: Fri, 29 Dec 2017 16:58:09 -0800 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: References: <5A469982.5040205@stoneleaf.us> <5A46A5FC.8050407@stoneleaf.us> Message-ID: On Fri, Dec 29, 2017 at 4:52 PM, Guido van Rossum wrote: > I still think it should overrides anything that's just inherited but > nothing that's defined in the class being decorated. > > Could you explain why you are of this opinion? Is it a concern about complexity of implementation? > On Dec 29, 2017 5:43 PM, "Nathaniel Smith" wrote: > >> On Fri, Dec 29, 2017 at 12:30 PM, Ethan Furman >> wrote: >> > Good point. So auto-generate a new __repr__ if: >> > >> > - one is not provided, and >> > - existing __repr__ is either: >> > - object.__repr__, or >> > - a previous dataclass __repr__ >> > >> > And if the auto default doesn't work for one's use-case, use the keyword >> > parameter to specify what you want. >> >> What does attrs do here? >> >> -n >> >> -- >> Nathaniel J. Smith -- https://vorpus.org >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido% >> 40python.org >> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > ethan%40ethanhs.me > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Fri Dec 29 20:04:35 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 29 Dec 2017 17:04:35 -0800 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: References: <5A469982.5040205@stoneleaf.us> <5A46A5FC.8050407@stoneleaf.us> Message-ID: <5A46E623.9060102@stoneleaf.us> On Fri, Dec 29, 2017 at 12:30 PM, Ethan Furman wrote: > Good point. So auto-generate a new __repr__ if: > > - one is not provided, and > - existing __repr__ is either: > - object.__repr__, or > - a previous dataclass __repr__ > > And if the auto default doesn't work for one's use-case, use the keyword > parameter to specify what you want. On Dec 29, 2017 5:43 PM, "Nathaniel Smith" wrote: > What does attrs do here? On 12/29/2017 04:52 PM, Ethan Smith wrote: > attrs just silently overwrites any user provided __repr__ unless you provide > repr=False to attr.s. On 12/29/2017 04:52 PM, Guido van Rossum wrote: > I still think it should overrides anything that's just inherited but nothing > that's defined in the class being decorated. I can certainly live with that. -- ~Ethan~ From guido at python.org Fri Dec 29 20:13:06 2017 From: guido at python.org (Guido van Rossum) Date: Fri, 29 Dec 2017 18:13:06 -0700 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: References: <5A469982.5040205@stoneleaf.us> <5A46A5FC.8050407@stoneleaf.us> Message-ID: No, I am concerned about the rule being too complex to explain, and about surprising effects when the base changes (action at a distance). I also don't necessarily think "we all agree" that what attrs does is wrong, but the rule I propose seems reasonable. On Dec 29, 2017 5:58 PM, "Ethan Smith" wrote: > > > On Fri, Dec 29, 2017 at 4:52 PM, Guido van Rossum > wrote: > >> I still think it should overrides anything that's just inherited but >> nothing that's defined in the class being decorated. 
>> >> > Could you explain why you are of this opinion? Is it a concern about > complexity of implementation? > > >> On Dec 29, 2017 5:43 PM, "Nathaniel Smith" wrote: >> >>> On Fri, Dec 29, 2017 at 12:30 PM, Ethan Furman >>> wrote: >>> > Good point. So auto-generate a new __repr__ if: >>> > >>> > - one is not provided, and >>> > - existing __repr__ is either: >>> > - object.__repr__, or >>> > - a previous dataclass __repr__ >>> > >>> > And if the auto default doesn't work for one's use-case, use the >>> keyword >>> > parameter to specify what you want. >>> >>> What does attrs do here? >>> >>> -n >>> >>> -- >>> Nathaniel J. Smith -- https://vorpus.org >>> _______________________________________________ >>> Python-Dev mailing list >>> Python-Dev at python.org >>> https://mail.python.org/mailman/listinfo/python-dev >>> Unsubscribe: https://mail.python.org/mailma >>> n/options/python-dev/guido%40python.org >>> >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: https://mail.python.org/mailman/options/python-dev/ethan% >> 40ethanhs.me >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Dec 29 20:13:36 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 30 Dec 2017 11:13:36 +1000 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: References: <5A469982.5040205@stoneleaf.us> Message-ID: On 30 Dec. 2017 11:01 am, "Ethan Smith" wrote: On Fri, Dec 29, 2017 at 4:52 PM, Guido van Rossum wrote: > I still think it should overrides anything that's just inherited but > nothing that's defined in the class being decorated. > > Could you explain why you are of this opinion? Is it a concern about complexity of implementation? Adding a new method to a base class shouldn't risk breaking existing subclasses. 
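The rule being defended here (override anything merely inherited, keep anything defined in the class body) reduces to a check against the class `__dict__`; a minimal sketch, not the actual dataclasses implementation:

```python
def install_generated(cls, name, method):
    # Install a generated method unless the decorated class defined one
    # in its own body; merely *inherited* definitions (from object or
    # any base class) are overridden.
    if name not in cls.__dict__:
        setattr(cls, name, method)
    return cls

class Base:
    def __repr__(self):
        return '<Base>'

class Inherits(Base):          # __repr__ only inherited -> replaced
    pass

class Defines(Base):           # __repr__ in the class body -> kept
    def __repr__(self):
        return '<Defines>'

for klass in (Inherits, Defines):
    install_generated(klass, '__repr__', lambda self: '<generated>')

print(repr(Inherits()))  # <generated>
print(repr(Defines()))   # <Defines>
```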
If folks want to retain the base class implementation, they can request that explicitly (and doing so isn't redundant at the point of subclass definition the way it is for methods defined in the class body). Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ned at nedbatchelder.com Fri Dec 29 20:48:14 2017 From: ned at nedbatchelder.com (Ned Batchelder) Date: Fri, 29 Dec 2017 20:48:14 -0500 Subject: [Python-Dev] Is static typing still optional? In-Reply-To: References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> <23110.19657.979865.681437@turnbull.sk.tsukuba.ac.jp> Message-ID: <27567612-de51-bea9-eaf7-a5a0c94894e3@nedbatchelder.com> On 12/29/17 1:59 PM, Guido van Rossum wrote: > Regarding whether this should live on PyPI first, in this case that > would not be helpful, since attrs is already the category killer on > PyPI. So we are IMO taking the best course possible given that we want > something in the stdlib but not exactly attrs. It always seemed to me that the reason to recommend putting something on PyPI first wasn't so that it would climb up some kind of leaderboard, but so that people could get real-world experience with it before freezing it into the stdlib. If we think people won't start using data classes from PyPI, why do we think it's important to get into the stdlib? It still seems to me like there are open questions about how data classes should work. Getting people using it will be a good way to get the best design before our hands are tied with backward compatibility in the stdlib. What is the rush to put a new design into the stdlib? Presumably it is better than attrs (or we would have simply adopted attrs). Having data classes on PyPI will be a good way to gauge acceptance. --Ned.
From ethan at ethanhs.me Fri Dec 29 21:52:11 2017 From: ethan at ethanhs.me (Ethan Smith) Date: Fri, 29 Dec 2017 18:52:11 -0800 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: References: <5A469982.5040205@stoneleaf.us> Message-ID: Okay, I think Guido's proposal is a good compromise. I already have a branch of dataclasses that should implement that behavior, so perhaps it was meant to be. :) ~>Ethan Smith On Fri, Dec 29, 2017 at 5:13 PM, Nick Coghlan wrote: > > > On 30 Dec. 2017 11:01 am, "Ethan Smith" wrote: > > > > On Fri, Dec 29, 2017 at 4:52 PM, Guido van Rossum > wrote: > >> I still think it should overrides anything that's just inherited but >> nothing that's defined in the class being decorated. >> >> > Could you explain why you are of this opinion? Is it a concern about > complexity of implementation? > > > Adding a new method to a base class shouldn't risk breaking existing > subclasses. > > If folks want to retain the base class implementation, they can request > that explicitly (and doing so isn't redundant at the point of subclass > definition the way it is for methods defined in the class body). > > Cheers, > Nick. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Dec 29 22:31:38 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 30 Dec 2017 13:31:38 +1000 Subject: [Python-Dev] Is static typing still optional? 
In-Reply-To: <27567612-de51-bea9-eaf7-a5a0c94894e3@nedbatchelder.com> References: <36710C01-10C0-4B70-8846-C0B0C235C4BC@gmail.com> <7696703b-6321-b08f-fc35-67774d237e08@trueblade.com> <23110.19657.979865.681437@turnbull.sk.tsukuba.ac.jp> <27567612-de51-bea9-eaf7-a5a0c94894e3@nedbatchelder.com> Message-ID: On 30 December 2017 at 11:48, Ned Batchelder wrote: > On 12/29/17 1:59 PM, Guido van Rossum wrote: >> >> Regarding whether this should live on PyPI first, in this case that would >> not be helpful, since attrs is already the category killer on PyPI. So we >> are IMO taking the best course possible given that we want something in the >> stdlib but not exactly attrs. > > > It always seemed to me that the reason to recommend putting something on > PyPI first wasn't so that it would climb up some kind of leaderboard, but so > that people could get real-world experience with it before freezing it into > the stdlib. If we think people won't start using data classes from PyPI, > why do we think it's important to get into the stdlib? > > It still seems to me like there are open questions about how data classes > should work. Getting people using it will be a good way to get the best > design before our hands are tied with backward compatibility in the stdlib. > What is the rush to put a new design into the stdlib? Presumably it is > better than attrs (or we would have simply adopted attrs). Having data > classes on PyPI will be a good way to gauge acceptance. attrs has already proved the utility of the approach, and the differences between the two (such as they are) are mostly cosmetic (attrs even already has a release out that supports the annotation based syntax). The cosmetic differences matter for educational purposes (i.e. "data classes" with "fields", vs trying to explain that "attributes", "attrs", "attr.s", and "attr.ib" are all different things), but "available by default" matters even more on that front. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Fri Dec 29 23:46:12 2017 From: guido at python.org (Guido van Rossum) Date: Fri, 29 Dec 2017 20:46:12 -0800 Subject: [Python-Dev] [ssl] The weird case of IDNA In-Reply-To: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> References: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> Message-ID: This being a security issue I think it's okay to break 3.6. might even backport to 3.5 if it's easy? On Dec 29, 2017 1:59 PM, "Christian Heimes" wrote: > Hi, > > tl;dr > This mail is about internationalized domain names and TLS/SSL. It > doesn't concern you if you live in ASCII-land. Me and a couple of other > developers like to change the ssl module in a backwards-incompatible way > to fix IDN support for TLS/SSL. > > > Simply speaking the IDNA standards (internationalized domain names for > applications) describe how to encode non-ASCII domain names. The DNS > system and X.509 certificates cannot handle non-ASCII host names. Any > non-ASCII part of a hostname is punyencoded. For example the host name > 'www.bücher.de ' (books) is translated into ' > www.xn--bcher-kva.de'. In > IDNA terms, 'www.bücher.de ' is called an > IDN U-label (unicode) and > 'www.xn--bcher-kva.de' an IDN A-label (ASCII). Please refer to the TR64 > document [1] for more information. > > In a perfect world, it would be very simple. We'd only had one IDNA > standard. However there are multiple standards that are incompatible > with each other. The German TLD .de demands IDNA-2008 with UTS#46 > compatibility mapping. The hostname 'www.straße.de ' > maps to > 'www.xn--strae-oqa.de'. However in the older IDNA 2003 standard, > 'www.straße.de ' maps to 'www.strasse.de', but ' > strasse.de' is a totally > different domain! > > > CPython has only support for IDNA 2003. > > It's less of an issue for the socket module. It only converts text to > IDNA bytes on the way in. All functions support bytes and text.
Since > IDNA encoding does change ASCII and IDNA-encoded data is ASCII, it is > also no problem to pass IDNA2008-encoded text or bytes to all socket > functions. > > Example: > > >>> import socket > >>> import idna # from PyPI > >>> names = ['straße.de ', b'strasse.de', idna.encode(' > straße.de '), > idna.encode('straße.de ').encode('ascii')] > >>> for name in names: > ... print(name, socket.getaddrinfo(name, None, socket.AF_INET, > socket.SOCK_STREAM, 0, socket.AI_CANONNAME)[0][3:5]) > ... > straße.de ('strasse.de', ('89.31.143.1', 0)) > b'strasse.de' ('strasse.de', ('89.31.143.1', 0)) > b'xn--strae-oqa.de' ('xn--strae-oqa.de', ('81.169.145.78', 0)) > xn--strae-oqa.de ('xn--strae-oqa.de', ('81.169.145.78', 0)) > > As you can see, 'straße.de ' is canonicalized as ' > strasse.de'. The IDNA > 2008 encoded hostname maps to a different IP address. > > > On the other hand ssl module is currently completely broken. It converts > hostnames from bytes to text with 'idna' codec in some places, but not > in all. The SSLSocket.server_hostname attribute and callback function > SSLContext.set_servername_callback() are decoded as U-label. > Certificate's common name and subject alternative name fields are not > decoded and therefore A-labels. The *must* stay A-labels because > hostname verification is only defined in terms of A-labels. We even had > a security issue once, because partial wildcard like 'xn*.example.org' > must not match IDN hosts like 'xn--bcher-kva.example.org'. > > In issue [2] and PR [3], we all agreed that the only sensible fix is to > make 'SSLContext.server_hostname' an ASCII text A-label. But this is an > backwards incompatible fix. On the other hand, IDNA is totally broken > without the fix. Also in my opinion, PR [3] is not going far enough. > Since we have to break backwards compatibility anyway, I'd like to > modify SSLContext.set_servername_callback() at the same time. > > Questions: > - Is everybody OK with breaking backwards compatibility?
The risk is > small. ASCII-only domains are not affected and IDNA users are broken > anyway. > - Should I only fix 3.7 or should we consider a backport to 3.6, too? > > Regards, > Christian > > [1] https://www.unicode.org/reports/tr46/ > [2] https://bugs.python.org/issue28414 > [3] https://github.com/python/cpython/pull/3010 > > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > guido%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sat Dec 30 05:28:37 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 30 Dec 2017 11:28:37 +0100 Subject: [Python-Dev] [ssl] The weird case of IDNA References: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> Message-ID: <20171230112837.25247c63@fsol> On Fri, 29 Dec 2017 21:54:46 +0100 Christian Heimes wrote: > > On the other hand ssl module is currently completely broken. It converts > hostnames from bytes to text with 'idna' codec in some places, but not > in all. The SSLSocket.server_hostname attribute and callback function > SSLContext.set_servername_callback() are decoded as U-label. > Certificate's common name and subject alternative name fields are not > decoded and therefore A-labels. The *must* stay A-labels because > hostname verification is only defined in terms of A-labels. We even had > a security issue once, because partial wildcard like 'xn*.example.org' > must not match IDN hosts like 'xn--bcher-kva.example.org'. > > In issue [2] and PR [3], we all agreed that the only sensible fix is to > make 'SSLContext.server_hostname' an ASCII text A-label. What are the changes in API terms? If I'm calling wrap_socket(), can I pass `server_hostname='straße'` and it will IDNA-encode it? Or do I have to encode it myself?
If the latter, it seems like we are putting the burden of protocol compliance on users. Regards Antoine. From raymond.hettinger at gmail.com Sat Dec 30 06:20:39 2017 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sat, 30 Dec 2017 01:20:39 -1000 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: References: <5A469982.5040205@stoneleaf.us> <5A46A5FC.8050407@stoneleaf.us> Message-ID: <86F4892E-A6B9-40E1-A507-A6A7D2F5835D@gmail.com> > On Dec 29, 2017, at 4:52 PM, Guido van Rossum wrote: > > I still think it should overrides anything that's just inherited but nothing that's defined in the class being decorated. This has the virtue of being easy to explain, and it will help with debugging by honoring the code proximate to the decorator :-) For what it is worth, the functools.total_ordering class decorator does something similar -- though not exactly the same. A root comparison method is considered user-specified if it is different than the default method provided by object: def total_ordering(cls): """Class decorator that fills in missing ordering methods""" # Find user-defined comparisons (not those inherited from object). roots = {op for op in _convert if getattr(cls, op, None) is not getattr(object, op, None)} ... The @dataclass decorator has a much broader mandate and we have almost no experience with it, so it is hard to know what legitimate use cases will arise. Raymond From skip.montanaro at gmail.com Sat Dec 30 07:19:11 2017 From: skip.montanaro at gmail.com (Skip Montanaro) Date: Sat, 30 Dec 2017 06:19:11 -0600 Subject: [Python-Dev] [ssl] The weird case of IDNA In-Reply-To: References: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> Message-ID: Guido wrote: This being a security issue I think it's okay to break 3.6. might even backport to 3.5 if it's easy? Is it also a security issue with 2.x? If so, should a fix to 2.7 be contemplated? 
Skip -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Sat Dec 30 08:11:02 2017 From: eric at trueblade.com (Eric V. Smith) Date: Sat, 30 Dec 2017 07:11:02 -0600 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: <86F4892E-A6B9-40E1-A507-A6A7D2F5835D@gmail.com> References: <5A469982.5040205@stoneleaf.us> <5A46A5FC.8050407@stoneleaf.us> <86F4892E-A6B9-40E1-A507-A6A7D2F5835D@gmail.com> Message-ID: I'm traveling until next week, and haven't had time to read any of these emails. I'll look at them when I return. -- Eric. > On Dec 30, 2017, at 5:20 AM, Raymond Hettinger wrote: > > >> On Dec 29, 2017, at 4:52 PM, Guido van Rossum wrote: >> >> I still think it should overrides anything that's just inherited but nothing that's defined in the class being decorated. > > This has the virtue of being easy to explain, and it will help with debugging by honoring the code proximate to the decorator :-) > > For what it is worth, the functools.total_ordering class decorator does something similar -- though not exactly the same. A root comparison method is considered user-specified if it is different than the default method provided by object: > > def total_ordering(cls): > """Class decorator that fills in missing ordering methods""" > # Find user-defined comparisons (not those inherited from object). > roots = {op for op in _convert if getattr(cls, op, None) is not getattr(object, op, None)} > ... > > The @dataclass decorator has a much broader mandate and we have almost no experience with it, so it is hard to know what legitimate use cases will arise.
> > Raymond > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/eric%2Ba-python-dev%40trueblade.com From christian at python.org Sat Dec 30 08:34:04 2017 From: christian at python.org (Christian Heimes) Date: Sat, 30 Dec 2017 14:34:04 +0100 Subject: [Python-Dev] [ssl] The weird case of IDNA In-Reply-To: <20171230112837.25247c63@fsol> References: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> <20171230112837.25247c63@fsol> Message-ID: On 2017-12-30 11:28, Antoine Pitrou wrote: > On Fri, 29 Dec 2017 21:54:46 +0100 > Christian Heimes wrote: >> >> On the other hand ssl module is currently completely broken. It converts >> hostnames from bytes to text with 'idna' codec in some places, but not >> in all. The SSLSocket.server_hostname attribute and callback function >> SSLContext.set_servername_callback() are decoded as U-label. >> Certificate's common name and subject alternative name fields are not >> decoded and therefore A-labels. The *must* stay A-labels because >> hostname verification is only defined in terms of A-labels. We even had >> a security issue once, because partial wildcard like 'xn*.example.org' >> must not match IDN hosts like 'xn--bcher-kva.example.org'. >> >> In issue [2] and PR [3], we all agreed that the only sensible fix is to >> make 'SSLContext.server_hostname' an ASCII text A-label. > > What are the changes in API terms? If I'm calling wrap_socket(), can I > pass `server_hostname='straße'` and it will IDNA-encode it? Or do I > have to encode it myself? If the latter, it seems like we are putting > the burden of protocol compliance on users. Only SSLSocket.server_hostname attribute and the hostname argument to the SNI callback will change. Both values will be A-labels instead of U-labels.
You can still pass a U-label to the server_hostname argument and it will be encoded with "idna" encoding.

>>> sock = ctx.wrap_socket(socket.socket(), server_hostname='www.straße.de')

Currently:
>>> sock.server_hostname
'www.straße.de'

Changed:
>>> sock.server_hostname
'www.strasse.de'

Christian From christian at python.org Sat Dec 30 08:35:35 2017 From: christian at python.org (Christian Heimes) Date: Sat, 30 Dec 2017 14:35:35 +0100 Subject: [Python-Dev] [ssl] The weird case of IDNA In-Reply-To: References: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> Message-ID: <0c9d477b-73f0-957b-ab48-78b383673517@python.org> On 2017-12-30 13:19, Skip Montanaro wrote: > Guido wrote: > > This being a security issue I think it's okay to break 3.6. might > even backport to 3.5 if it's easy? > > > Is it also a security issue with 2.x? If so, should a fix to 2.7 be > contemplated? IMO the IDNA encoding problem isn't a security issue per se. The ssl module just cannot handle internationalized domain names at all. IDN domains always fail to verify. Users may just be encouraged to disable hostname verification. On the other hand the use of IDNA 2003 and lack of IDNA 2008 support [1] can be considered a security problem for German, Greek, Japanese, Chinese and Korean domains [2]. I neither have resources nor expertise to address the encoding issue. Christian [1] https://bugs.python.org/issue17305 [2] https://www.unicode.org/reports/tr46/#Transition_Considerations From solipsis at pitrou.net Sat Dec 30 08:50:28 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 30 Dec 2017 14:50:28 +0100 Subject: [Python-Dev] [ssl] The weird case of IDNA References: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> <20171230112837.25247c63@fsol> Message-ID: <20171230145028.2cf445d4@fsol> Thanks. So the change sounds ok to me. Regards Antoine.
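Callers who want A-label semantics today can already pre-encode the hostname themselves before it reaches wrap_socket(); a minimal sketch using only the stdlib 'idna' codec (note this codec implements IDNA 2003, so ß folds to 'ss' rather than producing the IDNA 2008 A-label, and the helper name is purely illustrative):

```python
def to_a_label(hostname: str) -> str:
    # Encode a U-label hostname to its ASCII A-label form using the
    # stdlib "idna" codec (IDNA 2003 semantics).
    return hostname.encode("idna").decode("ascii")

# An already-ASCII name passes through unchanged.
assert to_a_label("example.com") == "example.com"
# Under IDNA 2003, the German sharp s is folded to "ss".
assert to_a_label("www.straße.de") == "www.strasse.de"
```

Nothing like to_a_label() exists in the ssl module itself; the point of the proposed change is that the module would store and hand back exactly this kind of ASCII form.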
On Sat, 30 Dec 2017 14:34:04 +0100 Christian Heimes wrote: > On 2017-12-30 11:28, Antoine Pitrou wrote: > > On Fri, 29 Dec 2017 21:54:46 +0100 > > Christian Heimes wrote: > >> > >> On the other hand ssl module is currently completely broken. It converts > >> hostnames from bytes to text with 'idna' codec in some places, but not > >> in all. The SSLSocket.server_hostname attribute and callback function > >> SSLContext.set_servername_callback() are decoded as U-label. > >> Certificate's common name and subject alternative name fields are not > >> decoded and therefore A-labels. They *must* stay A-labels because > >> hostname verification is only defined in terms of A-labels. We even had > >> a security issue once, because partial wildcard like 'xn*.example.org' > >> must not match IDN hosts like 'xn--bcher-kva.example.org'. > >> > >> In issue [2] and PR [3], we all agreed that the only sensible fix is to > >> make 'SSLContext.server_hostname' an ASCII text A-label. > > > > What are the changes in API terms? If I'm calling wrap_socket(), can I > > pass `server_hostname='straße'` and it will IDNA-encode it? Or do I > > have to encode it myself? If the latter, it seems like we are putting > > the burden of protocol compliance on users. > > Only SSLSocket.server_hostname attribute and the hostname argument to > the SNI callback will change. Both values will be A-labels instead of > U-labels. You can still pass a U-label to the server_hostname argument > and it will be encoded with "idna" encoding.
> > >>> sock = ctx.wrap_socket(socket.socket(), server_hostname='www.straße.de') > > Currently: > >>> sock.server_hostname > 'www.straße.de' > > Changed: > >>> sock.server_hostname > 'www.strasse.de' > > Christian > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/python-python-dev%40m.gmane.org From andrew.svetlov at gmail.com Sat Dec 30 09:20:27 2017 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Sat, 30 Dec 2017 14:20:27 +0000 Subject: [Python-Dev] [ssl] The weird case of IDNA In-Reply-To: <20171230145028.2cf445d4@fsol> References: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> <20171230112837.25247c63@fsol> <20171230145028.2cf445d4@fsol> Message-ID: ssl.match_hostname was added in Python 2.7.9, looks like Python 2 should be fixed as well. On Sat, Dec 30, 2017 at 3:50 PM Antoine Pitrou wrote: > > Thanks. So the change sounds ok to me. > > Regards > > Antoine. > > > On Sat, 30 Dec 2017 14:34:04 +0100 > Christian Heimes wrote: > > On 2017-12-30 11:28, Antoine Pitrou wrote: > > > On Fri, 29 Dec 2017 21:54:46 +0100 > > > Christian Heimes wrote: > > >> > > >> On the other hand ssl module is currently completely broken. It converts > > >> hostnames from bytes to text with 'idna' codec in some places, but not > > >> in all. The SSLSocket.server_hostname attribute and callback function > > >> SSLContext.set_servername_callback() are decoded as U-label. > > >> Certificate's common name and subject alternative name fields are not > > >> decoded and therefore A-labels. They *must* stay A-labels because > > >> hostname verification is only defined in terms of A-labels. We even had > > >> a security issue once, because partial wildcard like 'xn*.example.org' > > >> must not match IDN hosts like 'xn--bcher-kva.example.org'.
> > >> > > >> In issue [2] and PR [3], we all agreed that the only sensible fix is to > > >> make 'SSLContext.server_hostname' an ASCII text A-label. > > > > > > What are the changes in API terms? If I'm calling wrap_socket(), can I > > > pass `server_hostname='straße'` and it will IDNA-encode it? Or do I > > > have to encode it myself? If the latter, it seems like we are putting > > > the burden of protocol compliance on users. > > > > Only SSLSocket.server_hostname attribute and the hostname argument to > > the SNI callback will change. Both values will be A-labels instead of > > U-labels. You can still pass a U-label to the server_hostname argument > > and it will be encoded with "idna" encoding. > > > > >>> sock = ctx.wrap_socket(socket.socket(), server_hostname='www.straße.de') > > > > Currently: > > >>> sock.server_hostname > > 'www.straße.de' > > > > Changed: > > >>> sock.server_hostname > > 'www.strasse.de' > > > > Christian > > > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/python-python-dev%40m.gmane.org > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/andrew.svetlov%40gmail.com > -- Thanks, Andrew Svetlov From turnbull.stephen.fw at u.tsukuba.ac.jp Sat Dec 30 10:26:34 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J.
Turnbull) Date: Sun, 31 Dec 2017 00:26:34 +0900 Subject: [Python-Dev] [ssl] The weird case of IDNA In-Reply-To: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> References: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> Message-ID: <23111.45098.4252.715584@turnbull.sk.tsukuba.ac.jp> Christian Heimes writes: > tl;dr > This mail is about internationalized domain names and TLS/SSL. It > doesn't concern you if you live in ASCII-land. A couple of other > developers and I would like to change the ssl module in a backwards-incompatible way > to fix IDN support for TLS/SSL. Yes please! Seriously, we *need* to fix the bug for German, and I would presume other languages that have used pure-ASCII transcodings, which I bet are in very common use in domain names. Do you have an issue # for this offhand? If not I'll just go dig it out for myself. > In a perfect world, it would be very simple. We'd only have one IDNA > standard. However there are multiple standards that are incompatible > with each other. You forgot the obligatory XKCD: https://www.xkcd.com/927. ;-) > The German TLD .de demands IDNA-2008 with UTS#46 > compatibility mapping. The hostname 'www.straße.de' maps to > 'www.xn--strae-oqa.de'. However in the older IDNA 2003 standard, > 'www.straße.de' maps to 'www.strasse.de', but 'strasse.de' is a totally > different domain! That's a mess! I bet the domain squatters have had a field day. > Questions: > - Is everybody OK with breaking backwards compatibility? The risk is > small. ASCII-only domains are not affected That's not quite true, as your German example shows. In some Oriental renderings it is impossible to distinguish halfwidth digits from full-width ones as the same glyphs are used. (This occasionally happens with other ASCII characters, but users are more fussy about digits lining up.) That is, while technically ASCII-only domain names are not affected, users of ASCII-only domain names are potentially vulnerable to confusable names when IDNA is introduced.
(Hopefully the Asian registrars are as woke as the German ones! But you could still register a .com containing full-width digits or letters.) > and IDNA users are broken anyway. Agree with your analysis, except for the fine point above. Japanese don't use IDNA much yet (except like the WIDE folks, who know what they're doing), so I have little experience with potential breakage. On the other hand that suggests that transitioning quickly will be helpful. > - Should I only fix 3.7 or should we consider a backport to 3.6, too? 3.7 has a *lot* of new stuff in it. I suspect a lot of people are going to take their time moving production sites to it, so +1 on a backport. 3.5 too, if it's not too hard. From turnbull.stephen.fw at u.tsukuba.ac.jp Sat Dec 30 10:27:22 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Sun, 31 Dec 2017 00:27:22 +0900 Subject: [Python-Dev] Concerns about method overriding and subclassing with dataclasses In-Reply-To: <5A46A5FC.8050407@stoneleaf.us> References: <5A469982.5040205@stoneleaf.us> <5A46A5FC.8050407@stoneleaf.us> Message-ID: <23111.45146.116335.667080@turnbull.sk.tsukuba.ac.jp> Ethan Furman writes: > Good point. So auto-generate a new __repr__ if: > > - one is not provided, and > - existing __repr__ is either: > - object.__repr__, or > - a previous dataclass __repr__ -0.5 I'm with Guido here. Just use the simple rule that a new __repr__ is generated unless provided in the dataclass. The logic I use (Guido seems to be just arguing for "simple" for now) is that a dataclass is "usually" going to add fields, which you "normally" want exposed in the repr, and that means that an *inherited* __repr__ is going to be broken in some sense. The code author will disagree in "a few" cases, and in those cases they will use repr=False to override. 
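For what it's worth, the rule Guido proposes (replace anything inherited, keep anything defined in the decorated class) is checkable behavior: it is what the dataclasses module eventually shipped with in Python 3.7. A small sketch, with purely illustrative class names:

```python
from dataclasses import dataclass

class Base:
    def __repr__(self):
        return "Base-repr"

@dataclass
class Child(Base):
    # No __repr__ in the class body: the inherited one is replaced
    # by the generated repr, which exposes the new field.
    x: int = 0

@dataclass
class Custom(Base):
    x: int = 0
    def __repr__(self):
        # Defined in the decorated class itself: kept as-is.
        return "Custom-repr"

assert "x=0" in repr(Child())            # generated repr used
assert repr(Child()) != "Base-repr"      # inherited repr overridden
assert repr(Custom()) == "Custom-repr"   # class-defined repr preserved
```

Opting out entirely is spelled @dataclass(repr=False), matching the "use repr=False to override" escape hatch mentioned above.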
I grant that there may be many reasons why one would be deriving dataclasses from dataclasses without adding fields that should be in the repr, so the quote marks above may be taken to be indications of my lack of imagination. ;-) Here's to 2018. It *has* to be better than 2017 -- there will be a Python feature release! Steve From njs at pobox.com Sun Dec 31 01:13:10 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 30 Dec 2017 22:13:10 -0800 Subject: [Python-Dev] [ssl] The weird case of IDNA In-Reply-To: <23111.45098.4252.715584@turnbull.sk.tsukuba.ac.jp> References: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> <23111.45098.4252.715584@turnbull.sk.tsukuba.ac.jp> Message-ID: On Sat, Dec 30, 2017 at 7:26 AM, Stephen J. Turnbull wrote: > Christian Heimes writes: > > Questions: > > - Is everybody OK with breaking backwards compatibility? The risk is > > small. ASCII-only domains are not affected > > That's not quite true, as your German example shows. In some Oriental > renderings it is impossible to distinguish halfwidth digits from > full-width ones as the same glyphs are used. (This occasionally > happens with other ASCII characters, but users are more fussy about > digits lining up.) That is, while technically ASCII-only domain names > are not affected, users of ASCII-only domain names are potentially > vulnerable to confusable names when IDNA is introduced. (Hopefully > the Asian registrars are as woke as the German ones! But you could > still register a .com containing full-width digits or letters.) This particular example isn't an issue: in IDNA encoding, full-width and half-width digits are normalized together, so number1.com and number１.com actually refer to the same domain name.
This is true in both the 2003 and 2008 versions:

# IDNA 2003
In [7]: "number\uff11.com".encode("idna")
Out[7]: b'number1.com'

# IDNA 2008 (using the 'idna' package from pypi)
In [8]: idna.encode("number\uff11.com", uts46=True)
Out[8]: b'number1.com'

That said, IDNA does still allow for a bunch of spoofing opportunities that aren't possible with pure ASCII, and this requires some care: https://unicode.org/faq/idn.html#16 This is mostly a UI issue, though; there's not much that the socket or ssl modules can do to help here. -n -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Sun Dec 31 02:27:04 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 30 Dec 2017 23:27:04 -0800 Subject: [Python-Dev] [ssl] The weird case of IDNA In-Reply-To: <20171230112837.25247c63@fsol> References: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> <20171230112837.25247c63@fsol> Message-ID: On Sat, Dec 30, 2017 at 2:28 AM, Antoine Pitrou wrote: > On Fri, 29 Dec 2017 21:54:46 +0100 > Christian Heimes wrote: >> >> On the other hand ssl module is currently completely broken. It converts >> hostnames from bytes to text with 'idna' codec in some places, but not >> in all. The SSLSocket.server_hostname attribute and callback function >> SSLContext.set_servername_callback() are decoded as U-label. >> Certificate's common name and subject alternative name fields are not >> decoded and therefore A-labels. They *must* stay A-labels because >> hostname verification is only defined in terms of A-labels. We even had >> a security issue once, because partial wildcard like 'xn*.example.org' >> must not match IDN hosts like 'xn--bcher-kva.example.org'. >> >> In issue [2] and PR [3], we all agreed that the only sensible fix is to >> make 'SSLContext.server_hostname' an ASCII text A-label. > > What are the changes in API terms? If I'm calling wrap_socket(), can I > pass `server_hostname='straße'` and it will IDNA-encode it? Or do I > have to encode it myself?
If the latter, it seems like we are putting > the burden of protocol compliance on users. Part of what makes this confusing is that there are actually three intertwined issues here. (Also, anything that deals with Unicode *or* SSL/TLS is automatically confusing, and this is about both!) Issue 1: Python's built-in IDNA implementation is wrong (implements IDNA 2003, not IDNA 2008). Issue 2: The ssl module insists on using Python's built-in IDNA implementation whether you want it to or not. Issue 3: Also, the ssl module has a separate bug that means client-side cert validation has never worked for any IDNA domain. Issue 1 is potentially a security issue, because it means that in a small number of cases, Python will misinterpret a domain name. IDNA 2003 and IDNA 2008 are very similar, but there are 4 characters that are interpreted differently, with ß being one of them. Fixing this though is a big job, and doesn't exactly have anything to do with the ssl module -- for example, socket.getaddrinfo("straße.de", 80) and sock.connect("straße.de", 80) also do the wrong thing. Christian's not proposing to fix this here. It's issues 2 and 3 that he's proposing to fix. Issue 2 is a problem because it makes it impossible to work around issue 1, even for users who know what they're doing. In the socket module, you can avoid Python's automagical IDNA handling by doing it manually, and then calling socket.getaddrinfo("strasse.de", 80) or socket.getaddrinfo("xn--strae-oqa.de", 80), whichever you prefer. In the ssl module, this doesn't work. There are two places where ssl uses hostnames. In client mode, the user specifies the server_hostname that they want to see a certificate for, and then the module runs this through Python's IDNA machinery *even if* it's already properly encoded in ascii.
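The divergence behind Issue 1 can be seen with nothing but the stdlib codec; a small sketch (the IDNA 2008 A-label 'xn--strae-oqa.de' is the one from Christian's earlier example):

```python
# The stdlib "idna" codec implements IDNA 2003. Under its rules the
# German sharp s is folded to "ss", so straße.de quietly becomes the
# unrelated registered domain strasse.de:
assert "straße.de".encode("idna") == b"strasse.de"

# Under IDNA 2008 the same name encodes to the A-label
# 'xn--strae-oqa.de' (Christian's example); the stdlib codec cannot
# produce that form, which is the heart of Issue 1.

# Full-width digits, by contrast, normalize identically in both
# versions, so they are not part of the four-character divergence:
assert "number１.com".encode("idna") == b"number1.com"
```

This is why the workaround for Issue 2 has to accept a caller-supplied, already-encoded A-label: the stdlib cannot compute the IDNA 2008 form on its own.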
And in server mode, when the user has specified an SNI callback so they can find out which certificate an incoming client connection is looking for, the module runs the incoming name through Python's IDNA machinery before handing it to user code. In both cases, the right thing to do would be to just pass through the ascii A-label versions, so savvy users can do whatever they want with them. (This also matches the general design principle around IDNA, which assumes that the pretty unicode U-labels are used only for UI purposes, and everything internal uses A-labels.) Issue 3 is just a silly bug that needs to be fixed, but it's tangled up here because the fix is the same as for Issue 2: the reason client-side cert validation has never worked is that we've been taking the A-label from the server's certificate and checking if it matches the U-label we expect, and of course it never does because we're comparing strings in different encodings. If we consistently converted everything to A-labels as soon as possible and kept it that way, then this bug would never have happened. What makes it tricky is that on both the client and the server, fixing this is actually user-visible. On the client, checking sslsock.server_hostname used to always show a U-label, but if we stop using U-labels internally then this doesn't make sense. Fortunately, since this case has never worked at all, fixing it shouldn't cause any problems. On the server, the obvious fix would be to start passing A-label-encoded names to the servername_callback, instead of U-label-encoded names. Unfortunately, this is a bit trickier, because this *has* historically worked (AFAIK) for IDNA names, so long as they didn't use one of the four magic characters that changed meaning between IDNA 2003 and IDNA 2008. But we do still need to do something.
For example, right now, it's impossible to use the ssl module to implement a web server at https://straße.de, because incoming connections will use SNI to say that they expect a cert for "xn--strae-oqa.de", and then the ssl module will freak out and throw an exception instead of invoking the servername callback. It's ugly, but probably the simplest thing is to add a new function like set_servername_callback2 that uses the A-label, and then redefine set_servername_callback as a deprecated compatibility shim:

def set_servername_callback(self, cb):
    def shim_cb(sslobj, servername, sslctx):
        if servername is not None:
            servername = servername.encode("ascii").decode("idna")
        return cb(sslobj, servername, sslctx)
    self.set_servername_callback2(shim_cb)

We can bikeshed what the new name should be. Maybe set_sni_callback? or set_server_hostname_callback, since the corresponding client-mode argument is server_hostname? -n -- Nathaniel J. Smith -- https://vorpus.org From solipsis at pitrou.net Sun Dec 31 08:21:51 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 31 Dec 2017 14:21:51 +0100 Subject: [Python-Dev] [ssl] The weird case of IDNA In-Reply-To: References: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> <20171230112837.25247c63@fsol> Message-ID: <20171231142151.13ec7b39@fsol> On Sat, 30 Dec 2017 23:27:04 -0800 Nathaniel Smith wrote: > > We can bikeshed what the new name should be. Maybe set_sni_callback? > or set_server_hostname_callback, since the corresponding client-mode > argument is server_hostname? Or set_idna_servername_callback(). Regards Antoine. From turnbull.stephen.fw at u.tsukuba.ac.jp Sun Dec 31 10:37:53 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J.
Turnbull) Date: Mon, 1 Jan 2018 00:37:53 +0900 Subject: [Python-Dev] [ssl] The weird case of IDNA In-Reply-To: References: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> <20171230112837.25247c63@fsol> Message-ID: <23113.1105.231843.272117@turnbull.sk.tsukuba.ac.jp> Nathaniel Smith writes: > Issue 1: Python's built-in IDNA implementation is wrong (implements > IDNA 2003, not IDNA 2008). Is "wrong" the right word here? I'll grant you that 2008 is *better*, but typically in practice versions coexist for years. Ie, is there no backward compatibility issue with registries that specified IDNA 2003? This is not entirely an idle question: I'd like to tool up on the RFCs, research existing practice (especially in the East/Southeast Asian registries), and contribute to the implementation if there may be an issue remaining. (Interpreting RFCs is something I'm reasonably good at.) Steve From njs at pobox.com Sun Dec 31 12:07:01 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 31 Dec 2017 09:07:01 -0800 Subject: [Python-Dev] [ssl] The weird case of IDNA In-Reply-To: <23113.1105.231843.272117@turnbull.sk.tsukuba.ac.jp> References: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> <20171230112837.25247c63@fsol> <23113.1105.231843.272117@turnbull.sk.tsukuba.ac.jp> Message-ID: On Dec 31, 2017 7:37 AM, "Stephen J. Turnbull" < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: Nathaniel Smith writes: > Issue 1: Python's built-in IDNA implementation is wrong (implements > IDNA 2003, not IDNA 2008). Is "wrong" the right word here? I'll grant you that 2008 is *better*, but typically in practice versions coexist for years. Ie, is there no backward compatibility issue with registries that specified IDNA 2003? Well, yeah, I was simplifying, but at the least we can say that always and only using IDNA 2003 certainly isn't right :-). 
I think in most cases the preferred way to deal with these kinds of issues is not to carry around an IDNA 2003 implementation, but instead to use an IDNA 2008 implementation with the "transitional compatibility" flag enabled in the UTS46 preprocessor? But this is rapidly exceeding my knowledge. This is another reason why we ought to let users do their own IDNA handling if they want... This is not entirely an idle question: I'd like to tool up on the RFCs, research existing practice (especially in the East/Southeast Asian registries), and contribute to the implementation if there may be an issue remaining. (Interpreting RFCs is something I'm reasonably good at.) Maybe this is a good place to start: https://github.com/kjd/idna/blob/master/README.rst -n [Sorry if my quoting is messed up; posting from my phone and Gmail for Android apparently generates broken text/plain.] From steve at pearwood.info Sun Dec 31 20:39:56 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 1 Jan 2018 12:39:56 +1100 Subject: [Python-Dev] [ssl] The weird case of IDNA In-Reply-To: References: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> <20171230112837.25247c63@fsol> <23113.1105.231843.272117@turnbull.sk.tsukuba.ac.jp> Message-ID: <20180101013956.GW4215@ando.pearwood.info> On Sun, Dec 31, 2017 at 09:07:01AM -0800, Nathaniel Smith wrote: > This is another reason why we ought to let users do their own IDNA handling > if they want... I expect that letting users do their own IDNA handling will correspond to not doing any IDNA handling at all.
-- Steve From njs at pobox.com Sun Dec 31 20:51:47 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 31 Dec 2017 17:51:47 -0800 Subject: [Python-Dev] [ssl] The weird case of IDNA In-Reply-To: <20180101013956.GW4215@ando.pearwood.info> References: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> <20171230112837.25247c63@fsol> <23113.1105.231843.272117@turnbull.sk.tsukuba.ac.jp> <20180101013956.GW4215@ando.pearwood.info> Message-ID: On Sun, Dec 31, 2017 at 5:39 PM, Steven D'Aprano wrote: > On Sun, Dec 31, 2017 at 09:07:01AM -0800, Nathaniel Smith wrote: > >> This is another reason why we ought to let users do their own IDNA handling >> if they want... > > I expect that letting users do their own IDNA handling will correspond > to not doing any IDNA handling at all. You did see the words "if they want", right? I'm not talking about removing the stdlib's default IDNA handling, I'm talking about fixing the cases where the stdlib goes out of its way to prevent users from overriding its IDNA handling. And "users" here is a very broad category; it includes libraries like requests, twisted, trio, ... that are already doing better IDNA handling than the stdlib, except in cases where the stdlib actively prevents it. -n -- Nathaniel J. Smith -- https://vorpus.org From rosuav at gmail.com Sun Dec 31 21:00:20 2017 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 1 Jan 2018 13:00:20 +1100 Subject: [Python-Dev] [ssl] The weird case of IDNA In-Reply-To: <20180101013956.GW4215@ando.pearwood.info> References: <081550d6-c884-d9b5-e5e9-8c62d48d787e@python.org> <20171230112837.25247c63@fsol> <23113.1105.231843.272117@turnbull.sk.tsukuba.ac.jp> <20180101013956.GW4215@ando.pearwood.info> Message-ID: On Mon, Jan 1, 2018 at 12:39 PM, Steven D'Aprano wrote: > On Sun, Dec 31, 2017 at 09:07:01AM -0800, Nathaniel Smith wrote: > >> This is another reason why we ought to let users do their own IDNA handling >> if they want... 
> > I expect that letting users do their own IDNA handling will correspond > to not doing any IDNA handling at all. > That'll lead to one of two possibilities: 1) People use Unicode strings to represent domain names. Python's existing IDNA handling will happen; they're not doing their own. Not what you're talking about. Or: 2) People use byte strings to represent domain names. Any non-ASCII characters will simply cause an exception, if I'm not mistaken. Safe, but not as functional. ChrisA