From martin at v.loewis.de Wed Jan 1 23:17:21 2014
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 01 Jan 2014 23:17:21 +0100
Subject: [Python-Dev] Buildbot - "slave lost"
In-Reply-To:
References:
Message-ID: <52C493F1.7040300@v.loewis.de>

On 31.12.13 01:24, Chris Angelico wrote:
> Does Buildbot retain a constant TCP socket to its server?

In short: yes.

A little bit longer: It uses the Twisted PerspectiveBroker protocol.
That has nearly-transparent reconnects (but as your case shows, not
fully transparent), and does regular ping messages to keep the
connection alive.

So it should be able to handle a failover from one link to
the other, but it's certainly better to bind it to the more
reliable transport. I believe you can somehow configure the
frequency of ping messages so that your network doesn't believe
the connection goes idle, plus it will attempt a reconnect
if the network did indeed cancel the connection.

If you wanted to study this further, you could look into the
slave's twisted log file.

Regards,
Martin

From rosuav at gmail.com Thu Jan 2 00:24:32 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 2 Jan 2014 10:24:32 +1100
Subject: [Python-Dev] Buildbot - "slave lost"
In-Reply-To: <52C493F1.7040300@v.loewis.de>
References: <52C493F1.7040300@v.loewis.de>
Message-ID:

On Thu, Jan 2, 2014 at 9:17 AM, "Martin v. Löwis" wrote:
> So it should be able to handle a failover from one link to
> the other, but it's certainly better to bind it to the more
> reliable transport. I believe you can somehow configure the
> frequency of ping messages so that your network doesn't believe
> the connection goes idle, plus it will attempt a reconnect
> if the network did indeed cancel the connection.
>
> If you wanted to study this further, you could look into the
> slave's twisted log file.

I should have mentioned that I'd already peeked into the log and that's
why I posted.
There was something in there about it sending application-level
keepalives (every 600 seconds, IIRC), which looked good; but it wasn't
sufficient, as seen. After I switched it to the other connection, I
didn't see any more problems (and it's completed a few builds, though
2.7 seems to be having trouble in the tests - not peculiar to this bot
though).

ChrisA

From daniel at benamy.info Thu Jan 2 19:24:51 2014
From: daniel at benamy.info (Daniel Benamy)
Date: Thu, 2 Jan 2014 13:24:51 -0500
Subject: [Python-Dev] Thanks for python!
Message-ID:

Hey all,
I just wanted to say a quick thanks for python. The language, libs, docs,
and really the whole ecosystem are so well done, and I really appreciate
all your amazing work.
Best,
Dan

From drsalists at gmail.com Thu Jan 2 22:10:36 2014
From: drsalists at gmail.com (Dan Stromberg)
Date: Thu, 2 Jan 2014 13:10:36 -0800
Subject: [Python-Dev] 2.x vs 3.x survey results
Message-ID:

Is there a better place to put this than:
http://stromberg.dnsalias.org/~strombrg/python-2.x-vs-3.x-survey/

Thanks.

From solipsis at pitrou.net Thu Jan 2 22:34:56 2014
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 2 Jan 2014 22:34:56 +0100
Subject: [Python-Dev] 2.x vs 3.x survey results
References:
Message-ID: <20140102223456.270610c5@fsol>

On Thu, 2 Jan 2014 13:10:36 -0800
Dan Stromberg wrote:
> Is there a better place to put this than:
> http://stromberg.dnsalias.org/~strombrg/python-2.x-vs-3.x-survey/

Thank you for doing this!
If wiki.python.org supports file uploads, it may be the place for
publishing the results.

Regards

Antoine.

From ben+python at benfinney.id.au Thu Jan 2 23:36:30 2014
From: ben+python at benfinney.id.au (Ben Finney)
Date: Fri, 03 Jan 2014 09:36:30 +1100
Subject: [Python-Dev] Thanks for python!
References:
Message-ID: <7w4n5mhuz5.fsf@benfinney.id.au>

Daniel Benamy writes:

> I just wanted to say a quick thanks for python. The language, libs, docs,
> and really the whole ecosystem are so well done, and I really appreciate
> all your amazing work.

Thanks for expressing this! It's good to let the Python developers know
that, behind all the requests and complaints, lies a great current of
mostly silent but appreciative Python users :-)

And, of course, I heartily second the sentiment. A happy new year and
good fortune to all Python core developers in 2014!

--
 \     "Software patents provide one more means of controlling access |
  `\     to information. They are the tool of choice for the internet |
_o__)                                  highwayman." -- Anthony Taylor |
Ben Finney

From drsalists at gmail.com Thu Jan 2 23:54:22 2014
From: drsalists at gmail.com (Dan Stromberg)
Date: Thu, 2 Jan 2014 14:54:22 -0800
Subject: [Python-Dev] 2.x vs 3.x survey results
In-Reply-To: <20140102223456.270610c5@fsol>
References: <20140102223456.270610c5@fsol>
Message-ID:

On Thu, Jan 2, 2014 at 1:34 PM, Antoine Pitrou wrote:
> On Thu, 2 Jan 2014 13:10:36 -0800
> Dan Stromberg wrote:
>> Is there a better place to put this than:
>> http://stromberg.dnsalias.org/~strombrg/python-2.x-vs-3.x-survey/
>
> Thank you for doing this!

My pleasure.

> If wiki.python.org supports file uploads, it may be the place for
> publishing the results.

I put it at https://wiki.python.org/moin/2.x-vs-3.x-survey , with a
link from https://wiki.python.org/moin/Python2orPython3 . I put it in
the "CategoryImplementations" category.

The text I added to https://wiki.python.org/moin/Python2orPython3
reads "Some people just don't want to use Python 3.x, which is their
prerogative. However, they are in the minority.", where "in the
minority" is a link to the survey page.

Does anyone want to vet it before I tell more people where to find it?
From ben+python at benfinney.id.au Fri Jan 3 00:20:26 2014
From: ben+python at benfinney.id.au (Ben Finney)
Date: Fri, 03 Jan 2014 10:20:26 +1100
Subject: [Python-Dev] 2.x vs 3.x survey results
References: <20140102223456.270610c5@fsol>
Message-ID: <7wzjnegedh.fsf@benfinney.id.au>

Antoine Pitrou writes:

> If wiki.python.org supports file uploads, it may be the place for
> publishing the results.

Dan, can your reporting tool produce the report in HTML format (and
plots as SVG images)? That would be IMO more suitable for uploading.

--
 \       "Good morning, Pooh Bear", said Eeyore gloomily. "If it is a |
  `\ good morning", he said. "Which I doubt", said he. -- A. A. Milne, |
_o__)                                               _Winnie-the-Pooh_ |
Ben Finney

From greg at krypto.org Fri Jan 3 02:45:33 2014
From: greg at krypto.org (Gregory P. Smith)
Date: Thu, 2 Jan 2014 17:45:33 -0800
Subject: [Python-Dev] 2.x vs 3.x survey results
In-Reply-To:
References: <20140102223456.270610c5@fsol>
Message-ID:

Somewhere you need to describe the survey methodology, who was surveyed,
how were they selected, etc.


On Thu, Jan 2, 2014 at 2:54 PM, Dan Stromberg wrote:

> On Thu, Jan 2, 2014 at 1:34 PM, Antoine Pitrou
> wrote:
> > On Thu, 2 Jan 2014 13:10:36 -0800
> > Dan Stromberg wrote:
> >> Is there a better place to put this than:
> >> http://stromberg.dnsalias.org/~strombrg/python-2.x-vs-3.x-survey/
> >
> > Thank you for doing this!
>
> My pleasure.
>
> > If wiki.python.org supports file uploads, it may be the place for
> > publishing the results.
>
> I put it at https://wiki.python.org/moin/2.x-vs-3.x-survey , with a
> link from https://wiki.python.org/moin/Python2orPython3 . I put it in
> the "CategoryImplementations" category.
>
> The text I added to https://wiki.python.org/moin/Python2orPython3
> reads "Some people just don't want to use Python 3.x, which is their
> prerogative. However, they are in the minority.", where "in the
> minority" is a link to the survey page.
>
> Does anyone want to vet it before I tell more people where to find it?
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/greg%40krypto.org
>

From drsalists at gmail.com Fri Jan 3 03:10:34 2014
From: drsalists at gmail.com (Dan Stromberg)
Date: Thu, 2 Jan 2014 18:10:34 -0800
Subject: [Python-Dev] 2.x vs 3.x survey results
In-Reply-To: <7wzjnegedh.fsf@benfinney.id.au>
References: <20140102223456.270610c5@fsol> <7wzjnegedh.fsf@benfinney.id.au>
Message-ID:

On Thu, Jan 2, 2014 at 3:20 PM, Ben Finney wrote:
> Antoine Pitrou writes:
>
>> If wiki.python.org supports file uploads, it may be the place for
>> publishing the results.
>
> Dan, can your reporting tool produce the report in HTML format (and
> plots as SVG images)? That would be IMO more suitable for uploading.

I believe people with a more expensive account than I paid for, have
access to an HTML version. I don't know if the graphs are jpeg's or
what in that.

From drsalists at gmail.com Fri Jan 3 03:18:23 2014
From: drsalists at gmail.com (Dan Stromberg)
Date: Thu, 2 Jan 2014 18:18:23 -0800
Subject: [Python-Dev] 2.x vs 3.x survey results
In-Reply-To:
References: <20140102223456.270610c5@fsol>
Message-ID:

I don't know much (if anything ^_^) about survey methodology. I just
created a 9 question survey and tossed it at a few places that
Pythonistas hang out.

Does this look better?
https://wiki.python.org/moin/2.x-vs-3.x-survey

Thanks.

On Thu, Jan 2, 2014 at 5:45 PM, Gregory P. Smith wrote:
> Somewhere you need to describe the survey methodology, who was surveyed, how
> were they selected, etc.
>
>
> On Thu, Jan 2, 2014 at 2:54 PM, Dan Stromberg wrote:
>>
>> On Thu, Jan 2, 2014 at 1:34 PM, Antoine Pitrou
>> wrote:
>> > On Thu, 2 Jan 2014 13:10:36 -0800
>> > Dan Stromberg wrote:
>> >> Is there a better place to put this than:
>> >> http://stromberg.dnsalias.org/~strombrg/python-2.x-vs-3.x-survey/
>> >
>> > Thank you for doing this!
>>
>> My pleasure.
>>
>> > If wiki.python.org supports file uploads, it may be the place for
>> > publishing the results.
>>
>> I put it at https://wiki.python.org/moin/2.x-vs-3.x-survey , with a
>> link from https://wiki.python.org/moin/Python2orPython3 . I put it in
>> the "CategoryImplementations" category.
>>
>> The text I added to https://wiki.python.org/moin/Python2orPython3
>> reads "Some people just don't want to use Python 3.x, which is their
>> prerogative. However, they are in the minority.", where "in the
>> minority" is a link to the survey page.
>>
>> Does anyone want to vet it before I tell more people where to find it?
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev at python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe:
>> https://mail.python.org/mailman/options/python-dev/greg%40krypto.org
>
>

From rosuav at gmail.com Fri Jan 3 04:01:10 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 3 Jan 2014 14:01:10 +1100
Subject: [Python-Dev] 2.x vs 3.x survey results
In-Reply-To:
References: <20140102223456.270610c5@fsol>
Message-ID:

On Fri, Jan 3, 2014 at 1:18 PM, Dan Stromberg wrote:
> I don't know much (if anything ^_^) about survey methodology. I just
> created a 9 question survey and tossed it at a few places that
> Pythonistas hang out.

Specifically, your methodology was to post the link to python-list and
python-dev (and whatever else I didn't see). Apart from "hacker news"
- not sure if that's a specific site or you're talking generically -
your current description sounds right.

ChrisA

From janzert at janzert.com Fri Jan 3 04:57:43 2014
From: janzert at janzert.com (Janzert)
Date: Thu, 02 Jan 2014 22:57:43 -0500
Subject: [Python-Dev] 2.x vs 3.x survey results
In-Reply-To:
References: <20140102223456.270610c5@fsol>
Message-ID:

On 1/2/2014 10:01 PM, Chris Angelico wrote:
> On Fri, Jan 3, 2014 at 1:18 PM, Dan Stromberg wrote:
>> I don't know much (if anything ^_^) about survey methodology. I just
>> created a 9 question survey and tossed it at a few places that
>> Pythonistas hang out.
>
> Specifically, your methodology was to post the link to python-list and
> python-dev (and whatever else I didn't see). Apart from "hacker news"
> - not sure if that's a specific site or you're talking generically -
> your current description sounds right.
>
> ChrisA
>

I added links to the actual posts to help clarify where and how it was
posted.

Janzert

From storchaka at gmail.com Fri Jan 3 08:59:23 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 03 Jan 2014 09:59:23 +0200
Subject: [Python-Dev] cpython: threading.RLock._acquire_restore() now raises a TypeError instead of a
In-Reply-To: <3dw7235gCHz7LjR@mail.python.org>
References: <3dw7235gCHz7LjR@mail.python.org>
Message-ID:

02.01.14 13:54, victor.stinner wrote:
> http://hg.python.org/cpython/rev/9a61be172c23
> changeset: 88249:9a61be172c23
> user: Victor Stinner
> date: Thu Jan 02 12:47:24 2014 +0100
> summary:
> threading.RLock._acquire_restore() now raises a TypeError instead of a
> SystemError when it is not called with 2 arguments

> -    if (!PyArg_ParseTuple(arg, "kl:_acquire_restore", &count, &owner))
> +    if (!PyArg_ParseTuple(args, "(kl):_acquire_restore", &count, &owner))
>          return NULL;

Please don't use "(...)" in PyArg_ParseTuple, it is dangerous (see
issue6083 [1]).

[1] http://bugs.python.org/issue6083

From victor.stinner at gmail.com Fri Jan 3 11:54:39 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 3 Jan 2014 11:54:39 +0100
Subject: [Python-Dev] cpython: threading.RLock._acquire_restore() now raises a TypeError instead of a
In-Reply-To:
References: <3dw7235gCHz7LjR@mail.python.org>
Message-ID:

Hi,

2014/1/3 Serhiy Storchaka :
>> -    if (!PyArg_ParseTuple(arg, "kl:_acquire_restore", &count, &owner))
>> +    if (!PyArg_ParseTuple(args, "(kl):_acquire_restore", &count, &owner))
>>          return NULL;
>
> Please don't use "(...)" in PyArg_ParseTuple, it is dangerous (see issue6083
> [1]).
>
> [1] http://bugs.python.org/issue6083

Oh, I didn't know about this issue. Keeping a reference to the tuple is
annoying, it adds a lot of cleanup code.

Would it be possible to handle this issue in Argument Clinic, split
the function in two parts: a function to parse arguments and keep
references, and the implementation function?

I have already seen this done when a format requires keeping a
reference. See for example os_path() and os_path_impl() in
Modules/posixmodule.c.

Victor

From victor.stinner at gmail.com Fri Jan 3 13:13:53 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 3 Jan 2014 13:13:53 +0100
Subject: [Python-Dev] cpython: threading.RLock._acquire_restore() now raises a TypeError instead of a
In-Reply-To:
References: <3dw7235gCHz7LjR@mail.python.org>
Message-ID:

2014/1/3 Victor Stinner :
> 2014/1/3 Serhiy Storchaka :
>>> -    if (!PyArg_ParseTuple(arg, "kl:_acquire_restore", &count, &owner))
>>> +    if (!PyArg_ParseTuple(args, "(kl):_acquire_restore", &count, &owner))
>>>          return NULL;
>>
>> Please don't use "(...)" in PyArg_ParseTuple, it is dangerous (see issue6083
>> [1]).
>>
>> [1] http://bugs.python.org/issue6083
>
> ...
>
> Would it be possible to handle this issue in Argument Clinic, split
> the function in two parts: a function to parse arguments and keep
> references, and the implementation function?

Oh, I found a similar but different issue:

>>> import resource
>>> resource.prlimit(0, resource.RLIMIT_CORE, "\u0100\u0101")
Segmentation fault (core dumped)

This new function uses the following code to parse arguments:

    if (!PyArg_ParseTuple(args, _Py_PARSE_PID "i|(OO):prlimit",
                          &pid, &resource, &curobj, &maxobj))
        return NULL;

"\u0100\u0101" is seen as a sequence. Getting an item of this sequence
creates a new substring of 1 character, but the substring has only 1
reference, and that only reference is immediately removed, so the
borrowed references (curobj and maxobj) immediately become dangling
pointers...

Victor

From zachary.ware at gmail.com Fri Jan 3 17:27:18 2014
From: zachary.ware at gmail.com (Zachary Ware)
Date: Fri, 3 Jan 2014 10:27:18 -0600
Subject: [Python-Dev] [Python-checkins] cpython: add unicode_char() in unicodeobject.c to factorize code
In-Reply-To: <3dwl7d4x8cz7Ljs@mail.python.org>
References: <3dwl7d4x8cz7Ljs@mail.python.org>
Message-ID:

On Fri, Jan 3, 2014 at 6:01 AM, victor.stinner wrote:
> http://hg.python.org/cpython/rev/d453c95def31
> changeset: 88271:d453c95def31
> user: Victor Stinner
> date: Fri Jan 03 12:53:47 2014 +0100
> summary:
> add unicode_char() in unicodeobject.c to factorize code
>
> files:
> Objects/unicodeobject.c | 86 ++++++++++------------------
> 1 files changed, 31 insertions(+), 55 deletions(-)
>
>
> diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c
> --- a/Objects/unicodeobject.c
> +++ b/Objects/unicodeobject.c
> @@ -2887,17 +2883,7 @@
>          return NULL;
>      }
>
> -    if ((Py_UCS4)ordinal < 256)
> -        return get_latin1_char((unsigned char)ordinal);
> -
> -    v = PyUnicode_New(1, ordinal);
> -    if (v == NULL)
> -        return NULL;
> -    kind = PyUnicode_KIND(v);
> -    data = PyUnicode_DATA(v);
> -    PyUnicode_WRITE(kind, data, 0, ordinal);
> -    assert(_PyUnicode_CheckConsistency(v, 1));
> -    return v;
> +    return unicode_char((Py_UCS4)ordinal);
>  }
>
>  PyObject *
> @@ -11354,17 +11340,7 @@
>      kind = PyUnicode_KIND(self);
>      data = PyUnicode_DATA(self);
>      ch = PyUnicode_READ(kind, data, index);
> -    if (ch < 256)
> -        return get_latin1_char(ch);
> -
> -    res = PyUnicode_New(1, ch);
> -    if (res == NULL)
> -        return NULL;
> -    kind = PyUnicode_KIND(res);
> -    data = PyUnicode_DATA(res);
> -    PyUnicode_WRITE(kind, data, 0, ch);
> -    assert(_PyUnicode_CheckConsistency(res, 1));
> -    return res;
> +    return unicode_char(ch);
>  }
>
>  /* Believe it or not, this produces the same value for ASCII strings

The above-quoted parts of this changeset caused several compiler
warnings due to unused variables. On 32-bit Windows:

..\Objects\unicodeobject.c(2881): warning C4101: 'kind' : unreferenced local variable [P:\ath\to\cpython\PCbuild\pythoncore.vcxproj]
..\Objects\unicodeobject.c(2879): warning C4101: 'v' : unreferenced local variable [P:\ath\to\cpython\PCbuild\pythoncore.vcxproj]
..\Objects\unicodeobject.c(2880): warning C4101: 'data' : unreferenced local variable [P:\ath\to\cpython\PCbuild\pythoncore.vcxproj]
..\Objects\unicodeobject.c(11333): warning C4101: 'res' : unreferenced local variable [P:\ath\to\cpython\PCbuild\pythoncore.vcxproj]

I believe this should fix it, but I'll leave it up to you to confirm
that, Victor :)

diff -r 8a3718f31188 Objects/unicodeobject.c
--- a/Objects/unicodeobject.c Fri Jan 03 15:53:20 2014 +0100
+++ b/Objects/unicodeobject.c Fri Jan 03 10:20:12 2014 -0600
@@ -2876,10 +2876,6 @@
 PyObject *
 PyUnicode_FromOrdinal(int ordinal)
 {
-    PyObject *v;
-    void *data;
-    int kind;
-
     if (ordinal < 0 || ordinal > MAX_UNICODE) {
         PyErr_SetString(PyExc_ValueError,
                         "chr() arg not in range(0x110000)");
@@ -11330,7 +11326,6 @@
     void *data;
     enum PyUnicode_Kind kind;
     Py_UCS4 ch;
-    PyObject *res;

     if (!PyUnicode_Check(self) || PyUnicode_READY(self) == -1) {
         PyErr_BadArgument();

--
Zach

From victor.stinner at gmail.com Fri Jan 3 17:42:23 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 3 Jan 2014 17:42:23 +0100
Subject: [Python-Dev] [Python-checkins] cpython: add unicode_char() in
unicodeobject.c to factorize code In-Reply-To: References: <3dwl7d4x8cz7Ljs@mail.python.org> Message-ID: 2014/1/3 Zachary Ware : > The above-quoted parts of this changeset caused several compiler > warnings due to unused variables. On 32-bit Windows: > (...) > I believe this should fix it, but I'll leave it up to you to confirm > that, Victor :) Oh, I didn't notice these warnings. I fixed them, thanks. Victor From status at bugs.python.org Fri Jan 3 18:07:25 2014 From: status at bugs.python.org (Python tracker) Date: Fri, 3 Jan 2014 18:07:25 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20140103170725.2104156A2A@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2013-12-27 - 2014-01-03) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 4348 (+16) closed 27538 (+18) total 31886 (+34) Open issues with patches: 1977 Issues opened (27) ================== #18310: itertools.tee() can't accept keyword arguments http://bugs.python.org/issue18310 reopened by py.user #20081: sys.getwindowsversion does not show some fields http://bugs.python.org/issue20081 opened by giampaolo.rodola #20082: Misbehavior of BufferedRandom.write with raw file in append mo http://bugs.python.org/issue20082 opened by erik.bray #20083: smtplib: support for IDN (international domain names) http://bugs.python.org/issue20083 opened by macfreek #20085: Python2.7, wxPython and IDLE 2.7 http://bugs.python.org/issue20085 opened by stubz #20086: test_locale fails on PPC64 PowerLinux http://bugs.python.org/issue20086 opened by serhiy.storchaka #20087: Mismatch between glibc and X11 locale.alias http://bugs.python.org/issue20087 opened by serhiy.storchaka #20088: locale.getlocale() fails if locale name doesn't include encodi http://bugs.python.org/issue20088 opened by serhiy.storchaka #20089: email.message_from_string no longer working in Python 3.4 
http://bugs.python.org/issue20089 opened by apollo13 #20090: slight ambiguity in README.txt instructions for building docs http://bugs.python.org/issue20090 opened by MLModel #20091: An index entry for __main__ in "30.5 runpy" is missing http://bugs.python.org/issue20091 opened by MLModel #20092: type() constructor should bind __int__ to __index__ when __ind http://bugs.python.org/issue20092 opened by ethan.furman #20093: Wrong OSError message from os.rename() when dst is a non-empty http://bugs.python.org/issue20093 opened by jderose #20094: intermitent failures with test_dbm http://bugs.python.org/issue20094 opened by ethan.furman #20096: Mention modernize and future in Python 2/3 porting HOWTO http://bugs.python.org/issue20096 opened by brett.cannon #20098: email policy needs a mangle_from setting http://bugs.python.org/issue20098 opened by r.david.murray #20100: epoll docs are not clear with regards to CLOEXEC. http://bugs.python.org/issue20100 opened by r.david.murray #20101: Determine correct behavior for time functions on Windows http://bugs.python.org/issue20101 opened by zach.ware #20102: shutil._make_zipfile possible resource leak http://bugs.python.org/issue20102 opened by peter at psantoro.net #20103: Documentation of itertools.accumulate is confused http://bugs.python.org/issue20103 opened by MLModel #20104: expose posix_spawn(p) http://bugs.python.org/issue20104 opened by benjamin.peterson #20105: Codec exception chaining is losing traceback details http://bugs.python.org/issue20105 opened by ncoghlan #20106: warn_dir is always true for install_data, even if an install_d http://bugs.python.org/issue20106 opened by tabrezm #20109: TestProgram is mentioned in the unittest docs but is not docum http://bugs.python.org/issue20109 opened by r.david.murray #20112: The documentation for http.server error_message_format is inad http://bugs.python.org/issue20112 opened by r.david.murray #20113: os.readv() and os.writev() don't raise an OSError on readv()/w 
http://bugs.python.org/issue20113 opened by haypo #20114: Sporadic failure of test_semaphore_tracker() of test_multiproc http://bugs.python.org/issue20114 opened by haypo Most recent 15 issues with no replies (15) ========================================== #20114: Sporadic failure of test_semaphore_tracker() of test_multiproc http://bugs.python.org/issue20114 #20112: The documentation for http.server error_message_format is inad http://bugs.python.org/issue20112 #20109: TestProgram is mentioned in the unittest docs but is not docum http://bugs.python.org/issue20109 #20106: warn_dir is always true for install_data, even if an install_d http://bugs.python.org/issue20106 #20105: Codec exception chaining is losing traceback details http://bugs.python.org/issue20105 #20103: Documentation of itertools.accumulate is confused http://bugs.python.org/issue20103 #20102: shutil._make_zipfile possible resource leak http://bugs.python.org/issue20102 #20096: Mention modernize and future in Python 2/3 porting HOWTO http://bugs.python.org/issue20096 #20091: An index entry for __main__ in "30.5 runpy" is missing http://bugs.python.org/issue20091 #20090: slight ambiguity in README.txt instructions for building docs http://bugs.python.org/issue20090 #20088: locale.getlocale() fails if locale name doesn't include encodi http://bugs.python.org/issue20088 #20087: Mismatch between glibc and X11 locale.alias http://bugs.python.org/issue20087 #20082: Misbehavior of BufferedRandom.write with raw file in append mo http://bugs.python.org/issue20082 #20078: zipfile - ZipExtFile.read goes into 100% CPU infinite loop on http://bugs.python.org/issue20078 #20076: Add UTF-8 locale aliases http://bugs.python.org/issue20076 Most recent 15 issues waiting for review (15) ============================================= #20113: os.readv() and os.writev() don't raise an OSError on readv()/w http://bugs.python.org/issue20113 #20102: shutil._make_zipfile possible resource leak http://bugs.python.org/issue20102 
#20098: email policy needs a mangle_from setting http://bugs.python.org/issue20098 #20082: Misbehavior of BufferedRandom.write with raw file in append mo http://bugs.python.org/issue20082 #20080: Unused variable in Lib/sqlite3/test/factory.py http://bugs.python.org/issue20080 #20079: Add support for glibc supported locales http://bugs.python.org/issue20079 #20077: Format of TypeError differs between comparison and arithmetic http://bugs.python.org/issue20077 #20076: Add UTF-8 locale aliases http://bugs.python.org/issue20076 #20075: help(open) eats first line http://bugs.python.org/issue20075 #20072: Ttk tests fail when wantobjects is false http://bugs.python.org/issue20072 #20069: Add unit test for os.chown http://bugs.python.org/issue20069 #20064: PyObject_Malloc is not documented http://bugs.python.org/issue20064 #20042: Python Launcher, Windows, fails on scripts w/ non-latin names http://bugs.python.org/issue20042 #20041: TypeError when f_trace is None and tracing. http://bugs.python.org/issue20041 #20035: Suppress 'os.environ was modified' warning on Tcl/Tk tests http://bugs.python.org/issue20035 Top 10 most discussed issues (10) ================================= #19995: %c, %o, %x, %X accept non-integer values instead of raising an http://bugs.python.org/issue19995 9 msgs #20092: type() constructor should bind __int__ to __index__ when __ind http://bugs.python.org/issue20092 8 msgs #20101: Determine correct behavior for time functions on Windows http://bugs.python.org/issue20101 7 msgs #20083: smtplib: support for IDN (international domain names) http://bugs.python.org/issue20083 6 msgs #11798: Test cases not garbage collected after run http://bugs.python.org/issue11798 5 msgs #20075: help(open) eats first line http://bugs.python.org/issue20075 5 msgs #16039: imaplib: unlimited readline() from connection http://bugs.python.org/issue16039 4 msgs #16778: Logger.findCaller needs to be smarter http://bugs.python.org/issue16778 4 msgs #18310: itertools.tee() can't 
accept keyword arguments http://bugs.python.org/issue18310 4 msgs #19944: Make importlib.find_spec load packages as needed http://bugs.python.org/issue19944 3 msgs Issues closed (18) ================== #16113: SHA-3 (Keccak) support may need to be removed before 3.4 http://bugs.python.org/issue16113 closed by loewis #17282: document the defaultTest parameter to unittest.main() http://bugs.python.org/issue17282 closed by r.david.murray #19347: PEP 453 implementation tracking issue http://bugs.python.org/issue19347 closed by ncoghlan #19422: Neither DTLS nor error for SSLSocket.sendto() of UDP socket http://bugs.python.org/issue19422 closed by pitrou #19648: Empty tests in pickletester need to be implemented or removed http://bugs.python.org/issue19648 closed by pitrou #19728: PEP 453: enable pip by default in the Windows binary installer http://bugs.python.org/issue19728 closed by loewis #19749: test_venv failure on AIX: 'module' object has no attribute 'O_ http://bugs.python.org/issue19749 closed by loewis #19918: PureWindowsPath.relative_to() is not case insensitive http://bugs.python.org/issue19918 closed by pitrou #20031: unittest.TextTestRunner missing run() documentation. http://bugs.python.org/issue20031 closed by python-dev #20055: On Windows NT 6 with administrator account, there are two fail http://bugs.python.org/issue20055 closed by pitrou #20084: smtplib: support for UTF-8 encoded headers (SMTPUTF8) http://bugs.python.org/issue20084 closed by r.david.murray #20095: what is that result!? 
http://bugs.python.org/issue20095 closed by mark.dickinson

#20097: Bad use of `self` in importlib
http://bugs.python.org/issue20097 closed by eric.snow

#20099: a new idea
http://bugs.python.org/issue20099 closed by ezio.melotti

#20107: Revert PEP 453 integration
http://bugs.python.org/issue20107 closed by dstufft

#20108: cannot pass kwarg `func` to `inspect.getcallargs`
http://bugs.python.org/issue20108 closed by python-dev

#20110: Misleading word used for __annotations__
http://bugs.python.org/issue20110 closed by python-dev

#20111: pathlib.PurePath.with_suffix() allows creation of otherwise im
http://bugs.python.org/issue20111 closed by pitrou

From arigo at tunes.org Sat Jan 4 09:59:14 2014
From: arigo at tunes.org (Armin Rigo)
Date: Sat, 4 Jan 2014 09:59:14 +0100
Subject: [Python-Dev] cpython: threading.RLock._acquire_restore() now raises a TypeError instead of a
In-Reply-To:
References: <3dw7235gCHz7LjR@mail.python.org>
Message-ID:

Hi Serhiy,

On Fri, Jan 3, 2014 at 8:59 AM, Serhiy Storchaka wrote:
>> +    if (!PyArg_ParseTuple(args, "(kl):_acquire_restore", &count, &owner))
>
> Please don't use "(...)" in PyArg_ParseTuple, it is dangerous (see issue6083

I think that in this case it is fine, because the "k" and "l" are
returning C integers. The refcounting issue occurs only when
PyObject* are returned.

See you soon,

Armin.

From storchaka at gmail.com Sat Jan 4 13:58:57 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sat, 04 Jan 2014 14:58:57 +0200
Subject: [Python-Dev] Subclasses vs. special methods
Message-ID:

Should implicit conversion of an instance of a subclass of int, float,
complex, str, bytes, etc. call the appropriate special method: __int__
(or __index__), __float__, __complex__, __str__, __bytes__, etc.?
Currently explicit conversion calls these methods, but implicit
conversion doesn't.

>>> class I(int):
...     def __int__(self): return 42
...     def __index__(self): return 43
...
>>> class F(float):
...     def __float__(self): return 42.0
...
>>> class S(str):
...     def __str__(self): return '*'
...
>>> int(I(65))
42
>>> float(F(65))
42.0
>>> str(S('A'))
'*'
>>> chr(I(65))
'A'
>>> import cmath; cmath.rect(F(65), 0)
(65+0j)
>>> ord(S('A'))
65

Issue17576 [1] proposes calling the special methods for implicit
conversion too. I have doubts about this.

1. I am afraid that this will add places where arbitrary Python code is
unexpectedly called. For example, see changeset 9a61be172c23, discussed
in a neighboring thread. If the "k" format code called __int__(), Python
code could modify the unpacked list argument while PyArg_ParseTuple() is
parsing arguments.

2. The PyLong_As*() functions are already not very consistent. Some of
them call __int__() for an argument which is not an instance of an int
subclass, others accept only instances of int subclasses.
PyLong_AsVoidPtr() calls or doesn't call __int__() depending on the sign
of the argument.

3. We can't consistently call special methods for all types. E.g. for
strings we can't call __str__() when processing the "s" code in
PyArg_ParseTuple() because this would cause a leak.

I think that overriding a special conversion method in a subclass of the
corresponding type should be restricted. I see two consistent and safe
possibilities:

1. Forbid it. I.e. the above declarations of the I, F and S classes
should raise exceptions.

2. Make it have no effect. I.e. both int(I(65)) and operator.index(I(65))
should return 65.

[1] http://bugs.python.org/issue17576

From hugo at gfierro.com Sat Jan 4 16:36:18 2014
From: hugo at gfierro.com (Hugo G. Fierro)
Date: Sat, 4 Jan 2014 10:36:18 -0500
Subject: [Python-Dev] Bug? http.client assumes iso-8859-1 encoding of HTTP headers
Message-ID:

Hi Python devs,

I am trying to download an HTML document. I get an HTTP 301 (Moved
Permanently) with a UTF-8 encoded Location header and http.client decodes it
as iso-8859-1. When there's a non-ASCII character in the redirect URL then I
can't download the document.

In client.py def parse_headers() I see the call to decode('iso-8859-1').
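
A minimal sketch of the mismatch described above, assuming the server
really sends UTF-8 bytes in the Location value. It is not http.client's
own code: `redecode_header` is a hypothetical helper, and the sample
bytes are an assumed UTF-8 encoding of the redirect URL in this report.
Because ISO-8859-1 maps every byte to exactly one code point, the
original bytes can be recovered from the decoded string and decoded
again:

```python
def redecode_header(value, encoding="utf-8"):
    """Re-interpret a header value that was decoded as ISO-8859-1.

    Hypothetical helper for illustration; not part of http.client.
    """
    try:
        # ISO-8859-1 maps bytes 0-255 one-to-one onto code points, so
        # encoding recovers the exact bytes that came off the wire.
        return value.encode("iso-8859-1").decode(encoding)
    except UnicodeError:
        # Not valid in the target encoding (or not Latin-1 text at all):
        # keep the value unchanged.
        return value


# Assumed bytes of a UTF-8 encoded Location value.
raw = (b"http://www.starbucks.com/store/158/at/karntnerstrasse/"
       b"k\xc3\xa4rntnerstrasse-49-vienna-9-1010")

as_latin1 = raw.decode("iso-8859-1")   # the decoding described above
repaired = redecode_header(as_latin1)  # the text the server intended

print(as_latin1)
print(repaired)
```

Whether such a re-decoding belongs in a client at all is a separate
question; the sketch only shows that no information is lost by the
ISO-8859-1 step.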
My personal hack is to use whatever charset is defined in the Content-Type HTTP header (utf8) or fall back into iso-8859-1. At this point I am not sure where/how a fix should occur so I thought I'd run it by you in case I should file a bug. Note that I don't use http.client directly, but through the python-requests library. I include some code to reproduce the problem below. Cheers, Hugo ----- #!/usr/bin/env python3 # Trying to replicate what wget does with a 301 redirect: # wget --server-response www.starbucks.com/store/158/AT/Karntnerstrasse/K%c3%a4rntnerstrasse-49-Vienna-9-1010 import http.client import urllib.parse s2='/store/158/AT/Karntnerstrasse/K%c3%a4rntnerstrasse-49-Vienna-9-1010' s3=' http://www.starbucks.com/store/158/at/karntnerstrasse/k%C3%A4rntnerstrasse-49-vienna-9-1010 ' conn = http.client.HTTPConnection('www.starbucks.com') conn.request('GET', s2) r = conn.getresponse() print('Location', r.headers.get('Location')) print('Expected', urllib.parse.unquote(s3)) assert r.status == 301 assert r.headers.get('Location') == urllib.parse.unquote(s3), \ 'decoded as iso-8859-1 instead of utf8' conn = http.client.HTTPConnection('www.starbucks.com') conn.request('GET', s3) r = conn.getresponse() assert r.status == 200 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Jan 4 17:24:43 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 5 Jan 2014 03:24:43 +1100 Subject: [Python-Dev] Bug? http.client assumes iso-8859-1 encoding of HTTP headers In-Reply-To: References: Message-ID: On Sun, Jan 5, 2014 at 2:36 AM, Hugo G. Fierro wrote: > I am trying to download an HTML document. I get an HTTP 301 (Moved > Permanently) with a UTF-8 encoded Location header and http.client decodes it > as iso-8859-1. When there's a non-ASCII character in the redirect URL then I > can't download the document. > > In client.py def parse_headers() I see the call to decode('iso-8859-1'). 
My > personal hack is to use whatever charset is defined in the Content-Type > HTTP header (utf8) or fall back into iso-8859-1. > > At this point I am not sure where/how a fix should occur so I thought I'd > run it by you in case I should file a bug. Note that I don't use http.client > directly, but through the python-requests library. I'm not 100% sure, but I believe non-ASCII characters are outright forbidden in a Location: header. It's possible that an RFC2047 tag might be used, but my reading of RFC2616 is that that's only for text fields, not for Location. These non-ASCII characters ought to be percent-encoded, and anything doing otherwise is buggy. Confirming what you're seeing with a plain socket: >>> s=socket.socket() >>> s.connect(("www.starbucks.com",80)) >>> s.send(b'GET /store/158/AT/Karntnerstrasse/K%c3%a4rntnerstrasse-49-Vienna-9-1010 HTTP/1.1\r\nHost: www.starbucks.com\r\nAccept-Encoding: identity\r\n\r\n') 136 >>> s.recv(1024) b'HTTP/1.1 301 Moved Permanently\r\nContent-Type: text/html; charset=UTF-8\r\nLocation: http://www.starbucks.com/store/158/at/karntnerstrasse/k\xc3\xa4rntnerstrasse-49-vienna-9-1010\r\n ........' I'm pretty sure that server is in violation of the spec, so all bets are off as to what any other server will do. If you know you're dealing with this one server, you can probably hack around this, but I don't think it belongs in core code. Unless, of course, I'm completely wrong about the spec, or if there's a de facto spec that lots of servers follow, in which case maybe it would be worth doing. ChrisA From catch-all at masklinn.net Sat Jan 4 17:50:23 2014 From: catch-all at masklinn.net (Xavier Morel) Date: Sat, 4 Jan 2014 17:50:23 +0100 Subject: [Python-Dev] Bug? http.client assumes iso-8859-1 encoding of HTTP headers In-Reply-To: References: Message-ID: <83607460-8431-4EF4-984E-2D2DD09EEFDA@masklinn.net> On 2014-01-04, at 17:24 , Chris Angelico wrote: > On Sun, Jan 5, 2014 at 2:36 AM, Hugo G. 
Fierro wrote: >> I am trying to download an HTML document. I get an HTTP 301 (Moved >> Permanently) with a UTF-8 encoded Location header and http.client decodes it >> as iso-8859-1. When there's a non-ASCII character in the redirect URL then I >> can't download the document. >> >> In client.py def parse_headers() I see the call to decode('iso-8859-1'). My >> personal hack is to use whatever charset is defined in the Content-Type >> HTTP header (utf8) or fall back into iso-8859-1. >> >> At this point I am not sure where/how a fix should occur so I thought I'd >> run it by you in case I should file a bug. Note that I don't use http.client >> directly, but through the python-requests library. > > I'm not 100% sure, but I believe non-ASCII characters are outright > forbidden in a Location: header. It's possible that an RFC2047 tag > might be used, but my reading of RFC2616 is that that's only for text > fields, not for Location. These non-ASCII characters ought to be > percent-encoded, and anything doing otherwise is buggy. That is also my reading: the Location field's value is defined as an absoluteURI (RFC2616, section 14.30): > Location = "Location" ":" absoluteURI Section 3.2.1 indicates that "absoluteURI" (and other related concepts) are used as defined by RFC 2396 "Uniform Resource Identifiers (URI): Generic Syntax", that is: > absoluteURI = scheme ":" ( hier_part | opaque_part ) Both "hier_part" and "opaque_part" consist of some punctuation characters, "escaped" and "unreserved". "escaped" is %-encoded characters, which leaves "unreserved" defined as "alphanum | mark". "mark" is more punctuation and "alphanum" is ASCII's alphanumeric ranges. Furthermore, although RFC 3986 moves some stuff around and renames some production rules, it seems to have kept this limitation.
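To make the grammar argument above concrete, here is a minimal sketch of what a spec-conformant Location value for this redirect would look like. It is only an illustration, not anything http.client does: the helper name `to_conformant_location` is made up for this example, and it assumes the server sent UTF-8, which nothing in the HTTP spec guarantees.

```python
from urllib.parse import quote

# Raw header value as it arrived on the wire (bytes copied from the plain
# socket session shown earlier in this thread).
RAW_LOCATION = (
    b"http://www.starbucks.com/store/158/at/karntnerstrasse/"
    b"k\xc3\xa4rntnerstrasse-49-vienna-9-1010"
)


def to_conformant_location(raw, assumed_charset="utf-8"):
    """Hypothetical helper: percent-encode the non-ASCII characters.

    The charset is an assumption made by the caller; the header itself
    does not declare one.
    """
    text = raw.decode(assumed_charset)
    # Leave URI delimiters and already-escaped sequences alone; everything
    # outside ASCII is turned into %XX escapes of its UTF-8 bytes.
    return quote(text, safe=":/?#[]@!$&'()*+,;=%")


if __name__ == "__main__":
    # What http.client's iso-8859-1 decoding produces (mojibake):
    print(RAW_LOCATION.decode("iso-8859-1"))
    # What an ASCII-only, percent-encoded value would look like:
    print(to_conformant_location(RAW_LOCATION))
```

A value produced this way contains only characters from the "escaped" and "unreserved" sets plus URI punctuation, so it decodes identically as ASCII, iso-8859-1 or UTF-8.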
From scott+python-dev at scottdial.com Sat Jan 4 20:21:04 2014 From: scott+python-dev at scottdial.com (Scott Dial) Date: Sat, 04 Jan 2014 14:21:04 -0500 Subject: [Python-Dev] 2.x vs 3.x survey results In-Reply-To: References: <20140102223456.270610c5@fsol> Message-ID: <52C85F20.4030204@scottdial.com> On 2014-01-02 17:54, Dan Stromberg wrote: > I put it at https://wiki.python.org/moin/2.x-vs-3.x-survey It would've been nice to see some crosstabs. Pretty much any question after Q3 is incomprehensible without splitting the respondents into sub-groups first. Of the 2.49% of people who said they've never written Python 2.x, how many of those people also said they have never written Python 3.x too? (There really is 4 categories of developers being surveyed here.) Of the 22.91% of people who said Python 3.x was a mistake, how many of them also said they have never written any Python 3.x? Of the 40% of people who said they have never written Python 3.x, how many of them also said they had dependencies keeping them on Python 2.x? Etc. -- Scott Dial scott at scottdial.com From stefan at bytereef.org Sat Jan 4 21:00:09 2014 From: stefan at bytereef.org (Stefan Krah) Date: Sat, 4 Jan 2014 21:00:09 +0100 Subject: [Python-Dev] [Python-checkins] cpython: Issue #19976: Argument Clinic METH_NOARGS functions now always In-Reply-To: <3dxXbZ30w6z7Ljs@mail.python.org> References: <3dxXbZ30w6z7Ljs@mail.python.org> Message-ID: <20140104200009.GA23125@sleipnir.bytereef.org> Probably Rietveld did not send mail, so I mention my review comments again: larry.hastings wrote: > +#ifdef __GNUC__ > +#define Py_UNUSED(name) _unused_ ## name __attribute__((unused)) > +#else > +#define Py_UNUSED(name) _unused_ ## name > +#endif > + The Intel compiler defines __GNUC__ but chokes on the __attribute__(). 
This works: #if defined(__GNUC__) && !defined(__INTEL_COMPILER) > +_pickle_Pickler_clear_memo(PyObject *self, PyObject *Py_UNUSED(ignored)) I'm not a native speaker, but UNUSED(ignored) reads strange to me. I would prefer UNUSED(args). Stefan Krah From martin at v.loewis.de Sat Jan 4 21:43:57 2014 From: martin at v.loewis.de (martin at v.loewis.de) Date: Sat, 04 Jan 2014 21:43:57 +0100 Subject: [Python-Dev] Bug? http.client assumes iso-8859-1 encoding of HTTP headers In-Reply-To: References: Message-ID: <20140104214357.Horde.nHoCZ_WcK6WQe5i-ZEyi0Q9@webmail.df.eu> Quoting Chris Angelico : > I'm pretty sure that server is in violation of the spec, so all bets > are off as to what any other server will do. If you know you're > dealing with this one server, you can probably hack around this, but I > don't think it belongs in core code. Unless, of course, I'm completely > wrong about the spec, or if there's a de facto spec that lots of > servers follow, in which case maybe it would be worth doing. It would be possible to support this better by using "ascii" with "surrogateescape" when receiving the redirect, and using the same for all URLs coming into http.client. This would implement a best-effort strategy at preserving the bogus URL, and still maintain the notion that URLs are text (with the other path being to also allow bytes as URLs, and always parsing Location as bytes). 
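A minimal sketch of the "ascii" plus "surrogateescape" idea, assuming the header bytes from the Starbucks response quoted earlier in this thread. The two function names are invented for illustration and are not http.client API; the only point is that the bogus bytes survive a round trip through str unchanged.

```python
# Raw Location value as received (bytes copied from the socket session
# earlier in the thread).
RAW_LOCATION = (
    b"http://www.starbucks.com/store/158/at/karntnerstrasse/"
    b"k\xc3\xa4rntnerstrasse-49-vienna-9-1010"
)


def decode_header_value(raw):
    """Hypothetical: decode header bytes, smuggling non-ASCII bytes
    through as lone surrogates (U+DC80..U+DCFF)."""
    return raw.decode("ascii", "surrogateescape")


def encode_request_target(text):
    """Hypothetical: turn the text back into the exact original bytes."""
    return text.encode("ascii", "surrogateescape")


if __name__ == "__main__":
    text = decode_header_value(RAW_LOCATION)
    # ascii() is used because a str containing lone surrogates cannot be
    # written to a UTF-8 stdout directly.
    print(ascii(text))
    print(encode_request_target(text) == RAW_LOCATION)
```

The trade-off is that the intermediate str is not meaningful text: it cannot be encoded to UTF-8 or displayed as-is, so it is only useful for passing the URL back to the server byte-for-byte.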
Regards, Martin From stefan_ml at behnel.de Sat Jan 4 22:17:43 2014 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 04 Jan 2014 22:17:43 +0100 Subject: [Python-Dev] [Python-checkins] cpython: Issue #19976: Argument Clinic METH_NOARGS functions now always In-Reply-To: <20140104200009.GA23125@sleipnir.bytereef.org> References: <3dxXbZ30w6z7Ljs@mail.python.org> <20140104200009.GA23125@sleipnir.bytereef.org> Message-ID: Stefan Krah, 04.01.2014 21:00: > Probably Rietveld did not send mail, so I mention my review comments again: > > larry.hastings wrote: >> +#ifdef __GNUC__ >> +#define Py_UNUSED(name) _unused_ ## name __attribute__((unused)) >> +#else >> +#define Py_UNUSED(name) _unused_ ## name >> +#endif >> + > > The Intel compiler defines __GNUC__ but chokes on the __attribute__(). > > This works: > > #if defined(__GNUC__) && !defined(__INTEL_COMPILER) We use this in Cython and according to the mailing list echo, it would seem that there are people running it through Intel's compiler as well: """ #ifndef CYTHON_UNUSED # if defined(__GNUC__) # if !(defined(__cplusplus)) || (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)) # define CYTHON_UNUSED __attribute__ ((__unused__)) # else # define CYTHON_UNUSED # endif # elif defined(__ICC) || (defined(__INTEL_COMPILER) && !defined(_MSC_VER)) # define CYTHON_UNUSED __attribute__ ((__unused__)) # else # define CYTHON_UNUSED # endif #endif """ I wonder why this works, though, given that you say Intel doesn't support "__attribute__". The only difference I can spot is the space behind it. In any case, I agree that the right way to do it is a bit more complex than in the original commit. 
Stefan From stefan at bytereef.org Sat Jan 4 22:33:35 2014 From: stefan at bytereef.org (Stefan Krah) Date: Sat, 4 Jan 2014 22:33:35 +0100 Subject: [Python-Dev] [Python-checkins] cpython: Issue #19976: Argument Clinic METH_NOARGS functions now always In-Reply-To: References: <3dxXbZ30w6z7Ljs@mail.python.org> <20140104200009.GA23125@sleipnir.bytereef.org> Message-ID: <20140104213335.GA25643@sleipnir.bytereef.org> Stefan Behnel wrote: > """ > #ifndef CYTHON_UNUSED > # if defined(__GNUC__) > # if !(defined(__cplusplus)) || (__GNUC__ > 3 || (__GNUC__ == 3 && > __GNUC_MINOR__ >= 4)) > # define CYTHON_UNUSED __attribute__ ((__unused__)) > # else > # define CYTHON_UNUSED > # endif > # elif defined(__ICC) || (defined(__INTEL_COMPILER) && !defined(_MSC_VER)) > # define CYTHON_UNUSED __attribute__ ((__unused__)) > # else > # define CYTHON_UNUSED > # endif > #endif > """ > > I wonder why this works, though, given that you say Intel doesn't support > "__attribute__". The only difference I can spot is the space behind it. You're right, icc version 12.0 supports the attribute. It must have been some earlier version that failed. Stefan Krah From luca.sbardella at gmail.com Sun Jan 5 01:39:53 2014 From: luca.sbardella at gmail.com (Luca Sbardella) Date: Sun, 5 Jan 2014 00:39:53 +0000 Subject: [Python-Dev] 2.x vs 3.x survey results In-Reply-To: <52C85F20.4030204@scottdial.com> References: <20140102223456.270610c5@fsol> <52C85F20.4030204@scottdial.com> Message-ID: On 4 January 2014 19:21, Scott Dial wrote: > On 2014-01-02 17:54, Dan Stromberg wrote: > > I put it at https://wiki.python.org/moin/2.x-vs-3.x-survey > > It would've been nice to see some crosstabs. Pretty much any question > after Q3 is incomprehensible without splitting the respondents into > sub-groups first. > > Of the 2.49% of people who said they've never written Python 2.x, how > many of those people also said they have never written Python 3.x too? > (There really is 4 categories of developers being surveyed here.) 
Of the > 22.91% of people who said Python 3.x was a mistake, how many of them > also said they have never written any Python 3.x? Of the 40% of people > who said they have never written Python 3.x, how many of them also said > they had dependencies keeping them on Python 2.x? Etc. > > -- > Scott Dial > scott at scottdial.com > Hi guys, you are my heroes but this survey is quite useless, can you include more people? I wasn't aware of it so many thousands of python users. And after that, you are well aware that Python 3 or 2 is becoming a liability, just stick with one, anyone (3) at this point. I don't want to go and learn a new language, please. Sorry for the rant From ben+python at benfinney.id.au Sun Jan 5 04:42:05 2014 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 05 Jan 2014 14:42:05 +1100 Subject: [Python-Dev] 2.x vs 3.x survey results References: <20140102223456.270610c5@fsol> <52C85F20.4030204@scottdial.com> Message-ID: <7wvbxzgkmq.fsf@benfinney.id.au> Luca Sbardella writes: > you are my heroes but this survey is quite useless, can you include more > people? The survey cohort was self-selected from those who read the forums where it was posted. > I wasn't aware of it so many thousands of python users. That statement confuses me. Were you aware of it, or not? How did you become aware of it? > And after that, you are well aware that Python 3 or 2 is becoming a > liability, just stick with one, anyone (3) at this point. The policy of the Python core developers is quite clear, and has been for many years: Python 2 is a dead end, and Python 2.7 (released 2010-07-03, 3½ years ago) is the last Python 2.
Python 2.7 is the last of the Python 2 line; there will never be new Python 2 features; everyone should migrate to Python 3. That is already the Python core developers' published policy. So, to whom are you speaking here on the Python core developers' forum? > I don't want to go and learn a new language, please. Great! If you already know Python, then there is very little (certainly not "a new language") different to move from Python 2.7 to Python 3. Enjoy! -- \ "Our task must be to free ourselves from our prison by widening | `\ our circle of compassion to embrace all humanity and the whole | _o__) of nature in its beauty." -Albert Einstein | Ben Finney From gokoproject at gmail.com Sun Jan 5 05:20:24 2014 From: gokoproject at gmail.com (John Yeuk Hon Wong) Date: Sat, 04 Jan 2014 23:20:24 -0500 Subject: [Python-Dev] 2.x vs 3.x survey results In-Reply-To: <7wvbxzgkmq.fsf@benfinney.id.au> References: <20140102223456.270610c5@fsol> <52C85F20.4030204@scottdial.com> <7wvbxzgkmq.fsf@benfinney.id.au> Message-ID: <52C8DD88.8010309@gmail.com> On 1/4/14 10:42 PM, Ben Finney wrote: > Luca Sbardella writes: > >> you are my heroes but this survey is quite useless, can you include more >> people? > The survey cohort was self-selected from those who read the forums where > it was posted. > >> I wasn't aware of it so many thousands of python users. > That statement confuses me. Were you aware of it, or not? How did you > become aware of it? > >> And after that, you are well aware that Python 3 or 2 is becoming a >> liability, just stick with one, anyone (3) at this point. > The policy of the Python core developers is quite clear, and has been > for many years: Python 2 is a dead end, and Python 2.7 (released > 2010-07-03, 3½ years ago) is the last Python 2. > > Python 2.7 is the last of the Python 2 line; there will never be new > Python 2 features; everyone > should migrate to Python 3. > > That is already the Python core developers' published policy.
So, to > whom are you speaking here on the Python core developers' forum? > >> I don't want to go and learn a new language, please. > Great! If you already know Python, then there is very little (certainly > not "a new language") different to move from Python 2.7 to Python 3. > > Enjoy! > I think it helps Luca and many others (including myself) if there is a reference of the difference between 2.7 and Python 3.3+. There are PEPs and books, but is there any such long list of references? If not, should we start investing in one? I know the basic one such as xrange and range, items vs iteritems, izip vs zip that sort of uniform syntax/library inclusion difference. If there is such reference available? From drsalists at gmail.com Sun Jan 5 05:48:40 2014 From: drsalists at gmail.com (Dan Stromberg) Date: Sat, 4 Jan 2014 20:48:40 -0800 Subject: [Python-Dev] 2.x vs 3.x survey results In-Reply-To: <52C8DD88.8010309@gmail.com> References: <20140102223456.270610c5@fsol> <52C85F20.4030204@scottdial.com> <7wvbxzgkmq.fsf@benfinney.id.au> <52C8DD88.8010309@gmail.com> Message-ID: On Sat, Jan 4, 2014 at 8:20 PM, John Yeuk Hon Wong wrote: > I think it helps Luca and many others (including myself) if there is a > reference of the difference between 2.7 and Python 3.3+. > There are PEPs and books, but is there any such long list of references? > > If not, should we start investing in one? I know the basic one such as > xrange and range, items vs iteritems, izip vs zip that sort of uniform > syntax/library inclusion difference. > > If there is such reference available? This isn't comprehensive, but it does cover the issues I ran into while writing a backup program (of about 7,000 lines) that runs on 2.x and 3.x, unmodified: http://stromberg.dnsalias.org/~dstromberg/Intro-to-Python/ Specifically, I mean the "Writing code to run on Python 2.x and 3.x" document.
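For readers wondering what "runs on 2.x and 3.x, unmodified" can look like for the three differences John mentions (xrange vs range, iteritems vs items, izip vs zip), here is a small sketch. It is not taken from the linked document or from any porting guide; the names `range_`, `zip_` and `iteritems` are invented for this example, and it shows just one of several possible approaches.

```python
# Pick the lazy variants under one name, whichever interpreter is running.
try:
    range_ = xrange          # Python 2: xrange is the lazy one
except NameError:
    range_ = range           # Python 3: range is already lazy

try:
    from itertools import izip as zip_   # Python 2: lazy zip
except ImportError:
    zip_ = zip                            # Python 3: zip is already lazy


def iteritems(mapping):
    """Iterate over (key, value) pairs without building a list on 2.x."""
    if hasattr(mapping, "iteritems"):     # Python 2 dicts
        return mapping.iteritems()
    return iter(mapping.items())          # Python 3 dicts


if __name__ == "__main__":
    squares = dict((n, n * n) for n in range_(4))
    for key, value in sorted(iteritems(squares)):
        print(key, value)
    print(list(zip_("ab", range_(2))))
```

The rest of a program then uses only the shim names, so the version check lives in one place.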
From regebro at gmail.com Sun Jan 5 10:08:37 2014 From: regebro at gmail.com (Lennart Regebro) Date: Sun, 5 Jan 2014 10:08:37 +0100 Subject: [Python-Dev] 2.x vs 3.x survey results In-Reply-To: <52C8DD88.8010309@gmail.com> References: <20140102223456.270610c5@fsol> <52C85F20.4030204@scottdial.com> <7wvbxzgkmq.fsf@benfinney.id.au> <52C8DD88.8010309@gmail.com> Message-ID: On Sun, Jan 5, 2014 at 5:20 AM, John Yeuk Hon Wong wrote: > I think it helps Luca and many others (including myself) if there is a > reference of the difference between 2.7 and Python 3.3+. Not specifically for 2.7 and 3.3, no. This is a fairly complete list: http://python3porting.com/differences.html > There are PEPs and books, but is there any such long list of references? > > If not, should we start investing in one? I know the basic one such as > xrange and range, items vs iteritems, izip vs zip that sort of uniform > syntax/library inclusion difference. > > If there is such reference available? I'm honestly despairing that people still don't know that there is a free book on the topic. I have no idea how to increase the knowledge on this point. //Lennart From ben+python at benfinney.id.au Sun Jan 5 10:26:31 2014 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 05 Jan 2014 20:26:31 +1100 Subject: [Python-Dev] Discoverability of guides to Python 3 porting (was: 2.x vs 3.x survey results) References: <20140102223456.270610c5@fsol> <52C85F20.4030204@scottdial.com> <7wvbxzgkmq.fsf@benfinney.id.au> <52C8DD88.8010309@gmail.com> Message-ID: <7wmwjahj94.fsf_-_@benfinney.id.au> Lennart Regebro writes: > On Sun, Jan 5, 2014 at 5:20 AM, John Yeuk Hon Wong > wrote: > > If there is such reference available? > > I'm honestly despairing that people still don't know that there is a > free book on the topic. I have no idea how to increase the knowledge > on this point. John Yeuk Hon Wong, where did you look (unsuccessfully) for this information? 
Where, on the Python website, did you first expect to find this information and fail to find it? -- \ ?If you do not trust the source do not use this program.? | `\ ?Microsoft Vista security dialogue | _o__) | Ben Finney From larry at hastings.org Sun Jan 5 17:21:12 2014 From: larry at hastings.org (Larry Hastings) Date: Sun, 05 Jan 2014 08:21:12 -0800 Subject: [Python-Dev] Proposed: The Great Argument Clinic Conversion Derby Message-ID: <52C98678.6060201@hastings.org> Let me start with a summary of the current status of Argument Clinic. It's checked in, it seems to be working fine. As of Friday I've checked in some reasonably complete documentation as a howto: http://docs.python.org/3.4/howto/clinic.html At last, here in beta 2, Argument Clinic is ready for prime time. What about adoption? That's where Argument Clinic has stalled. By my estimate, there are about six hundred places that could be converted to work with Argument Clinic in CPython; as of this writing only a dozen or two have actually been converted. Now, properly converting a function to work with Argument Clinic does not change its behavior. Internally, the code performing argument parsing should be nigh-identical; it should call the same PyArg_Parse function, with the same arguments, and the implementation should perform the same work as a result. The only externally observable change should be that inspect.signature() now produces a valid signature for the builtin; in all other respects Python should be unchanged. No documentation should have to change, no tests should need to be modified, and absolutely no code should be broken as a result. Converting a function to use Argument Clinic should be a blissfully low-risk procedure, and produce a pleasant, easier-to-maintain result. You see where I'm going with this. 
I am now, reluctantly, proposing that once 3.4.0b2 ships (should be later today), we the Python core development community roll up our collective sleeves and attempt to convert all the builtins to work with Argument Clinic. I call this "The Great Argument Clinic Conversion Derby".

The rules of the derby:

* The derby stops when RC 1 gets tagged, which should be January 18th.
* I'll create issues on the issue tracker for converting each C file.
* Participants will take ownership of an issue for a particular file, and have a couple of days to submit a patch. If an issue languishes I reserve the right to reassign it.
* I pledge to be highly available and responsive during the derby.
* I volunteer to convert posixmodule.c, which is about 60 functions (and therefore 10% of the workload).
* I volunteer to review patches until my eyes bleed. I'd prefer to review every single conversion, though it's possible that isn't feasible, not sure. (Though I will have a /lot/ of time I can devote to this.)
* I'll create a leader board where contributors are ranked by how many functions they've converted, if people want it, in an endeavor to spark interest and provide some bragging rights.

Upsides:

* Every builtin we convert is one more builtin with introspection information. It'd be nice to have that in 3.4.
* Easier maintenance going forward.

Downsides:

* Someone could improperly convert a function, which could change the builtin's semantics and break code, and nobody notices and we ship the breakage in 3.4.0 final.

I've discussed this with a number of other core developers; so far I've only gotten positive responses. Otherwise I wouldn't propose such madness. (Making changes to 600 different places in the Python tree? What am I thinking?)

Keep in mind, this isn't "now or never"; the choice is between "convert now for 3.4" and "wait until after 3.4 final, then convert everything, and it'll ship in 3.5". We'll have this sooner or later--the question is, sooner? or later?
What say you? +1? -1e100? Anxiously yours, /arry -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Sun Jan 5 16:58:14 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 05 Jan 2014 07:58:14 -0800 Subject: [Python-Dev] 2.x vs 3.x survey results In-Reply-To: <52C8DD88.8010309@gmail.com> References: <20140102223456.270610c5@fsol> <52C85F20.4030204@scottdial.com> <7wvbxzgkmq.fsf@benfinney.id.au> <52C8DD88.8010309@gmail.com> Message-ID: <52C98116.9060609@stoneleaf.us> On 01/04/2014 08:20 PM, John Yeuk Hon Wong wrote: > > I think it helps Luca and many others (including myself) if there is a reference of the difference between 2.7 and > Python 3.3+. Here's another reference: http://ptgmedia.pearsoncmg.com/imprint_downloads/informit/promotions/python/python2python3.pdf -- ~Ethan~ From brian at python.org Sun Jan 5 18:23:45 2014 From: brian at python.org (Brian Curtin) Date: Sun, 5 Jan 2014 11:23:45 -0600 Subject: [Python-Dev] 2.x vs 3.x survey results In-Reply-To: References: <20140102223456.270610c5@fsol> <52C85F20.4030204@scottdial.com> <7wvbxzgkmq.fsf@benfinney.id.au> <52C8DD88.8010309@gmail.com> Message-ID: On Sun, Jan 5, 2014 at 3:08 AM, Lennart Regebro wrote: > On Sun, Jan 5, 2014 at 5:20 AM, John Yeuk Hon Wong > wrote: >> I think it helps Luca and many others (including myself) if there is a >> reference of the difference between 2.7 and Python 3.3+. > > Not specifically for 2.7 and 3.3, no. This is a fairly complete list: > > http://python3porting.com/differences.html > >> There are PEPs and books, but is there any such long list of references? >> >> If not, should we start investing in one? I know the basic one such as >> xrange and range, items vs iteritems, izip vs zip that sort of uniform >> syntax/library inclusion difference. >> >> If there is such reference available? > > I'm honestly despairing that people still don't know that there is a > free book on the topic. 
I have no idea how to increase the knowledge > on this point. I think we collectively need better SEO, or something like that. Python 3 would be in a better place if people actually knew the current state of things, versus asking people on "Hacker News". I constantly see people claiming they are stuck on Python 2 until NumPy, SciPy, and matplotlib are ported. Many of these people state they would love to use Python 3 if it weren't for those projects. However, those projects have all been ported -- and the first two have been available for several years now. The same goes for differences documents. I think 15 of us have written such documents, most of which cross-reference the other documents. Somehow very few people seem to know about any of them. From solipsis at pitrou.net Sun Jan 5 18:32:53 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 5 Jan 2014 18:32:53 +0100 Subject: [Python-Dev] 2.x vs 3.x survey results References: <20140102223456.270610c5@fsol> <52C85F20.4030204@scottdial.com> <7wvbxzgkmq.fsf@benfinney.id.au> <52C8DD88.8010309@gmail.com> Message-ID: <20140105183253.7752812f@fsol> On Sun, 5 Jan 2014 11:23:45 -0600 Brian Curtin wrote: > On Sun, Jan 5, 2014 at 3:08 AM, Lennart Regebro wrote: > > On Sun, Jan 5, 2014 at 5:20 AM, John Yeuk Hon Wong > > wrote: > >> I think it helps Luca and many others (including myself) if there is a > >> reference of the difference between 2.7 and Python 3.3+. > > > > Not specifically for 2.7 and 3.3, no. This is a fairly complete list: > > > > http://python3porting.com/differences.html > > > >> There are PEPs and books, but is there any such long list of references? > >> > >> If not, should we start investing in one? I know the basic one such as > >> xrange and range, items vs iteritems, izip vs zip that sort of uniform > >> syntax/library inclusion difference. > >> > >> If there is such reference available? > > > > I'm honestly despairing that people still don't know that there is a > > free book on the topic. 
I have no idea how to increase the knowledge > > on this point. > > I think we collectively need better SEO, or something like that. > Python 3 would be in a better place if people actually knew the > current state of things, versus asking people on "Hacker News". Perhaps there should be a porting guide as a prominent chapter in http://docs.python.org/3/ ? Regards Antoine. From eliben at gmail.com Sun Jan 5 19:16:01 2014 From: eliben at gmail.com (Eli Bendersky) Date: Sun, 5 Jan 2014 10:16:01 -0800 Subject: [Python-Dev] Fwd: 2.x vs 3.x survey results In-Reply-To: <20140105183253.7752812f@fsol> References: <20140102223456.270610c5@fsol> <52C85F20.4030204@scottdial.com> <7wvbxzgkmq.fsf@benfinney.id.au> <52C8DD88.8010309@gmail.com> <20140105183253.7752812f@fsol> Message-ID: ---------- Forwarded message ---------- From: Antoine Pitrou Date: Sun, Jan 5, 2014 at 9:32 AM Subject: Re: [Python-Dev] 2.x vs 3.x survey results To: python-dev at python.org On Sun, 5 Jan 2014 11:23:45 -0600 Brian Curtin wrote: > On Sun, Jan 5, 2014 at 3:08 AM, Lennart Regebro wrote: > > On Sun, Jan 5, 2014 at 5:20 AM, John Yeuk Hon Wong > > wrote: > >> I think it helps Luca and many others (including myself) if there is a > >> reference of the difference between 2.7 and Python 3.3+. > > > > Not specifically for 2.7 and 3.3, no. This is a fairly complete list: > > > > http://python3porting.com/differences.html > > > >> There are PEPs and books, but is there any such long list of references? > >> > >> If not, should we start investing in one? I know the basic one such as > >> xrange and range, items vs iteritems, izip vs zip that sort of uniform > >> syntax/library inclusion difference. > >> > >> If there is such reference available? > > > > I'm honestly despairing that people still don't know that there is a > > free book on the topic. I have no idea how to increase the knowledge > > on this point. > > I think we collectively need better SEO, or something like that. 
> Python 3 would be in a better place if people actually knew the > current state of things, versus asking people on "Hacker News". Perhaps there should be a porting guide as a prominent chapter in http://docs.python.org/3/ ? The (incognito) Google query "porting from python 2 to 3" pops up this as the first result: http://docs.python.org/dev/howto/pyporting.html 2nd place is the wiki.python.org page; 3 & 4 are from Lennart's book. So the SEO is fine, it seems - at least in this case. Similar queries provide similar results. If anyone comes up with a resonable query that gives bad results, we can do some lightweight SEO on it by adding a few links here and there. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sun Jan 5 20:49:26 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 05 Jan 2014 14:49:26 -0500 Subject: [Python-Dev] Proposed: The Great Argument Clinic Conversion Derby In-Reply-To: <52C98678.6060201@hastings.org> References: <52C98678.6060201@hastings.org> Message-ID: On 1/5/2014 11:21 AM, Larry Hastings wrote: > > > Let me start with a summary of the current status of Argument Clinic. > It's checked in, it seems to be working fine. As of Friday I've checked > in some reasonably complete documentation as a howto: > > http://docs.python.org/3.4/howto/clinic.html > > At last, here in beta 2, Argument Clinic is ready for prime time. > > What about adoption? That's where Argument Clinic has stalled. By my > estimate, there are about six hundred places that could be converted to > work with Argument Clinic in CPython; as of this writing only a dozen or > two have actually been converted. Do you remember which? I suggest builtin classes and functions as priorities. ... > You see where I'm going with this. 
I am now, reluctantly, proposing > that once 3.4.0b2 ships (should be later today), we the Python core > development community roll up our collective sleeves and attempt to > convert all the builtins to work with Argument Clinic. I will try to speed up my timetable for converting Idle calltips to using inspect.signature instead of the older functions. Does help (pydoc) already do so? -- Terry Jan Reedy From stefan at bytereef.org Sun Jan 5 21:12:36 2014 From: stefan at bytereef.org (Stefan Krah) Date: Sun, 5 Jan 2014 21:12:36 +0100 Subject: [Python-Dev] [RELEASE] libmpdec-2.4.0 (suitable for compiling _decimal) Message-ID: <20140105201236.GA30963@sleipnir.bytereef.org> Hi, I've released libmpdec-2.4.0, which can be used to compile _decimal with the "--with-system-libmpdec" configure option: http://www.bytereef.org/mpdecimal/download.html libmpdec-2.4.0 is exactly the same as the one in the CPython source tree. The API is stable and the libmpdec-2.x.y series will stay binary compatible [1]. Stefan Krah [1] Starting from 2.4.0: 2.4.0 is not binary compatible with 2.3. From larry at hastings.org Sun Jan 5 21:32:22 2014 From: larry at hastings.org (Larry Hastings) Date: Sun, 05 Jan 2014 12:32:22 -0800 Subject: [Python-Dev] Proposed: The Great Argument Clinic Conversion Derby In-Reply-To: References: <52C98678.6060201@hastings.org> Message-ID: <52C9C156.5030905@hastings.org> On 01/05/2014 11:49 AM, Terry Reedy wrote: > On 1/5/2014 11:21 AM, Larry Hastings wrote: >> By my estimate, there are about six hundred places that could be >> converted >> to work with Argument Clinic in CPython; as of this writing only a >> dozen or >> two have actually been converted. > Do you remember which? I suggest builtin classes and functions as > priorities. I don't, but they're easy to find with UNIX shell tools: fgrep -l clinic */*.c > I will try to speed up my timetable for converting Idle calltips to > using inspect.signature instead of the older functions. 
Does help > (pydoc) already do so? Yes. //arry/ From storchaka at gmail.com Sun Jan 5 22:49:21 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 05 Jan 2014 23:49:21 +0200 Subject: [Python-Dev] Proposed: The Great Argument Clinic Conversion Derby In-Reply-To: <52C98678.6060201@hastings.org> References: <52C98678.6060201@hastings.org> Message-ID: It looks interesting enough. I volunteer to convert at least the audioop, grp, operator, pwd, spw, sre, struct, tkinter modules (audioop already converted, tkinter in progress). If no one will get them, I perhaps will convert the builtins, sys, itertools, functools modules and str, bytes, bytearray, int objects. But I very much upset by the fact that the generated code is written mixed with written manually. It is difficult to navigate (list of symbols now contains three times more names), makes it difficult to read and provokes error (editing the generated code). It would be better if the generated code was written in separate files. From larry at hastings.org Sun Jan 5 22:20:50 2014 From: larry at hastings.org (Larry Hastings) Date: Sun, 05 Jan 2014 13:20:50 -0800 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 Message-ID: <52C9CCB2.7080703@hastings.org> On behalf of the Python development team, I'm pleased to announce the second beta release of Python 3.4. This is a preview release, and its use is not recommended for production settings. Python 3.4 includes a range of improvements of the 3.x series, including hundreds of small improvements and bug fixes.
Major new features and changes in the 3.4 release series include: * PEP 428, a "pathlib" module providing object-oriented filesystem paths * PEP 435, a standardized "enum" module * PEP 436, a build enhancement that will help generate introspection information for builtins * PEP 442, improved semantics for object finalization * PEP 443, adding single-dispatch generic functions to the standard library * PEP 445, a new C API for implementing custom memory allocators * PEP 446, changing file descriptors to not be inherited by default in subprocesses * PEP 450, a new "statistics" module * PEP 451, standardizing module metadata for Python's module import system * PEP 453, a bundled installer for the *pip* package manager * PEP 454, a new "tracemalloc" module for tracing Python memory allocations * PEP 456, a new hash algorithm for Python strings and binary data * PEP 3154, a new and improved protocol for pickled objects * PEP 3156, a new "asyncio" module, a new framework for asynchronous I/O Python 3.4 is now in "feature freeze", meaning that no new features will be added. The final release is projected for late February 2014. To download Python 3.4.0b2 visit: http://www.python.org/download/releases/3.4.0/ Please consider trying Python 3.4.0b2 with your code and reporting any new issues you notice to: http://bugs.python.org/ Enjoy! -- Larry Hastings, Release Manager larry at hastings.org (on behalf of the entire python-dev team and 3.4's contributors) From larry at hastings.org Sun Jan 5 22:52:35 2014 From: larry at hastings.org (Larry Hastings) Date: Sun, 05 Jan 2014 13:52:35 -0800 Subject: [Python-Dev] Proposed: The Great Argument Clinic Conversion Derby In-Reply-To: References: <52C98678.6060201@hastings.org> Message-ID: <52C9D423.7020402@hastings.org> On 01/05/2014 01:49 PM, Serhiy Storchaka wrote: > But I very much upset by the fact that the generated code is written > mixed with written manually. 
It is difficult to navigate (list of > symbols now contains three times more names), makes it difficult to > read and provokes error (editing the generated code). It would be > better if the generated code was written in separate files. I had that working at one point. Guido said no, keep it all in one file. I'm flexible but first you'd have to convince him. Cheers, //arry/ From ncoghlan at gmail.com Mon Jan 6 00:08:47 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 6 Jan 2014 09:08:47 +1000 Subject: [Python-Dev] Proposed: The Great Argument Clinic Conversion Derby In-Reply-To: <52C9D423.7020402@hastings.org> References: <52C98678.6060201@hastings.org> <52C9D423.7020402@hastings.org> Message-ID: On 6 Jan 2014 05:54, "Larry Hastings" wrote: > > On 01/05/2014 01:49 PM, Serhiy Storchaka wrote: >> >> But I very much upset by the fact that the generated code is written mixed with written manually. It is difficult to navigate (list of symbols now contains three times more names), makes it difficult to read and provokes error (editing the generated code). It would be better if the generated code was written in separate files. > > > I had that working at one point. Guido said no, keep it all in one file. I'm flexible but first you'd have to convince him. It's also not something we're stuck with forever - we can start with it inline (which has the advantage of keeping all the code in the same place), and later move to having the helpers in a separate file included from the implementation file if we decide it makes sense to do so.
This was discussed a fair bit last language summit (and the day after between me, Guido and Larry), and the thing I like about the current approach is that a C coder should be able to understand the generated code *as C code* without needing to know anything about Argument Clinic and without needing to hunt through other files to find where the generated pieces are defined. As Terry noted, even if we just get "help(name)" working properly for the builtins, I'll count that as a major win. Cheers, Nick. > > Cheers, > > > /arry From stefan at bytereef.org Mon Jan 6 00:25:53 2014 From: stefan at bytereef.org (Stefan Krah) Date: Mon, 6 Jan 2014 00:25:53 +0100 Subject: [Python-Dev] Proposed: The Great Argument Clinic Conversion Derby In-Reply-To: References: <52C98678.6060201@hastings.org> <52C9D423.7020402@hastings.org> Message-ID: <20140105232553.GA10675@sleipnir.bytereef.org> Nick Coghlan wrote: > > I had that working at one point. Guido said no, keep it all in one file. > > I'm flexible but first you'd have to convince him. > > It's also not something we're stuck with forever - we can start with it inline > (which has the advantage of keeping all the code in the same place), and later > move to having the helpers in a separate file included from the implementation > file if we decide it makes sense to do so. If we move big chunks of code around twice, I guess "hg blame" will break twice, too. That is another thing worth considering. I agree with Serhiy, but that is probably known at this point.
:) Stefan Krah From d2mp1a9 at newsguy.com Mon Jan 6 01:19:32 2014 From: d2mp1a9 at newsguy.com (Bob Hanson) Date: Sun, 05 Jan 2014 16:19:32 -0800 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 References: <52C9CCB2.7080703@hastings.org> Message-ID: On Sun, 05 Jan 2014 13:20:50 -0800, Larry Hastings wrote: > On behalf of the Python development team, I'm pleased to announce > the second beta release of Python 3.4. Thanks, Larry and all the devs, your hard work is appreciated. However, why does this new version look like adware or other malware when installing? This is the first time I ever installed a version of Python which caused something called "MSIEXEC.EXE" to try to access some commercial dot-com site. Naturally, my firewall stopped it, but what's going on? (A "command prompt" Windows box opened, followed by my firewall firing off a set of warnings about having blocked attempted unauthorized outbound connections to blah-blah-blah-dot-com.) When I attempted to run the python interpreter it appeared to open normally, but I haven't tried doing anything with it. If this has to do with "ensure_pip" or whatever it's called, perhaps some other solution is called for which is more user-friendly and not as likely to incite unease and mistrust by attempting to silently access a commercial site while Python is installing. At the very least a warning seems to be called for (possibly along with an opt-out). Also, a more friendly site (python.org?) to connect to would help to restore some faith. If it *is* about the pip thing, I can only imagine the frustration of Windows users: Having to reconfigure firewalls to (properly?) upgrade Python, or ignoring the firewall-warning "glitch" assuming it's okay. Or, possibly, a silent failure of "something" if they're firewalled with warnings turned off? If it's not pip, what is it? Finally, is my install now broken? Are offline installs now not possible? Does one now need an always-on internet connection to use Python?
Thanks -- Bob Hanson -- Snowden reveals that George Orwell was an extreme optimist. From bp at benjamin-peterson.org Mon Jan 6 01:31:12 2014 From: bp at benjamin-peterson.org (Benjamin Peterson) Date: Sun, 05 Jan 2014 16:31:12 -0800 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 In-Reply-To: References: <52C9CCB2.7080703@hastings.org> Message-ID: <1388968272.16645.66955061.64551259@webmail.messagingengine.com> -- Regards, Benjamin On Sun, Jan 5, 2014, at 04:19 PM, Bob Hanson wrote: > On Sun, 05 Jan 2014 13:20:50 -0800, Larry Hastings wrote: > > > On behalf of the Python development team, I'm pleased to announce > > the second beta release of Python 3.4. > > Thanks, Larry and all the devs, your hard work is appreciated. > > However, why does this new version look like adware or other > malware when installing? > > This is the first time I ever installed a version of Python which > caused something called "MSIEXEC.EXE" to try to access some > commercial dot-com site. > > Naturally, my firewall stopped it, but what's going on? > > (A "command prompt" Windows box opened, followed by my firewall > firing off a set of warnings about having blocked attempted > unauthorized outbound connections to blah-blah-blah-dot-com.) Well, what is blah-blah-blah.com exactly? > > When I attempted to run the python interpreter it appeared to > open normally, but I haven't tried doing anything with it. > > If this has to do with "ensure_pip" or whatever it's called, > perhaps some other solution is called for which is more > user-friendly and not as likely to incite unease and mistrust by > attempting to silently access a commercial site while Python is > installing. At the very least a warning seems to be called for > (possibly along with an opt-out). Also, a more friendly site > (python.org?) to connect to would help to restore some faith. > > If it *is* about the pip thing, I can only imagine the > frustration of Window users: Having to reconfigure firewalls to > (properly?) 
upgrade Python, or ignoring the firewall-warning > "glitch" assuming it's okay. Or, possibly, a silent failure of > "something" if they're firewalled with warnings turned off? > > If it's not pip, what is it? > > Finally, is my install now broken? Are offline installs now not > possible? Does one now need an always-on internet connection to > use Python? > > Thanks -- > > Bob Hanson > > -- > Snowden reveals that George Orwell was an extreme optimist. From guido at python.org Mon Jan 6 01:42:51 2014 From: guido at python.org (Guido van Rossum) Date: Sun, 5 Jan 2014 14:42:51 -1000 Subject: [Python-Dev] Fwd: 2.x vs 3.x survey results In-Reply-To: References: <20140102223456.270610c5@fsol> <52C85F20.4030204@scottdial.com> <7wvbxzgkmq.fsf@benfinney.id.au> <52C8DD88.8010309@gmail.com> <20140105183253.7752812f@fsol> Message-ID: I'm sure that the main problem is that people don't search. Surprisingly, it's often easier to complain "there is no X" than to try to search for a solution to X. On Sun, Jan 5, 2014 at 8:16 AM, Eli Bendersky wrote: > > > ---------- Forwarded message ---------- > From: Antoine Pitrou > Date: Sun, Jan 5, 2014 at 9:32 AM > Subject: Re: [Python-Dev] 2.x vs 3.x survey results > To: python-dev at python.org > > > On Sun, 5 Jan 2014 11:23:45 -0600 > Brian Curtin wrote: >> On Sun, Jan 5, 2014 at 3:08 AM, Lennart Regebro wrote: >> > On Sun, Jan 5, 2014 at 5:20 AM, John Yeuk Hon Wong >> > wrote: >> >> I think it helps Luca and many others (including myself) if there is a >> >> reference of the difference between 2.7 and Python 3.3+. >> > >> > Not specifically for 2.7 and 3.3, no.
This is a fairly complete list: >> > >> > http://python3porting.com/differences.html >> > >> >> There are PEPs and books, but is there any such long list of >> >> references? >> >> >> >> If not, should we start investing in one? I know the basic one such as >> >> xrange and range, items vs iteritems, izip vs zip that sort of uniform >> >> syntax/library inclusion difference. >> >> >> >> If there is such reference available? >> > >> > I'm honestly despairing that people still don't know that there is a >> > free book on the topic. I have no idea how to increase the knowledge >> > on this point. >> >> I think we collectively need better SEO, or something like that. >> Python 3 would be in a better place if people actually knew the >> current state of things, versus asking people on "Hacker News". > > Perhaps there should be a porting guide as a prominent chapter in > http://docs.python.org/3/ ? > > The (incognito) Google query "porting from python 2 to 3" pops up this as > the first result: > > http://docs.python.org/dev/howto/pyporting.html > > 2nd place is the wiki.python.org page; 3 & 4 are from Lennart's book. > > So the SEO is fine, it seems - at least in this case. Similar queries > provide similar results. If anyone comes up with a reasonable query that > gives bad results, we can do some lightweight SEO on it by adding a few > links here and there.
> > Eli -- --Guido van Rossum (python.org/~guido) From tim.peters at gmail.com Mon Jan 6 01:49:40 2014 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 5 Jan 2014 18:49:40 -0600 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 In-Reply-To: <1388968272.16645.66955061.64551259@webmail.messagingengine.com> References: <52C9CCB2.7080703@hastings.org> <1388968272.16645.66955061.64551259@webmail.messagingengine.com> Message-ID: [Benjamin Peterson] > ... > This is the first time I ever installed a version of Python which > caused something called "MSIEXEC.EXE" msiexec.exe is not part of the Python download. msiexec.exe is part of the Windows operating system, and is precisely the program that installs .msi files (which the Python installer is). > to try to access some commercial dot-com site. > > Naturally, my firewall stopped it, but what's going on? Possible: you have a virus or trojan that replaced the system msiexec.exe with its own malware. Run a full virus scan ASAP, try Malwarebytes' anti-malware program, etc etc etc. From d2mp1a9 at newsguy.com Mon Jan 6 03:02:36 2014 From: d2mp1a9 at newsguy.com (Bob Hanson) Date: Sun, 05 Jan 2014 18:02:36 -0800 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 References: <52C9CCB2.7080703@hastings.org> <1388968272.16645.66955061.64551259@webmail.messagingengine.com> Message-ID: <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> [Bob Hanson] > > This is the first time I ever installed a version of Python which > > caused something called "MSIEXEC.EXE" [Tim Peters] > msiexec.exe is not part of the Python download. msiexec.exe is part > of the Windows operating system, and is precisely the program that > installs .msi files (which the Python installer is). That is correct.
;-) [Bob Hanson] > > to try to access some commercial dot-com site. > > > > Naturally, my firewall stopped it, but what's going on? [Tim Peters] > Possible: you have a virus or trojan that replaced the system > msiexec.exe with its own malware. Run a full virus scan ASAP, try > Malwarebyte's anti-malware program, etc etc etc. Didn't think this likely, but I have now quintuple-checked everything again. Everything says I have the real McCoy msiexec.exe in its proper location -- just upgraded another app which used MSI installers and it went as per normal. I'm presuming, still, that it is something to do with the "ensure that pip is present on Windows" thing? [See the bottom of my original post.] Do Nick or Martin (or other dev) have any comments? Bob Hanson -- Sent from my Smart shoephone. From tim.peters at gmail.com Mon Jan 6 03:09:23 2014 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 5 Jan 2014 20:09:23 -0600 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 In-Reply-To: <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> References: <52C9CCB2.7080703@hastings.org> <1388968272.16645.66955061.64551259@webmail.messagingengine.com> <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> Message-ID: [Bob Hanson] > ... > Didn't think this likely, but I have now quintuple-checked > everything again. Everything says I have the real McCoy > msiexec.exe in its proper location -- just upgraded another app > which used MSI installers and it went as per normal. That sounds most likely to me too ;-) > I'm presuming, still, that it is something to do with the "ensure > that pip is present on Windows" thing? [See the bottom of my > original post.] As Benjamin asked, could you please flesh out what "blah-blah-blah-dot-com" means - what, exactly, was the site your firewall warned you about? My firewall didn't complain when I installed 3.4.0b2 on Windows. 
From ethan at stoneleaf.us Mon Jan 6 03:07:06 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 05 Jan 2014 18:07:06 -0800 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 In-Reply-To: <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> References: <52C9CCB2.7080703@hastings.org> <1388968272.16645.66955061.64551259@webmail.messagingengine.com> <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> Message-ID: <52CA0FCA.7050906@stoneleaf.us> On 01/05/2014 06:02 PM, Bob Hanson wrote: > > I'm presuming, still, that it is something to do with the "ensure > that pip is present on Windows" thing? Perhaps you could help us out by telling us what site was trying to be accessed? -- ~Ethan~ From d2mp1a9 at newsguy.com Mon Jan 6 03:43:15 2014 From: d2mp1a9 at newsguy.com (Bob Hanson) Date: Sun, 05 Jan 2014 18:43:15 -0800 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 References: <52C9CCB2.7080703@hastings.org> <1388968272.16645.66955061.64551259@webmail.messagingengine.com> <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> Message-ID: <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> On Sun, 5 Jan 2014 20:09:23 -0600, Tim Peters wrote: > As Benjamin asked, could you please flesh out what > "blah-blah-blah-dot-com" means - what, exactly, was the site your > firewall warned you about? Forgive me, but I'm an old man with very poor vision. Using my magnifying glass, I see it is two very long URLs ending with something like after the blah-blah: < ... akametechnology.com> More precisely, these two IP addresses: 23.59.190.113:80 23.59.190.106:80 > My firewall didn't complain when I installed 3.4.0b2 on Windows. I don't use the Windows firewall, and I have mine set to block all apps connecting in or out unless I make specific rules for them. I have never authorized C:/Windows/System32/msiexec.exe to connect in or out (and it didn't ;-) ).
Bob Hanson From tim.peters at gmail.com Mon Jan 6 04:09:53 2014 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 5 Jan 2014 21:09:53 -0600 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 In-Reply-To: <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> References: <52C9CCB2.7080703@hastings.org> <1388968272.16645.66955061.64551259@webmail.messagingengine.com> <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> Message-ID: [Bob Hanson] > Forgive me, but I'm an old man with very poor vision. Using my > magnifying glass, I see it is two very long URLs ending with > something like after the blah-blah: < ... akametechnology.com> > > More precisely, these two IP addresses: > 23.59.190.113:80 > 23.59.190.106:80 So: C:\Code>ping -a 23.59.190.113 Pinging a23-59-190-113.deploy.static.akamaitechnologies.com [23.59.190.113] with 32 bytes of data: ... C:\Code>ping -a 23.59.190.106 Pinging a23-59-190-106.deploy.static.akamaitechnologies.com [23.59.190.106] with 32 bytes of data: So it's just Akamai caching content. Common as mud. Can't say specifically what was being cached, but it _could_ be that your ISP contracts with Akamai. >> My firewall didn't complain when I installed 3.4.0b2 on Windows. > I don't use the Windows firewall, and I have mine sent to block > all apps connecting in or out unless I make specific rules for > them. I have never authorized C:/Windows/System32/msiexec.exe to > connect in or out (and it didn't ;-) ). Same here on both counts. 
We're getting nowhere with admirable speed ;-) From d2mp1a9 at newsguy.com Mon Jan 6 04:32:15 2014 From: d2mp1a9 at newsguy.com (Bob Hanson) Date: Sun, 05 Jan 2014 19:32:15 -0800 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 References: <52C9CCB2.7080703@hastings.org> <1388968272.16645.66955061.64551259@webmail.messagingengine.com> <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> Message-ID: <5q7kc9lin0fk38q7u3qqq7ofbeq05s8veh@4ax.com> On Sun, 5 Jan 2014 21:09:53 -0600, Tim Peters wrote: > [Bob Hanson] > > Forgive me, but I'm an old man with very poor vision. Using my > > magnifying glass, I see it is two very long URLs ending with > > something like after the blah-blah: < ... akametechnology.com> > > > > More precisely, these two IP addresses: > > 23.59.190.113:80 > > 23.59.190.106:80 > > So: > > C:\Code>ping -a 23.59.190.113 > > Pinging a23-59-190-113.deploy.static.akamaitechnologies.com > [23.59.190.113] with 32 bytes of data: > > [...] > > So it's just Akamai caching content. Common as mud. Can't say > specifically what was being cached, but it _could_ be that your ISP > contracts with Akamai. Still not following *why* this should be happening. I was installing from my harddrive -- nothing needed to be cached as far as I was concerned. Indeed, I would normally think I could install while offline -- and often do on my PCs which are "air-gapped." Still wondering why, all of a sudden after years of using a firewalled msiexec.exe, I get it now trying to connect out while installing 3.4.0b2 from my harddrive...? > > > My firewall didn't complain when I installed 3.4.0b2 on Windows. > > > I don't use the Windows firewall, and I have mine sent to block > > all apps connecting in or out unless I make specific rules for > > them. I have never authorized C:/Windows/System32/msiexec.exe to > > connect in or out (and it didn't ;-) ). > > Same here on both counts. 
We're getting nowhere with admirable speed ;-) So, we just need to make some distance -- our speed is good? ;-) Less non-seriously, thanks for all the help, Tim (and others) -- Bob Hanson From rdmurray at bitdance.com Mon Jan 6 05:06:49 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Sun, 05 Jan 2014 23:06:49 -0500 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 In-Reply-To: <5q7kc9lin0fk38q7u3qqq7ofbeq05s8veh@4ax.com> References: <52C9CCB2.7080703@hastings.org> <1388968272.16645.66955061.64551259@webmail.messagingengine.com> <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> <5q7kc9lin0fk38q7u3qqq7ofbeq05s8veh@4ax.com> Message-ID: <20140106040650.33B4D250165@webabinitio.net> On Sun, 05 Jan 2014 19:32:15 -0800, Bob Hanson wrote: > On Sun, 5 Jan 2014 21:09:53 -0600, Tim Peters wrote: > > So it's just Akamai caching content. Common as mud. Can't say > > specifically what was being cached, but it _could_ be that your ISP > > contracts with Akamai. > > Still not following *why* this should be happening. I was > installing from my harddrive -- nothing needed to be cached as > far as I was concerned. Indeed, I would normally think I could > install while offline -- and often do on my PCs which are > "air-gapped." > > Still wondering why, all of a sudden after years of using a > firewalled msiexec.exe, I get it now trying to connect out while > installing 3.4.0b2 from my harddrive...? The ensurepip developers will have to say for sure, but my understanding is that it does *not* go out to the network. On the other hand, it is conceivable that pip 1.5, unlike the earlier version in Beta1, is doing some sort of "up to date check" that it shouldn't be doing in the ensurepip scenario. I presume you did have the installer install pip. If you haven't already, You might try reinstalling and unchecking that option, and see if it msiexec still tries to go out to the network. 
That would confirm it is ensurepip that is the issue (although that does seem most likely). --David From donald at stufft.io Mon Jan 6 05:12:45 2014 From: donald at stufft.io (Donald Stufft) Date: Sun, 5 Jan 2014 23:12:45 -0500 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 In-Reply-To: <20140106040650.33B4D250165@webabinitio.net> References: <52C9CCB2.7080703@hastings.org> <1388968272.16645.66955061.64551259@webmail.messagingengine.com> <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> <5q7kc9lin0fk38q7u3qqq7ofbeq05s8veh@4ax.com> <20140106040650.33B4D250165@webabinitio.net> Message-ID: <6F53216C-42E0-4F4C-A8F5-55B56C4C6BE2@stufft.io> ensurepip uses --no-index so it shouldn't be hitting the network at all. On Jan 5, 2014, at 11:06 PM, R. David Murray wrote: > On Sun, 05 Jan 2014 19:32:15 -0800, Bob Hanson wrote: >> On Sun, 5 Jan 2014 21:09:53 -0600, Tim Peters wrote: >>> So it's just Akamai caching content. Common as mud. Can't say >>> specifically what was being cached, but it _could_ be that your ISP >>> contracts with Akamai. >> >> Still not following *why* this should be happening. I was >> installing from my harddrive -- nothing needed to be cached as >> far as I was concerned. Indeed, I would normally think I could >> install while offline -- and often do on my PCs which are >> "air-gapped." >> >> Still wondering why, all of a sudden after years of using a >> firewalled msiexec.exe, I get it now trying to connect out while >> installing 3.4.0b2 from my harddrive...? > > The ensurepip developers will have to say for sure, but my understanding > is that it does *not* go out to the network. On the other hand, it is > conceivable that pip 1.5, unlike the earlier version in Beta1, is doing > some sort of "up to date check" that it shouldn't be doing in the > ensurepip scenario. > > I presume you did have the installer install pip.
If you haven't > already, you might try reinstalling and unchecking that option, and see > if msiexec still tries to go out to the network. That would confirm > it is ensurepip that is the issue (although that does seem most likely). > > --David ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From stephen at xemacs.org Mon Jan 6 06:36:33 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 06 Jan 2014 14:36:33 +0900 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 In-Reply-To: <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> References: <52C9CCB2.7080703@hastings.org> <1388968272.16645.66955061.64551259@webmail.messagingengine.com> <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> Message-ID: <87ppo5isda.fsf@uwakimon.sk.tsukuba.ac.jp> Bob Hanson writes: > On Sun, 5 Jan 2014 20:09:23 -0600, Tim Peters wrote: > > > As Benjamin asked, could you please flesh out what > > "blah-blah-blah-dot-com" means - what, exactly, was the site your > > firewall warned you about? > > Forgive me, but I'm an old man with very poor vision. Using my > magnifying glass, I see it is two very long URLs ending with > something like after the blah-blah: < ... akametechnology.com> I suppose you tried cutting and pasting? Note that you don't need to be exact as long as you're pretty sure you got the whole thing -- your readers who have better eyesight can parse out the URL easily enough.
> More precisely, these two IP addresses: > 23.59.190.113:80 > 23.59.190.106:80 Somebody who doesn't know the rules of capitalization (see ww1.akamitechnologies.com) appears to be spoofing Akamai (the web caching/distribution service used by President Obama among other prominent users). The domain referenced is presumably some variation on .deploy.static.akamitechnologies.com (according to host ), and the long URL is rooted at /ses/ so it's trying to convince you it's a session (whether that is actually true or not I don't know, that's just what I would guess if I were trying to reverse engineer an honest URL, which this sure doesn't seem to be). So your alarm seems to be verified, but why this happened to a Python download I don't know. It could be DNS hacking between you and python.org, as well as something in the Python MSI. HTH Steve From tim.peters at gmail.com Mon Jan 6 06:54:41 2014 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 5 Jan 2014 23:54:41 -0600 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 In-Reply-To: <87ppo5isda.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52C9CCB2.7080703@hastings.org> <1388968272.16645.66955061.64551259@webmail.messagingengine.com> <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> <87ppo5isda.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: [Bob Hanson] >> ... magnifying glass, I see it is two very long URLs ending with >> something like after the blah-blah: < ... akametechnology.com> [Stephen J. Turnbull] > I suppose you tried cutting and pasting? Note that you don't need to > be exact as long as you're pretty sure you got the whole thing -- your > readers who have better eyesight can parse out the URL easily enough. I don't think this was cut 'n paste. 
Looking up the IP addresses returns legit Akamai URLs: >> More precisely, these two IP addresses: >> 23.59.190.113:80 >> 23.59.190.106:80 C:\Code>ping -a 23.59.190.113 Pinging a23-59-190-113.deploy.static.akamaitechnologies.com [23.59.190.113] with 32 bytes of data: ... C:\Code>ping -a 23.59.190.106 Pinging a23-59-190-106.deploy.static.akamaitechnologies.com [23.59.190.106] with 32 bytes of data: ... Bob's "< ... akametechnology.com>" just looks like compounded typos. > ... > So your alarm seems to be verified, but why this happened to a Python > download I don't know. It could be DNS hacking between you and > python.org, as well as something in the Python MSI. Honestly, for all we _know_, this firewall alert may have been triggered by some other program that just happened to wake up while Bob was installing Python. Sure, that's unlikely. But so is everything else about this ;-) From christian at python.org Mon Jan 6 07:15:55 2014 From: christian at python.org (Christian Heimes) Date: Mon, 06 Jan 2014 07:15:55 +0100 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 In-Reply-To: <6F53216C-42E0-4F4C-A8F5-55B56C4C6BE2@stufft.io> References: <52C9CCB2.7080703@hastings.org> <1388968272.16645.66955061.64551259@webmail.messagingengine.com> <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> <5q7kc9lin0fk38q7u3qqq7ofbeq05s8veh@4ax.com> <20140106040650.33B4D250165@webabinitio.net> <6F53216C-42E0-4F4C-A8F5-55B56C4C6BE2@stufft.io> Message-ID: <52CA4A1B.50805@python.org> On 06.01.2014 05:12, Donald Stufft wrote: > ensurepip uses --no-index so it shouldn't be hitting the network at all. Do you have a test to ensure that ensurepip doesn't try to use network connections? You could e.g. mock socket.create_connection() and socket.socket() in a custom socket module. The subprocess makes it a little bit more complicated to test its behavior.
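The mocking idea above could be sketched roughly as follows. This is only an illustration, not ensurepip's actual test code: the helper name run_without_network and the exception NetworkAccessAttempted are hypothetical, made up for this example. It uses unittest.mock to patch socket.socket and socket.create_connection so that any in-process attempt to open a connection raises immediately.

```python
import socket
from unittest import mock


class NetworkAccessAttempted(AssertionError):
    """Raised when the code under test tries to open a network connection."""


def _refuse(*args, **kwargs):
    # Stand-in for socket.socket() and socket.create_connection():
    # fail loudly instead of touching the network.
    raise NetworkAccessAttempted("unexpected network access: %r" % (args,))


def run_without_network(func, *args, **kwargs):
    """Call func(*args, **kwargs) with socket creation patched to fail.

    Hypothetical helper. The patches are undone when the call returns
    or raises.
    """
    with mock.patch("socket.socket", side_effect=_refuse), \
         mock.patch("socket.create_connection", side_effect=_refuse):
        return func(*args, **kwargs)
```

Because the patches only apply inside the current interpreter, a sketch like this would not notice connections made by a child process, which is the subprocess complication mentioned above.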
Christian From d2mp1a9 at newsguy.com Mon Jan 6 07:21:48 2014 From: d2mp1a9 at newsguy.com (Bob Hanson) Date: Sun, 05 Jan 2014 22:21:48 -0800 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 References: <1388968272.16645.66955061.64551259@webmail.messagingengine.com> <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> <87ppo5isda.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sun, 5 Jan 2014 23:54:41 -0600, Tim Peters wrote: [Bob Hanson] > >> ... magnifying glass, I see it is two very long URLs ending with > >> something like after the blah-blah: < ... akametechnology.com> [Stephen J. Turnbull] > > I suppose you tried cutting and pasting? [...] Tried, but was unsuccessful. [Tim Peters] > I don't think this was cut 'n paste. Looking up the IP addresses > returns legit Akamai URLs: > > [...] > > C:\Code>ping -a 23.59.190.113 > > Pinging a23-59-190-113.deploy.static.akamaitechnologies.com > [23.59.190.113] with 32 bytes of data: > ... > C:\Code>ping -a 23.59.190.106 > > Pinging a23-59-190-106.deploy.static.akamaitechnologies.com > [23.59.190.106] with 32 bytes of data: > ... > > Bob's "< ... akametechnology.com>" just looks like compounded typos. Typos or blindos. ;-) Took a screenshot just now and zoomed in -- I can now verify that the URLs are as Tim has 'em above. [Stephen J. Turnbull] > > So your alarm seems to be verified, but why this happened to a Python > > download I don't know. It could be DNS hacking between you and > > python.org, as well as something in the Python MSI. [Tim Peters] > Honestly, for all we _know_, this firewall alert may have been > triggered by some other program that just happened to wake up while > Bob was installing Python. Sure, that's unlikely. But so is > everything else about this ;-) Unlikely as the firewall alert has the full correct path for *msiexec.exe*. I also keep tabs on all processes running, watch my firewall routinely, etc. 
And -- I'm almost paranoid enough to be a computer security guy. ;-) Wanted to add this tiny bit of info, but now I need to retire for the night. I'll check further on things in the morning. Bob Hanson From rosuav at gmail.com Mon Jan 6 08:09:45 2014 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 6 Jan 2014 18:09:45 +1100 Subject: [Python-Dev] Python 2.7 root buildbot showing errors Message-ID: The first build my new root buildbot did showed errors in the 2.7 test suite, but I thought little of it as quite a few other 2.7 buildbots are showing red, too. But it seems they're showing different errors, so there might be something wrong with the setup. http://buildbot.python.org/all/builders/AMD64%20Debian%20root%202.7/builds/3/steps/test/logs/stdio First off, it's complaining about being unable to build _curses (lacking curses.h). Is that a mandatory prereq that I should install, or should Python be compatible with not having it? Then further down, several SSL tests attempt: s.connect_ex(("svn.python.org", 444))) and get back EAGAIN when they're expecting ECONNREFUSED. Possibly my firewall's delaying things somewhat and it's timing out with a signal; when I try manually, the connection times out. Are these failures a problem? Should they be fixed? The 3.x builds are all coming up green. ChrisA From stephen at xemacs.org Mon Jan 6 08:58:10 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 06 Jan 2014 16:58:10 +0900 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 In-Reply-To: <87ppo5isda.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52C9CCB2.7080703@hastings.org> <1388968272.16645.66955061.64551259@webmail.messagingengine.com> <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> <87ppo5isda.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87ha9hilt9.fsf@uwakimon.sk.tsukuba.ac.jp> Stephen J. 
Turnbull writes: > .deploy.static.akamitechnologies.com (according to host ), Ignore this; *my* aging eyes dropped the "A" in "akamAitechnologies.com". Sorry for the noise. From christian at python.org Mon Jan 6 09:11:35 2014 From: christian at python.org (Christian Heimes) Date: Mon, 06 Jan 2014 09:11:35 +0100 Subject: [Python-Dev] Python 2.7 root buildbot showing errors In-Reply-To: References: Message-ID: On 06.01.2014 08:09, Chris Angelico wrote: > Then further down, several SSL tests attempt: > > s.connect_ex(("svn.python.org", 444))) > > and get back EAGAIN when they're expecting ECONNREFUSED. Possibly my > firewall's delaying things somewhat and it's timing out with a signal; > when I try manually, the connection times out. Are you running the VM on Windows? I've seen similar issues on Windows and Windows as host platform for VMs: http://bugs.python.org/issue19919 Christian From rosuav at gmail.com Mon Jan 6 09:22:30 2014 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 6 Jan 2014 19:22:30 +1100 Subject: [Python-Dev] Python 2.7 root buildbot showing errors In-Reply-To: References: Message-ID: On Mon, Jan 6, 2014 at 7:11 PM, Christian Heimes wrote: > On 06.01.2014 08:09, Chris Angelico wrote: >> >> Then further down, several SSL tests attempt: >> >> s.connect_ex(("svn.python.org", 444))) >> >> and get back EAGAIN when they're expecting ECONNREFUSED. Possibly my >> firewall's delaying things somewhat and it's timing out with a signal; >> when I try manually, the connection times out. > > Are you running the VM on Windows? I've seen similar issues on Windows and > Windows as host platform for VMs: > > http://bugs.python.org/issue19919 No, it's Debian Wheezy inside Debian Wheezy; though the outer system is a somewhat messy one (I installed Wheezy before it was stable, and compiled my own ALSA drivers and a few other things). But it could well be that same issue, as it seems to involve NAT. What's the policy on backporting patches to tests onto 2.7? 
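
For anyone who wants to reproduce this by hand, here is a small sketch that
reports what connect_ex() returns as a symbolic errno name. The helpers
describe and probe are hypothetical names, and the 5 second timeout is an
arbitrary choice; whether a timeout is what produces EAGAIN here is only the
guess stated above.

```python
import errno
import socket


def describe(rc):
    # connect_ex() returns 0 on success, otherwise an errno value.
    return "connected" if rc == 0 else errno.errorcode.get(rc, "errno %d" % rc)


def probe(host, port, timeout=5.0):
    # Note: name resolution failures are still raised as exceptions.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        return describe(s.connect_ex((host, port)))
    finally:
        s.close()
```

Calling probe("svn.python.org", 444) from the buildbot host and from a
machine outside the NAT would show whether the two see different results.
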
ChrisA From christian at python.org Mon Jan 6 09:58:51 2014 From: christian at python.org (Christian Heimes) Date: Mon, 06 Jan 2014 09:58:51 +0100 Subject: [Python-Dev] Python 2.7 root buildbot showing errors In-Reply-To: References: Message-ID: On 06.01.2014 09:22, Chris Angelico wrote: > No, it's Debian Wheezy inside Debian Wheezy; though the outer system > is a somewhat messy one (I installed Wheezy before it was stable, and > compiled my own ALSA drivers and a few other things). > > But it could well be that same issue, as it seems to involve NAT. > What's the policy on backporting patches to tests onto 2.7? Interesting, maybe it's a general NAT issue? So far I have seen the issue on Windows only. What kind of VM are you using? I'm using virtualbox for my Windows VMs. Just backport the test fixes. Test fixes and new tests are not new feature so you are always allowed to add new tests or fix existing tests. Christian From rosuav at gmail.com Mon Jan 6 10:12:12 2014 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 6 Jan 2014 20:12:12 +1100 Subject: [Python-Dev] Python 2.7 root buildbot showing errors In-Reply-To: References: Message-ID: On Mon, Jan 6, 2014 at 7:58 PM, Christian Heimes wrote: > Interesting, maybe it's a general NAT issue? So far I have seen the issue on > Windows only. What kind of VM are you using? I'm using virtualbox for my > Windows VMs. It's Oracle VirtualBox v4.2.20 r90963. > Just backport the test fixes. Test fixes and new tests are not new feature > so you are always allowed to add new tests or fix existing tests. Okay. I don't have write access, myself. ChrisA From walter at livinglogic.de Mon Jan 6 11:53:10 2014 From: walter at livinglogic.de (=?UTF-8?B?V2FsdGVyIETDtnJ3YWxk?=) Date: Mon, 06 Jan 2014 11:53:10 +0100 Subject: [Python-Dev] Subclasses vs. 
special methods In-Reply-To: References: Message-ID: <52CA8B16.2070506@livinglogic.de> On 04.01.14 13:58, Serhiy Storchaka wrote: > Should implicit converting an instance of int, float, complex, str, > bytes, etc subclasses to call appropriate special method __int__ (or > __index__), __float__, __complex__, __str__, __bytes__, etc? Currently > explicit converting calls these methods, but implicit converting doesn't. > >>>> class I(int): > ... def __int__(self): return 42 > ... def __index__(self): return 43 > ... >>>> class F(float): > ... def __float__(self): return 42.0 > ... >>>> class S(str): > ... def __str__(self): return '*' > ... >>>> int(I(65)) > 42 >>>> float(F(65)) > 42.0 >>>> str(S('A')) > '*' >>>> chr(I(65)) > 'A' >>>> import cmath; cmath.rect(F(65), 0) > (65+0j) >>>> ord(S('A')) > 65 > > Issue17576 [1] proposes to call special methods for implicit converting. > I have doubts about this. Note that for explicit conversion this was implemented a long time ago. See this ancient thread about str/unicode subclasses and __str__/__unicode__: https://mail.python.org/pipermail/python-dev/2005-January/051175.html And this bug report: http://bugs.python.org/issue1109424 > [...] Servus, Walter From victor.stinner at gmail.com Mon Jan 6 14:24:50 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 6 Jan 2014 14:24:50 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 Message-ID: Hi, bytes % args and bytes.format(args) are requested by Mercurial and Twisted projects. The issue #3982 was stuck because nobody proposed a complete definition of the "new" features. Here is a try as a PEP. The PEP is a draft with open questions. First, I'm not sure that both bytes%args and bytes.format(args) are needed. The implementation of .format() is more complex, so why not only adding bytes%args? 
Then, the following points must be decided to define the complete list of
supported features (formatters):

* Format integer to hexadecimal? ``%x`` and ``%X``
* Format integer to octal? ``%o``
* Format integer to binary? ``{!b}``
* Alignment?
* Truncating? Truncate or raise an error?
* format keywords? ``b'{arg}'.format(arg=5)``
* ``str % dict`` ? ``b'%(arg)s' % {'arg': 5}``
* Floating point number?
* ``%i``, ``%u`` and ``%d`` formats for integer numbers?
* Signed number? ``%+i`` and ``%-i``

HTML version of the PEP:
http://www.python.org/dev/peps/pep-0460/

Inline copy:

PEP: 460
Title: Add bytes % args and bytes.format(args) to Python 3.5
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 6-Jan-2014
Python-Version: 3.5

Abstract
========

Add ``bytes % args`` operator and ``bytes.format(args)`` method to
Python 3.5.

Rationale
=========

``bytes % args`` and ``bytes.format(args)`` were removed in Python 3.
This operator and this method are requested by Mercurial and Twisted
developers to ease porting their projects to Python 3.

Python 3 suggests formatting text first and then encoding it to bytes.
In some cases this does not make sense, because the arguments are bytes
strings. A typical use case is a network protocol, which is binary
because data is sent to and received from sockets. For example, SMTP,
SIP, HTTP, IMAP, POP and FTP are ASCII commands interspersed with binary
data.

Using multiple ``bytes + bytes`` instructions is inefficient because it
requires temporary buffers and copies which are slow and waste memory.
Python 3.3 optimizes ``str2 += str1`` but not ``bytes2 += bytes1``.

``bytes % args`` and ``bytes.format(args)`` have been requested since
2008, even before the first release of Python 3.0 (see issue #3982).

``struct.pack()`` is incomplete. For example, a number cannot be
formatted as decimal and it does not support padding a bytes string.

Mercurial 2.8 still supports Python 2.4.
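
As an illustration of the situation described in the Rationale, here is a
rough sketch of what code has to do without bytes formatting: format the
ASCII part as text, encode it, and join it with the binary payload. The
``frame`` helper and the ``DATA <length>`` framing are made up for this
example and do not correspond to any real protocol.

```python
import struct


def frame(payload):
    # Format the ASCII part as text, then encode it explicitly.
    header = "DATA {}\r\n".format(len(payload)).encode("ascii")
    # join() builds the result in one pass instead of chaining ``+``.
    return b"".join([header, payload, b"\r\n"])


print(frame(b"\x00\xffabc"))     # b'DATA 5\r\n\x00\xffabc\r\n'
print(struct.pack("!I", 5))      # b'\x00\x00\x00\x05' (fixed width, not decimal)
```

The second call shows why ``struct.pack()`` alone does not cover this case:
it emits a fixed-width binary integer rather than the decimal digits that a
text-like protocol expects.
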
Needed and excluded features
============================

Needed features:

* Bytes strings: bytes, bytearray and memoryview types
* Format integer numbers as decimal
* Padding with spaces and null bytes
* "%s" should use the buffer protocol, not str()

The feature set is minimal, to keep the implementation as simple as
possible and to limit its cost. ``str % args`` and ``str.format(args)``
are already complex and difficult to maintain; the code is heavily
optimized.

Excluded features:

* No implicit conversion from Unicode to bytes (ex: encode to ASCII or
  to Latin1)
* Locale support (``{!n}`` format for numbers). Locales are related to
  text and usually to an encoding.
* ``repr()``, ``ascii()``: ``%r``, ``{!r}``, ``%a`` and ``{!a}``
  formats. ``repr()`` and ``ascii()`` are used to debug, the output is
  displayed in a terminal or a graphical widget. They are more related
  to text.
* Attribute access: ``{obj.attr}``
* Indexing: ``{dict[key]}``
* Features of struct.pack(). For example, format a number as a 32 bit
  unsigned integer in network byte order. ``struct.pack()`` can be used
  to prepare arguments; the implementation should be kept simple.
* Features of int.to_bytes().
* Features of ctypes.
* New format protocol like a new ``__bformat__()`` method. Since the
  list of supported types is short, there is no need to add a new
  protocol. Other types must be explicitly cast.
* Alternate format for integer. For example,
  ``'{:#x}'.format(0x123)`` to get ``0x123``. It is more related to
  debug, and the prefix can easily be written in the format string
  (ex: ``0x%x``).
* Relation with format() and the __format__() protocol. bytes.format()
  and str.format() are unrelated.

Unknown:

* Format integer to hexadecimal? ``%x`` and ``%X``
* Format integer to octal? ``%o``
* Format integer to binary? ``{!b}``
* Alignment?
* Truncating? Truncate or raise an error?
* format keywords? ``b'{arg}'.format(arg=5)``
* ``str % dict`` ?
  ``b'%(arg)s' % {'arg': 5}``
* Floating point number?
* ``%i``, ``%u`` and ``%d`` formats for integer numbers?
* Signed number? ``%+i`` and ``%-i``

bytes % args
============

Formatters:

* ``"%c"``: one byte
* ``"%s"``: integer or bytes strings
* ``"%20s"`` pads to 20 bytes with spaces (``b' '``)
* ``"%020s"`` pads to 20 bytes with zeros (``b'0'``)
* ``"%\020s"`` pads to 20 bytes with null bytes (``b'\0'``)

bytes.format(args)
==================

Formatters:

* ``"{!c}"``: one byte
* ``"{!s}"``: integer or bytes strings
* ``"{!.20s}"`` pads to 20 bytes with spaces (``b' '``)
* ``"{!.020s}"`` pads to 20 bytes with zeros (``b'0'``)
* ``"{!\020s}"`` pads to 20 bytes with null bytes (``b'\0'``)

Examples
========

* ``b'a%sc%s' % (b'b', 4)`` gives ``b'abc4'``
* ``b'a{}c{}'.format(b'b', 4)`` gives ``b'abc4'``
* ``b'%c' % 88`` gives ``b'X'``
* ``b'%%'`` gives ``b'%'``

Criticisms
==========

* The development cost and maintenance cost.
* In 3.3 encoding to ascii or latin1 is as fast as memcpy
* Developers must work around the lack of bytes%args and
  bytes.format(args) anyway to support Python 3.0-3.4
* bytes.join() is consistently faster than format to join bytes
  strings.
* Formatting functions can be implemented in a third party module

References
==========

* `Issue #3982: support .format for bytes `_
* `Mercurial project `_
* `Twisted project `_
* `Documentation of Python 2 formatting (str % args) `_
* `Documentation of Python 2 formatting (str.format) `_

Copyright
=========

This document has been placed in the public domain.

..
Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From solipsis at pitrou.net Mon Jan 6 14:44:52 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 6 Jan 2014 14:44:52 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: Message-ID: <20140106144452.1a472ea1@fsol> Hi, On Mon, 6 Jan 2014 14:24:50 +0100 Victor Stinner wrote: > > The PEP is a draft with open questions. First, I'm not sure that both > bytes%args and bytes.format(args) are needed. The implementation of > .format() is more complex, so why not only adding bytes%args? I think we must either implement both or none of them. > Then, > the following points must be decided to define the complete list of > supported features (formatters): > > * Format integer to hexadecimal? ``%x`` and ``%X`` > * Format integer to octal? ``%o`` > * Format integer to binary? ``{!b}`` > * Alignment? > * Truncating? Truncate or raise an error? Not desirable IMHO. bytes formatting should serve mainly for templating situations (i.e. catenate and insert bytestrings into one another). We cannot start giving text-like semantics to bytes objects without confusing non-experts. > * format keywords? ``b'{arg}'.format(arg=5)`` > * ``str % dict`` ? ``b'%(arg)s' % {'arg': 5)`` Yes, bytes formatting must support the same calling conventions as str formatting. BTW, there's a subtlety here: ``%s`` currently means "insert the result of calling __str__", but bytes formatting should *not* call __str__. > * Floating point number? > * ``%i``, ``%u`` and ``%d`` formats for integer numbers? > * Signed number? ``%+i`` and ``%-i`` No, IMHO. Regards Antoine. 
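
To make the ``%s`` subtlety concrete, here is a short sketch of current
Python 3 behaviour: going through text formatting yields the repr of a bytes
object, while ``bytes()`` uses ``__bytes__`` and never calls ``__str__``.
The ``Token`` class is hypothetical and exists only for illustration; it
says nothing about how bytes formatting would eventually be specified.

```python
class Token:
    """Hypothetical wrapper around raw bytes, for illustration only."""

    def __init__(self, raw):
        self.raw = raw

    def __str__(self):
        return self.raw.decode("ascii")

    def __bytes__(self):
        return self.raw


print("%s" % b"abc")         # b'abc'  (the repr, as text)
print(str(Token(b"abc")))    # abc
print(bytes(Token(b"abc")))  # b'abc'  (actual bytes, via __bytes__)
```
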
From rosuav at gmail.com Mon Jan 6 14:54:17 2014 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 7 Jan 2014 00:54:17 +1100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140106144452.1a472ea1@fsol> References: <20140106144452.1a472ea1@fsol> Message-ID: On Tue, Jan 7, 2014 at 12:44 AM, Antoine Pitrou wrote: > BTW, there's a subtlety here: ``%s`` currently means "insert the result > of calling __str__", but bytes formatting should *not* call __str__. Since it derives from the C printf notation, it means "insert string here". The fact that __str__ will be called is secondary to that. I would say it's not a problem for bytes formatting to call __bytes__, or in some other way convert to bytes without calling __str__. Will it be confusing to have bytes and str supporting distinctly different format operations? Might it be better to instead create a separate and very different method on a bytes, just to emphasize the difference? ChrisA From solipsis at pitrou.net Mon Jan 6 14:59:07 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 6 Jan 2014 14:59:07 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: <20140106144452.1a472ea1@fsol> Message-ID: <20140106145907.6bfcfb64@fsol> On Tue, 7 Jan 2014 00:54:17 +1100 Chris Angelico wrote: > On Tue, Jan 7, 2014 at 12:44 AM, Antoine Pitrou wrote: > > BTW, there's a subtlety here: ``%s`` currently means "insert the result > > of calling __str__", but bytes formatting should *not* call __str__. > > Since it derives from the C printf notation, it means "insert string > here". The fact that __str__ will be called is secondary to that. I > would say it's not a problem for bytes formatting to call __bytes__, > or in some other way convert to bytes without calling __str__. > > Will it be confusing to have bytes and str supporting distinctly > different format operations? 
Might it be better to instead create a > separate and very different method on a bytes, just to emphasize the > difference? The people who want bytes formatting, AFAICT, want something that is reasonably 2.x-compatible. That means using the same method / operator and calling conventions. Regards Antoine. From brett at python.org Mon Jan 6 15:09:28 2014 From: brett at python.org (Brett Cannon) Date: Mon, 6 Jan 2014 09:09:28 -0500 Subject: [Python-Dev] Python 2.7 root buildbot showing errors In-Reply-To: References: Message-ID: On Mon, Jan 6, 2014 at 2:09 AM, Chris Angelico wrote: > The first build my new root buildbot did showed errors in the 2.7 test > suite, but I thought little of it as quite a few other 2.7 buildbots > are showing red, too. But it seems they're showing different errors, > so there might be something wrong with the setup. > > > http://buildbot.python.org/all/builders/AMD64%20Debian%20root%202.7/builds/3/steps/test/logs/stdio > > First off, it's complaining about being unable to build _curses > (lacking curses.h). Is that a mandatory prereq that I should install, > or should Python be compatible with not having it? > Yes, curses should be considered entirely optional so the tests should still pass (as long as the build doesn't error out then the compiler message about not being able to build curses is not critical). -Brett > > Then further down, several SSL tests attempt: > > s.connect_ex(("svn.python.org", 444))) > > and get back EAGAIN when they're expecting ECONNREFUSED. Possibly my > firewall's delaying things somewhat and it's timing out with a signal; > when I try manually, the connection times out. > > Are these failures a problem? Should they be fixed? The 3.x builds are > all coming up green. 
> > ChrisA > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Mon Jan 6 15:12:16 2014 From: brett at python.org (Brett Cannon) Date: Mon, 6 Jan 2014 09:12:16 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140106145907.6bfcfb64@fsol> References: <20140106144452.1a472ea1@fsol> <20140106145907.6bfcfb64@fsol> Message-ID: On Mon, Jan 6, 2014 at 8:59 AM, Antoine Pitrou wrote: > On Tue, 7 Jan 2014 00:54:17 +1100 > Chris Angelico wrote: > > On Tue, Jan 7, 2014 at 12:44 AM, Antoine Pitrou > wrote: > > > BTW, there's a subtlety here: ``%s`` currently means "insert the result > > > of calling __str__", but bytes formatting should *not* call __str__. > > > > Since it derives from the C printf notation, it means "insert string > > here". The fact that __str__ will be called is secondary to that. I > > would say it's not a problem for bytes formatting to call __bytes__, > > or in some other way convert to bytes without calling __str__. > > > > Will it be confusing to have bytes and str supporting distinctly > > different format operations? Might it be better to instead create a > > separate and very different method on a bytes, just to emphasize the > > difference? > > The people who want bytes formatting, AFAICT, want something that is > reasonably 2.x-compatible. That means using the same method / operator > and calling conventions. > Right, but that also doesn't mean that a library from the Cheeseshop couldn't be provided which works around any Python 2/3 differences. But my suspicion is anyone requesting this feature (e.g. 
Mercurial) want it implemented in C for performance and so some pure Python library to help with this won't get any traction. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Mon Jan 6 15:13:14 2014 From: brett at python.org (Brett Cannon) Date: Mon, 6 Jan 2014 09:13:14 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140106144452.1a472ea1@fsol> References: <20140106144452.1a472ea1@fsol> Message-ID: On Mon, Jan 6, 2014 at 8:44 AM, Antoine Pitrou wrote: > > Hi, > > On Mon, 6 Jan 2014 14:24:50 +0100 > Victor Stinner wrote: > > > > The PEP is a draft with open questions. First, I'm not sure that both > > bytes%args and bytes.format(args) are needed. The implementation of > > .format() is more complex, so why not only adding bytes%args? > > I think we must either implement both or none of them. > Or bytes.format() only. But I do agree that only implementing the % operator is the wrong answer. -Brett > > > Then, > > the following points must be decided to define the complete list of > > supported features (formatters): > > > > * Format integer to hexadecimal? ``%x`` and ``%X`` > > * Format integer to octal? ``%o`` > > * Format integer to binary? ``{!b}`` > > * Alignment? > > * Truncating? Truncate or raise an error? > > Not desirable IMHO. bytes formatting should serve mainly for templating > situations (i.e. catenate and insert bytestrings into one another). We > cannot start giving text-like semantics to bytes objects without > confusing non-experts. > > > * format keywords? ``b'{arg}'.format(arg=5)`` > > * ``str % dict`` ? ``b'%(arg)s' % {'arg': 5)`` > > Yes, bytes formatting must support the same calling conventions as str > formatting. > > BTW, there's a subtlety here: ``%s`` currently means "insert the result > of calling __str__", but bytes formatting should *not* call __str__. > > > * Floating point number? 
> > * ``%i``, ``%u`` and ``%d`` formats for integer numbers? > > * Signed number? ``%+i`` and ``%-i`` > > No, IMHO. > > Regards > > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Jan 6 15:45:58 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 7 Jan 2014 00:45:58 +1000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140106144452.1a472ea1@fsol> <20140106145907.6bfcfb64@fsol> Message-ID: On 6 Jan 2014 22:15, "Brett Cannon" wrote: > > > > > On Mon, Jan 6, 2014 at 8:59 AM, Antoine Pitrou wrote: >> >> On Tue, 7 Jan 2014 00:54:17 +1100 >> Chris Angelico wrote: >> > On Tue, Jan 7, 2014 at 12:44 AM, Antoine Pitrou wrote: >> > > BTW, there's a subtlety here: ``%s`` currently means "insert the result >> > > of calling __str__", but bytes formatting should *not* call __str__. >> > >> > Since it derives from the C printf notation, it means "insert string >> > here". The fact that __str__ will be called is secondary to that. I >> > would say it's not a problem for bytes formatting to call __bytes__, >> > or in some other way convert to bytes without calling __str__. >> > >> > Will it be confusing to have bytes and str supporting distinctly >> > different format operations? Might it be better to instead create a >> > separate and very different method on a bytes, just to emphasize the >> > difference? >> >> The people who want bytes formatting, AFAICT, want something that is >> reasonably 2.x-compatible. That means using the same method / operator >> and calling conventions. 
> > > Right, but that also doesn't mean that a library from the Cheeseshop couldn't be provided which works around any Python 2/3 differences. But my suspicion is anyone requesting this feature (e.g. Mercurial) want it implemented in C for performance and so some pure Python library to help with this won't get any traction. Right, but it seems to me that a new helper module that could be made backwards compatible at least as far as 2.6 (if not further) would be more useful for that than a builtin change that won't be available until 2015. I think we have enough experience with Python 3 now to say yes, there are still some significant gaps in the support it offers for wire protocol development. We have been hoping others would volunteer to fill that gap, but it's getting to the point where we need to start thinking about handling it ourselves by providing a hybrid Python/C helper module specifically for wire protocol programming. An encodedstr type wouldn't implicitly interoperate with the builtins (until we finally fix the sequence operand coercion bug in CPython) but could at least handle formatting operations like this. Cheers, Nick. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From python-dev at masklinn.net Mon Jan 6 15:50:17 2014 From: python-dev at masklinn.net (Xavier Morel) Date: Mon, 6 Jan 2014 15:50:17 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140106144452.1a472ea1@fsol> References: <20140106144452.1a472ea1@fsol> Message-ID: On 2014-01-06, at 14:44 , Antoine Pitrou wrote: >> Then, >> the following points must be decided to define the complete list of >> supported features (formatters): >> >> * Format integer to hexadecimal? ``%x`` and ``%X`` >> * Format integer to octal? ``%o`` >> * Format integer to binary? ``{!b}`` >> * Alignment? >> * Truncating? Truncate or raise an error? > > Not desirable IMHO. bytes formatting should serve mainly for templating > situations (i.e. catenate and insert bytestrings into one another). We > cannot start giving text-like semantics to bytes objects without > confusing non-experts. But having at least some of struct's formatting options available on bytes.format or bytes % would be useful. From eric at trueblade.com Mon Jan 6 15:54:09 2014 From: eric at trueblade.com (Eric V. Smith) Date: Mon, 06 Jan 2014 09:54:09 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140106144452.1a472ea1@fsol> Message-ID: <52CAC391.2070109@trueblade.com> On 01/06/2014 09:50 AM, Xavier Morel wrote: > On 2014-01-06, at 14:44 , Antoine Pitrou wrote: >>> Then, >>> the following points must be decided to define the complete list of >>> supported features (formatters): >>> >>> * Format integer to hexadecimal? ``%x`` and ``%X`` >>> * Format integer to octal? ``%o`` >>> * Format integer to binary? ``{!b}`` >>> * Alignment? >>> * Truncating? Truncate or raise an error? >> >> Not desirable IMHO. bytes formatting should serve mainly for templating >> situations (i.e. catenate and insert bytestrings into one another). 
We >> cannot start giving text-like semantics to bytes objects without >> confusing non-experts. > > But having at least some of struct's formatting options available on > bytes.format or bytes % would be useful. Perhaps, but the PEP's stated goal is to make porting between 2.x and 3.5 easier. Add struct formatting to 3.5 wouldn't help. Eric. From brett at python.org Mon Jan 6 15:56:20 2014 From: brett at python.org (Brett Cannon) Date: Mon, 6 Jan 2014 09:56:20 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140106144452.1a472ea1@fsol> <20140106145907.6bfcfb64@fsol> Message-ID: On Mon, Jan 6, 2014 at 9:45 AM, Nick Coghlan wrote: > > On 6 Jan 2014 22:15, "Brett Cannon" wrote: > > > > > > > > > > On Mon, Jan 6, 2014 at 8:59 AM, Antoine Pitrou > wrote: > >> > >> On Tue, 7 Jan 2014 00:54:17 +1100 > >> Chris Angelico wrote: > >> > On Tue, Jan 7, 2014 at 12:44 AM, Antoine Pitrou > wrote: > >> > > BTW, there's a subtlety here: ``%s`` currently means "insert the > result > >> > > of calling __str__", but bytes formatting should *not* call __str__. > >> > > >> > Since it derives from the C printf notation, it means "insert string > >> > here". The fact that __str__ will be called is secondary to that. I > >> > would say it's not a problem for bytes formatting to call __bytes__, > >> > or in some other way convert to bytes without calling __str__. > >> > > >> > Will it be confusing to have bytes and str supporting distinctly > >> > different format operations? Might it be better to instead create a > >> > separate and very different method on a bytes, just to emphasize the > >> > difference? > >> > >> The people who want bytes formatting, AFAICT, want something that is > >> reasonably 2.x-compatible. That means using the same method / operator > >> and calling conventions. 
> > > > > > Right, but that also doesn't mean that a library from the Cheeseshop > couldn't be provided which works around any Python 2/3 differences. But my > suspicion is anyone requesting this feature (e.g. Mercurial) want it > implemented in C for performance and so some pure Python library to help > with this won't get any traction. > > Right, but it seems to me that a new helper module that could be made > backwards compatible at least as far as 2.6 (if not further) would be more > useful for that than a builtin change that won't be available until 2015. I > think we have enough experience with Python 3 now to say yes, there are > still some significant gaps in the support it offers for wire protocol > development. > True, or at least we should be very clear as to how we expect people to do binary packing in Python 3 (Victor's PEP says struct doesn't work, so should that be fixed, etc.). That will help figure out where the holes are currently. > We have been hoping others would volunteer to fill that gap, but it's > getting to the point where we need to start thinking about handling it > ourselves by providing a hybrid Python/C helper module specifically for > wire protocol programming. > Probably. And it can work around any shortcomings we fix in Python 3.5. > An encodedstr type wouldn't implicitly interoperate with the builtins > (until we finally fix the sequence operand coercion bug in CPython) but > could at least handle formatting operations like this. > You really want that type, don't you? =) -Brett > Cheers, > Nick. > > > > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Mon Jan 6 16:08:01 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 6 Jan 2014 16:08:01 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140106144452.1a472ea1@fsol> <20140106145907.6bfcfb64@fsol> Message-ID: <20140106160801.0e710ea0@fsol> On Tue, 7 Jan 2014 00:45:58 +1000 Nick Coghlan wrote: > > Right, but it seems to me that a new helper module that could be made > backwards compatible at least as far as 2.6 (if not further) would be more > useful for that than a builtin change that won't be available until 2015. More useful in the short term, less useful in the long term. > An encodedstr type wouldn't implicitly interoperate with the builtins > (until we finally fix the sequence operand coercion bug in CPython) but > could at least handle formatting operations like this. That's a crude hack. Also it doesn't address the situation where you want to interpolate bytestrings without them having any textual significance. Regards Antoine. From ncoghlan at gmail.com Mon Jan 6 16:08:37 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 7 Jan 2014 01:08:37 +1000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140106144452.1a472ea1@fsol> <20140106145907.6bfcfb64@fsol> Message-ID: On 6 Jan 2014 22:56, "Brett Cannon" wrote: > > > > > On Mon, Jan 6, 2014 at 9:45 AM, Nick Coghlan wrote: >> >> >> On 6 Jan 2014 22:15, "Brett Cannon" wrote: >> > >> > >> > >> > >> > On Mon, Jan 6, 2014 at 8:59 AM, Antoine Pitrou wrote: >> >> >> >> On Tue, 7 Jan 2014 00:54:17 +1100 >> >> Chris Angelico wrote: >> >> > On Tue, Jan 7, 2014 at 12:44 AM, Antoine Pitrou wrote: >> >> > > BTW, there's a subtlety here: ``%s`` currently means "insert the result >> >> > > of calling __str__", but bytes formatting should *not* call __str__. 
>> >> > >> >> > Since it derives from the C printf notation, it means "insert string >> >> > here". The fact that __str__ will be called is secondary to that. I >> >> > would say it's not a problem for bytes formatting to call __bytes__, >> >> > or in some other way convert to bytes without calling __str__. >> >> > >> >> > Will it be confusing to have bytes and str supporting distinctly >> >> > different format operations? Might it be better to instead create a >> >> > separate and very different method on a bytes, just to emphasize the >> >> > difference? >> >> >> >> The people who want bytes formatting, AFAICT, want something that is >> >> reasonably 2.x-compatible. That means using the same method / operator >> >> and calling conventions. >> > >> > >> > Right, but that also doesn't mean that a library from the Cheeseshop couldn't be provided which works around any Python 2/3 differences. But my suspicion is anyone requesting this feature (e.g. Mercurial) want it implemented in C for performance and so some pure Python library to help with this won't get any traction. >> >> Right, but it seems to me that a new helper module that could be made backwards compatible at least as far as 2.6 (if not further) would be more useful for that than a builtin change that won't be available until 2015. I think we have enough experience with Python 3 now to say yes, there are still some significant gaps in the support it offers for wire protocol development. > > > True, or at least we should be very clear as to how we expect people to do binary packing in Python 3 (Victor's PEP says struct doesn't work, so should that be fixed, etc.). That will help figure out where the holes are currently. > >> >> We have been hoping others would volunteer to fill that gap, but it's getting to the point where we need to start thinking about handling it ourselves by providing a hybrid Python/C helper module specifically for wire protocol programming. > > Probably. 
And it can work around any shortcomings we fix in Python 3.5. > >> >> An encodedstr type wouldn't implicitly interoperate with the builtins (until we finally fix the sequence operand coercion bug in CPython) but could at least handle formatting operations like this. > > > You really want that type, don't you? =) I still don't think the 2.x bytestring is inherently evil, it's just the wrong type to use as the core text type because of the problems it has with silently creating mojibake and also with multi-byte codecs and slicing. The current python-ideas thread is close to convincing me even a stripped down version isn't a good idea, though :P Cheers, Nick. > > -Brett > >> >> Cheers, >> Nick. >> >> > >> > _______________________________________________ >> > Python-Dev mailing list >> > Python-Dev at python.org >> > https://mail.python.org/mailman/listinfo/python-dev >> > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Jan 6 16:22:21 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 7 Jan 2014 01:22:21 +1000 Subject: [Python-Dev] [Python-checkins] cpython: whatsnew: XMLPullParser, plus some doc updates. In-Reply-To: <3dxnZS1P8vz7Lmc@mail.python.org> References: <3dxnZS1P8vz7Lmc@mail.python.org> Message-ID: On 5 Jan 2014 12:54, "r.david.murray" wrote: > > http://hg.python.org/cpython/rev/069f88f4935f > changeset: 88308:069f88f4935f > user: R David Murray > date: Sat Jan 04 23:52:50 2014 -0500 > summary: > whatsnew: XMLPullParser, plus some doc updates. > > I was confused by the text saying that read_events "iterated", since it > actually returns an iterator (that's what a generator does) that the > caller must then iterate. So I tidied up the language. I'm not sure > what the sentence "Events provided in a previous call to read_events() > will not be yielded again." 
is trying to convey, so I didn't try to fix that. It's a mutating API - once the events have been retrieved, that's it, they're gone from the internal state. Suggestions for wording improvements welcome :) Cheers, Nick. > > Also fixed a couple more news items. > > files: > Doc/library/xml.etree.elementtree.rst | 23 +++++++++----- > Doc/whatsnew/3.4.rst | 7 ++- > Lib/xml/etree/ElementTree.py | 2 +- > Misc/NEWS | 12 +++--- > 4 files changed, 25 insertions(+), 19 deletions(-) > > > diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst > --- a/Doc/library/xml.etree.elementtree.rst > +++ b/Doc/library/xml.etree.elementtree.rst > @@ -105,12 +105,15 @@ > >>> root[0][1].text > '2008' > > + > +.. _elementtree-pull-parsing: > + > Pull API for non-blocking parsing > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > -Most parsing functions provided by this module require to read the whole > -document at once before returning any result. It is possible to use a > -:class:`XMLParser` and feed data into it incrementally, but it's a push API that > +Most parsing functions provided by this module require the whole document > +to be read at once before returning any result. It is possible to use an > +:class:`XMLParser` and feed data into it incrementally, but it is a push API that > calls methods on a callback target, which is too low-level and inconvenient for > most needs. Sometimes what the user really wants is to be able to parse XML > incrementally, without blocking operations, while enjoying the convenience of > @@ -119,7 +122,7 @@ > The most powerful tool for doing this is :class:`XMLPullParser`. It does not > require a blocking read to obtain the XML data, and is instead fed with data > incrementally with :meth:`XMLPullParser.feed` calls. To get the parsed XML > -elements, call :meth:`XMLPullParser.read_events`. Here's an example:: > +elements, call :meth:`XMLPullParser.read_events`. 
Here is an example:: > > >>> parser = ET.XMLPullParser(['start', 'end']) > >>> parser.feed('sometext') > @@ -1038,15 +1041,17 @@ > > .. method:: read_events() > > - Iterate over the events which have been encountered in the data fed to the > - parser. This method yields ``(event, elem)`` pairs, where *event* is a > + Return an iterator over the events which have been encountered in the > + data fed to the > + parser. The iterator yields ``(event, elem)`` pairs, where *event* is a > string representing the type of event (e.g. ``"end"``) and *elem* is the > encountered :class:`Element` object. > > Events provided in a previous call to :meth:`read_events` will not be > - yielded again. As events are consumed from the internal queue only as > - they are retrieved from the iterator, multiple readers calling > - :meth:`read_events` in parallel will have unpredictable results. > + yielded again. Events are consumed from the internal queue only when > + they are retrieved from the iterator, so multiple readers iterating in > + parallel over iterators obtained from :meth:`read_events` will have > + unpredictable results. > > .. note:: > > diff --git a/Doc/whatsnew/3.4.rst b/Doc/whatsnew/3.4.rst > --- a/Doc/whatsnew/3.4.rst > +++ b/Doc/whatsnew/3.4.rst > @@ -1088,9 +1088,10 @@ > xml.etree > --------- > > -Add an event-driven parser for non-blocking applications, > -:class:`~xml.etree.ElementTree.XMLPullParser`. > -(Contributed by Antoine Pitrou in :issue:`17741`.) > +A new parser, :class:`~xml.etree.ElementTree.XMLPullParser`, allows a > +non-blocking applications to parse XML documents. An example can be > +seen at :ref:`elementtree-pull-parsing`. (Contributed by Antoine > +Pitrou in :issue:`17741`.) 
> > The :mod:`xml.etree.ElementTree` :func:`~xml.etree.ElementTree.tostring` and > :func:`~xml.etree.ElementTree.tostringlist` functions, and the > diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py > --- a/Lib/xml/etree/ElementTree.py > +++ b/Lib/xml/etree/ElementTree.py > @@ -1251,7 +1251,7 @@ > self._close_and_return_root() > > def read_events(self): > - """Iterate over currently available (event, elem) pairs. > + """Return an iterator over currently available (event, elem) pairs. > > Events are consumed from the internal event queue as they are > retrieved from the iterator. > diff --git a/Misc/NEWS b/Misc/NEWS > --- a/Misc/NEWS > +++ b/Misc/NEWS > @@ -2193,14 +2193,14 @@ > - Issue #17555: Fix ForkAwareThreadLock so that size of after fork > registry does not grow exponentially with generation of process. > > -- Issue #17707: multiprocessing.Queue's get() method does not block for short > - timeouts. > - > -- Isuse #17720: Fix the Python implementation of pickle.Unpickler to correctly > +- Issue #17707: fix regression in multiprocessing.Queue's get() method where > + it did not block for short timeouts. > + > +- Issue #17720: Fix the Python implementation of pickle.Unpickler to correctly > process the APPENDS opcode when it is used on non-list objects. > > -- Issue #17012: shutil.which() no longer fallbacks to the PATH environment > - variable if empty path argument is specified. Patch by Serhiy Storchaka. > +- Issue #17012: shutil.which() no longer falls back to the PATH environment > + variable if an empty path argument is specified. Patch by Serhiy Storchaka. > > - Issue #17710: Fix pickle raising a SystemError on bogus input. > > > -- > Repository URL: http://hg.python.org/cpython > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > https://mail.python.org/mailman/listinfo/python-checkins > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From d2mp1a9 at newsguy.com Mon Jan 6 16:29:16 2014 From: d2mp1a9 at newsguy.com (Bob Hanson) Date: Mon, 06 Jan 2014 07:29:16 -0800 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 References: <1388968272.16645.66955061.64551259@webmail.messagingengine.com> <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> <5q7kc9lin0fk38q7u3qqq7ofbeq05s8veh@4ax.com> <20140106040650.33B4D250165@webabinitio.net> Message-ID: [For the record: I'm running 32bit Windows XP (Pro) SP2 and installing "for all users."] TL;DR: No matter what I tried this morning re uninstalling and reinstalling 3.4.0b2, pip or no pip, MSI still tried to connect to the Akamai URLs. On Sun, 05 Jan 2014 23:06:49 -0500, R. David Murray wrote: > > Still wondering why, all of a sudden after years of using a > > firewalled msiexec.exe, I get it now trying to connect out while > > installing 3.4.0b2 from my harddrive...? > > The ensurepip developers will have to say for sure, but my understanding > is that it does *not* go out to the network. On the other hand, it is > conceivable that pip 1.5, unlike the earlier version in Beta1, is doing > some sort of "up to date check" that it shouldn't be doing in the > ensurepip scenario. > > I presume you did have the installer install pip. To be honest, I forgot all about pip until after I become a wee bit alarmed by the installer going out to the interwebs -- didn't even notice a checkbox for that option. > If you haven't already, You might try reinstalling and unchecking > that option, and see if it msiexec still tries to go out to the > network. That would confirm it is ensurepip that is the issue > (although that does seem most likely). Working again on this, this morning: Uninstalled and then reinstalled 3.4.0b2. No check box for pip, but there was that strange "tree" of collapsed options which included a pip one and which appeared to default to "install pip." Left pip on as was the default. 
Curiously, although I hadn't remembered this happening yesterday, this morning the installer said there was already a 3.4.x installed and asked whether I wanted to overwrite it. (Uninstalling 3.4.0b2 had left behind my addition of sympy and another file or two of my own from 3.4.0b1.) I told MSI to go ahead and overwrite.

Sure enough, the installer tried to connect to the same two IPs (Akamai, I'm now told) with the installer left at default options.

---

Next, I uninstalled 3.4.0b2 again, this time removing the entire dir after uninstalling. Reinstalling (still default settings on installer) this time gave me a bunch of *new* additions to site-packages including pip, setuptools, easy_install.py -- all of which were *not* installed into site-packages during the earlier install over the leftover directory.

Again, this time, msiexec.exe still attempted to connect (two bursts -- each time, twice to each of the aforementioned URLs).

---

Finally, I uninstalled 3.4.0b2, removed the dir, and reinstalled yet again, this time selecting the "don't install pip" option in the funky Windows "option tree" in the MSI installer.

Yet *still* again, there were several sets of attempts by msiexec.exe to connect to the same two Akamai URLs -- but no pip or other cruft in site-packages, nor any pip-things in Scripts, after the install finished.

So, whatever I have tried -- pip or no pip -- msiexec.exe still attempts to connect to those Akamai URLs.

[Hopefully, I kept accurate notes this morning and didn't typo them above.]

At any rate, the attempts to connect to the network seem like undesirable behavior to this man. If pip is necessary, then some Windows users may well end up without it -- and then not know why something later doesn't work.

Got to go now, but will check in this evening or tomorrow morning.
Again, many thanks to all for the help -- Bob Hanson From guido at python.org Mon Jan 6 16:43:38 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Jan 2014 05:43:38 -1000 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 In-Reply-To: References: <1388968272.16645.66955061.64551259@webmail.messagingengine.com> <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> <5q7kc9lin0fk38q7u3qqq7ofbeq05s8veh@4ax.com> <20140106040650.33B4D250165@webabinitio.net> Message-ID: Since MSIEXEC.EXE is a legit binary (not coming from our packager) and Akamai is a legitimate company (MS most likely has an agreement with them), at this point I would assume that there's something that MSIEXEC.EXE wants to get from Akamai, which is unintentionally but harmlessly triggered by the Python install. Could it be checking for upgrades? On Mon, Jan 6, 2014 at 5:29 AM, Bob Hanson wrote: > [For the record: I'm running 32bit Windows XP (Pro) SP2 and > installing "for all users."] > > TL;DR: No matter what I tried this morning re uninstalling and > reinstalling 3.4.0b2, pip or no pip, MSI still tried to connect > to the Akamai URLs. > > On Sun, 05 Jan 2014 23:06:49 -0500, R. David Murray wrote: > >> > Still wondering why, all of a sudden after years of using a >> > firewalled msiexec.exe, I get it now trying to connect out while >> > installing 3.4.0b2 from my harddrive...? >> >> The ensurepip developers will have to say for sure, but my understanding >> is that it does *not* go out to the network. On the other hand, it is >> conceivable that pip 1.5, unlike the earlier version in Beta1, is doing >> some sort of "up to date check" that it shouldn't be doing in the >> ensurepip scenario. >> >> I presume you did have the installer install pip. > > To be honest, I forgot all about pip until after I become a wee > bit alarmed by the installer going out to the interwebs -- didn't > even notice a checkbox for that option. 
> >> If you haven't already, You might try reinstalling and unchecking >> that option, and see if it msiexec still tries to go out to the >> network. That would confirm it is ensurepip that is the issue >> (although that does seem most likely). > > Working again on this, this morning: Uninstalled and then > reinstalled 3.4.0b2. No check box for pip, but there was that > strange "tree" of collapsed options which included a pip one and > which appeared to default to "install pip." Left pip on as was > the default. > > Curiously, although I hadn't remembered this happening yesterday, > this morning, the installer said there was already a 3.4x > installed and do I want to overwrite it. (Uninstalling 3.4.0b2 > had left behind my addition of sympy and another file or two of > my own from 3.4.0b1.) I told MSI to go ahead and overwrite. > > Sure enough, the installer tried to connect to the same two IPs > (Akamai I'm now told) with the installer left at default options. > > --- > > Next, I uninstalled 3.4.0b2 again, this time removing the entire > dir after uninstalling. Reinstalling (still default settings on > installer) this time gave me a bunch of *new* additions to > site-packages including pip, setuptools, easy_install.py -- all > of which were *not* installed into site-packages when priorly > overinstalling. > > Again, this time, msiexec.exe still attempted to connect (two > bursts -- each time, twice to each of the aforementioned URLs). > > --- > > Finally, I uninstalled 3.4.0b2, removed the dir, and reinstalled > yet again, this time selecting the "don't install pip" option in > the funky Windows "option tree" in the MSI installer. > > Yet *still* again, there were several sets of attempts by > msiexec.exe to connect to the same two Akamai URLs -- but, no pip > or other cruft in site-packages nor any pip-things in Scripts > after the install finished. > > So, whatever I have tried -- pip or no pip -- msiexec.exe still > attempts to connect to those Akamai URLs. 
> > [Hopefully, I kept accurate notes this morning and didn't typo > them above.] > > At any rate, the attempts to connect to the network seem like > undesirable behavior to this man. If pip is necessary, then some > Window users may well end up without it -- and then not know why > something later doesn't work. > > Got to go now, but will check in this evening or in the morning, > tomorrow. > > Again, many thanks to all for the help -- > > Bob Hanson > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From p.f.moore at gmail.com Mon Jan 6 16:57:06 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 6 Jan 2014 15:57:06 +0000 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 In-Reply-To: References: <1388968272.16645.66955061.64551259@webmail.messagingengine.com> <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> <5q7kc9lin0fk38q7u3qqq7ofbeq05s8veh@4ax.com> <20140106040650.33B4D250165@webabinitio.net> Message-ID: On 6 January 2014 15:29, Bob Hanson wrote: > At any rate, the attempts to connect to the network seem like > undesirable behavior to this man. If pip is necessary, then some > Window users may well end up without it -- and then not know why > something later doesn't work. I have installed python 3.4b2 on Windows (7, 64-bit) and seen no network connections like this. I didn't check too closely, and I don't know that my (corporate) firewall would necessarily report this to me. But it seemed fine to me. I'll see if I can try an install on a VM with no network access at some point, and see what that does. 
One possibility which might be worth investigating - some Windows software can insert itself into the network stack and trigger extra net calls (I believe it's common with things like parental control software, and I once ended up with a thoroughly broken network because ZoneAlarm did something nasty to me). As no-one else seems to be having the issues you are, could it be that something else is intercepting part of the install process, unrelated to Python?

It's also worth noting that the Python MSI is "just" a database of files and settings to install (plus some post-install scripts that would behave the same on all systems, and don't connect to the net AIUI). The MSI is interpreted, as you note, by the OS-supplied msiexec.exe. Is it possible that you have some sort of patched msiexec (there's lots of opportunity for OEM customisation in Windows, maybe your hardware supplier added something to get a logo/advert from their website when installs run)?

I'm clutching at straws here, certainly, but it does look like it's an issue local to your setup.

Paul

From sky.kok at speaklikeaking.com Mon Jan 6 11:38:46 2014
From: sky.kok at speaklikeaking.com (Vajrasky Kok)
Date: Mon, 6 Jan 2014 18:38:46 +0800
Subject: [Python-Dev] The desired behaviour for resolve() when the path doesn't exist
Message-ID:

Dear friends,

This is related to ticket 19717: "resolve() fails when the path doesn't exist".

Assuming /home/cutecat exists but /home/cutecat/aa does not, what is the desired output of Path('/home/cutecat/aa/bb/cc').resolve(strict=False)?

Should it be "/home/cutecat" (only the part of the path that exists), "/home/cutecat/aa" (up to the first non-existent component; my current strategy), or "/home/cutecat/aa/bb/cc" (the default behaviour of os.path.realpath)?

Vajrasky

From rdmurray at bitdance.com Mon Jan 6 17:22:08 2014
From: rdmurray at bitdance.com (R. David Murray)
Date: Mon, 06 Jan 2014 11:22:08 -0500
Subject: [Python-Dev] [Python-checkins] cpython: whatsnew: XMLPullParser, plus some doc updates.
In-Reply-To: References: <3dxnZS1P8vz7Lmc@mail.python.org> Message-ID: <20140106162208.CFC842500AA@webabinitio.net> On Tue, 07 Jan 2014 01:22:21 +1000, Nick Coghlan wrote: > On 5 Jan 2014 12:54, "r.david.murray" wrote: > > > > http://hg.python.org/cpython/rev/069f88f4935f > > changeset: 88308:069f88f4935f > > user: R David Murray > > date: Sat Jan 04 23:52:50 2014 -0500 > > summary: > > whatsnew: XMLPullParser, plus some doc updates. > > > > I was confused by the text saying that read_events "iterated", since it > > actually returns an iterator (that's what a generator does) that the > > caller must then iterate. So I tidied up the language. I'm not sure > > what the sentence "Events provided in a previous call to read_events() > > will not be yielded again." is trying to convey, so I didn't try to fix > that. > > It's a mutating API - once the events have been retrieved, that's it, > they're gone from the internal state. Suggestions for wording improvements > welcome :) Well, my guess as to what it meant was roughly: "An Event will be yielded exactly once regardless of how many read_events iterators are processed." Looking at the code, though, I'm not sure that's actually true. The code does not appear to be thread-safe. Of course, it isn't intended to be used in a threaded context, but the docs don't quite make that explicit. I imagine that's the intent of the statement about "parallel" reading, but it doesn't actually say that the code is not thread safe. It reads more as if it is warning that the order of retrieval would be unpredictable. 
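For what it's worth, the single-reader behaviour Nick describes ("once the events have been retrieved, that's it") can be sketched roughly as below. This is only an illustration, assuming Python 3.4+ where XMLPullParser is available; the `drain` helper is a name made up for this example, and the sketch says nothing about what happens with several readers or threads.

```python
import xml.etree.ElementTree as ET


def drain(parser):
    """Return the (event, tag) pairs currently queued in the parser.

    Hypothetical helper for this example: iterating read_events()
    removes the events from the parser's internal queue.
    """
    return [(event, elem.tag) for event, elem in parser.read_events()]


parser = ET.XMLPullParser(['start', 'end'])

# Feed part of a document and drain whatever events are available so far.
parser.feed('<root><a>text</a>')
first = drain(parser)

# A second call with no new data yields nothing: earlier events are gone.
again = drain(parser)

# Feed the rest, signal end of input, and drain the remaining events.
parser.feed('</root>')
parser.close()
rest = drain(parser)

print(first + rest)  # every event appears exactly once across both drains
print(again)         # empty list
```

Exactly how the events are split between the first and last drain can depend on the underlying Expat version, which is why the sketch only looks at the combined result.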
--David

From murman at gmail.com Mon Jan 6 17:26:47 2014
From: murman at gmail.com (Michael Urman)
Date: Mon, 6 Jan 2014 10:26:47 -0600
Subject: [Python-Dev] [RELEASED] Python 3.4.0b2
In-Reply-To:
References: <1388968272.16645.66955061.64551259@webmail.messagingengine.com> <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> <5q7kc9lin0fk38q7u3qqq7ofbeq05s8veh@4ax.com> <20140106040650.33B4D250165@webabinitio.net>
Message-ID:

On Mon, Jan 6, 2014 at 9:43 AM, Guido van Rossum wrote:
> Since MSIEXEC.EXE is a legit binary (not coming from our packager) and
> Akamai is a legitimate company (MS most likely has an agreement with
> them), at this point I would assume that there's something that
> MSIEXEC.EXE wants to get from Akamai, which is unintentionally but
> harmlessly triggered by the Python install. Could it be checking for
> upgrades?

Here's some more guesswork. Does it seem possible that msiexec is trying to verify the revocation status of the certificate used to sign the python .msi file? Per http://blogs.technet.com/b/pki/archive/2006/11/30/basic-crl-checking-with-certutil.aspx it looks like crl.microsoft.com is the host; this is hosted on akamai:

crl.microsoft.com is an alias for crl.www.ms.akadns.net.
crl.www.ms.akadns.net is an alias for a1363.g.akamai.net.

There are various things you could try to verify this. You could test with simpler .msi files where one is signed and another is not signed (I'll leave it up to you to find such things, but ORCA is a common "test" .msi file). Or you could take a verbose log of the installation process (msiexec /l*v python.log python.msi OR http://support.microsoft.com/kb/223300), sit on the prompt for network access so you can uniquely identify the log's timestamps, and try to identify at what point of the installation the network access occurs.

Once that is known, more steps can be taken to identify and resolve any actual issues.
Michael From solipsis at pitrou.net Mon Jan 6 21:17:20 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 6 Jan 2014 21:17:20 +0100 Subject: [Python-Dev] Proposed: The Great Argument Clinic Conversion Derby References: <52C98678.6060201@hastings.org> <52C9D423.7020402@hastings.org> <20140105232553.GA10675@sleipnir.bytereef.org> Message-ID: <20140106211720.50ea65a8@fsol> On Mon, 6 Jan 2014 00:25:53 +0100 Stefan Krah wrote: > Nick Coghlan wrote: > > > I had that working at one point. Guido said no, keep it all in one file. > > > I'm flexible but first you'd have to convince him. > > > > It's also not something we're stuck with forever - we can start with it inline > > (which has the advantage of keeping all the code in the same place), and later > > move to having the helpers in a separate file included from the implementation > > file if we decide it makes sense to do so. > > If we move big chunks of code around twice, I guess "hg blame" will break > twice, too. That is another thing worth considering. Breaking on generated code doesn't sound very annoying, though. > I agree with Serhiy, but that is probably known at this point. :) I agree with Serhiy and you too. Clinic's current output makes C files more tedious to read, and I'm not really willing to participate in the "conversion derby" because of that. What were Guido's arguments? Also, see http://bugs.python.org/issue19723 Regards Antoine. From skip at pobox.com Mon Jan 6 21:40:54 2014 From: skip at pobox.com (Skip Montanaro) Date: Mon, 6 Jan 2014 14:40:54 -0600 Subject: [Python-Dev] Proposed: The Great Argument Clinic Conversion Derby In-Reply-To: <20140106211720.50ea65a8@fsol> References: <52C98678.6060201@hastings.org> <52C9D423.7020402@hastings.org> <20140105232553.GA10675@sleipnir.bytereef.org> <20140106211720.50ea65a8@fsol> Message-ID: On Mon, Jan 6, 2014 at 2:17 PM, Antoine Pitrou wrote: >> I agree with Serhiy, but that is probably known at this point. :) > > I agree with Serhiy and you too. 
Clinic's current output makes C files > more tedious to read, and I'm not really willing to participate in the > "conversion derby" because of that. My first thought was that this exercise falls into the realm of fixing things which aren't broken. Skip From erik.m.bray at gmail.com Mon Jan 6 21:53:58 2014 From: erik.m.bray at gmail.com (Erik Bray) Date: Mon, 6 Jan 2014 15:53:58 -0500 Subject: [Python-Dev] Proposed: The Great Argument Clinic Conversion Derby In-Reply-To: <52C98678.6060201@hastings.org> References: <52C98678.6060201@hastings.org> Message-ID: On Sun, Jan 5, 2014 at 11:21 AM, Larry Hastings wrote: > Now, properly converting a function to work with Argument Clinic does not > change its behavior. Internally, the code performing argument parsing > should be nigh-identical; it should call the same PyArg_Parse function, with > the same arguments, and the implementation should perform the same work as a > result. The only externally observable change should be that > inspect.signature() now produces a valid signature for the builtin; in all > other respects Python should be unchanged. No documentation should have to > change, no tests should need to be modified, and absolutely no code should > be broken as a result. Converting a function to use Argument Clinic should > be a blissfully low-risk procedure, and produce a pleasant, > easier-to-maintain result. Hi, If it goes forward I would be willing to help out with the derby on a few modules. I haven't followed the Argument Clinic arguments closely before now, so I don't know if this question has been addressed. I didn't see it mentioned in the docs anywhere, but will the policy be to *prefer* renaming existing functions to the names generated by clinic (the "_impl" names) or to override that to keep the existing names? I ask because some built-in functions are used internally by other built-in functions. I don't know how common this is but, for example, fileio_read calls fileio_readall. 
So if fileio_readall is renamed to io_FileIO_readall_impl or whatever we need to also go through and fix any references to fileio_readall. Should be easy enough, but I wonder if there are any broader side-effects of this. Might it be safer for the first round to keep the existing function names? Erik From guido at python.org Mon Jan 6 22:00:36 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Jan 2014 11:00:36 -1000 Subject: [Python-Dev] Proposed: The Great Argument Clinic Conversion Derby In-Reply-To: <20140106211720.50ea65a8@fsol> References: <52C98678.6060201@hastings.org> <52C9D423.7020402@hastings.org> <20140105232553.GA10675@sleipnir.bytereef.org> <20140106211720.50ea65a8@fsol> Message-ID: On Mon, Jan 6, 2014 at 10:17 AM, Antoine Pitrou wrote: > On Mon, 6 Jan 2014 00:25:53 +0100 > Stefan Krah wrote: >> Nick Coghlan wrote: >> > > I had that working at one point. Guido said no, keep it all in one file. >> > > I'm flexible but first you'd have to convince him. >> > >> > It's also not something we're stuck with forever - we can start with it inline >> > (which has the advantage of keeping all the code in the same place), and later >> > move to having the helpers in a separate file included from the implementation >> > file if we decide it makes sense to do so. >> >> If we move big chunks of code around twice, I guess "hg blame" will break >> twice, too. That is another thing worth considering. > > Breaking on generated code doesn't sound very annoying, though. That depends on how stressed you are when you are trying to use hg blame to figure out where a certain breakage was introduced, when and by whom. >> I agree with Serhiy, but that is probably known at this point. :) > > I agree with Serhiy and you too. Clinic's current output makes C files > more tedious to read, and I'm not really willing to participate in the > "conversion derby" because of that. What were Guido's arguments? 
>
> Also, see http://bugs.python.org/issue19723

I added a hopefully useful suggestion there; ISTM the situation can easily be improved by changing the wording of the magic comments.

I'm not yet convinced that the generated code is better off in separate files, nor do I see why that is considered such a big deal. And how would you prevent the generated functions from becoming externally visible? As long as they are in the same file they can be static. (I'm not a fan of #include to stitch files together.)

--
--Guido van Rossum (python.org/~guido)

From storchaka at gmail.com Mon Jan 6 22:36:42 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 06 Jan 2014 23:36:42 +0200
Subject: [Python-Dev] Proposed: The Great Argument Clinic Conversion Derby
In-Reply-To:
References: <52C98678.6060201@hastings.org>
Message-ID:

06.01.14 22:53, Erik Bray wrote:
> I ask because some built-in functions are used internally by other
> built-in functions. I don't know how common this is but, for example,
> fileio_read calls fileio_readall. So if fileio_readall is renamed to
> io_FileIO_readall_impl or whatever we need to also go through and fix
> any references to fileio_readall. Should be easy enough, but I wonder
> if there are any broader side-effects of this. Might it be safer for
> the first round to keep the existing function names?

You can leave fileio_readall as is and call it from io_FileIO_readall_impl and other places.

From timothy.c.delaney at gmail.com Mon Jan 6 23:54:40 2014
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Tue, 7 Jan 2014 09:54:40 +1100
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To:
References:
Message-ID:

I've just posted about PEP 460 and this discussion on the mercurial-devel mailing list.

Tim Delaney

From ncoghlan at gmail.com Tue Jan 7 00:16:10 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 7 Jan 2014 09:16:10 +1000
Subject: [Python-Dev] General Q&A regarding Python 3, adoption etc.
Message-ID:

For anyone that isn't already aware, I wrote a Q & A about Python 3 last year (in response to an article about how we should have fixed the GIL instead of Unicode), and I've updated it extensively over the past several days due to Alex's misunderstanding of the objectives for Python 3.4 as well as Armin's latest piece on the increased difficulties in writing wire protocol handling code.

The two main additions I currently have planned are a question specifically about the state of the WSGI protocol (it works, but in an error-prone way), as well as one on what I'd like to see as the next steps in encouraging Python 3 adoption now that we're within 18 months of the planned date for 2.7 to enter security fix only mode (which involve encouraging community workshops to switch to teaching Python 3.4 initially, with Python 2.7 as an optional follow up, helping Ubuntu & Fedora with their transitions to Py3 by default, bringing 3.5 closer to parity with Python 2 for wire protocol development, and, on the Red Hat/Fedora side, helping to encourage the adoption of software collections as a mechanism for decoupling the runtime for Python applications from the system Python on RHEL 6 and its derivatives).

I thought I mentioned it on this list last year when I first wrote it, but some messages I've seen recently suggest many folks haven't seen it before.

Cheers,
Nick.

From benjamin at python.org Tue Jan 7 00:27:41 2014
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 06 Jan 2014 15:27:41 -0800
Subject: [Python-Dev] General Q&A regarding Python 3, adoption etc.
In-Reply-To: References: Message-ID: <1389050861.28839.67407285.5EDF2571@webmail.messagingengine.com> On Mon, Jan 6, 2014, at 03:16 PM, Nick Coghlan wrote: > For anyone that isn't already aware, I wrote a Q & A about Python 3 last > year (in response to an article about how we should have fixed the GIL > instead of Unicode), and I've updated it extensively over the past > several > days due to Alex's misunderstanding of the objectives for Python 3.4 as > well as Armin's latest piece on the increased difficulties in writing > wire > protocol handling code. I'd like to thank you for taking on the task of Python 3 justification. From breamoreboy at yahoo.co.uk Tue Jan 7 00:25:12 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Mon, 06 Jan 2014 23:25:12 +0000 Subject: [Python-Dev] General Q&A regarding Python 3, adoption etc. In-Reply-To: References: Message-ID: On 06/01/2014 23:16, Nick Coghlan wrote: > For anyone that isn't already aware, I wrote a Q & A about Python 3 last > year (in response to an article about how we should have fixed the GIL > instead of Unicode), and I've updated it extensively over the past > several days due to Alex's misunderstanding of the objectives for Python > 3.4 as well as Armin's latest piece on the increased difficulties in > writing wire protocol handling code. 
> > The two main additions I currently have planned are a question > specifically about the state of the WSGI protocol (it works, but it an > error prone way), as well as one on what I'd like to see as the next > steps in encouraging Python 3 adoption now that we're within 18 months > of the planned date for 2.7 to enter security fix only mode (which > involve encouraging community workshops to switch to teaching Python 3.4 > initially, with Python 2.7 as an optional follow up, helping Ubuntu & > Fedora with their transitions to Py3 by default, bringing 3.5 closer to > parity with Python 2 for wire protocol development, and, on the Red > Hat/Fedora side, helping to encourage the adoption of software > collections as a mechanism for decoupling the runtime for Python > applications from the system Python on RHEL 6 and its derivatives. > > I thought I mentioned it on this list last year when I first wrote it, > but some messages I've seen recently suggest many folks haven't seen it > before. > > Cheers, > Nick. > Is it on the back of a fag packet or is there a link somewhere? :) -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From solipsis at pitrou.net Tue Jan 7 00:44:41 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 7 Jan 2014 00:44:41 +0100 Subject: [Python-Dev] General Q&A regarding Python 3, adoption etc. References: Message-ID: <20140107004441.5aea8e76@fsol> On Mon, 06 Jan 2014 23:25:12 +0000 Mark Lawrence wrote: > On 06/01/2014 23:16, Nick Coghlan wrote: > > For anyone that isn't already aware, I wrote a Q & A about Python 3 last > > year (in response to an article about how we should have fixed the GIL > > instead of Unicode), and I've updated it extensively over the past > > several days due to Alex's misunderstanding of the objectives for Python > > 3.4 as well as Armin's latest piece on the increased difficulties in > > writing wire protocol handling code. 
> > > > The two main additions I currently have planned are a question > > specifically about the state of the WSGI protocol (it works, but it an > > error prone way), as well as one on what I'd like to see as the next > > steps in encouraging Python 3 adoption now that we're within 18 months > > of the planned date for 2.7 to enter security fix only mode (which > > involve encouraging community workshops to switch to teaching Python 3.4 > > initially, with Python 2.7 as an optional follow up, helping Ubuntu & > > Fedora with their transitions to Py3 by default, bringing 3.5 closer to > > parity with Python 2 for wire protocol development, and, on the Red > > Hat/Fedora side, helping to encourage the adoption of software > > collections as a mechanism for decoupling the runtime for Python > > applications from the system Python on RHEL 6 and its derivatives. > > > > I thought I mentioned it on this list last year when I first wrote it, > > but some messages I've seen recently suggest many folks haven't seen it > > before. > > > > Cheers, > > Nick. > > > > Is it on the back of a fag packet or is there a link somewhere? :) I assume it's http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html Regards Antoine. From brett at python.org Tue Jan 7 00:53:26 2014 From: brett at python.org (Brett Cannon) Date: Mon, 6 Jan 2014 18:53:26 -0500 Subject: [Python-Dev] Proposed: The Great Argument Clinic Conversion Derby In-Reply-To: References: <52C98678.6060201@hastings.org> <52C9D423.7020402@hastings.org> <20140105232553.GA10675@sleipnir.bytereef.org> <20140106211720.50ea65a8@fsol> Message-ID: On Mon, Jan 6, 2014 at 3:40 PM, Skip Montanaro wrote: > On Mon, Jan 6, 2014 at 2:17 PM, Antoine Pitrou > wrote: > >> I agree with Serhiy, but that is probably known at this point. :) > > > > I agree with Serhiy and you too. 
Clinic's current output makes C files
> > more tedious to read, and I'm not really willing to participate in the
> > "conversion derby" because of that.
>
> My first thought was that this exercise falls into the realm of fixing
> things which aren't broken.
>

The gain in introspection now and possible automated improvements later (e.g. if we come up with a faster way to parse arguments it will automatically propagate through the code base) make it worth it.

From solipsis at pitrou.net Tue Jan 7 01:00:10 2014
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 7 Jan 2014 01:00:10 +0100
Subject: [Python-Dev] General Q&A regarding Python 3, adoption etc.
References:
Message-ID: <20140107010010.23facd44@fsol>

On Tue, 7 Jan 2014 09:16:10 +1000
Nick Coghlan wrote:
> For anyone that isn't already aware, I wrote a Q & A about Python 3 last
> year (in response to an article about how we should have fixed the GIL
> instead of Unicode), and I've updated it extensively over the past several
> days due to Alex's misunderstanding of the objectives for Python 3.4 as
> well as Armin's latest piece on the increased difficulties in writing wire
> protocol handling code.

A couple of remarks:

- the unicode section would gain from being a little more on the practical
  side; for example the "surrogateescape" paragraph is an obscure and
  theoretical way of saying unicode filepaths (etc.) are fully
  supported on all platforms

- also, it doesn't seem very clear that the primary string type (str)
  is now unicode; this has important consequences, for example
  non-ASCII exception messages work fine in 3.x while they were very
  delicate to work with in 2.x

- when discussing Twisted / gevent alternatives, you should also mention
  Tornado, which is especially interesting because it works on both
  Python 2 and Python 3, and therefore presents a nice migration path

- perhaps you should discuss the idea that "uptake is slow", because
  the numbers are rather conflicting on that point; see what I wrote in
  https://mail.python.org/pipermail/python-list/2014-January/663922.html
  and also Chris Angelico's elaboration in
  https://mail.python.org/pipermail/python-list/2014-January/664003.html

Regards

Antoine.

From emile at fenx.com Tue Jan 7 00:31:50 2014
From: emile at fenx.com (Emile van Sebille)
Date: Mon, 06 Jan 2014 15:31:50 -0800
Subject: [Python-Dev] General Q&A regarding Python 3, adoption etc.
In-Reply-To:
References:
Message-ID:

On 1/6/2014 3:16 PM, Nick Coghlan wrote:
> I thought I mentioned it on this list last year when I first wrote it,
> but some messages I've seen recently suggest many folks haven't seen it
> before.

And even more will see it if you provide a link. Please.

Emile

From ncoghlan at gmail.com Tue Jan 7 01:20:01 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 7 Jan 2014 10:20:01 +1000
Subject: [Python-Dev] General Q&A regarding Python 3, adoption etc.
In-Reply-To: <20140107004441.5aea8e76@fsol>
References: <20140107004441.5aea8e76@fsol>
Message-ID:

On 7 Jan 2014 07:46, "Antoine Pitrou" wrote:
>
> I assume it's
> http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html

Thanks, that's the one - I copied the link, but neglected to paste it in before hitting send :P

Cheers,
Nick.

>
> Regards
>
> Antoine.
From g.brandl at gmx.net Tue Jan 7 10:40:15 2014
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 07 Jan 2014 10:40:15 +0100
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To:
References:
Message-ID:

On 06.01.2014 14:24, Victor Stinner wrote:
> Hi,
>
> bytes % args and bytes.format(args) are requested by Mercurial and
> Twisted projects. The issue #3982 was stuck because nobody proposed a
> complete definition of the "new" features. Here is a try as a PEP.

Very nice, thanks. If I was to make a blasphemous suggestion I would even target it for Python 3.4. (No, seriously, this is a big issue - see the recent discussion by Armin - and the big names involved show that it is a major holdup of 3.x uptake.) It would of course depend a lot on how much code from unicode formatting can be retained or adapted as opposed to a rewrite from scratch.

cheers,
Georg

From p.f.moore at gmail.com Tue Jan 7 10:59:20 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 7 Jan 2014 09:59:20 +0000
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To:
References:
Message-ID:

On 7 January 2014 09:40, Georg Brandl wrote:
> Very nice, thanks. If I was to make a blasphemous suggestion I would
> even target it for Python 3.4. (No, seriously, this is a big issue
> - see the recent discussion by Armin - and the big names involved show
> that it is a major holdup of 3.x uptake.) It would of course depend
> a lot on how much code from unicode formatting can be retained or
> adapted as opposed to a rewrite from scratch.
Will the relevant projects actually support only 2.X and 3.4/5+? If they expect to or have to support 3.2 or 3.3, then this change isn't actually going to help them much. If they will only support versions of Python 3 containing this change, then it may well be worth considering the impact of delaying it till 3.5. Paul. From victor.stinner at gmail.com Tue Jan 7 11:13:12 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 7 Jan 2014 11:13:12 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: 2014/1/7 Paul Moore : > Will the relevant projects actually support only 2.X and 3.4/5+? If > they expect to or have to support 3.2 or 3.3, then this change isn't > actually going to help them much. If they will only support versions > of Python 3 containing this change, then it may well be worth > considering the impact of delaying it till 3.5. Twisted and Mercurial don't support Python 3. (I heard that Twisted Core supports Python 3, but I don't know if it's true nor the Python 3 version.) Victor From ncoghlan at gmail.com Tue Jan 7 11:16:01 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 7 Jan 2014 20:16:01 +1000 Subject: [Python-Dev] General Q&A regarding Python 3, adoption etc. In-Reply-To: <20140107010010.23facd44@fsol> References: <20140107010010.23facd44@fsol> Message-ID: On 7 Jan 2014 08:03, "Antoine Pitrou" wrote: > > On Tue, 7 Jan 2014 09:16:10 +1000 > Nick Coghlan wrote: > > For anyone that isn't already aware, I wrote a Q & A about Python 3 last > > year (in response to an article about how we should have fixed the GIL > > instead of Unicode), and I've updated it extensively over the past several > > days due to Alex's misunderstanding of the objectives for Python 3.4 as > > well as Armin's latest piece on the increased difficulties in writing wire > > protocol handling code. 
> > A couple remarks: > > - the unicode section would gain being a little more on the practical > side; for example the "surrogateescape" paragraph is an obscure and > theoretical way of saying unicode filepaths (etc.) are fully > supported on all platforms > > - also, it doesn't seem very clear that the primary string type (str) > is now unicode; this has important consequences, for example > non-ASCII exception messages work fine in 3.x while they were very > delicate to work with in 2.x > > - when discussing Twisted / gevent alternatives, you should also mention > Tornado, which is especially interesting because it works on both > Python 2 and Python 3, and therefore presents a nice migration path Thanks, I've addressed these and a couple of other points people brought up (e.g. it is cx-freeze that supports Py3k, not py2exe). > - perhaps you should discuss the idea that "uptake is slow", because > the numbers are rather conflicting on that point; see what I wrote in > https://mail.python.org/pipermail/python-list/2014-January/663922.html > and also Chris Angelico's elaboration in > https://mail.python.org/pipermail/python-list/2014-January/664003.html I haven't incorporated these observations yet, but I will. It ties in closely with the point that bootstrapping the new Python 3 application ecosystem with cross-version libraries and frameworks is not the same thing as migrating the existing Python 2 *application* ecosystem, and the latter is expected to take *much* longer (since existing Python 2 users will have, of necessity, already worked around or avoided the bugs and limitations of that version of the language). Cheers, Nick. > > Regards > > Antoine. 
From donald at stufft.io Tue Jan 7 11:16:18 2014
From: donald at stufft.io (Donald Stufft)
Date: Tue, 7 Jan 2014 05:16:18 -0500
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To:
References:
Message-ID: <88224047-95EB-4F3F-B66C-4784B24549FE@stufft.io>

Given the low adoption rates for Python 3 it would not surprise me if people who are hampered by the lack of this change are willing to wait until a Python version is released that has it.

On Jan 7, 2014, at 5:13 AM, Victor Stinner wrote:

> 2014/1/7 Paul Moore:
>> Will the relevant projects actually support only 2.X and 3.4/5+? If
>> they expect to or have to support 3.2 or 3.3, then this change isn't
>> actually going to help them much. If they will only support versions
>> of Python 3 containing this change, then it may well be worth
>> considering the impact of delaying it till 3.5.
>
> Twisted and Mercurial don't support Python 3.
>
> (I heard that Twisted Core supports Python 3, but I don't know if it's
> true nor the Python 3 version.)
>
> Victor

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From ncoghlan at gmail.com Tue Jan 7 11:28:23 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 7 Jan 2014 20:28:23 +1000
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To: <88224047-95EB-4F3F-B66C-4784B24549FE@stufft.io>
References: <88224047-95EB-4F3F-B66C-4784B24549FE@stufft.io>
Message-ID:

On 7 Jan 2014 18:18, "Donald Stufft" wrote:
>
> Given the low adoption rates for Python 3 it would not surprise me if people
> who are hampered by the lack of this change are willing to wait until a Python
> version is released that has it.

Once the code exists (regardless of the exact spelling), it also becomes much easier to extract as an extension module on PyPI for wire protocol formatting. That would allow folks to choose between just supporting 3.5+ and using the builtin formatting operations, or switching to the cross version compatible formatting module (if one was created).

So I like the idea of restoring this capability for 3.5, but don't see a reason to consider rushing it into 3.4.

Cheers,
Nick.

>
> On Jan 7, 2014, at 5:13 AM, Victor Stinner wrote:
>
> > 2014/1/7 Paul Moore:
> >> Will the relevant projects actually support only 2.X and 3.4/5+? If
> >> they expect to or have to support 3.2 or 3.3, then this change isn't
> >> actually going to help them much. If they will only support versions
> >> of Python 3 containing this change, then it may well be worth
> >> considering the impact of delaying it till 3.5.
> >
> > Twisted and Mercurial don't support Python 3.
> >
> > (I heard that Twisted Core supports Python 3, but I don't know if it's
> > true nor the Python 3 version.)
> >
> > Victor
>
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From g.brandl at gmx.net Tue Jan 7 11:33:55 2014
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 07 Jan 2014 11:33:55 +0100
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To:
References:
Message-ID:

On 07.01.2014 10:59, Paul Moore wrote:
> On 7 January 2014 09:40, Georg Brandl wrote:
>> Very nice, thanks. If I was to make a blasphemous suggestion I would
>> even target it for Python 3.4. (No, seriously, this is a big issue
>> - see the recent discussion by Armin - and the big names involved show
>> that it is a major holdup of 3.x uptake.) It would of course depend
>> a lot on how much code from unicode formatting can be retained or
>> adapted as opposed to a rewrite from scratch.
>
> Will the relevant projects actually support only 2.X and 3.4/5+? If
> they expect to or have to support 3.2 or 3.3, then this change isn't
> actually going to help them much. If they will only support versions
> of Python 3 containing this change, then it may well be worth
> considering the impact of delaying it till 3.5.

Yes, exactly.
Another, and probably better, proposal would be to make 3.5 the "ultimate" viable porting target: we now know pretty well what the major remaining roadblocks (real and perceived) are for our developers and users.

The proposal would be to focus entirely on addressing these roadblocks in the 3.5 version, and no other new features -- the release cycle needn't be 18 months for this one. This is similar to the moratorium for 3.2, but that one came too early for 3.x porting to really profit.

In short, I am increasingly concerned that although we are going a pretty good way (and Nick's FAQ list makes that much clearer than anything else I've read), it is not perceived as such, and could be better. We have brought Python 3 upon the community, and as such we need to make it very very clear that we are working with them, not against them. A minor release dedicated to that end should be a very direct representation of that. I know about the "release everything to PyPI" strategy, but it just doesn't have the same impact.

It would be very cool to have multiple projects working together with us for this, and at the release of 3.5 final, present (say) a Mercurial that works on 2.5 and 3.5.

Mostly pipe-dreams though...

Georg

From solipsis at pitrou.net Tue Jan 7 12:13:08 2014
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 7 Jan 2014 12:13:08 +0100
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
References:
Message-ID: <20140107121308.20fa228b@fsol>

On Tue, 07 Jan 2014 10:40:15 +0100
Georg Brandl wrote:
> On 06.01.2014 14:24, Victor Stinner wrote:
> > Hi,
> >
> > bytes % args and bytes.format(args) are requested by Mercurial and
> > Twisted projects. The issue #3982 was stuck because nobody proposed a
> > complete definition of the "new" features. Here is a try as a PEP.
>
> Very nice, thanks. If I was to make a blasphemous suggestion I would
> even target it for Python 3.4.
(No, seriously, this is a big issue > - see the recent discussion by Armin - and the big names involved show > that it is a major holdup of 3.x uptake.) It would of course depend > a lot on how much code from unicode formatting can be retained or > adapted as opposed to a rewrite from scratch. From what I've seen of the unicode formatting code, a lot would have to be rewritten or refactored. It is a non-trivial task, definitely inappropriate for 3.4. Regards Antoine. From solipsis at pitrou.net Tue Jan 7 12:16:18 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 7 Jan 2014 12:16:18 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: Message-ID: <20140107121618.5110a906@fsol> On Tue, 07 Jan 2014 11:33:55 +0100 Georg Brandl wrote: > > The proposal would be to focus entirely on addressing these roadblocks > in the 3.5 version, and no other new features -- the release cycle > needn't be 18 months for this one. This is similar to the moratorium > for 3.2, but that one came too early for 3.x porting to really profit. The moratorium was for alternate Python implementations IIRC, not for porting third-party libraries. > It would be very cool to have multiple projects working together with > us for this, and at the release of 3.5 final, present (say) a Mercurial > that works on 2.5 and 3.5. You seem to be forgetting that we are only one part of the equation here. Unless you want to tackle Mercurial and Twisted porting yourself? Good luck with that. Regards Antoine. From stefan at bytereef.org Tue Jan 7 12:24:25 2014 From: stefan at bytereef.org (Stefan Krah) Date: Tue, 7 Jan 2014 12:24:25 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140107121308.20fa228b@fsol> References: <20140107121308.20fa228b@fsol> Message-ID: <20140107112425.GA10696@sleipnir.bytereef.org> Antoine Pitrou wrote: > > Very nice, thanks. 
If I was to make a blasphemous suggestion I would > > even target it for Python 3.4. (No, seriously, this is a big issue > > - see the recent discussion by Armin - and the big names involved show > > that it is a major holdup of 3.x uptake.) It would of course depend > > a lot on how much code from unicode formatting can be retained or > > adapted as opposed to a rewrite from scratch. > > From what I've seen of the unicode formatting code, a lot would have to > be rewritten or refactored. It is a non-trivial task, definitely > inappropriate for 3.4. I do not know the stringlib well enough, so I have a silly question: Would it be possible to re-use the 2.x stringlib just for the bytes type, name it byteslib and disable features as appropriate? Stefan Krah From g.brandl at gmx.net Tue Jan 7 12:29:32 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 07 Jan 2014 12:29:32 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140107121618.5110a906@fsol> References: <20140107121618.5110a906@fsol> Message-ID: Am 07.01.2014 12:16, schrieb Antoine Pitrou: > On Tue, 07 Jan 2014 11:33:55 +0100 > Georg Brandl wrote: >> >> The proposal would be to focus entirely on addressing these roadblocks >> in the 3.5 version, and no other new features -- the release cycle >> needn't be 18 months for this one. This is similar to the moratorium >> for 3.2, but that one came too early for 3.x porting to really profit. > > The moratorium was for alternate Python implementations IIRC, not for > porting third-party libraries. Yes, but this would be a similar moratorium with another purpose. >> It would be very cool to have multiple projects working together with >> us for this, and at the release of 3.5 final, present (say) a Mercurial >> that works on 2.5 and 3.5. > > You seem to be forgetting that we are only one part of the equation > here. Unless you want to tackle Mercurial and Twisted porting yourself? > Good luck with that. 
No no, I did not forget :) that's why I wrote "working together with them". It would need to be coordinated with the external projects, but from what I've seen there are willing people. Georg From stephen at xemacs.org Tue Jan 7 13:26:20 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 07 Jan 2014 21:26:20 +0900 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140106144452.1a472ea1@fsol> <20140106145907.6bfcfb64@fsol> Message-ID: <874n5ghtar.fsf@uwakimon.sk.tsukuba.ac.jp> Is this really a good idea? PEP 460 proposes rather different semantics for bytes.format and the bytes % operator from the str versions. I think this is going to be both confusing and a continuous target for "further improvement" until the two implementations converge. Nick Coghlan writes: >I still don't think the 2.x bytestring is inherently evil, it's just >the wrong type to use as the core text type because of the problems >it has with silently creating mojibake and also with multi-byte >codecs and slicing. The current python-ideas thread is close to >convincing me even a stripped down version isn't a good idea, though >:P Lack of it is obviously a major pain point for many developers, but -- it is inherently evil. It's a structured data type passed around as an unstructured blob of memory, with no way for one part of the program to determine what (if anything) another part of the program thinks it's doing. It's the Python equivalent to the pointer type aliasing that gcc likes to whine about. Given that most wire protocols that benefit from this kind of thing are based on ASCII-coded commands and parameters, I think there's a better alternative to either adding 2.x bytestrings as a separate type or to PEP 460. This is to add a (minimal) structure we could call "ASCII-compatible byte array" to the current set of Unicode representations. 
The detailed proposal is on -ideas (where I call it "7-bit representation", but that has already caused misunderstanding.)

This representation would treat non-ASCII bytes as the current representations do bytes encoded as surrogates. This representation would be produced only by a special "ascii-compatible" codec (which implies the surrogateescape-like behavior). It has the following advantages for bytestring-type processing:

- double-encoding/decoding is not possible
- uninterpreted bytes are marked as such -- they can be compared for
  equality, but other character manipulations are no-ops.
- representation is efficient
- output via the 'ascii-compatible' codec is just memcpy
- input via the 'ascii-compatible' codec is reasonably efficient
  (in the posted proposal detection of non-ASCII bytes is required,
  so it cannot be just memcpy)
- str operations are all available; only on I/O is any additional
  overhead imposed compared to str

There's one other possible advantage that I haven't thought through yet: compatibility with 2.x literals (eg, "inputstring.find('To:')" instead of "inputbytes.find(b'To:')").

It probably does impose overhead compared to bytes, especially with the restricted functionality Victor proposes for .format() on bytes, but as Victor points out so does any full-featured string-style processing vs. low-level operations like .join(). I suppose it would be acceptable, except possibly the extra copying for I/O.

The main disadvantage is additional complexity in the implementation of the str type. I don't think it imposes much runtime overhead, however, since the checks for different representations when operating on str must be done anyway. Operations involving "ascii-compatible" and other representations at the same time should be rare, except for the combinations of "ascii-compatible" and 8-bit representations -- which just involve copying bytes as between 8-bit and 8-bit, plus a bit of logic to set the type correctly.
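
For readers unfamiliar with the surrogateescape behaviour this proposal leans on, here is a minimal sketch. It uses the existing 'ascii' codec with the standard 'surrogateescape' error handler, which is an assumption made for illustration: the "ascii-compatible" codec described above is only a proposal and is not what this code calls. The header bytes are made up.

```python
# Illustration only: this uses the standard 'surrogateescape' error
# handler, not the hypothetical "ascii-compatible" codec from the proposal.
raw = b"To: caf\xe9@example.org"  # made-up wire data with one non-ASCII byte

# Decoding: ASCII bytes become ordinary characters, while byte 0xE9 is
# smuggled through as the lone surrogate U+DCE9 so it can be recovered.
text = raw.decode("ascii", errors="surrogateescape")

# Ordinary str operations work on the ASCII parts, with str literals.
position = text.find("To:")

# Encoding: the surrogate is turned back into the original byte.
roundtrip = text.encode("ascii", errors="surrogateescape")
```

The uninterpreted byte stays marked as such inside the str (it is a surrogate, not a real character), and the round trip returns the original bytes unchanged, which is the property the advantages listed above depend on.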
Steve From storchaka at gmail.com Tue Jan 7 14:22:06 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 07 Jan 2014 15:22:06 +0200 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: Most popular formatting codes in Mercurial sources:

   2519 %s
    493 %d
    102 %r
     48 %Y
     47 %M
     41 %H
     39 %S
     38 %m
     33 %i
     29 %b
     23 %ld
     19 %ln
     12 %.3f
     10 %a
     10 %.1f
      9 %(val)r
      9 %p
      9 %.2f
      8 %I
      6 %n
      5 %(val)s
      5 %.0f
      5 %02x
      4 %f
      4 %c
      4 %12s
      3 %(user)s
      3 %(id)s
      3 %h
      3 %(bzdir)s
      3 %0.2f
      3 %02d

From dholth at gmail.com Tue Jan 7 15:08:57 2014 From: dholth at gmail.com (Daniel Holth) Date: Tue, 7 Jan 2014 09:08:57 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: +1 I have always been delighted that it is possible to manipulate binary data in Python using string operations. It's not just immoral non-Unicode text processing. A poor man's ASN.1 generator is an example of a very non-text thing that might be convenient to write with a few %s fill-in-the-blanks. Isn't it true that if you have bytes > 127 or surrogate escapes then encoding to latin1 is no longer as fast as memcpy? On Tue, Jan 7, 2014 at 8:22 AM, Serhiy Storchaka wrote:
> Most popular formatting codes in Mercurial sources:
>
>    2519 %s
>     493 %d
>     102 %r
>      48 %Y
>      47 %M
>      41 %H
>      39 %S
>      38 %m
>      33 %i
>      29 %b
>      23 %ld
>      19 %ln
>      12 %.3f
>      10 %a
>      10 %.1f
>       9 %(val)r
>       9 %p
>       9 %.2f
>       8 %I
>       6 %n
>       5 %(val)s
>       5 %.0f
>       5 %02x
>       4 %f
>       4 %c
>       4 %12s
>       3 %(user)s
>       3 %(id)s
>       3 %h
>       3 %(bzdir)s
>       3 %0.2f
>       3 %02d
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/dholth%40gmail.com
From eric at trueblade.com Tue Jan 7 15:31:08 2014 From: eric at trueblade.com (Eric V.
Smith) Date: Tue, 07 Jan 2014 09:31:08 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140107112425.GA10696@sleipnir.bytereef.org> References: <20140107121308.20fa228b@fsol> <20140107112425.GA10696@sleipnir.bytereef.org> Message-ID: <52CC0FAC.5010103@trueblade.com> On 01/07/2014 06:24 AM, Stefan Krah wrote: > Antoine Pitrou wrote: >>> Very nice, thanks. If I was to make a blasphemous suggestion I would >>> even target it for Python 3.4. (No, seriously, this is a big issue >>> - see the recent discussion by Armin - and the big names involved show >>> that it is a major holdup of 3.x uptake.) It would of course depend >>> a lot on how much code from unicode formatting can be retained or >>> adapted as opposed to a rewrite from scratch. >> >> From what I've seen of the unicode formatting code, a lot would have to >> be rewritten or refactored. It is a non-trivial task, definitely >> inappropriate for 3.4. > > I do not know the stringlib well enough, so I have a silly question: > > Would it be possible to re-use the 2.x stringlib just for the bytes type, > name it byteslib and disable features as appropriate? I do know it pretty well. I think reusing stringlib from either 2.x or 3.x pre-PEP-393 version would be the best way to go about this. Unfortunately, reusing (or sharing) the PEP-393 version currently in 3.4 is probably not realistic. Eric. From hrvoje.niksic at avl.com Tue Jan 7 16:12:16 2014 From: hrvoje.niksic at avl.com (Hrvoje Niksic) Date: Tue, 7 Jan 2014 16:12:16 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: <52CC1950.7080005@avl.com> On 01/07/2014 02:22 PM, Serhiy Storchaka wrote: > Most popular formatting codes in Mercurial sources: > > 2519 %s > 493 %d > 102 %r > 48 %Y > 47 %M > 41 %H > 39 %S > 38 %m > 33 %i > 29 %b [...] Are you sure you're not including str[fp]time formats in the count? 
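To make the question concrete, here is a rough sketch of how such a tally could be produced. It is an assumption for illustration only, not the method actually used for the counts above; the regular expression and the sample source line are invented. It shows that a purely textual scan cannot tell printf-style conversions from strftime directives such as %Y or %H.

```python
import re
from collections import Counter

# Hypothetical pattern for printf-style conversions: optional (name),
# flags, width, precision, length modifier, then a conversion letter.
FORMAT_CODE = re.compile(r"%(?:\([^)]*\))?[-+ #0]*\d*(?:\.\d+)?[hlL]?[A-Za-z]")


def count_format_codes(text):
    """Count every %-code found in a piece of source text."""
    return Counter(FORMAT_CODE.findall(text))


# Invented source line mixing a % format string with a strftime pattern.
sample = 'ui.write("%s: %d\\n" % (name, n)); time.strftime("%Y-%m-%d %H:%M:%S")'
counts = count_format_codes(sample)
```

In this sample the strftime call contributes %Y, %m, %d, %H, %M and %S to the same counter as the genuine %s and %d conversions, so a scan of this kind would inflate the totals.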
From storchaka at gmail.com Tue Jan 7 16:26:20 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 07 Jan 2014 17:26:20 +0200 Subject: [Python-Dev] The desired behaviour for resolve() when the path doesn't exist In-Reply-To: References: Message-ID: 06.01.14 12:38, Vajrasky Kok wrote: > This is related with ticket 19717: "resolve() fails when the path > doesn't exist". > > Assuming /home/cutecat exists but not /home/cutecat/aa, > > what is the desired output of > Path('/home/cutecat/aa/bb/cc').resolve(strict=False)? > > Should it be: > > "/home/cutecat" (the existed path only), > "/home/cutecat/aa" (the first non-existed path; my current strategy), or > "/home/cutecat/aa/bb/cc" (the default behaviour of os.path.realpath)? The readlink command has three canonicalize modes:

`-f' `--canonicalize' Activate canonicalize mode. If any component of the file name except the last one is missing or unavailable, `readlink' produces no output and exits with a nonzero exit code. A trailing slash is ignored.

`-e' `--canonicalize-existing' Activate canonicalize mode. If any component is missing or unavailable, `readlink' produces no output and exits with a nonzero exit code. A trailing slash requires that the name resolve to a directory.

`-m' `--canonicalize-missing' Activate canonicalize mode. If any component is missing or unavailable, `readlink' treats it as a directory.

Behavior of os.path.realpath() is equivalent to --canonicalize-missing. Current behavior of pathlib.Path.resolve() is equivalent to --canonicalize-existing. Behavior of --canonicalize-existing can be derived from --canonicalize: just check that the resulting path exists. But other modes can't be derived from --canonicalize-existing.
    import errno

    def resolve_existing(path):
        # Assumes resolve() behaves like --canonicalize (non-strict on the last component).
        path = path.resolve()
        if not path.exists():
            raise FileNotFoundError(errno.ENOENT, 'No such file or directory: %r' % str(path))
        return path

So perhaps two main modes should be --canonicalize (default) and --canonicalize-missing (with missing=True)? From stephen at xemacs.org Tue Jan 7 16:36:48 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 08 Jan 2014 00:36:48 +0900 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: <87wqibhkhb.fsf@uwakimon.sk.tsukuba.ac.jp> Daniel Holth writes: > Isn't it true that if you have bytes > 127 or surrogate escapes then > encoding to latin1 is no longer as fast as memcpy? Be careful. As phrased, the question makes no sense. You don't "have bytes" when you are encoding, you have characters. If you mean "what happens when my str contains characters in the range 128-255?", the answer is that encoding a str in 8-bit representation to latin1 is effectively memcpy. If you read in latin1, it's memcpy all the way (unless you combine it with a non-latin1 string, in which case you're in the cases below). If you mean "what happens when my str contains characters in the range > 255", you have to truncate 16-bit units to 8 bit units; no memcpy. Surrogates require >= 16 bits; no memcpy. From martin at v.loewis.de Tue Jan 7 16:29:41 2014 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Jan 2014 16:29:41 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: <52CC1D65.2010203@v.loewis.de> Am 07.01.14 15:08, schrieb Daniel Holth: > Isn't it true that if you have bytes > 127 or surrogate escapes then > encoding to latin1 is no longer as fast as memcpy? You mean "decoding from latin1" (i.e. bytes to string)? No, the opposite is true. It couldn't use memcpy before, but does now (see _PyUnicode_FromUCS1).
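The latin1 behaviour described in this exchange can be checked from Python without reading the C implementation. This sketch only demonstrates the semantics (each byte maps to the code point with the same value); it says nothing about whether a given build actually uses memcpy. The helper name is made up for illustration.

```python
# Every byte value decodes to the code point with the same number,
# which is why an 8-bit str and its latin-1 encoding can share a layout.
data = bytes(range(256))
text = data.decode("latin-1")
code_points = [ord(ch) for ch in text]
round_trip = text.encode("latin-1")


def encodable_as_latin1(s):
    """Hypothetical helper: True if s has no character above U+00FF."""
    try:
        s.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True
```

Characters above 255, including lone surrogates produced by surrogateescape, make `encodable_as_latin1` return False, which corresponds to the "no memcpy" cases in the reply.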
Regards, Martin From stefan_ml at behnel.de Tue Jan 7 18:58:44 2014 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 07 Jan 2014 18:58:44 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: Victor Stinner, 06.01.2014 14:24: > ``struct.pack()`` is incomplete. For example, a number cannot be > formatted as decimal and it does not support padding bytes string. Then what about extending the struct module in a way that makes it cover more use cases like these? Stefan From victor.stinner at gmail.com Tue Jan 7 19:14:56 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 7 Jan 2014 19:14:56 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: 2014/1/7 Stefan Behnel : > Victor Stinner, 06.01.2014 14:24: >> ``struct.pack()`` is incomplete. For example, a number cannot be >> formatted as decimal and it does not support padding bytes string. > > Then what about extending the struct module in a way that makes it cover > more use cases like these? The idea of the PEP is to simplify the porting work of Twisted and Mercurial developers. So the same code should work on Python 2 and Python 3. Extending struct features would not help. This is like adding a new type or function in a third-party module: it also requires modifying the source code for Python 2. And struct.pack() does not even support "%s"; the current format for bytes strings requires specifying the length of the string in the format. Juraj Sukop asked me privately to support floating-point numbers in PEP 460 for his PDF generator. Would you really like to add many features to the struct module? Padding, formatting integers as decimal (maybe also binary, octal and hexadecimal), formatting floating-point numbers as decimal, etc.?
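The limitations of struct.pack() mentioned in this message are easy to see in a few lines. This is only an illustration using the standard struct module; the payload and header values are invented.

```python
import struct

payload = b"hello"   # made-up sample data

# The "s" code needs an explicit byte count baked into the format string.
packed = struct.pack("%ds" % len(payload), payload)

# A count that is too small truncates silently; a larger one pads with NULs.
truncated = struct.pack("3s", payload)
padded = struct.pack("8s", payload)

# Integers are written in binary, never as decimal digits.
binary = struct.pack(">H", 200)

# Decimal output for a text-like protocol has to go through str formatting
# followed by an explicit encode.
header = ("Content-Length: %d\r\n" % 200).encode("ascii")
```

The last line is the kind of detour through str that bytes formatting is meant to avoid in code shared between Python 2 and Python 3.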
Victor From a.badger at gmail.com Tue Jan 7 19:57:44 2014 From: a.badger at gmail.com (Toshio Kuratomi) Date: Tue, 7 Jan 2014 10:57:44 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <874n5ghtar.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140106144452.1a472ea1@fsol> <20140106145907.6bfcfb64@fsol> <874n5ghtar.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20140107185744.GB3012@rockytop.lan> On Tue, Jan 07, 2014 at 09:26:20PM +0900, Stephen J. Turnbull wrote: > Is this really a good idea? PEP 460 proposes rather different > semantics for bytes.format and the bytes % operator from the str > versions. I think this is going to be both confusing and a continuous > target for "further improvement" until the two implementations > converge. > Reading about the proposed differences reminded me of how in older python2 versions unicode() took keyword arguments but str.decode() only took positional arguments. I squashed a lot of trivial bugs in people's code where that difference wasn't anticipated. In later python2 versions both of those came to understand how to take their arguments as keywords, which saved me from further unnecessary pain. -Toshio From solipsis at pitrou.net Tue Jan 7 20:53:08 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 7 Jan 2014 20:53:08 +0100 Subject: [Python-Dev] Changing Clinic's output Message-ID: <20140107205308.05e1b5ce@fsol> Hello, Several core developers are a bit unhappy with the way Argument Clinic currently scatters generated code into hand-written C modules. The opinion is that it makes C files more confusing and annoying to navigate through.
Several solutions have been proposed:

- move all generated code to separate C files, which would then be #include'd into the main module file
- gather all generated code to a single place in the C module file, for example near the end (Larry's "accumulator" idea)
- prefix all Clinic-generated lines with a recognizable marker, e.g. "/* AC */"

What do you think? Regards Antoine. From benjamin at python.org Tue Jan 7 20:54:48 2014 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 07 Jan 2014 11:54:48 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <20140107205308.05e1b5ce@fsol> References: <20140107205308.05e1b5ce@fsol> Message-ID: <1389124488.19234.67800077.4D82EAF3@webmail.messagingengine.com> On Tue, Jan 7, 2014, at 11:53 AM, Antoine Pitrou wrote: > > Hello, > > Several core developers are a bit unhappy with the way Argument Clinic > currently scatters generated code into hand-written C modules. The > opinion is that it makes C files more confusing and annoying to > navigate through. > > Several solutions have been proposed: > - move all generated code to separate C files, which would then be > #include'd into the main module file +1 I believe this is the "standard" solution for code generation. For example, Qt's moc uses it. From ethan at stoneleaf.us Tue Jan 7 21:03:00 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jan 2014 12:03:00 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <20140107205308.05e1b5ce@fsol> References: <20140107205308.05e1b5ce@fsol> Message-ID: <52CC5D74.2080608@stoneleaf.us> On 01/07/2014 11:53 AM, Antoine Pitrou wrote: > > - move all generated code to separate C files, which would then be > #include'd into the main module file -1 (Guido has stated a strong dislike for this method) > - gather all generated code to a single place in the C module file, for > example near the end (Larry's "accumulator" idea) +1 > - prefix all Clinic-generated lines with a recognizable marker, e.g.
> "/* AC */" +0 -- ~Ethan~ From larry at hastings.org Tue Jan 7 20:57:59 2014 From: larry at hastings.org (Larry Hastings) Date: Tue, 07 Jan 2014 11:57:59 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <20140107205308.05e1b5ce@fsol> References: <20140107205308.05e1b5ce@fsol> Message-ID: <52CC5C47.1090003@hastings.org> On 01/07/2014 11:53 AM, Antoine Pitrou wrote: > Hello, > > Several core developers are a bit unhappy with the way Argument Clinic > currently scatters generated code into hand-written C modules. The > opinion is that it makes C files more confusing and annoying to > navigate through. > > Several solutions have been proposed: > - move all generated code to separate C files, which would then be > #include'd into the main module file > > - gather all generated code to a single place in the C module file, for > example near the end (Larry's "accumulator" idea) > > - prefix all Clinic-generated lines with a recognizable marker, e.g. > "/* AC */" > > What do you think? For what it's worth, I don't have a strong opinion about it. I had the first one (separate files) working at one point, and as memory serves Guido said he didn't like that approach and that I should remove the feature. I'm happy for Argument Clinic to do any/all/none of the above. //arry/ From solipsis at pitrou.net Tue Jan 7 21:07:40 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 7 Jan 2014 21:07:40 +0100 Subject: [Python-Dev] Changing Clinic's output References: <20140107205308.05e1b5ce@fsol> <52CC5D74.2080608@stoneleaf.us> Message-ID: <20140107210740.7f18909b@fsol> On Tue, 07 Jan 2014 12:03:00 -0800 Ethan Furman wrote: > On 01/07/2014 11:53 AM, Antoine Pitrou wrote: > > > > - move all generated code to separate C files, which would then be > > #include'd into the main module file > > -1 (Guido has stated a strong dislike for this method) Is it your own opinion too?
Otherwise it shouldn't count as a -1. From solipsis at pitrou.net Tue Jan 7 21:28:41 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 7 Jan 2014 21:28:41 +0100 Subject: [Python-Dev] The desired behaviour for resolve() when the path doesn't exist References: Message-ID: <20140107212841.6ec5acc3@fsol> On Tue, 07 Jan 2014 17:26:20 +0200 Serhiy Storchaka wrote: > > Behavior of --canonicalize-existing can be derived from --canonicalize, > just check that resulting patch exists. But other modes can't be derived > from --canonicalize-existing. > > def resolve_existing(path): > path = path.resolve() > if not path.exists(): > raise FileNotFoundError(errno.ENOENT, 'No such file or > directory: %r' % str(path)) > return path > > So perhaps two main modes should be --canonicalize (default) and > --canonicalize-missing (with missing=True)? That sounds reasonable. And I think strict should be the default. Regards Antoine. From barry at python.org Tue Jan 7 21:32:16 2014 From: barry at python.org (Barry Warsaw) Date: Tue, 7 Jan 2014 15:32:16 -0500 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <20140107205308.05e1b5ce@fsol> References: <20140107205308.05e1b5ce@fsol> Message-ID: <20140107153216.5e25ce31@anarchist.wooz.org> On Jan 07, 2014, at 08:53 PM, Antoine Pitrou wrote: >- move all generated code to separate C files, which would then be > #included'd into the main module file I'm not a big fan of this approach either, but maybe not as vehemently, so -0. >- gather all generated code to a single place in the C module file, for > example near the end (Larry's "accumulator" idea) +1 >- prefix all Clinic-generated lines with a recognizable marker, e.g. 
> "/* AC */" +0 -Barry From storchaka at gmail.com Tue Jan 7 21:39:02 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 07 Jan 2014 22:39:02 +0200 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <20140107205308.05e1b5ce@fsol> References: <20140107205308.05e1b5ce@fsol> Message-ID: 07.01.14 21:53, Antoine Pitrou wrote: > Several core developers are a bit unhappy with the way Argument Clinic > currently scatters generated code into hand-written C modules. The > opinion is that it makes C files more confusing and annoying to > navigate through. > > Several solutions have been proposed: > - move all generated code to separate C files, which would then be > #include'd into the main module file Only this option will solve all my issues. My arguments against current behavior:

* It increases the number of lines of code.

     now   with Argument Clinic
    1770   2704   Modules/audioop.c
    1572   1997   Modules/binascii.c
    3772   4558   Modules/_elementtree.c
    2712   3360   Modules/_sre.c
    3060   3742   Modules/_tkinter.c

  More PageUp/PageDown presses are needed to page through the sources, and you have to be more precise when positioning the scrollbar.
* It adds a lot of names which clutter up lists for navigation in your editor/IDE. For example, where there is now only one name for the a2b_uu function in the navigation list, with Argument Clinic there are three names: BINASCII_A2B_UU_METHODDEF, binascii_a2b_uu, and binascii_a2b_uu_impl (and only the last is of interest to humans). If the list of names now fits on one screen, with Argument Clinic it will need three screens.
* It makes it harder to use search for navigation. Now a2b_uu occurs in the source file 6 times, and with Argument Clinic it will occur 13 times (however, moving generated code to a separate file will decrease this number to 3).
* It mixes manually written code with generated boilerplate.
* It clutters up hg log and hg blame results.
Every time you change clinic.py to generate different output, it touches multiple lines in all files which use Argument Clinic and clutters up their history.
* It makes the code more error-prone. People can edit generated code instead of the Clinic declaration.

I have converted enough code to Argument Clinic in the last few days, and I have seen how people work with already converted code, so I know what I am talking about. If this doesn't convince you, I don't know what I can add. From storchaka at gmail.com Tue Jan 7 21:45:54 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 07 Jan 2014 22:45:54 +0200 Subject: [Python-Dev] The desired behaviour for resolve() when the path doesn't exist In-Reply-To: <20140107212841.6ec5acc3@fsol> References: <20140107212841.6ec5acc3@fsol> Message-ID: 07.01.14 22:28, Antoine Pitrou wrote: >> So perhaps two main modes should be --canonicalize (default) and >> --canonicalize-missing (with missing=True)? > > That sounds reasonable. And I think strict should be the default. --canonicalize is not strict. --canonicalize-existing is the most strict and --canonicalize-missing is the least strict. When you have a function with non-strict behavior (--canonicalize), you can implement a wrapper with strict behavior (--canonicalize-existing), but not vice versa. From breamoreboy at yahoo.co.uk Tue Jan 7 21:46:41 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Tue, 07 Jan 2014 20:46:41 +0000 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <20140107205308.05e1b5ce@fsol> References: <20140107205308.05e1b5ce@fsol> Message-ID: On 07/01/2014 19:53, Antoine Pitrou wrote: > > Hello, > > Several core developers are a bit unhappy with the way Argument Clinic > currently scatters generated code into hand-written C modules. The > opinion is that it makes C files more confusing and annoying to > navigate through.
> > Several solutions have been proposed: > - move all generated code to separate C files, which would then be > #included'd into the main module file > > - gather all generated code to a single place in the C module file, for > example near the end (Larry's "accumulator" idea) > > - prefix all Clinic-generated lines with a recognizable marker, e.g. > "/* AC */" > > What do you think? > > Regards > > Antoine. > > Maybe overkill but why not follow 3 with 2 at the end of the file, the marker to be a very clear /* Generated by Argument Clinic - DO NOT EDIT BELOW THIS LINE */ or whatever wording is appropriate in this case. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From solipsis at pitrou.net Tue Jan 7 21:48:06 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 7 Jan 2014 21:48:06 +0100 Subject: [Python-Dev] The desired behaviour for resolve() when the path doesn't exist References: <20140107212841.6ec5acc3@fsol> Message-ID: <20140107214806.1378f9e9@fsol> On Tue, 07 Jan 2014 22:45:54 +0200 Serhiy Storchaka wrote: > 07.01.14 22:28, Antoine Pitrou ???????(??): > >> So perhaps two main modes should be --canonicalize (default) and > >> --canonicalize-missing (with missing=True)? > > > > That sounds reasonable. And I think strict should be the default. > > --canonicalize is not strict. --canonicalize-existing is most strict and > --canonicalize-missing is least strict. When you have a function which > have non-strict behavior (--canonicalize), you can implement a wrapper > with strict behavior (--canonicalize-existing), but not vice verse. Yes, I meant --canonicalize should be the default. Regards Antoine. 
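The three readlink modes discussed in this thread can be approximated with today's standard library. This is only a sketch under stated assumptions: it relies on the `strict` parameter that `Path.resolve()` gained in later Python releases (it was still under discussion here), the function names are invented, and the `canonicalize` helper is simplified in that it does not follow a final component that is itself a symlink.

```python
import os
import tempfile
from pathlib import Path


def canonicalize_existing(path):
    """Like readlink -e: every component must exist."""
    return Path(path).resolve(strict=True)   # raises FileNotFoundError otherwise


def canonicalize_missing(path):
    """Like readlink -m: missing components are kept as plain names."""
    return Path(os.path.realpath(str(path)))


def canonicalize(path):
    """Like readlink -f (simplified): all but the last component must exist."""
    p = Path(path)
    return p.parent.resolve(strict=True) / p.name


base = Path(tempfile.mkdtemp()).resolve()   # scratch directory that exists
one_missing = base / "aa"                   # only the last component is missing
two_missing = base / "aa" / "bb"            # an intermediate component is missing
```

With these helpers, `canonicalize(one_missing)` succeeds while `canonicalize(two_missing)` raises, and the strictest mode can be layered on top of `canonicalize` by adding an existence check, as argued in the messages above.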
From ethan at stoneleaf.us Tue Jan 7 21:33:11 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jan 2014 12:33:11 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <20140107210740.7f18909b@fsol> References: <20140107205308.05e1b5ce@fsol> <52CC5D74.2080608@stoneleaf.us> <20140107210740.7f18909b@fsol> Message-ID: <52CC6487.7030607@stoneleaf.us> On 01/07/2014 12:07 PM, Antoine Pitrou wrote: > On Tue, 07 Jan 2014 12:03:00 -0800 > Ethan Furman wrote: >> On 01/07/2014 11:53 AM, Antoine Pitrou wrote: >>> >>> - move all generated code to separate C files, which would then be >>> #included'd into the main module file >> >> -1 (Guido has stated a strong dislike for this method) > > Is it your own opinion too? Otherwise it shouldn't count as a -1. Yes it is. -- ~Ethan~ From solipsis at pitrou.net Tue Jan 7 22:04:18 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 7 Jan 2014 22:04:18 +0100 Subject: [Python-Dev] Changing Clinic's output References: <20140107205308.05e1b5ce@fsol> <52CC5D74.2080608@stoneleaf.us> <20140107210740.7f18909b@fsol> <52CC6487.7030607@stoneleaf.us> Message-ID: <20140107220418.2671cf88@fsol> On Tue, 07 Jan 2014 12:33:11 -0800 Ethan Furman wrote: > On 01/07/2014 12:07 PM, Antoine Pitrou wrote: > > On Tue, 07 Jan 2014 12:03:00 -0800 > > Ethan Furman wrote: > >> On 01/07/2014 11:53 AM, Antoine Pitrou wrote: > >>> > >>> - move all generated code to separate C files, which would then be > >>> #included'd into the main module file > >> > >> -1 (Guido has stated a strong dislike for this method) > > > > Is it your own opinion too? Otherwise it shouldn't count as a -1. > > Yes it is. Would you care to elaborate on why you're against it? Regards Antoine. 
From ethan at stoneleaf.us Tue Jan 7 21:51:27 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jan 2014 12:51:27 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: References: <20140107205308.05e1b5ce@fsol> Message-ID: <52CC68CF.5020806@stoneleaf.us> On 01/07/2014 12:39 PM, Serhiy Storchaka wrote: > 07.01.14 21:53, Antoine Pitrou wrote: >> >> - move all generated code to separate C files, which would then be >> #include'd into the main module file > > Only this option will solve all my issues. > > My arguments against current behavior: [snip] > * It clutters up hg log and hg blame results. Every time when you change clinic.py to generate different output, it > touches multiple lines in all files which use Argument Clinic and clutters up their history. I think this is the reason to focus on -- the others seem like editor issues, or easily resolved by the second or third options. -- ~Ethan~ From tseaver at palladion.com Tue Jan 7 22:34:03 2014 From: tseaver at palladion.com (Tres Seaver) Date: Tue, 07 Jan 2014 16:34:03 -0500 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <20140107205308.05e1b5ce@fsol> References: <20140107205308.05e1b5ce@fsol> Message-ID: On 01/07/2014 02:53 PM, Antoine Pitrou wrote: > - prefix all Clinic-generated lines with a recognizable marker, e.g. > "/* AC */" +1. I would wrap generated code in even-more-visually-distinct markers, both before and after, e.g.::

    /* ------------------------- Begin ArgumentClinic ---------------- */
    /* ------------------------- End ArgumentClinic ------------------ */

I think delineating gencode blocks this way makes it easier to ignore them (or find them, if needed). Tres.
-- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com From ethan at stoneleaf.us Tue Jan 7 22:15:08 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jan 2014 13:15:08 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <20140107220418.2671cf88@fsol> References: <20140107205308.05e1b5ce@fsol> <52CC5D74.2080608@stoneleaf.us> <20140107210740.7f18909b@fsol> <52CC6487.7030607@stoneleaf.us> <20140107220418.2671cf88@fsol> Message-ID: <52CC6E5C.7090509@stoneleaf.us> On 01/07/2014 01:04 PM, Antoine Pitrou wrote: > On Tue, 07 Jan 2014 12:33:11 -0800 > Ethan Furman wrote: >> On 01/07/2014 12:07 PM, Antoine Pitrou wrote: >>> On Tue, 07 Jan 2014 12:03:00 -0800 >>> Ethan Furman wrote: >>>> On 01/07/2014 11:53 AM, Antoine Pitrou wrote: >>>>> >>>>> - move all generated code to separate C files, which would then be >>>>> #included'd into the main module file >>>> >>>> -1 (Guido has stated a strong dislike for this method) >>> >>> Is it your own opinion too? Otherwise it shouldn't count as a -1. >> >> Yes it is. > > Would you care to elaborate on why you're against it? Seriously? Are you going to now ask all the other respondents who didn't explain themselves to do so? I don't care for it because I like to have all the code be in one file. I will say that Serhiy's comment about code churn has given me some pause to think... Okay, changing my vote to: - Use both /* AC */ markers for every line *and* have all the code be in one spot +1 This way the diffs will easily be clear on what was code generator and what was human. (Thanks, Breamoreboy, for the idea!
;) -- ~Ethan~ From stefan at bytereef.org Tue Jan 7 22:44:33 2014 From: stefan at bytereef.org (Stefan Krah) Date: Tue, 7 Jan 2014 22:44:33 +0100 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <20140107205308.05e1b5ce@fsol> References: <20140107205308.05e1b5ce@fsol> Message-ID: <20140107214433.GA15046@sleipnir.bytereef.org> Antoine Pitrou wrote: > Several solutions have been proposed: > - move all generated code to separate C files, which would then be > #included'd into the main module file +1 for the reasons that Serhiy has listed. Additionally, if custom parsers are implemented, the generated code will take up even more space (look e.g. at Cython's custom parsers). Stefan Krah From barry at python.org Tue Jan 7 21:46:50 2014 From: barry at python.org (Barry Warsaw) Date: Tue, 7 Jan 2014 15:46:50 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: <20140107154650.0fd63db1@anarchist.wooz.org> On Jan 07, 2014, at 10:40 AM, Georg Brandl wrote: >Very nice, thanks. If I was to make a blasphemous suggestion I would >even target it for Python 3.4. (No, seriously, this is a big issue >- see the recent discussion by Armin - and the big names involved show >that it is a major holdup of 3.x uptake.) It would of course depend >a lot on how much code from unicode formatting can be retained or >adapted as opposed to a rewrite from scratch. I think we should be willing to entertain breaking feature freeze for getting this in Python 3.4. It's a serious enough problem, and Python 3.4 will be fairly widely distributed. For example, it will be a supported version in the next Debian release and in Ubuntu 14.04 LTS, and *possibly* the default Python 3 version. However, I think we'd need to see how disruptive the code changes are first, and get good review of any proposed patches. Larry and Guido would have to be on board with the exemption as well. 
If adopted for Python 3.4, PEP 460 should be modest in its goals, but I think I'd still like to see the following excluded and unknown features added:

* Attribute access: {obj.attr}
* Indexing: {dict[key]}
* format keywords? b'{arg}'.format(arg=5)
* str % dict ? b'%(arg)s' % {'arg': 5}

These are just lookup mechanisms for finding the wanted interpolation value and don't have encoding or conversion effects. Cheers, -Barry From skip at pobox.com Tue Jan 7 23:11:09 2014 From: skip at pobox.com (Skip Montanaro) Date: Tue, 7 Jan 2014 16:11:09 -0600 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140107154650.0fd63db1@anarchist.wooz.org> References: <20140107154650.0fd63db1@anarchist.wooz.org> Message-ID: On Tue, Jan 7, 2014 at 2:46 PM, Barry Warsaw wrote: > I think we should be willing to entertain breaking feature freeze for getting > this in Python 3.4. Maybe you could revert 3.4 to alpha status and give it a cycle or two there to get this done before returning to beta status. Skip From solipsis at pitrou.net Tue Jan 7 23:13:44 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 7 Jan 2014 23:13:44 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: <20140107154650.0fd63db1@anarchist.wooz.org> Message-ID: <20140107231344.1ae4e02d@fsol> On Tue, 7 Jan 2014 15:46:50 -0500 Barry Warsaw wrote: > > If adopted for Python 3.4, PEP 460 should be modest in its goals, but I think > I'd still like to see the following excluded and unknown features added: > > * Attribute access: {obj.attr} > * Indexing: {dict[key]} > * format keywords? b'{arg}'.format(arg=5) > * str % dict ? b'%(arg)s' % {'arg': 5} I don't think integer values should be supported. Regards Antoine.
From brett at python.org Tue Jan 7 23:21:33 2014 From: brett at python.org (Brett Cannon) Date: Tue, 7 Jan 2014 17:21:33 -0500 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <20140107214433.GA15046@sleipnir.bytereef.org> References: <20140107205308.05e1b5ce@fsol> <20140107214433.GA15046@sleipnir.bytereef.org> Message-ID: On Tue, Jan 7, 2014 at 4:44 PM, Stefan Krah wrote: > Antoine Pitrou wrote: > > Several solutions have been proposed: > > - move all generated code to separate C files, which would then be > > #included'd into the main module file > > +1 for the reasons that Serhiy has listed. Additionally, if custom parsers > are implemented, the generated code will take up even more space (look e.g. > at Cython's custom parsers). > Guido has already said he hates constructing files that way so that simply isn't going to happen. I personally don't care about this whole discussion (and I suspect people being quiet don't either). At this point the amount of arguing on this topic could have been used more constructively converting code and then, if necessary, tweaking the output of Argument Clinic later. From stefan at bytereef.org Tue Jan 7 23:27:00 2014 From: stefan at bytereef.org (Stefan Krah) Date: Tue, 7 Jan 2014 23:27:00 +0100 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: References: <20140107205308.05e1b5ce@fsol> <20140107214433.GA15046@sleipnir.bytereef.org> Message-ID: <20140107222700.GA15496@sleipnir.bytereef.org> Brett Cannon wrote: > I personally don't care about this whole discussion (and I suspect people being > quiet don't either). At this point the amount of arguing on this topic could > have been used more constructively converting code and then, if necessary, > tweaking the output of Argument Clinic later. Serhiy, who started the discussion in another thread, is converting modules at a rapid pace.
Stefan Krah From barry at python.org Tue Jan 7 21:48:01 2014 From: barry at python.org (Barry Warsaw) Date: Tue, 7 Jan 2014 15:48:01 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: <20140107154801.56daddea@anarchist.wooz.org> On Jan 07, 2014, at 11:13 AM, Victor Stinner wrote: >Twisted and Mercurial don't support Python 3. > >(I heard that Twisted Core supports Python 3, but I don't know if it's >true nor the Python 3 version.) Parts of Twisted do run on Python 3 (and are even available in Ubuntu), but if PEP 460 helps speed up the transition of the rest of the suite, I'm all for trying to squeeze it into 3.4. -Barry From barry at python.org Tue Jan 7 21:50:24 2014 From: barry at python.org (Barry Warsaw) Date: Tue, 7 Jan 2014 15:50:24 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <88224047-95EB-4F3F-B66C-4784B24549FE@stufft.io> References: <88224047-95EB-4F3F-B66C-4784B24549FE@stufft.io> Message-ID: <20140107155024.606497d8@anarchist.wooz.org> On Jan 07, 2014, at 05:16 AM, Donald Stufft wrote: >Given the low adoption rates for Python 3 it would not surprise me if people >who are hampered by the lack of this change are willing to wait until a Python >version is released that has it. If that means waiting until 3.5, then I disagree. The Python interpreter is the lowest rung of the food chain, so there's a natural delay in having required support percolate up. Imposing another 18 month delay would be unfortunate. (Obviously, if technical matters prevent it, that's another thing.) -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From solipsis at pitrou.net Tue Jan 7 23:34:44 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 7 Jan 2014 23:34:44 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: <88224047-95EB-4F3F-B66C-4784B24549FE@stufft.io> Message-ID: <20140107233444.57ff66d3@fsol> On Tue, 7 Jan 2014 05:16:18 -0500 Donald Stufft wrote: > Given the low adoption rates for Python 3 It would be nice not repeating that mantra since there are no reliable usage figures available. Regards Antoine. From bp at benjamin-peterson.org Tue Jan 7 23:33:26 2014 From: bp at benjamin-peterson.org (Benjamin Peterson) Date: Tue, 07 Jan 2014 14:33:26 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140107154650.0fd63db1@anarchist.wooz.org> References: <20140107154650.0fd63db1@anarchist.wooz.org> Message-ID: <1389134006.2657.67865337.4C46F1B4@webmail.messagingengine.com> On Tue, Jan 7, 2014, at 12:46 PM, Barry Warsaw wrote: > On Jan 07, 2014, at 10:40 AM, Georg Brandl wrote: > > >Very nice, thanks. If I was to make a blasphemous suggestion I would > >even target it for Python 3.4. (No, seriously, this is a big issue > >- see the recent discussion by Armin - and the big names involved show > >that it is a major holdup of 3.x uptake.) It would of course depend > >a lot on how much code from unicode formatting can be retained or > >adapted as opposed to a rewrite from scratch. > > I think we should be willing to entertain breaking feature freeze for > getting > this in Python 3.4. It's a serious enough problem, and Python 3.4 will > be > fairly widely distributed. For example, it will be a supported version > in the > next Debian release and in Ubuntu 14.04 LTS, and *possibly* the default > Python > 3 version. 
> However, I think we'd need to see how disruptive the code changes
> are first, and get good review of any proposed patches. Larry and Guido would
> have to be on board with the exemption as well.

I agree. This is a very important, much-requested feature for low-level
networking code.

>
> If adopted for Python 3.4, PEP 460 should be modest in its goals, but I think
> I'd still like to see the following excluded and unknown features added:
>
> * Attribute access: {obj.attr}
> * Indexing: {dict[key]}
> * format keywords? b'{arg}'.format(arg=5)
> * str % dict ? b'%(arg)s' % {'arg': 5}

Yes, I don't think we need to support very much of the formatting language
to cover 99.8% of formatting cases for bytes.

From ncoghlan at gmail.com Tue Jan 7 23:52:09 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 8 Jan 2014 08:52:09 +1000
Subject: [Python-Dev] Changing Clinic's output
In-Reply-To:
References: <20140107205308.05e1b5ce@fsol> <20140107214433.GA15046@sleipnir.bytereef.org>
Message-ID:

On 8 Jan 2014 06:24, "Brett Cannon" wrote:
>
> On Tue, Jan 7, 2014 at 4:44 PM, Stefan Krah wrote:
>>
>> Antoine Pitrou wrote:
>> > Several solutions have been proposed:
>> > - move all generated code to separate C files, which would then be
>> > #include'd into the main module file
>>
>> +1 for the reasons that Serhiy has listed. Additionally, if custom parsers
>> are implemented, the generated code will take up even more space (look e.g.
>> at Cython's custom parsers).
>
> Guido has already said he hates constructing files that way so that simply isn't going to happen.
>
> I personally don't care about this whole discussion (and I suspect people being quiet don't either). At this point the amount of arguing on this topic could have been used more constructively converting code and then, if necessary, tweaking the output of Argument Clinic later.
I haven't had a chance to look at any of the newly converted code (due to vacation and linux.conf.au), so I'm happy to take the word of the folks doing the conversion that the current behaviour is inconvenient. I think the split VCS history where changing clinic's output without changing the function declarations *doesn't* show up as altering the manually maintained source files (but only clinic and separate generated "XYZ_clinic.c" files) is a compelling practical argument in favour of the split file approach, even if it makes execution jump around a little strangely. Failing that, I like Larry's proposal to switch to generating only prototypes inline and having the wrapper implementations at a common point in each file, reducing the visible boilerplate when looking at individual functions. Cheers, Nick. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Wed Jan 8 00:24:33 2014 From: larry at hastings.org (Larry Hastings) Date: Tue, 07 Jan 2014 15:24:33 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: References: <20140107205308.05e1b5ce@fsol> Message-ID: <52CC8CB1.9070301@hastings.org> On 01/07/2014 12:46 PM, Mark Lawrence wrote: > > Maybe overkill but why not follow 3 with 2 at the end of the file, the > marker to be a very clear /* Generated by Argument Clinic - DO NOT > EDIT BELOW THIS LINE */ or whatever wording is appropriate in this case. > For what it's worth, if we use the "accumulator" approach I propose that the generated code doesn't go at the very end of the file. 
Instead, I suggest they should go *near* the end, below the implementations of the module / class methods, but above the methoddef/type structures and the module init function. My reasoning: when I navigate CPython C files implementing a module or a type, when I know what entry point I want I just search for its name. When I don't know what I want, I jump to the end, then scroll up until I find the name in the init function or the structures. So I wouldn't want the code at the very end; that would screw up that navigation mode. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Wed Jan 8 00:31:00 2014 From: larry at hastings.org (Larry Hastings) Date: Tue, 07 Jan 2014 15:31:00 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52CC68CF.5020806@stoneleaf.us> References: <20140107205308.05e1b5ce@fsol> <52CC68CF.5020806@stoneleaf.us> Message-ID: <52CC8E34.1020601@hastings.org> On 01/07/2014 12:51 PM, Ethan Furman wrote: > On 01/07/2014 12:39 PM, Serhiy Storchaka wrote: >> * It clutters up hg log and hg blame results. Every time when you >> change clinic.py to generate different output, it >> touches multiple lines in all files which use Argument Clinic and >> clutters up their history. > > I think this is the reason to focus on -- the others seem like editor > issues, or easily resolved by the second or third options. I don't think this is a particularly compelling reason. Once things settle down, I'm not anticipating the clinic.py code generator will change very often. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brett at python.org Wed Jan 8 00:38:49 2014 From: brett at python.org (Brett Cannon) Date: Tue, 7 Jan 2014 18:38:49 -0500 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52CC8CB1.9070301@hastings.org> References: <20140107205308.05e1b5ce@fsol> <52CC8CB1.9070301@hastings.org> Message-ID: On Tue, Jan 7, 2014 at 6:24 PM, Larry Hastings wrote: > On 01/07/2014 12:46 PM, Mark Lawrence wrote: > > > Maybe overkill but why not follow 3 with 2 at the end of the file, the > marker to be a very clear /* Generated by Argument Clinic - DO NOT EDIT > BELOW THIS LINE */ or whatever wording is appropriate in this case. > > > For what it's worth, if we use the "accumulator" approach I propose that > the generated code doesn't go at the very end of the file. Instead, I > suggest they should go *near* the end, below the implementations of the > module / class methods, but above the methoddef/type structures and the > module init function. > If it is accumulated in a single location should it just be a single block for everything towards the end? Then forward declarations would go away (you could still have it as a comment to copy-and-paste where you define the implementation) and you can have a single macro for the PyMethodDef values, each class, etc. If you accumulated the PyMethodDef values into a single macro it would help make up for the convenience lost of converting a function by just cutting the old call signature up to the new *_impl() function. > > My reasoning: when I navigate CPython C files implementing a module or a > type, when I know what entry point I want I just search for its name. When > I don't know what I want, I jump to the end, then scroll up until I find > the name in the init function or the structures. So I wouldn't want the > code at the very end; that would screw up that navigation mode. > That's how I navigate as well. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From larry at hastings.org Wed Jan 8 01:07:50 2014 From: larry at hastings.org (Larry Hastings) Date: Tue, 07 Jan 2014 16:07:50 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: References: <20140107205308.05e1b5ce@fsol> <52CC8CB1.9070301@hastings.org> Message-ID: <52CC96D6.6090304@hastings.org> On 01/07/2014 03:38 PM, Brett Cannon wrote: > On Tue, Jan 7, 2014 at 6:24 PM, Larry Hastings > wrote: > > For what it's worth, if we use the "accumulator" approach I > propose that the generated code doesn't go at the very end of the > file. Instead, I suggest they should go *near* the end, below the > implementations of the module / class methods, but above the > methoddef/type structures and the module init function. > > > If it is accumulated in a single location should it just be a single > block for everything towards the end? Then forward declarations would > go away (you could still have it as a comment to copy-and-paste where > you define the implementation) and you can have a single macro for the > PyMethodDef values, each class, etc. If you accumulated the > PyMethodDef values into a single macro it would help make up for the > convenience lost of converting a function by just cutting the old call > signature up to the new *_impl() function. I *think* that would complicate some use cases. People occasionally call these parsing functions from other functions, or spread their methoddef / typeobject structures throughout the file rather than putting them all at the end. I'm proposing that the blob of text immediately between the Clinic input and the body of the impl contain (newlines added here for clarity): static char *parsing_function_doc; static PyObject * parsing_function(...); #define PARSING_FUNCTION_METHODDEF \ { ... } static PyObject * parsing_function_impl(...) Then the "accumulator" would get the text of the docstring and the definition of the parsing_function. 
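The split described above can be pictured with a toy model. This is a sketch only, not clinic.py's actual code: the function split_clinic_output and the example names os_stat and os_chmod are hypothetical, and the emitted C text is abbreviated exactly as in the message. Each converted function gets a small inline block (docstring declaration, prototype, METHODDEF macro, impl header), while the docstring text and the parsing function definition are collected for a single location later in the file.

```python
def split_clinic_output(function_names):
    """Toy model: return (inline blocks per function, accumulated text)."""
    inline_blocks = {}
    accumulator = []
    for name in function_names:
        # What would stay next to the Clinic input for this function.
        inline_blocks[name] = "\n".join([
            "static char *{0}_doc;".format(name),
            "static PyObject *{0}(...);".format(name),
            "#define {0}_METHODDEF {{ ... }}".format(name.upper()),
            "static PyObject *{0}_impl(...)".format(name),
        ])
        # What would be deferred to the accumulator near the end of the file.
        accumulator.append(
            "/* docstring and definition of {0} */".format(name)
        )
    return inline_blocks, "\n".join(accumulator)


inline_blocks, accumulated = split_clinic_output(["os_stat", "os_chmod"])
print(inline_blocks["os_stat"])
print(accumulated)
```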
On the other hand, if we wanted to take this opportunity to force everyone to standardize (all methoddefs and typeobjects go at the end!) we could probably make it work with one giant block near the end. Or I could make it flexible on what went into the accumulator and what went into the normal output block, and the default could be everything-in-the-accumulator. Making the common easy and the uncommon possible and all that. Yeah, that seems best. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Wed Jan 8 01:18:53 2014 From: barry at python.org (Barry Warsaw) Date: Tue, 7 Jan 2014 19:18:53 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140107231344.1ae4e02d@fsol> References: <20140107154650.0fd63db1@anarchist.wooz.org> <20140107231344.1ae4e02d@fsol> Message-ID: <20140107191853.23d90e11@anarchist.wooz.org> On Jan 07, 2014, at 11:13 PM, Antoine Pitrou wrote: >On Tue, 7 Jan 2014 15:46:50 -0500 >Barry Warsaw wrote: >> >> If adopted for Python 3.4, PEP 460 should be modest in its goals, but I think >> I'd still like to see the following excluded and unknown features added: >> >> * Attribute access: {obj.attr} >> * Indexing: {dict[key]} >> * format keywords? b'{arg}'.format(arg=5) >> * str % dict ? b'%(arg)s' % {'arg': 5) > >I don't think integer values should be supported. Sorry, the point I was making was about the interpolation and lookup features, not the specific values. -Barry From larry at hastings.org Wed Jan 8 01:46:59 2014 From: larry at hastings.org (Larry Hastings) Date: Tue, 07 Jan 2014 16:46:59 -0800 Subject: [Python-Dev] The Great Argument Clinic Conversion Derby is now open! Message-ID: <52CCA003.4070702@hastings.org> I'm trying to get a huge chunk of work done on Python 3.4 in the next, oh, week-and-a-half, and I could use your help. 
I'm trying to convert a whole bunch of call sites in Python to use "Argument Clinic", a new build utility for Python that makes argument parsing code much easier to write. But I don't think I can do it all myself. To learn more about what Argument Clinic is, and get a sense for what the work will be like, please read the howto in the documentation: http://docs.python.org/dev/howto/clinic.html How can you help? I've split up the files in the Python source tree between about two dozen issues on the issue tracker, each having roughly 50 call sites in them to examine*. You can find the issues by searching for the word "Derby". This URL should do the trick: http://bugs.python.org/issue?%40search_text=Derby&ignore=file%3Acontent&title=&%40columns=title&id=&%40columns=id&stage=&creation=&creator=&activity=&%40columns=activity&%40sort=activity&actor=&nosy=&type=&components=&versions=&dependencies=&assignee=&keywords=&priority=&%40group=priority&status=1&%40columns=status&resolution=&nosy_count=&message_count=&%40pagesize=50&%40startwith=0&%40queryname=&%40old-queryname=&%40action=search To participate, find an issue that isn't assigned to anyone and assign it to yourself. (As far as I can tell there's no way to search for issues owned by "nobody", so you'll have to hunt around.) Once you own an issue, open up those files, search for "PyArg_ParseTuple(args" and "PyArg_ParseTupleAndKeywords(args", and start converting! In case you have questions / find bugs / need help, I'll be highly available on IRC in #python-dev during the derby, as well as responding to email and changes on the issue tracker. I'm very willing to review patches. Help! //arry/ * The original plan was, one file per issue on the tracker. But there are 129 files, and I was informed that people would come to my house and make my life unpleasant if I created 129 issues on the tracker. Please forgive me for the mildly-random way the files got bundled together. 
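The "search for PyArg_ParseTuple(args" step could be automated with a few lines of Python. The sketch below is an assumption about how one might do it, not part of Argument Clinic: the helper find_call_sites and the sample C source are made up for illustration. It reports the line number and text of each line that appears to call PyArg_ParseTuple or PyArg_ParseTupleAndKeywords on args.

```python
import re

# Matches "PyArg_ParseTuple(args" and "PyArg_ParseTupleAndKeywords(args".
CALL_SITE = re.compile(r"\bPyArg_ParseTuple(?:AndKeywords)?\s*\(\s*args\b")


def find_call_sites(source):
    """Return (line_number, stripped_line) pairs for candidate call sites."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if CALL_SITE.search(line):
            hits.append((number, line.strip()))
    return hits


# Made-up C fragment standing in for a real module file.
SAMPLE = '''\
static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    if (!PyArg_ParseTuple(args, "s:system", &command))
        return NULL;
    return PyLong_FromLong(system(command));
}
'''

for number, text in find_call_sites(SAMPLE):
    print(number, text)
```

To scan a real file, read its contents into a string and pass that to find_call_sites; a hit is only a candidate, and each one still needs to be examined by hand as described above.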
-------------- next part -------------- An HTML attachment was scrubbed... URL: From rdmurray at bitdance.com Wed Jan 8 02:32:26 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Tue, 07 Jan 2014 20:32:26 -0500 Subject: [Python-Dev] The Great Argument Clinic Conversion Derby is now open! In-Reply-To: <52CCA003.4070702@hastings.org> References: <52CCA003.4070702@hastings.org> Message-ID: <20140108013226.DAC4E250165@webabinitio.net> On Tue, 07 Jan 2014 16:46:59 -0800, Larry Hastings wrote: > > I'm trying to get a huge chunk of work done on Python 3.4 in the next, > oh, week-and-a-half, and I could use your help. I'm trying to convert a > whole bunch of call sites in Python to use "Argument Clinic", a new > build utility for Python that makes argument parsing code much easier to > write. But I don't think I can do it all myself. > > To learn more about what Argument Clinic is, and get a sense for what > the work will be like, please read the howto in the documentation: > > http://docs.python.org/dev/howto/clinic.html > > > How can you help? I've split up the files in the Python source tree > between about two dozen issues on the issue tracker, each having roughly > 50 call sites in them to examine*. You can find the issues by searching > for the word "Derby". This URL should do the trick: > > http://bugs.python.org/issue?%40search_text=Derby&ignore=file%3Acontent&title=&%40columns=title&id=&%40columns=id&stage=&creation=&creator=&activity=&%40columns=activity&%40sort=activity&actor=&nosy=&type=&components=&versions=&dependencies=&assignee=&keywords=&priority=&%40group=priority&status=1&%40columns=status&resolution=&nosy_count=&message_count=&%40pagesize=50&%40startwith=0&%40queryname=&%40old-queryname=&%40action=search > > To participate, find an issue that isn't assigned to anyone and assign > it to yourself. (As far as I can tell there's no way to search for > issues owned by "nobody", so you'll have to hunt around.) 
Once you own > an issue, open up those files, search for "PyArg_ParseTuple(args" and > "PyArg_ParseTupleAndKeywords(args", and start converting! Note: you can still help even if you are not someone who can assign the issue to themselves. Just make a note that you want to work on the issue in a message posted to the issue. Obviously, everyone should also check for such messages before picking one to work on... If all the issues get claimed, we can start worrying about divvying up the work further. That would be a nice problem to have :) > In case you have questions / find bugs / need help, I'll be highly > available on IRC in #python-dev during the derby, as well as responding > to email and changes on the issue tracker. I'm very willing to review > patches. > > Help! > > > //arry/ > > * The original plan was, one file per issue on the tracker. But there > are 129 files, and I was informed that people would come to my house and > make my life unpleasant if I created 129 issues on the tracker. Please > forgive me for the mildly-random way the files got bundled together. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/rdmurray%40bitdance.com From stephen at xemacs.org Wed Jan 8 05:51:36 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 08 Jan 2014 13:51:36 +0900 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <1389134006.2657.67865337.4C46F1B4@webmail.messagingengine.com> References: <20140107154650.0fd63db1@anarchist.wooz.org> <1389134006.2657.67865337.4C46F1B4@webmail.messagingengine.com> Message-ID: <87ha9fgjon.fsf@uwakimon.sk.tsukuba.ac.jp> Benjamin Peterson writes: > I agree. This is a very important, much-requested feature for low-level > networking code. 
I hear it's much-requested, but is there any description of typical
use cases? The ones I've seen on this list and on -ideas are
typically stream-oriented, and seem like they would be perfectly
well-served in terms of code readability and algorithmic accuracy by
reading with .decode('ascii', errors='surrogateescape') and writing
with .encode() and the same parameters (or as latin1).

> Yes, I don't think we need to support very much of the formatting
> language to cover 99.8% of formatting cases for bytes.

And the other 0.2% will be continuous excuses for RFEs and gratuitous
bugs in rarely used format specs and ports from str processing to
bytes processing.

From breamoreboy at yahoo.co.uk Wed Jan 8 08:45:52 2014
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Wed, 08 Jan 2014 07:45:52 +0000
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To:
References: <20140107154650.0fd63db1@anarchist.wooz.org>
Message-ID:

On 07/01/2014 22:11, Skip Montanaro wrote:
> On Tue, Jan 7, 2014 at 2:46 PM, Barry Warsaw wrote:
>> I think we should be willing to entertain breaking feature freeze for getting
>> this in Python 3.4.
>
> Maybe you could revert 3.4 to alpha status and give it a cycle or two
> there to get this done before returning to beta status.
>
> Skip
>

When I first saw the suggestion from Georg I had visions of men in white
coats dragging him off :) Having given the idea more thought I think
there's an opportunity here and now to make a very profound long term
impact for Python 3. Skip's idea seems to me a clean way to do this.
Short term pain, long term gain?

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.
Mark Lawrence From mark at hotpy.org Wed Jan 8 10:13:46 2014 From: mark at hotpy.org (Mark Shannon) Date: Wed, 08 Jan 2014 09:13:46 +0000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: <52CD16CA.5020405@hotpy.org> On 06/01/14 13:24, Victor Stinner wrote: > Hi, > > bytes % args and bytes.format(args) are requested by Mercurial and [snip] I'm opposed to adding methods to bytes for this, as I think it goes against the reason for the separation of str and bytes in the first place. str objects are pieces of text, a list of unicode characters. In other words they have meaning independent of their context. bytes are just a sequence of 8bit clumps. The meaning of bytes depends on the encoding, but the proposed methods will have no encoding, but presume meaning. What does b'%s' % 7 do? u'%s' % 7 calls 7 .__str__() which returns a (unicode) string. By implication b'%s' % 7 would call 7 .__str__() and ... And then what? Use the "default" encoding? ASCII? Explicit is better than implicit. I am not opposed to adding new functionality, as long as it is not overloading the % operator or format() method. binascii.format() perhaps? Cheers, Mark. From mal at egenix.com Wed Jan 8 10:56:43 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 08 Jan 2014 10:56:43 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: <52CD20DB.3080708@egenix.com> On 06.01.2014 14:24, Victor Stinner wrote: > Hi, > > bytes % args and bytes.format(args) are requested by Mercurial and > Twisted projects. The issue #3982 was stuck because nobody proposed a > complete definition of the "new" features. Here is a try as a PEP. > > The PEP is a draft with open questions. First, I'm not sure that both > bytes%args and bytes.format(args) are needed. The implementation of > .format() is more complex, so why not only adding bytes%args? 
+1 on doing all of this.

I'd simply copy over the Python 2 PyString code and start working
from there.

Re-adding these features makes life a lot easier in situations where
you have to work on data which is encoded text using multiple
(sometimes even unknown) encodings in a single data chunk. Think
MIME messages, mbox files, diffs, etc.

In such situations you often know the encoding of the part you're
working on (in most cases ASCII), but not necessarily the encodings
of other parts of the chunks.

You could work around this by decoding from Latin-1, then using
Unicode methods and encoding back to Latin-1, but the risk of
letting Mojibake enter your application in uncontrolled ways
is high.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jan 08 2014)
>>> Python Projects, Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From victor.stinner at gmail.com Wed Jan 8 11:02:19 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 8 Jan 2014 11:02:19 +0100
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To: <52CD16CA.5020405@hotpy.org>
References: <52CD16CA.5020405@hotpy.org>
Message-ID:

Hi,

2014/1/8 Mark Shannon :
> I'm opposed to adding methods to bytes for this, as I think it goes against
> the reason for the separation of str and bytes in the first place.

Well, sometimes practicality beats purity. Many developers complained
that Python 3 is too strict.
The motivation of the PEP is to ease the transition from Python 2 to
Python 3 and be able to write the same code base for the two versions.

> bytes are just a sequence of 8bit clumps.
> The meaning of bytes depends on the encoding, but the proposed methods will
> have no encoding, but presume meaning.

Many protocols mix ASCII text with binary bytes. For example, an HTTP
server writes headers and then copies the content of a binary file
(e.g. PNG picture, gzipped HTML page, whatever) *in the same stream*.
There are many similar examples. Just another one: PDF mixes ASCII
text with binary.

> What does b'%s' % 7 do?

See the Examples section of the PEP:

b'a%sc%s' % (b'b', 4) gives b'abc4'

(so b'%s' % 7 gives b'7')

> u'%s' % 7 calls 7 .__str__() which returns a (unicode) string.
> By implication b'%s' % 7 would call 7 .__str__() and ...

Why do you think so? bytes and str will have two separate
implementations, but might share some functions. CPython already has a
"stringlib" which shares as much code as possible between bytes and
str. For example, the "fastsearch" code is shared.

> And then what? Use the "default" encoding? ASCII?

Bytes have no encoding. There are just bytes :-)

IMO the typical use case will be b'%s: %s' % (b'Header', binary_data)

> I am not opposed to adding new functionality, as long as it is not
> overloading the % operator or format() method.

Ok, I will record your opposition in the PEP.

> binascii.format() perhaps?

Please read the Rationale of the PEP again, binascii.format() doesn't
solve the described use case.

Victor

From victor.stinner at gmail.com Wed Jan 8 11:12:15 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 8 Jan 2014 11:12:15 +0100
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To: <52CD20DB.3080708@egenix.com>
References: <52CD20DB.3080708@egenix.com>
Message-ID:

Hi,

2014/1/8 M.-A. Lemburg :
> I'd simply copy over the Python 2 PyString code and start working
> from there.
It's not possible to reuse directly all Python 2 code because some helpers have been modified to work on Unicode. The PEP 460 adds also more work to other implementations of Python. IMO some formatting commands must not be implemented. For example, alignment is used to display something on screen, not in network protocols or binary file formats. It's also why the issue #3982 was stuck, we must define exactly the feature set of the new methods (bytes % args, bytes.format). Victor From rosuav at gmail.com Wed Jan 8 11:16:28 2014 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 8 Jan 2014 21:16:28 +1100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <52CD20DB.3080708@egenix.com> Message-ID: On Wed, Jan 8, 2014 at 9:12 PM, Victor Stinner wrote: > IMO some formatting commands must not be implemented. For example, > alignment is used to display something on screen, not in network > protocols or binary file formats. Must not, or need not? I can understand that those sorts of features would be less valuable, but they do make sense. ChrisA From solipsis at pitrou.net Wed Jan 8 11:26:12 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 8 Jan 2014 11:26:12 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: <20140107154650.0fd63db1@anarchist.wooz.org> <1389134006.2657.67865337.4C46F1B4@webmail.messagingengine.com> <87ha9fgjon.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20140108112612.168a58f9@fsol> On Wed, 08 Jan 2014 13:51:36 +0900 "Stephen J. Turnbull" wrote: > Benjamin Peterson writes: > > > I agree. This is a very important, much-requested feature for low-level > > networking code. > > I hear it's much-requested, but is there any description of typical > use cases? 
The ones I've seen on this list and on -ideas are > typically stream-oriented, and seem like they would be perfectly > well-served in terms of code readability and algorithmic accuracy by > reading with .decode('ascii', errors='surrogateescape') and writing > with .encode() and the same parameters (or as latin1). It's a matter of convenience. Sometimes you're just interpolating bytes data together and it's a bit suboptimal to have to do a decode()-encode() dance around that. That said, the whole issue is slightly overblown as well: network programming in 3.x is perfectly reasonable, as the existence of Tornado and Tulip shows. Regards Antoine. From solipsis at pitrou.net Wed Jan 8 11:28:07 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 8 Jan 2014 11:28:07 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: <52CD16CA.5020405@hotpy.org> Message-ID: <20140108112807.48fb0dba@fsol> On Wed, 8 Jan 2014 11:02:19 +0100 Victor Stinner wrote: > > > What does b'%s' % 7 do? > > See Examples of the PEP: > > b'a%sc%s' % (b'b', 4) gives b'abc4' [...] > > And then what? Use the "default" encoding? ASCII? > > Bytes have no encoding. There are just bytes :-) Therefore you shouldn't accept integers. It does not make sense to format 4 as b'4'. > IMO the typical usecase will by b'%s: %s' % (b'Header', binary_data) Agreed. Regards Antoine. From mal at egenix.com Wed Jan 8 11:40:15 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 08 Jan 2014 11:40:15 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <52CD20DB.3080708@egenix.com> Message-ID: <52CD2B0F.60606@egenix.com> On 08.01.2014 11:12, Victor Stinner wrote: > Hi, > > 2014/1/8 M.-A. Lemburg : >> I'd simply copy over the Python 2 PyString code and start working >> from there. 
> > It's not possible to reuse directly all Python 2 code because some > helpers have been modified to work on Unicode. The PEP 460 adds also > more work to other implementations of Python. > > IMO some formatting commands must not be implemented. For example, > alignment is used to display something on screen, not in network > protocols or binary file formats. It's also why the issue #3982 was > stuck, we must define exactly the feature set of the new methods > (bytes % args, bytes.format). I'd use practicality beats purity here. As I mentioned in my reply, such formatting methods would indeed be used on data that is text. It's just that this text would be embedded inside an otherwise binary blob. You could do the alignment in Unicode first, then encode it and format it into the binary blob, but really: why bother with that extra round-trip ? The main purpose of the readdition would be to simplify porting applications to Python 3, while keeping them compatible with Python 2 as well. If you need to do the Unicode round-trip just to align a string in some fixed sized field, you might as well convert the whole operation to a function which deals with all this based on whether Python 2 or 3 is running and you'd lose the intended simplification of the readdition. PS: The PEP mentions having to code for Python 3.0-3.4 as well, which would don't support the new methods. I think it's perfectly fine to have newly ported code to require Python 2.7/3.5+. After all, the porting effort will take some time as well. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jan 08 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! 
:::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Wed Jan 8 13:01:21 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 8 Jan 2014 13:01:21 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: Message-ID: <20140108130121.7dda5063@fsol> Hi Victor, On Mon, 6 Jan 2014 14:24:50 +0100 Victor Stinner wrote: > Hi, > > bytes % args and bytes.format(args) are requested by Mercurial and > Twisted projects. The issue #3982 was stuck because nobody proposed a > complete definition of the "new" features. Here is a try as a PEP. There is a good use case at: https://mail.python.org/pipermail/python-ideas/2014-January/024803.html Regards Antoine. > > The PEP is a draft with open questions. First, I'm not sure that both > bytes%args and bytes.format(args) are needed. The implementation of > .format() is more complex, so why not only adding bytes%args? Then, > the following points must be decided to define the complete list of > supported features (formatters): > > * Format integer to hexadecimal? ``%x`` and ``%X`` > * Format integer to octal? ``%o`` > * Format integer to binary? ``{!b}`` > * Alignment? > * Truncating? Truncate or raise an error? > * format keywords? ``b'{arg}'.format(arg=5)`` > * ``str % dict`` ? ``b'%(arg)s' % {'arg': 5)`` > * Floating point number? > * ``%i``, ``%u`` and ``%d`` formats for integer numbers? > * Signed number? 
``%+i`` and ``%-i`` > > > HTML version of the PEP: > http://www.python.org/dev/peps/pep-0460/ > > Inline copy: > > PEP: 460 > Title: Add bytes % args and bytes.format(args) to Python 3.5 > Version: $Revision$ > Last-Modified: $Date$ > Author: Victor Stinner > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 6-Jan-2014 > Python-Version: 3.5 > > > Abstract > ======== > > Add ``bytes % args`` operator and ``bytes.format(args)`` method to > Python 3.5. > > > Rationale > ========= > > ``bytes % args`` and ``bytes.format(args)`` have been removed in Python > 2. This operator and this method are requested by Mercurial and Twisted > developers to ease porting their project on Python 3. > > Python 3 suggests to format text first and then encode to bytes. In > some cases, it does not make sense because arguments are bytes strings. > Typical usage is a network protocol which is binary, since data are > send to and received from sockets. For example, SMTP, SIP, HTTP, IMAP, > POP, FTP are ASCII commands interspersed with binary data. > > Using multiple ``bytes + bytes`` instructions is inefficient because it > requires temporary buffers and copies which are slow and waste memory. > Python 3.3 optimizes ``str2 += str2`` but not ``bytes2 += bytes1``. > > ``bytes % args`` and ``bytes.format(args)`` were asked since 2008, even > before the first release of Python 3.0 (see issue #3982). > > ``struct.pack()`` is incomplete. For example, a number cannot be > formatted as decimal and it does not support padding bytes string. > > Mercurial 2.8 still supports Python 2.4. > > > Needed and excluded features > ============================ > > Needed features > > * Bytes strings: bytes, bytearray and memoryview types > * Format integer numbers as decimal > * Padding with spaces and null bytes > * "%s" should use the buffer protocol, not str() > > The feature set is minimal to keep the implementation as simple as > possible to limit the cost of the implementation. 
``str % args`` and > ``str.format(args)`` are already complex and difficult to maintain, the > code is heavily optimized. > > Excluded features: > > * no implicit conversion from Unicode to bytes (ex: encode to ASCII or > to Latin1) > * Locale support (``{!n}`` format for numbers). Locales are related to > text and usually to an encoding. > * ``repr()``, ``ascii()``: ``%r``, ``{!r}``, ``%a`` and ``{!a}`` > formats. ``repr()`` and ``ascii()`` are used to debug, the output is > displayed a terminal or a graphical widget. They are more related to > text. > * Attribute access: ``{obj.attr}`` > * Indexing: ``{dict[key]}`` > * Features of struct.pack(). For example, format a number as 32 bit unsigned > integer in network endian. The ``struct.pack()`` can be used to prepare > arguments, the implementation should be kept simple. > * Features of int.to_bytes(). > * Features of ctypes. > * New format protocol like a new ``__bformat__()`` method. Since the > * list of > supported types is short, there is no need to add a new protocol. > Other types must be explicitly casted. > * Alternate format for integer. For example, ``'{|#x}'.format(0x123)`` > to get ``0x123``. It is more related to debug, and the prefix can be > easily be written in the format string (ex: ``0x%x``). > * Relation with format() and the __format__() protocol. bytes.format() > and str.format() are unrelated. > > Unknown: > > * Format integer to hexadecimal? ``%x`` and ``%X`` > * Format integer to octal? ``%o`` > * Format integer to binary? ``{!b}`` > * Alignment? > * Truncating? Truncate or raise an error? > * format keywords? ``b'{arg}'.format(arg=5)`` > * ``str % dict`` ? ``b'%(arg)s' % {'arg': 5)`` > * Floating point number? > * ``%i``, ``%u`` and ``%d`` formats for integer numbers? > * Signed number? 
``%+i`` and ``%-i``
>
>
> bytes % args
> ============
>
> Formatters:
>
> * ``"%c"``: one byte
> * ``"%s"``: integer or bytes strings
> * ``"%20s"`` pads to 20 bytes with spaces (``b' '``)
> * ``"%020s"`` pads to 20 bytes with zeros (``b'0'``)
> * ``"%\020s"`` pads to 20 bytes with null bytes (``b'\0'``)
>
>
> bytes.format(args)
> ==================
>
> Formatters:
>
> * ``"{!c}"``: one byte
> * ``"{!s}"``: integer or bytes strings
> * ``"{!.20s}"`` pads to 20 bytes with spaces (``b' '``)
> * ``"{!.020s}"`` pads to 20 bytes with zeros (``b'0'``)
> * ``"{!\020s}"`` pads to 20 bytes with null bytes (``b'\0'``)
>
>
> Examples
> ========
>
> * ``b'a%sc%s' % (b'b', 4)`` gives ``b'abc4'``
> * ``b'a{}c{}'.format(b'b', 4)`` gives ``b'abc4'``
> * ``b'%c' % 88`` gives ``b'X'``
> * ``b'%%'`` gives ``b'%'``
>
>
> Criticisms
> ==========
>
> * The development cost and maintenance cost.
> * In 3.3 encoding to ascii or latin1 is as fast as memcpy
> * Developers must work around the lack of bytes%args and
>   bytes.format(args) anyway to support Python 3.0-3.4
> * bytes.join() is consistently faster than format to join bytes strings.
> * Formatting functions can be implemented in a third party module
>
>
> References
> ==========
>
> * `Issue #3982: support .format for bytes
>   `_
> * `Mercurial project
>   `_
> * `Twisted project
>   `_
> * `Documentation of Python 2 formatting (str % args)
>   `_
> * `Documentation of Python 2 formatting (str.format)
>   `_
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
>
> ..
> Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: From dholth at gmail.com Wed Jan 8 14:56:37 2014 From: dholth at gmail.com (Daniel Holth) Date: Wed, 8 Jan 2014 08:56:37 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <87wqibhkhb.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87wqibhkhb.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Jan 7, 2014 at 10:36 AM, Stephen J. Turnbull wrote: > Daniel Holth writes: > > > Isn't it true that if you have bytes > 127 or surrogate escapes then > > encoding to latin1 is no longer as fast as memcpy? > > Be careful. As phrased, the question makes no sense. You don't "have > bytes" when you are encoding, you have characters. > > If you mean "what happens when my str contains characters in the range > 128-255?", the answer is encoding a str in 8-bit representation to > latin1 is effectively memcpy. If you read in latin1, it's memcpy all > the way (unless you combine it with a non-latin1 string, in which case > you're in the cases below). > > If you mean "what happens when my str contains characters in the range >> 255", you have to truncate 16-bit units to 8 bit units; no memcpy. > > Surrogates require >= 16 bits; no memcpy. That is neat. From d2mp1a9 at newsguy.com Wed Jan 8 15:43:01 2014 From: d2mp1a9 at newsguy.com (Bob Hanson) Date: Wed, 08 Jan 2014 06:43:01 -0800 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 References: <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> <5q7kc9lin0fk38q7u3qqq7ofbeq05s8veh@4ax.com> <20140106040650.33B4D250165@webabinitio.net> Message-ID: <6olqc9hvcfkff6km7rc1hre06ae6ereo23@4ax.com> [Top-post fixed (use-case is an exception to the GvR rule ;-) ) and some attributions restored with my additional comments following for the ease of future readers.] 
TL;DR: Outbound-connection attempts seem to be happening only to me, therefore, most likely not a Python problem -- but some problem at my end. Thanks to all. On Mon, 6 Jan 2014 05:43:38 -1000, Guido van Rossum wrote: > On Mon, Jan 6, 2014 at 5:29 AM, Bob Hanson wrote: > > > [For the record: I'm running 32bit Windows XP (Pro) SP2 and > > installing "for all users."] > > > > TL;DR: No matter what I tried this morning re uninstalling and > > reinstalling 3.4.0b2, pip or no pip, MSI still tried to connect > > to the Akamai URLs. > > > > On Sun, 05 Jan 2014 23:06:49 -0500, R. David Murray wrote: > > > > > On Sun, 05 Jan 2014 19:32:15 -0800, Bob Hanson wrote: > > > > > > > Still wondering why [...] msiexec.exe [is] trying to connect out while > > > > installing 3.4.0b2 from my harddrive...? > > > > > > The ensurepip developers will have to say for sure, but my understanding > > > is that it does *not* go out to the network. On the other hand, it is > > > conceivable that pip 1.5, unlike the earlier version in Beta1, is doing > > > some sort of "up to date check" that it shouldn't be doing in the > > > ensurepip scenario. > > > > > > I presume you did have the installer install pip. > > > > To be honest, I forgot all about pip [...] didn't > > even notice a checkbox for that option. > > > > > If you haven't already, You might try reinstalling and unchecking > > > that option, and see if it msiexec still tries to go out to the > > > network. That would confirm it is ensurepip that is the issue > > > (although that does seem most likely). > > > > [...snip synopsis of various uninstall-reinstall dances...] > > > > So, whatever I have tried -- pip or no pip -- msiexec.exe still > > attempts to connect to those Akamai URLs. 
> > Since MSIEXEC.EXE is a legit binary (not coming from our packager) and > Akamai is a legitimate company (MS most likely has an agreement with > them), at this point I would assume that there's something that > MSIEXEC.EXE wants to get from Akamai, which is unintentionally but > harmlessly triggered by the Python install. Could it be checking for > upgrades? When I read this comment of yours, Guido, I immediately started wondering about this. You may well be right -- indeed, I have a very old install (c.2007) which has not been updated (other than one or three new MS "drivers"). Perhaps the Python 3.4.0b2 MSI installer uses a new capability, which, as you say, causes the installer to at least attempt to upgrade...? In any event, as there's been no other reports, this seems to be something happening only to me. As such, it seems to be not a Python problem, but some misconfiguration on my own system, say. If I retain interest in investigating this, and if I *do* find an actual problem with Python, I'll post again. Thanks go to you, Guido, as well as to Tim and all the others who helped me with this. Regards, Bob Hanson -- Write once, read many. From ncoghlan at gmail.com Wed Jan 8 16:03:18 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 9 Jan 2014 01:03:18 +1000 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 In-Reply-To: <6olqc9hvcfkff6km7rc1hre06ae6ereo23@4ax.com> References: <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> <5q7kc9lin0fk38q7u3qqq7ofbeq05s8veh@4ax.com> <20140106040650.33B4D250165@webabinitio.net> <6olqc9hvcfkff6km7rc1hre06ae6ereo23@4ax.com> Message-ID: On 9 January 2014 00:43, Bob Hanson wrote: > When I read this comment of yours, Guido, I immediately started > wondering about this. You may well be right -- indeed, I have a > very old install (c.2007) which has not been updated (other than > one or three new MS "drivers"). 
>
> Perhaps the Python 3.4.0b2 MSI installer uses a new capability,
> which, as you say, causes the installer to at least attempt to
> upgrade...?

I believe the pip bootstrapping involves an MSI feature we haven't
previously used (MvL would be able to confirm). If so, then MSI may be
looking for a new version to interpret that new setting.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From barry at python.org Wed Jan 8 16:08:14 2014
From: barry at python.org (Barry Warsaw)
Date: Wed, 8 Jan 2014 10:08:14 -0500
Subject: [Python-Dev] Changing Clinic's output
In-Reply-To:
References: <20140107205308.05e1b5ce@fsol>
Message-ID: <20140108100814.408a91f2@anarchist.wooz.org>

On Jan 07, 2014, at 10:39 PM, Serhiy Storchaka wrote:

>Only this option will solve all my issues.

How hard would it be to put together some sample branches that provide
concrete examples of the various options?

My own opinion could easily be influenced by having some hands-on time with
actual code, and I suspect even Guido could be influenced if he could pull
some things up in his editor and take a look around.

-Barry

From larry at hastings.org Wed Jan 8 16:28:53 2014
From: larry at hastings.org (Larry Hastings)
Date: Wed, 08 Jan 2014 07:28:53 -0800
Subject: [Python-Dev] Changing Clinic's output
In-Reply-To: <20140108100814.408a91f2@anarchist.wooz.org>
References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org>
Message-ID: <52CD6EB5.8030202@hastings.org>

On 01/08/2014 07:08 AM, Barry Warsaw wrote:
> On Jan 07, 2014, at 10:39 PM, Serhiy Storchaka wrote:
>
>> Only this option will solve all my issues.
> How hard would it be to put together some sample branches that provide
> concrete examples of the various options?
>
> My own opinion could easily be influenced by having some hands-on time with
> actual code, and I suspect even Guido could be influenced if he could pull
> some things up in his editor and take a look around.
I plan to prototype the "accumulator" later today. It probably wouldn't be hard to make the prototype support writing out to a separate file, so I'll try to do that too. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Jan 8 16:33:33 2014 From: brett at python.org (Brett Cannon) Date: Wed, 8 Jan 2014 10:33:33 -0500 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52CC96D6.6090304@hastings.org> References: <20140107205308.05e1b5ce@fsol> <52CC8CB1.9070301@hastings.org> <52CC96D6.6090304@hastings.org> Message-ID: On Tue, Jan 7, 2014 at 7:07 PM, Larry Hastings wrote: > > > On 01/07/2014 03:38 PM, Brett Cannon wrote: > > On Tue, Jan 7, 2014 at 6:24 PM, Larry Hastings wrote: > >> For what it's worth, if we use the "accumulator" approach I propose >> that the generated code doesn't go at the very end of the file. Instead, I >> suggest they should go *near* the end, below the implementations of the >> module / class methods, but above the methoddef/type structures and the >> module init function. >> > > If it is accumulated in a single location should it just be a single > block for everything towards the end? Then forward declarations would go > away (you could still have it as a comment to copy-and-paste where you > define the implementation) and you can have a single macro for the > PyMethodDef values, each class, etc. If you accumulated the PyMethodDef > values into a single macro it would help make up for the convenience lost > of converting a function by just cutting the old call signature up to the > new *_impl() function. > > > I *think* that would complicate some use cases. People occasionally call > these parsing functions from other functions, or spread their methoddef / > typeobject structures throughout the file rather than putting them all at > the end. 
> > I'm proposing that the blob of text immediately between the Clinic input > and the body of the impl contain (newlines added here for clarity): > > static char *parsing_function_doc; > > static PyObject * > parsing_function(...); > > #define PARSING_FUNCTION_METHODDEF \ > { ... } > > static PyObject * > parsing_function_impl(...) > > Then the "accumulator" would get the text of the docstring and the > definition of the parsing_function. > > > On the other hand, if we wanted to take this opportunity to force everyone > to standardize (all methoddefs and typeobjects go at the end!) we could > probably make it work with one giant block near the end. > > Or I could make it flexible on what went into the accumulator and what > went into the normal output block, and the default could be > everything-in-the-accumulator. Making the common easy and the uncommon > possible and all that. Yeah, that seems best. > So let's make this idea concrete to focus a possible discussion. Using the example from the Clinic HOWTO and converting to how I see it working: #################### /*[clinic input] pickle.Pickler.dump obj: 'O' The object to be pickled. / Write a pickled representation of obj to the open file. [clinic start generated code]*/ static PyObject * pickle_Pickler_dump_impl(PyObject *self, PyObject *obj) /*[clinic end generated code: checksum=3bd30745bf206a48f8b576a1da3d90f55a0a4187]*/ { /* Check whether the Pickler was initialized correctly (issue3664). Developers often forget to call __init__() in their subclasses, which would trigger a segfault without this check. */ if (self->write == NULL) { PyErr_Format(PicklingError, "Pickler.__init__() was not called by %s.__init__()", Py_TYPE(self)->tp_name); return NULL; } if (_Pickler_ClearBuffer(self) < 0) return NULL ... } ... /*[clinic accumulate]*/ PyDoc_STRVAR(pickle_Pickler_dump__doc__, "Write a pickled representation of obj to the open file.\n" "\n" static PyObject * _pickle_Pickler_dump(PyObject *args) { ... 
return pickle_Pickler_dump_impl(...); } #define _PICKLE_PICKLER_DUMP_METHODDEF \ {"dump", (PyCFunction)_pickle_Pickler_dump, METH_O, _pickle_Pickler_dump__doc__}, ... any other pickler.Pickler Clinic stuff that does not directly involve the the impl function ... #define _PICKLE_PICKLER_METHODDEF_ACCUMULATED \ _PICKLE_PICKLER_DUMP_METHODDEF \ ... any other MethodDef entries for pickle.Pickler /*[clinic end accumulate: checksum=0123456789]*/ ... pickle.Pickler struct where _PICKLE_PICKLER_METHODDEF_ACCUMULATED is all that is needed for the non-magical class methods ... ########################### Another potential perk of doing a gathering of Clinic output is that if we take it to it's logical conclusion, then you can start to do things like define a method like pickle.Pickler.__init__, etc., have Clinic handle docstrings for modules and classes, and then it can end up spitting out the type struct entirely for you, negating the typical need to do all of that by hand (I don't know about the rest of you but I always just copy and paste that struct anyway, so having a tool slot in the right method names for the right positions would save me busy work). It could then go as far as then spit out the module initialization function definition line and then all you would need to do is fill that in; Clinic could handle all other module-level details for you in the very common case. -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Wed Jan 8 16:46:21 2014 From: larry at hastings.org (Larry Hastings) Date: Wed, 08 Jan 2014 07:46:21 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: References: <20140107205308.05e1b5ce@fsol> <52CC8CB1.9070301@hastings.org> <52CC96D6.6090304@hastings.org> Message-ID: <52CD72CD.8030606@hastings.org> On 01/08/2014 07:33 AM, Brett Cannon wrote: > So let's make this idea concrete to focus a possible discussion. 
Using
> the example from the Clinic HOWTO and converting to how I see it working:
> [...]

Yep. And what I was proposing is much the same, except there are a
couple extra lines in the "generated code" section. I'd keep the
#define for the methoddef there, and add a prototype for the generated
parsing function (_pickle_Pickler_dump) and the docstring.

####################

/*[clinic input]
pickle.Pickler.dump

    obj: 'O'
        The object to be pickled.
    /

Write a pickled representation of obj to the open file.
[clinic start generated code]*/
PyDoc_VAR(pickle_Pickler_dump__doc__);

static PyObject *
_pickle_Pickler_dump(PyObject *args);

#define _PICKLE_PICKLER_DUMP_METHODDEF    \
    {"dump", (PyCFunction)_pickle_Pickler_dump, METH_O, _pickle_Pickler_dump__doc__},

static PyObject *
pickle_Pickler_dump_impl(PyObject *self, PyObject *obj)
/*[clinic end generated code: checksum=3bd30745bf206a48f8b576a1da3d90f55a0a4187]*/
{
    /* Check whether the Pickler was initialized correctly (issue3664).
       Developers often forget to call __init__() in their subclasses, which
       would trigger a segfault without this check. */
    ...
}

> Another potential perk of doing a gathering of Clinic output is that
> if we take it to its logical conclusion, then you can start to do
> things like define a method like pickle.Pickler.__init__, etc., have
> Clinic handle docstrings for modules and classes, and then it can end
> up spitting out the type struct entirely for you, negating the typical
> need to do all of that by hand (I don't know about the rest of you but
> I always just copy and paste that struct anyway, so having a tool slot
> in the right method names for the right positions would save me busy
> work).

Surely new code should use the functional API for creating types?

Anyway, yes, in the future it would be nice to get rid of a bunch of the
busywork associated with implementing a Python builtin type, and
Argument Clinic could definitely help with that.
//arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Jan 8 17:04:01 2014 From: brett at python.org (Brett Cannon) Date: Wed, 8 Jan 2014 11:04:01 -0500 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52CD72CD.8030606@hastings.org> References: <20140107205308.05e1b5ce@fsol> <52CC8CB1.9070301@hastings.org> <52CC96D6.6090304@hastings.org> <52CD72CD.8030606@hastings.org> Message-ID: On Wed, Jan 8, 2014 at 10:46 AM, Larry Hastings wrote: > > > On 01/08/2014 07:33 AM, Brett Cannon wrote: > > So let's make this idea concrete to focus a possible discussion. Using the > example from the Clinic HOWTO and converting to how I see it working: > [...] > > > Yep. And what I was proposing is much the same, except there are a couple > extra lines in the "generated code" section. I'd keep the #define for the > methoddef there, and add a prototype for the generated parsing function > (_pickle_Pickler_dump) and the docstring. > I assume that's for flexibility in case someone has their module structured in a way that doesn't lend itself to having it all accumulated at the end of the file? Or is there something I'm overlooking? I would assume being able to put the accumulator block where ever you want with enough forward declarations would still be enough to allow for it to work with almost any structured format of a file and have almost all the generated code in a single place. I can definitely live with what you are proposing, just trying to understand the logic as shifting almost all generated stuff in a single place does make Clinic comments read like fancy docstrings which is nice. > > > #################### > > /*[clinic input] > pickle.Pickler.dump > > obj: 'O' > The object to be pickled. > / > > Write a pickled representation of obj to the open file. 
> [clinic start generated code]*/ > PyDoc_VAR(pickle_Pickler_dump__doc__); > > static PyObject * > _pickle_Pickler_dump(PyObject *args); > > #define _PICKLE_PICKLER_DUMP_METHODDEF \ > {"dump", (PyCFunction)_pickle_Pickler_dump, METH_O, > _pickle_Pickler_dump__doc__}, > > static PyObject * > pickle_Pickler_dump_impl(PyObject *self, PyObject *obj) > /*[clinic end generated code: > checksum=3bd30745bf206a48f8b576a1da3d90f55a0a4187]*/ > { > /* Check whether the Pickler was initialized correctly (issue3664). > Developers often forget to call __init__() in their subclasses, > which > would trigger a segfault without this check. */ > ... > } > > Another potential perk of doing a gathering of Clinic output is that if > we take it to it's logical conclusion, then you can start to do things like > define a method like pickle.Pickler.__init__, etc., have Clinic handle > docstrings for modules and classes, and then it can end up spitting out the > type struct entirely for you, negating the typical need to do all of that > by hand (I don't know about the rest of you but I always just copy and > paste that struct anyway, so having a tool slot in the right method names > for the right positions would save me busy work). > > > Surely new code should use the functional API for creating types? > Yes. Shows how long it has been since I have written a C type from scratch. =) > Anyway, yes, in the future it would be nice to get rid of a bunch of the > busywork associated with implementing a Python builtin type, and Argument > Clinic could definitely help with that. > I think that will be the big long-term win; taking out nearly all boilerplate in creating an extension module and maintaining it (in case something changes, e.g. the module init function signature). -------------- next part -------------- An HTML attachment was scrubbed... 
From larry at hastings.org Wed Jan 8 17:29:59 2014
From: larry at hastings.org (Larry Hastings)
Date: Wed, 08 Jan 2014 08:29:59 -0800
Subject: [Python-Dev] Changing Clinic's output
In-Reply-To:
References: <20140107205308.05e1b5ce@fsol> <52CC8CB1.9070301@hastings.org> <52CC96D6.6090304@hastings.org> <52CD72CD.8030606@hastings.org>
Message-ID: <52CD7D07.7090700@hastings.org>

On 01/08/2014 08:04 AM, Brett Cannon wrote:
> On Wed, Jan 8, 2014 at 10:46 AM, Larry Hastings
> wrote:
>
>     Yep. And what I was proposing is much the same, except there are
>     a couple extra lines in the "generated code" section. I'd keep
>     the #define for the methoddef there, and add a prototype for the
>     generated parsing function (_pickle_Pickler_dump) and the docstring.
>
>
> I assume that's for flexibility in case someone has their module
> structured in a way that doesn't lend itself to having it all
> accumulated at the end of the file? Or is there something I'm overlooking?

No, you're not overlooking anything, that's why. It's for files that
have getsetdef / methoddef / typeobject structures all over the place
instead of keeping them all at the end.

My mindset is trying to avoid requiring big changes for Argument Clinic
support like "step 87: now move all your getsetdef / methoddef /
typeobject to the end of your file, below the accumulator output
block". Argument Clinic is contributing enough churn as it is
don'tchathink!


//arry/
From ethan at stoneleaf.us Wed Jan 8 17:12:46 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 08 Jan 2014 08:12:46 -0800
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To: <20140108112807.48fb0dba@fsol>
References: <52CD16CA.5020405@hotpy.org> <20140108112807.48fb0dba@fsol>
Message-ID: <52CD78FE.8010600@stoneleaf.us>

On 01/08/2014 02:28 AM, Antoine Pitrou wrote:
> On Wed, 8 Jan 2014 11:02:19 +0100
> Victor Stinner wrote:
>>
>>> What does b'%s' % 7 do?
>>
>> See Examples of the PEP:
>>
>> b'a%sc%s' % (b'b', 4) gives b'abc4'
> [...]
>>> And then what? Use the "default" encoding? ASCII?
>>
>> Bytes have no encoding. There are just bytes :-)
>
> Therefore you shouldn't accept integers. It does not make sense to
> format 4 as b'4'.

Agreed. I would have thought it would result in b'\x04'.

--
~Ethan~

From victor.stinner at gmail.com Wed Jan 8 18:25:16 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 8 Jan 2014 18:25:16 +0100
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To: <52CD78FE.8010600@stoneleaf.us>
References: <52CD16CA.5020405@hotpy.org> <20140108112807.48fb0dba@fsol> <52CD78FE.8010600@stoneleaf.us>
Message-ID:

2014/1/8 Ethan Furman :
>> Therefore you shouldn't accept integers. It does not make sense to
>> format 4 as b'4'.
>
> Agreed. I would have thought it would result in b'\x04'.

The PEP proposes b'%c' % 4 => b'\x04'.

Antoine gave me a good argument against supporting b'%s' % int: how
would int subclasses be handled? int has no __bytes__() nor
__bformat__() method. bytes(int) returns a string of null bytes.

It's maybe simpler to only support %s format with bytes-like objects
(bytes, bytearray, memoryview).
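
A minimal sketch of the behaviours mentioned above, assuming a
Python 3.5+ interpreter, where bytes % args eventually shipped through
the later PEP 461 rather than through this draft. It only illustrates
what bytes(int) returns, what %c does with a small integer, and how %s
treats bytes-like versus int arguments; the variable and function names
are illustrative, not taken from the PEP.

```python
# Assumption: Python 3.5 or later (bytes % args as specified by PEP 461).

# bytes(int) does not format the number; it allocates that many null bytes.
zero_filled = bytes(4)

# %c with an int in range(256) yields the single byte with that value.
one_byte = b'%c' % 4

# %s with bytes-like arguments interpolates their content unchanged.
header = b'%s: %s' % (b'Header', bytearray(b'\xff\x00'))


def s_format_rejects_int():
    """Return True if b'%s' % int raises TypeError on this interpreter."""
    try:
        b'%s' % 4
    except TypeError:
        return True
    return False
```

On such an interpreter, s_format_rejects_int() returns True, which is
close to the restriction suggested here, although the shipped version
also accepts objects that define __bytes__().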

Victor

From stefan_ml at behnel.de Wed Jan 8 19:12:06 2014
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 08 Jan 2014 19:12:06 +0100
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To:
References:
Message-ID:

Victor Stinner, 07.01.2014 19:14:
> 2014/1/7 Stefan Behnel:
>> Victor Stinner, 06.01.2014 14:24:
>>> ``struct.pack()`` is incomplete. For example, a number cannot be
>>> formatted as decimal and it does not support padding bytes string.
>>
>> Then what about extending the struct module in a way that makes it cover
>> more use cases like these?
>
> The idea of the PEP is to simplify the porting work of Twisted and
> Mercurial developers. So the same code should work on Python 2 and
> Python 3.

Is it really a requirement that existing Py2 code must work unchanged in
Py3? Why can't someone write a third-party library that does what these
projects need, and that works in both Py2 and Py3, so that these projects
can be modified to use that library and thus get on with their porting to
Py3? Or rather one library that does what some projects need and another
one that does what other projects need, because it's quite likely that the
requirements are not really as largely identical as it seems when seen
through the old and milky Py2 glasses.

One idea of designing a Py3 was to simplify the language. Getting all Py2
"features" back in doesn't help on that path. If something can easily be
done in an external module, I think it should be done there.

Stefan

From ericsnowcurrently at gmail.com Wed Jan 8 19:16:49 2014
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 8 Jan 2014 11:16:49 -0700
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To: <52CD2B0F.60606@egenix.com>
References: <52CD20DB.3080708@egenix.com> <52CD2B0F.60606@egenix.com>
Message-ID:

On Wed, Jan 8, 2014 at 3:40 AM, M.-A.
Lemburg wrote: > PS: The PEP mentions having to code for Python 3.0-3.4 as well, > which don't support the new methods. I think it's perfectly > fine to have newly ported code require Python 2.7/3.5+. After > all, the porting effort will take some time as well. tl;dr We must get the relevant library projects involved in this discussion. I prefer Nick's solution to the problem at hand. I've mostly stayed out of this discussion because I neither have many unicode-related use-cases nor a deep understanding of all the issues. However, my investment in the community is such that I've been following these discussions and hope to add what I can in what few places I chime in. :) Requiring 3.5 may be tricky though. How soon will 3.5 show up in OS distros or be their system Python? Getting 3.5 on their system may not be a simple option for some (perhaps too few to matter?) and may be seen as too onerous by others. This effort is meant to ease porting to Python 3 and not as just a carrot like most other new features. It boils down to 3.5 being *the* target for porting from 2.7. Otherwise we'd be better off adding a new type to 3.5 for the wire-protocol use cases and providing a 2.7/3.x backport on the cheeseshop that would facilitate porting such code bases to 3.5. My understanding is that this is basically what Nick has proposed (sorry, Nick, if I've misunderstood). The latter approach makes more sense to me. However, it seems like this whole discussion is motivated by a particular group of library projects. Regardless of what we discuss or the solutions on which we resolve, we'd be making a mistake if we did not do our utmost to ensure those projects are directly involved in these discussions.
-eric From solipsis at pitrou.net Wed Jan 8 19:47:54 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 8 Jan 2014 19:47:54 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: <52CD20DB.3080708@egenix.com> <52CD2B0F.60606@egenix.com> Message-ID: <20140108194754.370b0765@fsol> On Wed, 8 Jan 2014 11:16:49 -0700 Eric Snow wrote: > > It boils down to 3.5 being *the* target for porting from 2.7. No. Please let's stop being self-deprecating. 3.3 is fine as a porting target, as the many high-profile libraries which have already been ported can attest. > Otherwise we'd be better off adding a new type to 3.5 for the > wire-protocol use cases I'm completely opposed to a new type. Regards Antoine. From ericsnowcurrently at gmail.com Wed Jan 8 19:59:51 2014 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 8 Jan 2014 11:59:51 -0700 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: On Mon, Jan 6, 2014 at 6:24 AM, Victor Stinner wrote: > Abstract > ======== > > Add ``bytes % args`` operator and ``bytes.format(args)`` method to > Python 3.5. > > > Rationale > ========= > > ``bytes % args`` and ``bytes.format(args)`` have been removed in Python > 3. This operator and this method are requested by Mercurial and Twisted > developers to ease porting their project on Python 3. > > Python 3 suggests to format text first and then encode to bytes. In > some cases, it does not make sense because arguments are bytes strings. > Typical usage is a network protocol which is binary, since data are > sent to and received from sockets. For example, SMTP, SIP, HTTP, IMAP, > POP, FTP are ASCII commands interspersed with binary data. > > Using multiple ``bytes + bytes`` instructions is inefficient because it > requires temporary buffers and copies which are slow and waste memory. > Python 3.3 optimizes ``str2 += str1`` but not ``bytes2 += bytes1``.
> > ``bytes % args`` and ``bytes.format(args)`` were asked since 2008, even > before the first release of Python 3.0 (see issue #3982). > > ``struct.pack()`` is incomplete. For example, a number cannot be > formatted as decimal and it does not support padding bytes string. > > Mercurial 2.8 still supports Python 2.4. As an alternative, we could provide an import hook via some channel (cheeseshop? recipe?) that converts just b'' formatting into some Python 3 equivalent (when run under Python 3). The argument against such import hooks is usually that they have an adverse impact on the output of tracebacks. However, I'd expect most b'' formatting to happen on a single line and that the replacement source would stay on that single line. Such an import hook would lessen the desire for bytes formatting. As I mentioned elsewhere, Nick's counter-proposal of a separate wire-protocol-friendly type makes more sense to me more than adding formatting to Python 3's bytes type. As others have opined, formatting a bytes object is out of place. The need is limited in scope and audience, but apparently real. Adding that capability directly to bytes in 3.5 should be a last resort to which we appeal only when we exhaust our other options. -eric From solipsis at pitrou.net Wed Jan 8 20:08:25 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 8 Jan 2014 20:08:25 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: Message-ID: <20140108200825.3fb1bd6d@fsol> On Wed, 8 Jan 2014 11:59:51 -0700 Eric Snow wrote: > As others have opined, > formatting a bytes object is out of place. However, interpolating a bytes object isn't out of place, and it is what a minimal "formatting" primitive could do. Regards Antoine. 
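As an illustration of the minimal "formatting" primitive Antoine describes, interpolation restricted to bytes-like arguments can be sketched in a few lines on any Python 3 version. The helper name `interpolate` is hypothetical and is not part of the PEP or the standard library; this sketch only handles a bare `%s` placeholder and does not support `%%`, padding, or integers.

```python
def interpolate(template, *args):
    """Hypothetical helper: splice bytes-like args into each b'%s' slot."""
    pieces = template.split(b"%s")
    if len(pieces) - 1 != len(args):
        raise TypeError(
            "expected %d arguments, got %d" % (len(pieces) - 1, len(args))
        )
    out = bytearray(pieces[0])
    for arg, piece in zip(args, pieces[1:]):
        # memoryview() accepts only objects supporting the buffer protocol,
        # so str and int arguments raise TypeError instead of being encoded.
        out += memoryview(arg)
        out += piece
    return bytes(out)


print(interpolate(b"HELO %s\r\n", b"example.org"))
```

Because nothing is ever encoded or converted, there is no question of which encoding applies; the arguments are copied in as they are.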
From stefan_ml at behnel.de Wed Jan 8 20:17:21 2014 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 08 Jan 2014 20:17:21 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: Victor Stinner, 06.01.2014 14:24: > Abstract > ======== > Add ``bytes % args`` operator and ``bytes.format(args)`` method to > Python 3.5. Here is a counterproposal. Let someone who needs this feature write a library that does byte string formatting. That properly handles it, a full featured tool set. Write it in Cython if you need raw speed, that will also help in making it run in both Python 2 and Python 3, or in providing easy integration with buffers like the array module, various byte containers, NumPy, etc. I'm confident that this will show that the current Py2 code that (legitimately) does byte string formatting can actually be improved, simplified or sped up, at least in some corners. I'm sure Py2 byte string formatting wasn't perfect for this use case either, it just happened to be there, so everyone used it and worked around its particular quirks for the particular use case at hand. (Think of "%s" % some_unicode_value, for example.) Instead of waiting for 3.5, a third party library allows users to get started porting their code earlier, and to make it work unchanged with Python versions before 3.5. Stefan From matt at vazor.com Wed Jan 8 20:22:08 2014 From: matt at vazor.com (Matt Billenstein) Date: Wed, 08 Jan 2014 19:22:08 +0000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 Message-ID: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> On Wed, Jan 08, 2014 at 07:12:06PM +0100, Stefan Behnel wrote: > Why can't someone write a third-party library that does what these projects > need, and that works in both Py2 and Py3, so that these projects can be > modified to use that library and thus get on with their porting to Py3? 
Apologies if this is out of place and slightly OT and soap-boxey... Does it not strike anyone here how odd it is that one would need a library to manipulate binary data in a programming language with "batteries included" on a binary computer? And maybe you can do it with existing facilities in both versions of Python, although in python3, I need to understand what bytes, format, ascii, and surrogateescape mean - among other things. I started in Python blissfully unaware of unicode - it was a different time for sure, but what I knew from C worked pretty much the same in Python - I could read some binary data out of a file, twiddle some bits, and write it back out again without any of these complexities - life was good and granted I was naive, but it made Python approachable for me and I enjoyed it. I stuck with it and learned about unicode and the complexities of encoding data and now I'm astonished at how many professional programmers don't know the slightest bit about it and how horribly munged some data you can consume on the web might be - I agree it's all quite a mess. So now I'm getting more serious about Python3 and my fear is that the development community (python3) has fractured from the user community (python2) in that they've built something that solves their problems (to oversimplify lets say a webapp) - sure, a bunch of stuff got fixed along the way and we gave the users division they would expect (3/2 == 1.5), but somewhere what I felt was more like a hobbyist language has become big and complex and "we need to protect our users from doing the wrong thing." And I think everyone was well intentioned - and python3 covers most of the bases, but working with binary data is not only a "wire-protocol programmer's" problem. Needing a library to wrap bytesthing.format('ascii', 'surrogateescape') or some such thing makes python3 less approachable for those who haven't learned that yet - which was almost all of us at some point when we started programming. 
I appreciate everyone's hard work - I'm confident the community will cross the 2-3 chasm and I hope we preserve the approachability I first came to love about Python when I started using it for all sorts of applications. thx m -- Matt Billenstein matt at vazor.com http://www.vazor.com/ From solipsis at pitrou.net Wed Jan 8 20:46:07 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 8 Jan 2014 20:46:07 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: Message-ID: <20140108204607.12c27c5f@fsol> Hi, Another remark about the PEP: it should define bytearray % args and bytearray.format(args) as well. Regards Antoine. On Mon, 6 Jan 2014 14:24:50 +0100 Victor Stinner wrote: > Hi, > > bytes % args and bytes.format(args) are requested by Mercurial and > Twisted projects. The issue #3982 was stuck because nobody proposed a > complete definition of the "new" features. Here is a try as a PEP. > > The PEP is a draft with open questions. First, I'm not sure that both > bytes%args and bytes.format(args) are needed. The implementation of > .format() is more complex, so why not only adding bytes%args? Then, > the following points must be decided to define the complete list of > supported features (formatters): > > * Format integer to hexadecimal? ``%x`` and ``%X`` > * Format integer to octal? ``%o`` > * Format integer to binary? ``{!b}`` > * Alignment? > * Truncating? Truncate or raise an error? > * format keywords? ``b'{arg}'.format(arg=5)`` > * ``str % dict`` ? ``b'%(arg)s' % {'arg': 5)`` > * Floating point number? > * ``%i``, ``%u`` and ``%d`` formats for integer numbers? > * Signed number? 
``%+i`` and ``%-i`` > > > HTML version of the PEP: > http://www.python.org/dev/peps/pep-0460/ > > Inline copy: > > PEP: 460 > Title: Add bytes % args and bytes.format(args) to Python 3.5 > Version: $Revision$ > Last-Modified: $Date$ > Author: Victor Stinner > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 6-Jan-2014 > Python-Version: 3.5 > > > Abstract > ======== > > Add ``bytes % args`` operator and ``bytes.format(args)`` method to > Python 3.5. > > > Rationale > ========= > > ``bytes % args`` and ``bytes.format(args)`` have been removed in Python > 3. This operator and this method are requested by Mercurial and Twisted > developers to ease porting their project on Python 3. > > Python 3 suggests to format text first and then encode to bytes. In > some cases, it does not make sense because arguments are bytes strings. > Typical usage is a network protocol which is binary, since data are > sent to and received from sockets. For example, SMTP, SIP, HTTP, IMAP, > POP, FTP are ASCII commands interspersed with binary data. > > Using multiple ``bytes + bytes`` instructions is inefficient because it > requires temporary buffers and copies which are slow and waste memory. > Python 3.3 optimizes ``str2 += str1`` but not ``bytes2 += bytes1``. > > ``bytes % args`` and ``bytes.format(args)`` were asked since 2008, even > before the first release of Python 3.0 (see issue #3982). > > ``struct.pack()`` is incomplete. For example, a number cannot be > formatted as decimal and it does not support padding bytes string. > > Mercurial 2.8 still supports Python 2.4. > > > Needed and excluded features > ============================ > > Needed features > > * Bytes strings: bytes, bytearray and memoryview types > * Format integer numbers as decimal > * Padding with spaces and null bytes > * "%s" should use the buffer protocol, not str() > > The feature set is minimal to keep the implementation as simple as > possible to limit the cost of the implementation.
``str % args`` and > ``str.format(args)`` are already complex and difficult to maintain, the > code is heavily optimized. > > Excluded features: > > * no implicit conversion from Unicode to bytes (ex: encode to ASCII or > to Latin1) > * Locale support (``{!n}`` format for numbers). Locales are related to > text and usually to an encoding. > * ``repr()``, ``ascii()``: ``%r``, ``{!r}``, ``%a`` and ``{!a}`` > formats. ``repr()`` and ``ascii()`` are used to debug, the output is > displayed in a terminal or a graphical widget. They are more related to > text. > * Attribute access: ``{obj.attr}`` > * Indexing: ``{dict[key]}`` > * Features of struct.pack(). For example, format a number as 32 bit unsigned > integer in network endian. The ``struct.pack()`` can be used to prepare > arguments, the implementation should be kept simple. > * Features of int.to_bytes(). > * Features of ctypes. > * New format protocol like a new ``__bformat__()`` method. Since the > list of supported types is short, there is no need to add a new protocol. > Other types must be explicitly cast. > * Alternate format for integer. For example, ``'{|#x}'.format(0x123)`` > to get ``0x123``. It is more related to debug, and the prefix can > easily be written in the format string (ex: ``0x%x``). > * Relation with format() and the __format__() protocol. bytes.format() > and str.format() are unrelated. > > Unknown: > > * Format integer to hexadecimal? ``%x`` and ``%X`` > * Format integer to octal? ``%o`` > * Format integer to binary? ``{!b}`` > * Alignment? > * Truncating? Truncate or raise an error? > * format keywords? ``b'{arg}'.format(arg=5)`` > * ``str % dict`` ? ``b'%(arg)s' % {'arg': 5}`` > * Floating point number? > * ``%i``, ``%u`` and ``%d`` formats for integer numbers? > * Signed number?
``%+i`` and ``%-i`` > > > bytes % args > ============ > > Formatters: > > * ``"%c"``: one byte > * ``"%s"``: integer or bytes strings > * ``"%20s"`` pads to 20 bytes with spaces (``b' '``) > * ``"%020s"`` pads to 20 bytes with zeros (``b'0'``) > * ``"%\020s"`` pads to 20 bytes with null bytes (``b'\0'``) > > > bytes.format(args) > ================== > > Formatters: > > * ``"{!c}"``: one byte > * ``"{!s}"``: integer or bytes strings > * ``"{!.20s}"`` pads to 20 bytes with spaces (``b' '``) > * ``"{!.020s}"`` pads to 20 bytes with zeros (``b'0'``) > * ``"{!\020s}"`` pads to 20 bytes with null bytes (``b'\0'``) > > > Examples > ======== > > * ``b'a%sc%s' % (b'b', 4)`` gives ``b'abc4'`` > * ``b'a{}c{}'.format(b'b', 4)`` gives ``b'abc4'`` > * ``b'%c' % 88`` gives ``b'X'`` > * ``b'%%'`` gives ``b'%'`` > > > Criticisms > ========== > > * The development cost and maintenance cost. > * In 3.3 encoding to ascii or latin1 is as fast as memcpy > * Developers must work around the lack of bytes%args and > bytes.format(args) anyway to support Python 3.0-3.4 > * bytes.join() is consistently faster than format to join bytes strings. > * Formatting functions can be implemented in a third party module > > > References > ========== > > * `Issue #3982: support .format for bytes > `_ > * `Mercurial project > `_ > * `Twisted project > `_ > * `Documentation of Python 2 formatting (str % args) > `_ > * `Documentation of Python 2 formatting (str.format) > `_ > > Copyright > ========= > > This document has been placed in the public domain. > > > > ..
> Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: From dholth at gmail.com Wed Jan 8 21:07:41 2014 From: dholth at gmail.com (Daniel Holth) Date: Wed, 8 Jan 2014 15:07:41 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: On Wed, Jan 8, 2014 at 2:17 PM, Stefan Behnel wrote: > Victor Stinner, 06.01.2014 14:24: >> Abstract >> ======== >> Add ``bytes % args`` operator and ``bytes.format(args)`` method to >> Python 3.5. > > Here is a counterproposal. Let someone who needs this feature write a > library that does byte string formatting. That properly handles it, a full > featured tool set. Write it in Cython if you need raw speed, that will also > help in making it run in both Python 2 and Python 3, or in providing easy > integration with buffers like the array module, various byte containers, > NumPy, etc. > I'm confident that this will show that the current Py2 code that > (legitimately) does byte string formatting can actually be improved, > simplified or sped up, at least in some corners. I'm sure Py2 byte string > formatting wasn't perfect for this use case either, it just happened to be > there, so everyone used it and worked around its particular quirks for the > particular use case at hand. (Think of "%s" % some_unicode_value, for example.) > > Instead of waiting for 3.5, a third party library allows users to get > started porting their code earlier, and to make it work unchanged with > Python versions before 3.5. Maybe we can enumerate some of the stated drawbacks of b''.format() Convenient string processing tools for bytes will make people ignore Unicode or fail to notice it or do it wrong? (As opposed to the alternative causing them to learn how to process and produce Unicode correctly?) Similar APIs on bytes and str will prevent implicit "assert isinstance(x, str)" checks? 
More-prevalent bytes will propagate across the program causing bugs? A-la open(b'filename').name vs open('filename').name ? It will take a long time. Hopeful benefits may include easier porting and greater Py3 adoption, less encoding dances and/or decoding non-Unicode into Unicode just to make things work, hopefully fewer surrogate-encoded bytes and therefore fewer encoding-bugs-distant-from-source-of-invalid-text, ... From rdmurray at bitdance.com Wed Jan 8 22:29:59 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 08 Jan 2014 16:29:59 -0500 Subject: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) In-Reply-To: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> Message-ID: <20140108213000.2A9152501A1@webabinitio.net> On Wed, 08 Jan 2014 19:22:08 +0000, "Matt Billenstein" wrote: > I started in Python blissfully unaware of unicode - it was a different time for > sure, but what I knew from C worked pretty much the same in Python - I could > read some binary data out of a file, twiddle some bits, and write it back out > again without any of these complexities - life was good and granted I was > naive, but it made Python approachable for me and I enjoyed it. I stuck with > it and learned about unicode and the complexities of encoding data and now I'm > astonished at how many professional programmers don't know the slightest bit > about it and how horribly munged some data you can consume on the web might be > - I agree it's all quite a mess. > > So now I'm getting more serious about Python3 and my fear is that the > development community (python3) has fractured from the user community (python2) > in that they've built something that solves their problems (to oversimplify > lets say a webapp) - sure, a bunch of stuff got fixed along the way and we gave > the users division they would expect (3/2 == 1.5), but somewhere what I felt I believe this is a mis-perception. 
I think Python3 is *simpler* and *less complex* than Python2, both at the Python language level and at the CPython implementation level. (I'm using a definition of these terms that roughly works out to "easier to understand".) That was part of the point. Python3 is *easier* to use for new projects than Python2. I'm not speaking from theory here, I've written and worked on non-trivial new projects in both versions.[1] It is true that in Python3 you *must* learn the difference between bytes and strings. But in the modern world, you had better learn to do that anyway, and learn to do it right up front. If you don't want to, I suppose you could stay stuck in an earlier age and keep using Python2. It also is true that it would be nice to have a more convenient API for, as Antoine put it, interpolating into a binary stream. But really, the vast majority of programs have no need to do that. It is pretty much only the low level libraries, most of them dealing with data-interchange (wire protocols), that would use this. > was more like a hobbyist language has become big and complex and "we need to > protect our users from doing the wrong thing." As I just learned recently, Python was always intended to be a "real" programming language, and not a hobbyist language :) But it was also always meant to be easy to learn and use. Python3's goal is to make it *easier* to do the *right* thing. The fact that in some cases it also makes it harder to do the wrong thing is mostly a consequence of making it easier to do the right thing. Python's philosophy is still one of "consenting adults", despite a few voices agitating for preventing users from shooting themselves in the foot. But making "the one obvious way to do it" easy, and consequently making the other ways harder, fits in to its overall philosophy just fine. As does trying to prevent the wrong thing from happening *by accident* (read: mojibake).
--David [1] I also find it easier to maintain my python3 programs than I do my python2 programs, probably because I've gotten used to the convenience of the new Python3 features, and miss them when working Python2. [2] With perfect hindsight I think we'd have focused more right from the start on single-codebase, rather than on 2to3; but perfect hindsight doesn't do you any good when it comes to foresight. From solipsis at pitrou.net Wed Jan 8 23:42:13 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 8 Jan 2014 23:42:13 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: Message-ID: <20140108234213.0610ef63@fsol> Hi, With Victor's consent, I overhauled PEP 460 and made the feature set more restricted and consistent with the bytes/str separation. However, I also added bytearray into the mix, as bytearray objects should generally support the same operations as bytes (and they can be useful *especially* for network programming). Regards Antoine. On Mon, 6 Jan 2014 14:24:50 +0100 Victor Stinner wrote: > Hi, > > bytes % args and bytes.format(args) are requested by Mercurial and > Twisted projects. The issue #3982 was stuck because nobody proposed a > complete definition of the "new" features. Here is a try as a PEP. > > The PEP is a draft with open questions. First, I'm not sure that both > bytes%args and bytes.format(args) are needed. The implementation of > .format() is more complex, so why not only adding bytes%args? Then, > the following points must be decided to define the complete list of > supported features (formatters): From kristjan at ccpgames.com Wed Jan 8 23:04:56 2014 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Wed, 8 Jan 2014 22:04:56 +0000 Subject: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) 
In-Reply-To: <20140108213000.2A9152501A1@webabinitio.net> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com>, <20140108213000.2A9152501A1@webabinitio.net> Message-ID: Believe it or not, sometimes you really don't care about encodings. Sometimes you just want to parse text files. Python 3 forces you to think about abstract concepts like encodings when all you want is to open that .txt file on the drive and extract some phone numbers and merge in some email addresses. What encoding does the file have? Do I care? Must I care? I have lots of little utilities, to help me with day to day stuff like this. One fine morning I decided to start using Python 3 for the job. Imagine my surprise when it turned out to make my job more complicated, not easier. Suddenly I had to start thinking about stuff that hadn't mattered at all, and still didn't really matter. All it did was complicate things for no benefit. Python forcing you to think about this is like the cashier at the hardware store who won't let you buy the hammer you brought to the cash register because you don't know what wood its handle is made of. Sure, Python should make it easier to do the *right* thing. That's equivalent to placing the indicator selector at a convenient place near the steering wheel. What it shouldn't do, is make the flashing of the indicator mandatory whenever you turn the wheel. All of this talk is positive, though. The fact that these topics have finally reached the halls of python-dev is an indication that people out there are _trying_ to move to 3.3 :) Cheers, K ________________________________________ From: Python-Dev [python-dev-bounces+kristjan=ccpgames.com at python.org] on behalf of R. David Murray [rdmurray at bitdance.com] Sent: Wednesday, January 08, 2014 21:29 To: python-dev at python.org Subject: Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) ... It is true that in Python3 you *must* learn the difference between bytes and strings.
But in the modern world, you had better learn to do that anyway, and learn to do it right up front. If you don't want to, I suppose you could stay stuck in an earlier age and keep using Python2. ... Python3's goal is to make it *easier* to do the *right* thing. The fact that in some cases it also makes it harder to do the wrong thing is mostly a consequence of making it easier to do the right thing. From jsbueno at python.org.br Thu Jan 9 00:28:29 2014 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Wed, 8 Jan 2014 21:28:29 -0200 Subject: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> Message-ID: On 8 January 2014 20:04, Kristján Valur Jónsson wrote: > Believe it or not, sometimes you really don't care about encodings. > Sometimes you just want to parse text files. Python 3 forces you to think about abstract concepts like encodings when all you want is to open that .txt file on the drive and extract some phone numbers and merge in some email addresses. What encoding does the file have? Do I care? Must I care? Kristján, the answer is obviously "yes you must" :-) From victor.stinner at gmail.com Thu Jan 9 00:28:41 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 9 Jan 2014 00:28:41 +0100 Subject: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> Message-ID: Hi, > Python 3 forces you to think about abstract concepts like encodings when all you want is to open that .txt file on the drive and extract some phone numbers and merge in some email addresses. You can open a text file using ascii + surrogateescape, or just open the file in binary. Victor From rdmurray at bitdance.com Thu Jan 9 00:40:39 2014 From: rdmurray at bitdance.com (R.
David Murray) Date: Wed, 08 Jan 2014 18:40:39 -0500 Subject: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com>, <20140108213000.2A9152501A1@webabinitio.net> Message-ID: <20140108234040.2DF3E2501A1@webabinitio.net> On Wed, 08 Jan 2014 22:04:56 +0000, wrote: > Believe it or not, sometimes you really don't care about encodings. > Sometimes you just want to parse text files. Python 3 forces you to > think about abstract concepts like encodings when all you want is to > open that .txt file on the drive and extract some phone numbers and > merge in some email addresses. What encoding does the file have? Do > I care? Must I care? Why *do* you care? Isn't your system configured for utf-8, and all your .txt files encoded with utf-8 by default? Or at least configured with a single consistent encoding? If that's the case, Python3 doesn't make you think about the encoding. Knowing the right encoding is different from needing to know the difference between text and bytes; you only need to worry about encodings when your system isn't configured consistently to begin with. If you do have to care, your little utilities only work by accident in Python2, and must have produced mojibake when the encoding was wrong, unless I'm completely confused. So yeah, sorting that out is harder if you were just living with the mojibake before...but if so I'm surprised you haven't wanted to fix that before this. --David From ben+python at benfinney.id.au Thu Jan 9 01:07:15 2014 From: ben+python at benfinney.id.au (Ben Finney) Date: Thu, 09 Jan 2014 11:07:15 +1100 Subject: [Python-Dev] Python3 "complexity" References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> Message-ID: <7wbnzmf26k.fsf@benfinney.id.au> Kristján Valur Jónsson writes: > Believe it or not, sometimes you really don't care about encodings. > Sometimes you just want to parse text files.
Files don't contain text, they contain bytes. Bytes only become text when filtered through the correct encoding. Python should not guess the encoding if it's unknown. Without the right encoding, you don't get text, you get partial or complete gibberish. So, if what you want is to parse text and not get gibberish, you need to *tell* Python what the encoding is. That's a brute fact of the world of text in computing. > Python 3 forces you to think about abstract concepts like encodings > when all you want is to open that .txt file on the drive and extract > some phone numbers and merge in some email addresses. What encoding > does the file have? Do I care? Must I care? Yes, you must. > Python forcing you to think about this is like the cashier at the > hardware store who won't let you buy the hammer you brought to the > cash register because you don't know what wood its handle is made of. The cashier is making a mistake: the hammer, regardless of the wood in the handle, still functions just fine as a hammer. Hence, the question is unimportant to the purpose. The same is not true of changing the encoding for text. The encoding matters, and the programmer needs to care. -- \ "How wonderful that we have met with a paradox. Now we have | `\ some hope of making progress." -- Niels Bohr | _o__) | Ben Finney From python at mrabarnett.plus.com Thu Jan 9 01:21:02 2014 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 09 Jan 2014 00:21:02 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <7wbnzmf26k.fsf@benfinney.id.au> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> Message-ID: <52CDEB6E.1050008@mrabarnett.plus.com> On 2014-01-09 00:07, Ben Finney wrote: > Kristján Valur Jónsson writes: > >> Believe it or not, sometimes you really don't care about encodings. >> Sometimes you just want to parse text files. > > Files don't contain text, they contain bytes.
Bytes only become text > when filtered through the correct encoding. > > Python should not guess the encoding if it's unknown. Without the right > encoding, you don't get text, you get partial or complete gibberish. > > So, if what you want is to parse text and not get gibberish, you need to > *tell* Python what the encoding is. That's a brute fact of the world of > text in computing. > >> Python 3 forces you to think about abstract concepts like encodings >> when all you want is to open that .txt file on the drive and extract >> some phone numbers and merge in some email addresses. What encoding >> does the file have? Do I care? Must I care? > > Yes, you must. > >> Python forcing you to think about this is like the cashier at the >> hardware store who won't let you buy the hammer you brought to the >> cash register because you don't know what wood its handle is made of. > > The cashier is making a mistake: the hammer, regardless of the wood in > the handle, still functions just fine as a hammer. Hence, the question > is unimportant to the purpose. > On the other hand: "I need a new battery." "What kind of battery?" "I don't care!" > The same is not true of changing the encoding for text. The encoding > matters, and the programmer needs to care. > From breamoreboy at yahoo.co.uk Thu Jan 9 01:27:01 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Thu, 09 Jan 2014 00:27:01 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <52CDEB6E.1050008@mrabarnett.plus.com> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <52CDEB6E.1050008@mrabarnett.plus.com> Message-ID: On 09/01/2014 00:21, MRAB wrote: >> > > "I need a new battery." > > "What kind of battery?" > > "I don't care!" > A neat summary of the draft requirements specification for Python 2.8. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. 
Mark Lawrence

From ijmorlan at uwaterloo.ca Thu Jan 9 00:41:35 2014
From: ijmorlan at uwaterloo.ca (Isaac Morland)
Date: Wed, 8 Jan 2014 18:41:35 -0500 (EST)
Subject: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)
In-Reply-To:
References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com>, <20140108213000.2A9152501A1@webabinitio.net>
Message-ID:

On Wed, 8 Jan 2014, Kristján Valur Jónsson wrote:

> Believe it or not, sometimes you really don't care about encodings.
> Sometimes you just want to parse text files. Python 3 forces you to
> think about abstract concepts like encodings when all you want is to
> open that .txt file on the drive and extract some phone numbers and
> merge in some email addresses. What encoding does the file have? Do I
> care? Must I care?

Mostly staying out of this, but I need to say something here.

If you don't know what encoding the file has, you don't know what bytes correspond to phone numbers. So yes, you must care, or else you simply cannot write your code.

Of course, in practice, it's probably encoded in an ASCII-compatible encoding, so '0' encodes as the single byte 0x30. Whether it's UTF-8, ISO-8859-1, or something else that is ASCII-compatible doesn't really matter.

So, as a practical matter, you can just use ISO-8859-1, even though in principle this is totally wrong. Then ASCII is one byte per character as you expect, and all other bytes will round-trip unchanged. Just don't do any non-trivial processing on non-ASCII characters.

I don't see how it could be made any simpler without going back to making it easy for people to pretend the issue doesn't exist at all and bringing back the attendant confusion and problems.

> I have lots of little utilities, to help me with day to day stuff like
> this. One fine morning I decided to start using Python 3 for the job.
> Imagine my surprise when it turned out to make my job more complicated,
> not easier.
> Suddenly I had to start thinking about stuff that hadn't
> mattered at all, and still didn't really matter. All it did was
> complicate things for no benefit.
[....]
>
> All of this talk is positive, though. The fact that these topics have
> finally reached the halls of python-dev is an indication that people out
> there are _trying_ to move to 3.3 :)

Agreed.

Isaac Morland   CSCF Web Guru
DC 2619, x36650   WWW Software Specialist

From kristjan at ccpgames.com Thu Jan 9 01:22:21 2014
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Thu, 9 Jan 2014 00:22:21 +0000
Subject: [Python-Dev] Python3 "complexity"
In-Reply-To: <7wbnzmf26k.fsf@benfinney.id.au>
References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> , <7wbnzmf26k.fsf@benfinney.id.au>
Message-ID:

Still playing the devil's advocate:
I didn't used to must. Why must I must now? Did the universe just shift when I fired up python3?
Things were demonstrably working just fine before without doing so.

K

________________________________________
From: Python-Dev [python-dev-bounces+kristjan=ccpgames.com at python.org] on behalf of Ben Finney [ben+python at benfinney.id.au]
Sent: Thursday, January 09, 2014 00:07
To: python-dev at python.org
Subject: Re: [Python-Dev] Python3 "complexity"

Kristján Valur Jónsson writes:

> Python 3 forces you to think about abstract concepts like encodings
> when all you want is to open that .txt file on the drive and extract
> some phone numbers and merge in some email addresses. What encoding
> does the file have? Do I care? Must I care?

Yes, you must.
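[Editorial aside, not part of the original thread.] Isaac Morland's ISO-8859-1 suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample bytes, the phone-number pattern, and the variable names are made up for the example and do not come from any poster.

```python
import re

# Hypothetical input: ASCII digits mixed with non-ASCII bytes whose real
# encoding we do not know (assumption for the sake of the example).
raw = b"J\xf3nsson: 555-0134\nM\xfcller: 555-0199\n"

# latin-1 maps each byte 0x00-0xFF to the code point with the same value,
# so decoding cannot fail and re-encoding returns the original bytes.
text = raw.decode("latin-1")

# re.ASCII restricts \d to 0-9, i.e. only "trivial" processing of ASCII.
phones = re.findall(r"\d{3}-\d{4}", text, flags=re.ASCII)

# The non-ASCII bytes pass through untouched.
round_tripped = text.encode("latin-1")
```

The caveat from the message still applies: the non-ASCII characters in `text` are only placeholders for the original bytes, so anything beyond passing them through (case folding, display, comparison with real text) may give wrong results.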
From ben+python at benfinney.id.au Thu Jan 9 01:41:41 2014 From: ben+python at benfinney.id.au (Ben Finney) Date: Thu, 09 Jan 2014 11:41:41 +1100 Subject: [Python-Dev] Python3 "complexity" References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <52CDEB6E.1050008@mrabarnett.plus.com> Message-ID: <7w61puf0l6.fsf@benfinney.id.au> MRAB writes: > On 2014-01-09 00:07, Ben Finney wrote: > > Kristj?n Valur J?nsson writes: > >> Python 3 forces you to think about abstract concepts like encodings > >> when all you want is to open that .txt file on the drive and > >> extract some phone numbers and merge in some email addresses. What > >> encoding does the file have? Do I care? Must I care? > > > > Yes, you must. > > > >> Python forcing you to think about this is like the cashier at the > >> hardware store who won't let you buy the hammer you brought to the > >> cash register because you don't know what wood its handle is made > >> of. > > > > The cashier is making a mistake: the hammer, regardless of the wood in > > the handle, still functions just fine as a hammer. Hence, the question > > is unimportant to the purpose. > > On the other hand: > > "I need a new battery." > > "What kind of battery?" > > "I don't care!" That's a much better analogy. The customer may not care, but the question is essential and must be answered; if the supplier guesses what the customer wants, they are doing the customer a disservice. If the customer insists the supplier just give them a battery which will work regardless of what type of battery the device requires, the *customer is wrong*. Such customers need to be educated about the necessity to care about details they may have no interest in, if they want to get their device working reliably. We can all work toward a world where there is just one encoding which works for all text and no other encodings to confuse the matter. 
Until then, everyone needs to deal with the world as it is. (good sigmonster, have a cookie) -- \ ?Ours is a world where people don't know what they want and are | `\ willing to go through hell to get it.? ?Donald Robert Perry | _o__) Marquis | Ben Finney From kristjan at ccpgames.com Thu Jan 9 01:12:57 2014 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Thu, 9 Jan 2014 00:12:57 +0000 Subject: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) In-Reply-To: <20140108234040.2DF3E2501A1@webabinitio.net> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com>, <20140108213000.2A9152501A1@webabinitio.net> , <20140108234040.2DF3E2501A1@webabinitio.net> Message-ID: Just to avoid confusion, let me state up front that I am very well aware of encodings and all that, having internationalized one largish app in python 2.x. I know the problems that 2.x had with tracking down the source of errors and understand the beautiful concept of encodings on the boundary. However: For a lot of data processing and tools, encoding isn't an issue. Either you assume ascii, or you're working with something like latin1. A single byte encoding. This is because you're working with a text file that _you_ wrote. And you're not assigning any semantics to the characters. If there is actual "text" in there it is just english, not Norwegian or Turkish. A byte read at code 0xfa doesn't mean anything special. It's just that, a byte with that value. The file system doesn't have any default encoding. A file on disk is just a file on disk consisting of bytes. There can never be any wrong encoding, no mojibake. With python 2, you can read that file into a string object. You can scan for your field delimiter, e.g. a comma, split up your string, interpolate some binary data, spit it out again. All without ever thinking about encodings. 
Even though the file is conceptually encoded in something, if you insist on attaching a particular semantic meaning to every ordinal value, whatever that meaning is is in many cases irrelevant to the program. I understand that surrogateescape allows you to do this. But it is an awkward extra step and forces an extra layer of needles semantics on to that guy that just wants to read a file. Sure, vegetarians and alergics like to read the list of ingredients on everything that they eat. But others are just omnivores and want to be able to eat whatever is on the table, and not worry about what it is made of. And yes, you can read the file in binary mode but then you end up with those bytes objects that we have just found that are tedious to work with. So, what I'm saying is that at least I have a very common use case that has just become a) more confusing (having to needlessly derail the train of thought about the data processing to be done by thinking about text encodings) and b) more complicated. Not sure if there is anything to be done about it though :) I think there might be a different analogy: Having to specify an encoding is like having strong typing. In Python 2.7, we _can_ forego that and just duck-type our strings :) K ________________________________________ From: Python-Dev [python-dev-bounces+kristjan=ccpgames.com at python.org] on behalf of R. David Murray [rdmurray at bitdance.com] Sent: Wednesday, January 08, 2014 23:40 To: python-dev at python.org Subject: Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) Why *do* you care? Isn't your system configured for utf-8, and all your .txt files encoded with utf-8 by default? Or at least configured with a single consistent encoding? If that's the case, Python3 doesn't make you think about the encoding. 
Knowing the right encoding is different from needing to know the difference between text and bytes; you only need to worry about encodings when your system isn't configured consistently to begin with. If you do have to care, your little utilities only work by accident in Python2, and must have produced mojibake when the encoding was wrong, unless I'm completely confused. So yeah, sorting that out is harder if you were just living with the mojibake before...but if so I'm surprised you haven't wanted to fix that before this. From ben+python at benfinney.id.au Thu Jan 9 01:49:51 2014 From: ben+python at benfinney.id.au (Ben Finney) Date: Thu, 09 Jan 2014 11:49:51 +1100 Subject: [Python-Dev] Python3 "complexity" References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> Message-ID: <7wzjn6dln4.fsf@benfinney.id.au> Kristj?n Valur J?nsson writes: > I didn't used to must. Why must I must now? Did the universe just > shift when I fired up python3? In a sense, yes. The world of software has been shifting for decades, as a reasult of broader changes in how different segments of humanity have changed their interactions, and thereby changed their expectations of what computers can do with their data. While for some programmers, in past decades, it used to be reasonable to stick one's head in the sand and ignore all encodings except one privileged local encoding, that is no longer reasonable today. As a result, it is incumbent on any programmer working with text to care about text encodings. You've likely already seen it, but the point I'm making is better made in this essay . -- \ ?????????? | `\ (What is undesirable to you, do not do to others.) | _o__) ???? Confucius, 551 BCE ? 
479 BCE | Ben Finney From breamoreboy at yahoo.co.uk Thu Jan 9 02:04:19 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Thu, 09 Jan 2014 01:04:19 +0000 Subject: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com>, <20140108213000.2A9152501A1@webabinitio.net> , <20140108234040.2DF3E2501A1@webabinitio.net> Message-ID: On 09/01/2014 00:12, Kristj?n Valur J?nsson wrote: > Just to avoid confusion, let me state up front that I am very well aware of encodings and all that, having internationalized one largish app in python 2.x. I know the problems that 2.x had with tracking down the source of errors and understand the beautiful concept of encodings on the boundary. > > However: > For a lot of data processing and tools, encoding isn't an issue. Either you assume ascii, or you're working with something like latin1. A single byte encoding. This is because you're working with a text file that _you_ wrote. And you're not assigning any semantics to the characters. If there is actual "text" in there it is just english, not Norwegian or Turkish. A byte read at code 0xfa doesn't mean anything special. It's just that, a byte with that value. The file system doesn't have any default encoding. A file on disk is just a file on disk consisting of bytes. There can never be any wrong encoding, no mojibake. > > With python 2, you can read that file into a string object. You can scan for your field delimiter, e.g. a comma, split up your string, interpolate some binary data, spit it out again. All without ever thinking about encodings. > > Even though the file is conceptually encoded in something, if you insist on attaching a particular semantic meaning to every ordinal value, whatever that meaning is is in many cases irrelevant to the program. > > I understand that surrogateescape allows you to do this. 
But it is an awkward extra step and forces an extra layer of needles semantics on to that guy that just wants to read a file. Sure, vegetarians and alergics like to read the list of ingredients on everything that they eat. But others are just omnivores and want to be able to eat whatever is on the table, and not worry about what it is made of. > And yes, you can read the file in binary mode but then you end up with those bytes objects that we have just found that are tedious to work with. > All I can say is that I've been using python 3 for years and wouldn't know what a surrogateescape was if you were to hit me around the head with it. I open my files, I process them, and Python kindly closes them for me via a context manager. So if you're not bothered about encoding, where has the "awkward extra step and forces an extra layer of needles semantics" bit come from? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From rdmurray at bitdance.com Thu Jan 9 02:24:06 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 08 Jan 2014 20:24:06 -0500 Subject: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com>, <20140108213000.2A9152501A1@webabinitio.net> , <20140108234040.2DF3E2501A1@webabinitio.net> Message-ID: <20140109012406.84BB12501A1@webabinitio.net> On Thu, 09 Jan 2014 00:12:57 +0000, wrote: > I think there might be a different analogy: Having to specify an > encoding is like having strong typing. In Python 2.7, we _can_ forego > that and just duck-type our strings :) Python is a strongly typed language. Saying that python2 let you duck type bytestrings (ie: postpone the decision as to what encoding they were in until the last minute) is an interesting perspective...but as we know it led to many many program bugs. 
Which were the result, essentially, of a failure to strongly type the string and bytes types the way other python types are strongly typed. However, I do now understand your use case better, even though I wouldn't myself write programs like that. Or, rather, I make sure all my files are in the same encoding (utf-8). I suppose that this is because I, as an English-speaking USAian, came late to the need for non-ascii characters, after utf-8 was already well established. The rest of the world didn't have that luxury. --David From tjreedy at udel.edu Thu Jan 9 02:35:48 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 08 Jan 2014 20:35:48 -0500 Subject: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com>, <20140108213000.2A9152501A1@webabinitio.net> Message-ID: On 1/8/2014 5:04 PM, Kristj?n Valur J?nsson wrote: > > Believe it or not, sometimes you really don't care about encodings. > Sometimes you just want to parse text files. Python 3 forces you to > think about abstract concepts like encodings when all you want is to > open that .txt file on the drive and extract some phone numbers and I suspect that you would do that by looking for the bytes that can be interpreted as ascii digits. That will work fine as long as the .txt file has an ascii-compatible encoding. As soon as it does not, the little utility fails. It also fails with non-European digits, such as are used in Arabic and Indic writings. Even if you are in an environment where all .txt files are encoded in utf-8, it will be easier to look for non-ascii digits in decoded unicode strings. > merge in some email addresses. What encoding does the file have? Do > I care? Must I care? If the email addresses have non-ascii characters, then you must. ... > All this talk is positive, though. 
> The fact that these topics
> have finally reached the halls of python-dev is an indication that
> people out there are _trying_ to move to 3.3 :)

That is an interesting observation, worth keeping in mind among the turmoil.

--
Terry Jan Reedy

From rosuav at gmail.com Thu Jan 9 02:36:01 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 9 Jan 2014 12:36:01 +1100
Subject: [Python-Dev] Python3 "complexity"
In-Reply-To: <52CDEB6E.1050008@mrabarnett.plus.com>
References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <52CDEB6E.1050008@mrabarnett.plus.com>
Message-ID:

On Thu, Jan 9, 2014 at 11:21 AM, MRAB wrote:
> On the other hand:
>
> "I need a new battery."
>
> "What kind of battery?"
>
> "I don't care!"

Or, bringing it back to Python: How do you write a set out to a file?

    foo = {1, 2, 4, 8, 16, 32}
    open("foo.txt","w").write(foo) # Uh... nope!

I don't want to have to worry about how it's formatted! I just want to write that set out and have someone read it in later!

A text string is just as abstract as any other complex type. For some reason, we've grown up thinking that "ABCD" == \x41\x42\x43\x44 == "ABCD", even though it's just as logical for those bytes to represent 12.1414 or 1094861636 or 1145258561. There's no difference between encoding one thing to bytes and encoding another thing to bytes, and it's critical to get those encodes/decodes right.
ChrisA From songofacandy at gmail.com Thu Jan 9 04:27:49 2014 From: songofacandy at gmail.com (INADA Naoki) Date: Thu, 9 Jan 2014 12:27:49 +0900 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> Message-ID: > And I think everyone was well intentioned - and python3 covers most of the > bases, but working with binary data is not only a "wire-protocol > programmer's" > problem. Needing a library to wrap bytesthing.format('ascii', > 'surrogateescape') > or some such thing makes python3 less approachable for those who haven't > learned that yet - which was almost all of us at some point when we started > programming. > > Totally agree with you. -- INADA Naoki -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Thu Jan 9 04:54:13 2014 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 09 Jan 2014 03:54:13 +0000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: <52CE1D65.6030206@mrabarnett.plus.com> On 2014-01-06 13:24, Victor Stinner wrote: > Hi, > > bytes % args and bytes.format(args) are requested by Mercurial and > Twisted projects. The issue #3982 was stuck because nobody proposed a > complete definition of the "new" features. Here is a try as a PEP. > > The PEP is a draft with open questions. First, I'm not sure that both > bytes%args and bytes.format(args) are needed. The implementation of > .format() is more complex, so why not only adding bytes%args? Then, > the following points must be decided to define the complete list of > supported features (formatters): > > * Format integer to hexadecimal? ``%x`` and ``%X`` > * Format integer to octal? ``%o`` > * Format integer to binary? ``{!b}`` > * Alignment? > * Truncating? Truncate or raise an error? 
> * format keywords? ``b'{arg}'.format(arg=5)``
> * ``str % dict`` ? ``b'%(arg)s' % {'arg': 5}``
> * Floating point number?
> * ``%i``, ``%u`` and ``%d`` formats for integer numbers?
> * Signed number? ``%+i`` and ``%-i``
>
I'm thinking that the "i" format could be used for signed integers and the "u" for unsigned integers. The width would be the number of bytes. You would also need to have a way of specifying the endianness.

For example:

>>> b'{:<2i}'.format(256)
b'\x01\x00'
>>> b'{:>2i}'.format(256)
b'\x00\x01'

Perhaps the width should default to 1 in the cases of "i" and "u":

>>> b'{:i}'.format(-1)
b'\xFF'
>>> b'{:u}'.format(255)
b'\xFF'
>>> b'{:i}'.format(255)
ValueError: ...

Interestingly, I've just been checking what exception is raised for some format types, and I got this:

>>> '{:c}'.format(-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: %c arg not in range(0x110000)

Should the exception be OverflowError (probably yes), and should the message say "%c"?

From drsalists at gmail.com Thu Jan 9 05:15:54 2014
From: drsalists at gmail.com (Dan Stromberg)
Date: Wed, 8 Jan 2014 20:15:54 -0800
Subject: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)
In-Reply-To:
References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net>
Message-ID:

On Wed, Jan 8, 2014 at 2:04 PM, Kristján Valur Jónsson wrote:
>
> Believe it or not, sometimes you really don't care about encodings.
> Sometimes you just want to parse text files. Python 3 forces you to think about abstract concepts like encodings when all you want is to open that .txt file on the drive and extract some phone numbers and merge in some email addresses. What encoding does the file have? Do I care? Must I care?

If computers had taken off in China before the USA, you'd probably be wondering why some Chinese refuse to care about encodings, when the rest of the world clearly needs them.

Yes, you really should care about encodings.
No, it's not quite as simple for English speakers as it once was. It was formerly simple (for us) because we were effectively pressing everyone else to read and write English.

If you want to keep things close to what you're used to, use latin-1 as your encoding. It's still a choice, and not a great one for user-facing text, but if you want to be simplistic about it, that's a way to do it.

That said, there will be some text that isn't user-facing, e.g. in a network protocol. This is probably what all the fuss is about. But like I said, this can be done with latin-1.

From stephen at xemacs.org Thu Jan 9 05:53:02 2014
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 09 Jan 2014 13:53:02 +0900
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To: <20140108200825.3fb1bd6d@fsol>
References: <20140108200825.3fb1bd6d@fsol>
Message-ID: <87wqi9g3ip.fsf@uwakimon.sk.tsukuba.ac.jp>

Antoine Pitrou writes:

> However, interpolating a bytes object isn't out of place, and it is
> what a minimal "formatting" primitive could do.

Something like this?

    # VERY incomplete pseudo-code
    class str:
        # new method
        # fmtstring has syntax of .format method's spec, maybe adding a 'B'
        # for "insert Blob of bytes" spec
        def format_for_wire(fmtstring, args, encoding='utf-8', errors='strict'):
            result = b''
            # gotta go to a meeting, exercise for reader :-(
            parts = zip_specs_and_args(fmtstring, args)
            for spec, arg in parts:
                if spec == 'B' and isinstance(arg, bytes):
                    result += arg
                else:
                    # built-in format() takes the value first, then the spec
                    partial = format(arg, spec)
                    result += partial.encode(encoding=encoding, errors=errors)
            return result

Maybe format_to_bytes is a more accurate name.

I have no idea how to do this for %-formatting though. :-(

And I have the sneaking suspicion that it *can't* be this easy. :-(

Can it?
:-) From greg.ewing at canterbury.ac.nz Thu Jan 9 06:22:28 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 09 Jan 2014 18:22:28 +1300 Subject: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> Message-ID: <52CE3214.1010900@canterbury.ac.nz> Kristj?n Valur J?nsson wrote: > all you want is to open that .txt > file on the drive and extract some phone numbers and merge in some email > addresses. What encoding does the file have? Do I care? Must I care? To some extent, yes. If the encoding happens to be an ascii-compatible one, such as latin-1 or utf-8, you can probably extract the phone numbers without caring what the rest of the bytes mean. But not if it's utf-16, for example. If you know that all the files on your system have an ascii-compatible encoding, you can use the surrogateescape error handler to avoid having to know about the exact encoding. Granted, that makes it slightly more complicated than it was in Python 2, but not much. -- Greg From stephen at xemacs.org Thu Jan 9 06:29:31 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 09 Jan 2014 14:29:31 +0900 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> Message-ID: <87vbxtg1tw.fsf@uwakimon.sk.tsukuba.ac.jp> Kristj?n Valur J?nsson writes: > Still playing the devil's advocate: > I didn't used to must. Why must I must now? Did the universe just > shift when I fired up python3? No. Go look at the Economist's tag cloud and notice how big "China" and "India" are most days. The universe has been shifting for 3 decades now, you just noticed it when you fired up Python 3. > Things were demonstatably working just fine before without doing > so. Who elected you General Secretary of the UN? 
Things were, and are still, demonstrably fucked up for the world at large. Python 3 is a big contribution to un-fucking the rest of us[1], thank you very much to Guido and Company! It's not obvious how to do things right for those of us who have to deal with 8-10 different encodings daily *on our desktops*, and still make things easy for those of you who rarely see ISO 8859/N for N != 1, let alone monstrosities like GB18030 or Shift JIS. That latter is a shame, but we're working on it (and have been all along -- it's not easy). Footnotes: [1] Or will be when my employer adopts it. From stephen at xemacs.org Thu Jan 9 06:34:14 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 09 Jan 2014 14:34:14 +0900 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <7w61puf0l6.fsf@benfinney.id.au> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <52CDEB6E.1050008@mrabarnett.plus.com> <7w61puf0l6.fsf@benfinney.id.au> Message-ID: <87txddg1m1.fsf@uwakimon.sk.tsukuba.ac.jp> Ben Finney writes: > That's a much better analogy. The customer may not care, but the > question is essential and must be answered; if the supplier guesses what > the customer wants, they are doing the customer a disservice. It is a much better analogy for me on my desktop, and for programmers working for global enterprises, too. It is not for Kristj?n, nor for many other American, European, and yes, even Australian programmers. You're making the same kind of mistake he is (although I personally benefit from your mistake, and have suffered for decades from his :-). Diff'rent folks, diff'rent strokes. It would be nice if we could serve both use cases *by default*. We haven't found the way yet, that's all. 
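[Editorial aside, not part of the original thread.] Greg Ewing's surrogateescape suggestion, mentioned earlier in this thread, might look something like the sketch below. The sample data and names are invented for illustration; the only assumption carried over from his message is that the file uses some ASCII-compatible encoding that we do not know exactly.

```python
# Hypothetical record: the 0xf3 byte is latin-1 for an accented letter and is
# not valid UTF-8 in this position (assumed sample data).
raw = b"name,phone\nJ\xf3nsson,555-0134\n"

# With errors="surrogateescape", each undecodable byte is smuggled into the
# str as a lone surrogate (0xf3 becomes U+DCF3) instead of raising an error.
text = raw.decode("utf-8", errors="surrogateescape")

# Ordinary str processing works on the ASCII parts without knowing the
# exact encoding of the rest.
fields = [line.split(",") for line in text.splitlines()]

# Encoding with the same handler turns the surrogates back into the
# original bytes, so the data survives unchanged.
restored = text.encode("utf-8", errors="surrogateescape")
```

One design note: unlike the latin-1 approach, correctly encoded UTF-8 in the input decodes to real characters here, and only the bytes that fail to decode become placeholders. Printing or writing `text` without the same error handler would likely raise, which is arguably a useful safety net.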
From regebro at gmail.com Thu Jan 9 07:50:48 2014 From: regebro at gmail.com (Lennart Regebro) Date: Thu, 9 Jan 2014 07:50:48 +0100 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <7wbnzmf26k.fsf@benfinney.id.au> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> Message-ID: On Thu, Jan 9, 2014 at 1:07 AM, Ben Finney wrote: > Kristj?n Valur J?nsson writes: > >> Believe it or not, sometimes you really don't care about encodings. >> Sometimes you just want to parse text files. > > Files don't contain text, they contain bytes. Bytes only become text > when filtered through the correct encoding. To be honest, you can define text as "A stream of bytes that are split up in lines separated by a linefeed", and do some basic text processing like that. Just very *basic*, but still. Replacing characters. Extracting certain lines etc. This is harder in Python 3, as bytes does not have all the functionality strings has, like formatting. This can probably be fixed in Python 3.5, if the relevant PEP gets finished. For the battery analogy, that's like saying: "I want a battery." "What kind?" "It doesn't matter, as long as it's over 5V." //Lennart From breamoreboy at yahoo.co.uk Thu Jan 9 08:00:17 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Thu, 09 Jan 2014 07:00:17 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> Message-ID: On 09/01/2014 06:50, Lennart Regebro wrote: > On Thu, Jan 9, 2014 at 1:07 AM, Ben Finney wrote: >> Kristj?n Valur J?nsson writes: >> >>> Believe it or not, sometimes you really don't care about encodings. >>> Sometimes you just want to parse text files. >> >> Files don't contain text, they contain bytes. Bytes only become text >> when filtered through the correct encoding. 
> > To be honest, you can define text as "A stream of bytes that are split > up in lines separated by a linefeed", and do some basic text > processing like that. Just very *basic*, but still. Replacing > characters. Extracting certain lines etc. > > This is harder in Python 3, as bytes does not have all the > functionality strings has, like formatting. This can probably be fixed > in Python 3.5, if the relevant PEP gets finished. > > For the battery analogy, that's like saying: > > "I want a battery." > > "What kind?" > > "It doesn't matter, as long as it's over 5V." > > //Lennart > "That Python 3 battery you sold me blew up when I tried using it". "We've been telling you for years that could happen". "I didn't think you actually meant it". -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From ncoghlan at gmail.com Thu Jan 9 08:11:06 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 9 Jan 2014 17:11:06 +1000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <7wbnzmf26k.fsf@benfinney.id.au> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> Message-ID: On 9 January 2014 10:07, Ben Finney wrote: > Kristj?n Valur J?nsson writes: > >> Believe it or not, sometimes you really don't care about encodings. >> Sometimes you just want to parse text files. > > Files don't contain text, they contain bytes. Bytes only become text > when filtered through the correct encoding. > > Python should not guess the encoding if it's unknown. Without the right > encoding, you don't get text, you get partial or complete gibberish. > > So, if what you want is to parse text and not get gibberish, you need to > *tell* Python what the encoding is. That's a brute fact of the world of > text in computing. Set the mode to "rb", process it as binary. Done. 
See http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html for details. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Jan 9 08:12:18 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 9 Jan 2014 17:12:18 +1000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> Message-ID: On 9 January 2014 10:22, Kristján Valur Jónsson wrote: > Still playing the devil's advocate: > I didn't used to must. Why must I must now? Did the universe just shift when I fired up python3? > Things were demonstrably working just fine before without doing so. They were working fine for experienced POSIX users that had fully internalised the idiosyncrasies of that platform and didn't need to care about any other environment (like Windows or the JVM). Cheers, Nick. > > K > > ________________________________________ > From: Python-Dev [python-dev-bounces+kristjan=ccpgames.com at python.org] on behalf of Ben Finney [ben+python at benfinney.id.au] > Sent: Thursday, January 09, 2014 00:07 > To: python-dev at python.org > Subject: Re: [Python-Dev] Python3 "complexity" > > Kristján Valur Jónsson writes: > >> Python 3 forces you to think about abstract concepts like encodings >> when all you want is to open that .txt file on the drive and extract >> some phone numbers and merge in some email addresses. What encoding >> does the file have? Do I care? Must I care? > > Yes, you must.
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ben+python at benfinney.id.au Thu Jan 9 08:16:16 2014 From: ben+python at benfinney.id.au (Ben Finney) Date: Thu, 09 Jan 2014 18:16:16 +1100 Subject: [Python-Dev] Python3 "complexity" References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> Message-ID: <7weh4heibj.fsf@benfinney.id.au> Nick Coghlan writes: > On 9 January 2014 10:07, Ben Finney wrote: > > Kristj?n Valur J?nsson writes: > > > >> Believe it or not, sometimes you really don't care about encodings. > >> Sometimes you just want to parse text files. > > > > Files don't contain text, they contain bytes. Bytes only become text > > when filtered through the correct encoding. [?] > Set the mode to "rb", process it as binary. Done. Which entails abandoning the stated goal of ?just want to parse text files? :-) -- \ ?All television is educational television. The question is: | `\ what is it teaching?? ?Nicholas Johnson | _o__) | Ben Finney From ncoghlan at gmail.com Thu Jan 9 08:09:10 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 9 Jan 2014 17:09:10 +1000 Subject: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) In-Reply-To: <52CE3214.1010900@canterbury.ac.nz> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <52CE3214.1010900@canterbury.ac.nz> Message-ID: On 9 January 2014 15:22, Greg Ewing wrote: > Kristj?n Valur J?nsson wrote: >> >> all you want is to open that .txt >> file on the drive and extract some phone numbers and merge in some email >> addresses. What encoding does the file have? Do I care? Must I care? 
> > > To some extent, yes. If the encoding happens to be an > ascii-compatible one, such as latin-1 or utf-8, you can > probably extract the phone numbers without caring what > the rest of the bytes mean. But not if it's utf-16, > for example. > > If you know that all the files on your system have an > ascii-compatible encoding, you can use the surrogateescape > error handler to avoid having to know about the exact > encoding. Granted, that makes it slightly more complicated > than it was in Python 2, but not much.

There's also the fact that POSIX folks are used to "r" and "rb" being the same thing. Python 3 chose to make the default behaviour be to open files as text files in the default system encoding. This created two significant user-visible changes:

- POSIX users could no longer ignore the difference between binary mode and text mode when opening files (Windows users have always had to care due to the line ending problem)
- POSIX users could no longer ignore locale configuration errors

We're aiming to resolve the most common locale configuration issue by configuring surrogateescape on the standard streams when the OS claims that the default encoding is ASCII, but ultimately, the long term fix is for POSIX platforms to standardise on and consistently report UTF-8 as the system encoding (as well as configuring ssh environments properly by default).

Python 2 is *very* much a POSIX first language, with Windows, the JVM and other non-POSIX environments as an afterthought. Python 3 intentionally offers more consistent cross platform behaviour, which means it no longer aligns as neatly with the sensibilities of experienced users of POSIX systems.

Cheers, Nick.
> > -- > Greg > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Thu Jan 9 09:03:53 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 9 Jan 2014 19:03:53 +1100 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> Message-ID: On Thu, Jan 9, 2014 at 5:50 PM, Lennart Regebro wrote: > To be honest, you can define text as "A stream of bytes that are split > up in lines separated by a linefeed", and do some basic text > processing like that. Just very *basic*, but still. Replacing > characters. Extracting certain lines etc. You would have to define it as "A stream of bytes encoded in {ASCII|Latin-1|CP-1252|UTF-8} that" etc etc. Otherwise, those bytes might be EBCDIC, UTF-16, or anything else, and your code will fail. And once you've demanded that, well, you're right back here with clarifying encodings, so you may as well just pass encoding="ascii" and do it honestly. ChrisA From regebro at gmail.com Thu Jan 9 09:14:28 2014 From: regebro at gmail.com (Lennart Regebro) Date: Thu, 9 Jan 2014 09:14:28 +0100 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <7weh4heibj.fsf@benfinney.id.au> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7weh4heibj.fsf@benfinney.id.au> Message-ID: On Thu, Jan 9, 2014 at 8:16 AM, Ben Finney wrote: > Nick Coghlan writes: >> Set the mode to "rb", process it as binary. Done. > > Which entails abandoning the stated goal of ?just want to parse text > files? :-) Only if your definition of "text files" means it's unicode. 
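A minimal sketch of the two options discussed in the messages above: opening the file in binary mode, and opening it as text with encoding="ascii" plus the surrogateescape error handler. It assumes a small ASCII-compatible file containing one stray high byte; the file name, the sample contents and the variable names are made up for illustration and do not come from the thread.

```python
import os
import tempfile

# Hypothetical sample data: ASCII-compatible, with one byte (0xE1) that is
# neither valid ASCII nor valid UTF-8.
raw = b"name=Kristj\xe1n\nphone=555-0100\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Option 1: binary mode. Nothing is decoded, so every literal used for
    # matching has to be a bytes literal.
    with open(path, "rb") as f:
        phones_bytes = [line for line in f if line.startswith(b"phone=")]

    # Option 2: text mode, ASCII plus surrogateescape. Bytes that cannot be
    # decoded become lone surrogates and are turned back into the original
    # bytes when the text is encoded with the same error handler.
    with open(path, encoding="ascii", errors="surrogateescape", newline="") as f:
        text = f.read()
    phones_text = [ln for ln in text.splitlines() if ln.startswith("phone=")]
    roundtrip = text.encode("ascii", errors="surrogateescape")
```

Under these assumptions both variants find the phone line without knowing the real encoding, and the surrogateescape variant writes the unknown byte back unchanged. Neither helps with an encoding that is not ASCII-compatible, such as UTF-16.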
From mark at hotpy.org Thu Jan 9 10:01:27 2014 From: mark at hotpy.org (Mark Shannon) Date: Thu, 09 Jan 2014 09:01:27 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <7wbnzmf26k.fsf@benfinney.id.au> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> Message-ID: <52CE6567.9060506@hotpy.org> On 09/01/14 00:07, Ben Finney wrote: > Kristján Valur Jónsson writes: > >> Believe it or not, sometimes you really don't care about encodings. >> Sometimes you just want to parse text files. > > Files don't contain text, they contain bytes. Bytes only become text > when filtered through the correct encoding. > I'm glad someone pointed this out. From kristjan at ccpgames.com Thu Jan 9 10:06:42 2014 From: kristjan at ccpgames.com (=?utf-8?B?S3Jpc3Rqw6FuIFZhbHVyIErDs25zc29u?=) Date: Thu, 9 Jan 2014 09:06:42 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <7wzjn6dln4.fsf@benfinney.id.au> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> Message-ID: > -----Original Message----- > From: Python-Dev [mailto:python-dev- > bounces+kristjan=ccpgames.com at python.org] On Behalf Of Ben Finney > Sent: 9. janúar 2014 00:50 > To: python-dev at python.org > Subject: Re: [Python-Dev] Python3 "complexity" > > Kristján Valur Jónsson writes: > > > I didn't used to must. Why must I must now? Did the universe just > > shift when I fired up python3? > > In a sense, yes. The world of software has been shifting for decades, as a > result of broader changes in how different segments of humanity have > changed their interactions, and thereby changed their expectations of what > computers can do with their data. Do I speak Chinese to my grocer because China is a growing force in the world?
Or start every discussion with my children with a negotiation on what language to use? I get all the talk about Unicode, and interoperability and foreign languages and the world (I'm Icelandic, after all.) The point I'm trying to make, and which I think you are missing is this: A tool that I have been happily using on my own system, to my own ends (I'm not writing international spam posts or hosting a United Nations election, but parsing and writing config.ini files, say) just became harder to use for that purpose. I think I'm not the only one to realize this, otherwise, PEP460 wouldn't be there. Anyway, I'll duck out now *ducks* K From p.f.moore at gmail.com Thu Jan 9 10:19:15 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 9 Jan 2014 09:19:15 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <52CE6567.9060506@hotpy.org> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <52CE6567.9060506@hotpy.org> Message-ID: On 9 January 2014 09:01, Mark Shannon wrote: > On 09/01/14 00:07, Ben Finney wrote: >> >> Kristj?n Valur J?nsson writes: >> >>> Believe it or not, sometimes you really don't care about encodings. >>> Sometimes you just want to parse text files. >> >> >> Files don't contain text, they contain bytes. Bytes only become text >> when filtered through the correct encoding. >> > I'm glad someone pointed this out. Try working on Windows with Powershell as your default shell for a while. You learn that message *very* fast. You end up with a mix of CP1250 and UTF-16 files, and you can no longer even assume that a file of "simple text" is in an ASCII-compatible encoding. After tools like grep fail to work often enough, you get a really strong sense of why knowing the encoding matters (and you feel this urge to rewrite all the GNU tools in Python 3 ;-)). And that's on a single PC in an English-speaking locale :-( (You also get this fun with the ? 
sign being encoded differently in the console and the GUI). So it's not just people that "use funny foreign languages" (apologies to 99% of the globe for that :-)) who are affected. I assume Kristján knows all this, given the "á" in his name :-) But certainly just using open without specifying an encoding has always served me fine in Python 3, in the sense that it does at least as well as Python 2. So I think that if this discussion is to be of any real benefit, a specific example is needed. I honestly don't think I've ever encountered a case where "Sometimes [I] just want to parse text files" and code that uses the default encoding (i.e., looks pretty much identical to Python 2) has *failed* to do the job for me. PEP460 is addressing a very specific use case, and certainly isn't for "just parsing text files" - at least as I understand it. Paul. From stefanrin at gmail.com Thu Jan 9 10:31:57 2014 From: stefanrin at gmail.com (Stefan Ring) Date: Thu, 9 Jan 2014 10:31:57 +0100 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> Message-ID: > just became harder to use for that purpose. The entire discussion reminds me very much of the situation with file names in OS X. Whenever I want to look at an old zip file or tarball which happens to have been lying around on my hard drive for a decade or more, I can't because OS X insists that file names be encoded in UTF-8 and just throws errors if that requirement is not met. And certainly I cannot be required to re-encode all files to the then-favored encoding continually - although favors don't change often and I'm willing to bet that UTF-8 is here to stay, but it has already happened twice in my active computer life (DOS -> latin-1 -> UTF-8).
Going back to the old tarballs, OS X is completely useless for handling them as a result of their encoding decision, and I have to move to a Linux machine which just does not care about encodings. PS I was very relieved to find out that os.listdir() - just to pick one file name-related function - will still return bytes if requested, as it is not at all uncommon (at least for me) to have conflicting file name encodings in different parts of a filesystem. From martin at v.loewis.de Thu Jan 9 11:14:00 2014 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Jan 2014 11:14:00 +0100 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 In-Reply-To: References: <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> <5q7kc9lin0fk38q7u3qqq7ofbeq05s8veh@4ax.com> <20140106040650.33B4D250165@webabinitio.net> <6olqc9hvcfkff6km7rc1hre06ae6ereo23@4ax.com> Message-ID: <52CE7668.5060102@v.loewis.de> On 08.01.14 16:03, Nick Coghlan wrote: > On 9 January 2014 00:43, Bob Hanson wrote: >> When I read this comment of yours, Guido, I immediately started >> wondering about this. You may well be right -- indeed, I have a >> very old install (c.2007) which has not been updated (other than >> one or three new MS "drivers"). >> >> Perhaps the Python 3.4.0b2 MSI installer uses a new capability, >> which, as you say, causes the installer to at least attempt to >> upgrade...? > > I believe the pip bootstrapping involves an MSI feature we haven't > previously used (MvL would be able to confirm). If so, then MSI may be > looking for a new version to interpret that new setting. That's not true. The pip bootstrapping uses a custom action, and we already have one that is similar (compile to pyc), although that isn't run by default. My guess is that it might try verifying signatures, and somehow tries to obtain the CA certificates (although it's puzzling that it would get them from akamai - perhaps MS is hosting the CA bundle there).
Regards, Martin From stephen at xemacs.org Thu Jan 9 11:20:57 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 09 Jan 2014 19:20:57 +0900 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <52CE6567.9060506@hotpy.org> Message-ID: <87ob3lfoc6.fsf@uwakimon.sk.tsukuba.ac.jp> Paul Moore writes: > So I think that if this discussion is to be of any real benefit, a > specific example is needed. I honestly don't think I've ever > encountered a case where "Sometimes [I] just want to parse text > files" and code that uses the default encoding (i.e., looks pretty > much identical to Python 2) has *failed* to do the job for me. I don't understand why it fails for Kristj?n, but I can tell you why it failed for me: Mac OS X "Snow Leopard" (at least on my box, and perhaps due to my misconfiguration) doesn't set the locale variables and for some reason the fallback for locale.getpreferredencoding() is not UTF-8 (== sys.getfilesystemencoding()) nor some Japanese encoding (Japanese is my system language), but US-ASCII! Naturally, putting LANG=ja_JP.UTF-8 in my shell startup fixed that once and for all, so as I say I don't understand why Kristj?n has a problem. From kristjan at ccpgames.com Thu Jan 9 11:15:08 2014 From: kristjan at ccpgames.com (=?utf-8?B?S3Jpc3Rqw6FuIFZhbHVyIErDs25zc29u?=) Date: Thu, 9 Jan 2014 10:15:08 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> Message-ID: > -----Original Message----- > From: Python-Dev [mailto:python-dev- > bounces+kristjan=ccpgames.com at python.org] On Behalf Of Stefan Ring > Sent: 9. 
janúar 2014 09:32 > To: python-dev at python.org > Subject: Re: [Python-Dev] Python3 "complexity" > > > just became harder to use for that purpose. > > The entire discussion reminds me very much of the situation with file names > in OS X. Whenever I want to look at an old zip file or tarball which happens to > have been lying around on my hard drive for a decade or more, I can't > because OS X insists that file names be encoded in > UTF-8 and just throws errors if that requirement is not met. And certainly I > cannot be required to re-encode all files to the then-favored encoding > continually - although favors don't change often and I'm willing to bet that > UTF-8 is here to stay, but it has already happened twice in my active > computer life (DOS -> latin-1 -> UTF-8).

Well, yes. Also, the problem I'm describing has to do with real world stuff. This is the python 2 program:

    with open(fn1) as f1:
        with open(fn2, 'w') as f2:
            f2.write(process_text(f1.read()))

Moving to python 3, I found that this quickly caused problems. So, I explicitly added an encoding. Better guess an encoding, something that is likely, e.g. cp1252:

    with open(fn1, encoding='cp1252') as f1:
        with open(fn2, 'w', encoding='cp1252') as f2:
            f2.write(process_text(f1.read()))

This mostly worked. But then, with real world data, we found that even files we declared to be cp1252 sometimes contained invalid code points. Was the file really in cp1252? Or did someone mess up somewhere? Or simply take some poetic license with the specification? This is when it started to become annoying. I mean, clearly something was broken at some point, or I don't know the exactly correct encoding of the file. But this is not the place to correct that mistake. I want my program to be robust towards such errors. And these errors exist.
So, the third version was:

    with open(fn1, 'rb') as f1:
        with open(fn2, 'wb') as f2:
            f2.write(process_bytes(f1.read()))

This works, but now I have a bytes object which is rather limited in what it can do. Also, all string constants in my process_bytes() function have to be b'foo', rather than 'foo'. Only much later did I learn about 'surrogateescape'. How is a new user to python to know about it? The final version would probably be this:

    with open(fn1, encoding='cp1252', errors='surrogateescape') as f1:
        with open(fn2, 'w', encoding='cp1252', errors='surrogateescape') as f2:
            f2.write(process_text(f1.read()))

Will this always work? I don't know. I hope so. But it seems very verbose when all you want to do is munge on some bytes. And the 'surrogateescape' error handler is not something that a newcomer to the language, or someone coming from python2, is likely to automatically know about. Could this be made simpler? What if we had an encoding that combines 'ascii' and 'surrogateescape'? Something that allows you to read ascii text with unknown high order bytes without this unneeded verbosity? Something that would be immediately obvious to the newcomer? K From ncoghlan at gmail.com Thu Jan 9 11:32:26 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 9 Jan 2014 20:32:26 +1000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> Message-ID: On 9 Jan 2014 11:29, "INADA Naoki" wrote: > > >> And I think everyone was well intentioned - and python3 covers most of the >> bases, but working with binary data is not only a "wire-protocol programmer's" >> problem. If you're working with binary data, use the binary API offered by bytes, bytearray and memoryview.
> Needing a library to wrap bytesthing.format('ascii', 'surrogateescape') >> or some such thing makes python3 less approachable for those who haven't >> learned that yet - which was almost all of us at some point when we started >> programming. > > Totally agree with you. If you're on a relatively modern OS, everything should be UTF-8 and you should be fine as a beginner. When you start encountering malformed data, Python 3 should throw an error, and provide an opportunity to learn more (by looking up the error message), where Python 2 would silently corrupt the data stream. Python 2 enshrined a data model eminently suitable for boundary code that dealt with ASCII compatible binary protocols (like web frameworks) as the default text model. Application code then needed to take special steps to get correct behaviour for the full Unicode range. In essence, the Python 2 text model is the POSIX text model with Unicode support bolted on to the side to make it at least *possible* to write correct application code. This is completely backwards. Web applications vastly outnumber web frameworks, and the same goes for every other domain: applications are vastly more common than the libraries and frameworks that handle data transformations at system boundaries on their behalf, so making the latter easier to write at the expense of the former is a deeply flawed design choice. So Python 3 reverses the situation: the core text model is now more appropriate for the central application code, *after* the boundary code has cleaned up the murky details of wire protocols and file formats. This is pretty easy to deal with for *new* Python 3 code, since you just write things to deal with either bytes or text as appropriate. However, there is some code written for Python 2 that relies more heavily on the ability to treat ascii compatible binary data as both binary data *and* as text.
This is the use case that Python 3 treats as a more specialised use case (perhaps benefitting from a specialised third party type), whereas Python 2 supports it by default. This is also the use case that relied most heavily on implicit encoding and decoding, since that's the mechanism that allows the 8-bit and Unicode paths to share string literals. Cheers, Nick. > > > -- > INADA Naoki > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Thu Jan 9 11:37:59 2014 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Jan 2014 11:37:59 +0100 Subject: [Python-Dev] [RELEASED] Python 3.4.0b2 In-Reply-To: References: <1388968272.16645.66955061.64551259@webmail.messagingengine.com> <601kc9tarddbkqb9fliq4hkbo0odpp2isk@4ax.com> <2o4kc9l94frgidjkgrist88ho8nchd7k69@4ax.com> <5q7kc9lin0fk38q7u3qqq7ofbeq05s8veh@4ax.com> <20140106040650.33B4D250165@webabinitio.net> Message-ID: <52CE7C07.1020303@v.loewis.de> Am 06.01.14 17:26, schrieb Michael Urman: > Here's some more guesswork. Does it seem possible that msiexec is > trying to verify the revocation status of the certificate used to sign > the python .msi file? Per > http://blogs.technet.com/b/pki/archive/2006/11/30/basic-crl-checking-with-certutil.aspx > it looks like crl.microsoft.com is the host; this is hosted on akamai: > crl.microsoft.com is an alias for crl.www.ms.akadns.net. > crl.www.ms.akadns.net is an alias for a1363.g.akamai.net. I think that could be close. The MSI file has two signatures in it: the PSF code signing signature, and a Verisign timestamping signature. 
For the PSF certificate, the CRL is at csc3-2010-crl.verisign.com, which is (here) a CNAME for crl.ws.symantec.com.edgekey.net, which in turn is a CNAME for e6845.ce.akamaiedge.net. The timestamping signature has its CRL at ts-crl.ws.symantec.com, which is a CNAME for crl.ws.symantec.com.edgekey.net again. So the most plausible reason is indeed that it tries to download CRLs, though not Microsoft ones, but Verisign/Symantec ones. Regards, Martin From p.f.moore at gmail.com Thu Jan 9 11:52:54 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 9 Jan 2014 10:52:54 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> Message-ID: On 9 January 2014 10:15, Kristján Valur Jónsson wrote: > Also, the problem I'm describing has to do with real world stuff. > This is the python 2 program: > with open(fn1) as f1: > with open(fn2, 'w') as f2: > f2.write(process_text(f1.read())) > > Moving to python 3, I found that this quickly caused problems. You don't say what problems, but I assume encoding/decoding errors. So the files apparently weren't in the system encoding. OK, at that point I'd probably say to heck with it and use latin-1. Assuming I was sure that (a) I'd never hit a non-ascii compatible file (e.g., UTF16) and (b) I didn't have a decent means of knowing the encoding. One thing that genuinely is difficult is that because disk files don't have any out-of-band data defining their encoding, it *can* be hard to know what encoding to use in an environment where more than one encoding is common. But this isn't really a Python issue - as I say, I've hit it with GNU tools, and I've had to explain the issue to colleagues using Java on many occasions.
The key difference is that with grep, people blame the file, whereas with Python people blame the language :-) (Of course, with Java, people expect this sort of problem so they blame the perverseness of the universe as a whole... ;-)) Paul. From storchaka at gmail.com Thu Jan 9 12:39:25 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 09 Jan 2014 13:39:25 +0200 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52CC68CF.5020806@stoneleaf.us> References: <20140107205308.05e1b5ce@fsol> <52CC68CF.5020806@stoneleaf.us> Message-ID: 07.01.14 22:51, Ethan Furman ???????(??): > On 01/07/2014 12:39 PM, Serhiy Storchaka wrote: >> * It clutters up hg log and hg blame results. Every time when you >> change clinic.py to generate different output, it >> touches multiple lines in all files which use Argument Clinic and >> clutters up their history. > > I think this is the reason to focus on -- the others seem like editor > issues, or easily resolved by the second or third options. AFAIK you don't write much C code. So perhaps C sources maintainability is not too valuable for you. From steve at pearwood.info Thu Jan 9 13:28:54 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 9 Jan 2014 23:28:54 +1100 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> Message-ID: <20140109122854.GF3869@ando> On Thu, Jan 09, 2014 at 05:11:06PM +1000, Nick Coghlan wrote: > On 9 January 2014 10:07, Ben Finney wrote: > > So, if what you want is to parse text and not get gibberish, you need to > > *tell* Python what the encoding is. That's a brute fact of the world of > > text in computing. > > Set the mode to "rb", process it as binary. Done. A nice point, but really, you lose a lot by doing so. 
Even simple things like the ability to write:

    if word[0] == 'X'

instead you have to write things like:

    if word[0:1] == b'X'
    if chr(word[0]) == 'X'
    if word[0] == ord('X')
    if word[0] == 0x58

(pick the one that annoys you the least). And while bytes objects do have a surprising (to me) number of string-ish methods, like upper(), there are a few missing, like format() and isnumeric(). So it's not quite as straightforward as "done". If it were, we wouldn't need text strings :-) -- Steven From solipsis at pitrou.net Thu Jan 9 13:41:30 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 9 Jan 2014 13:41:30 +0100 Subject: [Python-Dev] Python3 "complexity" References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> Message-ID: <20140109134130.73b1f720@fsol> On Thu, 9 Jan 2014 10:15:08 +0000 Kristján Valur Jónsson wrote: > > Moving to python 3, I found that this quickly caused problems. So, I explicitly added an encoding. Better guess an encoding, something that is likely, e.g. cp1252 > with open(fn1, encoding='cp1252') as f1: > with open(fn2, 'w', encoding='cp1252') as f2: > f2.write(process_text(f1.read())) If you don't "care" about the encoding, why don't you use latin1? Things will roundtrip fine and work as well as under Python 2. Regards Antoine. From solipsis at pitrou.net Thu Jan 9 13:48:03 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 9 Jan 2014 13:48:03 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: <52CE1D65.6030206@mrabarnett.plus.com> Message-ID: <20140109134803.4319aa85@fsol> On Thu, 09 Jan 2014 03:54:13 +0000 MRAB wrote: > I'm thinking that the "i" format could be used for signed integers and > the "u" for unsigned integers. The width would be the number of bytes. > You would also need to have a way of specifying the endianness.
> > For example: > > >>> b'{:<2i}'.format(256) > b'\x01\x00' > >>> b'{:>2i}'.format(256) > b'\x00\x01' The goal is not to add an alternative to the struct module. If you need binary packing/unpacking, just use struct. Regards Antoine. From solipsis at pitrou.net Thu Jan 9 13:46:24 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 9 Jan 2014 13:46:24 +0100 Subject: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <52CE3214.1010900@canterbury.ac.nz> Message-ID: <20140109134624.3882f704@fsol> On Thu, 9 Jan 2014 17:09:10 +1000 Nick Coghlan wrote: > > There's also the fact that POSIX folks are used to "r" and "rb" being > the same thing. Which fails immediately under Windows :-) Regards Antoine. From kristjan at ccpgames.com Thu Jan 9 14:00:59 2014 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Thu, 9 Jan 2014 13:00:59 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> Message-ID: > -----Original Message----- > From: Paul Moore [mailto:p.f.moore at gmail.com] > Sent: 9. jan?ar 2014 10:53 > To: Kristj?n Valur J?nsson > Cc: Stefan Ring; python-dev at python.org > > Moving to python 3, I found that this quickly caused problems. > > You don't say what problems, but I assume encoding/decoding errors. So the > files apparently weren't in the system encoding. OK, at that point I'd > probably say to heck with it and use latin-1. Assuming I was sure that (a) I'd > never hit a non-ascii compatible file (e.g., UTF16) and > (b) I didn't have a decent means of knowing the encoding. Right. But even latin-1, or better, cp1252 (on windows) does not solve it because these have undefined code points. 
So you need 'surrogateescape' error handling as well. Something that I didn't know at the time, having just come from python 2 and knowing its Unicode model well. > > One thing that genuinely is difficult is that because disk files don't have any > out-of-band data defining their encoding, it *can* be hard to know what > encoding to use in an environment where more than one encoding is > common. But this isn't really a Python issue - as I say, I've hit it with GNU > tools, and I've had to explain the issue to colleagues using Java on many > occasions. The key difference is that with grep, people blame the file, > whereas with Python people blame the language :-) (Of course, with Java, > people expect this sort of problem so they blame the perverseness of the > universe as a whole... ;-)) Which reminds me, can Python3 read text files with BOM automatically yet? K From martin at v.loewis.de Thu Jan 9 14:09:02 2014 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Jan 2014 14:09:02 +0100 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> Message-ID: <52CE9F6E.1050309@v.loewis.de> > Right. But even latin-1, or better, cp1252 (on windows) does not solve it because these have undefined > code points. That's not true. latin-1 does not have undefined code points. 
Regards, Martin From kristjan at ccpgames.com Thu Jan 9 13:55:35 2014 From: kristjan at ccpgames.com (=?utf-8?B?S3Jpc3Rqw6FuIFZhbHVyIErDs25zc29u?=) Date: Thu, 9 Jan 2014 12:55:35 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <20140109134130.73b1f720@fsol> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109134130.73b1f720@fsol> Message-ID: > -----Original Message----- > From: Python-Dev [mailto:python-dev- > bounces+kristjan=ccpgames.com at python.org] On Behalf Of Antoine Pitrou > Sent: 9. jan?ar 2014 12:42 > To: python-dev at python.org > Subject: Re: [Python-Dev] Python3 "complexity" > > On Thu, 9 Jan 2014 10:15:08 +0000 > Kristj?n Valur J?nsson wrote: > > > > Moving to python 3, I found that this quickly caused problems. So, I > > explicitly added an encoding. Better guess an encoding, something that is > likely, e.g. cp1252 with open(fn1, encoding='cp1252') as f1: > > with open(fn2, 'w', encoding='cp1252') as f2: > > f2.write(process_text(f1.read()) > > If you don't "care" about the encoding, why don't you use latin1? > Things will roundtrip fine and work as well as under Python 2. Because latin1 does not define all code points, giving you errors there. Same with cp1252. Which is why you need 'surrogateescape' in addition. K From solipsis at pitrou.net Thu Jan 9 14:17:38 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 9 Jan 2014 14:17:38 +0100 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109134130.73b1f720@fsol> Message-ID: <20140109141738.13454540@fsol> On Thu, 9 Jan 2014 12:55:35 +0000 Kristj?n Valur J?nsson wrote: > > If you don't "care" about the encoding, why don't you use latin1? 
> > Things will roundtrip fine and work as well as under Python 2. > > Because latin1 does not define all code points, giving you errors there.
>>> b = bytes(range(256))
>>> b.decode('latin1')
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0¡¢£¤¥¦§¨©ª«¬\xad®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ'
Not sure which errors you were getting? Regards Antoine.
From p.f.moore at gmail.com Thu Jan 9 14:24:53 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 9 Jan 2014 13:24:53 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> Message-ID: On 9 January 2014 13:00, Kristján Valur Jónsson wrote: >> You don't say what problems, but I assume encoding/decoding errors. So the >> files apparently weren't in the system encoding. OK, at that point I'd >> probably say to heck with it and use latin-1. Assuming I was sure that (a) I'd >> never hit a non-ascii compatible file (e.g., UTF16) and >> (b) I didn't have a decent means of knowing the encoding. > Right. But even latin-1, or better, cp1252 (on windows) does not solve it because these have undefined > code points. So you need 'surrogateescape' error handling as well. Something that I didn't know at > the time, having just come from python 2 and knowing its Unicode model well.
>>> bin = bytes(range(256))
>>> bin
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> bin.decode('latin-1')
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0¡¢£\xa4¥\xa6\xa7\xa8\xa9ª«¬\xad\xae\xaf°±²\xb3\xb4µ\xb6·\xb8\xb9º»¼½\xbe¿\xc0\xc1\xc2\xc3ÄÅÆÇ\xc8É\xca\xcb\xcc\xcd\xce\xcf\xd0Ñ\xd2\xd3\xd4\xd5Ö\xd7\xd8\xd9\xda\xdbÜ\xdd\xdeßàáâ\xe3äåæçèéêëìíîï\xf0ñòóô\xf5ö÷\xf8ùúûü\xfd\xfeÿ'
No undefined bytes there. If you mean that latin-1 can't encode all of the Unicode code points, then how did those code points get in there? Presumably you put them in, and so you're not just playing with the ASCII text parts. And you *do* need to understand encodings. >> One thing that genuinely is difficult is that because disk files don't have any >> out-of-band data defining their encoding, it *can* be hard to know what >> encoding to use in an environment where more than one encoding is >> common.
But this isn't really a Python issue - as I say, I've hit it with GNU >> tools, and I've had to explain the issue to colleagues using Java on many >> occasions. The key difference is that with grep, people blame the file, >> whereas with Python people blame the language :-) (Of course, with Java, >> people expect this sort of problem so they blame the perverseness of the >> universe as a whole... ;-)) > > Which reminds me, can Python3 read text files with BOM automatically yet? If by "automatically" you mean "reads the BOM and chooses an appropriate encoding based on it" then I don't know, but I suspect not. But unless you're worried about 2-byte encodings (see! you need to understand encodings again!) latin-1 will still work. It sounds to me like what you *really* want is something that autodetects encodings on Windows in the same sort of way as other Windows tools like Notepad does. That's a fair thing to want, but no, Python doesn't provide it (nor did Python 2). I suspect that it would be possible to write a codec to do this, though. Maybe there's even one on PyPI. Paul From kristjan at ccpgames.com Thu Jan 9 14:37:11 2014 From: kristjan at ccpgames.com (=?utf-8?B?S3Jpc3Rqw6FuIFZhbHVyIErDs25zc29u?=) Date: Thu, 9 Jan 2014 13:37:11 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <20140109141738.13454540@fsol> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109134130.73b1f720@fsol> <20140109141738.13454540@fsol> Message-ID: > -----Original Message----- > From: Python-Dev [mailto:python-dev- > bounces+kristjan=ccpgames.com at python.org] On Behalf Of Antoine Pitrou > Sent: 9. jan?ar 2014 13:18 > To: python-dev at python.org > Subject: Re: [Python-Dev] Python3 "complexity" > > On Thu, 9 Jan 2014 12:55:35 +0000 > Kristj?n Valur J?nsson wrote: > > > If you don't "care" about the encoding, why don't you use latin1? 
> > > Things will roundtrip fine and work as well as under Python 2. > > > > Because latin1 does not define all code points, giving you errors there. > >
> >>> b = bytes(range(256))
> >>> b.decode('latin1')
> '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0¡¢£¤¥¦§¨©ª«¬\xad®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ'
You are right. I'm talking about "cp1252" which is the windows version thereof:
>>> s = ''.join(chr(i) for i in range(256))
>>> s.decode('cp1252')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 129: character maps to <undefined>
This definition is funny, because according to Wikipedia, it is a "superset" of 8859-1 (latin1). See http://en.wikipedia.org/wiki/Cp1252 Also, see http://en.wikipedia.org/wiki/Latin1 There is confusion there. The iso8859-1 does in fact not define the control codes in range 128 to 159, whereas the Unicode page Latin 1 does. Strictly speaking, then, a Latin1 (or more specifically, ISO8859-1) decoder should error on these characters. The 'Latin1' codec therefore is not a true 8859-1 codec.
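The difference between the two codecs can be checked directly in Python 3. The following is a minimal sketch, assuming CPython's bundled 'latin-1' and 'cp1252' codecs; the variable names are only illustrative. It lists the byte values that a strict cp1252 decode rejects (the traceback above confirms 0x81 is one of them) and shows that errors='surrogateescape' lets the full byte range round-trip anyway.

```python
# Sketch: compare how the 'latin-1' and 'cp1252' codecs treat all 256 byte values.
data = bytes(range(256))

# latin-1 maps byte N to code point U+00NN, so every byte decodes and encodes back.
assert data.decode("latin-1").encode("latin-1") == data

# cp1252 leaves a few byte values unassigned; a strict decode rejects those.
undefined = []
for value in range(256):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(value)

print("rejected by strict cp1252:", [hex(v) for v in undefined])

# With errors='surrogateescape' each unassigned byte becomes a lone surrogate
# (U+DC80..U+DCFF) on decoding and is turned back into the same byte on encoding.
text = data.decode("cp1252", errors="surrogateescape")
assert text.encode("cp1252", errors="surrogateescape") == data
```

The surrogate characters are placeholders only: they survive the round trip, but they are not meaningful text and a strict encode will refuse them.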
K From victor.stinner at gmail.com Thu Jan 9 14:50:41 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 9 Jan 2014 14:50:41 +0100 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109134130.73b1f720@fsol> <20140109141738.13454540@fsol> Message-ID: 2014/1/9 Kristj?n Valur J?nsson : > This definition is funny, because according to Wikipedia, it is a "superset" of 8869-1 ( latin1) Bytes 0x80..0x9f are unassigned in ISO/CEI 8859-1... but are assigned in (IANA's) ISO-8859-1. Python implements the latter, ISO-8859-1. Wikipedia says "This encoding is a superset of ISO 8859-1, but differs from the IANA's ISO-8859-1". Victor From kristjan at ccpgames.com Thu Jan 9 14:40:08 2014 From: kristjan at ccpgames.com (=?utf-8?B?S3Jpc3Rqw6FuIFZhbHVyIErDs25zc29u?=) Date: Thu, 9 Jan 2014 13:40:08 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109134130.73b1f720@fsol> <20140109141738.13454540@fsol> Message-ID: > -----Original Message----- > From: Python-Dev [mailto:python-dev- > bounces+kristjan=ccpgames.com at python.org] On Behalf Of Kristj?n Valur > J?nsson > Sent: 9. jan?ar 2014 13:37 > To: Antoine Pitrou; python-dev at python.org > Subject: Re: [Python-Dev] Python3 "complexity" > > This definition is funny, because according to Wikipedia, it is a "superset" of > 8869-1 ( latin1) See http://en.wikipedia.org/wiki/Cp1252 > Also, see > http://en.wikipedia.org/wiki/Latin1 > > There is confusion there. The iso8859-1 does in fact not define the control > codes in range 128 to 158, whereas the Unicode page Latin 1 does. 
> Strictly speaking, then, a Latin1 (or more specifically, ISO8859-1) decoder > should error on these characters. > the 'Latin1' codec therefore is not a true 8859-1 codec. See also: http://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block) for the latin-1 supplement, not to be confused with 8859-1. The header of the 8859-1 page is telling: """ ISO/IEC 8859-1 From Wikipedia, the free encyclopedia (Redirected from Latin1) For the Unicode block also called "Latin 1", see Latin-1 Supplement (Unicode block). For the character encoding commonly mislabeled as "ISO-8859-1", see Windows-1252. """ K From dholth at gmail.com Thu Jan 9 15:03:40 2014 From: dholth at gmail.com (Daniel Holth) Date: Thu, 9 Jan 2014 09:03:40 -0500 Subject: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) In-Reply-To: <20140109134624.3882f704@fsol> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <52CE3214.1010900@canterbury.ac.nz> <20140109134624.3882f704@fsol> Message-ID: So the customer you're looking for is the person who cares a lot about encodings, knows how to do Unicode correctly, and has noticed that certain valid cases not limited to imperialist simpletons (dealing with specific common things invented before 1996, dealing with mixed encodings, doing what Nick describes as "ASCII compatible binary protocols") are *more complicated to do correctly* in Python 3 because Python 3 undeniably has more complicated though probably better *Unicode* support. N.b. WSGI, email, url parsing etc. The same person loves Python, all the other Python 3 features, and probably you personally, but mostly does not write programs in the domains that Python 3 makes easier. They emphatically do not want the Python 2 model especially not implicit coercion. They only want additional tools for text or string processing in Python 3. 
From solipsis at pitrou.net Thu Jan 9 15:07:02 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 9 Jan 2014 15:07:02 +0100 Subject: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <52CE3214.1010900@canterbury.ac.nz> <20140109134624.3882f704@fsol> Message-ID: <20140109150702.3a4b965f@fsol> On Thu, 9 Jan 2014 09:03:40 -0500 Daniel Holth wrote: > They emphatically do not want the Python 2 > model especially not implicit coercion. They only want additional > tools for text or string processing in Python 3. That's a good point. Now it's up to people who need those additional tools to propose them. We can't second-guess everyone's needs. Regards Antoine.
From steve at pearwood.info Thu Jan 9 15:13:33 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 10 Jan 2014 01:13:33 +1100 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> Message-ID: <20140109141333.GG3869@ando> On Thu, Jan 09, 2014 at 01:00:59PM +0000, Kristján Valur Jónsson wrote: > Which reminds me, can Python3 read text files with BOM automatically yet? I'm not sure what you mean by that. If you mean, can Python3 distinguish between UTF-16BE and UTF-16LE on the basis of a BOM, then it's been able to do that for a long time:
steve@orac:~$ hexdump sample-utf-16.txt
0000000 feff 0048 0065 006c 006c 006f 0020 0057
0000010 006f 0072 006c 0064 0021 000a 00a2 00a3
0000020 00a7 2022 00b6 00df 03c0 2248 2206 000a
0000030
steve@orac:~$ python3.1 -c "print(open('sample-utf-16.txt', encoding='utf-16').read())"
Hello World!
¢£§•¶ßπ≈∆
If you mean, "Will Python assume that the presence of bytes FEFF or FFFE at the start of a file means that it is encoded in UTF-16?", then as far as I know, the answer is "No":
[steve@ando ~]$ python3.3 -c "print(open('sample-utf-16.txt').read())"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.3/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I wouldn't want it to guess the encoding by default. See the Zen about ambiguity. -- Steven
From kristjan at ccpgames.com Thu Jan 9 15:24:00 2014 From: kristjan at ccpgames.com (=?utf-8?B?S3Jpc3Rqw6FuIFZhbHVyIErDs25zc29u?=) Date: Thu, 9 Jan 2014 14:24:00 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109134130.73b1f720@fsol> <20140109141738.13454540@fsol> Message-ID: > -----Original Message----- > From: Victor Stinner [mailto:victor.stinner at gmail.com] > Sent: 9. janúar 2014 13:51 > To: Kristján Valur Jónsson > Cc: Antoine Pitrou; python-dev at python.org > Subject: Re: [Python-Dev] Python3 "complexity" > > 2014/1/9 Kristján Valur Jónsson : > > This definition is funny, because according to Wikipedia, it is a > > "superset" of 8859-1 (latin1) > > Bytes 0x80..0x9f are unassigned in ISO/CEI 8859-1... but are assigned in > (IANA's) ISO-8859-1. > > Python implements the latter, ISO-8859-1. > > Wikipedia says "This encoding is a superset of ISO 8859-1, but differs from > the IANA's ISO-8859-1". > Thanks. That's entirely non-confusing :) "ISO-8859-1 is the IANA preferred name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429." So anyway, yes, Python's "latin1" encoding does cover the entire 256 range.
But on windows we use cp1252 instead which does not, but instead defines useful and common windows characters in many of the control character slots. Hence the need for "surrogateescape" to be able to roundtrip characters. Again, this is non-obvious, and knowing from my experience with cp1252, I had no way of guessing that the "subset", i.e. latin1, would indeed cover all the range. Two things then I have learned since my initial foray into parsing ascii files with python3: Surrogateescapes and "latin1 in python == IANA's ISO-8859-1 which does indeed define the whole 8 bit range". K
From ethan at stoneleaf.us Thu Jan 9 16:26:07 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 09 Jan 2014 07:26:07 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: References: <20140107205308.05e1b5ce@fsol> <52CC68CF.5020806@stoneleaf.us> Message-ID: <52CEBF8F.3000500@stoneleaf.us> On 01/09/2014 03:39 AM, Serhiy Storchaka wrote: > 07.01.14 22:51, Ethan Furman wrote: > > AFAIK you don't write much C code. So perhaps C sources maintainability is not too valuable for you. I don't write much C code yet, no, but C source maintainability is even more important to me because of it. Having to search several files for something makes it more difficult for me to find what I need. I have the same issues with Python code, too. Back in my windows days I had some custom functions to make py code browsing much nicer in Vim; then I changed jobs, forgot to grab the code, and now my py files are unfolded and large. So far just searching for what I'm looking for has worked well enough that I haven't reimplemented my lost functions. But that is an editor issue, not a file issue. -- ~Ethan~
From stephen at xemacs.org Thu Jan 9 16:33:47 2014 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Fri, 10 Jan 2014 00:33:47 +0900 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <20140109122854.GF3869@ando> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <20140109122854.GF3869@ando> Message-ID: <87k3e9f9us.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > If it were, we wouldn't need text strings :-) Speak for yourself, Kemosabe. Red man need Unicode, full meal not just a few bytes. From barry at python.org Thu Jan 9 18:34:14 2014 From: barry at python.org (Barry Warsaw) Date: Thu, 9 Jan 2014 12:34:14 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <87ha9fgjon.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140107154650.0fd63db1@anarchist.wooz.org> <1389134006.2657.67865337.4C46F1B4@webmail.messagingengine.com> <87ha9fgjon.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20140109123414.6bb347f4@anarchist.wooz.org> On Jan 08, 2014, at 01:51 PM, Stephen J. Turnbull wrote: >Benjamin Peterson writes: > > > I agree. This is a very important, much-requested feature for low-level > > networking code. > >I hear it's much-requested, but is there any description of typical >use cases? The two unported libraries that are preventing me from switching Mailman 3 to Python 3 are restish and storm. For storm, there's a viable alternative in SQLAlchemy though I haven't looked at how difficult it will be to port the model layer (even though we once did use SA). restish is tougher. I've investigated flask, pecan, wsme, and a few others that already have Python 3 support and none of them provide an API that I consider as nice a fit as restish for our standalone WSGI-based REST admin server. That's not to denigrate those other projects, it's just that I think restish hit the sweet spot, and porting Mailman 3 to some other framework so far has proven unworkable (I've tried with each of them). 
restish is plumbing so I think it's a good test case for Nick's observations of a wire-protocol layer library, and it's obvious that it Just Works in Python 2 but does not work at all in Python 3. There have been at least 3 attempts to port restish to Python 3 and all of them get stuck in various places where you actually *can't* decide whether some data structure should be a bytes or str. Make one choice and you get stuck over here, make the other choice and you get stuck over there. I've got two abandoned branches on github with (rather old) porting attempts, and I know other developers have some branches as well. Having given up on trying to switch to a different framework, I'm starting over again with restish (really, it's wonderful :). I plan on keeping more detailed notes this time specifically so that I can help contribute to this discussion. If anybody wants to pitch in, both for the specific purpose of porting the library, and for the more general insights it could provide for this thread, please get in touch. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL:
From ncoghlan at gmail.com Thu Jan 9 18:59:15 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 10 Jan 2014 03:59:15 +1000 Subject: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) In-Reply-To: <20140109150702.3a4b965f@fsol> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <52CE3214.1010900@canterbury.ac.nz> <20140109134624.3882f704@fsol> <20140109150702.3a4b965f@fsol> Message-ID: On 9 Jan 2014 22:08, "Antoine Pitrou" wrote: > > On Thu, 9 Jan 2014 09:03:40 -0500 > Daniel Holth wrote: > > They emphatically do not want the Python 2 > > model especially not implicit coercion. They only want additional > > tools for text or string processing in Python 3. > > That's a good point. Now it's up to people who need those additional > tools to propose them. We can't second-guess everyone's needs. Note that I've tried to find prettier ways to write the standard library's URL parsing code. In addition to the original alternatives I explored, I'm currently experimenting with a generic function based approach with mixed results. I'm reserving judgement until I see how the completed conversion looks, but currently it doesn't seem any simpler than my current higher order function approach. However, the implicit conversions are *critical* to sharing constants between the two code paths in Python 2 without coercing bytes to str or vice-versa (disabling the implicit coercion breaks Unicode handling), so I'm still not sure the goal is achievable without creating a new type *specifically* for that task. Python 3 only code is generally much simpler - you can usually pick binary or text and just support one of them, rather than trying to support both in the same API. Cheers, Nick. > > Regards > > Antoine.
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Jan 9 19:08:46 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 10 Jan 2014 04:08:46 +1000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109134130.73b1f720@fsol> <20140109141738.13454540@fsol> Message-ID: On 9 Jan 2014 22:25, "Kristj?n Valur J?nsson" wrote: > > > > > -----Original Message----- > > From: Victor Stinner [mailto:victor.stinner at gmail.com] > > Sent: 9. jan?ar 2014 13:51 > > To: Kristj?n Valur J?nsson > > Cc: Antoine Pitrou; python-dev at python.org > > Subject: Re: [Python-Dev] Python3 "complexity" > > > > 2014/1/9 Kristj?n Valur J?nsson : > > > This definition is funny, because according to Wikipedia, it is a > > > "superset" of 8869-1 ( latin1) > > > > Bytes 0x80..0x9f are unassigned in ISO/CEI 8859-1... but are assigned in > > (IANA's) ISO-8859-1. > > > > Python implements the latter, ISO-8859-1. > > > > Wikipedia says "This encoding is a superset of ISO 8859-1, but differs from > > the IANA's ISO-8859-1". > > > > Thanks. That's entirely non-confusing :) > " ISO-8859-1 is the IANA preferred name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429." > > So anyway, yes, Python's "latin1" encoding does cover the entire 256 range. But on windows we use cp1252 instead which does not, > but instead defines useful and common windows characters in many of the control caracters slots. > Hence the need for "surrogateescape" to be able to roundtrip characters. 
> > Again, this is non-obvious, and knowing from my experience with cp1252, I had no way of guessing that the "subset", i.e. latin1, would indeed cover all the range. Two things then I have learned since my initial foray into parsing ascii files with python3: Surrogateescapes and "latin1 in python == IANA's ISO-8859-1 which does indeed define the whole 8 bit range". http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html is currently linked from the Unicode HOWTO. However, I'd be happy to offer it for direct inclusion to help make it more discoverable. Cheers, Nick. > > K
From ncoghlan at gmail.com Thu Jan 9 20:26:04 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 10 Jan 2014 05:26:04 +1000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140108234213.0610ef63@fsol> References: <20140108234213.0610ef63@fsol> Message-ID: On 9 Jan 2014 06:43, "Antoine Pitrou" wrote: > > > Hi, > > With Victor's consent, I overhauled PEP 460 and made the feature set > more restricted and consistent with the bytes/str separation. +1 I was initially dubious about the idea, but the proposed semantics look good to me. We should probably include format_map for consistency with the str API. >However, I > also added bytearray into the mix, as bytearray objects should > generally support the same operations as bytes (and they can be useful > *especially* for network programming). So we'd define the *format* string as mutable to get a mutable result out of the formatting operations? This seems a little weird to me.
It also seems weird for a format method on a mutable type to *not* perform in-place mutation. On the other hand, I don't see another obvious way to control the output type. Cheers, Nick. > > Regards > > Antoine. > > > > On Mon, 6 Jan 2014 14:24:50 +0100 > Victor Stinner wrote: > > Hi, > > > > bytes % args and bytes.format(args) are requested by Mercurial and > > Twisted projects. The issue #3982 was stuck because nobody proposed a > > complete definition of the "new" features. Here is a try as a PEP. > > > > The PEP is a draft with open questions. First, I'm not sure that both > > bytes%args and bytes.format(args) are needed. The implementation of > > .format() is more complex, so why not only adding bytes%args? Then, > > the following points must be decided to define the complete list of > > supported features (formatters): > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Jan 9 20:30:53 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 9 Jan 2014 20:30:53 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140108234213.0610ef63@fsol> Message-ID: <20140109203053.59c4ba1b@fsol> On Fri, 10 Jan 2014 05:26:04 +1000 Nick Coghlan wrote: > > We should probably include format_map for consistency with the str API. Yes, you're right. > >However, I > > also added bytearray into the mix, as bytearray objects should > > generally support the same operations as bytes (and they can be useful > > *especially* for network programming). > > So we'd define the *format* string as mutable to get a mutable result out > of the formatting operations? This seems a little weird to me. 
> > It also seems weird for a format method on a mutable type to *not* perform > in-place mutation. It's consistent with bytearray.join's behaviour: >>> x = bytearray() >>> x.join([b"abc"]) bytearray(b'abc') >>> x bytearray(b'') Regards Antoine. From kristjan at ccpgames.com Thu Jan 9 21:22:18 2014 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Thu, 9 Jan 2014 20:22:18 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109134130.73b1f720@fsol> <20140109141738.13454540@fsol> , Message-ID: Thanks Nick. This does seem to cover it all. Perhaps it is worth mentioning cp1252 as the windows version of latin1, which _does_not_ cover all code points and hence requires surrogateescapes for best effort solution. K ________________________________ From: Nick Coghlan [ncoghlan at gmail.com] Sent: Thursday, January 09, 2014 18:08 To: Kristj?n Valur J?nsson Cc: Victor Stinner; Antoine Pitrou; python-dev at python.org Subject: Re: [Python-Dev] Python3 "complexity" http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html is currently linked from the Unicode HOWTO. However, I'd be happy to offer it for direct inclusion to help make it more discoverable. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu Jan 9 22:36:05 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 9 Jan 2014 13:36:05 -0800 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> Message-ID: This has all gotten a bit complicated because everyone has been thinking in terms of actual encodings and actual text files. 
But I think the use-case here is something different: A file with a bunch of bytes in it, _some_of which are ascii, and the rest are other bytes (maybe binary data, maybe non-ascii-encoded text). I think this is the use-case that "just worked" in py2, but doesn't in py3 -- i.e. in py3 you have to choose either the binary interpretation or the ascii one, but you can't have both. If you choose ascii, it will barf when you try to decode it, if you choose binary, you lose the ability to do simple stuff with the ascii subset -- parsing, substitution, etc. Some folks have suggested using latin-1 (or other 8-bit encoding) -- is that guaranteed to work with any binary data, and round-trip accurately? and will surrogateescape work for arbitrary binary data? If this is a common need, then it would be nice for py3 to address. I know that I work with a couple file formats that have text headers followed by binary data (not as hard to deal with, but still harder in py3). And from this discussion , it seems that "wire protocols" commonly mix ascii and binary. So the decisions to be made: Is this a use-case worth supporting in the standard library? If so, how? 1) add some of the basic stuff to the bytes object - i.e. string formatting, what this all started with. 2) create a custom encoding that could losslessly convert to from this mixture to/from a unicode object. I 'm not sure if that is even possible, but it would be kind of cool. 3) create a new object, neither a string nor a bytes object that did what we want (it would look a lot like the py2 string...) 4) create a module for doing the stuff wanted with a bytes object (not very OO) Does that clarify the discussion at all? 
On Thu, Jan 9, 2014 at 2:15 AM, Kristján Valur Jónsson <kristjan at ccpgames.com> wrote:

> This is the python 2 program:
> with open(fn1) as f1:
>     with open(fn2, 'w') as f2:
>         f2.write(process_text(f1.read()))
>

I think the key point here is that this worked because a common case was ascii text and arbitrary binary mixed. As long as all the process_text() stuff is ascii only, that would work, either with arbitrary binary data or ascii-compatible encoding. The fact that it would NOT work with arbitrarily encoded data doesn't mean it's not useful for this special, but perhaps common, case.

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov

From eric at trueblade.com Thu Jan 9 22:41:55 2014
From: eric at trueblade.com (Eric V. Smith)
Date: Thu, 09 Jan 2014 16:41:55 -0500
Subject: [Python-Dev] [Python-checkins] peps: PEP 460: add .format_map()
In-Reply-To: <3f0fsB3ww4z7Lk8@mail.python.org>
References: <3f0fsB3ww4z7Lk8@mail.python.org>
Message-ID: <52CF17A3.9030304@trueblade.com>

I'm not sure how format_map helps in porting from 2 to 3, since it doesn't exist in any version of 2. Although that said, it's no doubt a useful feature, just not useful in code that supports both 2 and 3 with a single code base or when porting to 3.

Eric.
On 1/9/2014 4:02 PM, antoine.pitrou wrote:
> http://hg.python.org/peps/rev/8947cdc6b22e
> changeset: 5341:8947cdc6b22e
> user: Antoine Pitrou
> date: Thu Jan 09 22:02:01 2014 +0100
> summary:
> PEP 460: add .format_map()
>
> files:
> pep-0460.txt | 6 +++++-
> 1 files changed, 5 insertions(+), 1 deletions(-)
>
>
> diff --git a/pep-0460.txt b/pep-0460.txt
> --- a/pep-0460.txt
> +++ b/pep-0460.txt
> @@ -24,12 +24,16 @@
>  similar in syntax to ``str.format()`` (accepting positional as well as
>  keyword arguments).
>
> +* ``bytes.format_map(...)`` and ``bytearray.format_map(...)`` for an
> +  API similar to ``str.format_map(...)``, with the same formatting
> +  syntax and semantics as ``bytes.format()`` and ``bytearray.format()``.
> +
>
>  Rationale
>  =========
>
>  In Python 2, ``str % args`` and ``str.format(args)`` allow the formatting
> -and interpolation of bytes strings. This feature has commonly been used
> +and interpolation of bytestrings. This feature has commonly been used
>  for the assembling of protocol messages when protocols are known to use
>  a fixed encoding.
>
>
>
>
> _______________________________________________
> Python-checkins mailing list
> Python-checkins at python.org
> https://mail.python.org/mailman/listinfo/python-checkins
>

From solipsis at pitrou.net Thu Jan 9 22:45:29 2014
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 9 Jan 2014 22:45:29 +0100
Subject: [Python-Dev] Python3 "complexity"
References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au>
Message-ID: <20140109224529.6ddd9388@fsol>

On Thu, 9 Jan 2014 13:36:05 -0800 Chris Barker wrote:
>
> Some folks have suggested using latin-1 (or other 8-bit encoding) -- is
> that guaranteed to work with any binary data, and round-trip accurately?

Yes, it is.

> and will surrogateescape work for arbitrary binary data?

Yes, it will.

Regards

Antoine.
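
[Editorial sketch, not part of the original thread.] The two "yes" answers above can be checked with a few lines of Python 3. This is only a minimal illustration: the variable names are made up for the example, and pairing surrogateescape with the 'ascii' codec is a choice made here, since the thread does not say which codec it would be combined with.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# latin-1 maps byte N to code point N, so decoding never fails
# and encoding the result gives back the original bytes.
as_latin1 = all_bytes.decode('latin-1')
latin1_roundtrip = as_latin1.encode('latin-1')

# With errors='surrogateescape', bytes the codec cannot decode
# (here 0x80..0xFF under ASCII) become lone surrogates U+DC80..U+DCFF,
# and the same error handler turns them back into bytes on encode.
as_ascii = all_bytes.decode('ascii', errors='surrogateescape')
ascii_roundtrip = as_ascii.encode('ascii', errors='surrogateescape')

print(latin1_roundtrip == all_bytes, ascii_roundtrip == all_bytes)
```

Both comparisons hold for any input bytes, which is the property the thread relies on; the two approaches differ only in what the non-ASCII bytes look like while the data is a str.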
From chris.barker at noaa.gov Thu Jan 9 23:00:32 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 9 Jan 2014 14:00:32 -0800 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <20140109224529.6ddd9388@fsol> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> Message-ID: On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou wrote: > > latin-1 guaranteed to work with any binary data, and round-trip > accurately? > > Yes, it is. > > > and will surrogateescape work for arbitrary binary data? > > Yes, it will. > Then maybe this is really a documentation issue, after all. I know I learned something. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Thu Jan 9 23:17:13 2014 From: brett at python.org (Brett Cannon) Date: Thu, 9 Jan 2014 17:17:13 -0500 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> Message-ID: On Thu, Jan 9, 2014 at 5:00 PM, Chris Barker wrote: > On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou wrote: > >> > latin-1 guaranteed to work with any binary data, and round-trip >> accurately? >> >> Yes, it is. >> >> > and will surrogateescape work for arbitrary binary data? >> >> Yes, it will. >> > > Then maybe this is really a documentation issue, after all. > > I know I learned something. > I think the other issue is everyone is talking about keeping the data from the file in a single object. 
If you slice it up into pieces and decode the parts as necessary this also solves the issue. So if you had an HTTP header you could do::

    raw_header, body = data.split(b'\r\n\r\n', 1)
    header = raw_header.decode('ascii')  # Or whatever HTTP headers are encoded in.

Now that might not easily solve the issue of the ASCII text interspersed (such as Kristján's "phone number in the middle of stuff" example), but it will deal with the problem. And if the numbers were separated with clean markers then this would probably still work.

From p.f.moore at gmail.com Thu Jan 9 23:23:44 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 9 Jan 2014 22:23:44 +0000
Subject: [Python-Dev] Python3 "complexity"
In-Reply-To:
References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol>
Message-ID:

On 9 January 2014 22:00, Chris Barker wrote:
> On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou wrote:
>>
>> > latin-1 guaranteed to work with any binary data, and round-trip
>> > accurately?
>>
>> Yes, it is.
>>
>> > and will surrogateescape work for arbitrary binary data?
>>
>> Yes, it will.
>
>
> Then maybe this is really a documentation issue, after all.

Certainly, the idea that you can use the latin1 codec and you'll get the same sort of "ascii works and you can safely ignore the rest"[1] behaviour that you get in Python 2 is not well promoted, and is non-obvious.

Paul

[1] Where "safely" means "probably not as safely as you think, but I'll try not to nag you" :-) And of course you have to make sure you don't *add* any content that uses unicode characters beyond 255, or you get encoding errors. But you weren't going to do that, were you?
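
[Editorial sketch, not part of the original thread.] The caveat in the footnote above can be seen directly. The sample bytes and field names below are invented for illustration; only the latin-1 decode, edit, encode pattern comes from the discussion.

```python
# An ASCII field surrounded by arbitrary binary (made-up sample data).
data = b'\x01\x00\xd1\x80name=spam\xff\xfe'

text = data.decode('latin-1')           # every byte becomes U+0000..U+00FF
edited = text.replace('spam', 'eggs')   # an ASCII-only edit
edited_bytes = edited.encode('latin-1') # binary parts come back untouched

# Inserting a character above U+00FF makes the encode step fail,
# because latin-1 has no byte for it.
failed = False
try:
    # U+0430 is CYRILLIC SMALL LETTER A, outside the latin-1 range.
    text.replace('spam', 'sp\u0430m').encode('latin-1')
except UnicodeEncodeError:
    failed = True

print(edited_bytes, failed)
```

So the trick is lossless only as long as everything written into the str stays within code points 0 to 255.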
From ethan at stoneleaf.us Thu Jan 9 23:08:57 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 09 Jan 2014 14:08:57 -0800 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> Message-ID: <52CF1DF9.30003@stoneleaf.us> On 01/09/2014 02:00 PM, Chris Barker wrote: > On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou wrote: >> Chris Barker wrote: >>> >>> latin-1 guaranteed to work with any binary data, and round-trip accurately? >> >> Yes, it is. >>> and will surrogateescape work for arbitrary binary data? >> >> Yes, it will. > Then maybe this is really a documentation issue, after all. > > I know I learned something. If latin1 is used to convert binary to text, how convoluted is it to then take chunks of that text and convert to int, or some other variety of unicode? For example: b'\x01\x00\xd1\x80\xd1\83\xd0\x80' If that were decoded using latin1 how would I then get the first two bytes to the integer 256 and the last six bytes to their Cyrillic meaning? (Apologies for not testing myself, short on time.) -- ~Ethan~ From p.f.moore at gmail.com Thu Jan 9 23:54:23 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 9 Jan 2014 22:54:23 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <52CF1DF9.30003@stoneleaf.us> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> <52CF1DF9.30003@stoneleaf.us> Message-ID: On 9 January 2014 22:08, Ethan Furman wrote: > For example: b'\x01\x00\xd1\x80\xd1\83\xd0\x80' > > If that were decoded using latin1 how would I then get the first two bytes > to the integer 256 and the last six bytes to their Cyrillic meaning? 
> (Apologies for not testing myself, short on time.) I cannot conceive why you would. Slice the bytes then use struct.unpack on the first 2 bytes and decode on the last 6. We're talking about using latin1 for cases where you want to treat the text as essentially ascii (with a few bits of binary junk you want to ignore). Please don't take away the message that latin1 makes things "just like Python 2.X" - that's completely the wrong idea. Paul From ethan at stoneleaf.us Fri Jan 10 00:14:42 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 09 Jan 2014 15:14:42 -0800 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> <52CF1DF9.30003@stoneleaf.us> Message-ID: <52CF2D62.8040500@stoneleaf.us> On 01/09/2014 02:54 PM, Paul Moore wrote: > On 9 January 2014 22:08, Ethan Furman wrote: >> >> For example: b'\x01\x00\xd1\x80\xd1\83\xd0\x80' >> >> If that were decoded using latin1 how would I then get the first two bytes >> to the integer 256 and the last six bytes to their Cyrillic meaning? >> (Apologies for not testing myself, short on time.) > > I cannot conceive why you would. Sorry, I was too short with my example. My use case is binary files, with ASCII metadata and binary metadata, as well as ASCII-encoded numeric values, binary-coded numeric values, ASCII-encoded boolean values, and who-knows-what-(before checking the in-band metadata)-encoded text. I have to process all of it, and before we say "It's just a documentation issue" I want to make sure it /is/ just a documentation issue. 
-- ~Ethan~ From chris.barker at noaa.gov Fri Jan 10 00:25:52 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 9 Jan 2014 15:25:52 -0800 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> <52CF1DF9.30003@stoneleaf.us> Message-ID: On Thu, Jan 9, 2014 at 2:54 PM, Paul Moore > For example: b'\x01\x00\xd1\x80\xd1\83\xd0\x80' > > > > If that were decoded using latin1 how would I then get the first two > bytes > > to the integer 256 and the last six bytes to their Cyrillic meaning? > > (Apologies for not testing myself, short on time.) > > I cannot conceive why you would. Slice the bytes then use > struct.unpack on the first 2 bytes and decode on the last 6. exactly. > We're > talking about using latin1 for cases where you want to treat the text > as essentially ascii (with a few bits of binary junk you want to ignore). as so -- I want to replace a bit of ascii text surrounded by arbitrary binary: (apologies for the py2...) In [24]: b Out[24]: '\x01\x00\xd1\x80\xd1a name\xd0\x80' In [25]: u = b.decode('latin-1') In [26]: u2 = u.replace('a name', 'a different name') In [28]: b2 = u2.encode('latin-1') In [29]: b2 Out[29]: '\x01\x00\xd1\x80\xd1a different name\xd0\x80' -Chris > Please don't take away the message that latin1 makes things > "just like Python 2.X" - that's completely the wrong idea. > > Paul > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov > -- Christopher Barker, Ph.D. 
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov

From ethan at stoneleaf.us Fri Jan 10 00:20:43 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 09 Jan 2014 15:20:43 -0800
Subject: [Python-Dev] Python3 "complexity"
In-Reply-To:
References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> <52CF1DF9.30003@stoneleaf.us>
Message-ID: <52CF2ECB.3040502@stoneleaf.us>

On 01/09/2014 02:54 PM, Paul Moore wrote:
> On 9 January 2014 22:08, Ethan Furman wrote:
>> For example: b'\x01\x00\xd1\x80\xd1\83\xd0\x80'
>>
>> If that were decoded using latin1 how would I then get the first two bytes
>> to the integer 256 and the last six bytes to their Cyrillic meaning?
>> (Apologies for not testing myself, short on time.)
>
> Please don't take away the message that latin1 makes things
> "just like Python 2.X" - that's completely the wrong idea.

Sure is!

--> struct.unpack('>h', '\x01\x00')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface

--
~Ethan~

From chris.barker at noaa.gov Fri Jan 10 00:53:35 2014
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 9 Jan 2014 15:53:35 -0800
Subject: [Python-Dev] Python3 "complexity"
In-Reply-To: <52CF2D62.8040500@stoneleaf.us>
References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> <52CF1DF9.30003@stoneleaf.us> <52CF2D62.8040500@stoneleaf.us>
Message-ID:

On Thu, Jan 9, 2014 at 3:14 PM, Ethan Furman wrote:

> Sorry, I was too short with my example.
My use case is binary files, with > ASCII metadata and binary metadata, as well as ASCII-encoded numeric > values, binary-coded numeric values, ASCII-encoded boolean values, and > who-knows-what-(before checking the in-band metadata)-encoded text. I have > to process all of it, and before we say "It's just a documentation issue" I > want to make sure it /is/ just a documentation issue. > As I am coming to understand it -- yes, using latin-1 would let you work with all that. You could decode the binary data using latin-1, which would give you a unicode object, which would: 1) act like ascii for ascii values, for the normal string operations, search, replace, etc, etc... 2) have a 1:1 mapping of indexes to bytes in the original. 3) be not-too-bad for memory and other performance (as I understand it py3 now has a cool unicode implementation that does not waste a lot of bytes for low codepoints) 4) would preserve the binary data that was not directly touched. Though you'd still have to encode() to bytes to get chunks that could be used as binary -- i.e. passed to the struct module, or to a frombytes() or frombuffer() method of say numpy, or PIL or something... But I'm no expert.... -Chris > > -- > ~Ethan~ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > chris.barker%40noaa.gov > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From songofacandy at gmail.com Fri Jan 10 01:51:29 2014 From: songofacandy at gmail.com (INADA Naoki) Date: Fri, 10 Jan 2014 09:51:29 +0900 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> <52CF1DF9.30003@stoneleaf.us> <52CF2D62.8040500@stoneleaf.us> Message-ID: latin1 is OK but is it Pythonic? I've posted suggestion about add 'bytes' as a alias for 'latin1'. http://comments.gmane.org/gmane.comp.python.ideas/10315 I want one Pythonic way to handle "binary containing ascii (or latin1 or utf-8 or other ascii compatible)". On Fri, Jan 10, 2014 at 8:53 AM, Chris Barker wrote: > On Thu, Jan 9, 2014 at 3:14 PM, Ethan Furman wrote: > >> Sorry, I was too short with my example. My use case is binary files, >> with ASCII metadata and binary metadata, as well as ASCII-encoded numeric >> values, binary-coded numeric values, ASCII-encoded boolean values, and >> who-knows-what-(before checking the in-band metadata)-encoded text. I have >> to process all of it, and before we say "It's just a documentation issue" I >> want to make sure it /is/ just a documentation issue. >> > > As I am coming to understand it -- yes, using latin-1 would let you work > with all that. You could decode the binary data using latin-1, which would > give you a unicode object, which would: > > 1) act like ascii for ascii values, for the normal string operations, > search, replace, etc, etc... > > 2) have a 1:1 mapping of indexes to bytes in the original. > > 3) be not-too-bad for memory and other performance (as I understand it py3 > now has a cool unicode implementation that does not waste a lot of bytes > for low codepoints) > > 4) would preserve the binary data that was not directly touched. 
> > Though you'd still have to encode() to bytes to get chunks that could be > used as binary -- i.e. passed to the struct module, or to a frombytes() or > frombuffer() method of say numpy, or PIL or something... > > But I'm no expert.... > > -Chris > > > > > > > > > > > > > > > >> >> -- >> ~Ethan~ >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: https://mail.python.org/mailman/options/python-dev/ >> chris.barker%40noaa.gov >> > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/songofacandy%40gmail.com > > -- INADA Naoki -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Fri Jan 10 01:53:20 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 10 Jan 2014 03:53:20 +0300 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> Message-ID: On Thu, Jan 9, 2014 at 10:00 AM, Mark Lawrence wrote: > On 09/01/2014 06:50, Lennart Regebro wrote: >> >> On Thu, Jan 9, 2014 at 1:07 AM, Ben Finney >> wrote: >>> >>> Kristj?n Valur J?nsson writes: >>> >>>> Believe it or not, sometimes you really don't care about encodings. >>>> Sometimes you just want to parse text files. >>> >>> >>> Files don't contain text, they contain bytes. Bytes only become text >>> when filtered through the correct encoding. 
>> >> >> To be honest, you can define text as "A stream of bytes that are split >> up in lines separated by a linefeed", and do some basic text >> processing like that. Just very *basic*, but still. Replacing >> characters. Extracting certain lines etc. >> >> This is harder in Python 3, as bytes does not have all the >> functionality strings has, like formatting. This can probably be fixed >> in Python 3.5, if the relevant PEP gets finished. >> >> For the battery analogy, that's like saying: >> >> "I want a battery." >> >> "What kind?" >> >> "It doesn't matter, as long as it's over 5V." >> >> //Lennart >> > > "That Python 3 battery you sold me blew up when I tried using it". > > "We've been telling you for years that could happen". > > "I didn't think you actually meant it". "These new nuclear cells are awesome! But you stop from from leaking on their users?" A1: "The nuclear power is radioactive. Accept it." A2: "This is the basic stdlib container. You're supposed to protect yourself." A3: "The world is changing. Everybody should learn nuclear fission to use things properly." "..." and while we are at it, if the battery became more advanced, there is no reason to strip off simple default interface. This interface is not an abstract discussion here, but a real user experience study (I am going to spread UX virus), which starts with: 1. expectations 2. experience 3. outcomes and progressively iterate over 2 to get 3 matching 1 as close as possibly, without trying to change 1. 1 is equal to changing people - it is simple and natural solution that people practicing every day on children and subordinates. The only problem is that it is ineffective, hard and useless activity in open source environment, because most people by the nature of their neural network processes become conservative with ages. That's why people invented forks. However, for the encoding problem, there are some good default solutions. 
You'll have to choose between different interests anyway, but here it is:

1. always open() text files in UTF-8 by default
2. introduce autodetect mode to open functions
   1. read and transform on the fly, maintaining a buffer that stores original bytes
      and their mapping to letters. The mapping is updated as bytes frequency
      changes. When the buffer is full, you have the best candidate.
3. provide sane error messages
   1. messages that users do actually understand
   2. messages that tell how to fix the problem

If the interface becomes more complicated - the last thing you should do is to leave the user one-on-one with interface problems.

And to conclude, I am not saying that people should not learn about unicode, but the learning curve should not be as steep as Python 3 demands it.

From jsbueno at python.org.br Fri Jan 10 02:03:40 2014
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Thu, 9 Jan 2014 23:03:40 -0200
Subject: [Python-Dev] Python3 "complexity"
In-Reply-To:
References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au>
Message-ID:

On 9 January 2014 04:50, Lennart Regebro wrote:
> To be honest, you can define text as "A stream of bytes that are split
> up in lines separated by a linefeed", and do some basic text
> processing like that. Just very *basic*, but still. Replacing
> characters. Extracting certain lines etc.

That is, until you hit a character which has a byte with the same value of ASCII newline in the middle of a multi-byte character. So, this approach is broken to start with.

From rosuav at gmail.com Fri Jan 10 02:22:02 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 10 Jan 2014 12:22:02 +1100
Subject: [Python-Dev] Python3 "complexity"
In-Reply-To:
References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au>
Message-ID:

On Fri, Jan 10, 2014 at 11:53 AM, anatoly techtonik wrote:
> 2.
introduce autodetect mode to open functions > 1. read and transform on the fly, maintaining a buffer that > stores original bytes > and their mapping to letters. The mapping is updated as bytes frequency > changes. When the buffer is full, you have the best candidate. > Bad idea. Bad, bad idea! No biscuit. Sit! This sort of magic is what brings the "bush hid the facts" bug in Windows Notepad. If byte value distribution is used to guess encoding, there's no end to the craziness that can result. How do you know that the byte values 0x41 0x42 0x43 0x44 are supposed to mean upper-case ASCII letters and not a 32-bit integer or floating-point value, or some accented lower-case letter A's in EBCDIC, or anything else? Maybe if you have a whole document, AND you know for sure that it's linguistic text, then maybe - MAYBE - you could guess with reasonable reliability. But even then, how can you be sure? Remember, too, you might have to deal with something that's actually mis-encoded. If you're told this is UTF-8 and you find the byte sequence ED B3 BF, do you decide that it can't possibly be UTF-8 and pick a different encoding to decode with? That would produce no end of trouble, where the actual result you want is (most likely) to throw an error. ChrisA From ncoghlan at gmail.com Fri Jan 10 02:32:05 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 10 Jan 2014 11:32:05 +1000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140109203053.59c4ba1b@fsol> References: <20140108234213.0610ef63@fsol> <20140109203053.59c4ba1b@fsol> Message-ID: On 10 Jan 2014 03:32, "Antoine Pitrou" wrote: > > On Fri, 10 Jan 2014 05:26:04 +1000 > Nick Coghlan wrote: > > > > We should probably include format_map for consistency with the str API. > > Yes, you're right. 
> > > >However, I > > > also added bytearray into the mix, as bytearray objects should > > > generally support the same operations as bytes (and they can be useful > > > *especially* for network programming). > > > > So we'd define the *format* string as mutable to get a mutable result out > > of the formatting operations? This seems a little weird to me. > > > > It also seems weird for a format method on a mutable type to *not* perform > > in-place mutation. > > It's consistent with bytearray.join's behaviour: > > >>> x = bytearray() > >>> x.join([b"abc"]) > bytearray(b'abc') > >>> x > bytearray(b'') Yeah, I guess I'm OK with us being consistent on that one. It's still weird, but also clearly useful :) Will the new binary format ever call __format__? I assume not, but it's probably best to make that absolutely explicit in the PEP. Cheers, Nick. > > > Regards > > Antoine. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri Jan 10 03:23:43 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 10 Jan 2014 13:23:43 +1100 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <52CF1DF9.30003@stoneleaf.us> References: <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> <52CF1DF9.30003@stoneleaf.us> Message-ID: <20140110022343.GH3869@ando> On Thu, Jan 09, 2014 at 02:08:57PM -0800, Ethan Furman wrote: > If latin1 is used to convert binary to text, how convoluted is it to then > take chunks of that text and convert to int, or some other variety of > unicode? 
>
> For example: b'\x01\x00\xd1\x80\xd1\83\xd0\x80'
>
> If that were decoded using latin1 how would I then get the first two bytes
> to the integer 256 and the last six bytes to their Cyrillic meaning?
> (Apologies for not testing myself, short on time.)

Not terribly convoluted, but there is some double-processing. When you know up-front that some data is non-text, you shouldn't convert it to text, otherwise you're just double-processing:

py> b = b'\x01\x00\xd1\x80\xd1\x83\xd0\x80'
py> s = b.decode('latin1')
py> num, = struct.unpack('>h', s[:2].encode('latin1'))
py> assert num == 0x100

Better to just go straight from bytes to the struct, if you can:

py> struct.unpack('>h', b[:2])
(256,)

As for the last six bytes and "their Cyrillic meaning", which Cyrillic meaning did you have in mind?

py> s = b'\x01\x00\xd1\x80\xd1\x83\xd0\x80'.decode('latin1')
py> for encoding in "cp1251 ibm866 iso-8859-5 koi8-r koi8-u mac_cyrillic".split():
...     print(s[-6:].encode('latin1').decode(encoding))
...
??????
??????
???
??????
??????
??????

I understand that Cyrillic is an especially poor choice, since there are many incompatible Cyrillic code-pages. On the other hand, it's also an especially good example of how you need to know the encoding before you can make sense of the data.

Again, note that if you know the encoding you are intending to use is not Latin-1, decoding to Latin-1 first just ends up double-handling. If you can, it is best to split your data into fields up front, and then decode each piece once only.
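
[Editorial sketch, not part of the original thread.] The "split into fields up front, decode each piece once" advice might look like the following for the example record. Treating the last six bytes as UTF-8 is purely an assumption made for this illustration (the message above points out that the intended encoding is unknown), and the field names are hypothetical.

```python
import struct

record = b'\x01\x00\xd1\x80\xd1\x83\xd0\x80'

# Split into fields while the data is still bytes.
num_field, text_field = record[:2], record[2:]

# Binary field: unpack straight from bytes (big-endian signed 16-bit).
(number,) = struct.unpack('>h', num_field)

# Text field: decode once, with the encoding the format specifies.
# ASSUMPTION: UTF-8 here; a real format would state its own encoding.
text = text_field.decode('utf-8')

print(number, ascii(text))
```

Under that assumption the six bytes decode to three code points, U+0440, U+0443 and U+0400; with a different Cyrillic code page the same bytes would mean something else, which is the point made above.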
-- Steven From tjreedy at udel.edu Fri Jan 10 03:27:35 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 09 Jan 2014 21:27:35 -0500 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> <52CF1DF9.30003@stoneleaf.us> Message-ID: On 1/9/2014 6:25 PM, Chris Barker wrote: > as so -- I want to replace a bit of ascii text surrounded by arbitrary > binary: > (apologies for the py2...) > In [24]: b > Out[24]: '\x01\x00\xd1\x80\xd1a name\xd0\x80' > In [25]: u = b.decode('latin-1') > In [26]: u2 = u.replace('a name', 'a different name') > In [28]: b2 = u2.encode('latin-1') > In [29]: b2 > Out[29]: '\x01\x00\xd1\x80\xd1a different name\xd0\x80' Just to check, with 3.4 print(b'\x01\x00\xd1\x80\xd1a name\xd0\x80' .decode('latin-1'). replace('a name', 'a different name') .encode('latin-1') == b'\x01\x00\xd1\x80\xd1a different name\xd0\x80') >>> True The b prefix works in 2.6/7, so this code does the same thing in 2.6+ and 3.x. -- Terry Jan Reedy From steve at pearwood.info Fri Jan 10 03:39:52 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 10 Jan 2014 13:39:52 +1100 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> Message-ID: <20140110023952.GI3869@ando> On Fri, Jan 10, 2014 at 12:22:02PM +1100, Chris Angelico wrote: > On Fri, Jan 10, 2014 at 11:53 AM, anatoly techtonik wrote: > > 2. introduce autodetect mode to open functions > > 1. read and transform on the fly, maintaining a buffer that > > stores original bytes > > and their mapping to letters. The mapping is updated as bytes frequency > > changes. When the buffer is full, you have the best candidate. > > > > Bad idea. Bad, bad idea! No biscuit. Sit! 
>
> This sort of magic is what brings the "bush hid the facts" bug in
> Windows Notepad. If byte value distribution is used to guess encoding,
> there's no end to the craziness that can result.

I think that heuristics to guess the encoding have their role to play,
if the caller understands the risks. For example, an application might
give the user the choice of specifying the codec, or having the app
guess it. (I dislike the term "Auto detect", since that implies a level
of certainty which often doesn't apply to real files.)

There is already a third-party library, chardet, which does this.
Perhaps the std lib should include this? Perhaps chardet should be
considered best-of-breed "atomic reactor", but the std lib could include
a "battery" to do something similar. I don't think we ought to dismiss
this idea out of hand.

> How do you know that
> the byte values 0x41 0x42 0x43 0x44 are supposed to mean upper-case
> ASCII letters and not a 32-bit integer or floating-point value,

Presumably if you're reading a file intended to be text, they'll be
meant to be text and not arbitrary binary blobs. Given that it is 2014
and not 1974, chances are reasonably good that bytes 0x41 0x42 0x43 0x44
are meant as ASCII letters rather than EBCDIC. But you can't be certain,
and even if "ASCII capital A" is the right way to bet with byte 0x41,
it's much harder to guess what 0xC9 is intended as:

py> for encoding in "macroman cp1256 latin1 koi8_r".split():
...     print(b'\xC9'.decode(encoding))
...
…
ة
É
и

If you know the encoding via some out-of-band metadata, that's great. If
you don't, or if the specified encoding is wrong, an application may not
have the luxury of just throwing up its hands and refusing to process
the data. Your web browser has to display something even if the web page
lies about the encoding used or contains invalid data.

Even though encoding issues are more than 40 years old, making this
problem older than most programmers, it's still new to many people.
(Perhaps they haven't been paying attention, or living in denial that it would even happen to them, or they've just been lucky to be living in a pure ASCII world.) So a bit of sympathy to those struggling with this, but on the flip side, they need to HTFU and deal with it. Python 3 did not cause encoding issues, and in these days of code being interchanged all over the world, any programmer who doesn't have at least a basic understanding of this is like a programmer who doesn't understand why " cannot multiply correctly": py> 0.7*7 == 4.9 False -- Steven From regebro at gmail.com Fri Jan 10 04:32:29 2014 From: regebro at gmail.com (Lennart Regebro) Date: Fri, 10 Jan 2014 04:32:29 +0100 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> Message-ID: On Thu, Jan 9, 2014 at 10:06 AM, Kristj?n Valur J?nsson wrote: > Do I speak Chinese to my grocer because china is a growing force in the world? Or start every discussion with my children with a negotiation on what language to use? No, because your environment have a default language. And Python has a default encoding. You only get problems when some file doesn't use the default encoding. //Lennart From regebro at gmail.com Fri Jan 10 04:42:04 2014 From: regebro at gmail.com (Lennart Regebro) Date: Fri, 10 Jan 2014 04:42:04 +0100 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> Message-ID: On Fri, Jan 10, 2014 at 2:03 AM, Joao S. O. Bueno wrote: > On 9 January 2014 04:50, Lennart Regebro wrote: >> To be honest, you can define text as "A stream of bytes that are split >> up in lines separated by a linefeed", and do some basic text >> processing like that. Just very *basic*, but still. 
Replacing >> characters. Extracting certain lines etc. > > That is, until you hit a character which has a byte with the same > value of ASCII newline in the middle of a multi-byte character. > > So, this approach is broken to start with. For a very specific definition of broken, yes, namely that it will fail with UTF-16 or EBCDIC. Files that with the above definition of "text files" are not text files. :-) //Lennart From rosuav at gmail.com Fri Jan 10 05:03:10 2014 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 10 Jan 2014 15:03:10 +1100 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <20140110023952.GI3869@ando> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <20140110023952.GI3869@ando> Message-ID: On Fri, Jan 10, 2014 at 1:39 PM, Steven D'Aprano wrote: > On Fri, Jan 10, 2014 at 12:22:02PM +1100, Chris Angelico wrote: >> On Fri, Jan 10, 2014 at 11:53 AM, anatoly techtonik wrote: >> > 2. introduce autodetect mode to open functions >> > 1. read and transform on the fly, maintaining a buffer that >> > stores original bytes >> > and their mapping to letters. The mapping is updated as bytes frequency >> > changes. When the buffer is full, you have the best candidate. >> > >> >> Bad idea. Bad, bad idea! No biscuit. Sit! >> >> This sort of magic is what brings the "bush hid the facts" bug in >> Windows Notepad. If byte value distribution is used to guess encoding, >> there's no end to the craziness that can result. > > I think that heuristics to guess the encoding have their role to play, > if the caller understands the risks. For example, an application might > give the user the choice of specifying the codec, or having the app > guess it. (I dislike the term "Auto detect", since that implies a level > of certainty which often doesn't apply to real files.) > > There is already a third-party library, chardet, which does this. 
> Perhaps the std lib should include this? Perhaps chardet should be > considered best-of-breed "atomic reactor", but the std lib could include > a "battery" to do something similar. I don't think we ought to dismiss > this idea out of hand. I don't deny that chardet has its place, but would you use it like this (I'm assuming it works with Py3, the docs seem to imply Py2): text = "" with open("blah", "rb") as f: while True: data = f.read(256) if not data: break text += data.decode(chardet.detect(data)['encoding']) Certainly not. But that's how the file-open-mode of "auto detect" sounds. At very least, it has to do something like this _until_ it has confidence; maybe it can retain the chardet state after the first read, but it's still going to have to decode as little as you first read. How can it handle this case? first_char = open("blah", encoding="auto").read(1) Somehow it needs to know how many bytes to read (and not read too many more, preferably - buffering a line-ish is reasonable, buffering a megabyte not so much) and figure out what's one character. I see this as similar to the Python 2 input() function. It's not the file-open builtin's job to do something advanced and foot-shooting as automatic charset detection. If you want that, you should be prepared for its failures and the messes of partial reads, and call on chardet yourself, same as you should use eval(input()) explicitly in Py3 (and, in my opinion, eval(raw_input()) equally explicitly in Py2). I'm not saying that chardet is bad, but I *am* saying, and I stand by this, that an auto-detect option on file open is a bad idea. Unix comes with a 'file' command which will tell you even more about what something is. (For what it thinks are text files, I believe it uses heuristics similar to chardet to guess an encoding.) 
Would you want a parameter to the open() builtin that tries to read the
file as an image, or an audio file, or a document, or an executable, and
automatically decodes it to a PIL.Image, an mm.wave, etc, or executes the
code and returns its stdout, all entirely automatically? I don't think
so. Not open()'s job.

ChrisA

From ben+python at benfinney.id.au Fri Jan 10 05:49:47 2014
From: ben+python at benfinney.id.au (Ben Finney)
Date: Fri, 10 Jan 2014 15:49:47 +1100
Subject: [Python-Dev] Python3 "complexity"
References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com>
 <20140108213000.2A9152501A1@webabinitio.net>
 <7wbnzmf26k.fsf@benfinney.id.au>
 <20140110023952.GI3869@ando>
Message-ID: <7wsiswcufo.fsf@benfinney.id.au>

Steven D'Aprano writes:

> I think that heuristics to guess the encoding have their role to play,
> if the caller understands the risks.

I think, for a language whose developers espouse a principle “In the
face of ambiguity, refuse the temptation to guess”, heuristics have no
role to play in the standard library.

> There is already a third-party library, chardet, which does this.

As a third-party library, it's fine and quite useful.

> Perhaps the std lib should include this?

In my opinion, content-type guessing heuristics certainly don't belong
in the standard library.

-- 
 \     “Nothing is more sacred than the facts.” -- Sam Harris, _The End |
  `\                                                    of Faith_, 2004 |
_o__)                                                                   |
Ben Finney

From stephen at xemacs.org Fri Jan 10 07:13:47 2014
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 10 Jan 2014 15:13:47 +0900
Subject: [Python-Dev] Python3 "complexity"
In-Reply-To:
References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com>
 <20140108213000.2A9152501A1@webabinitio.net>
 <7wbnzmf26k.fsf@benfinney.id.au>
 <7wzjn6dln4.fsf@benfinney.id.au>
 <20140109224529.6ddd9388@fsol>
 <52CF1DF9.30003@stoneleaf.us>
 <52CF2D62.8040500@stoneleaf.us>
Message-ID: <87eh4gfjok.fsf@uwakimon.sk.tsukuba.ac.jp>

INADA Naoki writes:

 > latin1 is OK but is it Pythonic?

Yes.
EIBTI, including being explicit that you're doing something that has semantics that you are ignoring but may come back to bite you or somebody who naively uses your module. There's nothing un-Pythonic about using potentially dangerous idioms. We assume that you know what you are doing and either have taken measures to trap exceptional cases or are willing to accept the risk of an unhandled exception. > I've posted suggestion about add 'bytes' as a alias for 'latin1'. Unpythonic. Such alternative names hide the fact that there are semantics that you may not want. Only the programmer can know whether it's safe. If you want an ascii-compatible and space-efficient representation that is safe even if the bytestream is something you don't expect, you need to do something like I proposed. If you don't need efficiency, (encoding='ascii', errors='surrogateescape') is the way to go. But these still don't provide convenient interpolation of binary data, as we discovered earlier. From stephen at xemacs.org Fri Jan 10 08:28:34 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 10 Jan 2014 16:28:34 +0900 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <20140110023952.GI3869@ando> Message-ID: <87d2k0fg7x.fsf@uwakimon.sk.tsukuba.ac.jp> Chris Angelico writes: > I'm not saying that chardet is bad, but I *am* saying, and I stand > by this, that an auto-detect option on file open is a bad idea. I have used it by default in Emacs and XEmacs since 1990, and I certainly haven't experienced it as a bad idea at *any* time in more than two decades. 
Of course, it shouldn't be default in Python for two reasons: (1) Emacsen are invariably interactive so very flexible with error recovery, not so for Python, and (2) Emacsen can generally assume that the files they open are more or less text in the first place, which again is not true for Python. > Would you want a parameter to the open() builtin It's not a parameter, it's a particular value for the encoding parameter. > that tries to read the file as an image, or an audio file, or a > document, or an executable, and automatically decodes it to a > PIL.Image, an mm.wave, etc, Emacsen do that, too. It's not the sayonara Grand Slam in the 7th game of the World Series spectacular win that text encoding detection is, but it is very useful much of the time. What it comes down to for all of the above is "consenting adults." Python should *not* do any guessing by default, but if the programmer or user explicitly request a guess with "encoding=chardet", why in the world would you want Python to do anything but give it the old college try? Of course any Python-supplied guesser should take a very pessimistic approach and error unless it's quite certain, but > or execute the code and return its stdout, all entirely > automatically? Now *that* is a really bad idea. You shouldn't mix it with the others. (I'll also concede that many file formats -- Postscript, I'm looking at you -- require special care to avoid arbitrary code execution.) 
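
The "very pessimistic" guesser described above might look roughly like the sketch below. It is not chardet and not a proposed stdlib API: `strict_guess_decode` is a hypothetical helper, and the candidate list is an assumption chosen so that a wrong guess tends to raise instead of silently succeeding (which is why latin-1, which accepts every byte, is deliberately left out).

```python
def strict_guess_decode(data, candidates=("ascii", "utf-8")):
    """Return (encoding_name, text) for the first candidate that decodes
    ``data`` without errors; raise ValueError if none does.

    Hypothetical helper for illustration. Decoding uses the default
    'strict' error handler, so undecodable input is reported, not hidden.
    """
    for name in candidates:
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError(
        "could not decode with any of: " + ", ".join(candidates)
    )
```

The caller opts in explicitly and must handle the `ValueError`, which matches the "consenting adults" position: no guessing by default, and a loud failure when the guess cannot be trusted.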
From solipsis at pitrou.net Fri Jan 10 10:06:42 2014
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 10 Jan 2014 10:06:42 +0100
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To:
References: <20140108234213.0610ef63@fsol> <20140109203053.59c4ba1b@fsol>
Message-ID: <20140110100642.1270207b@fsol>

On Fri, 10 Jan 2014 11:32:05 +1000 Nick Coghlan wrote:
> >
> > It's consistent with bytearray.join's behaviour:
> >
> > >>> x = bytearray()
> > >>> x.join([b"abc"])
> > bytearray(b'abc')
> > >>> x
> > bytearray(b'')
>
> Yeah, I guess I'm OK with us being consistent on that one. It's still
> weird, but also clearly useful :)
>
> Will the new binary format ever call __format__? I assume not, but it's
> probably best to make that absolutely explicit in the PEP.

Indeed not. I'll add that to the PEP, thanks.

cheers

Antoine.

From mal at egenix.com Fri Jan 10 13:19:14 2014
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 10 Jan 2014 13:19:14 +0100
Subject: [Python-Dev] Python3 "complexity"
In-Reply-To: <20140109224529.6ddd9388@fsol>
References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com>
 <20140108213000.2A9152501A1@webabinitio.net>
 <7wbnzmf26k.fsf@benfinney.id.au>
 <7wzjn6dln4.fsf@benfinney.id.au>
 <20140109224529.6ddd9388@fsol>
Message-ID: <52CFE542.5050909@egenix.com>

On 09.01.2014 22:45, Antoine Pitrou wrote:
> On Thu, 9 Jan 2014 13:36:05 -0800
> Chris Barker wrote:
>>
>> Some folks have suggested using latin-1 (or other 8-bit encoding) -- is
>> that guaranteed to work with any binary data, and round-trip accurately?
>
> Yes, it is.

Just a word of caution:

Using the 'latin-1' to mean unknown encoding can easily result
in Mojibake (unreadable text) entering your application with
dangerous effects on your other text data.

E.g. "Marc-André" read using 'latin-1' if the string itself
is encoded as UTF-8 will give you "Marc-AndrÃ©" in your
application.
(Yes, I see that a lot in applications and websites I use ;-)) Also note that indexing based on code points will likely break that way as well, ie. if you pass an index to an application based on what you see in your editor or shell, those indexes can be wrong when used on the encoded data. UTF-8 is an example of a popular variable length encoding for Unicode, so you'll hit this problem whenever dealing with non-ASCII UTF-8 data. >> and will surrogateescape work for arbitrary binary data? > > Yes, it will. The surrogateescape trick only works if you are encoding your work using the same encoding that you used for decoding it. Otherwise, you'll get a mix of the input encoding and the output encoding as output. Note that the error handler trick has an advantage over the latin-1 trick: if you try to encode a Unicode string with escape surrogates without using the error handler, it will fail, so you at least know that there are "funny" code points in your output string that need some extra care. BTW: Perhaps it would be a good idea to backport the surrogateescape error handler to Python 2.7 to simplify writing code which works in both Python 2 and 3. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jan 10 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From p.f.moore at gmail.com Fri Jan 10 15:05:10 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 10 Jan 2014 14:05:10 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <52CFE542.5050909@egenix.com> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> <52CFE542.5050909@egenix.com> Message-ID: On 10 January 2014 12:19, M.-A. Lemburg wrote: > Just a word of caution: > > Using the 'latin-1' to mean unknown encoding can easily result > in Mojibake (unreadable text) entering your application with > dangerous effects on your other text data. Agreed. The latin-1 suggestion is purely for people who object to learning how to handle the encodings in their data more accurately. That's not a criticism, wanting to avoid getting sidetracked into understanding encodings when porting a personal script is a classic "practicality vs purity" situation. Current responses to people with encoding issues tend towards an idealistic "you should understand your data better" position, which while true in the abstract is not always what the requester wants to hear. Paul. 
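
The trade-off discussed in this subthread can be checked directly. The sketch below uses only the standard codecs; the sample name comes from the earlier message, and the variable names are illustrative. It shows that latin-1 round-trips the bytes but yields mojibake, while ASCII with the surrogateescape error handler also round-trips and additionally refuses to be encoded without the handler.

```python
raw = "Marc-Andr\u00e9".encode("utf-8")  # the UTF-8 bytes of "Marc-André"

# latin-1 maps every byte to one code point, so the round trip is lossless...
as_latin1 = raw.decode("latin-1")
roundtrip_latin1 = as_latin1.encode("latin-1")

# ...but the text itself is wrong: the two-byte "é" became two characters.
is_mojibake = as_latin1 != "Marc-Andr\u00e9"

# ascii + surrogateescape also round-trips arbitrary bytes...
escaped = raw.decode("ascii", errors="surrogateescape")
roundtrip_escaped = escaped.encode("ascii", errors="surrogateescape")

# ...and encoding the result *without* the handler fails loudly, which is
# the advantage over the latin-1 trick mentioned above.
try:
    escaped.encode("utf-8")
    failed_loudly = False
except UnicodeEncodeError:
    failed_loudly = True
```

Neither trick tells you what the bytes mean; both only postpone that question, which is why the "understand your data" answer keeps coming back.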
From matej at ceplovi.cz Fri Jan 10 15:28:22 2014 From: matej at ceplovi.cz (Matěj Cepl) Date: Fri, 10 Jan 2014 15:28:22 +0100 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <52CFE542.5050909@egenix.com> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> <52CFE542.5050909@egenix.com> Message-ID: <20140110142822.602AF40B56@wycliff.ceplovi.cz> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2014-01-10, 12:19 GMT, you wrote: > Using the 'latin-1' to mean unknown encoding can easily result > in Mojibake (unreadable text) entering your application with > dangerous effects on your other text data. > > E.g. "Marc-Andr?" read using 'latin-1' if the string itself > is encoded as UTF-8 will give you "Marc-Andr??" in your > application. (Yes, I see that a lot in applications > and websites I use ;-)) I am afraid that for most 'latin-1' is just another attempt to make Unicode complexity go away and the way how to ignore it. Mat?j -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iD8DBQFS0AOG4J/vJdlkhKwRAgffAKCHn8uMnpZDVSwa2Oat+QI2h32o2wCeJdUN ZXTbDtiJtJrrhnRPzbgc3dc= =Pr1X -----END PGP SIGNATURE----- From ncoghlan at gmail.com Fri Jan 10 16:20:06 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Jan 2014 01:20:06 +1000 Subject: [Python-Dev] [Python-checkins] peps: PEP 460: add .format_map() In-Reply-To: <52CF17A3.9030304@trueblade.com> References: <3f0fsB3ww4z7Lk8@mail.python.org> <52CF17A3.9030304@trueblade.com> Message-ID: On 10 January 2014 07:41, Eric V. Smith wrote: > I'm not sure how format_map helps in porting from 2 to 3, since it > doesn't exist in any version of 2. > > Although that said, it's no doubt a useful feature, just not useful in > code that supports both 2 and 3 with a single code base or when porting > to 3. 
It's purely a matter of consistency with str - if we're adding binary interpolation back to Python 3 (which I have been persuaded is a good idea), then we should provide the same three typical spellings of the operation that str provides. Cheers, Nick. > > Eric. > > On 1/9/2014 4:02 PM, antoine.pitrou wrote: >> http://hg.python.org/peps/rev/8947cdc6b22e >> changeset: 5341:8947cdc6b22e >> user: Antoine Pitrou >> date: Thu Jan 09 22:02:01 2014 +0100 >> summary: >> PEP 460: add .format_map() >> >> files: >> pep-0460.txt | 6 +++++- >> 1 files changed, 5 insertions(+), 1 deletions(-) >> >> >> diff --git a/pep-0460.txt b/pep-0460.txt >> --- a/pep-0460.txt >> +++ b/pep-0460.txt >> @@ -24,12 +24,16 @@ >> similar in syntax to ``str.format()`` (accepting positional as well as >> keyword arguments). >> >> +* ``bytes.format_map(...)`` and ``bytearray.format_map(...)`` for an >> + API similar to ``str.format_map(...)``, with the same formatting >> + syntax and semantics as ``bytes.format()`` and ``bytearray.format()``. >> + >> >> Rationale >> ========= >> >> In Python 2, ``str % args`` and ``str.format(args)`` allow the formatting >> -and interpolation of bytes strings. This feature has commonly been used >> +and interpolation of bytestrings. This feature has commonly been used >> for the assembling of protocol messages when protocols are known to use >> a fixed encoding. 
>> >> >> >> >> _______________________________________________ >> Python-checkins mailing list >> Python-checkins at python.org >> https://mail.python.org/mailman/listinfo/python-checkins >> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Jan 10 16:35:38 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Jan 2014 01:35:38 +1000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> Message-ID: On 10 January 2014 13:32, Lennart Regebro wrote: > On Thu, Jan 9, 2014 at 10:06 AM, Kristj?n Valur J?nsson > wrote: >> Do I speak Chinese to my grocer because china is a growing force in the world? Or start every discussion with my children with a negotiation on what language to use? > > No, because your environment have a default language. And Python has a > default encoding. You only get problems when some file doesn't use the > default encoding. Putting this here because I found out today it's not in any of the PEPs and folks have to go digging in mailing list archives to find it. I'll add it to my Python 3 Q&A at some point. The reason Python 3 currently tries to rely on the POSIX locale encoding is that during the Python 3 development process it was pointed out that ShiftJIS, ISO-2022 and various CJK codec are in widespread use in Asia, since Asian users needed solutions to the problem of representing kana, ideographs and other non-Latin characters long before the Unicode Consortium existed. 
This creates a problem for Python 3, as assuming utf-8 means we have a high risk of corrupting user's data at least in Asian locales, as well as anywhere else where non-UTF-8 encodings are common (especially when encodings that aren't ASCII compatible are involved). While the Python 3 status quo on POSIX systems certainly isn't ideal, it at least means our most likely failure mode is an exception rather than silent data corruption. One of the major culprits for that is the antiquated POSIX/C locale, which reports ASCII as the system encoding. One idea we're considering for Python 3.5 is to have a report of "ascii" on a POSIX OS imply the surrogateescape error handler (at least for the standard streams, and perhaps in other contexts), since the OS reporting the POSIX/C locale almost certainly indicates a configuration error rather than intentional behaviour. Cheers, Nick. > > //Lennart > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From eric at trueblade.com Fri Jan 10 16:36:05 2014 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 10 Jan 2014 10:36:05 -0500 Subject: [Python-Dev] [Python-checkins] peps: PEP 460: add .format_map() In-Reply-To: References: <3f0fsB3ww4z7Lk8@mail.python.org> <52CF17A3.9030304@trueblade.com> Message-ID: <52D01365.7010404@trueblade.com> On 1/10/2014 10:20 AM, Nick Coghlan wrote: > On 10 January 2014 07:41, Eric V. Smith wrote: >> I'm not sure how format_map helps in porting from 2 to 3, since it >> doesn't exist in any version of 2. >> >> Although that said, it's no doubt a useful feature, just not useful in >> code that supports both 2 and 3 with a single code base or when porting >> to 3. 
> > It's purely a matter of consistency with str - if we're adding binary > interpolation back to Python 3 (which I have been persuaded is a good > idea), then we should provide the same three typical spellings of the > operation that str provides. > > Cheers, > Nick. I'm perfectly okay with that, and it was on my list of things to suggest. I just think that the PEP should be focused on porting code from 2 to 3 and on code that runs on both 2 and 3. I think the Rationale should state this clearly. Eric. From stefan at bytereef.org Fri Jan 10 16:54:59 2014 From: stefan at bytereef.org (Stefan Krah) Date: Fri, 10 Jan 2014 16:54:59 +0100 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> Message-ID: <20140110155459.GA7254@sleipnir.bytereef.org> Nick Coghlan wrote: > One idea we're considering for Python 3.5 is to have a report of > "ascii" on a POSIX OS imply the surrogateescape error handler (at > least for the standard streams, and perhaps in other contexts), since > the OS reporting the POSIX/C locale almost certainly indicates a > configuration error rather than intentional behaviour. On FreeBSD users apparently get the C locale by default. I don't think I've configured anything special during the install: freebsd-amd64# adduser Username: testuser Full name: Uid (Leave empty for default): Login group [testuser]: Login group is testuser. Invite testuser into other groups? []: Login class [default]: Shell (sh csh tcsh bash rbash nologin) [sh]: Home directory [/home/testuser]: Home directory permissions (Leave empty for default): Use password-based authentication? [yes]: no Lock out the account after creation? [no]: Username : testuser Password : Full Name : Uid : 1003 Class : Groups : testuser Home : /home/testuser Home Mode : Shell : /bin/sh Locked : no OK? 
(yes/no): yes adduser: INFO: Successfully added (testuser) to the user database. Add another user? (yes/no): no Goodbye! freebsd-amd64# su - testuser $ locale LANG= LC_CTYPE="C" LC_COLLATE="C" LC_TIME="C" LC_NUMERIC="C" LC_MONETARY="C" LC_MESSAGES="C" LC_ALL= Stefan Krah From devel at baptiste-carvello.net Fri Jan 10 17:27:45 2014 From: devel at baptiste-carvello.net (Baptiste Carvello) Date: Fri, 10 Jan 2014 17:27:45 +0100 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> Message-ID: Le 10/01/2014 16:35, Nick Coghlan a ?crit : > One idea we're considering for Python 3.5 is to have a report of > "ascii" on a POSIX OS imply the surrogateescape error handler (at > least for the standard streams, and perhaps in other contexts), since > the OS reporting the POSIX/C locale almost certainly indicates a > configuration error rather than intentional behaviour. would it make sense to be more general, and allow a "lenient mode", where all files implicitly opened with the default encoding would also use the surrogateescape error handler ? That way, applications designed to process text mostly written in the default encoding would just call sys.set_lenient_mode() and be done. Of course, libraries would need to be strongly discouraged to ever use this and encouraged to explicitly set the error handler on appropriate files instead. 
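
For comparison with the proposed global switch (`sys.set_lenient_mode()` is only a suggestion in this message, not an existing function), the explicit per-file alternative already works with `open()`. The sketch below is illustrative: the file names, the payload, and the choice of UTF-8 as the expected encoding are assumptions.

```python
import os
import tempfile

# Mostly UTF-8 text, plus two bytes that are not valid UTF-8.
payload = b"caf\xc3\xa9 \xff\xfe tail\n"

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.txt")
    dst = os.path.join(tmp, "out.txt")
    with open(src, "wb") as f:
        f.write(payload)

    # Opt in per file: undecodable bytes become lone surrogates
    # instead of raising UnicodeDecodeError.
    with open(src, encoding="utf-8", errors="surrogateescape", newline="") as f:
        text = f.read()

    # Ordinary text processing works on the decodable parts.
    text = text.replace("tail", "end")

    # Writing with the same encoding and handler restores the odd bytes.
    with open(dst, "w", encoding="utf-8", errors="surrogateescape", newline="") as f:
        f.write(text)

    with open(dst, "rb") as f:
        result = f.read()
```

The undecodable bytes survive only because the same encoding and error handler are used on both sides, as noted earlier in the thread.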
Cheers,
Baptiste

From songofacandy at gmail.com Fri Jan 10 17:30:40 2014
From: songofacandy at gmail.com (INADA Naoki)
Date: Sat, 11 Jan 2014 01:30:40 +0900
Subject: [Python-Dev] Python3 "complexity"
In-Reply-To: <20140110142822.602AF40B56@wycliff.ceplovi.cz>
References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com>
 <20140108213000.2A9152501A1@webabinitio.net>
 <7wbnzmf26k.fsf@benfinney.id.au>
 <7wzjn6dln4.fsf@benfinney.id.au>
 <20140109224529.6ddd9388@fsol>
 <52CFE542.5050909@egenix.com>
 <20140110142822.602AF40B56@wycliff.ceplovi.cz>
Message-ID:

Now I feel it is a bad thing to encourage using unicode for binary data
with the latin-1 encoding or the surrogateescape error handler.
Handling binary data in the str type using latin-1 is just a hack.
Surrogateescape is just a workaround to keep undecodable bytes in text.

Encouraging binary data in the str type with latin-1 or surrogateescape
means encouraging the mixing of binary and text data. That is worse than
Python 2.

So Python should encourage handling binary data in the bytes type.

On Fri, Jan 10, 2014 at 11:28 PM, Matěj Cepl wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 2014-01-10, 12:19 GMT, you wrote:
> > Using the 'latin-1' to mean unknown encoding can easily result
> > in Mojibake (unreadable text) entering your application with
> > dangerous effects on your other text data.
> >
> > E.g. "Marc-André" read using 'latin-1' if the string itself
> > is encoded as UTF-8 will give you "Marc-AndrÃ©" in your
> > application. (Yes, I see that a lot in applications
> > and websites I use ;-))
>
> I am afraid that for most 'latin-1' is just another attempt to
> make Unicode complexity go away and the way how to ignore it.
>
> Matěj
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.22 (GNU/Linux)
>
> iD8DBQFS0AOG4J/vJdlkhKwRAgffAKCHn8uMnpZDVSwa2Oat+QI2h32o2wCeJdUN
> ZXTbDtiJtJrrhnRPzbgc3dc=
> =Pr1X
> -----END PGP SIGNATURE-----

--
INADA Naoki

From status at bugs.python.org Fri Jan 10 18:07:48 2014
From: status at bugs.python.org (Python tracker)
Date: Fri, 10 Jan 2014 18:07:48 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20140110170748.92222560CC@psf.upfronthosting.co.za>

ACTIVITY SUMMARY (2014-01-03 - 2014-01-10)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.
Issues counts and deltas: open 4409 (+61) closed 27580 (+42) total 31989 (+103) Open issues with patches: 1993 Issues opened (87) ================== #15027: Faster UTF-32 encoding http://bugs.python.org/issue15027 reopened by serhiy.storchaka #20115: NUL bytes in commented lines http://bugs.python.org/issue20115 opened by arigo #20116: urlparse.parse_qs should take argument for query separator http://bugs.python.org/issue20116 opened by ruben.orduz #20117: subprocess on Windows: wrong return code with shell=True http://bugs.python.org/issue20117 opened by gvanrossum #20118: test_imaplib test_linetoolong fails on 2.7 in SSL test on some http://bugs.python.org/issue20118 opened by r.david.murray #20119: pdb c(ont(inue)) optional one-time-only breakpoint (like perl http://bugs.python.org/issue20119 opened by nlevitt at gmail.com #20120: Percent-signs (%) in .pypirc should not be interpolated http://bugs.python.org/issue20120 opened by tlevine #20121: quopri_codec newline handling http://bugs.python.org/issue20121 opened by fredstober #20122: Move CallTips tests to idle_tests http://bugs.python.org/issue20122 opened by serhiy.storchaka #20123: pydoc.synopsis fails to load binary modules http://bugs.python.org/issue20123 opened by eric.snow #20124: The documentation for the atTime parameter of TimedRotatimeFil http://bugs.python.org/issue20124 opened by r.david.murray #20125: We need a good replacement for direct use of load_module(), po http://bugs.python.org/issue20125 opened by eric.snow #20126: sched doesn't handle events added after scheduler starts http://bugs.python.org/issue20126 opened by lotus at blossomhillranch.com #20127: Race condition in test_threaded_import.task()? 
http://bugs.python.org/issue20127 opened by eric.snow #20128: Re-enable test_modules_search_builtin() in test_pydoc http://bugs.python.org/issue20128 opened by eric.snow #20131: warnings module offers no documented, programmatic way to rese http://bugs.python.org/issue20131 opened by inducer #20132: Many incremental codecs don???t handle fragmented data http://bugs.python.org/issue20132 opened by vadmium #20133: Derby: Convert the audioop module to use Argument Clinic http://bugs.python.org/issue20133 opened by serhiy.storchaka #20135: mutate list http://bugs.python.org/issue20135 opened by m123orning #20136: Logging: StreamHandler does not use OS line separator. http://bugs.python.org/issue20136 opened by alibotean #20137: Logging: RotatingFileHandler computes string length instead of http://bugs.python.org/issue20137 opened by alibotean #20138: wsgiref on Python 3.x incorrectly implements URL handling caus http://bugs.python.org/issue20138 opened by aronacher #20139: Python installer does not install a "pip" command (just "pip3" http://bugs.python.org/issue20139 opened by pmoore #20140: UnicodeDecodeError in ntpath.py when home dir contains non-asc http://bugs.python.org/issue20140 opened by Jarek.??miejczak #20145: unittest.assert*Regex functions should verify that expected_re http://bugs.python.org/issue20145 opened by the.mulhern #20146: UserDict module docs link is obsolete http://bugs.python.org/issue20146 opened by drunax #20147: multiprocessing.Queue.get() raises queue.Empty exception if ev http://bugs.python.org/issue20147 opened by torsten #20148: Derby: Convert the _sre module to use Argument Clinic http://bugs.python.org/issue20148 opened by serhiy.storchaka #20150: API change in string formatting with :s option should be docum http://bugs.python.org/issue20150 opened by Thomas.Robitaille #20151: Derby: Convert the binascii module to use Argument Clinic http://bugs.python.org/issue20151 opened by serhiy.storchaka #20152: Derby #15: Convert 50 sites to 
Argument Clinic across 9 files http://bugs.python.org/issue20152 opened by brett.cannon #20153: New-in-3.4 weakref finalizer doc section is already out of dat http://bugs.python.org/issue20153 opened by r.david.murray #20154: Deadlock in asyncio.StreamReader.readexactly() http://bugs.python.org/issue20154 opened by gvanrossum #20155: Regression test test_httpservers fails, hangs on Windows http://bugs.python.org/issue20155 opened by jeff.allen #20156: bz2.BZ2File.read() does not treat growing input file properly http://bugs.python.org/issue20156 opened by Joshua.Chia #20159: Derby #7: Convert 51 sites to Argument Clinic across 3 files - http://bugs.python.org/issue20159 opened by serhiy.storchaka #20160: broken ctypes calling convention on MSVC / 64-bit Windows (lar http://bugs.python.org/issue20160 opened by mark.dickinson #20162: Test test_hash_distribution fails on RHEL 6.5 / ppc64 http://bugs.python.org/issue20162 opened by zaytsev #20163: ValueError: time data does not match format http://bugs.python.org/issue20163 opened by dellair.jie #20164: Undocumented KeyError from os.path.expanduser http://bugs.python.org/issue20164 opened by acdha #20165: unittest TestResult wasSuccessful returns True when there are http://bugs.python.org/issue20165 opened by gregory.p.smith #20166: window x64 c-extensions not works on python3.4.0b2 http://bugs.python.org/issue20166 opened by jarod #20167: Exception on IDLE closing http://bugs.python.org/issue20167 opened by serhiy.storchaka #20168: Derby: Convert the _tkinter module to use Argument Clinic http://bugs.python.org/issue20168 opened by serhiy.storchaka #20169: random module doc page has broken links http://bugs.python.org/issue20169 opened by roysmith #20170: Derby #1: Convert 137 sites to Argument Clinic in Modules/posi http://bugs.python.org/issue20170 opened by larry #20171: Derby #2: Convert 115 sites to Argument Clinic in Modules/_cur http://bugs.python.org/issue20171 opened by larry #20172: Derby #3: Convert 67 
sites to Argument Clinic across 4 files ( http://bugs.python.org/issue20172 opened by larry #20173: Derby #4: Convert 53 sites to Argument Clinic across 5 files http://bugs.python.org/issue20173 opened by larry #20174: Derby #5: Convert 50 sites to Argument Clinic across 3 files http://bugs.python.org/issue20174 opened by larry #20175: Derby #6: Convert 50 sites to Argument Clinic across 8 files http://bugs.python.org/issue20175 opened by larry #20177: Derby #8: Convert 28 sites to Argument Clinic across 2 files http://bugs.python.org/issue20177 opened by larry #20178: Derby #9: Convert 52 sites to Argument Clinic across 11 files http://bugs.python.org/issue20178 opened by larry #20179: Derby #10: Convert 50 sites to Argument Clinic across 4 files http://bugs.python.org/issue20179 opened by larry #20180: Derby #11: Convert 50 sites to Argument Clinic across 9 files http://bugs.python.org/issue20180 opened by larry #20181: Derby #12: Convert 50 sites to Argument Clinic across 4 files http://bugs.python.org/issue20181 opened by larry #20182: Derby #13: Convert 50 sites to Argument Clinic across 5 files http://bugs.python.org/issue20182 opened by larry #20183: Derby #14: Convert 41 sites to Argument Clinic across 5 files http://bugs.python.org/issue20183 opened by larry #20184: Derby #16: Convert 50 sites to Argument Clinic across 9 files http://bugs.python.org/issue20184 opened by larry #20185: Derby #17: Convert 50 sites to Argument Clinic across 14 files http://bugs.python.org/issue20185 opened by larry #20186: Derby #18: Convert 31 sites to Argument Clinic across 23 files http://bugs.python.org/issue20186 opened by larry #20187: The Great Argument Clinic Conversion Derby Meta-Issue http://bugs.python.org/issue20187 opened by larry #20188: ALPN support for TLS http://bugs.python.org/issue20188 opened by mnot #20189: inspect.Signature doesn't recognize all builtin types http://bugs.python.org/issue20189 opened by larry #20191: resource.prlimit(int, int, str) crashs 
http://bugs.python.org/issue20191 opened by haypo #20192: pprint chokes on set containing frozenset http://bugs.python.org/issue20192 opened by jbylund #20193: Derby: Convert the zlib, _bz2 and _lzma modules to use Argumen http://bugs.python.org/issue20193 opened by serhiy.storchaka #20194: Add :deprecated: marker to formatter module docs http://bugs.python.org/issue20194 opened by brett.cannon #20195: Add :deprecated: marker to imp docs http://bugs.python.org/issue20195 opened by brett.cannon #20196: Argument Clinic generates invalid code for optional parameter http://bugs.python.org/issue20196 opened by serhiy.storchaka #20197: Support WebP image format detection in imghdr module http://bugs.python.org/issue20197 opened by akhenakh #20198: xml.etree.ElementTree.ElementTree.write attribute sorting http://bugs.python.org/issue20198 opened by bagratte #20199: status of module_for_loader and utils._module_to_load http://bugs.python.org/issue20199 opened by r.david.murray #20201: Argument Clinic: rwbuffer support broken http://bugs.python.org/issue20201 opened by rmsr #20202: ArgumentClinic howto: document change in Py_buffer lifecycle m http://bugs.python.org/issue20202 opened by rmsr #20204: pydocs fails for some C implemented classes http://bugs.python.org/issue20204 opened by serhiy.storchaka #20205: inspect.getsource(), P302 loader and '<..>' filenames http://bugs.python.org/issue20205 opened by stefan.mueller #20206: email quoted-printable encoding issue http://bugs.python.org/issue20206 opened by timar #20208: Clarify some things in porting HOWTO http://bugs.python.org/issue20208 opened by brett.cannon #20209: Deprecate PROTOCOL_SSLv2 http://bugs.python.org/issue20209 opened by pitrou #20210: Provide configure options to enable/disable Python modules and http://bugs.python.org/issue20210 opened by thomas-petazzoni #20211: setup.py: do not add invalid header locations http://bugs.python.org/issue20211 opened by thomas-petazzoni #20212: distutils: fix build_ext 
check to find whether we're building http://bugs.python.org/issue20212 opened by thomas-petazzoni #20213: Change the install location of _sysconfigdata.py http://bugs.python.org/issue20213 opened by thomas-petazzoni #20214: Argument Clinic rollup fixes http://bugs.python.org/issue20214 opened by larry #20215: Python2.7 socketserver can not listen IPv6 address http://bugs.python.org/issue20215 opened by dazhaoyu #20216: Misleading docs for sha1, sha256, sha512, md5 modules http://bugs.python.org/issue20216 opened by vajrasky Most recent 15 issues with no replies (15) ========================================== #20214: Argument Clinic rollup fixes http://bugs.python.org/issue20214 #20213: Change the install location of _sysconfigdata.py http://bugs.python.org/issue20213 #20212: distutils: fix build_ext check to find whether we're building http://bugs.python.org/issue20212 #20211: setup.py: do not add invalid header locations http://bugs.python.org/issue20211 #20210: Provide configure options to enable/disable Python modules and http://bugs.python.org/issue20210 #20204: pydocs fails for some C implemented classes http://bugs.python.org/issue20204 #20197: Support WebP image format detection in imghdr module http://bugs.python.org/issue20197 #20195: Add :deprecated: marker to imp docs http://bugs.python.org/issue20195 #20194: Add :deprecated: marker to formatter module docs http://bugs.python.org/issue20194 #20188: ALPN support for TLS http://bugs.python.org/issue20188 #20186: Derby #18: Convert 31 sites to Argument Clinic across 23 files http://bugs.python.org/issue20186 #20185: Derby #17: Convert 50 sites to Argument Clinic across 14 files http://bugs.python.org/issue20185 #20184: Derby #16: Convert 50 sites to Argument Clinic across 9 files http://bugs.python.org/issue20184 #20182: Derby #13: Convert 50 sites to Argument Clinic across 5 files http://bugs.python.org/issue20182 #20181: Derby #12: Convert 50 sites to Argument Clinic across 4 files 
http://bugs.python.org/issue20181 Most recent 15 issues waiting for review (15) ============================================= #20216: Misleading docs for sha1, sha256, sha512, md5 modules http://bugs.python.org/issue20216 #20214: Argument Clinic rollup fixes http://bugs.python.org/issue20214 #20213: Change the install location of _sysconfigdata.py http://bugs.python.org/issue20213 #20212: distutils: fix build_ext check to find whether we're building http://bugs.python.org/issue20212 #20211: setup.py: do not add invalid header locations http://bugs.python.org/issue20211 #20210: Provide configure options to enable/disable Python modules and http://bugs.python.org/issue20210 #20208: Clarify some things in porting HOWTO http://bugs.python.org/issue20208 #20204: pydocs fails for some C implemented classes http://bugs.python.org/issue20204 #20201: Argument Clinic: rwbuffer support broken http://bugs.python.org/issue20201 #20197: Support WebP image format detection in imghdr module http://bugs.python.org/issue20197 #20196: Argument Clinic generates invalid code for optional parameter http://bugs.python.org/issue20196 #20193: Derby: Convert the zlib, _bz2 and _lzma modules to use Argumen http://bugs.python.org/issue20193 #20174: Derby #5: Convert 50 sites to Argument Clinic across 3 files http://bugs.python.org/issue20174 #20173: Derby #4: Convert 53 sites to Argument Clinic across 5 files http://bugs.python.org/issue20173 #20172: Derby #3: Convert 67 sites to Argument Clinic across 4 files ( http://bugs.python.org/issue20172 Top 10 most discussed issues (10) ================================= #20162: Test test_hash_distribution fails on RHEL 6.5 / ppc64 http://bugs.python.org/issue20162 18 msgs #20123: pydoc.synopsis fails to load binary modules http://bugs.python.org/issue20123 17 msgs #20209: Deprecate PROTOCOL_SSLv2 http://bugs.python.org/issue20209 16 msgs #19995: %c, %o, %x, %X accept non-integer values instead of raising an http://bugs.python.org/issue19995 13 msgs 
#20173: Derby #4: Convert 53 sites to Argument Clinic across 5 files http://bugs.python.org/issue20173 13 msgs #10388: spwd returning different value depending on privileges http://bugs.python.org/issue10388 8 msgs #20172: Derby #3: Convert 67 sites to Argument Clinic across 4 files ( http://bugs.python.org/issue20172 8 msgs #20193: Derby: Convert the zlib, _bz2 and _lzma modules to use Argumen http://bugs.python.org/issue20193 8 msgs #1322: platform.dist() has unpredictable result under Linux http://bugs.python.org/issue1322 7 msgs #18960: First line can be executed twice http://bugs.python.org/issue18960 7 msgs Issues closed (40) ================== #5131: pprint doesn't know how to print a defaultdict http://bugs.python.org/issue5131 closed by ncoghlan #13107: Text width in optparse.py can become negative http://bugs.python.org/issue13107 closed by serhiy.storchaka #13115: tp_as_{number,sequence,mapping} can't be set using PyType_From http://bugs.python.org/issue13115 closed by loewis #16039: imaplib: unlimited readline() from connection http://bugs.python.org/issue16039 closed by r.david.murray #17390: display python version on idle title bar http://bugs.python.org/issue17390 closed by bagratte #18515: zipfile._ZipDecryptor generates wasteful crc32 table on import http://bugs.python.org/issue18515 closed by serhiy.storchaka #19081: zipimport behaves badly when the zip file changes while the pr http://bugs.python.org/issue19081 closed by gregory.p.smith #19526: Review additions to the stable ABI of Python 3.4 http://bugs.python.org/issue19526 closed by loewis #19538: Changed function prototypes in the PEP 384 stable ABI http://bugs.python.org/issue19538 closed by loewis #19659: Document Argument Clinic http://bugs.python.org/issue19659 closed by larry #19703: Update pydoc to PEP 451 http://bugs.python.org/issue19703 closed by eric.snow #19708: Check pkgutil for anything missing for PEP 451 http://bugs.python.org/issue19708 closed by eric.snow #19713: Deprecate 
various things in importlib thanks to PEP 451 http://bugs.python.org/issue19713 closed by eric.snow #19719: add importlib.abc.SpecLoader and SpecFinder http://bugs.python.org/issue19719 closed by brett.cannon #19723: Argument Clinic should add markers for humans http://bugs.python.org/issue19723 closed by larry #19732: python fails to build when configured with --with-system-libmp http://bugs.python.org/issue19732 closed by skrah #19927: Path-based loaders lack a meaningful __eq__() implementation. http://bugs.python.org/issue19927 closed by eric.snow #19965: Non-atomic generation of Include/Python-ast.h and Python/Pytho http://bugs.python.org/issue19965 closed by serhiy.storchaka #19976: Argument Clinic: generate second arg for METH_NOARGS http://bugs.python.org/issue19976 closed by larry #20078: zipfile - ZipExtFile.read goes into 100% CPU infinite loop on http://bugs.python.org/issue20078 closed by serhiy.storchaka #20096: Mention modernize and future in Python 2/3 porting HOWTO http://bugs.python.org/issue20096 closed by brett.cannon #20113: os.readv() and os.writev() don't raise an OSError on readv()/w http://bugs.python.org/issue20113 closed by haypo #20129: 3.4 on windows 7 can't import IntEnum http://bugs.python.org/issue20129 closed by BreamoreBoy #20130: asyncio: implement a synchronous executor if concurrent.future http://bugs.python.org/issue20130 closed by haypo #20134: typo: s/coping/copying/ http://bugs.python.org/issue20134 closed by benjamin.peterson #20141: Argument Clinic: broken support for 'O!' 
http://bugs.python.org/issue20141 closed by larry #20142: Argument Clinic: Py_buffer parameters are not initialized http://bugs.python.org/issue20142 closed by larry #20143: Argument Clinic: negative line numbers http://bugs.python.org/issue20143 closed by larry #20144: Argument Clinic doesn't support named constants as default val http://bugs.python.org/issue20144 closed by larry #20149: 'with instance' references class's __enter__ attribute rather http://bugs.python.org/issue20149 closed by r.david.murray #20157: Argument Clinic generates wrong keyword parameter name for "de http://bugs.python.org/issue20157 closed by larry #20158: Argument Clinic: add --clean option http://bugs.python.org/issue20158 closed by larry #20161: inspect.signature fails on some functions which use Argument C http://bugs.python.org/issue20161 closed by larry #20176: Derby #7: Convert 51 sites to Argument Clinic across 3 files http://bugs.python.org/issue20176 closed by larry #20190: dict() in dict(foo='bar').keys() raises http://bugs.python.org/issue20190 closed by haypo #20200: Argument Clinic howto: custom self_converter example broken http://bugs.python.org/issue20200 closed by python-dev #20203: ArgumentClinic: support middle optional argument http://bugs.python.org/issue20203 closed by larry #20207: Disable SSLv2 in Python 2.x http://bugs.python.org/issue20207 closed by pitrou #20217: Build failure in posixmodule.c with SCHED_SPORADIC available http://bugs.python.org/issue20217 closed by python-dev #1065986: Fix pydoc crashing on unicode strings http://bugs.python.org/issue1065986 closed by r.david.murray From juraj.sukop at gmail.com Fri Jan 10 18:17:02 2014 From: juraj.sukop at gmail.com (Juraj Sukop) Date: Fri, 10 Jan 2014 18:17:02 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 Message-ID: (Sorry if this messes-up the thread order, it is meant as a reply to the original RFC.) Dear list, newbie here. 
After much hesitation I decided to put forward a use case which bothers me about the current proposal. Disclaimer: I happen to write a library which is directly influenced by this.

As you may know, PDF operates over bytes and an integer or floating-point number is written down as-is, for example "100" or "1.23".

However, the proposal drops the "%d", "%f" and "%x" formats, and the suggested workaround for writing down a number is to use ".encode('ascii')", which I think has two problems:

One is that it needs to construct one additional object per formatting operation, as opposed to Python 2; it is not uncommon for a PDF file to contain millions of numbers.

The second problem is that, in my eyes, it is very counter-intuitive to require the use of str only to get formatting on bytes. Consider the case where a large bytes object is created out of many smaller bytes objects. If I wanted to format one part, I would have to use str instead. For example:

    content = b''.join([
        b'header',
        b'some dictionary structure',
        b'part 1 abc',
        ('part 2 %.3f' % number).encode('ascii'),
        b'trailer'])

In the case of PDF, the embedding of an image into PDF looks like:

    10 0 obj
      << /Type /XObject
         /Width 100
         /Height 100
         /Alternates 15 0 R
         /Length 2167
      >>
    stream
    ...binary image data...
    endstream
    endobj

Because of the image, it makes sense to store such a structure inside bytes. On the other hand, there may well be another "obj" which contains the coordinates of Bezier paths:

    11 0 obj
    ...
    stream
    0.5 0.1 0.2 RG
    300 300 m
    300 400 400 400 400 300 c
    b
    endstream
    endobj

To summarize, there are cases which mix "binary" and "text" and, in my opinion, dropping the bytes-formatting of numbers makes it more complicated than it was. I would appreciate any explanation on how:

    b'%.1f %.1f %.1f RG' % (r, g, b)

is more confusing than:

    b'%s %s %s RG' % tuple(map(lambda x: (u'%.1f' % x).encode('ascii'), (r, g, b)))

A similar situation exists for HTTP ("Content-Length: 123") and ASCII STL ("vertex 1.0 0.0 0.0").
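[Editor's sketch, not part of the original message.] For readers porting code, the workaround discussed above (format as str, then encode to ASCII) can be wrapped in a small helper so that call sites stay readable. The names pdf_number and rg_operator are hypothetical and exist only for this illustration; they are not part of any PDF library. As far as the editor is aware, bytes %-formatting with numeric codes was later restored in Python 3.5 through PEP 461, so on those versions b'%.1f' works directly.

```python
def pdf_number(value, precision=3):
    """Return a number as ASCII bytes (hypothetical helper for this sketch)."""
    if isinstance(value, int):
        # Integers are written as-is, e.g. 100 -> b"100".
        return str(value).encode("ascii")
    # Floats are formatted as text first, then encoded; the digits, sign and
    # decimal point produced by %f are always plain ASCII.
    return ("%.*f" % (precision, value)).encode("ascii")


def rg_operator(r, g, b):
    """Build the bytes for a PDF 'RG' operator with one decimal per component."""
    return b" ".join(pdf_number(c, 1) for c in (r, g, b)) + b" RG"


# Mixing formatted numbers with other bytes fragments, as in the stream example above.
content = b"\n".join([
    b"stream",
    rg_operator(0.5, 0.1, 0.2),
    b"300 300 m",
    b"endstream",
])
```

This does not remove the extra str object per formatted number that the message objects to; it only hides the encode step behind one function.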
Thanks and have a nice day,
Juraj Sukop

PS: In case the proposal does not include number formatting, it would be nice to list there a set of guidelines or examples on how to proceed with porting Python 2 formats to Python 3.

From stefanrin at gmail.com Fri Jan 10 18:34:22 2014
From: stefanrin at gmail.com (Stefan Ring)
Date: Fri, 10 Jan 2014 18:34:22 +0100
Subject: [Python-Dev] Python3 "complexity"
In-Reply-To:
References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au>
Message-ID:

On Fri, Jan 10, 2014 at 4:35 PM, Nick Coghlan wrote:
> On 10 January 2014 13:32, Lennart Regebro wrote:
>> No, because your environment has a default language. And Python has a
>> default encoding. You only get problems when some file doesn't use the
>> default encoding.
>
> The reason Python 3 currently tries to rely on the POSIX locale
> encoding is that during the Python 3 development process it was
> pointed out that ShiftJIS, ISO-2022 and various CJK codecs are in
> widespread use in Asia, since Asian users needed solutions to the
> problem of representing kana, ideographs and other non-Latin
> characters long before the Unicode Consortium existed.
>
> This creates a problem for Python 3, as assuming utf-8 means we have a
> high risk of corrupting users' data at least in Asian locales, as well
> as anywhere else where non-UTF-8 encodings are common (especially when
> encodings that aren't ASCII compatible are involved).

From my experience, the concept of a default locale is deeply flawed. What if I log into a (Linux) machine using an old latin-1 putty from the Windows XP era, and have most file names and contents in UTF-8 encoding, except for one directory where people from eastern Europe upload files via FTP in whatever encoding they choose? What should the "default" encoding be now?
That's why I make it a principle to always unset all LC_* and LANG variables, except when working locally, which happens rather rarely. From storchaka at gmail.com Fri Jan 10 19:05:31 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 10 Jan 2014 20:05:31 +0200 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: <52CFE542.5050909@egenix.com> References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> <52CFE542.5050909@egenix.com> Message-ID: 10.01.14 14:19, M.-A. Lemburg ???????(??): > BTW: Perhaps it would be a good idea to backport the > surrogateescape error handler to Python 2.7 to simplify > writing code which works in both Python 2 and 3. You also should change the UTF-8 codec so that it will reject surrogates (i.e. u'\ud880'.encode('utf-8') and '\xed\xa2\x80'.decode('utf-8') should raise exceptions). And this will break much code. From eric at trueblade.com Fri Jan 10 18:56:19 2014 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 10 Jan 2014 12:56:19 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: <52D03443.2010308@trueblade.com> On 1/10/2014 12:17 PM, Juraj Sukop wrote: > (Sorry if this messes-up the thread order, it is meant as a reply to the > original RFC.) > > Dear list, > > newbie here. After much hesitation I decided to put forward a use case > which bothers me about the current proposal. Disclaimer: I happen to > write a library which is directly influenced by this. > > As you may know, PDF operates over bytes and an integer or > floating-point number is written down as-is, for example "100" or "1.23". 
> > However, the proposal drops "%d", "%f" and "%x" formats and the > suggested workaround for writing down a number is to use > ".encode('ascii')", which I think has two problems: > > One is that it needs to construct one additional object per formatting > as opposed to Python 2; it is not uncommon for a PDF file to contain > millions of numbers. > > The second problem is that, in my eyes, it is very counter-intuitive to > require the use of str only to get formatting on bytes. Consider the > case where a large bytes object is created out of many smaller bytes > objects. If I wanted to format a part I had to use str instead. For example: > > content = b''.join([ > b'header', > b'some dictionary structure', > b'part 1 abc', > ('part 2 %.3f' % number).encode('ascii'), > b'trailer']) I agree. I don't see any reason to exclude int and float. See Guido's messages http://bugs.python.org/issue3982#msg180423 and http://bugs.python.org/issue3982#msg180430 for some justification and discussion. Since converting int and float to strings generates a very small range of ASCII characters, ([0-9a-fx.-=], plus the uppercase versions), what problem is introduced by allowing int and float? The original str.format() work relied on this fact in its stringlib implementation. Eric. From breamoreboy at yahoo.co.uk Fri Jan 10 19:51:13 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 10 Jan 2014 18:51:13 +0000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: On 06/01/2014 13:24, Victor Stinner wrote: > Hi, > > bytes % args and bytes.format(args) are requested by Mercurial and > Twisted projects. The issue #3982 was stuck because nobody proposed a > complete definition of the "new" features. Here is a try as a PEP. > Apologies if this has already been said, but Terry Reedy attached a proof of concept to issue 3982 which might be worth taking a look at if you haven't yet done so. 
-- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From pjenvey at underboss.org Fri Jan 10 21:11:51 2014 From: pjenvey at underboss.org (Philip Jenvey) Date: Fri, 10 Jan 2014 12:11:51 -0800 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> Message-ID: On Jan 10, 2014, at 7:35 AM, Nick Coghlan wrote: > Putting this here because I found out today it's not in any of the > PEPs and folks have to go digging in mailing list archives to find it. > I'll add it to my Python 3 Q&A at some point. > > The reason Python 3 currently tries to rely on the POSIX locale > encoding is that during the Python 3 development process it was > pointed out that ShiftJIS, ISO-2022 and various CJK codec are in > widespread use in Asia, since Asian users needed solutions to the > problem of representing kana, ideographs and other non-Latin > characters long before the Unicode Consortium existed. Really? Because PEP 383 doesn't support and discourages the use of some of these codecs as a locale. -- Philip Jenvey From greg.ewing at canterbury.ac.nz Fri Jan 10 22:00:46 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 11 Jan 2014 10:00:46 +1300 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> <52CF1DF9.30003@stoneleaf.us> <52CF2D62.8040500@stoneleaf.us> Message-ID: <52D05F7E.6070205@canterbury.ac.nz> INADA Naoki wrote: > latin1 is OK but is it Pythonic? 
Latin is most certainly a Pythonic subject: http://www.youtube.com/watch?v=IIAdHEwiAy8 -- Greg From g.brandl at gmx.net Fri Jan 10 22:09:18 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 10 Jan 2014 22:09:18 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <52D03443.2010308@trueblade.com> References: <52D03443.2010308@trueblade.com> Message-ID: Am 10.01.2014 18:56, schrieb Eric V. Smith: > On 1/10/2014 12:17 PM, Juraj Sukop wrote: >> (Sorry if this messes-up the thread order, it is meant as a reply to the >> original RFC.) >> >> Dear list, >> >> newbie here. After much hesitation I decided to put forward a use case >> which bothers me about the current proposal. Disclaimer: I happen to >> write a library which is directly influenced by this. >> >> As you may know, PDF operates over bytes and an integer or >> floating-point number is written down as-is, for example "100" or "1.23". >> >> However, the proposal drops "%d", "%f" and "%x" formats and the >> suggested workaround for writing down a number is to use >> ".encode('ascii')", which I think has two problems: >> >> One is that it needs to construct one additional object per formatting >> as opposed to Python 2; it is not uncommon for a PDF file to contain >> millions of numbers. >> >> The second problem is that, in my eyes, it is very counter-intuitive to >> require the use of str only to get formatting on bytes. Consider the >> case where a large bytes object is created out of many smaller bytes >> objects. If I wanted to format a part I had to use str instead. For example: >> >> content = b''.join([ >> b'header', >> b'some dictionary structure', >> b'part 1 abc', >> ('part 2 %.3f' % number).encode('ascii'), >> b'trailer']) > > I agree. I don't see any reason to exclude int and float. See Guido's > messages http://bugs.python.org/issue3982#msg180423 and > http://bugs.python.org/issue3982#msg180430 for some justification and > discussion. 
Since converting int and float to strings generates a very > small range of ASCII characters, ([0-9a-fx.-=], plus the uppercase > versions), what problem is introduced by allowing int and float? The > original str.format() work relied on this fact in its stringlib > implementation. I agree. I would have needed bytes-formatting (with numbers) recently writing .rtf files. Georg From jimjjewett at gmail.com Fri Jan 10 22:23:18 2014 From: jimjjewett at gmail.com (Jim J. Jewett) Date: Fri, 10 Jan 2014 13:23:18 -0800 (PST) Subject: [Python-Dev] Python3 "complexity" - 2 use cases In-Reply-To: <7wsiswcufo.fsf@benfinney.id.au> Message-ID: <52d064c6.6933310a.08e9.2156@mx.google.com> > Steven D'Aprano wrote: >> I think that heuristics to guess the encoding have their role to play, >> if the caller understands the risks. Ben Finney wrote: > In my opinion, content-type guessing heuristics certainly don't belong > in the standard library. It would be great if there were never any need to guess. But in the real world, there is -- and often the user won't know any more than python does. So when it is time to guess, a source of good guesses is an important battery to include. The HTML5 specifications go through some fairly extreme contortions to document what browsers actually do, as opposed to what previous standards have mandated. They don't currently specify how to guess (though I think a draft once tried, since the major browsers all do it, and at the time did it similarly), but the specs do explicitly support such a step, and do provide an implementation note encouraging user-agents to do at least minimal auto-detection. http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding My own opinion is therefore that Python SHOULD provide better support for both of the following use cases: (1) Treat this file like it came from the web -- including autodetection and even overriding explicit charset declarations for certain charsets. 
We should explicitly treat autodetection like time zone data -- there is no promise that the "right answer" (or at least the "best guess") won't change, even within a release. I offer no opinion on whether chardet in particular is still too volatile, but the docs should warn that the API is driven by possibly changing external data. (2) Treat this file as "ASCII+", where anything non-ASCII will (at most) be written back out unchanged; it doesn't even need to be converted to text. At this time, I don't know whether the right answer is making it easy to default to surrogate-escape for all error-handling, adding more bytes methods, encouraging use of python's latin-1 variant, offering a dedicated (new?) codec, or some new suggestion. I do know that this use case is important, and that python 3 currently looks clumsy compared to python 2. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ From chris.barker at noaa.gov Fri Jan 10 22:52:20 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 10 Jan 2014 13:52:20 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: On Fri, Jan 10, 2014 at 9:17 AM, Juraj Sukop wrote: > As you may know, PDF operates over bytes and an integer or floating-point > number is written down as-is, for example "100" or "1.23". > Just to be clear here -- is PDF specifically bytes+ascii? Or could there be some-other-encoding unicode in there? If so, then you really have a mess! if it is bytes+ascii, then it seems you could use a unicode object and encode/decode to latin-1 Perhaps still a bit klunkier than formatting directly into a bytes object, but workable. 
b'%.1f %.1f %.1f RG' % (r, g, b) > > is more confusing than: > > b'%s %s %s RG' % tuple(map(lambda x: (u'%.1f' % x).encode('ascii'), > (r, g, b))) > Let's see, I think that would be: u'%.1f %.1f %.1f RG' % (r, g, b) then when you want to write it out: .encode('latin-1') dumping the binary data in would be a bit uglier, for the image example: stream ...binary image data... endstream endobj u"stream\n%s\nendstream\nendobj"%binary_data.decode('latin-1') I think... not too bad, though if nothing else an alias for latin-1 that made it clear it worked for this would be nice. maybe ascii_plus_binary or something? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From storchaka at gmail.com Fri Jan 10 19:52:18 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 10 Jan 2014 20:52:18 +0200 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> Message-ID: 10.01.14 18:27, Baptiste Carvello wrote: > would it make sense to be more general, and allow a "lenient mode", > where all files implicitly opened with the default encoding would also > use the surrogateescape error handler? The surrogateescape error handler is compatible only with ASCII-compatible encodings (i.e. no ShiftJIS, no UTF-16). It can't be used by default. But you can set PYTHONIOENCODING=:surrogateescape and get your default locale encoding with surrogateescape.
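A minimal sketch of the surrogateescape behaviour Serhiy describes, assuming an ASCII-compatible codec (UTF-8 here). The sample bytes and variable names are made up for illustration and do not come from the thread.

```python
import io

# Hypothetical input: ASCII text mixed with two bytes that are not valid UTF-8.
raw = b"name=caf\xe9 \xff\n"

# Undecodable bytes become lone surrogates (U+DC80..U+DCFF) instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")

# The ASCII part can be handled as ordinary text.
key, _, value = text.partition("=")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("utf-8", errors="surrogateescape")

# The same handler can be requested explicitly when wrapping a binary stream.
reader = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8",
                          errors="surrogateescape")
from_stream = reader.read()
```

The round trip only holds when the same codec and error handler are used in both directions, and the escaped characters cannot be encoded with the default strict handler.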
From victor.stinner at gmail.com Fri Jan 10 23:12:52 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 10 Jan 2014 23:12:52 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: 2014/1/10 Juraj Sukop : > In the case of PDF, the embedding of an image into PDF looks like: > > 10 0 obj > << /Type /XObject > /Width 100 > /Height 100 > /Alternates 15 0 R > /Length 2167 > >> > stream > ...binary image data... > endstream > endobj Why not build "10 0 obj ... stream" and "endstream endobj" in Unicode and then encode to ASCII? Example: data = b''.join(( ("%d %d obj ... stream" % (10, 0)).encode('ascii'), binary_image_data, ("endstream endobj").encode('ascii'), )) Victor From eric at trueblade.com Fri Jan 10 23:20:32 2014 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 10 Jan 2014 17:20:32 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: <52D07230.5090400@trueblade.com> On 1/10/2014 5:12 PM, Victor Stinner wrote: > 2014/1/10 Juraj Sukop : >> In the case of PDF, the embedding of an image into PDF looks like: >> >> 10 0 obj >> << /Type /XObject >> /Width 100 >> /Height 100 >> /Alternates 15 0 R >> /Length 2167 >> >> >> stream >> ...binary image data... >> endstream >> endobj > > Why not build "10 0 obj ... stream" and "endstream endobj" in > Unicode and then encode to ASCII? Example: > > data = b''.join(( > ("%d %d obj ... stream" % (10, 0)).encode('ascii'), > binary_image_data, > ("endstream endobj").encode('ascii'), > )) Isn't the point of the PEP to make it easier to port 2.x code to 3.5? Is there really existing code like this in 2.x? I think what we're trying to do is to make code that looks like: b'%d %d obj ... stream' % (10, 0) work in both 2.x and 3.5. But correct me if I'm wrong. I'll admit to not following 100% of these emails. Eric.
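Victor's suggestion above can be written out as a runnable sketch. The helper name pdf_image_object and the sample values are hypothetical, and the dictionary is abbreviated compared with the PDF object quoted in the thread.

```python
def pdf_image_object(obj_num, gen_num, width, height, image_data):
    """Assemble a PDF-style image object from str pieces and a binary payload."""
    # Format the textual header as str, where %d is available...
    header = (
        "%d %d obj\n"
        "<< /Type /XObject\n"
        "/Width %d\n"
        "/Height %d\n"
        "/Length %d\n"
        ">>\n"
        "stream\n"
    ) % (obj_num, gen_num, width, height, len(image_data))
    trailer = "\nendstream\nendobj\n"
    # ...then encode only the str parts; the payload itself is never decoded.
    return b"".join((header.encode("ascii"), image_data, trailer.encode("ascii")))


binary_image_data = bytes(range(256))  # stand-in for real image bytes
data = pdf_image_object(10, 0, 100, 100, binary_image_data)
```

Each formatted piece creates one temporary str before it is encoded, which is the per-number overhead Juraj objected to for files containing millions of numbers.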
From solipsis at pitrou.net Fri Jan 10 23:29:58 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 10 Jan 2014 23:29:58 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: <52D03443.2010308@trueblade.com> Message-ID: <20140110232958.18c5279f@fsol> On Fri, 10 Jan 2014 12:56:19 -0500 "Eric V. Smith" wrote: > > I agree. I don't see any reason to exclude int and float. See Guido's > messages http://bugs.python.org/issue3982#msg180423 and > http://bugs.python.org/issue3982#msg180430 for some justification and > discussion. If you are representing int and float, you're really formatting a text message, not bytes. Basically if you allow the formatting of int and float instances, there's no reason not to allow the formatting of arbitrary objects through __str__. It doesn't make sense to special-case those two types and nothing else. Regards Antoine. From eric at trueblade.com Fri Jan 10 23:33:57 2014 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 10 Jan 2014 17:33:57 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140110232958.18c5279f@fsol> References: <52D03443.2010308@trueblade.com> <20140110232958.18c5279f@fsol> Message-ID: <52D07555.8040701@trueblade.com> On 1/10/2014 5:29 PM, Antoine Pitrou wrote: > On Fri, 10 Jan 2014 12:56:19 -0500 > "Eric V. Smith" wrote: >> >> I agree. I don't see any reason to exclude int and float. See Guido's >> messages http://bugs.python.org/issue3982#msg180423 and >> http://bugs.python.org/issue3982#msg180430 for some justification and >> discussion. > > If you are representing int and float, you're really formatting a text > message, not bytes. Basically if you allow the formatting of int and > float instances, there's no reason not to allow the formatting of > arbitrary objects through __str__. It doesn't make sense to > special-case those two types and nothing else. 
It might not for .format(), but I'm not convinced. But for %-formatting, str is already special-cased for these types. Eric. From solipsis at pitrou.net Fri Jan 10 23:34:02 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 10 Jan 2014 23:34:02 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: <52D07230.5090400@trueblade.com> Message-ID: <20140110233402.56d4f180@fsol> On Fri, 10 Jan 2014 17:20:32 -0500 "Eric V. Smith" wrote: > > Isn't the point of the PEP to make it easier to port 2.x code to 3.5? > Is > there really existing code like this in 2.x? No, but so what? The point of the PEP is not to allow arbitrary Python 2 code to run without modification under Python 3. There's a reason we broke compatibility, and there's no way we're gonna undo that. > I think what we're trying to do is to make code that looks like: > b'%d %d obj ... stream' % (10, 0) > work in both 2.x and 3.5. That's not what *I* am trying to do. As far as I'm concerned the aim of the PEP is to ease bytes interpolation, not to provide some kind of magical construct that will solve everyone's porting problems. Regards Antoine. From solipsis at pitrou.net Fri Jan 10 23:42:18 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 10 Jan 2014 23:42:18 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: <52D03443.2010308@trueblade.com> <20140110232958.18c5279f@fsol> <52D07555.8040701@trueblade.com> Message-ID: <20140110234218.1c2bb8e8@fsol> On Fri, 10 Jan 2014 17:33:57 -0500 "Eric V. Smith" wrote: > On 1/10/2014 5:29 PM, Antoine Pitrou wrote: > > On Fri, 10 Jan 2014 12:56:19 -0500 > > "Eric V. Smith" wrote: > >> > >> I agree. I don't see any reason to exclude int and float. See Guido's > >> messages http://bugs.python.org/issue3982#msg180423 and > >> http://bugs.python.org/issue3982#msg180430 for some justification and > >> discussion. 
> > > > If you are representing int and float, you're really formatting a text > > message, not bytes. Basically if you allow the formatting of int and > > float instances, there's no reason not to allow the formatting of > > arbitrary objects through __str__. It doesn't make sense to > > special-case those two types and nothing else. > > It might not for .format(), but I'm not convinced. But for %-formatting, > str is already special-cased for these types. That's not what I'm saying. str.__mod__ is able to represent all kinds of types through %s and calling __str__. It doesn't make sense for bytes.__mod__ to only support int and float. Why only them? Regards Antoine. From ethan at stoneleaf.us Fri Jan 10 23:58:15 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 10 Jan 2014 14:58:15 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140110234218.1c2bb8e8@fsol> References: <52D03443.2010308@trueblade.com> <20140110232958.18c5279f@fsol> <52D07555.8040701@trueblade.com> <20140110234218.1c2bb8e8@fsol> Message-ID: <52D07B07.8050802@stoneleaf.us> On 01/10/2014 02:42 PM, Antoine Pitrou wrote: > On Fri, 10 Jan 2014 17:33:57 -0500 > "Eric V. Smith" wrote: >> On 1/10/2014 5:29 PM, Antoine Pitrou wrote: >>> On Fri, 10 Jan 2014 12:56:19 -0500 >>> "Eric V. Smith" wrote: >>>> >>>> I agree. I don't see any reason to exclude int and float. See Guido's >>>> messages http://bugs.python.org/issue3982#msg180423 and >>>> http://bugs.python.org/issue3982#msg180430 for some justification and >>>> discussion. >>> >>> If you are representing int and float, you're really formatting a text >>> message, not bytes. Basically if you allow the formatting of int and >>> float instances, there's no reason not to allow the formatting of >>> arbitrary objects through __str__. It doesn't make sense to >>> special-case those two types and nothing else. >> >> It might not for .format(), but I'm not convinced. 
But for %-formatting, >> str is already special-cased for these types. > > That's not what I'm saying. str.__mod__ is able to represent all kinds > of types through %s and calling __str__. It doesn't make sense for > bytes.__mod__ to only support int and float. Why only them? Because embedding the ASCII equivalent of ints and floats in byte streams is a common operation? -- ~Ethan~ From solipsis at pitrou.net Sat Jan 11 00:02:24 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 11 Jan 2014 00:02:24 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: <52D03443.2010308@trueblade.com> <20140110232958.18c5279f@fsol> <52D07555.8040701@trueblade.com> <20140110234218.1c2bb8e8@fsol> <52D07B07.8050802@stoneleaf.us> Message-ID: <20140111000224.07b0fa8d@fsol> On Fri, 10 Jan 2014 14:58:15 -0800 Ethan Furman wrote: > On 01/10/2014 02:42 PM, Antoine Pitrou wrote: > > On Fri, 10 Jan 2014 17:33:57 -0500 > > "Eric V. Smith" wrote: > >> On 1/10/2014 5:29 PM, Antoine Pitrou wrote: > >>> On Fri, 10 Jan 2014 12:56:19 -0500 > >>> "Eric V. Smith" wrote: > >>>> > >>>> I agree. I don't see any reason to exclude int and float. See Guido's > >>>> messages http://bugs.python.org/issue3982#msg180423 and > >>>> http://bugs.python.org/issue3982#msg180430 for some justification and > >>>> discussion. > >>> > >>> If you are representing int and float, you're really formatting a text > >>> message, not bytes. Basically if you allow the formatting of int and > >>> float instances, there's no reason not to allow the formatting of > >>> arbitrary objects through __str__. It doesn't make sense to > >>> special-case those two types and nothing else. > >> > >> It might not for .format(), but I'm not convinced. But for %-formatting, > >> str is already special-cased for these types. > > > > That's not what I'm saying. str.__mod__ is able to represent all kinds > > of types through %s and calling __str__. 
It doesn't make sense for > > bytes.__mod__ to only support int and float. Why only them? > > Because embedding the ASCII equivalent of ints and floats in byte streams > is a common operation? Again, if you're representing "ASCII", you're representing text and should use a str object. Regards Antoine. From chris.barker at noaa.gov Fri Jan 10 23:06:30 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 10 Jan 2014 14:06:30 -0800 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> <52CFE542.5050909@egenix.com> Message-ID: On Fri, Jan 10, 2014 at 6:05 AM, Paul Moore wrote: > > Using the 'latin-1' to mean unknown encoding can easily result > > in Mojibake (unreadable text) entering your application with > > dangerous effects on your other text data. > > Agreed. The latin-1 suggestion is purely for people who object to > learning how to handle the encodings in their data more accurately. > I'm not so sure -- it could be used (abused?) for that, but I'm suggesting it be used for mixed ascii-binary data. I don't know that there IS a "right" way to do that -- at least not an efficient or easy to read and write one. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Sat Jan 11 00:14:45 2014 From: eric at trueblade.com (Eric V. 
Smith) Date: Fri, 10 Jan 2014 18:14:45 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140111000224.07b0fa8d@fsol> References: <52D03443.2010308@trueblade.com> <20140110232958.18c5279f@fsol> <52D07555.8040701@trueblade.com> <20140110234218.1c2bb8e8@fsol> <52D07B07.8050802@stoneleaf.us> <20140111000224.07b0fa8d@fsol> Message-ID: <52D07EE5.6020605@trueblade.com> On 1/10/2014 6:02 PM, Antoine Pitrou wrote: > On Fri, 10 Jan 2014 14:58:15 -0800 > Ethan Furman wrote: >> On 01/10/2014 02:42 PM, Antoine Pitrou wrote: >>> On Fri, 10 Jan 2014 17:33:57 -0500 >>> "Eric V. Smith" wrote: >>>> On 1/10/2014 5:29 PM, Antoine Pitrou wrote: >>>>> On Fri, 10 Jan 2014 12:56:19 -0500 >>>>> "Eric V. Smith" wrote: >>>>>> >>>>>> I agree. I don't see any reason to exclude int and float. See Guido's >>>>>> messages http://bugs.python.org/issue3982#msg180423 and >>>>>> http://bugs.python.org/issue3982#msg180430 for some justification and >>>>>> discussion. >>>>> >>>>> If you are representing int and float, you're really formatting a text >>>>> message, not bytes. Basically if you allow the formatting of int and >>>>> float instances, there's no reason not to allow the formatting of >>>>> arbitrary objects through __str__. It doesn't make sense to >>>>> special-case those two types and nothing else. >>>> >>>> It might not for .format(), but I'm not convinced. But for %-formatting, >>>> str is already special-cased for these types. >>> >>> That's not what I'm saying. str.__mod__ is able to represent all kinds >>> of types through %s and calling __str__. It doesn't make sense for >>> bytes.__mod__ to only support int and float. Why only them? Ah, I see. This is about the types that %s supports, not about support for %d and %f. >> Because embedding the ASCII equivalent of ints and floats in byte streams >> is a common operation? > > Again, if you're representing "ASCII", you're representing text and > should use a str object. 
Yes, but is there existing 2.x code that uses %s for int and float (perhaps unwittingly), and do we want to "help" that code out? Or do we want to make porters first change to using %d or %f instead of %s? I'll grant you that we might be doing more harm than help by special-casing these types. I'm just asking. I think what you're getting at is that in addition to not calling __format__, we don't want to call __str__, either, for the same reason. Correct me if I'm off base, please. I'm not trying to put words in anyone's mouth. In any event, I think supporting %d and %f (and %i, %u, %x, %g, etc.) inside format strings would be useful. Eric. From breamoreboy at yahoo.co.uk Sat Jan 11 00:22:55 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 10 Jan 2014 23:22:55 +0000 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> <52CFE542.5050909@egenix.com> Message-ID: On 10/01/2014 22:06, Chris Barker wrote: > On Fri, Jan 10, 2014 at 6:05 AM, Paul Moore > wrote: > > > Using the 'latin-1' to mean unknown encoding can easily result > > in Mojibake (unreadable text) entering your application with > > dangerous effects on your other text data. > > Agreed. The latin-1 suggestion is purely for people who object to > learning how to handle the encodings in their data more accurately. > > > I'm not so sure -- it could be used (abused?) for that, but I'm > suggesting it be used for mixed ascii-binary data. I don't know that > there IS a "right" way to do that -- at least not an efficient or easy > to read and write one. > > -Chris > The correct way is to read the interface specification which tells you what should be in the data. Or do people not use interface specifications these days, preferring to guess what they've got instead? 
-- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From solipsis at pitrou.net Sat Jan 11 00:24:05 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 11 Jan 2014 00:24:05 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: <52D03443.2010308@trueblade.com> <20140110232958.18c5279f@fsol> <52D07555.8040701@trueblade.com> <20140110234218.1c2bb8e8@fsol> <52D07B07.8050802@stoneleaf.us> <20140111000224.07b0fa8d@fsol> <52D07EE5.6020605@trueblade.com> Message-ID: <20140111002405.11328070@fsol> On Fri, 10 Jan 2014 18:14:45 -0500 "Eric V. Smith" wrote: > > >> Because embedding the ASCII equivalent of ints and floats in byte streams > >> is a common operation? > > > > Again, if you're representing "ASCII", you're representing text and > > should use a str object. > > Yes, but is there existing 2.x code that uses %s for int and float > (perhaps unwittingly), and do we want to "help" that code out? > Or do we > want to make porters first change to using %d or %f instead of %s? I'm afraid you're misunderstanding me. The PEP doesn't allow for %d and %f on bytes objects. > I think what you're getting at is that in addition to not calling > __format__, we don't want to call __str__, either, for the same reason. Not only. We don't want to do anything that actually asks for a *textual* representation of something. %d and %f ask for a textual representation of a number, so they're right out. Regards Antoine. 
From juraj.sukop at gmail.com Sat Jan 11 00:40:28 2014 From: juraj.sukop at gmail.com (Juraj Sukop) Date: Sat, 11 Jan 2014 00:40:28 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: On Fri, Jan 10, 2014 at 10:52 PM, Chris Barker wrote: > On Fri, Jan 10, 2014 at 9:17 AM, Juraj Sukop wrote: > >> As you may know, PDF operates over bytes and an integer or floating-point >> number is written down as-is, for example "100" or "1.23". >> > > Just to be clear here -- is PDF specifically bytes+ascii? > > Or could there be some-other-encoding unicode in there? > From the specs: "At the most fundamental level, a PDF file is a sequence of 8-bit bytes." But it is also possible to represent a PDF using printable ASCII + whitespace by using escapes and "filters". Then, there are also "text strings" which might be encoded in UTF-16. What this all means is that the PDF objects are expressed in ASCII, that "stream" objects like images and fonts may have a binary part, and that I never saw those UTF-16 strings. > u"stream\n%s\nendstream\nendobj"%binary_data.decode('latin-1') > The argument for dropping "%f" et al. has been that if something is text, then it should be Unicode. Conversely, if it is not text, then it should not be Unicode. From juraj.sukop at gmail.com Sat Jan 11 00:43:39 2014 From: juraj.sukop at gmail.com (Juraj Sukop) Date: Sat, 11 Jan 2014 00:43:39 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: On Fri, Jan 10, 2014 at 11:12 PM, Victor Stinner wrote: > > Why not build "10 0 obj ... stream" and "endstream endobj" in > Unicode and then encode to ASCII? Example: > > data = b''.join(( > ("%d %d obj ...
stream" % (10, 0)).encode('ascii'), > binary_image_data, > ("endstream endobj").encode('ascii'), > )) > The key is "encode to ASCII" which means that the result is bytes. Then, there is this "11 0 obj" which should also be bytes. But it has no "binary_image_data" - only lots of numbers waiting to be somehow converted to bytes. I already mentioned the problems with ".encode('ascii')" but it does not stop here. Numbers may appear not only inside "streams" but almost anywhere: in the header there is PDF version, an image has to have "width" and "height", at the end of PDF there is a structure containing offsets to all of the objects in file. Basically, to ".encode('ascii')" every possible number is not exactly simple or pretty. -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sat Jan 11 00:49:33 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 11 Jan 2014 00:49:33 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: Message-ID: <20140111004933.4e0bb394@fsol> On Sat, 11 Jan 2014 00:43:39 +0100 Juraj Sukop wrote: > Basically, to ".encode('ascii')" every possible > number is not exactly simple or pretty. Well it strikes me that the PDF format itself is not exactly simple or pretty. It might be convenient that Python 2 allows you, in certain cases, to "ignore" encoding issues because the main text type is actually a bytestring, but under the Python 3 model there's no reason to allow the same shortcuts. Also, when you say you've never encountered UTF-16 text in PDFs, it sounds like those people who've never encountered any non-ASCII data in their programs. Regards Antoine. 
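The latin-1 round trip that Chris Barker proposed earlier in the thread can be checked with a short sketch. The payload and colour values below are stand-ins chosen for illustration, not data from the thread.

```python
# latin-1 maps every byte value 0..255 to the code point with the same number,
# so decoding never fails and encoding restores the original bytes.
binary_data = bytes(range(256))  # stand-in for an opaque image payload

# Decode the blob so it can take part in ordinary str formatting.
r, g, b = 0.0, 0.5, 1.0
text = "%.1f %.1f %.1f RG\nstream\n%s\nendstream\nendobj" % (
    r, g, b, binary_data.decode("latin-1"))

# Encoding back with the same codec restores the payload byte for byte.
output = text.encode("latin-1")
```

The intermediate str is only a carrier: characters that came from bytes above 0x7F have no textual meaning, so mixing it with genuine non-latin-1 text would make the final encode step fail or produce the mojibake warned about elsewhere in the thread.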
From chris.barker at noaa.gov Sat Jan 11 00:50:04 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 10 Jan 2014 15:50:04 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: On Fri, Jan 10, 2014 at 3:40 PM, Juraj Sukop wrote: > What this all means is that the PDF objects are expressed in ASCII, > "stream" objects like images and fonts may have a binary part and I never > saw those UTF-16 strings. > hmm -- I wonder if they are out there in the wild, though.... > u"stream\n%s\nendstream\nendobj"%binary_data.decode('latin-1') >> > > The argument for dropping "%f" et al. has been that if something is a > text, then it should be Unicode. Conversely, if it is not text, then it > should not be Unicode. > > ???? What I'm trying to demonstrate / test is that you can use unicode objects for mixed binary + ascii, if you make sure to encode/decode using latin-1 (any others?). The idea is that ascii can be seen/used as text, and other bytes are preserved, and you can ignore whatever meaning latin-1 gives them. Using unicode objects means that you can use the existing string formatting (%s), and if you want to pass in binary blobs, you need to decode them as latin-1, creating a unicode object, which will get interpolated into your unicode object, but when that unicode gets encoded back to latin-1, the original bytes are preserved. I think this is confusing, as we are calling it latin-1, but not really using it that way, but it seems it should work. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed...
URL: From chris.barker at noaa.gov Sat Jan 11 00:58:07 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 10 Jan 2014 15:58:07 -0800 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> <52CFE542.5050909@egenix.com> Message-ID: On Fri, Jan 10, 2014 at 3:22 PM, Mark Lawrence wrote: > The correct way is to read the interface specification which tells you > what should be in the data. Or do people not use interface specifications > these days, preferring to guess what they've got instead? > No one is suggesting guessing (OK, sometimes for what encoding text is in -- but that's when you already know it's text). But while some specs for mixed ascii and binary may specify which bytes are which, not all do -- there may be a read the file 'till you find this text, then the next n bytes are binary, or maybe the next bytes are binary until you get to this ascii text, etc... This is not guessing, but it does require working with an object which has both ascii text and binary in it -- and why shouldn't Python provide a reasonable way to work with that? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From juraj.sukop at gmail.com Sat Jan 11 01:13:13 2014 From: juraj.sukop at gmail.com (Juraj Sukop) Date: Sat, 11 Jan 2014 01:13:13 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140111004933.4e0bb394@fsol> References: <20140111004933.4e0bb394@fsol> Message-ID: On Sat, Jan 11, 2014 at 12:49 AM, Antoine Pitrou wrote: > Also, when you say you've never encountered UTF-16 text in PDFs, it > sounds like those people who've never encountered any non-ASCII data in > their programs. Let me clarify: one does not think in "writing text in Unicode"-terms in PDF. Instead, one records the sequence of "character codes" which correspond to "glyphs" or the glyph IDs directly. That's because one Unicode character may have more than one glyph and more characters can be shown as one glyph. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Sat Jan 11 01:02:20 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 10 Jan 2014 16:02:20 -0800 Subject: [Python-Dev] Python3 "complexity" In-Reply-To: References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au> <20140109224529.6ddd9388@fsol> <52CFE542.5050909@egenix.com> Message-ID: <52D08A0C.8080204@stoneleaf.us> On 01/10/2014 03:22 PM, Mark Lawrence wrote: > On 10/01/2014 22:06, Chris Barker wrote: >> >> I'm not so sure -- it could be used (abused?) for that, but I'm >> suggesting it be used for mixed ascii-binary data. I don't know that >> there IS a "right" way to do that -- at least not an efficient or easy >> to read and write one. > > The correct way is to read the interface specification which tells you what should be in the data. Of course. The debate is about how to generate the data to the specs in an elegant manner. 
-- ~Ethan~ From ethan at stoneleaf.us Sat Jan 11 01:23:53 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 10 Jan 2014 16:23:53 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140108234213.0610ef63@fsol> References: <20140108234213.0610ef63@fsol> Message-ID: <52D08F19.9070605@stoneleaf.us> On 01/08/2014 02:42 PM, Antoine Pitrou wrote: > > With Victor's consent, I overhauled PEP 460 and made the feature set > more restricted and consistent with the bytes/str separation. From the PEP: ============= > Python 3 generally mandates that text be stored and manipulated as > unicode (i.e. str objects, not bytes). In some cases, though, it > makes sense to manipulate bytes objects directly. Typical usage is > binary network protocols, where you can want to interpolate and > assemble several bytes object (some of them literals, some of them > compute) to produce complete protocol messages. For example, > protocols such as HTTP or SIP have headers with ASCII names and > opaque "textual" values using a varying and/or sometimes ill-defined > encoding. Moreover, those headers can be followed by a binary > body... which can be chunked and decorated with ASCII headers and > trailers! As it stands now, the PEP talks about ASCII, about how it makes sense sometimes to work directly with bytes objects, and then refuses to allow % to embed ASCII text in the byte stream. > All other features present in formatting of str objects (either > through the percent operator or the str.format() method) are > unsupported. Those features imply treating the recipient of the > operator or method as text, which goes counter to the text / bytes > separation (for example, accepting %d as a format code would imply > that the bytes object really is a ASCII-compatible text string). No, it implies that portion of the byte stream is ASCII compatible. 
And we have several examples: PDF, HTML, DBF, just about every network protocol (not counting M$), and, I'm sure, plenty I haven't heard of. -1 on the PEP as it stands now. -- ~Ethan~ From solipsis at pitrou.net Sat Jan 11 02:12:58 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 11 Jan 2014 02:12:58 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> Message-ID: <20140111021258.48e72beb@fsol> On Fri, 10 Jan 2014 16:23:53 -0800 Ethan Furman wrote: > On 01/08/2014 02:42 PM, Antoine Pitrou wrote: > > > > With Victor's consent, I overhauled PEP 460 and made the feature set > > more restricted and consistent with the bytes/str separation. > > From the PEP: > ============= > > Python 3 generally mandates that text be stored and manipulated as > > unicode (i.e. str objects, not bytes). In some cases, though, it > > makes sense to manipulate bytes objects directly. Typical usage is > > binary network protocols, where you can want to interpolate and > > assemble several bytes object (some of them literals, some of them > > compute) to produce complete protocol messages. For example, > > protocols such as HTTP or SIP have headers with ASCII names and > > opaque "textual" values using a varying and/or sometimes ill-defined > > encoding. Moreover, those headers can be followed by a binary > > body... which can be chunked and decorated with ASCII headers and > > trailers! > > As it stands now, the PEP talks about ASCII, about how it makes sense > sometimes to work directly with bytes objects, and > then refuses to allow % to embed ASCII text in the byte stream. Indeed I refuse for %-formatting to allow the mixing of bytes and str objects, in the same way that it is forbidden to concatenate "a" and b"b" together, or to write b"".join(["abc"]). 
Python 3 was made *precisely* because the implicit conversion between ASCII unicode and bytes is deemed harmful. It's completely counter-productive and misleading for our users to start mudding the message by introducing exceptions to that rule. Regards Antoine. From eric at trueblade.com Sat Jan 11 02:53:09 2014 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 10 Jan 2014 20:53:09 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140111021258.48e72beb@fsol> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> Message-ID: <52D0A405.50903@trueblade.com> On 1/10/2014 8:12 PM, Antoine Pitrou wrote: > On Fri, 10 Jan 2014 16:23:53 -0800 > Ethan Furman wrote: >> On 01/08/2014 02:42 PM, Antoine Pitrou wrote: >>> >>> With Victor's consent, I overhauled PEP 460 and made the feature set >>> more restricted and consistent with the bytes/str separation. >> >> From the PEP: >> ============= >>> Python 3 generally mandates that text be stored and manipulated as >>> unicode (i.e. str objects, not bytes). In some cases, though, it >>> makes sense to manipulate bytes objects directly. Typical usage is >>> binary network protocols, where you can want to interpolate and >>> assemble several bytes object (some of them literals, some of them >>> compute) to produce complete protocol messages. For example, >>> protocols such as HTTP or SIP have headers with ASCII names and >>> opaque "textual" values using a varying and/or sometimes ill-defined >>> encoding. Moreover, those headers can be followed by a binary >>> body... which can be chunked and decorated with ASCII headers and >>> trailers! >> >> As it stands now, the PEP talks about ASCII, about how it makes sense >> sometimes to work directly with bytes objects, and >> then refuses to allow % to embed ASCII text in the byte stream. 
> > Indeed I refuse for %-formatting to allow the mixing of bytes and str > objects, in the same way that it is forbidden to concatenate "a" and > b"b" together, or to write b"".join(["abc"]). I think: 'a' + b'b' is different from: b'Content-Length: %d' % 42 The former we want to prevent, but I see nothing wrong with the latter. So, I'm -1 on the PEP. It doesn't address the cases laid out in issue 3982. See for example http://bugs.python.org/issue3982#msg180432 . Eric. From solipsis at pitrou.net Sat Jan 11 03:04:03 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 11 Jan 2014 03:04:03 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> Message-ID: <20140111030403.6f177f30@fsol> On Fri, 10 Jan 2014 20:53:09 -0500 "Eric V. Smith" wrote: > > So, I'm -1 on the PEP. It doesn't address the cases laid out in issue > 3982. See for example http://bugs.python.org/issue3982#msg180432 . Then we might as well not do anything, since any attempt to advance things is met by stubborn opposition in the name of "not far enough". (I don't care much personally, I think the issue is quite overblown anyway) Regards Antoine. From ethan at stoneleaf.us Sat Jan 11 03:28:41 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 10 Jan 2014 18:28:41 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140111030403.6f177f30@fsol> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> Message-ID: <52D0AC59.9070908@stoneleaf.us> On 01/10/2014 06:04 PM, Antoine Pitrou wrote: > On Fri, 10 Jan 2014 20:53:09 -0500 > "Eric V. Smith" wrote: >> >> So, I'm -1 on the PEP. It doesn't address the cases laid out in issue >> 3982.
See for example http://bugs.python.org/issue3982#msg180432 . > > Then we might as well not do anything, since any attempt to advance > things is met by stubborn opposition in the name of "not far enough". Heh, and here I thought it was stubborn opposition in the name of purity. ;) > (I don't care much personally, I think the issue is quite overblown > anyway) Is it safe to assume you don't use Python for the use-cases under discussion? Specifically, mixed ASCII, binary, and encoded-text byte streams? -- ~Ethan~ From solipsis at pitrou.net Sat Jan 11 03:39:56 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 11 Jan 2014 03:39:56 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D0AC59.9070908@stoneleaf.us> Message-ID: <20140111033956.227d25da@fsol> On Fri, 10 Jan 2014 18:28:41 -0800 Ethan Furman wrote: > > Is it safe to assume you don't use Python for the use-cases under discussion? You know, I've done quite a bit of network programming. I've also done an experimental port of Twisted to Python 3. I know what a network protocol with ill-defined encodings looks like. Regards Antoine. 
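The str/bytes operations contrasted earlier in this exchange can be checked directly. What follows is only a minimal sketch, assuming Python 3; the helper name raises_type_error is hypothetical and not taken from the thread.

```python
def raises_type_error(fn):
    """Return True if calling fn() raises TypeError, else False."""
    try:
        fn()
    except TypeError:
        return True
    return False


# Mixing str and bytes is rejected rather than implicitly converted.
concat_rejected = raises_type_error(lambda: "a" + b"b")
join_rejected = raises_type_error(lambda: b"".join(["abc"]))

# Encoding explicitly makes the same operations well-defined.
concat_ok = "a".encode("ascii") + b"b"
join_ok = b"".join(["abc".encode("ascii")])
```

The sketch deliberately avoids bytes % args, since whether and how that operation should exist is exactly what the thread is debating.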
From songofacandy at gmail.com Sat Jan 11 04:30:23 2014 From: songofacandy at gmail.com (INADA Naoki) Date: Sat, 11 Jan 2014 12:30:23 +0900 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140111002405.11328070@fsol> References: <52D03443.2010308@trueblade.com> <20140110232958.18c5279f@fsol> <52D07555.8040701@trueblade.com> <20140110234218.1c2bb8e8@fsol> <52D07B07.8050802@stoneleaf.us> <20140111000224.07b0fa8d@fsol> <52D07EE5.6020605@trueblade.com> <20140111002405.11328070@fsol> Message-ID: To avoid implicit conversion between str and bytes, I propose adding only a limited %-format, not .format() or .format_map(). "Limited %-format" means: %c accepts an integer or a bytes object of length one; %r is not supported; %s accepts only bytes; %a is the only format that accepts an arbitrary object. All other formats behave the same as for str. On Sat, Jan 11, 2014 at 8:24 AM, Antoine Pitrou wrote: > On Fri, 10 Jan 2014 18:14:45 -0500 > "Eric V. Smith" wrote: > > > > >> Because embedding the ASCII equivalent of ints and floats in byte > streams > > >> is a common operation? > > > > > > Again, if you're representing "ASCII", you're representing text and > > > should use a str object. > > > > Yes, but is there existing 2.x code that uses %s for int and float > > (perhaps unwittingly), and do we want to "help" that code out? > > Or do we > > want to make porters first change to using %d or %f instead of %s? > > I'm afraid you're misunderstanding me. The PEP doesn't allow for %d and > %f on bytes objects. > > > I think what you're getting at is that in addition to not calling > > __format__, we don't want to call __str__, either, for the same reason. > > Not only. We don't want to do anything that actually asks for a > *textual* representation of something. %d and %f ask for a textual > representation of a number, so they're right out. > > Regards > > Antoine.
-- INADA Naoki From ethan at stoneleaf.us Sat Jan 11 03:46:06 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 10 Jan 2014 18:46:06 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140111033956.227d25da@fsol> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D0AC59.9070908@stoneleaf.us> <20140111033956.227d25da@fsol> Message-ID: <52D0B06E.70302@stoneleaf.us> On 01/10/2014 06:39 PM, Antoine Pitrou wrote: > On Fri, 10 Jan 2014 18:28:41 -0800 > Ethan Furman wrote: >> >> Is it safe to assume you don't use Python for the use-cases under discussion? > > You know, I've done quite a bit of network programming. No, I didn't, that's why I asked. > I've also done an experimental port of Twisted to Python 3. > I know what a network protocol with ill-defined encodings > looks like. Can you give a code sample of what you think, for example, the PDF generation code should look like? (If you already have, I apologize -- I missed it in all the ruckus.)
-- ~Ethan~ From ethan at stoneleaf.us Sat Jan 11 03:55:18 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 10 Jan 2014 18:55:18 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140111033956.227d25da@fsol> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D0AC59.9070908@stoneleaf.us> <20140111033956.227d25da@fsol> Message-ID: <52D0B296.3000101@stoneleaf.us> On 01/10/2014 06:39 PM, Antoine Pitrou wrote: > > I know what a network protocol with ill-defined encodings > looks like. For the record, I've been (and I suspect Eric and some others have also been) talking about well-defined encodings. For the DBF files that I work with, there is binary, ASCII, and third that is specified in the file header. -- ~Ethan~ From cs at zip.com.au Sat Jan 11 05:14:25 2014 From: cs at zip.com.au (Cameron Simpson) Date: Sat, 11 Jan 2014 15:14:25 +1100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: <20140111041425.GA52753@cskk.homeip.net> On 11Jan2014 00:43, Juraj Sukop wrote: > On Fri, Jan 10, 2014 at 11:12 PM, Victor Stinner > wrote: > > What not building "10 0 obj ... stream" and "endstream endobj" in > > Unicode and then encode to ASCII? Example: > > > > data = b''.join(( > > ("%d %d obj ...
stream" % (10, 0)).encode('ascii'), > > binary_image_data, > > ("endstream endobj").encode('ascii'), > > )) > > The key is "encode to ASCII" which means that the result is bytes. Then, > there is this "11 0 obj" which should also be bytes. But it has no > "binary_image_data" - only lots of numbers waiting to be somehow converted > to bytes. I already mentioned the problems with ".encode('ascii')" but it > does not stop here. Numbers may appear not only inside "streams" but almost > anywhere: in the header there is PDF version, an image has to have "width" > and "height", at the end of PDF there is a structure containing offsets to > all of the objects in file. Basically, to ".encode('ascii')" every possible > number is not exactly simple or pretty. Hi Juraj, Might I suggest a helper function (outside the PEP scope) instead of arguing for support for %f et al? Thus:

    def bytify(things, encoding='ascii'):
        # Pass bytes through untouched; encode the str() of anything else.
        for thing in things:
            if isinstance(thing, bytes):
                yield thing
            else:
                yield str(thing).encode(encoding)

Then one's embedding in PDF might become, more readably:

    data = b' '.join(bytify([10, 0, obj, binary_image_data, ...]))

Of course, bytify might be augmented with whatever encoding facilities might suit your needs. Cheers, -- Cameron Simpson We tend to overestimate the short-term impact of technological change and underestimate its long-term impact. - Amara's Law From steve at pearwood.info Sat Jan 11 06:36:42 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 11 Jan 2014 16:36:42 +1100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: <20140111053639.GM3869@ando> On Fri, Jan 10, 2014 at 06:17:02PM +0100, Juraj Sukop wrote: > As you may know, PDF operates over bytes and an integer or floating-point > number is written down as-is, for example "100" or "1.23". I'm sorry, I don't understand what you mean here.
I'm honestly not trying to be difficult, but you sound confident that you understand what you are doing, but your description doesn't make sense to me. To me, it looks like you are conflating bytes and ASCII characters, that is, assuming that characters "are" in some sense identical to their ASCII representation. Let me explain: The integer that in English is written as 100 is represented in memory as bytes 0x0064 (assuming a big-endian C short), so when you say "an integer is written down AS-IS" (emphasis added), to me that says that the PDF file includes the bytes 0x0064. But then you go on to write the three character string "100", which (assuming ASCII) is the bytes 0x313030. Going from the C short to the ASCII representation 0x313030 is nothing like inserting the int "as-is". To put it another way, the Python 2 '%d' format code does not just copy bytes. I think that what you are trying to say is that a PDF file is a binary file which includes some ASCII-formatted text fields. So when writing an integer 100, rather than writing it "as is" which would be byte 0x64 (with however many leading null bytes needed for padding), it is converted to ASCII representation 0x313030 first, and that's what needs to be inserted. If you consider PDF as binary with occasional pieces of ASCII text, then working with bytes makes sense. But I wonder whether it might be better to consider PDF as mostly text with some binary bytes. Even though the bulk of the PDF will be binary, the interesting bits are text. E.g. your example: > In the case of PDF, the embedding of an image into PDF looks like: > > 10 0 obj > << /Type /XObject > /Width 100 > /Height 100 > /Alternates 15 0 R > /Length 2167 > >> > stream > ...binary image data... 
> endstream > endobj Even though the binary image data is probably much, much larger in length than the text shown above, it's (probably) trivial to deal with: convert your image data into bytes, decode those bytes into Latin-1, then concatenate the Latin-1 string into the text above. Latin-1 has the nice property that every byte decodes into the character with the same code point, and vice versa. So:

    for i in range(256):
        assert bytes([i]).decode('latin-1') == chr(i)
        assert chr(i).encode('latin-1') == bytes([i])

passes. It seems to me that your problem goes away if you use Unicode text with embedded binary data, rather than binary data with embedded ASCII text. Then when writing the file to disk, of course you encode it to Latin-1, either explicitly:

    pdf = ...  # Unicode string containing the PDF contents
    with open("outfile.pdf", "wb") as f:
        f.write(pdf.encode("latin-1"))

or implicitly:

    with open("outfile.pdf", "w", encoding="latin-1") as f:
        f.write(pdf)

There may be a few wrinkles I haven't thought of; I don't claim to be an expert on PDF. But I see no reason why PDF files ought to be an exception to the rule: * work internally with Unicode text; * convert to and from bytes only on input and output. Please also take note that in Python 3.3 and later, the internal representation of Unicode strings containing only code points up to 255 (i.e. pure ASCII or pure Latin-1) is very efficient, using only one byte per character. Another advantage is that using text rather than bytes means that your example: [...] > dropping the bytes-formatting of numbers makes it more complicated > than it was. I would appreciate any explanation on how: > > b'%.1f %.1f %.1f RG' % (r, g, b) becomes simply '%.1f %.1f %.1f RG' % (r, g, b) in Python 3. In Python 3.3 and above, it can be written as: u'%.1f %.1f %.1f RG' % (r, g, b) which conveniently is exactly the same syntax you would use in Python 2.
That's *much* nicer than your suggestion: > is more confusing than: > > b'%s %s %s RG' % tuple(map(lambda x: (u'%.1f' % x).encode('ascii'), > (r, g, b))) -- Steven From ben+python at benfinney.id.au Sat Jan 11 07:00:56 2014 From: ben+python at benfinney.id.au (Ben Finney) Date: Sat, 11 Jan 2014 17:00:56 +1100 Subject: [Python-Dev] Python3 "complexity" - 2 use cases References: <7wsiswcufo.fsf@benfinney.id.au> <52d064c6.6933310a.08e9.2156@mx.google.com> Message-ID: <7wob3jcb1j.fsf@benfinney.id.au> "Jim J. Jewett" writes: > > > Steven D'Aprano wrote: > >> I think that heuristics to guess the encoding have their role to play, > >> if the caller understands the risks. > > Ben Finney wrote: > > In my opinion, content-type guessing heuristics certainly don't belong > > in the standard library. > > It would be great if there were never any need to guess. But in the > real world, there is -- and often the user won't know any more than > python does. That's why I think it's great to have heuristic guessing code available as a third-party library. > So when it is time to guess, a source of good guesses is an important > battery to include. Why is it important enough to deserve that privilege, over the thousands of other candidates for the standard library? The barrier for entry to the standard library is higher than mere usefulness. > We should explicitly treat autodetection like time zone data -- > there is no promise that the "right answer" (or at least the "best > guess") won't change, even within a release. But there is exactly one set of authoritative time zones at any particular point in time. That's why it makes sense to have that set of authoritative values available in the standard library. Heuristic guesses about content types do not have the property of exactly one authoritative source, so your analogy is not compelling. -- \ "Unix is an operating system, OS/2 is half an operating system, | `\ Windows is a shell, and DOS is a boot partition virus."
-Peter | _o__) H. Coffin | Ben Finney From g.brandl at gmx.net Sat Jan 11 08:26:57 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 11 Jan 2014 08:26:57 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140111030403.6f177f30@fsol> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> Message-ID: On 11.01.2014 03:04, Antoine Pitrou wrote: > On Fri, 10 Jan 2014 20:53:09 -0500 > "Eric V. Smith" wrote: >> >> So, I'm -1 on the PEP. It doesn't address the cases laid out in issue >> 3982. See for example http://bugs.python.org/issue3982#msg180432 . I agree. > Then we might as well not do anything, since any attempt to advance > things is met by stubborn opposition in the name of "not far enough". > > (I don't care much personally, I think the issue is quite overblown > anyway) So you wouldn't mind another overhaul of the PEP including a bit more functionality again? :) I really think that practicality beats purity here. (I'm not advocating freely mixing bytes and str, mind you!) Georg From ncoghlan at gmail.com Sat Jan 11 09:17:07 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Jan 2014 18:17:07 +1000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <52D07B07.8050802@stoneleaf.us> References: <52D03443.2010308@trueblade.com> <20140110232958.18c5279f@fsol> <52D07555.8040701@trueblade.com> <20140110234218.1c2bb8e8@fsol> <52D07B07.8050802@stoneleaf.us> Message-ID: On 11 January 2014 08:58, Ethan Furman wrote: > On 01/10/2014 02:42 PM, Antoine Pitrou wrote: >> >> On Fri, 10 Jan 2014 17:33:57 -0500 >> "Eric V. Smith" wrote: >>> >>> On 1/10/2014 5:29 PM, Antoine Pitrou wrote: >>>> >>>> On Fri, 10 Jan 2014 12:56:19 -0500 >>>> "Eric V. Smith" wrote: >>>>> >>>>> >>>>> I agree. I don't see any reason to exclude int and float.
See Guido's >>>>> messages http://bugs.python.org/issue3982#msg180423 and >>>>> http://bugs.python.org/issue3982#msg180430 for some justification and >>>>> discussion. >>>> >>>> >>>> If you are representing int and float, you're really formatting a text >>>> message, not bytes. Basically if you allow the formatting of int and >>>> float instances, there's no reason not to allow the formatting of >>>> arbitrary objects through __str__. It doesn't make sense to >>>> special-case those two types and nothing else. >>> >>> >>> It might not for .format(), but I'm not convinced. But for %-formatting, >>> str is already special-cased for these types. >> >> >> That's not what I'm saying. str.__mod__ is able to represent all kinds >> of types through %s and calling __str__. It doesn't make sense for >> bytes.__mod__ to only support int and float. Why only them? > > > Because embedding the ASCII equivalent of ints and floats in byte streams is > a common operation? It's emphatically *NOT* a binary interpolation operation though - the binary representation of the integer 1 is the byte value 1, not the byte value 49. If you want the byte value 49 to appear in the stream, then you need to interpolate the *ASCII encoding* of the string "1", not the integer 1. If you want to manipulate text representations, do it in the text domain. If you want to manipulate binary representations, do it in the binary domain. The *whole point* of the text model change in Python 3 is to force programmers to *decide* which domain they're operating in at any given point in time - while the approach of blurring the boundaries between the two can be convenient for wire protocol and file format manipulation, it is a horrendous bug magnet everywhere else. 
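The difference between the binary and the ASCII representation of a small integer, as drawn above, can be sketched as follows. This is only an illustration assuming Python 3; the variable names are hypothetical and not from the thread.

```python
# Binary representation of the integer 1: a single byte whose value is 1.
binary_one = bytes([1])

# Text representation of the integer 1, encoded as ASCII: a single byte
# whose value is 49, the code point of the character '1'.
ascii_one = str(1).encode('ascii')

# The same contrast for 100: a fixed-width big-endian 2-byte integer
# versus the three ASCII digits '1', '0', '0'.
binary_hundred = (100).to_bytes(2, 'big')
ascii_hundred = str(100).encode('ascii')
```

In both cases the explicit str(...).encode('ascii') step marks the point where the code moves from the text domain to the binary domain.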
PEP 460 is just about adding back some missing functionality in the binary domain (interpolating binary sequences together), not about bringing back the problematic text model that allows particular text representations to be interpreted as if they were also binary data. That said, I actually think there's a valid use case for a Python 3 type that allows the bytes/text boundary to be blurred in making it easier to port certain kinds of Python 2 code to Python 3 (specifically, working with wire protocols and file formats that contain a mixture of encodings, but all encodings are *known* to at least be ASCII compatible). It is highly unlikely that such a type will *ever* be part of the standard library, though - idiomatic Python 3 code shouldn't need it, affected Python 2 code *can* be ported without it (but may look more complicated due to the use of explicit decoding and encoding operations, rather than relying on implicit ones), and it should be entirely possible to implement it as an extension module (modulo one bug in CPython that may impact the approach, but we won't know for sure until people actually try it out). Fortunately, after years of my suggesting the idea to almost everyone that complained about the move away from the broken POSIX text model in Python 3, Benno Rice has started experimenting with such a type based on a preliminary test case I wrote at linux.conf.au last week: https://github.com/jeamland/asciicompat/blob/master/tests/ncoghlan.py Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jan 11 09:43:25 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Jan 2014 18:43:25 +1000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <52D0AC59.9070908@stoneleaf.us> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D0AC59.9070908@stoneleaf.us> Message-ID: On 11 January 2014 12:28, Ethan Furman wrote: > On 01/10/2014 06:04 PM, Antoine Pitrou wrote: >> >> On Fri, 10 Jan 2014 20:53:09 -0500 >> "Eric V. Smith" wrote: >>> >>> >>> So, I'm -1 on the PEP. It doesn't address the cases laid out in issue >>> 3892. See for example http://bugs.python.org/issue3982#msg180432 . >> >> >> Then we might as well not do anything, since any attempt to advance >> things is met by stubborn opposition in the name of "not far enough". > > > Heh, and here I thought it was stubborn opposition in the name of purity. > ;) No, it's "the POSIX text model is completely broken and we're not letting people bring it back by stealth because they want to stuff their esoteric use case back into the builtin data types instead of writing their own dedicated type now that the builtin types don't handle it any more". Yes, we know we changed the text model and knocked wire protocols off their favoured perch, and we're (thoroughly) aware of the fact that wire protocol developers don't like the fact that the default model now strongly favours the vastly more common case of application development. 
However, until Benno volunteered to start experimenting with implementing an asciistr type yesterday, there have been *zero* meaningful attempts at trying to solve the issues with wire protocol manipulation outside the Python 3 core - instead there has just been a litany of whining that Python 3 is different from Python 2, and a complete and total refusal to attempt to understand *why* we changed the text model. The answer *should* be obvious: the POSIX based text model in Python 2 makes web frameworks easier to write at the expense of making web applications *harder* to write, and the same is true for every other domain where the wire protocol and file format handling is isolated to widely used frameworks and support libraries, with the application code itself operating mostly on text and structured data. With the Python 3 text model, we decided that was a terrible trade-off, so the core text model now *strongly* favours application code. This means that is now *boundary* code that may need additional helper types, because the core types aren't necessarily going to cover all those use cases any more. In particular, the bytes type is, and always will be, designed for pure binary manipulation, while the str type is designed for text manipulation. The weird kinda-text-kinda-binary 8-bit builtin type is gone, and *deliberately* so. I've been saying for years that people should experiment with creating a Python 3 extension type that behaves more like the Python 2 str type. For the standard library, we've never hit a case where the explicit encoding and decoding was so complicated that creating such a type seemed simpler, so *we're* not going to do it. After discussing it with me at LCA, Benno Rice offered to try out the idea, just to determine whether or not it was actually possible. If there are any CPython bugs that mean the idea *doesn't* currently work (such as interoperability issues in the core types), then I'm certainly happy for us to fix *those*. 
But we're never ever going to change the core text model back to the broken POSIX one, or even take steps in that direction. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From me+python at ixokai.io Sat Jan 11 10:44:41 2014 From: me+python at ixokai.io (Stephen Hansen) Date: Sat, 11 Jan 2014 01:44:41 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140111030403.6f177f30@fsol> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> Message-ID: For not caring much, your own stubbornness is quite notable throughout this discussion. Stones and glass houses. :) That said: Twisted and Mercurial aren't the only ones who are hurt by this, at all. I'm aware of at least two other projects who are actively hindered in their support or migration to Python 3 by the bytes type not having some basic functionality that "strings" had in 2.0. The purity crowd in here has brought up that it was an important and serious decision to split Text from Bytes in Py3, and I actually agree with that. However, it is missing some very real and very concrete use-cases -- there are multiple situations where there are byte streams which have a known text-subset which they really, really do need to operate on. There's been a number of examples given: PDF, HTTP, network streams that switch inline from text-ish to binary and back-again.. But, we can focus that down to a very narrow and not at all uncommon situation in the latter. Look at the HTTP Content-Length header. HTTP headers are fuzzy. My understanding is, per the RFCs, their body can be arbitrary octets to the exclusion of line feeds and DELs-- my understanding may be a bit off here, and please feel free to correct me -- but the relevant specifications are a bit fuzzy to begin with. 
To my understanding of the spec, the header field name is essentially an ASCII text field (sans separator), and the body is... anything, or nearly anything. This is HTTP, which is surely one of the most used protocols in the world. The need to be able to assemble and disassemble such streams is a real, valid use-case. But looking at it, now look to the Content-Length header I mentioned. It seems those who are declaring a purity priority in bytes/string separation think it reasonable to do things like:

    headers.append((b"Content-Length", ("%d" % len(content)).encode("ascii")))

Or something. In the middle of processing a stream, you need to convert this number into a string then encode it into bytes to just represent the number as the extremely common, widely-accessible 7-bit ascii subset of its numerical value. This isn't some rare, grandiose or fiendish undertaking, or trying to merge Strings and Bytes back together: this is the simple practical recognition that representing a number as its ascii-numerical value is actually not at all uncommon. This position seems utterly astonishing in its ridiculousness to me. The recognition that the number "123" may be represented as b"123" surprises me as a controversial thing, considering how often I see it in real life. There is a LOT of code out there which needs a little bit of a middle ground between bytes and strings; it doesn't mean you are giving way and allowing strings and bytes to merge and giving up on the Edict of Separation. But there are real world use-cases where you simply need to be able to do many basic "String" like operations on byte-streams. The removal of the ability to use interpolation to construct such byte strings was a major regression in python 3 and is a big hurdle for more than a few projects to upgrade.
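The explicit-encoding step described above can be sketched end to end. This is only an illustration assuming Python 3; build_headers, serialize and content are hypothetical names, not taken from any project mentioned in this thread.

```python
def build_headers(content):
    """Return a list of (name, value) byte pairs for a message body."""
    headers = []
    # len() returns an int; it has to become ASCII digits before it can
    # be stored next to the other bytes values.
    length = ("%d" % len(content)).encode("ascii")
    headers.append((b"Content-Length", length))
    return headers


def serialize(headers):
    """Join header pairs into wire form, one CRLF-terminated line each."""
    return b"".join(name + b": " + value + b"\r\n" for name, value in headers)


content = b"\x00\x01binary payload"
wire = serialize(build_headers(content)) + b"\r\n" + content
```

Only the digits of the length pass through str here; the header name, the separators and the body stay bytes throughout.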
I mean, it's not like the "bytes" type lacks knowledge of the subset of bytes that happen to be 7-bit ascii-compatible and can't perform text-ish operations on them--

Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:18:40) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> b"stephen hansen".title()
b'Stephen Hansen'

How is this not a practical recognition that yes, while bytes are byte streams and not text, a huge subset of bytes are text-y, and as long as we maintain the barrier between higher characters and implicit conversion therein, we're fine?

I don't see the difference here. There is a very real, practical need to interpolate bytes. This very real, practical need includes the very real recognition that converting 12345 to b'12345' is not something weird, unusual, and subject to the thorny issues of Encodings. It is not violating the doctrine of separation of powers between Text and Bytes.

Personally, I won't be converting my day job's codebase to Python 3 anytime soon (where 'soon' is defined as 'within five years', assuming a best-case scenario in which a number of third-party issues are resolved). But! I'm aware of and involved with other projects, and this has bitten two of them specifically. I'm sure there are others who are not aware of this list or don't feel comfortable talking on it (as it is, I encouraged one of those projects' coders to speak up, but they thought the question was a lost one due to previous responses on the original issue ticket and gave up).

On Fri, Jan 10, 2014 at 6:04 PM, Antoine Pitrou wrote:
> On Fri, 10 Jan 2014 20:53:09 -0500
> "Eric V. Smith" wrote:
> >
> > So, I'm -1 on the PEP. It doesn't address the cases laid out in issue
> > 3892. See for example http://bugs.python.org/issue3982#msg180432 .
>
> Then we might as well not do anything, since any attempt to advance
> things is met by stubborn opposition in the name of "not far enough".
> > (I don't care much personally, I think the issue is quite overblown > anyway) > > Regards > > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/me%2Bpython%40ixokai.io > -------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Sat Jan 11 11:59:26 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sat, 11 Jan 2014 02:59:26 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> Message-ID: <52D1240E.8090702@g.nevcal.com> On 1/11/2014 1:44 AM, Stephen Hansen wrote: > There's been a number of examples given: PDF, HTTP, network streams > that switch inline from text-ish to binary and back-again.. But, we > can focus that down to a very narrow and not at all uncommon situation > in the latter. PDF has been mentioned a few times. ReportLAB recently decided to convert to Python 3, and fairly quickly (from my perspective, it took them a _long_ time to decide to port, but once they decided to, then it seemed quick) produced an alpha version that passes many of their tests. I've not tried it yet, although it interests me, as I have some Python 2 code written only because ReportLAB didn't support Python 3, and I wanted to generate some PDF files. I'll be glad to get rid of the Python 2 code, once they are released. But I guess they figured out a solution that wasn't onerous, I'd have to go re-read the threads to be sure, but it seems they are running one code base for both... 
not sure of the details of what techniques they used, or if they ever used the % operator :) But I'm wondering, since they did what they did so quickly, if the "mixed bytes and str" use case is mostly, in fact, a mind-set issue... yes, likely some code has to change, but maybe the changes really aren't all that significant. I wouldn't want to drag them into this discussion, I'd rather they get the port complete, but it would be interesting to know what they did, and how they did it, and what problems they had, etc. If anyone here knows that code a bit, perhaps the diffs could be examined in their repository to figure out what they did, and how much it impacted their code. I do know they switched XML parsers along the way, as well as dealing with string handling differences. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kristjan at ccpgames.com Sat Jan 11 11:56:46 2014 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Sat, 11 Jan 2014 10:56:46 +0000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D0AC59.9070908@stoneleaf.us> Message-ID: I don't know what the fuss is about. This isn't about breaking the text model. It's about a convenient way to turn text into bytes using a default, lenient, way. Not the other way round. Here's my proposal b'foo%sbar' % (a) would implicitly apply the following function equivalent to every object in the tuple: def coerce_ascii(o): if has_bytes_interface(o): return o return o.encode('ascii', 'strict') There's no need for special %d or %f formatting. If more fanciful formatting is required, e.g. 
exponents or precision, then by all means, do it in the str domain:

    b'foo%sbar' %("%.15f"%(42.2, ))

Basically, let's just support simple bytes interpolation that will support coercing into bytes by means of strict ascii. It's a one way convenience, explicitly requested, and for consenting adults.

-----Original Message-----
From: Python-Dev [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Nick Coghlan
Sent: 11. janúar 2014 08:43
To: Ethan Furman
Cc: python-dev at python.org
Subject: Re: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

No, it's "the POSIX text model is completely broken and we're not letting people bring it back by stealth because they want to stuff their esoteric use case back into the builtin data types instead of writing their own dedicated type now that the builtin types don't handle it any more".

From juraj.sukop at gmail.com Sat Jan 11 13:15:46 2014
From: juraj.sukop at gmail.com (Juraj Sukop)
Date: Sat, 11 Jan 2014 13:15:46 +0100
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To: <20140111041425.GA52753@cskk.homeip.net>
References: <20140111041425.GA52753@cskk.homeip.net>
Message-ID:

On Sat, Jan 11, 2014 at 5:14 AM, Cameron Simpson wrote:
>
> Hi Juraj,
>

Hello Cameron.

> data = b' '.join( bytify( [ 10, 0, obj, binary_image_data, ... ] ) )
>

Thanks for the suggestion! The problem with "bytify" is that some items might require different formatting than other items. For example, in the "Cross-Reference Table" there are three different formats: a non-padded integer ("1") and 10- and 5-digit integers ("0000000003", "65535").

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From juraj.sukop at gmail.com Sat Jan 11 13:56:56 2014 From: juraj.sukop at gmail.com (Juraj Sukop) Date: Sat, 11 Jan 2014 13:56:56 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140111053639.GM3869@ando> References: <20140111053639.GM3869@ando> Message-ID: On Sat, Jan 11, 2014 at 6:36 AM, Steven D'Aprano wrote: > > I'm sorry, I don't understand what you mean here. I'm honestly not > trying to be difficult, but you sound confident that you understand what > you are doing, but your description doesn't make sense to me. To me, it > looks like you are conflating bytes and ASCII characters, that is, > assuming that characters "are" in some sense identical to their ASCII > representation. Let me explain: > > The integer that in English is written as 100 is represented in memory > as bytes 0x0064 (assuming a big-endian C short), so when you say "an > integer is written down AS-IS" (emphasis added), to me that says that > the PDF file includes the bytes 0x0064. But then you go on to write the > three character string "100", which (assuming ASCII) is the bytes > 0x313030. Going from the C short to the ASCII representation 0x313030 is > nothing like inserting the int "as-is". To put it another way, the > Python 2 '%d' format code does not just copy bytes. > Sorry, I should've included an example: when I said "as-is" I meant "1", "0", "0" so that would be yours "0x313030." > If you consider PDF as binary with occasional pieces of ASCII text, then > working with bytes makes sense. But I wonder whether it might be better > to consider PDF as mostly text with some binary bytes. Even though the > bulk of the PDF will be binary, the interesting bits are text. E.g. 
your
> example:
>
> Even though the binary image data is probably much, much larger in
> length than the text shown above, it's (probably) trivial to deal with:
> convert your image data into bytes, decode those bytes into Latin-1,
> then concatenate the Latin-1 string into the text above.
>

This is similar to what Chris Barker suggested. I'm also not trying to be difficult here, but please explain one thing to me. Treating bytes as if they were Latin-1 is a bad idea; that's why "%f" got dropped in the first place, right? How is it then all right to put an image inside a Unicode string?

Also, apart from the in/out conversions, do any other difficulties come to mind?

> Please also take note that in Python 3.3 and better, the internal
> representation of Unicode strings containing only code points up to 255
> (i.e. pure ASCII or pure Latin-1) is very efficient, using only one byte
> per character.
>

I guess you meant [C]Python... In any case, thanks for the detailed reply.

From ncoghlan at gmail.com Sat Jan 11 14:25:20 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 11 Jan 2014 23:25:20 +1000
Subject: [Python-Dev] Important background for PEP 460: Py 2/3 text model differences
Message-ID:

The PEP 460 discussion threads made it clear that some of the participants who weren't around for the earlier parts of the Python 3 transition were struggling with the fundamental conceptual differences between the Python 2 and Python 3 text models.

Since other folks (including Armin Ronacher) have also struggled with that distinction, I added a new question and answer to my Python 3 Q&A:

http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#what-actually-changed-in-the-text-model-between-python-2-and-python-3

Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From g.brandl at gmx.net Sat Jan 11 14:48:27 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 11 Jan 2014 14:48:27 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D0AC59.9070908@stoneleaf.us> Message-ID: Am 11.01.2014 09:43, schrieb Nick Coghlan: > On 11 January 2014 12:28, Ethan Furman wrote: >> On 01/10/2014 06:04 PM, Antoine Pitrou wrote: >>> >>> On Fri, 10 Jan 2014 20:53:09 -0500 >>> "Eric V. Smith" wrote: >>>> >>>> >>>> So, I'm -1 on the PEP. It doesn't address the cases laid out in issue >>>> 3892. See for example http://bugs.python.org/issue3982#msg180432 . >>> >>> >>> Then we might as well not do anything, since any attempt to advance >>> things is met by stubborn opposition in the name of "not far enough". >> >> >> Heh, and here I thought it was stubborn opposition in the name of purity. >> ;) > > No, it's "the POSIX text model is completely broken and we're not > letting people bring it back by stealth because they want to stuff > their esoteric use case back into the builtin data types instead of > writing their own dedicated type now that the builtin types don't > handle it any more". > > Yes, we know we changed the text model and knocked wire protocols off > their favoured perch, and we're (thoroughly) aware of the fact that > wire protocol developers don't like the fact that the default model > now strongly favours the vastly more common case of application > development. 
> > However, until Benno volunteered to start experimenting with > implementing an asciistr type yesterday, there have been *zero* > meaningful attempts at trying to solve the issues with wire protocol > manipulation outside the Python 3 core Can we please also include pseudo-binary file formats? It's not "just" wire protocols. Georg From g.brandl at gmx.net Sat Jan 11 14:49:58 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 11 Jan 2014 14:49:58 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> Message-ID: Am 11.01.2014 10:44, schrieb Stephen Hansen: > I mean, its not like the "bytes" type lacks knowledge of the subset of bytes > that happen to be 7-bit ascii-compatible and can't perform text-ish operations > on them-- > > Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:18:40) [MSC v.1600 32 bit > (Intel)] on win32 > Type "help", "copyright", "credits" or "license" for more information. > >>> b"stephen hansen".title() > b'Stephen Hansen' > > How is this not a practical recognition that yes, while bytes are byte streams > and not text, a huge subset of bytes are text-y, and as long as we maintain the > barrier between higher characters and implicit conversion therein, we're fine? > > I don't see the difference here. There is a very real, practical need to > interpolate bytes. This very real, practical need includes the very real > recognition that converting 12345 to b'12345' is not something weird, unusual, > and subject to the thorny issues of Encodings. It is not violating the doctrine > of separation of powers between Text and Bytes. This. Exactly. Thanks for putting it so nicely, Stephen. 
Georg From g.brandl at gmx.net Sat Jan 11 14:54:23 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 11 Jan 2014 14:54:23 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> Message-ID: Am 11.01.2014 14:49, schrieb Georg Brandl: > Am 11.01.2014 10:44, schrieb Stephen Hansen: > >> I mean, its not like the "bytes" type lacks knowledge of the subset of bytes >> that happen to be 7-bit ascii-compatible and can't perform text-ish operations >> on them-- >> >> Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:18:40) [MSC v.1600 32 bit >> (Intel)] on win32 >> Type "help", "copyright", "credits" or "license" for more information. >> >>> b"stephen hansen".title() >> b'Stephen Hansen' >> >> How is this not a practical recognition that yes, while bytes are byte streams >> and not text, a huge subset of bytes are text-y, and as long as we maintain the >> barrier between higher characters and implicit conversion therein, we're fine? >> >> I don't see the difference here. There is a very real, practical need to >> interpolate bytes. This very real, practical need includes the very real >> recognition that converting 12345 to b'12345' is not something weird, unusual, >> and subject to the thorny issues of Encodings. It is not violating the doctrine >> of separation of powers between Text and Bytes. > > This. Exactly. Thanks for putting it so nicely, Stephen. To elaborate: if the bytes type didn't have all this ASCII-aware functionality already, I think we would have (and be using) a dedicated "asciistr" type right now. But it has the functionality, and it's way too late to remove it. Georg From mal at egenix.com Sat Jan 11 16:15:35 2014 From: mal at egenix.com (M.-A. 
Lemburg) Date: Sat, 11 Jan 2014 16:15:35 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> Message-ID: <52D16017.1020007@egenix.com> On 11.01.2014 14:54, Georg Brandl wrote: > Am 11.01.2014 14:49, schrieb Georg Brandl: >> Am 11.01.2014 10:44, schrieb Stephen Hansen: >> >>> I mean, its not like the "bytes" type lacks knowledge of the subset of bytes >>> that happen to be 7-bit ascii-compatible and can't perform text-ish operations >>> on them-- >>> >>> Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:18:40) [MSC v.1600 32 bit >>> (Intel)] on win32 >>> Type "help", "copyright", "credits" or "license" for more information. >>> >>> b"stephen hansen".title() >>> b'Stephen Hansen' >>> >>> How is this not a practical recognition that yes, while bytes are byte streams >>> and not text, a huge subset of bytes are text-y, and as long as we maintain the >>> barrier between higher characters and implicit conversion therein, we're fine? >>> >>> I don't see the difference here. There is a very real, practical need to >>> interpolate bytes. This very real, practical need includes the very real >>> recognition that converting 12345 to b'12345' is not something weird, unusual, >>> and subject to the thorny issues of Encodings. It is not violating the doctrine >>> of separation of powers between Text and Bytes. >> >> This. Exactly. Thanks for putting it so nicely, Stephen. > > To elaborate: if the bytes type didn't have all this ASCII-aware functionality > already, I think we would have (and be using) a dedicated "asciistr" type right > now. But it has the functionality, and it's way too late to remove it. I think we need to step back a little from the purist view of things and give more emphasis on the "practicality beats purity" Zen. 
I completely agree with Stephen that bytes are in fact often an encoding of text. If that text is ASCII compatible, I don't see any reason why we should not continue to expose the C lib standard string APIs available for text manipulations on bytes.

We don't have to be pedantic about the bytes/text separation. It doesn't help in real life.

If you give programmers the choice they will - most of the time - do the right thing. If you don't give them the tools, they'll work around the missing features in a gazillion different ways, many of which will probably miss a few edge cases.

bytes already have most of the 8-bit string methods from Python 2, so it doesn't hurt adding some more of the missing features from Python 2 on top to make life easier for people dealing with multiple/unknown encoding data.

BTW: I don't know why so many people keep asking for use cases. Isn't it obvious that text data without a known (but ASCII compatible) encoding, or with multiple different encodings in a single data chunk, is part of life? Most HTTP packets fall into this category, many email messages as well.

And let's not forget that we don't live in a perfect world. Broken encodings are everywhere around you - just have a look at your spam folder for a decent chunk of example data :-)

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jan 11 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Sat Jan 11 16:28:51 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 11 Jan 2014 16:28:51 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> Message-ID: <20140111162851.1b08db58@fsol> On Sat, 11 Jan 2014 08:26:57 +0100 Georg Brandl wrote: > Am 11.01.2014 03:04, schrieb Antoine Pitrou: > > On Fri, 10 Jan 2014 20:53:09 -0500 > > "Eric V. Smith" wrote: > >> > >> So, I'm -1 on the PEP. It doesn't address the cases laid out in issue > >> 3892. See for example http://bugs.python.org/issue3982#msg180432 . > > I agree. > > > Then we might as well not do anything, since any attempt to advance > > things is met by stubborn opposition in the name of "not far enough". > > > > (I don't care much personally, I think the issue is quite overblown > > anyway) > > So you wouldn't mind another overhaul of the PEP including a bit more > functionality again? :) > I really think that practicality beats purity > here. (I'm not advocating free mixing bytes and str, mind you!) The PEP already proposes a certain amount of practicality. I personally *would* mind adding %d and friends to it. But of course someone can fork the PEP or write another one. Regards Antoine. 
From ncoghlan at gmail.com Sat Jan 11 16:34:26 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 12 Jan 2014 01:34:26 +1000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <52D16017.1020007@egenix.com> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> Message-ID: On 12 January 2014 01:15, M.-A. Lemburg wrote: > On 11.01.2014 14:54, Georg Brandl wrote: >> Am 11.01.2014 14:49, schrieb Georg Brandl: >>> Am 11.01.2014 10:44, schrieb Stephen Hansen: >>> >>>> I mean, its not like the "bytes" type lacks knowledge of the subset of bytes >>>> that happen to be 7-bit ascii-compatible and can't perform text-ish operations >>>> on them-- >>>> >>>> Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:18:40) [MSC v.1600 32 bit >>>> (Intel)] on win32 >>>> Type "help", "copyright", "credits" or "license" for more information. >>>> >>> b"stephen hansen".title() >>>> b'Stephen Hansen' >>>> >>>> How is this not a practical recognition that yes, while bytes are byte streams >>>> and not text, a huge subset of bytes are text-y, and as long as we maintain the >>>> barrier between higher characters and implicit conversion therein, we're fine? >>>> >>>> I don't see the difference here. There is a very real, practical need to >>>> interpolate bytes. This very real, practical need includes the very real >>>> recognition that converting 12345 to b'12345' is not something weird, unusual, >>>> and subject to the thorny issues of Encodings. It is not violating the doctrine >>>> of separation of powers between Text and Bytes. >>> >>> This. Exactly. Thanks for putting it so nicely, Stephen. >> >> To elaborate: if the bytes type didn't have all this ASCII-aware functionality >> already, I think we would have (and be using) a dedicated "asciistr" type right >> now. 
But it has the functionality, and it's way too late to remove it. > > I think we need to step back a little from the purist view > of things and give more emphasis on the "practicality beats > purity" Zen. > > I complete agree with Stephen, that bytes are in fact often > an encoding of text. If that text is ASCII compatible, I don't > see any reason why we should not continue to expose the C lib > standard string APIs available for text manipulations on bytes. > > We don't have to be pedantic about the bytes/text separation. > It doesn't help in real life. Yes, it bloody well does. The number of people who have told me that using Python 3 is what allowed them to finally understand how Unicode works vastly exceeds the number of wire protocol and file format devs that have complained about working with binary formats being significantly less tolerant of the "it's really like ASCII text" mindset. We are NOT going back to the confusing incoherent mess that is the Python 2 model of bolting Unicode onto the side of POSIX: http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#what-actually-changed-in-the-text-model-between-python-2-and-python-3 While that was an *expedient* (and, in fact, necessary) solution at the time, the fact it is still thoroughly confusing people 13 years later shows it is not a *comprehensible* solution. > If you give programmers the choice they will - most of the time - > do the right thing. If you don't give them the tools, they'll > work around the missing features in a gazillion different > ways of which many will probably miss a few edge cases. > > bytes already have most of the 8-bit string methods from Python 2, > so it doesn't hurt adding some more of the missing features > from Python 2 on top to make life easier for people dealing > with multiple/unknown encoding data. 
Because people that aren't happy with the current bytes type persistently refuse to experiment with writing their own extension type to figure out what the API should look like. Jamming speculative API design into the core text model without experimenting in a third party extension first is a straight up stupid idea. Anyone that is pushing for this should be checking out Benno's first draft experimental prototype for asciistr and be working on getting it passing the test suite I created: https://github.com/jeamland/asciicompat The "Wah, you broke it and now I have completely forgotten how to create custom types, so I'm just going to piss and moan until somebody else fixes it" infantilism of the past five years in this regard has frankly pissed me off. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Sat Jan 11 16:38:39 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 12 Jan 2014 02:38:39 +1100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140111053639.GM3869@ando> Message-ID: <20140111153837.GP3869@ando> On Sat, Jan 11, 2014 at 01:56:56PM +0100, Juraj Sukop wrote: > On Sat, Jan 11, 2014 at 6:36 AM, Steven D'Aprano wrote: > > If you consider PDF as binary with occasional pieces of ASCII text, then > > working with bytes makes sense. But I wonder whether it might be better > > to consider PDF as mostly text with some binary bytes. Even though the > > bulk of the PDF will be binary, the interesting bits are text. E.g. your > > example: 10 0 obj << /Type /XObject /Width 100 /Height 100 /Alternates 15 0 R /Length 2167 >> stream ...binary image data... 
endstream endobj

> > Even though the binary image data is probably much, much larger in
> > length than the text shown above, it's (probably) trivial to deal with:
> > convert your image data into bytes, decode those bytes into Latin-1,
> > then concatenate the Latin-1 string into the text above.
>
> This is similar to what Chris Barker suggested. I also don't try to be
> difficult here but please explain to me one thing. To treat bytes as if
> they were Latin-1 is bad idea,

Correct. Bytes are not Latin-1. Here are some bytes which represent a word I extracted from a text file on my computer:

    b'\x8a\x75\xa7\x65\x72\x73\x74'

If you imagine that they are Latin-1, you might think that the word is a C1 control character ("VTS", or Vertical Tabulation Set) followed by "u§erst", but it is not. It is actually the German word "äußerst" ("extremely"), and the text file was generated on a 1990s vintage Macintosh using the MacRoman "extended ASCII" code page.

> that's why "%f" got dropped in the first
> place, right? How is it then alright to put an image inside an Unicode
> string?

The point that I am making is that many people want to add formatting operations to bytes so they can put ASCII strings inside bytes. But (as far as I can tell) they don't need to do this, because they can treat Unicode strings containing code points U+0000 through U+00FF (i.e. the same range as handled by Latin-1) as if they were bytes. This gives you:

- convenient syntax, no need to prefix strings with b;

- mostly avoid needing to decode and encode strings, except at a few points in your code;

- the full set of string methods;

- can easily include arbitrary octal or hex byte values, using \ooo and \xhh escapes;

- error checking: when you finally encode the text to bytes before writing to a file, or sending over a wire, any code-point greater than U+00FF will give you an exception unless explicitly silenced.

No need to wait for Python 3.5 to come out, you can do this *right now*.
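The properties claimed in the list above are easy to check. The following is a small sketch, not part of the original message, that uses only the standard codecs; the variable names (`all_bytes`, `leaked`, `raw`, and so on) are hypothetical and chosen for illustration.

```python
# Sketch only: variable names here are illustrative assumptions.

# Every byte value 0..255 survives a Latin-1 decode/encode round trip.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
round_trip_ok = as_text.encode("latin-1") == all_bytes

# A code point above U+00FF cannot be encoded to Latin-1, so stray
# "real" text is caught at the point where the data is encoded.
try:
    "Tag:\u20ac".encode("latin-1")
    leaked = True
except UnicodeEncodeError:
    leaked = False

# The MacRoman bytes quoted in the message decode to different text
# depending on which code page is assumed.
raw = b"\x8a\x75\xa7\x65\x72\x73\x74"
as_mac = raw.decode("mac_roman")
as_latin1 = raw.decode("latin-1")
```

Note that the Latin-1 decode of `raw` succeeds without complaint even though it is the wrong interpretation, which is the sense in which the technique uses Latin-1 purely as a lossless byte carrier rather than as a statement about what the bytes mean.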
Of course, this is a little bit "unclean", it breaks the separation of text and bytes by treating bytes *as if* they were Unicode code points, which they are not, but I believe that this is a practical technique which is not too hard to deal with. For instance, suppose I have a mixed format which consists of an ASCII tag, a number written in ASCII, a NULL separator, and some binary data:

    # Using bytes
    import struct
    values = [29460, 29145, 31098, 27123]
    blob = b"".join(struct.pack(">h", n) for n in values)
    data = b"Tag:" + str(len(values)).encode('ascii') + b"\0" + blob

    => gives data = b'Tag:4\x00s\x14q\xd9yzi\xf3'

That's a bit ugly, but not too ugly. I could write code like that. But if bytes had % formatting, I might write this instead:

    data = b"Tag:%d\0%s" % (len(values), blob)

This is a small improvement, but I can't use it until Python 3.5 comes out. Or I could do this right now:

    # Using text
    values = [29460, 29145, 31098, 27123]
    blob = b"".join(struct.pack(">h", n) for n in values)
    data = "Tag:%d\0%s" % (len(values), blob.decode('latin-1'))

    => gives data = 'Tag:4\x00s\x14qÙyzió'

When I'm ready to transmit this over the wire, or write to disk, then I encode, and get:

    data.encode('latin-1')
    => b'Tag:4\x00s\x14q\xd9yzi\xf3'

which is exactly the same as I got in the first place. In this case, I'm not using Latin-1 for the semantics of bytes to characters (e.g. byte \xf3 = char ó), but for the useful property that all 256 distinct bytes are valid in Latin-1. Any other encoding with the same property will do.

It is a little unfortunate that struct gives bytes rather than a str, but you can hide that with a simple helper function:

    def b2s(bytes):
        return bytes.decode('latin1')

    data = "Tag:%d\0%s" % (len(values), b2s(blob))

> Also, apart from the in/out conversions, do any other difficulties come to
> your mind?

No. If you accidentally introduce a non-Latin1 code point, when you encode you'll get an exception.
-- Steven From solipsis at pitrou.net Sat Jan 11 16:50:27 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 11 Jan 2014 16:50:27 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> Message-ID: <20140111165027.2465deb2@fsol> On Sun, 12 Jan 2014 01:34:26 +1000 Nick Coghlan wrote: > > Yes, it bloody well does. The number of people who have told me that > using Python 3 is what allowed them to finally understand how Unicode > works vastly exceeds the number of wire protocol and file format devs > that have complained about working with binary formats being > significantly less tolerant of the "it's really like ASCII text" > mindset. +1 to what Nick says. Forcing some constructs to be explicit leads people to know about the issue and understand it, rather than sweep it under the carpet as Python 2 encouraged them to do. Yes, if you're dealing with a file format or network protocol, you'd better know in which charset its textual information is being expressed. It's a very sane question to ask yourself! Regards Antoine. From ethan at stoneleaf.us Sat Jan 11 17:20:27 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 11 Jan 2014 08:20:27 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140111153837.GP3869@ando> References: <20140111053639.GM3869@ando> <20140111153837.GP3869@ando> Message-ID: <52D16F4B.6080900@stoneleaf.us> On 01/11/2014 07:38 AM, Steven D'Aprano wrote: > > The point that I am making is that many people want to add formatting > operations to bytes so they can put ASCII strings inside bytes. But (as > far as I can tell) they don't need to do this, because they can treat > Unicode strings containing code points U+0000 through U+00FF (i.e. 
the > same range as handled by Latin-1) as if they were bytes. So instead of blurring the line between bytes and text, you're blurring the line between text and bytes (with a few extra seat belts thrown in). Besides being a bit awkward, this also means that any encoded text (even the plain ASCII stuff) is now being transformed three times instead of one: unicode to bytes bytes to unicode using latin1 unicode to bytes Even if the cost of moving those bytes around is cheap, it's not free. When you're creating hundreds of PDFs at a time that's going to make a difference. -- ~Ethan~ From mal at egenix.com Sat Jan 11 17:33:17 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 11 Jan 2014 17:33:17 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> Message-ID: <52D1724D.4080307@egenix.com> On 11.01.2014 16:34, Nick Coghlan wrote: > On 12 January 2014 01:15, M.-A. Lemburg wrote: >> On 11.01.2014 14:54, Georg Brandl wrote: >>> Am 11.01.2014 14:49, schrieb Georg Brandl: >>>> Am 11.01.2014 10:44, schrieb Stephen Hansen: >>>> >>>>> I mean, its not like the "bytes" type lacks knowledge of the subset of bytes >>>>> that happen to be 7-bit ascii-compatible and can't perform text-ish operations >>>>> on them-- >>>>> >>>>> Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:18:40) [MSC v.1600 32 bit >>>>> (Intel)] on win32 >>>>> Type "help", "copyright", "credits" or "license" for more information. >>>>> >>> b"stephen hansen".title() >>>>> b'Stephen Hansen' >>>>> >>>>> How is this not a practical recognition that yes, while bytes are byte streams >>>>> and not text, a huge subset of bytes are text-y, and as long as we maintain the >>>>> barrier between higher characters and implicit conversion therein, we're fine? 
>>>>> >>>>> I don't see the difference here. There is a very real, practical need to >>>>> interpolate bytes. This very real, practical need includes the very real >>>>> recognition that converting 12345 to b'12345' is not something weird, unusual, >>>>> and subject to the thorny issues of Encodings. It is not violating the doctrine >>>>> of separation of powers between Text and Bytes. >>>> >>>> This. Exactly. Thanks for putting it so nicely, Stephen. >>> >>> To elaborate: if the bytes type didn't have all this ASCII-aware functionality >>> already, I think we would have (and be using) a dedicated "asciistr" type right >>> now. But it has the functionality, and it's way too late to remove it. >> >> I think we need to step back a little from the purist view >> of things and give more emphasis on the "practicality beats >> purity" Zen. >> >> I complete agree with Stephen, that bytes are in fact often >> an encoding of text. If that text is ASCII compatible, I don't >> see any reason why we should not continue to expose the C lib >> standard string APIs available for text manipulations on bytes. >> >> We don't have to be pedantic about the bytes/text separation. >> It doesn't help in real life. > > Yes, it bloody well does. The number of people who have told me that > using Python 3 is what allowed them to finally understand how Unicode > works vastly exceeds the number of wire protocol and file format devs > that have complained about working with binary formats being > significantly less tolerant of the "it's really like ASCII text" > mindset. 
>
> We are NOT going back to the confusing incoherent mess that is the
> Python 2 model of bolting Unicode onto the side of POSIX:
> http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#what-actually-changed-in-the-text-model-between-python-2-and-python-3
>
> While that was an *expedient* (and, in fact, necessary) solution at
> the time, the fact it is still thoroughly confusing people 13 years
> later shows it is not a *comprehensible* solution.

FWIW: I quite liked the Python 2 model, but perhaps that's because
I already knew how Unicode works, so could use it to make my life
easier ;-)

Seriously, Unicode has always caused heated discussions and
I don't expect this to change in the next 5-10 years.

The point is: there is no 100% perfect solution either way and
when you acknowledge this, things don't look black and white
anymore, but instead full of colors :-)

Python 3 forces people to actually use Unicode; in Python 2 they
could easily avoid it. It's good to educate people on how it's
used and the issues you can run into, but let's not forget
that people are trying to get work done and we all love readable
code.

PEP 460 just adds two more methods to the bytes object which come
in handy when formatting binary data; I don't think it has potential
to muddy the Python 3 text model, given that the bytes object
already exposes a dozen other ASCII text methods :-)

>> If you give programmers the choice they will - most of the time -
>> do the right thing. If you don't give them the tools, they'll
>> work around the missing features in a gazillion different
>> ways of which many will probably miss a few edge cases.
>>
>> bytes already have most of the 8-bit string methods from Python 2,
>> so it doesn't hurt adding some more of the missing features
>> from Python 2 on top to make life easier for people dealing
>> with multiple/unknown encoding data.
>
> Because people that aren't happy with the current bytes type
> persistently refuse to experiment with writing their own extension
> type to figure out what the API should look like. Jamming speculative
> API design into the core text model without experimenting in a third
> party extension first is a straight up stupid idea.
>
> Anyone that is pushing for this should be checking out Benno's first
> draft experimental prototype for asciistr and be working on getting it
> passing the test suite I created:
> https://github.com/jeamland/asciicompat
>
> The "Wah, you broke it and now I have completely forgotten how to
> create custom types, so I'm just going to piss and moan until somebody
> else fixes it" infantilism of the past five years in this regard has
> frankly pissed me off.

Ah, you see: we're entering heated discussions again :-)

asciistr is interesting in that it coerces to bytes instead
of to Unicode (as is the case in Python 2).

At the moment it doesn't cover the more common case bytes + str,
just str + bytes, but let's assume it would, then you'd write

    ...
    headers += asciistr('Length: %i bytes\n' % 123)
    headers += b'\n\n'
    body = b'...'
    socket.send(headers + body)
    ...

With PEP 460, you could write the above as:

    ...
    headers += b'Length: %i bytes\n' % 123
    headers += b'\n\n'
    body = b'...'
    socket.send(headers + body)
    ...

IMO, that's more readable.

Both variants essentially do the same thing: they implicitly
coerce ASCII text strings to bytes, so conceptually, there's
little difference.

-- 
Marc-Andre Lemburg
eGenix.com

From matej at ceplovi.cz Sat Jan 11 13:37:32 2014
From: matej at ceplovi.cz (Matěj Cepl)
Date: Sat, 11 Jan 2014 13:37:32 +0100
Subject: [Python-Dev] Python3 "complexity"
In-Reply-To: <CAAxjCExfMaJd0PqgfqZu_XverN0rinZhh37tuCU4ek9TtXxDFQ@mail.gmail.com>
References: <4tqwn4rppvcw.1et0a-v8qii00x@api.elasticemail.com> <20140108213000.2A9152501A1@webabinitio.net> <7wbnzmf26k.fsf@benfinney.id.au> <7wzjn6dln4.fsf@benfinney.id.au>
Message-ID: <20140111123732.63F2042231@wycliff.ceplovi.cz>

On 2014-01-10, 17:34 GMT, you wrote:
> From my experience, the concept of a default locale is deeply
> flawed. What if I log into a (Linux) machine using an old
> latin-1 putty from the Windows XP era, have most file names
> and contents in UTF-8 encoding, except for one directory where
> people from eastern Europe upload files via FTP in whatever
> encoding they choose. What should the "default" encoding be
> now?

I know this stuff is really hard, and only because I had to
fight with it for years (being Czech, so not blessed by
Latin-1 covering my language - actually no living encoding does
support it completely, but that's mostly a theoretical issue -
Latin-2 used to work for us, and now everybody with a civilized OS
uses UTF-8 of course, not sure what's the current state of MS
Windows). It seems to me that you have some fundamental
principles muddled together.

a) Locale should always be set for the particular system. I.e.,
   in your example above you have two variables only: the locale
   of your Windows XP and the locale of the Linux box.
b) I know for a fact that putty in particular (even on Windows XP)
   CAN translate from UTF-8 on the server to whatever Windows has
   to offer.
   So, there is no such thing as "latin-1 putty".
c) Responsibility for filenames on the system stands on whatever
   actually saves the file. So, in this testcase it is a matter
   of correct setting up of the FTP server (I see for example
   http://rhn.redhat.com/errata/RHBA-2012-0187.html and
   https://bugzilla.redhat.com/show_bug.cgi?id=638873 which
   seem to indicate that vsftpd, and what else you would use?,
   should support UTF-8 on filenames). If the server locale
   supports Eastern European filenames and vsftpd supports
   translation to this encoding (hint, hint: UTF-8 does), then
   you are all set.

> That's why I make it a principle to always unset all LC_* and
> LANG variables, except when working locally, which happens
> rather rarely.

That's a bad idea. Those variables have ALWAYS some value set
(perhaps default, which tends to be something like en_US.ASCII,
which is not what you want; fortunately on most Unices these
days it would be en_US.UTF8, command locale(1) always gives some
result).

Matěj

From matej at ceplovi.cz Sat Jan 11 14:00:15 2014
From: matej at ceplovi.cz (Matěj Cepl)
Date: Sat, 11 Jan 2014 14:00:15 +0100
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To:
References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D0AC59.9070908@stoneleaf.us>
Message-ID: <20140111130015.AE4694160F@wycliff.ceplovi.cz>

On 2014-01-11, 10:56 GMT, you wrote:
> I don't know what the fuss is about.

I just cannot resist:

    When you are calm while everybody else is in the state of
    panic, you haven't understood the problem.
        -- one of many collections of Murphy's Laws

Matěj

From ethan at stoneleaf.us Sat Jan 11 18:01:43 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sat, 11 Jan 2014 09:01:43 -0800
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To:
References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D0AC59.9070908@stoneleaf.us>
Message-ID: <52D178F7.7070401@stoneleaf.us>

On 01/11/2014 12:43 AM, Nick Coghlan wrote:
>
> In particular, the bytes type is, and always will be, designed for
> pure binary manipulation [...]

I apologize for being blunt, but this is a lie.

Let's take a look at the methods defined by bytes:

>>> dir(b'')
['__add__', '__class__', '__contains__', '__delattr__', '__dir__',
 '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
 '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__',
 '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__',
 '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__',
 '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
 'capitalize', 'center', 'count', 'decode', 'endswith', 'expandtabs',
 'find', 'fromhex', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower',
 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip',
 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust',
 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith',
 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

Are you really going to insist that expandtabs, isalnum, isalpha,
isdigit, islower, isspace, istitle, isupper, ljust, lower, lstrip,
rjust, splitlines, swapcase, title, upper, and zfill are pure binary
manipulation methods?
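
To make the point about these methods concrete, here is a small sketch, assuming Python 3, of a few of them behaving as ASCII text operations while leaving non-ASCII bytes alone. The sample values and variable names are illustrative only.

```python
# A few bytes methods whose behaviour is defined in terms of ASCII text.
titled = b"stephen hansen".title()
shouted = b"tag".upper()
padded = b"7".zfill(3)
looks_numeric = b"42".isdigit()

# Bytes outside the ASCII range are never treated as letters, so the
# Latin-1 byte for "e acute" (0xE9) is left untouched by upper().
e_acute = b"\xe9"
unchanged = e_acute.upper()
counts_as_alpha = e_acute.isalpha()

print(titled, shouted, padded, looks_numeric, unchanged, counts_as_alpha)
```

The methods operate on the ASCII subset only, which is why they are safe on arbitrary bytes but only meaningful when those bytes carry ASCII-compatible text.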

Let's take a look at the repr of bytes:

>>> bytes([48, 49, 50, 51])
b'0123'

Wow, that sure doesn't look like binary data!

Py3 did not go from three text models to two, it went to one good one
(unicode strings) and one broken one (bytes). If the aim was indeed for
pure binary manipulation, we failed. We left in bunches of methods which
can *only* be interpreted as supporting ASCII manipulation.

Due to backwards compatibility we cannot now finish yanking those out,
so either we live with a half-dead class screaming "I want to be ASCII!
I want to be ASCII!" or add back the missing functionality.

--
~Ethan~

From victor.stinner at gmail.com Sat Jan 11 18:41:49 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sat, 11 Jan 2014 18:41:49 +0100
Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake
Message-ID:

Hi,

I'm in favor of adding support for formatting integer and floating
point numbers in the PEP 460: %d, %u, %o, %x, %f with padding and
precision (%10d, %010d, %1.5f) and sign (%-i, %+i) but without
alternate format ("{:#x}"). %s would also accept int and float for
convenience.

int and float subclasses would not be handled differently, their
__str__ and __format__ would be ignored.

Other int-like and float-like types (ex: defining __int__ or
__index__) are not supported. Explicit cast would be required.

For %s, the choice between string and number is made using
"(PyLong_Check() || PyFloat_Check())".

If you agree, I will modify the PEP. If Antoine disagrees, I will fork
the PEP 460 ;-)

---

%s should not support precision (ex: %.100s), use Unicode for that.

---

The PEP 460 should not reintroduce bytes+unicode, implicit decoding or
implement encoding.

b'x=%s' % 10 is well defined, it's pure bytes. If you consider that
bytes should not contain text, why does the bytes type have methods
like isalpha() or upper()? And why do binary files have a readline()
method? A "line" doesn't mean anything in pure bytes.

It's an example of "practicality beats purity".
Python 3 should not enforce Unicode if the developers *chose* to use bytes to handle mixed binary/text protocols like HTTP. But I'm against of adding "%r" and "%a" because they use Unicode and would require an implicit encoding. type(ascii(obj)) is str, not bytes. If you really want to use repr() and ascii(), encode the result explicitly. Victor From ethan at stoneleaf.us Sat Jan 11 19:09:33 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 11 Jan 2014 10:09:33 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> Message-ID: <52D188DD.1070004@stoneleaf.us> On 01/11/2014 07:34 AM, Nick Coghlan wrote: > On 12 January 2014 01:15, M.-A. Lemburg wrote: >> >> We don't have to be pedantic about the bytes/text separation. >> It doesn't help in real life. > > Yes, it bloody well does. The number of people who have told me that > using Python 3 is what allowed them to finally understand how Unicode > works . . . We are not proposing a change to the unicode string type in any way. > We are NOT going back to the confusing incoherent mess that is the > Python 2 model of bolting Unicode onto the side of POSIX . . . We are not asking for that. >> bytes already have most of the 8-bit string methods from Python 2, >> so it doesn't hurt adding some more of the missing features >> from Python 2 on top to make life easier for people dealing >> with multiple/unknown encoding data. > > Because people that aren't happy with the current bytes type > persistently refuse to experiment with writing their own extension > type to figure out what the API should look like. Jamming speculative > API design into the core text model without experimenting in a third > party extension first is a straight up stupid idea. 
True, if this were a new API; but it isn't, it's the Py2 str API that was stripped out. The one big difference being that if the results of %s (or %d or any other %) is not in the 0-127 range it errors out. -- ~Ethan~ From g.brandl at gmx.net Sat Jan 11 19:29:27 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 11 Jan 2014 19:29:27 +0100 Subject: [Python-Dev] PEP 460: allowing %d and %f and NOT ALLOWING mojibake :) In-Reply-To: References: Message-ID: Am 11.01.2014 18:41, schrieb Victor Stinner: > Hi, > > I'm in favor of adding support of formatting integer and floatting > point numbers in the PEP 460: %d, %u, %o, %x, %f with padding and > precision (%10d, %010d, %1.5f) and sign (%-i, %+i) but without > alternate format ("{:#x}"). %s would also accept int and float for > convenience. > > int and float subclasses would not be handled differently, their > __str__ and __format__ would be ignored. > > Other int-like and float-like types (ex: defining __int__ or > __index__) are not supported. Explicit cast would be required. > > For %s, the choice between string and number is made using > "(PyLong_Check() || PyFloat_Check())". > > If you agree, I will modify the PEP. If Antoine disagree, I will fork > the PEP 460 ;-) > > --- > > %s should not support precision (ex: %.100s), use Unicode for that. > > --- > > The PEP 460 should not reintroduce bytes+unicode, implicit decoding or > implement encoding. > > b'x=%s' % 10 is well defined, it's pure bytes. If you consider that > bytes should not contain text, why does the bytes type have methods > like isalpha() or upper()? And why binary files have a readline() > method? A "line" doesn't mean anything in pure bytes. > > It's an example of "practicality beats purity". Python 3 should not > enforce Unicode if the developers *chose* to use bytes to handle mixed > binary/text protocols like HTTP. > > But I'm against of adding "%r" and "%a" because they use Unicode and > would require an implicit encoding. 
type(ascii(obj)) is str, not > bytes. If you really want to use repr() and ascii(), encode the result > explicitly. I agree. For non-ASCII characters what ascii() gives you is almost always not what you want anyway. Georg From solipsis at pitrou.net Sat Jan 11 19:32:26 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 11 Jan 2014 19:32:26 +0100 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake References: Message-ID: <20140111193226.23cc771d@fsol> On Sat, 11 Jan 2014 18:41:49 +0100 Victor Stinner wrote: > > If you agree, I will modify the PEP. If Antoine disagree, I will fork > the PEP 460 ;-) Please fork it. > b'x=%s' % 10 is well defined, it's pure bytes. It is well-defined? Then please explain me what the general case of b'%s' % x is supposed to call: - does it call x.__bytes__? int.__bytes__ doesn't exist - does it call bytes(x)? bytes(10) gives b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' - does it call x.__str__? you've reintroduced the Python 2 behaviour of conflating bytes and unicode Regards Antoine. From steve at pearwood.info Sat Jan 11 19:36:32 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 12 Jan 2014 05:36:32 +1100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <52D16F4B.6080900@stoneleaf.us> References: <20140111053639.GM3869@ando> <20140111153837.GP3869@ando> <52D16F4B.6080900@stoneleaf.us> Message-ID: <20140111183631.GQ3869@ando> On Sat, Jan 11, 2014 at 08:20:27AM -0800, Ethan Furman wrote: > On 01/11/2014 07:38 AM, Steven D'Aprano wrote: > > > >The point that I am making is that many people want to add formatting > >operations to bytes so they can put ASCII strings inside bytes. But (as > >far as I can tell) they don't need to do this, because they can treat > >Unicode strings containing code points U+0000 through U+00FF (i.e. the > >same range as handled by Latin-1) as if they were bytes. 
> > So instead of blurring the line between bytes and text, you're blurring the > line between text and bytes (with a few extra seat belts thrown in). I'm not blurring anything. The people who designed the file format that mixes textual data and binary data did the blurring. Given that such formats exist, it is inevitable that we need to put text into bytes, or bytes into text. The situation is already blurred, we just have to decide how to handle it. There are three broad strategies: 1) Make bytes more string-like, so that we can process our data as bytes, but still do string operations on the bits that are ASCII. 2) Make strings more byte-like, so that we can process our data as strings, but do byte operations (like bit mask operations) on the parts that are binary data. 3) Don't do either. Keep the text parts of your data as text, and the binary parts of your data as bytes. Do your text operations on text, and your byte operations on bytes. At some point, of course, they need to be combined. We have a choice: * Right now, we can use text as the base, and combine bytes into the text using Latin-1, and it Just Works. * Or we can wait until (maybe) Python 3.5, when (perhaps) bytes objects will be more text-like, and then use bytes as the base, and (with luck) it Should Just Work. There's another disadvantage with the second: treating bytes as if they were ASCII by default reinforces the same old harmful paradigm that text is ASCII that we're trying to get away from. That's a bad, painful idea that causes a lot of problems and buggy code, and should be resisted. On the other hand, embedding arbitrary binary data in Unicode text doesn't reinforce any common or harmful paradigms. It just requires the programmer to forget about characters and concentrate on code points, since Latin-1 maps bytes to code points in a very convenient way: Byte 0x00 maps to code point U+0000 Byte 0x01 maps to code point U+0001 Byte 0x02 maps to code point U+0002 ... 
    Byte 0xFF maps to code point U+00FF

So to embed the binary data 0xDEADBEEF in your string, you can just use
'\xDE\xAD\xBE\xEF' regardless of what characters those code points
happen to be.

If we are manipulating data *as if it were text*, then we ought to treat
it as text, not add methods to bytes that make bytes text-like.

If we are manipulating data *as if it were bytes*, doing
byte-manipulation operations like bit-masking, then we ought to treat it
as numeric bytes, not add numeric methods to text.

Is that really a controversial opinion?

> Besides being a bit awkward, this also means that any encoded text (even
> the plain ASCII stuff) is now being transformed three times instead of one:
>
> unicode to bytes
> bytes to unicode using latin1
> unicode to bytes

Where do you get this from? I don't follow your logic. Start with a text
template:

    template = """\xDE\xAD\xBE\xEF
    Name:\0\0\0%s
    Age:\0\0\0\0%d
    Data:\0\0\0%s
    blah blah blah
    """

    data = template % ("George", 42, blob.decode('latin-1'))

Only the binary blobs need to be decoded. We don't need to encode the
template to bytes, and the textual data doesn't get encoded until we're
ready to send it across the wire or write it to disk. And when we do,
since all the code points are in the range U+0000 to U+00FF, encoding it
to Latin-1 ought to be a fast, efficient operation, possibly even just a
mem copy.

It's true that the individual binary data fields will need to be decoded
from bytes, but unless you want Python to guess an encoding (which is
the old broken Python 2 model), you're going to have to do that
regardless.

> Even if the cost of moving those bytes around is cheap, it's not free.
> When you're creating hundreds of PDFs at a time that's going to make a
> difference.

You've profiled it? Unless you've measured it, it doesn't exist. I'm not
going to debate performance penalties of code you haven't written yet.
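
One way such a measurement could be approached is sketched below with the standard `timeit` module. This is an illustration under assumptions, not a benchmark from the thread: the 1 MiB payload, the function names `latin1_round_trip` and `plain_copy`, and the run count of 20 are all arbitrary choices, and the timings will vary by machine and Python version.

```python
import timeit

# 1 MiB of arbitrary binary data covering every byte value (illustrative).
blob = bytes(range(256)) * 4096

def latin1_round_trip(data=blob):
    """Decode bytes to str via Latin-1, then encode straight back."""
    return data.decode("latin-1").encode("latin-1")

def plain_copy(data=blob):
    """Baseline: copy the same bytes with no codec involved."""
    return bytes(bytearray(data))

# The round trip must be lossless before its cost is worth measuring.
assert latin1_round_trip() == blob

round_trip_seconds = timeit.timeit(latin1_round_trip, number=20)
copy_seconds = timeit.timeit(plain_copy, number=20)
print("latin-1 round trip: %.4f s for 20 runs" % round_trip_seconds)
print("plain copy:         %.4f s for 20 runs" % copy_seconds)
```

Comparing the two figures on real workloads, rather than on this synthetic payload, is what would settle whether the extra decode and encode steps matter in practice.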
-- Steven From ethan at stoneleaf.us Sat Jan 11 19:38:01 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 11 Jan 2014 10:38:01 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <20140111193226.23cc771d@fsol> References: <20140111193226.23cc771d@fsol> Message-ID: <52D18F89.9030006@stoneleaf.us> On 01/11/2014 10:32 AM, Antoine Pitrou wrote: > On Sat, 11 Jan 2014 18:41:49 +0100 > Victor Stinner wrote: >> >> If you agree, I will modify the PEP. If Antoine disagree, I will fork >> the PEP 460 ;-) > > Please fork it. You've already stated you don't care that much and are willing to let the PEP as-is be rejected. Why not remove your name and let Victor have it back? Is he not the original author? (If this is protocol just say so -- remember I'm still new to the ways of PyDev. :). -- ~Ethan~ From rdmurray at bitdance.com Sat Jan 11 19:38:27 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Sat, 11 Jan 2014 13:38:27 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <52D1724D.4080307@egenix.com> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> <52D1724D.4080307@egenix.com> Message-ID: <20140111183828.65323250113@webabinitio.net> tl;dr: At the end I'm volunteering to look at real code that is having porting problems. On Sat, 11 Jan 2014 17:33:17 +0100, "M.-A. Lemburg" wrote: > asciistr is interesting in that it coerces to bytes instead > of to Unicode (as is the case in Python 2). > > At the moment it doesn't cover the more common case bytes + str, > just str + bytes, but let's assume it would, then you'd write > > ... > headers += asciistr('Length: %i bytes\n' % 123) > headers += b'\n\n' > body = b'...' > socket.send(headers + body) > ... > > With PEP 460, you could write the above as: > > ... 
> headers += b'Length: %i bytes\n' % 123 > headers += b'\n\n' > body = b'...' > socket.send(headers + body) > ... > > IMO, that's more readable. > > Both variants essentially do the same thing: they implicitly > coerce ASCII text strings to bytes, so conceptually, there's > little difference. And if we are explicit: headers = u'Length: %i bytes\n' % 123 headers += u'\n\n' body = b'...' socket.send(headers.encode('ascii') + body) (I included the 'u' prefix only because we are talking about shared-codebase python2/python3 code.) That looks pretty readable to me, and it is explicit about what parts are text and what parts are binary. But of course we'd never do exactly that in any but the simplest of protocols and scripts. Instead we'd write a library that had one or more object that modeled our wire/file protocol. The text parts the API would accept input as text strings. The binary parts it would accept input as bytes. Then, when reading or writing the data stream, we perform the appropriate conversions on the appropriate parts. Our library does a more complex analog of 'socket.send(headers.encode('ascii') + body)', one that understands the various parts and glues them together, encoding the text parts to the appropriate encoding (often-but-not-always ascii) as it does so. And yes, I have written code that does this in Python3. What I haven't done is written that code to run in both Python3 and Python2. I *think* the only missing thing I would need to back-port it is the surrogateescape error handler, but I haven't tried it. And I could probably conditionalize the code to use latin1 on python2 instead and get away with it. And please note that email is probably the messiest of messy binary wire protocols. 
Not only do you have bytes and text mixed in the same data stream, with
internal markers (in the text parts) that specify how to interpret the
binary, including what encodings each part of that binary data is in
for cases where that matters, you *also* have to deal with the
possibility of there being *invalid* binary data mixed in with the
ostensibly text parts, that you nevertheless are expected to both
preserve and parse around.

When I started adding back binary support to the email package, I was
really annoyed by the lack of certain string features in the bytes
type. But in the end, it turned out to be really simple to instead
think of the text-with-invalid-bytes parts as *text*-with-invalid-bytes
(surrogateescaped bytes). Now, if I was designing from the ground up
I'd store the stuff that was really binary as bytes in the model object
instead of storing it as surrogateescaped text, but that problem is a
consequence of how we got from there to here (python2-email to
python3-email-that-didn't-handle-8bit-data to python3-email-that-works)
rather than a problem with the python3 core data model.

So it seems like I'm with Nick and Antoine and company here. The
byte-interpolation proposed by Antoine seems reasonable, but I don't
see the *need* for the other stuff. I think that programs will be
cleaner if the text parts of the protocol are handled *as text*.

On the other hand, Ethan's point that bytes *does* have text methods
is true. However, other than the perfectly-sensible-for-bytes split,
strip, and ends/startswith, I don't think I actually use any of them.

But! Our goal should be to help people convert to Python3. So how can
we find out what the specific problems are that real-world programs are
facing, look at the *actual code*, and help that project figure out the
best way to make that code work in both python2 and python3?
That seems like the best way to find out what needs to be added to python3 or pypi: help port the actual code of the developers who are running into problems. Yes, I'm volunteering to help with this, though of course I can't promise exactly how much time I'll have available. --David From stephen at xemacs.org Sat Jan 11 19:44:39 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 12 Jan 2014 03:44:39 +0900 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <52D16017.1020007@egenix.com> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> Message-ID: <87y52me4tk.fsf@uwakimon.sk.tsukuba.ac.jp> M.-A. Lemburg writes: > I complete agree with Stephen, that bytes are in fact often > an encoding of text. If that text is ASCII compatible, I don't > see any reason why we should not continue to expose the C lib > standard string APIs available for text manipulations on bytes. We already *have* a type in Python 3.3 that provides text manipulations on arrays of 8-bit objects: str (per PEP 393). > BTW: I don't know why so many people keep asking for use cases. > Isn't it obvious that text data without known (but ASCII compatible) > encoding or multiple different encodings in a single data chunk > is part of life ? Isn't it equally obvious that if you create or read all such ASCII- compatible chunks as (encoding='ascii', errors='surrogateescape') that you *don't need* string APIs for bytes? Why do these "text chunks" need to be bytes in the first place? That's why we ask for use cases. AFAICS, reading and writing ASCII- compatible text data as 'latin1' is just as fast as bytes I/O. So it's not I/O efficiency, and (since in this model we don't do any en/decoding on bytes/str), it's not redundant en/decoding of bytes to str and back. 
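
The `(encoding='ascii', errors='surrogateescape')` approach mentioned above can be sketched roughly as follows, assuming Python 3. The header names and payload are invented for illustration; only the `decode`/`encode` calls with the `surrogateescape` error handler reflect the technique itself.

```python
# An ASCII-compatible chunk containing bytes that are not valid ASCII.
raw = b"Subject: caf\xe9\r\nX-Junk: \xff\xfe\r\n"

# Decode as ASCII; undecodable bytes become lone surrogates U+DC80-U+DCFF
# instead of raising an error.
text = raw.decode("ascii", errors="surrogateescape")

# Ordinary str methods now work on the ASCII parts of the data.
first_line = text.splitlines()[0]
name, _, value = first_line.partition(": ")
header_name = name.lower()

# Encoding with the same error handler restores the original bytes exactly.
round_tripped = text.encode("ascii", errors="surrogateescape")
assert round_tripped == raw
```

The escaped bytes travel through the string untouched, so text operations can be done on `str` without ever guessing the encoding of the non-ASCII parts.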
From steve at pearwood.info Sat Jan 11 19:49:25 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 12 Jan 2014 05:49:25 +1100
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To: <52D16017.1020007@egenix.com>
References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com>
Message-ID: <20140111184925.GR3869@ando>

On Sat, Jan 11, 2014 at 04:15:35PM +0100, M.-A. Lemburg wrote:

> I think we need to step back a little from the purist view
> of things and give more emphasis on the "practicality beats
> purity" Zen.
>
> I complete agree with Stephen, that bytes are in fact often
> an encoding of text. If that text is ASCII compatible, I don't
> see any reason why we should not continue to expose the C lib
> standard string APIs available for text manipulations on bytes.

Later in your post, you talk about the masses of broken encodings found everywhere (not just in your spam folder). How do the C lib standard string APIs help programmers to avoid broken encodings?

> We don't have to be pedantic about the bytes/text separation.
> It doesn't help in real life.

On the contrary, it helps a lot. To the extent that people keep that clean bytes/text separation, it helps avoid bugs. It prevents problems like this Python 2 nonsense:

    s = "Straße"
    assert len(s) == 6  # fails
    assert s[5] == 'e'  # fails

Most problematic, printing s may (depending on your terminal settings) actually look like "Straße".

Not only is having a clean bytes/text separation the pedantic thing to do, it's also the right thing to do nearly always (notwithstanding a few exceptions, allegedly).

> If you give programmers the choice they will - most of the time -
> do the right thing.

Unicode has been available in Python since version 2.2, more than a decade ago.
And yet here we are, five point releases later (2.7), and the majority of text processing code is still using bytes. I'm not just pointing the finger at others. My 2.x only code almost always uses byte strings for text processing, and not always because it was old code I wrote before I knew better. The coders I work with do the same, only you can remove the word "almost". The code I see posted on comp.lang.python and Reddit and the tutor mailing list invariably uses byte strings. The beginners on the tutor list at least have an excuse that they are beginners.

A quarter of a century after Unicode was first published, nearly 28 years since IBM first introduced the concept of "code pages" to PC users, and we still have programmers writing ASCII only string-handling code that, if it works with extended character sets, only works by accident. The majority of programmers still have *no idea* of even the most basic parts of Unicode. They've had the right tools for a decade, and ignored them. Python 3 forces the issue, and my code is better for it.

> bytes already have most of the 8-bit string methods from Python 2,
> so it doesn't hurt adding some more of the missing features
> from Python 2 on top to make life easier for people dealing
> with multiple/unknown encoding data.

I personally think it was a mistake to keep text operations like upper() and lower() on bytes. I think it will compound the mistake to add even more text operations.
--
Steven

From steve at pearwood.info Sat Jan 11 19:52:45 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 12 Jan 2014 05:52:45 +1100
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To: <52D1724D.4080307@egenix.com>
References: <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> <52D1724D.4080307@egenix.com>
Message-ID: <20140111185245.GS3869@ando>

On Sat, Jan 11, 2014 at 05:33:17PM +0100, M.-A. Lemburg wrote:

> FWIW: I quite liked the Python 2 model, but perhaps that's because
> I already knew how Unicode works, so could use it to make my
> life easier ;-)

/incredulous

I would really love to see you justify that claim. How do you use the Python 2 string type to make processing Unicode text easier?

--
Steven

From python at mrabarnett.plus.com Sat Jan 11 20:22:30 2014
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 11 Jan 2014 19:22:30 +0000
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To: <20140111053639.GM3869@ando>
References: <20140111053639.GM3869@ando>
Message-ID: <52D199F6.7090105@mrabarnett.plus.com>

On 2014-01-11 05:36, Steven D'Aprano wrote:
[snip]
> Latin-1 has the nice property that every byte decodes into the character
> with the same code point, and vice versa. So:
>
> for i in range(256):
>     assert bytes([i]).decode('latin-1') == chr(i)
>     assert chr(i).encode('latin-1') == bytes([i])
>
> passes. It seems to me that your problem goes away if you use Unicode
> text with embedded binary data, rather than binary data with embedded
> ASCII text. Then when writing the file to disk, of course you encode it
> to Latin-1, either explicitly:
>
> pdf = ...
# Unicode string containing the PDF contents
> with open("outfile.pdf", "wb") as f:
>     f.write(pdf.encode("latin-1"))
>
> or implicitly:
>
> with open("outfile.pdf", "w", encoding="latin-1") as f:
>     f.write(pdf)
>
[snip]

The second example won't work because you're forgetting about the handling of line endings in text mode.

Suppose you have some binary data bytes([10]).

You convert it into a Unicode string using Latin-1, giving '\n'.

You write it out to a file opened in text mode.

On Windows, that string '\n' will be written to the file as b'\r\n'.

From solipsis at pitrou.net Sat Jan 11 20:22:46 2014
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 11 Jan 2014 20:22:46 +0100
Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake
References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us>
Message-ID: <20140111202246.022e4458@fsol>

On Sat, 11 Jan 2014 10:38:01 -0800
Ethan Furman wrote:
> On 01/11/2014 10:32 AM, Antoine Pitrou wrote:
> > On Sat, 11 Jan 2014 18:41:49 +0100
> > Victor Stinner wrote:
> >>
> >> If you agree, I will modify the PEP. If Antoine disagree, I will fork
> >> the PEP 460 ;-)
> >
> > Please fork it.
>
> You've already stated you don't care that much and are willing to let the PEP as-is be rejected. Why not remove your
> name and let Victor have it back? Is he not the original author? (If this is protocol just say so -- remember I'm
> still new to the ways of PyDev. :).

Because the PEP is IMO a much saner compromise than what you're trying to do (and would also stand a better chance of being accepted, if it weren't for your stupid maximalist opposition).

Regards

Antoine.
From ethan at stoneleaf.us Sat Jan 11 20:05:36 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sat, 11 Jan 2014 11:05:36 -0800
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To: <20140111183631.GQ3869@ando>
References: <20140111053639.GM3869@ando> <20140111153837.GP3869@ando> <52D16F4B.6080900@stoneleaf.us> <20140111183631.GQ3869@ando>
Message-ID: <52D19600.1070106@stoneleaf.us>

On 01/11/2014 10:36 AM, Steven D'Aprano wrote:
> On Sat, Jan 11, 2014 at 08:20:27AM -0800, Ethan Furman wrote:
>>
>> unicode to bytes
>> bytes to unicode using latin1
>> unicode to bytes
>
> Where do you get this from? I don't follow your logic. Start with a text
> template:
>
> template = """\xDE\xAD\xBE\xEF
> Name:\0\0\0%s
> Age:\0\0\0\0%d
> Data:\0\0\0%s
> blah blah blah
> """
>
> data = template % ("George", 42, blob.decode('latin-1'))
>
> Only the binary blobs need to be decoded. We don't need to encode the
> template to bytes, and the textual data doesn't get encoded until we're
> ready to send it across the wire or write it to disk.

And what if your name field has data not representable in latin-1?

    --> '\xd1\x81\xd1\x80\xd0\x83'.decode('utf8')
    u'\u0441\u0440\u0403'

    --> '\xd1\x81\xd1\x80\xd0\x83'.decode('utf8').encode('latin1')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-2: ordinal not in range(256)

So really your example should be:

    data = template % ("George".encode('some_non_ascii_encoding_such_as_cp1251').decode('latin-1'), 42, blob.decode('latin-1'))

Which is a mess.

--
~Ethan~

From kristjan at ccpgames.com Sat Jan 11 20:40:31 2014
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Sat, 11 Jan 2014 19:40:31 +0000
Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake
In-Reply-To:
References:
Message-ID:

Hi there.
How about a compromise?
Personally, I think adding the full complement of integer/float formatting to bytes is a bit over the top.
How about just supporting two format specifiers?
%b : interpolate a bytes object. If it doesn't have the buffer interface, error.
%s : interpolate a str object, encoded to ASCII using 'strict' conversion.

This should cover the most common use cases. In particular, you could do this:

    headers.append('Content-Length: %s'%(len(data),))

And then subsequently:

    Packet = b'%b%b'%(b"".join(headers), data)

For more complex formatting, you delegate to the more capable string class, but benefit from automatic ASCII conversion:

    Data = b"percentage = %s" % ("%4.2f" % (value,))

I think interpolating bytes objects is very important. And support for automatic ASCII conversion in the process will help us cover all of the numeric use cases.

K

-----Original Message-----
From: Python-Dev [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Victor Stinner
Sent: 11. janúar 2014 17:42
To: Python Dev
Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake

Hi,

I'm in favor of adding support of formatting integer and floating point numbers in the PEP 460: %d, %u, %o, %x, %f with padding and precision (%10d, %010d, %1.5f) and sign (%-i, %+i) but without alternate format ("{:#x}"). %s would also accept int and float for convenience.
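
For comparison, the Content-Length example in the compromise above can already be written without any bytes formatting, by formatting as str and encoding strictly to ASCII by hand. The sketch below only illustrates what the proposed %s conversion would do implicitly; the helper name ascii_header and the sample payload are hypothetical and not part of the proposal.

```python
def ascii_header(name, value):
    # Hypothetical helper: format as str, then encode strictly to ASCII,
    # which is the step the proposed %s specifier would perform implicitly.
    return ("%s: %s\r\n" % (name, value)).encode("ascii", "strict")

# Made-up binary payload; it is never decoded.
data = b"\x00\x01binary payload\xff"

headers = [ascii_header("Content-Length", len(data))]

# Bytes are joined with bytes, as in the b'%b%b' example above.
packet = b"".join(headers) + b"\r\n" + data
```

Because the conversion is strict, a header value containing non-ASCII text raises UnicodeEncodeError instead of silently producing mojibake.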
(If this is protocol just say so -- remember I'm >still new to the ways of PyDev. :). From a procedural point of view, I would say that it's entirely appropriate for a PEP to have open questions, alternatives, and options. Have it lay out the arguments pro and con and let Guido or the appointed PEP czar make the final decision. Then the PEP can be amended with those decisions, and if folks still think more needs to be done, a follow up PEP can be filed. -Barry From stephen at xemacs.org Sat Jan 11 20:49:47 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 12 Jan 2014 04:49:47 +0900 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <52D199F6.7090105@mrabarnett.plus.com> References: <20140111053639.GM3869@ando> <52D199F6.7090105@mrabarnett.plus.com> Message-ID: <87wqi6e1t0.fsf@uwakimon.sk.tsukuba.ac.jp> MRAB writes: > > with open("outfile.pdf", "w", encoding="latin-1") as f: > > f.write(pdf) > > > [snip] > The second example won't work because you're forgetting about the > handling of line endings in text mode. Not so fast! Forgot, yes (me too!), but not work? Not quite: with open("outfile.pdf", "w", encoding="latin-1", newline="") as f: f.write(pdf) should do the trick. From ethan at stoneleaf.us Sat Jan 11 20:54:26 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 11 Jan 2014 11:54:26 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <87wqi6e1t0.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140111053639.GM3869@ando> <52D199F6.7090105@mrabarnett.plus.com> <87wqi6e1t0.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D1A172.5090905@stoneleaf.us> On 01/11/2014 11:49 AM, Stephen J. Turnbull wrote: > MRAB writes: > > > > with open("outfile.pdf", "w", encoding="latin-1") as f: > > > f.write(pdf) > > > > > [snip] > > The second example won't work because you're forgetting about the > > handling of line endings in text mode. > > Not so fast! 
Forgot, yes (me too!), but not work? Not quite: > > with open("outfile.pdf", "w", encoding="latin-1", newline="") as f: > f.write(pdf) > > should do the trick. Well, it's good that there is a work-a-round. Are we going to have a document listing all the work-a-rounds needed to program a bytes-oriented style using unicode? -- ~Ethan~ From rdmurray at bitdance.com Sat Jan 11 21:40:43 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Sat, 11 Jan 2014 15:40:43 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <52D1A172.5090905@stoneleaf.us> References: <20140111053639.GM3869@ando> <52D199F6.7090105@mrabarnett.plus.com> <87wqi6e1t0.fsf@uwakimon.sk.tsukuba.ac.jp> <52D1A172.5090905@stoneleaf.us> Message-ID: <20140111204043.DD49A2500AA@webabinitio.net> On Sat, 11 Jan 2014 11:54:26 -0800, Ethan Furman wrote: > On 01/11/2014 11:49 AM, Stephen J. Turnbull wrote: > > MRAB writes: > > > > > > with open("outfile.pdf", "w", encoding="latin-1") as f: > > > > f.write(pdf) > > > > > > > [snip] > > > The second example won't work because you're forgetting about the > > > handling of line endings in text mode. > > > > Not so fast! Forgot, yes (me too!), but not work? Not quite: > > > > with open("outfile.pdf", "w", encoding="latin-1", newline="") as f: > > f.write(pdf) > > > > should do the trick. > > Well, it's good that there is a work-a-round. Are we going to have a document listing all the work-a-rounds needed to > program a bytes-oriented style using unicode? That's not a work-around (if you are talking specifically about the newline=""). That's just the way the python3 IO library works. If you want to preserve the newlines in your data, but still have the text-io machinery count them for deciding when to trigger io/buffering behavior, you use newline=''. It's not the most intuitive API, so I won't be surprised if a lot of people don't know about it or get confused by it when they see it. 
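
The newline='' behaviour discussed above can be checked with a short round trip. This is a minimal sketch that writes to a temporary file (the file name is arbitrary): every byte value, including 0x0A and 0x0D, is decoded as latin-1, written through a text-mode file, and read back in binary mode.

```python
import os
import tempfile

# All 256 byte values, including the newline bytes 0x0A and 0x0D.
payload = bytes(range(256))
text = payload.decode("latin-1")

path = os.path.join(tempfile.mkdtemp(), "outfile.bin")

# newline="" turns off newline translation on write, so '\n' stays b'\n'
# on every platform instead of becoming os.linesep.
with open(path, "w", encoding="latin-1", newline="") as f:
    f.write(text)

# Read the file back as raw bytes to see what actually reached the disk.
with open(path, "rb") as f:
    written = f.read()
```

With the default newline=None the same write would turn each '\n' into os.linesep, which is the corruption MRAB describes for Windows.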
I first learned about it in the context of csv files, another one of those legacy file protocols that are mostly-text-but-not-entirely. --David From donald at stufft.io Sat Jan 11 21:45:59 2014 From: donald at stufft.io (Donald Stufft) Date: Sat, 11 Jan 2014 15:45:59 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> Message-ID: <0AD3B4B7-A108-42F8-B914-D0CCB5C454C9@stufft.io> On Jan 11, 2014, at 10:34 AM, Nick Coghlan wrote: > Yes, it bloody well does. The number of people who have told me that > using Python 3 is what allowed them to finally understand how Unicode > works vastly exceeds the number of wire protocol and file format devs > that have complained about working with binary formats being > significantly less tolerant of the "it's really like ASCII text" > mindset. FWIW as one of the people who it took Python3 to finally figure out how to actually use unicode, it was the absence of encode on bytes and decode on str that actually did it. Giving bytes a format method would not have affected that either way I don?t believe. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From ethan at stoneleaf.us Sat Jan 11 20:51:28 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 11 Jan 2014 11:51:28 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <20140111202246.022e4458@fsol> References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> Message-ID: <52D1A0C0.60704@stoneleaf.us> On 01/11/2014 11:22 AM, Antoine Pitrou wrote: > On Sat, 11 Jan 2014 10:38:01 -0800 > Ethan Furman wrote: >> On 01/11/2014 10:32 AM, Antoine Pitrou wrote: >>> On Sat, 11 Jan 2014 18:41:49 +0100 >>> Victor Stinner wrote: >>>> >>>> If you agree, I will modify the PEP. If Antoine disagree, I will fork >>>> the PEP 460 ;-) >>> >>> Please fork it. >> >> You've already stated you don't care that much and are willing to let the PEP as-is be rejected. Why not remove your >> name and let Victor have it back? Is he not the original author? (If this is protocol just say so -- remember I'm >> still new to the ways of PyDev. :). > > Because the PEP is IMO a much saner compromise than what you're > trying to do (and would also stand a better chance of being accepted, > if it weren't for your stupid maximalist opposition). Well, it's good to know you do care. 
:) -- ~Ethan~ From g.brandl at gmx.net Sat Jan 11 21:50:12 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 11 Jan 2014 21:50:12 +0100 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <20140111202246.022e4458@fsol> References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> Message-ID: Am 11.01.2014 20:22, schrieb Antoine Pitrou: > On Sat, 11 Jan 2014 10:38:01 -0800 > Ethan Furman wrote: >> On 01/11/2014 10:32 AM, Antoine Pitrou wrote: >> > On Sat, 11 Jan 2014 18:41:49 +0100 >> > Victor Stinner wrote: >> >> >> >> If you agree, I will modify the PEP. If Antoine disagree, I will fork >> >> the PEP 460 ;-) >> > >> > Please fork it. >> >> You've already stated you don't care that much and are willing to let the PEP as-is be rejected. Why not remove your >> name and let Victor have it back? Is he not the original author? (If this is protocol just say so -- remember I'm >> still new to the ways of PyDev. :). > > Because the PEP is IMO a much saner compromise than what you're > trying to do (and would also stand a better chance of being accepted, > if it weren't for your stupid maximalist opposition). Can you please stop throwing personal insults around? You don't have to resort to that level. Georg From storchaka at gmail.com Sat Jan 11 22:01:05 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 11 Jan 2014 23:01:05 +0200 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: Message-ID: 11.01.14 21:40, Kristj?n Valur J?nsson ???????(??): > How about a compromise? > Personally, I think adding the full complement of integer/float formatting to bytes is a bit over the top. > How about just supporting two format specifiers? > %b : interpolate a bytes object. If it doesn't have the buffer interface, error. > %s : interpolate a str object, encoded to ASCII using 'strict' conversion. %b is not supported in Python 2.7. 
And compatibility with Python 2.7 is only the purpose of this feature. From g.brandl at gmx.net Sat Jan 11 22:10:29 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 11 Jan 2014 22:10:29 +0100 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: Message-ID: Am 11.01.2014 22:01, schrieb Serhiy Storchaka: > 11.01.14 21:40, Kristj?n Valur J?nsson ???????(??): >> How about a compromise? >> Personally, I think adding the full complement of integer/float formatting to bytes is a bit over the top. >> How about just supporting two format specifiers? >> %b : interpolate a bytes object. If it doesn't have the buffer interface, error. >> %s : interpolate a str object, encoded to ASCII using 'strict' conversion. > > %b is not supported in Python 2.7. And compatibility with Python 2.7 is > only the purpose of this feature. Not "only", but it is certainly an important one. Georg From tjreedy at udel.edu Sat Jan 11 22:28:34 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 11 Jan 2014 16:28:34 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <87y52me4tk.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> <87y52me4tk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 1/11/2014 1:44 PM, Stephen J. Turnbull wrote: > We already *have* a type in Python 3.3 that provides text > manipulations on arrays of 8-bit objects: str (per PEP 393). > > > BTW: I don't know why so many people keep asking for use cases. > > Isn't it obvious that text data without known (but ASCII compatible) > > encoding or multiple different encodings in a single data chunk > > is part of life ? 
> > Isn't it equally obvious that if you create or read all such ASCII- > compatible chunks as (encoding='ascii', errors='surrogateescape') that > you *don't need* string APIs for bytes? > > Why do these "text chunks" need to be bytes in the first place? > That's why we ask for use cases. AFAICS, reading and writing ASCII- > compatible text data as 'latin1' is just as fast as bytes I/O. So > it's not I/O efficiency, and (since in this model we don't do any > en/decoding on bytes/str), it's not redundant en/decoding of bytes to > str and back. The problem with some criticisms of using 'unicode in Python 3' is that there really is no such thing. Unicode in 3.0 to 3.2 used the old internal model inherited from 2.x. Unicode in 3.3+ uses a different internal model that is a game changer with respect to certain issues of space and time efficiency (and cross-platform correctness and portability). So at least some the valid criticisms based on the old model are out of date and no longer valid. -- Terry Jan Reedy From dholth at gmail.com Sat Jan 11 22:35:13 2014 From: dholth at gmail.com (Daniel Holth) Date: Sat, 11 Jan 2014 16:35:13 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> <87y52me4tk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sat, Jan 11, 2014 at 4:28 PM, Terry Reedy wrote: > On 1/11/2014 1:44 PM, Stephen J. Turnbull wrote: > >> We already *have* a type in Python 3.3 that provides text >> manipulations on arrays of 8-bit objects: str (per PEP 393). >> >> > BTW: I don't know why so many people keep asking for use cases. >> > Isn't it obvious that text data without known (but ASCII compatible) >> > encoding or multiple different encodings in a single data chunk >> > is part of life ? 
>> >> Isn't it equally obvious that if you create or read all such ASCII- >> compatible chunks as (encoding='ascii', errors='surrogateescape') that >> you *don't need* string APIs for bytes? >> >> Why do these "text chunks" need to be bytes in the first place? >> That's why we ask for use cases. AFAICS, reading and writing ASCII- >> compatible text data as 'latin1' is just as fast as bytes I/O. So >> it's not I/O efficiency, and (since in this model we don't do any >> en/decoding on bytes/str), it's not redundant en/decoding of bytes to >> str and back. > > > The problem with some criticisms of using 'unicode in Python 3' is that > there really is no such thing. Unicode in 3.0 to 3.2 used the old internal > model inherited from 2.x. Unicode in 3.3+ uses a different internal model > that is a game changer with respect to certain issues of space and time > efficiency (and cross-platform correctness and portability). So at least > some the valid criticisms based on the old model are out of date and no > longer valid. -1 on adding more surrogateesapes by default. It's a pain to track down where the encoding errors came from. From ethan at stoneleaf.us Sat Jan 11 21:53:25 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 11 Jan 2014 12:53:25 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <0AD3B4B7-A108-42F8-B914-D0CCB5C454C9@stufft.io> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> <0AD3B4B7-A108-42F8-B914-D0CCB5C454C9@stufft.io> Message-ID: <52D1AF45.7040708@stoneleaf.us> On 01/11/2014 12:45 PM, Donald Stufft wrote: > > FWIW as one of the people who it took Python3 to finally figure out how to > actually use unicode, it was the absence of encode on bytes and decode on > str that actually did it. 
Giving bytes a format method would not have affected > that either way I don?t believe. My biggest hurdle was realizing that ASCII was an encoding. -- ~Ethan~ From ethan at stoneleaf.us Sat Jan 11 21:45:01 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 11 Jan 2014 12:45:01 -0800 Subject: [Python-Dev] test.support.check_warnings Message-ID: <52D1AD4D.8000701@stoneleaf.us> The docs say this [1]: ========================================================== test.support.check_warnings(*filters, quiet=True) A convenience wrapper for warnings.catch_warnings() that makes it easier to test that a warning was correctly raised. It is approximately equivalent to calling warnings.catch_warnings(record=True) with warnings.simplefilter() set to always and with the option to automatically validate the results that are recorded. check_warnings accepts 2-tuples of the form ("message regexp", WarningCategory) as positional arguments. If one or more filters are provided, or if the optional keyword argument quiet is False, it checks to make sure the warnings are as expected: each specified filter must match at least one of the warnings raised by the enclosed code or the test fails, and if any warnings are raised that do not match any of the specified filters the test fails. To disable the first of these checks, set quiet to True. 
========================================================== What I want is to make sure that DeprecationWarnings are being raised: ========================================================== with support.check_warnings( ("automatic int conversions have been deprecated", DeprecationWarning), quiet=False, ): exec("'%x' % pi") exec("'%x' % 3.14") exec("'%X' % 2.11") exec("'%o' % 1.79") exec("'%c' % pi") ========================================================== But if I throw in something that doesn't raise a deprecation warning, the test still passes: ========================================================== exec("'%d' % 3") ========================================================== Am I doing something wrong? -- ~Ethan~ From reingart at gmail.com Sat Jan 11 23:13:39 2014 From: reingart at gmail.com (Mariano Reingart) Date: Sat, 11 Jan 2014 20:13:39 -0200 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140111004933.4e0bb394@fsol> Message-ID: On Fri, Jan 10, 2014 at 9:13 PM, Juraj Sukop wrote: > > > > On Sat, Jan 11, 2014 at 12:49 AM, Antoine Pitrou wrote: > >> Also, when you say you've never encountered UTF-16 text in PDFs, it >> sounds like those people who've never encountered any non-ASCII data in >> their programs. > > > Let me clarify: one does not think in "writing text in Unicode"-terms in > PDF. Instead, one records the sequence of "character codes" which > correspond to "glyphs" or the glyph IDs directly. That's because one > Unicode character may have more than one glyph and more characters can be > shown as one glyph. 
> > > AFAIK (and just for the record), there could be both Latin1 text and UTF-16 in a PDF (and other encodings too), depending on the font used: /Encoding /WinAnsiEncoding (mostly latin1 "standard" fonts) /Encoding /Identity-H (generally for unicode UTF-16 True Type "embedded" fonts) For example, in PyFPDF (a PHP library ported to python), the following code writes out text that could be encoded in two different encodings: s = sprintf("BT %.2f %.2f Td (%s) Tj ET", x*self.k, (self.h-y)*self.k, txt) https://code.google.com/p/pyfpdf/source/browse/fpdf/fpdf.py#602 In Python2, txt is just a str, but in Python3 handling everything as latin1 string obviously doesn't work for TTF in this case. Best regards Mariano Reingart http://www.sistemasagiles.com.ar http://reingart.blogspot.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Sat Jan 11 22:50:59 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 11 Jan 2014 13:50:59 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <20140111193226.23cc771d@fsol> References: <20140111193226.23cc771d@fsol> Message-ID: <52D1BCC3.1030701@stoneleaf.us> On 01/11/2014 10:32 AM, Antoine Pitrou wrote: > On Sat, 11 Jan 2014 18:41:49 +0100 > Victor Stinner wrote: >> >> b'x=%s' % 10 is well defined, it's pure bytes. > > It is well-defined? Then please explain me what the general case of > b'%s' % x > is supposed to call: This is the key question, isn't it? > - does it call x.__bytes__? int.__bytes__ doesn't exist Perhaps that's the problem. According to the docs: ======================================================================== object.__bytes__(self) Called by bytes() to compute a byte-string representation of an object. This should return a bytes object. ======================================================================== Obviously, with the plethora of different binary possibilities for representing a number (how many bytes? 
endianness? which complement?), we would be well within our rights to decide that the "byte-string representation" of the numeric types is the ASCII equivalent of their __repr__ or __str__, and implement __bytes__ appropriately for them. Any other object that wants to be represented easily in a byte stream would also have to implement __bytes__. If necessary we could add __bytes__ to str for /strict/ ASCII conversion (even latin-1 would have to be explicitly encoded)[1]. -- ~Ethan~ [1] I'm iffy on this point as I'm not at all sure it's needed. From victor.stinner at gmail.com Sun Jan 12 00:11:23 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 12 Jan 2014 00:11:23 +0100 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D1BCC3.1030701@stoneleaf.us> References: <20140111193226.23cc771d@fsol> <52D1BCC3.1030701@stoneleaf.us> Message-ID: 2014/1/11 Ethan Furman : >>> b'x=%s' % 10 is well defined, it's pure bytes. >> >> It is well-defined? Then please explain me what the general case of >> b'%s' % x >> is supposed to call: > > This is the key question, isn't it? Python 2 and Python 3 are very different here. In Python 2, the "s" format of PyArg_Parse may call the __str__() method of an object. In Python 3, the "y*" format of PyArg_Parse uses the Py_buffer API which has no slot (there is no protocol like a __getbuffer__() method). The Py_buffer can only be implemented in C. For example, bytes, bytearray and memoryview implement it. PyArg_Parse requires also the buffer to be C-contiguous and has a single segment (use PyBUF_SIMPLE flag). Said differently, bytes%args and bytes.format() would *not* call any method. 
Victor

From victor.stinner at gmail.com Sun Jan 12 02:01:01 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sun, 12 Jan 2014 02:01:01 +0100
Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake
In-Reply-To: <20140111193226.23cc771d@fsol>
References: <20140111193226.23cc771d@fsol>
Message-ID:

Hi,

2014/1/11 Antoine Pitrou :
>> b'x=%s' % 10 is well defined, it's pure bytes.
>
> It is well-defined? Then please explain me what the general case of
> b'%s' % x
> is supposed to call:
>
> - does it call x.__bytes__? int.__bytes__ doesn't exist
> - does it call bytes(x)? bytes(10) gives
> b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
> - does it call x.__str__? you've reintroduced the Python 2 behaviour of
> conflating bytes and unicode

I don't want to call any method from bytes%args, only Py_buffer API would be used. So the pseudo-code becomes:

- try to get Py_buffer
- on failure, check if it's an int: yes? ok, format it as decimal
- otherwise, raise an error

Or:

- is the object an int? yes, format it as decimal. no, use Py_buffer

--

I discussed with Antoine to try to understand how and why we disagree. Antoine prefers a pure API, whereas I'm trying to figure out if it would be possible to write code compatible with Python 2 and Python 3.

Using Antoine's PEP, it's possible to write code working on Python 2 and Python 3 which only manipulate bytes strings. The problem is that it's a pain to write a code working on both Python versions when an argument is an integer.

For example, the Python 2 code "Content-Length: %s\r\n" % 123 is written ("Content-Length: %s\r\n" % 123).encode('ascii') in Python 3. So Python 2 and Python 3 codes are different.

Supporting formatting integers would allow to write b"Content-Length: %s\r\n" % 123, which would work on Python 2 and Python 3.

(u'Content-Length: %s\r\n' % 123).encode('ascii') works on both Python versions, but it may require more work to port Python 2 code to Python 3.
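
The first pseudo-code variant above can be modelled in pure Python. This is only a sketch under stated assumptions: the function name format_arg is hypothetical, and memoryview() stands in for the C-level Py_buffer lookup, so it approximates the proposed behaviour rather than reproducing any real implementation.

```python
def format_arg(obj):
    """Hypothetical model of: try the buffer, else int as decimal, else error."""
    try:
        # "try to get Py_buffer": memoryview() accepts only buffer exporters
        # such as bytes, bytearray and memoryview.
        return memoryview(obj).tobytes()
    except TypeError:
        pass
    if isinstance(obj, int):
        # Not a buffer: format the integer as ASCII decimal digits.
        return ("%d" % obj).encode("ascii")
    # Neither a buffer nor an int (for example str): refuse.
    raise TypeError("cannot interpolate %r into bytes" % type(obj).__name__)

# Build the header line from the message without calling bytes(123).
line = b"Content-Length: " + format_arg(123) + b"\r\n"

# bytes(10) is ten NUL bytes, which is why bytes(x) cannot be the rule.
zeros = bytes(10)
```

No method of the argument is called here; str is rejected because it exports no buffer, which keeps text and bytes apart as intended.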
-- Now I'm trying to find use cases in Mercurial and Twisted source code to see which features are required. First, I'm looking for a function requiring to format a number in decimal in a bytes string. In issue #3982, I saw: """ HTTP chunking' uses ASCII mixed with binary (octets). With 2.6 you could write: def chunk(block): return b'{0:x}\r\n{1}\r\n'.format(len(block), block)" """ and """ 'Content-length: {}\r\n'.format(length) """ But are the examples real use cases, or artifical examples? -- Augie Fackler gave an example from Mercurial: """ sys.stdout.write('%(state)s %(path)s\n' % {'state': 'M', 'path': 'some/filesystem/path'}) except we don't know the encoding of the filesystem path (Hi unix!) so we have to treat the whole thing as opaque bytes. It's even more fun for 'log', becase then it's got localized strings in it as well. """ But here I disagree with the design of Mercurial, filenames should be treated as text. If a filename would be pure binary, you should not write it in a terminal. Displaying binary data usually leads to displaying random characters and changing terminal options (ex: text starts blinking or is displayed in bold!?) :-) For the localized string: again, it's also a design issue in my opinion. 
A localized string is text, not binary data :-) -- Another option is that I cannot find usecases because there are no use cases for the PEP 460 and the PEP is useless :-) Victor From steve at pearwood.info Sun Jan 12 02:10:46 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 12 Jan 2014 12:10:46 +1100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <52D199F6.7090105@mrabarnett.plus.com> References: <20140111053639.GM3869@ando> <52D199F6.7090105@mrabarnett.plus.com> Message-ID: <20140112011046.GU3869@ando> On Sat, Jan 11, 2014 at 07:22:30PM +0000, MRAB wrote: > >with open("outfile.pdf", "w", encoding="latin-1") as f: > > f.write(pdf) > > > [snip] > The second example won't work because you're forgetting about the > handling of line endings in text mode. So I did! Thank you for the correction. -- Steven From v+python at g.nevcal.com Sun Jan 12 02:10:37 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sat, 11 Jan 2014 17:10:37 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D1BCC3.1030701@stoneleaf.us> References: <20140111193226.23cc771d@fsol> <52D1BCC3.1030701@stoneleaf.us> Message-ID: <52D1EB8D.6060209@g.nevcal.com> On 1/11/2014 1:50 PM, Ethan Furman wrote: > Perhaps that's the problem. According to the docs: > ======================================================================== > object.__bytes__(self) > > Called by bytes() to compute a byte-string representation of an > object. This should return a bytes object. > ======================================================================== > > Obviously, with the plethora of different binary possibilities for > representing a number (how many bytes? endianness? which complement?), > we would be well within our rights to decide that the "byte-string > representation" of the numeric types is the ASCII equivalent of their > __repr__ or __str__, and implement __bytes__ appropriately for them. 
> Any other object that wants to be represented easily in a byte stream > would also have to implement __bytes__. If necessary we could add > __bytes__ to str for /strict/ ASCII conversion (even latin-1 would > have to be explicitly encoded)[1]. In spite of Victor's explanation of internals, which I didn't understand, this sounds like a very interesting idea, conceptually, that any object could implement its __bytes__representation. On the other hand, it would probably have to be parameterized in the general case: for binary data values, one protocol or format may wish the data to be big-endian, and another may wish the data to be little-endian; for str, one protocol or format may require one encoding and another may require a different encoding, even (as for email) for different parts of the message. So it could be somewhat complex, yet would be very powerful in allowing complex objects, made up of other objects, some of which might have a variety of potential bytes formats (think TIFF files, for example) to convert themselves into a stream of bytes that fits the standard. On the flip side, one would want to convert the stream of bytes into the set of objects, which is a parsing problem. This is a bit beyond what can be done automatically, just by calling __bytes__ with no parameters, though. What it may be, though, is a meta-operation from which the needed bytes operations can be determined. It may also not be an easy "compatible with existing Python 2 code with minor tweaks" solution, either. It would be more like a pickle protocol, but pickle defines its own formats, and thus is useless for creating standard formats. I guess it would belong on python-ideas. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matej at ceplovi.cz Sun Jan 12 02:12:22 2014 From: matej at ceplovi.cz (Matěj Cepl) Date: Sun, 12 Jan 2014 02:12:22 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <52D188DD.1070004@stoneleaf.us> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> <52D188DD.1070004@stoneleaf.us> Message-ID: <20140112011222.8714942232@wycliff.ceplovi.cz> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2014-01-11, 18:09 GMT, you wrote: >> We are NOT going back to the confusing incoherent mess that >> is the Python 2 model of bolting Unicode onto the side of >> POSIX . . . > > We are not asking for that. Yes, you do. Maybe not you personally, but number of people here on this list (for F...k sake, this is for DEVELOPERS of the langauge, not some bloody users!) for whom the current suggestion is just the way how to avoid Unicode and keep all those broken script which barfs at me all the time alive is quit non-zero I am afraid. Best, Mat?j -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iD8DBQFS0ev24J/vJdlkhKwRAoHOAJ9crimnp+TtXCxmZLvTUSFVFSESAwCeNrby Yjwk6Ydzc/REezfHP046C5Y= =c2vl -----END PGP SIGNATURE----- From tjreedy at udel.edu Sun Jan 12 02:20:28 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 11 Jan 2014 20:20:28 -0500 Subject: [Python-Dev] byteformat() proposal: please critique Message-ID: The following function interpolates bytes, bytearrays, and formatted strings, the latter two auto-converted to bytes, into a bytes (or auto-converted bytearray) format. This function automates much of what some people have recommended for combining ascii text and binary blogs. The test passes on 2.7.6 as well as 3.3.3, though a 2.7-only version would be simpler. 
===============

# bf.py -- Terry Jan Reedy, 2014 Jan 11
"Define byteformat(): a bytes version of str.format as a function."
import re

def byteformat(form, obs):
    '''Return bytes-formatted objects interpolated into bytes format.

    The bytes or bytearray format has two types of replacement fields.
    b'{}' and b'{:}': The object can be any raw bytes or bytearray object.
    b'{:<format_spec>}': The object can be any object ob that can be
    string-formatted with <format_spec>. Bytearrays are converted to bytes.

    The text encoding is the default (encoding="utf-8", errors="strict").
    Users should explicitly encode to bytes for any other encoding.
    The struct module can be used to produce bytes, such as binary-formatted
    integers, that are not encoded text.

    Test passes on both 2.7.6 and 3.3.3.
    '''
    if isinstance(form, bytearray):
        form = bytes(form)
    # With one capture group, re.split returns literal text at even indexes
    # and the captured format spec of each replacement field at odd indexes.
    fields = re.split(b'{:?([^}]*)}', form)
    # print(fields)
    if len(fields) != 2*len(obs)+1:
        raise ValueError('Number of replacement fields not same as len(obs)')
    j = 1  # index into fields
    for ob in obs:
        if isinstance(ob, bytearray):
            ob = bytes(ob)
        field = fields[j]
        # Empty spec: insert the raw bytes; otherwise format as text and encode.
        fields[j] = format(ob, field.decode()).encode() if field else ob
        j += 2
    return b''.join(fields)

# test code
bformat = b"bytes: {}; bytearray: {:}; unicode: {:s}; int: {:5d}; float: {:7.2f}; end"
objects = (b'abc', bytearray(b'def'), u'ghi', 123, 12.3)
result = byteformat(bformat, objects)
result2 = byteformat(bytearray(bformat), objects)
strings = (ob.decode() if isinstance(ob, (bytes, bytearray)) else ob
           for ob in objects)
expect = bformat.decode().format(*strings).encode()

#print(result)
#print(result2)
print(expect)
assert result == result2 == expect

=====
This has been edited from what I posted to issue 3982 to expand the docstrings and to work the same with both bytes and bytearrays on both 2.7 and 3.3. When I posted before, I thought of it merely as a proof-of-concept prototype.
After reading the seemingly endless discussion of possible variations of byte formatting with % and .format, I now present it as a real, concrete, proposal. There are, of course, details that could be tweaked. The encoding uses the default, which on 3.x is (encoding='utf-8', errors='strict'). This could be changed to an explicit encoding='ascii'. If that were done, the encoding could be made a parameter that defaults to 'ascii'. The joiner could be defined as type(form)() so the output type matches the input form type. I did not do that because it complicates the test. The coercion of interpolated bytearray objects to bytes is needed for 2.7 because in 2.7, str/bytes.join raises TypeError for bytearrays in the input sequence. A 3.x-only version could drop this. One objection to the function is that it is neither % or .format. To me, this is an advantage in that a new function will not be expected to exactly match the % or .format behavior in either 2.x or 3.x. It eliminates the 'matching the old' arguments so we can focus on what actual functionality is needed. There is no need to convert true binary bytes to text with either latin-1 or surrogates. There is no need to add anything to bytes. The code above uses the built-in facilities that we already have, which to me should be the first thing to try, not the last. One new feature that does not match old behavior is that {} and {:} are changed (in 3.x) to indicate bytes whereas {:s} continues to indicate (in 3.x) unicode text. ({:s} might be changed to mean unicode for 2.7 also, but I did not explore that idea.) Similarly, a new function is free to borrow only the format_spec part of replace of replacement fields and use format(ob, format_spec) to format each object. Anyone who needs the full power of str.format is free to use it explicitly. I think format_specs cover most of what people have asked for. For future releases, the function could go in the string module. 
It could otherwise be added to existing or future 2&3 porting packages. -- Terry Jan Reedy From steve at pearwood.info Sun Jan 12 02:31:51 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 12 Jan 2014 12:31:51 +1100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> <87y52me4tk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20140112013150.GV3869@ando> On Sat, Jan 11, 2014 at 04:28:34PM -0500, Terry Reedy wrote: > The problem with some criticisms of using 'unicode in Python 3' is that > there really is no such thing. Unicode in 3.0 to 3.2 used the old > internal model inherited from 2.x. Unicode in 3.3+ uses a different > internal model that is a game changer with respect to certain issues of > space and time efficiency (and cross-platform correctness and > portability). So at least some the valid criticisms based on the old > model are out of date and no longer valid. While there are definitely performance savings (particularly of memory) regarding the FSR in Python 3.3, for the use-case we're talking about, Python 3.1 and 3.2 (and for that matter, 2.2 through 2.7) Unicode strings should be perfectly adequate. The textual data being used is ASCII, and the binary blobs are encoded to Latin-1, so everything is a subset of Unicode, namely U+0000 to U+00FF. That means there are no astral characters, and no behavioural differences between wide and narrow builds (apart from memory use). 
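The claim that every byte maps to a code point in U+0000 to U+00FF, and back again without loss, is easy to check. The following is a small sketch; `latin1_round_trip` is a hypothetical name used only for this illustration.

```python
def latin1_round_trip(blob):
    """Decode arbitrary bytes as Latin-1 and encode them back again."""
    text = blob.decode('latin-1')
    # Every resulting character lies in U+0000..U+00FF, so there are no
    # astral characters and no wide/narrow build differences to worry about.
    assert all(ord(ch) <= 0xFF for ch in text)
    return text.encode('latin-1')


every_byte = bytes(range(256))
restored = latin1_round_trip(every_byte)
```

Here `restored` compares equal to `every_byte`, since Latin-1 maps byte N to code point N for all 256 values.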
-- Steven From steve at pearwood.info Sun Jan 12 02:35:00 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 12 Jan 2014 12:35:00 +1100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140111004933.4e0bb394@fsol> Message-ID: <20140112013500.GW3869@ando> On Sat, Jan 11, 2014 at 08:13:39PM -0200, Mariano Reingart wrote: > AFAIK (and just for the record), there could be both Latin1 text and UTF-16 > in a PDF (and other encodings too), depending on the font used: [...] > In Python2, txt is just a str, but in Python3 handling everything as latin1 > string obviously doesn't work for TTF in this case. Nobody is suggesting that you use Latin-1 for *everything*. We're suggesting that you use it for blobs of binary data that represent arbitrary bytes. First you have to get your binary data in the first place, using whatever technique is necessary. Here's one way to get a blob of binary data: # encode four C shorts into a fixed-width struct struct.pack(">hhhh", 23, 42, 17, 99) Here's another way: # encode a text string into UTF-16 "My name is Steven".encode("utf-16be") Both examples return a bytes object containing arbitrary bytes. How do you combine those arbitrary bytes with a string template while still keeping all code-points under U+0100? By decoding to Latin-1. -- Steven From brett at python.org Sun Jan 12 02:37:50 2014 From: brett at python.org (Brett Cannon) Date: Sat, 11 Jan 2014 20:37:50 -0500 Subject: [Python-Dev] test.support.check_warnings In-Reply-To: <52D1AD4D.8000701@stoneleaf.us> References: <52D1AD4D.8000701@stoneleaf.us> Message-ID: On Sat, Jan 11, 2014 at 3:45 PM, Ethan Furman wrote: > The docs say this [1]: > ========================================================== > test.support.check_warnings(*filters, quiet=True) > > A convenience wrapper for warnings.catch_warnings() that makes it > easier to test that a warning was correctly raised. 
It is approximately
> equivalent to calling warnings.catch_warnings(record=True) with
> warnings.simplefilter() set to always and with the option to automatically
> validate the results that are recorded.
>
> check_warnings accepts 2-tuples of the form ("message regexp",
> WarningCategory) as positional arguments. If one or more filters are
> provided, or if the optional keyword argument quiet is False, it checks to
> make sure the warnings are as expected: each specified filter must match at
> least one of the warnings raised by the enclosed code or the test fails,
> and if any warnings are raised that do not match any of the specified
> filters the test fails. To disable the first of these checks, set quiet to
> True.
> ==========================================================
>
> What I want is to make sure that DeprecationWarnings are being raised:
> ==========================================================
> with support.check_warnings(
>         ("automatic int conversions have been deprecated",
> DeprecationWarning),
>         quiet=False,
>         ):
>     exec("'%x' % pi")
>     exec("'%x' % 3.14")
>     exec("'%X' % 2.11")
>     exec("'%o' % 1.79")
>     exec("'%c' % pi")
> ==========================================================
>
> But if I throw in something that doesn't raise a deprecation warning, the
> test still passes:
> ==========================================================
>     exec("'%d' % 3")
> ==========================================================
>
> Am I doing something wrong?
>

You're assuming the context manager is doing something magical to verify that all calls in the block raise the expected exception. What you want to do is execute it in a loop::

  for test in (...):
    with support.check_warnings(("automatic int conversions have been deprecated", DeprecationWarning), quiet=False):
      exec(test)

-------------- next part --------------
An HTML attachment was scrubbed...
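The difference between checking a whole block and checking each call can be sketched with the public warnings module alone. This is an illustration under stated assumptions: `legacy_hex` is a hypothetical stand-in for code that deprecates float-to-int conversion, and `warnings.catch_warnings(record=True)` is used in place of test.support.check_warnings.

```python
import warnings


def legacy_hex(value):
    """Hypothetical function that warns when handed a float."""
    if isinstance(value, float):
        warnings.warn("automatic int conversions have been deprecated",
                      DeprecationWarning, stacklevel=2)
        value = int(value)
    return '%x' % value


def warns_deprecation(func, arg):
    """Return True if calling func(arg) emits a DeprecationWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func(arg)
    return any(issubclass(w.category, DeprecationWarning) for w in caught)


# One context manager around everything: a single warning satisfies the check,
# so the silent call with 3 goes unnoticed.
with warnings.catch_warnings(record=True) as caught_block:
    warnings.simplefilter("always")
    for arg in (3.14, 3):
        legacy_hex(arg)
block_ok = any(issubclass(w.category, DeprecationWarning) for w in caught_block)

# One context manager per call: the silent call is detected.
per_call = {arg: warns_deprecation(legacy_hex, arg) for arg in (3.14, 2.11, 3)}
```

In this sketch `block_ok` is true even though one call never warned, while `per_call` records a false entry for the argument 3.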
URL: From cs at zip.com.au Sun Jan 12 03:05:04 2014 From: cs at zip.com.au (Cameron Simpson) Date: Sun, 12 Jan 2014 13:05:04 +1100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: Message-ID: <20140112020504.GA42414@cskk.homeip.net> On 11Jan2014 13:15, Juraj Sukop wrote: > On Sat, Jan 11, 2014 at 5:14 AM, Cameron Simpson wrote: > > data = b' '.join( bytify( [ 10, 0, obj, binary_image_data, ... ] ) ) > > Thanks for the suggestion! The problem with "bytify" is that some items > might require different formatting than other items. For example, in > "Cross-Reference Table" there are three different formats: non-padded > integer ("1"), 10- and 15digit integer, ("0000000003", "65535"). Well, this is partly my point: you probably want to exert more control that is reasonable for the PEP to offer, and you're better off with a helper function of your own. In particular, aside from passing in a default char=>bytes encoding, you can provide your own format hooks. In particular, str already provides a completish % suite and you have no issue with encodings in that phase because it is all Unicode. So the points where you're treating PDF as text are probably best tackled as text and then encoded with a helper like bytify when you have to glom bytes and "textish" stuff together. Crude example, hacked up from yours: data = b''.join( bytify( ("%d %d obj ... stream" % (10, 0)), binary_image_data, "endstream endobj", ))) where bytify swallows your encoding decisions. Since encoding anything-not-bytes into a bytes sequence inherently involves an encoding decision, I think I'm +1 on the PEP's aim of never mixing bytes with non-bytes, keeping all the encoding decisions in the caller's hands. I quite understand not wanting to belabour the code with ".encode('ascii')" but that should be said somewhere, so best to do so yourself in as compact and ergonomic fashion as possible. Cheers, -- Cameron Simpson Serious error. 
All shortcuts have disappeared. Screen. Mind. Both are blank. - Haiku Error Messages http://www.salonmagazine.com/21st/chal/1998/02/10chal2.html From kristjan at ccpgames.com Sun Jan 12 03:11:23 2014 From: kristjan at ccpgames.com (=?gb2312?B?S3Jpc3RqqKJuIFZhbHVyIEqorm5zc29u?=) Date: Sun, 12 Jan 2014 02:11:23 +0000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: , Message-ID: No, I don't think it is. The purpose is to make it easier to work with bytes objects. There can be no python 2 compatibility when it comes to bytes/unicode conversion. ________________________________________ From: Python-Dev [python-dev-bounces+kristjan=ccpgames.com at python.org] on behalf of Serhiy Storchaka [storchaka at gmail.com] Sent: Saturday, January 11, 2014 21:01 To: python-dev at python.org Subject: Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake 11.01.14 21:40, Kristj?n Valur J?nsson ???????(??): > How about a compromise? > Personally, I think adding the full complement of integer/float formatting to bytes is a bit over the top. > How about just supporting two format specifiers? > %b : interpolate a bytes object. If it doesn't have the buffer interface, error. > %s : interpolate a str object, encoded to ASCII using 'strict' conversion. %b is not supported in Python 2.7. And compatibility with Python 2.7 is only the purpose of this feature. From brett at python.org Sun Jan 12 03:13:40 2014 From: brett at python.org (Brett Cannon) Date: Sat, 11 Jan 2014 21:13:40 -0500 Subject: [Python-Dev] byteformat() proposal: please critique In-Reply-To: References: Message-ID: On Sat, Jan 11, 2014 at 8:20 PM, Terry Reedy wrote: > The following function interpolates bytes, bytearrays, and formatted > strings, the latter two auto-converted to bytes, into a bytes (or > auto-converted bytearray) format. This function automates much of what some > people have recommended for combining ascii text and binary blogs. 
The test > passes on 2.7.6 as well as 3.3.3, though a 2.7-only version would be > simpler. > =============== > > # bf.py -- Terry Jan Reedy, 2014 Jan 11 > "Define byteformat(): a bytes version of str.format as a function." > import re > > def byteformat(form, obs): > '''Return bytes-formated objects interpolated into bytes format. > > The bytes or bytearray format has two types of replacement fields. > b'{}' and b'{:}': The object can be any raw bytes or bytearray object. > b'{:}: The object can by any object ob that can be > string-formated with . Bytearray are converted to bytes. > > The text encoding is the default (encoding="utf-8", errors="strict"). > Users should be explicitly encode to bytes for any other encoding. > The struct module can by used to produce bytes, such as binary-formated > integers, that are not encoded text. > > Test passes on both 2.7.6 and 3.3.3. > ''' > > if isinstance(form, bytearray): > form = bytes(form) > fields = re.split(b'{:?([^}]*)}', form) > # print(fields) > if len(fields) != 2*len(obs)+1: > raise ValueError('Number of replacement fields not same as > len(obs)') > j = 1 # index into fields > for ob in obs: > if isinstance(ob, bytearray): > ob = bytes(ob) > field = fields[j] > fields[j] = format(ob, field.decode()).encode() if field else ob > j += 2 > return b''.join(fields) > > # test code > bformat = b"bytes: {}; bytearray: {:}; unicode: {:s}; int: {:5d}; float: > {:7.2f}; end" > objects = (b'abc', bytearray(b'def'), u'ghi', 123, 12.3) > result = byteformat(bformat, objects) > result2 = byteformat(bytearray(bformat), objects) > strings = (ob.decode() if isinstance(ob, (bytes, bytearray)) else ob > for ob in objects) > expect = bformat.decode().format(*strings).encode() > > #print(result) > #print(result2) > print(expect) > assert result == result2 == expect > > ===== > This has been edited from what I posted to issue 3982 to expand the > docstrings and to work the same with both bytes and bytearrays on both 2.7 > and 3.3. 
When I posted before, I though of it merely as a proof-of-concept > prototype. After reading the seemingly endless discussion of possible > variations of byte formatting with % and .format, I now present it as a > real, concrete, proposal. > > There are, of course, details that could be tweaked. The encoding uses the > default, which on 3.x is (encoding='utf-8', errors='strict'). This could > be changed to an explicit encoding='ascii'. If that were done, the encoding > could be made a parameter that defaults to 'ascii'. The joiner could be > defined as type(form)() so the output type matches the input form type. I > did not do that because it complicates the test. > With that flexibility this matches what I have been mulling in the back of my head all day. Basically everything that goes in is assumed to be bytes unless {:s} says to expect something which can be passed to str() and then use some specified encoding in all instances (stupid example following as it might be easier with bytes.join, but it gets the point across):: formatter = format_bytes('latin1', 'strict') http_response = formatter(b'Content-Type: {:s}\r\n\r\nContent-Length: {:s}\r\n\r\n{}', 'image/jpeg', len(data), data) Nothing fancy, just an easy way to handle having to call str.encode() on every text argument that is to end up as bytes as Terry is proposing (and I'm fine with defaulting to ASCII/strict with no arguments). Otherwise you do what R. David Murray suggested and just have people rely on their own API which accepts what they want and then spits out what they want behind the scenes. It basically comes down to how much tweaking of existing Python 2.7 %/.format() calls people will be expected to make. I'm fine with asking people to call a function like what Terry is proposing as it can do away with baking in that ASCII is reasonable as well as not require a bunch of work without us having to argue over what bytes.format() should or should not do. 
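The factory idea above could look roughly like the following sketch. It rests on assumptions: `format_bytes` is the hypothetical name used in this message rather than an existing API, only the two field forms `{}` and `{:s}` are recognized, and `{}` accepts only objects that export a buffer.

```python
import re

# Matches b'{}' (group is None) and b'{:s}' (group is b':s').
_FIELD = re.compile(rb'\{(:s)?\}')


def format_bytes(encoding, errors='strict'):
    """Hypothetical factory returning a formatter bound to one text encoding."""
    def formatter(template, *args):
        pieces = _FIELD.split(template)
        literals, specs = pieces[0::2], pieces[1::2]
        if len(specs) != len(args):
            raise ValueError('number of fields does not match number of args')
        out = [literals[0]]
        for spec, arg, literal in zip(specs, args, literals[1:]):
            if spec is None:
                # {}: raw bytes only; ints and str raise TypeError here.
                out.append(memoryview(arg).tobytes())
            else:
                # {:s}: anything str() accepts, encoded explicitly.
                out.append(str(arg).encode(encoding, errors))
            out.append(literal)
        return b''.join(out)
    return formatter


formatter = format_bytes('ascii')
data = b'\xff\xd8\xff\xe0'
http_response = formatter(
    b'Content-Type: {:s}\r\nContent-Length: {:s}\r\n\r\n{}',
    'image/jpeg', len(data), data)
```

Because the encoding is chosen once when the formatter is created, the caller still makes the encoding decision explicitly, and non-ASCII text passed to `{:s}` fails loudly under 'ascii' with 'strict'.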
Personally I say bytes.format() is fine but it shouldn't do any text encoding which makes its usefulness rather minor (much like the other text-like methods that got carried forward in hopes that they would be useful to people porting code; maybe we should consider taking them out in Python 4 or something if we find out no one is using them). > > The coercion of interpolated bytearray objects to bytes is needed for 2.7 > because in 2.7, str/bytes.join raises TypeError for bytearrays in the input > sequence. A 3.x-only version could drop this. > > One objection to the function is that it is neither % or .format. To me, > this is an advantage in that a new function will not be expected to exactly > match the % or .format behavior in either 2.x or 3.x. It eliminates the > 'matching the old' arguments so we can focus on what actual functionality > is needed. Agreed. > There is no need to convert true binary bytes to text with either latin-1 > or surrogates. There is no need to add anything to bytes. The code above > uses the built-in facilities that we already have, which to me should be > the first thing to try, not the last. > I think we are all losing sight of the fact that we are talking about Python 3.5 here. Even with an accelerated release schedule of a year that is still a year away! I think any proposal being made should be prototyped in pure Python and tried on a handful or real world examples to see how the results end up looking like to measure how useful they are on their own and how much work it is to port to using it. I think the goal should be a balance and not going to an extreme to minimize porting work from Python 2.7 at the cost of polluting the bytes/string separation and letting people entirely ignore encoding of strings. > > One new feature that does not match old behavior is that {} and {:} are > changed (in 3.x) to indicate bytes whereas {:s} continues to indicate (in > 3.x) unicode text. 
({:s} might be changed to mean unicode for 2.7 also, but > I did not explore that idea.) Similarly, a new function is free to borrow > only the format_spec part of replace of replacement fields and use > format(ob, format_spec) to format each object. Anyone who needs the full > power of str.format is free to use it explicitly. I think format_specs > cover most of what people have asked for. > > For future releases, the function could go in the string module. It could > otherwise be added to existing or future 2&3 porting packages. I don't think the string module is the right place since this is meant to operate on bytes, but then again I don't know where it would end up if it went into the stdlib. If we have it take the string encoding arguments it could be a method on the bytes type by being a factory method:: formatter = bytes.formatter('latin1', 'strict') ... I would be willing to go as far as making 'strict' the default 'error' argument, but I would say it's still go to make people specify even 'ascii', otherwise people lose sight that bytes([ord(1)]) == b'1' == '1'.encode('ascii') != 1 .to_bytes(1, 'big') and that is a key thing to grasp. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Sun Jan 12 03:29:11 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 12 Jan 2014 13:29:11 +1100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <52D19600.1070106@stoneleaf.us> References: <20140111053639.GM3869@ando> <20140111153837.GP3869@ando> <52D16F4B.6080900@stoneleaf.us> <20140111183631.GQ3869@ando> <52D19600.1070106@stoneleaf.us> Message-ID: <20140112022909.GX3869@ando> On Sat, Jan 11, 2014 at 11:05:36AM -0800, Ethan Furman wrote: > On 01/11/2014 10:36 AM, Steven D'Aprano wrote: > >On Sat, Jan 11, 2014 at 08:20:27AM -0800, Ethan Furman wrote: > >> > >> unicode to bytes > >> bytes to unicode using latin1 > >> unicode to bytes > > > >Where do you get this from? I don't follow your logic. Start with a text > >template: > > > >template = """\xDE\xAD\xBE\xEF > >Name:\0\0\0%s > >Age:\0\0\0\0%d > >Data:\0\0\0%s > >blah blah blah > >""" > > > >data = template % ("George", 42, blob.decode('latin-1')) Since the use-cases people have been speaking about include only ASCII (or at most, Latin-1) text and arbitrary binary bytes, my example is limited to showing only ASCII text. But it will work with any text data, so long as you have a well-defined format that lets you tell which parts are interpreted as text and which parts as binary data. If your file format is not well-defined, then you have bigger problems than dealing with text versus bytes. > >Only the binary blobs need to be decoded. We don't need to encode the > >template to bytes, and the textual data doesn't get encoded until we're > >ready to send it across the wire or write it to disk. > > And what if your name field has data not representable in latin-1? > > --> '\xd1\x81\xd1\x80\xd0\x83'.decode('utf8') > u'\u0441\u0440\u0403' Where did you get those bytes from? You got them from somewhere. Who knows? Who cares? 
Once you have bytes, you can treat them as a blob of arbitrary bytes and write them to the record using the Latin-1 trick. If you're reading those bytes from some stream that gives you bytes, you don't have to care where they came from. But what if you don't start with bytes? If you start with a bunch of floats, you'll probably convert them to bytes using the struct module. If you start with non-ASCII text, you have to convert them to bytes too. No difference here. You ask the user for their name, they answer "???" which is given to you as a Unicode string, and you want to include it in your data record. The specifications of your file format aren't clear, so I'm going to assume that: 1) ASCII text is allowed "as-is" (that is, the name "George" will be in the final data file as b'George'); 2) any other non-ASCII text will be encoded as some fixed encoding which we can choose to suit ourselves; (if the encoding is fixed by the file format, then just use that) 3) arbitrary binary data is allowed "as-is" (i.e. byte N has to end up being written as byte N, for any value of N between 0 and 255). So, to write the ASCII name "George", we can just "Name:\0\0\0%s" % "George" since we know it is already ASCII. (It's a literal, so that's obvious. But see below.) To write arbitrary binary data, we take the *bytes* and decode to Latin-1: blob = bunch_o_bytes() # Completely arbitrary. "Data:\0\0\0%s" % blob.decode('latin-1')) Combine those two techniques to deal with non-ASCII names. First you have to get the non-ASCII name converted to *arbitrary bytes*, so any encoding that deals with the whole range of Unicode will do. Then you convert those arbitary bytes into Latin-1. 
Here I'll use UTF-32, just because I can and I feel like being wasteful: "Name:\0\0\0%s" % "???".encode("utf-32be").decode("latin-1") UTF-8 is a better choice, because it doesn't use as much space and gives you something which looks like ASCII in a hex editor: name = "George" if random.random() < 0.5 else "???" "Name:\0\0\0%s" % name.encode("utf-8").decode("latin-1") If you don't know whether your name is pure ASCII, then you have to encode first. Otherwise how do you know what bytes to use? Aside: if this point is not *bleedingly obvious*, then you need to read Joel on Software on Unicode RIGHT NOW. http://www.joelonsoftware.com/articles/Unicode.html? If the name data happens to be pure ASCII, then encoding to UTF-8 and decoding to Latin-1 ends up being a no-op: py> "George".encode("utf-8").decode("latin-1") 'George' Of course, if I know that the name is ASCII ahead of time (I wrote it as a literal, so I think I would know...) then I can short-cut the whole process and just do this: "Name:\0\0\0%s" % name_which_is_guaranteed_to_be_ascii If I screw up and insert a non-Latin-1 character, then when I eventually write it to a file, it will give me a Unicode error, exactly as it should. I've assumed that I can pick the encoding. That's rather like assuming that, given a bunch of floats, I can pick whether to represent them as C doubles or singles or something else, whatever suits my purposes. If I'm dealing with some existing file format, it probably defines the encoding, either explicitly or implicitly. When I don't have the choice of encoding, but have to use some damned stupid legacy encoding that only includes a fraction of Unicode, then: name.encode("legacy encoding", errors="whatever") will give me the bytes I need to use the Latin-1 trick on. 
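Putting the steps of this message together, a minimal end-to-end sketch might look like this. The record layout follows the template shown earlier; `build_record` is a hypothetical name, and UTF-8 is assumed as the encoding for names because the file format in this discussion leaves that choice open.

```python
import struct


def build_record(name, age, blob):
    """Hypothetical helper: fill a text template, then write it out as bytes."""
    template = ("\xDE\xAD\xBE\xEF\n"
                "Name:\0\0\0%s\n"
                "Age:\0\0\0\0%d\n"
                "Data:\0\0\0%s\n")
    text = template % (
        name.encode('utf-8').decode('latin-1'),  # text -> bytes -> Latin-1 text
        age,
        blob.decode('latin-1'),                  # arbitrary bytes -> Latin-1 text
    )
    # Every code point is below U+0100, so this final encode cannot fail
    # and reproduces each original byte exactly.
    return text.encode('latin-1')


blob = struct.pack(">hhhh", 23, 42, 17, 99)
ascii_record = build_record("George", 42, blob)
cyrillic_record = build_record("\u0441\u0440\u0403", 42, blob)
```

The pure-ASCII name passes through unchanged, while the non-ASCII name ends up in the record as its UTF-8 bytes.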
This whole thing can be wrapped in a tiny one-line helper function:

    def bytify(text, encoding="utf-8", errors="ignore"):
        # pick your own appropriate encoding and error handler
        return text.encode(encoding, errors).decode('latin-1')

> --> '\xd1\x81\xd1\x80\xd0\x83'.decode('utf8').encode('latin1')
> Traceback (most recent call last):
>   File "", line 1, in
> UnicodeEncodeError: 'latin-1' codec can't encode characters in position
> 0-2: ordinal not in range(256)

That is backwards to what I've shown. Look at my earlier example again:

    data = template % ("George", 42, blob.decode('latin-1'))

Bytes get DECODED to latin-1, not encoded.

    Bytes -> text is *decoding*
    Text -> bytes is *encoding*

> So really your example should be:
>
> data = template %
> ("George".encode('some_non_ascii_encoding_such_as_cp1251').decode('latin-1'), 42, blob.decode('latin-1'))
>
> Which is a mess.

Obviously it is stupid and wasteful to do that to a literal that you
know is ASCII. But if you don't know what the contents of the string
are, how do you know what bytes need to be written unless you encode to
bytes first?

-- 
Steven

From ethan at stoneleaf.us Sun Jan 12 02:53:43 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sat, 11 Jan 2014 17:53:43 -0800
Subject: [Python-Dev] byteformat() proposal: please critique
In-Reply-To:
References:
Message-ID: <52D1F5A7.1030600@stoneleaf.us>

On 01/11/2014 05:20 PM, Terry Reedy wrote:
> The following function . . .

Thanks, Terry, for doing that.
-- ~Ethan~ From ncoghlan at gmail.com Sun Jan 12 03:53:00 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 12 Jan 2014 12:53:00 +1000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <52D178F7.7070401@stoneleaf.us> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D0AC59.9070908@stoneleaf.us> <52D178F7.7070401@stoneleaf.us> Message-ID: On 12 Jan 2014 03:29, "Ethan Furman" wrote: > > On 01/11/2014 12:43 AM, Nick Coghlan wrote: >> >> >> In particular, the bytes type is, and always will be, designed for >> pure binary manipulation [...] > > > I apologize for being blunt, but this is a lie. > > Lets take a look at the methods defined by bytes: > >>>> dir(b'') > > ['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'center', 'count', 'decode', 'endswith', 'expandtabs', 'find', 'fromhex', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill'] > > Are you really going to insist that expandtabs, isalnum, isalpha, isdigit, islower, isspace, istitle, isupper, ljust, lower, lstrip, rjust, splitlines, swapcase, title, upper, and zfill are pure binary manipulation methods? Do you think I don't know that? However, those are all *in-place* modifications. 
Yes, they assume ASCII compatible formats, but they're a far cry from encouraging combination of data from potentially different sources. I'm also on record as considering this a design decision I regret, precisely because it has resulted in experienced Python 2 developers failing to understand that the Python 3 text model is *different* and they may need to create a new type. > > Let's take a look at the repr of bytes: > >>>> bytes([48, 49, 50, 51]) > > b'0123' > > Wow, that sure doesn't look like binary data! > > Py3 did not go from three text models to two, it went to one good one (unicode strings) and one broken one (bytes). If the aim was indeed for pure binary manipulation, we failed. We left in bunches of methods which can *only* be interpreted as supporting ASCII manipulation. No, no, no. We made some concessions in the design of the bytes type to *ease* development and debugging of ASCII compatible protocols *where we believed we could do so without compromising the underlying text model changes. Many experienced Python 2 developers are now suffering one of the worst cases of paradigm lock I have ever seen as they keep trying to make the Python 3 text model the same as the Python 2 one instead of actually learning how Python 3 works and recognising that they may actually need to create a new type for their use case and then potentially seek core dev assistance if that type reveals new interoperability bugs in the core types (or encounters old ones). > > Due to backwards compatibility we cannot now finish yanking those out, so either we live with a half-dead class screaming "I want be ASCII! I want to be ASCII!" or add back the missing functionality. No, we don't - we treat the core bytes type as PEP 460 does, by adding a *new* feature proposed by a couple people writing native Python 3 libraries like asyncio that makes binary formats easier to deal with without carrying forward even *more* broken assumptions from the Python 2 text model. 
(Remember, I'm in favour of Antoine's updated PEP, because it's a real spec for a new feature, rather than yet another proposal to bolt on even more text specific formatting features from someone that has never bothered to understand the reasons for the differences between the two versions). People that want a full hybrid type back can then pursue the custom extension type approach. Cheers, Nick. > > > -- > ~Ethan~ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Jan 12 04:09:01 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 12 Jan 2014 13:09:01 +1000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: Message-ID: On 12 Jan 2014 03:44, "Victor Stinner" wrote: > > Hi, > > I'm in favor of adding support of formatting integer and floatting > point numbers in the PEP 460: %d, %u, %o, %x, %f with padding and > precision (%10d, %010d, %1.5f) and sign (%-i, %+i) but without > alternate format ("{:#x}"). %s would also accept int and float for > convenience. > > int and float subclasses would not be handled differently, their > __str__ and __format__ would be ignored. > > Other int-like and float-like types (ex: defining __int__ or > __index__) are not supported. Explicit cast would be required. asciistr will support the *full* text formatting API, so I don't see any reason to add this complexity to the core bytes type. However, I like the basic binary interpolation feature proposed by the current version of the PEP - it's a nice convenience method that doesn't compromise the text model by introducing implicit serialisation of other types (whether text or numbers). 
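
To make the distinction concrete, a minimal sketch of explicit
serialisation, where the caller spells out the number-to-ASCII step before
combining pieces that are already bytes, could look like this. The function
name content_length_header is hypothetical and is not taken from the PEP.

```python
def content_length_header(length):
    """Build one header line from pieces that are already bytes."""
    # The int -> text -> ASCII bytes conversion is written out by the
    # caller; nothing is converted implicitly by the bytes type itself.
    digits = str(length).encode("ascii")
    return b"".join([b"Content-Length: ", digits, b"\r\n"])


header = content_length_header(123)
```

Here only binary data is ever joined to binary data, which is the side of
the line that the binary interpolation feature is described as staying on.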
For Python 2 folks trying to grok where the "bright line" is in terms of the Python 3 text model: if your proposal includes *any* kind of implicit serialisation of non binary data to binary, it is going to be rejected as an addition to the core bytes type. If it avoids crossing that line (as the buffer-API-only version of PEP 460 does), then we can talk. Folks that want implicit serialisation (and I agree it has its uses) should go help Benno get asciistr up to speed. Cheers, Nick. > > For %s, the choice between string and number is made using > "(PyLong_Check() || PyFloat_Check())". > > If you agree, I will modify the PEP. If Antoine disagree, I will fork > the PEP 460 ;-) > > --- > > %s should not support precision (ex: %.100s), use Unicode for that. > > --- > > The PEP 460 should not reintroduce bytes+unicode, implicit decoding or > implement encoding. > > b'x=%s' % 10 is well defined, it's pure bytes. If you consider that > bytes should not contain text, why does the bytes type have methods > like isalpha() or upper()? And why binary files have a readline() > method? A "line" doesn't mean anything in pure bytes. > > It's an example of "practicality beats purity". Python 3 should not > enforce Unicode if the developers *chose* to use bytes to handle mixed > binary/text protocols like HTTP. > > But I'm against of adding "%r" and "%a" because they use Unicode and > would require an implicit encoding. type(ascii(obj)) is str, not > bytes. If you really want to use repr() and ascii(), encode the result > explicitly. > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From larry at hastings.org Sun Jan 12 04:35:34 2014 From: larry at hastings.org (Larry Hastings) Date: Sat, 11 Jan 2014 19:35:34 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <20140108100814.408a91f2@anarchist.wooz.org> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> Message-ID: <52D20D86.6030502@hastings.org> On 01/08/2014 07:08 AM, Barry Warsaw wrote: > How hard would it be to put together some sample branches that provide > concrete examples of the various options? > > My own opinion could easily be influenced by having some hands-on time with > actual code, and I suspect even Guido could be influenced if he could pull > some things up in his editor and take a look around. I've uploaded a prototype here: https://bitbucket.org/larry/python-clinic-buffer It's a clone of Python trunk, so if you already have a trunk handy, clone that first then "hg pull -u" from the above and it'll go a lot quicker. The prototype adds some commands to Argument Clinic that allow you to specify where each bit of its output goes. You have four choices: * You can write to the output block as before. * You can buffer up the text for writing out later in the same file. * You can write to a file on the side. * Or you can throw it away. To learn how to run your own experiments, read "CLINIC.BUFFER.NOTES.TXT" in the root of the repository. For your tl;dr pleasure I've included recipes for the proposed approaches so far. I don't propose to check in the prototype in its current state. But it should be sufficient for running everybody's experiments. (If there's something you want to try that my prototype doesn't support, contact me and I should be able to throw in a feature for you.) Happy experimenting, //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Sun Jan 12 05:38:49 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 11 Jan 2014 20:38:49 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140112022909.GX3869@ando> References: <20140111053639.GM3869@ando> <20140111153837.GP3869@ando> <52D16F4B.6080900@stoneleaf.us> <20140111183631.GQ3869@ando> <52D19600.1070106@stoneleaf.us> <20140112022909.GX3869@ando> Message-ID: <52D21C59.2010600@stoneleaf.us> On 01/11/2014 06:29 PM, Steven D'Aprano wrote: > On Sat, Jan 11, 2014 at 11:05:36AM -0800, Ethan Furman wrote: >> On 01/11/2014 10:36 AM, Steven D'Aprano wrote: >>> On Sat, Jan 11, 2014 at 08:20:27AM -0800, Ethan Furman wrote: >>>> >>>> unicode to bytes >>>> bytes to unicode using latin1 >>>> unicode to bytes >>> >>> Where do you get this from? I don't follow your logic. Start with a text >>> template: >>> >>> template = """\xDE\xAD\xBE\xEF >>> Name:\0\0\0%s >>> Age:\0\0\0\0%d >>> Data:\0\0\0%s >>> blah blah blah >>> """ >>> >>> data = template % ("George", 42, blob.decode('latin-1')) > > Since the use-cases people have been speaking about include only ASCII > (or at most, Latin-1) text and arbitrary binary bytes, my example is > limited to showing only ASCII text. But it will work with any text data, > so long as you have a well-defined format that lets you tell which parts > are interpreted as text and which parts as binary data. Since you're talking to me, it would be nice if you addressed the same use-case I was addressing, which is mixed: ascii-encoded text, ascii-encoded numbers, ascii-encoded bools, binary-encoded numbers, and misc-encoded text. And no, your example will not work with any text, it would completely moji-bake my dbf files. >>> Only the binary blobs need to be decoded. We don't need to encode the >>> template to bytes, and the textual data doesn't get encoded until we're >>> ready to send it across the wire or write it to disk. No! 
When I have text, part of which gets ascii-encoded and part of which gets, say, cp1251 encoded, I cannot wait till the end! >> And what if your name field has data not representable in latin-1? >> >> --> '\xd1\x81\xd1\x80\xd0\x83'.decode('utf8') >> u'\u0441\u0440\u0403' > > Where did you get those bytes from? You got them from somewhere. For the sake of argument, pretend a user entered them in. > Who knows? Who cares? Once you have bytes, you can treat them as a blob of > arbitrary bytes and write them to the record using the Latin-1 trick. No, I can't. See above. > If > you're reading those bytes from some stream that gives you bytes, you > don't have to care where they came from. You're kidding, right? If I don't know where they came from (a graphics field? a note field?) how am I going to know how to treat them? > But what if you don't start with bytes? If you start with a bunch of > floats, you'll probably convert them to bytes using the struct module. Yup, and I do. > If you start with non-ASCII text, you have to convert them to bytes too. > No difference here. Really? You just said above that "it will work with any text data" -- you can't have it both ways. > You ask the user for their name, they answer "???" which is given to you > as a Unicode string, and you want to include it in your data record. The > specifications of your file format aren't clear, so I'm going to assume > that: > > 1) ASCII text is allowed "as-is" (that is, the name "George" will be > in the final data file as b'George'); User data is not (typically) where the ASCII data is, but some of the metadata is definitely and always ASCII. The user text data needs to be encoded using whichever codec is specified by the file, which is only occasionally ASCII. > 2) any other non-ASCII text will be encoded as some fixed encoding > which we can choose to suit ourselves; Well, the user chooses it, we have to abide by their choice. (It's kept in the file metadata.) 
> 3) arbitrary binary data is allowed "as-is" (i.e. byte N has to end up > being written as byte N, for any value of N between 0 and 255). In a couple field types, yes. Usually the binary data is numeric or date related and there is conversion going on there, too, to give me the bytes I need. [snip] >> --> '\xd1\x81\xd1\x80\xd0\x83'.decode('utf8').encode('latin1') >> Traceback (most recent call last): >> File "", line 1, in >> UnicodeEncodeError: 'latin-1' codec can't encode characters in position >> 0-2: ordinal not in range(256) > > That is backwards to what I've shown. Look at my earlier example again: And you are not paying attention: '\xd1\x81\xd1\x80\xd0\x83'.decode('utf8').encode('latin1') \--------------------------------------/ \-------------/ a non-ascii compatible unicode string to latin1 bytes ("???".encode('some_non_ascii_encoding_such_as_cp1251').decode('latin-1'), 42, blob.decode('latin-1')) \----------------------------------------------/ \--------------/ getting the actual bytes I need and back into unicode until I write them later You did say to use a *text* template to manipulate my data, and then write it later, no? Well, this is what it would look like. > Bytes get DECODED to latin-1, not encoded. > > Bytes -> text is *decoding* > Text -> bytes is *encoding* Pretend for a moment I know that, and look at my examples again. I am demonstrating the contortions needed when my TEXTual data is not ASCII-compatible: It must be ENcoded using the appropriate codec to BYTES, then DEcoded back to unicode using latin1, all so later I can ENcode the bloomin' unicode data structure back to bytes using latin1 again. Dizzy yet? And you must know this, because it is what your bytify function does. Are you trolling? 
-- ~Ethan~ From ncoghlan at gmail.com Sun Jan 12 08:09:21 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 12 Jan 2014 17:09:21 +1000 Subject: [Python-Dev] byteformat() proposal: please critique In-Reply-To: References: Message-ID: On 12 January 2014 12:13, Brett Cannon wrote: > With that flexibility this matches what I have been mulling in the back of > my head all day. Basically everything that goes in is assumed to be bytes > unless {:s} says to expect something which can be passed to str() and then > use some specified encoding in all instances (stupid example following as it > might be easier with bytes.join, but it gets the point across):: > > formatter = format_bytes('latin1', 'strict') > http_response = formatter(b'Content-Type: {:s}\r\n\r\nContent-Length: > {:s}\r\n\r\n{}', 'image/jpeg', len(data), data) > > Nothing fancy, just an easy way to handle having to call str.encode() on > every text argument that is to end up as bytes as Terry is proposing (and > I'm fine with defaulting to ASCII/strict with no arguments). Otherwise you > do what R. David Murray suggested and just have people rely on their own API > which accepts what they want and then spits out what they want behind the > scenes. > > It basically comes down to how much tweaking of existing Python 2.7 > %/.format() calls people will be expected to make. I'm fine with asking > people to call a function like what Terry is proposing as it can do away > with baking in that ASCII is reasonable as well as not require a bunch of > work without us having to argue over what bytes.format() should or should > not do. Personally I say bytes.format() is fine but it shouldn't do any text > encoding which makes its usefulness rather minor (much like the other > text-like methods that got carried forward in hopes that they would be > useful to people porting code; maybe we should consider taking them out in > Python 4 or something if we find out no one is using them). 
There are several that are useful for manipulating binary data *as binary data*, including some of those that assume ASCII compatibility. Even some of the odd ones (like bytes.title) which we considered deprecating around 3.2 or so (if I recall correctly) were left because they're useful for HTTP style headers. The thing about them all is that even though they do assume ASCII compatibility, they don't do any implicit conversions between raw bytes and other formats - they're all purely about transforming binary data. PEP 460 as it currently stands is in the same vein - it doesn't blur the lines between binary data and other formats, but it *does* make binary data easier to work with, and in a way that is a subset of what Python 2 8-bit strings allowed, further increasing the size of the Python 2/3 source compatible subset. The line that is crossed by suggestions like including number formatting in PEP 460 is that those suggestions *do* introduce implicit encoding from structured semantic data (a numeric value) to a serialised format (the ASCII text representation of that number). Implicitly encoding text (even with the ASCII codec and strict error handling) similarly blurs the line between binary and text data again, and is the kind of change that gets rejected as attempting to reintroduce the Python 2 text model back into the Python 3 core types. That said, while I don't think such a hybrid type is appropriate as part of the *core* text model, I agree that such a type *could* be useful when implementing protocol handling code. That's why I suggested "asciicompat" to Benno as the package name for the home of asciistr - I think it could be a good home for various utilities designed for working with ASCII compatible binary protocols using a more text-like API than that offered by the bytes type in Python 3. I actually see much of this debate as akin to that over the API changes between Google's original ipaddr module and the ipaddress API in the standard library. 
The original ipaddr API is fine *if you already know how IP networks work* - it plays fast and loose with terminology, but in a way that you can deal with if you already know the real meaning of the underlying concepts. However, anyone attempting to go the other way (learning IP networking concepts from the ipaddr API) will be hopelessly, hopelessly confused because the terminology is used in *very* loose ways. So ipaddress tightened things up and made the names more formally correct, aiming to make it usable both as an address manipulation library *and* as a way of learning the underlying IP addressing concepts. I see the Python 2 str type as similar to the ipaddr API - if you already know what you're doing when it comes to Unicode, then it's pretty easy to work with. However, if you're trying to use it to *learn* Unicode concepts, then you're pretty much stuffed, as you get lost in a maze of twisty values, as the same data type is used with very different semantics, depending on which end of a data transformation you're on (although sometimes you'll get a different data type, depending on the data *values* involved). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From g.brandl at gmx.net Sun Jan 12 08:12:04 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 12 Jan 2014 08:12:04 +0100 Subject: [Python-Dev] cpython (3.3): Update Sphinx toolchain. In-Reply-To: <52D1A58C.4020805@udel.edu> References: <3f1r8102P7z7LjX@mail.python.org> <52D1A58C.4020805@udel.edu> Message-ID: Am 11.01.2014 21:11, schrieb Terry Reedy: > On 1/11/2014 2:04 PM, georg.brandl wrote: >> http://hg.python.org/cpython/rev/87bdee4d633a >> changeset: 88413:87bdee4d633a >> branch: 3.3 >> parent: 88410:05e84d3ecd1e >> user: Georg Brandl >> date: Sat Jan 11 20:04:19 2014 +0100 >> summary: >> Update Sphinx toolchain.
>> >> files: >> Doc/Makefile | 8 ++++---- >> 1 files changed, 4 insertions(+), 4 deletions(-) >> >> >> diff --git a/Doc/Makefile b/Doc/Makefile >> --- a/Doc/Makefile >> +++ b/Doc/Makefile >> @@ -41,19 +41,19 @@ >> checkout: >> @if [ ! -d tools/sphinx ]; then \ >> echo "Checking out Sphinx..."; \ >> - svn checkout $(SVNROOT)/external/Sphinx-1.0.7/sphinx tools/sphinx; \ >> + svn checkout $(SVNROOT)/external/Sphinx-1.2/sphinx tools/sphinx; \ >> fi > > Doc/make.bat needs to be similarly updated. Indeed, thanks for the reminder. Georg From ethan at stoneleaf.us Sun Jan 12 08:10:43 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 11 Jan 2014 23:10:43 -0800 Subject: [Python-Dev] test.support.check_warnings In-Reply-To: References: <52D1AD4D.8000701@stoneleaf.us> Message-ID: <52D23FF3.3060307@stoneleaf.us> On 01/11/2014 05:37 PM, Brett Cannon wrote: > > You're assuming the context manager is doing something magical to verify that all calls in the block raise the expected > exception. What you want to do is execute it in a loop:: > > for test in (...): > with support.check_warnings(("automatic int conversions have been deprecated", DeprecationWarning), quiet=False): > exec(test) Well, this is test.support! I expect magic! ;) Thanks for setting me straight, got it working. -- ~Ethan~ From ncoghlan at gmail.com Sun Jan 12 08:48:52 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 12 Jan 2014 17:48:52 +1000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <52D1724D.4080307@egenix.com> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> <52D1724D.4080307@egenix.com> Message-ID: On 12 January 2014 02:33, M.-A. 
Lemburg wrote: > On 11.01.2014 16:34, Nick Coghlan wrote: >> While that was an *expedient* (and, in fact, necessary) solution at >> the time, the fact it is still thoroughly confusing people 13 years >> later shows it is not a *comprehensible* solution. > > FWIW: I quite liked the Python 2 model, but perhaps that's because > I already knew how Unicode works, so could use it to make my > life easier ;-) Right, I tried to capture that in http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#what-actually-changed-in-the-text-model-between-python-2-and-python-3 by pointing out that there are two *very* different kinds of code to consider when discussing text modelling. Application code lives in a nice clean world of structured data, text data and binary data, with clean conversion functions for switching between them. Boundary code, by contrast, has to deal with the messy task of translating between them all. The Python 2 text model is a convenient model for boundary code, because it implicitly allows switching between binary and text interpretations of a data stream, and that's often useful due to the way protocols and file formats are designed. However, that kind of implicit switching is thoroughly inappropriate for *application* code. So Python 3 switches the core text model to one where implicitly switching between the binary domain and the text domain is considered a *bad* thing, and we object strongly to any proposals which suggest blurring the boundaries again, since that is going back to a boundary code model rather than an application code one. I've been saying for years that we may need a third type, but it has been nigh on impossible to get boundary code developers to say anything more useful than "I preferred the Python 2 model, that was more convenient for me".
Yes, we know it was (we do maintain both of them, after all, and did the update for the standard library's own boundary code), but application developers are vastly more common, so boundary code developers lost out on that one and we need to come up with solutions that *respect* the Python 3 text model, rather than trying to change it back to the Python 2 one. > Seriously, Unicode has always caused heated discussions and > I don't expect this to change in the next 5-10 years. > > The point is: there is no 100% perfect solution either way and > when you acknowledge this, things don't look black and white anymore, > but instead full of colors :-) It would be nice if more boundary code developers actually did that rather than coming out with accusatory hyperbole and pining for the halcyon days of Python 2 where the text model favoured their use case over that of normal application developers. > Python 3 forces people to actually use Unicode; in Python 2 they > could easily avoid it. It's good to educate people on how it's > used and the issues you can run into, but let's not forget > that people are trying to get work done and we all love readable > code. > > PEP 460 just adds two more methods to the bytes object which come > in handy when formatting binary data; I don't think it has potential > to muddy the Python 3 text model, given that the bytes > object already exposes a dozen of other ASCII text methods :-) I dropped my objections to PEP 460 once Antoine fixed it to respect the boundaries between binary and text data. It's now a pure binary interpolation proposal, and one I think is a fine idea - there's no implicit encoding or decoding involved, it's just a tool for manipulating binary data. That leaves the implicit encoding and decoding to the third party asciistr type, as it should be. > asciistr is interesting in that it coerces to bytes instead > of to Unicode (as is the case in Python 2). 
Not quite - the idea of asciistr is that it is designed to be a *hybrid* type, like str was in Python 2. If it interacts with binary objects, it will give a binary result, if it interacts with text objects, it will give a text result. This makes it potentially suitable for use for constants in hybrid binary/text APIs like urllib.parse, allowing them to be implemented using a shared code path once again. The initial experimental implementation only works with 7 bit ASCII, but the UTF-8 caching in the PEP 393 implementation opens up the possibility of offering a non-strict mode in the future, as does the option of allowing arbitrary 8-bit data and disallowing interoperation with text strings in that case. > At the moment it doesn't cover the more common case bytes + str, > just str + bytes, but let's assume it would, Right, I suspect we have some overbroad PyUnicode_Check() calls in CPython that will need to be addressed before this substitution works seamlessly - that's one of the reasons I've been asking people to experiment with the idea since at least 2010 and let us know what doesn't work (nobody did though, until Benno agreed to try it out because it sounded like an interesting puzzle - I guess everyone else just found it easier to accuse us of being clueless idiots rather than considering trying to meet us halfway). > then you'd write > > ... > headers += asciistr('Length: %i bytes\n' % 123) If you're going to wait until *after* the formatting to do the conversion, you may as well just use encode explicitly: headers += ('Length: %i bytes\n' % 123).encode('ascii') The advantage of asciistr is that it allows you to abstract away the format strings for the headers in a way explicit encoding doesn't allow: FMT_LENGTH = asciistr('Length: %i bytes\n') headers += FMT_LENGTH % 123 headers += b'\n\n' body = b'...' 
socket.send(headers + body) You could do it inline as well: headers += asciistr('Length: %i bytes\n') % 123 But again, that doesn't offer a lot over simply explicitly encoding that fragment as ASCII. > With PEP 460, you could write the above as: > ... > headers += b'Length: %i bytes\n' % 123 > headers += b'\n\n' > body = b'...' > socket.send(headers + body) > ... > > IMO, that's more readable. At the cost of introducing an implicit encoding step again - it interpolates numbers into arbitrary binary sequences as ASCII text. That is thoroughly inappropriate in Python 3 - serialising semantically significant structured data (like numbers) as ASCII must always be opt in, either through environmental configuration (which has its own problems due to some undesirable default behaviour on POSIX systems - users will "opt in" to ASCII by mistake, not because they actually intended to), by passing it as an encoding argument, or by using a third party type like asciistr that is explicitly documented as only working with ASCII compatible data (whereas, with a couple of minor exceptions inherited from Python 2, the core bytes type is designed to work *correctly* with arbitrary binary data, and just has some *convenience* operations that assume ASCII data). > Both variants essentially do the same thing: they implicitly > coerce ASCII text strings to bytes, so conceptually, there's > little difference. There's all the difference in the world: asciistr is a separate third party type that is deliberately designed to only work correctly with ASCII compatible binary data. If you use it for data that *isn't* ASCII compatible, then the resulting data corruption is due to using the wrong type, rather than being an implicit behaviour of a builtin Python type. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jan 12 08:51:41 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 12 Jan 2014 17:51:41 +1000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140111183828.65323250113@webabinitio.net> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> <52D1724D.4080307@egenix.com> <20140111183828.65323250113@webabinitio.net> Message-ID: On 12 January 2014 04:38, R. David Murray wrote: > But! Our goal should be to help people convert to Python3. So how can > we find out what the specific problems are that real-world programs are > facing, look at the *actual code*, and help that project figure out the > best way to make that code work in both python2 and python3? > > That seems like the best way to find out what needs to be added to > python3 or pypi: help port the actual code of the developers who are > running into problems. > > Yes, I'm volunteering to help with this, though of course I can't promise > exactly how much time I'll have available. And, as has been the case for a long time, the PSF stands ready to help with funding credible grant proposals for Python 3 porting efforts. I believe some of the core devs (including David?) do freelance and contract work, so that's an option definitely worth considering if a project would like to support Python 3, but is having difficulty getting there with purely volunteer effort. Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jan 12 08:53:20 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 12 Jan 2014 17:53:20 +1000 Subject: [Python-Dev] [Python-checkins] cpython (3.3): Issue #19092 - Raise a correct exception when cgi.FieldStorage is given an In-Reply-To: <3f27By07XNz7LkH@mail.python.org> References: <3f27By07XNz7LkH@mail.python.org> Message-ID: On 12 January 2014 16:22, senthil.kumaran wrote: > summary: > Issue #19092 - Raise a correct exception when cgi.FieldStorage is given an > invalid file-obj. Also use __bool__ to determine the bool of the FieldStorage > object. > Library > ------- > > +- Issue #19097: Raise the correct Exception when cgi.FieldStorage is given an > + Invalid fileobj. You may want to tweak the tracker so the comment ends up on the appropriate issue (#19092 is something else entirely) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From songofacandy at gmail.com Sun Jan 12 09:12:00 2014 From: songofacandy at gmail.com (INADA Naoki) Date: Sun, 12 Jan 2014 17:12:00 +0900 Subject: [Python-Dev] cpython (3.3): Update Sphinx toolchain. In-Reply-To: References: <3f1r8102P7z7LjX@mail.python.org> <52D1A58C.4020805@udel.edu> Message-ID: What about using venv and pip instead of svn? On Sun, Jan 12, 2014 at 4:12 PM, Georg Brandl wrote: > Am 11.01.2014 21:11, schrieb Terry Reedy: > > On 1/11/2014 2:04 PM, georg.brandl wrote: > >> http://hg.python.org/cpython/rev/87bdee4d633a > >> changeset: 88413:87bdee4d633a > >> branch: 3.3 > >> parent: 88410:05e84d3ecd1e > >> user: Georg Brandl > >> date: Sat Jan 11 20:04:19 2014 +0100 > >> summary: > >> Update Sphinx toolchain. > >> > >> files: > >> Doc/Makefile | 8 ++++---- > >> 1 files changed, 4 insertions(+), 4 deletions(-) > >> > >> > >> diff --git a/Doc/Makefile b/Doc/Makefile > >> --- a/Doc/Makefile > >> +++ b/Doc/Makefile > >> @@ -41,19 +41,19 @@ > >> checkout: > >> @if [ ! 
-d tools/sphinx ]; then \ > >> echo "Checking out Sphinx..."; \ > >> - svn checkout $(SVNROOT)/external/Sphinx-1.0.7/sphinx > tools/sphinx; \ > >> + svn checkout $(SVNROOT)/external/Sphinx-1.2/sphinx tools/sphinx; > \ > >> fi > > > > Doc/make.bat needs to be similarly updated. > > Indeed, thanks for the reminder. > > Georg > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/songofacandy%40gmail.com > -- INADA Naoki -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.brandl at gmx.net Sun Jan 12 09:23:33 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 12 Jan 2014 09:23:33 +0100 Subject: [Python-Dev] cpython (3.3): Update Sphinx toolchain. In-Reply-To: References: <3f1r8102P7z7LjX@mail.python.org> <52D1A58C.4020805@udel.edu> Message-ID: Planned :) Georg Am 12.01.2014 09:12, schrieb INADA Naoki: > What about using venv and pip instead of svn? > > > On Sun, Jan 12, 2014 at 4:12 PM, Georg Brandl > wrote: > > Am 11.01.2014 21:11, schrieb Terry Reedy: > > On 1/11/2014 2:04 PM, georg.brandl wrote: > >> http://hg.python.org/cpython/rev/87bdee4d633a > >> changeset: 88413:87bdee4d633a > >> branch: 3.3 > >> parent: 88410:05e84d3ecd1e > >> user: Georg Brandl > > >> date: Sat Jan 11 20:04:19 2014 +0100 > >> summary: > >> Update Sphinx toolchain. > >> > >> files: > >> Doc/Makefile | 8 ++++---- > >> 1 files changed, 4 insertions(+), 4 deletions(-) > >> > >> > >> diff --git a/Doc/Makefile b/Doc/Makefile > >> --- a/Doc/Makefile > >> +++ b/Doc/Makefile > >> @@ -41,19 +41,19 @@ > >> checkout: > >> @if [ ! 
-d tools/sphinx ]; then \ > >> echo "Checking out Sphinx..."; \ > >> - svn checkout $(SVNROOT)/external/Sphinx-1.0.7/sphinx tools/sphinx; \ > >> + svn checkout $(SVNROOT)/external/Sphinx-1.2/sphinx tools/sphinx; \ > >> fi > > > > Doc/make.bat needs to be similarly updated. > > Indeed, thanks for the reminder. > > Georg > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/songofacandy%40gmail.com > > > > > -- > INADA Naoki > > > From p.f.moore at gmail.com Sun Jan 12 09:57:51 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 12 Jan 2014 08:57:51 +0000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> Message-ID: On 12 January 2014 01:01, Victor Stinner wrote: > Supporting formating integers would allow to write b"Content-Length: > %s\r\n" % 123, which would work on Python 2 and Python 3. I'm surprised that no-one is mentioning b"Content-Length: %s\r\n" % str(123) which works on Python 2 and 3, is explicit, and needs no special-casing of int in the format code. Paul From g.brandl at gmx.net Sun Jan 12 10:23:54 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 12 Jan 2014 10:23:54 +0100 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> Message-ID: Am 12.01.2014 09:57, schrieb Paul Moore: > On 12 January 2014 01:01, Victor Stinner wrote: >> Supporting formating integers would allow to write b"Content-Length: >> %s\r\n" % 123, which would work on Python 2 and Python 3. > > I'm surprised that no-one is mentioning b"Content-Length: %s\r\n" % > str(123) which works on Python 2 and 3, is explicit, and needs no > special-casing of int in the format code. 
Certainly doesn't work on Python 3 right now, and never should :) Georg From rdmurray at bitdance.com Sun Jan 12 11:36:49 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Sun, 12 Jan 2014 05:36:49 -0500 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> <52D1724D.4080307@egenix.com> <20140111183828.65323250113@webabinitio.net> Message-ID: <20140112103649.CDD6025005B@webabinitio.net> On Sun, 12 Jan 2014 17:51:41 +1000, Nick Coghlan wrote: > On 12 January 2014 04:38, R. David Murray wrote: > > But! Our goal should be to help people convert to Python3. So how can > > we find out what the specific problems are that real-world programs are > > facing, look at the *actual code*, and help that project figure out the > > best way to make that code work in both python2 and python3? > > > > That seems like the best way to find out what needs to be added to > > python3 or pypi: help port the actual code of the developers who are > > running into problems. > > > > Yes, I'm volunteering to help with this, though of course I can't promise > > exactly how much time I'll have available. > > And, as has been the case for a long time, the PSF stands ready to > help with funding credible grant proposals for Python 3 porting > efforts. I believe some of the core devs (including David?) do > freelance and contract work, so that's an option definitely worth > considered if a project would like to support Python 3, but are having > difficulty getting their with purely volunteer effort. Yes, I do contract programming, as part of Murray and Walker, Inc (web site coming soon but not there yet). And yes I currently have time available in my schedule. 
--David

From juraj.sukop at gmail.com Sun Jan 12 12:52:18 2014
From: juraj.sukop at gmail.com (Juraj Sukop)
Date: Sun, 12 Jan 2014 12:52:18 +0100
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To: <20140112013500.GW3869@ando>
References: <20140111004933.4e0bb394@fsol> <20140112013500.GW3869@ando>
Message-ID:

On Sun, Jan 12, 2014 at 2:35 AM, Steven D'Aprano wrote:
> On Sat, Jan 11, 2014 at 08:13:39PM -0200, Mariano Reingart wrote:
>
> > AFAIK (and just for the record), there could be both Latin1 text and UTF-16
> > in a PDF (and other encodings too), depending on the font used:
> [...]
> > In Python2, txt is just a str, but in Python3 handling everything as latin1
> > string obviously doesn't work for TTF in this case.
>
> Nobody is suggesting that you use Latin-1 for *everything*. We're
> suggesting that you use it for blobs of binary data that represent
> arbitrary bytes. First you have to get your binary data in the first
> place, using whatever technique is necessary.

Just to check I understood what you are saying. Instead of writing:

    content = b'\n'.join([
        b'header',
        b'part 2 %.3f' % number,
        binary_image_data,
        utf16_string.encode('utf-16be'),
        b'trailer'])

it should now look like:

    content = '\n'.join([
        'header',
        'part 2 %.3f' % number,
        binary_image_data.decode('latin-1'),
        utf16_string.encode('utf-16be').decode('latin-1'),
        'trailer']).encode('latin-1')

Correct?

From p.f.moore at gmail.com Sun Jan 12 13:08:56 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 12 Jan 2014 12:08:56 +0000
Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake
In-Reply-To:
References: <20140111193226.23cc771d@fsol>
Message-ID:

On 12 January 2014 09:23, Georg Brandl wrote:
>> On 12 January 2014 01:01, Victor Stinner wrote:
>>> Supporting formating integers would allow to write b"Content-Length:
>>> %s\r\n" % 123, which would work on Python 2 and Python 3.
>>
>> I'm surprised that no-one is mentioning b"Content-Length: %s\r\n" %
>> str(123) which works on Python 2 and 3, is explicit, and needs no
>> special-casing of int in the format code.
>
> Certainly doesn't work on Python 3 right now, and never should :)

Sorry, I meant str(123).encode("ascii"), and I'd probably use a helper function for it.

I could easily argue at this point that this is the type of bug that having %-formatting operations on bytes would encourage - %s means "format a string" (from years of C and Python (text) experience) so I automatically supply a string argument when using %s in a bytes formatting context.

The reality is that I was probably just being sloppy, though :-)

Paul

From solipsis at pitrou.net Sun Jan 12 13:24:46 2014
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 12 Jan 2014 13:24:46 +0100
Subject: [Python-Dev] test.support.check_warnings
References: <52D1AD4D.8000701@stoneleaf.us> <52D23FF3.3060307@stoneleaf.us>
Message-ID: <20140112132446.194a4ec1@fsol>

On Sat, 11 Jan 2014 23:10:43 -0800 Ethan Furman wrote:
> On 01/11/2014 05:37 PM, Brett Cannon wrote:
> >
> > You're assuming the context manager is doing something magical to verify that all calls in the block raise the expected
> > exception. What you want to do is execute it in a loop::
> >
> >     for test in (...):
> >         with support.check_warnings(("automatic int conversions have been deprecated", DeprecationWarning), quiet=False):
> >             exec(test)
>
> Well, this is test.support!
I expect magic! ;) > > Thanks for setting me straight, got it working. Or you could, you know, use the new assertWarns(): http://docs.python.org/dev/library/unittest.html#unittest.TestCase.assertWarns Regards Antoine. From ncoghlan at gmail.com Sun Jan 12 14:16:37 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 12 Jan 2014 23:16:37 +1000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140111004933.4e0bb394@fsol> <20140112013500.GW3869@ando> Message-ID: On 12 Jan 2014 21:53, "Juraj Sukop" wrote: > > > > > On Sun, Jan 12, 2014 at 2:35 AM, Steven D'Aprano wrote: >> >> On Sat, Jan 11, 2014 at 08:13:39PM -0200, Mariano Reingart wrote: >> >> > AFAIK (and just for the record), there could be both Latin1 text and UTF-16 >> > in a PDF (and other encodings too), depending on the font used: >> [...] >> > In Python2, txt is just a str, but in Python3 handling everything as latin1 >> > string obviously doesn't work for TTF in this case. >> >> Nobody is suggesting that you use Latin-1 for *everything*. We're >> suggesting that you use it for blobs of binary data that represent >> arbitrary bytes. First you have to get your binary data in the first >> place, using whatever technique is necessary. > > > Just to check I understood what you are saying. Instead of writing: > > content = b'\n'.join([ > b'header', > b'part 2 %.3f' % number, > binary_image_data, > utf16_string.encode('utf-16be'), > b'trailer']) > > it should now look like: > > content = '\n'.join([ > 'header', > 'part 2 %.3f' % number, > binary_image_data.decode('latin-1'), > utf16_string.encode('utf-16be').decode('latin-1'), > 'trailer']).encode('latin-1') Why are you proposing to do the *join* in text space? Encode all the parts separately, concatenate them with b'\n'.join() (or whatever separator is appropriate). 
It's only the *text formatting operation* that needs to be done in text space and then explicitly encoded (and this example doesn't even need latin-1,ASCII is sufficient): content = b'\n'.join([ b'header', ('part 2 %.3f' % number).encode('ascii'), binary_image_data, utf16_string.encode('utf-16be'), b'trailer']) > Correct? My updated version above is the reasonable way to do it in Python 3, and the one I consider clearly superior to reintroducing implicit encoding to ASCII as part of the core text model. This is why I *don't* have a problem with PEP 460 as it stands - it's just syntactic sugar for something you can already do with b''.join(), and thus not particularly controversial. It's only proposals that add any form of implicit encoding that silently switches from the text domain to the binary domain that conflict with the core Python 3 text model (although third party types remain largely free to do whatever they want). Cheers, Nick. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Jan 12 14:23:45 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 12 Jan 2014 23:23:45 +1000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> Message-ID: On 12 Jan 2014 22:10, "Paul Moore" wrote: > > On 12 January 2014 09:23, Georg Brandl wrote: > >> On 12 January 2014 01:01, Victor Stinner wrote: > >>> Supporting formating integers would allow to write b"Content-Length: > >>> %s\r\n" % 123, which would work on Python 2 and Python 3. 
> >> > >> I'm surprised that no-one is mentioning b"Content-Length: %s\r\n" % > >> str(123) which works on Python 2 and 3, is explicit, and needs no > >> special-casing of int in the format code. > > > > Certainly doesn't work on Python 3 right now, and never should :) > > Sorry, I meant str(123).encode("ascii"), and I'd probably use a helper > function for it. > > I could easily argue at this point that this is the type of bug that > having %-formatting operations on bytes would encourage - %s means > "format a string" (from years of C and Python (text) experience) so I > automatically supply a string argument when using %s in a bytes > formatting context. > > The reality is that I was probably just being sloppy, though :-) It's also something asciistr will help with once it is working - asciistr(123) on the RHS will work in both versions. Cheers, Nick. > Paul > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Sun Jan 12 09:49:54 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 12 Jan 2014 11:49:54 +0300 Subject: [Python-Dev] cpython (3.3): Update Sphinx toolchain. In-Reply-To: References: <3f1r8102P7z7LjX@mail.python.org> <52D1A58C.4020805@udel.edu> Message-ID: And cross-platform automation tools in Python instead of make https://bitbucket.org/birkenfeld/sphinx/issue/456/makepy-command-script -- anatoly t. On Sun, Jan 12, 2014 at 11:12 AM, INADA Naoki wrote: > What about using venv and pip instead of svn? 
> > > On Sun, Jan 12, 2014 at 4:12 PM, Georg Brandl wrote: >> >> Am 11.01.2014 21:11, schrieb Terry Reedy: >> > On 1/11/2014 2:04 PM, georg.brandl wrote: >> >> http://hg.python.org/cpython/rev/87bdee4d633a >> >> changeset: 88413:87bdee4d633a >> >> branch: 3.3 >> >> parent: 88410:05e84d3ecd1e >> >> user: Georg Brandl >> >> date: Sat Jan 11 20:04:19 2014 +0100 >> >> summary: >> >> Update Sphinx toolchain. >> >> >> >> files: >> >> Doc/Makefile | 8 ++++---- >> >> 1 files changed, 4 insertions(+), 4 deletions(-) >> >> >> >> >> >> diff --git a/Doc/Makefile b/Doc/Makefile >> >> --- a/Doc/Makefile >> >> +++ b/Doc/Makefile >> >> @@ -41,19 +41,19 @@ >> >> checkout: >> >> @if [ ! -d tools/sphinx ]; then \ >> >> echo "Checking out Sphinx..."; \ >> >> - svn checkout $(SVNROOT)/external/Sphinx-1.0.7/sphinx >> >> tools/sphinx; \ >> >> + svn checkout $(SVNROOT)/external/Sphinx-1.2/sphinx tools/sphinx; >> >> \ >> >> fi >> > >> > Doc/make.bat needs to be similarly updated. >> >> Indeed, thanks for the reminder. >> >> Georg >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/songofacandy%40gmail.com > > > > > -- > INADA Naoki > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/techtonik%40gmail.com > From nachshon.armon at gmail.com Sun Jan 12 12:27:26 2014 From: nachshon.armon at gmail.com (Nachshon David Armon) Date: Sun, 12 Jan 2014 13:27:26 +0200 Subject: [Python-Dev] Common subset of python 2 and python 3 Message-ID: Hi, I am Nachshon and this is my first post to the python mailing list. I have been porting some libraries from python 2 to python 3 recently with the goal of a common codebase that will run on both versions. 
I was thinking it would make my life, and a lot of other developers as well, a lot easier if there were a version of python that supported ONLY the features found both in python 2 and python 3. It should be a developer only version of python.

It should use unicode strings and require that people use the from __future__ syntax so that anything written in it will work in python 2.7 and in python 3.3+.

Regarding name changes of standard library modules it should support the new stuff and have helper functions and guides that make the old modules like the new ones. It should encourage using backports of the new standard library modules like enum so that developers are not stuck for features.

I propose that this new version of python use the python 3 unicode model. As the version of python will be fully compatible with both python 2 and with python 3 but NOT necessarily with all existing code in either. It is designed as a porting tool only.

I suggest that this new python version should be called python 2 and 9 tenths. Is it worth it for me to write a pep that suggests this?

From ncoghlan at gmail.com Sun Jan 12 14:52:11 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 12 Jan 2014 23:52:11 +1000
Subject: [Python-Dev] Common subset of python 2 and python 3
In-Reply-To:
References:
Message-ID:

Hi Nachshon,

Python 2.7 with the -3 warning flag covers most of this, while using tox to run automated tests under both 2.x and 3.x should cover the rest (tox is also useful for checking code runs under Python 2.6, even if you normally use a newer version).

Is there anything in particular you feel isn't covered by the combination of those two approaches?

Regards,
Nick.
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From g.brandl at gmx.net Sun Jan 12 14:53:48 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 12 Jan 2014 14:53:48 +0100 Subject: [Python-Dev] cpython (3.3): Update Sphinx toolchain. In-Reply-To: References: <3f1r8102P7z7LjX@mail.python.org> <52D1A58C.4020805@udel.edu> Message-ID: That's also planned, see https://bitbucket.org/birkenfeld/sphinx-new-make-mode/. Georg Am 12.01.2014 09:49, schrieb anatoly techtonik: > And cross-platform automation tools in Python instead of make > https://bitbucket.org/birkenfeld/sphinx/issue/456/makepy-command-script > -- > anatoly t. > > > On Sun, Jan 12, 2014 at 11:12 AM, INADA Naoki wrote: >> What about using venv and pip instead of svn? From ncoghlan at gmail.com Sun Jan 12 14:58:09 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 12 Jan 2014 23:58:09 +1000 Subject: [Python-Dev] Common subset of python 2 and python 3 In-Reply-To: References: Message-ID: On 12 Jan 2014 23:39, "Nachshon David Armon" wrote: > > Hi, > I am Nachshon and this is my first post to the python mailing list. > > I have been porting some libraries from python 2 to python 3 recently with the goal of a common codebase that will run on both versions. I was thinking it would make my life, and a lot of other developers as well, a lot easier if there were a version of python that supported ONLY the features found both in python 2 and python 3. It should be a developer only version of python. > > It should use unicode strings and require that people use the from __future__ syntax so that anything written in it will work in python 2.7 and in python 3.3+. > > Regarding name changes of standard library modules it should support the new stuff and have helper functions and guides that make the old modules likethe new ones. it should encourage using backports of the new standard library modules like enum so that developers are not stuck for features. > > I propose that this new version of python use the python 3 unicode model. 
As the version of python will be fully compatible with both python 2 and with python 3 but NOT necessarily with all existing code in either. It is designed as a porting tool only.

Ah, I missed this on the first read through - that combination of requirements doesn't quite make sense (the text models are fundamentally incompatible in a way that forces developers to resolve ambiguities that Python 2 would silently tolerate until it hit a bad combination of input data).

You may want to take a look at the "python-future" project - that comes as close as anything else I am aware of to allowing you to write Python 2 code that reads like idiomatic Python 3 code.

Cheers,
Nick.

> I suggest that this new python version should be called python 2 and 9 tenths. Is it worth it for me to write a pep that suggests this?

From senthil at uthcode.com Sun Jan 12 16:00:56 2014
From: senthil at uthcode.com (Senthil Kumaran)
Date: Sun, 12 Jan 2014 07:00:56 -0800
Subject: [Python-Dev] [Python-checkins] cpython (3.3): Issue #19092 - Raise a correct exception when cgi.FieldStorage is given an
In-Reply-To:
References: <3f27By07XNz7LkH@mail.python.org>
Message-ID:

On Sat, Jan 11, 2014 at 11:53 PM, Nick Coghlan wrote:
> You may want to tweak the tracker so the comment ends up on the
> appropriate issue (#19092 is something else entirely)
>

Yes. This was supposed to be #19097. My bad.
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From kristjan at ccpgames.com Sun Jan 12 15:50:52 2014 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Sun, 12 Jan 2014 14:50:52 +0000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> , Message-ID: Well, my suggestion would that we _should_ make it work, by having the %s format specifyer on bytes objects mean: str(arg).encode('ascii', 'strict') It would be an explicit encoding operator with a known, fixed, and well specified encoder. This would cover most of the use cases seen in this threadnought. Others could be handled with explicit str formatting and encoding. Imho, this is not equivalent to re-introducing automatic type conversion between binary/unicode, it is adding a specific convenience function for explicitly asking for ASCII encoding. K ________________________________________ From: Python-Dev [python-dev-bounces+kristjan=ccpgames.com at python.org] on behalf of Georg Brandl [g.brandl at gmx.net] Sent: Sunday, January 12, 2014 09:23 To: python-dev at python.org Subject: Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake Am 12.01.2014 09:57, schrieb Paul Moore: > On 12 January 2014 01:01, Victor Stinner wrote: >> Supporting formating integers would allow to write b"Content-Length: >> %s\r\n" % 123, which would work on Python 2 and Python 3. > > I'm surprised that no-one is mentioning b"Content-Length: %s\r\n" % > str(123) which works on Python 2 and 3, is explicit, and needs no > special-casing of int in the format code. Certainly doesn't work on Python 3 right now, and never should :) Georg From regebro at gmail.com Sun Jan 12 16:48:05 2014 From: regebro at gmail.com (Lennart Regebro) Date: Sun, 12 Jan 2014 16:48:05 +0100 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: Message-ID: On Sat, Jan 11, 2014 at 8:40 PM, Kristj?n Valur J?nsson wrote: > Hi there. > How about a compromise? 
> Personally, I think adding the full complement of integer/float formatting to bytes is a bit over the top.
> How about just supporting two format specifiers?
> %b : interpolate a bytes object. If it doesn't have the buffer interface, error.
> %s : interpolate a str object, encoded to ASCII using 'strict' conversion.
>
> This should cover the most common use cases.
> In particular, you could do this:
>
> Headers.append('Content-Length: %s'%(len(data),))
>
> And then subsequently:
> Packet = b'%b%b'%(b"".join(headers), data)
>
> For more complex formatting, you delegate to the more capable string class, but benefit from automatic ASCII conversion:
>
> Data = b"percentage = %s" % ("%4.2f" % (value,))

Although nice and clean as a principle, I think it makes for somewhat messy code. I'm in favor of having float and integer specifiers as well.

I'm also for including %s, because it makes moving from Python 2 easier. But it should definitely error out if you try to feed it a non-ascii string.

//Lennart

From ncoghlan at gmail.com Sun Jan 12 17:09:27 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 13 Jan 2014 02:09:27 +1000
Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake
In-Reply-To:
References: <20140111193226.23cc771d@fsol>
Message-ID:

On 13 Jan 2014 01:22, "Kristján Valur Jónsson" wrote:
>
>
> Well, my suggestion would be that we _should_ make it work, by having the %s format specifier on bytes objects mean: str(arg).encode('ascii', 'strict')
> It would be an explicit encoding operator with a known, fixed, and well specified encoder.
> This would cover most of the use cases seen in this threadnought. Others could be handled with explicit str formatting and encoding.
>
> Imho, this is not equivalent to re-introducing automatic type conversion between binary/unicode, it is adding a specific convenience function for explicitly asking for ASCII encoding.

It is not explicit, it is implicit - whether or not the resulting string assumes ASCII compatibility or not depends on whether you pass a binary value (no assumption) or a string value (assumes ASCII compatibility). This kind of data driven change in assumptions about correctness is utterly unacceptable in the core text and binary types in Python 3. It's also completely unnecessary - asciistr will be a third party extension type that allows those users pining for the halcyon days of the Python 2 str type to stop harassing the core devs with requests to compromise the core Python 3 text model with implicit encoding operations. I'll ensure any interoperability bugs between asciistr and the core types that can't be worked around get fixed. A separate type is genuinely explicit (since the ASCII assumption is no longer hidden from the type system), and allows much simpler interoperability for code that wants (indexing asciistr will eventually produce length 1 asciistr instances instead of str instances, it will avoid the bytes(intval) discrepancy, it will avoid the str(bytesval) problem, etc). I've been suggesting for years that Python 3 might need a third type (not required to be a builtin, since it's so specialised), but folks migrating from Python 2 have been so focused on making the core binary type a hybrid type again, the notion of taking advantage of PEP 393 to create a dedicated extension type specifically for working with ASCII compatible binary protocols has failed to compute. I'm hoping a test suite and preliminary implementation will help more people to finally get the point. Regards, Nick. 
> > K > ________________________________________ > From: Python-Dev [python-dev-bounces+kristjan=ccpgames.com at python.org] on behalf of Georg Brandl [g.brandl at gmx.net] > Sent: Sunday, January 12, 2014 09:23 > To: python-dev at python.org > Subject: Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake > > Am 12.01.2014 09:57, schrieb Paul Moore: > > On 12 January 2014 01:01, Victor Stinner wrote: > >> Supporting formating integers would allow to write b"Content-Length: > >> %s\r\n" % 123, which would work on Python 2 and Python 3. > > > > I'm surprised that no-one is mentioning b"Content-Length: %s\r\n" % > > str(123) which works on Python 2 and 3, is explicit, and needs no > > special-casing of int in the format code. > > Certainly doesn't work on Python 3 right now, and never should :) > > Georg > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From juraj.sukop at gmail.com Sun Jan 12 17:43:05 2014 From: juraj.sukop at gmail.com (Juraj Sukop) Date: Sun, 12 Jan 2014 17:43:05 +0100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140111004933.4e0bb394@fsol> <20140112013500.GW3869@ando> Message-ID: On Sun, Jan 12, 2014 at 2:16 PM, Nick Coghlan wrote: > Why are you proposing to do the *join* in text space? Encode all the parts > separately, concatenate them with b'\n'.join() (or whatever separator is > appropriate). It's only the *text formatting operation* that needs to be > done in text space and then explicitly encoded (and this example doesn't > even need latin-1,ASCII is sufficient): > I apparently misunderstood what was Steven suggesting, thanks for the clarification. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Sun Jan 12 16:57:18 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 07:57:18 -0800 Subject: [Python-Dev] test.support.check_warnings In-Reply-To: <20140112132446.194a4ec1@fsol> References: <52D1AD4D.8000701@stoneleaf.us> <52D23FF3.3060307@stoneleaf.us> <20140112132446.194a4ec1@fsol> Message-ID: <52D2BB5E.6040401@stoneleaf.us> On 01/12/2014 04:24 AM, Antoine Pitrou wrote: > On Sat, 11 Jan 2014 23:10:43 -0800 > Ethan Furman wrote: >> On 01/11/2014 05:37 PM, Brett Cannon wrote: >>> >>> You're assuming the context manager is doing something magical to verify that all calls in the block raise the expected >>> exception. What you want to do is execute it in a loop:: >>> >>> for test in (...): >>> with support.check_warnings(("automatic int conversions have been deprecated", DeprecationWarning), quiet=False): >>> exec(test) >> >> Well, this is test.support! I expect magic! ;) >> >> Thanks for setting me straight, got it working. > > Or you could, you know, use the new assertWarns(): > http://docs.python.org/dev/library/unittest.html#unittest.TestCase.assertWarns That's also cool. If I have to touch that code again I'll switch to it. -- ~Ethan~ From ethan at stoneleaf.us Sun Jan 12 17:21:20 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 08:21:20 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> Message-ID: <52D2C100.8010409@stoneleaf.us> On 01/12/2014 08:09 AM, Nick Coghlan wrote: > On 13 Jan 2014 01:22, "Kristj?n Valur J?nsson" wrote: >> >> Imho, this is not equivalent to re-introducing automatic type conversion between binary/unicode, it is adding a specific convenience function for explicitly asking for ASCII encoding. 
> > It is not explicit, it is implicit - whether or not the resulting string assumes ASCII compatibility or not depends on > whether you pass a binary value (no assumption) or a string value (assumes ASCII compatibility). Nick, I don't understand what you are saying here. Are you saying that the result of b'%s' % var may be either a bytes object or a str object? Because that would be wrong -- it would always be a bytes object. -- ~Ethan~ From kristjan at ccpgames.com Sun Jan 12 17:52:31 2014 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Sun, 12 Jan 2014 16:52:31 +0000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> , Message-ID: Now you're just splitting hairs, Nick. An explicit operator, %s, _defined_ to be "encode a string object using strict ascii", how is that any less explicit than the .encode('ascii', 'strict') spelt out in full? The language is full of constructs that are shorthands for others, more lengthy but equivalent things. I mean, basically what I am suggesting is that in addition to %b with def helper(o): return str(o).encode('ascii', 'strict') b'foo%bbar'%(helper(myobj), ) you have b'foo%sbar'%(myobj, ) There is no "data driven change in assumptions." Just an interpolation operator with a clearly defined meaning. I don't think anyone is trying to compromise the text model. All people are asking for is that the _boundary_ is made a little easier to deal with. K ________________________________ From: Nick Coghlan [ncoghlan at gmail.com] Sent: Sunday, January 12, 2014 16:09 To: Kristj?n Valur J?nsson Cc: python-dev at python.org; Georg Brandl Subject: Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake It is not explicit, it is implicit - whether or not the resulting string assumes ASCII compatibility or not depends on whether you pass a binary value (no assumption) or a string value (assumes ASCII compatibility). 
This kind of data driven change in assumptions about correctness is utterly unacceptable in the core text and binary types in Python 3. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Sun Jan 12 18:03:41 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 09:03:41 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D2C100.8010409@stoneleaf.us> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> Message-ID: <52D2CAED.7090502@stoneleaf.us> On 01/12/2014 08:21 AM, Ethan Furman wrote: > On 01/12/2014 08:09 AM, Nick Coghlan wrote: >> On 13 Jan 2014 01:22, "Kristj?n Valur J?nsson" wrote: >>> >>> Imho, this is not equivalent to re-introducing automatic type conversion between binary/unicode, it is adding a >>> specific convenience function for explicitly asking for ASCII encoding. >> >> It is not explicit, it is implicit - whether or not the resulting string assumes ASCII compatibility or not depends on >> whether you pass a binary value (no assumption) or a string value (assumes ASCII compatibility). > > Nick, I don't understand what you are saying here. Are you saying that the result of b'%s' % var may be either a bytes > object or a str object? Because that would be wrong -- it would always be a bytes object. Okay, I just went and took a closer look at the asciistr type [1]. For what it's worth I don't think this is Antoine's understanding of what we [2] are asking for, nor is it what we are asking for (I'm sure Antoine will correct me if I'm wrong. ;) We know full well the difference between unicode and bytes, and we know full well that numbers and much of the text we need has an ASCII (bytes!) representation. When we do a b'Content Length: %d' % len(binary_data) we are expecting to get back a bytes object, /not/ a unicode object. Your asciistr, which sometimes returns bytes and sometimes returns text, is absolutely *not* what we want. 
--
~Ethan~

[1] https://github.com/jeamland/asciicompat
[2] the dbf and pdf folks, at least

From p.f.moore at gmail.com Sun Jan 12 18:04:20 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 12 Jan 2014 17:04:20 +0000
Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake
In-Reply-To:
References: <20140111193226.23cc771d@fsol>
Message-ID:

On 12 January 2014 16:52, Kristján Valur Jónsson wrote:
> I mean, basically what I am suggesting is that in addition to %b with
>
> def helper(o):
>     return str(o).encode('ascii', 'strict')
>
> b'foo%bbar'%(helper(myobj), )
>
> you have
>
> b'foo%sbar'%(myobj, )

But that's not what the current PEP says. It uses %s for interpolating bytes values. It looks like you're saying that b'abc %s' % (b'def') will *not* produce b'abc def', but rather will produce b'abc b\'def\'' (because str(b'def') is "b'def'"). If that's what you're saying, then fine, but it's a different PEP and I for one am -1 specifically because of the behaviour I show above.

Paul

From mark at hotpy.org Sun Jan 12 18:06:46 2014
From: mark at hotpy.org (Mark Shannon)
Date: Sun, 12 Jan 2014 17:06:46 +0000
Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake
In-Reply-To:
References: <20140111193226.23cc771d@fsol> ,
Message-ID: <52D2CBA6.8050006@hotpy.org>

On 12/01/14 16:52, Kristján Valur Jónsson wrote:
> Now you're just splitting hairs, Nick.
>
> An explicit operator, %s, _defined_ to be "encode a string object using
> strict ascii",

I don't like this because '%s' reads to me as "insert *string* here". I think '%a' which reads as "encode as ASCII and insert here" would be better.

>
> how is that any less explicit than the .encode('ascii', 'strict') spelt
> out in full? The language is full of constructs that are shorthands for
> others, more lengthy but equivalent things.
> > I mean, basically what I am suggesting is that in addition to %b with > > def helper(o): > > return str(o).encode('ascii', 'strict') > > b'foo*%b*bar'%(helper(myobj), ) > > you have > > b'foo*%s*bar'%(myobj, ) > > There is no "data driven change in assumptions." Just an interpolation > operator with a clearly defined meaning. > > I don't think anyone is trying to compromise the text model. All people > are asking for is that the _boundary_ is made a little easier to deal with. > > K > > ------------------------------------------------------------------------ > *From:* Nick Coghlan [ncoghlan at gmail.com] > *Sent:* Sunday, January 12, 2014 16:09 > *To:* Kristj?n Valur J?nsson > *Cc:* python-dev at python.org; Georg Brandl > *Subject:* Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake > > It is not explicit, it is implicit - whether or not the resulting string > assumes ASCII compatibility or not depends on whether you pass a binary > value (no assumption) or a string value (assumes ASCII compatibility). > This kind of data driven change in assumptions about correctness is > utterly unacceptable in the core text and binary types in Python 3. 
> > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/mark%40hotpy.org > From steve at pearwood.info Sun Jan 12 18:22:21 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 13 Jan 2014 04:22:21 +1100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140111004933.4e0bb394@fsol> <20140112013500.GW3869@ando> Message-ID: <20140112172220.GB3869@ando> On Sun, Jan 12, 2014 at 12:52:18PM +0100, Juraj Sukop wrote: > On Sun, Jan 12, 2014 at 2:35 AM, Steven D'Aprano wrote: > > > On Sat, Jan 11, 2014 at 08:13:39PM -0200, Mariano Reingart wrote: > > > > > AFAIK (and just for the record), there could be both Latin1 text and > > UTF-16 > > > in a PDF (and other encodings too), depending on the font used: > > [...] > > > In Python2, txt is just a str, but in Python3 handling everything as > > latin1 > > > string obviously doesn't work for TTF in this case. > > > > Nobody is suggesting that you use Latin-1 for *everything*. We're > > suggesting that you use it for blobs of binary data that represent > > arbitrary bytes. First you have to get your binary data in the first > > place, using whatever technique is necessary. > > > Just to check I understood what you are saying. Instead of writing: > > content = b'\n'.join([ > b'header', > b'part 2 %.3f' % number, > binary_image_data, > utf16_string.encode('utf-16be'), > b'trailer']) Which doesn't work, since bytes don't support %f in Python 3. > it should now look like: > > content = '\n'.join([ > 'header', > 'part 2 %.3f' % number, > binary_image_data.decode('latin-1'), > utf16_string.encode('utf-16be').decode('latin-1'), > 'trailer']).encode('latin-1') > > Correct? Not quite as you show. First, "utf16_string" confuses me. What is it? 
If it is a Unicode string, i.e.: # Python 3 semantics type(utf16_string) => returns str then the name is horribly misleading, and it is best handled like this: content = '\n'.join([ 'header', 'part 2 %.3f' % number, binary_image_data.decode('latin-1'), utf16_string, # Misleading name, actually Unicode string 'trailer']) Note that since it's text, and content is text, there is no need to encode then decode. "UTF-16" is not another name for "Unicode". Unicode is a character set. UTF-16 is just one of a number of different encodings which map the 0x10FFFF distinct Unicode characters (actually "code points") to bytes. UTF-16 is one possible way to implement Unicode strings in memory, but not the only way. Python has, or does, use four distinct implementations: 1) UTF-16 in "narrow builds" 2) UTF-32 in "wide builds" 3) a hybrid approach starting in Python 3.3, where strings are stored as either: 3a) Latin-1 3b) UCS-2 3c) UTF-32 depending on the content of the string. So calling an arbitrary string "utf16_string" is misleading or wrong. On the other hand, if it is actually a bytes object which is the product of UTF-16 encoding, i.e.: type(utf16_string) => returns bytes and those bytes were generated by "some text".encode("utf-16"), then it is already binary data and needs to be smuggled into the text string. Latin-1 is good for that: content = '\n'.join([ 'header', 'part 2 %.3f' % number, binary_image_data.decode('latin-1'), utf16_string.decode('latin-1'), 'trailer']) Both examples assume that you intend to do further processing of content before sending it, and will encode just before sending: content.encode('utf-8') (Don't use Latin-1, since it cannot handle the full range of text characters.) 
If that's not the case, then perhaps this is better suited to what you are doing: content = b'\n'.join([ b'header', ('part 2 %.3f' % number).encode('ascii'), binary_image_data, # already bytes utf16_string, # already bytes b'trailer']) -- Steven From breamoreboy at yahoo.co.uk Sun Jan 12 18:23:00 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sun, 12 Jan 2014 17:23:00 +0000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D2CBA6.8050006@hotpy.org> References: <20140111193226.23cc771d@fsol> , <52D2CBA6.8050006@hotpy.org> Message-ID: On 12/01/2014 17:06, Mark Shannon wrote: > On 12/01/14 16:52, Kristj?n Valur J?nsson wrote: >> Now you're just splitting hairs, Nick. >> >> An explicit operator, %s, _defined_ to be "encode a string object using >> strict ascii", > > I don't like this because '%s' reads to me as "insert *string* here". > I think '%a' which reads as "encode as ASCII and insert here" would be > better. > I entirely agree. This would also parallel the conversion flags given here http://docs.python.org/3/library/string.html#format-string-syntax, I quote "Three conversion flags are currently supported: '!s' which calls str() on the value, '!r' which calls repr() and '!a' which calls ascii()". -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From p.f.moore at gmail.com Sun Jan 12 18:26:15 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 12 Jan 2014 17:26:15 +0000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D2CAED.7090502@stoneleaf.us> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> Message-ID: On 12 January 2014 17:03, Ethan Furman wrote: > We know full well the difference between unicode and bytes, and we know full > well that numbers and much of the text we need has an ASCII (bytes!) > representation. 
When we do a b'Content Length: %d' % len(binary_data) we > are expecting to get back a bytes object, /not/ a unicode object. What I am struggling to understand here is what room for compromise there is. Clearly, for whatever reason, b'Content Length: ' + str(len(binary_data)).encode('ascii')) is not acceptable for you. OK, fair enough. Also, apparently, writing a helper def int_to_bytes(n): return str(n).encode('ascii') b'Content Length: ' + int_to_bytes(len(binary_data)) is unacceptable. But I'm not clear why it's unacceptable. Maybe I missed the explanation - God knows, the thread is long enough :-) On the other hand, Nick has explained why b'Content Length: %d' % len(binary_data) is unacceptable to him (you don't have to agree with his opinion, just concede that he has explained his position in a way that you understand). I'm not trying to argue you're wrong - I don't know your codebase, nor do I know your application area. But surely somewhere between "we must have % formatting including %d for bytes" and the above, there's a middle ground that you *are* willing to accept? Can you give any indications of what that might be? What, specifically, about the helper function is the problem? I don't think it is any less space efficient, it doesn't double-encode, and I don't think it's more difficult to understand (although it is a little longer, it trades that off against being a bit more explicit as to what's going on). Surely you're not arguing that your code must work unchanged (not "there's a way of writing the code so it works on Python 2 and 3", but "the code you currently have for Python 2 must work with no changes at all")? Can you give an example of code that is *nearly* acceptable to you, which works in Python 2 and 3 today, and explain what improvements you would like to see to it in order to use it instead of waiting for a core change? 
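Paul's helper-function alternative, written out as a small runnable sketch. `binary_data` is a placeholder payload invented for the example; the header text is simply the one used in this thread.

```python
def int_to_bytes(n):
    """Return the base-10 representation of an integer as ASCII bytes."""
    # Two explicit steps: number -> text, then text -> bytes via ASCII.
    return str(n).encode('ascii')


binary_data = b'\x00\x01\x02\xff'  # placeholder payload (assumption)
header = b'Content Length: ' + int_to_bytes(len(binary_data))
```

The encoding step is spelled out inside the helper, which is the explicitness Paul is arguing for; the cost is one extra function call per interpolated value.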
Paul From steve at pearwood.info Sun Jan 12 18:36:57 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 13 Jan 2014 04:36:57 +1100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140111004933.4e0bb394@fsol> <20140112013500.GW3869@ando> Message-ID: <20140112173657.GC3869@ando> On Sun, Jan 12, 2014 at 11:16:37PM +1000, Nick Coghlan wrote: > > content = '\n'.join([ > > 'header', > > 'part 2 %.3f' % number, > > binary_image_data.decode('latin-1'), > > utf16_string.encode('utf-16be').decode('latin-1'), > > 'trailer']).encode('latin-1') > > Why are you proposing to do the *join* in text space? In defence of that, doing the join as text may be useful if you have additional text processing that you want to do after assembling the whole string, but before calling encode. Even if you intend to encode to bytes at the end, you might prefer to work in the text domain right until just before the end: - no need for b' prefixes; - indexing a string returns a 1-char string, not an int; - can use the full range of % formatting, etc. -- Steven From nachshon.armon at gmail.com Sun Jan 12 18:39:55 2014 From: nachshon.armon at gmail.com (Nachshon David Armon) Date: Sun, 12 Jan 2014 19:39:55 +0200 Subject: [Python-Dev] Common subset of python 2 and python 3 In-Reply-To: References: Message-ID: On Sun, Jan 12, 2014 at 3:58 PM, Nick Coghlan wrote: > > On 12 Jan 2014 23:39, "Nachshon David Armon" > wrote: >> >> I propose that this new version of python use the python 3 unicode model. >> As the version of python will be fully compatible with both python 2 and >> with python 3 but NOT necsesarily with all existing code in either. It is >> designed as a porting tool only. 
>
> Ah, I missed this on the first read through - that combination of
> requirements doesn't quite make sense (the text models are fundamentally
> incompatible in a way that forces developers to resolve ambiguities that
> Python 2 would silently tolerate until it hit a bad combination of input
> data).

While that is true, it is possible to handle Unicode correctly in Python 2 while remaining compatible with Python 3 (a combination of "from __future__ import unicode_literals" and proper use of the encode and decode methods). I would prefer a stripped-down version of Python 3 that does not support anything that really conflicts with Python 2, for porting purposes only of course. My employer still uses Python 2, so the idea is to give other developers something that forces code to work on both versions during the transition, without every single one of them having to take extra care to support both.

From juraj.sukop at gmail.com Sun Jan 12 18:57:14 2014
From: juraj.sukop at gmail.com (Juraj Sukop)
Date: Sun, 12 Jan 2014 18:57:14 +0100
Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
In-Reply-To: <20140112172220.GB3869@ando>
References: <20140111004933.4e0bb394@fsol> <20140112013500.GW3869@ando> <20140112172220.GB3869@ando>
Message-ID:

Wait a second, this is how I understood it but what Nick said made me think otherwise...

On Sun, Jan 12, 2014 at 6:22 PM, Steven D'Aprano wrote:
> On Sun, Jan 12, 2014 at 12:52:18PM +0100, Juraj Sukop wrote:
> > On Sun, Jan 12, 2014 at 2:35 AM, Steven D'Aprano wrote:
> >
> > Just to check I understood what you are saying. Instead of writing:
> >
> >     content = b'\n'.join([
> >         b'header',
> >         b'part 2 %.3f' % number,
> >         binary_image_data,
> >         utf16_string.encode('utf-16be'),
> >         b'trailer'])
>
> Which doesn't work, since bytes don't support %f in Python 3.
>

I know and this was an example of the ideal (for me, anyway) way of formatting bytes.

> First, "utf16_string" confuses me. What is it?
If it is a Unicode > string, i.e.: > It is a Unicode string which happens to contain code points outside U+00FF (as with the TTF example above), so that it triggers the (at least) 2-bytes memory representation in CPython 3.3+. I agree, I chose the variable name poorly, my bad. > > content = '\n'.join([ > 'header', > 'part 2 %.3f' % number, > binary_image_data.decode('latin-1'), > utf16_string, # Misleading name, actually Unicode string > 'trailer']) > Which, because of that horribly-named-variable, prevents the use of simple memcpy and makes the image data occupy way more memory than as when it was in simple bytes. > Both examples assume that you intend to do further processing of content > before sending it, and will encode just before sending: > Not really, I was interested to compare it to bytes formatting, hence it included the "encode()" as well. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Sun Jan 12 19:26:56 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 10:26:56 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> Message-ID: <52D2DE70.7080105@stoneleaf.us> On 01/12/2014 09:26 AM, Paul Moore wrote: > On 12 January 2014 17:03, Ethan Furman wrote: >> We know full well the difference between unicode and bytes, and we know full >> well that numbers and much of the text we need has an ASCII (bytes!) >> representation. When we do a b'Content Length: %d' % len(binary_data) we >> are expecting to get back a bytes object, /not/ a unicode object. > > What I am struggling to understand here is what room for compromise > there is. Clearly, for whatever reason, > > b'Content Length: ' + str(len(binary_data)).encode('ascii')) > > is not acceptable for you. OK, fair enough. 
Also, apparently, writing a helper > > def int_to_bytes(n): > return str(n).encode('ascii') > > b'Content Length: ' + int_to_bytes(len(binary_data)) > > is unacceptable. But I'm not clear why it's unacceptable. Maybe I > missed the explanation - God knows, the thread is long enough :-) True enough! ;) It's unacceptable in the sense that the bytes type is /almost/ there, it's /almost/ what is needed to handle the boundary conditions. We have a __bytes__ method (how is it supposed to be used?) that could be made to fit the interpolation bill. It seems to me the core of Nick's refusal is the (and I agree!) rejection of bytes interpolation returning unicode -- but that's not what I'm asking for! I'm asking for it to return bytes, with the interpolated data (in the case if %d, %s, etc) being strictly-ASCII encoded. > On the other hand, Nick has explained why b'Content Length: %d' % > len(binary_data) is unacceptable to him (you don't have to agree with > his opinion, just concede that he has explained his position in a way > that you understand). Only because he (or Benno) finally wrote some tests and I was able to see what he thought I was wanting. Which does seem to leave a *tiny* bit of wiggle room if bytes interpolation always return bytes, and never a unicode (yeah, I know, snowball's chance and all that). > I'm not trying to argue you're wrong - I don't know your codebase, nor > do I know your application area. But surely somewhere between "we must > have % formatting including %d for bytes" and the above, there's a > middle ground that you *are* willing to accept? Can you give any > indications of what that might be? What, specifically, about the > helper function is the problem? I don't think it is any less space > efficient, it doesn't double-encode, and I don't think it's more > difficult to understand (although it is a little longer, it trades > that off against being a bit more explicit as to what's going on). 
> Surely you're not arguing that your code must work unchanged (not > "there's a way of writing the code so it works on Python 2 and 3", but > "the code you currently have for Python 2 must work with no changes at > all")? I'm arguing from three PoVs: 1) 2 & 3 compatible code base 2) having the bytes type /be/ the boundary type 3) readable code > Can you give an example of code that is *nearly* acceptable to you, > which works in Python 2 and 3 today, and explain what improvements you > would like to see to it in order to use it instead of waiting for a > core change? I'm not trying to be difficult (just naturally good at it, I guess ;) , but I don't see a lot room for compromises -- I would like % interpolation, I'm told I have to use a helper function. I will if I have to, but first I have to try and make myself understood, and I'm not sure that has happened yet. Following Nick's example I'm writing up some tests that clearly show what I would like to see. Then at least we can debate what I'm actually asking for, and now what the (understandably) unicode-what-a-mess-we-had-in-py2k-don't-want-again that some think I am asking for. -- ~Ethan~ From p.f.moore at gmail.com Sun Jan 12 20:00:32 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 12 Jan 2014 19:00:32 +0000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D2DE70.7080105@stoneleaf.us> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> Message-ID: On 12 January 2014 18:26, Ethan Furman wrote: > True enough! ;) It's unacceptable in the sense that the bytes type is > /almost/ there, it's /almost/ what is needed to handle the boundary > conditions. We have a __bytes__ method (how is it supposed to be used?) > that could be made to fit the interpolation bill. And yet I still don't follow what you *want*. 
Unless it's that b'%d' % (12,) must work and give b'12', and nothing else is acceptable. Maybe more accurately, I don't see what you want to do that can't be done in another way. All I'm seeing in your rejection of alternative suggestions is "it's not %-interpolation using %d". > I'm arguing from three PoVs: > 1) 2 & 3 compatible code base > 2) having the bytes type /be/ the boundary type > 3) readable code The only one of these that I can see being in any way an argument against def int_to_bytes(n): return str(n).encode('ascii') b'Content Length: ' + int_to_bytes(len(binary_data)) is (3), and that's largely subjective. Personally, I see very little difference between the above and %d-interpolation in terms of *readability*. Brevity, certainly %d wins. But that's not important on its own, and I'd argue that my version is more clear in terms of describing the intent (and would be even better if I wasn't rubbish at thinking of function names, or if this wasn't in isolation, and more application-focused functions were used). > It seems to me the core of Nick's refusal is the (and I agree!) rejection of > bytes interpolation returning unicode -- but that's not what I'm asking for! > I'm asking for it to return bytes, with the interpolated data (in the case > if %d, %s, etc) being strictly-ASCII encoded. My reading of Nick's refusal is that %d takes a value which is semantically a number, converts it into a base-10 representation (which is semantically a *string*, not a sequence of bytes[1]) and then *encodes* that string into a series of bytes using the ASCII encoding. That is *two* semantic transformations, and one (the ASCII encoding) is *implicit*. Specifically, it's implicit because (a) the normal reading of %d is "produce the base-10 representation of a number, and a base-10 representation is a *string*, and (b) because nowhere has ASCII been mentioned (why not UTF16? that would be entirely plausible for a wchar-based environment like Windows). 
And a core principle of the bytes/text separation in Python 3 is that encoding should never happen implicitly.

By the way, I should point out that I would never have understood *any* of the ideas involved in this thread before Python 3 forced me to think about Unicode and the distinction between text and bytes. And yet, I now find myself, in my (non-Python) work environment, being the local expert whenever applications screw up text encodings. So I, for one, am very grateful for Python 3's clear separation of bytes and text. (And if I sometimes come across as over-dogmatic, I apologise - put it down to the enthusiasm of the recent convert :-))

Paul

[1] If you cannot see that there's no essential reason why the base-10 representation '123' should correspond to the bytes b'\x31\x32\x33' then you are probably not old enough to have started programming on EBCDIC-based computers :-)

From songofacandy at gmail.com Sun Jan 12 20:21:59 2014
From: songofacandy at gmail.com (INADA Naoki)
Date: Mon, 13 Jan 2014 04:21:59 +0900
Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake
In-Reply-To:
References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us>
Message-ID:

I want to add one more PoV: keeping the performance regression small, especially on Python 2. Programs that need bytes formatting are often low-level and heavily used by applications. Many programs use a single-source approach to support Python 3, and supporting Python 3 should not mean a large performance regression on Python 2.
In Python 2:

In [1]: def int_to_bytes(n):
   ...:     return unicode(n).encode('ascii')
   ...:

In [2]: %timeit int_to_bytes(42)
1000000 loops, best of 3: 691 ns per loop

In [3]: %timeit b'Content-Type: ' + int_to_bytes(42)
1000000 loops, best of 3: 737 ns per loop

In [4]: %timeit b'Content-Type: %d' % 42
10000000 loops, best of 3: 20.2 ns per loop

In [5]: %timeit (u'Content-Type: %d' % 42).encode('ascii')
1000000 loops, best of 3: 381 ns per loop

In Python 3:

In [1]: def int_to_bytes(n):
   ...:     return str(n).encode('ascii')
   ...:

In [2]: %timeit int_to_bytes(42)
1000000 loops, best of 3: 612 ns per loop

In [3]: %timeit b'Content-Type: ' + int_to_bytes(42)
1000000 loops, best of 3: 668 ns per loop

In [4]: %timeit ('Content-Type: %d' % 42).encode('ascii')
1000000 loops, best of 3: 326 ns per loop

> > I'm arguing from three PoVs:
> > 1) 2 & 3 compatible code base
> > 2) having the bytes type /be/ the boundary type
> > 3) readable code
>
> The only one of these that I can see being in any way an argument against
>
> def int_to_bytes(n):
>     return str(n).encode('ascii')
>
> b'Content Length: ' + int_to_bytes(len(binary_data))
>
> is (3), and that's largely subjective. Personally, I see very little
> difference between the above and %d-interpolation in terms of
> *readability*. Brevity, certainly %d wins. But that's not important on
> its own, and I'd argue that my version is more clear in terms of
> describing the intent (and would be even better if I wasn't rubbish at
> thinking of function names, or if this wasn't in isolation, and more
> application-focused functions were used).
>
> > It seems to me the core of Nick's refusal is the (and I agree!) rejection of
> > bytes interpolation returning unicode -- but that's not what I'm asking for!
> > I'm asking for it to return bytes, with the interpolated data (in the case
> > if %d, %s, etc) being strictly-ASCII encoded.
> > My reading of Nick's refusal is that %d takes a value which is > semantically a number, converts it into a base-10 representation > (which is semantically a *string*, not a sequence of bytes[1]) and > then *encodes* that string into a series of bytes using the ASCII > encoding. That is *two* semantic transformations, and one (the ASCII > encoding) is *implicit*. Specifically, it's implicit because (a) the > normal reading of %d is "produce the base-10 representation of a > number, and a base-10 representation is a *string*, and (b) because > nowhere has ASCII been mentioned (why not UTF16? that would be > entirely plausible for a wchar-based environment like Windows). And a > core principle of the bytes/text separation in Python 3 is that > encoding should never happen implicitly. > > By the way, I should point out that I would never have understood > *any* of the ideas involved in this thread before Python 3 forced me > to think about Unicode and the distinction between text and bytes. And > yet, I now find myself, in my (non-Python) work environment, being the > local expert whenever applications screw up text encodings. So I, for > one, am very grateful for Python 3's clear separation of bytes and > text. (And if I sometimes come across as over-dogmatic, I apologise - > put it down to the enthusiasm of the recent convert :-)) > > Paul > > [1] If you cannot see that there's no essential reason why the base-10 > representation '123' should correspond to the bytes b'\x31\x32\x33' > then you are probably not old enough to have started programming on > EBCDIC-based computers :-) > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/songofacandy%40gmail.com > -- INADA Naoki -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Sun Jan 12 19:40:39 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 10:40:39 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: Message-ID: <52D2E1A7.3090202@stoneleaf.us> On 01/11/2014 07:09 PM, Nick Coghlan wrote: > > Folks that want implicit serialisation (and I agree it has its uses) should go help Benno get asciistr up to speed. asciistr is not what I'm looking for in the way of a boundary type. I have created a 'bytestring'[1] repository which has the tests for what I am looking for. Hopefully that will get rid of some confusion, at least. -- ~Ethan~ [1] https://bitbucket.org/stoneleaf/bytestring From emile at fenx.com Sun Jan 12 20:30:45 2014 From: emile at fenx.com (Emile van Sebille) Date: Sun, 12 Jan 2014 11:30:45 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> Message-ID: On 01/12/2014 09:26 AM, Paul Moore wrote: > Can you give an example of code that is *nearly* acceptable to you, > which works in Python 2 and 3 today, and explain what improvements you > would like to see to it in order to use it instead of waiting for a > core change? I'm not a developer, but I'm trying to understand how in v3 I accomplish what in v2 is easy: len(open('chars','wb').write("".join(map (chr,range(256)))).read()) What's the v3 equivalent? Emile From ethan at stoneleaf.us Sun Jan 12 20:14:50 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 11:14:50 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> Message-ID: <52D2E9AA.4010308@stoneleaf.us> On 01/12/2014 11:00 AM, Paul Moore wrote: > > And yet I still don't follow what you *want*. 
Unless it's that b'%d' % > (12,) must work and give b'12', and nothing else is acceptable. Nothing else is ideal. I'll go that route if I have to. I understand that in the real world you go with what works, but in the development stage you fight for the ideal. :) > My reading of Nick's refusal is that %d takes a value which is > semantically a number, converts it into a base-10 representation > (which is semantically a *string*, not a sequence of bytes[1]) and > then *encodes* that string into a series of bytes using the ASCII > encoding. That is *two* semantic transformations, and one (the ASCII > encoding) is *implicit*. Specifically, it's implicit because (a) the > normal reading of %d is "produce the base-10 representation of a > number, and a base-10 representation is a *string*, and (b) because > nowhere has ASCII been mentioned (why not UTF16? that would be > entirely plausible for a wchar-based environment like Windows). And a > core principle of the bytes/text separation in Python 3 is that > encoding should never happen implicitly. That could be. And yet the bytes type already has several concessions to ASCII encoding. > By the way, I should point out that I would never have understood > *any* of the ideas involved in this thread before Python 3 forced me > to think about Unicode and the distinction between text and bytes. And > yet, I now find myself, in my (non-Python) work environment, being the > local expert whenever applications screw up text encodings. So I, for > one, am very grateful for Python 3's clear separation of bytes and > text. (And if I sometimes come across as over-dogmatic, I apologise - > put it down to the enthusiasm of the recent convert :-)) No worries. I was forced to learn the difference when I wrote my dbf module for 2.5. Took longer than I'd like to admit to realize that ASCII was an encoding. 
:/ > [1] If you cannot see that there's no essential reason why the base-10 > representation '123' should correspond to the bytes b'\x31\x32\x33' > then you are probably not old enough to have started programming on > EBCDIC-based computers :-) I can see it. :) But bytes already acknowledges an ASCII bias. ;) And even EBCDIC machines speak ASCII when talking telnet. -- ~Ethan~ From p.f.moore at gmail.com Sun Jan 12 20:46:39 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 12 Jan 2014 19:46:39 +0000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> Message-ID: On 12 January 2014 19:30, Emile van Sebille wrote: > len(open('chars','wb').write("".join(map (chr,range(256)))).read()) Python 2: >>> len(open('chars','wb').write("".join(map (chr,range(256)))).read()) Traceback (most recent call last): File "", line 1, in AttributeError: 'NoneType' object has no attribute 'read' I could be facetous and say "None.read", but more seriously, what are you trying to say here? How do I write a 256-byte file with one byte for each value? bytes(range(256)) gives you the bytestring you want. I simply don't see your point here. >> And yet I still don't follow what you *want*. Unless it's that b'%d' % >> (12,) must work and give b'12', and nothing else is acceptable. > > Nothing else is ideal. I'll go that route if I have to. I understand that in the real world you go with what works, but in the development stage you fight for the ideal. :) OK, but can you fight by giving arguments as to why it's better than the plethora of alternatives that have been suggested? Or counter-arguments to the objections that have been raised to the proposal? 
Paul From emile at fenx.com Sun Jan 12 20:47:33 2014 From: emile at fenx.com (Emile van Sebille) Date: Sun, 12 Jan 2014 11:47:33 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> Message-ID: On 01/12/2014 11:30 AM, Emile van Sebille wrote: > On 01/12/2014 09:26 AM, Paul Moore wrote: >> Can you give an example of code that is *nearly* acceptable to you, >> which works in Python 2 and 3 today, and explain what improvements you >> would like to see to it in order to use it instead of waiting for a >> core change? > > > I'm not a developer, but I'm trying to understand how in v3 I accomplish > what in v2 is easy: > > len(open('chars','wb').write("".join(map (chr,range(256)))).read()) my bad : >>> open('chars','wb').write("".join(map (chr,range(256)))) >>> len(open('chars','rb').read()) 256 > > What's the v3 equivalent? > > Emile > > > From stephen at xemacs.org Sun Jan 12 21:02:41 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Jan 2014 05:02:41 +0900 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> Message-ID: <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> Georg Brandl writes: > > if it weren't for your stupid maximalist opposition). > > Can you please stop throwing personal insults around? You don't have to > resort to that level. Ethan's posts (as an example of one general trend in this thread) are pretty frustrating, you have to admit. MAL posted straight out the Python 2 model of text makes it easier for him to write some programs, so he's all for reintroducing it. And that is the whole truth of the matter. Although I disagree with him, I appreciate his honesty. 
But people keep posting "we don't want Python 2's confounding of text and binary, we just want bytes with (nearly) all the functionality of strings [because they are (partially|really) encoded text]". Some of them actually use the literal word "text" in their justification! That's, well, what would you call it? Either they know what they're saying, in which case it's disingenuous at best, or they don't know what they're saying, in which case it's a proposal based on a clear misunderstanding of the situation. The problem is not going to go away just because they *say* they don't want to reintroduce Python 2 text processing. That is precisely what this proposal is *intended* to do, whether in the limited form proposed by Antoine or in the much more extensive form that folks like Ethan want. What "maximalists" mean is that they promise not to abuse Python 2 text processing when writing Python 3 programs. This promise is highly unlikely to be kept for two reasons. First, they can't make that promise on behalf of third parties, who for various reasons certainly will abuse these features to avoid the encoded-text-to- Unicode-text and vice-versa conversions. Second, I doubt they themselves will keep the promise to my satisfaction because their definition of "text" is ambiguous. When it's convenient for them to use text-processing operations on bytes, they'll say "oh, yes, these are conventionally considered text-processing features, but that's just an accident of the particular configuration of bytes -- yup, bytes -- I'm processing." You could argue that this "abuse" isn't *abuse*. That it's covered by "consenting adults". By the same token, so is smoking in a crowded elevator -- if you don't like it, don't use the elevator! Of course in applications used only by the author, there's no abuse (at least not of others! 
:-/ ) But Nick's important example of web frameworks demonstrates the problem: unless they convert to text where appropriate, they're just pushing the problem off on application writers. Sometimes passing on data as bytes is appropriate, of course, but the framework authors are likely to be biased in favor of doing that, and it's not hard to imagine frameworks ported from Python 2 passing on the problem wholesale on the grounds that "we returned str in Python 2 which is bytes in Python 3, and since we were processing bytes the whole time, we see no reason to change the 'ABI'." Of course the application writers thought they were receiving text "in an inconvenient and ambiguous form". IMO, with the proposed changes, that is likely to continue indefinitely, negating some of the gains I expected to receive from Python 3. :-( Note: there are a lot of high-level frameworks like Django that even in Python 2 basically went to Unicode everywhere internally. I don't deny that. I think that Python 3 as currently constituted makes it a lot easier to make an appropriate decision of where to convert, and should take some of the burden off the high-level frameworks. Approving this PEP, especially in a maximalist form, will blur the lines. From g.brandl at gmx.net Sun Jan 12 21:05:30 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 12 Jan 2014 21:05:30 +0100 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> Message-ID: Am 12.01.2014 20:30, schrieb Emile van Sebille: > On 01/12/2014 09:26 AM, Paul Moore wrote: >> Can you give an example of code that is *nearly* acceptable to you, >> which works in Python 2 and 3 today, and explain what improvements you >> would like to see to it in order to use it instead of waiting for a >> core change? 
> > > I'm not a developer, but I'm trying to understand how in v3 I accomplish > what in v2 is easy: > > len(open('chars','wb').write("".join(map (chr,range(256)))).read()) > > What's the v3 equivalent? That's actually very easy and shows a strength of the bytes type, since there's no text involved: open('chars', 'wb').write(bytes(range(256))) Georg From stephen at xemacs.org Sun Jan 12 21:39:14 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Jan 2014 05:39:14 +0900 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> <87y52me4tk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87txd9djf1.fsf@uwakimon.sk.tsukuba.ac.jp> Daniel Holth writes: > -1 on adding more surrogateesapes by default. It's a pain to track > down where the encoding errors came from. What do you mean "by default"? It was quite explicit in the code I posted, and it's the only reasonable thing to do with "text data without known (but ASCII compatible) encoding or multiple different encodings in a single data chunk". If you leave it as bytes, it will barf as soon as you try to mix it with text even if it is pure ASCII! From greg.ewing at canterbury.ac.nz Sun Jan 12 22:06:50 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 13 Jan 2014 10:06:50 +1300 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> Message-ID: <52D303EA.7050008@canterbury.ac.nz> Paul Moore wrote: > I could easily argue at this point that this is the type of bug that > having %-formatting operations on bytes would encourage - %s means > "format a string" (from years of C and Python (text) experience) so I > automatically supply a string argument when using %s in a bytes > formatting context. 
So don't call it %s -- call it something else such as %b. -- Greg From breamoreboy at yahoo.co.uk Sun Jan 12 22:16:23 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sun, 12 Jan 2014 21:16:23 +0000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D303EA.7050008@canterbury.ac.nz> References: <20140111193226.23cc771d@fsol> <52D303EA.7050008@canterbury.ac.nz> Message-ID: On 12/01/2014 21:06, Greg Ewing wrote: > Paul Moore wrote: >> I could easily argue at this point that this is the type of bug that >> having %-formatting operations on bytes would encourage - %s means >> "format a string" (from years of C and Python (text) experience) so I >> automatically supply a string argument when using %s in a bytes >> formatting context. > > So don't call it %s -- call it something else > such as %b. > Sorry but you can't use %b as that'll confuse people who're used to it meaning "Month as locale's abbreviated name." :) -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From greg.ewing at canterbury.ac.nz Sun Jan 12 22:25:47 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 13 Jan 2014 10:25:47 +1300 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> Message-ID: <52D3085B.7010608@canterbury.ac.nz> Nick Coghlan wrote: > > On 13 Jan 2014 01:22, "Kristján Valur Jónsson" > wrote: > > > Well, my suggestion would be that we _should_ make it work, by having > the %s format specifier on bytes objects mean: str(arg).encode('ascii', > 'strict') > > It is not explicit, it is implicit - whether or not the resulting string > assumes ASCII compatibility or not depends on whether you pass a binary > value (no assumption) or a string value (assumes ASCII compatibility). How do you make that out?
As far as I can see, Kristjan's proposal will *always* call str() on the argument of a %s format, regardless of its type. The *result* of that str() is then *required* (not assumed) to be encodable as ascii. I don't see any type-dependent changes in behaviour here. Interpolating a bytes object as-is, without a conversion to text, should be done by a different format specifier, such as %b. All text/bytes conversions are then explicit: if you write %s, then you're encoding something as ascii, but if you write %b, you're just inserting something that's already binary. -- Greg From ethan at stoneleaf.us Sun Jan 12 22:28:15 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 13:28:15 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D308EF.8080108@stoneleaf.us> On 01/12/2014 12:02 PM, Stephen J. Turnbull wrote: > Georg Brandl writes: >> Antoine writes: >>> >>> . . . if it weren't for your stupid maximalist opposition. . . >> >> Can you please stop throwing personal insults around? You don't have to >> resort to that level. > > Ethan's posts (as an example of one general trend in this thread) are > pretty frustrating, you have to admit. Two points: 1) Are you saying it's okay to be insulting when frustrated? I also find this mega-thread frustrating, but I'm trying very hard not to be insulting. 2) If you are going to use my name, please be certain of the facts [1]. More below. > MAL posted straight out the Python 2 model of text makes it easier for > him to write some programs, so he's all for reintroducing it. And > that is the whole truth of the matter. Although I disagree with him, > I appreciate his honesty. 
If you have an example of me lying (even if it's just a possibility), please refer to it directly so I can either try to explain the misunderstanding or apologize. > But people keep posting "we don't want Python 2's confounding of text > and binary, we just want bytes with (nearly) all the functionality of > strings [because they are (partially|really) encoded text]". Some of > them actually use the literal word "text" in their justification! In only one case did I use the word "text" loosely, and that was when I claimed that Py2 had three text types, and Py3 had two. I was wrong, I apologize. Py3 has one definite text type, str, and, I claim, one half text type in bytes, because bytes itself provides ASCII text processing methods. If you have a better term for the notion of b'ethan'.title() --> b'Ethan' than ASCII-text processing, I'll use that instead. If there are good reasons to not allow further concessions to the ASCII-ness of bytes (and you provide a good one below) then that makes living with the handicap easier. But don't lie to me (as Nick tried to) and say that "In particular, the bytes type is, and always will be, designed for pure binary manipulation" when it has methods like .center(). If I am wrong, and that was not a lie, please explain it to me. > That's, well, what would you call it? Either they know what they're > saying, in which case it's disingenuous at best, or they don't know > what they're saying, in which case it's a proposal based on a clear > misunderstanding of the situation. I think some of the misunderstanding (which you also seem to suffer from) is that we (or at least I) /ever/ want a unicode string back from bytes interpolation. I don't! If I start with bytes, I want bytes back! And I have a very clear grasp on the difference between str and bytes and what ASCII encoding means; it was a hard and painful lesson for me and I'm not likely to forget it.
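A minimal sketch of the ASCII-oriented bytes methods referred to here, assuming Python 3. The variable names are illustrative only; the point is that these methods act on ASCII letters and pass every other byte value through unchanged.

```python
# bytes carries several str-like methods that only understand ASCII.
name = b'ethan'

titled = name.title()        # capitalises ASCII letters at word starts
centered = name.center(9)    # pads with ASCII spaces on both sides
shouted = name.upper()       # upper-cases ASCII letters only

# A byte outside the ASCII range is not treated as a letter at all,
# whatever encoding the data may have come from.
mixed = b'\xe9than'.upper()

print(titled, centered, shouted, mixed)
```

Whether this makes bytes a "half text type" is the question under dispute in the thread; the sketch only shows what the methods do.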
To summarize, I used the term text when referring to unicode text (str), ASCII or ASCII-encoded text to refer to bytes that are to be used in a place that requires ASCII bytes for communication (such as content length or field type). I do /not/ use ASCII to refer to any ol' collection of bytes that happens to look like it might be ASCII-encoded text. >The problem is not going to go > away just because they *say* they don't want to reintroduce Python 2 > text processing. That is precisely what this proposal is *intended* > to do, whether in the limited form proposed by Antoine or in the much > more extensive form that folks like Ethan want. > > What "maximalists" mean is that they promise not to abuse Python 2 > text processing when writing Python 3 programs. This promise is > highly unlikely to be kept for two reasons. First, they can't make > that promise on behalf of third parties, who for various reasons > certainly will abuse these features to avoid the encoded-text-to- > Unicode-text and vice-versa conversions. I concede that this is a good reason to not allow % interpolation. Kinda like not allowing sum on strings. And I don't make promises for other people, and abusing this feature would be a bug. > Second, I doubt they > themselves will keep the promise to my satisfaction because their > definition of "text" is ambiguous. *My* definition is not ambiguous at all. If this particular part of the byte stream is defined to contain ASCII-encoded text, then I can use the bytes text methods to work with it. The only time I would return a bytes object is if it was supposed to be bytes (an image, for example); otherwise I return a bool, an int, a float, a date, or, even, a str. > When it's convenient for them to > use text-processing operations on bytes, they'll say "oh, yes, these > are conventionally considered text-processing features, but that's > just an accident of the particular configuration of bytes -- yup, > bytes -- I'm processing." 
If that particular configuration of bytes is because it's ASCII-encoded text, then sure. To use, for example, bytes.upper on data that wasn't ASCII-encoded text (even if it happened to look like it was) would be the height of stupidity. Please don't include me in such accusations. > But Nick's important example of web frameworks demonstrates the > problem: unless they convert to text where appropriate, they're just > pushing the problem off on application writers. Sometimes passing on > data as bytes is appropriate, of course, but the framework authors are > likely to be biased in favor of doing that, and it's not hard to > imagine frameworks ported from Python 2 passing on the problem > wholesale on the grounds that "we returned str in Python 2 which is > bytes in Python 3, and since we were processing bytes the whole time, > we see no reason to change the 'ABI'." Of course the application > writers thought they were receiving text "in an inconvenient and > ambiguous form". IMO, with the proposed changes, that is likely to > continue indefinitely, negating some of the gains I expected to > receive from Python 3. :-( This would be a good reason to reject PEP 460, if that danger was deemed more likely than the good it would bring. > Note: there are a lot of high-level frameworks like Django that even > in Python 2 basically went to Unicode everywhere internally. I don't > deny that. I think that Python 3 as currently constituted makes it a > lot easier to make an appropriate decision of where to convert, and > should take some of the burden off the high-level frameworks. > Approving this PEP, especially in a maximalist form, will blur the > lines. I understand your point, but I disagree. When I open a file (in binary mode, obviously, as otherwise I'd get massive corruption) I get back a bunch of bytes. When working with TCP, I get back a bunch of bytes. bytes are /already/ the boundary type.
If we have to make a third type for proper boundary processing it's an admission that bytes failed in its role. -- ~Ethan~ [1] I double-checked all my posts on this topic both here and on Python Ideas to make sure. From ethan at stoneleaf.us Sun Jan 12 22:29:30 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 13:29:30 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <87txd9djf1.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> <87y52me4tk.fsf@uwakimon.sk.tsukuba.ac.jp> <87txd9djf1.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D3093A.2060808@stoneleaf.us> On 01/12/2014 12:39 PM, Stephen J. Turnbull wrote: > Daniel Holth writes: > > > -1 on adding more surrogateesapes by default. It's a pain to track > > down where the encoding errors came from. > > What do you mean "by default"? It was quite explicit in the code I > posted, and it's the only reasonable thing to do with "text data > without known (but ASCII compatible) encoding or multiple different > encodings in a single data chunk". If you leave it as bytes, it will > barf as soon as you try to mix it with text even if it is pure ASCII! Which is why some (including myself) are asking to be able to stay in bytes land and do any necessary interpolation there. No resulting unicode, no barfing. ;) -- ~Ethan~ From kristjan at ccpgames.com Sun Jan 12 22:37:05 2014 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Sun, 12 Jan 2014 21:37:05 +0000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> , Message-ID: Right. I'm saying, let's support two interpolators only: %b interpolates a bytes object (or one supporting the charbuffer interface) into a bytes object. 
%s interpolates a str object by first converting to a bytes object using strict ascii conversion. This makes it very explicit what we are trying to do. I think that using %s to interpolate a bytes object like the current PEP does is a bad idea, because %s already means 'str' elsewhere in the language, both in 2.7 and 3.x. As for the case you mention: b"abc %s" % (b"def",) -> b"abc def" b"abc %s" % (b"def",) -> b"abc b\"def\"" # because str(bytesobject) == repr(bytesobject) This is perfectly fine, imho. Let's not overload %s to mean "bytes" in format strings if those format strings are in fact not strings but bytes. That way madness lies. K ________________________________________ From: Paul Moore [p.f.moore at gmail.com] Sent: Sunday, January 12, 2014 17:04 To: Kristján Valur Jónsson Cc: Nick Coghlan; Georg Brandl; python-dev at python.org Subject: Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake On 12 January 2014 16:52, Kristján Valur Jónsson wrote: But that's not what the current PEP says. It uses %s for interpolating bytes values. It looks like you're saying that b'abc %s' % (b'def') will *not* produce b'abc def', but rather will produce b'abc b\'def\'' (because str(b'def') is "b'def'"). From kristjan at ccpgames.com Sun Jan 12 22:37:52 2014 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Sun, 12 Jan 2014 21:37:52 +0000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D2CBA6.8050006@hotpy.org> References: <20140111193226.23cc771d@fsol> , , <52D2CBA6.8050006@hotpy.org> Message-ID: +1, even better. ________________________________________ From: Python-Dev [python-dev-bounces+kristjan=ccpgames.com at python.org] on behalf of Mark Shannon [mark at hotpy.org] Sent: Sunday, January 12, 2014 17:06 To: python-dev at python.org Subject: Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake On 12/01/14 16:52, Kristján Valur Jónsson wrote: > Now you're just splitting hairs, Nick.
> > An explicit operator, %s, _defined_ to be "encode a string object using > strict ascii", I don't like this because '%s' reads to me as "insert *string* here". I think '%a' which reads as "encode as ASCII and insert here" would be better. From ethan at stoneleaf.us Sun Jan 12 22:26:58 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 13:26:58 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D303EA.7050008@canterbury.ac.nz> References: <20140111193226.23cc771d@fsol> <52D303EA.7050008@canterbury.ac.nz> Message-ID: <52D308A2.6020709@stoneleaf.us> On 01/12/2014 01:06 PM, Greg Ewing wrote: > Paul Moore wrote: >> >> I could easily argue at this point that this is the type of bug that >> having %-formatting operations on bytes would encourage - %s means >> "format a string" (from years of C and Python (text) experience) so I >> automatically supply a string argument when using %s in a bytes >> formatting context. > > So don't call it %s -- call it something else > such as %b. Which is fine for 3.5+ code, but not at all helpful for a 2/3 code base. -- ~Ethan~ From mark at hotpy.org Sun Jan 12 22:59:11 2014 From: mark at hotpy.org (Mark Shannon) Date: Sun, 12 Jan 2014 21:59:11 +0000 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <52D3093A.2060808@stoneleaf.us> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> <87y52me4tk.fsf@uwakimon.sk.tsukuba.ac.jp> <87txd9djf1.fsf@uwakimon.sk.tsukuba.ac.jp> <52D3093A.2060808@stoneleaf.us> Message-ID: <52D3102F.5020109@hotpy.org> Why not just use six.byte_format(fmt, *args)? It works on both Python2 and Python3 and accepts the numerical format specifiers, plus '%b' for inserting bytes and '%a' for converting text to ascii. 
Admittedly it doesn't exist yet, but it could and it would save a lot of arguing :) (Apologies to anyone who doesn't appreciate my mischievous sense of humour) Cheers, Mark. From ethan at stoneleaf.us Sun Jan 12 23:01:27 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 14:01:27 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> , Message-ID: <52D310B7.60002@stoneleaf.us> On 01/12/2014 01:37 PM, Kristj?n Valur J?nsson wrote: > Right. > I'm saying, let's support two interpolators only: > %b interpolates a bytes object (or one supporting the charbuffer interface) into a bytes object. > %s interpolates a str object by first converting to a bytes object using strict ascii conversion. > > This makes it very explicit what we are trying to do. I think that using %s to interpolate a bytes object like the current PEP does is a bad idea, because %s already means 'str' elsewhere in the language, both in 2.7 and 3.x > > As for the case you mention: > b"abc %s" % (b"def",) -> b"abc def" > b"abc %s" % (b"def",) -> b"abc b\"def\"" # because str(bytesobject) == repr(bytesobject) > > This is perfectly fine, imho. Let's not overload %s to mean "bytes" in format strings if those format strnings are in fact not strings byt bytes. That way madness lies. You didn't say, but I'm guessing you mean the second one is fine? if 2/3 compatible code is the goal, the first should be what we get. 
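A minimal sketch of why the two candidate results quoted above differ, assuming Python 3. It uses plain concatenation rather than any proposed % operator, and the variable names are illustrative only: str() applied to a bytes object returns its repr instead of decoding it, so a %s defined as str(arg).encode('ascii', 'strict') would keep the b'...' wrapper.

```python
value = b"def"

# str() on bytes yields the repr text, including the b prefix and quotes.
as_text = str(value)

# What "%s means str(arg).encode('ascii', 'strict')" would interpolate.
via_str = b"abc " + as_text.encode("ascii", "strict")

# What inserting the bytes unchanged (the suggested %b) would give.
direct = b"abc " + value

print(via_str)
print(direct)
```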
-- ~Ethan~ From v+python at g.nevcal.com Sun Jan 12 22:59:37 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sun, 12 Jan 2014 13:59:37 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D2E9AA.4010308@stoneleaf.us> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D2E9AA.4010308@stoneleaf.us> Message-ID: <52D31049.1090708@g.nevcal.com> On 1/12/2014 11:14 AM, Ethan Furman wrote: >> And a core principle of the bytes/text separation in Python 3 is that >> encoding should never happen implicitly. > > That could be. And yet the bytes type already has several concessions > to ASCII encoding. "%d" % 26 => an explicit request to convert binary integer to a base-10 Unicode/text representation of the integer b"%d" % 26 => an explicit request to convert binary integer to a base-10 ASCII bytes representation of the integer The leading "b" seems to be a very explicit request for bytes rather than characters to me, and seems much more attractive than the proposals to embed binary in Unicode by abusing Latin-1 encoding. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From greg.ewing at canterbury.ac.nz Sun Jan 12 23:10:59 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 13 Jan 2014 11:10:59 +1300 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> Message-ID: <52D312F3.30007@canterbury.ac.nz> Paul Moore wrote: > On 12 January 2014 18:26, Ethan Furman wrote: > >>I'm arguing from three PoVs: >>1) 2 & 3 compatible code base >>2) having the bytes type /be/ the boundary type >>3) readable code > > The only one of these that I can see being in any way an argument against > > def int_to_bytes(n): > return str(n).encode('ascii') > > b'Content Length: ' + int_to_bytes(len(binary_data)) > > is (3), I think the readability argument becomes a bit sharper when you consider more complex examples, e.g. if I have a tuple of 3 floats that I want to put into a PDF file, then b"%f %f %f" % my_floats is considerably clearer than b" ".join((float_to_bytes(f) for f in my_floats)) > My reading of Nick's refusal is that %d takes a value which is > semantically a number, converts it into a base-10 representation > (which is semantically a *string*, not a sequence of bytes[1]) and > then *encodes* that string into a series of bytes using the ASCII > encoding. That is *two* semantic transformations, and one (the ASCII > encoding) is *implicit*. Specifically, it's implicit because (a) the > normal reading of %d is "produce the base-10 representation of a > number, and a base-10 representation is a *string*, and (b) because > nowhere has ASCII been mentioned It's indicated (I won't say "implied", see below) by the fact that we're interpolating it into a bytes object rather than a string. This is no more or less implicit than the fact that when we write b"ABC" then we're saying that those characters are to be encoded in ASCII, and not EBCDIC or UTF-16 or... 
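The two spellings being compared can be sketched without any change to bytes, assuming Python 3. int_to_bytes follows the definition quoted above; float_to_bytes is named in the message but not defined there, so its body here is an assumption, as are the sample values.

```python
def int_to_bytes(n):
    # Text representation first, then an explicit ASCII encoding step.
    return str(n).encode('ascii')

def float_to_bytes(x):
    # Assumed helper: fixed-point text formatting, then ASCII encoding.
    return ('%f' % x).encode('ascii')

binary_data = b'\x00\x01\x02'
header = b'Content-Length: ' + int_to_bytes(len(binary_data))

my_floats = (1.0, 2.5, 0.25)

# The join-based spelling from the message.
joined = b' '.join(float_to_bytes(f) for f in my_floats)

# Formatting as text and encoding once gives the same bytes.
formatted = ('%f %f %f' % my_floats).encode('ascii')

print(header)
print(joined)
```

Both routes make the ASCII encoding step visible in the source, which is the property the explicit-encoding side of the thread is asking for; the readability cost is what the other side objects to.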
BTW, there's a problem with bandying around the words "implicit" and "explicit", because they depend on your frame of reference. For example, one person might say that the fact that b"%s" encodes into ASCII is implicit, because ASCII isn't written down in the code anywhere. But another person might say it's explicit, because the manual explicitly says that stuff interpolated into a bytes object is encoded as ASCII. So arguments of the form "X is bad because it's not explicit" are prone to getting people talking past each other. -- Greg From greg.ewing at canterbury.ac.nz Sun Jan 12 23:12:35 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 13 Jan 2014 11:12:35 +1300 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D2CAED.7090502@stoneleaf.us> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> Message-ID: <52D31353.9040909@canterbury.ac.nz> Ethan Furman wrote: > Your asciistr, which sometimes returns bytes and sometimes returns text, > is absolutely *not* what we want. The kind of third-party thing that *might* fill the bill would be a *function*: bytesformat(b"Content-Length: %d", length) that implements all the %-specifiers we're asking for. -- Greg From greg.ewing at canterbury.ac.nz Sun Jan 12 23:12:45 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 13 Jan 2014 11:12:45 +1300 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D2CBA6.8050006@hotpy.org> Message-ID: <52D3135D.1050507@canterbury.ac.nz> Mark Lawrence wrote: > I entirely agree. This would also parallel the conversion flags given > here http://docs.python.org/3/library/string.html#format-string-syntax, > I quote "Three conversion flags are currently supported: '!s' which > calls str() on the value, '!r' which calls repr() and '!a' which calls > ascii()". 
Except that ascii() does something rather different -- it's a variation on repr() rather than str(), and it doesn't imply any encoding operation. I think this parallel would be more confusing than helpful. -- Greg From ethan at stoneleaf.us Sun Jan 12 23:03:57 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 14:03:57 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <52D3102F.5020109@hotpy.org> References: <20140108234213.0610ef63@fsol> <52D08F19.9070605@stoneleaf.us> <20140111021258.48e72beb@fsol> <52D0A405.50903@trueblade.com> <20140111030403.6f177f30@fsol> <52D16017.1020007@egenix.com> <87y52me4tk.fsf@uwakimon.sk.tsukuba.ac.jp> <87txd9djf1.fsf@uwakimon.sk.tsukuba.ac.jp> <52D3093A.2060808@stoneleaf.us> <52D3102F.5020109@hotpy.org> Message-ID: <52D3114D.1050707@stoneleaf.us> On 01/12/2014 01:59 PM, Mark Shannon wrote: > > Why not just use six.byte_format(fmt, *args)? > It works on both Python2 and Python3 and accepts the numerical format specifiers, plus '%b' for inserting bytes and '%a' > for converting text to ascii. Sounds like the second best option! > Admittedly it doesn't exist yet, > but it could and it would save a lot of arguing :) :) -- ~Ethan~ From rosuav at gmail.com Sun Jan 12 23:28:31 2014 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 13 Jan 2014 09:28:31 +1100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: References: <20140111004933.4e0bb394@fsol> <20140112013500.GW3869@ando> <20140112172220.GB3869@ando> Message-ID: On Mon, Jan 13, 2014 at 4:57 AM, Juraj Sukop wrote: > On Sun, Jan 12, 2014 at 6:22 PM, Steven D'Aprano > wrote: >> First, "utf16_string" confuses me. What is it? If it is a Unicode >> string, i.e.: > > It is a Unicode string which happens to contain code points outside U+00FF > (as with the TTF example above), so that it triggers the (at least) 2-bytes > memory representation in CPython 3.3+. 
I agree, I chose the variable name > poorly, my bad. When I'm talking about Unicode strings based on their maximum codepoint, I usually call them something like "ASCII string", "Latin-1 string", "BMP string", and "SMP string". Still not wholly accurate, but less confusing than naming an encoding... oh wait, two of those _are_ encodings :| But you could use "narrow string" for the first two. Or "string(0..127)" for ASCII, "string(0..255)" for Latin-1, and then for consistency "string(0..65535)" and "string(0..1114111)" for the others, except that I doubt that'd be helpful :) At any rate, "BMP" as a term for "includes characters outside of Latin-1 but all on the Basic Multilingual Plane" would probably be close enough to get away with. ChrisA From p.f.moore at gmail.com Sun Jan 12 23:29:14 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 12 Jan 2014 22:29:14 +0000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D312F3.30007@canterbury.ac.nz> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D312F3.30007@canterbury.ac.nz> Message-ID: On 12 January 2014 22:10, Greg Ewing wrote: > I think the readability argument becomes a bit sharper when > you consider more complex examples, e.g. if I have a tuple > of 3 floats that I want to put into a PDF file, then > > b"%f %f %f" % my_floats > > is considerably clearer than > > b" ".join((float_to_bytes(f) for f in my_floats)) Hmm, I'm not sure I'd agree. I'd quote "explicit is better than implicit", but given comments below, that would be a mistake :-) Let's just leave it that I'd probably wrap the whole thing in a float_list(floats) function in my application, and not *care* how it was implemented. One thing that this does bring up, though, is that all the talk is about %-formatting. 
Do the people who are arguing for numeric formatting have views on what (if any) features will be included in bytes.format()? It seems to me that recasting many of the discussions using format() make it much less "obvious" that adding the features to bytes formatting is a reasonable thing to do. I won't give specific examples, because I would be putting words into people's mouths. But I *would* say that any genuine proposal for numeric formatting in bytes should be cast as a formal PEP and explicitly document both % and format() behaviours. > It's indicated (I won't say "implied", see below) by the > fact that we're interpolating it into a bytes object rather > than a string. > > This is no more or less implicit than the fact that when > we write > > b"ABC" > > then we're saying that those characters are to be encoded > in ASCII, and not EBCDIC or UTF-16 or... That's a fair point, and one I had not taken into consideration. > BTW, there's a problem with bandying around the words > "implicit" and "explicit", because they depend on your frame > of reference. For example, one person might say that the > fact that b"%s" encodes into ASCII is implicit, because > ASCII isn't written down in the code anywhere. But another > person might say it's explicit, because the manual explicitly > says that stuff interpolated into a bytes object is encoded > as ASCII. In my defense, I would say that I was trying to clarify Nick's objections, and it's entirely possible I misrepresented this aspect of them. Personally, I agree that it's not as black and white as simply saying "numeric formatting is wrong", but I think that the fact that %d et al represent a "double transformation" (from number to string representation to encoded bytes) is the differentiating factor here. Proposals that do nothing but interpolation are essentially convenience wrappers for various combinations of concatenation and join. 
Adding "double transformation" formatting codes is a step change, and needs to be explicitly acknowledged and justified. (If you *do* manage to justify such codes, there's a secondary question of precisely what codes should be supported, but we can start by getting agreement that the *class* of codes is allowed). PEP 460 explicitly excludes anything but pure interpolation. > So arguments of the form "X is bad because it's not > explicit" are prone to getting people talking past each > other. Fair point. I hope my above paragraph clarifies my position somewhat better. Paul From stephen at xemacs.org Sun Jan 12 23:31:16 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Jan 2014 07:31:16 +0900 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140112172220.GB3869@ando> References: <20140111004933.4e0bb394@fsol> <20140112013500.GW3869@ando> <20140112172220.GB3869@ando> Message-ID: <87sissessr.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > then the name is horribly misleading, and it is best handled like this: > > content = '\n'.join([ > 'header', > 'part 2 %.3f' % number, > binary_image_data.decode('latin-1'), > utf16_string, # Misleading name, actually Unicode string > 'trailer']) This loses bigtime, as any encoding that can handle non-latin1 in utf16_string will corrupt binary_image_data. OTOH, latin1 will raise on non-latin1 characters. utf16_string must be encoded appropriately then decoded by latin1 to be reencoded by latin1 on output. > On the other hand, if it is actually a bytes object which is the product > of UTF-16 encoding, i.e.: > > type(utf16_string) > => returns bytes > > and those bytes were generated by "some text".encode("utf-16"), then it > is already binary data and needs to be smuggled into the text string. 
> Latin-1 is good for that: > > content = '\n'.join([ > 'header', > 'part 2 %.3f' % number, > binary_image_data.decode('latin-1'), > utf16_string.decode('latin-1'), > 'trailer']) > > > Both examples assume that you intend to do further processing of content > before sending it, and will encode just before sending: > > content.encode('utf-8') > > (Don't use Latin-1, since it cannot handle the full range of text > characters.) This corrupts binary_image_data. Each byte > 127 will be replaced by two bytes. In the second case, you can use latin1 to encode, if it gives you what you want. This kind of subtlety is precisely why MAL warned about use of latin1 to smuggle bytes. From breamoreboy at yahoo.co.uk Sun Jan 12 23:32:06 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sun, 12 Jan 2014 22:32:06 +0000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D2CAED.7090502@stoneleaf.us> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> Message-ID: On 12/01/2014 17:03, Ethan Furman wrote: > On 01/12/2014 08:21 AM, Ethan Furman wrote: >> On 01/12/2014 08:09 AM, Nick Coghlan wrote: >>> On 13 Jan 2014 01:22, "Kristján Valur Jónsson" wrote: >>>> >>>> Imho, this is not equivalent to re-introducing automatic type >>>> conversion between binary/unicode, it is adding a >>>> specific convenience function for explicitly asking for ASCII encoding. >>> >>> It is not explicit, it is implicit - whether or not the resulting >>> string assumes ASCII compatibility or not depends on >>> whether you pass a binary value (no assumption) or a string value >>> (assumes ASCII compatibility). >> >> Nick, I don't understand what you are saying here. Are you saying >> that the result of b'%s' % var may be either a bytes >> object or a str object? Because that would be wrong -- it would >> always be a bytes object. > > Okay, I just went and took a closer look at the asciistr type [1].
For > what it's worth I don't think this is Antoine's understanding of what we > [2] are asking for, nor is it what we are asking for (I'm sure Antoine > will correct me if I'm wrong. ;) > > We know full well the difference between unicode and bytes, and we know > full well that numbers and much of the text we need has an ASCII > (bytes!) representation. When we do a b'Content Length: %d' % > len(binary_data) we are expecting to get back a bytes object, /not/ a > unicode object. > > Your asciistr, which sometimes returns bytes and sometimes returns text, > is absolutely *not* what we want. I've just tried asciistr using your test code (having corrected the typo, it's assertIsInstance, not assertIsinstance :) and it looks like a very good starting point. Have you, or anyone else for that matter, actually tried asciistr out? > > -- > ~Ethan~ > > > [1] https://github.com/jeamland/asciicompat > [2] the dbf and pdf folks, at least -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From ethan at stoneleaf.us Sun Jan 12 23:40:52 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 14:40:52 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <87sissessr.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140111004933.4e0bb394@fsol> <20140112013500.GW3869@ando> <20140112172220.GB3869@ando> <87sissessr.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D319F4.5040804@stoneleaf.us> On 01/12/2014 02:31 PM, Stephen J. Turnbull wrote: > > This corrupts binary_image_data. Each byte > 127 will be replaced by > two bytes. In the second case, you can use latin1 to encode, it it > gives you what you want. > > This kind of subtlety is precisely why MAL warned about use of latin1 > to smuggle bytes. And why I've been fighting Steven D'Aprano on it. 
-- ~Ethan~ From brett at python.org Sun Jan 12 23:46:58 2014 From: brett at python.org (Brett Cannon) Date: Sun, 12 Jan 2014 17:46:58 -0500 Subject: [Python-Dev] Trying to focus the whole bytes/str formatting discussion Message-ID: I don't know about the rest of you but I feel like the discussion is heading off the rails (if it hasn't already jumped the tracks). Let's try to bring this back around to something actionable which people can focus their energy on, as the amount of developer time spent arguing could have led to several coded-up solutions. I see it as a practicality-beats-purity vs. explicit-is-better-than-implicit split. The PBP group want bytes.format() (just assume I include interpolation support if you want that) to work as close to a drop-in replacement for current str.format() use in Python 2 to ease porting. The argument is that code looks cleaner and the amount of changes in Python 2 code being ported to Python 3 is much smaller. The EIBTI group are willing to support PEP 460 but beyond that don't want to have in Python itself anything for bytes.format() which takes in a string and spits out bytes. It's bytes in->bytes out and not bytes & str in->bytes out as the PBP group is after. The EIBTI group are arguing that letting str into bytes.format() and then automatically be converted to strict ASCII leads to conflating the text/bytes divide as well as being too magical, e.g. what if you actually wanted UTF-16 for your number string instead of ASCII; the EIBTI group **wants** to force people to make a decision. They are also less concerned with making users update Python 2 code to handle this as it already needs to be updated for other Python 3 things anyway. From where I'm sitting, the EIBTI group and their PEP 460 proposal from Antoine (and no longer Victor) are not controversial. Everyone seems to agree that PEP 460 **at minimum** is acceptable and should happen for Python 3.5.
The people with the uphill battle and something to prove are those arguing for str in->bytes out support in bytes.format(). The added features that the PBP group want are the ones being argued over. As the onus is on the PBP group to convince the EIBTI group (or Guido), I think the PBP group should code up a solution that does what they want and put it on PyPI to see what the community thinks. If the PBP group wants to convince the EIBTI group that str in->bytes out for bytes.format() is critical in getting a key group of users to start using Python 3 then I think that needs to be demonstrated through real-world usage by some people. If there is serious pickup of the solution from PyPI by projects then we can discuss integrating it into Python 3.5. That gives at least **one year** to come up with a solution which gets picked up by the community (standard requirement for stdlib inclusion). At worst some projects use the PyPI project and find it useful but it doesn't go into Python 3.5. At best lots of people find it useful enough that we add it to Python 3.5. But regardless, a PyPI project helps people **no matter what** the EIBTI group thinks. That's more forward momentum than this conversation currently has. This has split down philosophical lines and does not look to be tilting one way or the other by simply using words. I think it has reached the point that showing code is going to be the only way to tilt the favour towards the PBP group at this point. Guido has not spoken up so either he is ignoring it because he's busy, he doesn't care, or he's mulling things over still. Assuming he doesn't speak up then it comes down to getting a clear majority on the side of the PBP group and that is not going to happen the way this discussion is going. 
So, action items are:
* Get PEP 460 pronounced on **as is**
* A PyPI project containing PBP ideas and see if the community seizes on it or not (benefit to people regardless)
* Do a separate PEP that builds on PEP 460 if people really want to continue down that road at this time
Don't forget, we are talking about Python 3.5; we have not even hit Python 3.4rc1 yet so this level of arguing seems a bit premature and going nowhere.
From solipsis at pitrou.net Sun Jan 12 23:52:37 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 12 Jan 2014 23:52:37 +0100 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> Message-ID: <20140112235237.0646191b@fsol> Hi Ethan, On Sun, 12 Jan 2014 13:28:15 -0800 Ethan Furman wrote: > On 01/12/2014 12:02 PM, Stephen J. Turnbull wrote: > > Georg Brandl writes: > >> Antoine writes: > >>> > >>> . . . if it weren't for your stupid maximalist opposition. . . > >> > >> Can you please stop throwing personal insults around? You don't have to > >> resort to that level. > > > > Ethan's posts (as an example of one general trend in this thread) are > > pretty frustrating, you have to admit. > > Two points: > > 1) Are you saying it's okay to be insulting when frustrated? > I also find this mega-thread frustrating, but I'm trying > very hard not to be insulting. You are right, it is not ok. The wording wasn't constructive or controlled at all. I'd like to apologize for that. At the same time, I was expressing a fair amount of frustration. I think the last discussion rounds have largely failed to produce any new meaningful insight (to the point that I've stopped reading several subthreads).
IMO the best thing *for now* would be to "agree to disagree", let things bake in everyone's mind for some time, and revisit the subject in a few weeks. Regards Antoine. From stephen at xemacs.org Sun Jan 12 23:57:34 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Jan 2014 07:57:34 +0900 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D2E9AA.4010308@stoneleaf.us> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D2E9AA.4010308@stoneleaf.us> Message-ID: <87r48cerkx.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > Nothing else is ideal. I'll go that route if I have to. I > understand that in the real world you go with what works, but in > the development stage you fight for the ideal. :) You're going to lose, because Python 3 chose a different ideal that conflicts with yours. > > My reading of Nick's refusal is that %d takes a value which is > > semantically a number, converts it into a base-10 representation > > (which is semantically a *string*, not a sequence of bytes[1]) and > > then *encodes* that string into a series of bytes using the ASCII > > encoding. > > That could be. And yet the bytes type already has several > concessions to ASCII encoding. No, Nick's point is that there's no encoding needed there at all, just a bunch of methods that handle numbers in the range 0-255. You can rationalize the particular choice of numbers by referring to the ASCII coded character set, and that's very useful to users. But knowledge of ASCII isn't necessary to specify these methods; they can be defined in an encoding/decoding-free way. > But bytes already acknowledges an ASCII bias. True, but that bias is implemented without use of encoding or decoding. b'%d' % (123,) -> b'123' does require encoding, at the very least in the sense of type change and serialization.
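The distinction drawn above can be spelled out in a few lines. This is only an illustrative sketch with made-up values, not code from any of the proposals: it writes the "double transformation" out by hand, and contrasts it with a bytes method that works on byte values alone.

```python
n = 123

# Step 1: number -> base-10 text (a str, not bytes).
text = str(n)

# Step 2: text -> bytes. The codec has to be chosen somewhere;
# here it is named explicitly.
as_ascii = text.encode('ascii')
as_utf16 = text.encode('utf-16-le')  # same digits, different bytes

# By contrast, bytes.upper() needs no codec: it remaps the byte
# values 0x61-0x7A and leaves every other value untouched.
shouted = b'abc\xe9'.upper()
```

The two encodings of the same digits differ (as_utf16 carries a zero byte after each digit), which is the choice an implicit ASCII step would be making on the caller's behalf.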
From fabiofz at gmail.com Mon Jan 13 00:08:10 2014 From: fabiofz at gmail.com (Fabio Zadrozny) Date: Sun, 12 Jan 2014 21:08:10 -0200 Subject: [Python-Dev] Python advanced debug support (update frame code) Message-ID: Hi Python-dev. I'm playing a bit with the concept of live-coding during a debug session, and one of the most annoying things is that although I can reload the code for a function (using something close to xreload), it seems it's not possible to change the code for the current frame (i.e.: I need to get out of the function call and then back in to a call to the method from that frame to see the changes). I had a look at the frameobject and it seems it would be possible to set frame.f_code to another code object -- and set the line number to the start of the new object, which would cover the most common situation, which would be restarting the current frame -- provided the arguments remain the same (which is close to what the Java debugger in Eclipse does when it drops the current frame -- on Python, provided I'm not in a try..except block I can do even better by setting frame.f_lineno, but without being able to change the frame f_code it loses a lot of its usefulness). So, I'd like to ask for feedback from people with more knowledge on whether it'd actually be feasible to change frame.f_code and on the possible implications of doing that. Thanks, Fabio
From v+python at g.nevcal.com Mon Jan 13 00:04:56 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sun, 12 Jan 2014 15:04:56 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <87r48cerkx.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D2E9AA.4010308@stoneleaf.us> <87r48cerkx.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D31F98.5080704@g.nevcal.com> On 1/12/2014 2:57 PM, Stephen J. Turnbull wrote: > > But bytes already acknowledges an ASCII bias. > > True, but that bias is implemented without use of encoding or > decoding. b'%d' % (123,) -> b'123' does require encoding, at the > very least in the sense of type change and serialization. b'%d' all by itself, even before using the % operator, does require encoding, at the very least in the sense of type change and serialization. From cs at zip.com.au Mon Jan 13 00:25:11 2014 From: cs at zip.com.au (Cameron Simpson) Date: Mon, 13 Jan 2014 10:25:11 +1100 Subject: [Python-Dev] Trying to focus the whole bytes/str formatting discussion In-Reply-To: References: Message-ID: <20140112232511.GA43624@cskk.homeip.net> On 12Jan2014 17:46, Brett Cannon wrote: > THE EIBTI group are willing to support PEP 460 but beyond that don't want > to have in Python itself anything for bytes.format() which takes in a > string and spits out bytes. It's bytes in->bytes out and not bytes & str > in->bytes out as the PBP group is after. The EIBTI group are arguing that > letting str into bytes.format() and then automatically be converted to > strict ASCII leads to conflating the text/bytes divide as well as being too > magical, e.g. what if you actually wanted UTF-16 for you number string > instead of ASCII; the EIBTI group **wants** to force people to make a > decision.
They are also less concerned with making users update Python 2 > code to handle this as it already needs to be updated for other Python 3 > things anyway. [...] I'm in the EIBTI on the whole, but I would also be happy for the bytes.format() function to accept strings (and floats or whatever the str.format supports) _provided_ it required an explicit encoding= parameter to enable it. i.e. make it easy to use, _but_ require an overt specification of the str->bytes encoding. You don't even need a special mode, but have it raise a ValueError if the (default) encoding is None when an encoding became needed. Just my 2c on Brett's EIBTI vs PBP divide. I'll try to stay off this thread now and bikeshed only in the others... -- Cameron Simpson You can blip it twice to clear the bore, But blip it thrice, and you've sinned once more. - Tom Warner From steve at pearwood.info Mon Jan 13 00:43:55 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 13 Jan 2014 10:43:55 +1100 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <87sissessr.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140111004933.4e0bb394@fsol> <20140112013500.GW3869@ando> <20140112172220.GB3869@ando> <87sissessr.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20140112234354.GD3869@ando> On Mon, Jan 13, 2014 at 07:31:16AM +0900, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > then the name is horribly misleading, and it is best handled like this: > > > > content = '\n'.join([ > > 'header', > > 'part 2 %.3f' % number, > > binary_image_data.decode('latin-1'), > > utf16_string, # Misleading name, actually Unicode string > > 'trailer']) > > This loses bigtime, as any encoding that can handle non-latin1 in > utf16_string will corrupt binary_image_data. OTOH, latin1 will raise > on non-latin1 characters. utf16_string must be encoded appropriately > then decoded by latin1 to be reencoded by latin1 on output. 
Of course you're right, but I have understood the above as being a sketch and not real code. (E.g. does "header" really mean the literal string "header", or does it stand in for something which is a header?) In real code, one would need to have some way of telling where the binary image data ends and the Unicode string begins. If I have misunderstood the situation, then my apologies for compounding the error [...] > > Both examples assume that you intend to do further processing of content > > before sending it, and will encode just before sending: > > > > content.encode('utf-8') > > > > (Don't use Latin-1, since it cannot handle the full range of text > > characters.) > > This corrupts binary_image_data. Each byte > 127 will be replaced by > two bytes. And reading it back using decode('utf-8') will replace those two bytes with a single byte, round-tripping exactly. Of course if you encode to UTF-8 and then try to read the binary data as raw bytes, you'll get corrupted data. But do people expect to do this? That's a genuine question -- again, I assumed (apparently wrongly) that the idea was to write the content out as *text* containing smuggled bytes, and read it back the same way. > In the second case, you can use latin1 to encode, it it > gives you what you want. > > This kind of subtlety is precisely why MAL warned about use of latin1 > to smuggle bytes. How would you smuggle a chunk of arbitrary bytes into a text string? Short of doing something like uuencoding it into ASCII, or equivalent. 
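The byte counts being argued over here are easy to check. A minimal sketch, using a made-up 256-byte blob to stand in for binary_image_data and a single euro sign to stand in for text outside Latin-1:

```python
# Stand-in for binary_image_data: every possible byte value once.
blob = bytes(range(256))

# "Smuggle" the bytes into a str, one code point per byte.
smuggled = blob.decode('latin-1')

# Encoding that str as UTF-8 does not give the original bytes back:
# each of the 128 code points above U+007F becomes two bytes.
as_utf8 = smuggled.encode('utf-8')

# The data survives only if the reader reverses both steps...
round_trip = as_utf8.decode('utf-8').encode('latin-1')

# ...or if Latin-1 is used on output as well.
direct = smuggled.encode('latin-1')

# Latin-1 on output fails once real text beyond U+00FF is mixed in.
mixed = smuggled + '\u20ac'
try:
    mixed.encode('latin-1')
    latin1_failed = False
except UnicodeEncodeError:
    latin1_failed = True
```

So both statements in the exchange hold under their own assumptions: a consumer reading the UTF-8 output as raw bytes sees 384 bytes instead of 256, while a consumer that decodes UTF-8 and re-encodes as Latin-1 recovers the blob exactly.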
-- Steven From ethan at stoneleaf.us Mon Jan 13 00:46:39 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 15:46:39 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <87r48cerkx.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D2E9AA.4010308@stoneleaf.us> <87r48cerkx.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D3295F.7070104@stoneleaf.us> On 01/12/2014 02:57 PM, Stephen J. Turnbull wrote: > Ethan Furman writes: >> >> Nothing else is ideal. I'll go that route if I have to. I >> understand that in the real world you go with what works, but in >> the development stage you fight for the ideal. :) > > You're going to lose, because Python 3 chose a different ideal that > conflicts with yours. Entirely possible. I didn't set out to waste anyone's time, but I wasn't around for the initial discussions so don't know the reasons behind the result, only that the result is not an appropriate boundary type despite it being what is handed around at the boundaries. >>> My reading of Nick's refusal is that %d takes a value which is >>> semantically a number, converts it into a base-10 representation >>> (which is semantically a *string*, not a sequence of bytes[1]) and >>> then *encodes* that string into a series of bytes using the ASCII >>> encoding. >> >> That could be. And yet the bytes type already has several >> concessions to ASCII encoding. > > No, Nick's point is that there's no encoding needed there are all, > just a bunch of methods that handle numbers in the range 0-255. You > can rationalize the particular choice of numbers by referring to the > ASCII coded character set, and that's very useful to users. But > knowledge of ASCII isn't necessary to specify these methods; they can > be defined in an encoding/decoding-free way. How can you say that with a straight face? 
[1] Do you really think that .title, .isalnum, and .center (to name only a few) would work the same if the assumed encoding was EBCDIC? Do you think they would do the proper transformations, or return the proper result, if the bytes they were used on were encoded Japanese? >> But bytes already acknowledges an ASCII bias. > > True, but that bias is implemented without use of encoding or > decoding. b'%d' % (123,) -> b'123' does require encoding, at the > very least in the sense of type change and serialization. You mean like changing a number into text does? Really, this is no different. -- ~Ethan~ [1] I'm sorry to be offensive, but I have no idea how to respond to that that acknowledges my complete astonishment that you would say such a thing. From ethan at stoneleaf.us Mon Jan 13 00:15:34 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 15:15:34 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <20140112235237.0646191b@fsol> References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <20140112235237.0646191b@fsol> Message-ID: <52D32216.5090009@stoneleaf.us> On 01/12/2014 02:52 PM, Antoine Pitrou wrote: > > You are right, it is not ok. The wording wasn't constructive or > controlled at all. I'd like to apologize for that. Thank you. Apology accepted. > At the same point, I was expressing a fair amount of frustration. I > think the last discussion rounds have largely failed to produce any > new meaningful insight (to the point that I've stopped reading several > subthreads). IMO the best thing *for now* would be to "agree to > disagree", let things bake in everyone's mind for some time, and > revisit the subject in some weeks. For the most part I agree. I did, though, finally figure out what Nick thought I wanted, so there was at least a little progress.
But yes, I think tabling the discussion for now, and working on Brett's ideas, is entirely appropriate. -- ~Ethan~ P.S. Direct reply so you don't miss my response. :) From guido at python.org Mon Jan 13 00:55:23 2014 From: guido at python.org (Guido van Rossum) Date: Sun, 12 Jan 2014 15:55:23 -0800 Subject: [Python-Dev] PEP 460 reboot Message-ID: There's a lot of discussion about PEP 460 and I haven't read it all. Maybe you all have already reached the same conclusion that I have. In that case I apologize (but the PEP should be updated). Here's my contribution: PEP 460 itself currently rejects support for %d, AFAIK on the basis that bytes aren't necessarily ASCII. I think that's a misunderstanding of the intention of the bytes type. The key reason for introducing a separate bytes type in Python 3 is to avoid *mixing* bytes and text. This aims to avoid the classic Python 2 Unicode failure, where str+unicode fails or succeeds based on whether str contains non-ASCII characters or not, which means it is easy to miss in testing. Properly written code in Python 3 will fail based on the *type* of the objects, not based on their contents. Content-based failures are still possible, but they occur in typical "boundary" operations such as encode/decode. But this does not mean the bytes type isn't allowed to have a noticeable bias in favor of encodings that are ASCII supersets, even if not all bytes objects contain such data (e.g. image data, compressed data, binary network packets, and so on). IMO it's totally fine and consistent if b'%d' % 42 returns b'42' and also for b'{}'.format(42) to return b'42'. 
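The "fail on type, not on content" property described above can be checked directly. A small sketch with made-up values; try_concat is a hypothetical helper introduced only for this illustration:

```python
def try_concat(raw, text):
    """Return the name of the exception raised by raw + text, or None."""
    try:
        raw + text
    except Exception as exc:
        return type(exc).__name__
    return None


ascii_only = b'Content-Length: '   # bytes that happen to be ASCII
non_ascii = b'\xff\xd8\xff\xe0'    # bytes that are clearly not text

# Mixing bytes and str fails the same way for both inputs, so the
# mistake shows up in testing regardless of the data used.
results = [try_concat(ascii_only, '42'), try_concat(non_ascii, '42')]

# Staying within one type works; the number has to be turned into
# bytes explicitly before it can be appended.
header = ascii_only + str(42).encode('ascii')
```

The last line is the explicit spelling that yields the same b'42' the message argues b'%d' % 42 should be allowed to return.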
There are numerous places where bytes are already assumed to use an ASCII superset:
- byte literals: b'abc' (it's a syntax error to have a non-ASCII character here)
- the upper() and lower() methods modify the ASCII letter positions
- int(b'42') == 42, float(b'3.14') == 3.14
I looked through the example code I recently wrote for asyncio (which uses bytes for all data read or written). There are several places where I have to make a clumsy detour via text strings because I need to include an ASCII-encoded decimal integer (e.g. the Content-Length header) or a hex-encoded one (e.g. for Transfer-Encoding: chunked). Those detours aren't needed for parsing because int() accepts bytes just fine. I also note that the behavior of the re module is perfect: if the pattern is bytes, it can only match bytes and the extracted data is bytes, and ditto for text -- so it supports both types but doesn't allow mixing them. The urllib module does this too -- at considerable cost in its implementation, but it's the right thing, because there really are good cases to be made for treating URLs as text as well as for treating them as bytes (as with filenames, command line arguments, and environment variables). I'm sad that the json module in Python 3 doesn't support bytes at all, but at least it is consistent -- it always produces text in ASCII encoding (by default). The same applies to the http module, which IIUC adheres to the standard by treating headers as Latin-1. -- --Guido van Rossum (python.org/~guido) From stephen at xemacs.org Mon Jan 13 01:02:04 2014 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Mon, 13 Jan 2014 09:02:04 +0900 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <52D319F4.5040804@stoneleaf.us> References: <20140111004933.4e0bb394@fsol> <20140112013500.GW3869@ando> <20140112172220.GB3869@ando> <87sissessr.fsf@uwakimon.sk.tsukuba.ac.jp> <52D319F4.5040804@stoneleaf.us> Message-ID: <87ppnweolf.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > > This kind of subtlety is precisely why MAL warned about use of latin1 > > to smuggle bytes. > > And why I've been fighting Steven D'Aprano on it. No, I think you haven't been fighting Steven d'A on "it". You're talking about parsing and generating structured binary files, he's talking about techniques for parsing and generating streams with no real structure above the byte or encoded character level. Of course you can implement the former with the latter using Python 3 "str", but it's ugly, maybe even painful if you need to encode binary blobs back to binary to process them. (More discussion in my other post, although I suspect you're not going to be terribly happy with that, either. ;-) This generally *is not* the case for the wire protocol guys. AFAICT they really do want to process things as streams of ASCII-compatible text, with the non-ASCII stuff treated as runs of uninterpreted bytes that are just passed through. So when you talk about "we", I suspect you are not the "we" everybody else is arguing with. In particular, AIUI your use case is not included in the use cases most of us -- including Steven -- are thinking about. From stephen at xemacs.org Mon Jan 13 01:08:46 2014 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Mon, 13 Jan 2014 09:08:46 +0900 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D31049.1090708@g.nevcal.com> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D2E9AA.4010308@stoneleaf.us> <52D31049.1090708@g.nevcal.com> Message-ID: <87ob3geoa9.fsf@uwakimon.sk.tsukuba.ac.jp> Glenn Linderman writes: > the proposals to embed binary in Unicode by abusing Latin-1 > encoding. Those aren't "proposals", they are currently feasible techniques in Python 3 for *some* use cases. The question is why infecting Python 3 with the byte/character confoundance virus is preferable to such techniques, especially if their (serious!) deficiencies are removed by creating a new type such as asciistr. From ethan at stoneleaf.us Mon Jan 13 01:16:43 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 16:16:43 -0800 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <87ppnweolf.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140111004933.4e0bb394@fsol> <20140112013500.GW3869@ando> <20140112172220.GB3869@ando> <87sissessr.fsf@uwakimon.sk.tsukuba.ac.jp> <52D319F4.5040804@stoneleaf.us> <87ppnweolf.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D3306B.1060204@stoneleaf.us> On 01/12/2014 04:02 PM, Stephen J. Turnbull wrote: > > So when you talk about "we", I suspect you are not the "we" everybody > else is arguing with. In particular, AIUI your use case is not > included in the use cases most of us -- including Steven -- are > thinking about. Ah, so even in the minority I'm in the minority. :/ The "we" I am usually referring to are those of us who have to deal with the mixed ASCII/binary/encoded text files (a couple have spoken up about PDFs, and I have DBF). 
-- ~Ethan~ From donald at stufft.io Mon Jan 13 01:46:19 2014 From: donald at stufft.io (Donald Stufft) Date: Sun, 12 Jan 2014 19:46:19 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: Message-ID: On Jan 12, 2014, at 6:55 PM, Guido van Rossum wrote: > The key reason for introducing a separate bytes type in Python 3 is to > avoid *mixing* bytes and text. This aims to avoid the classic Python 2 > Unicode failure, where str+unicode fails or succeeds based on whether > str contains non-ASCII characters or not, which means it is easy to > miss in testing. +1 > > But this does not mean the bytes type isn't allowed to have a > noticeable bias in favor of encodings that are ASCII supersets, even > if not all bytes objects contain such data (e.g. image data, > compressed data, binary network packets, and so on). +1 > > IMO it's totally fine and consistent if b'%d' % 42 returns b'42' and > also for b'{}'.format(42) to return b'42'. There are numerous places > where bytes are already assumed to use an ASCII superset: > > - byte literals: b'abc' (it's a syntax error to have a non-ASCII character here) > - the upper() and lower() methods modify the ASCII letter positions > - int(b'42') == 42, float(b'3.14') == 3.14 Completely Agree. > > I looked through the example code I recently write for asyncio (which > uses bytes for all data read or written). There are several places > where I have to make a clumsy detour via text strings because I need > to include an ASCII-encoded decimal integer (e.g. the Content-Length > header) or a hex-encoded one (e.g. for Transfer-Encoding: chunked). > Those detours aren't needed for parsing because int() accepts bytes > just fine. > > I also note that the behavior of the re module is perfect: if the > pattern is bytes, it can only match bytes and the extracted data is > bytes, and ditto for text -- so it supports both types but doesn't > allow mixing them. 
The urllib module does this too -- at considerable
> cost in its implementation, but it's the right thing, because there
> really are good cases to be made for treating URLs as text as well as
> for treating them as bytes (as with filenames, command line arguments,
> and environment variables).
>
> I'm sad that the json module in Python 3 doesn't support bytes at all,
> but at least it is consistent -- it always produces text in ASCII
> encoding (by default). The same applies to the http module, which IIUC
> adheres to the standard by treating headers as Latin-1.
>
> --
> --Guido van Rossum (python.org/~guido)

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From guido at python.org Mon Jan 13 01:47:15 2014
From: guido at python.org (Guido van Rossum)
Date: Sun, 12 Jan 2014 16:47:15 -0800
Subject: [Python-Dev] PEP 460 reboot
In-Reply-To: <52D3333B.9000404@stoneleaf.us>
References: <52D3333B.9000404@stoneleaf.us>
Message-ID:

On Sun, Jan 12, 2014 at 4:28 PM, Ethan Furman wrote:
> On 01/12/2014 03:55 PM, Guido van Rossum wrote:
>>
>> There's a lot of discussion about PEP 460 and I haven't read it all.
>> Maybe you all have already reached the same conclusion that I have.
>
> No, no agreement has been reached. Your contribution is timely.
>
>> PEP 460 itself currently rejects support for %d, AFAIK on the basis
>> that bytes aren't necessarily ASCII. I think that's a misunderstanding
>> of the intention of the bytes type.
>
>> [...]
this does not mean the bytes type isn't allowed to have a
>> noticeable bias in favor of encodings that are ASCII supersets, even
>> if not all bytes objects contain such data [...]
>
>> IMO it's totally fine and consistent if b'%d' % 42 returns b'42' and
>> also for b'{}'.format(42) to return b'42' [...]
>>
>> - byte literals: b'abc' (it's a syntax error to have a non-ASCII character
>> here)
>> - the upper() and lower() methods modify the ASCII letter positions
>> - int(b'42') == 42, float(b'3.14') == 3.14
>
> So if we allow the numeric modifiers [1], the only remaining question is do
> we allow %c and %s, and if so how do they behave?
>
> Guido?

Yes, all the numeric formatting codes such as %x, %o, %e, %f, %g should work, as should the padding, justification and related modifiers. E.g. b'%4x' % 10 should return b'   a'.

%c looks simple enough too: With an int it should insert one byte, insisting that the value is in range(256). With a bytes argument the length should be 1. (I note that I can't remember ever using %c -- it's just there because it's in C.)

%s seems the trickiest: I think with a bytes argument it should just insert those bytes (and the padding modifiers should work too), and for other types it should probably work like %a, so that it works as expected for numeric values, and with a string argument it will return the ascii()-variant of its repr(). Examples:

    b'%s' % 42 == b'42'
    b'%s' % 'x' == b"'x'" (i.e. the three-byte string containing an 'x' enclosed in single quotes)

I have to admit I didn't know about ascii(). It's nifty. :-)

> --
> ~Ethan~
>
> [1] modifiers is not the right word for %i, %x, etc, is it? What is the
> correct term?

I'd interpret "modifiers" as the stuff that can go between the % and the format letter, e.g. %04d or %-.3s. The term I'd use for %i, %x etc would be numeric formatting codes.
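
The semantics described in this message can be sketched roughly as follows. This assumes Python 3 and only emulates the proposal with existing operations; `proposed_s` and `proposed_c` are hypothetical helper names made up for illustration, not an existing or planned API.

```python
# Hypothetical helpers emulating the proposed behaviour; not an existing API.

def proposed_s(value):
    """Emulate the proposed b'%s' for a single argument."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)               # bytes are inserted unchanged
    return ascii(value).encode('ascii')   # anything else behaves like %a


def proposed_c(value):
    """Emulate the proposed b'%c': an int in range(256) or a single byte."""
    if isinstance(value, int):
        return bytes([value])             # raises ValueError outside range(256)
    if len(value) != 1:
        raise ValueError('%c needs a bytes argument of length 1')
    return bytes(value)


# Numeric codes and modifiers, spelled with the detour through str
# that the proposal is meant to make unnecessary.
padded_hex = ('%4x' % 10).encode('ascii')

print(padded_hex)        # four bytes: three spaces, then b'a'
print(proposed_c(65))    # a single byte
print(proposed_s(42))    # digits only, no quotes
print(proposed_s('x'))   # the quotes are part of the result
```

The last call shows why a str argument would be hard to miss in the output: the surrounding quotes from the repr end up in the byte string.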
-- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Mon Jan 13 01:28:43 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 16:28:43 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: Message-ID: <52D3333B.9000404@stoneleaf.us> On 01/12/2014 03:55 PM, Guido van Rossum wrote: > There's a lot of discussion about PEP 460 and I haven't read it all. > Maybe you all have already reached the same conclusion that I have. No, no agreement has been reached. Your contribution is timely. > PEP 460 itself currently rejects support for %d, AFAIK on the basis > that bytes aren't necessarily ASCII. I think that's a misunderstanding > of the intention of the bytes type. > [...] this does not mean the bytes type isn't allowed to have a > noticeable bias in favor of encodings that are ASCII supersets, even > if not all bytes objects contain such data [...] > IMO it's totally fine and consistent if b'%d' % 42 returns b'42' and > also for b'{}'.format(42) to return b'42' [...] > > - byte literals: b'abc' (it's a syntax error to have a non-ASCII character here) > - the upper() and lower() methods modify the ASCII letter positions > - int(b'42') == 42, float(b'3.14') == 3.14 So if we allow the numeric modifiers [1], the only remaining question is do we allow %c and %s, and if so how do they behave? Guido? -- ~Ethan~ [1] modifiers is not the right word for %i, %x, etc, is it? What is the correct term? From guido at python.org Mon Jan 13 01:56:35 2014 From: guido at python.org (Guido van Rossum) Date: Sun, 12 Jan 2014 16:56:35 -0800 Subject: [Python-Dev] Trying to focus the whole bytes/str formatting discussion In-Reply-To: References: Message-ID: Sorry, I started my own "PEP 460 reboot" thread -- I wrote that message before yours arrived, even if maybe I posted after you. I'm in the PBP camp myself for this. I won't pronounce on PEP 460 as-is. Please follow up in the other thread if you need clarifications. 
On Sun, Jan 12, 2014 at 2:46 PM, Brett Cannon wrote: > I don't know about the rest of you but I feel like the discussion is heading > off the rails (if it hasn't already jumped the tracks). Let's try to bring > this back around to something actionable which people can focus their energy > on as the amount of developer time spent arguing could have led to several > coded-up solutions. > > I see it as a practicality-beats-purity vs. > explicit-is-better-than-implicit. The PBP group want bytes.format() (just > assume I include interpolation support if you want that) to work as close to > a drop-in replacement for current str.format() use in Python 2 to ease > porting. The argument is that code looks cleaner and the amount of changes > in Python 2 code being ported to Python 3 is much smaller. > > THE EIBTI group are willing to support PEP 460 but beyond that don't want to > have in Python itself anything for bytes.format() which takes in a string > and spits out bytes. It's bytes in->bytes out and not bytes & str in->bytes > out as the PBP group is after. The EIBTI group are arguing that letting str > into bytes.format() and then automatically be converted to strict ASCII > leads to conflating the text/bytes divide as well as being too magical, e.g. > what if you actually wanted UTF-16 for you number string instead of ASCII; > the EIBTI group **wants** to force people to make a decision. They are also > less concerned with making users update Python 2 code to handle this as it > already needs to be updated for other Python 3 things anyway. > > From where I'm sitting, the EIBTI group and their PEP 460 proposal from > Antoine (and no longer Victor) are not controversial. Everyone seems to > agree that PEP 460 **at minimum** is acceptable and should happen for Python > 3.5. The people with the uphill battle and something to prove are those > arguing for str in->bytes out support in bytes.format(). 
The added features > that the PBP group want are the ones being argued over. > > As the onus is on the PBP group to convince the EIBTI group (or Guido), I > think the PBP group should code up a solution that does what they want and > put it on PyPI to see what the community thinks. If the PBP group wants to > convince the EIBTI group that str in->bytes out for bytes.format() is > critical in getting a key group of users to start using Python 3 then I > think that needs to be demonstrated through real-world usage by some people. > > If there is serious pickup of the solution from PyPI by projects then we can > discuss integrating it into Python 3.5. That gives at least **one year** to > come up with a solution which gets picked up by the community (standard > requirement for stdlib inclusion). At worst some projects use the PyPI > project and find it useful but it doesn't go into Python 3.5. At best lots > of people find it useful enough that we add it to Python 3.5. But > regardless, a PyPI project helps people **no matter what** the EIBTI group > thinks. That's more forward momentum than this conversation currently has. > > This has split down philosophical lines and does not look to be tilting one > way or the other by simply using words. I think it has reached the point > that showing code is going to be the only way to tilt the favour towards the > PBP group at this point. Guido has not spoken up so either he is ignoring it > because he's busy, he doesn't care, or he's mulling things over still. > Assuming he doesn't speak up then it comes down to getting a clear majority > on the side of the PBP group and that is not going to happen the way this > discussion is going. 
>
> So, action items are:
>
> * Get PEP 460 pronounced on **as is**
> * A PyPI project containing PBP ideas and see if the community seizes on it
> or not (benefit to people regardless)
> * Do a separate PEP that builds on PEP 460 if people really want to continue
> down that road at this time
>
> Don't forget, we are talking about Python 3.5; we have not even hit Python
> 3.4rc1 yet so this level of arguing seems a bit premature and going nowhere.

--
--Guido van Rossum (python.org/~guido)

From ethan at stoneleaf.us Mon Jan 13 02:27:12 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 12 Jan 2014 17:27:12 -0800
Subject: [Python-Dev] PEP 460 reboot
In-Reply-To:
References: <52D3333B.9000404@stoneleaf.us>
Message-ID: <52D340F0.5080401@stoneleaf.us>

On 01/12/2014 04:47 PM, Guido van Rossum wrote:
>
> %s seems the trickiest: I think with a bytes argument it should just
> insert those bytes (and the padding modifiers should work too), and
> for other types it should probably work like %a, so that it works as
> expected for numeric values, and with a string argument it will return
> the ascii()-variant of its repr(). Examples:
>
> b'%s' % 42 == b'42'
> b'%s' % 'x' == b"'x'" (i.e. the three-byte string containing an 'x'
> enclosed in single quotes)

I'm not sure about the quotes. Would anyone ever actually want those in the byte stream?
--
~Ethan~

From steve at pearwood.info Mon Jan 13 03:03:15 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 13 Jan 2014 13:03:15 +1100
Subject: [Python-Dev] Smuggling bytes into text (was Re: RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5)
In-Reply-To: <52D21C59.2010600@stoneleaf.us>
References: <20140111053639.GM3869@ando> <20140111153837.GP3869@ando> <52D16F4B.6080900@stoneleaf.us> <20140111183631.GQ3869@ando> <52D19600.1070106@stoneleaf.us> <20140112022909.GX3869@ando> <52D21C59.2010600@stoneleaf.us>
Message-ID: <20140113020314.GF3869@ando>

Changing the subject line to better describe what we're talking about. I hope it is of interest to others apart from Ethan and me -- mixed bytes and text is hard to get right. (And if I've got something wrong, I'd like to know about it.)

On Sat, Jan 11, 2014 at 08:38:49PM -0800, Ethan Furman wrote:
> On 01/11/2014 06:29 PM, Steven D'Aprano wrote:
[...]
> Since you're talking to me, it would be nice if you addressed the same
> use-case I was addressing, which is mixed: ascii-encoded text,
> ascii-encoded numbers, ascii-encoded bools, binary-encoded numbers, and
> misc-encoded text.

I thought I had addressed it. But since your use-case is underspecified, please excuse me if I get some of it wrong.

> And no, your example will not work with any text, it would completely
> moji-bake my dbf files.

I don't think it will. Admittedly, I don't know all the ins and outs of your files, but as far as I can tell, nothing you have said so far suggests that my plan will fail. Code speaks louder than words:

http://www.pearwood.info/ethan_demo.py

This code produces a string containing smuggled bytes. There is:

- a header containing raw bytes;
- metadata consisting of the name of some encoding in ASCII;
- A series of tagged fields. Each field has a name, which is always ASCII, and terminated with a colon.
It is then followed by a single ASCII character and some data:

  * T for some arbitrary chunk of text, encoded in the metadata encoding, with a length byte prefix (that is, like a Pascal string);
  * F for a boolean flag "true" or "false" in ASCII;
  * N for an integer, a C long;
  * D for an integer, in ASCII, terminated at the first non-digit;
  * B for a chunk of arbitrary bytes, with a two-byte length prefix.

And the whole thing is written out to a file, then read back in, without data corruption or mojibake. I wrote this about 1am this morning, so it may or may not be a shining example of idiomatic Python code, but it works and is readable.

I understand that this won't match your actual use-case precisely, but I hope it contains the same sorts of mixed binary data and ASCII text that you're talking about. There are fixed width fields, variable length fields, binary fields, ASCII fields, non-ASCII text, and multiple encodings, all living in perfect harmony :-)

And it runs unchanged under both Python 2.7 and 3.3.

As so often happens, what seems good in principle is less useful in practice. Once I actually started writing code, I quickly moved beyond the simple model:

    template = "some text"
    data = template % ("text", 42, b'\x16foo'.decode('latin-1'))

that I thought would be easy to a more structured approach. So I wrote reader and writer classes and abstracted away the messy bits, although in truth none of it is very messy. The worst is dealing with the 2 versus 3 differences, and even that requires only a handful of small helper functions.

I don't claim that the code I tossed together is the optimal design, or bug-free, or even that the exact same approach will work for your specific case. But it is enough to demonstrate that the basic idea is sound, you can process mixed text and bytes in a clean way, it doesn't generate mojibake, and can operate in both 2.7 and 3.3 without even using a __future__ directive.

> >>>Only the binary blobs need to be decoded.
We don't need to encode the > >>>template to bytes, and the textual data doesn't get encoded until we're > >>>ready to send it across the wire or write it to disk. > > No! When I have text, part of which gets ascii-encoded and part of which > gets, say, cp1251 encoded, I cannot wait till the end! I think we are talking about different textual data. It's a bit ambiguous, my apologies. You're talking about taking individual fields and deciding how to process them. I'm talking about doing your processing in the text domain, which means at the end of the process I have a Unicode string object rather than a bytes object. Before that str can be written to disk, it needs to be encoded. > >>And what if your name field has data not representable in latin-1? > >> > >>--> '\xd1\x81\xd1\x80\xd0\x83'.decode('utf8') > >>u'\u0441\u0440\u0403' > > > >Where did you get those bytes from? You got them from somewhere. > > For the sake of argument, pretend a user entered them in. > > >Who knows? Who cares? Once you have bytes, you can treat them as a blob of > >arbitrary bytes and write them to the record using the Latin-1 trick. > > No, I can't. See above. > > > If > >you're reading those bytes from some stream that gives you bytes, you > >don't have to care where they came from. > > You're kidding, right? If I don't know where they came from (a graphics > field? a note field?) how am I going to know how to treat them? As I understand it, you want the ability to store *arbitrary bytes* in the file, right? Here are nine arbitrary bytes: b'\x82\xE1\xC2\0\0\x7B\0\xFF\xA8' You don't need to know how I generated them, whether they are sound samples, data from a serial port, three RGB values, or some strange C struct. I need to know how to generate them, but you can treat them as an opaque blob. They're *already* bytes, you're not responsible for converting whatever the data was into bytes, because it's already done. It's just a blob of bytes as far as you're concerned. 
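
The round trip being described can be sketched as follows, assuming Python 3. The `'N:%d;B:%s'` template is a made-up record layout used only for illustration; it is not the DBF format and not the layout used in the demo script.

```python
# The nine arbitrary bytes quoted above.
blob = b'\x82\xE1\xC2\0\0\x7B\0\xFF\xA8'

# bytes -> str: Latin-1 maps every byte value to exactly one code point,
# so this decode cannot fail and loses nothing.
smuggled = blob.decode('latin-1')
assert len(smuggled) == len(blob)

# Work in the text domain (hypothetical template, for illustration only).
record = 'N:%d;B:%s' % (42, smuggled)

# str -> bytes when the record is finally written out.
on_disk = record.encode('latin-1')
assert on_disk.endswith(blob)

# Reading it back recovers the original bytes unchanged.
recovered = on_disk.decode('latin-1').split('B:', 1)[1].encode('latin-1')
print(recovered == blob)
```

The sketch only holds as long as every character in the text template itself is encodable as Latin-1; text outside that range has to be encoded to bytes first and smuggled the same way.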
All you need to do is smuggle them into a text string.

> >But what if you don't start with bytes? If you start with a bunch of
> >floats, you'll probably convert them to bytes using the struct module.
>
> Yup, and I do.
>
> >If you start with non-ASCII text, you have to convert them to bytes too.
> >No difference here.
>
> Really?

Again, I fear I failed to explain myself in sufficient detail. If your non-ASCII text doesn't match the encoding specified, how else are you going to include it? See below.

> You just said above that "it will work with any text data" -- you
> can't have it both ways.

I have been unclear, I apologise. Let me try again with an example. As the end-user, I get to specify the encoding, that's what you said. Okay, I specify ISO-8859-7, which is Greek. Now obviously if I hand you a bunch of Russian letters in a string, and you try to encode them using ISO-8859-7, you're going to get an exception. That's okay, as presumably I'm sensible enough to only include characters which exist in the encoding I choose, and if not, it's my own damn fault.

But suppose I have a reason for this strange behaviour. If I pre-encode those Russian letters to bytes, using (say) UTF-16, then I can hand you the raw bytes to store as a binary blob. Later, I get the binary blob back again, and I can decode them using UTF-16, to get the original Russian text back again. So long as you don't mangle the binary blob, the process is completely reversible. That is what I am talking about.

> >You ask the user for their name, they answer "???" which is given to you
> >as a Unicode string, and you want to include it in your data record. The
> >specifications of your file format aren't clear, so I'm going to assume
> >that:
> >
> >1) ASCII text is allowed "as-is" (that is, the name "George" will be
> > in the final data file as b'George');
>
> User data is not (typically) where the ASCII data is, but some of the
> metadata is definitely and always ASCII.
The user text data needs to be > encoded using whichever codec is specified by the file, which is only > occasionally ASCII. > > > >2) any other non-ASCII text will be encoded as some fixed encoding > > which we can choose to suit ourselves; > > Well, the user chooses it, we have to abide by their choice. (It's kept in > the file metadata.) > > > >3) arbitrary binary data is allowed "as-is" (i.e. byte N has to end up > > being written as byte N, for any value of N between 0 and 255). > > In a couple field types, yes. Usually the binary data is numeric or date > related and there is conversion going on there, too, to give me the bytes I > need. The above all sounds reasonable. But the following does not -- I think it shows some fundamental confusion on your part. > [snip] > > >>--> '\xd1\x81\xd1\x80\xd0\x83'.decode('utf8').encode('latin1') > >>Traceback (most recent call last): > >> File "", line 1, in > >>UnicodeEncodeError: 'latin-1' codec can't encode characters in position > >>0-2: ordinal not in range(256) > > > >That is backwards to what I've shown. Look at my earlier example again: > > And you are not paying attention: > > '\xd1\x81\xd1\x80\xd0\x83'.decode('utf8').encode('latin1') > \--------------------------------------/ \-------------/ > a non-ascii compatible unicode string to latin1 bytes You can't *decode* Unicode strings. Try it in Python 3, and it breaks: py> '\xd1\x81\xd1\x80\xd0\x83'.decode('utf8') Traceback (most recent call last): File "", line 1, in AttributeError: 'str' object has no attribute 'decode' For your code to work, you can't be using Python 3, you have to be using Python 2, where "..." is already bytes, not Unicode. Since it's a byte string, there's no point in decoding it into UTF-8, then encoding it back to bytes. 
All you are doing is running the risk of UnicodeEncodeError:

    # Python 2.7 this time
    py> '\xd0\x94'.decode('utf-8').encode('latin-1')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0414' in position 0: ordinal not in range(256)

Latin-1 does not work with arbitrary *characters*, but it does work with arbitrary *bytes*. You're trying to take a UTF-8 encoded byte string, decode back to arbitrary Unicode characters, then *encode* to Latin-1, which may fail. What I am doing is taking arbitrary *bytes*, then *decode* to Latin-1 as a way of smuggling those bytes into a str.

> ("???".encode('some_non_ascii_encoding_such_as_cp1251').decode('latin-1'), 42, blob.decode('latin-1'))
>  \----------------------------------------------/ \--------------/
>   getting the actual bytes I need                  and back into
>                                                    unicode until I write them later

In Python 3, that works, but I'm not sure if it does what you intend (I don't know what you intend). You have encode and decode the right way around this time, for Python 3 strings.

In Python 2, the interpreter (wrongly) accepts "???" as a byte-string literal, but the results are poorly defined. What you actually get (probably) depends on your environment. On my system, I seem to get UTF-8 encoded bytes, but that's not guaranteed.

> You did say to use a *text* template to manipulate my data, and then write
> it later, no? Well, this is what it would look like.

If the text strings the user gives you are compatible with the encoding they specify, you don't need that. Just use:

    ("???", 42, blob.decode('latin-1'))

It's the user's responsibility if they choose to specify an encoding which is more restrictive than the contents of some field. If they do that, they have to encode that field somehow, so they can treat it as a binary blob.
*You* don't have to do this, and you certainly don't have to take perfectly good text and turn it into bytes then back to text just so you can insert it back into text. That would be silly. > >Bytes get DECODED to latin-1, not encoded. > > > >Bytes -> text is *decoding* > >Text -> bytes is *encoding* > > Pretend for a moment I know that, and look at my examples again. Sorry to be harsh, but based on your swapping decode and encode around above in the examples above, I would have to pretend :-) > I am demonstrating the contortions needed when my TEXTual data is not > ASCII-compatible: It must be ENcoded using the appropriate codec to BYTES, > then DEcoded back to unicode using latin1, all so later I can ENcode the > bloomin' unicode data structure back to bytes using latin1 again. Dizzy > yet? No. If I, the end user, insist on using a stupid legacy encoding, then *YES* absolutely of course I have to jump through hoops to store arbitrary Unicode characters using a legacy encoding that only supports a tiny subset of Unicode. This should not surprise you. > And you must know this, because it is what your bytify function does. Are > you trolling? No. -- Steven From dholth at gmail.com Mon Jan 13 03:07:27 2014 From: dholth at gmail.com (Daniel Holth) Date: Sun, 12 Jan 2014 21:07:27 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D340F0.5080401@stoneleaf.us> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> Message-ID: On Sun, Jan 12, 2014 at 8:27 PM, Ethan Furman wrote: > On 01/12/2014 04:47 PM, Guido van Rossum wrote: >> >> >> %s seems the trickiest: I think with a bytes argument it should just >> insert those bytes (and the padding modifiers should work too), and >> for other types it should probably work like %a, so that it works as >> expected for numeric values, and with a string argument it will return >> the ascii()-variant of its repr(). Examples: >> >> b'%s' % 42 == b'42' >> b'%s' % 'x' == b"'x'" (i.e. 
the three-byte string containing an 'x' >> enclosed in single quotes) > > > I'm not sure about the quotes. Would anyone ever actually want those in the > byte stream? > > -- > ~Ethan~ Is there a formatting character that means "anything except a unicode string" to prevent accidentally interpolating a Unicode string into a bytes string without [a sane] encoding? From guido at python.org Mon Jan 13 03:11:47 2014 From: guido at python.org (Guido van Rossum) Date: Sun, 12 Jan 2014 18:11:47 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D340F0.5080401@stoneleaf.us> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> Message-ID: On Sun, Jan 12, 2014 at 5:27 PM, Ethan Furman wrote: > On 01/12/2014 04:47 PM, Guido van Rossum wrote: >> %s seems the trickiest: I think with a bytes argument it should just >> insert those bytes (and the padding modifiers should work too), and >> for other types it should probably work like %a, so that it works as >> expected for numeric values, and with a string argument it will return >> the ascii()-variant of its repr(). Examples: >> >> b'%s' % 42 == b'42' >> b'%s' % 'x' == b"'x'" (i.e. the three-byte string containing an 'x' >> enclosed in single quotes) > > I'm not sure about the quotes. Would anyone ever actually want those in the > byte stream? Perhaps not, but it's a hint that you should probably think about an encoding. It's symmetric with how '%s' % b'x' returns "b'x'". Think of it as payback time. 
:-) -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Jan 13 03:18:08 2014 From: guido at python.org (Guido van Rossum) Date: Sun, 12 Jan 2014 18:18:08 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> Message-ID: On Sun, Jan 12, 2014 at 6:07 PM, Daniel Holth wrote: > Is there a formatting character that means "anything except a unicode > string" to prevent accidentally interpolating a Unicode string into a > bytes string without [a sane] encoding? No, and we shouldn't introduce one. An operation should either work for no type, one type, a few specific types, or all types. Something that works for all but one type will *appear* to work for all types to a casually experimenting user and may pass extensive unittests, leaving a bomb that can detonate when you least expect it. -- --Guido van Rossum (python.org/~guido) From steve at pearwood.info Mon Jan 13 03:21:25 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 13 Jan 2014 13:21:25 +1100 Subject: [Python-Dev] Smuggling bytes into text (was Re: RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5) In-Reply-To: <20140113020314.GF3869@ando> References: <20140111053639.GM3869@ando> <20140111153837.GP3869@ando> <52D16F4B.6080900@stoneleaf.us> <20140111183631.GQ3869@ando> <52D19600.1070106@stoneleaf.us> <20140112022909.GX3869@ando> <52D21C59.2010600@stoneleaf.us> <20140113020314.GF3869@ando> Message-ID: <20140113022125.GG3869@ando> On Mon, Jan 13, 2014 at 01:03:15PM +1100, Steven D'Aprano wrote: > code speaks louder than words: http://www.pearwood.info/ethan_demo.py [...] Ethan refers to code like: template % ("???".encode('cp1251').decode('latin-1'), 42, blob.decode('latin-1')) > > You did say to use a *text* template to manipulate my data, and then write > > it later, no? Well, this is what it would look like. 
> > If the text strings the user gives you are compatible with the > encoding they specify, you don't need that. Just use: > > ("???", 42, blob.decode('latin-1')) > > It's the user's responsibility if they choose to specify an encoding > which is more restrictive than the contents of some field. If they do > that, they have to encode that field somehow, so they can treat it as a > binary blob. *You* don't have to do this, and you certainly don't have > to take perfectly good text and turn it into bytes then back to text > just so you can insert it back into text. That would be silly. It occurs to me that I do exactly that in my demo code :-) In my defence, it was 1am when I wrote it, and I am a little unclear about Nathan's use-case whether the entire file is supposed to be compatible with the cp1251 encoding (the example that he gives), or just individual fields in it. If I understood the requirements better, my code would probably be able to avoid some of those encodes/decodes, or I might even decide that working in the text domain is a mistake and instead we should look to smuggle text into bytes rather than the other way around. Regardless of which way you go, I'm not seeing that mixed bytes and text should be a reason to hold off migrating from 2 to 3. Which is where this discussion started days and days ago. -- Steven From dholth at gmail.com Mon Jan 13 03:30:35 2014 From: dholth at gmail.com (Daniel Holth) Date: Sun, 12 Jan 2014 21:30:35 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> Message-ID: On Sun, Jan 12, 2014 at 9:18 PM, Guido van Rossum wrote: > On Sun, Jan 12, 2014 at 6:07 PM, Daniel Holth wrote: >> Is there a formatting character that means "anything except a unicode >> string" to prevent accidentally interpolating a Unicode string into a >> bytes string without [a sane] encoding? > > No, and we shouldn't introduce one. 
An operation should either work > for no type, one type, a few specific types, or all types. Something > that works for all but one type will *appear* to work for all types to > a casually experimenting user and may pass extensive unittests, > leaving a bomb that can detonate when you least expect it. That pretty much describes how I feel about str(bytes). I would accept "only a bytes" or "only a string" as consolation formatting characters :-) From ethan at stoneleaf.us Mon Jan 13 03:16:17 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 18:16:17 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> Message-ID: <52D34C71.1020200@stoneleaf.us> On 01/12/2014 06:07 PM, Daniel Holth wrote: > On Sun, Jan 12, 2014 at 8:27 PM, Ethan Furman wrote: >> On 01/12/2014 04:47 PM, Guido van Rossum wrote: >>> >>> >>> %s seems the trickiest: I think with a bytes argument it should just >>> insert those bytes (and the padding modifiers should work too), and >>> for other types it should probably work like %a, so that it works as >>> expected for numeric values, and with a string argument it will return >>> the ascii()-variant of its repr(). Examples: >>> >>> b'%s' % 42 == b'42' >>> b'%s' % 'x' == b"'x'" (i.e. the three-byte string containing an 'x' >>> enclosed in single quotes) >> >> I'm not sure about the quotes. Would anyone ever actually want those in the >> byte stream? > > Is there a formatting character that means "anything except a unicode > string" to prevent accidentally interpolating a Unicode string into a > bytes string without [a sane] encoding? In reference to a byte stream, if you do: --> b'%s' % 'some text'.encode('cp1241') it's really just bytes into bytes. If you do : --> b'%s' % 'some text' then the encoding is ASCII with strict error checking. 
So if it's not representable as clean ASCII either encode it manually, or prepare for it to blow up with an UnicodeEncodeError. -- ~Ethan~ From ethan at stoneleaf.us Mon Jan 13 03:24:37 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 18:24:37 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D34C71.1020200@stoneleaf.us> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D34C71.1020200@stoneleaf.us> Message-ID: <52D34E65.2050007@stoneleaf.us> On 01/12/2014 06:16 PM, Ethan Furman wrote: > > If you do : > > --> b'%s' % 'some text' Ignore what I previously said. With no encoding the result would be: b"'some text'" So an encoding should definitely be specified. -- ~Ethan~ From ethan at stoneleaf.us Mon Jan 13 03:31:48 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 18:31:48 -0800 Subject: [Python-Dev] Smuggling bytes into text (was Re: RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5) In-Reply-To: <20140113020314.GF3869@ando> References: <20140111053639.GM3869@ando> <20140111153837.GP3869@ando> <52D16F4B.6080900@stoneleaf.us> <20140111183631.GQ3869@ando> <52D19600.1070106@stoneleaf.us> <20140112022909.GX3869@ando> <52D21C59.2010600@stoneleaf.us> <20140113020314.GF3869@ando> Message-ID: <52D35014.3080603@stoneleaf.us> On 01/12/2014 06:03 PM, Steven D'Aprano wrote: > > The above all sounds reasonable. But the following does not -- I think > it shows some fundamental confusion on your part. My apologies. The '\xd1.....' was a bytestring, I forgot to type the b. 
(I know, I know, I should've copied and pasted :( ) -- ~Ethan~ From ethan at stoneleaf.us Mon Jan 13 03:34:11 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 18:34:11 -0800 Subject: [Python-Dev] Smuggling bytes into text (was Re: RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5) In-Reply-To: <20140113022125.GG3869@ando> References: <20140111053639.GM3869@ando> <20140111153837.GP3869@ando> <52D16F4B.6080900@stoneleaf.us> <20140111183631.GQ3869@ando> <52D19600.1070106@stoneleaf.us> <20140112022909.GX3869@ando> <52D21C59.2010600@stoneleaf.us> <20140113020314.GF3869@ando> <20140113022125.GG3869@ando> Message-ID: <52D350A3.4050308@stoneleaf.us> On 01/12/2014 06:21 PM, Steven D'Aprano wrote: > On Mon, Jan 13, 2014 at 01:03:15PM +1100, Steven D'Aprano wrote: > >> code speaks louder than words: http://www.pearwood.info/ethan_demo.py > > [...] > > Ethan refers to code like: > >>> template % ("???".encode('cp1251').decode('latin-1'), 42, blob.decode('latin-1')) > > It occurs to me that I do exactly that in my demo code :-) Well, at least you see the point I was trying to make, even if you don't agree. :) I apologize again for my typos that made it look like I had no idea what I was talking about. ;) -- ~Ethan~ From stephen at xemacs.org Mon Jan 13 04:02:05 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Jan 2014 12:02:05 +0900 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D308EF.8080108@stoneleaf.us> References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> Message-ID: <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > 1) Are you saying it's okay to be insulting when frustrated? I > also find this mega-thread frustrating, but I'm trying > very hard not to be insulting. OK, no. Understandable, yes. 
> 2) If you are going to use my name, please be certain of the facts > [1]. More below. > > > MAL posted straight out the Python 2 model of text makes it easier for > > him to write some programs, so he's all for reintroducing it. And > > that is the whole truth of the matter. Although I disagree with him, > > I appreciate his honesty. > > If you have an example of me lying (even if it's just a > possibility), please refer to it directly so I can either try to > explain the misunderstanding or apologize. Praising one person for honesty doesn't imply anybody else is lying. As for the Artist Currently Posting as Ethan Furman, he's not in the "disingenuous" group. I don't think you understand the issues at stake (among other things, as I've discussed elsewhere, I think your use case is different from the use cases of most of those who are asking for bytes formatting). And there's a crucial terminology difference: > In only one case did I use the word "text" loosely, From my point of view, you consistently do so. Bytes are *never* Python 3 text in my terminology, and I think that is generally accepted on these channels. "ASCII-encoded text" as you call it (and repeatedly do so), and want to manipulate using str-like methods on bytes, is *exactly* the Python 2 model of text. But you deny that the effect of your proposals (eg, b"%d" % (12,)) is to reintroduce Python 2's bytes/character confusion, don't you? Yes, I've used "ASCII-compatible text" in some of my posts, but I recognize that as "loose usage", too, and would stop if requested. Note I'm not asking you to stop -- I think we all understand what you mean, even though for some of us it's loose terminology. What I do hope you will recognize is that adding str-like methods to bytes is precisely the Python 2 model of text processing[1], and that like MAL you will say, "OK, I don't see a problem with reintroducing Python 2's byte/character confusion."
(Well, I *really* want you to see the light, and retract your proposal for b'%d' format. But that hardly seems likely. :-) > But don't lie to me (as Nick tried to) and say that "In particular, > the bytes type is, and always will be, designed for pure binary > manipulation" when it has methods like .center(). I hardly think Nick is *lying*, any more than you are. AFAICT, you're *both* wrong. According to PEP 3137[2] by Guido van Rossum, the idea of the immutable bytes type was suggested (in various aspects which combined to overcome Guido's initial opposition) by Gregory P. Smith, Jeffrey Yasskin, and Talin. Guido then chose to implement it by grabbing the Python 2 code, and removing .encode, and removing locale-dependent definitions of character classes. This was with a view to supporting ports of code that implements wire protocols or uses bytes as encoded text: It also makes it possible to efficiently create hash tables using bytes for keys; this may be useful when parsing protocols like HTTP or SMTP which are based on bytes representing text. Porting code that manipulates binary data (or encoded text) in Python 2.x will be easier using the new design than using the original 3.0 design with mutable bytes; simply replace str with bytes and change '...' literals into b'...' literals. IIRC, only later was regex support added to bytes (by Nick himself, again IIRC). And despite the quote above, I don't think Guido meant to encourage use of bytes as text in wire protocol development, at least not at that time. Note that Nick has already admitted that permitting even methods that can be implemented purely as numerical manipulations: def is_uppercase(b): # Note all comparisons are between integers: return ord('A') <= b[0] and b[0] <= ord('Z') was in retrospect a mistake (in his opinion). So I don't think it was a lie, merely a difference in your definitions of "pure binary manipulation". 
(Which isn't surprising, given that ultimately everything in computers as we know them today eventually reduces to "pure binary manipulations".[3] Drawing the line is going to involve personal taste to some extent.) I think his interpretation that bytes were *designed* that way is a bit strained given PEP 3137. I also don't know what was discussed at language summits, and don't recall the python-dev conversations about it at all. A final remark: Be very careful in interpreting Guido's words in these "practical vs. pure" matters. I've discovered his offhand comments on these matters are often both subtle and deep (that probably doesn't surprise you), and that the idea behind them is usually extremely precise though his expression may be informal or even casual (and here be dragons -- taking the expression too literally may lead you astray). > I think some of the misunderstanding (which you also seem to suffer > from) is that we (or at least I) /ever/ want a unicode string back > from bytes interpolation. I don't! Please tell me why you think I suffer from that misunderstanding. I certainly don't think you *want* Unicode strings. You've been quite strident about the fact that you don' need no steekin' yooneekode (for these purposes). What I want to find out is why your use case can't be handled with Python 3 str. That's why I provide examples (mostly parallel to yours) that return str in Python 3 (I can't speak for anyone else). > To summarize, I used the term text when referring to unicode text > (str), ASCII or ASCII-encoded text to refer to bytes that are to be > used in a place that requires ASCII bytes for communication (such > as content length or field type). I've never been confused about that, but your use of the word "text" in a way different from others in the thread seems to confuse you about what *they* mean.
But did you get that I'm worried that programmers in Omaha will use that same functionality to communicate American English (for which it is basically sufficient, and which also requires ASCII when bytes are used for communication)? > *My* definition is not ambiguous at all. If this particular part > of the byte stream is defined to contain ASCII-encoded text, then I > can use the bytes text methods to work with it. But how is Python supposed to know that? The point of having types in a programming language is so that either the interpreter can just DTRT, or raise an exception if TRT is ambiguous, without explicit specification by the programmer. This is precisely what asciistr is for: it knows that it is both unicode and bytes compatible, and morphs automatically to whichever it is combined with. And does so efficiently (because they're all immutable, any combination of these types in Python involves copying "code units", and for asciistr that copy is always of bytes, thus reducing eventually to memcpy for bytes and latin1-only str). But under your definition, you need to make the decision, or explicitly code the decision, on the basis of context. > > When it's convenient for them to use text-processing operations > > on bytes, they'll say "oh, yes, these are conventionally > > considered text-processing features, but that's just an accident > > of the particular configuration of bytes -- yup, bytes -- I'm > > processing." > > If that particular configuration of bytes is because it's > ASCII-encoded text, then sure. Once again, you are advocating precisely the Python 2 model of text. > To use, for example, bytes.__upper__ on data that wasn't > ASCII-encoded text (even if it happened to look like it was) would > be the height of stupidity. Please don't include me in such > accusations. I have no idea why you think I think anybody would be that stupid. That never occurred to me.
It's precisely "magic numbers" that happen to look like English words when interpreted as ASCII coded characters that I don't want manipulated by str-like methods that interpret text (such as full-featured format or %). If b"Content-Length: 123" is (ASCII-encoded) text, then it should be created as, or decoded to, internal text and handled that way. If it's binary, then handle it as binary. > > ambiguous form". IMO, with the proposed changes, that is likely to > > continue indefinitely, negating some of the gains I expected to > > receive from Python 3. :-( > > This would be a good reason to reject PEP 460, if that danger was > deemed more likely than the good it would bring. Depends on which version. I earlier opposed PEP 460 in any form, but I'm persuaded by Nick's particular definition of "pure binary manipulation" and agree that PEP 460 as revised by Antoine is harmless to my goals. Although I personally am unlikely to find any great convenience from it (both as a matter of style and to a great extent a lack of use cases, although I'd like to get involved in the email module). > > Note: there are a lot of high-level frameworks like Django that even > > in Python 2 basically went to Unicode everywhere internally. I don't > > deny that. I think that Python 3 as currently constituted makes it a > > lot easier to make an appropriate decision of where to convert, and > > should take some of the burden off the high-level frameworks. > > Approving this PEP, especially in a maximalist form, will blur the > > lines. > > I understand your point, but I disagree. When I open a file (in > binary mode, obviously, as otherwise I'd get massive corruption) Obviously, *you* would open the file in binary mode, but by definition of the latin1 codec and the surrogateescape handler, *I* can definitely avoid any corruption when reading such files as text. (This may require painful contortions if one does any nontrivial processing, but then again it may not.) 
> I get back a bunch of bytes. When working with tcp, I get back a > bunch of bytes. bytes are /already/ the boundary type. No, they are not. Clearly there are "just bytes" on the "outside" of I/O in each of your examples here, and they are "just copied" to the inside of Python. But in Nick's sense, this is the "outside," *not* the "inside", of your program! On the "inside", *you* want "a bool, an int, a float, a date, or, even, a str" (I'm quoting!). What Nick means by a "boundary type" is a type that works seamlessly with the types on each side of the boundary as a helper in the conversion. So when you use a struct to pack a bool, an int, and a date into a bytes, the struct is the boundary type. And if there's a helper type to work with bytes and/or str simultaneously, that's a boundary type, eg, asciistr. But bytes itself is not a boundary type, it's just a type with no internal structure, not even characters. > If we have to make a third type for proper boundary processing it's > an admission that bytes failed in its role. That admission was made in PEP 3100. Or, more precisely, bytes was never considered as a boundary type in Python 3. Footnotes: [1] To be precise, one of two models, the other one being the unicode type. [2] http://www.python.org/dev/peps/pep-3137/ [3] OK, OK, I still have my Daddy's K&E loglog slide rule. Not *everything* is binary! From scott+python-dev at scottdial.com Mon Jan 13 04:26:24 2014 From: scott+python-dev at scottdial.com (Scott Dial) Date: Sun, 12 Jan 2014 22:26:24 -0500 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: Message-ID: <52D35CE0.2050603@scottdial.com> On 2014-01-11 22:09, Nick Coghlan wrote: > For Python 2 folks trying to grok where the "bright line" is in terms of > the Python 3 text model: if your proposal includes *any* kind of > implicit serialisation of non binary data to binary, it is going to be > rejected as an addition to the core bytes type. 
If it avoids crossing > that line (as the buffer-API-only version of PEP 460 does), then we can > talk. To take such a hard-line stance, I would expect you to author a PEP to strip the ASCII conveniences from the bytes and bytearray types. Otherwise, I find it a bit schizophrenic to argue that methods like lower, upper, title, and etc. don't implicitly assume encoding: >>> a = "scott".encode('utf-16') >>> b = a.title() >>> c = b.decode('utf-16') 'SCOTT' So, clearly title() not only depends on the bytes characters encoded in a superset of ASCII characters, it depends on the bytes being a sequence of ASCII characters, which looks an awful lot like an operation on an implicit encoded string. >>> b"????" File "", line 1 SyntaxError: bytes can only contain ASCII literal characters. There is an implicit serialization right there. My terminal is utf8 (or even if my source encoding is utf8), so why would that not be: b'\xe6\x96\x87\xe5\xad\x97\xe5\x8c\x96\xe3\x81\x91' I sympathize with Ethan that the bytes and bytearray types already seem to concede that bytes is the type you want to use for 7-bit ASCII manipulations. If that is not what we want, then we are not doing a good job communicating that to developers with the API. At the onset, the bytes literal itself seems to be an attractive nuisance as it gives a nod to using bytes for ASCII character sequences (a.k.a ASCII strings). Regards, -Scott -- Scott Dial scott at scottdial.com From guido at python.org Mon Jan 13 04:45:13 2014 From: guido at python.org (Guido van Rossum) Date: Sun, 12 Jan 2014 19:45:13 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D34C71.1020200@stoneleaf.us> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D34C71.1020200@stoneleaf.us> Message-ID: On Sun, Jan 12, 2014 at 6:16 PM, Ethan Furman wrote: > In reference to a byte stream, if you do: > > --> b'%s' % 'some text'.encode('cp1241') > > it's really just bytes into bytes. 
That's a confusing example -- it would be clearer to just show b'%s' % b'some text' > If you do : > > --> b'%s' % 'some text' > > then the encoding is ASCII with strict error checking. So if it's not > representable as clean ASCII either encode it manually, or prepare for it to > blow up with an UnicodeEncodeError. You don't say what outcome you want, but if you wanted b'%s' % 'some text' to return b'some text' while b'%s' % '\u1234' should blow up, you're back at the Python 2 approach and that is the last thing I want. -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Jan 13 04:47:38 2014 From: guido at python.org (Guido van Rossum) Date: Sun, 12 Jan 2014 19:47:38 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D34E65.2050007@stoneleaf.us> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D34C71.1020200@stoneleaf.us> <52D34E65.2050007@stoneleaf.us> Message-ID: On Sun, Jan 12, 2014 at 6:24 PM, Ethan Furman wrote: > On 01/12/2014 06:16 PM, Ethan Furman wrote: >> >> >> If you do : >> >> --> b'%s' % 'some text' > > > Ignore what I previously said. With no encoding the result would be: > > b"'some text'" > > So an encoding should definitely be specified. Yes, but the encoding is no business of %s or %. As far as the formatting operation cares, if the argument is bytes they will be copied literally, and if the argument is a str (or anything else) it will call ascii() on it. -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Jan 13 04:49:46 2014 From: guido at python.org (Guido van Rossum) Date: Sun, 12 Jan 2014 19:49:46 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D35CE0.2050603@scottdial.com> References: <52D35CE0.2050603@scottdial.com> Message-ID: Those still arguing on this thread might want to look at the thread "PEP 460 reboot". 
-- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Mon Jan 13 04:56:06 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 19:56:06 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D34C71.1020200@stoneleaf.us> Message-ID: <52D363D6.6090207@stoneleaf.us> On 01/12/2014 07:45 PM, Guido van Rossum wrote: > On Sun, Jan 12, 2014 at 6:16 PM, Ethan Furman wrote: >> In reference to a byte stream, if you do: >> >> --> b'%s' % 'some text'.encode('cp1241') >> >> it's really just bytes into bytes. > > That's a confusing example -- it would be clearer to just show > > b'%s' % b'some text' > >> If you do : >> >> --> b'%s' % 'some text' >> >> then the encoding is ASCII with strict error checking. So if it's not >> representable as clean ASCII either encode it manually, or prepare for it to >> blow up with an UnicodeEncodeError. > > You don't say what outcome you want, but if you wanted b'%s' % 'some > text' to return b'some text' while b'%s' % '\u1234' should blow up, > you're back at the Python 2 approach and that is the last thing I > want. Fair enough. I'm cool with getting back b"'some_text'" and not ever blowing up. -- ~Ethan~ From stephen at xemacs.org Mon Jan 13 05:12:33 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Jan 2014 13:12:33 +0900 Subject: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5 In-Reply-To: <20140112234354.GD3869@ando> References: <20140111004933.4e0bb394@fsol> <20140112013500.GW3869@ando> <20140112172220.GB3869@ando> <87sissessr.fsf@uwakimon.sk.tsukuba.ac.jp> <20140112234354.GD3869@ando> Message-ID: <87lhykeczy.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > Of course you're right, but I have understood the above as being a > sketch and not real code. (E.g. does "header" really mean the literal > string "header", or does it stand in for something which is a header?) 
> In real code, one would need to have some way of telling where the > binary image data ends and the Unicode string begins. Sure, but I think in Ethan's case it's probably out of band. I have been assuming out of band. > > This corrupts binary_image_data. Each byte > 127 will be replaced by > > two bytes. > > And reading it back using decode('utf-8') will replace those two bytes > with a single byte, round-tripping exactly. True, but I'm assuming Ethan himself didn't choose DBF format. > Of course if you encode to UTF-8 and then try to read the binary data as > raw bytes, you'll get corrupted data. But do people expect to do this? People? Real People use Python, they wouldn't do that. :-) But the app that forced Ethan to deal with DBF might. > > This kind of subtlety is precisely why MAL warned about use of latin1 > > to smuggle bytes. > > How would you smuggle a chunk of arbitrary bytes into a text string? > Short of doing something like uuencoding it into ASCII, or > equivalent. Arbitrary bytes as a chunk? I wouldn't do that, probably (see below), and it's not possible in Python 3 at present (in str ASCII codes always represent the corresponding ASCII character, they are never uninterpreted bytes). But if I know where the bytes are going to be in the str, I'd use latin1 or (encoding='ascii', errors='surrogateescape') depending on how well-controlled the processing is. If I really "own" those bytes, I might use latin1, and just "forget" all of the string-processing functions that care about character identity (eg, case manipulation). If the bytes might somehow end up leaking into the rest of the program, I'd use surrogateescape and live with the doubled space usage. But really, if it's not a wire-to-wire protocol kind of thing, I'd go ahead and create a proper model for the data, and text would be text, and chunks of arbitrary bytes would be bytes and integers would be integers....
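[Sketch, not part of the original message.] The two smuggling approaches mentioned above (latin1, and ascii with errors='surrogateescape') can be tried out roughly as follows. The names blob, as_latin1 and as_escaped are made up for illustration; only standard codecs and error handlers are used.

```python
# Every possible byte value, standing in for an arbitrary binary chunk.
blob = bytes(range(256))

# latin-1 maps each byte 0-255 to the code point with the same value,
# so decoding and re-encoding round-trips exactly.
as_latin1 = blob.decode('latin-1')
assert as_latin1.encode('latin-1') == blob

# ascii + surrogateescape keeps bytes < 128 as ASCII characters and turns
# each byte >= 128 into a lone surrogate (U+DC80..U+DCFF); it also round-trips.
as_escaped = blob.decode('ascii', errors='surrogateescape')
assert as_escaped.encode('ascii', errors='surrogateescape') == blob

# The "each byte > 127 becomes two bytes" effect: writing the latin-1
# smuggled text out as UTF-8 changes the on-disk bytes...
utf8_form = as_latin1.encode('utf-8')
assert len(utf8_form) == 128 + 2 * 128
# ...although reading it back as UTF-8 recovers the original bytes.
assert utf8_form.decode('utf-8').encode('latin-1') == blob

# Why one has to "forget" case manipulation on latin-1 smuggled data:
# byte 0xE9 is treated as a lowercase letter and silently becomes 0xC9.
assert b'\xe9'.decode('latin-1').upper().encode('latin-1') == b'\xc9'
```

The last assertion is the hazard in miniature: with latin1 the smuggled bytes look like ordinary letters to str methods, whereas the surrogate code points produced by surrogateescape are not letters and are left alone by case conversion.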
From stephen at xemacs.org Mon Jan 13 05:27:27 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Jan 2014 13:27:27 +0900 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D3295F.7070104@stoneleaf.us> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D2E9AA.4010308@stoneleaf.us> <87r48cerkx.fsf@uwakimon.sk.tsukuba.ac.jp> <52D3295F.7070104@stoneleaf.us> Message-ID: <87k3e4ecb4.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > On 01/12/2014 02:57 PM, Stephen J. Turnbull wrote: > > No, Nick's point is that there's no encoding needed there at all, > > just a bunch of methods that handle numbers in the range 0-255. You > > can rationalize the particular choice of numbers by referring to the > > ASCII coded character set, and that's very useful to users. But > > knowledge of ASCII isn't necessary to specify these methods; they can > > be defined in an encoding/decoding-free way. > > How can you say that with a straight face? [1] Because I showed you code that does it. Did you see an .encode or a .decode in there? > Do you really think that .title, .isalnum, and .center (to name > only a few) would work the same if the assumed encoding was EBCDIC? Yes, yes, and yes. The numbers involved would change, and the test for finding letters would be different (and more complicated IIRC). The only one to worry about is .title, but neither ASCII nor EBCDIC has confused or multiple letter titlecase. > Do you think they would do the proper transformations, or return > the proper result, if the bytes they were used on were encoded > Japanese? That depends on which Japanese encoding. It would work correctly on UTF-8 and on EUC-JP (packed), and not on any of the others. But you wouldn't consider that "ASCII-encoded text", would you? > >> But bytes already acknowledges an ASCII bias.
> > > > True, but that bias is implemented without use of encoding or > > decoding. b'%d' % (123,) -> b'123' does require encoding, at the > > very least in the sense of type change and serialization. > > You mean like changing a number into text does? Really, this is no > different. Precisely. "There should be one- and preferably only one -way to do it." The one way uses text, so preferably bytes shouldn't. From ethan at stoneleaf.us Mon Jan 13 05:00:33 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 20:00:33 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D364E1.5060704@stoneleaf.us> On 01/12/2014 07:02 PM, Stephen J. Turnbull wrote: [snip most of very eloquent reply] Thank you, Stephen, for remaining calm despite my somewhat heated response. A few comments in-line. I now better understand your viewpoint about text always being unicode strings; I just happen to disagree. Hopefully as some consolation I will be very vocal about using str unless bytes is necessary. Any application that uses text should be using str for it, and only using bytes, if necessary, on the back-end. > Ethan Furman writes: >> In only one case did I use the word "text" loosely, > > [...] Bytes are *never* Python 3 text in my terminology [...] "ASCII-encoded text" > as you call it [...] and want to manipulate using str-like methods on bytes The part that you don't seem to acknowledge (sorry if I missed it) is that there are str-like methods already on bytes. While the actual implementation of isupper (your example from below) may be done using integer methods, it only makes semantic sense if interpreted as ASCII-encoded text. 
> is *exactly* the Python 2 model of text. But you deny that the > effect of your proposals (eg, b"%d" % (12,)) is to reintroduce Python > 2's bytes/character confusion, don't you? Given that the default (and only) text type in Py3 is str, which is unicode, I don't think any confusion will be as severe, but I acknowledge that there could be some. > I hardly think Nick is *lying*, any more than you are. AFAICT, you're > *both* wrong. LOL, well, at least I'm in good company, then! :) >> I think some of the misunderstanding (which you also seem to suffer >> from) is that we (or at least I) /ever/ want a unicode string back >> from bytes interpolation. I don't! > > Please tell me why you think I suffer from that misunderstanding. I no longer recall, but whatever misapprehension I was suffering from you have alleviated. (That sentence would make my daughter proud! English major. ;) > But did you get that I'm worried that programmers in Omaha will use > that same functionality to communicate American English (for which it > is basically sufficient, and which also requires ASCII when bytes are > used for communication)? Yes, I get that. Hopefully their friends and neighbors will slap them with fishes if they do. >> *My* definition is not ambiguous at all. If this particular part >> of the byte stream is defined to contain ASCII-encoded text, then I >> can use the bytes text methods to work with it. > > But how is Python supposed to know that? Python doesn't need to. bytes is a low-level object -- it could contain music, movies, dbf data, pdf data, or my mother's cheesecake recipe (properly encoded, of course). Python can't protect me from treating a music file as if it were a movie file, or even just writing proper music info at the wrong place in the music file; all that is up to me, as the programmer, to get right, and to understand what is needed. > But under your definition, you need to make the decision, or > explicitly code the decision, on the basis of context.
Exactly so. I even have to do that in Py2. >> If that particular configuration of bytes is because it's >> ASCII-encoded text, then sure. > > Once again, you are advocate precisely the Python 2 model of text. Not exactly, because what I get back is bytes, which cannot directly be mixed with unicode (str) as it was in Py2. I think this is a key difference. >> To use, for example, bytes.__upper__ on data that wasn't >> ASCII-encoded text (even if it happened to look like it was) would >> be the height of stupidity. Please don't include me in such >> accusations. > > I have no idea why you think I think anybody would be that stupid. > That never occured to me. It's precisely "magic numbers" that happen > to look like English words when interpreted as ASCII coded characters > that I don't want manipulated by str-like methods that interpret text > (such as full-featured format or %). This confuses me somewhat. It's okay to use b'ethan'.upper(), which only makes semantic sense as ASCII-encoded text, but b'age: %d' % 43 isn't? (Aside, I'm perfectly comfortable with "ASCII-encoded text" because if you took u'ethan'.encode('ascii') you would get b'ethan'. If it was some other encoding, such as cp1251, I would call that particular byte stream "cp1251-encoded text". And if there were methods that worked directly on a cp1251-encoded byte stream I would not have any problem using them on cp1251-encoded text.) > What Nick > means by a "boundary type" is a type that works seamlessly with the > types on each side of the boundary as a helper in the conversion. So > when you use a struct to pack a bool, an int, and a date into a bytes, > the struct is the boundary type. And if there's a helper type to work > with bytes and/or str simultaneously, that's a boundary type, eg, > asciistr. But bytes itself is not a boundary type, it's just a type > with no internal structure, not even characters. Hmmm. I'll have to think about this. Okay, I've thought somewhat. 
Under the definition above would it be fair to say that Db3Table (a class in my dbf module) is a boundary type? It sits between the actual file and the program, and transforms bytes into actual Python types. -- ~Ethan~ From ethan at stoneleaf.us Mon Jan 13 06:27:12 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 21:27:12 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> Message-ID: <52D37930.6060901@stoneleaf.us> On 01/12/2014 06:11 PM, Guido van Rossum wrote: > On Sun, Jan 12, 2014 at 5:27 PM, Ethan Furman wrote: >> On 01/12/2014 04:47 PM, Guido van Rossum wrote: >>> %s seems the trickiest: I think with a bytes argument it should just >>> insert those bytes (and the padding modifiers should work too), and >>> for other types it should probably work like %a, so that it works as >>> expected for numeric values, and with a string argument it will return >>> the ascii()-variant of its repr(). Examples: >>> >>> b'%s' % 42 == b'42' >>> b'%s' % 'x' == b"'x'" (i.e. the three-byte string containing an 'x' >>> enclosed in single quotes) >> >> I'm not sure about the quotes. Would anyone ever actually want those in the >> byte stream? > > Perhaps not, but it's a hint that you should probably think about an > encoding. It's symmetric with how '%s' % b'x' returns "b'x'". Think of > it as payback time. :-) Well that's hardly fair! I never liked the "b'x'" either! ;) Okay, I can live with that symmetry. 
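
A minimal sketch of the symmetry being discussed, for readers following along. The text-side result is existing Python 3 behaviour; the bytes-side result is only *emulated* here, since the rule quoted above was a proposal at the time of this thread. The helper name `proposed_percent_s` is hypothetical and is not part of any library.

```python
# Existing Python 3 behaviour: interpolating bytes into text inserts the repr.
text_result = '%s' % b'x'


def proposed_percent_s(value):
    """Hypothetical helper that mimics the proposed b'%s' rule.

    bytes-like arguments are copied as-is; anything else is passed
    through ascii() and the resulting ASCII-only text is encoded.
    """
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    return ascii(value).encode('ascii')


print(repr(text_result))               # the repr of b'x', as text
print(proposed_percent_s('x'))         # quotes show up in the byte stream
print(proposed_percent_s(42))          # numbers come out as ASCII digits
print(proposed_percent_s(b'payload'))  # bytes pass through untouched
```

The stray quotes in the second result are the "hint that you should probably think about an encoding" mentioned above.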
--
~Ethan~

From ethan at stoneleaf.us Mon Jan 13 06:22:04 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 12 Jan 2014 21:22:04 -0800
Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake
In-Reply-To: <87k3e4ecb4.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D2E9AA.4010308@stoneleaf.us> <87r48cerkx.fsf@uwakimon.sk.tsukuba.ac.jp> <52D3295F.7070104@stoneleaf.us> <87k3e4ecb4.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <52D377FC.1020304@stoneleaf.us>

On 01/12/2014 08:27 PM, Stephen J. Turnbull wrote:
> Ethan Furman writes:
>> On 01/12/2014 02:57 PM, Stephen J. Turnbull wrote:
> I didn't trim enough to make my point clear. My apologies.
>>> But
>>> knowledge of ASCII isn't necessary to specify these methods; they can
>>> be defined in an encoding/decoding-free way.

Perhaps you meant "use the methods". I meant "write the methods". You cannot write .upper for the bytes type without knowing what encoding has been used / is represented by those bytes. And quite frankly, if you use those methods on bytes without knowing (1) which encoding is represented by the bytes and (2) that the function you are calling is meant to work with that encoding... well, you deserve what you get.

>> How can you say that with a straight face?
>
> Because I showed you code that does it. Did you see an .encode or a
> .decode in there?

No, I didn't. I saw numbers representing bytes representing text that has been encoded in the ASCII codec. If you didn't know it was ASCII, you couldn't write that function. Even though you don't have to call encode or decode if working directly with encoded bytes, you still have to know what the encoding is to do it correctly.

>> Do you really think that .title, .isalnum, and .center (to name
>> only a few) would work the same if the assumed encoding was EBCDIC?

I phrased that poorly.
If the byte stream was EBCDIC-encoded, and we called the current .method_which_assumes_ASCII on it, would we get the proper results?

> The numbers involved would change, and the test
> for finding letters would be different (and more complicated IIRC).

And you have actually just made my point. If the bytes in question were EBCDIC-encoded, we could write a function for it because we know what it looks like as encoded bytes. Then we could be debating the merits of working directly with EBCDIC-encoded text instead of ASCII-encoded text. ;)

> "There should be one- and preferably only one -way to do
> it." The one way uses text, so preferably bytes shouldn't.

You forgot the word "obvious".

--
~Ethan~

From v+python at g.nevcal.com Mon Jan 13 06:45:00 2014
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Sun, 12 Jan 2014 21:45:00 -0800
Subject: [Python-Dev] PEP 460 reboot
In-Reply-To:
References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us>
Message-ID: <52D37D5C.6070604@g.nevcal.com>

On 1/12/2014 6:11 PM, Guido van Rossum wrote:
> On Sun, Jan 12, 2014 at 5:27 PM, Ethan Furman wrote:
>> >On 01/12/2014 04:47 PM, Guido van Rossum wrote:
>>> >>%s seems the trickiest: I think with a bytes argument it should just
>>> >>insert those bytes (and the padding modifiers should work too), and
>>> >>for other types it should probably work like %a, so that it works as
>>> >>expected for numeric values, and with a string argument it will return
>>> >>the ascii()-variant of its repr(). Examples:
>>> >>
>>> >>b'%s' % 42 == b'42'
>>> >>b'%s' % 'x' == b"'x'" (i.e. the three-byte string containing an 'x'
>>> >>enclosed in single quotes)
>> >
>> >I'm not sure about the quotes. Would anyone ever actually want those in the
>> >byte stream?
> Perhaps not, but it's a hint that you should probably think about an
> encoding. It's symmetric with how '%s' % b'x' returns "b'x'".
Think of > it as payback time.:-) +1 Quotes in the stream are a great debug hint, without blowing up. +1 to the whole reboot solution, also. It cures the problems people are having, and there is no ambiguity. So then the question is whether to proceed with 3.4, delay this feature to 3.5, or to delay 3.4 to include this feature, both have been discussed, with the justification for the latter being to make 3.4 the ultimate Python 3 porting target for recalcitrant module authors, sooner than later. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Jan 13 06:33:38 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 21:33:38 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> Message-ID: <52D37AB2.3090007@stoneleaf.us> On 01/12/2014 02:32 PM, Mark Lawrence wrote: > > I've just tried asciistr using your test code (having corrected the typo, it's assertIsInstance, not assertIsinstance :) > and it looks like a very good starting point. Have you, or anyone else for that matter, actually tried asciistr out? Ah, thanks for that fix, and thanks for trying it out. Um, how exactly did you try it out? This is what I did: bytestring_test.py ================== from asciicompat import asciistr as bytestring ... 
==================

ethan at media:~/source/bytestring$ python3.4 bytestring_test.py
.F.FFF
======================================================================
FAIL: test_bytestring_will_accept_codepoints_in_latin1 (__main__.TestByteString)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "bytestring_test.py", line 30, in test_bytestring_will_accept_codepoints_in_latin1
    self.assertEqual(bytestring(char), bytes([ch]))
AssertionError: '\x00' != b'\x00'

======================================================================
FAIL: test_from_str_plus_str (__main__.TestByteString)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "bytestring_test.py", line 9, in test_from_str_plus_str
    self.assertEqual(result, b'hello world')
AssertionError: 'hello world' != b'hello world'

======================================================================
FAIL: test_interpolation (__main__.TestByteString)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "bytestring_test.py", line 33, in test_interpolation
    self.assertEqual(bytestring('Content-Length: %d') % 71, b'Content-Length: 71')
AssertionError: 'Content-Length: 71' != b'Content-Length: 71'

======================================================================
FAIL: test_str_plus_from_str (__main__.TestByteString)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "bytestring_test.py", line 14, in test_str_plus_from_str
    result = 'hello' + bytestring('world')
AssertionError: TypeError not raised

----------------------------------------------------------------------
Ran 6 tests in 0.002s

FAILED (failures=4)

Four out of six failed is not a good beginning.
:( -- ~Ethan~ From v+python at g.nevcal.com Mon Jan 13 07:06:27 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sun, 12 Jan 2014 22:06:27 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D364E1.5060704@stoneleaf.us> References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> Message-ID: <52D38263.4020307@g.nevcal.com> On 1/12/2014 8:00 PM, Ethan Furman wrote: > > Okay, I've thought somewhat. Under the definition above would it be > fair to say that Db3Table (a class in my dbf module) is a boundary > type? It sits between the actual file and the program, and transforms > bytes into actual Python types. Yes. That is exactly what a boundary type is. It doesn't matter whether it is a file format or a wire protocol format on the non-Python side, the sequence of bytes is defined, using methods that are not directly corresponding to python data types (if they do correspond, the boundary type is trivial). -------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Mon Jan 13 07:46:14 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sun, 12 Jan 2014 22:46:14 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <87ob3geoa9.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D2E9AA.4010308@stoneleaf.us> <52D31049.1090708@g.nevcal.com> <87ob3geoa9.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D38BB6.1000100@g.nevcal.com> On 1/12/2014 4:08 PM, Stephen J. Turnbull wrote: > Glenn Linderman writes: > > > the proposals to embed binary in Unicode by abusing Latin-1 > > encoding. 
>
> Those aren't "proposals", they are currently feasible techniques in
> Python 3 for *some* use cases.
>
> The question is why infecting Python 3 with the byte/character
> confoundance virus is preferable to such techniques, especially if
> their (serious!) deficiencies are removed by creating a new type such
> as asciistr.

"smuggled binary" (great term borrowed from a different subthread) muddies the waters of what you are dealing with. As long as the actual data is only Latin-1 and smuggled binary, the technique probably isn't too bad... you can define the "smuggled binary" as a "decoding" of binary to text, sort of like base64 "decodes" binary to ASCII. And it can be a useful technique.

As soon as you introduce "smuggled non-ASCII, non-Latin-1 text" encodings into the mix, it gets thoroughly confusing... just as confusing as the Python 2 text model. It takes decode+encode to do the smuggled text, plus an encode to push it to the boundary, plus you have text that you know is text, but because of the required techniques for smuggling it, you can't operate on it or view it properly as the text that it should be.

The "byte/character confoundance virus" is a hobgoblin of paranoid perception. In another post, I pointed out that ''' b"%d" % 25 ''' is not equivalent to ''' "%d" % 25 ''' because of the "b" in the first case. So the "implicit" encoding that everyone on that side of the fence was talking about was not at all implicit, but explicit. The numeric characters produced by %d are clearly in the ASCII subset of text, so having b"%d" % 25 produce pre-encoded ASCII text is explicit and practical.

My only concern was what b"%s" % 'abc' should do, because in general, str may not contain only ASCII. (generalize to b"%s" % str(...) ). Guido solved that one nicely.

Of course, at this point, I could punt the whole argument off to "Guido said so", but since you asked me, I felt it appropriate to respond from my perspective...
and I'm not sure Guido specifically addressed your smuggled binary proposal.

When the mixture of text and binary is done as encoded text in binary, then it is obvious that only limited text processing can be performed, and getting the text there requires that it was encoded (hopefully properly encoded per the binary specification being created) to become binary. And there are no extra, confusing Latin-1 encode/decode operations required.

From a higher-level perspective, I think it would be great to have a module, perhaps called "boundary" (let's call it that for now), that allows some definition syntax (augmented BNF? augmented ABNF?) to explain the format of a binary blob. And then provide methods for generating and parsing it to/from Python objects. Obviously, the ABNF couldn't understand Python objects; instead, Python objects might define the ABNF to which they correspond, and methods for accepting binary and producing the object (factory method?) and methods for generating the binary. As objects build upon other objects, the ABNF to which they correspond could be constructed, and perhaps even proven to be capable of parsing all valid blobs corresponding to the specification, and perhaps even proven to be capable of generating only valid blobs (although I'm not a software proof guru; last I heard there were definite limits on the ability to do proofs, but maybe this is a limited enough domain it could work).

Then all blobs could be operated on sort of like web browsers operate on the DOM, or some XML parsing libraries, by defining each blob as a collection of objects for the pieces. XML is far too wordy for practical use (but hey! it is readable) but perhaps it could be practical if tokenized, and then the tokenized representation could be converted to a DOM just like XML and HTML are.
(this is mostly to draw the parallel in the parsing and processing techniques; I'm not seriously suggesting a binary version of XML, but there is a strong parallel, and it could be done). Given a DOM-like structure, a validator could be written to operate on it, though, to provide, if not a proof, at least a sanity check. And, given the DOM-like structure, one call to the top-level object to generate the blob format would walk over all of them, generating the whole blob. Off I go, drifting into Python ideas.... but I have a program I want to rewrite that could surely use some of these techniques (and probably will), because it wants to read several legacy formats, and produce several legacy formats, as well as a new, more comprehensive format. So the objects will be required to parse/generate 4 different blob structures, one of which has its own set of several legacy variations. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Jan 13 07:51:17 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 13 Jan 2014 16:51:17 +1000 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: Message-ID: On 13 January 2014 09:55, Guido van Rossum wrote: > There's a lot of discussion about PEP 460 and I haven't read it all. > Maybe you all have already reached the same conclusion that I have. In > that case I apologize (but the PEP should be updated). Here's my > contribution: > > PEP 460 itself currently rejects support for %d, AFAIK on the basis > that bytes aren't necessarily ASCII. I think that's a misunderstanding > of the intention of the bytes type. > > The key reason for introducing a separate bytes type in Python 3 is to > avoid *mixing* bytes and text. This aims to avoid the classic Python 2 > Unicode failure, where str+unicode fails or succeeds based on whether > str contains non-ASCII characters or not, which means it is easy to > miss in testing. 
Properly written code in Python 3 will fail based on > the *type* of the objects, not based on their contents. Content-based > failures are still possible, but they occur in typical "boundary" > operations such as encode/decode. > > But this does not mean the bytes type isn't allowed to have a > noticeable bias in favor of encodings that are ASCII supersets, even > if not all bytes objects contain such data (e.g. image data, > compressed data, binary network packets, and so on). I am a strong -1 on the more lenient proposal, as it makes binary interpolation in Python 3 an *unsafe operation* for ASCII incompatible binary formats. The existing binary operations that assume ASCII do so *inherently* - they're not input driven, the operation itself assumes ASCII, so if you're working with data that may not be ASCII compatible, you simply don't use them (these are operations like title(), upper(), lower(), the default arguments for split() and strip(), etc). They don't accept text or other structured data as input - you have to provide existing binary data or individual byte values (or, in the case of split(), strip(), the special value None to indicate the assumption of ASCII whitespace). With PEP 460 as it stands, binary interpolation is safe - you can't implicitly introduce an ASCII assumption, regardless of the format string or input data, as everything that hasn't already been translated to the binary domain will be rejected with a TypeError. By allowing format characters that *do* assume ASCII, the entire construct is rendered unsafe - you have to look inside the format string to determine if it is assuming ASCII compatibility or not, thus the entire construct must be deemed as assuming ASCII compatibility at the level of static semantic analysis. 
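
To make the "the operation itself assumes ASCII" point concrete, here is a small sketch using only existing bytes behaviour. The choice of UTF-16-LE as the ASCII-incompatible encoding is an assumption made for illustration; it does not come from the thread.

```python
# ASCII-compatible data: the default split() (sep=None) works as expected.
ascii_data = b'a b'
ascii_parts = ascii_data.split()

# The same text in an ASCII-incompatible encoding (illustrative choice).
utf16_data = 'a b'.encode('utf-16-le')

# split() with no argument still looks for single ASCII whitespace bytes,
# so it cuts the two-byte space code unit in half.
naive_parts = utf16_data.split()

# Supplying the separator as binary data avoids the ASCII assumption for
# this input, because the caller has already done the encoding.
explicit_parts = utf16_data.split(' '.encode('utf-16-le'))

print(ascii_parts)
print(naive_parts)
print(explicit_parts)
```

Even the explicit-separator call is still a byte-level search, so in general it could match across code-unit boundaries; it is shown only to contrast a caller-supplied binary argument with the built-in ASCII default.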
The more lenient proposal also creates an ambiguity about what it means to pass an integer to a binary formatting operation - is it about inserting individual byte values in the range 0-255, or is it about inserting the ASCII encoded digits of arbitrary byte strings, or does it depend on which formatting code you use? PEP 460 is currently entirely consistent with the other binary operations (it only accepts integers in the 0-255 range and interprets them as byte values), while the more lenient approach goes for the "it depends on the formatting code" alternative. Allowing these ASCII assuming format codes in the core bytes interpolation introduces *exactly* the same problem as is present in the Python 2 text model: code that *appears* to support arbitrary binary data, but is in fact assuming ASCII compatibility. So any code that has to handle ASCII incompatible encodings will need to be implemented with the warning "don't use any of the binary formatting operations for data that may not be ASCII compatible, but we also don't provide a convenient equivalent that can be guaranteed to be safe so we know you're going to ignore this warning and do it anyway". That kind of "don't do that, it may cause problems with certain inputs" is *exactly* the kind of bug magnet that the Python 3 transition was designed to categorically eliminate. PEP 460 is perfect in that regard - it provides exactly as much functionality as can be done correctly when manipulating arbitrary binary data, and no more. It has no trace of the legacy Python 2 text model. However, I *also* accept that the Python 2 text model is convenient for certain use cases. 
This is why, in addition to PEP 460 as it currently stands, I am also (with Benno Rice) one of the instigators of the asciicompat project, and have promised Benno that I will ensure that any interoperability bugs asciicompat.asciistr uncovers in the core types are fixed (for Python 3.3+, since it depends on the PEP 393 internal representation for strings).

asciistr will provide a public API that behaves *exactly* like a text type (including interoperating with strings and returning length 1 substrings when indexing, interpreting integers and other numeric types as their ASCII representation when passed in, supporting *full* text formatting semantics), but also exists in the binary domain, by exporting the bytes view of its internal data through the PEP 3118 buffer API.

In this way, asciistr will be a *new* general purpose mechanism for translating between the binary and text domains in Python 3, just like str.encode, bytes/bytearray/memoryview.decode and the struct module. It doesn't need to compromise - its objectives are to make working with ASCII compatible binary protocols and writing hybrid binary/text APIs exactly as convenient as it was in Python 2, because that's where the test suite is developed: in Python 2, using "asciistr=str". It just doesn't need to be a builtin and, at this point in time, doesn't even need to be in the standard library. It can be developed on GitHub and published on PyPI and made available for Python 3.3 and above (it's also trivially 2.x compatible: there, it just republishes the str builtin as asciicompat.asciistr)

Once asciistr is working, we can also look into creating "asciicompat.asciiview", which would be a PEP 3118 *consumer* in addition to a publisher, and provide asciistr functionality for existing binary data, without needing to copy it.

ASCII compatible protocols *are* special and *are* worthy of having a dedicated type devoted to handling them.
However, it shouldn't be at the expense of compromising the ability of Python 3 users to ensure that they aren't accidentally introducing assumptions of ASCII compatibility where they don't belong, particularly when doing so produces a clearly *inferior* solution. The superior solution looks like this: * bytes/bytearray/memoryview: pure binary types, operate entirely in the binary domain. They provide convenience operations that are only valid for ASCII compatible data, but the ASCII assumption is inherent in the operation itself rather than being input driven (the one minor exception being that passing None to split() and strip() operations assumes ASCII whitespace). * asciicompat.asciistr: hybrid type that exposes a text API in the application domain, but also exposes binary data directly for binary interoperability * str: pure text type, operates entirely in the application domain. This approach also opens up the possibility of eventually leveraging PEP 393 to provide an asciicompat.utf8str type which allows arbitrary unicode characters and exports the UTF-8 representation, rather than restricting the permitted code points to 7-bit ASCII, as well as an asciicompat.latin1str which permits arbitrary 8 bit data (representing it as latin-1 text in the application domain), or even an asciicompat.encodedstr that supports any 8-bit encoding. The key thing that the text model change in Python 3 enabled is for us to use the type system to *help* with managing the complexity of dealing with text encodings. We've got a long way with just the two pure types, and no additional types that straddle the binary/text boundary the way the Python 2 str type did. 
Unlike introducing *new* ASCII-only operations to the bytes type, adding new types specifically for dealing with ASCII compatible formats (especially starting life as a third party library) isn't compromising the Python 3 text model, it's embracing it and making it work for us (which is why I've been suggesting that it be considered since at least 2010). The problem with "str" in Python 2 was that one type was used to represent too many things with serious semantic differences. The ongoing attempts to reintroduce that ambiguity to the core bytes type rather than exploring the creation of new types and then filing bugs for any interoperability issues those attempts uncover in the core types represents one of the worst cases of paradigm lock that I have ever seen :P Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From donald at stufft.io Mon Jan 13 07:52:26 2014 From: donald at stufft.io (Donald Stufft) Date: Mon, 13 Jan 2014 01:52:26 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D37D5C.6070604@g.nevcal.com> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D37D5C.6070604@g.nevcal.com> Message-ID: <3B1B5227-80B9-4E06-B639-F3111E96FF19@stufft.io> On Jan 13, 2014, at 12:45 AM, Glenn Linderman wrote: > So then the question is whether to proceed with 3.4, delay this feature to 3.5, or to delay 3.4 to include this feature, both have been discussed, with the justification for the latter being to make 3.4 the ultimate Python 3 porting target for recalcitrant module authors, sooner than later. I really hope this can make it in 3.4, needing to wait another 2 years or so until this is available would be a shame. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From ncoghlan at gmail.com Mon Jan 13 07:59:42 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 13 Jan 2014 16:59:42 +1000 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <3B1B5227-80B9-4E06-B639-F3111E96FF19@stufft.io> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D37D5C.6070604@g.nevcal.com> <3B1B5227-80B9-4E06-B639-F3111E96FF19@stufft.io> Message-ID: On 13 January 2014 16:52, Donald Stufft wrote: > > On Jan 13, 2014, at 12:45 AM, Glenn Linderman wrote: > > So then the question is whether to proceed with 3.4, delay this feature to > 3.5, or to delay 3.4 to include this feature, both have been discussed, with > the justification for the latter being to make 3.4 the ultimate Python 3 > porting target for recalcitrant module authors, sooner than later. > > > I really hope this can make it in 3.4, needing to wait another 2 years or so > until this is available would be a shame. Indeed, it would be a shame to have to wait. Fortunately, people don't even need to wait until the release of Python 3.4, they can instead try to help out with the asciicompat project, which aims to provide this functionality in Python 3.3+: https://github.com/jeamland/asciicompat All it takes is to let go of the idea "I wish the Python 3 bytes type was more like the Python 2 str type" and instead think "hmm, the Python 3 bytes type doesn't seem like a great fit for my use case, maybe I need a different type". Regards, Nick. 
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From donald at stufft.io Mon Jan 13 08:14:08 2014
From: donald at stufft.io (Donald Stufft)
Date: Mon, 13 Jan 2014 02:14:08 -0500
Subject: [Python-Dev] PEP 460 reboot
In-Reply-To:
References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D37D5C.6070604@g.nevcal.com> <3B1B5227-80B9-4E06-B639-F3111E96FF19@stufft.io>
Message-ID: <041F7CD9-B95E-4DA9-AA46-C807EB0F3974@stufft.io>

On Jan 13, 2014, at 1:59 AM, Nick Coghlan wrote:
> On 13 January 2014 16:52, Donald Stufft wrote:
>>
>> On Jan 13, 2014, at 12:45 AM, Glenn Linderman wrote:
>>
>> So then the question is whether to proceed with 3.4, delay this feature to
>> 3.5, or to delay 3.4 to include this feature, both have been discussed, with
>> the justification for the latter being to make 3.4 the ultimate Python 3
>> porting target for recalcitrant module authors, sooner than later.
>>
>>
>> I really hope this can make it in 3.4, needing to wait another 2 years or so
>> until this is available would be a shame.
>
> Indeed, it would be a shame to have to wait. Fortunately, people don't
> even need to wait until the release of Python 3.4, they can instead
> try to help out with the asciicompat project, which aims to provide
> this functionality in Python 3.3+:
> https://github.com/jeamland/asciicompat
>
> All it takes is to let go of the idea "I wish the Python 3 bytes type
> was more like the Python 2 str type" and instead think "hmm, the
> Python 3 bytes type doesn't seem like a great fit for my use case,
> maybe I need a different type".

It's almost a fine fit for the use case; afaict the major thing it's missing is an easy way to handle this last use case. I don't see how this proposal is any different than cases such as int(b'1'). ASCII is already special; giving an area where Python 3 has made things worse a better way forward isn't compromising the text model, it's recognizing the realities of the world.

>
> Regards,
> Nick.
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From guido at python.org Mon Jan 13 08:15:13 2014 From: guido at python.org (Guido van Rossum) Date: Sun, 12 Jan 2014 23:15:13 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D37D5C.6070604@g.nevcal.com> <3B1B5227-80B9-4E06-B639-F3111E96FF19@stufft.io> Message-ID: On Sun, Jan 12, 2014 at 10:59 PM, Nick Coghlan wrote: > On 13 January 2014 16:52, Donald Stufft wrote: >> >> On Jan 13, 2014, at 12:45 AM, Glenn Linderman wrote: >> >> So then the question is whether to proceed with 3.4, delay this feature to >> 3.5, or to delay 3.4 to include this feature, both have been discussed, with >> the justification for the latter being to make 3.4 the ultimate Python 3 >> porting target for recalcitrant module authors, sooner than later. >> >> >> I really hope this can make it in 3.4, needing to wait another 2 years or so >> until this is available would be a shame. > > Indeed, it would be a shame to have to wait. Fortunately, people don't > even need to wait until the release of Python 3.4, they can instead > try to help out with the asciicompat project, which aims to provide > this functionality in Python 3.3+: > https://github.com/jeamland/asciicompat > > All it takes is to let go of the idea "I wish the Python 3 bytes type > was more like the Python 2 str type" and instead think "hmm, the > Python 3 bytes type doesn't seem like a great fit for my use case, > maybe I need a different type". Maybe you're letting your excitement about asciistr get the better of you? IMO we don't need more types. 
If you can refrain from using int(b), b.lower() and b += 'abc' when b isn't ASCII-encoded, why couldn't you also refrain from b += b'%s' % 42? I'll suppress the urge to quote verbatim from my first message in this thread (about the motivation for bytes) but I'll just recommend you re-read it. (It's too late here to write more, but it looks like we are in for a bitter fight. :-( ) -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Mon Jan 13 08:18:48 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 23:18:48 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: Message-ID: <52D39358.8090405@stoneleaf.us> On 01/12/2014 10:51 PM, Nick Coghlan wrote: > > I am a strong -1 on the more lenient proposal, as it makes binary > interpolation in Python 3 an *unsafe operation* for ASCII incompatible > binary formats. No more unsafe than calling .upper() on ASCII incompatible streams. > The existing binary operations that assume ASCII do so *inherently* - > they're not input driven, the operation itself assumes ASCII, so if > you're working with data that may not be ASCII compatible, you simply > don't use them (these are operations like title(), upper(), lower(), > the default arguments for split() and strip(), etc). How is this different from not using % interpolation when the byte stream is incompatible? It isn't. And what do you mean by "input driven"? If the LHS is bytes, the result is bytes, no matter what the input is. This is not the Py2 world where you may end up with str or unicode; you always end up with bytes if the LHS is bytes.
[snip the rest that seems to flow from these misunderstandings] -- ~Ethan~ From mark at hotpy.org Mon Jan 13 09:46:13 2014 From: mark at hotpy.org (Mark Shannon) Date: Mon, 13 Jan 2014 08:46:13 +0000 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D34C71.1020200@stoneleaf.us> <52D34E65.2050007@stoneleaf.us> Message-ID: <52D3A7D5.9080106@hotpy.org> On 13/01/14 03:47, Guido van Rossum wrote: > On Sun, Jan 12, 2014 at 6:24 PM, Ethan Furman wrote: >> On 01/12/2014 06:16 PM, Ethan Furman wrote: >>> >>> >>> If you do : >>> >>> --> b'%s' % 'some text' >> >> >> Ignore what I previously said. With no encoding the result would be: >> >> b"'some text'" >> >> So an encoding should definitely be specified. > > Yes, but the encoding is no business of %s or %. As far as the > formatting operation cares, if the argument is bytes they will be > copied literally, and if the argument is a str (or anything else) it > will call ascii() on it. It seems to me that what people want from '%s' is: Convert to a str then encode as ascii for non-bytes or copy directly for bytes. So why not replace '%s' with '%a' for the ascii case and with '%b' for directly inserting bytes. That way, the encoding is explicit. I think it is vital that the encoding is explicit in all cases where bytes <-> str conversion occurs. Cheers, Mark. From mal at egenix.com Mon Jan 13 10:06:00 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 13 Jan 2014 10:06:00 +0100 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: Message-ID: <52D3AC78.6010706@egenix.com> On 13.01.2014 07:51, Nick Coghlan wrote: > > [Using a new asciistr type] > > The key thing that the text model change in Python 3 enabled is for us > to use the type system to *help* with managing the complexity of > dealing with text encodings. 
We've got a long way with just the two > pure types, and no additional types that straddle the binary/text > boundary the way the Python 2 str type did. Unlike introducing *new* > ASCII-only operations to the bytes type, adding new types specifically > for dealing with ASCII compatible formats (especially starting life as > a third party library) isn't compromising the Python 3 text model, > it's embracing it and making it work for us (which is why I've been > suggesting that it be considered since at least 2010). The problem > with "str" in Python 2 was that one type was used to represent too > many things with serious semantic differences. > > The ongoing attempts to reintroduce that ambiguity to the core bytes > type rather than exploring the creation of new types and then filing > bugs for any interoperability issues those attempts uncover in the > core types represents one of the worst cases of paradigm lock that I > have ever seen :P In theory this sounds nice, but in practice you often run into the issue that when you pass such a str-subtype to some function that works on str, the function doesn't return the str-subtype as its result, but a new str object instead. As a result, you have to keep track of which operations work on your str-subtype alone and which convert it back to a str, making the approach infeasible for all but the most basic uses. This is why we try to make the basic types as useful as possible for everyone. It's also the main reason why subtyping 8-bit strings and Unicode in Python 2 wasn't a popular sport :-) Leaving aside the discussion about str and bytes, I think PEP 460 has much potential for making life easier for people dealing with binary data: the formatting codes for the bytes format methods could be extended to include the struct module features - with the struct module then turning into a proxy for these new format methods (much like we did with the string module when string methods were introduced).
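A minimal sketch of the subtype problem described above, assuming a hypothetical AsciiStr class (the name and its validation rule are made up for illustration and are not taken from asciicompat): methods inherited from str build plain str results, so the subtype is lost after the first operation.

```python
class AsciiStr(str):
    """Hypothetical str subtype meant to mark ASCII-only text."""

    def __new__(cls, value):
        # Validation only: raises UnicodeEncodeError for non-ASCII input.
        value.encode("ascii")
        return super().__new__(cls, value)


tagged = AsciiStr("content-length")

# Inherited str methods and operators return plain str objects,
# so the AsciiStr marker does not survive them.
upper = tagged.upper()
joined = tagged + ": 42"

print(type(tagged).__name__)
print(type(upper).__name__)
print(type(joined).__name__)
```

Keeping the subtype alive would mean overriding every method that returns a string, which is the bookkeeping burden the paragraph above complains about.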
BTW: There's a little known trick in Python 2 which also lets you disable the string to Unicode coercion: all you have to do is set the default encoding to "undefined" (see site.py:setencoding()). Python 2 will then raise a UnicodeError whenever coercion would trigger. I added that codec to experiment with this scenario in the early days of the Unicode integration. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jan 13 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Mon Jan 13 10:13:48 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 13 Jan 2014 19:13:48 +1000 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D39358.8090405@stoneleaf.us> References: <52D39358.8090405@stoneleaf.us> Message-ID: On 13 Jan 2014 17:43, "Ethan Furman" wrote: > > On 01/12/2014 10:51 PM, Nick Coghlan wrote: >> >> >> I am a strong -1 on the more lenient proposal, as it makes binary >> interpolation in Python 3 an *unsafe operation* for ASCII incompatible >> binary formats. > > > No more unsafe that calling .upper() on ASCII incompatible streams. Right - Guido's proposal is *completely useless* for arbitrary binary data. You can't trust it. However, Python 3 has no equivalent binary interpolation feature that *is* safe for arbitrary binary data, so the lenient version *will* be a bug magnet if it is the only version of binary interpolation provided. 
However, if new formatb and formatb_map methods were included in the proposal with the current strict PEP 460 semantics, then my objections would be reduced substantially. In that case, we'd still be providing the new binary interpolation feature *in addition* to restoring the ASCII compatible interpolation feature, so the latter would be less of an attractive nuisance when writing code that needs to handle arbitrary binary formats and can't assume ASCII compatibility. With that approach, I'd even support the idea of implicit strict ASCII encoding of text inputs for the ASCII compatible version. > > > >> The existing binary operations that assume ASCII do so *inherently* - >> they're not input driven, the operation itself assumes ASCII, so if >> you're working with data that may not be ASCII compatible, you simply >> don't use them (these are operations like title(), upper(), lower(), >> the default arguments for split() and strip(), etc). > > > How is this different from not using % interpolation when the byte stream is incompatible? It isn't. Because I *want to use* the PEP 460 binary interpolation API, but wouldn't be able to use Guido's more lenient proposal, as it is a bug magnet in the presence of arbitrary binary data. Provide both APIs and my objections go away - ASCII interpolation just becomes another way to translate between structured and text data, while binary interpolation would be a strictly binary only operation. > > And what do you mean by "input driven"? If the LHS is bytes, the result is bytes, no matter what the input is. This is not the Py2 world where you may end up with str or unicode; you always end up with bytes if the LHS is bytes. The LHS may or may not be tainted with assumptions about ASCII compatibility, which means it effectively *is* tainted with such assumptions, which means code that needs to handle arbitrary binary data can't use it and is left without a binary interpolation feature. 
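To make the "bug magnet" concern concrete, here is a small sketch assuming Python 3.5 or later (where bytes %-formatting is available) and UTF-16-LE as an arbitrarily chosen ASCII-incompatible encoding. It only illustrates the failure mode being discussed; it is not code from PEP 460 or from any of the proposals.

```python
# Assumes Python 3.5+ for bytes %-formatting.
ENCODING = "utf-16-le"  # an ASCII-incompatible encoding, chosen for illustration

prefix = "len=".encode(ENCODING)

# ASCII-assuming interpolation: the digits are emitted as two ASCII bytes,
# which a UTF-16 decoder then reads as a single unrelated code unit.
mixed = prefix + b"%d" % 42

# Formatting in the text domain, then encoding explicitly.
explicit = prefix + ("%d" % 42).encode(ENCODING)

print(repr(mixed.decode(ENCODING)))
print(repr(explicit.decode(ENCODING)))
```

Nothing raises in the first case: the stream still decodes, it just no longer says what was intended.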
That's why *adding* formatb to Guido's more lenient proposal resolves my objections: it provides the binary interpolation feature I want, and maintains Python 3's clear distinction between the text domain and the binary domain. Cheers, Nick. > > [snip the rest that seems to flow from these misunderstandings] > > -- > ~Ethan~ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Mon Jan 13 10:19:08 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 13 Jan 2014 01:19:08 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D3A7D5.9080106@hotpy.org> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D34C71.1020200@stoneleaf.us> <52D34E65.2050007@stoneleaf.us> <52D3A7D5.9080106@hotpy.org> Message-ID: <52D3AF8C.9030203@g.nevcal.com> On 1/13/2014 12:46 AM, Mark Shannon wrote: > On 13/01/14 03:47, Guido van Rossum wrote: >> On Sun, Jan 12, 2014 at 6:24 PM, Ethan Furman >> wrote: >>> On 01/12/2014 06:16 PM, Ethan Furman wrote: >>>> >>>> >>>> If you do : >>>> >>>> --> b'%s' % 'some text' >>> >>> >>> Ignore what I previously said. With no encoding the result would be: >>> >>> b"'some text'" >>> >>> So an encoding should definitely be specified. >> >> Yes, but the encoding is no business of %s or %. As far as the >> formatting operation cares, if the argument is bytes they will be >> copied literally, and if the argument is a str (or anything else) it >> will call ascii() on it. > > It seems to me that what people want from '%s' is: > Convert to a str then encode as ascii for non-bytes > or copy directly for bytes. Maybe. But it only takes a small tweak to the parameter to get what they want... 
a tweak that works in both Python 2.7 and Python 3.whatever-version-gets-this. Instead of b"%s" % foo they must use b"%s" % foo.encode( explicitEncoding ) which is what they should have been doing in Python 2.7 all along, and if they were, they need make no change. Oh, foo was a Python 2.7 str? Converted to Python 3.x str, by default conversion rules? Already in ASCII? No harm. Oh, foo was a literal? Add b prefix, instead of the .encode("ASCII"), if you prefer. > So why not replace '%s' with '%a' for the ascii case and > with '%b' for directly inserting bytes. Because %a and %b don't exist in Python 2.7? > That way, the encoding is explicit. The encoding is already explicit. If it is bytes encoded from str, that transformation had an explicit encoding. If it is "%s" % str(...), then there is no encoding, but rather a transformation into an ASCII representation of the Unicode code points, using escape sequences. Which isn't likely to be what they want, but see the parameter tweak above. > I think it is vital that the encoding is explicit in all cases where > bytes <-> str conversion occurs. Since it is explicit, you have no concerns in this area. Regarding the concern about implicit use of ASCII by certain bytes methods and proposed interpolations, I'm curious how many standard encodings exist that do not have an ASCII subset. I can enumerate a starting list, but if there are others in actual use, I'm unaware of them. EBCDIC UTF-16 BE & LE UTF-32 BE & LE Wikipedia: The vast majority of code pages in current use are supersets of ASCII , a 7-bit code representing 128 control codes and printable characters. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mark at hotpy.org Mon Jan 13 10:49:01 2014 From: mark at hotpy.org (Mark Shannon) Date: Mon, 13 Jan 2014 09:49:01 +0000 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D3AF8C.9030203@g.nevcal.com> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D34C71.1020200@stoneleaf.us> <52D34E65.2050007@stoneleaf.us> <52D3A7D5.9080106@hotpy.org> <52D3AF8C.9030203@g.nevcal.com> Message-ID: <52D3B68D.90601@hotpy.org> On 13/01/14 09:19, Glenn Linderman wrote: > On 1/13/2014 12:46 AM, Mark Shannon wrote: >> On 13/01/14 03:47, Guido van Rossum wrote: >>> On Sun, Jan 12, 2014 at 6:24 PM, Ethan Furman wrote: >>>> On 01/12/2014 06:16 PM, Ethan Furman wrote: >>>>> >>>>> >>>>> If you do : >>>>> >>>>> --> b'%s' % 'some text' >>>> >>>> >>>> Ignore what I previously said. With no encoding the result would be: >>>> >>>> b"'some text'" >>>> >>>> So an encoding should definitely be specified. >>> >>> Yes, but the encoding is no business of %s or %. As far as the >>> formatting operation cares, if the argument is bytes they will be >>> copied literally, and if the argument is a str (or anything else) it >>> will call ascii() on it. >> >> It seems to me that what people want from '%s' is: >> Convert to a str then encode as ascii for non-bytes >> or copy directly for bytes. > > Maybe. But it only takes a small tweak to the parameter to get what they want... a tweak that works in both Python 2.7 and Python 3.whatever-version-gets-this. > > Instead of > > b"%s" % foo > > they must use > > b"%s" % foo.encode( explicitEncoding ) > > which is what they should have been doing in Python 2.7 all along, and if they were, they need make no change. > > Oh, foo was a Python 2.7 str? Converted to Python 3.x str, by default conversion rules? Already in ASCII? No harm. > Oh, foo was a literal? Add b prefix, instead of the .encode("ASCII"), if you prefer. > >> So why not replace '%s' with '%a' for the ascii case and >> with '%b' for directly inserting bytes. 
> > Because %a and %b don't exist in Python 2.7? I thought this was about 3.5, not 2.7 ;) '%s' can't work in 3.5, as we must differentiate between strings which need to be encoded and bytes which don't. > >> That way, the encoding is explicit. > > The encoding is already explicit. If it is bytes encoded from str, that transformation had an explicit encoding. If it is "%s" % str(...), then there is no encoding, but rather a transformation into > an ASCII representation of the Unicode code points, using escape sequences. Which isn't likely to be what they want, but see the parameter tweak above. > >> I think it is vital that the encoding is explicit in all cases where >> bytes <-> str conversion occurs. > > Since it is explicit, you have no concerns in this area. > > > Regarding the concern about implicit use of ASCII by certain bytes methods and proposed interpolations, I'm curious how many standard encodings exist that do not have an ASCII subset. I can enumerate > a starting list, but if there are others in actual use, I'm unaware of them. > > EBCDIC > UTF-16 BE & LE > UTF-32 BE & LE > > Wikipedia: The vast majority of code pages in current use are supersets of ASCII , a 7-bit code representing 128 control codes and printable characters.
> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/mark%40hotpy.org > From ethan at stoneleaf.us Mon Jan 13 08:59:13 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 23:59:13 -0800 Subject: [Python-Dev] PEP 460 reboot and a bitter fight In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D37D5C.6070604@g.nevcal.com> <3B1B5227-80B9-4E06-B639-F3111E96FF19@stufft.io> Message-ID: <52D39CD1.608@stoneleaf.us> On 01/12/2014 11:15 PM, Guido van Rossum wrote: > > (It's too late here to write more, but it looks like we are in for a > bitter fight. :-( ) It's already been a bitter fight. The opponents of %-interpolation (Nick, Antoine, Turnbull, D'Aprano, et al*) all seem to be arguing basically what Nick said. The proponents (myself, you, Stufft, Eric Smith, et al*) are arguing that bytes already has an ASCII bias, already has ASCII string methods, that it isn't the same as the Py2 world because if you combine a bytes object with a str object outside of interpolation (such as b'hello' + 'world') it doesn't work, that only bytes would ever be returned, etc, etc. With the possible exception of the question I just asked Nick, I don't think we're going to get any new information. I suppose you're used to not being able to please everybody. :/ -- ~Ethan~ * et al means everyone whose name I couldn't remember, or figure out which camp you were in in the wee hours of the night. From stephen at xemacs.org Mon Jan 13 11:48:50 2014 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Mon, 13 Jan 2014 19:48:50 +0900 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D364E1.5060704@stoneleaf.us> References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> Message-ID: <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > The part that you don't seem to acknowledge (sorry if I missed it) > is that there are str-like methods already on bytes. I haven't expressed myself well, but I don't much care about that. It's what Knuth would classify as a seminumerical method. What I do care about is that the methods that convert other types to text (including format) not work for bytes. That's where I consider text to "start". > > is *exactly* the Python 2 model of text. But you deny that the > > effect of your proposals (eg, b"%d" % (12,)) is to reintroduce Python > > 2's bytes/character confusion, don't you? > > Given that the default (and only) text type in Py3 is str, which is > unicode, I don't think any confusion will be as severe, but I > acknowledge that there could be some. I fear it will be quite severe where I live, in Shift JIS/GB18030 land. (The two most obnoxious encodings known to man, except perhaps the syntax of Brainf!ck.) > >> *My* definition is not ambiguous at all. If this particular part > >> of the byte stream is defined to contain ASCII-encoded text, then I > >> can use the bytes text methods to work with it. > > > > But how is Python supposed to know that? > > Python doesn't need to. ... because you know it. But the ideal of object-oriented programming (and duck-typing) is that you shouldn't need to; the object should know how to produce appropriate behavior itself. > > But under your definition, you need to make the decision, or > > explicitly code the decision, on the basis of context. 
> > Exactly so. I even have to do that in Py2. "Even." This is exactly where PBP and EIBTI part company, I think. EIBTI thinks it's a bad idea to pass around bytes that are implicitly some other type, and Python 3 *should be good enough to make that unnecessary*. I'm convinced, and Nick is convinced, that we can make that true for 90% of the cases that it isn't now, if we could just figure out what's hard about the use cases where Python 3 isn't up to snuff yet (and figure out which use cases we need to handle to get us up to 90%!) PBP doesn't think it's a great idea to pass around bytes that are implicitly some other type, but didn't mind it (or got used to it) in Python 2, and so they're not looking at that as a problem that Python 3 can solve. They're looking at Python 3 as the problem that prevents them from doing what worked fine in Python 2. I understand that point of view, I just think we should be able to do better in Python 3, and should give it a serious try before giving in. Remember, "Special cases aren't special enough to break the rules" comes *before* "Although practicality beats purity". Not to forget that "Explicit is better than implicit" is second[1] on the list. ;-) After looking at this thread, I feel that (due to misunderstandings on both sides) purity hasn't really been tried yet. > >> If that particular configuration of bytes is because it's > >> ASCII-encoded text, then sure. > > > > Once again, you are advocating precisely the Python 2 model of text. > > Not exactly, because what I get back is bytes, which cannot > directly be mixed with unicode (str) as it was in Py2. I think > this is a key difference. You're in good company there; that was Guido's rationale for not worrying, too. I agree it's "key" (and I'm sure Nick will, on reflection if not already). But I worry (a lot) that it's not enough. > This confuses me somewhat. It's okay to use b'ethan'.upper(), > which only makes semantic sense as ASCII-encoded text, Not really OK.
In theory, because it doesn't require serialization/encoding of a primitive type, it doesn't matter. In practice, without powerful formatting, it isn't even a major attraction. In practice, with powerful formatting, it adds to the attraction. Note that regex doesn't require type conversions (matches have methods to return positions in the target or subsequences of the target, not values of other types), which is why I (and I suspect Nick for the same reason) am comfortable with polymorphic regex but not with bytes formatting. > (Aside, I'm perfectly comfortable with "ASCII-encoded text" because > if you took u'ethan'.encode('ascii') you would get b'ethan'. If it > was some other encoding, such as cp1251, I would call that > particular byte stream "cp1251-encoded text". Even though "ethan" is perfectly good ASCII-encoded text (as well as the integer 435,744,694,638 on a bigendian machine with 5-byte words), and you have no way of knowing whether it was user data (CP1251) or a metadata keyword (ASCII) or the US national debt in 1967 dollars (integer) when b'ethan' shows up in a trace? > And if there were methods that worked directly on a cp1251-encoded > byte stream I would not have any problem using them on > cp1251-encoded text.) I was afraid of that: all of those methods (except the case methods[2]) will work fine on a cp1251-encoded text. And because they only know that the string is bytes, the case methods will silently corrupt your "text" as soon as they get a chance. That bothers me, even if it doesn't bother you. Purity again, if you like. (But you'd take a safe .upper if you got it for free, no?) > Okay, I've thought somewhat. Under the definition above would it > be fair to say that Db3Table (a class in my dbf module) is a > boundary type? It sits between the actual file and the program, > and transforms bytes into actual Python types. Yes, I'd call that a boundary type.
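As a rough illustration of the case-method point above, the sketch below runs bytes.upper() on cp1251-encoded and Shift JIS-encoded text. The sample strings are arbitrary choices made for this example. It assumes only that bytes.upper() maps the ASCII range a-z and leaves every other byte alone: on cp1251 Cyrillic the call silently does nothing, while on Shift JIS it can rewrite the trail byte of a two-byte character.

```python
# cp1251: every byte of this Cyrillic text is above 0x7F, so the
# ASCII-only case mapping is a silent no-op rather than an uppercasing.
cyrillic = "привет".encode("cp1251")
cyrillic_upper = cyrillic.upper()

# Shift JIS: the trail byte of this katakana character happens to be
# the ASCII code for "e", so upper() rewrites half of the character.
original = "テ"
katakana = original.encode("shift_jis")
shouted = katakana.upper()

print(cyrillic_upper == cyrillic)
print(katakana, shouted)
print(original, "->", shouted.decode("shift_jis"))
```

In neither case is an error raised, which is what makes the behaviour easy to miss in code that only "knows" it holds bytes.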
Footnotes: [1] Yes, I know what's number 1, but I'm not going to mention it out loud! [2] Arguably those too, since bytes don't have a locale. They're in C locale and the bytes >127 don't have semantics like case. From ncoghlan at gmail.com Mon Jan 13 12:08:36 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 13 Jan 2014 21:08:36 +1000 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <041F7CD9-B95E-4DA9-AA46-C807EB0F3974@stufft.io> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D37D5C.6070604@g.nevcal.com> <3B1B5227-80B9-4E06-B639-F3111E96FF19@stufft.io> <041F7CD9-B95E-4DA9-AA46-C807EB0F3974@stufft.io> Message-ID: On 13 Jan 2014 17:14, "Donald Stufft" wrote: > > > On Jan 13, 2014, at 1:59 AM, Nick Coghlan wrote: > > > On 13 January 2014 16:52, Donald Stufft wrote: > >> > >> On Jan 13, 2014, at 12:45 AM, Glenn Linderman wrote: > >> > >> So then the question is whether to proceed with 3.4, delay this feature to > >> 3.5, or to delay 3.4 to include this feature, both have been discussed, with > >> the justification for the latter being to make 3.4 the ultimate Python 3 > >> porting target for recalcitrant module authors, sooner than later. > >> > >> > >> I really hope this can make it in 3.4, needing to wait another 2 years or so > >> until this is available would be a shame. > > > > Indeed, it would be a shame to have to wait. Fortunately, people don't > > even need to wait until the release of Python 3.4, they can instead > > try to help out with the asciicompat project, which aims to provide > > this functionality in Python 3.3+: > > https://github.com/jeamland/asciicompat > > > > All it takes is to let go of the idea "I wish the Python 3 bytes type > > was more like the Python 2 str type" and instead think "hmm, the > > Python 3 bytes type doesn't seem like a great fit for my use case, > > maybe I need a different type".
> > It's almost a fine fit for the use case; AFAICT the major thing it's missing > is an easy way to handle this last use case. I don't see how this proposal > is any different from cases such as int(b'1'). ASCII is already special; > giving an area that Python 3 has made worse a better way forward > isn't compromising the text model, it's recognizing the realities of the world. The difference between this and int() is that there's no structural ambiguity introduced in the case of int(): the output is always an integer, regardless of the input type. Arbitrary binary data and ASCII compatible binary data are *different things* and the only argument in favour of modelling them with a single type is because Python 2 did it that way. The Python 3 text model was built on the notion of "no implicit encoding and decoding", and Guido's more lenient proposal brings that back by stealth: the semantics proposed for the integer codes are that they be essentially equivalent to performing the operation in the text domain and then encoding with ASCII. However, I'm OK with the idea if there are separate formatb/formatb_map APIs that allow the encoding support to be bypassed entirely - that way, using mod-formatting, format or format_map *is* explicit, since the only reason to use them over formatb/formatb_map would be for the implicit ASCII encoding support, eliminating the ambiguity. Regards, Nick. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From solipsis at pitrou.net Mon Jan 13 12:41:18 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 13 Jan 2014 12:41:18 +0100 Subject: [Python-Dev] PEP 460 reboot References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> Message-ID: <20140113124118.55c8c2c3@fsol> On Sun, 12 Jan 2014 18:11:47 -0800 Guido van Rossum wrote: > On Sun, Jan 12, 2014 at 5:27 PM, Ethan Furman wrote: > > On 01/12/2014 04:47 PM, Guido van Rossum wrote: > >> %s seems the trickiest: I think with a bytes argument it should just > >> insert those bytes (and the padding modifiers should work too), and > >> for other types it should probably work like %a, so that it works as > >> expected for numeric values, and with a string argument it will return > >> the ascii()-variant of its repr(). Examples: > >> > >> b'%s' % 42 == b'42' > >> b'%s' % 'x' == b"'x'" (i.e. the three-byte string containing an 'x' > >> enclosed in single quotes) > > > > I'm not sure about the quotes. Would anyone ever actually want those in the > > byte stream? > > Perhaps not, but it's a hint that you should probably think about an > encoding. It's symmetric with how '%s' % b'x' returns "b'x'". Think of > it as payback time. :-) What is the use case for embedding a quoted ASCII-encoded representation in a byte stream? Regards Antoine. From ncoghlan at gmail.com Mon Jan 13 12:53:41 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 13 Jan 2014 21:53:41 +1000 Subject: [Python-Dev] Trying to focus the whole bytes/str formatting discussion In-Reply-To: References: Message-ID: On 13 January 2014 08:46, Brett Cannon wrote: > I don't know about the rest of you but I feel like the discussion is heading > off the rails (if it hasn't already jumped the tracks). Let's try to bring > this back around to something actionable which people can focus their energy > on as the amount of developer time spent arguing could have led to several > coded-up solutions. 
> > I see it as a practicality-beats-purity vs. > explicit-is-better-than-implicit. The PBP group want bytes.format() (just > assume I include interpolation support if you want that) to work as close to > a drop-in replacement for current str.format() use in Python 2 to ease > porting. The argument is that code looks cleaner and the amount of changes > in Python 2 code being ported to Python 3 is much smaller. > > The EIBTI group are willing to support PEP 460 but beyond that don't want to > have in Python itself anything for bytes.format() which takes in a string > and spits out bytes. It's bytes in->bytes out and not bytes & str in->bytes > out as the PBP group is after. The EIBTI group are arguing that letting str > into bytes.format() and then automatically be converted to strict ASCII > leads to conflating the text/bytes divide as well as being too magical, e.g. > what if you actually wanted UTF-16 for your number string instead of ASCII; > the EIBTI group **wants** to force people to make a decision. They are also > less concerned with making users update Python 2 code to handle this as it > already needs to be updated for other Python 3 things anyway. > > From where I'm sitting, the EIBTI group and their PEP 460 proposal from > Antoine (and no longer Victor) are not controversial. Everyone seems to > agree that PEP 460 **at minimum** is acceptable and should happen for Python > 3.5. The people with the uphill battle and something to prove are those > arguing for str in->bytes out support in bytes.format(). The added features > that the PBP group want are the ones being argued over. > > As the onus is on the PBP group to convince the EIBTI group (or Guido), I > think the PBP group should code up a solution that does what they want and > put it on PyPI to see what the community thinks.
If the PBP group wants to > convince the EIBTI group that str in->bytes out for bytes.format() is > critical in getting a key group of users to start using Python 3 then I > think that needs to be demonstrated through real-world usage by some people. Note that I am now fine with Guido's more lenient proposal *so long as* explicitly bytes-only formatb and formatb_map methods are also included. That would give us the following situation in 3.5: Text interpolation: str.__mod__, str.format, str.format_map ASCII compatible interpolation: bytes.__mod__, bytes.format, bytes.format_map Arbitrary binary interpolation: bytes.formatb, bytes.formatb_map Those are all reasonable operations for the language to support natively, and by providing convenient access to all three, we avoid the attractive nuisance that would be created by providing *only* ASCII interpolation without providing strict binary interpolation (since people would inevitably use the former when they should really be using the latter, because interpolation is such a convenient construct), while still addressing the interests of both groups (people like me and Antoine that like PEP 460 as it stands, as well as those that favour the ASCII encoding features). It's only the introduction of ASCII compatible interpolation support *without* binary interpolation support that I am adamantly opposed to - that's the kind of attractive nuisance that leads to people inappropriately using ASCII compatible only APIs and then discovering that their code breaks when confronted with ASCII incompatible encodings like UTF-16, ShiftJIS and ISO-2022. Originally I was opposed to the idea entirely, but then Antoine wrote the binary only version of PEP 460 and I found it to be a *very* elegant solution that didn't compromise the Python 3 text model. 
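For orientation, a short sketch of how the middle tier (ASCII compatible interpolation via bytes.__mod__) behaves in CPython 3.5 and later, which is where bytes %-formatting eventually landed as far as I know. The formatb and formatb_map methods named above are proposals from this thread, not methods I know to exist, so they are deliberately not used here.

```python
# Assumes Python 3.5+.

# Numeric codes produce ASCII digits.
number = b"%d" % 42

# %b (and %s) copy bytes-like arguments through unchanged.
raw = b"%b" % b"\xff\x00"

# %a inserts the ascii() representation, quotes and escapes included.
quoted = b"%a" % "caf\u00e9"

# A str argument to %s is rejected rather than implicitly encoded.
try:
    b"%s" % "text"
    str_accepted = True
except TypeError:
    str_accepted = False

print(number, raw, quoted, str_accepted)
```

Under these semantics, text has to be encoded explicitly before it is interpolated, while the numeric codes still assume an ASCII-compatible stream.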
As long as this pure API remains available in some form (such as formatb
and formatb_map methods), then I'm OK with the ASCII only version
existing in parallel - at that point, it *is* analogous to all the other
existing bytes methods that assume the use of ASCII compatible data.

** The caveat **

However, note that there were *two* significant issues that were raised
in the recent broader discussions. PEP 460 only tackles the more
tractable of the two: the fact that Twisted and Mercurial both consider
bytes.__mod__ support a blocker for switching to Python 3.

That's a useful discussion to have, but it's important for people to
realise that the mod-formatting feature is utterly irrelevant to the
concerns Armin Ronacher raised in
http://lucumr.pocoo.org/2014/1/5/unicode-in-2-and-3/ that kicked off
this whole recent spate of interest in the topic.

Obviously, I disagree with his conclusions (and personally wish Python 2
Unicode experts would show a little more humility in trying to
understand the core team's motivations for Python 3 design decisions
rather than assuming that we're clueless idiots that decided to maintain
4 parallel branches in Subversion for a couple of years just because we
thought it might be fun), but I can certainly understand his pain. I'm
the one who actually *made* the changes to restore dual bytes/unicode
support in urllib.parse for Python 3 (one of Armin's favourite examples
of the difficulty of writing that kind of code using the Python 3 text
model), and I agree entirely with Armin's assessment of that code: it
isn't pretty, and it wasn't fun to write. Yes, I got it to work, and
yes, it was satisfying when the tests finally passed, and yes there is
now a smaller number of cases where errors will pass silently, but
that's far from the same thing as finding the process of getting there a
pleasant one, or considering the result an elegant approach to porting
hybrid APIs from Python 2 such that bytes in = bytes out and str in =
str out.
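
(An illustration of the hybrid "bytes in = bytes out, str in = str out" behaviour just described, as urllib.parse exposes it in Python 3. This is only a sketch: the URL is a made-up example and mixing_is_rejected is a hypothetical helper name.)

```python
from urllib.parse import urljoin, urlsplit

# str in -> str out
text_parts = urlsplit("http://example.com/path?q=1")

# bytes in -> bytes out (the components are bytes objects)
byte_parts = urlsplit(b"http://example.com/path?q=1")


def mixing_is_rejected():
    """Return True if combining str and bytes arguments raises TypeError."""
    try:
        urljoin("http://example.com/", b"path")
    except TypeError:
        return True
    return False
```

The two input types never mix: there is no implicit decoding step, so passing one of each is an error rather than a silent coercion.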
The only difference between Armin and myself in this respect is that I
know the reasons for the changes to the text model, and I think the
increased difficulty in implementing that particular use case was worth
it, given the pay-off in finally being able to remove the implicit
encoding and decoding operations from the text model (Note that the
unicode input handling in urlparse in Python 2 breaks entirely if you
turn off implicit decoding. You can still get hits from the cache, but
if you have to actually parse anything, it will fail:
http://python-notes.curiousefficiency.org/en/latest/python3/binary_protocols.html#couldn-t-the-implicit-decoding-just-be-disabled-in-python-2).

The fact remains, however, that in Python 2 the code you need for that
kind of hybrid API was *easy* to write - you just made all your internal
constants 8-bit strings, and the implicit decoding to Unicode took care
of the case of str inputs.

There are still valid use cases for such hybrid APIs, even in Python 3
(urllib.parse is one of them), and the reason I helped Benno start the
asciicompat project (https://github.com/jeamland/asciicompat) is because
I want to make that kind of code almost as effortless as it was in
Python 2 - all you should need to do is make your constants asciistr
instances rather than builtin bytes or str objects.

My ambition here is not "good enough to get people to stop complaining",
it's "there's no actual reason Python 3 needs to be worse at this than
Python 2, it just doesn't need to be part of the core builtin types,
because we're in a better position to fix interoperability issues now
that we don't have to deal with the close coupling between str and
unicode that existed in Python 2, and the bytes type will generally play
nice with anything that exposes the PEP 3118 buffer interface".

Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Mon Jan 13 13:08:00 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 13 Jan 2014 22:08:00 +1000
Subject: [Python-Dev] PEP 460 reboot
In-Reply-To:
References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us>
 <52D37D5C.6070604@g.nevcal.com>
 <3B1B5227-80B9-4E06-B639-F3111E96FF19@stufft.io>
Message-ID:

On 13 January 2014 17:15, Guido van Rossum wrote:
> On Sun, Jan 12, 2014 at 10:59 PM, Nick Coghlan wrote:
>> All it takes is to let go of the idea "I wish the Python 3 bytes type
>> was more like the Python 2 str type" and instead think "hmm, the
>> Python 3 bytes type doesn't seem like a great fit for my use case,
>> maybe I need a different type".
>
> Maybe you're letting your excitement about asciistr get the better of
> you? IMO we don't need more types. If you can refrain from using
> int(b), b.lower() and b += 'abc' when b isn't ASCII-encoded, why
> couldn't you also refrain from b += b'%s' % 42?

It's the fact I'd feel obliged to refrain from using *any* of the
proposed interpolation methods when dealing with arbitrary binary data
if they include the assumption of ASCII compatibility.

The reason Antoine's updates to PEP 460 earned an immediate +1 from me
(even though I was initially dubious about the PEP in general) is that
it aligns *exactly* with how I usually use the bytes type in Python 3 -
as a pure container of arbitrary binary data, without making assumptions
about whether it is ASCII compatible or not.

While I still occasionally have reservations about it, I think on
balance it's a good thing that the bytes type has so much support for
ASCII compatible data, but my specific concern with your more lenient
proposal is that it takes something that I liked and would use (the
current PEP 460 API) and turns it into something I would have to avoid
because it doesn't correctly support arbitrary binary data.
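
(An illustration of the concern about ASCII-assuming operations on arbitrary binary data, using an existing ASCII-assuming bytes method rather than the interpolation API proposed in the PEP. The character was picked only because its UTF-16 encoding happens to contain an ASCII letter byte.)

```python
# U+0161 (LATIN SMALL LETTER S WITH CARON) encodes in UTF-16-LE as the
# two bytes 0x61 0x01, and 0x61 is also the ASCII code for "a".
original = "\u0161"
utf16 = original.encode("utf-16-le")

# bytes.upper() treats every byte as ASCII, so it rewrites 0x61 to 0x41
# without any error or warning.
mangled = utf16.upper()

# Decoding the result yields a different character: U+0141 (LATIN
# CAPITAL LETTER L WITH STROKE), not an upper-case form of the original.
round_tripped = mangled.decode("utf-16-le")
```

Nothing fails here, which is the point: the data is silently changed, and the problem only surfaces wherever the bytes are eventually decoded.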
> I'll suppress the urge to quote verbatim from my first message in this
> thread (about the motivation for bytes) but I'll just recommend you
> re-read it.
>
> (It's too late here to write more, but it looks like we are in for a
> bitter fight. :-( )

I realised my problem was specifically with providing the ASCII
compatible version *without* providing a pure binary equivalent that
*doesn't* involve making the assumption of ASCII compatibility.

This means that adding formatb and formatb_map methods with the current
semantics of format and format_map from PEP 460 would cover the use
cases I care about, and I can then happily ignore the debates about what
the semantics of the ASCII compatible version will be.

The semantics of binary interpolation could potentially even be
simplified further, since the ASCII assuming versions would be
responsible for handling the 2/3 source compatibility problem.
b"{}".formatb(other) would also provide an alternative to calling the
bytes constructor that doesn't suffer from the
unexpected-int-input-is-handled-as-a-length failure mode.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Mon Jan 13 14:06:04 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 13 Jan 2014 23:06:04 +1000
Subject: [Python-Dev] PEP 460 reboot and a bitter fight
In-Reply-To: <52D39CD1.608@stoneleaf.us>
References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us>
 <52D37D5C.6070604@g.nevcal.com>
 <3B1B5227-80B9-4E06-B639-F3111E96FF19@stufft.io>
 <52D39CD1.608@stoneleaf.us>
Message-ID:

On 13 January 2014 17:59, Ethan Furman wrote:
> On 01/12/2014 11:15 PM, Guido van Rossum wrote:
>>
>>
>> (It's too late here to write more, but it looks like we are in for a
>> bitter fight. :-( )
>
>
> It's already been a bitter fight.
>
> The opponents of %-interpolation (Nick, Antoine, Turnbull, D'Aprano, et al*)
> all seem to be arguing basically what Nick said.
> > The proponents (myself, you, Stufft, Eric Smith, et al*) are arguing that > bytes already has an ASCII bias, already has ASCII string methods, that it > isn't the same as the Py2 world because if you combine a bytes object with a > str object outside of interpolation (such as b'hello' + 'world') it doesn't > work, that only bytes would ever be returned, etc, etc. > > With the possible exception of the question I just asked Nick, I don't > think we're going to get any new information. I figured out tonight that it's only positioning ASCII interpolation as an *alternative* to adding binary interpolation that I have a problem with. It isn't, because you lose the structural assurance that you haven't inadvertently introduced an assumption of ASCII compatibility when you didn't need to. However, interpolation support is a convenient enough interface that I can see a version that *only* supports ASCII compatible interpolation being an attractive nuisance that becomes a source of hard to detect and fix data corruption bugs (just like the str type in Python 2). If we add both, my objections go away: people like me can use the Python 3 only formatb and formatb_map methods and be confident we haven't inadvertently introduced any assumptions regarding ASCII compatibility, while folks that know they're dealing with an ASCII compatible format can use the ASCII assuming versions that are designed to be source compatible with Python 2. If someone incorrectly uses format() or format_map() when they should be using the pure binary versions, that's a trivial bug fix (adding the necessary "b", and perhaps some explicit encoding calls) rather than a major restructuring of the code. If they use mod-formatting, that's a slightly bigger fix, but still just switching to a different spelling of the formatting operation. Both use cases (binary only and ASCII compatible) get covered cleanly, and nobody has to lose out. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Jan 13 14:43:35 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 13 Jan 2014 23:43:35 +1000 Subject: [Python-Dev] Python advanced debug support (update frame code) In-Reply-To: References: Message-ID: On 13 January 2014 09:08, Fabio Zadrozny wrote: > Hi Python-dev. > > I'm playing a bit on the concept on live-coding during a debug session and > one of the most annoying things is that although I can reload the code for a > function (using something close to xreload), it seems it's not possible to > change the code for the current frame (i.e.: I need to get out of the > function call and then back in to a call to the method from that frame to > see the changes). > > I gave a look on the frameobject and it seems it would be possible to set > frame.f_code to another code object -- and set the line number to the start > of the new object, which would cover the most common situation, which would > be restarting the current frame -- provided the arguments remain the same > (which is close to what the java debugger in Eclipse does when it drops the > current frame -- on Python, provided I'm not in a try..except block I can do > even better setting the the frame.f_lineno, but without being able to change > the frame f_code it loses a lot of its usefulness). > > So, I'd like to ask for feedback from people with more knowledge on whether > it'd be actually feasible to change the frame.f_code and possible > implications on doing that. Huh, I would have sworn there was already an issue on the tracker about that, but it appears not (Eric Snow has one about adding a reference to the running function, but nothing about trying to switch an executing frame: http://bugs.python.org/issue12857). 
Anyway, your main problem isn't the reference to the code object from the frame: it's the fact that the main eval loop has a reference to that code object from a C level stack variable, and stores a bunch of other state directly on the C stack. I don't see anything *intrinsically* impossible about the idea, it just wouldn't be easy, since you'd have to come up with a way of dealing with that C level state that didn't slow down normal operation. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From breamoreboy at yahoo.co.uk Mon Jan 13 15:25:21 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Mon, 13 Jan 2014 14:25:21 +0000 Subject: [Python-Dev] PEP 460 reboot and a bitter fight In-Reply-To: <52D39CD1.608@stoneleaf.us> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D37D5C.6070604@g.nevcal.com> <3B1B5227-80B9-4E06-B639-F3111E96FF19@stufft.io> <52D39CD1.608@stoneleaf.us> Message-ID: On 13/01/2014 07:59, Ethan Furman wrote: > On 01/12/2014 11:15 PM, Guido van Rossum wrote: >> > The proponents (myself, you, Stufft, Eric Smith, et al*) are arguing > that bytes already has an ASCII bias, already has ASCII string methods, > that it isn't the same as the Py2 world because if you combine a bytes > object with a str object outside of interpolation (such as b'hello' + > 'world') it doesn't work, that only bytes would ever be returned, etc, etc. > > -- > ~Ethan~ "ASCII bias" seems to me an understatement. From http://docs.python.org/3/library/stdtypes.html#bytes-and-bytearray-operations "Due to the common use of ASCII text as the basis for binary protocols, bytes and bytearray objects provide almost all methods found on text strings". Can you get any clearer than that, or have I been completely swamped by the massive tsunami that these PEP 460 threads are? 
Note that I'm *NOT* taking sides here, I'd just like to see a peaceful settlement without any bloodshed :) -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From stephen at xemacs.org Mon Jan 13 15:43:02 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Jan 2014 23:43:02 +0900 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D38BB6.1000100@g.nevcal.com> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D2E9AA.4010308@stoneleaf.us> <52D31049.1090708@g.nevcal.com> <87ob3geoa9.fsf@uwakimon.sk.tsukuba.ac.jp> <52D38BB6.1000100@g.nevcal.com> Message-ID: <87fvosdjt5.fsf@uwakimon.sk.tsukuba.ac.jp> Glenn Linderman writes: > On 1/12/2014 4:08 PM, Stephen J. Turnbull wrote: >> Glenn Linderman writes: >>> the proposals to embed binary in Unicode by abusing Latin-1 >>> encoding. >> Those aren't "proposals", they are currently feasible >> techniques in Python 3 for *some* use cases. The question is why >> infecting Python 3 with the byte/character confoundance virus is >> preferable to such techniques, especially if their (serious!) >> deficiencies are removed by creating a new type such as >> asciistr. > "smuggled binary" (great term borrowed from a different > subthread) muddies the waters of what you are dealing with. Not really. The "mud" is one or more of the serious deficiencies. It can be removed, I believe (and Nick apparently does, too). "asciistr" is one way to try that. > When the mixture of text and binary is done as encoded text in > binary, then it is obvious that only limited text processing can be > performed, Hardly. After all, that's how all text processing was done for decades. Still is, in some programs, especially C programs. > And there are no extra, confusing Latin-1 encode/decode operations > required. 
The "extra" encode/decode operations are mostly (perhaps all) due to
examples that started from bytes and end with bytes. Of course if you
assume that API and propose to do the operations using Unicode, you'll
get "extra" decode/encode operations.

> From a higher-level perspective, I think it would be great to have
> a module, perhaps called "boundary" (let's call it that for now),
> that allows some definition syntax (augmented BNF? augmented ABNF?)
> to explain the format of a binary blob.

We have struct, for one. I'm not sure why you want more than that. I
suppose you could go all the way to ASN.1.

From barry at python.org Mon Jan 13 16:49:49 2014
From: barry at python.org (Barry Warsaw)
Date: Mon, 13 Jan 2014 10:49:49 -0500
Subject: [Python-Dev] PEP 460 reboot
In-Reply-To:
References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us>
Message-ID: <20140113104949.06d591a7@anarchist.wooz.org>

On Jan 12, 2014, at 06:11 PM, Guido van Rossum wrote:

>Perhaps not, but it's a hint that you should probably think about an
>encoding. It's symmetric with how '%s' % b'x' returns "b'x'". Think of
>it as payback time. :-)

Which unfortunately causes no end of headaches, often difficult to
debug.

https://wiki.python.org/moin/PortingToPy3k/BilingualQuickRef (see
'doctests' for one such impact).

-Barry

From barry at python.org Mon Jan 13 16:52:16 2014
From: barry at python.org (Barry Warsaw)
Date: Mon, 13 Jan 2014 10:52:16 -0500
Subject: [Python-Dev] PEP 460 reboot
In-Reply-To: <52D37D5C.6070604@g.nevcal.com>
References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us>
 <52D37D5C.6070604@g.nevcal.com>
Message-ID: <20140113105216.58a8283d@anarchist.wooz.org>

On Jan 12, 2014, at 09:45 PM, Glenn Linderman wrote:

>Quotes in the stream are a great debug hint, without blowing up.

They're actually terrible for debugging for exactly the same reason as
coercion in Python 2.
It's rarely what you really want, it silently succeeds, and it means
that the user visible error is far removed from the actual bug, both in
code distance and time. So yes, it tells you Something Went Wrong, but
is actually a hindrance to finding and fixing the problem.

-Barry

From ethan at stoneleaf.us Mon Jan 13 16:33:56 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 13 Jan 2014 07:33:56 -0800
Subject: [Python-Dev] PEP 460 reboot
In-Reply-To: <52D3B68D.90601@hotpy.org>
References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us>
 <52D34C71.1020200@stoneleaf.us> <52D34E65.2050007@stoneleaf.us>
 <52D3A7D5.9080106@hotpy.org> <52D3AF8C.9030203@g.nevcal.com>
 <52D3B68D.90601@hotpy.org>
Message-ID: <52D40764.6000309@stoneleaf.us>

On 01/13/2014 01:49 AM, Mark Shannon wrote:
>
> '%s' can't work in 3.5, as we must differentiate between
> strings which need to be encoded and bytes which don't.

I don't understand this objection:

    def __mod__(self, other):
        if isinstance(other, bytes):
            # no encoding necessary
            pass
        elif isinstance(other, str):
            # payback time!
            other = ascii(other)

Where is the problem?

--
~Ethan~

From guido at python.org Mon Jan 13 16:59:10 2014
From: guido at python.org (Guido van Rossum)
Date: Mon, 13 Jan 2014 07:59:10 -0800
Subject: [Python-Dev] PEP 460 reboot
In-Reply-To: <20140113124118.55c8c2c3@fsol>
References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us>
 <20140113124118.55c8c2c3@fsol>
Message-ID:

On Mon, Jan 13, 2014 at 3:41 AM, Antoine Pitrou wrote:
> What is the use case for embedding a quoted ASCII-encoded representation
> in a byte stream?

It doesn't crash but produces undesired output (always, not only when
the data is non-ASCII) that gives the developer a hint to think about
encoding to bytes.
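
(An illustration of the "hint" behaviour under discussion. The first statement is existing Python 3 behaviour; lenient_bytes_arg is a hypothetical helper that only approximates what a str argument might contribute to a bytes interpolation under the lenient proposal. It is not an API from the PEP or from any released Python.)

```python
# Existing behaviour: bytes interpolated into a str show up as their repr.
text_result = "%s" % b"abc"


def lenient_bytes_arg(value):
    """Hypothetical: map one interpolation argument to bytes.

    bytes pass through untouched; anything else contributes the
    ASCII-encoded result of ascii(), quotes and escapes included.
    """
    if isinstance(value, bytes):
        return value
    return ascii(value).encode("ascii")
```

With this sketch a str argument never raises. It always arrives wrapped in quotes, whether or not its contents are ASCII, and that is the visible hint being debated in this thread.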
-- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Mon Jan 13 17:09:49 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 13 Jan 2014 17:09:49 +0100 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> Message-ID: <20140113170949.242b9e00@fsol> On Mon, 13 Jan 2014 07:59:10 -0800 Guido van Rossum wrote: > On Mon, Jan 13, 2014 at 3:41 AM, Antoine Pitrou wrote: > > What is the use case for embedding a quoted ASCII-encoded representation > > in a byte stream? > > It doesn't crash but produces undesired output (always, not only when > the data is non-ASCII) that gives the developer a hint to think about > encoding to bytes. But why is it better to give a hint by producing undesired output (which may actually go unnoticed for some time and produce issues down the road), rather than simply by raising TypeError? By that token we may simply insert an error string ("CAUTION: YOU MISS AN ENCODING HERE"), rather than the ascii() representation of the argument. Regards Antoine. From raf at durin42.com Mon Jan 13 14:57:05 2014 From: raf at durin42.com (Augie Fackler) Date: Mon, 13 Jan 2014 08:57:05 -0500 Subject: [Python-Dev] PEP460 thoughts from a Mercurial dev Message-ID: (sorry for not piling on any existing threads - I don't subscribe to python-dev due to lack of time) Brett Cannon asked me to chime in - I haven't actually read the very long thread at this point, I'm just providing responses to things Brett mentioned: 1) What do we need in terms of functionality Best guess, %s, %d, and %f. I've not done a full audit of the code, but some limited looking over the grep hits for % in .py files suggests I'm right, and we could even do without %f (we only use that for 'hg --time' output, which we could do in unicode). 
We also need some way to emit raw bytes (in potentially mixed encodings,
yes I know this is "doing it wrong") to stdout/stderr (example: someone
changes a file from latin1 to utf8, and then wants to see the resulting
diff).

2) Would having it as an external library that worked with Python 2
help?

Probably, IF it came with 2.4 support (RHEL support, basically), and we
could bundle it in our source tree. It's been extremely valuable to have
the install only depend on a working C compiler and Python.

3) If this does go in, how long would it take us to port Mercurial to
py3? Would it being in 3.5 hold us up?

I'm honestly not sure. I'm still in the outermost layers of this yak
shave: fixing cyclic imports. I'll know more when I can at least get 'hg
version' to print its own version, because at that point the testsuite
failures might be informative. I'd honestly _rather_ this went into 3.5
*and* got lots of validation by both us and twisted (the other folks
that care?) before becoming set in stone by a release. Does that make
sense?

4) Do we care if it's .format()/%, or could it be in the stdlib?

It'd be really nice to not have to boil the oceans as far as editing
everyplace in the codebase that does % today. If we do have to do that,
it's not going to be much more helpful than something like:

    def maybestr(a):
        # decode bytes arguments so they can go through str formatting
        if isinstance(a, bytes):
            return a.decode('latin1')
        return a

    def sprintf(fmt, *args):
        # format as text, then map every code point back to one byte
        return (fmt.decode('latin1')
                % tuple(maybestr(a) for a in args)).encode('latin1')

or similar. That was (roughly) what I was figuring I'd do today without
any formal bytes-string-formatting support.

He also mentioned that some are calling for a shortened 3.5 release
cycle - I'd rather not see that happen, for the aforementioned reason of
wanting time to make sure this is Right - it'd be a shame to do the work
and rush it out only to find something missing in an important way.

Feel free to ask further questions - I'll try to respond promptly.
AF (For those curious: my hg-on-py3 repo isn't published at the moment because I rebuilt the server it lived on and I forgot to publish it. I'll rectify that sometime this week, I hope, but it's really totally nonfunctional due to cyclic imports.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Jan 13 17:30:21 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Jan 2014 08:30:21 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D4149D.6070002@stoneleaf.us> On 01/13/2014 02:48 AM, Stephen J. Turnbull wrote: > Ethan Furman writes: > >> The part that you don't seem to acknowledge (sorry if I missed it) >> is that there are str-like methods already on bytes. > > I haven't expressed myself well, but I don't much care about that. You don't care that there are str-like methods on bytes? Whether you do or not, they are there, and they impact how people think about bytes and what is (and what should be) allowed. > It's what Knuth would classify as a seminumerical method. I do not see how that's relevant. What matters is not how we can manipulate the data (everything is reduced to numbers at some point), but what the data represents. [snip] >>>> *My* definition is not ambiguous at all. If this particular part >>>> of the byte stream is defined to contain ASCII-encoded text, then I >>>> can use the bytes text methods to work with it. >>> >>> But how is Python supposed to know that? >> >> Python doesn't need to. > > ... because you know it. 
> But the ideal of object-oriented programming
> (and duck-typing) is that you shouldn't need to; the object should
> know how to produce appropriate behavior itself.

The ideal, sure. But if you're stuck with using a list to hold data for
your higher-order recursive function are you going to expect the list
data type to "know" which pops and inserts are allowed and which are
not? Of course not. And you'd probably build a proper class on top of
the list so those things could be checked. Now imagine that the list
type didn't offer insert and pop, and you had to use slice replacement
-- what a pain that would be!

[snip]

>>> But under your definition, you need to make the decision, or
>>> explicitly code the decision, on the basis of context.
>>
>> Exactly so. I even have to do that in Py2.
>
> "Even." This is exactly where PBP and EIBTI part company, I think.
> EIBTI thinks it's a bad idea to pass around bytes that are implicitly
> some other type

bytes are /always/ implicitly some other type. They are basically raw
data. They are given meaning by how we interpret them.

[snip]

> Even though "ethan" is perfectly good ASCII-encoded text (as well as
> the integer 435,744,694,638 on a bigendian machine with 5-byte words),
> and you have no way of knowing whether it was user data (CP1251) or a
> metadata keyword (ASCII) or the US national debt in 1967 dollars
> (integer) when b'ethan' shows up in a trace?

Context is everything. If b'ethan' shows up in a trace I would have to
examine the surrounding code to see how those bytes were being used.

>> And if there were methods that worked directly on a cp1251-encoded
>> byte stream I would not have any problem using them on
>> cp1251-encoded text.)
>
> I was afraid of that: all of those methods (except the case methods)
> will work fine on a cp1251-encoded text.

Really? Huh. They wouldn't work fine with the Spanish alphabet. I
should've used that for my example.
:/ > And because they only know > that the string is bytes, the case methods will silently corrupt your > "text" as soon as they get a chance. Inevitably there are methods that will "work" even if given the wrong data type, while others will either corrupt or blow up if not given exactly what they expect. You tell me that some ASCII methods will work okay on cp1251 text, and others will not. So I'm not going to use any of them on cp1251 as that is not what they are intended for. > That bothers me, even if it > doesn't bother you. Purity again, if you like. (But you'd take a > safe .upper if you got it for free, no?) Well, there is no such thing as free. ;) And there already is a safe .upper -- str.upper. And if I don't know that my bytes are ASCII, but I did know they were text, I wouldn't use ASCII methods, I'd convert to str and work there. -- ~Ethan~ From ethan at stoneleaf.us Mon Jan 13 17:36:05 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Jan 2014 08:36:05 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <20140113170949.242b9e00@fsol> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113170949.242b9e00@fsol> Message-ID: <52D415F5.6020607@stoneleaf.us> On 01/13/2014 08:09 AM, Antoine Pitrou wrote: > On Mon, 13 Jan 2014 07:59:10 -0800 > Guido van Rossum wrote: >> On Mon, Jan 13, 2014 at 3:41 AM, Antoine Pitrou wrote: >>> What is the use case for embedding a quoted ASCII-encoded representation >>> in a byte stream? >> >> It doesn't crash but produces undesired output (always, not only when >> the data is non-ASCII) that gives the developer a hint to think about >> encoding to bytes. > > But why is it better to give a hint by producing undesired output (which > may actually go unnoticed for some time and produce issues down the > road), rather than simply by raising TypeError? You mean crash all the time? I'd be fine with that for both the str case and the bytes case. 
But it's probably too late to change the str case, and the bytes case
should mirror what str does.

> By that token we may simply insert an error string ("CAUTION: YOU MISS
> AN ENCODING HERE"), rather than the ascii() representation of the
> argument.

Well, the ascii repr is at least some clue as to where. A generic
message would be no clue at all.

--
~Ethan~

From ncoghlan at gmail.com Mon Jan 13 17:51:31 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 14 Jan 2014 02:51:31 +1000
Subject: [Python-Dev] PEP460 thoughts from a Mercurial dev
In-Reply-To:
References:
Message-ID:

On 13 January 2014 23:57, Augie Fackler wrote:
> (sorry for not piling on any existing threads - I don't subscribe to
> python-dev due to lack of time)
>
> Brett Cannon asked me to chime in - I haven't actually read the very long
> thread at this point, I'm just providing responses to things Brett
> mentioned:
>
> 1) What do we need in terms of functionality
>
> Best guess, %s, %d, and %f. I've not done a full audit of the code, but some
> limited looking over the grep hits for % in .py files suggests I'm right,
> and we could even do without %f (we only use that for 'hg --time' output,
> which we could do in unicode).

I think PEP 460 will have you covered there, or hopefully asciistr on
3.3+

> We also need some way to emit raw bytes (in potentially mixed encodings, yes
> I know this is "doing it wrong") to stdout/stderr (example: someone changes
> a file from latin1 to utf8, and then wants to see the resulting diff).

Writing to sys.stdout.buffer may work for that, or else being able to
change the encoding of an existing stream.
For the latter, Victor had a working patch to _pyio at
http://bugs.python.org/issue15216 and general consensus that the
semantics were sensible, but it needs to be worked up into a full patch
that covers the C version as well (I tried to muster some helpers for
that in the leadup to 3.4 feature freeze, but unfortunately without any
luck)

> 2) Would having it as an external library that worked with Python 2 help?
>
> Probably, IF it came with 2.4 support (RHEL support, basically), and we
> could bundle it in our source tree. It's been extremely valuable to have the
> install only depend on a working C compiler and Python.

asciicompat.asciistr is just an alias for str on Python 2.x, so if we
get that working, it may be something you could vendor into Mercurial
for Python 3.3+ support. (There will likely be gaps in what asciistr can
do due to interoperability issues in the core types, but the PEP 393
changes to the internal representation mean it should be able to get us
pretty close)

> 3) If this does go in, how long would it take us to port Mercurial to py3?
> Would it being in 3.5 hold us up?
>
> I'm honestly not sure. I'm still in the outermost layers of this yak shave:
> fixing cyclic imports. I'll know more when I can at least get 'hg version'
> to print its own version, because at that point the testsuite failures might
> be informative. I'd honestly _rather_ this went into 3.5 *and* got lots of
> validation by both us and twisted (the other folks that care?) before
> becoming set in stone by a release. Does that make sense?

Yes, that actually makes a lot of sense to me - there's no point in us
rushing to get this into 3.4 and then you folks discovering in 6 months
it doesn't quite work for you, and then having to wait for 3.5 anyway
(or, worse, Python 3 being locked into a solution that doesn't work for
you by its own internal backwards compatibility requirements).

>
> 4) Do we care if it's .format()/%, or could it be in the stdlib?
>
> It'd be really nice to not have to boil the oceans as far as editing
> everyplace in the codebase that does % today. If we do have to do that, it's
> not going to be much more helpful than something like:
>
> def maybestr(a):
>     if isinstance(a, bytes):
>         return a.decode('latin1')
>     return a
>
> def sprintf(fmt, *args):
>     # % needs a tuple of arguments, not a list
>     return (fmt.decode('latin1') % tuple(maybestr(a) for a in args)).encode('latin1')
>
> or similar. That was (roughly) what I was figuring I'd do today without any
> formal bytes-string-formatting support.

Agreed - I think the two solutions that potentially make the most sense are PEP 460 and an interoperable third party type like asciistr. They each have different pros and cons, so I'm actually currently in favour of doing both (if Guido is amenable to my suggestion of providing both ASCII compatible and binary interpolation).

> He also mentioned that some are calling for a shortened 3.5 release cycle -
> I'd rather not see that happen, for the aforementioned reason of wanting
> time to make sure this is Right - it'd be a shame to do the work and rush it
> out only to find something missing in an important way.

By shortened, we're mostly talking about ensuring 3.5 is published before the 2.7.9 maintenance release. So early-to-mid 2015 rather than the more typical late 2015.

> Feel free to ask further questions - I'll try to respond promptly.

Thanks for the contribution! I found it very helpful :)

Cheers,
Nick.
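
A minimal runnable sketch of the latin-1 round-trip workaround quoted in this message, assuming Python 3. The function names follow the quoted snippet; the sample inputs are made up for illustration. It relies on latin-1 mapping each of the 256 byte values to exactly one code point, so decoding and re-encoding is lossless for arbitrary bytes.

```python
def maybestr(value):
    """Decode bytes as latin-1; leave every other type untouched."""
    if isinstance(value, bytes):
        return value.decode('latin1')
    return value


def sprintf(fmt, *args):
    """Interpolate args into a bytes template via a latin-1 round trip."""
    # The % operator needs a tuple (not a list) for multiple arguments.
    text = fmt.decode('latin1') % tuple(maybestr(a) for a in args)
    return text.encode('latin1')


# Every possible byte value survives the round trip unchanged.
raw = bytes(range(256))
assert raw.decode('latin1').encode('latin1') == raw

# Non-ASCII bytes pass through; %d is rendered as ASCII digits.
result = sprintf(b'%s: %d bytes', b'\xff\xfe', 2)
print(result)
```

One limitation of this approach: a str argument containing a character outside latin-1 (for example '\u20ac') makes the final encode raise UnicodeEncodeError, so the helper is only safe when text arguments are known to be latin-1 compatible.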
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ethan at stoneleaf.us Mon Jan 13 17:33:04 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Jan 2014 08:33:04 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <20140113105216.58a8283d@anarchist.wooz.org> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D37D5C.6070604@g.nevcal.com> <20140113105216.58a8283d@anarchist.wooz.org> Message-ID: <52D41540.2010208@stoneleaf.us> On 01/13/2014 07:52 AM, Barry Warsaw wrote: > On Jan 12, 2014, at 09:45 PM, Glenn Linderman wrote: > >> Quotes in the stream are a great debug hint, without blowing up. > > They actually terrible for debugging for exactly the same reason as coercion > in Python 2. It's rarely what you really want, it silently succeeds, and it > means that the user visible error is far removed from the actual bug, both in > code distance and time. So yes, it tells you Something Went Wrong, but is > actually a hindrance to finding and fixing the problem. You mean like this is? --> '%s' % b'abc' "b'abc'" I agree, but we're stuck with it with str, we may as well be stuck with it for bytes, too. :/ -- ~Ethan~ From ethan at stoneleaf.us Mon Jan 13 16:54:17 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Jan 2014 07:54:17 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D39358.8090405@stoneleaf.us> Message-ID: <52D40C29.5040708@stoneleaf.us> On 01/13/2014 01:13 AM, Nick Coghlan wrote: > On 13 Jan 2014 17:43, "Ethan Furman" wrote: >> On 01/12/2014 10:51 PM, Nick Coghlan wrote: >>> >>> I am a strong -1 on the more lenient proposal, as it makes binary >>> interpolation in Python 3 an *unsafe operation* for ASCII incompatible >>> binary formats. >> >> No more unsafe that calling .upper() on ASCII incompatible streams. > > Right - Guido's proposal is *completely useless* for arbitrary binary data. You can't trust it. 
Forgive me for being dense, but I don't understand your objection. With Guido's proposal, '%s' % bytes_data, bytes_data is passed through unchanged. Did you mean something else by "binary data"?

--
~Ethan~

From ncoghlan at gmail.com Mon Jan 13 18:12:51 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 14 Jan 2014 03:12:51 +1000
Subject: [Python-Dev] PEP 460 reboot
In-Reply-To: <52D40C29.5040708@stoneleaf.us>
References: <52D39358.8090405@stoneleaf.us> <52D40C29.5040708@stoneleaf.us>
Message-ID:

On 14 January 2014 01:54, Ethan Furman wrote:
> On 01/13/2014 01:13 AM, Nick Coghlan wrote:
>
>> On 13 Jan 2014 17:43, "Ethan Furman" wrote:
>>>
>>> On 01/12/2014 10:51 PM, Nick Coghlan wrote:
>>>>
>>>>
>>>> I am a strong -1 on the more lenient proposal, as it makes binary
>>>> interpolation in Python 3 an *unsafe operation* for ASCII incompatible
>>>> binary formats.
>>>
>>>
>>> No more unsafe than calling .upper() on ASCII incompatible streams.
>>
>>
>> Right - Guido's proposal is *completely useless* for arbitrary binary
>> data. You can't trust it.
>
>
> Forgive me for being dense, but I don't understand your objection. With
> Guido's proposal, '%s' % bytes_data, bytes_data is passed through unchanged.
> Did you mean something else by "binary data"?

I mean it will work, but it will mean you've introduced an implicit assumption of ASCII compatibility into the structure of your program, with no straightforward way of removing it (you would have to rewrite your code to not rely on interpolation). This becomes most obvious when the formatting string is passed as a variable, rather than being provided as a literal, or when you don't know the type of the *value* provided and some types may involve implicit encoding operations (I don't think Guido proposed that, but others have).
That's the kind of data-driven uncertainty I don't like in Python 2, and I find its categorical elimination to be one of the best features of Python 3 - there are certain kinds of data manipulation bugs that simply *can't exist* because the types don't work that way any more.

However, that's also why *adding* formatb/formatb_map to the proposal (with Antoine's stricter semantics) would resolve my concerns - you can ensure you don't introduce an implicit assumption of ASCII compatibility by using those for interpolation rather than the ASCII compatible __mod__/format/format_map that the bytes type will share with the str type.

The combination of the two is completely in keeping with the Python 3 text model - we would offer text interpolation, hybrid ASCII compatible interpolation *and* pure binary interpolation. Offering only the first two would mean relegating the pure binary domain to a lower status again, since assuming ASCII compatibility would grant you access to an interpolation API, so people would be inclined to use it even when doing so opens the door to data corruption bugs.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From solipsis at pitrou.net Mon Jan 13 18:18:10 2014
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 13 Jan 2014 18:18:10 +0100
Subject: [Python-Dev] PEP 460 reboot
References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113170949.242b9e00@fsol> <52D415F5.6020607@stoneleaf.us>
Message-ID: <20140113181810.27727aa7@fsol>

On Mon, 13 Jan 2014 08:36:05 -0800 Ethan Furman wrote:
> On 01/13/2014 08:09 AM, Antoine Pitrou wrote:
> > On Mon, 13 Jan 2014 07:59:10 -0800
> > Guido van Rossum wrote:
> >> On Mon, Jan 13, 2014 at 3:41 AM, Antoine Pitrou wrote:
> >>> What is the use case for embedding a quoted ASCII-encoded representation
> >>> in a byte stream?
> >> > >> It doesn't crash but produces undesired output (always, not only when > >> the data is non-ASCII) that gives the developer a hint to think about > >> encoding to bytes. > > > > But why is it better to give a hint by producing undesired output (which > > may actually go unnoticed for some time and produce issues down the > > road), rather than simply by raising TypeError? > > You mean crash all the time? I'd be fine with that for both the str > case and the bytes case. But's probably too late > to change the str case, and the bytes case should mirror what str does. No, there's a good reason for the str case: it's that every Python object should have a working __str__ (for debugging, REPL use, etc.). So bytes has a __str__ too and that's why "%s" % (some_bytes_object) succeeds. Conversely, though, str needn't and shouldn't have a __bytes__, so there's no good reason for b"%s" % (some_str_object) to succeed. (moreover, I don't think "we did it wrong here" should be a good reason for doing it wrong there too) Regards Antoine. From solipsis at pitrou.net Mon Jan 13 18:31:14 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 13 Jan 2014 18:31:14 +0100 Subject: [Python-Dev] PEP 460 reboot References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113170949.242b9e00@fsol> <52D415F5.6020607@stoneleaf.us> Message-ID: <20140113183114.0fea07a8@fsol> On Mon, 13 Jan 2014 08:36:05 -0800 Ethan Furman wrote: > On 01/13/2014 08:09 AM, Antoine Pitrou wrote: > > On Mon, 13 Jan 2014 07:59:10 -0800 > > Guido van Rossum wrote: > >> On Mon, Jan 13, 2014 at 3:41 AM, Antoine Pitrou wrote: > >>> What is the use case for embedding a quoted ASCII-encoded representation > >>> in a byte stream? > >> > >> It doesn't crash but produces undesired output (always, not only when > >> the data is non-ASCII) that gives the developer a hint to think about > >> encoding to bytes. 
> > > > But why is it better to give a hint by producing undesired output (which > > may actually go unnoticed for some time and produce issues down the > > road), rather than simply by raising TypeError? > > You mean crash all the time? I'd be fine with that for both the str case > and the bytes case. But's probably too late > to change the str case, and the bytes case should mirror what str does. Let me add something else: str and bytes don't have to be symmetrical. In Python 2, str and unicode were symmetrical, they allowed exactly the same operations and were composable. In Python 3, str and bytes are different beasts; they have different operations *and* different semantics (for example, bytes interoperates with bytearray and memoryview, while str doesn't). So bytes formatting really needn't (and shouldn't, IMO) mirror str formatting. (the only reason I used "%s" in PEP 460 is to allow a migration path from 2.x bytes-formatting to 3.x bytes-formatting; in a really "pure" proposal it would have been called something else) Regards Antoine. From guido at python.org Mon Jan 13 18:34:39 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Jan 2014 09:34:39 -0800 Subject: [Python-Dev] PEP460 thoughts from a Mercurial dev In-Reply-To: References: Message-ID: On Mon, Jan 13, 2014 at 8:51 AM, Nick Coghlan wrote: > On 13 January 2014 23:57, Augie Fackler wrote: >> 1) What do we need in terms of functionality >> >> Best guess, %s, %d, and %f. I've not done a full audit of the code, but some >> limited looking over the grep hits for % in .py files suggests I'm right, >> and we could even do without %f (we only use that for 'hg --time' output, >> which we could do in unicode). > > I think PEP 460 will have you covered there, or hopefully asciistr on 3.3+ I'm confused on how PEP 460 would help -- Augie mentioned %d, which it excludes. 
-- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Mon Jan 13 17:39:20 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Jan 2014 08:39:20 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <20140113104949.06d591a7@anarchist.wooz.org> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113104949.06d591a7@anarchist.wooz.org> Message-ID: <52D416B8.40008@stoneleaf.us> On 01/13/2014 07:49 AM, Barry Warsaw wrote: > On Jan 12, 2014, at 06:11 PM, Guido van Rossum wrote: > >> Perhaps not, but it's a hint that you should probably think about an >> encoding. It's symmetric with how '%s' % b'x' returns "b'x'". Think of >> it as payback time. :-) > > Which unfortunately causes no end of headaches, often difficult to debug. Is it, in fact, too late to change that behavior? -- ~Ethan~ From ethan at stoneleaf.us Mon Jan 13 18:39:27 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Jan 2014 09:39:27 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D416B8.40008@stoneleaf.us> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113104949.06d591a7@anarchist.wooz.org> <52D416B8.40008@stoneleaf.us> Message-ID: <52D424CF.6020006@stoneleaf.us> On 01/13/2014 08:39 AM, Ethan Furman wrote: > On 01/13/2014 07:49 AM, Barry Warsaw wrote: >> On Jan 12, 2014, at 06:11 PM, Guido van Rossum wrote: >> >>> Perhaps not, but it's a hint that you should probably think about an >>> encoding. It's symmetric with how '%s' % b'x' returns "b'x'". Think of >>> it as payback time. :-) >> >> Which unfortunately causes no end of headaches, often difficult to debug. > > Is it, in fact, too late to change that behavior? Never mind, Antoine explained it for me. 
:) -- ~Ethan~ From guido at python.org Mon Jan 13 18:39:30 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Jan 2014 09:39:30 -0800 Subject: [Python-Dev] PEP460 thoughts from a Mercurial dev In-Reply-To: References: Message-ID: On Mon, Jan 13, 2014 at 9:37 AM, Augie Fackler wrote: > > On Mon, Jan 13, 2014 at 12:34 PM, Guido van Rossum wrote: >> >> On Mon, Jan 13, 2014 at 8:51 AM, Nick Coghlan wrote: >> > On 13 January 2014 23:57, Augie Fackler wrote: >> >> 1) What do we need in terms of functionality >> >> >> >> Best guess, %s, %d, and %f. I've not done a full audit of the code, but >> >> some >> >> limited looking over the grep hits for % in .py files suggests I'm >> >> right, >> >> and we could even do without %f (we only use that for 'hg --time' >> >> output, >> >> which we could do in unicode). >> > >> > I think PEP 460 will have you covered there, or hopefully asciistr on >> > 3.3+ >> >> I'm confused on how PEP 460 would help -- Augie mentioned %d, which it >> excludes. > > > > Yes - not having %d makes this much much less useful to me. > > For my part, it'd probably be fine if we could do %s (which would handle an > RHS that was bytes, and only bytes, no handing of str or __bytes__-type > stuff at all) and %d (with all the usual format modifiers, and would result > in an ascii-compatible sequence of bytes all the time). Would it be okay of instead of %s you had to use %b for those semantics? (%d would still exist) -- --Guido van Rossum (python.org/~guido) From rdmurray at bitdance.com Mon Jan 13 18:42:36 2014 From: rdmurray at bitdance.com (R. 
David Murray) Date: Mon, 13 Jan 2014 12:42:36 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <20140113124118.55c8c2c3@fsol> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> Message-ID: <20140113174237.25050250030@webabinitio.net> On Mon, 13 Jan 2014 12:41:18 +0100, Antoine Pitrou wrote: > On Sun, 12 Jan 2014 18:11:47 -0800 > Guido van Rossum wrote: > > On Sun, Jan 12, 2014 at 5:27 PM, Ethan Furman wrote: > > > On 01/12/2014 04:47 PM, Guido van Rossum wrote: > > >> %s seems the trickiest: I think with a bytes argument it should just > > >> insert those bytes (and the padding modifiers should work too), and > > >> for other types it should probably work like %a, so that it works as > > >> expected for numeric values, and with a string argument it will return > > >> the ascii()-variant of its repr(). Examples: > > >> > > >> b'%s' % 42 == b'42' > > >> b'%s' % 'x' == b"'x'" (i.e. the three-byte string containing an 'x' > > >> enclosed in single quotes) > > > > > > I'm not sure about the quotes. Would anyone ever actually want those in the > > > byte stream? > > > > Perhaps not, but it's a hint that you should probably think about an > > encoding. It's symmetric with how '%s' % b'x' returns "b'x'". Think of > > it as payback time. :-) > > What is the use case for embedding a quoted ASCII-encoded representation > in a byte stream? There is no use case in the sense you are asking, just like there is no real use case for '%s' % b'x' producing "b'x'". But the real use case is exactly the same: to let you know your code is screwed up without actually blowing up with a encoding Exception. For the record, I like Guido's logic and proposal. I don't understand Nick's objection, since I don't see the difference between the situation here where a string gets interpolated into bytes as 'xxx' and the corresponding situation where bytes gets interpolated into a string as b'xxx'. 
Why struggle to keep bytes interpolation "pure" if string interpolation isn't?

Guido's proposal makes the language more symmetric, and thus more consistent and less surprising. Exactly the hallmarks of Python's design sense, IMO. (Big surprise, right? :)

Of course, this point of view *is* based on the idea that when you are doing interpolation using %/.format, you are in fact primarily concerned with ASCII compatible byte streams. This is a Practicality sort of argument. It is, after all, by far the most common use case when doing interpolation[*].

If you wanted to do a purist version of this symmetry, you'd have bytes(x) calling __bytes__ if it was defined and falling back to calling a __brepr__ otherwise.

But what would __brepr__ implement? The variety of format codes in the struct module argues that there is no "one obvious" binary repr for most types. (Those that have one would implement __bytes__). And what would be the __brepr__ of an arbitrary 'object'?

Faced with the impracticality of defining __brepr__ usefully in any "pure bytes" form, it seems sensible to admit that the most useful __brepr__ is the ascii() encoding of the __repr__. Which naturally produces 'xxx' as the __brepr__ of a string.

This does cause things to get a little un-pretty when you are operating at the python prompt:

    >>> b'%s' % object
    b"<class 'object'>"

But then again that is most likely really not what you mean to do, so it becomes a big red flag...just like b'xxx' is a small red flag when you accidentally interpolate unencoded bytes into a string.

--David

PS: When I first read Guido's remark that the result of interpolating a string should be 'xxx', I went Wah? I had to reason my way through to it as above, but to him it was just the natural answer. Guido isn't always right, but this kind of automatic language design consistency is one reason he's the BDFL.
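
The %s rule under discussion in this thread can be emulated for a single argument in a few lines. This is only a sketch of the proposal as quoted here (bytes pass through unchanged, everything else becomes the ASCII-encoded ascii() of the value); the helper name proposed_percent_s is hypothetical and the sketch says nothing about what any released Python version does for b'%s'.

```python
def proposed_percent_s(value):
    """Emulate the proposed b'%s' rule for one argument (hypothetical helper)."""
    if isinstance(value, (bytes, bytearray)):
        # Bytes-like arguments are inserted as-is.
        return bytes(value)
    # Anything else is rendered as ascii(value), then ASCII-encoded.
    return ascii(value).encode('ascii')


# The examples quoted in the thread.
number_case = proposed_percent_s(42)      # digits, as expected for numbers
string_case = proposed_percent_s('x')     # quotes included: the "red flag"
bytes_case = proposed_percent_s(b'x')     # passed through unchanged

# The existing str-side behaviour that the proposal mirrors.
str_side = '%s' % b'x'

print(number_case, string_case, bytes_case, str_side)
```

The symmetry argument is visible in the last two results: interpolating text into bytes would leave stray quotes in the output, just as interpolating bytes into text already leaves a stray b'...' wrapper.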
[*] I still think that you mostly want to design your library so that you are handling the text parts as text and the bytes parts as bytes, and encoding/gluing them as appropriate at the IO boundary. But if Guido says his real code would benefit by being able to interpolate ASCII into bytes at certain points, I'll believe him. From solipsis at pitrou.net Mon Jan 13 18:43:01 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 13 Jan 2014 18:43:01 +0100 Subject: [Python-Dev] PEP460 thoughts from a Mercurial dev References: Message-ID: <20140113184301.2fb4bf96@fsol> On Mon, 13 Jan 2014 09:34:39 -0800 Guido van Rossum wrote: > On Mon, Jan 13, 2014 at 8:51 AM, Nick Coghlan wrote: > > On 13 January 2014 23:57, Augie Fackler wrote: > >> 1) What do we need in terms of functionality > >> > >> Best guess, %s, %d, and %f. I've not done a full audit of the code, but some > >> limited looking over the grep hits for % in .py files suggests I'm right, > >> and we could even do without %f (we only use that for 'hg --time' output, > >> which we could do in unicode). > > > > I think PEP 460 will have you covered there, or hopefully asciistr on 3.3+ > > I'm confused on how PEP 460 would help -- Augie mentioned %d, which it excludes. Serhiy did a survey of formatting codes in the Mercurial sources: https://mail.python.org/pipermail/python-dev/2014-January/130969.html Regards Antoine. From raf at durin42.com Mon Jan 13 18:37:19 2014 From: raf at durin42.com (Augie Fackler) Date: Mon, 13 Jan 2014 12:37:19 -0500 Subject: [Python-Dev] PEP460 thoughts from a Mercurial dev In-Reply-To: References: Message-ID: On Mon, Jan 13, 2014 at 12:34 PM, Guido van Rossum wrote: > On Mon, Jan 13, 2014 at 8:51 AM, Nick Coghlan wrote: > > On 13 January 2014 23:57, Augie Fackler wrote: > >> 1) What do we need in terms of functionality > >> > >> Best guess, %s, %d, and %f. 
I've not done a full audit of the code, but > some > >> limited looking over the grep hits for % in .py files suggests I'm > right, > >> and we could even do without %f (we only use that for 'hg --time' > output, > >> which we could do in unicode). > > > > I think PEP 460 will have you covered there, or hopefully asciistr on > 3.3+ > > I'm confused on how PEP 460 would help -- Augie mentioned %d, which it > excludes. Yes - not having %d makes this much much less useful to me. For my part, it'd probably be fine if we could do %s (which would handle an RHS that was bytes, and only bytes, no handing of str or __bytes__-type stuff at all) and %d (with all the usual format modifiers, and would result in an ascii-compatible sequence of bytes all the time). -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Jan 13 18:38:33 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Jan 2014 09:38:33 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <20140113183114.0fea07a8@fsol> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113170949.242b9e00@fsol> <52D415F5.6020607@stoneleaf.us> <20140113183114.0fea07a8@fsol> Message-ID: <52D42499.1070001@stoneleaf.us> On 01/13/2014 09:31 AM, Antoine Pitrou wrote: > On Mon, 13 Jan 2014 08:36:05 -0800 > Ethan Furman wrote: >> >> You mean crash all the time? I'd be fine with that for both the str case >> and the bytes case. But's probably too late >> to change the str case, and the bytes case should mirror what str does. > > Let me add something else: str and bytes don't have to be symmetrical. > In Python 2, str and unicode were symmetrical, they allowed exactly the > same operations and were composable. > In Python 3, str and bytes are different beasts; they have different > operations *and* different semantics (for example, bytes interoperates > with bytearray and memoryview, while str doesn't). 
This makes sense to me.

So I guess I'm fine with either the quoted ascii repr or the always blowing up method, with leaning towards the blowing up method.

--
~Ethan~

From yselivanov.ml at gmail.com Mon Jan 13 19:08:12 2014
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Mon, 13 Jan 2014 13:08:12 -0500
Subject: [Python-Dev] PEP 460 reboot
In-Reply-To: <20140113174237.25050250030@webabinitio.net>
References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net>
Message-ID:

On January 13, 2014 at 12:45:40 PM, R. David Murray (rdmurray at bitdance.com) wrote:
[snip]
> There is no use case in the sense you are asking, just like there is no
> real use case for '%s' % b'x' producing "b'x'". But the real use case
> is exactly the same: to let you know your code is screwed up without
> actually blowing up with an encoding Exception.

Blowing up with an encoding exception is the *only* sane method of making you aware that something is wrong. It's much better than just continuing to produce some broken output until it gets noticed. What's the point of writing a piece of software that is working wrong without crashing?

> For the record, I like Guido's logic and proposal. I don't understand
> Nick's objection, since I don't see the difference between the situation
> here where a string gets interpolated into bytes as 'xxx' and the
> corresponding situation where bytes gets interpolated into a string
> as b'xxx'. Why struggle to keep bytes interpolation "pure" if string
> interpolation isn't?

Isn't the whole point of this discussion to make python2 people who want to migrate to python3 happier? What's the point for them to have ported python2 code that produces "Status: b'42'" for "b'Status: %d' % 42"? And if you want to call 'str' on 42 and then encode the output in latin-1/ascii, then you're just turning python3 into python2.
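
A small sketch of the failure mode being argued about, assuming Python 3 and using made-up variable names. It contrasts the explicit "format as text, then encode" spelling with what str interpolation already does when handed a bytes object: no exception, just a repr leaking into the output.

```python
status = 42

# Explicit route: build the text, then encode it once at the boundary.
header = ('Status: %d' % status).encode('ascii')

# Silent route: a bytes value interpolated into a str template does not
# raise; its repr (including the b'' wrapper) ends up in the result.
leaked = 'Status: %s' % b'42'

print(header)
print(leaked)
```

The second result is the kind of output that can go unnoticed until something downstream tries to parse it, which is the case made here for raising an exception instead.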
- Yury From raf at durin42.com Mon Jan 13 18:41:11 2014 From: raf at durin42.com (Augie Fackler) Date: Mon, 13 Jan 2014 12:41:11 -0500 Subject: [Python-Dev] PEP460 thoughts from a Mercurial dev In-Reply-To: References: Message-ID: On Mon, Jan 13, 2014 at 12:39 PM, Guido van Rossum wrote: > On Mon, Jan 13, 2014 at 9:37 AM, Augie Fackler wrote: > > > > On Mon, Jan 13, 2014 at 12:34 PM, Guido van Rossum > wrote: > >> > >> On Mon, Jan 13, 2014 at 8:51 AM, Nick Coghlan > wrote: > >> > On 13 January 2014 23:57, Augie Fackler wrote: > >> >> 1) What do we need in terms of functionality > >> >> > >> >> Best guess, %s, %d, and %f. I've not done a full audit of the code, > but > >> >> some > >> >> limited looking over the grep hits for % in .py files suggests I'm > >> >> right, > >> >> and we could even do without %f (we only use that for 'hg --time' > >> >> output, > >> >> which we could do in unicode). > >> > > >> > I think PEP 460 will have you covered there, or hopefully asciistr on > >> > 3.3+ > >> > >> I'm confused on how PEP 460 would help -- Augie mentioned %d, which it > >> excludes. > > > > > > > > Yes - not having %d makes this much much less useful to me. > > > > For my part, it'd probably be fine if we could do %s (which would handle > an > > RHS that was bytes, and only bytes, no handing of str or __bytes__-type > > stuff at all) and %d (with all the usual format modifiers, and would > result > > in an ascii-compatible sequence of bytes all the time). > > Would it be okay of instead of %s you had to use %b for those > semantics? (%d would still exist) Probably, but it'd be quite painful, since we'd have to to some kind of .sub() call all over the place to remain compatible with 2.4 and 2.6. Dropping 2.4 might be possible in the 3.5 timeframe - 2.6 almost certainly not. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From g.brandl at gmx.net Mon Jan 13 19:14:38 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 13 Jan 2014 19:14:38 +0100 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D42499.1070001@stoneleaf.us> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113170949.242b9e00@fsol> <52D415F5.6020607@stoneleaf.us> <20140113183114.0fea07a8@fsol> <52D42499.1070001@stoneleaf.us> Message-ID: Am 13.01.2014 18:38, schrieb Ethan Furman: > On 01/13/2014 09:31 AM, Antoine Pitrou wrote: >> On Mon, 13 Jan 2014 08:36:05 -0800 Ethan Furman wrote: >>> >>> You mean crash all the time? I'd be fine with that for both the str >>> case and the bytes case. But's probably too late to change the str case, >>> and the bytes case should mirror what str does. >> >> Let me add something else: str and bytes don't have to be symmetrical. In >> Python 2, str and unicode were symmetrical, they allowed exactly the same >> operations and were composable. In Python 3, str and bytes are different >> beasts; they have different operations *and* different semantics (for >> example, bytes interoperates with bytearray and memoryview, while str >> doesn't). > > This makes sense to me. > > So I'm guess I'm fine with either the quoted ascii repr or the always blowing > up method, with leaning towards the blowing up method. +1. 
Georg From brett at python.org Mon Jan 13 19:40:16 2014 From: brett at python.org (Brett Cannon) Date: Mon, 13 Jan 2014 13:40:16 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <20140113183114.0fea07a8@fsol> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113170949.242b9e00@fsol> <52D415F5.6020607@stoneleaf.us> <20140113183114.0fea07a8@fsol> Message-ID: On Mon, Jan 13, 2014 at 12:31 PM, Antoine Pitrou wrote: > On Mon, 13 Jan 2014 08:36:05 -0800 > Ethan Furman wrote: > > > On 01/13/2014 08:09 AM, Antoine Pitrou wrote: > > > On Mon, 13 Jan 2014 07:59:10 -0800 > > > Guido van Rossum wrote: > > >> On Mon, Jan 13, 2014 at 3:41 AM, Antoine Pitrou > wrote: > > >>> What is the use case for embedding a quoted ASCII-encoded > representation > > >>> in a byte stream? > > >> > > >> It doesn't crash but produces undesired output (always, not only when > > >> the data is non-ASCII) that gives the developer a hint to think about > > >> encoding to bytes. > > > > > > But why is it better to give a hint by producing undesired output > (which > > > may actually go unnoticed for some time and produce issues down the > > > road), rather than simply by raising TypeError? > > > > You mean crash all the time? I'd be fine with that for both the str case > > and the bytes case. But's probably too late > > to change the str case, and the bytes case should mirror what str does. > > Let me add something else: str and bytes don't have to be symmetrical. > In Python 2, str and unicode were symmetrical, they allowed exactly the > same operations and were composable. > In Python 3, str and bytes are different beasts; they have different > operations *and* different semantics (for example, bytes interoperates > with bytearray and memoryview, while str doesn't). 
>

This is also why the int type doesn't have a __bytes__ method (ignoring the use of an integer as the argument to bytes()): it's universally defined what str(10) should return, but who knows what you want when you ask for the bytes of 10 (e.g. base-2, ASCII, UTF-16, etc.).

>
> So bytes formatting really needn't (and shouldn't, IMO) mirror str
> formatting.
>

I think one of the things about Guido's proposal that bugs me is that it breaks the mental model of the .format() method from str in terms of how the mini-language works. For str.format() you have the conversion and the format spec (e.g. "{!r}" and "{:d}", respectively). You apply the conversion by calling the appropriate built-in, e.g. 'r' calls repr(). The format spec semantically gets passed with the object to format() which calls the object's __format__() method: ``format(number, 'd')``.

Now Guido's suggestion has two parts that affect the mini-language for .format(). One is that for bytes.format() the default conversion is bytes() instead of str(), which is fine (probably want to add 'b' as a conversion value as well to be consistent). But the other bit is that the format spec goes from semantically meaning ``format(thing, format_spec)`` to ``format(thing, format_spec).encode('ascii', 'strict')`` for at least numbers. That implicitness bugs me as I have always thought of format specs just leading to a call to format().

I think I can live with it, though, as long as it is **consistently** applied across the board for bytes.format(): every use of a format spec leads to calling ``format(thing, format_spec).encode('ascii', 'strict')`` no matter what type 'thing' is, and it is clearly documented that this is done to ease porting and handle the common case. This even gives people in-place ASCII encoding for strings by always using '{:s}' with text which they can do when they port their code to run under both Python 2 and 3.
So you should be able to do ``b'Content-Type: {:s}'.format('image/jpeg')`` and have it give ASCII. If you want a different encoding, such as latin-1, then you need to do it explicitly and not rely on the mini-language to do tricks for you.

IOW I want to treat the format mini-language as a language and thus not have any special-casing or massive shifts in meaning between str.format() and bytes.format() so my mental model doesn't have to contort based on whether it's str or bytes. My preference is not to have any, but if Guido is going to say PBP here then I want absolute consistency across the board in how bytes.format() tweaks things.

As for %s for the % operator calling ascii(), I think that will be a porting nightmare of finding out why your bytes suddenly stopped being formatted properly and then having to crawl through all of your code for that one use of %s which is getting bytes in. By raising a TypeError you will very easily detect where your screw-up occurred thanks to the traceback; doing otherwise feels too much like implicit type conversion, and you can ask any JavaScript developer how that can be a bad thing.

-Brett

>
> (the only reason I used "%s" in PEP 460 is to allow a migration path
> from 2.x bytes-formatting to 3.x bytes-formatting; in a really "pure"
> proposal it would have been called something else)
>
> Regards
>
> Antoine.

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From dholth at gmail.com Mon Jan 13 19:45:37 2014 From: dholth at gmail.com (Daniel Holth) Date: Mon, 13 Jan 2014 13:45:37 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <20140113174237.25050250030@webabinitio.net> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> Message-ID: On Mon, Jan 13, 2014 at 12:42 PM, R. David Murray wrote: > On Mon, 13 Jan 2014 12:41:18 +0100, Antoine Pitrou wrote: >> On Sun, 12 Jan 2014 18:11:47 -0800 >> Guido van Rossum wrote: >> > On Sun, Jan 12, 2014 at 5:27 PM, Ethan Furman wrote: >> > > On 01/12/2014 04:47 PM, Guido van Rossum wrote: >> > >> %s seems the trickiest: I think with a bytes argument it should just >> > >> insert those bytes (and the padding modifiers should work too), and >> > >> for other types it should probably work like %a, so that it works as >> > >> expected for numeric values, and with a string argument it will return >> > >> the ascii()-variant of its repr(). Examples: >> > >> >> > >> b'%s' % 42 == b'42' >> > >> b'%s' % 'x' == b"'x'" (i.e. the three-byte string containing an 'x' >> > >> enclosed in single quotes) >> > > >> > > I'm not sure about the quotes. Would anyone ever actually want those in the >> > > byte stream? >> > >> > Perhaps not, but it's a hint that you should probably think about an >> > encoding. It's symmetric with how '%s' % b'x' returns "b'x'". Think of >> > it as payback time. :-) >> >> What is the use case for embedding a quoted ASCII-encoded representation >> in a byte stream? > > There is no use case in the sense you are asking, just like there is no > real use case for '%s' % b'x' producing "b'x'". But the real use case > is exactly the same: to let you know your code is screwed up without > actually blowing up with a encoding Exception. > > For the record, I like Guido's logic and proposal. 
I don't understand > Nick's objection, since I don't see the difference between the situation > here where a string gets interpolated into bytes as 'xxx' and the > corresponding situation where bytes gets interpolated into a string > as b'xxx'. Why struggle to keep bytes interpolation "pure" if string > interpolation isn't? > > Guido's proposal makes the language more symmetric, and thus more > consistent and less surprising. Exactly the hallmarks of Python's design > sense, IMO. (Big surprise, right? :) > > Of course, this point of view *is* based on the idea that when you are > doing interpolation using %/.format, you are in fact primarily concerned > with ASCII compatible byte streams. This is a Practicality sort of > argument. It is, after all, by far the most common use case when > doing interpolation[*]. > > If you wanted to do a purist version of this symmetry, you'd have bytes(x) > calling __bytes__ if it was defined and falling back to calling a > __brepr__ otherwise. > > But what would __brepr__ implement? The variety of format codes in > the struct module argues that there is no "one obvious" binary > repr for most types. (Those that have one would implement __bytes__). > And what would be the __brepr__ of an arbitrary 'object'? > > Faced with the impracticality of defining __brepr__ usefully in any "pure > bytes" form, it seems sensible to admit that the most useful __brepr__ > is the ascii() encoding of the __repr__. Which naturally produces 'xxx' > as the __brepr__ of a string. > > This does cause things to get a little un-pretty when you are operating > at the python prompt: > > >>> b'%s' % object > b'""' > > But then again that is most likely really not what you mean to do, so > it becomes a big red flag...just like b'xxx' is a small red flag when > you accidentally interpolate unencoded bytes into a string. > > --David > > PS: When I first read Guido's remark that the result of interpolating a > string should be 'xxx', I went Wah? 
I had to reason my way through to > it as above, but to him it was just the natural answer. Guido isn't > always right, but this kind of automatic language design consistency > is one reason he's the BDFL. > > [*] I still think that you mostly want to design your library so that > you are handling the text parts as text and the bytes parts as bytes, > and encoding/gluing them as appropriate at the IO boundary. But if Guido > says his real code would benefit by being able to interpolate ASCII into > bytes at certain points, I'll believe him. If you think corrupted data is easier or more pleasant to track down than encoding exceptions then I think you are strange. It makes porting really difficult while you are still trying to figure out where the bytes/str boundaries are. I am now deeply suspicious of all % formatting. From donald at stufft.io Mon Jan 13 19:58:06 2014 From: donald at stufft.io (Donald Stufft) Date: Mon, 13 Jan 2014 13:58:06 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> Message-ID: On Jan 13, 2014, at 1:45 PM, Daniel Holth wrote: > On Mon, Jan 13, 2014 at 12:42 PM, R. David Murray wrote: >> On Mon, 13 Jan 2014 12:41:18 +0100, Antoine Pitrou wrote: >>> On Sun, 12 Jan 2014 18:11:47 -0800 >>> Guido van Rossum wrote: >>>> On Sun, Jan 12, 2014 at 5:27 PM, Ethan Furman wrote: >>>>> On 01/12/2014 04:47 PM, Guido van Rossum wrote: >>>>>> %s seems the trickiest: I think with a bytes argument it should just >>>>>> insert those bytes (and the padding modifiers should work too), and >>>>>> for other types it should probably work like %a, so that it works as >>>>>> expected for numeric values, and with a string argument it will return >>>>>> the ascii()-variant of its repr(). Examples: >>>>>> >>>>>> b'%s' % 42 == b'42' >>>>>> b'%s' % 'x' == b"'x'" (i.e. 
the three-byte string containing an 'x' >>>>>> enclosed in single quotes) >>>>> >>>>> I'm not sure about the quotes. Would anyone ever actually want those in the >>>>> byte stream? >>>> >>>> Perhaps not, but it's a hint that you should probably think about an >>>> encoding. It's symmetric with how '%s' % b'x' returns "b'x'". Think of >>>> it as payback time. :-) >>> >>> What is the use case for embedding a quoted ASCII-encoded representation >>> in a byte stream? >> >> There is no use case in the sense you are asking, just like there is no >> real use case for '%s' % b'x' producing "b'x'". But the real use case >> is exactly the same: to let you know your code is screwed up without >> actually blowing up with a encoding Exception. >> >> For the record, I like Guido's logic and proposal. I don't understand >> Nick's objection, since I don't see the difference between the situation >> here where a string gets interpolated into bytes as 'xxx' and the >> corresponding situation where bytes gets interpolated into a string >> as b'xxx'. Why struggle to keep bytes interpolation "pure" if string >> interpolation isn't? >> >> Guido's proposal makes the language more symmetric, and thus more >> consistent and less surprising. Exactly the hallmarks of Python's design >> sense, IMO. (Big surprise, right? :) >> >> Of course, this point of view *is* based on the idea that when you are >> doing interpolation using %/.format, you are in fact primarily concerned >> with ASCII compatible byte streams. This is a Practicality sort of >> argument. It is, after all, by far the most common use case when >> doing interpolation[*]. >> >> If you wanted to do a purist version of this symmetry, you'd have bytes(x) >> calling __bytes__ if it was defined and falling back to calling a >> __brepr__ otherwise. >> >> But what would __brepr__ implement? The variety of format codes in >> the struct module argues that there is no "one obvious" binary >> repr for most types. 
(Those that have one would implement __bytes__). >> And what would be the __brepr__ of an arbitrary 'object'? >> >> Faced with the impracticality of defining __brepr__ usefully in any "pure >> bytes" form, it seems sensible to admit that the most useful __brepr__ >> is the ascii() encoding of the __repr__. Which naturally produces 'xxx' >> as the __brepr__ of a string. >> >> This does cause things to get a little un-pretty when you are operating >> at the python prompt: >> >>>>> b'%s' % object >> b'""' >> >> But then again that is most likely really not what you mean to do, so >> it becomes a big red flag...just like b'xxx' is a small red flag when >> you accidentally interpolate unencoded bytes into a string. >> >> --David >> >> PS: When I first read Guido's remark that the result of interpolating a >> string should be 'xxx', I went Wah? I had to reason my way through to >> it as above, but to him it was just the natural answer. Guido isn't >> always right, but this kind of automatic language design consistency >> is one reason he's the BDFL. >> >> [*] I still think that you mostly want to design your library so that >> you are handling the text parts as text and the bytes parts as bytes, >> and encoding/gluing them as appropriate at the IO boundary. But if Guido >> says his real code would benefit by being able to interpolate ASCII into >> bytes at certain points, I'll believe him. > > > > If you think corrupted data is easier or more pleasant to track down > than encoding exceptions then I think you are strange. It makes > porting really difficult while you are still trying to figure out > where the bytes/str boundaries are. I am now deeply suspicious of all > % formatting. 
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/donald%40stufft.io For the record, I think %d and %f and such where the RHS is guaranteed to have a certain set of "characters" that are guaranteed to be ascii compatible is fine and it's perfectly acceptable to have an implicit ASCII encode for them. The %s code I'm not sure of, I think trying to ascii encode that (just using encode()) is dangerous, and I think that using ascii() and adding quotes to it is never what anyone is going to want. Given that I think it'd be far better to blow up if you're using %s (or at least using %s on a str object and not as an alias for %b) than to implicitly encode that (given we don't know what the RHS can contain) or to throw junk data into the bytes that we know pretty much nobody ever is going to actually want. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From ethan at stoneleaf.us Mon Jan 13 20:05:13 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Jan 2014 11:05:13 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D39358.8090405@stoneleaf.us> <52D40C29.5040708@stoneleaf.us> Message-ID: <52D438E9.8010307@stoneleaf.us> On 01/13/2014 09:12 AM, Nick Coghlan wrote: > On 14 January 2014 01:54, Ethan Furman wrote: >> >> Forgive me for being dense, but I don't understand your objection. With >> Guido's proposal, '%s' % bytes_data, bytes_data is passed through unchanged. >> Did you mean something else by "binary data"?
> > I mean it will work, but it will mean you've introduced an implicit > assumption of ASCII compatibility into the structure your program Okay, I'm still trying to understand. Apparently we both mean the same thing by binary data / bytes, so the difference must be the %s, yes? And the concern is that because you have used %s as the format code, if somebody accidentally put, say, "stupid bug" on the RHS you would end up with b"'stupid bug'" instead of an exception, which you would get if you had used %b instead. Am I following? -- ~Ethan~ From guido at python.org Mon Jan 13 19:58:24 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Jan 2014 10:58:24 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <20140113174237.25050250030@webabinitio.net> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> Message-ID: Let me try rebooting the reboot. My interpretation of Nick's argument is that he is asking for a bytes formatting language that doesn't have an implicit ASCII assumption. To me this feels absurd. The formatting codes (%s, %c) themselves are expressed as ASCII characters. If you include anything else in the format string besides formatting codes (e.g. b'<%s>'), you are giving it as ASCII characters. I don't know what characters the EBCDIC codes 37, 99 or 115 encode (these are the ASCII codes for '%', 'c', 's') but it certainly wouldn't be safe to use % when the LHS is EBCDIC-encoded. If I had some byte strings in an unknown encoding (but the same encoding for all) that I needed to concatenate I would never think of '%s%s' % (x, y) -- I would write x+y. (Even in Python 2.) If I see some code using *any* formatting operation (regardless of whether it's %d, %r, %s or %c) I am going to assume that there is some ASCII-ness, and if there isn't, the code's author has obscured their goal to me.
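A minimal sketch of the two points above, assuming Python 3. The choice of cp037 as the EBCDIC code page and the helper name concat_same_encoding are illustrative assumptions only, not something proposed in this thread. It shows that the format codes are ASCII byte values that mean something else in an EBCDIC code page, and that plain concatenation of same-encoded byte strings needs no ASCII assumption at all:

```python
# The format codes '%', 'c' and 's' as ASCII byte values.
FORMAT_CODES = b'%cs'
ascii_values = list(FORMAT_CODES)

# The same three characters encoded in an EBCDIC code page.
# cp037 is an arbitrary example; other EBCDIC variants exist.
ebcdic_values = list('%cs'.encode('cp037'))


def concat_same_encoding(x: bytes, y: bytes) -> bytes:
    """Hypothetical helper: join two byte strings that share one
    (possibly unknown) encoding. No format codes are involved, so
    nothing here assumes ASCII compatibility."""
    return x + y


# UTF-16 is not ASCII-compatible, yet concatenation is still safe.
x = 'caf\u00e9 '.encode('utf-16-le')
y = 'cr\u00e8me'.encode('utf-16-le')
joined = concat_same_encoding(x, y)

print(ascii_values, ebcdic_values)
```

Decoding joined with 'utf-16-le' gives back the combined text, whereas a % template would only be meaningful if the byte values 37, 99 and 115 still stood for '%', 'c' and 's' in the data's encoding.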
I hear the objections against b'%s' % 'x' returning b"'x'" loud and clear, and if the noise about that sub-issue is preventing folks from seeing the absurdity in PEP 460, we can talk about a compromise, e.g. use %b which would require its argument to be bytes. Those bytes should still probably be ASCII-ish, but there's no way to test that. That's fine with me and should be fine to Nick as well -- PEP 460 doesn't check that your encodings match (how could it? :-), nor does plain string concatenation using +. In my head I make the following classification of situations where you work with bytes and/or text. (A) Pure binary formats (e.g. most IP-level packet formats, media files, .pyc files, tar/zip files, compressed data, etc.). These are handled using the struct module (e.g. tar/zip) and/or custom C extensions (e.g. gzip). (B) Encoded text. Here you should just decode everything into str objects and parse your text at that level. If you really want to manipulate the data as bytes (e.g. because you have a lot of data to process and very light processing) you may be able to do it, but unless it's a verbatim copy, you are probably going to make assumptions about the encoding. You are also probably going to mess up for some encodings (e.g. leave BOM turds in the middle of a file). (C) Loosely text-based protocols and formats that have an ASCII assumption in the spec. Most classic Internet protocols (FTP, SMTP, HTTP, IRC, etc.) fall in this category; I expect there are also plenty of file formats using similar conventions (e.g. mailbox files). These protocols and formats often require text-ish manipulations, e.g. for case-insensitive headers or commands, or to split things at whitespace. This is where I find uses for the current ASCII-assuming bytes operations (e.g. b.lower(), b.split(), but also int(b)) and where the lack of number formatting (especially %d and %x) is most painful. 
I see no benefit in forcing the programmer writing such protocol-handling code to use more cumbersome ways of converting between numbers and bytes, nor in forcing them to insert an encoding/decoding layer -- these protocols often switch between text and binary data at line boundaries, so the most basic part of parsing (splitting the input into lines) must still happen in the realm of bytes. IMO PEP 460 and the mindset that goes with it don't apply to any of these three cases. Also, IMO requiring a new type to handle (C) seems to add too much complexity, and adds to porting efforts. I may have felt differently in the past, but ATM I feel that if newer versions of Python 3 make porting of Python 2 code easier, through minor compromises, that's a *good* thing. (Example: adding u"..." literals to 3.3.) -- --Guido van Rossum (python.org/~guido) From raf at durin42.com Mon Jan 13 19:51:32 2014 From: raf at durin42.com (Augie Fackler) Date: Mon, 13 Jan 2014 18:51:32 +0000 (UTC) Subject: [Python-Dev] PEP460 thoughts from a Mercurial dev References: <20140113184301.2fb4bf96@fsol> Message-ID: Antoine Pitrou pitrou.net> writes: > > On Mon, 13 Jan 2014 09:34:39 -0800 > Guido van Rossum python.org> wrote: > > On Mon, Jan 13, 2014 at 8:51 AM, Nick Coghlan gmail.com> wrote: > > > On 13 January 2014 23:57, Augie Fackler durin42.com> wrote: > > >> 1) What do we need in terms of functionality > > >> > > >> Best guess, %s, %d, and %f. I've not done a full audit of the code, but some > > >> limited looking over the grep hits for % in .py files suggests I'm right, > > >> and we could even do without %f (we only use that for 'hg --time' output, > > >> which we could do in unicode). > > > > > > I think PEP 460 will have you covered there, or hopefully asciistr on 3.3+ > > > > I'm confused on how PEP 460 would help -- Augie mentioned %d, which it excludes.
> > Serhiy did a survey of formatting codes in the Mercurial sources: > https://mail.python.org/pipermail/python-dev/2014-January/130969.html Note that a lot of those are in debug code (eg the only %f I've spotted is), or are time format specifiers (which can be unicode just fine). A few others (eg %ln) are for our internal revset format-string language, so this overstates what we'd need in bytes by a little. %f would probably be good too, as I look a little more. (Please don't remove me from the CC list - I could only respond via gmane because I'm not subscribed to python-dev.) > > Regards > > Antoine. > > From donald at stufft.io Mon Jan 13 20:13:13 2014 From: donald at stufft.io (Donald Stufft) Date: Mon, 13 Jan 2014 14:13:13 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> Message-ID: <4B5FC5C3-26C9-42E4-BE19-BF2FFDCB411F@stufft.io> On Jan 13, 2014, at 1:58 PM, Guido van Rossum wrote: > I hear the objections against b'%s' % 'x' returning b"'x'" loud and > clear, and if the noise about that sub-issue is preventing folks from > seeing the absurdity in PEP 460, we can talk about a compromise, e.g. > use %b which would require its argument to be bytes. Those bytes > should still probably be ASCII-ish, but there's no way to test that. > That's fine with me and should be fine to Nick as well -- PEP 460 > doesn't check that your encodings match (how could it? :-), nor does > plain string concatenation using +. I think disallowing %s is the right thing to do, but I definitely think numbers and %b should be allowed. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From solipsis at pitrou.net Mon Jan 13 20:17:20 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 13 Jan 2014 20:17:20 +0100 Subject: [Python-Dev] PEP460 thoughts from a Mercurial dev References: <20140113184301.2fb4bf96@fsol> Message-ID: <20140113201720.4e4fa175@fsol> On Mon, 13 Jan 2014 18:51:32 +0000 (UTC) Augie Fackler wrote: > > (Please don't remove me from the CC list - I could only respond via gmane > because I'm not subscribed to python-dev.) Responding via gmane is what I do, too :-) My NNTP client doesn't allow SMTP / NNTP mixed postings, so I'm forced to remove you from CC. Regards Antoine. From tjreedy at udel.edu Mon Jan 13 20:51:30 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 13 Jan 2014 14:51:30 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113170949.242b9e00@fsol> <52D415F5.6020607@stoneleaf.us> <20140113183114.0fea07a8@fsol> Message-ID: On 1/13/2014 1:40 PM, Brett Cannon wrote: > > So bytes formatting really needn't (and shouldn't, IMO) mirror str > > formatting. This was my presumption in writing byteformat(). > I think one of the things about Guido's proposal that bugs me is that it > breaks the mental model of the .format() method from str in terms of how > the mini-language works. For str.format() you have the conversion and > the format spec (e.g. "{!r}" and "{:d}", respectively). You apply the > conversion by calling the appropriate built-in, e.g. 'r' calls repr(). > The format spec semantically gets passed with the object to format() > which calls the object's __format__() method: ``format(number, 'd')``. > > Now Guido's suggestion has two parts that affect the mini-language for > .format(). 
One is that for bytes.format() the default conversion is > bytes() instead of str(), which is fine (probably want to add 'b' as a > conversion value as well to be consistent). But the other bit is that > the format spec goes from semantically meaning ``format(thing, > format_spec)`` to ``format(thing, format_spec).encode('ascii', > 'strict')`` for at least numbers. That implicitness bugs me as I have > always thought of format specs just leading to a call to format(). I > think I can live with it, though, as long as it is **consistently** > applied across the board for bytes.format(); every use of a format spec > leads to calling ``format(thing, format_spec).encode('ascii', > 'strict')`` no matter what type 'thing' would be and it is clearly > documented that this is done to ease porting and handle the common case > then I can live with it. This is how my byteformat function works, except that when no format_spec is given, bytes and bytearray objects are left unchanged rather than being decoded and encoded again. > This even gives people in-place ASCII encoding for strings by always > using '{:s}' with text which they can do when they port their code to > run under both Python 2 and 3. So you should be able to do > ``b'Content-Type: {:s}'.format('image/jpeg')`` and have it give ASCII. > If you want more explicit encoding to latin-1 then you need to do it > explicitly and not rely on the mini-language to do tricks for you. > > IOW I want to treat the format mini-language as a language and thus not > have any special-casing or massive shifts in meaning between > str.format() and bytes.format() so my mental model doesn't have to > contort based on whether it's str or bytes. My preference is not have > any, but if Guido is going say PBP here then I want absolute consistency > across the board in how bytes.format() tweaks things.
> > As for %s for the % operator calling ascii(), I think that will be a > porting nightmare of finding out why your bytes suddenly stopped being > formatted properly and then having to crawl through all of your code for > that one use of %s which is getting bytes in. By raising a TypeError you > will very easily detect where your screw-up occurred thanks to the > traceback; do so otherwise feels too much like implicit type conversion > and ask any JavaScript developer how that can be a bad thing. I personally would not add 'bytes % whatever'. -- Terry Jan Reedy From barry at python.org Mon Jan 13 20:57:57 2014 From: barry at python.org (Barry Warsaw) Date: Mon, 13 Jan 2014 14:57:57 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <4B5FC5C3-26C9-42E4-BE19-BF2FFDCB411F@stufft.io> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> <4B5FC5C3-26C9-42E4-BE19-BF2FFDCB411F@stufft.io> Message-ID: <20140113145757.671103a7@anarchist.wooz.org> On Jan 13, 2014, at 02:13 PM, Donald Stufft wrote: > >On Jan 13, 2014, at 1:58 PM, Guido van Rossum wrote: > >> I hear the objections against b'%s' % 'x' returning b"'x'" loud and >> clear, and if the noise about that sub-issue is preventing folks from >> seeing the absurdity in PEP 460, we can talk about a compromise, e.g. >> use %b which would require its argument to be bytes. Those bytes >> should still probably be ASCII-ish, but there's no way to test that. >> That's fine with me and should be fine to Nick as well -- PEP 460 >> doesn't check that your encodings match (how could it? :-), nor does >> plain string concatenation using +. > >I think disallowing %s is the right thing to do, but I definitely think >numbers and %b should be allowed. I guess I agree. The behavior of b'%s' % 'x' returning b"'x'" is almost always useless at best. (I would have thought maybe %a for ascii() but don't care that strongly.) 
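As a rough sketch of what the %a idea amounts to (call ascii() on the argument, then ASCII-encode the result), assuming Python 3. The helper name interpolate_ascii_repr and its single-marker substitution are made up for illustration and are not a proposed API:

```python
def interpolate_ascii_repr(template: bytes, value: object) -> bytes:
    """Hypothetical helper: replace the first b'%a' marker in template
    with ascii(value) encoded as ASCII. ascii() escapes every non-ASCII
    character, so the strict encode step cannot fail."""
    return template.replace(b'%a', ascii(value).encode('ascii'), 1)


# A str argument keeps its quotes, which is the b"'x'" result under discussion.
quoted = interpolate_ascii_repr(b'<%a>', 'x')

# A number comes out as its plain digits.
number = interpolate_ascii_repr(b'<%a>', 42)

# Non-ASCII text is escaped rather than encoded.
escaped = interpolate_ascii_repr(b'<%a>', '\u00e9')

print(quoted, number, escaped)
```

The quotes around a str argument are what make the result easy to spot in output, and also what make it rarely the bytes anyone actually wanted.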
-Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From brett at python.org Mon Jan 13 21:02:02 2014 From: brett at python.org (Brett Cannon) Date: Mon, 13 Jan 2014 15:02:02 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113170949.242b9e00@fsol> <52D415F5.6020607@stoneleaf.us> <20140113183114.0fea07a8@fsol> Message-ID: On Mon, Jan 13, 2014 at 2:51 PM, Terry Reedy wrote: > On 1/13/2014 1:40 PM, Brett Cannon wrote: > > > So bytes formatting really needn't (and shouldn't, IMO) mirror str >> > formatting. >> > > This was my presumption in writing byteformat(). > > > I think one of the things about Guido's proposal that bugs me is that it >> breaks the mental model of the .format() method from str in terms of how >> the mini-language works. For str.format() you have the conversion and >> the format spec (e.g. "{!r}" and "{:d}", respectively). You apply the >> conversion by calling the appropriate built-in, e.g. 'r' calls repr(). >> The format spec semantically gets passed with the object to format() >> which calls the object's __format__() method: ``format(number, 'd')``. >> >> Now Guido's suggestion has two parts that affect the mini-language for >> .format(). One is that for bytes.format() the default conversion is >> bytes() instead of str(), which is fine (probably want to add 'b' as a >> conversion value as well to be consistent). But the other bit is that >> the format spec goes from semantically meaning ``format(thing, >> format_spec)`` to ``format(thing, format_spec).encode('ascii', >> 'strict')`` for at least numbers. That implicitness bugs me as I have >> always thought of format specs just leading to a call to format(). 
I >> think I can live with it, though, as long as it is **consistently** >> applied across the board for bytes.format(); every use of a format spec >> leads to calling ``format(thing, format_spec).encode('ascii', >> 'strict')`` no matter what type 'thing' would be and it is clearly >> documented that this is done to ease porting and handle the common case >> then I can live with it. >> > > This is how my byteformat function works, except that when no format_spec > is given, byte and bytearrary objects are left unchanged rather than being > decoded and encoded again. Right, which is what the default conversion covers. And as your code shows this can be made available today without having to wait for Python 3.5 and so can go up on PyPI and be used **today**. > > > This even gives people in-place ASCII encoding for strings by always >> using '{:s}' with text which they can do when they port their code to >> run under both Python 2 and 3. So you should be able to do >> ``b'Content-Type: {:s}'.format('image/jpeg')`` and have it give ASCII. >> If you want more explicit encoding to latin-1 then you need to do it >> explicitly and not rely on the mini-language to do tricks for you. >> >> IOW I want to treat the format mini-language as a language and thus not >> have any special-casing or massive shifts in meaning between >> str.format() and bytes.format() so my mental model doesn't have to >> contort based on whether it's str or bytes. My preference is not have >> any, but if Guido is going say PBP here then I want absolute consistency >> across the board in how bytes.format() tweaks things. >> >> As for %s for the % operator calling ascii(), I think that will be a >> porting nightmare of finding out why your bytes suddenly stopped being >> formatted properly and then having to crawl through all of your code for >> that one use of %s which is getting bytes in. 
By raising a TypeError you >> will very easily detect where your screw-up occurred thanks to the >> traceback; do so otherwise feels too much like implicit type conversion >> and ask any JavaScript developer how that can be a bad thing. >> > > I personally would not add 'bytes % whatever'. Personally, neither would I; just focus on bytes.format() and let the % operator on strings slowly go away. -Brett > > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dholth at gmail.com Mon Jan 13 21:07:47 2014 From: dholth at gmail.com (Daniel Holth) Date: Mon, 13 Jan 2014 15:07:47 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113170949.242b9e00@fsol> <52D415F5.6020607@stoneleaf.us> <20140113183114.0fea07a8@fsol> Message-ID: I see it now. b"foo%sbar" % b'baz' should also expand to b"foob'baz'bar". Instead of "%b" could "%j" mean "I should have used + or join() here but was too lazy" and work on str too? On Mon, Jan 13, 2014 at 2:51 PM, Terry Reedy wrote: > On 1/13/2014 1:40 PM, Brett Cannon wrote: > >> > So bytes formatting really needn't (and shouldn't, IMO) mirror str >> > formatting. > > > This was my presumption in writing byteformat(). > > >> I think one of the things about Guido's proposal that bugs me is that it >> breaks the mental model of the .format() method from str in terms of how >> the mini-language works. For str.format() you have the conversion and >> the format spec (e.g. "{!r}" and "{:d}", respectively). You apply the >> conversion by calling the appropriate built-in, e.g. 'r' calls repr().
>> The format spec semantically gets passed with the object to format() >> which calls the object's __format__() method: ``format(number, 'd')``. >> >> Now Guido's suggestion has two parts that affect the mini-language for >> .format(). One is that for bytes.format() the default conversion is >> bytes() instead of str(), which is fine (probably want to add 'b' as a >> conversion value as well to be consistent). But the other bit is that >> the format spec goes from semantically meaning ``format(thing, >> format_spec)`` to ``format(thing, format_spec).encode('ascii', >> 'strict')`` for at least numbers. That implicitness bugs me as I have >> always thought of format specs just leading to a call to format(). I >> think I can live with it, though, as long as it is **consistently** >> applied across the board for bytes.format(); every use of a format spec >> leads to calling ``format(thing, format_spec).encode('ascii', >> 'strict')`` no matter what type 'thing' would be and it is clearly >> documented that this is done to ease porting and handle the common case >> then I can live with it. > > > This is how my byteformat function works, except that when no format_spec is > given, byte and bytearrary objects are left unchanged rather than being > decoded and encoded again. > > >> This even gives people in-place ASCII encoding for strings by always >> using '{:s}' with text which they can do when they port their code to >> run under both Python 2 and 3. So you should be able to do >> ``b'Content-Type: {:s}'.format('image/jpeg')`` and have it give ASCII. >> If you want more explicit encoding to latin-1 then you need to do it >> explicitly and not rely on the mini-language to do tricks for you. >> >> IOW I want to treat the format mini-language as a language and thus not >> have any special-casing or massive shifts in meaning between >> str.format() and bytes.format() so my mental model doesn't have to >> contort based on whether it's str or bytes. 
My preference is not have >> any, but if Guido is going say PBP here then I want absolute consistency >> across the board in how bytes.format() tweaks things. >> >> As for %s for the % operator calling ascii(), I think that will be a >> porting nightmare of finding out why your bytes suddenly stopped being >> formatted properly and then having to crawl through all of your code for >> that one use of %s which is getting bytes in. By raising a TypeError you >> will very easily detect where your screw-up occurred thanks to the >> traceback; do so otherwise feels too much like implicit type conversion >> and ask any JavaScript developer how that can be a bad thing. > > > I personally would not add 'bytes % whatever'. > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/dholth%40gmail.com From guido at python.org Mon Jan 13 21:09:23 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Jan 2014 12:09:23 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <20140113145757.671103a7@anarchist.wooz.org> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> <4B5FC5C3-26C9-42E4-BE19-BF2FFDCB411F@stufft.io> <20140113145757.671103a7@anarchist.wooz.org> Message-ID: On Mon, Jan 13, 2014 at 11:57 AM, Barry Warsaw wrote: > On Jan 13, 2014, at 02:13 PM, Donald Stufft wrote: >>On Jan 13, 2014, at 1:58 PM, Guido van Rossum wrote: >>> I hear the objections against b'%s' % 'x' returning b"'x'" loud and >>> clear, and if the noise about that sub-issue is preventing folks from >>> seeing the absurdity in PEP 460, we can talk about a compromise, e.g. >>> use %b which would require its argument to be bytes. 
Those bytes >>> should still probably be ASCII-ish, but there's no way to test that. >>> That's fine with me and should be fine to Nick as well -- PEP 460 >>> doesn't check that your encodings match (how could it? :-), nor does >>> plain string concatenation using +. >>I think disallowing %s is the right thing to do, but I definitely think >>numbers and %b should be allowed. > I guess I agree. The behavior of b'%s' % 'x' returning b"'x'" is almost > always useless at best. (I would have thought maybe %a for ascii() but don't > care that strongly.) Yeah, the %s behavior with a string argument was a messy attempt at compromise. I was hoping to mimick a common use of %s in Python 2, where it can be used with either an 8-bit string or a number as argument, acting like %b in the former case and like %d in the latter case. Not having %s at all in Python 3 means that porting requires more thinking (== more opportunity for mistakes when you're converting in bulk) and there's no easy way to write code that works in Python 2 and 3. If we have %b for strictly interpolating bytes, I'm fine with adding %a for calling ascii() on the argument and then interpolating the result after ASCII-encoding it. If somehow (unlikely though it seems) we end up keeping %s (e.g. strictly to ease porting), we could also keep %r as an alias for %a. -- --Guido van Rossum (python.org/~guido) From yselivanov.ml at gmail.com Mon Jan 13 21:11:37 2014 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 13 Jan 2014 15:11:37 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113170949.242b9e00@fsol> <52D415F5.6020607@stoneleaf.us> <20140113183114.0fea07a8@fsol> Message-ID: On January 13, 2014 at 3:08:43 PM, Daniel Holth (dholth at gmail.com) wrote: > > I see it now. 
b"foo%sbar" % b'baz' should also expand to b"foob'foo'bar" > > Instead of "%b" could "%j" mean "I should have used + or join() > here > but was too lazy" and work on str too? Isn't this just error prone? Since it's a new format character, many, probably, would write %s by mistake. And, besides, there was no %j in python2. - Yury From guido at python.org Mon Jan 13 21:13:16 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Jan 2014 12:13:16 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113170949.242b9e00@fsol> <52D415F5.6020607@stoneleaf.us> <20140113183114.0fea07a8@fsol> Message-ID: On Mon, Jan 13, 2014 at 12:02 PM, Brett Cannon wrote: > On Mon, Jan 13, 2014 at 2:51 PM, Terry Reedy wrote: >> I personally would not add 'bytes % whatever'. > > Personally, neither would I; just focus on bytes.format() and let % operator > on strings slowly go away. Well, % has some very strong arguments in its favor still -- for example, the sheer amount of code that currently uses it, the fact that it's as close as we get to a cross-language standard, and the fact that nobody wants to tackle its use in the logging module (since logger objects are often shared between packages that don't know about each other). Anyway, the % or .format() issue seems completely orthogonal to the issues that get people riled up (which are mostly about whether using either implies some kind of ASCII compatibility).
-- --Guido van Rossum (python.org/~guido) From dholth at gmail.com Mon Jan 13 21:15:34 2014 From: dholth at gmail.com (Daniel Holth) Date: Mon, 13 Jan 2014 15:15:34 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113170949.242b9e00@fsol> <52D415F5.6020607@stoneleaf.us> <20140113183114.0fea07a8@fsol> Message-ID: On Mon, Jan 13, 2014 at 3:11 PM, Yury Selivanov wrote: > On January 13, 2014 at 3:08:43 PM, Daniel Holth (dholth at gmail.com) wrote: >> >> I see it now. b"foo%sbar" % b'baz' should also expand to b"foob'foo'bar" >> >> Instead of "%b" could "%j" mean "I should have used + or join() >> here >> but was too lazy" and work on str too? > > Isn't this just error prone? Since it's a new format character, many, > probably, would write %s by mistake. And, besides, there was no %j > in python2. Merely a flesh wound. From eric at trueblade.com Mon Jan 13 21:24:12 2014 From: eric at trueblade.com (Eric V. Smith) Date: Mon, 13 Jan 2014 15:24:12 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> <4B5FC5C3-26C9-42E4-BE19-BF2FFDCB411F@stufft.io> <20140113145757.671103a7@anarchist.wooz.org> Message-ID: <52D44B6C.6040406@trueblade.com> On 01/13/2014 03:09 PM, Guido van Rossum wrote: > If we have %b for strictly interpolating bytes, I'm fine with adding > %a for calling ascii() on the argument and then interpolating the > result after ASCII-encoding it. > > If somehow (unlikely though it seems) we end up keeping %s (e.g. > strictly to ease porting), we could also keep %r as an alias for %a. Wouldn't %s as an alias for %b simplify porting from Python 2?
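To make the %b / %a distinction discussed in the messages above concrete, here is a minimal sketch of the two conversions as described in the thread. It is not an implementation of any proposal: the helper names interp_b and interp_a are hypothetical, and the choice of which types count as "bytes" for %b is an assumption made for illustration.

```python
# Hypothetical helpers (names invented for this sketch) that mimic the two
# conversions discussed in the thread. They do not touch the % operator.

def interp_b(value):
    """Sketch of %b: accept bytes-like objects only and insert them unchanged."""
    # Assumption: "bytes-like" here means bytes, bytearray or memoryview.
    if not isinstance(value, (bytes, bytearray, memoryview)):
        raise TypeError(
            "%b requires a bytes-like object, got " + type(value).__name__
        )
    return bytes(value)


def interp_a(value):
    """Sketch of %a: call ascii() on the argument, then ASCII-encode the result."""
    return ascii(value).encode("ascii")


header = b"Content-Type: " + interp_b(b"image/jpeg")
debug = b"name=" + interp_a("x")   # the quotes from ascii() end up in the bytes

print(header)
print(debug)
```

With this split, passing a str where bytes are expected fails loudly in interp_b, while interp_a always succeeds but makes the conversion visible in the output (quotes, escapes).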
From techtonik at gmail.com Mon Jan 13 20:56:20 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 13 Jan 2014 22:56:20 +0300 Subject: [Python-Dev] cpython (3.3): Update Sphinx toolchain. In-Reply-To: References: <3f1r8102P7z7LjX@mail.python.org> <52D1A58C.4020805@udel.edu> Message-ID: That's cool, but historical heritage makes the make argument somewhat confusing for new users. The immediate question I can sense is "What is the difference between build and make?" To make (this word again) the critics constructive, let me pass some ideas about ideal user experience as I see it. --[installation]-- 1 I install Sphinx. Two scenarios. 1.1 I am not a Python user - use installer 1.1.1 Installer should obviously install Python 1.1.2 And install sphinx command 1.1.3 And add sphinx to PATH 1.2 I am a Python user - use pip 1.2.1 pip should not alter my PATH (for virtualenv) --[usage]-- 2 Two scenarios 2.1 sphinx as a system command from PATH 2.2 "python -m sphinx" for current virtualenv / test config --[user experience]-- 3 These two invocations are equal > sphinx > python -m sphinx 4. They give the following ouput > Sphinx 1.2 Documentation Generator Commands: build build documentation init start new project [also quickstart] make helper for common build commands Use "sphinx -h command" or "sphinx command --help" for details I am not using sphinx ATM otherwise I'd spent more time to design ideal command set to get rid of build/make duality, but it should work ok. Actually "sphinx" is a new command, so you may rethink the syntax for "build" arguments to contain "html" instead of dir names, and move dir names into parameters, because it is how it is most often used. -- anatoly t. On Sun, Jan 12, 2014 at 4:53 PM, Georg Brandl wrote: > That's also planned, see https://bitbucket.org/birkenfeld/sphinx-new-make-mode/. 
> > Georg > > Am 12.01.2014 09:49, schrieb anatoly techtonik: >> And cross-platform automation tools in Python instead of make >> https://bitbucket.org/birkenfeld/sphinx/issue/456/makepy-command-script >> -- >> anatoly t. >> >> >> On Sun, Jan 12, 2014 at 11:12 AM, INADA Naoki wrote: >>> What about using venv and pip instead of svn? > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/techtonik%40gmail.com From ethan at stoneleaf.us Mon Jan 13 21:07:21 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Jan 2014 12:07:21 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113170949.242b9e00@fsol> <52D415F5.6020607@stoneleaf.us> <20140113183114.0fea07a8@fsol> Message-ID: <52D44779.8070301@stoneleaf.us> On 01/13/2014 12:02 PM, Brett Cannon wrote: > > Personally, neither would I; just focus on bytes.format() and let % operator on strings slowly go away. Hey, now, some of us like %! ;) -- ~Ethan~ From v+python at g.nevcal.com Mon Jan 13 21:44:06 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 13 Jan 2014 12:44:06 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <87fvosdjt5.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D2E9AA.4010308@stoneleaf.us> <52D31049.1090708@g.nevcal.com> <87ob3geoa9.fsf@uwakimon.sk.tsukuba.ac.jp> <52D38BB6.1000100@g.nevcal.com> <87fvosdjt5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D45016.3060708@g.nevcal.com> On 1/13/2014 6:43 AM, Stephen J. Turnbull wrote: > Glenn Linderman writes: > > > On 1/12/2014 4:08 PM, Stephen J. 
Turnbull wrote: > >> Glenn Linderman writes: > >>> the proposals to embed binary in Unicode by abusing Latin-1 > >>> encoding. > > >> Those aren't "proposals", they are currently feasible > >> techniques in Python 3 for *some* use cases. The question is why > >> infecting Python 3 with the byte/character confoundance virus is > >> preferable to such techniques, especially if their (serious!) > >> deficiencies are removed by creating a new type such as > >> asciistr. > > > "smuggled binary" (great term borrowed from a different > > subthread) muddies the waters of what you are dealing with. > > Not really. The "mud" is one or more of the serious deficiencies. It > can be removed, I believe (and Nick apparently does, too). "asciistr" > is one way to try that. Yes really. Use of smuggled binary means the str containing it can no longer be treated completely as a str. That is "muddier" than having a str that is only a str. > > When the mixture of text and binary is done as encoded text in > > binary, then it is obvious that only limited text processing can be > > performed, > > Hardly. After all, that's how all text processing was done for > decades. Still is, in some programs, especially C programs. I disagree, and so do you... text processing must be limited to the text subsets of the text that includes smuggled binary... that is limited... you can't just apply text searches, scans, and transformations over the complete str, when it contains smuggled binary. You know that, but must have not considered it a limitation, because you know you can do any text processing on the text parts. But it is a limitation to have to keep track of it, and apply the text processing only to the parts that are text. 
Yes, it has been done that way, and the limitations of doing it that way led to the plethora of encodings each of which was intended to be sufficient for some problem domain, but most of which were only sufficient for a smaller problem domain than intended, especially as communications became more global in nature. > > And there are no extra, confusing Latin-1 encode/decode operations > > required. > > The "extra" encode/decode operations are mostly (perhaps all) due to > examples that started from bytes and end with bytes. Of course if you > assume that API and propose to do the operations using Unicode, you'll > get "extra" decode/encode operations. No, the "extra" encode/decode are from the requirement that smuggled binary use latin-1, and other binary flavors are not always latin-1. > > > From a higher-level perspective, I think it would be great to have > > a module, perhaps called "boundary" (let's call it that for now), > > that allows some definition syntax (augmented BNF? augmented ABNF?) > > to explain the format of a binary blob. > > We have struct, for one. I'm not sure why you want more than that. I > suppose you could go all the way to ASN.1. struct is insufficient to capture a whole file format, with optional parts, although it suffices for fragments. From g.brandl at gmx.net Mon Jan 13 21:54:05 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 13 Jan 2014 21:54:05 +0100 Subject: [Python-Dev] cpython (3.3): Update Sphinx toolchain. In-Reply-To: References: <3f1r8102P7z7LjX@mail.python.org> <52D1A58C.4020805@udel.edu> Message-ID: [If you want to continue this discussion, please move it from python-dev to sphinx-users. It is now completely off-topic for the former.] Anyway, just as a short explanation, you missed the point of the change: -M is not meant to be used directly but still via a (very short) Makefile. This isn't a change meant to be visible to users.
Georg Am 13.01.2014 20:56, schrieb anatoly techtonik: > That's cool, but historical heritage makes the make argument > somewhat confusing for new users. The immediate question I > can sense is "What is the difference between build and make?" > > To make (this word again) the critics constructive, let me pass > some ideas about ideal user experience as I see it. > > --[installation]-- > 1 I install Sphinx. Two scenarios. > 1.1 I am not a Python user - use installer > 1.1.1 Installer should obviously install Python > 1.1.2 And install sphinx command > 1.1.3 And add sphinx to PATH > 1.2 I am a Python user - use pip > 1.2.1 pip should not alter my PATH (for virtualenv) > > --[usage]-- > 2 Two scenarios > 2.1 sphinx as a system command from PATH > 2.2 "python -m sphinx" for current virtualenv / test config > > --[user experience]-- > 3 These two invocations are equal >> sphinx >> python -m sphinx > > 4. They give the following ouput >> > Sphinx 1.2 Documentation Generator > > Commands: > > build build documentation > init start new project [also quickstart] > make helper for common build commands > > Use "sphinx -h command" or "sphinx command --help" for details > > > I am not using sphinx ATM otherwise I'd spent more time to > design ideal command set to get rid of build/make duality, but > it should work ok. > > Actually "sphinx" is a new command, so you may rethink the > syntax for "build" arguments to contain "html" instead of dir names, > and move dir names into parameters, because it is how it is most > often used. > > -- > anatoly t. > > > On Sun, Jan 12, 2014 at 4:53 PM, Georg Brandl wrote: >> That's also planned, see https://bitbucket.org/birkenfeld/sphinx-new-make-mode/. >> >> Georg >> >> Am 12.01.2014 09:49, schrieb anatoly techtonik: >>> And cross-platform automation tools in Python instead of make >>> https://bitbucket.org/birkenfeld/sphinx/issue/456/makepy-command-script >>> -- >>> anatoly t. 
>>> >>> >>> On Sun, Jan 12, 2014 at 11:12 AM, INADA Naoki wrote: >>>> What about using venv and pip instead of svn? >> >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: https://mail.python.org/mailman/options/python-dev/techtonik%40gmail.com > From v+python at g.nevcal.com Mon Jan 13 21:54:31 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 13 Jan 2014 12:54:31 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D3B68D.90601@hotpy.org> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D34C71.1020200@stoneleaf.us> <52D34E65.2050007@stoneleaf.us> <52D3A7D5.9080106@hotpy.org> <52D3AF8C.9030203@g.nevcal.com> <52D3B68D.90601@hotpy.org> Message-ID: <52D45287.6040708@g.nevcal.com> On 1/13/2014 1:49 AM, Mark Shannon wrote: >>> So why not replace '%s' with '%a' for the ascii case and >>> with '%b' for directly inserting bytes. >> >> Because %a and %b don't exist in Python 2.7? > > I thought this was about 3.5, not 2.7 ;) > '%s' can't work in 3.5, as we must differentiate between > strings which meed to be encoded and bytes which don't. It's about migrating code to reach a point where it can work on both 2.7 and 3.5. -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Mon Jan 13 22:00:53 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 13 Jan 2014 23:00:53 +0200 Subject: [Python-Dev] PEP460 thoughts from a Mercurial dev In-Reply-To: References: Message-ID: 13.01.14 15:57, Augie Fackler ???????(??): > 1) What do we need in terms of functionality > > Best guess, %s, %d, and %f. I've not done a full audit of the code, but > some limited looking over the grep hits for % in .py files suggests I'm > right, and we could even do without %f (we only use that for 'hg --time' > output, which we could do in unicode). 
Most popular formatting codes in Mercurial sources (excluding %Y, %M, etc):

  2519  %s
   493  %d
   102  %r
    33  %i
    23  %ld
    19  %ln
    12  %.3f
    10  %.1f
     9  %(val)r
     9  %p
     9  %.2f

%s covers almost 80% of use cases and %d covers almost 20%. %r covers about 3%, %f covers less than 1%. So I think anything except %s and %d can be ignored. From p.f.moore at gmail.com Mon Jan 13 22:01:23 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 13 Jan 2014 21:01:23 +0000 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> Message-ID: On 13 January 2014 18:58, Guido van Rossum wrote: > I hear the objections against b'%s' % 'x' returning b"'x'" loud and > clear, and if the noise about that sub-issue is preventing folks from > seeing the absurdity in PEP 460, we can talk about a compromise, e.g. > use %b which would require its argument to be bytes. Those bytes > should still probably be ASCII-ish, but there's no way to test that. > That's fine with me and should be fine to Nick as well -- PEP 460 > doesn't check that your encodings match (how could it? :-), nor does > plain string concatenation using +. For the record, Guido's reboot posting and rationale has convinced me, and I am essentially in favour of his proposal. Nick's remaining objection seems to me to have some validity if the format string is a user-supplied variable, but this type of usage is vanishingly small in my experience, and shouldn't dictate the whole design. I don't like b'%s' % 'x' behaviour, and would prefer one of the alternatives. I'm not entirely clear about the details of the alternative proposals, so I won't try to pick one. I think this should be for 3.5, and should not involve an accelerated release of 3.5 - we should get it into the 3.5 code early and let people thrash out the details during the 3.5 release cycle. Paul.
PS For all the heated arguments and occasional frayed tempers, this has been an impressively civil debate. I think that's one of the best things about python-dev, that discussions like these never degenerate into flamewars. Kudos to all concerned! From v+python at g.nevcal.com Mon Jan 13 22:01:48 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 13 Jan 2014 13:01:48 -0800 Subject: [Python-Dev] PEP 460 reboot and a bitter fight In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D37D5C.6070604@g.nevcal.com> <3B1B5227-80B9-4E06-B639-F3111E96FF19@stufft.io> <52D39CD1.608@stoneleaf.us> Message-ID: <52D4543C.6020904@g.nevcal.com> On 1/13/2014 5:06 AM, Nick Coghlan wrote: > I figured out tonight that it's only positioning ASCII interpolation > as an*alternative* to adding binary interpolation that I have a > problem with. It isn't, because you lose the structural assurance that > you haven't inadvertently introduced an assumption of ASCII > compatibility when you didn't need to. However, interpolation support > is a convenient enough interface that I can see a version that*only* > supports ASCII compatible interpolation being an attractive nuisance > that becomes a source of hard to detect and fix data corruption bugs > (just like the str type in Python 2). > > If we add both, my objections go away: people like me can use the > Python 3 only formatb and formatb_map methods and be confident we > haven't inadvertently introduced any assumptions regarding ASCII > compatibility, while folks that know they're dealing with an ASCII > compatible format can use the ASCII assuming versions that are > designed to be source compatible with Python 2. > > If someone incorrectly uses format() or format_map() when they should > be using the pure binary versions, that's a trivial bug fix (adding > the necessary "b", and perhaps some explicit encoding calls) rather > than a major restructuring of the code. 
> > If they use mod-formatting, that's a slightly bigger fix, but still > just switching to a different spelling of the formatting operation. > > Both use cases (binary only and ASCII compatible) get covered cleanly, > and nobody has to lose out. > > Cheers, > Nick. As part of that, what about an alternate spelling of % to allow binary-only interpolation operations using the handy syntax of % ? Doesn't seem like / is defined for bytes or str on the LHS. -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Mon Jan 13 22:02:52 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Jan 2014 10:02:52 +1300 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> Message-ID: <52D4547C.2070506@canterbury.ac.nz> Guido van Rossum wrote: > On Sun, Jan 12, 2014 at 5:27 PM, Ethan Furman wrote: > >>On 01/12/2014 04:47 PM, Guido van Rossum wrote: > >>>b'%s' % 'x' == b"'x'" (i.e. the three-byte string containing an 'x' >>>enclosed in single quotes) >> >>I'm not sure about the quotes. Would anyone ever actually want those in the >>byte stream? > > Perhaps not, but it's a hint that you should probably think about an > encoding. It's symmetric with how '%s' % b'x' returns "b'x'". Think of > it as payback time. :-) If it's never useful, wouldn't it be better to raise an exception in this case? That way, someone porting code from py2 that does this without appropriate modification will find out about the problem immediately, rather than have spurious quotes inserted into their binary data, which -- being binary data -- will likely go unnoticed until something else tries to read the data. I don't think the rule against operations that work on all-but-one-type really applies here, because the mistake it's intended to catch is not an obscure corner case. 
If your program's logic includes interpolating strings into bytes objects, then you're going to be testing that. -- Greg From v+python at g.nevcal.com Mon Jan 13 22:13:30 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 13 Jan 2014 13:13:30 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113170949.242b9e00@fsol> <52D415F5.6020607@stoneleaf.us> <20140113183114.0fea07a8@fsol> Message-ID: <52D456FA.10900@g.nevcal.com> On 1/13/2014 10:40 AM, Brett Cannon wrote: > This even gives people in-place ASCII encoding for strings by always > using '{:s}' with text which they can do when they port their code to > run under both Python 2 and 3. So you should be able to do > ``b'Content-Type: {:s}'.format('image/jpeg')`` and have it give ASCII. > If you want more explicit encoding to latin-1 then you need to do it > explicitly and not rely on the mini-language to do tricks for you. > My preference is not have any, but if Guido is going say PBP here then > I want absolute consistency across the board in how bytes.format() > tweaks things. > As for %s for the % operator calling ascii(), I think that will be a > porting nightmare of finding out why your bytes suddenly stopped being > formatted properly and then having to crawl through all of your code > for that one use of %s which is getting bytes in. By raising a > TypeError you will very easily detect where your screw-up occurred > thanks to the traceback; do so otherwise feels too much like implicit > type conversion and ask any JavaScript developer how that can be a bad > thing. > So quote 3 is necessarily a violation of quote 1. But if quote 2 can allow for one exception to its absolute consistency... that is probably the best solution overall... -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From v+python at g.nevcal.com Mon Jan 13 22:08:18 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 13 Jan 2014 13:08:18 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D42499.1070001@stoneleaf.us> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113170949.242b9e00@fsol> <52D415F5.6020607@stoneleaf.us> <20140113183114.0fea07a8@fsol> <52D42499.1070001@stoneleaf.us> Message-ID: <52D455C2.1020803@g.nevcal.com> On 1/13/2014 9:38 AM, Ethan Furman wrote: > On 01/13/2014 09:31 AM, Antoine Pitrou wrote: >> On Mon, 13 Jan 2014 08:36:05 -0800 >> Ethan Furman wrote: >>> >>> You mean crash all the time? I'd be fine with that for both the str >>> case >>> and the bytes case. But's probably too late >>> to change the str case, and the bytes case should mirror what str does. >> >> Let me add something else: str and bytes don't have to be symmetrical. >> In Python 2, str and unicode were symmetrical, they allowed exactly the >> same operations and were composable. >> In Python 3, str and bytes are different beasts; they have different >> operations *and* different semantics (for example, bytes interoperates >> with bytearray and memoryview, while str doesn't). > > This makes sense to me. > > So I'm guess I'm fine with either the quoted ascii repr or the always > blowing up method, with leaning towards the blowing up method. +1 - what Ethan said. A real death, instead death by inappropriately transformed data, is fine by me, if b"%s" % str(...) doesn't have the appropriate .encode(...) call. But I could live with either. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From breamoreboy at yahoo.co.uk Mon Jan 13 22:20:28 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Mon, 13 Jan 2014 21:20:28 +0000 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> Message-ID: On 13/01/2014 21:01, Paul Moore wrote: > > I think this should be for 3.5, and should not involve an accelerated > release of 3.5 - we should get it into the 3.5 code early and let > people thrash out the details during the 3.5 release cycle. I disagree, it should be on pypi now so people can start trying it out, or as others have suggested incorporate it into the six module. Surely that'd make the job of getting it into 3.5 far easier? > > Paul. > > PS For all the heated arguments and occasional frayed tempers, this > has been an impressively civil debate. I think that's one of the best > things about python-dev, that discussions like these never degenerate > into flamewars. Kudos to all concerned! > +1 -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From greg.ewing at canterbury.ac.nz Mon Jan 13 22:24:00 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Jan 2014 10:24:00 +1300 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D37D5C.6070604@g.nevcal.com> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D37D5C.6070604@g.nevcal.com> Message-ID: <52D45970.700@canterbury.ac.nz> Glenn Linderman wrote: > Quotes in the stream are a great debug hint, without blowing up. 
But do you really want those quotes turning up in a *binary* stream, where they're somewhere between awkward and near-impossible to spot by eyeballing, and may only be discovered when something else -- likely a different program, possibly being run by a different person -- tries to read the data back, and blows up because the binary format is corrupted? I'd much rather it blew up at the writing stage, myself. Corrupted binary data is *much* harder to debug than corrupted text, because binary formats typically have little to no margin for error before they become complete garbage. -- Greg From guido at python.org Mon Jan 13 22:32:28 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Jan 2014 13:32:28 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D4547C.2070506@canterbury.ac.nz> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: I will doggedly keep posting to this thread rather than creating more threads. In another thread, Nick has said he's okay with my proposal (not sure if that includes %s or not, but it now seems of lesser importance) as long as we simultaneously introduce formatb() and formatb_map() (the latter is just a minor variation of the former, so I won't mention it further). But formatb() feels absurd to me. PEP 460 has neither a precise specification or any actual examples, so I can't tell whether the intention is that the format string can *only* contain {...} sequences or whether it can also contain "regular" characters. Translating to formatb(), my question comes down to the legality of the following example: b'Hello, {}'.formatb(name) # Where name is some bytes object If this is allowed, it reintroduces the ASCII bias (since the substring 'Hello' is clearly ASCII). 
If this isn't allowed, it feels like a perversion of the notion of a "formatting language", and I really don't see the attraction over using a combination of concatenation and the struct module, perhaps augmented with some use of bytes([i]) as an alternative to %c or {!c} (if that is what is meant by PEP 460 with 'c modifier' -- I can't find the word 'modifier' in the docs for format(). Note that I honestly don't understand which of these PEP 460 means. Either way, PEP 460's motivation seems kind of subjective and esthetic: """ While there are reasonably efficient ways to accumulate binary data (such as using a bytearray object, the bytes.join method or even io.BytesIO), none of them leads to the kind of readable and intuitive code that is produced by a %-formatted or {}-formatted template and a formatting operation. """ I would buy this if a binary format string could contain embedded text (like 'Hello' in my example above), but then the argument about avoiding ASCII bias seems to fall apart so I am at a loss about what Nick actually wants, and even about what PEP 460 actually specifies. -- --Guido van Rossum (python.org/~guido) From v+python at g.nevcal.com Mon Jan 13 22:29:08 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 13 Jan 2014 13:29:08 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> <4B5FC5C3-26C9-42E4-BE19-BF2FFDCB411F@stufft.io> <20140113145757.671103a7@anarchist.wooz.org> Message-ID: <52D45AA4.6080308@g.nevcal.com> On 1/13/2014 12:09 PM, Guido van Rossum wrote: > Yeah, the %s behavior with a string argument was a messy attempt at > compromise. I was hoping to mimick a common use of %s in Python 2, > where it can be used with either an 8-bit string or a number as > argument, acting like %b in the former case and like %d in the latter > case. 
Not having %s at all in Python 3 means that porting requires > more thinking (== more opportunity for mistakes when you're converting > in bulk) and there's no easy way to write code that works in Python 2 > and 3. > > If we have %b for strictly interpolating bytes, I'm fine with adding > %a for calling ascii() on the argument and then interpolating the > result after ASCII-encoding it. > > If somehow (unlikely though it seems) we end up keeping %s (e.g. > strictly to ease porting), we could also keep %r as an alias for %a. %s for strictly interpolating bytes eases porting. Sad name, but good for compatibility. When the blowup happens, due to having a str type passed, the porter adds the appropriate .encode(...) to the parameter, so it doesn't blow up on Py 3, and it'll be OK for Py 2 as well, will it not? -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Mon Jan 13 22:40:03 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 13 Jan 2014 22:40:03 +0100 Subject: [Python-Dev] PEP 460 reboot References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: <20140113224003.377e2b4a@fsol> On Mon, 13 Jan 2014 13:32:28 -0800 Guido van Rossum wrote: > > But formatb() feels absurd to me. PEP 460 has neither a precise > specification or any actual examples, so I can't tell whether the > intention is that the format string can *only* contain {...} sequences > or whether it can also contain "regular" characters. Translating to > formatb(), my question comes down to the legality of the following > example: > > b'Hello, {}'.formatb(name) # Where name is some bytes object Yes, it's allowed. But so is: b'\xff\x00{}\x85{}'.formatb(payload, trailer) The ASCII bias is because of the bytes literal notation. Regards Antoine. 
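[Illustration] The two formatb() calls in the message above can be emulated with a small helper. This is only a sketch under stated assumptions: formatb() is a proposal in this thread, not an existing bytes method, and the function name, the restriction to bare positional {} fields, and the use of memoryview() to reject non-bytes arguments are all choices made for this example.

```python
def formatb(template, *args):
    """Hypothetical stand-in for the proposed bytes.formatb().

    Only bare positional ``{}`` fields are handled; every argument must
    export the buffer protocol (bytes, bytearray, memoryview, ...).
    """
    literals = template.split(b'{}')
    if len(literals) - 1 != len(args):
        raise ValueError('template has %d fields but %d arguments were given'
                         % (len(literals) - 1, len(args)))
    out = bytearray(literals[0])
    for arg, literal in zip(args, literals[1:]):
        # memoryview() raises TypeError for str, int, None, etc., so nothing
        # is ever implicitly encoded on the way in.
        out += bytes(memoryview(arg))
        out += literal
    return bytes(out)


name = b'world'
payload, trailer = b'AB', b'CD'   # made-up values, just for the demonstration
greeting = formatb(b'Hello, {}', name)
frame = formatb(b'\xff\x00{}\x85{}', payload, trailer)
```

Because the template is ordinary bytes, a literal spelled b'\x7b\x7d' is byte-for-byte the same as b'{}' and this helper would treat it as a field too.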
From ethan at stoneleaf.us Mon Jan 13 22:25:34 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Jan 2014 13:25:34 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D455C2.1020803@g.nevcal.com> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113170949.242b9e00@fsol> <52D415F5.6020607@stoneleaf.us> <20140113183114.0fea07a8@fsol> <52D42499.1070001@stoneleaf.us> <52D455C2.1020803@g.nevcal.com> Message-ID: <52D459CE.20606@stoneleaf.us> On 01/13/2014 01:08 PM, Glenn Linderman wrote: > > +1 - what Ethan said. A real death, instead death by inappropriately transformed data, is fine by me, if b"%s" % > str(...) doesn't have the appropriate .encode(...) call. But I could live with either. You mean instead of death by a thousand quotes? *ducks and runs* -- ~Ethan~ From guido at python.org Mon Jan 13 22:51:12 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Jan 2014 13:51:12 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: Terminology. Let's use the official terminology rather than making stuff up. The docs at http://docs.python.org/3/library/string.html#formatspec use the following terminology: Replacement field: {...}; contains field name, conversion, format spec in that order, all optional. Field name: either a decimal integer (referring to an argument by position) or an identifier (by name), or omitted (uses the next available position). Conversion: !r, !s, !a; these refer to repr(), str(), ascii() to the value, and then the format spec applies to the resulting string. Format spec: colon, bunch of stuff, type; the type is a letter such as d (decimal) or s (string), and the stuff between the colon and the type is used to specify field width, alignment, sign, padding and such. Also. {:b} means binary (i.e. numbers in base 2). 
I'm not sure what this leaves for interpolating bytes if we don't want to use {:s}. The docs at http://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting don't show %b so it could still be used there, but it would be nicer to be consistent. -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Mon Jan 13 22:54:56 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Jan 2014 10:54:56 +1300 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: Message-ID: <52D460B0.9050606@canterbury.ac.nz> Nick Coghlan wrote: > By allowing format characters that *do* assume ASCII, the entire > construct is rendered unsafe - you have to look inside the format > string to determine if it is assuming ASCII compatibility or not, thus > the entire construct must be deemed as assuming ASCII compatibility at > the level of static semantic analysis. I don't see how any of the currently proposed formatting operations make a data-dependent ASCII assumption. When you write b"%d" % x, you're not assuming that x is ASCII, you're assuming that it's an *integer*. The %d conversion of an integer is defined to produce only ASCII characters, and it works on any integer, so there's no data-dependent assumption there. Something that *would* involve such an assumption would be if b"%s" % 'hello' were defined to encode 'hello' as ASCII. But Guido has proposed not doing that, and instead interpolating ascii('hello'). Since ascii() is defined to return only ASCII characters, and works on any string, there is again no data-dependent assumption. My preference would be for b"%s" % 'hello' to raise an exception, but that would still be data-independent. As for having to look inside the format string to know what types are expected, that's no different from any other formatting operation. All it means is that static type analysis in Python is hard, but we already knew that. 
> Allowing these ASCII assuming format codes in the core bytes > interpolation introduces *exactly* the same problem as is present in > the Python 2 text model: code that *appears* to support arbitrary > binary data, but is in fact assuming ASCII compatibility. Can you provide an example of code using Guido's currently approved formatting semantics that would fail when given arbitrary binary data? I don't see how it can happen. -- Greg From ncoghlan at gmail.com Mon Jan 13 22:56:26 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 14 Jan 2014 07:56:26 +1000 Subject: [Python-Dev] PEP460 thoughts from a Mercurial dev In-Reply-To: References: Message-ID: On 14 Jan 2014 03:34, "Guido van Rossum" wrote: > > On Mon, Jan 13, 2014 at 8:51 AM, Nick Coghlan wrote: > > On 13 January 2014 23:57, Augie Fackler wrote: > >> 1) What do we need in terms of functionality > >> > >> Best guess, %s, %d, and %f. I've not done a full audit of the code, but some > >> limited looking over the grep hits for % in .py files suggests I'm right, > >> and we could even do without %f (we only use that for 'hg --time' output, > >> which we could do in unicode). > > > > I think PEP 460 will have you covered there, or hopefully asciistr on 3.3+ > > I'm confused on how PEP 460 would help -- Augie mentioned %d, which it excludes. I meant your proposed more lenient version (since there's no need for the binary only version to be in the common 2/3 subset). Cheers, Nick. > > -- > --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Mon Jan 13 22:36:24 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Jan 2014 13:36:24 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> Message-ID: <52D45C58.60004@stoneleaf.us> On 01/13/2014 01:20 PM, Mark Lawrence wrote: > On 13/01/2014 21:01, Paul Moore wrote: >> >> I think this should be for 3.5, and should not involve an accelerated >> release of 3.5 - we should get it into the 3.5 code early and let >> people thrash out the details during the 3.5 release cycle. > > I disagree, it should be on pypi now so people can start trying it out, or as others have suggested incorporate it into > the six module. Surely that'd make the job of getting it into 3.5 far easier? It's a bit harder to put a core feature on PyPI. I'm not even sure how it would be done. Fortunately, once it is in 3.5 trunk the adventurous can build their own and try it out that way. -- ~Ethan~ From guido at python.org Mon Jan 13 22:56:44 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Jan 2014 13:56:44 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <20140113224003.377e2b4a@fsol> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> <20140113224003.377e2b4a@fsol> Message-ID: On Mon, Jan 13, 2014 at 1:40 PM, Antoine Pitrou wrote: > On Mon, 13 Jan 2014 13:32:28 -0800 > Guido van Rossum wrote: >> >> But formatb() feels absurd to me. PEP 460 has neither a precise >> specification or any actual examples, so I can't tell whether the >> intention is that the format string can *only* contain {...} sequences >> or whether it can also contain "regular" characters. 
Translating to >> formatb(), my question comes down to the legality of the following >> example: >> >> b'Hello, {}'.formatb(name) # Where name is some bytes object > > Yes, it's allowed. But so is: > > b'\xff\x00{}\x85{}'.formatb(payload, trailer) > > The ASCII bias is because of the bytes literal notation. But it is nevertheless there. Including arbitrary hex bytes in the ASCII range should be a liability, unless you have memorized the hex codes for ASCII and know that e.g. '\x25' is '%' and '\x7b' is '{'. The above example (is it from a real protocol?) would be just as clear or clearer written as b'\xff\x00' + payload + b'\x85' + trailer or b''.join([b'\xff\x00', payload, b'\x85', trailer]) and reasoning about those versions requires no understanding of ASCII. -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Jan 13 22:59:38 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Jan 2014 13:59:38 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D45AA4.6080308@g.nevcal.com> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> <4B5FC5C3-26C9-42E4-BE19-BF2FFDCB411F@stufft.io> <20140113145757.671103a7@anarchist.wooz.org> <52D45AA4.6080308@g.nevcal.com> Message-ID: On Mon, Jan 13, 2014 at 1:29 PM, Glenn Linderman wrote: > On 1/13/2014 12:09 PM, Guido van Rossum wrote: > > Yeah, the %s behavior with a string argument was a messy attempt at > compromise. I was hoping to mimick a common use of %s in Python 2, > where it can be used with either an 8-bit string or a number as > argument, acting like %b in the former case and like %d in the latter > case. Not having %s at all in Python 3 means that porting requires > more thinking (== more opportunity for mistakes when you're converting > in bulk) and there's no easy way to write code that works in Python 2 > and 3. 
> > If we have %b for strictly interpolating bytes, I'm fine with adding > %a for calling ascii() on the argument and then interpolating the > result after ASCII-encoding it. > > If somehow (unlikely though it seems) we end up keeping %s (e.g. > strictly to ease porting), we could also keep %r as an alias for %a. > > > %s for strictly interpolating bytes eases porting. Sad name, but good for > compatibility. When the blowup happens, due to having a str type passed, the > porter adds the appropriate .encode(...) to the parameter, so it doesn't > blow up on Py 3, and it'll be OK for Py 2 as well, will it not? Lots of code uses %s with numbers too, and probably the occasional None or list (relying on the Python 2 near-guarantee that most objects' str() is their repr() and that repr() nearly guarantees to return only ASCII). E.g. I'm sure you can find live code doing something like headers.append('Content-Length: %s\r\n' % len(body)) -- --Guido van Rossum (python.org/~guido) From brett at python.org Mon Jan 13 23:05:10 2014 From: brett at python.org (Brett Cannon) Date: Mon, 13 Jan 2014 17:05:10 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: On Mon, Jan 13, 2014 at 4:51 PM, Guido van Rossum wrote: > Terminology. Let's use the official terminology rather than making stuff > up. > > The docs at http://docs.python.org/3/library/string.html#formatspec > use the following terminology: > > Replacement field: {...}; contains field name, conversion, format spec > in that order, all optional. > > Field name: either a decimal integer (referring to an argument by > position) or an identifier (by name), or omitted (uses the next > available position). > > Conversion: !r, !s, !a; these refer to repr(), str(), ascii() to the > value, and then the format spec applies to the resulting string. 
> > Format spec: colon, bunch of stuff, type; the type is a letter such as > d (decimal) or s (string), and the stuff between the colon and the > type is used to specify field width, alignment, sign, padding and > such. > > > Also. {:b} means binary (i.e. numbers in base 2). I'm not sure what > this leaves for interpolating bytes if we don't want to use {:s}. The > docs at > http://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting > don't show %b so it could still be used there, but it would be nicer > to be consistent. I have been going on the assumption that bytes.format() would change what '{}' meant for itself and would only interpolate bytes. That convenient between Python 2 and 3 since it represents what we want it to (str and bytes under the hood, respectively), so it just falls through. We could also add a 'b' conversion for bytes() explicitly so as to help people not accidentally mix up things in bytes.format() and str.format(). But I was not suggesting adding a specific format spec for bytes but instead making bytes.format() just do the .encode('ascii') automatically to help with compatibility when a format spec was present. If people want fancy formatting for bytes they can always do it themselves before calling bytes.format(). -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dholth at gmail.com Mon Jan 13 23:07:23 2014 From: dholth at gmail.com (Daniel Holth) Date: Mon, 13 Jan 2014 17:07:23 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> <4B5FC5C3-26C9-42E4-BE19-BF2FFDCB411F@stufft.io> <20140113145757.671103a7@anarchist.wooz.org> <52D45AA4.6080308@g.nevcal.com> Message-ID: On Mon, Jan 13, 2014 at 4:59 PM, Guido van Rossum wrote: > On Mon, Jan 13, 2014 at 1:29 PM, Glenn Linderman wrote: >> On 1/13/2014 12:09 PM, Guido van Rossum wrote: >> >> Yeah, the %s behavior with a string argument was a messy attempt at >> compromise. I was hoping to mimick a common use of %s in Python 2, >> where it can be used with either an 8-bit string or a number as >> argument, acting like %b in the former case and like %d in the latter >> case. Not having %s at all in Python 3 means that porting requires >> more thinking (== more opportunity for mistakes when you're converting >> in bulk) and there's no easy way to write code that works in Python 2 >> and 3. >> >> If we have %b for strictly interpolating bytes, I'm fine with adding >> %a for calling ascii() on the argument and then interpolating the >> result after ASCII-encoding it. >> >> If somehow (unlikely though it seems) we end up keeping %s (e.g. >> strictly to ease porting), we could also keep %r as an alias for %a. >> >> >> %s for strictly interpolating bytes eases porting. Sad name, but good for >> compatibility. When the blowup happens, due to having a str type passed, the >> porter adds the appropriate .encode(...) to the parameter, so it doesn't >> blow up on Py 3, and it'll be OK for Py 2 as well, will it not? 
> > Lots of code uses %s with numbers too, and probably the occasional > None or list (relying on the Python 2 near-guarantee that most > objects' str() is their repr() and that repr() nearly guarantees to > return only ASCII). > > E.g. I'm sure you can find live code doing something like > > headers.append('Content-Length: %s\r\n' % len(body)) But if the alternative is spurious quotes then the choice is clear... From brett at python.org Mon Jan 13 23:07:16 2014 From: brett at python.org (Brett Cannon) Date: Mon, 13 Jan 2014 17:07:16 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D45C58.60004@stoneleaf.us> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> <52D45C58.60004@stoneleaf.us> Message-ID: On Mon, Jan 13, 2014 at 4:36 PM, Ethan Furman wrote: > On 01/13/2014 01:20 PM, Mark Lawrence wrote: > >> On 13/01/2014 21:01, Paul Moore wrote: >> >>> >>> I think this should be for 3.5, and should not involve an accelerated >>> release of 3.5 - we should get it into the 3.5 code early and let >>> people thrash out the details during the 3.5 release cycle. >>> >> >> I disagree, it should be on pypi now so people can start trying it out, >> or as others have suggested incorporate it into >> the six module. Surely that'd make the job of getting it into 3.5 far >> easier? >> > > It's a bit harder to put a core feature on PyPI. I'm not even sure how it > would be done. Fortunately, once it is in 3.5 trunk the adventurous can > build their own and try it out that way. > You make it a function that under Python 2 and < 3.5 does what needs to be done and on 3.5 just directly calls the underlying method. People will still have to change their code, but the idea is it becomes a refactoring instead of a change in how the code is structured. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Mon Jan 13 23:11:54 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 13 Jan 2014 23:11:54 +0100 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> <20140113224003.377e2b4a@fsol> Message-ID: <20140113231154.6261b30b@fsol> On Mon, 13 Jan 2014 13:56:44 -0800 Guido van Rossum wrote: > On Mon, Jan 13, 2014 at 1:40 PM, Antoine Pitrou wrote: > > On Mon, 13 Jan 2014 13:32:28 -0800 > > Guido van Rossum wrote: > >> > >> But formatb() feels absurd to me. PEP 460 has neither a precise > >> specification or any actual examples, so I can't tell whether the > >> intention is that the format string can *only* contain {...} sequences > >> or whether it can also contain "regular" characters. Translating to > >> formatb(), my question comes down to the legality of the following > >> example: > >> > >> b'Hello, {}'.formatb(name) # Where name is some bytes object > > > > Yes, it's allowed. But so is: > > > > b'\xff\x00{}\x85{}'.formatb(payload, trailer) > > > > The ASCII bias is because of the bytes literal notation. > > But it is nevertheless there. Including arbitrary hex bytes in the > ASCII range should be a liability, unless you have memorized the hex > codes for ASCII and know that e.g. '\x25' is '%' and '\x7b' is '{'. That's a good point. I hadn't really thought about that. > The above example (is it from a real protocol?) (no, it's cooked up) > would be just as clear > or clearer written as > > b'\xff\x00' + payload + b'\x85' + trailer > > or > > b''.join([b'\xff\x00', payload, b'\x85', trailer]) > > and reasoning about those versions requires no understanding of ASCII. Fair enough. Regards Antoine. 
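[Illustration] The equivalence discussed in the message above can be checked directly. A minimal sketch: the payload and trailer values are invented for the demonstration (the original example was itself cooked up), and the last two assertions show why hex escapes in the ASCII range cannot be told apart from the formatting characters.

```python
payload = b'\x01\x02'   # made-up sample data
trailer = b'\x03'       # made-up sample data

# The two spellings suggested as alternatives to a format template.
by_concat = b'\xff\x00' + payload + b'\x85' + trailer
by_join = b''.join([b'\xff\x00', payload, b'\x85', trailer])

assert by_concat == by_join == b'\xff\x00\x01\x02\x85\x03'

# A hex escape in the ASCII range is the same byte as the printable
# character, so a template containing it would be parsed as formatting.
assert b'\x25' == b'%'
assert b'\x7b' == b'{'
```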
From guido at python.org Mon Jan 13 23:14:55 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Jan 2014 14:14:55 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: On Mon, Jan 13, 2014 at 2:05 PM, Brett Cannon wrote: > I have been going on the assumption that bytes.format() would change what > '{}' meant for itself and would only interpolate bytes. That convenient > between Python 2 and 3 since it represents what we want it to (str and bytes > under the hood, respectively), so it just falls through. We could also add a > 'b' conversion for bytes() explicitly so as to help people not accidentally > mix up things in bytes.format() and str.format(). But I was not suggesting > adding a specific format spec for bytes but instead making bytes.format() > just do the .encode('ascii') automatically to help with compatibility when a > format spec was present. If people want fancy formatting for bytes they can > always do it themselves before calling bytes.format(). This seems hastily written (e.g. verb missing :-), and I'm not clear on what you are (or were) actually proposing. When exactly would bytes.format() need .encode('ascii')? I would be happy to wait a few hours or days for you to write it up clearly, rather than responding in a hurry. -- --Guido van Rossum (python.org/~guido) From eric at trueblade.com Mon Jan 13 23:25:50 2014 From: eric at trueblade.com (Eric V.
Smith) Date: Mon, 13 Jan 2014 17:25:50 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> <4B5FC5C3-26C9-42E4-BE19-BF2FFDCB411F@stufft.io> <20140113145757.671103a7@anarchist.wooz.org> <52D45AA4.6080308@g.nevcal.com> Message-ID: <52D467EE.4060503@trueblade.com> On 1/13/2014 4:59 PM, Guido van Rossum wrote: > On Mon, Jan 13, 2014 at 1:29 PM, Glenn Linderman wrote: >> If somehow (unlikely though it seems) we end up keeping %s (e.g. >> strictly to ease porting), we could also keep %r as an alias for %a. >> >> >> %s for strictly interpolating bytes eases porting. Sad name, but good for >> compatibility. When the blowup happens, due to having a str type passed, the >> porter adds the appropriate .encode(...) to the parameter, so it doesn't >> blow up on Py 3, and it'll be OK for Py 2 as well, will it not? > > Lots of code uses %s with numbers too, and probably the occasional > None or list (relying on the Python 2 near-guarantee that most > objects' str() is their repr() and that repr() nearly guarantees to > return only ASCII). > > E.g. I'm sure you can find live code doing something like > > headers.append('Content-Length: %s\r\n' % len(body)) > That's why I think we should support %s taking bytes, int, float. And make %b mean the same thing, if you want. But I think we need to keep %s (however limited) for compatibility with Python 2. Personally, I'd be okay with %s not accepting str (by raising an exception). I think that would give us a large "compatibility surface" in common with Python 2. Eric. 
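[Illustration] The restricted %s described in the message above (bytes, int and float accepted, str rejected with an exception) can be modelled as a plain function. This is a sketch of the semantics being proposed, not an implementation of anything that exists: the name percent_s and the exact error messages are invented for this example.

```python
def percent_s(value):
    """Hypothetical model of a restricted bytes '%s' (name invented here)."""
    if isinstance(value, str):
        # A str would need an encoding, so refuse rather than guess one.
        raise TypeError("%s requires bytes or a number, not str; "
                        "call .encode() first")
    if isinstance(value, (bytes, bytearray, memoryview)):
        # Bytes-like data is interpolated unchanged.
        return bytes(value)
    if isinstance(value, (int, float)):
        # str() of an int or float yields only ASCII characters.
        return str(value).encode('ascii')
    raise TypeError("unsupported type for %%s: %s" % type(value).__name__)


body = b'spam'
header = b'Content-Length: ' + percent_s(len(body)) + b'\r\n'
```

With these rules the Content-Length idiom quoted earlier keeps working for numbers, while a str argument fails loudly and the porter has to add an explicit .encode(...) call.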
From donald at stufft.io Mon Jan 13 23:31:33 2014 From: donald at stufft.io (Donald Stufft) Date: Mon, 13 Jan 2014 17:31:33 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D467EE.4060503@trueblade.com> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> <4B5FC5C3-26C9-42E4-BE19-BF2FFDCB411F@stufft.io> <20140113145757.671103a7@anarchist.wooz.org> <52D45AA4.6080308@g.nevcal.com> <52D467EE.4060503@trueblade.com> Message-ID: On Jan 13, 2014, at 5:25 PM, Eric V. Smith wrote: > On 1/13/2014 4:59 PM, Guido van Rossum wrote: >> On Mon, Jan 13, 2014 at 1:29 PM, Glenn Linderman wrote: >>> If somehow (unlikely though it seems) we end up keeping %s (e.g. >>> strictly to ease porting), we could also keep %r as an alias for %a. >>> >>> >>> %s for strictly interpolating bytes eases porting. Sad name, but good for >>> compatibility. When the blowup happens, due to having a str type passed, the >>> porter adds the appropriate .encode(...) to the parameter, so it doesn't >>> blow up on Py 3, and it'll be OK for Py 2 as well, will it not? >> >> Lots of code uses %s with numbers too, and probably the occasional >> None or list (relying on the Python 2 near-guarantee that most >> objects' str() is their repr() and that repr() nearly guarantees to >> return only ASCII). >> >> E.g. I'm sure you can find live code doing something like >> >> headers.append('Content-Length: %s\r\n' % len(body)) >> > > That's why I think we should support %s taking bytes, int, float. And > make %b mean the same thing, if you want. But I think we need to keep %s > (however limited) for compatibility with Python 2. > > Personally, I'd be okay with %s not accepting str (by raising an exception). > > I think that would give us a large "compatibility surface" in common > with Python 2. %s not accepting str is the major thing I'd personally be against. %s taking numeric types and bytes would be fine.
The main thing I'd be worried about is where the RHS may possibly contain something non-ASCII that needs encoding (such as the str case). ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From donald at stufft.io Mon Jan 13 23:36:27 2014 From: donald at stufft.io (Donald Stufft) Date: Mon, 13 Jan 2014 17:36:27 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> <4B5FC5C3-26C9-42E4-BE19-BF2FFDCB411F@stufft.io> <20140113145757.671103a7@anarchist.wooz.org> <52D45AA4.6080308@g.nevcal.com> <52D467EE.4060503@trueblade.com> Message-ID: <3A985D4B-A3C0-4EC6-A863-D5324E02953F@stufft.io> On Jan 13, 2014, at 5:31 PM, Donald Stufft wrote: > %s not accepting str is the major thing I'd personally be against. To be more clear: b'%s' % 'abc' == No, b'%s' % 123 == Fine ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From ncoghlan at gmail.com Mon Jan 13 23:43:05 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 14 Jan 2014 08:43:05 +1000 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> Message-ID: On 14 Jan 2014 04:58, "Guido van Rossum" wrote: > > Let me try rebooting the reboot.
> > My interpretation of Nick's argument is that he is asking for a bytes > formatting language that doesn't have an implicit ASCII assumption. > > To me this feels absurd. The formatting codes (%s, %c) themselves are > expressed as ASCII characters. If you include anything else in the > format string besides formatting codes (e.g. b'<%s>'), you are giving > it as ASCII characters. I don't know what characters the EBCDIC codes > 37, 99 or 115 encode (these are the ASCII codes for '%', 'c', 's') but > it certainly wouldn't be safe to use % when the LHS is EBCDIC-encoded. Except we allow string escapes and programmatic creation of format strings, so while ASCII snippets in formatting code are certainly easier to type, they are by no means a mandatory feature of using interpolation operations. I agree Can you roll your own binary interpolation support with join() and simple concatenation? Yes, but Antoine's proposal provides a clean and reliable approach to flexible binary templating that isn't offered by the more lenient version. My problem is with telling Python users that if they're working with ASCII compatible data, they get access to a clean interpolation mini-language for templating purposes, but if they aren't, they don't. That's the part I see as potentially breaking the text model: now you have a convenient API on a core type encouraging you to treat your data as ASCII compatible with implicit serialisation of semantic data as ASCII text, even if that may not be appropriate. If pure binary interpolation is added at the same time (regardless of the exact spelling, so long as it's as easy to access as the ASCII templating), that objection goes away.
That said, the fact that the interpolation mini-languages themselves assume ASCII is the most compelling rationale I have heard so far for treating interpolation as an operation that inherently assumes ASCII compatibility - you can't use arbitrary bytes in your formatting strings without escaping the formatting characters appropriately. While I don't see that as substantially different to needing to escape them in order to retain them in the output of text or ASCII formatting, it's at least a teachable rationale for the absence of a pure binary equivalent. > If I had some byte strings in an unknown encoding (but the same > encoding for all) that I needed to concatenate I would never think of > '%s%s' % (x, y) -- I would write x+y. (Even in Python 2.) > > If I see some code using *any* formatting operation (regardless of > whether it's %d, %r, %s or %c) I am going to assume that there is some > ASCII-ness, and if there isn't, the code's author has obscured their > goal to me. Right, that's a rationale I can explain to people. It also occurred to me that it's easier to build pure binary interpolation on top of ASCII interpolation than I previously thought: I can just check all the input values are compatible with memoryview. At that point, attempting to pass in anything that would trigger implicit encoding at the formatting stage will fail. (Aside: bytes(memoryview(obj)) is also a potentially handy way to avoid the bytes(int)) trap) > I hear the objections against b'%s' % 'x' returning b"'x'" loud and > clear, and if the noise about that sub-issue is preventing folks from > seeing the absurdity in PEP 460, we can talk about a compromise, e.g. > use %b which would require its argument to be bytes. Those bytes > should still probably be ASCII-ish, but there's no way to test that. > That's fine with me and should be fine to Nick as well -- PEP 460 > doesn't check that your encodings match (how could it? :-), nor does > plain string concatenation using +. 
Plus there genuinely are formats where different parts have different encodings and you rely on metadata or format definitions to know what they are. I would actually suggest something like Brett's approach for %s , but with memoryview in the mix: if the object exports a PEP 3118 buffer, interpolate it directly, otherwise invoke normal string formatting and then do strict ASCII encoding at the end. That way people don't have to learn new formatting mini-languages and only have two new behaviours to learn: buffer exporters are interpolated directly, anything else is formatted normally and then implicitly encoding as strict ASCII. > > In my head I make the following classification of situations where you > work with bytes and/or text. > > (A) Pure binary formats (e.g. most IP-level packet formats, media > files, .pyc files, tar/zip files, compressed data, etc.). These are > handled using the struct module (e.g. tar/zip) and/or custom C > extensions (e.g. gzip). > > (B) Encoded text. Here you should just decode everything into str > objects and parse your text at that level. If you really want to > manipulate the data as bytes (e.g. because you have a lot of data to > process and very light processing) you may be able to do it, but > unless it's a verbatim copy, you are probably going to make > assumptions about the encoding. You are also probably going to mess up > for some encodings (e.g. leave BOM turds in the middle of a file). > > (C) Loosely text-based protocols and formats that have an ASCII > assumption in the spec. Most classic Internet protocols (FTP, SMTP, > HTTP, IRC, etc.) fall in this category; I expect there are also plenty > of file formats using similar conventions (e.g. mailbox files). These > protocols and formats often require text-ish manipulations, e.g. for > case-insensitive headers or commands, or to split things at > whitespace. This is where I find uses for the current ASCII-assuming > bytes operations (e.g. 
b.lower(), b.split(), but also int(b)) and > where the lack of number formatting (especially %d and %x) is most > painful. I see no benefit in forcing the programmer writing such > protocol code handling to use more cumbersome ways of converting > between numbers and bytes, nor in forcing them to insert an > encoding/decoding layer -- these protocols often switch between text > and binary data at line boundaries, so the most basic part of parsing > (splitting the input into lines) must still happen in the realm of > bytes. > > IMO PEP 460 and the mindset that goes with it don't apply to any of > these three cases. > > Also, IMO requiring a new type to handle (C) also seems adding too > much complexity, and adds to porting efforts. I may have felt > differently in the past, but ATM I feel that if newer versions of > Python 3 make porting of Python 2 code easier, through minor > compromises, that's a *good* thing. (Example: adding u"..." literals > to 3.3.) You've persuaded me well enough that I think my last proposal above goes *further* than your original one in allowing text formatting when interpolating to ASCII compatible formats :) Cheers, Nick. > > -- > --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From v+python at g.nevcal.com Mon Jan 13 23:46:01 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 13 Jan 2014 14:46:01 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113174237.25050250030@webabinitio.net> <4B5FC5C3-26C9-42E4-BE19-BF2FFDCB411F@stufft.io> <20140113145757.671103a7@anarchist.wooz.org> <52D45AA4.6080308@g.nevcal.com> Message-ID: <52D46CA9.4060805@g.nevcal.com> On 1/13/2014 1:59 PM, Guido van Rossum wrote: > On Mon, Jan 13, 2014 at 1:29 PM, Glenn Linderman wrote: >> On 1/13/2014 12:09 PM, Guido van Rossum wrote: >> >> Yeah, the %s behavior with a string argument was a messy attempt at >> compromise. I was hoping to mimick a common use of %s in Python 2, >> where it can be used with either an 8-bit string or a number as >> argument, acting like %b in the former case and like %d in the latter >> case. Not having %s at all in Python 3 means that porting requires >> more thinking (== more opportunity for mistakes when you're converting >> in bulk) and there's no easy way to write code that works in Python 2 >> and 3. >> >> If we have %b for strictly interpolating bytes, I'm fine with adding >> %a for calling ascii() on the argument and then interpolating the >> result after ASCII-encoding it. >> >> If somehow (unlikely though it seems) we end up keeping %s (e.g. >> strictly to ease porting), we could also keep %r as an alias for %a. >> >> >> %s for strictly interpolating bytes eases porting. Sad name, but good for >> compatibility. When the blowup happens, due to having a str type passed, the >> porter adds the appropriate .encode(...) to the parameter, so it doesn't >> blow up on Py 3, and it'll be OK for Py 2 as well, will it not? 
> Lots of code uses %s with numbers too, and probably the occasional
> None or list (relying on the Python 2 near-guarantee that most
> objects' str() is their repr() and that repr() nearly guarantees to
> return only ASCII).
>
> E.g. I'm sure you can find live code doing something like
>
>     headers.append('Content-Length: %s\r\n' % len(body))
>

That's portably fixable by switching to %d... or by adding .encode('ascii')

From jimjjewett at gmail.com Mon Jan 13 23:49:34 2014
From: jimjjewett at gmail.com (Jim J. Jewett)
Date: Mon, 13 Jan 2014 14:49:34 -0800 (PST)
Subject: [Python-Dev] PEP 460 -- adding explicit assumptions
In-Reply-To:
Message-ID: <52d46d7e.4759e00a.40a0.ffffed5f@mx.google.com>

As best I can tell, some people (apparently including Guido and PEP author Antoine) are taking some assumptions almost for granted, while other people (including me, before Nick's messages) were not assuming them at all.

Since these assumptions (or, possibly, rejections of them?) are likely to decide the outcome, the assumptions should be explicit in the PEP.

(1) The bytes-related classes do include methods that are only useful when the already-contained data is encoded ASCII. They do not (and will not) include any operations that *require* an encoding assumption. This implies that no non-bytes data can be added without an explicit encoding.

(1a) Not even by assuming ASCII with strict error handling.

(1b) Not even for numbers, where ASCII/strict really is sufficient.

Note that this doesn't rule out a solution where objects (or maybe just numbers and ASCII-kind text) provide their own encoding to bytes -- but that has to be done by the objects themselves, not by the bytes container or by the interpreter.

(2) Most python programmers are still in the future.
So an API that confuses people who are still learning about Unicode and the text model is bad -- even if it would work fine for those who do already understand it. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ From greg.ewing at canterbury.ac.nz Tue Jan 14 00:22:44 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Jan 2014 12:22:44 +1300 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D39358.8090405@stoneleaf.us> Message-ID: <52D47544.9060903@canterbury.ac.nz> Nick Coghlan wrote: > > so the latter would be less of > an attractive nuisance when writing code that needs to handle arbitrary > binary formats and can't assume ASCII compatibility. Hang on a moment. What do you mean by code that "handles arbitrary binary formats"? As far as I can see, the proposed features are for code that handles *particular* binary formats. Ones with well-defined fields that are specified to contain ASCII-encoded text. It's the programmer's responsibility to make sure that the fields he's treating as ASCII really do contain ASCII, just as it's his responsibility to make sure he reads and writes a text file using the correct encoding. Now, it's possible that if you were working from an incomplete spec and some examples, you might be led to believe that a particular field was ASCII when in fact it was some ASCII superset such as latin1 or utf8. In that case, if you parsed it assuming ASCII, you would get into trouble of some sort with bytes greater than 127. However, the proposed formatting operations are concerned only with *generating* binary data, not parsing it. Under Guido's proposed semantics, all of the ASCII formatting operations are guaranteed to produce valid ASCII, regardless of what types or values are thrown at them. So as long as the field's true encoding is something ASCII-compatible, you will always generate valid data. 
> Because I *want to use* the PEP 460 binary interpolation API, but > wouldn't be able to use Guido's more lenient proposal, as it is a bug > magnet in the presence of arbitrary binary data. Where exactly is this "arbitrary binary data" that you keep talking about? The only place that arbitrary bytes comes into the picture is through b"%s" % b"...", and that's defined to just pass the bytes straight through. I don't see how that could attract any bugs that weren't already present in the data being interpolated. > The LHS may or may not be tainted with assumptions about ASCII > compatibility, which means it effectively *is* tainted with such > assumptions, which means code that needs to handle arbitrary binary data > can't use it and is left without a binary interpolation feature. If I understand correctly, what concerns you here is that you can't tell by looking at b"%s" % x whether it encodes anything as ASCII without knowing the type of x. I'm not sure how serious a problem that would be. Most of the time I think it will be fairly obvious from the purpose of the code what the type of x is *intended* to be. If it's not actually that type, then clearly there's a bug somewhere. Of all such possible bugs, the one most likely to arise due to a confusion in the programmer's mind between text and bytes would be for x to be a string when it was meant to be bytes or vice versa. Due to the still-very-strong separation between text and bytes in Py3, this is unlikely to happen without something else blowing up first. Even if it does happen, it won't result in a data- dependent failure. If b"%s" % 'hello' were defined to interpolate 'hello'.encode('ascii'), then there *would* be cause for concern. But this is not what Guido proposes -- instead he proposes interpolating ascii('hello') == "'hello'". This is almost certainly *never* what the file spec calls for, so you'll find out about it very soon one way or another. 
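To make that contrast concrete, here is a minimal sketch of the two behaviours being compared. The helper names are hypothetical and exist only for this illustration; neither is part of PEP 460 or of any proposed API. The first mirrors the ascii()-based interpolation described above, the second the plain strict ASCII encoding that the proposal does not use.

```python
def interpolated_as_proposed(value):
    # ascii() returns the repr with non-ASCII characters escaped, so the
    # result can always be encoded as ASCII, whatever the input text is.
    return ascii(value).encode("ascii")


def interpolated_as_feared(value):
    # Plain strict encoding: works for ASCII-only text, raises
    # UnicodeEncodeError as soon as non-ASCII text shows up.
    return value.encode("ascii")


print(interpolated_as_proposed("hello"))    # b"'hello'"
print(interpolated_as_feared("hello"))      # b'hello'
print(interpolated_as_proposed("caf\xe9"))  # b"'caf\\xe9'"
```

The quotes in the first result appear on the very first run, with any input, which is why the mistake is hard to miss; the second helper is the one that would only fail once non-ASCII data arrived.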
Effectively this means that b"%s" % x where x is a string is useless, so I'd much prefer it to just raise an exception in that case to make the failure immediately obvious. But either way, you're not going to end up with a latent failure waiting for some non-ASCII data to come along before you notice it. To summarise, I think the idea of binary format strings being too "tainted" for a program that does not want to use ASCII formatting to rely on is mostly FUD. -- Greg From jimjjewett at gmail.com Tue Jan 14 00:48:36 2014 From: jimjjewett at gmail.com (Jim J. Jewett) Date: Mon, 13 Jan 2014 15:48:36 -0800 (PST) Subject: [Python-Dev] Automatic encoding detection [was: Re: Python3 "complexity" - 2 use cases] In-Reply-To: <7wob3jcb1j.fsf@benfinney.id.au> Message-ID: <52d47b54.4140e00a.32ce.fffff831@mx.google.com> >> So when it is time to guess [at the character encoding of a file], >> a source of good guesses is an important battery to include. > The barrier for entry to the standard library is higher than mere > usefulness. Agreed. But "most programs will need it, and people will either include (the same) 3rd-party library themselves, or write their own workaround, or have buggy code" *is* sufficient. The points of contention are (1) How many programs have to deal with documents written outside their control -- and probably originating on another system. I'm not ready to say "most" programs in general, but I think that barrier is met for both web clients (for which we already supply several batteries) and quick-and-dirty utilities. (2) How serious are the bugs / How annoying are the workarounds? As someone who mostly sticks to English, and who tends to manually ignore stray bytes when dealing with a semi-binary file format, the bugs aren't that serious for me personally. So I may well choose to write buggy programs, and the bug may well never get triggered on my own machine. 
But having a batch process crash one run in ten (where it didn't crash at all under Python 2) is a bad thing. There are environments where (once I knew about it) I would add chardet (if I could get approval for the 3rd-party component). (3) How clearcut is the *right* answer? As I said, at one point (several years ago), the w3c and whatwg started to standardize the "right" answer. They backed that out, because vendors wanted the option to improve their detection in the future without violating standards. There are certainly situations where local knowledge can do better than a global solution like chardet, but ... the "right" answer is clear most of the time. Just ignoring the problem is still a 99% answer, because most text in ASCII-mostly environments really is "close enough". But that is harder (and the One Obvious Way is less reliable) under Python 3 than it was under Python 2. An alias for "open" that defaulted to surrogate-escape (or returned the new "ASCIIstr" bytes hybrid) would probably be sufficient to get back (almost) to Python 2 levels of ease and reliability. But it would tend to encourage ASCII/English-only assumptions. You could fix most of the remaining problems by scripting a web browser, except that scripting the browser in a cross-platform manner is slow and problematic, even with webbrowser.py. "Whatever a recent Firefox does" is (almost by definition) good enough, and is available ... but maybe not in a convenient form, which is one reason that chardet was created as a port thereof. Also note that firefox assumes you will update more often than Python does. "Whatever chardet said at the time the Python release was cut" is almost certainly good enough too. The browser makers go to great lengths to match each other even in bizarre corner cases. (Which is one reason there aren't more competing solutions.) 
But that doesn't mean it is *impossible* to construct a test case where they disagree -- or even one where a recent improvement in the algorithms led to regressions for one particular document. That said, such regressions should be limited to documents that were not properly labeled in the first place, and should be rare even there. Think of the changes as obscure bugfixes, akin to a program starting to handle NaN properly, in a place where it "should" not ever see one. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ From rosuav at gmail.com Tue Jan 14 01:06:01 2014 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 14 Jan 2014 11:06:01 +1100 Subject: [Python-Dev] Automatic encoding detection [was: Re: Python3 "complexity" - 2 use cases] In-Reply-To: <52d47b54.4140e00a.32ce.fffff831@mx.google.com> References: <7wob3jcb1j.fsf@benfinney.id.au> <52d47b54.4140e00a.32ce.fffff831@mx.google.com> Message-ID: On Tue, Jan 14, 2014 at 10:48 AM, Jim J. Jewett wrote: >> The barrier for entry to the standard library is higher than mere >> usefulness. > > Agreed. But "most programs will need it, and people will either > include (the same) 3rd-party library themselves, or write their > own workaround, or have buggy code" *is* sufficient. Well, no, that's not sufficient on its own either. But yes, it's a stronger argument. > But having a batch process crash one run in ten (where it didn't > crash at all under Python 2) is a bad thing. There are environments > where (once I knew about it) I would add chardet (if I could get > approval for the 3rd-party component). Having it *do the wrong thing* one run in ten is even worse. If you need chardet, then get approval for the third-party component. That's a political issue, not a technical one. "This needs to be in the stdlib because I'm not allowed to install anything else"? I hope not. 
Also, a PyPI package is free to update independently of the Python version schedule. The stdlib is bound. ChrisA From greg.ewing at canterbury.ac.nz Tue Jan 14 01:06:06 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Jan 2014 13:06:06 +1300 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D47F6E.40904@canterbury.ac.nz> Stephen J. Turnbull wrote: > PBP doesn't think it's a great idea to pass around bytes that are > implicitly some other type, but didn't mind it (or got used to it) in > Python 2, and so they're not looking at that as a problem that Python > 3 can solve. They're looking at Python 3 as the problem that prevents > them from doing what worked fine in Python 2. While some people may think that way, I don't think it's fair to characterise *all* proponents of bytes formatting as luddites that refuse to get with the Python 3 way. Some of us *do* understand the principles of text/ bytes separation in Python 3 and agree that they're a good idea. We just don't agree that the proposed formatting operations violate those principles to any degree worth worrying about. I don't think of my viewpoint as being PBP. That term assumes there is purity there to be beaten. To my mind, any notion of purity with respect to bytes objects went out the window as soon as it was given a pile of text methods -- together with a text-like literal syntax and default repr(), even though at least half the time they're completely inappropriate! 
-- Greg From greg.ewing at canterbury.ac.nz Tue Jan 14 01:27:07 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Jan 2014 13:27:07 +1300 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D37D5C.6070604@g.nevcal.com> <3B1B5227-80B9-4E06-B639-F3111E96FF19@stufft.io> <041F7CD9-B95E-4DA9-AA46-C807EB0F3974@stufft.io> Message-ID: <52D4845B.10009@canterbury.ac.nz> Nick Coghlan wrote: > Arbitrary binary data and ASCII compatible binary data are *different > things* and the only argument in favour of modelling them with a single > type is because Python 2 did it that way. I would say that ASCII compatible binary data is a *subset* of arbitrary binary data. As such, a type designed for arbitrary binary data is a perfectly good way of representing ASCII compatible binary data. What are you saying -- that there should be one type for ASCII compatible binary data, and another type for all binary data *except* when it's ASCII compatible? That makes no sense to me. > The Python 3 text model was built on the notion of "no implicit encoding > and decoding" This is nonsense. There are plenty of implicit encoding and decoding operations in Python 3. When you open a text file, it gets an encoding. After that, anything you write to it is implicitly encoded using that encoding. There's even a default encoding when you open the file, so you don't even have to be explicit about that. It's more correct to say that it was built on the notion of using separate types for encoded and decoded data, so that it's *possible* to keep track of the difference. It doesn't mean that there can't be conversions between the two types that are implicit to one degree or another. 
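A small illustration of the implicit encoding described above, as a sketch using only the standard io module. The in-memory buffer stands in for a real file, and the helper name is made up for this example.

```python
import io


def bytes_written(text, encoding="utf-8"):
    # The wrapper plays the role of a file opened in text mode: the encoding
    # is chosen once, and every later write() is encoded implicitly.
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding=encoding)
    wrapper.write(text)
    wrapper.flush()
    return raw.getvalue()


print(bytes_written("na\u00efve"))             # b'na\xc3\xafve'
print(bytes_written("na\u00efve", "latin-1"))  # b'na\xefve'
```

The caller only ever handles str; the bytes appear on the other side of the wrapper without an explicit encode() call at the point of writing.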
--
Greg

From rosuav at gmail.com Tue Jan 14 01:48:46 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 14 Jan 2014 11:48:46 +1100
Subject: [Python-Dev] Test failures when running as root
Message-ID:

And now for something completely different.

My root buildbot is finally now able to telnet out and get "Connection refused" errors. (For the curious, the VirtualBox "NAT" mode doesn't work properly, but the new "NAT Network" mode does. Why? I have no idea. But if anyone else is having the same problem, upgrade to the latest VirtualBox and set up a NAT Network. All I care is, it now works.) The test suite is now failing at another point, and this applies to 2.7, 3.3, and 3.x.

======================================================================
ERROR: test_initgroups (test.test_posix.PosixGroupsTester)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/test/test_posix.py", line 1143, in test_initgroups
    g = max(self.saved_groups) + 1
ValueError: max() arg is an empty sequence

----------------------------------------------------------------------

The saved_groups value comes from posix.getgroups(), and it's being used to try to get a group that this user doesn't have (I think). When I run Python as root, posix.getgroups() returns [0], but apparently it's not returning any groups when the test runs.

So, two questions. Firstly, is this a problem that needs to be fixed in Python, or is it a configuration change that I made? It began failing recently, so possibly when I rebooted the VM as part of VirtualBox changes I mucked something up. And secondly, how can I run the tests manually? I can't find a binary inside the buildarea tree. Does it get deleted afterward?

Apologies if these are dumb questions, hopefully they're a small distraction from PEP 460 arguments!

ChrisA

From tjreedy at udel.edu Tue Jan 14 01:51:48 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 13 Jan 2014 19:51:48 -0500
Subject: [Python-Dev] PEP 460 reboot
In-Reply-To:
References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <20140113124118.55c8c2c3@fsol> <20140113170949.242b9e00@fsol> <52D415F5.6020607@stoneleaf.us> <20140113183114.0fea07a8@fsol>
Message-ID: <52D48A24.4000602@udel.edu>

On 1/13/2014 3:13 PM, Guido van Rossum wrote:
> On Mon, Jan 13, 2014 at 12:02 PM, Brett Cannon wrote:
>> On Mon, Jan 13, 2014 at 2:51 PM, Terry Reedy wrote:
>>> I personally would not add 'bytes % whatever'.
>>
>> Personally, neither would I; just focus on bytes.format() and let % operator
>> on strings slowly go away.
>
> Well, % has some very strong arguments in its favor still -- for

If I shift from a 'personal' to a 'BDFL' viewpoint, I have to agree.

> example, the sheer amount of code that currently uses it, the fact
> that it's as close as we get to a cross-language standard, and the

This much I know.

> fact that nobody wants to tackle its use in the logging module (since
> logger objects are often shared between packages that don't know about
> each other).

This I did not know.

> Anyway, the % or .format() issue seems completely orthogonal to the
> issues that get people riled up (which are mostly about whether using
> either implies some kind of ASCII compatibility).

A possibly important difference between '%s' and '{:s}' is that the 's' is required in the former and optional in the latter. So in byteformat(), b'{:s}' continues to format a string (as encoded bytes) while '{:}' 'formats' a byte without having to invent a new code that does not exist in 2.7. That particular solution to "does 's' mean bytes or string" does not work for % formatting. (And that lack, in turn, is part of what lay behind the inclination expressed above.)

For % formatting, I would be inclined to start with 'what does Mercurial need?'
or even 'does anything even really work for hg?'. Hg is part of our development ecosystem, and we have an hg rep who expressed a desire to experiment. Terry From tjreedy at udel.edu Tue Jan 14 01:58:43 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 13 Jan 2014 19:58:43 -0500 Subject: [Python-Dev] Automatic encoding detection [was: Re: Python3 "complexity" - 2 use cases] In-Reply-To: References: <7wob3jcb1j.fsf@benfinney.id.au> <52d47b54.4140e00a.32ce.fffff831@mx.google.com> Message-ID: On 1/13/2014 7:06 PM, Chris Angelico wrote: > On Tue, Jan 14, 2014 at 10:48 AM, Jim J. Jewett wrote: >> Agreed. But "most programs will need it, and people will either >> include (the same) 3rd-party library themselves, or write their >> own workaround, or have buggy code" *is* sufficient. > > Well, no, that's not sufficient on its own either. But yes, it's a > stronger argument. > >> But having a batch process crash one run in ten (where it didn't >> crash at all under Python 2) is a bad thing. There are environments >> where (once I knew about it) I would add chardet (if I could get >> approval for the 3rd-party component). > > Having it *do the wrong thing* one run in ten is even worse. > > If you need chardet, then get approval for the third-party component. > That's a political issue, not a technical one. "This needs to be in > the stdlib because I'm not allowed to install anything else"? I hope > not. Also, a PyPI package is free to update independently of the > Python version schedule. The stdlib is bound. This discussion strikes me as more appropriate for python-ideas. That said, I am leery of a heuristics module in the stdlib. When is a change a 'bug fix'? and when is it an 'enhancement'? 
-- Terry Jan Reedy From python at mrabarnett.plus.com Tue Jan 14 02:03:44 2014 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 14 Jan 2014 01:03:44 +0000 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: <52D48CF0.2080900@mrabarnett.plus.com> On 2014-01-13 21:51, Guido van Rossum wrote: > Terminology. Let's use the official terminology rather than making stuff up. > > The docs at http://docs.python.org/3/library/string.html#formatspec > use the following terminology: > > Replacement field: {...}; contains field name, conversion, format spec > in that order, all optional. > > Field name: either a decimal integer (referring to an argument by > position) or an identifier (by name), or omitted (uses the next > available position). > > Conversion: !r, !s, !a; these refer to repr(), str(), ascii() to the > value, and then the format spec applies to the resulting string. > If all you wanted to do was interpolate bytes then you could define a new conversion !b. This would, however, mean that the format spec would be applied to bytes. > Format spec: colon, bunch of stuff, type; the type is a letter such as > d (decimal) or s (string), and the stuff between the colon and the > type is used to specify field width, alignment, sign, padding and > such. > > > Also. {:b} means binary (i.e. numbers in base 2). I'm not sure what > this leaves for interpolating bytes if we don't want to use {:s}. The > docs at http://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting > don't show %b so it could still be used there, but it would be nicer > to be consistent. 
> From emile at fenx.com Tue Jan 14 02:09:37 2014 From: emile at fenx.com (Emile van Sebille) Date: Mon, 13 Jan 2014 17:09:37 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D47F6E.40904@canterbury.ac.nz> References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> Message-ID: On 1/13/2014 4:06 PM, Greg Ewing wrote: > of text methods -- together with a text-like literal > syntax and default repr(), even though at least half > the time they're completely inappropriate! Better said as 'half the time they're coincidentally helpful!' My $.01 :) Emile From steve at pearwood.info Tue Jan 14 03:21:26 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 14 Jan 2014 13:21:26 +1100 Subject: [Python-Dev] Automatic encoding detection [was: Re: Python3 "complexity" - 2 use cases] In-Reply-To: References: <7wob3jcb1j.fsf@benfinney.id.au> <52d47b54.4140e00a.32ce.fffff831@mx.google.com> Message-ID: <20140114022126.GC3403@ando> On Mon, Jan 13, 2014 at 07:58:43PM -0500, Terry Reedy wrote: > This discussion strikes me as more appropriate for python-ideas. That > said, I am leery of a heuristics module in the stdlib. When is a change > a 'bug fix'? and when is it an 'enhancement'? Depends on the nature of the heuristic. 
For example, there's a simple "guess the encoding of text files" heuristic which uses the presence of a BOM to pick the encoding:

- read the first four bytes in binary mode
- if bytes 0 through 3 are 0000FEFF or FFFE0000, then the encoding is UTF-32 (test this before UTF-16, since FFFE0000 also begins with the UTF-16 BOM FFFE);
- if bytes 0 and 1 are FEFF or FFFE, then the encoding is UTF-16;
- if bytes 0 through 2 are EFBBBF, then the encoding is UTF-8;
- if bytes 0 through 2 are 2B2F76 and byte 3 is 38, 39, 2B or 2F, then the encoding is UTF-7;
- otherwise the encoding is unknown.

Here a bug fix versus an enhancement is easy: a bug fix is (say) correcting one of the BOMs that was wrong (suppose it tested for EFFF instead of FEFF, that would be a bug); an enhancement would be adding a new BOM/encoding detector (say, F7644C for UTF-1).

The same would not apply to, for instance, the chardet library, where detection is based on statistics. If the library adjusts a frequency table, does that reflect a bug or an enhancement or both?

--
Steven

From tjreedy at udel.edu Tue Jan 14 03:25:50 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 13 Jan 2014 21:25:50 -0500
Subject: [Python-Dev] PEP 460 reboot
In-Reply-To:
References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz>
Message-ID:

On 1/13/2014 4:32 PM, Guido van Rossum wrote:

> I will doggedly keep posting to this thread rather than creating more threads.

Please permit me to doggedly keep pointing you toward the possible solution I posted on the tracker last October.

> But formatb() feels absurd to me. PEP 460 has neither a precise
> specification or any actual examples, so I can't tell whether the

Two days ago, I reposted byteformat() here on pydev with a precise text specification added to the code, and with an expanded test example. I have just added another example based on your question below.

> intention is that the format string can *only* contain {...} sequences
> or whether it can also contain "regular" characters.
Translating to
> formatb(), my question comes down to the legality of the following
> example:
>
>     b'Hello, {}'.formatb(name) # Where name is some bytes object
>
> If this is allowed, it reintroduces the ASCII bias (since the
> substring 'Hello' is clearly ASCII).

Since byteformat() uses re to find {} replacement fields, it only has such ascii bias as re has, which I believe is not much, if any. As far as re and byteformat are concerned, everything outside of the {...} fields is uninterpreted bytes. As far as bytes.join is concerned, both joiner and joined are uninterpreted bytes.

    >>> byteformat(b'\x00{}\x02{}def', (b'\x01', b'abc',))
    b'\x00\x01\x02abcdef'

re.split produces [b'\x00', b'', b'\x02', b'', b'def']. The only ascii bias is the one already present in the representation of bytes, and the fact that Python code must have an ascii-compatible encoding.

The advantage of

    byteformat(b'\x00{}\x02{}def', (b'\x01', b'abc',))

over directly writing

    b''.join([b'\x00', b'\x01', b'\x02', b'abc', b'def'])

is that one does not have to manually split the presumably constant template into chunks and interleave them with the presumably variable chunks.

Here is the example that I used for testing, including non-blank format specs.

    bformat = b"bytes: {}; bytearray: {:}; unicode: {:s}; int: {:5d}; float: {:7.2f}; end"
    objects = (b'abc', bytearray(b'def'), u'ghi', 123, 12.3)
    result = byteformat(bformat, objects)
    # result == b'bytes: abc; bytearray: def; unicode: ghi; int:   123; float:   12.30; end'

The additional advantage here is the automatic encoding of formatted strings to bytes. As posted, byteformat() uses the str.encode defaults (encoding='utf-8', errors='strict'). But as I said in the post, these could become parameters to the function that are passed on to str.encode.

The design reuses re.split, bytes.join, format, and the format specification. By re-using the format-spec as is, the only new thing to learn is that blank specs correspond to bytes instead of strings.
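For readers who have not seen the tracker post, the design described above can be approximated as follows. This is a reconstruction from the description in this message only, not the code that was posted: the function name, the regular expression, and the handling of blank specs through memoryview are all assumptions made for illustration.

```python
import re

# Assumed field syntax: "{}", "{:}" or "{:spec}". The group captures the
# format spec; it is None for "{}" and b"" for "{:}".
_FIELD = re.compile(rb"\{(?::([^{}]*))?\}")


def byteformat_sketch(template, objects, encoding="utf-8", errors="strict"):
    # With one capturing group, split() alternates literal chunks (even
    # indices) with the captured specs (odd indices).
    chunks = _FIELD.split(template)
    specs = chunks[1::2]
    if len(specs) != len(objects):
        raise ValueError("expected %d objects, got %d" % (len(specs), len(objects)))
    for i, (spec, obj) in enumerate(zip(specs, objects)):
        if spec:
            # Non-blank spec: format as text, then encode the result.
            text = format(obj, spec.decode("ascii"))
            chunks[2 * i + 1] = text.encode(encoding, errors)
        else:
            # Blank spec: require a buffer exporter and copy its bytes as-is.
            chunks[2 * i + 1] = bytes(memoryview(obj))
    return b"".join(chunks)


print(byteformat_sketch(b"\x00{}\x02{}def", (b"\x01", b"abc")))
# b'\x00\x01\x02abcdef'
print(byteformat_sketch(b"int: {:5d}; float: {:7.2f}", (123, 12.3)))
# b'int:   123; float:   12.30'
```

In this sketch a blank spec rejects str and int arguments with a TypeError, because memoryview() accepts only buffer exporters. Literal template bytes that happen to look like a field are still treated as fields, so binary templates containing such byte sequences would need some escaping convention that this sketch does not attempt.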
This is easier to design, implement, and learn than if the format-spec is limited to disallow some things (after much bike-shedding over what to eliminate ;-). I would appreciate your comment on this proposal. -- Terry Jan Reedy From ethan at stoneleaf.us Tue Jan 14 02:38:38 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Jan 2014 17:38:38 -0800 Subject: [Python-Dev] magic method __bytes__ Message-ID: <52D4951E.3080104@stoneleaf.us> Has anyone actually used __bytes__ yet? What for? -- ~Ethan~ From tjreedy at udel.edu Tue Jan 14 03:30:14 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 13 Jan 2014 21:30:14 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: On 1/13/2014 5:14 PM, Guido van Rossum wrote: > On Mon, Jan 13, 2014 at 2:05 PM, Brett Cannon wrote: >> I have been going on the assumption that bytes.format() would change what >> '{}' meant for itself and would only interpolate bytes. That convenient >> between Python 2 and 3 since it represents what we want it to (str and bytes >> under the hood, respectively), so it just falls through. We could also add a >> 'b' conversion for bytes() explicitly so as to help people not accidentally >> mix up things in bytes.format() and str.format(). But I was not suggesting >> adding a specific format spec for bytes but instead making bytes.format() >> just do the .encode('ascii') automatically to help with compatibility when a >> format spec was present. If people want fancy formatting for bytes they can >> always do it themselves before calling bytes.format(). > > This seems hastily written (e.g. verb missing :-), and I'm not clear > on what you are (or were) actually proposing. When exactly would > bytes.format() need .encode('ascii')? > > I would be happy to wait a few hours or days for you to to write it up > clearly, rather than responding in a hurry. 
I already posted my version of this proposal, with spec and example, in the thread "byteformat() proposal: please critique", and I added more in response to your earlier post. -- Terry Jan Reedy From python at mrabarnett.plus.com Tue Jan 14 03:55:06 2014 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 14 Jan 2014 02:55:06 +0000 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: <52D4A70A.3000308@mrabarnett.plus.com> On 2014-01-14 02:25, Terry Reedy wrote: > On 1/13/2014 4:32 PM, Guido van Rossum wrote: > > > I will doggedly keep posting to this thread rather than creating more > threads. > > Please permit to to doggedly keep pointing you toward the possible > solution I posted on the tracker last October. > >> But formatb() feels absurd to me. PEP 460 has neither a precise >> specification or any actual examples, so I can't tell whether the > > Two days ago, I reposted byteformat() here on pydev with a precise text > specification added to the code, and with an expanded test example. I > have just added another example based on your question below. > >> intention is that the format string can *only* contain {...} sequences >> or whether it can also contain "regular" characters. Translating to >> formatb(), my question comes down to the legality of the following >> example: >> >> b'Hello, {}'.formatb(name) # Where name is some bytes object >> >> If this is allowed, it reintroduces the ASCII bias (since the >> substring 'Hello' is clearly ASCII). > > Since byteformat() uses re to find {} replacement fields, > it only has such ascii bias as re has, which I believe is not much, if > any. As far as re and byteformat are concerned, everything outside of > the {...} fields is uninterpreted bytes. As far as bytes.join is > concerned, both joiner and joined are uninterpreted bytes. 
> > >>> byteformat(b'\x00{}\x02{}def', (b'\x01', b'abc',)) > b'\x00\x01\x02abcdef' > [snip] Couldn't that suffer from false positives, i.e. binary data that happens to match? (Rare, yes, but possible.) From tjreedy at udel.edu Tue Jan 14 04:03:43 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 13 Jan 2014 22:03:43 -0500 Subject: [Python-Dev] Test failures when running as root In-Reply-To: References: Message-ID: On 1/13/2014 7:48 PM, Chris Angelico wrote: > And now for something completely different. > > My root buildbot is finally now able to telnet out and get "Connection > refused" errors. (For the curious, the VirtualBox "NAT" mode doesn't > work properly, but the new "NAT Network" mode does. Why? I have no > idea. But if anyone else is having the same problem, upgrade to the > latest VirtualBox and set up a NAT Network. All I care is, it now > works.) The test suite is now failing at another point, and this > applies to 2.7, 3.3, and 3.x. > > ====================================================================== > ERROR: test_initgroups (test.test_posix.PosixGroupsTester) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/test/test_posix.py", > line 1143, in test_initgroups > g = max(self.saved_groups) + 1 > ValueError: max() arg is an empty sequence try: g = max(self.saved_groups) + 1 except ValueError: g = 1 > The saved_groups value comes from posix.getgroups(), and it's being > used to try to get a group that this user doesn't have (I think). When > I run Python as root, posix.getgroups() returns [0], but apparently > it's not returning any groups when the test runs. Unless someone says that it is a bug for posix.getgroups to return an empty list, I would say that the test should be fixed by trying the code above. > So, two questions. 
Firstly, is this a problem that needs to be fixed > in Python, or is it a configuration change that I made? It began > failing recently, so possibly when I rebooted the VM as part of > VirtualBox changes I mucked something up. > > And secondly, how can I run the tests manually? If you build and keep a binary, it is easy. For example: path-to-binary -m test text_posix See doc chapter for test package and for 2.7 difference. > I can't find a binary inside the buildarea tree. > Does it get deleted afterward? No experience with bbots. > Apologies if these are dumb questions, hopefully they're a small > distraction from PEP 460 arguments! And welcomed. -- Terry Jan Reedy From zachary.ware+pydev at gmail.com Tue Jan 14 04:14:03 2014 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Mon, 13 Jan 2014 21:14:03 -0600 Subject: [Python-Dev] Test failures when running as root In-Reply-To: References: Message-ID: On Mon, Jan 13, 2014 at 6:48 PM, Chris Angelico wrote: > And secondly, how can I run the tests manually? I can't find a binary > inside the buildarea tree. Does it get deleted afterward? Yes, that's the 'clean' step of the buildbot build process. I'd suggest making another clone elsewhere (you can clone from the buildarea just to make the clone faster, but I'd leave the buildarea alone otherwise), then building and testing should be as simple as `./configure --with-pydebug && make && ./python -m test.test_posix`. As far as the failure itself, I have no comment. > Apologies if these are dumb questions, hopefully they're a small > distraction from PEP 460 arguments! 
It's kinda nice to get something non-PEP460 in the inbox this week :) -- Zach From python at mrabarnett.plus.com Tue Jan 14 04:16:29 2014 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 14 Jan 2014 03:16:29 +0000 Subject: [Python-Dev] Test failures when running as root In-Reply-To: References: Message-ID: <52D4AC0D.2000407@mrabarnett.plus.com> On 2014-01-14 03:03, Terry Reedy wrote: > On 1/13/2014 7:48 PM, Chris Angelico wrote: >> And now for something completely different. >> >> My root buildbot is finally now able to telnet out and get "Connection >> refused" errors. (For the curious, the VirtualBox "NAT" mode doesn't >> work properly, but the new "NAT Network" mode does. Why? I have no >> idea. But if anyone else is having the same problem, upgrade to the >> latest VirtualBox and set up a NAT Network. All I care is, it now >> works.) The test suite is now failing at another point, and this >> applies to 2.7, 3.3, and 3.x. >> >> ====================================================================== >> ERROR: test_initgroups (test.test_posix.PosixGroupsTester) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/test/test_posix.py", >> line 1143, in test_initgroups >> g = max(self.saved_groups) + 1 >> ValueError: max() arg is an empty sequence > > try: > g = max(self.saved_groups) + 1 > except ValueError: > g = 1 > Alternatively: g = max(self.saved_groups, [1]) or even: g = max(self.saved_groups or [1]) From rosuav at gmail.com Tue Jan 14 04:19:37 2014 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 14 Jan 2014 14:19:37 +1100 Subject: [Python-Dev] Test failures when running as root In-Reply-To: References: Message-ID: On Tue, Jan 14, 2014 at 2:14 PM, Zachary Ware wrote: > On Mon, Jan 13, 2014 at 6:48 PM, Chris Angelico wrote: >> And secondly, how can I run the tests manually? I can't find a binary >> inside the buildarea tree. 
Does it get deleted afterward? > > Yes, that's the 'clean' step of the buildbot build process. I'd > suggest making another clone elsewhere (you can clone from the > buildarea just to make the clone faster, but I'd leave the buildarea > alone otherwise), then building and testing should be as simple as > `./configure --with-pydebug && make && ./python -m test.test_posix`. Doh. Yeah, I can see the 'clean' step in the build process, I should have known. Of course. Thanks, that's what I'll do then. ChrisA From benjamin at python.org Tue Jan 14 04:24:56 2014 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 13 Jan 2014 19:24:56 -0800 Subject: [Python-Dev] magic method __bytes__ In-Reply-To: <52D4951E.3080104@stoneleaf.us> References: <52D4951E.3080104@stoneleaf.us> Message-ID: <1389669896.1643.70405865.720BE7AC@webmail.messagingengine.com> On Mon, Jan 13, 2014, at 05:38 PM, Ethan Furman wrote: > Has anyone actually used __bytes__ yet? What for? In the stdlib itself: email.message wsgiref pathlib From rosuav at gmail.com Tue Jan 14 04:18:44 2014 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 14 Jan 2014 14:18:44 +1100 Subject: [Python-Dev] Test failures when running as root In-Reply-To: References: Message-ID: On Tue, Jan 14, 2014 at 2:03 PM, Terry Reedy wrote: > On 1/13/2014 7:48 PM, Chris Angelico wrote: >> >> ValueError: max() arg is an empty sequence > > > try: > > g = max(self.saved_groups) + 1 > except ValueError: > g = 1 > > > Unless someone says that it is a bug for posix.getgroups to return an empty > list, I would say that the test should be fixed by trying the code above. I can't see anything in the getgroups man page [1] to suggest that it's a bug to return an empty list. But I can't replicate the behaviour either - not on the system Python, at least (2.7.3), hence the query about rerunning tests. Will raise an issue on the tracker. 
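A minimal sketch of the fallbacks suggested in this thread, written as a standalone helper rather than as the actual test_posix change. The name next_unused_gid is hypothetical. One thing to watch: max(groups, [1]) compares two lists and returns a list, so the "or" spelling is the one that yields an integer.

```python
def next_unused_gid(groups):
    """Return a group id one above the largest in groups, or 1 if groups is empty."""
    # An empty list is falsy, so "groups or [0]" substitutes [0] and
    # max() never sees an empty sequence.
    return max(groups or [0]) + 1


print(next_unused_gid([]))       # empty getgroups() result
print(next_unused_gid([0]))      # root with only group 0
print(next_unused_gid([0, 27]))
```

On Python 3.4 and later, max(groups, default=0) + 1 expresses the same fallback without the "or" idiom.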
[1] eg http://linux.die.net/man/2/getgroups ChrisA From rosuav at gmail.com Tue Jan 14 05:09:51 2014 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 14 Jan 2014 15:09:51 +1100 Subject: [Python-Dev] Test failures when running as root In-Reply-To: <52D4AC0D.2000407@mrabarnett.plus.com> References: <52D4AC0D.2000407@mrabarnett.plus.com> Message-ID: On Tue, Jan 14, 2014 at 2:16 PM, MRAB wrote: > Alternatively: > > g = max(self.saved_groups, [1]) > > or even: > > g = max(self.saved_groups or [1]) Patch created and tracker issue opened. I've used something similar to MRAB's idea as it looks compact. Thanks all! http://bugs.python.org/issue20249 Is the patch in the right format? I'm not familiar with hg, so I just looked up a git<->hg Rosetta Stone that told me to use 'hg export'. The patch works with 'hg import' on 2.7 and 3.3. ChrisA From stephen at xemacs.org Tue Jan 14 05:58:56 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 14 Jan 2014 13:58:56 +0900 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D45016.3060708@g.nevcal.com> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D2E9AA.4010308@stoneleaf.us> <52D31049.1090708@g.nevcal.com> <87ob3geoa9.fsf@uwakimon.sk.tsukuba.ac.jp> <52D38BB6.1000100@g.nevcal.com> <87fvosdjt5.fsf@uwakimon.sk.tsukuba.ac.jp> <52D45016.3060708@g.nevcal.com> Message-ID: <87eh4bdur3.fsf@uwakimon.sk.tsukuba.ac.jp> Glenn Linderman writes: > On 1/13/2014 6:43 AM, Stephen J. Turnbull wrote: >> Glenn Linderman writes: >>> "smuggled binary" (great term borrowed from a different >>> subthread) muddies the waters of what you are dealing with. >> Not really. The "mud" is one or more of the serious deficiencies. >> It can be removed, I believe (and Nick apparently does, too). >> "asciistr" is one way to try that. > Yes really. 
Use of smuggled binary means the str containing it can > no longer be treated completely as a str. That is "muddier" than > having a str that is only a str. You don't seem to understand what *asciistr* is: it's a *different type* that is simultaneously compatible in operation with bytes and str, by automatically converting to whichever it is used with. If we used asciistr, str would no longer be muddy (except in cases where we would have used surrogateescape anyway). You also don't seem to understand that bytes are conceptually pure mud. Anything that is pushed to bytes because you don't know what type it is (or because at the time the program is written, the type can't be known) is no longer subject to duck-typing. So the question is "how is mud best handled?" Obviously, incorporating it in str with .decode('latin1') is inappropriate. However, if you use .decode('ascii') you have your choice of error handlers. If you use errors='strict' then no mud can get in. Use of any other error handler is obviously a "consenting adults" behavior; it should only be done when you expect that you can keep the muddy str from leaking into places where it might be passed to an I/O function. (Note that the internal processing of an application that never outputs such a str is completely conformant to the Unicode Standard. That's not a goal of Python, since surrogateescape is designed to be used on output too. But if the developer applies that standard to each *program component*, he's going to be in pretty good shape.) If you use asciistr, then you're pretty much in complete control. The exception is operations that munge individual characters (case conversion). If you have a protocol with ASCII keywords but their case is specified, you'll need to define another type to remove the case-munging methods if you want that level of safety. If, as in your proposal, bytes are tagged with descriptions, you are effectively creating types on the fly. 
But if the program doesn't anticipate that, they're mud. If the program doesn't anticipate all of them those descriptions that are unhandled become mud, too. ITSM that the "syntax descriptor" feature is already present in Python, and it's called "class". So, IMHO, simply converting to an appropriate Python type on input is what should be done, but in any case, I don't see how adding a "syntax descriptor" attribute to bytes is going to improve the situation significantly. Note that such a class can postpone parsing for efficiency or lack of information reasons, and store the object as bytes until needed. But this is not the same as passing around naked bytes, because the class can ensure that bytes can't get out, only parsed objects. From guido at python.org Tue Jan 14 06:03:51 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Jan 2014 21:03:51 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: On Mon, Jan 13, 2014 at 6:25 PM, Terry Reedy wrote: > On 1/13/2014 4:32 PM, Guido van Rossum wrote: > >> I will doggedly keep posting to this thread rather than creating more >> threads. > > Please permit to to doggedly keep pointing you toward the possible solution > I posted on the tracker last October. You're talking about http://bugs.python.org/issue3982 right? >> But formatb() feels absurd to me. PEP 460 has neither a precise >> specification or any actual examples, so I can't tell whether the > > Two days ago, I reposted byteformat() here on pydev with a precise text > specification added to the code, and with an expanded test example. I have > just added another example based on your question below. That new example hasn't made it to my inbox yet, and I don't see anything very recent in that issue either. But I don't think it matters. 
>> intention is that the format string can *only* contain {...} sequences >> or whether it can also contain "regular" characters. Translating to >> formatb(), my question comes down to the legality of the following >> example: >> >> b'Hello, {}'.formatb(name) # Where name is some bytes object >> >> If this is allowed, it reintroduces the ASCII bias (since the >> substring 'Hello' is clearly ASCII). > > Since byteformat() uses re to find {} replacement fields, it > only has such ascii bias as re has, which I believe is not much, if any. As > far as re and byteformat are concerned, everything outside of the {...} > fields is uninterpreted bytes. As far as bytes.join is concerned, both > joiner and joined are uninterpreted bytes. > >>>> byteformat(b'\x00{}\x02{}def', (b'\x01', b'abc',)) > b'\x00\x01\x02abcdef' > > re.split produces [b'\x00', b'', b'\x02', b'', b'def']. The only ascii bias > is the one already present is the representation of bytes, and the fact that > Python code must have an ascii-compatible encoding. I don't think it's that easy. Just searching for '{' is enough to break in surprising ways unless the format string is encoded in an ASCII superset. I can think of two easy examples to illustrate this (they're similar to the example I posted here before about the essential ASCII-ness of %c). First, let's consider EBCDIC. The '{' character in ASCII is hex 7B (decimal 123). I looked it up (http://en.wikipedia.org/wiki/EBCDIC) and that is the '#' character in EBCDIC. Surprised yet? Next, let's consider UTF-16. This encoding uses two bytes per character (except for surrogates), so any character whose top half or bottom half happens to be 7B hex will cause an incorrect hit for your regular expression. Ouch. Of course, nobody in their right mind would use a format string containing UTF-16 or EBCDIC. And that is precisely my point. 
When you're using a format string, all of the format string (not just the part between { and }) had better use ASCII or an ASCII superset. And this (rightly) constrains the output to an ASCII superset as well. > The advantage of > byteformat(b'\x00{}\x02{}def', (b'\x01', b'abc',)) > over directly writing > b''.join([b'\x00', b'\x01', b'\x02', b'abc', b'def'] > is that one does not have to manually split the presumably constant template > into chunks and interleave them with the presumable variable chunks. Yes. And that's a great feature when the output is a known encoding that's an ASCII superset. But a terrible idea when the encoding is unconstrained. > Here is the example that I used for testing, including non-blank format > specs. > > bformat = b"bytes: {}; bytearray: {:}; unicode: {:s}; int: {:5d}; float: > {:7.2f}; end" > objects = (b'abc', bytearray(b'def'), u'ghi', 123, 12.3) > result = byteformat(bformat, objects) >>>> > b'bytes: abc; bytearray: def; unicode: ghi; int: 123; float: 12.30; end' No surprises here. And in fact I think this is the desired outcome. > The additional advantage here is the automatic encoding of formatted strings > to bytes. As posted, byteformat() uses the str.encode defaults > (encoding='utf-8', errors='strict'). But as I said in the post, these could > become parameters to the function that are passed on to str.encode. As long as that encoding is an ASCII superset this might be useful. > The design reuses re.split, bytes.join, format, and the format > specification. By re-using the format-spec as is, the only new thing to > learn is that blank specs correspond to bytes instead of strings. This is > easier to design, implement, and learn than if the format-spec is limited to > disallow some things (after much bike-shedding over what to eliminate ;-). > > I would appreciate your comment on this proposal. 
It seems to be a bit weak on the bytes encoding -- I would like to see an explicit format code for those (your code looks a little clever in this area). Others will probably object that it makes it too easy to encode text by default, although I'm not sure it matters, given that the behavior is quite different from Python 2's broken treatment of interpolating Unicode in an 8-bit format string. All in all it mostly looks like a sane spec though. -- --Guido van Rossum (python.org/~guido) From guido at python.org Tue Jan 14 06:06:37 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Jan 2014 21:06:37 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <87eh4bdur3.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D2E9AA.4010308@stoneleaf.us> <52D31049.1090708@g.nevcal.com> <87ob3geoa9.fsf@uwakimon.sk.tsukuba.ac.jp> <52D38BB6.1000100@g.nevcal.com> <87fvosdjt5.fsf@uwakimon.sk.tsukuba.ac.jp> <52D45016.3060708@g.nevcal.com> <87eh4bdur3.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Sorry to butt in, but can you post a link to the asciistr code? Google has too many hits for other things to be useful to find it, it seems. -- --Guido van Rossum (python.org/~guido) From stephen at xemacs.org Tue Jan 14 06:15:03 2014 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 14 Jan 2014 14:15:03 +0900 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D47F6E.40904@canterbury.ac.nz> References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> Message-ID: <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > I don't think of my viewpoint as being PBP. That term > assumes there is purity there to be beaten. To my mind, > any notion of purity with respect to bytes objects > went out the window as soon as it was given a pile > of text methods -- together with a text-like literal > syntax and default repr(), even though at least half > the time they're completely inappropriate! Isn't an analogous statement true of every programming language taken as a whole? Does that mean that, because Python 1 text handling was unavoidably "practical", adding the "purist" unicode type in Python 2 was a mistake? Python 3's sacrifice of Python 2 compatibility seems positively degenerate by your standard! To be less contentious, surely the concept of "purity" includes "purification" (even if that doesn't apply to some subdivisions of purity)? In any case, taking your statement at face value, I consider adding the methods to have been a mistake and the literal syntax and repr to be compact abbreviations that are frequently convenient. Byte sequences that can be considered to serializations of objects including ASCII text, so that some subsequences of bytes in the range 0-127 can be usefully considered as a text representation are very common. But I think that they're important enough that their representation in Python deserves a type (maybe more than one) that tries to enforce what regularities there are in such streams. 
The purity position is probably going to lose in the end, since Guido is clearly in the PBP camp at this point, and that's a strong indicator (especially since Nick has given up on convincing python-dev). But that does not mean it's entirely invalid. From v+python at g.nevcal.com Tue Jan 14 06:22:59 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 13 Jan 2014 21:22:59 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D2E9AA.4010308@stoneleaf.us> <52D31049.1090708@g.nevcal.com> <87ob3geoa9.fsf@uwakimon.sk.tsukuba.ac.jp> <52D38BB6.1000100@g.nevcal.com> <87fvosdjt5.fsf@uwakimon.sk.tsukuba.ac.jp> <52D45016.3060708@g.nevcal.com> <87eh4bdur3.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D4C9B3.5080301@g.nevcal.com> On 1/13/2014 9:06 PM, Guido van Rossum wrote: > Sorry to butt in, but can you post a link to the asciistr code? Google > has too many hits for other things to be useful to find it, it seems. > https://github.com/jeamland/asciicompat From v+python at g.nevcal.com Tue Jan 14 06:23:40 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 13 Jan 2014 21:23:40 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: <52D4C9DC.1080307@g.nevcal.com> On 1/13/2014 9:03 PM, Guido van Rossum wrote: > Of course, nobody in their right mind would use a format string > containing UTF-16 or EBCDIC. And that is precisely my point. When > you're using a format string, all of the format string (not just the > part between { and }) had better use ASCII or an ASCII superset. And > this (rightly) constrains the output to an ASCII superset as well.
+1000 From ncoghlan at gmail.com Tue Jan 14 06:25:29 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 14 Jan 2014 15:25:29 +1000 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: On 14 January 2014 15:03, Guido van Rossum wrote: > I don't think it's that easy. Just searching for '{' is enough to > break in surprising ways unless the format string is encoded in an > ASCII superset. I can think of two easy examples to illustrate this > (they're similar to the example I posted here before about the > essential ASCII-ness of %c). > > First, let's consider EBCDIC. The '{' character in ASCII is hex 7B > (decimal 123). I looked it up (http://en.wikipedia.org/wiki/EBCDIC) > and that is the '#' character in EBCDIC. Surprised yet? > > Next, let's consider UTF-16. This encoding uses two bytes per > character (except for surrogates), so any character whose top half or > bottom half happens to be 7B hex will cause an incorrect hit for your > regular expression. Ouch. > > Of course, nobody in their right mind would use a format string > containing UTF-16 or EBCDIC. And that is precisely my point. When > you're using a format string, all of the format string (not just the > part between { and }) had better use ASCII or an ASCII superset. And > this (rightly) constrains the output to an ASCII superset as well. In case it got lost amongst the various threads, this was the argument that finally convinced me that interpolation *inherently* assumes an ASCII compatible encoding: the assumption of ASCII compatibility is embedded in the design of the formatting syntax for both printf-style formatting and the format methods.
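The two examples in the quoted argument can be checked directly. The sketch below uses Python's cp037 codec as one representative EBCDIC variant (an assumption; other EBCDIC code pages differ in places) and the character U+017B as a UTF-16 example whose low byte is 0x7B. The helper name is hypothetical.

```python
def scan_finds_brace(data):
    """Return True if a byte-level search for b'{' (0x7B) fires on data."""
    return b'{' in data


# EBCDIC: byte 0x7B is '#', and the real '{' lives at a different byte.
ebcdic_hash = '#'.encode('cp037')
ebcdic_brace = '{'.encode('cp037')

# UTF-16: U+017B encodes little-endian as the bytes 7B 01.
utf16_sample = '\u017b'.encode('utf-16-le')

# UTF-8 is an ASCII superset: non-ASCII characters only use bytes >= 0x80.
utf8_sample = '\u017b'.encode('utf-8')

print(scan_finds_brace(ebcdic_hash))   # false hit on '#'
print(scan_finds_brace(ebcdic_brace))  # misses the real brace
print(scan_finds_brace(utf16_sample))  # false hit inside one character
print(scan_finds_brace(utf8_sample))   # no false hit
```

Only the ASCII-superset case behaves as a format-string parser needs, which is the constraint the argument above arrives at.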
That places interpolation support squarely in the same category as all the other bytes methods that inherently assume ASCII, and thus remains consistent with the Python 3 text model. Originally I was thinking that the ASCII assumption applied only if one of the passed in *values* needed to be implicitly encoded as ASCII, without accounting for the fact that the parser itself assumed ASCII compatibility when searching for formatting metacharacters. Once Guido pointed out that oversight on my part, my objections collapsed, since this observation makes it clear that there's *no* coherent way to offer a pure binary interpolation API - the only general purpose combination mechanism for segments of binary data that can avoid making assumptions about the encodings of metacharacters is simple concatenation. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ethan at stoneleaf.us Tue Jan 14 06:10:31 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Jan 2014 21:10:31 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D2E9AA.4010308@stoneleaf.us> <52D31049.1090708@g.nevcal.com> <87ob3geoa9.fsf@uwakimon.sk.tsukuba.ac.jp> <52D38BB6.1000100@g.nevcal.com> <87fvosdjt5.fsf@uwakimon.sk.tsukuba.ac.jp> <52D45016.3060708@g.nevcal.com> <87eh4bdur3.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D4C6C7.3030505@stoneleaf.us> On 01/13/2014 09:06 PM, Guido van Rossum wrote: > > Sorry to butt in, but can you post a link to the asciistr code? Google > has too many hits for other things to be useful to find it, it seems. 
https://github.com/jeamland/asciicompat -- ~Ethan~ From ethan at stoneleaf.us Tue Jan 14 06:12:20 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Jan 2014 21:12:20 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D2E9AA.4010308@stoneleaf.us> <52D31049.1090708@g.nevcal.com> <87ob3geoa9.fsf@uwakimon.sk.tsukuba.ac.jp> <52D38BB6.1000100@g.nevcal.com> <87fvosdjt5.fsf@uwakimon.sk.tsukuba.ac.jp> <52D45016.3060708@g.nevcal.com> <87eh4bdur3.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D4C734.3090109@stoneleaf.us> On 01/13/2014 09:06 PM, Guido van Rossum wrote: > In contrast, here's the tests I drew up for what I thought bytes should do for us (no code, just tests): https://bitbucket.org/stoneleaf/bytestring -- ~Ethan~ From ncoghlan at gmail.com Tue Jan 14 06:34:45 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 14 Jan 2014 15:34:45 +1000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 14 January 2014 15:15, Stephen J. Turnbull wrote: > The purity position is probably going to lose in the end, since Guido > is clearly in the PBP camp at this point, and that's a strong > indicator (especially since Nick has given up on convincing > python-dev). But that does not mean it's entirely invalid. I didn't give up regarding PEP 460 - Guido pointed out an error in my assumptions that made my position invalid, and his correct. 
"Give up" makes it sound like I got tired of arguing without being convinced rather than admitting I was just plain wrong. While I'll still work on the asciistr proposal, that's unrelated to PEP 460 - it's about making hybrid APIs less painful to write in Python 3 when you're willing to place the burden of ensuring ASCII compatibility of binary data on the calling code. That kind of thing is likely to be a reasonable approach in specific domains (when writing a web development framework, for example), even though I think it's an *in*appropriate design for the standard library. PEP 460 should actually make asciistr easier in the long run, as I now expect we'll run into some "interesting" issues getting formatting to produce anything other than text (contrary to what I said elsewhere in these threads - I hadn't thought through the full implications at the time). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ethan at stoneleaf.us Tue Jan 14 06:49:09 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Jan 2014 21:49:09 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D4C734.3090109@stoneleaf.us> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D2E9AA.4010308@stoneleaf.us> <52D31049.1090708@g.nevcal.com> <87ob3geoa9.fsf@uwakimon.sk.tsukuba.ac.jp> <52D38BB6.1000100@g.nevcal.com> <87fvosdjt5.fsf@uwakimon.sk.tsukuba.ac.jp> <52D45016.3060708@g.nevcal.com> <87eh4bdur3.fsf@uwakimon.sk.tsukuba.ac.jp> <52D4C734.3090109@stonel eaf.us> Message-ID: <52D4CFD5.5020704@stoneleaf.us> On 01/13/2014 09:12 PM, Ethan Furman wrote: > On 01/13/2014 09:06 PM, Guido van Rossum wrote: >> > > In contrast, here's the tests I drew up for what I thought bytes should do for us (no code, just tests): > > https://bitbucket.org/stoneleaf/bytestring Ugh. Ignore for now, I need to update them to reflect the recent developments. 
:/ -- ~Ethan~ From v+python at g.nevcal.com Tue Jan 14 07:01:35 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 13 Jan 2014 22:01:35 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <87eh4bdur3.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140111193226.23cc771d@fsol> <52D2C100.8010409@stoneleaf.us> <52D2CAED.7090502@stoneleaf.us> <52D2DE70.7080105@stoneleaf.us> <52D2E9AA.4010308@stoneleaf.us> <52D31049.1090708@g.nevcal.com> <87ob3geoa9.fsf@uwakimon.sk.tsukuba.ac.jp> <52D38BB6.1000100@g.nevcal.com> <87fvosdjt5.fsf@uwakimon.sk.tsukuba.ac.jp> <52D45016.3060708@g.nevcal.com> <87eh4bdur3.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D4D2BF.6050404@g.nevcal.com> On 1/13/2014 8:58 PM, Stephen J. Turnbull wrote: > Glenn Linderman writes: > > On 1/13/2014 6:43 AM, Stephen J. Turnbull wrote: > >> Glenn Linderman writes: > > >>> "smuggled binary" (great term borrowed from a different > >>> subthread) muddies the waters of what you are dealing with. > > >> Not really. The "mud" is one or more of the serious deficiencies. > >> It can be removed, I believe (and Nick apparently does, too). > >> "asciistr" is one way to try that. > > > Yes really. Use of smuggled binary means the str containing it can > > no longer be treated completely as a str. That is "muddier" than > > having a str that is only a str. > > You don't seem to understand what *asciistr* is: it's a *different > type* that is simultaneously compatible in operation with bytes and > str, by automatically converting to whichever it is used with. If we > used asciistr, str would no longer be muddy (except in cases where we > would have used surrogateescape anyway). No, I haven't fully understood what asciistr is, only Nick's several descriptions of it. I do understand it is a different type, and can interact with both bytes and str. If it automatically converts, then it sounds terribly inefficient with long data, but I didn't hear Nick say that, but maybe I missed it. 
You mentioned asciistr in the snippet above, but most of what you have been writing about smuggled binary was using str... I hadn't grokked that you were now a full-fledged proponent of asciistr, and were now proposing to put your smuggled binary into asciistr. > You also don't seem to understand that bytes are conceptually pure > mud. Anything that is pushed to bytes because you don't know what > type it is (or because at the time the program is written, the type > can't be known) is no longer subject to duck-typing. If you are talking str, then bytes are mud. If you are talking bytes, then str is mud. I wouldn't think of "pushing something to bytes" (whatever that means) because I don't know what it is... I may manipulate bytes because I know what they are, and that is the most appropriate form for that piece of data for the present manipulations; if something is text, I want to transform the bytes to str if I need to manipulate it, parse it, or present it. If I don't know what something is, it is because it didn't meet my expectations of what it should be, and I want to present an error, which may include some representation (probably hex) of some of the bytes that cannot be understood. But if I'm "pushing to bytes", which I would interpret as creating a byte stream, then I know what I have, and I need to convert it to bytes either to store it in a file, or communicate it to another process. That's far from not knowing what it is. > So the question is "how is mud best handled?" Obviously, > incorporating it in str with .decode('latin1') is inappropriate. Glad to hear you say that; I thought that was what you were promoting, when you said, in an earlier message: On 1/12/2014 4:08 PM, Stephen J. Turnbull wrote: > Glenn Linderman writes: > > > the proposals to embed binary in Unicode by abusing Latin-1 > > encoding. > > Those aren't "proposals", they are currently feasible techniques in > Python 3 for *some* use cases. Back to this one, though.
> However, if you use .decode('ascii') you have your choice of error > handlers. If you use errors='strict' then no mud can get in. Use of > any other error handler is obviously a "consenting adults" behavior; > it should only be done when you expect that you can keep the muddy str > from leaking into places where it might be passed to an I/O function. > (Note that the internal processing of an application that never > outputs such a str is completely conformant to the Unicode Standard. > That's not a goal of Python, since surrogateescape is designed to be > used on output too. But if the developer applies that standard to > each *program component*, he's going to be in pretty good shape.) > > If you use asciistr, then you're pretty much in complete control. > The exception is operations that munge individual characters (case > conversion). If you have a protocol with ASCII keywords but their > case is specified, you'll need to define another type to remove the > case-munging methods if you want that level of safety. The above doesn't sound like a use case I care about, much. If I get a garbled file without an accurate definition of what it contains, then I probably want to stick it in the trash. The only "processing" that can be done is to pass on the garbage to someone else, and stink up their system, and that can be done purely as bytes. > If, as in your proposal, bytes are tagged with descriptions, you are > effectively creating types on the fly. But if the program doesn't > anticipate that, they're mud. Interpreting a file format or wire protocol requires parsing and manipulating an incoming byte stream, and converting it to useful types in the program... if it can't be converted to useful types, then why bother parsing it? 
So the rest of my discussion was not talking about creating types on the fly, but on a systematic way of converting a well-specified byte stream (file format, or wire protocol) to a collection of useful types, in an organized manner, that might be verifiable, rather than with ad-hoc coding. And similarly in reverse... after manipulating the objects to perform useful transformations, possibly based on user input (that's what a program does), then to write them back out to a byte stream in modified form, in an organized manner, that might be verifiable, rather than with ad-hoc coding. > If the program doesn't anticipate all > of them, those descriptions that are unhandled become mud, too. ITSM > that the "syntax descriptor" feature is already present in Python, and > it's called "class". So, IMHO, simply converting to an appropriate > Python type on input is what should be done, but in any case, I don't > see how adding a "syntax descriptor" attribute to bytes is going to > improve the situation significantly. Syntax descriptors would be a description of the substructures of a file format (think TIFF files) or wire protocol, and might allow parsing of binary files similarly to the way computer languages are parsed, producing errors when encountering mud. What you dismiss as "converting to an appropriate Python type on input" can be quite complex for complex file formats, but it is the process of converting to such a hierarchy of Python objects that was to be described by the syntax descriptors. > Note that such a class can postpone parsing for efficiency or lack of > information reasons, and store the object as bytes until needed. But > this is not the same as passing around naked bytes, because the class > can ensure that bytes can't get out, only parsed objects. Sure, it could.
My proposal is suggesting that the distribution of bytes to objects in a hierarchy might be automated in the sense of parsing the binary format, so that instead of writing "a class" for the whole, that class would be pre-written, based on the syntax description of the file, and matching that with the syntax descriptions of the component types. It is really a topic for python ideas, to flesh it out further, but it seemed related, as a use case, a class that would live on the bytes processing boundary, producing other objects, some of which may be text strings, in an organized, probably hierarchical, collection of objects. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Jan 14 07:04:06 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Jan 2014 22:04:06 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Jan 13, 2014 at 9:34 PM, Nick Coghlan wrote: > On 14 January 2014 15:15, Stephen J. Turnbull wrote: >> The purity position is probably going to lose in the end, since Guido >> is clearly in the PBP camp at this point, and that's a strong >> indicator (especially since Nick has given up on convincing >> python-dev). But that does not mean it's entirely invalid. > > I didn't give up regarding PEP 460 - Guido pointed out an error in my > assumptions that made my position invalid, and his correct. "Give up" > makes it sound like I got tired of arguing without being convinced > rather than admitting I was just plain wrong. Thanks for that. 
(I was worried when I saw your first huge post in the reboot thread.) > While I'll still work on the asciistr proposal, that's unrelated to > PEP 460 - it's about making hybrid APIs less painful to write in > Python 3 when you're willing to place the burden of ensuring ASCII > compatibility of binary data on the calling code. That kind of thing > is likely to be a reasonable approach in specific domains (when > writing a web development framework, for example), even though I think > it's an *in*appropriate design for the standard library. I've now looked at asciistr. (Thanks Glenn and Ethan for the link.) Now that I (hopefully) understand it, I'm worried that a text processing algorithm that uses asciistr might under hard-to-predict circumstances (such as when the arguments contain nothing of interest to the algorithm) might return an asciistr instance instead of a str or bytes instance, and this might confuse a caller (e.g. isinstance() checks might fail, dict lookups, or whatever -- it feels like the problem is similar to creating the perfect proxy type). > PEP 460 should actually make asciistr easier in the long run, as I now > expect we'll run into some "interesting" issues getting formatting to > produce anything other than text (contrary to what I said elsewhere in > these threads - I hadn't thought through the full implications at the > time). For example? 
-- --Guido van Rossum (python.org/~guido) From v+python at g.nevcal.com Tue Jan 14 07:37:17 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 13 Jan 2014 22:37:17 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: <52D4DB1D.7040508@g.nevcal.com> On 1/13/2014 9:25 PM, Nick Coghlan wrote: > since this observation makes it clear that there's*no* coherent way > to offer a pure binary interpolation API - the only general purpose > combination mechanism for segments of binary data that can avoid > making assumptions about the encodings of metacharacters is simple > concatenation. That's almost true, and I'm glad that you, Guido, and all of us can understand that the currently defined python2 and python3 formatting syntaxes contain an inherent ASCII assumption, just like many internet protocols. The bitter fight is over :) However, your statement above isn't 100% accurate, so just for the pedantry of it, I'll point out why. A mechanism could be defined where "format string" would only contain format specifications, and any other text would be considered an error. The format string could have an explicit or a defined encoding, there would be no need to make an assumption about its encoding. And since it would not contain text except for format specifications, it would only be used as a rule-book on how to interpret the parameters, contributing no text of its own to the result. This wouldn't solve the problem at hand, though, which is to provide a nice migration path from Python 2 to Python 3 for code that uses ASCII-based format strings that do contribute text as well as include parameter data. 
Whether such a technique would be more useful than simple concatenation (or complex concatenation such as join) remains to be seen, and possibly discussed, if anyone is interested, but it probably would belong on python-ideas, since it would not address an immediate porting issue. Assuming an ASCII-in-bytes format string (but with no contributed text to the result) one could write something like b"%{koi7}s%{00}v%{big5}d%{00}v%{ShiftJIS}s%{0000}v%b" / ( cyrillic, len( blob ), japanese, blob ) So the encodings to be applied to each of the input parameters could be explicitly specified. The %{00}v stuff would be interpolated into the output... expressed in ASCII as hex, two characters per byte. Note that the number uses Chinese digits in the big5 encoding, but I don't know if the Chinese even use their own digits or ASCII ones these days, or what base they use, I guess it was the Babylonians that used base 60 from which our timekeeping and angular measures were derived. The example shows a null byte or two between items in the output. So there _could be_ a coherent way to offer an interpolation mechanism that is pure binary, and allows selection of encoding of str data, if and as needed. One specifier could even be an encoding to apply to any format specifiers that don't include an encoding, so in the typical case of dealing with a single language output, the appropriate encoding could be set at the beginning of the format specification and overridden by particular specifiers if need be. But while there _could be_ such an interpolation mechanism, it isn't compatible with Python 2, and the jury hasn't decided whether such a thing is sufficiently more useful than concatenation to be worth implementing. A different operator might be required, or the whole thing could be a function instead of an operator, with a similar format specification, or one more like the minilanguage used with format in python 3. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Tue Jan 14 08:43:41 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 14 Jan 2014 02:43:41 -0500 Subject: [Python-Dev] Test failures when running as root In-Reply-To: <52D4AC0D.2000407@mrabarnett.plus.com> References: <52D4AC0D.2000407@mrabarnett.plus.com> Message-ID: On 1/13/2014 10:16 PM, MRAB wrote: > On 2014-01-14 03:03, Terry Reedy wrote: >> On 1/13/2014 7:48 PM, Chris Angelico wrote: >>> And now for something completely different. >>> >>> My root buildbot is finally now able to telnet out and get "Connection >>> refused" errors. (For the curious, the VirtualBox "NAT" mode doesn't >>> work properly, but the new "NAT Network" mode does. Why? I have no >>> idea. But if anyone else is having the same problem, upgrade to the >>> latest VirtualBox and set up a NAT Network. All I care is, it now >>> works.) The test suite is now failing at another point, and this >>> applies to 2.7, 3.3, and 3.x. >>> >>> ====================================================================== >>> ERROR: test_initgroups (test.test_posix.PosixGroupsTester) >>> ---------------------------------------------------------------------- >>> Traceback (most recent call last): >>> File >>> "/root/buildarea/3.x.angelico-debian-amd64/build/Lib/test/test_posix.py", >>> >>> line 1143, in test_initgroups >>> g = max(self.saved_groups) + 1 >>> ValueError: max() arg is an empty sequence >> >> try: >> g = max(self.saved_groups) + 1 >> except ValueError: >> g = 1 >> > Alternatively: > > g = max(self.saved_groups, [1]) This would be [1] instead of 1. > > or even: > > g = max(self.saved_groups or [1]) This is 1. 
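
The three spellings discussed above behave differently, which is easy to check in isolation. The following is only a sketch: the function names are illustrative and do not come from test_posix.py, and the `default=` keyword exists only from Python 3.4 on, so it would not help the 2.7 or 3.3 branches.

```python
def largest_two_args(groups):
    # max() with two positional arguments compares the two lists with
    # each other, so the result is always a list, never a group id.
    return max(groups, [1])


def largest_or_fallback(groups):
    # An empty list is falsy, so `or` substitutes [1] before max() runs.
    return max(groups or [1])


def largest_with_default(groups):
    # Python 3.4+ only: max() accepts a default for empty iterables.
    return max(groups, default=1)
```

With an empty group list the first form returns `[1]` and the other two return `1`; with `[5, 10]` the first form returns the whole list `[5, 10]`, which is why it is not a drop-in replacement.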
-- Terry Jan Reedy From ncoghlan at gmail.com Tue Jan 14 08:44:27 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 14 Jan 2014 17:44:27 +1000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 14 January 2014 16:04, Guido van Rossum wrote: > On Mon, Jan 13, 2014 at 9:34 PM, Nick Coghlan wrote: > I've now looked at asciistr. (Thanks Glenn and Ethan for the link.) > > Now that I (hopefully) understand it, I'm worried that a text > processing algorithm that uses asciistr might under hard-to-predict > circumstances (such as when the arguments contain nothing of interest > to the algorithm) might return an asciistr instance instead of a str > or bytes instance, and this might confuse a caller (e.g. isinstance() > checks might fail, dict lookups, or whatever -- it feels like the > problem is similar to creating the perfect proxy type). Right, asciistr is designed for a specific kind of hybrid API where you want to accept binary input (and produce binary output) *and* you want to accept text input (and produce text output). Porting those from Python 2 to Python 3 is painful not because of any limitations of the str or bytes API but because it's the only use case I have found where I actually *missed* the implicit interoperability offered by the Python 2 str type. 
It's not an implementation style I would consider appropriate for the standard library - we need to code very defensively in order to aid debugging in arbitrary contexts, so I consider having an API like urllib.parse demand 7-bit ASCII in the binary version, and require text to handle impure input to be a better design choice. However, in an environment where you can place greater preconditions on your inputs (such as "ensure all input data is ASCII compatible") and you're willing to tolerate the occasional obscure traceback for particular kinds of errors, then it should be a convenient way to use common constants (like separators or URL scheme names) in an algorithm that can manipulate either binary or text, but not a combination of the two (the latter is still a nice improvement in correctness over Python 2, which allowed them to be mixed freely rather than requiring consistency across the inputs). It's still slightly different from Python 2, though. In Python 2, the interaction model was: str & str -> str str & unicode -> unicode (with the one exception being str.format: that consistently produces str rather than promoting to Unicode) My goal for asciistr is that it should exhibit the following behaviour: str & asciistr -> str asciistr & asciistr -> str (making it asciistr would be a pain and I don't have a use case for that) bytes & asciistr -> bytes So in code like that in urllib.parse (but in a more constrained context), you could just switch all your constants to asciistr, change your indexing operations to length 1 slices and then in theory essentially the same code that worked in Python 2 should also work in Python 3. However, Benno is finding that my warning about possible interoperability issues was accurate - we have various places where we do PyUnicode_Check() rather than PyUnicode_CheckExact(), which means we don't always notice a PEP 3118 buffer interface if it is provided by a str subclass. 
We'll look at those as we find them, and either work around them (if we can), decide not to support that behaviour in asciistr, or else I'll create a patch to resolve the interoperability issue. It's not necessarily a type I'd recommend using in production code, as there *will* always be a more explicit alternative that doesn't rely on a tricksy C extension type that only works in CPython. However, it's a type I think is worth having implemented and available on PyPI, even if it's just to disprove the claim that you *can't* write that kind of code in Python 3. >> PEP 460 should actually make asciistr easier in the long run, as I now >> expect we'll run into some "interesting" issues getting formatting to >> produce anything other than text (contrary to what I said elsewhere in >> these threads - I hadn't thought through the full implications at the >> time). > > For example? asciistr is a str subclass, so its formatting methods currently operate in the text domain and produce str output. Getting it to do otherwise is actually a task on the scale of implementing ASCII interpolation operations on the native bytes type. This realisation was the *other* factor that made me more comfortable with the idea of adding ASCII interpolation to the core bytes type - I previously thought asciistr could easily handle it, but it doesn't (except in the pure ASCII case where it could theoretically just encode at the end), thus also knocking out my "we can easily do this in an extension type, there's no need to provide it in the builtins" argument. Cheers, Nick. 
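
For readers who have not written such a hybrid API, here is a rough sketch of the bookkeeping that asciistr is meant to remove. It is not asciistr itself and not how urllib.parse is written; `split_scheme` and `first_char` are hypothetical helpers, shown only to illustrate the type-matching constants and the "length 1 slices" point made above.

```python
def split_scheme(url):
    """Hypothetical helper: split 'scheme:rest' for either str or bytes input."""
    # The separator constant has to match the input type by hand;
    # this is the bookkeeping a type like asciistr is meant to absorb.
    sep = ":" if isinstance(url, str) else b":"
    scheme, _, rest = url.partition(sep)
    return scheme, rest


def first_char(data):
    """Return the first element with the same type as the input."""
    # data[0] would be an int for bytes, so use a length 1 slice instead.
    return data[0:1]
```

Text in gives text out and bytes in gives bytes out; mixing the two raises TypeError in Python 3 instead of silently promoting as Python 2 did.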
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From greg.ewing at canterbury.ac.nz Tue Jan 14 09:20:27 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Jan 2014 21:20:27 +1300 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D4F34B.1000507@canterbury.ac.nz> Guido van Rossum wrote: > I've now looked at asciistr. (Thanks Glenn and Ethan for the link.) > > Now that I (hopefully) understand it, I'm worried that a text > processing algorithm that uses asciistr might under hard-to-predict > circumstances (such as when the arguments contain nothing of interest > to the algorithm) might return an asciistr instance instead of a str > or bytes instance, It seems to me that any algorithm with that property has a genuine ambiguity as to what it should return in that case. Arguably, returning an asciistr would be the *right* thing to do, because that would allow it to be used as a component of a larger algorithm that was polymorphic with respect to text/bytes. 
-- Greg From greg.ewing at canterbury.ac.nz Tue Jan 14 09:23:07 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Jan 2014 21:23:07 +1300 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D4DB1D.7040508@g.nevcal.com> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> <52D4DB1D.7040508@g.nevcal.com> Message-ID: <52D4F3EB.3030408@canterbury.ac.nz> Glenn Linderman wrote: > A mechanism could be defined where > "format string" would only contain format specifications, and any other > text would be considered an error. Someone already did -- it's called struct.pack(). :-) -- Greg From tjreedy at udel.edu Tue Jan 14 09:32:20 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 14 Jan 2014 03:32:20 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: <52D4F614.5020000@udel.edu> On 1/14/2014 12:03 AM, Guido van Rossum wrote: > On Mon, Jan 13, 2014 at 6:25 PM, Terry Reedy wrote: >>>>> byteformat(b'\x00{}\x02{}def', (b'\x01', b'abc',)) >> b'\x00\x01\x02abcdef' >> >> re.split produces [b'\x00', b'', b'\x02', b'', b'def']. The only ascii bias >> is the one already present is the representation of bytes, and the fact that >> Python code must have an ascii-compatible encoding. > > I don't think it's that easy. Just searching for '{' is enough to > break in surprising ways I see your point. The punning problem (between a byte being both itself and a special indicator character) is worse with bytes formats than the similar pun with text, and the potential for mysterious bugs greater. (This is related to why we split 'text' and 'bytes' to begin with.) With text, we break the pun by doubling the character to escape the special meaning. 
This works because, 1) % and { are relatively rare in text, 2) %% and {{ are grammatically incorrect, 3) %, {, and especially %% and {{ stand out visually. With bytes, 1) there is no reason why 37 (%) and 123 ({) should be rare, 2) there is no grammatical rule against the sequences 37, 37 or 123, 123, and 3) hex escapes \x25 and \x7b, which might appear in a bytes format, do not stand out as needing doubling. My example above breaks if b'\x00' is replaced with b'\x7b'. Even if a doubling and undoubling rule were added, re.split could not be used to split the format bytes. -- Terry Jan Reedy From stephen at xemacs.org Tue Jan 14 10:11:20 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 14 Jan 2014 18:11:20 +0900 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87a9ezdj2f.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > "Give up" makes it sound like I got tired of arguing without being > convinced rather than admitting I was just plain wrong. I thought it was something in between (you explicitly said "lenient PEP 460" doesn't hurt you, but my understanding was you still believe that there's a safer way, and it's the latter you aren't going to try to convince folks of). > While I'll still work on the asciistr proposal, Thank you for that. I really wish I had time to, myself, but not for several weeks... :-( > that's unrelated to PEP 460 - it's about making hybrid APIs less "It" refers to asciistr or to PEP 460? 
> painful to write in Python 3 when you're willing to place the > burden of ensuring ASCII compatibility of binary data on the > calling code. Versus what? From nad at acm.org Tue Jan 14 10:17:07 2014 From: nad at acm.org (Ned Deily) Date: Tue, 14 Jan 2014 01:17:07 -0800 Subject: [Python-Dev] cpython (merge 2.7 -> 3.1): complain when nbytes > buflen to fix possible buffer overflow (closes #20246) References: <3f3JH11tMgz7Lln@mail.python.org> Message-ID: In article <3f3JH11tMgz7Lln at mail.python.org>, benjamin.peterson wrote: > http://hg.python.org/cpython/rev/715fd3d8ac93 > changeset: 88454:715fd3d8ac93 > branch: 3.1 > parent: 86777:b1ddcb220a7f > parent: 88453:87673659d8f7 > user: Benjamin Peterson > date: Mon Jan 13 23:06:14 2014 -0500 > summary: > complain when nbytes > buflen to fix possible buffer overflow (closes > #20246) Benjamin, I think you may have mistakenly merged from 2.7 to 3.1 here and then left the 3.1 branch open (i.e. unmerged to 3.2). -- Ned Deily, nad at acm.org From ncoghlan at gmail.com Tue Jan 14 10:39:48 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 14 Jan 2014 19:39:48 +1000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <87a9ezdj2f.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> <87a9ezdj2f.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 14 Jan 2014 19:11, "Stephen J. Turnbull" wrote: > > Nick Coghlan writes: > > > "Give up" makes it sound like I got tired of arguing without being > > convinced rather than admitting I was just plain wrong. 
> > I thought it was something in between (you explicitly said "lenient > PEP 460" doesn't hurt you, but my understanding was you still believe > that there's a safer way, and it's the latter you aren't going to try > to convince folks of). I did say that at one point (when Guido first objected to the formatb idea), but I switched to complete agreement after he pointed out the ASCII assumption embedded in the formatting syntax itself. > > > While I'll still work on the asciistr proposal, > > Thank you for that. I really wish I had time to, myself, but not for > several weeks... :-( Heh, depending on how many quirky edge cases we find, we may still be working on it by then, especially since there are still a few docs updates and other fixes I want to get into Python 3.4. > > that's unrelated to PEP 460 - it's about making hybrid APIs less > > "It" refers to asciistr or to PEP 460? asciistr > > painful to write in Python 3 when you're willing to place the > > burden of ensuring ASCII compatibility of binary data on the > > calling code. > > Versus what? Versus doing explicit decoding the way urllib.parse does - it only accepts strict 7-bit ASCII as binary input by default, so you have to decode to text externally in order to handle arbitrary input that may contain other bytes. Cheers, Nick. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Tue Jan 14 10:54:58 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 14 Jan 2014 18:54:58 +0900 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: <878uuievm5.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > Of course, nobody in their right mind would use a format string > containing UTF-16 or EBCDIC. 
How about Shift JIS and Big 5 (traditionally "mandated by Microsoft" in their respective regions, with Shift JIS still overwhelmingly popular) and GB* ("GB18030 is not just a good idea, It's The Law")? Are the Japanese and Chinese crazy by definition? This is where I get the willies -- not that you think anybody is crazy by definition, but because I personally have to live with people who use crazy encodings for interoperability reasons, in fact about half the text I process daily for work is in those encodings. Anyway, the thought makes me shiver. GB2312 text may be encoded as EUC-CN, in which case it is ASCII-compatible, so no problem. I'm not sure if that's the encoding typically denoted by "GB2312" in email, though, and in any case it's irrelevant as most emails claiming "charset=GB2312" I receive nowadays include characters from the extension repertoires of GBK or GB18030. Shift JIS, Big 5, and GBK manage to avoid non-ASCII-compatible use of all characters significant in Python %-formatting (yay!), but .format is right out because {} are used. GB18030 in principle uses far more of the code space, including all of the syntactically significant punctuation, but in practice I don't know how many of those characters are actually assigned, let alone used. > And that is precisely my point. When you're using a format string, > all of the format string (not just the part between { and }) had > better use ASCII or an ASCII superset. And this (rightly) > constrains the output to an ASCII superset as well. Except that if you interpolate something like Shift JIS, much of the ASCII really isn't ASCII. That's a general issue, of course, if you do something that requires iterated format strings, but it's far more likely to appear to work most of the time with those encodings. Of course you can say "if it hurts, don't do that", but .... 
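
The point about `{` versus `%` in Shift JIS can be checked with the standard codecs. This is a small sketch, assuming the stdlib `shift_jis` codec and scanning only the kana block; `chars_with_trail_byte` is an illustrative name, not an existing API.

```python
def chars_with_trail_byte(encoding, needle, first=0x3040, last=0x30FF):
    """Return characters in [first, last] whose encoded form contains
    `needle` somewhere after the lead byte."""
    hits = []
    for code_point in range(first, last + 1):
        char = chr(code_point)
        try:
            encoded = char.encode(encoding)
        except UnicodeEncodeError:
            continue  # unassigned, or not representable in this encoding
        if needle in encoded[1:]:
            hits.append(char)
    return hits


# Katakana BO (U+30DC) encodes as the lead byte 0x83 followed by 0x7B,
# and 0x7B is the ASCII code for "{".
brace_hits = chars_with_trail_byte("shift_jis", b"{")
percent_hits = chars_with_trail_byte("shift_jis", b"%")
```

Within this range the scan finds a character carrying a `{` byte and none carrying `%`, which is consistent with %-formatting surviving such data while a brace-scanning formatter would not.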
From ncoghlan at gmail.com Tue Jan 14 13:46:25 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 14 Jan 2014 22:46:25 +1000 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <878uuievm5.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> <878uuievm5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 14 January 2014 19:54, Stephen J. Turnbull wrote: > Guido van Rossum writes: > > And that is precisely my point. When you're using a format string, > > all of the format string (not just the part between { and }) had > > better use ASCII or an ASCII superset. And this (rightly) > > constrains the output to an ASCII superset as well. > > Except that if you interpolate something like Shift JIS, much of the > ASCII really isn't ASCII. That's a general issue, of course, if you > do something that requires iterated format strings, but it's far more > likely to appear to work most of the time with those encodings. > > Of course you can say "if it hurts, don't do that", but .... Right, that's the danger I was worried about, but the problem is that there's at least *some* minimum level of ASCII compatibility that needs to be assumed in order to define an interpolation format at all (this is the point I originally missed). For printf-style formatting, it's % along with the various formatting characters and other syntax (like digits, parentheses, variable names and "."), with the format method it's braces, brackets, colons, variable names, etc. The mini-language parser has to assume an encoding in order to interpret the format string, and that's *all* done assuming an ASCII compatible format string (which must make life interesting if you try to use an ASCII incompatible coding cookie for your source code - I'm actually not sure what the full implications of that *are* for bytes literals in Python 3).
The one remaining way I could potentially see a formatb method working is along the lines of what Glenn (I think) suggested: just like struct definitions, the formatb specifier would have to consist *solely* of substitution fields. However, that's getting awfully close to being just an alternate spelling for the struct module or bytes.join at that point, which hardly makes for a compelling case to add two new methods to a builtin type. Given that one of the concepts with the Python 3 transition was to take certain problematic constructs (like ASCII compatible interpolation directly to binary without a separate encoding step) away and decide whether or not we were happy to live without them, I think this one has proven to have sufficient staying power to finally bring it back in Python 3.5 (especially given the gain in lowering the barrier to porting Python 2 code that makes heavy use of interpolation to ASCII compatible binary formats). It's certainly a decision that has its downsides, with the potential impact on users of ASCII incompatible encodings (mostly in Asia) being the main one, but I think the increased convenience in working with ASCII compatible binary protocols and file formats is worth the cost. Cheers, Nick. 
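To make the comparison concrete, here is a rough sketch of what a fields-only formatb would be competing with, using only the existing struct module and bytes.join. The record layout (two big-endian 16-bit integers followed by a 4-byte tag) is an invented example, not a format discussed in the thread.

```python
import struct

# A record made only of "substitution fields": two unsigned 16-bit integers
# and a 4-byte tag, big-endian, with no literal text in between.
record = struct.pack(">HH4s", 1, 512, b"DATA")

# The same bytes assembled by hand with bytes.join.
joined = b"".join([
    (1).to_bytes(2, "big"),
    (512).to_bytes(2, "big"),
    b"DATA",
])

print(record == joined)
```

Both spellings already exist, which is the sense in which a fields-only formatb would add little.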
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From benjamin at python.org Tue Jan 14 15:16:54 2014 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 14 Jan 2014 06:16:54 -0800 Subject: [Python-Dev] cpython (merge 2.7 -> 3.1): complain when nbytes > buflen to fix possible buffer overflow (closes #20246) In-Reply-To: References: <3f3JH11tMgz7Lln@mail.python.org> Message-ID: <1389709014.28083.70588489.446DDC89@webmail.messagingengine.com> On Tue, Jan 14, 2014, at 01:17 AM, Ned Deily wrote: > In article <3f3JH11tMgz7Lln at mail.python.org>, > benjamin.peterson wrote: > > http://hg.python.org/cpython/rev/715fd3d8ac93 > > changeset: 88454:715fd3d8ac93 > > branch: 3.1 > > parent: 86777:b1ddcb220a7f > > parent: 88453:87673659d8f7 > > user: Benjamin Peterson > > date: Mon Jan 13 23:06:14 2014 -0500 > > summary: > > complain when nbytes > buflen to fix possible buffer overflow (closes > > #20246) > > Benjamin, I think you may have mistakenly merged from 2.7 to 3.1 here > and then left the 3.1 branch open (i.e. unmerged to 3.2). The name of the game is graft-gone-horribly-wrong. I think we can just ignore it, since 3.1 is on its last legs anyway. From rdmurray at bitdance.com Tue Jan 14 16:58:49 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Tue, 14 Jan 2014 10:58:49 -0500 Subject: [Python-Dev] magic method __bytes__ In-Reply-To: <52D4951E.3080104@stoneleaf.us> References: <52D4951E.3080104@stoneleaf.us> Message-ID: <20140114155850.221D425003F@webabinitio.net> On Mon, 13 Jan 2014 17:38:38 -0800, Ethan Furman wrote: > Has anyone actually used __bytes__ yet? What for? bytes(email.message.Message()) returns the message object serialized to "wire format". --David PS: I've always thought of "wire format" as *including* files...a file is just a "wire" with an indefinite destination and transmission time.... 
From guido at python.org Tue Jan 14 16:59:44 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 14 Jan 2014 07:59:44 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D4F34B.1000507@canterbury.ac.nz> References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> <52D4F34B.1000507@canterbury.ac.nz> Message-ID: On Tue, Jan 14, 2014 at 12:20 AM, Greg Ewing wrote: > Guido van Rossum wrote: >> >> I've now looked at asciistr. (Thanks Glenn and Ethan for the link.) >> >> Now that I (hopefully) understand it, I'm worried that a text >> processing algorithm that uses asciistr might under hard-to-predict >> circumstances (such as when the arguments contain nothing of interest >> to the algorithm) might return an asciistr instance instead of a str >> or bytes instance, > > > It seems to me that any algorithm with that property > has a genuine ambiguity as to what it should return > in that case. Arguably, returning an asciistr would > be the *right* thing to do, because that would allow > it to be used as a component of a larger algorithm > that was polymorphic with respect to text/bytes. Here's an example of what I mean: def spam(a): r = asciistr('(') if a: r += a.strip() r += asciistr(')') return r The argument must be a string. If I call spam(''), a's type is never concatenated with r, so the return value is an asciistr. To fix this particular case, we could drop the "if a:" part. But it could be more significant, e.g. it could be something like "if a contains any digits". The general fix would be to add else: r += a[:0] but that's still an example of the awkwardness that asciistr() is trying to avoid. 
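For comparison, a sketch of the same function in plain Python 3 without asciistr, where the a[:0] trick supplies an empty value of the argument's own type. The isinstance branch that picks the parenthesis literals is an assumption about how one would write this by hand; it is not something proposed in the thread.

```python
def spam(a):
    """Wrap a str or bytes value in parentheses, preserving its type."""
    # Pick literals of the same type as the argument.
    if isinstance(a, str):
        lparen, rparen = "(", ")"
    else:
        lparen, rparen = b"(", b")"

    # a[:0] is an empty str or empty bytes, matching the type of a, so the
    # result type is fixed even when the "if a:" branch is skipped.
    r = a[:0]
    r += lparen
    if a:
        r += a.strip()
    r += rparen
    return r


print(repr(spam("  text  ")))
print(repr(spam(b"")))
```

The type dispatch is exactly the boilerplate that asciistr is meant to remove, which is why the empty-argument case matters.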
-- --Guido van Rossum (python.org/~guido) From brett at python.org Tue Jan 14 17:30:01 2014 From: brett at python.org (Brett Cannon) Date: Tue, 14 Jan 2014 11:30:01 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: On Mon, Jan 13, 2014 at 5:14 PM, Guido van Rossum wrote: > On Mon, Jan 13, 2014 at 2:05 PM, Brett Cannon wrote: > > I have been going on the assumption that bytes.format() would change what > > '{}' meant for itself and would only interpolate bytes. That convenient > > between Python 2 and 3 since it represents what we want it to (str and > bytes > > under the hood, respectively), so it just falls through. We could also > add a > > 'b' conversion for bytes() explicitly so as to help people not > accidentally > > mix up things in bytes.format() and str.format(). But I was not > suggesting > > adding a specific format spec for bytes but instead making bytes.format() > > just do the .encode('ascii') automatically to help with compatibility > when a > > format spec was present. If people want fancy formatting for bytes they > can > > always do it themselves before calling bytes.format(). > > This seems hastily written (e.g. verb missing :-), and I'm not clear > on what you are (or were) actually proposing. When exactly would > bytes.format() need .encode('ascii')? > > I would be happy to wait a few hours or days for you to to write it up > clearly, rather than responding in a hurry. Sorry about that. Busy day at work + trying to stay on top of this entire conversation was a bit tough. Let me try to lay out what I'm suggesting for bytes.format() in terms of how it changes http://docs.python.org/3/library/string.html#format-string-syntax for bytes. 1. New conversion operator of 'b' that operates as PEP 460 specifies (i.e. tries to get a buffer, else calls __bytes__). The default conversion changes from 's' to 'b'. 2. 
Use of the conversion field adds an extra step of calling str.encode('ascii', 'strict') on the result returned from calling __format__(). That's it. So point 1 means that the following would work in Python 3.5:: b'Hello, {}, how are you?'.format(b'Guido') b'Hello, {!b}, how are you?'.format(b'Guido') It would produce an error if you used a text argument for 'Guido' since str doesn't define __bytes__ or a buffer. That gives the EIBTI group their bytes.format() where nothing magical happens. For point 2, let's say you have the following in Python 2:: 'I have {} bottles of beer on the wall'.format(10) Under my proposal, how would you change it to get the same result in Python 2 and 3?:: b'I have {:d} bottles of beer on the wall'.format(10) In Python 2 you're just being more explicit about the format, otherwise it's the same semantics as today. In Python 3, though, this would translate into (under the hood):: b'I have {} bottles of beer on the wall'.format(format(10, 'd').encode('ascii', 'strict')) This leads to the same bytes value in Python 2 (since it's just a string) and in Python 3 (as everything accepted by bytes.format() is either bytes already or converted to bytes by encoding to ASCII). While Python 2 users would need to make sure they used a format spec to get the same result in both Python 2 and 3 for ASCII bytes, it's a minor change which also makes the format more explicit, so it's not an inherently bad thing. And for those that don't want to utilize the automatic ASCII encoding they can just not use a format spec in the format string and just pass in bytes directly (i.e. call __format__() themselves and then call str.encode() on their own). So PBP people get to have a simple way to use bytes.format() in Python 2 and 3 when dealing with things that can be represented as ASCII (just as the bytes methods allow for currently). 
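The "under the hood" step in point 2 can be emulated today with ordinary Python 3. A minimal sketch, where ascii_field is a hypothetical helper name and plain bytes concatenation stands in for the proposed bytes.format(), which does not exist in Python 3:

```python
def ascii_field(value, format_spec):
    """Format value as text, then encode the result strictly as ASCII."""
    return format(value, format_spec).encode("ascii", "strict")


# Equivalent of the proposed b'I have {:d} bottles ...'.format(10).
line = b"I have " + ascii_field(10, "d") + b" bottles of beer on the wall"
print(line)

# A value whose formatted text is not ASCII is rejected rather than
# silently mangled.
try:
    ascii_field("\u0394", "s")
except UnicodeEncodeError as exc:
    print("rejected:", exc.reason)
```

The failure depends on the value rather than the type of the argument, which is the property debated later in the thread.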
I think this covers your desire to have numbers and anything else that can be represented as ASCII be supported for easy porting while covering my desire that any automatic encoding is clearly explicit in the format string and in no way special-cased for only some types (the introduction of a 'c' converter from PEP 460 is also fine with me). How you would want to translate this proposal with the % operator I'm not sure since it has been quite a while since I last seriously used it and so I don't think I'm in a good position to propose a shift for it. From yselivanov.ml at gmail.com Tue Jan 14 18:29:58 2014 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 14 Jan 2014 12:29:58 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: Brett, I like your proposal. There is one idea I have that could, perhaps, improve it: 1. "%s" and "{}" will continue to work for bytes and bytearray in the following fashion: - check if __bytes__/Py_buffer supported. - if it is, check that the bytes are strictly in the printable ASCII-subset (a-z, A-Z, 0-9 + special symbols like ! etc). Throw an error if the check fails. If not - concatenate. - Try str(), and do ".encode('ascii', 'strict')" on the result. This way *most* of the use cases of python2 will be covered without touching the code. So: - b'Hello {}'.format('world') will be the same as b'Hello ' + str('world').encode('ascii', 'strict') - b'Hello {}'.format('\u0394') will throw UnicodeEncodeError - b'Status: {}'.format(200) will be the same as b'Status: ' + str(200).encode('ascii', 'strict') - b'Hello %s' % ('world',) - the same as the first example - b'Connection: {}'.format(b'keep-alive') - works - b'Hello %s' % (b'\xce\x94',) - will fail, not ASCII subset we accept I think it's OK to check the buffers for ASCII-subset only. Yes, it will have some sort of sub-optimal performance, but then, it's quite rare when string formatting is used to concatenate huge buffers. 2. new operators {!b} and %b. These will just use "__bytes__" and Py_buffer. -- Yury Selivanov From brett at python.org Tue Jan 14 18:47:15 2014 From: brett at python.org (Brett Cannon) Date: Tue, 14 Jan 2014 12:47:15 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: On Tue, Jan 14, 2014 at 12:29 PM, Yury Selivanov wrote: > Brett, > > > I like your proposal. There is one idea I have that could, > perhaps, improve it: > > > 1. "%s" and "{}" will continue to work for bytes and bytearray in > the following fashion: > > - check if __bytes__/Py_buffer supported. > - if it is, check that the bytes are strictly in the printable > ASCII-subset (a-z, A-Z, 0-9 + special symbols like ! etc). > Throw an error if the check fails. If not - concatenate. > - Try str(), and do ".encode('ascii', 'strict')" on the result. > > This way *most* of the use cases of python2 will be covered without > touching the code. 
So: > See, I'm fine with having people update their format strings to specify a format spec; it's minor and isn't totally useless as it expresses what they mean more explicitly (e.g. "I want this to be an int, I want this to be a float, and I want this to be an ASCII string" using d, f, and s, respectively). I want people to have to make a conscious decision to fall back on an ASCII encoding. What you are suggesting is for people to have to make a conscious decision **not** to encode to ASCII implicitly, which is what I'm trying to avoid with this proposal. My goal is to make it easy to work with ASCII but as an explicit choice, not by default. -Brett From yselivanov.ml at gmail.com Tue Jan 14 18:57:05 2014 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 14 Jan 2014 12:57:05 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: On January 14, 2014 at 12:47:35 PM, Brett Cannon (brett at python.org) wrote: > > On Tue, Jan 14, 2014 at 12:29 PM, Yury Selivanov wrote: > > > Brett, > > > > > > I like your proposal. There is one idea I have that could, > > perhaps, improve it: > > > > > > 1. "%s" and "{}" will continue to work for bytes and bytearray in > > the following fashion: > > > > - check if __bytes__/Py_buffer supported. > > - if it is, check that the bytes are strictly in the printable > > ASCII-subset (a-z, A-Z, 0-9 + special symbols like ! etc). > > Throw an error if the check fails. If not - concatenate. > > - Try str(), and do ".encode('ascii', 'strict')" on the result. > > > > > > This way *most* of the use cases of python2 will be covered without > > touching the code. So: > > > > See, I'm fine with having people update their format strings to specify a > format spec; it's minor and isn't totally useless as it expresses what they > mean more explicitly (e.g. "I want this to be an int, I want this to be a > float, and I want this to be an ASCII string" using d, f, and s, > respectively). 
I want people to have to make a conscious decision to fall > back on an ASCII encoding. What you are suggesting is for people to have to > make a conscious decision **not** to encode to ASCII implicitly, which is > what I'm trying to avoid with this proposal. My goal is to make it easy to > work with ASCII but as an explicit choice, not by default. I understand. But OTOH, this whole discussion started because of the lack of convenience when working with bytes in py3, plus it's hard to maintain the *same* codebase. Updating the code to include new "%b" operators won't help them. My proposal is based on the assumption that most of the string formatting people usually use in python2 on "str" (not "unicode") is used for ascii. That's the implicit convenience of using bytes that everybody is looking for in py3. It allows having a single codebase, and provides the necessary safety. Anyways, my 2 cents. Thank you, Yury From guido at python.org Tue Jan 14 18:58:32 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 14 Jan 2014 09:58:32 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> <52D4F34B.1000507@canterbury.ac.nz> Message-ID: On Tue, Jan 14, 2014 at 7:59 AM, Guido van Rossum wrote: > Here's an example of what I mean: I sent that off without proofreading, and I also got one detail about asciistr() wrong. Here are some corrections. > def spam(a): > r = asciistr('(') > if a: r += a.strip() > r += asciistr(')') > return r > > The argument must be a string. Or a bytes object. And the point is that the return type should be the same as the argument type. 
> If I call spam(''), or spam(b'') > a's type is never concatenated with r, so the > return value is an asciistr. Actually, Nick explained that asciistr() + asciistr() returns str, so this would be accidentally correct if called with '', but wrong (returning a str instead of a bytes) if called with b''. > To fix this particular case, we could > drop the "if a:" part. But it could be more significant, e.g. it could > be something like "if a contains any digits". The general fix would be > to add > > else: r += a[:0] > > but that's still an example of the awkwardness that asciistr() is > trying to avoid. This is still valid. -- --Guido van Rossum (python.org/~guido) From jimjjewett at gmail.com Tue Jan 14 19:11:59 2014 From: jimjjewett at gmail.com (Jim J. Jewett) Date: Tue, 14 Jan 2014 10:11:59 -0800 (PST) Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D4845B.10009@canterbury.ac.nz> Message-ID: <52d57def.0180310a.4b08.ffffa287@mx.google.com> Nick Coghlan wrote: >> Arbitrary binary data and ASCII compatible binary data are *different >> things* and the only argument in favour of modelling them with a single >> type is because Python 2 did it that way. Greg Ewing replied: > I would say that ASCII compatible binary data is a > *subset* of arbitrary binary data. As such, a type > designed for arbitrary binary data is a perfectly good > way of representing ASCII compatible binary data. But not when you care about the ASCII-compatible part; then you should use a subclass. Obviously, it is too late for separating bytes from AsciiStructuredBytes. PBP *may* even mean that just using the "subclass" for everything (and just ignoring the ASCII specific methods when they aren't appropriate) was always the right implementation choice. But in terms of explaining the text model, that separation is important enough that (1) We should be reluctant to strengthen the "it's really just ASCII" messages. (2) It *may* be worth creating a virtual split in the documentation. 
I'm willing to work on (2) if there is general consensus that it would be a good idea. As a rough sketch, I would change places like http://docs.python.org/3/library/stdtypes.html#typebytes from: Bytes objects are immutable sequences of single bytes. Since many major binary protocols are based on the ASCII text encoding, bytes objects offer several methods that are only valid when working with ASCII compatible data and are closely related to string objects in a variety of other ways. to something more like: Bytes objects are immutable sequences of single bytes. A Bytes object could represent anything, and is appropriate as the underlying storage for a sound sample or image file. Virtual subclass ASCIIStructuredBytes ==================================== One particularly common use of bytes is to represent the contents of a file, or of a network message. In these cases, the bytes will often represent Text *in a specific encoding* and that encoding will usually be a superset of ASCII. Rather than create and support an ASCIIStructuredBytes subclass, Python simply added support for these use cases straight to Bytes objects, and assumes that this support simply won't be used when it does not make sense. For example, bytes literals *could* be used to construct a sound sample, but the literals will be far easier to read when they are used to represent (encoded) ASCII text, such as "OPEN". -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ From chris.barker at noaa.gov Tue Jan 14 18:45:59 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 14 Jan 2014 09:45:59 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: On Tue, Jan 14, 2014 at 9:29 AM, Yury Selivanov wrote: > - Try str(), and do ".encode('ascii', 'strict')" on the result. 
> please no -- that's the source of a lot of pain in py2 now. Having a failure as a result of the value, rather than the type, of an object just makes for hard-to-test bugs. Everything will be hunky dory for development and testing, then in deployment some idiot ( ;-) ) will pass in some non-ascii compatible string and you get failure. And the person who gets the failure doesn't understand why, or they wouldn't have passed in non-ascii values in the first place... Ease of porting is nice, but let's not make it easy to port bug-prone code. -Chris > > This way *most* of the use cases of python2 will be covered without > touching the code. So: > > - b'Hello {}'.format('world') > will be the same as b'Hello ' + str('world').encode('ascii', 'strict') > > - b'Hello {}'.format('\u0394') will throw UnicodeEncodeError > > - b'Status: {}'.format(200) > will be the same as b'Status: ' + str(200).encode('ascii', 'strict') > > - b'Hello %s' % ('world',) - the same as the first example > > - b'Connection: {}'.format(b'keep-alive') - works > > - b'Hello %s' % (b'\xce\x94',) - will fail, not ASCII subset we accept > > I think it's OK to check the buffers for ASCII-subset only. Yes, it > will have some sort of sub-optimal performance, but then, it's quite > rare when string formatting is used to concatenate huge buffers. > > 2. new operators {!b} and %b. These will just use "__bytes__" and > Py_buffer. > > -- > Yury Selivanov > > On January 14, 2014 at 11:31:51 AM, Brett Cannon (brett at python.org) wrote: > > > > On Mon, Jan 13, 2014 at 5:14 PM, Guido van Rossum > > wrote: > > > > > On Mon, Jan 13, 2014 at 2:05 PM, Brett Cannon > > wrote: > > > > I have been going on the assumption that bytes.format() would > > change what > > > > '{}' meant for itself and would only interpolate bytes. That > > convenient > > > > between Python 2 and 3 since it represents what we want it to > > (str and > > > bytes > > > > under the hood, respectively), so it just falls through. 
We > > could also > > > add a > > > > 'b' conversion for bytes() explicitly so as to help people > > not > > > accidentally > > > > mix up things in bytes.format() and str.format(). But I was > > not > > > suggesting > > > > adding a specific format spec for bytes but instead making > > bytes.format() > > > > just do the .encode('ascii') automatically to help with compatibility > > > when a > > > > format spec was present. If people want fancy formatting for > > bytes they > > > can > > > > always do it themselves before calling bytes.format(). > > > > > > This seems hastily written (e.g. verb missing :-), and I'm not > > clear > > > on what you are (or were) actually proposing. When exactly would > > > bytes.format() need .encode('ascii')? > > > > > > I would be happy to wait a few hours or days for you to to write it > > up > > > clearly, rather than responding in a hurry. > > > > > > Sorry about that. Busy day at work + trying to stay on top of this > > entire > > conversation was a bit tough. Let me try to lay out what I'm suggesting > > for > > bytes.format() in terms of how it changes > > http://docs.python.org/3/library/string.html#format-string-syntax > > for bytes. > > > > 1. New conversion operator of 'b' that operates as PEP 460 specifies > > (i.e. > > tries to get a buffer, else calls __bytes__). The default conversion > > changes from 's' to 'b'. > > 2. Use of the conversion field adds an added step of calling > > str.encode('ascii', 'strict') on the result returned from > > calling > > __format__(). > > > > That's it. So point 1 means that the following would work in Python > > 3.5:: > > > > b'Hello, {}, how are you?'.format(b'Guido') > > b'Hello, {!b}, how are you?'.format(b'Guido') > > > > It would produce an error if you used a text argument for 'Guido' > > since str > > doesn't define __bytes__ or a buffer. That gives the EIBTI group > > their > > bytes.format() where nothing magical happens. 
> > > > For point 2, let's say you have the following in Python 2:: > > > > 'I have {} bottles of beer on the wall'.format(10) > > > > Under my proposal, how would you change it to get the same result > > in Python > > 2 and 3?:: > > > > b'I have {:d} bottles of beer on the wall'.format(10) > > > > In Python 2 you're just being more explicit about the format, > > otherwise > > it's the same semantics as today. In Python 3, though, this would > > translate > > into (under the hood):: > > > > b'I have {} bottles of beer on the wall'.format(format(10, > > 'd').encode('ascii', 'strict')) > > > > This leads to the same bytes value in Python 2 (since it's just > > a string) > > and in Python 3 (as everything accepted by bytes.format() is > > either bytes > > already or converted to from encoding to ASCII bytes). While > > Python 2 users > > would need to make sure they used a format spec to get the same result > > in > > both Python 2 and 3 for ASCII bytes, it's a minor change which also > > makes > > the format more explicit so it's not an inherently bad thing. > > And for those > > that don't want to utilize the automatic ASCII encoding they > > can just not > > use a format spec in the format string and just pass in bytes directly > > (i.e. call __format__() themselves and then call str.encode() > > on their > > own). So PBP people get to have a simple way to use bytes.format() > > in > > Python 2 and 3 when dealing with things that can be represented > > as ASCII > > (just as the bytes methods allow for currently). > > > > I think this covers your desire to have numbers and anything else > > that can > > be represented as ASCII be supported for easy porting while covering > > my > > desire that any automatic encoding is clearly explicit in the > > format string > > and in no way special-cased for only some types (the introduction > > of a 'c' > > converter from PEP 460 is also fine with me). 
> > > > How you would want to translate this proposal with the % operator > > I'm not > > sure since it has been quite a while since I last seriously used > > it and so > > I don't think I'm in a good position to propose a shift for it. > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/yselivanov.ml%40gmail.com > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Jan 14 19:16:17 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 14 Jan 2014 10:16:17 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: [Other readers: asciistr is at https://github.com/jeamland/asciicompat] On Mon, Jan 13, 2014 at 11:44 PM, Nick Coghlan wrote: > Right, asciistr is designed for a specific kind of hybrid API where > you want to accept binary input (and produce binary output) *and* you > want to accept text input (and produce text output). 
Porting those > from Python 2 to Python 3 is painful not because of any limitations of > the str or bytes API but because it's the only use case I have found > where I actually *missed* the implicit interoperability offered by the > Python 2 str type. Yes, the use case is clear. > It's not an implementation style I would consider appropriate for the > standard library - we need to code very defensively in order to aid > debugging in arbitrary contexts, so I consider having an API like > urllib.parse demand 7-bit ASCII in the binary version, and require > text to handle impure input to be a better design choice. This surprises me. I think asciistr should strive to be useful for the stdlib as well. > However, in an environment where you can place greater preconditions > on your inputs (such as "ensure all input data is ASCII compatible") That gives me the Python 2 willies. :-( > and you're willing to tolerate the occasional obscure traceback for > particular kinds of errors, Really? Can you give an example where the traceback using asciistr() would be more obscure than using the technique you used in urllib.parse? > then it should be a convenient way to use > common constants (like separators or URL scheme names) in an algorithm > that can manipulate either binary or text, but not a combination of > the two (the latter is still a nice improvement in correctness over > Python 2, which allowed them to be mixed freely rather than requiring > consistency across the inputs). Unfortunately I suspect there are still examples where asciistr's "submissive" behavior can produce surprises. E.g. consider a function of two arguments that must either be both bytes or both str. It's easily conceivable that for certain combinations of incorrect arguments (i.e. one bytes and one str) the function doesn't raise an error but returns something of one or the other type. (And this is exactly the Python 2 outcome we're trying to avoid.) 
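A minimal sketch of the "both bytes or both str" case described above, in plain Python 3 without asciistr. The function name join_path and the "/" separator are hypothetical, chosen only for illustration. The point is that the type check is explicit, so a mixed call fails on the types of the arguments rather than on their values:

```python
def join_path(head, tail):
    """Join two path components that must both be str or both be bytes.

    Hypothetical helper for illustration only; it is not part of any
    library discussed in this thread.
    """
    if isinstance(head, str) and isinstance(tail, str):
        return head + "/" + tail
    if isinstance(head, bytes) and isinstance(tail, bytes):
        return head + b"/" + tail
    # Mixed or unsupported types are rejected up front, whatever the values.
    raise TypeError(
        "head and tail must both be str or both be bytes, got %s and %s"
        % (type(head).__name__, type(tail).__name__)
    )
```

With a "submissive" constant type in the middle, the risk raised above is that a mixed call such as join_path("usr", b"lib") might quietly return a value of one type or the other instead of raising.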
> It's still slightly different from Python 2, though. In Python 2, the > interaction model was: > > str & str -> str > str & unicode -> unicode > > (with the one exception being str.format: that consistently produces > str rather than promoting to Unicode) Or raises good old UnicodeError. :-( > My goal for asciistr is that it should exhibit the following behaviour: > > str & asciistr -> str > asciistr & asciistr -> str (making it asciistr would be a pain and > I don't have a use case for that) I almost had one in the example code I sent in response to Greg. > bytes & asciistr -> bytes I understand that '&' here stands for "any arbitrary combination", but what about searches? Given that asciistr's base class is str, won't it still blow up if you try to use it as an argument to e.g. bytes.startswith()? Equality tests also sound problematic; is b'x' == asciistr('x') == 'x' ??? > So in code like that in urllib.parse (but in a more constrained > context), you could just switch all your constants to asciistr, change > your indexing operations to length 1 slices and then in theory > essentially the same code that worked in Python 2 should also work in > Python 3. The more I think about this, the less I believe it's that easy. I suspect you had the right idea when you mentioned singledispatch. It might be easier to write the bytes version in terms of the string versions wrapped in decode/encode, or vice versa, rather than trying to reason out all the different combinations of str, bytes, asciistr. > However, Benno is finding that my warning about possible > interoperability issues was accurate - we have various places where we > do PyUnicode_Check() rather than PyUnicode_CheckExact(), which means > we don't always notice a PEP 3118 buffer interface if it is provided > by a str subclass. Not sure I understand this, but I believe him when he says this won't be easy. 
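The singledispatch idea mentioned above might look roughly like the following. This is only a sketch under assumptions: split_scheme is a hypothetical function (it is not the urllib.parse implementation), and the bytes variant is written in terms of the str variant wrapped in decode/encode, as suggested. It assumes Python 3.4 or later, where functools.singledispatch is available.

```python
from functools import singledispatch


@singledispatch
def split_scheme(url):
    """Split 'scheme://rest' into (scheme, rest); hypothetical example."""
    raise TypeError("expected str or bytes, got %s" % type(url).__name__)


@split_scheme.register(str)
def _(url):
    # The text version holds the actual logic.
    scheme, sep, rest = url.partition("://")
    return (scheme, rest) if sep else ("", url)


@split_scheme.register(bytes)
def _(url):
    # The binary version delegates to the text version. Strict ASCII
    # decoding means non-ASCII input raises UnicodeDecodeError here.
    scheme, rest = split_scheme(url.decode("ascii"))
    return scheme.encode("ascii"), rest.encode("ascii")
```

Each registered implementation only ever sees one type, so there are no str/bytes combinations to reason about inside the function body.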
> We'll look at those as we find them, and either > work around them (if we can), decide not to support that behaviour in > asciistr, or else I'll create a patch to resolve the interoperability > issue. > > It's not necessarily a type I'd recommend using in production code, as > there *will* always be a more explicit alternative that doesn't rely > on a tricksy C extension type that only works in CPython. However, > it's a type I think is worth having implemented and available on PyPI, > even if it's just to disprove the claim that you *can't* write that > kind of code in Python 3. Hm. It is beginning to sound more and more flawed. I also worry that it will bring back the nightmare of data-dependent UnicodeError. E.g. this (from tests/basic.py): def test_asciistr_will_not_accept_codepoints_above_127(self): self.assertRaises(ValueError, asciistr, 'Schrödinger') looks reasonable enough when you assume asciistr() is always used with a literal as argument -- but I suspect that plenty of people would misunderstand its purpose and write asciistr(s) as a "clever" way to turn a string into something that's compatible with both bytes and strings... :-( -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Tue Jan 14 19:17:12 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 14 Jan 2014 10:17:12 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52d57def.0180310a.4b08.ffffa287@mx.google.com> References: <52d57def.0180310a.4b08.ffffa287@mx.google.com> Message-ID: <52D57F28.6080502@stoneleaf.us> On 01/14/2014 10:11 AM, Jim J. Jewett wrote: > > But in terms of explaining the text model, that > separation is important enough that > (2) It *may* be worth creating a virtual > split in the documentation. I think (2) is a great idea. > I'm willing to work on (2) if there is general consensus > that it would be a good idea.
As a rough sketch, I > would change places like > > http://docs.python.org/3/library/stdtypes.html#typebytes > > from: > > Bytes objects are immutable sequences of single bytes. > Since many major binary protocols are based on the ASCII > text encoding, bytes objects offer several methods that > are only valid when working with ASCII compatible data > and are closely related to string objects in a variety > of other ways. > > to something more like: > > Bytes objects are immutable sequences of single bytes. > > A Bytes object could represent anything, and is > appropriate as the underlying storage for a sound sample > or image file. > > Virtual subclass ASCIIStructuredBytes > ==================================== > > One particularly common use of bytes is to represent > the contents of a file, or of a network message. In > these cases, the bytes will often represent Text > *in a specific encoding* and that encoding will usually > be a superset of ASCII. Rather than create and support > an ASCIIStructuredBytes subclass, Python simply added > support for these use cases straight to Bytes objects, > and assumes that this support simply won't be used > when it does not make sense. For example, bytes literals > *could* be used to construct a sound sample, but the > literals will be far easier to read when they are used > to represent (encoded) ASCII text, such as "OPEN". I find the Virtual subclass in the title to be confusing, but otherwise it's great. We should have that even if we do add formatting to bytes, as that message is even more important then.
-- ~Ethan~ From guido at python.org Tue Jan 14 19:52:05 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 14 Jan 2014 10:52:05 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: On Tue, Jan 14, 2014 at 9:45 AM, Chris Barker wrote: > On Tue, Jan 14, 2014 at 9:29 AM, Yury Selivanov > wrote: >> >> - Try str(), and do ".encode(?ascii?, ?stcict?)? on the result. > > > please no -- that's the source of a lot of pain in py2 now. > > having a failure as a result of the value, rather than the type, of an > object just makes hard-to-test for bugs. Everything will be hunky dory for > development and testing, then in deployment some idiot ( ;-) ) will pass in > some non-ascii compatible string and you get failure. And the person who > gets the failure doesn't understand why, or they wouldn't have passed in > non-ascii values in the first place... > > Ease of porting is nice, but let's not make it easy to port bug-prone code. Right. This is a big red flag to me as well. I think there is some inherent conflict between the extensible design of str.format() and the practical needs of people who are actually going to use formatting operations (either % or .format()) with bytes. The *practical* needs are mostly limited to supporting basic number formatting (decimal, hex, padding) and interpolation of anything that supports the buffer interface. It would also be nice if you didn't have to specify the type at all in the format string, i.e. {} should do the right thing for numbers and (all sorts of) bytes. 
But the way to arrive at this behavior without duplicating a whole lot of code seems to be to call the existing text-based __format__ API and convert the result to bytes -- for numbers this should be safe (their formatting produces just ASCII digits and a selected few other ASCII characters) but leads to an undesirable outcome for other types -- not just str but also e.g. lists or dicts containing str instances, since those call __repr__ on the contained items, and repr() may produce non-ASCII bytes. This is why my earlier proposal used ascii(), which is a "nerfed"(*) version of repr(). This does the right thing for numbers as well as for many other types (e.g. None, bool) and does something unpleasant for text strings that is perhaps better than the alternative. Which reminds me. Quite a few people have spoken out in favor of loud failures rather than silent "wrong" output. But I think that in the specific context of formatting output, there is a long and IMO good tradition of producing (slightly) wrong output in favor of more strict behavior. Consider for example what to do when a number doesn't fit in the given width. Would you rather raise an exception, truncate the value, or mess up the formatting? All languages newer than Fortran that I've used have chosen the latter, and I still agree it's a good idea. Similar with infinities, NaN, or None. (Yes, it's embarrassing to have a website displaying 'null'. But isn't a 500 even *more* embarrassing?) This doesn't mean I'm insensitive to the argument in favor of loud and early failure. It's just that I can see both sides of the coin, and I'm still deciding which argument is more important. (*) Gamer slang for a weapon made less dangerous. :-) -- --Guido van Rossum (python.org/~guido) From brett at python.org Tue Jan 14 20:14:48 2014 From: brett at python.org (Brett Cannon) Date: Tue, 14 Jan 2014 14:14:48 -0500 Subject: [Python-Dev] [Python-checkins] peps: Fill in PEP number (461). 
In-Reply-To: <3f3hB54YWRz7LqC@mail.python.org> References: <3f3hB54YWRz7LqC@mail.python.org> Message-ID: I think this was supposed to be 461, not 460 =) On Tue, Jan 14, 2014 at 2:12 PM, guido.van.rossum < python-checkins at python.org> wrote: > http://hg.python.org/peps/rev/a25f48998ad3 > changeset: 5346:a25f48998ad3 > user: Guido van Rossum > date: Tue Jan 14 11:12:09 2014 -0800 > summary: > Fill in PEP number (461). > > files: > pep-0461.txt | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > > diff --git a/pep-0461.txt b/pep-0461.txt > --- a/pep-0461.txt > +++ b/pep-0461.txt > @@ -1,4 +1,4 @@ > -PEP: XXX > +PEP: 460 > Title: Adding % and {} formatting to bytes > Version: $Revision$ > Last-Modified: $Date$ > > -- > Repository URL: http://hg.python.org/peps > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > https://mail.python.org/mailman/listinfo/python-checkins > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Tue Jan 14 20:19:29 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 14 Jan 2014 14:19:29 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52d57def.0180310a.4b08.ffffa287@mx.google.com> References: <52D4845B.10009@canterbury.ac.nz> <52d57def.0180310a.4b08.ffffa287@mx.google.com> Message-ID: On 1/14/2014 1:11 PM, Jim J. Jewett wrote: > But in terms of explaining the text model, that > separation is important enough that > > (1) We should be reluctant to strengthen the > "its really just ASCII" messages. > (2) It *may* be worth creating a virtual > split in the documentation. > > I'm willing ot work on (2) if there is general consensus > that it would be a good idea. As a rough sketch, I > would change places like > > http://docs.python.org/3/library/stdtypes.html#typebytes > > from: > > Bytes objects are immutable sequences of single bytes. 
> Since many major binary protocols are based on the ASCII > text encoding, bytes objects offer several methods that > are only valid when working with ASCII compatible data > and are closely related to string objects in a variety > of other ways. > > to something more like: > > Bytes objects are immutable sequences of single bytes. > > A Bytes object could represent anything, and is > appropriate as the underlying storage for a sound sample > or image file. > > Virtual subclass ASCIIStructuredBytes > ==================================== > > One particularly common use of bytes is to represent > the contents of a file, or of a network message. In > these cases, the bytes will often represent Text > *in a specific encoding* and that encoding will usually > be a superset of ASCII. Rather than create and support > an ASCIIStructuredBytes subclass, Python simply added > support for these use cases straight to Bytes objects, > and assumes that this support simply won't be used when > when it does not make sense. For example, bytes literals > *could* be used to construct a sound sample, but the > literals will be far easier to read when they are used > to represent (encoded) ASCII text, such as "OPEN". I rather like this. Consider opening a tracker issue. -- Terry Jan Reedy From solipsis at pitrou.net Tue Jan 14 20:23:03 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 14 Jan 2014 20:23:03 +0100 Subject: [Python-Dev] PEP 460 reboot References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: <20140114202303.504d0b2b@fsol> On Tue, 14 Jan 2014 10:52:05 -0800 Guido van Rossum wrote: > Would you rather raise an exception, truncate the > value, or mess up the formatting? All languages newer than Fortran > that I've used have chosen the latter, and I still agree it's a good > idea. 
Well that's useful when printing out human-readable stuff on stdout, much less when you're emitting binary data that's supposed to conform to a well-defined protocol. I expect bytes formatting to be used for the latter, not the former. (which also means, actually, that I don't think the fancy formatting features - alignment, etc. - are useful at all for bytes; but it's probably ok having them for consistency) > Similar with infinities, NaN, or None. (Yes, it's embarrassing > to have a website displaying 'null'. But isn't a 500 even *more* > embarrassing?) When it comes to type mismatch, though, an error is raised:
>>> "%d" % object()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: %d format: a number is required, not object
(instead of outputting e.g. repr(id(x))) Regards Antoine. From dholth at gmail.com Tue Jan 14 20:23:40 2014 From: dholth at gmail.com (Daniel Holth) Date: Tue, 14 Jan 2014 14:23:40 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: On Tue, Jan 14, 2014 at 1:52 PM, Guido van Rossum wrote: > On Tue, Jan 14, 2014 at 9:45 AM, Chris Barker wrote: >> On Tue, Jan 14, 2014 at 9:29 AM, Yury Selivanov >> wrote: >>> >>> - Try str(), and do ".encode('ascii', 'strict')" on the result. >> >> >> please no -- that's the source of a lot of pain in py2 now. >> >> having a failure as a result of the value, rather than the type, of an >> object just makes for hard-to-test bugs. Everything will be hunky dory for >> development and testing, then in deployment some idiot ( ;-) ) will pass in >> some non-ascii compatible string and you get failure. And the person who >> gets the failure doesn't understand why, or they wouldn't have passed in >> non-ascii values in the first place... >> >> Ease of porting is nice, but let's not make it easy to port bug-prone code. > > Right. This is a big red flag to me as well.
> > I think there is some inherent conflict between the extensible design > of str.format() and the practical needs of people who are actually > going to use formatting operations (either % or .format()) with bytes. > > The *practical* needs are mostly limited to supporting basic number > formatting (decimal, hex, padding) and interpolation of anything that > supports the buffer interface. It would also be nice if you didn't > have to specify the type at all in the format string, i.e. {} should > do the right thing for numbers and (all sorts of) bytes. > > But the way to arrive at this behavior without duplicating a whole lot > of code seems to be to call the existing text-based __format__ API and > convert the result to bytes -- for numbers this should be safe (their > formatting produces just ASCII digits and a selected few other ASCII > characters) but leads to an undesirable outcome for other types -- not > just str but also e.g. lists or dicts containing str instances, since > those call __repr__ on the contained items, and repr() may produce > non-ASCII bytes. > > This is why my earlier proposal used ascii(), which is a "nerfed"(*) > version of repr(). This does the right thing for numbers as well as > for many other types (e.g. None, bool) and does something unpleasant > for text strings that is perhaps better than the alternative. > > Which reminds me. Quite a few people have spoken out in favor of loud > failures rather than silent "wrong" output. But I think that in the > specific context of formatting output, there is a long and IMO good > tradition of producing (slightly) wrong output in favor of more strict > behavior. Consider for example what to do when a number doesn't fit in > the given width. Would you rather raise an exception, truncate the > value, or mess up the formatting? All languages newer than Fortran > that I've used have chosen the latter, and I still agree it's a good > idea. Similar with infinities, NaN, or None. 
(Yes, it's embarrassing > to have a website displaying 'null'. But isn't a 500 even *more* > embarrassing?) > > This doesn't mean I'm insensitive to the argument in favor of loud and > early failure. It's just that I can see both sides of the coin, and > I'm still deciding which argument is more important. > > (*) Gamer slang for a weapon made less dangerous. :-) I think loud and early failure is important for porting while you might still be trying to pound out the previously blurry encode/decode boundaries. In this code str and bytes will be wrong everywhere. Some APIs might return either str or bytes based on the input. Let it fail, find the boundaries, and fix it until it does something useful without failing. And it kindof depends on the context whether it is worse to display weird ephemeral output or write the same weird output to long term storage. I'm not sure what to think about content-dependent failures on protocols that are supposed to be ASCII-only-without-repr-noise. From jimjjewett at gmail.com Tue Jan 14 20:43:16 2014 From: jimjjewett at gmail.com (Jim J. Jewett) Date: Tue, 14 Jan 2014 11:43:16 -0800 (PST) Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D57F28.6080502@stoneleaf.us> Message-ID: <52d59354.0476310a.5f4e.ffffcd72@mx.google.com> Greg Ewing replied: >> ... ASCII compatible binary data is a >> *subset* of arbitrary binary data. I wrote: > But in terms of explaining the text model, that > separation is important enough that > (2) It *may* be worth creating a virtual > split in the documentation. (rough sketch below) Ethan likes the idea, but points out that the term "Virtual" is confusing here. Alas, I'm not sure what the correct term is. In addition to "Go for it!" / "Don't waste your time", I'm looking for advice on: (A) What word should I use instead of "Virtual"? Imaginary? Pretend? (B) Would it be good/bad/at least make the docs easier to create an actual class (or alias)? 
(C) Same question for a pair of classes provided only in the documentation, like example code. (D) What about an abstract class, or several? e.g., replacing the XXX TODO of collections.abc.ByteString with separate abstract classes for ByteSequence, String, ByteString, and ASCIIByteString? (ByteString already includes any bytes or bytearray instance, so backward compatibility means the String suffix isn't sufficient for an opt-in-by-instances class.) > I'm willing ot work on (2) if there is general consensus > that it would be a good idea. As a rough sketch, I > would change places like > > http://docs.python.org/3/library/stdtypes.html#typebytes > > from: > > Bytes objects are immutable sequences of single bytes. > Since many major binary protocols are based on the ASCII > text encoding, bytes objects offer several methods that > are only valid when working with ASCII compatible data > and are closely related to string objects in a variety > of other ways. > > to something more like: > > Bytes objects are immutable sequences of single bytes. > > A Bytes object could represent anything, and is > appropriate as the underlying storage for a sound sample > or image file. > > Virtual subclass ASCIIStructuredBytes > ==================================== > > One particularly common use of bytes is to represent > the contents of a file, or of a network message. In > these cases, the bytes will often represent Text > *in a specific encoding* and that encoding will usually > be a superset of ASCII. Rather than create and support > an ASCIIStructuredBytes subclass, Python simply added > support for these use cases straight to Bytes objects, > and assumes that this support simply won't be used when > when it does not make sense. For example, bytes literals > *could* be used to construct a sound sample, but the > literals will be far easier to read when they are used > to represent (encoded) ASCII text, such as "OPEN". 
-jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ From rdmurray at bitdance.com Tue Jan 14 21:03:36 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Tue, 14 Jan 2014 15:03:36 -0500 Subject: [Python-Dev] Byte/text documentation improvements (was: PEP 460 reboot) In-Reply-To: <52d59354.0476310a.5f4e.ffffcd72@mx.google.com> References: <52d59354.0476310a.5f4e.ffffcd72@mx.google.com> Message-ID: <20140114200336.F04E12500D0@webabinitio.net> On Tue, 14 Jan 2014 11:43:16 -0800, "Jim J. Jewett" wrote: > Greg Ewing replied: > > >> ... ASCII compatible binary data is a > >> *subset* of arbitrary binary data. > > I wrote: > > > But in terms of explaining the text model, that > > separation is important enough that > > > (2) It *may* be worth creating a virtual > > split in the documentation. > > (rough sketch below) > > Ethan likes the idea, but points out that the term > "Virtual" is confusing here. > > Alas, I'm not sure what the correct term is. In > addition to "Go for it!" / "Don't waste your time", > I'm looking for advice on: > > (A) What word should I use instead of "Virtual"? > Imaginary? Pretend? Notional. > (B) Would it be good/bad/at least make the docs > easier to create an actual class (or alias)? I don't have an opinion on this, but if you make it real class then "notional" would no longer work. I guess you'd just call it an alias in that case. > (C) Same question for a pair of classes provided > only in the documentation, like example code. Bad. Refer to it via a glossary item ref or a section ref. > (D) What about an abstract class, or several? > > e.g., replacing the XXX TODO of collections.abc.ByteString > with separate abstract classes for ByteSequence, String, > ByteString, and ASCIIByteString? > > (ByteString already includes any bytes or bytearray instance, > so backward compatibility means the String suffix isn't > sufficient for an opt-in-by-instances class.) 
What's the difference between ByteString and ByteSequence? Or maybe I'm asking the difference between ByteString and ASCIIByteString? So the only concrete classes would be ASCIIByteStrings....that might work. It would give us something to call that argument type in, eg, the binascii docs. Not to mention a formal definition of what methods a Python byte type needs to support. --David From eric at trueblade.com Tue Jan 14 21:04:48 2014 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 14 Jan 2014 15:04:48 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: <52D59860.1010301@trueblade.com> On 01/14/2014 01:52 PM, Guido van Rossum wrote: > But the way to arrive at this behavior without duplicating a whole lot > of code seems to be to call the existing text-based __format__ API and > convert the result to bytes -- for numbers this should be safe (their > formatting produces just ASCII digits and a selected few other ASCII > characters) but leads to an undesirable outcome for other types -- not > just str but also e.g. lists or dicts containing str instances, since > those call __repr__ on the contained items, and repr() may produce > non-ASCII bytes. That's why I suggested restricting the types supported. If we could live with just a subset of known types, then we could hard-code the conversions to bytes. How many types with custom __format__'s are really getting written to byte strings in 2.x? For that matter, are any lists, sets, or dicts (or anything else using object.__format__'s conversion using str()) really getting written to bytes? Do we need to support these cases? In my mind, this comes down to: are we trying to add this just to make porting easier? In my mind, we wouldn't even be adding feature at all except for ease of porting 2.x code. So we should focus on what features are used in the code we're trying to port. 
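As a rough illustration of what "restricting the types supported" and hard-coding the conversions could mean, here is a minimal sketch. The helper name to_wire_bytes and its spec parameter are hypothetical and are not taken from any PEP draft; the only assumption is that numbers are formatted with the existing text machinery and then ASCII-encoded, while anything exposing the buffer interface is copied as-is.

```python
def to_wire_bytes(value, spec=""):
    """Convert one value to bytes for interpolation (hypothetical helper).

    Accepts only int, float, and objects supporting the buffer
    interface; everything else is rejected based on its type.
    """
    if isinstance(value, (int, float)):
        # Number formatting produces ASCII-only text, so encoding is safe.
        return format(value, spec).encode("ascii")
    if isinstance(value, str):
        raise TypeError("str is not accepted; encode it explicitly first")
    # memoryview() raises TypeError for objects without a buffer.
    return bytes(memoryview(value))


# Building a header line for a wire protocol:
line = b"Content-Length: " + to_wire_bytes(42) + b"\r\n"
```

Under these assumptions a text argument fails with TypeError regardless of its content, so the failure shows up in testing rather than only when non-ASCII data arrives in deployment.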
I don't think our focus is on 2.x code that's using u''.format(), it's 2.x code that's been reviewed and is still using b''.format() because it's building up bytes for a wire protocol. And that code is not likely to need to format objects with arbitrary __format__ methods, or even str (in the 3.x sense). It's only likely to use numbers and bytes (or str in the 2.x sense). Eric. From guido at python.org Tue Jan 14 21:06:32 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 14 Jan 2014 12:06:32 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52d59354.0476310a.5f4e.ffffcd72@mx.google.com> References: <52D57F28.6080502@stoneleaf.us> <52d59354.0476310a.5f4e.ffffcd72@mx.google.com> Message-ID: Personally I wouldn't add any words suggesting or referring to the option of creation another class for this purpose. You wouldn't recommend subclassing dict for constraining the types of keys or values, would you? On Tue, Jan 14, 2014 at 11:43 AM, Jim J. Jewett wrote: > > > > Greg Ewing replied: > >>> ... ASCII compatible binary data is a >>> *subset* of arbitrary binary data. > > I wrote: > >> But in terms of explaining the text model, that >> separation is important enough that > >> (2) It *may* be worth creating a virtual >> split in the documentation. > > (rough sketch below) > > Ethan likes the idea, but points out that the term > "Virtual" is confusing here. > > Alas, I'm not sure what the correct term is. In > addition to "Go for it!" / "Don't waste your time", > I'm looking for advice on: > > (A) What word should I use instead of "Virtual"? > Imaginary? Pretend? > > (B) Would it be good/bad/at least make the docs > easier to create an actual class (or alias)? > > (C) Same question for a pair of classes provided > only in the documentation, like example code. > > (D) What about an abstract class, or several? 
> > e.g., replacing the XXX TODO of collections.abc.ByteString > with separate abstract classes for ByteSequence, String, > ByteString, and ASCIIByteString? > > (ByteString already includes any bytes or bytearray instance, > so backward compatibility means the String suffix isn't > sufficient for an opt-in-by-instances class.) > > >> I'm willing ot work on (2) if there is general consensus >> that it would be a good idea. As a rough sketch, I >> would change places like >> >> http://docs.python.org/3/library/stdtypes.html#typebytes >> >> from: >> >> Bytes objects are immutable sequences of single bytes. >> Since many major binary protocols are based on the ASCII >> text encoding, bytes objects offer several methods that >> are only valid when working with ASCII compatible data >> and are closely related to string objects in a variety >> of other ways. >> >> to something more like: >> >> Bytes objects are immutable sequences of single bytes. >> >> A Bytes object could represent anything, and is >> appropriate as the underlying storage for a sound sample >> or image file. >> >> Virtual subclass ASCIIStructuredBytes >> ==================================== >> >> One particularly common use of bytes is to represent >> the contents of a file, or of a network message. In >> these cases, the bytes will often represent Text >> *in a specific encoding* and that encoding will usually >> be a superset of ASCII. Rather than create and support >> an ASCIIStructuredBytes subclass, Python simply added >> support for these use cases straight to Bytes objects, >> and assumes that this support simply won't be used when >> when it does not make sense. For example, bytes literals >> *could* be used to construct a sound sample, but the >> literals will be far easier to read when they are used >> to represent (encoded) ASCII text, such as "OPEN". > > > -jJ > > -- > > If there are still threading problems with my replies, please > email me with details, so that I can try to resolve them. 
-jJ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From guido at python.org Tue Jan 14 21:07:49 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 14 Jan 2014 12:07:49 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D59860.1010301@trueblade.com> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> <52D59860.1010301@trueblade.com> Message-ID: On Tue, Jan 14, 2014 at 12:04 PM, Eric V. Smith wrote: > On 01/14/2014 01:52 PM, Guido van Rossum wrote: > >> But the way to arrive at this behavior without duplicating a whole lot >> of code seems to be to call the existing text-based __format__ API and >> convert the result to bytes -- for numbers this should be safe (their >> formatting produces just ASCII digits and a selected few other ASCII >> characters) but leads to an undesirable outcome for other types -- not >> just str but also e.g. lists or dicts containing str instances, since >> those call __repr__ on the contained items, and repr() may produce >> non-ASCII bytes. > > That's why I suggested restricting the types supported. If we could live > with just a subset of known types, then we could hard-code the > conversions to bytes. How many types with custom __format__'s are really > getting written to byte strings in 2.x? For that matter, are any lists, > sets, or dicts (or anything else using object.__format__'s conversion > using str()) really getting written to bytes? Do we need to support > these cases? > > In my mind, this comes down to: are we trying to add this just to make > porting easier? In my mind, we wouldn't even be adding feature at all > except for ease of porting 2.x code. 
So we should focus on what features
> are used in the code we're trying to port. I don't think our focus is on
> 2.x code that's using u''.format(), it's 2.x code that's been reviewed
> and is still using b''.format() because it's building up bytes for a
> wire protocol. And that code is not likely to need to format objects
> with arbitrary __format__ methods, or even str (in the 3.x sense). It's
> only likely to use numbers and bytes (or str in the 2.x sense).

Yes, these are exactly the right questions to ask.

--
--Guido van Rossum (python.org/~guido)

From ethan at stoneleaf.us  Tue Jan 14 20:56:25 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 14 Jan 2014 11:56:25 -0800
Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes
In-Reply-To: <52D59622.1070307@stoneleaf.us>
References: <52D59622.1070307@stoneleaf.us>
Message-ID: <52D59669.20404@stoneleaf.us>

Duh. Here's the text, as well. ;)

PEP: 461
Title: Adding % and {} formatting to bytes
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-13
Resolution:


Abstract
========

This PEP proposes adding the % and {} formatting operations from str to bytes.


Proposed semantics for bytes formatting
=======================================

%-interpolation
---------------

All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.)
will be supported, and will work as they do for str, including the
padding, justification and other related modifiers.

Example::

    >>> b'%4x' % 10
    b'   a'

%c will insert a single byte, either from an int in range(256), or from
a bytes argument of length 1.

Example:

    >>> b'%c' % 48
    b'0'

    >>> b'%c' % b'a'
    b'a'

%s, because it is the most general, has the most convoluted resolution:

  - input type is bytes?
    pass it straight through

  - input type is numeric?
    use its __xxx__ [1] [2] method and ascii-encode it (strictly)

  - input type is something else?
    use its __bytes__ method; if there isn't one, raise an exception [3]

Examples:

    >>> b'%s' % b'abc'
    b'abc'

    >>> b'%s' % 3.14
    b'3.14'

    >>> b'%s' % 'hello world!'
    Traceback (most recent call last):
    ...
    TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?

.. note::

   Because the str type does not have a __bytes__ method, attempts to
   directly use 'a string' as a bytes interpolation value will raise an
   exception.  To use 'string' values, they must be encoded or otherwise
   transformed into a bytes sequence::

      'a string'.encode('latin-1')


format
------

The format mini language will be used as-is, with the behaviors as listed
for %-interpolation.


Open Questions
==============

For %s there has been some discussion of trying to use the buffer protocol
(Py_buffer) before trying __bytes__.  This question should be answered before
the PEP is implemented.


Proposed variations
===================

It has been suggested to use %b for bytes instead of %s.

  - Rejected as %b does not exist in Python 2.x %-interpolation, which is
    why we are using %s.

It has been proposed to automatically use .encode('ascii','strict') for str
arguments to %s.

  - Rejected as this would lead to intermittent failures.  Better to have the
    operation always fail so the trouble-spot can be correctly fixed.

It has been proposed to have %s return the ascii-encoded repr when the value
is a str  (b'%s' % 'abc'  --> b"'abc'").

  - Rejected as this would lead to hard to debug failures far from the problem
    site.  Better to have the operation always fail so the trouble-spot can be
    easily fixed.


Footnotes
=========

.. [1] Not sure if this should be the numeric __str__ or the numeric __repr__,
       or if there's any difference
.. [2] Any proper numeric class would then have to provide an ascii
       representation of its value, either via __repr__ or __str__ (whichever
       we choose in [1]).
.. [3] TypeError, ValueError, or UnicodeEncodeError?


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From larry at hastings.org  Tue Jan 14 21:22:12 2014
From: larry at hastings.org (Larry Hastings)
Date: Tue, 14 Jan 2014 12:22:12 -0800
Subject: [Python-Dev] Changing Clinic's output
In-Reply-To: <52D20D86.6030502@hastings.org>
References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org>
Message-ID: <52D59C74.30702@hastings.org>

On 01/11/2014 07:35 PM, Larry Hastings wrote:
>
> On 01/08/2014 07:08 AM, Barry Warsaw wrote:
>> How hard would it be to put together some sample branches that provide
>> concrete examples of the various options?
>>
>> My own opinion could easily be influenced by having some hands-on time with
>> actual code, and I suspect even Guido could be influenced if he could pull
>> some things up in his editor and take a look around.
>
> I've uploaded a prototype here:
>
>     https://bitbucket.org/larry/python-clinic-buffer
>

I have now received exactly zero feedback about the prototype, which
suggests people aren't using it. In an attempt to jump-start this
conversation, I've created a new repository containing the "concrete
examples of the various options" that Barry proposed above. You may find
it here:

    https://bitbucket.org/larry/clinic-buffer-samples/src

In it I converted Modules/_pickle.c four different ways. There's a
README, please read it.

People who want to change how Clinic writes its output: this is your big
chance. Comment on these samples, or produce your own counterexamples,
or something. If you can get enough people on your side, maybe Clinic will
change. If there is no further debate on this topic, nothing will
happen and Clinic will not change.


//arry/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ethan at stoneleaf.us  Tue Jan 14 21:13:31 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 14 Jan 2014 12:13:31 -0800
Subject: [Python-Dev] PEP 460 reboot
In-Reply-To:
References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz>
Message-ID: <52D59A6B.5030803@stoneleaf.us>

On 01/14/2014 10:52 AM, Guido van Rossum wrote:
>
> Which reminds me. Quite a few people have spoken out in favor of loud
> failures rather than silent "wrong" output. But I think that in the
> specific context of formatting output, there is a long and IMO good
> tradition of producing (slightly) wrong output in favor of more strict
> behavior. Consider for example what to do when a number doesn't fit in
> the given width. Would you rather raise an exception, truncate the
> value, or mess up the formatting?

One more data point to consider: When the binary format has strict rules
on how much space a data-point is allowed, then failure is the only
appropriate option.

In Py2, because '%15s' can actually take 17 characters, I have to use
'%15s' % data_value[:15] everywhere.

I'm not suggesting we change how that portion works, as it would then be,
I think, too different from both Py2 behavior as well as current str
behavior, but likewise adding in single quotes would be of no help to me.
Loud failure so I can easily see where I forgot the .encode() would be
much more helpful.
-- ~Ethan~ From zachary.ware+pydev at gmail.com Tue Jan 14 21:48:34 2014 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Tue, 14 Jan 2014 14:48:34 -0600 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52D59C74.30702@hastings.org> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> Message-ID: On Tue, Jan 14, 2014 at 2:22 PM, Larry Hastings wrote: > I have now received exactly zero feedback about the prototype, which > suggests people aren't using it. Oops, I had half a post written about this two days ago, but never got it posted. I did some experimenting on winreg.c (see http://hg.python.org/sandbox/zware/file/prototype_clinic/PC/winreg.c), and I have to say I really really like having most of the output shunted down to the bottom of the file. In that example I have only the implementation outputting to the block, and everything else (that's necessary) going into the buffer; to me it looks very nice and clean. One of my biggest annoyances with the current output is having the docstring repeated nearly verbatim (with additives) within just a few lines, and this takes care of that and more. To me, those converted functions read about as close to real Python as is ever going to happen in a C file. One thing that I could see being useful (though possibly not easy) is the ability to dump a buffer "late"; for example, near the top of the file: /*[clinic input] destination prototypes new buffer output parser_prototype prototypes dump prototypes later [clinic start generated code]*/ Then process the file, filling the prototypes buffer as we go. At the end of the file, go back and dump the buffer in that output block. 
I like the flexibility of the prototype; having more control over what
goes where is always nice :)

--
Zach

From ethan at stoneleaf.us  Tue Jan 14 20:55:14 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 14 Jan 2014 11:55:14 -0800
Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes
Message-ID: <52D59622.1070307@stoneleaf.us>

This PEP goes a bit further than PEP 460 does, and hopefully spells things
out in enough detail so there is no confusion as to what is meant.

--
~Ethan~

From larry at hastings.org  Tue Jan 14 21:54:31 2014
From: larry at hastings.org (Larry Hastings)
Date: Tue, 14 Jan 2014 12:54:31 -0800
Subject: [Python-Dev] Changing Clinic's output
In-Reply-To:
References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org>
Message-ID: <52D5A407.2020907@hastings.org>

On 01/14/2014 12:48 PM, Zachary Ware wrote:
> On Tue, Jan 14, 2014 at 2:22 PM, Larry Hastings wrote:
>> I have now received exactly zero feedback about the prototype, which
>> suggests people aren't using it.
> Oops, I had half a post written about this two days ago, but never got
> it posted.
>
> I did some experimenting on winreg.c (see
> http://hg.python.org/sandbox/zware/file/prototype_clinic/PC/winreg.c),
> and I have to say I really really like having most of the output
> shunted down to the bottom of the file.

I will consider you a +1 on the "buffer" approach and NaN on the other
approaches.

> One thing that I could see being useful (though possibly not easy) is
> the ability to dump a buffer "late"; for example, near the top of the
> file:
>
> /*[clinic input]
> destination prototypes new buffer
> output parser_prototype prototypes
> dump prototypes later
> [clinic start generated code]*/
>
> Then process the file, filling the prototypes buffer as we go. At the
> end of the file, go back and dump the buffer in that output block.

That wouldn't be too hard.
But conceptually it would make Clinic much more complicated. For example, I suggest that "later" is a confusing name, because the output will actually happen *earlier* in the file. "If it's hard to explain, it may be a bad idea." ;-) //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Jan 14 21:54:29 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 14 Jan 2014 12:54:29 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D59A6B.5030803@stoneleaf.us> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> <52D59A6B.5030803@stoneleaf.us> Message-ID: On Tue, Jan 14, 2014 at 12:13 PM, Ethan Furman wrote: > On 01/14/2014 10:52 AM, Guido van Rossum wrote: >> >> Which reminds me. Quite a few people have spoken out in favor of loud >> failures rather than silent "wrong" output. But I think that in the >> specific context of formatting output, there is a long and IMO good >> tradition of producing (slightly) wrong output in favor of more strict >> behavior. Consider for example what to do when a number doesn't fit in >> the given width. Would you rather raise an exception, truncate the >> value, or mess up the formatting? > > One more data point to consider: When the binary format has strict rules on > how much space a data-point is allowed, then failure is the only appropriate > option. Yes, that's how the struct module works. > In Py2, because '%15s' can actually take 17 characters, I have to use '%15s' > % data_value[:15] everywhere. Wow. I thought there would be some combination using %.15s but I can't get that to work. :-( > I'm not suggesting we change how that portion works, as it would then be, I > think, too different from both Py2 behavior as well as current str behavior, > but likewise adding in single quotes would of no help to me. Loud failure > so I can easily see where I forgot the .encode() would be much more helpful. 
If we go with a more restricted version this makes sense indeed. The single quotes seemed unavoidable when I was trying (like several other proposals) to have a format code that works for all types. I think we're rightly giving up on that now. (I should review PEP 461, but I don't have time yet.) -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Tue Jan 14 21:57:30 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 14 Jan 2014 21:57:30 +0100 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> Message-ID: <20140114215730.502b4142@fsol> On Tue, 14 Jan 2014 11:56:25 -0800 Ethan Furman wrote: > > %s, because it is the most general, has the most convoluted resolution: > > - input type is bytes? > pass it straight through It should try to get a Py_buffer instead. > - input type is numeric? > use its __xxx__ [1] [2] method and ascii-encode it (strictly) What is the definition of "numeric"? Regards Antoine. From brett at python.org Tue Jan 14 22:03:35 2014 From: brett at python.org (Brett Cannon) Date: Tue, 14 Jan 2014 16:03:35 -0500 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52D59C74.30702@hastings.org> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> Message-ID: On Tue, Jan 14, 2014 at 3:22 PM, Larry Hastings wrote: > > On 01/11/2014 07:35 PM, Larry Hastings wrote: > > > On 01/08/2014 07:08 AM, Barry Warsaw wrote: > > How hard would it be to put together some sample branches that provide > concrete examples of the various options? > > My own opinion could easily be influenced by having some hands-on time with > actual code, and I suspect even Guido could be influenced if he could pull > some things up in his editor and take a look around. 
> > > I've uploaded a prototype here: > > https://bitbucket.org/larry/python-clinic-buffer > > > > I have now received exactly zero feedback about the prototype, which > suggests people aren't using it. In an attempt to jump-start this > conversation, I've created a new repository containing the "concrete > examples of the various options" that Barry proposed above. You may find > it here: > > https://bitbucket.org/larry/clinic-buffer-samples/src > > In it I converted Modules/_pickle.c four different ways. There's a > README, please read it. > > People who want to change how Clinic writes its output: this is your big > chance. Comment on these samples, or produce your own counterexamples, or > something. If you can enough people on your side maybe Clinic will > change. If there is no further debate on this topic, nothing will happen > and Clinic will not change. > +0 _pickle.original.c +1 _pickle.using-buffer.c -0 _pickle.using-modified-buffer.c +1 _pickle.using-multiple-buffers.c -0 _pickle.using-sidefile.c -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Tue Jan 14 22:05:35 2014 From: brett at python.org (Brett Cannon) Date: Tue, 14 Jan 2014 16:05:35 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D59622.1070307@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> Message-ID: On Tue, Jan 14, 2014 at 2:55 PM, Ethan Furman wrote: > This PEP goes a but further than PEP 460 does, and hopefully spells things > out in enough detail so there is no confusion as to what is meant. > Are we going down the PEP route with the various ideas? Guido, do you want one from me as well or should I not bother? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Tue Jan 14 22:11:45 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 14 Jan 2014 13:11:45 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> Message-ID: <52D5A811.8090406@stoneleaf.us> On 01/14/2014 01:05 PM, Brett Cannon wrote: > On Tue, Jan 14, 2014 at 2:55 PM, Ethan Furman wrote: > >> This PEP goes a but further than PEP 460 does, and hopefully spells >> things out in enough detail so there is no confusion as to what is >> meant. > > Are we going down the PEP route with the various ideas? Guido, do > you want one from me as well or should I not bother? While I can't answer for Guido, I will say I authored this PEP because Antoine didn't want 460 to be any more liberal than it already was. If you collect your ideas together, I'll add them to 461 as questions or discussions or however is appropriate (assuming you're willing to go that route). -- ~Ethan~ From ethan at stoneleaf.us Tue Jan 14 21:51:31 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 14 Jan 2014 12:51:31 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52D59C74.30702@hastings.org> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> Message-ID: <52D5A353.7070002@stoneleaf.us> On 01/14/2014 12:22 PM, Larry Hastings wrote: > > I have now received exactly zero feedback about the prototype, which suggests people aren't using it. In an attempt to > jump-start this conversation, I've created a new repository containing the "concrete examples of the various options" > that Barry proposed above. You may find it here: > > https://bitbucket.org/larry/clinic-buffer-samples/src > > In it I converted Modules/_pickle.c four different ways. There's a README, please read it. > > People who want to change how Clinic writes its output: this is your big chance. 
Comment on these samples, or produce > your own counterexamples, or something. If you can enough people on your side maybe Clinic will change. If there is no > further debate on this topic, nothing will happen and Clinic will not change. I checked the README, the current file, and the buffered files. My preferences from highest to lowest: - modified buffer approach - buffer approach - side file Thanks for taking the time, Larry! -- ~Ethan~ From solipsis at pitrou.net Tue Jan 14 22:12:21 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 14 Jan 2014 22:12:21 +0100 Subject: [Python-Dev] Changing Clinic's output References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> Message-ID: <20140114221221.5dbc2541@fsol> On Tue, 14 Jan 2014 12:22:12 -0800 Larry Hastings wrote: > > https://bitbucket.org/larry/clinic-buffer-samples/src > > In it I converted Modules/_pickle.c four different ways. There's a > README, please read it. I'm +1 on the sidefile approach. +0 on the various buffer approaches. -0.5 on the current "sprinkled everywhere" approach. Regards Antoine. From eric at trueblade.com Tue Jan 14 22:15:57 2014 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 14 Jan 2014 16:15:57 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> <52D59A6B.5030803@stoneleaf.us> Message-ID: <52D5A90D.3070904@trueblade.com> On 1/14/2014 3:54 PM, Guido van Rossum wrote: > On Tue, Jan 14, 2014 at 12:13 PM, Ethan Furman wrote: >> In Py2, because '%15s' can actually take 17 characters, I have to use '%15s' >> % data_value[:15] everywhere. > > Wow. I thought there would be some combination using %.15s but I can't > get that to work. 
:-( >>> '%.15s' % 'abcdefghij1234567' 'abcdefghij12345' >>> '{:.15}'.format('abcdefghij1234567') 'abcdefghij12345' >>> Or, depending on what you're after: >>> '%15.15s' % 'abcde' ' abcde' >>> '%15.15s' % 'abcdefghij1234567' 'abcdefghij12345' >>> > (I should review PEP 461, but I don't have time yet.) Same here. From breamoreboy at yahoo.co.uk Tue Jan 14 22:17:02 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Tue, 14 Jan 2014 21:17:02 +0000 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> <52D59A6B.5030803@stoneleaf.us> Message-ID: On 14/01/2014 20:54, Guido van Rossum wrote: > On Tue, Jan 14, 2014 at 12:13 PM, Ethan Furman wrote: > >> In Py2, because '%15s' can actually take 17 characters, I have to use '%15s' >> % data_value[:15] everywhere. > > Wow. I thought there would be some combination using %.15s but I can't > get that to work. :-( > I believe you wanted this. >>> a='01234567890123456' >>> len(a) 17 >>> b = '%15.15s' % a >>> b;len(b) '012345678901234' 15 -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From greg.ewing at canterbury.ac.nz Tue Jan 14 22:17:42 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Jan 2014 10:17:42 +1300 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> <878uuievm5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D5A976.1050401@canterbury.ac.nz> Nick Coghlan wrote: > The > mini-language parser has to assume in encoding in order to interpret > the format string, and that's *all* done assuming an ASCII compatible > format string (which must make life interesting if you try to use an > ASCII incompatible coding cookie for your source code I don't think it's all *that* interesting. 
As long as you're able to type the relevant characters on your keyboard and have them displayed in a recognisable way in your editor, then what looks like b"Content-Length: %d" in your source will end up encoded as ascii in the bytes object, whatever the encoding of the source file. If the source file uses an encoding that can't even represent the formatting characters, then you're in trouble -- but you'd have a hard time writing Python code at all in such an environment! > It's certainly a decision that has its downsides, with the potential > impact on users of ASCII incompatible encodings (mostly in Asia) being > the main one, I don't think it will have much impact on them, other than maybe they will find less use cases for it. But the main intended use cases are for things like http headers which have protocol-mandated ascii-ish bits, and those bits are still just as ascii-ish in China as they are anywhere else. -- Greg From barry at python.org Tue Jan 14 22:26:09 2014 From: barry at python.org (Barry Warsaw) Date: Tue, 14 Jan 2014 16:26:09 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: <20140114162609.1f03cfb3@anarchist.wooz.org> On Jan 14, 2014, at 10:52 AM, Guido van Rossum wrote: >Which reminds me. Quite a few people have spoken out in favor of loud >failures rather than silent "wrong" output. But I think that in the >specific context of formatting output, there is a long and IMO good >tradition of producing (slightly) wrong output in favor of more strict >behavior. In the email package we now have a tradition of allowing either behavior. http://docs.python.org/3.4/library/email.policy.html#email.policy.Policy.raise_on_defect Perhaps not appropriate for the PEP 460 related cases, but I think the policy mechanism works great for email parsing, where sometimes you definitely want to fail early (e.g. 
you are composing new messages out of literal strings) and other times where you are willing to put up with some best-effort representation in exchange for no exceptions being raised (e.g. you are parsing messages being fed to you from your mail server). -Barry From ethan at stoneleaf.us Tue Jan 14 22:28:44 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 14 Jan 2014 13:28:44 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D5A90D.3070904@trueblade.com> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> <52D59A6B.5030803@stoneleaf.us> <52D5A90D.3070904@trueblade.com> Message-ID: <52D5AC0C.8030103@stoneleaf.us> On 01/14/2014 01:15 PM, Eric V. Smith wrote: > On 1/14/2014 3:54 PM, Guido van Rossum wrote: >> On Tue, Jan 14, 2014 at 12:13 PM, Ethan Furman wrote: >>> In Py2, because '%15s' can actually take 17 characters, I have to use '%15s' >>> % data_value[:15] everywhere. >> >> Wow. I thought there would be some combination using %.15s but I can't >> get that to work. :-( > >>>> '%.15s' % 'abcdefghij1234567' > 'abcdefghij12345' >>>> '{:.15}'.format('abcdefghij1234567') > 'abcdefghij12345' >>>> > > Or, depending on what you're after: > >>>> '%15.15s' % 'abcde' > ' abcde' >>>> '%15.15s' % 'abcdefghij1234567' > 'abcdefghij12345' Huh. Wish I'd known about that way back when! ;) -- ~Ethan~ From ethan at stoneleaf.us Tue Jan 14 22:07:57 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 14 Jan 2014 13:07:57 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <20140114215730.502b4142@fsol> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <20140114215730.502b4142@fsol> Message-ID: <52D5A72D.1040208@stoneleaf.us> On 01/14/2014 12:57 PM, Antoine Pitrou wrote: > On Tue, 14 Jan 2014 11:56:25 -0800 > Ethan Furman wrote: >> >> %s, because it is the most general, has the most convoluted resolution: >> >> - input type is bytes? 
>> pass it straight through > > It should try to get a Py_buffer instead. Meaning any bytes or bytes-subtype will support the Py_buffer protocol, and this should be the first thing we try? Sounds good. For that matter, should the first test be "does this object support Py_buffer" and not worry about it being isinstance(obj, bytes)? >> - input type is numeric? >> use its __xxx__ [1] [2] method and ascii-encode it (strictly) > > What is the definition of "numeric"? That is a key question. Obviously we have int, float, and complex. We also have Decimal. But what about Fraction? Or some user's numeric class that doesn't inherit from a core numeric type? Wherever we draw the line, we need to make sure it's well-documented. -- ~Ethan~ From python at mrabarnett.plus.com Tue Jan 14 22:34:35 2014 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 14 Jan 2014 21:34:35 +0000 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> <52D59A6B.5030803@stoneleaf.us> Message-ID: <52D5AD6B.2070408@mrabarnett.plus.com> On 2014-01-14 20:54, Guido van Rossum wrote: > On Tue, Jan 14, 2014 at 12:13 PM, Ethan Furman wrote: >> On 01/14/2014 10:52 AM, Guido van Rossum wrote: >>> >>> Which reminds me. Quite a few people have spoken out in favor of loud >>> failures rather than silent "wrong" output. But I think that in the >>> specific context of formatting output, there is a long and IMO good >>> tradition of producing (slightly) wrong output in favor of more strict >>> behavior. Consider for example what to do when a number doesn't fit in >>> the given width. Would you rather raise an exception, truncate the >>> value, or mess up the formatting? >> >> One more data point to consider: When the binary format has strict rules on >> how much space a data-point is allowed, then failure is the only appropriate >> option. > > Yes, that's how the struct module works.
> >> In Py2, because '%15s' can actually take 17 characters, I have to use '%15s' >> % data_value[:15] everywhere. > > Wow. I thought there would be some combination using %.15s but I can't > get that to work. :-( > I'm not sure what you mean here: Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import string >>> '%.15s' % string.letters 'abcdefghijklmno' >>> len(_) 15 >> I'm not suggesting we change how that portion works, as it would then be, I >> think, too different from both Py2 behavior as well as current str behavior, >> but likewise adding in single quotes would be of no help to me. Loud failure >> so I can easily see where I forgot the .encode() would be much more helpful. > > If we go with a more restricted version this makes sense indeed. The > single quotes seemed unavoidable when I was trying (like several other > proposals) to have a format code that works for all types. I think > we're rightly giving up on that now. > > (I should review PEP 461, but I don't have time yet.)
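The width-versus-precision distinction in this exchange can be checked directly. The following is a minimal sketch using str formatting on Python 3 (string.letters in the session above exists only in Python 2); the variable names are illustrative and do not come from any poster's code.

```python
value = 'abcdefghij1234567'        # 17 characters

# '%15s' sets only a minimum width, so a longer value passes through whole.
padded = '%15s' % value

# '%.15s' sets a precision, which truncates to at most 15 characters.
truncated = '%.15s' % value

# '%15.15s' combines both: pad short values, truncate long ones.
fixed_short = '%15.15s' % 'abcde'
fixed_long = '%15.15s' % value

# str.format spells the same precision as '{:.15}'.
via_format = '{:.15}'.format(value)
```

With both a width and a precision, every result is exactly 15 characters long, which is what a fixed-width record layout needs.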
> From ncoghlan at gmail.com Tue Jan 14 22:37:00 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 15 Jan 2014 07:37:00 +1000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 15 Jan 2014 04:16, "Guido van Rossum" wrote: > > [Other readers: asciistr is at https://github.com/jeamland/asciicompat] > > On Mon, Jan 13, 2014 at 11:44 PM, Nick Coghlan wrote: > > Right, asciistr is designed for a specific kind of hybrid API where > > you want to accept binary input (and produce binary output) *and* you > > want to accept text input (and produce text output). Porting those > > from Python 2 to Python 3 is painful not because of any limitations of > > the str or bytes API but because it's the only use case I have found > > where I actually *missed* the implicit interoperability offered by the > > Python 2 str type. > > Yes, the use case is clear. > > > It's not an implementation style I would consider appropriate for the > > standard library - we need to code very defensively in order to aid > > debugging in arbitrary contexts, so I consider having an API like > > urllib.parse demand 7-bit ASCII in the binary version, and require > > text to handle impure input to be a better design choice. > > This surprises me. I think asciistr should strive to be useful for the > stdlib as well. The concerns you raise are the reason I'm not sure that's possible - just as in the Python 2 text model, I suspect actually *using* asciistr will trade ease of development against robust detection of input errors. 
I'm OK with that in a PyPI module, I'd be dubious about including it in the standard library and making it a builtin is right out. > > However, in an environment where you can place greater preconditions > > on your inputs (such as "ensure all input data is ASCII compatible") > > That gives me the Python 2 willies. :-( Yep - from a formal correctness point of view, asciistr is a terrible idea. That's not the only consideration in coding though, or we'd all be using statically typed languages :) > > and you're willing to tolerate the occasional obscure traceback for > > particular kinds of errors, > > Really? Can you give an example where the traceback using asciistr() > would be more obscure than using the technique you used in > urllib.parse? In urllib.parse I do an up front check that everything is consistently bytes or str. With asciistr it becomes tempting to skip that up front check, so you instead get a TypeError about not being able to add str and bytes. Technically you could keep that up front check and only use asciistr as an internal implementation detail, but at that point you may as well do things properly and write the algorithm to operate solely on bytes or str and convert the other inputs appropriately (which is the actual approach we use in the standard library). > > then it should be a convenient way to use > > common constants (like separators or URL scheme names) in an algorithm > > that can manipulate either binary or text, but not a combination of > > the two (the latter is still a nice improvement in correctness over > > Python 2, which allowed them to be mixed freely rather than requiring > > consistency across the inputs). > > Unfortunately I suspect there are still examples where asciistr's > "submissive" behavior can produce surprises. E.g. consider a function > of two arguments that must either be both bytes or both str. It's > easily conceivable that for certain combinations of incorrect > arguments (i.e. 
one bytes and one str) the function doesn't raise an > error but returns something of one or the other type. (And this is > exactly the Python 2 outcome we're trying to avoid.) Yep - that's why I consider asciistr to be firmly in the "power tool" category. If you know what you're doing, it should let you write hybrid API code that is just as concise as Python 2, but it's also far more error prone than the core Python 3 text model. I admit that's a key part of my motivation in trying to help Benno to create it - I want to show that it's not that you *can't* write code that way in Python 3, it's that there are good reasons why you *shouldn't*. And in cases where those reasons don't apply... well, the aim in that case is "pip install asciicompat" and away you go :) > > It's still slightly different from Python 2, though. In Python 2, the > > interaction model was: > > > > str & str -> str > > str & unicode -> unicode > > > > (with the one exception being str.format: that consistently produces > > str rather than promoting to Unicode) > > Or raises good old UnicodeError. :-( Unless Benno fixed it in the last couple of days (which seems unlikely given the complexity of the problem), asciistr currently has the Python 3 behaviour of interpolating the bytes repr() into the string rather than trying to decode it. That's a key reason why it likely *won't* be a substitute for PEP 460. > > My goal for asciistr is that it should exhibit the following behaviour: > > > > str & asciistr -> str > > asciistr & asciistr -> str (making it asciistr would be a pain and > > I don't have a use case for that) > > I almost had one in the example code I sent in response to Greg. > > > bytes & asciistr -> bytes > > I understand that '&' here stands for "any arbitrary combination", but > what about searches? Given that asciistr's base class is str, won't it > still blow up if you try to use it as an argument to e.g. > bytes.startswith()? 
Equality tests also sound problematic; is b'x' == > asciistr('x') == 'x' ??? Yes, the aim is to take advantage of the fact that bytes generally interoperates with anything that publishes a PEP 3118 buffer - the key feature of asciistr is that it publishes the 8-bit segment from PEP 393 as that buffer (the constructor checks that the max code point is 127 or less). It's very CPython specific due to the tinkering with str internals, but the idea is mostly to show that the semantics of such a type *can* still be expressed relatively sensibly in Python 3, it's just not an approach that's going to be applicable very often (most Python 3 native code will be able to choose to be a binary or text API, so the need for this kind of hybrid API design mostly affects APIs that started life in Python 2 and hence still need to support both use cases). > > So in code like that in urllib.parse (but in a more constrained > > context), you could just switch all your constants to asciistr, change > > your indexing operations to length 1 slices and then in theory > > essentially the same code that worked in Python 2 should also work in > > Python 3. > > The more I think about this, the less I believe it's that easy. I > suspect you had the right idea when you mentioned singledispatch. It > might be easier to write the bytes version in terms of the string > versions wrapped in decode/encode, or vice versa, rather than trying > to reason out all the different combinations of str, bytes, asciistr. Yes - while I don't plan to *actually* switch the way urllib.parse works away from the current higher order function approach (it ain't broke, so there's nothing to fix), I do have a patch in progress that shows how it would look using single dispatch instead. 
Once I have that done, I'll post it somewhere as a demonstration and update my binary protocol essay to suggest the additional option of using single dispatch to process in the binary or text domain, with optional encoding and decoding steps controlled by the type of the first input. Also: after converting a function that takes a tuple where I wanted to dispatch on the type of the first element, I suspect supporting a "key=lambda args, kwds: type(args[0][0])" argument to singledispatch in Python 3.5 might be a reasonable idea. On the other hand, I haven't explored the possibility of a custom decorator yet, either, so we don't need to do anything hasty :) > > However, Benno is finding that my warning about possible > > interoperability issues was accurate - we have various places where we > > do PyUnicode_Check() rather than PyUnicode_CheckExact(), which means > > we don't always notice a PEP 3118 buffer interface if it is provided > > by a str subclass. > > Not sure I understand this, but I believe him when he says this won't be easy. Essentially, we *want* bytes to see asciistr as a buffer exporter, but in a few places it goes "ah, a str subclass!" instead (which usually isn't what we want). > > We'll look at those as we find them, and either > > work around them (if we can), decide not to support that behaviour in > > asciistr, or else I'll create a patch to resolve the interoperability > > issue. > > > > It's not necessarily a type I'd recommend using in production code, as > > there *will* always be a more explicit alternative that doesn't rely > > on a tricksy C extension type that only works in CPython. However, > > it's a type I think is worth having implemented and available on PyPI, > > even if it's just to disprove the claim that you *can't* write that > > kind of code in Python 3. > > Hm. It is beginning to sound more and more flawed. I also worry that > it will bring back the nightmare of data-dependent UnicodeError back. > E.g. 
this (from tests/basic.py): > > def test_asciistr_will_not_accept_codepoints_above_127(self): > self.assertRaises(ValueError, asciistr, 'Schrödinger') > > looks reasonable enough when you assume asciistr() is always used with > a literal as argument -- but I suspect that plenty of people would > misunderstand its purpose and write asciistr(s) as a "clever" way to > turn a string into something that's compatible with both bytes and > strings... :-( Yep - while I did plan to publish it on PyPI (with a big "actually using this type may eat your data if you're not careful" warning), I'm also open to the idea of just leaving it as a proof of concept on GitHub. I don't see a lot of actual risk in publishing it though, and I think the demonstrable risks encountered when attempting to use it do a reasonable job of showing *why* we changed away from having a core 8-bit string type that behaved that way. Cheers, Nick. > > -- > --Guido van Rossum (python.org/~guido) From raymond.hettinger at gmail.com Tue Jan 14 22:38:46 2014 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 14 Jan 2014 21:38:46 +0000 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <20140114221221.5dbc2541@fsol> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> <20140114221221.5dbc2541@fsol> Message-ID: <0E0B7B9E-6269-4D46-9235-107272CDB0CC@gmail.com> On Jan 14, 2014, at 9:12 PM, Antoine Pitrou wrote: > I'm +1 on the sidefile approach. +0 on the various buffer approaches. > -0.5 on the current "sprinkled everywhere" approach. I concur with Antoine except that I'm a full -1 on commingling generated code with hand edited code. Sprinkled everywhere interferes with my ability to grok the code. It interferes with code navigation. And it creates a greater risk of accidentally editing the generated code.
FWIW, I think everyone should place a lot of weight on Serhiy's comments and suggestions. His reasoning is clear and compelling. And the thoughts are all soundly based on extensive experience with the clinic's effect on the C source code. Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Tue Jan 14 22:43:50 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 14 Jan 2014 22:43:50 +0100 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <20140114215730.502b4142@fsol> <52D5A72D.1040208@stoneleaf.us> Message-ID: <20140114224350.24ea3c06@fsol> On Tue, 14 Jan 2014 13:07:57 -0800 Ethan Furman wrote: > > Meaning any bytes or bytes-subtype will support the Py_buffer protocol, and this should be the first thing we try? > > Sounds good. > > For that matter, should the first test be "does this object support Py_buffer" and not worry about it being > isinstance(obj, bytes)? Yes, unless the implementation wants to micro-optimize stuff. > >> - input type is numeric? > >> use its __xxx__ [1] [2] method and ascii-encode it (strictly) > > > > What is the definition of "numeric"? > > That is a key question. > > Obviously we have int, float, and complex. We also have Decimal. The question is also how do you test for them? Decimal is not a core builtin type. Do we need some kind of __bformat__ protocol? Regards Antoine. 
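One way to frame the "how do you test for them?" question is with the abstract base classes in the numbers module. The sketch below only illustrates what such a test would and would not accept; it is not part of the PEP 461 draft, and is_numeric is a hypothetical helper name.

```python
import numbers
from decimal import Decimal
from fractions import Fraction


def is_numeric(obj):
    """Hypothetical check: anything registered as numbers.Number counts."""
    return isinstance(obj, numbers.Number)


candidates = [1, 1.5, 1j, Decimal('1.5'), Fraction(1, 2), True, '1', b'1']
results = [is_numeric(c) for c in candidates]
```

Two consequences are worth noting: bool passes because it subclasses int, and a user-defined numeric class is only recognised if it inherits from or registers with one of the numbers ABCs.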
From yselivanov.ml at gmail.com Tue Jan 14 22:44:16 2014 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 14 Jan 2014 16:44:16 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D5A72D.1040208@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <20140114215730.502b4142@fsol> <52D5A72D.1040208@stoneleaf.us> Message-ID: On January 14, 2014 at 4:36:00 PM, Ethan Furman (ethan at stoneleaf.us) wrote: > > On 01/14/2014 12:57 PM, Antoine Pitrou wrote: > > On Tue, 14 Jan 2014 11:56:25 -0800 > > Ethan Furman wrote: > >> > >> %s, because it is the most general, has the most convoluted > resolution: > >> > >> - input type is bytes? > >> pass it straight through > > > > It should try to get a Py_buffer instead. > > Meaning any bytes or bytes-subtype will support the Py_buffer > protocol, and this should be the first thing we try? > > Sounds good. > > For that matter, should the first test be "does this object support > Py_buffer" and not worry about it being > isinstance(obj, bytes)? > > > >> - input type is numeric? > >> use its __xxx__ [1] [2] method and ascii-encode it (strictly) > > > > What is the definition of "numeric"? > > That is a key question. isinstance(o, numbers.Number) ? Yury From ethan at stoneleaf.us Tue Jan 14 22:28:01 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 14 Jan 2014 13:28:01 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> <52D59A6B.5030803@stoneleaf.us> Message-ID: <52D5ABE1.8030303@stoneleaf.us> On 01/14/2014 01:17 PM, Mark Lawrence wrote: > On 14/01/2014 20:54, Guido van Rossum wrote: >> On Tue, Jan 14, 2014 at 12:13 PM, Ethan Furman wrote: >> >>> In Py2, because '%15s' can actually take 17 characters, I have to use '%15s' >>> % data_value[:15] everywhere. >> >> Wow. 
I thought there would be some combination using %.15s but I can't >> get that to work. :-( >> > > I believe you wanted this. > >>>> a='01234567890123456' >>>> len(a) > 17 >>>> b = '%15.15s' % a >>>> b;len(b) > '012345678901234' > 15 Cool! -- ~Ethan~ From zachary.ware+pydev at gmail.com Tue Jan 14 22:51:49 2014 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Tue, 14 Jan 2014 15:51:49 -0600 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52D5A407.2020907@hastings.org> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> <52D5A407.2020907@hastings.org> Message-ID: On Tue, Jan 14, 2014 at 2:54 PM, Larry Hastings wrote: > I will consider you a +1 on the "buffer" approach and NaN on the other > approaches. Oops, I'll give you some real numbers: -1 _pickle.original.c +1 _pickle.using-buffer.c +0 _pickle.using-modified-buffer.c +1 _pickle.using-multiple-buffers.c +0 _pickle.using-sidefile.c > That wouldn't be too hard. But conceptually it would make Clinic much more > complicated. For example, I suggest that "later" is a confusing name, > because the output will actually happen *earlier* in the file. "If it's > hard to explain, it may be a bad idea." ;-) Fair enough :). "later" makes sense to me as "there's nothing in the buffer now, but there will be later; dump it here then". The spark for this idea is in _winapi.c, where OverlappedObject's methoddef is actually before any of the methods are implemented which makes a certain amount of sense as a list of what will be implemented; but as far as I can tell, it isn't possible to replicate this with Clinic right now. 
Having read the readme in your examples, this could also help with the chicken-and-egg problem you talked about using the various buffers: dump docstrings at the top, followed by prototypes, then methoddef defines near where they're needed (or even perhaps output them directly into the PyMethodDef structure, no defines needed). -- Zach From tjreedy at udel.edu Tue Jan 14 22:55:44 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 14 Jan 2014 16:55:44 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> <878uuievm5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Let me answer you both since the issues are related. On 1/14/2014 7:46 AM, Nick Coghlan wrote: >> Guido van Rossum writes: >> > And that is precisely my point. When you're using a format string, Bytes interpolation uses a bytes format, or a byte string if you will, but it should not be thought of as a character or text string. Certain bytes (123 and 125) delimit a replacement field. The bytes in between define, in my version, a format-spec after being ascii-decoded to text for input to 3.x format(). The decoding and subsequent encoding would not be needed if 2.7 format(ob, byte-spec) were available. >> > all of the format string (not just the part between { and }) had >> > better use ASCII or an ASCII superset. I am not even sure what you mean here. The bytes outside of 123 and 125 are simply copied to the output string. There is no encoding or interpretation involved. It is true that the uninterpred bytes best not contain a byte pattern mistakenly recognized as a replacement field. I plan to refine the relational expression byte pattern used in byteformat to sharply reduce the possibility of such errors. When such errors happen anyway, an exception should be raised, and I plan to expand the error message to give more diagnostic information. 
>> And this (rightly) constrains the output to an ASCII superset as well. What does this mean? I suspect I disagree. The bytes interpolated into the output bytes can be any bytes. >> Except that if you interpolate something like Shift JIS, Bytes interpolation interpolates bytes, not encodings. A self-identifying byte stream starts with bytes in a known encoding that specifies the encoding of the rest of the stream. Neither part need be encoded text. (Would that something like that were standard for encoded text streams, as well as for serialized images.) >> [snip] > Right, that's the danger I was worried about, but the problem is that > there's at least *some* minimum level of ASCII compatibility that > needs to be assumed in order to define an interpolation format at all > (this is the point I originally missed). I would put this slightly differently. To process bytes, we may define certain bytes as metabytes with a special meaning. We may choose the bytes that happen to be the ascii encoding of certain characters. But once the special numbers are chosen, they are numbers, not characters. The problem of metabytes having both a normal and special meaning is similar to the problem of metacharacters having both a normal and special meaning. > For printf-style formatting, > it's % along with the various formatting characters and other syntax > (like digits, parentheses, variable names and "."), with the format > method it's braces, brackets, colons, variable names, etc. It is the bytes corresponding to these characters. This is true also of the metabytes in an re module bytes pattern. > The mini-language parser has to assume an encoding > in order to interpret the format string, This is where I disagree with you and Guido. Bytes processing is done with numbers 0 <= n <= 255, not characters. The fact that ascii characters can, for convenience, be used in bytes literals to indicate the corresponding ascii codes does not change this.
A bytes parser looks for certain special numbers. Other numbers need not be given any interpretation and need not represent encoded characters. > and that's *all* done assuming an ASCII compatible format string Since any bytes can be regarded as an ascii-compatible latin-1 encoded string, that seems like a vacuous assumption. In any case, I do not see any particular assumption in the following, other than the choice of replacement field delimiters. >>> list(byteformat(bytes([1,2,10, 123, 125, 200]), (bytes([50, 100, 150]),))) [1, 2, 10, 50, 100, 150, 200] > (which must make life interesting if you try to use an > ASCII incompatible coding cookie for your source code - I'm actually > not sure what the full implications of that *are* for bytes literals > in Python 3). An interesting and important question. The Python 2 manual says that the coding cookie applies only to comments and strings. To me, this suggests that any encoding can be used. I am not sure how and when the encoding is applied. It suggests that the sequence of bytes resulting from a string literal is not determined solely by the sequence of characters comprising the string literal, but also depends on the coding cookie. The Python 3 manual says that the coding cookie applies to the whole source file. To me, this says that the subset of unicode chars included in the encoding *must* include the ascii characters. It also suggests to me that the encoding must also be ascii-compatible, in order to read the encoding in the ascii-text coding cookie (unless there is a fallback to the system encoding). In any case, a 3.x source file is decoded to unicode. When the sequence of unicode chars comprising a bytes literal is interpreted, the resulting sequence of bytes depends only on the literal and not the file encoding. So list(b'{}'), for instance, should always be [123, 125] in 3.x. My comments above about byte processing assume that this is so.
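The "special numbers, not characters" view can be made concrete with a toy interpolator. This is a minimal sketch, not the byteformat function referred to in this message (its source is not shown in the thread): byteformat_sketch is a hypothetical name, and it handles only empty replacement fields delimited by bytes 123 and 125.

```python
def byteformat_sketch(template, args):
    """Toy interpolation: fill each empty field (bytes 123, 125) in order."""
    field = bytes([123, 125])          # the ASCII codes of '{' and '}'
    pieces = template.split(field)
    if len(pieces) - 1 != len(args):
        raise ValueError('expected %d arguments, got %d'
                         % (len(pieces) - 1, len(args)))
    out = bytearray(pieces[0])
    for arg, tail in zip(args, pieces[1:]):
        out += arg                     # interpolated bytes are copied verbatim
        out += tail                    # so are the bytes outside the fields
    return bytes(out)


result = list(byteformat_sketch(bytes([1, 2, 10, 123, 125, 200]),
                                (bytes([50, 100, 150]),)))
delimiters = list(bytes([123, 125]))
```

Nothing here decodes or encodes anything; the only assumption is the choice of the two delimiter values, and an argument count that does not match the number of fields raises ValueError rather than producing output.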
-- Terry Jan Reedy From greg.ewing at canterbury.ac.nz Tue Jan 14 22:59:15 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Jan 2014 10:59:15 +1300 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> <52D4F34B.1000507@canterbury.ac.nz> Message-ID: <52D5B333.10904@canterbury.ac.nz> Guido van Rossum wrote: > def spam(a): > r = asciistr('(') > if a: r += a.strip() > r += asciistr(')') > return r > > The general fix would be to add > > else: r += a[:0] The awkwardness might be reducable if asciistr let you write something like r = asciistr('(', a) meaning "give me either a string or bytes containing the value '(', depending on the type of a". But taking a step back, how bad would it really be if an asciistr were returned in this case? Is it just that asciistr doesn't behave exactly like a str in all situations, so it might break something? If so, would it help if asciistr were a built-in type, so that other things could be made aware of it? 
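The "depending on the type of a" idea can be tried without asciistr at all. The following is a minimal sketch in plain Python 3; ascii_like is a hypothetical helper, not part of the asciicompat package, and spam is adapted from the quoted example.

```python
def ascii_like(literal, model):
    """Return an ASCII literal as bytes or str, matching the type of model."""
    if isinstance(model, (bytes, bytearray)):
        return literal.encode('ascii')
    return literal


def spam(a):
    # Constants take their type from the argument, so the empty-input
    # case needs no special handling such as r += a[:0].
    r = ascii_like('(', a)
    if a:
        r += a.strip()
    r += ascii_like(')', a)
    return r


outputs = [spam(b' x '), spam(' x '), spam(b''), spam('')]
```

The result always has the same kind of type as the input, at the cost of passing the model argument explicitly at every constant.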
-- Greg From v+python at g.nevcal.com Tue Jan 14 23:02:58 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 14 Jan 2014 14:02:58 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> <878uuievm5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D5B412.6000607@g.nevcal.com> On 1/14/2014 4:46 AM, Nick Coghlan wrote: > The one remaining way I could potentially see a formatb method working > is along the lines of what Glenn (I think) suggested: just like struct > definitions, the formatb specifier would have to consist*solely* of > substitution fields. However, that's getting awfully close to being > just an alternate spelling for the struct module or bytes.join at that > point, which hardly makes for a compelling case to add two new methods > to a builtin type. Yes, after someone drew the parallel between my "format specifier only" pedantry, and struct.pack (which I hadn't used), I agree that they are almost just different spellings for the same things. The two differences I could see is that struct.pack doesn't support variable length items, and struct.pack doesn't support "interpolation", which is the whole beauty of the % type syntax... the ability to have a template, and interpolate values. My pedantry DID allow for template work, but they had to be specified in HEX the way I specified it yesterday. Let me repeat that syntax: b"%{hex-codes}v" That was mostly so the format string could be ASCII, yet represent any byte. That is somewhat clunky, when actually wanting to represent characters. 
At the next level of abstraction, one could define a "format builder" that would take Unicode specifications, and "compile" them into the binary interpolation strings, but if doing that, you could just as well compile them into functions using struct.pack formats, with the parameters interspersed with the "template" data, except for struct.pack's inability to deal with variable length data. So struct is attempting to emulate C structs, and variable length data is extremely awkward in C structs also, so I guess it provides a good emulation :) So if I were to look for features to add to Python3 to support template interpolation for users of non-ASCII encodings, which could, of course, also be used by users of ASCII-based encodings, I guess I would recommend: 1) extend struct to handle variable length data items 2) provide a sample format compiler function that would translate a Unicode format description into a function that would use struct.pack, and pre-encode (according to the format specification) the template parts into parameters for struct.pack). -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Tue Jan 14 23:07:15 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 15 Jan 2014 08:07:15 +1000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D5B333.10904@canterbury.ac.nz> References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> <52D4F34B.1000507@canterbury.ac.nz> <52D5B333.10904@canterbury.ac.nz> Message-ID: On 15 Jan 2014 08:00, "Greg Ewing" wrote: > > Guido van Rossum wrote: >> >> def spam(a): >> r = asciistr('(') >> if a: r += a.strip() >> r += asciistr(')') >> return r >> >> The general fix would be to add >> >> else: r += a[:0] > > > The awkwardness might be reducable if asciistr let > you write something like > > r = asciistr('(', a) > > meaning "give me either a string or bytes containing > the value '(', depending on the type of a". > > But taking a step back, how bad would it really be > if an asciistr were returned in this case? Is it > just that asciistr doesn't behave exactly like a str > in all situations, so it might break something? > > If so, would it help if asciistr were a built-in > type, so that other things could be made aware of > it? That way lies the Python 2 text model, and we're not going there. It's probably best to think of asciistr as a way of demonstrating a rhetorical point about the superiority of the Python 3 text model rather than something that anyone should actually use in production Python 3 code (although, depending on how rough the edges turn out to be, it *might* eventually find a place in some single source 2/3 code bases, as well as in prototype code and personal scripts). Cheers, Nick. 
> > -- > Greg > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com From g.brandl at gmx.net Tue Jan 14 23:07:48 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 14 Jan 2014 23:07:48 +0100 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52D59C74.30702@hastings.org> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> Message-ID: On 14.01.2014 21:22, Larry Hastings wrote: > > On 01/11/2014 07:35 PM, Larry Hastings wrote: >> >> On 01/08/2014 07:08 AM, Barry Warsaw wrote: >>> How hard would it be to put together some sample branches that provide >>> concrete examples of the various options? >>> >>> My own opinion could easily be influenced by having some hands-on time with >>> actual code, and I suspect even Guido could be influenced if he could pull >>> some things up in his editor and take a look around. >> >> I've uploaded a prototype here: >> >> https://bitbucket.org/larry/python-clinic-buffer >> > > > I have now received exactly zero feedback about the prototype, which suggests > people aren't using it. In an attempt to jump-start this conversation, I've > created a new repository containing the "concrete examples of the various > options" that Barry proposed above. You may find it here: > > https://bitbucket.org/larry/clinic-buffer-samples/src > > In it I converted Modules/_pickle.c four different ways. There's a README, > please read it. > > People who want to change how Clinic writes its output: this is your big > chance. Comment on these samples, or produce your own counterexamples, or > something. If you can get enough people on your side maybe Clinic will change.
If > there is no further debate on this topic, nothing will happen and Clinic will > not change. Having converted several modules to AC, I think I'm -1 original +0 sidefile +1 multiple buffers +0 buffer -0 modified buffer Georg From greg.ewing at canterbury.ac.nz Tue Jan 14 23:12:48 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Jan 2014 11:12:48 +1300 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> <52D4F34B.1000507@canterbury.ac.nz> Message-ID: <52D5B660.1070401@canterbury.ac.nz> Guido van Rossum wrote: > Actually, Nick explained that asciistr() + asciistr() returns str, That part seems wrong to me, because it means that you can't write polymorphic byte/string functions that are composable. I would be -1 on that, and prefer that asciistr + asciistr --> asciistr. -- Greg From ncoghlan at gmail.com Tue Jan 14 23:17:31 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 15 Jan 2014 08:17:31 +1000 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D5A72D.1040208@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <20140114215730.502b4142@fsol> <52D5A72D.1040208@stoneleaf.us> Message-ID: On 15 Jan 2014 07:36, "Ethan Furman" wrote: > > On 01/14/2014 12:57 PM, Antoine Pitrou wrote: >> >> On Tue, 14 Jan 2014 11:56:25 -0800 >> Ethan Furman wrote: >>> >>> >>> %s, because it is the most general, has the most convoluted resolution: >>> >>> - input type is bytes? >>> pass it straight through >> >> >> It should try to get a Py_buffer instead. 
> > > Meaning any bytes or bytes-subtype will support the Py_buffer protocol, and this should be the first thing we try? > > Sounds good. > > For that matter, should the first test be "does this object support Py_buffer" and not worry about it being isinstance(obj, bytes)? Yep. I actually suggest adjusting the %s handling to: - interpolate Py_buffer exporters directly - interpolate __bytes__ if defined - reject anything with an "encode" method - otherwise interpolate str(obj).encode("ascii") >>> - input type is numeric? >>> use its __xxx__ [1] [2] method and ascii-encode it (strictly) >> >> >> What is the definition of "numeric"? > > > That is a key question. As suggested above, I would flip the question and explicitly *disallow* implicit encoding of any object with its own "encode" method, while allowing everything else. Cheers, Nick. > > Obviously we have int, float, and complex. We also have Decimal. > > But what about Fraction? Or some users numeric class that doesn't inherit from a core numeric type? Wherever we draw the line, we need to make it's well-documented. > > -- > ~Ethan~ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Tue Jan 14 23:21:05 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 15 Jan 2014 08:21:05 +1000 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D5B660.1070401@canterbury.ac.nz> References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> <52D4F34B.1000507@canterbury.ac.nz> <52D5B660.1070401@canterbury.ac.nz> Message-ID: On 15 Jan 2014 08:14, "Greg Ewing" wrote: > > Guido van Rossum wrote: >> >> Actually, Nick explained that asciistr() + asciistr() returns str, > > > That part seems wrong to me, because it means that > you can't write polymorphic byte/string functions > that are composable. > > I would be -1 on that, and prefer that > asciistr + asciistr --> asciistr. You have to pretty much reimplement str to do that. I wouldn't say no to a patch that implemented it, but we're unlikely to do that much work ourselves for something which is primarily intended as a proof of concept. Cheers, Nick. > > -- > Greg > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Tue Jan 14 23:22:18 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 14 Jan 2014 14:22:18 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <20140114215730.502b4142@fsol> <52D5A72D.1040208@stoneleaf.us> Message-ID: <52D5B89A.4070504@stoneleaf.us> On 01/14/2014 02:17 PM, Nick Coghlan wrote: > > On 15 Jan 2014 07:36, "Ethan Furman" > wrote: >> >> On 01/14/2014 12:57 PM, Antoine Pitrou wrote: >>> >>> On Tue, 14 Jan 2014 11:56:25 -0800 >>> Ethan Furman > wrote: >>>> >>>> >>>> %s, because it is the most general, has the most convoluted resolution: >>>> >>>> - input type is bytes? >>>> pass it straight through >>> >>> >>> It should try to get a Py_buffer instead. >> >> >> Meaning any bytes or bytes-subtype will support the Py_buffer protocol, and this should be the first thing we try? >> >> Sounds good. >> >> For that matter, should the first test be "does this object support Py_buffer" and not worry about it being isinstance(obj, bytes)? > > Yep. I actually suggest adjusting the %s handling to: > > - interpolate Py_buffer exporters directly > - interpolate __bytes__ if defined > - reject anything with an "encode" method > - otherwise interpolate str(obj).encode("ascii") > >>>> - input type is numeric? >>>> use its __xxx__ [1] [2] method and ascii-encode it (strictly) >>> >>> >>> What is the definition of "numeric"? >> >> >> That is a key question. > > As suggested above, I would flip the question and explicitly *disallow* implicit encoding of any object with its own > "encode" method, while allowing everything else. Um, int and floats (for example) don't have an .encode method, don't export Py_buffer, don't have a __bytes__ method... Ah! so it would hit the last case, I see. 
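The four-step %s resolution quoted above can be sketched in pure Python. This is only an illustration of the ordering under discussion, not the proposed C implementation: the helper name interpolate_percent_s is made up for this sketch, and memoryview(obj) stands in for "exports Py_buffer".

```python
def interpolate_percent_s(obj):
    """Hypothetical helper: the bytes that %s would insert for obj
    under the ordering suggested in this thread."""
    # 1. Buffer exporters (bytes, bytearray, memoryview, ...) are
    #    interpolated directly.
    try:
        return bytes(memoryview(obj))
    except TypeError:
        pass
    # 2. Objects that define __bytes__ are asked for their bytes form.
    if hasattr(type(obj), "__bytes__"):
        return bytes(obj)
    # 3. Anything with its own encode method (i.e. str) is rejected;
    #    the caller has to choose an encoding explicitly.
    if hasattr(obj, "encode"):
        raise TypeError("encode %s explicitly before interpolating"
                        % type(obj).__name__)
    # 4. Everything else falls through to str() plus strict ASCII.
    return str(obj).encode("ascii")


print(interpolate_percent_s(b"abc"))   # buffer exporter, passed through
print(interpolate_percent_s(42))       # an int reaches the last case
print(interpolate_percent_s([1, 2]))   # and so does any container
```

The last call is the point of the objection that follows: with this ordering, any object with a __str__ or __repr__ ends up in the byte stream.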
The danger I see with that route is that any ol' object could then make it into the byte stream, and considering what byte streams are for I think we should make the barrier for entry higher than just relying on a __str__ or __repr__. -- ~Ethan~ From larry at hastings.org Tue Jan 14 23:24:44 2014 From: larry at hastings.org (Larry Hastings) Date: Tue, 14 Jan 2014 14:24:44 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <0E0B7B9E-6269-4D46-9235-107272CDB0CC@gmail.com> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> <20140114221221.5dbc2541@fsol> <0E0B7B9E-6269-4D46-9235-107272CDB0CC@gmail.com> Message-ID: <52D5B92C.5040300@hastings.org> On 01/14/2014 01:38 PM, Raymond Hettinger wrote: > > On Jan 14, 2014, at 9:12 PM, Antoine Pitrou > wrote: > >> I'm +1 on the sidefile approach. +0 on the various buffer approaches. >> -0.5 on the current "sprinkled everywhere" approach. > > I concur with Antoine except that I'm a full -1 on commingling > generated code with hand edited code. Sprinked everywhere > interferes with my ability to grok the code. It interferes with > code navigation. And it creates a greater risk of accidentally > editing the generated code. > > FWIW, I think everyone should place a lot of weight on > Serhiy's comments and suggestions. His reasoning is > clear and compelling. And the thoughts are all soundly > based on extensive experience with the clinic's effect on > the C source code. For the record I don't much care which of these Clinic does. My hope is just that the Python core dev community accepts Argument Clinic. If it forms a consensus around changing Clinic's output I'd be happy to oblige. But there's one important caveat to the above. As I recall, Guido has stated that he hates storing generated code in separate files. He has yet to rescind or weaken that pronouncement. 
Until such time as he does, the "side file" approach is off the table. I implemented it in the prototype purely for the purpose of fostering debate, so the "side file" proponents can try to convince him that it's necessary or that it's not so bad. But it's not going in without Guido's approval. As you yourself say--"Python is Guido's language, he just lets us use it." I'm not the person you have to convince, //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Tue Jan 14 23:28:19 2014 From: larry at hastings.org (Larry Hastings) Date: Tue, 14 Jan 2014 14:28:19 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52D5A353.7070002@stoneleaf.us> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> <52D5A353.7070002@stoneleaf.us> Message-ID: <52D5BA03.6000005@hastings.org> On 01/14/2014 12:51 PM, Ethan Furman wrote: > I checked the README, the current file, and the buffered files. My > preferences from highest to lowest: > > - modified buffer approach > - buffer approach > - side file > Could you put that in the form of numbers from +1 to -1? I'm literally making a spreadsheet to tally people's votes. > Thanks for taking the time, Larry! Thanks for participating in this sham democracy! ;-) //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Tue Jan 14 23:38:11 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 15 Jan 2014 08:38:11 +1000 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D5B89A.4070504@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <20140114215730.502b4142@fsol> <52D5A72D.1040208@stoneleaf.us> <52D5B89A.4070504@stoneleaf.us> Message-ID: On 15 Jan 2014 08:23, "Ethan Furman" wrote: > > On 01/14/2014 02:17 PM, Nick Coghlan wrote: >> >> >> On 15 Jan 2014 07:36, "Ethan Furman" > wrote: >>> >>> >>> On 01/14/2014 12:57 PM, Antoine Pitrou wrote: >>>> >>>> >>>> On Tue, 14 Jan 2014 11:56:25 -0800 >>>> Ethan Furman > wrote: >>>>> >>>>> >>>>> >>>>> %s, because it is the most general, has the most convoluted resolution: >>>>> >>>>> - input type is bytes? >>>>> pass it straight through >>>> >>>> >>>> >>>> It should try to get a Py_buffer instead. >>> >>> >>> >>> Meaning any bytes or bytes-subtype will support the Py_buffer protocol, and this should be the first thing we try? >>> >>> Sounds good. >>> >>> For that matter, should the first test be "does this object support Py_buffer" and not worry about it being isinstance(obj, bytes)? >> >> >> Yep. I actually suggest adjusting the %s handling to: >> >> - interpolate Py_buffer exporters directly >> - interpolate __bytes__ if defined >> - reject anything with an "encode" method >> - otherwise interpolate str(obj).encode("ascii") >> >>>>> - input type is numeric? >>>>> use its __xxx__ [1] [2] method and ascii-encode it (strictly) >>>> >>>> >>>> >>>> What is the definition of "numeric"? >>> >>> >>> >>> That is a key question. >> >> >> As suggested above, I would flip the question and explicitly *disallow* implicit encoding of any object with its own >> "encode" method, while allowing everything else. > > > Um, int and floats (for example) don't have an .encode method, don't export Py_buffer, don't have a __bytes__ method... Ah! 
so it would hit the last case, I see. > > The danger I see with that route is that any ol' object could then make it into the byte stream, and considering what byte streams are for I think we should make the barrier for entry higher than just relying on a __str__ or __repr__. Yeah, reading the other thread pointed out the issues with this idea (containers in particular are a problem). I think Brett has the right idea: we shouldn't try to accept numbers for %s in binary interpolation. If we limit it to just buffer exporters and objects with a __bytes__ method then the problem goes away. The numeric codes all exist in Python 2, so the porting requirement to the common 2/3 subset will be to update the cases of binary interpolation of a number with %s to use an appropriate numeric formatting code instead. Cheers, Nick. > > > -- > ~Ethan~ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Jan 14 23:40:51 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 14 Jan 2014 14:40:51 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D5A811.8090406@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> <52D5A811.8090406@stoneleaf.us> Message-ID: I think of PEP 460 as the strict version and PEP 461 as the lenient version. I don't think it makes sense to have more variants. So please collaborate with whichever you like best. 
:-) On Tue, Jan 14, 2014 at 1:11 PM, Ethan Furman wrote: > On 01/14/2014 01:05 PM, Brett Cannon wrote: > >> On Tue, Jan 14, 2014 at 2:55 PM, Ethan Furman wrote: >> >>> This PEP goes a but further than PEP 460 does, and hopefully spells >>> things out in enough detail so there is no confusion as to what is >>> meant. >> >> >> Are we going down the PEP route with the various ideas? Guido, do >> you want one from me as well or should I not bother? > > > While I can't answer for Guido, I will say I authored this PEP because > Antoine didn't want 460 to be any more liberal than it already was. > > If you collect your ideas together, I'll add them to 461 as questions or > discussions or however is appropriate (assuming you're willing to go that > route). > > -- > ~Ethan~ -- --Guido van Rossum (python.org/~guido) From breamoreboy at yahoo.co.uk Tue Jan 14 23:41:51 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Tue, 14 Jan 2014 22:41:51 +0000 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D59622.1070307@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> Message-ID: On 14/01/2014 19:55, Ethan Furman wrote: > This PEP goes a but further than PEP 460 does, and hopefully spells > things out in enough detail so there is no confusion as to what is meant. > > -- > ~Ethan~ Out of plain old curiosity do we have to consider PEP 292 string templates in any way, shape or form, or regarding this debate have they been safely booted into touch? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. 
Mark Lawrence From storchaka at gmail.com Tue Jan 14 23:41:25 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 15 Jan 2014 00:41:25 +0200 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <0E0B7B9E-6269-4D46-9235-107272CDB0CC@gmail.com> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> <20140114221221.5dbc2541@fsol> <0E0B7B9E-6269-4D46-9235-107272CDB0CC@gmail.com> Message-ID: 14.01.14 23:38, Raymond Hettinger wrote: > On Jan 14, 2014, at 9:12 PM, Antoine Pitrou > wrote: > >> I'm +1 on the sidefile approach. +0 on the various buffer approaches. >> -0.5 on the current "sprinkled everywhere" approach. > > I concur with Antoine except that I'm a full -1 on commingling > generated code with hand edited code. Sprinkled everywhere > interferes with my ability to grok the code. It interferes with > code navigation. And it creates a greater risk of accidentally > editing the generated code. As expected, I agree with Raymond: +1 on the sidefile approach, -1 on the current "sprinkled everywhere" approach, and about 0 on the various buffer approaches. One more nitpick. I prefer a sidefile with a unique suffix (e.g. .clinic) at the end of the file name rather than in the middle. _pickle.c.clinic is better than _pickle.clinic.c (even the .c in the middle is not needed; it can be _pickle.clinic). My reasons: 1. I very often use global search in the sources. It's my way of navigating and investigating. I don't want to get false results from generated files. And it is much easier to specify a mask '*.[ch]' or '*.c,*.h' (depending on the tool) than to specify a mask and a negative mask. The latter is not even always possible: I can write a cumbersome expression for the find command, but Midnight Commander doesn't support negative masks at all (and perhaps your favorite IDE doesn't support them either). 2.
I don't use an IDE, but if you do, this may matter to you. If the IDE shows a source tree, you are unlikely to want to see generated *.clinic.c files in it. They would almost double the list of sources. 3. Pathname expansion works better with unique endings. You can open all Modules/_io/*.c files, but you are unlikely to be interested in the *.clinic.c files, which are matched by the same pattern. From guido at python.org Tue Jan 14 23:46:33 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 14 Jan 2014 14:46:33 -0800 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Jan 14, 2014 at 1:37 PM, Nick Coghlan wrote: > Yep - that's why I consider asciistr to be firmly in the "power tool" > category. If you know what you're doing, it should let you write hybrid API > code that is just as concise as Python 2, but it's also far more error prone > than the core Python 3 text model. Hm. It sounds like the kind of power tool that only candidates for the Darwin award would use. The more I hear you defend it, the less I think it's a good idea for *anything*. And limiting it to PyPy doesn't make it less dangerous.
-- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Tue Jan 14 23:53:50 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Jan 2014 11:53:50 +1300 Subject: [Python-Dev] The asciistr problem In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D5BFFE.9090406@canterbury.ac.nz> Guido van Rossum wrote: > I understand that '&' here stands for "any arbitrary combination", but > what about searches? Given that asciistr's base class is str, won't it > still blow up if you try to use it as an argument to e.g. > bytes.startswith()? Equality tests also sound problematic; is b'x' == > asciistr('x') == 'x' ??? I'm wondering whether asciistr shouldn't be a *type* at all, but just a function that constructs a string with the same type as another string. All of these problems then go away. Instead of foo.startswith(asciistr("prefix")) you would write foo.startswith(asciistr("prefix", foo)) There's also no chance of an asciistr escaping into the wild, because there's no such thing. We probably want a more compact way of writing it, though. Ideally it would support currying. If we have a number of string literals in our function, we'd like to be able to write something like this at the top: def myfunc(a): s = stringtype(a) ... and then use s('foo') to construct all our string literals inside the function. We could go further. If the function has more than one string argument, they're probably constrained to be of the same type, so in the interests of symmetry it would be nice if we could write def myfunc(a, b): s = stringtype(a, b) ... 
and have it raise a TypeError if a and b are not of the same string type. -- Greg From storchaka at gmail.com Tue Jan 14 23:55:03 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 15 Jan 2014 00:55:03 +0200 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D5A811.8090406@stoneleaf.us> Message-ID: 15.01.14 00:40, Guido van Rossum wrote: > I think of PEP 460 as the strict version and PEP 461 as the lenient > version. I don't think it makes sense to have more variants. So please > collaborate with whichever you like best. :-) Perhaps the consensus will be PEP 460.5? Or PEP 460.3, or maybe PEP 460.7? From greg.ewing at canterbury.ac.nz Wed Jan 15 00:05:20 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Jan 2014 12:05:20 +1300 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52D57F28.6080502@stoneleaf.us> References: <52d57def.0180310a.4b08.ffffa287@mx.google.com> <52D57F28.6080502@stoneleaf.us> Message-ID: <52D5C2B0.5040307@canterbury.ac.nz> Ethan Furman wrote: > On 01/14/2014 10:11 AM, Jim J. Jewett wrote: > >> >> But in terms of explaining the text model, that >> separation is important enough that > > >> (2) It *may* be worth creating a virtual >> split in the documentation. > > > I think (2) is a great idea. I don't think it's such a great idea to belabour this point. The notion of an ASCIIStructuredBytes type seems to assume that you have *either* ascii-encoded text *or* some other kind of data. But many of the use cases for all of this involve constructing a single object, parts of which are one and parts of which are another. It's hard to think of that in terms of virtual classes unless you're willing to imagine that different parts of the same object are of different types, which, for a primitive object like bytes, doesn't make sense in the context of the Python object model.
By all means point out that the ascii features of bytes are intended for use on data that happens to be ascii, and shouldn't be used otherwise. But I think that talking about "virtual classes" just risks confusing people, particularly when we have ABCs, which are also a kind of virtual class represented by real class objects. -- Greg From larry at hastings.org Tue Jan 14 23:47:05 2014 From: larry at hastings.org (Larry Hastings) Date: Tue, 14 Jan 2014 14:47:05 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <0E0B7B9E-6269-4D46-9235-107272CDB0CC@gmail.com> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> <20140114221221.5dbc2541@fsol> <0E0B7B9E-6269-4D46-9235-107272CDB0CC@gmail.com> Message-ID: <52D5BE69.9090003@hastings.org> On 01/14/2014 01:38 PM, Raymond Hettinger wrote: > FWIW, I think everyone should place a lot of weight on > Serhiy's comments and suggestions. His reasoning is > clear and compelling. And the thoughts are all soundly > based on extensive experience with the clinic's effect on > the C source code. One more bit of anecdotal evidence. I suggest that I have easily the most extensive experience with working with Clinic. Serhiy filed his first patch converting to Clinic nine days ago; I've been working on Clinic for about eighteen months, on and off. And I got used to living with the "sprinkling" approach a long long time ago. I no longer ever mistake generated code for handwritten code, and I don't ever modify the generated text. It's basically fine. This is not to dismiss Serhiy's observations. Nor to say that my experiences will be universal. Nor indeed to suggest that learning to live with Clinic's output as it exists today is a desirable skill. I merely suggest that if we didn't modify the output of Clinic it might be survivable. Cheers, //arry/ p.s. " You get used to it. I...I don't even see the code.
All I see is blonde, brunette, red-head. Hey, you uh... want a drink?" -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Wed Jan 15 00:31:10 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Jan 2014 12:31:10 +1300 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> Message-ID: <52D5C8BE.30802@canterbury.ac.nz> Guido van Rossum wrote: > Quite a few people have spoken out in favor of loud > failures rather than silent "wrong" output. But I think that in the > specific context of formatting output, there is a long and IMO good > tradition of producing (slightly) wrong output in favor of more strict > behavior. Consider for example what to do when a number doesn't fit in > the given width. Would you rather raise an exception, truncate the > value, or mess up the formatting? That depends on the context. If the output is simply a text file whose lines can grow to accommodate the extra width, messing up the formatting probably okay. If it's going into a printed report with a strictly limited width for each column, and anything that doesn't fit is going to get graphically clipped away, with no visual indication that this has happened, it's NOT okay. If it's going into a text file with defined columns for each field, which will be read by something that assumes certain things are in certain columns, it's NOT okay. If it's going into a binary file as a field consisting of a length byte followed by some chars, messing up the formatting is DEFINITELY NOT okay. This latter kind of situation is the one we're talking about. If you do something like b"%c%s" % (len(data), data) and data is a str, then the length byte will be correct, but the data will be (at least) 3 bytes too long. Whatever reads the file then gets out of step at that point, and all hell breaks loose. 
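To make the length-prefixed case above concrete, here is a small sketch. It assumes a hypothetical lenient formatter (lenient_field, not any real or proposed API) that silently falls back to repr() when handed a str, and a made-up reader (read_field) that trusts the length byte. How many extra bytes a real lenient fallback would add depends on the fallback chosen; the repr() here is only a stand-in.

```python
def strict_field(data):
    """Length byte followed by the payload; data must be bytes-like."""
    return b"%c%s" % (len(data), data)


def lenient_field(data):
    """Hypothetical lenient variant: silently stringifies str payloads.
    The repr() fallback is an assumption made for illustration."""
    payload = data if isinstance(data, bytes) else repr(data).encode("ascii")
    return bytes([len(data)]) + payload


def read_field(stream, offset):
    """Read one length-prefixed field; return (payload, next_offset)."""
    length = stream[offset]
    start = offset + 1
    return stream[start:start + length], start + length


good = strict_field(b"abc") + strict_field(b"xyz")
bad = lenient_field("abc") + strict_field(b"xyz")

first, pos = read_field(good, 0)
second, _ = read_field(good, pos)          # the fields line up

first_bad, pos_bad = read_field(bad, 0)
second_bad, _ = read_field(bad, pos_bad)   # the reader is now out of step

print(first, second)
print(first_bad, second_bad)
```

With the strict version, passing a str raises TypeError at the point of the mistake; with the lenient one, the first sign of trouble is a reader that has lost track of the field boundaries.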
You do *not* get a nice, easy-to-debug symptom from this kind of thing. You get "Something is wrong somewhere in this 50 megabyte jpg file, good luck on finding out what and why". -- Greg From greg.ewing at canterbury.ac.nz Wed Jan 15 01:03:13 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Jan 2014 13:03:13 +1300 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <20140111193226.23cc771d@fsol> <52D18F89.9030006@stoneleaf.us> <20140111202246.022e4458@fsol> <87vbxpdl3y.fsf@uwakimon.sk.tsukuba.ac.jp> <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> <52D4F34B.1000507@canterbury.ac.nz> <52D5B333.10904@canterbury.ac.nz> Message-ID: <52D5D041.1070201@canterbury.ac.nz> Nick Coghlan wrote: > > On 15 Jan 2014 08:00, "Greg Ewing" > wrote: > > > If so, would it help if asciistr were a built-in > > type, so that other things could be made aware of > > it? > > That way lies the Python 2 text model, and we're not going there. It's > probably best to think of asciistr as a way of demonstrating a > rhetorical point about the superiority of the Python 3 text model Hmmm... something like "The Python 3 text model is so superior that we have to use this weird hack to write something that makes perfectly good semantic sense but is very awkward to write otherwise" ?-) Anyhow, I've now convinced myself that asciistr as a type is completely unnecessary -- see earlier post. 
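The function-based alternative from that earlier post might look roughly like this. The name stringtype and the s('foo') usage come from the post; the implementation below is an assumption written for illustration, and spam() is the example function quoted earlier in the thread, rewritten to use it.

```python
def stringtype(*args):
    """Hypothetical helper: return a constructor for ASCII literals
    matching the common string type (str or bytes) of the arguments."""
    if args and all(isinstance(a, str) for a in args):
        return lambda literal: literal
    if args and all(isinstance(a, bytes) for a in args):
        # Non-ASCII literals fail loudly here with UnicodeEncodeError.
        return lambda literal: literal.encode("ascii")
    raise TypeError("arguments must be all str or all bytes")


def spam(a):
    s = stringtype(a)
    r = s("(")
    if a:
        r += a.strip()
    r += s(")")
    return r


print(spam("  text "))    # str in, str out
print(spam(b"  data "))   # bytes in, bytes out
print(spam(b""))          # the empty case no longer needs r += a[:0]
```

Because the result type is fixed by the argument up front, nothing of an intermediate type can leak out of the function.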
-- Greg From yselivanov.ml at gmail.com Wed Jan 15 01:12:41 2014 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 14 Jan 2014 19:12:41 -0500 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> <20140114221221.5dbc2541@fsol> <0E0B7B9E-6269-4D46-9235-107272CDB0CC@gmail.com> Message-ID: Even though I'm not a core dev, I happen to work with cpython source code quite a lot, whether it's me working on a C extension, or just digging through it for some obscure details of how python works. And what I want to say is that cpython sources are great. They are easy to understand even for people who don't know C. What's even more important, they are easy to navigate. Having clinic-produced code here and there will surely complicate this. Of course, if you work with cpython code 24/7, you will adapt, and won't even notice it, but for occasional users like me it will require more focus. For my use pattern, having clinic produce a separate file (with a distinct extension like '.c.clinic') would be a huge win. Besides just clean source files, it will also make it easier to: - review patches; - work with repository: logs, blames, diffs, etc; - adjust workflow: in sublime text / eclipse / almost any IDE it would be just a file mask to hide the clinic output completely (and you don't need to see it anyways). And besides just cpython, as I understand, the clinic should be used not just by cpython core devs for cpython sources, but also by numerous authors of C extensions.
So my vote is: +1 for side files 0 for the current state of things -1 for buffers (as it makes no sense to me why would you want to have generated code at almost random places) Thanks, Yury From steve at pearwood.info Wed Jan 15 01:56:43 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 15 Jan 2014 11:56:43 +1100 Subject: [Python-Dev] magic method __bytes__ In-Reply-To: <20140114155850.221D425003F@webabinitio.net> References: <52D4951E.3080104@stoneleaf.us> <20140114155850.221D425003F@webabinitio.net> Message-ID: <20140115005643.GK3869@ando> On Tue, Jan 14, 2014 at 10:58:49AM -0500, R. David Murray wrote: > On Mon, 13 Jan 2014 17:38:38 -0800, Ethan Furman wrote: > > Has anyone actually used __bytes__ yet? What for? > > bytes(email.message.Message()) returns the message object serialized to > "wire format". > > --David > > PS: I've always thought of "wire format" as *including* files...a file is > a just a "wire" with an indefinite destination and transmission time.... Nice analogy! I must steal it :-) -- Steven From v+python at g.nevcal.com Wed Jan 15 01:57:52 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 14 Jan 2014 16:57:52 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52d57def.0180310a.4b08.ffffa287@mx.google.com> References: <52d57def.0180310a.4b08.ffffa287@mx.google.com> Message-ID: <52D5DD10.7000307@g.nevcal.com> On 1/14/2014 10:11 AM, Jim J. Jewett wrote: > Virtual subclass ASCIIStructuredBytes You would first have to define what you meant by a virtual subclass, and that somewhere would have to be linked every place you use the term, because it is a new term. Why not just call the sections of the documentation where ASCII-supporting features of bytes are discussed "Special ASCII support". Calling it that will make it clear that if you are not using ASCII, you need to be careful of using the feature... or contrariwise, that if you are using the feature, you need to be using ASCII. 
While some ASCII supersets may also be usable with the features, I don't think that should be emphasized in anyway, unless there is specific support for particular ASCII supersets. Using ASCII supersets should be "buyer beware". The whole b"%s" interpolation feature would, appropriately, be described in such a section. -------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Wed Jan 15 02:02:17 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 14 Jan 2014 17:02:17 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <20140114215730.502b4142@fsol> <52D5A72D.1040208@stoneleaf.us> <52D5B89A.4070504@stoneleaf.us> Message-ID: <52D5DE19.40103@g.nevcal.com> On 1/14/2014 2:38 PM, Nick Coghlan wrote: > > I think Brett has the right idea: we shouldn't try to accept numbers > for %s in binary interpolation. If we limit it to just buffer > exporters and objects with a __bytes__ method then the problem goes away. > > The numeric codes all exist in Python 2, so the porting requirement to > the common 2/3 subset will be to update the cases of binary > interpolation of a number with %s to use an appropriate numeric > formatting code instead. > +1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed Jan 15 02:08:34 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 15 Jan 2014 12:08:34 +1100 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20140115010833.GL3869@ando> On Tue, Jan 14, 2014 at 10:16:17AM -0800, Guido van Rossum wrote: > Hm. 
It is beginning to sound more and more flawed. I also worry that > it will bring back the nightmare of data-dependent UnicodeError. > E.g. this (from tests/basic.py): > > def test_asciistr_will_not_accept_codepoints_above_127(self): > self.assertRaises(ValueError, asciistr, 'Schrödinger') > > looks reasonable enough when you assume asciistr() is always used with > a literal as argument -- but I suspect that plenty of people would > misunderstand its purpose and write asciistr(s) as a "clever" way to > turn a string into something that's compatible with both bytes and > strings... :-( I am one of those people. I've been trying to keep on top of this enormous multiple-thread discussion, and although I haven't read every single post in its entirety, I thought I understood that the purpose of asciistr was exactly that, to produce something that was compatible with both bytes and strings. -- Steven From rmsr at lab.net Wed Jan 15 02:32:23 2014 From: rmsr at lab.net (Ryan Smith-Roberts) Date: Tue, 14 Jan 2014 17:32:23 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52D59C74.30702@hastings.org> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> Message-ID: I favor a dual-mode approach. I think the existing behavior is best for the conversion of existing modules, because it's easy to interactively verify the generated code. Once that's done, long-term maintenance definitely favors a more centralized format. +1 _pickle.original.c /* used only during conversion of existing modules */ +0 _pickle.using-buffer.c +1 _pickle.using-modified-buffer.c -1 _pickle.using-multiple-buffers.c NaN _pickle.using-sidefile.c /* not enough experience with it */ Pondering it this afternoon, I thought of a configuration that minimizes both code churn and readability impact: two buffers.
One at the top containing forward declarations and defines (an inline header file if you like), and the rest of the autogenerated code at the bottom. It's not obvious that AC currently supports this configuration, or backtracking of any kind. Nonetheless: +1 _pickle.using-two-buffers.c On Tue, Jan 14, 2014 at 12:22 PM, Larry Hastings wrote: > > On 01/11/2014 07:35 PM, Larry Hastings wrote: > > > On 01/08/2014 07:08 AM, Barry Warsaw wrote: > > How hard would it be to put together some sample branches that provide > concrete examples of the various options? > > My own opinion could easily be influenced by having some hands-on time with > actual code, and I suspect even Guido could be influenced if he could pull > some things up in his editor and take a look around. > > > I've uploaded a prototype here: > > https://bitbucket.org/larry/python-clinic-buffer > > > > I have now received exactly zero feedback about the prototype, which > suggests people aren't using it. In an attempt to jump-start this > conversation, I've created a new repository containing the "concrete > examples of the various options" that Barry proposed above. You may find > it here: > > https://bitbucket.org/larry/clinic-buffer-samples/src > > In it I converted Modules/_pickle.c four different ways. There's a > README, please read it. > > People who want to change how Clinic writes its output: this is your big > chance. Comment on these samples, or produce your own counterexamples, or > something. If you can enough people on your side maybe Clinic will > change. If there is no further debate on this topic, nothing will happen > and Clinic will not change. > > > */arry* > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/rmsr%40lab.net > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rmsr at lab.net Wed Jan 15 02:34:57 2014 From: rmsr at lab.net (Ryan Smith-Roberts) Date: Tue, 14 Jan 2014 17:34:57 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> Message-ID: On Tue, Jan 14, 2014 at 5:32 PM, Ryan Smith-Roberts wrote: > NaN _pickle.using-sidefile.c /* not enough experience with it */ > I hate to weasel like that. Intellectually I think I favor the sidefile over all other approaches for its cleanliness. But I'd have to actively use it in a workflow a bit to know how practical it is. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Jan 15 02:53:44 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 14 Jan 2014 17:53:44 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52D5BA03.6000005@hastings.org> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> <52D5A353.7070002@stoneleaf.us> <52D5BA03.6000005@hastings.org> Message-ID: <52D5EA28.8000703@stoneleaf.us> On 01/14/2014 02:28 PM, Larry Hastings wrote: > On 01/14/2014 12:51 PM, Ethan Furman wrote: >> I checked the README, the current file, and the buffered files. My >> preferences from highest to lowest: >> >> +1 modified buffer approach >> +0.5 buffer approach >> +0 side file NaN on the others is fine. 
;) -- ~Ethan~ From steve at pearwood.info Wed Jan 15 03:07:41 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 15 Jan 2014 13:07:41 +1100 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <52D5D041.1070201@canterbury.ac.nz> References: <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> <52D4F34B.1000507@canterbury.ac.nz> <52D5B333.10904@canterbury.ac.nz> <52D5D041.1070201@canterbury.ac.nz> Message-ID: <20140115020741.GM3869@ando> On Wed, Jan 15, 2014 at 01:03:13PM +1300, Greg Ewing wrote: > Nick Coghlan wrote: > >That way lies the Python 2 text model, and we're not going there. It's > >probably best to think of asciistr as a way of demonstrating a > >rhetorical point about the superiority of the Python 3 text model > > Hmmm... something like "The Python 3 text model is so > superior that we have to use this weird hack to write > something that makes perfectly good semantic sense > but is very awkward to write otherwise" ?-) I don't think mixing bytes and strings makes good semantic sense. If this discussion has taught me anything, it is that mixing the two is "Here Be Dragons" territory, fraught with danger. It may be that there are applications where mixing them is *unavoidable*, but I think that it's never *sensible*. 
-- Steven From ethan at stoneleaf.us Wed Jan 15 03:24:02 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 14 Jan 2014 18:24:02 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D5DE19.40103@g.nevcal.com> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <20140114215730.502b4142@fsol> <52D5A72D.1040208@stoneleaf.us> <52D5B89A.4070504@stoneleaf.us> <52D5DE19.40103@g.nevcal.com> Message-ID: <52D5F142.4070802@stoneleaf.us> On 01/14/2014 05:02 PM, Glenn Linderman wrote: > On 1/14/2014 2:38 PM, Nick Coghlan wrote: >> >> I think Brett has the right idea: we shouldn't try to accept numbers >> for %s in binary interpolation. If we limit it to just buffer >> exporters and objects with a __bytes__ method then the problem goes away. >> >> The numeric codes all exist in Python 2, so the porting requirement to >> the common 2/3 subset will be to update the cases of binary >> interpolation of a number with %s to use an appropriate numeric >> formatting code instead. >> > +1 Agreed, PEP updated. -- ~Ethan~ From larry at hastings.org Wed Jan 15 03:55:58 2014 From: larry at hastings.org (Larry Hastings) Date: Tue, 14 Jan 2014 18:55:58 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> Message-ID: <52D5F8BE.1030309@hastings.org> On 01/14/2014 05:32 PM, Ryan Smith-Roberts wrote: > Pondering it this afternoon, I thought of a configuration that > minimizes both code churn and readability impact: two buffers. One at > the top containing forward declarations and defines (an inline header > file if you like), and the rest of the autogenerated code at the > bottom. It's not obvious that AC currently supports this > configuration, or backtracking of any kind. Clinic is strictly one pass currently. 
I could add this feature to the prototype if there was sufficient interest; for now, I'd accept a patch to the clinic-buffer-samples repo adding a sample of your proposal. Please start with "_pickle.original.c", and add simulated (but deliberately invalid!) Clinic instructions for an authentic flair. I suggest the name "forward" for the destination, and "_pickle.using-forward.buffer.c" for the filename. I take it "forward" would get the methoddef_define, the docstring_prototype, and the parser_prototype, "block" would get the impl_prototype, and "buffer" would get the docstring_definition and the parser_definition? I'm happy to collect votes for this approach too. I'll put you down as a +1 //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rl.ward at bigpond.com Wed Jan 15 01:53:05 2014 From: rl.ward at bigpond.com (Rob Ward) Date: Wed, 15 Jan 2014 11:53:05 +1100 Subject: [Python-Dev] Binding problem Message-ID: I apologise if I have come to the wrong place here, but 12hrs searching, plus experimenting, on the WWW for advice on it has not yielded any successful advice to resolve the issue. I am am having trouble binding an Entry widget to Here is the snippet (butCol is a Frame widget to put buttons,labels and text entry down LHS) KS1=StringVar() KS1.set("Key is ??") butCol.ks1 =Label(butCol,textvariable=KS1).grid(column=0,row=18,sticky=(N,W)) myKey = [0,2,4,5,7,9,11] #Begin in the key of C KS2 =StringVar() KS2.set("C") butCol.ks2 =Entry(butCol,width=20,textvariable=KS2).grid(column=0,row=19,sticky=(N,W)) The above lines all render correctly, but will not trigger off "entry" of data at all on Adding the following line just crashes. butCol.ks2.bind("",chooseKey) I downloaded the Python 3 package from the recommended site last week to begin this project (it was previously working OK in Skulptor, but I was not prepared to develop it any further in that environment, though it was an excellent starting environment). 
So I believe the Python language installed is up todate. I have not previously installed Python so it should be "clean". If you could give any direct advice I would be very grateful or if you can direct me to your best "forum" site maybe I could use that. One overall observation that has frustrated me is how toi search for information that relates to Python3 and the latest tkinter modules. I kept finding old python or old Tkinter or combinations of both. Wrorking my way through this was very time consuming and rarely productive. Is there any advice on how to get the "latest" information off the WWW? Cheers, Rob Ward PS In my state of eternal optimism I have attached the whole file :-) PPS I have done some OOP in the past but not keen to jump in at the moment. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: myGui07.py URL: From cs at zip.com.au Wed Jan 15 03:56:06 2014 From: cs at zip.com.au (Cameron Simpson) Date: Wed, 15 Jan 2014 13:56:06 +1100 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52d59354.0476310a.5f4e.ffffcd72@mx.google.com> References: <52d59354.0476310a.5f4e.ffffcd72@mx.google.com> Message-ID: <20140115025606.GA88703@cskk.homeip.net> On 14Jan2014 11:43, Jim Jewett wrote: > Greg Ewing replied: > >> ... ASCII compatible binary data is a > >> *subset* of arbitrary binary data. > > I wrote: [...] > > (2) It *may* be worth creating a virtual > > split in the documentation. [...] > > Ethan likes the idea, but points out that the term > "Virtual" is confusing here. [...] > (A) What word should I use instead of "Virtual"? > Imaginary? Pretend? I'd title it in terms of a common use case, not a "virtual class". You even phrase the opening sentence as a use case already. > (B) Would it be good/bad/at least make the docs > easier to create an actual class (or alias)? 
> (C) Same question for a pair of classes provided > only in the documentation, like example code. I don't think so. People might use it:-( [...] > > A Bytes object could represent anything, [...] Tiny nit: shouldn't that be "bytes", not "Bytes"? > > appropriate as the underlying storage for a sound sample > > or image file. > > > > Virtual subclass ASCIIStructuredBytes > > ==================================== Possible alternate title: Common use case: bytes containing text sequences, especially ASCII Cheers, -- Cameron Simpson I think... Therefore I ride. I ride... Therefore I am. - Mark Pope From tjreedy at udel.edu Wed Jan 15 04:35:00 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 14 Jan 2014 22:35:00 -0500 Subject: [Python-Dev] Binding problem In-Reply-To: References: Message-ID: On 1/14/2014 7:53 PM, Rob Ward wrote: > I apologise if I have come to the wrong place here, Yes, you have ;-). pydev is for development *of* future versions of Python. Try python-list for development *with* current version. -- Terry Jan Reedy From python at mrabarnett.plus.com Wed Jan 15 04:40:13 2014 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 15 Jan 2014 03:40:13 +0000 Subject: [Python-Dev] Binding problem In-Reply-To: References: Message-ID: <52D6031D.8000603@mrabarnett.plus.com> On 2014-01-15 00:53, Rob Ward wrote: > I apologise if I have come to the wrong place here, but 12hrs searching, > plus experimenting, on the WWW for advice on it has not yielded any > successful advice to resolve the issue. 
> > I am am having trouble binding an Entry widget to > > Here is the snippet (butCol is a Frame widget to put buttons,labels and > text entry down LHS) > > KS1=StringVar() > KS1.set("Key is ??") > butCol.ks1 > =Label(butCol,textvariable=KS1).grid(column=0,row=18,sticky=(N,W)) > > myKey = [0,2,4,5,7,9,11] #Begin in the key of C > KS2 =StringVar() > KS2.set("C") > butCol.ks2 > =Entry(butCol,width=20,textvariable=KS2).grid(column=0,row=19,sticky=(N,W)) > > The above lines all render correctly, but will not trigger off "entry" > of data at all on > > Adding the following line just crashes. > > butCol.ks2.bind("",chooseKey) > > I downloaded the Python 3 package from the recommended site last week to > begin this project (it was previously working OK in Skulptor, but I was > not prepared to develop it any further in that environment, though it > was an excellent starting environment). So I believe the Python > language installed is up todate. I have not previously installed Python > so it should be "clean". > > If you could give any direct advice I would be very grateful or if you > can direct me to your best "forum" site maybe I could use that. > > One overall observation that has frustrated me is how toi search for > information that relates to *Python3* and the latest *tkinte*r modules. > I kept finding old python or old Tkinter or combinations of both. > Wrorking my way through this was very time consuming and rarely > productive. Is there any advice on how to get the "latest" information > off the WWW? > > Cheers, Rob Ward > > PS In my state of eternal optimism I have attached the whole file :-) > > PPS I have done some OOP in the past but not keen to jump in at the moment. > I doubt it crashes. It's more likely that it raises an exception (an AttributeError) complaining that 'None' doesn't have a 'bind' attribute. That's because the .grid method returns None, so butCol.ks2 ends up holding None rather than the Entry widget. (So does the .pack method.)
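The failure mode can be sketched without a display. The snippet below is only an illustration under stated assumptions: FakeEntry and choose_key are made-up stand-ins (not tkinter classes), written so that grid() returns None as the real geometry methods do, and "<Return>" is an assumed event name because the one in the original post was lost.

```python
class FakeEntry:
    """Hypothetical stand-in for a tkinter widget (NOT a real tkinter class)."""

    def __init__(self):
        self.bindings = {}

    def grid(self, **options):
        # Mimics tkinter's geometry methods: place the widget, return None.
        return None

    def bind(self, sequence, handler):
        # Remember which handler was registered for which event sequence.
        self.bindings[sequence] = handler


def choose_key(event=None):
    # Placeholder callback, standing in for the poster's chooseKey.
    return "key chosen"


# Chained form: the name is bound to grid()'s return value, which is None.
chained = FakeEntry().grid(column=0, row=19)

try:
    chained.bind("<Return>", choose_key)  # "<Return>" is an assumed event name
    chained_error = None
except AttributeError as exc:
    chained_error = str(exc)

# Two-step form: keep a reference to the widget first, then place it.
entry = FakeEntry()
entry.grid(column=0, row=19)
entry.bind("<Return>", choose_key)
```

In the chained form the widget object is created and placed but never kept, which is why the later bind call has nothing to work on.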
Try this: butCol.ks2 = Entry(butCol, width=20, textvariable=KS2) butCol.ks2.grid(column=0, row=19, sticky=(N, W)) The same comment applies in a number of other places. From guido at python.org Wed Jan 15 05:04:52 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 14 Jan 2014 20:04:52 -0800 Subject: [Python-Dev] Binding problem In-Reply-To: References: Message-ID: Hey Rob, The place to get help with Tkinter is tkinter-discuss at python.org. I've CC'ed that list for you. --Guido On Tue, Jan 14, 2014 at 4:53 PM, Rob Ward wrote: > I apologise if I have come to the wrong place here, but 12hrs searching, > plus experimenting, on the WWW for advice on it has not yielded any > successful advice to resolve the issue. > > I am am having trouble binding an Entry widget to > > Here is the snippet (butCol is a Frame widget to put buttons,labels and text > entry down LHS) > > KS1=StringVar() > KS1.set("Key is ??") > butCol.ks1 > =Label(butCol,textvariable=KS1).grid(column=0,row=18,sticky=(N,W)) > > myKey = [0,2,4,5,7,9,11] #Begin in the key of C > KS2 =StringVar() > KS2.set("C") > butCol.ks2 > =Entry(butCol,width=20,textvariable=KS2).grid(column=0,row=19,sticky=(N,W)) > > The above lines all render correctly, but will not trigger off "entry" of > data at all on > > Adding the following line just crashes. > > butCol.ks2.bind("",chooseKey) > > I downloaded the Python 3 package from the recommended site last week to > begin this project (it was previously working OK in Skulptor, but I was not > prepared to develop it any further in that environment, though it was an > excellent starting environment). So I believe the Python language installed > is up todate. I have not previously installed Python so it should be > "clean". > > If you could give any direct advice I would be very grateful or if you can > direct me to your best "forum" site maybe I could use that. 
> > One overall observation that has frustrated me is how toi search for > information that relates to Python3 and the latest tkinter modules. I kept > finding old python or old Tkinter or combinations of both. Wrorking my way > through this was very time consuming and rarely productive. Is there any > advice on how to get the "latest" information off the WWW? > > > > Cheers, Rob Ward > > PS In my state of eternal optimism I have attached the whole file :-) > > PPS I have done some OOP in the past but not keen to jump in at the moment. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From jimjjewett at gmail.com Wed Jan 15 05:08:48 2014 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 14 Jan 2014 23:08:48 -0500 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D57F28.6080502@stoneleaf.us> <52d59354.0476310a.5f4e.ffffcd72@mx.google.com> Message-ID: On Tue, Jan 14, 2014 at 3:06 PM, Guido van Rossum wrote: > Personally I wouldn't add any words suggesting or referring to the > option of creation another class for this purpose. You wouldn't > recommend subclassing dict for constraining the types of keys or > values, would you? Yes, and it is so clear that I suspect I'm missing some context for your question. Do I recommend that each individual application should create new concrete classes instead of just using the builtins? No. When trying to understand (learn about) the text/binary distinction, I do recommend pretending that they are represented by separate classes. Limits on the values in a bytearray are NOT the primary reason for this; the primary reason is that operations like the literal representation or the capitalize method are arbitrary nonsense unless the data happens to be representing ASCII. 
sound_sample.capitalize() -- syntactically valid, but semantic garbage header.capitalize() -- OK, which implies that data is an instance of something more specific than bytes. Would I recommend subclassing dict if I wanted to constrain the key types? Yes -- though MutableMapping (fewer gates to guard) or the upcoming TransformDict would probably be better still. The existing dict implementation itself effectively uses (hidden, quasi-)subclasses to restrict types of keys strictly for efficiency. (lookdict* variants) -jJ From stephen at xemacs.org Wed Jan 15 05:13:50 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 15 Jan 2014 13:13:50 +0900 Subject: [Python-Dev] magic method __bytes__ In-Reply-To: <20140114155850.221D425003F@webabinitio.net> References: <52D4951E.3080104@stoneleaf.us> <20140114155850.221D425003F@webabinitio.net> Message-ID: <87txd5dgqp.fsf@uwakimon.sk.tsukuba.ac.jp> R. David Murray writes: > a file is a just a "wire" with an indefinite destination and > transmission time +1 QOTW Of course! "Store and ... wait for it ... forward" architecture 4-ever! Store and Forward, Inc. Since 1969. From guido at python.org Wed Jan 15 05:17:03 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 14 Jan 2014 20:17:03 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D57F28.6080502@stoneleaf.us> <52d59354.0476310a.5f4e.ffffcd72@mx.google.com> Message-ID: I am exhausted from all these discussions. I just recommend not touching those docs. On Tue, Jan 14, 2014 at 8:08 PM, Jim Jewett wrote: > On Tue, Jan 14, 2014 at 3:06 PM, Guido van Rossum wrote: >> Personally I wouldn't add any words suggesting or referring to the >> option of creation another class for this purpose. You wouldn't >> recommend subclassing dict for constraining the types of keys or >> values, would you? > > Yes, and it is so clear that I suspect I'm missing some context for > your question. 
> > Do I recommend that each individual application should create new > concrete classes instead of just using the builtins? No. > > When trying to understand (learn about) the text/binary distinction, I > do recommend pretending that they are represented by separate classes. > Limits on the values in a bytearray are NOT the primary reason for > this; the primary reason is that operations like the literal > representation or the capitalize method are arbitrary nonsense unless > the data happens to be representing ASCII. > > sound_sample.capitalize() -- syntactically valid, but semantic garbage > header.capitalize() -- OK, which implies that data is an instance > of something more specific than bytes. > > Would I recommend subclassing dict if I wanted to constrain the key > types? Yes -- though MutableMapping (fewer gates to guard) or the > upcoming TransformDict would probably be better still. > > The existing dict implementation itself effectively uses (hidden, > quasi-)subclasses to restrict types of keys strictly for efficiency. 
> (lookdict* variants) > > -jJ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Wed Jan 15 05:18:37 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Jan 2014 17:18:37 +1300 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <20140115020741.GM3869@ando> References: <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> <52D4F34B.1000507@canterbury.ac.nz> <52D5B333.10904@canterbury.ac.nz> <52D5D041.1070201@canterbury.ac.nz> <20140115020741.GM3869@ando> Message-ID: <52D60C1D.1000701@canterbury.ac.nz> Steven D'Aprano wrote: > I don't think mixing bytes and strings makes good semantic sense. It's not about mixing bytes and text -- it's about writing polymorphic code that will work on either bytes *or* text. Not both at the same time. If we had quantum computers, this would be easy to solve: asciistr would be of type str/sqrt(2) + bytes/sqrt(2), and everything would work out fine. :-) -- Greg From larry at hastings.org Wed Jan 15 05:31:32 2014 From: larry at hastings.org (Larry Hastings) Date: Tue, 14 Jan 2014 20:31:32 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52D59C74.30702@hastings.org> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> Message-ID: <52D60F24.80805@hastings.org> On 01/14/2014 12:22 PM, Larry Hastings wrote: > > On 01/11/2014 07:35 PM, Larry Hastings wrote: >> I've uploaded a prototype here: >> >> https://bitbucket.org/larry/python-clinic-buffer >> > > [...] 
I've created a new repository containing the "concrete examples > of the various options" that Barry proposed above. You may find it here: > > https://bitbucket.org/larry/clinic-buffer-samples/src > I've added a fourth feature to the prototype: set line_prefix lets you set a string that is prepended to every line of code generated by Clinic. Documentation is in the text file in the root. I also updated the clinic-buffer-samples repository to match. There's now a "prefixes" subdirectory, with copies of all the samples adding a per-line prefix of "/*clinic*/ ". Does that make Clinic any easier to swallow for anybody? Cheers, //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From meadori at gmail.com Wed Jan 15 06:23:53 2014 From: meadori at gmail.com (Meador Inge) Date: Tue, 14 Jan 2014 23:23:53 -0600 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <20140114221221.5dbc2541@fsol> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> <20140114221221.5dbc2541@fsol> Message-ID: On Tue, Jan 14, 2014 at 3:12 PM, Antoine Pitrou wrote: On Tue, 14 Jan 2014 12:22:12 -0800 > Larry Hastings wrote: > > > > https://bitbucket.org/larry/clinic-buffer-samples/src > > > > In it I converted Modules/_pickle.c four different ways. There's a > > README, please read it. > > I'm +1 on the sidefile approach. +0 on the various buffer approaches. > -0.5 on the current "sprinkled everywhere" approach. > After converting a few modules, I feel about the same. The sprinkling does clutter the file. Although, I do wonder if we can simplify things a bit for the "sideline" file by using macros and headers. You could write the original definition like: /*[clinic input begin] _pickle.PicklerMemoProxy.copy self: PicklerMemoProxyObject Copy the memo to a new object. 
[clinic input end]*/ static PyObject * _PICKLE_PICKLERMEMOPROXY_COPY(PyObject *self, PyObject *Py_UNUSED(ignored)) { ... } and then generate a header like: PyDoc_STRVAR(_pickle_PicklerMemoProxy_copy__doc__, "copy()\n" "Copy the memo to a new object."); #define _PICKLE_PICKLERMEMOPROXY_COPY_METHODDEF \ {"copy", (PyCFunction)_pickle_PicklerMemoProxy_copy, METH_NOARGS, _pickle_PicklerMemoProxy_copy__doc__}, static PyObject * _pickle_PicklerMemoProxy_copy_impl(PicklerMemoProxyObject *self); #define _PICKLE_PICKLERMEMOPROXY_COPY(a, b) \ _pickle_PicklerMemoProxy_copy(PyObject *self, PyObject *Py_UNUSED(ignored)) \ { \ PyObject *return_value = NULL; \ \ return_value = _pickle_PicklerMemoProxy_copy_impl((PicklerMemoProxyObject *)self); \ \ return return_value; \ } \ \ static PyObject * \ _pickle_PicklerMemoProxy_copy_impl(PicklerMemoProxyObject *self) \ This way the docstring, method def, and argument parsing code are out of the way, but you still retain the helpful comments in the implementation file. I am pretty sure this gets around the "where do I inject the side file part" too. You also don't have to do much more editing than the original scheme: write the clinic comment, #include a header, and then apply the macro. That being said, this is somewhat half baked and some folks don't like macros. I just wanted to throw it out there since it seems like a reasonable compromise. FWIW, I have worked on several large programs that generate C header and implementation files on the side and it has never bothered me that much. Well, unless something goes wrong :-) -- # Meador -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Wed Jan 15 06:57:57 2014 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Wed, 15 Jan 2014 14:57:57 +0900 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <20140115010833.GL3869@ando> References: <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> <20140115010833.GL3869@ando> Message-ID: <87sispdbx6.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > I thought I understand the purpose of asciistr was exactly that, to > produce something that was compatible with both bytes and strings. asciistr *canonizes* something as an ASCII string, and therefore compatible with both bytes and str. It can't *create* such a thing ex nihilo. From tseaver at palladion.com Wed Jan 15 07:07:50 2014 From: tseaver at palladion.com (Tres Seaver) Date: Wed, 15 Jan 2014 01:07:50 -0500 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: <87sispdbx6.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> <20140115010833.GL3869@ando> <87sispdbx6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/15/2014 12:57 AM, Stephen J. Turnbull wrote: > asciistr *canonizes* something as an ASCII string, and therefore > compatible with both bytes and str. It can't *create* such a thing > ex nihilo. How many miracles must be attested? Tres. 
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlLWJbYACgkQ+gerLs4ltQ7RHACfft2ysdHiE9zJM72ycqi0Uqyl s5EAnR9Z21tgqsFVsPUEPiWgtXNxCWF4 =Thyi -----END PGP SIGNATURE----- From stephen at xemacs.org Wed Jan 15 07:58:23 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 15 Jan 2014 15:58:23 +0900 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> <878uuievm5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87r489d94g.fsf@uwakimon.sk.tsukuba.ac.jp> > Right, that's the danger I was worried about, but the problem is that > there's at least *some* minimum level of ASCII compatibility that > needs to be assumed in order to define an interpolation format at all > (this is the point I originally missed). Only if you insist that bytes formats be admitted. But that's an implementation detail, really. (I'm not going to push that point, since it's the obvious way to request a bytes result, and insist on the various restrictions and semantic differences proposed for bytes interpolation -- anything else would be silly.) More seriously, it's irrelevant *post*-interpolation, because by definition bytes interpolation interpolates bytes, not "ASCII compatible". So what you're saying is iterated interpolation is crazy: width1, width2 = compute_column_widths(table_rows) fmt = b"%%%ds %%%ds\n" % (width1, width2) for row in table_rows: print(fmt % row) # might be useful in debugging ;-) # writing to a file is plausible IMO Tell me again why we have a '%%' format code? 
:-) > (which must make life interesting if you try to use an ASCII > incompatible coding cookie for your source code - I'm actually not > sure what the full implications of that *are* for bytes literals in > Python 3). Currently None: me 15:46$ python3.3 test.py File "test.py", line 2 SyntaxError: bytes can only contain ASCII literal characters. :-) > It's certainly a decision that has its downsides, with the potential > impact on users of ASCII incompatible encodings (mostly in Asia) Which is most of the world at this point. You ISO-8859-speakers are gonna wither away! :-) Nor do I think there's anybody crazy enough to make a Tiananmen Square-style stand against GB18030. In 2025 this could be Python's most sensitive Achilles' heel. Hm. Maybe I should put a fractional coefficient on that . > being the main one, but I think the increased convenience in > working with ASCII compatible binary protocols and file formats is > worth the cost. But there aren't any ASCII-compatible binary protocols in the sense that Shift JIS is *not* ASCII compatible. After interpolation, you end up with something that's not ASCII compatible. At the very least, the "iterated interpolation is a bad idea" misfeature needs to be documented. From ethan at stoneleaf.us Wed Jan 15 08:34:05 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 14 Jan 2014 23:34:05 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52D60F24.80805@hastings.org> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> <52D60F24.80805@hastings.org> Message-ID: <52D639ED.3000605@stoneleaf.us> On 01/14/2014 08:31 PM, Larry Hastings wrote: > > I've added a fourth feature to the prototype: > > set line_prefix > > lets you set a string that is prepended to every line of code generated by Clinic. Without the coloring support of my editor I would find that very useful indeed. 
But since I have coloring support (as soon as I hack it in, anyway ;), I'll vote +0.25. -- ~Ethan~ From storchaka at gmail.com Wed Jan 15 10:59:53 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 15 Jan 2014 11:59:53 +0200 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52D5BE69.9090003@hastings.org> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> <20140114221221.5dbc2541@fsol> <0E0B7B9E-6269-4D46-9235-107272CDB0CC@gmail.com> <52D5BE69.9090003@hastings.org> Message-ID: 15.01.14 00:47, Larry Hastings wrote: > I also can not change the text, but twice I was a witness as others did. And I see that make this mistake very easily. I also didn't modify the generated text, but twice I was a witness as others did. And I see that make this mistake very easily. From storchaka at gmail.com Wed Jan 15 10:52:02 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 15 Jan 2014 11:52:02 +0200 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52D5B92C.5040300@hastings.org> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> <20140114221221.5dbc2541@fsol> <0E0B7B9E-6269-4D46-9235-107272CDB0CC@gmail.com> <52D5B92C.5040300@hastings.org> Message-ID: 15.01.14 00:24, Larry Hastings wrote: > But there's one important caveat to the above. As I recall, Guido has > stated that he hates storing generated code in separate files. He has > yet to rescind or weaken that pronouncement. Until such time as he > does, the "side file" approach is off the table. I implemented it in > the prototype purely for the purpose of fostering debate, so the "side > file" proponents can try to convince him that it's necessary or that > it's not so bad. But it's not going in without Guido's approval.
As > you yourself say--"Python is Guido's language, he just lets us use it." Yes, we know. The conviction of Guido is the purpose of this topic. I hope that his silence is a good sign. Perhaps he doubts and weighs arguments. Personally, I believe that if we don't change Argument Clinic output now, we will have to do it in 3.5 or 4.0, when many people work with the code, but at the cost of history cluttering, what I want to avoid. From stephen at xemacs.org Wed Jan 15 11:46:34 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 15 Jan 2014 19:46:34 +0900 Subject: [Python-Dev] PEP 460: allowing %d and %f and mojibake In-Reply-To: References: <52D308EF.8080108@stoneleaf.us> <87mwj0eg9e.fsf@uwakimon.sk.tsukuba.ac.jp> <52D364E1.5060704@stoneleaf.us> <87iotodunh.fsf@uwakimon.sk.tsukuba.ac.jp> <52D47F6E.40904@canterbury.ac.nz> <87d2jvdu08.fsf@uwakimon.sk.tsukuba.ac.jp> <20140115010833.GL3869@ando> <87sispdbx6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87ppntcyk5.fsf@uwakimon.sk.tsukuba.ac.jp> Tres Seaver writes: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 01/15/2014 12:57 AM, Stephen J. Turnbull wrote: > > > asciistr *canonizes* something as an ASCII string, and therefore > > compatible with both bytes and str. It can't *create* such a thing > > ex nihilo. > > How many miracles must be attested? You'll have to ask Pope Benno I. From stephen at xemacs.org Wed Jan 15 11:57:16 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 15 Jan 2014 19:57:16 +0900 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <52d57def.0180310a.4b08.ffffa287@mx.google.com> References: <52D4845B.10009@canterbury.ac.nz> <52d57def.0180310a.4b08.ffffa287@mx.google.com> Message-ID: <87ob3dcy2b.fsf@uwakimon.sk.tsukuba.ac.jp> Aside: OK, Guido, ya got me. I have a separate screed recounting the reasons for my apostasy, but that's probably not interesting any more. I'll send it to individuals on request. 
> But in terms of explaining the text model, that > separation is important enough that > > (1) We should be reluctant to strengthen the > "it's really just ASCII" messages. True. I think the right message is "Unless you know why you *desperately* want this, not only don't you need it, but using it is the Python equivalent of skydiving without a parachute." N.B. Don't take the metaphor as an insult. I think it's become clear that those who "desperately want this" not only use parachutes, they pack their own. No need to worry about them. > (2) It *may* be worth creating a virtual > split in the documentation. Please don't. All we need to tell naive users is: Look at the structure of the bytes. If that structure is "text", convert to str using .decode(). Please don't use bytes. If that structure isn't text, you're in a specialist domain, and it's your problem. Many structured uses of bytes use ASCII- encoded keywords: we provide bytes methods for handling them, but you *must* be aware that these methods *cannot* distinguish "bytes representing text encoded as ASCII" from "any old bytes". Be warned: They will happily -- and silently -- corrupt the latter. Make sure you respect the higher-level structure of your data when using them. > Virtual subclass ASCIIStructuredBytes > ==================================== > > One particularly common use of bytes is to represent > the contents of a file, or of a network message. In > these cases, the bytes will often represent Text > *in a specific encoding* and that encoding will usually > be a superset of ASCII. Rather than create and support > an ASCIIStructuredBytes subclass, Python simply added > support for these use cases straight to Bytes objects, > and assumes that this support simply won't be used when > it does not make sense. For example, bytes literals This is going quite the wrong direction, I think.
The only people who should care about "Text *in a specific encoding* and that encoding will usually be a superset of ASCII" are codec writers, and by now writing those is a very rare task. Everybody else uses ASCII keywords in a simple formal language. > *could* be used to construct a sound sample, but the > literals will be far easier to read when they are used > to represent (encoded) ASCII text, such as "OPEN". From martin at v.loewis.de Wed Jan 15 13:22:44 2014 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 15 Jan 2014 13:22:44 +0100 Subject: [Python-Dev] Common subset of python 2 and python 3 In-Reply-To: References: Message-ID: <52D67D94.3010700@v.loewis.de> Am 12.01.14 18:39, schrieb Nachshon David Armon: >>> I propose that this new version of python use the python 3 unicode model. >>> As the version of python will be fully compatible with both python 2 and >>> with python 3 but NOT necessarily with all existing code in either. It is >>> designed as a porting tool only. I don't think that it is possible to write an interpreter that is fully compatible for all it accepts. Would you think that the program print(repr(2**80).endswith("L")) is in the subset that should be supported by both Python 2 and Python 3? Notice that it prints "True" in Python 2 and "False" in Python 3. So if this common-version interpreter *rejects* the above program, which operation (**, repr, endswith) would you want to ban from subset? Regards, Martin From solipsis at pitrou.net Wed Jan 15 13:35:00 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 15 Jan 2014 13:35:00 +0100 Subject: [Python-Dev] Common subset of python 2 and python 3 References: <52D67D94.3010700@v.loewis.de> Message-ID: <20140115133500.22bebaea@fsol> On Wed, 15 Jan 2014 13:22:44 +0100 "Martin v. Löwis" wrote: > Am 12.01.14 18:39, schrieb Nachshon David Armon: > >>> I propose that this new version of python use the python 3 unicode model.
> >>> As the version of python will be fully compatible with both python 2 and > >>> with python 3 but NOT necessarily with all existing code in either. It is > >>> designed as a porting tool only. > > I don't think that it is possible to write an interpreter that is fully > compatible for all it accepts. Would you think that the program > > print(repr(2**80).endswith("L")) > > is in the subset that should be supported by both Python 2 and Python 3? > > Notice that it prints "True" in Python 2 and "False" in Python 3. We probably need an "asciibool" that is both true and false. (for some value of "we" ;-)) Regards Antoine. From rosuav at gmail.com Wed Jan 15 14:21:56 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 16 Jan 2014 00:21:56 +1100 Subject: [Python-Dev] Common subset of python 2 and python 3 In-Reply-To: <52D67D94.3010700@v.loewis.de> References: <52D67D94.3010700@v.loewis.de> Message-ID: On Wed, Jan 15, 2014 at 11:22 PM, "Martin v. Löwis" wrote: > I don't think that it is possible to write an interpreter that is fully > compatible for all it accepts. Would you think that the program > > print(repr(2**80).endswith("L")) > > is in the subset that should be supported by both Python 2 and Python 3? Easiest fix for that would be to have long.__repr__ omit the L tag. Then it'll do the same as it would in Py3. ChrisA From stefan-usenet at bytereef.org Wed Jan 15 14:43:07 2014 From: stefan-usenet at bytereef.org (Stefan Krah) Date: Wed, 15 Jan 2014 14:43:07 +0100 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52D59C74.30702@hastings.org> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> Message-ID: <20140115134307.GA5276@sleipnir.bytereef.org> Larry Hastings wrote: > https://bitbucket.org/larry/clinic-buffer-samples/src Thanks for doing this! +1 for the sidefile, -1 for the current approach, +-0 for the rest.
Stefan Krah From eric at trueblade.com Wed Jan 15 15:25:04 2014 From: eric at trueblade.com (Eric V. Smith) Date: Wed, 15 Jan 2014 09:25:04 -0500 Subject: [Python-Dev] Common subset of python 2 and python 3 In-Reply-To: References: <52D67D94.3010700@v.loewis.de> Message-ID: <52D69A40.3020700@trueblade.com> On 1/15/2014 8:21 AM, Chris Angelico wrote: > On Wed, Jan 15, 2014 at 11:22 PM, "Martin v. Löwis" wrote: >> I don't think that it is possible to write an interpreter that is fully >> compatible for all it accepts. Would you think that the program >> >> print(repr(2**80).endswith("L")) >> >> is in the subset that should be supported by both Python 2 and Python 3? > > Easiest fix for that would be to have long.__repr__ omit the L tag. > Then it'll do the same as it would in Py3. I think Martin's point is not this specific thing, but that such a subset would be useless. Would you drop dict.items() because it returns different things in both languages? Drop range() because it's different? There are many, many such differences. The common subset is not useful. Eric. From storchaka at gmail.com Wed Jan 15 15:31:17 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 15 Jan 2014 16:31:17 +0200 Subject: [Python-Dev] Common subset of python 2 and python 3 In-Reply-To: <52D67D94.3010700@v.loewis.de> References: <52D67D94.3010700@v.loewis.de> Message-ID: 15.01.14 14:22, "Martin v. Löwis" wrote: > I don't think that it is possible to write an interpreter that is fully > compatible for all it accepts. Would you think that the program > > print(repr(2**80).endswith("L")) > > is in the subset that should be supported by both Python 2 and Python 3? > > Notice that it prints "True" in Python 2 and "False" in Python 3. This is implementation details. On 128-bit platform special build of Python 2 can print False. Of course there are many other differences between Python 2 and Python 3 besides unicode model and unified integers.
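The differences cited in this thread (the "L" suffix on long reprs, dict.items() and range()) can be checked directly. The following is a minimal sketch that assumes a Python 3 interpreter; the helper names are made up for illustration and do not come from any message above. On a typical Python 2 build all three helpers would return True for the same arguments.

```python
# Sketch only: run under Python 3. Helper names are hypothetical.

def long_repr_has_suffix(n):
    """True if repr(n) carries the Python 2 style 'L' suffix."""
    return repr(n).endswith("L")


def items_is_list(d):
    """True if d.items() is a real list (Python 2) rather than a view."""
    return isinstance(d.items(), list)


def range_is_list(n):
    """True if range(n) is a real list (Python 2) rather than a lazy range."""
    return isinstance(range(n), list)


if __name__ == "__main__":
    print(long_repr_has_suffix(2 ** 80))
    print(items_is_list({"a": 1}))
    print(range_is_list(3))
```

Each helper isolates one operation whose result type or text differs between the two major versions, which is the kind of difference Eric's message argues a "common subset" would have to exclude.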
From solipsis at pitrou.net Wed Jan 15 15:34:24 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 15 Jan 2014 15:34:24 +0100 Subject: [Python-Dev] Common subset of python 2 and python 3 References: <52D67D94.3010700@v.loewis.de> Message-ID: <20140115153424.4a4d5de7@fsol> On Wed, 15 Jan 2014 16:31:17 +0200 Serhiy Storchaka wrote: > 15.01.14 14:22, "Martin v. Löwis" wrote: > > I don't think that it is possible to write an interpreter that is fully > > compatible for all it accepts. Would you think that the program > > > > print(repr(2**80).endswith("L")) > > > > is in the subset that should be supported by both Python 2 and Python 3? > > > > Notice that it prints "True" in Python 2 and "False" in Python 3. > > This is implementation details. On 128-bit platform special build of > Python 2 can print False. If you explicitly create a long the L will always be printed: >>> long(0) 0L Regards Antoine. From rosuav at gmail.com Wed Jan 15 15:40:44 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 16 Jan 2014 01:40:44 +1100 Subject: [Python-Dev] Common subset of python 2 and python 3 In-Reply-To: <52D69A40.3020700@trueblade.com> References: <52D67D94.3010700@v.loewis.de> <52D69A40.3020700@trueblade.com> Message-ID: On Thu, Jan 16, 2014 at 1:25 AM, Eric V. Smith wrote: >> Easiest fix for that would be to have long.__repr__ omit the L tag. >> Then it'll do the same as it would in Py3. > > I think Martin's point is not this specific thing, but that such a > subset would be useless. Would you drop dict.items() because it returns > different things in both languages? Drop range() because it's different? > There are many, many such differences. The common subset is not useful. Fair enough.
ChrisA From brett at python.org Wed Jan 15 15:45:10 2014 From: brett at python.org (Brett Cannon) Date: Wed, 15 Jan 2014 09:45:10 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D59669.20404@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> Message-ID: bytes.format() below. I'll leave it to you to decide if they warrant using, leaving as an open question, or rejecting. On Tue, Jan 14, 2014 at 2:56 PM, Ethan Furman wrote: > Duh. Here's the text, as well. ;) > > > PEP: 461 > Title: Adding % and {} formatting to bytes > Version: $Revision$ > Last-Modified: $Date$ > Author: Ethan Furman > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 2014-01-13 > Python-Version: 3.5 > Post-History: 2014-01-13 > Resolution: > > > Abstract > ======== > > This PEP proposes adding the % and {} formatting operations from str to > bytes. > > > Proposed semantics for bytes formatting > ======================================= > > %-interpolation > --------------- > > All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.) > will be supported, and will work as they do for str, including the > padding, justification and other related modifiers. > > Example:: > > >>> b'%4x' % 10 > b' a' > > %c will insert a single byte, either from an int in range(256), or from > a bytes argument of length 1. > > Example: > > >>> b'%c' % 48 > b'0' > > >>> b'%c' % b'a' > b'a' > > %s, because it is the most general, has the most convoluted resolution: > > - input type is bytes? > pass it straight through > > - input type is numeric? > use its __xxx__ [1] [2] method and ascii-encode it (strictly) > > - input type is something else? > use its __bytes__ method; if there isn't one, raise an exception [3] > > Examples: > > >>> b'%s' % b'abc' > b'abc' > > >>> b'%s' % 3.14 > b'3.14' > > >>> b'%s' % 'hello world!' > Traceback (most recent call last): > ... 
> TypeError: 'hello world' has no __bytes__ method, perhaps you need to > encode it? > > .. note:: > > Because the str type does not have a __bytes__ method, attempts to > directly use 'a string' as a bytes interpolation value will raise an > exception. To use 'string' values, they must be encoded or otherwise > transformed into a bytes sequence:: > > 'a string'.encode('latin-1') > > > format > ------ > > The format mini language will be used as-is, with the behaviors as listed > for %-interpolation. > That's too vague; % interpolation does not support other format operators in the same way as str.format() does. % interpolation has specific code to support %d, etc. But str.format() gets supported for {:d} not from special code but because e.g. float.__format__('d') works. So you can't say "bytes.format() supports {:d} just like %d works with string interpolation" since the mechanisms are fundamentally different. This is why I have argued that if you specify it as "if there is a format spec specified, then the return value from calling __format__() will have str.decode('ascii', 'strict') called on it" you get the support for the various number-specific format specs for free. It also means if you pass in a string that you just want the strict ASCII bytes of then you can get it with {:s}. I also think that a 'b' conversion be added to bytes.format(). This doesn't have the same issue as %b if you make {} implicitly mean {!b} in Python 3.5 as {} will mean what is the most accurate for bytes.format() in either version. It also allows for explicit support where you know you only want a byte and allows {!s} to mean you only want a string (and thus throw an error otherwise). And all of this means that much like %s only taking bytes, the only way for bytes.format() to accept a non-byte argument is for some format spec to be specified to trigger the .encode('ascii', 'strict') call. 
-Brett > > > Open Questions > ============== > > For %s there has been some discussion of trying to use the buffer protocol > (Py_buffer) before trying __bytes__. This question should be answered > before > the PEP is implemented. > > > Proposed variations > =================== > > It has been suggested to use %b for bytes instead of %s. > > - Rejected as %b does not exist in Python 2.x %-interpolation, which is > why we are using %s. > > It has been proposed to automatically use .encode('ascii','strict') for str > arguments to %s. > > - Rejected as this would lead to intermittent failures. Better to have > the > operation always fail so the trouble-spot can be correctly fixed. > > It has been proposed to have %s return the ascii-encoded repr when the > value > is a str (b'%s' % 'abc' --> b"'abc'"). > > - Rejected as this would lead to hard to debug failures far from the > problem > site. Better to have the operation always fail so the trouble-spot > can be > easily fixed. > > > Foot notes > ========== > > .. [1] Not sure if this should be the numeric __str__ or the numeric > __repr__, > or if there's any difference > .. [2] Any proper numeric class would then have to provide an ascii > representation of its value, either via __repr__ or __str__ > (whichever > we choose in [1]). > .. [3] TypeError, ValueError, or UnicodeEncodeError? > > > Copyright > ========= > > This document has been placed in the public domain. > > > .. > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rdmurray at bitdance.com Wed Jan 15 16:05:30 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 15 Jan 2014 10:05:30 -0500 Subject: [Python-Dev] Binding problem In-Reply-To: References: Message-ID: <20140115150530.756DC2500D0@webabinitio.net> On Wed, 15 Jan 2014 11:53:05 +1100, "Rob Ward" wrote: > I apologise if I have come to the wrong place here, but 12hrs > searching, plus experimenting, on the WWW for advice on it has not > yielded any successful advice to resolve the issue. This is indeed the wrong place. Your best bet for getting an answer would probably be the python-list mailing list/newsgroup. --David From storchaka at gmail.com Wed Jan 15 16:14:14 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 15 Jan 2014 17:14:14 +0200 Subject: [Python-Dev] Common subset of python 2 and python 3 In-Reply-To: <20140115153424.4a4d5de7@fsol> References: <52D67D94.3010700@v.loewis.de> <20140115153424.4a4d5de7@fsol> Message-ID: 15.01.14 16:34, Antoine Pitrou wrote: > If you explicitly create a long the L will always be printed: > >>>> long(0) > 0L Hey! long is not in common subset of Python 2 and Python 3. From larry at hastings.org Wed Jan 15 16:37:29 2014 From: larry at hastings.org (Larry Hastings) Date: Wed, 15 Jan 2014 07:37:29 -0800 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> <20140114221221.5dbc2541@fsol> <0E0B7B9E-6269-4D46-9235-107272CDB0CC@gmail.com> <52D5BE69.9090003@hastings.org> Message-ID: <52D6AB39.9060809@hastings.org> On 01/15/2014 01:59 AM, Serhiy Storchaka wrote: > 15.01.14 00:47, Larry Hastings wrote: >> I also can not change the text, but twice I was a witness as others >> did. And I see that make this mistake very easily. > > I also didn't modify the generated text, but twice I was a witness as > others did.
And I see that make this mistake very easily. That isn't what I wrote...? //arry/ From nas at arctrix.com Wed Jan 15 16:47:43 2014 From: nas at arctrix.com (Neil Schemenauer) Date: Wed, 15 Jan 2014 15:47:43 +0000 (UTC) Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> Message-ID: This looks pretty good to me. I don't think we should limit operands based on type, that's anti-Pythonic IMHO. We should use duck-typing and that means a special method, I think. We could introduce a new one but __bytes__ looks like it can work. Otherwise, maybe __ascii__ is a good name. Objects that implement __str__ can also implement __bytes__ if they can guarantee that ASCII characters are always returned, no matter what the *value* (we don't want to repeat the hell of Python 2's unicode to str coercion which depends on the value of the unicode object). Objects that already contain encoded bytes or arbitrary bytes can also implement __bytes__. Ethan Furman wrote: > %s, because it is the most general, has the most convoluted resolution: This becomes much simpler: - does the object implement __bytes__? call it and use the value otherwise raise TypeError > It has been suggested to use %b for bytes instead of %s. > > - Rejected as %b does not exist in Python 2.x %-interpolation, which is > why we are using %s. +1. %b might be conceptually neater but ease of migration trumps that, IMHO. > It has been proposed to automatically use .encode('ascii','strict') for str > arguments to %s. > > - Rejected as this would lead to intermittent failures. Better to have the > operation always fail so the trouble-spot can be correctly fixed. Right. That would put us back in Python 2 unicode -> str coercion hell. Thanks for writing the PEP. Neil From eric at trueblade.com Wed Jan 15 16:52:27 2014 From: eric at trueblade.com (Eric V.
Smith) Date: Wed, 15 Jan 2014 10:52:27 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> Message-ID: <52D6AEBB.8050806@trueblade.com> On 1/15/2014 9:45 AM, Brett Cannon wrote: > That's too vague; % interpolation does not support other format > operators in the same way as str.format() does. % interpolation has > specific code to support %d, etc. But str.format() gets supported for > {:d} not from special code but because e.g. float.__format__('d') works. > So you can't say "bytes.format() supports {:d} just like %d works with > string interpolation" since the mechanisms are fundamentally different. > > This is why I have argued that if you specify it as "if there is a > format spec specified, then the return value from calling __format__() > will have str.decode('ascii', 'strict') called on it" you get the > support for the various number-specific format specs for free. It also > means if you pass in a string that you just want the strict ASCII bytes > of then you can get it with {:s}. > > I also think that a 'b' conversion be added to bytes.format(). This > doesn't have the same issue as %b if you make {} implicitly mean {!b} in > Python 3.5 as {} will mean what is the most accurate for bytes.format() > in either version. It also allows for explicit support where you know > you only want a byte and allows {!s} to mean you only want a string (and > thus throw an error otherwise). > > And all of this means that much like %s only taking bytes, the only way > for bytes.format() to accept a non-byte argument is for some format spec > to be specified to trigger the .encode('ascii', 'strict') call. Agreed. With %-formatting, you can start with the format strings and then decide what you want to do with the passed in objects. 
But with .format, it's the other way around: you have to look at the passed in objects being formatted, and then decide what the format specifier means to that type. So, for .format, you could say "hey, that object's an int, and I happen to know how to format ints, outside of calling its .__format__". Or you could even call its __format__ because you know that it will only be ASCII. But to take this approach, you're going to have to hard-code the types. And subclasses are probably out, since there you don't know what the subclass's __format__ will return. It could be non-ASCII. >>> class Int(int): ... def __format__(self, fmt): ... return u'foo' ... >>> '{}'.format(Int(3)) 'foo' So basically I think we'll have to hard-code the types that .format() will support, and never call __format__, or only call __format__ if we know that it's an exact type where we know that __format__ will return (strict ASCII). Either that, or we're back to encoding the result of __format__ and accepting that sometimes it might throw errors, depending on the values being passed into format(). Eric. From ethan at stoneleaf.us Wed Jan 15 16:57:10 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Jan 2014 07:57:10 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> Message-ID: <52D6AFD6.5020204@stoneleaf.us> On 01/15/2014 06:45 AM, Brett Cannon wrote: > bytes.format() below. I'll leave it to you to decide if they warrant using, leaving as an open question, or rejecting. Thanks for your comments. I've only barely touched format, so it's not an area of strength for me.
:) -- ~Ethan~ From solipsis at pitrou.net Wed Jan 15 17:04:15 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 15 Jan 2014 17:04:15 +0100 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> Message-ID: <20140115170415.6363d7c6@fsol> On Wed, 15 Jan 2014 15:47:43 +0000 (UTC) Neil Schemenauer wrote: > > Objects that implement __str__ can also implement __bytes__ if they > can guarantee that ASCII characters are always returned, no matter > what the *value* I think that's a slippery slope. __bytes__ should mean that the object has a well-known bytes equivalent or encoding, not that its __str__ happens to be pure ASCII. (for example, it would be fine for a HTTP message class to define a __bytes__ method) Also, consider that if e.g. float had a __bytes__ method, then bytes(2.0) would start returning b'2.0', while bytes(2) would still need to return b'\x00\x00'. Regards Antoine. From storchaka at gmail.com Wed Jan 15 17:22:45 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 15 Jan 2014 18:22:45 +0200 Subject: [Python-Dev] Changing Clinic's output In-Reply-To: <52D6AB39.9060809@hastings.org> References: <20140107205308.05e1b5ce@fsol> <20140108100814.408a91f2@anarchist.wooz.org> <52D20D86.6030502@hastings.org> <52D59C74.30702@hastings.org> <20140114221221.5dbc2541@fsol> <0E0B7B9E-6269-4D46-9235-107272CDB0CC@gmail.com> <52D5BE69.9090003@hastings.org> <52D6AB39.9060809@hastings.org> Message-ID: 15.01.14 17:37, Larry Hastings wrote: > On 01/15/2014 01:59 AM, Serhiy Storchaka wrote: >> 15.01.14 00:47, Larry Hastings wrote: >>> I also can not change the text, but twice I was a witness as others >>> did. And I see that make this mistake very easily. >> >> I also didn't modify the generated text, but twice I was a witness as >> others did. And I see that make this mistake very easily. > > That isn't what I wrote...? Gotcha! How I got this?
I wanted to quote: > I no longer ever mistake generated code for handwritten code, and I don't ever modify the generated text. From nas at arctrix.com Wed Jan 15 17:27:32 2014 From: nas at arctrix.com (Neil Schemenauer) Date: Wed, 15 Jan 2014 16:27:32 +0000 (UTC) Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> Message-ID: Neil Schemenauer wrote: > We should use duck-typing and that means a special method, I > think. We could introduce a new one but __bytes__ looks like it > can work. Otherwise, maybe __ascii__ is a good name. I poked around the Python 3 source. Using __bytes__ has some downsides, e.g. the following would happen: >>> bytes(12) b'12' Perhaps that's a little too ASCII-centric. OTOH, UTF-8 seems to be winning the encoding war and so the above could be argued as reasonable behavior. I think forcing people to explicitly choose an encoding for str objects will be sufficient to avoid the bytes/str mess we have in Python 2. Unfortunately, that change conflicts with the current behavior: >>> bytes(12) b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' Would it be too disruptive to change that? It doesn't appear to be too useful and we could do it using a keyword argument, e.g.: bytes(size=12) I notice something else surprising to me: >>> class Test(object): ... def __bytes__(self): ... return b'test' ... >>> with open('test', 'wb') as fp: ... fp.write(Test()) ... Traceback (most recent call last): File "", line 2, in TypeError: 'Test' does not support the buffer interface I'd expect that to write b'test' to the file, not raise an error. 
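A minimal sketch of the three behaviours Neil describes above, assuming a Python 3 interpreter. The Test class is the one from his message; io.BytesIO is used here only as a stand-in for a file opened with 'wb', and the variable names are illustrative.

```python
import io


class Test(object):
    def __bytes__(self):
        return b'test'


# bytes(int) allocates that many zero bytes; it does not format the number.
zeros = bytes(12)

# Getting the ASCII digits requires an explicit encode step.
digits = str(12).encode('ascii')

# bytes() honours __bytes__, but write() on a binary stream wants a
# bytes-like (buffer) object, so the conversion has to be spelled out.
stream = io.BytesIO()
try:
    stream.write(Test())
except TypeError:
    stream.write(bytes(Test()))
```

The explicit bytes(Test()) call is the conversion that write() does not perform on the caller's behalf.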
Regards, Neil From ethan at stoneleaf.us Wed Jan 15 16:46:41 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Jan 2014 07:46:41 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> Message-ID: <52D6AD61.9080606@stoneleaf.us> On 01/14/2014 02:41 PM, Mark Lawrence wrote: > On 14/01/2014 19:55, Ethan Furman wrote: >> This PEP goes a but further than PEP 460 does, and hopefully spells >> things out in enough detail so there is no confusion as to what is meant. >> >> -- >> ~Ethan~ > > Out of plain old curiosity do we have to consider PEP 292 string templates in any way, shape or form, or regarding this > debate have they been safely booted into touch? Well, I'm not sure what "booted into touch" means, but yes, we can ignore string templates. :) -- ~Ethan~ From breamoreboy at yahoo.co.uk Wed Jan 15 17:33:18 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 15 Jan 2014 16:33:18 +0000 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D59669.20404@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> Message-ID: On 14/01/2014 19:56, Ethan Furman wrote: > Duh. Here's the text, as well. ;) > > %s, because it is the most general, has the most convoluted resolution: > > - input type is bytes? > pass it straight through > > - input type is numeric? > use its __xxx__ [1] [2] method and ascii-encode it (strictly) > > - input type is something else? > use its __bytes__ method; if there isn't one, raise an exception [3] > > Examples: > > >>> b'%s' % b'abc' > b'abc' > > >>> b'%s' % 3.14 > b'3.14' > > >>> b'%s' % 'hello world!' > Traceback (most recent call last): > ... > TypeError: 'hello world' has no __bytes__ method, perhaps you need > to encode it? > For completeness I believe %r and %a should be included here as well. 
FTR %a appears to have been introduced in 3.2, but I couldn't find anything in the What's New and there's no note in the docs http://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting to indicate when it first came into play. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From ethan at stoneleaf.us Wed Jan 15 17:20:06 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Jan 2014 08:20:06 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <20140115170415.6363d7c6@fsol> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <20140115170415.6363d7c6@fsol> Message-ID: <52D6B536.40501@stoneleaf.us> On 01/15/2014 08:04 AM, Antoine Pitrou wrote: > On Wed, 15 Jan 2014 15:47:43 +0000 (UTC) > Neil Schemenauer wrote: >> >> Objects that implement __str__ can also implement __bytes__ if they >> can guarantee that ASCII characters are always returned, no matter >> what the *value* > > I think that's a slippery slope. __bytes__ should mean that the object > has a well-known bytes equivalent or encoding, not that its __str__ > happens to be pure ASCII. Agreed. -- ~Ethan~ From ethan at stoneleaf.us Wed Jan 15 17:21:35 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Jan 2014 08:21:35 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> Message-ID: <52D6B58F.2000705@stoneleaf.us> On 01/15/2014 07:47 AM, Neil Schemenauer wrote: > > Thanks for writing the PEP. Thank you for your comments! 
-- ~Ethan~ From brett at python.org Wed Jan 15 17:50:33 2014 From: brett at python.org (Brett Cannon) Date: Wed, 15 Jan 2014 11:50:33 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D6AEBB.8050806@trueblade.com> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D6AEBB.8050806@trueblade.com> Message-ID: On Wed, Jan 15, 2014 at 10:52 AM, Eric V. Smith wrote: > On 1/15/2014 9:45 AM, Brett Cannon wrote: > > > That's too vague; % interpolation does not support other format > > operators in the same way as str.format() does. % interpolation has > > specific code to support %d, etc. But str.format() gets supported for > > {:d} not from special code but because e.g. float.__format__('d') works. > > So you can't say "bytes.format() supports {:d} just like %d works with > > string interpolation" since the mechanisms are fundamentally different. > > > > This is why I have argued that if you specify it as "if there is a > > format spec specified, then the return value from calling __format__() > > will have str.decode('ascii', 'strict') called on it" you get the > > support for the various number-specific format specs for free. It also > > means if you pass in a string that you just want the strict ASCII bytes > > of then you can get it with {:s}. > > > > I also think that a 'b' conversion be added to bytes.format(). This > > doesn't have the same issue as %b if you make {} implicitly mean {!b} in > > Python 3.5 as {} will mean what is the most accurate for bytes.format() > > in either version. It also allows for explicit support where you know > > you only want a byte and allows {!s} to mean you only want a string (and > > thus throw an error otherwise). > > > > And all of this means that much like %s only taking bytes, the only way > > for bytes.format() to accept a non-byte argument is for some format spec > > to be specified to trigger the .encode('ascii', 'strict') call. > > Agreed. 
With %-formatting, you can start with the format strings and > then decide what you want to do with the passed in objects. But with > .format, it's the other way around: you have to look at the passed in > objects being formatted, and then decide what the format specifier means > to that type. > > So, for .format, you could say "hey, that object's an int, and I happen > to know how to format ints, outside of calling it's .__format__". Or you > could even call its __format__ because you know that it will only be > ASCII. But to take this approach, you're going to have to hard-code the > types. And subclasses are probably out, since there you don't know what > the subclass's __format__ will return. It could be non-ASCII. > > >>> class Int(int): > ... def __format__(self, fmt): > ... return u'foo' > ... > >>> '{}'.format(Int(3)) > 'foo' > > So basically I think we'll have to hard-code the types that .format() > will support, and never call __format__, or only call __format__ if we > know that it's a exact type where we know that __format__ will return > (strict ASCII). > > Either that, or we're back to encoding the result of __format__ and > accepting that sometimes it might throw errors, depending on the values > being passed into format(). > I say accept that an error might get thrown as there is precedent of specifying a format spec that an object's __format__() method doesn't recognize:: >>> '{:s}'.format(1) Traceback (most recent call last): File "", line 1, in ValueError: Unknown format code 's' for object of type 'int' IOW I'm actively trying to avoid type-restricting the semantics for bytes.format() for a consistent, clear mental model. Remembering that "any format spec leads to calling .encode('ascii', 'strict') on the result" is simple compared to "ASCII bytes will be returned for ints and floats when passed in, otherwise all other types follow these rules". As the zen says: Errors should never pass silently. 
Special cases aren't special enough to break the rules.

-Brett

From brett at python.org Wed Jan 15 17:51:06 2014
From: brett at python.org (Brett Cannon)
Date: Wed, 15 Jan 2014 11:51:06 -0500
Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes
In-Reply-To: <52D6AFD6.5020204@stoneleaf.us>
References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us>
 <52D6AFD6.5020204@stoneleaf.us>
Message-ID:

On Wed, Jan 15, 2014 at 10:57 AM, Ethan Furman wrote:

> On 01/15/2014 06:45 AM, Brett Cannon wrote:
>
>> bytes.format() below. I'll leave it to you to decide if they warrant
>> using, leaving as an open question, or rejecting.
>>
>
> Thanks for your comments. I've only barely touched format, so it's not an
> area of strength for me. :)
>

Time to strengthen it if you are proposing a PEP that is going to affect
it. =)

>
> --
> ~Ethan~
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> brett%40python.org
>

From ethan at stoneleaf.us Wed Jan 15 17:57:59 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 15 Jan 2014 08:57:59 -0800
Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes
In-Reply-To:
References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us>
 <52D6AFD6.5020204@stoneleaf.us>
Message-ID: <52D6BE17.4080307@stoneleaf.us>

On 01/15/2014 08:51 AM, Brett Cannon wrote:
> On Wed, Jan 15, 2014 at 10:57 AM, Ethan Furman wrote:
>
>> Thanks for your comments. I've only barely touched format, so it's not an area of strength for me. :)
>
> Time to strengthen it if you are proposing a PEP that is going to affect it. =)

I am. You're helping.
:) -- ~Ethan~ From ijmorlan at uwaterloo.ca Wed Jan 15 17:18:54 2014 From: ijmorlan at uwaterloo.ca (Isaac Morland) Date: Wed, 15 Jan 2014 11:18:54 -0500 (EST) Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <20140115170415.6363d7c6@fsol> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <20140115170415.6363d7c6@fsol> Message-ID: On Wed, 15 Jan 2014, Antoine Pitrou wrote: > On Wed, 15 Jan 2014 15:47:43 +0000 (UTC) > Neil Schemenauer wrote: >> >> Objects that implement __str__ can also implement __bytes__ if they >> can guarantee that ASCII characters are always returned, no matter >> what the *value* > > I think that's a slippery slope. __bytes__ should mean that the object > has a well-known bytes equivalent or encoding, not that its __str__ > happens to be pure ASCII. +1 > (for example, it would be fine for a HTTP message class to define a > __bytes__ method) > > Also, consider that if e.g. float had a __bytes__ method, then > bytes(2.0) would start returning b'2.0', while bytes(2) would still > need to return b'\x00\x00'. Not actually suggesting the following for a number of reasons including but not limited to the consistency of floating point formats across different implementations, but it would make more sense for bytes (2.0) to return the 8-byte IEEE representation than for it to return the ASCII encoding of the decimal representation of the number. 
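To make the two candidate meanings of a bytes form of 2.0 concrete, here is a small sketch using the standard struct module. The '<d' format (little-endian IEEE 754 double) is one arbitrary choice among several byte orders, which is part of why the idea is not being proposed.

```python
import struct

value = 2.0

# The 8-byte IEEE 754 binary64 representation, little-endian.
ieee = struct.pack('<d', value)

# The ASCII encoding of the decimal representation.
text = repr(value).encode('ascii')

# Only the binary form round-trips through struct.
restored = struct.unpack('<d', ieee)[0]
```

Both results are legitimate bytes for the same float, so neither is an obvious default for a float.__bytes__.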
Isaac Morland CSCF Web Guru DC 2619, x36650 WWW Software Specialist From nas at arctrix.com Wed Jan 15 19:09:29 2014 From: nas at arctrix.com (Neil Schemenauer) Date: Wed, 15 Jan 2014 18:09:29 +0000 (UTC) Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <20140115170415.6363d7c6@fsol> Message-ID: Antoine Pitrou wrote: > On Wed, 15 Jan 2014 15:47:43 +0000 (UTC) Neil S wrote: >> >> Objects that implement __str__ can also implement __bytes__ if they >> can guarantee that ASCII characters are always returned, no matter >> what the *value* > > I think that's a slippery slope. __bytes__ should mean that the object > has a well-known bytes equivalent or encoding, not that its __str__ > happens to be pure ASCII. After poking around some more into the Python 3 source, I agree. It seems too late to change bytes() and bytearray(). We should have used a keyword only argument but too late now (tp_new is a mess). I can also agree that pushing the ASCII-centric behavior into the bytes() constructor goes too far. If we limit the ASCII-centric behavior solely to % and format(), that seems a reasonable trade-off for usability. As others have argued, once you are using format codes, you are pretty clearly dealing with ASCII encoding. I feel strongly that % and format on bytes needs to use duck-typing and not type checking. Also, formatting falures must be due to types and not due to values. If we can get agreement on these two principles, that will help guide the design. Those principles absolutely rule out call calling encode('ascii') automatically. I'm not deeply intimate with format() but I think it also rules out calling __format__. Could we introduce only __bformat__ and have the % operator call it? That would only require implementing one new special method instead of two. 
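The "types, not values" principle can be illustrated with a short sketch. to_ascii_bytes is a hypothetical helper standing in for an implicit encode('ascii') step; it is not anything the interpreter does.

```python
def to_ascii_bytes(text):
    # Hypothetical helper: what an automatic encode step would amount to.
    return text.encode('ascii', 'strict')


# Both inputs have the same type (str); only their values differ.
inputs = ['cafe', 'caf\u00e9']
outcomes = []
for text in inputs:
    try:
        to_ascii_bytes(text)
        outcomes.append('ok')
    except UnicodeEncodeError:
        outcomes.append('UnicodeEncodeError')
```

Because the second call fails while the first succeeds, success depends on the data rather than on the type, which is the kind of failure the principle is meant to exclude.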
Neil From guido at python.org Wed Jan 15 19:39:27 2014 From: guido at python.org (Guido van Rossum) Date: Wed, 15 Jan 2014 10:39:27 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <20140115170415.6363d7c6@fsol> Message-ID: All sounds good. A fleeting thought about constructors: you can always add alternative constructors as class methods (like datetime does). On Wed, Jan 15, 2014 at 10:09 AM, Neil Schemenauer wrote: > Antoine Pitrou wrote: >> On Wed, 15 Jan 2014 15:47:43 +0000 (UTC) Neil S wrote: >>> >>> Objects that implement __str__ can also implement __bytes__ if they >>> can guarantee that ASCII characters are always returned, no matter >>> what the *value* >> >> I think that's a slippery slope. __bytes__ should mean that the object >> has a well-known bytes equivalent or encoding, not that its __str__ >> happens to be pure ASCII. > > After poking around some more into the Python 3 source, I agree. It > seems too late to change bytes() and bytearray(). > We should have used a keyword only argument but too late now (tp_new > is a mess). > > I can also agree that pushing the ASCII-centric behavior into the > bytes() constructor goes too far. If we limit the ASCII-centric > behavior solely to % and format(), that seems a reasonable trade-off > for usability. As others have argued, once you are using format > codes, you are pretty clearly dealing with ASCII encoding. > > I feel strongly that % and format on bytes needs to use duck-typing > and not type checking. Also, formatting falures must be due to > types and not due to values. If we can get agreement on these two > principles, that will help guide the design. > > Those principles absolutely rule out call calling encode('ascii') > automatically. I'm not deeply intimate with format() but I think it > also rules out calling __format__. 
> > Could we introduce only __bformat__ and have the % operator call it? > That would only require implementing one new special method instead > of two. > > Neil > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From v+python at g.nevcal.com Wed Jan 15 22:04:41 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 15 Jan 2014 13:04:41 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D6AEBB.8050806@trueblade.com> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D6AEBB.8050806@trueblade.com> Message-ID: <52D6F7E9.9080101@g.nevcal.com> On 1/15/2014 7:52 AM, Eric V. Smith wrote: > So basically I think we'll have to hard-code the types that .format() > will support, and never call __format__, or only call __format__ if we > know that it's a exact type where we know that __format__ will return > (strict ASCII). > > Either that, or we're back to encoding the result of __format__ and > accepting that sometimes it might throw errors, depending on the values > being passed into format(). Looks like you need to invent __formatb__ to produce only ASCII. Objects that have __formatb__ can be formatted by bytes.format. To avoid coding, it could be possible that __formatb__ might be a callable, in which case it is called to get the result, or not a callable, in which case one calls __format__ and converts the result to ASCII, __formatb__ just indicating a guarantee that only ASCII will result. Or it could be that __formatb__ replaces __format__ and str.__format__, if it finds no __format__ looks for __formatb__, calls that, and converts the result to Unicode. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nas at arctrix.com Wed Jan 15 22:24:27 2014 From: nas at arctrix.com (Neil Schemenauer) Date: Wed, 15 Jan 2014 21:24:27 +0000 (UTC) Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D6AEBB.8050806@trueblade.com> <52D6F7E9.9080101@g.nevcal.com> Message-ID: Glenn Linderman wrote: > On 1/15/2014 7:52 AM, Eric V. Smith wrote: >> Either that, or we're back to encoding the result of __format__ and >> accepting that sometimes it might throw errors, depending on the values >> being passed into format(). That would take us back to Python 2 hell. Please no. I don't like checking for types either, we should have a special method. > Looks like you need to invent __formatb__ to produce only ASCII. > Objects that have __formatb__ can be formatted by bytes.format. To > avoid coding, it could be possible that __formatb__ might be a callable > in which case it is called to get the result, or not a callable, in > which case one calls __format__ and converts the result to ASCII, > __formatb__ just indicating a guarantee that only ASCII will result. Just do: def __formatb__(self, spec): return MyClass.__format__(self, spec).encode('ascii') Note that I think it is better to explicitly use the __format__ method rather than using self.__format__. My reasoning is that a subclass might implement a __format__ that returns non-ASCII characters. We don't need a special bytes version of __str__ since the %-operator can call __formatb__ with the correct format spec. 
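A rough sketch of how the __formatb__ idea might look in practice. Everything here is hypothetical: __formatb__ is only a proposal in this thread and nothing in the interpreter calls it, Celsius and FancyCelsius are made-up classes, and format_bytes is a stand-in for whatever bytes formatting would do internally.

```python
class Celsius(object):
    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        return format(self.degrees, spec) + ' C'

    def __formatb__(self, spec):
        # Name the class explicitly so that a subclass overriding
        # __format__ with non-ASCII output cannot leak in here.
        return Celsius.__format__(self, spec).encode('ascii')


class FancyCelsius(Celsius):
    def __format__(self, spec):
        # Non-ASCII degree sign: fine for str, unusable for ASCII bytes.
        return format(self.degrees, spec) + ' \u00b0C'


def format_bytes(obj, spec=''):
    # Hypothetical dispatcher: duck-typed, and it fails on the type
    # (no __formatb__), never on the value being formatted.
    method = getattr(type(obj), '__formatb__', None)
    if method is None:
        raise TypeError('%s has no __formatb__ method' % type(obj).__name__)
    return method(obj, spec)
```

With these definitions, format_bytes(FancyCelsius(21.5), '.1f') still yields ASCII because the inherited __formatb__ bypasses the subclass's __format__, while format_bytes(3) raises TypeError regardless of the integer's value.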
Neil From ethan at stoneleaf.us Wed Jan 15 22:24:15 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Jan 2014 13:24:15 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> Message-ID: <52D6FC7F.8090804@stoneleaf.us> On 01/15/2014 06:45 AM, Brett Cannon wrote: > > I also think that a 'b' conversion be added to bytes.format(). This doesn't have the same issue as %b if you make {} > implicitly mean {!b} in Python 3.5 as {} will mean what is the most accurate for bytes.format() in either version. It > also allows for explicit support where you know you only want a byte and allows {!s} to mean you only want a string (and > thus throw an error otherwise). Given that !b does not exist in Py2, !s (like %s) has to mean bytes when working with a byte stream. Given that, !s and !b would mean the same thing, so it worth adding !b? -- ~Ethan~ From ethan at stoneleaf.us Wed Jan 15 22:32:01 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Jan 2014 13:32:01 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> Message-ID: <52D6FE51.5000401@stoneleaf.us> On 01/15/2014 06:45 AM, Brett Cannon wrote: The PEP currently says:: >> >> format >> ------ >> >> The format mini language will be used as-is, with the behaviors as listed >> for %-interpolation. > > That's too vague; % interpolation does not support other format operators in the same way as str.format() does. % > interpolation has specific code to support %d, etc. But str.format() gets supported for {:d} not from special code but > because e.g. float.__format__('d') works. So you can't say "bytes.format() supports {:d} just like %d works with string > interpolation" since the mechanisms are fundamentally different. 
A question for anyone that has extensive experience in both %-formatting and .format-formatting: Would it be possible, at least for int and float, to take whatever is in the specifier and convert to %? Example: "Weight: {wgt:-07f}".format(wgt=137.23) would take the "-07f" and basically do a "%-07f" % 137.23 to get the ASCII to use? -- ~Ethan~ From greg.ewing at canterbury.ac.nz Wed Jan 15 22:55:31 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 16 Jan 2014 10:55:31 +1300 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> Message-ID: <52D703D3.5090400@canterbury.ac.nz> Neil Schemenauer wrote: > Objects that implement __str__ can also implement __bytes__ if they > can guarantee that ASCII characters are always returned, I think __ascii_ would be a better name. I'd expect a method called __bytes__ on an int to return some version of its binary value. -- Greg From steve at pearwood.info Wed Jan 15 23:00:37 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 16 Jan 2014 09:00:37 +1100 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D703D3.5090400@canterbury.ac.nz> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D703D3.5090400@canterbury.ac.nz> Message-ID: <20140115220036.GU3869@ando> On Thu, Jan 16, 2014 at 10:55:31AM +1300, Greg Ewing wrote: > Neil Schemenauer wrote: > >Objects that implement __str__ can also implement __bytes__ if they > >can guarantee that ASCII characters are always returned, > > I think __ascii_ would be a better name. I'd expect > a method called __bytes__ on an int to return some > version of its binary value. +1 -- Steven From eric at trueblade.com Wed Jan 15 23:14:06 2014 From: eric at trueblade.com (Eric V. 
Smith) Date: Wed, 15 Jan 2014 17:14:06 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D6FE51.5000401@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D6FE51.5000401@stoneleaf.us> Message-ID: <52D7082E.3080605@trueblade.com> On 1/15/2014 4:32 PM, Ethan Furman wrote: > A question for anyone that has extensive experience in both %-formatting > and .format-formatting: Would it be possible, at least for int and > float, to take whatever is in the specifier and convert to %? Example: > > "Weight: {wgt:-07f}".format(wgt=137.23) > > would take the "-07f" and basically do a "%-07f" % 137.23 to get the > ASCII to use? I think the int.__format__ version might be a superset. Specifically, the "n" and "%" types. There may well be others. But I think we could say we're not going to support these in b"".format(). From brett at python.org Wed Jan 15 23:17:24 2014 From: brett at python.org (Brett Cannon) Date: Wed, 15 Jan 2014 17:17:24 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D6FC7F.8090804@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D6FC7F.8090804@stoneleaf.us> Message-ID: On Wed, Jan 15, 2014 at 4:24 PM, Ethan Furman wrote: > On 01/15/2014 06:45 AM, Brett Cannon wrote: > >> >> I also think that a 'b' conversion be added to bytes.format(). This >> doesn't have the same issue as %b if you make {} >> implicitly mean {!b} in Python 3.5 as {} will mean what is the most >> accurate for bytes.format() in either version. It >> also allows for explicit support where you know you only want a byte and >> allows {!s} to mean you only want a string (and >> thus throw an error otherwise). >> > > Given that !b does not exist in Py2, !s (like %s) has to mean bytes when > working with a byte stream. Given that, !s and !b would mean the same > thing, so it worth adding !b? > I disagree with the assertion. 
%s has to mean bytes for Python 2 compatibility because there is no equivalent to '{}' (no conversion or format spec specified); basically %s represents "no conversion" for the % operator. But since format() has the concept of a default conversion as well as explicit conversions you can lean on that fact and let the default conversion do what makes sense for that version of Python. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Jan 15 23:22:10 2014 From: brett at python.org (Brett Cannon) Date: Wed, 15 Jan 2014 17:22:10 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <20140115220036.GU3869@ando> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D703D3.5090400@canterbury.ac.nz> <20140115220036.GU3869@ando> Message-ID: On Wed, Jan 15, 2014 at 5:00 PM, Steven D'Aprano wrote: > On Thu, Jan 16, 2014 at 10:55:31AM +1300, Greg Ewing wrote: > > Neil Schemenauer wrote: > > >Objects that implement __str__ can also implement __bytes__ if they > > >can guarantee that ASCII characters are always returned, > > > > I think __ascii_ would be a better name. I'd expect > > a method called __bytes__ on an int to return some > > version of its binary value. > > +1 > If we are going the route of a new magic method then __ascii__ or __bytes_format__ get my vote as long as they only return bytes (I see no need to abbreviate to __bformat__ or __formatb__ when we have method names as long as __text_signature__ now). -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Wed Jan 15 23:11:36 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Jan 2014 14:11:36 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> Message-ID: <52D70798.2050308@stoneleaf.us> On 01/15/2014 08:33 AM, Mark Lawrence wrote: > > For completeness I believe %r and %a should be included here as well. Good point. Done. -- ~Ethan~ From breamoreboy at yahoo.co.uk Wed Jan 15 23:34:48 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 15 Jan 2014 22:34:48 +0000 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D703D3.5090400@canterbury.ac.nz> <20140115220036.GU3869@ando> Message-ID: On 15/01/2014 22:22, Brett Cannon wrote: > > > > On Wed, Jan 15, 2014 at 5:00 PM, Steven D'Aprano > wrote: > > On Thu, Jan 16, 2014 at 10:55:31AM +1300, Greg Ewing wrote: > > Neil Schemenauer wrote: > > >Objects that implement __str__ can also implement __bytes__ if they > > >can guarantee that ASCII characters are always returned, > > > > I think __ascii_ would be a better name. I'd expect > > a method called __bytes__ on an int to return some > > version of its binary value. > > +1 > > > If we are going the route of a new magic method then __ascii__ or > __bytes_format__ get my vote as long as they only return bytes (I see no > need to abbreviate to __bformat__ or __formatb__ when we have method > names as long as __text_signature__ now). > __bytes_format__ gets my vote as it's blatantly obvious what it does. I'm against __ascii__ as I'd automatically associate that with ascii in the same way that I associate str with __str__ and repr with __repr__. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. 
Mark Lawrence From steve at pearwood.info Wed Jan 15 23:35:48 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 16 Jan 2014 09:35:48 +1100 Subject: [Python-Dev] Common subset of python 2 and python 3 In-Reply-To: <52D69A40.3020700@trueblade.com> References: <52D67D94.3010700@v.loewis.de> <52D69A40.3020700@trueblade.com> Message-ID: <20140115223548.GV3869@ando> On Wed, Jan 15, 2014 at 09:25:04AM -0500, Eric V. Smith wrote: > On 1/15/2014 8:21 AM, Chris Angelico wrote: > > On Wed, Jan 15, 2014 at 11:22 PM, "Martin v. L?wis" wrote: > >> I don't think that it is possible to write an interpreter that is fully > >> compatible for all it accepts. Would you think that the program > >> > >> print(repr(2**80).endswith("L")) > >> > >> is in the subset that should be supported by both Python 2 and Python 3? > > > > Easiest fix for that would be to have long.__repr__ omit the L tag. > > Then it'll do the same as it would in Py3. > > I think Martin's point is not this specific thing, but that such a > subset would be useless. Would you drop dict.items() because it returns > different things in both languages? Drop range() because it's different? > There are many, many such differences. The common subset is not useful. To expand on this, the common subset is not useful, not well-defined, and it is not needed. Not well-defined because neither "Python 2" nor "Python 3" are well-defined. Most of the code I write supports Python 2.4 onwards, and there are a lot of features (including syntax!) that exist in 2.7 but not 2.4. Likewise there are features in 3.3 that aren't in 3.2. But most importantly, limiting yourself to just the common subset isn't needed to write polyglot code that works over 2.x and 3.x. For the most part, a few conditional tests will let you write code that works across multiple versions. I prefer to check for features than test the version number: try: next except NameError: # Python 2.4 or 2.5 def next(obj): return type(obj).__next__() sort of thing. 
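A runnable version of the feature check sketched above. As quoted, the fallback calls type(obj).__next__() without an argument, and on Python 2.4 and 2.5 the iterator method is spelled .next(), so this sketch wraps that method instead. The names iterator and first are illustrative.

```python
try:
    next
except NameError:
    # Python 2.4 or 2.5: there is no next() builtin, but iterators
    # provide a .next() method, so wrap that.
    def next(iterator):
        return iterator.next()

first = next(iter([10, 20, 30]))
```

On any interpreter that already has the builtin, the except branch is never taken and the builtin is used unchanged.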
Syntax changes are more difficult to deal with. I deal with the lack of ternary if operator in 2.4 by just avoiding it. The other day I really, really wanted to use a with statement, but still be compatible with 2.4. I started off messing about with exec, but eventually rejected that in favour of a conditional import: I lifted that one function using a with statement into a module of its own, then tried importing it. If it failed, I fell back to a second module which implemented the same thing using nested try...except blocks. There's a tiny bit of duplicated code, but less than a dozens lines including a docstring. Given how easy it usually is to write 2/3 compatible code, I don't think that limiting myself to a subset that works unchanged in both would be useful to me. That would be a step backward, like going back to Python 1.5 or 2.0, where the language is still recognisably the same, but it's missing so many features we take for granted that it's painful to work with. -- Steven From steve at pearwood.info Thu Jan 16 01:03:45 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 16 Jan 2014 11:03:45 +1100 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D703D3.5090400@canterbury.ac.nz> <20140115220036.GU3869@ando> Message-ID: <20140116000344.GW3869@ando> On Wed, Jan 15, 2014 at 10:34:48PM +0000, Mark Lawrence wrote: > On 15/01/2014 22:22, Brett Cannon wrote: > > > > > > > >On Wed, Jan 15, 2014 at 5:00 PM, Steven D'Aprano >> wrote: > > > > On Thu, Jan 16, 2014 at 10:55:31AM +1300, Greg Ewing wrote: > > > Neil Schemenauer wrote: > > > >Objects that implement __str__ can also implement __bytes__ if they > > > >can guarantee that ASCII characters are always returned, > > > > > > I think __ascii_ would be a better name. I'd expect > > > a method called __bytes__ on an int to return some > > > version of its binary value. 
> > > > +1 > > > > > >If we are going the route of a new magic method then __ascii__ or > >__bytes_format__ get my vote as long as they only return bytes (I see no > >need to abbreviate to __bformat__ or __formatb__ when we have method > >names as long as __text_signature__ now). > > > > __bytes_format__ gets my vote as it's blatantly obvious what it does. What precisely does it do? If it's so obvious, why is this thread so long? > I'm against __ascii__ as I'd automatically associate that with ascii in > the same way that I associate str with __str__ and repr with __repr__. That's a good point. I forgot about ascii(). -- Steven From ethan at stoneleaf.us Thu Jan 16 01:13:55 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Jan 2014 16:13:55 -0800 Subject: [Python-Dev] PEP 461 updates In-Reply-To: <52D59622.1070307@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> Message-ID: <52D72443.10002@stoneleaf.us> Current copy of PEP, many modifications from all the feedback. Thank you to everyone. I know it's been a long week (feels a lot longer!) while all this was hammered out, but I think we're getting close! ============================ Abstract ======== This PEP proposes adding the % and {} formatting operations from str to bytes [1]. Overriding Principles ===================== In order to avoid the problems of auto-conversion and value-generated exceptions, all object checking will be done via isinstance, not by values contained in a Unicode representation. In other words:: - duck-typing to allow/reject entry into a byte-stream - no value generated errors Proposed semantics for bytes formatting ======================================= %-interpolation --------------- All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.) will be supported, and will work as they do for str, including the padding, justification and other related modifiers, except locale. 
Example::

    >>> b'%4x' % 10
    b'   a'

%c will insert a single byte, either from an int in range(256), or from
a bytes argument of length 1.

Example:

    >>> b'%c' % 48
    b'0'

    >>> b'%c' % b'a'
    b'a'

%s is restricted in what it will accept::

  - input type supports Py_buffer?
    use it to collect the necessary bytes

  - input type is something else?
    use its __bytes__ method; if there isn't one, raise an exception [2]

Examples:

    >>> b'%s' % b'abc'
    b'abc'

    >>> b'%s' % 3.14
    Traceback (most recent call last):
    ...
    TypeError: 3.14 has no __bytes__ method

    >>> b'%s' % 'hello world!'
    Traceback (most recent call last):
    ...
    TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?

.. note::

   Because the str type does not have a __bytes__ method, attempts to
   directly use 'a string' as a bytes interpolation value will raise an
   exception. To use 'string' values, they must be encoded or otherwise
   transformed into a bytes sequence::

      'a string'.encode('latin-1')

format
------

The format mini language codes, where they correspond with the %-interpolation codes,
will be used as-is, with three exceptions::

  - !s is not supported, as {} can mean the default for both str and bytes, in both
    Py2 and Py3.
  - !b is supported, and new Py3k code can use it to be explicit.
  - no other __format__ method will be called.

Numeric Format Codes
--------------------

To properly handle int and float subclasses, int(), index(), and float() will be called on the
objects intended for (d, i, u), (b, o, x, X), and (e, E, f, F, g, G).

Unsupported codes
-----------------

%r (which calls __repr__), and %a (which calls ascii() on __repr__) are not supported.

!r and !a are not supported.

The n integer and float format code is not supported.


Open Questions
==============

Currently non-numeric objects go through::

  - Py_buffer
  - __bytes__
  - failure

Do we want to add a __format_bytes__ method in there?
- Guaranteed to produce only ascii (as in b'10', not b'\x0a') - Makes more sense than using __bytes__ to produce ascii output - What if an object has both __bytes__ and __format_bytes__? Do we need to support all the numeric format codes? The floating point exponential formats seem less appropriate, for example. Proposed variations =================== It was suggested to let %s accept numbers, but since numbers have their own format codes this idea was discarded. It has been suggested to use %b for bytes instead of %s. - Rejected as %b does not exist in Python 2.x %-interpolation, which is why we are using %s. It has been proposed to automatically use .encode('ascii','strict') for str arguments to %s. - Rejected as this would lead to intermittent failures. Better to have the operation always fail so the trouble-spot can be correctly fixed. It has been proposed to have %s return the ascii-encoded repr when the value is a str (b'%s' % 'abc' --> b"'abc'"). - Rejected as this would lead to hard to debug failures far from the problem site. Better to have the operation always fail so the trouble-spot can be easily fixed. Footnotes ========= .. [1] string.Template is not under consideration. .. [2] TypeError, ValueError, or UnicodeEncodeError? ====================== -- ~Ethan~ From ncoghlan at gmail.com Thu Jan 16 01:35:42 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 16 Jan 2014 10:35:42 +1000 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <87ob3dcy2b.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52D4845B.10009@canterbury.ac.nz> <52d57def.0180310a.4b08.ffffa287@mx.google.com> <87ob3dcy2b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 15 Jan 2014 20:58, "Stephen J. Turnbull" wrote: > > Aside: OK, Guido, ya got me. > > I have a separate screed recounting the reasons for my apostasy, but > that's probably not interesting any more. I'll send it to individuals > on request. 
> > > But in terms of explaining the text model, that > > separation is important enough that > > > > (1) We should be reluctant to strengthen the > > "its really just ASCII" messages. > > True. I think the right message is is "Unless you know why you > *desperately* want this, not only don't you need it, but using it is > the Python equivalent of skydiving without a parachute." > > N.B. Don't take the metaphor as an insult. I think it's become clear > that those who "desperately want this" not only use parachutes, they > pack their own. No need to worry about them. > > > (2) It *may* be worth creating a virtual > > split in the documentation. > > Please don't. All we need to tell naive users is: > > Look at the structure of the bytes. If that structure is "text", > convert to str using .decode(). Please don't use bytes. > > If that structure isn't text, you're in a specialist domain, and > it's your problem. Many structured uses of bytes use ASCII- > encoded keywords: we provide bytes methods for handling them, but > you *must* be aware that these methods *cannot* distinguish "bytes > representing text encoded as ASCII" from "any old bytes". Be > warned: They will happily -- and silently -- corrupt the latter. > Make sure you respect the higher-level structure of your data when > using them. Yes, I'm currently thinking the appropriate approach to the docs will be to remove the current "these have most of the str methods too" paragraph for binary sequences and instead create three completely explicit lists of methods: - provided, works with arbitrary data - provided, assumes the use of an ASCII compatible data format - not provided PEP 461 would add a fourth category, of being provided, but with more restricted semantics. Cheers, Nick. > > > Virtual subclass ASCIIStructuredBytes > > ==================================== > > > > One particularly common use of bytes is to represent > > the contents of a file, or of a network message. 
In > > these cases, the bytes will often represent Text > > *in a specific encoding* and that encoding will usually > > be a superset of ASCII. Rather than create and support > > an ASCIIStructuredBytes subclass, Python simply added > > support for these use cases straight to Bytes objects, > > and assumes that this support simply won't be used when > > when it does not make sense. For example, bytes literals > > This is going quite the wrong direction, I think. The only people who > should care about "Text *in a specific encoding* and that encoding > will usually be a superset of ASCII" are codec writers, and by now > writing those is a very rare task. Everybody else uses ASCII keywords > in a simple formal language. > > > *could* be used to construct a sound sample, but the > > literals will be far easier to read when they are used > > to represent (encoded) ASCII text, such as "OPEN". > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl at oddbird.net Thu Jan 16 02:17:55 2014 From: carl at oddbird.net (Carl Meyer) Date: Wed, 15 Jan 2014 18:17:55 -0700 Subject: [Python-Dev] PEP 461 updates In-Reply-To: <52D72443.10002@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> Message-ID: <52D73343.6080207@oddbird.net> Hi Ethan, I haven't chimed into this discussion, but the direction it's headed recently seems right to me. Thanks for putting together a PEP. Some comments on it: On 01/15/2014 05:13 PM, Ethan Furman wrote: > ============================ > Abstract > ======== > > This PEP proposes adding the % and {} formatting operations from str to > bytes [1]. 
I think the PEP could really use a rationale section summarizing _why_ these formatting operations are being added to bytes; namely that they are useful when working with various ASCIIish-but-not-properly-text network protocols and file formats, and in particular when porting code dealing with such formats/protocols from Python 2.

Also I think it would be useful to have a section summarizing the primary objections that have been raised, and why those objections have been overruled (presuming the PEP is accepted). For instance: the main objection, AIUI, has been that the bytes type is for pure bytes-handling with no assumptions about encoding, and thus we should not add features to it that assume ASCIIness, and that may be attractive nuisances for people writing bytes-handling code that should not assume ASCIIness but will once they use the feature. And the refutation: that the bytes type already provides some operations that assume ASCIIness, and these new formatting features are no more of an attractive nuisance than those; since the syntax of the formatting mini-languages itself assumes ASCIIness, there is not likely to be any temptation to use it with binary data for which that assumption does not hold.

Although it can be hard to arrive at accurate and agreed-on summaries of the discussion, recording such summaries in the PEP is important; it may help save our future selves and colleagues from having to revisit all these same discussions and megathreads.

> Overriding Principles
> =====================
>
> In order to avoid the problems of auto-conversion and value-generated
> exceptions, all object checking will be done via isinstance, not by
> values contained in a Unicode representation. In other words::
>
> - duck-typing to allow/reject entry into a byte-stream
> - no value generated errors

This seems self-contradictory; "isinstance" is type-checking, which is the opposite of duck-typing.
A duck-typing implementation would not use isinstance, it would call / check for the existence of a certain magic method instead. I think it might also be good to expand (very) slightly on what "the problems of auto-conversion and value-generated exceptions" are; that is, that the benefit of Python 3's model is that encoding is explicit, not implicit, making it harder to unwittingly write code that works as long as all data is ASCII, but fails as soon as someone feeds in non-ASCII text data. Not everyone who reads this PEP will be steeped in years of discussion about the relative merits of the Python 2 vs 3 models; it doesn't hurt to spell out a few assumptions. > Proposed semantics for bytes formatting > ======================================= > > %-interpolation > --------------- > > All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.) > will be supported, and will work as they do for str, including the > padding, justification and other related modifiers, except locale. > > Example:: > > >>> b'%4x' % 10 > b' a' > > %c will insert a single byte, either from an int in range(256), or from > a bytes argument of length 1. > > Example: > > >>> b'%c' % 48 > b'0' > > >>> b'%c' % b'a' > b'a' > > %s is restricted in what it will accept:: > > - input type supports Py_buffer? > use it to collect the necessary bytes > > - input type is something else? > use its __bytes__ method; if there isn't one, raise an exception [2] > > Examples: > > >>> b'%s' % b'abc' > b'abc' > > >>> b'%s' % 3.14 > Traceback (most recent call last): > ... > TypeError: 3.14 has no __bytes__ method > > >>> b'%s' % 'hello world!' > Traceback (most recent call last): > ... > TypeError: 'hello world' has no __bytes__ method, perhaps you need > to encode it? > > .. note:: > > Because the str type does not have a __bytes__ method, attempts to > directly use 'a string' as a bytes interpolation value will raise an > exception. 
To use 'string' values, they must be encoded or otherwise > transformed into a bytes sequence:: > > 'a string'.encode('latin-1') > > format > ------ > > The format mini language codes, where they correspond with the > %-interpolation codes, > will be used as-is, with three exceptions:: > > - !s is not supported, as {} can mean the default for both str and > bytes, in both > Py2 and Py3. > - !b is supported, and new Py3k code can use it to be explicit. > - no other __format__ method will be called. > > Numeric Format Codes > -------------------- > > To properly handle int and float subclasses, int(), index(), and float() > will be called on the > objects intended for (d, i, u), (b, o, x, X), and (e, E, f, F, g, G). > > Unsupported codes > ----------------- > > %r (which calls __repr__), and %a (which calls ascii() on __repr__) are > not supported. > > !r and !a are not supported. > > The n integer and float format code is not supported. > > > Open Questions > ============== > > Currently non-numeric objects go through:: > > - Py_buffer > - __bytes__ > - failure > > Do we want to add a __format_bytes__ method in there? > > - Guaranteed to produce only ascii (as in b'10', not b'\x0a') > - Makes more sense than using __bytes__ to produce ascii output > - What if an object has both __bytes__ and __format_bytes__? > > Do we need to support all the numeric format codes? The floating point > exponential formats seem less appropriate, for example. > > > Proposed variations > =================== > > It was suggested to let %s accept numbers, but since numbers have their own > format codes this idea was discarded. > > It has been suggested to use %b for bytes instead of %s. > > - Rejected as %b does not exist in Python 2.x %-interpolation, which is > why we are using %s. > > It has been proposed to automatically use .encode('ascii','strict') for str > arguments to %s. > > - Rejected as this would lead to intermittent failures. 
Better to > have the > operation always fail so the trouble-spot can be correctly fixed. > > It has been proposed to have %s return the ascii-encoded repr when the > value > is a str (b'%s' % 'abc' --> b"'abc'"). > > - Rejected as this would lead to hard to debug failures far from the > problem > site. Better to have the operation always fail so the trouble-spot > can be > easily fixed. > > > Footnotes > ========= > > .. [1] string.Template is not under consideration. > .. [2] TypeError, ValueError, or UnicodeEncodeError? TypeError seems right to me. Definitely not UnicodeEncodeError - refusal to implicitly encode is not at all the same thing as an encoding error. Carl From v+python at g.nevcal.com Thu Jan 16 02:46:07 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 15 Jan 2014 17:46:07 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <20140116000344.GW3869@ando> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D703D3.5090400@canterbury.ac.nz> <20140115220036.GU3869@ando> <20140116000344.GW3869@ando> Message-ID: <52D739DF.1020009@g.nevcal.com> On 1/15/2014 4:03 PM, Steven D'Aprano wrote: > What precisely does it do? If it's so obvious, why is this thread so > long? It produces a formatted representation of the object in bytes. For numbers, that would probably be expected to be ASCII digits and punctuation. But other items are not as obvious. bytes would probably be expected not to have a __bytes_format__, but if a subclass defined one, it might be HEX or Base64 of the base bytes. Or if the subclass is ASCII text oriented, it might be the ASCII text version of the base bytes (which would be identical to the base bytes, except for the type transformation). str would probably be expected not to have a __bytes_format__, but if a subclass defined one, it might be HEX or Base64, or it might be a specific encoding of the base str. 
Other objects might generate an ASCII __repr__, if they define the method.

It took a lot of talk to reach the conclusion, if it has been reached, that none of the solutions are general enough without defining something like __bytes_format__. And before that, a lot of talk to decide that % interpolation already had an ASCII bias.

From steve at pearwood.info Thu Jan 16 03:15:35 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 16 Jan 2014 13:15:35 +1100
Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes
In-Reply-To: <52D739DF.1020009@g.nevcal.com>
References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D703D3.5090400@canterbury.ac.nz> <20140115220036.GU3869@ando> <20140116000344.GW3869@ando> <52D739DF.1020009@g.nevcal.com>
Message-ID: <20140116021534.GX3869@ando>

On Wed, Jan 15, 2014 at 05:46:07PM -0800, Glenn Linderman wrote:
> On 1/15/2014 4:03 PM, Steven D'Aprano wrote:
> >What precisely does it do? If it's so obvious, why is this thread so
> >long?
>
> It produces a formatted representation of the object in bytes. For
> numbers, that would probably be expected to be ASCII digits and punctuation.
>
> But other items are not as obvious.

My point exactly.

--
Steven

From v+python at g.nevcal.com Thu Jan 16 03:12:04 2014
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Wed, 15 Jan 2014 18:12:04 -0800
Subject: [Python-Dev] PEP 461 updates
In-Reply-To: <52D72443.10002@stoneleaf.us>
References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us>
Message-ID: <52D73FF4.2000609@g.nevcal.com>

On 1/15/2014 4:13 PM, Ethan Furman wrote:
> - no value generated errors
...
>
> %c will insert a single byte, either from an int in range(256), or from
> a bytes argument of length 1.

what does

    x = 354
    b"%c" % x

produce?
Seems that construct produces a value dependent error in both python 2 & 3 (although it takes a much bigger value to produce the error in python 3, with str %... with bytes %, the problem will be reached at 256, just like python 2).

Is this an intended exception to the overriding principle?

From sky.kok at speaklikeaking.com Thu Jan 16 03:25:01 2014
From: sky.kok at speaklikeaking.com (Vajrasky Kok)
Date: Thu, 16 Jan 2014 10:25:01 +0800
Subject: [Python-Dev] Signature of function with default value uncapturable in Python and C
Message-ID:

Dear friends,

>>> from itertools import repeat
>>> list(repeat('a', 3))
['a', 'a', 'a']
>>> list(repeat('a', 0))
[]
>>> repeat.__doc__
'repeat(object [,times]) -> create an iterator which returns the object\nfor the specified number of times. If not specified, returns the object\nendlessly.'

If you omit the times argument:

>>> list(repeat('a'))
... unlimited of a .... sometimes it can hang your machine ....

In the C code itself, the default value of the variable handling the times argument is -1. It checks how many arguments you give to the function. So if you give -1 directly:

>>> list(repeat('a', -1))
[]

Negative value of times argument means 0.

So what is the correct signature of this function? The value is not really capturable in Python and C.

repeat(object [,times = unlimited]) ????

Can we do this in Clinic? If not, should we?

Vajrasky

From guido at python.org Thu Jan 16 03:48:51 2014
From: guido at python.org (Guido van Rossum)
Date: Wed, 15 Jan 2014 18:48:51 -0800
Subject: [Python-Dev] PEP 461 updates
In-Reply-To: <52D73FF4.2000609@g.nevcal.com>
References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73FF4.2000609@g.nevcal.com>
Message-ID:

Surprisingly, in this case the exception is just what the doctor ordered.
:-) On Wed, Jan 15, 2014 at 6:12 PM, Glenn Linderman wrote: > On 1/15/2014 4:13 PM, Ethan Furman wrote: > > - no value generated errors > > ... > > > %c will insert a single byte, either from an int in range(256), or from > a bytes argument of length 1. > > > what does > > x = 354 > b"%c" % x > > produce? Seems that construct produces a value dependent error in both > python 2 & 3 (although it takes a much bigger value to produce the error in > python 3, with str %... with bytes %, the problem with be reached at 256, > just like python 2). > > Is this an intended exception to the overriding principle? > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Thu Jan 16 04:14:23 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 15 Jan 2014 22:14:23 -0500 Subject: [Python-Dev] Signature of function with default value uncapturable in Python and C In-Reply-To: References: Message-ID: On 1/15/2014 9:25 PM, Vajrasky Kok wrote: > Dear friends, > >>>> from itertools import repeat >>>> list(repeat('a', 3)) > ['a', 'a', 'a'] >>>> list(repeat('a', 0)) > [] >>>> repeat.__doc__ > 'repeat(object [,times]) -> create an iterator which returns the > object\nfor the specified number of times. If not specified, returns > the object\nendlessly.' I think that the doc should say that a negative value is treated as 0 and that this is enough for a tracker issue after you get more feedback or gather more info. There is at least one other builtin/stdlib function that does this. > If you omit the times argument: > >>>> list(repeat('a')) > ... unlimited of a .... sometimes it can hang your machine .... > > In the C code it self, the default value of variable handling times > argument is -1. 
Is it necessary to give times a pseudo-default? What is done in other places (which are many) where a parameter is optional, with no default?

> It checks how many arguments you give to the function.
> So if you give -1 directly:
>
>>>> list(repeat('a', -1))
> []
>
> Negative value of times argument means 0.
>
> So what is the correct signature of this function? The value is not
> really capturable in Python and C.

The signature in the doc is correct: times is optional, with no default value. Instead, the function has a default behavior that does not need the value. There are other examples. The (nearly) 'equivalent' Python code in the doc fakes this with times=None, but passing None fails. I think the same issue occurs in the random module.

> repeat(object [,times = unlimited]) ????
>
> Can we do this in Clinic? If not, should we?

I should hope that Clinic (and signature objects) can handle no-default optional args, as there are several.

--
Terry Jan Reedy

From greg.ewing at canterbury.ac.nz Thu Jan 16 04:40:42 2014
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 16 Jan 2014 16:40:42 +1300
Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes
In-Reply-To: <52D6AD61.9080606@stoneleaf.us>
References: <52D59622.1070307@stoneleaf.us> <52D6AD61.9080606@stoneleaf.us>
Message-ID: <52D754BA.2080506@canterbury.ac.nz>

Ethan Furman wrote:
> Well, I'm not sure what "booted into touch" means,

It's a rugby term, referring to kicking the ball over the touch line. As a metaphor, it seems to mean making a problem go away.
-- Greg From greg.ewing at canterbury.ac.nz Thu Jan 16 04:54:06 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 16 Jan 2014 16:54:06 +1300 Subject: [Python-Dev] PEP 461 updates In-Reply-To: <52D73FF4.2000609@g.nevcal.com> References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73FF4.2000609@g.nevcal.com> Message-ID: <52D757DE.5000106@canterbury.ac.nz> Glenn Linderman wrote: > > x = 354 > b"%c" % x > > Is this an intended exception to the overriding principle? I think it's an unavoidable one, unless we want to introduce an "integer in the range 0-255" type. But that would just push the problem into another place, since b"%c" % byte(x) would then blow up on byte(x) if x were out of range. If you really want to make sure it won't crash, you can always do b"%c" % (x & 0xff) or whatever your favourite method of mangling out- of-range ints is. -- Greg From rmsr at lab.net Thu Jan 16 04:57:46 2014 From: rmsr at lab.net (Ryan Smith-Roberts) Date: Wed, 15 Jan 2014 19:57:46 -0800 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments Message-ID: One of the downsides of converting positional-only functions to Argument Clinic is that it can result in misleading docstring signatures. Example: socket.getservbyname(servicename[, protocolname]) -> socket.getservbyname(servicename, protocolname=None) The problem with the new signature is that it indicates passing None for protocolname is the same as omitting it (the other, much larger problem is that it falsely indicates keyword compatibility, but that's a separate indoor elephant). My question: Is it OK to change a longstanding function to treat None like an absent parameter, where previously it was an error? (This also entails a docs update and maybe a changelog entry) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Thu Jan 16 05:39:30 2014 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Thu, 16 Jan 2014 13:39:30 +0900 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: References: <52D4845B.10009@canterbury.ac.nz> <52d57def.0180310a.4b08.ffffa287@mx.google.com> <87ob3dcy2b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87d2jsczgd.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > Yes, I'm currently thinking the appropriate approach to the docs > will be to remove the current "these have most of the str methods > too" paragraph for binary sequences and instead create three > completely explicit lists of methods: > - provided, works with arbitrary data > - provided, assumes the use of an ASCII compatible data format I'm not sure what that means. If you mean that in the format string for .format() and %-formatting, bytes 0-127 must always have ASCII coded character semantics with bytes 128-255 unrestricted, indeed, that is the pragmatic restriction. Is there anything else? The implications of this should be made clear, though: funky Asian encodings cannot be safely used in format strings for format(), GB18030 isn't safe in %-formatting either, and the value returned by these operations should be assumed to be non-ASCII-compatible unless proven otherwise (no iterated formatting). I think you also need - provided, assumes pure ASCII-encoded text since as far as I know the only strictly ASCII-compatible binary formats are ISO 2022-compatible encodings and UTF-8, ie, text, and the characters represented with bytes in the range 128-255 are not handled by bytes versions of the case-checking and case-converting operations, and so have extremely dubious semantics unless the data is pure ASCII. This is also true of most of the is_* operations. 
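A small sketch of that last point about case handling, assuming Python 3 and a single Latin-1 encoded character as the non-ASCII input (the variable names are illustrative only):

```python
# "\u00e9" is LATIN SMALL LETTER E WITH ACUTE; in Latin-1 it is the byte 0xE9.
latin1_e_acute = "\u00e9".encode("latin-1")

# ASCII letters are case-converted by the bytes methods...
ascii_upper = b"open".upper()

# ...but bytes in the range 128-255 pass through unchanged,
# while the str method knows the character's uppercase form.
non_ascii_upper = latin1_e_acute.upper()
text_upper = "\u00e9".upper()

# The is* predicates on bytes likewise only recognise ASCII letters.
bytes_is_alpha = latin1_e_acute.isalpha()
text_is_alpha = "\u00e9".isalpha()
```

Here `ascii_upper` is `b"OPEN"`, `non_ascii_upper` is still `b"\xe9"`, and the two `isalpha` results disagree, so the bytes versions only have text semantics when the data is pure ASCII.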
Note that .center and .strip have pretty dubious semantics for arbitrary "ASCII-compatible" data:

>>> b"abc\r\n".center(15)
b'     abc\r\n     '
>>> " \xA0abc\xA0 ".strip()
'abc'
>>> b" \xA0abc\xA0 ".strip()
b'\xa0abc\xa0'

Of course the case of .center() is purely a programmer error, and I don't have a use case where it's problematic in practice. But it's sort of unpleasant.

Although I have internalized Guido's point that what's important is that there be no implicit conversions between bytes and str, I still worry that this slew of subtle semantic differences when moving str methods wholesale to bytes is a bug magnet.

I have an especially bad feeling about str-into-bytes interpolation. If people want that, they should use a type like asciistr that provides more or less firm guarantees that the content is pure ASCII.

> - not provided
> PEP 461 would add a fourth category, of being provided, but with
> more restricted semantics.

I haven't looked closely at PEP 461 yet, and I'm not sure I'm going to have time this week.

From rmsr at lab.net Thu Jan 16 05:35:36 2014
From: rmsr at lab.net (Ryan Smith-Roberts)
Date: Wed, 15 Jan 2014 20:35:36 -0800
Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments
In-Reply-To:
References:
Message-ID:

On Wed, Jan 15, 2014 at 7:57 PM, Ryan Smith-Roberts wrote:
> socket.getservbyname(servicename[, protocolname])
> ->
> socket.getservbyname(servicename, protocolname=None)
>

Here is a more complicated example, since the above does technically have an alternative fix:

sockobj.sendmsg(buffers[, ancdata[, flags[, address]]])
->
sockobj.sendmsg(buffers, ancdata=None, flags=0, address=None)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From larry at hastings.org Thu Jan 16 06:32:47 2014 From: larry at hastings.org (Larry Hastings) Date: Wed, 15 Jan 2014 21:32:47 -0800 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments In-Reply-To: References: Message-ID: <52D76EFF.6080605@hastings.org> On 01/15/2014 08:35 PM, Ryan Smith-Roberts wrote: > On Wed, Jan 15, 2014 at 7:57 PM, Ryan Smith-Roberts > wrote: > > socket.getservbyname(servicename[, protocolname]) > -> > socket.getservbyname(servicename, protocolname=None) > > > Here is a more complicated example, since the above does technically > have an alternative fix: > > sockobj.sendmsg(buffers[, ancdata[, flags[, address]]]) > -> > sockobj.sendmsg(buffers, ancdata=None, flags=0, address=None) I feel like Ryan didn't sufficiently explain the problem. I'll elaborate. For functions implemented in Python, it's always true that: * a parameter that is optional always has a default value, and * this default value is visible to Python and is a Python value. The default value of every parameter is part of the function's signature--you can see them with inspect.signature() or inspect.getfullargspec(). A corollary of this: for every function implemented in Python, you can construct a call to it where you fill in every optional value with its published default value, and this is exactly equivalent to calling it without specifying those arguments: def foo(a=any_python_value): ... sig = inspect.signature(foo) defaults = [p.default for p in sig.parameters.values()] foo(*defaults) == foo() Assuming neither foo nor "any_python_value" have side effects, those two calls to foo() will be exactly the same in every way. But! Builtin functions implemented in C commonly have optional parameters whose default value is not only opaque to Python, it's not renderable as a Python value. That default value is NULL. 
Consider the first two paragraphs of SHA1_new() in Modules/sha1module.c: static PyObject * SHA1_new(PyObject *self, PyObject *args, PyObject *kwdict) { static char *kwlist[] = {"string", NULL}; SHA1object *new; PyObject *data_obj = NULL; Py_buffer buf; if (!PyArg_ParseTupleAndKeywords(args, kwdict, "|O:new", kwlist, &data_obj)) { return NULL; } ... The "string" parameter is optional. Its value, if specified, is written to "data_obj". "data_obj" has a default value of NULL. There is no Python value you could pass in to this function that would result in "data_obj" being (redundantly) set to NULL. In Python SHA1_new is known as "_sha1.sha1". And: sig = inspect.signature(_sha1.sha1) defaults = [p.default for p in sig.parameters.values()] _sha1.sha1(*defaults) == _sha1.sha1() There's no value we could put in the signature for _sha1.sha1 that would behave exactly the same as that NULL. The ultimate goal of Argument Clinic is to provide introspection information for builtins. But we can't provide a default value to Python for the "string" parameter to _sha1.sha1() without changing the semantics of the function. We're stuck. Ryan's question, then, is "can we change a function like this so it accepts None?" My initial reaction is "no". That might be okay for _sha1.sha1(), but it'd be best to avoid. In the specific case of SHA1_new's "string" parameter, we could lie and claim that the default value is b''. Internally we could still use NULL as a default and get away with it. But this is only a happy coincidence. Many (most?) functions like this won't have a clever Python value we can trick you with. What else could we do? We could add a special value, let's call it sys.NULL, whose specific semantics are "turns into NULL when passed into builtins". This would solve the problem but it's really, really awful. The only other option I can see: don't convert SHA1_new() to use Argument Clinic, and don't provide introspection information for it either. 
Can you, gentle reader, suggest a better option? //arry/ p.s. Ryan's function signatures above suggest that he's converting code from using PyArg_ParseTuple into using PyArg_ParseTupleAndKeywords. I don't think he's *actually* doing that, and if I saw that in patches submitted to me I would ask that it be fixed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Jan 16 06:37:09 2014 From: guido at python.org (Guido van Rossum) Date: Wed, 15 Jan 2014 21:37:09 -0800 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments In-Reply-To: <52D76EFF.6080605@hastings.org> References: <52D76EFF.6080605@hastings.org> Message-ID: Well, I think these are mostly artifacts from old times, and usually passing None *should* be the same as omitting the argument. But check each case! On Wednesday, January 15, 2014, Larry Hastings wrote: > > On 01/15/2014 08:35 PM, Ryan Smith-Roberts wrote: > > On Wed, Jan 15, 2014 at 7:57 PM, Ryan Smith-Roberts > > wrote: > >> socket.getservbyname(servicename[, protocolname]) >> -> >> socket.getservbyname(servicename, protocolname=None) >> > > Here is a more complicated example, since the above does technically > have an alternative fix: > > sockobj.sendmsg(buffers[, ancdata[, flags[, address]]]) > -> > sockobj.sendmsg(buffers, ancdata=None, flags=0, address=None) > > > I feel like Ryan didn't sufficiently explain the problem. I'll elaborate. > > > For functions implemented in Python, it's always true that: > > - a parameter that is optional always has a default value, and > - this default value is visible to Python and is a Python value. > > The default value of every parameter is part of the function's > signature--you can see them with inspect.signature() or > inspect.getfullargspec(). 
> > A corollary of this: for every function implemented in Python, you can > construct a call to it where you fill in every optional value with its > published default value, and this is exactly equivalent to calling it > without specifying those arguments: > > def foo(a=any_python_value): ... > > sig = inspect.signature(foo) > defaults = [p.default for p in sig.parameters.values()] > foo(*defaults) == foo() > > Assuming neither foo nor "any_python_value" have side effects, those two > calls to foo() will be exactly the same in every way. > > > But! Builtin functions implemented in C commonly have optional parameters > whose default value is not only opaque to Python, it's not renderable as a > Python value. That default value is NULL. Consider the first two > paragraphs of SHA1_new() in Modules/sha1module.c: > > static PyObject * > SHA1_new(PyObject *self, PyObject *args, PyObject *kwdict) > { > static char *kwlist[] = {"string", NULL}; > SHA1object *new; > PyObject *data_obj = NULL; > Py_buffer buf; > > if (!PyArg_ParseTupleAndKeywords(args, kwdict, "|O:new", kwlist, > &data_obj)) { > return NULL; > } > ... > > The "string" parameter is optional. Its value, if specified, is written > to "data_obj". "data_obj" has a default value of NULL. There is no Python > value you could pass in to this function that would result in "data_obj" > being (redundantly) set to NULL. In Python SHA1_new is known as > "_sha1.sha1". And: > > sig = inspect.signature(_sha1.sha1) > defaults = [p.default for p in sig.parameters.values()] > _sha1.sha1(*defaults) == _sha1.sha1() > > There's no value we could put in the signature for _sha1.sha1 that would > behave exactly the same as that NULL. > > The ultimate goal of Argument Clinic is to provide introspection > information for builtins. But we can't provide a default value to Python > for the "string" parameter to _sha1.sha1() without changing the semantics > of the function. We're stuck. 
> > Ryan's question, then, is "can we change a function like this so it > accepts None?" My initial reaction is "no". That might be okay for > _sha1.sha1(), but it'd be best to avoid. > > In the specific case of SHA1_new's "string" parameter, we could lie and > claim that the default value is b''. Internally we could still use NULL as > a default and get away with it. But this is only a happy coincidence. > Many (most?) functions like this won't have a clever Python value we can > trick you with. > > What else could we do? We could add a special value, let's call it > sys.NULL, whose specific semantics are "turns into NULL when passed into > builtins". This would solve the problem but it's really, really awful. > > The only other option I can see: don't convert SHA1_new() to use Argument > Clinic, and don't provide introspection information for it either. > > Can you, gentle reader, suggest a better option? > > > */arry* > > p.s. Ryan's function signatures above suggest that he's converting code > from using PyArg_ParseTuple into using PyArg_ParseTupleAndKeywords. I > don't think he's *actually* doing that, and if I saw that in patches > submitted to me I would ask that it be fixed. > -- --Guido van Rossum (on iPad) -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Thu Jan 16 06:55:46 2014 From: larry at hastings.org (Larry Hastings) Date: Wed, 15 Jan 2014 21:55:46 -0800 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments In-Reply-To: References: <52D76EFF.6080605@hastings.org> Message-ID: <52D77462.8060101@hastings.org> On 01/15/2014 09:37 PM, Guido van Rossum wrote: > Well, I think these are mostly artifacts from old times, and usually > passing None *should* be the same as omitting the argument. But check > each case! Vajrasky Kok's recent posting to python-dev discusses the same problem. His example is itertools.repeat's second parameter, which is slightly nastier. 
Consider the implementation:

    static PyObject *
    repeat_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
    {
        repeatobject *ro;
        PyObject *element;
        Py_ssize_t cnt = -1;
        static char *kwargs[] = {"object", "times", NULL};

        if (!PyArg_ParseTupleAndKeywords(args, kwds, "O|n:repeat", kwargs,
                                         &element, &cnt))
            return NULL;

        if (PyTuple_Size(args) == 2 && cnt < 0)
            cnt = 0;

I draw your attention to the last two lines. And remember, Argument
Clinic doesn't provide the "args" and "kwargs" parameters to the "impl"
function. That means two things:

  * itertools.repeat deliberately makes it impossible to provide an
    argument for "times" that behaves the same as not providing the
    "times" argument, and
  * there is currently no way to implement this behavior using Argument
    Clinic. (I'd have to add a hack where impl functions also get args
    and kwargs.)

Passing in "None" here is inconvenient as it's an integer argument. -1
actually seems like a pretty sane default to mean "repeat forever", but
the author has gone to some effort to prevent this. I therefore assume
they had a very good reason. So again we seem stuck.

Are you suggesting that, when converting builtins to Argument Clinic
with unrepresentable default values, we're permitted to tweak the
defaults to something representable?

//arry/

From guido at python.org Thu Jan 16 07:05:00 2014
From: guido at python.org (Guido van Rossum)
Date: Wed, 15 Jan 2014 22:05:00 -0800
Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments
In-Reply-To: <52D77462.8060101@hastings.org>
References: <52D76EFF.6080605@hastings.org> <52D77462.8060101@hastings.org>
Message-ID:

On Wed, Jan 15, 2014 at 9:55 PM, Larry Hastings wrote:
>
> On 01/15/2014 09:37 PM, Guido van Rossum wrote:
>
> Well, I think these are mostly artifacts from old times, and usually passing
> None *should* be the same as omitting the argument. But check each case!
> > > Vajrasky Kok's recent posting to python-dev discusses the same problem. His > example is itertools.repeat's second parameter, which is slightly nastier. > Consider the implementation: > > static PyObject * > repeat_new(PyTypeObject *type, PyObject *args, PyObject *kwds) > { > repeatobject *ro; > PyObject *element; > Py_ssize_t cnt = -1; > static char *kwargs[] = {"object", "times", NULL}; > > if (!PyArg_ParseTupleAndKeywords(args, kwds, "O|n:repeat", kwargs, > &element, &cnt)) > return NULL; > > if (PyTuple_Size(args) == 2 && cnt < 0) > cnt = 0; > > > I draw your attention to the last two lines. And remember, Argument Clinic > doesn't provide the "args" and "kwargs" parameters to the "impl" function. > That means two things: > > itertools.repeat deliberately makes it impossible to provide an argument for > "times" that behaves the same as not providing the "times" argument, and > there is currently no way to implement this behavior using Argument Clinic. > (I'd have to add a hack where impl functions also get args and kwargs.) > > > Passing in "None" here is inconvenient as it's an integer argument. -1 > actually seems like a pretty sane default to mean "repeat forever", but the > author has gone to some effort to prevent this. I therefore assume they had > a very good reason. So again we seem stuck. > > Are you suggesting that, when converting builtins to Argument Clinic with > unrepresentable default values, we're permitted to tweak the defaults to > something representable? In this specific case it's clear to me that the special-casing of negative count is intentional -- presumably it emulates sequence repetition, where e.g. 'a'*-1 == ''. But not accepting None is laziness -- accepting either a number or None requires much more effort, if you need to have the number as a C integer. I'm not sure how AC could make this any easier, unless you want to special-case maxint or -maxint-1. 
In the sha1 example, however, accepting None and converting it to NULL (without a reference leak, please :-) seems fine though. -- --Guido van Rossum (python.org/~guido) From g.brandl at gmx.net Thu Jan 16 07:21:06 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 16 Jan 2014 07:21:06 +0100 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments In-Reply-To: References: Message-ID: Am 16.01.2014 05:35, schrieb Ryan Smith-Roberts: > On Wed, Jan 15, 2014 at 7:57 PM, Ryan Smith-Roberts > wrote: > > socket.getservbyname(servicename[, protocolname]) > -> > socket.getservbyname(servicename, protocolname=None) > > > Here is a more complicated example, since the above does technically have an > alternative fix: > > sockobj.sendmsg(buffers[, ancdata[, flags[, address]]]) > -> > sockobj.sendmsg(buffers, ancdata=None, flags=0, address=None) As far as I understand you should convert these with the "optional group" syntax (i.e. brackets). Georg From larry at hastings.org Thu Jan 16 07:28:57 2014 From: larry at hastings.org (Larry Hastings) Date: Wed, 15 Jan 2014 22:28:57 -0800 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments In-Reply-To: References: Message-ID: <52D77C29.4060101@hastings.org> On 01/15/2014 10:21 PM, Georg Brandl wrote: > Am 16.01.2014 05:35, schrieb Ryan Smith-Roberts: >> On Wed, Jan 15, 2014 at 7:57 PM, Ryan Smith-Roberts > > wrote: >> >> socket.getservbyname(servicename[, protocolname]) >> -> >> socket.getservbyname(servicename, protocolname=None) >> >> >> Here is a more complicated example, since the above does technically have an >> alternative fix: >> >> sockobj.sendmsg(buffers[, ancdata[, flags[, address]]]) >> -> >> sockobj.sendmsg(buffers, ancdata=None, flags=0, address=None) > As far as I understand you should convert these with the "optional group" syntax > (i.e. brackets). That's correct. Functions that use PyArg_ParseTuple should continue to use PyArg_ParseTuple. 
Ryan: add a "/" on a line by itself after the last parameter of each of these functions, and that will let you use the [ ] syntax too. Please see the Argument Clinic howto for more. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Thu Jan 16 06:52:08 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Jan 2014 21:52:08 -0800 Subject: [Python-Dev] PEP 461 updates In-Reply-To: <52D73343.6080207@oddbird.net> References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> Message-ID: <52D77388.4010607@stoneleaf.us> On 01/15/2014 05:17 PM, Carl Meyer wrote: > > I think the PEP could really use a rationale section It will have one before it's done. > Also I think it would be useful to have a section summarizing the > primary objections that have been raised, and why those objections have > been overruled Excellent point. That section will also be present. >> In order to avoid the problems of auto-conversion and value-generated >> exceptions, >> all object checking will be done via isinstance, not by values contained >> in a >> Unicode representation. In other words:: >> >> - duck-typing to allow/reject entry into a byte-stream >> - no value generated errors > > This seems self-contradictory; "isinstance" is type-checking, which is > the opposite of duck-typing. Good point, I'll reword that. It will be duck-typing. > I think it might also be good to expand (very) slightly on what "the > problems of auto-conversion and value-generated exceptions" are Will do. >> .. [2] TypeError, ValueError, or UnicodeEncodeError? > > TypeError seems right to me. Definitely not UnicodeEncodeError - refusal > to implicitly encode is not at all the same thing as an encoding error. That's the direction I'm leaning, too. Thanks for your comments! 
-- ~Ethan~ From ethan at stoneleaf.us Thu Jan 16 08:34:08 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Jan 2014 23:34:08 -0800 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <87r489d94g.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52D3333B.9000404@stoneleaf.us> <52D340F0.5080401@stoneleaf.us> <52D4547C.2070506@canterbury.ac.nz> <878uuievm5.fsf@uwakimon.sk.tsukuba.ac.jp> <87r489d94g.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D78B70.8050409@stoneleaf.us> On 01/14/2014 10:58 PM, Stephen J. Turnbull wrote: > > At the very least, the "iterated interpolation is a bad idea" > misfeature needs to be documented. I'm not sure it needs any extra attention. Even with str, iterated interpolation is tricky -- I've been bitten by it more than once, and that even when I controlled the source! :/ -- ~Ethan~ From ethan at stoneleaf.us Thu Jan 16 08:51:38 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Jan 2014 23:51:38 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> Message-ID: <52D78F8A.7060003@stoneleaf.us> On 01/15/2014 06:45 AM, Brett Cannon wrote: > > This is why I have argued that if you specify it as "if there is a format spec specified, then the return value from > calling __format__() will have str.decode('ascii', 'strict') called on it" you get the support for the various > number-specific format specs for free. It may work like this under the hood, but it's an implementation detail. Since the numeric format codes will call int, index, or float on the object (to handle subclasses), we could then call __format__ on the resulting int or float to do the heavy lifting; but since __format__ on anything else would never be called I don't want to give that impression. > It also means if you pass in a string that you just want the strict ASCII bytes > of then you can get it with {:s}. This isn't going to happen. 
If the user wants a string to be in the byte stream, it has to either be
a bytes literal or explicitly encoded [1].

--
~Ethan~

[1] Apologies if this has already been answered. I wanted to make sure I
responded to all the ideas/objections, and I may have responded more
than once to some. It's been a long few threads. ;)

From ethan at stoneleaf.us Thu Jan 16 08:52:20 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 15 Jan 2014 23:52:20 -0800
Subject: [Python-Dev] PEP 461 updates
In-Reply-To: <52D73FF4.2000609@g.nevcal.com>
References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73FF4.2000609@g.nevcal.com>
Message-ID: <52D78FB4.90503@stoneleaf.us>

On 01/15/2014 06:12 PM, Glenn Linderman wrote:
> On 1/15/2014 4:13 PM, Ethan Furman wrote:
>>
>> - no value generated errors
> ...
>>
>> %c will insert a single byte, either from an int in range(256), or from
>> a bytes argument of length 1.
>
> what does
>
> x = 354
> b"%c" % x
>
> produce? Seems that construct produces a value dependent error in both
> python 2 & 3 (although it takes a much bigger value to produce the error
> in python 3, with str %... with bytes %, the problem will be reached at
> 256, just like python 2).
>
> Is this an intended exception to the overriding principle?

Hmm, thanks for spotting that. Yes, that would be a value error if
anything over 255 is used, both currently in Py2, and for bytes in Py3.
As Carl suggested, a little more explanation is needed in the PEP.
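For readers following along on a newer interpreter: the sketch below assumes Python 3.5 or later, where %-interpolation for bytes is available. The helper `try_format` is made up for this example; it returns either the formatted bytes or the exception that was raised, so the out-of-range %c case Glenn asks about can be seen next to the cases from the draft.

```python
def try_format(fmt, value):
    # Hypothetical helper: apply bytes %-formatting and return either the
    # result or the exception instance, so all cases can be listed together.
    try:
        return fmt % value
    except Exception as exc:
        return exc


cases = [
    (b"%4x", 10),         # numeric code with padding
    (b"%c", 48),          # int in range(256)
    (b"%c", b"a"),        # bytes of length 1
    (b"%c", 354),         # int outside range(256): the error depends on the value
    (b"%s", b"abc"),      # bytes-like object
    (b"%s", "a string"),  # str is rejected; it has to be encoded first
]

results = [try_format(fmt, value) for fmt, value in cases]
for (fmt, value), result in zip(cases, results):
    print(fmt, repr(value), "->", repr(result))
```

The %c case is the one value-dependent failure in the list: whether it raises depends on the number passed, not on its type.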
-- ~Ethan~ From cs at zip.com.au Thu Jan 16 09:10:42 2014 From: cs at zip.com.au (Cameron Simpson) Date: Thu, 16 Jan 2014 19:10:42 +1100 Subject: [Python-Dev] PEP 460 reboot In-Reply-To: <20140114202303.504d0b2b@fsol> References: <20140114202303.504d0b2b@fsol> Message-ID: <20140116081042.GA98052@cskk.homeip.net> On 14Jan2014 20:23, Antoine Pitrou wrote: > On Tue, 14 Jan 2014 10:52:05 -0800 > Guido van Rossum wrote: > > Quite a few people have spoken out in favor of loud > > failures rather than silent "wrong" output. But I think that in the > > specific context of formatting output, there is a long and IMO good > > tradition of producing (slightly) wrong output in favor of more > > strict behavior. Consider for example what to do when a number > > doesn't fit in the given width. Would you rather raise an exception, > > truncate the > > value, or mess up the formatting? All languages newer than Fortran > > that I've used have chosen the latter, and I still agree it's a good > > idea. > > Well that's useful when printing out human-readable stuff on stdout, > much less when you're emitting binary data that's supposed to conform > to a well-defined protocol. I expect bytes formatting to be used for > the latter, not the former. I'm 12 hours behind in this thread still, but I'm with Antoine here. With protocols, there's a long and IMO good tradition in the RFC world of being generous in what you accept and conservative in what you send, and writing bytes data constitutes "send" to my mind. While having numbers overflow their widths is (only) often ok for human reports, even that is a PITA for machine parsing later. By way of a text example, my personal bugbear is the UNIX "ps" command in its many flavours. It has fixed width columns with fields that frequently overflow these days, and the overflowing numbers abut each other. Post processing this rubbish is a disaster (I don't want to write "ps", but I have written things that want to read its output). 
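The failure mode with fixed-width columns is easy to reproduce. In this sketch the values and the two format strings are chosen purely for illustration; nothing here is taken from a real ps implementation.

```python
# Hypothetical column values; the first is wider than its 6-character field.
row = (1234567, 89, 3)

# Fields packed with no separator: an overflowing value runs straight into
# its neighbour.
packed = "%-6d%-6d%-6d" % row

# The same data with an explicit space between fields, for contrast.
spaced = "%-5d %-5d %-5d" % row

print(repr(packed), packed.split())
print(repr(spaced), spaced.split())
```

Splitting the packed line on whitespace yields two fields instead of three, and nothing in the output signals that anything went wrong.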
Of course the fix is easy in some ways, use format strings saying "%-5d %-5d %-5d" instead of "%-6d%-6d%-6d". But the authors of ps didn't. And quietly overflowing these fields is exactly what breaks my post processing programs. Morally, this is the same as mojibake. Therefore I am firmly in the "fail loudly" camp: if the format string doesn't behave as you naively expected it to, find out early while you can easily fix it. Cheers, -- Cameron Simpson Motorcycles are like peanuts... who can stop at just one? - Zebee Johnstone aus.motorcycles Poser Permit #1 From larry at hastings.org Wed Jan 15 17:16:06 2014 From: larry at hastings.org (Larry Hastings) Date: Wed, 15 Jan 2014 08:16:06 -0800 Subject: [Python-Dev] Argument Clinic Derby update Message-ID: <52D6B446.1090202@hastings.org> The Derby is moving forward, and I have maybe a half-dozen contributors so far. They've made a small dent in making the conversion but I'd have to say it's going slowly. We could use more people contributing patches! To the contributors with patches that are stalled in the tracker: Sorry, but there are only so many hours in the day. I really am spending all day on this, every day, but I've also been adding new features in response to user requests and that's eaten a lot of my time. I'm getting to the patches in the order they arrived in my mailbox. Please sit tight, I'll get to yours, and I really do appreciate your contribution. If you have more time to contribute, please consider writing more patches. It really will help! And I guess I could also use some volunteers to review patches, too! //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
From storchaka at gmail.com Thu Jan 16 09:25:20 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 16 Jan 2014 10:25:20 +0200
Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments
In-Reply-To: <52D77462.8060101@hastings.org>
References: <52D76EFF.6080605@hastings.org> <52D77462.8060101@hastings.org>
Message-ID:

16.01.14 07:55, Larry Hastings wrote:
> * itertools.repeat deliberately makes it impossible to provide an
>   argument for "times" that behaves the same as not providing the
>   "times" argument, and
> * there is currently no way to implement this behavior using Argument
>   Clinic. (I'd have to add a hack where impl functions also get args
>   and kwargs.)

/*[clinic input]
itertools.repeat

    object: object
    [
    times: int
    ]
[clinic start generated code]*/

> Are you suggesting that, when converting builtins to Argument Clinic
> with unrepresentable default values, we're permitted to tweak the
> defaults to something representable?

I think we need some standard placeholder for an unrepresentable default
value. Maybe "..." or "?"?

From storchaka at gmail.com Thu Jan 16 09:31:17 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 16 Jan 2014 10:31:17 +0200
Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments
In-Reply-To:
References: <52D76EFF.6080605@hastings.org> <52D77462.8060101@hastings.org>
Message-ID:

16.01.14 08:05, Guido van Rossum wrote:
> In this specific case it's clear to me that the special-casing of
> negative count is intentional -- presumably it emulates sequence
> repetition, where e.g. 'a'*-1 == ''.

In this specific case it's contrary to sequence repetition. Because
repeat('a', -1) repeats 'a' forever. This is a point of Vajrasky's
issue [1].

> But not accepting None is laziness -- accepting either a number or
> None requires much more effort, if you need to have the number as a C
> integer.
> I'm not sure how AC could make this any easier, unless you
> want to special-case maxint or -maxint-1.

getattr(foo, 'bar', None) is not the same as getattr(foo, 'bar'). So
None can't be used as a universal default value.

[1] http://bugs.python.org/issue19145

From tjreedy at udel.edu Thu Jan 16 10:42:43 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 16 Jan 2014 04:42:43 -0500
Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments
In-Reply-To:
References: <52D76EFF.6080605@hastings.org> <52D77462.8060101@hastings.org>
Message-ID:

On 1/16/2014 3:31 AM, Serhiy Storchaka wrote:
> 16.01.14 08:05, Guido van Rossum wrote:
>> In this specific case it's clear to me that the special-casing of
>> negative count is intentional -- presumably it emulates sequence
>> repetition, where e.g. 'a'*-1 == ''.
>
> In this specific case it's contrary to sequence repetition. Because
> repeat('a', -1) repeats 'a' forever.

'Forever' only when the keyword is used and the value is -1. In 3.4b2

    >>> itertools.repeat('a', -1)
    repeat('a', 0)
    >>> itertools.repeat('a', times=-1)
    repeat('a')
    >>> itertools.repeat('a', times=-2)
    repeat('a', -2)

> This is a point of Vajrasky's issue [1].

The first line is correct in both behavior and representation. The
second line behavior (and corresponding repr) are wrong. The third line
repr is wrong but the behavior is like the first.
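The three reprs above are consistent with the argument handling in the repeat_new excerpt Larry quoted earlier in the thread. Below is a rough pure-Python model of just that parsing step; `parse_repeat_args` is a made-up name, and the model deliberately says nothing about how the resulting count is used during iteration.

```python
def parse_repeat_args(*args, **kwds):
    # Hypothetical model of the quoted C code, not the real implementation.
    def bind(object, times=-1):
        # Mirrors the "O|n:repeat" format with keyword names
        # "object" and "times"; -1 is the initial value of cnt.
        return object, times

    element, cnt = bind(*args, **kwds)

    # Mirrors: if (PyTuple_Size(args) == 2 && cnt < 0) cnt = 0;
    # Only positional arguments are counted, so a negative "times"
    # passed by keyword is left untouched.
    if len(args) == 2 and cnt < 0:
        cnt = 0
    return element, cnt


print(parse_repeat_args("a", -1))        # times passed positionally
print(parse_repeat_args("a", times=-1))  # times passed by keyword
print(parse_repeat_args("a", times=-2))
print(parse_repeat_args("a"))            # times omitted
```

In this model the keyword call with -1 ends up with the same count as omitting the argument, which matches the bare repeat('a') repr shown for the second line.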
> [1] http://bugs.python.org/issue19145 -- Terry Jan Reedy From ncoghlan at gmail.com Thu Jan 16 10:56:41 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 16 Jan 2014 19:56:41 +1000 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D78F8A.7060003@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> Message-ID: On 16 Jan 2014 17:53, "Ethan Furman" wrote: > > On 01/15/2014 06:45 AM, Brett Cannon wrote: >> >> >> This is why I have argued that if you specify it as "if there is a format spec specified, then the return value from >> calling __format__() will have str.decode('ascii', 'strict') called on it" you get the support for the various >> number-specific format specs for free. > > > It may work like this under the hood, but it's an implementation detail. Since the numeric format codes will call int, index, or float on the object (to handle subclasses), we could then call __format__ on the resulting int or float to do the heavy lifting; but since __format__ on anything else would never be called I don't want to give that impression. I have a different proposal: let's *just* add mod formatting to bytes, and leave the extensible formatting system as a text only operation. We don't really care if bytes supports that method for version compatibility purposes, and the deliberate flexibility of the design makes it hard to translate into the binary domain. So let's just not provide that - let's accept that, for the binary domain, printf style formatting is just a better fit for the job :) Cheers, Nick. > > >> It also means if you pass in a string that you just want the strict ASCII bytes >> of then you can get it with {:s}. > > > This isn't going to happen. If the user wants a string to be in the byte stream, it has to either be a bytes literal or explicitly encoded [1]. > > -- > ~Ethan~ > > [1] Apologies if this has already been answered. 
I wanted to make sure I responded to all the ideas/objects, and I may have responded more than once to some. It's been a long few threads. ;) > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Thu Jan 16 11:02:37 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 16 Jan 2014 23:02:37 +1300 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> Message-ID: <52D7AE3D.4080807@canterbury.ac.nz> Nick Coghlan wrote: > > I have a different proposal: let's *just* add mod formatting to bytes, > and leave the extensible formatting system as a text only operation. +1 -- Greg From ncoghlan at gmail.com Thu Jan 16 11:11:04 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 16 Jan 2014 20:11:04 +1000 Subject: [Python-Dev] PEP 461 updates In-Reply-To: <52D73343.6080207@oddbird.net> References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> Message-ID: On 16 Jan 2014 11:45, "Carl Meyer" wrote: > > Hi Ethan, > > I haven't chimed into this discussion, but the direction it's headed > recently seems right to me. Thanks for putting together a PEP. Some > comments on it: > > On 01/15/2014 05:13 PM, Ethan Furman wrote: > > ============================ > > Abstract > > ======== > > > > This PEP proposes adding the % and {} formatting operations from str to > > bytes [1]. 
> > I think the PEP could really use a rationale section summarizing _why_ > these formatting operations are being added to bytes; namely that they > are useful when working with various ASCIIish-but-not-properly-text > network protocols and file formats, and in particular when porting code > dealing with such formats/protocols from Python 2. > > Also I think it would be useful to have a section summarizing the > primary objections that have been raised, and why those objections have > been overruled (presuming the PEP is accepted). For instance: the main > objection, AIUI, has been that the bytes type is for pure bytes-handling > with no assumptions about encoding, and thus we should not add features > to it that assume ASCIIness, and that may be attractive nuisances for > people writing bytes-handling code that should not assume ASCIIness but > will once they use the feature. Close, but not quite - the concern was that this was a feature that didn't *inherently* imply a restriction to ASCII compatible data, but only did so when the numeric formatting codes were used. This made it a source of value dependent compatibility errors based on the format string, akin to the kind of value dependent errors seen when implicitly encoding arbitrary text as ASCII. Guido's successful counter was to point out that the parsing of the format string itself assumes ASCII compatible data, thus placing at least the mod-formatting operation in the same category as the currently existing valid-for-sufficiently-ASCII-compatible-data only operations. Current discussions suggest to me that the argument against implicit encoding operations that introduce latent data driven defects may still apply to bytes.format though, so I've reverted to being -1 on that. Cheers, Nick. 
>And the refutation: that the bytes type > already provides some operations that assume ASCIIness, and these new > formatting features are no more of an attractive nuisance than those; > since the syntax of the formatting mini-languages themselves itself > assumes ASCIIness, there is not likely to be any temptation to use it > with binary data that cannot. > > Although it can be hard to arrive at accurate and agreed-on summaries of > the discussion, recording such summaries in the PEP is important; it may > help save our future selves and colleagues from having to revisit all > these same discussions and megathreads. > > > Overriding Principles > > ===================== > > > > In order to avoid the problems of auto-conversion and value-generated > > exceptions, > > all object checking will be done via isinstance, not by values contained > > in a > > Unicode representation. In other words:: > > > > - duck-typing to allow/reject entry into a byte-stream > > - no value generated errors > > This seems self-contradictory; "isinstance" is type-checking, which is > the opposite of duck-typing. A duck-typing implementation would not use > isinstance, it would call / check for the existence of a certain magic > method instead. > > I think it might also be good to expand (very) slightly on what "the > problems of auto-conversion and value-generated exceptions" are; that > is, that the benefit of Python 3's model is that encoding is explicit, > not implicit, making it harder to unwittingly write code that works as > long as all data is ASCII, but fails as soon as someone feeds in > non-ASCII text data. > > Not everyone who reads this PEP will be steeped in years of discussion > about the relative merits of the Python 2 vs 3 models; it doesn't hurt > to spell out a few assumptions. 
> > > > Proposed semantics for bytes formatting > > ======================================= > > > > %-interpolation > > --------------- > > > > All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.) > > will be supported, and will work as they do for str, including the > > padding, justification and other related modifiers, except locale. > > > > Example:: > > > > >>> b'%4x' % 10 > > b' a' > > > > %c will insert a single byte, either from an int in range(256), or from > > a bytes argument of length 1. > > > > Example: > > > > >>> b'%c' % 48 > > b'0' > > > > >>> b'%c' % b'a' > > b'a' > > > > %s is restricted in what it will accept:: > > > > - input type supports Py_buffer? > > use it to collect the necessary bytes > > > > - input type is something else? > > use its __bytes__ method; if there isn't one, raise an exception [2] > > > > Examples: > > > > >>> b'%s' % b'abc' > > b'abc' > > > > >>> b'%s' % 3.14 > > Traceback (most recent call last): > > ... > > TypeError: 3.14 has no __bytes__ method > > > > >>> b'%s' % 'hello world!' > > Traceback (most recent call last): > > ... > > TypeError: 'hello world' has no __bytes__ method, perhaps you need > > to encode it? > > > > .. note:: > > > > Because the str type does not have a __bytes__ method, attempts to > > directly use 'a string' as a bytes interpolation value will raise an > > exception. To use 'string' values, they must be encoded or otherwise > > transformed into a bytes sequence:: > > > > 'a string'.encode('latin-1') > > > > format > > ------ > > > > The format mini language codes, where they correspond with the > > %-interpolation codes, > > will be used as-is, with three exceptions:: > > > > - !s is not supported, as {} can mean the default for both str and > > bytes, in both > > Py2 and Py3. > > - !b is supported, and new Py3k code can use it to be explicit. > > - no other __format__ method will be called. 
> > > > Numeric Format Codes > > -------------------- > > > > To properly handle int and float subclasses, int(), index(), and float() > > will be called on the > > objects intended for (d, i, u), (b, o, x, X), and (e, E, f, F, g, G). > > > > Unsupported codes > > ----------------- > > > > %r (which calls __repr__), and %a (which calls ascii() on __repr__) are > > not supported. > > > > !r and !a are not supported. > > > > The n integer and float format code is not supported. > > > > > > Open Questions > > ============== > > > > Currently non-numeric objects go through:: > > > > - Py_buffer > > - __bytes__ > > - failure > > > > Do we want to add a __format_bytes__ method in there? > > > > - Guaranteed to produce only ascii (as in b'10', not b'\x0a') > > - Makes more sense than using __bytes__ to produce ascii output > > - What if an object has both __bytes__ and __format_bytes__? > > > > Do we need to support all the numeric format codes? The floating point > > exponential formats seem less appropriate, for example. > > > > > > Proposed variations > > =================== > > > > It was suggested to let %s accept numbers, but since numbers have their own > > format codes this idea was discarded. > > > > It has been suggested to use %b for bytes instead of %s. > > > > - Rejected as %b does not exist in Python 2.x %-interpolation, which is > > why we are using %s. > > > > It has been proposed to automatically use .encode('ascii','strict') for str > > arguments to %s. > > > > - Rejected as this would lead to intermittent failures. Better to > > have the > > operation always fail so the trouble-spot can be correctly fixed. > > > > It has been proposed to have %s return the ascii-encoded repr when the > > value > > is a str (b'%s' % 'abc' --> b"'abc'"). > > > > - Rejected as this would lead to hard to debug failures far from the > > problem > > site. Better to have the operation always fail so the trouble-spot > > can be > > easily fixed. 
> > > > > > Footnotes > > ========= > > > > .. [1] string.Template is not under consideration. > > .. [2] TypeError, ValueError, or UnicodeEncodeError? > > TypeError seems right to me. Definitely not UnicodeEncodeError - refusal > to implicitly encode is not at all the same thing as an encoding error. > > Carl > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Jan 16 12:38:05 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 16 Jan 2014 12:38:05 +0100 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments References: <52D76EFF.6080605@hastings.org> <52D77462.8060101@hastings.org> Message-ID: <20140116123805.7cda7cb2@fsol> On Wed, 15 Jan 2014 21:55:46 -0800 Larry Hastings wrote: > > Passing in "None" here is inconvenient as it's an integer argument. Inconvenient for whom? The callee or the caller? Regards Antoine. From solipsis at pitrou.net Thu Jan 16 12:39:49 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 16 Jan 2014 12:39:49 +0100 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments References: <52D76EFF.6080605@hastings.org> <52D77462.8060101@hastings.org> Message-ID: <20140116123949.444eac0c@fsol> On Thu, 16 Jan 2014 04:42:43 -0500 Terry Reedy wrote: > On 1/16/2014 3:31 AM, Serhiy Storchaka wrote: > > 16.01.14 08:05, Guido van Rossum ???????(??): > >> In this specific case it's clear to me that the special-casing of > >> negative count is intentional -- presumably it emulates sequence > >> repetition, where e.g. 'a'*-1 == ''. > > > > In this specific case it's contrary to sequence repetition. Because > > repeat('a', -1) repeats 'a' forever. 
> > 'Forever' only when the keyword is used and the value is -1. > In 3.4b2 > > >>> itertools.repeat('a', -1) > repeat('a', 0) > >>> itertools.repeat('a', times=-1) > repeat('a') > >>> itertools.repeat('a', times=-2) > repeat('a', -2) Looks like a horrible bug to me. Passing an argument by position should mean the same as passing it by keyword! Regards Antoine. From solipsis at pitrou.net Thu Jan 16 12:40:56 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 16 Jan 2014 12:40:56 +0100 Subject: [Python-Dev] cpython: asyncio: Fix CoroWrapper (fix my previous commit) References: <3f4Rlz1w3bzQPC@mail.python.org> Message-ID: <20140116124056.48fe0422@fsol> On Thu, 16 Jan 2014 01:55:43 +0100 (CET) victor.stinner wrote: > http://hg.python.org/cpython/rev/f07161c4f3aa > changeset: 88494:f07161c4f3aa > user: Victor Stinner > date: Thu Jan 16 01:55:29 2014 +0100 > summary: > asyncio: Fix CoroWrapper (fix my previous commit) > > Add __name__ and __doc__ to __slots__ > > files: > Lib/asyncio/tasks.py | 4 +--- > 1 files changed, 1 insertions(+), 3 deletions(-) > > > diff --git a/Lib/asyncio/tasks.py b/Lib/asyncio/tasks.py > --- a/Lib/asyncio/tasks.py > +++ b/Lib/asyncio/tasks.py > @@ -32,9 +32,7 @@ > > > class CoroWrapper: > - """Wrapper for coroutine in _DEBUG mode.""" > - > - __slots__ = ['gen', 'func'] > + __slots__ = ['gen', 'func', '__name__', '__doc__'] > Why did you remove the docstring? Regards Antoine. 
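The positional-versus-keyword question in the itertools.repeat exchange above can be checked directly. This is a minimal sketch, not part of the original thread. It assumes a CPython release newer than the 3.4b2 build quoted above, where negative counts are (to my knowledge) treated as zero however they are passed. The helper name `take` is made up for illustration.

```python
import itertools

def take(iterable, limit=5):
    """Return at most `limit` items, so an endless iterator cannot hang the check."""
    return list(itertools.islice(iterable, limit))

# Same negative count, passed once by position and once by keyword.
positional = take(itertools.repeat('a', -1))
keyword = take(itertools.repeat('a', times=-1))

# Sequence repetition, for comparison: a negative count gives an empty result.
empty_repetition = 'a' * -1

print(positional, keyword, repr(empty_repetition))
```

If the two lists differ on a given interpreter, that interpreter still has the inconsistency described in the thread.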
From g.brandl at gmx.net Thu Jan 16 13:15:03 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 16 Jan 2014 13:15:03 +0100 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments In-Reply-To: <20140116123949.444eac0c@fsol> References: <52D76EFF.6080605@hastings.org> <52D77462.8060101@hastings.org> <20140116123949.444eac0c@fsol> Message-ID: Am 16.01.2014 12:39, schrieb Antoine Pitrou: > On Thu, 16 Jan 2014 04:42:43 -0500 > Terry Reedy wrote: > >> On 1/16/2014 3:31 AM, Serhiy Storchaka wrote: >> > 16.01.14 08:05, Guido van Rossum ???????(??): >> >> In this specific case it's clear to me that the special-casing of >> >> negative count is intentional -- presumably it emulates sequence >> >> repetition, where e.g. 'a'*-1 == ''. >> > >> > In this specific case it's contrary to sequence repetition. Because >> > repeat('a', -1) repeats 'a' forever. >> >> 'Forever' only when the keyword is used and the value is -1. >> In 3.4b2 >> >> >>> itertools.repeat('a', -1) >> repeat('a', 0) >> >>> itertools.repeat('a', times=-1) >> repeat('a') >> >>> itertools.repeat('a', times=-2) >> repeat('a', -2) > > Looks like a horrible bug to me. Passing an argument by position should > mean the same as passing it by keyword! Indeed, that should be fixed regardless of AC. Georg From python at mrabarnett.plus.com Thu Jan 16 13:21:19 2014 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 16 Jan 2014 12:21:19 +0000 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments In-Reply-To: <52D76EFF.6080605@hastings.org> References: <52D76EFF.6080605@hastings.org> Message-ID: <52D7CEBF.7020309@mrabarnett.plus.com> On 2014-01-16 05:32, Larry Hastings wrote: [snip] > In the specific case of SHA1_new's "string" parameter, we could lie and > claim that the default value is b''. Internally we could still use NULL > as a default and get away with it. But this is only a happy > coincidence. Many (most?) 
functions like this won't have a clever > Python value we can trick you with. > > What else could we do? We could add a special value, let's call it > sys.NULL, whose specific semantics are "turns into NULL when passed into > builtins". This would solve the problem but it's really, really awful. > Would it be better if it were called "__null__"? > The only other option I can see: don't convert SHA1_new() to use > Argument Clinic, and don't provide introspection information for it either. > > Can you, gentle reader, suggest a better option? > > > //arry/ > > p.s. Ryan's function signatures above suggest that he's converting code > from using PyArg_ParseTuple into using PyArg_ParseTupleAndKeywords. I > don't think he's *actually* doing that, and if I saw that in patches > submitted to me I would ask that it be fixed. > From markus at unterwaditzer.net Thu Jan 16 13:52:18 2014 From: markus at unterwaditzer.net (Markus Unterwaditzer) Date: Thu, 16 Jan 2014 13:52:18 +0100 Subject: [Python-Dev] Common subset of python 2 and python 3 Message-ID: <20140116125218.GB1652@untibox.unti> On Wed, Jan 15, 2014 at 01:22:44PM +0100, "Martin v. L?wis" wrote: > Am 12.01.14 18:39, schrieb Nachshon David Armon: > >>> I propose that this new version of python use the python 3 unicode model. > >>> As the version of python will be fully compatible with both python 2 and > >>> with python 3 but NOT necsesarily with all existing code in either. It is > >>> designed as a porting tool only. > > I don't think that it is possible to write an interpreter that is fully > compatible for all it accepts. Would you think that the program > > print(repr(2**80).endswith("L")) > > is in the subset that should be supported by both Python 2 and Python 3? IMO Python 2 and 3 do have this part in common when you talk about valid syntax and available methods and functions, but not in terms of behavior. I think a new proposed Python version should simply crash on your example. 
I'm kind-of playing devil's advocate here because i agree with previous posters that such a Python version is unneccessary with tox and "python2 -3" > > Notice that it prints "True" in Python 2 and "False" in Python 3. So if > this common-version interpreter *rejects* the above program, which > operation (**, repr, endswith) would you want to ban from subset? Warnings about using certain string methods on repr() might be a neat thing to add to "python -3" or static analysis tools. > > Regards, > Martin > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/markus%40unterwaditzer.net From brett at python.org Thu Jan 16 15:45:14 2014 From: brett at python.org (Brett Cannon) Date: Thu, 16 Jan 2014 09:45:14 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D78F8A.7060003@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> Message-ID: On Thu, Jan 16, 2014 at 2:51 AM, Ethan Furman wrote: > On 01/15/2014 06:45 AM, Brett Cannon wrote: > >> >> This is why I have argued that if you specify it as "if there is a format >> spec specified, then the return value from >> calling __format__() will have str.decode('ascii', 'strict') called on >> it" you get the support for the various >> number-specific format specs for free. >> > > It may work like this under the hood, but it's an implementation detail. I'm arguing it's not an implementation detail but a definition of how bytes.format() would work. > Since the numeric format codes will call int, index, or float on the > object (to handle subclasses), But that's **only** because the numeric types choose to as part of their __format__() implementation; it is not inherent to str.format(). 
> we could then call __format__ on the resulting int or float to do the > heavy lifting; It's not just the heavy lifting; it does **all** the lifting for format specifications. > but since __format__ on anything else would never be called I don't want > to give that impression. > > Fine, if you're worried about bytes.format() overstepping by implicitly calling str.encode() on the return value of __format__() then you will need __bytes__format__() to get equivalent support. -Brett > > It also means if you pass in a string that you just want the strict ASCII >> bytes >> of then you can get it with {:s}. >> > > This isn't going to happen. If the user wants a string to be in the byte > stream, it has to either be a bytes literal or explicitly encoded [1]. > > -- > ~Ethan~ > > [1] Apologies if this has already been answered. I wanted to make sure I > responded to all the ideas/objects, and I may have responded more than once > to some. It's been a long few threads. ;) > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brett at python.org Thu Jan 16 15:46:12 2014 From: brett at python.org (Brett Cannon) Date: Thu, 16 Jan 2014 09:46:12 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> Message-ID: On Thu, Jan 16, 2014 at 4:56 AM, Nick Coghlan wrote: > > On 16 Jan 2014 17:53, "Ethan Furman" wrote: > > > > On 01/15/2014 06:45 AM, Brett Cannon wrote: > >> > >> > >> This is why I have argued that if you specify it as "if there is a > format spec specified, then the return value from > >> calling __format__() will have str.decode('ascii', 'strict') called on > it" you get the support for the various > >> number-specific format specs for free. > > > > > > It may work like this under the hood, but it's an implementation detail. > Since the numeric format codes will call int, index, or float on the > object (to handle subclasses), we could then call __format__ on the > resulting int or float to do the heavy lifting; but since __format__ on > anything else would never be called I don't want to give that impression. > > I have a different proposal: let's *just* add mod formatting to bytes, and > leave the extensible formatting system as a text only operation. > > We don't really care if bytes supports that method for version > compatibility purposes, and the deliberate flexibility of the design makes > it hard to translate into the binary domain. > > So let's just not provide that - let's accept that, for the binary domain, > printf style formatting is just a better fit for the job :) > Or PEP 460 for bytes.format() and PEP 461 for %. -Brett > Cheers, > Nick. > > > > > > >> It also means if you pass in a string that you just want the strict > ASCII bytes > >> of then you can get it with {:s}. > > > > > > This isn't going to happen. 
If the user wants a string to be in the > byte stream, it has to either be a bytes literal or explicitly encoded [1]. > > > > -- > > ~Ethan~ > > > > [1] Apologies if this has already been answered. I wanted to make sure > I responded to all the ideas/objects, and I may have responded more than > once to some. It's been a long few threads. ;) > > > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Thu Jan 16 16:09:47 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 16 Jan 2014 07:09:47 -0800 Subject: [Python-Dev] PEP 461 updates In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73FF4.2000609@g.nevcal.com> <52D78FB4.90503@stoneleaf.us> Message-ID: <52D7F63B.5040304@stoneleaf.us> On 01/16/2014 04:49 AM, Michael Urman wrote: > On Thu, Jan 16, 2014 at 1:52 AM, Ethan Furman wrote: >>> Is this an intended exception to the overriding principle? >> >> >> Hmm, thanks for spotting that. Yes, that would be a value error if anything >> over 255 is used, both currently in Py2, and for bytes in Py3. As Carl >> suggested, a little more explanation is needed in the PEP. > > FYI, note that str/unicode already has another value-dependent > exception with %c. 
I find the message surprising, as I wasn't aware
> Python had a 'char' type:
>
>>>> '%c' % 'a'
> 'a'
>>>> '%c' % 'abc'
> Traceback (most recent call last):
>   File "", line 1, in
> TypeError: %c requires int or char

Python doesn't have a char type, it has str's of length 1... which are
usually referred to as char's.  ;)

--
~Ethan~

From nas at arctrix.com  Thu Jan 16 16:51:21 2014
From: nas at arctrix.com (Neil Schemenauer)
Date: Thu, 16 Jan 2014 15:51:21 +0000 (UTC)
Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes
References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us>
 <52D703D3.5090400@canterbury.ac.nz>
Message-ID:

Greg Ewing wrote:
> Neil Schemenauer wrote:
>> Objects that implement __str__ can also implement __bytes__ if they
>> can guarantee that ASCII characters are always returned,
>
> I think __ascii__ would be a better name. I'd expect
> a method called __bytes__ on an int to return some
> version of its binary value.

I realize now we can't use __bytes__.  Currently, passing an int to
bytes() causes it to construct an object with that many null bytes.

If we are going to support format() (I'm not convinced it is
necessary, and it could easily be added in a later version), then we
need an equivalent to __format__.  My vote is either:

    def __formatascii__(self, spec):
        ...

or

    def __ascii__(self, spec):
        ...

Previously I was thinking of __bformat__ or __formatb__ but having
ascii in the method name is a great reminder.

Objects with a natural arbitrary byte representation can implement
__bytes__ and %s should use that if it exists.
Neil From guido at python.org Thu Jan 16 16:57:33 2014 From: guido at python.org (Guido van Rossum) Date: Thu, 16 Jan 2014 07:57:33 -0800 Subject: [Python-Dev] cpython: asyncio: Fix CoroWrapper (fix my previous commit) In-Reply-To: <20140116124056.48fe0422@fsol> References: <3f4Rlz1w3bzQPC@mail.python.org> <20140116124056.48fe0422@fsol> Message-ID: Because somehow you can't have a slot named __doc__ *and* a docstring in the class. Try it. (I tried to work around this but didn't get very far.) On Thu, Jan 16, 2014 at 3:40 AM, Antoine Pitrou wrote: > On Thu, 16 Jan 2014 01:55:43 +0100 (CET) > victor.stinner wrote: >> http://hg.python.org/cpython/rev/f07161c4f3aa >> changeset: 88494:f07161c4f3aa >> user: Victor Stinner >> date: Thu Jan 16 01:55:29 2014 +0100 >> summary: >> asyncio: Fix CoroWrapper (fix my previous commit) >> >> Add __name__ and __doc__ to __slots__ >> >> files: >> Lib/asyncio/tasks.py | 4 +--- >> 1 files changed, 1 insertions(+), 3 deletions(-) >> >> >> diff --git a/Lib/asyncio/tasks.py b/Lib/asyncio/tasks.py >> --- a/Lib/asyncio/tasks.py >> +++ b/Lib/asyncio/tasks.py >> @@ -32,9 +32,7 @@ >> >> >> class CoroWrapper: >> - """Wrapper for coroutine in _DEBUG mode.""" >> - >> - __slots__ = ['gen', 'func'] >> + __slots__ = ['gen', 'func', '__name__', '__doc__'] >> > > Why did you remove the docstring? > > Regards > > Antoine. 
> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From christian at python.org Thu Jan 16 17:14:49 2014 From: christian at python.org (Christian Heimes) Date: Thu, 16 Jan 2014 17:14:49 +0100 Subject: [Python-Dev] cpython: asyncio: Fix CoroWrapper (fix my previous commit) In-Reply-To: References: <3f4Rlz1w3bzQPC@mail.python.org> <20140116124056.48fe0422@fsol> Message-ID: On 16.01.2014 16:57, Guido van Rossum wrote: > Because somehow you can't have a slot named __doc__ *and* a docstring > in the class. Try it. (I tried to work around this but didn't get very > far.) That's true for all class attributes. You can't have a slot and a class attribute at the same time. After all the __doc__ string is stored in a class attribute, too. >>> class Example: ... __slots__ = ("egg",) ... # This doesn't work ... egg = None ... Traceback (most recent call last): File "", line 1, in ValueError: 'egg' in __slots__ conflicts with class variable >>> class Example: ... """doc""" ... __slots__ = ("__doc__",) ... Traceback (most recent call last): File "", line 1, in ValueError: '__doc__' in __slots__ conflicts with class variable From murman at gmail.com Thu Jan 16 17:33:32 2014 From: murman at gmail.com (Michael Urman) Date: Thu, 16 Jan 2014 10:33:32 -0600 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> Message-ID: On Thu, Jan 16, 2014 at 8:45 AM, Brett Cannon wrote: > Fine, if you're worried about bytes.format() overstepping by implicitly > calling str.encode() on the return value of __format__() then you will need > __bytes__format__() to get equivalent support. 
Could we just re-use PEP-3101's note (easily updated for Python 3): Note for Python 2.x: The 'format_spec' argument will be either a string object or a unicode object, depending on the type of the original format string. The __format__ method should test the type of the specifiers parameter to determine whether to return a string or unicode object. It is the responsibility of the __format__ method to return an object of the proper type. If __format__ receives a format_spec of type bytes, it should return bytes. For such cases on objects that cannot support bytes (i.e. for str), it can raise. This appears to avoid the need for additional methods. (As does Nick's proposal of leaving it out for now.) From brett at python.org Thu Jan 16 17:41:03 2014 From: brett at python.org (Brett Cannon) Date: Thu, 16 Jan 2014 11:41:03 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> Message-ID: On Thu, Jan 16, 2014 at 11:33 AM, Michael Urman wrote: > On Thu, Jan 16, 2014 at 8:45 AM, Brett Cannon wrote: > > Fine, if you're worried about bytes.format() overstepping by implicitly > > calling str.encode() on the return value of __format__() then you will > need > > __bytes__format__() to get equivalent support. > > Could we just re-use PEP-3101's note (easily updated for Python 3): > > Note for Python 2.x: The 'format_spec' argument will be either > a string object or a unicode object, depending on the type of the > original format string. The __format__ method should test the type > of the specifiers parameter to determine whether to return a string or > unicode object. It is the responsibility of the __format__ method > to return an object of the proper type. > > If __format__ receives a format_spec of type bytes, it should return > bytes. For such cases on objects that cannot support bytes (i.e. for > str), it can raise. 
This appears to avoid the need for additional
> methods. (As does Nick's proposal of leaving it out for now.)
>

That's a very good catch, Michael! I think that makes sense if there is
precedent. Unfortunately that bit from the PEP never made it into the
documentation, so I'm not sure whether there is a backwards-compatibility
worry.

From nas at arctrix.com  Thu Jan 16 17:42:58 2014
From: nas at arctrix.com (Neil Schemenauer)
Date: Thu, 16 Jan 2014 16:42:58 +0000 (UTC)
Subject: [Python-Dev] PEP 461 updates
References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us>
 <52D73343.6080207@oddbird.net>
Message-ID:

Carl Meyer wrote:
> I think the PEP could really use a rationale section summarizing _why_
> these formatting operations are being added to bytes

I agree.  My attempt at re-writing the PEP is below.

>> In order to avoid the problems of auto-conversion and
>> value-generated exceptions, all object checking will be done via
>> isinstance, not by values contained in a Unicode representation.
>> In other words::
>>
>>   - duck-typing to allow/reject entry into a byte-stream
>>   - no value generated errors
>
> This seems self-contradictory; "isinstance" is type-checking, which is
> the opposite of duck-typing.

Again, I agree.  We should avoid isinstance checks if possible.


Abstract
========

This PEP proposes adding %-interpolation to the bytes object.


Rationale
=========

A disruptive but useful change introduced in Python 3.0 was the clean
separation of byte strings (i.e. the "bytes" object) from character
strings (i.e. the "str" object).  The benefit is that character
encodings must be explicitly specified and the risk of corrupting
character data is reduced.

Unfortunately, this separation has made writing certain types of
programs more complicated and verbose.  For example, programs that deal
with network protocols often manipulate ASCII-encoded strings.
Since the "bytes" type does not support string formatting, extra
encoding and decoding to and from the "str" type is required.  For
simplicity and convenience it is desirable to introduce formatting
methods to "bytes" that allow formatting of ASCII-encoded character
data.  This change would blur the clean separation of byte strings and
character strings.  However, it is felt that the practical benefits
outweigh the purity costs.  The implicit assumption of ASCII encoding
would be limited to formatting methods.

One source of many problems with the Python 2 Unicode implementation is
the implicit coercion of Unicode character strings into byte strings
using the "ascii" codec.  If the character string contained only ASCII
characters, all was well.  However, if the string contained a non-ASCII
character, the coercion raised an exception.

The combination of implicit coercion and value-dependent failures has
proven to be a recipe for hard-to-debug errors.  A program may seem to
work correctly when tested (e.g. string input that happened to be ASCII
only) but later would fail, often with a traceback far from the source
of the real error.  The formatting methods for bytes() should avoid this
problem by not implicitly encoding data that might fail based on the
content of the data.

Another desirable feature is to allow arbitrary user classes to be used
as formatting operands.  Generally this is done by introducing a special
method that can be implemented by the new class.


Proposed semantics for bytes formatting
=======================================

Special method __ascii__
------------------------

A new special method, analogous to __format__, is introduced.  This
method takes a single argument, a format specifier.  The return value is
a bytes object.  Objects that have an ASCII-only representation can
implement this method to allow them to be used as format operands.
Objects with natural byte representations should implement __bytes__ or
the Py_buffer API.
%-interpolation
---------------

All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.)
will be supported, and will work as they do for str, including the
padding, justification and other related modifiers.  To avoid having to
introduce two special methods, the format specifications will be
translated to equivalent __format__ specifiers and the __ascii__ method
of each argument will be called.

Example::

   >>> b'%4x' % 10
   b'   a'

%c will insert a single byte, either from an int in range(256), or from
a bytes argument of length 1.

Example::

    >>> b'%c' % 48
    b'0'

    >>> b'%c' % b'a'
    b'a'

%s is restricted in what it will accept::

  - input type supports Py_buffer or has __bytes__?
    use it to collect the necessary bytes (may contain non-ASCII
    characters)

  - input type is something else?
    use its __ascii__ method; if there isn't one, raise TypeError

Examples::

    >>> b'%s' % b'abc'
    b'abc'

    >>> b'%s' % 3.14
    b'3.14'

    >>> b'%4s' % 12
    b'  12'

    >>> b'%s' % 'hello world!'
    Traceback (most recent call last):
    ...
    TypeError: 'hello world!' has no __ascii__ method, perhaps you need to encode it?

.. note::

   Because the str type does not have an __ascii__ method, attempts to
   directly use 'a string' as a bytes interpolation value will raise an
   exception.  To use 'string' values, they must be encoded or otherwise
   transformed into a bytes sequence::

      'a string'.encode('latin-1')


Unsupported % format codes
^^^^^^^^^^^^^^^^^^^^^^^^^^

%r (which calls __repr__) is not supported.


format
------

The format() method will not be implemented at this time but may be
added in a later Python release.  The __ascii__ method is designed to
make adding it later simpler.


Open Questions
==============

Do we need to support the complete set of format codes?  For complicated
formatting perhaps using the str object to do the formatting and
encoding the result is sufficient.

Should Python check that the bytes returned by __ascii__ are in the
range 0-127 (i.e. ASCII)?
That seems of little utility since the error would be similar to a
unicode-to-str coercion failure in Python 2 and the traceback would
normally be far removed from the real error.  Built-in types would be
designed to never return non-ASCII characters from the __ascii__ method.


Proposed variations
===================

Instead of introducing a new special method, have numeric types
implement __bytes__.

  - Adding __bytes__ to the int object is not backwards compatible.
    bytes() called on an int already has an incompatible meaning.

It has been suggested to use %b for bytes instead of %s.

  - Rejected, using %s will make porting code from Python 2 easier.

It was suggested to disallow %s from accepting numbers.

  - Rejected, to ease porting of Python 2 code, %s should accept number
    operands.

It has been proposed to automatically use .encode('ascii','strict') for
str arguments to %s.

  - Rejected as this would lead to intermittent failures.  Better to
    have the operation always fail so the trouble-spot can be correctly
    fixed.

It has been proposed to have %s return the ascii-encoded repr when the
value is a str (b'%s' % 'abc' --> b"'abc'").

  - Rejected as this would lead to hard-to-debug failures far from the
    problem site.  Better to have the operation always fail so the
    trouble-spot can be easily fixed.

Instead of having %-interpolation call __ascii__, introduce a second
special method analogous to __str__ and have %s call it.

  - Rejected, __ascii__ is both necessary for implementing format() and
    sufficient for %-interpolation.  While implementing an __ascii__
    method is more complicated due to the specifier argument, the number
    of classes which will do so is limited.


Copyright
=========

This document has been placed in the public domain.

..
Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From ethan at stoneleaf.us Thu Jan 16 17:23:13 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 16 Jan 2014 08:23:13 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> Message-ID: <52D80771.1090402@stoneleaf.us> On 01/16/2014 06:45 AM, Brett Cannon wrote: > On Thu, Jan 16, 2014 at 2:51 AM, Ethan Furman wrote: >> On 01/15/2014 06:45 AM, Brett Cannon wrote: >>> >>> This is why I have argued that if you specify it as >>> "if there is a format spec specified, then the return >>> value from calling __format__() will have >>> str.decode('ascii', 'strict') called on it" you get >>> the support for the various number-specific format >>> specs for free. >> Since the numeric format codes will call int, index, >> or float on the object (to handle subclasses), > > But that's **only** because the numeric types choose > to as part of their __format__() implementation; it is > not inherent to str.format(). As I understand it, str.format will call the object's __format__. So, for example, if I say: u'the value is: %d' % myNum(17) then it will be myNum.__format__ that gets called, not int.__format__; this is precisely what we don't want, since can't know that myNum is only going to return ASCII characters. This is why I would have bytes.__format__, as part of its parsing, call int, index, or float depending on the format code; so the above example would have bytes.__format__ calling int() on myNum(17), at which point we either have an int type or an exception was raised because myNum isn't really an integer. Once we have an int, whose format we know and trust, then we can call its __format__ and proceed from there. 
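The coercion step described above (call int, index, or float on the operand first, then format the resulting builtin) might be sketched as follows. Everything here is an assumption for illustration: `format_number_ascii`, the `_COERCIONS` table and the `MyNum` class are hypothetical names, not part of any PEP text or CPython API, and the code-to-coercion mapping is only one plausible reading of the proposal.

```python
import operator

# Assumed mapping from %-codes to the builtin coercion applied first.
_COERCIONS = {
    **dict.fromkeys('diu', int),
    **dict.fromkeys('oxX', operator.index),
    **dict.fromkeys('eEfFgG', float),
}

def format_number_ascii(spec, value):
    """Format `value` with a %-style spec such as '4x' and return ASCII bytes."""
    code = spec[-1]
    if code not in _COERCIONS:
        raise ValueError('unsupported numeric format code: %r' % code)
    plain = _COERCIONS[code](value)   # raises TypeError if value is not numeric
    if isinstance(plain, int):
        plain = int(plain)            # drop any int subclass, keep only the value
    # The operand is now an exact builtin number, so the output is ASCII.
    return (('%' + spec) % plain).encode('ascii')

class MyNum(int):
    """An int subclass whose own __format__ cannot be trusted to be ASCII."""
    def __format__(self, spec):
        return 'caf\u00e9'

print(format(MyNum(17), ''))                  # the subclass hook is used here
print(format_number_ascii('4x', MyNum(17)))   # the subclass hook is bypassed
```

The point of the design is that the subclass's __format__ is never consulted, which is also why a user-defined __format__ would be silently ignored under this scheme.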
On the flip side, if myNum does define it's own __format__, it will not be called by bytes.format, and perhaps that is another good reason for bytes to only support %-interpolation and not format? -- ~Ethan~ From nas at arctrix.com Thu Jan 16 18:13:43 2014 From: nas at arctrix.com (Neil Schemenauer) Date: Thu, 16 Jan 2014 17:13:43 +0000 (UTC) Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> Message-ID: Michael Urman wrote: > If __format__ receives a format_spec of type bytes, it should return > bytes. For such cases on objects that cannot support bytes (i.e. for > str), it can raise. This appears to avoid the need for additional > methods. (As does Nick's proposal of leaving it out for now.) That's an interesting idea. I proposed __ascii__ as a analogous method to __format__ for bytes formatting and to have %-interpolation use it. However, overloading __format__ based on the type of the argument could work. I see with Python 3: >>> (1).__format__(b'') Traceback (most recent call last): File "", line 1, in TypeError: must be str, not bytes A TypeError exception is what we want if the object does not support bytes formatting. Some possible problems: - It could be hard to provide a helpful exception message since it is generated inside the __format__ method rather than inside the bytes.__mod__ method (in the case of a missing __ascii__ method). The most common error will be using a str object and so we could modify the __format__ method of str to provide a nice hint (use encode()). - Is there some risk that an object will unwittingly implement a __format__ method that unintentionally accepts a bytes argument? That requires some investigation. 
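A rough sketch of the idea discussed in this message, under stated assumptions: the Hex class below is hypothetical and only illustrates how a type might overload __format__ on the type of the format spec. The helper merely checks the behaviour quoted above, namely that built-in types raise TypeError when handed a bytes spec.

```python
def rejects_bytes_spec(obj):
    """Return True if obj.__format__ raises TypeError for a bytes spec."""
    try:
        obj.__format__(b"")
    except TypeError:
        return True
    return False


class Hex:
    """Hypothetical type that opts in to bytes format specs."""

    def __init__(self, n):
        self.n = n

    def __format__(self, spec):
        if isinstance(spec, bytes):
            # bytes spec in, bytes out; default to hexadecimal.
            text_spec = spec.decode("ascii") or "x"
            return format(self.n, text_spec).encode("ascii")
        # str spec in, str out.
        return format(self.n, spec or "x")


print(rejects_bytes_spec(1), rejects_bytes_spec("abc"))
print(Hex(255).__format__(b"04x"), format(Hex(255)))
```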
From murman at gmail.com Thu Jan 16 18:49:52 2014 From: murman at gmail.com (Michael Urman) Date: Thu, 16 Jan 2014 11:49:52 -0600 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> Message-ID: On Thu, Jan 16, 2014 at 11:13 AM, Neil Schemenauer wrote: > A TypeError exception is what we want if the object does not support > bytes formatting. Some possible problems: > > - It could be hard to provide a helpful exception message since it > is generated inside the __format__ method rather than inside the > bytes.__mod__ method (in the case of a missing __ascii__ method). > The most common error will be using a str object and so we could > modify the __format__ method of str to provide a nice hint (use > encode()). The various format functions could certainly intercept and wrap exceptions raised by __format__ methods. Once the core types were modified to expect bytes in format_spec, however, this may not be critical; __format__ methods which delegate would work as expected, str could certainly be clear about why it raised, and custom implementations would be handled per comments I'll make on your second point. Overall I suspect this is no worse than unhandled values in the format_spec are today. > - Is there some risk that an object will unwittingly implement a > __format__ method that unintentionally accepts a bytes argument? > That requires some investigation. Agreed. Some quick armchair calculations suggest to me that there are three likely outcomes: - Properly handle the type (perhaps written with the 2.x clause in mind) - Raise an exception internally (perhaps ValueError, such as from format(3, 'q')) - Mishandle and return a str (perhaps due to to if/else defaulting) The first and second outcome may well reflect what we want, and the third could easily be detected and turned into an exception by the format functions. 
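The interception described above might look roughly like the following. It is a sketch only: call_bytes_format, Good and Sloppy are made-up names, and the error messages are placeholders rather than anything proposed in the thread. It wraps exceptions raised inside __format__ and turns a str result for a bytes spec into a TypeError.

```python
def call_bytes_format(obj, spec=b""):
    """Hypothetical wrapper around __format__ for a bytes format spec."""
    cls = type(obj)
    try:
        # Special methods are looked up on the type, not the instance.
        result = cls.__format__(obj, spec)
    except Exception as exc:
        # Outcome two: the method raised internally; wrap it.
        raise TypeError(
            "%s does not support bytes formatting" % cls.__name__
        ) from exc
    if not isinstance(result, bytes):
        # Outcome three: the method mishandled the spec and returned str.
        raise TypeError(
            "%s.__format__ returned %s, expected bytes"
            % (cls.__name__, type(result).__name__)
        )
    return result


class Good:
    """Handles both spec types properly (outcome one)."""

    def __format__(self, spec):
        return b"ok" if isinstance(spec, bytes) else "ok"


class Sloppy:
    """Always returns str, whatever the spec type."""

    def __format__(self, spec):
        return "oops"
```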
I'm uncertain whether this reflects all the scenarios we would care about. From v+python at g.nevcal.com Thu Jan 16 19:25:28 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 16 Jan 2014 10:25:28 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> Message-ID: <52D82418.5000506@g.nevcal.com> On 1/16/2014 8:41 AM, Brett Cannon wrote: > That's a very good catch, Michael! I think that makes sense if there > is precedence. Unfortunately that bit from the PEP never made it into > the documentation so I'm not sure if there is a > backwards-compatibility worry. No. If __format__ is called with bytes format, and returns str, there would be an exception generated on the spot. If __format__ is called with bytes format, and tries to use it as str, there would be an exception generated on the spot. Prior to 3.whenever-this-is-implemented, Python 3 only provides str formats to __format__, right? So new code is required to pass bytes to __format__. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Thu Jan 16 19:30:01 2014 From: eric at trueblade.com (Eric V. 
Smith) Date: Thu, 16 Jan 2014 13:30:01 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D80771.1090402@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> Message-ID: <52D82529.9020708@trueblade.com> On 01/16/2014 11:23 AM, Ethan Furman wrote: > On 01/16/2014 06:45 AM, Brett Cannon wrote: >> On Thu, Jan 16, 2014 at 2:51 AM, Ethan Furman wrote: >>> On 01/15/2014 06:45 AM, Brett Cannon wrote: >>>> >>>> This is why I have argued that if you specify it as >>>> "if there is a format spec specified, then the return >>>> value from calling __format__() will have >>>> str.decode('ascii', 'strict') called on it" you get >>>> the support for the various number-specific format >>>> specs for free. > >>> Since the numeric format codes will call int, index, >>> or float on the object (to handle subclasses), >> >> But that's **only** because the numeric types choose >> to as part of their __format__() implementation; it is >> not inherent to str.format(). > > As I understand it, str.format will call the object's __format__. So, > for example, if I say: > > u'the value is: %d' % myNum(17) > > then it will be myNum.__format__ that gets called, not int.__format__; > this is precisely what we don't want, since can't know that myNum is > only going to return ASCII characters. "Magic" methods, including __format__, are called on the type, not the instance. > This is why I would have bytes.__format__, as part of its parsing, call > int, index, or float depending on the format code; so the above example > would have bytes.__format__ calling int() on myNum(17), at which point > we either have an int type or an exception was raised because myNum > isn't really an integer. Once we have an int, whose format we know and > trust, then we can call its __format__ and proceed from there. 
> > On the flip side, if myNum does define it's own __format__, it will not > be called by bytes.format, and perhaps that is another good reason for > bytes to only support %-interpolation and not format? For the first iteration of bytes.format(), I think we should just support the exact types of int, float, and bytes. It will call the type's__format__ (with the object as "self") and encode the result to ASCII. For the stated use case of 2.x compatibility, I suspect this will cover > 90% of the uses in real code. If we find there are cases where real code needs additional types supported, we can consider adding __format_ascii__ (or whatever name we cook up). Eric. From guido at python.org Thu Jan 16 19:42:35 2014 From: guido at python.org (Guido van Rossum) Date: Thu, 16 Jan 2014 10:42:35 -0800 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments In-Reply-To: References: <52D76EFF.6080605@hastings.org> <52D77462.8060101@hastings.org> Message-ID: On Thu, Jan 16, 2014 at 1:42 AM, Terry Reedy wrote: > On 1/16/2014 3:31 AM, Serhiy Storchaka wrote: >> >> 16.01.14 08:05, Guido van Rossum ???????(??): >>> >>> In this specific case it's clear to me that the special-casing of >>> negative count is intentional -- presumably it emulates sequence >>> repetition, where e.g. 'a'*-1 == ''. >> >> >> In this specific case it's contrary to sequence repetition. Because >> repeat('a', -1) repeats 'a' forever. > > > 'Forever' only when the keyword is used and the value is -1. > In 3.4b2 > >>>> itertools.repeat('a', -1) > repeat('a', 0) >>>> itertools.repeat('a', times=-1) > repeat('a') >>>> itertools.repeat('a', times=-2) > repeat('a', -2) > > >> This is a point of Vajrasky's issue [1]. > > The first line is correct in both behavior and representation. > The second line behavior (and corresponding repr) are wrong. > The third line repr is wrong but the behavior is like the first. > >> [1] http://bugs.python.org/issue19145 Eew. 
This is much more wacko than I thought. (Serves me right for basically not caring about itertools :-(. ) It also mostly sounds like unintended -- I can't imagine the intention was to treat the keyword argument different than the positional argument, but I can easily imagine getting the logic slightly wrong. If I had complete freedom in redefining the spec I would treat positional and keyword the same, interpret absent or None to mean "forever" and explicit negative integers to mean the same as zero, and make repr show a positional integer >= 0 if the repeat isn't None. But I don't know if that's too much of a change. @Antoine: >> Passing in "None" here is inconvenient as it's an integer argument. > > Inconvenient for whom? The callee or the caller? I meant for the callee -- it's slightly complex to code up right. But IMO worth it. -- --Guido van Rossum (python.org/~guido) From guido at python.org Thu Jan 16 19:57:39 2014 From: guido at python.org (Guido van Rossum) Date: Thu, 16 Jan 2014 10:57:39 -0800 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments In-Reply-To: References: <52D76EFF.6080605@hastings.org> <52D77462.8060101@hastings.org> Message-ID: On Thu, Jan 16, 2014 at 12:31 AM, Serhiy Storchaka wrote: > getattr(foo, 'bar', None) is not the same as getattr(foo, 'bar'). So None > can't be used as universal default value. Not universal, but I still think that most functions don't need to have such a subtle distinction. E.g. in the case of sha1() I still believe that it's totally fine to switch the default to b''. In that particular case I don't see the need to also accept None as a way to specify the default. Basically, my philosophy about this is that anytime you can't easily reimplement the same signature in Python (without reverting to manually parsing the args using *args and **kwds) it is a pain, and you should think twice before canonizing such a signature. 
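As an illustration of a signature that is easy to reimplement in Python, the semantics suggested earlier in this message for repeat() can be written as below. This is a sketch of the suggested spec, not of what itertools.repeat does in 3.4b2, and it leaves the repr question aside.

```python
import itertools


def repeat(obj, times=None):
    """Sketch: absent or None means forever, negative counts mean zero."""
    if times is None:
        while True:
            yield obj
    else:
        for _ in range(max(times, 0)):
            yield obj


# Positional and keyword forms behave identically.
print(list(repeat("a", 3)), list(repeat("a", times=-1)))
print(list(itertools.islice(repeat("a"), 4)))
```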
Also, there are two somewhat different cases: (a) The default can easily be expressed as a value of the same type that the argument normally has. This is the sha1() case. In this case I see no need to also accept None as an argument (unless it is currently accepted, which it isn't for sha1()). Another example is .read() -- here, passing in a negative integer means the same as not passing an argument. (b) The default has a special meaning that does something different than any valid value. A good example is getattr(), which must forever be special. To me, most functions should fall in (a) even if there is currently ambiguity, and it feels like repeat() was *meant* to be in (a). I'm not sure how AC should deal with (b), but I still hope that true examples are rare enough that we can keep hand-coding them. -- --Guido van Rossum (python.org/~guido) From guido at python.org Thu Jan 16 20:03:46 2014 From: guido at python.org (Guido van Rossum) Date: Thu, 16 Jan 2014 11:03:46 -0800 Subject: [Python-Dev] cpython: asyncio: Fix CoroWrapper (fix my previous commit) In-Reply-To: References: <3f4Rlz1w3bzQPC@mail.python.org> <20140116124056.48fe0422@fsol> Message-ID: Yeah the confusing thing is that omitting the docstring fixes it -- the class still has a __doc__ attribute but apparently it comes from the metaclass. :-) I guess you *could* have both a class and an instance __doc__ by making a really clever descriptor, but it seems simpler to just use a comment instead of a docstring. :-) I'll do this now. On Thu, Jan 16, 2014 at 8:14 AM, Christian Heimes wrote: > On 16.01.2014 16:57, Guido van Rossum wrote: >> Because somehow you can't have a slot named __doc__ *and* a docstring >> in the class. Try it. (I tried to work around this but didn't get very >> far.) > > That's true for all class attributes. You can't have a slot and a class > attribute at the same time. After all the __doc__ string is stored in a > class attribute, too. > >>>> class Example: > ... 
__slots__ = ("egg",)
> ...     # This doesn't work
> ...     egg = None
> ...
> Traceback (most recent call last):
>   File "", line 1, in
> ValueError: 'egg' in __slots__ conflicts with class variable
>
>
>>>> class Example:
> ...     """doc"""
> ...     __slots__ = ("__doc__",)
> ...
> Traceback (most recent call last):
>   File "", line 1, in
> ValueError: '__doc__' in __slots__ conflicts with class variable

--
--Guido van Rossum (python.org/~guido)

From yselivanov.ml at gmail.com Thu Jan 16 20:15:42 2014
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Thu, 16 Jan 2014 14:15:42 -0500
Subject: [Python-Dev] python code in argument clinic annotations
Message-ID:

The whole discussion of whether clinic should write its output
right in the source file (buffered or not), or in a separate sidefile,
started because we currently cannot run the clinic during the build
process, since it's written in python.

But what if, at some point, someone implements the Tools/clinic.py in
pure C, so that integrating it directly in the build process will be
possible? In this case, the question is -- should we use python code
in the argument clinic DSL?

If we keep it strictly declarative, then, at least, we'll have this
possibility in the future.

--
Yury Selivanov

From ethan at stoneleaf.us Thu Jan 16 19:55:47 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 16 Jan 2014 10:55:47 -0800
Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes
In-Reply-To: <52D82529.9020708@trueblade.com>
References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com>
Message-ID: <52D82B33.5040701@stoneleaf.us>

On 01/16/2014 10:30 AM, Eric V.
Smith wrote: > On 01/16/2014 11:23 AM, Ethan Furman wrote: >> On 01/16/2014 06:45 AM, Brett Cannon wrote: >>> >>> But that's **only** because the numeric types choose >>> to as part of their __format__() implementation; it is >>> not inherent to str.format(). >> >> As I understand it, str.format will call the object's __format__. So, >> for example, if I say: >> >> u'the value is: %d' % myNum(17) >> >> then it will be myNum.__format__ that gets called, not int.__format__; >> this is precisely what we don't want, since can't know that myNum is >> only going to return ASCII characters. > > "Magic" methods, including __format__, are called on the type, not the > instance. Yes, that's why I said `myNum(17)` and not `myNum`. >> This is why I would have bytes.__format__, as part of its parsing, call >> int, index, or float depending on the format code; so the above example >> would have bytes.__format__ calling int() on myNum(17), at which point >> we either have an int type or an exception was raised because myNum >> isn't really an integer. Once we have an int, whose format we know and >> trust, then we can call its __format__ and proceed from there. >> >> On the flip side, if myNum does define it's own __format__, it will not >> be called by bytes.format, and perhaps that is another good reason for >> bytes to only support %-interpolation and not format? > > For the first iteration of bytes.format(), I think we should just > support the exact types of int, float, and bytes. It will call the > type's__format__ (with the object as "self") and encode the result to > ASCII. For the stated use case of 2.x compatibility, I suspect this will > cover > 90% of the uses in real code. If we find there are cases where > real code needs additional types supported, we can consider adding > __format_ascii__ (or whatever name we cook up). That can certainly be our fallback position if we can't decide now how we want to handle int and float subclasses. 
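A minimal sketch of the "exact types only" first iteration quoted above. The name format_field is hypothetical, and the handling of bytes (passed through unchanged, with a non-empty spec rejected) is an assumption made for this illustration rather than something stated in the thread.

```python
def format_field(value, spec=""):
    """Sketch: format one replacement field for a bytes result."""
    cls = type(value)
    if cls is bytes:
        # Assumption: bytes are inserted as-is and take no format spec.
        if spec:
            raise TypeError("format spec not supported for bytes")
        return value
    if cls is int or cls is float:
        # Call the exact type's __format__ and encode the result to ASCII.
        return cls.__format__(value, spec).encode("ascii")
    # Subclasses (including bool) and all other types are rejected.
    raise TypeError("unsupported type for bytes formatting: %s" % cls.__name__)


print(format_field(42, "05d"), format_field(1.5, ".2f"), format_field(b"ab"))
```

Because the check is on the exact type, int and float subclasses fall through to the TypeError, which is the open question the message leaves for later.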
--
~Ethan~

From eric at trueblade.com Thu Jan 16 20:28:57 2014
From: eric at trueblade.com (Eric V. Smith)
Date: Thu, 16 Jan 2014 14:28:57 -0500
Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes
In-Reply-To: <52D82B33.5040701@stoneleaf.us>
References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D82B33.5040701@stoneleaf.us>
Message-ID: <52D832F9.5020203@trueblade.com>

On 01/16/2014 01:55 PM, Ethan Furman wrote:
>> "Magic" methods, including __format__, are called on the type, not the
>> instance.
>
> Yes, that's why I said `myNum(17)` and not `myNum`.

Oops, apologies. I misread the code.

Eric.

From guido at python.org Thu Jan 16 20:36:23 2014
From: guido at python.org (Guido van Rossum)
Date: Thu, 16 Jan 2014 11:36:23 -0800
Subject: [Python-Dev] python code in argument clinic annotations
In-Reply-To:
References:
Message-ID:

On Thu, Jan 16, 2014 at 11:15 AM, Yury Selivanov wrote:
> The whole discussion of whether clinic should write its output
> right in the source file (buffered or not), or in a separate sidefile,
> started because we currently cannot run the clinic during the build
> process, since it's written in python.

But that's why the output is checked in. It's the same with the parser
IIRC. (And yes, there's a bootstrap issue -- but that's solved by
using an older Python version.)

> But what if, at some point, someone implements the Tools/clinic.py in
> pure C, so that integrating it directly in the build process will be
> possible? In this case, the question is -- should we use python code
> in the argument clinic DSL?
>
> If we keep it strictly declarative, then, at least, we'll have this
> possibility in the future.

Sounds like a pretty unlikely scenario. Why would you implement clinic in C?
-- --Guido van Rossum (python.org/~guido) From tseaver at palladion.com Thu Jan 16 20:38:59 2014 From: tseaver at palladion.com (Tres Seaver) Date: Thu, 16 Jan 2014 14:38:59 -0500 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments In-Reply-To: <52D76EFF.6080605@hastings.org> References: <52D76EFF.6080605@hastings.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/16/2014 12:32 AM, Larry Hastings wrote: > We could add a special value, let's call it sys.NULL, whose specific > semantics are "turns into NULL when passed into builtins". This would > solve the problem but it's really, really awful. That doesn't smell too bad too me -- I would prefer to be able to build up all such calls programmatically for testing purposes (e.g., to ensure identical semantics for all code paths between a Python reference implementation and a C extension). Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlLYNVMACgkQ+gerLs4ltQ79NwCgy3231to9rnw/8I+52hFJE+2w Z9QAnR0pAMfkofhT82K1yQctm0E8TF7j =QaC4 -----END PGP SIGNATURE----- From larry at hastings.org Thu Jan 16 20:42:03 2014 From: larry at hastings.org (Larry Hastings) Date: Thu, 16 Jan 2014 11:42:03 -0800 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments In-Reply-To: <20140116123805.7cda7cb2@fsol> References: <52D76EFF.6080605@hastings.org> <52D77462.8060101@hastings.org> <20140116123805.7cda7cb2@fsol> Message-ID: <52D8360B.4030507@hastings.org> On 01/16/2014 03:38 AM, Antoine Pitrou wrote: > On Wed, 15 Jan 2014 21:55:46 -0800 > Larry Hastings wrote: >> Passing in "None" here is inconvenient as it's an integer argument. > Inconvenient for whom? The callee or the caller? 
The callee, specifically the C argument parsing code. (Even more
specifically: the Argument Clinic argument parsing code generator.)

//arry/

From yselivanov.ml at gmail.com Thu Jan 16 20:46:48 2014
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Thu, 16 Jan 2014 14:46:48 -0500
Subject: [Python-Dev] python code in argument clinic annotations
In-Reply-To:
References:
Message-ID:

Guido,

On Thursday, January 16, 2014, Guido van Rossum wrote:
> On Thu, Jan 16, 2014 at 11:15 AM, Yury Selivanov
> > wrote:
> > The whole discussion of whether clinic should write its output
> > right in the source file (buffered or not), or in a separate sidefile,
> > started because we currently cannot run the clinic during the build
> > process, since it's written in python.
>
> But that's why the output is checked in. It's the same with the parser
> IIRC. (And yes, there's a bootstrap issue -- but that's solved by
> using an older Python version.)
>
> > But what if, at some point, someone implements the Tools/clinic.py in
> > pure C, so that integrating it directly in the build process will be
> > possible? In this case, the question is -- should we use python code
> > in the argument clinic DSL?
> >
> > If we keep it strictly declarative, then, at least, we'll have this
> > possibility in the future.
>
> Sounds like a pretty unlikely scenario. Why would you implement clinic in
> C?

Unlikely, yes. There is just one reason for having it in C --
having it integrated in the build process, so that the generated
output/sidefiles are not in the repository.

Yury
-------------- next part -------------- An HTML attachment was scrubbed...
URL:

From larry at hastings.org Thu Jan 16 20:43:31 2014
From: larry at hastings.org (Larry Hastings)
Date: Thu, 16 Jan 2014 11:43:31 -0800
Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments
In-Reply-To: <52D7CEBF.7020309@mrabarnett.plus.com>
References: <52D76EFF.6080605@hastings.org> <52D7CEBF.7020309@mrabarnett.plus.com>
Message-ID: <52D83663.8040403@hastings.org>

On 01/16/2014 04:21 AM, MRAB wrote:
> On 2014-01-16 05:32, Larry Hastings wrote:
> [snip]
>
>> We could add a special value, let's call it
>> sys.NULL, whose specific semantics are "turns into NULL when passed into
>> builtins". This would solve the problem but it's really, really awful.
>>
> Would it be better if it were called "__null__"?

No. The problem is not the name, the problem is in the semantics. This
would mean a permanent special case in Python's argument parsing (and
"special cases aren't special enough to break the rules"), and would
inflict these same awful semantics on alternate implementations like
PyPy, Jython, and IronPython.

//arry/

From larry at hastings.org Thu Jan 16 20:50:24 2014
From: larry at hastings.org (Larry Hastings)
Date: Thu, 16 Jan 2014 11:50:24 -0800
Subject: [Python-Dev] python code in argument clinic annotations
In-Reply-To:
References:
Message-ID: <52D83800.7040005@hastings.org>

On 01/16/2014 11:36 AM, Guido van Rossum wrote:
> On Thu, Jan 16, 2014 at 11:15 AM, Yury Selivanov
> wrote:
>> If we keep it strictly declarative, then, at least, we'll have this
>> possibility in the future.
> Sounds like a pretty unlikely scenario. Why would you implement clinic in C?

We'll never reimplement Argument Clinic in C. I could list many reasons
for this. Suffice to say, I'm not doing it, and I doubt anyone else
would ever step up to the plate and try it.

And, "form follows function".
It's a bad idea to limit Argument Clinic's features today based on what might be inconvenient someday in some hypothetical rewrite in C. Argument Clinic should be maximally useful, right now. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Thu Jan 16 20:55:11 2014 From: larry at hastings.org (Larry Hastings) Date: Thu, 16 Jan 2014 11:55:11 -0800 Subject: [Python-Dev] python code in argument clinic annotations In-Reply-To: References: Message-ID: <52D8391F.2080903@hastings.org> On 01/16/2014 11:46 AM, Yury Selivanov wrote: > There is just one reason for having it in C -- > having it integrated in the build process, > so that the generated output/sidefiles > are not in the repository. It's possible to integrate Argument Clinic into the build process without rewriting it in C. We could write a small C program that looked on your path for a suitable Python 3 interpreter, and ran Tools/clinic/clinic.py under that interpreter. If it failed to find such an interpreter it could print a warning message. Alternatively, we could add a checksum for the Clinic *input* block to the output somewhere. This would give the C tool the ability to check and see if the Clinic input had changed, and only bother to run clinic.py if it had. However, the generated output is still going to be checked in to the repository regardless. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From g.brandl at gmx.net Thu Jan 16 20:55:56 2014
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 16 Jan 2014 20:55:56 +0100
Subject: [Python-Dev] python code in argument clinic annotations
In-Reply-To:
References:
Message-ID:

On 16.01.2014 20:46, Yury Selivanov wrote:
> Guido,
>
> On Thursday, January 16, 2014, Guido van Rossum
> wrote:
>
>     On Thu, Jan 16, 2014 at 11:15 AM, Yury Selivanov
>     > wrote:
>     > The whole discussion of whether clinic should write its output
>     > right in the source file (buffered or not), or in a separate sidefile,
>     > started because we currently cannot run the clinic during the build
>     > process, since it's written in python.
>
>     But that's why the output is checked in. It's the same with the parser
>     IIRC. (And yes, there's a bootstrap issue -- but that's solved by
>     using an older Python version.)
>
>     > But what if, at some point, someone implements the Tools/clinic.py in
>     > pure C, so that integrating it directly in the build process will be
>     > possible? In this case, the question is -- should we use python code
>     > in the argument clinic DSL?
>     >
>     > If we keep it strictly declarative, then, at least, we'll have this
>     > possibility in the future.
>
>     Sounds like a pretty unlikely scenario. Why would you implement clinic in C?
>
>
> Unlikely, yes.

About as unlikely as switching the Python sources to C++ and using templates
to implement a Clinic-like DSL :)

Georg

From yselivanov.ml at gmail.com Thu Jan 16 21:07:13 2014
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Thu, 16 Jan 2014 15:07:13 -0500
Subject: [Python-Dev] python code in argument clinic annotations
In-Reply-To: <52D8391F.2080903@hastings.org>
References: <52D8391F.2080903@hastings.org>
Message-ID:

Larry,

On January 16, 2014 at 2:58:02 PM, Larry Hastings (larry at hastings.org) wrote:
> > However, the generated output is still going to be checked in
> to the repository regardless.

OK.
Since it looks like it's 100% accepted to commit it to the repo,
then my question is moot.

And again, Larry, kudos for pulling the AC off.

- Yury

From larry at hastings.org Thu Jan 16 21:13:20 2014
From: larry at hastings.org (Larry Hastings)
Date: Thu, 16 Jan 2014 12:13:20 -0800
Subject: [Python-Dev] Closing the Clinic output format debate (at least for now)
Message-ID: <52D83D60.6060106@hastings.org>

The current tally of votes, by order of popularity:

    Side file: +6
    Buffer: +1.5
    Multiple buffers, Modified buffer, Forward buffer: +1
    Original: -5

However, as stated, support for "side files" will not go in unless Guido
explicitly states that it's okay with him. He has not. Therefore it's
not going in. If you want this feature, take it up with our BDFL. I
feel my hands are tied.

Second-best is all the buffer approaches, collectively. Since there was
no clear winner, I'm going to make the new default the "modified buffer"
approach, as that's the only one that does not require rearranging your
code to use. However, to encourage continued experimentation, I'm going
to leave in the configurability (at least for now), so people can keep
experimenting. Maybe we'll find something in the future that's a clear
new favorite.

As a stretch goal, I'd like to also add Zachary Ware's proposed
"forward" buffer, as a further concession to experimentation. It
shouldn't be too messy, but if it gets out of hand I'll back out of it.

Finally, I'm going to add support for "presets" so you can switch
between original / modified buffer / buffer / forward buffer with just
one statement. (Multiple buffers doesn't need a different preset.)

I'll also keep the line prefix (and add a line suffix too) and see if a
prefix of "/*clinic*/" helps.

//arry/
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From mark at hotpy.org Thu Jan 16 21:15:23 2014 From: mark at hotpy.org (Mark Shannon) Date: Thu, 16 Jan 2014 20:15:23 +0000 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments In-Reply-To: <52D83663.8040403@hastings.org> References: <52D76EFF.6080605@hastings.org> <52D7CEBF.7020309@mrabarnett.plus.com> <52D83663.8040403@hastings.org> Message-ID: <52D83DDB.9030205@hotpy.org> On 16/01/14 19:43, Larry Hastings wrote: > On 01/16/2014 04:21 AM, MRAB wrote: >> On 2014-01-16 05:32, Larry Hastings wrote: >> [snip] >> >>> We could add a special value, let's call it >>> sys.NULL, whose specific semantics are "turns into NULL when passed into >>> builtins". This would solve the problem but it's really, really awful. >>> >> Would it be better if it were called "__null__"? > > No. The problem is not the name, the problem is in the semantics. This > would mean a permanent special case in Python's argument parsing (and > "special cases aren't special enough to break the rules"), and would > inflict these same awful semantics on alternate implementations like > PyPy, Jython, and IronPython. Indeed. Why not just change the clinic spec a bit, from 'The "default" is a Python literal value.' to 'The "default" is a Python literal value or NULL.'? A NULL default would imply the parameter is optional with no default. Cheers, Mark. 
From tjreedy at udel.edu Thu Jan 16 22:18:04 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 16 Jan 2014 16:18:04 -0500
Subject: [Python-Dev] PEP 461 updates
In-Reply-To:
References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net>
Message-ID: <52D84C8C.6090408@udel.edu>

On 1/16/2014 5:11 AM, Nick Coghlan wrote:
> Guido's successful counter was to point out that the parsing of the
> format string itself assumes ASCII compatible data,

Did you see my explanation, which I wrote in response to one of your
earlier posts, of why I think the statement "the parsing of the format
string itself assumes ASCII compatible data" is confused and wrong?

The above seems to say that what I wrote is impossible, but perhaps I
misunderstand what Guido and you mean. Among my questions are "by
data, do you mean interpolated objects or interpolated bytes?" and
"what restriction on 'data' do you intend by 'ASCII compatible'?".

--
Terry Jan Reedy

From larry at hastings.org Thu Jan 16 22:20:40 2014
From: larry at hastings.org (Larry Hastings)
Date: Thu, 16 Jan 2014 13:20:40 -0800
Subject: [Python-Dev] Closing the Clinic output format debate (at least for now)
In-Reply-To: <52D83D60.6060106@hastings.org>
References: <52D83D60.6060106@hastings.org>
Message-ID: <52D84D28.9030302@hastings.org>

On 01/16/2014 12:13 PM, Larry Hastings wrote:
>
>
> The current tally of votes, by order of popularity:
>
>     Side file: +6
>     Buffer: +1.5
>     Multiple buffers, Modified buffer, Forward buffer: +1
>     Original: -5
>

I should add, that's out of a total of eleven votes cast. So the side
file was a clear winner but far from unanimous.

Since the votes were all public, the tally might as well be too.
Here it is in handy "CSV" format:

"Names", "Original", "Side File", "Buffer", "Multiple Buffers", "Modified Buffer", "Forward Buffer"
"Totals", -5, 6, 1.5, 1, 1, 1
"Brett Cannon", 0, 0, 1, 1, 0, 0
"Antoine Pitrou", -0.5, 1, 0, 0, 0, 0
"Raymond Hettinger", -1, 1, 0, 0, 0, 0
"Zachary Ware", -1, 0, 1, 1, 0, 0
"Georg Brandl", -1, 0, 0, 1, 0, 0
"Serhiy Storchaka", -1, 1, 0, 0, 0, 0
"Yury Selivanov", 0, 1, -1, -1, -1, 0
"Ryan Smith-Roberts", 1, 0, 0, -1, 1, 1
"Ethan Furman", 0, 0, 0.5, 0, 1, 0
"Meador Inge", -0.5, 1, 0, 0, 0, 0
"Stefan Krah", -1, 1, 0, 0, 0, 0

//arry/

From g.brandl at gmx.net Thu Jan 16 22:24:45 2014
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 16 Jan 2014 22:24:45 +0100
Subject: [Python-Dev] Closing the Clinic output format debate (at least for now)
In-Reply-To: <52D84D28.9030302@hastings.org>
References: <52D83D60.6060106@hastings.org> <52D84D28.9030302@hastings.org>
Message-ID:

On 16.01.2014 22:20, Larry Hastings wrote:
> On 01/16/2014 12:13 PM, Larry Hastings wrote:
>>
>>
>> The current tally of votes, by order of popularity:
>>
>> Side file: +6
>> Buffer: +1.5
>> Multiple buffers, Modified buffer, Forward buffer: +1
>> Original: -5
>>
>
> I should add, that's out of a total of eleven votes cast. So the side file was
> a clear winner but far from unanimous.
>
> Since the votes were all public, the tally might as well be too.
Here it is in
> handy "CSV" format:
>
> "Names", "Original", "Side File", "Buffer", "Multiple Buffers", "Modified Buffer", "Forward Buffer"
> "Totals", -5, 6, 1.5, 1, 1, 1
> "Brett Cannon", 0, 0, 1, 1, 0, 0
> "Antoine Pitrou", -0.5, 1, 0, 0, 0, 0
> "Raymond Hettinger", -1, 1, 0, 0, 0, 0
> "Zachary Ware", -1, 0, 1, 1, 0, 0
> "Georg Brandl", -1, 0, 0, 1, 0, 0
> "Serhiy Storchaka", -1, 1, 0, 0, 0, 0
> "Yury Selivanov", 0, 1, -1, -1, -1, 0
> "Ryan Smith-Roberts", 1, 0, 0, -1, 1, 1
> "Ethan Furman", 0, 0, 0.5, 0, 1, 0
> "Meador Inge", -0.5, 1, 0, 0, 0, 0
> "Stefan Krah", -1, 1, 0, 0, 0, 0

Although this is neglecting the difference between +0 and -0 :)

Georg

From larry at hastings.org Thu Jan 16 22:32:41 2014
From: larry at hastings.org (Larry Hastings)
Date: Thu, 16 Jan 2014 13:32:41 -0800
Subject: [Python-Dev] Closing the Clinic output format debate (at least for now)
In-Reply-To:
References: <52D83D60.6060106@hastings.org> <52D84D28.9030302@hastings.org>
Message-ID: <52D84FF9.5050705@hastings.org>

On 01/16/2014 01:24 PM, Georg Brandl wrote:
> Although this is neglecting the difference between +0 and -0 :)

I hear LibreOffice is accepting external patches again.

//arry/

From rmsr at lab.net Thu Jan 16 22:08:47 2014
From: rmsr at lab.net (Ryan Smith-Roberts)
Date: Thu, 16 Jan 2014 13:08:47 -0800
Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments
In-Reply-To: <52D83DDB.9030205@hotpy.org>
References: <52D76EFF.6080605@hastings.org> <52D7CEBF.7020309@mrabarnett.plus.com> <52D83663.8040403@hastings.org> <52D83DDB.9030205@hotpy.org>
Message-ID:

Let me expand on the issue, and address some of the replies.

The goal of Argument Clinic is to create new docstring signatures for builtins, with the following properties:

1) Useful.
While one can create a signature of func(*args) and then document complex and arbitrary restrictions on what args contains, that isn't helpful to the end-user examining the docstring, or to automated tools.

2) Inspectable. For a signature to be compatible with inspect.signature(), it *must be a valid native Python declaration*. This means no optional positional arguments of the form func(foo[, bar]), and no non-Python default values.

3) Correct. The semantics of the builtin's signature should match the expectations users have about pure Python declarations.

There are two classes of builtins whose signatures do not have these properties. The first is those with very weird signatures, like curses.window.addstr(). It's fine that those don't get converted, they're hopeless.

A second class is builtins with "almost but not quite" usable signatures, mostly the ones with optional positional parameters. It would be nice to "rescue" those builtins.

So, let us return to my original example, getservbyname(). Its current signature:

socket.getservbyname(servicename[, protocolname])

This is not an inspectable signature, since pure Python does not support bracketed arguments. To make it inspectable, we must give protocolname a (valid Python) default value:

socket.getservbyname(servicename, protocolname=None)

Unfortunately, while useful and inspectable, this signature is not correct. For a pure Python function, passing None for protocolname is the same as omitting it. However, if you pass None to getservbyname(), it raises a TypeError. So, we have these three options:

1) Don't give getservbyname() an inspectable signature.
2) Lie to the user about the acceptability of None.
3) Alter the semantics of getservbyname() to treat None as equivalent to omitting protocolname.

Obviously #2 is out. My question: is #3 ever acceptable? It's a real change, as it breaks any code that relies on the TypeError exception.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From guido at python.org Thu Jan 16 22:59:12 2014 From: guido at python.org (Guido van Rossum) Date: Thu, 16 Jan 2014 13:59:12 -0800 Subject: [Python-Dev] PEP 461 updates In-Reply-To: <52D84C8C.6090408@udel.edu> References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> Message-ID: On Thu, Jan 16, 2014 at 1:18 PM, Terry Reedy wrote: > On 1/16/2014 5:11 AM, Nick Coghlan wrote: > >> Guido's successful counter was to point out that the parsing of the >> format string itself assumes ASCII compatible data, > > Did you see my explanation, which I wrote in response to one of your earlier > posts, of why I think "the parsing of the format string itself assumes ASCII > compatible data" that statement is confused and wrong? The above seems to > say that what I wrote is impossible, but perhaps I misunderstand what Guido > and you mean. Among my questions are "by data, do you mean interpolated > objects or interpolated bytes?" and "what restriction on 'data' do you > intend by 'ASCII compatible'?". Can you move the meta-discussion off-list? I'm getting tired of "did you understand what I said". -- --Guido van Rossum (python.org/~guido) From guido at python.org Thu Jan 16 23:01:07 2014 From: guido at python.org (Guido van Rossum) Date: Thu, 16 Jan 2014 14:01:07 -0800 Subject: [Python-Dev] Closing the Clinic output format debate (at least for now) In-Reply-To: <52D83D60.6060106@hastings.org> References: <52D83D60.6060106@hastings.org> Message-ID: I am tired of being the only blocker. So I withdraw my preference. Do what you all can agree on without me. On Thu, Jan 16, 2014 at 12:13 PM, Larry Hastings wrote: > > > The current tally of votes, by order of popularity: > > Side file: +6 > Buffer: +1.5 > Multiple buffers, Modified buffer, Forward buffer: +1 > Original: -5 > > > However, as stated, support for "side files" will not go in unless Guido > explicitly states that it's okay with him. He has not. 
Therefore it's not > going in. If you want this feature, take it up with our BDFL. I feel my > hands are tied. > > Second-best is all the buffer approaches, collectively. Since there was no > clear winner, I'm going to make the new default the "modified buffer" > approach, as that's the only one that does not require rearranging your code > to use. However, to encourage continued experimentation, I'm going to leave > in the configurability (at least for now), so people can keep experimenting. > Maybe we'll find something in the future that's a clear new favorite. > > As a stretch goal, I'd like to also add Zachary Ware's proposed "forward" > buffer, as a further concession to experimentation. It shouldn't be too > messy, but if it gets out of hand I'll back out of it. > > Finally, I'm going to add support for "presets" so you can switch between > original / modified buffer / buffer / forward buffer with just one > statement. (Multiple buffers doesn't need a different preset.) > > I'll also keep the line prefix (and add a line suffix too) and see if a > prefix of "/*clinic*/" helps. > > > /arry > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Thu Jan 16 23:01:56 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 16 Jan 2014 17:01:56 -0500 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments In-Reply-To: References: <52D76EFF.6080605@hastings.org> <52D77462.8060101@hastings.org> Message-ID: > On Thu, Jan 16, 2014 at 1:42 AM, Terry Reedy wrote: >>>>> itertools.repeat('a', -1) >> repeat('a', 0) >>>>> itertools.repeat('a', times=-1) >> repeat('a') >>>>> itertools.repeat('a', times=-2) >> repeat('a', -2) >> The first line is correct in both behavior and representation. 
>> The second line behavior (and corresponding repr) are wrong. >> The third line repr is wrong but the behavior is like the first. >> >>> [1] http://bugs.python.org/issue19145 On 1/16/2014 1:42 PM, Guido van Rossum wrote: > If I had complete freedom in redefining the spec I would treat > positional and keyword the same, interpret absent or None to mean > "forever" and explicit negative integers to mean the same as zero, and > make repr show a positional integer >= 0 if the repeat isn't None. > > But I don't know if that's too much of a change. I copied the unsnipped stuff above to a tracker message. http://bugs.python.org/issue19145 -- Terry Jan Reedy From zuo at chopin.edu.pl Thu Jan 16 23:06:37 2014 From: zuo at chopin.edu.pl (Jan Kaliszewski) Date: Thu, 16 Jan 2014 23:06:37 +0100 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> Message-ID: 16.01.2014 17:33, Michael Urman wrote: > On Thu, Jan 16, 2014 at 8:45 AM, Brett Cannon > wrote: >> Fine, if you're worried about bytes.format() overstepping by >> implicitly >> calling str.encode() on the return value of __format__() then you >> will need >> __bytes__format__() to get equivalent support. > > Could we just re-use PEP-3101's note (easily updated for Python 3): > > Note for Python 2.x: The 'format_spec' argument will be either > a string object or a unicode object, depending on the type of the > original format string. The __format__ method should test the > type > of the specifiers parameter to determine whether to return a > string or > unicode object. It is the responsibility of the __format__ > method > to return an object of the proper type. > > If __format__ receives a format_spec of type bytes, it should return > bytes. For such cases on objects that cannot support bytes (i.e. for > str), it can raise. This appears to avoid the need for additional > methods. 
(As does Nick's proposal of leaving it out for now.) -1. I'd treat the format()+.__format__()+str.format()-"ecosystem" as a nice text-data-oriented, *complete* Py3k feature, backported to Python 2 to share the benefits of the feature with it as well as to make the 2-to-3 transition a bit easier. IMHO, the PEP-3101's note cited above just describes a workaround over the flaws of the Py2's obsolete text model. Moving such complications into Py3k would make the feature (and especially the ability to implement your own .__format__()) harder to understand and make use of -- for little profit. Such a move is not needed for compatibility. And, IMHO, the format()/__format__()/str.format()-matter is all about nice and flexible *text* formatting, not about binary data interpolation. 16.01.2014 10:56, Nick Coghlan wrote: > I have a different proposal: let's *just* add mod formatting to > bytes, and leave the extensible formatting system as a text only > operation. > > We don't really care if bytes supports that method for version > compatibility purposes, and the deliberate flexibility of the design > makes it hard to translate into the binary domain. > > So let's just not provide that - let's accept that, for the binary > domain, printf style formatting is just a better fit for the job :) +1! However, I am not sure if %s should be limited to bytes-like objects. As "practicality beats purity", I would be +0.5 for enabling the following: - input type supports Py_buffer? use it to collect the necessary bytes - input type has the __bytes__() method? use it to collect the necessary bytes - input type has the encode() method? raise TypeError - otherwise: use something equivalent to ascii(obj).encode('ascii') (note that it would nicely format numbers + format other object in more-or-less useful way without the fear of encountering a non-ascii data). another option: use str()-representation of strictly defined types, e.g.: int, float, decimal.Decimal, fractions.Fraction... Cheers. 
*j From tseaver at palladion.com Thu Jan 16 23:12:32 2014 From: tseaver at palladion.com (Tres Seaver) Date: Thu, 16 Jan 2014 17:12:32 -0500 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments In-Reply-To: References: <52D76EFF.6080605@hastings.org> <52D7CEBF.7020309@mrabarnett.plus.com> <52D83663.8040403@hastings.org> <52D83DDB.9030205@hotpy.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/16/2014 04:08 PM, Ryan Smith-Roberts wrote: > [L]et us return to my original example, getservbyname(). Its current > signature: > > socket.getservbyname(servicename[, protocolname]) > > This is not an inspectable signature, since pure Python does not > support bracketed arguments. To make it inspectable, we must give > protocolname a (valid Python) default value: > > socket.getservbyname(servicename, protocolname=None) > > Unfortunately, while useful and inspectable, this signature is not > correct. For a pure Python function, passing None for protocolname is > the same as omitting it. However, if you pass None to getservbyname(), > it raises a TypeError. So, we have these three options: > > 1) Don't give getservbyname() an inspectable signature. 2) Lie to the > user about the acceptability of None. 3) Alter the semantics of > getservbyname() to treat None as equivalent to omitting protocolname. > > Obviously #2 is out. My question: is #3 ever acceptable? It's a real > change, as it breaks any code that relies on the TypeError exception. +1 for #3, especially in a new "major" release (w/ sufficient documentation of the change). Tres. 
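A minimal sketch of what option #3 means in pure-Python terms, assuming a hypothetical function `lookup_service` and a made-up in-memory table `_SERVICES` in place of the real services database (this is not the socket module's implementation):

```python
import inspect

# Hypothetical lookup table standing in for the system services database.
_SERVICES = {("http", "tcp"): 80, ("domain", "udp"): 53}


def lookup_service(servicename, protocolname=None):
    """Hypothetical stand-in for option #3: None means the same as omitting protocolname."""
    for (name, proto), port in _SERVICES.items():
        # A protocolname of None matches any protocol, exactly as if it were left out.
        if name == servicename and protocolname in (None, proto):
            return port
    raise OSError("service/proto not found")


# The declaration is plain Python, so inspect.signature() can describe it.
signature_text = str(inspect.signature(lookup_service))
print(signature_text)
print(lookup_service("http"), lookup_service("http", None), lookup_service("http", "tcp"))
```

With this shape the advertised default is truthful: callers who pass None explicitly get the same result as callers who omit the argument, instead of a TypeError.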
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlLYWU8ACgkQ+gerLs4ltQ6obQCglHmIM4kcNOQte7jj9NjL6Xia KQwAn2ircAlSR6iwFIAt6PDz0bs6iIDt =G+GC -----END PGP SIGNATURE----- From tjreedy at udel.edu Thu Jan 16 23:34:09 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 16 Jan 2014 17:34:09 -0500 Subject: [Python-Dev] PEP 461 updates In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> Message-ID: <52D85E61.8040701@udel.edu> On 1/16/2014 4:59 PM, Guido van Rossum wrote: > I'm getting tired of "did you understand what I said". I was asking whether I needed to repeat myself, but forget that. I was also saying that while I understand 'ascii-compatible encoding', I do not understand the notion of 'ascii-compatible data' or statements based on it. From larry at hastings.org Fri Jan 17 01:01:18 2014 From: larry at hastings.org (Larry Hastings) Date: Thu, 16 Jan 2014 16:01:18 -0800 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments In-Reply-To: References: <52D76EFF.6080605@hastings.org> <52D7CEBF.7020309@mrabarnett.plus.com> <52D83663.8040403@hastings.org> <52D83DDB.9030205@hotpy.org> Message-ID: <52D872CE.90601@hastings.org> On 01/16/2014 01:08 PM, Ryan Smith-Roberts wrote: > There are two classes of builtins whose signatures do not have these > properties. The first is those with very weird signatures, like > curses.window.addstr(). It's fine that those don't get converted, > they're hopeless. Speaking as the father of Argument Clinic, I disagree. My goal with Clinic is to convert every function in CPython whose semantics can be expressed with a PyArg_ parsing function. 
For example, curses.window.addstr could be converted just fine. Its signature is exactly the same as curses.window.addch which has already been converted. socket.sendto eludes me for now--but I haven't given up yet. Don't give up hope, //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Jan 17 01:26:21 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 17 Jan 2014 10:26:21 +1000 Subject: [Python-Dev] PEP 461 updates In-Reply-To: <52D85E61.8040701@udel.edu> References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> <52D85E61.8040701@udel.edu> Message-ID: On 17 Jan 2014 09:36, "Terry Reedy" wrote: > > On 1/16/2014 4:59 PM, Guido van Rossum wrote: > >> I'm getting tired of "did you understand what I said". > > > I was asking whether I needed to repeat myself, but forget that. > I was also saying that while I understand 'ascii-compatible encoding', I do not understand the notion of 'ascii-compatible data' or statements based on it. There are plenty of data formats (like SMTP and HTTP) that are constrained to be ASCII compatible, either globally, or locally in the parts being manipulated by an application (such as a file header). ASCII incompatible segments may be present, but in ways that allow the data processing to handle them correctly. The ASCII assuming methods on bytes objects are there to help in dealing with that kind of data. If the binary data is just one large block in a single text encoding, it's generally easier to just decode it to text, but multipart formats generally don't allow that. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From greg.ewing at canterbury.ac.nz Fri Jan 17 02:32:27 2014 From: greg.ewing at canterbury.ac.nz (Greg) Date: Fri, 17 Jan 2014 14:32:27 +1300 Subject: [Python-Dev] PEP 461 updates In-Reply-To: <52D84C8C.6090408@udel.edu> References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> Message-ID: <52D8882B.1060303@canterbury.ac.nz> On 17/01/2014 10:18 a.m., Terry Reedy wrote: > On 1/16/2014 5:11 AM, Nick Coghlan wrote: > >> Guido's successful counter was to point out that the parsing of the >> format string itself assumes ASCII compatible data, Nick's initial arguments against bytes formatting were very abstract and philosophical, along the lines that it violated some pure mental model of text/bytes separation. Then Guido said something that Nick took to be an equal and opposite philosophical argument that cancelled out his original objections, and he withdrew them. I don't think it matters whether the internal details of that debate make sense to the rest of us. The main thing is that a consensus seems to have been reached on bytes formatting being basically a good thing. -- Greg From ethan at stoneleaf.us Fri Jan 17 02:51:43 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 16 Jan 2014 17:51:43 -0800 Subject: [Python-Dev] PEP 461 updates In-Reply-To: <52D8882B.1060303@canterbury.ac.nz> References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> <52D8882B.1060303@canterbury.ac.nz> Message-ID: <52D88CAF.6090406@stoneleaf.us> On 01/16/2014 05:32 PM, Greg wrote: > > I don't think it matters whether the internal details of that > debate make sense to the rest of us. The main thing is that > a consensus seems to have been reached on bytes formatting > being basically a good thing. And a good thing, too, on both counts! 
:)

A few folks have suggested not implementing .format() on bytes; I've been resistant, but then I remembered that format is also a function.

http://docs.python.org/3/library/functions.html?highlight=ascii#format
======================================================================
format(value[, format_spec])

Convert a value to a "formatted" representation, as controlled by format_spec. The interpretation of format_spec will depend on the type of the value argument, however there is a standard formatting syntax that is used by most built-in types: Format Specification Mini-Language.

The default format_spec is an empty string which usually gives the same effect as calling str(value).

A call to format(value, format_spec) is translated to type(value).__format__(format_spec) which bypasses the instance dictionary when searching for the value's __format__() method. A TypeError exception is raised if the method is not found or if either the format_spec or the return value are not strings.
======================================================================

Given that, I can relent on .format and just go with .__mod__ . A low-level service for a low-level protocol, what? ;)

--
~Ethan~

From stephen at xemacs.org Fri Jan 17 03:19:44 2014
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 17 Jan 2014 11:19:44 +0900
Subject: [Python-Dev] PEP 461 updates
In-Reply-To:
References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> <52D85E61.8040701@udel.edu>
Message-ID: <874n53cptr.fsf@uwakimon.sk.tsukuba.ac.jp>

Meta enough that I'll take Guido out of the CC.
Nick Coghlan writes: > There are plenty of data formats (like SMTP and HTTP) that are > constrained to be ASCII compatible, "ASCII compatible" is a technical term in encodings, which means "bytes in the range 0-127 always have ASCII coded character semantics, do what you like with bytes in the range 128-255."[1] Worse, it's clearly confusing in this discussion. Let's stop using this term to mean the data format has elements that are defined to contain only bytes with ASCII coded character semantics (which is the relevant restriction AFAICS -- I don't know of any ASCII-compatible formats where the bytes 128-255 are used for any purpose other than encoding non-ASCII characters). OTOH, if it *is* an ASCII-compatible text encoding, the semantics are dubious if the bytes versions of many of these methods/operations are used. A documentation suggestion: It's easy enough to rewrite > constrained to be ASCII compatible, either globally, or locally in > the parts being manipulated by an application (such as a file > header). ASCII incompatible segments may be present, but in ways > that allow the data processing to handle them correctly. as containing 'well-defined segments constrained to be (strictly) ASCII-encoded' (aka ASCII segments). And then you can say are designed for use *only* on bytes that are ASCII segments; use on other data is likely to cause hard-to-diagnose corruption. If there are other use cases for "ASCII-compatible data formats" as defined above (not worrying about codecs, because they are a very small minority of code-to-be-written at this point), I don't know about them. Does anyone? If there are any, I'll be happy to revise. If not, that seems to be a precise and intelligible statement of the restrictions that is useful to the practical use cases. And nothing stops users who think they know what they're doing from using them in other contexts (which can be documented if they turn out to be broadly useful). 
Footnotes:
[1] "ASCII coded character semantics" is of course mildly ambiguous due to considerations like EOL conventions. But "you know what I'm talking about".

From stephen at xemacs.org Fri Jan 17 03:31:04 2014
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 17 Jan 2014 11:31:04 +0900
Subject: [Python-Dev] PEP 461 updates
In-Reply-To: <52D8882B.1060303@canterbury.ac.nz>
References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> <52D8882B.1060303@canterbury.ac.nz>
Message-ID: <871u07cpav.fsf@uwakimon.sk.tsukuba.ac.jp>

Greg writes:

> I don't think it matters whether the internal details of [the EIBTI
> vs. PBP] debate make sense to the rest of us. The main thing is
> that a consensus seems to have been reached on bytes formatting
> being basically a good thing.

I think some of it matters to the documentation.

From ncoghlan at gmail.com Fri Jan 17 03:52:15 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 17 Jan 2014 12:52:15 +1000
Subject: [Python-Dev] Closing the Clinic output format debate (at least for now)
In-Reply-To:
References: <52D83D60.6060106@hastings.org>
Message-ID:

On 17 January 2014 08:01, Guido van Rossum wrote:
> I am tired of being the only blocker. So I withdraw my preference. Do
> what you all can agree on without me.

I had been staying out of the debate because I haven't had time to participate in the derby yet (if nobody has claimed the builtins yet, I was planning to do that this weekend).

However, reviewing the changes for http://bugs.python.org/issue20189 has now been enough to convince me that a separate generated file is the way to go. My rationale is the way it affects the code review process: with a separate file, I can skip to the next file in the review as soon as I see ".clinic" in the file name. We may even be able to teach Rietveld to skip over clinic files (or at least suggest skipping them) automatically.
With the current intermingled hand written + generated format, I can't tell just from the file name whether or not there are manual changes I need to review. Fortunately, in this particular case, Larry provided a list of the files with real changes in them, but I now think it makes more sense to instead bake the "this is all generated code, if you have reviewed the input changes and trust argument clinic to do the right thing, you can just skip reviewing it" notification directly into the filenames. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From nas at arctrix.com Fri Jan 17 04:36:19 2014 From: nas at arctrix.com (Neil Schemenauer) Date: Fri, 17 Jan 2014 03:36:19 +0000 (UTC) Subject: [Python-Dev] PEP 461 updates References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> <52D8882B.1060303@canterbury.ac.nz> Message-ID: Greg wrote: > I don't think it matters whether the internal details of that > debate make sense to the rest of us. The main thing is that > a consensus seems to have been reached on bytes formatting > being basically a good thing. I've been mostly steering clear of the metaphysical and writing code today. ;-) An extremely rough patch has been uploaded: http://bugs.python.org/issue20284 I have a new one almost ready that introduces __ascii__ rather than overloading __format__. I like it better, will upload to issue tracker soon. 
Regards,

Neil

From steve at pearwood.info Fri Jan 17 06:36:11 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 17 Jan 2014 16:36:11 +1100
Subject: [Python-Dev] PEP 461 updates
In-Reply-To: <874n53cptr.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> <52D85E61.8040701@udel.edu> <874n53cptr.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20140117053611.GB3915@ando>

On Fri, Jan 17, 2014 at 11:19:44AM +0900, Stephen J. Turnbull wrote:
> Meta enough that I'll take Guido out of the CC.
>
> Nick Coghlan writes:
>
> > There are plenty of data formats (like SMTP and HTTP) that are
> > constrained to be ASCII compatible,
>
> "ASCII compatible" is a technical term in encodings, which means
> "bytes in the range 0-127 always have ASCII coded character semantics,
> do what you like with bytes in the range 128-255."[1]

Examples, and counter-examples, may help. Let me see if I have got this right: an ASCII-compatible encoding may be an ASCII-superset like Latin-1, or a variable-width encoding like UTF-8 where the ASCII chars are encoded to the same bytes as ASCII, and non-ASCII chars are not. A counter-example would be UTF-16, or some of the Asian encodings like Big5. Am I right so far?

But Nick isn't talking about an encoding, he's talking about a data format. I think that an ASCII-compatible format means one where (in at least *some* parts of the data) bytes between 0 and 127 have the same meaning as in ASCII, e.g. byte 84 is to be interpreted as ASCII character "T". This doesn't mean that every byte 84 means "T", only that some of them do -- hopefully in well-defined sections of the data. Below, you introduce the term "ASCII segments" for these.

> Worse, it's clearly confusing in this discussion.
Let's stop using
> this term to mean
>
> the data format has elements that are defined to contain only
> bytes with ASCII coded character semantics
>
> (which is the relevant restriction AFAICS -- I don't know of any
> ASCII-compatible formats where the bytes 128-255 are used for any
> purpose other than encoding non-ASCII characters). OTOH, if it *is*
> an ASCII-compatible text encoding, the semantics are dubious if the
> bytes versions of many of these methods/operations are used.
>
> A documentation suggestion: It's easy enough to rewrite
>
> > constrained to be ASCII compatible, either globally, or locally in
> > the parts being manipulated by an application (such as a file
> > header). ASCII incompatible segments may be present, but in ways
> > that allow the data processing to handle them correctly.
>
> as
>
> containing 'well-defined segments constrained to be (strictly)
> ASCII-encoded' (aka ASCII segments).
>
> And then you can say
>
> are designed for use *only* on bytes
> that are ASCII segments; use on other data is likely to cause
> hard-to-diagnose corruption.

An example: if you have the byte b'\x63', calling upper() on that will return b'\x43'. That is only meaningful if the byte is intended as the ASCII character "c".

> Footnotes:
> [1] "ASCII coded character semantics" is of course mildly ambiguous
> due to considerations like EOL conventions. But "you know what I'm
> talking about".

I think I know what you're talking about, but don't know for sure unless I explain it back to you.
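The upper() example above can be tried directly. The following is a small sketch; the header name and the 16-bit integer payload are invented sample data, chosen only to contrast an ASCII segment with bytes that merely happen to fall in the ASCII letter range.

```python
import struct

# An ASCII segment: these bytes really are ASCII text, so upper() is meaningful.
header_name = b"content-type"
upper_header = header_name.upper()

# The single-byte case from the message: b'\x63' is "c", b'\x43' is "C".
single = b"\x63".upper()

# Binary payload: a little-endian 16-bit integer whose two bytes are both 0x63.
payload = struct.pack("<H", 0x6363)
damaged = payload.upper()  # silently rewrites both bytes to 0x43
original_value = struct.unpack("<H", payload)[0]
damaged_value = struct.unpack("<H", damaged)[0]

print(upper_header, single, hex(original_value), hex(damaged_value))
```

Nothing raises in the last case; the integer simply changes from 0x6363 to 0x4343, which is the kind of hard-to-diagnose corruption the proposed documentation wording warns about.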
-- Steven From ncoghlan at gmail.com Fri Jan 17 06:46:15 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 17 Jan 2014 15:46:15 +1000 Subject: [Python-Dev] PEP 461 updates In-Reply-To: <52D88CAF.6090406@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> <52D8882B.1060303@canterbury.ac.nz> <52D88CAF.6090406@stoneleaf.us> Message-ID: On 17 January 2014 11:51, Ethan Furman wrote: > On 01/16/2014 05:32 PM, Greg wrote: >> >> >> I don't think it matters whether the internal details of that >> debate make sense to the rest of us. The main thing is that >> a consensus seems to have been reached on bytes formatting >> being basically a good thing. > > > And a good thing, too, on both counts! :) > > A few folks have suggested not implementing .format() on bytes; I've been > resistant, but then I remembered that format is also a function. > > http://docs.python.org/3/library/functions.html?highlight=ascii#format > ====================================================================== > format(value[, format_spec]) > > Convert a value to a ?formatted? representation, as controlled by > format_spec. The interpretation of format_spec will depend on the type of > the value argument, however there is a standard formatting syntax that is > used by most built-in types: Format Specification Mini-Language. > > The default format_spec is an empty string which usually gives the same > effect as calling str(value). > > A call to format(value, format_spec) is translated to > type(value).__format__(format_spec) which bypasses the instance dictionary > when searching for the value?s __format__() method. A TypeError exception is > raised if the method is not found or if either the format_spec or the return > value are not strings. > ====================================================================== > > Given that, I can relent on .format and just go with .__mod__ . 
A low-level > service for a low-level protocol, what? ;) Exactly - while I'm a fan of the new extensible formatting system and strongly prefer it to printf-style formatting for text, it also has a whole lot of complexity that is hard to translate to the binary domain, including the format() builtin and __format__ methods. Since the relevant use cases appear to be already covered adequately by printf-style formatting, attempting to translate the flexible text formatting system as well just becomes additional complexity we don't need. I like Stephen Turnbull's suggestion of using "binary formats with ASCII segments" to distinguish the kind of formats we're talking about from ASCII compatible text encodings, and I think Python 3.5 will end up with a suite of solutions that suitably covers all use cases, just by bringing back printf-style formatting directly to bytes: * format(), str.format(), str.format_map(): a rich extensible text formatting system, including date interpolation support * str.__mod__: retained primarily for backwards compatibility, may occasionally be used as a text formatting optimisation tool (since the inflexibility means it will likely always be marginally faster than the rich formatting system for the cases that it covers) * bytes.__mod__, bytearray.__mod__: restored in Python 3.5 to simplify production of data in variable length binary formats that contain ASCII segments * the struct module: rich (but not extensible) formatting system for fixed length binary formats In Python 2, the binary format with ASCII segments use case was intermingled with general purpose text formatting on the str type, which is I think the main reason it has taken us so long to convince ourselves it is something that is genuinely worth bringing back in a more limited form in Python 3, rather than just being something we wanted back because we were used to having it in Python 2. Cheers, Nick. 
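The four tools in the list above can be sketched side by side. This assumes an interpreter where bytes.__mod__ exists as proposed in PEP 461 (Python 3.5 or later); the header name, field values and struct layout are made up for illustration.

```python
import struct

# Rich text formatting: str.format() and the format mini-language.
text = "{:>8.2f}".format(3.14159)

# printf-style text formatting via str.__mod__.
legacy = "%d items" % 3

# printf-style interpolation directly into bytes (per PEP 461), for a
# variable length binary format that contains an ASCII segment.
header = b"Content-Length: %d\r\n" % 42

# struct: a fixed length binary record, here a big-endian unsigned
# 16-bit field followed by an unsigned 32-bit field.
record = struct.pack(">HI", 1, 42)

print(repr(text), repr(legacy), header, record)
```

The split follows the message: text goes through the str tools, wire data with ASCII segments through bytes %, and fixed layouts through struct.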
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From v+python at g.nevcal.com Fri Jan 17 07:13:09 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 16 Jan 2014 22:13:09 -0800 Subject: [Python-Dev] PEP 461 updates In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> <52D8882B.1060303@canterbury.ac.nz> <52D88CAF.6090406@stoneleaf.us> Message-ID: <52D8C9F5.4000108@g.nevcal.com> On 1/16/2014 9:46 PM, Nick Coghlan wrote: > On 17 January 2014 11:51, Ethan Furman wrote: >> On 01/16/2014 05:32 PM, Greg wrote: >>> >>> I don't think it matters whether the internal details of that >>> debate make sense to the rest of us. The main thing is that >>> a consensus seems to have been reached on bytes formatting >>> being basically a good thing. >> >> And a good thing, too, on both counts! :) >> >> A few folks have suggested not implementing .format() on bytes; I've been >> resistant, but then I remembered that format is also a function. >> >> http://docs.python.org/3/library/functions.html?highlight=ascii#format >> ====================================================================== >> format(value[, format_spec]) >> >> Convert a value to a "formatted" representation, as controlled by >> format_spec. The interpretation of format_spec will depend on the type of >> the value argument, however there is a standard formatting syntax that is >> used by most built-in types: Format Specification Mini-Language. >> >> The default format_spec is an empty string which usually gives the same >> effect as calling str(value). >> >> A call to format(value, format_spec) is translated to >> type(value).__format__(format_spec) which bypasses the instance dictionary >> when searching for the value's __format__() method. A TypeError exception is >> raised if the method is not found or if either the format_spec or the return >> value are not strings. 
>> ====================================================================== >> >> Given that, I can relent on .format and just go with .__mod__ . A low-level >> service for a low-level protocol, what? ;) > Exactly - while I'm a fan of the new extensible formatting system and > strongly prefer it to printf-style formatting for text, it also has a > whole lot of complexity that is hard to translate to the binary > domain, including the format() builtin and __format__ methods. > > Since the relevant use cases appear to be already covered adequately > by printf-style formatting, attempting to translate the flexible text > formatting system as well just becomes additional complexity we don't > need. > > I like Stephen Turnbull's suggestion of using "binary formats with > ASCII segments" to distinguish the kind of formats we're talking about > from ASCII compatible text encodings, I liked that too, and almost said so on his posting, but will say it here, instead. > and I think Python 3.5 will end > up with a suite of solutions that suitably covers all use cases, just > by bringing back printf-style formatting directly to bytes: > > * format(), str.format(), str.format_map(): a rich extensible text > formatting system, including date interpolation support > * str.__mod__: retained primarily for backwards compatibility, may > occasionally be used as a text formatting optimisation tool (since the > inflexibility means it will likely always be marginally faster than > the rich formatting system for the cases that it covers) > * bytes.__mod__, bytearray.__mod__: restored in Python 3.5 to simplify > production of data in variable length binary formats that contain > ASCII segments > * the struct module: rich (but not extensible) formatting system for > fixed length binary formats Adding format codes with variable length could enhance the struct module to additional uses. 
C structs, on which it is modeled, often get around the difficulty of variable length items by defining one variable length item at the end, or by defining offsets in the fixed part to variable length parts that follow. Such a structure cannot presently be created by struct alone. > In Python 2, the binary format with ASCII segments use case was > intermingled with general purpose text formatting on the str type, > which is I think the main reason it has taken us so long to convince > ourselves it is something that is genuinely worth bringing back in a > more limited form in Python 3, rather than just being something we > wanted back because we were used to having it in Python 2. > > Cheers, > Nick. > From steve at pearwood.info Fri Jan 17 08:47:37 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 17 Jan 2014 18:47:37 +1100 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D80771.1090402@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> Message-ID: <20140117074737.GD3915@ando> On Thu, Jan 16, 2014 at 08:23:13AM -0800, Ethan Furman wrote: > As I understand it, str.format will call the object's __format__. So, for > example, if I say: > > u'the value is: %d' % myNum(17) > > then it will be myNum.__format__ that gets called, not int.__format__; I seem to have missed something, because I am completely confused... Why are you talking about str.format and then showing an example using % instead? %d calls __str__, not __format__. This is in Python 3.3: py> class MyNum(int): ... def __str__(self): ... print("Calling MyNum.__str__") ... return super().__str__() ... def __format__(self, format_spec): ... print("Calling MyNum.__format__") ... return super().__format__(format_spec) ... 
py> n = MyNum(17) py> u"%d" % n Calling MyNum.__str__ '17' By analogy, if we have a bytes %d formatting, surely it should either: (1) call type(n).__bytes__(n), which is guaranteed to raise if the result isn't ASCII (i.e. like len() raises if the result isn't an int); or (2) call type(n).__str__(n).encode("ascii", "strict"). Personally, I lean towards (2), even though that means you can't have a single class provide an ASCII string to b'%d' and a non-ASCII string to u'%d'. > this > is precisely what we don't want, since can't know that myNum is only going > to return ASCII characters. It seems to me that Consenting Adults applies here. If class MyNum returns a non-ASCII string, then you ought to get a runtime exception, exactly the same as happens with just about every other failure in Python. If you don't want that possible exception, then don't use MyNum, or explicitly wrap it in a call to int: b'the value is: %d' % int(MyNum(17)) The *worst* solution would be to completely ignore MyNum.__str__. That's a nasty violation of the Principle Of Least Surprise, and will lead to confusion ("why isn't my class' __str__ method being called?") and bugs. * Explicit is better than implicit -- better to explicitly wrap MyNum in a call to int() than to have bytes %d automagically do it for you; * Special cases aren't special enough to break the rules -- bytes %d isn't so special that standard Python rules about calling special methods should be ignored; * Errors should never pass silently -- if MyNum does the wrong thing when used with bytes %d, you should get an exception. > This is why I would have bytes.__format__, as part of its parsing, call > int, index, or float depending on the format code; so the above example > would have bytes.__format__ calling int() on myNum(17), The above example you give doesn't have any bytes in it. Can you explain what you meant to say? 
I'm guessing you intended this: b'the value is: %d' % MyNum(17) rather than using u'' as actually given, but I don't really know. -- Steven From ericsnowcurrently at gmail.com Fri Jan 17 09:01:37 2014 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 17 Jan 2014 01:01:37 -0700 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D82529.9020708@trueblade.com> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> Message-ID: On Thu, Jan 16, 2014 at 11:30 AM, Eric V. Smith wrote: > For the first iteration of bytes.format(), I think we should just > support the exact types of int, float, and bytes. It will call the > type's__format__ (with the object as "self") and encode the result to > ASCII. For the stated use case of 2.x compatibility, I suspect this will > cover > 90% of the uses in real code. If we find there are cases where > real code needs additional types supported, we can consider adding > __format_ascii__ (or whatever name we cook up). +1 -eric From ericsnowcurrently at gmail.com Fri Jan 17 09:07:48 2014 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 17 Jan 2014 01:07:48 -0700 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> Message-ID: On Thu, Jan 16, 2014 at 3:06 PM, Jan Kaliszewski wrote: > I'd treat the format()+.__format__()+str.format()-"ecosystem" as > a nice text-data-oriented, *complete* Py3k feature, backported to > Python 2 to share the benefits of the feature with it as well as > to make the 2-to-3 transition a bit easier. > > IMHO, the PEP-3101's note cited above just describes a workaround > over the flaws of the Py2's obsolete text model. 
Moving such > complications into Py3k would make the feature (and especially the > ability to implement your own .__format__()) harder to understand > and make use of -- for little profit. > > Such a move is not needed for compatibility. And, IMHO, the > format()/__format__()/str.format()-matter is all about nice and > flexible *text* formatting, not about binary data interpolation. [disclaimer: I personally don't have many use cases for any bytes formatting.] Yet there is still a strong symmetry between str and bytes that makes bytes easier to use. I don't always use formatting, but when I do I use .format(). :) never-been-a-fan-of-mod-formatting-ly yours, -eric From steve at pearwood.info Fri Jan 17 09:21:06 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 17 Jan 2014 19:21:06 +1100 Subject: [Python-Dev] AC Derby and accepting None for optional positional arguments In-Reply-To: References: <52D76EFF.6080605@hastings.org> <52D7CEBF.7020309@mrabarnett.plus.com> <52D83663.8040403@hastings.org> <52D83DDB.9030205@hotpy.org> Message-ID: <20140117082106.GE3915@ando> On Thu, Jan 16, 2014 at 01:08:47PM -0800, Ryan Smith-Roberts wrote: > socket.getservbyname(servicename[, protocolname]) > > This is not an inspectable signature, since pure Python does not support > bracketed arguments. To make it inspectable, we must give protocolname a > (valid Python) default value: > > socket.getservbyname(servicename, protocolname=None) > > Unfortunately, while useful and inspectable, this signature is not correct. > For a pure Python function, passing None for protocolname is the same as > omitting it. However, if you pass None to getservbyname(), it raises a > TypeError. So, we have these three options: > > 1) Don't give getservbyname() an inspectable signature. > 2) Lie to the user about the acceptability of None. > 3) Alter the semantics of getservbyname() to treat None as equivalent to > omitting protocolname. > > Obviously #2 is out. 
My question: is #3 ever acceptable? It's a real > change, as it breaks any code that relies on the TypeError exception. The answer seems straightforward to me: it should be treated as any other change of behaviour, and judged on a case-by-case basis. I think the bug tracker is the right place to ask. Since it's not a bug fix, it may be able to be changed, but not lightly, and not in a bug-fix release. The fact that the motivation for the behaviour change is Argument Clinic should not change the decision, as far as I can see. Would a feature request "Allow None as default protocolname" be accepted? -- Steven From stephen at xemacs.org Fri Jan 17 10:59:30 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 17 Jan 2014 18:59:30 +0900 Subject: [Python-Dev] PEP 461 updates In-Reply-To: <20140117053611.GB3915@ando> References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> <52D85E61.8040701@udel.edu> <874n53cptr.fsf@uwakimon.sk.tsukuba.ac.jp> <20140117053611.GB3915@ando> Message-ID: <87wqhzapz1.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > On Fri, Jan 17, 2014 at 11:19:44AM +0900, Stephen J. Turnbull wrote: > > "ASCII compatible" is a technical term in encodings, which means > > "bytes in the range 0-127 always have ASCII coded character semantics, > > do what you like with bytes in the range 128-255."[1] > > Examples, and counter-examples, may help. Let me see if I have got this > right: an ASCII-compatible encoding may be an ASCII-superset like > Latin-1, or a variable-width encoding like UTF-8 where the ASCII chars > are encoded to the same bytes as ASCII, and non-ASCII chars are not. A > counter-example would be UTF-16, or some of the Asian encodings like > Big5. Am I right so far? All correct. > But Nick isn't talking about an encoding, he's talking about a data > format. 
I think that an ASCII-compatible format means one where (in at > least *some* parts of the data) bytes between 0 and 127 have the same > meaning as in ASCII, e.g. byte 84 is to be interpreted as ASCII > character "T". This doesn't mean that every byte 84 means "T", only that > some of them do -- hopefully a well-defined sections of the data. Below, > you introduce the term "ASCII segments" for these. Yes, except that I believe Nick, as well as the "file-and-wire guys", strengthen "hopefully well-defined" to just "well-defined". > > are designed for use *only* on bytes > > that are ASCII segments; use on other data is likely to cause > > hard-to-diagnose corruption. > > An example: if you have the byte b'\x63', calling upper() on that will > return b'\x43'. That is only meaningful if the byte is intended as the > ASCII character "c". Good example. From nas at arctrix.com Fri Jan 17 11:49:42 2014 From: nas at arctrix.com (Neil Schemenauer) Date: Fri, 17 Jan 2014 10:49:42 +0000 (UTC) Subject: [Python-Dev] Migration from Python 2.7 and bytes formatting Message-ID: As I see it, there are two separate goals in adding formatting methods to bytes. One is to make it easier to write new programs that manipulate byte data. Another is to make it easier to upgrade Python 2.x programs to Python 3.x. Here is an idea to better address these separate goals. Introduce %-interpolation for bytes. Support the following format codes to aid in writing new code: %b: insert arbitrary bytes (via __bytes__ or Py_buffer) %[dox]: insert an integer, encoded as ASCII %[eEfFgG]: insert a float, encoded as ASCII %a: call ascii(), insert result Add a command-line option, disabled by default, that enables the following format codes: %s: if the object has __bytes__ or Py_buffer then insert it. 
Otherwise, call str() and encode with the 'ascii' codec %r: call repr(), encode with the 'ascii' codec %[iuX]: as per Python 2.x, for backwards compatibility Introducing these extra codes and the command-line option will provide a more gradual upgrade path. The next step in porting could be to examine each %s inside bytes literals and decide if they should either be converted to %b or if the literal should be converted to a unicode literal. Any %r codes could likely be safely changed to %a. From ncoghlan at gmail.com Fri Jan 17 12:42:59 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 17 Jan 2014 21:42:59 +1000 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> Message-ID: On 17 Jan 2014 18:03, "Eric Snow" wrote: > > On Thu, Jan 16, 2014 at 11:30 AM, Eric V. Smith wrote: > > For the first iteration of bytes.format(), I think we should just > > support the exact types of int, float, and bytes. It will call the > > type's__format__ (with the object as "self") and encode the result to > > ASCII. For the stated use case of 2.x compatibility, I suspect this will > > cover > 90% of the uses in real code. If we find there are cases where > > real code needs additional types supported, we can consider adding > > __format_ascii__ (or whatever name we cook up). > > +1 Please don't make me learn the limitations of a new mini language without a really good reason. For the sake of argument, assume we have a Python 3.5 with bytes.__mod__ restored roughly as described in PEP 461. *Given* that feature set, what is the rationale for *adding* bytes.format? What new capabilities will it provide that aren't already covered by printf-style interpolation directly to bytes or text formatting followed by encoding the result? Cheers, Nick. 
> > -eric > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Fri Jan 17 13:34:13 2014 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 17 Jan 2014 07:34:13 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> Message-ID: <52D92345.5050202@trueblade.com> On 1/17/2014 6:42 AM, Nick Coghlan wrote: > > On 17 Jan 2014 18:03, "Eric Snow" > wrote: >> >> On Thu, Jan 16, 2014 at 11:30 AM, Eric V. Smith > wrote: >> > For the first iteration of bytes.format(), I think we should just >> > support the exact types of int, float, and bytes. It will call the >> > type's__format__ (with the object as "self") and encode the result to >> > ASCII. For the stated use case of 2.x compatibility, I suspect this will >> > cover > 90% of the uses in real code. If we find there are cases where >> > real code needs additional types supported, we can consider adding >> > __format_ascii__ (or whatever name we cook up). >> >> +1 > > Please don't make me learn the limitations of a new mini language > without a really good reason. > > For the sake of argument, assume we have a Python 3.5 with bytes.__mod__ > restored roughly as described in PEP 461. *Given* that feature set, what > is the rationale for *adding* bytes.format? What new capabilities will > it provide that aren't already covered by printf-style interpolation > directly to bytes or text formatting followed by encoding the result? The only reason to add any of this, in my mind, is to ease porting of 2.x code. 
If my proposal covers most of the cases of b''.format() that exist in 2.x code that wants to move to 3.5, then I think it's worth doing. Is there any such code that's blocked from porting by the lack of b''.format() that supports bytes, int, and float? I don't know. I concede that it's unlikely. IF this were a feature that we were going to add to 3.5 on its own merits, I think we add __format_ascii__ and make the whole thing extensible. Is there any new code that's blocked from being written by missing b"".format()? I don't know that, either. Eric. From eric at trueblade.com Fri Jan 17 15:50:00 2014 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 17 Jan 2014 09:50:00 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D92345.5050202@trueblade.com> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> Message-ID: <52D94318.3070506@trueblade.com> On 01/17/2014 07:34 AM, Eric V. Smith wrote: > On 1/17/2014 6:42 AM, Nick Coghlan wrote: >> >> On 17 Jan 2014 18:03, "Eric Snow" > > wrote: >>> >>> On Thu, Jan 16, 2014 at 11:30 AM, Eric V. Smith > > wrote: >>>> For the first iteration of bytes.format(), I think we should just >>>> support the exact types of int, float, and bytes. It will call the >>>> type's__format__ (with the object as "self") and encode the result to >>>> ASCII. For the stated use case of 2.x compatibility, I suspect this will >>>> cover > 90% of the uses in real code. If we find there are cases where >>>> real code needs additional types supported, we can consider adding >>>> __format_ascii__ (or whatever name we cook up). >>> >>> +1 >> >> Please don't make me learn the limitations of a new mini language >> without a really good reason. >> >> For the sake of argument, assume we have a Python 3.5 with bytes.__mod__ >> restored roughly as described in PEP 461. 
*Given* that feature set, what >> is the rationale for *adding* bytes.format? What new capabilities will >> it provide that aren't already covered by printf-style interpolation >> directly to bytes or text formatting followed by encoding the result? > > The only reason to add any of this, in my mind, is to ease porting of > 2.x code. If my proposal covers most of the cases of b''.format() that > exist in 2.x code that wants to move to 3.5, then I think it's worth > doing. Is there any such code that's blocked from porting by the lack of > b''.format() that supports bytes, int, and float? I don't know. I > concede that it's unlikely. > > IF this were a feature that we were going to add to 3.5 on its own > merits, I think we add __format_ascii__ and make the whole thing > extensible. Is there any new code that's blocked from being written by > missing b"".format()? I don't know that, either. Following up, I think this leaves us with 3 choices: 1. Do not implement bytes.format(). We tell any 2.x code that's written to use str.format() to switch to %-formatting for their common code base. 2. Add the simplistic version of bytes.format() that I describe above, restricted to accepting bytes, int, and float (and no subclasses). Some 2.x code will work, some will need to change to %-formatting. 3. Add bytes.format() and the __format_ascii__ protocol. We might want to also add a format_ascii() builtin, to match __format__ and format(). This would require the least change to 2.x code that uses str.format() and wants to move to bytes.format(), but would require some work on the 3.x side. I'd advocate 1 or 2. Eric. 
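As a rough illustration of what choice 1 asks of code in a shared 2.x/3.x code base, the sketch below builds the same ASCII bytes two ways: text formatting followed by an explicit encode, and %-interpolation directly into bytes. It assumes bytes.__mod__ is available as proposed in PEP 461 (Python 3.5 or later); the message text and value are invented for the example.

```python
value = 17

# Text formatting followed by encoding the result: str.format() does the
# work and the ASCII encode step is explicit.
via_text = "the value is: {:d}".format(value).encode("ascii")

# printf-style interpolation directly into bytes, per PEP 461.
via_mod = b"the value is: %d" % value

print(via_text == via_mod)
```

Both spellings yield identical bytes here, which is the coverage question raised earlier in the thread about what bytes.format() would add.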
From breamoreboy at yahoo.co.uk Fri Jan 17 16:15:58 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 17 Jan 2014 15:15:58 +0000 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D94318.3070506@trueblade.com> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> <52D94318.3070506@trueblade.com> Message-ID: On 17/01/2014 14:50, Eric V. Smith wrote: > On 01/17/2014 07:34 AM, Eric V. Smith wrote: >> On 1/17/2014 6:42 AM, Nick Coghlan wrote: >>> >>> On 17 Jan 2014 18:03, "Eric Snow" >> > wrote: >>>> >>>> On Thu, Jan 16, 2014 at 11:30 AM, Eric V. Smith >> > wrote: >>>>> For the first iteration of bytes.format(), I think we should just >>>>> support the exact types of int, float, and bytes. It will call the >>>>> type's__format__ (with the object as "self") and encode the result to >>>>> ASCII. For the stated use case of 2.x compatibility, I suspect this will >>>>> cover > 90% of the uses in real code. If we find there are cases where >>>>> real code needs additional types supported, we can consider adding >>>>> __format_ascii__ (or whatever name we cook up). >>>> >>>> +1 >>> >>> Please don't make me learn the limitations of a new mini language >>> without a really good reason. >>> >>> For the sake of argument, assume we have a Python 3.5 with bytes.__mod__ >>> restored roughly as described in PEP 461. *Given* that feature set, what >>> is the rationale for *adding* bytes.format? What new capabilities will >>> it provide that aren't already covered by printf-style interpolation >>> directly to bytes or text formatting followed by encoding the result? >> >> The only reason to add any of this, in my mind, is to ease porting of >> 2.x code. 
If my proposal covers most of the cases of b''.format() that >> exist in 2.x code that wants to move to 3.5, then I think it's worth >> doing. Is there any such code that's blocked from porting by the lack of >> b''.format() that supports bytes, int, and float? I don't know. I >> concede that it's unlikely. >> >> IF this were a feature that we were going to add to 3.5 on its own >> merits, I think we add __format_ascii__ and make the whole thing >> extensible. Is there any new code that's blocked from being written by >> missing b"".format()? I don't know that, either. > > Following up, I think this leaves us with 3 choices: > > 1. Do not implement bytes.format(). We tell any 2.x code that's written > to use str.format() to switch to %-formatting for their common code base. > > 2. Add the simplistic version of bytes.format() that I describe above, > restricted to accepting bytes, int, and float (and no subclasses). Some > 2.x code will work, some will need to change to %-formatting. > > 3. Add bytes.format() and the __format_ascii__ protocol. We might want > to also add a format_ascii() builtin, to match __format__ and format(). > This would require the least change to 2.x code that uses str.format() > and wants to move to bytes.format(), but would require some work on the > 3.x side. > > I'd advocate 1 or 2. > > Eric. > For both options 1 and 2 surely you cannot be suggesting that after people have written 2.x code to use format() as %f formatting is to be deprecated, they now have to change the code back to the way they may well have written it in the first place? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From eric at trueblade.com Fri Jan 17 16:24:32 2014 From: eric at trueblade.com (Eric V. 
Smith) Date: Fri, 17 Jan 2014 10:24:32 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> <52D94318.3070506@trueblade.com> Message-ID: <52D94B30.8050205@trueblade.com> On 01/17/2014 10:15 AM, Mark Lawrence wrote: > On 17/01/2014 14:50, Eric V. Smith wrote: >> On 01/17/2014 07:34 AM, Eric V. Smith wrote: >>> On 1/17/2014 6:42 AM, Nick Coghlan wrote: >>>> >>>> On 17 Jan 2014 18:03, "Eric Snow" >>> > wrote: >>>>> >>>>> On Thu, Jan 16, 2014 at 11:30 AM, Eric V. Smith >>> > wrote: >>>>>> For the first iteration of bytes.format(), I think we should just >>>>>> support the exact types of int, float, and bytes. It will call the >>>>>> type's__format__ (with the object as "self") and encode the result to >>>>>> ASCII. For the stated use case of 2.x compatibility, I suspect >>>>>> this will >>>>>> cover > 90% of the uses in real code. If we find there are cases >>>>>> where >>>>>> real code needs additional types supported, we can consider adding >>>>>> __format_ascii__ (or whatever name we cook up). >>>>> >>>>> +1 >>>> >>>> Please don't make me learn the limitations of a new mini language >>>> without a really good reason. >>>> >>>> For the sake of argument, assume we have a Python 3.5 with >>>> bytes.__mod__ >>>> restored roughly as described in PEP 461. *Given* that feature set, >>>> what >>>> is the rationale for *adding* bytes.format? What new capabilities will >>>> it provide that aren't already covered by printf-style interpolation >>>> directly to bytes or text formatting followed by encoding the result? >>> >>> The only reason to add any of this, in my mind, is to ease porting of >>> 2.x code. 
If my proposal covers most of the cases of b''.format() that >>> exist in 2.x code that wants to move to 3.5, then I think it's worth >>> doing. Is there any such code that's blocked from porting by the lack of >>> b''.format() that supports bytes, int, and float? I don't know. I >>> concede that it's unlikely. >>> >>> IF this were a feature that we were going to add to 3.5 on its own >>> merits, I think we add __format_ascii__ and make the whole thing >>> extensible. Is there any new code that's blocked from being written by >>> missing b"".format()? I don't know that, either. >> >> Following up, I think this leaves us with 3 choices: >> >> 1. Do not implement bytes.format(). We tell any 2.x code that's written >> to use str.format() to switch to %-formatting for their common code base. >> >> 2. Add the simplistic version of bytes.format() that I describe above, >> restricted to accepting bytes, int, and float (and no subclasses). Some >> 2.x code will work, some will need to change to %-formatting. >> >> 3. Add bytes.format() and the __format_ascii__ protocol. We might want >> to also add a format_ascii() builtin, to match __format__ and format(). >> This would require the least change to 2.x code that uses str.format() >> and wants to move to bytes.format(), but would require some work on the >> 3.x side. >> >> I'd advocate 1 or 2. >> >> Eric. >> > > For both options 1 and 2 surely you cannot be suggesting that after > people have written 2.x code to use format() as %f formatting is to be > deprecated, they now have to change the code back to the way they may > well have written it in the first place? > That would be part of it, yes. Otherwise you need #3. This is all assuming we've ruled out an option 4, because of the exceptions raised depending on what __format__ does: 4. Add bytes.format(), have it convert the format specifier to str (unicode), call __format__ and encode the result back to ASCII. 
Accept that there will be data-driven exceptions depending on the result of the __format__ call. I'm open to other ideas. Eric. From ethan at stoneleaf.us Fri Jan 17 16:41:20 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 17 Jan 2014 07:41:20 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> <52D94318.3070506@trueblade.com> Message-ID: <52D94F20.70707@stoneleaf.us> On 01/17/2014 07:15 AM, Mark Lawrence wrote: > > For both options 1 and 2 surely you cannot be suggesting that > after people have written 2.x code to use format() as %f > formatting is to be deprecated %f formatting is not deprecated, and will not be in 3.x's lifetime. -- ~Ethan~ From eric at trueblade.com Fri Jan 17 16:50:20 2014 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 17 Jan 2014 10:50:20 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D94B30.8050205@trueblade.com> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> <52D94318.3070506@trueblade.com> <52D94B30.8050205@trueblade.com> Message-ID: <52D9513C.1080906@trueblade.com> On 01/17/2014 10:24 AM, Eric V. Smith wrote: > On 01/17/2014 10:15 AM, Mark Lawrence wrote: >> On 17/01/2014 14:50, Eric V. Smith wrote: >>> On 01/17/2014 07:34 AM, Eric V. Smith wrote: >>>> On 1/17/2014 6:42 AM, Nick Coghlan wrote: >>>>> >>>>> On 17 Jan 2014 18:03, "Eric Snow" >>>> > wrote: >>>>>> >>>>>> On Thu, Jan 16, 2014 at 11:30 AM, Eric V. Smith >>>> > wrote: >>>>>>> For the first iteration of bytes.format(), I think we should just >>>>>>> support the exact types of int, float, and bytes. 
It will call the >>>>>>> type's__format__ (with the object as "self") and encode the result to >>>>>>> ASCII. For the stated use case of 2.x compatibility, I suspect >>>>>>> this will >>>>>>> cover > 90% of the uses in real code. If we find there are cases >>>>>>> where >>>>>>> real code needs additional types supported, we can consider adding >>>>>>> __format_ascii__ (or whatever name we cook up). >>>>>> >>>>>> +1 >>>>> >>>>> Please don't make me learn the limitations of a new mini language >>>>> without a really good reason. >>>>> >>>>> For the sake of argument, assume we have a Python 3.5 with >>>>> bytes.__mod__ >>>>> restored roughly as described in PEP 461. *Given* that feature set, >>>>> what >>>>> is the rationale for *adding* bytes.format? What new capabilities will >>>>> it provide that aren't already covered by printf-style interpolation >>>>> directly to bytes or text formatting followed by encoding the result? >>>> >>>> The only reason to add any of this, in my mind, is to ease porting of >>>> 2.x code. If my proposal covers most of the cases of b''.format() that >>>> exist in 2.x code that wants to move to 3.5, then I think it's worth >>>> doing. Is there any such code that's blocked from porting by the lack of >>>> b''.format() that supports bytes, int, and float? I don't know. I >>>> concede that it's unlikely. >>>> >>>> IF this were a feature that we were going to add to 3.5 on its own >>>> merits, I think we add __format_ascii__ and make the whole thing >>>> extensible. Is there any new code that's blocked from being written by >>>> missing b"".format()? I don't know that, either. >>> >>> Following up, I think this leaves us with 3 choices: >>> >>> 1. Do not implement bytes.format(). We tell any 2.x code that's written >>> to use str.format() to switch to %-formatting for their common code base. >>> >>> 2. Add the simplistic version of bytes.format() that I describe above, >>> restricted to accepting bytes, int, and float (and no subclasses). 
Some >>> 2.x code will work, some will need to change to %-formatting. >>> >>> 3. Add bytes.format() and the __format_ascii__ protocol. We might want >>> to also add a format_ascii() builtin, to match __format__ and format(). >>> This would require the least change to 2.x code that uses str.format() >>> and wants to move to bytes.format(), but would require some work on the >>> 3.x side. For #3, hopefully this "additional work" on the 3.x side would just be to add, to each class where you already have a custom __format__ used for b''.format(), code like: def __format_ascii__(self, fmt): return self.__format__(fmt.decode()).encode('ascii') That is, we're pushing the possibility of having to deal with an encoding exception off to the type, instead of having it live in bytes.format(). And to agree with Ethan: %-formatting isn't deprecated. Eric. From breamoreboy at yahoo.co.uk Fri Jan 17 16:58:10 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 17 Jan 2014 15:58:10 +0000 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D94F20.70707@stoneleaf.us> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> <52D94318.3070506@trueblade.com> <52D94F20.70707@stoneleaf.us> Message-ID: On 17/01/2014 15:41, Ethan Furman wrote: > On 01/17/2014 07:15 AM, Mark Lawrence wrote: >> >> For both options 1 and 2 surely you cannot be suggesting that >> after people have written 2.x code to use format() as %f >> formatting is to be deprecated > > %f formatting is not deprecated, and will not be in 3.x's lifetime. > > -- > ~Ethan~ I'm sorry, I got the above wrong, I should have said "was to be deprecated" :( -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. 
Mark Lawrence From brett at python.org Fri Jan 17 17:00:38 2014 From: brett at python.org (Brett Cannon) Date: Fri, 17 Jan 2014 11:00:38 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D94318.3070506@trueblade.com> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> <52D94318.3070506@trueblade.com> Message-ID: On Fri, Jan 17, 2014 at 9:50 AM, Eric V. Smith wrote: > On 01/17/2014 07:34 AM, Eric V. Smith wrote: > > On 1/17/2014 6:42 AM, Nick Coghlan wrote: > >> > >> On 17 Jan 2014 18:03, "Eric Snow" >> > wrote: > >>> > >>> On Thu, Jan 16, 2014 at 11:30 AM, Eric V. Smith >> > wrote: > >>>> For the first iteration of bytes.format(), I think we should just > >>>> support the exact types of int, float, and bytes. It will call the > >>>> type's__format__ (with the object as "self") and encode the result to > >>>> ASCII. For the stated use case of 2.x compatibility, I suspect this > will > >>>> cover > 90% of the uses in real code. If we find there are cases where > >>>> real code needs additional types supported, we can consider adding > >>>> __format_ascii__ (or whatever name we cook up). > >>> > >>> +1 > >> > >> Please don't make me learn the limitations of a new mini language > >> without a really good reason. > >> > >> For the sake of argument, assume we have a Python 3.5 with bytes.__mod__ > >> restored roughly as described in PEP 461. *Given* that feature set, what > >> is the rationale for *adding* bytes.format? What new capabilities will > >> it provide that aren't already covered by printf-style interpolation > >> directly to bytes or text formatting followed by encoding the result? > > > > The only reason to add any of this, in my mind, is to ease porting of > > 2.x code. 
If my proposal covers most of the cases of b''.format() that > > exist in 2.x code that wants to move to 3.5, then I think it's worth > > doing. Is there any such code that's blocked from porting by the lack of > > b''.format() that supports bytes, int, and float? I don't know. I > > concede that it's unlikely. > > > > IF this were a feature that we were going to add to 3.5 on its own > > merits, I think we add __format_ascii__ and make the whole thing > > extensible. Is there any new code that's blocked from being written by > > missing b"".format()? I don't know that, either. > > Following up, I think this leaves us with 3 choices: > > 1. Do not implement bytes.format(). We tell any 2.x code that's written > to use str.format() to switch to %-formatting for their common code base. > +1 I would rephrase it to "switch to %-formatting for bytes usage for their common code base". If they are working with actual text then using str.format() still works (and is actually nicer to use IMO). It actually might make the str/bytes relationship even clearer, especially if we start to promote that str.format() is for text and %-formatting is for bytes. > > 2. Add the simplistic version of bytes.format() that I describe above, > restricted to accepting bytes, int, and float (and no subclasses). Some > 2.x code will work, some will need to change to %-formatting. > -1 I am still not comfortable with the special-casing by type for bytes.format(). > > 3. Add bytes.format() and the __format_ascii__ protocol. We might want > to also add a format_ascii() builtin, to match __format__ and format(). > This would require the least change to 2.x code that uses str.format() > and wants to move to bytes.format(), but would require some work on the > 3.x side. > +0 Would allow for easy porting and it's general enough, but I don't know if working with bytes really requires this much beyond supporting the porting story. 
I'm still +1 on PEP 460 for bytes.format() as a nice way to simplify basic
bytes usage in Python 3, but if that's not accepted then I say just drop
bytes.format() entirely and let %-formatting be the way people do Python 2/3
bytes work (if they are not willing to build it up from scratch like they
already can do).

-Brett

From p.f.moore at gmail.com Fri Jan 17 17:06:55 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 17 Jan 2014 16:06:55 +0000
Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes
In-Reply-To: <52D9513C.1080906@trueblade.com>
References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us>
 <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us>
 <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com>
 <52D94318.3070506@trueblade.com> <52D94B30.8050205@trueblade.com>
 <52D9513C.1080906@trueblade.com>
Message-ID:

On 17 January 2014 15:50, Eric V. Smith wrote:
> For #3, hopefully this "additional work" on the 3.x side would just be
> to add, to each class where you already have a custom __format__ used
> for b''.format(), code like:
>
>     def __format_ascii__(self, fmt):
>         return self.__format__(fmt.decode()).encode('ascii')

For me, the big cost would seem to be in the necessary documentation,
explaining the new special method in the language reference,
explaining the 2 different forms of format() in the built in types
docs. And the conceptual overhead of another special method for people
to be aware of. If I implement my own number subclass, do I need to
implement __format_ascii__?

My gut feeling is that we simply don't implement format() for bytes. I
don't see sufficient benefit, if %-formatting is available.

Paul.
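The wrapper quoted above for option 3 can be tried on Python 3 as it stands, without any interpreter changes. What follows is only a minimal sketch under stated assumptions: `__format_ascii__` is the hypothetical hook named in the thread (no such special method exists in Python), and `format_ascii()`, `Celsius` and `Fahrenheit` are illustrative names invented for this example. It shows the hook delegating to `__format__`, and an encoding error surfacing inside the type rather than in a bytes.format() implementation.

```python
class Celsius:
    """Toy type with a custom __format__ (illustrative only)."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # Ordinary text formatting: delegate to the wrapped number.
        return format(self.degrees, spec) + " C"

    # Hypothetical hook from option 3; NOT a real Python special method.
    def __format_ascii__(self, spec):
        return self.__format__(spec.decode("ascii")).encode("ascii")


class Fahrenheit(Celsius):
    """Same idea, but __format__ emits a non-ASCII degree sign."""

    def __format__(self, spec):
        return format(self.degrees, spec) + " \u00b0F"


def format_ascii(value, spec=b""):
    """Hypothetical bytes counterpart of the format() builtin."""
    return type(value).__format_ascii__(value, spec)


# ASCII-only output round-trips cleanly.
ok = format_ascii(Celsius(21.456), b".1f")
print(ok)  # b'21.5 C'

# The alternative several posters prefer: format as text, then encode.
text_first = format(Celsius(21.456), ".1f").encode("ascii")

# Non-ASCII output fails inside the type's own hook (data-driven error).
try:
    format_ascii(Fahrenheit(70), b"d")
    failed = False
except UnicodeEncodeError:
    failed = True
print(failed)  # True
```

For ASCII-only results the hook and the "format as text, then encode" spelling produce the same bytes, which is the point made in the thread that the extra protocol adds little beyond what explicit encoding already offers.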
From barry at python.org Fri Jan 17 17:16:00 2014 From: barry at python.org (Barry Warsaw) Date: Fri, 17 Jan 2014 11:16:00 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> <52D94318.3070506@trueblade.com> Message-ID: <20140117111600.604ccb5f@anarchist.wooz.org> On Jan 17, 2014, at 11:00 AM, Brett Cannon wrote: >I would rephrase it to "switch to %-formatting for bytes usage for their >common code base". -1. %-formatting is so neanderthal. :) -Barry From ncoghlan at gmail.com Fri Jan 17 17:24:48 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 18 Jan 2014 02:24:48 +1000 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> <52D94318.3070506@trueblade.com> <52D94B30.8050205@trueblade.com> <52D9513C.1080906@trueblade.com> Message-ID: On 18 Jan 2014 02:08, "Paul Moore" wrote: > > On 17 January 2014 15:50, Eric V. Smith wrote: > > For #3, hopefully this "additional work" on the 3.x side would just be > > to add, to each class where you already have a custom __format__ used > > for b''.format(), code like: > > > > def __format_ascii__(self, fmt): > > return self.__format__(fmt.decode()).encode('ascii') > > For me, the big cost would seem to be in the necessary documentation, > explaining the new special method in the language reference, > explaining the 2 different forms of format() in the built in types > docs. And the conceptual overhead of another special method for people > to be aware of. If I implement my own number subclass, do I need to > implement __format_ascii__? 
>
> My gut feeling is that we simply don't implement format() for bytes. I
> don't see sufficient benefit, if %-formatting is available.

Exactly, it's the documentation problem to explain "when would I recommend
using this over the alternatives?" that turns me off the idea of general
purpose bytes formatting. printf style covers the use cases we have
identified, and the code bases of immediate interest support 2.5 or earlier
and thus *must* be using printf-style formatting.

Add to that the fact that to maintain the Python 3 text model, we either
have to gut it to the point where it has very few of the benefits the text
version offers over printf-style formatting, or else we introduce a whole new
protocol for a feature that we consider so borderline that it took us six
Python 3 releases to add it back to the language.

By contrast, the following model is relatively easy to document:

* printf-style is low level and relatively inflexible, but available for
  both text and for ASCII compatible segments in binary data. The %s
  formatting code accepts arbitrary objects (using str) in text mode, but
  only buffer exporters and objects with a __bytes__ method in binary mode.

* str.format() is high level and very flexible, but available only for
  text - the result must be explicitly encoded to binary if that is needed.

Cheers,
Nick.

>
> Paul.

From ethan at stoneleaf.us Fri Jan 17 17:49:21 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 17 Jan 2014 08:49:21 -0800
Subject: [Python-Dev] PEP 461 Final?
Message-ID: <52D95F11.3020005@stoneleaf.us>

Here's the text for your reading pleasure. I'll commit the PEP after I add
some markup.
Major change:

  - dropped `format` support, just using %-interpolation

Coming soon:

  - Rationale section ;)

================================================================================
PEP: 461
Title: Adding % formatting to bytes
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-01-13
Python-Version: 3.5
Post-History: 2014-01-14, 2014-01-15, 2014-01-17
Resolution:


Abstract
========

This PEP proposes adding % formatting operations similar to Python 2's str type
to bytes [1]_ [2]_.


Overriding Principles
=====================

In order to avoid the problems of auto-conversion and Unicode exceptions that
could plague Py2 code, all object checking will be done by duck-typing, not by
values contained in a Unicode representation [3]_.


Proposed semantics for bytes formatting
=======================================

%-interpolation
---------------

All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.)
will be supported, and will work as they do for str, including the
padding, justification and other related modifiers.

Example::

    >>> b'%4x' % 10
    b'   a'

    >>> b'%#4x' % 10
    b' 0xa'

    >>> b'%04X' % 10
    b'000A'

%c will insert a single byte, either from an int in range(256), or from
a bytes argument of length 1, not from a str.

Example::

    >>> b'%c' % 48
    b'0'

    >>> b'%c' % b'a'
    b'a'

%s is restricted in what it will accept::

    - input type supports Py_buffer?
      use it to collect the necessary bytes

    - input type is something else?
      use its __bytes__ method; if there isn't one, raise a TypeError

Examples::

    >>> b'%s' % b'abc'
    b'abc'

    >>> b'%s' % 3.14
    Traceback (most recent call last):
    ...
    TypeError: 3.14 has no __bytes__ method

    >>> b'%s' % 'hello world!'
    Traceback (most recent call last):
    ...
    TypeError: 'hello world!' has no __bytes__ method, perhaps you need to encode it?

.. note::

   Because the str type does not have a __bytes__ method, attempts to
   directly use 'a string' as a bytes interpolation value will raise an
   exception.  To use 'string' values, they must be encoded or otherwise
   transformed into a bytes sequence::

      'a string'.encode('latin-1')


Numeric Format Codes
--------------------

To properly handle int and float subclasses, int(), index(), and float()
will be called on the objects intended for (d, i, u), (b, o, x, X), and
(e, E, f, F, g, G).


Unsupported codes
-----------------

%r (which calls __repr__), and %a (which calls ascii() on __repr__) are not
supported.


Proposed variations
===================

It was suggested to let %s accept numbers, but since numbers have their own
format codes this idea was discarded.

It has been suggested to use %b for bytes instead of %s.

  - Rejected as %b does not exist in Python 2.x %-interpolation, which is
    why we are using %s.

It has been proposed to automatically use .encode('ascii','strict') for str
arguments to %s.

  - Rejected as this would lead to intermittent failures.  Better to have the
    operation always fail so the trouble-spot can be correctly fixed.

It has been proposed to have %s return the ascii-encoded repr when the value
is a str (b'%s' % 'abc' --> b"'abc'").

  - Rejected as this would lead to hard to debug failures far from the problem
    site.  Better to have the operation always fail so the trouble-spot can be
    easily fixed.

Originally this PEP also proposed adding format style formatting, but it was
decided that format and its related machinery were all strictly text (aka str)
based, and it was dropped.

Various new special methods were proposed, such as __ascii__, __format_bytes__,
etc.; such methods are not needed at this time, but can be visited again later
if real-world use shows deficiencies with this solution.


Footnotes
=========

.. [1] http://docs.python.org/2/library/stdtypes.html#string-formatting
.. [2] neither string.Template, format, nor str.format are under consideration.
.. [3] %c is not an exception as neither of its possible arguments are unicode.


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
================================================================================

From ethan at stoneleaf.us Fri Jan 17 17:07:03 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 17 Jan 2014 08:07:03 -0800
Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes
In-Reply-To: <20140117074737.GD3915@ando>
References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us>
 <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us>
 <20140117074737.GD3915@ando>
Message-ID: <52D95527.3020808@stoneleaf.us>

On 01/16/2014 11:47 PM, Steven D'Aprano wrote:
> On Thu, Jan 16, 2014 at 08:23:13AM -0800, Ethan Furman wrote:
>
>> As I understand it, str.format will call the object's __format__. So, for
>> example, if I say:
>>
>>     u'the value is: %d' % myNum(17)
>>
>> then it will be myNum.__format__ that gets called, not int.__format__;
>
> I seem to have missed something, because I am completely confused... Why
> are you talking about str.format and then show an example using % instead?

Sorry, PEP 46x fatigue. :/

It should have been

    u'the value is {:d}'.format(myNum(17))

and yes I meant the str type.

> %d calls __str__, not __format__. This is in Python 3.3:
>
> py> class MyNum(int):
> ...     def __str__(self):
> ...         print("Calling MyNum.__str__")
> ...         return super().__str__()
> ...     def __format__(self, spec):
> ...         print("Calling MyNum.__format__")
> ...         return super().__format__(spec)
> ...
> py> n = MyNum(17)
> py> u"%d" % n
> Calling MyNum.__str__
> '17'

And that's a bug we fixed in 3.4:

Python 3.4.0b1 (default:172a6bfdd91b+, Jan 5 2014, 06:39:32)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
--> class myNum(int):
...     def __int__(self):
...         return 7
...     def __index__(self):
...         return 11
...     def __float__(self):
...         return 13.81727
...     def __str__(self):
...         print('__str__')
...         return '1'
...     def __repr__(self):
...         print('__repr__')
...         return '2'
...
--> '%d' % myNum()
'0'
--> '%f' % myNum()
'13.817270'

After all, consider:

>>> '%d' % True
'1'
>>> '%s' % True
'True'

So, in fact, on subclasses __str__ should *not* be called to get the integer
representation. First we do a conversion to make sure we have an int (or
float, or ...), and then we call __str__ on our tried and trusted genuine core
type.

> The *worst* solution would be to completely ignore MyNum.__str__.
> That's a nasty violation of the Principle Of Least Surprise, and will
> lead to confusion ("why isn't my class' __str__ method being called?")

Because you asked for a numeric representation, not a string representation [1].

--
~Ethan~

[1] for all the gory details, see:
    http://bugs.python.org/issue18780
    http://bugs.python.org/issue18738

From brett at python.org Fri Jan 17 17:53:23 2014
From: brett at python.org (Brett Cannon)
Date: Fri, 17 Jan 2014 11:53:23 -0500
Subject: [Python-Dev] PEP 461 Final?
In-Reply-To: <52D95F11.3020005@stoneleaf.us>
References: <52D95F11.3020005@stoneleaf.us>
Message-ID:

On Fri, Jan 17, 2014 at 11:49 AM, Ethan Furman wrote:
> Here's the text for your reading pleasure. I'll commit the PEP after I
> add some markup.
> > Major change: > > - dropped `format` support, just using %-interpolation > > Coming soon: > > - Rationale section ;) > > ============================================================ > ==================== > PEP: 461 > Title: Adding % formatting to bytes > Version: $Revision$ > Last-Modified: $Date$ > Author: Ethan Furman > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 2014-01-13 > Python-Version: 3.5 > Post-History: 2014-01-14, 2014-01-15, 2014-01-17 > Resolution: > > > Abstract > ======== > > This PEP proposes adding % formatting operations similar to Python 2's str > type > to bytes [1]_ [2]_. > > > Overriding Principles > ===================== > > In order to avoid the problems of auto-conversion and Unicode exceptions > that > could plague Py2 code, all object checking will be done by duck-typing, > not by > Don't abbreviate; spell out "Python 2". > values contained in a Unicode representation [3]_. > > > Proposed semantics for bytes formatting > ======================================= > > %-interpolation > --------------- > > All the numeric formatting codes (such as %x, %o, %e, %f, %g, etc.) > will be supported, and will work as they do for str, including the > padding, justification and other related modifiers. > > Example:: > > >>> b'%4x' % 10 > b' a' > > >>> '%#4x' % 10 > ' 0xa' > > >>> '%04X' % 10 > '000A' > > %c will insert a single byte, either from an int in range(256), or from > a bytes argument of length 1, not from a str. > > Example: > > >>> b'%c' % 48 > b'0' > > >>> b'%c' % b'a' > b'a' > > %s is restricted in what it will accept:: > > - input type supports Py_buffer? > use it to collect the necessary bytes > > - input type is something else? > use its __bytes__ method; if there isn't one, raise a TypeError > > Examples: > > >>> b'%s' % b'abc' > b'abc' > > >>> b'%s' % 3.14 > Traceback (most recent call last): > ... > TypeError: 3.14 has no __bytes__ method > > >>> b'%s' % 'hello world!' 
> Traceback (most recent call last): > ... > TypeError: 'hello world' has no __bytes__ method, perhaps you need to > encode it? > > .. note:: > > Because the str type does not have a __bytes__ method, attempts to > directly use 'a string' as a bytes interpolation value will raise an > exception. To use 'string' values, they must be encoded or otherwise > transformed into a bytes sequence:: > > 'a string'.encode('latin-1') > > > Numeric Format Codes > -------------------- > > To properly handle int and float subclasses, int(), index(), and float() > will be called on the objects intended for (d, i, u), (b, o, x, X), and > (e, E, f, F, g, G). > > > Unsupported codes > ----------------- > > %r (which calls __repr__), and %a (which calls ascii() on __repr__) are not > supported. > > > Proposed variations > =================== > > It was suggested to let %s accept numbers, but since numbers have their own > format codes this idea was discarded. > > It has been suggested to use %b for bytes instead of %s. > > - Rejected as %b does not exist in Python 2.x %-interpolation, which is > why we are using %s. > > It has been proposed to automatically use .encode('ascii','strict') for str > arguments to %s. > > - Rejected as this would lead to intermittent failures. Better to have > the > operation always fail so the trouble-spot can be correctly fixed. > > It has been proposed to have %s return the ascii-encoded repr when the > value > is a str (b'%s' % 'abc' --> b"'abc'"). > > - Rejected as this would lead to hard to debug failures far from the > problem > site. Better to have the operation always fail so the trouble-spot > can be > easily fixed. > > Originally this PEP also proposed adding format style formatting, but it > was > "format-style" > decided that format and its related machinery were all strictly text (aka > str) > based, and it was dropped. 
> "that the method and" > > Various new special methods were proposed, such as __ascii__, > __format_bytes___, > etc.; such methods are not needed at this time, but can be visited again > later > if real-world use shows deficiencies with this solution. > > > Footnotes > ========= > > .. [1] http://docs.python.org/2/library/stdtypes.html#string-formatting > .. [2] neither string.Template, format, nor str.format are under > consideration. > .. [3] %c is not an exception as neither of its possible arguments are > unicode. > +1 from me -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Fri Jan 17 17:58:37 2014 From: brett at python.org (Brett Cannon) Date: Fri, 17 Jan 2014 11:58:37 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <20140117111600.604ccb5f@anarchist.wooz.org> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> <52D94318.3070506@trueblade.com> <20140117111600.604ccb5f@anarchist.wooz.org> Message-ID: On Fri, Jan 17, 2014 at 11:16 AM, Barry Warsaw wrote: > On Jan 17, 2014, at 11:00 AM, Brett Cannon wrote: > > >I would rephrase it to "switch to %-formatting for bytes usage for their > >common code base". > > -1. %-formatting is so neanderthal. :) > Very much so, which is why I'm willing to let it be bastardized in Python 3.5 for the sake of porting but not bytes.format(). =) I'm keeping format() clean for my nieces and nephew to use; they can just turn their nose up at %-formatting when they are old enough to program. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From status at bugs.python.org Fri Jan 17 18:07:38 2014 From: status at bugs.python.org (Python tracker) Date: Fri, 17 Jan 2014 18:07:38 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20140117170738.9025556912@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2014-01-10 - 2014-01-17) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 4437 (+28) closed 27624 (+44) total 32061 (+72) Open issues with patches: 2012 Issues opened (47) ================== #14455: plistlib unable to read json and binary plist files http://bugs.python.org/issue14455 reopened by ronaldoussoren #20218: Add `pathlib.Path.write` and `pathlib.Path.read` http://bugs.python.org/issue20218 opened by cool-RR #20219: ElementTree: allow passing XMLPullParser instance into iterpar http://bugs.python.org/issue20219 opened by scoder #20220: TarFile.list() outputs wrong time http://bugs.python.org/issue20220 opened by serhiy.storchaka #20221: #define hypot _hypot conflicts with existing definition http://bugs.python.org/issue20221 opened by tabrezm #20222: unittest.mock-examples doc uses builtin file which is removed http://bugs.python.org/issue20222 opened by naoki #20223: inspect.signature does not support new functools.partialmethod http://bugs.python.org/issue20223 opened by yselivanov #20224: C API docs need a clear "defining custom extension types" sect http://bugs.python.org/issue20224 opened by ncoghlan #20227: Argument Clinic: rename arguments in generated C? 
http://bugs.python.org/issue20227 opened by georg.brandl #20230: structseq types should expose _fields http://bugs.python.org/issue20230 opened by abarnert #20231: Argument Clinic accepts no-default args after default args http://bugs.python.org/issue20231 opened by rmsr #20233: Re-enable buffer API slots for heap types http://bugs.python.org/issue20233 opened by Benno.Rice #20237: Ambiguous sentence in document of xml package. http://bugs.python.org/issue20237 opened by naoki #20238: Incomplete gzip output with tarfile.open(fileobj=..., mode="w: http://bugs.python.org/issue20238 opened by vadmium #20239: Allow repeated deletion of unittest.mock.Mock attributes http://bugs.python.org/issue20239 opened by michael.foord #20241: Bad reference to RFC in document of ipaddress? http://bugs.python.org/issue20241 opened by naoki #20243: ReadError when open a tarfile for writing http://bugs.python.org/issue20243 opened by serhiy.storchaka #20244: Possible resources leak in tarfile.open() http://bugs.python.org/issue20244 opened by serhiy.storchaka #20245: Check empty mode in TarFile.*open() http://bugs.python.org/issue20245 opened by serhiy.storchaka #20247: Condition._is_owned is wrong http://bugs.python.org/issue20247 opened by Antony.Lee #20249: test_posix.test_initgroups fails when running with no suppleme http://bugs.python.org/issue20249 opened by Rosuav #20252: Argument Clinic howto: small typo in y# translation http://bugs.python.org/issue20252 opened by rmsr #20254: Duplicate bytearray test on test_socket.py http://bugs.python.org/issue20254 opened by vajrasky #20256: Argument Clinic: compare signed and unsigned ints http://bugs.python.org/issue20256 opened by serhiy.storchaka #20257: test_socket fails if using tipc module and SELinux enabled http://bugs.python.org/issue20257 opened by vajrasky #20260: Argument Clinic: add unsigned integers converters http://bugs.python.org/issue20260 opened by serhiy.storchaka #20261: Cannot pickle some objects that have a 
__getattr__() http://bugs.python.org/issue20261 opened by barry #20262: Convert some debugging prints in zipfile to warnings http://bugs.python.org/issue20262 opened by serhiy.storchaka #20264: Update patchcheck to looks for files with clinic comments http://bugs.python.org/issue20264 opened by meador.inge #20265: Bring Doc/using/windows up to date http://bugs.python.org/issue20265 opened by zach.ware #20266: Bring Doc/faq/windows up to date http://bugs.python.org/issue20266 opened by zach.ware #20267: TemporaryDirectory does not resolve path when created using a http://bugs.python.org/issue20267 opened by Antony.Lee #20269: Inconsistent behavior in pdb when pressing Ctrl-C http://bugs.python.org/issue20269 opened by xdegaye #20270: urllib.parse doesn't work with empty port http://bugs.python.org/issue20270 opened by serhiy.storchaka #20271: urllib.parse.urlparse() accepts wrong URLs http://bugs.python.org/issue20271 opened by serhiy.storchaka #20274: sqlite module has bad argument parsing code, including undefin http://bugs.python.org/issue20274 opened by larry #20275: asyncio: remove debug code from BaseEventLoop http://bugs.python.org/issue20275 opened by yselivanov #20276: ctypes._dlopen should not force RTLD_NOW http://bugs.python.org/issue20276 opened by Albert.Zeyer #20280: add "predicate" to the glossary http://bugs.python.org/issue20280 opened by flox #20281: time.strftime %z format specifier is the same as %Z http://bugs.python.org/issue20281 opened by Mike.Owens #20282: Argument Clinic: int with boolean default http://bugs.python.org/issue20282 opened by serhiy.storchaka #20283: Wrong keyword parameter name in regex pattern methods http://bugs.python.org/issue20283 opened by serhiy.storchaka #20284: proof for concept patch for bytes formatting methods http://bugs.python.org/issue20284 opened by nascheme #20285: Improve object.__doc__ and help(object) output http://bugs.python.org/issue20285 opened by terry.reedy #20287: Argument Clinic: support diverting 
output to buffer, external http://bugs.python.org/issue20287 opened by larry #20288: HTMLParse handing of non-numeric charrefs broken http://bugs.python.org/issue20288 opened by iko #20289: Make cgi.FieldStorage a context manager http://bugs.python.org/issue20289 opened by brett.cannon Most recent 15 issues with no replies (15) ========================================== #20289: Make cgi.FieldStorage a context manager http://bugs.python.org/issue20289 #20288: HTMLParse handing of non-numeric charrefs broken http://bugs.python.org/issue20288 #20285: Improve object.__doc__ and help(object) output http://bugs.python.org/issue20285 #20283: Wrong keyword parameter name in regex pattern methods http://bugs.python.org/issue20283 #20282: Argument Clinic: int with boolean default http://bugs.python.org/issue20282 #20271: urllib.parse.urlparse() accepts wrong URLs http://bugs.python.org/issue20271 #20270: urllib.parse doesn't work with empty port http://bugs.python.org/issue20270 #20269: Inconsistent behavior in pdb when pressing Ctrl-C http://bugs.python.org/issue20269 #20267: TemporaryDirectory does not resolve path when created using a http://bugs.python.org/issue20267 #20265: Bring Doc/using/windows up to date http://bugs.python.org/issue20265 #20257: test_socket fails if using tipc module and SELinux enabled http://bugs.python.org/issue20257 #20254: Duplicate bytearray test on test_socket.py http://bugs.python.org/issue20254 #20249: test_posix.test_initgroups fails when running with no suppleme http://bugs.python.org/issue20249 #20245: Check empty mode in TarFile.*open() http://bugs.python.org/issue20245 #20244: Possible resources leak in tarfile.open() http://bugs.python.org/issue20244 Most recent 15 issues waiting for review (15) ============================================= #20287: Argument Clinic: support diverting output to buffer, external http://bugs.python.org/issue20287 #20284: proof for concept patch for bytes formatting methods 
http://bugs.python.org/issue20284 #20283: Wrong keyword parameter name in regex pattern methods http://bugs.python.org/issue20283 #20275: asyncio: remove debug code from BaseEventLoop http://bugs.python.org/issue20275 #20270: urllib.parse doesn't work with empty port http://bugs.python.org/issue20270 #20269: Inconsistent behavior in pdb when pressing Ctrl-C http://bugs.python.org/issue20269 #20264: Update patchcheck to looks for files with clinic comments http://bugs.python.org/issue20264 #20262: Convert some debugging prints in zipfile to warnings http://bugs.python.org/issue20262 #20260: Argument Clinic: add unsigned integers converters http://bugs.python.org/issue20260 #20257: test_socket fails if using tipc module and SELinux enabled http://bugs.python.org/issue20257 #20254: Duplicate bytearray test on test_socket.py http://bugs.python.org/issue20254 #20252: Argument Clinic howto: small typo in y# translation http://bugs.python.org/issue20252 #20249: test_posix.test_initgroups fails when running with no suppleme http://bugs.python.org/issue20249 #20245: Check empty mode in TarFile.*open() http://bugs.python.org/issue20245 #20244: Possible resources leak in tarfile.open() http://bugs.python.org/issue20244 Top 10 most discussed issues (10) ================================= #20275: asyncio: remove debug code from BaseEventLoop http://bugs.python.org/issue20275 24 msgs #20189: inspect.Signature doesn't recognize all builtin types http://bugs.python.org/issue20189 15 msgs #20186: Derby #18: Convert 31 sites to Argument Clinic across 23 files http://bugs.python.org/issue20186 12 msgs #20172: Derby #3: Convert 67 sites to Argument Clinic across 4 files ( http://bugs.python.org/issue20172 10 msgs #19936: Executable permissions of Python source files http://bugs.python.org/issue19936 9 msgs #20227: Argument Clinic: rename arguments in generated C? 
http://bugs.python.org/issue20227 9 msgs #14455: plistlib unable to read json and binary plist files http://bugs.python.org/issue14455 8 msgs #20185: Derby #17: Convert 49 sites to Argument Clinic across 13 files http://bugs.python.org/issue20185 8 msgs #20281: time.strftime %z format specifier is the same as %Z http://bugs.python.org/issue20281 8 msgs #20260: Argument Clinic: add unsigned integers converters http://bugs.python.org/issue20260 7 msgs Issues closed (43) ================== #5803: email/quoprimime: encode and decode are very slow on large mes http://bugs.python.org/issue5803 closed by r.david.murray #6625: UnicodeEncodeError on pydoc's CLI http://bugs.python.org/issue6625 closed by berker.peksag #18960: First line can be executed twice http://bugs.python.org/issue18960 closed by serhiy.storchaka #19082: Lib/xmlrpc/client.py demo code points to the dead server http://bugs.python.org/issue19082 closed by orsenthil #19097: bool(cgi.FieldStorage(...)) may be False unexpectedly http://bugs.python.org/issue19097 closed by orsenthil #19534: normalize() in locale.py fails for sr_RS.UTF-8 at latin http://bugs.python.org/issue19534 closed by serhiy.storchaka #19886: Better estimated memory requirements for bigmem tests http://bugs.python.org/issue19886 closed by serhiy.storchaka #20009: Property should expose wrapped function. 
http://bugs.python.org/issue20009 closed by ncoghlan #20072: Ttk tests fail when wantobjects is false http://bugs.python.org/issue20072 closed by serhiy.storchaka #20086: test_locale fails with Turkish locale http://bugs.python.org/issue20086 closed by serhiy.storchaka #20138: wsgiref on Python 3.x incorrectly implements URL handling caus http://bugs.python.org/issue20138 closed by serhiy.storchaka #20196: Argument Clinic generates invalid code for optional parameter http://bugs.python.org/issue20196 closed by larry #20201: Argument Clinic: rwbuffer support broken http://bugs.python.org/issue20201 closed by larry #20202: ArgumentClinic howto: document change in Py_buffer lifecycle m http://bugs.python.org/issue20202 closed by rmsr #20206: email quoted-printable encoding issue http://bugs.python.org/issue20206 closed by r.david.murray #20208: Clarify some things in porting HOWTO http://bugs.python.org/issue20208 closed by brett.cannon #20214: Argument Clinic rollup fixes http://bugs.python.org/issue20214 closed by larry #20225: Argument Clinic: simplify METH_NOARGS generated code http://bugs.python.org/issue20225 closed by larry #20226: Argument Clinic: support for simple expressions? http://bugs.python.org/issue20226 closed by larry #20228: Argument Clinic should understand Python special methods http://bugs.python.org/issue20228 closed by larry #20229: platform.py uses deprecated feature of plistlib http://bugs.python.org/issue20229 closed by ned.deily #20232: Argument Clinic NULL default falsely implies None acceptabilit http://bugs.python.org/issue20232 closed by rmsr #20234: Argument Clinic: use PyTuple_GET_SIZE? http://bugs.python.org/issue20234 closed by larry #20235: Argument Clinic: recover a bit more gracefully from exceptions http://bugs.python.org/issue20235 closed by python-dev #20236: Invalid inline markup in xml document. 
http://bugs.python.org/issue20236 closed by r.david.murray #20240: Whitespace ignored in relative imports: from.package import so http://bugs.python.org/issue20240 closed by eric.smith #20242: logging style parameter does not work correctly http://bugs.python.org/issue20242 closed by python-dev #20246: buffer overflow in socket.recvfrom_into http://bugs.python.org/issue20246 closed by python-dev #20248: docs: socket.recvmsg{,_into} are keword-compatible http://bugs.python.org/issue20248 closed by rmsr #20250: defaultdict docstring neglects the *args http://bugs.python.org/issue20250 closed by python-dev #20251: socket.recvfrom_into crash with empty buffer http://bugs.python.org/issue20251 closed by python-dev #20253: Typo in ipaddress document http://bugs.python.org/issue20253 closed by python-dev #20255: Doc/about: update a bit http://bugs.python.org/issue20255 closed by zach.ware #20258: Python documentation build fails with "markupsafe module canno http://bugs.python.org/issue20258 closed by python-dev #20259: os.walk conflicts with os.listdir http://bugs.python.org/issue20259 closed by r.david.murray #20263: Function print definable as variables? 
http://bugs.python.org/issue20263 closed by r.david.murray #20268: Argument Clinic: support cloning existing functions http://bugs.python.org/issue20268 closed by larry #20272: chain.from_iterable in the overview table not linking to the f http://bugs.python.org/issue20272 closed by python-dev #20273: Argument Clinic: unhelpful crashes http://bugs.python.org/issue20273 closed by larry #20277: default __debug__ value and asserts http://bugs.python.org/issue20277 closed by michael.foord #20278: Wrong URL to the pysqlite web page http://bugs.python.org/issue20278 closed by python-dev #20279: bytearray.fromhex does not work with some hexes generated by h http://bugs.python.org/issue20279 closed by Anatoly Belikov #20286: Segfault when using internal DictProxy http://bugs.python.org/issue20286 closed by benjamin.peterson From eric at trueblade.com Fri Jan 17 18:13:17 2014 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 17 Jan 2014 12:13:17 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> <52D94318.3070506@trueblade.com> <20140117111600.604ccb5f@anarchist.wooz.org> Message-ID: <52D964AD.8030507@trueblade.com> On 01/17/2014 11:58 AM, Brett Cannon wrote: > > > > On Fri, Jan 17, 2014 at 11:16 AM, Barry Warsaw > wrote: > > On Jan 17, 2014, at 11:00 AM, Brett Cannon wrote: > > >I would rephrase it to "switch to %-formatting for bytes usage for > their > >common code base". > > -1. %-formatting is so neanderthal. :) > > > Very much so, which is why I'm willing to let it be bastardized in > Python 3.5 for the sake of porting but not bytes.format(). =) I'm > keeping format() clean for my nieces and nephew to use; they can just > turn their nose up at %-formatting when they are old enough to program. 
Given the problems with implementing it, I'm more than willing to drop bytes.format() from PEP 461 (not that it's my PEP). But if we think that %-formatting is neanderthal and will get dropped in the Python 4000 timeframe (that is, someday in the far future), then I think we should have some advice to give to people who are writing new 3.x code for the non-porting use-cases addressed by the PEP. I'm specifically thinking of new code that wants to format some bytes for an on-the-wire ascii-like protocol. Is it: b'Content-Length: ' + str(47).encode('ascii') or b'Content-Length: {}'.format(str(47).encode('ascii')) or something better? I think it will look like the above, or involve something like bytes.format() and __format_ascii__. Or, maybe a library that just supports a few types (say, bytes, int, and float!). Eric. From nas at arctrix.com Fri Jan 17 18:46:15 2014 From: nas at arctrix.com (Neil Schemenauer) Date: Fri, 17 Jan 2014 17:46:15 +0000 (UTC) Subject: [Python-Dev] PEP 461 Final? References: <52D95F11.3020005@stoneleaf.us> Message-ID: Ethan Furman wrote: > Overriding Principles >===================== > > In order to avoid the problems of auto-conversion and Unicode exceptions that > could plague Py2 code, all object checking will be done by duck-typing, not by > values contained in a Unicode representation [3]_. I think a longer "Rational" section is justified given the amount of discussion this feature generated. Here is a revised version of what I already suggested: Rational ======== A disruptive but useful change introduced in Python 3.0 was the clean separation of byte strings (i.e. the "bytes" object) from character strings (i.e. the "str" object). The benefit is that character encodings must be explicitly specified and the risk of corrupting character data is reduced. Unfortunately, this separation has made writing certain types of programs more complicated and verbose.
For example, programs that deal with network protocols often manipulate ASCII-encoded strings or assemble byte strings from fragments. Since the "bytes" type does not support string formatting, extra encoding and decoding to and from the "str" type is often required. For simplicity and convenience it is desirable to introduce formatting methods to "bytes" that allow formatting of ASCII-encoded character data. This change would blur the clean separation of byte strings and character strings. However, it is felt that the practical benefits outweigh the purity costs. The implicit assumption of ASCII-encoding would be limited to formatting methods. One source of many problems with the Python 2 Unicode implementation is the implicit coercion of Unicode character strings into byte strings using the "ascii" codec. If the character strings contain only ASCII characters, all is well. However, if the string contains a non-ASCII character then coercion causes an exception. The combination of implicit coercion and value-dependent failures has proven to be a recipe for hard-to-debug errors. A program may seem to work correctly when tested (e.g. string input that happened to be ASCII only) but later would fail, often with a traceback far from the source of the real error. The formatting methods for bytes() should avoid this problem by not implicitly encoding data that might fail based on the content of the data. I think we can back off on the duck-typing idea. It's a good Python principle but I now realize existing %-interpolation doesn't do it. The numeric format codes coerce to long or float. > Unsupported codes > ----------------- > > %r (which calls __repr__), and %a (which calls ascii() on __repr__) are not > supported. I think %a should be supported. I imagine it would be quite useful when dumping debugging output to a bytes stream. It's easy to implement and I think the danger of abuse or surprises is small.
It would also help when translating Python 2 code, change %r to %a. > Proposed variations >=================== > > It was suggested to let %s accept numbers, but since numbers have their own > format codes this idea was discarded. > > It has been suggested to use %b for bytes instead of %s. > > - Rejected as %b does not exist in Python 2.x %-interpolation, which is > why we are using %s. I think we should use %b instead of %s. In that case, I'm fine with %b not accepting numbers. Using %b clearly indicates we are inserting arbitrary bytes. It also proves a useful code review step when translating from Python 2.x. To ease porting from Python 2.x code, I propose adding a command-line option that enables %s and %r format codes for bytes %-interpolation. I'm going to write a draft PEP (it would depend on PEP 461 being implemented). > Originally this PEP also proposed adding format style formatting, but it was > decided that format and its related machinery were all strictly text (aka str) > based, and it was dropped. I would also argue that we should limit the scope of this PEP. It has already generated a massive amount of discussion. Nothing precludes us from adding support for format() to bytes in the future, if we decide we want it and how it should work. > Various new special methods were proposed, such as __ascii__, > __format_bytes___, etc.; such methods are not needed at this time, > but can be visited again later if real-world use shows > deficiencies with this solution. I agree, new special methods are not needed at this time since numeric codes do use duck-typing and __bytes__ already exists. Neil From breamoreboy at yahoo.co.uk Fri Jan 17 19:19:57 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 17 Jan 2014 18:19:57 +0000 Subject: [Python-Dev] PEP 461 Final? In-Reply-To: References: <52D95F11.3020005@stoneleaf.us> Message-ID: On 17/01/2014 17:46, Neil Schemenauer wrote: > > I think we should use %b instead of %s. 
In that case, I'm fine with > %b not accepting numbers. Using %b clearly indicates we are > inserting arbitrary bytes. It also proves a useful code review step > when translating from Python 2.x. > Using %b could cause problems in the future as b is used in new style formatting to mean output numbers in binary, so %B seems to me the obvious choice as it's also unused. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From larry at hastings.org Fri Jan 17 19:23:14 2014 From: larry at hastings.org (Larry Hastings) Date: Fri, 17 Jan 2014 10:23:14 -0800 Subject: [Python-Dev] PEP 461 Final? In-Reply-To: References: <52D95F11.3020005@stoneleaf.us> Message-ID: <52D97512.8050701@hastings.org> On 01/17/2014 09:46 AM, Neil Schemenauer wrote: > Rational > ======== "Rationale". "Rational" is an adjective, "Rationale" is a noun. Pedantically yours, //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Fri Jan 17 18:38:59 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 17 Jan 2014 09:38:59 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D964AD.8030507@trueblade.com> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> <52D94318.3070506@trueblade.com> <20140117111600.604ccb5f@anarchist.wooz.org> <52D964AD.8030507@trueblade.com> Message-ID: <52D96AB3.10500@stoneleaf.us> On 01/17/2014 09:13 AM, Eric V. Smith wrote: > On 01/17/2014 11:58 AM, Brett Cannon wrote: >> On Fri, Jan 17, 2014 at 11:16 AM, Barry Warsaw wrote: >>> On Jan 17, 2014, at 11:00 AM, Brett Cannon wrote: >>>> >>>> I would rephrase it to "switch to %-formatting for bytes usage for >>>> their common code base". >>> >>> -1. %-formatting is so neanderthal. 
:) >> >> Very much so, which is why I'm willing to let it be bastardized in >> Python 3.5 for the sake of porting but not bytes.format(). =) I'm >> keeping format() clean for my nieces and nephew to use; they can just >> turn their nose up at %-formatting when they are old enough to program. > > Given the problems with implementing it, I'm more than willing to drop > bytes.format() from PEP 461 (not that it's my PEP). But if we think that > %-formatting is neanderthal and will get dropped in the Python 4000 > timeframe I hope not! > (that is, someday in the far future), then I think we should > have some advice to give to people who are writing new 3.x code for the > non-porting use-cases addressed by the PEP. I'm specifically thinking of > new code that wants to format some bytes for an on-the-wire ascii-like > protocol. %-interpolation handles this use case well, format does not. > Is it: > b'Content-Length: ' + str(47).encode('ascii') > or > b'Content-Length: {}.format(str(47).encode('ascii')) > or something better? Ew. Neither of those look better than b'Content-Length: %d' % 47 -- ~Ethan~ From v+python at g.nevcal.com Fri Jan 17 20:04:13 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Fri, 17 Jan 2014 11:04:13 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D94318.3070506@trueblade.com> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> <52D94318.3070506@trueblade.com> Message-ID: <52D97EAD.6070407@g.nevcal.com> On 1/17/2014 6:50 AM, Eric V. Smith wrote: > Following up, I think this leaves us with 3 choices: > > 1. Do not implement bytes.format(). We tell any 2.x code that's written > to use str.format() to switch to %-formatting for their common code base. > > 2. 
Add the simplistic version of bytes.format() that I describe above, > restricted to accepting bytes, int, and float (and no subclasses). Some > 2.x code will work, some will need to change to %-formatting. > > 3. Add bytes.format() and the __format_ascii__ protocol. We might want > to also add a format_ascii() builtin, to match __format__ and format(). > This would require the least change to 2.x code that uses str.format() > and wants to move to bytes.format(), but would require some work on the > 3.x side. > > I'd advocate 1 or 2. Nice summary. I'd advocate 1 or 3. -------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Fri Jan 17 20:04:56 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Fri, 17 Jan 2014 11:04:56 -0800 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> <52D94318.3070506@trueblade.com> Message-ID: <52D97ED8.9010303@g.nevcal.com> On 1/17/2014 7:15 AM, Mark Lawrence wrote: > For both options 1 and 2 surely you cannot be suggesting that after > people have written 2.x code to use format() as %f formatting is to be > deprecated, they now have to change the code back to the way they may > well have written it in the first place? If they are committed to format(), another option is to operate in the Unicode domain, and encode at the end. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Fri Jan 17 20:11:26 2014 From: eric at trueblade.com (Eric V. 
Smith) Date: Fri, 17 Jan 2014 14:11:26 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: <52D97ED8.9010303@g.nevcal.com> References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> <52D94318.3070506@trueblade.com> <52D97ED8.9010303@g.nevcal.com> Message-ID: <52D9805E.3070302@trueblade.com> On 01/17/2014 02:04 PM, Glenn Linderman wrote: > On 1/17/2014 7:15 AM, Mark Lawrence wrote: >> For both options 1 and 2 surely you cannot be suggesting that after >> people have written 2.x code to use format() as %f formatting is to be >> deprecated, they now have to change the code back to the way they may >> well have written it in the first place? > > If they are committed to format(), another option is to operate in the > Unicode domain, and encode at the end. Maybe that's the best advice to give. It's better than my earlier example of field-at-a-time encoding. Eric. From v+python at g.nevcal.com Fri Jan 17 20:40:37 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Fri, 17 Jan 2014 11:40:37 -0800 Subject: [Python-Dev] PEP 461 Final? In-Reply-To: <52D95F11.3020005@stoneleaf.us> References: <52D95F11.3020005@stoneleaf.us> Message-ID: <52D98735.5040201@g.nevcal.com> On 1/17/2014 8:49 AM, Ethan Furman wrote: > %s is restricted in what it will accept:: > > - input type supports Py_buffer? > use it to collect the necessary bytes > > - input type is something else? > use its __bytes__ method; if there isn't one, raise a TypeError > > Examples: > > >>> b'%s' % b'abc' > b'abc' > > >>> b'%s' % 3.14 > Traceback (most recent call last): > ... > TypeError: 3.14 has no __bytes__ method > > >>> b'%s' % 'hello world!' > Traceback (most recent call last): > ... > TypeError: 'hello world' has no __bytes__ method, perhaps you need > to encode it? 
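The %s rules quoted above can be sketched as follows. This is a minimal illustration, assuming an interpreter in which bytes %-interpolation is available (it shipped in Python 3.5); `try_format` is a hypothetical helper, and the exception text is deliberately not checked because the wording in the PEP draft may differ from what an interpreter prints.

```python
# Sketch of the %s restrictions for bytes %-interpolation.
# Assumes Python 3.5 or later; `try_format` is an illustrative helper only.

def try_format(value):
    """Return the formatted bytes, or the name of the exception raised."""
    try:
        return b'%s' % value
    except TypeError as exc:
        return type(exc).__name__

results = {
    # Objects supporting the buffer protocol are inserted as-is.
    'bytes': try_format(b'abc'),
    'bytearray': try_format(bytearray(b'abc')),
    # A float is neither a buffer nor does it define __bytes__.
    'float': try_format(3.14),
    # A str must be encoded explicitly before it can be interpolated.
    'str': try_format('hello world!'),
    'encoded str': try_format('hello world!'.encode('ascii')),
}

for label, outcome in results.items():
    print(label, '->', outcome)
```

The point of the sketch is that the outcome depends only on the type of the argument, never on the characters it happens to contain.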
If you produce a helpful error message for str (re: encoding), might it not be appropriate to produce a helpful error message for builtin number types (, perhaps you need a numeric format code?)? From ethan at stoneleaf.us Fri Jan 17 20:48:11 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 17 Jan 2014 11:48:11 -0800 Subject: [Python-Dev] PEP 461 Final? In-Reply-To: <52D98735.5040201@g.nevcal.com> References: <52D95F11.3020005@stoneleaf.us> <52D98735.5040201@g.nevcal.com> Message-ID: <52D988FB.6020205@stoneleaf.us> On 01/17/2014 11:40 AM, Glenn Linderman wrote: > On 1/17/2014 8:49 AM, Ethan Furman wrote: >> >> >>> b'%s' % 3.14 >> Traceback (most recent call last): >> ... >> TypeError: 3.14 has no __bytes__ method > > If you produce a helpful error message for str (re: encoding), might it not be appropriate to produce a helpful error > message for builtin number types (, perhaps you need a numeric format code?)? Good point! Done. -- ~Ethan~ From nas at arctrix.com Fri Jan 17 20:51:44 2014 From: nas at arctrix.com (Neil Schemenauer) Date: Fri, 17 Jan 2014 19:51:44 +0000 (UTC) Subject: [Python-Dev] PEP 461 Final? References: <52D95F11.3020005@stoneleaf.us> Message-ID: Mark Lawrence wrote: > Using %b could cause problems in the future as b is used in new style > formatting to mean output numbers in binary, so %B seems to me the > obvious choice as it's also unused. After updating my patch, I've decided that %s works better. My patch implements PEP 461 as proposed with the following additional features: - add %a format code, calls PyObject_ASCII on the argument. I see no reason not to add it as a useful debugging feature. - add -2 command-line option. When enabled: %s will fall back to calling PyObject_Str() after first trying the buffer API and __bytes__. The value will be encoded using strict ASCII encoding. Also, %r is enabled as an alias for %a.
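A small sketch of what the %a code looks like in practice, assuming Python 3.5 or later, where bytes %-interpolation accepts %a and treats %r as a synonym for it. The `Point` class is a made-up example type. The -2 option is not shown, since it exists only in the patch described above.

```python
# %a in bytes %-interpolation: ascii() of the argument, encoded as ASCII.
# Assumes Python 3.5+; `Point` is a hypothetical type used for illustration.

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Point(%r, %r)' % (self.x, self.y)

# Non-ASCII characters are escaped by ascii(), so the result is always
# safe to write to a bytes stream, whatever the argument contains.
debug_line = b'state=%a name=%a' % (Point(1, 2), 'caf\xe9')

# In released Python 3 versions, %r behaves the same as %a for bytes.
same_as_r = (b'%r' % 'caf\xe9') == (b'%a' % 'caf\xe9')

print(debug_line)
print(same_as_r)
```

Because the escaping happens inside ascii(), formatting cannot fail depending on the value, which is the property the Rationale asks for.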
The patch is v4, bugs.python.org/issue20284, still needs more review and testing. Neil From v+python at g.nevcal.com Fri Jan 17 20:52:00 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Fri, 17 Jan 2014 11:52:00 -0800 Subject: [Python-Dev] Migration from Python 2.7 and bytes formatting In-Reply-To: References: Message-ID: <52D989E0.4050705@g.nevcal.com> On 1/17/2014 2:49 AM, Neil Schemenauer wrote: > As I see it, there are two separate goals in adding formatting > methods to bytes. One is to make it easier to write new programs > that manipulate byte data. Another is to make it easier to upgrade > Python 2.x programs to Python 3.x. Here is an idea to better > address these separate goals. > > Introduce %-interpolation for bytes. Support the following format > codes to aid in writing new code: > > %b: insert arbitrary bytes (via __bytes__ or Py_buffer) > > %[dox]: insert an integer, encoded as ASCII > > %[eEfFgG]: insert a float, encoded as ASCII > > %a: call ascii(), insert result > > Add a command-line option, disabled by default, that enables the > following format codes: > > %s: if the object has __bytes__ or Py_buffer then insert it. > Otherwise, call str() and encode with the 'ascii' codec > > %r: call repr(), encode with the 'ascii' codec > > %[iuX]: as per Python 2.x, for backwards compatibility > > Introducing these extra codes and the command-line option will > provide a more gradual upgrade path. The next step in porting could > be to examine each %s inside bytes literals and decide if they > should either be converted to %b or if the literal should be > converted to a unicode literal. Any %r codes could likely be safely > changed to %a. -1 overall. Not worth the extra complexity in documentation and command line parameters. %s, since it cannot be used for strings of characters (str) anyway, might as well be used for strings of bytes, and of necessity for single-code-base porting, must be usable in that manner. 
I would give +.5 to the idea of supporting %a in Python 3. I would give +.2 for %r as a synonym for %a in Python 3. %r and %a don't produce fixed-width fields, so are likely used in places where the exact length in bytes is flexible, and in ASCII segments of the byte stream... supporting them both with the semantics of %a might be useful. From nas at arctrix.com Fri Jan 17 21:09:45 2014 From: nas at arctrix.com (Neil Schemenauer) Date: Fri, 17 Jan 2014 20:09:45 +0000 (UTC) Subject: [Python-Dev] Migration from Python 2.7 and bytes formatting References: Message-ID: I've refined this idea a little in my latest PEP 461 patch (issue 20284). Continuing to use %s instead of introducing %b seems better. I've called the command-line option -2; it could be used to enable other similar porting aids. I'd like to try porting code making use of the -2 feature to see how helpful it is. The behavior is partway between Python 2.x laziness and Python 3.x strictness in terms of specifying encodings. Python 2.x: - coerce byte strings to unicode strings to avoid making a decision about encoding - when writing a unicode string to a bytes stream without a specified encoding, encode with ASCII. Blow up with an exception if a non-ASCII character is encountered, often far from where the real bug is. Python 3.x: - refuse to accept unicode strings where bytes are expected, require explicit encoding to be performed Python 3.x with -2 command-line option: - when objects are formatted into bytes, immediately encode them using strict ASCII encoding. No code would be considered fully ported to Python 3 unless it can run without the -2 command-line option. Neil From rymg19 at gmail.com Fri Jan 17 21:10:40 2014 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Fri, 17 Jan 2014 14:10:40 -0600 Subject: [Python-Dev] Migration from Python 2.7 and bytes formatting In-Reply-To: References: Message-ID: A command line parameter??
The annoying part would be telling every single user to call Python with a certain argument and hope they read the README. If it's a library, out of the question. If it's a program, well, I hope your users read READMEs. On Fri, Jan 17, 2014 at 4:49 AM, Neil Schemenauer wrote: > As I see it, there are two separate goals in adding formatting > methods to bytes. One is to make it easier to write new programs > that manipulate byte data. Another is to make it easier to upgrade > Python 2.x programs to Python 3.x. Here is an idea to better > address these separate goals. > > Introduce %-interpolation for bytes. Support the following format > codes to aid in writing new code: > > %b: insert arbitrary bytes (via __bytes__ or Py_buffer) > > %[dox]: insert an integer, encoded as ASCII > > %[eEfFgG]: insert a float, encoded as ASCII > > %a: call ascii(), insert result > > Add a command-line option, disabled by default, that enables the > following format codes: > > %s: if the object has __bytes__ or Py_buffer then insert it. > Otherwise, call str() and encode with the 'ascii' codec > > %r: call repr(), encode with the 'ascii' codec > > %[iuX]: as per Python 2.x, for backwards compatibility > > Introducing these extra codes and the command-line option will > provide a more gradual upgrade path. The next step in porting could > be to examine each %s inside bytes literals and decide if they > should either be converted to %b or if the literal should be > converted to a unicode literal. Any %r codes could likely be safely > changed to %a. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/rymg19%40gmail.com > -- Ryan When your hammer is C++, everything begins to look like a thumb. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tjreedy at udel.edu Fri Jan 17 21:16:48 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 17 Jan 2014 15:16:48 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> <52D94318.3070506@trueblade.com> Message-ID: On 1/17/2014 10:15 AM, Mark Lawrence wrote: > For both options 1 and 2 surely you cannot be suggesting that after > people have written 2.x code to use format() as %f formatting is to be > deprecated, I will not be for at least a decade. >they now have to change the code back to the way they may > well have written it in the first place? I would suggest that people simply .encode the result if bytes are needed in 3.x as well as 2.x. Polyglot code will likely have a 'py3' boolean already to make the encoding conditional. -- Terry Jan Reedy From nas at arctrix.com Fri Jan 17 21:19:31 2014 From: nas at arctrix.com (Neil Schemenauer) Date: Fri, 17 Jan 2014 14:19:31 -0600 Subject: [Python-Dev] Migration from Python 2.7 and bytes formatting In-Reply-To: References: Message-ID: <20140117201931.GA6948@python.ca> On 2014-01-17, Ryan Gonzalez wrote: > A command line parameter?? I believe it has to be global flag. A __future__ statement will not work. Probably we should allow the flag to be set with an environment variable as well. > The annoying part would be telling every single user to call Python with a > certain argument and hope they read the README. > > If it's a library, out of the question. > > If it's a program, well, I hope your users read READMEs. The purpose of the command line parameter is not for end users. It is intended to help developers port millions of lines of existing Python 2.x code. 
I'm very sad if Python core developers don't realize the enormity of the task and don't continue to make efforts to make it easier. Regards, Neil From ethan at stoneleaf.us Fri Jan 17 21:20:21 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 17 Jan 2014 12:20:21 -0800 Subject: [Python-Dev] PEP 461 Final? In-Reply-To: References: <52D95F11.3020005@stoneleaf.us> Message-ID: <52D99085.9060600@stoneleaf.us> On 01/17/2014 08:53 AM, Brett Cannon wrote: > > Don't abbreviate; spell out "Python 2". Fixed. >> Originally this PEP also proposed adding format style formatting, but it was > > "format-style" Fixed. > decided that format and its related machinery were all strictly text (aka str) > based, and it was dropped. > > "that the method and" Fixed. Thanks. -- ~Ethan~ From nas at arctrix.com Fri Jan 17 21:24:10 2014 From: nas at arctrix.com (Neil Schemenauer) Date: Fri, 17 Jan 2014 20:24:10 +0000 (UTC) Subject: [Python-Dev] Migration from Python 2.7 and bytes formatting References: <52D989E0.4050705@g.nevcal.com> Message-ID: Glenn Linderman wrote: > -1 overall. > > Not worth the extra complexity in documentation and command line > parameters. Really? It's less than 20 lines of code to implement, and probably about as many to document. With millions, maybe billions, of lines of existing Python 2.x code to port, I'm dismayed to hear this objection. Time to take a break from python-dev; I've got paying work to do, programming in Python 2.x. Neil From rymg19 at gmail.com Fri Jan 17 21:26:22 2014 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Fri, 17 Jan 2014 14:26:22 -0600 Subject: [Python-Dev] Migration from Python 2.7 and bytes formatting In-Reply-To: <20140117201931.GA6948@python.ca> References: <20140117201931.GA6948@python.ca> Message-ID: Regardless, I still feel the introduction of a switch and all that stuff is too complicated. I understand your position, since all my applications are written in Python 2 (except 1). However, I don't think this is the best solution. 
On Fri, Jan 17, 2014 at 2:19 PM, Neil Schemenauer wrote: > On 2014-01-17, Ryan Gonzalez wrote: > > A command line parameter?? > > I believe it has to be global flag. A __future__ statement will not > work. Probably we should allow the flag to be set with an > environment variable as well. > > > The annoying part would be telling every single user to call Python with > a > > certain argument and hope they read the README. > > > > If it's a library, out of the question. > > > > If it's a program, well, I hope your users read READMEs. > > The purpose of the command line parameter is not for end users. It > is intended to help developers port millions of lines of existing > Python 2.x code. I'm very sad if Python core developers don't > realize the enormity of the task and don't continue to make efforts > to make it easier. > > Regards, > > Neil > -- Ryan When your hammer is C++, everything begins to look like a thumb. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Fri Jan 17 21:56:07 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 17 Jan 2014 15:56:07 -0500 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> <52D94318.3070506@trueblade.com> Message-ID: Responding to two posts at once, as I consider them On 1/17/2014 11:00 AM, Brett Cannon wrote: > I would rephrase it to "switch to %-formatting for bytes usage for their > common code base". If they are working with actual text then using > str.format() still works (and is actually nicer to use IMO). It actually > might make the str/bytes relationship even clearer, especially if we > start to promote that str.format() is for text and %-formatting is for > bytes. 
Good idea, I think: printf % formatting was invented for formatting ascii text in bytestrings as it was being output (although sprintf also allowed formatting without output). In retrospect, I think we should have introduced unicode.format when unicode was introduced in 2.0 and perhaps never have had unicode % formatting. Or we should have dropped str % instead of bytes % in 3.0. On 1/17/2014 12:13 PM, Eric V. Smith wrote: > But if we think that %-formatting is neanderthal and will get dropped > in the Python 4000 timeframe (that is, someday in the far future), Some people, such as Martin Loewis, have a different opinion of %-formatting and will fight deprecating it *ever*. (I suspect that %-format opinions are influenced by one's current relation to C.) > then I think we should have some advice to give to people who are > writing new 3.x code for the non-porting use-cases addressed by the > PEP. I'm specifically thinking of new code that wants to format some > bytes for an on-the-wire ascii-like protocol. If we add %-formatting back in 3.5 for its original purpose, formatting ascii in bytes for output, I think we should drop the idea of later deprecating it (a few releases later) for that purpose. I think the PEP should even say so: that bytes % will remain indefinitely even if str % were to be dropped in favor of str.format. I would consider dropping unicode (now string).__mod__ in favor of .format to still be an eventual option, especially if someone were to write a converter. 
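A minimal sketch of the on-the-wire use case mentioned above, assuming an interpreter in which bytes %-interpolation is available (that is what PEP 461 proposes; 3.3 and 3.4 do not have it). The HTTP-like header layout and both helper names are hypothetical, chosen only for illustration.

```python
def build_response(status, reason, body):
    """Assemble an ASCII header in front of a binary body with bytes %."""
    # %d inserts the integer as ASCII digits; %b inserts bytes unchanged.
    head = b"HTTP/1.1 %d %b\r\nContent-Length: %d\r\n\r\n" % (
        status, reason, len(body))
    return head + body


def build_response_via_str(status, reason, body):
    """Same output without bytes %: format as text, then encode."""
    head = "HTTP/1.1 %d %s\r\nContent-Length: %d\r\n\r\n" % (
        status, reason.decode("ascii"), len(body))
    return head.encode("ascii") + body


response = build_response(200, b"OK", b"hello")
```

The second helper is the "simply .encode the result" route suggested earlier in the thread; it gives the same bytes but needs a decode/encode round trip for every value that is already bytes.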
-- Terry Jan Reedy From chris.barker at noaa.gov Fri Jan 17 22:37:23 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 17 Jan 2014 13:37:23 -0800 Subject: [Python-Dev] PEP 461 updates In-Reply-To: <87wqhzapz1.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> <52D85E61.8040701@udel.edu> <874n53cptr.fsf@uwakimon.sk.tsukuba.ac.jp> <20140117053611.GB3915@ando> <87wqhzapz1.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: For the record, we've got a pretty good thread (not this good, though!) over on the numpy list about how to untangle the mess that has resulted from porting text-file-parsing code to py3 (and the underlying issue with the 'S' data type in numpy...) One note from the github issue: """ The use of asbytes originates only from the fact that b'%d' % (20,) does not work. """ So yeah PEP 461! (even if too late for numpy...) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.barker at noaa.gov Sat Jan 18 00:52:24 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 17 Jan 2014 15:52:24 -0800 Subject: [Python-Dev] PEP 461 updates In-Reply-To: <20140117220604.E344B11E653@klonk.arctrix.com> References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> <52D85E61.8040701@udel.edu> <874n53cptr.fsf@uwakimon.sk.tsukuba.ac.jp> <20140117053611.GB3915@ando> <87wqhzapz1.fsf@uwakimon.sk.tsukuba.ac.jp> <20140117220604.E344B11E653@klonk.arctrix.com> Message-ID: I hope you didn't mean to take this off-list: On Fri, Jan 17, 2014 at 2:06 PM, Neil Schemenauer wrote: > In gmane.comp.python.devel, you wrote: > > For the record, we've got a pretty good thread (not this good, though!) > > over on the numpy list about how to untangle the mess that has resulted > > Not sure about your definition of good. ;-) Well, in the sense of "big" anyway... > Could you summarize the main points on python-dev? I'm not feeling up to > wading through > another massive thread but I'm quite interested to hear the > challenges that numpy deals with. Well, not much new to it, really. But here's a re-cap: numpy has had an 'S' dtype for a while, which corresponded to the py2 string type (except for being fixed length). So it could auto-convert to and from python strings... all was good and happy. Enter py3: what to do? There is no py2 string type anymore. So it was decided to have the 'S' dtype correspond to the py3 bytes type. Apparently there was thought of renaming it, but the 'B' and 'b' type identifiers were already taken, so 'S' was kept. However, as we all know in this thread, the py3 bytes type is not the same thing as a py2 string (or py2 bytes, natch), and folks like to use the 'S' type for text data -- so that is kind of broken in py3. However, other folks use the 'S' type for binary data, so they like (and rely on) it being mapped to the py3 bytes type. So we are stuck with that. 
Given the nature of numpy, and scientific data, there is talk of having a one-byte-per-char text type in numpy (there is already a unicode type, but it uses 4-bytes-per-char, as it's key to the numpy data model that all objects of a given type are the same size). This would be analogous to the current multiple precision options for numbers. It would take up less memory, and would not be able to hold all values. It's not clear what the level of support is for this right now -- after all, you can do everything you need to do with the appropriate calls to encode() and decode(), if a bit awkwardly. Meanwhile, back at the ranch -- related, but separate issues have arisen with the functions that parse text files: numpy.loadtxt and numpy.genfromtxt. These functions were adapted for py3 just enough to get things to mostly work, but have some serious limitations when doing anything with unicode -- and in fact do some weird things with plain ascii text files if you ask them to create unicode objects, and that is a natural thing to do (and the "right" thing to do in the Py3 text model) if you do something like:
arr = loadtxt('a_file_name', dtype=str)
On py3, a str is a py3 unicode string, so you get the numpy 'U' datatype, but loadtxt wasn't designed to deal with that, so you can get stuff like:
["b'C:\\\\Users\\\\Documents\\\\Project\\\\mytextfile1.txt'"
 "b'C:\\\\Users\\\\Documents\\\\Project\\\\mytextfile2.txt'"
 "b'C:\\\\Users\\\\Documents\\\\Project\\\\mytextfile3.txt'"]
This was (presumably; I haven't debugged the code) due to conversion from bytes to unicode... (I'm still confused about the extra slashes.) And this is ascii text -- it gets worse if there is non-ascii text in there. Anyway, the truth is, this stuff is hard, but it will get at least a touch easier with PEP 461. [though to be truthful, I'm not sure why someone put a comment in the issue tracker about b'%d'%some_num being an issue ... I'm not sure how that matters when we're going from text to numbers, not the other way around...] 
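On the extra slashes: a small sketch of one way they can arise, assuming (this has not been checked against the numpy source) that str() ends up being called on a bytes object somewhere in the conversion. On Python 3, str() of a bytes object returns its repr, including the b'...' wrapper, and displaying the resulting text escapes each backslash a second time. The file path is made up.

```python
raw = b"C:\\Users\\mytextfile1.txt"   # the bytes  C:\Users\mytextfile1.txt

# Calling str() on bytes does not decode; it yields the text of the repr,
# so every backslash in the data is already doubled once.
accidental = str(raw)

# Explicit decoding yields the text that was actually meant.
intended = raw.decode("ascii")

# Showing the accidental string (as an array repr would) doubles them again.
print(repr(accidental))
print(repr(intended))
```

Under that assumption the fix is an explicit decode() with a known encoding at the point where bytes become text, not a change in how the array is displayed.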
-Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Sat Jan 18 02:04:47 2014 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 17 Jan 2014 20:04:47 -0500 Subject: [Python-Dev] PEP 461 updates In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> <52D85E61.8040701@udel.edu> <874n53cptr.fsf@uwakimon.sk.tsukuba.ac.jp> <20140117053611.GB3915@ando> <87wqhzapz1.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52D9D32F.1040103@trueblade.com> On 1/17/2014 4:37 PM, Chris Barker wrote: > For the record, we've got a pretty good thread (not this good, though!) > over on the numpy list about how to untangle the mess that has resulted > from porting text-file-parsing code to py3 (and the underlying issue > with the 'S' data type in numpy...) > > One note from the github issue: > """ > The use of asbytes originates only from the fact that b'%d' % (20,) > does not work. > """ > > So yeah PEP 461! (even if too late for numpy...) Would they use "(u'%d' % (20,)).encode('ascii')" for that? Just curious as to what they're planning on doing. Eric. From steve at pearwood.info Sat Jan 18 02:27:53 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 18 Jan 2014 12:27:53 +1100 Subject: [Python-Dev] PEP 461 Final? 
In-Reply-To: <52D95F11.3020005@stoneleaf.us> References: <52D95F11.3020005@stoneleaf.us> Message-ID: <20140118012751.GM3915@ando> On Fri, Jan 17, 2014 at 08:49:21AM -0800, Ethan Furman wrote: > Overriding Principles > ===================== > > In order to avoid the problems of auto-conversion and Unicode exceptions that > could plague Py2 code, all object checking will be done by duck-typing, not by > values contained in a Unicode representation [3]_. I don't understand this paragraph. What does "values contained in a Unicode representation" mean? [...] > %s is restricted in what it will accept:: > > - input type supports Py_buffer? > use it to collect the necessary bytes Can you give some examples of what types support Py_buffer? Presumably bytes. Anything else? > - input type is something else? > use its __bytes__ method; if there isn't one, raise a TypeError I think you should explicitly state that this is a new special method, and state which built-in types will grow a __bytes__ method (if any). > Numeric Format Codes > -------------------- > > To properly handle int and float subclasses, int(), index(), and float() > will be called on the objects intended for (d, i, u), (b, o, x, X), and > (e, E, f, F, g, G). -1 on this idea. This is a rather large violation of the principle of least surprise, and radically different from the behaviour of Python 3 str. In Python 3, '%d' interpolation calls the __str__ method, so if you subclass, you can get the behaviour you want:
py> class HexInt(int):
...     def __str__(self):
...         return hex(self)
...
py> "%d" % HexInt(23)
'0x17'
which is exactly what we should expect from a subclass. You're suggesting that bytes should ignore any custom display implemented by subclasses, and implicitly coerce them to the superclass int. What is the justification for this? You don't define or even describe what you consider "properly handle". 
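For anyone who wants to check what a given interpreter does with such a subclass, here is a runnable sketch of the same class. It assumes Python 3.5 or later (the bytes line needs PEP 461 as eventually implemented); on those versions the numeric code formats the underlying integer value and does not consult the subclass's __str__, so the result differs from the 3.3 session shown above.

```python
class HexInt(int):
    """int subclass with a custom text form, as in the session above."""

    def __str__(self):
        return hex(self)


value = HexInt(23)

via_str = str(value)            # goes through the subclass __str__
via_percent_d = "%d" % value    # numeric code: formats the integer value
via_bytes = b"%d" % value       # bytes interpolation, same rule
via_bool = "%d" % True          # bool is an int subclass as well
```

Code that wants the custom display can still ask for it explicitly with "%s" or str(value).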
> Unsupported codes > ----------------- > > %r (which calls __repr__), and %a (which calls ascii() on __repr__) are not > supported. +1 on not supporting b'%r' (i.e. I agree with the PEP). Why not support b'%a'? That seems to be a strange thing to prohibit. Everything else, well done and thank you. -- Steven From ethan at stoneleaf.us Sat Jan 18 02:51:05 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 17 Jan 2014 17:51:05 -0800 Subject: [Python-Dev] PEP 461 Final? In-Reply-To: <20140118012751.GM3915@ando> References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> Message-ID: <52D9DE09.60102@stoneleaf.us> On 01/17/2014 05:27 PM, Steven D'Aprano wrote: > On Fri, Jan 17, 2014 at 08:49:21AM -0800, Ethan Furman wrote: >> >> Overriding Principles >> ===================== >> >> In order to avoid the problems of auto-conversion and Unicode >> exceptions that could plague Py2 code, all object checking will >> be done by duck-typing, not by values contained in a Unicode >> representation [3]_. > > I don't understand this paragraph. What does "values contained in a > Unicode representation" mean? Yeah, that is clunky. I'm trying to convey the idea that we don't want errors based on content, i.e. which characters happen to be in a str. > [...] >> %s is restricted in what it will accept:: >> >> - input type supports Py_buffer? >> use it to collect the necessary bytes > > Can you give some examples of what types support Py_buffer? Presumably > bytes. Anything else? Anybody? Otherwise I'll go spelunking in the code. >> - input type is something else? >> use its __bytes__ method; if there isn't one, raise a TypeError > > I think you should explicitly state that this is a new special method, > and state which built-in types will grow a __bytes__ method (if any). It's not new. I know bytes, str, and numbers /do not/ have __bytes__. 
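A short sketch of the two routes discussed above, the buffer route and the __bytes__ route. The Token class is hypothetical, and the two interpolation lines assume Python 3.5 or later, where PEP 461 is implemented; everything else uses only the existing bytes() constructor and the standard library.

```python
import array


class Token:
    """Hypothetical type that defines its own bytes form."""

    def __init__(self, name):
        self.name = name

    def __bytes__(self):
        return self.name.encode("ascii")


# Route 1: objects exporting a buffer are used as raw bytes.
from_bytearray = bytes(bytearray(b"xyz"))
from_array = bytes(array.array("B", [104, 105]))
from_view = bytes(memoryview(b"spam")[1:3])

# Route 2: anything else is asked for __bytes__.
from_dunder = bytes(Token("abc"))

# str and int define no __bytes__, so they are not accepted implicitly.
text_has_dunder = hasattr(str, "__bytes__")
int_has_dunder = hasattr(int, "__bytes__")

# With bytes interpolation (3.5+), %b follows the same two routes.
interpolated = b"<%b|%b>" % (Token("abc"), bytearray(b"xyz"))
try:
    b"%b" % "text"
except TypeError:
    str_rejected = True
else:
    str_rejected = False
```

Text therefore has to be encoded explicitly before interpolation, which is the duck-typing rule the PEP describes: acceptance depends on the type, never on which characters happen to be in a str.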
>> Numeric Format Codes >> -------------------- >> >> To properly handle int and float subclasses, int(), index(), and float() >> will be called on the objects intended for (d, i, u), (b, o, x, X), and >> (e, E, f, F, g, G). > > > -1 on this idea. > > This is a rather large violation of the principle of least surprise, and > radically different from the behaviour of Python 3 str. In Python 3, > '%d' interpolation calls the __str__ method, so if you subclass, you can > get the behaviour you want: Did you read the bug reports I linked to? This behavior (which is a bug) has already been fixed for Python 3.4. As a quick thought experiment, why does "%d" % True return "1"? >> Unsupported codes >> ----------------- >> >> %r (which calls __repr__), and %a (which calls ascii() on __repr__) are not >> supported. > > +1 on not supporting b'%r' (i.e. I agree with the PEP). > > Why not support b'%a'? That seems to be a strange thing to prohibit. I'll admit to being somewhat on the fence about %a. It seems there are two possibilities with %a:
1) have it be ascii(repr(obj))
2) have it be str(obj).encode('ascii', 'strict')
(1) seems only useful for debugging, but even then not very -- if you switch from %s to %a you'll no longer see the bytes output (although you would get the name of the object, which could be handy); (2) is (slightly) blurring the lines between text and encoded-ascii; I would rather see "%s" % text.encode('ascii', 'strict'). So we have two possibilities, both can be useful; I don't know which is most useful or even most logical. So I guess I'm still open to arguments. :) > Everything else, well done and thank you. You're welcome! Thank you to everyone who participated. -- ~Ethan~ From rosuav at gmail.com Sat Jan 18 03:03:55 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 18 Jan 2014 13:03:55 +1100 Subject: [Python-Dev] PEP 461 Final? 
In-Reply-To: <52D9DE09.60102@stoneleaf.us> References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> <52D9DE09.60102@stoneleaf.us> Message-ID: On Sat, Jan 18, 2014 at 12:51 PM, Ethan Furman wrote: > It seems there are two possibilities with %a: > > 1) have it be ascii(repr(obj)) Wouldn't that be redundant? ascii() is already repr()-like. ChrisA From ethan at stoneleaf.us Sat Jan 18 03:14:52 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 17 Jan 2014 18:14:52 -0800 Subject: [Python-Dev] PEP 461 Final? In-Reply-To: References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> <52D9DE09.60102@stoneleaf.us> Message-ID: <52D9E39C.3010807@stoneleaf.us> On 01/17/2014 06:03 PM, Chris Angelico wrote: > On Sat, Jan 18, 2014 at 12:51 PM, Ethan Furman wrote: >> It seems there are two possibilities with %a: >> >> 1) have it be ascii(repr(obj)) > > Wouldn't that be redundant? ascii() is already repr()-like. Good point. -- ~Ethan~ From ncoghlan at gmail.com Sat Jan 18 03:24:50 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 18 Jan 2014 12:24:50 +1000 Subject: [Python-Dev] PEP 461 - Adding % and {} formatting to bytes In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D59669.20404@stoneleaf.us> <52D78F8A.7060003@stoneleaf.us> <52D80771.1090402@stoneleaf.us> <52D82529.9020708@trueblade.com> <52D92345.5050202@trueblade.com> <52D94318.3070506@trueblade.com> Message-ID: On 18 Jan 2014 06:19, "Terry Reedy" wrote: > > On 1/17/2014 10:15 AM, Mark Lawrence wrote: > >> For both options 1 and 2 surely you cannot be suggesting that after >> people have written 2.x code to use format() as %f formatting is to be >> deprecated, > > > I will not be for at least a decade. It will not be deprecated, period. Originally, we thought that the introduction of the new flexible text formatting system made printf-style formatting redundant. 
After running both in parallel for a while, we learned we were wrong:
- it's far more difficult than we originally anticipated to migrate away from it to the new text formatting system
- in particular, the lazy interpolation support in the logging module (and similar systems) has no reasonable migration path
- two different core interpolation systems make it much easier to interpolate into format strings
- it's a better fit for code which needs to semantically align with C
- it's a useful micro-optimisation
- as the current discussion shows, it's much better suited to the interpolation of ASCII compatible segments in binary data formats
Do many of the core devs strongly prefer the new formatting system? Yes. Were we originally planning to deprecate and remove the printf-style formatting system? Yes. Are there still any plans to do so? No. That's why we rewrote the relevant docs to always describe it as "mod formatting" or "printf-style formatting", rather than "legacy" or "old-style". If there are any instances (or even implications) of the latter left in the official docs, that's a bug to be fixed. Perhaps this needs to be a new Q in my Python 3 Q&A, since a lot of people still seem to have the wrong idea... Regards, Nick. > > >> they now have to change the code back to the way they may >> well have written it in the first place? > > > I would suggest that people simply .encode the result if bytes are needed in 3.x as well as 2.x. Polyglot code will likely have a 'py3' boolean already to make the encoding conditional. > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Sat Jan 18 03:36:50 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 18 Jan 2014 12:36:50 +1000 Subject: [Python-Dev] PEP 461 Final? In-Reply-To: <52D95F11.3020005@stoneleaf.us> References: <52D95F11.3020005@stoneleaf.us> Message-ID: +1 on the technical spec from me. The rationale needs work, but you already know that :) For API consistency, I suggest explicitly noting that bytearray will also support the operation, generating a bytearray result. I also suggest introducing the phrase "ASCII compatible segments in binary formats" somewhere, as the intended use case for *all* the ASCII assuming methods on the bytes and bytearray types, including this new one. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sat Jan 18 05:11:09 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 18 Jan 2014 13:11:09 +0900 Subject: [Python-Dev] Migration from Python 2.7 and bytes formatting In-Reply-To: References: Message-ID: <87r486aq02.fsf@uwakimon.sk.tsukuba.ac.jp> Neil Schemenauer writes: > I'd like to try porting code making use of the -2 feature to see how > helpful it is. The behavior is partway between Python 2.x laziness > and Python 3.x strictness in terms of specifying encodings. > > Python 2.x: > [...] > Python 3.x: > [...] The above are descriptions of current behavior (ie, unchanged by PEPs 460, 461), and this: > Python 3.x with -2 command-line option: > > - when objects are formatted into bytes, immediately > encode them using strict ASCII encoding. is the content of this proposal, is that right? From stephen at xemacs.org Sat Jan 18 05:59:57 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 18 Jan 2014 13:59:57 +0900 Subject: [Python-Dev] PEP 461 Final? 
In-Reply-To: References: <52D95F11.3020005@stoneleaf.us> Message-ID: <87ppnpc2b6.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > I also suggest introducing the phrase "ASCII compatible > segments in binary formats" somewhere, What is the use case for "ASCII *compatible* segments"? Can't you just say "ASCII segments"? I'm not sure exactly what PEP 461 says at this point, but most of the discussion prescribes .encode('ascii', errors='strict') for implicit interpolation of str. "ASCII compatible" is a term that people consistently interpret to include the bytes representation of their data. Although the actual rule isn't terribly complex (bytes 0-127 must always have ASCII coded character semantics[1]), AFAIK there are no use cases for that other than encoded text, i.e., interpolating str, and nobody wants that done leniently in Python 3. Footnotes: [1] Otherwise you need to analyze the content of data to determine whether "ASCII-compatible" operations are safe to perform. Of course that's possible but it was repeatedly rejected in favor of duck-typing. From stefan_ml at behnel.de Sat Jan 18 08:26:09 2014 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 18 Jan 2014 08:26:09 +0100 Subject: [Python-Dev] PEP 461 Final? In-Reply-To: <20140118012751.GM3915@ando> References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> Message-ID: Steven D'Aprano, 18.01.2014 02:27: > On Fri, Jan 17, 2014 at 08:49:21AM -0800, Ethan Furman wrote: >> %s is restricted in what it will accept:: >> >> - input type supports Py_buffer? >> use it to collect the necessary bytes > > Can you give some examples of what types support Py_buffer? Presumably > bytes. Anything else? Lots of things: bytes, bytearray, memoryview, array.array, NumPy arrays, just to name a few. Basically anything that wants to be representable as a chunk of memory with metadata. 
It's a very common thing in the Big Data department (although many people wouldn't know that they're actually heavy users of this protocol because they just use NumPy and/or Cython and don't look under the hood). Stefan From storchaka at gmail.com Sat Jan 18 10:02:20 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 18 Jan 2014 11:02:20 +0200 Subject: [Python-Dev] .clinic.c vs .c.clinic Message-ID: <1870884.AZlKeLdVux@raxxla> After the latest Argument Clinic updates my patches began to look much better. Thank you Larry. Argument Clinic now supports output to a side file (this is not the default; you should specify "output preset file" at the start of the first clinic declaration). I already wrote about this here, but it seems my post got lost in the heart of one of the numerous threads and was not noticed. So I repeat it as a separate thread. Generated files now have the suffix .clinic.c. I think it would be better if they ended with a special suffix (.c.clinic or even just .clinic). My reasons:
1. I very, very often use global search in sources. It's my way of navigating and my way of investigating. I don't want to get false results from generated files. And it is much easier to specify a mask '*.[ch]' or '*.c,*.h' (depending on the tool) than to specify a mask and a negative mask. The latter is not even always possible: I can write a cumbersome expression for the find command, but Midnight Commander doesn't support negative masks at all (and perhaps your favorite IDE doesn't support them either).
2. I don't use any IDE, but if you do, this can be important for you. If the IDE shows a source tree, you are unlikely to want to see generated *.clinic.c files in it. This will nearly double the list of sources.
3. Pathname expansion works better with unique endings. You can open all Modules/_io/*.c files, but you are unlikely to be interested in the *.clinic.c files that the same pattern also matches.
4. The .c suffix at the end lies. This is not a compilable C source file. 
This file is meant to be included in another C source file. This will confuse the casual user and other tools, including Argument Clinic itself, which is why it inserts the "preserve" directive at the start of the generated file. But other tools have no such marker. My attempt to convince Larry on IRC failed. He agreed to change his opinion only if other core developers persuade him. I ask you to help me convince Larry. From rosuav at gmail.com Sat Jan 18 10:06:56 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 18 Jan 2014 20:06:56 +1100 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <1870884.AZlKeLdVux@raxxla> References: <1870884.AZlKeLdVux@raxxla> Message-ID: On Sat, Jan 18, 2014 at 8:02 PM, Serhiy Storchaka wrote: > 2. I'm not use any IDE, but if you use, it can be important for you. If IDE > shows sources tree, unlikely you want to see generated *.clinic.c files in > them. This will increase the list of sources almost twice. A point for the contrary side: In any editor or IDE with syntax highlighting, a .clinic.c file will be highlighted as C code, but it would take extra configuration to handle a .clinic file that way. But that's a relatively minor consideration (AIUI most people won't be looking at the .clinic files much, and those who do can configure the editor appropriately). ChrisA From solipsis at pitrou.net Sat Jan 18 12:40:06 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 18 Jan 2014 12:40:06 +0100 Subject: [Python-Dev] PEP 461 Final? References: <52D95F11.3020005@stoneleaf.us> Message-ID: <20140118124006.5a8c7314@fsol> On Fri, 17 Jan 2014 08:49:21 -0800 Ethan Furman wrote: > ================================================================================ > PEP: 461 There are formatting issues in the HTML rendering; I think the ReST code needs a bit of massaging: http://www.python.org/dev/peps/pep-0461/ > .. 
note:: > > Because the str type does not have a __bytes__ method, attempts to > directly use 'a string' as a bytes interpolation value will raise an > exception. To use 'string' values, they must be encoded or otherwise > transformed into a bytes sequence:: s/'string' values/unicode strings/ Regards Antoine. From ncoghlan at gmail.com Sat Jan 18 14:28:44 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 18 Jan 2014 23:28:44 +1000 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: References: <1870884.AZlKeLdVux@raxxla> Message-ID: On 18 Jan 2014 19:08, "Chris Angelico" wrote: > > On Sat, Jan 18, 2014 at 8:02 PM, Serhiy Storchaka wrote: > > 2. I'm not use any IDE, but if you use, it can be important for you. If IDE > > shows sources tree, unlikely you want to see generated *.clinic.c files in > > them. This will increase the list of sources almost twice. > > A point for the contrary side: In any editor or IDE with syntax > highlighting, a .clinic.c file will be highlighted as C code, but it > would take extra configuration to handle a .clinic file that way. But > that's a relatively minor consideration (AIUI most people won't be > looking at the .clinic files much, and for those who do, configure the > editor appropriately). I can argue either side, but the biggest potential problem I see with Serhiy's suggestion is the likelihood of breaking automatic cross referencing of symbols in most IDEs, as well as causing possible issues for interactive debuggers. These are at least valid fragments of C files, even if they're not designed to be compiled independently. However, if both Visual Studio and gdb can still find the symbols correctly, even with the ".clinic" extension, then I would consider that a point strongly in favour of Serhiy's suggestion. 
Picking up on a side comment in Serhiy's post, based on my experience reviewing a patch that included changes to clinic input blocks, I'd also prefer if a parallel file was the default, and single file was opt in (or not allowed at all). Getting changes reviewed and merged is one of the biggest bottlenecks in our workflow, and the inline version of clinic is much harder to review due to the intermingled diff of clinic input and generated output. Cheers, Nick. > > ChrisA > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Jan 18 14:48:55 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 18 Jan 2014 23:48:55 +1000 Subject: [Python-Dev] PEP 461 Final? In-Reply-To: <52D9DE09.60102@stoneleaf.us> References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> <52D9DE09.60102@stoneleaf.us> Message-ID: On 18 Jan 2014 11:52, "Ethan Furman" wrote: > > On 01/17/2014 05:27 PM, Steven D'Aprano wrote: >> >> On Fri, Jan 17, 2014 at 08:49:21AM -0800, Ethan Furman wrote: >>> >>> >>> Overriding Principles >>> ===================== >>> >>> In order to avoid the problems of auto-conversion and Unicode >>> exceptions that could plague Py2 code, all object checking will >>> be done by duck-typing, not by values contained in a Unicode >>> representation [3]_. >> >> >> I don't understand this paragraph. What does "values contained in a >> Unicode representation" mean? > > > Yeah, that is clunky. I'm trying to convey the idea that we don't want errors based on content, i.e. which characters happens to be in a str. > > > >> [...] >>> >>> %s is restricted in what it will accept:: >>> >>> - input type supports Py_buffer? 
>>> use it to collect the necessary bytes >> >> >> Can you give some examples of what types support Py_buffer? Presumably >> bytes. Anything else? > > > Anybody? Otherwise I'll go spelunking in the code. bytes, bytearray, memoryview, ctypes arrays, array.array, numpy.ndarrray It may actually be clearer to express this in terms of memoryview for the benefits of those that aren't familiar with the C API, as that is the closest equivalent Python level API (while there is an open issue regarding the C only nature of the buffer export API, nobody has volunteered to put together a PEP and implementation for a Python level follow up to the C level PEP 3118. The problem is that the original use cases involve C extensions anyway, so the relevant experts don't have any personal need for a Python level buffer exporter interface. Instead, it's in the "should be done for completeness, and would make some of our testing easier, but doesn't have anyone clamouring for it" bucket. > > > >>> - input type is something else? >>> use its __bytes__ method; if there isn't one, raise a TypeError >> >> >> I think you should explicitly state that this is a new special method, >> and state which built-in types will grow a __bytes__ method (if any). > > > It's not new. I know bytes, str, and numbers /do not/ have __bytes__. Right, it is already used by bytes to convert arbitrary objects to a binary representation. The difference with Py_buffer/memoryview is that they provide access to the raw data without necessarily copying anything. str and numbers don't implement it as there's no obvious default interpretation (the b'\x00' * n interpretation of integers is part of the bytes constructor and now a decision we mostly regret - it should have been a keyword argument or a separate class method) > > > >>> Unsupported codes >>> ----------------- >>> >>> %r (which calls __repr__), and %a (which calls ascii() on __repr__) are not >>> supported. >> >> >> +1 on not supporting b'%r' (i.e. 
I agree with the PEP). >> >> Why not support b'%a'? That seems to be a strange thing to prohibit. > > > I'll admit to being somewhat on the fence about %a. > > It seems there are two possibilities with %a: > > 1) have it be ascii(repr(obj)) > > 2) have it be str(obj).encode('ascii', 'strict') This gets very close to crossing the line into implicit encoding of text again. Binary interpolation is being added back for the specific use case of working with ASCII compatible segments in binary formats, and it's at best arguable that supporting %a will help with that use case. However, without it, there may be a greater temptation to inappropriately define __bytes__ just to support binary interpolation, rather than because a type truly has an appropriate translation directly to bytes. By allowing %a, we avoid that temptation. This is also potentially useful specifically in the case of binary logging formats and as a quick way to request backslash escaping of non-ASCII characters in text. Call it +0.5 for allowing %a. I don't expect it to be used heavily, but I think it will head off a fair bit of potential misuse of __bytes__. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Sat Jan 18 15:39:29 2014 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Sat, 18 Jan 2014 14:39:29 +0000 Subject: [Python-Dev] PEP 461 updates In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> <52D85E61.8040701@udel.edu> <874n53cptr.fsf@uwakimon.sk.tsukuba.ac.jp> <20140117053611.GB3915@ando> <87wqhzapz1.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 17 January 2014 21:37, Chris Barker wrote: > > For the record, we've got a pretty good thread (not this good, though!) 
over > on the numpy list about how to untangle the mess that has resulted from > porting text-file-parsing code to py3 (and the underlying issue with the 'S' > data type in numpy...) > > One note from the github issue: > """ > The use of asbytes originates only from the fact that b'%d' % (20,) does > not work. > """ > > So yeah PEP 461! (even if too late for numpy...) The discussion about numpy.loadtxt and the 'S' dtype is not relevant to PEP 461. PEP 461 is about facilitating handling ascii/binary protocols and file formats. The loadtxt function is for reading text files. Reading text files is already handled very well in Python 3. The only caveat is that you need to specify the encoding when you open the file. The loadtxt function doesn't specify the encoding when it opens the file so on Python 3 it gets the system default encoding when reading from the file. Since the 'S' dtype is for an array of bytes the loadtxt function has to encode the unicode strings before storing them in the array. The function has no idea what encoding the user wants so it just uses latin-1 leading to mojibake if the file content and encoding are not compatible with latin-1 e.g.: utf-8. The loadtxt function is a classic example of how *not* to do text and whoever made it that way probably didn't understand unicode and the Python 3 text model. If they did understand what they were doing then they knew that they were implementing a dirty hack. If you want to draw a relevant lesson from that thread in this one then the lesson argues against PEP 461: adding back the bytes formatting methods helps people who refuse to understand text processing and continue implementing dirty hacks instead of doing it properly. 
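
The caveat above (name the encoding when opening a text file, rather than inheriting a default or guessing latin-1) can be illustrated with a minimal sketch. It uses only the standard library; the sample text and the temporary file are assumptions made for the illustration and have nothing to do with numpy.loadtxt itself.

```python
import os
import tempfile

text = "caf\u00e9"  # "café": one non-ASCII character is enough to show the problem

# Write the file as UTF-8, the way a data file might arrive from elsewhere.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as f:
    f.write(text)
    path = f.name

try:
    # Explicit, correct encoding: the text round-trips unchanged.
    with open(path, encoding="utf-8") as f:
        good = f.read()

    # Wrong guess (latin-1): every byte still decodes, so no error is
    # raised, but the two UTF-8 bytes of the accented letter come back
    # as two unrelated characters (mojibake).
    with open(path, encoding="latin-1") as f:
        bad = f.read()
finally:
    os.remove(path)

print(repr(good))
print(repr(bad))
```

Because latin-1 maps every byte to some character, the wrong guess fails silently, which is why it tends to go unnoticed until the data is displayed.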
Oscar From stefan at bytereef.org Sat Jan 18 17:24:19 2014 From: stefan at bytereef.org (Stefan Krah) Date: Sat, 18 Jan 2014 17:24:19 +0100 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <1870884.AZlKeLdVux@raxxla> References: <1870884.AZlKeLdVux@raxxla> Message-ID: <20140118162419.GA31614@sleipnir.bytereef.org> Serhiy Storchaka wrote: > Now generated files have suffixes .clinic.c. I think it will be better, if they > will end at special suffix (.c.clinic or even just .clinic). Can the output not go into a header file with static inline functions? I'd rather see memoryview.h than memoryview.clinic.c. Stefan Krah From eric at trueblade.com Sat Jan 18 17:30:21 2014 From: eric at trueblade.com (Eric V. Smith) Date: Sat, 18 Jan 2014 11:30:21 -0500 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <20140118162419.GA31614@sleipnir.bytereef.org> References: <1870884.AZlKeLdVux@raxxla> <20140118162419.GA31614@sleipnir.bytereef.org> Message-ID: <52DAAC1D.8060204@trueblade.com> On 1/18/2014 11:24 AM, Stefan Krah wrote: > Serhiy Storchaka wrote: >> Now generated files have suffixes .clinic.c. I think it will be better, if they >> will end at special suffix (.c.clinic or even just .clinic). > > Can the output not go into a header file with static inline functions? > > I'd rather see memoryview.h than memoryview.clinic.c. Same here. There's some history for this, but not for generated code. In Objects/stringlib, all of the files are .h files. They're really C code designed to be included by other .c files. Eric. From stefan at bytereef.org Sat Jan 18 18:06:06 2014 From: stefan at bytereef.org (Stefan Krah) Date: Sat, 18 Jan 2014 18:06:06 +0100 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <20140118162419.GA31614@sleipnir.bytereef.org> References: <1870884.AZlKeLdVux@raxxla> <20140118162419.GA31614@sleipnir.bytereef.org> Message-ID: <20140118170606.GA31878@sleipnir.bytereef.org> > I'd rather see memoryview.h than memoryview.clinic.c. 
Or, if this collides with Include/*, one of the following: memoryview_func.h // public functions memoryview_if.h // public interface Stefan Krah From solipsis at pitrou.net Sat Jan 18 18:09:21 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 18 Jan 2014 18:09:21 +0100 Subject: [Python-Dev] .clinic.c vs .c.clinic References: <1870884.AZlKeLdVux@raxxla> <20140118162419.GA31614@sleipnir.bytereef.org> <20140118170606.GA31878@sleipnir.bytereef.org> Message-ID: <20140118180921.6b14ccd5@fsol> On Sat, 18 Jan 2014 18:06:06 +0100 Stefan Krah wrote: > > I'd rather see memoryview.h than memoryview.clinic.c. > > Or, if this collides with Include/*, one of the following: > > memoryview_func.h // public functions > > memoryview_if.h // public interface Objects/memoryview.clinic.h should be fine. Regards Antoine. From stefan at bytereef.org Sat Jan 18 18:18:49 2014 From: stefan at bytereef.org (Stefan Krah) Date: Sat, 18 Jan 2014 18:18:49 +0100 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <20140118180921.6b14ccd5@fsol> References: <1870884.AZlKeLdVux@raxxla> <20140118162419.GA31614@sleipnir.bytereef.org> <20140118170606.GA31878@sleipnir.bytereef.org> <20140118180921.6b14ccd5@fsol> Message-ID: <20140118171849.GA31959@sleipnir.bytereef.org> Antoine Pitrou wrote: > Objects/memoryview.clinic.h should be fine. Last attempt: Objects/memoryview.api.h That is more neutral and describes what the file contains. IOW, it's easier to ignore the name (which is good in this case). 
Stefan Krah From solipsis at pitrou.net Sat Jan 18 18:23:47 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 18 Jan 2014 18:23:47 +0100 Subject: [Python-Dev] .clinic.c vs .c.clinic References: <1870884.AZlKeLdVux@raxxla> <20140118162419.GA31614@sleipnir.bytereef.org> <20140118170606.GA31878@sleipnir.bytereef.org> <20140118180921.6b14ccd5@fsol> <20140118171849.GA31959@sleipnir.bytereef.org> Message-ID: <20140118182347.5ff70152@fsol> On Sat, 18 Jan 2014 18:18:49 +0100 Stefan Krah wrote: > Antoine Pitrou wrote: > > Objects/memoryview.clinic.h should be fine. > > Last attempt: > > Objects/memoryview.api.h > > > That is more neutral and describes what the file contains. Disagreed. It's not an API in the sense that it's something that's designed to be called directly by third-party code. Regards Antoine. From stefan at bytereef.org Sat Jan 18 18:39:08 2014 From: stefan at bytereef.org (Stefan Krah) Date: Sat, 18 Jan 2014 18:39:08 +0100 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <20140118182347.5ff70152@fsol> References: <1870884.AZlKeLdVux@raxxla> <20140118162419.GA31614@sleipnir.bytereef.org> <20140118170606.GA31878@sleipnir.bytereef.org> <20140118180921.6b14ccd5@fsol> <20140118171849.GA31959@sleipnir.bytereef.org> <20140118182347.5ff70152@fsol> Message-ID: <20140118173908.GA32139@sleipnir.bytereef.org> Antoine Pitrou wrote: > > Objects/memoryview.api.h > > > > > > That is more neutral and describes what the file contains. > > Disagreed. It's not an API in the sense that it's something that's > designed to be called directly by third-party code. Right. Objects/memoryview.ac.h perhaps? I sort of dislike reading full words in filename extensions. Stefan Krah From ethan at stoneleaf.us Sat Jan 18 18:18:53 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 18 Jan 2014 09:18:53 -0800 Subject: [Python-Dev] PEP 461 Final? 
In-Reply-To: <20140118124006.5a8c7314@fsol> References: <52D95F11.3020005@stoneleaf.us> <20140118124006.5a8c7314@fsol> Message-ID: <52DAB77D.8080902@stoneleaf.us> On 01/18/2014 03:40 AM, Antoine Pitrou wrote: > On Fri, 17 Jan 2014 08:49:21 -0800 > Ethan Furman wrote: >> ================================================================================ >> PEP: 461 > > There are formatting issues in the HTML rendering, I think the ReST > code needs a bit massaging: > http://www.python.org/dev/peps/pep-0461/ I'm not seeing the problems (could be I don't have enough experience to spot them). >> .. note:: >> >> Because the str type does not have a __bytes__ method, attempts to >> directly use 'a string' as a bytes interpolation value will raise an >> exception. To use 'string' values, they must be encoded or otherwise >> transformed into a bytes sequence:: > > s/'string' values/unicode strings/ Fixed, thanks. -- ~Ethan~ From larry at hastings.org Sat Jan 18 18:51:19 2014 From: larry at hastings.org (Larry Hastings) Date: Sat, 18 Jan 2014 09:51:19 -0800 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: References: <1870884.AZlKeLdVux@raxxla> Message-ID: <52DABF17.30401@hastings.org> On 01/18/2014 05:28 AM, Nick Coghlan wrote: > > However, if both Visual Studio and gdb can still find the symbols > correctly, even with the ".clinic" extension, then I would consider > that a point strongly in favour of Serhiy's suggestion. > No, that would be a lack of a point against Serhiy's suggestion. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From storchaka at gmail.com Sat Jan 18 18:56:38 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 18 Jan 2014 19:56:38 +0200 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: References: <1870884.AZlKeLdVux@raxxla> Message-ID: 18.01.14 11:06, Chris Angelico wrote: > A point for the contrary side: In any editor or IDE with syntax > highlighting, a .clinic.c file will be highlighted as C code, but it > would take extra configuration to handle a .clinic file that way. But > that's a relatively minor consideration (AIUI most people won't be > looking at the .clinic files much, and for those who do, configure the > editor appropriately). Yes, this was Larry's main objection. And like you, I think this is a minor consideration (for the same reasons). From storchaka at gmail.com Sat Jan 18 19:07:37 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 18 Jan 2014 20:07:37 +0200 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: References: <1870884.AZlKeLdVux@raxxla> Message-ID: 18.01.14 15:28, Nick Coghlan wrote: > I can argue either side, but the biggest potential problem I see with > Serhiy's suggestion is the likelihood of breaking automatic cross > referencing of symbols in most IDEs, as well as causing possible issues > for interactive debuggers. These are at least valid fragments of C > files, even if they're not designed to be compiled independently. > However, if both Visual Studio and gdb can still find the symbols > correctly, even with the ".clinic" extension, then I would consider that > a point strongly in favour of Serhiy's suggestion. Good point. This had not occurred to me, and now I am almost ready to give up my proposals. But C allows you to include files with any extension (.h, .hpp, .h++, .c, .cpp, .inc, .gen, etc.), and a powerful tool should follow "#include"s without paying attention to extensions.
On the other hand, simpler tools can work with filename masks, and for them it is much easier to add a new extension than to set an exclusion condition (the latter may not be supported at all). At least that is so with the tools that I use. From storchaka at gmail.com Sat Jan 18 19:15:01 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 18 Jan 2014 20:15:01 +0200 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <20140118173908.GA32139@sleipnir.bytereef.org> References: <1870884.AZlKeLdVux@raxxla> <20140118162419.GA31614@sleipnir.bytereef.org> <20140118170606.GA31878@sleipnir.bytereef.org> <20140118180921.6b14ccd5@fsol> <20140118171849.GA31959@sleipnir.bytereef.org> <20140118182347.5ff70152@fsol> <20140118173908.GA32139@sleipnir.bytereef.org> Message-ID: 18.01.14 19:39, Stefan Krah wrote: > Right. Objects/memoryview.ac.h perhaps? I sort of dislike reading full words > in filename extensions. .ac is a well-known suffix of autoconf-related files. And a trailing .h has the same disadvantages as .c. From storchaka at gmail.com Sat Jan 18 19:10:47 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 18 Jan 2014 20:10:47 +0200 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <20140118180921.6b14ccd5@fsol> References: <1870884.AZlKeLdVux@raxxla> <20140118162419.GA31614@sleipnir.bytereef.org> <20140118170606.GA31878@sleipnir.bytereef.org> <20140118180921.6b14ccd5@fsol> Message-ID: 18.01.14 19:09, Antoine Pitrou wrote: > On Sat, 18 Jan 2014 18:06:06 +0100 > Stefan Krah wrote: >>> I'd rather see memoryview.h than memoryview.clinic.c. >> >> Or, if this collides with Include/*, one of the following: >> >> memoryview_func.h // public functions >> >> memoryview_if.h // public interface > > Objects/memoryview.clinic.h should be fine. All my objections against .clinic.c are applicable to .clinic.h as well.
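
The filename-mask argument running through this sub-thread can be made concrete with a small sketch. It assumes a tool that selects files with shell-style masks (modelled here with Python's fnmatch module); the file names are the hypothetical ones floated in the thread, not files that necessarily exist in the tree.

```python
from fnmatch import fnmatchcase

# Candidate names discussed in the thread (hypothetical examples).
names = [
    "memoryview.c",
    "memoryview.h",
    "memoryview.clinic.c",
    "memoryview.clinic.h",
    "memoryview.c.clinic",
]


def matching(mask):
    """Return the names selected by a shell-style mask."""
    return [name for name in names if fnmatchcase(name, mask)]


# The usual "all C sources and headers" mask also picks up generated
# files when they end in .clinic.c or .clinic.h ...
sources = matching("*.[ch]")

# ... while a distinct trailing suffix keeps them out of that mask and
# lets them be selected on their own.
generated = matching("*.clinic")

print(sources)
print(generated)
```

With a ".c" or ".h" tail, hiding the generated files needs a second, negative condition, which is the capability Serhiy says some of his tools lack.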
From zachary.ware+pydev at gmail.com Sat Jan 18 19:29:43 2014 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Sat, 18 Jan 2014 12:29:43 -0600 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: References: <1870884.AZlKeLdVux@raxxla> <20140118162419.GA31614@sleipnir.bytereef.org> <20140118170606.GA31878@sleipnir.bytereef.org> <20140118180921.6b14ccd5@fsol> Message-ID: On Sat, Jan 18, 2014 at 12:10 PM, Serhiy Storchaka wrote: > 18.01.14 19:09, Antoine Pitrou ???????(??): > >> On Sat, 18 Jan 2014 18:06:06 +0100 >> Stefan Krah wrote: >>>> >>>> I'd rather see memoryview.h than memoryview.clinic.c. >>> >>> >>> Or, if this collides with Include/*, one of the following: >>> >>> memoryview_func.h // public functions >>> >>> memoryview_if.h // public interface >> >> >> Objects/memoryview.clinic.h should be fine. > > > All my objections against .clinic.c are applicable to .clinic.h as well. Would it be of any help for the clinic files to live in their own separate directory? Say, instead of Objects/memoryview.clinic.c, Clinic/memoryview.clinic.c? -- Zach From larry at hastings.org Sat Jan 18 19:49:39 2014 From: larry at hastings.org (Larry Hastings) Date: Sat, 18 Jan 2014 10:49:39 -0800 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <1870884.AZlKeLdVux@raxxla> References: <1870884.AZlKeLdVux@raxxla> Message-ID: <52DACCC3.9080800@hastings.org> On 01/18/2014 01:02 AM, Serhiy Storchaka wrote: > 1. I very very often use global search in sources. It's my way of navigation > and it's my way of investigations. I don't want to get false results in > generated files. And it is much easy to specify mask '*.[ch]' or '*.c,*.h' > (depending on tool) than specify a mask and negative mask. The latter is even > not always possible, I can write cumbersome expression for the find command, > but Midnight Commander doesn't support negative masks at all (and perhaps your > favorite IDE doesn't support them too). Apparently you do this at the command-line. 
In that case, you can make an 'alias' to hide the cumbersome expression. Perhaps you've already made one that ignores the ".hg" directory tree? If the generated file didn't end in a standard extension, editors won't automatically recognize them and won't code-color them. You tell me "everyone can easily reconfigure their editors" but it seems you writing an alias is unreasonable. > 2. I'm not use any IDE, but if you use, it can be important for you. If IDE > shows sources tree, unlikely you want to see generated *.clinic.c files in > them. This will increase the list of sources almost twice. My experience is that IDEs either show all files in the "project" (which should include the generated files anyway) or they show all files in the directory. So this concern assumes behavior that isn't true. > 3. Pathname expansion works better with unique endings, You can open all > Modules/_io/*.c files, but unlikely you so interested in *.clinic.c files which > are matched by former pattern. How often do people edit *.c in a directory? And then, how often do people edit *.c in a directory and wouldn't want to see the Argument Clinic generated code? > 4. .c suffix at the end lies. This is not compilable C source file. This file > should be included in other C source file. This will confuse accidental user > and other tools. Including Argument Clinic itself, this is why it inserts the > "preserve" directive at the start of generated file. But other tools have no > such sign. This is nonsense. The contents of the file is 100% C. If you added the proper include files (by hand, not recommended) it would compile standalone. A lot of your suggestions assume no one would ever want to examine the generated code. But people will still want to look in there: * to set breakpoints * to make sure existing Argument Clinic generated code does what you wanted * when experimenting with Argument Clinic inputs So I don't see the need to make the generated files totally invisible. 
Later in the thread someone suggests that ".h" would be a better ending; I'm willing to consider that. (As in ".clinic.h".) After all, you do include it, and there's some precedent for C code in H files (the already-cited stringlib). Also, now I'm starting to worry that adding ".clinic.c" files to an IDE would mean the IDE would try to compile them. Can somebody who uses an IDE to compile Python code experiment with ".clinic.c" files and report back--is it possible to add them to your "project" in such a way that the compiler will notice when they changed but won't try to compile them standalone? I'm thinking specifically of MSVS, as that's explicitly supported by CPython, but I'm interested in results from other IDEs if people use them with CPython trunk. Serhiy: I appreciate your contributions, both to Python in general and to Argument Clinic specifically. And you're only doing this because you care. Still, I feel like you've never been shown a bikeshed you didn't have an opinion on. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at bytereef.org Sat Jan 18 20:05:50 2014 From: stefan at bytereef.org (Stefan Krah) Date: Sat, 18 Jan 2014 20:05:50 +0100 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: References: <1870884.AZlKeLdVux@raxxla> <20140118162419.GA31614@sleipnir.bytereef.org> <20140118170606.GA31878@sleipnir.bytereef.org> <20140118180921.6b14ccd5@fsol> <20140118171849.GA31959@sleipnir.bytereef.org> <20140118182347.5ff70152@fsol> <20140118173908.GA32139@sleipnir.bytereef.org> Message-ID: <20140118190550.GA403@sleipnir.bytereef.org> Serhiy Storchaka wrote: > .ac is well known suffix of autoconf related files. I know, but unless someone writes Objects/configure.c I think this won't be a problem. > And tail .h has same disadvantages as .c. I'm not strongly inconvenienced by those you listed. 
Stefan Krah

From ethan at stoneleaf.us Sat Jan 18 23:01:03 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sat, 18 Jan 2014 14:01:03 -0800
Subject: [Python-Dev] PEP 461 Final?
In-Reply-To:
References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> <52D9DE09.60102@stoneleaf.us>
Message-ID: <52DAF99F.4050609@stoneleaf.us>

On 01/18/2014 05:48 AM, Nick Coghlan wrote:
> On 18 Jan 2014 11:52, "Ethan Furman" wrote:
>>
>> I'll admit to being somewhat on the fence about %a.
>>
>> It seems there are two possibilities with %a:
>>
>> 1) have it be ascii(repr(obj))
>>
>> 2) have it be str(obj).encode('ascii', 'strict')
>
> This gets very close to crossing the line into implicit encoding of text again. Binary interpolation is being added back
> for the specific use case of working with ASCII compatible segments in binary formats, and it's at best arguable that
> supporting %a will help with that use case.

Agreed.

> However, without it, there may be a greater temptation to inappropriately define __bytes__ just to support binary
> interpolation, rather than because a type truly has an appropriate translation directly to bytes.

True.

> By allowing %a, we avoid that temptation. This is also potentially useful specifically in the case of binary logging
> formats and as a quick way to request backslash escaping of non-ASCII characters in text.
>
> Call it +0.5 for allowing %a. I don't expect it to be used heavily, but I think it will head off a fair bit of potential
> misuse of __bytes__.

So, if %a is added it would act like:

---------
"%a" % some_obj
---------
tmp = str(some_obj)
res = b''
for ch in tmp:
    if ord(ch) < 256:
        res += bytes([ord(ch)])
    else:
        res += unicode_escape(ch)
---------

where 'unicode_escape' would yield something like "\u0440" ?
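
For comparison, here is a minimal runnable sketch of the alternative semantics Neil Schemenauer describes later in the thread (call ascii() on the object, then convert the result to bytes). The helper name percent_a is hypothetical, and the sketch assumes %a is defined that way; it is not a statement of what the final PEP text specifies.

```python
def percent_a(obj):
    """Bytes that b'%a' would insert, assuming ascii()-then-encode semantics."""
    # ascii() is repr() with every non-ASCII character backslash-escaped,
    # so the result is pure ASCII and the strict encode cannot fail.
    return ascii(obj).encode("ascii")


print(percent_a("abc"))
print(percent_a("\u0440"))
print(percent_a(42))
```

Two differences from the loop above are worth noting: ascii() starts from repr(), so a str argument keeps its surrounding quotes, and every character above 127 (not above 255) is escaped, which keeps the output within ASCII.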
-- ~Ethan~ From nas at arctrix.com Sat Jan 18 23:31:11 2014 From: nas at arctrix.com (Neil Schemenauer) Date: Sat, 18 Jan 2014 16:31:11 -0600 Subject: [Python-Dev] Migration from Python 2.7 and bytes formatting In-Reply-To: <87r486aq02.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87r486aq02.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20140118223111.GA27903@python.ca> On 2014-01-18, Stephen J. Turnbull wrote: > The above are descriptions of current behavior (ie, unchanged by PEPs > 460, 461), and this: [..] > is the content of this proposal, is that right? The proposal is that -2 enables the following: - %r as an alias for %a (i.e. calls ascii()) - %s will fallback to calling PyObject_Str() and then call _PyUnicode_AsASCIIString(obj, "strict") to convert to bytes That's it. After sleeping on it, I'm not sure that's enough Python 2.x compatibility to help a lot. I haven't ported much code to 3.x yet but I imagine the following are major challenges: - comparisons between str and bytes always returns unequal - indexing/iterating bytes returns integers, not bytes objects - concatenation of str and bytes fails (not so bad since a TypeError is generated right away). Maybe the -2 command line option could revert to Python 2.x behavior for the above but I'm worried it might break working 3.x library code (the %r/%s change is very safe). I think I'll play with the idea and see which unit tests get broken. Ideally, there would be warnings generated when each backwards compatible behavior kicks in, that would greatly help when fixing up code. 
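
The three porting hazards listed above can be reproduced directly on a Python 3 interpreter. This is a short sketch run without the -b or -bb command-line options (with -b the comparison also emits a BytesWarning, and with -bb it raises one); the variable names are only for illustration.

```python
raw = b"abc"
text = "abc"

# 1. Comparing str with bytes is never equal, and raises no error.
equal = (raw == text)

# 2. Indexing or iterating bytes yields ints; slicing still yields bytes.
first = raw[0]
items = list(raw)
sliced = raw[0:1]

# 3. Concatenating str and bytes fails immediately with TypeError.
try:
    raw + text
except TypeError as exc:
    concat_error = type(exc).__name__
else:
    concat_error = None

print(equal, first, items, sliced, concat_error)
```

Only the third case fails loudly; the first two change results silently, which is why per-behaviour warnings would help during a port.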
Neil From shibturn at gmail.com Sun Jan 19 00:32:31 2014 From: shibturn at gmail.com (Richard Oudkerk) Date: Sat, 18 Jan 2014 23:32:31 +0000 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <20140118180921.6b14ccd5@fsol> References: <1870884.AZlKeLdVux@raxxla> <20140118162419.GA31614@sleipnir.bytereef.org> <20140118170606.GA31878@sleipnir.bytereef.org> <20140118180921.6b14ccd5@fsol> Message-ID: <52DB0F0F.5020207@gmail.com> On 18/01/2014 05:09 pm, Antoine Pitrou wrote: > Or, if this collides with Include/*, one of the following: > > memoryview_func.h // public functions > > memoryview_if.h // public interface > Objects/memoryview.clinic.h should be fine. Or maybe have a __clinic__ directory similar to __pycache__. -- Richard From Steve.Dower at microsoft.com Sun Jan 19 01:44:08 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Sun, 19 Jan 2014 00:44:08 +0000 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <52DACCC3.9080800@hastings.org> References: <1870884.AZlKeLdVux@raxxla>,<52DACCC3.9080800@hastings.org> Message-ID: <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> Visual Studio will try to compile them if they end with .c, though this can be disabled on a per-file basis in the project file. Files ending in .h won't be compiled, though changes should be detected and cause the .c files that include them to be recompiled. .inl is also sometimes used as an extension for this purpose. I don't recall whether VS will add file associations for this type. Cheers, Steve Top-posted from my Windows Phone From Nikolaus at rath.org Sun Jan 19 01:56:36 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Sat, 18 Jan 2014 16:56:36 -0800 Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args() Message-ID: <87vbxgeqm3.fsf@vostro.rath.org> Hello, I'm trying to convert functions using parse_time_t_args() (from timemodule.c) for argument parsing to argument clinic.
The function is defined as:

,----
| static int
| parse_time_t_args(PyObject *args, char *format, time_t *pwhen)
| {
|     PyObject *ot = NULL;
|     time_t whent;
|
|     if (!PyArg_ParseTuple(args, format, &ot))
|         return 0;
|     if (ot == NULL || ot == Py_None) {
|         whent = time(NULL);
|     }
|     else {
|         if (_PyTime_ObjectToTime_t(ot, &whent) == -1)
|             return 0;
|     }
|     *pwhen = whent;
|     return 1;
| }
`----

and used like this:

,----
| static PyObject *
| time_localtime(PyObject *self, PyObject *args)
| {
|     time_t when;
|     struct tm buf;
|
|     if (!parse_time_t_args(args, "|O:localtime", &when))
|         return NULL;
|     if (pylocaltime(&when, &buf) == -1)
|         return NULL;
|     return tmtotuple(&buf);
| }
`----

In other words, if any Python object is passed to it, it calls
_PyTime_ObjectToTime_t on it to convert it to time_t, and otherwise uses
time(NULL) as the default value.

My first attempt to implement something similar in argument clinic was:

,----
| /*[python input]
| class time_t_converter(CConverter):
|     type = 'time_t'
|     converter = 'time_t_converter'
|     default = None
|     py_default = 'None'
|     c_default = 'time(NULL)'
|     converter = '_PyTime_ObjectToTime_t'
| [python start generated code]*/
|
| /*[clinic input]
| time.localtime
|
|     seconds: time_t
|     /
|
| bla.
| [clinic start generated code]*/
`----

However, running clinic.py on this file gives:

,----
| $ Tools/clinic/clinic.py Modules/timemodule.c
| Error in file "Modules/timemodule.c" on line 529:
| Exception raised during parsing:
| Traceback (most recent call last):
|   File "Tools/clinic/clinic.py", line 1445, in parse
|     parser.parse(block)
|   File "Tools/clinic/clinic.py", line 2738, in parse
|     self.state(None)
|   File "Tools/clinic/clinic.py", line 3468, in state_terminal
|     self.function.docstring = self.format_docstring()
|   File "Tools/clinic/clinic.py", line 3344, in format_docstring
|     s += "".join(a)
| TypeError: sequence item 2: expected str instance, NoneType found
`----

What am I doing wrong?

Best,
Nikolaus

--
Encrypted emails preferred.
PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

"Time flies like an arrow, fruit flies like a Banana."

From ethan at stoneleaf.us Sun Jan 19 01:47:51 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sat, 18 Jan 2014 16:47:51 -0800
Subject: [Python-Dev] .clinic.c vs .c.clinic
In-Reply-To: <52DACCC3.9080800@hastings.org>
References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org>
Message-ID: <52DB20B7.4000700@stoneleaf.us>

On 01/18/2014 10:49 AM, Larry Hastings wrote:
>
> Later in the thread someone suggests that ".h" would be a better ending; I'm willing to consider that.

I'll cast a vote for .clinic.h. :)

--
~Ethan~

From nas at arctrix.com Sun Jan 19 02:21:25 2014
From: nas at arctrix.com (Neil Schemenauer)
Date: Sun, 19 Jan 2014 01:21:25 +0000 (UTC)
Subject: [Python-Dev] PEP 461 Final?
References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> <52D9DE09.60102@stoneleaf.us> <52DAF99F.4050609@stoneleaf.us>
Message-ID:

Ethan Furman wrote:
> So, if %a is added it would act like:
>
> ---------
> "%a" % some_obj
> ---------
> tmp = str(some_obj)
> res = b''
> for ch in tmp:
>     if ord(ch) < 256:
>         res += bytes([ord(ch)]
>     else:
>         res += unicode_escape(ch)
> ---------
>
> where 'unicode_escape' would yield something like "\u0440" ?

My patch on the tracker already implements %a, it's simple. Just call
PyObject_ASCII() (same as ascii()) then call
PyUnicode_AsLatin1String(s) to convert it to bytes and stick it in.
PyObject_ASCII does not return non-ASCII characters, no decode error is
possible. We could call _PyUnicode_AsASCIIString(s, "strict") instead
if we are afraid for non-ASCII bytes coming out of PyObject_ASCII.

Neil

From nas at arctrix.com Sun Jan 19 02:26:19 2014
From: nas at arctrix.com (Neil Schemenauer)
Date: Sun, 19 Jan 2014 01:26:19 +0000 (UTC)
Subject: [Python-Dev] PEP 461 Final?
References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> Message-ID: Steven D'Aprano wrote: >> To properly handle int and float subclasses, int(), index(), and float() >> will be called on the objects intended for (d, i, u), (b, o, x, X), and >> (e, E, f, F, g, G). > > > -1 on this idea. > > This is a rather large violation of the principle of least surprise, and > radically different from the behaviour of Python 3 str. In Python 3, > '%d' interpolation calls the __str__ method, so if you subclass, you can > get the behaviour you want: > > py> class HexInt(int): > ... def __str__(self): > ... return hex(self) > ... > py> "%d" % HexInt(23) > '0x17' > > > which is exactly what we should expect from a subclass. > > You're suggesting that bytes should ignore any custom display > implemented by subclasses, and implicitly coerce them to the superclass > int. What is the justification for this? You don't define or even > describe what you consider "properly handle". The proposed behavior (at least as I understand it and as I've implemented in my proposed patch) matches Python 2 str/unicode and Python 3 str behavior for these codes. If you want to allow subclasses to have control or to use duck-typing, you have to use str and __format__. I'm okay with the limitation, bytes formatting can be simple, limited and fast. Neil From rmsr at lab.net Sun Jan 19 03:11:31 2014 From: rmsr at lab.net (Ryan Smith-Roberts) Date: Sat, 18 Jan 2014 18:11:31 -0800 Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args() In-Reply-To: <87vbxgeqm3.fsf@vostro.rath.org> References: <87vbxgeqm3.fsf@vostro.rath.org> Message-ID: Hi Nikolaus. I also started a conversion of timemodule, but dropped it when I saw in the issue that you had taken over that conversion. I also tried to turn parse_time_t_args into a converter. However, it won't work. 
The problem is that parse_time_t_args must be called whether or not the user supplies an argument to the function, but an Argument Clinic converter only gets called if the user actually supplies something, and not on the default value.

So, the best idea is to

* Remove the PyArg_ParseTuple code from parse_time_t_args
* Declare seconds as a plain object in Argument Clinic
* Call the modified parse_time_t_args on seconds first thing in the _impl functions

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stephen at xemacs.org Sun Jan 19 03:27:12 2014
From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Sun, 19 Jan 2014 11:27:12 +0900 Subject: [Python-Dev] Migration from Python 2.7 and bytes formatting In-Reply-To: <20140118223111.GA27903@python.ca> References: <87r486aq02.fsf@uwakimon.sk.tsukuba.ac.jp> <20140118223111.GA27903@python.ca> Message-ID: <87wqhwaepr.fsf@uwakimon.sk.tsukuba.ac.jp> Neil Schemenauer writes: > That's it. After sleeping on it, I'm not sure that's enough Python > 2.x compatibility to help a lot. I haven't ported much code to 3.x > yet but I imagine the following are major challenges: > > - comparisons between str and bytes always returns unequal > > - indexing/iterating bytes returns integers, not bytes objects > > - concatenation of str and bytes fails (not so bad since > a TypeError is generated right away). Experience shows these are rarely major challenges. The reason we are having this discussion is that if you are the kind of programmer who runs into challenges once, you are likely to run into all of the above and more, repeatedly, and addressing them using features available in Python up to v3.3 make your code unreadable. In other words, it's like unemployment at 5%. It would be bearable (just) if the pain were shared by 100% of the people being 5% unemployed, but rather the burden falls on the 5% who are 100% unemployed. Now, the problem that many existing libraries face is that they were designed for monolingual environments where text encodings are more or less ASCII compatible[1]. If you stay in the Python 2 world, you can "internationalize" with the existing design, more or less limp along, fixing encoding bugs as they arise (not "if" but "when", and it can take a decade to find them all). But Python 3 *strongly* discourages that policy. 
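For reference, the three Python 3 behaviours in Neil's list quoted at the top of this message can be checked directly. This is only an illustrative sketch with made-up sample values, not code from any of the libraries under discussion:

```python
# Illustrative only: the three str/bytes behaviours from the quoted list.

# 1. Comparing str with bytes is never equal (no implicit decoding).
assert ("abc" == b"abc") is False

# 2. Indexing or iterating bytes yields integers, not length-1 bytes objects.
assert b"abc"[0] == 97
assert list(b"abc") == [97, 98, 99]
# Slicing is the way to get a length-1 bytes object back.
assert b"abc"[0:1] == b"a"

# 3. Concatenating str and bytes fails immediately with TypeError.
try:
    "abc" + b"abc"
    mixed_concat_failed = False
except TypeError:
    mixed_concat_failed = True
assert mixed_concat_failed
```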
From the point of view of design for the modern environment, such libraries really should have their I/O modules rewritten from scratch (not a huge job), and the necessary adjustments made in processing code (few but randomly dispersed through the code, and each a ticking time bomb for your users). But I stress that the problem here is that the design of such libraries is at fault, not Python 3. The world has changed.[2] And then there are the remaining 5% or so that really need to work mostly in bytes, but want to use string formatting to format their byte streams. I used to think that this was just a porting convenience, but I was wrong. Code written this way is often more concise and more readable than code written using .join() or the struct module. It *should* be written using string formatting. And that's what PEPs 460 and 461 are intended to address. We'll see what happens as these PEPs are implemented, but I suspect that we'll find that there are very few bandaids left that are of much use. That is, as I claimed above, for the remaining problematic libraries a redesign will be needed. Footnotes: [1] In the technical sense that you can rely on ASCII bytes to mean ASCII characters, not part of a non-ASCII character. [2] And if the world *hasn't* changed for your application, what's wrong with staying with Python 2? From ethan at stoneleaf.us Sun Jan 19 03:34:49 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 18 Jan 2014 18:34:49 -0800 Subject: [Python-Dev] PEP 461 Final? 
In-Reply-To: References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> <52D9DE09.60102@stoneleaf.us> <52DAF99F.4050609@stoneleaf.us> Message-ID: <52DB39C9.2020104@stoneleaf.us> On 01/18/2014 05:21 PM, Neil Schemenauer wrote: > Ethan Furman wrote: >> So, if %a is added it would act like: >> >> --------- >> "%a" % some_obj >> --------- >> tmp = str(some_obj) >> res = b'' >> for ch in tmp: >> if ord(ch) < 256: >> res += bytes([ord(ch)] >> else: >> res += unicode_escape(ch) >> --------- >> >> where 'unicode_escape' would yield something like "\u0440" ? > > My patch on the tracker already implements %a, it's simple. Before one implements a patch it is good to know the specifications. > Just call PyObject_ASCII() (same as ascii()) then call > PyUnicode_AsLatin1String(s) to convert it to bytes and stick it in. > PyObject_ASCII does not return non-ASCII characters, no decode error > is possible. We could call _PyUnicode_AsASCIIString(s, "strict") > instead if we are afraid for non-ASCII bytes coming out of > PyObject_ASCII. I appreciate that this is the behavior you want, but I'm not sure it's the behavior Nick was describing. -- ~Ethan~ From ethan at stoneleaf.us Sun Jan 19 04:01:22 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 18 Jan 2014 19:01:22 -0800 Subject: [Python-Dev] PEP 461 Final? In-Reply-To: <52DAF99F.4050609@stoneleaf.us> References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> <52D9DE09.60102@stoneleaf.us> <52DAF99F.4050609@stoneleaf.us> Message-ID: <52DB4002.4080804@stoneleaf.us> On 01/18/2014 02:01 PM, Ethan Furman wrote: > > where 'unicode_escape' would yield something like "\u0440" ? Just to be clear, "\u0440" is the six bytes b'\\', b'u', b'0', b'4', b'4', b'0'. 
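The six-byte claim can be checked with the existing "backslashreplace" error handler. This is a small sketch; using that handler here is an assumption made for illustration, not something specified in this message:

```python
# Sketch: U+0440 cannot be encoded as ASCII, so "backslashreplace"
# substitutes the escape sequence, one ASCII byte per character.
escaped = "\u0440".encode("ascii", errors="backslashreplace")

assert escaped == b"\\u0440"
assert len(escaped) == 6
# Split into single-byte objects to match the list given above.
assert [bytes([b]) for b in escaped] == [b"\\", b"u", b"0", b"4", b"4", b"0"]
```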
--
~Ethan~

From Nikolaus at rath.org Sun Jan 19 04:42:35 2014
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Sat, 18 Jan 2014 19:42:35 -0800
Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args()
In-Reply-To: (Ryan Smith-Roberts's message of "Sat, 18 Jan 2014 18:11:31 -0800")
References: <87vbxgeqm3.fsf@vostro.rath.org>
Message-ID: <87siskeixg.fsf@vostro.rath.org>

Hi Ryan,

Ryan Smith-Roberts writes:
> Hi Nikolaus. I also started a conversion of timemodule, but dropped it when
> I saw in the issue that you had taken over that conversion. I also tried to
> turn parse_time_t_args into a converter. However, it won't work. The
> problem is that parse_time_t_args must be called whether or not the user
> supplies an argument to the function, but an Argument Clinic converter only
> gets called if the user actually supplies something, and not on the default
> value.

I don't quite follow. My approach was to drop parse_time_t_args()
completely and use _PyTime_ObjectToTime_t() as the conversion function
(which only needs to be called if the user supplied something).

In other words, I would have expected

>> ,----
>> | /*[python input]
>> | class time_t_converter(CConverter):
>> |     type = 'time_t'
>> |     converter = 'time_t_converter'
>> |     default = None
>> |     py_default = 'None'
>> |     c_default = 'time(NULL)'
>> |     converter = '_PyTime_ObjectToTime_t'
>> | [python start generated code]*/
>> |
>> | /*[clinic input]
>> | time.localtime
>> |
>> |     seconds: time_t
>> |     /
>> |
>> | bla.
>> | [clinic start generated code]*/
>> `----

to produce something like this:

static PyObject *
time_localtime(PyObject *self, PyObject *args)
{
    PyObject *obj = NULL;
    time_t seconds;
    struct tm buf;

    if (!PyArg_ParseTuple(args, "|O:localtime", &obj))
        return NULL;
    if (obj == NULL || obj == Py_None)
        seconds = time(NULL);
    else {
        if (_PyTime_ObjectToTime_t(obj, &seconds) == -1)
            return NULL;
    }
    return time_localtime_impl(self, seconds);
}

Apart from getting an error from clinic.py, it seems to me that this
should in principle be possible.

Best,
Nikolaus

--
Encrypted emails preferred.
PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

"Time flies like an arrow, fruit flies like a Banana."

From rmsr at lab.net Sun Jan 19 06:52:01 2014
From: rmsr at lab.net (Ryan Smith-Roberts)
Date: Sat, 18 Jan 2014 21:52:01 -0800
Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args()
In-Reply-To: <87siskeixg.fsf@vostro.rath.org>
References: <87vbxgeqm3.fsf@vostro.rath.org> <87siskeixg.fsf@vostro.rath.org>
Message-ID:

Ah yes, my apologies, I was thrown off by the first converter declaration in your class and didn't spot the second, so didn't realize what you were up to.

I still advise you not to use this solution. time() is a system call on many operating systems, and so it can be a heavier operation than you'd think. Best to avoid it unless it's needed (on FreeBSD it seems to add about 15% overhead to localtime(), for instance).

As for why you're getting that exception, it definitely looks like a bug in Argument Clinic. I spotted another bug that would have bitten you while I was looking for this one, so I've opened bugs on both issues, and put you on the nosy list for them.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Sun Jan 19 07:19:00 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 19 Jan 2014 16:19:00 +1000
Subject: [Python-Dev] PEP 461 updates
In-Reply-To:
References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> <52D85E61.8040701@udel.edu> <874n53cptr.fsf@uwakimon.sk.tsukuba.ac.jp> <20140117053611.GB3915@ando> <87wqhzapz1.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On 19 January 2014 00:39, Oscar Benjamin wrote:
>
> If you want to draw a relevant lesson from that thread in this one
> then the lesson argues against PEP 461: adding back the bytes
> formatting methods helps people who refuse to understand text
> processing and continue implementing dirty hacks instead of doing it
> properly.

Yes, that's why it has taken so long to even *consider* bringing binary interpolation support back - one of our primary concerns in the early days of Python 3 was developers (including core developers!) attempting to translate bad habits from Python 2 into Python 3 by continuing to treat binary data as text. Making interpolation a purely text domain operation helped strongly in enforcing this distinction, as it generally required thinking about encoding issues in order to get things into the text domain (or hitting them with the "latin-1" hammer, in which case... *sigh*).

The reason PEP 460/461 came up is that we *do* acknowledge that there is a legitimate use case for binary interpolation support when dealing with binary formats that contain ASCII compatible segments. Now that people have had a few years to get used to the Python 3 text model, lowering the barrier to migration from Python 2 and better handling that use case in Python 3 in general has finally tilted the scales in favour of providing the feature (assuming Guido is happy with PEP 461 after Ethan finishes the Rationale section).

(Tangent)

While I agree it's not relevant to the PEP 460/461 discussions, so long as numpy.loadtxt is explicitly documented as only working with latin-1 encoded files (it currently isn't), there's no problem. If it's supposed to work with other encodings (but the entire file is still required to use a consistent encoding), then it just needs encoding and errors arguments to fit the Python 3 text model (with "latin-1" documented as the default encoding).

If it is intended to allow S columns to contain text in arbitrary encodings, then that should also be supported by the current API with an adjustment to the default behaviour, since passing something like codecs.getdecoder("utf-8") as a column converter should do the right thing. However, if you're currently decoding S columns with latin-1 *before* passing the value to the converter, then you'll need to use a WSGI style decoding dance instead:

    def fix_encoding(text):
        return text.encode("latin-1").decode("utf-8") # For example

That's more wasteful than just passing the raw bytes through for decoding, but is the simplest backwards compatible option if you're doing latin-1 decoding already.

If different rows in the *same* column are allowed to have different encodings, then that's not a valid use of the operation (since the column converter has no access to the rest of the row to determine what encoding should be used for the decode operation).

Cheers,
Nick.
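
The "decoding dance" described in this message can be tried on a small sample. This is a sketch only: the sample value "café" and the variable names are made up for illustration, and nothing here calls numpy itself.

```python
import codecs

def fix_encoding(text):
    # Undo a premature latin-1 decode, then decode as the real encoding.
    # This works because latin-1 maps every byte 0-255 to one code point,
    # so encoding back to latin-1 recovers the original bytes exactly.
    return text.encode("latin-1").decode("utf-8")

raw = "café".encode("utf-8")        # the bytes as they sit in the file
premature = raw.decode("latin-1")   # what a blanket latin-1 decode produces

assert premature != "café"          # mojibake at this point
assert fix_encoding(premature) == "café"

# Decoding the raw bytes directly gives the same text without the detour.
# Note that the function from codecs.getdecoder returns a
# (text, bytes_consumed) tuple rather than a bare string.
decoded, consumed = codecs.getdecoder("utf-8")(raw)
assert decoded == "café"
assert consumed == len(raw)
```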
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jan 19 07:36:45 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 19 Jan 2014 16:36:45 +1000 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> Message-ID: On 19 January 2014 10:44, Steve Dower wrote: > Visual Studio will try to compile them if they end with .c, though this can > be disabled on a per-file basis in the project file. Files ending in .h > won't be compiled, though changes should be detected and cause the .c files > that include them to be recompiled. That sounds like a rather good argument for .clinic.h over .clinic.c :) My assessment of the thread is that .clinic.h will give us the best overall tool compatibility. I use Eli Bendersky's pss for my command line source searching needs, and should be able to update that to skip clinic files without much difficulty (rather than having to exclude them manually from every search). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jan 19 08:09:51 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 19 Jan 2014 17:09:51 +1000 Subject: [Python-Dev] PEP 461 Final? 
In-Reply-To: <52DB39C9.2020104@stoneleaf.us>
References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> <52D9DE09.60102@stoneleaf.us> <52DAF99F.4050609@stoneleaf.us> <52DB39C9.2020104@stoneleaf.us>
Message-ID:

On 19 January 2014 12:34, Ethan Furman wrote:
> On 01/18/2014 05:21 PM, Neil Schemenauer wrote:
>>
>> Ethan Furman wrote:
>>>
>>> So, if %a is added it would act like:
>>>
>>> ---------
>>> "%a" % some_obj
>>> ---------
>>> tmp = str(some_obj)
>>> res = b''
>>> for ch in tmp:
>>>     if ord(ch) < 256:
>>>         res += bytes([ord(ch)]
>>>     else:
>>>         res += unicode_escape(ch)
>>> ---------
>>>
>>> where 'unicode_escape' would yield something like "\u0440" ?
>>
>>
>> My patch on the tracker already implements %a, it's simple.
>
>
> Before one implements a patch it is good to know the specifications.

A very sound engineering principle :)

Neil has the resulting semantics right for what I had in mind, but the faster path to bytes (rather than going through the ASCII builtin) is to do the C level equivalent of:

    repr(obj).encode("ascii", errors="backslashreplace")

That's essentially what the ascii() builtin does, but that operates entirely in the text domain, so (as Neil found) you still need a separate encode step at the end.

    >>> ascii("è").encode("ascii")
    b"'\\xe8'"
    >>> repr("è").encode("ascii", errors="backslashreplace")
    b"'\\xe8'"

b"%a" % "è" should produce the same result as the two examples above. (Code points higher up in the Unicode code space would produce \u and \U escapes as needed, which should already be handled properly by the backslashreplace error handler)

One nice thing about this definition is that in the specific case of text input, the transformation can always be reversed by decoding as ASCII and then applying ast.literal_eval():

    >>> import ast
    >>> ast.literal_eval(repr("è").encode("ascii", "backslashreplace").decode("ascii"))
    'è'
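
The definition above can be written out as a small pure-Python sketch. The helper name percent_a is hypothetical (it is not part of any patch or of the PEP), and the sample text is invented to exercise the \x, \u and \U escape forms:

```python
import ast

def percent_a(obj):
    # Hypothetical stand-in for the proposed b"%a" formatting:
    # take repr(), then escape anything outside ASCII.
    return repr(obj).encode("ascii", errors="backslashreplace")

# One code point below 0x100, one in the BMP, one outside the BMP.
text = "\u00e8 \u0440 \U0001f600"
encoded = percent_a(text)

# Same bytes as going through the ascii() builtin and encoding afterwards.
assert encoded == ascii(text).encode("ascii")

# For text input the transformation is reversible without eval().
assert ast.literal_eval(encoded.decode("ascii")) == text
```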

(Please don't use eval() to reverse a transformation like this, as doing so not only makes security engineers cry, it's also likely to make your code vulnerable to all kinds of interesting attacks)

As noted earlier in the thread, one key purpose of including this feature is to reduce the likelihood of people inappropriately adding __bytes__ implementations for %s compatibility that look like:

    def __bytes__(self):
        # This is unlikely to be a good idea!
        return repr(self).encode("ascii", errors="backslashreplace")

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From larry at hastings.org Sun Jan 19 11:19:43 2014
From: larry at hastings.org (Larry Hastings)
Date: Sun, 19 Jan 2014 02:19:43 -0800
Subject: [Python-Dev] .clinic.c vs .c.clinic
In-Reply-To:
References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com>
Message-ID: <52DBA6BF.5070108@hastings.org>

On 01/18/2014 10:36 PM, Nick Coghlan wrote:
> On 19 January 2014 10:44, Steve Dower wrote:
>> Visual Studio will try to compile them if they end with .c, though this can
>> be disabled on a per-file basis in the project file. Files ending in .h
>> won't be compiled, though changes should be detected and cause the .c files
>> that include them to be recompiled.
> That sounds like a rather good argument for .clinic.h over .clinic.c :)
>
> My assessment of the thread is that .clinic.h will give us the best
> overall tool compatibility.

Yeah, I'm tipping pretty far towards "foo.c" -> "foo.clinic.h".

But there's one onion in the ointment: what should "foo.h" generate? The day may yet arrive when we have Argument Clinic code in foo.{ch}.

Not kidding, my best idea so far is "foo.clinic.h.h",

//arry/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From larry at hastings.org Sun Jan 19 11:38:12 2014 From: larry at hastings.org (Larry Hastings) Date: Sun, 19 Jan 2014 02:38:12 -0800 Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args() In-Reply-To: References: <87vbxgeqm3.fsf@vostro.rath.org> <87siskeixg.fsf@vostro.rath.org> Message-ID: <52DBAB14.8090704@hastings.org> On 01/18/2014 09:52 PM, Ryan Smith-Roberts wrote: > > I still advise you not to use this solution. time() is a system call > on many operating systems, and so it can be a heavier operation than > you'd think. Best to avoid it unless it's needed (on FreeBSD it seems > to add about 15% overhead to localtime(), for instance). > I agree. Converting to Argument Clinic should not cause a performance regression. Please don't add new calls to time() for the sake of making code more generic. A better choice would be to write a converter function in C, then use a custom converter that called it. Nikolaus: Is that something you're comfortable doing? > As for why you're getting that exception, it definitely looks like a > bug in Argument Clinic. I spotted another bug that would have bitten > you while I was looking for this one, so I've opened bugs on both > issues, and put you on the nosy list for them. > According to the issue tracker, " rmsr" has only ever filed one issue. I just fixed (and closed) it. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From g.brandl at gmx.net Sun Jan 19 12:32:24 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 19 Jan 2014 12:32:24 +0100 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <52DBA6BF.5070108@hastings.org> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> Message-ID: Am 19.01.2014 11:19, schrieb Larry Hastings: > On 01/18/2014 10:36 PM, Nick Coghlan wrote: >> On 19 January 2014 10:44, Steve Dower wrote: >>> Visual Studio will try to compile them if they end with .c, though this can >>> be disabled on a per-file basis in the project file. Files ending in .h >>> won't be compiled, though changes should be detected and cause the .c files >>> that include them to be recompiled. >> That sounds like a rather good argument for .clinic.h over .clinic.c :) >> >> My assessment of the thread is that .clinic.h will give us the best >> overall tool compatibility. > > Yeah, I'm tipping pretty far towards "foo.c" -> "foo.clinic.h". > > But there's one onion in the ointment: what should "foo.h" generate? The day > may yet arrive when we have Argument Clinic code in foo.{ch}. > > Not kidding, my best idea so far is "foo.clinic.h.h", Why not always put clinic into its own directory? Modules/mathmodule.c -> Modules/clinic/mathmodule.c.h Modules/mathmodule.h -> Modules/clinic/mathmodule.h.h At least that is consistent, allows easy exclusion in tools, and gets rid of the additional "clinic" in the filename. Georg From steve at pearwood.info Sun Jan 19 12:37:45 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 19 Jan 2014 22:37:45 +1100 Subject: [Python-Dev] PEP 461 Final? 
In-Reply-To: <52D9DE09.60102@stoneleaf.us> References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> <52D9DE09.60102@stoneleaf.us> Message-ID: <20140119113745.GU3915@ando> On Fri, Jan 17, 2014 at 05:51:05PM -0800, Ethan Furman wrote: > On 01/17/2014 05:27 PM, Steven D'Aprano wrote: > >>Numeric Format Codes > >>-------------------- > >> > >>To properly handle int and float subclasses, int(), index(), and float() > >>will be called on the objects intended for (d, i, u), (b, o, x, X), and > >>(e, E, f, F, g, G). > > > > > >-1 on this idea. > > > >This is a rather large violation of the principle of least surprise, and > >radically different from the behaviour of Python 3 str. In Python 3, > >'%d' interpolation calls the __str__ method, so if you subclass, you can > >get the behaviour you want: > > Did you read the bug reports I linked to? This behavior (which is a bug) > has already been fixed for Python3.4. No I didn't. This thread is huge, and it's only one of a number of huge threads about the same "bytes/unicode Python 2/3" stuff. I'm probably not the only person who missed the bug reports you linked to. If these bug reports are relevant to the PEP, you ought to list them in the PEP, and if they aren't relevant, I shan't be reading them *wink* In any case, whether I have succeeded in making the case against this aspect of the PEP or not, I think you should: - explain what you mean by "properly handle" (give an example?); - justify why b'%d' % obj should ignore any relevant overloaded methods in obj; - if there are similar, existing, examples of this (to me) surprising behaviour, you should briefly mention them; - note that there was some opposition to the suggestion; - and explain why the contrary behaviour (i.e. allowing obj to overload b'%d') is not desirable. > As a quick thought experiment, why does "%d" % True return "1"? I don't know. Perhaps it is a bug? 
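
A small sketch of the behaviour under discussion, assuming Python 3.4 or later (where, according to this thread, the int-subclass handling has been fixed). The Celsius class is a made-up example, not something from the PEP.

```python
class Celsius(int):
    """Hypothetical int subclass with a custom text form."""

    def __str__(self):
        return "{} degrees".format(int(self))


temp = Celsius(21)

as_text = "%s" % temp        # %s asks for text, so __str__ is used
as_number = "%d" % temp      # %d is a numeric code: the integer value is formatted
bool_number = "%d" % True    # bool is an int subclass as well
bool_text = "%s" % True

print(as_text, as_number, bool_number, bool_text)
```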
--
Steven

From storchaka at gmail.com Sun Jan 19 15:35:21 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sun, 19 Jan 2014 16:35:21 +0200
Subject: [Python-Dev] .clinic.c vs .c.clinic
In-Reply-To: <52DAAC1D.8060204@trueblade.com>
References: <1870884.AZlKeLdVux@raxxla> <20140118162419.GA31614@sleipnir.bytereef.org> <52DAAC1D.8060204@trueblade.com>
Message-ID:

18.01.14 18:30, Eric V. Smith wrote:
> Same here. There's some history for this, but not for generated code. In
> Objects/stringlib, all of the files are .h files. They're really C code
> designed to be included by other .c files.

Objects/stringlib files are hand-written. We should distinguish generated code from hand-written.

From oscar.j.benjamin at gmail.com Sun Jan 19 16:21:25 2014
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Sun, 19 Jan 2014 15:21:25 +0000
Subject: [Python-Dev] PEP 461 updates
In-Reply-To:
References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> <52D85E61.8040701@udel.edu> <874n53cptr.fsf@uwakimon.sk.tsukuba.ac.jp> <20140117053611.GB3915@ando> <87wqhzapz1.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On 19 January 2014 06:19, Nick Coghlan wrote:
>
> While I agree it's not relevant to the PEP 460/461 discussions, so
> long as numpy.loadtxt is explicitly documented as only working with
> latin-1 encoded files (it currently isn't), there's no problem.

Actually there is a problem. If it explicitly specified the encoding as latin-1 when opening the file then it could document the fact that it works for latin-1 encoded files. However it actually uses the system default encoding to read the file and then converts the strings to bytes with the as_bytes function that is hard-coded to use latin-1:

https://github.com/numpy/numpy/blob/master/numpy/compat/py3k.py#L28

So it only works if the system default encoding is latin-1 and the file content is white-space and newline compatible with latin-1.
Regardless of whether the file itself is in utf-8 or latin-1 it will only work if the system default encoding is latin-1. I've never used a system that had latin-1 as the default encoding (unless you count cp1252 as latin-1). > If it's supposed to work with other encodings (but the entire file is > still required to use a consistent encoding), then it just needs > encoding and errors arguments to fit the Python 3 text model (with > "latin-1" documented as the default encoding). This is the right solution. Have an encoding argument, document the fact that it will use the system default encoding if none is specified, and re-encode using the same encoding to fit any dtype='S' bytes column. This will then work for any encoding including the ones that aren't ASCII-compatible (e.g. utf-16). Then instead of having a compat module with an as_bytes helper to get rid of all the unicode strings on Python 3, you can have a compat module with an open_unicode helper to do the right thing on Python 2. The as_bytes function is just a way of fighting the Python 3 text model: "I don't care about mojibake just do whatever it takes to shut up the interpreter and its error messages and make sure it works for ASCII data." > If it is intended to > allow S columns to contain text in arbitrary encodings, then that > should also be supported by the current API with an adjustment to the > default behaviour, since passing something like > codecs.getdecoder("utf-8") as a column converter should do the right > thing. However, if you're currently decoding S columns with latin-1 > *before* passing the value to the converter, then you'll need to use a > WSGI style decoding dance instead: > > def fix_encoding(text): > return text.encode("latin-1").decode("utf-8") # For example That's just getting silly IMO. If the file uses mixed encodings then I don't consider it a valid "text file" and see no reason for loadtxt to support reading it. 
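
For readers unfamiliar with the "decoding dance" quoted above, here is a minimal sketch of what it does. The real_encoding parameter and the sample string are assumptions added for illustration; the original fix_encoding hard-codes "utf-8".

```python
def fix_encoding(text, real_encoding="utf-8"):
    """Undo a wrong latin-1 decode by re-encoding and decoding properly."""
    return text.encode("latin-1").decode(real_encoding)


original = "na\xefve caf\xe9"            # text containing non-ASCII characters
raw = original.encode("utf-8")           # what is actually stored in the file
mojibake = raw.decode("latin-1")         # what a hard-coded latin-1 decode yields

# latin-1 maps every byte value to a code point, so the wrong decode never
# raises; it silently produces different text of a different length.
assert mojibake != original
assert len(mojibake) == len(raw)

# Because that mapping is one-to-one, the original bytes can be recovered.
assert fix_encoding(mojibake) == original
print(repr(mojibake), "->", repr(fix_encoding(mojibake)))
```

The dance works only because latin-1 is lossless over bytes; it does nothing to help when the declared encoding of the file is unknown.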
> That's more wasteful than just passing the raw bytes through for > decoding, but is the simplest backwards compatible option if you're > doing latin-1 decoding already. > > If different rows in the *same* column are allowed to have different > encodings, then that's not a valid use of the operation (since the > column converter has no access to the rest of the row to determine > what encoding should be used for the decode operation). Ditto. Oscar From ethan at stoneleaf.us Sun Jan 19 17:29:25 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 19 Jan 2014 08:29:25 -0800 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> Message-ID: <52DBFD65.8050307@stoneleaf.us> On 01/19/2014 03:32 AM, Georg Brandl wrote: > Am 19.01.2014 11:19, schrieb Larry Hastings: >> On 01/18/2014 10:36 PM, Nick Coghlan wrote: >>> On 19 January 2014 10:44, Steve Dower wrote: >>>> Visual Studio will try to compile them if they end with .c, though this can >>>> be disabled on a per-file basis in the project file. Files ending in .h >>>> won't be compiled, though changes should be detected and cause the .c files >>>> that include them to be recompiled. >>> That sounds like a rather good argument for .clinic.h over .clinic.c :) >>> >>> My assessment of the thread is that .clinic.h will give us the best >>> overall tool compatibility. >> >> Yeah, I'm tipping pretty far towards "foo.c" -> "foo.clinic.h". >> >> But there's one onion in the ointment: what should "foo.h" generate? The day >> may yet arrive when we have Argument Clinic code in foo.{ch}. >> >> Not kidding, my best idea so far is "foo.clinic.h.h", > > Why not always put clinic into its own directory? 
>
> Modules/mathmodule.c -> Modules/clinic/mathmodule.c.h
> Modules/mathmodule.h -> Modules/clinic/mathmodule.h.h
>
> At least that is consistent, allows easy exclusion in tools, and gets rid
> of the additional "clinic" in the filename.

+1

If AC will work with both .c and .h files, I think a separate directory is the way to go.

--
~Ethan~

From ethan at stoneleaf.us Sun Jan 19 17:27:46 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 19 Jan 2014 08:27:46 -0800
Subject: [Python-Dev] .clinic.c vs .c.clinic
In-Reply-To:
References: <1870884.AZlKeLdVux@raxxla> <20140118162419.GA31614@sleipnir.bytereef.org> <52DAAC1D.8060204@trueblade.com>
Message-ID: <52DBFD02.6060608@stoneleaf.us>

On 01/19/2014 06:35 AM, Serhiy Storchaka wrote:
> 18.01.14 18:30, Eric V. Smith wrote:
>>
>> Same here. There's some history for this, but not for generated code. In
>> Objects/stringlib, all of the files are .h files. They're really C code
>> designed to be included by other .c files.
>
> Objects/stringlib files are hand-written. We should distinguish generated code from hand-written.

We do. Generated files have .clinic. in the name. That is sufficient.

--
~Ethan~

From ethan at stoneleaf.us Sun Jan 19 18:44:22 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 19 Jan 2014 09:44:22 -0800
Subject: [Python-Dev] PEP 461 Final?
In-Reply-To: <20140119113745.GU3915@ando>
References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> <52D9DE09.60102@stoneleaf.us> <20140119113745.GU3915@ando>
Message-ID: <52DC0EF6.6020001@stoneleaf.us>

On 01/19/2014 03:37 AM, Steven D'Aprano wrote:
> On Fri, Jan 17, 2014 at 05:51:05PM -0800, Ethan Furman wrote:
>> On 01/17/2014 05:27 PM, Steven D'Aprano wrote:
>
>>>> Numeric Format Codes
>>>> --------------------
>>>>
>>>> To properly handle int and float subclasses, int(), index(), and float()
>>>> will be called on the objects intended for (d, i, u), (b, o, x, X), and
>>>> (e, E, f, F, g, G).
>>>
>>>
>>> -1 on this idea.
>>>
>>> This is a rather large violation of the principle of least surprise, and
>>> radically different from the behaviour of Python 3 str. In Python 3,
>>> '%d' interpolation calls the __str__ method, so if you subclass, you can
>>> get the behaviour you want:
>>
>> Did you read the bug reports I linked to? This behavior (which is a bug)
>> has already been fixed for Python3.4.
>
> No I didn't. This thread is huge, and it's only one of a number of huge
> threads about the same "bytes/unicode Python 2/3" stuff. I'm probably
> not the only person who missed the bug reports you linked to.

Fair point.

> If these bug reports are relevant to the PEP, you ought to list them in
> the PEP, and if they aren't relevant, I shan't be reading them *wink*

Well, it seems to me they are more relevant to your misunderstanding of how %d and friends should work rather than to the PEP itself. However, I suppose it's possible you're not the only one so affected, so I'll link them in.

> In any case, whether I have succeeded in making the case against this
> aspect of the PEP or not

Not. This was a bug that was fixed long before the PEP came into existence.

>> As a quick thought experiment, why does "%d" % True return "1"?
>
> I don't know. Perhaps it is a bug?

To summarize a rather long issue, %d and friends are /numeric/ codes; returning non-numeric text is inappropriate. Yes, I realize there are other unicode values that also mean numeric digits, but they do not mean (so far as I know) Decimal digits, or Hexadecimal digits, or Octal digits. (Obviously an ASCII slant going on there.)

Now that I've written that down, I think there are, in fact, other scripts that represent a base-10 number system with obviously different glyphs for the numbers.... Well, that means that this PEP just further strengthens the notion that format is for text (as then a custom numeric type could easily override the display even for :d, :h, etc.)
and % is for bytes (where such glyphs are not natively representable anyway). -- ~Ethan~ From rmsr at lab.net Sun Jan 19 18:59:04 2014 From: rmsr at lab.net (Ryan Smith-Roberts) Date: Sun, 19 Jan 2014 09:59:04 -0800 Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args() In-Reply-To: <52DBAB14.8090704@hastings.org> References: <87vbxgeqm3.fsf@vostro.rath.org> <87siskeixg.fsf@vostro.rath.org> <52DBAB14.8090704@hastings.org> Message-ID: On Sun, Jan 19, 2014 at 2:38 AM, Larry Hastings wrote: > According to the issue tracker, " rmsr" has only ever filed one issue. > I just fixed (and closed) it. > The two issues were "custom converter with converter and default raises exception" and "custom converter with py_default and c_default being overridden by __init__". As for the former, you said "I hope you know what you're doing!" which made me step back and think more about the "why". I realized two things: The default-related class attributes might be an 'attractive nuisance', in that setting them there technically saves repetition, but could easily confuse a later reader who expects to find the defaults declared inline as usual. As well, it is unclear which of 'default', 'py_default', 'c_default' one needs to set, or which has priority. Nikolaus went ahead and set all three, thus my bug reports. After tinkering some more with the test file for the first bug, I noticed that a class with 'default' and 'converter' generates a default in the signature line but not at the C level. I'm wondering now if class-level default support shouldn't just be removed, as the aforementioned attractive nuisance. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Nikolaus at rath.org Sun Jan 19 22:03:14 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Sun, 19 Jan 2014 13:03:14 -0800 Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args() In-Reply-To: <52DBAB14.8090704@hastings.org> (Larry Hastings's message of "Sun, 19 Jan 2014 02:38:12 -0800") References: <87vbxgeqm3.fsf@vostro.rath.org> <87siskeixg.fsf@vostro.rath.org> <52DBAB14.8090704@hastings.org> Message-ID: <87y52bbs6l.fsf@vostro.rath.org> Larry Hastings writes: > On 01/18/2014 09:52 PM, Ryan Smith-Roberts wrote: >> >> I still advise you not to use this solution. time() is a system call >> on many operating systems, and so it can be a heavier operation than >> you'd think. Best to avoid it unless it's needed (on FreeBSD it >> seems to add about 15% overhead to localtime(), for instance). >> > > I agree. Converting to Argument Clinic should not cause a performance > regression. Please don't add new calls to time() for the sake of > making code more generic. I don't see how the conversion would result in more calls to time() than we have now. It seems to me that the expression for the C default should be only evaluated if the caller did not specify a value. Is that not how ac works? > A better choice would be to write a converter function in C, then use > a custom converter that called it. Nikolaus: Is that something you're > comfortable doing? As long as you're comfortable looking over the (probably buggy) patch, yes, I'm happy to give it a shot. Best, Nikolaus -- Encrypted emails preferred. PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C ?Time flies like an arrow, fruit flies like a Banana.? From ethan at stoneleaf.us Mon Jan 20 01:01:30 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 19 Jan 2014 16:01:30 -0800 Subject: [Python-Dev] PEP 461 Final? 
In-Reply-To: <20140119113745.GU3915@ando>
References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> <52D9DE09.60102@stoneleaf.us> <20140119113745.GU3915@ando>
Message-ID: <52DC675A.60604@stoneleaf.us>

On 01/19/2014 03:37 AM, Steven D'Aprano wrote:
> On Fri, Jan 17, 2014 at 05:51:05PM -0800, Ethan Furman wrote:
>> On 01/17/2014 05:27 PM, Steven D'Aprano wrote:
>
>>>> Numeric Format Codes
>>>> --------------------
>>>>
>>>> To properly handle int and float subclasses, int(), index(), and float()
>>>> will be called on the objects intended for (d, i, u), (b, o, x, X), and
>>>> (e, E, f, F, g, G).
>>>
>>>
>>> -1 on this idea.

I went to add examples to this section of the PEP, and realized I was just describing what Python does anyway. So it doesn't need to be in the PEP.

--
~Ethan~

From yselivanov.ml at gmail.com Mon Jan 20 03:42:18 2014
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sun, 19 Jan 2014 21:42:18 -0500
Subject: [Python-Dev] signature.object, argument clinic and grouped parameters
Message-ID:

In the midst of work on the issue #17481, it became apparent that we need a way of specifying optional/grouped parameters.

One good example of grouped parameters in python is the `type` function. Basically, it has two different signatures:

* type(name, bases, dict)
* type(object)

Which we can combine in one, if we define a notion of grouped parameters:

* type(object_or_name, [bases, dict])

Another good example is 'itertools.repeat'. Its signature is "(elem[, n])". If "n" argument is passed, then it's how many times the "elem" will be repeated, and if it is not passed at all, then "elem" will be repeated endlessly.

One way of emulating this behavior in pure python is to define a special private marker object and use it as a default value:

    _optional = object()

    def repeat(elem, n=_optional):
        if n is _optional:
            # `n` wasn't passed, repeat indefinitely
        else:
            # we have something for `n`

One of the problems with the above approach is how to represent its signature, how to document it clearly. Another one is that there is no common marker, so whenever this is needed, a new marker is invented.

Now, the more I think about having a concept of grouped parameters, the more different things to consider and take care of appear:

* In issue #17481 Larry proposed that parameters will have a group id (arbitrary), and perhaps parent group id, to make it possible to have nested groups.

* API for retrieving grouped parameters for the Signature objects. Something akin to what we have for ``regex.Match`` objects, probably.

* An accepted and documented method of declaring groups for pure python function would be a nice thing to have.

* Will we have groups for keyword-only parameters? Can we combine ``*args`` and a keyword-only parameter in a group? etc.

That seems to be a lot of work (some of it is maybe enough for a PEP.)

So before committing to the parameter groups idea, I'd like to propose a somewhat simpler solution that is still powerful enough to solve today's problems.

What if we add a notion of "optional" parameters?

* ``Parameter.__init__`` will receive one more keyword-only argument: ``optional``, ``False`` by default.

* We add a special marker ``Parameter.optional`` (or some other name, like ``inspect.optional`` or ``functools.optional``), and teach ``inspect.signature`` to recognize it. So for pure-python functions, if you want to define an optional parameter, you would write: ``def mytype(name_or_obj, bases=Parameter.optional, dict=Parameter.optional)``

* Argument Clinic may get a new syntax for specifying whether a parameter is optional.

* We standardize how optional parameters should look in documentation and ``Signature.__str__``. In PEP 362 we used notation for optional parameters: ``foo(param=)``, but we also can use square brackets for that: ``bar([spam][, ham])``.

With this approach, a signature of the ``type`` function would look like: ``type(object_or_name[, bases][, dict])``. The main downside is that it's not immediately apparent that you can only pass either one argument "(object)", or all three arguments "(name, bases, dict)". But that's something that good documentation (and meaningful exceptions) could help with.

The advantages of this approach are that it works for all types of parameters, and that the implementation is going to be simpler than groups (and we will need fewer new APIs).

Yury

From stephen at xemacs.org Mon Jan 20 03:56:51 2014
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 20 Jan 2014 11:56:51 +0900
Subject: [Python-Dev] PEP 461 Final?
In-Reply-To: <52DC0EF6.6020001@stoneleaf.us>
References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> <52D9DE09.60102@stoneleaf.us> <20140119113745.GU3915@ando> <52DC0EF6.6020001@stoneleaf.us>
Message-ID: <87ha8z9x8s.fsf@uwakimon.sk.tsukuba.ac.jp>

Ethan Furman writes:

> Well, that means that this PEP just further strengthens the notion
> that format is for text (as then a custom numeric type could easily
> override the display even for :d, :h, etc.) and % is for bytes
> (where such glyphs are not natively representable anyway).

This argument is specious. Alternative numeric characters are just as representable as the ASCII digits are, and in the same way (by defining a bytes <-> str mapping, aka codec). The problem is not that they're non-representable, it's that they're non-ASCII, and the numeric format codes implicitly specify the ASCII numerals when in text as well as when in bytes.

There's no technical reason why these features couldn't use EBCDIC or even UTF-16 nowadays. It's purely a convention. But it's a very useful convention, so it's helpful if Python conforms to it. (Note that "'{:d}'.format(True)" -> '1' works because True *is* an int and so can be d-formatted in principle. It's not an exceptional case.
It's a different issue from what you're talking about here.) The problem that EIBTI worries about is that in many places there is a local convention to use not pure ASCII, but a specific ASCII superset. This allows them to take advantage of the common convention of using ASCII for protocol keywords, and at the same time using "legacy" facilities for internal processing of text. Becoming a disadvantage if and when such programs need to communicate with internationalized applications. These PEPs provide a crutch for such crippled software, allowing them to hobble into the House of Python 3. That's obvious, so please don't try to obfuscate it; just declare "consenting adults" and move on. From ncoghlan at gmail.com Mon Jan 20 05:30:13 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 20 Jan 2014 14:30:13 +1000 Subject: [Python-Dev] signature.object, argument clinic and grouped parameters In-Reply-To: References: Message-ID: Guido, Larry and I thrashed out the required semantics for parameter groups at PyCon US last year (and I believe the argument clinic PEP describes those accurately). They're mainly needed to represent oddball signatures like range() and slice(). However, I'm inclined to say that the affected functions should simply not support introspection until Python 3.5. It's not just a matter of the data model, there's also the matter of defining the string representation. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From Nikolaus at rath.org Mon Jan 20 05:19:07 2014
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Sun, 19 Jan 2014 20:19:07 -0800
Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args()
In-Reply-To: <52DBAB14.8090704@hastings.org> (Larry Hastings's message of "Sun, 19 Jan 2014 02:38:12 -0800")
References: <87vbxgeqm3.fsf@vostro.rath.org> <87siskeixg.fsf@vostro.rath.org> <52DBAB14.8090704@hastings.org>
Message-ID: <87iotfxp38.fsf@vostro.rath.org>

Larry Hastings writes:
> On 01/18/2014 09:52 PM, Ryan Smith-Roberts wrote:
>>
>> I still advise you not to use this solution. time() is a system call
>> on many operating systems, and so it can be a heavier operation than
>> you'd think. Best to avoid it unless it's needed (on FreeBSD it
>> seems to add about 15% overhead to localtime(), for instance).
>>
>
> I agree. Converting to Argument Clinic should not cause a performance
> regression. Please don't add new calls to time() for the sake of
> making code more generic.
>
> A better choice would be to write a converter function in C, then use
> a custom converter that called it. Nikolaus: Is that something you're
> comfortable doing?

I think I'll need some help. I don't know how to handle the case where the user is not passing anything.

Here's my attempt:

,----
| /* C Converter for argument clinic
|    If obj is NULL or Py_None, return current time. Otherwise,
|    convert Python object to time_t.
| */
| static int
| PyObject_to_time_t(PyObject *obj, time_t *stamp)
| {
|     if (obj == NULL || obj == Py_None) {
|         *stamp = time(NULL);
|     }
|     else {
|         if (_PyTime_ObjectToTime_t(obj, stamp) == -1)
|             return 0;
|     }
|     return 1;
| }
|
| /*[python input]
| class time_t_converter(CConverter):
|     type = 'time_t'
|     converter = 'PyObject_to_time_t'
|     default = None
| [python start generated code]*/
| /*[python end generated code: checksum=da39a3ee5e6b4b0d3255bfef95601890afd80709]*/
|
|
| /*[clinic input]
| time.gmtime
|
|     seconds: time_t
|     /
|
| [clinic start generated code]*/
`----

but this results in the following code:

,----
| static PyObject *
| time_gmtime(PyModuleDef *module, PyObject *args)
| {
|     PyObject *return_value = NULL;
|     time_t seconds;
|
|     if (!PyArg_ParseTuple(args,
|         "|O&:gmtime",
|         PyObject_to_time_t, &seconds))
|         goto exit;
|     return_value = time_gmtime_impl(module, seconds);
|
| exit:
|     return return_value;
| }
`----

This works if the user calls time.gmtime(None), but it fails for time.gmtime(). It seems that in that case my C converter function is never called.

What's the trick that I'm missing?

Thanks!
-Nikolaus

--
Encrypted emails preferred.
PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

"Time flies like an arrow, fruit flies like a Banana."

From ethan at stoneleaf.us Mon Jan 20 07:15:24 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 19 Jan 2014 22:15:24 -0800
Subject: [Python-Dev] PEP 461 Final?
In-Reply-To: <87ha8z9x8s.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> <52D9DE09.60102@stoneleaf.us> <20140119113745.GU3915@ando> <52DC0EF6.6020001@stoneleaf.us> <87ha8z9x8s.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <52DCBEFC.6080706@stoneleaf.us>

On 01/19/2014 06:56 PM, Stephen J.
Turnbull wrote: > Ethan Furman writes: > >> Well, that means that this PEP just further strengthens the notion >> that format is for text (as then a custom numeric type could easily >> override the display even for :d, :h, etc.) and % is for bytes >> (where such glyphs are not natively representable anyway). > > This argument is specious. I don't think so. I think it's a good argument for the future of Python code. Mind you, I should probably have said % is primarily for bytes, or even more useful for bytes than for text. The idea being that true text fun stuff requires format, while bytes can only use % for easy formatting. > Alternative numeric characters [are] just as representable as the ASCII > digits are, and in the same way (by defining a bytes <-> str mapping, > aka codec). The problem is not that they're non-representable, it's > that they're non-ASCII, and the numeric format codes implicitly > specify the ASCII numerals when in text as well as when in bytes. Certainly. And you can't change that either. Oh, wait, you can! Define your own! class LocalNum(int): "displays d, i, and u codes in local language" def __format__(self, fmt): # do the fancy stuff so the characters are not ASCII, but whatever # is local here Then you could have your text /and/ your numbers be in your own language. But you can't get that using % unless you always call a custom function and use %s. > (Note > that "'{:d}'.format(True)" -> '1' works because True *is* an int and so > can be d-formatted in principle. It's not an exceptional case. It's > a different issue from what you're talking about here.) "'{:d}'.format(True)" is not exceptional, you're right. But "'%d' % True" is, and was singled-out in the unicode display code to print as '1' and not as 'True'. (Now all int subclasses behave this way (in 3.4 anyways).) And I think it's the same issue, or at least closely related. 
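
The difference between the two formatting paths discussed here can be sketched as follows. This is an illustrative guess at what the LocalNum outline above might look like when filled in; the choice of Arabic-Indic digits and the handling of the format spec are assumptions, not something specified in the thread.

```python
class LocalNum(int):
    """Hypothetical int subclass that renders the 'd' code with Arabic-Indic digits."""

    # Map ASCII digits 0-9 to U+0660..U+0669.
    _DIGITS = str.maketrans(
        "0123456789", "".join(chr(0x0660 + i) for i in range(10))
    )

    def __format__(self, spec):
        # Format as a plain int first, then swap the digit glyphs.
        text = format(int(self), spec)
        return text.translate(self._DIGITS) if spec.endswith("d") else text


n = LocalNum(2014)

via_format = "{:d}".format(n)   # goes through LocalNum.__format__
via_percent = "%d" % n          # numeric code: always ASCII digits

print(ascii(via_format), via_percent)
```

On Python 3.4 and later the %-path ignores the subclass entirely, which is the asymmetry being described: only str.format() and format() give a custom numeric type a hook for its display.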
If you create a custom number type with the intention of displaying them in the local lingo, you have to use __format__ because % is hard coded to yield digits that map to ASCII. > These PEPs provide a crutch for such crippled software, allowing them > to hobble into the House of Python 3. Very picturesque. > That's obvious, so please don't try to obfuscate it; just declare > "consenting adults" and move on. Lots of features can be abused. That doesn't mean we shouldn't talk about the intended use cases and encourage those. -- ~Ethan~ From stephen at xemacs.org Mon Jan 20 08:10:40 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 20 Jan 2014 16:10:40 +0900 Subject: [Python-Dev] PEP 461 Final? In-Reply-To: <52DCBEFC.6080706@stoneleaf.us> References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> <52D9DE09.60102@stoneleaf.us> <20140119113745.GU3915@ando> <52DC0EF6.6020001@stoneleaf.us> <87ha8z9x8s.fsf@uwakimon.sk.tsukuba.ac.jp> <52DCBEFC.6080706@stoneleaf.us> Message-ID: <87d2jn9lhr.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > > This argument is specious. > > I don't think so. I think it's a good argument for the future of > Python code. I agree that restricting bytes '%'-formatting to ASCII is a good idea, but you should base your arguments on a correct description of what's going on. It's not an issue of representability. It's an issue of "we should support this for ASCII because it's a useful, nearly universal convention, and we should not support ASCII supersets because that leads to mojibake." > Then you could have your text /and/ your numbers be in your own > language. My language uses numerals other than those in the ASCII repertoire in a rather stylized way. I can't use __format__ for that, because it depends on context, anyway. Most of the time the digits in the ASCII set are used (especially in tables and the like). I believe that's true for all languages nowadays. > Lots of features can be abused. 
That doesn't mean we shouldn't > talk about the intended use cases and encourage those. I only objected to claims that issues of "representability" and "what I can do with __format__" support the preferred use cases, not to descriptions of the preferred use cases. From larry at hastings.org Mon Jan 20 09:05:16 2014 From: larry at hastings.org (Larry Hastings) Date: Mon, 20 Jan 2014 00:05:16 -0800 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <52DBFD65.8050307@stoneleaf.us> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> Message-ID: <52DCD8BC.1040205@hastings.org> On 01/19/2014 08:29 AM, Ethan Furman wrote: > On 01/19/2014 03:32 AM, Georg Brandl wrote: >> On 19.01.2014 11:19, Larry Hastings wrote: >>> Not kidding, my best idea so far is "foo.clinic.h.h", >> >> Why not always put clinic into its own directory? >> >> Modules/mathmodule.c -> Modules/clinic/mathmodule.c.h >> Modules/mathmodule.h -> Modules/clinic/mathmodule.h.h >> >> At least that is consistent, allows easy exclusion in tools, and gets >> rid >> of the additional "clinic" in the filename. > > +1 > > If AC will work with both .c and .h files. I think a separate > directory is the way to go. In theory, Argument Clinic works with any file that it can iterate over line by line and in which it can recognize comments. It currently supports C and Python files and automatically recognizes a bunch of extensions. -------------------- Okay, I'm taking a poll. I will total your answers and take the result... strongly under advisement. ;-) The rules:
* The poll will be over in 48 hours, maybe sooner if a winner emerges early.
* Please express your vote from -1 to +1. -0 and +0 will only be differentiated during a tiebreaker.
* If you don't vote for a contestant, your vote will be assumed to be 0.
* You may change your vote at any time while the poll is still running.
* If you wish to nominate a new contestant, you may. Please give the contestant a name, and express how it would transform the filenames "foo.c" and "foo.h". I would strongly prefer that all transformations be expressible using str.format(transformation, filename="foo.c", basename="foo", extension=".c").

The contestants so far:

Contestant 1: "Add .clinic.h"
    foo.c -> foo.c.clinic.h
    foo.h -> foo.h.clinic.h
Contestant 2: "Add .ac.h"
    foo.c -> foo.c.ac.h
    foo.h -> foo.h.ac.h
Contestant 3: "Add .clinic"
    foo.c -> foo.c.clinic
    foo.h -> foo.h.clinic
Contestant 4: "Put in clinic directory, add .h"
    foo.c -> clinic/foo.c.h
    foo.h -> clinic/foo.h.h
Contestant 5: "Put in __clinic__ directory, add .h"
    foo.c -> __clinic__/foo.c.h
    foo.h -> __clinic__/foo.h.h

I didn't add a contestant for what Stefan Krah originally suggested ("foo.c -> foo.h") because it's not clear how this would handle "foo.h". You'll notice the current behavior is no longer in the running, //arry/
From g.brandl at gmx.net Mon Jan 20 09:15:04 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 20 Jan 2014 09:15:04 +0100 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <52DCD8BC.1040205@hastings.org> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: On 20.01.2014 09:05, Larry Hastings wrote: > > On 01/19/2014 08:29 AM, Ethan Furman wrote: >> On 01/19/2014 03:32 AM, Georg Brandl wrote: >>> On 19.01.2014 11:19, Larry Hastings wrote: >>>> Not kidding, my best idea so far is "foo.clinic.h.h", >>> >>> Why not always put clinic into its own directory?
>>> >>> Modules/mathmodule.c -> Modules/clinic/mathmodule.c.h >>> Modules/mathmodule.h -> Modules/clinic/mathmodule.h.h >>> >>> At least that is consistent, allows easy exclusion in tools, and gets rid >>> of the additional "clinic" in the filename. >> >> +1 >> >> If AC will work with both .c and .h files. I think a separate directory is the >> way to go. > > In theory, Argument Clinic works with any file for which it can iterate over by > lines and recognize comments. It current supports C and Python files and > automatically recognizes a bunch of extensions. > > -------------------- > > Okay, I'm taking a poll. I will total your answers and take the result... > strongly under advisement. ;-) > > The rules: > * The poll will be over in 48 hours, maybe sooner if a winner emerges early. > * Please express your vote from -1 to +1. -0 and +0 will only be differentiated > during a tiebreaker. > * If you don't vote for a contestant, your vote will be assumed to be 0. > * You may change your vote at any time while the poll is still running. > * If you wish to nominate a new contestant, you may. Please give the contestant > a name, and express how it would transform the filenames "foo.c" and "foo.h". I > would strongly prefer that all transformations be expressable using > str.format(transformation, filename="foo.c", basename="foo", extension=".c") . > > > The contestants so far: > > Contestant 1: "Add .clinic.h" > > foo.c -> foo.c.clinic.h > foo.h -> foo.h.clinic.h -0. (Clutters the directory.) > Contestant 2: "Add .ac.h" > > foo.c -> foo.c.ac.h > foo.h -> foo.h.ac.h -1. (Autoconf...) > Contestant 3: "Add .clinic" > > foo.c -> foo.c.clinic > foo.h -> foo.h.clinic -1. (Doesn't get included in global *.[ch] search, clutters the directory.) > Contestant 4: "Put in clinic directory, add .h" > > foo.c -> clinic/foo.c.h > foo.h -> clinic/foo.h.h +1. > Contestant 5: "Put in __clinic__ directory, add .h" > > foo.c -> __clinic__/foo.c.h > foo.h -> __clinic__/foo.h.h -1. 
(Too complicated; this isn't Python packages we're talking about.) Georg From ethan at stoneleaf.us Mon Jan 20 08:56:09 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 19 Jan 2014 23:56:09 -0800 Subject: [Python-Dev] PEP 461 Final? In-Reply-To: <87d2jn9lhr.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52D95F11.3020005@stoneleaf.us> <20140118012751.GM3915@ando> <52D9DE09.60102@stoneleaf.us> <20140119113745.GU3915@ando> <52DC0EF6.6020001@stoneleaf.us> <87ha8z9x8s.fsf@uwakimon.sk.tsukuba.ac.jp> <52DCBEFC.6080706@stoneleaf.us> <87d2jn9lhr.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52DCD699.3040006@stoneleaf.us> On 01/19/2014 11:10 PM, Stephen J. Turnbull wrote: > Ethan Furman writes: > > > > This argument is specious. > > > > I don't think so. I think it's a good argument for the future of > > Python code. > > I agree that restricting bytes '%'-formatting to ASCII is a good idea, > but you should base your arguments on a correct description of what's > going on. It's not an issue of representability. It's an issue of > "we should support this for ASCII because it's a useful, nearly > universal convention, and we should not support ASCII supersets > because that leads to mojibake." > > > Then you could have your text /and/ your numbers be in your own > > language. > > My language uses numerals other than those in the ASCII repertoire in > a rather stylized way. I can't use __format__ for that, because it > depends on context, anyway. Most of the time the digits in the ASCII > set are used (especially in tables and the like). I believe that's > true for all languages nowadays. > > > Lots of features can be abused. That doesn't mean we shouldn't > > talk about the intended use cases and encourage those. > > I only objected to claims that issues of "representability" and "what > I can do with __format__" support the preferred use cases, not to > descriptions of the preferred use cases. Thank you. I appreciate your time. 
-- ~Ethan~ From ncoghlan at gmail.com Mon Jan 20 10:07:36 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 20 Jan 2014 19:07:36 +1000 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: +1 for Contestant 4 for me as well, +0 for Contestant 5, -1 for the others. Same reasons as Georg, even where my votes are different.
From larry at hastings.org Mon Jan 20 11:16:47 2014 From: larry at hastings.org (Larry Hastings) Date: Mon, 20 Jan 2014 02:16:47 -0800 Subject: [Python-Dev] signature.object, argument clinic and grouped parameters In-Reply-To: References: Message-ID: <52DCF78F.2090000@hastings.org> On 01/19/2014 08:30 PM, Nick Coghlan wrote: > > Guido, Larry and I thrashed out the required semantics for parameter > groups at PyCon US last year (and I believe the argument clinic PEP > describes those accurately). > > They're mainly needed to represent oddball signatures like range() and > slice(). > > However, I'm inclined to say that the affected functions should simply > not support introspection until Python 3.5. > > It's not just a matter of the data model, there's also the matter of > defining the string representation. > Au contraire!, says the man writing patches. How it works right now: Argument Clinic publishes the signature information for builtins as the first line of the builtin's docstring. It gets clipped off before __doc__ returns it, and it's made available as __text_signature__. inspect.Signature retrieves this string and passes it in to ast.parse(). It then walks the resulting tree, producing the Function and Parameter objects as it goes.
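A small sketch of the mechanism just described, assuming a recent CPython 3 (the exact text stored in __text_signature__ has changed since this message was written, so the sketch only prints it). The helper parameter_names() is a hypothetical name introduced for illustration; it applies the same parse-and-walk idea to a signature string that is already valid Python syntax.

```python
import ast
import inspect

# The text signature that Argument Clinic publishes for a converted builtin.
# getattr() is used because not every callable carries one.
text_sig = getattr(len, "__text_signature__", None)
sig = inspect.signature(len)
print(text_sig)
print(sig)


def parameter_names(signature_text):
    """Hypothetical helper: parse a Python-syntax signature and list its names."""
    tree = ast.parse("def f" + signature_text + ": pass")
    func = tree.body[0]
    return [arg.arg for arg in func.args.args]


names = parameter_names("(self, obj)")
print(names)
```

Only syntax that ast.parse() accepts can take this route.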
Argument Clinic has the information about positional-only parameters and optional groups, but it can't express it in this text signature. The text signature is parsed by ast.parse(), and ast.parse() only understands Python syntax. So the information is thrown away. Boo hoo. I was trying to change this with PEP 457: http://www.python.org/dev/peps/pep-0457/ PEP 457 proposed new Python syntax for positional-only parameters and optional groups. The syntax looks like Argument Clinic's syntax for these features, but with commas replacing newlines: Positional-only parameters are delimited with a single slash. In "def foo(self, /, a)", self is positional-only and a is not. Optional groups are parameters wrapped with square brackets, with the brackets never appearing between a parameter and a comma. In "def foo(a, [b, c])", b and c are in an optional group together. Groups can be nested with some restrictions. To be clear: I wasn't proposing we actually add this syntax to Python! Just that, *if* we added support for positional-only parameters and optional groups, it would be this syntax. I did propose that documentation and tools like Argument Clinic could use it now. PEP 457 didn't get any support. People said: "We don't need to define syntax if we're not going to use it. If you really need this you can do it privately." Okay! I'll do it privately! Hopefully this week! Step 1: Extend the format of the __text_signature__ to use PEP 457 syntax. Then *strip* that extra syntax in inspect.Signature, producing a sanitized string that is parseable by ast.parse(), and storing the extra information on the side. Pass the sanitized string in to ast.parse(), walk the parse tree, and merge the positional-only / optional group information into the Parameter objects as I go. inspect.Signature objects can already represent positional-only parameters, and I plan to add a new member ("group", see below) that lets them represent optional groups too. Hooray! 
inspect.Signature objects can now represent almost every Python callable! Step 2: make pydoc print the optional groups information, aka the PEP 457 square brackets, because that's uncontroversial. pydoc on builtins has printed things like "range([start], stop, [step])" for decades. Everybody intuitively understands it, everybody likes it. Step 3, if people want it: Also make pydoc display which parameters are positional-only. I was going to propose that in a couple of days, when I was getting near ready to implement it. But I guess we're discussing it now. Unsurprisingly, I propose to use the PEP 457 syntax here. -------------------- Regarding Step 3 above, here's something to consider. str(inspect.signature(foo)) produces a very nice-looking string, which currently is (almost!) always parseable by ast.parse() like so:

    sig = inspect.signature(foo)
    if sig:
        ast.parse("def foo" + str(sig) + ": pass")

On the one hand, this mechanical round-trip ability is kind of a nice property. On the other hand, it does restrict the format of the string--it can't have our square brackets, it can't have our "/,". Meanwhile __str__ makes no guarantee that the string it returns is a valid Python expression. We could do both. Make inspect.Signature.__str__ only return Python compatible syntax, and allow another mechanism for pydoc to produce something more informative (but not Python compatible) for the user, like so:

    >>> str(inspect.signature(_pickle.Pickler.dump))
    '(self, obj)'
    >>> pydoc.plain(pydoc.render_doc(pickle.Pickler.dump))
    "...\n_pickle.Pickler.dump(self, /, obj)\n..."

Or we could do it the other way around. So I guess I'm proposing four alternatives:
1) inspect.Signature.__str__() must always be parsable by ast.parse().
2) inspect.Signature.__str__() must always be parsable by ast.parse(). Add another method to inspect.Signature that can use PEP 457 syntax, use that from pydoc.
3) inspect.Signature.__str__() produces PEP 457 syntax.
Add another method to inspect.Signature producing a text representation that is parseable by ast.parse().
4) inspect.Signature.__str__() produces PEP 457 syntax.
-------------------- In case you thought we were done, there's one more wrinkle: str(inspect.signature(foo)) *already* marks positional-only parameters! I discovered that in the last day or two. Late last week I noticed that "self" is *always* positional-only for builtins. It doesn't matter if the builtin is METH_KEYWORDS. So, in my burgeoning "add support for more builtins" patch, in inspect.Signature I marked self parameters on builtins as positional-only. I was rewarded with this:

    >>> str(inspect.signature(_pickle.Pickler.dump))
    '(<self>, obj)'

Yes, inspect.Signature.__str__() uses angle brackets to denote positional-only parameters. I think this behavior needs to be removed. It's undocumented and way non-obvious. I'm not aware of this syntax getting support from much of anybody--the only reason it's survived this long is because nobody besides me has ever seen it in the wild. -------------------- To address Yury's proposal: > So before committing to the parameters groups idea, I'd like to propose > somewhat simpler, but powerful enough to solve our today's problems > solution. > > What if we add a notion of "optional" parameters? Your proposal gets a "no, absolutely not" vote from me.
1. We already have a notion of "optional parameters". Parameters with default values are optional.
2. Your proposed syntax doesn't mention how we'd actually establish default values for parameters. So it's insufficient to handle existing code.
3. Your syntax/semantics, as described, can't convey the concept of optional groups. So it's insufficient to solve the problem it sets out to solve.
4. Your proposed syntax changes 80% of existing code--any parameter with a default value. I don't know how you concluded this was "simpler".
Here's my counter-proposal.
1. We add a new class to inspect named "ParameterGroup".
This class will have only one public member, "parent". ParameterGroup.parent will always be either None or a different inspect.ParameterGroup instance.
2. We add a new member to inspect.Parameter named "group". "group" will be either None or an instance of inspect.ParameterGroup.
3. We add a new method on Parameter named "is_optional()". "is_optional()" returns True if the function can be called without providing the parameter. Here's the implementation:

    def is_optional(self):
        return (self.default is not self._empty) or (self.group is not None)

4. Textual representations intended for displaying to the user are permitted to use square brackets to denote optional groups. They might also be permitted to use "/" to delimit positional-only parameters from other types of parameters, if the community accepts this.
5. We don't change any Python language semantics and we don't break any existing code.

Under my proposal:

    bytearray([source, [encoding, [errors]]])
        source.group != encoding.group
        encoding.group != errors.group
        source.group.parent == None
        encoding.group.parent == source.group
        errors.group.parent == encoding.group
        source.is_optional() == encoding.is_optional() == errors.is_optional() == True

    curses.window.addch([x, y,] ch, [attr])
        x.group == y.group
        x.group != attr.group
        x.group.parent == attr.group.parent == None
        x.is_optional() == y.is_optional() == attr.is_optional() == True
        ch.is_optional() == False

Sorry about the length of this email, //arry/
From tjreedy at udel.edu Mon Jan 20 12:01:12 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 20 Jan 2014 06:01:12 -0500 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: On 1/20/2014 4:07 AM, Nick Coghlan wrote: > +1 for Contestant 4 for me as well, +0 for Contestant 5, -1 for the > others. Same reasons as Georg, even where my votes are different. Ditto for me. -- Terry Jan Reedy From storchaka at gmail.com Mon Jan 20 12:14:33 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 20 Jan 2014 13:14:33 +0200 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <52DCD8BC.1040205@hastings.org> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: 20.01.14 10:05, Larry Hastings wrote: > The contestants so far: > > Contestant 1: "Add .clinic.h" > > foo.c -> foo.c.clinic.h > foo.h -> foo.h.clinic.h -0.5. > Contestant 2: "Add .ac.h" > > foo.c -> foo.c.ac.h > foo.h -> foo.h.ac.h -1. > Contestant 3: "Add .clinic" > > foo.c -> foo.c.clinic > foo.h -> foo.h.clinic +1. (Doesn't get included in global *.[ch] search, generated files are located close to origins.) > Contestant 4: "Put in clinic directory, add .h" > > foo.c -> clinic/foo.c.h > foo.h -> clinic/foo.h.h -1. (Generated files are located far from origins, directory name clutters the namespace of directory names). > Contestant 5: "Put in __clinic__ directory, add .h" > > foo.c -> __clinic__/foo.c.h > foo.h -> __clinic__/foo.h.h -0.5.
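For reference, the five contestants can be written as str.format templates in the form the poll asks for. This is a sketch, not Argument Clinic's actual code: the TEMPLATES table and the transform() helper are hypothetical names, and splitting the filename with os.path.splitext is an assumption about how basename and extension would be derived.

```python
import os.path

# Hypothetical table: contestant number -> str.format transformation.
TEMPLATES = {
    1: "{filename}.clinic.h",
    2: "{filename}.ac.h",
    3: "{filename}.clinic",
    4: "clinic/{filename}.h",
    5: "__clinic__/{filename}.h",
}


def transform(template, filename):
    """Apply a transformation using the three keywords named in the poll."""
    basename, extension = os.path.splitext(filename)
    return str.format(template, filename=filename,
                      basename=basename, extension=extension)


for number, template in sorted(TEMPLATES.items()):
    print(number, transform(template, "foo.c"), transform(template, "foo.h"))
```

Keyword arguments that a template does not use are ignored by str.format, so every contestant fits the single call signature even though none of them needs basename or extension.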
From solipsis at pitrou.net Mon Jan 20 12:19:09 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 20 Jan 2014 12:19:09 +0100 Subject: [Python-Dev] .clinic.c vs .c.clinic References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: <20140120121909.60a63aaf@fsol> On Mon, 20 Jan 2014 00:05:16 -0800 Larry Hastings wrote: > > On 01/19/2014 08:29 AM, Ethan Furman wrote: > > On 01/19/2014 03:32 AM, Georg Brandl wrote: > >> Am 19.01.2014 11:19, schrieb Larry Hastings: > >>> Not kidding, my best idea so far is "foo.clinic.h.h", > >> > >> Why not always put clinic into its own directory? > >> > >> Modules/mathmodule.c -> Modules/clinic/mathmodule.c.h > >> Modules/mathmodule.h -> Modules/clinic/mathmodule.h.h > >> > >> At least that is consistent, allows easy exclusion in tools, and gets > >> rid > >> of the additional "clinic" in the filename. > > > > +1 > > > > If AC will work with both .c and .h files. I think a separate > > directory is the way to go. > > In theory, Argument Clinic works with any file for which it can iterate > over by lines and recognize comments. It current supports C and Python > files and automatically recognizes a bunch of extensions. > > -------------------- > > Okay, I'm taking a poll. I will total your answers and take the > result... strongly under advisement. ;-) > > The rules: > * The poll will be over in 48 hours, maybe sooner if a winner emerges early. > * Please express your vote from -1 to +1. -0 and +0 will only be > differentiated during a tiebreaker. > * If you don't vote for a contestant, your vote will be assumed to be 0. > * You may change your vote at any time while the poll is still running. > * If you wish to nominate a new contestant, you may. 
Please give the > contestant a name, and express how it would transform the filenames > "foo.c" and "foo.h". I would strongly prefer that all transformations > be expressable using str.format(transformation, filename="foo.c", > basename="foo", extension=".c") . > > > The contestants so far: > > Contestant 1: "Add .clinic.h" +1 > Contestant 2: "Add .ac.h" -0.5 > Contestant 3: "Add .clinic" -1 > Contestant 4: "Put in clinic directory, add .h" +1 > Contestant 5: "Put in __clinic__ directory, add .h" -0.5 Regards Antoine. From ncoghlan at gmail.com Mon Jan 20 13:59:38 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 20 Jan 2014 22:59:38 +1000 Subject: [Python-Dev] signature.object, argument clinic and grouped parameters In-Reply-To: <52DCF78F.2090000@hastings.org> References: <52DCF78F.2090000@hastings.org> Message-ID: On 20 January 2014 20:16, Larry Hastings wrote: > > > On 01/19/2014 08:30 PM, Nick Coghlan wrote: > > Guido, Larry and I thrashed out the required semantics for parameter groups > at PyCon US last year (and I believe the argument clinic PEP describes those > accurately). > > They're mainly needed to represent oddball signatures like range() and > slice(). > > However, I'm inclined to say that the affected functions should simply not > support introspection until Python 3.5. > > It's not just a matter of the data model, there's also the matter of > defining the string representation. > > > Au contraire!, says the man writing patches. When I wrote that, I was thinking we had made inspect.Signature.__repr__ produce a nice string format, but then I noticed in the REPL today that we never got around to doing that - I think because we didn't know how to handle positional-only arguments, which already can't be expressed as Python syntax. 
(I haven't checked if we have an RFE filed anywhere) However, while I know you're keen to finally make introspection work for all C level callables in 3.4, even the ones with signatures that can't be expressed as Python function signatures, I'd like to strongly encourage you to hold off on that last part until Python 3.5. We're already in beta, we're already introducing a lot of code churn to get the C level callables that *can* have their signatures expressed as Python syntax converted, so where's the harm to users in saying that C level callables with non-Python signatures still don't support introspection in Python 3.4? Almost no C level callables support programmatic introspection in Python 3.3, so even what inspect.signature will already provide in beta 3 is a big step forward. While the text string used to communicate between Argument Clinic and inspect.signature will be private, the representation on inspect.Signature objects will be a new *public* API. As the discussions between you, me and Yury show, I don't think there's an immediately obvious best answer of how to do that. Your suggestion of just adding the group numbers to the Parameter objects would *work*, but it's not very Pythonic - we have container types that support nesting, which seems like a more natural structure for indicating parameter groups at the Python level. Essentially, the group number proposal feels like the kind of low level interface returned by getfullargspec(), not the kind of high level interface defined for inspect.Signature in PEP 362. It's going to take a while to come up with a public API for this aspect of C level signatures that feels right to at least you, me and Yury, and the beta period *really* isn't the right time to be doing that. 
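To make the representation under discussion concrete, here is a minimal sketch of the parent-linked groups from Larry's counter-proposal, applied to his two examples. ParameterGroup and Param below are simplified, hypothetical stand-ins written for illustration; they are not part of the inspect module, and nothing here reflects an API that was actually adopted.

```python
class ParameterGroup:
    """Hypothetical stand-in for the proposed inspect.ParameterGroup."""

    def __init__(self, parent=None):
        self.parent = parent  # None, or the enclosing ParameterGroup


class Param:
    """Hypothetical, heavily simplified stand-in for inspect.Parameter."""

    _empty = object()

    def __init__(self, name, default=_empty, group=None):
        self.name = name
        self.default = default
        self.group = group

    def is_optional(self):
        # Same rule as in the proposal: a default or group membership.
        return (self.default is not self._empty) or (self.group is not None)


# bytearray([source, [encoding, [errors]]]): three nested groups.
g_source = ParameterGroup()
g_encoding = ParameterGroup(parent=g_source)
g_errors = ParameterGroup(parent=g_encoding)
source = Param("source", group=g_source)
encoding = Param("encoding", group=g_encoding)
errors = Param("errors", group=g_errors)

# curses.window.addch([x, y,] ch, [attr]): two sibling top-level groups.
g_xy = ParameterGroup()
g_attr = ParameterGroup()
x = Param("x", group=g_xy)
y = Param("y", group=g_xy)
ch = Param("ch")
attr = Param("attr", group=g_attr)

print(errors.group.parent is encoding.group, ch.is_optional())
```

The nesting is recoverable only by following parent links from each parameter, which is the low-level flavour Nick objects to; a nested-container representation would expose the same structure directly.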
If other changes like the binary interpolation proposals and adding the PEP 451 based target attributes to runpy can wait until Python 3.5 due to feature freeze, then I think adding full C level signature support to inspect.Signature can also wait. That way, you can resurrect PEP 457, recast it as proposing an *output* format for inspect.Signature.__repr__(), add an inspect.Signature.fromstr() API that can use it to create a signature object from __text_signature__ attributes (rather than relying on ast.parse), add the optional group support and do it *right*, rather than trying to squeeze it in as a new public API during the beta period, which may lock us in to supporting an introspection API we later regret. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Jan 20 14:03:16 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 20 Jan 2014 23:03:16 +1000 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: On 20 January 2014 21:14, Serhiy Storchaka wrote: > 20.01.14 10:05, Larry Hastings wrote: >> Contestant 4: "Put in clinic directory, add .h" >> >> foo.c -> clinic/foo.c.h >> foo.h -> clinic/foo.h.h > > > -1. (Generated files are located far from origins, directory name clutters > the namespace of directory names). Larry's not talking about a top level directory here (at least I hope he isn't). This proposal would mean using "Objects/clinic", "Python/clinic", "Modules/clinic" as appropriate. That's substantially *less* directory clutter rather than more, just like __pycache__ vs the old model of implicitly creating adjacent .pyc and .pyo files. Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From storchaka at gmail.com Mon Jan 20 14:31:53 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 20 Jan 2014 15:31:53 +0200 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: 20.01.14 15:03, Nick Coghlan wrote: > On 20 January 2014 21:14, Serhiy Storchaka wrote: >> 20.01.14 10:05, Larry Hastings wrote: >>> Contestant 4: "Put in clinic directory, add .h" >>> >>> foo.c -> clinic/foo.c.h >>> foo.h -> clinic/foo.h.h >> >> >> -1. (Generated files are located far from origins, directory name clutters >> the namespace of directory names). > > Larry's not talking about a top level directory here (at least I hope > he isn't). This proposal would mean using "Objects/clinic", > "Python/clinic", "Modules/clinic" as appropriate. This means the appearance of directories with the common name "clinic" in random places of the source tree. Some special name ("__clinic__", ".clinic") looks slightly less confusing to me. From storchaka at gmail.com Mon Jan 20 14:38:56 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 20 Jan 2014 15:38:56 +0200 Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args() In-Reply-To: <87iotfxp38.fsf@vostro.rath.org> References: <87vbxgeqm3.fsf@vostro.rath.org> <87siskeixg.fsf@vostro.rath.org> <52DBAB14.8090704@hastings.org> <87iotfxp38.fsf@vostro.rath.org> Message-ID: 20.01.14 06:19, Nikolaus Rath wrote: > This works if the user calls time.gmtime(None), but it fails for > time.gmtime(). It seems that in that case my C converter function is > never called. > > What's the trick that I'm missing?
    /*[clinic input]
    time.gmtime

        [
        seconds: time_t
        ]
        /

    [clinic start generated code]*/

From meadori at gmail.com Mon Jan 20 15:35:09 2014 From: meadori at gmail.com (Meador Inge) Date: Mon, 20 Jan 2014 08:35:09 -0600 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <52DCD8BC.1040205@hastings.org> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: On Mon, Jan 20, 2014 at 2:05 AM, Larry Hastings wrote: Contestant 1: "Add .clinic.h" > > foo.c -> foo.c.clinic.h > foo.h -> foo.h.clinic.h > > -0 > Contestant 2: "Add .ac.h" > > foo.c -> foo.c.ac.h > foo.h -> foo.h.ac.h > > -1 > Contestant 3: "Add .clinic" > > foo.c -> foo.c.clinic > foo.h -> foo.h.clinic > > -1 (This will break too many tools in the C/C++ tools ecosystem and I am not convinced by any of the arguments given thus far.) > Contestant 4: "Put in clinic directory, add .h" > > foo.c -> clinic/foo.c.h > foo.h -> clinic/foo.h.h > > +1 > Contestant 5: "Put in __clinic__ directory, add .h" > > foo.c -> __clinic__/foo.c.h > foo.h -> __clinic__/foo.h.h > > -1 ("clinic" without the dunders is clearer for a directory name.) -- Meador
From yselivanov.ml at gmail.com Mon Jan 20 16:06:38 2014 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 20 Jan 2014 10:06:38 -0500 Subject: [Python-Dev] signature.object, argument clinic and grouped parameters In-Reply-To: References: <52DCF78F.2090000@hastings.org> Message-ID: Larry, Nick, On January 20, 2014 at 8:00:35 AM, Nick Coghlan (ncoghlan at gmail.com) wrote: > > Your proposal gets a "no, absolutely not" vote from me. > > 1. We already have a notion of "optional parameters". Parameters > with default values are optional. > 2.
Your proposed syntax doesn't mention how we'd actually establish > default values for parameters. So it's insufficient to handle > existing code. > 3. Your syntax/semantics, as described, can't convey the concept > of optional groups. So it's insufficient to solve the problem > it sets out to solve. > 4. Your proposed syntax changes 80% of existing code--any parameter > with a default value. I don't know how you concluded this was "simpler". I withdraw my proposal in its current form. It turns out I missed the definition of groups in PEP 436. I just want to add a few things, though. My proposal is still good for the other problem: supporting optional parameters (where "optional" is meant in the sense of the last parameter of itertools.repeat). Again, my example from the first email in this thread. Suppose you want to write itertools.repeat in Python (where "n" cannot be None; what matters is *whether it was passed or not*):

    _marker = object()
    def repeat(elem, n=_marker):
        pass

Now, if you do 'str(inspect.signature(repeat))' you'll have something like: '(elem, n=<object object at 0x...>)'. However, if we choose to have a special marker object defined in the stdlib, you'd write:

    def repeat(elem, n=functools.optional):
        if n is functools.optional:
            # no param was passed

and the str of the signature would look like '(elem[, n])'. Yury
From stefan at bytereef.org Mon Jan 20 16:07:51 2014 From: stefan at bytereef.org (Stefan Krah) Date: Mon, 20 Jan 2014 16:07:51 +0100 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <52DCD8BC.1040205@hastings.org> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: <20140120150751.GA28539@sleipnir.bytereef.org> Larry Hastings wrote: > Contestant 4: "Put in clinic directory, add .h" > > foo.c -> clinic/foo.c.h > foo.h -> clinic/foo.h.h +1 for this, 0 for the rest.
Bonus points for any other directory name that is more self-descriptive. ;)

Stefan Krah

From brett at python.org Mon Jan 20 16:12:08 2014
From: brett at python.org (Brett Cannon)
Date: Mon, 20 Jan 2014 10:12:08 -0500
Subject: [Python-Dev] .clinic.c vs .c.clinic
In-Reply-To: <52DCD8BC.1040205@hastings.org>
References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org>
Message-ID:

On Mon, Jan 20, 2014 at 3:05 AM, Larry Hastings wrote:

>
> On 01/19/2014 08:29 AM, Ethan Furman wrote:
>
> On 01/19/2014 03:32 AM, Georg Brandl wrote:
>
> On 19.01.2014 11:19, Larry Hastings wrote:
>
> Not kidding, my best idea so far is "foo.clinic.h.h",
>
>
> Why not always put clinic into its own directory?
>
> Modules/mathmodule.c -> Modules/clinic/mathmodule.c.h
> Modules/mathmodule.h -> Modules/clinic/mathmodule.h.h
>
> At least that is consistent, allows easy exclusion in tools, and gets rid
> of the additional "clinic" in the filename.
>
>
> +1
>
> If AC will work with both .c and .h files, I think a separate directory is
> the way to go.
>
>
> In theory, Argument Clinic works with any file that it can iterate
> over by lines and in which it can recognize comments. It currently
> supports C and Python files and automatically recognizes a bunch of
> extensions.
>
> --------------------
>
> Okay, I'm taking a poll. I will total your answers and take the result...
> strongly under advisement. ;-)
>
> The rules:
> * The poll will be over in 48 hours, maybe sooner if a winner emerges
> early.
> * Please express your vote from -1 to +1. -0 and +0 will only be
> differentiated during a tiebreaker.
> * If you don't vote for a contestant, your vote will be assumed to be 0.
> * You may change your vote at any time while the poll is still running.
> * If you wish to nominate a new contestant, you may.
Please give the > contestant a name, and express how it would transform the filenames "foo.c" > and "foo.h". I would strongly prefer that all transformations be > expressable using str.format(transformation, filename="foo.c", > basename="foo", extension=".c") . > > > The contestants so far: > > Contestant 1: "Add .clinic.h" > > foo.c -> foo.c.clinic.h > foo.h -> foo.h.clinic.h > > +0 > Contestant 2: "Add .ac.h" > > foo.c -> foo.c.ac.h > foo.h -> foo.h.ac.h > > -1 > Contestant 3: "Add .clinic" > > foo.c -> foo.c.clinic > foo.h -> foo.h.clinic > > +0 > Contestant 4: "Put in clinic directory, add .h" > > foo.c -> clinic/foo.c.h > foo.h -> clinic/foo.h.h > > +0 > Contestant 5: "Put in __clinic__ directory, add .h" > > foo.c -> __clinic__/foo.c.h > foo.h -> __clinic__/foo.h.h > > +1 > > I didn't add a contestant for what Stefan Krah originally suggested > ("foo.c -> foo.h") because it's not clear how this would handle "foo.h". > > > You'll notice the current behavior is no longer in the running, > And +1 to making a side file the default for ease of use. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sky.kok at speaklikeaking.com Mon Jan 20 15:55:13 2014 From: sky.kok at speaklikeaking.com (Vajrasky Kok) Date: Mon, 20 Jan 2014 22:55:13 +0800 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <52DCD8BC.1040205@hastings.org> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: > > The contestants so far: > > Contestant 1: "Add .clinic.h" > > foo.c -> foo.c.clinic.h > foo.h -> foo.h.clinic.h +0 > > Contestant 2: "Add .ac.h" > > foo.c -> foo.c.ac.h > foo.h -> foo.h.ac.h +1 > > Contestant 3: "Add .clinic" > > foo.c -> foo.c.clinic > foo.h -> foo.h.clinic +0 > > Contestant 4: "Put in clinic directory, add .h" > > foo.c -> clinic/foo.c.h > foo.h -> clinic/foo.h.h -1 > > Contestant 5: "Put in __clinic__ directory, add .h" > > foo.c -> __clinic__/foo.c.h > foo.h -> __clinic__/foo.h.h -1 From zachary.ware+pydev at gmail.com Mon Jan 20 16:30:27 2014 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Mon, 20 Jan 2014 09:30:27 -0600 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <52DCD8BC.1040205@hastings.org> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: On Mon, Jan 20, 2014 at 2:05 AM, Larry Hastings wrote: > Contestant 1: "Add .clinic.h" > > foo.c -> foo.c.clinic.h > foo.h -> foo.h.clinic.h -0 > Contestant 2: "Add .ac.h" > > foo.c -> foo.c.ac.h > foo.h -> foo.h.ac.h -1 > Contestant 3: "Add .clinic" > > foo.c -> foo.c.clinic > foo.h -> foo.h.clinic -0 > Contestant 4: "Put in clinic directory, add .h" > > foo.c -> clinic/foo.c.h > foo.h -> clinic/foo.h.h +1 > Contestant 5: "Put in __clinic__ directory, add .h" > > foo.c -> 
__clinic__/foo.c.h > foo.h -> __clinic__/foo.h.h +0 -- Zach From phd at phdru.name Mon Jan 20 16:44:28 2014 From: phd at phdru.name (Oleg Broytman) Date: Mon, 20 Jan 2014 16:44:28 +0100 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <20140120150751.GA28539@sleipnir.bytereef.org> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> <20140120150751.GA28539@sleipnir.bytereef.org> Message-ID: <20140120154428.GA7347@phdru.name> On Mon, Jan 20, 2014 at 04:07:51PM +0100, Stefan Krah wrote: > Bonus points for any other directory name that is > more self-descriptive. ;) Argument Clinic is a PyArg_Parse* preprocessor, AFAIU. Why not call the directory "pyargprep", "pyargparsers" or such? Or may be "aclinic-output"? Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From brett at python.org Mon Jan 20 17:35:43 2014 From: brett at python.org (Brett Cannon) Date: Mon, 20 Jan 2014 11:35:43 -0500 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <20140120154428.GA7347@phdru.name> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> <20140120150751.GA28539@sleipnir.bytereef.org> <20140120154428.GA7347@phdru.name> Message-ID: On Mon, Jan 20, 2014 at 10:44 AM, Oleg Broytman wrote: > On Mon, Jan 20, 2014 at 04:07:51PM +0100, Stefan Krah > wrote: > > Bonus points for any other directory name that is > > more self-descriptive. ;) > > Argument Clinic is a PyArg_Parse* preprocessor, AFAIU. Why not call > the directory "pyargprep", "pyargparsers" or such? > Or may be "aclinic-output"? 
> The fact that it emits PyArg_Parse*-using code is an implementation detail. It could easily change if we want since Argument Clinic abstracts out argument parsing entirely. -------------- next part -------------- An HTML attachment was scrubbed... URL: From taleinat at gmail.com Mon Jan 20 17:41:39 2014 From: taleinat at gmail.com (Tal Einat) Date: Mon, 20 Jan 2014 18:41:39 +0200 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <52DCD8BC.1040205@hastings.org> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: On Mon, Jan 20, 2014 at 10:05 AM, Larry Hastings wrote: > > Okay, I'm taking a poll. I will total your answers and take the result... > strongly under advisement. ;-) +1 for #5, +0.5 for #4, -1 for the rest. From stefan at bytereef.org Mon Jan 20 17:58:40 2014 From: stefan at bytereef.org (Stefan Krah) Date: Mon, 20 Jan 2014 17:58:40 +0100 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <20140120150751.GA28539@sleipnir.bytereef.org> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> <20140120150751.GA28539@sleipnir.bytereef.org> Message-ID: <20140120165840.GA29753@sleipnir.bytereef.org> Stefan Krah wrote: > Larry Hastings wrote: > > Contestant 4: "Put in clinic directory, add .h" > > > > foo.c -> clinic/foo.c.h > > foo.h -> clinic/foo.h.h > > +1 for this, 0 for the rest. Bonus points for any other directory name that is > more self-descriptive. ;) On second thought, I do find that having Modules/cjkcodecs Modules/clinic looks kind of weird. So I'm +1 for __clinic__, 0 on the rest. 
+2 for something more self-explanatory like:

    Modules/__arghandlers__
    Modules/__autogen__

Stefan Krah

From g.brandl at gmx.net Mon Jan 20 19:09:11 2014
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 20 Jan 2014 19:09:11 +0100
Subject: [Python-Dev] .clinic.c vs .c.clinic
In-Reply-To:
References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org>
Message-ID:

On 20.01.2014 14:31, Serhiy Storchaka wrote:
> 20.01.14 15:03, Nick Coghlan wrote:
>> On 20 January 2014 21:14, Serhiy Storchaka wrote:
>>> 20.01.14 10:05, Larry Hastings wrote:
>>>> Contestant 4: "Put in clinic directory, add .h"
>>>>
>>>> foo.c -> clinic/foo.c.h
>>>> foo.h -> clinic/foo.h.h
>>>
>>>
>>> -1. (Generated files are located far from origins, directory name clutters
>>> the namespace of directory names).
>>
>> Larry's not talking about a top level directory here (at least I hope
>> he isn't). This proposal would mean using "Objects/clinic",
>> "Python/clinic", "Modules/clinic" as appropriate.
>
> This means the appearance of directories with the common name "clinic"
> in random places of the source tree. Some special name ("__clinic__",
> ".clinic") looks slightly less confusing to me.
"clinic" shouldn't be such a common name in C soures :) Georg From ethan at stoneleaf.us Mon Jan 20 19:30:39 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 20 Jan 2014 10:30:39 -0800 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <52DCD8BC.1040205@hastings.org> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: <52DD6B4F.5000404@stoneleaf.us> On 01/20/2014 12:05 AM, Larry Hastings wrote: > > Contestant 1: "Add .clinic.h" > > foo.c -> foo.c.clinic.h > foo.h -> foo.h.clinic.h -1 > Contestant 2: "Add .ac.h" > > foo.c -> foo.c.ac.h > foo.h -> foo.h.ac.h -1 > Contestant 3: "Add .clinic" > > foo.c -> foo.c.clinic > foo.h -> foo.h.clinic +0 > Contestant 4: "Put in clinic directory, add .h" > > foo.c -> clinic/foo.c.h > foo.h -> clinic/foo.h.h +0.5 > Contestant 5: "Put in __clinic__ directory, add .h" > > foo.c -> __clinic__/foo.c.h > foo.h -> __clinic__/foo.h.h +1 -- ~Ethan~ From barry at python.org Mon Jan 20 20:09:04 2014 From: barry at python.org (Barry Warsaw) Date: Mon, 20 Jan 2014 14:09:04 -0500 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <52DCD8BC.1040205@hastings.org> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: <20140120140904.48d01e05@anarchist.wooz.org> On Jan 20, 2014, at 12:05 AM, Larry Hastings wrote: >Contestant 5: "Put in __clinic__ directory, add .h" > > foo.c -> __clinic__/foo.c.h > foo.h -> __clinic__/foo.h.h This is cached output right? IOW, it can be regenerated if it's missing. If so, this seems like a nice parallel to __pycache__. It's mostly hidden until you want to go looking for it. 
+1 -Barry From brett at python.org Mon Jan 20 20:46:25 2014 From: brett at python.org (Brett Cannon) Date: Mon, 20 Jan 2014 14:46:25 -0500 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <20140120140904.48d01e05@anarchist.wooz.org> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> <20140120140904.48d01e05@anarchist.wooz.org> Message-ID: On Mon, Jan 20, 2014 at 2:09 PM, Barry Warsaw wrote: > On Jan 20, 2014, at 12:05 AM, Larry Hastings wrote: > > >Contestant 5: "Put in __clinic__ directory, add .h" > > > > foo.c -> __clinic__/foo.c.h > > foo.h -> __clinic__/foo.h.h > > This is cached output right? Yes, it's generated entirely based on data provided in original source file. > IOW, it can be regenerated if it's missing. If > so, this seems like a nice parallel to __pycache__. It's mostly hidden > until > you want to go looking for it. > More-or-less. The key difference is you will most likely look at the generated file *once* to copy-and-paste the relevant macros to paste into your source file for use (e.g. the relevant MethodDef stuff). But it's a one-time thing that never has to be done again as long as you don't rename a function or method. -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From rmsr at lab.net Mon Jan 20 21:00:13 2014
From: rmsr at lab.net (Ryan Smith-Roberts)
Date: Mon, 20 Jan 2014 12:00:13 -0800
Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args()
In-Reply-To: <87iotfxp38.fsf@vostro.rath.org>
References: <87vbxgeqm3.fsf@vostro.rath.org> <87siskeixg.fsf@vostro.rath.org> <52DBAB14.8090704@hastings.org> <87iotfxp38.fsf@vostro.rath.org>
Message-ID:

The trick you're missing is that *any time* you have an optional argument with a custom converter[1], PyArg_ParseTuple *only* calls the converter function in the case that the user *actually supplies some value*. This is a basic property of an optional argument. Another property is that the c_default is evaluated *every time*, as it is set *before* the call to PyArg_ParseTuple. Are these the best ways to do things? Maybe not, but it's how they are.

Please do not use a custom converter for this case. It can't work. Please do what I outlined earlier (untested, somewhat verbose code follows):

    static int
    parse_time_t_arg(PyObject *arg, time_t *when)
    {
        if (arg == NULL || arg == Py_None) {
            *when = time(NULL);
            return 1;
        }
        if (_PyTime_ObjectToTime_t(arg, when) == -1)
            return 0;
        return 1;
    }

    /*[clinic input]
    time.gmtime

        seconds: object = None

    [clinic start generated code]*/
    {
        time_t when;

        if (0 == parse_time_t_arg(seconds, &when))
            return NULL;
        ...

[1] If you set a default value, or put it in brackets as Serhiy later recommends, it works the same.

On Sun, Jan 19, 2014 at 8:19 PM, Nikolaus Rath wrote:

> Larry Hastings writes:
> > On 01/18/2014 09:52 PM, Ryan Smith-Roberts wrote:
> >>
> >> I still advise you not to use this solution. time() is a system call
> >> on many operating systems, and so it can be a heavier operation than
> >> you'd think. Best to avoid it unless it's needed (on FreeBSD it
> >> seems to add about 15% overhead to localtime(), for instance).
> >>
> >
> > I agree. Converting to Argument Clinic should not cause a performance
> > regression.
Please don't add new calls to time() for the sake of
> > making code more generic.
> >
> > A better choice would be to write a converter function in C, then use
> > a custom converter that called it. Nikolaus: Is that something you're
> > comfortable doing?
>
> I think I'll need some help. I don't know how to handle the case where
> the user is not passing anything.
>
> Here's my attempt:
>
> ,----
> | /* C Converter for argument clinic
> |    If obj is NULL or Py_None, return current time. Otherwise,
> |    convert Python object to time_t.
> | */
> | static int
> | PyObject_to_time_t(PyObject *obj, time_t *stamp)
> | {
> |     if (obj == NULL || obj == Py_None) {
> |         *stamp = time(NULL);
> |     }
> |     else {
> |         if (_PyTime_ObjectToTime_t(obj, stamp) == -1)
> |             return 0;
> |     }
> |     return 1;
> | }
> |
> | /*[python input]
> | class time_t_converter(CConverter):
> |     type = 'time_t'
> |     converter = 'PyObject_to_time_t'
> |     default = None
> | [python start generated code]*/
> | /*[python end generated code: checksum=da39a3ee5e6b4b0d3255bfef95601890afd80709]*/
> |
> |
> | /*[clinic input]
> | time.gmtime
> |
> |     seconds: time_t
> |     /
> |
> | [clinic start generated code]*/
> `----
>
> but this results in the following code:
>
> ,----
> | static PyObject *
> | time_gmtime(PyModuleDef *module, PyObject *args)
> | {
> |     PyObject *return_value = NULL;
> |     time_t seconds;
> |
> |     if (!PyArg_ParseTuple(args,
> |         "|O&:gmtime",
> |         PyObject_to_time_t, &seconds))
> |         goto exit;
> |     return_value = time_gmtime_impl(module, seconds);
> |
> | exit:
> |     return return_value;
> | }
> `----
>
> This works if the user calls time.gmtime(None), but it fails for
> time.gmtime(). It seems that in that case my C converter function is
> never called.
>
> What's the trick that I'm missing?
>
>
> Thanks!
> -Nikolaus
>
> --
> Encrypted emails preferred. PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6
> 02CF A9AD B7F8 AE4E 425C
>
> "Time flies like an arrow, fruit flies like a Banana."
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/rmsr%40lab.net > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Jan 20 21:05:55 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 20 Jan 2014 12:05:55 -0800 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> <20140120140904.48d01e05@anarchist.wooz.org> Message-ID: <52DD81A3.4000205@stoneleaf.us> On 01/20/2014 11:46 AM, Brett Cannon wrote: > On Mon, Jan 20, 2014 at 2:09 PM, Barry Warsaw wrote: >> On Jan 20, 2014, at 12:05 AM, Larry Hastings wrote: >> >>> Contestant 5: "Put in __clinic__ directory, add .h" >>> >>> foo.c -> __clinic__/foo.c.h >>> foo.h -> __clinic__/foo.h.h >> >> This is cached output right? > > Yes, it's generated entirely based on data provided > in original source file. > >> IOW, it can be regenerated if it's missing. If so, >> this seems like a nice parallel to __pycache__. >> It's mostly hidden until you want to go looking for it. > > More-or-less. The key difference is you will most likely > look at the generated file *once* to copy-and-paste the > relevant macros to paste into your source file for use > (e.g. the relevant MethodDef stuff). But it's a one-time > thing that never has to be done again as long as you don't > rename a function or method. Won't AC put those macros in the source file for you? 
--
~Ethan~

From g.brandl at gmx.net Mon Jan 20 21:14:06 2014
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 20 Jan 2014 21:14:06 +0100
Subject: [Python-Dev] .clinic.c vs .c.clinic
In-Reply-To: <52DD81A3.4000205@stoneleaf.us>
References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> <20140120140904.48d01e05@anarchist.wooz.org> <52DD81A3.4000205@stoneleaf.us>
Message-ID:

On 20.01.2014 21:05, Ethan Furman wrote:
> On 01/20/2014 11:46 AM, Brett Cannon wrote:
>> On Mon, Jan 20, 2014 at 2:09 PM, Barry Warsaw wrote:
>>> On Jan 20, 2014, at 12:05 AM, Larry Hastings wrote:
>>>
>>>> Contestant 5: "Put in __clinic__ directory, add .h"
>>>>
>>>> foo.c -> __clinic__/foo.c.h
>>>> foo.h -> __clinic__/foo.h.h
>>>
>>> This is cached output right?
>>
>> Yes, it's generated entirely based on data provided
>> in original source file.
>>
>>> IOW, it can be regenerated if it's missing. If so,
>>> this seems like a nice parallel to __pycache__.
>>> It's mostly hidden until you want to go looking for it.
>>
>> More-or-less. The key difference is you will most likely
>> look at the generated file *once* to copy-and-paste the
>> relevant macros to paste into your source file for use
>> (e.g. the relevant MethodDef stuff). But it's a one-time
>> thing that never has to be done again as long as you don't
>> rename a function or method.
>
> Won't AC put those macros in the source file for you?

No, currently it wouldn't know where to look. And that's a good thing because AC should never modify anything that is not in between "clinic start generated code" and "clinic end generated code".

But Larry has said that in the future the whole PyMethodDef array might be generated by AC in a separate block (that you have to indicate in the file as well).
Georg From tjreedy at udel.edu Mon Jan 20 21:25:03 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 20 Jan 2014 15:25:03 -0500 Subject: [Python-Dev] signature.object, argument clinic and grouped parameters In-Reply-To: References: <52DCF78F.2090000@hastings.org> Message-ID: On 1/20/2014 7:59 AM, Nick Coghlan wrote: > However, while I know you're keen to finally make introspection work > for all C level callables in 3.4, even the ones with signatures that > can't be expressed as Python function signatures, I'd like to strongly > encourage you to hold off on that last part until Python 3.5. ... > That way, you can resurrect PEP 457, recast it as proposing an > *output* format for inspect.Signature.__repr__(), add an > inspect.Signature.fromstr() API that can use it to create a signature > object from __text_signature__ attributes (rather than relying on > ast.parse), add the optional group support and do it *right*, rather > than trying to squeeze it in as a new public API during the beta > period, which may lock us in to supporting an introspection API we > later regret. I agree. What we can do with the API we have already is a great advance. -- Terry Jan Reedy From tjreedy at udel.edu Mon Jan 20 21:32:15 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 20 Jan 2014 15:32:15 -0500 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: On 1/20/2014 6:01 AM, Terry Reedy wrote: > On 1/20/2014 4:07 AM, Nick Coghlan wrote: >> +1 for Contestant 4 for me as well, +0 for Contestant 5, -1 for the >> others. Same reasons as Georg, even where my votes are different. > > Ditto for me. Except that after reading other responses, I might switch 4 and 5, so make that +1 for either 4 or 5. 
--
Terry Jan Reedy

From larry at hastings.org Mon Jan 20 22:40:17 2014
From: larry at hastings.org (Larry Hastings)
Date: Mon, 20 Jan 2014 13:40:17 -0800
Subject: [Python-Dev] .clinic.c vs .c.clinic
In-Reply-To:
References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org>
Message-ID: <52DD97C1.5080408@hastings.org>

On 01/20/2014 05:03 AM, Nick Coghlan wrote:
> On 20 January 2014 21:14, Serhiy Storchaka wrote:
>> 20.01.14 10:05, Larry Hastings wrote:
>>> Contestant 4: "Put in clinic directory, add .h"
>>>
>>> foo.c -> clinic/foo.c.h
>>> foo.h -> clinic/foo.h.h
>>
>> -1. (Generated files are located far from origins, directory name clutters
>> the namespace of directory names).
> Larry's not talking about a top level directory here (at least I hope
> he isn't). This proposal would mean using "Objects/clinic",
> "Python/clinic", "Modules/clinic" as appropriate.

You're correct, I'm not talking about a top-level directory here. I'm talking about creating a "clinic" subdirectory in the same directory as the original file. The transformation suggested above was exact.

//arry/

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From larry at hastings.org Mon Jan 20 22:43:19 2014 From: larry at hastings.org (Larry Hastings) Date: Mon, 20 Jan 2014 13:43:19 -0800 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <20140120140904.48d01e05@anarchist.wooz.org> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> <20140120140904.48d01e05@anarchist.wooz.org> Message-ID: <52DD9877.8000102@hastings.org> On 01/20/2014 11:09 AM, Barry Warsaw wrote: > On Jan 20, 2014, at 12:05 AM, Larry Hastings wrote: > >> Contestant 5: "Put in __clinic__ directory, add .h" >> >> foo.c -> __clinic__/foo.c.h >> foo.h -> __clinic__/foo.h.h > This is cached output right? IOW, it can be regenerated if it's missing. If > so, this seems like a nice parallel to __pycache__. It's mostly hidden until > you want to go looking for it. > > +1 It's cached output. The difference to __pycache__ is that the output will be checked in, and is something you might ever want to examine. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From ethan at stoneleaf.us Mon Jan 20 22:47:28 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 20 Jan 2014 13:47:28 -0800
Subject: [Python-Dev] .clinic.c vs .c.clinic
In-Reply-To:
References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> <20140120140904.48d01e05@anarchist.wooz.org> <52DD81A3.4000205@stoneleaf.us>
Message-ID: <52DD9970.6090103@stoneleaf.us>

On 01/20/2014 12:14 PM, Georg Brandl wrote:
> On 20.01.2014 21:05, Ethan Furman wrote:
>> On 01/20/2014 11:46 AM, Brett Cannon wrote:
>>> On Mon, Jan 20, 2014 at 2:09 PM, Barry Warsaw wrote:
>>>> On Jan 20, 2014, at 12:05 AM, Larry Hastings wrote:
>>>>
>>>>> Contestant 5: "Put in __clinic__ directory, add .h"
>>>>>
>>>>> foo.c -> __clinic__/foo.c.h
>>>>> foo.h -> __clinic__/foo.h.h
>>>>
>>>> This is cached output right?
>>>
>>> Yes, it's generated entirely based on data provided
>>> in original source file.
>>>
>>>> IOW, it can be regenerated if it's missing. If so,
>>>> this seems like a nice parallel to __pycache__.
>>>> It's mostly hidden until you want to go looking for it.
>>>
>>> More-or-less. The key difference is you will most likely
>>> look at the generated file *once* to copy-and-paste the
>>> relevant macros to paste into your source file for use
>>> (e.g. the relevant MethodDef stuff). But it's a one-time
>>> thing that never has to be done again as long as you don't
>>> rename a function or method.
>>
>> Won't AC put those macros in the source file for you?
>
> No, currently it wouldn't know where to look. And that's a good thing
> because AC never should modify anything not in between "clinic start
> generated code" and "clinic end generated code".

So, if I understand correctly, by moving into a sidefile approach, we will have to go to a two-pass system?
Once to ACify the file and run Argument Clinic on it, and then again to add in the macros?

Is this basically the same as it was with the buffer approach?

--
~Ethan~

From larry at hastings.org Mon Jan 20 22:57:52 2014
From: larry at hastings.org (Larry Hastings)
Date: Mon, 20 Jan 2014 13:57:52 -0800
Subject: [Python-Dev] .clinic.c vs .c.clinic
In-Reply-To: <52DD9970.6090103@stoneleaf.us>
References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> <20140120140904.48d01e05@anarchist.wooz.org> <52DD81A3.4000205@stoneleaf.us> <52DD9970.6090103@stoneleaf.us>
Message-ID: <52DD9BE0.5010905@hastings.org>

On 01/20/2014 01:47 PM, Ethan Furman wrote:
> So, if I understand correctly, by moving into a sidefile approach, we
> will have to go to a two-pass system? Once to ACify the file and run
> Argument Clinic on it, and then again to add in the macros?
>
> Is this basically the same as it was with the buffer approach?

Let me paraphrase this to add context, to see if I understand your question correctly.

When you add the Argument Clinic blob to a function, Argument Clinic generates some C code. That code contains a parsing function, an "impl" function (which you implement), and a macro to paste into the correct slot in the PyMethodDef structure. The name of the macro is full_name_of_function.upper() + "_METHODDEF".

However, it can be inconvenient to guess what the actual macro name is, so most people fill out the structure, run clinic.py on the C file, then go hunting for the METHODDEF macro definition to get the name. This is what you refer to as "a two-pass system": edit the file, run Argument Clinic on it, then edit it to put the METHODDEF macro in the right spot.

If that's what you meant, then: yes, and yes.
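
A minimal sketch of the naming rule just described, not Argument Clinic's actual code. The helper name methoddef_macro_name is hypothetical, and turning the dots of the dotted Python name into underscores is an assumption made here so that the result is a valid C identifier:

```python
def methoddef_macro_name(full_name):
    """Guess the METHODDEF macro name for a dotted function name.

    Follows the rule quoted above (upper-case the full name and append
    "_METHODDEF"). Replacing "." with "_" is an assumption of this
    sketch, made so the result is a legal C identifier.
    """
    return full_name.replace(".", "_").upper() + "_METHODDEF"


if __name__ == "__main__":
    # The function discussed elsewhere in this digest.
    print(methoddef_macro_name("time.gmtime"))
```

Being able to predict the name this way is what would let someone skip the "go hunting for the macro" step.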
It's possible to skip the second pass if you're comfortable guessing the generated name of the macro, but that's just one more thing for people to remember, and hunting for it is easier.

And yes, whether it's original output or buffer or "clinic file" (I'm trying to deprecate the name "side file", it was a dumb idea), the destination Argument Clinic writes to doesn't dramatically alter what it writes.

//arry/

p.s. Your saying "macros" threw me off, as there's only one macro.

From storchaka at gmail.com Mon Jan 20 23:18:44 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Tue, 21 Jan 2014 00:18:44 +0200
Subject: [Python-Dev] .clinic.c vs .c.clinic
In-Reply-To:
References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org>
Message-ID:

20.01.14 20:09, Georg Brandl wrote:
> On 20.01.2014 14:31, Serhiy Storchaka wrote:
>> 20.01.14 15:03, Nick Coghlan wrote:
>>> On 20 January 2014 21:14, Serhiy Storchaka wrote:
>>>> 20.01.14 10:05, Larry Hastings wrote:
>>>>> Contestant 4: "Put in clinic directory, add .h"
>>>>>
>>>>> foo.c -> clinic/foo.c.h
>>>>> foo.h -> clinic/foo.h.h
>>>>
>>>>
>>>> -1. (Generated files are located far from origins, directory name clutters
>>>> the namespace of directory names).
>>>
>>> Larry's not talking about a top level directory here (at least I hope
>>> he isn't). This proposal would mean using "Objects/clinic",
>>> "Python/clinic", "Modules/clinic" as appropriate.
>>
>> This means the appearance of directories with the common name "clinic"
>> in random places of the source tree. Some special name ("__clinic__",
>> ".clinic") looks slightly less confusing to me.
>
> "clinic" shouldn't be such a common name in C sources :)

The source tree already has one "clinic" directory (Tools/clinic/).

From g.brandl at gmx.net Mon Jan 20 23:51:36 2014
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 20 Jan 2014 23:51:36 +0100
Subject: [Python-Dev] .clinic.c vs .c.clinic
In-Reply-To: <52DD9970.6090103@stoneleaf.us>
References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> <20140120140904.48d01e05@anarchist.wooz.org> <52DD81A3.4000205@stoneleaf.us> <52DD9970.6090103@stoneleaf.us>
Message-ID:

On 20.01.2014 22:47, Ethan Furman wrote:
>>> Won't AC put those macros in the source file for you?
>>
>> No, currently it wouldn't know where to look. And that's a good thing
>> because AC never should modify anything not in between "clinic start
>> generated code" and "clinic end generated code".
>
> So, if I understand correctly, by moving into a sidefile approach, we will
> have to go to a two-pass system? Once to ACify the file and run Argument
> Clinic on it, and then again to add in the macros?

No. It is exactly the same as in the current all-in-one-file approach.

> Is this basically the same as it was with the buffer approach?

It's the same as it always was.
Georg From ncoghlan at gmail.com Tue Jan 21 00:09:53 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 21 Jan 2014 09:09:53 +1000 Subject: [Python-Dev] signature.object, argument clinic and grouped parameters In-Reply-To: References: <52DCF78F.2090000@hastings.org> Message-ID: On 21 Jan 2014 06:26, "Terry Reedy" wrote: > > On 1/20/2014 7:59 AM, Nick Coghlan wrote: > >> However, while I know you're keen to finally make introspection work >> for all C level callables in 3.4, even the ones with signatures that >> can't be expressed as Python function signatures, I'd like to strongly >> encourage you to hold off on that last part until Python 3.5. > > ... > >> That way, you can resurrect PEP 457, recast it as proposing an >> *output* format for inspect.Signature.__repr__(), add an >> inspect.Signature.fromstr() API that can use it to create a signature >> object from __text_signature__ attributes (rather than relying on >> ast.parse), add the optional group support and do it *right*, rather >> than trying to squeeze it in as a new public API during the beta >> period, which may lock us in to supporting an introspection API we >> later regret. > > > I agree. What we can do with the API we have already is a great advance. It also occurred to me last night that PEP 457 could define a "functools.textsignature" decorator to permit describing a particular signature on arbitrary callables (using the attribute already added for Argument Clinic, but extended to arbitrary types). That would allow signature overrides without needing to import the inspect module at startup. Cheers, Nick. > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Tue Jan 21 00:15:51 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 21 Jan 2014 09:15:51 +1000 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: On 21 Jan 2014 08:20, "Serhiy Storchaka" wrote: > > 20.01.14 20:09, Georg Brandl ???????(??): > >> Am 20.01.2014 14:31, schrieb Serhiy Storchaka: >>> >>> 20.01.14 15:03, Nick Coghlan ???????(??): >>>> >>>> On 20 January 2014 21:14, Serhiy Storchaka wrote: >>>>> >>>>> 20.01.14 10:05, Larry Hastings ???????(??): >>>>>> >>>>>> Contestant 4: "Put in clinic directory, add .h" >>>>>> >>>>>> foo.c -> clinic/foo.c.h >>>>>> foo.h -> clinic/foo.h.h >>>>> >>>>> >>>>> >>>>> -1. (Generated files are located far from origins, directory name clutters >>>>> the namespace of directory names). >>>> >>>> >>>> Larry's not talking about a top level directory here (at least I hope >>>> he isn't). This proposal would mean using "Objects/clinic", >>>> "Python/clinic", "Modules/clinic" as appropriate. >>> >>> >>> This means the appearance of directories with the common name "clinic" >>> in random places of the source tree. Some special name ("__clinic__", >>> ".clinic") looks slightly less confusing to me. >> >> >> "clinic" shouldn't be such a common name in C soures :) > > > Sources tree already has one "clinic" directory (Tools/clinic/). This observation and the cjkcodecs comparison has prompted me to switch my votes for #4 and #5: +1 for __clinic__, +0 for clinic. I still prefer a subdirectory to adjacent files, though. Cheers, Nick. 
From larry at hastings.org Tue Jan 21 00:24:25 2014
From: larry at hastings.org (Larry Hastings)
Date: Mon, 20 Jan 2014 15:24:25 -0800
Subject: [Python-Dev] signature.object, argument clinic and grouped parameters
In-Reply-To:
References: <52DCF78F.2090000@hastings.org>
Message-ID: <52DDB029.1000808@hastings.org>

On 01/20/2014 04:59 AM, Nick Coghlan wrote:
> When I wrote that, I was thinking we had made
> inspect.Signature.__repr__ produce a nice string format, but then I
> noticed in the REPL today that we never got around to doing that - I
> think because we didn't know how to handle positional-only arguments,
> which already can't be expressed as Python syntax. (I haven't checked
> if we have an RFE filed anywhere)

I don't know what you had intended to do, but right now inspect.Signature inherits the standard repr from object. inspect.Signature.__str__ produces something that looks like a Python function signature, starting and ending with parentheses. (For those of you unfamiliar with inspect.Signature: A signature is agnostic about the name of the function. So it doesn't include the name.)

> However, while I know you're keen to finally make introspection work
> for all C level callables in 3.4, even the ones with signatures that
> can't be expressed as Python function signatures, I'd like to strongly
> encourage you to hold off on that last part until Python 3.5.

If we hold off on all of this until 3.5, the signatures for most builtins will be wrong in 3.4, because most builtins take positional-only parameters. I had higher hopes for Python 3.4 than that.
To be honest I'd rather not have the feature at all than have it be wrong most of the time. I think it's fair to summarize your argument as "there could be monsters lurking in CPython with signatures that can't be expressed in PEP 457 syntax". To me this smacks of FUD. Let me open my kimono and tell you all the counter-examples we know of so far. * socket.sendto() has an optional group in the middle of required parameters. (This signature is from 1993.) PEP 457 could support this just by relaxing one requirement. I know what's needed here, but given that PEP 457 was such a dud I haven't bothered to update it. Regardless, Argument Clinic, and the syntax used for text signatures, could (and I expect will soon) support this. The inspect.Parameter.group proposal from my email last night supports this just fine. * itertools.repeat() has a parameter that behaves differently if it's passed by keyword vs passed by position. Guido already ruled that this signature must be changed so it is representable with Python syntax--this behavior is a "bug". * Many functions have default values that are not representable in Python, chiefly a NULL pointer. Guido has already ruled that these signatures should be changed so that they're representable in Python. The best approach is often accepting None, which is inconvenient for non-pointer arguments like integers. Right now Argument Clinic gives you no assistance in this area, but I plan to add explicit support making it easy (via "nullable ints"). In short, there's a clear trend: functions must have signatures representable in Python syntax, with the exception of optional groups which are a legacy feature we can't get rid of but won't support in Python syntax. Any functions whose signatures are not representable in Python syntax shall be tweaked until they are. Any new monsters we discover lurking in CPython will be slain, not supported. 
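To make the third counter-example concrete, here is a small pure-Python sketch of the "accept None instead of NULL" convention. The function gmtime_like is a made-up stand-in, not the C implementation of time.gmtime; it only shows that a None default gives a signature the inspect module can report without any special notation.

```python
import inspect
import time


def gmtime_like(seconds=None):
    """Hypothetical stand-in: None plays the role of the C-level NULL default."""
    if seconds is None:
        # Only ask the OS for the current time when no value was supplied.
        seconds = time.time()
    return time.gmtime(seconds)


sig = inspect.signature(gmtime_like)
default = sig.parameters["seconds"].default
```

Because the default is an ordinary Python object, str(sig) is plain Python syntax and callers can pass None explicitly to get the "use the default" behaviour.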
----- We could split the difference, and not add a feature to the inspect module to support optional groups. We could still support marking positional-only parameters, as inspect currently supports that. That would mean nearly all signatures for builtins would be correct. Personally I'd rather go the extra distance and support optional groups too. There are important callables that can only be expressed with optional groups (range, type). Given the trend above, Parameter arguments with optional groups should be sufficient to express every signature available in Python. We've come this far... or, as the British say, in for a penny, in for a pound. Let's hash it out right now and get it done. > While the text string used to communicate between Argument Clinic and > inspect.signature will be private, the representation on > inspect.Signature objects will be a new *public* API. As the > discussions between you, me and Yury show, I don't think there's an > immediately obvious best answer of how to do that. Your suggestion of > just adding the group numbers to the Parameter objects would *work*, > but it's not very Pythonic - we have container types that support > nesting, Apparently you didn't read my proposal in the email you replied to. I didn't propose that "group" contain a number, I proposed it contain a ParameterGroup object that supports nesting. We could take another approach, one you seem to be suggesting, where the nesting is outside the Parameter objects. In this alternate approach, the Signature.parameters array can contain either Parameter objects or OrderedDicts. The nested OrderedDicts themselves can contain either Parameter objects or more nested OrderedDicts. The API would specify that the nested OrderedDicts of parameters are optional en masse. This works fine too. 
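A rough sketch of what the nested layout described above might look like, using socket.sendto(data, [flags,] address) as the example. Everything here is an assumption made for illustration: the list-plus-OrderedDict layout, the positional() helper, and the walk() function are hypothetical, and none of them exist in the inspect module.

```python
from collections import OrderedDict
from inspect import Parameter


def positional(name):
    # Hypothetical helper: build a positional-only Parameter.
    return Parameter(name, Parameter.POSITIONAL_ONLY)


# Hypothetical layout: plain Parameter objects are required,
# a nested OrderedDict is a group that is optional en masse.
sendto_parameters = [
    positional("data"),
    OrderedDict([("flags", positional("flags"))]),
    positional("address"),
]


def walk(parameters, depth=0):
    """Yield (name, nesting depth) for every parameter, descending into groups."""
    for item in parameters:
        if isinstance(item, OrderedDict):
            yield from walk(item.values(), depth + 1)
        else:
            yield item.name, depth


flat = list(walk(sendto_parameters))
```

Code that understands groups recurses as walk() does; code that does not would have to special-case the OrderedDict entries before touching Parameter attributes.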
The chief difference between these proposals: if you ignore the complexity of optional groups, the failure mode with ".group" is that it kind of works except when it doesn't, whereas with having OrderedDicts in .parameters the failure mode is that your code blows up with missing attributes (like "couldn't find an attribute called name on this OrderedDict object"). That's probably a vote in favor of the nested OrderedDicts. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Jan 20 23:52:13 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 20 Jan 2014 14:52:13 -0800 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: <52DD9BE0.5010905@hastings.org> References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> <20140120140904.48d01e05@anarchist.wooz.org> <52DD81A3.4000205@stoneleaf.us> <52DD9970.6090103@stoneleaf.us> <52DD9BE0.5010905@hastings.org> Message-ID: <52DDA89D.4040307@stoneleaf.us> On 01/20/2014 01:57 PM, Larry Hastings wrote: > > If that's what you meant, then: yes, and yes. It's possible to skip the second pass if you're comfortable guessing the > generated name of the macro, but that's just one more thing for people to remember, and hunting for it is easier. And > yes, whether it's original output or buffer or "clinic file" (I'm trying to deprecate the name "side file", it was a > dumb idea), the destination Argument Clinic writes to doesn't dramatically alter what it writes. Okay, thanks. And I'm happy to call them clinic files. 
:) -- ~Ethan~ From ncoghlan at gmail.com Tue Jan 21 00:53:16 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 21 Jan 2014 09:53:16 +1000 Subject: [Python-Dev] signature.object, argument clinic and grouped parameters In-Reply-To: <52DDB029.1000808@hastings.org> References: <52DCF78F.2090000@hastings.org> <52DDB029.1000808@hastings.org> Message-ID: On 21 Jan 2014 09:26, "Larry Hastings" wrote: > > > > On 01/20/2014 04:59 AM, Nick Coghlan wrote: >> >> When I wrote that, I was thinking we had made >> inspect.Signature.__repr__ produce a nice string format, but then I >> noticed in the REPL today that we never got around to doing that - I >> think because we didn't know how to handle positional-only arguments, >> which already can't be expressed as Python syntax. (I haven't checked >> if we have an RFE filed anywhere) > > > I don't know what you had intended to do, but right now inspect.Signature inherits the standard repl from object. inspect.Signature.__str__ produces something that looks like a Python function signature, starting and ending with parentheses. (For those of you unfamiliar with inspect.Signature: A signature is agnostic about the name of the function. So it doesn't include the name.) > > > >> However, while I know you're keen to finally make introspection work >> for all C level callables in 3.4, even the ones with signatures that >> can't be expressed as Python function signatures, I'd like to strongly >> encourage you to hold off on that last part until Python 3.5. > > > If we hold off on all of this until 3.5, the signatures for most builtins will be wrong in 3.4, because most builtins take positional-only parameters. I had higher hopes for Python 3.4 than that. To be honest I'd rather not have the feature at all than have it be wrong most of the time. Positional only is fine - PEP 362 already handles those. 
It only doesn't handle things like range(), and those callables should continue to not support introspection at all rather than reporting an incorrect signature. > I think it's fair to summarize your argument as "there could be monsters lurking in CPython with signatures that can't be expressed in PEP 457 syntax". No. I am saying there *are* signatures that the *inspect module* cannot express in its public API. You already *know* that, since you are proposing to add a new feature (group support) to inspect.Signature late in the beta cycle in order to handle those cases. I am saying that's a gross violation of our established processes. The argument clinic conversions can be defended as internal implementation details. A new public feature in the inspect module cannot. Please turn the question around and look at it with your release manager hat on rather than your creator of Argument Clinic hat: if I came to you and said I wanted to add a new public API to the inspect module after the second beta release, what would you say? Can you honestly say that if *someone else* was proposing the inclusion of a new public API this late in the release cycle, you would say yes? If I can wait until 3.5 to add PEP 451 "target" parameters to runpy because I was too busy to land that before feature freeze, and Eric can wait to fully support PEP 451 for builtin and extension modules, and Ethan can wait to restore binary interpolation, and Antoine can wait a full release cycle between adding qualified names in 3.3 and actually seeing them used in an updated pickle protocol in 3.4, there's no good reason to rush adding introspection support for oddball legacy signatures to the inspect module. Regards, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From taleinat at gmail.com Tue Jan 21 02:46:53 2014 From: taleinat at gmail.com (Tal Einat) Date: Tue, 21 Jan 2014 03:46:53 +0200 Subject: [Python-Dev] Argument Clinic: Bug? 
self converters are not preserved when cloning functions
Message-ID:

Hi,

I'm working on converting Objects/bytearray.c and Objects/bytes.c.

For bytes, the strip methods need a "self converter" so that they get a PyBytesObject* instead of PyObject*. However, having set this in bytes.strip and "cloning" that clinic definition for bytes.lstrip and bytes.rstrip, it appears that the self converter wasn't set on lstrip and rstrip. Removing the cloning and copying the argument definitions resolved the issue.

Is this a bug?

- Tal

From larry at hastings.org Tue Jan 21 03:19:36 2014
From: larry at hastings.org (Larry Hastings)
Date: Mon, 20 Jan 2014 18:19:36 -0800
Subject: [Python-Dev] Argument Clinic: Bug? self converters are not preserved when cloning functions
Message-ID: <52ddd942.054e440a.3d71.4914@mx.google.com>

Please file an issue on the tracker and add me to the nosy list. Do that next time, too; this didn't need to go to python-dev.

On Jan 20, 2014 5:46 PM, Tal Einat wrote:
>
> Hi,
>
> I'm working on converting Objects/bytearray.c and Objects/bytes.c.
>
> For bytes, the strip methods need a "self converter" so that they get
> a PyBytesObject* instead of PyObject*. However, having set this in
> bytes.strip and "cloning" that clinic definition for bytes.lstrip and
> bytes.rstrip, it appears that the self converter wasn't set on lstrip
> and rstrip. Removing the cloning and copying the argument definitions
> resolved the issue.
>
> Is this a bug?
> > - Tal > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/larry%40hastings.org From Nikolaus at rath.org Tue Jan 21 03:35:40 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Mon, 20 Jan 2014 18:35:40 -0800 Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args() In-Reply-To: (Ryan Smith-Roberts's message of "Mon, 20 Jan 2014 12:00:13 -0800") References: <87vbxgeqm3.fsf@vostro.rath.org> <87siskeixg.fsf@vostro.rath.org> <52DBAB14.8090704@hastings.org> <87iotfxp38.fsf@vostro.rath.org> Message-ID: <87r4825af7.fsf@vostro.rath.org> Ryan Smith-Roberts writes: > The trick you're missing is that *any time* you have an optional argument > with a custom converter[1], PyArg_ParseTuple *only* calls the converter > function in the case that the user *actually supplies some value*. This is > a basic property of an optional argument. Another property is that the > c_default is evaluated *every time*, as it is set *before* the call to > PyArg_ParseTuple. Are these the best ways to do things? Maybe not, but it's > how they are. > > Please do not use a custom converter for this case. It can't work. Well, I thought I was following Larry's recommendation: >>> A better choice would be to write a converter function in C, then use >>> a custom converter that called it. Nikolaus: Is that something you're >>> comfortable doing? ..and I assumed he'd know best. Did I misunderstand that quote? 
> Please do what I outlined earlier (untested, somewhat verbose code > follows): > > static int > parse_time_t_arg(PyObject *arg, time_t *when) > { > if (arg == NULL || arg == Py_None) { > *when = time(NULL); > return 1; > } > if (_PyTime_ObjectToTime_t(arg, when) == -1) > return 0; > return 1; > } Ahm, this is exactly the code that I wrote (and you even quoted it below), only with the identifiers renamed. >> /*[clinic input] > time.gmtime > seconds: object = None > [clinic start generated code]*/ > { > time_t when; > > if (0 == parse_time_t_arg(seconds, &when)) > return NULL; That's fine with me too. I'd just like Larry to sign off on it, because as far as I know, he'll be the one to review my patch. Best, -Nikolaus > [1] If you set a default value, or put it in brackets as Serhiy later > recommends, it works the same. > > > On Sun, Jan 19, 2014 at 8:19 PM, Nikolaus Rath wrote: > >> Larry Hastings writes: >> > On 01/18/2014 09:52 PM, Ryan Smith-Roberts wrote: >> >> >> >> I still advise you not to use this solution. time() is a system call >> >> on many operating systems, and so it can be a heavier operation than >> >> you'd think. Best to avoid it unless it's needed (on FreeBSD it >> >> seems to add about 15% overhead to localtime(), for instance). >> >> >> > >> > I agree. Converting to Argument Clinic should not cause a performance >> > regression. Please don't add new calls to time() for the sake of >> > making code more generic. >> > >> > A better choice would be to write a converter function in C, then use >> > a custom converter that called it. Nikolaus: Is that something you're >> > comfortable doing? >> >> I think I'll need some help. I don't know how to handle the case where >> the user is not passing anything. >> >> Here's my attempt: >> >> ,---- >> | /* C Converter for argument clinic >> | If obj is NULL or Py_None, return current time. Otherwise, >> | convert Python object to time_t. 
>> | */ >> | static int >> | PyObject_to_time_t(PyObject *obj, time_t *stamp) >> | { >> | if (obj == NULL || obj == Py_None) { >> | *stamp = time(NULL); >> | } >> | else { >> | if (_PyTime_ObjectToTime_t(obj, stamp) == -1) >> | return 0; >> | } >> | return 1; >> | } >> | >> | /*[python input] >> | class time_t_converter(CConverter): >> | type = 'time_t' >> | converter = 'PyObject_to_time_t' >> | default = None >> | [python start generated code]*/ >> | /*[python end generated code: >> checksum=da39a3ee5e6b4b0d3255bfef95601890afd80709]*/ >> | >> | >> | /*[clinic input] >> | time.gmtime >> | >> | seconds: time_t >> | / >> | >> | [clinic start generated code]*/ >> `---- >> >> but this results in the following code: >> >> ,---- >> | static PyObject * >> | time_gmtime(PyModuleDef *module, PyObject *args) >> | { >> | PyObject *return_value = NULL; >> | time_t seconds; >> | >> | if (!PyArg_ParseTuple(args, >> | "|O&:gmtime", >> | PyObject_to_time_t, &seconds)) >> | goto exit; >> | return_value = time_gmtime_impl(module, seconds); >> | >> | exit: >> | return return_value; >> | } >> `---- >> >> This works if the user calls time.gmtime(None), but it fails for >> time.gmtime(). It seems that in that case my C converter function is >> never called. >> >> What's the trick that I'm missing? >> >> >> Thanks! >> -Nikolaus >> >> -- >> Encrypted emails preferred. PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 >> 02CF A9AD B7F8 AE4E 425C >> >> ?Time flies like an arrow, fruit flies like a Banana.? >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/rmsr%40lab.net >> -- Encrypted emails preferred. PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C ?Time flies like an arrow, fruit flies like a Banana.? 
From Nikolaus at rath.org Tue Jan 21 03:44:51 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Mon, 20 Jan 2014 18:44:51 -0800 Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args() In-Reply-To: (Serhiy Storchaka's message of "Mon, 20 Jan 2014 15:38:56 +0200") References: <87vbxgeqm3.fsf@vostro.rath.org> <87siskeixg.fsf@vostro.rath.org> <52DBAB14.8090704@hastings.org> <87iotfxp38.fsf@vostro.rath.org> Message-ID: <87ob3659zw.fsf@vostro.rath.org> Serhiy Storchaka writes: > 20.01.14 06:19, Nikolaus Rath ???????(??): >> This works if the user calls time.gmtime(None), but it fails for >> time.gmtime(). It seems that in that case my C converter function is >> never called. >> >> What's the trick that I'm missing? > > /*[clinic input] > time.gmtime > > [ > seconds: time_t > ] > / > Ahh, interesting. So this works, but now the C default is evaluated even if the user passed an argument (so that answers my question about decreased performance in the other subthread). The generated code is: time_gmtime(PyModuleDef *module, PyObject *args) { PyObject *return_value = NULL; int group_right_1 = 0; time_t seconds = time(NULL); switch (PyTuple_GET_SIZE(args)) { case 0: break; case 1: if (!PyArg_ParseTuple(args, "O&:gmtime", PyObject_to_time_t, &seconds)) return NULL; group_right_1 = 1; break; default: PyErr_SetString(PyExc_TypeError, "time.gmtime requires 0 to 1 arguments"); return NULL; } return_value = time_gmtime_impl(module, group_right_1, seconds); All in all, I'm still not sure how I'm supposed to proceed. I see the following options (and I'm fine with all of them): 1. Use the option group with a custom converter. This means a time(NULL) call even if the caller passed a parameter. 2. Declare the _impl parameter as PyObject* instead of time_t, and explicitly call a C conversion function. 3. Patch clinic.py to only evaluate the C default if the caller does not pass a parameter. 
This seems cleanest, but I don't know if the design of clinic.py actually allows that.

Best,
Nikolaus

--
Encrypted emails preferred.
PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

"Time flies like an arrow, fruit flies like a Banana."

From larry at hastings.org Tue Jan 21 04:47:16 2014
From: larry at hastings.org (Larry Hastings)
Date: Mon, 20 Jan 2014 19:47:16 -0800
Subject: [Python-Dev] signature.object, argument clinic and grouped parameters
In-Reply-To:
References: <52DCF78F.2090000@hastings.org> <52DDB029.1000808@hastings.org>
Message-ID: <52DDEDC4.1050402@hastings.org>

On 01/20/2014 03:53 PM, Nick Coghlan wrote:
>
> Please turn the question around and look at it with your release
> manager hat on rather than your creator of Argument Clinic hat: if I
> came to you and said I wanted to add a new public API to the inspect
> module after the second beta release, what would you say? Can you
> honestly say that if *someone else* was proposing the inclusion of a
> new public API this late in the release cycle, you would say yes?
>

You're right. Optional group information won't be a public API in 3.4.

//arry/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From g.brandl at gmx.net Tue Jan 21 08:15:02 2014
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 21 Jan 2014 08:15:02 +0100
Subject: [Python-Dev] .clinic.c vs .c.clinic
In-Reply-To:
References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org>
Message-ID:

On 20.01.2014 09:15, Georg Brandl wrote:
>> Contestant 5: "Put in __clinic__ directory, add .h"
>>
>> foo.c -> __clinic__/foo.c.h
>> foo.h -> __clinic__/foo.h.h
>
> -1. (Too complicated; this isn't Python packages we're talking about.)

Make that +0.
Georg From larry at hastings.org Tue Jan 21 08:19:11 2014 From: larry at hastings.org (Larry Hastings) Date: Mon, 20 Jan 2014 23:19:11 -0800 Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args() In-Reply-To: <87ob3659zw.fsf@vostro.rath.org> References: <87vbxgeqm3.fsf@vostro.rath.org> <87siskeixg.fsf@vostro.rath.org> <52DBAB14.8090704@hastings.org> <87iotfxp38.fsf@vostro.rath.org> <87ob3659zw.fsf@vostro.rath.org> Message-ID: <52DE1F6F.6010202@hastings.org> On 01/20/2014 06:44 PM, Nikolaus Rath wrote: > All in all, I'm still not sure how I'm supposed to proceed. I see the > following options (and I'm fine with all of them): > > 1. Use the option group with a custom converter. This means a time(NULL) > call even if the caller passed a parameter. > > 2. Declare the _impl parameter as PyObject* instead of time_t, and > explicitly call a C conversion function. > > 3. Patch clinic.py to only evaluate the C default if the caller does not > pass a parameter. This seemest cleanest, but I don't know if the > design of clinic.py actually allows that. clinic.py is not flexible enough to allow initialization code after the call to the converter. A comment on your approach so far: I'm very much against giving "default" a default value in the constructor. I realize that hack saves you having to say "= NULL" in a lot of places. But explicit is better than implicit, and we're going to read these signatures a lot more often than we write them, and I want Clinic signatures to be easy to read at first glance. Anyway, you're right, the converter function is not called if a value is not passed in to convert it. I think this is more complicated than you suspect, because PyArg_ParseWhatnot doesn't tell you whether or not it processed a parameter. You have to detect it yourself, generally through a clever choice of a default value in C. But there are no illegal values of time_t. All is not lost! 
What follows is rough pseudo-C code, hopefully you can take it from here.

    typedef struct {
        int set;        /* nonzero once a real value has been converted */
        time_t when;
    } clinic_time_t;

    #define DEFAULT_CLINIC_TIME_T {0, 0}

    /* "O&" converter: returns 1 on success, 0 on failure. */
    static int
    parse_clinic_time_t_arg(PyObject *arg, clinic_time_t *ct)
    {
        /* Nothing passed, or None: leave ct->set at 0 so the
           post-parse step fills in the current time. */
        if (arg == NULL || arg == Py_None)
            return 1;
        if (_PyTime_ObjectToTime_t(arg, &ct->when) == -1)
            return 0;   /* exception already set */
        ct->set = 1;
        return 1;
    }

    static int
    post_parse_clinic_time_t(clinic_time_t *ct)
    {
        if (ct->set)
            return 0;
        ct->when = time(NULL);
        return 0;
    }

    /*[python input]
    class clinic_time_t_converter(CConverter):
        type = 'clinic_time_t'
        converter = 'parse_clinic_time_t_arg'
        c_default = 'DEFAULT_CLINIC_TIME_T'
    [python start generated code]*/
    /*[python end generated code: checksum=...]*/

Now you can use clinic_time_t. Parameters declared clinic_time_t can be required, or they can be optional; if they're optional give them a default value of None.

You'll have to call post_parse_clinic_time_t() by hand in your impl function; I'll see if I can extend Clinic so converters can emit code after a successful call to the parse function but before the call to the impl.

Also, the converter probably isn't quite right, you'll have to play with "impl_by_reference" and "parse_by_reference" and add and remove asterisks and ampersands to make sure the code is 100% correct. Examine the implementation of path_t in Modules/posixmodule.c, as that does about the same thing. And of course clinic_time_t is a poor name, perhaps you can come up with a better one.

That should work,

//arry/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From storchaka at gmail.com Tue Jan 21 09:59:06 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 21 Jan 2014 10:59:06 +0200 Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args() In-Reply-To: <87ob3659zw.fsf@vostro.rath.org> References: <87vbxgeqm3.fsf@vostro.rath.org> <87siskeixg.fsf@vostro.rath.org> <52DBAB14.8090704@hastings.org> <87iotfxp38.fsf@vostro.rath.org> <87ob3659zw.fsf@vostro.rath.org> Message-ID: 21.01.14 04:44, Nikolaus Rath ???????(??): > Serhiy Storchaka writes: >> 20.01.14 06:19, Nikolaus Rath ???????(??): >>> This works if the user calls time.gmtime(None), but it fails for >>> time.gmtime(). It seems that in that case my C converter function is >>> never called. >>> >>> What's the trick that I'm missing? >> >> /*[clinic input] >> time.gmtime >> >> [ >> seconds: time_t >> ] >> / >> > > Ahh, interesting. So this works, but now the C default is evaluated even > if the user passed an argument (so that answers my question about > decreased performance in the other subthread). The generated code is: Don't use time(NULL) as C default. Instead check group_right_1 and call time(NULL) explicitly. time_gmtime_impl(PyModuleDef *module, int group_right_1, time_t seconds) { if (!group_right_1) seconds = time(NULL); ... 
}

From larry at hastings.org Tue Jan 21 11:39:52 2014
From: larry at hastings.org (Larry Hastings)
Date: Tue, 21 Jan 2014 02:39:52 -0800
Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args()
In-Reply-To:
References: <87vbxgeqm3.fsf@vostro.rath.org> <87siskeixg.fsf@vostro.rath.org> <52DBAB14.8090704@hastings.org> <87iotfxp38.fsf@vostro.rath.org> <87ob3659zw.fsf@vostro.rath.org>
Message-ID: <52DE4E78.7070008@hastings.org>

On 01/21/2014 12:59 AM, Serhiy Storchaka wrote:
> 21.01.14 04:44, Nikolaus Rath wrote:
>> Serhiy Storchaka writes:
>>> 20.01.14 06:19, Nikolaus Rath wrote:
>>>> This works if the user calls time.gmtime(None), but it fails for
>>>> time.gmtime(). It seems that in that case my C converter function is
>>>> never called.
>>>>
>>>> What's the trick that I'm missing?
>>>
>>> /*[clinic input]
>>> time.gmtime
>>>
>>> [
>>> seconds: time_t
>>> ]
>>> /
>>>
>>
>> Ahh, interesting. So this works, but now the C default is evaluated even
>> if the user passed an argument (so that answers my question about
>> decreased performance in the other subthread). The generated code is:
>
> Don't use time(NULL) as C default. Instead check group_right_1 and
> call time(NULL) explicitly.

While this "trick" works, it abuses optional groups. Optional groups are intended as a last resort, for semantics that can't be expressed any other way. The semantics of time.gmtime() are very easily expressed using normal Python syntax. Please don't use optional groups here.

//arry/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From ncoghlan at gmail.com Tue Jan 21 12:46:48 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 21 Jan 2014 21:46:48 +1000 Subject: [Python-Dev] signature.object, argument clinic and grouped parameters In-Reply-To: <52DDEDC4.1050402@hastings.org> References: <52DCF78F.2090000@hastings.org> <52DDB029.1000808@hastings.org> <52DDEDC4.1050402@hastings.org> Message-ID: On 21 Jan 2014 13:49, "Larry Hastings" wrote: > > On 01/20/2014 03:53 PM, Nick Coghlan wrote: >> >> Please turn the question around and look at it with your release manager hat on rather than your creator of Argument Clinic hat: if I came to you and said I wanted to add a new public API to the inspect module after the second beta release, what would you say? Can you honestly say that if *someone else* was proposing the inclusion of a new public API this late in the release cycle, you would say yes? > > > You're right. Optional group information won't be a public API in 3.4. Thanks for that. I agree it's a shame we missed the 3.4 feature deadline (as I would also like to see full C level signature support done and dusted), but once 3.4 is out the door we can hopefully resurrect PEP 457 as a direct successor to the original PEP 362, and update the inspect module to also handle these quirkier variants. Perhaps we can even add that public API for building Signature objects from string definitions, since that has proved quite handy internally :) Cheers, Nick. > > > /arry > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... 
From yselivanov.ml at gmail.com Tue Jan 21 16:59:36 2014
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 21 Jan 2014 10:59:36 -0500
Subject: [Python-Dev] Formatting of positional-only parameters in signatures
Message-ID:

There is one more, hopefully last, open urgent question with the signature object. At the time we were working on the PEP 362, PEP 457 didn't exist. Nor did we have any function with real positional-only parameters, since there was no Argument Clinic yet. However, I implemented rudimentary support for them:

"Parameter.POSITIONAL_ONLY" constant;

"Signature.bind" and "Signature.bind_partial" fully support functions with positional-only parameters;

"Signature.__str__" renders them distinctively from other kinds.

The last point is the troublesome now. "Signature.__str__" renders positional-only parameters in "<>" brackets, so in:

    foo(<ham>, <spam>, baz)

"ham" and "spam" are positional-only. The choice of angle brackets was unfortunate, as, first of all, this wasn't really discussed on python-dev, and second, it's easy to think that those parameters are optional.

Now, with the AC landing in 3.4, we need to decide how positional-only parameters will look like.

Without starting a new discussion similar to what we had prior to PEP 457, I think we have three options:

1. Leave it as is. Obviously, the downside is the potential confusion with "optional" notation.

2. Adopt PEP 457 style: using "/" to separate positional-only parameters from the rest.

3. Don't use any notation, just render them as plain arguments: "foo(ham, spam, baz)". The downside here is that the users will be confused, and might try passing those parameters with keywords, or binding them with keywords.

Thoughts?
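
A minimal sketch of the existing support described above, assuming a Python version that ships inspect.Signature (3.3 or later). The parameter names ham, spam and baz follow the example in the message; make_signature is a hypothetical helper. How str() renders the positional-only parameters depends on the Python version, so no output is shown for it.

```python
import inspect


def make_signature():
    """Build a signature with two positional-only parameters by hand."""
    P = inspect.Parameter
    return inspect.Signature([
        P("ham", P.POSITIONAL_ONLY),
        P("spam", P.POSITIONAL_ONLY),
        P("baz", P.POSITIONAL_OR_KEYWORD),
    ])


sig = make_signature()

# The textual form is what the thread is about; it varies by version.
rendered = str(sig)

# Binding positionally works, and baz may still be given by keyword.
bound = sig.bind(1, 2, baz=3)

# Passing a positional-only parameter by keyword is rejected by bind().
try:
    sig.bind(1, spam=2, baz=3)
    keyword_rejected = False
except TypeError:
    keyword_rejected = True
```

The bind() behaviour is independent of whichever notation __str__ ends up using, which is why only the rendering is in question.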
Yury From chris.barker at noaa.gov Tue Jan 21 17:57:52 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 21 Jan 2014 08:57:52 -0800 Subject: [Python-Dev] PEP 461 updates In-Reply-To: References: <52D59622.1070307@stoneleaf.us> <52D72443.10002@stoneleaf.us> <52D73343.6080207@oddbird.net> <52D84C8C.6090408@udel.edu> <52D85E61.8040701@udel.edu> <874n53cptr.fsf@uwakimon.sk.tsukuba.ac.jp> <20140117053611.GB3915@ando> <87wqhzapz1.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sun, Jan 19, 2014 at 7:21 AM, Oscar Benjamin wrote: > > long as numpy.loadtxt is explicitly documented as only working with > > latin-1 encoded files (it currently isn't), there's no problem. > > Actually there is problem. If it explicitly specified the encoding as > latin-1 when opening the file then it could document the fact that it > works for latin-1 encoded files. However it actually uses the system > default encoding to read the file which is a really bad default -- oh well. Also, I don't think it was a choice, at least not a well thought out one, but rather what fell out of tryin gto make it "just work" on py3. and then converts the strings to > bytes with the as_bytes function that is hard-coded to use latin-1: > https://github.com/numpy/numpy/blob/master/numpy/compat/py3k.py#L28 > > So it only works if the system default encoding is latin-1 and the > file content is white-space and newline compatible with latin-1. > Regardless of whether the file itself is in utf-8 or latin-1 it will > only work if the system default encoding is latin-1. I've never used a > system that had latin-1 as the default encoding (unless you count > cp1252 as latin-1). > even if it was a common default it would be a "bad idea". Fortunately (?), so it really is broken, we can fix it without being too constrained by backwards compatibility. 
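
The failure mode described here, decoding with one encoding and then converting to bytes with a different hard-coded one, can be reproduced without NumPy. The sketch below uses only str and bytes methods; reencode is a hypothetical helper and is not part of numpy.loadtxt.

```python
def reencode(raw, read_encoding, write_encoding):
    """Decode raw bytes with one encoding, then encode the text with another."""
    return raw.decode(read_encoding).encode(write_encoding)


# Text containing characters that latin-1 cannot represent,
# written with escapes so the source itself stays ASCII.
utf8_bytes = "za\u017c\u00f3\u0142\u0107".encode("utf-8")

# Same encoding on both sides: the bytes survive the round trip.
same = reencode(utf8_bytes, "utf-8", "utf-8")

# Mismatched encodings: converting the decoded text to latin-1 fails.
try:
    reencode(utf8_bytes, "utf-8", "latin-1")
    mismatch_failed = False
except UnicodeEncodeError:
    mismatch_failed = True
```

Taking the encoding as an explicit argument and using it for both steps is the shape of the fix proposed in the quoted text.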
> > > If it's supposed to work with other encodings (but the entire file is > > still required to use a consistent encoding), then it just needs > > encoding and errors arguments to fit the Python 3 text model (with > > "latin-1" documented as the default encoding). > > This is the right solution. Have an encoding argument, document the > fact that it will use the system default encoding if none is > specified, and re-encode using the same encoding to fit any dtype='S' > bytes column. This will then work for any encoding including the ones > that aren't ASCII-compatible (e.g. utf-16). > Exactly, except I dont think the system encoding as a default is a good choice. If there is a default MOST people will use it. And it will work for a lot of their test code. Then it will break if the code is passed to a system with a different default encoding, or a file comes from another source in a different encoding. This is very, very likely. Far more likely that files consistently being in the system encoding.... > > default behaviour, since passing something like > > codecs.getdecoder("utf-8") as a column converter should do the right > > thing. > that seems to work at the moment, actually, if done with care. That's just getting silly IMO. If the file uses mixed encodings then I > don't consider it a valid "text file" and see no reason for loadtxt to > support reading it. agreed -- that's just getting crazy -- the only use-case I can image is to clean up a file that got moji-baked by some other process -- not really the use case for loadtxt and friends. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tjreedy at udel.edu Tue Jan 21 18:22:13 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 21 Jan 2014 12:22:13 -0500 Subject: [Python-Dev] Formatting of positional-only parameters in signatures In-Reply-To: References: Message-ID: On 1/21/2014 10:59 AM, Yury Selivanov wrote: > There is one more, hopefully last, open urgent question with the signature > object. At the time we were working on the PEP 362, PEP 457 didn?t > exist. Nor did we have any function with real positonal-only parameters, > since there was no Argument Clinic yet. However, I implemented > rudimentary support for them: > > ?Parameter.POSITIONAL_ONLY? constant; > > ?Signature.bind? and ?Signature.bind_partial? fully support functions > with positional-only parameters; > > ?Signature.__str__? renders them distinctively from other kinds. > > The last point is the troublesome now. "Signature.__str__? renders > positional-only parameters in ?<>? brackets, so in: > > foo(, , baz) This amounts to a hidden new API. > ?ham? and ?spam? are positional-only. The choice of angle brackets was > unfortunate, as, first of all, this wasn?t really discussed on > python-dev, and second, it?s easy to think that those parameters are > optional. > > Now, with the AC landing in 3.4, we need to decide how positional-only > parameters will look like. > > Without starting a new discussion similar to what we had prior to PEP 457, > I think we have three options: > > 1. Leave it as is. Obviously, the downside is the potential confusion > with ?optional? notation. I think this this is bad, as it has not been discussed and agreed on, and might be changed. It is a plausible alternative to '/', but might possibly even have been rejected. I do not remember. > 2. Adopt PEP 457 style: using ?/? to separate positional-only parameters > from the rest. I think this is what Larry proposed, but Nick opposed as a post-beta new feature. > 3. 
Don't use any notation, just render them as plain arguments:
> "foo(ham, spam, baz)". The downside here is that the users will be
> confused, and might try passing those parameters with keywords, or
> binding them with keywords.

This is the status quo for the docs (both notation and occasional confusion). I think signature should match until we agree on a new convention (and I think one is needed). The first thing to decide is whether to mark each positional-only parameter or to put one marker after all of them.

--
Terry Jan Reedy

From yselivanov.ml at gmail.com Tue Jan 21 18:51:34 2014
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Tue, 21 Jan 2014 12:51:34 -0500
Subject: [Python-Dev] Formatting of positional-only parameters in signatures
In-Reply-To:
References:
Message-ID:

Terry,

On January 21, 2014 at 12:23:31 PM, Terry Reedy (tjreedy at udel.edu) wrote:
>
> On 1/21/2014 10:59 AM, Yury Selivanov wrote:
> > There is one more, hopefully last, open urgent question with the signature
> > object. At the time we were working on the PEP 362, PEP 457 didn't
> > exist. Nor did we have any function with real positional-only parameters,
> > since there was no Argument Clinic yet. However, I implemented
> > rudimentary support for them:
> >
> > "Parameter.POSITIONAL_ONLY" constant;
> >
> > "Signature.bind" and "Signature.bind_partial" fully support functions
> > with positional-only parameters;
> >
> > "Signature.__str__" renders them distinctively from other kinds.
> >
> > The last point is the troublesome now. "Signature.__str__" renders
> > positional-only parameters in "<>" brackets, so in:
> >
> > foo(<ham>, <spam>, baz)
>
> This amounts to a hidden new API.

Yes, and no. This wasn't documented and until 3.4 we had no real-world positional-only parameters. So I think it's OK to fix this now.
Yury From ethan at stoneleaf.us Tue Jan 21 19:09:24 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 21 Jan 2014 10:09:24 -0800 Subject: [Python-Dev] Formatting of positional-only parameters in signatures In-Reply-To: References: Message-ID: <52DEB7D4.4070205@stoneleaf.us> On 01/21/2014 07:59 AM, Yury Selivanov wrote: > > 2. Adopt PEP 457 style: using ?/? to separate positional-only parameters > from the rest. +1 -- ~Ethan~ From ncoghlan at gmail.com Wed Jan 22 00:05:05 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 22 Jan 2014 09:05:05 +1000 Subject: [Python-Dev] Formatting of positional-only parameters in signatures In-Reply-To: References: Message-ID: On 22 Jan 2014 03:24, "Terry Reedy" wrote: > > On 1/21/2014 10:59 AM, Yury Selivanov wrote: >> >> There is one more, hopefully last, open urgent question with the signature >> object. At the time we were working on the PEP 362, PEP 457 didn?t >> exist. Nor did we have any function with real positonal-only parameters, >> since there was no Argument Clinic yet. However, I implemented >> rudimentary support for them: >> >> ?Parameter.POSITIONAL_ONLY? constant; >> >> ?Signature.bind? and ?Signature.bind_partial? fully support functions >> with positional-only parameters; >> >> ?Signature.__str__? renders them distinctively from other kinds. >> >> The last point is the troublesome now. "Signature.__str__? renders >> positional-only parameters in ?<>? brackets, so in: >> >> foo(, , baz) > > > This amounts to a hidden new API. > > >> ?ham? and ?spam? are positional-only. The choice of angle brackets was >> unfortunate, as, first of all, this wasn?t really discussed on >> python-dev, and second, it?s easy to think that those parameters are >> optional. >> >> Now, with the AC landing in 3.4, we need to decide how positional-only >> parameters will look like. >> >> Without starting a new discussion similar to what we had prior to PEP 457, >> I think we have three options: >> >> 1. Leave it as is. 
Obviously, the downside is the potential confusion
>> with "optional" notation.
>
>
> I think this is bad, as it has not been discussed and agreed on, and might be changed. It is a plausible alternative to '/', but might possibly even have been rejected. I do not remember.

It was the "we have to do something about this" short term fix we implemented for 3.3, that hasn't caused major problems due to the difficulties of creating such signatures in the first place. Argument Clinic changes that, since a variety of C level callables with this kind of signature will now be handled by the inspect module. I think it's actually tolerable to leave this in place for 3.4 as well, although I'd prefer to instead bring it into line with Argument Clinic.

>> 2. Adopt PEP 457 style: using "/" to separate positional-only parameters
>> from the rest.
>
>
> I think this is what Larry proposed, but Nick opposed as a post-beta new feature.

No, I think this is a good idea. It matches previous discussions with Guido and is more consistent with Argument Clinic. The exchange between me and Larry was about the more exotic signatures like range() that inspect.Signature currently can't even represent internally - handling *those* requires changes to the data model and public API of inspect.Signature, not just the string representation. By contrast, positional only parameter support is already there, it's just that the notation used in the string representation is inconsistent with Argument Clinic's notation.

>> 3. Don't use any notation, just render them as plain arguments:
>> "foo(ham, spam, baz)". The downside here is that the users will be
>> confused, and might try passing those parameters with keywords, or
>> binding them with keywords.
>
>
> This is the status quo for the docs (both notation and occasional confusion). I think signature should match until we agree on a new convention

We already rejected this option when inspect.Signature was first added.
I don't see a good reason to reverse that decision now. > (and I think one is needed). The first thing to decide is whether to mark each position-only parameter or to put one marker after all of them. That's (at least in part) what PEP 457 was about - documenting the format Guido had already given lukewarm approval to. It's the basis of Argument Clinic's syntax, and currently illegal Python syntax. Cheers, Nick. > > -- > Terry Jan Reedy > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From Nikolaus at rath.org Wed Jan 22 04:19:39 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Tue, 21 Jan 2014 19:19:39 -0800 Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args() In-Reply-To: <52DE1F6F.6010202@hastings.org> (Larry Hastings's message of "Mon, 20 Jan 2014 23:19:11 -0800") References: <87vbxgeqm3.fsf@vostro.rath.org> <87siskeixg.fsf@vostro.rath.org> <52DBAB14.8090704@hastings.org> <87iotfxp38.fsf@vostro.rath.org> <87ob3659zw.fsf@vostro.rath.org> <52DE1F6F.6010202@hastings.org> Message-ID: <87r480og8k.fsf@vostro.rath.org> Larry Hastings writes: > A comment on your approach so far: I'm very much against giving > "default" a default value in the constructor. You mean in the definition of the custom converter class? > I realize that hack saves you having to say "= NULL" in a lot of > places. But explicit is better than implicit, and we're going to read > these signatures a lot more often than we write them, and I want > Clinic signatures to be easy to read at first glance. [....] > All is not lost! What follows is rough pseudo-C code, hopefully you > can take it from here. 
> > typedef struct { > int set; > time_t when; > } clinic_time_t; > #define DEFAULT_CLINIC_TIME_T {0, 0} > [...] > > /*[python input] > class clinic_time_t_converter(CConverter): > type = 'clinic_time_t' > converter = 'parse_clinic_time_t' > c_default = 'DEFAULT_CLINIC_TIME_T' > [python start generated code]*/ > /*[python end generated code: checksum=...]*/ > > Now you can use clinic_time_t. Parameters declared clinic_time_t can > be required, or they can be optional; if they're optional give them a > default value of None. That doesn't work. If the default value is declared for the function rather than in the converter definition, it overwrites the C default: /*[clinic input] time.gmtime seconds: clinic_time_t=None / */ gives: static PyObject * time_gmtime(PyModuleDef *module, PyObject *args) { PyObject *return_value = NULL; clinic_time_t seconds = Py_None; if (!PyArg_ParseTuple(args, "|O&:gmtime", parse_clinic_time_t, &seconds)) goto exit; return_value = time_gmtime_impl(module, seconds); so the default for seconds is now Py_None instead of DEFAULT_CLINIC_TIME_T'. Best, Nikolaus -- Encrypted emails preferred. PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C ?Time flies like an arrow, fruit flies like a Banana.? From larry at hastings.org Wed Jan 22 05:23:29 2014 From: larry at hastings.org (Larry Hastings) Date: Tue, 21 Jan 2014 20:23:29 -0800 Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args() Message-ID: <52df47d5.ca41420a.6d54.0064@mx.google.com> Yes, I meant in the definition of the convertor class.? You can fix c_default in converter_init. On Jan 21, 2014 7:19 PM, Nikolaus Rath wrote: > > Larry Hastings writes: > > A comment on your approach so far: I'm very much against giving > > "default" a default value in the constructor. > > You mean in the definition of the custom converter class? > > > I realize that hack saves you having to say "= NULL" in a lot of > > places.? 
But explicit is better than implicit, and we're going to read > > these signatures a lot more often than we write them, and I want > > Clinic signatures to be easy to read at first glance. > [....] > > All is not lost!? What follows is rough pseudo-C code, hopefully you > > can take it from here. > > > >??? typedef struct { > >?????? int set; > >?????? time_t when; > >??? } clinic_time_t; > >??? #define DEFAULT_CLINIC_TIME_T {0, 0} > > > [...] > > > >??? /*[python input] > >??? class clinic_time_t_converter(CConverter): > >???????? type = 'clinic_time_t' > >???????? converter = 'parse_clinic_time_t' > >???????? c_default = 'DEFAULT_CLINIC_TIME_T' > >??? [python start generated code]*/ > >??? /*[python end generated code: checksum=...]*/ > > > > Now you can use clinic_time_t.? Parameters declared clinic_time_t can > > be required, or they can be optional; if they're optional give them a > > default value of None. > > That doesn't work. If the default value is declared for the function > rather than in the converter definition, it overwrites the C default: > > /*[clinic input] > time.gmtime > > ??? seconds: clinic_time_t=None > ??? / > */ > > gives: > > static PyObject * > time_gmtime(PyModuleDef *module, PyObject *args) > { > ??? PyObject *return_value = NULL; > ??? clinic_time_t seconds = Py_None; > > ??? if (!PyArg_ParseTuple(args, > ??????? "|O&:gmtime", > ??????? parse_clinic_time_t, &seconds)) > ??????? goto exit; > ??? return_value = time_gmtime_impl(module, seconds); > > so the default for seconds is now Py_None instead of > DEFAULT_CLINIC_TIME_T'. > > Best, > Nikolaus > > -- > Encrypted emails preferred. > PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6? 02CF A9AD B7F8 AE4E 425C > > ???????????? ?Time flies like an arrow, fruit flies like a Banana.? 
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/larry%40hastings.org From Nikolaus at rath.org Wed Jan 22 06:13:22 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Tue, 21 Jan 2014 21:13:22 -0800 Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args() In-Reply-To: <52DE1F6F.6010202@hastings.org> (Larry Hastings's message of "Mon, 20 Jan 2014 23:19:11 -0800") References: <87vbxgeqm3.fsf@vostro.rath.org> <87siskeixg.fsf@vostro.rath.org> <52DBAB14.8090704@hastings.org> <87iotfxp38.fsf@vostro.rath.org> <87ob3659zw.fsf@vostro.rath.org> <52DE1F6F.6010202@hastings.org> Message-ID: <87fvogegzx.fsf@vostro.rath.org> Larry Hastings writes: > All is not lost! What follows is rough pseudo-C code, hopefully you > can take it from here. > > typedef struct { > int set; > time_t when; > } clinic_time_t; > > #define DEFAULT_CLINIC_TIME_T {0, 0} > > static int > parse_clinic_time_t_arg(PyObject *arg, clinic_time_t *ct) > { > if (arg == NULL) > return 1; > if (arg == Py_None) > return 0; > if (_PyTime_ObjectToTime_t(arg, &ct->when) == -1) { > set = 1; > return 0; > } > return 1; > } > > static int post_parse_clinic_time_t(clinic_time_t *ct) { > if (ct->set) > return 0; > ct->when = time(NULL); > return 0; > } > > /*[python input] > class clinic_time_t_converter(CConverter): > type = 'clinic_time_t' > converter = 'parse_clinic_time_t' > c_default = 'DEFAULT_CLINIC_TIME_T' > [python start generated code]*/ > /*[python end generated code: checksum=...]*/ > > Now you can use clinic_time_t. Parameters declared clinic_time_t can > be required, or they can be optional; if they're optional give them a > default value of None. 
You'll have to call post_parse_clinic_time_t()
> by hand in your impl function; I'll see if I can extend Clinic so
> converters can emit code after a successful call to the parse function
> but before the call to the impl.

Okay, I attached a patch along these lines to the bugtracker.

However, I have to say that I lost track of what we're actually gaining with all this. If all we need is a type that can distinguish between a valid time_t value and a default value, why don't we simply use PyObject?

In other words, what's the advantage of all the code above over:

static int
PyObject_to_time_t(PyObject *obj, time_t *when)
{
    if (obj == NULL || obj == Py_None)
        *when = time(NULL);
    else if (_PyTime_ObjectToTime_t(obj, when) == -1)
        return 0;
    return 1;
}

/*[clinic input]
time.gmtime

    seconds: object=NULL
    /
[...]

static PyObject *
time_gmtime_impl(PyModuleDef *module, PyObject *seconds)
{
    PyObject *return_value = NULL;
    time_t when;
    if (!PyObject_to_time_t(seconds, &when))
        return NULL;

[...]

To me this version looks shorter and clearer. Is there really an advantage in defining the clinic argument as a custom struct rather than object?

Best,
-Nikolaus

--
Encrypted emails preferred.
PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

"Time flies like an arrow, fruit flies like a Banana."

From storchaka at gmail.com Wed Jan 22 07:47:34 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 22 Jan 2014 08:47:34 +0200
Subject: [Python-Dev] Using argument clinic to replace timemodule.c:parse_time_t_args()
In-Reply-To: <87fvogegzx.fsf@vostro.rath.org>
References: <87vbxgeqm3.fsf@vostro.rath.org> <87siskeixg.fsf@vostro.rath.org> <52DBAB14.8090704@hastings.org> <87iotfxp38.fsf@vostro.rath.org> <87ob3659zw.fsf@vostro.rath.org> <52DE1F6F.6010202@hastings.org> <87fvogegzx.fsf@vostro.rath.org>
Message-ID:

22.01.14 07:13, Nikolaus Rath wrote:
> To me this version looks shorter and clearer.
Is there really an
> advantage in defining the clinic argument as a custom struct rather than
> object?

To me too.

From donald at stufft.io Wed Jan 22 11:30:40 2014
From: donald at stufft.io (Donald Stufft)
Date: Wed, 22 Jan 2014 05:30:40 -0500
Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation
Message-ID: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io>

I would like to propose that a backwards incompatible change be made to Python to make verification of hostname and certificate chain the default instead of requiring it to be opt in.

Python 3.4 has made great strides in making it easier for applications to simply turn on these settings, however many people are not aware at all that they need to opt into this. Most assume that it will operate similarly to their browser, curl, wget, etc and validate by default and in the typical style of security related issues it will appear to work just fine however be grossly insecure.

In the real world "opt in security" typically translates to just plain old insecure for the bulk of applications/libraries. I believe that Python has a responsibility to do the right thing by default here and it is in the best position to do so. The alternative requires every Python developer who wants to access a secure resource to be educated on the fact that they need to flip some switch to do what most of them would expect.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From mal at egenix.com Wed Jan 22 11:51:26 2014
From: mal at egenix.com (M.-A.
Lemburg) Date: Wed, 22 Jan 2014 11:51:26 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: <52DFA2AE.8030504@egenix.com> On 22.01.2014 11:30, Donald Stufft wrote: > I would like to propose that a backwards incompatible change be made to Python to make > verification of hostname and certificate chain the default instead of requiring it to be opt > in. > > Python 3.4 has made great strides in making it easier for applications to simply turn on these > settings, however many people are not aware at all that they need to opt into this. Most assume > that it will operate similarly to their browser, curl, wget, etc and validate by default and > in the typical style of security related issues it will appear to work just fine however be > grossly insecure. > > In the real world ?opt in security? typically translates to just plain old insecure for the > bulk of applications/libraries. I believe that Python has a responsibility to do the right > thing by default here and it is in the best position to do so. The alternative requires every > Python developer who wants to access a secure resource to be educated on the fact that they > need to flip some switch to do what most of them would expect. Such a change would introduce considerable breakage. This would either have to be done using our usual pending deprecation, deprecation, removal dance (over three releases) or be postponed until Python 4. Note that several python.org services use CAcerts which would no longer be accessible per default following such a change. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jan 22 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, >>> mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From donald at stufft.io Wed Jan 22 11:56:05 2014 From: donald at stufft.io (Donald Stufft) Date: Wed, 22 Jan 2014 05:56:05 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <52DFA2AE.8030504@egenix.com> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> Message-ID: On Jan 22, 2014, at 5:51 AM, M.-A. Lemburg wrote: > On 22.01.2014 11:30, Donald Stufft wrote: >> I would like to propose that a backwards incompatible change be made to Python to make >> verification of hostname and certificate chain the default instead of requiring it to be opt >> in. >> >> Python 3.4 has made great strides in making it easier for applications to simply turn on these >> settings, however many people are not aware at all that they need to opt into this. Most assume >> that it will operate similarly to their browser, curl, wget, etc and validate by default and >> in the typical style of security related issues it will appear to work just fine however be >> grossly insecure. >> >> In the real world ?opt in security? typically translates to just plain old insecure for the >> bulk of applications/libraries. I believe that Python has a responsibility to do the right >> thing by default here and it is in the best position to do so. The alternative requires every >> Python developer who wants to access a secure resource to be educated on the fact that they >> need to flip some switch to do what most of them would expect. > > Such a change would introduce considerable breakage. 
This would either > have to be done using our usual pending deprecation, deprecation, removal > dance (over three releases) or be postponed until Python 4. I can understand the need for doing the typical deprecation dance, although I believe such policies are often overlooked or accelerated for security sensitive changes. I do believe that waiting until Python 4 would be doing a great disservice to the users of Python though. > > Note that several python.org services use CAcerts which would no > longer be accessible per default following such a change. Not that it has much to do with this proposal, but those services should be switched to use certificates that are well trusted. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Jan 22 2014) >>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, >>>> mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From p.f.moore at gmail.com Wed Jan 22 12:21:27 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 22 Jan 2014 11:21:27 +0000 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: On 22 January 2014 10:30, Donald Stufft wrote: > Python 3.4 has made great strides in making it easier for applications > to simply turn on these settings, however many people are not aware > at all that they need to opt into this. Most assume that it will operate > similarly to their browser, curl, wget, etc and validate by default and in > the typical style of security related issues it will appear to work just fine > however be grossly insecure. Two things: 1. To be "like the browser" we'd need to use the OS certificate store, which isn't the case on Windows at the moment (managing those certificate bundle files is most definitely *not* "like the browser" - I'd have no idea how to add a self-certificate to the bundle file embedded in pip, for example). 2. Your proposal is that because some application authors have not opted in yet, we should penalise the end users of those applications by stopping them being able to use unverified https? And don't forget, applications that haven't opted in will have no switch to allow unverified use. That seems to be punishing the wrong people. 
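
A sketch of what opting in looks like with the ssl module in Python 3.4 and later, together with the kind of application-level switch mentioned in point 2. ssl.create_default_context() is a real standard-library function; the make_context helper and its verify flag are hypothetical names used only for illustration.

```python
import ssl


def make_context(verify=True):
    """Return an SSLContext, verifying certificates unless told otherwise.

    The verify flag stands in for a switch an application might expose
    to its users; it is an assumption of this sketch, not a stdlib option.
    """
    ctx = ssl.create_default_context()
    if not verify:
        # check_hostname must be disabled before verify_mode is relaxed.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


# Default: chain validation and hostname checking are both on.
strict = make_context()

# Explicit opt-out for users who knowingly accept unverified connections.
relaxed = make_context(verify=False)
```

With validation as the default, the unverified path becomes the one that has to be requested explicitly, which is the trade-off being debated in this thread.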
Paul From donald at stufft.io Wed Jan 22 12:29:47 2014 From: donald at stufft.io (Donald Stufft) Date: Wed, 22 Jan 2014 06:29:47 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: On Jan 22, 2014, at 6:21 AM, Paul Moore wrote: > On 22 January 2014 10:30, Donald Stufft wrote: >> Python 3.4 has made great strides in making it easier for applications >> to simply turn on these settings, however many people are not aware >> at all that they need to opt into this. Most assume that it will operate >> similarly to their browser, curl, wget, etc and validate by default and in >> the typical style of security related issues it will appear to work just fine >> however be grossly insecure. > > Two things: > > 1. To be "like the browser" we'd need to use the OS certificate store, > which isn't the case on Windows at the moment (managing those > certificate bundle files is most definitely *not* "like the browser" - > I'd have no idea how to add a self-certificate to the bundle file > embedded in pip, for example). Python 3.4 added the ability to use the OS cert store on Windows, see http://bugs.python.org/issue17134. > 2. Your proposal is that because some application authors have not > opted in yet, we should penalise the end users of those applications > by stopping them being able to use unverified https? And don't forget, > applications that haven't opted in will have no switch to allow > unverified use. That seems to be punishing the wrong people. Some applications will need to be updated yes to provide such a switch but the alternative is that every user of this API needs to configure it to verify certificates. The difference is that with my proposal the error condition is very obvious, the SSL certificate will fail to validate, a bug can be filed and it can be fixed. 
With the current behavior the only way you'd know is if you expected it to fail and it didn't, or you went specifically looking. It's a dangerous-by-default API that punishes people for not knowing that they need to turn verification on, and punishes the people who use those applications. However, it won't punish them directly; instead it'll just make it possible to MITM their connection, possibly leaking sensitive material. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From mal at egenix.com Wed Jan 22 12:30:28 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 22 Jan 2014 12:30:28 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> Message-ID: <52DFABD4.3090702@egenix.com> On 22.01.2014 11:56, Donald Stufft wrote: > > On Jan 22, 2014, at 5:51 AM, M.-A. Lemburg wrote: > >> On 22.01.2014 11:30, Donald Stufft wrote: >>> I would like to propose that a backwards incompatible change be made to Python to make >>> verification of hostname and certificate chain the default instead of requiring it to be opt >>> in. >>> >>> Python 3.4 has made great strides in making it easier for applications to simply turn on these >>> settings, however many people are not aware at all that they need to opt into this. Most assume >>> that it will operate similarly to their browser, curl, wget, etc and validate by default and >>> in the typical style of security related issues it will appear to work just fine however be >>> grossly insecure. >>> >>> In the real world "opt in security" typically translates to just plain old insecure for the >>> bulk of applications/libraries.
I believe that Python has a responsibility to do the right >>> thing by default here and it is in the best position to do so. The alternative requires every >>> Python developer who wants to access a secure resource to be educated on the fact that they >>> need to flip some switch to do what most of them would expect. >> >> Such a change would introduce considerable breakage. This would either >> have to be done using our usual pending deprecation, deprecation, removal >> dance (over three releases) or be postponed until Python 4. > > I can understand the need for doing the typical deprecation dance, although > I believe such policies are often overlooked or accelerated for security > sensitive changes. I do believe that waiting until Python 4 would be doing > a great disservice to the users of Python though. Well, it's not really a security issue, since the security features are present in Python 3.4. It's just that the user has to enable them. >> Note that several python.org services use CAcerts which would no >> longer be accessible per default following such a change. > > Not that it has much to do with this proposal, but those services should > be switched to use certificates that are well trusted. Oh, it does have to do with the proposal, since it's a prominent example of services that would no longer work. While we can potentially change the situation for python.org servers, you have the same issues with other services which we don't have any influence on. The change would also disable all services using self-signed certificates which are very common in internal networks and for ad-hoc setups. Many routers and other devices use self-signed certificates when offering HTTPS services. 
I think overall, it's good to have default security, but locking out all certificates which do not have their root CA certs installed in default installations of systems per default would likely lead to people seeking other more insecure ways of getting things to work, rather than asking their admins to add their CA certs to the certificate chain configuration. So I'm not sure whether raising errors is the best way to achieve better default security. Perhaps just using warnings would be better. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jan 22 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From donald at stufft.io Wed Jan 22 12:36:52 2014 From: donald at stufft.io (Donald Stufft) Date: Wed, 22 Jan 2014 06:36:52 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <52DFABD4.3090702@egenix.com> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> Message-ID: On Jan 22, 2014, at 6:30 AM, M.-A. Lemburg wrote: > On 22.01.2014 11:56, Donald Stufft wrote: >> >> On Jan 22, 2014, at 5:51 AM, M.-A. Lemburg wrote: >> >>> On 22.01.2014 11:30, Donald Stufft wrote: >>>> I would like to propose that a backwards incompatible change be made to Python to make >>>> verification of hostname and certificate chain the default instead of requiring it to be opt >>>> in. 
>>>> >>>> Python 3.4 has made great strides in making it easier for applications to simply turn on these >>>> settings, however many people are not aware at all that they need to opt into this. Most assume >>>> that it will operate similarly to their browser, curl, wget, etc and validate by default and >>>> in the typical style of security related issues it will appear to work just fine however be >>>> grossly insecure. >>>> >>>> In the real world "opt in security" typically translates to just plain old insecure for the >>>> bulk of applications/libraries. I believe that Python has a responsibility to do the right >>>> thing by default here and it is in the best position to do so. The alternative requires every >>>> Python developer who wants to access a secure resource to be educated on the fact that they >>>> need to flip some switch to do what most of them would expect. >>> >>> Such a change would introduce considerable breakage. This would either >>> have to be done using our usual pending deprecation, deprecation, removal >>> dance (over three releases) or be postponed until Python 4. >> >> I can understand the need for doing the typical deprecation dance, although >> I believe such policies are often overlooked or accelerated for security >> sensitive changes. I do believe that waiting until Python 4 would be doing >> a great disservice to the users of Python though. > > Well, it's not really a security issue, since the security features > are present in Python 3.4. It's just that the user has to enable them. I explicitly didn't call it a security issue, but rather a security sensitive change, which it is. This would effectively be making Python more secure by default and removing a security related footgun. > >>> Note that several python.org services use CAcerts which would no >>> longer be accessible per default following such a change.
>> >> Not that it has much to do with this proposal, but those services should >> be switched to use certificates that are well trusted. > > Oh, it does have to do with the proposal, since it's a prominent > example of services that would no longer work. While we can potentially > change the situation for python.org servers, you have the same issues > with other services which we don't have any influence on. I meant that the statement that they should be switched to a well trusted certificate didn't have much to do with this proposal. > > The change would also disable all services using self-signed > certificates which are very common in internal networks and > for ad-hoc setups. Many routers and other devices use self-signed > certificates when offering HTTPS services. It will just disable them by default; they can still easily be accessed, you'd just need to pass the "do not verify" flag. This clearly indicates that you're opting out of the S in HTTPS. > > I think overall, it's good to have default security, but locking out > all certificates which do not have their root CA certs installed > in default installations of systems per default would likely lead to > people seeking other more insecure ways of getting things to work, > rather than asking their admins to add their CA certs to the certificate > chain configuration. So I'm not sure whether raising errors is the > best way to achieve better default security. Perhaps just using > warnings would be better. Again it's not "locking out", it's simply "requires explicitly saying I want to not validate". It's hard to be more insecure than not verifying. Just about the only other way is to use plaintext, but the only real difference there is passive vs active attacks. However, typically if you're in the situation to do a passive attack you can also do an active attack. So "more insecure" is minorly more insecure while an error is drastically more secure.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From donald at stufft.io Wed Jan 22 12:42:00 2014 From: donald at stufft.io (Donald Stufft) Date: Wed, 22 Jan 2014 06:42:00 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: <0659BFF0-94C3-49CB-8054-AD2471D45CBB@stufft.io> On Jan 22, 2014, at 6:21 AM, Paul Moore wrote: > 2. Your proposal is that because some application authors have not > opted in yet, we should penalise the end users of those applications > by stopping them being able to use unverified https? And don't forget, > applications that haven't opted in will have no switch to allow > unverified use. That seems to be punishing the wrong people. Another thought: if this is seriously a blocker, something simple like an environment variable could be added that switches the default. That would act as a global sort of "insecure" flag for applications that don't provide one.
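[Editorial illustration, not part of the original message.] The environment-variable idea above might look roughly like the following sketch. The variable name `EXAMPLE_SSL_NO_VERIFY` and the helper `make_context` are hypothetical names invented for this example; the thread does not name a variable, and nothing here is an existing Python switch. Only `ssl.create_default_context()`, `check_hostname` and `verify_mode` are real Python 3.4 APIs.

```python
import os
import ssl

# Hypothetical variable name, invented for this sketch; the thread names none.
OPT_OUT_VAR = "EXAMPLE_SSL_NO_VERIFY"


def make_context(environ=os.environ):
    """Return a verifying context unless the user opted out via the environment."""
    ctx = ssl.create_default_context()  # verification on (Python 3.4+)
    if environ.get(OPT_OUT_VAR) == "1":
        # Explicit, global opt-out. check_hostname has to be cleared first,
        # otherwise setting CERT_NONE raises ValueError.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

The point of the sketch is only that the opt-out lives outside application code, so an application that offers no "insecure" switch of its own could still be run against a self-signed endpoint.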
I really don't like the idea of doing that, but it would be better than not validating by default. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From ncoghlan at gmail.com Wed Jan 22 12:45:16 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 22 Jan 2014 21:45:16 +1000 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: On 22 January 2014 21:21, Paul Moore wrote: > On 22 January 2014 10:30, Donald Stufft wrote: >> Python 3.4 has made great strides in making it easier for applications >> to simply turn on these settings, however many people are not aware >> at all that they need to opt into this. Most assume that it will operate >> similarly to their browser, curl, wget, etc and validate by default and in >> the typical style of security related issues it will appear to work just fine >> however be grossly insecure. > > Two things: > > 1. To be "like the browser" we'd need to use the OS certificate store, > which isn't the case on Windows at the moment (managing those > certificate bundle files is most definitely *not* "like the browser" - > I'd have no idea how to add a self-certificate to the bundle file > embedded in pip, for example). > 2. Your proposal is that because some application authors have not > opted in yet, we should penalise the end users of those applications > by stopping them being able to use unverified https? And don't forget, > applications that haven't opted in will have no switch to allow > unverified use. That seems to be punishing the wrong people. Right, the browsers have a whole system of "click through" security to make the web (and corporate intranets!)
still usable even when they only accept CA signed certs by default. With a programming language, there's no such interactivity, so applications just break and users don't know why. It's notable that even Linux distros haven't made this change in their system Python builds, and commercial Linux distros have raised paranoia to an art form (since that's a respectable chunk of what their users are paying for). We also have to account for the fact that an awful lot of Python applications are corporate ones relying on perimeter defence for security, or private CAs, or just self-signed certificates that their users have already accepted. There are limits to the amount of backwards incompatible change users will tolerate, and at this point in time we're still trying to get people to accept proper Unicode support. Even the package distribution tools are struggling with the consequences of trying to lock things down by default - every new release of pip currently brings someone else out of the woodwork pointing out they were relying on whichever insecure default we have changed on them this time. Securing the web is a "boil the ocean" type task - Python 3.4 takes us a step closer by making it possible for people to easily use the system certs via ssl.create_default_context() (http://docs.python.org/dev/library/ssl.html#ssl.create_default_context), but "move fast and break things" isn't going to work on this one any more than it does for proper Unicode support or the IPv4 to IPv6 transition. Security concerns are too abstract for most people for them to accept it as an excuse when you tell them you broke their software for their own good. The only before-Python-4 approach I can potentially see working is this: 1. Provide "ssl.create_default_context()" in 3.4 (done) 2. Deprecate the implicit SSL context in 3.5 3. Remove the implicit SSL context in 3.6 4. Make the default context the implicit context in 3.7 It would be slow, but we *could* get there. 
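[Editorial illustration, not part of the original message.] As a rough sketch of what step 1 already offers in Python 3.4, the snippet below contrasts a context from `ssl.create_default_context()` with one that has been explicitly switched to "do not verify". It says nothing about how steps 2 to 4 would be implemented, which the thread leaves open; the variable names are just for this example.

```python
import ssl

# Verified context: the helper added in Python 3.4 enables both the
# certificate chain check and the hostname check.
verified = ssl.create_default_context()
print(verified.verify_mode == ssl.CERT_REQUIRED)
print(verified.check_hostname)

# Explicit opt-out, e.g. for a device with a self-signed certificate.
# check_hostname must be cleared before verify_mode can be CERT_NONE.
unverified = ssl.create_default_context()
unverified.check_hostname = False
unverified.verify_mode = ssl.CERT_NONE
```

Either context can then be handed to a client that accepts one, for example `http.client.HTTPSConnection(host, context=verified)`.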
You *might* even be able to make the case for collapsing steps 3 and 4 together so that the more secure default is brought in as part of 3.6 in 2017. I'd suggest a PEP along those lines, but I think you already have plenty to keep you busy helping to ride herd on the packaging ecosystem :) In the meantime, making it so that it *is* just a single additional function call to get proper TLS security settings in 3.4 is a major step forward (thanks Christian!). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mal at egenix.com Wed Jan 22 12:53:14 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 22 Jan 2014 12:53:14 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> Message-ID: <52DFB12A.7010202@egenix.com> On 22.01.2014 12:36, Donald Stufft wrote: > > On Jan 22, 2014, at 6:30 AM, M.-A. Lemburg wrote: >> The change would also disable all services using self-signed >> certificates which are very common in internal networks and >> for ad-hoc setups. Many routers and other devices use self-signed >> certificates when offering HTTPS services. > > It will just disable them by default, they can still easily be accessed > you'd just need to pass the "do not verify" flag. This clearly indicates > that you're opting out of the S in HTTPS. > >> >> I think overall, it's good to have default security, but locking out >> all certificates which do not have their root CA certs installed >> in default installations of systems per default would likely lead to >> people seeking other more insecure ways of getting things to work, >> rather than asking their admins to add their CA certs to the certificate >> chain configuration. So I'm not sure whether raising errors is the >> best way to achieve better default security. Perhaps just using >> warnings would be better.
> > Again it's not "locking out", it's simply "requires explicitly saying > I want to not validate". This would have to be configurable without changing application code, e.g. using an environment setting. Otherwise, you do lock out existing scripts and applications from using Python 3.6 by requiring all of them to support custom configurations. Simply saying: oh, just change your code to never validate is not a good solution either. > It's hard to be more insecure than not verifying. Just about the only > other way is to use plaintext but the only real difference there is > passive vs active attacks. However typically if you're in the situation > to do a passive attack you can also do an active attack. So "more > insecure" is minorly more insecure while an error is drastically > more secure. I disagree with that statement. Using HTTPS without verification is still far more secure than using plain text. I know that verification is a lot better, but please remember that practicality beats purity. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jan 22 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Wed Jan 22 12:58:10 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 22 Jan 2014 21:58:10 +1000 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> Message-ID: On 22 January 2014 21:36, Donald Stufft wrote: > On Jan 22, 2014, at 6:30 AM, M.-A. Lemburg wrote: >> The change would also disable all services using self-signed >> certificates which are very common in internal networks and >> for ad-hoc setups. Many routers and other devices use self-signed >> certificates when offering HTTPS services. > > It will just disable them by default, they can still easily be accessed > you'd just need to pass the "do not verify" flag. This clearly indicates > that you're opting out of the S in HTTPS. You need to remember that *Python is fundamentally not an application*. We don't control the interaction with the user, application developers do, and every decision we make has to take that into account. The kinds of decisions that an application like a web browser or a package installer can make aren't necessarily available to a runtime. We had to be cautious even with the initial hash randomisation change to avoid breaking currently working applications. Look at the anger that people express about us making Python 3 more sensitive to environment misconfiguration on POSIX systems: people don't blame the misconfigured environment that Python 2 tolerated with an increased risk of data corruption, they blame *us* for breaking something that used to work by default.
The change you're proposing would mean that *every* Python application would either need to be updated to explicitly opt in to insecurity (the path most of them will take, because humans) or else to provide a "set this option to make your computer work again" insecurity flag (which is a bad idea anyway, again because humans). There are currently still too many valid reasons for not using verified SSL for us to realistically make it the default without a seriously long transition period (not quite IPv6 or even Python 3 long, but certainly not as short as the time period involved in introducing hash randomisation). Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From donald at stufft.io Wed Jan 22 13:02:28 2014 From: donald at stufft.io (Donald Stufft) Date: Wed, 22 Jan 2014 07:02:28 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: On Jan 22, 2014, at 6:45 AM, Nick Coghlan wrote: > On 22 January 2014 21:21, Paul Moore wrote: >> On 22 January 2014 10:30, Donald Stufft wrote: >>> Python 3.4 has made great strides in making it easier for applications >>> to simply turn on these settings, however many people are not aware >>> at all that they need to opt into this. Most assume that it will operate >>> similarly to their browser, curl, wget, etc and validate by default and in >>> the typical style of security related issues it will appear to work just fine >>> however be grossly insecure. >> >> Two things: >> >> 1. To be "like the browser" we'd need to use the OS certificate store, >> which isn't the case on Windows at the moment (managing those >> certificate bundle files is most definitely *not* "like the browser" - >> I'd have no idea how to add a self-certificate to the bundle file >> embedded in pip, for example). >> 2. 
Your proposal is that because some application authors have not >> opted in yet, we should penalise the end users of those applications >> by stopping them being able to use unverified https? And don't forget, >> applications that haven't opted in will have no switch to allow >> unverified use. That seems to be punishing the wrong people. > > Right, the browsers have a whole system of "click through" security to > make the web (and corporate intranets!) still usable even when they > only accept CA signed certs by default. With a programming language, > there's no such interactivity, so applications just break and users > don't know why. > > It's notable that even Linux distros haven't made this change in their > system Python builds, and commercial Linux distros have raised > paranoia to an art form (since that's a respectable chunk of what > their users are paying for). I was actually talking to a Debian maintainer about the likelihood of making this change there earlier today :) If I fail at making this change upstream I'll be lobbying downstream, and then we'll just have different behaviors based on where you get your Python from, which I think stinks. > > We also have to account for the fact that an awful lot of Python > applications are corporate ones relying on perimeter defence for > security, or private CAs, or just self-signed certificates that their > users have already accepted. There are limits to the amount of > backwards incompatible change users will tolerate, and at this point > in time we're still trying to get people to accept proper Unicode > support. Most of those add their private CAs to the system cert stores, which would still work fine. I don't think this change is one that users would be very upset about. We received very positive feedback when doing something similar for pip, even though we did break things for a few people.
> > Even the package distribution tools are struggling with the > consequences of trying to lock things down by default - every new > release of pip currently brings someone else out of the woodwork > pointing out they were relying on whichever insecure default we have > changed on them this time. Again speaking to the packaging tools, the switch to verifying SSL was very well tolerated; it was other changes, unrelated to this proposal, that have caused the bulk of the issues. > > Securing the web is a "boil the ocean" type task - Python 3.4 takes us > a step closer by making it possible for people to easily use the > system certs via ssl.create_default_context() > (http://docs.python.org/dev/library/ssl.html#ssl.create_default_context), > but "move fast and break things" isn't going to work on this one any > more than it does for proper Unicode support or the IPv4 to IPv6 > transition. Security concerns are too abstract for most people for > them to accept it as an excuse when you tell them you broke their > software for their own good. Calling this a "boil the ocean" type task is facetious. Yes, securing the web as a whole would be like boiling the ocean, but enabling TLS validation by default is like basic 101 level. > > The only before-Python-4 approach I can potentially see working is this: > > 1. Provide "ssl.create_default_context()" in 3.4 (done) > 2. Deprecate the implicit SSL context in 3.5 > 3. Remove the implicit SSL context in 3.6 > 4. Make the default context the implicit context in 3.7 > > It would be slow, but we *could* get there. You *might* even be able > to make the case for collapsing steps 3 and 4 together so that the > more secure default is brought in as part of 3.6 in 2017. This would be OK. Obviously I'd prefer it to happen on as quick a timeline as possible, but like I said earlier I can totally see the need to have the deprecation process.
> > I'd suggest a PEP along those lines, but I think you already have > plenty to keep you busy helping to ride herd on the packaging > ecosystem :) If a PEP is required for this I'll write one. I have no idea what changes need a PEP or not and I default to not writing one because it's easier :) > > In the meantime, making it so that it *is* just a single additional > function call to get proper TLS security settings in 3.4 is a major > step forward (thanks Christian!). Absolutely, the situation in 3.4 is way better, but it still relies on the people using the API to even know there is a problem, and they get zero warning that they are likely doing it wrong. > > Cheers, > Nick. > > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From p.f.moore at gmail.com Wed Jan 22 13:03:08 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 22 Jan 2014 12:03:08 +0000 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: On 22 January 2014 11:29, Donald Stufft wrote: >> 1. To be "like the browser" we'd need to use the OS certificate store, >> which isn't the case on Windows at the moment (managing those >> certificate bundle files is most definitely *not* "like the browser" - >> I'd have no idea how to add a self-certificate to the bundle file >> embedded in pip, for example). > > Python 3.4 added the ability to use the OS cert store on Windows, > see http://bugs.python.org/issue17134. Brilliant. I didn't know that. Will pip when run on Python 3.4 use the OS cert store? I guess the answer is probably "no" (but I'd love to be pleasantly surprised).
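[Editorial illustration, not part of the original message.] The question above is about pip, and this sketch does not answer it. It only shows, under the assumption of a stock Python 3.4+ `ssl` module, how a script could inspect which trust roots the standard library itself would pick up by default.

```python
import ssl

# Where OpenSSL looks for CA certificates by default on this system.
paths = ssl.get_default_verify_paths()
print(paths.cafile, paths.capath)

# The default context loads the system trust store where the platform
# supports it. The counts can be low or zero when certificates live in a
# capath directory, since those are only loaded on demand.
ctx = ssl.create_default_context()
stats = ctx.cert_store_stats()
print(stats)
```

The printed values are platform dependent, so no expected output is given here.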
Paul From donald at stufft.io Wed Jan 22 13:03:52 2014 From: donald at stufft.io (Donald Stufft) Date: Wed, 22 Jan 2014 07:03:52 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: <7D61B8DF-B035-4D02-9C96-5749C74427E6@stufft.io> On Jan 22, 2014, at 7:03 AM, Paul Moore wrote: > On 22 January 2014 11:29, Donald Stufft wrote: >>> 1. To be "like the browser" we'd need to use the OS certificate store, >>> which isn't the case on Windows at the moment (managing those >>> certificate bundle files is most definitely *not* "like the browser" - >>> I'd have no idea how to add a self-certificate to the bundle file >>> embedded in pip, for example). >> >> Python 3.4 added the ability to use the OS cert store on Windows, >> see http://bugs.python.org/issue17134. > > Brilliant. I didn't know that. > > Will pip when run on Python 3.4 use the OS cert store? I guess the > answer is probably "no" (but I'd love to be pleasantly surprised). > > Paul The answer is (I believe) no, mostly for consistency's sake. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From donald at stufft.io Wed Jan 22 13:15:47 2014 From: donald at stufft.io (Donald Stufft) Date: Wed, 22 Jan 2014 07:15:47 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> Message-ID: <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> On Jan 22, 2014, at 6:58 AM, Nick Coghlan wrote: > On 22 January 2014 21:36, Donald Stufft wrote: >> On Jan 22, 2014, at 6:30 AM, M.-A.
Lemburg wrote: >>> The change would also disable all services using self-signed >>> certificates which are very common in internal networks and >>> for ad-hoc setups. Many routers and other devices use self-signed >>> certificates when offering HTTPS services. >> >> It will just disable them by default, they can still easily be accessed >> you'd just need to pass the "do not verify" flag. This clearly indicates >> that you're opting out of the S in HTTPS. > > You need to remember that *Python is fundamentally not an > application*. We don't control the interaction with the user, > application developers do, and every decision we make has to take that > into account. > > The kinds of decisions that an application like a web browser or a > package installer can make aren't necessarily available to a runtime. > We had to be cautious even with the initial hash randomisation change > to avoid breaking currently working applications. The same could be said for requests, it's fundamentally not an application and can't control the interaction with the user and yet it validates TLS by default just fine. > > Look at the anger that people express about us making Python 3 more > sensitive to environment misconfiguration on POSIX systems: people > don't blame the misconfigured environment that Python 2 tolerated with > an increased risk of data corruption, they blame *us* for breaking > something that used to work by default. They blame Python for breaking something that used to work by default for something they feel has little or no benefit, and which is also difficult or impossible to adapt to without often significant code changes (for the people who I've seen blaming Python dev for). Do you really think those people would be making the same complaints if they could restore the previous behavior with a simple boolean flag delivered either via environment variable or in their own code?
> > The change you're proposing would mean that *every* Python application > would either need to be updated to explicitly opt in to insecurity > (the path most of them will take, because humans) or else to provide a > "set this option to make your computer work again" insecurity flag > (which is a bad idea anyway, again because humans). Every Python application *that depends on an invalid certificate* and is written for Python 3. It's hardly every Python application since a good majority of them will simply continue working, and given that most Python applications are still written for Python 2 it's not even something that is going to affect the bulk of *Python* applications. > > There are currently still too many valid reasons for not using > verified SSL for us to realistically make it the default without a > seriously long transition period (not quite IPv6 or even Python 3 > long, but certainly not as short as the time period involved in > introducing hash randomisation). As I've said multiple times, I think it's fine to send it through the deprecation process which is still pretty long and gives people a good chunk of time to update. > > Regards, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
From p.f.moore at gmail.com Wed Jan 22 13:17:52 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 22 Jan 2014 12:17:52 +0000 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: On 22 January 2014 12:02, Donald Stufft wrote: >> We also have to account for the fact that an awful lot of Python >> applications are corporate ones relying on perimeter defence for >> security, or private CAs, or just self-signed certificates that their >> users have already accepted. There are limits to the amount of >> backwards incompatible change users will tolerate, and at this point >> in time we're still trying to get people to accept proper Unicode >> support. > > Most of those add their private CAs to the system cert stores > which would still work fine. I don't think this change is one that > users would be very upset about. We received very positive > feedback in doing similar for Pip and we did break things for > a few people. Speaking as someone whose day job consists entirely of working in a corporate "behind the firewall" environment, in my experience this is simply wrong. Most companies do *not* add private or self certificates to the system stores. Rather, they expect their end users to click on "Yes, Allow" in the browser *every* *time* they access the webpage. In many cases even the local PC store and exception list is locked down, so the user has no way of even avoiding this on a local basis. Python and applications built on Python are often used unofficially in such organisations for productivity-enhancing applications. Because it's unofficial, it's often latest versions. Because it's to improve productivity, grabbing existing apps and libraries and having them work rather than writing your own is crucial.
Seriously - the security viewpoints I'm seeing here are so far from corporate life that it's ridiculous. (But to be fair to corporate environments, the firewalls involved mean that the systems involved often have so little internet access that you can essentially ignore anything other than internal threats). Paul From jnoller at gmail.com Wed Jan 22 13:43:03 2014 From: jnoller at gmail.com (Jesse Noller) Date: Wed, 22 Jan 2014 06:43:03 -0600 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <52DFABD4.3090702@egenix.com> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> Message-ID: <50AAFC8B-FB56-4FD6-B5BC-4D56C47A47C3@gmail.com> > On Jan 22, 2014, at 5:30 AM, "M.-A. Lemburg" wrote: > >> On 22.01.2014 11:56, Donald Stufft wrote: >> >>> On Jan 22, 2014, at 5:51 AM, M.-A. Lemburg wrote: >>> >>>> On 22.01.2014 11:30, Donald Stufft wrote: >>>> I would like to propose that a backwards incompatible change be made to Python to make >>>> verification of hostname and certificate chain the default instead of requiring it to be opt >>>> in. >>>> >>>> Python 3.4 has made great strides in making it easier for applications to simply turn on these >>>> settings, however many people are not aware at all that they need to opt into this. Most assume >>>> that it will operate similarly to their browser, curl, wget, etc and validate by default and >>>> in the typical style of security related issues it will appear to work just fine however be >>>> grossly insecure. >>>> >>>> In the real world "opt in security" typically translates to just plain old insecure for the >>>> bulk of applications/libraries. I believe that Python has a responsibility to do the right >>>> thing by default here and it is in the best position to do so.
The alternative requires every >>>> Python developer who wants to access a secure resource to be educated on the fact that they >>>> need to flip some switch to do what most of them would expect. >>> >>> Such a change would introduce considerable breakage. This would either >>> have to be done using our usual pending deprecation, deprecation, removal >>> dance (over three releases) or be postponed until Python 4. >> >> I can understand the need for doing the typical deprecation dance, although >> I believe such policies are often overlooked or accelerated for security >> sensitive changes. I do believe that waiting until Python 4 would be doing >> a great disservice to the users of Python though. > > Well, it's not really a security issue, since the security features > are present in Python 3.4. It's just that the user has to enable them. I have to concur with Donald here - in the case of security, especially language security which directly impacts the implicit security of downstream applications, I should not have to opt in to the most secure defaults. Yes; this potentially breaks applications relying on insecure / loose defaults. However it changes the model to "you are by default, explicitly secure" then relying on the domain knowledge of an application developer to harden their application. When, if this changes, an application breaks, it will be in a plainly obvious way which can quickly be resolved. Donald is perfectly right: today, it's trivial to MITM an application that relies off of the current behavior; this is bad news bears for users and developers as it means they need domain knowledge to secure their applications by default they may not have. > >>> Note that several python.org services use CAcerts which would no >>> longer be accessible per default following such a change. >> >> Not that it has much to do with this proposal, but those services should >> be switched to use certificates that are well trusted. 
> > Oh, it does have to do with the proposal, since it's a prominent > example of services that would no longer work. While we can potentially > change the situation for python.org servers, you have the same issues > with other services which we don't have any influence on. > > The change would also disable all services using self-signed > certificates which are very common in internal networks and > for ad-hoc setups. Many routers and other devices use self-signed > certificates when offering HTTPS services. > > I think overall, it's good to have default security, but locking out > all certificates which do not have their root CA certs installed > in default installations of systems per default would likely lead to > people seeking other more insecure ways of getting things to work, > rather than asking their admins to add their CA certs to the certificate > chain configuration. So I'm not sure whether raising errors is the > best way to achieve better default security. Perhaps just using > warnings would be better. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Jan 22 2014) >>>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ From rosuav at gmail.com Wed Jan 22 13:58:00 2014 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 22 Jan 2014 23:58:00 +1100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> Message-ID: On Wed, Jan 22, 2014 at 11:15 PM, Donald Stufft wrote: > Do you really think those people would be making the same complaints > if they could restore the previous behavior with a simple boolean flag > delivered either via environment variable or in their own code? You assume that it's easy to tweak the code. From personal experience just today I can say that this isn't always the case. I was asked a question about an internal program that had been in use since the late 1990s, and which had originally been written to work with Netscape Navigator and had been updated to work with Firefox, but not Chrome. The original author is still around, but it's too much hassle to get that code dug into, so it's far easier just to accept a small issue with Chrome (since the program's not used very often anyway). But if Chrome had completely broken that program, the solution would simply be "keep using Firefox", not "fix the program" - it's not considered a bug. Now, maybe it wouldn't be a problem if the fix is an environment variable, but imagine a thousand-computer deployment and you have to tweak the environment on all of them. Feel like doing that just because the newest Python needs it?
Not so much. ChrisA From mal at egenix.com Wed Jan 22 14:16:18 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 22 Jan 2014 14:16:18 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <50AAFC8B-FB56-4FD6-B5BC-4D56C47A47C3@gmail.com> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <50AAFC8B-FB56-4FD6-B5BC-4D56C47A47C3@gmail.com> Message-ID: <52DFC4A2.4090408@egenix.com> On 22.01.2014 13:43, Jesse Noller wrote: >> Well, it's not really a security issue, since the security features >> are present in Python 3.4. It's just that the user has to enable them. > > I have to concur with Donald here - in the case of security, especially language security which directly impacts the implicit security of downstream applications, I should not have to opt in to the most secure defaults. > > Yes; this potentially breaks applications relying on insecure / loose defaults. However it changes the model to "you are by default, explicitly secure" then relying on the domain knowledge of an application developer to harden their application. > > When, if this changes, an application breaks, it will be in a plainly obvious way which can quickly be resolved. The "can quickly be resolved" is the issue... > Donald is perfectly right: today, it's trivial to MITM an application that relies off of the current behavior; this is bad news bears for users and developers as it means they need domain knowledge to secure their applications by default they may not have. I don't think you need much domain knowledge to insert a single line of code into applications to enable the checks. Using an environment switch the extra checks could even be enabled without any code changes. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jan 22 2014) >>> Python Projects, Consulting and Support ... 
http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Wed Jan 22 14:16:44 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 22 Jan 2014 23:16:44 +1000 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> Message-ID: On 22 January 2014 22:15, Donald Stufft wrote: > > On Jan 22, 2014, at 6:58 AM, Nick Coghlan wrote: >> >> The kinds of decisions that an application like a web browser or a >> package installer can make aren't necessarily available to a runtime. >> We had to be cautious even with the initial hash randomisation change >> to avoid breaking currently working applications. > > The same could be said for requests, it's fundamentally not an application > and can't control the interaction with the user and yet it validates TLS by > default just fine. The requests library is used by a relatively small fraction of the Python community, and they're mostly web specialists that already understand the need for TLS-by-default and hence don't need it carefully explained to them. New users adopting it just treat that as the "way that requests works" and not something the language developers are forcing on them.
>> Look at the anger that people express about us making Python 3 more >> sensitive to environment misconfiguration on POSIX systems: people >> don't blame the misconfigured environment that Python 2 tolerated with >> an increased risk of data corruption, they blame *us* for breaking >> something that used to work by default. > > They blame Python for breaking something that used to work by default > for something they feel has little or no benefit, and which is also difficult > or impossible to adapt to without often significant code changes (for > the people who I've seen blaming Python dev for). Which is exactly the way most non-web-specialists working inside the comfort of corporate and academic firewalls will react to a change that breaks their access to internal applications, where self-signed certs and improperly configured internal CAs are endemic (of course, that's assuming they're using HTTPS at all, which I admit is an optimistic assumption). >> There are currently still too many valid reasons for not using >> verified SSL for us to realistically make it the default without a >> seriously long transition period (not quite IPv6 or even Python 3 >> long, but certainly not as short as the time period involved in >> introducing hash randomisation). > > As I've said multiple times, I think it's fine to send it through the > deprecation process which is still pretty long and gives people > a good chunk of time to update. Then I don't believe we actually have an argument here. Create the PEP with the rationale (written in terms a non-web-specialist can understand) and the deprecation strategy, and we can consider adding the initial deprecation warning in 3.5 next year. So, no rush on getting the PEP together :) Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Wed Jan 22 14:19:04 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 22 Jan 2014 14:19:04 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: <20140122141904.1ec2ef4d@fsol> On Wed, 22 Jan 2014 05:30:40 -0500 Donald Stufft wrote: > I would like to propose that a backwards incompatible change be > made to Python to make verification of hostname and certificate > chain the default instead of requiring it to be opt in. > > Python 3.4 has made great strides in making it easier for applications > to simply turn on these settings, however many people are not aware > at all that they need to opt into this. Most assume that it will operate > similarly to their browser, curl, wget, etc Python is not a Web client. Are you talking specifically about urllib? Regards Antoine. From ncoghlan at gmail.com Wed Jan 22 14:24:54 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 22 Jan 2014 23:24:54 +1000 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <20140122141904.1ec2ef4d@fsol> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <20140122141904.1ec2ef4d@fsol> Message-ID: On 22 January 2014 23:19, Antoine Pitrou wrote: > On Wed, 22 Jan 2014 05:30:40 -0500 > Donald Stufft wrote: >> I would like to propose that a backwards incompatible change be >> made to Python to make verification of hostname and certificate >> chain the default instead of requiring it to be opt in. >> >> Python 3.4 has made great strides in making it easier for applications >> to simply turn on these settings, however many people are not aware >> at all that they need to opt into this. Most assume that it will operate >> similarly to their browser, curl, wget, etc > > Python is not a Web client. Are you talking specifically about urllib? 
And all the other client modules that can make secure network connections (but don't validate that the certificate matches the hostname by default). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From christian at python.org Wed Jan 22 14:29:04 2014 From: christian at python.org (Christian Heimes) Date: Wed, 22 Jan 2014 14:29:04 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: On 22.01.2014 12:45, Nick Coghlan wrote: > We also have to account for the fact that an awful lot of Python > applications are corporate ones relying on perimeter defence for > security, or private CAs, or just self-signed certificates that their > users have already accepted. There are limits to the amount of > backwards incompatible change users will tolerate, and at this point > in time we're still trying to get people to accept proper Unicode > support. Side note: Users can simple add self-signed certs to OpenSSL's cert store and get validation for free. It's possible to do that with an environment variable, too. But I recommend against the environment variable because you may overwrite to operating store. Christian From donald at stufft.io Wed Jan 22 14:55:02 2014 From: donald at stufft.io (Donald Stufft) Date: Wed, 22 Jan 2014 08:55:02 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: On Jan 22, 2014, at 8:29 AM, Christian Heimes wrote: > On 22.01.2014 12:45, Nick Coghlan wrote: >> We also have to account for the fact that an awful lot of Python >> applications are corporate ones relying on perimeter defence for >> security, or private CAs, or just self-signed certificates that their >> users have already accepted. 
There are limits to the amount of >> backwards incompatible change users will tolerate, and at this point >> in time we're still trying to get people to accept proper Unicode >> support. > > Side note: > Users can simple add self-signed certs to OpenSSL's cert store and get > validation for free. It's possible to do that with an environment > variable, too. But I recommend against the environment variable because > you may overwrite to operating store. > > Christian As an additional side note, anecdotal evidence and what not, but *every* time I bring this up somewhere I get at least one reply that looks similar to https://twitter.com/ojiidotch/status/425986619879866368 ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From christian at python.org Wed Jan 22 15:03:44 2014 From: christian at python.org (Christian Heimes) Date: Wed, 22 Jan 2014 15:03:44 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: <52DFCFC0.4020205@python.org> On 22.01.2014 14:55, Donald Stufft wrote: > As an additional side note, anecdotal evidence and what not, but > *every* time I bring this up somewhere I get at least one reply > that looks similar to > https://twitter.com/ojiidotch/status/425986619879866368 Yeah :( The ssl module documentation http://docs.python.org/3/library/ssl.html features a big red warning box for a good reason.
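
The side note quoted above, about adding a self-signed certificate to the trust store, can be sketched roughly as follows. This is only an illustration: the helper names `describe_trust_sources` and `context_trusting` are hypothetical and not part of the ssl module, and the environment variable names are whatever the local OpenSSL build reports (typically SSL_CERT_FILE and SSL_CERT_DIR, but that is an assumption about the platform).

```python
import ssl


def describe_trust_sources():
    """Report where OpenSSL will look for trusted CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        # Names of the environment variables OpenSSL consults.
        "cafile_env": paths.openssl_cafile_env,
        "capath_env": paths.openssl_capath_env,
        # Locations actually in effect (None if the path does not exist).
        "cafile": paths.cafile,
        "capath": paths.capath,
    }


def context_trusting(extra_ca_file):
    """Hypothetical helper: system CAs plus one extra PEM file (e.g. a self-signed cert)."""
    ctx = ssl.create_default_context()
    # Adds to the certificates already loaded; the system store is not replaced.
    ctx.load_verify_locations(cafile=extra_ca_file)
    return ctx


if __name__ == "__main__":
    for key, value in describe_trust_sources().items():
        print(key, "=", value)
```

Loading the extra certificate into a single context keeps the change local to one application, which sidesteps the concern about an environment variable replacing the operating system's store.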
From christian at python.org Wed Jan 22 15:07:19 2014 From: christian at python.org (Christian Heimes) Date: Wed, 22 Jan 2014 15:07:19 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <20140122141904.1ec2ef4d@fsol> Message-ID: On 22.01.2014 14:24, Nick Coghlan wrote: > On 22 January 2014 23:19, Antoine Pitrou wrote: >> On Wed, 22 Jan 2014 05:30:40 -0500 >> Donald Stufft wrote: >>> I would like to propose that a backwards incompatible change be >>> made to Python to make verification of hostname and certificate >>> chain the default instead of requiring it to be opt in. >>> >>> Python 3.4 has made great strides in making it easier for applications >>> to simply turn on these settings, however many people are not aware >>> at all that they need to opt into this. Most assume that it will operate >>> similarly to their browser, curl, wget, etc >> >> Python is not a Web client. Are you talking specifically about urllib? > > And all the other client modules that can make secure network > connections (but don't validate that the certificate matches the > hostname by default). With Python 3.4 all stdlib modules can verify the hostname and in fact do with ssl.create_default_context(). Several modules like ftplib didn't support SNI and hostname verification. 
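
A minimal sketch of the one-call setup mentioned above, assuming Python 3.4 or later. The helper names `verified_context` and `unverified_context` are made up for illustration; only `ssl.create_default_context()`, `check_hostname` and `verify_mode` come from the standard library.

```python
import ssl


def verified_context():
    """Return a client context that validates the chain and the hostname."""
    # create_default_context() defaults to Purpose.SERVER_AUTH, which turns on
    # CERT_REQUIRED and check_hostname and loads the system's default CA certs.
    return ssl.create_default_context()


def unverified_context():
    """Hypothetical helper: a context that explicitly opts out of verification."""
    ctx = ssl.create_default_context()
    # check_hostname must be switched off first; setting CERT_NONE while it is
    # still enabled raises ValueError.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


if __name__ == "__main__":
    ctx = verified_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

Either context can then be handed to a client, for example `http.client.HTTPSConnection(host, context=ctx)`. The second helper is the kind of explicit "do not verify" opt-out discussed earlier in the thread.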
From jnoller at gmail.com Wed Jan 22 15:08:48 2014 From: jnoller at gmail.com (Jesse Noller) Date: Wed, 22 Jan 2014 08:08:48 -0600 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> Message-ID: <801041CB-FF3D-4321-A7AB-2A3E9014DB24@gmail.com> > On Jan 22, 2014, at 6:58 AM, Chris Angelico wrote: > >> On Wed, Jan 22, 2014 at 11:15 PM, Donald Stufft wrote: >> Do you really think those people would be making the same complaints >> if they could restore the previous behavior with a simple boolean flag >> delivered either via environment variable or in their own code? > > You assume that it's easy to tweak the code. From personal experience > just today I can say that this isn't always the case. I was asked a > question about an internal program that had been in use since the late > 1990s, and which had originally been written to work with Netscape > Navigator and had been updated to work with Firefox, but not Chrome. > The original author is still around, but it's too much hassle to get > that code dug into, so it's far easier just to accept a small issue > with Chrome (since the program's not used very often anyway). But if > Chrome had completely broken that program, the solution would simply > be "keep using Firefox", not "fix the program" - it's not considered a > bug. > > Now, maybe it wouldn't be a problem if the fix is an environment > variable, but imagine a thousand-computer deployment and you have to > tweak the environment on all of them. Feel like doing that just > because the newest Python needs it? Not so much. > What's the bet that that application will be ported to python 3.4/3.5 if this is the case? I'd say approaching 0, which is ok. 
> ChrisA From jnoller at gmail.com Wed Jan 22 15:12:49 2014 From: jnoller at gmail.com (Jesse Noller) Date: Wed, 22 Jan 2014 08:12:49 -0600 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <52DFCFC0.4020205@python.org> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFCFC0.4020205@python.org> Message-ID: <9952F949-07EF-444A-B5D5-971242F656E1@gmail.com> > On Jan 22, 2014, at 8:03 AM, Christian Heimes wrote: > >> On 22.01.2014 14:55, Donald Stufft wrote: >> As an additional side note, anecdotal evidence and what not, but >> *every* time I bring this up somewhere I get at least one reply >> that looks similar to >> https://twitter.com/ojiidotch/status/425986619879866368 > > > Yeah :( > > The ssl module documentation http://docs.python.org/3/library/ssl.html > features a big red warning box for a good reason. And no one reads it. I can't count the number of times I've gotten called into a manager's office when they find out python doesn't do cert validation by default (and in 2, it's not been trivial) and gotten told to fix it, or we move off of python. Donald is perfectly right: every time you point out to users that this is the default behavior the response is almost universally "you can't be serious, is this a joke?"
From christian at python.org Wed Jan 22 15:13:00 2014 From: christian at python.org (Christian Heimes) Date: Wed, 22 Jan 2014 15:13:00 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <50AAFC8B-FB56-4FD6-B5BC-4D56C47A47C3@gmail.com> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <50AAFC8B-FB56-4FD6-B5BC-4D56C47A47C3@gmail.com> Message-ID: On 22.01.2014 13:43, Jesse Noller wrote: > I have to concur with Donald here - in the case of security, especially language security which directly impacts the implicit security of downstream applications, I should not have to opt in to the most secure defaults. > > Yes; this potentially breaks applications relying on insecure / loose defaults. However it changes the model to "you are by default, explicitly secure" then relying on the domain knowledge of an application developer to harden their application. > > When, if this changes, an application breaks, it will be in a plainly obvious way which can quickly be resolved. > > Donald is perfectly right: today, it's trivial to MITM an application that relies off of the current behavior; this is bad news bears for users and developers as it means they need domain knowledge to secure their applications by default they may not have. For 3.5 I'd like to work on a policy framework for the ssl where application can define policies like SSL/TLS version, cert store, verification modes etc. etc. I'll discuss my ideas with Donald, Alex and the other crypto guys as soon as I have settled in with my new job and town.
Christian From rosuav at gmail.com Wed Jan 22 15:16:14 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 23 Jan 2014 01:16:14 +1100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <801041CB-FF3D-4321-A7AB-2A3E9014DB24@gmail.com> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> <801041CB-FF3D-4321-A7AB-2A3E9014DB24@gmail.com> Message-ID: On Thu, Jan 23, 2014 at 1:08 AM, Jesse Noller wrote: >> Now, maybe it wouldn't be a problem if the fix is an environment >> variable, but imagine a thousand-computer deployment and you have to >> tweak the environment on all of them. Feel like doing that just >> because the newest Python needs it? Not so much. >> > > What's the bet that that application will be ported to python 3.4/3.5 if this is the case? I'd say approaching 0, which is ok. Define "ported to". (This particular application isn't Python, so the specifics don't apply, but in general.) Usually that means simply "run on". Something that was written for Python 3.2 will probably run on 3.3, and on 3.4, and on 3.5 as well. You certainly wouldn't expect one small corner of it to suddenly start doing different stuff, and if you do, you'll blame Python... which would mean that you're right, that program wouldn't be run on 3.4. Is that a good thing? I don't know, but I think not. In a big company with lots of seats, every option is looking like a sysadmin's nightmare. That said, though, I agree *in principle* that secure-by-default is the way to go. It's just the backward-incompatibility of *changing* it. I like how requests is going. 
ChrisA From solipsis at pitrou.net Wed Jan 22 15:18:14 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 22 Jan 2014 15:18:14 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <50AAFC8B-FB56-4FD6-B5BC-4D56C47A47C3@gmail.com> Message-ID: <20140122151814.2fd8a230@fsol> On Wed, 22 Jan 2014 15:13:00 +0100 Christian Heimes wrote: > On 22.01.2014 13:43, Jesse Noller wrote: > > I have to concur with Donald here - in the case of security, especially language security which directly impacts the implicit security of downstream applications, I should not have to opt in to the most secure defaults. > > > > Yes; this potentially breaks applications relying on insecure / loose defaults. However it changes the model to "you are by default, explicitly secure" then relying on the domain knowledge of an application developer to harden their application. > > > > When, if this changes, an application breaks, it will be in a plainly obvious way which can quickly be resolved. > > > > Donald is perfectly right: today, it's trivial to MITM an application that relies off of the current behavior; this is bad news bears for users and developers as it means they need domain knowledge to secure their applications by default they may not have. > > For 3.5 I'd like to work on a policy framework for the ssl where > application can define policies like SSL/TLS version, cert store, > verification modes etc. etc. Isn't that called a SSLContext? 
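To make the SSLContext remark concrete: the policy items listed above (protocol version, cert store, verification mode) map onto attributes of a single ssl.SSLContext object. What follows is a minimal sketch only, assuming a Python newer than the 3.4/3.5 under discussion (ssl.PROTOCOL_TLS_CLIENT and minimum_version were added in later releases); the two helper names are made up for illustration and are not part of any proposal in this thread.

```python
import ssl


def make_strict_context():
    """Hypothetical helper: one context object carrying a 'verify everything' policy."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)   # client-side TLS
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2    # protocol version policy
    ctx.verify_mode = ssl.CERT_REQUIRED             # verify the certificate chain
    ctx.check_hostname = True                       # verify the hostname as well
    ctx.load_default_certs()                        # cert store policy: platform/OpenSSL defaults
    return ctx


def make_lax_context():
    """Hypothetical helper: an explicit opt-out, for contrast."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False       # has to be cleared before CERT_NONE is allowed
    ctx.verify_mode = ssl.CERT_NONE  # no chain verification at all
    return ctx
```

Either object can then be passed to an API that takes a context argument, for example http.client.HTTPSConnection(host, context=ctx), which is what makes the setting per-connection rather than per-process.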
From p.f.moore at gmail.com Wed Jan 22 15:19:54 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 22 Jan 2014 14:19:54 +0000 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: On 22 January 2014 13:55, Donald Stufft wrote: > > As an additional side note, anecdotal evidence and what not, but > *every* time I bring this up somewhere I get at least one reply that > looks similar to https://twitter.com/ojiidotch/status/425986619879866368 Surprise that Python doesn't verify certs is one thing. I would also like to live in a world where Python has always verified certs, and all the issues have already been resolved. Imposing breakage on end users because we haven't managed to persuade application developers to do the right thing yet (even though it appears we've made it one-line-of-code easy to do so) is another thing entirely. But the deprecation cycle gives application developers time (and a deadline) so I'm happy with that. Although from MAL's original comment: > Note that several python.org services use CAcerts which would no > longer be accessible per default following such a change. ,The PSF needs to get that sorted before making cert validation the default in Python, IMO. Paul From donald at stufft.io Wed Jan 22 15:25:25 2014 From: donald at stufft.io (Donald Stufft) Date: Wed, 22 Jan 2014 09:25:25 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: On Jan 22, 2014, at 9:19 AM, Paul Moore wrote: > On 22 January 2014 13:55, Donald Stufft wrote: >> >> As an additional side note, anecdotal evidence and what not, but >> *every* time I bring this up somewhere I get at least one reply that >> looks similar to https://twitter.com/ojiidotch/status/425986619879866368 > > Surprise that Python doesn't verify certs is one thing. 
I would also > like to live in a world where Python has always verified certs, and > all the issues have already been resolved. Imposing breakage on end > users because we haven't managed to persuade application developers to > do the right thing yet (even though it appears we've made it > one-line-of-code easy to do so) is another thing entirely. Note: That it requires users to even be aware they *need* to do that one line of code, which many are not. > > But the deprecation cycle gives application developers time (and a > deadline) so I'm happy with that. Awesome, It looks like I'll be writing a PEP to handle this, I wasn't sure if it needed one or not. > > Although from MAL's original comment: >> Note that several python.org services use CAcerts which would no >> longer be accessible per default following such a change. > > ,The PSF needs to get that sorted before making cert validation the > default in Python, IMO. I'm not aware of which services those are, if MAL (or anyone else) can point them out I'll see what I can do to make that happen. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From p.f.moore at gmail.com Wed Jan 22 15:28:15 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 22 Jan 2014 14:28:15 +0000 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: On 22 January 2014 13:29, Christian Heimes wrote: > Side note: > Users can simple add self-signed certs to OpenSSL's cert store and get > validation for free. It's possible to do that with an environment > variable, too. But I recommend against the environment variable because > you may overwrite to operating store.
I'm pretty sure what I'm about to ask isn't what you mean, but take it as an example of how people may misunderstand and/or misinterpret comments in this area ;-) So if I set up a PyPI mirror running under https, with a self-signed certificate, can you explain how I get it to work? For "work", assume I mean pip will use it, I can browse to it with my web browser, and my various Python scripts (now running under Python 3.5 with SSL verification on by default) that query the index all work without needing extra flags, code changes, or interactive prompts. I'm on Windows, by the way, just for added fun. (This is a one of the real-world reasons I've never set up a local https index - not a big one, laziness trumps it by miles :-) as does the effectiveness of simpler solutions - but it's there. I did think about it at one stage. If I *were* to set up an index, it's definitely why I'd use http rather than bothering with https.) Paul From donald at stufft.io Wed Jan 22 15:33:18 2014 From: donald at stufft.io (Donald Stufft) Date: Wed, 22 Jan 2014 09:33:18 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: On Jan 22, 2014, at 9:28 AM, Paul Moore wrote: > On 22 January 2014 13:29, Christian Heimes wrote: >> Side note: >> Users can simple add self-signed certs to OpenSSL's cert store and get >> validation for free. It's possible to do that with an environment >> variable, too. But I recommend against the environment variable because >> you may overwrite to operating store. > > I'm pretty sure what I'm about to ask isn't what you mean, but take it > as an example of how people may misunderstand and/or misinterpret > comments in this area ;-) > > So if I set up a PyPI mirror running under https, with a self-signed > certificate, can you explain how I get it to work? 
For "work", assume > I mean pip will use it, I can browse to it with my web browser, and my > various Python scripts (now running under Python 3.5 with SSL > verification on by default) that query the index all work without > needing extra flags, code changes, or interactive prompts. > > I'm on Windows, by the way, just for added fun. For everything but pip, you'd add it to your OS cert store. Pip doesn't use that so you'd have to use the --cert config. > > (This is a one of the real-world reasons I've never set up a local > https index - not a big one, laziness trumps it by miles :-) as does > the effectiveness of simpler solutions - but it's there. I did think > about it at one stage. If I *were* to set up an index, it's definitely > why I'd use http rather than bothering with https.) > > Paul ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From christian at python.org Wed Jan 22 15:33:21 2014 From: christian at python.org (Christian Heimes) Date: Wed, 22 Jan 2014 15:33:21 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <9952F949-07EF-444A-B5D5-971242F656E1@gmail.com> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFCFC0.4020205@python.org> <9952F949-07EF-444A-B5D5-971242F656E1@gmail.com> Message-ID: <52DFD6B1.6060200@python.org> On 22.01.2014 15:12, Jesse Noller wrote: > And no one reads it.
I can't count the number of times I've gotten called into a managers office when they find out python doesn't do cert validation by default (and in 2, it's not been trivial) and gotten told to fix it, or we move off of python. > > Donald is perfectly right: every time you point out to users that this is the default behavior the response is almost universally "you can't be serious, is this a joke?" Yes, you are right. :( About two months ago (maybe three) I proposed to deprecated implicit SSL context, unverified certs and unverified hostnames all together. But I was voted down. Donald made a similar attempt half an year ago, too. Can't we just mark these things as pending deprecated in Python 3.4 so people start fixing their code *now*? Christian From donald at stufft.io Wed Jan 22 15:36:05 2014 From: donald at stufft.io (Donald Stufft) Date: Wed, 22 Jan 2014 09:36:05 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <52DFD6B1.6060200@python.org> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFCFC0.4020205@python.org> <9952F949-07EF-444A-B5D5-971242F656E1@gmail.com> <52DFD6B1.6060200@python.org> Message-ID: On Jan 22, 2014, at 9:33 AM, Christian Heimes wrote: > On 22.01.2014 15:12, Jesse Noller wrote: >> And no one reads it. I can't count the number of times I've gotten called into a managers office when they find out python doesn't do cert validation by default (and in 2, it's not been trivial) and gotten told to fix it, or we move off of python. >> >> Donald is perfectly right: every time you point out to users that this is the default behavior the response is almost universally "you can't be serious, is this a joke?" > > Yes, you are right. :( > > About two months ago (maybe three) I proposed to deprecated implicit SSL > context, unverified certs and unverified hostnames all together. But I > was voted down. Donald made a similar attempt half an year ago, too. 
Last time I tried the reasoning was that Python couldn't ship root certs and we couldn't get to the OS certs everywhere. Thanks to you this is fixed now, so "once more unto the breach". > > Can't we just mark these things as pending deprecated in Python 3.4 so > people start fixing their code *now*? +10000 ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From benjamin at python.org Wed Jan 22 15:39:48 2014 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 22 Jan 2014 06:39:48 -0800 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> Message-ID: <1390401588.2931.73946373.277A7FB6@webmail.messagingengine.com> On Wed, Jan 22, 2014, at 04:15 AM, Donald Stufft wrote: > > On Jan 22, 2014, at 6:58 AM, Nick Coghlan wrote: > > > On 22 January 2014 21:36, Donald Stufft wrote: > >> On Jan 22, 2014, at 6:30 AM, M.-A. Lemburg wrote: > >>> The change would also disable all services using self-signed > >>> certificates which are very common in internal networks and > >>> for ad-hoc setups. Many routers and other devices use self-signed > >>> certificates when offering HTTPS services. > >> > >> It will just disable them by default, they can still easily be accessed > >> you'd just need to pass the "do not verify" flag. This clearly indicates > >> that you're opting out of the S in HTTPS. > > > > You need to remember that *Python is fundamentally not an > > application*.
We don't control the interaction with the user, > > application developers do, and every decision we make has to take that > > into account. > > > > The kinds of decisions that an application like a web browser or a > > package installer can make aren't necessarily available to a runtime. > > We had to be cautious even with the initial hash randomisation change > > to avoid breaking currently working applications. > > The same could be said for requests, it's fundamentally not an > application > and can't control the interaction with the user and yet it validates TLS > by > default just fine. Speaking of requests, I think another way to address this issue would be import a requests-like APIs into the stdlib (something which should happen anyway) and make that verify certificates by default. This would address the casual urllib-type usecase of fetching files over http/ftp etc. (I expect most people using their own protocols over raw TLS already know to force certificate verification.) From christian at python.org Wed Jan 22 15:45:34 2014 From: christian at python.org (Christian Heimes) Date: Wed, 22 Jan 2014 15:45:34 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFCFC0.4020205@python.org> <9952F949-07EF-444A-B5D5-971242F656E1@gmail.com> <52DFD6B1.6060200@python.org> Message-ID: <52DFD98E.6040105@python.org> On 22.01.2014 15:36, Donald Stufft wrote: > Last time I tried the reasoning was that Python couldn't ship root certs > and we couldn't get to the OS certs everywhere. Thanks to you this > is fixed now, so "once more unto the breach". The Windows situation is still not perfect, though. I'd love to use Chrome's approach and directly hook Windows' crypt32 API into OpenSSL verify function. That would trigger automatic retrieval of unknown root certs and CRL checks.
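For readers wondering where the "OS certs" end up coming from on a given machine, the ssl module can report this. The snippet below is a rough sketch only: describe_trust_store is a made-up name, ssl.PROTOCOL_TLS_CLIENT assumes a Python newer than the one discussed here, and the numbers reported will differ between platforms (certificates in a capath directory are loaded lazily, so they may not be counted).

```python
import ssl


def describe_trust_store():
    """Hypothetical helper: summarise where default CA certificates are looked up."""
    paths = ssl.get_default_verify_paths()          # locations compiled into / configured for OpenSSL
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_default_certs()                        # on Windows this also reads the system stores
    return {
        "cafile": paths.cafile,                     # bundle file in use, or None
        "capath": paths.capath,                     # directory of certs in use, or None
        "cafile_env": paths.openssl_cafile_env,     # name of the env var that overrides the bundle
        "stats": ctx.cert_store_stats(),            # counts of loaded certs, CA certs and CRLs
    }
```

Printing the returned dictionary on a few machines is a quick way to see how uneven the default stores are, which is the situation described above.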
From solipsis at pitrou.net Wed Jan 22 16:05:43 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 22 Jan 2014 16:05:43 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFCFC0.4020205@python.org> <9952F949-07EF-444A-B5D5-971242F656E1@gmail.com> <52DFD6B1.6060200@python.org> Message-ID: <20140122160543.0e6bf427@fsol> On Wed, 22 Jan 2014 15:33:21 +0100 Christian Heimes wrote: > > About two months ago (maybe three) I proposed to deprecated implicit SSL > context, unverified certs and unverified hostnames all together. But I > was voted down. Donald made a similar attempt half an year ago, too. So why are you trying a third time? Do you have any new arguments compared to last time? Antoine. From donald at stufft.io Wed Jan 22 16:10:36 2014 From: donald at stufft.io (Donald Stufft) Date: Wed, 22 Jan 2014 10:10:36 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <20140122160543.0e6bf427@fsol> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFCFC0.4020205@python.org> <9952F949-07EF-444A-B5D5-971242F656E1@gmail.com> <52DFD6B1.6060200@python.org> <20140122160543.0e6bf427@fsol> Message-ID: On Jan 22, 2014, at 10:05 AM, Antoine Pitrou wrote: > On Wed, 22 Jan 2014 15:33:21 +0100 > Christian Heimes wrote: >> >> About two months ago (maybe three) I proposed to deprecated implicit SSL >> context, unverified certs and unverified hostnames all together. But I >> was voted down. Donald made a similar attempt half an year ago, too. > > So why are you trying a third time? Do you have any new arguments > compared to last time? See my other email, Last time I tried I was told the reason was there wasn't a reliable enough default certificate store that worked on platforms such as Windows and Python was unwilling to ship its own certificate bundle.
Christian has improved this situation so that it appears that this issue has been largely resolved. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From ericsnowcurrently at gmail.com Wed Jan 22 16:12:06 2014 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 22 Jan 2014 08:12:06 -0700 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <52DFC4A2.4090408@egenix.com> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <50AAFC8B-FB56-4FD6-B5BC-4D56C47A47C3@gmail.com> <52DFC4A2.4090408@egenix.com> Message-ID: On Jan 22, 2014 6:17 AM, "M.-A. Lemburg" wrote: > Using an environment switch the extra checks could even be enabled > without any code changes. When Donald brought this up it sounded good. It still does. This is similar to what we did for hash randomization. -eric From cory at lukasa.co.uk Wed Jan 22 11:38:10 2014 From: cory at lukasa.co.uk (Cory Benfield) Date: Wed, 22 Jan 2014 10:38:10 +0000 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: On 22 January 2014 10:30, Donald Stufft wrote: > I would like to propose that a backwards incompatible change be > made to Python to make verification of hostname and certificate > chain the default instead of requiring it to be opt in. I'm overwhelmingly, dramatically +1 on this. There's no good architectural reason to not use the built-in certificate chains by default.
I'd like to be in favour of backporting this change to earlier Python versions as well, but it feels just a bit too aggressive. Cory From cory at lukasa.co.uk Wed Jan 22 12:00:13 2014 From: cory at lukasa.co.uk (Cory Benfield) Date: Wed, 22 Jan 2014 11:00:13 +0000 (UTC) Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: Donald Stufft stufft.io> writes: > > I would like to propose that a backwards incompatible change be > made to Python to make verification of hostname and certificate > chain the default instead of requiring it to be opt in. I'm overwhelmingly, dramatically +1 on this. There's no good architectural reason to not use the built-in certificate chains by default. I'd like to be in favour of backporting this change to earlier Python versions as well, but it feels too aggressive, even to me. From bp at benjamin-peterson.org Wed Jan 22 15:32:43 2014 From: bp at benjamin-peterson.org (Benjamin Peterson) Date: Wed, 22 Jan 2014 06:32:43 -0800 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: <1390401163.1494.73944713.464B2111@webmail.messagingengine.com> On Wed, Jan 22, 2014, at 04:02 AM, Donald Stufft wrote: > > On Jan 22, 2014, at 6:45 AM, Nick Coghlan wrote: > > > On 22 January 2014 21:21, Paul Moore wrote: > >> On 22 January 2014 10:30, Donald Stufft wrote: > >>> Python 3.4 has made great strides in making it easier for applications > >>> to simply turn on these settings, however many people are not aware > >>> at all that they need to opt into this. Most assume that it will operate > >>> similarly to their browser, curl, wget, etc and validate by default and in > >>> the typical style of security related issues it will appear to work just fine > >>> however be grossly insecure. > >> > >> Two things: > >> > >> 1. 
To be "like the browser" we'd need to use the OS certificate store, > >> which isn't the case on Windows at the moment (managing those > >> certificate bundle files is most definitely *not* "like the browser" - > >> I'd have no idea how to add a self-certificate to the bundle file > >> embedded in pip, for example). > >> 2. Your proposal is that because some application authors have not > >> opted in yet, we should penalise the end users of those applications > >> by stopping them being able to use unverified https? And don't forget, > >> applications that haven't opted in will have no switch to allow > >> unverified use. That seems to be punishing the wrong people. > > > > Right, the browsers have a whole system of "click through" security to > > make the web (and corporate intranets!) still usable even when they > > only accept CA signed certs by default. With a programming language, > > there's no such interactivity, so applications just break and users > > don't know why. > > > > It's notable that even Linux distros haven't made this change in their > > system Python builds, and commercial Linux distros have raised > > paranoia to an art form (since that's a respectable chunk of what > > their users are paying for). > > I was actually talking to a Debian maintainer about the likelihood > of making this change there earlier today :) If I fail at making this > change in upstream I'll be lobbying downstream and then we'll > just have different behaviors based on where you get your Python > from which I think stinks. I suppose if Debian wants to serve as a test ground to determine whether everyone is happy about having their scripts broken, that's fine, too.
From solipsis at pitrou.net Wed Jan 22 17:07:46 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 22 Jan 2014 17:07:46 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <50AAFC8B-FB56-4FD6-B5BC-4D56C47A47C3@gmail.com> <52DFC4A2.4090408@egenix.com> Message-ID: <20140122170746.44549969@fsol> On Wed, 22 Jan 2014 08:12:06 -0700 Eric Snow wrote: > On Jan 22, 2014 6:17 AM, "M.-A. Lemburg" wrote: > > Using an environment switch the extra checks could even be enabled > > without any code changes. > > When Donald brought this up it sounded good. It still does. This is > similar to what we did for hash randomization. The comparison is baseless. Hash randomization is a language feature that can only be enabled at interpreter startup, and is at best a per-application decision. SSL settings, on the other hand, have to be decided per-client endpoint, not per-process, and they will depend on the external service you connect to rather than the way your code is written. I'm -1 on adding env vars because we can't agree on SSL configuration options. Regards Antoine. From storchaka at gmail.com Wed Jan 22 17:20:51 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 22 Jan 2014 18:20:51 +0200 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: 20.01.14 13:14, Serhiy Storchaka wrote: >> Contestant 5: "Put in __clinic__ directory, add .h" >> >> foo.c -> __clinic__/foo.c.h >> foo.h -> __clinic__/foo.h.h > > -0.5. As far as 4 and 5 have equal total votes, I change my vote for 5 from -0.5 to -0.
From gokoproject at gmail.com Wed Jan 22 19:10:39 2014 From: gokoproject at gmail.com (John Yeuk Hon Wong) Date: Wed, 22 Jan 2014 13:10:39 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> Message-ID: <52E0099F.6000704@gmail.com> On 1/22/14 8:16 AM, Nick Coghlan wrote: > Which is exactly the way most non-web-specialists working inside the > comfort of corporate and academic firewalls will react to a change > that breaks their access to internal applications, where self-signed > certs and improperly configured internal CAs are endemic (of course, > that's assuming they're using HTTPS at all, which I admit is an > optimistic assumption). The number of people who are using 3.4+ in these environments is probably very very low to be honest. I don't have a number to prove, but in that environment people are more likely to still be using 2.6+. I think a deprecation in 2.7+ would be nice, but forward we should just enable it by default. When requests changed property calls (e.g. requests.json) to callable instead of an attribute(from requests.json to requests.json()), I was shocked. I had to figure out by Googling it. I found out from github issue.... I think a hard fail is somehow necessary. Also, a lot of people overlook at deprecation warnings. They either don't care or don't see it. I see a lot of deprecation warnings in the older applications I write, but I can careless until it breaks. So as we moving forward, we can break it. For those stuck behind, deprecation is the right approach. 
John From brian at python.org Wed Jan 22 19:46:57 2014 From: brian at python.org (Brian Curtin) Date: Wed, 22 Jan 2014 12:46:57 -0600 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <52E0099F.6000704@gmail.com> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> <52E0099F.6000704@gmail.com> Message-ID: On Wed, Jan 22, 2014 at 12:10 PM, John Yeuk Hon Wong wrote: > On 1/22/14 8:16 AM, Nick Coghlan wrote: >> >> Which is exactly the way most non-web-specialists working inside the >> comfort of corporate and academic firewalls will react to a change that >> breaks their access to internal applications, where self-signed certs and >> improperly configured internal CAs are endemic (of course, that's assuming >> they're using HTTPS at all, which I admit is an optimistic assumption). > > The number of people who are using 3.4+ in these environments is probably > very very low to be honest. I don't have a number to prove, but in that > environment people are more likely to still be using 2.6+. I think a > deprecation in 2.7+ would be nice, but forward we should just enable it by > default. > > When requests changed property calls (e.g. requests.json) to callable > instead of an attribute(from requests.json to requests.json()), I was > shocked. I had to figure out by Googling it. I found out from github > issue.... > > I think a hard fail is somehow necessary. > > Also, a lot of people overlook at deprecation warnings. They either don't > care or don't see it. I see a lot of deprecation warnings in the older > applications I write, but I can careless until it breaks. So as we moving > forward, we can break it. For those stuck behind, deprecation is the right > approach. They're disabled by default, so a lot of people simply don't know they exist because they also don't read the documentation. 
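As a small illustration of the point about deprecation warnings being invisible: DeprecationWarning is hidden by the default warning filters in most situations, so a warning is only seen once the filters are changed. The sketch below uses made-up names (old_api, call_and_collect) purely for demonstration and is not tied to any ssl API.

```python
import warnings


def old_api():
    """Hypothetical deprecated function."""
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def call_and_collect(filter_action):
    """Call old_api() under the given filter action and return the warning messages seen."""
    with warnings.catch_warnings(record=True) as caught:
        # "ignore" mimics the usual default for DeprecationWarning;
        # "always" is what a developer has to opt in to.
        warnings.simplefilter(filter_action, DeprecationWarning)
        old_api()
    return [str(w.message) for w in caught]
```

Without touching the code, the same opt-in is available from the command line with python -W error::DeprecationWarning, which turns the warning into an exception.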
From donald at stufft.io Wed Jan 22 19:50:37 2014 From: donald at stufft.io (Donald Stufft) Date: Wed, 22 Jan 2014 13:50:37 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> <52E0099F.6000704@gmail.com> Message-ID: On Jan 22, 2014, at 1:46 PM, Brian Curtin wrote: > On Wed, Jan 22, 2014 at 12:10 PM, John Yeuk Hon Wong > wrote: >> On 1/22/14 8:16 AM, Nick Coghlan wrote: >>> >>> Which is exactly the way most non-web-specialists working inside the >>> comfort of corporate and academic firewalls will react to a change that >>> breaks their access to internal applications, where self-signed certs and >>> improperly configured internal CAs are endemic (of course, that's assuming >>> they're using HTTPS at all, which I admit is an optimistic assumption). >> >> The number of people who are using 3.4+ in these environments is probably >> very very low to be honest. I don't have a number to prove, but in that >> environment people are more likely to still be using 2.6+. I think a >> deprecation in 2.7+ would be nice, but forward we should just enable it by >> default. >> >> When requests changed property calls (e.g. requests.json) to callable >> instead of an attribute(from requests.json to requests.json()), I was >> shocked. I had to figure out by Googling it. I found out from github >> issue.... >> >> I think a hard fail is somehow necessary. >> >> Also, a lot of people overlook at deprecation warnings. They either don't >> care or don't see it. I see a lot of deprecation warnings in the older >> applications I write, but I can careless until it breaks. So as we moving >> forward, we can break it. For those stuck behind, deprecation is the right >> approach. 
> > They're disabled by default, so a lot of people simply don't know they > exist because they also don't read the documentation. Ironically this is the exact reason why validation should happen by default :] ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From solipsis at pitrou.net Wed Jan 22 20:02:40 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 22 Jan 2014 20:02:40 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> <52E0099F.6000704@gmail.com> Message-ID: <20140122200240.2a301be9@fsol> On Wed, 22 Jan 2014 13:50:37 -0500 Donald Stufft wrote: > > Ironically this is the exact reason why validation should happen by default :] I think most of us would agree that a new client API should do validation by default (with an easy way to opt out). So let's concentrate on the question of whether and how to migrate existing APIs while remaining user-friendly. Regards Antoine. From wes.turner at gmail.com Wed Jan 22 19:53:37 2014 From: wes.turner at gmail.com (Wes Turner) Date: Wed, 22 Jan 2014 12:53:37 -0600 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> <52E0099F.6000704@gmail.com> Message-ID: If I could summarize this discussion (please correct me if I am wrong): 1.
Status Quo All existing versions of Python are insecure by default because by not doing SSL hostname verification, libraries which wrap sockets with SSLContext allow SSL MITM (man-in-the-middle) with no warning. Christian Heimes has introduced ssl.create_default_context for 3.4. Donald Stufft is proposing that hostname verification should be the default behavior in future versions of Python; such as Python 3.4, ideally as soon as possible. python-dev has been resistant to preventing SSL MITM by default because humans, pypi:keyring, and pypi:certifi 2. Generalizations about end-users: * corporate users aren't susceptible to MITM because perimeter * fallacious on all accounts * mismatched metaphors abound * nobody knows how to add self-signed CA certs to their chains * Agreed. * most people haven't / aren't going to read the docs: http://docs.python.org/3/library/ssl.html#verifying-certificates A **warning::** would probably be appropriate (and a big red warning box has been added to the 3.3 docs) It could also say "PYTHON DOES NOT VALIDATE SSL CERTIFICATES BY DEFAULT" * most people don't realize / have modification rights / have the ability to submit patches upstream; either to * set CERT_REQUIRED for every connection * set (a forthcoming) CERT_NOVALIDATE * people who complain about breaking security defaults which allow MITM are most relevant * "it doesn't work on my SOHO router" * people would need to understand how Python (and many other languages' SSL implementations) is less secure than current browsers * people don't read "What's New"; distributions test and upgrade their interpreter for them * some people are aware that third-party libraries requests and urllib3 do SSL hostname validation by default, now, with Python 2 3. Compatibility There could be a PYTHONSSLNOVALIDATE environment variable: * +1 : This would allow workarounds for applications which cannot be updated * -1 : This is not preferable because PYTHONHASHSEED 4.
Chain management * It should be possible to update the cert chain * It could be easier to specify a different cert chain (?) * Python 3.4 now supports using the Windows cert chain * Pip does not yet support using the Windows cert chain * pypi:certifi adds the Mozilla Firefox keychain for Python 2 and 3 (like requests and pip) 5. Deprecation * Should deprecate slowly (2017) because people would complain about having been secured against MITM (in their upgrade to Python 3.4) * Should not deprecate slowly because the status quo (insecure by default) is risky * Should add pending deprecation now (see CERT_NOVALIDATE NOP below) 6. Testing Responses * It's probably good to start testing downstream patches to Python 3.4 packages * The error message may be primed My two cents: * CERT_NOVALIDATE could/should be present now (if even as a no-op) * 2to3 could/should add CERT_NOVALIDATE Please feel free to add any, all, or none of this to the forthcoming PEP. Thank you for addressing this issue. (This is buried here because the mailman archive gzip hasn't yet updated with the latest Message-Id to specify for In-Reply-To) -- Wes Turner On Wed, Jan 22, 2014 at 12:46 PM, Brian Curtin wrote: > On Wed, Jan 22, 2014 at 12:10 PM, John Yeuk Hon Wong > wrote: > > On 1/22/14 8:16 AM, Nick Coghlan wrote: > >> > >> Which is exactly the way most non-web-specialists working inside the > >> comfort of corporate and academic firewalls will react to a change that > >> breaks their access to internal applications, where self-signed certs > and > >> improperly configured internal CAs are endemic (of course, that's > assuming > >> they're using HTTPS at all, which I admit is an optimistic assumption). > > > > The number of people who are using 3.4+ in these environments is probably > > very very low to be honest. I don't have a number to prove, but in that > > environment people are more likely to still be using 2.6+. 
I think a > > deprecation in 2.7+ would be nice, but forward we should just enable it > by > > default. > > > > When requests changed property calls (e.g. requests.json) to callable > > instead of an attribute(from requests.json to requests.json()), I was > > shocked. I had to figure out by Googling it. I found out from github > > issue.... > > > > I think a hard fail is somehow necessary. > > > > Also, a lot of people overlook at deprecation warnings. They either don't > > care or don't see it. I see a lot of deprecation warnings in the older > > applications I write, but I can careless until it breaks. So as we moving > > forward, we can break it. For those stuck behind, deprecation is the > right > > approach. > > They're disabled by default, so a lot of people simply don't know they > exist because they also don't read the documentation. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Wed Jan 22 20:32:35 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 22 Jan 2014 14:32:35 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: On 1/22/2014 9:25 AM, Donald Stufft wrote: > Awesome, It looks like I?ll be writing a PEP to handle this, I wasn?t > sure if it needed one or not. Definitely. I think the transition from insecure by default to secure by default is somewhat comparable to the transition from ascii by default to unicode by default. I suspect more than one PEP will be needed for various aspects. 
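To make the two defaults under discussion concrete, here is a rough sketch of what "secure by default" and an explicit opt-out look like with the ssl.create_default_context helper that is new in 3.4. The function names verifying_context and non_verifying_context are invented for this illustration and are not part of any proposal in this thread.

```python
import ssl


def verifying_context():
    """Return a context that checks the certificate chain and the hostname."""
    # create_default_context() loads the system trust store and enables
    # CERT_REQUIRED together with hostname checking.
    return ssl.create_default_context()


def non_verifying_context():
    """Return a context with verification explicitly switched off.

    Hypothetical opt-out for cases such as a device that presents a
    self-signed certificate; it gives up protection against MITM.
    """
    ctx = ssl.create_default_context()
    # Order matters: hostname checking has to be disabled before
    # verify_mode may be lowered to CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

Either context can then be handed to code that accepts an SSLContext, so the opt-out stays visible at the call site instead of being an implicit default.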
-- Terry Jan Reedy From ncoghlan at gmail.com Wed Jan 22 21:25:49 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 23 Jan 2014 06:25:49 +1000 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <1390401588.2931.73946373.277A7FB6@webmail.messagingengine.com> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> <1390401588.2931.73946373.277A7FB6@webmail.messagingengine.com> Message-ID: On 23 Jan 2014 00:39, "Benjamin Peterson" wrote: > > On Wed, Jan 22, 2014, at 04:15 AM, Donald Stufft wrote: > > > > On Jan 22, 2014, at 6:58 AM, Nick Coghlan wrote: > > > > > On 22 January 2014 21:36, Donald Stufft wrote: > > >> On Jan 22, 2014, at 6:30 AM, M.-A. Lemburg wrote: > > >>> The change would also disable all services using self-signed > > >>> certificates which are very common in internal networks and > > >>> for ad-hoc setups. Many routers and other devices use self-signed > > >>> certificates when offering HTTPS services. > > >> > > >> It will just disable them by default, they can still easily be accessed > > >> you?d just need to pass the ?do not verify? flag. This clearly indicates > > >> that you?re opting out of the S in HTTPS. > > > > > > You need to remember that *Python is fundamentally not an > > > application*. We don't control the interaction with the user, > > > application developers do, and every decision we make has to take that > > > into account. > > > > > > The kinds of decisions that an application like a web browser or a > > > package installer can make aren't necessarily available to a runtime. > > > We had to be cautious even with the initial hash randomisation change > > > to avoid breaking currently working applications. 
> > > > The same could be said for requests, it?s fundamentally not an > > application > > and can?t control the interaction with the user and yet it validates TLS > > by > > default just fine. > > Speaking of requests, I think another way to address this issue would be > import a requests-like APIs into the stdlib (something which should > happen anyway) and make that verify certificates by default. This would > address the casual urllib-type usecase of fetching files over http/ftp > etc. (I expect most people using their own protocols over raw TLS > already know to force certificate verification.) Guido gave in principle approval for an asyncio backed requests clone as the preferred HTTP client API last year, but that's going to take someone to write and publish it if we're going to be able to include it in 3.5. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin at python.org Wed Jan 22 21:56:58 2014 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 22 Jan 2014 12:56:58 -0800 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> <1390401588.2931.73946373.277A7FB6@webmail.messagingengine.com> Message-ID: <1390424218.31642.74108049.03A37E13@webmail.messagingengine.com> On Wed, Jan 22, 2014, at 12:25 PM, Nick Coghlan wrote: > On 23 Jan 2014 00:39, "Benjamin Peterson" wrote: > > Speaking of requests, I think another way to address this issue would be > > import a requests-like APIs into the stdlib (something which should > > happen anyway) and make that verify certificates by default. This would > > address the casual urllib-type usecase of fetching files over http/ftp > > etc. (I expect most people using their own protocols over raw TLS > > already know to force certificate verification.) 
> > Guido gave in principle approval for an asyncio backed requests clone as > the preferred HTTP client API last year, but that's going to take someone > to write and publish it if we're going to be able to include it in 3.5. But requests is synchronous, so I'm not sure how much you can use of asyncio. I was thinking of something bolted onto urllib. From brett at python.org Wed Jan 22 22:29:01 2014 From: brett at python.org (Brett Cannon) Date: Wed, 22 Jan 2014 16:29:01 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <1390424218.31642.74108049.03A37E13@webmail.messagingengine.com> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> <1390401588.2931.73946373.277A7FB6@webmail.messagingengine.com> <1390424218.31642.74108049.03A37E13@webmail.messagingengine.com> Message-ID: On Wed, Jan 22, 2014 at 3:56 PM, Benjamin Peterson wrote: > > > On Wed, Jan 22, 2014, at 12:25 PM, Nick Coghlan wrote: > > On 23 Jan 2014 00:39, "Benjamin Peterson" wrote: > > > Speaking of requests, I think another way to address this issue would > be > > > import a requests-like APIs into the stdlib (something which should > > > happen anyway) and make that verify certificates by default. This would > > > address the casual urllib-type usecase of fetching files over http/ftp > > > etc. (I expect most people using their own protocols over raw TLS > > > already know to force certificate verification.) > > > > Guido gave in principle approval for an asyncio backed requests clone as > > the preferred HTTP client API last year, but that's going to take someone > > to write and publish it if we're going to be able to include it in 3.5. > > But requests is synchronous, so I'm not sure how much you can use of > asyncio. I was thinking of something bolted onto urllib. 
Sure, but the key point is that a new async API can be made synchronous as well by blocking as necessary. The eventual synchronous API can resemble or take inspiration from requests. Point is, though, is it was admitted a new module is probably called for thanks to asyncio (and to give us a chance to fix mistakes in urllib). -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Wed Jan 22 22:41:32 2014 From: larry at hastings.org (Larry Hastings) Date: Wed, 22 Jan 2014 13:41:32 -0800 Subject: [Python-Dev] .clinic.c vs .c.clinic In-Reply-To: References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> Message-ID: <52E03B0C.6080709@hastings.org> On 01/22/2014 08:20 AM, Serhiy Storchaka wrote: > 20.01.14 13:14, Serhiy Storch > aka ???????(??): >>> Contestant 5: "Put in __clinic__ directory, add .h" >>> >>> foo.c -> __clinic__/foo.c.h >>> foo.h -> __clinic__/foo.h.h >> >> -0.5. > > As far as 4 and 5 have equal total votes, I change my vote for 5 from > -0.5 to -0. Too late! The poll ended Tuesday evening at 11pm, PST (GMT -0800). ;-) And yes, with 13 votes cast, it ended with a tie between "clinic/{filename}.h" and "__clinic__/{filename}.h", both at +4. As officiant I get to be the tiebreaker. My thoughts so far: * A bunch of longtime Python core devs cast their votes for "__clinic__": Nick, Terry, Stefan, Brett, Barry. On the other hand, Antoine and Georg preferred "clinic". * We have the precendent of __pycache__, where we cache machine-generated code that's the equivalent of code that in a file that's a sibling of the __pycache__ directory. * But it's not a perfect metaphor. For one, this directory will be checked in; __pycache__ directories should not be checked in. 
For another, if you blow away a __pycache__ directory everything automatically works fine. If you blow away a directory of Clinic generated code, you have to rebuild it by hand. Until you do you've broken your build. * We also have the precedent of "stringlib", a directory containing a bunch of unpleasant-to-look-at headers containing C code. It's not machine-generated code. But it is templatized code, so it's kind of compile-time generated on the fly if you squint at it. And it is checked in. * We also have the precedent of some machine-generated C code that is checked in in the Python tree: Python-ast.c, Python-ast.h. (Maybe one or two more? I forget.) None of these files have funny double-underscores prepended to their names. Also: If you only examine the people who voted +1 on "clinic", the sum of their votes on "__clinic__" is -0.5. If you only examine the people who voted +1 on "__clinic__", the sum of their votes on "clinic" is +2. Therefore, the people who voted for "__clinic__" are pretty tolerant of "clinic". The people who voted for "clinic" are less tolerant of "__clinic__". And finally: The total positive votes for "clinic" were 6, and total for the minus -2. The total positive votes for "__clinic__" were 7, and the minus -3. So "__clinic__" seems slightly more divisive. I'm leaning towards "clinic", primarily because of precedents in CPython trunk. But also because it makes it look more on-purpose and permanent. And because it's more aesthetically pleasing to look at. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From donald at stufft.io Wed Jan 22 22:48:03 2014 From: donald at stufft.io (Donald Stufft) Date: Wed, 22 Jan 2014 16:48:03 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> <1390401588.2931.73946373.277A7FB6@webmail.messagingengine.com> <1390424218.31642.74108049.03A37E13@webmail.messagingengine.com> Message-ID: Never mind. If someone else cares they can propose it. I withdraw. > On Jan 22, 2014, at 4:29 PM, Brett Cannon wrote: > > > > >> On Wed, Jan 22, 2014 at 3:56 PM, Benjamin Peterson wrote: >> >> >> On Wed, Jan 22, 2014, at 12:25 PM, Nick Coghlan wrote: >> > On 23 Jan 2014 00:39, "Benjamin Peterson" wrote: >> > > Speaking of requests, I think another way to address this issue would be >> > > import a requests-like APIs into the stdlib (something which should >> > > happen anyway) and make that verify certificates by default. This would >> > > address the casual urllib-type usecase of fetching files over http/ftp >> > > etc. (I expect most people using their own protocols over raw TLS >> > > already know to force certificate verification.) >> > >> > Guido gave in principle approval for an asyncio backed requests clone as >> > the preferred HTTP client API last year, but that's going to take someone >> > to write and publish it if we're going to be able to include it in 3.5. >> >> But requests is synchronous, so I'm not sure how much you can use of >> asyncio. I was thinking of something bolted onto urllib. > > Sure, but the key point is that a new async API can be made synchronous as well by blocking as necessary. The eventual synchronous API can resemble or take inspiration from requests. Point is, though, is it was admitted a new module is probably called for thanks to asyncio (and to give us a chance to fix mistakes in urllib). 
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/donald%40stufft.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From jnoller at gmail.com Wed Jan 22 23:09:54 2014 From: jnoller at gmail.com (Jesse Noller) Date: Wed, 22 Jan 2014 16:09:54 -0600 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> <1390401588.2931.73946373.277A7FB6@webmail.messagingengine.com> <1390424218.31642.74108049.03A37E13@webmail.messagingengine.com> Message-ID: On Wed, Jan 22, 2014 at 3:48 PM, Donald Stufft wrote: > Never mind. If someone else cares they can propose it. I withdraw. > I'll throw writing a PEP for 3.5 to do this following the deprecation policy on my todo list so 3.4 fixing can move on. I needed to brush up on my ReST anyway. From ncoghlan at gmail.com Wed Jan 22 23:20:24 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 23 Jan 2014 08:20:24 +1000 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> <1390401588.2931.73946373.277A7FB6@webmail.messagingengine.com> <1390424218.31642.74108049.03A37E13@webmail.messagingengine.com> Message-ID: On 23 Jan 2014 07:48, "Donald Stufft" wrote: > > Never mind. If someone else cares they can propose it. I withdraw. That's unfortunate, but understandable - we already have a lot to deal with just trying to get even the software distribution infrastructure to a "secure by default" status. 
However, now we have access to the system cert stores on all major platforms, I *do* think it's a good idea to eventually change the default settings to include host verification. It's just any such concrete proposal, like any other major backwards incompatible change, needs to be written up as a PEP, including a transition plan for insecure environments where users will blame *Python* if an upgrade breaks things, rather than the insecurity of their configuration. We know all too well from the Python 3 transition how unhappy it makes users when a new version complains about environmental issues that previous versions blithely ignored. While the normal deprecation process should suffice, that still means Python 3.6 (2017-ish) is the earliest feasible target for new default settings. Such a proposal will also need to address the implications for source compatible Python 2/3 code across *all* secure network protocols, not just HTTPS (the latter can be handled relatively easily using the requests module). A new, preferred, secure-by-default HTTPS client API for the standard library (based on the requests API) is an orthogonal proposal, and one that already has in-principle approval from Guido, preferably as a synchronous front end to the asyncio networking backend. Regards, Nick. -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From benjamin at python.org Wed Jan 22 23:21:00 2014
From: benjamin at python.org (Benjamin Peterson)
Date: Wed, 22 Jan 2014 14:21:00 -0800
Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation
In-Reply-To:
References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> <1390401588.2931.73946373.277A7FB6@webmail.messagingengine.com> <1390424218.31642.74108049.03A37E13@webmail.messagingengine.com>
Message-ID: <1390429260.19663.74139045.44B5E189@webmail.messagingengine.com>

On Wed, Jan 22, 2014, at 01:48 PM, Donald Stufft wrote:
> Never mind. If someone else cares they can propose it. I withdraw.

I'm sorry to see this thread went downhill so quickly. I think we can
all agree that not validating certs by default is bad and that it should
change "soon". It's only a question of when. And since we're python-dev,
we like to take things at an excruciatingly slow and careful pace.

From tjreedy at udel.edu Wed Jan 22 23:47:36 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 22 Jan 2014 17:47:36 -0500
Subject: [Python-Dev] .clinic.c vs .c.clinic
In-Reply-To: <52E03B0C.6080709@hastings.org>
References: <1870884.AZlKeLdVux@raxxla> <52DACCC3.9080800@hastings.org> <03c12f1b286945729d6ad60f0edcd042@BLUPR03MB389.namprd03.prod.outlook.com> <52DBA6BF.5070108@hastings.org> <52DBFD65.8050307@stoneleaf.us> <52DCD8BC.1040205@hastings.org> <52E03B0C.6080709@hastings.org>
Message-ID:

On 1/22/2014 4:41 PM, Larry Hastings wrote:

> And yes, with 13 votes cast, it ended with a tie between
> "clinic/{filename}.h" and "__clinic__/{filename}.h", both at +4. As
> officiant I get to be the tiebreaker.

Yep.

> My thoughts so far:
> * A bunch of longtime Python core devs cast their votes for
> "__clinic__": Nick, Terry, Stefan, Brett, Barry. On the other hand,
> Antoine and Georg preferred "clinic".
> * We have the precendent of __pycache__, where we cache > machine-generated code that's the equivalent of code that in a file > that's a sibling of the __pycache__ directory. > * But it's not a perfect metaphor. For one, this directory will be > checked in; __pycache__ directories should not be checked in. For > another, if you blow away a __pycache__ directory everything > automatically works fine. If you blow away a directory of Clinic > generated code, you have to rebuild it by hand. Until you do you've > broken your build. > * We also have the precedent of "stringlib", a directory containing a > bunch of unpleasant-to-look-at headers containing C code. It's not > machine-generated code. But it is templatized code, so it's kind of > compile-time generated on the fly if you squint at it. And it is > checked in. > * We also have the precedent of some machine-generated C code that is > checked in in the Python tree: Python-ast.c, Python-ast.h. (Maybe one or > two more? I forget.) None of these files have funny double-underscores > prepended to their names. > > Also: > If you only examine the people who voted +1 on "clinic", the sum of > their votes on "__clinic__" is -0.5. > If you only examine the people who voted +1 on "__clinic__", the sum of > their votes on "clinic" is +2. > Therefore, the people who voted for "__clinic__" are pretty tolerant of > "clinic". The people who voted for "clinic" are less tolerant of > "__clinic__". > > And finally: > The total positive votes for "clinic" were 6, and total for the minus -2. > The total positive votes for "__clinic__" were 7, and the minus -3. > So "__clinic__" seems slightly more divisive. > > I'm leaning towards "clinic", primarily because of precedents in CPython > trunk. But also because it makes it look more on-purpose and permanent. > And because it's more aesthetically pleasing to look at. I think you nicely summarized the various thoughts on 'clinic/' versus '__clinic__'. 
-- 
Terry Jan Reedy

From christian at python.org Thu Jan 23 00:31:40 2014
From: christian at python.org (Christian Heimes)
Date: Thu, 23 Jan 2014 00:31:40 +0100
Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation
In-Reply-To:
References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> <1390401588.2931.73946373.277A7FB6@webmail.messagingengine.com> <1390424218.31642.74108049.03A37E13@webmail.messagingengine.com>
Message-ID:

On 22.01.2014 23:20, Nick Coghlan wrote:
> However, now we have access to the system cert stores on all major
> platforms, I *do* think it's a good idea to eventually change the
> default settings to include host verification.

Somebody has to revisit the situation on OSX for Python 3.5 and possibly
create new bindings to the Keychain API. OSX ships only OpenSSL 0.9.8.
Apple has deprecated OpenSSL and I'd like to drop 0.9.8 support in 3.5.

> Such a proposal will also need to address the implications for source
> compatible Python 2/3 code across *all* secure network protocols, not
> just HTTPS (the latter can be handled relatively easily using the
> requests module).

Please count me in! I see two options to handle Python < 3.4: backport
the ssl module or hope that the "cryptography" library is ready.
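Regarding the system cert stores mentioned in the quoted text: for anyone who wants to see what their own platform provides, the following is a small sketch using ssl.get_default_verify_paths and SSLContext.cert_store_stats from the 3.4 ssl module. The helper name describe_trust_store and the shape of its return value are assumptions made for this example only.

```python
import ssl


def describe_trust_store():
    """Report where OpenSSL looks for CA certificates and how many loaded."""
    paths = ssl.get_default_verify_paths()
    ctx = ssl.create_default_context()
    # cert_store_stats() counts certificates already loaded into the
    # context; certificates that OpenSSL would pick up lazily from a
    # capath directory are not included in the count.
    stats = ctx.cert_store_stats()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "loaded_ca_certs": stats["x509_ca"],
    }


if __name__ == "__main__":
    for key, value in describe_trust_store().items():
        print(key, "=", value)
```

A count of zero does not necessarily mean verification will fail, since a capath directory may still supply CA certificates on demand.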
Christian From wes.turner at gmail.com Thu Jan 23 02:00:22 2014 From: wes.turner at gmail.com (Wes Turner) Date: Wed, 22 Jan 2014 19:00:22 -0600 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> <1390401588.2931.73946373.277A7FB6@webmail.messagingengine.com> <1390424218.31642.74108049.03A37E13@webmail.messagingengine.com> Message-ID: > While the normal deprecation process should suffice, that still means Python 3.6 (2017-ish) is the earliest feasible target for new default settings. Could a CERT_NOVERIFY no-op be added now? (no-op because it would just be more explicit about the default behavior anyway) [1] > Such a proposal will also need to address the implications for source compatible Python 2/3 code across *all* secure network protocols, not just HTTPS (the latter can be handled relatively easily using the requests module). Could/should this be a feature of 2to3? [1] (and/or automated AST code scanners for Python source)? [1] https://mail.python.org/pipermail/python-dev/2014-January/131974.html > A new, preferred, secure-by-default HTTPS client API for the standard library (based on the requests API) is an orthogonal proposal, and one that already has in-principle approval from Guido, preferably as a synchronous front end to the asyncio networking backend. I'm aware of aiohttp (BSD), but haven't yet had much chance to review the source. It looks like the tests currently require nose and gunicorn, because it provides an asyncio gunicorn worker. "http client/server for asyncio" https://github.com/fafhrd91/aiohttp -- Wes Turner On Wed, Jan 22, 2014 at 4:20 PM, Nick Coghlan wrote: > > On 23 Jan 2014 07:48, "Donald Stufft" wrote: >> >> Never mind. If someone else cares they can propose it. I withdraw. 
> > That's unfortunate, but understandable - we already have a lot to deal with > just trying to get even the software distribution infrastructure to a > "secure by default" status. > > However, now we have access to the system cert stores on all major > platforms, I *do* think it's a good idea to eventually change the default > settings to include host verification. > > It's just any such concrete proposal, like any other major backwards > incompatible change, needs to be written up as a PEP, including a transition > plan for insecure environments where users will blame *Python* if an upgrade > breaks things, rather than the insecurity of their configuration. We know > all too well from the Python 3 transition how unhappy it makes users when a > new version complains about environmental issues that previous versions > blithely ignored. > > While the normal deprecation process should suffice, that still means Python > 3.6 (2017-ish) is the earliest feasible target for new default settings. > > Such a proposal will also need to address the implications for source > compatible Python 2/3 code across *all* secure network protocols, not just > HTTPS (the latter can be handled relatively easily using the requests > module). > > A new, preferred, secure-by-default HTTPS client API for the standard > library (based on the requests API) is an orthogonal proposal, and one that > already has in-principle approval from Guido, preferably as a synchronous > front end to the asyncio networking backend. > > Regards, > Nick. 
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com
>

From kristjan at ccpgames.com Thu Jan 23 07:02:18 2014
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Thu, 23 Jan 2014 06:02:18 +0000
Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation
In-Reply-To:
References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io>
Message-ID:

> -----Original Message-----
> From: Python-Dev [mailto:python-dev-
> bounces+kristjan=ccpgames.com at python.org] On Behalf Of Nick Coghlan
> Sent: Wednesday, January 22, 2014 19:45
> To: Paul Moore
> Cc: Python-Dev
> Subject: Re: [Python-Dev] Enable Hostname and Certificate Chain Validation
> Right, the browsers have a whole system of "click through" security to make
> the web (and corporate intranets!) still usable even when they only accept
> CA signed certs by default. With a programming language, there's no such
> interactivity, so applications just break and users don't know why.
>

If not already possible, I suggest that we allow the use of a certificate
validation callback. (It isn't possible in 2.7; I just hacked one in
yesterday to allow me to ignore out-of-date failures for certificates.)
Using this, it would be possible to, e.g., emit warnings when certificate
failures occur, rather than deny the connection outright.

K

From scott+python-dev at scottdial.com Thu Jan 23 07:45:15 2014
From: scott+python-dev at scottdial.com (Scott Dial)
Date: Thu, 23 Jan 2014 01:45:15 -0500
Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation
In-Reply-To:
References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io>
Message-ID: <52E0BA7B.2000503@scottdial.com>

On 2014-01-22 9:33 AM, Donald Stufft wrote:
> For everything but pip, you'd add it to your OS cert store.
Pip doesn?t > use that so you?d have to use the ?cert config. What if I don't want that self-signed cert to be trusted by all users on the system? What if I don't have administrative rights? How do I do it then? Is this common knowledge for average users? Are we trading one big red box in the documentation for another? Anecdotally, I already know of a system at work that is using HTTPS purely for encryption, because the authentication is done in-band. So, a self-signed cert was wholly sufficient. The management tools use a RESTful interface over HTTPS for control, but you are telling me this will be broken by default now. What do I tell our developers (who often adopt the latest and greatest versions of things to play with)? -- Scott Dial scott at scottdial.com From stephen at xemacs.org Thu Jan 23 09:37:06 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 23 Jan 2014 17:37:06 +0900 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: <87bnz38571.fsf@uwakimon.sk.tsukuba.ac.jp> Cory Benfield writes: > I'm overwhelmingly, dramatically +1 on this. There's no good > architectural reason to not use the built-in certificate chains by > default. I'd like to be in favour of backporting this change to earlier > Python versions as well, but it feels just a bit too aggressive. -1 This is just a bit too aggressive, too. I'll guarantee this breaks applications all over Japan, especially in universities because the Ministry of Education uses certificates rooted somewhere nobody's ever heard of, and typically don't bother to ensure the domain name matches the cert being presented. I've even run into such domain-match issues with banks (not banks I deal with any more, of course!) This is quite different from web browsers and other interactive applications. It has the potential to break "secure" mail and news and other automatic data transfers. 
Breaking people's software that should run silently in the background just because they upgrade Python shouldn't happen, and people here will blame Python, not their broken websites and network apps. I don't know what the right answer is, but this needs careful discussion and amelioration, not just "you're broken, so take the consequences!" From stephen at xemacs.org Thu Jan 23 10:05:27 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 23 Jan 2014 18:05:27 +0900 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: <87a9en83vs.fsf@uwakimon.sk.tsukuba.ac.jp> Donald Stufft writes: > As an additional side note, anecdotal evidence and what not, but > *every* time I bring this up somewhere I get at least one reply > that looks similar to > https://twitter.com/ojiidotch/status/425986619879866368 Hey, wait a cotton-picking minute! Are you telling me that Perl, PHP, and Ruby *do* verify certs by default in their "batteries included" stdlibs, and developers using those languages have been turning that feature off in their code for, like, you know, well, for-EVER man!? (They surely don't leave it on, or my employer would have fixed their broken cert chain and hostnames by now.) If so, that's evidence for the practicality of the proposal, and maybe even for fast-tracking it to catch up. My employer and the Ministry of Education, Culture, Science, and Technology be damned (and they will be). But if it's only the already security-conscious developers and managers who go WTF?, and other environments don't do this by default, I'd consider that a "dangerous curve, slow down" sign. 
From martin at v.loewis.de Thu Jan 23 13:41:08 2014 From: martin at v.loewis.de (=?windows-1252?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 23 Jan 2014 13:41:08 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <52E0BA7B.2000503@scottdial.com> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52E0BA7B.2000503@scottdial.com> Message-ID: <52E10DE4.4070904@v.loewis.de> Am 23.01.14 07:45, schrieb Scott Dial: > Anecdotally, I already know of a system at work that is using HTTPS > purely for encryption, because the authentication is done in-band. So, a > self-signed cert was wholly sufficient. The management tools use a > RESTful interface over HTTPS for control, but you are telling me this > will be broken by default now. What do I tell our developers (who often > adopt the latest and greatest versions of things to play with)? If they play with the newest version before actually using it in production, all is well. You can then tell them that they have four options: - not upgrade to the newest Python release (at least not until they are willing to pursue any of the other alternatives) - update the code to disable cert validation, or explicitly add the self-signed cert as a trusted one programmatically. - update the client system configuration, to add the self-signed certificate as trusted (system-wide or per user). - update the server, to use a cert signed by one of the trusted CAs. Regards, Martin From ncoghlan at gmail.com Thu Jan 23 15:38:57 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 24 Jan 2014 00:38:57 +1000 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <52E10DE4.4070904@v.loewis.de> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52E0BA7B.2000503@scottdial.com> <52E10DE4.4070904@v.loewis.de> Message-ID: On 23 January 2014 22:41, "Martin v. 
Löwis" wrote: > Am 23.01.14 07:45, schrieb Scott Dial: >> Anecdotally, I already know of a system at work that is using HTTPS >> purely for encryption, because the authentication is done in-band. So, a >> self-signed cert was wholly sufficient. The management tools use a >> RESTful interface over HTTPS for control, but you are telling me this >> will be broken by default now. What do I tell our developers (who often >> adopt the latest and greatest versions of things to play with)? > > If they play with the newest version before actually using it in > production, all is well. You can then tell them that they have > four options: > - not upgrade to the newest Python release (at least not until > they are willing to pursue any of the other alternatives) > - update the code to disable cert validation, or explicitly > add the self-signed cert as a trusted one programmatically. > - update the client system configuration, to add the self-signed > certificate as trusted (system-wide or per user). > - update the server, to use a cert signed by one of the > trusted CAs. Or, depending on the exact transition plan, potentially set: PYTHONSSLDEFAULT=NOVERIFY (akin to the "no, really, don't randomise the hashes" option). That's the kind of question a PEP would be needed to thrash out, though. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Thu Jan 23 16:03:26 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 23 Jan 2014 16:03:26 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52E0BA7B.2000503@scottdial.com> Message-ID: <20140123160326.24fa8931@fsol> On Thu, 23 Jan 2014 01:45:15 -0500 Scott Dial wrote: > > Anecdotally, I already know of a system at work that is using HTTPS > purely for encryption, because the authentication is done in-band. So, a > self-signed cert was wholly sufficient.
The management tools use a > RESTful interface over HTTPS for control, but you are telling me this > will be broken by default now. What do I tell our developers (who often > adopt the latest and greatest versions of things to play with)? That the system may be vulnerable to MITM attacks? (depending on how the authentication is done) Regards Antoine. From solipsis at pitrou.net Thu Jan 23 16:04:50 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 23 Jan 2014 16:04:50 +0100 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> Message-ID: <20140123160450.15220ff1@fsol> On Thu, 23 Jan 2014 06:02:18 +0000 Kristján Valur Jónsson wrote: > > If not already possible, I suggest that we allow the use of a certificate validation callback > (it isn't possible for 2.7, I just hacked in one yesterday to allow me to ignore out-date-failure for certificates.) > Using this, it would be possible to e.g. emit warnings when certificate failures occur, rather than deny connection outright. That's a possible transition measure indeed. Regards Antoine. From wes.turner at gmail.com Thu Jan 23 08:24:38 2014 From: wes.turner at gmail.com (Wes Turner) Date: Thu, 23 Jan 2014 01:24:38 -0600 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <52E0BA7B.2000503@scottdial.com> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52E0BA7B.2000503@scottdial.com> Message-ID: On 2014-01-22 9:33 AM, Donald Stufft wrote: > For everything but pip, you'd add it to your OS cert store. Pip doesn't > use that so you'd have to use the --cert config. > What if I don't want that self-signed cert to be trusted by all users on the system? Specify a client cert and an appropriate CA bundle. > What if I don't have administrative rights? Specify an appropriate CA bundle for which there are appropriate permissions. > How do I do it then? Is this common knowledge for average users?
Are we trading one big red box in the documentation for another? That's really not the case. Current web browsers are not susceptible to this particular issue. > Anecdotally, I already know of a system at work that is using HTTPS > purely for encryption, because the authentication is done in-band. So, a > self-signed cert was wholly sufficient. The management tools use a > RESTful interface over HTTPS for control, but you are telling me this > will be broken by default now. What do I tell our developers (who often > adopt the latest and greatest versions of things to play with)? There are layers. OSI layers, if you prefer. It sounds like the relevant layers here are: * HTTP * SSL/TLS * TCP A MITM compromise of the channel (e.g. by a rogue security tester responding with a different SSL certificate that is not signed by a CA, with a different hostname than requested) renders most 'in-band' authentication mechanisms (such as HTTP Basic, Digest, and Cookie-based Sessions) invalid. 'Higher layers' generally operate by sharing tokens as plaintext. With SSL compromised through a MITM (as allowed by not validating hostnames by default), said security tester could trivially intercept and modify any of the requests and responses in the channel. CWE-300: Channel Accessible by Non-Endpoint ('Man-in-the-Middle') http://cwe.mitre.org/data/definitions/300.html To use a bad metaphor: it's like the carrier pigeon stops on the way home and there's no seal.
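For readers wondering what "specify an appropriate CA bundle" can look like in code, here is a minimal sketch using the standard ssl module (Python 3.4 or later). The helper name build_verifying_context and the idea of keeping a bundle in the user's home directory are assumptions made for illustration; neither was proposed in this thread.

```python
import ssl

def build_verifying_context(cafile=None):
    """Return a client-side context that checks the certificate chain and hostname.

    cafile is an optional path to a PEM bundle (hypothetical; it could live in
    the user's home directory, so no administrative rights are required).
    With cafile=None the platform's default trust store is used instead.
    """
    return ssl.create_default_context(cafile=cafile)

ctx = build_verifying_context()
# Both checks are on by default for a context built this way.
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

Under these assumptions, a self-signed server certificate could be passed as cafile so that only this one program trusts it, rather than every user on the system.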
From wes.turner at gmail.com Thu Jan 23 12:20:07 2014 From: wes.turner at gmail.com (Wes Turner) Date: Thu, 23 Jan 2014 05:20:07 -0600 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <87a9en83vs.fsf@uwakimon.sk.tsukuba.ac.jp> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <87a9en83vs.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: > But if it's only the already security-conscious developers and > managers who go WTF?, and other environments don't do this by default, > I'd consider that a "dangerous curve, slow down" sign. Mitigations: **Packaging** * Upgrade setuptools (distribute, zc.buildout) * Avoid easy_install, python setup.py install, and python setup.py develop (until it can be verified that the installed version of setuptools contains VerifyingHTTPSHandler [1]) https://bitbucket.org/pypa/setuptools/history-node/tip/setuptools/ssl_support.py * +1 for Pip install -e vcs+ssh://vcs at example.org/username/pkgname at semver@egg=pkgname * +1 for Conda * +1 for OS packages **Implementation** * Python < 3.4 : https://pypi.python.org/pypi/backports.ssl_match_hostname **Awareness** * Big red warning boxes: (.. warning:: in RST): Documentation * This must not be easy to test. * http://www.cvedetails.com/vulnerability-list/vendor_id-10210/product_id-18230/Python-Python.html -- Wes Turner On Thu, Jan 23, 2014 at 3:05 AM, Stephen J. Turnbull wrote: > Donald Stufft writes: > > > As an additional side note, anecdotal evidence and what not, but > > *every* time I bring this up somewhere I get at least one reply > > that looks similar to > > https://twitter.com/ojiidotch/status/425986619879866368 > > Hey, wait a cotton-picking minute! > > Are you telling me that Perl, PHP, and Ruby *do* verify certs by > default in their "batteries included" stdlibs, and developers using > those languages have been turning that feature off in their code for, > like, you know, well, for-EVER man!? 
(They surely don't leave it on, > or my employer would have fixed their broken cert chain and hostnames > by now.) > > If so, that's evidence for the practicality of the proposal, and maybe > even for fast-tracking it to catch up. My employer and the Ministry > of Education, Culture, Science, and Technology be damned (and they > will be). > > But if it's only the already security-conscious developers and > managers who go WTF?, and other environments don't do this by default, > I'd consider that a "dangerous curve, slow down" sign. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com From storchaka at gmail.com Thu Jan 23 18:22:56 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 23 Jan 2014 19:22:56 +0200 Subject: [Python-Dev] Wrong keyword parameter name in regex pattern methods Message-ID: <3184460.Lz2zyrXQ7D@raxxla> Currently there is a mismatch between documented parameter names in some methods of regex pattern object. match(), search(), and fullmatch() (the last was added in 3.4) document first arguments as "string": match(string[, pos[, endpos]]) search(string[, pos[, endpos]]) fullmatch(string[, pos[, endpos]]) But actually they don't accept the "string" keyword parameter, by mistake it is named as "pattern" in the code. findall() and split() document first arguments as "string": findall(string[, pos[, endpos]]) -> list split(string[, maxsplit = 0]) But actually they don't accept the "string" keyword parameter, it is named as "source" in the code. The scanner() method is not documented and also has the "source" parameter. All other methods accepts the "string" argument as documented. 
The match object returned by match(), search(), fullmatch(), and finditer() methods and generated by the scanner, has the "string" attribute which is equivalent to the argument of these methods. Module level functions which corresponds to these methods have the "string" parameter. Due to all these facts I think that parameter names "pattern" and "source" are accidental mistakes and should be renamed to expected "string". Because this parameter is mandatory, apparently it is always used as positional parameter, and this error was not discovered long time. http://bugs.python.org/issue20283 From ethan at stoneleaf.us Fri Jan 24 00:05:09 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 23 Jan 2014 15:05:09 -0800 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <5881F8C3-71F3-42FF-826C-DD4841CDC2E4@stufft.io> Message-ID: <52E1A025.6080505@stoneleaf.us> On 01/22/2014 04:15 AM, Donald Stufft wrote: > > As I've said multiple times, I think it's fine to send it through the > deprecation process which is still pretty long and gives people > a good chunk of time to update. Agreed. -- ~Ethan~ From ethan at stoneleaf.us Fri Jan 24 00:03:48 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 23 Jan 2014 15:03:48 -0800 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <52DFC4A2.4090408@egenix.com> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <52DFA2AE.8030504@egenix.com> <52DFABD4.3090702@egenix.com> <50AAFC8B-FB56-4FD6-B5BC-4D56C47A47C3@gmail.com> <52DFC4A2.4090408@egenix.com> Message-ID: <52E19FD4.2070403@stoneleaf.us> On 01/22/2014 05:16 AM, M.-A.
Lemburg wrote: > On 22.01.2014 13:43, Jesse Noller wrote: >> >> Donald is perfectly right: today, it's trivial to MITM an application >> that relies off of the current behavior; this is bad news bears for >> users and developers as it means they need domain knowledge to secure >> their applications by default they may not have. > > I don't think you need much domain knowledge to insert > a single line of code into applications to enable the checks. I find myself on the "dumb user" side of this argument, and I think it is much like the str/unicode transition of 3.0 -- which is to say, there are many who didn't understand unicode until forced to by 3.0, and likewise there will be many who don't understand security until forced to by enabling this new feature. One big difference is it's possible to opt-out of this security feature (which is a good thing, considering all the ill-configured systems out there). -- ~Ethan~ From tjreedy at udel.edu Fri Jan 24 00:56:46 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 23 Jan 2014 18:56:46 -0500 Subject: [Python-Dev] Wrong keyword parameter name in regex pattern methods In-Reply-To: <3184460.Lz2zyrXQ7D@raxxla> References: <3184460.Lz2zyrXQ7D@raxxla> Message-ID: On 1/23/2014 12:22 PM, Serhiy Storchaka wrote: > Currently there is a mismatch between documented parameter names in some > methods of regex pattern object. > > match(), search(), and fullmatch() (the last was added in 3.4) document first > arguments as "string": > > match(string[, pos[, endpos]]) > search(string[, pos[, endpos]]) > fullmatch(string[, pos[, endpos]]) > > But actually they don't accept the "string" keyword parameter, by mistake it > is named as "pattern" in the code. > > findall() and split() document first arguments as "string": > > findall(string[, pos[, endpos]]) -> list > split(string[, maxsplit = 0]) > > But actually they don't accept the "string" keyword parameter, it is named as > "source" in the code.
> > The scanner() method is not documented and also has the "source" parameter. > > All other methods accepts the "string" argument as documented. The match > object returned by match(), search(), fullmatch(), and finditer() methods and > generated by the scanner, has the "string" attribute which is equivalent to > the argument of these methods. Module level functions which corresponds to > these methods have the "string" parameter. > > Due to all these facts I think that parameter names "pattern" and "source" are > accidental mistakes and should be renamed to expected "string". Because this > parameter is mandatory, apparently it is always used as positional parameter, > and this error was not discovered long time. > > http://bugs.python.org/issue20283 I think we should correct the C code before we expose it more with clinic. Otherwise, help(re.x) will conflict with the re docs. -- Terry Jan Reedy From stephen at xemacs.org Fri Jan 24 04:06:17 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 24 Jan 2014 12:06:17 +0900 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <87a9en83vs.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <874n4u84eu.fsf@uwakimon.sk.tsukuba.ac.jp> Wes Turner writes: > > But if it's only the already security-conscious developers and > > managers who go WTF?, and other environments don't do this by default, > > I'd consider that a "dangerous curve, slow down" sign. > > Mitigations: > > **Packaging** > > * Upgrade setuptools (distribute, zc.buildout) > * Avoid easy_install, python setup.py install, and python setup.py develop > (until it can be verified that the installed version of setuptools contains > VerifyingHTTPSHandler [1]) Are you kidding? 
These *aren't* the apps that I care about breaking, and I know that the PHBs won't pay attention to what I say about fixing their sites and cert chains (believe me, I've tried, and the answer is as Paul Moore says: the users can punch the "go ahead anyway button," what's the big deal here?), they'll just deprecate Python. My question remains: > > Are you telling me that Perl, PHP, and Ruby *do* verify certs by > > default in their "batteries included" stdlibs, and developers using > > those languages have been turning that feature off in their code for, > > like, you know, well, for-EVER man!? I find that hard to believe, given that the security of the network remains broken yet there aren't warnings out to avoid these platforms. (BTW, my employer prides itself on being Matz's alma mater ... they actually might do something if Ruby was breaking things!) From donald at stufft.io Fri Jan 24 04:09:49 2014 From: donald at stufft.io (Donald Stufft) Date: Thu, 23 Jan 2014 22:09:49 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <874n4u84eu.fsf@uwakimon.sk.tsukuba.ac.jp> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <87a9en83vs.fsf@uwakimon.sk.tsukuba.ac.jp> <874n4u84eu.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <3DE7A7D2-07DA-43BB-93C1-D388A7326BF3@stufft.io> On Jan 23, 2014, at 10:06 PM, Stephen J. Turnbull wrote: > Wes Turner writes: >>> But if it's only the already security-conscious developers and >>> managers who go WTF?, and other environments don't do this by default, >>> I'd consider that a "dangerous curve, slow down" sign. >> >> Mitigations: >> >> **Packaging** >> >> * Upgrade setuptools (distribute, zc.buildout) >> * Avoid easy_install, python setup.py install, and python setup.py develop >> (until it can be verified that the installed version of setuptools contains >> VerifyingHTTPSHandler [1]) > > Are you kidding? 
These *aren't* the apps that I care about breaking, > and I know that the PHBs won't pay attention to what I say about > fixing their sites and cert chains (believe me, I've tried, and the > answer is as Paul Moore says: the users can punch the "go ahead anyway > button," what's the big deal here?), they'll just deprecate Python. > > My question remains: > >>> Are you telling me that Perl, PHP, and Ruby *do* verify certs by >>> default in their "batteries included" stdlibs, and developers using >>> those languages have been turning that feature off in their code for, >>> like, you know, well, for-EVER man!? > > I find that hard to believe, given that the security of the network > remains broken yet there aren't warnings out to avoid these platforms. > (BTW, my employer prides itself on being Matz's alma mater ... they > actually might do something if Ruby was breaking things!) Ruby has verified the peer by default since Ruby 1.9 Go also verifies by default, I'm not aware if PHP or Perl do. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From donald at stufft.io Fri Jan 24 04:19:28 2014 From: donald at stufft.io (Donald Stufft) Date: Thu, 23 Jan 2014 22:19:28 -0500 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <3DE7A7D2-07DA-43BB-93C1-D388A7326BF3@stufft.io> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <87a9en83vs.fsf@uwakimon.sk.tsukuba.ac.jp> <874n4u84eu.fsf@uwakimon.sk.tsukuba.ac.jp> <3DE7A7D2-07DA-43BB-93C1-D388A7326BF3@stufft.io> Message-ID: <89C306D2-44FC-404D-BD3C-B5C6F51F8CF8@stufft.io> On Jan 23, 2014, at 10:09 PM, Donald Stufft wrote: > > On Jan 23, 2014, at 10:06 PM, Stephen J.
Turnbull wrote: > >> Wes Turner writes: >>>> But if it's only the already security-conscious developers and >>>> managers who go WTF?, and other environments don't do this by default, >>>> I'd consider that a "dangerous curve, slow down" sign. >>> >>> Mitigations: >>> >>> **Packaging** >>> >>> * Upgrade setuptools (distribute, zc.buildout) >>> * Avoid easy_install, python setup.py install, and python setup.py develop >>> (until it can be verified that the installed version of setuptools contains >>> VerifyingHTTPSHandler [1]) >> >> Are you kidding? These *aren't* the apps that I care about breaking, >> and I know that the PHBs won't pay attention to what I say about >> fixing their sites and cert chains (believe me, I've tried, and the >> answer is as Paul Moore says: the users can punch the "go ahead anyway >> button," what's the big deal here?), they'll just deprecate Python. >> >> My question remains: >> >>>> Are you telling me that Perl, PHP, and Ruby *do* verify certs by >>>> default in their "batteries included" stdlibs, and developers using >>>> those languages have been turning that feature off in their code for, >>>> like, you know, well, for-EVER man!? >> >> I find that hard to believe, given that the security of the network >> remains broken yet there aren't warnings out to avoid these platforms. >> (BTW, my employer prides itself on being Matz's alma mater ... they >> actually might do something if Ruby was breaking things!) > > Ruby has verified the peer by default since Ruby 1.9 > > Go also verifies by default, I'm not aware if PHP or Perl do. Oh, Node.js also verifies by default, PHP apparently does not. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From drsalists at gmail.com Fri Jan 24 04:48:49 2014 From: drsalists at gmail.com (Dan Stromberg) Date: Thu, 23 Jan 2014 19:48:49 -0800 Subject: [Python-Dev] Python 3 marketing document? Message-ID: Has anyone published a web page or wiki page about what's great about Python 3.x? From ncoghlan at gmail.com Fri Jan 24 11:25:09 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 24 Jan 2014 20:25:09 +1000 Subject: [Python-Dev] Python 3 marketing document? In-Reply-To: References: Message-ID: I've tried to address the most common questions and misapprehensions in my Python 3 Q & A: http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Fri Jan 24 15:23:31 2014 From: brett at python.org (Brett Cannon) Date: Fri, 24 Jan 2014 09:23:31 -0500 Subject: [Python-Dev] Python 3 marketing document? In-Reply-To: References: Message-ID: On Thu, Jan 23, 2014 at 10:48 PM, Dan Stromberg wrote: > Has anyone published a web page or wiki page about what's great about > Python 3.x? > In case you want a video I gave a presentation at PyCon US 2013 on Python 3.3 and tried to cover everything from 3.0 on. http://www.youtube.com/watch?v=f_6vDi7ywuA -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Fri Jan 24 16:07:47 2014 From: larry at hastings.org (Larry Hastings) Date: Fri, 24 Jan 2014 07:07:47 -0800 Subject: [Python-Dev] Argument Clinic: what to do with builtins with non-standard signatures? Message-ID: <52E281C3.1050303@hastings.org> BACKGROUND (skippable if you're a know-it-all) Argument parsing for Python functions follows some very strict rules. 
Unless the function implements its own parsing like so: def black_box(*args, **kwargs): there are some semantics that are always true. For example: * Any parameter that has a default value is optional, and vice-versa. * It doesn't matter whether you pass in a parameter by name or by position, it behaves the same. * You can see the default values by examining its inspect.Signature. * Calling a function and passing in the default value for a parameter is identical to calling the function without that parameter. e.g. (assuming foo is a pure function): def foo(a=value): ... foo() == foo(value) == foo(a=value) With that signature, foo() literally can't tell the difference between those three calls. And it doesn't matter what the type of value is or where you got it. Python builtins are a little less regular. They effectively do their own parsing. So they *could* do any crazy thing they want. 99.9% of the time they do one of four standard things: * They parse their arguments with a single call to PyArg_ParseTuple(). * They parse their arguments with a single call to PyArg_ParseTupleAndKeywords(). * They take a single argument of type "object" (METH_O). * They take no arguments (METH_NOARGS). PyArg_ParseTupleAndKeywords() behaves almost exactly like a Python function. PyArg_ParseTuple() is a little less like a Python function, because it doesn't support keyword arguments. (Surely this behavior is familiar to you!) But then there's that funny 0.1%, the builtins that came up with their own unique approach for parsing arguments--giving them funny semantics. Argument Clinic tries to accommodate these as best it can. (That's why it supports "optional groups" for example.) But it can only do so much. THE PROBLEM Argument Clinic's original goal was to provide an introspection signature for every builtin in Python. But a small percentage of builtins have funny semantics that aren't expressible in a valid Python signature.
This makes them hard to convert to Argument Clinic, and makes their signature inaccurate. If we want these functions to have an accurate Python introspection signature, their argument parsing will have to change. THE QUESTION What should someone converting functions to Argument Clinic do when faced with one of these functions? Of course, the simplest answer is "nothing"--don't convert the function to Argument Clinic. We're in beta, and any change that isn't a bugfix is impermissible. We can try again for 3.5. But if "any change" is impermissible, then we wouldn't have the community support to convert to Argument Clinic right now. The community wants proper signatures for builtins badly enough that we're doing it now, even though we're already in beta for Python 3.4. Converting to Argument Clinic is, in the vast majority of cases, a straightforward and low-risk change--but it is *a* change. Therefore perhaps the answer isn't an automatic "no". Perhaps additional straightforward, low-risk changes are permissible. The trick is, what constitutes a straightforward, low-risk change? Where should we draw the line? Let's discuss it. Perhaps a consensus will form around an answer besides a flat "no". THE SPECIFICS I'm sorting the problems we see into four rough categories. a) Functions where there's a static Python value that behaves identically to not passing in that parameter (aka "the NULL problem") Example: _sha1.sha1(). Its optional parameter has a default value in C of NULL. We can't express NULL in a Python signature. However, it just so happens that _sha1.sha1(b'') is exactly equivalent to _sha1.sha1(). b'' makes for a fine replacement default value. Same holds for list.__init__(). its optional "sequence" parameter has a default value in C of NULL. But this signature: list.__init__(sequence=()) works fine. The way Clinic works, we can actually still use the NULL as the default value in C. 
Clinic will let you use completely different values as the published default value in Python and the real default value in C. (Consenting adults rule and all that.) So we could lie to Python and everything works just the way we want it to. Possible Solutions: 0) Do nothing, don't convert the function. 1) Use that clever static value as the default. b) Functions where there's no static Python value that behaves identically to not passing in that parameter (aka "the dynamic default problem") There are functions with parameters whose defaults are mildly dynamic, responding to other parameters. Example: I forget its name, but someone recently showed me a builtin that took a list as its first parameter, and its optional second parameter defaulted to the length of the list. As I recall this function didn't allow negative numbers, so -1 wasn't a good fit. Possible solutions: 0) Do nothing, don't convert the function. 1) Use a magic value as None. Preferably of the same type as the function accepts, but failing that use None. If they pass in the magic value use the previous default value. Guido himself suggested this in 2) Use an Argument Clinic "optional group". This only works for functions that don't support keyword arguments. Also, I hate this, because "optional groups" are not expressible in Python syntax, so these functions automatically have invalid signatures. c) Functions that accept an 'int' when they mean 'boolean' (aka the "ints instead of bools" problem) This is specific but surprisingly common. Before Python 3.3 there was no PyArg_ParseTuple format unit that meant "boolean value". Functions generally used "i" (int). Even older functions accepted an object and called PyLong_AsLong() on it. Passing in True or False for "i" (or PyLong_AsLong()) works, because boolean inherits from long. But anything other than ints and bools throws an exception. In Python 3.3 I added the "p" format unit for boolean arguments.
This calls PyObject_IsTrue() which accepts nearly any Python value. I assert that Python has a crystal clear definition of what constitutes "true" and "false". These parameters are clearly intended as booleans but they don't conform to the boolean protocol. So I suggest every instance of this is a (very mild!) bug. But changing these parameters to use "p" is a change: they'll accept many more values than before. Right now people convert these using 'int' because that's an exact match. But sometimes they are optional, and the person doing the conversion wants to use True or False as a default value, and it doesn't work: Argument Clinic's type enforcement complains and they have to work around it. (Argument Clinic has to enforce some type-safety here because the values are used as defaults for C variables.) I've been asked to allow True and False as defaults for "int" parameters specifically because of this. Example: str.splitlines(keepends) Solution: 1) Use "bool". 2) Use "int", and I'll go relax Argument Clinic so they can use bool values as defaults for int parameters. d) Functions with behavior that deliberately defy being expressed as a Python signature (aka the "untranslatable signature" problem) Example: itertools.repeat(), which behaves differently depending on whether "times" is supplied as a positional or keyword argument. (If "times" is <0, and was supplied via position, the function yields 0 times. If "times" is <0, and was supplied via keyword, the function yields infinitely-many times.) Solution: 0) Do nothing, don't convert the function. 1) Change the signature until it is Python compatible. This new signature *must* accept a superset of the arguments accepted by the existing signature. (This is being discussed right now in issue #19145.) //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From larry at hastings.org Fri Jan 24 16:19:48 2014 From: larry at hastings.org (Larry Hastings) Date: Fri, 24 Jan 2014 07:19:48 -0800 Subject: [Python-Dev] Argument Clinic: what to do with builtins with non-standard signatures? In-Reply-To: <52E281C3.1050303@hastings.org> References: <52E281C3.1050303@hastings.org> Message-ID: <52E28494.8060501@hastings.org> On 01/24/2014 07:07 AM, Larry Hastings wrote: > b) Functions where there's no static Python value that behaves > identically to > not passing in that parameter (aka "the dynamic default problem") Ouch! Sorry, I forgot a detail here. This can also be another form of NULL problem. For example, socket.socket.getservbyport() takes an optional "protocol" argument. Internally its default value is NULL. But there's really no good static string that we could use for the default. Guido specifically suggested accepting None here to mean "use the internal default" should be fine. But again this is a change, we're in beta, etc etc, discuss. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ram at rachum.com Fri Jan 24 16:50:22 2014 From: ram at rachum.com (Ram Rachum) Date: Fri, 24 Jan 2014 17:50:22 +0200 Subject: [Python-Dev] lambda (x, y): Message-ID: I don't like how in Python 3.x, you can't do this: lambda (x, y): whatever It's quite useful in Python 2 if I understand correctly, it's a side effect of such packed arguments not being allowed in function definitions. (i.e. def instead of lambda) Can you please refer me to the original discussion in which it was decided to remove this grammar in Python 3? I'd like to understand the arguments for it. Thanks, Ram. -------------- next part -------------- An HTML attachment was scrubbed... 
From brett at python.org Fri Jan 24 16:53:22 2014
From: brett at python.org (Brett Cannon)
Date: Fri, 24 Jan 2014 10:53:22 -0500
Subject: [Python-Dev] lambda (x, y):
In-Reply-To:
References:
Message-ID:

On Fri, Jan 24, 2014 at 10:50 AM, Ram Rachum wrote:

> I don't like how in Python 3.x, you can't do this:
>
>     lambda (x, y): whatever
>
> It's quite useful in Python 2
>
> if I understand correctly, it's a side effect of such packed arguments not
> being allowed in function definitions. (i.e. def instead of lambda)
>
> Can you please refer me to the original discussion in which it was decided
> to remove this grammar in Python 3? I'd like to understand the arguments
> for it.
>

http://python.org/dev/peps/pep-3113/

From eric at trueblade.com Fri Jan 24 16:57:28 2014
From: eric at trueblade.com (Eric V. Smith)
Date: Fri, 24 Jan 2014 10:57:28 -0500
Subject: [Python-Dev] lambda (x, y):
In-Reply-To:
References:
Message-ID: <52E28D68.2050902@trueblade.com>

On 1/24/2014 10:50 AM, Ram Rachum wrote:
> I don't like how in Python 3.x, you can't do this:
>
>     lambda (x, y): whatever
>
> It's quite useful in Python 2
>
> if I understand correctly, it's a side effect of such packed arguments
> not being allowed in function definitions. (i.e. def instead of lambda)

You can still do:

>>> fn = lambda x, y: x+y
>>> fn(20, 22)
42

It's just tuple unpacking which doesn't work.

> Can you please refer me to the original discussion in which it was
> decided to remove this grammar in Python 3? I'd like to understand the
> arguments for it.

See PEP 3113.
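For readers wondering what replaces `lambda (x, y): ...` once tuple parameters are gone, here is a minimal sketch of the usual Python 3 spellings. The names `pairs`, `by_sum`, `add_pair`, `add` and `totals` are made up for illustration and do not come from the thread.

```python
# Python 3 has no tuple parameters (see PEP 3113), so unpack explicitly.
pairs = [(3, 1), (1, 1), (2, 5)]

# Option 1: accept the tuple as one argument and index into it.
by_sum = sorted(pairs, key=lambda p: p[0] + p[1])

# Option 2: unpack on the first line of a named function.
def add_pair(pair):
    x, y = pair
    return x + y

# Option 3: keep a two-argument lambda and unpack at the call site.
add = lambda x, y: x + y
totals = [add(*p) for p in pairs]
```

Which spelling reads best depends on the call site; option 1 is the only one that works directly as a `key=` callback without a wrapper.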
From cory at lukasa.co.uk Fri Jan 24 09:22:54 2014 From: cory at lukasa.co.uk (Cory Benfield) Date: Fri, 24 Jan 2014 08:22:54 +0000 Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation In-Reply-To: <874n4u84eu.fsf@uwakimon.sk.tsukuba.ac.jp> References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <87a9en83vs.fsf@uwakimon.sk.tsukuba.ac.jp> <874n4u84eu.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 24 January 2014 03:06, Stephen J. Turnbull wrote: > Are you kidding? These *aren't* the apps that I care about breaking, > and I know that the PHBs won't pay attention to what I say about > fixing their sites and cert chains (believe me, I've tried, and the > answer is as Paul Moore says: the users can punch the "go ahead anyway > button," what's the big deal here?), they'll just deprecate Python. Surely the solution here isn't to say "well then, let's be insecure by default", it's to provide a "go ahead anyway" button. That at least lets us push the choice to be insecure by default onto someone else. The idea that an enterprise will deprecate Python instead of adding a single environment variable or one line at the top of their scripts seems hugely unlikely. As an example, Requests provides a "stop verifying certs" button, and that works fine for us. (I know that Requests is outside the stdlib and so it's not a perfect analogy, but it's serviceable.) I suspect most people who want this change don't care if users have an easy way to circumvent it, we just want to have the user/enterprise make that choice for themselves. Put another way, we want the average user to fall into a pit of success. 
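As a rough illustration of the "secure by default, with a go-ahead-anyway button" idea, the sketch below builds a verifying TLS context and relaxes it only when a switch is set. It assumes Python 3.4's `ssl.create_default_context()`. The environment variable `MYAPP_INSECURE_TLS` and the function `make_client_context` are hypothetical names invented for this example; they are not a setting proposed in the thread or provided by Python.

```python
import os
import ssl


def make_client_context(environ=None):
    """Return a TLS context that verifies by default, with an explicit opt-out."""
    if environ is None:
        environ = os.environ
    # Secure default: both the certificate chain and the hostname are checked.
    context = ssl.create_default_context()
    # MYAPP_INSECURE_TLS is a hypothetical switch, not a real Python setting.
    if environ.get("MYAPP_INSECURE_TLS") == "1":
        # Order matters: hostname checking must be disabled before CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```

The point of the design is that the insecure path requires a deliberate act by the user or the enterprise, while code that does nothing special gets validation.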
From cory at lukasa.co.uk Fri Jan 24 09:29:46 2014
From: cory at lukasa.co.uk (Cory Benfield)
Date: Fri, 24 Jan 2014 08:29:46 +0000
Subject: [Python-Dev] Enable Hostname and Certificate Chain Validation
In-Reply-To: <87bnz38571.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <148DFBC9-7A72-41DA-B0AE-EBFA1A26C759@stufft.io> <87bnz38571.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On 23 January 2014 08:37, Stephen J. Turnbull wrote:
>
> I don't know what the right answer is, but this needs careful
> discussion and amelioration, not just "you're broken, so take the
> consequences!"
>

Absolutely. =)

With that said, having a great big button to turn the change off (e.g. environment variable, global setting) feels like an acceptable mitigation to me, but then I'm probably still more aggressive than the core developers!

Either way, it looks like the deprecation dance is going to be the consensus decision here.

From wes.turner at gmail.com Fri Jan 24 15:36:47 2014
From: wes.turner at gmail.com (Wes Turner)
Date: Fri, 24 Jan 2014 08:36:47 -0600
Subject: [Python-Dev] Python 3 marketing document?
In-Reply-To:
References:
Message-ID:

Hardly marketing documents, but potentially useful nonetheless:

http://docs.python.org/3.4/whatsnew/index.html

https://bitbucket.org/gutworth/six/src/tip/six.py

https://github.com/nandoflorestan/nine/blob/master/nine/__init__.py

http://docs.python.org/3/library/2to3.html

https://pypi.python.org/pypi/backports.ssl_match_hostname

http://docs.python.org/3.4/library/asyncio.html

The new pathlib is pretty cool:

http://docs.python.org/3.4/whatsnew/3.4.html#new-modules

http://docs.python.org/3.4/library/pathlib.html#module-pathlib

Wes Turner

On Jan 23, 2014 9:49 PM, "Dan Stromberg" wrote:

> Has anyone published a web page or wiki page about what's great about
> Python 3.x?
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tseaver at palladion.com Fri Jan 24 17:10:24 2014 From: tseaver at palladion.com (Tres Seaver) Date: Fri, 24 Jan 2014 11:10:24 -0500 Subject: [Python-Dev] Argument Clinic: what to do with builtins with non-standard signatures? In-Reply-To: <52E281C3.1050303@hastings.org> References: <52E281C3.1050303@hastings.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/24/2014 10:07 AM, Larry Hastings wrote: > THE SPECIFICS > > I'm sorting the problems we see into four rough categories. > > a) Functions where there's a static Python value that behaves > identically to not passing in that parameter (aka "the NULL problem") > > Example: _sha1.sha1(). Its optional parameter has a default value in > C of NULL. We can't express NULL in a Python signature. However, it > just so happens that _sha1.sha1(b'') is exactly equivalent to > _sha1.sha1(). b'' makes for a fine replacement default value. > > Same holds for list.__init__(). its optional "sequence" parameter > has a default value in C of NULL. But this signature: > list.__init__(sequence=()) works fine. > > The way Clinic works, we can actually still use the NULL as the > default value in C. Clinic will let you use completely different > values as the published default value in Python and the real default > value in C. (Consenting adults rule and all that.) So we could lie to > Python and everything works just the way we want it to. > > Possible Solutions: 0) Do nothing, don't convert the function. 1) Use > that clever static value as the default. I prefer #1. 
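The equivalence claimed for category (a) is easy to check from Python. The sketch below is only an illustration: it assumes the public `hashlib.sha1` wrapper as a stand-in for `_sha1.sha1`, and the variable names are invented here.

```python
import hashlib

# Assumption: hashlib.sha1 stands in for the private _sha1.sha1 builtin.
no_argument = hashlib.sha1().hexdigest()
empty_bytes = hashlib.sha1(b"").hexdigest()

# Passing b"" behaves the same as omitting the argument, so b"" is a
# usable published default even though the C default is NULL.
same_digest = no_argument == empty_bytes

# The same holds for list(): an empty tuple is as good as no argument.
same_list = list() == list(())
```

If both comparisons hold, advertising `b''` or `()` as the default in the signature changes nothing a caller can observe.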
> b) Functions where there's no static Python value that behaves > identically to not passing in that parameter (aka "the dynamic default > problem") > > There are functions with parameters whose defaults are mildly > dynamic, responding to other parameters. > > Example: I forget its name, but someone recently showed me a builtin > that took a list as its first parameter, and its optional second > parameter defaulted to the length of the list. As I recall this > function didn't allow negative numbers, so -1 wasn't a good fit. > > Possible solutions: 0) Do nothing, don't convert the function. 1) Use > a magic value as None. Preferably of the same type as the function > accepts, but failing that use None. If they pass in the magic value > use the previous default value. Guido himself suggested this in 2) > Use an Argument Clinic "optional group". This only works for > functions that don't support keyword arguments. Also, I hate this, > because "optional groups" are not expressable in Python syntax, so > these functions automatically have invalid signatures. I prefer #1. > c) Functions that accept an 'int' when they mean 'boolean' (aka the > "ints instead of bools" problem) > > This is specific but surprisingly common. > > Before Python 3.3 there was no PyArg_ParseTuple format unit that > meant "boolean value". Functions generally used "i" (int). Even > older functions accepted an object and called PyLong_AsLong() on it. > Passing in True or False for "i" (or PyLong_AsLong()) works, because > boolean inherits from long. But anything other than ints and bools > throws an exception. > > In Python 3.3 I added the "p" format unit for boolean arguments. This > calls PyObject_IsTrue() which accepts nearly any Python value. > > I assert that Python has a crystal clear definition of what > constitutes "true" and "false". These parameters are clearly intended > as booleans but they don't conform to the boolean protocol. So I > suggest every instance of this is a (very mild!) 
bug. But changing > these parameters to use "p" is a change: they'll accept many more > values than before. > > Right now people convert these using 'int' because that's an exact > match. But sometimes they are optional, and the person doing the > conversion wants to use True or False as a default value, and it > doesn't work: Argument Clinic's type enforcement complains and they > have to work around it. (Argument Clinic has to enforce some > type-safety here because the values are used as defaults for C > variables.) I've been asked to allow True and False as defaults for > "int" parameters specifically because of this. > > Example: str.splitlines(keepends) > > Solution: 1) Use "bool". 2) Use "int", and I'll go relax Argument > Clinic so they can use bool values as defaults for int parameters. I prefer #1. > d) Functions with behavior that deliberately defy being expressed as > a Python signature (aka the "untranslatable signature" problem) > > Example: itertools.repeat(), which behaves differently depending on > whether "times" is supplied as a positional or keyword argument. (If > "times" is <0, and was supplied via position, the function yields 0 > times. If "times" is <0, and was supplied via keyword, the function > yields infinitely-many times.) > > Solution: 0) Do nothing, don't convert the function. 1) Change the > signature until it is Python compatible. This new signature *must* > accept a superset of the arguments accepted by the existing signature. > (This is being discussed right now in issue #19145.) I can't imagine justifying such an API design in the first place, but sometimes things "jest grew", rather than being designed. I'm in favor of # 1, in any case. If real backward compatibility is not feasible for some reason, then I would favor the following: 2) Deprecate the manky builtin, and leave it unconverted for AC; then add a new builtin with a sane signature, and re-implement the deprecated version as an impedance-matching shim over the new one. 
Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEUEARECAAYFAlLikGgACgkQ+gerLs4ltQ5UEgCYu13+7HfmwWw2hq7GrsBGM4I3
UACgz3WKVvqG1QkOsx8C3tiCjp5PkL0=
=2tLW
-----END PGP SIGNATURE-----

From storchaka at gmail.com Fri Jan 24 17:28:03 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 24 Jan 2014 18:28:03 +0200
Subject: [Python-Dev] Argument Clinic: what to do with builtins with non-standard signatures?
In-Reply-To: <52E281C3.1050303@hastings.org>
References: <52E281C3.1050303@hastings.org>
Message-ID:

24.01.14 17:07, Larry Hastings wrote:
> a) Functions where there's a static Python value that behaves
>    identically to not passing in that parameter (aka "the NULL problem")
[...]
> Possible Solutions:
>     0) Do nothing, don't convert the function.
>     1) Use that clever static value as the default.

I think #1 is a reasonable solution. The internals of a C function are just implementation details.

> b) Functions where there's no static Python value that behaves
> identically to
>    not passing in that parameter (aka "the dynamic default problem")
>
> There are functions with parameters whose defaults are mildly dynamic,
> responding to other parameters.
>
> Example:
>     I forget its name, but someone recently showed me a builtin that took
>     a list as its first parameter, and its optional second parameter
>     defaulted to the length of the list.  As I recall this function didn't
>     allow negative numbers, so -1 wasn't a good fit.
>
> Possible solutions:
>     0) Do nothing, don't convert the function.
>     1) Use a magic value as None.  Preferably of the same type as the
>        function accepts, but failing that use None.  If they pass in
>        the magic value use the previous default value.
Guido himself
>        suggested this in
>     2) Use an Argument Clinic "optional group".  This only works for
>        functions that don't support keyword arguments.  Also, I hate
>        this, because "optional groups" are not expressable in Python
>        syntax, so these functions automatically have invalid signatures.

This is list.index(self, item, start=0, stop=len(self)). Vajrasky Kok is working on this in issue20185 [1].

In this particular case we can use the default stop=sys.maxsize, as in many other places.

> c) Functions that accept an 'int' when they mean 'boolean' (aka the
>    "ints instead of bools" problem)
[...]
>     I assert that Python has a crystal clear definition of what
>     constitutes "true" and "false".  These parameters are clearly
>     intended as booleans but they don't conform to the boolean
>     protocol.  So I suggest every instance of this is a (very mild!)
>     bug.  But changing these parameters to use "p" is a change: they'll
>     accept many more values than before.

See issue15999 [2], which has been waiting 16 months for review.

> Solution:
>     1) Use "bool".
>     2) Use "int", and I'll go relax Argument Clinic so they
>        can use bool values as defaults for int parameters.

I use

    int(c_default="0") = False
    int(c_default="1") = True

See also rejected issue20282 [3].

> d) Functions with behavior that deliberately defy being expressed as a
>    Python signature (aka the "untranslatable signature" problem)
>
> Example:
>     itertools.repeat(), which behaves differently depending on whether
>     "times" is supplied as a positional or keyword argument.  (If
>     "times" is <0, and was supplied via position, the function yields
>     0 times. If "times" is <0, and was supplied via keyword, the
>     function yields infinitely-many times.)
>
> Solution:
>     0) Do nothing, don't convert the function.
>     1) Change the signature until it is Python compatible.  This new
>        signature *must* accept a superset of the arguments accepted
>        by the existing signature.  (This is being discussed right
>        now in issue #19145.)
In this particular case this is a bug and should be fixed irrespective of Argument Clinic.

If we implemented this function in pure Python, we would have used the sentinel idiom.

    _forever = object()

    def repeat(value, times=_forever):
        if times is _forever:
            ...
        else:
            ...

We need an equivalent to the sentinel idiom in Argument Clinic.

There is a fifth category: the default value is a C constant that is not exposed to Python. For example, in the zlib module:

    zlib.decompress(data, [wbits, [bufsize]])

From ram at rachum.com Fri Jan 24 17:32:17 2014
From: ram at rachum.com (Ram Rachum)
Date: Fri, 24 Jan 2014 18:32:17 +0200
Subject: [Python-Dev] str.rreplace
Message-ID:

Question: Why is there no str.rreplace in Python?

From storchaka at gmail.com Fri Jan 24 17:32:23 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 24 Jan 2014 18:32:23 +0200
Subject: [Python-Dev] Argument Clinic: what to do with builtins with non-standard signatures?
In-Reply-To:
References: <52E281C3.1050303@hastings.org>
Message-ID:

24.01.14 18:28, Serhiy Storchaka wrote:
> 24.01.14 17:07, Larry Hastings wrote:
>> a) Functions where there's a static Python value that behaves
>>    identically to not passing in that parameter (aka "the NULL problem")
> [...]
>> Possible Solutions:
>>     0) Do nothing, don't convert the function.
>>     1) Use that clever static value as the default.
>
> I think #1 is reasonable solution. Internals of C function are just
> implementation details.
>
>
>> b) Functions where there's no static Python value that behaves
>> identically to
>>    not passing in that parameter (aka "the dynamic default problem")
>>
>> There are functions with parameters whose defaults are mildly
>> dynamic,
>> responding to other parameters.
>> >> Example: >> I forget its name, but someone recently showed me a builtin that >> took >> a list as its first parameter, and its optional second parameter >> defaulted to the length of the list. As I recall this function >> didn't >> allow negative numbers, so -1 wasn't a good fit. >> >> Possible solutions: >> 0) Do nothing, don't convert the function. >> 1) Use a magic value as None. Preferably of the same type as the >> function accepts, but failing that use None. If they pass in >> the magic value use the previous default value. Guido himself >> suggested this in >> 2) Use an Argument Clinic "optional group". This only works for >> functions that don't support keyword arguments. Also, I hate >> this, because "optional groups" are not expressable in Python >> syntax, so these functions automatically have invalid >> signatures. > > This is list.index(self, item, start=0, stop=len(self). Vajrasky Kok > works on this in issue20185 [1]. > > In this particular case we can use default stop=sys.maxsize, as in many > other places. > > >> c) Functions that accept an 'int' when they mean 'boolean' (aka the >> "ints instead of bools" problem) > [...] >> I assert that Python has a crystal clear definition of what >> constitutes "true" and "false". These parameters are clearly >> intended as booleans but they don't conform to the boolean >> protocol. So I suggest every instance of this is a (very mild!) >> bug. But changing these parameters to use "p" is a change: they'll >> accept many more values than before. > > See issue15999 [2] which 16 months waits for review. > >> Solution: >> 1) Use "bool". >> 2) Use "int", and I'll go relax Argument Clinic so they >> can use bool values as defaults for int parameters. > > I use > > int(c_default="0") = False > int(c_default="1") = True > > See also rejected issue20282 [3]. 
> > >> d) Functions with behavior that deliberately defy being expressed as a >> Python signature (aka the "untranslatable signature" problem) >> >> Example: >> itertools.repeat(), which behaves differently depending on whether >> "times" is supplied as a positional or keyword argument. (If >> "times" is <0, and was supplied via position, the function yields >> 0 times. If "times" is <0, and was supplied via keyword, the >> function yields infinitely-many times.) >> >> Solution: >> 0) Do nothing, don't convert the function. >> 1) Change the signature until it is Python compatible. This new >> signature *must* accept a superset of the arguments accepted >> by the existing signature. (This is being discussed right >> now in issue #19145.) > > In this particular case this is a bug and should be fixed irrespective > of Argument Clinic. > > If we implemented this function in pure Python, we would have used the > sentinel idiom. > > _forever = object() > > def repeat(value, times=_forever): > if times is _forever: > ... > else: > ... > > We need an equivalent to the sentinel idiom in Argument Clinic. > > > There is fifth category. The default value is C constant which is not > exposed to Python. For example in the zlib module: > > zlib.decompress(data, [wbits, [bufsize]]) Oh, I have deleted links. [1] http://bugs.python.org/issue20185 [2] http://bugs.python.org/issue15999 [3] http://bugs.python.org/issue20282 From jnoller at gmail.com Fri Jan 24 17:37:12 2014 From: jnoller at gmail.com (Jesse Noller) Date: Fri, 24 Jan 2014 10:37:12 -0600 Subject: [Python-Dev] Python 3 marketing document? In-Reply-To: References: Message-ID: fwiw, I'm offering the keys/account/etc for getpython3.com to whomever has the time to keep it fresh and up to date. 
On Fri, Jan 24, 2014 at 8:36 AM, Wes Turner wrote: > Hardly marketing documents, but potentially useful nonetheless: > > http://docs.python.org/3.4/whatsnew/index.html > > https://bitbucket.org/gutworth/six/src/tip/six.py > > https://github.com/nandoflorestan/nine/blob/master/nine/__init__.py > > http://docs.python.org/3/library/2to3.html > > https://pypi.python.org/pypi/backports.ssl_match_hostname > > http://docs.python.org/3.4/library/asyncio.html > > The new pathlib is pretty cool: > > http://docs.python.org/3.4/whatsnew/3.4.html#new-modules > > http://docs.python.org/3.4/library/pathlib.html#module-pathlib > > Wes Turner > > On Jan 23, 2014 9:49 PM, "Dan Stromberg" wrote: >> >> Has anyone published a web page or wiki page about what's great about >> Python 3.x? >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/jnoller%40gmail.com > From solipsis at pitrou.net Fri Jan 24 17:38:14 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 24 Jan 2014 17:38:14 +0100 Subject: [Python-Dev] str.rreplace References: Message-ID: <20140124173814.78b62257@fsol> On Fri, 24 Jan 2014 18:32:17 +0200 Ram Rachum wrote: > Question: Why is there no str.rreplace in Python? What would it do? (also, I think such questions are better asked on python-ideas) Regards Antoine. 
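One plausible answer to "What would it do?", assuming the intent is `str.replace` counted from the right-hand end, can be written with existing string methods. The function below is a hypothetical helper invented for this note, not an existing or proposed `str` method, and the semantics are a guess at what was being asked for.

```python
def rreplace(s, old, new, count=-1):
    """Replace up to `count` occurrences of `old` in `s`, scanning from the right.

    A negative `count` replaces every occurrence, like str.replace.
    Note: an empty `old` raises ValueError, because rsplit rejects it.
    """
    # rsplit cuts at the rightmost `count` separators; join stitches the
    # pieces back together with the replacement text in between.
    return new.join(s.rsplit(old, count))
```

For example, `rreplace("a.b.c", ".", "-", 1)` changes only the last dot, which is the case a plain `str.replace` with a count cannot express.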
From rosuav at gmail.com Fri Jan 24 17:40:10 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 25 Jan 2014 03:40:10 +1100 Subject: [Python-Dev] str.rreplace In-Reply-To: <20140124173814.78b62257@fsol> References: <20140124173814.78b62257@fsol> Message-ID: On Sat, Jan 25, 2014 at 3:38 AM, Antoine Pitrou wrote: > On Fri, 24 Jan 2014 18:32:17 +0200 > Ram Rachum wrote: >> Question: Why is there no str.rreplace in Python? > > What would it do? > (also, I think such questions are better asked on python-ideas) Or python-list. Chances are there's a way to do it already, which would be of interest to other people who might be looking. But I've no idea what semantics are expected. :) ChrisA From breamoreboy at yahoo.co.uk Fri Jan 24 17:41:19 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 24 Jan 2014 16:41:19 +0000 Subject: [Python-Dev] str.rreplace In-Reply-To: References: Message-ID: On 24/01/2014 16:32, Ram Rachum wrote: > Question: Why is there no str.rreplace in Python? > It's not needed. Is this *REALLY* relevant to this list? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From ram at rachum.com Fri Jan 24 17:46:08 2014 From: ram at rachum.com (Ram Rachum) Date: Fri, 24 Jan 2014 18:46:08 +0200 Subject: [Python-Dev] str.rreplace In-Reply-To: <20140124173814.78b62257@fsol> References: <20140124173814.78b62257@fsol> Message-ID: You see, Antoine, *you* know that it's better asked on python-ideas because you know it doesn't exist in Python, therefore it's an idea for an addition. However, when a person like me asks this question, he does not know whether it exists or not, so he can't know whether he's proposing a new idea or whether it's something that exists under a different name or whether that's something that can't exist because of some unknown reason that the asker didn't think of. Now that I know it doesn't exist, I'll ask this on python-ideas. Thanks, Ram. 
On Fri, Jan 24, 2014 at 6:38 PM, Antoine Pitrou wrote: > On Fri, 24 Jan 2014 18:32:17 +0200 > Ram Rachum wrote: > > Question: Why is there no str.rreplace in Python? > > What would it do? > (also, I think such questions are better asked on python-ideas) > > Regards > > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/ram%40rachum.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Fri Jan 24 17:44:09 2014 From: wes.turner at gmail.com (Wes Turner) Date: Fri, 24 Jan 2014 10:44:09 -0600 Subject: [Python-Dev] Python 3 marketing document? In-Reply-To: References: Message-ID: Wes Turner On Jan 24, 2014 10:37 AM, "Jesse Noller" wrote: > > fwiw, I'm offering the keys/account/etc for getpython3.com to whomever > has the time to keep it fresh and up to date. It shouldn't be too difficult to add a GET JSON view to python3wos: https://python3wos.appspot.com > > On Fri, Jan 24, 2014 at 8:36 AM, Wes Turner wrote: > > Hardly marketing documents, but potentially useful nonetheless: > > > > http://docs.python.org/3.4/whatsnew/index.html > > > > https://bitbucket.org/gutworth/six/src/tip/six.py > > > > https://github.com/nandoflorestan/nine/blob/master/nine/__init__.py > > > > http://docs.python.org/3/library/2to3.html > > > > https://pypi.python.org/pypi/backports.ssl_match_hostname > > > > http://docs.python.org/3.4/library/asyncio.html > > > > The new pathlib is pretty cool: > > > > http://docs.python.org/3.4/whatsnew/3.4.html#new-modules > > > > http://docs.python.org/3.4/library/pathlib.html#module-pathlib > > > > Wes Turner > > > > On Jan 23, 2014 9:49 PM, "Dan Stromberg" wrote: > >> > >> Has anyone published a web page or wiki page about what's great about > >> Python 3.x? 
> >> _______________________________________________ > >> Python-Dev mailing list > >> Python-Dev at python.org > >> https://mail.python.org/mailman/listinfo/python-dev > >> Unsubscribe: > >> https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com > > > > > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > > https://mail.python.org/mailman/options/python-dev/jnoller%40gmail.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Fri Jan 24 17:45:14 2014 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Fri, 24 Jan 2014 10:45:14 -0600 Subject: [Python-Dev] str.rreplace In-Reply-To: References: Message-ID: http://stackoverflow.com/questions/2556108/how-to-replace-the-last-occurence-of-an-expression-in-a-string On Fri, Jan 24, 2014 at 10:32 AM, Ram Rachum wrote: > Question: Why is there no str.rreplace in Python? > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/rymg19%40gmail.com > > -- Ryan If anybody ever asks me why I prefer C++ to C, my answer will be simple: "It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was nul-terminated." -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Fri Jan 24 18:04:45 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 24 Jan 2014 12:04:45 -0500 Subject: [Python-Dev] str.rreplace In-Reply-To: References: Message-ID: On 1/24/2014 11:32 AM, Ram Rachum wrote: > Question: Why is there no str.rreplace in Python? Ram, this list is for discussing the development of the next few releases of CPython. General questions should go to python-list. 
-- Terry Jan Reedy From status at bugs.python.org Fri Jan 24 18:07:51 2014 From: status at bugs.python.org (Python tracker) Date: Fri, 24 Jan 2014 18:07:51 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20140124170751.52695560C4@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2014-01-17 - 2014-01-24) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 4475 (+38) closed 27678 (+54) total 32153 (+92) Open issues with patches: 2039 Issues opened (70) ================== #17390: display python version on idle title bar http://bugs.python.org/issue17390 reopened by ezio.melotti #18147: SSL: diagnostic functions to list loaded CA certs http://bugs.python.org/issue18147 reopened by r.david.murray #19081: zipimport behaves badly when the zip file changes while the pr http://bugs.python.org/issue19081 reopened by gregory.p.smith #20290: inspect module bug for modules with same filename http://bugs.python.org/issue20290 opened by cqxz #20291: Argument Clinic should understand *args and **kwargs parameter http://bugs.python.org/issue20291 opened by larry #20293: pydoc fails with the "unspecified" default value http://bugs.python.org/issue20293 opened by serhiy.storchaka #20295: imghdr add openexr support http://bugs.python.org/issue20295 opened by mvignali #20296: PyArg_ParseTuple 2.X docs mention int for "t#", but "Py_ssize_ http://bugs.python.org/issue20296 opened by rlb #20297: concurrent.futures.as_completed() installs waiters for already http://bugs.python.org/issue20297 opened by glangford #20303: Argument Clinic: optional groups http://bugs.python.org/issue20303 opened by serhiy.storchaka #20304: Argument Clinic: char convertor should use default values of t http://bugs.python.org/issue20304 opened by taleinat #20305: Android's incomplete locale.h implementation prevents cross-co http://bugs.python.org/issue20305 
opened by shiz #20306: Lack of pw_gecos field in Android's struct passwd causes cross http://bugs.python.org/issue20306 opened by shiz #20307: Android's failure to expose SYS_* system call constants causes http://bugs.python.org/issue20307 opened by shiz #20308: inspect.Signature doesn't support user classes without __init_ http://bugs.python.org/issue20308 opened by larry #20309: Not all method descriptors are callable http://bugs.python.org/issue20309 opened by larry #20311: epoll.poll(timeout) and PollSelector.select(timeout) must roun http://bugs.python.org/issue20311 opened by haypo #20314: Potentially confusing formulation in 6.1.4. Template strings http://bugs.python.org/issue20314 opened by Gerrit.Holl #20317: ExitStack hang if enough nested exceptions http://bugs.python.org/issue20317 opened by jcflack #20318: subprocess.Popen can hang in threaded applications http://bugs.python.org/issue20318 opened by Andrew.Lutomirski #20319: concurrent.futures.wait() can block forever even if Futures ha http://bugs.python.org/issue20319 opened by glangford #20320: select.select(timeout) and select.kqueue.control(timeout) must http://bugs.python.org/issue20320 opened by haypo #20322: Upgrade ensurepip's pip and setuptools http://bugs.python.org/issue20322 opened by dstufft #20323: Argument Clinic: docstring_prototype output causes build failu http://bugs.python.org/issue20323 opened by zach.ware #20325: Argument Clinic: self converters are not preserved when clonin http://bugs.python.org/issue20325 opened by taleinat #20326: Argument Clinic should use a non-error-prone syntax to mark te http://bugs.python.org/issue20326 opened by larry #20328: mailbox: http://bugs.python.org/issue20328 opened by jmtd #20329: zipfile.extractall fails in Posix shell with utf-8 filename http://bugs.python.org/issue20329 opened by Laurent.Mazuel #20330: PEP 342 is outdated http://bugs.python.org/issue20330 opened by msmhrt #20331: Fix various fd leaks http://bugs.python.org/issue20331 
opened by serhiy.storchaka #20333: argparse subparser usage message hides main parser usage http://bugs.python.org/issue20333 opened by Martin.d'Anjou #20334: make inspect Signature hashable http://bugs.python.org/issue20334 opened by yselivanov #20335: bytes constructor accepts more than one argument even if the f http://bugs.python.org/issue20335 opened by rndblnch #20336: test_asyncio: relax timings even more http://bugs.python.org/issue20336 opened by skrah #20337: bdist_rpm should support %config(noreplace) http://bugs.python.org/issue20337 opened by kv #20338: Idle: increase max calltip width http://bugs.python.org/issue20338 opened by terry.reedy #20339: Make bytes() use tp_as_buffer for cmp http://bugs.python.org/issue20339 opened by nascheme #20341: Argument Clinic: add "nullable ints" http://bugs.python.org/issue20341 opened by larry #20342: Endianness not detected correctly due to AC_RUN_IFELSE macros http://bugs.python.org/issue20342 opened by zaytsev #20344: subprocess.check_output() docs misrepresent what shell=True do http://bugs.python.org/issue20344 opened by klausman #20345: Better logging defaults http://bugs.python.org/issue20345 opened by ArneBab #20346: Argument Clinic: missing entry in table mapping legacy convert http://bugs.python.org/issue20346 opened by taleinat #20348: Argument Clinic HOWTO listed multiple times in HOWTO index http://bugs.python.org/issue20348 opened by brett.cannon #20349: Argument Clinic: error on __new__ or __init__ with no argument http://bugs.python.org/issue20349 opened by taleinat #20350: Replace tkapp.split() to tkapp.splitlist() http://bugs.python.org/issue20350 opened by serhiy.storchaka #20351: Add doc examples for DictReader and DictWriter http://bugs.python.org/issue20351 opened by charlax #20352: Add support for AUTH command to poplib http://bugs.python.org/issue20352 opened by dveeden #20353: Hanging bug with multiprocessing + sqlite3 + tkinter (OS X 10. 
http://bugs.python.org/issue20353 opened by Craig.Silverstein #20354: tracemalloc causes segfault in "make profile-opt" http://bugs.python.org/issue20354 opened by matejcik #20355: -W command line options should have higher priority than PYTHO http://bugs.python.org/issue20355 opened by Arfrever #20356: fix formatting of positional-only parameters in inspect.Signat http://bugs.python.org/issue20356 opened by yselivanov #20357: Mention buildbots in the core dev section of the devguide http://bugs.python.org/issue20357 opened by ncoghlan #20358: test_curses is failing http://bugs.python.org/issue20358 opened by larry #20360: inspect.Signature could provide readable expressions for defau http://bugs.python.org/issue20360 opened by larry #20361: -W command line options and PYTHONWARNINGS environmental varia http://bugs.python.org/issue20361 opened by Arfrever #20362: longMessage attribute is ignored in unittest.TestCase.assertRe http://bugs.python.org/issue20362 opened by Dhara #20363: BytesWarnings triggerred by test suite http://bugs.python.org/issue20363 opened by Arfrever #20364: Rename & explain sqlite3.Cursor.execute 'parameters' param http://bugs.python.org/issue20364 opened by terry.reedy #20366: SQLite FTS (full text search) http://bugs.python.org/issue20366 opened by mark #20367: concurrent.futures.as_completed() fails when given duplicate F http://bugs.python.org/issue20367 opened by glangford #20368: Tkinter: handle the null character http://bugs.python.org/issue20368 opened by serhiy.storchaka #20369: concurrent.futures.wait() blocks forever when given duplicate http://bugs.python.org/issue20369 opened by glangford #20371: datetime.datetime.replace bypasses a subclass's __new__ http://bugs.python.org/issue20371 opened by Andrew.Lutomirski #20372: inspect.getfile should raise a TypeError if C object does not http://bugs.python.org/issue20372 opened by yselivanov #20373: Use test.script_helper.assert_python_ok() instead of subproces 
http://bugs.python.org/issue20373 opened by Arfrever #20375: ElementTree: Document handling processing instructions http://bugs.python.org/issue20375 opened by nikolaus.rath #20376: Argument Clinic: backslashes in docstrings are not escaped http://bugs.python.org/issue20376 opened by serhiy.storchaka #20378: Implement `Signature.__repr__` http://bugs.python.org/issue20378 opened by cool-RR #20379: help(bound_builtin_class) does not display self http://bugs.python.org/issue20379 opened by larry #20381: Argument Clinic: expression default arguments broken http://bugs.python.org/issue20381 opened by zach.ware Most recent 15 issues with no replies (15) ========================================== #20381: Argument Clinic: expression default arguments broken http://bugs.python.org/issue20381 #20379: help(bound_builtin_class) does not display self http://bugs.python.org/issue20379 #20378: Implement `Signature.__repr__` http://bugs.python.org/issue20378 #20375: ElementTree: Document handling processing instructions http://bugs.python.org/issue20375 #20373: Use test.script_helper.assert_python_ok() instead of subproces http://bugs.python.org/issue20373 #20372: inspect.getfile should raise a TypeError if C object does not http://bugs.python.org/issue20372 #20368: Tkinter: handle the null character http://bugs.python.org/issue20368 #20366: SQLite FTS (full text search) http://bugs.python.org/issue20366 #20362: longMessage attribute is ignored in unittest.TestCase.assertRe http://bugs.python.org/issue20362 #20357: Mention buildbots in the core dev section of the devguide http://bugs.python.org/issue20357 #20351: Add doc examples for DictReader and DictWriter http://bugs.python.org/issue20351 #20350: Replace tkapp.split() to tkapp.splitlist() http://bugs.python.org/issue20350 #20349: Argument Clinic: error on __new__ or __init__ with no argument http://bugs.python.org/issue20349 #20346: Argument Clinic: missing entry in table mapping legacy convert 
http://bugs.python.org/issue20346 #20339: Make bytes() use tp_as_buffer for cmp http://bugs.python.org/issue20339 Most recent 15 issues waiting for review (15) ============================================= #20375: ElementTree: Document handling processing instructions http://bugs.python.org/issue20375 #20373: Use test.script_helper.assert_python_ok() instead of subproces http://bugs.python.org/issue20373 #20372: inspect.getfile should raise a TypeError if C object does not http://bugs.python.org/issue20372 #20369: concurrent.futures.wait() blocks forever when given duplicate http://bugs.python.org/issue20369 #20368: Tkinter: handle the null character http://bugs.python.org/issue20368 #20367: concurrent.futures.as_completed() fails when given duplicate F http://bugs.python.org/issue20367 #20364: Rename & explain sqlite3.Cursor.execute 'parameters' param http://bugs.python.org/issue20364 #20363: BytesWarnings triggerred by test suite http://bugs.python.org/issue20363 #20361: -W command line options and PYTHONWARNINGS environmental varia http://bugs.python.org/issue20361 #20356: fix formatting of positional-only parameters in inspect.Signat http://bugs.python.org/issue20356 #20355: -W command line options should have higher priority than PYTHO http://bugs.python.org/issue20355 #20354: tracemalloc causes segfault in "make profile-opt" http://bugs.python.org/issue20354 #20352: Add support for AUTH command to poplib http://bugs.python.org/issue20352 #20351: Add doc examples for DictReader and DictWriter http://bugs.python.org/issue20351 #20350: Replace tkapp.split() to tkapp.splitlist() http://bugs.python.org/issue20350 Top 10 most discussed issues (10) ================================= #20341: Argument Clinic: add "nullable ints" http://bugs.python.org/issue20341 30 msgs #20218: Add `pathlib.Path.write` and `pathlib.Path.read` http://bugs.python.org/issue20218 26 msgs #20311: epoll.poll(timeout) and PollSelector.select(timeout) must roun 
http://bugs.python.org/issue20311 26 msgs #17481: inspect.getfullargspec should use __signature__ http://bugs.python.org/issue17481 23 msgs #20180: Derby #11: Convert 50 sites to Argument Clinic across 9 files http://bugs.python.org/issue20180 14 msgs #20293: pydoc fails with the "unspecified" default value http://bugs.python.org/issue20293 12 msgs #20338: Idle: increase max calltip width http://bugs.python.org/issue20338 12 msgs #20308: inspect.Signature doesn't support user classes without __init_ http://bugs.python.org/issue20308 10 msgs #20177: Derby #8: Convert 28 sites to Argument Clinic across 2 files http://bugs.python.org/issue20177 9 msgs #20185: Derby #17: Convert 49 sites to Argument Clinic across 13 files http://bugs.python.org/issue20185 9 msgs Issues closed (55) ================== #7883: CallTips.py _find_constructor does not work http://bugs.python.org/issue7883 closed by terry.reedy #16630: IDLE: Calltip fails if __getattr__ raises exception http://bugs.python.org/issue16630 closed by terry.reedy #16638: support multi-line docstring signatures in IDLE calltips http://bugs.python.org/issue16638 closed by terry.reedy #16655: IDLE list.append calltips test failures http://bugs.python.org/issue16655 closed by terry.reedy #17811: Improve os.readv() and os.writev() documentation and docstring http://bugs.python.org/issue17811 closed by python-dev #17814: Popen.stdin/stdout/stderr documentation should mention object http://bugs.python.org/issue17814 closed by python-dev #18574: BaseHTTPRequestHandler.handle_expect_100() sends invalid respo http://bugs.python.org/issue18574 closed by python-dev #19020: Regression: Windows-tkinter-idle, unicode, and 0xxx filename http://bugs.python.org/issue19020 closed by terry.reedy #19036: setlocale fails due to locale.h being wrapped up in LANGINFO c http://bugs.python.org/issue19036 closed by skrah #19291: Add docs for asyncio package (Tulip, PEP 3156) http://bugs.python.org/issue19291 closed by haypo #19584: IDLE 
fails - Python V2.7.6 - 64b on Win7 64b http://bugs.python.org/issue19584 closed by terry.reedy #19936: Executable permissions of Python source files http://bugs.python.org/issue19936 closed by serhiy.storchaka #20024: Py_BuildValue() can call Python code with an exception set http://bugs.python.org/issue20024 closed by haypo #20122: Move CallTips tests to idle_tests http://bugs.python.org/issue20122 closed by terry.reedy #20165: unittest TestResult wasSuccessful returns True when there are http://bugs.python.org/issue20165 closed by gregory.p.smith #20189: inspect.Signature doesn't recognize all builtin types http://bugs.python.org/issue20189 closed by larry #20194: Add :deprecated: marker to formatter module docs http://bugs.python.org/issue20194 closed by brett.cannon #20195: Add :deprecated: marker to imp docs http://bugs.python.org/issue20195 closed by brett.cannon #20222: unittest.mock-examples doc uses builtin file which is removed http://bugs.python.org/issue20222 closed by terry.reedy #20238: Incomplete gzip output with tarfile.open(fileobj=..., mode="w: http://bugs.python.org/issue20238 closed by serhiy.storchaka #20243: ReadError when open a tarfile for writing http://bugs.python.org/issue20243 closed by serhiy.storchaka #20244: Possible resources leak in tarfile.open() http://bugs.python.org/issue20244 closed by serhiy.storchaka #20245: Check empty mode in TarFile.*open() http://bugs.python.org/issue20245 closed by serhiy.storchaka #20252: Argument Clinic howto: small typo in y# translation http://bugs.python.org/issue20252 closed by rmsr #20262: Convert some debugging prints in zipfile to warnings http://bugs.python.org/issue20262 closed by serhiy.storchaka #20270: urllib.parse doesn't work with empty port http://bugs.python.org/issue20270 closed by serhiy.storchaka #20275: asyncio: remove debug code from BaseEventLoop http://bugs.python.org/issue20275 closed by python-dev #20280: add "predicate" to the glossary http://bugs.python.org/issue20280 closed 
by flox #20282: Argument Clinic: int with boolean default http://bugs.python.org/issue20282 closed by larry #20287: Argument Clinic: support diverting output to buffer, external http://bugs.python.org/issue20287 closed by larry #20292: clinic.py str_converter with encoding throws exception http://bugs.python.org/issue20292 closed by larry #20294: Argument Clinic: add support for __init__ http://bugs.python.org/issue20294 closed by larry #20298: json library needs a non-strict option to decode single-quoted http://bugs.python.org/issue20298 closed by rhettinger #20299: Argument Clinic CConverter.__init__() overrides c_default and http://bugs.python.org/issue20299 closed by larry #20300: Argument Clinic raises exception for custom converter with def http://bugs.python.org/issue20300 closed by larry #20301: Correct docs for default access argument for DeleteKeyEx http://bugs.python.org/issue20301 closed by zach.ware #20302: Argument Clinic: meaningful names for group flags http://bugs.python.org/issue20302 closed by larry #20310: Recommend using pprint for deterministic doctest http://bugs.python.org/issue20310 closed by r.david.murray #20312: A missing link to Python-3.4.0b2.tar.bz2 in the download page. 
http://bugs.python.org/issue20312 closed by georg.brandl #20313: inspect.signature should raise ValueError for builtins with no http://bugs.python.org/issue20313 closed by larry #20315: Remove old compatibility code http://bugs.python.org/issue20315 closed by serhiy.storchaka #20316: Byte-compiled files should be absent in tarballs http://bugs.python.org/issue20316 closed by benjamin.peterson #20321: ImportError when a module is created after a catched ImportErr http://bugs.python.org/issue20321 closed by brett.cannon #20324: gcc-4.2.4 support on python-3.3 (libmpdec) http://bugs.python.org/issue20324 closed by georg.brandl #20327: Argument Clinic: setting internal variable names for parsed ar http://bugs.python.org/issue20327 closed by zach.ware #20332: Argument Clinic docs do not list support for the 'l' format http://bugs.python.org/issue20332 closed by larry #20340: -bb option does not have different behavior than -b option http://bugs.python.org/issue20340 closed by Arfrever #20343: zipfile truncates extracted files http://bugs.python.org/issue20343 closed by serhiy.storchaka #20347: dir(__future__) gives segfault on OS X in 3.2 and 3.3 http://bugs.python.org/issue20347 closed by r.david.murray #20359: Having escape sequences (like color codes) in the sys.ps1 mess http://bugs.python.org/issue20359 closed by georg.brandl #20365: test_asyncio.test_read_pty_output() fails on "AMD64 Snow Leop http://bugs.python.org/issue20365 closed by python-dev #20370: Docs error in Library reference-17.1.3 threading.Lock.acquire( http://bugs.python.org/issue20370 closed by yonoho #20374: Failure to compile with readline-6.3-rc1 http://bugs.python.org/issue20374 closed by python-dev #20377: Argument Clinic: get rid of the "_impl" suffix http://bugs.python.org/issue20377 closed by larry #20380: __defaults__ changed by *args http://bugs.python.org/issue20380 closed by larry From ram at rachum.com Fri Jan 24 18:19:26 2014 From: ram at rachum.com (Ram Rachum) Date: Fri, 24 Jan 2014 
19:19:26 +0200 Subject: [Python-Dev] str.rreplace In-Reply-To: References: Message-ID: Hmm, on one hand I understand the need for the separation between python-dev and python-list, but on the other hand I don't think python-list is a good place to discuss Python, the language. I now looked at the 17 most recent python-list threads. Out of them: - 58% are about third-party packages. - 17% are off-topic (not even programming related) - 11% are 2-vs-3 discussions - 5% are job offers. - 5% (which is just one thread out of 17) is about Python the language. So can you understand why someone would be reluctant to start a discussion in python-list about Python the language? Especially if this is the same place where beginners might ask newbie questions about Python? (So not only are actual Python questions just 5% of the content, non-newbie questions are just a subset of that 5%.) It's full of people asking about third-party Python packages, or asking newbie questions. On Fri, Jan 24, 2014 at 7:04 PM, Terry Reedy wrote: > On 1/24/2014 11:32 AM, Ram Rachum wrote: > >> Question: Why is there no str.rreplace in Python? >> > > Ram, this list is for discussing the development of the next few releases > of CPython. General questions should go to python-list. > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > ram%40rachum.com > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From brett at python.org Fri Jan 24 18:31:11 2014 From: brett at python.org (Brett Cannon) Date: Fri, 24 Jan 2014 12:31:11 -0500 Subject: [Python-Dev] str.rreplace In-Reply-To: References: <20140124173814.78b62257@fsol> Message-ID: On Fri, Jan 24, 2014 at 11:46 AM, Ram Rachum wrote: > You see, Antoine, *you* know that it's better asked on python-ideas > because you know it doesn't exist in Python, therefore it's an idea for an > addition. However, when a person like me asks this question, he does not > know whether it exists or not, so he can't know whether he's proposing a > new idea or whether it's something that exists under a different name or > whether that's something that can't exist because of some unknown reason > that the asker didn't think of. > > Now that I know it doesn't exist, I'll ask this on python-ideas. > I think there might be a language issue here because you originally said "Why is there no str.rreplace in Python?" which shows you already knew it didn't exist. Did you mean to say you wanted to know *why* it didn't exist? Even in that case, if searching for [python str.rreplace] didn't turn up anything then chances are there was no proposal, which makes it a new idea and thus belongs on python-ideas. Basically the rule of thumb is anything considered new goes to python-ideas first. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ram at rachum.com Fri Jan 24 18:33:23 2014 From: ram at rachum.com (Ram Rachum) Date: Fri, 24 Jan 2014 19:33:23 +0200 Subject: [Python-Dev] str.rreplace In-Reply-To: References: <20140124173814.78b62257@fsol> Message-ID: I knew it didn't exist by that name, but couldn't know whether there was another function that did the same thing or technique to make it not needed. So I couldn't know whether it's new or not, therefore I couldn't know whether it should be on python-ideas or not. 
On Fri, Jan 24, 2014 at 7:31 PM, Brett Cannon wrote: > > > > On Fri, Jan 24, 2014 at 11:46 AM, Ram Rachum wrote: > >> You see, Antoine, *you* know that it's better asked on python-ideas >> because you know it doesn't exist in Python, therefore it's an idea for an >> addition. However, when a person like me asks this question, he does not >> know whether it exists or not, so he can't know whether he's proposing a >> new idea or whether it's something that exists under a different name or >> whether that's something that can't exist because of some unknown reason >> that the asker didn't think of. >> >> Now that I know it doesn't exist, I'll ask this on python-ideas. >> > > I think there might be a language issue here because you originally said "Why > is there no str.rreplace in Python?" which shows you already knew it didn't > exist. Did you mean to say you wanted to know *why* it didn't exist? > > Even in that case, if searching for [python str.rreplace] didn't turn up > anything then chances are there was no proposal, which makes it a new idea > and thus belongs on python-ideas. Basically the rule of thumb is anything > considered new goes to python-ideas first. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Fri Jan 24 18:40:05 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 24 Jan 2014 17:40:05 +0000 Subject: [Python-Dev] str.rreplace In-Reply-To: References: Message-ID: On 24/01/2014 17:19, Ram Rachum wrote: > Hmm, on one hand I understand the need for the separation between > python-dev and python-list, but on the other hand I don't think > python-list is a good place to discuss Python, the language. > > I now looked at the 17 most recent python-list threads. Out of them: > > - 58% are about third-party packages. > - 17% are off-topic (not even programming related) > - 11% are 2-vs-3 discussions > - 5% are job offers. 
> - 5% (which is just one thread out of 17) is about Python the language. > I'm extremely impressed by your knowledge of statistics, it must have taken you many man years of effort to analyse all 17 threads in such detail. > So can you understand why someone would be reluctant to start a > discussion in python-list about Python the language there? Especially if > this is the same place where beginners might ask newbies questions about > Python? (So not only are actual Python questions just 5% of the content, > non-newbie questions are just a subset of that 5%.) > > it's full of people asking about third-party Python packages, or asking > newbie questions. > How terrible, fancy having the audacity to ask about third party packages or newbie questions on the *MAIN* Python mailing list. There's yet another reason to bring back the death penalty in the UK. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From brett at python.org Fri Jan 24 18:41:11 2014 From: brett at python.org (Brett Cannon) Date: Fri, 24 Jan 2014 12:41:11 -0500 Subject: [Python-Dev] str.rreplace In-Reply-To: References: <20140124173814.78b62257@fsol> Message-ID: On Fri, Jan 24, 2014 at 12:33 PM, Ram Rachum wrote: > I knew it didn't exist by that name, but couldn't know whether there was > another function that did the same thing or technique to make it not needed. > > So I couldn't know whether it's new or not, therefore I couldn't know > whether it should be on python-ideas or not. > So then you were simply wondering about its existence, for which you should go to python-list or python-ideas first. Python-ideas exists *explicitly* as a filter for this kind of question which is why people are saying it should have gone there first (or to python-list). If you have any doubt as to whether a question should go here or not, then err on the side of caution and post to python-ideas or python-list first. 
-Brett > > > > On Fri, Jan 24, 2014 at 7:31 PM, Brett Cannon wrote: > >> >> >> >> On Fri, Jan 24, 2014 at 11:46 AM, Ram Rachum wrote: >> >>> You see, Antoine, *you* know that it's better asked on python-ideas >>> because you know it doesn't exist in Python, therefore it's an idea for an >>> addition. However, when a person like me asks this question, he does not >>> know whether it exists or not, so he can't know whether he's proposing a >>> new idea or whether it's something that exists under a different name or >>> whether that's something that can't exist because of some unknown reason >>> that the asker didn't think of. >>> >>> Now that I know it doesn't exist, I'll ask this on python-ideas. >>> >> >> I think there might be a language issue here because you originally said "Why >> is there no str.rreplace in Python?" which shows you already knew it didn't >> exist. Did you mean to say you wanted to know *why* it didn't exist? >> >> Even in that case, if searching for [python str.rreplace] didn't turn up >> anything then chances are there was no proposal, which makes it a new idea >> and thus belongs on python-ideas. Basically the rule of thumb is anything >> considered new goes to python-ideas first. >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Fri Jan 24 18:42:39 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 24 Jan 2014 12:42:39 -0500 Subject: [Python-Dev] str.rreplace In-Reply-To: References: Message-ID: On 1/24/2014 12:19 PM, Ram Rachum wrote: > Hmm, on one hand I understand the need for the separation between > python-dev and python-list, but on the other hand I don't think > python-list is a good place to discuss Python, the language. Python-list is the place for such discussions. Questions such as yours are common. I have been reading it for almost 17 years. 
-- Terry Jan Reedy From rosuav at gmail.com Fri Jan 24 18:46:39 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 25 Jan 2014 04:46:39 +1100 Subject: [Python-Dev] str.rreplace In-Reply-To: References: Message-ID: On Sat, Jan 25, 2014 at 4:19 AM, Ram Rachum wrote: > I now looked at the 17 most recent python-list threads. Out of them: > > - 58% are about third-party packages. > - 17% are off-topic (not even programming related) > - 11% are 2-vs-3 discussions > - 5% are job offers. > - 5% (which is just one thread out of 17) is about Python the language. My analysis here is based on what I see arriving in Gmail, so some of them may have been dropped into spam. But these are the threads with the most recent posts: "The potential for a Python 2.8" - discussing the language, though the last few posts drifted off into numeric jokes (also fun). "Class and instance related questions" - short thread but completely on topic (so far) "Python declarative" - not all the code shown has been Python, and a lot of the discussion centers around alternatives like XML and JSON, but it's definitely focused on Python "datetime as subclass of date" - on topic "Can post a code but afraid of plagiarism" - haven't been following it, but last I saw it was on topic "Elementree and insert new element if it is not present" - might count as discussion of a separate module, I guess "generate De Bruijn sequence memory and string vs lists" - all about how to do it in Python, looks on topic to me "Need Help with Programming Science Project" - the OP never said that the program was to be in Python, but if we assume that, it's completely on topic Further down than that we have a few about SQLite, which Python comes with, and an announcement of a new version of Dipy. Far as I can see, that's only two threads that are truly about third-party modules (that and lxml). Yes, there's some noise on the list, but it's not as bad as 16/17ths of the threads. 
Maybe you're reading it in some way other than the mailing list, and it accrues more noise? ChrisA From ram at rachum.com Fri Jan 24 18:46:56 2014 From: ram at rachum.com (Ram Rachum) Date: Fri, 24 Jan 2014 19:46:56 +0200 Subject: [Python-Dev] str.rreplace In-Reply-To: References: <20140124173814.78b62257@fsol> Message-ID: Okay, next time I'll ask on python-ideas. (I do hope that no one there will be angry that I'm posting a question there rather than an idea...) On Fri, Jan 24, 2014 at 7:41 PM, Brett Cannon wrote: > > > > On Fri, Jan 24, 2014 at 12:33 PM, Ram Rachum wrote: > >> I knew it didn't exist by that name, but couldn't know whether there was >> another function that did the same thing or technique to make it not needed. >> >> So I couldn't know whether it's new or not, therefore I couldn't know >> whether it should be on python-ideas or not. >> > > So then you were simply wondering about its existence, for which you > should go to python-list or python-ideas first. Python-ideas exists > *explicitly* as a filter for this kind of question which is why people are > saying it should have gone there first (or to python-list). > > If you have any doubt as to whether a question should go here or not, then > err on the side of caution and post to python-ideas or python-list first. > > -Brett > > >> >> >> >> On Fri, Jan 24, 2014 at 7:31 PM, Brett Cannon wrote: >> >>> >>> >>> >>> On Fri, Jan 24, 2014 at 11:46 AM, Ram Rachum wrote: >>> >>>> You see, Antoine, *you* know that it's better asked on python-ideas >>>> because you know it doesn't exist in Python, therefore it's an idea for an >>>> addition. However, when a person like me asks this question, he does not >>>> know whether it exists or not, so he can't know whether he's proposing a >>>> new idea or whether it's something that exists under a different name or >>>> whether that's something that can't exist because of some unknown reason >>>> that the asker didn't think of. 
>>>> >>>> Now that I know it doesn't exist, I'll ask this on python-ideas. >>>> >>> >>> I think there might be a language issue here because you originally said >>> "Why is there no str.rreplace in Python?" which shows you already knew >>> it didn't exist. Did you mean to say you wanted to know *why* it didn't >>> exist? >>> >>> Even in that case, if searching for [python str.rreplace] didn't turn up >>> anything then chances are there was no proposal, which makes it a new idea >>> and thus belongs on python-ideas. Basically the rule of thumb is anything >>> considered new goes to python-ideas first. >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Fri Jan 24 18:49:50 2014 From: brett at python.org (Brett Cannon) Date: Fri, 24 Jan 2014 12:49:50 -0500 Subject: [Python-Dev] str.rreplace In-Reply-To: References: <20140124173814.78b62257@fsol> Message-ID: On Fri, Jan 24, 2014 at 12:46 PM, Ram Rachum wrote: > Okay, next time I'll ask on python-ideas. (I do hope that no one there > will be angry that I'm posting a question there rather than an idea...) > Nope, no one will. Just phrase it as "is there something like a str.rreplace? If not I think it would be useful because ...". The assumption is that if you are asking if something exists then you would like it to exist, in which case you should have a reason for wanting it. -Brett > > > On Fri, Jan 24, 2014 at 7:41 PM, Brett Cannon wrote: > >> >> >> >> On Fri, Jan 24, 2014 at 12:33 PM, Ram Rachum wrote: >> >>> I knew it didn't exist by that name, but couldn't know whether there was >>> another function that did the same thing or technique to make it not needed. >>> >>> So I couldn't know whether it's new or not, therefore I couldn't know >>> whether it should be on python-ideas or not. >>> >> >> So then you were simply wondering about its existence, for which you >> should go to python-list or python-ideas first. 
Python-ideas exists >> *explicitly* as a filter for this kind of question which is why people are >> saying it should have gone there first (or to python-list). >> >> If you have any doubt as to whether a question should go here or not, >> then err on the side of caution and post to python-ideas or python-list >> first. >> >> -Brett >> >> >>> >>> >>> >>> On Fri, Jan 24, 2014 at 7:31 PM, Brett Cannon wrote: >>> >>>> >>>> >>>> >>>> On Fri, Jan 24, 2014 at 11:46 AM, Ram Rachum wrote: >>>> >>>>> You see, Antoine, *you* know that it's better asked on python-ideas >>>>> because you know it doesn't exist in Python, therefore it's an idea for an >>>>> addition. However, when a person like me asks this question, he does not >>>>> know whether it exists or not, so he can't know whether he's proposing a >>>>> new idea or whether it's something that exists under a different name or >>>>> whether that's something that can't exist because of some unknown reason >>>>> that the asker didn't think of. >>>>> >>>>> Now that I know it doesn't exist, I'll ask this on python-ideas. >>>>> >>>> >>>> I think there might be a language issue here because you originally >>>> said "Why is there no str.rreplace in Python?" which shows you already >>>> knew it didn't exist. Did you mean to say you wanted to know *why* it >>>> didn't exist? >>>> >>>> Even in that case, if searching for [python str.rreplace] didn't turn >>>> up anything then chances are there was no proposal, which makes it a new >>>> idea and thus belongs on python-ideas. Basically the rule of thumb is >>>> anything considered new goes to python-ideas first. >>>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amk at amk.ca Fri Jan 24 18:52:51 2014 From: amk at amk.ca (A.M. Kuchling) Date: Fri, 24 Jan 2014 12:52:51 -0500 Subject: [Python-Dev] Python 3 marketing document? 
In-Reply-To: References: Message-ID: <20140124175251.GA1200@DATLANDREWK.local> On Fri, Jan 24, 2014 at 10:37:12AM -0600, Jesse Noller wrote: > fwiw, I'm offering the keys/account/etc for getpython3.com to whomever > has the time to keep it fresh and up to date. I'd be interested. --amk From ethan at stoneleaf.us Fri Jan 24 18:27:29 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 24 Jan 2014 09:27:29 -0800 Subject: [Python-Dev] str.rreplace In-Reply-To: References: Message-ID: <52E2A281.1090004@stoneleaf.us> On 01/24/2014 09:19 AM, Ram Rachum wrote: > Hmm, on one hand I understand the need for the separation between python-dev and python-list, but on the other hand I > don't think python-list is a good place to discuss Python, the language. [snip] > it's full of people asking about third-party Python packages, or asking newbie questions. Yes, so imagine how happy we would be to see an actual Python the Language question there! :) Setting follow-up to Python List. -- ~Ethan~ From breamoreboy at yahoo.co.uk Fri Jan 24 19:01:05 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 24 Jan 2014 18:01:05 +0000 Subject: [Python-Dev] Python 3 marketing document? In-Reply-To: References: Message-ID: On 24/01/2014 16:37, Jesse Noller wrote: > fwiw, I'm offering the keys/account/etc for getpython3.com to whomever > has the time to keep it fresh and up to date. > If I've ever heard of this I've forgotten about it. How do we make it more prominent? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. 
Mark Lawrence From wes.turner at gmail.com Fri Jan 24 18:50:31 2014 From: wes.turner at gmail.com (Wes Turner) Date: Fri, 24 Jan 2014 11:50:31 -0600 Subject: [Python-Dev] str.rreplace In-Reply-To: References: Message-ID: On Jan 24, 2014 11:43 AM, "Terry Reedy" wrote: > > On 1/24/2014 12:19 PM, Ram Rachum wrote: >> >> Hmm, on one hand I understand the need for the separation between >> python-dev and python-list, but on the other hand I don't think >> python-list is a good place to discuss Python, the language. > Is there a link to this sort of information? (e.g. a page with group descriptions) (EDIT) http://www.python.org/community/lists/ > > Python-list is the place for such discussions. Questions such as yours are common. I have been reading it for almost 17 years. > http://reddit.com/r/learnpython can also be helpful, though it only supports Markdown. From jnoller at gmail.com Fri Jan 24 20:23:42 2014 From: jnoller at gmail.com (Jesse Noller) Date: Fri, 24 Jan 2014 13:23:42 -0600 Subject: [Python-Dev] Python 3 marketing document? In-Reply-To: References: Message-ID: I'm giving AMK the keys to the kingdom right now: AMK: Feel free to go nuts. Email me your public key. On Fri, Jan 24, 2014 at 12:01 PM, Mark Lawrence wrote: > On 24/01/2014 16:37, Jesse Noller wrote: >> >> fwiw, I'm offering the keys/account/etc for getpython3.com to whomever >> has the time to keep it fresh and up to date. >> > > If I've ever heard of this I've forgotten about it. How do we make it more > prominent? > > -- > My fellow Pythonistas, ask not what our language can do for you, ask what > you can do for our language.
> > Mark Lawrence > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/jnoller%40gmail.com From tjreedy at udel.edu Fri Jan 24 21:05:00 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 24 Jan 2014 15:05:00 -0500 Subject: [Python-Dev] str.rreplace In-Reply-To: References: Message-ID: On 1/24/2014 12:50 PM, Wes Turner wrote: > On Jan 24, 2014 11:43 AM, "Terry Reedy" > wrote: > > > > On 1/24/2014 12:19 PM, Ram Rachum wrote: > >> > >> Hmm, on one hand I understand the need for the separation between > >> python-dev and python-list, but on the other hand I don't think > >> python-list is a good place to discuss Python, the language. > > > > Is there a link to this sort of information? (e.g. a page with group > descriptions) > > (EDIT) http://www.python.org/community/lists/ mail.python.org, which redirects to https://mail.python.org/mailman/listinfo Python-list: General discussion list for the Python programming language Python-Dev Python core developers Python-ideas Discussions of speculative Python language ideas > > Python-list is the place for such discussions. Questions such as > yours are common. I have been reading it for almost 17 years. > > -- Terry Jan Reedy From brian at python.org Fri Jan 24 23:44:23 2014 From: brian at python.org (Brian Curtin) Date: Fri, 24 Jan 2014 16:44:23 -0600 Subject: [Python-Dev] str.rreplace In-Reply-To: References: Message-ID: On Fri, Jan 24, 2014 at 11:40 AM, Mark Lawrence wrote: > On 24/01/2014 17:19, Ram Rachum wrote: >> >> Hmm, on one hand I understand the need for the separation between >> python-dev and python-list, but on the other hand I don't think >> python-list is a good place to discuss Python, the language. >> >> I now looked at the 17 most recent python-list threads. Out of them: >> >> - 58% are about third-party packages. 
>> - 17% are off-topic (not even programming related) >> - 11% are 2-vs-3 discussions >> - 5% are job offers. >> - 5% (which is just one thread out of 17) is about Python the language. >> > > > I'm extremely impressed by your knowledge of statistics, it must have taken > you many man years of effort to analyse all 17 threads in such detail. > > >> So can you understand why someone would be reluctant to start a >> discussion in python-list about Python the language there? Especially if >> this is the same place where beginners might ask newbies questions about >> Python? (So not only are actual Python questions just 5% of the content, >> non-newbie questions are just a subset of that 5%.) >> >> it's full of people asking about third-party Python packages, or asking >> newbie questions. >> > > How terrible, fancy having the audacity to ask about third party packages or > newbie questions on the *MAIN* Python mailing list. There's yet another > reason to bring back the death penalty in the UK. Please adjust the tone of your messages if you are going to use this mailing list. From breamoreboy at yahoo.co.uk Fri Jan 24 23:50:28 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 24 Jan 2014 22:50:28 +0000 Subject: [Python-Dev] str.rreplace In-Reply-To: References: Message-ID: On 24/01/2014 22:44, Brian Curtin wrote: > On Fri, Jan 24, 2014 at 11:40 AM, Mark Lawrence wrote: >> On 24/01/2014 17:19, Ram Rachum wrote: >>> >>> Hmm, on one hand I understand the need for the separation between >>> python-dev and python-list, but on the other hand I don't think >>> python-list is a good place to discuss Python, the language. >>> >>> I now looked at the 17 most recent python-list threads. Out of them: >>> >>> - 58% are about third-party packages. >>> - 17% are off-topic (not even programming related) >>> - 11% are 2-vs-3 discussions >>> - 5% are job offers. >>> - 5% (which is just one thread out of 17) is about Python the language. 
>>> >> >> >> I'm extremely impressed by your knowledge of statistics, it must have taken >> you many man years of effort to analyse all 17 threads in such detail. >> >> >>> So can you understand why someone would be reluctant to start a >>> discussion in python-list about Python the language there? Especially if >>> this is the same place where beginners might ask newbies questions about >>> Python? (So not only are actual Python questions just 5% of the content, >>> non-newbie questions are just a subset of that 5%.) >>> >>> it's full of people asking about third-party Python packages, or asking >>> newbie questions. >>> >> >> How terrible, fancy having the audacity to ask about third party packages or >> newbie questions on the *MAIN* Python mailing list. There's yet another >> reason to bring back the death penalty in the UK. > > Please adjust the tone of your messages if you are going to use this > mailing list. > I'm sorry but I do not understand, please explain what is wrong with an extremely heavy dose of sarcasm. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From brian at python.org Fri Jan 24 23:56:54 2014 From: brian at python.org (Brian Curtin) Date: Fri, 24 Jan 2014 16:56:54 -0600 Subject: [Python-Dev] str.rreplace In-Reply-To: References: Message-ID: On Fri, Jan 24, 2014 at 4:50 PM, Mark Lawrence wrote: > On 24/01/2014 22:44, Brian Curtin wrote: >> >> On Fri, Jan 24, 2014 at 11:40 AM, Mark Lawrence >> wrote: >>> >>> On 24/01/2014 17:19, Ram Rachum wrote: >>>> >>>> >>>> Hmm, on one hand I understand the need for the separation between >>>> python-dev and python-list, but on the other hand I don't think >>>> python-list is a good place to discuss Python, the language. >>>> >>>> I now looked at the 17 most recent python-list threads. Out of them: >>>> >>>> - 58% are about third-party packages. 
>>>> - 17% are off-topic (not even programming related) >>>> - 11% are 2-vs-3 discussions >>>> - 5% are job offers. >>>> - 5% (which is just one thread out of 17) is about Python the >>>> language. >>>> >>> >>> >>> I'm extremely impressed by your knowledge of statistics, it must have >>> taken >>> you many man years of effort to analyse all 17 threads in such detail. >>> >>> >>>> So can you understand why someone would be reluctant to start a >>>> discussion in python-list about Python the language there? Especially if >>>> this is the same place where beginners might ask newbies questions about >>>> Python? (So not only are actual Python questions just 5% of the content, >>>> non-newbie questions are just a subset of that 5%.) >>>> >>>> it's full of people asking about third-party Python packages, or asking >>>> newbie questions. >>>> >>> >>> How terrible, fancy having the audacity to ask about third party packages >>> or >>> newbie questions on the *MAIN* Python mailing list. There's yet another >>> reason to bring back the death penalty in the UK. >> >> >> Please adjust the tone of your messages if you are going to use this >> mailing list. >> > > I'm sorry but I do not understand, please explain what is wrong with an > extremely heavy dose of sarcasm. There's a real discussion going on and you're just responding to throw around sarcasm. People aren't going to come to this list if you're just going to give them snarky replies. It's not helping. 
From breamoreboy at yahoo.co.uk Sat Jan 25 00:02:25 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 24 Jan 2014 23:02:25 +0000 Subject: [Python-Dev] str.rreplace In-Reply-To: References: Message-ID: On 24/01/2014 22:56, Brian Curtin wrote: > On Fri, Jan 24, 2014 at 4:50 PM, Mark Lawrence wrote: >> On 24/01/2014 22:44, Brian Curtin wrote: >>> >>> On Fri, Jan 24, 2014 at 11:40 AM, Mark Lawrence >>> wrote: >>>> >>>> On 24/01/2014 17:19, Ram Rachum wrote: >>>>> >>>>> >>>>> Hmm, on one hand I understand the need for the separation between >>>>> python-dev and python-list, but on the other hand I don't think >>>>> python-list is a good place to discuss Python, the language. >>>>> >>>>> I now looked at the 17 most recent python-list threads. Out of them: >>>>> >>>>> - 58% are about third-party packages. >>>>> - 17% are off-topic (not even programming related) >>>>> - 11% are 2-vs-3 discussions >>>>> - 5% are job offers. >>>>> - 5% (which is just one thread out of 17) is about Python the >>>>> language. >>>>> >>>> >>>> >>>> I'm extremely impressed by your knowledge of statistics, it must have >>>> taken >>>> you many man years of effort to analyse all 17 threads in such detail. >>>> >>>> >>>>> So can you understand why someone would be reluctant to start a >>>>> discussion in python-list about Python the language there? Especially if >>>>> this is the same place where beginners might ask newbies questions about >>>>> Python? (So not only are actual Python questions just 5% of the content, >>>>> non-newbie questions are just a subset of that 5%.) >>>>> >>>>> it's full of people asking about third-party Python packages, or asking >>>>> newbie questions. >>>>> >>>> >>>> How terrible, fancy having the audacity to ask about third party packages >>>> or >>>> newbie questions on the *MAIN* Python mailing list. There's yet another >>>> reason to bring back the death penalty in the UK. 
>>> >>> >>> Please adjust the tone of your messages if you are going to use this >>> mailing list. >>> >> >> I'm sorry but I do not understand, please explain what is wrong with an >> extremely heavy dose of sarcasm. > > There's a real discussion going on and you're just responding to throw > around sarcasm. People aren't going to come to this list if you're > just going to give them snarky replies. It's not helping. > Okay, I'll leave the snarky comments to the people who are authorised to be snarky. How do you get on this list? Is it any core dev, or are there more severe restrictions than that, for example do you have to be a member of the PSF, in which case I'd guess you can be very snarky without having a word said against you? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From brett at python.org Sat Jan 25 00:43:18 2014 From: brett at python.org (Brett Cannon) Date: Fri, 24 Jan 2014 18:43:18 -0500 Subject: [Python-Dev] str.rreplace In-Reply-To: References: Message-ID: On Fri, Jan 24, 2014 at 6:02 PM, Mark Lawrence wrote: > On 24/01/2014 22:56, Brian Curtin wrote: > >> On Fri, Jan 24, 2014 at 4:50 PM, Mark Lawrence >> wrote: >> >>> On 24/01/2014 22:44, Brian Curtin wrote: >>> >>>> >>>> On Fri, Jan 24, 2014 at 11:40 AM, Mark Lawrence < >>>> breamoreboy at yahoo.co.uk> >>>> wrote: >>>> >>>>> >>>>> On 24/01/2014 17:19, Ram Rachum wrote: >>>>> >>>>>> >>>>>> >>>>>> Hmm, on one hand I understand the need for the separation between >>>>>> python-dev and python-list, but on the other hand I don't think >>>>>> python-list is a good place to discuss Python, the language. >>>>>> >>>>>> I now looked at the 17 most recent python-list threads. Out of them: >>>>>> >>>>>> - 58% are about third-party packages. >>>>>> - 17% are off-topic (not even programming related) >>>>>> - 11% are 2-vs-3 discussions >>>>>> - 5% are job offers. 
>>>>>> - 5% (which is just one thread out of 17) is about Python the >>>>>> language. >>>>>> >>>>>> >>>>> >>>>> I'm extremely impressed by your knowledge of statistics, it must have >>>>> taken >>>>> you many man years of effort to analyse all 17 threads in such detail. >>>>> >>>>> >>>>> So can you understand why someone would be reluctant to start a >>>>>> discussion in python-list about Python the language there? Especially >>>>>> if >>>>>> this is the same place where beginners might ask newbies questions >>>>>> about >>>>>> Python? (So not only are actual Python questions just 5% of the >>>>>> content, >>>>>> non-newbie questions are just a subset of that 5%.) >>>>>> >>>>>> it's full of people asking about third-party Python packages, or >>>>>> asking >>>>>> newbie questions. >>>>>> >>>>>> >>>>> How terrible, fancy having the audacity to ask about third party >>>>> packages >>>>> or >>>>> newbie questions on the *MAIN* Python mailing list. There's yet >>>>> another >>>>> reason to bring back the death penalty in the UK. >>>>> >>>> >>>> >>>> Please adjust the tone of your messages if you are going to use this >>>> mailing list. >>>> >>>> >>> I'm sorry but I do not understand, please explain what is wrong with an >>> extremely heavy dose of sarcasm. >>> >> >> There's a real discussion going on and you're just responding to throw >> around sarcasm. People aren't going to come to this list if you're >> just going to give them snarky replies. It's not helping. >> >> > Okay, I'll leave the snarky comments to the people who are authorised to > be snarky. How do you get on this list? Is it any core dev, or are there > more severe restrictions than that, for example do you have to be a member > of the PSF, in which case I'd guess you can be very snarky without having a > word said against you? I suspect Brian's point is sarcasm is fine in moderation. I'm sure we have all had incidences online where sarcasm was not understood and someone took it the wrong way. 
And with this list being international the chance of someone not catching something as sarcastic just goes up. So sarcasm is fine, but keep it on the lighter side is all. From ncoghlan at gmail.com Sat Jan 25 01:41:05 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 25 Jan 2014 10:41:05 +1000 Subject: [Python-Dev] str.rreplace In-Reply-To: References: Message-ID: On 25 Jan 2014 09:46, "Brett Cannon" wrote: > On Fri, Jan 24, 2014 at 6:02 PM, Mark Lawrence wrote: >>> >> >> Okay, I'll leave the snarky comments to the people who are authorised to be snarky. How do you get on this list? Is it any core dev, or are there more severe restrictions than that, for example do you have to be a member of the PSF, in which case I'd guess you can be very snarky without having a word said against you? > > > I suspect Brian's point is sarcasm is fine in moderation. I'm sure we have all had incidences online where sarcasm was not understood and someone took it the wrong way. And with this list being international the chance of someone not catching something as sarcastic just goes up. So sarcasm is fine, but keep it on the lighter side is all. I personally draw the line as so: - is my draft post *just* snark? Then I delete it rather than posting - such posts never further the discussion, increase the level of noise on the list, and generally waste the time of other list subscribers solely for some momentary emotional satisfaction on my part. If I really feel the need to vent, then I'll send the unhelpful post directly to a friend rather than to the list. This is the kind of post that has no place on any of the core development lists. - is there a snarky side comment in an otherwise constructive post?
Then I'll usually take it out anyway, since such comments still usually hinder communication rather than helping it, and we already have enough inherent barriers to effective communication due to a relative lack of knowledge of each other's backgrounds and experience. However, if I'm genuinely irritated, I'll sometimes leave them in - I'm not a saint, and a snarky comment that indicates "I am annoyed by this thread or situation" is a vastly different thing from a snarky *post* that says to someone else "your post was bad and you should feel bad". - there are other times (fortunately rare), when I consider it necessary to express genuine concern or anger. My main tool for dealing with such posts in the most constructive manner possible is to find every occurrence of the pronoun "you" (or other people's names) and figure out how to replace it with the pronoun "I". The purpose of such rephrasing is to help ensure the post is a constructive one expressing my concerns and sharing my impressions and experience rather than a destructive one that causes the recipients to become defensive, because once we dig in our heels and start defending our positions out of ego rather than reason, then the opportunity for a meaningful, productive discussion is lost. So, a snarky side comment or two in an otherwise constructive post? Not preferred, but usually acceptable. A post consisting of nothing but snark? Not acceptable - either don't post it, or send it to a trusted friend off-list in order to vent. In this specific case, our general communication about the different purposes of the core lists *isn't* particularly good, so it's entirely expected that we'll still get the occasional post to python-dev that is better directed to a different list. That's why everyone gets a free pass to ask one or two inappropriate questions on python-dev, since it isn't always clear to them that the question is off-topic.
The appropriate response is to politely explain the purposes of the different lists and redirect them to the correct one. Cheers, Nick. From steve at pearwood.info Sat Jan 25 02:14:02 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 25 Jan 2014 12:14:02 +1100 Subject: [Python-Dev] str.rreplace In-Reply-To: References: Message-ID: <20140125011402.GS3915@ando> On Sat, Jan 25, 2014 at 10:41:05AM +1000, Nick Coghlan wrote: > In this specific case, our general communication about the different > purposes of the core lists *isn't* particularly good, Nick, I beg to differ: I think that our communication in this regard actually is quite reasonable. Before signing up to Python-Dev via the website, one cannot help but see right at the top of the page: Do not post general Python questions to this list. For help with Python please see the Python help page. https://mail.python.org/mailman/listinfo/python-dev Although perhaps a link directly to the python-list mailing list as well wouldn't go astray. > so it's entirely > expected that we'll still get the occasional post to python-dev that is > better directed to a different list. That's why everyone gets a free pass > to asking one or two inappropriate questions on python-dev, since it isn't > always clear to them that the question is off-topic. I agree with the conclusion, but not the reason. We should allow people a free pass for small errors, because we would appreciate such a free pass for small errors ourselves. To err is human, to forgive is humane, and a little bit of kindness helps grease the wheels of civilized discourse.
(Also, there sometimes are grey areas where it isn't clear whether a question is on-topic or not. However, "Is there a version of str.replace that works from the right?" is not in that grey area.) What annoyed me most about Ram's thread is not that he made the mistake in the first place, but that when gently corrected, he chose to argue and give spurious reasons for why this was the right place to ask. Still, I think Mark's overly-aggressive use of sarcasm in an otherwise content-less post was out of proportion to the magnitude of Ram's transgression. -- Steven From ncoghlan at gmail.com Sat Jan 25 04:12:43 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 25 Jan 2014 13:12:43 +1000 Subject: [Python-Dev] str.rreplace In-Reply-To: <20140125011402.GS3915@ando> References: <20140125011402.GS3915@ando> Message-ID: On 25 January 2014 11:14, Steven D'Aprano wrote: > On Sat, Jan 25, 2014 at 10:41:05AM +1000, Nick Coghlan wrote: > >> In this specific case, our general communication about the different >> purposes of the core lists *isn't* particularly good, > > Nick, I beg to differ: I think that our communication in this regard > actually is quite reasonable. Before signing up to Python-Dev via the > website, one cannot help but see right at the top of the page: > > Do not post general Python questions to this list. For help with > Python please see the Python help page. > > https://mail.python.org/mailman/listinfo/python-dev > > Although perhaps a link directly to the python-list mailing list > as well wouldn't go astray. I believe you can currently post without subscribing, though. It isn't that the relevant information isn't available (it is: http://docs.python.org/devguide/communication.html), it's that there are lots of ways to miss that information, so there's always going to be the occasional misdirected question.
And yes, I agree that responding to gentle redirection with "I still think this is the right place for my question" is not an appropriate way for anyone to behave.

Moving on to practical matters: perhaps we should ensure a link to that communications page is included in the list descriptions and automated footers for at least python-dev and python-ideas, and also that we update it to include a link to the PSF code of conduct page? (The python-ideas footer already links directly to the CoC)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From larry at hastings.org Sat Jan 25 05:07:43 2014
From: larry at hastings.org (Larry Hastings)
Date: Fri, 24 Jan 2014 20:07:43 -0800
Subject: [Python-Dev] Quick poll: should help() show bound arguments?
Message-ID: <52E3388F.4090109@hastings.org>

(Quick, because apparently nobody reads the long ones!)

In Python 3.3:

    >>> class C:
    ...     def foo(self, a): pass
    ...
    >>> c = C()
    >>> help(c.foo)

shows you the signature "foo(self, a)". As in, it claims it accepts two parameters. The function actually only accepts one parameter, because "self" has already been bound. inspect.signature gets this right:

    >>> import inspect
    >>> str(inspect.signature(c.foo))
    '(a)'

but inspect.getfullargspec does not:

    >>> inspect.getfullargspec(c.foo)
    FullArgSpec(args=['self', 'a'], varargs=None, varkw=None, defaults=None, kwonlyargs=[], kwonlydefaults=None, annotations={})

help() gets its text from pydoc. pydoc uses inspect.getfullargspec to produce the signature.

When I added support for introspection on builtins, I wanted help() to show their signature too. But inspect.getfullargspec doesn't support introspection on builtins. So I had to use inspect.signature. Which means the behavior is inconsistent: help() on a method of an instance of a builtin class *doesn't* show "self".
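
For anyone who wants to reproduce the discrepancy described above, here is a minimal sketch, assuming a Python 3 interpreter in which both inspect.signature and inspect.getfullargspec are available. The class C and method foo mirror the example in this message; the variable names are illustrative only and are not part of any proposal.

```python
import inspect


class C:
    def foo(self, a):
        pass


c = C()

# Signature of the bound method: "self" is already bound, so it is omitted.
bound_sig = str(inspect.signature(c.foo))

# Signature of the underlying function, looked up on the class.
func_sig = str(inspect.signature(C.foo))

# getfullargspec still lists the already-bound first parameter.
argspec_args = inspect.getfullargspec(c.foo).args

print(bound_sig, func_sig, argspec_args)
```

The two inspect APIs disagree only for the bound method; for the plain function C.foo both report "self" and "a".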
FYI, the relevant issues:

    help(instance_of_builtin_class.method) does not display self
    http://bugs.python.org/issue20379

    inspect.getfullargspec should use __signature__
    http://bugs.python.org/issue17481

What should it be?

A) pydoc and help() should not show bound parameters in the signature, like inspect.signature.
B) pydoc and help() should show bound parameters in the signature, like inspect.getfullargspec.

I'll tally the results if there's interest. I'd assume a "vote for A" = +1 on A and -1 on B. You can express your vote numerically if you like. I'm voting for A.

And yes, that was short for me,

//arry/

From zachary.ware+pydev at gmail.com Sat Jan 25 05:18:19 2014 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Fri, 24 Jan 2014 22:18:19 -0600 Subject: [Python-Dev] Quick poll: should help() show bound arguments? In-Reply-To: <52E3388F.4090109@hastings.org> References: <52E3388F.4090109@hastings.org> Message-ID: On Fri, Jan 24, 2014 at 10:07 PM, Larry Hastings wrote: > > (Quick, because apparently nobody reads the long ones!) > > In Python 3.3: > >>>> class C: > ... def foo(self, a): pass > ... >>>> c = C() >>>> help(c.foo) > > shows you the signature "foo(self, a)". As in, it claims it accepts two > parameters. The function actually only accepts one parameter, because > "self" has already been bound. inspect.signature gets this right: > >>>> import inspect >>>> str(inspect.signature(c.foo)) > '(a)' > > but inspect.getfullargspec does not: > >>>> inspect.getfullargspec(c.foo) > FullArgSpec(args=['self', 'a'], varargs=None, varkw=None, defaults=None, > kwonlyargs=[], kwonlydefaults=None, annotations={}) > > > help() gets its text from pydoc. pydoc uses inspect.getfullargspec to > produce the signature. > > When I added support for introspection on builtins, I wanted help() to show > their signature too. But inspect.getfullargspec doesn't support > introspection on builtins.
So I had to use inspect.signature. Which means > the behavior is inconsistent: help() on a method of an instance of a builtin > class *doesn't* show "self". > > > FYI, the relevant issues: > > help(instance_of_builtin_class.method) does not display self > > http://bugs.python.org/issue20379 > > inspect.getfullargspec should use __siganture__ > > http://bugs.python.org/issue17481 > > > What should it be? > > A) pydoc and help() should not show bound parameters in the signature, like > inspect.signature. > B) pydoc and help() should show bound parameters in the signature, like > inspect.getfullargspec. > > I'll tally the results if there's interest. I'd assume a "vote for A" = +1 > on A and -1 on B. You can express your vote numerically if you like. I'm > voting for A. > I vote A: it makes sense (though B does too, to an extent), and the patch to make help() consistent for Python and C implemented methods is simply removing two lines from pydoc. I'm not sure how convoluted it might become to make it work the other way. -- Zach From steve at pearwood.info Sat Jan 25 05:36:49 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 25 Jan 2014 15:36:49 +1100 Subject: [Python-Dev] Quick poll: should help() show bound arguments? In-Reply-To: <52E3388F.4090109@hastings.org> References: <52E3388F.4090109@hastings.org> Message-ID: <20140125043649.GW3915@ando> On Fri, Jan 24, 2014 at 08:07:43PM -0800, Larry Hastings wrote: > > (Quick, because apparently nobody reads the long ones!) > > In Python 3.3: > > >>> class C: > ... def foo(self, a): pass > ... > >>> c = C() > >>> help(c.foo) > > shows you the signature "foo(self, a)". That matches the function declaration as defined right there in the class. > As in, it claims it accepts two > parameters. The function actually only accepts one parameter, because > "self" has already been bound. No, that's wrong. The *function* C.foo accepts two parameters, self and a, exactly as declared. The *method* c.foo accepts only one. 
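
The function/method distinction drawn here can be checked directly. A small sketch, assuming Python 3 (where C.foo is a plain function rather than an unbound method); C and foo follow the thread's example, and the return value is added only so that the two call styles can be compared.

```python
import types


class C:
    def foo(self, a):
        return a


c = C()

# Looked up on the class, foo is an ordinary function taking (self, a).
is_plain_function = isinstance(C.foo, types.FunctionType)

# Looked up on an instance, foo is a bound method taking only (a).
is_bound_method = isinstance(c.foo, types.MethodType)

# The bound method wraps the same function and remembers the instance.
wraps_same_function = c.foo.__func__ is C.foo
bound_to_instance = c.foo.__self__ is c

# Both call styles reach the same code.
same_result = C.foo(c, 42) == c.foo(42)
```

Each of the five names above should end up True, which is the sense in which both descriptions of the signature are "right" for their respective objects.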
> inspect.signature gets this right:
>
> >>> import inspect
> >>> str(inspect.signature(c.foo))
> '(a)'
>
> but inspect.getfullargspec does not:
>
> >>> inspect.getfullargspec(c.foo)
> FullArgSpec(args=['self', 'a'], varargs=None, varkw=None,
> defaults=None, kwonlyargs=[], kwonlydefaults=None, annotations={})

That's arguable. Should inspect give the argument spec of the method object or the function object? I can see arguments for both. Backwards-compatibility argues against changing either, though.

help() is another story. Since help() is documentation aimed at the end-user, we can change it without worrying about backward compatibility. Ideally, help() ought to distinguish between the two cases. I see that it already does, but not very well. help(c.foo) in Python 3.3 gives:

    Help on method foo in module __main__:

    foo(self, a) method of __main__.C instance

while help(C.foo) gives:

    Help on function foo in module __main__:

    foo(self, a)

I think in the bound method case, it should drop the "self":

    Help on method foo in module __main__:

    foo(a) method of __main__.C instance

and in the function (previously: unbound method) case, it should show the "self" and the class:

    Help on function foo in module __main__:

    foo(self, a) method of class __main__.C

although I'm not sure how that second one is possible, now that unbound methods are gone and C.foo returns a plain function. (Hmmm, perhaps getting rid of unbound methods was a mistake...)

> A) pydoc and help() should not show bound parameters in the signature,
> like inspect.signature.
> B) pydoc and help() should show bound parameters in the signature, like
> inspect.getfullargspec.

Provided they are described as *methods* rather than *functions*, bound methods should not show self. Likewise for classmethod and cls. That is, I'm voting +1 on A provided help() doesn't confuse methods and functions.
-- Steven From benjamin at python.org Sat Jan 25 05:49:53 2014 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 24 Jan 2014 20:49:53 -0800 Subject: [Python-Dev] [Python-checkins] Cron /home/docs/build-devguide In-Reply-To: References: Message-ID: <1390625393.20121.75103497.659301B2@webmail.messagingengine.com> On Fri, Jan 24, 2014, at 08:45 PM, Cron Daemon wrote: > Could not find platform independent libraries > Could not find platform dependent libraries > Consider setting $PYTHONHOME to [:] > 'import site' failed; use -v for traceback > Traceback (most recent call last): > File "/data/hg/sphinx-env/bin/sphinx-build", line 5, in > from pkg_resources import load_entry_point > ImportError: No module named pkg_resources I recreated the sphinx virtualenv on dinsdale, so hopefully this will be fixed now. From ncoghlan at gmail.com Sat Jan 25 06:34:37 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 25 Jan 2014 15:34:37 +1000 Subject: [Python-Dev] Quick poll: should help() show bound arguments? In-Reply-To: <20140125043649.GW3915@ando> References: <52E3388F.4090109@hastings.org> <20140125043649.GW3915@ando> Message-ID: On 25 January 2014 14:36, Steven D'Aprano wrote: > On Fri, Jan 24, 2014 at 08:07:43PM -0800, Larry Hastings wrote: >> A) pydoc and help() should not show bound parameters in the signature, >> like inspect.signature. >> B) pydoc and help() should show bound parameters in the signature, like >> inspect.getfullargspec. > > Provided they are described as *methods* rather than *functions*, bound > methods should not show self. Likewise for classmethod and cls. That is, > I'm voting +1 on A provided help() doesn't confuse methods and > functions. +1 from me for fixing the signature help() and pydoc report for bound methods that are reported as such. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tseaver at palladion.com Sat Jan 25 07:03:00 2014 From: tseaver at palladion.com (Tres Seaver) Date: Sat, 25 Jan 2014 01:03:00 -0500 Subject: [Python-Dev] Quick poll: should help() show bound arguments? In-Reply-To: <52E3388F.4090109@hastings.org> References: <52E3388F.4090109@hastings.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/24/2014 11:07 PM, Larry Hastings wrote: > A) pydoc and help() should not show bound parameters in the signature, > like inspect.signature. B) pydoc and help() should show bound > parameters in the signature, like inspect.getfullargspec. +1 for A: it is how you would actually call the object on which 'help()' is being called. The fact that self will be passed silently is irrelevant to the caller. Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlLjU4UACgkQ+gerLs4ltQ48SQCg0zMZseKV3EZ/0pRkc5ngt2tb aFMAn0Vhze2wMEim6Vf7F1fvlh+j3PJ/ =Mx/i -----END PGP SIGNATURE----- From greg.ewing at canterbury.ac.nz Sat Jan 25 06:41:14 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 25 Jan 2014 18:41:14 +1300 Subject: [Python-Dev] lambda (x, y): In-Reply-To: References: Message-ID: <52E34E7A.10605@canterbury.ac.nz> Brett Cannon wrote: > > On Fri, Jan 24, 2014 at 10:50 AM, Ram Rachum > wrote: > > lambda (x, y): whatever > > http://python.org/dev/peps/pep-3113/ Part of the rationale in that PEP is that argument unpacking can always be replaced by an explicitly named argument and an unpacking assignment. No mention is made of the fact that you can't do this in a lambda, giving the impression that lambdas are deemed second-class citizens that are not worth consideration. 
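
To make the point concrete: the rewrite the PEP relies on needs an assignment statement, which a def body can hold and a lambda body cannot. A sketch under that reading, with hypothetical function names (add_pair, add_pair_lambda) chosen purely for illustration:

```python
# Python 2 allowed tuple parameters:  def add_pair((x, y)): return x + y
# The Python 3 rewrite uses one named argument plus an unpacking assignment.
def add_pair(xy):
    x, y = xy  # a statement, so this step has no direct lambda equivalent
    return x + y


# A lambda body is a single expression, so it has to index instead of unpack.
add_pair_lambda = lambda xy: xy[0] + xy[1]

pairs = [(1, 2), (3, 4)]
sums_def = [add_pair(p) for p in pairs]
sums_lambda = list(map(add_pair_lambda, pairs))
```

Both versions compute the same sums; the difference is only in how readable the unpacking is at the definition site.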
The author was clearly aware of the issue, since a strategy is suggested for translation of lambdas by 2to3: lambda (x, y): x + y --> lambda x_y: x_y[0] + x_y[1] That's a bit on the ugly side for human use, though. An alternative would be lambda xy: (lambda x, y: x + y)(*xy) Whether that's any better is a matter of opinion. -- Greg From ncoghlan at gmail.com Sat Jan 25 07:50:53 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 25 Jan 2014 16:50:53 +1000 Subject: [Python-Dev] lambda (x, y): In-Reply-To: <52E34E7A.10605@canterbury.ac.nz> References: <52E34E7A.10605@canterbury.ac.nz> Message-ID: On 25 January 2014 15:41, Greg Ewing wrote: > Brett Cannon wrote: >> >> >> On Fri, Jan 24, 2014 at 10:50 AM, Ram Rachum > > wrote: >> >> lambda (x, y): whatever >> >> http://python.org/dev/peps/pep-3113/ > > > Part of the rationale in that PEP is that argument unpacking > can always be replaced by an explicitly named argument and > an unpacking assignment. No mention is made of the fact that > you can't do this in a lambda, giving the impression that > lambdas are deemed second-class citizens that are not worth > consideration. Given that lambdas only just escaped being removed entirely from the language in Python 3, that impression isn't wrong. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From storchaka at gmail.com Sat Jan 25 07:55:04 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 25 Jan 2014 08:55:04 +0200 Subject: [Python-Dev] lambda (x, y): In-Reply-To: <52E34E7A.10605@canterbury.ac.nz> References: <52E34E7A.10605@canterbury.ac.nz> Message-ID: 25.01.14 07:41, Greg Ewing ???????(??): > Brett Cannon wrote: >> >> On Fri, Jan 24, 2014 at 10:50 AM, Ram Rachum > > wrote: >> >> lambda (x, y): whatever >> >> http://python.org/dev/peps/pep-3113/ > > Part of the rationale in that PEP is that argument unpacking > can always be replaced by an explicitly named argument and > an unpacking assignment. 
No mention is made of the fact that > you can't do this in a lambda, giving the impression that > lambdas are deemed second-class citizens that are not worth > consideration. > > The author was clearly aware of the issue, since a strategy > is suggested for translation of lambdas by 2to3: > > lambda (x, y): x + y --> lambda x_y: x_y[0] + x_y[1] > > That's a bit on the ugly side for human use, though. > An alternative would be > > lambda xy: (lambda x, y: x + y)(*xy) > > Whether that's any better is a matter of opinion. There is open issue for this: http://bugs.python.org/issue16094 From ncoghlan at gmail.com Sat Jan 25 08:44:32 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 25 Jan 2014 17:44:32 +1000 Subject: [Python-Dev] Argument Clinic: what to do with builtins with non-standard signatures? In-Reply-To: <52E281C3.1050303@hastings.org> References: <52E281C3.1050303@hastings.org> Message-ID: On 25 January 2014 01:07, Larry Hastings wrote: > I'm sorting the problems we see into four rough categories. > > a) Functions where there's a static Python value that behaves > identically to not passing in that parameter (aka "the NULL problem") > > Possible Solutions: > 0) Do nothing, don't convert the function. > 1) Use that clever static value as the default. For this case, I think option 1) is better, as there's no externally visible change in semantics, just a change to the internal implementation details. > b) Functions where there's no static Python value that behaves identically > to > not passing in that parameter (aka "the dynamic default problem") > > Possible solutions: > 0) Do nothing, don't convert the function. > 1) Use a magic value as None. Preferably of the same type as the > function accepts, but failing that use None. If they pass in > the magic value use the previous default value. Guido himself > suggested this in > 2) Use an Argument Clinic "optional group". This only works for > functions that don't support keyword arguments. 
Also, I hate > this, because "optional groups" are not expressable in Python > syntax, so these functions automatically have invalid signatures. I'm inclined to say "leave these for now, we'll fix them in 3.5". They're going to be hard to convert without altering their semantics, which we shouldn't be doing at this stage of the release cycle. There's going to be follow up work in 3.5 anyway, as I think we should continue with PEP 457 to make __text_signature__ a public API and add optional group support to inspect.Signature. > c) Functions that accept an 'int' when they mean 'boolean' (aka the > "ints instead of bools" problem) > > Solution: > 1) Use "bool". > 2) Use "int", and I'll go relax Argument Clinic so they > can use bool values as defaults for int parameters. If the temptation is to use True or False as the default, then I think that's a clear argument that these should be accepting "bool". However, expanding the accepted types is also clearly a new feature that would need a "versionchanged" in the docs for all affected functions, so I think these changes also belong in the "conversion implies semantic changes, so leave until 3.5" category. > d) Functions with behavior that deliberately defy being expressed as a > Python signature (aka the "untranslatable signature" problem) > > Example: > itertools.repeat(), which behaves differently depending on whether > "times" is supplied as a positional or keyword argument. (If > "times" is <0, and was supplied via position, the function yields > 0 times. If "times" is <0, and was supplied via keyword, the > function yields infinitely-many times.) > > Solution: > 0) Do nothing, don't convert the function. > 1) Change the signature until it is Python compatible. This new > signature *must* accept a superset of the arguments accepted > by the existing signature. (This is being discussed right > now in issue #19145.) For these, I think we should respect the release cycle and leave them until 3.5. 
Python has survived for a couple of decades with broken introspection for builtins and extension modules, we'll survive another release that still exhibits a subset of the problem :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From g.brandl at gmx.net Sat Jan 25 08:51:02 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 25 Jan 2014 08:51:02 +0100 Subject: [Python-Dev] [Python-checkins] Cron /home/docs/build-devguide In-Reply-To: <1390625393.20121.75103497.659301B2@webmail.messagingengine.com> References: <1390625393.20121.75103497.659301B2@webmail.messagingengine.com> Message-ID: Am 25.01.2014 05:49, schrieb Benjamin Peterson: > On Fri, Jan 24, 2014, at 08:45 PM, Cron Daemon wrote: >> Could not find platform independent libraries >> Could not find platform dependent libraries >> Consider setting $PYTHONHOME to [:] >> 'import site' failed; use -v for traceback >> Traceback (most recent call last): >> File "/data/hg/sphinx-env/bin/sphinx-build", line 5, in >> from pkg_resources import load_entry_point >> ImportError: No module named pkg_resources > > I recreated the sphinx virtualenv on dinsdale, so hopefully this will be > fixed now. Thanks, this somehow slipped by me. Georg From ncoghlan at gmail.com Sat Jan 25 10:20:22 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 25 Jan 2014 19:20:22 +1000 Subject: [Python-Dev] Argument Clinic: what to do with builtins with non-standard signatures? In-Reply-To: References: <52E281C3.1050303@hastings.org> Message-ID: On 25 January 2014 17:44, Nick Coghlan wrote: > On 25 January 2014 01:07, Larry Hastings wrote: >> c) Functions that accept an 'int' when they mean 'boolean' (aka the >> "ints instead of bools" problem) >> >> Solution: >> 1) Use "bool". >> 2) Use "int", and I'll go relax Argument Clinic so they >> can use bool values as defaults for int parameters. 
> > If the temptation is to use True or False as the default, then I think > that's a clear argument that these should be accepting "bool". > However, expanding the accepted types is also clearly a new feature > that would need a "versionchanged" in the docs for all affected > functions, so I think these changes also belong in the "conversion > implies semantic changes, so leave until 3.5" category. I changed my mind (slightly) on this one. For 3.4, we can go with converting the current semantics (i.e. using "i"), and tweaking argument clinic to all bool defaults for integers. That allows the introspection to be added sensibly, without changing the semantics of the interface. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jan 25 10:20:56 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 25 Jan 2014 19:20:56 +1000 Subject: [Python-Dev] Argument Clinic: what to do with builtins with non-standard signatures? In-Reply-To: References: <52E281C3.1050303@hastings.org> Message-ID: On 25 January 2014 19:20, Nick Coghlan wrote: > On 25 January 2014 17:44, Nick Coghlan wrote: >> On 25 January 2014 01:07, Larry Hastings wrote: >>> c) Functions that accept an 'int' when they mean 'boolean' (aka the >>> "ints instead of bools" problem) >>> >>> Solution: >>> 1) Use "bool". >>> 2) Use "int", and I'll go relax Argument Clinic so they >>> can use bool values as defaults for int parameters. >> >> If the temptation is to use True or False as the default, then I think >> that's a clear argument that these should be accepting "bool". >> However, expanding the accepted types is also clearly a new feature >> that would need a "versionchanged" in the docs for all affected >> functions, so I think these changes also belong in the "conversion >> implies semantic changes, so leave until 3.5" category. > > I changed my mind (slightly) on this one. For 3.4, we can go with > converting the current semantics (i.e. 
using "i"), and tweaking > argument clinic to all bool defaults for integers. "allow bool defaults", rather. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stefan_ml at behnel.de Sat Jan 25 10:46:08 2014 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 25 Jan 2014 10:46:08 +0100 Subject: [Python-Dev] Argument Clinic: what to do with builtins with non-standard signatures? In-Reply-To: References: <52E281C3.1050303@hastings.org> Message-ID: Nick Coghlan, 25.01.2014 10:20: > On 25 January 2014 17:44, Nick Coghlan wrote: >> On 25 January 2014 01:07, Larry Hastings wrote: >>> c) Functions that accept an 'int' when they mean 'boolean' (aka the >>> "ints instead of bools" problem) >>> >>> Solution: >>> 1) Use "bool". >>> 2) Use "int", and I'll go relax Argument Clinic so they >>> can use bool values as defaults for int parameters. >> >> If the temptation is to use True or False as the default, then I think >> that's a clear argument that these should be accepting "bool". >> However, expanding the accepted types is also clearly a new feature >> that would need a "versionchanged" in the docs for all affected >> functions, so I think these changes also belong in the "conversion >> implies semantic changes, so leave until 3.5" category. > > I changed my mind (slightly) on this one. For 3.4, we can go with > converting the current semantics (i.e. using "i"), and tweaking > argument clinic to all[ow] bool defaults for integers. > > That allows the introspection to be added sensibly, without changing > the semantics of the interface. FWIW, Cython knows a type called "bint" that is identical to a C int except that it automatically coerces to and from a Python boolean value (using truth testing). Seems to match the use case of the "p" that was added to CPython's arg parsing now. 
Given that "p" hasn't been around for all that long (and that Python didn't even have a bool type in its early days), it's clear why the existing code misused "i" in so many places over the last decades. I otherwise agree with Nick's comments above. It's sad that this can't just be fixed at the interface level, though. Stefan From 2013 at jmunch.dk Sat Jan 25 12:34:06 2014 From: 2013 at jmunch.dk (Anders J. Munch) Date: Sat, 25 Jan 2014 12:34:06 +0100 Subject: [Python-Dev] Quick poll: should help() show bound arguments? In-Reply-To: <52E3388F.4090109@hastings.org> References: <52E3388F.4090109@hastings.org> Message-ID: <52E3A12E.1070804@jmunch.dk> Larry Hastings wrote: > > inspect.signature gets this right: > > >>> import inspect > >>> str(inspect.signature(c.foo)) > '(a)' > Not always. : Python 3.4.0b2+ (default:32f9e0ae23f7, Jan 18 2014, 13:56:31) : [GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin : Type "help", "copyright", "credits" or "license" for more information. : >>> import inspect : >>> class C1: : ... def f(*args, **kwargs): pass : ... : >>> c = C1() : >>> c.f() : >>> str(inspect.signature(c.f)) : '(**kwargs)' Not to mention: class C2: def g(**kwargs): pass It doesn't really make sense - calling C2().g is guaranteed to fail - but it's legal Python. I'm not saying you can't special-case a few things and fix this, but still, -1/B. I like explicit self. regards, Anders From larry at hastings.org Sat Jan 25 13:19:18 2014 From: larry at hastings.org (Larry Hastings) Date: Sat, 25 Jan 2014 04:19:18 -0800 Subject: [Python-Dev] New policies for the Derby -- please read! Message-ID: <52E3ABC6.5080506@hastings.org> 1) New policy for what can and cannot be converted during the Derby First, let me apologize for only figuring this out now. The Derby has been a learning process, discovering things that Argument Clinic didn't handle. And there were a lot of funny edge cases that we weren't going to discover until we tried doing the conversion. 
And to be honest I was pushing harder than I should have. *ahem* The new policy for conversion work for Python 3.4: We may only convert functions that have signatures that we can represent 100% accurately in an inspect.Signature object. IF the function has a default value that can't be represented in Python (e.g. NULL), but we can find a value in Python that behaves identically to not passing in that parameter (e.g. _sha1.sha1(b'') == _sha1.sha1()), then we may convert that function using that clever default value (e.g. "string: object(c_default='NULL') = b'' "). IF the function has parameters with default values that are dynamic or cannot be represented accurately in Python, it cannot be converted without changing its semantics. So it cannot be converted for 3.4. IF the function *requires* "optional groups" (as in, the original function used switch(PyTuple_GET_SIZE(args)) and multiple calls to PyArg_ParseTuple()), then it's permissible to convert it for 3.4, but they are low priority. Such functions have semantics that are so weird, we will have to modify inspect.Parameter to support them, and that will only happen in 3.5. However, I will ensure that they otherwise convert correctly for 3.4. (They won't actually generate signatures. They will however generate the first line of the docstring.) For 3.5, I expect we'll have more leeway in doing things like "this should accept an int or None". But we should talk about that then. If you have patches outstanding that convert functions that shouldn't be converted for 3.4, please put them aside. We can almost certainly use them for 3.5. If any conversions have been committed that change the semantics of the function, someone will have to back them out. I'd appreciate it if the person who checked it in could do it. I expect to make a pass before rc1 to check all the conversions myself. (Which I hope will be quite time-consuming, as hopefully there will be lots of converted functions by then!) 
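A small sketch of the "clever default value" case described above, using only the public hashlib and inspect modules. The function sha1_like is a hypothetical pure-Python stand-in written for this illustration; it is not what Argument Clinic generates.

```python
import hashlib
import inspect

# Passing b'' behaves identically to passing nothing at all, so b'' can
# stand in for a C-level NULL default without changing semantics.
same_digest = hashlib.sha1(b'').hexdigest() == hashlib.sha1().hexdigest()

def sha1_like(string=b''):
    """Hypothetical stand-in whose signature uses the clever default."""
    return hashlib.sha1(string)

# The default is an ordinary Python value, so inspect.Signature can
# represent it exactly.
sig = inspect.signature(sha1_like)
default = sig.parameters['string'].default
```

By contrast, a default that is computed at call time has no single Python value that could be written into the signature, which is why such functions are left for 3.5 under this policy.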
_______________________________________________________________________________ 2) New recommendation for marking to-do functions If you examine a function and determine that it can't be converted to Argument Clinic right now, please add a comment to that effect. The comment should be one line, and contain a special marker so we can find it easily with searches. I nominate two different markers: "AC 3.4" means "It's okay to convert this function to Argument Clinic in 3.4, but it can't be converted right now because of a bug or missing feature in Argument Clinic." "AC 3.5" means "This function can't be converted to Argument Clinic in 3.4, it must wait for 3.5." I encourage you to add a little text saying why, like: /* AC 3.4: waiting for *args support */ /* AC 3.5: value parameter has default value of NULL */ _______________________________________________________________________________ Finally, I've realized that right now there's no good way to stay abreast of what's new and changing in clinic.py. I check in a patch just about every day for clinic.py, and sometimes I don't remember to update the "howto". The best way unfortunately is to read the output of "hg log". If you have questions, you can email me directly at "larry at hastings dot org", or you can find me in the #python-dev IRC channel. Thank you, everybody who's participating in the Derby, and again I'm sorry I didn't realize sooner this *had* to be the policy for 3.4. Hope to see your issues in the tracker, /arry -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Jan 25 13:34:47 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 25 Jan 2014 23:34:47 +1100 Subject: [Python-Dev] Quick poll: should help() show bound arguments? In-Reply-To: <52E3388F.4090109@hastings.org> References: <52E3388F.4090109@hastings.org> Message-ID: On Sat, Jan 25, 2014 at 3:07 PM, Larry Hastings wrote: > What should it be? 
> > A) pydoc and help() should not show bound parameters in the signature, like > inspect.signature. > B) pydoc and help() should show bound parameters in the signature, like > inspect.getfullargspec. Vote for A. As far as I'm concerned, all these foo are equally callable and equally take one parameter named a: def foo1(a): pass class C: def foo(self, a): pass foo2=C().foo class C: def __call__(self, a): pass foo3=C() def two_arg(b, a): pass foo4=lambda a: two_arg(0, a) If I call them as fooN(), fooN(1), and fooN(1,2), the middle one works and the other two throw exceptions, ergo they are one-argument functions. The fact that two of them happen to be bound methods is an implementation detail; it's just a form of currying, which foo4 happens also to be (in that C.foo takes two args, C().foo takes one). ChrisA From larry at hastings.org Sat Jan 25 14:29:05 2014 From: larry at hastings.org (Larry Hastings) Date: Sat, 25 Jan 2014 05:29:05 -0800 Subject: [Python-Dev] Quick poll: should help() show bound arguments? In-Reply-To: <52E3A12E.1070804@jmunch.dk> References: <52E3388F.4090109@hastings.org> <52E3A12E.1070804@jmunch.dk> Message-ID: <52E3BC21.30703@hastings.org> On 01/25/2014 03:34 AM, Anders J. Munch wrote: > Larry Hastings wrote: >> >> inspect.signature gets this right: >> >> >>> import inspect >> >>> str(inspect.signature(c.foo)) >> '(a)' >> > > Not always. > > : Python 3.4.0b2+ (default:32f9e0ae23f7, Jan 18 2014, 13:56:31) > : [GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin > : Type "help", "copyright", "credits" or "license" for more information. > : >>> import inspect > : >>> class C1: > : ... def f(*args, **kwargs): pass > : ... > : >>> c = C1() > : >>> c.f() > : >>> str(inspect.signature(c.f)) > : '(**kwargs)' File a bug, if there hasn't already been one filed. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
From ezio.melotti at gmail.com Sat Jan 25 14:29:53 2014 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Sat, 25 Jan 2014 15:29:53 +0200 Subject: [Python-Dev] Deprecation policy Message-ID: Hi, a couple of years ago I suggested defining and documenting our deprecation policy in this thread: https://mail.python.org/pipermail/python-dev/2011-October/114199.html I didn't receive many replies and eventually nothing was done. Lately the same issue came up on #python-dev, and Larry and Nick suggested that I bring this up again. Nick also suggested documenting our deprecation policy in PEP 5 (Guidelines for Language Evolution: http://www.python.org/dev/peps/pep-0005/ ). I'm including below the full text of the original email. Best Regards, Ezio Melotti ------------------------------- Hi, our current deprecation policy is not so well defined (see e.g. [0]), and it seems to me that it's something like: 1) deprecate something and add a DeprecationWarning; 2) forget about it after a while; 3) wait a few versions until someone notices it; 4) actually remove it; I suggest following this process instead: 1) deprecate something and add a DeprecationWarning; 2) decide how long the deprecation should last; 3) use the deprecated-removed[1] directive to document it; 4) add a test that fails after the update so that we remember to remove it[2]; Other related issues: PendingDeprecationWarnings: * AFAIK the difference between PDW and DW is that PDW are silenced by default; * now DW are silenced by default too, so there are no differences; * I therefore suggest we stop using PDW, but we can leave it around[3] (other projects might be using it for something different); Deprecation Progression: Before, we more or less used to deprecate in release X and remove in X+1, or add a PDW in X, DW in X+1, and remove it in X+2. I suggest we drop this scheme and just use DW until X+N, where N is >=1 and depends on what is being removed.
We can decide to leave the DW for 2-3 versions before removing something widely used, or just deprecate in X and remove in X+1 for things that are less used. Porting from 2.x to 3.x: Some people will update directly from 2.7 to 3.2 or even later versions (3.3, 3.4, ...), without going through earlier 3.x versions. If something is deprecated in 3.2 but not in 2.7 and is then removed in 3.3, people updating from 2.7 to 3.3 won't see any warning, and this will make porting even more difficult. I suggest that: * nothing that is available and not deprecated in 2.7 will be removed until 3.x (x needs to be defined); * possibly we start backporting warnings to 2.7 so that they are visible while running with -3; Documenting the deprecations: In order to advertise the deprecations, they should be documented: * in their doc, using the deprecated-removed directive (and possibly not the 'deprecated' one); * in the what's new, possibly listing everything that is currently deprecated, and when it will be removed; Django seems to do something similar[4]. (Another thing I would like is a different rendering for deprecated functions. Some parts of the docs have a deprecation warning at the top of the section, and the individual functions look normal if you miss that. Also, when linking to a deprecated function it would be nice to have it rendered with a different color or something similar.) Testing the deprecations: Tests that fail when a new release is made and the version number is bumped should be added to make sure we don't forget to remove the deprecated code. The test should have a related issue with a patch to remove the deprecated function and the test. Setting the priority of the issue to release blocker or deferred blocker can be done in addition/instead, but that works well only when N == 1 (the priority could be updated for every release though). The tests could be marked with an expected failure to give some time after the release to remove them.
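A minimal sketch of such a reminder test, assuming a hypothetical deprecated function old_api() and a made-up removal version REMOVE_IN. Neither name comes from CPython; the version is set far in the future only so that the sketch keeps passing when run today.

```python
import sys
import unittest
import warnings

# Hypothetical removal target; a real test would name the actual release.
REMOVE_IN = (99, 0)

def old_api():
    """Hypothetical deprecated function, used only for this illustration."""
    warnings.warn(
        "old_api() is deprecated and scheduled for removal",
        DeprecationWarning,
        stacklevel=2,
    )
    return 42

class DeprecationTests(unittest.TestCase):
    def test_old_api_warns(self):
        # The deprecated function still works but emits a DeprecationWarning.
        with self.assertWarns(DeprecationWarning):
            self.assertEqual(old_api(), 42)

    def test_old_api_removal_reminder(self):
        # Starts failing as soon as the running version reaches REMOVE_IN,
        # reminding whoever bumps the version to remove old_api().
        self.assertLess(
            sys.version_info[:2],
            REMOVE_IN,
            "old_api() was scheduled for removal in this release",
        )
```

The second test is the part that matters here: it carries no logic of its own and exists only to break the build when the version number is bumped.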
All the deprecation-related tests might be added to the same file, or left in the test file of their module. Where to add this: Once we agree about the process we should write it down somewhere. Possible candidates are: * PEP387: Backwards Compatibility Policy[5] (it has a few lines about this); * a new PEP; * the devguide; I think having it in a PEP would be good, the devguide can then link to it. Best Regards, Ezio Melotti [0]: http://bugs.python.org/issue13248 [1]: deprecated-removed doesn't seem to be documented in the documenting doc, but it was added here: http://hg.python.org/cpython/rev/03296316a892 [2]: see e.g. http://hg.python.org/cpython/file/default/Lib/unittest/test/test_case.py#l1187 [3]: we could also introduce a MetaDeprecationWarning and make PendingDeprecationWarning inherit from it so that it can be used to pending-deprecate itself. Once PendingDeprecationWarning is gone, the MetaDeprecationWarning will become useless and can then be used to meta-deprecate itself. [4]: https://docs.djangoproject.com/en/dev/internals/deprecation/ [5]: http://www.python.org/dev/peps/pep-0387/ From solipsis at pitrou.net Sat Jan 25 14:37:33 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 25 Jan 2014 14:37:33 +0100 Subject: [Python-Dev] Quick poll: should help() show bound arguments? References: <52E3388F.4090109@hastings.org> Message-ID: <20140125143733.48556505@fsol> On Fri, 24 Jan 2014 20:07:43 -0800 Larry Hastings wrote: > > A) pydoc and help() should not show bound parameters in the signature, > like inspect.signature. -1 from me. The problem is this will make help(c.foo) inconsistent with help(c) and help(C), and is bound to confuse newcomers. Speaking of which, I think asking for votes before all arguments have been made is counter-productive. Regards Antoine. 
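For reference, the two behaviours behind options A and B can be reproduced with the standard inspect module. This is a sketch for illustration; the class C and method foo mirror the names used in the thread, and nothing here changes what help() itself prints.

```python
import inspect

class C:
    def foo(self, a):
        pass

c = C()

# Looked up on the class, foo is a plain function in Python 3,
# so self is an ordinary parameter.
unbound = inspect.signature(C.foo)

# Looked up on an instance, foo is a bound method and
# inspect.signature omits the already-bound self (option A).
bound = inspect.signature(c.foo)

# inspect.getfullargspec still reports the bound first parameter (option B).
spec = inspect.getfullargspec(c.foo)

print(unbound)
print(bound)
print(spec.args)
```

The difference between the first two results is the inconsistency between help(C) and help(c.foo) that is being discussed.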
From larry at hastings.org Sat Jan 25 14:42:58 2014 From: larry at hastings.org (Larry Hastings) Date: Sat, 25 Jan 2014 05:42:58 -0800 Subject: [Python-Dev] Quick poll: should help() show bound arguments? In-Reply-To: <20140125143733.48556505@fsol> References: <52E3388F.4090109@hastings.org> <20140125143733.48556505@fsol> Message-ID: <52E3BF62.3070905@hastings.org> On 01/25/2014 05:37 AM, Antoine Pitrou wrote: > Speaking of which, I think asking for votes before all arguments have > been made is counter-productive. Sorry, I didn't realize there was an established protocol for this. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sat Jan 25 14:59:49 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 25 Jan 2014 14:59:49 +0100 Subject: [Python-Dev] Quick poll: should help() show bound arguments? References: <52E3388F.4090109@hastings.org> <20140125143733.48556505@fsol> <52E3BF62.3070905@hastings.org> Message-ID: <20140125145949.61f046f4@fsol> On Sat, 25 Jan 2014 05:42:58 -0800 Larry Hastings wrote: > On 01/25/2014 05:37 AM, Antoine Pitrou wrote: > > Speaking of which, I think asking for votes before all arguments have > > been made is counter-productive. > > Sorry, I didn't realize there was an established protocol for this. No, the problem is that people then vote assuming you have done all the research and presented all arguments for and against, which you haven't. Regards Antoine. From ncoghlan at gmail.com Sat Jan 25 16:16:49 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 26 Jan 2014 01:16:49 +1000 Subject: [Python-Dev] Argument Clinic: what to do with builtins with non-standard signatures? In-Reply-To: References: <52E281C3.1050303@hastings.org> Message-ID: On 25 January 2014 19:46, Stefan Behnel wrote: > FWIW, Cython knows a type called "bint" that is identical to a C int except > that it automatically coerces to and from a Python boolean value (using > truth testing). 
Seems to match the use case of the "p" that was added to > CPython's arg parsing now. Given that "p" hasn't been around for all that > long (and that Python didn't even have a bool type in its early days), it's > clear why the existing code misused "i" in so many places over the last > decades. > > I otherwise agree with Nick's comments above. It's sad that this can't just > be fixed at the interface level, though. We're building up a nice collection of edge cases to address in 3.5 - it's getting to the point where I'm starting to think we should create the 3.5 release PEP early so we can start making notes of things we've decided we would like to do but are too late for 3.4... Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jan 25 16:26:20 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 26 Jan 2014 01:26:20 +1000 Subject: [Python-Dev] New policies for the Derby -- please read! In-Reply-To: <52E3ABC6.5080506@hastings.org> References: <52E3ABC6.5080506@hastings.org> Message-ID: On 25 January 2014 22:19, Larry Hastings wrote: > 1) New policy for what can and cannot be converted during the Derby > The new policy for conversion work for Python 3.4: > > We may only convert functions that have signatures that we can represent > 100% accurately in an inspect.Signature object. +1. The builtin callables hit a few of these, and I think simply putting them aside for reconsideration post 3.4 release is our best option right now. > I encourage you to add a little text saying why, like: > > /* AC 3.4: waiting for *args support */ So, here's a suggestion I know you're not going to like, but I think is still worth considering: how about postponing varargs support in Argument Clinic until 3.5? 
Such a decision is not without cost: at least min, max and print can't be made to support programmatic introspection without it, and having unittest.mock.auto_spec(print) work would be a nice demonstration of *why* we think the cost of switching from print-as-statement to print-as-function was worth it in terms of unifying it with the rest of the language. However, you've indicated that adding varargs support is going to take you quite a bit of work, so postponing it is an option definitely worth considering at this point in the release cycle. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From larry at hastings.org Sat Jan 25 16:37:40 2014 From: larry at hastings.org (Larry Hastings) Date: Sat, 25 Jan 2014 07:37:40 -0800 Subject: [Python-Dev] New policies for the Derby -- please read! In-Reply-To: References: <52E3ABC6.5080506@hastings.org> Message-ID: <52E3DA44.8040801@hastings.org> On 01/25/2014 07:26 AM, Nick Coghlan wrote: > However, you've indicated that adding varargs support is going to take > you quite a bit of work, so postponing it is an option definitely > worth considering at this point in the release cycle. It's worth considering. I'm estimating it's about 1.5 days' worth of work. Mainly because, at the same time, I need to teach Clinic to have separate namespaces for "parser" and "impl" functions. At the same time I was going to implement a frequently-requested feature, allowing the C variable storing an argument to have a different name than the actual Python parameter. And it could be one of those "hey that was easier than I thought" things. 1.5 days is kind of worst-case. So maybe the best thing would be to give it a half-day and see if it turned out to be easy. Of course, If we bring the Derby to a close now-ish (debate going on in python-committers right now!), then I'll definitely put it off until 3.5. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
From wes.turner at gmail.com Sat Jan 25 10:24:01 2014 From: wes.turner at gmail.com (Wes Turner) Date: Sat, 25 Jan 2014 03:24:01 -0600 Subject: [Python-Dev] str.rreplace In-Reply-To: References: <20140125011402.GS3915@ando> Message-ID: On Jan 24, 2014 9:13 PM, "Nick Coghlan" wrote: > > On 25 January 2014 11:14, Steven D'Aprano wrote: > > On Sat, Jan 25, 2014 at 10:41:05AM +1000, Nick Coghlan wrote: > > > >> In this specific case, our general communication about the different > >> purposes of the core lists *isn't* particularly good [...] > > It isn't that the relevant information isn't available [...] > it's that there > are lots of ways to miss that information, > so there's always going to > be the occasional misdirected question. Should this sort of signal separation guidance be available from the following 3 URLs? - http://docs.python.org/devguide/communication.html - http://www.python.org/community/lists/ - https://mail.python.org/mailman/listinfo A do / do not treatment that could be linked to could also be helpful. From vincent at vincentdavis.net Sat Jan 25 14:47:13 2014 From: vincent at vincentdavis.net (Vincent Davis) Date: Sat, 25 Jan 2014 07:47:13 -0600 Subject: [Python-Dev] version numbers mismatched in google search results. Message-ID: When I do a google search the version numbers are mismatched with the linked page (or redirected). For example search for "python counter" I get the following results. (see attachment) It seems like the website is redirecting incorrectly. 1. collections - Python 3.3.3 documentation 1. links to http://docs.python.org/library/collections.html 2. redirects to http://docs.python.org/2/library/collections.html 3. Which is python 2.7.6 2. itertools - Python 3.3.3 documentation 1. links to http://docs.python.org/library/itertools.html 2. redirects to http://docs.python.org/2/library/itertools.html 3. Which is again 2.7.6 3. 8.3.
collections ? Container datatypes - Python 3.3.3 documentation 1. This one seems correct, 3.40b2 2. links to http://docs.python.org/dev/library/collections The link to addresses are not really true, they look more like: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCcQFjAA&url=http%3A%2F%2Fdocs.python.org%2Flibrary%2Fcollections.html&ei=k7vjUqPrHM_jsAS-m4G4Cw&usg=AFQjCNFTyb_RHzPdorBGavEIR_ekNn_AFA&sig2=yW6S02oUEfioUot11lTAlQ&bvm=bv.59930103,d.cWc Vincent Davis -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: python_counter.tiff Type: image/tiff Size: 153994 bytes Desc: not available URL: From benjamin at python.org Sat Jan 25 17:12:34 2014 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 25 Jan 2014 08:12:34 -0800 Subject: [Python-Dev] version numbers mismatched in google search results. In-Reply-To: References: Message-ID: <1390666354.4886.75222561.1A3A841F@webmail.messagingengine.com> On Sat, Jan 25, 2014, at 05:47 AM, Vincent Davis wrote: > When I do a google search the version numbers are mismatched with the > linked page (or redirected). > For example search for "python counter" I get the following results. (see > attachment) > It seems like the website is redirecting incorrectly. Internal links with no version redirect to the Python 2 version for backwards compatibility reasons. From g.brandl at gmx.net Sat Jan 25 17:26:03 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 25 Jan 2014 17:26:03 +0100 Subject: [Python-Dev] version numbers mismatched in google search results. 
In-Reply-To: <1390666354.4886.75222561.1A3A841F@webmail.messagingengine.com> References: <1390666354.4886.75222561.1A3A841F@webmail.messagingengine.com> Message-ID: On 25.01.2014 17:12, Benjamin Peterson wrote: > On Sat, Jan 25, 2014, at 05:47 AM, Vincent Davis wrote: >> When I do a google search the version numbers are mismatched with the >> linked page (or redirected). >> For example search for "python counter" I get the following results. (see >> attachment) >> It seems like the website is redirecting incorrectly. > > Internal links with no version redirect to the Python 2 version for > backwards compatibility reasons. Yep, and the URLs without version never served Python 3 docs as far as I can remember, so I don't know where Google has these <title>s from. Georg From storchaka at gmail.com Sat Jan 25 18:52:31 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 25 Jan 2014 19:52:31 +0200 Subject: [Python-Dev] version numbers mismatched in google search results. In-Reply-To: <lc0ogt$mo2$1@ger.gmane.org> References: <CALyJZZW03Zz=H2_PWD_250xn4w_ze-XRa=yk-+s+kCMX4ybwoA@mail.gmail.com> <1390666354.4886.75222561.1A3A841F@webmail.messagingengine.com> <lc0ogt$mo2$1@ger.gmane.org> Message-ID: <lc0tk1$gud$1@ger.gmane.org> On 25.01.14 18:26, Georg Brandl wrote: > On 25.01.2014 17:12, Benjamin Peterson wrote: >> On Sat, Jan 25, 2014, at 05:47 AM, Vincent Davis wrote: >>> When I do a google search the version numbers are mismatched with the >>> linked page (or redirected). >>> For example search for "python counter" I get the following results. (see >>> attachment) >>> It seems like the website is redirecting incorrectly. >> >> Internal links with no version redirect to the Python 2 version for >> backwards compatibility reasons. > > Yep, and the URLs without version never served Python 3 docs as far as I can > remember, so I don't know where Google has these <title>s from. Did Guido forget his time machine? 
From ethan at stoneleaf.us Sat Jan 25 19:36:13 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 25 Jan 2014 10:36:13 -0800 Subject: [Python-Dev] Deprecation policy In-Reply-To: <CACBhJdGOysQPA6HkvCEZtbLPC7pAMxEK02=VD0K8bTm2TbHgDw@mail.gmail.com> References: <CACBhJdGOysQPA6HkvCEZtbLPC7pAMxEK02=VD0K8bTm2TbHgDw@mail.gmail.com> Message-ID: <52E4041D.4080305@stoneleaf.us> On 01/25/2014 05:29 AM, Ezio Melotti wrote: > > a couple of years ago I suggested to define and document our > deprecation policy +1 -- ~Ethan~ From benjamin at python.org Sat Jan 25 20:05:30 2014 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 25 Jan 2014 11:05:30 -0800 Subject: [Python-Dev] version numbers mismatched in google search results. In-Reply-To: <CALyJZZXBfwopr6SR0iJ_bn=FGGD53wKucbcEeSCWsNswoGUb5w@mail.gmail.com> References: <CALyJZZW03Zz=H2_PWD_250xn4w_ze-XRa=yk-+s+kCMX4ybwoA@mail.gmail.com> <1390666354.4886.75222561.1A3A841F@webmail.messagingengine.com> <CALyJZZXBfwopr6SR0iJ_bn=FGGD53wKucbcEeSCWsNswoGUb5w@mail.gmail.com> Message-ID: <1390676730.5741.75263769.6502E7A9@webmail.messagingengine.com> On Sat, Jan 25, 2014, at 10:55 AM, Vincent Davis wrote: > On Sat, Jan 25, 2014 at 10:12 AM, Benjamin Peterson > <benjamin at python.org> wrote: > > > Internal links with no version redirect to the Python 2 version for > > backwards compatibility reasons. > > > > On Sat, Jan 25, 2014 at 10:26 AM, Georg Brandl <g.brandl at gmx.net> wrote: > > > Yep, and the URLs without version never served Python 3 docs as far as I > > can > > > remember, so I don't know where Google has these <title>s from. > > That is not consistent with > http://docs.python.org (no version number) redirects to > http://docs.python.org/3/ This is recent. It used to go to Python 2 docs. > Maybe this is related to google search results. > Seems wrong to me to point to 2.7 rather than 3.3 but I am sure there was > discussion about that. The internal links all used to go to Python 2. 
> > I looked (googled) for an example of a google link to current version of > python 3.3 documentation. My approach was to google "python" and > something > listed in > http://docs.python.org/3/whatsnew/3.3.html > These results all seem to point to http://docs.python.org/dev/library? > i.e. > 3.4.0b2 > > > Vincent Davis From solipsis at pitrou.net Sat Jan 25 22:35:34 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 25 Jan 2014 22:35:34 +0100 Subject: [Python-Dev] cpython: Issue #20311: asyncio: Add a granularity attribute to BaseEventLoop: maximum References: <3fBJp820h8z7LlL@mail.python.org> Message-ID: <20140125223534.7247d6e9@fsol> On Sat, 25 Jan 2014 15:02:56 +0100 (CET) victor.stinner <python-checkins at python.org> wrote: > + > + @tasks.coroutine > + def wait(): > + loop = self.loop > + calls.append(loop._run_once_counter) > + yield from tasks.sleep(loop.granularity * 10, loop=loop) > + calls.append(loop._run_once_counter) > + yield from tasks.sleep(loop.granularity / 10, loop=loop) > + calls.append(loop._run_once_counter) > + > + self.loop.run_until_complete(wait()) > + calls.append(self.loop._run_once_counter) > + self.assertEqual(calls, [1, 3, 5, 6]) Could you add a comment explaining the number of calls to run_once()? For example, why does it jump from 1 to 3 and then 3 to 5, rather than 1 to 2 and then 2 to 3? Regards Antoine. From christian at python.org Sun Jan 26 00:10:40 2014 From: christian at python.org (Christian Heimes) Date: Sun, 26 Jan 2014 00:10:40 +0100 Subject: [Python-Dev] cpython: Issue #20133: The audioop module now uses Argument Clinic. 
In-Reply-To: <3fBCMr1Wdpz7LjP@mail.python.org> References: <3fBCMr1Wdpz7LjP@mail.python.org> Message-ID: <lc1g93$nn3$1@ger.gmane.org> On 25.01.2014 10:58, serhiy.storchaka wrote: > http://hg.python.org/cpython/rev/d4099b8a7d0f > changeset: 88687:d4099b8a7d0f > user: Serhiy Storchaka <storchaka at gmail.com> > date: Sat Jan 25 11:57:59 2014 +0200 > summary: > Issue #20133: The audioop module now uses Argument Clinic. > > files: > Modules/audioop.c | 1077 ++++++++++++++----------- > Modules/audioop.clinic.c | 836 ++++++++++++++++++++ > 2 files changed, 1427 insertions(+), 486 deletions(-) >

Coverity has detected an issue in this commit:

** CID 1164423: Division or modulo by zero (DIVIDE_BY_ZERO)
/Modules/audioop.c: 1375 in audioop_ratecv_impl()

________________________________________________________________________________________________________
*** CID 1164423: Division or modulo by zero (DIVIDE_BY_ZERO)
/Modules/audioop.c: 1375 in audioop_ratecv_impl()
1369                without spurious overflow is the challenge; we can
1370                settle for a reasonable upper bound, though, in this
1371                case ceiling(len/inrate) * outrate. */
1372
1373             /* compute ceiling(len/inrate) without overflow */
1374             Py_ssize_t q = len > 0 ? 1 + (len - 1) / inrate : 0;
>>>     CID 1164423: Division or modulo by zero (DIVIDE_BY_ZERO)
>>>     In expression "9223372036854775807L / q", division by expression "q" which may be zero has undefined behavior.
1375             if (outrate > PY_SSIZE_T_MAX / q / bytes_per_frame)
1376                 str = NULL;
1377             else
1378                 str = PyBytes_FromStringAndSize(NULL,
1379                                                 q * outrate * bytes_per_frame);
1380         }

From brett at python.org Sun Jan 26 01:01:31 2014 From: brett at python.org (Brett Cannon) Date: Sat, 25 Jan 2014 19:01:31 -0500 Subject: [Python-Dev] lambda (x, y): In-Reply-To: <52E34E7A.10605@canterbury.ac.nz> References: <CANXboVaFfvK7d8zn71E=bR5NkRe7T4Hqmq_SQASWynKZE=mtMw@mail.gmail.com> <CAP1=2W4nO8w+GDv6xEGtWdhr9eaVmC6Rtd-thas+A6jF4hTkhw@mail.gmail.com> <52E34E7A.10605@canterbury.ac.nz> Message-ID: <CAP1=2W5cEPHPAYgHdo90628Bn4TrPXTTTcRXO8BOoO2BjP2Z7Q@mail.gmail.com> On Sat, Jan 25, 2014 at 12:41 AM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote: > Brett Cannon wrote: > >> >> On Fri, Jan 24, 2014 at 10:50 AM, Ram Rachum <ram at rachum.com <mailto: >> ram at rachum.com>> wrote: >> >> lambda (x, y): whatever >> >> http://python.org/dev/peps/pep-3113/ >> > > Part of the rationale in that PEP is that argument unpacking > can always be replaced by an explicitly named argument and > an unpacking assignment. No mention is made of the fact that > you can't do this in a lambda, giving the impression that > lambdas are deemed second-class citizens that are not worth > consideration. > > The author was clearly aware of the issue, since a strategy > is suggested for translation of lambdas by 2to3: > > lambda (x, y): x + y --> lambda x_y: x_y[0] + x_y[1] > > That's a bit on the ugly side for human use, though. > An alternative would be > > lambda xy: (lambda x, y: x + y)(*xy) > > Whether that's any better is a matter of opinion. As the author of the PEP, I can say that `lambda (x, y): x + y` can just as easily be expressed as `lambda x, y: x + y` and then be called by using *args in the argument list. Anything that gets much fancier typically calls for a defined function instead of a lambda. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/python-dev/attachments/20140125/fd7e5491/attachment.html> From brett at python.org Sun Jan 26 01:19:15 2014 From: brett at python.org (Brett Cannon) Date: Sat, 25 Jan 2014 19:19:15 -0500 Subject: [Python-Dev] Deprecation policy In-Reply-To: <CACBhJdGOysQPA6HkvCEZtbLPC7pAMxEK02=VD0K8bTm2TbHgDw@mail.gmail.com> References: <CACBhJdGOysQPA6HkvCEZtbLPC7pAMxEK02=VD0K8bTm2TbHgDw@mail.gmail.com> Message-ID: <CAP1=2W5G_W6_YftOqrWpjH63-rLrFXaon4Ak3zp_scrnwV1b3A@mail.gmail.com> On Sat, Jan 25, 2014 at 8:29 AM, Ezio Melotti <ezio.melotti at gmail.com>wrote: > Hi, > a couple of years ago I suggested to define and document our > deprecation policy in this thread: > https://mail.python.org/pipermail/python-dev/2011-October/114199.html > I didn't receive many replies and eventually nothing was done. > Lately the same issue came up on #python-dev and Larry and Nick > suggested me to bring this up again. Nick also suggested to document > our deprecation policy in PEP 5 (Guidelines for Language Evolution: > http://www.python.org/dev/peps/pep-0005/ ). > > I'm including below the full text of the original email. > > Best Regards, > Ezio Melotti > > ------------------------------- > > Hi, > our current deprecation policy is not so well defined (see e.g. 
[0]), > and it seems to me that it's something like: > 1) deprecate something and add a DeprecationWarning; > 2) forget about it after a while; > 3) wait a few versions until someone notices it; > 4) actually remove it; > > I suggest following this process: > 1) deprecate something and add a DeprecationWarning; > 2) decide how long the deprecation should last; > 3) use the deprecated-removed[1] directive to document it; > 4) add a test that fails after the update so that we remember to remove > it[2]; > > Other related issues: > > PendingDeprecationWarnings: > * AFAIK the difference between PDW and DW is that PDW are silenced by > default; > * now DW are silenced by default too, so there are no differences; > * I therefore suggest we stop using it, but we can leave it around[3] > (other projects might be using it for something different); > > Deprecation Progression: > Before, we more or less used to deprecate in release X and remove in > X+1, or add a PDW in X, DW in X+1, and remove it in X+2. > I suggest we drop this scheme and just use DW until X+N, where N is > >=1 and depends on what is being removed. We can decide to leave the > DW for 2-3 versions before removing something widely used, or just > deprecate in X and remove in X+1 for things that are less used. > The way I used PendingDeprecationWarning is that it is used until two releases before removal. If we want to add a keyword-only parameter to DeprecationWarning which specifies when the feature will be removed then PendingDeprecationWarning is really not needed; 'removal' maybe? Could then come up with a template format that's consistent when 'removal' is specified: '{} is deprecated and slated for removal in {} {}'.format(str(self.args), self.project, self.removal) # self.project defaults to 'Python'. 
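A minimal sketch of that idea, for illustration only. The class name RemovalDeprecationWarning and the spam() function are hypothetical; nothing like this exists in the stdlib, and the built-in DeprecationWarning takes no 'removal' argument. The sketch joins the args into a string rather than calling str() on the args tuple, which is a small departure from the template above.

```python
import warnings


class RemovalDeprecationWarning(DeprecationWarning):
    """Hypothetical DeprecationWarning variant that records a removal version."""

    def __init__(self, *args, removal=None, project="Python"):
        super().__init__(*args)
        self.removal = removal   # e.g. "3.6"; None means no removal date is set
        self.project = project   # defaults to 'Python', as suggested above

    def __str__(self):
        what = ", ".join(str(arg) for arg in self.args)
        if self.removal is None:
            return "{} is deprecated".format(what)
        return "{} is deprecated and slated for removal in {} {}".format(
            what, self.project, self.removal)


def spam():
    """Stand-in for a deprecated function (hypothetical)."""
    warnings.warn(RemovalDeprecationWarning("spam()", removal="3.6"),
                  stacklevel=2)
    return 42


# Record the warning instead of printing it, so the message can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    spam()

message = str(caught[0].message)
print(message)
```

Because the subclass still derives from DeprecationWarning, existing warning filters would keep treating it the same way; only the message text changes.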
Depending on how fancy we get we could have it so that when you warn on a DeprecationWarning where 'removal' equals sys.version (or whatever) then it always turns into an error case and is thus raised as an exception. -Brett > > Porting from 2.x to 3.x: > Some people will update directly from 2.7 to 3.2 or even later > versions (3.3, 3.4, ...), without going through earlier 3.x versions. > If something is deprecated on 3.2 but not in 2.7 and then is removed > in 3.3, people updating from 2.7 to 3.3 won't see any warning, and > this will make the porting even more difficult. > I suggest that: > * nothing that is available and not deprecated in 2.7, will be > removed until 3.x (x needs to be defined); > * possibly we start backporting warnings to 2.7 so that they are > visible while running with -3; > > Documenting the deprecations: > In order to advertise the deprecations, they should be documented: > * in their doc, using the deprecated-removed directive (and possibly > not the 'deprecated' one); > * in the what's new, possibly listing everything that is currently > deprecated, and when it will be removed; > Django seems to do something similar[4]. > (Another thing I would like is a different rendering for deprecated > functions. Some parts of the docs have a deprecation warning on the > top of the section and the single functions look normal if you miss > that. Also while linking to a deprecated function it would be nice to > have it rendered with a different color or something similar.) > > Testing the deprecations: > Tests that fail when a new release is made and the version number is > bumped should be added to make sure we don't forget to remove it. > The test should have a related issue with a patch to remove the > deprecated function and the test. > Setting the priority of the issue to release blocker or deferred > blocker can be done in addition/instead, but that works well only when > N == 1 (the priority could be updated for every release though). 
> The tests could be marked with an expected failure to give some time > after the release to remove them. > All the deprecation-related tests might be added to the same file, or > left in the test file of their module. > > Where to add this: > Once we agree about the process we should write it down somewhere. > Possible candidates are: > * PEP387: Backwards Compatibility Policy[5] (it has a few lines about > this); > * a new PEP; > * the devguide; > I think having it in a PEP would be good, the devguide can then link to it. > > > Best Regards, > Ezio Melotti > > > [0]: http://bugs.python.org/issue13248 > [1]: deprecated-removed doesn't seem to be documented in the > documenting doc, but it was added here: > http://hg.python.org/cpython/rev/03296316a892 > [2]: see e.g. > http://hg.python.org/cpython/file/default/Lib/unittest/test/test_case.py#l1187 > [3]: we could also introduce a MetaDeprecationWarning and make > PendingDeprecationWarning inherit from it so that it can be used to > pending-deprecate itself. Once PendingDeprecationWarning is gone, the > MetaDeprecationWarning will become useless and can then be used to > meta-deprecate itself. > [4]: https://docs.djangoproject.com/en/dev/internals/deprecation/ > [5]: http://www.python.org/dev/peps/pep-0387/ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20140125/df298bd8/attachment.html> From solipsis at pitrou.net Sun Jan 26 02:28:27 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 26 Jan 2014 02:28:27 +0100 Subject: [Python-Dev] cpython: Issue 19944: Fix importlib.find_spec() so it imports parents as needed. 
References: <3fBXDF0vdHz7LjS@mail.python.org> Message-ID: <20140126022827.7b713321@fsol> On Sat, 25 Jan 2014 23:37:49 +0100 (CET) eric.snow <python-checkins at python.org> wrote: > http://hg.python.org/cpython/rev/665f1ba77b57 > changeset: 88710:665f1ba77b57 > user: Eric Snow <ericsnowcurrently at gmail.com> > date: Sat Jan 25 15:32:46 2014 -0700 > summary: > Issue 19944: Fix importlib.find_spec() so it imports parents as needed. > > The function is also moved to importlib.util. Is there a reason to have separate "importlib" (toplevel) and "importlib.util" namespaces? (the doc says "This module contains the various objects that help in the construction of an importer", which doesn't sound related to find_spec()) Regards Antoine. From ethan at stoneleaf.us Sun Jan 26 02:24:20 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 25 Jan 2014 17:24:20 -0800 Subject: [Python-Dev] Quick poll: should help() show bound arguments? In-Reply-To: <CAPTjJmpwfoiRR-iCLyH34qWkpUyVgZ=F=KZMiZvJhYC0JzRLUA@mail.gmail.com> References: <52E3388F.4090109@hastings.org> <CAPTjJmpwfoiRR-iCLyH34qWkpUyVgZ=F=KZMiZvJhYC0JzRLUA@mail.gmail.com> Message-ID: <52E463C4.1010606@stoneleaf.us> On 01/25/2014 04:34 AM, Chris Angelico wrote: > On Sat, Jan 25, 2014 at 3:07 PM, Larry Hastings <larry at hastings.org> wrote: >> >> What should it be? >> >> A) pydoc and help() should not show bound parameters in the signature, like >> inspect.signature. > > Vote for A. As far as I'm concerned, all these foo are equally > callable and equally take one parameter named a: [snip] To strengthen this argument: --> import inspect --> from functools import partial --> def lots_of_args(a, b, c, d=3, e='wow', f=None): ... print(a, b, c, d, e, f) ... 
--> str(inspect.signature(lots_of_args)) "(a, b, c, d=3, e='wow', f=None)" --> curried = partial(lots_of_args, 9, f='Some') --> str(inspect.signature(curried)) "(b, c, d=3, e='wow', f='Some')" While I partially agree with Antoine that the whole self thing is confusing, I think it would be more accurate to only give help (and a signature) on parameters that you can actually change; if you are calling a bound method there is no way to pass in something else in place of self. So I vote for A. -- ~Ethan~ From ericsnowcurrently at gmail.com Sun Jan 26 02:57:31 2014 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 25 Jan 2014 18:57:31 -0700 Subject: [Python-Dev] cpython: Issue 19944: Fix importlib.find_spec() so it imports parents as needed. In-Reply-To: <20140126022827.7b713321@fsol> References: <3fBXDF0vdHz7LjS@mail.python.org> <20140126022827.7b713321@fsol> Message-ID: <CALFfu7C3_wEuprE4_E_Cgx35bgC7gP309dKnO1Ab6XsXsKdmDA@mail.gmail.com> On Sat, Jan 25, 2014 at 6:28 PM, Antoine Pitrou <solipsis at pitrou.net> wrote: > Is there a reason to have separate "importlib" (toplevel) and > "importlib.util" namespaces? As to why they are separate, you'll need to ask Brett. I believe it's meant to keep the top namespace as small as possible. Regarding this changeset, it depended on importlib.util.resolve_name(), so moving find_spec() made sense. We discussed it briefly in issue #19944 and everyone there agreed it was fine. 
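For illustration, a short sketch of how the relocated function behaves, assuming Python 3.4 or later. The module names are arbitrary stdlib picks, and the last name is assumed not to exist on the system.

```python
import importlib.util
import sys

# Looking up a submodule imports its parent packages as needed,
# which is the behaviour the changeset describes.
spec = importlib.util.find_spec("email.mime.text")
print(spec.name)
print("email.mime" in sys.modules)

# resolve_name() turns a relative name plus a package into an absolute name.
relative = importlib.util.resolve_name(".text", "email.mime")
print(relative)

# A missing top-level module yields None rather than raising.
missing = importlib.util.find_spec("no_such_module_for_this_example")
print(missing)
```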
> > (the doc says "This module contains the various objects that help in > the construction of an importer", which doesn't sound related to > find_spec()) We should fix that. :) -eric From raymond.hettinger at gmail.com Sun Jan 26 03:30:24 2014 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sat, 25 Jan 2014 18:30:24 -0800 Subject: [Python-Dev] Deprecation policy In-Reply-To: <CACBhJdGOysQPA6HkvCEZtbLPC7pAMxEK02=VD0K8bTm2TbHgDw@mail.gmail.com> References: <CACBhJdGOysQPA6HkvCEZtbLPC7pAMxEK02=VD0K8bTm2TbHgDw@mail.gmail.com> Message-ID: <3A20438F-02C4-4D92-BE72-C2F146AC99BA@gmail.com> On Jan 25, 2014, at 5:29 AM, Ezio Melotti <ezio.melotti at gmail.com> wrote: > Nick also suggested to document > our deprecation policy in PEP 5 (Guidelines for Language Evolution: > http://www.python.org/dev/peps/pep-0005/ ). Here are a few thoughts on deprecations: * If we care at all about people moving to Python 3, then we'll stop doing anything that makes the process more difficult. For someone moving from Python 2.7, it really doesn't matter if something that existed in 2.7 got deprecated in 3.1 and removed in 3.3; from their point-of-view, it's just one more thing that won't work. * The notion of PendingDeprecationWarnings didn't work out very well. Conceptually, it was a nice idea, but in practice no one was benefitting from it. The warnings slowed down working, not-yet-deprecated code. And no one was actually seeing the pending deprecations. * When a module becomes obsolete (say optparse vs argparse), there isn't really anything wrong with just leaving it in and making the docs indicate that something better is available. AFAICT, there isn't much value in actually removing the older tool. * A good use for deprecations is for features that were flat-out misdesigned and prone to error. For those, there is nothing wrong with deprecating them right away. 
Once deprecated though, there doesn't need to be a rush to actually remove it -- that just makes it harder for people with currently working code to upgrade to newer versions of Python. * When I became a core developer well over a decade ago, I was a little deprecation happy (old stuff must go, keep everything nice and clean, etc). What I learned though is that deprecations are very hard on users and that the purported benefits usually aren't really important. my-two-cents, Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20140125/cb6739cd/attachment.html> From raymond.hettinger at gmail.com Sun Jan 26 03:42:43 2014 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sat, 25 Jan 2014 18:42:43 -0800 Subject: [Python-Dev] lambda (x, y): In-Reply-To: <CAP1=2W5cEPHPAYgHdo90628Bn4TrPXTTTcRXO8BOoO2BjP2Z7Q@mail.gmail.com> References: <CANXboVaFfvK7d8zn71E=bR5NkRe7T4Hqmq_SQASWynKZE=mtMw@mail.gmail.com> <CAP1=2W4nO8w+GDv6xEGtWdhr9eaVmC6Rtd-thas+A6jF4hTkhw@mail.gmail.com> <52E34E7A.10605@canterbury.ac.nz> <CAP1=2W5cEPHPAYgHdo90628Bn4TrPXTTTcRXO8BOoO2BjP2Z7Q@mail.gmail.com> Message-ID: <7F648ECD-9262-4FC3-A0B8-6556112ECD8E@gmail.com> On Jan 25, 2014, at 4:01 PM, Brett Cannon <brett at python.org> wrote: > As the author of the PEP and I can say that `lambda (x, y): x + y` can just as easily be expressed as `lambda x, y: x + y` and then be called by using *args in the argument list. Anything that gets much fancier typically calls for a defined function instead of a lambda. I think that is an over-simplification. The argument unpacking was handy in a number of situations where *args wouldn't suffice: lambda (px, py), (qx, qy): ((px - qx) ** 2 + (py - qy) ** 2) ** 0.5 IIRC, the original reason for the change was that it simplified the compiler a bit, not that it was broken or not useful. 
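For comparison, here is a sketch of how the two-point distance lambda above can be spelled in Python 3. The names dist_indexed, dist_nested and dist are made up for this illustration, and the nested form relies on multiple star-unpackings in one call, which assumes Python 3.5 or later.

```python
# Python 2 only (a SyntaxError in Python 3):
#   lambda (px, py), (qx, qy): ((px - qx) ** 2 + (py - qy) ** 2) ** 0.5

# 2to3-style translation: index into the tuples instead of unpacking them.
dist_indexed = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

# Nested-lambda trick (as in the lambda xy: ... example quoted earlier),
# extended to two tuples by star-unpacking both into an inner lambda.
dist_nested = lambda p, q: (
    lambda px, py, qx, qy: ((px - qx) ** 2 + (py - qy) ** 2) ** 0.5
)(*p, *q)


# A defined function, with the unpacking done as an ordinary assignment.
def dist(p, q):
    (px, py), (qx, qy) = p, q
    return ((px - qx) ** 2 + (py - qy) ** 2) ** 0.5


print(dist_indexed((0, 0), (3, 4)), dist_nested((0, 0), (3, 4)),
      dist((0, 0), (3, 4)))
```

All three take the same two tuples; which one reads best is, as noted in the thread, a matter of opinion.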
Taking-out tuple unpacking might have been a good idea for the reasons listed in the PEP, but we shouldn't pretend that it didn't cripple some of the use cases for lambda where some of the arguments came in as tuples (host/port pairs, x-y coordinates, hue-saturation-luminosity, month-day-year, etc). Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20140125/7bdcce0c/attachment.html> From ncoghlan at gmail.com Sun Jan 26 04:04:47 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 26 Jan 2014 13:04:47 +1000 Subject: [Python-Dev] version numbers mismatched in google search results. In-Reply-To: <1390676730.5741.75263769.6502E7A9@webmail.messagingengine.com> References: <CALyJZZW03Zz=H2_PWD_250xn4w_ze-XRa=yk-+s+kCMX4ybwoA@mail.gmail.com> <1390666354.4886.75222561.1A3A841F@webmail.messagingengine.com> <CALyJZZXBfwopr6SR0iJ_bn=FGGD53wKucbcEeSCWsNswoGUb5w@mail.gmail.com> <1390676730.5741.75263769.6502E7A9@webmail.messagingengine.com> Message-ID: <CADiSq7fe3ysPagR-VJPUAQTWayqNLgLJ2ku8FG6SnNHb2kD8Ag@mail.gmail.com> On 26 January 2014 05:05, Benjamin Peterson <benjamin at python.org> wrote: > > > On Sat, Jan 25, 2014, at 10:55 AM, Vincent Davis wrote: >> On Sat, Jan 25, 2014 at 10:12 AM, Benjamin Peterson >> <benjamin at python.org>wrote: >> >> > Internal links with no version redirect to the Python 2 version for >> > backwards compatibility reasons. >> > >> >> On Sat, Jan 25, 2014 at 10:26 AM, Georg Brandl <g.brandl at gmx.net> wrote: >> >> > Yep, and the URLs without version never served Python 3 docs as far as I >> > can >> > >> remember, so I don't know where Google has these <title>s from. >> >> That is not consistent with >> http://docs.python.org (no version number) redirects to >> http://docs.python.org/3/ > > This is recent. It used to go to Python 2 docs. http://www.python.org/dev/peps/pep-0430/ covers the rationale for the current arrangement. 
The main issue is the extensive use of existing deep links into the Python 2 documentation from Python 2 specific tutorials and other references. Those third party references not only include vast numbers of online resources that we don't control, but also books that can't be updated at all. So, the canonical URLs on docs.python.org now always include the major version number in the path so they're unambiguous, the Python 3 docs are displayed by default, and unqualified deep links redirect to Python 2 for backwards compatibility. The robots.txt on python.org is *supposed* to keep the web crawlers away from the "/dev/" subtree (since most people searching for Python info aren't going to want the docs for an unreleased version), but I don't know if that's documented anywhere, or even if it's currently still configured that way. >> Maybe this is related to google search results. >> Seems wrong to me to point to 2.7 rather that 3.3 but I am sure there was >> discussion about that. > > The internal links all used to go to Python 2. There's also a lot of weight given in Google to the extensive array of existing unqualified deep links, which relate to Python 2. >> I looked (googled) for an example of a google link to current version of >> python 3.3 documentation. My approach was to google "python" and >> something >> listed in >> http://docs.python.org/3/whatsnew/3.3.html >> These results all seem to point to http://docs.python.org/dev/library >> i.e. >> 3.4.0b2 Which suggests that the Google web crawler *is* spidering the dev docs, which we generally don't want :P Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From vincent at vincentdavis.net Sun Jan 26 04:27:22 2014 From: vincent at vincentdavis.net (Vincent Davis) Date: Sat, 25 Jan 2014 21:27:22 -0600 Subject: [Python-Dev] version numbers mismatched in google search results. 
In-Reply-To: <CADiSq7fe3ysPagR-VJPUAQTWayqNLgLJ2ku8FG6SnNHb2kD8Ag@mail.gmail.com> References: <CALyJZZW03Zz=H2_PWD_250xn4w_ze-XRa=yk-+s+kCMX4ybwoA@mail.gmail.com> <1390666354.4886.75222561.1A3A841F@webmail.messagingengine.com> <CALyJZZXBfwopr6SR0iJ_bn=FGGD53wKucbcEeSCWsNswoGUb5w@mail.gmail.com> <1390676730.5741.75263769.6502E7A9@webmail.messagingengine.com> <CADiSq7fe3ysPagR-VJPUAQTWayqNLgLJ2ku8FG6SnNHb2kD8Ag@mail.gmail.com> Message-ID: <CALyJZZU-vxeKA-8QTFuxQsdr2WONeDJBhaBW3B48+OWqBGYpwQ@mail.gmail.com> I think subdomains need their own robots.txt, which neither docs.python.org nor docs.python.org/(2 or 3)/ has, and http://python.org/robots.txt (below) seems a little sparse. For sure /dev/ is not blocked:

# Directions for robots. See this URL:
# http://www.robotstxt.org/wc/norobots.html
# for a description of the file format.

User-agent: HTTrack
User-agent: puf
User-agent: MSIECrawler
Disallow: /

# The Krugle web crawler (though based on Nutch) is OK.
User-agent: Krugle
Allow: /
Disallow: /moin
Disallow: /pypi
Disallow: /~guido/orlijn/
Disallow: /wwwstats/
Disallow: /ftpstats/

# No one should be crawling us with Nutch.
User-agent: Nutch
Disallow: /

# Hide old versions of the documentation and various large sets of files.
User-agent: *
Disallow: /~guido/orlijn/
Disallow: /wwwstats/
Disallow: /webstats/
Disallow: /ftpstats/
Disallow: /moin
Disallow: /pypi
Disallow: /dev/buildbot/

Vincent Davis 720-301-3003 On Sat, Jan 25, 2014 at 9:04 PM, Nick Coghlan <ncoghlan at gmail.com> wrote: > On 26 January 2014 05:05, Benjamin Peterson <benjamin at python.org> wrote: > > > > > > On Sat, Jan 25, 2014, at 10:55 AM, Vincent Davis wrote: > >> On Sat, Jan 25, 2014 at 10:12 AM, Benjamin Peterson > >> <benjamin at python.org>wrote: > >> > >> > Internal links with no version redirect to the Python 2 version for > >> > backwards compatibility reasons. 
> >> > > >> > >> On Sat, Jan 25, 2014 at 10:26 AM, Georg Brandl <g.brandl at gmx.net> > wrote: > >> > >> > Yep, and the URLs without version never served Python 3 docs as far > as I > >> > can > >> > > >> remember, so I don't know where Google has these <title>s from. > >> > >> That is not consistent with > >> http://docs.python.org (no version number) redirects to > >> http://docs.python.org/3/ > > > > This is recent. It used to go to Python 2 docs. > > http://www.python.org/dev/peps/pep-0430/ covers the rationale for the > current arrangement. > > The main issue is the extensive use of existing deep links into the > Python 2 documentation from Python 2 specific tutorials and other > references. Those third party references not only include vast numbers > of online resources that we don't control, but also books that can't > be updated at all. > > So, the canonical URLs on docs.python.org now always include the major > version number in the path so they're unambiguous, the Python 3 docs > are displayed by default, and unqualified deep links redirect to > Python 2 for backwards compatibility. > > The robots.txt on python.org is *supposed* to keep the web crawlers > away from the "/dev/" subtree (since most people searching for Python > info aren't going to want the docs for an unreleased version), but I > don't know if that's documented anywhere, or even if it's currently > still configured that way. > > >> Maybe this is related to google search results. > >> Seems wrong to me to point to 2.7 rather that 3.3 but I am sure there > was > >> discussion about that. > > > > The internal links all used to go to Python 2. > > There's also a lot of weight given in Google to the extensive array of > existing unqualified deep links, which relate to Python 2. > > >> I looked (googled) for an example of a google link to current version of > >> python 3.3 documentation. 
My approach was to google "python" and > >> something > >> listed in > >> http://docs.python.org/3/whatsnew/3.3.html > >> These results all seem to point to http://docs.python.org/dev/library > >> i.e. > >> 3.4.0b2 > > Which suggests that the Google web crawler *is* spidering the dev > docs, which we generally don't want :P > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20140125/4aaf31d1/attachment.html> From ncoghlan at gmail.com Sun Jan 26 04:28:15 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 26 Jan 2014 13:28:15 +1000 Subject: [Python-Dev] Deprecation policy In-Reply-To: <3A20438F-02C4-4D92-BE72-C2F146AC99BA@gmail.com> References: <CACBhJdGOysQPA6HkvCEZtbLPC7pAMxEK02=VD0K8bTm2TbHgDw@mail.gmail.com> <3A20438F-02C4-4D92-BE72-C2F146AC99BA@gmail.com> Message-ID: <CADiSq7eFOd0XwL89kVoK-wcCTZRFEEgEUvf7OuKuyKn4Rco_UA@mail.gmail.com> On 26 January 2014 12:30, Raymond Hettinger <raymond.hettinger at gmail.com> wrote: > > On Jan 25, 2014, at 5:29 AM, Ezio Melotti <ezio.melotti at gmail.com> wrote: > > Nick also suggested to document > our deprecation policy in PEP 5 (Guidelines for Language Evolution: > http://www.python.org/dev/peps/pep-0005/ ). > > > Here's a few thoughts on deprecations: > > * If we care at all about people moving to Python 3, then we'll stop > doing anything that makes the process more difficult. For someone > moving from Python 2.7, it really doesn't matter if something that > existed in 2.7 got deprecated in 3.1 and removed in 3.3; from their > point-of-view, it just one more thing that won't work. > > * The notion of PendingDeprecationWarnings didn't work out very well. > Conceptually, it was a nice idea, but in practice no one was benefitting > from it. The warnings slowed down working, but not yet deprecated code. 
> And no one was actually seeing the pending deprecations.
>
> * When a module becomes obsolete (say optparse vs argparse), there
> isn't really anything wrong with just leaving it in and making the docs
> indicate that something better is available. AFAICT, there isn't much
> value in actually removing the older tool.
>
> * A good use for deprecations is for features that were flat-out misdesigned
> and prone to error. For those, there is nothing wrong with deprecating them
> right away. Once deprecated though, there doesn't need to be a rush to
> actually remove it -- that just makes it harder for people with currently
> working code to upgrade to newer versions of Python.
>
> * When I became a core developer well over a decade ago, I was a little
> deprecation happy (old stuff must go, keep everything nice and clean, etc).
> What I learned though is that deprecations are very hard on users and that
> the purported benefits usually aren't really important.

+1 to what Raymond said - unless something is actively causing us
maintenance hassles or is a genuine bug magnet (cf. contextlib.nested),
we should be very cautious about forcing users to change their code,
especially in the context of raising additional barriers to migration
from Python 2 to Python 3.

Ezio's suggestions make sense as a policy for how to handle the
situation when we *do* decide something needs programmatic deprecation,
but I think there needs to be an explicit caveat that programmatic
deprecation should be a last resort.
Wherever possible, we should also provide a PyPI module that helps
address the issue in a cross-version compatible way (whether through a
dedicated upstream and/or backport module like contextlib2 or by adding
features to existing cross-version compatibility modules like six and
future).

As an example that hopefully helps illustrate the two different cases,
the shift to enabling proper SSL/TLS verification by default would
qualify as a worthy use of programmatic deprecation (since the current
insecure-by-default behaviour is a genuine bug magnet that leads to
security flaws), but I'm not yet convinced that we actually gain any
significant benefit from deprecating the legacy import plugin APIs.

While PEP 451 nominally approved deprecating the latter, on the basis
that deprecating them will make importlib easier to maintain, they
received a stay of execution in 3.4 because the extension module import
system currently still needs them, and there are other valid use cases
that PEP 451 doesn't currently cover. At this point, I have come to
believe that retaining the existing
strictly-more-powerful-but-also-harder-to-use-correctly plugin API is a
better way to handle those more complex use cases rather than inventing
something new on top of PEP 451, as the latter approach would make
things *harder* to maintain (due to increased complexity and needing to
manage the deprecation process), and provide a worse experience for
users implementing custom import hooks that need to support multiple
versions (due to the introduction of new cross-version compatibility
issues).

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From benjamin at python.org Sun Jan 26 06:34:31 2014
From: benjamin at python.org (Benjamin Peterson)
Date: Sat, 25 Jan 2014 21:34:31 -0800
Subject: [Python-Dev] version numbers mismatched in google search results.
In-Reply-To: <CADiSq7fe3ysPagR-VJPUAQTWayqNLgLJ2ku8FG6SnNHb2kD8Ag@mail.gmail.com> References: <CALyJZZW03Zz=H2_PWD_250xn4w_ze-XRa=yk-+s+kCMX4ybwoA@mail.gmail.com> <1390666354.4886.75222561.1A3A841F@webmail.messagingengine.com> <CALyJZZXBfwopr6SR0iJ_bn=FGGD53wKucbcEeSCWsNswoGUb5w@mail.gmail.com> <1390676730.5741.75263769.6502E7A9@webmail.messagingengine.com> <CADiSq7fe3ysPagR-VJPUAQTWayqNLgLJ2ku8FG6SnNHb2kD8Ag@mail.gmail.com> Message-ID: <1390714471.32259.75378393.3A5AC458@webmail.messagingengine.com> On Sat, Jan 25, 2014, at 07:04 PM, Nick Coghlan wrote: > Which suggests that the Google web crawler *is* spidering the dev > docs, which we generally don't want :P I've now added a robots.txt to disallow crawling /dev. From tjreedy at udel.edu Sun Jan 26 06:34:56 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 26 Jan 2014 00:34:56 -0500 Subject: [Python-Dev] New policies for the Derby -- please read! In-Reply-To: <52E3DA44.8040801@hastings.org> References: <52E3ABC6.5080506@hastings.org> <CADiSq7ckJoUV2yA7Kyn3GAG27SJ02y-wd3uHTFKk23Xz9aS6-w@mail.gmail.com> <52E3DA44.8040801@hastings.org> Message-ID: <lc26po$3nf$1@ger.gmane.org> On 1/25/2014 10:37 AM, Larry Hastings wrote: > On 01/25/2014 07:26 AM, Nick Coghlan wrote: >> However, you've indicated that adding varargs support is going to take >> you quite a bit of work, so postponing it is an option definitely >> worth considering at this point in the release cycle. > > It's worth considering. I'm estimating it's about 1.5 days' worth of > work. Mainly because, at the same time, I need to teach Clinic to have > separate namespaces for "parser" and "impl" functions. At the same time > I was going to implement a frequently-requested feature, allowing the C > variable storing an argument to have a different name than the actual > Python parameter. > > And it could be one of those "hey that was easier than I thought" > things. 1.5 days is kind of worst-case. 
So maybe the best thing would
> be to give it a half-day and see if it turned out to be easy.
>
> Of course, If we bring the Derby to a close now-ish (debate going on in
> python-committers right now!), then I'll definitely put it off until 3.5.

I have been annoyed by the mismatch between Python signatures and
C-implementation for at least a decade. Now that the idea that all
functions (except possibly for range) should have Python signatures
seems to have been accepted, I am willing for the full implementation to
wait until 3.5. I think the Derby experiment has pretty well exposed
what still needs to be done, so I expect it should be possible to have
it all in 3.5 as long as people do not burn out first.

-- 
Terry Jan Reedy

From storchaka at gmail.com Sun Jan 26 08:40:42 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sun, 26 Jan 2014 09:40:42 +0200
Subject: [Python-Dev] cpython: Issue #20133: The audioop module now uses Argument Clinic.
In-Reply-To: <lc1g93$nn3$1@ger.gmane.org>
References: <3fBCMr1Wdpz7LjP@mail.python.org> <lc1g93$nn3$1@ger.gmane.org>
Message-ID: <lc2e5d$2lk$1@ger.gmane.org>

26.01.14 01:10, Christian Heimes wrote:
> Coverity has detected an issue in this commit:

Thank you. This is a false positive. http://bugs.python.org/issue20394.

From ncoghlan at gmail.com Sun Jan 26 13:04:00 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 26 Jan 2014 22:04:00 +1000
Subject: [Python-Dev] cpython: Issue 19944: Fix importlib.find_spec() so it imports parents as needed.
In-Reply-To: <CALFfu7C3_wEuprE4_E_Cgx35bgC7gP309dKnO1Ab6XsXsKdmDA@mail.gmail.com> References: <3fBXDF0vdHz7LjS@mail.python.org> <20140126022827.7b713321@fsol> <CALFfu7C3_wEuprE4_E_Cgx35bgC7gP309dKnO1Ab6XsXsKdmDA@mail.gmail.com> Message-ID: <CADiSq7fh-_h6-O+EuxmQc3_H+T_rb06B6epSgu7zGLT1Z=0+uQ@mail.gmail.com> On 26 January 2014 11:57, Eric Snow <ericsnowcurrently at gmail.com> wrote: > On Sat, Jan 25, 2014 at 6:28 PM, Antoine Pitrou <solipsis at pitrou.net> wrote: >> Is there a reason to have separate "importlib" (toplevel) and >> "importlib.util" namespaces? > > As to why they are separate, you'll need to ask Brett. I believe it's > meant to keep the top namespace as small as possible. Correct - the top level namespace is meant for people to use normally for dynamic imports, reloading modules and clearing the internal caches when dynamically creating modules. By contrast, importlib.util is for when people are poking around at import system internals more directly, writing custom importers and loaders, replicating parts of the import system, etc. It's a "if you don't know why you might want these operations, you probably don't need them" kind of module, so we can keep the top level namespace simple and relatively easy to learn. find_loader()/find_spec() are both borderline as to which category they fall into, but as Eric noted, there was a dependency issue in this case which meant there were practical benefits to putting find_spec() in the "import specialist" category. Cheers, Nick. 
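
The split between the two namespaces described above can be sketched roughly as follows. This is only an illustration, assuming Python 3.4 or later; the module names `json` and `json.decoder` are arbitrary standard-library examples, and the "missing" module name is made up.

```python
import importlib
import importlib.util

# Top-level namespace: the everyday operations (dynamic import by name,
# reloading, clearing the finder caches).
json_mod = importlib.import_module("json")
importlib.invalidate_caches()

# importlib.util: poking at import system internals. find_spec() reports
# how a module would be found; for a dotted name it imports the parent
# package first, but not the submodule itself.
spec = importlib.util.find_spec("json.decoder")

# A top-level name that cannot be found yields None rather than raising.
# (The name below is an assumption chosen to not exist.)
missing = importlib.util.find_spec("surely_not_a_real_module_name_12345")
```

Code that only ever needs `import_module()` never has to touch `importlib.util`, which seems to be the point of keeping the top level small.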
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Sun Jan 26 15:59:53 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 26 Jan 2014 15:59:53 +0100 Subject: [Python-Dev] Deprecation policy References: <CACBhJdGOysQPA6HkvCEZtbLPC7pAMxEK02=VD0K8bTm2TbHgDw@mail.gmail.com> <3A20438F-02C4-4D92-BE72-C2F146AC99BA@gmail.com> Message-ID: <20140126155953.169e0ef7@fsol> On Sat, 25 Jan 2014 18:30:24 -0800 Raymond Hettinger <raymond.hettinger at gmail.com> wrote: > > On Jan 25, 2014, at 5:29 AM, Ezio Melotti <ezio.melotti at gmail.com> wrote: > > > Nick also suggested to document > > our deprecation policy in PEP 5 (Guidelines for Language Evolution: > > http://www.python.org/dev/peps/pep-0005/ ). > > Here's a few thoughts on deprecations: > > * If we care at all about people moving to Python 3, then we'll stop > doing anything that makes the process more difficult. For someone > moving from Python 2.7, it really doesn't matter if something that > existed in 2.7 got deprecated in 3.1 and removed in 3.3; from their > point-of-view, it just one more thing that won't work. +1. > * The notion of PendingDeprecationWarnings didn't work out very well. > Conceptually, it was a nice idea, but in practice no one was benefitting > from it. The warnings slowed down working, but not yet deprecated code. > And no one was actually seeing the pending deprecations. +1 too, especially since DeprecationWarnings are now silent by default, there's no reason to start with a PendingDeprecationWarning IMO. Regards Antoine. 
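
For readers unfamiliar with the "silent by default" point above, here is a minimal sketch of how a DeprecationWarning can be surfaced explicitly. `old_api` is a hypothetical function invented for the illustration; only the `warnings` calls are real standard-library API.

```python
import warnings


def old_api():
    """Hypothetical deprecated function, used only for illustration."""
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


# Opt in to seeing every warning, including deprecations, and record
# them in a list instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_api()

categories = [w.category for w in caught]
messages = [str(w.message) for w in caught]
```

Outside of a test harness, running the interpreter with `-W default` has a similar effect for a whole program.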
From francismb at email.de Sun Jan 26 15:59:36 2014 From: francismb at email.de (francis) Date: Sun, 26 Jan 2014 15:59:36 +0100 Subject: [Python-Dev] lambda (x, y): In-Reply-To: <7F648ECD-9262-4FC3-A0B8-6556112ECD8E@gmail.com> References: <CANXboVaFfvK7d8zn71E=bR5NkRe7T4Hqmq_SQASWynKZE=mtMw@mail.gmail.com> <CAP1=2W4nO8w+GDv6xEGtWdhr9eaVmC6Rtd-thas+A6jF4hTkhw@mail.gmail.com> <52E34E7A.10605@canterbury.ac.nz> <CAP1=2W5cEPHPAYgHdo90628Bn4TrPXTTTcRXO8BOoO2BjP2Z7Q@mail.gmail.com> <7F648ECD-9262-4FC3-A0B8-6556112ECD8E@gmail.com> Message-ID: <52E522D8.2010507@email.de> On 01/26/2014 03:42 AM, Raymond Hettinger wrote: > > I think that is an over-simplification. The argument unpacking was handy > in a number of situations where *args wouldn't suffice: > > lambda (px, py), (qx, qy): ((px - qx) ** 2 + (py - qy) ** 2) ** 0.5 > > IIRC, the original reason for the change was that it simplified the > compiler a bit, > not that it was broken or not useful. I'm not sure if that's applicable or other issues arise with: def fn(*p): px,py,qx,qy = p; return ((px - qx) ** 2 + (py - qy) ** 2) ** 0.5 Thanks in advance! Francis From sky.kok at speaklikeaking.com Sun Jan 26 18:00:19 2014 From: sky.kok at speaklikeaking.com (Vajrasky Kok) Date: Mon, 27 Jan 2014 01:00:19 +0800 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) Message-ID: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> Dear comrades, I would like to bring to your attention my disagreement with Larry Hastings in this ticket: http://bugs.python.org/issue19145 (Inconsistent behaviour in itertools.repeat when using negative times). 
Let me give you the context:

>>> from itertools import repeat
>>> repeat('a')
repeat('a')
>>> repeat('a', times=-1)
repeat('a')
>>> repeat('a', -1)
repeat('a', 0)
>>> repeat('a', times=-4)
repeat('a', -4)
>>> repeat('a', -4)
repeat('a', 0)

Right now, the only way you can tell repeat to do endless repetitions
is to omit the `times` argument or to set the `times` argument to -1
via keyword.

Larry intends to fix this in Python 3.5 by making a None value for the
`times` argument mean unlimited repetitions, and by making a negative
`times` argument (whether passed by keyword or positionally) ALWAYS
mean 0 repetitions. This will ensure repeat has an appropriate
signature. I have no qualms about it. All is well.

My disagreement is with Larry's decision not to fix this bug in Python
2.7, 3.3, and 3.4. Both of us agree that we should not let Python 2.7,
3.3, and 3.4 happily accept a None value, because that is more than a
bug fix. What we don't agree on is whether the behaviour of a negative
`times` argument passed by keyword needs to be changed or not. He
prefers to let it be. I prefer that we change the behaviour so that a
negative `times` argument in Python 2.7, 3.3, and 3.4 ALWAYS means 0
repetitions.

My argument is that, in all circumstances, an argument passed to a
function positionally or by keyword must mean the same thing. Let's
consider this hypothetical code:

# 0 means draw, positive int means win, negative int means lose
result = goals_result_of_the_match()

# For every goal of the winning match, we donate money to charity.
# Every donation consists of 10 $.
import itertools
itertools.repeat(donate_money_to_charity(), result)

Later programmer B refactors this code:

# 0 means draw, positive int means win, negative int means lose
result = goals_result_of_the_match()

# For every goal of the winning match, we donate money to charity.
# Every donation consists of 10 $
from itertools import repeat
repeat(object=donate_money_to_charity(), times=result)

They use Python 2.7 (remember Python 2.7 is here to stay for a long
long time). And imagine the match is lost 0-1 or 1-2 or 2-3 (so the
goal difference is negative one / -1). It means they donate money to
charity endlessly!!! They can go bankrupt.

So I hope my argument is convincing enough. We need to fix this bug in
Python 2.7, 3.3, and 3.4, by making a negative `times` argument in
itertools.repeat, whether passed positionally or by keyword, ALWAYS
mean the same thing, which is 0 repetitions.

If this is not possible, at the very least, we need to warn about this
behaviour in the doc.

Whatever decision comes out of this discussion, I will make peace
with it.

Vajrasky

From eric at trueblade.com Sun Jan 26 18:16:30 2014
From: eric at trueblade.com (Eric V. Smith)
Date: Sun, 26 Jan 2014 12:16:30 -0500
Subject: [Python-Dev] lambda (x, y):
In-Reply-To: <52E522D8.2010507@email.de>
References: <CANXboVaFfvK7d8zn71E=bR5NkRe7T4Hqmq_SQASWynKZE=mtMw@mail.gmail.com> <CAP1=2W4nO8w+GDv6xEGtWdhr9eaVmC6Rtd-thas+A6jF4hTkhw@mail.gmail.com> <52E34E7A.10605@canterbury.ac.nz> <CAP1=2W5cEPHPAYgHdo90628Bn4TrPXTTTcRXO8BOoO2BjP2Z7Q@mail.gmail.com> <7F648ECD-9262-4FC3-A0B8-6556112ECD8E@gmail.com> <52E522D8.2010507@email.de>
Message-ID: <52E542EE.8020500@trueblade.com>

On 01/26/2014 09:59 AM, francis wrote:
> On 01/26/2014 03:42 AM, Raymond Hettinger wrote:
>>
>> I think that is an over-simplification. The argument unpacking was handy
>> in a number of situations where *args wouldn't suffice:
>>
>> lambda (px, py), (qx, qy): ((px - qx) ** 2 + (py - qy) ** 2) ** 0.5
>>
>> IIRC, the original reason for the change was that it simplified the
>> compiler a bit,
>> not that it was broken or not useful.
> I'm not sure if that's applicable or other issues arise with:
>
> def fn(*p): px,py,qx,qy = p; return ((px - qx) ** 2 + (py - qy) ** 2) **
> 0.5

[Dropping some whitespace to get things to all fit on one line]

The goal is to call fn as:
fn((1, 1), (2, 2))

So, in 2.7:
>>> def fn((px, py), (qx, qy)):
...     return ((px-qx)**2+(py-qy)**2)**0.5
...
>>> fn((1, 1), (2, 2))
1.4142135623730951
>>>

The nearest equivalent in 3.3 (also works in 2.7):
>>> def fn(p, q):
...     (px, py), (qx, qy) = p, q
...     return ((px-qx)**2+(py-qy)**2)**0.5
...
>>> fn((1, 1), (2, 2))
1.4142135623730951

For a lambda in 3.3, you're out of luck because you can't do the
assignment. There, the best you can do is:
>>> fn = lambda p, q: ((p[0]-q[0])**2+(p[1]-q[1])**2)**0.5
>>> fn((1, 1), (2, 2))
1.4142135623730951

Which isn't quite so readable as the equivalent lambda in 2.7:
>>> fn = lambda (px, py),(qx, qy):((px-qx)**2+(py-qy)**2)**0.5
>>> fn((1, 1), (2, 2))
1.4142135623730951

As someone pointed out, it's not quite the same when you do your own
tuple unpacking, but it's probably close enough for most cases.

Eric.

From storchaka at gmail.com Sun Jan 26 19:43:00 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sun, 26 Jan 2014 20:43:00 +0200
Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4)
In-Reply-To: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com>
References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com>
Message-ID: <lc3kv7$dgl$1@ger.gmane.org>

26.01.14 19:00, Vajrasky Kok wrote:
> So I hope my argument is convincing enough. We need to fix this bug in
> Python 2.7, 3.3, and 3.4, by making `times` argument sent via
> positional or keyword in itertools.repeat ALWAYS means the same thing,
> which is 0 repetitions.
>
> If this is not possible, at the very least, we need to warn this
> behaviour in the doc.

I agree with Vajrasky.
This is obvious bug and it should be fixed in any way. From larry at hastings.org Mon Jan 27 01:24:06 2014 From: larry at hastings.org (Larry Hastings) Date: Sun, 26 Jan 2014 16:24:06 -0800 Subject: [Python-Dev] The Argument Clinic Derby is drawing to a close Message-ID: <52E5A726.4090307@hastings.org> The first release candidate of Python 3.4 will be tagged in about two weeks. We need to be completely done with the Derby by then. And it's going to take a while to review and iterate on the patches we've got. Therefore: I'm going to stop accepting submissions for new patches in two days. Patches posted to the issue tracker on or after Wednesday Jan 29 at 12:00:01am will not be accepted. If you have a patch you're still working on, you have until then to get it in. Please make sure that all patches, whether they've been posted yet or not, conform to the new conversion policy for the Derby: https://mail.python.org/pipermail/python-dev/2014-January/132066.html Thanks for participating! //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20140126/4248b9d3/attachment.html> From larry at hastings.org Mon Jan 27 01:26:40 2014 From: larry at hastings.org (Larry Hastings) Date: Sun, 26 Jan 2014 16:26:40 -0800 Subject: [Python-Dev] The Argument Clinic Derby is drawing to a close In-Reply-To: <52E5A726.4090307@hastings.org> References: <52E5A726.4090307@hastings.org> Message-ID: <52E5A7C0.2050208@hastings.org> On 01/26/2014 04:24 PM, Larry Hastings wrote: > Patches posted to the issue tracker on or after Wednesday Jan 29 at > 12:00:01am will not be accepted. Sorry, forgot to specify the time zone: PST, which is GMT -08:00. Put another way, the submission window closes about 55.5 hours from when I posted this message. Cheers, //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/python-dev/attachments/20140126/4e3f38e2/attachment.html> From larry at hastings.org Mon Jan 27 02:24:00 2014 From: larry at hastings.org (Larry Hastings) Date: Sun, 26 Jan 2014 17:24:00 -0800 Subject: [Python-Dev] The Argument Clinic Derby is drawing to a close In-Reply-To: <52E5A726.4090307@hastings.org> References: <52E5A726.4090307@hastings.org> Message-ID: <52E5B530.5030509@hastings.org> On 01/26/2014 04:24 PM, Larry Hastings wrote: > The first release candidate of Python 3.4 will be tagged in about two > weeks. We need to be completely done with the Derby by then. And it's > going to take a while to review and iterate on the patches we've got. Since I was asked to clarify: we aren't going to convert every possible candidate function before the close of the Derby. We'll just ship what we have, and Python 3.4 will only have partial coverage for builtins. Once trunk reopens for Python 3.5 development work we'll pick up where we left off. Thanks again, //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20140126/97b8cf6f/attachment.html> From ncoghlan at gmail.com Mon Jan 27 02:52:41 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 27 Jan 2014 11:52:41 +1000 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <lc3kv7$dgl$1@ger.gmane.org> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <lc3kv7$dgl$1@ger.gmane.org> Message-ID: <CADiSq7eTqtObNeyHC8m01zpQze_1BKHrithFa+Bt=d0GC3nygw@mail.gmail.com> On 27 Jan 2014 04:44, "Serhiy Storchaka" <storchaka at gmail.com> wrote: > > 26.01.14 19:00, Vajrasky Kok ???????(??): > >> So I hope my argument is convincing enough. 
We need to fix this bug in >> Python 2.7, 3.3, and 3.4, by making `times` argument sent via >> positional or keyword in itertools.repeat ALWAYS means the same thing, >> which is 0 repetitions. >> >> If this is not possible, at the very least, we need to warn this >> behaviour in the doc. > > > I agree with Vajrasky. This is obvious bug and it should be fixed in any way. Larry and I were discussing this problem on IRC and the problem is that we can't remove this behaviour without providing equivalent functionality under a different spelling. In 3.5, that will be passing None, rather than -1. For those proposing to change the maintenance releases, how should a user relying on this misbehaviour update their code to handle it? There's also the fact that breaking working code in a maintenance release is always dubious, especially when there's no current supported way to get the equivalent behaviour prior to the maintenance release. This is the kind of change that will require a note in the "porting to Python 3.5" section of the What's New, again suggesting strongly that we can't change it in the maintenance releases. Cheers, Nick. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/python-dev/attachments/20140127/44047e5f/attachment.html> From python at mrabarnett.plus.com Mon Jan 27 03:39:52 2014 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 27 Jan 2014 02:39:52 +0000 Subject: [Python-Dev] The Argument Clinic Derby is drawing to a close In-Reply-To: <52E5A7C0.2050208@hastings.org> References: <52E5A726.4090307@hastings.org> <52E5A7C0.2050208@hastings.org> Message-ID: <52E5C6F8.5050700@mrabarnett.plus.com> On 2014-01-27 00:26, Larry Hastings wrote: > > On 01/26/2014 04:24 PM, Larry Hastings wrote: >> Patches posted to the issue tracker on or after Wednesday Jan 29 at >> 12:00:01am will not be accepted. > > Sorry, forgot to specify the time zone: PST, which is GMT -08:00. > > Put another way, the submission window closes about 55.5 hours from when > I posted this message. > Put yet another way, 2014-01-29T08:00Z. From alexander.belopolsky at gmail.com Mon Jan 27 04:29:18 2014 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 26 Jan 2014 22:29:18 -0500 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <CADiSq7eTqtObNeyHC8m01zpQze_1BKHrithFa+Bt=d0GC3nygw@mail.gmail.com> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <lc3kv7$dgl$1@ger.gmane.org> <CADiSq7eTqtObNeyHC8m01zpQze_1BKHrithFa+Bt=d0GC3nygw@mail.gmail.com> Message-ID: <CAP7h-xYOfKcwqRgohZ2Cpa78MCmg6Jp9Q1Pzw9vhO0skJ6D43A@mail.gmail.com> On Sun, Jan 26, 2014 at 8:52 PM, Nick Coghlan <ncoghlan at gmail.com> wrote: > There's also the fact that breaking working code in a maintenance release > is always dubious, especially when there's no current supported way to get > the equivalent behaviour prior to the maintenance release. 
This is the kind
> of change that will require a note in the "porting to Python 3.5" section
> of the What's New, again suggesting strongly that we can't change it in the
> maintenance releases.

It looks like there is more than one bug related to passing negative
times as a keyword argument:

>>> from itertools import *
>>> ''.join(repeat('a', times=-4))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: long int too large to convert to int

itertools.repeat() is documented [1] as being equivalent to:

def repeat(object, times=None):
    # repeat(10, 3) --> 10 10 10
    if times is None:
        while True:
            yield object
    else:
        for i in range(times):
            yield object

The behavior of the CPython implementation is clearly wrong. If there
are people relying on it - they already have code that would break in
other implementations. (I would say not accepting None for times is a
bug as well if you read the docs literally.)

When implementation behavior differs from documentation it is a bug by
definition and a fix should go in bug-fix releases. Reproducing old
behavior is fairly trivial using an old_repeat(object, *args, **kwds)
wrapper to distinguish between arguments passed positionally and by
keyword. However, I seriously doubt that anyone would need that.

[1] http://docs.python.org/3/library/itertools.html#itertools.repeat
-------------- next part -------------- An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20140126/a85d3894/attachment-0001.html>

From alexander.belopolsky at gmail.com Mon Jan 27 04:51:20 2014
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 26 Jan 2014 22:51:20 -0500
Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4)
In-Reply-To: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com>
References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com>
Message-ID: <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com>

On Sun, Jan 26, 2014 at 12:00 PM, Vajrasky Kok <sky.kok at speaklikeaking.com>wrote:

> >>> repeat('a', times=-1)
> repeat('a')
>

As I think about it, this may be more than a bug but a door for a
denial of service attack. Imagine an application where times comes from
an untrusted source. Someone relying on documented behavior may decide
to sanitize the value by only checking against an upper bound, assuming
that negative values would just lead to no repetitions. If an attacker
could somehow cause times to get the value of -1, this may cause an
infinite loop, blow up memory, etc.

If you think this is far-fetched - consider a web app that uses
repeat() as a part of logic to pretty-print user input. The times value
may come from a calculation of a difference between the screen width
and the length of some string - both under user control.

So maybe the fix should go into security bugs only branches as well.
-------------- next part -------------- An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20140126/32f52a59/attachment.html> From ncoghlan at gmail.com Mon Jan 27 05:02:29 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 27 Jan 2014 14:02:29 +1000 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> Message-ID: <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> On 27 January 2014 13:51, Alexander Belopolsky <alexander.belopolsky at gmail.com> wrote: > > On Sun, Jan 26, 2014 at 12:00 PM, Vajrasky Kok <sky.kok at speaklikeaking.com> > wrote: >> >> >>> repeat('a', times=-1) >> repeat('a') > > > As I think about it, this may be more than a bug but a door for a denial of > service attack. Imagine an application where times comes from an untrusted > source. Someone relying on documented behavior may decide to sanitize the > value by only checking against an upper bound assuming that negative values > would just lead to no repetitions. If an attacker could somehow case times > to get the value of -1 this may cause an infinite loop, blow up memory etc. > > If you think this is far-fetched - consider a web app that uses repeat() as > a part of logic to pretty-print user input. The times value may come from a > calculation of a difference between the screen width and the length of some > string - both under user control. > > So maybe the fix should go into security bugs only branches as well. If we do go this path, then we should backport the full fix (i.e. accepting None to indicate repeating forever), rather than just a partial fix. That is, I'm OK with either not backporting anything at all, or backporting the full change. 
The only idea I object to is the one of removing the infinite iteration capability without providing a replacement spelling for it. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tshepang at gmail.com Mon Jan 27 05:07:39 2014 From: tshepang at gmail.com (Tshepang Lekhonkhobe) Date: Mon, 27 Jan 2014 06:07:39 +0200 Subject: [Python-Dev] The Argument Clinic Derby is drawing to a close In-Reply-To: <52E5A726.4090307@hastings.org> References: <52E5A726.4090307@hastings.org> Message-ID: <CAA77j2CtpBAW-HjKwVVH=0fqG6G+V-OhuHJge71kLFU5YxcdrA@mail.gmail.com> On Mon, Jan 27, 2014 at 2:24 AM, Larry Hastings <larry at hastings.org> wrote: > > > The first release candidate of Python 3.4 will be tagged in about two weeks. > We need to be completely done with the Derby by then. And it's going to > take a while to review and iterate on the patches we've got. > > Therefore: I'm going to stop accepting submissions for new patches in two > days. Patches posted to the issue tracker on or after Wednesday Jan 29 at > 12:00:01am will not be accepted. If you have a patch you're still working > on, you have until then to get it in. Why not delay rc1 by a week or two and get this work done? 
From alexander.belopolsky at gmail.com Mon Jan 27 05:20:50 2014 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 26 Jan 2014 23:20:50 -0500 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> Message-ID: <CAP7h-xbDarKLiO6eXS_Ct1zaKf1QHoLe_xqwrq35nS8SZ8G2QA@mail.gmail.com> On Sun, Jan 26, 2014 at 11:02 PM, Nick Coghlan <ncoghlan at gmail.com> wrote: > That is, I'm OK with either not backporting anything at all, or > backporting the full change. > +1 A partial backport will do a disservice to both users and maintainers. From sky.kok at speaklikeaking.com Mon Jan 27 05:26:18 2014 From: sky.kok at speaklikeaking.com (Vajrasky Kok) Date: Mon, 27 Jan 2014 12:26:18 +0800 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <CAP7h-xbDarKLiO6eXS_Ct1zaKf1QHoLe_xqwrq35nS8SZ8G2QA@mail.gmail.com> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> <CAP7h-xbDarKLiO6eXS_Ct1zaKf1QHoLe_xqwrq35nS8SZ8G2QA@mail.gmail.com> Message-ID: <CAB+fVUXrKbD3_uBhrDz6d7g3LRLmE2BM-G3Taq3m=tDCJHMapg@mail.gmail.com> On Mon, Jan 27, 2014 at 12:20 PM, Alexander Belopolsky <alexander.belopolsky at gmail.com> wrote: > > +1 > > A partial backport will do a disservice to both users and maintainers.
In case we are taking "not backporting anything at all" road, what is the best fix for the document? Old >>> itertools.repeat.__doc__ 'repeat(object [,times]) -> create an iterator which returns the object\nfor the specified number of times. If not specified, returns the object\nendlessly.' New >>> itertools.repeat.__doc__ 'repeat(object [,times]) -> create an iterator which returns the object\nfor the specified number of times. If not specified, returns the object\nendlessly. If times is specified through positional and negative numbers, it always means 0 repetitions. If times is specified through keyword and -1, it means endless repetition. Other negative numbers for times through keyword should be avoided.' From sky.kok at speaklikeaking.com Mon Jan 27 05:21:19 2014 From: sky.kok at speaklikeaking.com (Vajrasky Kok) Date: Mon, 27 Jan 2014 12:21:19 +0800 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> Message-ID: <CAB+fVUXpUDdwrX5UQ=ZmFvhy5TcnG3mB_-oAEVaRgQspoZePoA@mail.gmail.com> On Mon, Jan 27, 2014 at 12:02 PM, Nick Coghlan <ncoghlan at gmail.com> wrote: >>> > > That is, I'm OK with either not backporting anything at all, or > backporting the full change. The only idea I object to is the one of > removing the infinite iteration capability without providing a > replacement spelling for it. > Is repeat('a') (omitting times argument) not a replacement spelling for it? What about this alternative? Makes -1 consistently mean unlimited repetition and other negative numbers consistently mean zero repetitions then document this behaviour. Just throwing suggestion. I am not so keen to it, though. 
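The semantics being debated in this thread (omitting times repeats forever, any negative count repeats zero times, matching [x] * n) can be sketched as a small wrapper. This is only an illustration under those assumptions, not a proposed patch: the name repeat_times is hypothetical, and only itertools.repeat and itertools.islice are real library calls.

```python
from itertools import islice, repeat


def repeat_times(obj, times=None):
    """Hypothetical wrapper: None repeats forever, negative counts repeat zero times."""
    if times is None:
        # Omitting the count is the existing spelling for endless repetition.
        return repeat(obj)
    # Clamp explicitly and pass the count positionally, so the result does not
    # depend on how repeat() treats a negative value passed by keyword.
    return repeat(obj, max(times, 0))


finite = list(repeat_times("a", 3))
negative = list(repeat_times("a", -1))
# An endless iterator must be sliced before it is turned into a list.
endless_head = list(islice(repeat_times("a"), 4))
```

With a wrapper like this, callers never need to branch on whether a count was supplied, which is the inconvenience raised elsewhere in the thread.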
From alexander.belopolsky at gmail.com Mon Jan 27 05:28:09 2014 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 26 Jan 2014 23:28:09 -0500 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <CAB+fVUXpUDdwrX5UQ=ZmFvhy5TcnG3mB_-oAEVaRgQspoZePoA@mail.gmail.com> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> <CAB+fVUXpUDdwrX5UQ=ZmFvhy5TcnG3mB_-oAEVaRgQspoZePoA@mail.gmail.com> Message-ID: <CAP7h-xZWDcvsDszG6-bay299CXqPUcM0m8dQN1ASjay3hk76sQ@mail.gmail.com> On Sun, Jan 26, 2014 at 11:21 PM, Vajrasky Kok <sky.kok at speaklikeaking.com> wrote: > What about this alternative? Makes -1 consistently mean unlimited > repetition and other negative numbers consistently mean zero > repetitions > -1 I think this idea was already rejected on the bug tracker. It would be very surprising if list(repeat(x, n)) were different from [x] * n for integer n. From larry at hastings.org Mon Jan 27 05:34:49 2014 From: larry at hastings.org (Larry Hastings) Date: Sun, 26 Jan 2014 20:34:49 -0800 Subject: [Python-Dev] The Argument Clinic Derby is drawing to a close In-Reply-To: <CAA77j2CtpBAW-HjKwVVH=0fqG6G+V-OhuHJge71kLFU5YxcdrA@mail.gmail.com> References: <52E5A726.4090307@hastings.org> <CAA77j2CtpBAW-HjKwVVH=0fqG6G+V-OhuHJge71kLFU5YxcdrA@mail.gmail.com> Message-ID: <52E5E1E9.9080306@hastings.org> On 01/26/2014 08:07 PM, Tshepang Lekhonkhobe wrote: > On Mon, Jan 27, 2014 at 2:24 AM, Larry Hastings <larry at hastings.org> wrote: >> >> The first release candidate of Python 3.4 will be tagged in about two weeks.
>> We need to be completely done with the Derby by then. And it's going to >> take a while to review and iterate on the patches we've got. >> >> Therefore: I'm going to stop accepting submissions for new patches in two >> days. Patches posted to the issue tracker on or after Wednesday Jan 29 at >> 12:00:01am will not be accepted. If you have a patch you're still working >> on, you have until then to get it in. > Why not delay rc1 by a week or two and get this work done? We discussed that. The core developers, and Guido in particular, were against it. One good reason: if we slip Python 3.4 again we will miss shipping Python 3.4 final with Ubuntu 14.04. You can read the discussion in python-committers; it's under the subject "Status of the Derby, and request for another slip". https://mail.python.org/pipermail/python-committers/2014-January/002977.html //arry/ From alexander.belopolsky at gmail.com Mon Jan 27 05:40:33 2014 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 26 Jan 2014 23:40:33 -0500 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <CAB+fVUXrKbD3_uBhrDz6d7g3LRLmE2BM-G3Taq3m=tDCJHMapg@mail.gmail.com> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> <CAP7h-xbDarKLiO6eXS_Ct1zaKf1QHoLe_xqwrq35nS8SZ8G2QA@mail.gmail.com> <CAB+fVUXrKbD3_uBhrDz6d7g3LRLmE2BM-G3Taq3m=tDCJHMapg@mail.gmail.com> Message-ID: <CAP7h-xY5YNrnYj5j-qRe7-AXMD_sH7JcqbxgX6L25zqm_ZZ5cQ@mail.gmail.com> On Sun, Jan 26, 2014 at 11:26 PM, Vajrasky Kok <sky.kok at speaklikeaking.com>wrote: > In case we are taking "not backporting anything at all"
road, what is > the best fix for the document? > > Old > >>> itertools.repeat.__doc__ > 'repeat(object [,times]) -> create an iterator which returns the > object\nfor the specified number of times. If not specified, returns > the object\nendlessly.' > I would say no fix is needed for this doc because the signature suggests (correctly) that passing times by keyword is not supported. The following behavior further supports this interpretation.
>>> from itertools import *
>>> ''.join(repeat('a', times=-4))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: long int too large to convert to int
The ReST documentation may benefit from a warning that the behavior of repeat() is "undefined" when times is passed by keyword. From larry at hastings.org Mon Jan 27 06:01:08 2014 From: larry at hastings.org (Larry Hastings) Date: Sun, 26 Jan 2014 21:01:08 -0800 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <CAP7h-xY5YNrnYj5j-qRe7-AXMD_sH7JcqbxgX6L25zqm_ZZ5cQ@mail.gmail.com> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> <CAP7h-xbDarKLiO6eXS_Ct1zaKf1QHoLe_xqwrq35nS8SZ8G2QA@mail.gmail.com> <CAB+fVUXrKbD3_uBhrDz6d7g3LRLmE2BM-G3Taq3m=tDCJHMapg@mail.gmail.com> <CAP7h-xY5YNrnYj5j-qRe7-AXMD_sH7JcqbxgX6L25zqm_ZZ5cQ@mail.gmail.com> Message-ID: <52E5E814.2020904@hastings.org> On 01/26/2014 08:40 PM, Alexander Belopolsky wrote: > > On Sun, Jan 26, 2014 at 11:26 PM, Vajrasky Kok > <sky.kok at speaklikeaking.com> wrote: > > In case we are taking "not backporting anything at all"
road, what is > the best fix for the document? > > > I would say no fix is needed for this doc because the signature > suggests (correctly) that passing times by keyword is not supported. Where does it do that? And why would the function support keyword arguments, if it was the author's intent to not support them? It's easier to not support them: you just call PyArg_ParseTuple. Calling PyArg_ParseTupleAndKeywords is slightly more involved. //arry/ From tjreedy at udel.edu Mon Jan 27 07:11:16 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 27 Jan 2014 01:11:16 -0500 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> Message-ID: <lc4t9o$rj6$1@ger.gmane.org> On 1/26/2014 11:02 PM, Nick Coghlan wrote: > On 27 January 2014 13:51, Alexander Belopolsky > <alexander.belopolsky at gmail.com> wrote: >> >> On Sun, Jan 26, 2014 at 12:00 PM, Vajrasky Kok <sky.kok at speaklikeaking.com> >> wrote: >>> >>>>>> repeat('a', times=-1) >>> repeat('a') >> >> >> As I think about it, this may be more than a bug but a door for a denial of >> service attack. Imagine an application where times comes from an untrusted >> source. Someone relying on documented behavior may decide to sanitize the >> value by only checking against an upper bound assuming that negative values >> would just lead to no repetitions. If an attacker could somehow cause times >> to get the value of -1 this may cause an infinite loop, blow up memory etc.
>> >> If you think this is far-fetched - consider a web app that uses repeat() as >> a part of logic to pretty-print user input. The times value may come from a >> calculation of a difference between the screen width and the length of some >> string - both under user control. >> >> So maybe the fix should go into security bugs only branches as well. > > If we do go this path, then we should backport the full fix (i.e. > accepting None to indicate repeating forever), rather than just a > partial fix. > > That is, I'm OK with either not backporting anything at all, or > backporting the full change. The only idea I object to is the one of > removing the infinite iteration capability without providing a > replacement spelling for it. I believe that the replacement spelling would be the existing 'spelling' of no spelling at all -- omitting the argument. However, having to (temporarily) write
    def myfunc(...what, num, ...):
        ...
        if num is None:
            itertools.repeat(what)
        else:
            itertools.repeat(what, num)
is obnoxious at best. While I am a strong supporter of no new features in bug releases, one of my early commits added a new parameter to difflib.SequenceMatcher. This was after pydev discussion that included Tim Peters and which concluded that there was no other sane way to fix the bug. If merely making both ways of passing times have the same effect is rejected, then I vote for the complete fix. Fixing the default for an existing parameter is less of a new feature than a new parameter. -- Terry Jan Reedy From larry at hastings.org Mon Jan 27 07:21:51 2014 From: larry at hastings.org (Larry Hastings) Date: Sun, 26 Jan 2014 22:21:51 -0800 Subject: [Python-Dev] [RELEASED] Python 3.4.0b3 Message-ID: <52E5FAFF.1080901@hastings.org> On behalf of the Python development team, I'm quite pleased to announce the third beta release of Python 3.4. This is a preview release, and its use is not recommended for production settings.
Python 3.4 includes a range of improvements of the 3.x series, including hundreds of small improvements and bug fixes. Major new features and changes in the 3.4 release series include: * PEP 428, a "pathlib" module providing object-oriented filesystem paths * PEP 435, a standardized "enum" module * PEP 436, a build enhancement that will help generate introspection information for builtins * PEP 442, improved semantics for object finalization * PEP 443, adding single-dispatch generic functions to the standard library * PEP 445, a new C API for implementing custom memory allocators * PEP 446, changing file descriptors to not be inherited by default in subprocesses * PEP 450, a new "statistics" module * PEP 451, standardizing module metadata for Python's module import system * PEP 453, a bundled installer for the *pip* package manager * PEP 454, a new "tracemalloc" module for tracing Python memory allocations * PEP 456, a new hash algorithm for Python strings and binary data * PEP 3154, a new and improved protocol for pickled objects * PEP 3156, a new "asyncio" module, a new framework for asynchronous I/O Python 3.4 is now in "feature freeze", meaning that no new features will be added. The final release is projected for mid-March 2014. To download Python 3.4.0b3 visit: http://www.python.org/download/releases/3.4.0/ Please consider trying Python 3.4.0b3 with your code and reporting any new issues you notice to: http://bugs.python.org/ Enjoy! 
-- Larry Hastings, Release Manager larry at hastings.org (on behalf of the entire python-dev team and 3.4's contributors) From steve at pearwood.info Mon Jan 27 07:31:34 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 27 Jan 2014 17:31:34 +1100 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <CAP7h-xY5YNrnYj5j-qRe7-AXMD_sH7JcqbxgX6L25zqm_ZZ5cQ@mail.gmail.com> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> <CAP7h-xbDarKLiO6eXS_Ct1zaKf1QHoLe_xqwrq35nS8SZ8G2QA@mail.gmail.com> <CAB+fVUXrKbD3_uBhrDz6d7g3LRLmE2BM-G3Taq3m=tDCJHMapg@mail.gmail.com> <CAP7h-xY5YNrnYj5j-qRe7-AXMD_sH7JcqbxgX6L25zqm_ZZ5cQ@mail.gmail.com> Message-ID: <20140127063134.GB3915@ando> On Sun, Jan 26, 2014 at 11:40:33PM -0500, Alexander Belopolsky wrote: > I would say no fix is needed for this doc because the signature suggests > (correctly) that passing times by keyword is not supported. How do you determine that? Passing times by keyword works in Python 3.3: py> from itertools import repeat py> it = repeat("a", times=5) py> list(it) ['a', 'a', 'a', 'a', 'a'] The docstring signature names both parameters: py> print(repeat.__doc__) repeat(object [,times]) -> create an iterator which returns the object for the specified number of times. If not specified, returns the object endlessly. 
And both names work fine: py> repeat(object=2, times=5) repeat(2, 5) Judging from the docstring and current behaviour, I think we can conclude that: - keyword arguments are fine; - there shouldn't be any difference between providing a value by position or by keyword; - repeat(x) should yield x forever; - repeat(x, 0) should immediately raise StopIteration; - repeat(x, -n) is not specified by the docstring, but the documentation on the website makes it clear that it should be equivalent to repeat(x, 0) no matter the magnitude of -n; - which matches the behaviour of [x]*-n nicely. As far as I'm concerned, this is a clear case of a bug. Providing times=None (by keyword or by position) ought to be equivalent to not providing times at all, and any negative times ought to be equivalent to zero. > The following behavior further supports this interpretation. > > >>> from itertools import * > >>> ''.join(repeat('a', times=-4)) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > OverflowError: long int too large to convert to int I don't think it does. I think it suggests that something is trying to convert an unsigned value into an int, and failing. 
I note that repeat is happy to work with negatives times one at a time: py> it = repeat('a', times=-4) py> next(it) 'a' -- Steven From ncoghlan at gmail.com Mon Jan 27 07:45:53 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 27 Jan 2014 16:45:53 +1000 Subject: [Python-Dev] [Python-checkins] peps: Incorporate PEP 462 feedback In-Reply-To: <52E5EE6F.8010300@udel.edu> References: <3fCGVH220wz7Lkf@mail.python.org> <52E5EE6F.8010300@udel.edu> Message-ID: <CADiSq7fHi+0svcCft_aAocVdDdrnTiB4bHbuyuBMy_KfoWBe6Q@mail.gmail.com> On 27 January 2014 15:28, Terry Reedy <tjreedy at udel.edu> wrote: > On 1/26/2014 10:22 PM, nick.coghlan wrote: >> >> +Terry Reedy has suggested doing an initial filter which specifically >> looks >> +for approved documentation-only patches (~700 of the 4000+ open CPython >> +issues are pure documentation updates). This approach would avoid several >> +of the issues related to flaky tests and cross-platform testing, while >> +still allowing the rest of the automation flows to be worked out (such as >> +how to push a patch into the merge queue). >> + >> +The one downside to this approach is that Zuul wouldn't have complete >> +control of the merge process as it usually expects, so there would >> +potentially be additional coordination needed around that. > > > An essential part of my idea is that Zuul *would* have complete control > while pushing doc patches to avoid the otherwise inevitable push races. > Initially, this would be for a part of every day. While it has control, I > would expect it to push doc patches at intervals of perhaps a minute, or > even more rapidly with parallel testing. (Since doc patch rarely interfere > and would be expected to apply after pre-testing, little speculative testing > would need to be tossed.) "Exclusive control some of the time" is not the same thing as "exclusive control". 
It's not an impossible idea, but certainly not the way Zuul is currently designed to work :) >> +It may be worth keeping this approach as a fallback option if the initial >> +deployment proves to have more trouble with test reliability than is >> +anticipated. > > > I think a doc queue should be a permanent part of the system. There would > always be doc-only patches -- and I suspect even more than now. One of the > types of jobs on the main queue could be a periodic 'push all pending doc > patches' job. I would then think we should try splitting code + doc patches > into a code patch, pushed first, and a doc patch, added to the doc queue if > the code patch succeeded. That seems like added complexity for little gain - note that we can make the *test runner* smart about how it handles doc-only patches, by just checking the docs build and skipping the test run. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From georg at python.org Mon Jan 27 08:36:46 2014 From: georg at python.org (Georg Brandl) Date: Mon, 27 Jan 2014 08:36:46 +0100 Subject: [Python-Dev] [RELEASED] Python 3.3.4 release candidate 1 Message-ID: <52E60C8E.1050908@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On behalf of the Python development team, I'm reasonably happy to announce the Python 3.3.4 release candidate 1. Python 3.3.4 includes several security fixes and over 120 bug fixes compared to the Python 3.3.3 release. This release fully supports OS X 10.9 Mavericks. In particular, this release fixes an issue that could cause previous versions of Python to crash when typing in interactive mode on OS X 10.9. Python 3.3 includes a range of improvements of the 3.x series, as well as easier porting between 2.x and 3.x. In total, almost 500 API items are new or improved in Python 3.3. 
For a more extensive list of changes in the 3.3 series, see http://docs.python.org/3.3/whatsnew/3.3.html and for the detailed changelog of 3.3.4, see http://docs.python.org/3.3/whatsnew/changelog.html To download Python 3.3.4 rc1 visit: http://www.python.org/download/releases/3.3.4/ This is a preview release, please report any bugs to http://bugs.python.org/ The final version is scheduled to be released in two weeks' time, on or about the 10th of February. Enjoy! - -- Georg Brandl, Release Manager georg at python.org (on behalf of the entire python-dev team and 3.3's contributors) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iEYEARECAAYFAlLmDI4ACgkQN9GcIYhpnLAr6wCePRbHF80k5goV4RUDBA5FfkwF rLUAnRg0RpL/b6apv+Dt2/sgnUd3hTPA =Z4Ss -----END PGP SIGNATURE----- From solipsis at pitrou.net Mon Jan 27 10:38:30 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 27 Jan 2014 10:38:30 +0100 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> Message-ID: <20140127103830.2d54c0c6@fsol> On Mon, 27 Jan 2014 14:02:29 +1000 Nick Coghlan <ncoghlan at gmail.com> wrote: > > If we do go this path, then we should backport the full fix (i.e. > accepting None to indicate repeating forever), rather than just a > partial fix. > > That is, I'm OK with either not backporting anything at all, or > backporting the full change. The only idea I object to is the one of > removing the infinite iteration capability without providing a > replacement spelling for it. I would say not backport at all. The security threat is highly theoretical. If someone blindly accepts user values for repeat(), the user value can just as well be a very large positive with similar effects (e.g. 2**31). 
Regards Antoine. From solipsis at pitrou.net Mon Jan 27 10:39:55 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 27 Jan 2014 10:39:55 +0100 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> <CAP7h-xbDarKLiO6eXS_Ct1zaKf1QHoLe_xqwrq35nS8SZ8G2QA@mail.gmail.com> <CAB+fVUXrKbD3_uBhrDz6d7g3LRLmE2BM-G3Taq3m=tDCJHMapg@mail.gmail.com> <CAP7h-xY5YNrnYj5j-qRe7-AXMD_sH7JcqbxgX6L25zqm_ZZ5cQ@mail.gmail.com> <52E5E814.2020904@hastings.org> Message-ID: <20140127103955.1eae1e6f@fsol> On Sun, 26 Jan 2014 21:01:08 -0800 Larry Hastings <larry at hastings.org> wrote: > On 01/26/2014 08:40 PM, Alexander Belopolsky wrote: > > > > On Sun, Jan 26, 2014 at 11:26 PM, Vajrasky Kok > > <sky.kok at speaklikeaking.com <mailto:sky.kok at speaklikeaking.com>> wrote: > > > > In case we are taking "not backporting anything at all" road, what is > > the best fix for the document? > > > > > > I would say no fix is needed for this doc because the signature > > suggests (correctly) that passing times by keyword is not supported. > > Where does it do that? In the "[,times]" spelling, which is the spelling customarily used for positional-only arguments. Regards Antoine. 
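The sanitizing concern raised earlier in this thread (a repeat count derived from user-controlled widths) and the reply that a huge positive count is just as harmful both point at the same defensive pattern: clamp the count on both sides before handing it to repeat(). A minimal sketch, assuming a pretty-printing helper; padding and MAX_REPEAT are hypothetical names and the limit is an arbitrary illustrative value.

```python
from itertools import repeat

# Hypothetical upper bound; a real application would pick its own limit.
MAX_REPEAT = 1000


def padding(char, screen_width, text_length, limit=MAX_REPEAT):
    """Build padding from untrusted widths without trusting the difference."""
    count = screen_width - text_length
    # Clamp to [0, limit]: negative differences give no padding, and very
    # large ones cannot exhaust memory.
    count = min(max(count, 0), limit)
    # The count is passed positionally on purpose.
    return "".join(repeat(char, count))


normal = padding("-", 10, 4)
too_long = padding("-", 4, 10)
huge = padding("-", 2**31, 0)
```

Clamping at the call site keeps the helper safe regardless of which interpreter version or argument-passing style is in use.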
From breamoreboy at yahoo.co.uk Mon Jan 27 10:47:22 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Mon, 27 Jan 2014 09:47:22 +0000 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <CADiSq7eTqtObNeyHC8m01zpQze_1BKHrithFa+Bt=d0GC3nygw@mail.gmail.com> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <lc3kv7$dgl$1@ger.gmane.org> <CADiSq7eTqtObNeyHC8m01zpQze_1BKHrithFa+Bt=d0GC3nygw@mail.gmail.com> Message-ID: <lc59uu$4m9$1@ger.gmane.org> On 27/01/2014 01:52, Nick Coghlan wrote: > > In 3.5, that will be passing None, rather than -1. For those proposing > to change the maintenance releases, how should a user relying on this > misbehaviour update their code to handle it? > I'm -1 on using None. The code currently rejects anything except an int. The docs don't say anything about using None, except in the "equivalent to" section, which is also the only place where it looks as if times can be a keyword argument. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From victor.stinner at gmail.com Mon Jan 27 10:45:37 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 27 Jan 2014 10:45:37 +0100 Subject: [Python-Dev] News from asyncio Message-ID: <CAMpsgwZE0snFr4pKcyE_j=TQj+R+TiiJRXm7g_zuSdHtc5XtpQ@mail.gmail.com> Hi, I'm working for eNovance on the asyncio module; the goal is to use it in the huge OpenStack project (2.5 million lines of code), which currently uses eventlet. I'm trying to fix the remaining issues in the asyncio module before Python 3.4 final. The asyncio project is very active, but discussions are split between its own dedicated mailing list (python-tulip Google group), the Tulip bug tracker and the Python bug tracker. Please join the Tulip mailing list if you are interested in contributing.
http://code.google.com/p/tulip/ I would like to share with you the status of the module. Many bugs have been fixed recently. I suppose that new bugs are being found because new developers have started to play with asyncio since Python 3.4 beta 1. asyncio issues fixed in Python 3.4 beta 3, in a random order:
- I wrote most of the asyncio documentation, please help me to improve it! I tried to add many short examples, each time explaining one feature or concept (ex: callbacks, signals, futures, etc.): http://docs.python.org/dev/library/asyncio.html
- Character devices (TTY/PTY) are now supported, useful to control a real terminal (not pipes) for subprocesses. On Mac OS X older than Mavericks (10.9), the SelectSelector should be used instead of KqueueSelector (kqueue didn't support character devices)
- Tulip #111: StreamReader.readexactly() now raises an IncompleteReadError if the end of stream is reached before we received enough bytes, instead of returning less bytes than requested.
- Python #20311: asyncio had a performance issue related to the resolution of selectors and clocks. For example, selectors.EpollSelector has a resolution of 1 millisecond (10^-3), whereas asyncio uses arbitrary timestamps. The issue was fixed by adding a resolution attribute to selectors and a private granularity attribute to asyncio.BaseEventLoop, and using the granularity in the asyncio event loop to round times.
- New Task.current_task() class method
- Guido wrote a web crawler, see examples/crawl.py in Tulip
- More symbols are exported in the main asyncio module (ex: Queue, iscoroutine(), etc.)
- Charles-François improved the signal handlers: SA_RESTART flag is now set to limit EINTR errors in syscalls
- Some optimizations (ex: don't call logger.log() when it's not needed)
- Many bugfixes
- (sorry if I forgot other changes, see also Tulip history and history of the asyncio module in Python)
I also would like to change asyncio to support a "stream-like" API for subprocesses, see Tulip issue #115 (and Python issue #20400): http://code.google.com/p/tulip/issues/detail?id=115 I ported asyncio to Python 2.6 and 2.7, because today OpenStack only uses these Python versions. I created a new project called "Trollius" (instead of "Tulip") because the syntax is a little bit different. "yield from" becomes "yield", and "return x" becomes "raise Return(x)": https://bitbucket.org/enovance/trollius https://pypi.python.org/pypi/trollius If you are interested in the OpenStack part, see my blueprint (something similar to PEPs but for smaller changes) for Oslo Messaging: https://wiki.openstack.org/wiki/Oslo/blueprints/asyncio There is an ongoing effort to port OpenStack to Python 3; eNovance is also working on the port: https://wiki.openstack.org/wiki/Python3 Victor From solipsis at pitrou.net Mon Jan 27 11:50:03 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 27 Jan 2014 11:50:03 +0100 Subject: [Python-Dev] News from asyncio References: <CAMpsgwZE0snFr4pKcyE_j=TQj+R+TiiJRXm7g_zuSdHtc5XtpQ@mail.gmail.com> Message-ID: <20140127115003.0f5ba5b5@fsol> On Mon, 27 Jan 2014 10:45:37 +0100 Victor Stinner <victor.stinner at gmail.com> wrote: > > - Tulip #111: StreamReader.readexactly() now raises an > IncompleteReadError if the > end of stream is reached before we received enough bytes, instead of returning > less bytes than requested. Why not simply EOFError? Regards Antoine.
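For readers weighing the question above, a minimal sketch of what handling the exception looks like in practice. It assumes a current Python where asyncio.run and async/await are available (the 3.4 betas discussed in this thread used "yield from" instead), and the names read_frame and demo are hypothetical. asyncio.IncompleteReadError exposes the partial and expected attributes, and it subclasses EOFError, as noted later in the thread.

```python
import asyncio


async def read_frame(reader, n):
    """Hypothetical helper: read exactly n bytes or report what was missing."""
    try:
        return await reader.readexactly(n)
    except asyncio.IncompleteReadError as exc:
        # exc.partial holds the bytes received before end of stream;
        # exc.expected is the n that was requested.
        return ("short", exc.partial, exc.expected)


async def demo():
    # Create the reader inside the running loop and feed it by hand, so the
    # sketch needs no real socket.
    reader = asyncio.StreamReader()
    reader.feed_data(b"abc")
    reader.feed_eof()
    return await read_frame(reader, 5)


result = asyncio.run(demo())
```

Code that does not care about the partial data can catch EOFError instead, since the dedicated exception is a subclass of it.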
From victor.stinner at gmail.com Mon Jan 27 11:55:22 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 27 Jan 2014 11:55:22 +0100 Subject: [Python-Dev] News from asyncio In-Reply-To: <20140127115003.0f5ba5b5@fsol> References: <CAMpsgwZE0snFr4pKcyE_j=TQj+R+TiiJRXm7g_zuSdHtc5XtpQ@mail.gmail.com> <20140127115003.0f5ba5b5@fsol> Message-ID: <CAMpsgwas4FfzH7Fah=XyRJ=4ONNkPQjQX1q0aG2BEjHh5F_hag@mail.gmail.com> 2014-01-27 Antoine Pitrou <solipsis at pitrou.net>: > On Mon, 27 Jan 2014 10:45:37 +0100 > Victor Stinner <victor.stinner at gmail.com> wrote: >> >> - Tulip #111: StreamReader.readexactly() now raises an >> IncompleteReadError if the >> end of stream is reached before we received enough bytes, instead of returning >> less bytes than requested. > > Why not simply EOFError? IncompleteReadError has two additional attributes: - partial: the "incomplete" bytes received - expected: total number of expected bytes (n parameter of readexactly) I prefer to use a different exception to ensure that these attributes are present. I don't like having to check "hasattr(exc, ...)". Before this change, readexactly(n) returned the partially received bytes if the end of the stream was reached.
Victor From gjcarneiro at gmail.com Mon Jan 27 12:10:20 2014 From: gjcarneiro at gmail.com (Gustavo Carneiro) Date: Mon, 27 Jan 2014 11:10:20 +0000 Subject: [Python-Dev] News from asyncio In-Reply-To: <CAMpsgwas4FfzH7Fah=XyRJ=4ONNkPQjQX1q0aG2BEjHh5F_hag@mail.gmail.com> References: <CAMpsgwZE0snFr4pKcyE_j=TQj+R+TiiJRXm7g_zuSdHtc5XtpQ@mail.gmail.com> <20140127115003.0f5ba5b5@fsol> <CAMpsgwas4FfzH7Fah=XyRJ=4ONNkPQjQX1q0aG2BEjHh5F_hag@mail.gmail.com> Message-ID: <CAO-CpELvhmuy2R=v=52vU3X=AWJO8knMtTYJwwOjF5dPeJuuyw@mail.gmail.com> On 27 January 2014 10:55, Victor Stinner <victor.stinner at gmail.com> wrote: > 2014-01-27 Antoine Pitrou <solipsis at pitrou.net>: > > On Mon, 27 Jan 2014 10:45:37 +0100 > > Victor Stinner <victor.stinner at gmail.com> wrote: > >> > >> - Tulip #111: StreamReader.readexactly() now raises an > >> IncompleteReadError if the > >> end of stream is reached before we received enough bytes, instead of returning > >> less bytes than requested. > > > > Why not simply EOFError? > > IncompleteReadError has two additional attributes: > > - partial: "incomplete" received bytes > - expected: total number of expected bytes (n parameter of readexactly) > > I prefer to use a different exception to ensure that these attributes > are present. I don't like having to check "hasattr(exc, ...)". > > Before this change, readexactly(n) returned the partial received bytes > if the end of the stream was reached. > I had the same doubt. Note also that IncompleteReadError is a subclass of EOFError, so you can catch EOFError if you like. > > Victor > -- Gustavo J. A. M. Carneiro Gambit Research LLC "The universe is always one step beyond logic."
-- Frank Herbert -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20140127/d119ce6a/attachment.html> From victor.stinner at gmail.com Mon Jan 27 12:20:36 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 27 Jan 2014 12:20:36 +0100 Subject: [Python-Dev] News from asyncio In-Reply-To: <CAO-CpELvhmuy2R=v=52vU3X=AWJO8knMtTYJwwOjF5dPeJuuyw@mail.gmail.com> References: <CAMpsgwZE0snFr4pKcyE_j=TQj+R+TiiJRXm7g_zuSdHtc5XtpQ@mail.gmail.com> <20140127115003.0f5ba5b5@fsol> <CAMpsgwas4FfzH7Fah=XyRJ=4ONNkPQjQX1q0aG2BEjHh5F_hag@mail.gmail.com> <CAO-CpELvhmuy2R=v=52vU3X=AWJO8knMtTYJwwOjF5dPeJuuyw@mail.gmail.com> Message-ID: <CAMpsgwbMeyeN9UQ=qF7WQ17Ap9LUQFmt_U=zbVmye-HF37fG9g@mail.gmail.com> 2014-01-27 Gustavo Carneiro <gjcarneiro at gmail.com>: >> > Why not simply EOFError? >> >> IncompleteReadError has two additionnal attributes: >> >> - partial: "incomplete" received bytes >> - expected: total number of expected bytes (n parameter of readexactly) >> >> I prefer to use a different exception to ensure that these attributes >> are present. I don't like having to check "hasattr(exc, ...)". >> >> Before this change, readexactly(n) returned the partial received bytes >> if the end of the stream was reached. > > I had the same doubt. Note also that IncompleteReadError is a subclass of > EOFError, so you can catch EOFError if you like. Oops, I forgot to mention that :-) I just documented the new IncompleteReadError in asyncio doc. 
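
The behaviour discussed in this thread can be sketched as follows. This is only an illustration, not code from Tulip itself: the helper name read_frame and the demo coroutine are hypothetical, the async/await syntax postdates this discussion, and instantiating StreamReader directly is done here purely to keep the example self-contained.

```python
import asyncio


async def read_frame(reader, n):
    """Read exactly n bytes, or report how far the read got (hypothetical helper)."""
    try:
        return await reader.readexactly(n)
    except asyncio.IncompleteReadError as exc:
        # Both attributes are guaranteed to exist on this exception type,
        # so no hasattr() check is needed.
        return (exc.partial, exc.expected)


async def demo():
    # Feed 3 bytes and then end-of-stream, but ask for 5 bytes.
    reader = asyncio.StreamReader()
    reader.feed_data(b"abc")
    reader.feed_eof()
    return await read_frame(reader, 5)


result = asyncio.run(demo())
```

Because IncompleteReadError is a subclass of EOFError, code that only cares about "the stream ended early" can still catch EOFError and ignore the two extra attributes.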
Victor From larry at hastings.org Mon Jan 27 13:01:02 2014 From: larry at hastings.org (Larry Hastings) Date: Mon, 27 Jan 2014 04:01:02 -0800 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <20140127103955.1eae1e6f@fsol> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> <CAP7h-xbDarKLiO6eXS_Ct1zaKf1QHoLe_xqwrq35nS8SZ8G2QA@mail.gmail.com> <CAB+fVUXrKbD3_uBhrDz6d7g3LRLmE2BM-G3Taq3m=tDCJHMapg@mail.gmail.com> <CAP7h-xY5YNrnYj5j-qRe7-AXMD_sH7JcqbxgX6L25zqm_ZZ5cQ@mail.gmail.com> <52E5E814.2020904@hastings.org> <20140127103955.1eae1e6f@fsol> Message-ID: <52E64A7E.4070302@hastings.org> On 01/27/2014 01:39 AM, Antoine Pitrou wrote: > On Sun, 26 Jan 2014 21:01:08 -0800 > Larry Hastings <larry at hastings.org> wrote: >> On 01/26/2014 08:40 PM, Alexander Belopolsky wrote: >>> On Sun, Jan 26, 2014 at 11:26 PM, Vajrasky Kok >>> <sky.kok at speaklikeaking.com <mailto:sky.kok at speaklikeaking.com>> wrote: >>> >>> In case we are taking "not backporting anything at all" road, what is >>> the best fix for the document? >>> >>> >>> I would say no fix is needed for this doc because the signature >>> suggests (correctly) that passing times by keyword is not supported. >> Where does it do that? > In the "[,times]" spelling, which is the spelling customarily used for > positional-only arguments. That's not my experience. It's very common--in fact I believe more common--for functions that only accept positional arguments to *not* use the square-brackets-for-optional-parameters convention. The square-brackets-for-optional-parameters convention is not legal Python syntax, so I observe that documentation authors avoid it when they can, preferring to express their function's signature in real Python. 
As an example, consider "heapq.nlargest(n, iterable, key=None)". The implementation uses PyArg_ParseTuple to parse its parameters, and therefore does not accept keyword arguments. But--no square brackets. My experience is that the doc convention of square-brackets-for-optional-parameters is primarily used in two circumstances: one, when doing something really crazy like optional groups, and two, when the default value of one of the function's parameters is inconvenient to specify as a Python value. Of these two the second is far more common. An example of this latter case is zlib.compressobj(). The documentation shows its last parameter as "[, zdict]". However, the implementation parses uses PyArg_ParseTupleAndKeywords(), and therefore accepts keyword arguments. Furthermore, this notation simply cannot be used for functions that have only required parameters. You can't look at the constructor for "memoryview(object)" and determine whether or not it accepts keyword arguments. (It does.) There seems to be no strong correlation between functions that only accept positional-only parameters and functions whose documentation uses square-brackets-for-optional-parameters. Indeed, this is one of the things that can be frustrating about Python, which is why I hope we can make Python 3.5 more predictable in this area. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/python-dev/attachments/20140127/bf4bd671/attachment.html> From solipsis at pitrou.net Mon Jan 27 13:12:25 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 27 Jan 2014 13:12:25 +0100 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> <CAP7h-xbDarKLiO6eXS_Ct1zaKf1QHoLe_xqwrq35nS8SZ8G2QA@mail.gmail.com> <CAB+fVUXrKbD3_uBhrDz6d7g3LRLmE2BM-G3Taq3m=tDCJHMapg@mail.gmail.com> <CAP7h-xY5YNrnYj5j-qRe7-AXMD_sH7JcqbxgX6L25zqm_ZZ5cQ@mail.gmail.com> <52E5E814.2020904@hastings.org> <20140127103955.1eae1e6f@fsol> <52E64A7E.4070302@hastings.org> Message-ID: <20140127131225.71fc2d60@fsol> On Mon, 27 Jan 2014 04:01:02 -0800 Larry Hastings <larry at hastings.org> wrote: > > On 01/27/2014 01:39 AM, Antoine Pitrou wrote: > > On Sun, 26 Jan 2014 21:01:08 -0800 > > Larry Hastings <larry at hastings.org> wrote: > >> On 01/26/2014 08:40 PM, Alexander Belopolsky wrote: > >>> On Sun, Jan 26, 2014 at 11:26 PM, Vajrasky Kok > >>> <sky.kok at speaklikeaking.com <mailto:sky.kok at speaklikeaking.com>> wrote: > >>> > >>> In case we are taking "not backporting anything at all" road, what is > >>> the best fix for the document? > >>> > >>> > >>> I would say no fix is needed for this doc because the signature > >>> suggests (correctly) that passing times by keyword is not supported. > >> Where does it do that? > > In the "[,times]" spelling, which is the spelling customarily used for > > positional-only arguments. > > That's not my experience. But it's mine :-) (try "help(str)" or "help(list)") That said, it's fair to say that whatever convention there is isn't very strictly followed on this particular point. Regards Antoine. 
From sky.kok at speaklikeaking.com Mon Jan 27 13:22:53 2014 From: sky.kok at speaklikeaking.com (Vajrasky Kok) Date: Mon, 27 Jan 2014 20:22:53 +0800 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <20140127103830.2d54c0c6@fsol> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> <20140127103830.2d54c0c6@fsol> Message-ID: <CAB+fVUU-MDvb5nVWouTpJQieZBvAQS7vFkJh61ytQDZnkU2BNA@mail.gmail.com> On Mon, Jan 27, 2014 at 5:38 PM, Antoine Pitrou <solipsis at pitrou.net> wrote: > > I would say not backport at all. The security threat is highly > theoretical. If someone blindly accepts user values for repeat(), the > user value can just as well be a very large positive with similar > effects (e.g. 2**31). > I can not comment about whether this is security issue or not. But the effect of large positive number is not similar to the effect of unlimited repetitions. >>> from itertools import repeat >>> list(repeat('a', 2**31)) Traceback (most recent call last): File "<stdin>", line 1, in <module> MemoryError >>> list(repeat('a', 2**99)) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: Python int too large to convert to C ssize_t >>> list(repeat('a', times=-1)) ...this freezes my computer... That is why I prefer we backport the fix (either partial or full). If not, giving a big warning in the documentation should suffice. 
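
Whichever way the backport question is settled, a caller who passes user-supplied counts to repeat() can validate them first. The following is a minimal sketch under stated assumptions: bounded_repeat and MAX_REPEATS are hypothetical names, and the cap of 10000 is an arbitrary application-level choice rather than anything proposed in this thread.

```python
from itertools import repeat

MAX_REPEATS = 10000  # hypothetical application-specific cap


def bounded_repeat(obj, times, limit=MAX_REPEATS):
    """Return repeat(obj, times) after rejecting negative or oversized counts."""
    if isinstance(times, bool) or not isinstance(times, int):
        raise TypeError("times must be an int")
    if times < 0:
        raise ValueError("times must be non-negative")
    if times > limit:
        raise ValueError("times exceeds the configured limit")
    # Pass times positionally, so the keyword-specific behaviour of
    # negative values discussed above never comes into play.
    return repeat(obj, times)
```

Rejecting the value up front handles both cases raised in this message: the huge positive count and the negative keyword count.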
From storchaka at gmail.com Mon Jan 27 13:28:53 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 27 Jan 2014 14:28:53 +0200 Subject: [Python-Dev] News from asyncio In-Reply-To: <CAMpsgwas4FfzH7Fah=XyRJ=4ONNkPQjQX1q0aG2BEjHh5F_hag@mail.gmail.com> References: <CAMpsgwZE0snFr4pKcyE_j=TQj+R+TiiJRXm7g_zuSdHtc5XtpQ@mail.gmail.com> <20140127115003.0f5ba5b5@fsol> <CAMpsgwas4FfzH7Fah=XyRJ=4ONNkPQjQX1q0aG2BEjHh5F_hag@mail.gmail.com> Message-ID: <lc5jdp$fei$1@ger.gmane.org> 27.01.14 12:55, Victor Stinner ???????(??): > IncompleteReadError has two additionnal attributes: > > - partial: "incomplete" received bytes > - expected: total number of expected bytes (n parameter of readexactly) This looks similar to http.client.IncompleteRead. From larry at hastings.org Mon Jan 27 13:29:04 2014 From: larry at hastings.org (Larry Hastings) Date: Mon, 27 Jan 2014 04:29:04 -0800 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <lc59uu$4m9$1@ger.gmane.org> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <lc3kv7$dgl$1@ger.gmane.org> <CADiSq7eTqtObNeyHC8m01zpQze_1BKHrithFa+Bt=d0GC3nygw@mail.gmail.com> <lc59uu$4m9$1@ger.gmane.org> Message-ID: <52E65110.7070904@hastings.org> On 01/27/2014 01:47 AM, Mark Lawrence wrote: > On 27/01/2014 01:52, Nick Coghlan wrote: >> >> In 3.5, that will be passing None, rather than -1. For those proposing >> to change the maintenance releases, how should a user relying on this >> misbehaviour update their code to handle it? >> > > I'm -1 on using None. The code currently rejects anything except an > int. The docs don't say anything about using None, except in the > "equivalent to" section, which is also the only place where it looks > as if times can be a keyword argument. 
> The docs describe the signature of itertools.repeat twice: the first time as its heading (" itertools.repeat(object[, times])"), the second time as an example implementation asserted to be equivalent to Python's implementation. These two signatures are not identical, but they are compatible. You state that we should pay attention to the first and ignore the second. How did you arrive at that conclusion? Also, you say something strange like "which is also the only place where it looks as if times can be a keyword argument.". I don't see a point over debating whether or not "times" *looks* like it can be a keyword argument. itertools.repeat() has accepted keyword arguments since 2.7. The code currently has semantics that cannot be accurately represented in a Python signature. We could do one of three things: 1) Do nothing, and don't allow inspect.Signature to produce a signature for the function. This is the status quo. 2) Change the semantics of the function in a non-backwards-compatible way so that we can represent its signature accurately with an inspect.Signature object. For example, "change the function so that providing times=-1 as a keyword argument behaves the same as providing times=-1 as a positional-only argument" is such an incompatible change. Another is "change the function to not accept keyword arguments at all". 3) Change the semantics of the function in a backwards-compatible way so that we can represent its supported signature accurately with an inspect.Signature object. Allow continued use of the old semantics for a full deprecation cycle (two major versions) if not longer. For example, "change the times argument to have a default of None, and change the logic so that times=None means it repeats forever" would be such an approach. For 3.3 and 3.4, I suggest that only 1) makes sense. 
For 3.5 I prefer 3), specifically the "times=None" approach, as that's how the function has been documented as working since the itertools module was first introduce in 2.3. And I see functions as having accurate signatures as a good thing. I'm against 2), as I'm against removing functionality simply for purity's sakes. Removing functionality breaks code. So it's best reserved for critical problems like security issues. I cite the thread we just had in python-dev, subject line "Deprecation policy", as an excellent discussion and summary of this topic. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20140127/0f573c8d/attachment.html> From solipsis at pitrou.net Mon Jan 27 13:29:18 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 27 Jan 2014 13:29:18 +0100 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <CAB+fVUU-MDvb5nVWouTpJQieZBvAQS7vFkJh61ytQDZnkU2BNA@mail.gmail.com> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> <20140127103830.2d54c0c6@fsol> <CAB+fVUU-MDvb5nVWouTpJQieZBvAQS7vFkJh61ytQDZnkU2BNA@mail.gmail.com> Message-ID: <20140127132918.177728cc@fsol> On Mon, 27 Jan 2014 20:22:53 +0800 Vajrasky Kok <sky.kok at speaklikeaking.com> wrote: > > >>> from itertools import repeat > >>> list(repeat('a', 2**31)) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > MemoryError Sure, just adjust the number to fit the available memory (here, 2**29 does the trick). Regards Antoine. 
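
For reference, the "times=None" semantics referred to in this thread correspond to the pure-Python "equivalent to" listing in the itertools documentation. A sketch of that listing follows; the name repeat_sketch is hypothetical and is used only to avoid shadowing itertools.repeat.

```python
from itertools import islice


def repeat_sketch(obj, times=None):
    """Pure-Python model of the documented repeat() semantics."""
    if times is None:
        # No count given: repeat forever.
        while True:
            yield obj
    else:
        # range() of a negative number is empty, so a negative count
        # yields nothing, whether passed by position or by keyword.
        for _ in range(times):
            yield obj


first_three = list(islice(repeat_sketch("a"), 3))
```

In this model repeat_sketch("a", -1) and repeat_sketch("a", times=-1) behave identically, which is the consistency the proposals above are aiming for.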
From ncoghlan at gmail.com Mon Jan 27 13:43:39 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 27 Jan 2014 22:43:39 +1000 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <20140127132918.177728cc@fsol> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> <20140127103830.2d54c0c6@fsol> <CAB+fVUU-MDvb5nVWouTpJQieZBvAQS7vFkJh61ytQDZnkU2BNA@mail.gmail.com> <20140127132918.177728cc@fsol> Message-ID: <CADiSq7cpWFKoSEPmwLYMGFkawmUd3n3Dj=1PU0VQ1xnfWisAJw@mail.gmail.com> On 27 January 2014 22:29, Antoine Pitrou <solipsis at pitrou.net> wrote: > On Mon, 27 Jan 2014 20:22:53 +0800 > Vajrasky Kok <sky.kok at speaklikeaking.com> wrote: >> >> >>> from itertools import repeat >> >>> list(repeat('a', 2**31)) >> Traceback (most recent call last): >> File "<stdin>", line 1, in <module> >> MemoryError > > Sure, just adjust the number to fit the available memory (here, 2**29 > does the trick). And for anyone interested in why a sufficiently large positive value that won't fit in available RAM fails gracefully with MemoryError: >>> repeat('a', 2**31).__length_hint__() 2147483648 >>> repeat('a', -1).__length_hint__() 0 list() uses __length_hint__() for preallocation, so a sufficiently large length hint means the preallocation attempt fails with MemoryError. As Antoine showed though, you still can't feed it untrusted data, because a large enough value that just fits into RAM can still cause you a lot of grief. Everything points to "times=-1" behaving as it does being a bug, but not a sufficiently critical one to risk breaking working code in a maintenance release. That makes deprecating the current behaviour of "times=-1" and accepting "times=None" in 3.5 the least controversial course of action. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Mon Jan 27 13:56:51 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 27 Jan 2014 23:56:51 +1100 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <52E65110.7070904@hastings.org> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <lc3kv7$dgl$1@ger.gmane.org> <CADiSq7eTqtObNeyHC8m01zpQze_1BKHrithFa+Bt=d0GC3nygw@mail.gmail.com> <lc59uu$4m9$1@ger.gmane.org> <52E65110.7070904@hastings.org> Message-ID: <20140127125651.GE3915@ando> On Mon, Jan 27, 2014 at 04:29:04AM -0800, Larry Hastings wrote: > The code currently has semantics that cannot be accurately represented > in a Python signature. We could do one of three things: > > 1) Do nothing, and don't allow inspect.Signature to produce a signature > for the function. This is the status quo. > > 2) Change the semantics of the function in a non-backwards-compatible > way so that we can represent its signature accurately with an > inspect.Signature object. For example, "change the function so that > providing times=-1 as a keyword argument behaves the same as providing > times=-1 as a positional-only argument" is such an incompatible change. > Another is "change the function to not accept keyword arguments at all". > > 3) Change the semantics of the function in a backwards-compatible way so > that we can represent its supported signature accurately with an > inspect.Signature object. Allow continued use of the old semantics for > a full deprecation cycle (two major versions) if not longer. For > example, "change the times argument to have a default of None, and > change the logic so that times=None means it repeats forever" would be > such an approach. > > For 3.3 and 3.4, I suggest that only 1) makes sense. 
Are you rejecting the idea that the current behaviour is an out and out buggy, and therefore fixing these things can and should occur in a bug-fix release? As far as I can see, the only piece of evidence that the given behaviour isn't a bug is that the signature says "object [, times]" rather than "object, times=None". That's not conclusive: I've often written signatures using [ ] to indicate optional arguments without specifying the default value in the signature. As it stands now, the documentation is internally contradictory. In one part of the documentation, it gives a clear indication that "times is None" should select the repeat forever behaviour. In another part of the documentation, it fails to mention that None is an acceptable value to select the repeat forever behaviour. > For 3.5 I prefer > 3), specifically the "times=None" approach, as that's how the function > has been documented as working since the itertools module was first > introduce in 2.3. And I see functions as having accurate signatures as > a good thing. > > I'm against 2), as I'm against removing functionality simply for > purity's sakes. Removing functionality breaks code. So it's best > reserved for critical problems like security issues. I cite the thread > we just had in python-dev, subject line "Deprecation policy", as an > excellent discussion and summary of this topic. I'm confused... you seem to be saying that you are *against* changing the behaviour of repeat so that: repeat(x, -1) and repeat(x, times=-1) behave the same. Is that actually what you mean, or have I misunderstood? Are there any other functions in the standard library where the behaviour differs depending on whether an argument is given positionally or by keyword? 
-- Steven From larry at hastings.org Mon Jan 27 14:13:01 2014 From: larry at hastings.org (Larry Hastings) Date: Mon, 27 Jan 2014 05:13:01 -0800 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <20140127125651.GE3915@ando> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <lc3kv7$dgl$1@ger.gmane.org> <CADiSq7eTqtObNeyHC8m01zpQze_1BKHrithFa+Bt=d0GC3nygw@mail.gmail.com> <lc59uu$4m9$1@ger.gmane.org> <52E65110.7070904@hastings.org> <20140127125651.GE3915@ando> Message-ID: <52E65B5D.6060502@hastings.org> On 01/27/2014 04:56 AM, Steven D'Aprano wrote (rearranged slightly so I could make my points in order): > I'm confused... you seem to be saying that you are *against* changing > the behaviour of repeat so that: > > repeat(x, -1) > > and > > repeat(x, times=-1) > > behave the same. Is that actually what you mean, or have I > misunderstood? I apologize for not making myself clear. But that's part of what I meant, yes: we should preserve the existing behavior of times=-1 when passed in by position or by keyword. However, we should *also* add a deprecation warning when passing times=-1 by keyword, suggesting that they use times=None instead. The idea is that we could eventually remove the PyTuple_Size check and make times=-1 always behave like times=0. In practice it'd be okay with me if we never did, or at least not until Python 4. > Are you rejecting the idea that the current behaviour is an out and out > buggy, and therefore fixing these things can and should occur in a > bug-fix release? While it's a bug, it's a very minor bug. As Python 3.4 release manager, my position is: Python 3.4 is in beta, so let's not change semantics for purity's sakes now. I'm -0.5 on adding times=None right now, and until we do we can't deprecate the old behavior. 
> Are there any other functions in the standard library where the > behaviour differs depending on whether an argument is given positionally > or by keyword? Not that I know of. This instance seems to be purely unintentional; see my latest message on the relevant issue, where I went back and figured out why itertools.repeat behaves like this in the first place: http://bugs.python.org/issue19145#msg209440 Cheers, //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20140127/5bf105d9/attachment.html> From victor.stinner at gmail.com Mon Jan 27 14:21:39 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 27 Jan 2014 14:21:39 +0100 Subject: [Python-Dev] News from asyncio In-Reply-To: <lc5jdp$fei$1@ger.gmane.org> References: <CAMpsgwZE0snFr4pKcyE_j=TQj+R+TiiJRXm7g_zuSdHtc5XtpQ@mail.gmail.com> <20140127115003.0f5ba5b5@fsol> <CAMpsgwas4FfzH7Fah=XyRJ=4ONNkPQjQX1q0aG2BEjHh5F_hag@mail.gmail.com> <lc5jdp$fei$1@ger.gmane.org> Message-ID: <CAMpsgwYAqaUYo+QgDz7ihiRxsvmu2VHSGeJ6Y37M579V7BND_g@mail.gmail.com> 2014-01-27 Serhiy Storchaka <storchaka at gmail.com>: > 27.01.14 12:55, Victor Stinner ???????(??): > >> IncompleteReadError has two additionnal attributes: >> >> - partial: "incomplete" received bytes >> - expected: total number of expected bytes (n parameter of readexactly) > > This looks similar to http.client.IncompleteRead. Please read the original issue for more information: http://code.google.com/p/tulip/issues/detail?id=111 I mentionned http.client.IncompleRead exception there. 
The HTTP exception is similar but also different:

- asyncio.IncompleteReadError inherits from EOFError, not from
  HTTPException (which inherits from Exception)
- asyncio.IncompleteReadError.expected is the total expected size, not
  the remaining size

Victor

From breamoreboy at yahoo.co.uk Mon Jan 27 14:29:28 2014
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Mon, 27 Jan 2014 13:29:28 +0000
Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4)
In-Reply-To: <20140127125651.GE3915@ando>
References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <lc3kv7$dgl$1@ger.gmane.org> <CADiSq7eTqtObNeyHC8m01zpQze_1BKHrithFa+Bt=d0GC3nygw@mail.gmail.com> <lc59uu$4m9$1@ger.gmane.org> <52E65110.7070904@hastings.org> <20140127125651.GE3915@ando>
Message-ID: <lc5mvb$2cq$1@ger.gmane.org>

On 27/01/2014 12:56, Steven D'Aprano wrote:
>
> As it stands now, the documentation is internally contradictory. In
> one part of the documentation, it gives a clear indication that
> "times is None" should select the repeat forever behaviour. In
> another part of the documentation, it fails to mention that None is
> an acceptable value to select the repeat forever behaviour.
>

None is not currently an acceptable value, ValueError is raised if you
provide anything other than an int in both Python 2.7 and 3.3. That's
why I'm against using it to say "run forever" in Python 3.5.

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

From tds333 at gmail.com Mon Jan 27 15:40:44 2014
From: tds333 at gmail.com (Wolfgang)
Date: Mon, 27 Jan 2014 15:40:44 +0100
Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
Message-ID: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com>

Hi,

I tested the latest beta from 3.4 (b3) and noticed there is a new marshal
protocol version 3.
The documentation is a little silent about the new features, not going into
detail.

I've run a performance test with the new protocol version and noticed the
new version is two times slower in serialization than version 2. I tested it
with a simple value tuple in a list (500000 elements).
Nothing special. (happens only if the tuple contains also a tuple)

Copy of the test code:


from time import time
from marshal import dumps

def genData(amount=500000):
    for i in range(amount):
        yield (i, i+2, i*2, (i+1,i+4,i,4), "my string template %s" % i, 1.01*i,
               True)

data = list(genData())
print(len(data))
t0 = time()
result = dumps(data, 2)
t1 = time()
print("duration p2: %f" % (t1-t0))
t0 = time()
result = dumps(data, 3)
t1 = time()
print("duration p3: %f" % (t1-t0))



Is the overhead for the recursion detection so high ?

Note this happens only if there is a tuple in the tuple of the datalist.


Regards,

Wolfgang
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20140127/d08e89a7/attachment.html>

From victor.stinner at gmail.com Mon Jan 27 16:35:24 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 27 Jan 2014 16:35:24 +0100
Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
In-Reply-To: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com>
References: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com>
Message-ID: <CAMpsgwbjx+SVT1vTP0uin0pd9yA4B=q1JQpet3euuVjPSpbN_g@mail.gmail.com>

Hi,

I'm surprised: marshal.dumps() doesn't raise an error if you pass an
invalid version. In fact, Python 3.3 only supports versions 0, 1 and
2. If you pass 3, it will use the version 2. (Same apply for version
99.)

Python 3.4 has two new versions: 3 and 4. The version 3 "shares common
object references", the version 4 adds short tuples and short strings
(produce smaller files).

It would be nice to document the differences between marshal versions.

And what do you think of raising an error if the version is unknown in
marshal.dumps()?

I modified your benchmark to test also loads() and run the benchmark
10 times. Results:
---
Python 3.3.3+ (3.3:50aa9e3ab9a4, Jan 27 2014, 16:11:26)
[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux
dumps v0: 391.9 ms
data size v0: 45582.9 kB
loads v0: 616.2 ms
dumps v1: 384.3 ms
data size v1: 45582.9 kB
loads v1: 594.0 ms
dumps v2: 153.1 ms
data size v2: 41395.4 kB
loads v2: 549.6 ms
dumps v3: 152.1 ms
data size v3: 41395.4 kB
loads v3: 535.9 ms
dumps v4: 152.3 ms
data size v4: 41395.4 kB
loads v4: 549.7 ms
---

And:
---
Python 3.4.0b3+ (default:dbad4564cd12, Jan 27 2014, 16:09:40)
[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux
dumps v0: 389.4 ms
data size v0: 45582.9 kB
loads v0: 564.8 ms
dumps v1: 390.2 ms
data size v1: 45582.9 kB
loads v1: 545.6 ms
dumps v2: 165.5 ms
data size v2: 41395.4 kB
loads v2: 470.9 ms
dumps v3: 425.6 ms
data size v3: 41395.4 kB
loads v3: 528.2 ms
dumps v4: 369.2 ms
data size v4: 37000.9 kB
loads v4: 550.2 ms
---

Version 2 is the fastest in Python 3.3 and 3.4, but version 4 with
Python 3.4 produces the smallest file.

Victor

2014-01-27 Wolfgang <tds333 at gmail.com>:
> Hi,
>
> I tested the latest beta from 3.4 (b3) and noticed there is a new marshal
> protocol version 3.
> The documentation is a little silent about the new features, not going into
> detail.
>
> I've run a performance test with the new protocol version and noticed the
> new version is two times slower in serialization than version 2. I tested it
> with a simple value tuple in a list (500000 elements).
> Nothing special.
(happens only if the tuple contains also a tuple) > > Copy of the test code: > > > from time import time > from marshal import dumps > > def genData(amount=500000): > for i in range(amount): > yield (i, i+2, i*2, (i+1,i+4,i,4), "my string template %s" % i, 1.01*i, > True) > > data = list(genData()) > print(len(data)) > t0 = time() > result = dumps(data, 2) > t1 = time() > print("duration p2: %f" % (t1-t0)) > t0 = time() > result = dumps(data, 3) > t1 = time() > print("duration p3: %f" % (t1-t0)) > > > > Is the overhead for the recursion detection so high ? > > Note this happens only if there is a tuple in the tuple of the datalist. > > > Regards, > > Wolfgang > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com > -------------- next part -------------- A non-text attachment was scrubbed... Name: bench.py Type: text/x-python Size: 752 bytes Desc: not available URL: <http://mail.python.org/pipermail/python-dev/attachments/20140127/bdfac9c6/attachment.py> From p.f.moore at gmail.com Mon Jan 27 16:42:13 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 27 Jan 2014 15:42:13 +0000 Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol) In-Reply-To: <CAMpsgwbjx+SVT1vTP0uin0pd9yA4B=q1JQpet3euuVjPSpbN_g@mail.gmail.com> References: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com> <CAMpsgwbjx+SVT1vTP0uin0pd9yA4B=q1JQpet3euuVjPSpbN_g@mail.gmail.com> Message-ID: <CACac1F_9xoxWusiXQRZ3FCOOpLj8zEZ9BJk8tQtsRBJgM9D8tg@mail.gmail.com> On 27 January 2014 15:35, Victor Stinner <victor.stinner at gmail.com> wrote: > Version 2 is the fastest in Python 3.3 and 3.4, but version 4 with > Python 3.4 produces the smallest file. Which version is used when creating pyc files? This benchmark might suggest that version 2 is the best... 
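
To repeat this kind of per-version comparison, a small harness along the following lines may help. It is a sketch only, not the bench.py attached to this thread: make_data and compare_versions are hypothetical names, the data set is much smaller than the 500000 elements used above, and timings from a single run are too noisy to draw conclusions from.

```python
import marshal
import time


def make_data(amount=1000):
    # Same record shape as the benchmark in this thread: a tuple nested in a tuple.
    return [(i, i + 2, i * 2, (i + 1, i + 4, i, 4),
             "my string template %s" % i, 1.01 * i, True)
            for i in range(amount)]


def compare_versions(data, versions=(0, 1, 2, 3, 4)):
    """Return {version: (dumps_seconds, payload_bytes)} for each marshal version."""
    results = {}
    for version in versions:
        start = time.perf_counter()
        payload = marshal.dumps(data, version)
        elapsed = time.perf_counter() - start
        # Whatever the version, the payload must load back to equal data.
        assert marshal.loads(payload) == data
        results[version] = (elapsed, len(payload))
    return results


if __name__ == "__main__":
    for version, (elapsed, size) in compare_versions(make_data()).items():
        print("v%d: dumps %.1f ms, %d bytes" % (version, elapsed * 1e3, size))
```

Reporting the payload size next to the timing keeps the size/speed trade-off between versions visible, which is the point raised about pyc files.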
Paul From brett at python.org Mon Jan 27 17:02:36 2014 From: brett at python.org (Brett Cannon) Date: Mon, 27 Jan 2014 11:02:36 -0500 Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol) In-Reply-To: <CACac1F_9xoxWusiXQRZ3FCOOpLj8zEZ9BJk8tQtsRBJgM9D8tg@mail.gmail.com> References: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com> <CAMpsgwbjx+SVT1vTP0uin0pd9yA4B=q1JQpet3euuVjPSpbN_g@mail.gmail.com> <CACac1F_9xoxWusiXQRZ3FCOOpLj8zEZ9BJk8tQtsRBJgM9D8tg@mail.gmail.com> Message-ID: <CAP1=2W6CcX9v85Yx_f52KG5RCsHQUacLj-Qsq5zky8W4PD6jGQ@mail.gmail.com> On Mon, Jan 27, 2014 at 10:42 AM, Paul Moore <p.f.moore at gmail.com> wrote: > On 27 January 2014 15:35, Victor Stinner <victor.stinner at gmail.com> wrote: > > Version 2 is the fastest in Python 3.3 and 3.4, but version 4 with > > Python 3.4 produces the smallest file. > > Which version is used when creating pyc files? This benchmark might > suggest that version 2 is the best... > Importlib just uses the default: http://hg.python.org/cpython/file/dbad4564cd12/Lib/importlib/_bootstrap.py#l671 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/python-dev/attachments/20140127/fb679097/attachment.html> From g.brandl at gmx.net Mon Jan 27 17:40:56 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 27 Jan 2014 17:40:56 +0100 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <20140127131225.71fc2d60@fsol> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> <CAP7h-xbDarKLiO6eXS_Ct1zaKf1QHoLe_xqwrq35nS8SZ8G2QA@mail.gmail.com> <CAB+fVUXrKbD3_uBhrDz6d7g3LRLmE2BM-G3Taq3m=tDCJHMapg@mail.gmail.com> <CAP7h-xY5YNrnYj5j-qRe7-AXMD_sH7JcqbxgX6L25zqm_ZZ5cQ@mail.gmail.com> <52E5E814.2020904@hastings.org> <20140127103955.1eae1e6f@fsol> <52E64A7E.4070302@hastings.org> <20140127131225.71fc2d60@fsol> Message-ID: <lc624s$h31$1@ger.gmane.org> Am 27.01.2014 13:12, schrieb Antoine Pitrou: > On Mon, 27 Jan 2014 04:01:02 -0800 > Larry Hastings <larry at hastings.org> wrote: >> >> On 01/27/2014 01:39 AM, Antoine Pitrou wrote: >> > On Sun, 26 Jan 2014 21:01:08 -0800 >> > Larry Hastings <larry at hastings.org> wrote: >> >> On 01/26/2014 08:40 PM, Alexander Belopolsky wrote: >> >>> On Sun, Jan 26, 2014 at 11:26 PM, Vajrasky Kok >> >>> <sky.kok at speaklikeaking.com <mailto:sky.kok at speaklikeaking.com>> wrote: >> >>> >> >>> In case we are taking "not backporting anything at all" road, what is >> >>> the best fix for the document? >> >>> >> >>> >> >>> I would say no fix is needed for this doc because the signature >> >>> suggests (correctly) that passing times by keyword is not supported. >> >> Where does it do that? >> > In the "[,times]" spelling, which is the spelling customarily used for >> > positional-only arguments. >> >> That's not my experience. 
>
> But it's mine :-) (try "help(str)" or "help(list)")

It's also the convention we've been using for the docs.

Georg

From jeanpierreda at gmail.com Mon Jan 27 18:15:55 2014
From: jeanpierreda at gmail.com (Devin Jeanpierre)
Date: Mon, 27 Jan 2014 09:15:55 -0800
Subject: [Python-Dev] News from asyncio
In-Reply-To: <CAMpsgwYAqaUYo+QgDz7ihiRxsvmu2VHSGeJ6Y37M579V7BND_g@mail.gmail.com>
References: <CAMpsgwZE0snFr4pKcyE_j=TQj+R+TiiJRXm7g_zuSdHtc5XtpQ@mail.gmail.com> <20140127115003.0f5ba5b5@fsol> <CAMpsgwas4FfzH7Fah=XyRJ=4ONNkPQjQX1q0aG2BEjHh5F_hag@mail.gmail.com> <lc5jdp$fei$1@ger.gmane.org> <CAMpsgwYAqaUYo+QgDz7ihiRxsvmu2VHSGeJ6Y37M579V7BND_g@mail.gmail.com>
Message-ID: <CABicbJKchK+hS+YuOPeSigHi-+oygctcsY9ZpJrOY8RAH1XL4A@mail.gmail.com>

On Mon, Jan 27, 2014 at 5:21 AM, Victor Stinner <victor.stinner at gmail.com> wrote:
> - asyncio.IncompleteReadError.expected is the total expected size, not
> the remaining size

Why not be consistent with the meaning of http.client.IncompleteRead.expected? The current meaning can be recovered via len(e.partial) + e.expected.

-- Devin

From tds333 at gmail.com Mon Jan 27 18:23:20 2014
From: tds333 at gmail.com (Wolfgang)
Date: Mon, 27 Jan 2014 18:23:20 +0100
Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
In-Reply-To: <CAP1=2W6CcX9v85Yx_f52KG5RCsHQUacLj-Qsq5zky8W4PD6jGQ@mail.gmail.com>
References: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com> <CAMpsgwbjx+SVT1vTP0uin0pd9yA4B=q1JQpet3euuVjPSpbN_g@mail.gmail.com> <CACac1F_9xoxWusiXQRZ3FCOOpLj8zEZ9BJk8tQtsRBJgM9D8tg@mail.gmail.com> <CAP1=2W6CcX9v85Yx_f52KG5RCsHQUacLj-Qsq5zky8W4PD6jGQ@mail.gmail.com>
Message-ID: <CAFTeXi1=ZGQG83nH8AFKx1jzdpvqvgiT8aUJDyQAmaKBcHM5Fw@mail.gmail.com>

Thanks Victor for improving this. I should also note that version 3 is slower only in the tuple-in-tuple case. With a flat tuple it is faster than version 2. That is why I asked about this corner case; I suspected that the recursion detection or something else has a huge cost.
For pyc files, I think the highest available version is used as the default. I didn't know about version 4; it is not mentioned anywhere in the docs. I also found out that any integer is accepted as the protocol version. That was handy for tests against 3.3 and 2.7, though. :-)

On Mon, Jan 27, 2014 at 5:02 PM, Brett Cannon <brett at python.org> wrote:
> On Mon, Jan 27, 2014 at 10:42 AM, Paul Moore <p.f.moore at gmail.com> wrote:
>> On 27 January 2014 15:35, Victor Stinner <victor.stinner at gmail.com> wrote:
>> > Version 2 is the fastest in Python 3.3 and 3.4, but version 4 with
>> > Python 3.4 produces the smallest file.
>>
>> Which version is used when creating pyc files? This benchmark might
>> suggest that version 2 is the best...
>
> Importlib just uses the default:
> http://hg.python.org/cpython/file/dbad4564cd12/Lib/importlib/_bootstrap.py#l671

--
bye by Wolfgang

From storchaka at gmail.com Mon Jan 27 21:00:47 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 27 Jan 2014 22:00:47 +0200
Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
In-Reply-To: <CAMpsgwbjx+SVT1vTP0uin0pd9yA4B=q1JQpet3euuVjPSpbN_g@mail.gmail.com>
References: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com> <CAMpsgwbjx+SVT1vTP0uin0pd9yA4B=q1JQpet3euuVjPSpbN_g@mail.gmail.com>
Message-ID: <lc6dsg$9ni$1@ger.gmane.org>

27.01.14 17:35, Victor Stinner wrote:
> Python 3.4 has two new versions: 3 and 4. The version 3 "shares common
> object references", the version 4 adds short tuples and short strings
> (produce smaller files).

Why do we need two new versions added in one Python release?
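
As a rough illustration of the version differences discussed above, the following sketch dumps nested tuples shaped like Wolfgang's test data with each marshal version and compares the output sizes. The names `make_data` and `sizes` and the sample size are made up for this example, and the exact byte counts depend on the interpreter, so none are quoted here.

```python
import marshal


def make_data(amount=1000):
    """Build nested tuples similar in shape to the benchmark data."""
    return [
        (i, i + 2, i * 2, (i + 1, i + 4, i, 4),
         "my string template %s" % i, 1.01 * i, True)
        for i in range(amount)
    ]


data = make_data()

# Serialize with format versions 0 to 4 (3 and 4 are new in Python 3.4).
sizes = {}
for version in range(5):
    blob = marshal.dumps(data, version)
    # Every version must round-trip to an equal object.
    assert marshal.loads(blob) == data
    sizes[version] = len(blob)

for version, size in sorted(sizes.items()):
    print("version %d: %d bytes" % (version, size))
```

On Python 3.4 or later, version 4 should come out smaller than version 2 for this kind of data, because short tuples and short strings get a more compact encoding.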

From sky.kok at speaklikeaking.com Tue Jan 28 03:00:36 2014
From: sky.kok at speaklikeaking.com (Vajrasky Kok)
Date: Tue, 28 Jan 2014 10:00:36 +0800
Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4)
In-Reply-To: <52E65B5D.6060502@hastings.org>
References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <lc3kv7$dgl$1@ger.gmane.org> <CADiSq7eTqtObNeyHC8m01zpQze_1BKHrithFa+Bt=d0GC3nygw@mail.gmail.com> <lc59uu$4m9$1@ger.gmane.org> <52E65110.7070904@hastings.org> <20140127125651.GE3915@ando> <52E65B5D.6060502@hastings.org>
Message-ID: <CAB+fVUX=ysZ5iNk2DZWrECcWdYqyfhqXwWAdsB_TUtXesuVtAQ@mail.gmail.com>

On Mon, Jan 27, 2014 at 9:13 PM, Larry Hastings <larry at hastings.org> wrote:
>
> I apologize for not making myself clear. But that's part of what I meant,
> yes: we should preserve the existing behavior of times=-1 when passed in by
> position or by keyword. However, we should *also* add a deprecation warning
> when passing times=-1 by keyword, suggesting that they use times=None
> instead. The idea is that we could eventually remove the PyTuple_Size check
> and make times=-1 always behave like times=0. In practice it'd be okay with
> me if we never did, or at least not until Python 4.
>

So do we add the deprecation warning only for times=-1 via keyword, or for all negative numbers passed to times via keyword?

I mean, what about:

>>> from itertools import repeat
>>> list(repeat('a', times=-2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C ssize_t

Deprecation warning or not?
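
The rule quoted above can be modelled in pure Python to make the positional/keyword distinction concrete. This is only a sketch: `repeat_sketch` is a hypothetical helper rather than CPython's implementation, it assumes the warning applies to every negative keyword value (one possible answer to the question), and it treats `times=None` as "repeat forever", which the real `repeat` did not accept at the time.

```python
import warnings
from itertools import islice


def repeat_sketch(obj, *args, **kwargs):
    """Hypothetical pure-Python model of the proposed rule; not CPython's code."""
    if len(args) > 1 or (args and "times" in kwargs):
        raise TypeError("times given more than once")
    by_keyword = "times" in kwargs
    times = kwargs.pop("times", args[0] if args else None)
    if kwargs:
        raise TypeError("unexpected keyword arguments: %r" % sorted(kwargs))

    if times is not None and times < 0:
        if by_keyword:
            # Legacy behaviour kept for now: repeat forever, but warn.
            warnings.warn(
                "use repeat(o, times=None) to repeat indefinitely",
                DeprecationWarning,
                stacklevel=2,
            )
            times = None
        else:
            # Positional negative counts mean zero repetitions.
            times = 0

    def generate():
        if times is None:
            while True:
                yield obj
        else:
            for _ in range(times):
                yield obj

    return generate()


# Positional negative count: yields nothing.
print(list(repeat_sketch("a", -2)))
# Keyword negative count: endless (bounded here by islice) and warns.
print(list(islice(repeat_sketch("a", times=-2), 3)))
```

Dropping the deprecated behaviour later would amount to deleting the `by_keyword` branch, so that every negative count becomes zero.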
From sky.kok at speaklikeaking.com Tue Jan 28 03:05:37 2014 From: sky.kok at speaklikeaking.com (Vajrasky Kok) Date: Tue, 28 Jan 2014 10:05:37 +0800 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <20140127132918.177728cc@fsol> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com> <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com> <20140127103830.2d54c0c6@fsol> <CAB+fVUU-MDvb5nVWouTpJQieZBvAQS7vFkJh61ytQDZnkU2BNA@mail.gmail.com> <20140127132918.177728cc@fsol> Message-ID: <CAB+fVUXEoSJuXyF2Oy7N3qQptzxDnyEi8UF-6YwU3FzQT=tejQ@mail.gmail.com> On Mon, Jan 27, 2014 at 8:29 PM, Antoine Pitrou <solipsis at pitrou.net> wrote: > > Sure, just adjust the number to fit the available memory (here, 2**29 > does the trick). > I get your point. But strangely enough, I can still recover from list(repeat('a', 2**29)). It only slows down my computer. I can ^Z the application then kill it later. But with list(repeat('a', times=-1)), rebooting the machine is compulsory. 
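
When probing this behaviour interactively, wrapping the iterator in itertools.islice keeps a potentially endless repeat from exhausting memory. A small sketch follows; the helper name `probe` and the limit of 5 are arbitrary choices for this example, and what the keyword form yields depends on the Python version in use.

```python
from itertools import islice, repeat


def probe(iterator, limit=5):
    """Take at most `limit` items so an endless iterator cannot hang the session."""
    return list(islice(iterator, limit))


# Positional negative count: zero repetitions.
print(probe(repeat("a", -1)))

# No count at all: endless, but islice stops after `limit` items.
print(probe(repeat("a")))

# Keyword negative count: endless on the versions discussed in this thread,
# so only ever consume it through a bounded helper such as probe().
print(len(probe(repeat("a", times=-1))))
```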

From larry at hastings.org Tue Jan 28 03:24:25 2014
From: larry at hastings.org (Larry Hastings)
Date: Mon, 27 Jan 2014 18:24:25 -0800
Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4)
In-Reply-To: <CAB+fVUX=ysZ5iNk2DZWrECcWdYqyfhqXwWAdsB_TUtXesuVtAQ@mail.gmail.com>
References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <lc3kv7$dgl$1@ger.gmane.org> <CADiSq7eTqtObNeyHC8m01zpQze_1BKHrithFa+Bt=d0GC3nygw@mail.gmail.com> <lc59uu$4m9$1@ger.gmane.org> <52E65110.7070904@hastings.org> <20140127125651.GE3915@ando> <52E65B5D.6060502@hastings.org> <CAB+fVUX=ysZ5iNk2DZWrECcWdYqyfhqXwWAdsB_TUtXesuVtAQ@mail.gmail.com>
Message-ID: <52E714D9.1060308@hastings.org>

On 01/27/2014 06:00 PM, Vajrasky Kok wrote:
> On Mon, Jan 27, 2014 at 9:13 PM, Larry Hastings <larry at hastings.org> wrote:
>> I apologize for not making myself clear. But that's part of what I meant,
>> yes: we should preserve the existing behavior of times=-1 when passed in by
>> position or by keyword. However, we should *also* add a deprecation warning
>> when passing times=-1 by keyword, suggesting that they use times=None
>> instead. The idea is that we could eventually remove the PyTuple_Size check
>> and make times=-1 always behave like times=0. In practice it'd be okay with
>> me if we never did, or at least not until Python 4.
>>
> So do we add the deprecation warning only for times=-1 via keyword, or for
> all negative numbers passed to times via keyword?
>
> I mean, what about:
>>>> from itertools import repeat
>>>> list(repeat('a', times=-2))
>

I should have been even *more* precise! When I said "times=-1" I really meant all negative numbers. (I was trying to abbreviate it as -1, as my text was already too long and unwieldy.)

I propose the logic be equivalent to this, handwaving away the boilerplate error handling for clarity (the real implementation would handle PyArg_ParseTupleAndKeywords or PyLong_AsSsize_t failing):

    PyObject *element, *times = Py_None;
    Py_ssize_t cnt;
    PyArg_ParseTupleAndKeywords(args, kwargs, "O|O:repeat", kwargs, &element, &times);
    if times == Py_None
        cnt = -1
    else
        cnt = PyLong_AsSsize_t(times)
        if cnt < 0
            if "times" was passed by keyword
                issue DeprecationWarning, "use repeat(o, times=None) to repeat indefinitely"
            else
                cnt = 0

(For those of you who aren't familiar with the source: "cnt" is the internal variable used to set the repeat count of the iterator. If "cnt" is < 0, the iterator repeats forever.)

If in the future we actually removed the deprecated behavior, the last "if" block would change simply to

        if cnt < 0
            cnt = 0

//arry/

From sky.kok at speaklikeaking.com Tue Jan 28 03:26:17 2014
From: sky.kok at speaklikeaking.com (Vajrasky Kok)
Date: Tue, 28 Jan 2014 10:26:17 +0800
Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4)
In-Reply-To: <52E65B5D.6060502@hastings.org>
References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <lc3kv7$dgl$1@ger.gmane.org> <CADiSq7eTqtObNeyHC8m01zpQze_1BKHrithFa+Bt=d0GC3nygw@mail.gmail.com> <lc59uu$4m9$1@ger.gmane.org> <52E65110.7070904@hastings.org> <20140127125651.GE3915@ando> <52E65B5D.6060502@hastings.org>
Message-ID: <CAB+fVUVEpJUpJS_um1HiHB8Bp9wswgBLNSAwk2R1424bd6snrQ@mail.gmail.com>

On Mon, Jan 27, 2014 at 9:13 PM, Larry Hastings <larry at hastings.org> wrote:
>
> While it's a bug, it's a very minor bug. As Python 3.4 release manager, my
> position is: Python 3.4 is in beta, so let's not change semantics for
> purity's sakes now.
I'm -0.5 on adding times=None right now, and until we
> do we can't deprecate the old behavior.
>

I bow to your decision, Larry. So I believe the doc fix is required then. I propose these options for the doc fix:

1. Keeps the status quo
=======================

>>> repeat.__doc__
'repeat(object [,times]) -> create an iterator which returns the object\nfor the specified number of times. If not specified, returns the object\nendlessly.'

We don't explain the meaning of negative `times`. Well, people shouldn't repeat with negative times, because a statement such as "Kids, repeat the push-up negative two times more." does not make sense.

2. Explains the negative times, ignores the keyword
===================================================

>>> repeat.__doc__
'repeat(object [,times]) -> create an iterator which returns the object\nfor the specified number of times. If not specified, returns the object\nendlessly. Negative times means zero repetitions.'

The signature repeat(object [,times]) suggests that this function does not accept keyword arguments, as some core developers have stated. So if the user passes a keyword to this function, well, it's too bad for them.

3. Explains the negative times, warns about keyword
===================================================

>>> repeat.__doc__
'repeat(object [,times]) -> create an iterator which returns the object\nfor the specified number of times. If not specified, returns the object\nendlessly. Negative times means zero repetitions. This function accepts keyword argument but the behaviour is buggy and should be avoided.'

4. Explains everything
======================

>>> repeat.__doc__
'repeat(object [,times]) -> create an iterator which returns the object\nfor the specified number of times. If not specified, returns the object\nendlessly. Negative times means zero repetitions via positional-only arguments. -1 value for times via keyword means endless repetitions and is same as omitting times argument and other negative number for times means endless repetitions as well but with different implementation.'

If you are wondering about the last statement:

>>> from itertools import repeat
>>> list(repeat('a', times=-4))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C ssize_t
>>> a = repeat('a', times=-4)
>>> next(a)
'a'
>>> next(a)
'a'
>>> a = repeat('a', times=-1)
>>> next(a)
'a'
>>> next(a)
'a'
>>> list(repeat('a', times=-1))
... freezes your computer ...

Which one is better? Once we settle this, I can think about the doc fix for Doc/library/itertools.rst.

Vajrasky

From larry at hastings.org Tue Jan 28 07:06:57 2014
From: larry at hastings.org (Larry Hastings)
Date: Mon, 27 Jan 2014 22:06:57 -0800
Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4)
In-Reply-To: <CAB+fVUVEpJUpJS_um1HiHB8Bp9wswgBLNSAwk2R1424bd6snrQ@mail.gmail.com>
References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <lc3kv7$dgl$1@ger.gmane.org> <CADiSq7eTqtObNeyHC8m01zpQze_1BKHrithFa+Bt=d0GC3nygw@mail.gmail.com> <lc59uu$4m9$1@ger.gmane.org> <52E65110.7070904@hastings.org> <20140127125651.GE3915@ando> <52E65B5D.6060502@hastings.org> <CAB+fVUVEpJUpJS_um1HiHB8Bp9wswgBLNSAwk2R1424bd6snrQ@mail.gmail.com>
Message-ID: <52E74901.1070008@hastings.org>

On 01/27/2014 06:26 PM, Vajrasky Kok wrote:
> So I believe the doc fix is required then.

I propose the docstring should describe only supported behavior, and the docs in the manual should mention the unsupported behavior. However, I'm interested in Raymond's take, as he's the original author of itertools.repeat. If I were writing it, it might well come out like this:

docstring:

    repeat(object [,times]) -> iterator

    Return an iterator which yields the object for the specified number of times.
    If times is unspecified, yields the object forever. If times is negative, behave as if times is 0.

documentation:

    repeat(object [,times]) -> iterator

    Return an iterator which yields the object for the specified number of times. If times is unspecified, yields the object forever. If times is negative, behave as if times is 0. Equivalent to:

        def repeat(object, times=None):
            # repeat(10, 3) --> 10 10 10
            if times is None:
                while True:
                    yield object
            else:
                for i in range(times):
                    yield object

    A common use for repeat is to supply a stream of constant values to map or zip:

        >>> list(map(pow, range(10), repeat(2)))
        [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

    .. note: if "times" is specified using a keyword argument, and provided with a negative value, repeat yields the object forever. This is a bug, its use is unsupported, and this behavior may be removed in a future version of Python.

//arry/

From kristjan at ccpgames.com Tue Jan 28 06:14:52 2014
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Tue, 28 Jan 2014 05:14:52 +0000
Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
In-Reply-To: <CAMpsgwbjx+SVT1vTP0uin0pd9yA4B=q1JQpet3euuVjPSpbN_g@mail.gmail.com>
References: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com> <CAMpsgwbjx+SVT1vTP0uin0pd9yA4B=q1JQpet3euuVjPSpbN_g@mail.gmail.com>
Message-ID: <EFE3877620384242A686D52278B7CCD3A5254B85@rkv-it-exch103>

Hi there. I think you should modify your program to marshal (and load) a compiled module. This is where the optimizations in versions 3 and 4 become important.
K > -----Original Message----- > From: Python-Dev [mailto:python-dev- > bounces+kristjan=ccpgames.com at python.org] On Behalf Of Victor Stinner > Sent: Monday, January 27, 2014 23:35 > To: Wolfgang > Cc: Python-Dev > Subject: Re: [Python-Dev] Python 3.4, marshal dumps slower (version 3 > protocol) > > Hi, > > I'm surprised: marshal.dumps() doesn't raise an error if you pass an invalid > version. In fact, Python 3.3 only supports versions 0, 1 and 2. If you pass 3, it > will use the version 2. (Same apply for version > 99.) > > Python 3.4 has two new versions: 3 and 4. The version 3 "shares common > object references", the version 4 adds short tuples and short strings > (produce smaller files). > > It would be nice to document the differences between marshal versions. > > And what do you think of raising an error if the version is unknown in > marshal.dumps()? > > I modified your benchmark to test also loads() and run the benchmark > 10 times. Results: > --- > Python 3.3.3+ (3.3:50aa9e3ab9a4, Jan 27 2014, 16:11:26) [GCC 4.8.2 20131212 > (Red Hat 4.8.2-7)] on linux > > dumps v0: 391.9 ms > data size v0: 45582.9 kB > loads v0: 616.2 ms > > dumps v1: 384.3 ms > data size v1: 45582.9 kB > loads v1: 594.0 ms > > dumps v2: 153.1 ms > data size v2: 41395.4 kB > loads v2: 549.6 ms > > dumps v3: 152.1 ms > data size v3: 41395.4 kB > loads v3: 535.9 ms > > dumps v4: 152.3 ms > data size v4: 41395.4 kB > loads v4: 549.7 ms > --- > > And: > --- > Python 3.4.0b3+ (default:dbad4564cd12, Jan 27 2014, 16:09:40) [GCC 4.8.2 > 20131212 (Red Hat 4.8.2-7)] on linux > > dumps v0: 389.4 ms > data size v0: 45582.9 kB > loads v0: 564.8 ms > > dumps v1: 390.2 ms > data size v1: 45582.9 kB > loads v1: 545.6 ms > > dumps v2: 165.5 ms > data size v2: 41395.4 kB > loads v2: 470.9 ms > > dumps v3: 425.6 ms > data size v3: 41395.4 kB > loads v3: 528.2 ms > > dumps v4: 369.2 ms > data size v4: 37000.9 kB > loads v4: 550.2 ms > --- > > Version 2 is the fastest in Python 3.3 and 3.4, but version 4 
with Python 3.4 > produces the smallest file. > > Victor > > 2014-01-27 Wolfgang <tds333 at gmail.com>: > > Hi, > > > > I tested the latest beta from 3.4 (b3) and noticed there is a new > > marshal protocol version 3. > > The documentation is a little silent about the new features, not going > > into detail. > > > > I've run a performance test with the new protocol version and noticed > > the new version is two times slower in serialization than version 2. I > > tested it with a simple value tuple in a list (500000 elements). > > Nothing special. (happens only if the tuple contains also a tuple) > > > > Copy of the test code: > > > > > > from time import time > > from marshal import dumps > > > > def genData(amount=500000): > > for i in range(amount): > > yield (i, i+2, i*2, (i+1,i+4,i,4), "my string template %s" % i, > > 1.01*i, > > True) > > > > data = list(genData()) > > print(len(data)) > > t0 = time() > > result = dumps(data, 2) > > t1 = time() > > print("duration p2: %f" % (t1-t0)) > > t0 = time() > > result = dumps(data, 3) > > t1 = time() > > print("duration p3: %f" % (t1-t0)) > > > > > > > > Is the overhead for the recursion detection so high ? > > > > Note this happens only if there is a tuple in the tuple of the datalist. 
> >
> > Regards,
> >
> > Wolfgang

From tds333 at gmail.com Tue Jan 28 09:17:28 2014
From: tds333 at gmail.com (tds333 at gmail.com)
Date: Tue, 28 Jan 2014 09:17:28 +0100
Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
In-Reply-To: <EFE3877620384242A686D52278B7CCD3A5254B85@rkv-it-exch103>
References: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com> <CAMpsgwbjx+SVT1vTP0uin0pd9yA4B=q1JQpet3euuVjPSpbN_g@mail.gmail.com> <EFE3877620384242A686D52278B7CCD3A5254B85@rkv-it-exch103>
Message-ID: <52E76798.8050502@gmail.com>

Hi,

yes, I know the main usage is to generate pyc files. But marshal is also used for other things and is the fastest built-in serialization method. For some use cases it makes sense to use it instead of pickle or others, and people do not use it only to generate pyc files.

I only found one case with a performance regression in the newer protocol versions for 3.4. We should take care of it and improve it. It is still possible to handle this during the beta phase and fix it for the upcoming release, or at least to document all this. I think it is also useful for others to know about the new versions, their usage and their behavior.

I also noticed the new versions can be faster in some use cases. I like the work done for this and think it was also useful to reduce the size of the resulting serialization. I'm not against it, nor do I want to criticize it. I only want to improve all this further.

Regards,

Wolfgang

On 28.01.2014 06:14, Kristján Valur Jónsson wrote:
> Hi there.
> I think you should modify your program to marshal (and load) a compiled module.
> This is where the optimizations in versions 3 and 4 become important.
> K

From martin at v.loewis.de Tue Jan 28 09:27:53 2014
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Jan 2014 09:27:53 +0100
Subject: [Python-Dev] Add PyType_GetSlot
Message-ID: <52E76A09.40700@v.loewis.de>

I'd like to resolve a long-standing issue of the stable ABI in 3.4:

http://bugs.python.org/issue17162

The issue is that, since PyTypeObject is opaque, module authors cannot get at tp_free, which they may need to in order to implement tp_dealloc properly.

Rather than providing the proposed specific wrapper for tp_dealloc, I propose to add a generic PyType_GetSlot function. From a stability point of view, exposing slot values is uncritical - it's just that the layout of the type object is hidden.

Any objection to adding this before RC1?

Regards, Martin

From martin at v.loewis.de Tue Jan 28 10:00:29 2014
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 28 Jan 2014 10:00:29 +0100
Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
In-Reply-To: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com>
References: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com>
Message-ID: <52E771AD.1020306@v.loewis.de>

I've debugged this a little bit. I couldn't originally see where the problem is, since I expected that the code dealing with shared references shouldn't ever trigger - none of the tuples in the example are actually shared (i.e. they all have a ref-count of 1, except for the outer list, which is both a parameter and bound in a variable).
Debugging reveals that it is actually the many integer objects which trigger the sharing code. So a much simplified example of Victor's benchmarking code can use

    data = [0]*10000000

The difference between version 2 and version 3 here is that v2 marshals a lot of "0" integers, whereas version 3 marshals a single one, and then a lot of references to this integer. Since "0" is a small integer, and thus a singleton anyway, this doesn't affect the unmarshal result. If the integers were larger, and actually shared, the unmarshal result under v2 would be "more correct".

If the integers are not shared, v2 and v3 have about the same runtime, e.g. seen when using

    data = [1000*1000 for i in range(10000000)]

Regards, Martin

From barry at python.org Tue Jan 28 10:23:01 2014
From: barry at python.org (Barry Warsaw)
Date: Tue, 28 Jan 2014 09:23:01 +0000
Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
In-Reply-To: <52E76798.8050502@gmail.com>
References: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com> <CAMpsgwbjx+SVT1vTP0uin0pd9yA4B=q1JQpet3euuVjPSpbN_g@mail.gmail.com> <EFE3877620384242A686D52278B7CCD3A5254B85@rkv-it-exch103> <52E76798.8050502@gmail.com>
Message-ID: <20140128092301.0960026d@anarchist>

On Jan 28, 2014, at 09:17 AM, tds333 at gmail.com wrote:
>yes I know the main usage is to generate pyc files. But marshal is also used
>for other stuff and is the fastest built in serialization method. For some
>use cases it makes sense to use it instead of pickle or others. And people
>use it not only to generate pyc files.

marshal is not guaranteed to be backward compatible between Python versions, so it's generally not a good idea to use it for serialization.

-Barry

From victor.stinner at gmail.com Tue Jan 28 11:22:40 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 28 Jan 2014 11:22:40 +0100
Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
In-Reply-To: <52E771AD.1020306@v.loewis.de>
References: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com> <52E771AD.1020306@v.loewis.de>
Message-ID: <CAMpsgwYF3xmHhzS-E7eTP76L9v2akqYDUGW34Qq_xR+n0UDdqQ@mail.gmail.com>

2014-01-28 "Martin v. Löwis" <martin at v.loewis.de>:
> Debugging reveals that it is actually the many integer objects which
> trigger the sharing code. So a much simplified example of Victor's
> benchmarking code can use
>
> data = [0]*10000000
>
> The difference between version 2 and version 3 here is that v2 marshals
> a lot of "0" integers, whereas version 3 marshals a single one, and then
> a lot of references to this integer.

Since the output size looks to be the same, it may be interesting to special-case small integers, or even integers and floats in general. Handling references to these numbers takes probably more CPU, whereas the gain on the file size is probably minor.

I wrote a short patch: http://bugs.python.org/issue20416

"dumps v3 is 60% faster, loads v3 is also 14% *faster*."

"dumps v4 is 66% faster, loads v4 is 16% faster."

"file size (on version 3 and 4) is unchanged with my patch."

"So with the patch, the Python 3.4 default version (4) is *faster* (dump 20% faster, load 16% faster) and produces *smaller files* (10% smaller)."

It looks like a win-win patch :-)

The drawback is that files storing many duplicated huge numbers will not be smaller with marshal version >= 3.

Victor

From solipsis at pitrou.net Tue Jan 28 12:11:43 2014
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 28 Jan 2014 12:11:43 +0100
Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
References: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com> <52E771AD.1020306@v.loewis.de> <CAMpsgwYF3xmHhzS-E7eTP76L9v2akqYDUGW34Qq_xR+n0UDdqQ@mail.gmail.com>
Message-ID: <20140128121143.016ecbf6@fsol>

On Tue, 28 Jan 2014 11:22:40 +0100 Victor Stinner <victor.stinner at gmail.com> wrote:
> 2014-01-28 "Martin v. Löwis" <martin at v.loewis.de>:
> > Debugging reveals that it is actually the many integer objects which
> > trigger the sharing code. So a much simplified example of Victor's
> > benchmarking code can use
> >
> > data = [0]*10000000
> >
> > The difference between version 2 and version 3 here is that v2 marshals
> > a lot of "0" integers, whereas version 3 marshals a single one, and then
> > a lot of references to this integer.
>
> Since the output size looks to be the same, it may be interesting to
> special-case small integers, or even integers and floats in general.
> Handling references to these numbers takes probably more CPU, whereas
> the gain on the file size is probably minor.

Please remember file size is only one factor. Another factor is runtime size after unmarshalling. For the typical case of pyc files, dump times are not very important. Load times are.

Regards

Antoine.
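
The point about runtime size after unmarshalling can be seen with a shared object. The sketch below is illustrative only: the 1024-byte `payload` and the variable names are made up for the example, and it uses a bytes object rather than integers, since integer handling is exactly what the patch discussed above changes.

```python
import marshal

payload = bytes(range(256)) * 4        # one 1024-byte object ...
data = [payload, payload]              # ... referenced twice

blob_v2 = marshal.dumps(data, 2)       # version 2: no reference sharing
blob_v4 = marshal.dumps(data, 4)       # version 4: second occurrence is a reference

loaded_v4 = marshal.loads(blob_v4)

# Compare the dump sizes and check whether sharing survived the round trip.
print(len(blob_v2), len(blob_v4))
print(loaded_v4[0] is loaded_v4[1])
```

On CPython 3.4 or later the version 4 dump is expected to be smaller, and both list items loaded from it should be the same object, so the payload is held in memory once instead of twice.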

From tds333 at gmail.com Tue Jan 28 12:51:27 2014
From: tds333 at gmail.com (tds333 at gmail.com)
Date: Tue, 28 Jan 2014 12:51:27 +0100
Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
In-Reply-To: <20140128092301.0960026d@anarchist>
References: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com> <CAMpsgwbjx+SVT1vTP0uin0pd9yA4B=q1JQpet3euuVjPSpbN_g@mail.gmail.com> <EFE3877620384242A686D52278B7CCD3A5254B85@rkv-it-exch103> <52E76798.8050502@gmail.com> <20140128092301.0960026d@anarchist>
Message-ID: <52E799BF.6010909@gmail.com>

On 28.01.2014 10:23, Barry Warsaw wrote:
> On Jan 28, 2014, at 09:17 AM, tds333 at gmail.com wrote:
>
>> yes I know the main usage is to generate pyc files. But marshal is also used
>> for other stuff and is the fastest built in serialization method. For some
>> use cases it makes sense to use it instead of pickle or others. And people
>> use it not only to generate pyc files.
> marshal is not guaranteed to be backward compatible between Python versions,
> so it's generally not a good idea to use it for serialization.
>

Yes, I know. Because of that I use it only if nothing persists and the exchange is between the same Python version (even the same architecture and interpreter type). But there are use cases for inter-process communication with no persistence and no need to serialize custom classes and so on. If speed matters and security is not a concern, you use the marshal module to serialize data. Assume something like multiprocessing on Windows (no fork available) with only a pipe to exchange a lot of simple data, where pickle is too slow. (Sometimes distributed to other computers.)

Another use case is a persistent cache that needs ultra-fast serialization (dump/load) but holds no critical data, which is normally stored in a database. It can easily be regenerated from the main data if the Python version changes.
(think pyc files are such a use case) I have tested a lot of modules for some needs (JSON, Thrift, MessagePack, Pickle, ProtoBuffers, ...) all are very useful and has their usage scenario. The same applies to marshal if all the limitations are no problem for you. (I've read the manual and have some knowledge about the limitations) But all these serialization modules are not as fast as marshal. (for my use case) I hear you and registered the warning about this. And will not complain if something will be incompatible. :-) If someone knows something faster to serialize basic Python types. I'm glad to use it. Regards, Wolfgang From steve at pearwood.info Tue Jan 28 13:37:14 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 28 Jan 2014 23:37:14 +1100 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <52E74901.1070008@hastings.org> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <lc3kv7$dgl$1@ger.gmane.org> <CADiSq7eTqtObNeyHC8m01zpQze_1BKHrithFa+Bt=d0GC3nygw@mail.gmail.com> <lc59uu$4m9$1@ger.gmane.org> <52E65110.7070904@hastings.org> <20140127125651.GE3915@ando> <52E65B5D.6060502@hastings.org> <CAB+fVUVEpJUpJS_um1HiHB8Bp9wswgBLNSAwk2R1424bd6snrQ@mail.gmail.com> <52E74901.1070008@hastings.org> Message-ID: <20140128123714.GI3915@ando> On Mon, Jan 27, 2014 at 10:06:57PM -0800, Larry Hastings wrote: > If I were writing it, it might well come out like this: [snip example] +1 on this wording, with one minor caveat: > .. note: if "times" is specified using a keyword argument, and > provided with a negative value, repeat yields the object forever. > This is a bug, its use is unsupported, and this behavior may be > removed in a future version of Python. How about changing "may be removed" to "will be removed", he asks hopefully? 
:-) -- Steven From ethan at stoneleaf.us Tue Jan 28 15:18:23 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 28 Jan 2014 06:18:23 -0800 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <20140128123714.GI3915@ando> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <lc3kv7$dgl$1@ger.gmane.org> <CADiSq7eTqtObNeyHC8m01zpQze_1BKHrithFa+Bt=d0GC3nygw@mail.gmail.com> <lc59uu$4m9$1@ger.gmane.org> <52E65110.7070904@hastings.org> <20140127125651.GE3915@ando> <52E65B5D.6060502@hastings.org> <CAB+fVUVEpJUpJS_um1HiHB8Bp9wswgBLNSAwk2R1424bd6snrQ@mail.gmail.com> <52E74901.1070008@hastings.org> <20140128123714.GI3915@ando> Message-ID: <52E7BC2F.2080207@stoneleaf.us> On 01/28/2014 04:37 AM, Steven D'Aprano wrote: > On Mon, Jan 27, 2014 at 10:06:57PM -0800, Larry Hastings wrote: > >> If I were writing it, it might well come out like this: > [snip example] > > +1 on this wording, with one minor caveat: > >> .. note: if "times" is specified using a keyword argument, and >> provided with a negative value, repeat yields the object forever. >> This is a bug, its use is unsupported, and this behavior may be >> removed in a future version of Python. > > How about changing "may be removed" to "will be removed", he asks > hopefully? 
:-)

+1

--
~Ethan~

From arigo at tunes.org Tue Jan 28 21:35:32 2014
From: arigo at tunes.org (Armin Rigo)
Date: Tue, 28 Jan 2014 21:35:32 +0100
Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4)
In-Reply-To: <CAB+fVUXEoSJuXyF2Oy7N3qQptzxDnyEi8UF-6YwU3FzQT=tejQ@mail.gmail.com>
References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com>
 <CAP7h-xZAS2dUJBpLZsyb0T5eEbDv2Mu0Kst8=z=r0j3U97WtmA@mail.gmail.com>
 <CADiSq7ePo=H8j86wpOo7HPyv5SV24ZcJd_KAvCdbWeUn1CD_GA@mail.gmail.com>
 <20140127103830.2d54c0c6@fsol>
 <CAB+fVUU-MDvb5nVWouTpJQieZBvAQS7vFkJh61ytQDZnkU2BNA@mail.gmail.com>
 <20140127132918.177728cc@fsol>
 <CAB+fVUXEoSJuXyF2Oy7N3qQptzxDnyEi8UF-6YwU3FzQT=tejQ@mail.gmail.com>
Message-ID: <CAMSv6X02o20ojcgKVy2zYLJFtCtUt2C6+txTC2t3VOP4wbvT_A@mail.gmail.com>

Hi Vajrasky,

On 28 January 2014 03:05, Vajrasky Kok <sky.kok at speaklikeaking.com> wrote:
> I get your point. But strangely enough, I can still recover from
> list(repeat('a', 2**29)). It only slows down my computer. I can ^Z the
> application then kill it later. But with list(repeat('a', times=-1)),
> rebooting the machine is compulsory.

Actually you get the early OverflowError if the value doesn't fit a C
long, and any value up to sys.maxint gets past this check. Try with
2**31-1.

A bientôt,

Armin.

From larry at hastings.org Wed Jan 29 03:46:39 2014
From: larry at hastings.org (Larry Hastings)
Date: Tue, 28 Jan 2014 18:46:39 -0800
Subject: [Python-Dev] Add PyType_GetSlot
In-Reply-To: <52E76A09.40700@v.loewis.de>
References: <52E76A09.40700@v.loewis.de>
Message-ID: <52E86B8F.2060500@hastings.org>

On 01/28/2014 12:27 AM, "Martin v. Löwis" wrote:
> I'd like to resolve a long-standing issue of the stable ABI in 3.4:
>
> http://bugs.python.org/issue17162
>
> The issue is that, since PyTypeObject is opaque, module authors cannot
> get at tp_free, which they may need to in order to implement tp_dealloc
> properly.
> > Rather than providing the proposed specific wrapper for tp_dealloc, I > propose to add a generic PyType_GetSlot function. From a stability point > of view, exposing slot values is uncritical - it's just that the layout > of the type object is hidden. > > Any objection to adding this before RC1? So this would be a new public ABI function? Would it be 100% new code, or would you need to refactor code internally to achieve it? In general I'm in favor of it but I'd like to review the patch before it goes in. Also, just curious: what is typeslots.h used for? I tried searching for a couple of those macros, and their only appearance in trunk was their definition. Cheers, //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20140128/86ed98be/attachment.html> From larry at hastings.org Wed Jan 29 03:50:50 2014 From: larry at hastings.org (Larry Hastings) Date: Tue, 28 Jan 2014 18:50:50 -0800 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <52E7BC2F.2080207@stoneleaf.us> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <lc3kv7$dgl$1@ger.gmane.org> <CADiSq7eTqtObNeyHC8m01zpQze_1BKHrithFa+Bt=d0GC3nygw@mail.gmail.com> <lc59uu$4m9$1@ger.gmane.org> <52E65110.7070904@hastings.org> <20140127125651.GE3915@ando> <52E65B5D.6060502@hastings.org> <CAB+fVUVEpJUpJS_um1HiHB8Bp9wswgBLNSAwk2R1424bd6snrQ@mail.gmail.com> <52E74901.1070008@hastings.org> <20140128123714.GI3915@ando> <52E7BC2F.2080207@stoneleaf.us> Message-ID: <52E86C8A.8090706@hastings.org> On 01/28/2014 06:18 AM, Ethan Furman wrote: > On 01/28/2014 04:37 AM, Steven D'Aprano wrote: >> On Mon, Jan 27, 2014 at 10:06:57PM -0800, Larry Hastings wrote: >>> .. note: if "times" is specified using a keyword argument, and >>> provided with a negative value, repeat yields the object forever. 
>>> This is a bug, its use is unsupported, and this behavior may be >>> removed in a future version of Python. >> >> How about changing "may be removed" to "will be removed", he asks >> hopefully? :-) > > +1 See the recent discussion "Deprecation policy" right here in python-dev for a cogent discussion on this issue. I agree with Raymond's view, posted on 1/25: * A good use for deprecations is for features that were flat-out misdesigned and prone to error. For those, there is nothing wrong with deprecating them right away. Once deprecated though, there doesn't need to be a rush to actually remove it -- that just makes it harder for people with currently working code to upgrade to newer versions of Python. * When I became a core developer well over a decade ago, I was a little deprecation happy (old stuff must go, keep everything nice and clean, etc). What I learned though is that deprecations are very hard on users and that the purported benefits usually aren't really important. I think the "times behaves differently when passed by name versus passed by position" behavior falls exactly into this category, and its advice on how to handle it is sound. Cheers, //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20140128/5aa08894/attachment.html> From guido at python.org Wed Jan 29 04:06:40 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Jan 2014 19:06:40 -0800 Subject: [Python-Dev] Need help designing subprocess API for Tulip Message-ID: <CAP7+vJLt=-PaC7QJNwjt0-YEKaYxGHOZ1Q_SU7Yu24_p6AKh+Q@mail.gmail.com> If you're interested, please see us on the python-tulip mailing list at Google Groups. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/python-dev/attachments/20140128/f6ada783/attachment.html>

From kristjan at ccpgames.com Wed Jan 29 04:16:39 2014
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Wed, 29 Jan 2014 03:16:39 +0000
Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
In-Reply-To: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com>
References: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com>
Message-ID: <EFE3877620384242A686D52278B7CCD3A525595C@rkv-it-exch103>

"Note this happens only if there is a tuple in the tuple of the datalist."

This is rather odd.
Protocol 3 adds support for object instancing. Non-trivial Objects are
looked up in the memo dictionary if they have a reference count larger
than 1. I suspect that the internal tuple has this property, for some
reason. However, my little test in 2.7 does not bear out this
hypothesis:

def genData(amount=500000):
    for i in range(amount):
        yield (i, i+2, i*2, (i+1,i+4,i,4), "my string template %s" % i, 1.01*i, True)

l = list(genData())
import sys
print sys.getrefcount(l[1000])
print sys.getrefcount(l[1000][0])
print sys.getrefcount(l[1000][3])

C:\Program Files\Perforce>python d:\pyscript\data.py
2
3
2

K

From: Python-Dev [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Wolfgang
Sent: Monday, January 27, 2014 22:41
To: Python-Dev
Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)

Hi,

I tested the latest beta from 3.4 (b3) and noticed there is a new
marshal protocol version 3. The documentation is a little silent about
the new features, not going into detail.

I've run a performance test with the new protocol version and noticed
the new version is two times slower in serialization than version 2.

I tested it with a simple value tuple in a list (500000 elements).
Nothing special.
(happens only if the tuple contains also a tuple)

Copy of the test code:

from time import time
from marshal import dumps

def genData(amount=500000):
    for i in range(amount):
        yield (i, i+2, i*2, (i+1,i+4,i,4), "my string template %s" % i, 1.01*i, True)

data = list(genData())
print(len(data))
t0 = time()
result = dumps(data, 2)
t1 = time()
print("duration p2: %f" % (t1-t0))
t0 = time()
result = dumps(data, 3)
t1 = time()
print("duration p3: %f" % (t1-t0))

Is the overhead for the recursion detection so high ?

Note this happens only if there is a tuple in the tuple of the datalist.

Regards,

Wolfgang

From kristjan at ccpgames.com Wed Jan 29 04:02:58 2014
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Wed, 29 Jan 2014 03:02:58 +0000
Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
In-Reply-To: <20140128092301.0960026d@anarchist>
References: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com>
 <CAMpsgwbjx+SVT1vTP0uin0pd9yA4B=q1JQpet3euuVjPSpbN_g@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD3A5254B85@rkv-it-exch103>
 <52E76798.8050502@gmail.com>
 <20140128092301.0960026d@anarchist>
Message-ID: <EFE3877620384242A686D52278B7CCD3A5255935@rkv-it-exch103>

How often I hear this argument :)
For many people, serialized data is not persisted. But used e.g. for
sending information over the wire, or between processes.
Marshal is very good for that. Additionally, it doesn't have any side
effects since it just stores primitive types and is thus "safe".
EVE Online uses its own extended version of the marshal system, and has
for years, because it is fast and it can be tuned to an application
domain by adding custom opcodes.
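
[Editorial sketch, not part of the original message.] The "between
processes, nothing persisted" use of marshal described above could look
roughly like this. The 4-byte length prefix, the helper names
(send_obj, recv_exact, recv_obj) and the use of socket.socketpair() are
assumptions of this sketch, not anything taken from EVE Online or from
the thread. Both ends are assumed to run the same Python version, and
the marshal documentation advises against loading data from untrusted
sources.

```python
import marshal
import socket
import struct

# Hypothetical framing for this sketch: 4-byte big-endian length prefix.
HEADER = struct.Struct("!I")


def send_obj(sock, obj, version=2):
    """Serialize obj with marshal and send it as one length-prefixed frame."""
    payload = marshal.dumps(obj, version)
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, count):
    """Read exactly count bytes from sock or raise EOFError."""
    chunks = []
    while count:
        chunk = sock.recv(count)
        if not chunk:
            raise EOFError("peer closed the connection")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_obj(sock):
    """Receive one length-prefixed frame and unmarshal it."""
    (size,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return marshal.loads(recv_exact(sock, size))


# One record shaped like the tuples in the test code quoted above.
record = (1, 3, 2, (2, 5, 1, 4), "my string template 1", 1.01, True)

left, right = socket.socketpair()
try:
    send_obj(left, record)
    assert recv_obj(right) == record
finally:
    left.close()
    right.close()
```

The version argument is left explicit so the two protocol versions
discussed in this thread can be compared on the same channel.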
> -----Original Message----- > From: Python-Dev [mailto:python-dev- > bounces+kristjan=ccpgames.com at python.org] On Behalf Of Barry Warsaw > Sent: Tuesday, January 28, 2014 17:23 > To: python-dev at python.org > Subject: Re: [Python-Dev] Python 3.4, marshal dumps slower (version 3 > protocol) > marshall is not guaranteed to be backward compatible between Python > versions, so it's generally not a good idea to use it for serialization. From tjreedy at udel.edu Wed Jan 29 05:55:50 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 28 Jan 2014 23:55:50 -0500 Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol) In-Reply-To: <EFE3877620384242A686D52278B7CCD3A5255935@rkv-it-exch103> References: <CAFTeXi253vLKp9h4EtA5jMaEzyenTZPkEBGKq7asci3ta-Q8qg@mail.gmail.com> <CAMpsgwbjx+SVT1vTP0uin0pd9yA4B=q1JQpet3euuVjPSpbN_g@mail.gmail.com> <EFE3877620384242A686D52278B7CCD3A5254B85@rkv-it-exch103> <52E76798.8050502@gmail.com> <20140128092301.0960026d@anarchist> <EFE3877620384242A686D52278B7CCD3A5255935@rkv-it-exch103> Message-ID: <lca1ke$3jf$1@ger.gmane.org> On 1/28/2014 10:02 PM, Kristj?n Valur J?nsson wrote: >> marshall is not guaranteed to be backward compatible between Python >> versions, so it's generally not a good idea to use it for serialization. > How often I hear this argument :) > For many people, serialized data is not persisted. But used e.g. for sending information over the wire, or between processes. > Marshal is very good for that. Additionally, it doesn't have any side effects since it just stores primitive types and is thus "safe". > EVE Online uses its own extended version of the marshal system, and has for years, because it is fast and it can be > tuned to an application domain by adding custom opcodes. I think the proper message is this: "Marshal is designed for caching compiled message objects and has the function needed for that goal. When the need changes, marshal changes (with a change in magic number). 
Other uses should take into account the limitations of function and stability." It appears you did just that by making a custom version with the function and stability you need. -- Terry Jan Reedy From ethan at stoneleaf.us Wed Jan 29 06:18:37 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 28 Jan 2014 21:18:37 -0800 Subject: [Python-Dev] Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4) In-Reply-To: <52E86C8A.8090706@hastings.org> References: <CAB+fVUWZ4j81q__xyJUXGyWQ00DaQaBOhH_MkTQ=QCV-W09_zA@mail.gmail.com> <lc3kv7$dgl$1@ger.gmane.org> <CADiSq7eTqtObNeyHC8m01zpQze_1BKHrithFa+Bt=d0GC3nygw@mail.gmail.com> <lc59uu$4m9$1@ger.gmane.org> <52E65110.7070904@hastings.org> <20140127125651.GE3915@ando> <52E65B5D.6060502@hastings.org> <CAB+fVUVEpJUpJS_um1HiHB8Bp9wswgBLNSAwk2R1424bd6snrQ@mail.gmail.com> <52E74901.1070008@hastings.org> <20140128123714.GI3915@ando> <52E7BC2F.2080207@stoneleaf.us> <52E86C8A.8090706@hastings.org> Message-ID: <52E88F2D.9030500@stoneleaf.us> On 01/28/2014 06:50 PM, Larry Hastings wrote: > > See the recent discussion "Deprecation policy" right here in python-dev for a cogent discussion on this issue. I agree > with Raymond's view, posted on 1/25: > > * A good use for deprecations is for features that were flat-out misdesigned > and prone to error. For those, there is nothing wrong with deprecating them > right away. Once deprecated though, there doesn't need to be a rush to > actually remove it -- that just makes it harder for people with currently > working code to upgrade to newer versions of Python. > > * When I became a core developer well over a decade ago, I was a little > deprecation happy (old stuff must go, keep everything nice and clean, etc). > What I learned though is that deprecations are very hard on users and that > the purported benefits usually aren't really important. I also agree with this view. 
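
[Editorial sketch, not part of the original message.] For code that
must behave the same on releases with and without the keyword bug in
itertools.repeat discussed in this thread, one defensive option is a
small wrapper that never hands a negative count to repeat. The name
bounded_repeat is hypothetical; this is one possible workaround, not
the fix that was applied to CPython.

```python
from itertools import islice, repeat


def bounded_repeat(obj, times=None):
    """Like itertools.repeat, but a negative count always means zero items."""
    if times is None:
        return repeat(obj)            # endless, as requested explicitly
    return repeat(obj, max(times, 0))  # count is clamped and passed by position


print(list(bounded_repeat("a", times=-1)))    # []
print(list(bounded_repeat("a", 2)))           # ['a', 'a']
print(list(islice(bounded_repeat("a"), 3)))   # ['a', 'a', 'a']
```

Because the clamped count is passed positionally, the result does not
depend on whether the caller used the keyword form.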
> I think the "times behaves differently when passed by name versus passed by position" behavior falls exactly into this > category, and its advice on how to handle it is sound. I don't agree with this. This is a bug. Somebody going through (for example) a code review and making minor changes so the code is more readable shouldn't have to be afraid that [inserting | removing] the keyword in the function call is going to *drastically* [1] change the behavior. I understand the need for a cycle of deprecation [2], but not fixing it in 3.5 is folly. -- ~Ethan~ [1] or change the behavior *at all*, for that matter [2] speaking of deprecations, are all the 3.1, 3.2, etc., etc., deprecations being added to 2.7? From victor.stinner at gmail.com Wed Jan 29 10:42:01 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 29 Jan 2014 10:42:01 +0100 Subject: [Python-Dev] Need help designing subprocess API for Tulip In-Reply-To: <CAP7+vJLt=-PaC7QJNwjt0-YEKaYxGHOZ1Q_SU7Yu24_p6AKh+Q@mail.gmail.com> References: <CAP7+vJLt=-PaC7QJNwjt0-YEKaYxGHOZ1Q_SU7Yu24_p6AKh+Q@mail.gmail.com> Message-ID: <CAMpsgwbk1pOxwMybVivz8ZMwXWkAy7BDg1ntngV-SbY40+38jQ@mail.gmail.com> Link to the thread on python-tulip: https://groups.google.com/forum/#!topic/python-tulip/2snxuJY_Lx0 Victor 2014-01-29 Guido van Rossum <guido at python.org>: > If you're interested, please see us on the python-tulip mailing list at > Google Groups. 
> > -- > --Guido van Rossum (python.org/~guido) > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com > From andrew.svetlov at gmail.com Wed Jan 29 17:55:04 2014 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Wed, 29 Jan 2014 18:55:04 +0200 Subject: [Python-Dev] [python-committers] [RELEASED] Python 3.3.4 release candidate 1 In-Reply-To: <52E60C8E.1050908@python.org> References: <52E60C8E.1050908@python.org> Message-ID: <CAL3CFcV4ALwveAN4jKkWDOxo2eU3ZQCOO9UG4GC9RV02Z9FXzg@mail.gmail.com> Would you to accept fixes for http://bugs.python.org/issue20434 and http://bugs.python.org/issue20437 before 3.3.4 final? On Mon, Jan 27, 2014 at 9:36 AM, Georg Brandl <georg at python.org> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On behalf of the Python development team, I'm reasonably happy to announce the > Python 3.3.4 release candidate 1. > > Python 3.3.4 includes several security fixes and over 120 bug fixes compared to > the Python 3.3.3 release. > > This release fully supports OS X 10.9 Mavericks. In particular, this release > fixes an issue that could cause previous versions of Python to crash when typing > in interactive mode on OS X 10.9. > > Python 3.3 includes a range of improvements of the 3.x series, as well as easier > porting between 2.x and 3.x. In total, almost 500 API items are new or improved > in Python 3.3. 
For a more extensive list of changes in the 3.3 series, see > > http://docs.python.org/3.3/whatsnew/3.3.html > > and for the detailed changelog of 3.3.4, see > > http://docs.python.org/3.3/whatsnew/changelog.html > > To download Python 3.3.4 rc1 visit: > > http://www.python.org/download/releases/3.3.4/ > > This is a preview release, please report any bugs to > > http://bugs.python.org/ > > The final version is scheduled to be released in two weeks' time, on or about > the 10th of February. > > Enjoy! > > - -- > Georg Brandl, Release Manager > georg at python.org > (on behalf of the entire python-dev team and 3.3's contributors) > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v2.0.22 (GNU/Linux) > > iEYEARECAAYFAlLmDI4ACgkQN9GcIYhpnLAr6wCePRbHF80k5goV4RUDBA5FfkwF > rLUAnRg0RpL/b6apv+Dt2/sgnUd3hTPA > =Z4Ss > -----END PGP SIGNATURE----- > _______________________________________________ > python-committers mailing list > python-committers at python.org > https://mail.python.org/mailman/listinfo/python-committers -- Thanks, Andrew Svetlov From storchaka at gmail.com Wed Jan 29 19:24:41 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 29 Jan 2014 20:24:41 +0200 Subject: [Python-Dev] Add Py_REPLACE and Py_XREPLACE macros Message-ID: <lcbh0b$2k9$1@ger.gmane.org> The Py_CLEAR macros is used as safe alternative for following unsafe idiomatic code: Py_XDECREF(ptr); ptr = NULL; But other unsafe idiomatic code is widely used in the sources: Py_XDECREF(ptr); ptr = new_value; Every occurrence of such code is potential bug for same reasons as for Py_CLEAR. It was offered [1] to introduce new macros Py_REPLACE and Py_XREPLACE for safe replace with Py_DECREF and Py_XDECREF respectively. Automatically generated patch contains about 50 replaces [2]. 

[1] http://bugs.python.org/issue16447
[2] http://bugs.python.org/issue20440

From storchaka at gmail.com Wed Jan 29 20:10:14 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 29 Jan 2014 21:10:14 +0200
Subject: [Python-Dev] Add Py_REPLACE and Py_XREPLACE macros
In-Reply-To: <lcbh0b$2k9$1@ger.gmane.org>
References: <lcbh0b$2k9$1@ger.gmane.org>
Message-ID: <lcbjlm$436$1@ger.gmane.org>

Antoine already proposed similar macros [1] [2]. But now we have about
50 potential bugs which can be fixed with these macros.

[1] https://mail.python.org/pipermail/python-dev/2008-May/079862.html
[2] http://bugs.python.org/issue20440

From storchaka at gmail.com Wed Jan 29 20:12:18 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 29 Jan 2014 21:12:18 +0200
Subject: [Python-Dev] [RELEASED] Python 3.3.4 release candidate 1
In-Reply-To: <CAL3CFcV4ALwveAN4jKkWDOxo2eU3ZQCOO9UG4GC9RV02Z9FXzg@mail.gmail.com>
References: <52E60C8E.1050908@python.org>
 <CAL3CFcV4ALwveAN4jKkWDOxo2eU3ZQCOO9UG4GC9RV02Z9FXzg@mail.gmail.com>
Message-ID: <lcbjpi$436$2@ger.gmane.org>

29.01.14 18:55, Andrew Svetlov wrote:
> Would you to accept fixes for http://bugs.python.org/issue20434 and
> http://bugs.python.org/issue20437 before 3.3.4 final?

And http://bugs.python.org/issue20440.

From storchaka at gmail.com Wed Jan 29 20:12:55 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 29 Jan 2014 21:12:55 +0200
Subject: [Python-Dev] Add Py_REPLACE and Py_XREPLACE macros
In-Reply-To: <lcbjlm$436$1@ger.gmane.org>
References: <lcbh0b$2k9$1@ger.gmane.org>
 <lcbjlm$436$1@ger.gmane.org>
Message-ID: <lcbjqn$436$3@ger.gmane.org>

29.01.14 21:10, Serhiy Storchaka wrote:
> Antoine already proposed similar macros [1] [2]. But now we have about
> 50 potential bugs which can be fixed with these macros.
> > [1] https://mail.python.org/pipermail/python-dev/2008-May/079862.html > [2] http://bugs.python.org/issue20440 [2] http://bugs.python.org/issue3081 From stefan_ml at behnel.de Thu Jan 30 07:25:01 2014 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 30 Jan 2014 07:25:01 +0100 Subject: [Python-Dev] weird docstring generated by argument clinic Message-ID: <lccr7g$f0m$1@ger.gmane.org> Hi, for two days now, the signature embedding tests in Cython have been failing with this (doctest) error: """ Expected: f_D(long double D) -> long double Got: f_DNone f_D(long double D) -> long double """ https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/1869/ARCH=m64,BACKEND=c,PYVERSION=py3km/testReport/junit/doctest/DocTestCase/Doctest__embedsignatures/ The first line that Cython writes into the docstring is the "expected" one above. So far, all CPython versions have ignored it, Py3.4 then started picking it up at some point due to the argument clinic changes, but properly copied it over to the (IIRC) "__signature__" attribute. However, the recent change now lead to the above being dumped into the docstring. Could someone please quickly explain what the purpose of the first line is and why it says "None" at the end? Is there anything we should do on our side in order to fix this? Since many of our users embed their signatures for documentation purposes (it was the only way to make them visible in previous CPython versions and is supported by several tools, e.g. epydoc), this is a rather annoying result for them. 

Stefan

From guido at python.org Thu Jan 30 07:36:44 2014
From: guido at python.org (Guido van Rossum)
Date: Wed, 29 Jan 2014 22:36:44 -0800
Subject: [Python-Dev] weird docstring generated by argument clinic
In-Reply-To: <lccr7g$f0m$1@ger.gmane.org>
References: <lccr7g$f0m$1@ger.gmane.org>
Message-ID: <CAP7+vJLbNu=KWDs826fKgoqpBG2GYxXdhMg7wD5PBHv4gsWeCg@mail.gmail.com>

I suppose it's related to this checkin:

changeset:   88792:d6311829da15
parent:      88788:6b37e6aff9ef
user:        Larry Hastings <larry at hastings.org>
date:        Tue Jan 28 05:00:08 2014 -0800
files:       Include/object.h Lib/idlelib/idle_test/test_calltips.py Lib/inspect.py Lib/test/test_capi.py Lib/test/test_generators.py Lib/test/test_genexps.py Misc/NEWS Modules/_bz2module.c Modules/_cryptmodule.c Modules/_cursesmodule.c Modules/_datetimemodule.c Modules/_dbmmodule.c Modules/_lzmamodule.c Modules/_lzmamodule.clinic.c Modules/_opcode.c Modules/_pickle.c Modules/_sre.c Modules/_testcapimodule.c Modules/_weakref.c Modules/audioop.c Modules/binascii.c Modules/clinic/_bz2module.c.h Modules/clinic/_lzmamodule.c.h Modules/clinic/_pickle.c.h Modules/clinic/audioop.c.h Modules/clinic/binascii.c.h Modules/clinic/zlibmodule.c.h Modules/posixmodule.c Modules/unicodedata.c Modules/zlibmodule.c Objects/descrobject.c Objects/dictobject.c Objects/methodobject.c Objects/typeobject.c Objects/unicodeobject.c Python/import.c Tools/clinic/clinic.py
description:
Issue #20326: Argument Clinic now uses a simple, unique signature to
annotate text signatures in docstrings, resulting in fewer false
positives. "self" parameters are also explicitly marked, allowing
inspect.Signature() to authoritatively detect (and skip) said parameters.

Issue #20326: Argument Clinic now generates separate checksums for the
input and output sections of the block, allowing external tools to verify
that the input has not changed (and thus the output is not out-of-date).
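
[Editorial sketch, not part of the original message.] How a text
signature ends up visible to introspection can be inspected from Python
itself. The helper name describe is hypothetical, and the exact strings
printed depend on the interpreter version, so no expected output is
shown. The sketch assumes a CPython release in which the builtin len
carries a text signature.

```python
import inspect


def describe(func):
    """Collect what introspection reports for one callable."""
    doc = func.__doc__ or ""
    try:
        signature = str(inspect.signature(func))
    except (TypeError, ValueError):
        # Raised when no signature can be derived for this callable.
        signature = None
    return {
        "text_signature": getattr(func, "__text_signature__", None),
        "first_doc_line": doc.splitlines()[0] if doc else "",
        "signature": signature,
    }


for name, value in describe(len).items():
    print("%s: %r" % (name, value))
```

Running the same helper on an extension function that embeds its own
signature line in the docstring shows whether that line was consumed
as a text signature or left in the docstring.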
On Wed, Jan 29, 2014 at 10:25 PM, Stefan Behnel <stefan_ml at behnel.de> wrote: > Hi, > > for two days now, the signature embedding tests in Cython have been failing > with this (doctest) error: > > """ > Expected: > f_D(long double D) -> long double > Got: > f_DNone > f_D(long double D) -> long double > """ > > > https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/1869/ARCH=m64,BACKEND=c,PYVERSION=py3km/testReport/junit/doctest/DocTestCase/Doctest__embedsignatures/ > > The first line that Cython writes into the docstring is the "expected" one > above. So far, all CPython versions have ignored it, Py3.4 then started > picking it up at some point due to the argument clinic changes, but > properly copied it over to the (IIRC) "__signature__" attribute. However, > the recent change now lead to the above being dumped into the docstring. > > Could someone please quickly explain what the purpose of the first line is > and why it says "None" at the end? > > Is there anything we should do on our side in order to fix this? Since many > of our users embed their signatures for documentation purposes (it was the > only way to make them visible in previous CPython versions and is supported > by several tools, e.g. epydoc), this is a rather annoying result for them. > > Stefan > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/python-dev/attachments/20140129/98883928/attachment.html> From larry at hastings.org Thu Jan 30 08:14:22 2014 From: larry at hastings.org (Larry Hastings) Date: Wed, 29 Jan 2014 23:14:22 -0800 Subject: [Python-Dev] weird docstring generated by argument clinic In-Reply-To: <lccr7g$f0m$1@ger.gmane.org> References: <lccr7g$f0m$1@ger.gmane.org> Message-ID: <52E9FBCE.90605@hastings.org> On 01/29/2014 10:25 PM, Stefan Behnel wrote: > Hi, > > for two days now, the signature embedding tests in Cython have been failing > with this (doctest) error: > > """ > Expected: > f_D(long double D) -> long double > Got: > f_DNone > f_D(long double D) -> long double > """ The string "f_DNone" shouldn't be there. However, "f_DNone" doesn't appear in a fresh checkout of CPython trunk. If you have a reproducable test case, please file it with an issue on the tracker and add me to the nosy list. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20140129/59d4118c/attachment.html> From stefan_ml at behnel.de Thu Jan 30 08:19:33 2014 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 30 Jan 2014 08:19:33 +0100 Subject: [Python-Dev] weird docstring generated by argument clinic In-Reply-To: <CAP7+vJLbNu=KWDs826fKgoqpBG2GYxXdhMg7wD5PBHv4gsWeCg@mail.gmail.com> References: <lccr7g$f0m$1@ger.gmane.org> <CAP7+vJLbNu=KWDs826fKgoqpBG2GYxXdhMg7wD5PBHv4gsWeCg@mail.gmail.com> Message-ID: <lccudo$fik$1@ger.gmane.org> Guido van Rossum, 30.01.2014 07:36: > I suppose it's related to this checkin: > [...] > Issue #20326: Argument Clinic now uses a simple, unique signature to > annotate text signatures in docstrings, resulting in fewer false > positives.[...] Thanks, I'll comment there. 
> On Wed, Jan 29, 2014 at 10:25 PM, Stefan Behnel wrote: >> for two days now, the signature embedding tests in Cython have been failing >> with this (doctest) error: >> >> """ >> Expected: >> f_D(long double D) -> long double >> Got: >> f_DNone >> f_D(long double D) -> long double >> """ And I should have checked a bit more deeply. The None part here is the __text_signature__, which is no longer filled. The commit above explains it. Stefan From stefan_ml at behnel.de Thu Jan 30 08:33:52 2014 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 30 Jan 2014 08:33:52 +0100 Subject: [Python-Dev] weird docstring generated by argument clinic In-Reply-To: <52E9FBCE.90605@hastings.org> References: <lccr7g$f0m$1@ger.gmane.org> <52E9FBCE.90605@hastings.org> Message-ID: <lccv8j$nlu$1@ger.gmane.org> Larry Hastings, 30.01.2014 08:14: > On 01/29/2014 10:25 PM, Stefan Behnel wrote: >> for two days now, the signature embedding tests in Cython have been failing >> with this (doctest) error: >> >> """ >> Expected: >> f_D(long double D) -> long double >> Got: >> f_DNone >> f_D(long double D) -> long double >> """ > > The string "f_DNone" shouldn't be there. However, "f_DNone" doesn't appear > in a fresh checkout of CPython trunk. If you have a reproducable test > case, please file it with an issue on the tracker and add me to the nosy list. Sorry, my fault. Please ignore that part. 

Stefan

From g.brandl at gmx.net Thu Jan 30 11:14:01 2014
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 30 Jan 2014 11:14:01 +0100
Subject: [Python-Dev] [RELEASED] Python 3.3.4 release candidate 1
In-Reply-To: <lcbjpi$436$2@ger.gmane.org>
References: <52E60C8E.1050908@python.org>
 <CAL3CFcV4ALwveAN4jKkWDOxo2eU3ZQCOO9UG4GC9RV02Z9FXzg@mail.gmail.com>
 <lcbjpi$436$2@ger.gmane.org>
Message-ID: <lcd8jb$38o$1@ger.gmane.org>

On 29.01.2014 20:12, Serhiy Storchaka wrote:
> 29.01.14 18:55, Andrew Svetlov wrote:
>> Would you to accept fixes for http://bugs.python.org/issue20434 and
>> http://bugs.python.org/issue20437 before 3.3.4 final?
>
> And http://bugs.python.org/issue20440.

No, sorry; these bugs are not regressions in 3.3.4 and have been there
at least the whole 3.3 line (I think), so their fixes might do more
harm than good if *they* are faulty. They are fine for 3.3.5 which is
not too far away anyway.

Georg

From greg at krypto.org Thu Jan 30 19:11:48 2014
From: greg at krypto.org (Gregory P. Smith)
Date: Thu, 30 Jan 2014 10:11:48 -0800
Subject: [Python-Dev] version numbers mismatched in google search results.
In-Reply-To: <1390714471.32259.75378393.3A5AC458@webmail.messagingengine.com>
References: <CALyJZZW03Zz=H2_PWD_250xn4w_ze-XRa=yk-+s+kCMX4ybwoA@mail.gmail.com>
 <1390666354.4886.75222561.1A3A841F@webmail.messagingengine.com>
 <CALyJZZXBfwopr6SR0iJ_bn=FGGD53wKucbcEeSCWsNswoGUb5w@mail.gmail.com>
 <1390676730.5741.75263769.6502E7A9@webmail.messagingengine.com>
 <CADiSq7fe3ysPagR-VJPUAQTWayqNLgLJ2ku8FG6SnNHb2kD8Ag@mail.gmail.com>
 <1390714471.32259.75378393.3A5AC458@webmail.messagingengine.com>
Message-ID: <CAGE7PNKkXN=eBecpooi1Laz+wBb43D-qi2qCHTFRmW-akUoYvA@mail.gmail.com>

I also get search results with Python 1.5.0p2 showing up.

Search for PyArg_ParseTuple. The first result is a URL with /2/ in it
whose search result title says "3.3.3" but opening it is the correct
2.x documentation. The second result is the ancient Python 1.5.0 docs.
;) Should the ancient /release/ docs have redirects set up or be somehow marked as no crawl? http://docs.python.org/release/1.5.2p2/ext/parseTuple.html is the humorous result in this case. "I want to know how the API I'm using behaved 15 years ago!", said no one ever. -gps On Sat, Jan 25, 2014 at 9:34 PM, Benjamin Peterson <benjamin at python.org> wrote: > > > On Sat, Jan 25, 2014, at 07:04 PM, Nick Coghlan wrote: > > Which suggests that the Google web crawler *is* spidering the dev > > docs, which we generally don't want :P > > I've now added a robots.txt to disallow crawling /dev. From arigo at tunes.org Thu Jan 30 22:49:31 2014 From: arigo at tunes.org (Armin Rigo) Date: Thu, 30 Jan 2014 22:49:31 +0100 Subject: [Python-Dev] version numbers mismatched in google search results. In-Reply-To: <lc0ogt$mo2$1@ger.gmane.org> References: <CALyJZZW03Zz=H2_PWD_250xn4w_ze-XRa=yk-+s+kCMX4ybwoA@mail.gmail.com> <1390666354.4886.75222561.1A3A841F@webmail.messagingengine.com> <lc0ogt$mo2$1@ger.gmane.org> Message-ID: <CAMSv6X218i4dxWSRQ9w1_mMaH-+6+JU2Na_93Khuv51Nx93t5w@mail.gmail.com> Hi, On 25 January 2014 17:26, Georg Brandl <g.brandl at gmx.net> wrote: > Yep, and the URLs without version never served Python 3 docs as far as I can > remember, so I don't know where Google has these <title>s from. My guess would be that it's the title of the page that we (now) get from the URL http://docs.python.org/ . Only my 2 cents, but this "bug" as reported by Vincent Davis might be worth a workaround... A bientôt, Armin.
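For reference, the robots.txt approach discussed in this thread (disallowing /dev, and possibly /release) can be checked locally with Python's standard urllib.robotparser module. This is only a minimal sketch: the actual contents of the docs.python.org robots.txt are not shown in the thread, so the rules below, the helper name is_crawlable, and the "Googlebot" agent string are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the thread only says that /dev (and, as a suggestion,
# /release) should not be crawled; the real file's contents are not shown.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /dev
Disallow: /release
"""


def is_crawlable(url, robots_txt=ASSUMED_ROBOTS_TXT, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://docs.python.org/dev/library/collections",
        "http://docs.python.org/release/1.5.2p2/ext/parseTuple.html",
        "http://docs.python.org/2/library/collections.html",
    ):
        print(url, "->", "crawlable" if is_crawlable(url) else "blocked")
```

Note that robots.txt rules are path prefixes, so "Disallow: /dev" would also cover a path such as /developer; a trailing slash ("/dev/") would be narrower. A robots.txt rule also only stops future crawling; it does not by itself correct titles that a search engine has already indexed.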
From benjamin at python.org Thu Jan 30 23:02:39 2014 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 30 Jan 2014 14:02:39 -0800 Subject: [Python-Dev] version numbers mismatched in google search results. In-Reply-To: <CAGE7PNKkXN=eBecpooi1Laz+wBb43D-qi2qCHTFRmW-akUoYvA@mail.gmail.com> References: <CALyJZZW03Zz=H2_PWD_250xn4w_ze-XRa=yk-+s+kCMX4ybwoA@mail.gmail.com> <1390666354.4886.75222561.1A3A841F@webmail.messagingengine.com> <CALyJZZXBfwopr6SR0iJ_bn=FGGD53wKucbcEeSCWsNswoGUb5w@mail.gmail.com> <1390676730.5741.75263769.6502E7A9@webmail.messagingengine.com> <CADiSq7fe3ysPagR-VJPUAQTWayqNLgLJ2ku8FG6SnNHb2kD8Ag@mail.gmail.com> <1390714471.32259.75378393.3A5AC458@webmail.messagingengine.com> <CAGE7PNKkXN=eBecpooi1Laz+wBb43D-qi2qCHTFRmW-akUoYvA@mail.gmail.com> Message-ID: <1391119359.16934.77428409.180B463C@webmail.messagingengine.com> On Thu, Jan 30, 2014, at 10:11 AM, Gregory P. Smith wrote: > I also get search results with Python 1.5.0p2 showing up. > > Search for PyArg_ParseTuple. The first result is a URL with /2/ in it > who's > search result title says "3.3.3" but opening it is the correct 2.x > documentation. The second result is the ancient Python 1.5.0 docs. ;) > > Should the ancient /release/ docs have redirects setup or be somehow > marked > as no crawl? http://docs.python.org/release/1.5.2p2/ext/parseTuple.html > is > the humorous result in this case. I've now added /release to robots.txt. From vincent at vincentdavis.net Fri Jan 31 15:23:57 2014 From: vincent at vincentdavis.net (Vincent Davis) Date: Fri, 31 Jan 2014 07:23:57 -0700 Subject: [Python-Dev] version numbers mismatched in google search results. 
In-Reply-To: <CAKhrM53WYAysLEhZfJrNbrXPau9yvOgtKbcTHzJf9=n26RtTPA@mail.gmail.com> References: <CALyJZZW03Zz=H2_PWD_250xn4w_ze-XRa=yk-+s+kCMX4ybwoA@mail.gmail.com> <CAKhrM53WYAysLEhZfJrNbrXPau9yvOgtKbcTHzJf9=n26RtTPA@mail.gmail.com> Message-ID: <CALyJZZWRjVg6ZUVfztwfMYqzPRV0g3+H9yBQXjy3GTrD2SvO0w@mail.gmail.com> On Fri, Jan 31, 2014 at 5:41 AM, Rick Boyce <rickboyce at gmail.com> wrote: > 28.3. builtins -- Built-in objects -- Python 3.3.3 documentation -> > https://docs.python.org/3/library/builtins.html > I can't get the https link to work. Does python.org support https? It should :-) Vincent Davis 720-301-3003 From vincent at vincentdavis.net Fri Jan 31 15:22:29 2014 From: vincent at vincentdavis.net (Vincent Davis) Date: Fri, 31 Jan 2014 07:22:29 -0700 Subject: [Python-Dev] version numbers mismatched in google search results. In-Reply-To: <CAKhrM53WYAysLEhZfJrNbrXPau9yvOgtKbcTHzJf9=n26RtTPA@mail.gmail.com> References: <CALyJZZW03Zz=H2_PWD_250xn4w_ze-XRa=yk-+s+kCMX4ybwoA@mail.gmail.com> <CAKhrM53WYAysLEhZfJrNbrXPau9yvOgtKbcTHzJf9=n26RtTPA@mail.gmail.com> Message-ID: <CALyJZZXw34aw=3Zz86ofQ25dyR7yBer+ekxQag4Au2ndrzMDmg@mail.gmail.com> As I understand it, http://docs.python.org/library redirects to http://docs.python.org/2/library so that old links, blog posts, and the like that exist in the world and originally referenced Python 2 will still work, since they pointed to http://docs.python.org/library (no version number). Is this correct? At some point http://docs.python.org/library should stop working.
I would consider adding release numbers, i.e. http://docs.python.org/3.4/library All this makes me think it would be cool to have a DIFF button on documentation pages that would show a diff between version numbers, i.e. when I read a blog post about X and follow the link in the post to doc version a.b, I can see a quick diff of how those docs compare to version c.d, which I am using. I think http://docs.python.org should be a landing page, not forwarded to the current version's docs. Maybe something like http://www.python.org/doc/versions/ although I think that should be here: http://docs.python.org/versions/ Vincent Davis 720-301-3003 On Fri, Jan 31, 2014 at 5:41 AM, Rick Boyce <rickboyce at gmail.com> wrote: > I get caught out a lot by the titles Google is showing for pages quite > often too, but as far I can tell they are not related to the /dev docs. > > If I Google "python builtins" the top 3 results, for me, are as follows: > > 2. Built-in Functions -- Python v2.7.6 documentation -> > http://docs.python.org/library/functions.html > 28.3. builtins -- Built-in objects -- Python 3.3.3 documentation -> > https://docs.python.org/3/library/builtins.html > Built-in objects - Python 3.3.3 documentation -> > http://docs.python.org/library/__builtin__.html > > The top two are fine, but the last one is a Python 2 docs page but Google > shows the title as being for 3.3. This seems to be really common when > googling for python docs. > > At least the design of the two versions is different enough that you spot > it immediately, but it happens often enough to be confusing all the same! > > Rick > > > On 25 January 2014 13:47, Vincent Davis <vincent at vincentdavis.net> wrote: > >> When I do a google search the version numbers are mismatched with the >> linked page (or redirected). >> For example search for "python counter" I get the following results.
>> (see attachment) >> It seems like the website is redirecting incorrectly. >> >> >> 1. collections - Python 3.3.3 documentation<http://docs.python.org/library/collections.html> >> 1. links to http://docs.python.org/library/collections.html >> 2. redirects to http://docs.python.org/2/library/collections.html >> 3. Which is python 2.7.6 >> 2. itertools - Python 3.3.3 documentation<http://docs.python.org/library/itertools.html> >> 1. links to http://docs.python.org/library/itertools.html >> 2. redirects to http://docs.python.org/2/library/itertools.html >> 3. Which is again 2.7.6 >> 3. 8.3. collections ? Container datatypes - Python 3.3.3 documentation<http://docs.python.org/dev/library/collections> >> 1. This one seems correct, 3.40b2 >> 2. links to http://docs.python.org/dev/library/collections >> >> The link to addresses are not really true, they look more like: >> >> https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCcQFjAA&url=http%3A%2F%2Fdocs.python.org%2Flibrary%2Fcollections.html&ei=k7vjUqPrHM_jsAS-m4G4Cw&usg=AFQjCNFTyb_RHzPdorBGavEIR_ekNn_AFA&sig2=yW6S02oUEfioUot11lTAlQ&bvm=bv.59930103,d.cWc >> >> Vincent Davis >> >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> https://mail.python.org/mailman/options/python-dev/rickboyce%40gmail.com >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20140131/6709ebce/attachment.html> From rickboyce at gmail.com Fri Jan 31 13:41:48 2014 From: rickboyce at gmail.com (Rick Boyce) Date: Fri, 31 Jan 2014 12:41:48 +0000 Subject: [Python-Dev] version numbers mismatched in google search results. 
In-Reply-To: <CALyJZZW03Zz=H2_PWD_250xn4w_ze-XRa=yk-+s+kCMX4ybwoA@mail.gmail.com> References: <CALyJZZW03Zz=H2_PWD_250xn4w_ze-XRa=yk-+s+kCMX4ybwoA@mail.gmail.com> Message-ID: <CAKhrM53WYAysLEhZfJrNbrXPau9yvOgtKbcTHzJf9=n26RtTPA@mail.gmail.com> I quite often get caught out by the titles Google shows for pages too, but as far as I can tell they are not related to the /dev docs. If I Google "python builtins" the top 3 results, for me, are as follows: 2. Built-in Functions -- Python v2.7.6 documentation -> http://docs.python.org/library/functions.html 28.3. builtins -- Built-in objects -- Python 3.3.3 documentation -> https://docs.python.org/3/library/builtins.html Built-in objects - Python 3.3.3 documentation -> http://docs.python.org/library/__builtin__.html The top two are fine; the last one is a Python 2 docs page, but Google shows the title as being for 3.3. This seems to be really common when googling for Python docs. At least the design of the two versions is different enough that you spot it immediately, but it happens often enough to be confusing all the same! Rick On 25 January 2014 13:47, Vincent Davis <vincent at vincentdavis.net> wrote: > When I do a google search the version numbers are mismatched with the > linked page (or redirected). > For example search for "python counter" I get the following results. (see > attachment) > It seems like the website is redirecting incorrectly. > > > 1. collections - Python 3.3.3 documentation<http://docs.python.org/library/collections.html> > 1. links to http://docs.python.org/library/collections.html > 2. redirects to http://docs.python.org/2/library/collections.html > 3. Which is python 2.7.6 > 2. itertools - Python 3.3.3 documentation<http://docs.python.org/library/itertools.html> > 1. links to http://docs.python.org/library/itertools.html > 2. redirects to http://docs.python.org/2/library/itertools.html > 3. Which is again 2.7.6 > 3. 8.3.
collections -- Container datatypes - Python 3.3.3 documentation<http://docs.python.org/dev/library/collections> > 1. This one seems correct, 3.40b2 > 2. links to http://docs.python.org/dev/library/collections > > The link to addresses are not really true, they look more like: > > https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCcQFjAA&url=http%3A%2F%2Fdocs.python.org%2Flibrary%2Fcollections.html&ei=k7vjUqPrHM_jsAS-m4G4Cw&usg=AFQjCNFTyb_RHzPdorBGavEIR_ekNn_AFA&sig2=yW6S02oUEfioUot11lTAlQ&bvm=bv.59930103,d.cWc > > Vincent Davis From rickboyce at gmail.com Fri Jan 31 16:52:59 2014 From: rickboyce at gmail.com (Rick Boyce) Date: Fri, 31 Jan 2014 15:52:59 +0000 Subject: [Python-Dev] version numbers mismatched in google search results. In-Reply-To: <CALyJZZWRjVg6ZUVfztwfMYqzPRV0g3+H9yBQXjy3GTrD2SvO0w@mail.gmail.com> References: <CALyJZZW03Zz=H2_PWD_250xn4w_ze-XRa=yk-+s+kCMX4ybwoA@mail.gmail.com> <CAKhrM53WYAysLEhZfJrNbrXPau9yvOgtKbcTHzJf9=n26RtTPA@mail.gmail.com> <CALyJZZWRjVg6ZUVfztwfMYqzPRV0g3+H9yBQXjy3GTrD2SvO0w@mail.gmail.com> Message-ID: <CAKhrM52PTs9amL2HLbcrkea-tPZ9aHepv2SAgSs4MC4P3cvP-g@mail.gmail.com> > > On 31 January 2014 14:23, Vincent Davis <vincent at vincentdavis.net> wrote: > >> >> On Fri, Jan 31, 2014 at 5:41 AM, Rick Boyce <rickboyce at gmail.com> wrote: >> >>> 28.3. builtins -- Built-in objects -- Python 3.3.3 documentation -> >>> https://docs.python.org/3/library/builtins.html >> >> >> I can't get the https link >> to work.
Does python.org support https? >> it should :-) >> > My mistake - I must have added the extra s when I de-googleized the URL. From status at bugs.python.org Fri Jan 31 18:07:48 2014 From: status at bugs.python.org (Python tracker) Date: Fri, 31 Jan 2014 18:07:48 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20140131170748.96AA8560C8@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2014-01-24 - 2014-01-31) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 4484 ( +9) closed 27748 (+70) total 32232 (+79) Open issues with patches: 2036 Issues opened (58) ================== #3158: Doctest fails to find doctests in extension modules http://bugs.python.org/issue3158 reopened by zach.ware #20383: Add a keyword-only spec argument to types.ModuleType http://bugs.python.org/issue20383 opened by brett.cannon #20384: os.open() exception doesn't contain file name on Windows http://bugs.python.org/issue20384 opened by serhiy.storchaka #20386: socket.SocketType enum overwrites import of _socket.SocketType http://bugs.python.org/issue20386 opened by giampaolo.rodola #20387: tokenize/untokenize roundtrip fails with tabs http://bugs.python.org/issue20387 opened by jason.coombs #20389: clarify meaning of xbar and mu in pvariance/variance of statis http://bugs.python.org/issue20389 opened by jtaylor #20391: windows python launcher should support explicit 64-bit version http://bugs.python.org/issue20391 opened by theller #20392: Inconsistency with uppercase file extensions in MimeTypes.gues http://bugs.python.org/issue20392 opened by rodrigo.parra #20393: Docs: mark deprecated items in the TOC
http://bugs.python.org/issue20393 opened by zearin #20394: Coverity complains on audioop http://bugs.python.org/issue20394 opened by serhiy.storchaka #20396: Argument Clinic: Touch source file if any output file changed http://bugs.python.org/issue20396 opened by larry #20397: distutils --record option does not validate existance of byte- http://bugs.python.org/issue20397 opened by marcusva #20399: Comparison of memoryview http://bugs.python.org/issue20399 opened by fin.swimmer #20400: Add create_read_pipe_protocol/create_write_pipe_protocol to as http://bugs.python.org/issue20400 opened by haypo #20402: List comprehensions should be noted in for loop documentation http://bugs.python.org/issue20402 opened by robla #20403: Idle options dialog: add help http://bugs.python.org/issue20403 opened by terry.reedy #20404: Delayed exception using non-text encodings with TextIOWrapper http://bugs.python.org/issue20404 opened by ncoghlan #20405: Add io.BinaryTransformWrapper and a "transform" parameter to o http://bugs.python.org/issue20405 opened by ncoghlan #20406: Use application icon for IDLE http://bugs.python.org/issue20406 opened by serhiy.storchaka #20408: memoryview() constructor documentation error http://bugs.python.org/issue20408 opened by larry #20410: Argument Clinic: add 'self' return converter http://bugs.python.org/issue20410 opened by zach.ware #20412: Enum and IntEnum classes are not defined in the documentation http://bugs.python.org/issue20412 opened by ethan.furman #20413: Errors in documentation of standard codec error handlers http://bugs.python.org/issue20413 opened by RalfM #20414: Python 3.4 has two Overlapped types http://bugs.python.org/issue20414 opened by haypo #20415: Could method "isinstance" take a list as parameter? 
http://bugs.python.org/issue20415 opened by Chen.ZHANG #20416: Marshal: special case int and float, don't use references http://bugs.python.org/issue20416 opened by haypo #20417: ensurepip should not be installed with --without-ensurepip http://bugs.python.org/issue20417 opened by Arfrever #20420: BufferedIncrementalEncoder violates IncrementalEncoder interfa http://bugs.python.org/issue20420 opened by serhiy.storchaka #20421: expose SSL socket protocol version http://bugs.python.org/issue20421 opened by pitrou #20423: io.StringIO newline param has wrong default http://bugs.python.org/issue20423 opened by couplewavylines #20426: Compiling a regex with re.DEBUG should force a recompile http://bugs.python.org/issue20426 opened by leewz #20428: _Py_open does not work with O_CREAT http://bugs.python.org/issue20428 opened by alexandre.vassalotti #20429: 3.3.4rc1 install deleted Windows taskbar icons http://bugs.python.org/issue20429 opened by terry.reedy #20430: Make argparse.SUPPRESS work as an argument "dest" http://bugs.python.org/issue20430 opened by jzwinck #20431: Should posix functions that accept fd also accept objects with http://bugs.python.org/issue20431 opened by larry #20432: Argument Clinic: when cloning functions with path_t, path_t re http://bugs.python.org/issue20432 opened by larry #20433: add aliasedname() and namedaliases() methods to unicodedata mo http://bugs.python.org/issue20433 opened by jamadagni #20434: Process crashes if not enough memory to import module http://bugs.python.org/issue20434 opened by qualab #20435: Discrepancy between io.StringIO and _pyio.StringIO with univer http://bugs.python.org/issue20435 opened by serhiy.storchaka #20437: Use Py_CLEAR to safe clear attributes http://bugs.python.org/issue20437 opened by serhiy.storchaka #20438: inspect: Deprecate getfullargspec? 
http://bugs.python.org/issue20438 opened by yselivanov #20439: inspect.Signature: Add Signature.format method to match format http://bugs.python.org/issue20439 opened by yselivanov #20440: Use Py_REPLACE/Py_XREPLACE macros http://bugs.python.org/issue20440 opened by serhiy.storchaka #20443: __code__. co_filename should always be an absolute path http://bugs.python.org/issue20443 opened by yselivanov #20444: Reduce logging.config.Converting duplication of code http://bugs.python.org/issue20444 opened by dongwm #20445: HAVE_BROKEN_NICE detected incorrectly due to configure.ac typo http://bugs.python.org/issue20445 opened by George.Kouryachy #20446: ipaddress: hash similarities for ipv4 and ipv6 http://bugs.python.org/issue20446 opened by flambda #20447: doctest.debug_script: insecure use of /tmp http://bugs.python.org/issue20447 opened by jwilk #20450: hg touch fails on System Z Linux buildbot http://bugs.python.org/issue20450 opened by serhiy.storchaka #20451: os.exec* mangles argv on windows (splits on spaces, etc) http://bugs.python.org/issue20451 opened by The Compiler #20452: test_timeout_rounding() of test_asyncio fails on "x86 Ubuntu S http://bugs.python.org/issue20452 opened by haypo #20453: json.load() error message changed in 3.4 http://bugs.python.org/issue20453 opened by barry #20454: platform.linux_distribution() returns empty value on Archlinux http://bugs.python.org/issue20454 opened by fmoreau #20455: test_asyncio hangs on Windows http://bugs.python.org/issue20455 opened by haypo #20456: Argument Clinic rollup patch, 2014/01/31 http://bugs.python.org/issue20456 opened by larry #20457: Use partition and enumerate make getopt easier http://bugs.python.org/issue20457 opened by dongwm #20459: No Argument Clinic documentation on how to specify a return co http://bugs.python.org/issue20459 opened by brett.cannon #20460: Wrong markup in c-api/arg.rst http://bugs.python.org/issue20460 opened by OSAMU.NAKAMURA Most recent 15 issues with no replies (15) 
========================================== #20460: Wrong markup in c-api/arg.rst http://bugs.python.org/issue20460 #20459: No Argument Clinic documentation on how to specify a return co http://bugs.python.org/issue20459 #20457: Use partition and enumerate make getopt easier http://bugs.python.org/issue20457 #20456: Argument Clinic rollup patch, 2014/01/31 http://bugs.python.org/issue20456 #20454: platform.linux_distribution() returns empty value on Archlinux http://bugs.python.org/issue20454 #20451: os.exec* mangles argv on windows (splits on spaces, etc) http://bugs.python.org/issue20451 #20450: hg touch fails on System Z Linux buildbot http://bugs.python.org/issue20450 #20447: doctest.debug_script: insecure use of /tmp http://bugs.python.org/issue20447 #20446: ipaddress: hash similarities for ipv4 and ipv6 http://bugs.python.org/issue20446 #20443: __code__. co_filename should always be an absolute path http://bugs.python.org/issue20443 #20435: Discrepancy between io.StringIO and _pyio.StringIO with univer http://bugs.python.org/issue20435 #20432: Argument Clinic: when cloning functions with path_t, path_t re http://bugs.python.org/issue20432 #20430: Make argparse.SUPPRESS work as an argument "dest" http://bugs.python.org/issue20430 #20429: 3.3.4rc1 install deleted Windows taskbar icons http://bugs.python.org/issue20429 #20421: expose SSL socket protocol version http://bugs.python.org/issue20421 Most recent 15 issues waiting for review (15) ============================================= #20460: Wrong markup in c-api/arg.rst http://bugs.python.org/issue20460 #20457: Use partition and enumerate make getopt easier http://bugs.python.org/issue20457 #20456: Argument Clinic rollup patch, 2014/01/31 http://bugs.python.org/issue20456 #20452: test_timeout_rounding() of test_asyncio fails on "x86 Ubuntu S http://bugs.python.org/issue20452 #20444: Reduce logging.config.Converting duplication of code http://bugs.python.org/issue20444 #20440: Use Py_REPLACE/Py_XREPLACE macros 
http://bugs.python.org/issue20440 #20437: Use Py_CLEAR to safe clear attributes http://bugs.python.org/issue20437 #20434: Process crashes if not enough memory to import module http://bugs.python.org/issue20434 #20417: ensurepip should not be installed with --without-ensurepip http://bugs.python.org/issue20417 #20416: Marshal: special case int and float, don't use references http://bugs.python.org/issue20416 #20414: Python 3.4 has two Overlapped types http://bugs.python.org/issue20414 #20406: Use application icon for IDLE http://bugs.python.org/issue20406 #20404: Delayed exception using non-text encodings with TextIOWrapper http://bugs.python.org/issue20404 #20394: Coverity complains on audioop http://bugs.python.org/issue20394 #20392: Inconsistency with uppercase file extensions in MimeTypes.gues http://bugs.python.org/issue20392 Top 10 most discussed issues (10) ================================= #20386: socket.SocketType enum overwrites import of _socket.SocketType http://bugs.python.org/issue20386 19 msgs #20311: epoll.poll(timeout) and PollSelector.select(timeout) must roun http://bugs.python.org/issue20311 18 msgs #20452: test_timeout_rounding() of test_asyncio fails on "x86 Ubuntu S http://bugs.python.org/issue20452 17 msgs #19081: zipimport behaves badly when the zip file changes while the pr http://bugs.python.org/issue19081 12 msgs #20414: Python 3.4 has two Overlapped types http://bugs.python.org/issue20414 12 msgs #19145: Inconsistent behaviour in itertools.repeat when using negative http://bugs.python.org/issue19145 11 msgs #20406: Use application icon for IDLE http://bugs.python.org/issue20406 10 msgs #20416: Marshal: special case int and float, don't use references http://bugs.python.org/issue20416 9 msgs #15216: Support setting the encoding on a text stream after creation http://bugs.python.org/issue15216 8 msgs #17162: Py_LIMITED_API needs a PyType_GenericDealloc http://bugs.python.org/issue17162 8 msgs Issues closed (70) ================== #8260: 
When I use codecs.open(...) and f.readline() follow up by f.re http://bugs.python.org/issue8260 closed by serhiy.storchaka #8639: Allow callable objects in inspect.getfullargspec http://bugs.python.org/issue8639 closed by yselivanov #12704: Language Reference: Clarify behaviour of yield when generator http://bugs.python.org/issue12704 closed by python-dev #15189: tkinter.messagebox does not use the application's icon http://bugs.python.org/issue15189 closed by terry.reedy #15869: IDLE: Include .desktop file and icon http://bugs.python.org/issue15869 closed by terry.reedy #15931: inspect.findsource fails after directory change http://bugs.python.org/issue15931 closed by yselivanov #16490: "inspect.getargspec()" and "inspect.getcallargs()" don't work http://bugs.python.org/issue16490 closed by yselivanov #17432: PyUnicode_ functions not accessible in Limited API on Windows http://bugs.python.org/issue17432 closed by loewis #17481: inspect.getfullargspec should use __signature__ http://bugs.python.org/issue17481 closed by yselivanov #17721: Help button on preference window doesn't work http://bugs.python.org/issue17721 closed by terry.reedy #17727: document that some distributions change site.py defaults http://bugs.python.org/issue17727 closed by georg.brandl #19077: More robust TemporaryDirectory cleanup http://bugs.python.org/issue19077 closed by serhiy.storchaka #19140: inspect.Signature.bind() inaccuracies http://bugs.python.org/issue19140 closed by yselivanov #19363: Python 2.7's future_builtins.map is not compatible with Python http://bugs.python.org/issue19363 closed by eric.smith #19456: ntpath doesn't join paths correctly when a drive is present http://bugs.python.org/issue19456 closed by serhiy.storchaka #19618: test_sysconfig_module fails on Ubuntu 12.04 http://bugs.python.org/issue19618 closed by berker.peksag #19658: inspect.getsource weird case http://bugs.python.org/issue19658 closed by yselivanov #19944: Make importlib.find_spec load packages as 
needed http://bugs.python.org/issue19944 closed by eric.snow #19966: Wrong mtimes of Include/Python-ast.h and Python/Python-ast.c i http://bugs.python.org/issue19966 closed by python-dev #19990: Add unittests for imghdr module http://bugs.python.org/issue19990 closed by serhiy.storchaka #20011: Changing the signature for Parameter's constructor http://bugs.python.org/issue20011 closed by yselivanov #20075: help(open) eats first line http://bugs.python.org/issue20075 closed by zach.ware #20105: Codec exception chaining is losing traceback details http://bugs.python.org/issue20105 closed by python-dev #20133: Derby: Convert the audioop module to use Argument Clinic http://bugs.python.org/issue20133 closed by serhiy.storchaka #20151: Derby: Convert the binascii module to use Argument Clinic http://bugs.python.org/issue20151 closed by serhiy.storchaka #20166: window x64 c-extensions not works on python3.4.0b2 http://bugs.python.org/issue20166 closed by skrah #20193: Derby: Convert the zlib, _bz2 and _lzma modules to use Argumen http://bugs.python.org/issue20193 closed by serhiy.storchaka #20209: Deprecate PROTOCOL_SSLv2 http://bugs.python.org/issue20209 closed by pitrou #20223: inspect.signature does not support new functools.partialmethod http://bugs.python.org/issue20223 closed by yselivanov #20231: Argument Clinic accepts no-default args after default args http://bugs.python.org/issue20231 closed by larry #20308: inspect.Signature doesn't support user classes without __init_ http://bugs.python.org/issue20308 closed by yselivanov #20317: ExitStack hang if enough nested exceptions http://bugs.python.org/issue20317 closed by ncoghlan #20320: select.select(timeout) and select.kqueue.control(timeout) must http://bugs.python.org/issue20320 closed by haypo #20322: Upgrade ensurepip's pip and setuptools http://bugs.python.org/issue20322 closed by dstufft #20325: Argument Clinic: self converters are not preserved when clonin http://bugs.python.org/issue20325 closed by larry 
#20326: Argument Clinic should use a non-error-prone syntax to mark te http://bugs.python.org/issue20326 closed by larry #20330: PEP 342 is outdated http://bugs.python.org/issue20330 closed by ncoghlan #20331: Fix various fd leaks http://bugs.python.org/issue20331 closed by serhiy.storchaka #20338: Idle: increase max calltip width http://bugs.python.org/issue20338 closed by terry.reedy #20348: Argument Clinic HOWTO listed multiple times in HOWTO index http://bugs.python.org/issue20348 closed by ezio.melotti #20349: Argument Clinic: error on __new__ or __init__ with no argument http://bugs.python.org/issue20349 closed by taleinat #20356: fix formatting of positional-only parameters in inspect.Signat http://bugs.python.org/issue20356 closed by yselivanov #20358: test_curses is failing on Ubuntu 13.10 http://bugs.python.org/issue20358 closed by larry #20367: concurrent.futures.as_completed() fails when given duplicate F http://bugs.python.org/issue20367 closed by gvanrossum #20372: inspect.getfile should raise a TypeError if C object does not http://bugs.python.org/issue20372 closed by yselivanov #20373: Use test.script_helper.assert_python_ok() instead of subproces http://bugs.python.org/issue20373 closed by pitrou #20376: Argument Clinic: backslashes in docstrings are not escaped http://bugs.python.org/issue20376 closed by zach.ware #20381: Argument Clinic: expression default arguments broken http://bugs.python.org/issue20381 closed by zach.ware #20382: Typo in cStringIO tp_name http://bugs.python.org/issue20382 closed by haypo #20385: Argument Clinic: Support for __new__ not checking _PyArg_NoKey http://bugs.python.org/issue20385 closed by taleinat #20388: Argument Clinic doesn't handle module level functions with mod http://bugs.python.org/issue20388 closed by zach.ware #20390: Argument Clinic rollup patch, 2014/01/25 http://bugs.python.org/issue20390 closed by larry #20395: Extract generated clinic code in Modules/_pickle.c to separate 
http://bugs.python.org/issue20395 closed by serhiy.storchaka #20398: stem crashes python http://bugs.python.org/issue20398 closed by pitrou #20401: inspect.signature removes initial starred method params (bug) http://bugs.python.org/issue20401 closed by yselivanov #20407: heapq.nsmallest and heapq.nlargest don't accept a "key" parame http://bugs.python.org/issue20407 closed by rhettinger #20409: .readline() returned garble text http://bugs.python.org/issue20409 closed by r.david.murray #20411: IndexError in sys.__interactivehook__ with pyreadline installe http://bugs.python.org/issue20411 closed by jason.coombs #20418: socket.getaddrinfo fails for hostname that is all digits 0-9 http://bugs.python.org/issue20418 closed by neologix #20419: it's not possible to set ECDH curve name via ssl.wrap_socket http://bugs.python.org/issue20419 closed by pitrou #20422: Signature.from_builtin should raise a ValueError when no signa http://bugs.python.org/issue20422 closed by yselivanov #20424: _pyio.StringIO doesn't work with lone surrogates http://bugs.python.org/issue20424 closed by serhiy.storchaka #20425: inspect.Signature should work on decorated builtins http://bugs.python.org/issue20425 closed by yselivanov #20427: inspect.Signature should ensure that non-default params don't http://bugs.python.org/issue20427 closed by yselivanov #20436: test.regrtest is no more importable in 2.7 http://bugs.python.org/issue20436 closed by serhiy.storchaka #20441: Test_tcl.TclTest.test_split(list) failures on Windows, 2.7. 
http://bugs.python.org/issue20441 closed by serhiy.storchaka #20442: inspect: Document Signature & Parameter constructors' signatur http://bugs.python.org/issue20442 closed by yselivanov #20448: Adds missing backslash to devguide setup page http://bugs.python.org/issue20448 closed by sahutd #20449: _overlapped: read_buffer and write_buffer are misused http://bugs.python.org/issue20449 closed by haypo #20458: ``clinic.py --converters`` raises an exception http://bugs.python.org/issue20458 closed by zach.ware