From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Aug 1 01:46:12 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Mon, 1 Aug 2016 14:46:12 +0900 Subject: [Python-ideas] size of the installation of Python on mobile devices In-Reply-To: References: <8d577973-eaaf-21cf-3604-8ffd28773ba4@gmail.com> <579A49F8.9080708@egenix.com> <7bb89d4a-3930-d2fd-9736-2360008f561e@gmail.com> <22428.22132.611219.145348@turnbull.sk.tsukuba.ac.jp> Message-ID: <22430.57892.461912.677961@turnbull.sk.tsukuba.ac.jp> Victor Stinner writes: > Xavier is a core developer. He is free to dedicate his time to > supporting sourceless distribution :-) So are we all, core or not. But on Nick's terms (he even envisions releases with the "sourceless" build broken), I don't think adding it to core is fair to Xavier's (and others') efforts in this direction. It would also set an unfortunate precedent. Steve From storchaka at gmail.com Mon Aug 1 04:20:30 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 1 Aug 2016 11:20:30 +0300 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> Message-ID: On 31.07.16 22:38, Victor Stinner wrote: > I dislike this API. What's the point of calling clamp(x)? clamp(b, a) is > min(a, b) and clamp(a, max_val=b) is just max(a, b). My point is that > all parameters must be mandatory. Seconded. From p1himik at gmail.com Mon Aug 1 06:05:05 2016 From: p1himik at gmail.com (Eugene Pakhomov) Date: Mon, 1 Aug 2016 17:05:05 +0700 Subject: [Python-ideas] Making the stdlib consistent again In-Reply-To: References: Message-ID: It was hardly all of the thoughts - or at least they came with little background information. E.g. "multiprocessing and threading" - I can't find any information on it. Was it done in one go or gradually by introducing new aliases and deprecating old names, as has been suggested by you (although it was 7 years ago)?
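The aliasing-plus-deprecation approach under discussion could look roughly like this - a minimal sketch only, using a hypothetical `get_tar_info`/`gettarinfo` pair for illustration; none of this is actual stdlib code:

```python
import functools
import warnings


def deprecated_alias(new_func, old_name):
    """Return a wrapper that forwards to new_func but warns that old_name is deprecated."""
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            "{} is deprecated; use {} instead".format(old_name, new_func.__name__),
            DeprecationWarning,
            stacklevel=2,
        )
        return new_func(*args, **kwargs)
    return wrapper


# Hypothetical module code: the new, PEP 8-style name carries the implementation...
def get_tar_info(path):
    return {"path": path}


# ...and the historical name lives on as a deprecated alias of it.
gettarinfo = deprecated_alias(get_tar_info, "gettarinfo")
```

Existing callers of the old name keep working, but get a `DeprecationWarning` when warnings are enabled, which gives a migration window before any eventual removal.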
The suggestion about aliases+deprecation has been made for other modules, but sadly it ended up with you saying "let's do it this way" - with nothing following. Or "you have to keep the old API around" - what is the exact motivation behind it? If it's "so the code doesn't break", then it's a strange motivation, because as I said, code gets [potentially] broken more often than it doesn't. Every alteration/addition/deletion is a potential code breakage a priori - aliasing+deprecation is in no way more dangerous than, let's say, the new zipapp module. I in no way state that the changes should include a complete overhaul or be done in one go. But maybe we can at least start gradually introducing some naming changes to the oldest, most used and least changed (w.r.t. API) modules? The perfect example, I think, is the collections module, but that's only from personal experience - other modules may be better candidates. I can't say for sure that I do not underestimate the effort required. But what if I want to invest my time in it? If let's say I succeed and do everything according to the developer's guide and prove on a number of the most popular libraries that the changes indeed do not break anything - will the patch be considered or just thrown away with a note "we already discussed it"? Regards, Eugene On Mon, Aug 1, 2016 at 7:39 AM, Guido van Rossum wrote: > Thoughts of the core dev were expressed clearly earlier in this thread. > > On Sun, Jul 31, 2016 at 12:47 PM, Eugene Pakhomov > wrote: > > I'm on Ralph's side here. "Why is this thing named the other way?" was > one > > of the first questions I asked. And people whom I occasionally teach > about > > Python, ask the same question over and over again. > > > > Code breakage happens (PEP 3151 - didn't know about it till it almost > bit my > > leg off), so we can't shy away from it completely. > > Is there any link on the previous thoughts of the core devs on the matter?
> > Especially regarding the amount of potential code breakage. > > I'm genuinely interested, as I think that this amount is negligible if > the > > new names will be gradually introduced along with a deprecation notice on > > (end eventual removal of) the old ones. > > As far as I can see, it can only do some harm if someone uses a > discouraged > > "import *" or monkey-patches some new methods into Python standard > > classes/modules, and updates his Python installation. > > > > Regards, > > Eugene > > > > > > On Mon, Aug 1, 2016 at 1:46 AM, Ralph Broenink > > wrote: > >> > >> I'm a bit sad that I'm clearly on the loosing side of the argument. I > now > >> believe that I must be grossly underestimating the amount of effort and > >> overestimating the potential gain. However, I still feel that we should > >> strive for consistency in the long run. I do not propose to do this at > once, > >> but I feel that at least some collaborated effort would be nice. (If not > >> only for this kind of mail threads.) > >> > >> If I would start an effort to - for instance - 'fix' some camelCased > >> modules, and attempt to make it 100% backwards compatible, including > tests, > >> would there be any chance it could be merged at some point? Otherwise, I > >> feel it would be a totally pointless effort ;). > >> > >> On Tue, 26 Jul 2016 at 18:29 Brett Cannon wrote: > >>> > >>> On Mon, 25 Jul 2016 at 13:03 Mark Mollineaux > > >>> wrote: > >>>> > >>>> I've pined for this (and feel a real mental pain every time I use one > >>>> of those poorlyCased names)-- I end up using a lot of mental space > >>>> remembering exactly HOW each stdlib isn't consistent. > >>>> > >>>> Aliasing consistent names in each case seems like a real win all > >>>> around, personally. > >>> > >>> > >>> For those that want consistent names, you could create a PyPI package > >>> that is nothing more than the aliased names as suggested. 
> >>> > >>> Otherwise I get the desire for consistency, but as pointed out by a > bunch > >>> of other core devs, we have thought about this many times and always > reach > >>> the same conclusion that the amount of work and potential code > breakage is > >>> too great. > >>> > >>> -Brett > >>> > >>>> > >>>> > >>>> On Mon, Jul 25, 2016 at 10:55 AM, Ralph Broenink > >>>> wrote: > >>>> > Hi python-ideas, > >>>> > > >>>> > As you all know, the Python stdlib can sometimes be a bit of an > >>>> > inconsistent > >>>> > mess that can be surprising in how it names things. This is mostly > >>>> > caused by > >>>> > the fact that several modules were developed before the introduction > >>>> > of > >>>> > PEP-8, and now we're stuck with the older naming within these > modules. > >>>> > > >>>> > It has been said and discussed in the past [1][2] that the stdlib is > >>>> > in fact > >>>> > inconsistent, but fixing this has almost always been disregarded as > >>>> > being > >>>> > too painful (after all, we don't want a new Python 3 all over > again). > >>>> > However, this way, we will never move away from these > inconsistencies. > >>>> > Perhaps this is fine, but I think we should at least consider > >>>> > providing > >>>> > function and class names that are unsurprising for developers. > >>>> > > >>>> > While maintaining full backwards compatibility, my idea is that we > >>>> > should > >>>> > offer consistently named aliases in -eventually- all stdlib modules. > >>>> > For > >>>> > instance, with Python 2.6, the threading module received this > >>>> > treatment, but > >>>> > unfortunately this was not expanded to all modules. > >>>> > > >>>> > What am I speaking of precisely? I have done a quick survey of the > >>>> > stdlib > >>>> > and found the following examples. Please note, this is a highly > >>>> > opinionated > >>>> > list; some names may have been chosen with a very good reason, and > >>>> > others > >>>> > are just a matter of taste. 
Hopefully you agree with at least some > of > >>>> > them: > >>>> > > >>>> > * The CamelCasing in some modules are the most obvious culprits, > >>>> > e.g. > >>>> > logging and unittest. There is obviously an issue regarding > subclasses > >>>> > and > >>>> > methods that are supposed to be overridden, but I feel we could make > >>>> > it > >>>> > work. > >>>> > > >>>> > * All lower case class names, such as collections.defaultdict and > >>>> > collections.deque, should be CamelCased. Another example is > datetime, > >>>> > which > >>>> > uses names such as timedelta instead of TimeDelta. > >>>> > > >>>> > * Inconsistent names all together, such as re.sub, which I feel > >>>> > should be > >>>> > re.replace (cf. str.replace). But also re.finditer and re.findall, > but > >>>> > no > >>>> > re.find. > >>>> > > >>>> > * Names that do not reflect actual usage, such as > >>>> > ssl.PROTOCOL_SSLv23, > >>>> > which can in fact not be used as client for SSLv2. > >>>> > > >>>> > * Underscore usage, such as tarfile.TarFile.gettarinfo (should it > >>>> > not be > >>>> > get_tar_info?), http.client.HTTPConnection.getresponse vs > >>>> > set_debuglevel, > >>>> > and pathlib.Path.samefile vs pathlib.Path.read_text. And is it > >>>> > pkgutil.iter_modules or is it pathlib.Path.iterdir (or re.finditer)? > >>>> > > >>>> > * Usage of various abbreviations, such as in filecmp.cmp > >>>> > > >>>> > * Inconsistencies between similar modules, e.g. between > >>>> > tarfile.TarFile.add and zipfile.ZipFile.write. > >>>> > > >>>> > These are just some examples of inconsistent and surprising naming I > >>>> > could > >>>> > find, other categories are probably also conceivable. Another > subject > >>>> > for > >>>> > reconsideration would be attribute and argument names, but I haven't > >>>> > looked > >>>> > for those in my quick survey. 
> >>>> > > >>>> > For all of these inconsistencies, I think we should make a > >>>> > 'consistently' > >>>> > named alternative, and alias the original variant with them (or the > >>>> > other > >>>> > way around), without defining a deprecation timeline for the > original > >>>> > names. > >>>> > This should make it possible to eventually make the stdlib > consistent, > >>>> > Pythonic and unsurprising. > >>>> > > >>>> > What would you think of such an effort? > >>>> > > >>>> > Regards, > >>>> > Ralph Broenink > >>>> > > >>>> > [1] > >>>> > > https://mail.python.org/pipermail/python-ideas/2010-January/006755.html > >>>> > [2] > >>>> > https://mail.python.org/pipermail/python-dev/2009-March/086646.html > >>>> > > >>>> > > >>>> > _______________________________________________ > >>>> > Python-ideas mailing list > >>>> > Python-ideas at python.org > >>>> > https://mail.python.org/mailman/listinfo/python-ideas > >>>> > Code of Conduct: http://python.org/psf/codeofconduct/ > >>>> _______________________________________________ > >>>> Python-ideas mailing list > >>>> Python-ideas at python.org > >>>> https://mail.python.org/mailman/listinfo/python-ideas > >>>> Code of Conduct: http://python.org/psf/codeofconduct/ > >> > >> > >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > >> https://mail.python.org/mailman/listinfo/python-ideas > >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From julien at gns3.net Mon Aug 1 06:41:19 2016 From: julien at gns3.net (Julien Duponchelle) Date: Mon, 01 Aug 2016 10:41:19 +0000 Subject: [Python-ideas] Making the stdlib consistent again In-Reply-To: References: Message-ID: A lot of people will not want to invest time to rewrite their code to use the consistent stdlib because they will not see immediate/enough benefits. This means you will need to keep the aliases forever, creating more confusion for users because they will have two functions doing the same thing. Regards, Julien On Mon, Aug 1, 2016 at 12:05 PM Eugene Pakhomov wrote: > It was hardly all of the thoughts, or at least with little background > information. > > E.g. "multiprocessing and threading" - I can't find any information on it. > Was it done in one go or gradually by introducing new aliases and > deprecating old names, at it has been suggested by you (although it was 7 > years ago)? > > The suggestion about aliases+deprecation has been made for other modules, > but sadly it ended up with you saying "let's do it this way" - with nothing > following. > > Or "you have to keep the old API around" - what is the exact motivation > behind it? If it's "so the code doesn't break", then it's a strange > motivation, because as I said, code gets [potentially] broken more often > that it doesn't. Every alteration/addition/deletion is a potential code > breakage a priory - aliasing+deprecation is in no way more dangerous than, > let's say, new zipapp module. > > I in no way state that the changes should include a complete overhaul or > be done in one go. But maybe we can at least start gradually introducing > some naming changes to the oldest, most used and least changed (w.r.t. API) > modules? > The perfect example I think is collections module, but it's only from the > personal experience - other modules may be better candidates. > > I can't say that I surely do not underestimate the efforts required. But > what if I want to invest my time in it?
If let's say I succeed and do > everything according to the developer's guide and prove on a number of the > most popular libraries that the changes indeed do not break anything - will > be patch be considered or just thrown away with a note "we already > discussed it"? > > Regards, > Eugene > > > On Mon, Aug 1, 2016 at 7:39 AM, Guido van Rossum wrote: > >> Thoughts of the core dev were expressed clearly earlier in this thread. >> >> On Sun, Jul 31, 2016 at 12:47 PM, Eugene Pakhomov >> wrote: >> > I'm on Ralph's side here. "Why is this thing named the other way?" was >> one >> > of the first questions I asked. And people whom I occasionally teach >> about >> > Python, ask the same question over and over again. >> > >> > Code breakage happens (PEP 3151 - didn't know about it till it almost >> bit my >> > leg off), so we can't shy away from it completely. >> > Is there any link on the previous thoughts of the core devs on the >> matter? >> > Especially regarding the amount of potential code breakage. >> > I'm genuinely interested, as I think that this amount is negligible if >> the >> > new names will be gradually introduced along with a deprecation notice >> on >> > (end eventual removal of) the old ones. >> > As far as I can see, it can only do some harm if someone uses a >> discouraged >> > "import *" or monkey-patches some new methods into Python standard >> > classes/modules, and updates his Python installation. >> > >> > Regards, >> > Eugene >> > >> > >> > On Mon, Aug 1, 2016 at 1:46 AM, Ralph Broenink > > >> > wrote: >> >> >> >> I'm a bit sad that I'm clearly on the loosing side of the argument. I >> now >> >> believe that I must be grossly underestimating the amount of effort and >> >> overestimating the potential gain. However, I still feel that we should >> >> strive for consistency in the long run. I do not propose to do this at >> once, >> >> but I feel that at least some collaborated effort would be nice. 
(If >> not >> >> only for this kind of mail threads.) >> >> >> >> If I would start an effort to - for instance - 'fix' some camelCased >> >> modules, and attempt to make it 100% backwards compatible, including >> tests, >> >> would there be any chance it could be merged at some point? Otherwise, >> I >> >> feel it would be a totally pointless effort ;). >> >> >> >> On Tue, 26 Jul 2016 at 18:29 Brett Cannon wrote: >> >>> >> >>> On Mon, 25 Jul 2016 at 13:03 Mark Mollineaux < >> bufordsharkley at gmail.com> >> >>> wrote: >> >>>> >> >>>> I've pined for this (and feel a real mental pain every time I use one >> >>>> of those poorlyCased names)-- I end up using a lot of mental space >> >>>> remembering exactly HOW each stdlib isn't consistent. >> >>>> >> >>>> Aliasing consistent names in each case seems like a real win all >> >>>> around, personally. >> >>> >> >>> >> >>> For those that want consistent names, you could create a PyPI package >> >>> that is nothing more than the aliased names as suggested. >> >>> >> >>> Otherwise I get the desire for consistency, but as pointed out by a >> bunch >> >>> of other core devs, we have thought about this many times and always >> reach >> >>> the same conclusion that the amount of work and potential code >> breakage is >> >>> too great. >> >>> >> >>> -Brett >> >>> >> >>>> >> >>>> >> >>>> On Mon, Jul 25, 2016 at 10:55 AM, Ralph Broenink >> >>>> wrote: >> >>>> > Hi python-ideas, >> >>>> > >> >>>> > As you all know, the Python stdlib can sometimes be a bit of an >> >>>> > inconsistent >> >>>> > mess that can be surprising in how it names things. This is mostly >> >>>> > caused by >> >>>> > the fact that several modules were developed before the >> introduction >> >>>> > of >> >>>> > PEP-8, and now we're stuck with the older naming within these >> modules. 
>> >>>> > >> >>>> > It has been said and discussed in the past [1][2] that the stdlib >> is >> >>>> > in fact >> >>>> > inconsistent, but fixing this has almost always been disregarded as >> >>>> > being >> >>>> > too painful (after all, we don't want a new Python 3 all over >> again). >> >>>> > However, this way, we will never move away from these >> inconsistencies. >> >>>> > Perhaps this is fine, but I think we should at least consider >> >>>> > providing >> >>>> > function and class names that are unsurprising for developers. >> >>>> > >> >>>> > While maintaining full backwards compatibility, my idea is that we >> >>>> > should >> >>>> > offer consistently named aliases in -eventually- all stdlib >> modules. >> >>>> > For >> >>>> > instance, with Python 2.6, the threading module received this >> >>>> > treatment, but >> >>>> > unfortunately this was not expanded to all modules. >> >>>> > >> >>>> > What am I speaking of precisely? I have done a quick survey of the >> >>>> > stdlib >> >>>> > and found the following examples. Please note, this is a highly >> >>>> > opinionated >> >>>> > list; some names may have been chosen with a very good reason, and >> >>>> > others >> >>>> > are just a matter of taste. Hopefully you agree with at least some >> of >> >>>> > them: >> >>>> > >> >>>> > * The CamelCasing in some modules are the most obvious culprits, >> >>>> > e.g. >> >>>> > logging and unittest. There is obviously an issue regarding >> subclasses >> >>>> > and >> >>>> > methods that are supposed to be overridden, but I feel we could >> make >> >>>> > it >> >>>> > work. >> >>>> > >> >>>> > * All lower case class names, such as collections.defaultdict and >> >>>> > collections.deque, should be CamelCased. Another example is >> datetime, >> >>>> > which >> >>>> > uses names such as timedelta instead of TimeDelta. >> >>>> > >> >>>> > * Inconsistent names all together, such as re.sub, which I feel >> >>>> > should be >> >>>> > re.replace (cf. str.replace). 
But also re.finditer and re.findall, >> but >> >>>> > no >> >>>> > re.find. >> >>>> > >> >>>> > * Names that do not reflect actual usage, such as >> >>>> > ssl.PROTOCOL_SSLv23, >> >>>> > which can in fact not be used as client for SSLv2. >> >>>> > >> >>>> > * Underscore usage, such as tarfile.TarFile.gettarinfo (should it >> >>>> > not be >> >>>> > get_tar_info?), http.client.HTTPConnection.getresponse vs >> >>>> > set_debuglevel, >> >>>> > and pathlib.Path.samefile vs pathlib.Path.read_text. And is it >> >>>> > pkgutil.iter_modules or is it pathlib.Path.iterdir (or >> re.finditer)? >> >>>> > >> >>>> > * Usage of various abbreviations, such as in filecmp.cmp >> >>>> > >> >>>> > * Inconsistencies between similar modules, e.g. between >> >>>> > tarfile.TarFile.add and zipfile.ZipFile.write. >> >>>> > >> >>>> > These are just some examples of inconsistent and surprising naming >> I >> >>>> > could >> >>>> > find, other categories are probably also conceivable. Another >> subject >> >>>> > for >> >>>> > reconsideration would be attribute and argument names, but I >> haven't >> >>>> > looked >> >>>> > for those in my quick survey. >> >>>> > >> >>>> > For all of these inconsistencies, I think we should make a >> >>>> > 'consistently' >> >>>> > named alternative, and alias the original variant with them (or the >> >>>> > other >> >>>> > way around), without defining a deprecation timeline for the >> original >> >>>> > names. >> >>>> > This should make it possible to eventually make the stdlib >> consistent, >> >>>> > Pythonic and unsurprising. >> >>>> > >> >>>> > What would you think of such an effort? 
>> >>>> > >> >>>> > Regards, >> >>>> > Ralph Broenink >> >>>> > >> >>>> > [1] >> >>>> > >> https://mail.python.org/pipermail/python-ideas/2010-January/006755.html >> >>>> > [2] >> >>>> > >> https://mail.python.org/pipermail/python-dev/2009-March/086646.html >> >>>> > >> >>>> > >> >>>> > _______________________________________________ >> >>>> > Python-ideas mailing list >> >>>> > Python-ideas at python.org >> >>>> > https://mail.python.org/mailman/listinfo/python-ideas >> >>>> > Code of Conduct: http://python.org/psf/codeofconduct/ >> >>>> _______________________________________________ >> >>>> Python-ideas mailing list >> >>>> Python-ideas at python.org >> >>>> https://mail.python.org/mailman/listinfo/python-ideas >> >>>> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> >> >> _______________________________________________ >> >> Python-ideas mailing list >> >> Python-ideas at python.org >> >> https://mail.python.org/mailman/listinfo/python-ideas >> >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > >> > >> > >> > _______________________________________________ >> > Python-ideas mailing list >> > Python-ideas at python.org >> > https://mail.python.org/mailman/listinfo/python-ideas >> > Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> -- >> --Guido van Rossum (python.org/~guido) >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From p1himik at gmail.com Mon Aug 1 06:45:34 2016 From: p1himik at gmail.com (Eugene Pakhomov) Date: Mon, 1 Aug 2016 17:45:34 +0700 Subject: [Python-ideas] Making the stdlib consistent again In-Reply-To: References: Message-ID: There's always a lot of people that don't want to make some changes. 
It doesn't mean that we have to keep every old piece in its place. We can keep deprecated names for a really long time, yes - maybe even until we introduce some required and definitely breaking change in the related module. But the names will still be deprecated, and hence their use will be discouraged - there's no confusion at all. Regards, Eugene On Mon, Aug 1, 2016 at 5:41 PM, Julien Duponchelle wrote: > A lot of people will not want to invest time to rewrite their code to use > the consistent stdlib because they will not see immediate/enough benefits. > This mean you will need to keep the alias forever creating more confusion > for users because they will have two function doing the same. > > Regards, > > Julien > > On Mon, Aug 1, 2016 at 12:05 PM Eugene Pakhomov wrote: > >> It was hardly all of the thoughts, or at least with little background >> information. >> >> E.g. "multiprocessing and threading" - I can't find any information on >> it. Was it done in one go or gradually by introducing new aliases and >> deprecating old names, at it has been suggested by you (although it was 7 >> years ago)? >> >> The suggestion about aliases+deprecation has been made for other modules, >> but sadly it ended up with you saying "let's do it this way" - with nothing >> following. >> >> Or "you have to keep the old API around" - what is the exact motivation >> behind it? If it's "so the code doesn't break", then it's a strange >> motivation, because as I said, code gets [potentially] broken more often >> that it doesn't. Every alteration/addition/deletion is a potential code >> breakage a priory - aliasing+deprecation is in no way more dangerous than, >> let's say, new zipapp module. >> >> I in no way state that the changes should include a complete overhaul or >> be done in one go. But maybe we can at least start gradually introducing >> some naming changes to the oldest, most used and least changed (w.r.t. API) >> modules?
>> The perfect example I think is collections module, but it's only from the >> personal experience - other modules may be better candidates. >> >> I can't say that I surely do not underestimate the efforts required. But >> what if I want to invest my time in it? If let's say I succeed and do >> everything according to the developer's guide and prove on a number of the >> most popular libraries that the changes indeed do not break anything - will >> be patch be considered or just thrown away with a note "we already >> discussed it"? >> >> Regards, >> Eugene >> >> >> On Mon, Aug 1, 2016 at 7:39 AM, Guido van Rossum >> wrote: >> >>> Thoughts of the core dev were expressed clearly earlier in this thread. >>> >>> On Sun, Jul 31, 2016 at 12:47 PM, Eugene Pakhomov >>> wrote: >>> > I'm on Ralph's side here. "Why is this thing named the other way?" was >>> one >>> > of the first questions I asked. And people whom I occasionally teach >>> about >>> > Python, ask the same question over and over again. >>> > >>> > Code breakage happens (PEP 3151 - didn't know about it till it almost >>> bit my >>> > leg off), so we can't shy away from it completely. >>> > Is there any link on the previous thoughts of the core devs on the >>> matter? >>> > Especially regarding the amount of potential code breakage. >>> > I'm genuinely interested, as I think that this amount is negligible if >>> the >>> > new names will be gradually introduced along with a deprecation notice >>> on >>> > (end eventual removal of) the old ones. >>> > As far as I can see, it can only do some harm if someone uses a >>> discouraged >>> > "import *" or monkey-patches some new methods into Python standard >>> > classes/modules, and updates his Python installation. >>> > >>> > Regards, >>> > Eugene >>> > >>> > >>> > On Mon, Aug 1, 2016 at 1:46 AM, Ralph Broenink < >>> ralph at ralphbroenink.net> >>> > wrote: >>> >> >>> >> I'm a bit sad that I'm clearly on the loosing side of the argument. 
I >>> now >>> >> believe that I must be grossly underestimating the amount of effort >>> and >>> >> overestimating the potential gain. However, I still feel that we >>> should >>> >> strive for consistency in the long run. I do not propose to do this >>> at once, >>> >> but I feel that at least some collaborated effort would be nice. (If >>> not >>> >> only for this kind of mail threads.) >>> >> >>> >> If I would start an effort to - for instance - 'fix' some camelCased >>> >> modules, and attempt to make it 100% backwards compatible, including >>> tests, >>> >> would there be any chance it could be merged at some point? >>> Otherwise, I >>> >> feel it would be a totally pointless effort ;). >>> >> >>> >> On Tue, 26 Jul 2016 at 18:29 Brett Cannon wrote: >>> >>> >>> >>> On Mon, 25 Jul 2016 at 13:03 Mark Mollineaux < >>> bufordsharkley at gmail.com> >>> >>> wrote: >>> >>>> >>> >>>> I've pined for this (and feel a real mental pain every time I use >>> one >>> >>>> of those poorlyCased names)-- I end up using a lot of mental space >>> >>>> remembering exactly HOW each stdlib isn't consistent. >>> >>>> >>> >>>> Aliasing consistent names in each case seems like a real win all >>> >>>> around, personally. >>> >>> >>> >>> >>> >>> For those that want consistent names, you could create a PyPI package >>> >>> that is nothing more than the aliased names as suggested. >>> >>> >>> >>> Otherwise I get the desire for consistency, but as pointed out by a >>> bunch >>> >>> of other core devs, we have thought about this many times and always >>> reach >>> >>> the same conclusion that the amount of work and potential code >>> breakage is >>> >>> too great. >>> >>> >>> >>> -Brett >>> >>> >>> >>>> >>> >>>> >>> >>>> On Mon, Jul 25, 2016 at 10:55 AM, Ralph Broenink >>> >>>> wrote: >>> >>>> > Hi python-ideas, >>> >>>> > >>> >>>> > As you all know, the Python stdlib can sometimes be a bit of an >>> >>>> > inconsistent >>> >>>> > mess that can be surprising in how it names things. 
This is mostly >>> >>>> > caused by >>> >>>> > the fact that several modules were developed before the >>> introduction >>> >>>> > of >>> >>>> > PEP-8, and now we're stuck with the older naming within these >>> modules. >>> >>>> > >>> >>>> > It has been said and discussed in the past [1][2] that the stdlib >>> is >>> >>>> > in fact >>> >>>> > inconsistent, but fixing this has almost always been disregarded >>> as >>> >>>> > being >>> >>>> > too painful (after all, we don't want a new Python 3 all over >>> again). >>> >>>> > However, this way, we will never move away from these >>> inconsistencies. >>> >>>> > Perhaps this is fine, but I think we should at least consider >>> >>>> > providing >>> >>>> > function and class names that are unsurprising for developers. >>> >>>> > >>> >>>> > While maintaining full backwards compatibility, my idea is that we >>> >>>> > should >>> >>>> > offer consistently named aliases in -eventually- all stdlib >>> modules. >>> >>>> > For >>> >>>> > instance, with Python 2.6, the threading module received this >>> >>>> > treatment, but >>> >>>> > unfortunately this was not expanded to all modules. >>> >>>> > >>> >>>> > What am I speaking of precisely? I have done a quick survey of the >>> >>>> > stdlib >>> >>>> > and found the following examples. Please note, this is a highly >>> >>>> > opinionated >>> >>>> > list; some names may have been chosen with a very good reason, and >>> >>>> > others >>> >>>> > are just a matter of taste. Hopefully you agree with at least >>> some of >>> >>>> > them: >>> >>>> > >>> >>>> > * The CamelCasing in some modules are the most obvious culprits, >>> >>>> > e.g. >>> >>>> > logging and unittest. There is obviously an issue regarding >>> subclasses >>> >>>> > and >>> >>>> > methods that are supposed to be overridden, but I feel we could >>> make >>> >>>> > it >>> >>>> > work. 
>>> >>>> > >>> >>>> > * All lower case class names, such as collections.defaultdict >>> and >>> >>>> > collections.deque, should be CamelCased. Another example is >>> datetime, >>> >>>> > which >>> >>>> > uses names such as timedelta instead of TimeDelta. >>> >>>> > >>> >>>> > * Inconsistent names all together, such as re.sub, which I feel >>> >>>> > should be >>> >>>> > re.replace (cf. str.replace). But also re.finditer and >>> re.findall, but >>> >>>> > no >>> >>>> > re.find. >>> >>>> > >>> >>>> > * Names that do not reflect actual usage, such as >>> >>>> > ssl.PROTOCOL_SSLv23, >>> >>>> > which can in fact not be used as client for SSLv2. >>> >>>> > >>> >>>> > * Underscore usage, such as tarfile.TarFile.gettarinfo (should >>> it >>> >>>> > not be >>> >>>> > get_tar_info?), http.client.HTTPConnection.getresponse vs >>> >>>> > set_debuglevel, >>> >>>> > and pathlib.Path.samefile vs pathlib.Path.read_text. And is it >>> >>>> > pkgutil.iter_modules or is it pathlib.Path.iterdir (or >>> re.finditer)? >>> >>>> > >>> >>>> > * Usage of various abbreviations, such as in filecmp.cmp >>> >>>> > >>> >>>> > * Inconsistencies between similar modules, e.g. between >>> >>>> > tarfile.TarFile.add and zipfile.ZipFile.write. >>> >>>> > >>> >>>> > These are just some examples of inconsistent and surprising >>> naming I >>> >>>> > could >>> >>>> > find, other categories are probably also conceivable. Another >>> subject >>> >>>> > for >>> >>>> > reconsideration would be attribute and argument names, but I >>> haven't >>> >>>> > looked >>> >>>> > for those in my quick survey. >>> >>>> > >>> >>>> > For all of these inconsistencies, I think we should make a >>> >>>> > 'consistently' >>> >>>> > named alternative, and alias the original variant with them (or >>> the >>> >>>> > other >>> >>>> > way around), without defining a deprecation timeline for the >>> original >>> >>>> > names. 
>>> >>>> > This should make it possible to eventually make the stdlib >>> consistent, >>> >>>> > Pythonic and unsurprising. >>> >>>> > >>> >>>> > What would you think of such an effort? >>> >>>> > >>> >>>> > Regards, >>> >>>> > Ralph Broenink >>> >>>> > >>> >>>> > [1] >>> >>>> > >>> https://mail.python.org/pipermail/python-ideas/2010-January/006755.html >>> >>>> > [2] >>> >>>> > >>> https://mail.python.org/pipermail/python-dev/2009-March/086646.html >>> >>>> > >>> >>>> > >>> >>>> > _______________________________________________ >>> >>>> > Python-ideas mailing list >>> >>>> > Python-ideas at python.org >>> >>>> > https://mail.python.org/mailman/listinfo/python-ideas >>> >>>> > Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>>> _______________________________________________ >>> >>>> Python-ideas mailing list >>> >>>> Python-ideas at python.org >>> >>>> https://mail.python.org/mailman/listinfo/python-ideas >>> >>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >> >>> >> >>> >> _______________________________________________ >>> >> Python-ideas mailing list >>> >> Python-ideas at python.org >>> >> https://mail.python.org/mailman/listinfo/python-ideas >>> >> Code of Conduct: http://python.org/psf/codeofconduct/ >>> > >>> > >>> > >>> > _______________________________________________ >>> > Python-ideas mailing list >>> > Python-ideas at python.org >>> > https://mail.python.org/mailman/listinfo/python-ideas >>> > Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> >>> >>> -- >>> --Guido van Rossum (python.org/~guido) >>> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Mon Aug 1 07:03:29 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 1 Aug 2016 21:03:29 +1000 Subject: [Python-ideas] size of the installation of Python on mobile devices In-Reply-To: <22430.57892.461912.677961@turnbull.sk.tsukuba.ac.jp> References: <8d577973-eaaf-21cf-3604-8ffd28773ba4@gmail.com> <579A49F8.9080708@egenix.com> <7bb89d4a-3930-d2fd-9736-2360008f561e@gmail.com> <22428.22132.611219.145348@turnbull.sk.tsukuba.ac.jp> <22430.57892.461912.677961@turnbull.sk.tsukuba.ac.jp> Message-ID: On 1 August 2016 at 15:46, Stephen J. Turnbull wrote: > Victor Stinner writes: > > > Xavier is a core developer. He is free to dedicate his time to > > supporting sourceless distribution :-) > > So are we all, core or not. But on Nick's terms (he even envisions > releases with the "sourceless" build broken), I don't think adding to > core is fair to Xavier's (and others') efforts in this direction. How would it be any different from our efforts to support other platforms outside the primary set of Windows, Mac OS X, Linux, and *BSD? Things *definitely* break from time-to-time on those other less common setups, and when they do, folks submit patches to fix them after they notice (usually because they went to rebase their own work on a newer version and discovered things didn't work as expected). I'd see this working the same way - we wouldn't go out of our way to break sourceless builds, but if they did break, it would be on the folks that care about them to submit patches to resolve the problem. 
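To make "sourceless" concrete: such a build ships only bytecode, and imports keep working from .pyc files alone. A minimal sketch of the mechanism (the module name `greeting` is invented for illustration; a sourceless stdlib is the same thing at scale):

```python
import compileall
import pathlib
import sys
import tempfile

# Create a throwaway module, compile it, then delete its source.
tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "greeting.py").write_text("def hello():\n    return 'hi'\n")

# legacy=True places greeting.pyc next to the source instead of in
# __pycache__, which is the layout the sourceless loader picks up.
compileall.compile_dir(tmp, quiet=1, legacy=True)
pathlib.Path(tmp, "greeting.py").unlink()  # no source left on disk

sys.path.insert(0, tmp)
import greeting  # satisfied by bytecode alone

print(greeting.hello())
```

With no .py on disk, tracebacks and linecache have nothing to display - which is exactly the support question at issue.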
The gain for folks that care would be getting a green light to pursue more robust support for that model in the standard library itself, such as clearly marking test cases that require linecache to be working (as we already do for tests that require docstrings or that test CPython implementation details rather than Python language features), and perhaps even eventually developing a mechanism along the lines of JavaScript sourcemaps that would allow linecache to keep working, even when running on a sourceless build of the standard library. > It would also set an unfortunate precedent. What precedent do you mean? That ./configure may contain options that aren't 100% reliable? That's already the case - I can assure you that we *do not* consistently test all of the options reported by "./configure --help", since what matters is that the options people are *actually using* keep working in the context where they're using them, rather than all of the options working in every possible environment. Or do you mean the precedent that we're OK with folks shipping the standard library sans source code? *That* precedent was set when Guido chose to use of permissive licensing model for the language definition and runtime - while shipping without source code is a bad idea in educational contexts, and other situations where having the code on hand for inspection by end users is beneficial, Python is used in plenty of scenarios where those considerations don't apply. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Mon Aug 1 07:45:58 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 1 Aug 2016 12:45:58 +0100 Subject: [Python-ideas] Making the stdlib consistent again In-Reply-To: References: Message-ID: On 1 August 2016 at 11:05, Eugene Pakhomov wrote: > Or "you have to keep the old API around" - what is the exact motivation > behind it? 
If it's "so the code doesn't break", then it's a strange > motivation, because as I said, code gets [potentially] broken more often > that it doesn't. Every alteration/addition/deletion is a potential code > breakage a priory - aliasing+deprecation is in no way more dangerous than, > let's say, new zipapp module. The motivation here is that there are many, many people with deployed code that uses the existing APIs and names. You are suggesting that we ask those people to modify their working code to use the new names - certainly you're proposing a deprecation period, but ultimately the proposal is to just have the new names. What benefit do you offer to those people to justify that work? "It's more consistent" isn't going to be compelling. Further, how will you support people such as library authors who need their code to work with multiple versions of Python? What timescale are you talking about here? Library authors are still having to support 2.7 at present, so until 2.7 is completely gone, *someone* will have to maintain code that uses both old and new names. So either your deprecation period is very long (with a consequent cost on the Python core developers, maintaining 2 names) or some library authors are left having to maintain their own compatibility code. Neither is attractive, so again, where's a practical, significant benefit? What's the benefit to book authors or trainers who find that their books/course materials are now out of date, and they are under pressure to produce a new version that's "up to date"? The Python core developers take their responsibility not to break their users' code without good reason very seriously. And from long experience, we know that we need to consider long timescales. That's not necessarily something we like (if we were writing things from scratch, we might well make different decisions) but it's part of the role of maintaining software that millions of people rely on, often for core aspects of their business. 
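To make the multi-version burden concrete, this is the kind of shim a library author would end up carrying if, purely hypothetically, collections.defaultdict grew a CamelCase alias (no such alias exists - the try/except is the point):

```python
import collections

# Hypothetical rename: prefer the new spelling if it ever appears,
# fall back to the name that actually exists today.
try:
    DefaultDict = collections.DefaultDict  # imagined new name
except AttributeError:
    DefaultDict = collections.defaultdict  # real, current name

counts = DefaultDict(int)
counts["spam"] += 1
```

Multiply that by every renamed attribute and every library in a dependency chain, and the "it's just an alias" cost stops looking small.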
> I can't say that I surely do not underestimate the efforts required. But > what if I want to invest my time in it? If let's say I succeed and do > everything according to the developer's guide and prove on a number of the > most popular libraries that the changes indeed do not break anything - will > be patch be considered or just thrown away with a note "we already discussed > it"? You've had the answer a few times in this thread. The benefits have to outweigh the costs. Vague statements about "consistency" are not enough, you need to show concrete benefits and show how they improve things *for the people who have to change their code*. [From another post] > But the names still will be deprecated and their use hence will be discouraged - there's no confusion at all. As long as both names are supported (even deprecated and discouraged names remain supported) the Python core developers will have to pay the cost - maintain compatibility wrappers, test both names, etc. How long do you expect the core devs to do that? Here - consider this. We rename collections.defaultdict to collections.DefaultDict (for whatever reason). So now collections.defaultdict must act the same as collections.DefaultDict. OK, so someone has the following relatively standard mock object type of pattern in their test suite: # Patch defaultdict for some reason collections.defaultdict = MyMockClass # Now call the code under test test_some_external_function() # Now check the results MyMockClass.showcallpattern() Now, suppose that the "external function" switches to use the name collections.DefaultDict. The above code will break, unless the two names defaultdict and DefaultDict are updated in parallel somehow to both point to MyMockClass. How do you propose we support that? And if your answer is that the user is making incorrect use of the stdlib, then you just broke their code. You have what you feel is a justification, but who gets to handle the bug report? 
Who gets to support a user whose production code needs a rewrite to work with Python 3.6? Or to support the author of the external_function() library who has angry users saying the new version broke code that uses it, even though the library author was following the new guidelines given in the Python documentation (I assume you'll be including documentation updates in your proposal)? Of course these are unlikely scenarios. I'm not trying to claim otherwise. But they are the sort of things that the core devs have to concern themselves with, and that's why the benefits need to justify a change like this. Paul From p.f.moore at gmail.com Mon Aug 1 07:49:56 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 1 Aug 2016 12:49:56 +0100 Subject: [Python-ideas] size of the installation of Python on mobile devices In-Reply-To: References: <8d577973-eaaf-21cf-3604-8ffd28773ba4@gmail.com> <579A49F8.9080708@egenix.com> <7bb89d4a-3930-d2fd-9736-2360008f561e@gmail.com> <22428.22132.611219.145348@turnbull.sk.tsukuba.ac.jp> <22430.57892.461912.677961@turnbull.sk.tsukuba.ac.jp> Message-ID: On 1 August 2016 at 12:03, Nick Coghlan wrote: > Or do you mean the precedent that we're OK with folks shipping the > standard library sans source code? *That* precedent was set when Guido > chose to use of permissive licensing model for the language definition > and runtime - while shipping without source code is a bad idea in > educational contexts, and other situations where having the code on > hand for inspection by end users is beneficial, Python is used in > plenty of scenarios where those considerations don't apply. And the Windows "embedded" distributions already ship without source code. 
Paul From ncoghlan at gmail.com Mon Aug 1 08:51:36 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 1 Aug 2016 22:51:36 +1000 Subject: [Python-ideas] Making the stdlib consistent again In-Reply-To: References: Message-ID: On 1 August 2016 at 20:05, Eugene Pakhomov wrote: > It was hardly all of the thoughts, or at least with little background > information. The most pertinent piece of background information is http://www.curiousefficiency.org/posts/2011/02/status-quo-wins-stalemate.html The number of changes we *could* make to Python as a language is theoretically unbounded. "Can you get current core developers sufficiently enthusiastic about your idea to encourage you to see it through to implementation?" is thus one of the main defenses the language has against churn for churn's sake (it's definitely not the only one, but it's still a hurdle that a lot of proposals fail to clear). > E.g. "multiprocessing and threading" - I can't find any information on it. > Was it done in one go or gradually by introducing new aliases and > deprecating old names, at it has been suggested by you (although it was 7 > years ago)? All at once, and the old names haven't been removed, and will likely remain supported indefinitely (there's nothing technically wrong with them, they just predate the introduction of the descriptor protocol and Python's adoption of snake_case as the preferred convention for method and attribute names, and instead use the older Java-style API with camelCase names and explicit getter and setter methods). The implementation issue is http://bugs.python.org/issue3042 but you'll need to click through to the actual applied patches to see the magnitude of the diffs (the patches Benjamin posted to the tracker were only partial ones to discuss the general approach). 
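The alias style applied in that issue was plain same-object binding - roughly this pattern, shown here with an invented function rather than the actual threading.py source:

```python
# The snake_case name carries the implementation...
def current_thread_name():
    return "MainThread"

# ...and the legacy camelCase spelling is simply bound to the same
# function object, so the two spellings can never drift apart.
currentThreadName = current_thread_name

assert currentThreadName is current_thread_name
print(currentThreadName())
```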
For a decent list of other renames that have largely been judged to have created more annoyance for existing users than was justified by any related increase in naming consistency, the six.moves compatibility module documentation includes a concise list of many of the renames that took place in the migration to Python 3: https://pythonhosted.org/six/#module-six.moves I've yet to hear a professional educator proclaim their delight at those renaming efforts, but I *have* received a concrete suggestion for improving the process next time we decide to start renaming things to improve "consistency": don't do it unless we have actual user experience testing results in hand to show that the new names are genuinely less confusing and easier to learn than the old ones. > The suggestion about aliases+deprecation has been made for other modules, > but sadly it ended up with you saying "let's do it this way" - with nothing > following. That's the second great filter for language change proposals: is there at least one person (whether a current core developer or not) with the necessary time, energy and interest needed to implement the proposed change and present it for review? > Or "you have to keep the old API around" - what is the exact motivation > behind it? If it's "so the code doesn't break", then it's a strange > motivation, because as I said, code gets [potentially] broken more often > that it doesn't. Aside from adding new methods to existing classes (which may collide with added methods in third party subclasses) and the text mock example Paul cited, additions to existing modules almost never break things. By contrast, removals almost *always* break things and hence are generally only done these days when the old ways of doing things are actively harmful (e.g. the way the misdesigned contextlib.nested API encouraged resource leaks in end-user applications, or the blurred distinction between text and binary data in Python 2 encouraged mishandling of text data). 
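The annoyance those migration renames caused is visible in the dual-import boilerplate that six.moves exists to hide; without it, cross-version code carries blocks like this (using the urlparse move as the example):

```python
try:
    from urllib.parse import urlparse  # Python 3 location
except ImportError:
    from urlparse import urlparse      # Python 2 location

parts = urlparse("https://example.org/index.html?q=stdlib")
print(parts.netloc, parts.path)
```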
> Every alteration/addition/deletion is a potential code > breakage a priory - aliasing+deprecation is in no way more dangerous than, > let's say, new zipapp module. If it's a programmatic deprecation (rather than merely a documented one), then deprecation absolutely *does* force folks that have a "no warnings" policy for their test suites to update their code immediately. It's the main reason we prefer to only deprecate things when they're actively harmful, rather than just because they're no longer fashionable. While standard library module additions can indeed pose a compatibility challenge (due to the way module name shadowing works), the typical worst case scenario there is just needing some explicit sys.path manipulation to override the stdlib version specifically for affected applications, and even that only arises if somebody has an existing internal package name that collides with the new stdlib one (we try to avoid colliding with names on PyPI). > I in no way state that the changes should include a complete overhaul or be > done in one go. But maybe we can at least start gradually introducing some > naming changes to the oldest, most used and least changed (w.r.t. API) > modules? That's a different proposal from wholesale name changes, as it allows each change to be discussed on its individual merits rather than attempting to establish a blanket change in development policy. > The perfect example I think is collections module, but it's only from the > personal experience - other modules may be better candidates. collections has the challenge that the normal PEP 8 guidelines don't apply to the builtins, so some collections types are named like builtins (deque, defaultdict), while others follow normal class naming conventions (e.g. OrderedDict). There are also type factories like namedtuple, which again follow the builtin convention of omitting underscores from typically snake_case names. 
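The documented-versus-programmatic distinction above fits in a few lines (both names are invented): the alias keeps working, but any test suite run with -W error fails the moment it touches the old spelling.

```python
import warnings

def new_name():
    return 42

def old_name():
    # Programmatic deprecation: still functional, but every call warns,
    # which breaks test suites that escalate warnings to errors.
    warnings.warn("old_name() is deprecated; use new_name()",
                  DeprecationWarning, stacklevel=2)
    return new_name()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = old_name()

print(value, caught[0].category.__name__)
```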
> I can't say that I surely do not underestimate the efforts required. But > what if I want to invest my time in it? It's not contributor time or even core developer time that's the main problem in cases like this, it's the flow-on effect on books, tutorials, and other learning resources. When folks feel obliged to update those, we want them to be able to say to themselves "Yes, the new way of doing things is clearly better for my students and readers than the old way". When we ask other people to spend time on something, the onus is on us to make sure that at least we believe their time will be well spent in the long run (even if we're not 100% successful in convincing them of that in the near term). If even we don't think that's going to be the case, then the onus is on us to avoid wasting their time in the first place. We're never going to be 100% successful in that (we're always going to approve some non-zero number of changes that, with the benefit of hindsight, turn out to have been more trouble than they were worth), but it's essential that we keep the overall cost of change to the entire ecosystem in mind when reviewing proposals, rather than taking a purely local view of the costs and benefits at the initial implementation level. > If let's say I succeed and do > everything according to the developer's guide and prove on a number of the > most popular libraries that the changes indeed do not break anything - will > be patch be considered or just thrown away with a note "we already discussed > it"? It would depend on the specific change you propose, and the rationale you give for making it. If the only rationale given is "Make <module> more compliant with PEP 8", then it will almost certainly be knocked back. Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p1himik at gmail.com Mon Aug 1 09:30:30 2016 From: p1himik at gmail.com (Eugene Pakhomov) Date: Mon, 1 Aug 2016 20:30:30 +0700 Subject: [Python-ideas] Making the stdlib consistent again In-Reply-To: References: Message-ID: Thank you very much for the links and the new insights, especially regarding actual user experience with new names. A good reason to start gathering actual statistics. You have made a very good point, I will not reason about naming further without a significant amount of data showing that it still may be worth it. Regards, Eugene On Mon, Aug 1, 2016 at 7:51 PM, Nick Coghlan wrote: > On 1 August 2016 at 20:05, Eugene Pakhomov wrote: > > It was hardly all of the thoughts, or at least with little background > > information. > > The most pertinent piece of background information is > > http://www.curiousefficiency.org/posts/2011/02/status-quo-wins-stalemate.html > > The number of changes we *could* make to Python as a language is > theoretically unbounded. "Can you get current core developers > sufficiently enthusiastic about your idea to encourage you to see it > through to implementation?" is thus one of the main defenses the > language has against churn for churn's sake (it's definitely not the > only one, but it's still a hurdle that a lot of proposals fail to > clear). > > > E.g. "multiprocessing and threading" - I can't find any information on > it. > > Was it done in one go or gradually by introducing new aliases and > > deprecating old names, at it has been suggested by you (although it was 7 > > years ago)? 
> > All at once, and the old names haven't been removed, and will likely > remain supported indefinitely (there's nothing technically wrong with > them, they just predate the introduction of the descriptor protocol > and Python's adoption of snake_case as the preferred convention for > method and attribute names, and instead use the older Java-style API > with camelCase names and explicit getter and setter methods). > > The implementation issue is http://bugs.python.org/issue3042 but > you'll need to click through to the actual applied patches to see the > magnitude of the diffs (the patches Benjamin posted to the tracker > were only partial ones to discuss the general approach). > > For a decent list of other renames that have largely been judged to > have created more annoyance for existing users than was justified by > any related increase in naming consistency, the six.moves > compatibility module documentation includes a concise list of many of > the renames that took place in the migration to Python 3: > https://pythonhosted.org/six/#module-six.moves > > I've yet to hear a professional educator proclaim their delight at > those renaming efforts, but I *have* received a concrete suggestion > for improving the process next time we decide to start renaming things > to improve "consistency": don't do it unless we have actual user > experience testing results in hand to show that the new names are > genuinely less confusing and easier to learn than the old ones. > > > The suggestion about aliases+deprecation has been made for other modules, > > but sadly it ended up with you saying "let's do it this way" - with > nothing > > following. > > That's the second great filter for language change proposals: is there > at least one person (whether a current core developer or not) with the > necessary time, energy and interest needed to implement the proposed > change and present it for review? 
> > > Or "you have to keep the old API around" - what is the exact motivation > > behind it? If it's "so the code doesn't break", then it's a strange > > motivation, because as I said, code gets [potentially] broken more often > > that it doesn't. > > Aside from adding new methods to existing classes (which may collide > with added methods in third party subclasses) and the text mock > example Paul cited, additions to existing modules almost never break > things. By contrast, removals almost *always* break things and hence > are generally only done these days when the old ways of doing things > are actively harmful (e.g. the way the misdesigned contextlib.nested > API encouraged resource leaks in end-user applications, or the blurred > distinction between text and binary data in Python 2 encouraged > mishandling of text data). > > > Every alteration/addition/deletion is a potential code > > breakage a priory - aliasing+deprecation is in no way more dangerous > than, > > let's say, new zipapp module. > > If it's a programmatic deprecation (rather than merely a documented > one), then deprecation absolutely *does* force folks that have a "no > warnings" policy for their test suites to update their code > immediately. It's the main reason we prefer to only deprecate things > when they're actively harmful, rather than just because they're no > longer fashionable. > > While standard library module additions can indeed pose a > compatibility challenge (due to the way module name shadowing works), > the typical worst case scenario there is just needing some explicit > sys.path manipulation to override the stdlib version specifically for > affected applications, and even that only arises if somebody has an > existing internal package name that collides with the new stdlib one > (we try to avoid colliding with names on PyPI). > > > I in no way state that the changes should include a complete overhaul or > be > > done in one go. 
But maybe we can at least start gradually introducing > some > > naming changes to the oldest, most used and least changed (w.r.t. API) > > modules? > > That's a different proposal from wholesale name changes, as it allows > each change to be discussed on its individual merits rather than > attempting to establish a blanket change in development policy. > > > The perfect example I think is collections module, but it's only from the > > personal experience - other modules may be better candidates. > > collections has the challenge that the normal PEP 8 guidelines don't > apply to the builtins, so some collections types are named like > builtins (deque, defaultdict), while others follow normal class naming > conventions (e.g. OrderedDict). There are also type factories like > namedtuple, which again follow the builtin convention of omitting > underscores from typically snake_case names. > > > I can't say that I surely do not underestimate the efforts required. But > > what if I want to invest my time in it? > > It's not contributor time or even core developer time that's the main > problem in cases like this, it's the flow on effect on books, > tutorials, and other learning resources. When folks feel obliged to > update those, we want them to able to say to themselves "Yes, the new > way of doing things is clearly better for my students and readers than > the old way". > > When we ask other people to spend time on something, the onus is on us > to make sure that at least we believe their time will be well spent in > the long run (even if we're not 100% successful in convincing them of > that in the near term). If even we don't think that's going to be the > case, then the onus is on us to avoid wasting their time in the first > place. 
We're never going to be 100% successful in that (we're always > going to approve some non-zero number of changes that, with the > benefit of hindsight, turn out to have been more trouble than they > were worth), but it's essential that we keep the overall cost of > change to the entire ecosystem in mind when reviewing proposals, > rather than taking a purely local view of the costs and benefits at > the initial implementation level. > > > If let's say I succeed and do > > everything according to the developer's guide and prove on a number of > the > > most popular libraries that the changes indeed do not break anything - > will > > be patch be considered or just thrown away with a note "we already > discussed > > it"? > > It would depend on the specific change you propose, and the rationale > you give for making it. If the only rationale given is "Make > more compliant with PEP 8", then it will almost certainly be knocked > back. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From boekewurm at gmail.com Mon Aug 1 09:31:19 2016 From: boekewurm at gmail.com (Matthias welp) Date: Mon, 1 Aug 2016 15:31:19 +0200 Subject: [Python-ideas] Making the stdlib consistent again In-Reply-To: References: Message-ID: On 1 August 2016 at 13:45, Paul Moore wrote: > On 1 August 2016 at 11:05, Eugene Pakhomov wrote: >> Or "you have to keep the old API around" - what is the exact motivation >> behind it? If it's "so the code doesn't break", then it's a strange >> motivation, because as I said, code gets [potentially] broken more often >> that it doesn't. Every alteration/addition/deletion is a potential code >> breakage a priory - aliasing+deprecation is in no way more dangerous than, >> let's say, new zipapp module. > What timescale are you talking about here? 
Library authors are still > having to support 2.7 at present, so until 2.7 is completely gone, > *someone* will have to maintain code that uses both old and new names. > So either your deprecation period is very long (with a consequent cost > on the Python core developers, maintaining 2 names) or some library > authors are left having to maintain their own compatibility code. > Neither is attractive, so again, where's a practical, significant > benefit? Double-name maintenance is not really an issue in my opinion. Earlier in this thread there was a valid point for aliasing, which from my perspective needs little maintenance if any, with no cost towards performance. > What's the benefit to book authors or trainers who find that their > books/course materials are now out of date, and they are under > pressure to produce a new version that's "up to date"? New books get written all the time, but 'getting it up to date' is not needed when aliases are put in place, with deprecation warnings reserved for later, e.g. when support for 2.7 and for both names is discontinued. That gives both authors and users enough time to adapt their respective works. > The Python core developers take their responsibility not to break > their users' code without good reason very seriously. And from long > experience, we know that we need to consider long timescales. That's > not necessarily something we like (if we were writing things from > scratch, we might well make different decisions) but it's part of the > role of maintaining software that millions of people rely on, often > for core aspects of their business. Especially in the long run, people who get in touch with Python will again and again find out that the names in Python are not consistent. If this can be solved, why not do so? >> I can't say that I surely do not underestimate the efforts required. But >> what if I want to invest my time in it?
If let's say I succeed and do >> everything according to the developer's guide and prove on a number of the >> most popular libraries that the changes indeed do not break anything - will >> be patch be considered or just thrown away with a note "we already discussed >> it"? > You've had the answer a few times in this thread. The benefits have to > outweigh the costs. Vague statements about "consistency" are not > enough, you need to show concrete benefits and show how they improve > things *for the people who have to change their code*. > [From another post] >> But the names still will be deprecated and their use hence will be >> discouraged - there's no confusion at all. > As long as both names are supported (even deprecated and discouraged > names remain supported) the Python core developers will have to pay > the cost - maintain compatibility wrappers, test both names, etc. How > long do you expect the core devs to do that? Compatibility wrappers only have to be of one type: make the content of package variable 'a' be the same as package variable 'A'. This can be done by allowing the containing structure of variables to be pointed to by multiple variable names, which would allow for library maintainers to just insert the new name, and point it to the correct variable container. > # Patch defaultdict for some reason > collections.defaultdict = MyMockClass > # Now call the code under test > test_some_external_function() > # Now check the results > MyMockClass.showcallpattern() > Now, suppose that the "external function" switches to use the name > collections.DefaultDict. The above code will break, unless the two > names defaultdict and DefaultDict are updated in parallel somehow to > both point to MyMockClass. How do you propose we support that? This would be fixed with the 'aliasing variable names'-solution. > And if your answer is that the user is making incorrect use of the > stdlib, then you just broke their code.
You have what you feel is a > justification, but who gets to handle the bug report? Who gets to > support a user whose production code needs a rewrite to work with > Python 3.6? Or to support the author of the external_function() > library who has angry users saying the new version broke code that > uses it, even though the library author was following the new > guidelines given in the Python documentation (I assume you'll be > including documentation updates in your proposal)? Changing supported Python versions will always change behaviour of someone's code. There's a nice statement about that on XKCD: (https://xkcd.com/1172/). Although I know this is also an extreme example, this does not mean that it is not correct: If there is a point in which we can make Python/stdlib easier to work with, then maybe we should do so to make things more consistent and easier to learn. -Matthias -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Mon Aug 1 10:23:58 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 2 Aug 2016 00:23:58 +1000 Subject: [Python-ideas] Making the stdlib consistent again In-Reply-To: References: Message-ID: On Mon, Aug 1, 2016 at 11:31 PM, Matthias welp wrote: >> As long as both names are supported (even deprecated and discouraged >> names remain supported) the Python core developers will have to pay >> the cost - maintain compatibility wrappers, test both names, etc. How >> long do you expect the core devs to do that? > > Compatebility wrappers only have to be of one type: make the content of > package variable 'a' be the same as package variable 'A'. This can be done > by > allowing the containing structure of variables be pointed to by multiple > variable names, which would allow for library maintainers to just insert > the new name, and point it to the correct variable container. 
> >> # Patch defaultdict for some reason >> collections.defaultdict = MyMockClass >> # Now call the code under test >> test_some_external_function() >> # Now check the results >> MyMockClass.showcallpattern() > >> Now, suppose that the "external function" switches to use the name >> collections.DefaultDict. The above code will break, unless the two >> names defaultdict and DefaultDict are updated in parallel somehow to >> both point to MyMockClass. How do you propose we support that? > > This would be fixed with the 'aliasing variable names'-solution. > Not sure I follow; are you proposing that module attributes be able to say "I'm the same as that guy over there"? That could be done with descriptor protocol (think @property, where you can write a getter/setter that has the actual value in a differently-named public attribute), but normally, modules don't allow that, as you need to mess with the class not the instance. But there have been numerous proposals to make that easier for module authors, one way or another. That would make at least some things easier - the mocking example would work that way - but it'd still mean people have to grok more than one name when reading code, and it'd most likely mess with people's expectations in tracebacks etc (the function isn't called what I thought it was called). Or is that not what you mean by aliasing? ChrisA From ncoghlan at gmail.com Mon Aug 1 10:37:40 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 2 Aug 2016 00:37:40 +1000 Subject: [Python-ideas] Making the stdlib consistent again In-Reply-To: References: Message-ID: On 2 August 2016 at 00:23, Chris Angelico wrote: > Not sure I follow; are you proposing that module attributes be able to > say "I'm the same as that guy over there"? 
That could be done with > descriptor protocol (think @property, where you can write a > getter/setter that has the actual value in a differently-named public > attribute), but normally, modules don't allow that, as you need to > mess with the class not the instance. But there have been numerous > proposals to make that easier for module authors, one way or another. One of them (making __class__ writable on module instances) was actually implemented in Python 3.5, but omitted from the What's New docs and given a relatively cryptic description in the NEWS file: http://bugs.python.org/issue27505 So if someone wanted to try their hand at documentation for that which is comprehensible to folks that aren't necessarily experts in: - the import system; - the metaclass machinery; and - the descriptor protocol I can review it. I'm just not currently sure where to start in writing it, as what mainly needs to be covered is how folks can *use* it to change the behaviour of module level attribute lookups, rather than the precise mechanics of how it works :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From g.rodola at gmail.com Mon Aug 1 10:51:24 2016 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Mon, 1 Aug 2016 16:51:24 +0200 Subject: [Python-ideas] Making the stdlib consistent again In-Reply-To: References: Message-ID: On Mon, Aug 1, 2016 at 12:05 PM, Eugene Pakhomov wrote: > It was hardly all of the thoughts, or at least with little background > information. > > E.g. "multiprocessing and threading" - I can't find any information on it. > Was it done in one go or gradually by introducing new aliases and > deprecating old names, at it has been suggested by you (although it was 7 > years ago)? > > The suggestion about aliases+deprecation has been made for other modules, > but sadly it ended up with you saying "let's do it this way" - with nothing > following. 
> > Or "you have to keep the old API around" - what is the exact motivation > behind it? If it's "so the code doesn't break", then it's a strange > motivation, because as I said, code gets [potentially] broken more often > that it doesn't. Every alteration/addition/deletion is a potential code > breakage a priory - aliasing+deprecation is in no way more dangerous than, > let's say, new zipapp module. > > I in no way state that the changes should include a complete overhaul or > be done in one go. But maybe we can at least start gradually introducing > some naming changes to the oldest, most used and least changed (w.r.t. API) > modules? > The perfect example I think is collections module, but it's only from the > personal experience - other modules may be better candidates. > > I can't say that I surely do not underestimate the efforts required. But > what if I want to invest my time in it? If let's say I succeed and do > everything according to the developer's guide and prove on a number of the > most popular libraries that the changes indeed do not break anything - will > be patch be considered or just thrown away with a note "we already > discussed it"? > > Regards, > Eugene > Python 3 already broke quite a lot of stuff "just to be better/more-consistent" and we're still in a state where there are many people stuck with Python 2.7 because the migration cost is considered too high or just not worth it. Introducing such a change would increase this cost and make the two Python versions (2 and 3) even more different. It also has a mnemonic cost because you double the size of API names you'll have to remember for virtually zero additional value other than the fact that the names are more consistent. I understand the rationale but I think any proposal making Python 3 more incompatible with Python 2 should have a *very* high barrier in terms of acceptance.
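For concreteness, the "aliases plus deprecation" mechanism being weighed in this thread is only a few lines per renamed API; the dispute is about the ongoing cost, not the code. A minimal sketch with hypothetical names (not an actual stdlib patch):

```python
import warnings

def get_int(text):
    """Hypothetical new, PEP 8-style spelling."""
    return int(text)

def getint(text):
    """Old spelling, kept as a deprecated alias of get_int()."""
    warnings.warn("getint() is deprecated; use get_int() instead",
                  DeprecationWarning, stacklevel=2)
    return get_int(text)
```

Even this tiny wrapper has to be documented, tested, and remembered alongside the new name, which is exactly the doubled mnemonic and maintenance cost described above.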
> On Mon, Aug 1, 2016 at 7:39 AM, Guido van Rossum wrote: > >> Thoughts of the core dev were expressed clearly earlier in this thread. >> >> On Sun, Jul 31, 2016 at 12:47 PM, Eugene Pakhomov >> wrote: >> > I'm on Ralph's side here. "Why is this thing named the other way?" was >> one >> > of the first questions I asked. And people whom I occasionally teach >> about >> > Python, ask the same question over and over again. >> > >> > Code breakage happens (PEP 3151 - didn't know about it till it almost >> bit my >> > leg off), so we can't shy away from it completely. >> > Is there any link on the previous thoughts of the core devs on the >> matter? >> > Especially regarding the amount of potential code breakage. >> > I'm genuinely interested, as I think that this amount is negligible if >> the >> > new names will be gradually introduced along with a deprecation notice >> on >> > (end eventual removal of) the old ones. >> > As far as I can see, it can only do some harm if someone uses a >> discouraged >> > "import *" or monkey-patches some new methods into Python standard >> > classes/modules, and updates his Python installation. >> > >> > Regards, >> > Eugene >> > >> > >> > On Mon, Aug 1, 2016 at 1:46 AM, Ralph Broenink > > >> > wrote: >> >> >> >> I'm a bit sad that I'm clearly on the loosing side of the argument. I >> now >> >> believe that I must be grossly underestimating the amount of effort and >> >> overestimating the potential gain. However, I still feel that we should >> >> strive for consistency in the long run. I do not propose to do this at >> once, >> >> but I feel that at least some collaborated effort would be nice. (If >> not >> >> only for this kind of mail threads.) >> >> >> >> If I would start an effort to - for instance - 'fix' some camelCased >> >> modules, and attempt to make it 100% backwards compatible, including >> tests, >> >> would there be any chance it could be merged at some point? 
Otherwise, >> I >> >> feel it would be a totally pointless effort ;). >> >> >> >> On Tue, 26 Jul 2016 at 18:29 Brett Cannon wrote: >> >>> >> >>> On Mon, 25 Jul 2016 at 13:03 Mark Mollineaux < >> bufordsharkley at gmail.com> >> >>> wrote: >> >>>> >> >>>> I've pined for this (and feel a real mental pain every time I use one >> >>>> of those poorlyCased names)-- I end up using a lot of mental space >> >>>> remembering exactly HOW each stdlib isn't consistent. >> >>>> >> >>>> Aliasing consistent names in each case seems like a real win all >> >>>> around, personally. >> >>> >> >>> >> >>> For those that want consistent names, you could create a PyPI package >> >>> that is nothing more than the aliased names as suggested. >> >>> >> >>> Otherwise I get the desire for consistency, but as pointed out by a >> bunch >> >>> of other core devs, we have thought about this many times and always >> reach >> >>> the same conclusion that the amount of work and potential code >> breakage is >> >>> too great. >> >>> >> >>> -Brett >> >>> >> >>>> >> >>>> >> >>>> On Mon, Jul 25, 2016 at 10:55 AM, Ralph Broenink >> >>>> wrote: >> >>>> > Hi python-ideas, >> >>>> > >> >>>> > As you all know, the Python stdlib can sometimes be a bit of an >> >>>> > inconsistent >> >>>> > mess that can be surprising in how it names things. This is mostly >> >>>> > caused by >> >>>> > the fact that several modules were developed before the >> introduction >> >>>> > of >> >>>> > PEP-8, and now we're stuck with the older naming within these >> modules. >> >>>> > >> >>>> > It has been said and discussed in the past [1][2] that the stdlib >> is >> >>>> > in fact >> >>>> > inconsistent, but fixing this has almost always been disregarded as >> >>>> > being >> >>>> > too painful (after all, we don't want a new Python 3 all over >> again). >> >>>> > However, this way, we will never move away from these >> inconsistencies. 
>> >>>> > Perhaps this is fine, but I think we should at least consider >> >>>> > providing >> >>>> > function and class names that are unsurprising for developers. >> >>>> > >> >>>> > While maintaining full backwards compatibility, my idea is that we >> >>>> > should >> >>>> > offer consistently named aliases in -eventually- all stdlib >> modules. >> >>>> > For >> >>>> > instance, with Python 2.6, the threading module received this >> >>>> > treatment, but >> >>>> > unfortunately this was not expanded to all modules. >> >>>> > >> >>>> > What am I speaking of precisely? I have done a quick survey of the >> >>>> > stdlib >> >>>> > and found the following examples. Please note, this is a highly >> >>>> > opinionated >> >>>> > list; some names may have been chosen with a very good reason, and >> >>>> > others >> >>>> > are just a matter of taste. Hopefully you agree with at least some >> of >> >>>> > them: >> >>>> > >> >>>> > * The CamelCasing in some modules are the most obvious culprits, >> >>>> > e.g. >> >>>> > logging and unittest. There is obviously an issue regarding >> subclasses >> >>>> > and >> >>>> > methods that are supposed to be overridden, but I feel we could >> make >> >>>> > it >> >>>> > work. >> >>>> > >> >>>> > * All lower case class names, such as collections.defaultdict and >> >>>> > collections.deque, should be CamelCased. Another example is >> datetime, >> >>>> > which >> >>>> > uses names such as timedelta instead of TimeDelta. >> >>>> > >> >>>> > * Inconsistent names all together, such as re.sub, which I feel >> >>>> > should be >> >>>> > re.replace (cf. str.replace). But also re.finditer and re.findall, >> but >> >>>> > no >> >>>> > re.find. >> >>>> > >> >>>> > * Names that do not reflect actual usage, such as >> >>>> > ssl.PROTOCOL_SSLv23, >> >>>> > which can in fact not be used as client for SSLv2. 
>> >>>> > >> >>>> > * Underscore usage, such as tarfile.TarFile.gettarinfo (should it >> >>>> > not be >> >>>> > get_tar_info?), http.client.HTTPConnection.getresponse vs >> >>>> > set_debuglevel, >> >>>> > and pathlib.Path.samefile vs pathlib.Path.read_text. And is it >> >>>> > pkgutil.iter_modules or is it pathlib.Path.iterdir (or >> re.finditer)? >> >>>> > >> >>>> > * Usage of various abbreviations, such as in filecmp.cmp >> >>>> > >> >>>> > * Inconsistencies between similar modules, e.g. between >> >>>> > tarfile.TarFile.add and zipfile.ZipFile.write. >> >>>> > >> >>>> > These are just some examples of inconsistent and surprising naming >> I >> >>>> > could >> >>>> > find, other categories are probably also conceivable. Another >> subject >> >>>> > for >> >>>> > reconsideration would be attribute and argument names, but I >> haven't >> >>>> > looked >> >>>> > for those in my quick survey. >> >>>> > >> >>>> > For all of these inconsistencies, I think we should make a >> >>>> > 'consistently' >> >>>> > named alternative, and alias the original variant with them (or the >> >>>> > other >> >>>> > way around), without defining a deprecation timeline for the >> original >> >>>> > names. >> >>>> > This should make it possible to eventually make the stdlib >> consistent, >> >>>> > Pythonic and unsurprising. >> >>>> > >> >>>> > What would you think of such an effort? 
>> >>>> > >> >>>> > Regards, >> >>>> > Ralph Broenink >> >>>> > >> >>>> > [1] >> >>>> > >> https://mail.python.org/pipermail/python-ideas/2010-January/006755.html >> >>>> > [2] >> >>>> > >> https://mail.python.org/pipermail/python-dev/2009-March/086646.html >> >>>> > >> >>>> > >> >>>> > _______________________________________________ >> >>>> > Python-ideas mailing list >> >>>> > Python-ideas at python.org >> >>>> > https://mail.python.org/mailman/listinfo/python-ideas >> >>>> > Code of Conduct: http://python.org/psf/codeofconduct/ >> >>>> _______________________________________________ >> >>>> Python-ideas mailing list >> >>>> Python-ideas at python.org >> >>>> https://mail.python.org/mailman/listinfo/python-ideas >> >>>> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> >> >> _______________________________________________ >> >> Python-ideas mailing list >> >> Python-ideas at python.org >> >> https://mail.python.org/mailman/listinfo/python-ideas >> >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > >> > >> > >> > _______________________________________________ >> > Python-ideas mailing list >> > Python-ideas at python.org >> > https://mail.python.org/mailman/listinfo/python-ideas >> > Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> -- >> --Guido van Rossum (python.org/~guido) >> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Giampaolo - http://grodola.blogspot.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From boekewurm at gmail.com Mon Aug 1 11:35:48 2016 From: boekewurm at gmail.com (Matthias welp) Date: Mon, 1 Aug 2016 17:35:48 +0200 Subject: [Python-ideas] Making the stdlib consistent again In-Reply-To: References: Message-ID: On 1 August 2016 at 16:23, Chris Angelico wrote: >On Mon, Aug 1, 2016 at 11:31 PM, Matthias welp wrote: >>> Now, suppose that the "external function" switches to use the name >>> collections.DefaultDict. The above code will break, unless the two >>> names defaultdict and DefaultDict are updated in parallel somehow to >>> both point to MyMockClass. How do you propose we support that? >> >> This would be fixed with the 'aliasing variable names'-solution. >> > > Not sure I follow; are you proposing that module attributes be able to > say "I'm the same as that guy over there"? That could be done with > descriptor protocol (think @property, where you can write a > getter/setter that has the actual value in a differently-named public > attribute), but normally, modules don't allow that, as you need to > mess with the class not the instance. But there have been numerous > proposals to make that easier for module authors, one way or another. > That would make at least some things easier - the mocking example > would work that way - but it'd still mean people have to grok more > than one name when reading code, and it'd most likely mess with > people's expectations in tracebacks etc (the function isn't called > what I thought it was called). Or is that not what you mean by > aliasing? By aliasing I meant that the names of the functions/variables/classes (variables) are all using the same value pointer/location. That could mean that in debugging the name of the variable is used as the name of the function. e.g. debugging get_int results in 'in get_int(), line 5' but its original getint results in 'in getint(), line 5'. Another option is 'in get_int(), line 5 of getint()' if you want to retain the source location and name.
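Since Python 3.5 the module-level variant of this aliasing can be sketched directly, using the writable module __class__ that Nick mentions elsewhere in the thread; the module and function names below are hypothetical:

```python
import types

# A throwaway module standing in for a stdlib module.
mod = types.ModuleType("fakelib")
mod.getint = lambda s: int(s)  # the historical name

class _AliasedModule(types.ModuleType):
    @property
    def get_int(self):
        # The new name is a live alias for the old attribute.
        return self.getint

mod.__class__ = _AliasedModule  # assigning a module's __class__ works since 3.5

assert mod.get_int is mod.getint
mod.getint = lambda s: int(s, 16)  # monkey-patch the old name...
assert mod.get_int is mod.getint   # ...and the alias follows it
```

Because the property re-reads self.getint on every access, patching either a mock or a replacement onto the old name is automatically visible under the new name as well, which addresses the mocking objection quoted above.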
It could be descriptors for Python library implementations, but in C it could be implemented as a pointer to instead of a struct containing , or it could compile(?) to use the same reference. I am not familiar enough with the structure of CPython and how its variable lookups are built, but these are just a few ideas. -Matthias -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Mon Aug 1 13:33:14 2016 From: brett at python.org (Brett Cannon) Date: Mon, 01 Aug 2016 17:33:14 +0000 Subject: [Python-ideas] Making the stdlib consistent again In-Reply-To: References: Message-ID: On Mon, Aug 1, 2016, 08:36 Matthias welp wrote: > On 1 August 2016 at 16:23, Chris Angelico wrote: > >On Mon, Aug 1, 2016 at 11:31 PM, Matthias welp > wrote: > >>> Now, suppose that the "external function" switches to use the name > >>> collections.DefaultDict. The above code will break, unless the two > >>> names defaultdict and DefaultDict are updated in parallel somehow to > >>> both point to MyMockClass. How do you propose we support that? > >> > >> This would be fixed with the 'aliasing variable names'-solution. > >> > > > > Not sure I follow; are you proposing that module attributes be able to > > say "I'm the same as that guy over there"? That could be done with > > descriptor protocol (think @property, where you can write a > > getter/setter that has the actual value in a differently-named public > > attribute), but normally, modules don't allow that, as you need to > > mess with the class not the instance. But there have been numerous > > proposals to make that easier for module authors, one way or another. > > That would make at least some things easier - the mocking example > > would work that way - but it'd still mean people have to grok more > > than one name when reading code, and it'd most likely mess with > > people's expectations in tracebacks etc (the function isn't called > > what I thought it was called).
Or is that not what you mean by > > aliasing? > > By aliasing I meant that the names of the fuctions/variables/ > classes (variables) are all using the same value pointer/location. > That could mean that in debugging the name of the variable > is used as the name of the function. e.g. debugging get_int > results in 'in get_int(), line 5' but it's original getint results in > 'in getint(), line 5'. Another option is 'in get_int(), line 5 of > getint()' > if you want to retain the source location and name. > And all of that requires work beyond simple aliasing by assignment. That means writing code to make this work as well as tests to make sure nothing breaks (on top of the documentation). Multiple core devs have now said why this isn't a technical problem. Nick has pointed out that unless someone does a study showing the new names would be worth the effort then the situation will not change. At this point the core devs have either muted this thread and thus you're not reaching them anymore or we are going to continue to give the same answer and we feel like we're repeating ourselves and this is a drain on our time. I know everyone involved on this thread means well (including you, Matthias), but do realize that not letting this topic go is a drain on people's time. Every email you write is taking the time of hundreds of people, so it's not just taking 10 minutes of my spare time to read and respond to this email while I'm on vacation (happy BC Day), but it's a minute for everyone else who is on this mailing list to simply decide what to do with it (you can't assume people can mute threads thanks to the variety of email clients out there). So every email sent to this list literally takes an accumulative time of hours from people to deal with. So when multiple core devs have given an answer and what it will take to change the situation then please accept that answer. 
Otherwise you run the risk of frustrating the core devs by making us feel like we're not being listened to or trusted. And that is part of what leads to burnout for the core devs. -brett > It could be descriptors for Python library implementations, but in C > it could be implemented as a pointer to instead of > a struct containing , or it could compile(?) to use > the same reference. > > I am not familiar enough with the structure of CPython and > how it's variable lookups are built, but these are just a few ideas. > > -Matthias > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Aug 1 15:00:11 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 1 Aug 2016 12:00:11 -0700 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <20160801024738.GC13777@ando.pearwood.info> References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> <20160801024738.GC13777@ando.pearwood.info> Message-ID: Something to keep in mind: the math module is written in C, and will remain that way for the time being (see recent discussion on, I think, this list, and also the discussion when we added math.isclose()), which means it will be for floats only. My first thought is that not every one-line function needs to be in the standard library. However, as this thread shows, there are some complications to be considered, so maybe it does make sense to have them hashed out. Regarding NaN: In [4]: nan = float('nan') In [6]: nan > 5 Out[6]: False In [7]: 5 > nan Out[7]: False This follows the IEEE spec -- so the only correct result from clip(x, float('nan')) is NaN.
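The unordered behaviour is worth spelling out, because it is what makes a min()/max()-based clamp unreliable around NaN: since every ordering comparison with NaN is False, min() and max() simply keep whichever argument they saw first. A small illustration of plain CPython semantics:

```python
import math

nan = float("nan")

# NaN is unordered: all ordering comparisons involving it are False.
assert not (nan > 5) and not (nan < 5) and not (nan == 5)

# min() scans left to right and only replaces its current best when a
# comparison returns True, so NaN's position decides the result:
assert math.isnan(min(nan, 5))  # 5 < nan is False, so nan is kept
assert min(5, nan) == 5         # nan < 5 is False, so 5 is kept
```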
Steven D'Aprano wrote: I don't care too much whether the parameters are mandatory or have > defaults, so long as it is *possible* to pass something for the lower > and upper bounds which mean "unbounded". I think the point was that if one of the limits is unbounded, then you can just use min or max... though I think I agree -- you may have code where the limits are sometimes unbounded, and sometimes not -- nice to have a way to have only one code path. > (1) Explicitly pass -INFINITY or +INFINITY as needed; but which that's it then. > infinity, float or Decimal? If you pass the wrong one, you may have to > pay the cost of converting your values to float/Decimal, which could end > up expensive if you have a lot of them. well, as above, if it's in the math module, it's only float.... you could add one to the Decimal module, too, I suppose. > (2) Pass a NAN as the bounds. With my implementation, that actually > works! But it's a surprising accident of implementation, it feels wrong > and looks weird, and violates IEEE754 -- don't do that. > (3) Use some special Infimum and Supremum objects which are smaller > than, and greater than, every other value. But we don't have such > objects, so you'd need to create your own. > that's what float('inf') already is -- let's use them. > (4) Use None as a placeholder for "no limit". That's my preferred > option. reasonable enough -- and would make the API a bit easier -- both for matching different types, and because there is no literal or pre-existing object for Inf. -Chris On Sun, Jul 31, 2016 at 7:47 PM, Steven D'Aprano wrote: > On Sun, Jul 31, 2016 at 09:38:44PM +0200, Victor Stinner wrote: > > I dislike this API. What's the point of calling clamp(x)? clamp(b, a) is > > min(a, b) and clamp(a, max_val=b) is just max(a, b). > > You have that the wrong way around. If you supply a lower-bounds, you > must take the max(), not the min(). If you supply a upper-bounds, you > take the min(), not the max(). It's easy to get wrong.
> > > > My point is that all parameters must be mandatory. > > I don't care too much whether the parameters are mandatory or have > defaults, so long as it is *possible* to pass something for the lower > and upper bounds which mean "unbounded". There are four obvious > alternatives (well three obvious ones and one surprising one): > > (1) Explicitly pass -INFINITY or +INFINITY as needed; but which > infinity, float or Decimal? If you pass the wrong one, you may have to > pay the cost of converting your values to float/Decimal, which could end > up expensive if you have a lot of them. > > (2) Pass a NAN as the bounds. With my implementation, that actually > works! But it's a surprising accident of implementation, it feels wrong > and looks weird, and again, it may require converting the values to > float/Decimal. > > (3) Use some special Infimum and Supremum objects which are smaller > than, and greater than, every other value. But we don't have such > objects, so you'd need to create your own. > > (4) Use None as a placeholder for "no limit". That's my preferred > option. > > Of course, even if None is accepted as "no limit", the caller can still > explicitly provide an infinity if they prefer. > > As I said, I don't particularly care whether the lower and upper bounds > have default values. But I think it is useful and elegant to accept None > (as well as infinity) to mean "no limit". > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Mon Aug 1 16:10:44 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 2 Aug 2016 06:10:44 +1000 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> <20160801024738.GC13777@ando.pearwood.info> Message-ID: <20160801201044.GA6608@ando.pearwood.info> On Mon, Aug 01, 2016 at 12:00:11PM -0700, Chris Barker wrote: > Something to keep in mind: > > the math module is written in C, and will remain that way for the time > being (see recent discussion on, I think, this list and also the discussion > when we added math.isclose() > > which means it will be for floats only. Not necessarily. py> import math py> math.factorial(100) 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000 Not a float :-) It means that this clamp() function would have to be implemented in C. It *doesn't* mean that it will have to convert its arguments to floats, or reject non-float arguments. As my implementation shows, this should work with any ordered numeric type if clamp() calls the Python < and > operators (i.e. the __lt__ and __gt__ dunders). Let the objects themselves do any numeric conversions *if necessary*, there's no need for clamp() to convert the arguments to floats and call the native C double < and > operators. (I presume that there's a way to call Python operators from C code.) > My first thought is that not every one line function needs to be in the > standard library. However, as this thread shows, there are some > complications to be considered, so maybe it does make sense to have them > hashed out. Indeed. > Regarding NaN: > > In [4]: nan = float('nan') > In [6]: nan > 5 > Out[6]: False > In [7]: 5 > nan > Out[7]: False NANs are *unordered* values: they are neither greater than, nor less than, any other value. 
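The semantics Steven describes fit in a few lines of pure Python; this sketch (of the described behaviour, not the actual proposed C patch) relies only on the < and > operators and treats None as "unbounded":

```python
import math
from fractions import Fraction

def clamp(value, lower=None, upper=None):
    """Return value limited to [lower, upper]; a bound of None means unbounded."""
    if lower is not None and value < lower:
        return lower
    if upper is not None and value > upper:
        return upper
    return value

# Duck-typed: ints, Fractions, Decimals etc. work without conversion,
# because only the objects' own __lt__/__gt__ are invoked.
assert clamp(5, 0, 10) == 5
assert clamp(-3, 0, 10) == 0
assert clamp(Fraction(7, 2), upper=3) == 3

# A NaN *value* falls through untouched: neither comparison is True.
assert math.isnan(clamp(float("nan"), 0, 10))
```

The last assertion also shows the "accident of implementation" discussed below: NaN bounds (or a NaN input) never trigger either branch, so the input is returned unchanged.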
> This follows the IEEE spec -- so the only correct result from > > clip(x, float('nan')) is NaN. I don't agree that this is the "only correct result". We only clamp the value if it is less than the lower bound, or greater than the upper bound. Otherwise we leave it untouched. So, given: clamp(x, lower, upper) we say: if x < lower: x = lower elif x > upper: x = upper If lower or upper are NANs, then neither condition will ever be true, and x will never be clamped to a NAN (unless it is already a NAN). That's why I said that it was an accident of implementation that passing a NAN as one of the lower or upper bounds will be equivalent to setting the bounds to minus/plus infinity: the value will never be less than NAN, or greater than NAN. I suppose we could rule that case out: if either bound is a NAN, raise an exception. But that will require a conversion to float, which may fail. I'd rather just document that passing NANs as bounds will lead to implementation-specific behaviour that you cannot rely on it. If you want to specify an unbounded limit, pass None or an infinity with the right sign. > Steven D'Aprano wrote: > > I don't care too much whether the parameters are mandatory or have > > defaults, so long as it is *possible* to pass something for the lower > > and upper bounds which mean "unbounded". > > I think the point was that if one of the liimts in unbounded, then you can > jsut use min or max... > > though I think I agree -- you may have code where the limits are sometimes > unbounded, and sometimes not -- nice to have a way to have only one code > path. That's exactly my thinking. The last thing you want to do is to inspect the bounds, then decide whether you need to call min(), max() or clamp(). Not only is it a pain, but as Victor inadvertently showed, it's easy to get mixed up and call the wrong function. > (1) Explicitly pass -INFINITY or +INFINITY as needed; > but which > > that's it then. > > > infinity, float or Decimal? 
If you pass the wrong one, you may have to > > pay the cost of converting your values to float/Decimal, which could end > > up expensive if you have a lot of them. > > well, as above, if it's in the math module, it's only float.... you could > add one ot the Decimal module, too, I suppose. I'm pretty sure that a C implementation can be type agnostic and simply rely on the Python < and > operators. > > (2) Pass a NAN as the bounds. With my implementation, that actually > > works! But it's a surprising accident of implementation, it feels wrong > > and looks weird, > > and violates IEEE754 -- don't do that. What part of IEEE-754 do you think it violates? I don't think it violates anything. But I agree, don't do that. If you do, you'll get whatever the implementation happens to do, no promises or guarantees. [...] > > (4) Use None as a placeholder for "no limit". That's my preferred > > option. > > reasonable enough -- and would make the API a bit easier -- both for > matching different types, and because there is no literal or pre-existing > object for Inf. I agree with that reasoning. -- Steve From alexander.belopolsky at gmail.com Mon Aug 1 16:17:40 2016 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 1 Aug 2016 16:17:40 -0400 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <20160801201044.GA6608@ando.pearwood.info> References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> <20160801024738.GC13777@ando.pearwood.info> <20160801201044.GA6608@ando.pearwood.info> Message-ID: On Mon, Aug 1, 2016 at 4:10 PM, Steven D'Aprano wrote: > (I presume that there's a way to call Python operators from C code.) Yes, see . From guido at python.org Mon Aug 1 17:31:16 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 1 Aug 2016 14:31:16 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 Message-ID: PEP 484 doesn't change Python's syntax. 
Therefore it has no good syntax to offer for declaring the type of variables, and instead you have to write e.g. a = 0 # type: float b = [] # type: List[int] c = None # type: Optional[str] I'd like to address this in the future, and I think the most elegant syntax would be to let you write these as follows: a: float = 0 b: List[int] = [] c: Optional[str] = None (I've considered a 'var' keyword in the past, but there just are too many variables named 'var' in my code. :-) There are some corner cases to consider. First, to declare a variable's type without giving it an initial value, we can write this: a: float Second, when these occur in a class body, they can define either class variables or instance variables. Do we need to be able to specify which? Third, there's an annoying thing with tuples/commas here. On the one hand, in a function declaration, we may see (a: int = 0, b: str = ''). On the other hand, in an assignment, we may see a, b = 0, '' Suppose we wanted to add types to the latter. Would we write this as a, b: int, str = 0, '' or as a: int, b: str = 0, '' ??? Personally I think neither is acceptable, and we should just write it as a: int = 0 b: str = '' but this is a slight step back from a, b = 0, '' # type: (int, str) -- --Guido van Rossum (python.org/~guido) From rosuav at gmail.com Mon Aug 1 17:40:39 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 2 Aug 2016 07:40:39 +1000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Tue, Aug 2, 2016 at 7:31 AM, Guido van Rossum wrote: > I'd like to address this in the future, and I think the most elegant > syntax would be to let you write these as follows: > > a: float = 0 > b: List[int] = [] > c: Optional[str] = None > > There are some corner cases to consider. Additional case, unless it's patently obvious to someone with more 484 experience than I: what happens with chained assignment? 
a = b = c = 0 Does each variable get separately tagged, or does one tag apply to all? ChrisA From ethan at stoneleaf.us Mon Aug 1 17:46:38 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 01 Aug 2016 14:46:38 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: <579FC33E.2030301@stoneleaf.us> On 08/01/2016 02:31 PM, Guido van Rossum wrote: > Third, there's an annoying thing with tuples/commas here. On the one > hand, in a function declaration, we may see (a: int = 0, b: str = ''). > On the other hand, in an assignment, we may see > > a, b = 0, '' > > Suppose we wanted to add types to the latter. Would we write this as > > a: int, b: str = 0, '' If keeping it all on one line, I find this far more readable: - it keeps the type right next to the name (imagine if there were five names and types) - it mirrors the function header style (one less thing to remember) -- ~Ethan~ From ethan at stoneleaf.us Mon Aug 1 17:49:12 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 01 Aug 2016 14:49:12 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: <579FC3D8.7070400@stoneleaf.us> On 08/01/2016 02:40 PM, Chris Angelico wrote: > On Tue, Aug 2, 2016 at 7:31 AM, Guido van Rossum wrote: >> I'd like to address this in the future, and I think the most elegant >> syntax would be to let you write these as follows: >> >> a: float = 0 >> b: List[int] = [] >> c: Optional[str] = None >> >> There are some corner cases to consider. > > Additional case, unless it's patently obvious to someone with more 484 > experience than I: what happens with chained assignment? > > a = b = c = 0 > > Does each variable get separately tagged, or does one tag apply to all? 
I would think each name would need to be typed: a: int = b: int = c: int = 0 However, if somebody was doing typing from the get-go I would imagine they would do: a, b, c: float -- ~Ethan~ From elazarg at gmail.com Mon Aug 1 18:10:49 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Mon, 01 Aug 2016 22:10:49 +0000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On 08/01/2016 02:31 PM, Guido van Rossum wrote: > (I've considered a 'var' keyword in the past, but there just are too > many variables named 'var' in my code. :-) > > There's the obvious 'local', completing 'global' and 'nonlocal'. - Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Mon Aug 1 18:40:43 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 1 Aug 2016 16:40:43 -0600 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Mon, Aug 1, 2016 at 3:31 PM, Guido van Rossum wrote: > PEP 484 doesn't change Python's syntax. Therefore it has no good > syntax to offer for declaring the type of variables, and instead you > have to write e.g. > > a = 0 # type: float > b = [] # type: List[int] > c = None # type: Optional[str] > > I'd like to address this in the future, and I think the most elegant > syntax would be to let you write these as follows: > > a: float = 0 > b: List[int] = [] > c: Optional[str] = None > > (I've considered a 'var' keyword in the past, but there just are too > many variables named 'var' in my code. :-) As noted by someone else, what about "local": local a: float It seems like nonlocal and global would need the same treatment: nonlocal b: List[int] global c: Optional[str] > > There are some corner cases to consider. 
First, to declare a > variable's type without giving it an initial value, we can write this: > > a: float Isn't that currently a NameError? Is it worth making this work while preserving the error case? A "local" keyword would solve the problem, no? > > Second, when these occur in a class body, they can define either class > variables or instance variables. Do we need to be able to specify > which? In the immediate case I'd expect the former. We don't currently have a canonical way to "declare" instance attributes in the class definition. It may be worth sorting that out separately and addressing the PEP 484 aspect at that point. > > Third, there's an annoying thing with tuples/commas here. What about unpacking into explicit displays? For example: (a, b) = 0, '' [a, b] = 0, '' -eric From guido at python.org Mon Aug 1 18:58:43 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 1 Aug 2016 15:58:43 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <579FC33E.2030301@stoneleaf.us> References: <579FC33E.2030301@stoneleaf.us> Message-ID: On Mon, Aug 1, 2016 at 2:46 PM, Ethan Furman wrote: > On 08/01/2016 02:31 PM, Guido van Rossum wrote: > >> Third, there's an annoying thing with tuples/commas here. On the one >> hand, in a function declaration, we may see (a: int = 0, b: str = ''). >> On the other hand, in an assignment, we may see >> >> a, b = 0, '' >> >> Suppose we wanted to add types to the latter. Would we write this as >> >> a: int, b: str = 0, '' > > > If keeping it all on one line, I find this far more readable: > > - it keeps the type right next the name (imagine if there five names and > types) > - it mirrors the function header style (one less thing to remember) The problem with this is that the relative priorities of '=' and ',' are inverted between argument lists and assignments. And the expression on the right might be a single variable whose type is a tuple. 
So we'd get a: int, b: str = x But the same thing in a function definition already has a different meaning: def foo(a: int, b: str = x): ... -- --Guido van Rossum (python.org/~guido) From dmoisset at machinalis.com Mon Aug 1 19:09:34 2016 From: dmoisset at machinalis.com (Daniel Moisset) Date: Tue, 2 Aug 2016 00:09:34 +0100 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Mon, Aug 1, 2016 at 11:40 PM, Eric Snow wrote: > > As noted by someone else, what about "local": > > local a: float > > It seems like nonlocal and global would need the same treatment: > > nonlocal b: List[int] > global c: Optional[str] > The last 2 examples are actually wrong. The use of "nonlocal" and "global" is inside functions, to refer to variables that were originally defined *outside* the function. The type declaration should be near the definition (in the global/outer scope), not on its use. If we're looking for a keyword-based syntax for empty definitions, one option would be: def a: float It uses an existing keyword so no risk of name collisions, the meaning is quite accurate (you're defining a new variable), and my guess is that the syntax is distinct enough (no parentheses or final colons which would indicate a function definition). OTOH it might break some usages of grep to look for functions, and autoindenters on code editors. -- Daniel F. Moisset - UK Country Manager www.machinalis.com Skype: @dmoisset -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Mon Aug 1 19:31:38 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 1 Aug 2016 16:31:38 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <579FC33E.2030301@stoneleaf.us> References: <579FC33E.2030301@stoneleaf.us> Message-ID: On Mon, Aug 1, 2016 at 2:46 PM, Ethan Furman wrote: > On 08/01/2016 02:31 PM, Guido van Rossum wrote: > >> Third, there's an annoying thing with tuples/commas here. On the one >> hand, in a function declaration, we may see (a: int = 0, b: str = ''). >> On the other hand, in an assignment, we may see >> >> a, b = 0, '' >> >> Suppose we wanted to add types to the latter. Would we write this as >> >> a: int, b: str = 0, '' > > > If keeping it all on one line, I find this far more readable: > > - it keeps the type right next to the name (imagine if there were five names and > types) > - it mirrors the function header style (one less thing to remember) But what would this do? a: int, b: str = x Does the x get distributed over (a, b) or does a remain unset? The analogy with assignment suggests that x gets distributed, but the analogy with function definitions says x only goes to b. -- --Guido van Rossum (python.org/~guido) From dmoisset at machinalis.com Mon Aug 1 19:35:25 2016 From: dmoisset at machinalis.com (Daniel Moisset) Date: Tue, 2 Aug 2016 00:35:25 +0100 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Mon, Aug 1, 2016 at 10:31 PM, Guido van Rossum wrote: > > Second, when these occur in a class body, they can define either class > variables or instance variables. Do we need to be able to specify > which? > I'd say that if I have a class C with a class variable cv, instance variable iv, a good type checking system should detect: C.cv # ok C.iv # error! 
C().iv # ok which is something that PEP484 doesn't clarify much (and mypy flags all 3 as valid) So in short, I think it is relevant to specify differently class vs instance vars. Suppose we wanted to add types to the latter. Would we write this as > > a, b: int, str = 0, '' > > or as > > a: int, b: str = 0, '' > > ??? Personally I think neither is acceptable, and we should just write it > as > > a: int = 0 > b: str = '' > > but this is a slight step back from > > a, b = 0, '' # type: (int, str) > I'm not especially fond of the "# type: (int, str)" notation. It works ok for 2 variables, but for more it is hard to mentally "align" the variable names to the type names, for example in: kind, text, start, end, line = token # type: (int, str, Tuple[int, int], Tuple[int, int], str) it's not easy to quickly answer "what is the type of 'end' here?". So I wouldn't miss that notation a lot if it went away. Given that, I'm happier with both the 2-line solution and the second one-liner, which both keep the types closer to the names. But given that, as you mentioned: kind: int, text:str, start: Tuple[int, int], end: Tuple[int, int], line: str = token looks a bit misleading (looks more like an assignment to token), perhaps it would avoid errors to accept as valid only: (kind: int, text:str, start: Tuple[int, int], end: Tuple[int, int], line: str) = token Another possibility, if you really love the current mypy syntax (perhaps both could be valid): (kind, text, start, end, line):(int, str, Tuple[int, int], Tuple[int, int], str) = token I don't like that one very much, but perhaps it inspires ideas in someone here. Other places to think about are: * Multiple assignment (Chris mentioned these) * loop variables (in a for statement, comprehensions, generator expressions) * lambda arguments -- Daniel F. Moisset - UK Country Manager www.machinalis.com Skype: @dmoisset -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From greg at krypto.org Mon Aug 1 19:39:50 2016 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 01 Aug 2016 23:39:50 +0000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Mon, Aug 1, 2016 at 2:32 PM Guido van Rossum wrote: > PEP 484 doesn't change Python's syntax. Therefore it has no good > syntax to offer for declaring the type of variables, and instead you > have to write e.g. > > a = 0 # type: float > b = [] # type: List[int] > c = None # type: Optional[str] > > I'd like to address this in the future, and I think the most elegant > syntax would be to let you write these as follows: > > a: float = 0 > b: List[int] = [] > c: Optional[str] = None > > (I've considered a 'var' keyword in the past, but there just are too > many variables named 'var' in my code. :-) > > My first impression of this given the trivial int and str examples is... Why are you declaring types for things that are plainly obvious? I guess that's a way of saying pick better examples. :) Ones where the types aren't implied by obvious literals on the RHS. Your examples using complex types such as List[int] and Optional[str] are already good ones as those can't be immediately inferred. b: str = module.something(a) is a better example as without knowledge of module.something we cannot immediately infer the type and thus the type declaration might be considered useful to prevent bugs rather than annoying to read and keep up to date. I predict it will be more useful for people to declare abstract interface-like types rather than concrete ones such as int or str anyways. (duck typing ftw) But my predictions shouldn't be taken too seriously. I want to see what happens. > There are some corner cases to consider. First, to declare a > variable's type without giving it an initial value, we can write this: > > a: float > I don't like this at all. 
We only allow pre-declaration without an assignment using keywords today. the 'local' suggestion others have mentioned is worth consideration but I worry any time we add a keyword as that breaks a lot of existing code. Cython uses 'cdef' for this but we obviously don't want that as it implies much more and isn't obvious outside of the cython context. You could potentially reuse the 'def' keyword for this. def a: List[float]. This would be a surprising new syntax for many who are used to searching code for r'^\s*def' to find function definitions. Precedent: Cython already overloads its own 'cdef' concept for both variable and function/method use. Potential alternative to the above def (ab)use: def a -> List[float] def a List[float] def List[float] a # copies the Cython ordering which seems to derive from C syntax for obvious reasons But the -> token really implies return value while the : token already implies variable type annotation. At first glance I'm not happy with these but arguments could be made. Second, when these occur in a class body, they can define either class > variables or instance variables. Do we need to be able to specify > which? > > Third, there's an annoying thing with tuples/commas here. On the one > hand, in a function declaration, we may see (a: int = 0, b: str = ''). > On the other hand, in an assignment, we may see > > a, b = 0, '' > > Suppose we wanted to add types to the latter. Would we write this as > > a, b: int, str = 0, '' > > or as > > a: int, b: str = 0, '' > > ??? Personally I think neither is acceptable, and we should just write it > as > > a: int = 0 > b: str = '' > Disallowing ": type" syntax in the presence of tuple assignment seems simple and wise to me. Easy to parse. But I understand if people disagree and want a defined way to do it. 
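To make the '='/',' precedence inversion being discussed here concrete, here is a small sketch; it runs on Python 3.6+, where the single-target annotated form eventually landed (PEP 526), and the names are just placeholders:

```python
# In a function header, '=' binds tighter than ',':
# the default attaches only to the parameter it follows.
def func(a: int, b: str = "hi"):
    return a, b

# In an assignment, ',' binds tighter than '=':
# the right-hand side is built as a tuple and unpacked across all targets.
a, b = 0, "hi"

# The form ultimately adopted sidesteps the ambiguity by allowing an
# annotation only on a single target, one per statement.
c: int = 0
d: str = "hi"
```

With that restriction, `a: int, b: str = x` is simply a syntax error, so the question of whether x distributes never arises.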
but this is a slight step back from > > a, b = 0, '' # type: (int, str) > > -- > --Guido van Rossum (python.org/~guido) > When thinking about how to spell this out in a PEP, it is worth taking into account existing ways of declaring types on variables in Python. Cython took the "Keyword Type Name" approach with "cdef double j" syntax. http://cython.readthedocs.io/en/latest/src/quickstart/cythonize.html Is it an error to write the following (poor style) code declaring a type for the same variable multiple times: c: int = module.count_things(x) compute_thing(c) if c > 3: c: str = module.get_thing(3) logging.info('end of thing 3: %s', c[-5:]) do_something(c) where c takes on multiple types within a single scope? static single assignment form would generate a c', c'', and union of c' and c'' types for the final do_something call to reason about that code. but it is entirely doable in Python and does happen in unfortunately real world messy code as variables are reused in bad ways. My preference would be to make it an error for more than one type to be declared for the same variable. First type ever mentioned within the scope wins and all others are SyntaxError worthy. Assigning to a variable in a scope before an assignment that declares its type should probably also be a SyntaxError. -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Aug 1 19:41:16 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 1 Aug 2016 16:41:16 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Mon, Aug 1, 2016 at 3:40 PM, Eric Snow wrote: > On Mon, Aug 1, 2016 at 3:31 PM, Guido van Rossum wrote: >> (I've considered a 'var' keyword in the past, but there just are too >> many variables named 'var' in my code. :-) > > As noted by someone else, what about "local": > > local a: float A common use of `# type:` is in class bodies. 
But `local` is a bad idea there. > It seems like nonlocal and global would need the same treatment: > > nonlocal b: List[int] > global c: Optional[str] Well, they don't support assignment either, so I'd like to keep them out of this if possible. >> There are some corner cases to consider. First, to declare a >> variable's type without giving it an initial value, we can write this: >> >> a: float > > Isn't that currently a NameError? Is it worth making this work while > preserving the error case? A "local" keyword would solve the problem, > no? Currently looks like a syntax error to me. >> Second, when these occur in a class body, they can define either class >> variables or instance variables. Do we need to be able to specify >> which? > > In the immediate case I'd expect the former. We don't currently have > a canonical way to "declare" instance attributes in the class > definition. It may be worth sorting that out separately and > addressing the PEP 484 aspect at that point. My observation is that most uses of `# type: ...` in class bodies is used to declare an instance variable. >> Third, there's an annoying thing with tuples/commas here. > > What about unpacking into explicit displays? For example: > > (a, b) = 0, '' > [a, b] = 0, '' > > -eric Yeah. -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Mon Aug 1 19:41:52 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 01 Aug 2016 16:41:52 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: <579FC33E.2030301@stoneleaf.us> Message-ID: <579FDE40.1040503@stoneleaf.us> On 08/01/2016 04:31 PM, Guido van Rossum wrote: > On Mon, Aug 1, 2016 at 2:46 PM, Ethan Furman wrote: >> On 08/01/2016 02:31 PM, Guido van Rossum wrote: >> >>> Third, there's an annoying thing with tuples/commas here. On the one >>> hand, in a function declaration, we may see (a: int = 0, b: str = ''). 
>>> On the other hand, in an assignment, we may see >>> >>> a, b = 0, '' >>> >>> Suppose we wanted to add types to the latter. Would we write this as >>> >>> a: int, b: str = 0, '' >> >> >> If keeping it all on one line, I find this far more readable: >> >> - it keeps the type right next to the name (imagine if there were five names and >> types) >> - it mirrors the function header style (one less thing to remember) > > But what would this do? > > a: int, b: str = x > > Does the x get distributed over (a, b) or does a remain unset? The > analogy with assignment suggests that x gets distributed, but the > analogy with function definitions says x only goes to b. When speaking of the function header I meant that we already have one way to say which name has which type, and it would be simpler (at least to understand) if we stick with one way to specify type. As far as questions such as "what would this do" I would say it should do the exact same thing as if the type info wasn't there: a: int, b: str = x simplifies to a, b = x and so x had better be a two-item iterable; on the other hand: def func(a: int, b: str = x): simplifies to def func(a, b=x): and so parameter a is mandatory while parameter b has a default and so is optional. -- ~Ethan~ From arek.bulski at gmail.com Mon Aug 1 20:12:12 2016 From: arek.bulski at gmail.com (Arek Bulski) Date: Tue, 2 Aug 2016 02:12:12 +0200 Subject: [Python-ideas] Proposing new file-like object methods Message-ID: <579fe554.d7092e0a.59607.8aa9@mx.google.com> I would like to see a few methods added to file() objects: - read(size, *, offset=None), so os.pread() can be used, bypassing the file descriptor position - write(data, *, offset=None), analogous to os.pwrite() - flush(sync=False), so os.fsync(f.fileno()) can optionally follow For text files, offset would probably be not supported or count in characters. New methods could be added like readp/writep/fsync instead of modifying current methods; it's debatable. 
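For reference, the os module already exposes the offset-based primitives these methods would wrap (os.pread/os.pwrite are Unix-only); a rough sketch of the intended semantics against a throwaway file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.bin")
fd = os.open(path, os.O_RDWR | os.O_CREAT)
os.write(fd, b"0123456789")          # fd position is now 10

# pread reads at an absolute offset and leaves the fd position alone.
chunk = os.pread(fd, 4, 5)           # 4 bytes starting at offset 5 -> b"5678"
pos = os.lseek(fd, 0, os.SEEK_CUR)   # still 10: pread did not seek

# pwrite likewise writes at an absolute offset without seeking.
os.pwrite(fd, b"AB", 0)

os.fsync(fd)                         # what flush(sync=True) would add
os.close(fd)
```

A read(size, offset=...) method would presumably just forward to os.pread(f.fileno(), size, offset) for binary files opened on a real descriptor.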
Please tell me how to go about pushing this mainstream. -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Mon Aug 1 20:27:44 2016 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 1 Aug 2016 19:27:44 -0500 Subject: [Python-ideas] Proposing new file-like object methods In-Reply-To: <579fe554.d7092e0a.59607.8aa9@mx.google.com> References: <579fe554.d7092e0a.59607.8aa9@mx.google.com> Message-ID: On Aug 1, 2016 7:13 PM, "Arek Bulski" wrote: > > I would like to see few methods added to file() objects: > > > > - read(size, *, offset=None) so os.pread() be used going around file description position https://docs.python.org/3/library/os.html#os.pread (Unix) file.pread? > > - write(data, *, offset=None) analog os.pwrite() https://docs.python.org/3/library/os.html#os.pwrite (Unix) file.pwrite? > > - flush(sync=False) so os.fsync(f.fileno()) can optionally follow https://docs.python.org/3/library/os.html#os.fsync (Unix, Windows) file.fsync? > > > > For text files, offset would probably be not supported or count in characters. New methods could be added like readp/writep/fsync instead of modifying current methods, its debatable. IDK how this works when wrapped with https://docs.python.org/2/library/codecs.html#codecs.open ? > > > > Please tell me how to go about pushing this mainstream. It's often easier to build a package with the new methods: - e.g. https://pypi.python.org/pypi/backports/1.0 - https://pypi.python.org/pypi/pathlib/1.0.1 https://pypi.python.org/pypi/path.py > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From random832 at fastmail.com Mon Aug 1 22:00:20 2016 From: random832 at fastmail.com (Random832) Date: Mon, 01 Aug 2016 22:00:20 -0400 Subject: [Python-ideas] Proposing new file-like object methods In-Reply-To: References: <579fe554.d7092e0a.59607.8aa9@mx.google.com> Message-ID: <1470103220.465244.683231857.15D2EF71@webmail.messagingengine.com> On Mon, Aug 1, 2016, at 20:27, Wes Turner wrote: > file.pread? Wouldn't that be rather like having os.fstatat? For the concept in general, I'm concerned that the equivalent functionality on windows may require that the I/O be done asynchronously, or that the file have been opened in a special way. Any Windows expert care to comment? Also, I don't think windows has pread [etc] functions. Will integrating support for it into high-level files require us to implement our own read/write logic independent of msvcrt? How does this interact with files that may have been opened in text mode via os.open or msvcrt.setmode; or do we not care about compatibility for this case? Should O_TEXT and msvcrt.setmode be deprecated? > > For text files, offset would probably be not supported or count in > characters. New methods could be added like readp/writep/fsync instead of > modifying current methods, its debatable. From steve at pearwood.info Mon Aug 1 22:55:09 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 2 Aug 2016 12:55:09 +1000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: <20160802025509.GD6608@ando.pearwood.info> On Mon, Aug 01, 2016 at 02:31:16PM -0700, Guido van Rossum wrote: > PEP 484 doesn't change Python's syntax. Therefore it has no good > syntax to offer for declaring the type of variables, and instead you > have to write e.g. 
> > a = 0 # type: float > b = [] # type: List[int] > c = None # type: Optional[str] > > I'd like to address this in the future, and I think the most elegant > syntax would be to let you write these as follows: > > a: float = 0 > b: List[int] = [] > c: Optional[str] = None Those examples look reasonable to me. [...] > Second, when these occur in a class body, they can define either class > variables or instance variables. Do we need to be able to specify > which? I would think so. Consider the case that you have Class.spam and Class().spam which may not be the same type. E.g. the class attribute (representing the default value used by all instances) might be a mandatory int, while the instance attribute might be Optional[int]. > Third, there's an annoying thing with tuples/commas here. On the one > hand, in a function declaration, we may see (a: int = 0, b: str = ''). > On the other hand, in an assignment, we may see > > a, b = 0, '' > > Suppose we wanted to add types to the latter. Would we write this as > > a, b: int, str = 0, '' > > or as > > a: int, b: str = 0, '' Require parens around the name:hint. (a:int), (b:str) = 0, '' Or just stick to a type hinting comment :-) What about this case? spam, eggs = [1, 2.0, 'foo'], (1, '') [a, b, c], [d, e] = spam, eggs That becomes: [(a:int), (b:float), (c:str)], [(x:int), (y:str)] = spam, eggs which is bearable, but just unpleasant enough to discourage people from doing it unless they really need to. -- Steve From steve at pearwood.info Mon Aug 1 22:57:12 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 2 Aug 2016 12:57:12 +1000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: <20160802025711.GE6608@ando.pearwood.info> On Tue, Aug 02, 2016 at 07:40:39AM +1000, Chris Angelico wrote: > Additional case, unless it's patently obvious to someone with more 484 > experience than I: what happens with chained assignment? 
> > a = b = c = 0 > > Does each variable get separately tagged, or does one tag apply to all? (a:int) = (b:Optional[int]) = c = 0 `a` is declared to always be an int, `b` declared to be an int or None, and `c` is not declared. -- Steve From eryksun at gmail.com Tue Aug 2 00:33:53 2016 From: eryksun at gmail.com (eryk sun) Date: Tue, 2 Aug 2016 04:33:53 +0000 Subject: [Python-ideas] Proposing new file-like object methods In-Reply-To: <1470103220.465244.683231857.15D2EF71@webmail.messagingengine.com> References: <579fe554.d7092e0a.59607.8aa9@mx.google.com> <1470103220.465244.683231857.15D2EF71@webmail.messagingengine.com> Message-ID: On Tue, Aug 2, 2016 at 2:00 AM, Random832 wrote: > On Mon, Aug 1, 2016, at 20:27, Wes Turner wrote: >> file.pread? > > Wouldn't that be rather like having os.fstatat? > > For the concept in general, I'm concerned that the equivalent > functionality on windows may require that the I/O be done > asynchronously, or that the file have been opened in a special > way. Updating the file position is all or nothing. In synchronous mode, which is what the CRT's low I/O implementation uses, the file position is always updated. However, one can atomically set the file position before a read by passing an overlapped with an offset to ReadFile, which in turn sets the ByteOffset argument of the underlying NtReadFile [1] system call. 
[1]: https://msdn.microsoft.com/en-us/library/ff567072 For example: import _winapi import ctypes from ctypes import wintypes kernel32 = ctypes.WinDLL('kernel32', use_last_error=True) ULONG_PTR = wintypes.WPARAM class OVERLAPPED(ctypes.Structure): class _OFF_PTR(ctypes.Union): class _OFFSET(ctypes.Structure): _fields_ = (('Offset', wintypes.DWORD), ('OffsetHigh', wintypes.DWORD)) _fields_ = (('_offset', _OFFSET), ('Pointer', wintypes.LPVOID)) _anonymous_ = ('_offset',) _fields_ = (('Internal', ULONG_PTR), ('InternalHigh', ULONG_PTR), ('_off_ptr', _OFF_PTR), ('hEvent', wintypes.HANDLE)) _anonymous_ = ('_off_ptr',) with open('test.txt', 'wb') as f: f.write(b'0123456789') h = _winapi.CreateFile('test.txt', 0x80000000, 3, 0, 3, 0, 0) buf = (ctypes.c_char * 5)() n = (wintypes.DWORD * 1)() fp = (wintypes.LARGE_INTEGER * 1)() ov = (OVERLAPPED * 1)() ov[0].Offset = 5 >>> kernel32.ReadFile(h, buf, 5, n, ov) 1 >>> buf[:] b'56789' >>> kernel32.SetFilePointerEx(h, 0, fp, 1) 1 >>> fp[0] 10 Note that after reading 5 bytes at an offset of 5, the current file position is 10. With pread() it would still be at 0. The file position can be tracked and set atomically before each read or write, but I don't think this is practical unless it's part of a larger project to use the Windows API directly. From guido at python.org Tue Aug 2 00:57:12 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 1 Aug 2016 21:57:12 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <20160802025711.GE6608@ando.pearwood.info> References: <20160802025711.GE6608@ando.pearwood.info> Message-ID: The parentheses really strike me as too much complexity. You should just split it up into multiple lines, or use a semicolon. 
On Mon, Aug 1, 2016 at 7:57 PM, Steven D'Aprano wrote: > On Tue, Aug 02, 2016 at 07:40:39AM +1000, Chris Angelico wrote: > >> Additional case, unless it's patently obvious to someone with more 484 >> experience than I: what happens with chained assignment? >> >> a = b = c = 0 >> >> Does each variable get separately tagged, or does one tag apply to all? > > (a:int) = (b:Optional[int]) = c = 0 > > `a` is declared to always be an int, `b` declared to be an int or None, > and `c` is not declared. > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- --Guido van Rossum (python.org/~guido) From guido at python.org Tue Aug 2 01:02:33 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 1 Aug 2016 22:02:33 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <20160802025509.GD6608@ando.pearwood.info> References: <20160802025509.GD6608@ando.pearwood.info> Message-ID: Regarding class variables: in my experience instance variables way outnumber class variables, except when the latter are used as defaults for instance variables. See e.g. this bit of code from mypy (and many others, actually): https://github.com/python/mypy/blob/master/mypy/nodes.py#L154. All of these declare instance variables. Many of them have no need for a default in the class, but must give one anyway or else there's nothing to put the type comment on -- you can't write defs # type: List[Statement] (it would be a NameError) and you certainly don't want to write defs = [] # type: List[Statement] (else the gods of shared mutable state will curse you) so you have to make do with defs = None # type: List[Statement] But this would be totally reasonable as defs: List[Statement] under my proposal. 
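For illustration, here is that pattern under the class-body annotation syntax eventually standardized in PEP 526 (Python 3.6+); `Statement` stands in for the mypy node class, and `ClassVar` marks attributes that really are class-level:

```python
from typing import ClassVar, List

class Statement:           # stand-in for mypy's Statement node class
    pass

class Block:
    kind: ClassVar[str] = "block"   # true class variable, with a shared value
    defs: List[Statement]           # instance variable: annotation only, so no
                                    # class-level None (or mutable!) default

    def __init__(self) -> None:
        self.defs = []
```

A bare annotation in a class body records the type in Block.__annotations__ without creating the attribute itself, so there is no shared default for the gods of mutable state to curse.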
On Mon, Aug 1, 2016 at 7:55 PM, Steven D'Aprano wrote: > On Mon, Aug 01, 2016 at 02:31:16PM -0700, Guido van Rossum wrote: >> PEP 484 doesn't change Python's syntax. Therefore it has no good >> syntax to offer for declaring the type of variables, and instead you >> have to write e.g. >> >> a = 0 # type: float >> b = [] # type: List[int] >> c = None # type: Optional[str] >> >> I'd like to address this in the future, and I think the most elegant >> syntax would be to let you write these as follows: >> >> a: float = 0 >> b: List[int] = [] >> c: Optional[str] = None > > > Those examples look reasonable to me. > > > [...] >> Second, when these occur in a class body, they can define either class >> variables or instance variables. Do we need to be able to specify >> which? > > I would think so. Consider the case that you have Class.spam and > Class().spam which may not be the same type. E.g. the class attribute > (representing the default value used by all instances) might be a > mandatory int, while the instance attribute might be Optional[int]. > > >> Third, there's an annoying thing with tuples/commas here. On the one >> hand, in a function declaration, we may see (a: int = 0, b: str = ''). >> On the other hand, in an assignment, we may see >> >> a, b = 0, '' >> >> Suppose we wanted to add types to the latter. Would we write this as >> >> a, b: int, str = 0, '' >> >> or as >> >> a: int, b: str = 0, '' > > Require parens around the name:hint. > > (a:int), (b:str) = 0, '' > > Or just stick to a type hinting comment :-) > > > What about this case? > > spam, eggs = [1, 2.0, 'foo'], (1, '') > [a, b, c], [d, e] = spam, eggs > > That becomes: > > [(a:int), (b:float), (c:str)], [(x:int), (y:str)] = spam, eggs > > which is bearable, but just unpleasant enough to discourage people from > doing it unless they really need to. 
> > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- --Guido van Rossum (python.org/~guido) From guido at python.org Tue Aug 2 01:14:30 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 1 Aug 2016 22:14:30 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: What makes you respond so vehemently to `a: float`? The `def` keyword has been proposed before, but *I* have a vehement response to it; `def` is for functions. The Cython syntax may live forever in Cython, but I don't want to add it to Python, especially since we already have function annotations using `var: type = default` -- variable declarations must somehow rhyme with this. The `local` keyword isn't horrible but would still require a `__future__` import and a long wait, which is why I'm exploring just `var: type = value`. I think we can pull it off syntactically, using a trick no more horrible than the one we already employ to restrict the LHS of an assignment even though the grammar seen by the parser is something like `expr_stmt: testlist ('=' testlist)*` (at least it was something like this long ago -- it's more complex now but the same idea still applies, since the official parser is still LR(1)). Regarding scopes, I like the way mypy currently does this -- you can only have a `# type` comment on the first assignment of a variable, and scopes are flat as they are in Python. (Mypy is really anticipating a syntax for variable declarations here.) Seems we agree on this, at least. On Mon, Aug 1, 2016 at 4:39 PM, Gregory P. Smith wrote: > > On Mon, Aug 1, 2016 at 2:32 PM Guido van Rossum wrote: >> >> PEP 484 doesn't change Python's syntax. 
Therefore it has no good >> syntax to offer for declaring the type of variables, and instead you >> have to write e.g. >> >> a = 0 # type: float >> b = [] # type: List[int] >> c = None # type: Optional[str] >> >> I'd like to address this in the future, and I think the most elegant >> syntax would be to let you write these as follows: >> >> a: float = 0 >> b: List[int] = [] >> c: Optional[str] = None >> >> (I've considered a 'var' keyword in the past, but there just are too >> many variables named 'var' in my code. :-) >> > > My first impression of this given the trivial int and str examples is... Why > are you declaring types for things that are plainly obvious? I guess that's > a way of saying pick better examples. :) Ones where the types aren't > implied by obvious literals on the RHS. > > Your examples using complex types such as List[int] and Optional[str] are > already good ones as that can't be immediately inferred. > > b: str = module.something(a) > > is a better example as without knowledge of module.something we cannot > immediately infer the type and thus the type declaration might be considered > useful to prevent bugs rather than annoying read and keep up to date. > > I predict it will be more useful for people to declare abstract > interface-like types rather than concrete ones such as int or str anyways. > (duck typing ftw) But my predictions shouldn't be taken too seriously. I > want to see what happens. > >> >> There are some corner cases to consider. First, to declare a >> variable's type without giving it an initial value, we can write this: >> >> a: float > > > I don't like this at all. We only allow pre-declaration without an > assignment using keywords today. the 'local' suggestion others have > mentioned is worth consideration but I worry any time we add a keyword as > that breaks a lot of existing code. Cython uses 'cdef' for this but we > obviously don't want that as it implies much more and isn't obvious outside > of the cython context. 
>
> You could potentially reuse the 'def' keyword for this.
>
> def a: List[float].
>
> This would be a surprising new syntax for many who are used to searching
> code for r'^\s*def' to find function definitions. Precedent: Cython already
> overloads its own 'cdef' concept for both variable and function/method use.
>
> Potential alternative to the above def (ab)use:
>
> def a -> List[float]
> def a List[float]
> def List[float] a  # copies the Cython ordering which seems to derive from C
> syntax for obvious reasons
>
> But the -> token really implies return value while the : token already
> implies variable type annotation. At first glance I'm not happy with these
> but arguments could be made.
>
>> Second, when these occur in a class body, they can define either class
>> variables or instance variables. Do we need to be able to specify
>> which?
>>
>> Third, there's an annoying thing with tuples/commas here. On the one
>> hand, in a function declaration, we may see (a: int = 0, b: str = '').
>> On the other hand, in an assignment, we may see
>>
>> a, b = 0, ''
>>
>> Suppose we wanted to add types to the latter. Would we write this as
>>
>> a, b: int, str = 0, ''
>>
>> or as
>>
>> a: int, b: str = 0, ''
>>
>> ??? Personally I think neither is acceptable, and we should just write it
>> as
>>
>> a: int = 0
>> b: str = ''
>
> Disallowing ": type" syntax in the presence of tuple assignment seems simple
> and wise to me. Easy to parse. But I understand if people disagree and want
> a defined way to do it.
>
>> but this is a slight step back from
>>
>> a, b = 0, ''  # type: (int, str)
>>
>> --
>> --Guido van Rossum (python.org/~guido)
>
> When thinking about how to spell this out in a PEP, it is worth taking into
> account existing ways of declaring types on variables in Python. Cython took
> the "Keyword Type Name" approach with "cdef double j" syntax.
> http://cython.readthedocs.io/en/latest/src/quickstart/cythonize.html
>
> Is it an error to write the following (poor style) code declaring a type for
> the same variable multiple times:
>
> c: int = module.count_things(x)
> compute_thing(c)
> if c > 3:
>     c: str = module.get_thing(3)
>     logging.info('end of thing 3: %s', c[-5:])
> do_something(c)
>
> where c takes on multiple types within a single scope? Static single
> assignment form would generate a c', a c'', and a union of the c' and c''
> types for the final do_something call to reason about that code. But it is
> entirely doable in Python and unfortunately does happen in real-world messy
> code as variables are reused in bad ways.
>
> My preference would be to make it an error for more than one type to be
> declared for the same variable.
> First type ever mentioned within the scope wins and all others are
> SyntaxError worthy.
> Assigning to a variable in a scope before an assignment that declares its
> type should probably also be a SyntaxError.
>
> -gps

--
--Guido van Rossum (python.org/~guido)

From me+python at ixokai.io Tue Aug 2 01:19:12 2016
From: me+python at ixokai.io (Stephen Hansen)
Date: Mon, 01 Aug 2016 22:19:12 -0700
Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484
In-Reply-To:
References:
Message-ID: <1470115152.3980526.683342857.52586E3F@webmail.messagingengine.com>

On Mon, Aug 1, 2016, at 02:31 PM, Guido van Rossum wrote:
> Third, there's an annoying thing with tuples/commas here. On the one
> hand, in a function declaration, we may see (a: int = 0, b: str = '').
> On the other hand, in an assignment, we may see
>
> a, b = 0, ''
>
> Suppose we wanted to add types to the latter. Would we write this as
>
> a, b: int, str = 0, ''
>
> or as
>
> a: int, b: str = 0, ''
>
> ???
Personally I think neither is acceptable, and we should just write it > as > > a: int = 0 > b: str = '' > > but this is a slight step back from > > a, b = 0, '' # type: (int, str) To me, option A ("a, b: int, str") and option B ("a: int, b: str") are both... overly dense. If I had to choose between one, I'd choose B. Strongly. A is flatly wrong-- 'b' looks like it binds to 'int', but in fact, it binds to 'str'. This contradicts function annotations. But I still don't really like B. To me, I'd just disallow variable annotations in unpacking. Unpacking is a wonderful feature, I love unpacking, but to combine it with annotations is a cognitive overload. Maybe that means you don't use unpacking sometimes: that's okay. Unpacking is about being concise, to me at least, and if I'm doing annotations, that means I'm accepting being more verbose for the sake of static checking. You can't be all things to all people. In this case, I say, someone has to pick which their priority is: conciseness or static checking. -- Stephen Hansen m e @ i x o k a i . i o From brenbarn at brenbarn.net Tue Aug 2 01:25:48 2016 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Mon, 01 Aug 2016 22:25:48 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: <579FC33E.2030301@stoneleaf.us> Message-ID: <57A02EDC.405@brenbarn.net> On 2016-08-01 15:58, Guido van Rossum wrote: > The problem with this is that the relative priorities of '=' and ',' > are inverted between argument lists and assignments. And the > expression on the right might be a single variable whose type is a > tuple. So we'd get > > a: int, b: str = x > > But the same thing in a function definition already has a different meaning: > > def foo(a: int, b: str = x): ... Is that really a big deal? Insofar as it's a problem, it already exists for function arguments vs. assignments without any type annotation. 
That is:

a, b = x

already means something different from

def foo(a, b = x): ...

So in one sense, keeping type annotations with the variables would
actually maintain the current convention (namely, that assignment and
argument defaults have different conventions).

--
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no
path, and leave a trail."
--author unknown

From p.f.moore at gmail.com Tue Aug 2 03:54:24 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 2 Aug 2016 08:54:24 +0100
Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484
In-Reply-To: <579FDE40.1040503@stoneleaf.us>
References: <579FC33E.2030301@stoneleaf.us>
 <579FDE40.1040503@stoneleaf.us>
Message-ID:

On 2 August 2016 at 00:41, Ethan Furman wrote:
> As far as questions such as "what would this do" I would say it should
> do the exact same thing as if the type info wasn't there:
>
> a: int, b: str = x   simplifies to
>
> a, b = x
>
> and so x had better be a two-item iterable

For me,

a: int, b: str = x

immediately says "leave a unassigned, and assign x to b". Maybe that's
my C experience at work.

But

a, b = x

immediately says tuple unpacking.

So in my view, allowing annotations on unpacking syntax is going to
cause a *lot* of confusion, and I'd strongly argue for only allowing
type annotations on single variables:

VAR [: TYPE] [= INITIAL_VALUE]

For multiple variables, just use multiple lines:

a: int
b: str = x

or

a: int
b: str
a, b = x

depending on your intention.

From what Guido says, the main use case is class variables, where
complicated initialisations are extremely rare, so this shouldn't be a
problem in practice.

Paul

From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Aug 2 05:09:17 2016
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J.
Turnbull) Date: Tue, 2 Aug 2016 18:09:17 +0900 Subject: [Python-ideas] size of the installation of Python on mobile devices In-Reply-To: References: <8d577973-eaaf-21cf-3604-8ffd28773ba4@gmail.com> <579A49F8.9080708@egenix.com> <7bb89d4a-3930-d2fd-9736-2360008f561e@gmail.com> <22428.22132.611219.145348@turnbull.sk.tsukuba.ac.jp> <22430.57892.461912.677961@turnbull.sk.tsukuba.ac.jp> Message-ID: <22432.25405.170635.186167@turnbull.sk.tsukuba.ac.jp> Nick Coghlan writes: > On 1 August 2016 at 15:46, Stephen J. Turnbull > wrote: > > Victor Stinner writes: > > > > > Xavier is a core developer. He is free to dedicate his time to > > > supporting sourceless distribution :-) > > > > So are we all, core or not. But on Nick's terms (he even envisions > > releases with the "sourceless" build broken), I don't think adding to > > core is fair to Xavier's (and others') efforts in this direction. > > How would it be any different from our efforts to support other > platforms outside the primary set of Windows, Mac OS X, Linux, and > *BSD? This isn't platform support in the same sense, for one. This is a variant distribution mode. A better analogy would be the common packaging support we provide for RPMs, debs, and all the whatnots associated with different Linux and *BSD distros. AFAIK there is no such support, rather we leave it up to the individual distros, which mostly ignore our own packaging technologies (except where implied by directory structure). (OK, there's probably stuff in Tools that can be applied to that, but all I see that's obvious is a *distribution- specific* facility, Misc/RPM, which is regularly used.) Second, such platforms typical provide or lack features that *require* changes to the compilation of the interpreter or stdlib modules. By contrast, Xavier's core list can be provided by script AFAICS. 
> Things *definitely* break from time-to-time on those other less common
> setups, and when they do, folks submit patches to fix them after they
> notice (usually because they went to rebase their own work on a newer
> version and discovered things didn't work as expected). I'd see this
> working the same way - we wouldn't go out of our way to break
> sourceless builds, but if they did break, it would be on the folks
> that care about them to submit patches to resolve the problem.

I understand the functional similarity. I don't think the incentives
are the same. If Python crashes on your platform, you *must* fix
Python, and you've fixed it for everybody using the same platform. In
sourceless, if the build process leaves pyc turds or lacks something
you actually need, you may change your specialized build scripts that
do all the other stuff you need, and maybe even stop using Python's
sourceless support (eg, in the missing requirements case). It's not
yet clear that the same fixes will be appropriate for all sourceless
distros.

> The gain for folks that care would be getting a green light to pursue
> more robust support for that model in the standard library itself,
> such as clearly marking test cases that require linecache to be
> working

How does that green light depend on patches to the Makefile?

> and perhaps even eventually developing a mechanism along the lines
> of JavaScript sourcemaps that would allow linecache to keep
> working, even when running on a sourceless build of the standard
> library.

Why not wait until such features are in development, and at the same
time declare sourceless a supported distribution mode?

> > It would also set an unfortunate precedent.
>
> What precedent do you mean? That ./configure may contain options that
> aren't 100% reliable? That's already the case - I can assure you that
> we *do not* consistently test all of the options reported by
> "./configure --help", since what matters is that the options people
> are *actually using* keep working

It appears that you're arguing that this feature won't be used so it
doesn't matter if it works. Nothing is 100% reliable, that's a red
herring. The precedent I refer to is adding code of uncertain value
and completeness, that will require maintenance by somebody, on
speculation that it will be used.

I suggest that this patch is premature optimization. It's more
efficient to not add __pycache__ directories in the first place, but
it's not necessary to do it that way, it can be done by a separate
script. I believe that is true of all of the three options Xavier
proposed -- a bash script would probably be fewer than 10 pipelines.
If these were done by a script that could be jointly designed and
maintained in a separate project by the sourceless distributors, it
would be a lot more effective in discovering the common features that
all of them use in the same way, and even more so for important
features that only a subset use. Once there are features that *must*
be done in the core build process (eg, linecache sourcemap), then it
would be timely to move the common features there.

IOW, I don't see why the usual arguments for "put it on PyPI first"
don't apply here, with the usual strength.

> Or do you mean the precedent that we're OK with folks shipping the
> standard library sans source code?

Of course *I* don't mean that. If an economist fails to understand the
implications of licensing, he should change careers.
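For what it's worth, the "separate script" being argued about here is already nearly expressible with the stdlib alone. A hypothetical sketch (the greet module and the temporary tree are invented for illustration) that byte-compiles a tree, strips the .py sources, and imports from the remaining .pyc files:

```python
import compileall
import importlib
import pathlib
import sys
import tempfile

tree = pathlib.Path(tempfile.mkdtemp())
(tree / "greet.py").write_text("def hello():\n    return 'hi'\n")

# legacy=True writes greet.pyc next to greet.py (like `compileall -b`)
# instead of under __pycache__, so the .pyc stays importable after the
# source is removed.
compileall.compile_dir(str(tree), legacy=True, quiet=1)

for src in tree.rglob("*.py"):  # strip the sources
    src.unlink()

sys.path.insert(0, str(tree))
greet = importlib.import_module("greet")  # loaded from greet.pyc alone
print(greet.hello())  # hi
```

This is only a sketch of the workflow, not what the proposed configure option would do; the point is that a pyc-only tree is importable today, which is why "put it in a script first" is a live alternative to patching the core build.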
From stephanh42 at gmail.com Tue Aug 2 05:49:20 2016 From: stephanh42 at gmail.com (Stephan Houben) Date: Tue, 2 Aug 2016 11:49:20 +0200 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: <579FC33E.2030301@stoneleaf.us> <579FDE40.1040503@stoneleaf.us> Message-ID: Just to add some data points, here is how it is done in some other languages which have both types and tuple unpacking. In all cases I added the minimum of parentheses required. Scala: val (x : Int, y : String) = (42, "Hello") Ocaml: let (x: int), (y : string) = 42, "Hello" alternatively: let (x, y) : int * string = 42, "Hello" Note that if we omit the types we can do the Python-esque: let x, y = 42, "Hello" Stephan 2016-08-02 9:54 GMT+02:00 Paul Moore : > On 2 August 2016 at 00:41, Ethan Furman wrote: > > As far as questions such as "what would this do" I would say it should do > > the > > exact same thing as if the type info wasn't there: > > > > a: int, b: str = x simplifies to > > > > a, b = x > > > > and so x had better be a two-item iterable > > For me, > > a: int, b: str = x > > immediately says "leave a unassigned, and assign x to b". Maybe that's > my C experience at work. > > But > > a, b = x > > immediately says tuple unpacking. > > So in my view, allowing annotations on unpacking syntax is going to > cause a *lot* of confusion, and I'd strongly argue for only allowing > type annotations on single variables: > > VAR [: TYPE] [= INITIAL_VALUE] > > For multiple variables, just use multiple lines: > > a: int > b: str = x > > or > > a: int > b: str > a, b = x > > depending on your intention. > > From what Guido says, the main use case is class variables, where > complicated initialisations are extremely rare, so this shouldn't be a > problem in practice. 
> > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xdegaye at gmail.com Tue Aug 2 06:22:21 2016 From: xdegaye at gmail.com (Xavier de Gaye) Date: Tue, 2 Aug 2016 12:22:21 +0200 Subject: [Python-ideas] size of the installation of Python on mobile devices In-Reply-To: <22432.25405.170635.186167@turnbull.sk.tsukuba.ac.jp> References: <8d577973-eaaf-21cf-3604-8ffd28773ba4@gmail.com> <579A49F8.9080708@egenix.com> <7bb89d4a-3930-d2fd-9736-2360008f561e@gmail.com> <22428.22132.611219.145348@turnbull.sk.tsukuba.ac.jp> <22430.57892.461912.677961@turnbull.sk.tsukuba.ac.jp> <22432.25405.170635.186167@turnbull.sk.tsukuba.ac.jp> Message-ID: On 08/02/2016 11:09 AM, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > On 1 August 2016 at 15:46, Stephen J. Turnbull > > wrote: > > > Victor Stinner writes: > > > > > > > Xavier is a core developer. He is free to dedicate his time to > > > > supporting sourceless distribution :-) > > > > > > So are we all, core or not. But on Nick's terms (he even envisions > > > releases with the "sourceless" build broken), I don't think adding to > > > core is fair to Xavier's (and others') efforts in this direction. > > > > How would it be any different from our efforts to support other > > platforms outside the primary set of Windows, Mac OS X, Linux, and > > *BSD? > > This isn't platform support in the same sense, for one. This is a > variant distribution mode. A better analogy would be the common > packaging support we provide for RPMs, debs, and all the whatnots > associated with different Linux and *BSD distros. 
AFAIK there is no > such support, rather we leave it up to the individual distros, which > mostly ignore our own packaging technologies (except where implied by > directory structure). (OK, there's probably stuff in Tools that can > be applied to that, but all I see that's obvious is a *distribution- > specific* facility, Misc/RPM, which is regularly used.) You are mistaken. There is already at least one feature of the Python build system that fits exactly your analogy, it is called MULTIARCH. It applies to all unix systems whether they support or not multiarch (AFAIK archlinux, OS X, Android do not support multiarch). It is not a minor change in Python build system, adding about 140 lines in configure.ac and the add_multiarch_paths() function in setup.py. Xavier From ncoghlan at gmail.com Tue Aug 2 06:34:42 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 2 Aug 2016 20:34:42 +1000 Subject: [Python-ideas] size of the installation of Python on mobile devices In-Reply-To: <22432.25405.170635.186167@turnbull.sk.tsukuba.ac.jp> References: <8d577973-eaaf-21cf-3604-8ffd28773ba4@gmail.com> <579A49F8.9080708@egenix.com> <7bb89d4a-3930-d2fd-9736-2360008f561e@gmail.com> <22428.22132.611219.145348@turnbull.sk.tsukuba.ac.jp> <22430.57892.461912.677961@turnbull.sk.tsukuba.ac.jp> <22432.25405.170635.186167@turnbull.sk.tsukuba.ac.jp> Message-ID: On 2 August 2016 at 19:09, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > On 1 August 2016 at 15:46, Stephen J. Turnbull > > wrote: > > > Victor Stinner writes: > > > > > > > Xavier is a core developer. He is free to dedicate his time to > > > > supporting sourceless distribution :-) > > > > > > So are we all, core or not. But on Nick's terms (he even envisions > > > releases with the "sourceless" build broken), I don't think adding to > > > core is fair to Xavier's (and others') efforts in this direction. 
> > > > How would it be any different from our efforts to support other > > platforms outside the primary set of Windows, Mac OS X, Linux, and > > *BSD? > > This isn't platform support in the same sense, for one. This is a > variant distribution mode. A better analogy would be the common > packaging support we provide for RPMs, debs, and all the whatnots > associated with different Linux and *BSD distros. AFAIK there is no > such support, rather we leave it up to the individual distros, which > mostly ignore our own packaging technologies (except where implied by > directory structure). We have plenty of knobs & dials in the configure script primarily for use by *nix redistributors in their package build scripts (e.g. the various "--with-system-*" flags, as well as the Debian-specific MULTIARCH support that Xavier mentioned). The fact the existence of these *doesn't* readily spring to mind for folks not actively involved in Linux distro maintenance is a point in favour of why I think what Xavier is suggesting is reasonable: we have long experience with configure flags where the only folks that typically care those flags exist are the folks that need them, and when someone else inadvertently breaks those flags, the folks that care make the necessary updates needed to restore the functionality. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Aug 2 07:10:45 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 2 Aug 2016 21:10:45 +1000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: <579FC33E.2030301@stoneleaf.us> <579FDE40.1040503@stoneleaf.us> Message-ID: On 2 August 2016 at 17:54, Paul Moore wrote: > So in my view, allowing annotations on unpacking syntax is going to > cause a *lot* of confusion, and I'd strongly argue for only allowing > type annotations on single variables: > > VAR [: TYPE] [= INITIAL_VALUE] > > For multiple variables, just use multiple lines: > > a: int > b: str = x > > or > > a: int > b: str > a, b = x > > depending on your intention. > > From what Guido says, the main use case is class variables, where > complicated initialisations are extremely rare, so this shouldn't be a > problem in practice. +1 from me for this view: only allow syntactic annotation of single names without parentheses. That one rule eliminates all the cases where unpacking assignment and parameter assignment do starkly different things. *However*, even with that restriction, unambiguously annotated unpacking could still be supported via: (a, b) : Tuple[int, str] = x (a, b, *c) : Tuple[int, int, Tuple[int]] = y Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Aug 2 08:37:43 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. 
Turnbull) Date: Tue, 2 Aug 2016 21:37:43 +0900 Subject: [Python-ideas] size of the installation of Python on mobile devices In-Reply-To: References: <8d577973-eaaf-21cf-3604-8ffd28773ba4@gmail.com> <579A49F8.9080708@egenix.com> <7bb89d4a-3930-d2fd-9736-2360008f561e@gmail.com> <22428.22132.611219.145348@turnbull.sk.tsukuba.ac.jp> <22430.57892.461912.677961@turnbull.sk.tsukuba.ac.jp> <22432.25405.170635.186167@turnbull.sk.tsukuba.ac.jp> Message-ID: <22432.37911.335413.674163@turnbull.sk.tsukuba.ac.jp> Nick Coghlan writes: > The fact the existence of these *doesn't* readily spring to mind > for folks not actively involved in Linux distro maintenance is a > point in favour of why I think what Xavier is suggesting is > reasonable: I don't, and never did, think it's unreasonable -- sourceless makes a lot of technical sense to me in several environments, and I don't see any "ethical" reason to discourage sourceless in those environments either. Based on the evidence presented so far, I think adding to core is premature and the feature would benefit from cooperation among interested parties in a development environment that can move far faster than core will. But I concede I have no experience with this kind of option in Python to match yours or Xavier's. From arek.bulski at gmail.com Tue Aug 2 11:11:58 2016 From: arek.bulski at gmail.com (Arek Bulski) Date: Tue, 2 Aug 2016 17:11:58 +0200 Subject: [Python-ideas] Python-ideas Digest, Vol 117, Issue 13 In-Reply-To: References: Message-ID: I am moving the discussion on file-like methods to GitHub. Mailing list is not very usable IMHO. Please switch to that. 
https://github.com/python/peps/issues/66 pozdrawiam, Arkadiusz Bulski 2016-08-02 7:02 GMT+02:00 : > Send Python-ideas mailing list submissions to > python-ideas at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/python-ideas > or, via email, send a message with subject or body 'help' to > python-ideas-request at python.org > > You can reach the person managing the list at > python-ideas-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Python-ideas digest..." > > > Today's Topics: > > 1. Re: Trial balloon: adding variable type declarations in > support of PEP 484 (Steven D'Aprano) > 2. Re: Trial balloon: adding variable type declarations in > support of PEP 484 (Steven D'Aprano) > 3. Re: Proposing new file-like object methods (eryk sun) > 4. Re: Trial balloon: adding variable type declarations in > support of PEP 484 (Guido van Rossum) > 5. Re: Trial balloon: adding variable type declarations in > support of PEP 484 (Guido van Rossum) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 2 Aug 2016 12:55:09 +1000 > From: Steven D'Aprano > To: python-ideas at python.org > Subject: Re: [Python-ideas] Trial balloon: adding variable type > declarations in support of PEP 484 > Message-ID: <20160802025509.GD6608 at ando.pearwood.info> > Content-Type: text/plain; charset=us-ascii > > On Mon, Aug 01, 2016 at 02:31:16PM -0700, Guido van Rossum wrote: > > PEP 484 doesn't change Python's syntax. Therefore it has no good > > syntax to offer for declaring the type of variables, and instead you > > have to write e.g. 
> > > > a = 0 # type: float > > b = [] # type: List[int] > > c = None # type: Optional[str] > > > > I'd like to address this in the future, and I think the most elegant > > syntax would be to let you write these as follows: > > > > a: float = 0 > > b: List[int] = [] > > c: Optional[str] = None > > > Those examples look reasonable to me. > > > [...] > > Second, when these occur in a class body, they can define either class > > variables or instance variables. Do we need to be able to specify > > which? > > I would think so. Consider the case that you have Class.spam and > Class().spam which may not be the same type. E.g. the class attribute > (representing the default value used by all instances) might be a > mandatory int, while the instance attribute might be Optional[int]. > > > > Third, there's an annoying thing with tuples/commas here. On the one > > hand, in a function declaration, we may see (a: int = 0, b: str = ''). > > On the other hand, in an assignment, we may see > > > > a, b = 0, '' > > > > Suppose we wanted to add types to the latter. Would we write this as > > > > a, b: int, str = 0, '' > > > > or as > > > > a: int, b: str = 0, '' > > Require parens around the name:hint. > > (a:int), (b:str) = 0, '' > > Or just stick to a type hinting comment :-) > > > What about this case? > > spam, eggs = [1, 2.0, 'foo'], (1, '') > [a, b, c], [d, e] = spam, eggs > > That becomes: > > [(a:int), (b:float), (c:str)], [(x:int), (y:str)] = spam, eggs > > which is bearable, but just unpleasant enough to discourage people from > doing it unless they really need to. 
> > > -- > Steve > > > ------------------------------ > > Message: 2 > Date: Tue, 2 Aug 2016 12:57:12 +1000 > From: Steven D'Aprano > To: python-ideas at python.org > Subject: Re: [Python-ideas] Trial balloon: adding variable type > declarations in support of PEP 484 > Message-ID: <20160802025711.GE6608 at ando.pearwood.info> > Content-Type: text/plain; charset=us-ascii > > On Tue, Aug 02, 2016 at 07:40:39AM +1000, Chris Angelico wrote: > > > Additional case, unless it's patently obvious to someone with more 484 > > experience than I: what happens with chained assignment? > > > > a = b = c = 0 > > > > Does each variable get separately tagged, or does one tag apply to all? > > (a:int) = (b:Optional[int]) = c = 0 > > `a` is declared to always be an int, `b` declared to be an int or None, > and `c` is not declared. > > > > -- > Steve > > > ------------------------------ > > Message: 3 > Date: Tue, 2 Aug 2016 04:33:53 +0000 > From: eryk sun > To: python-ideas at python.org > Subject: Re: [Python-ideas] Proposing new file-like object methods > Message-ID: > < > CACL+1au+bX_tKQvxDJC1SBLSObKbqAUPjj9dpmcr9s1izq0Cng at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Tue, Aug 2, 2016 at 2:00 AM, Random832 wrote: > > On Mon, Aug 1, 2016, at 20:27, Wes Turner wrote: > >> file.pread? > > > > Wouldn't that be rather like having os.fstatat? > > > > For the concept in general, I'm concerned that the equivalent > > functionality on windows may require that the I/O be done > > asynchronously, or that the file have been opened in a special > > way. > > Updating the file position is all or nothing. In synchronous mode, > which is what the CRT's low I/O implementation uses, the file position > is always updated. However, one can atomically set the file position > before a read by passing an overlapped with an offset to ReadFile, > which in turn sets the ByteOffset argument of the underlying > NtReadFile [1] system call. 
>
> [1]: https://msdn.microsoft.com/en-us/library/ff567072
>
> For example:
>
>     import _winapi
>     import ctypes
>     from ctypes import wintypes
>
>     kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
>
>     ULONG_PTR = wintypes.WPARAM
>
>     class OVERLAPPED(ctypes.Structure):
>         class _OFF_PTR(ctypes.Union):
>             class _OFFSET(ctypes.Structure):
>                 _fields_ = (('Offset', wintypes.DWORD),
>                             ('OffsetHigh', wintypes.DWORD))
>             _fields_ = (('_offset', _OFFSET),
>                         ('Pointer', wintypes.LPVOID))
>             _anonymous_ = ('_offset',)
>         _fields_ = (('Internal', ULONG_PTR),
>                     ('InternalHigh', ULONG_PTR),
>                     ('_off_ptr', _OFF_PTR),
>                     ('hEvent', wintypes.HANDLE))
>         _anonymous_ = ('_off_ptr',)
>
>     with open('test.txt', 'wb') as f:
>         f.write(b'0123456789')
>
>     h = _winapi.CreateFile('test.txt', 0x80000000, 3, 0, 3, 0, 0)
>     buf = (ctypes.c_char * 5)()
>     n = (wintypes.DWORD * 1)()
>     fp = (wintypes.LARGE_INTEGER * 1)()
>
>     ov = (OVERLAPPED * 1)()
>     ov[0].Offset = 5
>
>     >>> kernel32.ReadFile(h, buf, 5, n, ov)
>     1
>     >>> buf[:]
>     b'56789'
>     >>> kernel32.SetFilePointerEx(h, 0, fp, 1)
>     1
>     >>> fp[0]
>     10
>
> Note that after reading 5 bytes at an offset of 5, the current file
> position is 10. With pread() it would still be at 0. The file position
> can be tracked and set atomically before each read or write, but I
> don't think this is practical unless it's part of a larger project to
> use the Windows API directly.
>
> ------------------------------
>
> Message: 4
> Date: Mon, 1 Aug 2016 21:57:12 -0700
> From: Guido van Rossum
> To: "Steven D'Aprano"
> Cc: Python-Ideas
> Subject: Re: [Python-ideas] Trial balloon: adding variable type
>         declarations in support of PEP 484
> Message-ID:
>         <CAP7+vJJLz2wRPpkrtb8j8v7vTRFqjbw_b4a5pwAv0-5HyKNf4w at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> The parentheses really strike me as too much complexity. You should
> just split it up into multiple lines, or use a semicolon.
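For contrast, the POSIX behaviour eryk sun is comparing against -- pread() reading at an offset while leaving the file position alone -- takes only a few lines with os.pread. This is a sketch assuming a Unix system (os.pread is not available on Windows):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b'0123456789')
    os.lseek(fd, 0, os.SEEK_SET)

    # Read 5 bytes at offset 5; unlike os.read(), os.pread() does not
    # move the file position.
    data = os.pread(fd, 5, 5)
    pos = os.lseek(fd, 0, os.SEEK_CUR)
    print(data)  # b'56789'
    print(pos)   # 0
finally:
    os.close(fd)
    os.unlink(path)
```

After the same 5-byte read at offset 5, the position here is still 0, whereas the Windows ReadFile-with-OVERLAPPED version above leaves it at 10 -- which is exactly the difference a file.pread() method would have to paper over.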
> > On Mon, Aug 1, 2016 at 7:57 PM, Steven D'Aprano > wrote: > > On Tue, Aug 02, 2016 at 07:40:39AM +1000, Chris Angelico wrote: > > > >> Additional case, unless it's patently obvious to someone with more 484 > >> experience than I: what happens with chained assignment? > >> > >> a = b = c = 0 > >> > >> Does each variable get separately tagged, or does one tag apply to all? > > > > (a:int) = (b:Optional[int]) = c = 0 > > > > `a` is declared to always be an int, `b` declared to be an int or None, > > and `c` is not declared. > > > > > > > > -- > > Steve > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > -- > --Guido van Rossum (python.org/~guido) > > > ------------------------------ > > Message: 5 > Date: Mon, 1 Aug 2016 22:02:33 -0700 > From: Guido van Rossum > To: "Steven D'Aprano" > Cc: Python-Ideas > Subject: Re: [Python-ideas] Trial balloon: adding variable type > declarations in support of PEP 484 > Message-ID: > < > CAP7+vJJHYLwzbnOqrbqD3vboktXG-Rp+8n-MYxm8uf_Ge2m_yQ at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > Regarding class variables: in my experience instance variables way > outnumber class variables, except when the latter are used as defaults > for instance variables. > > See e.g. this bit of code from mypy (and many others, actually): > https://github.com/python/mypy/blob/master/mypy/nodes.py#L154. All of > these declare instance variables. 
Many of them have no need for a > default in the class, but must give one anyway or else there's nothing > to put the type comment on -- you can't write > > defs # type: List[Statement] > > (it would be a NameError) and you certainly don't want to write > > defs = [] # type: List[Statement] > > (else the gods of shared mutable state will curse you) so you have to > make do with > > defs = None # type: List[Statement] > > But this would be totally reasonable as > > defs: List[Statement] > > under my proposal. > > On Mon, Aug 1, 2016 at 7:55 PM, Steven D'Aprano > wrote: > > On Mon, Aug 01, 2016 at 02:31:16PM -0700, Guido van Rossum wrote: > >> PEP 484 doesn't change Python's syntax. Therefore it has no good > >> syntax to offer for declaring the type of variables, and instead you > >> have to write e.g. > >> > >> a = 0 # type: float > >> b = [] # type: List[int] > >> c = None # type: Optional[str] > >> > >> I'd like to address this in the future, and I think the most elegant > >> syntax would be to let you write these as follows: > >> > >> a: float = 0 > >> b: List[int] = [] > >> c: Optional[str] = None > > > > > > Those examples look reasonable to me. > > > > > > [...] > >> Second, when these occur in a class body, they can define either class > >> variables or instance variables. Do we need to be able to specify > >> which? > > > > I would think so. Consider the case that you have Class.spam and > > Class().spam which may not be the same type. E.g. the class attribute > > (representing the default value used by all instances) might be a > > mandatory int, while the instance attribute might be Optional[int]. > > > > > >> Third, there's an annoying thing with tuples/commas here. On the one > >> hand, in a function declaration, we may see (a: int = 0, b: str = ''). > >> On the other hand, in an assignment, we may see > >> > >> a, b = 0, '' > >> > >> Suppose we wanted to add types to the latter. 
Would we write this as
> >>
> >> a, b: int, str = 0, ''
> >>
> >> or as
> >>
> >> a: int, b: str = 0, ''
> >
> > Require parens around the name:hint.
> >
> > (a:int), (b:str) = 0, ''
> >
> > Or just stick to a type hinting comment :-)
> >
> > What about this case?
> >
> > spam, eggs = [1, 2.0, 'foo'], (1, '')
> > [a, b, c], [d, e] = spam, eggs
> >
> > That becomes:
> >
> > [(a:int), (b:float), (c:str)], [(d:int), (e:str)] = spam, eggs
> >
> > which is bearable, but just unpleasant enough to discourage people from
> > doing it unless they really need to.
> >
> > --
> > Steve
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at python.org
> > https://mail.python.org/mailman/listinfo/python-ideas
> > Code of Conduct: http://python.org/psf/codeofconduct/
>
> --
> --Guido van Rossum (python.org/~guido)
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
>
> ------------------------------
>
> End of Python-ideas Digest, Vol 117, Issue 13
> *********************************************
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rosuav at gmail.com Tue Aug 2 11:16:00 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 3 Aug 2016 01:16:00 +1000
Subject: [Python-ideas] Python-ideas Digest, Vol 117, Issue 13
In-Reply-To: 
References: 
Message-ID: 

On Wed, Aug 3, 2016 at 1:11 AM, Arek Bulski wrote:
> I am moving the discussion on file-like methods to GitHub. Mailing list is
> not very usable IMHO. Please switch to that.
>
> https://github.com/python/peps/issues/66
>
> pozdrawiam,
> Arkadiusz Bulski

I think you'll find the mailing list far more usable if you turn off
digests and get the individual posts as they come.
ChrisA

From chris.barker at noaa.gov Tue Aug 2 13:02:11 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 2 Aug 2016 10:02:11 -0700
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To: <20160801201044.GA6608@ando.pearwood.info>
References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com>
 <20160801024738.GC13777@ando.pearwood.info>
 <20160801201044.GA6608@ando.pearwood.info>
Message-ID: 

On Mon, Aug 1, 2016 at 1:10 PM, Steven D'Aprano wrote:

> It means that this clamp() function would have to be implemented in C.
> It *doesn't* mean that it will have to convert its arguments to floats,
> or reject non-float arguments.
>

sure -- though I hope it would special-case and be most efficient for
floats. However, for the most part, the math module IS all about floats --
though I don't suppose there is any harm in allowing other types.

> This follows the IEEE spec -- so the only correct result from
> > > > clip(x, float('nan')) is NaN.
>
> I don't agree that this is the "only correct result".
>

I don't think IEEE 754 says anything about a "clip" function, but a NaN is
neither greater than, less than, nor equal to any value -- so when you ask,
for example, whether the input value is less than or equal to NaN, or
whether NaN is greater than the input, there is no answer -- the spirit of
IEEE NaN handling leads to NaN being the only correct result.

Note that I'm pretty sure that min() and max() are wrong here, too.

> That's why I said that it was an accident of implementation that passing
> a NAN as one of the lower or upper bounds will be equivalent to setting
> the bounds to minus/plus infinity:

exactly -- and we should not have the results be an accident of
implementation -- but rather be thought out, and follow IEEE 754 intent.

> I suppose we could rule that case out: if either bound is a NAN, raise
> an exception. But that will require a conversion to float, which may
> fail.
I'd rather just document that passing NANs as bounds will lead to > implementation-specific behaviour that you cannot rely on it. why not say that passing NaNs as bounds will result in NaN result? At least if the value is a float -- if it's anything else than maybe an exception, as NaN does not make sense for anything else anyway. > If you > want to specify an unbounded limit, pass None or an infinity with the > right sign. exactly -- that's there, so why not let NaN be NaN? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Aug 2 13:19:03 2016 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 2 Aug 2016 13:19:03 -0400 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> <20160801024738.GC13777@ando.pearwood.info> <20160801201044.GA6608@ando.pearwood.info> Message-ID: On Tue, Aug 2, 2016 at 1:02 PM, Chris Barker wrote: > I don't think IEE754 says anything about a "clip" function, but a NaN is > neither greater than, less than, nor equal to any value -- so when you ask > if, for example, for the input value if it is less than or equal to NaN, but > NaN if NaN is great then the input, there is no answer -- the spirit of IEEE > NaN handling leads to NaN being the only correct result. > > Note that I'm pretty sure that min() and max() are wrong here, too. 
Builtin max is wrong >>> nan = float('nan') >>> max(nan, 1) nan >>> max(1, nan) 1 but numpy's maximum gets it right: >>> numpy.maximum(nan, 1) nan >>> numpy.maximum(1, nan) nan And here is how numpy defines clip: >>> numpy.clip(nan, 1, 2) nan >>> numpy.clip(1, 1, nan) 1.0 >>> numpy.clip(1, nan, nan) 1.0 I am not sure I like the last two results. From steve at pearwood.info Tue Aug 2 14:22:04 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 3 Aug 2016 04:22:04 +1000 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> <20160801024738.GC13777@ando.pearwood.info> <20160801201044.GA6608@ando.pearwood.info> Message-ID: <20160802182203.GF6608@ando.pearwood.info> On Tue, Aug 02, 2016 at 10:02:11AM -0700, Chris Barker wrote: > I don't think IEE754 says anything about a "clip" function, but a NaN is > neither greater than, less than, nor equal to any value -- so when you ask > if, for example, for the input value if it is less than or equal to NaN, Then the answer MUST be False. That's specified by IEEE-754. > but NaN if NaN is great then the input, there is no answer -- the spirit of > IEEE NaN handling leads to NaN being the only correct result. Incorrect. The IEEE standard actually does specify the behaviour of comparisons with NANs, and Python does it correctly. See also the Decimal module. > Note that I'm pretty sure that min() and max() are wrong here, too. In a later update to the standard, IEEE-854 if I remember correctly, there's a whole series of extra comparisons which will return NAN given a NAN argument, including alternate versions of max() and min(). I can't remember which is in 754 and which in 854, but there are two versions of each: min #1 (x, NAN) must return x min #2 (x, NAN) must return NAN and same for max. 
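The difference between the two versions can be sketched in a few lines of Python (an illustrative sketch only; the names ``min_num`` and ``min_nan`` are invented here and come from neither standard nor the ``math`` module):

```python
import math

def min_num(x, y):
    # "min #1" flavour: a quiet NaN is treated as missing data,
    # so the non-NaN operand wins.
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if x < y else y

def min_nan(x, y):
    # "min #2" flavour: any NaN operand poisons the result.
    if math.isnan(x) or math.isnan(y):
        return float('nan')
    return x if x < y else y

nan = float('nan')
print(min_num(nan, 1.0))  # 1.0
print(min_nan(nan, 1.0))  # nan
```

Note that the builtin ``min()`` is neither flavour: as with the ``max()`` examples above, ``min(nan, 1)`` keeps the NAN while ``min(1, nan)`` keeps the 1, because the result depends on argument order.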
In any case, clamping is based on < and > comparisons, which are
well-specified by IEEE 754 even when NANs are included:

# pseudo-code
for op in ( < <= == >= > ):
    assert all(x op NAN is False for all x)

assert all(x != NAN is True for all x)

If you want the comparisons to return NANs, you're looking at different
comparisons from a different standard.

> > That's why I said that it was an accident of implementation that passing
> > a NAN as one of the lower or upper bounds will be equivalent to setting
> > the bounds to minus/plus infinity:
>
> exactly -- and we should not have the results be an accident of
> implementation -- but rather be thought out, and follow IEEE 754 intent.

There are lots of places in Python where the behaviour is an accident of
implementation. I don't think that this clamp() function should convert
the arguments to floats (which may fail, or lose precision) just to
prevent the caller passing a NAN as one of the bounds. Just document the
fact that you shouldn't use NANs as lower/upper bounds.

> why not say that passing NaNs as bounds will result in NaN result?

Because that means that EVERY call to clamp() has to convert both bounds
to float and see if they are NANs. If you're calling this in a loop:

for x in range(1000):
    print(clamp(x, lower, upper))

each bound gets converted to float and checked for NAN-ness 1000 times.
This is a total waste of effort for 99.999% of uses, where the bounds
will be numbers.

> At least
> if the value is a float -- if it's anything else than maybe an exception,
> as NaN does not make sense for anything else anyway.

Of course it does: clamp() can change the result type, so it could return
a NAN. But why would you bother?

clamp(Fraction(1, 2), 0.75, 100) returns 0.75;
clamp(100, 0.0, 50.0) returns 50.0;

> > If you
> > want to specify an unbounded limit, pass None or an infinity with the
> > right sign.
>
> exactly -- that's there, so why not let NaN be NaN?

Because it is unnecessary.
If you want a NAN-enforcing version of clamp(), it is *easy* to write a wrapper: def clamp_nan(value, lower, upper): if math.isnan(lower) or math.isnan(upper): return float('nan') return clamp(value, lower, upper) A nice, easy four-line function. But if clamp() does that check, it's hard to avoid the checks when you don't want them. I know my bounds aren't NANs, and I'm calling clamp() in big loop. Don't check them a million times, they're never going to be NANs, just do the comparisons. It's easy to write a stricter function if you need it. It's hard to write a less strict function when you don't want the strictness. -- Steve From random832 at fastmail.com Tue Aug 2 14:45:49 2016 From: random832 at fastmail.com (Random832) Date: Tue, 02 Aug 2016 14:45:49 -0400 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <20160802182203.GF6608@ando.pearwood.info> References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> <20160801024738.GC13777@ando.pearwood.info> <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> Message-ID: <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> On Tue, Aug 2, 2016, at 14:22, Steven D'Aprano wrote: > In any case, clamping is based of < and > comparisons, which are > well-specified by IEEE 754 even when NANs are included: Sure, but what the standard doesn't say is exactly what sequence of comparisons is entailed by a clamp function. def clamp(x, a, b): if x < a: return a else: if x > b: return b else: return x def clamp(x, a, b): if a <= x: if x <= b: return x else: return b else: return a There are, technically, eight possible naive implementations, varying along three axes: - which of a or b is compared first - x < a (or a > x) vs x >= a (or a <= x) - x > b (or b < x) vs x <= b (or b >= x) And then there are implementations that may do more than two comparisons. 
def clamp(x, a, b):
    if a <= x <= b: return x
    elif x < a: return a
    else: return b

All such functions are equivalent if {a, b, x} is a set over which the
relational operators define a total ordering, and a <= b. However, this
is not the case if NaN is used for any of the arguments.

From victor.stinner at gmail.com Tue Aug 2 15:28:24 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 2 Aug 2016 21:28:24 +0200
Subject: [Python-ideas] size of the installation of Python on mobile devices
In-Reply-To: <22430.57892.461912.677961@turnbull.sk.tsukuba.ac.jp>
References: <8d577973-eaaf-21cf-3604-8ffd28773ba4@gmail.com>
 <579A49F8.9080708@egenix.com>
 <7bb89d4a-3930-d2fd-9736-2360008f561e@gmail.com>
 <22428.22132.611219.145348@turnbull.sk.tsukuba.ac.jp>
 <22430.57892.461912.677961@turnbull.sk.tsukuba.ac.jp>
Message-ID: 

Le 1 août 2016 07:46, "Stephen J. Turnbull" <
turnbull.stephen.fw at u.tsukuba.ac.jp> a écrit :
>
> Victor Stinner writes:
>
>  > Xavier is a core developer. He is free to dedicate his time to
>  > supporting sourceless distribution :-)
>
> So are we all, core or not. But on Nick's terms (he even envisions
> releases with the "sourceless" build broken), I don't think adding to
> core is fair to Xavier's (and others') efforts in this direction. It
> would also set an unfortunate precedent.
>
> Steve

Sorry, I don't have the bandwidth to follow this discussion closely.

I consider that mobile platforms (smartphones, tablets, and all these new
funny and shiny embedded devices) are important and Python would lose a
huge "market" if we decide that such platforms are not good enough for our
language...

I consider that it's important but I'm not interested enough to spend much
time on it. For example, I don't know configure and Makefile well, whereas
many changes are needed in these files to enhance cross compilation and
android support.

Sorry but I don't think that discussing if mobile platforms are important
is worth it. Check numbers.
More mobile are sold than computers. Compare markets of smartphones versus desktop computers and watch the fall of the desktop computer market... Can we now focus on fixing concrete technical issues? Xavier already explained that "sourceless" distribution are already used in the wild, it's not something new. It will not open a gate to the hell (of closed source softwares). People who want to protect their IP (hide their code) already patch and compile CPython... Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Aug 2 16:50:37 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 2 Aug 2016 13:50:37 -0700 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> <20160801024738.GC13777@ando.pearwood.info> <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> Message-ID: On Tue, Aug 2, 2016 at 11:45 AM, Random832 wrote: > Sure, but what the standard doesn't say is exactly what sequence of > comparisons is entailed by a clamp function. > > def clamp(x, a, b): > if x < a: return a > else: > if x > b: return b > else: return x > > def clamp(x, a, b): > if a <= x: > if x <= b: return x > else: return b > else: return a > > There are, technically, eight possible naive implementations, varying > along three axes: > Exactly-- I thought this was self evident, but apparently not -- thanks for spelling it out. > All such functions are equivalent if {a, b, x} is a set over which the > relational operators define a total ordering, and a <= b. However, this > is not the case if NaN is used for any of the arguments. 
>

Exactly again -- NaN's are kind of a pain :-(

As for the convert to floats issue -- correctness is more important than
performance, and performance is probably most important for the special
case of all floats. (or floats and integers, I suppose) -- I'm sure we can
find a solution. Likely something like the second option above would work
fine, and also work for anything with an ordinary total ordering.

-Chris

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From mertz at gnosis.cx Tue Aug 2 17:56:27 2016
From: mertz at gnosis.cx (David Mertz)
Date: Tue, 2 Aug 2016 17:56:27 -0400
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To: 
References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com>
 <20160801024738.GC13777@ando.pearwood.info>
 <20160801201044.GA6608@ando.pearwood.info>
 <20160802182203.GF6608@ando.pearwood.info>
 <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com>
Message-ID: 

It really doesn't make sense to me that a clamp() function would *limit
to* a NaN. I realize one can write various implementations that act
differently here, but the principle of least surprise seems violated by
letting a NaN be an actual end point IMO. numpy.clip() seems to behave
just right, FWIW.

If I'm asking for a value that is "not more than (less than) my bounds" I
don't want all my values to become NaN's by virtue of that. A regular
number is not affirmatively outside the bounds of a NaN in a commonsense
way. It's just not comparable to it at all. So for that purpose -- no
determinable bound -- a NaN amounts to the same thing as an Inf (but just
for this purpose, they are very different in other contexts).
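That reading can be spelled out in a few lines of Python (an editor's sketch, not numpy's actual implementation; it merely reproduces the three ``numpy.clip`` results shown earlier in the thread):

```python
import math

def clip(x, lo, hi):
    # A NaN *value* stays NaN: we cannot tell whether it is in range.
    if math.isnan(x):
        return float('nan')
    # A NaN *bound* acts as "no bound at all", like an infinity.
    if not math.isnan(lo) and x < lo:
        return lo
    if not math.isnan(hi) and x > hi:
        return hi
    return x

nan = float('nan')
print(clip(nan, -1, 1))   # nan
print(clip(1, nan, nan))  # 1
print(clip(5, 1, 2))      # 2
```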
I guess I think of clamping as "pulling in the values that are *definitely* outside a range." Nothing is definite that way with a NaN. So 'clamp(nan, -1, 1)' is conceptually 'nan' because the unknown value might or might not be "really" outside the range (but not definitely). And likewise 'clamp(X, nan, nan)' has to be X because we can't *know* X is outside the range. A NaN, conceptually, is a value that *might* exist, if only we knew more and could determine it.... but as is, it's just "unknown." On Tue, Aug 2, 2016 at 4:50 PM, Chris Barker wrote: > On Tue, Aug 2, 2016 at 11:45 AM, Random832 wrote: > >> Sure, but what the standard doesn't say is exactly what sequence of >> comparisons is entailed by a clamp function. >> >> def clamp(x, a, b): >> if x < a: return a >> else: >> if x > b: return b >> else: return x >> >> def clamp(x, a, b): >> if a <= x: >> if x <= b: return x >> else: return b >> else: return a >> >> There are, technically, eight possible naive implementations, varying >> along three axes: >> > > Exactly-- I thought this was self evident, but apparently not -- thanks > for spelling it out. > > >> All such functions are equivalent if {a, b, x} is a set over which the >> relational operators define a total ordering, and a <= b. However, this >> is not the case if NaN is used for any of the arguments. >> > > Exactly again -- NaN's are kind of a pain :-( > > As for the convert to floats issue -- correctness is more important than > performance, and performance is probably most important for the special > case of all floats. (or floats and integers, I suppose) -- i'm sure we can > find a solution. LIkely something iike the second option above would work > fine, and also work for anything with an ordinary total ordering. > > -Chris > > > -- > > Christopher Barker, Ph.D. 
> Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Aug 2 18:09:19 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 2 Aug 2016 15:09:19 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Mon, Aug 1, 2016 at 4:35 PM, Daniel Moisset wrote: > On Mon, Aug 1, 2016 at 10:31 PM, Guido van Rossum wrote: >> Second, when these occur in a class body, they can define either class >> variables or instance variables. Do we need to be able to specify >> which? > > I'd say that if I have a class C with a class variable cv, instance variable > iv, a good type checking system should detect: > > C.cv # ok > C.iv # error! > C().iv # ok > > which is something that PEP484 doesn't clarify much (and mypy flags all 3 as > valid) Yeah, this is all because you can't express that in Python either. When you see an assignment in a class body you can't tell if it's meant as an instance variable default or a class variable (except for some specific cases -- e.g. nested class definitions are pretty obvious :-). > So in short, I think it is relevant to specify differently class vs instance > vars. 
Agreed, we need to invent a workable proposal for this. Here's a strawman: - The default is an instance variable (backed by a class variable as default if there's an initial value) - To define a class variable, prefix the type with 'class` Example: class C: a: int # instance var b: List[int] = None # instance var c: class List[int] = [] # class var Class variables must come with an initializer; instance variables may or may not have an initializer. (Bonus: instance variable initializers must be immutable values.) Regarding everything involving multiple variables (either `a = b = 0` or `a, b = 0, 0`) I propose that those cannot be combined with types. Period. -- --Guido van Rossum (python.org/~guido) From yselivanov.ml at gmail.com Tue Aug 2 18:31:40 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 2 Aug 2016 18:31:40 -0400 Subject: [Python-ideas] PEP 525: Asynchronous Generators Message-ID: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> Hi, This is a new PEP to add asynchronous generators to Python 3.6. The PEP is also available at [1]. There is a reference implementation [2] that supports everything that the PEP proposes to add. [1] https://www.python.org/dev/peps/pep-0525/ [2] https://github.com/1st1/cpython/tree/async_gen Thank you! PEP: 525 Title: Asynchronous Generators Version: $Revision$ Last-Modified: $Date$ Author: Yury Selivanov Discussions-To: Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 28-Jul-2016 Python-Version: 3.6 Post-History: 02-Aug-2016 Abstract ======== PEP 492 introduced support for native coroutines and ``async``/``await`` syntax to Python 3.5. It is proposed here to extend Python's asynchronous capabilities by adding support for *asynchronous generators*. Rationale and Goals =================== Regular generators (introduced in PEP 255) enabled an elegant way of writing complex *data producers* and have them behave like an iterator. 
However, currently there is no equivalent concept for the *asynchronous iteration protocol* (``async for``). This makes writing asynchronous data producers unnecessarily complex, as one must define a class that implements ``__aiter__`` and ``__anext__`` to be able to use it in an ``async for`` statement. Essentially, the goals and rationale for PEP 255, applied to the asynchronous execution case, hold true for this proposal as well. Performance is an additional point for this proposal: in our testing of the reference implementation, asynchronous generators are **2x** faster than an equivalent implemented as an asynchronous iterator. As an illustration of the code quality improvement, consider the following class that prints numbers with a given delay once iterated:: class Ticker: """Yield numbers from 0 to `to` every `delay` seconds.""" def __init__(self, delay, to): self.delay = delay self.i = 0 self.to = to def __aiter__(self): return self async def __anext__(self): i = self.i if i >= self.to: raise StopAsyncIteration self.i += 1 if i: await asyncio.sleep(self.delay) return i The same can be implemented as a much simpler asynchronous generator:: async def ticker(delay, to): """Yield numbers from 0 to `to` every `delay` seconds.""" for i in range(to): yield i await asyncio.sleep(delay) Specification ============= This proposal introduces the concept of *asynchronous generators* to Python. This specification presumes knowledge of the implementation of generators and coroutines in Python (PEP 342, PEP 380 and PEP 492). 
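To see the proposed behaviour end to end, the ``ticker()`` generator from the rationale can be driven like this (a usage sketch; it uses ``asyncio.run()``, which only appeared in Python 3.7, after this PEP was written -- any way of running a coroutine to completion would do):

```python
import asyncio

async def ticker(delay, to):
    """Yield numbers from 0 to `to` every `delay` seconds."""
    for i in range(to):
        yield i
        await asyncio.sleep(delay)

async def main():
    seen = []
    # 'async for' drives __aiter__/__anext__ for us.
    async for i in ticker(0.001, 3):
        seen.append(i)
    return seen

print(asyncio.run(main()))  # [0, 1, 2]
```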
Asynchronous Generators ----------------------- A Python *generator* is any function containing one or more ``yield`` expressions:: def func(): # a function return def genfunc(): # a generator function yield We propose to use the same approach to define *asynchronous generators*:: async def coro(): # a coroutine function await smth() async def asyncgen(): # an asynchronous generator function await smth() yield 42 The result of calling an *asynchronous generator function* is an *asynchronous generator object*, which implements the asynchronous iteration protocol defined in PEP 492. It is a ``SyntaxError`` to have a non-empty ``return`` statement in an asynchronous generator. Support for Asynchronous Iteration Protocol ------------------------------------------- The protocol requires two special methods to be implemented: 1. An ``__aiter__`` method returning an *asynchronous iterator*. 2. An ``__anext__`` method returning an *awaitable* object, which uses ``StopIteration`` exception to "yield" values, and ``StopAsyncIteration`` exception to signal the end of the iteration. Asynchronous generators define both of these methods. Let's manually iterate over a simple asynchronous generator:: async def genfunc(): yield 1 yield 2 gen = genfunc() assert gen.__aiter__() is gen assert await gen.__anext__() == 1 assert await gen.__anext__() == 2 await gen.__anext__() # This line will raise StopAsyncIteration. Finalization ------------ PEP 492 requires an event loop or a scheduler to run coroutines. Because asynchronous generators are meant to be used from coroutines, they also require an event loop to run and finalize them. Asynchronous generators can have ``try..finally`` blocks, as well as ``async with``. It is important to provide a guarantee that, even when partially iterated, and then garbage collected, generators can be safely finalized. 
For example:: async def square_series(con, to): async with con.transaction(): cursor = con.cursor( 'SELECT generate_series(0, $1) AS i', to) async for row in cursor: yield row['i'] ** 2 async for i in square_series(con, 1000): if i == 100: break The above code defines an asynchronous generator that uses ``async with`` to iterate over a database cursor in a transaction. The generator is then iterated over with ``async for``, which interrupts the iteration at some point. The ``square_series()`` generator will then be garbage collected, and without a mechanism to asynchronously close the generator, Python interpreter would not be able to do anything. To solve this problem we propose to do the following: 1. Implement an ``aclose`` method on asynchronous generators returning a special *awaitable*. When awaited it throws a ``GeneratorExit`` into the suspended generator and iterates over it until either a ``GeneratorExit`` or a ``StopAsyncIteration`` occur. This is very similar to what the ``close()`` method does to regular Python generators, except that an event loop is required to execute ``aclose()``. 2. Raise a ``RuntimeError``, when an asynchronous generator executes a ``yield`` expression in its ``finally`` block (using ``await`` is fine, though):: async def gen(): try: yield finally: await asyncio.sleep(1) # Can use 'await'. yield # Cannot use 'yield', # this line will trigger a # RuntimeError. 3. Add two new methods to the ``sys`` module: ``set_asyncgen_finalizer()`` and ``get_asyncgen_finalizer()``. The idea behind ``sys.set_asyncgen_finalizer()`` is to allow event loops to handle generators finalization, so that the end user does not need to care about the finalization problem, and it just works. When an asynchronous generator is iterated for the first time, it stores a reference to the current finalizer. If there is none, a ``RuntimeError`` is raised. 
This provides a strong guarantee that every asynchronous generator object will always have a finalizer installed by the correct event loop. When an asynchronous generator is about to be garbage collected, it calls its cached finalizer. The assumption is that the finalizer will schedule an ``aclose()`` call with the loop that was active when the iteration started. For instance, here is how asyncio is modified to allow safe finalization of asynchronous generators:: # asyncio/base_events.py class BaseEventLoop: def run_forever(self): ... old_finalizer = sys.get_asyncgen_finalizer() sys.set_asyncgen_finalizer(self._finalize_asyncgen) try: ... finally: sys.set_asyncgen_finalizer(old_finalizer) ... def _finalize_asyncgen(self, gen): self.create_task(gen.aclose()) ``sys.set_asyncgen_finalizer()`` is thread-specific, so several event loops running in parallel threads can use it safely. Asynchronous Generator Object ----------------------------- The object is modeled after the standard Python generator object. Essentially, the behaviour of asynchronous generators is designed to replicate the behaviour of synchronous generators, with the only difference in that the API is asynchronous. The following methods and properties are defined: 1. ``agen.__aiter__()``: Returns ``agen``. 2. ``agen.__anext__()``: Returns an *awaitable*, that performs one asynchronous generator iteration when awaited. 3. ``agen.asend(val)``: Returns an *awaitable*, that pushes the ``val`` object in the ``agen`` generator. When the ``agen`` has not yet been iterated, ``val`` must be ``None``. Example:: async def gen(): await asyncio.sleep(0.1) v = yield 42 print(v) await asyncio.sleep(0.2) g = gen() await g.asend(None) # Will return 42 after sleeping # for 0.1 seconds. await g.asend('hello') # Will print 'hello' and # raise StopAsyncIteration # (after sleeping for 0.2 seconds.) 4. ``agen.athrow(typ, [val, [tb]])``: Returns an *awaitable*, that throws an exception into the ``agen`` generator. 
   Example::

       async def gen():
           try:
               await asyncio.sleep(0.1)
               yield 'hello'
           except ZeroDivisionError:
               await asyncio.sleep(0.2)
               yield 'world'

       g = gen()
       v = await g.asend(None)
       print(v)                # Will print 'hello' after
                               # sleeping for 0.1 seconds.

       v = await g.athrow(ZeroDivisionError)
       print(v)                # Will print 'world' after
                               # sleeping 0.2 seconds.

5. ``agen.aclose()``: Returns an *awaitable*, that throws a
   ``GeneratorExit`` exception into the generator.  The *awaitable*
   can either return a yielded value, if ``agen`` handled the
   exception, or ``agen`` will be closed and the exception will
   propagate back to the caller.

6. ``agen.__name__`` and ``agen.__qualname__``: readable and writable
   name and qualified name attributes.

7. ``agen.ag_await``: The object that ``agen`` is currently
   *awaiting* on, or ``None``.  This is similar to the currently
   available ``gi_yieldfrom`` for generators and ``cr_await`` for
   coroutines.

8. ``agen.ag_frame``, ``agen.ag_running``, and ``agen.ag_code``:
   defined in the same way as similar attributes of standard
   generators.

``StopIteration`` and ``StopAsyncIteration`` are not propagated out
of asynchronous generators, and are replaced with a ``RuntimeError``.


Implementation Details
----------------------

Asynchronous generator object (``PyAsyncGenObject``) shares the
struct layout with ``PyGenObject``.  In addition to that, the
reference implementation introduces three new objects:

1. ``PyAsyncGenASend``: the awaitable object that implements
   ``__anext__`` and ``asend()`` methods.

2. ``PyAsyncGenAThrow``: the awaitable object that implements
   ``athrow()`` and ``aclose()`` methods.

3. ``_PyAsyncGenWrappedValue``: every directly yielded object from an
   asynchronous generator is implicitly boxed into this structure.
   This is how the generator implementation can separate objects that
   are yielded using regular iteration protocol from objects that
   are yielded using asynchronous iteration protocol.
``PyAsyncGenASend`` and ``PyAsyncGenAThrow`` are awaitables (they
have ``__await__`` methods returning ``self``) and are coroutine-like
objects (implementing ``__iter__``, ``__next__``, ``send()`` and
``throw()`` methods).  Essentially, they control how asynchronous
generators are iterated:

.. image:: pep-0525-1.png
   :align: center
   :width: 80%


PyAsyncGenASend and PyAsyncGenAThrow
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``PyAsyncGenASend`` is a coroutine-like object that drives
``__anext__`` and ``asend()`` methods and implements the asynchronous
iteration protocol.

``agen.asend(val)`` and ``agen.__anext__()`` return instances of
``PyAsyncGenASend`` (which hold references back to the parent
``agen`` object.)

The data flow is defined as follows:

1. When ``PyAsyncGenASend.send(val)`` is called for the first time,
   ``val`` is pushed to the parent ``agen`` object (using existing
   facilities of ``PyGenObject``.)

   Subsequent iterations over the ``PyAsyncGenASend`` objects push
   ``None`` to ``agen``.

   When a ``_PyAsyncGenWrappedValue`` object is yielded, it is
   unboxed, and a ``StopIteration`` exception is raised with the
   unwrapped value as an argument.

2. When ``PyAsyncGenASend.throw(*exc)`` is called for the first time,
   ``*exc`` is thrown into the parent ``agen`` object.

   Subsequent iterations over the ``PyAsyncGenASend`` objects push
   ``None`` to ``agen``.

   When a ``_PyAsyncGenWrappedValue`` object is yielded, it is
   unboxed, and a ``StopIteration`` exception is raised with the
   unwrapped value as an argument.

3. ``return`` statements in asynchronous generators raise
   ``StopAsyncIteration`` exception, which is propagated through
   ``PyAsyncGenASend.send()`` and ``PyAsyncGenASend.throw()``
   methods.

``PyAsyncGenAThrow`` is very similar to ``PyAsyncGenASend``.  The
only difference is that ``PyAsyncGenAThrow.send()``, when called
first time, throws an exception into the parent ``agen`` object
(instead of pushing a value into it.)
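For illustration, the coroutine-like nature of these awaitables can be observed directly (on an interpreter that implements this PEP, i.e. Python 3.6+): for an asynchronous generator that yields without awaiting anything, a single ``send(None)`` on the awaitable returned by ``__anext__()`` immediately raises ``StopIteration`` carrying the unboxed yielded value. This sketch is illustrative and not part of the PEP text itself:

```python
async def gen():
    yield 1
    yield 2

g = gen()

# __anext__() returns a coroutine-like awaitable (PyAsyncGenASend).
aw = g.__anext__()
try:
    # Drive the generator one step, as an event loop would.
    aw.send(None)
except StopIteration as exc:
    # The wrapped yielded value is delivered via StopIteration.
    print(exc.value)  # -> 1
```

This is exactly the protocol an event loop relies on when it awaits ``__anext__()`` inside an ``async for`` loop.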
New Standard Library Functions and Types
----------------------------------------

1. ``types.AsyncGeneratorType`` -- type of asynchronous generator
   object.

2. ``sys.set_asyncgen_finalizer()`` and
   ``sys.get_asyncgen_finalizer()`` methods to set up asynchronous
   generators finalizers in event loops.

3. ``inspect.isasyncgen()`` and ``inspect.isasyncgenfunction()``
   introspection functions.


Backwards Compatibility
-----------------------

The proposal is fully backwards compatible.

In Python 3.5 it is a ``SyntaxError`` to define an ``async def``
function with a ``yield`` expression inside, therefore it's safe to
introduce asynchronous generators in 3.6.


Performance
===========

Regular Generators
------------------

There is no performance degradation for regular generators.  The
following micro benchmark runs at the same speed on CPython with and
without asynchronous generators::

    def gen():
        i = 0
        while i < 100000000:
            yield i
            i += 1

    list(gen())


Improvements over asynchronous iterators
----------------------------------------

The following micro-benchmark shows that asynchronous generators are
about **2.3x faster** than asynchronous iterators implemented in pure
Python::

    N = 10 ** 7

    async def agen():
        for i in range(N):
            yield i

    class AIter:
        def __init__(self):
            self.i = 0

        def __aiter__(self):
            return self

        async def __anext__(self):
            i = self.i
            if i >= N:
                raise StopAsyncIteration
            self.i += 1
            return i


Design Considerations
=====================

``aiter()`` and ``anext()`` builtins
------------------------------------

Originally, PEP 492 defined ``__aiter__`` as a method that should
return an *awaitable* object, resulting in an asynchronous iterator.

However, in CPython 3.5.2, ``__aiter__`` was redefined to return
asynchronous iterators directly.  To avoid breaking backwards
compatibility, it was decided that Python 3.6 will support both ways:
``__aiter__`` can still return an *awaitable* with a
``DeprecationWarning`` being issued.
Because of this dual nature of ``__aiter__`` in Python 3.6, we cannot
add a synchronous implementation of ``aiter()`` built-in.  Therefore,
it is proposed to wait until Python 3.7.


Asynchronous list/dict/set comprehensions
-----------------------------------------

Syntax for asynchronous comprehensions is unrelated to the
asynchronous generators machinery, and should be considered in a
separate PEP.


Asynchronous ``yield from``
---------------------------

While it is theoretically possible to implement ``yield from``
support for asynchronous generators, it would require a serious
redesign of the generators implementation.

``yield from`` is also less critical for asynchronous generators,
since there is no need to provide a mechanism for implementing
another coroutine protocol on top of coroutines.  And to compose
asynchronous generators a simple ``async for`` loop can be used::

    async def g1():
        yield 1
        yield 2

    async def g2():
        async for v in g1():
            yield v


Why the ``asend()`` and ``athrow()`` methods are necessary
----------------------------------------------------------

They make it possible to implement concepts similar to
``contextlib.contextmanager`` using asynchronous generators.  For
instance, with the proposed design, it is possible to implement the
following pattern::

    @async_context_manager
    async def ctx():
        await open()
        try:
            yield
        finally:
            await close()

    async with ctx():
        await ...

Another reason is that it is possible to push data and throw
exceptions into asynchronous generators using the object returned
from ``__anext__``, but it is hard to do that correctly.  Adding
explicit ``asend()`` and ``athrow()`` will pave a safe way to
accomplish that.

In terms of implementation, ``asend()`` is a slightly more generic
version of ``__anext__``, and ``athrow()`` is very similar to
``aclose()``.  Therefore having these methods defined for
asynchronous generators does not add any extra complexity.
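As a historical footnote, the ``@async_context_manager`` decorator sketched above was later standardized as ``contextlib.asynccontextmanager`` (Python 3.7). A runnable sketch of the same pattern using that later API; the ``log`` list here stands in for the ``await open()`` / ``await close()`` calls of the original example:

```python
import asyncio
import contextlib

log = []

@contextlib.asynccontextmanager   # Added in Python 3.7, after this PEP.
async def ctx():
    log.append('open')            # Stand-in for 'await open()'.
    try:
        yield                     # Control passes to the 'async with' body.
    finally:
        log.append('close')       # Stand-in for 'await close()'.

async def main():
    async with ctx():
        log.append('work')

asyncio.run(main())
print(log)   # -> ['open', 'work', 'close']
```

The ``finally`` block runs even if the ``async with`` body raises, which is exactly the cleanup guarantee the ``aclose()`` machinery is designed to preserve for abandoned generators.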
Example
=======

A working example with the current reference implementation (will
print numbers from 0 to 9 with one second delay)::

    async def ticker(delay, to):
        for i in range(to):
            yield i
            await asyncio.sleep(delay)

    async def run():
        async for i in ticker(1, 10):
            print(i)

    import asyncio
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(run())
    finally:
        loop.close()


Implementation
==============

The complete reference implementation is available at [1]_.


References
==========

.. [1] https://github.com/1st1/cpython/tree/async_gen


Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From tjreedy at udel.edu  Tue Aug  2 18:55:22 2016
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 2 Aug 2016 18:55:22 -0400
Subject: [Python-ideas] Trial balloon: adding variable type declarations
 in support of PEP 484
In-Reply-To: 
References: 
Message-ID: 

On 8/2/2016 1:14 AM, Guido van Rossum wrote:
> What makes you respond so vehemently to `a: float`?

I am not Gregory, but I also had a negative reaction.  It makes no
sense to me.

By your rule that annotations are optional and ignorable, 'a: float'
is a statement expression consisting of the identifier 'a'.  If one
ignores or removes the annotations of a function header, one is left
with a valid function header with the same meaning.  The only runtime
effect is on the .annotations attribute, and any consequence of that.
If one were to do the same with a (proposed) annotated assignment,
one would be left with a valid assignment, and there should be no
runtime effect either way.  If one removes ': float' from 'a: float',
one is left with 'a', a single-name expression statement.  To be
consistent, the addition or removal of the annotation should have no
runtime effect here either.

The meaning of the statement is 'if a is not bound to anything in the
namespace stack, raise NameError'.
In batch mode, the else part is to ignore it, i.e., 'pass'.  I have
never seen a name expression statement used this way in
non-interactive code.  It would be an obscure way to control program
flow.

Interactive mode adds 'else print(a)', which is useful, hence '>>> a'
is common.  This is not relevant to the offline use of annotations.

If 'a' is bound to something, then the annotation belongs on the
binding statement.  If it is not, then the annotation is irrelevant
unless added to the NameError message.

If, as I suspect, you meant 'a: float' to be a different kind of
statement, such as a static type declaration for the name 'a', it
would be a major change to Python, unlike adding type hints to
existing statements.  It would make the annotation required, not
optional.  It would complicate an annotation stripper, as 'a: float'
would have to be handled differently from 'a: float = 1.0'.

The existing scope declarations are a bit jarring also, but they have
the runtime effect of enabling non-local name binding.  The best
alternative to 'global n' I can think of would have been a tagged
'=', such as '=*' meaning 'bind in the global namespace'.  But this
would have had problems both initially and when adding augmented
assignments and non-local binding.

--
Terry Jan Reedy

From guido at python.org  Tue Aug  2 19:07:27 2016
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Aug 2016 16:07:27 -0700
Subject: [Python-ideas] Trial balloon: adding variable type declarations
 in support of PEP 484
In-Reply-To: 
References: 
Message-ID: 

On Tue, Aug 2, 2016 at 3:55 PM, Terry Reedy wrote:
> On 8/2/2016 1:14 AM, Guido van Rossum wrote:
>>
>> What makes you respond so vehemently to `a: float`?
>
> I am not Gregory, but I also had a negative reaction.  It makes no
> sense to me.
>
> By your rule that annotations are optional and ignorable, 'a: float'
> is a statement expression consisting of the identifier 'a'.
> If one ignores or removes the annotations of a function header, one
> is left with a valid function header with the same meaning.  The
> only runtime effect is on the .annotations attribute, and any
> consequence of that.  If one were to do the same with a (proposed)
> annotated assignment, would be left with a valid assignment, and
> there should be no runtime effect either way.  If one removes
> ': float' from 'a: float', one is left with 'a', a single-name
> expression statement.  To be consistent, the addition or removed of
> the annotation should have no runtime effect here either.

You're taking the "optional and ignorable" too literally.  My strawman
proposal for the semantics of

    a: float

is that the interpreter should evaluate the expression `float` and
then move on to the next line.  That's the same as what happens with
annotations in signatures.  (However this is a potential slow-down and
maybe we need to skip it.)

The idea of this syntax is that you can point to function definitions
and hand-wave a bit and say "just like an argument has an optional
type and an optional default, a variable can have either a type or an
initializer or both" (and then in very small print continue to explain
that if you leave out both the two situations differ, and ditto for
multiple assignment and unpacking).

> The meaning is the statement is 'if a is not bound to anything in
> the namespace stack, raise NameError'.  In batch mode, the else part
> is ignore it, ie, 'pass'.  I have never seen a name expression
> statement used this way in non-interactive code.  It would be an
> obscure way to control program flow.
>
> Interactive mode adds 'else print(a)', which is useful, hence
> '>>> a' is common.  This is not relevant to the offline use of
> annotations.

None of that is relevant, really. :-)

> If 'a' is bound to something, then the annotation belongs on the
> binding statement.  If it is not, then the annotation is irrelevant
> unless added to the NameError message.
> If, as I suspect. upi meant 'a: float' to be a different kind of
> statement, such as a static type declaration for the name 'a', it
> would be a major change to Python, unlike adding type hints to
> existing statements.  It would make the annotation required, not
> optional.  It would complicate an annotation stripper, as 'a: float'
> would have to be handled differently from 'a: float = 1.0'.

But there are no annotation strippers, only parsers that understand
the various annotation syntaxes and ignore the annotations.

> The existing scope declarations are a bit jarring also, but they
> have the runtime effect of enabling non-local name binding.  The
> best alternative to 'global n' I can think of would have been a
> tagged '=', such as '=*' meaning 'bind in the global namespace.
> But this would have had problems both initially and when adding
> augmented assignments and non-local binding.

This is entirely different from global/nonlocal -- the latter are
references to previously declared/initialized variables.  Here we are
declaring new local/class/instance variables.  Pretty much the
opposite.

--
--Guido van Rossum (python.org/~guido)

From rosuav at gmail.com  Tue Aug  2 19:11:39 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 3 Aug 2016 09:11:39 +1000
Subject: [Python-ideas] Trial balloon: adding variable type declarations
 in support of PEP 484
In-Reply-To: 
References: 
Message-ID: 

On Wed, Aug 3, 2016 at 9:07 AM, Guido van Rossum wrote:
>> If, as I suspect. upi meant 'a: float' to be a different kind of
>> statement, such as a static type declaration for the name 'a', it
>> would be a major change to Python, unlike adding type hints to
>> existing statements.  It would make the annotation required, not
>> optional.  It would complicate an annotation stripper, as
>> 'a: float' would have to be handled differently from
>> 'a: float = 1.0'.
> But there are no annotation strippers, only parsers that understand
> the various annotation syntaxes and ignore the annotations.

Hmm, is that true, or are there 3->2 tools that do that?  (Though
they'd just have to be special-cased to remove the entire line.)

ChrisA

From guido at python.org  Tue Aug  2 19:15:06 2016
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Aug 2016 16:15:06 -0700
Subject: [Python-ideas] Trial balloon: adding variable type declarations
 in support of PEP 484
In-Reply-To: 
References: 
Message-ID: 

On Tue, Aug 2, 2016 at 4:11 PM, Chris Angelico wrote:
> On Wed, Aug 3, 2016 at 9:07 AM, Guido van Rossum wrote:
>>> If, as I suspect. upi meant 'a: float' to be a different kind of
>>> statement, such as a static type declaration for the name 'a', it
>>> would be a major change to Python, unlike adding type hints to
>>> existing statements.  It would make the annotation required, not
>>> optional.  It would complicate an annotation stripper, as
>>> 'a: float' would have to be handled differently from
>>> 'a: float = 1.0'.
>>
>> But there are no annotation strippers, only parsers that understand
>> the various annotation syntaxes and ignore the annotations.
>
> Hmm, is that true, or are there 3->2 tools that do that?  (Though
> they'd just have to be special-cased to remove the entire line.)

We had one in mypy that pretended to be a codec, but it was a disaster
so I consider it a failed concept.
--
--Guido van Rossum (python.org/~guido)

From chris.barker at noaa.gov  Tue Aug  2 19:35:55 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 2 Aug 2016 16:35:55 -0700
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To: 
References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com>
 <20160801024738.GC13777@ando.pearwood.info>
 <20160801201044.GA6608@ando.pearwood.info>
 <20160802182203.GF6608@ando.pearwood.info>
 <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com>
Message-ID: 

On Tue, Aug 2, 2016 at 2:56 PM, David Mertz wrote:

> It really doesn't make sense to me that a clamp() function would
> *limit to* a NaN.  I realize one can write various implementations
> that act differently here, but the principle of least surprise seems
> violated by letting a NaN be an actual end point IMO.

NaN's rarely follow the principle of least surprise :-)

    In [7]: float('nan') == float('nan')
    Out[7]: False

and you are not letting it be an end point -- you are returning Not a
Number -- i.e. I have no idea what this value should be.

> If I'm asking for a value that is "not more than (less than) my
> bounds"

If your bounds are NaN, then you cannot know if your value is within
those bounds -- that's how NaN works.

> A NaN, conceptually, is a value that *might* exist, if only we knew
> more and could determine it.... but as is, it's just "unknown."

NaN is often used for missing values and the like, but that's not
quite what it means -- it means just what it says, NOT a number.  You
know nothing about it.  If someone is passing a NaN in for a bound,
then they are passing in garbage, essentially -- "I have no idea what
my bounds are" -- so garbage is what they should get back -- "I have
no idea what your clamped values are".

The reality is that NaNs tend to propagate through calculations --
once one gets introduced, you are very, very likely to get NaN as a
result -- this won't change that.
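For concreteness, the propagation behaviour being argued for in this thread can be sketched as follows; the function name and signature are purely illustrative (nothing like this exists in ``math``), and an empty interval (``hi < lo``) is also treated as NaN, as Greg Ewing suggests elsewhere in the thread:

```python
import math

def clamp(value, lo, hi):
    # Any NaN input, or an empty interval, yields NaN -- the
    # "garbage in, garbage out" behaviour argued for above.
    if any(map(math.isnan, (value, lo, hi))) or hi < lo:
        return float('nan')
    # Otherwise limit value to the closed interval [lo, hi].
    return max(lo, min(value, hi))

print(clamp(5, 0, 10))                # -> 5
print(clamp(15, 0, float('inf')))     # -> 15 (inf means "unbounded")
print(clamp(float('nan'), 0, 10))     # -> nan
```

Note how ``float('inf')`` / ``float('-inf')``, not NaN, play the role of "no bound" here.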
If you want unbounded, then don't use this function :-) -- or pass in
inf or -inf -- that's what they are for.  And they work for integers,
too:

    In [13]: float('inf') > 9999999999999999999999999999999
    Out[13]: True

If they don't work for other numeric types, then that should be fixed
in those types...

One final thought: How would a NaN find its way into this function?
Two ways:

1) the user specified it, thinking it might mean "unlimited" -- well,
don't do that!  It will fail the first test.

2) the limit was calculated in some way that resulted in a NaN --
well, in this case, they really have no idea what that limit should
be -- the NaN should absolutely be propagated, like it is for any
other arithmetic operation.

-CHB

PS: numpy may be a good place to look for precedent, but
unfortunately, it is not necessarily a good place to look for
carefully thought out implementations -- much of it was put in there
when someone needed it, without much discussion at all.  I'm sure
that NaN's behave the way they do in numpy.clip() because of how it
happens to be implemented, not because anyone carefully thought it
out.

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From alexander.belopolsky at gmail.com  Tue Aug  2 19:52:57 2016
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 2 Aug 2016 19:52:57 -0400
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To: 
References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com>
 <20160801024738.GC13777@ando.pearwood.info>
 <20160801201044.GA6608@ando.pearwood.info>
 <20160802182203.GF6608@ando.pearwood.info>
 <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com>
Message-ID: 

On Tue, Aug 2, 2016 at 7:35 PM, Chris Barker wrote:

> If you want unbounded, then don't use this function :-) -- or pass
> in inf or -inf -- that's what they are for.  And they work ...

+inf :-)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From acaceres at google.com  Tue Aug  2 20:48:36 2016
From: acaceres at google.com (Alvaro Caceres)
Date: Tue, 2 Aug 2016 19:48:36 -0500
Subject: [Python-ideas] Trial balloon: adding variable type declarations
 in support of PEP 484
In-Reply-To: 
References: 
Message-ID: 

On Tue, Aug 2, 2016 at 6:07 PM, Guido van Rossum wrote:

> On Tue, Aug 2, 2016 at 3:55 PM, Terry Reedy wrote:
> > On 8/2/2016 1:14 AM, Guido van Rossum wrote:
> >>
> >> What makes you respond so vehemently to `a: float`?
> >
> > I am not Gregory, but I also had a negative reaction.  It makes no
> > sense to me.
> >
> > By your rule that annotations are optional and ignorable,
> > 'a: float' is a statement expression consisting of the identifier
> > 'a'.  If one ignores or removes the annotations of a function
> > header, one is left with a valid function header with the same
> > meaning.  The only runtime effect is on the .annotations
> > attribute, and any consequence of that.  If one were to do the
> > same with a (proposed) annotated assignment, would be left with a
> > valid assignment, and there should be no runtime effect either
> > way.
> > If one removes ': float' from 'a: float', one is left with 'a', a
> > single-name expression statement.  To be consistent, the addition
> > or removed of the annotation should have no runtime effect here
> > either.
>
> You're taking the "optional and ignorable" too literally.  My
> strawman proposal for the semantics of
>
>     a: float
>
> is that the interpreter should evaluate the expression `float` and
> then move on to the next line.  That's the same as what happens with
> annotations in signatures.  (However this is a potential slow-down
> and maybe we need to skip it.)
>
> The idea of this syntax is that you can point to function
> definitions and hand-wave a bit and say "just like an argument has
> an optional type and an optional default, a variable can have either
> a type or an initializer or both" (and then in very small print
> continue to explain that if you leave out both the two situations
> differ, and ditto for multiple assignment and unpacking).

The criticism I would make about allowing variables without
assignments like a: float is that it makes my mental model of a
variable a little bit more complicated than it is currently.  If I
see "a" again a few lines below, it can either be pointing to some
object or be un-initialized.  Maybe the benefits are worth it, but I
don't really see it, and I wanted to point out this "cost".

Semi-related: what should happen if I do "del a" and then re-use the
variable name a few lines below?  My feeling is that the type
annotation should also get discarded when the variable is deleted.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From greg.ewing at canterbury.ac.nz  Tue Aug  2 19:23:06 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 03 Aug 2016 11:23:06 +1200
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To: 
References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com>
 <20160801024738.GC13777@ando.pearwood.info>
 <20160801201044.GA6608@ando.pearwood.info>
 <20160802182203.GF6608@ando.pearwood.info>
 <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com>
Message-ID: <57A12B5A.90504@canterbury.ac.nz>

David Mertz wrote:
> It really doesn't make sense to me that a clamp() function would
> *limit to* a NaN.

Keep in mind that the NaNs involved have probably arisen from some
other computation that went wrong, and that the purpose of the whole
NaN system is to propagate an indication of that wrongness so that
it's evident in the final result.

So here's how I see it:

clamp(NaN, y, z) is asking "Is an unknown number between y and z?"
The answer to that is not known, so the result should be NaN.

clamp(x, y, NaN) is asking "Is x between y and an unknown number?"
If x > y, the answer to that is not known, so the result should be
NaN.

If x < y, you might argue that the result should be y.  But consider
clamp(x, 2, 1).  You're asking it to limit x to a value not less than
2 and not greater than 1.  There's no such number, so arguably the
result should be NaN.

If you accept that, then clamp(x, y, NaN) should be NaN in all cases,
since we don't know that the upper bound isn't less than the lower
bound.

So in summary, I think it should be:

clamp(NaN, y, z) --> NaN
clamp(x, NaN, z) --> NaN
clamp(x, y, NaN) --> NaN
clamp(x, y, z)   --> NaN if z < y

--
Greg

From greg at krypto.org  Tue Aug  2 21:15:46 2016
From: greg at krypto.org (Gregory P.
 Smith)
Date: Wed, 03 Aug 2016 01:15:46 +0000
Subject: [Python-ideas] Trial balloon: adding variable type declarations
 in support of PEP 484
In-Reply-To: 
References: 
Message-ID: 

On Mon, Aug 1, 2016 at 10:14 PM Guido van Rossum wrote:

> What makes you respond so vehemently to `a: float`?
>

First reaction.  It doesn't actually seem too bad now.  It is already
legal to just say `a` as a statement so this isn't much different.

    def f():
        a

is already legal.  So `a: List[float]` would at least be meaningful
rather than a meaningless statement that any linter should
question. :)

> The `def` keyword has been proposed before, but *I* have a vehement
> response to it; `def` is for functions.
>
> The Cython syntax may live forever in Cython, but I don't want to
> add it to Python, especially since we already have function
> annotations using `var: type = default` -- variable declarations
> must somehow rhyme with this.

I agree, cdef is their own mess. :)

> The `local` keyword isn't horrible but would still require a
> `__future__` import and a long wait, which is why I'm exploring just
> `var: type = value`.  I think we can pull it off syntactically,
> using a trick no more horrible than the one we already employ to
> restrict the LHS of an assignment even though the grammar seen by
> the parser is something like `expr_stmt: testlist ('=' testlist)*`
> (at least it was something like this long ago -- it's more complex
> now but the same idea still applies, since the official parser is
> still LR(1)).
>
> Regarding scopes, I like the way mypy currently does this -- you can
> only have a `# type` comment on the first assignment of a variable,
> and scopes are flat as they are in Python.  (Mypy is really
> anticipating a syntax for variable declarations here.)  Seems we
> agree on this, at least.

-gps

> On Mon, Aug 1, 2016 at 4:39 PM, Gregory P. Smith wrote:
> >
> > On Mon, Aug 1, 2016 at 2:32 PM Guido van Rossum wrote:
> >>
> >> PEP 484 doesn't change Python's syntax.
> >> Therefore it has no good syntax to offer for declaring the type
> >> of variables, and instead you have to write e.g.
> >>
> >>   a = 0  # type: float
> >>   b = []  # type: List[int]
> >>   c = None  # type: Optional[str]
> >>
> >> I'd like to address this in the future, and I think the most
> >> elegant syntax would be to let you write these as follows:
> >>
> >>   a: float = 0
> >>   b: List[int] = []
> >>   c: Optional[str] = None
> >>
> >> (I've considered a 'var' keyword in the past, but there just are
> >> too many variables named 'var' in my code. :-)
> >
> > My first impression of this given the trivial int and str examples
> > is... Why are you declaring types for things that are plainly
> > obvious?  I guess that's a way of saying pick better examples. :)
> > Ones where the types aren't implied by obvious literals on the
> > RHS.
> >
> > Your examples using complex types such as List[int] and
> > Optional[str] are already good ones as that can't be immediately
> > inferred.
> >
> >   b: str = module.something(a)
> >
> > is a better example as without knowledge of module.something we
> > cannot immediately infer the type and thus the type declaration
> > might be considered useful to prevent bugs rather than annoying to
> > read and keep up to date.
> >
> > I predict it will be more useful for people to declare abstract
> > interface-like types rather than concrete ones such as int or str
> > anyways.  (duck typing ftw)  But my predictions shouldn't be taken
> > too seriously.  I want to see what happens.
> >
> >> There are some corner cases to consider.  First, to declare a
> >> variable's type without giving it an initial value, we can write
> >> this:
> >>
> >>   a: float
> >
> > I don't like this at all.  We only allow pre-declaration without
> > an assignment using keywords today.  The 'local' suggestion others
> > have mentioned is worth consideration but I worry any time we add
> > a keyword as that breaks a lot of existing code.
> > Cython uses 'cdef' for this but we obviously don't want that as it
> > implies much more and isn't obvious outside of the cython context.
> >
> > You could potentially reuse the 'def' keyword for this.
> >
> >   def a: List[float]
> >
> > This would be a surprising new syntax for many who are used to
> > searching code for r'^\s*def' to find function definitions.
> > Precedent: Cython already overloads its own 'cdef' concept for
> > both variable and function/method use.
> >
> > Potential alternative to the above def (ab)use:
> >
> >   def a -> List[float]
> >   def a List[float]
> >   def List[float] a  # copies the Cython ordering which seems to
> >                      # derive from C syntax for obvious reasons
> >
> > But the -> token really implies return value while the : token
> > already implies variable type annotation.  At first glance I'm not
> > happy with these but arguments could be made.
> >
> >> Second, when these occur in a class body, they can define either
> >> class variables or instance variables.  Do we need to be able to
> >> specify which?
> >>
> >> Third, there's an annoying thing with tuples/commas here.  On the
> >> one hand, in a function declaration, we may see
> >> (a: int = 0, b: str = '').  On the other hand, in an assignment,
> >> we may see
> >>
> >>   a, b = 0, ''
> >>
> >> Suppose we wanted to add types to the latter.  Would we write
> >> this as
> >>
> >>   a, b: int, str = 0, ''
> >>
> >> or as
> >>
> >>   a: int, b: str = 0, ''
> >>
> >> ??? Personally I think neither is acceptable, and we should just
> >> write it as
> >>
> >>   a: int = 0
> >>   b: str = ''
> >
> > Disallowing ": type" syntax in the presence of tuple assignment
> > seems simple and wise to me.  Easy to parse.  But I understand if
> > people disagree and want a defined way to do it.
> >> but this is a slight step back from
> >>
> >>   a, b = 0, ''  # type: (int, str)
> >>
> >> --
> >> --Guido van Rossum (python.org/~guido)
>
> > When thinking about how to spell this out in a PEP, it is worth
> > taking into account existing ways of declaring types on variables
> > in Python.  Cython took the "Keyword Type Name" approach with
> > "cdef double j" syntax.
> > http://cython.readthedocs.io/en/latest/src/quickstart/cythonize.html
> >
> > Is it an error to write the following (poor style) code declaring
> > a type for the same variable multiple times:
> >
> >   c: int = module.count_things(x)
> >   compute_thing(c)
> >   if c > 3:
> >     c: str = module.get_thing(3)
> >     logging.info('end of thing 3: %s', c[-5:])
> >     do_something(c)
> >
> > where c takes on multiple types within a single scope?  static
> > single assignment form would generate a c', c'', and union of c'
> > and c'' types for the final do_something call to reason about that
> > code.  but it is entirely doable in Python and does happen in
> > unfortunately real world messy code as variables are reused in bad
> > ways.
> >
> > My preference would be to make it an error for more than one type
> > to be declared for the same variable.
> > First type ever mentioned within the scope wins and all others are
> > SyntaxError worthy.
> > Assigning to a variable in a scope before an assignment that
> > declares its type should probably also be a SyntaxError.
> >
> > -gps

--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stefan_ml at behnel.de  Wed Aug  3 02:45:11 2016
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 3 Aug 2016 08:45:11 +0200
Subject: [Python-ideas] PEP 525: Asynchronous Generators
In-Reply-To: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com>
References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com>
Message-ID: 

Hi!

I'm all for this.
I've run into it so many times already while implementing the async/await support in Cython, it's really a totally obvious extension to what there is currently, and it's practically a requirement for any serious usage of async iterators. Some comments below. Yury Selivanov schrieb am 03.08.2016 um 00:31: > PEP 492 requires an event loop or a scheduler to run coroutines. > Because asynchronous generators are meant to be used from coroutines, > they also require an event loop to run and finalize them. Well, or *something* that uses them in the same way as an event loop would. Doesn't have to be an event loop. > 1. Implement an ``aclose`` method on asynchronous generators > returning a special *awaitable*. When awaited it > throws a ``GeneratorExit`` into the suspended generator and > iterates over it until either a ``GeneratorExit`` or > a ``StopAsyncIteration`` occur. > > This is very similar to what the ``close()`` method does to regular > Python generators, except that an event loop is required to execute > ``aclose()``. I don't see a motivation for adding an "aclose()" method in addition to the normal "close()" method. Similar for send/throw. Could you elaborate on that? > 3. Add two new methods to the ``sys`` module: > ``set_asyncgen_finalizer()`` and ``get_asyncgen_finalizer()``. > > The idea behind ``sys.set_asyncgen_finalizer()`` is to allow event > loops to handle generators finalization, so that the end user > does not need to care about the finalization problem, and it just > works. > > When an asynchronous generator is iterated for the first time, > it stores a reference to the current finalizer. If there is none, > a ``RuntimeError`` is raised. This provides a strong guarantee that > every asynchronous generator object will always have a finalizer > installed by the correct event loop. > > When an asynchronous generator is about to be garbage collected, > it calls its cached finalizer. 
The assumption is that the finalizer > will schedule an ``aclose()`` call with the loop that was active > when the iteration started. > > For instance, here is how asyncio is modified to allow safe > finalization of asynchronous generators:: > > # asyncio/base_events.py > > class BaseEventLoop: > > def run_forever(self): > ... > old_finalizer = sys.get_asyncgen_finalizer() > sys.set_asyncgen_finalizer(self._finalize_asyncgen) > try: > ... > finally: > sys.set_asyncgen_finalizer(old_finalizer) > ... > > def _finalize_asyncgen(self, gen): > self.create_task(gen.aclose()) > > ``sys.set_asyncgen_finalizer()`` is thread-specific, so several event > loops running in parallel threads can use it safely. Phew, this adds quite some complexity and magic. That is a problem. For one, this uses a global setup, so There Can Only Be One of these finalizers. ISTM that if special cleanup is required, either the asyncgen itself should know how to do that, or we should provide some explicit API that does something when *initialising* the asyncgen. That seems better than doing something global behind the users' back. Have you considered providing some kind of factory in asyncio that wraps asyncgens or so? I won't go into the implementation details for now, but thanks for the PEP and for working on this so actively. 
Stefan From storchaka at gmail.com Wed Aug 3 07:11:25 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 3 Aug 2016 14:11:25 +0300 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <57A12B5A.90504@canterbury.ac.nz> References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> <20160801024738.GC13777@ando.pearwood.info> <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> Message-ID: On 03.08.16 02:23, Greg Ewing wrote: > So in summary, I think it should be: > > clamp(NaN, y, z) --> NaN > clamp(x, NaN, z) --> NaN > clamp(x, y, NaN) --> NaN > clamp(x, y, z) --> NaN if z < y What about clamp(NaN, y, y)? From srkunze at mail.de Wed Aug 3 09:36:10 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 3 Aug 2016 15:36:10 +0200 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: <621a46cf-1d0a-bdcd-23ba-1532221587cf@mail.de> On 03.08.2016 03:15, Gregory P. Smith wrote: > > > On Mon, Aug 1, 2016 at 10:14 PM Guido van Rossum > wrote: > > What makes you respond so vehemently to `a: float`? > > > First reaction. It doesn't actually seem too bad now. It is already > legal to just say `a` as a statement so this isn't much different. > > def f(): > a > > is already legal. So `a: List[float]` would at least be meaningful > rather than a meaningless statement that any linter should question. :) > That's legal? To me that seems like a syntax/name error. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Wed Aug 3 09:41:14 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 03 Aug 2016 06:41:14 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <621a46cf-1d0a-bdcd-23ba-1532221587cf@mail.de> References: <621a46cf-1d0a-bdcd-23ba-1532221587cf@mail.de> Message-ID: <57A1F47A.10500@stoneleaf.us> On 08/03/2016 06:36 AM, Sven R. Kunze wrote: > On 03.08.2016 03:15, Gregory P. Smith wrote: >> def f(): >> a >> >> is already legal. So `a: List[float]` would at least be meaningful rather than a meaningless statement that any linter should question. :) > > That's legal? To me that seems like a syntax/name error. If `a` is global (or nonlocal), then it's legal (the object is returned, then discarded); otherwise it's a NameError. -- ~Ethan~ From srkunze at mail.de Wed Aug 3 09:49:29 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 3 Aug 2016 15:49:29 +0200 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: <065f4634-609a-a1de-92e2-cd99a6adcd1e@mail.de> On 01.08.2016 23:31, Guido van Rossum wrote: > PEP 484 doesn't change Python's syntax. Therefore it has no good > syntax to offer for declaring the type of variables, and instead you > have to write e.g. > > a = 0 # type: float > b = [] # type: List[int] > c = None # type: Optional[str] > > I'd like to address this in the future, and I think the most elegant > syntax would be to let you write these as follows: > > a: float = 0 > b: List[int] = [] > c: Optional[str] = None > > (I've considered a 'var' keyword in the past, but there just are too > many variables named 'var' in my code. :-) I can't help but feel this seems like a short "key: value" dict declaration. Besides the fact that those examples don't really highlight the real use-cases. But others already covered that. > There are some corner cases to consider.
First, to declare a > variable's type without giving it an initial value, we can write this: > > a: float Then writing "a" should be allowed, too, right? As a variable declaration without any type hint. That's usually a NameError. Put differently, what is the use-case Python has missed so long in not providing a way of declaring an empty variable? Will it be filled with None? > Second, when these occur in a class body, they can define either class > variables or instance variables. Do we need to be able to specify > which? Could be useful but might result in a lot of double maintenance work (class body + place of initialization). > Third, there's an annoying thing with tuples/commas here. On the one > hand, in a function declaration, we may see (a: int = 0, b: str = ''). > On the other hand, in an assignment, we may see > > a, b = 0, '' > > Suppose we wanted to add types to the latter. Would we write this as > > a, b: int, str = 0, '' > > or as > > a: int, b: str = 0, '' > > ??? Personally I think neither is acceptable, and we should just write it as > > a: int = 0 > b: str = '' > > but this is a slight step back from > > a, b = 0, '' # type: (int, str) > Or using "a: float" from above:

    a: float
    b: str
    a, b = 0, ''

So, ignoring commas for now and solving it later would also work. Sven From srkunze at mail.de Wed Aug 3 09:51:10 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 3 Aug 2016 15:51:10 +0200 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <57A1F47A.10500@stoneleaf.us> References: <621a46cf-1d0a-bdcd-23ba-1532221587cf@mail.de> <57A1F47A.10500@stoneleaf.us> Message-ID: <1f28d59a-6abe-f6cd-5aa2-447dd8c43723@mail.de> On 03.08.2016 15:41, Ethan Furman wrote: > If `a` is global (or nonlocal), then it's legal (the object is > returned, this discarded); otherwise it's a NameError. That's correct but this thread is about declaration.
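For what it's worth, the semantics later standardized in PEP 526 (Python 3.6) answer the "empty variable" question directly: a bare annotation binds nothing — the name is not created and is not filled with None; only the annotation is recorded. A quick sketch:

```python
class Example:
    a: float          # annotation only -- no attribute is created

assert Example.__annotations__['a'] is float   # the type is recorded...
assert 'a' not in Example.__dict__             # ...but no class attribute exists

try:
    Example().a                                # and no instance attribute either
    raise AssertionError('expected AttributeError')
except AttributeError:
    pass
```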
From ethan at stoneleaf.us Wed Aug 3 10:39:24 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 03 Aug 2016 07:39:24 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <1f28d59a-6abe-f6cd-5aa2-447dd8c43723@mail.de> References: <621a46cf-1d0a-bdcd-23ba-1532221587cf@mail.de> <57A1F47A.10500@stoneleaf.us> <1f28d59a-6abe-f6cd-5aa2-447dd8c43723@mail.de> Message-ID: <57A2021C.7070002@stoneleaf.us> On 08/03/2016 06:51 AM, Sven R. Kunze wrote: > On 03.08.2016 15:41, Ethan Furman wrote: >> If `a` is global (or nonlocal), then it's legal (the object is returned, this discarded); otherwise it's a NameError. > > That's correct but this thread is about declaration. Ah, I thought you were questioning the `a` all by itself, not the `a: List[float]`. -- ~Ethan~ From rob.cliffe at btinternet.com Wed Aug 3 10:36:39 2016 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Wed, 3 Aug 2016 15:36:39 +0100 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <57A12B5A.90504@canterbury.ac.nz> References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> <20160801024738.GC13777@ando.pearwood.info> <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> Message-ID: On 03/08/2016 00:23, Greg Ewing wrote: > David Mertz wrote: >> It really doesn't make sense to me that a clamp() function would >> *limit to* a NaN. > > Keep in mind that the NaNs involved have probably arisen > from some other computation that went wrong, and that > the purpose of the whole NaN system is to propagate an > indication of that wrongness so that it's evident in the > final result. > > So here's how I see it: > > clamp(NaN, y, z) is asking "Is an unknown number between > y and z?" The answer to that is not known, so the result > should be NaN. 
> > clamp(x, y, NaN) is asking "Is x between y and an unknown > number?" If x > y, the answer to that is not known, so the > result should be NaN. +1 so far > > If x < y, you might argue that the result should be y. > But consider clamp(x, 2, 1). You're asking it to limit > x to a value not less than 2 and not greater than 1. > There's no such number, so arguably the result should > be NaN. I think clamp(x,2,1) should raise ValueError. It's asking for something impossible. > > If you accept that, then clamp(x, y, NaN) should be > NaN in all cases, since we don't know that the upper > bound isn't less than the lower bound. +0.8. Returning y when x > > So in summary, I think it should be: > > clamp(NaN, y, z) --> NaN > clamp(x, NaN, z) --> NaN > clamp(x, y, NaN) --> NaN > clamp(x, y, z) --> NaN if z < y > From p.f.moore at gmail.com Wed Aug 3 10:49:31 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 3 Aug 2016 15:49:31 +0100 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> <20160801024738.GC13777@ando.pearwood.info> <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> Message-ID: On 3 August 2016 at 15:36, Rob Cliffe wrote: >> If x < y, you might argue that the result should be y. >> But consider clamp(x, 2, 1). You're asking it to limit >> x to a value not less than 2 and not greater than 1. >> There's no such number, so arguably the result should >> be NaN. > > I think clamp(x,2,1) should raise ValueError. It's asking for something impossible. Agreed. >> If you accept that, then clamp(x, y, NaN) should be >> NaN in all cases, since we don't know that the upper >> bound isn't less than the lower bound. > > +0.8. Returning y when x the temptation to guess". 
clamp(val, lo, hi) should raise ValueError if either lo or hi is NaN, for the same reason (lo < hi doesn't hold, in this case because the values are incomparable). Paul From dmoisset at machinalis.com Wed Aug 3 10:51:28 2016 From: dmoisset at machinalis.com (Daniel Moisset) Date: Wed, 3 Aug 2016 15:51:28 +0100 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Tue, Aug 2, 2016 at 11:09 PM, Guido van Rossum wrote: > On Mon, Aug 1, 2016 at 4:35 PM, Daniel Moisset > wrote: > > > > I'd say that if I have a class C with a class variable cv, instance > variable > > iv, a good type checking system should detect: > > > > C.cv # ok > > C.iv # error! > > C().iv # ok > > > > which is something that PEP484 doesn't clarify much (and mypy flags all > 3 as > > valid) > > Yeah, this is all because you can't express that in Python either. > When you see an assignment in a class body you can't tell if it's > meant as an instance variable default or a class variable (except for > some specific cases -- e.g. nested class definitions are pretty > obvious :-). > May be I've gotten wrong my python style for many years, but I always considered that the "proper" way to create instance variables was inside the initializer (or in rare occasions, some other method/classmethod). For me, an assignment at a class body is a class variable/constant. So I was going to propose "type declarations at class level are always for class variables, and inside methods (preceded by "self.") are for instance variables". Using class level variables for defaults always seemed unpythonic and error prone (when mutable state is involved) to me. I felt that was common practice but can't find documentation to support it, so I'd like to hear if I'm wrong there :) > > -- Daniel F. Moisset - UK Country Manager www.machinalis.com Skype: @dmoisset -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Wed Aug 3 10:54:04 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 4 Aug 2016 00:54:04 +1000 Subject: [Python-ideas] size of the installation of Python on mobile devices In-Reply-To: <22432.37911.335413.674163@turnbull.sk.tsukuba.ac.jp> References: <8d577973-eaaf-21cf-3604-8ffd28773ba4@gmail.com> <579A49F8.9080708@egenix.com> <7bb89d4a-3930-d2fd-9736-2360008f561e@gmail.com> <22428.22132.611219.145348@turnbull.sk.tsukuba.ac.jp> <22430.57892.461912.677961@turnbull.sk.tsukuba.ac.jp> <22432.25405.170635.186167@turnbull.sk.tsukuba.ac.jp> <22432.37911.335413.674163@turnbull.sk.tsukuba.ac.jp> Message-ID: On 2 August 2016 at 22:37, Stephen J. Turnbull wrote: > Based on the evidence presented so far, I think adding to core is > premature and the feature would benefit from cooperation among > interested parties in a development environment that can move far > faster than core will. Ah, that's likely the key difference in our assumptions then: CPython maintenance branches already move plenty fast enough for any use case where you aren't waiting for us to make official releases :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From greg at krypto.org Wed Aug 3 10:57:29 2016 From: greg at krypto.org (Gregory P. Smith) Date: Wed, 03 Aug 2016 14:57:29 +0000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <065f4634-609a-a1de-92e2-cd99a6adcd1e@mail.de> References: <065f4634-609a-a1de-92e2-cd99a6adcd1e@mail.de> Message-ID: On Wed, Aug 3, 2016, 6:49 AM Sven R. Kunze wrote: > On 01.08.2016 23:31, Guido van Rossum wrote: > > PEP 484 doesn't change Python's syntax. Therefore it has no good > > syntax to offer for declaring the type of variables, and instead you > > have to write e.g. 
> > > > a = 0 # type: float > > b = [] # type: List[int] > > c = None # type: Optional[str] > > > > I'd like to address this in the future, and I think the most elegant > > syntax would be to let you write these as follows: > > > > a: float = 0 > > b: List[int] = [] > > c: Optional[str] = None > > > > (I've considered a 'var' keyword in the past, but there just are too > > many variables named 'var' in my code. :-) > > I can't help but this seems like a short "key: value" dict declaration. > > Besides the fact that those examples don't really highlight the real > use-cases. But others already covered that. > > > There are some corner cases to consider. First, to declare a > > variable's type without giving it an initial value, we can write this: > > > > a: float > > Then writing "a" should be allow, too, right? As a variable declaration > without any type hint. That's usually a NameError. > > Put it differently, what is the use-case Python has missed so long in > not providing a way of declaring an empty variable? Will it be filled > with None? > Nope, it wouldn't do anything at all. No bytecode. No namespace updated. No None. It is merely an informational statement that an optional type checker pass may make use of. This definitely counts as a difference between a bare 'a' and 'a: SPAM'. The former is a name lookup while the latter is a no-op. Good or bad I'm undecided. At least the latter is useful while the former is more of a side effect of how Python works. > > Second, when these occur in a class body, they can define either class > > variables or instance variables. Do we need to be able to specify > > which? > > Could be useful but might result in a lot of double maintenance work > (class body + place of initilization). > > > Third, there's an annoying thing with tuples/commas here. On the one > > hand, in a function declaration, we may see (a: int = 0, b: str = ''). 
> > On the other hand, in an assignment, we may see > > > > a, b = 0, '' > > > > Suppose we wanted to add types to the latter. Would we write this as > > > > a, b: int, str = 0, '' > > > > or as > > > > a: int, b: str = 0, '' > > > > ??? Personally I think neither is acceptable, and we should just write > it as > > > > a: int = 0 > > b: str = '' > > > > but this is a slight step back from > > > > a, b = 0, '' # type: (int, str) > > > > Or using "a: float" from above: > > a: float > b: str > a, b = 0, '' > > > So, ignoring commas for now and solving it later would also work. > > Sven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vgr255 at live.ca Wed Aug 3 11:08:49 2016 From: vgr255 at live.ca (Emanuel Barry) Date: Wed, 3 Aug 2016 15:08:49 +0000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: <065f4634-609a-a1de-92e2-cd99a6adcd1e@mail.de>, Message-ID: I'm behind Gregory here, as I'm not a big fan of the "a: float" syntax (but I don't have a strong opinion either way). I'm strongly against merely "a" being equal to "a = None", though. In my code, that's either a typo or I've wrapped that in a try: ... except NameError: ..., and I'd like Python to throw a NameError in that case. So either we deal with an asymmetry in how annotations work, or we disallow it. I'm likely not going to use that particular syntax even if it exists, so I'm -0. -Emanuel >From Gregory P. Smith Re: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 On Wed, Aug 3, 2016, 6:49 AM Sven R. Kunze > wrote: On 01.08.2016 23:31, Guido van Rossum wrote: > There are some corner cases to consider. 
First, to declare a > variable's type without giving it an initial value, we can write this: > > a: float Then writing "a" should be allow, too, right? As a variable declaration without any type hint. That's usually a NameError. Put it differently, what is the use-case Python has missed so long in not providing a way of declaring an empty variable? Will it be filled with None? Nope, it wouldn't do anything at all. No bytecode. No namespace updated. No None. It is merely an informational statement that an optional type checker pass may make use of. This definitely counts as a difference between a bare 'a' and 'a: SPAM'. The former is a name lookup while the latter is a no-op. Good or bad I'm undecided. At least the latter is useful while the former is more of a side effect of how Python works. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsbueno at python.org.br Wed Aug 3 11:13:21 2016 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Wed, 3 Aug 2016 12:13:21 -0300 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On 3 August 2016 at 11:51, Daniel Moisset wrote: > May be I've gotten wrong my python style for many years, but I always > considered that the "proper" way to create instance variables was inside the > initializer (or in rare occasions, some other method/classmethod). For me, > an assignment at a class body is a class variable/constant. > > So I was going to propose "type declarations at class level are always for > class variables, and inside methods (preceded by "self.") are for instance > variables". Using class level variables for defaults always seemed > unpythonic and error prone (when mutable state is involved) to me. 
I felt > that was common practice but can't find documentation to support it, so I'd > like to hear if I'm wrong there :) You've just missed one of the most powerful side-effects of how Python's class and instance variables interact:

    class Spaceship:
        color = RED
        hitpoints = 50

        def powerup(self):
            self.color = BLUE
            self.hitpoints += 100

Above: the defaults are good for almost all spaceship instances - but when one of them is "powered up", and only then, that instance's values are defined to be different from the defaults specified in the class. At that point proper instance variables are created in the instance's __dict__, but for all other instances the defaults, living in the instance's ".__class__.__dict__", are just good enough. From ncoghlan at gmail.com Wed Aug 3 11:24:45 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 4 Aug 2016 01:24:45 +1000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On 3 August 2016 at 10:48, Alvaro Caceres via Python-ideas wrote: > The criticism I would make about allowing variables without assignments like > > a: float > > is that it makes my mental model of a variable a little bit more complicated > than it is currently. If I see "a" again a few lines below, it can either be > pointing to some object or be un-initialized. Maybe the benefits are worth > it, but I don't really see it, and I wanted to point out this "cost". This concern rings true for me as well - "I'm going to be defining a variable named 'a' later and it will be a float" isn't a concept Python has had before. I *have* that concept in my mental model of C/C++, but trying to activate it for Python has my brain going "Wut? No.".
I'd be much happier if we made initialisation mandatory, so the above would need to be written as either:

    a: float = 0.0  # Or other suitable default value

or:

    a: Optional[float] = None

The nebulous concept (and runtime loophole) where you can see:

    class Example:
        a: float
        ...

but still have Example().a throw AttributeError would also be gone. (Presumably this approach would also simplify typechecking inside __new__ and __init__ implementations, as the attribute will reliably be defined the moment the instance is created, even if it hasn't been set to an appropriate value yet) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From jsbueno at python.org.br Wed Aug 3 11:31:15 2016 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Wed, 3 Aug 2016 12:31:15 -0300 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: I may have missed that on the message deluge so far - but would the type annotations be available at runtime, like parameter annotations live in "__annotations__"? An __annotations__ dict straight in the class itself? I think that is the obvious thing - although I did not see it in the above messages. (As I said, I may have missed its mention) That is rather important because then, beyond enabling static code analysis, it is easy to have third party frameworks that enforce typing at runtime. On 3 August 2016 at 12:13, Joao S. O. Bueno wrote: > On 3 August 2016 at 11:51, Daniel Moisset wrote: >> May be I've gotten wrong my python style for many years, but I always >> considered that the "proper" way to create instance variables was inside the >> initializer (or in rare occasions, some other method/classmethod). For me, >> an assignment at a class body is a class variable/constant. >> >> So I was going to propose "type declarations at class level are always for >> class variables, and inside methods (preceded by "self.") are for instance >> variables".
Using class level variables for defaults always seemed >> unpythonic and error prone (when mutable state is involved) to me. I felt >> that was common practice but can't find documentation to support it, so I'd >> like to hear if I'm wrong there :) > > You've just missed one of the most powerful side-effects of how > Python's class and instance variables interact: > > class Spaceship: > color = RED > hitpoints = 50 > > def powerup(self): > > self.color = BLUE > self.hitpoints += 100 > > Above: the defaults are good for almost all spaceship instaces - but > when one of them is "powered up", and only them, that instance values > are defined to be different than the defaults specified in the class. > At that point proper instance variables are created in the instance? > __dict__, but for all other instances, the defaults, living in the > instance's".__class__.__dict__" are just good enough. From yselivanov.ml at gmail.com Wed Aug 3 11:32:44 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 3 Aug 2016 11:32:44 -0400 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> Message-ID: Hi Stefan! On 2016-08-03 2:45 AM, Stefan Behnel wrote: > Hi! > > I'm all for this. I've run into it so many times already while implementing > the async/await support in Cython, it's really a totally obvious extension > to what there is currently, and it's practically a requirement for any > serious usage of async iterators. Thanks for the support! > > Some comments below. > > Yury Selivanov schrieb am 03.08.2016 um 00:31: >> PEP 492 requires an event loop or a scheduler to run coroutines. >> Because asynchronous generators are meant to be used from coroutines, >> they also require an event loop to run and finalize them. > Well, or *something* that uses them in the same way as an event loop would. > Doesn't have to be an event loop. Sure, I'm just using the same terminology PEP 492 was defined with. 
We can say "coroutine runner" instead of "event loop". > > >> 1. Implement an ``aclose`` method on asynchronous generators >> returning a special *awaitable*. When awaited it >> throws a ``GeneratorExit`` into the suspended generator and >> iterates over it until either a ``GeneratorExit`` or >> a ``StopAsyncIteration`` occur. >> >> This is very similar to what the ``close()`` method does to regular >> Python generators, except that an event loop is required to execute >> ``aclose()``. > I don't see a motivation for adding an "aclose()" method in addition to the > normal "close()" method. Similar for send/throw. Could you elaborate on that? There will be no "close", "send" and "throw" defined for asynchronous generators. Only their asynchronous equivalents. This topic is actually quite complex, so bear with me. 1. It is important to understand that the asynchronous iteration protocol is multiplexed into the normal iteration protocol. For example:

    @types.coroutine
    def foobar():
        yield 'spam'

    async def agen():
        await foobar()
        yield 123

The 'agen' generator, on the lowest level of the generators implementation, will yield two things -- 'spam', and a wrapped 123 value. Because 123 is wrapped, the async generators machinery can distinguish async yields from normal yields. The idea behind the __anext__ coroutine is that it yields through all "normal" yields, and raises a StopIteration when it encounters a "wrapped" yield (same idea behind aclose(), athrow(), and asend()). 2. Now let's look at two generators (sync and async):

    def gen():
        try:
            ...
        finally:
            fin1()
            fin2()
            yield 123

    async def agen():
        try:
            ...
        finally:
            await afin1()
            await afin2()
            yield 123

* If we call 'gen().close()' when gen() is suspended somewhere in its try block, a GeneratorExit exception will be thrown in it. Then, fin1() and fin2() calls will be executed. Then, a 'yield 123' line will happen, which will cause a RuntimeError('generator yielded while closing').
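The synchronous half of this description is easy to check against today's generators — a generator that yields while it is being closed makes close() raise RuntimeError (CPython words the message as "generator ignored GeneratorExit"):

```python
def gen():
    try:
        yield 1
    finally:
        yield 2          # yielding while the generator is being closed

g = gen()
assert next(g) == 1      # suspend inside the try block
try:
    g.close()            # throws GeneratorExit in; the finally then yields
    result = 'closed quietly'
except RuntimeError:     # CPython: "generator ignored GeneratorExit"
    result = 'RuntimeError'
assert result == 'RuntimeError'
```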
The reason for the RuntimeError is that the interpreter does not want generators to yield in their finally statements. It wants them to synchronously finalize themselves. Yielding while closing doesn't make any sense. * Now, if we would just reuse the synchronous 'close()' method for 'agen()' -- awaiting on 'afin1()' would simply result in a RuntimeError('generator yielded while closing'). So the close() implementation for agen() must allow some yields, to make 'await' expressions possible in the finally block. This is something that is absolutely required, because instead of try..finally you could have an 'async with' block in agen() -- so the ability to call asynchronous code in the 'finally' block is very important. Therefore, it's necessary to introduce new close semantics for asynchronous generators:

- It is OK to 'await' on anything in finally blocks in async generators.

- Trying to 'yield' in finally blocks will result in a RuntimeError('async generator yielded while closing') -- similarly to sync generators.

- Because we have to allow awaits in generators' finally blocks, the new 'close' method has to be a coroutine-like object.

Since all this is quite different from sync generators' close method, it was decided to have a different name for this method for async generators: aclose. aclose() is a coroutine-like object, you can await on it, and you can even throw a CancelledError into it; so it's possible to write 'await asyncio.wait_for(agen.aclose(), timeout=1)'. 3. asend() and athrow(). This is very similar to aclose(). Both have to be coroutines because async yields are multiplexed into normal yields that awaitables use behind the scenes.

    async def foo():
        await asyncio.sleep(1)
        yield 123

If we had a synchronous send() method defined for foo(), you'd see something like this:

    gen = foo()
    gen.send(None) ->

Instead, what you really want is this:

    gen = foo()
    await gen.asend(None) -> 123

4. I really recommend you to play with the reference implementation.
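With the implementation that eventually shipped (async generators landed in Python 3.6; asyncio.run is 3.7+), the asynchronous flavour of these semantics can be exercised directly: awaiting inside a finally block is allowed, and aclose() drives that cleanup to completion:

```python
import asyncio

cleaned = []

async def agen():
    try:
        yield 1
    finally:
        await asyncio.sleep(0)   # awaiting during finalization is allowed
        cleaned.append(True)     # (a bare `yield` here would make aclose()
                                 #  raise RuntimeError instead)

async def main():
    g = agen()
    assert await g.__anext__() == 1   # or: await g.asend(None)
    await g.aclose()                  # throws GeneratorExit in, runs the finally

asyncio.run(main())
assert cleaned == [True]
```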
You can emulate synchronous "send" and "throw" by doing this trick:

    gen = foo()
    gen.__anext__().send()
    gen.__anext__().throw()

> >> 3. Add two new methods to the ``sys`` module: >> ``set_asyncgen_finalizer()`` and ``get_asyncgen_finalizer()``. >> >> The idea behind ``sys.set_asyncgen_finalizer()`` is to allow event >> loops to handle generators finalization, so that the end user >> does not need to care about the finalization problem, and it just >> works. >> >> When an asynchronous generator is iterated for the first time, >> it stores a reference to the current finalizer. If there is none, >> a ``RuntimeError`` is raised. This provides a strong guarantee that >> every asynchronous generator object will always have a finalizer >> installed by the correct event loop. >> >> When an asynchronous generator is about to be garbage collected, >> it calls its cached finalizer. The assumption is that the finalizer >> will schedule an ``aclose()`` call with the loop that was active >> when the iteration started. >> >> For instance, here is how asyncio is modified to allow safe >> finalization of asynchronous generators:: >> >> # asyncio/base_events.py >> >> class BaseEventLoop: >> >> def run_forever(self): >> ... >> old_finalizer = sys.get_asyncgen_finalizer() >> sys.set_asyncgen_finalizer(self._finalize_asyncgen) >> try: >> ... >> finally: >> sys.set_asyncgen_finalizer(old_finalizer) >> ... >> >> def _finalize_asyncgen(self, gen): >> self.create_task(gen.aclose()) >> >> ``sys.set_asyncgen_finalizer()`` is thread-specific, so several event >> loops running in parallel threads can use it safely. > Phew, this adds quite some complexity and magic. That is a problem. For > one, this uses a global setup, so There Can Only Be One of these > finalizers. ISTM that if special cleanup is required, either the asyncgen > itself should know how to do that, or we should provide some explicit API > that does something when *initialising* the asyncgen. That seems better
That seems better > than doing something global behind the users' back. Have you considered > providing some kind of factory in asyncio that wraps asyncgens or so? set_asyncgen_finalizer is thread-specific, so you can have one finalizer set up per thread. The reference implementation actually integrates this all into asyncio. The idea is to set up the loop's async-gen finalizer just before the loop starts, and reset the finalizer to the previous one (usually it's None) just before it stops. The finalizer is attached to a generator when it is yielding for the first time -- this guarantees that every generator will have a correct finalizer attached to it. It's not right to attach the finalizer (or wrap the generator) when the generator is initialized. Consider this code: async def foo(): async with smth(): yield async def coro(gen): async for i in gen: ... loop.run_until_complete(coro(foo())) ^^ In the above example, when the 'foo()' is instantiated, there is no loop or finalizers set up at all. BUT since a loop (or coroutine wrapper) is required to iterate async generators, there is a strong guarantee that it *will* be present on the first iteration. Regarding "async gen itself should know how to cleanup" -- that's not possible. An async gen could just have an async with block and then be GCed (after being partially consumed). Users won't expect to do anything besides using try..finally or async with, so it's the responsibility of the coroutine runner to clean up async gens. Hence 'aclose' is a coroutine, and hence this set_asyncgen_finalizer API for coroutine runners. This is indeed the most magical part of the proposal. Although it's important to understand that regular Python users will likely never encounter this in their life -- finalizers will be set up by the framework they use (asyncio, Tornado, Twisted, you name it). Thanks!
Yury From p.f.moore at gmail.com Wed Aug 3 11:35:56 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 3 Aug 2016 16:35:56 +0100 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On 3 August 2016 at 16:13, Joao S. O. Bueno wrote: > On 3 August 2016 at 11:51, Daniel Moisset wrote: >> Maybe I've gotten wrong my python style for many years, but I always >> considered that the "proper" way to create instance variables was inside the >> initializer (or in rare occasions, some other method/classmethod). For me, >> an assignment at a class body is a class variable/constant. >> >> So I was going to propose "type declarations at class level are always for >> class variables, and inside methods (preceded by "self.") are for instance >> variables". Using class level variables for defaults always seemed >> unpythonic and error prone (when mutable state is involved) to me. I felt >> that was common practice but can't find documentation to support it, so I'd >> like to hear if I'm wrong there :) > > You've just missed one of the most powerful side-effects of how > Python's class and instance variables interact: > > class Spaceship: > color = RED > hitpoints = 50 > > def powerup(self): > > self.color = BLUE > self.hitpoints += 100 > > Above: the defaults are good for almost all spaceship instances - but > when one of them is "powered up", and only then, that instance's values > are defined to be different from the defaults specified in the class. > At that point proper instance variables are created in the instance's > __dict__, but for all other instances, the defaults, living in the > instance's ".__class__.__dict__", are just good enough. Yes, but I view that as "when you ask for an instance variable, if there isn't one you get the class variable as the default - nested namespaces basically, just like local variables of a function and global variables".
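That fallback is easy to observe directly; a runnable sketch of the Spaceship example (string literals standing in for the RED/BLUE constants):

```python
class Spaceship:
    color = "red"        # class attributes act as shared defaults
    hitpoints = 50

    def powerup(self):
        # assigning through self creates *instance* attributes that
        # shadow the class-level defaults for this object only
        self.color = "blue"
        self.hitpoints += 100    # reads the class default 50, writes 150

a, b = Spaceship(), Spaceship()
a.powerup()
print(a.hitpoints, b.hitpoints)      # 150 50
print("hitpoints" in b.__dict__)     # False: b still falls back to the class
```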
So to me class Spaceship: hitpoints: int = 50 declares a class Spaceship, with a *class* variable hitpoints. Instances will get that value by default, but can assign something different. From a typing perspective, whether it's acceptable for an instance variable to have a different type than the class variable with the same name, is the point here. Because of the behaviour of acting as a default, it probably isn't acceptable. But OTOH, there's nothing saying you *have* to respect the link class Whatever: myval : int = 20 def __init__(self): self.myval = "Fooled you!" The problem is of course that there's no way to attach a type declaration to an instance variable, unless you allow self.myval : str = "Fooled you" which opens up the possibility of declaring types for (in theory) arbitrary expressions. In summary, I think type annotations on variables declared at class scope should describe the type of the class variable - because that's what the assignment is creating. That leaves no obvious means of declaring the type of an instance variable (can you put a comment-style type annotation on the self.x = whatever line at the moment?) Which is a problem, but not one (IMO) that should be solved by somehow pretending that when you declare a class variable you're actually declaring an instance variable. Paul From alexander.belopolsky at gmail.com Wed Aug 3 11:40:19 2016 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 3 Aug 2016 11:40:19 -0400 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Wed, Aug 3, 2016 at 11:31 AM, Joao S. O. Bueno wrote: > An __annotations__ dict straight in the class itself? I think that is > the obvious thing - although I did not see it in the above messages. > (A s I said, I may have missed its mention) It was mentioned that var annotations would be evaluated and discarded, but I agree that this is less than ideal.
I agree that it is better to store var annotations in the namespace that they appear in. Maybe something like this: a: int = 0 b: List[float] = [] would result in a namespace { 'a': 0, 'b': [], '__var_annotations__': { 'a': int, 'b': List[float], }, } From p.f.moore at gmail.com Wed Aug 3 11:47:12 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 3 Aug 2016 16:47:12 +0100 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On 3 August 2016 at 16:40, Alexander Belopolsky wrote: > On Wed, Aug 3, 2016 at 11:31 AM, Joao S. O. Bueno wrote: >> An __annotations__ dict straight in the class itself? I think that is >> the obvious thing - although I did not see it in the above messages. >> (A s I said, I may have missed its mention) > > It was mentioned that var annotations would be evaluated and > discarded, but I agree that this is less than ideal. I agree that it > is better to store var annotations in the namespace that they appear > in. Agreed, it seems a shame that this would be an area where it's not possible to introspect data that was available at compile time. (I'd say it doesn't feel Pythonic, except that claiming that Guido's proposing something non-Pythonic is self-contradictory :-)) However, I have no expectation of ever needing this data for anything I'd write, so it's not actually something that would matter to me one way or the other. Paul From rosuav at gmail.com Wed Aug 3 11:47:47 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 4 Aug 2016 01:47:47 +1000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Thu, Aug 4, 2016 at 1:40 AM, Alexander Belopolsky wrote: > On Wed, Aug 3, 2016 at 11:31 AM, Joao S. O. Bueno wrote: >> An __annotations__ dict straight in the class itself? I think that is >> the obvious thing - although I did not see it in the above messages. 
>> (A s I said, I may have missed its mention) > > It was mentioned that var annotations would be evaluated and > discarded, but I agree that this is less than ideal. I agree that it > is better to store var annotations in the namespace that they appear > in. Maybe something like this: > > a: int = 0 > b: List[float] = [] > > would result in a namespace > > { > 'a': 0, > 'b': [], > '__var_annotations__': { > 'a': int, > 'b': List[float], > }, > } So... the maligned statement "a: int" would actually mean "__var_annotations__['a'] = int", which is a perfectly valid executable statement. This might solve the problem? Question, though: Does __var_annotations__ always exist (as a potentially empty dict), or is it created on first use? A function's __annotations__ is always present, but functions have lots of attributes that we normally don't care about. Polluting every single namespace seems wasteful; magically creating a new local variable when you first hit an annotated 'declaration' seems odd. ChrisA From tritium-list at sdamon.com Wed Aug 3 11:51:13 2016 From: tritium-list at sdamon.com (tritium-list at sdamon.com) Date: Wed, 3 Aug 2016 11:51:13 -0400 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: <00b701d1ed9e$dcfb56d0$96f20470$@hotmail.com> > -----Original Message----- > From: Python-ideas [mailto:python-ideas-bounces+tritium- > Agreed, it seems a shame that this would be an area where it's not > possible to introspect data that was available at compile time. (I'd > say it doesn't feel Pythonic, except that claiming that Guido's > proposing something non-Pythonic is self-contradictory :-)) > Guido is the BDFL of CPython, not the arbiter of what the community thinks is 'pythonic'. It is yet to be seen if type annotations are considered pythonic at all. Under the assumption that they are, yes, annotations that also come with initialization would be the more pythonic way of doing things.
From chris.barker at noaa.gov Wed Aug 3 11:52:24 2016 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 3 Aug 2016 08:52:24 -0700 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> <20160801024738.GC13777@ando.pearwood.info> <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> Message-ID: <4929689045364026497@unknownmsgid> > clamp(val, lo, hi) should raise ValueError if either lo or hi is NaN, NaN and ValueError are kind of redundant. NaN exists as a way to (kind of) propagate value errors within the hardware floating point machinery. So it _might_ make sense for Python to raise ValueError wherever a NaN shows up, but as long as, e.g. X * NaN Returns NaN, so should clamp() or clip() or whatever it's called. Greg spelled it all out, but in short: If a NaN is passed in anywhere, you get a NaN back. One could argue that: clamp(NaN, x,x) Is clearly defined as x. But that would require special casing, and, "equality" is a bit of an ephemeral concept with floats, so better to return NaN. -CHB From ethan at stoneleaf.us Wed Aug 3 12:05:05 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 03 Aug 2016 09:05:05 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <00b701d1ed9e$dcfb56d0$96f20470$@hotmail.com> References: <00b701d1ed9e$dcfb56d0$96f20470$@hotmail.com> Message-ID: <57A21631.4040209@stoneleaf.us> On 08/03/2016 08:51 AM, tritium-list at sdamon.com wrote: > Paul Moore opined: >> >> Agreed, it seems a shame that this would be an area where it's not >> possible to introspect data that was available at compile time. 
(I'd >> say it doesn't feel Pythonic, except that claiming that Guido's >> proposing something non-Pythonic is self-contradictory :-)) > > Guido is the BDFL of CPython [...] Guido is the BDFL of Python, of which cPython is the reference implementation, and by which all other implementations are measured. -- ~Ethan~ From guido at python.org Wed Aug 3 12:11:42 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 3 Aug 2016 09:11:42 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Wed, Aug 3, 2016 at 8:24 AM, Nick Coghlan wrote: > On 3 August 2016 at 10:48, Alvaro Caceres via Python-ideas > wrote: >> The criticism I would make about allowing variables without assignments like >> >> a: float >> >> is that it makes my mental model of a variable a little bit more complicated >> than it is currently. If I see "a" again a few lines below, it can either be >> pointing to some object or be un-initialized. Maybe the benefits are worth >> it, but I don't really see it, and I wanted to point out this "cost". > > This concern rings true for me as well - "I'm going to be defining a > variable named 'a' later and it will be a float" isn't a concept > Python has had before. I *have* that concept in my mental model of > C/C++, but trying to activate for Python has my brain going "Wut? > No.". Have you annotated a large code base yet? This half of the proposal comes from over six months of experience annotating large amounts of code (Dropbox code and mypy itself). We commonly see situations where a variable is assigned on each branch of an if/elif/etc. structure. If you need to annotate that variable, mypy currently requires that you put the annotation on the first assignment to the variable, which is in the first branch. It would be much cleaner if you could declare the variable before the first `if`. 
But picking a good initializer is tricky, especially if you have a type that does not include None. As an illustration, I found this code: https://github.com/python/mypy/blob/master/mypy/checkexpr.py#L1152 if op == 'not': self.check_usable_type(operand_type, e) result = self.chk.bool_type() # type: Type elif op == '-': method_type = self.analyze_external_member_access('__neg__', operand_type, e) result, method_type = self.check_call(method_type, [], [], e) e.method_type = method_type elif op == '+': method_type = self.analyze_external_member_access('__pos__', operand_type, e) result, method_type = self.check_call(method_type, [], [], e) e.method_type = method_type else: assert op == '~', "unhandled unary operator" method_type = self.analyze_external_member_access('__invert__', operand_type, e) result, method_type = self.check_call(method_type, [], [], e) e.method_type = method_type return result Look at the annotation of `result` in the if-block. We need an annotation because the first function used to assign it returns a subclass of `Type`, and the type inference engine will assume the variable's type is that of the first assignment. Be that as it may, given that we need the annotation, I think the code would be clearer if we could set the type *before* the `if` block. But we really don't want to set a value, and in particular we don't want to set it to None, since (assuming strict None-checking) None is not a valid value for this type -- we don't want the type to be `Optional[Type]`. IOW I want to be able to write this code as result: Type if op == 'not': self.check_usable_type(operand_type, e) result = self.chk.bool_type() elif op == '-': # etc.
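A self-contained toy version of that pattern, with a plain Union standing in for mypy's Type hierarchy (the bare `result:` line is the proposed declaration-without-value form, which runs as-is on interpreters that accept annotation syntax):

```python
from typing import Union

def unary_result(op: str) -> Union[bool, str]:
    # bare annotation: declares result's type for the checker without
    # binding a value, so no None/dummy initializer is needed
    result: Union[bool, str]
    if op == 'not':
        result = True
    elif op == '-':
        result = '__neg__'
    elif op == '+':
        result = '__pos__'
    else:
        assert op == '~', "unhandled unary operator"
        result = '__invert__'
    return result

print(unary_result('-'))   # __neg__
```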
> > but still have Example().a throw AttributeError would also be gone. That's an entirely different issue though -- PEP 484 doesn't concern itself with whether variables are always initialized (though it's often easy for a type checker to check that). If we wrote that using `__init__` we could still have such a bug: class Example: def __init__(self, n: int) -> None: for i in range(n): self.a = 0.0 # type: float But the syntax used for declaring types is not implicated in this bug. > (Presumably this approach would also simplify typechecking inside > __new__ and __init__ implementations, as the attribute will reliably > be defined the moment the instance is created, even if it hasn't been > set to an appropriate value yet) But, again, real problems arise when the type of an *initialized* instance must always be some data structure (and not None), but you can't come up with a reasonable default initializer that has the proper type. Regarding the question of whether it's better to declare the types of instance variables in `__init__` (or `__new__`) or at the class level: for historical reasons, mypy uses both idioms in different places, and when exploring the code I've found it much more helpful to see the types declared in the class rather than in `__init__`. Compare for yourself: https://github.com/python/mypy/blob/master/mypy/build.py#L84 (puts the types in `__init__`) https://github.com/python/mypy/blob/master/mypy/build.py#L976 (puts the types in the class) -- --Guido van Rossum (python.org/~guido) From guido at python.org Wed Aug 3 12:15:05 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 3 Aug 2016 09:15:05 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Wed, Aug 3, 2016 at 8:31 AM, Joao S. O. 
Bueno wrote: > I may have missed that on the message deluge so far - but would the > type annotations > be available at runtime, like parameter annotations live in "__annotations__"? Actually we've hardly touched on that yet. For classes I think it would be nice to make these available. For locals I think we should not evaluate the type at all, otherwise the cost of using a variable declaration would be too high. > An __annotations__ dict straight in the class itself? I think that is > the obvious thing - although I did not see it in the above messages. > (A s I said, I may have missed its mention) > > That is rather important because then, beyond enabling static code > analysis it is easy to have third party frameworks that enforce typing > in runtime. Although, frankly, that's not something that PEP 484 particularly cares about. Runtime type checking has very different requirements -- would you really want to check that a List[int] contains only ints if the list has a million items? -- --Guido van Rossum (python.org/~guido) From guido at python.org Wed Aug 3 12:26:12 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 3 Aug 2016 09:26:12 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Wed, Aug 3, 2016 at 8:35 AM, Paul Moore wrote: [...] > Yes, but I view that as "when you ask for an instance variable, if > there isn't one you get the class variable as the default - nested > namespaces basically, just like local variables of a function and > global variables. > > So to me > > class Spaceship: > hitpoints: int = 50 > > declares a class Spaceship, with a *class* variable hitpoints. > Instances will get that value by default, but can assign something > different.
The type checker would actually like to understand what you mean here -- a class variable (maybe it can be updated by saying `Spaceship.hitpoints += 10`) or an instance variable. Which is why I proposed that you can make it a class variable by saying `hitpoints: class int = 50`, and the default would be for it to be an instance variable. I propose that the difference is that class variables cannot be updated through the instance (as that would make it an instance variable, which you've explicitly promised not to do by using the `class int` type). > From a typing perspective, whether it's acceptable for an instance > variable to have a different type than the class variable with the > same name, is the point here. Because of the behaviour of acting as a > default, it probably isn't acceptable. But OTOH, there's nothing > saying you *have* to respect the link > > class Whatever: > myval : int = 20 > def __init__(self): > self.myval = "Fooled you!" Again you're thinking with your runtime hat on. If you were to run this through a type checker you'd *want* to get an error here (since realistically there is probably a lot more going on in that class and the type clash is a symptom of a poor choice for a variable name or a bad initializer). > The problem is of course that there's no way to attach a type > declaration to an instance variable, unless you allow > > self.myval : str = "Fooled you" > > which opens up the possibility of declaring types for (in theory) > arbitrary expressions. We should probably allow that too, but only for assignment to instance variables using `self` (it's easy for the type checker to only allow certain forms -- the runtime should be more lenient). A type checker that warns about instance variables used or set without a declaration could be very useful indeed (catching many typos). 
> In summary, I think type annotations on variables declared at class > scope should describe the type of the class variable - because that's > what the assignment is creating. That leaves no obvious means of > declaring the type of an instance variable (can you put a > comment-style type annotation on the self.x = whatever line at the > moment?) Which is a problem, but not one (IMO) that should be solved > by somehow pretending that when you declare a class variable you're > actually declaring an instance variable. I look at it from a more pragmatic side. What do I want my type checker to check? The annotations need to be useful to help me catch bugs but not so annoying that I have to constantly override the type checker. From this perspective I definitely want a way to declare the types of instance variables at the class level. Packages like SQLAlchemy and Django and traits, that have each developed their own, quite sophisticated machinery for declaring instance variables, seem to support this perspective. -- --Guido van Rossum (python.org/~guido) From p.f.moore at gmail.com Wed Aug 3 13:29:17 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 3 Aug 2016 18:29:17 +0100 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On 3 August 2016 at 17:26, Guido van Rossum wrote: > But here you're thinking with your runtime hat on Yep, that's the mistake I was making. Thanks for clarifying. I've yet to actually use a type checker[1], so I'm still not used to thinking in terms of "typecheck-time" behaviour. Paul [1] Off-topic, but I'm not sure if type annotations are powerful enough (yet) to track bytes vs str sufficiently to help with resolving Python 2 -> Python 3 string handling errors. 
When I get a chance to confirm that's possible, I definitely have some candidate codebases I'd like to try my hand with :-) From steve at pearwood.info Wed Aug 3 13:35:32 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 4 Aug 2016 03:35:32 +1000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: <20160803173532.GG6608@ando.pearwood.info> On Wed, Aug 03, 2016 at 09:11:42AM -0700, Guido van Rossum wrote: > On Wed, Aug 3, 2016 at 8:24 AM, Nick Coghlan wrote: > > On 3 August 2016 at 10:48, Alvaro Caceres via Python-ideas > > wrote: > >> The criticism I would make about allowing variables without assignments like > >> > >> a: float > >> > >> is that it makes my mental model of a variable a little bit more complicated > >> than it is currently. If I see "a" again a few lines below, it can either be > >> pointing to some object or be un-initialized. Maybe the benefits are worth > >> it, but I don't really see it, and I wanted to point out this "cost". > > > > This concern rings true for me as well - "I'm going to be defining a > > variable named 'a' later and it will be a float" isn't a concept > > Python has had before. I *have* that concept in my mental model of > > C/C++, but trying to activate for Python has my brain going "Wut? > > No.". > > Have you annotated a large code base yet? This half of the proposal > comes from over six months of experience annotating large amounts of > code (Dropbox code and mypy itself). [...] > IOW I want to be able to write this code as > > result: Type > if op == 'not': > self.check_usable_type(operand_type, e) > result = self.chk.bool_type() > elif op == '-': > # etc. Just playing Devil's Advocate here, could you use a type hint *comment*? #type result: Type if op == 'not': self.check_usable_type(operand_type, e) result = self.chk.bool_type() elif op == '-': # etc. 
Advantages: - absolutely no runtime cost, not even to evaluate and discard the name following the colon; - doesn't (wrongly) imply that the name "result" exists yet; Disadvantages: - can't build a runtime __annotations__ dict; [...] > But, again, real problems arise when the type of an *initialized* > instance must always be some data structure (and not None), but you > can't come up with a reasonable default initializer that has the > proper type. Again, playing Devil's Advocate... maybe we want a concept of "undefined" like in Javascript. result: Type = undef which would make it clear that this is a type declaration, and that result is still undefined. Disadvantages: - Javascript programmers will think you can write `print(result)` and get "undef" (or similar); - requires a new keyword. -- Steve From elazarg at gmail.com Wed Aug 3 13:52:38 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Wed, 03 Aug 2016 17:52:38 +0000 Subject: [Python-ideas] Trial balloon: adding variable type In-Reply-To: References: Message-ID: > > Date: Wed, 3 Aug 2016 09:11:42 -0700 > From: Guido van Rossum ... It would be much cleaner if you could declare the > variable before the first `if`. But picking a good initializer is > tricky, especially if you have a type that does not include None. > PEP-484 suggests ellipsis for this. ~Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Aug 3 14:01:23 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 3 Aug 2016 11:01:23 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Wed, Aug 3, 2016 at 10:29 AM, Paul Moore wrote: > On 3 August 2016 at 17:26, Guido van Rossum wrote: >> But here you're thinking with your runtime hat on > > Yep, that's the mistake I was making. Thanks for clarifying.
> > I've yet to actually use a type checker[1], so I'm still not used to > thinking in terms of "typecheck-time" behaviour. That's unfortunate -- the feature I'm developing here is only of interest to people using a type checker, and the many details of what it should look like and how it should work will benefit most from feedback from people who have actually dealt with the existing way of declaring variables. > Paul > > [1] Off-topic, but I'm not sure if type annotations are powerful > enough (yet) to track bytes vs str sufficiently to help with resolving > Python 2 -> Python 3 string handling errors. When I get a chance to > confirm that's possible, I definitely have some candidate codebases > I'd like to try my hand with :-) Not yet, but we're working on it. See https://github.com/python/typing/issues/208 -- --Guido van Rossum (python.org/~guido) From guido at python.org Wed Aug 3 14:03:54 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 3 Aug 2016 11:03:54 -0700 Subject: [Python-ideas] Trial balloon: adding variable type In-Reply-To: References: Message-ID: On Wed, Aug 3, 2016 at 10:52 AM, ????? wrote: >> Date: Wed, 3 Aug 2016 09:11:42 -0700 >> From: Guido van Rossum >> >> ... It would be much cleaner if you could declare the >> variable before the first `if`. But picking a good initializer is >> tricky, especially if you have a type that does not include None. > > PEP-484 suggests ellipsis for this. But x = ... already has a meaning -- it assigns x the (fairly pointless) value Ellipsis (which is a singleton object like None). 
-- --Guido van Rossum (python.org/~guido) From chris.barker at noaa.gov Wed Aug 3 14:33:34 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 3 Aug 2016 11:33:34 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <20160803173532.GG6608@ando.pearwood.info> References: <20160803173532.GG6608@ando.pearwood.info> Message-ID: On Wed, Aug 3, 2016 at 10:35 AM, Steven D'Aprano wrote: > Again, playing Devil's Advocate... maybe we want a concept of > "undefined" like in Javascript. > > result: Type = undef > > which would make it clear that this is a type declaration, and that > result is still undefined. > I like this -- and we need to change the interpreter anyway. I take it this would be a no-op at run time? Though I'm still on the fence -- it's a common idiom to use None to mean undefined. For example, I, at least, always thought it was better style to do: class Something(): an_attribute = None and then: if self.an_attribute is None: .... Than not predefine it, and do: if not hasattr(self, 'an_attribute'): .... granted, I only do that for class (or instance) attributes in real code, not all names. So do we really need to support "this variable can be undefined, but if it is defined it can NOT be None"? Are there really cases where a variable can't be None, but there is NO reasonable default value? Or are folks hitting a limitation in the type checker's ability to deal with None? > Disadvantages: > > - Javascript programmers will think you can write `print(result)` and > get "undef" (or similar); > Do we care about that???? BTW, does JS have a None? or is 'undef' its None? -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed...
URL: From kevin-lists at theolliviers.com Wed Aug 3 15:53:48 2016 From: kevin-lists at theolliviers.com (Kevin Ollivier) Date: Wed, 03 Aug 2016 12:53:48 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: <20160803173532.GG6608@ando.pearwood.info> Message-ID: <147CB148-89DB-4F2C-A88C-550A2E237338@theolliviers.com> Hi Chris, From: Python-ideas on behalf of Chris Barker Date: Wednesday, August 3, 2016 at 11:33 AM To: Steven D'Aprano Cc: Python-Ideas Subject: Re: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 On Wed, Aug 3, 2016 at 10:35 AM, Steven D'Aprano wrote: Again, playing Devil's Advocate... maybe we want a concept of "undefined" like in Javascipt. result: Type = undef which would make it clear that this is a type declaration, and that result is still undefined. I like this -- and we need to change the interpreter anyway. I take it this would be a no-op at run time? Though I'm still on the fence -- it's a common idiom to use None to mean undefined. For example, I at least, always thought is was better style to do: class Something(): an_attribute = None and then: if self.an_attribute is None: .... Than not predefine it, and do: if not hasattr(self, 'an_attribute'): .... granted, I only do that for class (or instance) attributes in real code, not all names. This is SOP for me, too. I think the hassle of dealing with uninitialized variables (and the bugs that result when not being diligent about them) naturally discourages their use. So do we really need to support "this variable can be undefined, but if it is defined in can NOT be None? Are there really cases where a variable can't be None, but there is NO reasonable default value? Or are folks hitting a limitation in the type checker's ability to deal with None? Disadvantages: - Javascript programmers will think you can write `print(result)` and get "undef" (or similar); Do we care about that???? 
BTW, does JS have a None? or is 'undef' its None?

JavaScript has the null keyword in addition to undefined.

In my experience with JavaScript, in practice having both a null keyword
and an undefined keyword almost always ends up being a source of bugs.
Consider the non-intuitive results of the following:

var value = undefined;
if (value == null) // true if value is undefined, because loose equality (==) treats null and undefined as equal

var value = null;
if (value == undefined) // true if value is null, for the same reason

This is one reason why JS needs the === / !== operators in order to
short-circuit type coercion, and the need to do this is a common source of
mistakes among JS developers.

I do think Python is better equipped to make this less confusing thanks to
the is keyword, etc., but no matter how you look at it, undef, if created,
will be something distinct from None and yet sometimes devs will see the
two as equivalent. This opens up questions like "do we want a way for None
and undef to be equivalent sometimes, but in other cases consider them
distinct cases?"

I'd say keep it simple. Have developers use None for uninitialized
variables if we require assignment, or simply make:

a: int

implicitly set the value to None if we allow no explicit assignment.

Regards,

Kevin

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

_______________________________________________
Python-ideas mailing list
Python-ideas at python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From rosuav at gmail.com Wed Aug 3 16:10:21 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 4 Aug 2016 06:10:21 +1000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <20160803173532.GG6608@ando.pearwood.info> References: <20160803173532.GG6608@ando.pearwood.info> Message-ID: On Thu, Aug 4, 2016 at 3:35 AM, Steven D'Aprano wrote: >> But, again, real problems arise when the type of an *initialized* >> instance must always be some data structure (and not None), but you >> can't come up with a reasonable default initializer that has the >> proper type. > > Again, playing Devil's Advocate... maybe we want a concept of > "undefined" like in Javascipt. > > result: Type = undef > > which would make it clear that this is a type declaration, and that > result is still undefined. > > Disadvantages: > > - Javascript programmers will think you can write `print(result)` and > get "undef" (or similar); > - requires a new keyword. Python already has a way of spelling "make this no longer be defined": del result Not sure that helps, though, because it too will NameError if it doesn't already exist :) ChrisA From p.f.moore at gmail.com Wed Aug 3 16:54:39 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 3 Aug 2016 21:54:39 +0100 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On 3 August 2016 at 19:01, Guido van Rossum wrote: > On Wed, Aug 3, 2016 at 10:29 AM, Paul Moore wrote: >> On 3 August 2016 at 17:26, Guido van Rossum wrote: >>> But here you're thinking with your runtime hat on >> >> Yep, that's the mistake I was making. Thanks for clarifying. >> >> I've yet to actually use a type checker[1], so I'm still not used to >> thinking in terms of "typecheck-time" behaviour. 
> > That's unfortunate -- the feature I'm developing here is only of > interest to people using a type checker, and the many details of what > it should look like and how it should work will benefit most from > feedback from people who have actually dealt with the existing way of > declaring variables. Understood. Mostly I'm keeping out of it - I chimed in because I (mistakenly) thought the way of declaring instance variables clashed with a non-(typing)-expert's understanding of what was going on (defining a class variable which is used as a fallback value if there's no instance variable of the same name). I still think the notation could be confusing, but the confusion can be fixed with documentation, and I'm happy to propose fixes for documentation when I finally get round to using the feature. Sorry for the noise. >> [1] Off-topic, but I'm not sure if type annotations are powerful >> enough (yet) to track bytes vs str sufficiently to help with resolving >> Python 2 -> Python 3 string handling errors. When I get a chance to >> confirm that's possible, I definitely have some candidate codebases >> I'd like to try my hand with :-) > > Not yet, but we're working on it. See > https://github.com/python/typing/issues/208 Great :-) Paul From pavol.lisy at gmail.com Wed Aug 3 18:02:20 2016 From: pavol.lisy at gmail.com (Pavol Lisy) Date: Thu, 4 Aug 2016 00:02:20 +0200 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: If I understand this proposal then we probably need to consider this too: if something: a: float else: a: str a = 'what would static type-checker do here?' 
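A minimal sketch of the runtime behaviour this question eventually settled on (PEP 526, accepted for Python 3.6 after this thread): a bare annotation records an entry in ``__annotations__`` without binding the name, so only the static checker has to worry about the branches above:

```python
# Sketch (Python 3.6+): execute a module-level bare annotation and inspect it.
ns = {}
exec("a: float", ns)

# The annotation is recorded in the namespace's __annotations__ dict...
print(ns["__annotations__"])  # {'a': <class 'float'>}

# ...but the name itself is still unbound, so reading `a` would raise NameError.
print("a" in ns)  # False
```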
del a a: int = 0 def fnc(): global a: list a = 7 fnc() a = [1, 2, 3] # and this could be interesting for static type-checker too From mistersheik at gmail.com Wed Aug 3 19:01:22 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 3 Aug 2016 16:01:22 -0700 (PDT) Subject: [Python-ideas] Making the stdlib consistent again In-Reply-To: References: Message-ID: At the risk of making a long thread even longer, one idea that would not involve any work on the part of core developers, but might still partially satisfy the proponents of this plan is to maintain your own shim modules. For example, nose is a (the de facto?) testing library for Python, and it provides pep8 aliases to most of unittest's commands. You could have your own similar project on pypi, e.g., "logging8" or something like that. I for one would use it instead of logging because I find the pep8 style "relaxing". Best, Neil On Monday, August 1, 2016 at 1:34:13 PM UTC-4, Brett Cannon wrote: > > > > On Mon, Aug 1, 2016, 08:36 Matthias welp > > wrote: > >> On 1 August 2016 at 16:23, Chris Angelico > >> wrote: >> >On Mon, Aug 1, 2016 at 11:31 PM, Matthias welp > > wrote: >> >>> Now, suppose that the "external function" switches to use the name >> >>> collections.DefaultDict. The above code will break, unless the two >> >>> names defaultdict and DefaultDict are updated in parallel somehow to >> >>> both point to MyMockClass. How do you propose we support that? >> >> >> >> This would be fixed with the 'aliasing variable names'-solution. >> >> >> > >> > Not sure I follow; are you proposing that module attributes be able to >> > say "I'm the same as that guy over there"? That could be done with >> > descriptor protocol (think @property, where you can write a >> > getter/setter that has the actual value in a differently-named public >> > attribute), but normally, modules don't allow that, as you need to >> > mess with the class not the instance. 
But there have been numerous
>> > proposals to make that easier for module authors, one way or another.
>> > That would make at least some things easier - the mocking example
>> > would work that way - but it'd still mean people have to grok more
>> > than one name when reading code, and it'd most likely mess with
>> > people's expectations in tracebacks etc (the function isn't called
>> > what I thought it was called). Or is that not what you mean by
>> > aliasing?
>>
>> By aliasing I meant that the names of the functions/variables/
>> classes (variables) are all using the same value pointer/location.
>> That could mean that in debugging the name of the variable
>> is used as the name of the function. e.g. debugging get_int
>> results in 'in get_int(), line 5' but its original getint results in
>> 'in getint(), line 5'. Another option is 'in get_int(), line 5 of
>> getint()'
>> if you want to retain the source location and name.
>>
>
> And all of that requires work beyond simple aliasing by assignment. That
> means writing code to make this work as well as tests to make sure nothing
> breaks (on top of the documentation).
>
> Multiple core devs have now said why this isn't a technical problem. Nick
> has pointed out that unless someone does a study showing the new names
> would be worth the effort then the situation will not change. At this point
> the core devs have either muted this thread and thus you're not reaching
> them anymore or we are going to continue to give the same answer and we
> feel like we're repeating ourselves and this is a drain on our time.
>
> I know everyone involved on this thread means well (including you,
> Matthias), but do realize that not letting this topic go is a drain on
> people's time.
Every email you write is taking the time of hundreds of
> people, so it's not just taking 10 minutes of my spare time to read and
> respond to this email while I'm on vacation (happy BC Day), but it's a
> minute for everyone else who is on this mailing list to simply decide what
> to do with it (you can't assume people can mute threads thanks to the
> variety of email clients out there). So every email sent to this list
> literally takes an accumulative time of hours from people to deal with. So
> when multiple core devs have given an answer and what it will take to
> change the situation then please accept that answer. Otherwise you run the
> risk of frustrating the core devs by making us feel like we're not
> being listened to or trusted. And that is part of what leads to burnout for
> the core devs.
>
> -brett
>
>
>> It could be descriptors for Python library implementations, but in C
>> it could be implemented as a pointer to instead of
>> a struct containing , or it could compile(?) to use
>> the same reference.
>>
>> I am not familiar enough with the structure of CPython and
>> how its variable lookups are built, but these are just a few ideas.
>>
>> -Matthias
>> _______________________________________________
>> Python-ideas mailing list
>> Python... at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From guido at python.org  Wed Aug  3 19:53:55 2016
From: guido at python.org (Guido van Rossum)
Date: Wed, 3 Aug 2016 16:53:55 -0700
Subject: [Python-ideas] Trial balloon: adding variable type declarations
 in support of PEP 484
In-Reply-To: 
References: 
Message-ID: 

On Wed, Aug 3, 2016 at 3:02 PM, Pavol Lisy wrote:
> If I understand this proposal then we probably need to consider this too:
>
> if something:
>     a: float
> else:
>     a: str
>
> a = 'what would static type-checker do here?'
The beauty of it is that that's entirely up to the static checker. In
mypy this would probably be an error. But at runtime we can make this
totally well-defined.

> del a
> a: int = 0

Ditto.

> def fnc():
>     global a: list

I'm not proposing to add such syntax, and the right place for the type
of a would be at the global level, not on the `global` statement.

> a = 7
> fnc()
> a = [1, 2, 3]  # and this could be interesting for static type-checker too

Indeed, but that's not what we're debating here.

-- 
--Guido van Rossum (python.org/~guido)

From steve at pearwood.info  Wed Aug  3 20:52:12 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 4 Aug 2016 10:52:12 +1000
Subject: [Python-ideas] Trial balloon: adding variable type declarations
 in support of PEP 484
In-Reply-To: 
References: <20160803173532.GG6608@ando.pearwood.info>
Message-ID: <20160804005212.GI6608@ando.pearwood.info>

On Wed, Aug 03, 2016 at 11:33:34AM -0700, Chris Barker wrote:
> On Wed, Aug 3, 2016 at 10:35 AM, Steven D'Aprano
> wrote:
>
> > Again, playing Devil's Advocate... maybe we want a concept of
> > "undefined" like in Javascript.
> >
> > result: Type = undef
> >
> > which would make it clear that this is a type declaration, and that
> > result is still undefined.
> >
>
> I like this -- and we need to change the interpreter anyway. I take it this
> would be a no-op at run time?

That was my intention.

result: Type = undef

would be a no-op at runtime, but the type-checker could take the
type-hint from it. There's no actual "undef" value. If you followed that
line by:

print(result)

you would get a NameError, same as today.

To answer Chris A's objection in a later post, this is *not* an
alternative to the existing `del` keyword. You can't use it to delete an
existing name:

result = 23
result = undef  # SyntaxError, no type given
result: Type = undef  # confuse the type-checker AND a runtime no-op

> Though I'm still on the fence -- it's a common idiom to use None to mean
> undefined.
Yes, but None is an actual value. "undef" would not be, it would just be
syntax to avoid giving an actual value. I don't actually want to
introduce a special "Undefined" value like in Javascript, I just want
something to say to the program "this is just a type declaration, no
value has been defined as yet".

> So do we really need to support "this variable can be undefined, but if it
> is defined it can NOT be None"?
>
> Are there really cases where a variable can't be None, but there is NO
> reasonable default value?

Yes. See Guido's responses. The use-case is:

(1) you have a variable that needs a type-hint for a specific type,
excluding None;

(2) there's no default value you can give it;

(3) and it's set in different places (such as in different branches of
an if...elif... block).

Currently, you have to write:

if condition:
    spam: Widget = Widget(arg)
    # lots more code here
elif other_condition:
    spam: Widget = load(storage)
    # lots more code here
elif another_condition:
    spam: Widget = eggs.lookup('cheese')['aardvark']
    # lots more code here
else:
    spam: Widget = create_widget(x, y, z)
    # lots more code here

which repeats the Widget declaration multiple times. For readability and
maintenance, Guido wants to pull the declaration out and put it at the
top like this:

spam = ????  #type spam:Widget
if condition:
    spam = Widget(arg)
    # lots more code here
# etc.

but there's nothing he can put in place of the ???? placeholder. He
can't use a specific Widget, because he doesn't know which Widget will
be used. He can't use None, because None is not a valid value for spam.
Neither of these options are right:

spam: Widget = None  # fails the type check
spam: Optional[Widget] = None  # passes the type check here
                               # but allows spam to be None later, which is wrong.

> Or are folks hitting a limitation in the type checker's ability to deal
> with None?

No. The limitation is that there's no way to declare a type-hint without
declaring a value at the same time.
Hence some suggestions: spam: Widget spam: Widget = undef #type spam: Widget > > Disadvantages: > > > > - Javascript programmers will think you can write `print(result)` and > > get "undef" (or similar); > > > > Do we care about that???? I shouldn't have specifically said "Javascript programmers". I think it applies to lots of people. And besides, there's probably already people who have a variable called "undef" and turning it into a keyword will break their code. > BTW, does JS have a None? or is 'undef' its None? Javascript has a null, as well as an undefined value. Thanks to the magic of Javascript's type-coercion rules, they compare equal: [steve at ando ~]$ rhino Rhino 1.7 release 0.7.r2.3.el5_6 2011 05 04 js> a = null; null js> b = undefined; js> print(a, b) null undefined js> a == b; true js> a === b; false In case my intent is still not clear, I DON'T want to introduce an actual "undefined" value, as in Javascript. I only made this suggestion in the hope that it might spark a better idea in others. -- Steve From guido at python.org Wed Aug 3 21:58:14 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 3 Aug 2016 18:58:14 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <20160804005212.GI6608@ando.pearwood.info> References: <20160803173532.GG6608@ando.pearwood.info> <20160804005212.GI6608@ando.pearwood.info> Message-ID: On Wed, Aug 3, 2016 at 5:52 PM, Steven D'Aprano wrote: [...] > result: Type = undef > > would be a no-op at runtime, but the type-checker could take the > type-hint from it. There's no actual "undef" value. If you followed that > line by: > > print(result) > > you would get a NameError, same as today. [...] Thanks for explaining my use case so well (in the text I snipped). However, I still feel that just result: Type is the better alternative. Syntactically there's no problem with it. 
It doesn't require a new `undef` keyword or constant, and it doesn't make people think there's an `undef` value. As you mentioned, this has been an unmitigated disaster in JS. -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Thu Aug 4 02:11:37 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 04 Aug 2016 18:11:37 +1200 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: <57A2DC99.3090302@canterbury.ac.nz> Guido van Rossum wrote: > Which is why I proposed that you can > make it a class variable by saying `hitpoints: class int = 50`, and > the default would be for it to be an instance variable. What about class variables that serve as defaults for instance variables? Would you be required to declare it twice in that case? -- Greg From greg.ewing at canterbury.ac.nz Thu Aug 4 02:15:10 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 04 Aug 2016 18:15:10 +1200 Subject: [Python-ideas] Trial balloon: adding variable type In-Reply-To: References: Message-ID: <57A2DD6E.5020902@canterbury.ac.nz> Guido van Rossum wrote: > But > > x = ... > > already has a meaning -- it assigns x the (fairly pointless) value > Ellipsis You can always say x = Ellipsis if you really want that. -- Greg From greg.ewing at canterbury.ac.nz Thu Aug 4 02:38:00 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 04 Aug 2016 18:38:00 +1200 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <20160804005212.GI6608@ando.pearwood.info> References: <20160803173532.GG6608@ando.pearwood.info> <20160804005212.GI6608@ando.pearwood.info> Message-ID: <57A2E2C8.9060208@canterbury.ac.nz> Concerning type annotations on local variables of a function: 1) Would they be stored anywhere? If so, where? 2) Would they be evaluated once, or every time the function is called? 
3) If they're evaluated once, when and in what scope?

-- 
Greg

From elazarg at gmail.com  Thu Aug  4 03:07:29 2016
From: elazarg at gmail.com (Elazar)
Date: Thu, 04 Aug 2016 07:07:29 +0000
Subject: [Python-ideas] Trial balloon: adding variable type declarations
 in support of PEP 484
In-Reply-To: 
References: 
Message-ID: 

> Date: Wed, 3 Aug 2016 16:53:55 -0700
> From: Guido van Rossum
>
> On Wed, Aug 3, 2016 at 3:02 PM, Pavol Lisy wrote:
>
> > def fnc():
> >     global a: list
>
> I'm not proposing to add such syntax, and the right place for the type
> of a would be at the global level, not on the `global` statement.
>

If 'a' is only read, it looks like a sensible way to state that this
function should be called only when 'a' is a list. It's not the same as a
global type declaration. More like a parameter type hint.

If 'a' is written to, it is very similar to the 'if else' example. (which,
if I may add, is usually defined as the meet of the assertions. If we talk
about abstract interpretation frameworks)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pavol.lisy at gmail.com  Thu Aug  4 03:16:13 2016
From: pavol.lisy at gmail.com (Pavol Lisy)
Date: Thu, 4 Aug 2016 09:16:13 +0200
Subject: [Python-ideas] Trial balloon: adding variable type declarations
 in support of PEP 484
In-Reply-To: 
References: 
Message-ID: 

On 8/4/16, Guido van Rossum wrote:
> On Wed, Aug 3, 2016 at 3:02 PM, Pavol Lisy wrote:
>> def fnc():
>>     global a: list
>
> I'm not proposing to add such syntax, and the right place for the type
> of a would be at the global level, not on the `global` statement.
>
>> a = 7
>> fnc()
>> a = [1, 2, 3]  # and this could be interesting for static type-checker
>> too
>
> Indeed, but that's not what we're debating here.

Sorry, but for me it is really important where we are going (at least as
inspiration).
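Guido's answer above — the type belongs at the global level, not on the `global` statement — can be sketched with the annotation syntax later adopted in Python 3.6 (PEP 526); the names here are purely illustrative:

```python
a: list = []  # the global's type is declared once, at module level

def fnc() -> None:
    global a  # the `global` statement itself carries no annotation
    a = [1, 2, 3]  # consistent with the module-level declaration

fnc()
print(a)  # [1, 2, 3]
```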
As I understand it now, these changes could end in code which could be
pure Python and, on the other hand, also compilable by a statically
typed compiler. Maybe that is what we wanted (I prefer this one), maybe
we don't care, and maybe we want to avoid it.

For example Cython's code:

cdef int n

could be written:

n: cdef.int

or

n: 'cdef int'

and I think it could be good to see convergence here.

And maybe this

cdef int n, k, i

could be inspiring too, allowing a one-to-many possibility

n, j, k: int  # (1)

I think it is natural and better than

n, j, k: int, int, int

But (1) would be discordant with this (from PEP484)

def __init__(self, left: Node, right: Node)

From srkunze at mail.de  Thu Aug  4 04:46:03 2016
From: srkunze at mail.de (Sven R. Kunze)
Date: Thu, 4 Aug 2016 10:46:03 +0200
Subject: [Python-ideas] Trial balloon: adding variable type declarations
 in support of PEP 484
In-Reply-To: 
References: 
Message-ID: <6059264f-f312-9dac-1e1e-9efc90a606b6@mail.de>

Another thought:

Can it be used for something other than type declarations, like function
annotations allow?

On 01.08.2016 23:31, Guido van Rossum wrote:
> PEP 484 doesn't change Python's syntax. Therefore it has no good
> syntax to offer for declaring the type of variables, and instead you
> have to write e.g.
>
> a = 0 # type: float
> b = [] # type: List[int]
> c = None # type: Optional[str]
>
> I'd like to address this in the future, and I think the most elegant
> syntax would be to let you write these as follows:
>
> a: float = 0
> b: List[int] = []
> c: Optional[str] = None
>
> (I've considered a 'var' keyword in the past, but there just are too
> many variables named 'var' in my code. :-)
>
> There are some corner cases to consider. First, to declare a
> variable's type without giving it an initial value, we can write this:
>
> a: float
>
> Second, when these occur in a class body, they can define either class
> variables or instance variables. Do we need to be able to specify
> which?
> > Third, there's an annoying thing with tuples/commas here. On the one > hand, in a function declaration, we may see (a: int = 0, b: str = ''). > On the other hand, in an assignment, we may see > > a, b = 0, '' > > Suppose we wanted to add types to the latter. Would we write this as > > a, b: int, str = 0, '' > > or as > > a: int, b: str = 0, '' > > ??? Personally I think neither is acceptable, and we should just write it as > > a: int = 0 > b: str = '' > > but this is a slight step back from > > a, b = 0, '' # type: (int, str) > From greg.ewing at canterbury.ac.nz Thu Aug 4 01:59:39 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 04 Aug 2016 17:59:39 +1200 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <4929689045364026497@unknownmsgid> References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> <20160801024738.GC13777@ando.pearwood.info> <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> <4929689045364026497@unknownmsgid> Message-ID: <57A2D9CB.7090403@canterbury.ac.nz> Chris Barker - NOAA Federal wrote: > One could argue that: > > clamp(NaN, x,x) > > Is clearly defined as x. But that would require special casing, and, > "equality" is a bit of an ephemeral concept with floats, so better to > return NaN. Yeah, it would only apply to a vanishingly small part of the possible parameter space, so I don't think it would be worth the bother. -- Greg From srkunze at mail.de Thu Aug 4 04:59:39 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 4 Aug 2016 10:59:39 +0200 Subject: [Python-ideas] proper naming of Enum members In-Reply-To: <20160718143503.42267702@anarchist.wooz.org> References: <578CF316.7040806@stoneleaf.us> <20160718143503.42267702@anarchist.wooz.org> Message-ID: On 18.07.2016 20:35, Barry Warsaw wrote: > and so on. Make the code less shifty :). 
> That's what I thought as well and good reference to repeating the namespace of the constants. +1 ALL_CAPS feels very inconvenient to me as it reminds me of 1) ancient history (DOS, C, etc.) and 2) A SHOUTING CONVERSATION WHICH GIVES TOO MUCH WEIGHT TO UNIMPORTANT THINGS!!!!! CAN YOU HERE MEEE?? ;-) Sven From srkunze at mail.de Thu Aug 4 05:06:26 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 4 Aug 2016 11:06:26 +0200 Subject: [Python-ideas] Sequence views In-Reply-To: References: Message-ID: On 19.07.2016 16:43, Serhiy Storchaka wrote: > NumPy have similar features, but they work with packed arrays of > specific numeric types, not with general sequences (such as list or > str). And NumPy is a large library, providing a number of features not > needed for most Python users. IIRC, numpy also supports object-arrays not just numeric arrays. Sven From dmoisset at machinalis.com Thu Aug 4 07:03:26 2016 From: dmoisset at machinalis.com (Daniel Moisset) Date: Thu, 4 Aug 2016 12:03:26 +0100 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <6059264f-f312-9dac-1e1e-9efc90a606b6@mail.de> References: <6059264f-f312-9dac-1e1e-9efc90a606b6@mail.de> Message-ID: I'll post here another wild idea (based on the https://github.com/python/ mypy/blob/master/mypy/build.py#L976 example), just thinking on instance attributes. I myself am not fully sold on it but it's quite different so perhaps adds a new angle to the brainstorming and has some nice upsides: class State: def __instancevars__( manager: BuildManager, order: int, # Order in which modules were encountered id: str, # Fully qualified module name path: Optional[str], # Path to module source xpath: str, # Path or '' source: Optional[str], # Module source code meta: Optional[CacheMeta], data: Optional[str], tree: Optional[MypyFile], dependencies: List[str], suppressed: List[str], # Suppressed/missing dependencies priorities: Dict[str, int] ): ... 
The benefits of this are:

* No need to add new syntax or keywords here, this works in any 3.x
* This part of the change can be done right now without waiting for
python 3.6
* it's guaranteed to be consistent with function annotations
* The instance vars annotations end up in a standard place on runtime if
someone needs them (run time checking, documentation generators, etc).
* No confusing access to a name that may look like a NameError
* Very obvious runtime behaviour for python users even if they don't use
a typechecker.
* This can even be used in python 2.x, by changing the annotation to a
"# type: T" along the argument.

The downsides are:

* There is a bit of boilerplate (the parenthesis, the ellipsis, the
extra commas)
* It's unclear what the body of this function is supposed to do, but
probably the typechecker can ensure it's always "..." (I'm assuming that
whoever writes this wants to use a type checker anyway)
* It's also unclear what a default argument would mean here, or *args,
or **kwargs (again, the typechecker could enforce proper usage here)
* It doesn't cover locals and class variables, so the new syntax for
"type declaration and initialization assignment" is still necessary (but
not the "declaration without assignment")

As second-order iterations on the idea

* If this idea works well now it's easier to add nicer syntax later that
maps to this (like a block statement) and specify the runtime semantics
in terms of this (which is what I see as the traditional way in python)
* It could be possible to give semantics to the body (like a post
__new__ thing where locals() from this function are copied to the
instance dict), and this would allow setting real instance defaults ()
but this is new class creation semantics and can have a performance cost
on instantiation when having types declared

Hope this adds to the discussion,

On Thu, Aug 4, 2016 at 9:46 AM, Sven R.
Kunze wrote: > Another thought: > > Can it be used for something else than type declarations like function > annotations allow? > > > On 01.08.2016 23:31, Guido van Rossum wrote: > >> PEP 484 doesn't change Python's syntax. Therefore it has no good >> syntax to offer for declaring the type of variables, and instead you >> have to write e.g. >> >> a = 0 # type: float >> b = [] # type: List[int] >> c = None # type: Optional[str] >> >> I'd like to address this in the future, and I think the most elegant >> syntax would be to let you write these as follows: >> >> a: float = 0 >> b: List[int] = [] >> c: Optional[str] = None >> >> (I've considered a 'var' keyword in the past, but there just are too >> many variables named 'var' in my code. :-) >> >> There are some corner cases to consider. First, to declare a >> variable's type without giving it an initial value, we can write this: >> >> a: float >> >> Second, when these occur in a class body, they can define either class >> variables or instance variables. Do we need to be able to specify >> which? >> >> Third, there's an annoying thing with tuples/commas here. On the one >> hand, in a function declaration, we may see (a: int = 0, b: str = ''). >> On the other hand, in an assignment, we may see >> >> a, b = 0, '' >> >> Suppose we wanted to add types to the latter. Would we write this as >> >> a, b: int, str = 0, '' >> >> or as >> >> a: int, b: str = 0, '' >> >> ??? Personally I think neither is acceptable, and we should just write it >> as >> >> a: int = 0 >> b: str = '' >> >> but this is a slight step back from >> >> a, b = 0, '' # type: (int, str) >> >> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Daniel F. 
Moisset - UK Country Manager
www.machinalis.com
Skype: @dmoisset
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From steve at pearwood.info  Thu Aug  4 08:35:58 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 4 Aug 2016 22:35:58 +1000
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To: 
References: <20160801024738.GC13777@ando.pearwood.info>
 <20160801201044.GA6608@ando.pearwood.info>
 <20160802182203.GF6608@ando.pearwood.info>
 <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com>
Message-ID: <20160804123557.GJ6608@ando.pearwood.info>

On Tue, Aug 02, 2016 at 04:35:55PM -0700, Chris Barker wrote:

> If someone is passing a NaN in for a bound, then they are passing in
> garbage, essentially -- "I have no idea what my bounds are" so garbage is
> what they should get back -- "I have no idea what your clamped values are".

The IEEE 754 standard tells us what min(x, NAN) and max(x, NAN) should
be: in both cases it is x.

https://en.wikipedia.org/wiki/IEEE_754_revision#min_and_max

Quote:

    In order to support operations such as windowing in which a NaN
    input should be quietly replaced with one of the end points, min
    and max are defined to select a number, x, in preference to a
    quiet NaN:

    min(x,NaN) = min(NaN,x) = x
    max(x,NaN) = max(NaN,x) = x

According to Wikipedia, this behaviour was chosen specifically for the
use-case we are discussing: windowing or clamping.

See also page 9 of Professor William Kahan's notes here:

https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF

Quote:

    For instance max{x, y} should deliver the same result as max{y, x}
    but almost no implementations do that when x is NaN. There are
    good reasons to define max{NaN, 5} := max{5, NaN} := 5 though many
    would disagree.

It's okay to disagree and want "NAN poisoning" behaviour. If we define
clamp(x, NAN, NAN) as x, as I have been arguing, then you can *easily*
get the behaviour you want with a simple wrapper:

def clamp(x, lower, upper):
    if math.isnan(lower) or math.isnan(upper):
        raise ValueError('NaN bound')  # or return a NAN, as you prefer
    else:
        return math.clamp(x, lower, upper)

Apart from the cost of one extra function call, which isn't too bad,
this is no more expensive than what you are suggesting *everyone*
should pay (two calls to math.isnan). So you are no worse off under my
proposal: just define your own helper function, and you get the
behaviour you want. We all win.

But if the standard clamp() function has the behaviour you want,
violating IEEE-754, then you are forcing it on *everyone*, whether they
want it or not. I don't want it, and I cannot use it. There's nothing I
can do except re-implement clamp() from scratch and ignore the one in
the math library.

As you propose it, clamp() is no use to me: it unnecessarily converts
the bounds to float, which may raise an exception. If I use it in a
loop, it unnecessarily checks to see if the bounds are NANs, over and
over and over again, even when I know that they aren't. It does the
wrong thing (according to my needs, according to Professor Kahan, and
according to the current revision of IEEE-754) if I do happen to pass a
NAN as bounds.

Numpy has a "nanmin" which ignores NANs (as specified by IEEE-754), and
"amin" which propagates NANs:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.nanmin.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.amin.html

Similar for "minimum" and "fmin", which return the element-wise
minimums.

By the way, there are also POSIX functions fmin and fmax which behave
according to the standard:

http://man7.org/linux/man-pages/man3/fmin.3.html
http://man7.org/linux/man-pages/man3/fmax.3.html

Julia has a clamp() function, although unfortunately the documentation
doesn't say what the behaviour with NANs is:

http://julia.readthedocs.io/en/latest/stdlib/math/#Base.clamp

-- 
Steve

From ncoghlan at gmail.com  Thu Aug  4 08:37:33 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 4 Aug 2016 22:37:33 +1000
Subject: [Python-ideas] Trial balloon: adding variable type declarations
 in support of PEP 484
In-Reply-To: 
References: 
Message-ID: 

On 4 August 2016 at 02:11, Guido van Rossum wrote:
> On Wed, Aug 3, 2016 at 8:24 AM, Nick Coghlan wrote:
>> On 3 August 2016 at 10:48, Alvaro Caceres via Python-ideas
>> wrote:
>>> The criticism I would make about allowing variables without assignments like
>>>
>>> a: float
>>>
>>> is that it makes my mental model of a variable a little bit more complicated
>>> than it is currently. If I see "a" again a few lines below, it can either be
>>> pointing to some object or be un-initialized. Maybe the benefits are worth
>>> it, but I don't really see it, and I wanted to point out this "cost".
>>
>> This concern rings true for me as well - "I'm going to be defining a
>> variable named 'a' later and it will be a float" isn't a concept
>> Python has had before. I *have* that concept in my mental model of
>> C/C++, but trying to activate it for Python has my brain going "Wut?
>> No.".
>
> Have you annotated a large code base yet? This half of the proposal
> comes from over six months of experience annotating large amounts of
> code (Dropbox code and mypy itself). We commonly see situations where
> a variable is assigned on each branch of an if/elif/etc. structure. If
> you need to annotate that variable, mypy currently requires that you
> put the annotation on the first assignment to the variable, which is
> in the first branch.
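The pattern Guido describes can be sketched with the declaration form this thread converged on (PEP 526, Python 3.6): state the type once, before the branches, with no initial value, and assign a value of that type on every branch. The names here are illustrative only:

```python
def describe(n: int) -> str:
    label: str  # declared once; no value is bound here
    if n < 0:
        label = "negative"
    elif n == 0:
        label = "zero"
    else:
        label = "positive"
    return label

print(describe(-5))  # negative
print(describe(0))   # zero
```

A checker can then verify each branch against the single declaration, instead of requiring the annotation on whichever assignment happens to come first.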
If we define clamp(x, NAN, NAN) as x, as I have been arguing, then you can *easily* get the behaviour you want with a simple wrapper:

def clamp(x, lower, upper):
    if math.isnan(lower) or math.isnan(upper):
        # raise or return NAN
    else:
        return math.clamp(x, lower, upper)

Apart from the cost of one extra function call, which isn't too bad, this is no more expensive than what you are suggesting *everyone* should pay (two calls to math.isnan). So you are no worse off under my proposal: just define your own helper function, and you get the behaviour you want. We all win. But if the standard clamp() function has the behaviour you want, violating IEEE-754, then you are forcing it on *everyone*, whether they want it or not. I don't want it, and I cannot use it. There's nothing I can do except re-implement clamp() from scratch and ignore the one in the math library. As you propose it, clamp() is no use to me: it unnecessarily converts the bounds to float, which may raise an exception. If I use it in a loop, it unnecessarily checks to see if the bounds are NANs, over and over and over again, even when I know that they aren't. It does the wrong thing (according to my needs, according to Professor Kahan, and according to the current revision of IEEE-754) if I do happen to pass a NAN as bounds. Numpy has a "nanmin" which ignores NANs (as specified by IEEE-754), and "amin" which propagates NANs:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.nanmin.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.amin.html

Similar for "minimum" and "fmin", which return the element-wise minimums.
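A minimal sketch of the NAN-as-missing-value clamp described above, relying only on comparisons (the names `clamp` and `clamp_strict` are illustrative, not an actual math-module API):

```python
import math

def clamp(x, lower, upper):
    """Clamp x to [lower, upper], treating NaN bounds as 'no bound'.

    Comparisons with NaN are always False, so a NaN bound never clips,
    and a NaN x falls through and is returned unchanged.
    """
    if lower > upper:  # False when either bound is NaN
        raise ValueError("lower must not exceed upper")
    if x < lower:
        return lower
    if x > upper:
        return upper
    return x

def clamp_strict(x, lower, upper):
    """NaN-poisoning variant, built as a wrapper as suggested above."""
    if math.isnan(lower) or math.isnan(upper):
        return math.nan
    return clamp(x, lower, upper)

print(clamp(5, 0, 1))                        # clips to the upper bound
print(clamp(5, float('nan'), float('nan')))  # NaN bounds: no clipping
```

Note that the bounds-sanity check `lower > upper` is itself NaN-tolerant: it only fires for genuinely inverted numeric bounds.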
By the way, there are also POSIX functions fmin and fmax which behave according to the standard: http://man7.org/linux/man-pages/man3/fmin.3.html http://man7.org/linux/man-pages/man3/fmax.3.html Julia has a clamp() function, although unfortunately the documentation doesn't say what the behaviour with NANs is: http://julia.readthedocs.io/en/latest/stdlib/math/#Base.clamp -- Steve From ncoghlan at gmail.com Thu Aug 4 08:37:33 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 4 Aug 2016 22:37:33 +1000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On 4 August 2016 at 02:11, Guido van Rossum wrote: > On Wed, Aug 3, 2016 at 8:24 AM, Nick Coghlan wrote: >> On 3 August 2016 at 10:48, Alvaro Caceres via Python-ideas >> wrote: >>> The criticism I would make about allowing variables without assignments like >>> >>> a: float >>> >>> is that it makes my mental model of a variable a little bit more complicated >>> than it is currently. If I see "a" again a few lines below, it can either be >>> pointing to some object or be un-initialized. Maybe the benefits are worth >>> it, but I don't really see it, and I wanted to point out this "cost". >> >> This concern rings true for me as well - "I'm going to be defining a >> variable named 'a' later and it will be a float" isn't a concept >> Python has had before. I *have* that concept in my mental model of >> C/C++, but trying to activate for Python has my brain going "Wut? >> No.". > > Have you annotated a large code base yet? This half of the proposal > comes from over six months of experience annotating large amounts of > code (Dropbox code and mypy itself). We commonly see situations where > a variable is assigned on each branch of an if/elif/etc. structure. If > you need to annotate that variable, mypy currently requires that you > put the annotation on the first assignment to the variable, which is > in the first branch. 
It would be much cleaner if you could declare the > variable before the first `if`. But picking a good initializer is > tricky, especially if you have a type that does not include None. Ah, that makes sense - given that motivation, I agree it's worth introducing the concept. You'll probably want to make sure to give that example from the mypy code (or a simpler version), as I expect that pre-declaration aspect will be the most controversial part of the whole proposal. I wonder if there's some way we could make that new statement form trigger the following runtime behaviour: a : int a # This raises a new UnboundNameError at module scope, UnboundLocalError otherwise Otherwise we're at risk of allowing thoroughly confusing runtime behaviour like: >>> a = "Not an int" >>> def f(): ... # a: int would go here ... print(a) # This should really fail ... >>> f() Not an int The possibility that springs to mind is a new dedicated opcode, DECLARE_NAME, that works like an assignment that appears anywhere in the function for function namespaces, and does something new for module and class namespaces where it's like an assignment, but doesn't appear in locals(). Depending on how the latter work, we may even be able to raise a new UnboundAttributeError subclass for attempts to access declared-but-not-defined attributes. We'd also want the new syntax to conflict with both global and nonlocal, the same way they currently conflict with each other: >>> def f(): ... global a ... nonlocal a ... File "", line 2 SyntaxError: name 'a' is nonlocal and global Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Thu Aug 4 08:42:55 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 4 Aug 2016 22:42:55 +1000 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <57A12701.1020903@canterbury.ac.nz> References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> <20160801024738.GC13777@ando.pearwood.info> <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <57A12701.1020903@canterbury.ac.nz> Message-ID: <20160804124254.GK6608@ando.pearwood.info> On Wed, Aug 03, 2016 at 11:04:33AM +1200, Greg Ewing wrote: > Steven D'Aprano wrote: > >In any case, clamping is based of < and > comparisons, which are > >well-specified by IEEE 754 even when NANs are included: > > Those rules are not enough to determine the behaviour of > functions such as min, max and clamp, though. IEEE-754 specifies the behaviour of min() and max() with a NAN argument. See my previous email. > >>why not say that passing NaNs as bounds will result in NaN result? > > > >Because that means that EVERY call to clamp() has to convert both bounds > >to float and see if they are NANs. > > No, it doesn't -- it only needs to check whether they are > NaN if they're *already* floats. If they're not floats, > they're obviously not NaN, so just leave them alone. That's wrong. Decimal has NANs. Other numeric types could potentially have NANs too. > (Actually, that's not quite true, since they could be a > custom type with its own notion of NaN-ness -- maybe we > could do with an __isnan__ protocol?) I would like to see a common "isnan" function or method for both floats and Decimal, because I'm sick of writing code like: try: flag = x.is_nan() # Decimal? 
except AttributeError: flag = math.isnan(x) # float Note that the spelling is different too, for extra sadness :-( -- Steve From rosuav at gmail.com Thu Aug 4 09:12:36 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 4 Aug 2016 23:12:36 +1000 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <20160804124254.GK6608@ando.pearwood.info> References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> <20160801024738.GC13777@ando.pearwood.info> <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <57A12701.1020903@canterbury.ac.nz> <20160804124254.GK6608@ando.pearwood.info> Message-ID: On Thu, Aug 4, 2016 at 10:42 PM, Steven D'Aprano wrote: > I would like to see a common "isnan" function or method for both floats > and Decimal, because I'm sick of writing code like: > > try: > flag = x.is_nan() # Decimal? > except AttributeError: > flag = math.isnan(x) # float > > > Note that the spelling is different too, for extra sadness :-( def isnan(x): return x != x Might land you a few false positives, but not many. And it's IEEE compliant. ChrisA From steve at pearwood.info Thu Aug 4 09:20:28 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 4 Aug 2016 23:20:28 +1000 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <57A12B5A.90504@canterbury.ac.nz> References: <20160801024738.GC13777@ando.pearwood.info> <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> Message-ID: <20160804132028.GL6608@ando.pearwood.info> On Wed, Aug 03, 2016 at 11:23:06AM +1200, Greg Ewing wrote: > David Mertz wrote: > >It really doesn't make sense to me that a clamp() function would *limit > >to* a NaN. That's what I thought too, at first, but on reading more about the IEEE-754 standard, I've changed my mind. 
Passing a NAN as bounds can be interpreted as "bounds is missing", i.e. "no bounds". > Keep in mind that the NaNs involved have probably arisen > from some other computation that went wrong, and that > the purpose of the whole NaN system is to propagate an > indication of that wrongness so that it's evident in the > final result. That's not quite right. NANs are allowed to "disappear". In fact, Professor Kahan has specifically written that NANs which cannot disappear out of a calculation are useless: Were there no way to get rid of NaNs, they would be as useless as Indefinites on CRAYs; as soon as one were encountered, computation would be best stopped rather than continued for an indefinite time to an Indefinite conclusion. That is why some operations upon NaNs must deliver non-NaN results. Which operations? Page 8, https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF He describes some of the conditions under which a NAN might drop out of a calculation. He also says that min(NAN, x) and max(NAN, x) should both return x, which implies that so should clamp(x, NAN, NAN). > So here's how I see it: > > clamp(NaN, y, z) is asking "Is an unknown number between > y and z?" The answer to that is not known, so the result > should be NaN. I agree, and fortunately that's easily performed without any explicit test for NAN-ness. Given x = float('nan'), neither x < lower nor x > upper will ever be true, no matter what the lower and upper bounds are. So we'll fall through to the default and return x, which is a NAN, as wanted. > clamp(x, y, NaN) is asking "Is x between y and an unknown > number?" If x > y, the answer to that is not known, so the > result should be NaN. No, that's not necessarily right. That's one possible interpretation of setting a bound to NAN. I've seen that referred to as "NAN poisoning", and it is a reasonable thing to ask for. But...
...another interpretation, and one which is closer to the current revision of the IEEE-754 standard, is that clamp(x, NAN, NAN) should treat the NANs as "missing values", i.e. that there is no lower or upper bound. That would be equivalent to specifying infinities as bounds. If you want a NAN-poisoning version of clamp(), it is easy to build it from a NAN-as-missing-value clamp(). If you start with NAN-poisoning, you can't easily get NANs-as-missing-values. So if we get only one, we should treat NANs as missing values, and let people build the NAN-poisoning version as a wrapper. > If x < y, you might argue that the result should be y. > But consider clamp(x, 2, 1). You're asking it to limit > x to a value not less than 2 and not greater than 1. > There's no such number, so arguably the result should > be NaN. In that case, I would raise ValueError. > So in summary, I think it should be: > > clamp(NaN, y, z) --> NaN Agreed. It couldn't reasonably be anything else. > clamp(x, NaN, z) --> NaN > clamp(x, y, NaN) --> NaN No, both these cases should treat NAN as equivalent to no limit, and clamp x as appropriate. If you want a second, NAN-poisoning clamp(), that's your prerogative, but don't force it upon everyone. > clamp(x, y, z) --> NaN if z < y That's a clear error, and it should raise immediately. I see no advantage to returning NAN in this case. Think about why you're clamping. It's unlikely to be used just once, for a single calculation. You're likely to be clamping a whole series of values, with fixed lower and upper bounds. The bounds are unlikely to be known at compile-time, but they aren't going to change from clamping to clamping. Something like this:

lower, upper = get_bounds()
for x in values():
    y = some_calculation(x)
    y = clamp(y, lower, upper)
    do_something_with(y)

is the most likely use-case, I think. If lower happens to be greater than upper, that's clearly a mistake.
It's better to get an exception immediately, rather than run through a million calculations and only then discover that you've ended up with a million NANs. It's okay if you get a few NANs, that simply indicates that one of your x values was a NAN, or a calculation produced a NAN. But if *every* calculation produces a NAN, well, that's a sign of breakage. Hence, better to raise straight away. -- Steve From srkunze at mail.de Thu Aug 4 09:20:42 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 4 Aug 2016 15:20:42 +0200 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> Message-ID: <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> Hey Yury, that's a great proposal! On 03.08.2016 00:31, Yury Selivanov wrote:
>
> class Ticker:
>     """Yield numbers from 0 to `to` every `delay` seconds."""
>
>     def __init__(self, delay, to):
>         self.delay = delay
>         self.i = 0
>         self.to = to
>
>     def __aiter__(self):
>         return self
>
>     async def __anext__(self):
>         i = self.i
>         if i >= self.to:
>             raise StopAsyncIteration
>         self.i += 1
>         if i:
>             await asyncio.sleep(self.delay)
>         return i
>
> The same can be implemented as a much simpler asynchronous generator::
>
>     async def ticker(delay, to):
>         """Yield numbers from 0 to `to` every `delay` seconds."""
>         for i in range(to):
>             yield i
>             await asyncio.sleep(delay)
>
That's a great motivational example. +1 Especially when reading the venerable PEP 255 (Q/A part), this also gets rid of the "low-level detail": "raise StopAsyncIteration". This could also be worth mentioning in the motivation part for PEP 525. > [...] > Asynchronous Generator Object > ----------------------------- > > The object is modeled after the standard Python generator object.
> Essentially, the behaviour of asynchronous generators is designed > to replicate the behaviour of synchronous generators, with the only > difference in that the API is asynchronous. > > The following methods and properties are defined: > > 1. ``agen.__aiter__()``: Returns ``agen``. > > 2. ``agen.__anext__()``: Returns an *awaitable*, that performs one > asynchronous generator iteration when awaited. > > 3. ``agen.asend(val)``: Returns an *awaitable*, that pushes the > ``val`` object in the ``agen`` generator. When the ``agen`` has > not yet been iterated, ``val`` must be ``None``. > > [...] > 4. ``agen.athrow(typ, [val, [tb]])``: Returns an *awaitable*, that > throws an exception into the ``agen`` generator. > > [...] > 5. ``agen.aclose()``: Returns an *awaitable*, that throws a > ``GeneratorExit`` exception into the generator. The *awaitable* can > either return a yielded value, if ``agen`` handled the exception, > or ``agen`` will be closed and the exception will propagate back > to the caller. > > 6. ``agen.__name__`` and ``agen.__qualname__``: readable and writable > name and qualified name attributes. > > 7. ``agen.ag_await``: The object that ``agen`` is currently *awaiting* > on, or ``None``. This is similar to the currently available > ``gi_yieldfrom`` for generators and ``cr_await`` for coroutines. > > 8. ``agen.ag_frame``, ``agen.ag_running``, and ``agen.ag_code``: > defined in the same way as similar attributes of standard generators. > > ``StopIteration`` and ``StopAsyncIteration`` are not propagated out of > asynchronous generators, and are replaced with a ``RuntimeError``. From your answer to Stefan, I get the impression that the reason why we actual need all those a* methods (basically a duplication of the existing gen protocol), is the fact that normal generators can be converted to coroutines. That means, 'yield' still can be used in both ways. 
So, it's a technical symptom of the backwards-compatibility rather than something that cannot be avoided by design. Is this correct? If it's correct, would you think it would make sense to get rid of the a* in a later iteration of the async capabilities of Python? So, just using the normal generator protocol again? One note on all examples but the last. Reading those examples, it creates the illusion of actual working code which is not the case, right? One would always need to 1) wrap module-level statements into its own coroutine, 2) create an event-loop and 3) run it. Do you think clarifying this in the PEP makes sense? Thanks again for your hard work here. Async generators definitely completes the picture. Sven From michael.selik at gmail.com Thu Aug 4 09:26:07 2016 From: michael.selik at gmail.com (Michael Selik) Date: Thu, 04 Aug 2016 13:26:07 +0000 Subject: [Python-ideas] Trial balloon: adding variable type In-Reply-To: <57A2DD6E.5020902@canterbury.ac.nz> References: <57A2DD6E.5020902@canterbury.ac.nz> Message-ID: On Thu, Aug 4, 2016 at 2:15 AM Greg Ewing wrote: > Guido van Rossum wrote: > > But > > > > x = ... > > > > already has a meaning -- it assigns x the (fairly pointless) value > > Ellipsis > > You can always say > > x = Ellipsis > > if you really want that. > I appreciate that ``...`` is pointless in the core so that I'm free to give it a purpose in my own libraries. I prefer to write ``Ellipsis`` as ``...``. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Thu Aug 4 09:29:44 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 4 Aug 2016 23:29:44 +1000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: <20160804132944.GM6608@ando.pearwood.info> On Thu, Aug 04, 2016 at 10:37:33PM +1000, Nick Coghlan wrote: > I wonder if there's some way we could make that new statement form > trigger the following runtime behaviour: > > a : int > a # This raises a new UnboundNameError at module scope, > UnboundLocalError otherwise Why wouldn't it raise NameError, as it does now? I understand that `a: int` with no assignment will be treated as a no-op by the interpreter. a: int = 0 will be fine, of course, since that's assigning 0 to a. > Otherwise we're at risk of allowing thoroughly confusing runtime behaviour like: > > >>> a = "Not an int" > >>> def f(): > ... # a: int would go here > ... print(a) # This should really fail > ... > >>> f() > Not an int I would expect that the type-checker will complain that you're declaring a local variable "a" but there's no local "a" in the function. If the checker isn't that smart, I expect it would complain that "a" is set to a string, but declared as an int. Either way, the type-checker ought to complain. -- Steve From steve at pearwood.info Thu Aug 4 09:32:31 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 4 Aug 2016 23:32:31 +1000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <57A2E2C8.9060208@canterbury.ac.nz> References: <20160803173532.GG6608@ando.pearwood.info> <20160804005212.GI6608@ando.pearwood.info> <57A2E2C8.9060208@canterbury.ac.nz> Message-ID: <20160804133230.GN6608@ando.pearwood.info> On Thu, Aug 04, 2016 at 06:38:00PM +1200, Greg Ewing wrote: > Concerning type annotations on local variables of a function: > > 1) Would they be stored anywhere? If so, where? 
> > 2) Would they be evaluated once, or every time the > function is called? > > 3) If they're evaluated once, when and in what scope? I would think that annotations on local variables should not be evaluated or stored. I think it is reasonable to have annotations on global variables (or inside a class definition) stored somewhere for runtime analysis, like function annotations. -- Steve From dmoisset at machinalis.com Thu Aug 4 10:12:11 2016 From: dmoisset at machinalis.com (Daniel Moisset) Date: Thu, 4 Aug 2016 15:12:11 +0100 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: There's something I found a bit confusing in the runtime semantic of defaults in your strawman proposal: On Tue, Aug 2, 2016 at 11:09 PM, Guido van Rossum wrote: > Agreed, we need to invent a workable proposal for this. Here's a strawman: > > - The default is an instance variable (backed by a class variable as > default if there's an initial value) > - To define a class variable, prefix the type with 'class' > > Example:
>
> class C:
>     a: int  # instance var
>     b: List[int] = None  # instance var
>     c: class List[int] = []  # class var
>
From that description I understand that after C is defined, C.__dict__ will contain the {'c': []} mapping... where is the {'b': None} mapping stored? what are the runtime semantics that get it to every created instance? Or are you proposing that in runtime these are equivalent to "b, c = None, []" ? If that's the case, I find it misleading to call them "instance variables" vs "class variables", given that those concepts are supposed to have different runtime semantics, not just for the typechecker. -- Daniel F. Moisset - UK Country Manager www.machinalis.com Skype: @dmoisset -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Thu Aug 4 11:07:07 2016 From: srkunze at mail.de (Sven R.
Kunze) Date: Thu, 4 Aug 2016 17:07:07 +0200 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <20160804133230.GN6608@ando.pearwood.info> References: <20160803173532.GG6608@ando.pearwood.info> <20160804005212.GI6608@ando.pearwood.info> <57A2E2C8.9060208@canterbury.ac.nz> <20160804133230.GN6608@ando.pearwood.info> Message-ID: <45e785fd-bf41-0e09-105d-f0d1b3aa98bc@mail.de> On 04.08.2016 15:32, Steven D'Aprano wrote: > I would think that annotations on local variables should not > be evaluated or stored. > I think it is reasonable to have annotations on global variables (or > inside a class definition) stored somewhere for runtime analysis, like > function annotations. I don't know. That somehow would break expected Python semantics and would lead to big asymmetry. Most people don't think about the difference between module-level variables and variables in functions. Their code works regardlessly. The more I think about it, the more I like #comment-style annotations then. They are a clear sign to the developer >not at runtime<, and only available for static analysis. This might even be true for class/instance variables. -- Sven From gvanrossum at gmail.com Thu Aug 4 11:12:15 2016 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu, 4 Aug 2016 08:12:15 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <57A2E2C8.9060208@canterbury.ac.nz> References: <20160803173532.GG6608@ando.pearwood.info> <20160804005212.GI6608@ando.pearwood.info> <57A2E2C8.9060208@canterbury.ac.nz> Message-ID: For pragmatic reasons I propose not to evaluate locally scoped type annotations at all. Basically these would slow you down too much otherwise. --Guido (mobile) On Aug 3, 2016 11:38 PM, "Greg Ewing" wrote: > Concerning type annotations on local variables of a function: > > 1) Would they be stored anywhere? If so, where?
> > 2) Would they be evaluated once, or every time the > function is called? > > 3) If they're evaluated once, when and in what scope? > > -- > Greg > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Thu Aug 4 11:17:55 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 4 Aug 2016 11:17:55 -0400 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> Message-ID: <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> Hi Sven, On 2016-08-04 9:20 AM, Sven R. Kunze wrote: > Hey Yury, > > that's a great proposal! > > > On 03.08.2016 00:31, Yury Selivanov wrote: >> >> class Ticker: >> """Yield numbers from 0 to `to` every `delay` seconds.""" >> >> def __init__(self, delay, to): >> self.delay = delay >> self.i = 0 >> self.to = to >> >> def __aiter__(self): >> return self >> >> async def __anext__(self): >> i = self.i >> if i >= self.to: >> raise StopAsyncIteration >> self.i += 1 >> if i: >> await asyncio.sleep(self.delay) >> return i >> >> >> The same can be implemented as a much simpler asynchronous generator:: >> >> async def ticker(delay, to): >> """Yield numbers from 0 to `to` every `delay` seconds.""" >> for i in range(to): >> yield i >> await asyncio.sleep(delay) >> > > That's a great motivational example. +1 > > Especially when reading the venerable PEP 255 (Q/A part), this also > gets rid of the "low-level detail": "raise StopAsyncIteration". This > could also be worth mentioning in the motivation part for PEP 525. Thanks! [..] 
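The ticker generator quoted above becomes runnable once it is wrapped in a coroutine and driven by an event loop. A sketch, using the `asyncio.run()` spelling added later in Python 3.7 (at the time of this thread one would call `loop.run_until_complete()` instead):

```python
import asyncio

async def ticker(delay, to):
    """Yield numbers from 0 to `to` every `delay` seconds."""
    for i in range(to):
        yield i
        await asyncio.sleep(delay)

async def main():
    # Async generators can only be consumed inside a coroutine,
    # via `async for` (or the asend/athrow/aclose awaitables).
    collected = []
    async for i in ticker(0.01, 3):
        collected.append(i)
    return collected

print(asyncio.run(main()))  # -> [0, 1, 2]
```

This illustrates the three steps mentioned later in the thread: wrap the module-level consumption in a coroutine, create an event loop, and run it.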
> From your answer to Stefan, I get the impression that the reason why > we actual need all those a* methods (basically a duplication of the > existing gen protocol), is the fact that normal generators can be > converted to coroutines. That means, 'yield' still can be used in both > ways. > > So, it's a technical symptom of the backwards-compatibility rather > than something that cannot be avoided by design. Is this correct? > async/await in Python is implemented on top of the generator protocol. Any 'await' is either awaiting on a coroutine or on a Future-like object. Future-like objects are defined by implementing the __await__ method, which should return a generator. So coroutines and generators are very intimately tied to each other, and that's *by design*. Any coroutine that iterates over an asynchronous generator uses the generator protocol behind the scenes. So we have to multiplex the async generator's "yields" into the generator protocol in such a way, that it stays isolated, and does not interfere with the "yields" that drive async/await. > > > If it's correct, would you think it would make sense to get rid of the > a* in a later iteration of the async capabilities of Python? So, just > using the normal generator protocol again? Because async generators will contain 'await' expressions, we have to have a* methods (although we can name them without the "a" prefix, but I believe that would be confusing for many users). > > > One note on all examples but the last. Reading those examples, it > creates the illusion of actual working code which is not the case, > right? One would always need to 1) wrap module-level statements into > its own coroutine, 2) create an event-loop and 3) run it. Do you think > clarifying this in the PEP makes sense? I'll think about this, thanks! Maybe I can add a line in the beginning of the "Specification" section. > > > Thanks again for your hard work here. Async generators definitely > completes the picture.
Thank you, Sven Yury From gvanrossum at gmail.com Thu Aug 4 11:22:09 2016 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu, 4 Aug 2016 08:22:09 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: <20160803173532.GG6608@ando.pearwood.info> <20160804005212.GI6608@ando.pearwood.info> <57A2E2C8.9060208@canterbury.ac.nz> Message-ID: However the presence of a local declaration like 'a: int' would create a local slot for 'a' as if it were assigned to somewhere in the function. --Guido (mobile) -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Thu Aug 4 11:40:47 2016 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 4 Aug 2016 11:40:47 -0400 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: <20160803173532.GG6608@ando.pearwood.info> <20160804005212.GI6608@ando.pearwood.info> <57A2E2C8.9060208@canterbury.ac.nz> Message-ID: On Thu, Aug 4, 2016 at 11:22 AM, Guido van Rossum wrote: > However the presence of a local declaration like 'a: int' would create a > local slot for 'a' as if it were assigned to somewhere in the function. Does this mean that the following code will raise a NameError? a = None def f(): a: int a (FWIW, I like the : notation more than any alternative proposed so far.) From victor.stinner at gmail.com Thu Aug 4 11:48:51 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 4 Aug 2016 17:48:51 +0200 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> Message-ID: When I really need such function, I define it like this: def clamp(min_val, value, max_val): return min(max(min_val, value), max_val) Test: min_val <= result <= max_val. 
The parameter order is chosen to get something looking like min_val <= value (result in fact) <= max_val. If you need special handling of NaN, I suggest to add a special version in the math module. I'm not sure that it's worth it to add such a new function to the standard library. Victor Le 31 juil. 2016 6:13 AM, "Neil Girdhar" wrote: > It's common to want to clip (or clamp) a number to a range. This feature > is commonly needed for both floating point numbers and integers: > > http://stackoverflow.com/questions/9775731/clamping-floating-numbers-in-python > http://stackoverflow.com/questions/4092528/how-to-clamp-an-integer-to-some-range-in-python > > There are a few approaches: > > * use a couple ternary operators (e.g. https://github.com/scipy/scipy/pull/5944/files line 98, which generated a lot of discussion) > * use a min/max construction, > * call sorted on a list of the three numbers and pick out the first, or > * use numpy.clip. > > Am I right that there is no *obvious* way to do this? If so, I suggest > adding math.clip (or math.clamp) to the standard library that has the > meaning:
>
> def clip(number, lower, upper):
>     return lower if number < lower else upper if number > upper else number
>
> This would work for non-numeric types so long as the non-numeric types > support comparison. It might also be worth adding > > assert lower < upper > > to catch some bugs. > > Best, > > Neil > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From gvanrossum at gmail.com Thu Aug 4 12:15:02 2016 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu, 4 Aug 2016 09:15:02 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: <20160803173532.GG6608@ando.pearwood.info> <20160804005212.GI6608@ando.pearwood.info> <57A2E2C8.9060208@canterbury.ac.nz> Message-ID: On Thu, Aug 4, 2016 at 8:40 AM, Alexander Belopolsky wrote: > On Thu, Aug 4, 2016 at 11:22 AM, Guido van Rossum wrote: >> However the presence of a local declaration like 'a: int' would create a >> local slot for 'a' as if it were assigned to somewhere in the function. > > Does this mean that the following code will raise a NameError? > > a = None > def f(): > a: int > a It will do exactly the same as a = None def f(): if False: a = ... a This raises UnboundLocalError (a subclass of NameError). > (FWIW, I like the : notation more than any alternative > proposed so far.) Me too. :-) -- --Guido van Rossum (python.org/~guido) From guido at python.org Thu Aug 4 12:20:54 2016 From: guido at python.org (Guido van Rossum) Date: Thu, 4 Aug 2016 09:20:54 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: <6059264f-f312-9dac-1e1e-9efc90a606b6@mail.de> Message-ID: On Thu, Aug 4, 2016 at 4:03 AM, Daniel Moisset wrote: > I'll post here another wild idea (based on the > https://github.com/python/mypy/blob/master/mypy/build.py#L976 example), just > thinking on instance attributes. 
> I myself am not fully sold on it but it's quite different so perhaps adds a new angle to the brainstorming and has some nice upsides:
>
> class State:
>
>     def __instancevars__(
>         manager: BuildManager,
>         order: int,                  # Order in which modules were encountered
>         id: str,                     # Fully qualified module name
>         path: Optional[str],         # Path to module source
>         xpath: str,                  # Path or ''
>         source: Optional[str],       # Module source code
>         meta: Optional[CacheMeta],
>         data: Optional[str],
>         tree: Optional[MypyFile],
>         dependencies: List[str],
>         suppressed: List[str],       # Suppressed/missing dependencies
>         priorities: Dict[str, int]
>     ): ...

That feels too hacky for a long-term solution, and some of your downsides are very real. The backwards compatible solution is to keep using type comments (with initial values set to None, or `...` if Python 2 isn't needed). But thanks for thinking creatively about the problem!

-- 
--Guido van Rossum (python.org/~guido)

From alexander.belopolsky at gmail.com  Thu Aug 4 12:24:48 2016
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Thu, 4 Aug 2016 12:24:48 -0400
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To: 
References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com>
Message-ID: 

On Thu, Aug 4, 2016 at 11:48 AM, Victor Stinner wrote:
> When I really need such a function, I define it like this:
>
> def clamp(min_val, value, max_val):
>     return min(max(min_val, value), max_val)

and your colleague next door defines it like this:

def clamp(min_val, value, max_val):
    return min(max_val, max(value, min_val))

and a third-party library ships

def clamp(min_val, value, max_val):
    return max(min(max_val, value), min_val)

and combinatorially, there are at least a half-dozen more variations. The behavior of each variant is subtly different from the others. Having this function in the stdlib would allow standardizing on one well-documented (and hopefully well-motivated) variant.
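The subtle differences between those variants are easy to demonstrate. A quick sketch (the `clamp_a`/`clamp_b`/`clamp_c` names are mine, chosen only to label the three definitions above): Python's min() and max() keep the first argument when a comparison against NaN is unordered, so both argument order and nesting order change the answer.

```python
def clamp_a(min_val, value, max_val):  # the first definition above
    return min(max(min_val, value), max_val)

def clamp_b(min_val, value, max_val):  # the "colleague next door" definition
    return min(max_val, max(value, min_val))

def clamp_c(min_val, value, max_val):  # the "third-party library" definition
    return max(min(max_val, value), min_val)

nan = float('nan')

# A NaN value: min()/max() keep their first argument on an unordered
# comparison, so a and b already disagree.
print(clamp_a(0, nan, 10))  # 0
print(clamp_b(0, nan, 10))  # 10

# Inverted bounds (min_val > max_val): now b and c disagree too.
print(clamp_b(5, 3, 1))     # 1
print(clamp_c(5, 3, 1))     # 5
```

Every pair of variants diverges on at least one of these inputs, which is exactly the argument for standardizing on one documented behavior.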
From guido at python.org  Thu Aug 4 12:27:34 2016
From: guido at python.org (Guido van Rossum)
Date: Thu, 4 Aug 2016 09:27:34 -0700
Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484
In-Reply-To: 
References: 
Message-ID: 

On Thu, Aug 4, 2016 at 12:16 AM, Pavol Lisy wrote:
> On 8/4/16, Guido van Rossum wrote:
>> On Wed, Aug 3, 2016 at 3:02 PM, Pavol Lisy wrote:
>>> def fnc():
>>>     global a: list
>>
>> I'm not proposing to add such syntax, and the right place for the type of a would be at the global level, not on the `global` statement.
>>
>>> a = 7
>>> fnc()
>>> a = [1, 2, 3]  # and this could be interesting for static type-checker too
>>
>> Indeed, but that's not what we're debating here.
>
> Sorry, but for me it is really important where we are going (at least as inspiration).

But you are thinking with your runtime hat on. To a static type checker, the call to fnc() is irrelevant to the type of the global variable `a`. It's like writing

def foo():
    return int

def bar(a: foo()):
    return a + 1

The static type checker rejects `foo()` as an invalid type, even though you know what it means at runtime. (But what if foo() were to flip a coin to choose between int and str?)

> As I understand it now, these changes could end in code which could be pure Python and, on the other hand, also compilable by a statically typed compiler. Maybe that is what we want (I prefer this one), maybe we don't care, and maybe we want to avoid it.
>
> For example Cython's code:
>
> cdef int n
>
> could be written:
>
> n: cdef.int
>
> or
>
> n: 'cdef int'
>
> and I think it could be good to see convergence here.

Yes, that should be possible. Just don't run mypy over that source code.
:-)

> And maybe this
>
> cdef int n, k, i
>
> could be inspiring too, allowing a one-to-many possibility:
>
> n, j, k: int  # (1)
>
> I think it is natural and better than
>
> n, j, k: int, int, int
>
> But (1) would be discordant with this (from PEP 484):
>
> def __init__(self, left: Node, right: Node)

And that's why I really don't want to go there. What if someone wrote

T = Tuple[int, int, str]
a, b, c: T

Do we now have three Tuple[int, int, str] variables, or two ints and a str?

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Thu Aug 4 12:40:06 2016
From: guido at python.org (Guido van Rossum)
Date: Thu, 4 Aug 2016 09:40:06 -0700
Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484
In-Reply-To: 
References: 
Message-ID: 

On Thu, Aug 4, 2016 at 7:12 AM, Daniel Moisset wrote:
> There's something I found a bit confusing in the runtime semantics of defaults in your strawman proposal:
>
> On Tue, Aug 2, 2016 at 11:09 PM, Guido van Rossum wrote:
>> Agreed, we need to invent a workable proposal for this. Here's a strawman:
>>
>> - The default is an instance variable (backed by a class variable as default if there's an initial value)
>> - To define a class variable, prefix the type with 'class'
>>
>> Example:
>>
>> class C:
>>     a: int                   # instance var
>>     b: List[int] = None      # instance var
>>     c: class List[int] = []  # class var
>
> From that description I understand that after C is defined, C.__dict__ will contain the {'c': []} mapping... where is the {'b': None} mapping stored? What are the runtime semantics that get it to every created instance?
>
> Or are you proposing that at runtime these are equivalent to "b, c = None, []"? If that's the case, I find it misleading to call them "instance variables" vs "class variables", given that those concepts are supposed to have different runtime semantics, not just for the typechecker.

The latter (except possibly for also storing the types in __annotations__).
I'm a little worried by your claim that it's misleading to distinguish between instance and class variables. There really are three categories:

- pure class variable -- exists in class __dict__ only
- pure instance variable -- exists in instance __dict__ only
- instance variable with class default -- in class __dict__ and maybe in instance __dict__

(Note that even a pure class variable can be referenced as an instance attribute -- this is the mechanism that enables the third category.)

Pragmatically, the two categories of instance variables together are wildly more common than pure class variables. I also believe that the two categories of instance variables are both pretty common. So I want the notation to support all three:

class Starship:
    stats: class Dict[str, int] = {}  # Pure class variable
    damage: class int = 0             # Hybrid class/instance variable
    captain: str                      # Pure instance variable

The intention is that the captain's name must always be set when a Starship is initialized, but all Starships start their life with zero damage, and stats is merely a place where various Starship methods can leave behind counters logging how often various events occur. (In real code it would probably be a defaultdict.)

-- 
--Guido van Rossum (python.org/~guido)

From alexander.belopolsky at gmail.com  Thu Aug 4 12:44:11 2016
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Thu, 4 Aug 2016 12:44:11 -0400
Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484
In-Reply-To: 
References: <20160803173532.GG6608@ando.pearwood.info> <20160804005212.GI6608@ando.pearwood.info> <57A2E2C8.9060208@canterbury.ac.nz>
Message-ID: 

On Thu, Aug 4, 2016 at 12:15 PM, Guido van Rossum wrote:
> It will do exactly the same as
>
> a = None
> def f():
>     if False: a = ...
>     a
>
> This raises UnboundLocalError (a subclass of NameError).

That's what I expected.
Moreover, it looks like there is a precedent for such behavior:

$ python3 -O
>>> a = None
>>> def f():
...     if __debug__: a = ...
...     a
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in f
UnboundLocalError: local variable 'a' referenced before assignment
>>> dis(f)
  3           0 LOAD_FAST                0 (a)
              3 POP_TOP
              4 LOAD_CONST               0 (None)
              7 RETURN_VALUE

(Note that the if statement is optimized away.)

BTW, a: int looks like an "incomplete" assignment to me. It's a statement that under-specifies a: gives it a type, but not a value. Visually, ':' is a shorter version of '=' and a half of the venerable Pascal's ':='. It all makes sense to me, but your mileage may vary if your first language was JavaScript rather than ALGOL. :-)

From gvanrossum at gmail.com  Thu Aug 4 12:58:01 2016
From: gvanrossum at gmail.com (Guido van Rossum)
Date: Thu, 4 Aug 2016 09:58:01 -0700
Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484
In-Reply-To: 
References: <20160803173532.GG6608@ando.pearwood.info> <20160804005212.GI6608@ando.pearwood.info> <57A2E2C8.9060208@canterbury.ac.nz>
Message-ID: 

On Thu, Aug 4, 2016 at 9:44 AM, Alexander Belopolsky wrote:
> BTW, a: int looks like an "incomplete" assignment to me. It's a statement that under-specifies a: gives it a type, but not a value. Visually, ':' is a shorter version of '=' and a half of the venerable Pascal's ':='. It all makes sense to me, but your mileage may vary if your first language was JavaScript rather than ALGOL. :-)

Actually, I'm trying to keep some memories of Pascal. In Pascal you write:

    var age: integer;

The family of languages derived from C is already doing enough to keep the ALGOL tradition alive. Anyways, "age: int" matches what you write in function signatures in Python, so I think our hands are tied here and we might as well make the most of it.
-- --Guido van Rossum (python.org/~guido) From elazarg at gmail.com Thu Aug 4 13:07:50 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Thu, 04 Aug 2016 17:07:50 +0000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: > > > > (FWIW, I like the : notation more than any alternative > > proposed so far.) > > Me too. :-) > Just wanted to note that the typo lamda: int Becomes syntactically correct. Maybe there are other similar problems. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Aug 4 13:12:53 2016 From: guido at python.org (Guido van Rossum) Date: Thu, 4 Aug 2016 10:12:53 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Thu, Aug 4, 2016 at 10:07 AM, ????? wrote: > Just wanted to note that the typo > > lamda: int > > Becomes syntactically correct. Maybe there are other similar problems. But even lambda: int on a line by itself is pointless, so I don't think this is much of a problem. -- --Guido van Rossum (python.org/~guido) From rosuav at gmail.com Thu Aug 4 13:15:42 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 5 Aug 2016 03:15:42 +1000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Fri, Aug 5, 2016 at 3:07 AM, ????? wrote: > Just wanted to note that the typo > > lamda: int > > Becomes syntactically correct. Maybe there are other similar problems. True, but the correct version "lambda: int" wouldn't appear on a line on its own, creating an anonymous function and then abandoning it. It'll normally appear in a function call or similar, where you wouldn't be allowed to type-declare a new variable. 
ChrisA

From chris.barker at noaa.gov  Thu Aug 4 13:51:17 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 4 Aug 2016 10:51:17 -0700
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To: 
References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com>
Message-ID: 

On Thu, Aug 4, 2016 at 9:24 AM, Alexander Belopolsky <alexander.belopolsky at gmail.com> wrote:
> The behavior of each variant is subtly different from the others. Having this function in stdlib would allow standardizing on one well-documented (and hopefully well-motivated) variant.

Exactly -- not every small function should be in the stdlib -- but there is a place for it when there are multiple subtly different ways to implement it.

Regardless of the outcome of the NaN issue -- I think the "one obvious way to do it" should be the math.clip() (or math.clamp()) function.

-CHB

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception

Chris.Barker at noaa.gov

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From random832 at fastmail.com  Thu Aug 4 13:58:55 2016
From: random832 at fastmail.com (Random832)
Date: Thu, 04 Aug 2016 13:58:55 -0400
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To: 
References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com>
Message-ID: <1470333535.3044288.686191761.491F666B@webmail.messagingengine.com>

On Thu, Aug 4, 2016, at 12:24, Alexander Belopolsky wrote:
> On Thu, Aug 4, 2016 at 11:48 AM, Victor Stinner wrote:
> > When I really need such a function, I define it like this:
> >
> > def clamp(min_val, value, max_val):
> >     return min(max(min_val, value), max_val)
>
> and your colleague next door defines it like this:
>
> def clamp(min_val, value, max_val):
>     return min(max_val, max(value, min_val))

Ideally min and max should themselves be defined in a way that makes that not an issue (or perhaps only an issue for different-signed zero values).

> and a third-party library ships
>
> def clamp(min_val, value, max_val):
>     return max(min(max_val, value), min_val)

That one is more of an issue, though AIUI only so when min_val > max_val.

From chris.barker at noaa.gov  Thu Aug 4 14:01:38 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 4 Aug 2016 11:01:38 -0700
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To: <20160804123557.GJ6608@ando.pearwood.info>
References: <20160801024738.GC13777@ando.pearwood.info> <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <20160804123557.GJ6608@ando.pearwood.info>
Message-ID: 

On Thu, Aug 4, 2016 at 5:35 AM, Steven D'Aprano wrote:
> The IEEE 754 standard tells us what min(x, NAN) and max(x, NAN) should be: in both cases it is x.
>
> https://en.wikipedia.org/wiki/IEEE_754_revision#min_and_max

I thought an earlier post said something about an alternative min and max?
-- but anyway, consistency with min and max is a pretty good argument. Quote:

> For instance max{x, y} should deliver the same result as max{y, x} but almost no implementations do that when x is NaN. There are good reasons to define max{NaN, 5} := max{5, NaN} := 5 though many would disagree.

I don't disagree that there are good reasons, just not that it's the final way to go :-) -- but if Kahan equivocates, then there isn't one way to go :-)

> As you propose it, clamp() is no use to me: it unnecessarily converts the bounds to float, which may raise an exception.

No, it doesn't -- that's only one way to implement it. We really should decide on the behaviour we want, and then figure out how to implement it -- not choose something because it's easier to implement.

There was an earlier post with an implementation that would give the NaN-poisoning behaviour, but would also work with any totally-ordered type as well. Not that I've thought about possible edge cases. I think this is it:

def clamp(x, a, b):
    if a <= x:
        if x <= b:
            return x
        else:
            return b
    else:
        return a

Hmm -- doesn't work when x is NaN but the limits are not -- but I'm sure that could be worked out:

In [32]: clamp(nan, 0, 100)
Out[32]: 0

-CHB

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception

Chris.Barker at noaa.gov

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brenbarn at brenbarn.net  Thu Aug 4 14:33:30 2016
From: brenbarn at brenbarn.net (Brendan Barnwell)
Date: Thu, 04 Aug 2016 11:33:30 -0700
Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484
In-Reply-To: 
References: 
Message-ID: <57A38A7A.4060401@brenbarn.net>

On 2016-08-01 14:31, Guido van Rossum wrote:
> PEP 484 doesn't change Python's syntax.
> Therefore it has no good syntax to offer for declaring the type of variables, and instead you have to write e.g.
>
> a = 0     # type: float
> b = []    # type: List[int]
> c = None  # type: Optional[str]

Let me ask a perhaps silly question. Reading a lot of subsequent messages in this thread, it seems that the main intended use for all this is external static type checkers. (Hence all the references to thinking with "runtime hats" on.) Given this, what is the benefit of making this change to Python syntax? If it's only going to be used by static checker tools, is there any difference between having those tools grab it from comments vs. from "real" syntax? This seems doubly questionable if, for local variables, the type annotations are totally unavailable at runtime (which I think is what was suggested). It seems odd to have Python syntax that not only doesn't do anything, but can't even be made to do anything by the program itself when it runs.

-- 
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail."
   --author unknown

From alexander.belopolsky at gmail.com  Thu Aug 4 15:09:04 2016
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Thu, 4 Aug 2016 15:09:04 -0400
Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484
In-Reply-To: <57A38A7A.4060401@brenbarn.net>
References: <57A38A7A.4060401@brenbarn.net>
Message-ID: 

On Thu, Aug 4, 2016 at 2:33 PM, Brendan Barnwell wrote:
> It seems odd to have Python syntax that not only doesn't do anything, but can't even be made to do anything by the program itself when it runs.

Why do you find this odd? Python already has #-comment syntax with exactly the property you complain about. Think of local variable annotation as a structured comment syntax.
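The "does nothing at runtime" point can be made concrete. A quick sketch of the local-annotation behaviour being discussed, using the syntax as it was eventually adopted in Python 3.6 (PEP 526) — the specific semantics below are those of the adopted PEP, not something settled at the time of this thread:

```python
a = None

def f():
    a: int    # bare local annotation: creates a local slot, binds nothing,
    return a  # so this raises UnboundLocalError instead of returning None

def g():
    b: int                  # never evaluated at runtime...
    return 'b' in locals()  # ...and leaves no trace in locals()

try:
    f()
    outcome = 'no error'
except UnboundLocalError:
    outcome = 'UnboundLocalError'

print(outcome)  # UnboundLocalError
print(g())      # False
```

So a local annotation is invisible to the running program except for one side effect: it makes the compiler treat the name as local, exactly as Guido describes above.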
From dmoisset at machinalis.com  Thu Aug 4 15:11:47 2016
From: dmoisset at machinalis.com (Daniel Moisset)
Date: Thu, 4 Aug 2016 20:11:47 +0100
Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484
In-Reply-To: 
References: 
Message-ID: 

On Thu, Aug 4, 2016 at 5:40 PM, Guido van Rossum wrote:
> On Thu, Aug 4, 2016 at 7:12 AM, Daniel Moisset wrote:
> > On Tue, Aug 2, 2016 at 11:09 PM, Guido van Rossum wrote:
> >> class C:
> >>     a: int                   # instance var
> >>     b: List[int] = None      # instance var
> >>     c: class List[int] = []  # class var
> >
> > From that description I understand that after C is defined, C.__dict__ will contain the {'c': []} mapping... where is the {'b': None} mapping stored? What are the runtime semantics that get it to every created instance?
> >
> > Or are you proposing that at runtime these are equivalent to "b, c = None, []"? If that's the case, I find it misleading to call them "instance variables" vs "class variables", given that those concepts are supposed to have different runtime semantics, not just for the typechecker.
>
> The latter (except possibly for also storing the types in __annotations__).
>
> I'm a little worried by your claim that it's misleading to distinguish between instance and class variables. There really are three categories:

For me what's misleading is the runtime behaviour for b, which has an initializer but is not flagged as a class variable... see below.

> - pure class variable -- exists in class __dict__ only
> - pure instance variable -- exists in instance __dict__ only
> - instance variable with class default -- in class __dict__ and maybe in instance __dict__
>
> (Note that even a pure class variable can be referenced as an instance attribute -- this is the mechanism that enables the third category.)
>
> Pragmatically, the two categories of instance variables together are wildly more common than pure class variables.
> I also believe that the two categories of instance variables are both pretty common. So I want the notation to support all three:
>
> class Starship:
>     stats: class Dict[str, int] = {}  # Pure class variable
>     damage: class int = 0             # Hybrid class/instance variable
>     captain: str                      # Pure instance variable
>
> The intention is that the captain's name must always be set when a Starship is initialized, but all Starships start their life with zero damage, and stats is merely a place where various Starship methods can leave behind counters logging how often various events occur. (In real code it would probably be a defaultdict.)

I follow the example perfectly. Now suppose a reader finds the following piece of code:

class Starship:
    stats: class Dict[str, int] = {}  # Pure class variable
    damage: class int = 0             # Hybrid class/instance variable
    captain: str                      # Pure instance variable
    speed: float = 0

I added a new attribute (similar to b in your original example). Given that the type declaration doesn't say "class", the reader might be inclined to think it's an instance variable. But at runtime (if I got you right), that variable will be stored in "Starship.__dict__" and writing "Starship.speed = 3" will change the speed of those starship instances that still haven't set the attribute. So in the end both "damage" and "speed" have "class variable" runtime semantics, even when one is flagged as "class" and the other isn't.

The other combination that feels a bit confusing when adding "class" tags is saying "attr: class T", without an initializer.... in what case would someone do that? What does it mean if I find some code saying that about the class, that it might get that attribute set somewhere else?

-- 
Daniel F. Moisset - UK Country Manager
www.machinalis.com
Skype: @dmoisset

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brenbarn at brenbarn.net  Thu Aug 4 15:19:58 2016
From: brenbarn at brenbarn.net (Brendan Barnwell)
Date: Thu, 04 Aug 2016 12:19:58 -0700
Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484
In-Reply-To: 
References: <57A38A7A.4060401@brenbarn.net>
Message-ID: <57A3955E.7080901@brenbarn.net>

On 2016-08-04 12:09, Alexander Belopolsky wrote:
> On Thu, Aug 4, 2016 at 2:33 PM, Brendan Barnwell wrote:
>> It seems odd to have Python syntax that not only doesn't do anything, but can't even be made to do anything by the program itself when it runs.
>
> Why do you find this odd? Python already has #-comment syntax with exactly the property you complain about. Think of local variable annotation as a structured comment syntax.

Exactly. But the existing comment annotations (# type: float) already are structured comment syntax, so what does this new one add? (Note we already have at least one parallel --- although it is much more limited in scope --- namely encoding declarations, which are also done within existing comment syntax, rather than having their own "real" syntax.)

-- 
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail."
   --author unknown

From ethan at stoneleaf.us  Thu Aug 4 15:44:53 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 04 Aug 2016 12:44:53 -0700
Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484
In-Reply-To: 
References: 
Message-ID: <57A39B35.20007@stoneleaf.us>

On 08/04/2016 12:11 PM, Daniel Moisset wrote:
> I follow the example perfectly. Now suppose a reader finds the following piece of code:
>
> class Starship:
>     stats: class Dict[str, int] = {}  # Pure class variable
>     damage: class int = 0             # Hybrid class/instance variable
>     captain: str                      # Pure instance variable
>     speed: float = 0
>
> I added a new attribute (similar to b in your original example).
> Given that the type declaration doesn't say "class", the reader might be inclined to think it's an instance variable. But at runtime (if I got you right), that variable will be stored in "Starship.__dict__" and writing "Starship.speed = 3" will change the speed of those starship instances that still haven't set the attribute.

The type checker should flag that as a bug, since Starship.speed is supposed to be an instance variable.

-- 
~Ethan~

From victor.stinner at gmail.com  Thu Aug 4 15:46:05 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 4 Aug 2016 21:46:05 +0200
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To: <1470333535.3044288.686191761.491F666B@webmail.messagingengine.com>
References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> <1470333535.3044288.686191761.491F666B@webmail.messagingengine.com>
Message-ID: 

Le 4 août 2016 19:59, "Random832" a écrit :
> > def clamp(min_val, value, max_val):
> >     return min(max_val, max(value, min_val))
>
> Ideally min and max should themselves be defined in a way that makes that not an issue (or perhaps only an issue for different-signed zero values)

There is a generic sum() and a specific math.fsum() function which is more accurate for summing a list of floats. Maybe before starting to talk about clamp(), we should define new math.fmin() and math.fmax() functions?

I suggest starting with a short PEP, as for the math.is_close() PEP, since there are subtle issues like NaN (float but also Decimal!) and combinations of numerical types (int, float, complex, Decimal, Fraction, numpy scalars like float16, ...). Maybe a PEP is not needed, I didn't read carefully the thread to check if there is a consensus or not.

I dislike the idea of modifying min() and max() to add special cases for float NaN and decimal NaN.

Which type do you expect for fmax(int, int)? Should it be int or float? Should fmax(Decimal, float) raise an error, return a float or return a Decimal?
Victor

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ckaynor at zindagigames.com  Thu Aug 4 16:21:27 2016
From: ckaynor at zindagigames.com (Chris Kaynor)
Date: Thu, 4 Aug 2016 13:21:27 -0700
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To: <20160804132028.GL6608@ando.pearwood.info>
References: <20160801024738.GC13777@ando.pearwood.info> <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info>
Message-ID: 

On Thu, Aug 4, 2016 at 6:20 AM, Steven D'Aprano wrote:
> Think about why you're clamping. It's unlikely to be used just once, for a single calculation. You're likely to be clamping a whole series of values, with fixed lower and upper bounds. The bounds are unlikely to be known at compile-time, but they aren't going to change from clamping to clamping. Something like this:
>
> lower, upper = get_bounds()
> for x in values():
>     y = some_calculation(x)
>     y = clamp(y, lower, upper)
>     do_something_with(y)
>
> is the most likely use-case, I think.
I was curious about what the likely cases are, so I ran a quick sample from a professional project I am working on, and found the following results:

clamping to [0, 1]: 50 instances, almost always dealing with percentages

lower is 0: 44 instances, almost all were clamping an index to list bounds, though a few outliers existed

lower is 1: 4 instances, 3 were clamping a 1-based index, the other was some safety code to ensure a computed wait time falls within certain bounds to avoid both stalling and spamming

both values were constant, but with no real specific values: 11 instances (two of these are kinda 0,1 limits, but a log is being done for volume calculations, so 0 is invalid, but the number is very close to 0)

one value was constant, with some arbitrary limit and the other was computed: 0

both values were computed: 20 instances (many instances have the clamping pulled from data, which is generally constant but can be changed more easily than code)

Any given call to clamp was put into the first of the categories it matched. "Computed" is fairly general; it includes cases where the value is user input with no actual math done. As would be expected, all cases were using a computed value as the input; only the min/max were ever constant.

The project in this case is a video game's game logic code, written in C#. None of the shaders or engine code is included. There may be additional clamping using min/max combinations, rather than the provided clamp helpers, that was not included; however, the search did find two instances, where they were commented as being clamps, which were included.

Basically all of the cases will repeat fairly often, either every frame, move, or level. Most are not in loops outside of the frame/game loop.

> If lower happens to be greater than upper, that's clearly a mistake. It's better to get an exception immediately, rather than run through a million calculations and only then discover that you've ended up with a million NANs.
> It's okay if you get a few NANs, that simply indicates that one of your x values was a NAN, or a calculation produced a NAN. But if *every* calculation produces a NAN, well, that's a sign of breakage. Hence, better to raise straight away.

I personally don't have much opinion on NAN behaviour in general - I don't think I've ever actually used them in any of my code, and in the few cases they show up, it is due to a bug or corrupted data that I want caught early.

Chris

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From guido at python.org  Thu Aug 4 16:32:15 2016
From: guido at python.org (Guido van Rossum)
Date: Thu, 4 Aug 2016 13:32:15 -0700
Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484
In-Reply-To: 
References: 
Message-ID: 

On Thu, Aug 4, 2016 at 12:11 PM, Daniel Moisset wrote:
[...]
> I follow the example perfectly. Now suppose a reader finds the following piece of code:
>
> class Starship:
>     stats: class Dict[str, int] = {}  # Pure class variable
>     damage: class int = 0             # Hybrid class/instance variable
>     captain: str                      # Pure instance variable
>     speed: float = 0
>
> I added a new attribute (similar to b in your original example). Given that the type declaration doesn't say "class", the reader might be inclined to think it's an instance variable. But at runtime (if I got you right), that variable will be stored in "Starship.__dict__" and writing "Starship.speed = 3" will change the speed of those starship instances that still haven't set the attribute.
> The other combination that feels a bit confusing when adding "class" tags is > saying "attr: class T", without an initializer.... in what case would > someone do that? what does it mean if I find some code saying that about the > class, that it might get that attribute set somewhere else? Agreed that looks silly. We probably should make that a syntax error. -- --Guido van Rossum (python.org/~guido) From jsbueno at python.org.br Thu Aug 4 16:37:36 2016 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Thu, 4 Aug 2016 17:37:36 -0300 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: > The other combination that feels a bit confusing when adding "class" tags is > saying "attr: class T", without an initializer.... in what case would > someone do that? what does it mean if I find some code saying that about the > class, that it might get that attribute set somewhere else? Actually, for the __init_subclass__ class initializer (PEP 487) , discussed in depth on a recently active thread on Python-dev, such attributes would make sense On 4 August 2016 at 17:32, Guido van Rossum wrote: > On Thu, Aug 4, 2016 at 12:11 PM, Daniel Moisset wrote: > [...] >> I follow the example perfectly. Now suppose a reader finds the following >> piece of code: >> >> class Starship: >> stats: class Dict[str, int] = {} # Pure class variable >> damage: class int = 0 # Hybrid class/instance variable >> captain: str # Pure instance variable >> speed: float = 0 >> >> I added a new attribute (similar to b in your original example). Given that >> the type declaration doesn't say "class",the reader might be inclined to >> think it's an instance variable. But in runtime (if I got you right), that >> variable will be stored in "Starship.__dict__" and writing "Starship.speed = >> 3" will change the speed of those starship instances that still haven't set >> the attribute. 
So in the end both "damage" and "speed" have "class variable" >> runtime semantics, even when one is flagged as "class" and the other isn't. > > We may have to debate more what the checker should allow here. (Since > it's okay for a type checker to disallow things that might actually > work at runtime.) I'm inclined to allow it, as long as the value > assigned to Starship.speed is compatible with float. > >> The other combination that feels a bit confusing when adding "class" tags is >> saying "attr: class T", without an initializer.... in what case would >> someone do that? what does it mean if I find some code saying that about the >> class, that it might get that attribute set somewhere else? > > Agreed that looks silly. We probably should make that a syntax error. > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From chris.barker at noaa.gov Thu Aug 4 19:17:47 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 4 Aug 2016 16:17:47 -0700 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: References: <20160801024738.GC13777@ando.pearwood.info> <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> Message-ID: On Thu, Aug 4, 2016 at 1:21 PM, Chris Kaynor wrote: > If lower happens to be greater than upper, that's clearly a mistake. Its >> better to get an exception immediately, rather than run through a >> million calculations and only then discover that you've ended up with a >> million NANs. >> > sure. > It's okay if you get a few NANs, that simply indicates >> that one of your x values was a NAN, or a calculation produced a NAN. 
>> But if *every* calculation produces a NAN, well, that's a sign of
>> breakage. Hence, better to raise straight away.
>>
>

sure -- but one reason NaN exists is so that errors can get propagated
through the hardware without bringing everything to a halt -- this is
really key in vectorized operations. And it's really useful. So I'd rather
not have an exception there. If you are doing something like:

[clamp(x, y, z) for z in the_max_values]

it might be better to check for NaN somewhere else than have that whole
operation fail. I think it would also require more special case checking in
the code....

I personally don't have much opinion on NAN behaviour in general - I don't
> think I've ever actually used them in any of my code, and the few cases
> they show up, it is due to a bug or corrupted data that I want caught early.
>

exactly -- usually a bug or corrupted data -- if NaN is passed in as a
limit, it's probably an error of some sort; you really don't want it
silently passing your input value through.

And you have inf and -inf if you do want "no limit"

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From steve at pearwood.info Thu Aug 4 20:55:29 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 5 Aug 2016 10:55:29 +1000 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: References: <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> Message-ID: <20160805005529.GO6608@ando.pearwood.info> On Thu, Aug 04, 2016 at 01:21:27PM -0700, Chris Kaynor wrote: > I was curious about what the likely cases are in many cases, so I ran a > quick sample from a professional project I am working on, and found the > following results: [...] > As would be expected, all cases were using computed value as the input, > only the min/max were ever constant. Thanks for doing that Chris. That's what I expected: I can't think of any use-case for clamping a constant value to varying bounds: x = known_value() for lower, upper in zip(seq1, seq2): y = clamp(x, lower, upper) process(y) -- Steve From steve at pearwood.info Thu Aug 4 21:05:11 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 5 Aug 2016 11:05:11 +1000 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: References: <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> Message-ID: <20160805010510.GP6608@ando.pearwood.info> On Thu, Aug 04, 2016 at 04:17:47PM -0700, Chris Barker wrote: > I think it would also require more special case checking in the code.... I think you are over-complicating this, AND ignoring what the IEEE-754 standard says about this. > if NaN is passed in as a limit, it's probably a error of some sort That's not what the standard says. 
The standard says NAN as a limit should be treated as "no limit". > And you have inf and -inf if you do want "no limit" That will still apply. You can also pass 1.7976931348623157E+308, the largest possible float. (If we're talking about float arguments -- for int and Decimal, you can easily exceed that.) -- Steve From steve at pearwood.info Thu Aug 4 21:21:45 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 5 Aug 2016 11:21:45 +1000 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: References: <20160801024738.GC13777@ando.pearwood.info> <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <57A12701.1020903@canterbury.ac.nz> <20160804124254.GK6608@ando.pearwood.info> Message-ID: <20160805012145.GQ6608@ando.pearwood.info> On Thu, Aug 04, 2016 at 11:12:36PM +1000, Chris Angelico wrote: > On Thu, Aug 4, 2016 at 10:42 PM, Steven D'Aprano wrote: > > I would like to see a common "isnan" function or method for both floats > > and Decimal, because I'm sick of writing code like: > > > > try: > > flag = x.is_nan() # Decimal? > > except AttributeError: > > flag = math.isnan(x) # float > > > > > > Note that the spelling is different too, for extra sadness :-( > > def isnan(x): > return x != x > > Might land you a few false positives, but not many. And it's IEEE compliant. That's what I used to use before math.isnan existed. But I fear some clever clogs deciding that they like NANs but don't like that NANs violate reflexivity, or merely optimizing equality like this: def __eq__(self, other): # optimisation if self is other: return True # more expensive comparison ... In the absence of an isnan() method or function, falling back on x!=x is okay as a last resort, but I wouldn't want to rely on it as the first resort. 
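The two behaviours described above -- a NaN bound read as "no limit on that side", in the spirit of the IEEE-754 min/max rule, and `x != x` as a portable last-resort NaN test -- can be combined into a short sketch. This is only an illustration of the semantics under discussion, not a proposed stdlib implementation; note that a NaN *value* simply propagates here:

```python
def isnan(x):
    # Portable fallback discussed above: NaN is the only value
    # that compares unequal to itself.
    return x != x

def clamp(value, lower, upper):
    # Sketch: a NaN bound imposes no limit on its side,
    # while a NaN input value passes straight through.
    if isnan(value):
        return value
    if not isnan(lower) and value < lower:
        return lower
    if not isnan(upper) and value > upper:
        return upper
    return value
```

With these semantics, `clamp(5.0, float('nan'), 10.0)` returns 5.0: the NaN lower bound is ignored rather than poisoning the result, and `float('inf')`/`float('-inf')` bounds behave as explicit "no limit" values as well.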
--
Steve

From steve at pearwood.info Thu Aug 4 21:35:52 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 5 Aug 2016 11:35:52 +1000
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To:
References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com>
 <1470333535.3044288.686191761.491F666B@webmail.messagingengine.com>
Message-ID: <20160805013552.GR6608@ando.pearwood.info>

On Thu, Aug 04, 2016 at 09:46:05PM +0200, Victor Stinner wrote:

> I suggest to start to write a short PEP, as for the math.is_close() PEP, since
> there are subtle issues like NaN (float but also Decimal!) and combinations
> of numerical types (int, float, complex, Decimal, Fraction, numpy scalars
> like float16, ...).
>
> Maybe a PEP is not needed, I didn't read carefully the thread to check if
> there is a consensus or not.

No consensus.

> I dislike the idea of modifying min() and max() to add special cases for
> float NaN and decimal NaN.

min() and max() currently return NANs when given a NAN and a number, but
I don't know if that is deliberate or accidental. The IEEE-754 standard
says that min(x, NAN) and max(x, NAN) should return x.

https://en.wikipedia.org/wiki/IEEE_754_revision#min_and_max

> Which type do you expect for fmax(int, int)? Should it be int or float?

int.

> Should fmax(Decimal, float) raise an error, return a float or return a
> Decimal?

Comparisons between float and Decimal no longer raise:

py> Decimal(1) < 1.0
False

so I would expect that fmax would return the larger of the two. If the
larger is a float, it should return a float. If the larger is a Decimal,
it should return a Decimal. If the two values are equal, it's okay to
pick an arbitrary one.
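A sketch of those fmax semantics (`fmax` is hypothetical here -- there is no such function in Python's math module): return whichever operand compares larger without converting its type, setting the NaN question aside for the moment. Mixed int/float/Decimal comparisons work in Python 3, which is what makes this trivially possible:

```python
from decimal import Decimal

def fmax(a, b):
    # Hypothetical sketch: return the larger operand with its type intact.
    # For equal operands, either may be returned (arbitrary, as noted above).
    return a if a >= b else b

# fmax(int, int) stays int; a Decimal/float mix keeps the winner's type:
assert fmax(3, 7) == 7
assert isinstance(fmax(Decimal(2), 1.5), Decimal)
assert isinstance(fmax(1.5, Decimal(1)), float)
```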
--
Steve

From steve at pearwood.info Thu Aug 4 21:54:34 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 5 Aug 2016 11:54:34 +1000
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To: <4929689045364026497@unknownmsgid>
References: <20160801201044.GA6608@ando.pearwood.info>
 <20160802182203.GF6608@ando.pearwood.info>
 <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com>
 <57A12B5A.90504@canterbury.ac.nz>
 <4929689045364026497@unknownmsgid>
Message-ID: <20160805015434.GS6608@ando.pearwood.info>

On Wed, Aug 03, 2016 at 08:52:24AM -0700, Chris Barker - NOAA Federal wrote:

> One could argue that:
>
> clamp(NaN, x,x)
>
> Is clearly defined as x. But that would require special casing

Not so special:

if lower == upper != None:
    return lower

That's in the spirit of Professor Kahan's admonition that NANs should
not be treated as a one way street: most calculations that lead to NANs
will, of course, stay as NANs, but there are cases where a calculation
on a NAN will lead to a non-NAN:

py> math.hypot(INF, NAN)
inf
py> math.hypot(NAN, INF)
inf
py> NAN**0.0
1.0

If the bounds are equal, then clamp(NAN, a, a) should return a.

> and, "equality" is a bit of an ephemeral concept with floats, so
> better to return NaN.

Not really. Please read what Kahan says about NANs. He was one of the
committee members that worked out most of these issues in the nineties.
He says:

NaN must not be confused with "Undefined." On the contrary, IEEE 754
defines NaN perfectly well even though most language standards ignore
and many compilers deviate from that definition. The deviations usually
afflict relational expressions, discussed below. Arithmetic operations
upon NaNs other than SNaNs (see below) never signal INVALID, and always
produce NaN unless replacing every NaN operand by any finite or infinite
real values would produce the same finite or infinite floating-point
result independent of the replacements.
That's exactly the situation here: clamp(x, a, a) should return a for
any finite or infinite x, which means it should do the same when x is a
NAN as well.

https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF

See page 7.

--
Steve

From turnbull.stephen.fw at u.tsukuba.ac.jp Thu Aug 4 23:09:09 2016
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Fri, 5 Aug 2016 12:09:09 +0900
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To: <20160804132028.GL6608@ando.pearwood.info>
References: <20160801024738.GC13777@ando.pearwood.info>
 <20160801201044.GA6608@ando.pearwood.info>
 <20160802182203.GF6608@ando.pearwood.info>
 <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com>
 <57A12B5A.90504@canterbury.ac.nz>
 <20160804132028.GL6608@ando.pearwood.info>
Message-ID: <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp>

Steven D'Aprano writes:

> > clamp(x, y, z) --> NaN if z < y
>
> That's a clear error, and it should raise immediately. I see no
> advantage to returning NAN in this case.
>
> Think about why you're clamping. It's unlikely to be used just once, for
> a single calculation. You're likely to be clamping a whole series of
> values, with a fixed lower and upper bounds.

"Likely" isn't a good enough reason:

x = [clamp(x[t], f(t), g(t)) for t in range(1_000_000)]

is perfectly plausible code. The "for which case is it easier to write
a wrapper" argument applies here, I think.

From nickolainovik at gmail.com Thu Aug 4 23:47:38 2016
From: nickolainovik at gmail.com (Nickolai Novik)
Date: Thu, 4 Aug 2016 20:47:38 -0700 (PDT)
Subject: [Python-ideas] PEP 525: Asynchronous Generators
In-Reply-To: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com>
References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com>
Message-ID:

Hi Yury!

Thank you for pushing this forward! As an author/contributor to several
packages, I would love to use async generators in the aiomysql/aiobotocore
libraries.
In particular, I want to share my use case for async generators.

aiobotocore (https://github.com/aio-libs/aiobotocore) is a wrapper around
the botocore library (https://github.com/boto/botocore) that brings an
Amazon services client to the asyncio world. The AWS API is huge, and
botocore is itself 20k lines of code, so I hacked it a bit here and there
to add asyncio support. The code is a bit ugly and hackish, but we get
full Amazon support with only a few hundred lines of code. Since the
protocol is separated from the IO, it is pretty straightforward to replace
the requests HTTP client with the aiohttp one. But porting the pagination
logic, so that it is possible to use async for, was very challenging.

Here is the original pagination
https://github.com/boto/botocore/blob/bb09e88508f5593ce4393c72e7c1edbaf6d28a6a/botocore/paginate.py#L91-L145
which has a structure like:

async def f():
    var = await io_fetch()
    while True:
        if var > 10:
            var += 1
            yield var
            var = await io_fetch()
        else:
            var += 2
            yield var

If we rewrite this generator using __aiter__, the code becomes very ugly
and barely readable, since we need to track a lot of state. With an async
generator this problem does not exist. Moreover, with this PEP, generators
can easily be ported by just putting async/await in the proper places,
without changing the logic.

Thank you for this PEP!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From steve at pearwood.info Fri Aug 5 02:06:48 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 5 Aug 2016 16:06:48 +1000 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp> References: <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp> Message-ID: <20160805060648.GU6608@ando.pearwood.info> On Fri, Aug 05, 2016 at 12:09:09PM +0900, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > > clamp(x, y, z) --> NaN if z < y > > > > That's a clear error, and it should raise immediately. I see no > > advantage to returning NAN in this case. > > > > Think about why you're clamping. It's unlikely to be used just once, for > > a single calculation. You're likely to be clamping a whole series of > > values, with a fixed lower and upper bounds. > > "Likely" isn't a good enough reason: Of course it is. We write code to prefer known, common use-cases, not just hypothetical "What If" scenarios. E.g. most trig functions (sin, cos, tan) take angles in radians. I've seen a few take angles in degrees. I've even seen trig functions that take their argument as a multiple of pi (e.g. sinpi(1.25) being equivalent to sin(1.25*pi), only more accurate). All of these have good use-cases. But I'm willing to bet that you will never, ever find a general purpose programming language or maths library with specialised trig functions that take arguments in 1/37th of a gon. (Yes, "gon" is a real unit.) If you need such a thing, you write it yourself. Chris has already gone through his code and confirmed what I expected: he uses "clamp" extensively, and the bounds are invariably fixed once at the start of the loop. 
But if you find yourself in that unusual situation of needing something unusual, you can easily write your own wrapper: def clamp(value, lower, upper): if lower > upper: return "Surprise!" return math.clamp(value, lower, upper) Of course, if math.clamp() returned NAN, you could just as easily go the other way and write a wrapper to raise instead. Neither case is particularly onerous. But in one case, only a few people will need to wrap the function; in the second case, many people (possibly even *everybody*) will want to wrap the function to avoid the unhelpful standard behaviour. It is our job as function designers to try to cater for the majority, not the minority, when possible. > x = [clamp(x[t], f(t), g(t)) for t in range(1_000_000)] > > is perfectly plausible code. I have my doubts. Sure, you can write it, but what would you use it for? What's your use-case? -- Steve From ncoghlan at gmail.com Fri Aug 5 03:27:20 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 5 Aug 2016 17:27:20 +1000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <20160804132944.GM6608@ando.pearwood.info> References: <20160804132944.GM6608@ando.pearwood.info> Message-ID: On 4 August 2016 at 23:29, Steven D'Aprano wrote: > On Thu, Aug 04, 2016 at 10:37:33PM +1000, Nick Coghlan wrote: >> Otherwise we're at risk of allowing thoroughly confusing runtime behaviour like: >> >> >>> a = "Not an int" >> >>> def f(): >> ... # a: int would go here >> ... print(a) # This should really fail >> ... >> >>> f() >> Not an int > > I would expect that the type-checker will complain that you're declaring > a local variable "a" but there's no local "a" in the function. If the > checker isn't that smart, I expect it would complain that "a" is set to > a string, but declared as an int. Either way, the type-checker ought to > complain. 
Guido's reply clarified that he expects the compiler to be declaration aware (so it correctly adds "a" to the local symbol table when a type declaration is given), which means functions will be free of ambiguous behaviour - they'll throw UnboundLocalError if a name is declared and then referenced without first being defined. That means it's mainly in classes that oddities will still be possible: >>> a = "Not an int" >>> class Example: ... # a: int ... # a class: int ... other_attr = derived_from(a) # Oops ... a = 1 ... That's probably acceptable to handle as "Don't do that" though, and leave it to typecheckers to pick it up as problematic "Referenced before definition" behaviour. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Aug 5 03:40:04 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 5 Aug 2016 17:40:04 +1000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On 5 August 2016 at 02:40, Guido van Rossum wrote: > class Starship: > stats: class Dict[str, int] = {} # Pure class variable > damage: class int = 0 # Hybrid class/instance variable > captain: str # Pure instance variable I'm mostly on board with the proposal now, but have one last niggle: What do you think of associating the "class" more with the variable name than the type definition for class declarations by putting to the left of the colon? 
That is:

class Starship:
    stats class: Dict[str, int] = {}   # Pure class variable
    damage class: int = 0              # Hybrid class/instance variable
    captain: str                       # Pure instance variable

Pronounced as:

"stats is declared on the class as a dict mapping from strings to
integers and is initialised as an empty dict"

"damage is declared on the class as an integer and is initialised as zero"

"captain is declared on instances as a string"

Just a minor thing, but the closer association with the name reads
better to me, since "Class attribute or instance attribute?" is really
a property of the name binding, rather than of the permitted types
that can be bound to that name.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From mertz at gnosis.cx Fri Aug 5 07:29:46 2016
From: mertz at gnosis.cx (David Mertz)
Date: Fri, 5 Aug 2016 07:29:46 -0400
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To: <20160805060648.GU6608@ando.pearwood.info>
References: <20160801201044.GA6608@ando.pearwood.info>
 <20160802182203.GF6608@ando.pearwood.info>
 <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com>
 <57A12B5A.90504@canterbury.ac.nz>
 <20160804132028.GL6608@ando.pearwood.info>
 <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp>
 <20160805060648.GU6608@ando.pearwood.info>
Message-ID:

On Aug 5, 2016 2:13 AM, "Steven D'Aprano" wrote:
> > x = [clamp(x[t], f(t), g(t)) for t in range(1_000_000)]
> >
> > is perfectly plausible code.
>
> I have my doubts. Sure, you can write it, but what would you use it for?
> What's your use-case?

Looks like an ordinary trend with error bounds to me. I can easily imagine
writing that code if clamp is introduced.

And glad to see that Kahan explicitly supports my intuition on NaN not
genetically infecting every operation... In fact, clamp(x, nan, nan) is
explicitly x according to IEEE-754 2008... Not NaN.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From turnbull.stephen.fw at u.tsukuba.ac.jp Fri Aug 5 10:30:35 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 5 Aug 2016 23:30:35 +0900 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <20160805060648.GU6608@ando.pearwood.info> References: <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp> <20160805060648.GU6608@ando.pearwood.info> Message-ID: <22436.41739.428069.534278@turnbull.sk.tsukuba.ac.jp> Steven D'Aprano writes: > > x = [clamp(x[t], f(t), g(t)) for t in range(1_000_000)] > > > > is perfectly plausible code. > I have my doubts. Sure, you can write it, but what would you use it for? > What's your use-case? Any varying optimal control might also be subject to bounds that vary. These problems arise rather frequently in economic theory. Often the cheapest way to compute them is to compute an unconstrained problem and clamp. I can even think of a case where clamp could be used with a constant control and a varying bound: S-s inventory control facing occasional large orders in an otherwise continuous, stationary demand process. From andrew.svetlov at gmail.com Fri Aug 5 11:11:45 2016 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Fri, 05 Aug 2016 15:11:45 +0000 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> Message-ID: As maintainer of aiohttp (asyncio-compatible HTTP client/server) and aiopg (PostgreSQL asyncio driver) I totally support the pep. 
The problem is: I'm able to write async iterator class which utilizes __aiter__/__anext__. But users of my libraries are not. Writing synchronous generator functions for data processing is very common pattern. But class based approach requires saving all context not in local variables but in instance vars, you know. It's a nightmare in practice, as consequence people writes more complex and error prone code than they could write with this PEP. https://www.python.org/dev/peps/pep-0255/ for simple generators is 15 years old, I believe everybody understand why we need it. Async generators extend the same concept to asyncio world by solving the same problems but with async code. Honestly I personally don't feel a need for two-way generators with `asend()`/`athrow()` but as Yury explained he need them internally for `anext`/`aclose` anyway. On Thu, Aug 4, 2016 at 6:18 PM Yury Selivanov wrote: > Hi Sven, > > On 2016-08-04 9:20 AM, Sven R. Kunze wrote: > > Hey Yury, > > > > that's a great proposal! > > > > > > On 03.08.2016 00:31, Yury Selivanov wrote: > >> > >> class Ticker: > >> """Yield numbers from 0 to `to` every `delay` seconds.""" > >> > >> def __init__(self, delay, to): > >> self.delay = delay > >> self.i = 0 > >> self.to = to > >> > >> def __aiter__(self): > >> return self > >> > >> async def __anext__(self): > >> i = self.i > >> if i >= self.to: > >> raise StopAsyncIteration > >> self.i += 1 > >> if i: > >> await asyncio.sleep(self.delay) > >> return i > >> > >> > >> The same can be implemented as a much simpler asynchronous generator:: > >> > >> async def ticker(delay, to): > >> """Yield numbers from 0 to `to` every `delay` seconds.""" > >> for i in range(to): > >> yield i > >> await asyncio.sleep(delay) > >> > > > > That's a great motivational example. +1 > > > > Especially when reading the venerable PEP 255 (Q/A part), this also > > gets rid of the "low-level detail": "raise StopAsyncIteration". 
This > > could also be worth mentioning in the motivation part for PEP 525. > > Thanks! > > [..] > > From your answer to Stefan, I get the impression that the reason why > > we actual need all those a* methods (basically a duplication of the > > existing gen protocol), is the fact that normal generators can be > > converted to coroutines. That means, 'yield' still can be used in both > > ways. > > > > So, it's a technical symptom of the backwards-compatibility rather > > than something that cannot be avoided by design. Is this correct? > > > > async/await in Python is implemented on top of the generator protocol. > Any 'await' is either awaiting on a coroutine or on a Future-like > object. Future-like objects are defined by implementing the __await__ > method, which should return a generator. > > So coroutines and generators are very intimately tied to each other, and > that's *by design*. > > Any coroutine that iterates over an asynchronous generator uses the > generator protocol behind the scenes. So we have to multiplex the async > generaotor's "yields" into the generator protocol in such a way, that it > stays isolated, and does not interfere with the "yields" that drive > async/await. > > > > > > > If it's correct, would you think it would make sense to get rid of the > > a* in a later iteration of the async capabilities of Python? So, just > > using the normal generator protocol again? > > Because async generators will contain 'await' expressions, we have to > have a* methods (although we can name them without the "a" prefix, but I > believe that would be confusing for many users). > > > > > > > One note on all examples but the last. Reading those examples, it > > creates the illusion of actual working code which is not the case, > > right? One would always need to 1) wrap module-level statements into > > its own coroutine, 2) create an event-loop and 3) run it. Do you think > > clarifying this in the PEP makes sense? > > I'll think about this, thanks! 
Maybe I can add a line in the beginning > of the "Specification" section. > > > > > > > Thanks again for your hard work here. Async generators definitely > > completes the picture. > > Thank you, Sven > > Yury > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Thanks, Andrew Svetlov -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Aug 5 12:12:23 2016 From: guido at python.org (Guido van Rossum) Date: Fri, 5 Aug 2016 09:12:23 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Fri, Aug 5, 2016 at 12:40 AM, Nick Coghlan wrote: > I'm mostly on board with the proposal now, but have one last niggle: > What do you think of associating the "class" more with the variable > name than the type definition for class declarations by putting to the > left of the colon? > > That is: > > class Starship: > stats class: Dict[str, int] = {} # Pure class variable > damage class: int = 0 # Hybrid class/instance variable > captain: str # Pure instance variable > > Pronounced as: > > "stats is declared on the class as a dict mapping from strings to > integers and is initialised as an empty dict" > "damage is declared on the class as an integer and is initialised as zero" > "captain is declared on instances as an integer" > > Just a minor thing, but the closer association with the name reads > better to me since "Class attribute or instance attribute?" is really > a property of the name binding, rather than of the permitted types > that can be bound to that name Hmm... But the type is *also* a property of the name binding. 
And I think the "class-var-ness" needs to be preserved in the __annotations__ dict somehow, so that's another reason why it "belongs" to the type rather than to the name binding (a nebulous concept to begin with). Also, I like the idea that everything between the ':' and the '=' (or the end of the line) belongs to the type checker. I expect that'll be easier for people who aren't interested in the type checker. -- --Guido van Rossum (python.org/~guido) From ericsnowcurrently at gmail.com Fri Aug 5 12:23:13 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 5 Aug 2016 10:23:13 -0600 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Fri, Aug 5, 2016 at 1:40 AM, Nick Coghlan wrote: > On 5 August 2016 at 02:40, Guido van Rossum wrote: >> class Starship: >> stats: class Dict[str, int] = {} # Pure class variable >> damage: class int = 0 # Hybrid class/instance variable >> captain: str # Pure instance variable > > I'm mostly on board with the proposal now, Same here. The only thing I'd like considered further is exposing the annotations at runtime. Others have already suggested this and Guido has indicated that they will not be evaluated at runtime. I think that's fine in the short term, but would still like future support considered (or at least not ruled out) for runtime evaluation (and availability on e.g. func/module/cls.__var_annotations__). If performance is the main concern, we should be able to add compiler support to evaluate/store the annotations selectively on a per-file basis, like we do for __future__ imports. Alternately, for functions we could evaluate the annotations in global/non-local scope at the time the code object is created (i.e. when handling the MAKE_FUNCTION op code), rather than as part of the code object's bytecode. Again, not adding the support right now is fine but we should be careful to not preclude later support. 
Of course, this assumes sufficient benefit from run-time access variable type annotations, which I think is weaker than the argument for function annotations. We should still keep this possible future aspect in mind though. Relatedly, it would be nice to address the future use of this syntax for more generic variable annotations (a la function annotations), but that's less of a concern for me. The only catch is that making "class" an optional part of the syntax impacts the semantics of the more generic "variable annotations". However, I don't see "class" as a problem, particularly if it is more strongly associated with the name rather than the annotation, as you've suggested below. If anything it's an argument *for* your recommendation. :) > but have one last niggle: > What do you think of associating the "class" more with the variable > name than the type definition for class declarations by putting to the > left of the colon? > > [snip] > > Just a minor thing, but the closer association with the name reads > better to me since "Class attribute or instance attribute?" is really > a property of the name binding, rather than of the permitted types > that can be bound to that name +1 One question: Will the use of the "class" syntax only be allowed at the top level of class definition blocks? 
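As a historical note, the syntax ultimately adopted for this proposal (PEP 526, Python 3.6) did expose class- and module-level annotations at runtime: they are evaluated when the class body executes and stored in __annotations__, while an annotation without an assignment creates no attribute at all:

```python
class Starship:
    captain: str          # annotated only: no class attribute is created
    speed: float = 0.0    # annotated and assigned

assert Starship.__annotations__ == {'captain': str, 'speed': float}
assert not hasattr(Starship, 'captain')
assert Starship.speed == 0.0
```

(Requires Python 3.6+.) Annotations on local variables, by contrast, ended up not being evaluated at runtime at all.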
-eric From steve at pearwood.info Fri Aug 5 12:29:01 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 6 Aug 2016 02:29:01 +1000 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <22436.41739.428069.534278@turnbull.sk.tsukuba.ac.jp> References: <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp> <20160805060648.GU6608@ando.pearwood.info> <22436.41739.428069.534278@turnbull.sk.tsukuba.ac.jp> Message-ID: <20160805162901.GX6608@ando.pearwood.info> On Fri, Aug 05, 2016 at 11:30:35PM +0900, Stephen J. Turnbull wrote: > I can even think of a case where clamp could be used with a constant > control and a varying bound: S-s inventory control facing occasional > large orders in an otherwise continuous, stationary demand process. Sounds interesting. Is there a link to somewhere I could learn more about this? -- Steve From alexander.belopolsky at gmail.com Fri Aug 5 12:38:57 2016 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 5 Aug 2016 12:38:57 -0400 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Fri, Aug 5, 2016 at 12:23 PM, Eric Snow wrote: > Alternately, for functions > we could evaluate the annotations in global/non-local scope at the > time the code object is created (i.e. when handling the MAKE_FUNCTION > op code), rather than as part of the code object's bytecode. I had exactly the same thoughts, but I wonder how likely the type annotations would need local scope. For example, is something like this def f(x): v: type(x) ... going to be supported by type checkers? 
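The difficulty with evaluating such an annotation at function-creation time can be sketched directly (the helper name below is purely illustrative, not part of any proposal): the expression mentions a parameter that is not bound in the enclosing scope until the function is actually called.

```python
# Evaluating an annotation expression when the function object is created
# (the MAKE_FUNCTION approach quoted above) means evaluating it in the
# enclosing scope -- where the parameter 'x' does not exist yet.
def evaluate_annotation_eagerly(expression, enclosing_globals):
    # Roughly what definition-time evaluation would amount to.
    return eval(expression, enclosing_globals)

try:
    evaluate_annotation_eagerly("type(x)", {})  # no 'x' in scope yet
except NameError as exc:
    print("cannot evaluate at definition time:", exc)
```

So an annotation like `type(x)` could only ever be handled statically by a type checker (or evaluated lazily at call time), never at MAKE_FUNCTION time.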
From yselivanov.ml at gmail.com Fri Aug 5 12:39:59 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 5 Aug 2016 12:39:59 -0400 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> Message-ID: On 2016-08-05 11:11 AM, Andrew Svetlov wrote: > As maintainer of aiohttp (asyncio-compatible HTTP client/server) and > aiopg (PostgreSQL asyncio driver) I totally support the PEP. > > The problem is: I'm able to write an async iterator class which utilizes > __aiter__/__anext__. > But users of my libraries are not. > > Writing synchronous generator functions for data processing is a very > common pattern. > > But the class-based approach requires saving all context not in local > variables but in instance vars, you know. > > It's a nightmare in practice; as a consequence people write more > complex and error-prone code than they could write with this PEP. > > https://www.python.org/dev/peps/pep-0255/ for simple generators is 15 > years old, I believe everybody understands why we need it. Thanks, Andrew! Your feedback is coming from a deep experience of working with asyncio, therefore it's extremely valuable! Yury From guido at python.org Fri Aug 5 12:41:13 2016 From: guido at python.org (Guido van Rossum) Date: Fri, 5 Aug 2016 09:41:13 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Fri, Aug 5, 2016 at 9:23 AM, Eric Snow wrote: > The only thing I'd like considered further is exposing the annotations > at runtime. Others have already suggested this and Guido has > indicated that they will not be evaluated at runtime. I think that's > fine in the short term, but would still like future support considered > (or at least not ruled out) for runtime evaluation (and availability > on e.g.
func/module/cls.__var_annotations__). Actually my current leaning is as follows: - For annotated module globals, the type annotations will be evaluated and collected in a dict named __annotations__ (itself a global variable in the module). - Annotations in class scope are evaluated and collected in a dict named __annotations__ on the class (itself a class attribute). This will contain info about annotated class variables as well as instance variables. I'm thinking that the class variable annotations will be wrapped in a `ClassVar[...]` object. - Annotations on local variables are *not* evaluated (because it would be too much of a speed concern) but the presence of an annotation still tells Python that it's a local variable, so a "local variable slot" is reserved for it (even if no initial value is present), and if you annotate it without initialization you may get an UnboundLocalError, e.g. here: def foo(): x: int print(x) # UnboundLocalError This is the same as e.g. def bar(): if False: x = 0 print(x) # UnboundLocalError > If performance is the main concern, we should be able to add compiler > support to evaluate/store the annotations selectively on a per-file > basis, like we do for __future__ imports. Alternately, for functions > we could evaluate the annotations in global/non-local scope at the > time the code object is created (i.e. when handling the MAKE_FUNCTION > op code), rather than as part of the code object's bytecode. I considered that, but before allowing that complexity, I think we should come up with a compelling use case (not a toy example). This also produces some surprising behavior, e.g. what would the following do: def foo(): T = List[int] a: T = [] # etc. If we evaluate the annotation `T` in the surrounding scope, it would be a NameError, but a type checker should have no problem with this (it's just a local type alias). > Again, not adding the support right now is fine but we should be > careful to not preclude later support. 
Of course, this assumes > sufficient benefit from run-time access variable type annotations, > which I think is weaker than the argument for function annotations. > We should still keep this possible future aspect in mind though. The problem with such a promise is that it has no teeth, until the future behavior is entirely specified, and then we might as well do it now. My current proposal (no evaluation of annotations for locals) means that you can write complete nonsense there (as long as it is *syntactically* correct) and Python would allow it. Surely somebody is going to come up with a trick to rely on that and then the future development would break their code. > Relatedly, it would be nice to address the future use of this syntax > for more generic variable annotations (a la function annotations), but > that's less of a concern for me. The only catch is that making > "class" an optional part of the syntax impacts the semantics of the > more generic "variable annotations". However, I don't see "class" as > a problem, particularly if it is more strongly associated with the > name rather than the annotation, as you've suggested below. If > anything it's an argument *for* your recommendation. :) I'm unclear on what you mean by "more generic variable annotations". Do you have an example? > One question: Will the use of the "class" syntax only be allowed at > the top level of class definition blocks? Oooh, very good question. I think that's the idea. Enforcement can't happen directly at the syntactic level, but it can be checked in the same way that we check that e.g. `return` only occurs in a function or `break` and `continue` only in a loop. 
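A runnable sketch of the leaning described above, assuming the proposed syntax is accepted exactly as sketched (none of this was valid Python when the thread was written):

```python
class Starship:
    stats: dict = {}   # annotation evaluated and collected on the class
    captain: str       # collected even though no value is assigned

# Evaluated annotations land in a dict on the class itself:
print(Starship.__annotations__)  # {'stats': <class 'dict'>, 'captain': <class 'str'>}

def foo():
    x: int      # not evaluated, but 'x' now has a reserved local slot
    print(x)    # UnboundLocalError, same as the bar() example

try:
    foo()
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```

Note that `captain` appears in `__annotations__` without becoming a class attribute, matching the "pure instance variable" reading from earlier in the thread.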
-- --Guido van Rossum (python.org/~guido) From yselivanov.ml at gmail.com Fri Aug 5 12:43:03 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 5 Aug 2016 12:43:03 -0400 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> Message-ID: <4f12caa4-fd5d-d8d8-2430-53e1a9fd3e6f@gmail.com> On 2016-08-04 11:47 PM, Nickolai Novik wrote: > Hi Yury! > > Thank you for pushing this forward! As an author/contributor to > several packages I would love to use async generators in the > aiomysql/aiobotocore libraries. In particular I want to share my use > case for async generators. > > aiobotocore (https://github.com/aio-libs/aiobotocore) is a wrapper > around the botocore library (https://github.com/boto/botocore) that brings > an Amazon services client to the asyncio world. The AWS API is huge, botocore > itself is 20k lines of code, so I hacked it here and there to add > asyncio support. The code is a bit ugly and hackish, but we have full Amazon > support with only a few hundred lines of code. Since the protocol is separated > from IO, it is pretty straightforward to replace the requests HTTP client > with the aiohttp one. But porting the pagination logic, so that it is possible to > use async for, was very challenging. Here is the original pagination > https://github.com/boto/botocore/blob/bb09e88508f5593ce4393c72e7c1edbaf6d28a6a/botocore/paginate.py#L91-L145 > which has a structure like: > > > async def f(): > var = await io_fetch() > while True: > if var > 10: > var += 1 > yield var > var = await io_fetch() > else: > var += 2 > yield var > > > If we rewrite this generator using __aiter__, the code becomes very ugly and > barely readable, since we need to track a lot of state; with an async > generator such a problem does not exist. Moreover, with this PEP, > generators can be easily ported by just putting async/await in the > proper places, without changing the logic. > Thank you, Nickolai!
I've written a couple of iterators that look like the one you have in botocore -- I totally feel your pain. It's really cool that you share this feedback. As I said to Andrew in this thread -- feedback from power asyncio users is extremely useful. Thanks, Yury From brett at python.org Fri Aug 5 12:17:20 2016 From: brett at python.org (Brett Cannon) Date: Fri, 05 Aug 2016 16:17:20 +0000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <57A3955E.7080901@brenbarn.net> References: <57A38A7A.4060401@brenbarn.net> <57A3955E.7080901@brenbarn.net> Message-ID: On Thu, 4 Aug 2016 at 12:20 Brendan Barnwell wrote: > On 2016-08-04 12:09, Alexander Belopolsky wrote: > > On Thu, Aug 4, 2016 at 2:33 PM, Brendan Barnwell > wrote: > >> It seems odd to have Python syntax that not only doesn't do > anything, but > >> can't even be made to do anything by the program itself when it runs. > > > > Why do you find this odd? Python already has #-comment syntax with > > exactly the property you complain about. Think of local variable > > annotation as a structured comment syntax. > > Exactly. But the existing comment annotations (# type: float) > already > are structured comment syntax, so what does this new one add? (Note we > already have at least one parallel --- although it is much more limited > in scope --- namely encoding declarations, which are also done within > existing comment syntax, rather than having their own "real" syntax.) > [I'm going to give an explanation one shot, then I will leave the topic alone as explaining the "why" definitely has the possibility of dragging on] You're right this could be viewed as syntactic sugar, but then again so is multiplication for integers as a short-hand for addition or the fact we even have infix operators which translate to method calls on objects. In all cases, the syntax is meant to either make people's lives easier or to promote their use.
In this instance it both makes people's lives easier and promotes use. By making it syntax it becomes easier to use as the compiler can now detect when the information is improperly typed. And by making it syntax, not only does it make sure there's a strong standard, it also lets people know that this isn't some little feature but in fact is being promoted by the Python language itself. There's also the disconnect of having type annotations on function parameters but completely missing the OOP aspect of attributes on objects, which leaves a gaping hole in terms of syntactic support for type hints. Hopefully that all makes sense as to why Guido has brought this up. From kxepal at gmail.com Fri Aug 5 16:34:19 2016 From: kxepal at gmail.com (Alexander Shorin) Date: Fri, 5 Aug 2016 23:34:19 +0300 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> Message-ID: Hi Yury! Thanks for this PEP! This is really what asyncio has been missing since you added async iterators. As a contributor to aiohttp and maintainer of aiocouchdb and a few other projects I also support this PEP. Great one! I agree with what Nikolay and Andrew already said: dancing around __aiter__/__anext__ to emulate generator-like behaviour is quite boring and complicated. This also makes porting "old synced" code onto asyncio harder since you have to rewrite every generator in iterator fashion and wrap each with an object that provides the __aiter__ / __anext__ interface. For reference, there is the ijson PR[1] that adds asyncio support. If we forget about compatibility with 3.3/3.4, with async generators the resulting implementation would be much more elegant and look much closer to the original synced code, because it's all based on generators.
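The contrast being described can be sketched as follows (using the PEP's proposed async-generator syntax, so this requires an interpreter that implements it; the toy iterable is illustrative, not taken from ijson):

```python
import asyncio

class Squares:
    # Class-based __aiter__/__anext__: every piece of state has to live
    # in instance attributes.
    def __init__(self, n):
        self.i, self.n = 0, n
    def __aiter__(self):
        return self
    async def __anext__(self):
        if self.i >= self.n:
            raise StopAsyncIteration
        self.i += 1
        return self.i ** 2

async def squares(n):
    # Async generator: the same logic, with state in plain local variables.
    for i in range(1, n + 1):
        yield i ** 2

async def main():
    a = [x async for x in Squares(3)]
    b = [x async for x in squares(3)]
    return a, b

print(asyncio.run(main()))  # ([1, 4, 9], [1, 4, 9])
```

For one counter and one loop the class version is still readable; with the kind of multi-branch pagination loop shown elsewhere in the thread, the instance-attribute bookkeeping is what makes the class version so hard to follow.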
The lack of async generators forces people to implement their own ones [2][3], which cannot be as efficient as a built-in one could be. Also, the custom interface of each implementation makes them hard to reuse, and don't even imagine what happens if two custom async generator implementations meet in a single project. So thank you again, Yury, for pushing this really important and missing feature forward! [1]: https://github.com/isagalaev/ijson/pull/46 [2]: https://github.com/germn/aiogen [3]: https://github.com/ethanfrey/aiojson/blob/master/aiojson/utils/aiogen.py -- ,,,^..^,,, From srkunze at mail.de Fri Aug 5 16:51:33 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Fri, 5 Aug 2016 22:51:33 +0200 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> Message-ID: On 04.08.2016 17:17, Yury Selivanov wrote: > >> From your answer to Stefan, I get the impression that the reason why >> we actually need all those a* methods (basically a duplication of the >> existing gen protocol), is the fact that normal generators can be >> converted to coroutines. That means, 'yield' still can be used in >> both ways. >> >> So, it's a technical symptom of the backwards-compatibility rather >> than something that cannot be avoided by design. Is this correct? >> > > async/await in Python is implemented on top of the generator > protocol. Any 'await' is either awaiting on a coroutine or on a > Future-like object. Future-like objects are defined by implementing > the __await__ method, which should return a generator. > > So coroutines and generators are very intimately tied to each other, > and that's *by design*. > > Any coroutine that iterates over an asynchronous generator uses the > generator protocol behind the scenes.
So we have to multiplex the > async generaotor's "yields" into the generator protocol in such a way, > that it stays isolated, and does not interfere with the "yields" that > drive async/await. Yes, that is how I understand it as well. So, all this complexity stems from the intertwining of generators and coroutines. I am wondering if loosening the tie between the two could make it all simpler; even for you. (Not right now, but maybe later.) > >> If it's correct, would you think it would make sense to get rid of >> the a* in a later iteration of the async capabilities of Python? So, >> just using the normal generator protocol again? > > Because async generators will contain 'await' expressions, we have to > have a* methods (although we can name them without the "a" prefix, but > I believe that would be confusing for many users). > I think that is related to Stefan's question here. I cannot speak for him but it seems to me the confusion is actually the other way around. Coming from a database background, I prefer redundant-free data representation. The "a" prefix encodes the information "that's an asynchronous generator" which is repeated several times and thus is redundant. Another example: when writing code for generators and checking for the "send" method, I would then need to check for the "asend" method as well. So, I would need to touch that code although it even might not have been necessary in the first place. I don't want to talk you out of the "a" prefix, but I get the feeling that the information "that's asynchronous" should be provided as a separate attribute. I believe it would reduce confusion, simplify duck-typing and flatten the learning curve. :) >> One note on all examples but the last. Reading those examples, it >> creates the illusion of actual working code which is not the case, >> right? One would always need to 1) wrap module-level statements into >> its own coroutine, 2) create an event-loop and 3) run it. 
Do you >> think clarifying this in the PEP makes sense? > > I'll think about this, thanks! Maybe I can add a line in the > beginning of the "Specification" section. > >> Thanks again for your hard work here. Async generators definitely >> completes the picture. > > Thank you, Sven > > Yury You're welcome. :) Sven From srkunze at mail.de Fri Aug 5 17:03:55 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Fri, 5 Aug 2016 23:03:55 +0200 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: <4e473af1-bd49-4409-2e51-c873d6aeb26a@mail.de> On 05.08.2016 18:41, Guido van Rossum wrote: > I considered that, but before allowing that complexity, I think we > should come up with a compelling use case (not a toy example). This > also produces some surprising behavior, e.g. what would the following > do: > > def foo(): > T = List[int] > a: T = [] > # etc. > > If we evaluate the annotation `T` in the surrounding scope, it would > be a NameError, but a type checker should have no problem with this > (it's just a local type alias). Will arbitrary expressions work or only type declarations? a: I am asking because https://www.python.org/dev/peps/pep-3107/#parameters is not limited to types. -- Sven From ericsnowcurrently at gmail.com Fri Aug 5 18:26:07 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 5 Aug 2016 16:26:07 -0600 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Fri, Aug 5, 2016 at 10:41 AM, Guido van Rossum wrote: > On Fri, Aug 5, 2016 at 9:23 AM, Eric Snow wrote: >> The only thing I'd like considered further is exposing the annotations >> at runtime. > > Actually my current leaning is as follows: > [snip] Sounds good. I don't think it's likely to be a problem for code that expects __annotations__ only on functions (if such code exists). 
> I considered that, but before allowing that complexity, I think we > should come up with a compelling use case (not a toy example). Agreed. > [snip] >> We should still keep this possible future aspect in mind though. > > The problem with such a promise is that it has no teeth, until the > future behavior is entirely specified, and then we might as well do it > now. My current proposal (no evaluation of annotations for locals) > means that you can write complete nonsense there (as long as it is > *syntactically* correct) and Python would allow it. Surely somebody is > going to come up with a trick to rely on that and then the future > development would break their code. Yeah, I think your current approach is good enough. >> Relatedly, it would be nice to address the future use of this syntax >> for more generic variable annotations (a la function annotations), but >> that's less of a concern for me. The only catch is that making >> "class" an optional part of the syntax impacts the semantics of the >> more generic "variable annotations". However, I don't see "class" as >> a problem, particularly if it is more strongly associated with the >> name rather than the annotation, as you've suggested below. If >> anything it's an argument *for* your recommendation. :) > > I'm unclear on what you mean by "more generic variable annotations". > Do you have an example? I'm talking about the idea of using variable annotations for more than just type declarations, just as there are multiple uses in the wild for function annotations. As I said, I'm not terribly interested in the use case and just wanted to point it out. :) > >> One question: Will the use of the "class" syntax only be allowed at >> the top level of class definition blocks? > > Oooh, very good question. I think that's the idea. Enforcement can't > happen directly at the syntactic level, but it can be checked in the > same way that we check that e.g. 
`return` only occurs in a function or > `break` and `continue` only in a loop. Sounds good. -eric From guido at python.org Fri Aug 5 18:31:03 2016 From: guido at python.org (Guido van Rossum) Date: Fri, 5 Aug 2016 15:31:03 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Fri, Aug 5, 2016 at 3:26 PM, Eric Snow wrote: > On Fri, Aug 5, 2016 at 10:41 AM, Guido van Rossum wrote: >> On Fri, Aug 5, 2016 at 9:23 AM, Eric Snow wrote: >>> Relatedly, it would be nice to address the future use of this syntax >>> for more generic variable annotations (a la function annotations), but >>> that's less of a concern for me. The only catch is that making >>> "class" an optional part of the syntax impacts the semantics of the >>> more generic "variable annotations". However, I don't see "class" as >>> a problem, particularly if it is more strongly associated with the >>> name rather than the annotation, as you've suggested below. If >>> anything it's an argument *for* your recommendation. :) >> >> I'm unclear on what you mean by "more generic variable annotations". >> Do you have an example? > > I'm talking about the idea of using variable annotations for more than > just type declarations, just as there are multiple uses in the wild > for function annotations. As I said, I'm not terribly interested in > the use case and just wanted to point it out. :) Heh, I actively want to squash such uses. My confusion was due to the specific meaning of "generic" in PEP 484. 
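For context, "generic" in PEP 484 means a parameterized type, roughly:

```python
from typing import Generic, List, TypeVar

T = TypeVar('T')

class Stack(Generic[T]):
    """A generic container: usable as Stack[int], Stack[str], etc."""
    def __init__(self) -> None:
        self.items: List[T] = []   # instance variable annotation
    def push(self, item: T) -> None:
        self.items.append(item)
    def pop(self) -> T:
        return self.items.pop()

s = Stack[int]()
s.push(1)
print(s.pop())  # 1
```

(The `self.items: List[T] = []` line uses the very annotation syntax under discussion; in 2016-era Python it would have to be written with a `# type:` comment instead.)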
-- --Guido van Rossum (python.org/~guido) From ericsnowcurrently at gmail.com Fri Aug 5 18:37:04 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 5 Aug 2016 16:37:04 -0600 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Fri, Aug 5, 2016 at 10:41 AM, Guido van Rossum wrote: > Actually my current leaning is as follows: > > [for modules: evaluate and store in mod.__annotations__] > [for classes: evaluate and store in cls.__annotations__] > [for function bodies: recognize but do not evaluate] I wonder if that distinction will bias anyone away from using annotations in function bodies (or using them at all). Probably not significantly. -eric From guido at python.org Fri Aug 5 18:39:40 2016 From: guido at python.org (Guido van Rossum) Date: Fri, 5 Aug 2016 15:39:40 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Fri, Aug 5, 2016 at 3:37 PM, Eric Snow wrote: > On Fri, Aug 5, 2016 at 10:41 AM, Guido van Rossum wrote: >> Actually my current leaning is as follows: >> >> [for modules: evaluate and store in mod.__annotations__] >> [for classes: evaluate and store in cls.__annotations__] >> [for function bodies: recognize but do not evaluate] > > I wonder if that distinction will bias anyone away from using > annotations in function bodies (or using them at all). Probably not > significantly. My current bias is towards not using annotations in function bodies unless mypy insists on them (as it does in some corner cases where the type inference falls short or requires too much looking ahead). In classes my bias is towards fully specifying all instance variables, because they serve an important documentation purpose. 
-- --Guido van Rossum (python.org/~guido) From chris.barker at noaa.gov Fri Aug 5 19:03:14 2016 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Fri, 5 Aug 2016 16:03:14 -0700 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <20160805015434.GS6608@ando.pearwood.info> References: <20160801201044.GA6608@ando.pearwood.info> <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> <4929689045364026497@unknownmsgid> <20160805015434.GS6608@ando.pearwood.info> Message-ID: <-4955769658966475468@unknownmsgid> >> One could argue that: >> >> clamp(NaN, x,x) >> >> Is clearly defined as x. But that would require special casing > > Not so special: > > if lower == upper != None: > return lower I had the impression earlier that you didn't want a whole pile of special cases, even if each was simple. But sure, this is a nice one to get "right". > on a NAN will lead to a non-NAN: Yes, if you'd get the same non-NaN result for any floating point value where the NaN is, then that makes sense - if NaN means "we have no idea what value this is", but you get the same result regardless, then fine. But that does not apply to: clamp(x, y, NaN) Or min(x, NaN) However, as wrong as I might think it is ( ;-) ) -- it seems the IEEE has decided that: min(x, NaN) should return x (and likewise max). So we should be consistent. > Arithmetic operations upon NaNs ... never signal INVALID, and always produce NaN unless > replacing every NaN operand by any finite or infinite real values > would produce the same finite or infinite floating-point result > independent of the replacements.
Which is the case for: clamp(NaN, x,x) But is not for; clamp(x, NaN, NaN) But a standard is a standard :-( -CHB From yselivanov.ml at gmail.com Fri Aug 5 19:25:29 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 5 Aug 2016 19:25:29 -0400 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> Message-ID: <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com> On 2016-08-05 4:51 PM, Sven R. Kunze wrote: > On 04.08.2016 17:17, Yury Selivanov wrote: >> >>> From your answer to Stefan, I get the impression that the reason why >>> we actual need all those a* methods (basically a duplication of the >>> existing gen protocol), is the fact that normal generators can be >>> converted to coroutines. That means, 'yield' still can be used in >>> both ways. >>> >>> So, it's a technical symptom of the backwards-compatibility rather >>> than something that cannot be avoided by design. Is this correct? >>> >> >> async/await in Python is implemented on top of the generator >> protocol. Any 'await' is either awaiting on a coroutine or on a >> Future-like object. Future-like objects are defined by implementing >> the __await__ method, which should return a generator. >> >> So coroutines and generators are very intimately tied to each other, >> and that's *by design*. >> >> Any coroutine that iterates over an asynchronous generator uses the >> generator protocol behind the scenes. So we have to multiplex the >> async generaotor's "yields" into the generator protocol in such a >> way, that it stays isolated, and does not interfere with the "yields" >> that drive async/await. > > Yes, that is how I understand it as well. So, all this complexity > stems from the intertwining of generators and coroutines. > > I am wondering if loosening the tie between the two could make it all > simpler; even for you. 
(Not right now, but maybe later.) Unfortunately, the only way to "loosen the tie" is to re-implement generator protocol for coroutines (make them a separate thing). Besides breaking backwards compatibility with the existing Python code that uses @coroutine decorator and 'yield from' syntax, it will also introduce a ton of work. > > > >> >>> If it's correct, would you think it would make sense to get rid of >>> the a* in a later iteration of the async capabilities of Python? So, >>> just using the normal generator protocol again? >> >> Because async generators will contain 'await' expressions, we have to >> have a* methods (although we can name them without the "a" prefix, >> but I believe that would be confusing for many users). >> > > I think that is related to Stefan's question here. I cannot speak for > him but it seems to me the confusion is actually the other way around. > > Coming from a database background, I prefer redundant-free data > representation. The "a" prefix encodes the information "that's an > asynchronous generator" which is repeated several times and thus is > redundant. > > Another example: when writing code for generators and checking for the > "send" method, I would then need to check for the "asend" method as > well. So, I would need to touch that code although it even might not > have been necessary in the first place. > > I don't want to talk you out of the "a" prefix, but I get the feeling > that the information "that's asynchronous" should be provided as a > separate attribute. I believe it would reduce confusion, simplify > duck-typing and flatten the learning curve. :) Thing is you can't write one piece of code that will accept any type of generator (sync or async). * send, throw, close, __next__ methods for sync generators, they are synchronous. * send, throw, close, __next__ methods for async generators, they *require* to use 'await' on them. There is no way to make them "synchronous", because you have awaits in async generators. 
Because of the above, duck-typing simply isn't possible. The prefix is there to make people aware that this is a completely different API, even though it looks similar. Thank you, Yury From yselivanov.ml at gmail.com Fri Aug 5 19:30:49 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 5 Aug 2016 19:30:49 -0400 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> Message-ID: <9f7b6740-9113-469c-ac27-ad3fe048d836@gmail.com> On 2016-08-05 4:34 PM, Alexander Shorin wrote: > Hi Yury! > > Thanks for this PEP! This is really what asyncio misses since you made > async iterators. > > As contributor to aiohttp and maintainer of aiocouchdb and few else > projects I also support this PEP. Great one! > > I agree with what Nikolay and Andrew already said: dancing > around__aiter__/__anext__ to emulate genarator-like behaviour is quite > boring and complicated. This also makes porting "old synced" code onto > asyncio harder since you have to rewrite every generator in iterator > fashion and wrap each with an object that provides __aiter__ / > __anext__ interface. > > For the reference I have is ijson PR[1] that adds asyncio support. If > we forget about compatibility with 3.3/3.4, with async generators the > result implementation would be much more elegant and looks more closer > to the original synced code, because it's all based on generators. > > Lack of async generators forces people to implement them on their own > [2] ones[3], not very effective as the one that could be built-in. > Also custom interface for each implementation makes hardly to reuse > them and don't even imagine what happens if two custom async generator > implementations will met in a single project. > > So thank you again, Yury, for pushing this really important and > missing feature forward! 
> > [1]: https://github.com/isagalaev/ijson/pull/46 > [2]: https://github.com/germn/aiogen > [3]: https://github.com/ethanfrey/aiojson/blob/master/aiojson/utils/aiogen.py > Thank you Alexander for the feedback! It's really great that asyncio users are beginning to share their experience in this thread. Indeed, the key idea is to complete the async/await implementation, so that you can think in async/await terms and simply use existing Python idioms. Want a context manager in your asyncio application? Use 'async with'. Want an iterator? Use 'async for' and async generators. Hopefully we'll get there. Thank you, Yury From steve at pearwood.info Fri Aug 5 22:09:33 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 6 Aug 2016 12:09:33 +1000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <4e473af1-bd49-4409-2e51-c873d6aeb26a@mail.de> References: <4e473af1-bd49-4409-2e51-c873d6aeb26a@mail.de> Message-ID: <20160806020933.GY6608@ando.pearwood.info> On Fri, Aug 05, 2016 at 11:03:55PM +0200, Sven R. Kunze wrote: > Will arbitrary expressions work or only type declarations? > > a: My understanding is that the Python interpreter will treat the part after the colon as a no-op. So long as it is syntactically valid, it will be ignored, and never evaluated at all.

    # Okay
    spam: fe + fi * fo - fum

    # SyntaxError
    spam: fe fi fo fum

But the type-checker (if any) should be expected to complain bitterly about anything it doesn't understand, and rightly so. I think that this form of variable annotation should be officially reserved for type hints, and only type hints. -- Steve From turnbull.stephen.fw at u.tsukuba.ac.jp Sat Aug 6 01:24:01 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J.
Turnbull) Date: Sat, 6 Aug 2016 14:24:01 +0900 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <20160805162901.GX6608@ando.pearwood.info> References: <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp> <20160805060648.GU6608@ando.pearwood.info> <22436.41739.428069.534278@turnbull.sk.tsukuba.ac.jp> <20160805162901.GX6608@ando.pearwood.info> Message-ID: <22437.29809.206551.416746@turnbull.sk.tsukuba.ac.jp> Steven D'Aprano writes: > On Fri, Aug 05, 2016 at 11:30:35PM +0900, Stephen J. Turnbull wrote: > > > I can even think of a case where clamp could be used with a constant > > control and a varying bound: S-s inventory control facing occasional > > large orders in an otherwise continuous, stationary demand process. > > Sounds interesting. Is there a link to somewhere I could learn more > about this? The textbook I use is Nancy Stokey, The Economics of Inaction https://www.amazon.co.jp/s/ref=nb_sb_noss?__mk_ja_JP=%E3%82%AB%E3%82%BF%E3%82%AB%E3%83%8A&url=search-alias%3Daps&field-keywords=nancy+stokey+economics+inaction The example I gave is not a textbook example, but is an "obvious" extension of the simplest textbook models. From leewangzhong+python at gmail.com Sat Aug 6 03:55:32 2016 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Sat, 6 Aug 2016 03:55:32 -0400 Subject: [Python-ideas] Addition to operator module: starcaller In-Reply-To: <579C41BA.1040205@brenbarn.net> References: <57915EA4.6080602@canterbury.ac.nz> <579C41BA.1040205@brenbarn.net> Message-ID: On Jul 29, 2016 11:00 PM, "Michael Selik" wrote: > If it goes in the operator module, then the name ``operator.starcall`` seems better than ``starcaller``. 
I'd say ``operator.unpack`` would be better, except for the confusion between unpacking assignment and unpacking as arguments. I suppose ``operator.apply`` would be OK, too. Is there a better vocabulary word for unpack-as-args? Not sure if "splat" is better. `operator.starcall(f, *args, **kwargs)` isn't obvious, unless you know `itertools.starmap` and `map`. `operator.unpack(...)` can seem obvious, but the meanings that seem obvious to many will probably be wrong. That's bad. "unpack[ed]call" is ugly, long, and three syllables. On Jul 30, 2016 1:57 AM, "Brendan Barnwell" wrote: > Why not just operator.call? I suppose actually operator.caller would be more consistent with the existing attrgetter and methodcaller? The desired meaning of `somename(x, y)` is `x(*y)`. `call(x, y)` looks like it should be equivalent to `x.__call__(y)`, which means `x(y)`. `attrgetter(x)(y)` means `getattr(y, x)` (in the simple case), and `methodcaller(m, x)(y)` means `getattr(y, m)(x)`, so `caller(x)(y)` might mean `y(x)`. `operator.thinger` is for reverse currying: `thingdoer(x)` returns a new callable, and `thingdoer(x)(y)` means `dothing(y, x)` (conceptually). (I'm not sure if the original proposal of a wrapper would be called `starrer(f)(args)` or `unstarrer(f)(args)`, and that bugs me.) -------------- next part -------------- An HTML attachment was scrubbed...
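For concreteness, the wrapper being debated could be sketched as follows. `starcaller` is the proposed name, not an existing operator function, and the keyword-argument handling is my assumption:

```python
def starcaller(f):
    """Return a callable c such that c(args, kwargs) == f(*args, **kwargs),
    mirroring the relationship itertools.starmap has to map."""
    def caller(args, kwargs=None):
        return f(*args, **(kwargs or {}))
    return caller

# Usage: apply a packed argument tuple without writing a lambda.
pow_caller = starcaller(pow)
print(pow_caller((2, 10)))  # -> 1024
```

Whether it belongs in the stdlib is exactly the question at hand; the point is only that the semantics under discussion fit in a few lines.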
URL: From ncoghlan at gmail.com Sat Aug 6 04:09:39 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 6 Aug 2016 18:09:39 +1000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On 6 August 2016 at 02:12, Guido van Rossum wrote: > On Fri, Aug 5, 2016 at 12:40 AM, Nick Coghlan wrote: >> That is: >> >> class Starship: >> stats class: Dict[str, int] = {} # Pure class variable >> damage class: int = 0 # Hybrid class/instance variable >> captain: str # Pure instance variable >> >> Pronounced as: >> >> "stats is declared on the class as a dict mapping from strings to >> integers and is initialised as an empty dict" >> "damage is declared on the class as an integer and is initialised as zero" >> "captain is declared on instances as an integer" >> >> Just a minor thing, but the closer association with the name reads >> better to me since "Class attribute or instance attribute?" is really >> a property of the name binding, rather than of the permitted types >> that can be bound to that name > > Hmm... But the type is *also* a property of the name binding. And I > think the "class-var-ness" needs to be preserved in the > __annotations__ dict somehow, so that's another reason why it > "belongs" to the type rather than to the name binding (a nebulous > concept to begin with). > > Also, I like the idea that everything between the ':' and the '=' (or > the end of the line) belongs to the type checker. I expect that'll be > easier for people who aren't interested in the type checker. Fair point, although this and the __var_annotations__ discussion also raises the point that annotations have to date always been valid Python expressions. 
So perhaps rather than using the class keyword, it would make sense to riff off classmethod and generic types to propose: class Starship: stats: ClassAttr[Dict[str, int]] = {} # Pure class variable damage: DefaultAttr[int] = 0 # Hybrid class/instance variable captain: str # Pure instance variable Pronounced as: "stats is a class attribute mapping from strings to integers and is initialised as an empty dict" "damage is an instance attribute declared as an integer with a default value of zero defined on the class" "captain is an instance attribute declared as a string" In addition to clearly distinguishing class-only attributes from class-attribute-as-default-value, this would also mean that the specific meaning of those annotations could be documented in typing.ClassAttr and typing.DefaultAttr, rather than needing to be covered in the documentation of the type annotation syntax itself. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Aug 6 04:28:31 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 6 Aug 2016 18:28:31 +1000 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> Message-ID: On 6 August 2016 at 01:11, Andrew Svetlov wrote: > Honestly I personally don't feel a need for two-way generators with > `asend()`/`athrow()` but as Yury explained he need them internally for > `anext`/`aclose` anyway. Even if asend()/athrow() are cheap from an implementation perspective, they're *not* necessarily cheap from a "cognitive burden of learning the API" perspective. We know what send() and throw() on generators are for: implementing coroutines. However, we also know that layering that on top of the generator protocol turned out to be inherently confusing, hence PEP 492 and async/await. 
So if we don't have a concrete use case for "coroutine coroutines" (what does that even *mean*?), we shouldn't add asend()/athrow() just because it's easy to do so. If we wanted to hedge our bets, we could expose them as _asend() and _athrow() initially, on the grounds that doing so is cheap, and we're genuinely curious to see if anyone can find a non-toy use case for them. But we should be very reluctant to add them to the public API without a clear idea of why they're there. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stefan_ml at behnel.de Sat Aug 6 04:29:05 2016 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 6 Aug 2016 10:29:05 +0200 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> Message-ID: Yury Selivanov schrieb am 03.08.2016 um 17:32: > On 2016-08-03 2:45 AM, Stefan Behnel wrote: >> Yury Selivanov schrieb am 03.08.2016 um 00:31: >>> PEP 492 requires an event loop or a scheduler to run coroutines. >>> Because asynchronous generators are meant to be used from coroutines, >>> they also require an event loop to run and finalize them. >> Well, or *something* that uses them in the same way as an event loop would. >> Doesn't have to be an event loop. > > Sure, I'm just using the same terminology PEP 492 was defined with. > We can say "coroutine runner" instead of "event loop". > >>> 1. Implement an ``aclose`` method on asynchronous generators >>> returning a special *awaitable*. When awaited it >>> throws a ``GeneratorExit`` into the suspended generator and >>> iterates over it until either a ``GeneratorExit`` or >>> a ``StopAsyncIteration`` occur. >>> >>> This is very similar to what the ``close()`` method does to regular >>> Python generators, except that an event loop is required to execute >>> ``aclose()``. >> I don't see a motivation for adding an "aclose()" method in addition to the >> normal "close()" method. 
Similar for send/throw. Could you elaborate on >> that? > > There will be no "close", "send" and "throw" defined for asynchronous > generators. Only their asynchronous equivalents. > [...] > Since all this is quite different from sync generators' close > method, it was decided to have a different name for this method > for async generators: aclose. Ok, why not. Different names for similar things that behave differently enough. > The 'agen' generator, on the lowest level of generators implementation > will yield two things -- 'spam', and a wrapped 123 value. Because > 123 is wrapped, the async generators machinery can distinguish async > yields from normal yields. This is actually going to be tricky to backport for Cython (which supports Py2.6+) since it seems to depend on a globally known C implemented wrapper object type. We'd have to find a way to share that across different packages and also across different Cython versions (types are only shared within the same Cython version). I guess we'd have to store a reference to that type in some well hidden global place somewhere, and then never touch its implementation again... Is that wrapper type going to be exposed anywhere in the Python visible world, or is it purely internal? (Not that I see a use case for making it visible to Python code...) BTW, why wouldn't "async yield from" work if the only distinction point is whether a yielded object is wrapped or not? That should work at any level of delegation, shouldn't it? >>> 3. Add two new methods to the ``sys`` module: >>> ``set_asyncgen_finalizer()`` and ``get_asyncgen_finalizer()``. >>> >>> The idea behind ``sys.set_asyncgen_finalizer()`` is to allow event >>> loops to handle generators finalization, so that the end user >>> does not need to care about the finalization problem, and it just >>> works. >>> >>> When an asynchronous generator is iterated for the first time, >>> it stores a reference to the current finalizer. 
>>> If there is none, a ``RuntimeError`` is raised. This provides a strong guarantee that every asynchronous generator object will always have a finalizer installed by the correct event loop.
>>>
>>> When an asynchronous generator is about to be garbage collected, it calls its cached finalizer. The assumption is that the finalizer will schedule an ``aclose()`` call with the loop that was active when the iteration started.
>>>
>>> For instance, here is how asyncio is modified to allow safe finalization of asynchronous generators::
>>>
>>>     # asyncio/base_events.py
>>>
>>>     class BaseEventLoop:
>>>
>>>         def run_forever(self):
>>>             ...
>>>             old_finalizer = sys.get_asyncgen_finalizer()
>>>             sys.set_asyncgen_finalizer(self._finalize_asyncgen)
>>>             try:
>>>                 ...
>>>             finally:
>>>                 sys.set_asyncgen_finalizer(old_finalizer)
>>>             ...
>>>
>>>         def _finalize_asyncgen(self, gen):
>>>             self.create_task(gen.aclose())
>>>
>>> ``sys.set_asyncgen_finalizer()`` is thread-specific, so several event loops running in parallel threads can use it safely.
>> Phew, this adds quite some complexity and magic. That is a problem. For one, this uses a global setup, so There Can Only Be One of these finalizers. ISTM that if special cleanup is required, either the asyncgen itself should know how to do that, or we should provide some explicit API that does something when *initialising* the asyncgen. That seems better than doing something global behind the users' back. Have you considered providing some kind of factory in asyncio that wraps asyncgens or so?
> set_asyncgen_finalizer is thread-specific, so you can have one finalizer set up per thread.
>
> The reference implementation actually integrates this all into asyncio. The idea is to set up the loop's async generator finalizer just before the loop starts, and reset the finalizer to the previous one (usually it's None) just before it stops.
> > The finalizer is attached to a generator when it is yielding > for the first time -- this guarantees that every generators will > have a correct finalizer attached to it. > > It's not right to attach the finalizer (or wrap the generator) > when the generator is initialized. Consider this code: > > async def foo(): > async with smth(): > yield > > async def coro(gen): > async for i in foo(): > ... > > loop.run_until_complete(coro(foo())) > > ^^ In the above example, when the 'foo()' is instantiated, there > is no loop or finalizers set up at all. BUT since a loop (or > coroutine wrapper) is required to iterate async generators, there > is a strong guarantee that it *will* present on the first iteration. Correct. And it also wouldn't help to generally extend the Async-Iterator protocol with an aclose() method because ending an (async-)for loop doesn't mean we are done with the async iterator, so this would just burden the users with unnecessary cleanup handling. That's unfortunate... > Regarding "async gen itself should know how to cleanup" -- that's not > possible. async gen could just have an async with block and then > GCed (after being partially consumed). Users won't expect to do > anything besides using try..finally or async with, so it's the > responsibility of the coroutine runner to cleanup async gen. Hence > 'aclose' is a coroutine, and hence this set_asyncgen_finalizer API > for coroutine runners. > > This is indeed the most magical part of the proposal. Although it's > important to understand that the regular Python users will likely > never encounter this in their life -- finalizers will be set up > by the framework they use (asyncio, Tornado, Twisted, you name it). I think my main problem is that you keep speaking about event loops (of which There Can Be Only One, by design), whereas coroutines are a much more general concept and I cannot overlook all possible forms of using them in the future. 
What I would like to avoid is the case where we globally require setting up one finalizer handler (or context), and then prevent users from doing their own cleanup handling in some module context somewhere. It feels to me like there should be some kind of stacking for this (which in turn feels like a context manager) in order to support adapters and the like that need to do their own cleanup handling (or somehow intercept the global handling), regardless of what else is running. But I am having a hard time trying to come up with an example where the thread context really doesn't work. Even if an async generator is shared across multiple threads, then those threads would most likely be controlled by the coroutine runner, which would simply set up a proper finalizer context for each of the threads that points back to itself, so that it would not be the pure chance of first iteration to determine who gets to clean up afterwards. It's possible that such (contrieved?) cases can be handled in one way or another by changing the finalization itself, instead of changing the finalizer context. And that would be done by wrapping async iterators in a delegator that uses try-finally, be it via a decorator or an explicit wrapper. It's just difficult to come to a reasonable conclusion at this level of uncertainty. That's why I'm bouncing this consideration here, in case others have an idea how this problem could be avoided all together. Stefan From guido at python.org Sat Aug 6 11:29:16 2016 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Aug 2016 08:29:16 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: Yeah I've come around to a similar position. I had ClassVar[] but your ClassAttr[] is perhaps better. 
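To illustrate the shape under discussion, here is the Starship example written with a `ClassVar`-style marker. This is a sketch of the proposal, not settled syntax: the marker name and the exact runtime semantics (e.g. that an unassigned annotation creates no class attribute) were still being decided in this thread.

```python
from typing import ClassVar, Dict

class Starship:
    stats: ClassVar[Dict[str, int]] = {}  # pure class variable
    damage: int = 0                       # instance variable with a class default
    captain: str                          # pure instance variable, no default

    def __init__(self, captain: str) -> None:
        self.captain = captain

ship = Starship("Picard")
print(ship.captain)      # -> Picard
print(Starship.stats)    # -> {}
# The annotations would be preserved on the class for type checkers:
print(Starship.__annotations__["damage"])  # -> <class 'int'>
```

The `ClassVar[...]` wrapper carries the "class-var-ness" inside the annotation itself, which is what lets it end up in `__annotations__` rather than being a property of the name binding.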
On Saturday, August 6, 2016, Nick Coghlan wrote: > On 6 August 2016 at 02:12, Guido van Rossum > wrote: > > On Fri, Aug 5, 2016 at 12:40 AM, Nick Coghlan > wrote: > >> That is: > >> > >> class Starship: > >> stats class: Dict[str, int] = {} # Pure class variable > >> damage class: int = 0 # Hybrid class/instance variable > >> captain: str # Pure instance variable > >> > >> Pronounced as: > >> > >> "stats is declared on the class as a dict mapping from strings to > >> integers and is initialised as an empty dict" > >> "damage is declared on the class as an integer and is initialised > as zero" > >> "captain is declared on instances as an integer" > >> > >> Just a minor thing, but the closer association with the name reads > >> better to me since "Class attribute or instance attribute?" is really > >> a property of the name binding, rather than of the permitted types > >> that can be bound to that name > > > > Hmm... But the type is *also* a property of the name binding. And I > > think the "class-var-ness" needs to be preserved in the > > __annotations__ dict somehow, so that's another reason why it > > "belongs" to the type rather than to the name binding (a nebulous > > concept to begin with). > > > > Also, I like the idea that everything between the ':' and the '=' (or > > the end of the line) belongs to the type checker. I expect that'll be > > easier for people who aren't interested in the type checker. > > Fair point, although this and the __var_annotations__ discussion also > raises the point that annotations have to date always been valid > Python expressions. 
So perhaps rather than using the class keyword, it > would make sense to riff off classmethod and generic types to propose: > > class Starship: > stats: ClassAttr[Dict[str, int]] = {} # Pure class variable > damage: DefaultAttr[int] = 0 # Hybrid class/instance variable > captain: str # Pure instance variable > > Pronounced as: > > "stats is a class attribute mapping from strings to integers and > is initialised as an empty dict" > "damage is an instance attribute declared as an integer with a > default value of zero defined on the class" > "captain is an instance attribute declared as a string" > > In addition to clearly distinguishing class-only attributes from > class-attribute-as-default-value, this would also mean that the > specific meaning of those annotations could be documented in > typing.ClassAttr and typing.DefaultAttr, rather than needing to be > covered in the documentation of the type annotation syntax itself. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, > Australia > -- --Guido (mobile) -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Sat Aug 6 12:39:01 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 6 Aug 2016 12:39:01 -0400 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> Message-ID: <0367abcd-a400-9573-4457-7261f0b98978@gmail.com> On 2016-08-06 4:29 AM, Stefan Behnel wrote: > Yury Selivanov schrieb am 03.08.2016 um 17:32: >> On 2016-08-03 2:45 AM, Stefan Behnel wrote: >> >> The 'agen' generator, on the lowest level of generators implementation >> will yield two things -- 'spam', and a wrapped 123 value. Because >> 123 is wrapped, the async generators machinery can distinguish async >> yields from normal yields. 
> This is actually going to be tricky to backport for Cython (which supports > Py2.6+) since it seems to depend on a globally known C implemented wrapper > object type. We'd have to find a way to share that across different > packages and also across different Cython versions (types are only shared > within the same Cython version). I guess we'd have to store a reference to > that type in some well hidden global place somewhere, and then never touch > its implementation again... I don't think you need to care about these details. You can implement async generators using async iteration protocol as it was defined in PEP 492. You can compile a Cython async generator into an object that implements `am_aiter` slot (which should return `self`) and `am_anext` slot (which should return a coroutine-like object, similar to what Cython compiles 'async def'). `am_anext` coroutine should "return" (i.e. raise `StopIteration`) when it reaches an "async yield" point, or raise "StopAsyncIteration" when the generator is exhausted. Essentially, because pure Python async generators work fine with 'async for' and will continue to do so in 3.6, I think there shouldn't be technical problems to add asynchronous generators in Cython. > > Is that wrapper type going to be exposed anywhere in the Python visible > world, or is it purely internal? (Not that I see a use case for making it > visible to Python code...) Yes, it's a purely internal C-level thing. Python code will never see it. > BTW, why wouldn't "async yield from" work if the only distinction point is > whether a yielded object is wrapped or not? That should work at any level > of delegation, shouldn't it? I can try ;) Although when I was working on the PEP I had a feeling that this won't work without a serious refactoring of genobject.c. >> ^^ In the above example, when the 'foo()' is instantiated, there >> is no loop or finalizers set up at all. 
BUT since a loop (or >> coroutine wrapper) is required to iterate async generators, there >> is a strong guarantee that it *will* present on the first iteration. > Correct. And it also wouldn't help to generally extend the Async-Iterator > protocol with an aclose() method because ending an (async-)for loop doesn't > mean we are done with the async iterator, so this would just burden the > users with unnecessary cleanup handling. That's unfortunate... Well, we have the *exact same thing* with regular (synchronous) iterators. Let's say you have an iterable object (with `__iter__` and `__next__`), which cleanups resources at the end of its iteration. If you didn't implement its __del__ properly (or at all), the resources won't be cleaned-up when it's partially iterated and then GCed. >> Regarding "async gen itself should know how to cleanup" -- that's not >> possible. async gen could just have an async with block and then >> GCed (after being partially consumed). Users won't expect to do >> anything besides using try..finally or async with, so it's the >> responsibility of the coroutine runner to cleanup async gen. Hence >> 'aclose' is a coroutine, and hence this set_asyncgen_finalizer API >> for coroutine runners. >> >> This is indeed the most magical part of the proposal. Although it's >> important to understand that the regular Python users will likely >> never encounter this in their life -- finalizers will be set up >> by the framework they use (asyncio, Tornado, Twisted, you name it). > I think my main problem is that you keep speaking about event loops (of > which There Can Be Only One, by design), whereas coroutines are a much more > general concept and I cannot overlook all possible forms of using them in > the future. What I would like to avoid is the case where we globally > require setting up one finalizer handler (or context), and then prevent > users from doing their own cleanup handling in some module context > somewhere. 
> It feels to me like there should be some kind of stacking for this (which in turn feels like a context manager) in order to support adapters and the like that need to do their own cleanup handling (or somehow intercept the global handling), regardless of what else is running.

We can call the thing that runs coroutines a "coroutine runner". I'll update the PEP. Regardless of the naming issues, I don't see any potential problem with how `set_asyncgen_finalizer` is currently defined:

1. Because a finalizer (set with `set_asyncgen_finalizer`) is assigned to generators on their first iteration, it's guaranteed that each async generator will have a correct one attached to it.

2. It's extremely unlikely that somebody will design a system that switches coroutine runners *while async/awaiting a coroutine*. There are no good reasons to do this, and I doubt that it's even a possible thing. But even in this unlikely use case, you can easily stack finalizers following this pattern:

    old_finalizer = sys.get_asyncgen_finalizer()
    sys.set_asyncgen_finalizer(my_finalizer)
    try:
        # do my thing
    finally:
        sys.set_asyncgen_finalizer(old_finalizer)

Thanks, Yury From yselivanov.ml at gmail.com Sat Aug 6 12:51:26 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 6 Aug 2016 12:51:26 -0400 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> Message-ID: Hi Nick, On 2016-08-06 4:28 AM, Nick Coghlan wrote: > On 6 August 2016 at 01:11, Andrew Svetlov wrote: >> Honestly I personally don't feel a need for two-way generators with >> `asend()`/`athrow()` but as Yury explained he needs them internally for >> `anext`/`aclose` anyway. > Even if asend()/athrow() are cheap from an implementation perspective, > they're *not* necessarily cheap from a "cognitive burden of learning > the API" perspective.
> We know what send() and throw() on generators are for: implementing coroutines. However, we also know that layering that on top of the generator protocol turned out to be inherently confusing, hence PEP 492 and async/await.
>
> So if we don't have a concrete use case for "coroutine coroutines" (what does that even *mean*?), we shouldn't add asend()/athrow() just because it's easy to do so.

One immediate use case that comes to mind is @contextlib.asynccontextmanager. Essentially this:

    @contextlib.asynccontextmanager
    async def ctx():
        resource = await acquire()
        try:
            yield resource
        finally:
            await release(resource)

I'm also worried that in order to do similar things people will abuse "agen.__anext__().send()" and "agen.__anext__().throw()", and that is a very error-prone thing to do. For example, in asyncio code, you would capture yielded Futures that the generator is trying to await on (those yielded futures have to be captured by the running Task object instead). "asend" and "athrow" know about the iteration protocol and can see the "wrapped" yielded objects, hence they can iterate the generator correctly and reliably. I think Nathaniel Smith (CC-ed) had a few more use cases for asend/athrow.

> If we wanted to hedge our bets, we could expose them as _asend() and _athrow() initially, on the grounds that doing so is cheap, and we're genuinely curious to see if anyone can find a non-toy use case for them. But we should be very reluctant to add them to the public API without a clear idea of why they're there.

I think we should decide if we want to expose them at all. If we're in doubt, let's not do that at all. Exposing them with a "_" prefix doesn't free us from maintaining backwards compatibility etc. You're completely right about the increased "cognitive burden of learning the API". On the other hand, send() and throw() for sync generators were designed for power users; most Python developers don't even know that they exist.
I think we can view asend() and athrow() similarly. Thank you, Yury From stefan_ml at behnel.de Sat Aug 6 13:03:30 2016 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 6 Aug 2016 19:03:30 +0200 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: <0367abcd-a400-9573-4457-7261f0b98978@gmail.com> References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <0367abcd-a400-9573-4457-7261f0b98978@gmail.com> Message-ID: Yury Selivanov schrieb am 06.08.2016 um 18:39: > You can implement async generators using async iteration protocol as > it was defined in PEP 492. > [...] > Essentially, because pure Python async generators work fine > with 'async for' and will continue to do so in 3.6, I think > there shouldn't be technical problems to add asynchronous > generators in Cython. Hmm, I thought that at least interoperability might get in the way. I guess I'll just have to give it a try... > 2. It's extremely unlikely that somebody will design a system that > switches coroutine runners *while async/awaiting a coroutine*. Yes, I guess so. > But even in this unlikely use case, you can > easily stack finalizers following this pattern: > > old_finalizer = sys.get_asyncgen_finalizer() > sys.set_asyncgen_finalizer(my_finalizer) > try: > # do my thing > finally: > sys.set_asyncgen_finalizer(old_finalizer) That only works for synchronous code, though, because if this is done in a coroutine, it might get suspended within the try block and would leak its own finalizer into the outer world. 
Stefan From yselivanov.ml at gmail.com Sat Aug 6 13:11:00 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 6 Aug 2016 13:11:00 -0400 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <0367abcd-a400-9573-4457-7261f0b98978@gmail.com> Message-ID: <9c80028c-3adb-885f-b11a-bffb84896e80@gmail.com> On 2016-08-06 1:03 PM, Stefan Behnel wrote: > Yury Selivanov schrieb am 06.08.2016 um 18:39: >> You can implement async generators using async iteration protocol as >> it was defined in PEP 492. >> [...] >> Essentially, because pure Python async generators work fine >> with 'async for' and will continue to do so in 3.6, I think >> there shouldn't be technical problems to add asynchronous >> generators in Cython. > Hmm, I thought that at least interoperability might get in the way. I guess > I'll just have to give it a try... Maybe, if we add 'yield from' (which I really don't want to bother with in 3.6 at least). It'd be extremely helpful if you could prototype the PEP in Cython! >> 2. It's extremely unlikely that somebody will design a system that >> switches coroutine runners *while async/awaiting a coroutine*. > Yes, I guess so. > > >> But even in this unlikely use case, you can >> easily stack finalizers following this pattern: >> >> old_finalizer = sys.get_asyncgen_finalizer() >> sys.set_asyncgen_finalizer(my_finalizer) >> try: >> # do my thing >> finally: >> sys.set_asyncgen_finalizer(old_finalizer) > That only works for synchronous code, though, because if this is done in a > coroutine, it might get suspended within the try block and would leak its > own finalizer into the outer world. set_asyncgen_finalizer is designed to be used *only* by coroutine runners. This is a low-level API that coroutines should never touch. (At least my experience working with coroutines says so...) 
Thank you, Yury From brett at python.org Sat Aug 6 13:41:11 2016 From: brett at python.org (Brett Cannon) Date: Sat, 06 Aug 2016 17:41:11 +0000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Fri, 5 Aug 2016 at 15:27 Eric Snow wrote: > On Fri, Aug 5, 2016 at 10:41 AM, Guido van Rossum > wrote: > > On Fri, Aug 5, 2016 at 9:23 AM, Eric Snow > wrote: > >> The only thing I'd like considered further is exposing the annotations > >> at runtime. > > > > Actually my current leaning is as follows: > > [snip] > > Sounds good. I don't think it's likely to be a problem for code that > expects __annotations__ only on functions (if such code exists). > > > I considered that, but before allowing that complexity, I think we > > should come up with a compelling use case (not a toy example). > > Agreed. > > > [snip] > >> We should still keep this possible future aspect in mind though. > > > > The problem with such a promise is that it has no teeth, until the > > future behavior is entirely specified, and then we might as well do it > > now. My current proposal (no evaluation of annotations for locals) > > means that you can write complete nonsense there (as long as it is > > *syntactically* correct) and Python would allow it. Surely somebody is > > going to come up with a trick to rely on that and then the future > > development would break their code. > > Yeah, I think your current approach is good enough. > > >> Relatedly, it would be nice to address the future use of this syntax > >> for more generic variable annotations (a la function annotations), but > >> that's less of a concern for me. The only catch is that making > >> "class" an optional part of the syntax impacts the semantics of the > >> more generic "variable annotations". 
However, I don't see "class" as > >> a problem, particularly if it is more strongly associated with the > >> name rather than the annotation, as you've suggested below. If > >> anything it's an argument *for* your recommendation. :) > > > > I'm unclear on what you mean by "more generic variable annotations". > > Do you have an example? > > I'm talking about the idea of using variable annotations for more than > just type declarations, just as there are multiple uses in the wild > for function annotations. As I said, I'm not terribly interested in > the use case and just wanted to point it out. :) > Since the information will be in the AST, it would be a more generic solution to come up with a way to keep the AST that was used to generate a code object around. I'm not saying that this is a specific reason to try and tackle that idea (although it does come up regularly), but I think it's a reasonable thing to punt on if this is a better all-around solution in this case. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kxepal at gmail.com Sat Aug 6 17:21:35 2016 From: kxepal at gmail.com (Alexander Shorin) Date: Sun, 7 Aug 2016 00:21:35 +0300 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> Message-ID: While Nick has a point about the "coroutine coroutines" confusion, I believe asend/athrow will be useful in the real world, because they provide a stable, well-known, and uniform interface for communicating with a coroutine. For now, to have two-way communication with some coroutine you have to establish two queues: inbox and outbox, pass them around, take care not to mix them up, and so on.
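The two-way channel described above can be sketched with the asend() API the PEP proposes (a hypothetical sketch; it assumes an implementation where async generators support asend and aclose, as in the PEP 525 draft — Python 3.6+ behaves this way):

```python
import asyncio

async def accumulator():
    # A one-slot, two-way channel: each `yield` hands the running total
    # out, and `asend` pushes the next increment back in.
    total = 0
    while True:
        increment = yield total
        total += increment

async def main():
    gen = accumulator()
    await gen.__anext__()        # prime the generator up to its first yield
    first = await gen.asend(10)  # total becomes 10, yielded back to us
    second = await gen.asend(32) # total becomes 42
    await gen.aclose()           # termination is just as direct
    return first, second

print(asyncio.run(main()))  # prints (10, 42)
```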
Queues are great for message-passing communication and generally a good idea to use, but they still mean some boilerplate code to maintain, much like the __aiter__/__anext__ methods (which async generators will get rid of). Also, a queue is a bad reference back to the coroutine that owns it - in a complex codebase it's hard to find out who is really reading that inbox. With asend/athrow there will be no need to pass queues around: you hold just the coroutine instance and do all the communication with it as with a plain object. You need to retrieve a value - iterate over it; need to send a value back - use asend. Termination and error-passing are also easy to do. So while this interface adds to the learning curve, it can be very helpful when you need a simple communication channel of size one between coroutines. For anything bigger you have to return to the queues approach, but that involves quite a few other bits to care about, like queue overflow, which is indeed a topic for advanced users. -- ,,,^..^,,, On Sat, Aug 6, 2016 at 7:51 PM, Yury Selivanov wrote: > Hi Nick, > > On 2016-08-06 4:28 AM, Nick Coghlan wrote: > >> On 6 August 2016 at 01:11, Andrew Svetlov >> wrote: >>> >>> Honestly I personally don't feel a need for two-way generators with >>> `asend()`/`athrow()` but as Yury explained he need them internally for >>> `anext`/`aclose` anyway. >> >> Even if asend()/athrow() are cheap from an implementation perspective, >> they're *not* necessarily cheap from a "cognitive burden of learning >> the API" perspective. We know what send() and throw() on generators >> are for: implementing coroutines. However, we also know that layering >> that on top of the generator protocol turned out to be inherently >> confusing, hence PEP 492 and async/await. >> >> So if we don't have a concrete use case for "coroutine coroutines" >> (what does that even *mean*?), we shouldn't add asend()/athrow() just >> because it's easy to do so.
> > > One immediate use case that comes to mind is > @contextlib.asynccontextmanager > > Essentially this: > > @contextlib.asynccontextmanager > async def ctx(): > resource = await acquire() > try: > yield resource > finally: > await release(resource) > > I'm also worrying that in order to do similar things people will > abuse "agen.__anext__().send()" and "agen.__anext__().throw()". > And this is a very error-prone thing to do. For example, in > asyncio code, you will capture yielded Futures that the > generator is trying to await on (those yielded futures > have to be captured by the running Task object instead). > > "asend" and "athrow" know about iteration protocol and can see > the "wrapped" yielded objects, hence they can iterate the > generator correctly and reliably. > > I think Nathaniel Smith (CC-ed) had a few more use cases for > asend/athrow. > > >> >> If we wanted to hedge our bets, we could expose them as _asend() and >> _athrow() initially, on the grounds that doing so is cheap, and we're >> genuinely curious to see if anyone can find a non-toy use case for >> them. But we should be very reluctant to add them to the public API >> without a clear idea of why they're there. > > > I think we should decide if we want to expose them at all. > If we're in doubt, let's not do that at all. Exposing them > with a "_" prefix doesn't free us from maintaining backwards > compatibility etc. > > You're completely right about the increased "cognitive burden > of learning the API". On the other hand, send() and throw() > for sync generators were designed for power users; most > Python developers don't even know that they exist. I think > we can view asend() and athrow() similarly. 
> > Thank you, > Yury > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From elazarg at gmail.com Sat Aug 6 19:24:09 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Sat, 06 Aug 2016 23:24:09 +0000 Subject: [Python-ideas] Make "raise" an expression Message-ID: The raise statement cannot be used as an expression, even though there is no theoretical reason for it not to. I don't see any practical reason either - except the fact that it requires changing the syntax. I have found this idea mentioned by Anthony Lee in a thread from 2014, regarding "or raise", https://mail.python.org/pipermail/python-ideas/2014-November/029921.html. But there is nothing against the idea there. Yes, it could easily be implemented using a function, but this seems to be an argument against *augmenting* the language, not against *changing* it. Moreover, this function will have an uglier name, since `raise` is taken. (It could have been the other way around - omitting the statement entirely, leaving only a built-in function `raise()`. I'm not suggesting that, of course) As mentioned there, it does add a single line in the traceback - which is not immediately what you are looking for. Python's tendency towards explicit control flow is irrelevant in this case. An exception can already be raised from anywhere, at any time, and there's no way to see it from the source code (unlike e.g. break/continue). Such a change will (only) break external tools that use the ast module, or tools analyzing Python code that uses this feature. This happens at every language change, so it can be done together with the next such change. Use cases: result = f(args) or raise Exception(...) result = f(x, y) if x > y else raise Exception(...) f(lambda x: max(x, 0) or raise Exception(...)) d = defaultdict(list) ...
d[v] or raise KeyError(v) try: return foo() except: return try_bar() or raise To be honest, I'm not sure that this change is worthwhile. Admittedly the use cases are weak, and in some ways this is complementary to the (sadly rejected) "except expression". I would still like to "finish that thought", so - why not? ~Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Aug 6 19:29:51 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 7 Aug 2016 09:29:51 +1000 Subject: [Python-ideas] Make "raise" an expression In-Reply-To: References: Message-ID: On Sun, Aug 7, 2016 at 9:24 AM, אלעזר wrote: > To be honest, I'm not sure that this change is worthwhile. Admittedly the > use cases are weak, and in some ways this is complementary to the (sadly > rejected) "except expression". I would still like to "finish that thought", > so - why not? > More "why". Python doesn't emphasize one-liners, so there needs to be a good reason not to just make this an 'if' statement. The use cases given are indeed weak, but given stronger ones, I'm sure this could be done. ChrisA From songofacandy at gmail.com Sat Aug 6 20:44:58 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Sun, 7 Aug 2016 09:44:58 +0900 Subject: [Python-ideas] Adding bytes.frombuffer() constructor Message-ID: bytearray is used as a read/write buffer in asyncio. When implementing a reading stream, the read() API may be defined as returning bytes. To get the first n bytes from a bytearray, there are two ways. 1. bytes(bytearray[:n]) 2. bytes(memoryview(bytearray)[:n]) (1) is the simplest, but it produces a temporary bytearray holding n bytes. While (2) is more efficient than (1), it still uses a temporary memoryview object, and it looks a bit tricky. I want a simple, readable way, free of temporaries, to get part of a bytearray as bytes. The API I propose looks like this: bytes.frombuffer(byteslike, length=-1, offset=0) How do you feel about it?
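The difference between the two spellings can be seen directly (only the proposed bytes.frombuffer() is hypothetical; everything below runs today):

```python
buf = bytearray(b"hello world")
n = 5

# (1) slice the bytearray first: allocates a temporary n-byte bytearray,
#     which is then copied a second time into the bytes object
head1 = bytes(buf[:n])

# (2) slice a memoryview: the slice itself is zero-copy over buf's
#     buffer, so only the final bytes object allocates
head2 = bytes(memoryview(buf)[:n])

assert head1 == head2 == b"hello"
```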
-- INADA Naoki From michael.selik at gmail.com Sat Aug 6 20:57:32 2016 From: michael.selik at gmail.com (Michael Selik) Date: Sun, 07 Aug 2016 00:57:32 +0000 Subject: [Python-ideas] Make "raise" an expression In-Reply-To: References: Message-ID: On Sat, Aug 6, 2016, 7:24 PM ????? wrote: > The raise statement cannot be used as an expression, even though there is > no theoretical reason for it not to. I don't see any practical reason > either - except the fact that it requires changing the syntax. > > I have found this idea mentioned by Anthony Lee in a thread from 2014, > regarding "or raise", > https://mail.python.org/pipermail/python-ideas/2014-November/029921.html. > But there is nothing against the idea there. > > Yes, it could easily be implemented using a function, but this seems to be > an argument against *augmenting* the language, not against *changing* it. > Moreover, this function will have uglier name, since `raise` is taken. (It > could have been the other way around - omitting the statement entirely, > leaving only a built in function `raise()`. I'm not suggesting that, of > course) > Luckily, "throw" is available for you. def throw(exception): raise exception As mentioned there, it does add a single line in the traceback - which is > not immediately what you are looking for. > > Python's tendency towards explicit control flow is irrelevant in this > case. An exception can already be raised from anywhere, at any time, and > there's no way see it from the source code (unlike e.g. break/continue). > > Such a change will (only) break external tools that use the ast module, or > tools analyzing Python code that uses this feature. This happens at every > language change, so it can be done together with the next such change. > > Use cases: > > result = f(args) or raise Exception(...) > result = f(x, y) if x > y else raise Exception(...) > > f(lambda x: max(x, 0) or raise Exception(...)) > > d = defaultdict(list) > ... 
> d[v] or raise KeyError(v) > > try: > return foo() > except: > return try_bar() or raise > > To be honest, I'm not sure that this change is worthwhile. Admittedly the > use cases are weak, and in some ways this is complementary of the (sadly > rejected) "except expression". I would still like to "finish that > thought", so - why not? > > ~Elazar > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sat Aug 6 22:12:57 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 6 Aug 2016 22:12:57 -0400 Subject: [Python-ideas] Make "raise" an expression In-Reply-To: References: Message-ID: On 8/6/2016 8:57 PM, Michael Selik wrote: > > > On Sat, Aug 6, 2016, 7:24 PM ????? > > wrote: > > The raise statement cannot be used as an expression, even though > there is no theoretical reason for it not to. I don't see any > practical reason either - except the fact that it requires changing > the syntax. > > I have found this idea mentioned by Anthony Lee in a thread from > 2014, regarding "or > raise", https://mail.python.org/pipermail/python-ideas/2014-November/029921.html. But there > is nothing against the idea there. > > Yes, it could easily be implemented using a function, but this seems > to be an argument against *augmenting* the language, not against > *changing* it. Moreover, this function will have uglier name, since > `raise` is taken. (It could have been the other way around - > omitting the statement entirely, leaving only a built in function > `raise()`. I'm not suggesting that, of course) > > > Luckily, "throw" is available for you. > > def throw(exception): > raise exception Nice. This makes the 'or raise' construct available to any who like it in all versions. 
There is precedent for this name in generator.throw. -- Terry Jan Reedy From michael.selik at gmail.com Sun Aug 7 00:20:42 2016 From: michael.selik at gmail.com (Michael Selik) Date: Sun, 07 Aug 2016 04:20:42 +0000 Subject: [Python-ideas] Make "raise" an expression In-Reply-To: References: Message-ID: On Sat, Aug 6, 2016 at 10:13 PM Terry Reedy wrote: > On 8/6/2016 8:57 PM, Michael Selik wrote: > > On Sat, Aug 6, 2016, 7:24 PM ????? > > > > wrote: > > The raise statement cannot be used as an expression, even though > > there is no theoretical reason for it not to. I don't see any > > practical reason either - except the fact that it requires changing > > the syntax. > > > > I have found this idea mentioned by Anthony Lee in a thread from > > 2014, regarding "or > > raise", > https://mail.python.org/pipermail/python-ideas/2014-November/029921.html. > But there > > is nothing against the idea there. > > > > Yes, it could easily be implemented using a function, but this seems > > to be an argument against *augmenting* the language, not against > > *changing* it. Moreover, this function will have uglier name, since > > `raise` is taken. (It could have been the other way around - > > omitting the statement entirely, leaving only a built in function > > `raise()`. I'm not suggesting that, of course) > > > > > > Luckily, "throw" is available for you. > > > > def throw(exception): > > raise exception > > Nice. This makes the 'or raise' construct available to any who like it > in all versions. There is precedent for this name in generator.throw. > I suppose it should really be more like In [1]: def throw(exception=None, *, cause=None): ...: if exception is None: ...: raise ...: if cause is None: ...: raise exception ...: raise exception from cause I'm not sure if I like the use in a shortcut expression. Too easy to accidentally use ``or`` when you mean ``and`` and vice versa. 
False or throw(RuntimeError('you shoulda been True to me')) True and throw(RuntimeError('I knew it was you, Fredo.')) -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.selik at gmail.com Sun Aug 7 01:08:26 2016 From: michael.selik at gmail.com (Michael Selik) Date: Sun, 07 Aug 2016 05:08:26 +0000 Subject: [Python-ideas] Adding bytes.frombuffer() constructor In-Reply-To: References: Message-ID: On Sat, Aug 6, 2016 at 8:45 PM INADA Naoki wrote: > 1. bytes(bytearray[:n]) > 2. bytes(memoryview(bytearray)[:n]) > > (1) is simplest, but it produces temporary bytearray having n bytes. > Does that actually make the difference between unacceptably inefficient performance and acceptably efficient for an application you're working on? > While (2) is more efficient than (1), it uses still temporary memoryview > object, and it looks bit tricky. > Using the memoryview is nicely explicit whereas ``bytes.frombuffer`` could be creating a temporary bytearray as part of its construction. The API I propose looks like this: > bytes.frombuffer(byteslike, length=-1, offset=0) > RawIOBase.read and the other read methods described in the io module use the parameter "size" instead of "length". https://docs.python.org/3/library/io.html#io.RawIOBase -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew.svetlov at gmail.com Sun Aug 7 08:36:15 2016 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Sun, 07 Aug 2016 12:36:15 +0000 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> Message-ID: @contextlib.asynccontextmanager and especially an alternative for asyncio.Queue looks like valuable examples for asend/athrow. On Sun, Aug 7, 2016 at 12:21 AM Alexander Shorin wrote: > While Nick has a point about coroutine coroutines confusion. 
I > consider asend/athow will be useful in real world. Because they > provide stable, well known and uniform interface to communicate with a > coroutine. > > For now to have two-way communication with some coroutine you have to > establish two queues: inbox and outbox, pass them around, don't > eventually mess them and so on. Queues are great for message-passing > communication and generally good idea to use, but it still some > boilerplate code as like as __aiter__/__anext__ (which async > generators will get rid) to maintain. Also queue is a bad reference > to a coroutine owns it - it's hard to find own who is really reading > that inbox in complex codebase. > > With asend/athrow there will be no need to plass queues around, but > just coroutine instance and do all the communication with it as with > plain object. You need to retrieve a value - do iterate over it; need > to send a value back - do asend. Termination and error-passing are > aslo easy to do. > > So while this interface looks complicated for learning curve, it could > be very helpful, when you need simple communication channel between > coroutines sized equals one. For something greater you have to return > to queues approach, but they involve quite a lot of else bits to care > about like queue overflow, which is indeed topic for advanced users. > > > > -- > ,,,^..^,,, > > > On Sat, Aug 6, 2016 at 7:51 PM, Yury Selivanov > wrote: > > Hi Nick, > > > > On 2016-08-06 4:28 AM, Nick Coghlan wrote: > > > >> On 6 August 2016 at 01:11, Andrew Svetlov > >> wrote: > >>> > >>> Honestly I personally don't feel a need for two-way generators with > >>> `asend()`/`athrow()` but as Yury explained he need them internally for > >>> `anext`/`aclose` anyway. > >> > >> Even if asend()/athrow() are cheap from an implementation perspective, > >> they're *not* necessarily cheap from a "cognitive burden of learning > >> the API" perspective. We know what send() and throw() on generators > >> are for: implementing coroutines. 
However, we also know that layering > >> that on top of the generator protocol turned out to be inherently > >> confusing, hence PEP 492 and async/await. > >> > >> So if we don't have a concrete use case for "coroutine coroutines" > >> (what does that even *mean*?), we shouldn't add asend()/athrow() just > >> because it's easy to do so. > > > > > > One immediate use case that comes to mind is > > @contextlib.asynccontextmanager > > > > Essentially this: > > > > @contextlib.asynccontextmanager > > async def ctx(): > > resource = await acquire() > > try: > > yield resource > > finally: > > await release(resource) > > > > I'm also worrying that in order to do similar things people will > > abuse "agen.__anext__().send()" and "agen.__anext__().throw()". > > And this is a very error-prone thing to do. For example, in > > asyncio code, you will capture yielded Futures that the > > generator is trying to await on (those yielded futures > > have to be captured by the running Task object instead). > > > > "asend" and "athrow" know about iteration protocol and can see > > the "wrapped" yielded objects, hence they can iterate the > > generator correctly and reliably. > > > > I think Nathaniel Smith (CC-ed) had a few more use cases for > > asend/athrow. > > > > > >> > >> If we wanted to hedge our bets, we could expose them as _asend() and > >> _athrow() initially, on the grounds that doing so is cheap, and we're > >> genuinely curious to see if anyone can find a non-toy use case for > >> them. But we should be very reluctant to add them to the public API > >> without a clear idea of why they're there. > > > > > > I think we should decide if we want to expose them at all. > > If we're in doubt, let's not do that at all. Exposing them > > with a "_" prefix doesn't free us from maintaining backwards > > compatibility etc. > > > > You're completely right about the increased "cognitive burden > > of learning the API". 
On the other hand, send() and throw() > > for sync generators were designed for power users; most > > Python developers don't even know that they exist. I think > > we can view asend() and athrow() similarly. > > > > Thank you, > > Yury > > > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Thanks, Andrew Svetlov -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Aug 7 11:47:36 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Aug 2016 01:47:36 +1000 Subject: [Python-ideas] Adding bytes.frombuffer() constructor In-Reply-To: References: Message-ID: On 7 August 2016 at 15:08, Michael Selik wrote: > > > On Sat, Aug 6, 2016 at 8:45 PM INADA Naoki wrote: > >> 1. bytes(bytearray[:n]) >> 2. bytes(memoryview(bytearray)[:n]) >> >> (1) is simplest, but it produces temporary bytearray having n bytes. >> > > Does that actually make the difference between unacceptably inefficient > performance and acceptably efficient for an application you're working on? > > >> While (2) is more efficient than (1), it uses still temporary memoryview >> object, and it looks bit tricky. >> > > Using the memoryview is nicely explicit whereas ``bytes.frombuffer`` could > be creating a temporary bytearray as part of its construction. > It could, but it wouldn't (since that would be pointlessly inefficient). The main question to be answered here would be whether adding a dedicated spelling for "bytes(memoryview(bytearray)[:n])" actually smooths out the learning curve for memoryview in general, where folks would learn: 1. "bytes(mybytearray[:n])" copies the data twice for no good reason 2. "bytes.frombuffer(mybytearray, n)" avoids the double copy 3. 
"bytes(memoryview(mybytearray)[:n])" generalises to arbitrary slices With memoryview being a builtin, I'm not sure that argument can be made successfully - the transformation in going from step 1 direct to step 3 is just "wrap the original object with memoryview before slicing to avoid the double copy", and that's no more complicated than using a different constructor method. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Aug 7 11:53:55 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Aug 2016 01:53:55 +1000 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> Message-ID: On 7 August 2016 at 02:51, Yury Selivanov wrote: > Hi Nick, > > On 2016-08-06 4:28 AM, Nick Coghlan wrote: > > On 6 August 2016 at 01:11, Andrew Svetlov >> wrote: >> >>> Honestly I personally don't feel a need for two-way generators with >>> `asend()`/`athrow()` but as Yury explained he need them internally for >>> `anext`/`aclose` anyway. >>> >> Even if asend()/athrow() are cheap from an implementation perspective, >> they're *not* necessarily cheap from a "cognitive burden of learning >> the API" perspective. We know what send() and throw() on generators >> are for: implementing coroutines. However, we also know that layering >> that on top of the generator protocol turned out to be inherently >> confusing, hence PEP 492 and async/await. >> >> So if we don't have a concrete use case for "coroutine coroutines" >> (what does that even *mean*?), we shouldn't add asend()/athrow() just >> because it's easy to do so. 
>> > One immediate use case that comes to mind is > @contextlib.asynccontextmanager > > Essentially this: > > @contextlib.asynccontextmanager > async def ctx(): > resource = await acquire() > try: > yield resource > finally: > await release(resource) > Good point. Let's make adding that part of the PEP, since it will be useful, makes it immediately obvious why we're including asend() and athrow(), and will also stress test the coroutine traceback machinery in an interesting way. (Note that depending on the implementation details, it may need to end up living in asyncio somewhere - that will depend on whether or not we need to use any asyncio APIs in the implementation of _CoroutineContextManager) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From songofacandy at gmail.com Sun Aug 7 15:09:31 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Mon, 8 Aug 2016 04:09:31 +0900 Subject: [Python-ideas] Adding bytes.frombuffer() constructor In-Reply-To: References: Message-ID: On Mon, Aug 8, 2016 at 12:47 AM, Nick Coghlan wrote: > On 7 August 2016 at 15:08, Michael Selik wrote: >> >> >> On Sat, Aug 6, 2016 at 8:45 PM INADA Naoki wrote: >>> >>> 1. bytes(bytearray[:n]) >>> 2. bytes(memoryview(bytearray)[:n]) >>> >>> (1) is simplest, but it produces temporary bytearray having n bytes. >> >> >> Does that actually make the difference between unacceptably inefficient >> performance and acceptably efficient for an application you're working on? Yes. My intention is Tornado and AsyncIO. Since they are frameworks, we can't assume how large the data is -- it may be a few bytes or a few gigabytes. >>> >>> While (2) is more efficient than (1), it uses still temporary memoryview >>> object, and it looks bit tricky. >> >> >> Using the memoryview is nicely explicit whereas ``bytes.frombuffer`` could >> be creating a temporary bytearray as part of its construction.
> > It could, but it wouldn't (since that would be pointlessly inefficient). > > The main question to be answered here would be whether adding a dedicated > spelling for "bytes(memoryview(bytearray)[:n])" actually smooths out the > learning curve for memoryview in general, where folks would learn: > > 1. "bytes(mybytearray[:n])" copies the data twice for no good reason > 2. "bytes.frombuffer(mybytearray, n)" avoids the double copy > 3. "bytes(memoryview(mybytearray)[:n])" generalises to arbitrary slices > > With memoryview being a builtin, I'm not sure that argument can be made > successfully - the transformation in going from step 1 direct to step 3 is > just "wrap the original object with memoryview before slicing to avoid the > double copy", and that's no more complicated than using a different > constructor method. I'm not sure either. A memoryview may or may not be a bytes-like object that os.write or socket.send accepts. But memoryview is the successor of buffer, so we should encourage using it for zero-copy slicing. Thank you. -- INADA Naoki From srkunze at mail.de Mon Aug 8 07:44:44 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 8 Aug 2016 13:44:44 +0200 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com> References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com> Message-ID: <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de> On 06.08.2016 01:25, Yury Selivanov wrote: > > On 2016-08-05 4:51 PM, Sven R. Kunze wrote: >> Yes, that is how I understand it as well. So, all this complexity >> stems from the intertwining of generators and coroutines. >> >> I am wondering if loosening the tie between the two could make it all >> simpler; even for you. (Not right now, but maybe later.)
> > Unfortunately, the only way to "loosen the tie" is to re-implement > generator protocol for coroutines (make them a separate thing). > Besides breaking backwards compatibility with the existing Python code > that uses @coroutine decorator and 'yield from' syntax, it will also > introduce a ton of work. You are the best to know that. :) >> >> I think that is related to Stefan's question here. I cannot speak for >> him but it seems to me the confusion is actually the other way around. >> >> Coming from a database background, I prefer redundant-free data >> representation. The "a" prefix encodes the information "that's an >> asynchronous generator" which is repeated several times and thus is >> redundant. >> >> Another example: when writing code for generators and checking for >> the "send" method, I would then need to check for the "asend" method >> as well. So, I would need to touch that code although it even might >> not have been necessary in the first place. >> >> I don't want to talk you out of the "a" prefix, but I get the feeling >> that the information "that's asynchronous" should be provided as a >> separate attribute. I believe it would reduce confusion, simplify >> duck-typing and flatten the learning curve. :) > > > Thing is you can't write one piece of code that will accept any type > of generator (sync or async). > > * send, throw, close, __next__ methods for sync generators, they are > synchronous. > > * send, throw, close, __next__ methods for async generators, they > *require* to use 'await' on them. There is no way to make them > "synchronous", because you have awaits in async generators. There are a lot of things you can do with generators (decorating, adding attributes etc.) and none of them require you to "make them synchronous". > Because of the above, duck-typing simply isn't possible. See above. > The prefix is there to make people aware that this is a completely > different API, even though it looks similar. Sure. 
So, what do you think about the separate attribute to make people aware?

Best,
Sven

From andrew.svetlov at gmail.com Mon Aug 8 08:43:28 2016
From: andrew.svetlov at gmail.com (Andrew Svetlov)
Date: Mon, 08 Aug 2016 12:43:28 +0000
Subject: [Python-ideas] PEP 525: Asynchronous Generators
In-Reply-To: <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de>
References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com>
 <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de>
 <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com>
 <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com>
 <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de>
Message-ID:

> Sure. So, what do you think about the separate attribute to make people
aware?

What should people be aware of? IMHO sync generators and async ones have
different types. What else is needed?

On Mon, Aug 8, 2016 at 2:45 PM Sven R. Kunze wrote:

> On 06.08.2016 01:25, Yury Selivanov wrote:
> >
> > On 2016-08-05 4:51 PM, Sven R. Kunze wrote:
> >> Yes, that is how I understand it as well. So, all this complexity
> >> stems from the intertwining of generators and coroutines.
> >>
> >> I am wondering if loosening the tie between the two could make it all
> >> simpler; even for you. (Not right now, but maybe later.)
> >
> > Unfortunately, the only way to "loosen the tie" is to re-implement
> > generator protocol for coroutines (make them a separate thing).
> > Besides breaking backwards compatibility with the existing Python code
> > that uses @coroutine decorator and 'yield from' syntax, it will also
> > introduce a ton of work.
>
> You are the best to know that. :)
>
> >>
> >> I think that is related to Stefan's question here. I cannot speak for
> >> him but it seems to me the confusion is actually the other way around.
> >>
> >> Coming from a database background, I prefer redundant-free data
> >> representation. The "a" prefix encodes the information "that's an
> >> asynchronous generator" which is repeated several times and thus is
> >> redundant.
> >> > >> Another example: when writing code for generators and checking for > >> the "send" method, I would then need to check for the "asend" method > >> as well. So, I would need to touch that code although it even might > >> not have been necessary in the first place. > >> > >> I don't want to talk you out of the "a" prefix, but I get the feeling > >> that the information "that's asynchronous" should be provided as a > >> separate attribute. I believe it would reduce confusion, simplify > >> duck-typing and flatten the learning curve. :) > > > > > > Thing is you can't write one piece of code that will accept any type > > of generator (sync or async). > > > > * send, throw, close, __next__ methods for sync generators, they are > > synchronous. > > > > * send, throw, close, __next__ methods for async generators, they > > *require* to use 'await' on them. There is no way to make them > > "synchronous", because you have awaits in async generators. > > There are a lot of things you can do with generators (decorating, adding > attributes etc.) and none of them require you to "make them synchronous". > > > Because of the above, duck-typing simply isn't possible. > > See above. > > > The prefix is there to make people aware that this is a completely > > different API, even though it looks similar. > > Sure. So, what do you think about the separate attribute to make people > aware? > > > Best, > Sven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Thanks, Andrew Svetlov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From yselivanov.ml at gmail.com Mon Aug 8 13:06:15 2016
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Mon, 8 Aug 2016 13:06:15 -0400
Subject: [Python-ideas] PEP 525: Asynchronous Generators
In-Reply-To: <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de>
References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com>
 <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de>
 <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com>
 <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com>
 <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de>
Message-ID: <8acf8247-1b03-5f0f-0a74-a235f486e956@gmail.com>

On 2016-08-08 7:44 AM, Sven R. Kunze wrote:
>> Thing is you can't write one piece of code that will accept any type
>> of generator (sync or async).
>>
>> * send, throw, close, __next__ methods for sync generators, they are
>> synchronous.
>>
>> * send, throw, close, __next__ methods for async generators, they
>> *require* to use 'await' on them. There is no way to make them
>> "synchronous", because you have awaits in async generators.
>
> There are a lot of things you can do with generators (decorating,
> adding attributes etc.) and none of them require you to "make them
> synchronous".

You have to be aware of what you're decorating. Always.

For instance, here's an example of buggy code:

@functools.lru_cache()
def foo():
    yield 123

I'm really against duck typing here. I see no point in making the API
for async generators look similar to the API of sync generators.

You have to provide a solid real-world example of where it might help
with asynchronous generators to convince me otherwise ;)

>
>> Because of the above, duck-typing simply isn't possible.
>
> See above.
>
>> The prefix is there to make people aware that this is a completely
>> different API, even though it looks similar.
>
> Sure. So, what do you think about the separate attribute to make
> people aware?

There is a separate attribute already -- __class__.
Plus a couple of new functions in inspect module: inspect.isasyncgenfunction and inspect.isasyncgen. And the new types.AsyncGeneratorType for isinstance checks. Thanks, Yury From yselivanov.ml at gmail.com Mon Aug 8 13:09:29 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 8 Aug 2016 13:09:29 -0400 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> Message-ID: <3eab1e5f-33f2-86b3-45df-c2b61d7b3475@gmail.com> On 2016-08-07 11:53 AM, Nick Coghlan wrote: > > One immediate use case that comes to mind is > > @contextlib.asynccontextmanager > > Essentially this: > > @contextlib.asynccontextmanager > async def ctx(): > resource = await acquire() > try: > yield resource > finally: > await release(resource) > > > > Good point. Let's make adding that part of the PEP, since it will be > useful, makes it immediately obvious why we're including asend() and > athrow(), and will also stress test the coroutine traceback machinery > in an interesting way. (Note that depending on the implementation > details, it may need to end up living in asyncio somewhere - that will > depend on whether or not we need to use any asyncio APIs in the > implementation of _CoroutineContextManager) It's already in the PEP: https://www.python.org/dev/peps/pep-0525/#why-the-asend-and-athrow-methods-are-necessary Although it's not super visible, I agree. Should I try to make that section more prominent? Thank you, Yury From guido at python.org Mon Aug 8 14:06:05 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Aug 2016 11:06:05 -0700 Subject: [Python-ideas] Looking for a co-author for a PEP about variable declarations Message-ID: I've written up a wall of text at https://github.com/python/typing/issues/258 and would like to develop it into a PEP at an accelerated schedule. 
Is anyone interested in helping out? This *could* be
a great mentoring opportunity.

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rymg19 at gmail.com Mon Aug 8 14:17:33 2016
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Mon, 8 Aug 2016 13:17:33 -0500
Subject: [Python-ideas] Looking for a co-author for a PEP about variable declarations
In-Reply-To: 
References: 
Message-ID:

Well, I don't really have much experience, and, last time I offered to
help, I was mostly ignored... But you *did* say "mentoring opportunity"...
I have this week off since college work starts next week.

-- 
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your
program. Something's wrong.
http://kirbyfan64.github.io/

On Aug 8, 2016 1:06 PM, "Guido van Rossum" wrote:

> I've written up a wall of text at https://github.com/python/
> typing/issues/258 and would like to develop it into a PEP at an
> accelerated schedule. Is anyone interested in helping out? This *could* be
> a great mentoring opportunity.
>
> --
> --Guido van Rossum (python.org/~guido)
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From levkivskyi at gmail.com Mon Aug 8 15:11:25 2016
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Mon, 8 Aug 2016 21:11:25 +0200
Subject: [Python-ideas] Looking for a co-author for a PEP about variable declarations
Message-ID:

> I've written up a wall of text at
> https://github.com/python/typing/issues/258 and would like to develop it
> into a PEP at an accelerated schedule. Is anyone interested in helping out?
> This *could* be a great mentoring opportunity.

I think this is a great idea and I will be glad to help.
-- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From elazarg at gmail.com Mon Aug 8 15:41:30 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Mon, 08 Aug 2016 19:41:30 +0000 Subject: [Python-ideas] Looking for a co-author for a PEP about variable declarations In-Reply-To: References: Message-ID: I will be glad to participate. ~Elazar On Mon, Aug 8, 2016 at 9:07 PM Guido van Rossum wrote: > I've written up a wall of text at > https://github.com/python/typing/issues/258 and would like to develop it > into a PEP at an accelerated schedule. Is anyone interested in helping out? > This *could* be a great mentoring opportunity. > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Mon Aug 8 15:49:57 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 9 Aug 2016 05:49:57 +1000 Subject: [Python-ideas] Looking for a co-author for a PEP about variable declarations In-Reply-To: References: Message-ID: On Tue, Aug 9, 2016 at 5:41 AM, ????? wrote: > I will be glad to participate. > ~Elazar Word of warning: This is likely to be a controversial topic. Whoever takes this on can expect to receive several hundred emails on the subject, and will need to hash out the important details in the PEP, keeping everything that anyone will go looking for, but not letting the document get so long that people miss stuff, and include enough detail, but not too much, and basically be the Charlie that everyone complains to :) The PEP author is the central focus for all the discussion, and I predict a lot of discussion for this particular topic. Your next week or so will not have a great deal of sleep in it. 
That said, though: I strongly encourage you to take part. Just don't underestimate what you're taking on. ChrisA From elazarg at gmail.com Mon Aug 8 16:13:47 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Mon, 08 Aug 2016 20:13:47 +0000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: > > class Starship: > stats: class Dict[str, int] = {} # Pure class variable > damage: class int = 0 # Hybrid class/instance variable > captain: str # Pure instance variable > I can't avoid noting that there is an opportunity here to insert NamedTuple into the core language. The above example is almost there, except it's mutable and without convenient methods. But class Starship(tuple): damage: int = 0 captain: str = "Kirk" Is an obvious syntax for Starship = NamedTuple('Starship', [('damage', int), ('captain', str)]) Only much more available and intuitive to read, use, and of course - type check. (Of course, it does mean adding semantics to the declaration syntax in general) I'm not really suggesting to make this change now, but I believe it will be done, sooner or later. My brief experience with mypy convinced me that it must be the case. The new declaration syntax only makes it easier. ~Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Aug 8 16:25:33 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Aug 2016 13:25:33 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: That's a very interesting idea and one that deserves pursuing (though I agree it's not a blocker for the PEP I'm hoping to write). I think the next step is to prototype this -- which can only happen once we have an implementation of the PEP. 
Though perhaps you could start by writing a prototype that works by having the user write the following: class Starship(PrototypeNamedTuple): damage = 0 captain = "Kirk" __annotations__ = dict(damage=int, captain=str) It could also benefit from PEP 520 (Preserving Class Attribute Definition Order). Who's game? --Guido On Mon, Aug 8, 2016 at 1:13 PM, ????? wrote: > class Starship: >> stats: class Dict[str, int] = {} # Pure class variable >> damage: class int = 0 # Hybrid class/instance variable >> captain: str # Pure instance variable >> > > I can't avoid noting that there is an opportunity here to insert > NamedTuple into the core language. The above example is almost there, > except it's mutable and without convenient methods. But > > class Starship(tuple): > damage: int = 0 > captain: str = "Kirk" > > Is an obvious syntax for > > Starship = NamedTuple('Starship', [('damage', int), ('captain', str)]) > > Only much more available and intuitive to read, use, and of course - type > check. > (Of course, it does mean adding semantics to the declaration syntax in > general) > > I'm not really suggesting to make this change now, but I believe it will > be done, sooner or later. My brief experience with mypy convinced me that > it must be the case. The new declaration syntax only makes it easier. > > ~Elazar > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From elazarg at gmail.com Mon Aug 8 16:58:55 2016
From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=)
Date: Mon, 08 Aug 2016 20:58:55 +0000
Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484
In-Reply-To: 
References: 
Message-ID:

class PrototypeNamedTuple:
    cache = {}
    def __new__(cls, *args):
        P = PrototypeNamedTuple
        if cls not in P.cache:
            P.cache[cls] = typing.NamedTuple(cls.__name__,
                                             cls.__annotations__.items())
        return P.cache[cls](*args)

Works modulo ordering, though I'm not sure that's the right way to do it.

The ordering part of namedtuple is orthogonal to the
value-type/immutability part. So I would imagine making "Value" for the
latter, "tuple" for the former, and namedtuple is mixing both (possibly
given a convenient name, such as PrototypeNamedTuple). "Value" can also be
seen as mixing "Struct" and "Immutable", but that's overdoing it I guess.

~Elazar

On Mon, Aug 8, 2016 at 11:25 PM Guido van Rossum wrote:

> That's a very interesting idea and one that deserves pursuing (though I
> agree it's not a blocker for the PEP I'm hoping to write). I think the next
> step is to prototype this -- which can only happen once we have an
> implementation of the PEP. Though perhaps you could start by writing a
> prototype that works by having the user write the following:
>
> class Starship(PrototypeNamedTuple):
>     damage = 0
>     captain = "Kirk"
>     __annotations__ = dict(damage=int, captain=str)
>
> It could also benefit from PEP 520 (Preserving Class Attribute Definition
> Order).
>
> Who's game?
>
> --Guido
>
> On Mon, Aug 8, 2016 at 1:13 PM, ????? wrote:
>
>> class Starship:
>>> stats: class Dict[str, int] = {} # Pure class variable
>>> damage: class int = 0 # Hybrid class/instance variable
>>> captain: str # Pure instance variable
>>>
>>
>> I can't avoid noting that there is an opportunity here to insert
>> NamedTuple into the core language.
The above example is almost there,
>> except it's mutable and without convenient methods. But
>>
>> class Starship(tuple):
>>     damage: int = 0
>>     captain: str = "Kirk"
>>
>> Is an obvious syntax for
>>
>> Starship = NamedTuple('Starship', [('damage', int), ('captain', str)])
>>
>> Only much more available and intuitive to read, use, and of course - type
>> check.
>> (Of course, it does mean adding semantics to the declaration syntax in
>> general)
>>
>> I'm not really suggesting to make this change now, but I believe it will
>> be done, sooner or later. My brief experience with mypy convinced me that
>> it must be the case. The new declaration syntax only makes it easier.
>>
>> ~Elazar
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From guido at python.org Mon Aug 8 17:08:41 2016
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Aug 2016 14:08:41 -0700
Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484
In-Reply-To: 
References: 
Message-ID:

Hm, overlooking the ordering is kind of a big deal for something with
"tuple" in its name. :-)

Also it feels like it needs a metaclass instead of a cache.

Maybe from this we can learn though that __annotations__ should be an
OrderedDict?

On Mon, Aug 8, 2016 at 1:58 PM, ????? wrote:

> class PrototypeNamedTuple:
>     cache = {}
>     def __new__(cls, *args):
>         P = PrototypeNamedTuple
>         if cls not in P.cache:
>             P.cache[cls] = typing.NamedTuple(cls.__name__,
>                                              cls.__annotations__.items())
>         return P.cache[cls](*args)
>
> Works modulo ordering, though I'm not sure that's the right way to do it.
> > The ordering part of namedtuple is orthogonal to the > value-type/immutability part. So I would imagine making "Value" for the > latter, "tuple" for the former, and namedtuple is mixing both (possibly > given a convenient name, such as PrototypeNamedTuple). "Value" can also > seen as mixing "Struct" and "Immutable", but that's overdoing it I guess. > > ~Elazar > > On Mon, Aug 8, 2016 at 11:25 PM Guido van Rossum wrote: > >> That's a very interesting idea and one that deserves pursuing (though I >> agree it's not a blocker for the PEP I'm hoping to write). I think the next >> step is to prototype this -- which can only happen once we have an >> implementation of the PEP. Though perhaps you could start by writing a >> prototype that works by having the user write the following: >> >> class Starship(PrototypeNamedTuple): >> damage = 0 >> captain = "Kirk" >> __annotations__ = dict(damage=int, captain=str) >> >> It could also benefit from PEP 520 (Preserving Class Attribute Definition >> Order). >> >> Who's game? >> >> --Guido >> >> On Mon, Aug 8, 2016 at 1:13 PM, ????? wrote: >> >>> class Starship: >>>> stats: class Dict[str, int] = {} # Pure class variable >>>> damage: class int = 0 # Hybrid class/instance variable >>>> captain: str # Pure instance variable >>>> >>> >>> I can't avoid noting that there is an opportunity here to insert >>> NamedTuple into the core language. The above example is almost there, >>> except it's mutable and without convenient methods. But >>> >>> class Starship(tuple): >>> damage: int = 0 >>> captain: str = "Kirk" >>> >>> Is an obvious syntax for >>> >>> Starship = NamedTuple('Starship', [('damage', int), ('captain', >>> str)]) >>> >>> Only much more available and intuitive to read, use, and of course - >>> type check. >>> (Of course, it does mean adding semantics to the declaration syntax in >>> general) >>> >>> I'm not really suggesting to make this change now, but I believe it will >>> be done, sooner or later. 
My brief experience with mypy convinced me that >>> it must be the case. The new declaration syntax only makes it easier. >>> >>> ~Elazar >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >> >> >> >> -- >> --Guido van Rossum (python.org/~guido) >> > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From elazarg at gmail.com Mon Aug 8 17:11:58 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Mon, 08 Aug 2016 21:11:58 +0000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: Feels like named parameters are better off being an OrderedDict in the first place. NamedTuple pushes OrderedDict to become kind of builtin. On Tue, Aug 9, 2016 at 12:09 AM Guido van Rossum wrote: > Hm, overlooking the ordering is kind of a big deal for something with > "tuple" in its name. :-) > > Also it feels like it needs a metaclass instead of a cache. > > Maybe from this we can learn though that __anotations__ should be an > OrderedDict? > > On Mon, Aug 8, 2016 at 1:58 PM, ????? wrote: > >> class PrototypeNamedTuple: >> cache = {} >> def __new__(cls, *args): >> P = PrototypeNamedTuple >> if cls not in P.cache: >> P.cache[cls] = typing.NamedTuple(cls.__name__, >> cls.__annotations__.items()) >> return P.cache[cls](*args) >> >> Works modulo ordering, though I'm not sure that's the right way to do it. >> >> The ordering part of namedtuple is orthogonal to the >> value-type/immutability part. So I would imagine making "Value" for the >> latter, "tuple" for the former, and namedtuple is mixing both (possibly >> given a convenient name, such as PrototypeNamedTuple). 
"Value" can also >> seen as mixing "Struct" and "Immutable", but that's overdoing it I guess. >> >> ~Elazar >> >> On Mon, Aug 8, 2016 at 11:25 PM Guido van Rossum >> wrote: >> >>> That's a very interesting idea and one that deserves pursuing (though I >>> agree it's not a blocker for the PEP I'm hoping to write). I think the next >>> step is to prototype this -- which can only happen once we have an >>> implementation of the PEP. Though perhaps you could start by writing a >>> prototype that works by having the user write the following: >>> >>> class Starship(PrototypeNamedTuple): >>> damage = 0 >>> captain = "Kirk" >>> __annotations__ = dict(damage=int, captain=str) >>> >>> It could also benefit from PEP 520 (Preserving Class Attribute >>> Definition Order). >>> >>> Who's game? >>> >>> --Guido >>> >>> On Mon, Aug 8, 2016 at 1:13 PM, ????? wrote: >>> >>>> class Starship: >>>>> stats: class Dict[str, int] = {} # Pure class variable >>>>> damage: class int = 0 # Hybrid class/instance variable >>>>> captain: str # Pure instance variable >>>>> >>>> >>>> I can't avoid noting that there is an opportunity here to insert >>>> NamedTuple into the core language. The above example is almost there, >>>> except it's mutable and without convenient methods. But >>>> >>>> class Starship(tuple): >>>> damage: int = 0 >>>> captain: str = "Kirk" >>>> >>>> Is an obvious syntax for >>>> >>>> Starship = NamedTuple('Starship', [('damage', int), ('captain', >>>> str)]) >>>> >>>> Only much more available and intuitive to read, use, and of course - >>>> type check. >>>> (Of course, it does mean adding semantics to the declaration syntax in >>>> general) >>>> >>>> I'm not really suggesting to make this change now, but I believe it >>>> will be done, sooner or later. My brief experience with mypy convinced me >>>> that it must be the case. The new declaration syntax only makes it easier. 
>>>> >>>> ~Elazar >>>> >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>>> >>> >>> >>> >>> -- >>> --Guido van Rossum (python.org/~guido) >>> >> > > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Aug 8 17:28:54 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Aug 2016 14:28:54 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Mon, Aug 8, 2016 at 2:11 PM, ????? wrote: > Feels like named parameters are better off being an OrderedDict in the > first place. > PEP 468. > NamedTuple pushes OrderedDict to become kind of builtin. > Why? Having both in the collections module is good enough. > > On Tue, Aug 9, 2016 at 12:09 AM Guido van Rossum wrote: > >> Hm, overlooking the ordering is kind of a big deal for something with >> "tuple" in its name. :-) >> >> Also it feels like it needs a metaclass instead of a cache. >> >> Maybe from this we can learn though that __anotations__ should be an >> OrderedDict? >> >> On Mon, Aug 8, 2016 at 1:58 PM, ????? wrote: >> >>> class PrototypeNamedTuple: >>> cache = {} >>> def __new__(cls, *args): >>> P = PrototypeNamedTuple >>> if cls not in P.cache: >>> P.cache[cls] = typing.NamedTuple(cls.__name__, >>> cls.__annotations__.items()) >>> return P.cache[cls](*args) >>> >>> Works modulo ordering, though I'm not sure that's the right way to do it. >>> >>> The ordering part of namedtuple is orthogonal to the >>> value-type/immutability part. So I would imagine making "Value" for the >>> latter, "tuple" for the former, and namedtuple is mixing both (possibly >>> given a convenient name, such as PrototypeNamedTuple). 
"Value" can also >>> seen as mixing "Struct" and "Immutable", but that's overdoing it I guess. >>> >>> ~Elazar >>> >>> On Mon, Aug 8, 2016 at 11:25 PM Guido van Rossum >>> wrote: >>> >>>> That's a very interesting idea and one that deserves pursuing (though I >>>> agree it's not a blocker for the PEP I'm hoping to write). I think the next >>>> step is to prototype this -- which can only happen once we have an >>>> implementation of the PEP. Though perhaps you could start by writing a >>>> prototype that works by having the user write the following: >>>> >>>> class Starship(PrototypeNamedTuple): >>>> damage = 0 >>>> captain = "Kirk" >>>> __annotations__ = dict(damage=int, captain=str) >>>> >>>> It could also benefit from PEP 520 (Preserving Class Attribute >>>> Definition Order). >>>> >>>> Who's game? >>>> >>>> --Guido >>>> >>>> On Mon, Aug 8, 2016 at 1:13 PM, ????? wrote: >>>> >>>>> class Starship: >>>>>> stats: class Dict[str, int] = {} # Pure class variable >>>>>> damage: class int = 0 # Hybrid class/instance variable >>>>>> captain: str # Pure instance variable >>>>>> >>>>> >>>>> I can't avoid noting that there is an opportunity here to insert >>>>> NamedTuple into the core language. The above example is almost there, >>>>> except it's mutable and without convenient methods. But >>>>> >>>>> class Starship(tuple): >>>>> damage: int = 0 >>>>> captain: str = "Kirk" >>>>> >>>>> Is an obvious syntax for >>>>> >>>>> Starship = NamedTuple('Starship', [('damage', int), ('captain', >>>>> str)]) >>>>> >>>>> Only much more available and intuitive to read, use, and of course - >>>>> type check. >>>>> (Of course, it does mean adding semantics to the declaration syntax in >>>>> general) >>>>> >>>>> I'm not really suggesting to make this change now, but I believe it >>>>> will be done, sooner or later. My brief experience with mypy convinced me >>>>> that it must be the case. The new declaration syntax only makes it easier. 
>>>>> >>>>> ~Elazar >>>>> >>>>> _______________________________________________ >>>>> Python-ideas mailing list >>>>> Python-ideas at python.org >>>>> https://mail.python.org/mailman/listinfo/python-ideas >>>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>>>> >>>> >>>> >>>> >>>> -- >>>> --Guido van Rossum (python.org/~guido) >>>> >>> >> >> >> -- >> --Guido van Rossum (python.org/~guido) >> > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From elazarg at gmail.com Mon Aug 8 18:08:48 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Mon, 08 Aug 2016 22:08:48 +0000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Tue, Aug 9, 2016 at 12:29 AM Guido van Rossum wrote: > On Mon, Aug 8, 2016 at 2:11 PM, ????? wrote: > >> Feels like named parameters are better off being an OrderedDict in the >> first place. >> > > PEP 468. > > Sorry, I should have read this PEP before. > NamedTuple pushes OrderedDict to become kind of builtin. >> > > Why? Having both in the collections module is good enough. > What I meant in becoming builtin is not the accessibility of the name, but the parallel incremental support of namedtuple, OrderedDict and (as I find out) order of **kwargs. In much the same way that class is made out of dict (and keeps one), namedtuple is an OrderedDict (and keeps one). Much like dict has a constructor `dict(a=1, b=2.0)` and a literal `{'a' : 1, 'b' : 2.0}`, OrderedDict has its OrderedDict(a=1, b=2.0) and should have the literal ('a': 1, 'b': 2.0). Replace 1 and 2 with int and float and you get a very natural syntax for a NamedTuple that is the type of a matching OrderedDict. And in my mind, the requirement for type names matches nicely the enforcement of immutability. Continuing this thought, the annotations for the class is actually its structural type. 
It opens the door for future requests for adding e.g. an
operator for structural equivalence.

This is all very far from the current language, so it's only thoughts and
not really ideas; probably does not belong here. Sorry about that.

~Elazar
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From guido at python.org Mon Aug 8 18:32:59 2016
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Aug 2016 15:32:59 -0700
Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484
In-Reply-To: 
References: 
Message-ID:

On Mon, Aug 8, 2016 at 3:08 PM, ????? wrote:

> On Tue, Aug 9, 2016 at 12:29 AM Guido van Rossum wrote:
>
>> On Mon, Aug 8, 2016 at 2:11 PM, ????? wrote:
>>
>>> Feels like named parameters are better off being an OrderedDict in the
>>> first place.
>>>
>>
>> PEP 468.
>>
>>
> Sorry, I should have read this PEP before.
>

No problem, it's fine to participate before reading *everything*!

> NamedTuple pushes OrderedDict to become kind of builtin.
>>>
>>
>> Why? Having both in the collections module is good enough.
>>
>
> What I meant in becoming builtin is not the accessibility of the name, but
> the parallel incremental support of namedtuple, OrderedDict and (as I find
> out) order of **kwargs.
>

Well, they're all already in the stdlib, although namedtuple is written in
Python, OrderedDict in C (recently), and **kwargs is ancient.

> In much the same way that class is made out of dict (and keeps one),
> namedtuple is an OrderedDict (and keeps one).
>

I'm not sure if you're talking about the class or the instances. A class
instance usually has a dict (unless it has __slots__ and all its base
classes have __slots__). But a namedtuple instance does not have a dict or
OrderedDict -- it is a tuple at heart.

> Much like dict has a constructor `dict(a=1, b=2.0)` and a literal `{'a' :
> 1, 'b' : 2.0}`, OrderedDict has its OrderedDict(a=1, b=2.0) and should
> have the literal ('a': 1, 'b': 2.0).
> That's debatable at least, and probably there are better solutions. > Replace 1 and 2 with int and float and you get a very natural syntax for a > NamedTuple that is the type of a matching OrderedDict. > And in my mind, the requirement for type names matches nicely the > enforcement of immutability. > > Continuing this thought, the annotations for the class are actually its > structural type. It opens the door for future requests for adding e.g. an > operator for structural equivalence. > > This is all very far from the current language, so it's only thoughts and > not really ideas; probably does not belong here. Sorry about that. > > ~Elazar > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Mon Aug 8 18:37:26 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 9 Aug 2016 00:37:26 +0200 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: <8acf8247-1b03-5f0f-0a74-a235f486e956@gmail.com> References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com> <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de> <8acf8247-1b03-5f0f-0a74-a235f486e956@gmail.com> Message-ID: <78fb06dc-ee03-35ad-8b8f-9852ba14fd1e@mail.de> On 08.08.2016 19:06, Yury Selivanov wrote: > You have to be aware of what you're decorating. Always. You have to be aware of what the decorator does first before you decide if it's relevant to be aware of what you're decorating. > For instance, here's an example of buggy code:
>
> @functools.lru_cache()
> def foo():
>     yield 123
>
> I'm really against duck typing here. I see no point in making the API > for async generators look similar to the API of sync generators. And so am I against making them "almost similar" by putting an "a" in front of the well-known generator protocol.
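The quoted lru_cache example is buggy in a directly observable way: the cache stores the generator *object* returned by the first call, so every later call hands back the same, possibly already-exhausted iterator. A minimal runnable sketch of the failure mode:

```python
import functools

@functools.lru_cache()
def foo():
    yield 123

first = foo()
second = foo()

# lru_cache cached the generator object itself, not the yielded values:
assert first is second
assert list(first) == [123]  # the first consumer drains it...
assert list(second) == []    # ...so the "second" call yields nothing
```
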
But maybe, because there is no real convincing argument on either side, it doesn't matter at all. > You have to provide a solid real-world example of where it might help > with asynchronous generators to convince me otherwise ;) Something I learned in the past years is to be prepared. So, such solid real-world examples might emerge when async generators are out in the wild. Let's hope this does not make it harder for them to start with. :) > There is a separate attribute already -- __class__. Plus a couple of > new functions in inspect module: inspect.isasyncgenfunction and > inspect.isasyncgen. And the new types.AsyncGeneratorType for > isinstance checks. Got it, thanks. Just another random thought I want to share: From what I've heard in the wild, most if not all pieces of async are mirroring existing Python features. So, building async basically builds a parallel structure in Python resembling Python. Async generators complete the picture. Some people (including me) are concerned by this because they feel that having two "almost the same pieces" is not necessarily a good thing to have. And not necessarily bad but it feels like duplicating code all over the place especially as existing functions are incompatible with async. As I understand it, one goal is to be as close to sync code as possible to make async code easier to understand. So, if dropping 'await's all over the place is the main difference between current async and sync code, I get the feeling both styles could be very much unified. (The examples of the PEP try to do it like this but it wouldn't work as we already discussed.) Maybe, it's too early for a discussion about this but I wanted to dump this thought somewhere.
:) Thanks, Sven From ncoghlan at gmail.com Mon Aug 8 23:23:21 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 9 Aug 2016 13:23:21 +1000 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: <78fb06dc-ee03-35ad-8b8f-9852ba14fd1e@mail.de> References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com> <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de> <8acf8247-1b03-5f0f-0a74-a235f486e956@gmail.com> <78fb06dc-ee03-35ad-8b8f-9852ba14fd1e@mail.de> Message-ID: On 9 August 2016 at 08:37, Sven R. Kunze wrote: > From what I've heard in the wild, most if not all pieces of async are > mirroring existing Python features. So, building async basically builds a > parallel structure in Python resembling Python. Async generators complete > the picture. Some people (including me) are concerned by this because they > feel that having two "almost the same pieces" is not necessarily a good > thing to have. And not necessarily bad but it feels like duplicating code > all over the place especially as existing functions are incompatible with > async.
> It's a known problem that applies to programming language design in general rather than being Python specific: http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ The async/await model used in C# and Python 3.5+ aims to at least make it clear whether you're dealing with "red" (asynchronous) or "blue" (synchronous) code, but some use cases will still benefit from using something like gevent to bridge between the synchronous world and the asynchronous one: http://python-notes.curiousefficiency.org/en/latest/pep_ideas/async_programming.html#gevent-and-pep-3156 In cases where blocking for IO is acceptable, the "run_until_complete()" method on event loop implementations can be used in order to invoke asynchronous code synchronously: http://www.curiousefficiency.org/posts/2015/07/asyncio-background-calls.html > As I understand it, one goal is to be as close to sync code as possible to > make async code easier to understand. So, if dropping 'await's all over the > place is the main difference between current async and sync code, I get the > feeling both styles could be very much unified. (The examples of the PEP > try to do it like this but it wouldn't work as we already discussed.) > Maybe, it's too early for a discussion about this but I wanted to dump this > thought somewhere. :) > One of the big mindset shifts it encourages is to design as many support libraries as possible as computational pipelines and message-driven state machines, rather than coupling them directly to IO operations (which is the way many of them work today). Brett Cannon started the Sans IO information project to discuss this concept at http://sans-io.readthedocs.io/ Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed...
URL: From guido at python.org Mon Aug 8 23:27:45 2016 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Aug 2016 20:27:45 -0700 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com> <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de> <8acf8247-1b03-5f0f-0a74-a235f486e956@gmail.com> <78fb06dc-ee03-35ad-8b8f-9852ba14fd1e@mail.de> Message-ID: Just don't oversell run_until_complete() -- some people coming from slightly different event loop paradigms expect to be able to pump for events at any point, possibly causing recursive invocations. That doesn't work here (and it's a feature it doesn't). On Mon, Aug 8, 2016 at 8:23 PM, Nick Coghlan wrote: > On 9 August 2016 at 08:37, Sven R. Kunze wrote: > >> From what I've heard in the wild, most if not all pieces of async >> are mirroring existing Python features. So, building async basically builds >> a parallel structure in Python resembling Python. Async generators complete >> the picture. Some people (including me) are concerned by this because they >> feel that having two "almost the same pieces" is not necessarily a good >> thing to have. And not necessarily bad but it feels like duplicating code >> all over the place especially as existing functions are incompatible with >> async. >> > > It's a known problem that applies to programming language design in > general rather than being Python specific: http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ > > The async/await model used in C# and Python 3.5+ aims to at least make it > clear whether you're dealing with "red" (asynchronous) or "blue" > (synchronous) code, but some use cases will still benefit from using > something like gevent to bridge between the synchronous world and the > asynchronous one: http://python-notes.curiousefficiency.org/en/latest/pep_ideas/async_programming.html#gevent-and-pep-3156 > > In cases where blocking for IO is acceptable, the "run_until_complete()" > method on event loop implementations can be used in order to invoke > asynchronous code synchronously: http://www.curiousefficiency.org/posts/2015/07/asyncio-background-calls.html > > >> As I understand it, one goal is to be as close to sync code as possible >> to make async code easier to understand. So, if dropping 'await's all over >> the place is the main difference between current async and sync code, I get >> the feeling both styles could be very much unified. (The examples of the >> PEP try to do it like this but it wouldn't work as we already discussed.) >> Maybe, it's too early for a discussion about this but I wanted to dump this >> thought somewhere. :) >> > > One of the big mindset shifts it encourages is to design as many support > libraries as possible as computational pipelines and message-driven state > machines, rather than coupling them directly to IO operations (which is the > way many of them work today). Brett Cannon started the Sans IO information > project to discuss this concept at http://sans-io.readthedocs.io/ > > Cheers, > Nick.
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Tue Aug 9 02:17:49 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Aug 2016 18:17:49 +1200 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: <57A9758D.4020300@canterbury.ac.nz> Elazar wrote:
> class Starship(tuple):
>     damage: int = 0
>     captain: str = "Kirk"
>
> Is an obvious syntax for
>
> Starship = NamedTuple('Starship', [('damage', int), ('captain', str)])

But the untyped version of that already has a meaning -- it's a tuple subclass with two extra class attributes that are unrelated to its indexable items. I thought that type annotations weren't meant to change runtime semantics? -- Greg From ncoghlan at gmail.com Tue Aug 9 10:13:04 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 10 Aug 2016 00:13:04 +1000 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com> <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de> <8acf8247-1b03-5f0f-0a74-a235f486e956@gmail.com> <78fb06dc-ee03-35ad-8b8f-9852ba14fd1e@mail.de> Message-ID: On 9 August 2016 at 13:27, Guido van Rossum wrote: > Just don't oversell run_until_complete() -- some people coming from > slightly different event loop paradigms expect to be able to pump for > events at any point, possibly causing recursive invocations.
That doesn't > work here (and it's a feature it doesn't). > Ah, interesting - I'd only ever used it for the "synchronous code that just wants this *particular* operation to run asynchronously in the current thread" case, which it handles nicely :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From drekin at gmail.com Tue Aug 9 10:18:27 2016 From: drekin at gmail.com (=?UTF-8?B?QWRhbSBCYXJ0b8Wh?=) Date: Tue, 9 Aug 2016 16:18:27 +0200 Subject: [Python-ideas] PEP 525: Asynchronous Generators Message-ID: Hello, when I was reading the PEP, it surprised me that return is not allowed in an async generator. You can transparently decompose an ordinary function, and yield from was added to be able to do the same with generators, so it seems natural to be able to decompose async generators the same way -- using (await) yield from and return. I found the explanation in the section about async yield from, but maybe a link to that section might be added as an explanation of why return produces a SyntaxError. In an ideal world it seems that the transition from ordinary functions to generators and the transition from ordinary functions to async functions (coroutines) are orthogonal, and async generators are the natural combination of both transitions. I think this is true despite the fact that async functions are *implemented* on top of generators. I imagine the situation as follows: being a generator adds an additional output channel -- the one leading to the generator consumer, and being an async function adds an additional output channel for communication with the scheduler. An async generator somehow adds both these channels. I understand that having the concepts truly orthogonal would mean tons of work, I just think it's a pity that something natural doesn't work only because of implementation reasons. However, introducing async generators is an important step in the right direction.
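The transparent decomposition referred to above already exists for synchronous generators: `yield from` forwards values (and send/throw/close) to the inner generator, and turns the inner generator's `return` value into the value of the expression. A small runnable sketch of that behavior:

```python
def inner():
    yield 1
    yield 2
    return 'done'  # allowed in sync generators; no async counterpart in PEP 525

def outer():
    # yield from transparently forwards iteration to inner(),
    # and binds inner()'s return value to `result`
    result = yield from inner()
    yield result

assert list(outer()) == [1, 2, 'done']
```
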
Thank you for that. Regards, Adam Bartoš -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Aug 9 11:23:54 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Aug 2016 08:23:54 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <57A9758D.4020300@canterbury.ac.nz> References: <57A9758D.4020300@canterbury.ac.nz> Message-ID: On Mon, Aug 8, 2016 at 11:17 PM, Greg Ewing wrote: > Elazar wrote:
>
>> class Starship(tuple):
>>     damage: int = 0
>>     captain: str = "Kirk"
>>
>> Is an obvious syntax for
>> Starship = NamedTuple('Starship', [('damage', int), ('captain', str)])
>>
>
> But the untyped version of that already has a meaning -- > it's a tuple subclass with two extra class attributes > that are unrelated to its indexable items. > > I thought that type annotations weren't meant to change > runtime semantics? Correct, but we can invent a new base class that has these semantics. It's no different from Enum. Anyway, I think this will have to be an add-on to be designed after the basics are done. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Tue Aug 9 11:52:31 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 9 Aug 2016 11:52:31 -0400 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: Message-ID: <2fdafba7-7539-62d1-c9b8-d38559fa43ef@gmail.com> On 2016-08-09 10:18 AM, Adam Bartoš wrote: > [..] > I understand that having the concepts truly orthogonal would mean tons > of work, I just think it's a pity that something natural doesn't work > only because of implementation reasons. However, introducing async > generators is an important step in the right direction. Thank you for that. Hi Adam, Thanks a lot for the positive feedback.
Not adding 'yield from' as part of PEP 525 isn't dictated solely by the implementation difficulties. It's also important to scope the new features in such a way that we're confident that the implementation is solid, well reviewed and well tested. While I myself can see why having 'yield from' in asynchronous generators would be cool, there are many more important things and details that we have to do "right" with this PEP. Therefore my idea is to move in small steps: introduce asynchronous generators in 3.6; extend them with asynchronous 'yield from' in 3.7, if there is interest in that. Thanks, Yury From guido at python.org Tue Aug 9 12:08:04 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Aug 2016 09:08:04 -0700 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com> <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de> <8acf8247-1b03-5f0f-0a74-a235f486e956@gmail.com> <78fb06dc-ee03-35ad-8b8f-9852ba14fd1e@mail.de> Message-ID: On Tue, Aug 9, 2016 at 7:13 AM, Nick Coghlan wrote: > > On 9 August 2016 at 13:27, Guido van Rossum wrote: > >> Just don't oversell run_until_complete() -- some people coming from >> slightly different event loop paradigms expect to be able to pump for >> events at any point, possibly causing recursive invocations. That doesn't >> work here (and it's a feature it doesn't). >> > > Ah, interesting - I'd only ever used it for the "synchronous code that > just wants this *particular* operation to run asynchronously in the current > thread" case, which it handles nicely :) > It's really best reserved just for the main() function of an app. 
Anywhere else, you run the risk that your use of the event loop gets hidden in layers of other code (example: some logging backend that writes to a database) and eventually someone will call your function from an async callback or coroutine. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rutsky.vladimir at gmail.com Tue Aug 9 13:24:23 2016 From: rutsky.vladimir at gmail.com (Vladimir Rutsky) Date: Tue, 9 Aug 2016 20:24:23 +0300 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> Message-ID: Hi Yury, Thank you for posting this PEP! As an asyncio-based libraries user, author and contributor I appreciate that the topic of easy writing of asynchronous generators is being covered now. I like how simple writing of async generators will be with this PEP, same with consequent functionality (e.g. async context wrappers). It's great that with this PEP it would be possible to write simple async data processing wrappers without knowledge of a single special method (__aiter__, __anext__, __aenter__, __aexit__), using the same patterns a regular Python user would use in synchronous code:

@async_context_manager
async def connection(uri):
    conn = await connect(uri)
    try:
        yield conn
    finally:
        conn.close()
        await conn.wait_closed()

async def query_data(conn, q):
    cursor = await conn.query(q)
    try:
        row = await cursor.fetch_row()
        while row is not None:
            yield row
            row = await cursor.fetch_row()
    finally:
        cursor.close()
        await cursor.wait_closed()

async with connection(uri) as conn:
    async for item in query_data(conn, q):
        print(item)

And this is as simple as in synchronous code, but made asynchronous. Can you explain the details of canceling asend and await coroutines?
Consider the following example:

async def gen():
    zero = yield 0
    await asyncio.sleep(10)
    one = yield 1

async def f(loop):
    g = gen()

    await g.asend(None)

    send_task = loop.create_task(g.asend('zero'))
    send_task.cancel()

    # What is the state of g now?
    # Can I receive one from g? Or will cancel() lead to a
    # CancelledError thrown from asyncio.sleep(10)?
    one = await g.asend('one')

Will canceling the asend() task result in canceling the generator? Thanks, Vladimir Rutsky From yselivanov.ml at gmail.com Tue Aug 9 14:02:52 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 9 Aug 2016 14:02:52 -0400 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> Message-ID: Hi Vladimir, On 2016-08-09 1:24 PM, Vladimir Rutsky wrote: > Hi Yury, > > Thank you for posting this PEP! > As an asyncio-based libraries user, author and contributor I > appreciate that the topic of easy writing of asynchronous generators is > being covered now. > I like how simple writing of async generators will be with this PEP, > same with consequent functionality (e.g. async context wrappers). Thanks a lot for reaching out! > It's great that with this PEP it would be possible to write simple > async data processing wrappers without knowledge of a single special > method (__aiter__, __anext__, __aenter__, __aexit__), using the same > patterns a regular Python user would use in synchronous code: > [..] Precisely this. > Can you explain the details of canceling asend and await coroutines? > > Consider the following example:
>
> async def gen():
>     zero = yield 0
>     await asyncio.sleep(10)
>     one = yield 1
>
> async def f(loop):
>     g = gen()
>
>     await g.asend(None)
>
>     send_task = loop.create_task(g.asend('zero'))
>     send_task.cancel()
>
>     # What is the state of g now?
>     # Can I receive one from g? Or will cancel() lead to a
>     # CancelledError thrown from asyncio.sleep(10)?
> one = await g.asend('one')
>
> Will canceling the asend() task result in canceling the generator?

Cancelling `asend()` will result in an asyncio.CancelledError being raised from the 'await asyncio.sleep(10)' line inside the async generator:

async def gen():
    try:
        await asyncio.sleep(1)
    except asyncio.CancelledError:
        print('got it')
        raise  # It's OK not to re-raise the error, too!

    yield 123

async def run():
    g = gen()
    t = asyncio.ensure_future(g.asend(None))
    await asyncio.sleep(0.5)
    t.cancel()
    await t  # this line will fail with CancelledError

^ the above code will print "got it" and then crash with an asyncio.CancelledError raised back from 'await t'. You can handle the error in the generator and continue to iterate over it. If you don't catch that exception inside your generator, the generator will be closed (it's similar to what would happen if an exception occurs inside a sync generator, or is thrown into it with gen.throw()). Thank you, Yury From elazarg at gmail.com Tue Aug 9 17:47:19 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Tue, 09 Aug 2016 21:47:19 +0000 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: It's already possible to overload NamedTuple, in a way that will allow the following abuse of notation:

@NamedTuple
def Starship(damage: int, captain: str): pass

The 'def' is unfortunate and potentially confusing (although it *is* a callable definition), and the ": pass" is meaningless. But I think it is clear and concise if you know what NamedTuple is. Introducing a new keyword would of course solve both problems (if there's "async def", why not "type def"? :) ).
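The overload described above can be sketched with a plain decorator. This is a hedged illustration, not the stdlib NamedTuple behavior: `namedtuple_def` is a hypothetical name, and it only uses the documented functional form of typing.NamedTuple under the hood.

```python
import inspect
from typing import NamedTuple

def namedtuple_def(func):
    # Hypothetical decorator: read field names and annotations from the
    # function's signature and build a NamedTuple type from them.
    sig = inspect.signature(func)
    fields = [(name, param.annotation) for name, param in sig.parameters.items()]
    return NamedTuple(func.__name__, fields)

@namedtuple_def
def Starship(damage: int, captain: str): pass

s = Starship(5, 'Kirk')
assert (s.damage, s.captain) == (5, 'Kirk')
assert isinstance(s, tuple)
```
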
case class Starship(damage: int, captain: str)

Possible variations on the decorator theme:

"unordered" namedtuple (note the *):

@Value
def Starship(*, damage: int, captain: str): pass

self-describing (another level of notation abuse):

@Type
def Starship(damage: int, captain: str) -> NamedTuple: pass

On Mon, Aug 8, 2016 at 11:25 PM Guido van Rossum wrote: > That's a very interesting idea and one that deserves pursuing (though I > agree it's not a blocker for the PEP I'm hoping to write). I think the next > step is to prototype this -- which can only happen once we have an > implementation of the PEP. Though perhaps you could start by writing a > prototype that works by having the user write the following:
>
> class Starship(PrototypeNamedTuple):
>     damage = 0
>     captain = "Kirk"
>     __annotations__ = dict(damage=int, captain=str)
>
> It could also benefit from PEP 520 (Preserving Class Attribute Definition > Order). > > Who's game? > > --Guido > > On Mon, Aug 8, 2016 at 1:13 PM, Elazar wrote: > >> class Starship: >>> stats: class Dict[str, int] = {} # Pure class variable >>> damage: class int = 0 # Hybrid class/instance variable >>> captain: str # Pure instance variable >>> >> >> I can't avoid noting that there is an opportunity here to insert >> NamedTuple into the core language. The above example is almost there, >> except it's mutable and without convenient methods. But
>>
>> class Starship(tuple):
>>     damage: int = 0
>>     captain: str = "Kirk"
>>
>> Is an obvious syntax for
>>
>> Starship = NamedTuple('Starship', [('damage', int), ('captain', str)])
>>
>> Only much more available and intuitive to read, use, and of course - type >> check. >> (Of course, it does mean adding semantics to the declaration syntax in >> general) >> >> I'm not really suggesting to make this change now, but I believe it will >> be done, sooner or later. My brief experience with mypy convinced me that >> it must be the case. The new declaration syntax only makes it easier.
>> >> ~Elazar >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Aug 9 18:42:12 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 9 Aug 2016 15:42:12 -0700 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <22437.29809.206551.416746@turnbull.sk.tsukuba.ac.jp> References: <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp> <20160805060648.GU6608@ando.pearwood.info> <22436.41739.428069.534278@turnbull.sk.tsukuba.ac.jp> <20160805162901.GX6608@ando.pearwood.info> <22437.29809.206551.416746@turnbull.sk.tsukuba.ac.jp> Message-ID: Is this idea still alive? Despite the bike shedding, I think that some level of consensus may have been reached. So I suggest that either Neil (because it was your idea) or Steven (because you've had a lot of opinions, and done a lot of the homework) or both, of course, put together a reference implementation and a proposal, post it here, and see how it flies. It's one function, so hopefully won't need a PEP, but if your proposal meets with a lot of resistance, then you could turn it into a PEP then. But getting all this discussion summarized would be good as a first step. NOTE: I think it's a fine idea, but I've got way too many other things I'd like to do first -- so I'm not going to push this forward... -CHB On Fri, Aug 5, 2016 at 10:24 PM, Stephen J.
Turnbull < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: > Steven D'Aprano writes: > > On Fri, Aug 05, 2016 at 11:30:35PM +0900, Stephen J. Turnbull wrote: > > > > > I can even think of a case where clamp could be used with a constant > > > control and a varying bound: S-s inventory control facing occasional > > > large orders in an otherwise continuous, stationary demand process. > > > > Sounds interesting. Is there a link to somewhere I could learn more > > about this? > > The textbook I use is Nancy Stokey, The Economics of Inaction > https://www.amazon.co.jp/s/ref=nb_sb_noss?__mk_ja_JP=%E3% > 82%AB%E3%82%BF%E3%82%AB%E3%83%8A&url=search-alias%3Daps& > field-keywords=nancy+stokey+economics+inaction > > The example I gave is not a textbook example, but is an "obvious" > extension of the simplest textbook models. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mertz at gnosis.cx Tue Aug 9 19:24:55 2016 From: mertz at gnosis.cx (David Mertz) Date: Tue, 9 Aug 2016 16:24:55 -0700 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: References: <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp> <20160805060648.GU6608@ando.pearwood.info> <22436.41739.428069.534278@turnbull.sk.tsukuba.ac.jp> <20160805162901.GX6608@ando.pearwood.info> <22437.29809.206551.416746@turnbull.sk.tsukuba.ac.jp> Message-ID: FWIW. I first opined that people should just write their own utility function. However, there are enough differences of opinion about the right semantics that I support it being in the standard library now. Those decisions may or may not be made the way I most prefer, but I think an "official" behavior in the edge cases is best to have... I can always implement my own version with different behavior if I need to. On Aug 9, 2016 3:43 PM, "Chris Barker" wrote: > Is this idea still alive? > > Despite the bike shedding, I think that some level of consensus may have > been reached. So I suggest that either Neil (because it was your idea) or > Steven (because you've had a lot of opinions, and done a lot of the > homework) or both, of course, put together a reference implementation and a > proposal, post it here, and see how it flies. > > It's one function, so hopefully won't need a PEP, but if your proposal > meets with a lot of resistance, then you could turn it into a PEP then. But > getting all this discussion summarized would be good as a first step. > > NOTE: I think it's a fine idea, but I've got way too many other things I'd > like to do first -- so I'm not going to push this forward... > > -CHB > > > > > On Fri, Aug 5, 2016 at 10:24 PM, Stephen J.
Turnbull < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote: >> Steven D'Aprano writes: >> > On Fri, Aug 05, 2016 at 11:30:35PM +0900, Stephen J. Turnbull wrote: >> > >> > > I can even think of a case where clamp could be used with a constant >> > > control and a varying bound: S-s inventory control facing occasional >> > > large orders in an otherwise continuous, stationary demand process. >> > >> > Sounds interesting. Is there a link to somewhere I could learn more >> > about this? >> >> The textbook I use is Nancy Stokey, The Economics of Inaction >> https://www.amazon.co.jp/s/ref=nb_sb_noss?__mk_ja_JP=%E3%82%AB%E3%82%BF%E3%82%AB%E3%83%8A&url=search-alias%3Daps&field-keywords=nancy+stokey+economics+inaction >> >> The example I gave is not a textbook example, but is an "obvious" >> extension of the simplest textbook models. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Tue Aug 9 19:32:18 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 9 Aug 2016 17:32:18 -0600 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Tue, Aug 9, 2016 at 3:47 PM, Elazar
wrote:
> It's already possible to overload NamedTuple, in a way that will allow the
> following abuse of notation:
>
>     @NamedTuple
>     def Starship(damage: int, captain: str): pass
>
> The 'def' is unfortunate and potentially confusing (although it *is* a
> callable definition), and the ": pass" is meaningless. But I think it is
> clear and concise if you know what NamedTuple is.
>
> Introducing a new keyword will of course solve both problems (if there's
> "async def", why not "type def"? :) ).

If we're dealing with classes then we should be using the class syntax. There are a number of options here for identifying attributes in a definition and even auto-generating parts of the class (e.g. __init__). Let's look at several (with various objectives):

# For the sake of demonstration, we ignore opportunities for type inference.

# currently (with comments for type info)
class Bee(namedtuple('Bee', 'name ancient_injury menagerie')):
    """can a bee be said to be..."""
    # name: str
    # ancient_injury: bool
    # menagerie: bool
    def __new__(cls, name='Eric', ancient_injury=False, menagerie=False):
        return super().__new__(cls, name, ancient_injury, menagerie)
    def half_a(self):
        return self.ancient_injury or self.menagerie

# using attribute annotations and a decorator (and PEP 520)
@as_namedtuple
class Bee:
    """..."""
    name: str = 'Eric'
    ancient_injury: bool = False
    menagerie: bool = False
    def half_a(self): ...

# using attribute annotations and a metaclass (and PEP 520)
class Bee(metaclass=NamedtupleMeta):
    """..."""
    name: str = 'Eric'
    ancient_injury: bool = False
    menagerie: bool = False
    def half_a(self): ...

# using one class decorator and PEP 520 and comments for type info
@as_namedtuple
class Bee:
    name = 'Eric'           # str
    ancient_injury = False  # bool
    menagerie = False       # bool
    def half_a(self): ...
# using one class decorator and comments for type info
@as_namedtuple('name ancient_injury menagerie',
               name='Eric', ancient_injury=False, menagerie=False)
class Bee:
    """..."""
    # name: str
    # ancient_injury: bool
    # menagerie: bool
    def half_a(self): ...

# using one class decorator (and PEP 468) and comments for type info
# similar to the original motivation for PEP 468
@as_namedtuple(name='Eric', ancient_injury=False, menagerie=False)
class Bee:
    """..."""
    # name: str
    # ancient_injury: bool
    # menagerie: bool
    def half_a(self): ...

# using a class decorator for each attribute
@as_namedtuple('name ancient_injury menagerie')
@attr('name', str, 'Eric')
@attr('ancient_injury', bool, False)
@attr('menagerie', bool, False)
class Bee:
    """..."""
    def half_a(self): ...

Those are simple examples and we could certainly come up with others, all using the class syntax. For me, the key is finding the sweet spot between readability/communicating intent and packing too many roles into the class syntax. To be honest, the examples using attribute annotations seem fine to me. Even if you don't need attr type info (which you don't most of the time, particularly with default values), my favorite solution (a class decorator that leverages PEP 520) is still the most readable and in-line with the class syntax, at least for me. :)

-eric

p.s. The same approaches could also be applied to generating non-namedtuple classes, e.g. SimpleNamespace subclasses.
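[Editorial sketch: Eric's favorite option above -- a class decorator driven by attribute annotations -- can be made concrete along the following lines. This is an illustration only: `as_namedtuple` is a hypothetical name, not an existing stdlib helper, and the sketch leans on `namedtuple`'s `defaults` parameter (added in Python 3.7), so it is not something 3.6-era code could ship as-is.]

```python
from collections import namedtuple

def as_namedtuple(cls):
    # Hypothetical decorator: field order comes from the class body's
    # annotations, whose definition order is preserved.
    fields = list(cls.__annotations__)
    # Collect defaults in field order (assumes defaulted fields are last).
    defaults = [getattr(cls, f) for f in fields if hasattr(cls, f)]
    base = namedtuple(cls.__name__, fields, defaults=defaults)
    # Carry methods and the docstring over, but drop the field defaults
    # themselves and the class-machinery descriptors.
    ns = {k: v for k, v in vars(cls).items()
          if k not in fields and k not in ('__dict__', '__weakref__')}
    return type(cls.__name__, (base,), ns)

@as_namedtuple
class Bee:
    """can a bee be said to be..."""
    name: str = 'Eric'
    ancient_injury: bool = False
    menagerie: bool = False
    def half_a(self):
        return self.ancient_injury or self.menagerie
```

Bee() then behaves as an ordinary namedtuple instance -- immutable, unpackable, and carrying the defaults and methods from the class body.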
From steve at pearwood.info Tue Aug 9 20:05:50 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 10 Aug 2016 10:05:50 +1000 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: References: <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp> <20160805060648.GU6608@ando.pearwood.info> <22436.41739.428069.534278@turnbull.sk.tsukuba.ac.jp> <20160805162901.GX6608@ando.pearwood.info> <22437.29809.206551.416746@turnbull.sk.tsukuba.ac.jp> Message-ID: <20160810000548.GB26300@ando.pearwood.info> On Tue, Aug 09, 2016 at 03:42:12PM -0700, Chris Barker wrote: > Is this idea still alive? > > Despite the bike shedding, I think that some level of consensus may have > been reached. So I suggest that either Neil (because it was your idea) or > Steven (because you've had a lot of opinions, and done a lot of the > homework) or both, of course, put together a reference implementation and a > proposal, post it here, and see how it flies. I'm happy to write up a summary and a reference implementation. -- Steve From ethan at stoneleaf.us Tue Aug 9 20:24:23 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 09 Aug 2016 17:24:23 -0700 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <20160810000548.GB26300@ando.pearwood.info> References: <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp> <20160805060648.GU6608@ando.pearwood.info> <22436.41739.428069.534278@turnbull.sk.tsukuba.ac.jp> <20160805162901.GX6608@ando.pearwood.info> <22437.29809.206551.416746@turnbull.sk.tsukuba.ac.jp> <20160810000548.GB26300@ando.pearwood.info> Message-ID: <57AA7437.4070704@stoneleaf.us> On 08/09/2016 05:05 PM, Steven D'Aprano wrote: > On Tue, Aug 09, 2016 at 03:42:12PM -0700, Chris Barker wrote: >> Is this idea still alive? 
>> >> Despite the bike shedding, I think that some level of consensus may have >> been reached. So I suggest that either Neil (because it was your idea) or >> Steven (because you've had a lot of opinions, and done a lot of the >> homework) or both, of course, put together a reference implementation and a >> proposal, post it here, and see how it flies. > > I'm happy to write up a summary and a reference implementation. Excellent! I'm looking forward to it. -- ~Ethan~ From mistersheik at gmail.com Wed Aug 10 00:32:20 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Tue, 9 Aug 2016 21:32:20 -0700 (PDT) Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <20160810000548.GB26300@ando.pearwood.info> References: <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp> <20160805060648.GU6608@ando.pearwood.info> <22436.41739.428069.534278@turnbull.sk.tsukuba.ac.jp> <20160805162901.GX6608@ando.pearwood.info> <22437.29809.206551.416746@turnbull.sk.tsukuba.ac.jp> <20160810000548.GB26300@ando.pearwood.info> Message-ID: <4805751d-6ca1-481f-9b9f-855324a9aad7@googlegroups.com> Thank you! On Tuesday, August 9, 2016 at 8:07:54 PM UTC-4, Steven D'Aprano wrote: > > On Tue, Aug 09, 2016 at 03:42:12PM -0700, Chris Barker wrote: > > Is this idea still alive? > > > > Despite the bike shedding, I think that some level of consensus may have > > been reached. So I suggest that either Neil (because it was your idea) > or > > Steven (because you've had a lot of opinions, and done a lot of the > > homework) or both, of course, put together a reference implementation > and a > > proposal, post it here, and see how it flies. > > I'm happy to write up a summary and a reference implementation. > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python... 
at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
> -------------- next part -------------- An HTML attachment was scrubbed... URL: From xavier.combelle at gmail.com Wed Aug 10 02:46:43 2016 From: xavier.combelle at gmail.com (Xavier Combelle) Date: Wed, 10 Aug 2016 08:46:43 +0200 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: References: <8457d50f-2ea4-47dd-ab44-f3bf28fe8bb3@googlegroups.com> Message-ID: I just stumbled upon this precise use case yesterday; I solved it unsatisfactorily with the following code (inlined):

value = max(lower, value)
value = min(upper, value)

So it's certainly a good thing to have. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Wed Aug 10 00:46:01 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Tue, 9 Aug 2016 21:46:01 -0700 (PDT) Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: <6985f794-7972-4c61-9d18-c46fe668beeb@googlegroups.com> With PEP 520 accepted, would it be possible to iterate over __definition_order__?

class PrototypeNamedTuple:
    cache = {}
    def __new__(cls, *args):
        P = PrototypeNamedTuple
        if cls not in P.cache:
            P.cache[cls] = typing.NamedTuple(
                cls.__name__,
                [(definition, cls.__annotations__[definition])
                 for definition in cls.__definition_order__])
        return P.cache[cls](*args)

On Monday, August 8, 2016 at 5:09:50 PM UTC-4, Guido van Rossum wrote:
>
> Hm, overlooking the ordering is kind of a big deal for something with
> "tuple" in its name. :-)
>
> Also it feels like it needs a metaclass instead of a cache.
>
> Maybe from this we can learn though that __annotations__ should be an
> OrderedDict?
>
> On Mon, Aug 8, 2016 at 1:58 PM, ?????
> > wrote:
>
>> class PrototypeNamedTuple:
>>     cache = {}
>>     def __new__(cls, *args):
>>         P = PrototypeNamedTuple
>>         if cls not in P.cache:
>>             P.cache[cls] = typing.NamedTuple(cls.__name__,
>>                                              cls.__annotations__.items())
>>         return P.cache[cls](*args)
>>
>> Works modulo ordering, though I'm not sure that's the right way to do it.
>>
>> The ordering part of namedtuple is orthogonal to the
>> value-type/immutability part. So I would imagine making "Value" for the
>> latter, "tuple" for the former, and namedtuple is mixing both (possibly
>> given a convenient name, such as PrototypeNamedTuple). "Value" can also be
>> seen as mixing "Struct" and "Immutable", but that's overdoing it I guess.
>>
>> ~Elazar
>>
>> On Mon, Aug 8, 2016 at 11:25 PM Guido van Rossum wrote:
>>
>>> That's a very interesting idea and one that deserves pursuing (though I
>>> agree it's not a blocker for the PEP I'm hoping to write). I think the next
>>> step is to prototype this -- which can only happen once we have an
>>> implementation of the PEP. Though perhaps you could start by writing a
>>> prototype that works by having the user write the following:
>>>
>>> class Starship(PrototypeNamedTuple):
>>>     damage = 0
>>>     captain = "Kirk"
>>>     __annotations__ = dict(damage=int, captain=str)
>>>
>>> It could also benefit from PEP 520 (Preserving Class Attribute
>>> Definition Order).
>>>
>>> Who's game?
>>>
>>> --Guido
>>>
>>> On Mon, Aug 8, 2016 at 1:13 PM, ????? wrote:
>>>
>>>> class Starship:
>>>>>     stats: class Dict[str, int] = {}  # Pure class variable
>>>>>     damage: class int = 0  # Hybrid class/instance variable
>>>>>     captain: str  # Pure instance variable
>>>>>
>>>>
>>>> I can't avoid noting that there is an opportunity here to insert
>>>> NamedTuple into the core language. The above example is almost there,
>>>> except it's mutable and without convenient methods.
But >>>> >>>> class Starship(tuple): >>>> damage: int = 0 >>>> captain: str = "Kirk" >>>> >>>> Is an obvious syntax for >>>> >>>> Starship = NamedTuple('Starship', [('damage', int), ('captain', >>>> str)]) >>>> >>>> Only much more available and intuitive to read, use, and of course - >>>> type check. >>>> (Of course, it does mean adding semantics to the declaration syntax in >>>> general) >>>> >>>> I'm not really suggesting to make this change now, but I believe it >>>> will be done, sooner or later. My brief experience with mypy convinced me >>>> that it must be the case. The new declaration syntax only makes it easier. >>>> >>>> ~Elazar >>>> >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python... at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>>> >>> >>> >>> >>> -- >>> --Guido van Rossum (python.org/~guido) >>> >> > > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Wed Aug 10 13:17:48 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 10 Aug 2016 19:17:48 +0200 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com> <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de> <8acf8247-1b03-5f0f-0a74-a235f486e956@gmail.com> <78fb06dc-ee03-35ad-8b8f-9852ba14fd1e@mail.de> Message-ID: On 09.08.2016 05:23, Nick Coghlan wrote: > On 9 August 2016 at 08:37, Sven R. Kunze > wrote: > > From what I've heard in the wild, that most if not all pieces of > async are mirroring existing Python features. So, building async > basically builds a parallel structure in Python resembling Python. > Async generators complete the picture. 
Some people (including me) > are concerned by this because they feel that having two "almost > the same pieces" is not necessarily a good thing to have. And not > necessarily bad but it feels like duplicating code all over the > place especially as existing functions are incompatible with async. > > It's a known problem that applies to programming language design in > general rather than being Python specific: > http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ If it's such a well-known **problem**, why does it even exist in Python in the first place? ;-) I don't buy that one couldn't have avoided it. Lately, I talked to a friend of mine about async and his initial reaction was like "hmm that reminds me of early programming days, where you have to explicitly tell the scheduler to get control back". He's much older than me, so I think it was interesting for him to see that history is repeating itself.
Maybe, it's too early > for a discussion about this but I wanted to dump this thought > somewhere. :) > > > One of the big mindset shifts it encourages is to design as many > support libraries as possible as computational pipelines and > message-drive state machines, rather than coupling them directly to IO > operations (which is the way many of them work today). Brett Cannon > started the Sans IO information project to discuss this concept at > http://sans-io.readthedocs.io/ Interesting shift indeed and I like small Lego bricks I can use to build a big house. However, couldn't this be achieved without splitting the community? Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Wed Aug 10 13:42:21 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 10 Aug 2016 13:42:21 -0400 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> Message-ID: <82ab71d1-8139-e39d-cf49-d54088c3086f@gmail.com> Hi Brett, On 2016-08-10 12:27 PM, Brett Cannon wrote: > [..] > > According to the inspect module that's a coroutine function that > creates a coroutine/awaitable (and a function w/ @types.coroutine is > just an awaitable when it contains a `yield`). Now the existence of > `yield` in a function causing it to be a generator is obviously not a > new concept, but since coroutines come from a weird generator > background thanks to `yield from` we might need to start being very > clear what @types.coroutine, `async def`, and `async def` w/ `yield` > are -- awaitable (but not coroutine in spite of the decorator name), > coroutine, and async generator, respectively -- and not refer to > @types.coroutine/`yield from` in the same breath to minimize confusion. > Good suggestion. FWIW, in PEP 492, we called awaitables created with @types.coroutine as "generator-based coroutines". 
These days I see less and less people use @asyncio.coroutine and 'yield from'. Even less so know about @types.coroutine and how async/await tied to generators in the interpreter. This knowledge is now only required for people who implement/maintain frameworks like asyncio/twisted/curio. So I hope that we're already at the stage when people can just use async/await without really thinking how it's implemented. PEP 525 is another step in that direction -- making asynchronous iterators easy to use. Yury From guido at python.org Wed Aug 10 13:56:25 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Aug 2016 10:56:25 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: <6985f794-7972-4c61-9d18-c46fe668beeb@googlegroups.com> References: <6985f794-7972-4c61-9d18-c46fe668beeb@googlegroups.com> Message-ID: Sounds like you're thinking with your runtime hat on, not your type checker hat. :-) On Tue, Aug 9, 2016 at 9:46 PM, Neil Girdhar wrote: > With PEP 520 accepted, would it be possible to iterate over > __definition_order__? > > class PrototypeNamedTuple: > cache = {} > def __new__(cls, *args): > P = PrototypeNamedTuple > if cls not in P.cache: > P.cache[cls] = typing.NamedTuple(cls.__name__, > > [(definition, cls.__annotations__[definition]) > for definition in cls.__definition_order__] > > ) > return P.cache[cls](*args) > > On Monday, August 8, 2016 at 5:09:50 PM UTC-4, Guido van Rossum wrote: >> >> Hm, overlooking the ordering is kind of a big deal for something with >> "tuple" in its name. :-) >> >> Also it feels like it needs a metaclass instead of a cache. >> >> Maybe from this we can learn though that __anotations__ should be an >> OrderedDict? >> >> On Mon, Aug 8, 2016 at 1:58 PM, ????? 
wrote: >> >>> class PrototypeNamedTuple: >>> cache = {} >>> def __new__(cls, *args): >>> P = PrototypeNamedTuple >>> if cls not in P.cache: >>> P.cache[cls] = typing.NamedTuple(cls.__name__, >>> cls.__annotations__.items()) >>> return P.cache[cls](*args) >>> >>> Works modulo ordering, though I'm not sure that's the right way to do it. >>> >>> The ordering part of namedtuple is orthogonal to the >>> value-type/immutability part. So I would imagine making "Value" for the >>> latter, "tuple" for the former, and namedtuple is mixing both (possibly >>> given a convenient name, such as PrototypeNamedTuple). "Value" can also >>> seen as mixing "Struct" and "Immutable", but that's overdoing it I guess. >>> >>> ~Elazar >>> >>> On Mon, Aug 8, 2016 at 11:25 PM Guido van Rossum >>> wrote: >>> >>>> That's a very interesting idea and one that deserves pursuing (though I >>>> agree it's not a blocker for the PEP I'm hoping to write). I think the next >>>> step is to prototype this -- which can only happen once we have an >>>> implementation of the PEP. Though perhaps you could start by writing a >>>> prototype that works by having the user write the following: >>>> >>>> class Starship(PrototypeNamedTuple): >>>> damage = 0 >>>> captain = "Kirk" >>>> __annotations__ = dict(damage=int, captain=str) >>>> >>>> It could also benefit from PEP 520 (Preserving Class Attribute >>>> Definition Order). >>>> >>>> Who's game? >>>> >>>> --Guido >>>> >>>> On Mon, Aug 8, 2016 at 1:13 PM, ????? wrote: >>>> >>>>> class Starship: >>>>>> stats: class Dict[str, int] = {} # Pure class variable >>>>>> damage: class int = 0 # Hybrid class/instance variable >>>>>> captain: str # Pure instance variable >>>>>> >>>>> >>>>> I can't avoid noting that there is an opportunity here to insert >>>>> NamedTuple into the core language. The above example is almost there, >>>>> except it's mutable and without convenient methods. 
But >>>>> >>>>> class Starship(tuple): >>>>> damage: int = 0 >>>>> captain: str = "Kirk" >>>>> >>>>> Is an obvious syntax for >>>>> >>>>> Starship = NamedTuple('Starship', [('damage', int), ('captain', >>>>> str)]) >>>>> >>>>> Only much more available and intuitive to read, use, and of course - >>>>> type check. >>>>> (Of course, it does mean adding semantics to the declaration syntax in >>>>> general) >>>>> >>>>> I'm not really suggesting to make this change now, but I believe it >>>>> will be done, sooner or later. My brief experience with mypy convinced me >>>>> that it must be the case. The new declaration syntax only makes it easier. >>>>> >>>>> ~Elazar >>>>> >>>>> _______________________________________________ >>>>> Python-ideas mailing list >>>>> Python... at python.org >>>>> https://mail.python.org/mailman/listinfo/python-ideas >>>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>>>> >>>> >>>> >>>> >>>> -- >>>> --Guido van Rossum (python.org/~guido) >>>> >>> >> >> >> -- >> --Guido van Rossum (python.org/~guido) >> > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve.dower at python.org Wed Aug 10 14:10:53 2016 From: steve.dower at python.org (Steve Dower) Date: Wed, 10 Aug 2016 11:10:53 -0700 Subject: [Python-ideas] Fix default encodings on Windows Message-ID: <57AB6E2D.6050704@python.org> I suspect there's a lot of discussion to be had around this topic, so I want to get it started. There are some fairly drastic ideas here and I need help figuring out whether the impact outweighs the value. Some background: within the Windows API, the preferred encoding is UTF-16. This is a 16-bit format that is typed as wchar_t in the APIs that use it. These APIs are generally referred to as the *W APIs (because they have a W suffix). There are also (broadly deprecated) APIs that use an 8-bit format (char), where the encoding is assumed to be "the user's active code page". 
These are *A APIs. AFAIK, there are no cases where a *A API should be preferred over a *W API, and many newer APIs are *W only. In general, Python passes byte strings into the *A APIs and text strings into the *W APIs. Right now, sys.getfilesystemencoding() on Windows returns "mbcs", which translates to "the system's active code page". As this encoding generally cannot represent all paths on Windows, it is deprecated and Unicode strings are recommended instead. This, however, means you need to write significantly different code between POSIX (use bytes) and Windows (use text). ISTM that changing sys.getfilesystemencoding() on Windows to "utf-8" and updating path_converter() (Python/posixmodule.c; likely similar code in other places) to decode incoming byte strings would allow us to undeprecate byte strings and add the requirement that they *must* be encoded with sys.getfilesystemencoding(). I assume that this would allow cross-platform code to handle paths similarly by encoding to whatever the sys module says they should and using bytes consistently (starting this thread is meant to validate/refute my assumption). (Yes, I know that people on POSIX should just change to using Unicode and surrogateescape. Unfortunately, rather than doing that they complain about Windows and drop support for the platform. If you want to keep hitting them with the stick, go ahead, but I'm inclined to think the carrot is more valuable here.) Similarly, locale.getpreferredencoding() on Windows returns a legacy value - the user's active code page - which should generally not be used for any reason. The one exception is as a default encoding for opening files when no other information is available (e.g. a Unicode BOM or explicit encoding argument). BOMs are very common on Windows, since the default assumption is nearly always a bad idea. 
Making open()'s default encoding detect a BOM before falling back to locale.getpreferredencoding() would resolve many issues, but I'm also inclined towards making the fallback utf-8, leaving locale.getpreferredencoding() solely as a way to get the active system codepage (with suitable warnings about it only being useful for back-compat). This would match the behavior that the .NET Framework has used for many years - effectively, utf_8_sig on read and utf_8 on write. Finally, the encoding of stdin, stdout and stderr are currently (correctly) inferred from the encoding of the console window that Python is attached to. However, this is typically a codepage that is different from the system codepage (i.e. it's not mbcs) and is almost certainly not Unicode. If users are starting Python from a console, they can use "chcp 65001" first to switch to UTF-8, and then *most* functionality works (input() has some issues, but those can be fixed with a slight rewrite and possibly breaking readline hooks). It is also possible for Python to change the current console encoding to be UTF-8 on initialize and change it back on finalize. (This would leave the console in an unexpected state if Python segfaults, but console encoding is probably the least of anyone's worries at that point.) So I'm proposing actively changing the current console to be Unicode while Python is running, and hence sys.std[in|out|err] will default to utf-8. So that's a broad range of changes, and I have little hope of figuring out all the possible issues, back-compat risks, and flow-on effects on my own. Please let me know (either on-list or off-list) how a change like this would affect your projects, either positively or negatively, and whether you have any specific experience with these changes/fixes and think they should be approached differently. 
To summarise the proposals (remembering that these would only affect Python 3.6 on Windows):

* change sys.getfilesystemencoding() to return 'utf-8'
* automatically decode byte paths assuming they are utf-8
* remove the deprecation warning on byte paths
* make the default open() encoding check for a BOM or else use utf-8
* [ALTERNATIVE] make the default open() encoding check for a BOM or else use locale.getpreferredencoding()
* force the console encoding to UTF-8 on initialize and revert on finalize

So what are your concerns? Suggestions? Thanks, Steve

From p.f.moore at gmail.com Wed Aug 10 14:44:02 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 10 Aug 2016 19:44:02 +0100 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <57AB6E2D.6050704@python.org> References: <57AB6E2D.6050704@python.org> Message-ID: On 10 August 2016 at 19:10, Steve Dower wrote:
> To summarise the proposals (remembering that these would only affect Python
> 3.6 on Windows):
>
> * change sys.getfilesystemencoding() to return 'utf-8'
> * automatically decode byte paths assuming they are utf-8
> * remove the deprecation warning on byte paths
> * make the default open() encoding check for a BOM or else use utf-8
> * [ALTERNATIVE] make the default open() encoding check for a BOM or else use
> locale.getpreferredencoding()
> * force the console encoding to UTF-8 on initialize and revert on finalize
>
> So what are your concerns? Suggestions?

I presume you'd be targeting 3.7 for this change. Broadly, I'm +1 on all of this. Personally, I'm moving to UTF-8 everywhere, so it seems OK to me, but I suspect defaulting open() to UTF-8 in the absence of a BOM might cause issues for people. Most text editors still (AFAIK) use the ANSI codepage by default, and it's the one place where an identifying BOM isn't possible. So your alternative may be a safer choice.
On the other hand, files from Unix (via say github) would typically be UTF-8 without BOM, so it becomes a question of choosing the best compromise. I'm inclined to go for cross-platform and UTF-8 and clearly document the change. We might want a more convenient short form for open(filename, "r", encoding=locale.getpreferredencoding()), though, to ease the transition... We'd also need to consider how the new default encoding would interact with PYTHONIOENCODING. For the console, does this mean that the win_unicode_console module will no longer be needed when these changes go in? Sorry, not much in the way of direct experience or information I can add, but a strong +1 on the change (and I'd be happy to help where needed). Paul

From random832 at fastmail.com Wed Aug 10 14:46:25 2016 From: random832 at fastmail.com (Random832) Date: Wed, 10 Aug 2016 14:46:25 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <57AB6E2D.6050704@python.org> References: <57AB6E2D.6050704@python.org> Message-ID: <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> On Wed, Aug 10, 2016, at 14:10, Steve Dower wrote:
> To summarise the proposals (remembering that these would only affect
> Python 3.6 on Windows):
>
> * change sys.getfilesystemencoding() to return 'utf-8'
> * automatically decode byte paths assuming they are utf-8
> * remove the deprecation warning on byte paths

Why? What's the use case?

> * make the default open() encoding check for a BOM or else use utf-8
> * [ALTERNATIVE] make the default open() encoding check for a BOM or else
> use locale.getpreferredencoding()

For reading, I assume. When opened for writing, it should probably be utf-8-sig [if it's not mbcs] to match what Notepad does. What about files opened for appending or updating? In theory it could ingest the whole file to see if it's valid UTF-8, but that has a time cost.
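[Editorial aside: a whole-file validity check need not hold the file in memory -- an incremental decoder can stream it in fixed-size chunks, so the cost is one linear pass with bounded memory. A rough sketch, illustrative only and not a proposal for the actual io implementation:]

```python
import codecs

def is_valid_utf8(path, chunk_size=1 << 16):
    """Stream-validate a file as UTF-8 without reading it whole."""
    decoder = codecs.getincrementaldecoder('utf-8')()
    try:
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                decoder.decode(chunk)  # raises on an invalid byte
        decoder.decode(b'', final=True)  # flush: rejects a truncated sequence
    except UnicodeDecodeError:
        return False
    return True
```

The incremental decoder buffers a multi-byte sequence split across chunk boundaries, so chunking never produces a false negative; the final flush catches a sequence cut off at end-of-file.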
Notepad, if there's no BOM, checks the first 256 bytes of the file for whether it's likely to be utf-16 or mbcs [utf-8 isn't considered AFAIK], and can get it wrong for certain very short files [i.e. the infamous "this app can break"] What to do on opening a pipe or device? [Is os.fstat able to detect these cases?] Maybe the BOM detection phase should be deferred until the first read. What should encoding be at that point if this is done? Is there a "utf-any" encoding that can handle all five BOMs? If not, should there be? how are "utf-16" and "utf-32" files opened for appending or updating handled today? > * force the console encoding to UTF-8 on initialize and revert on > finalize Why not implement a true unicode console? What if sys.stdin/stdout are pipes (or non-console devices such as a serial port)? From steve.dower at python.org Wed Aug 10 15:08:48 2016 From: steve.dower at python.org (Steve Dower) Date: Wed, 10 Aug 2016 12:08:48 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> Message-ID: <57AB7BC0.4010804@python.org> On 10Aug2016 1144, Paul Moore wrote: > I presume you'd be targeting 3.7 for this change. Does 3.6 seem too aggressive? I think I have time to implement the changes before beta 1, as it's mostly changing default values and mopping up resulting breaks. (Doing something like reimplementing files using the Win32 API rather than the CRT would be too big a task for 3.6.) > Most text editors still (AFAIK) use > the ANSI codepage by default, and it's the one place where an > identifying BOM isn't possible. So your alternative may be a safer > choice. On the other hand, files from Unix (via say github) would > typically be UTF-8 without BOM, so it becomes a question of choosing > the best compromise. I'm inclined to go for cross-platform and UTF-8 > and clearly document the change. That last point was my thinking. 
Notepad's default is just as bad as Python's default right now, but basically everyone acknowledges that it's bad. I don't think we should prevent Python from behaving better because one Windows tool doesn't.

> We might want a more convenient short
> form for open(filename, "r", encoding=locale.getpreferredencoding()),
> though, to ease the transition... We'd also need to consider how the
> new default encoding would interact with PYTHONIOENCODING.

PYTHONIOENCODING doesn't affect locale.getpreferredencoding() (but it does affect sys.std*.encoding).

> For the console, does this mean that the win_unicode_console module
> will no longer be needed when these changes go in?

That's the hope, though that module approaches the solution differently and may still have uses. An alternative way for us to fix this whole thing would be to bring win_unicode_console into the standard library and use it by default (or probably whenever PYTHONIOENCODING is not specified).

> Sorry, not much in the way of direct experience or information I can
> add, but a strong +1 on the change (and I'd be happy to help where
> needed).
Testing with obscure filenames and strings is where help will be needed most :) Cheers, Steve From steve.dower at python.org Wed Aug 10 15:22:20 2016 From: steve.dower at python.org (Steve Dower) Date: Wed, 10 Aug 2016 12:22:20 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> Message-ID: <57AB7EEC.2020201@python.org> On 10Aug2016 1146, Random832 wrote: > On Wed, Aug 10, 2016, at 14:10, Steve Dower wrote: >> To summarise the proposals (remembering that these would only affect >> Python 3.6 on Windows): >> >> * change sys.getfilesystemencoding() to return 'utf-8' >> * automatically decode byte paths assuming they are utf-8 >> * remove the deprecation warning on byte paths > > Why? What's the use case? Allowing library developers who support POSIX and Windows to just use bytes everywhere to represent paths. >> * make the default open() encoding check for a BOM or else use utf-8 >> * [ALTERNATIVE] make the default open() encoding check for a BOM or else >> use sys.getpreferredencoding() > > For reading, I assume. When opened for writing, it should probably be > utf-8-sig [if it's not mbcs] to match what Notepad does. What about > files opened for appending or updating? In theory it could ingest the > whole file to see if it's valid UTF-8, but that has a time cost. Writing out the BOM automatically basically makes your files incompatible with other platforms, which rarely expect a BOM. By omitting it but writing and reading UTF-8 we ensure that Python can handle its own files on any platform, while potentially upsetting some older applications on Windows or platforms that don't assume UTF-8 as a default. 
> Notepad, if there's no BOM, checks the first 256 bytes of the file for > whether it's likely to be utf-16 or mbcs [utf-8 isn't considered AFAIK], > and can get it wrong for certain very short files [i.e. the infamous > "this app can break"] Yeah, this is a pretty horrible idea :) I don't want to go there by default, but people can install chardet if they want the functionality. > What to do on opening a pipe or device? [Is os.fstat able to detect > these cases?] We should be able to detect them, but why treat them any differently from a file? Right now they're just as broken as they will be after the change if you aren't specifying 'b' or an encoding - probably more broken, since at least you'll get less encoding errors when the encoding is UTF-8. > Maybe the BOM detection phase should be deferred until the first read. > What should encoding be at that point if this is done? Is there a > "utf-any" encoding that can handle all five BOMs? If not, should there > be? how are "utf-16" and "utf-32" files opened for appending or updating > handled today? Yes, I think it would be. I suspect we'd have to leave the encoding unknown until the first read, and perhaps force it to utf-8-sig if someone asks before we start. I don't *think* this is any less predictable than the current behaviour, given it only applies when you've left out any encoding specification, but maybe it is. It probably also entails opening the file descriptor in bytes mode, which might break programs that pass the fd directly to CRT functions. Personally I wish they wouldn't, but it's too late to stop them now. >> * force the console encoding to UTF-8 on initialize and revert on >> finalize > > Why not implement a true unicode console? What if sys.stdin/stdout are > pipes (or non-console devices such as a serial port)? Mostly because it's much more work. 
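[To make the BOM-or-fallback proposal above concrete: a minimal sketch of what the default could do, written as a standalone helper. `open_sniffed` is a hypothetical name for illustration, not an API from the proposal; it checks the five recognised BOMs eagerly rather than deferring to the first read.]

```python
import codecs

# UTF-32 BOMs must be tested before UTF-16, because the little-endian
# UTF-16 BOM (FF FE) is a prefix of the little-endian UTF-32 BOM
# (FF FE 00 00). The generic "utf-16"/"utf-32" codecs and "utf-8-sig"
# all consume the BOM themselves on decode.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def open_sniffed(path, fallback="utf-8"):
    """Open *path* for text reading, using its BOM if one is present."""
    with open(path, "rb") as f:
        prefix = f.read(4)  # longest BOM is 4 bytes
    for bom, name in _BOMS:
        if prefix.startswith(bom):
            return open(path, "r", encoding=name)
    return open(path, "r", encoding=fallback)
```

[Note this only works for seekable files, which is exactly the pipe/device problem raised above; for those, the sniff would have to be deferred or skipped.]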
As I mentioned in my other post, an alternative would be to bring win_unicode_console into the stdlib and enable it by default (which considering the package was largely developed on bugs.p.o is probably okay, but we'd probably need to rewrite it in C, which is basically implementing a true Unicode console). You're right that changing the console encoding after launching Python is probably going to mess with pipes. We can detect whether the streams are interactive or not and adjust accordingly, but that's going to get messy if you're only piping in/out and stdin/out end up with different encodings. I'll put some more thought into this part. Thanks, Steve From p.f.moore at gmail.com Wed Aug 10 15:23:00 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 10 Aug 2016 20:23:00 +0100 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <57AB7BC0.4010804@python.org> References: <57AB6E2D.6050704@python.org> <57AB7BC0.4010804@python.org> Message-ID: On 10 August 2016 at 20:08, Steve Dower wrote: > On 10Aug2016 1144, Paul Moore wrote: >> >> I presume you'd be targeting 3.7 for this change. > > Does 3.6 seem too aggressive? I think I have time to implement the changes > before beta 1, as it's mostly changing default values and mopping up > resulting breaks. (Doing something like reimplementing files using the Win32 > API rather than the CRT would be too big a task for 3.6.) I guess I just assumed it was a bigger change than that. I don't object to it going into 3.6 as such, but it might need longer for any debates to die down. I guess that comes down to how big this thread gets, though. Personally, I'd be OK with it being in 3.6, we'll see if others think it's too aggressive :-) > Testing with obscure filenames and strings is where help will be needed most > :) I'm happy to invent hard cases for you, but I'm in the UK. 
For real use, the Euro symbol's about as obscure as we get around here ;-) Paul From random832 at fastmail.com Wed Aug 10 15:26:21 2016 From: random832 at fastmail.com (Random832) Date: Wed, 10 Aug 2016 15:26:21 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <57AB7BC0.4010804@python.org> References: <57AB6E2D.6050704@python.org> <57AB7BC0.4010804@python.org> Message-ID: <1470857181.1525666.691718929.251B8B35@webmail.messagingengine.com> On Wed, Aug 10, 2016, at 15:08, Steve Dower wrote: > Testing with obscure filenames and strings is where help will be needed > most :) How about filenames with invalid surrogates? For added fun, consider that the file system encoding is normally used with surrogateescape. From steve.dower at python.org Wed Aug 10 15:39:19 2016 From: steve.dower at python.org (Steve Dower) Date: Wed, 10 Aug 2016 12:39:19 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <1470857181.1525666.691718929.251B8B35@webmail.messagingengine.com> References: <57AB6E2D.6050704@python.org> <57AB7BC0.4010804@python.org> <1470857181.1525666.691718929.251B8B35@webmail.messagingengine.com> Message-ID: <57AB82E7.40205@python.org> On 10Aug2016 1226, Random832 wrote: > On Wed, Aug 10, 2016, at 15:08, Steve Dower wrote: >> Testing with obscure filenames and strings is where help will be needed >> most :) > > How about filenames with invalid surrogates? For added fun, consider > that the file system encoding is normally used with surrogateescape. This is where it gets extra fun, since surrogateescape is not normally used on Windows because we receive paths as Unicode text and pass them back as Unicode text without ever encoding or decoding them. Currently a broken filename (such as '\udee1.txt') can be correctly seen with os.listdir('.') but not os.listdir(b'.') (because Windows will return it as '?.txt'). 
It can be passed to open(), but encoding the name to utf-8 or utf-16 fails, and I doubt there's any encoding that is going to succeed. As far as I can tell, if you get a weird name in bytes today you are broken, and there is no way to be unbroken without doing the actual right thing and converting paths on POSIX into Unicode with surrogateescape. So our official advice has to stay the same - treating paths as text with smuggled bytes is the *only* way to be truly correct. But unless we also deprecate byte paths on POSIX, we'll never get there. (Now there's a dangerous idea ;) ) Cheers, Steve From random832 at fastmail.com Wed Aug 10 16:09:05 2016 From: random832 at fastmail.com (Random832) Date: Wed, 10 Aug 2016 16:09:05 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <57AB7EEC.2020201@python.org> References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> Message-ID: <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> On Wed, Aug 10, 2016, at 15:22, Steve Dower wrote: > > Why? What's the use case? [byte paths] > > Allowing library developers who support POSIX and Windows to just use > bytes everywhere to represent paths. Okay, how is that use case impacted by it being mbcs instead of utf-8? What about only doing the deprecation warning if non-ascii bytes are present in the value? > > For reading, I assume. When opened for writing, it should probably be > > utf-8-sig [if it's not mbcs] to match what Notepad does. What about > > files opened for appending or updating? In theory it could ingest the > > whole file to see if it's valid UTF-8, but that has a time cost. > > Writing out the BOM automatically basically makes your files > incompatible with other platforms, which rarely expect a BOM. Yes but you're not running on other platforms, you're running on the platform you're running on. 
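[The smuggled-bytes behaviour described above can be shown in a few lines on any platform; a minimal sketch of the `surrogateescape` round-trip, not specific to Windows paths. The filename bytes are an arbitrary example.]

```python
# Bytes that are valid Latin-1 but not valid UTF-8.
raw = b"caf\xe9.txt"

# Decoding with surrogateescape smuggles the bad byte through as a
# lone surrogate instead of raising UnicodeDecodeError...
name = raw.decode("utf-8", "surrogateescape")
assert name == "caf\udce9.txt"

# ...and encoding with the same handler round-trips the exact bytes.
assert name.encode("utf-8", "surrogateescape") == raw

# A strict encode of the smuggled string fails, which is why such
# names cannot be represented in any real encoding.
try:
    name.encode("utf-8")
    raise AssertionError("expected UnicodeEncodeError")
except UnicodeEncodeError:
    pass
```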
If files need to be moved between platforms, converting files with a BOM to without ought to be the responsibility of the same tool that converts CRLF line endings to LF. > By > omitting it but writing and reading UTF-8 we ensure that Python can > handle its own files on any platform, while potentially upsetting some > older applications on Windows or platforms that don't assume UTF-8 as a > default. Okay, you haven't addressed updating and appending. I realized after posting that updating should be in binary, but that leaves appending. Should we detect BOMs and/or attempt to detect the encoding by other means in those cases? > > Notepad, if there's no BOM, checks the first 256 bytes of the file for > > whether it's likely to be utf-16 or mbcs [utf-8 isn't considered AFAIK], > > and can get it wrong for certain very short files [i.e. the infamous > > "this app can break"] > > Yeah, this is a pretty horrible idea :) Eh, maybe the utf-16 because it can give some hilariously bad results, but using it to differentiate between utf-8 and mbcs might not be so bad. But what to do if all we see is ascii? > > What to do on opening a pipe or device? [Is os.fstat able to detect > > these cases?] > > We should be able to detect them, but why treat them any differently > from a file? Eh, I was mainly concerned about if the first few bytes aren't a BOM? What about blocking waiting for them? But if this is delayed until the first read then it's fine. > It probably also entails opening the file descriptor in bytes mode, > which might break programs that pass the fd directly to CRT functions. > Personally I wish they wouldn't, but it's too late to stop them now. The only thing O_TEXT does rather than O_BINARY is convert CRLF line endings (and maybe end on ^Z), and I don't think we even expose the constants for the CRT's unicode modes. 
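[The O_TEXT-versus-O_BINARY distinction mentioned above has a portable analogue one layer up, in Python's own text mode, which translates line endings regardless of platform. A small sketch; the temp file name is arbitrary.]

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "nl.txt")

# Write CRLF line endings explicitly, with newline="" to bypass any
# translation on the way out.
with open(path, "w", newline="") as f:
    f.write("a\r\nb\r\n")

# Default text mode ("universal newlines") hands back plain '\n',
# much like the CRT's O_TEXT translation...
with open(path, "r") as f:
    assert f.read() == "a\nb\n"

# ...while binary mode, like O_BINARY, exposes the raw bytes.
with open(path, "rb") as f:
    assert f.read() == b"a\r\nb\r\n"
```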
From eryksun at gmail.com Wed Aug 10 16:16:57 2016 From: eryksun at gmail.com (eryk sun) Date: Wed, 10 Aug 2016 20:16:57 +0000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <57AB6E2D.6050704@python.org> References: <57AB6E2D.6050704@python.org> Message-ID: On Wed, Aug 10, 2016 at 6:10 PM, Steve Dower wrote: > Similarly, locale.getpreferredencoding() on Windows returns a legacy value - > the user's active code page - which should generally not be used for any > reason. The one exception is as a default encoding for opening files when no > other information is available (e.g. a Unicode BOM or explicit encoding > argument). BOMs are very common on Windows, since the default assumption is > nearly always a bad idea. The CRT doesn't allow UTF-8 as a locale encoding because Windows itself doesn't allow this. So locale.getpreferredencoding() can't change, but in practice it can be ignored. Speaking of locale, Windows Python should call setlocale(LC_CTYPE, "") in pylifecycle.c in order to work around an inconsistency between LC_TIME and LC_CTYPE in the default "C" locale. The former is ANSI while the latter is effectively Latin-1, which leads to mojibake in time.tzname and elsewhere. Calling setlocale(LC_CTYPE, "") is already done on most Unix systems, so this would actually improve cross-platform consistency. > Finally, the encoding of stdin, stdout and stderr are currently (correctly) > inferred from the encoding of the console window that Python is attached to. > However, this is typically a codepage that is different from the system > codepage (i.e. it's not mbcs) and is almost certainly not Unicode. If users > are starting Python from a console, they can use "chcp 65001" first to > switch to UTF-8, and then *most* functionality works (input() has some > issues, but those can be fixed with a slight rewrite and possibly breaking > readline hooks). 
Using codepage 65001 for output is broken prior to Windows 8 because WriteFile/WriteConsoleA returns (as an output parameter) the number of decoded UTF-16 codepoints instead of the number of bytes written, which makes a buffered writer repeatedly write garbage at the end of each write in proportion to the number of non-ASCII characters. This can be worked around by decoding to get the UTF-16 size before each write, or by just blindly assuming that a console write always succeeds in writing the entire buffer. In this case the console should be detected by GetConsoleMode(). isatty() isn't right for this since it's true for all character devices, which includes NUL among others. Codepage 65001 is broken for non-ASCII input (via ReadFile/ReadConsoleA) in all versions of Windows that I've tested, including Windows 10. By attaching a debugger to conhost.exe you can see how it fails in WideCharToMultiByte because it assumes one byte per character. If you try to read 10 bytes, it assumes you're trying to read 10 UTF-16 'characters' into a 10 byte buffer, which fails for UTF-8 when even a single non-ASCII character is read. The ReadFile/ReadConsoleA call returns that it successfully read 0 bytes, which is interpreted as EOF. This cannot be worked around. The only way to read the full range of Unicode from the console is via the wide-character APIs ReadConsoleW and ReadConsoleInputW. IMO, Python needs a C implementation of the win_unicode_console module, using the wide-character APIs ReadConsoleW and WriteConsoleW. Note that this sets sys.std*.encoding as UTF-8 and transcodes, so Python code never has to work directly with UTF-16 encoded text. 
From p.f.moore at gmail.com Wed Aug 10 16:56:03 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 10 Aug 2016 21:56:03 +0100 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> Message-ID: On 10 August 2016 at 21:16, eryk sun wrote: > IMO, Python needs a C implementation of the win_unicode_console > module, using the wide-character APIs ReadConsoleW and WriteConsoleW. > Note that this sets sys.std*.encoding as UTF-8 and transcodes, so > Python code never has to work directly with UTF-16 encoded text. +1 on this (and if this means we need to wait till 3.7, so be it). I'd originally thought this was what Steve was proposing. Paul From rosuav at gmail.com Wed Aug 10 17:31:17 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 11 Aug 2016 07:31:17 +1000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> Message-ID: On Thu, Aug 11, 2016 at 6:09 AM, Random832 wrote: > On Wed, Aug 10, 2016, at 15:22, Steve Dower wrote: >> > Why? What's the use case? [byte paths] >> >> Allowing library developers who support POSIX and Windows to just use >> bytes everywhere to represent paths. > > Okay, how is that use case impacted by it being mbcs instead of utf-8? AIUI, the data flow would be: Python bytes object -> decode to Unicode text -> encode to UTF-16 -> Windows API. If you do the first transformation using mbcs, you're guaranteed *some* result (all Windows codepages have definitions for all byte values, if I'm not mistaken), but a hard-to-predict one - and worse, one that can change based on system settings. 
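[The hard-to-predict results described above are easy to reproduce. Since `mbcs` only exists on Windows, cp1252 and cp1251 stand in here for two possible system codepages; a portable sketch.]

```python
# UTF-8 encoding of 'café' (é is U+00E9, encoded as C3 A9).
raw = b"caf\xc3\xa9"

# Decoded as UTF-8 you get the intended text...
assert raw.decode("utf-8") == "caf\u00e9"

# ...but a legacy codepage decode silently "succeeds" with different
# text, and which text depends entirely on the machine's settings
# (cp1252: Western European, cp1251: Cyrillic).
assert raw.decode("cp1252") == "caf\u00c3\u00a9"  # 'cafÃ©'
assert raw.decode("cp1251") == "caf\u0413\u00a9"  # 'cafГ©'
```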
Also, if someone naively types "bytepath.decode()", Python will default to UTF-8, *not* to the system codepage. I'd rather a single consistent default encoding. > What about only doing the deprecation warning if non-ascii bytes are > present in the value? -1. Data-dependent warnings just serve to strengthen the feeling that "weird characters" keep breaking your programs, instead of writing your program to cope with all characters equally. It's like being racist against non-ASCII characters :) On Thu, Aug 11, 2016 at 4:10 AM, Steve Dower wrote: > To summarise the proposals (remembering that these would only affect Python > 3.6 on Windows): > > * change sys.getfilesystemencoding() to return 'utf-8' > * automatically decode byte paths assuming they are utf-8 > * remove the deprecation warning on byte paths +1 on these. > * make the default open() encoding check for a BOM or else use utf-8 -0.5. Is there any precedent for this kind of data-based detection being the default? An explicit "utf-sig" could do a full detection, but even then it's not perfect - how do you distinguish UTF-32LE from UTF-16LE that starts with U+0000? Do you say "UTF-32 is rare so we'll assume UTF-16", or do you say "files starting U+0000 are rare, so we'll assume UTF-32"? > * [ALTERNATIVE] make the default open() encoding check for a BOM or else use > sys.getpreferredencoding() -1. Same concerns as the above, plus I'd rather use the saner default. > * force the console encoding to UTF-8 on initialize and revert on finalize -0 for Python itself; +1 for Python's interactive interpreter. Programs that mess with console settings get annoying when they crash out and don't revert properly. Unless there is *no way* that you could externally kill the process without also bringing the terminal down, there's the distinct possibility of messing everything up. Would it be possible to have a "sys.setconsoleutf8()" that changes the console encoding and slaps in an atexit() to revert? 
That would at least leave it in the hands of the app. Overall I'm +1 on shifting from eight-bit encodings to UTF-8. Don't be held back by what Notepad does. ChrisA From rosuav at gmail.com Wed Aug 10 17:57:17 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 11 Aug 2016 07:57:17 +1000 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com> <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de> <8acf8247-1b03-5f0f-0a74-a235f486e956@gmail.com> <78fb06dc-ee03-35ad-8b8f-9852ba14fd1e@mail.de> Message-ID: On Thu, Aug 11, 2016 at 3:17 AM, Sven R. Kunze wrote: > Lately, I talked to friend of mine about async and his initial reaction was > like "hmm that reminds me of early programming days, where you have to > explicitly tell the scheduler to get control back". He's much older than me, > so I think it was interesting for him to see that history is repeating > again. > Yes. One critical difference is that in the early days, one rogue program could bring the entire computer down; what we have now is preemptive switching between processes and cooperative between tasks within a process. Worst case, you can still only stall out your own program. ChrisA From brett at python.org Wed Aug 10 12:27:52 2016 From: brett at python.org (Brett Cannon) Date: Wed, 10 Aug 2016 16:27:52 +0000 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> Message-ID: I think over on async-sig my confusion over how this is for async generators and not coroutines came up. Now I think I know where my confusion stems from. I don't think this will change anything, but I wanted to get it out there as it might influence how we communicate things. 
Take Yury's simple example:

    async def ticker(delay, to):
        """Yield numbers from 0 to `to` every `delay` seconds."""
        for i in range(to):
            yield i
            await asyncio.sleep(delay)

In it we see `yield` and `await` in an `async def`. OK, so that signals that it's an asynchronous generator. But what happens if we take out the `yield`?

    async def ticker(delay, to):
        """Yield numbers from 0 to `to` every `delay` seconds."""
        for i in range(to):
            await asyncio.sleep(delay)

According to the inspect module that's a coroutine function that creates a coroutine/awaitable (and a function w/ @types.coroutine is just an awaitable when it contains a `yield`). Now the existence of `yield` in a function causing it to be a generator is obviously not a new concept, but since coroutines come from a weird generator background thanks to `yield from` we might need to start being very clear what @types.coroutine, `async def`, and `async def` w/ `yield` are -- awaitable (but not coroutine in spite of the decorator name), coroutine, and async generator, respectively -- and not refer to @types.coroutine/`yield from` in the same breath to minimize confusion. -Brett On Tue, 2 Aug 2016 at 15:32 Yury Selivanov wrote: > Hi, > > This is a new PEP to add asynchronous generators to Python 3.6. The PEP > is also available at [1]. > > There is a reference implementation [2] that supports everything that > the PEP proposes to add. > > [1] https://www.python.org/dev/peps/pep-0525/ > > [2] https://github.com/1st1/cpython/tree/async_gen > > Thank you! > > > PEP: 525 > Title: Asynchronous Generators > Version: $Revision$ > Last-Modified: $Date$ > Author: Yury Selivanov > Discussions-To: > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 28-Jul-2016 > Python-Version: 3.6 > Post-History: 02-Aug-2016 > > > Abstract > ======== > > PEP 492 introduced support for native coroutines and ``async``/``await`` > syntax to Python 3.5. 
It is proposed here to extend Python's > asynchronous capabilities by adding support for > *asynchronous generators*. > > > Rationale and Goals > =================== > > Regular generators (introduced in PEP 255) enabled an elegant way of > writing complex *data producers* and have them behave like an iterator. > > However, currently there is no equivalent concept for the *asynchronous > iteration protocol* (``async for``). This makes writing asynchronous > data producers unnecessarily complex, as one must define a class that > implements ``__aiter__`` and ``__anext__`` to be able to use it in > an ``async for`` statement. > > Essentially, the goals and rationale for PEP 255, applied to the > asynchronous execution case, hold true for this proposal as well. > > Performance is an additional point for this proposal: in our testing of > the reference implementation, asynchronous generators are **2x** faster > than an equivalent implemented as an asynchronous iterator. > > As an illustration of the code quality improvement, consider the > following class that prints numbers with a given delay once iterated:: > > class Ticker: > """Yield numbers from 0 to `to` every `delay` seconds.""" > > def __init__(self, delay, to): > self.delay = delay > self.i = 0 > self.to = to > > def __aiter__(self): > return self > > async def __anext__(self): > i = self.i > if i >= self.to: > raise StopAsyncIteration > self.i += 1 > if i: > await asyncio.sleep(self.delay) > return i > > > The same can be implemented as a much simpler asynchronous generator:: > > async def ticker(delay, to): > """Yield numbers from 0 to `to` every `delay` seconds.""" > for i in range(to): > yield i > await asyncio.sleep(delay) > > > Specification > ============= > > This proposal introduces the concept of *asynchronous generators* to > Python. > > This specification presumes knowledge of the implementation of > generators and coroutines in Python (PEP 342, PEP 380 and PEP 492). 
> > > Asynchronous Generators > ----------------------- > > A Python *generator* is any function containing one or more ``yield`` > expressions:: > > def func(): # a function > return > > def genfunc(): # a generator function > yield > > We propose to use the same approach to define > *asynchronous generators*:: > > async def coro(): # a coroutine function > await smth() > > async def asyncgen(): # an asynchronous generator function > await smth() > yield 42 > > The result of calling an *asynchronous generator function* is > an *asynchronous generator object*, which implements the asynchronous > iteration protocol defined in PEP 492. > > It is a ``SyntaxError`` to have a non-empty ``return`` statement in an > asynchronous generator. > > > Support for Asynchronous Iteration Protocol > ------------------------------------------- > > The protocol requires two special methods to be implemented: > > 1. An ``__aiter__`` method returning an *asynchronous iterator*. > 2. An ``__anext__`` method returning an *awaitable* object, which uses > ``StopIteration`` exception to "yield" values, and > ``StopAsyncIteration`` exception to signal the end of the iteration. > > Asynchronous generators define both of these methods. Let's manually > iterate over a simple asynchronous generator:: > > async def genfunc(): > yield 1 > yield 2 > > gen = genfunc() > > assert gen.__aiter__() is gen > > assert await gen.__anext__() == 1 > assert await gen.__anext__() == 2 > > await gen.__anext__() # This line will raise StopAsyncIteration. > > > Finalization > ------------ > > PEP 492 requires an event loop or a scheduler to run coroutines. > Because asynchronous generators are meant to be used from coroutines, > they also require an event loop to run and finalize them. > > Asynchronous generators can have ``try..finally`` blocks, as well as > ``async with``. 
It is important to provide a guarantee that, even > when partially iterated, and then garbage collected, generators can > be safely finalized. For example:: > > async def square_series(con, to): > async with con.transaction(): > cursor = con.cursor( > 'SELECT generate_series(0, $1) AS i', to) > async for row in cursor: > yield row['i'] ** 2 > > async for i in square_series(con, 1000): > if i == 100: > break > > The above code defines an asynchronous generator that uses > ``async with`` to iterate over a database cursor in a transaction. > The generator is then iterated over with ``async for``, which interrupts > the iteration at some point. > > The ``square_series()`` generator will then be garbage collected, > and without a mechanism to asynchronously close the generator, Python > interpreter would not be able to do anything. > > To solve this problem we propose to do the following: > > 1. Implement an ``aclose`` method on asynchronous generators > returning a special *awaitable*. When awaited it > throws a ``GeneratorExit`` into the suspended generator and > iterates over it until either a ``GeneratorExit`` or > a ``StopAsyncIteration`` occur. > > This is very similar to what the ``close()`` method does to regular > Python generators, except that an event loop is required to execute > ``aclose()``. > > 2. Raise a ``RuntimeError``, when an asynchronous generator executes > a ``yield`` expression in its ``finally`` block (using ``await`` > is fine, though):: > > async def gen(): > try: > yield > finally: > await asyncio.sleep(1) # Can use 'await'. > > yield # Cannot use 'yield', > # this line will trigger a > # RuntimeError. > > 3. Add two new methods to the ``sys`` module: > ``set_asyncgen_finalizer()`` and ``get_asyncgen_finalizer()``. > > The idea behind ``sys.set_asyncgen_finalizer()`` is to allow event > loops to handle generators finalization, so that the end user > does not need to care about the finalization problem, and it just > works. 
> > When an asynchronous generator is iterated for the first time, > it stores a reference to the current finalizer. If there is none, > a ``RuntimeError`` is raised. This provides a strong guarantee that > every asynchronous generator object will always have a finalizer > installed by the correct event loop. > > When an asynchronous generator is about to be garbage collected, > it calls its cached finalizer. The assumption is that the finalizer > will schedule an ``aclose()`` call with the loop that was active > when the iteration started. > > For instance, here is how asyncio is modified to allow safe > finalization of asynchronous generators:: > > # asyncio/base_events.py > > class BaseEventLoop: > > def run_forever(self): > ... > old_finalizer = sys.get_asyncgen_finalizer() > sys.set_asyncgen_finalizer(self._finalize_asyncgen) > try: > ... > finally: > sys.set_asyncgen_finalizer(old_finalizer) > ... > > def _finalize_asyncgen(self, gen): > self.create_task(gen.aclose()) > > ``sys.set_asyncgen_finalizer()`` is thread-specific, so several event > loops running in parallel threads can use it safely. > > > Asynchronous Generator Object > ----------------------------- > > The object is modeled after the standard Python generator object. > Essentially, the behaviour of asynchronous generators is designed > to replicate the behaviour of synchronous generators, with the only > difference in that the API is asynchronous. > > The following methods and properties are defined: > > 1. ``agen.__aiter__()``: Returns ``agen``. > > 2. ``agen.__anext__()``: Returns an *awaitable*, that performs one > asynchronous generator iteration when awaited. > > 3. ``agen.asend(val)``: Returns an *awaitable*, that pushes the > ``val`` object in the ``agen`` generator. When the ``agen`` has > not yet been iterated, ``val`` must be ``None``. 
> > Example:: > > async def gen(): > await asyncio.sleep(0.1) > v = yield 42 > print(v) > await asyncio.sleep(0.2) > > g = gen() > > await g.asend(None) # Will return 42 after sleeping > # for 0.1 seconds. > > await g.asend('hello') # Will print 'hello' and > # raise StopAsyncIteration > # (after sleeping for 0.2 seconds.) > > 4. ``agen.athrow(typ, [val, [tb]])``: Returns an *awaitable*, that > throws an exception into the ``agen`` generator. > > Example:: > > async def gen(): > try: > await asyncio.sleep(0.1) > yield 'hello' > except ZeroDivisionError: > await asyncio.sleep(0.2) > yield 'world' > > g = gen() > v = await g.asend(None) > print(v) # Will print 'hello' after > # sleeping for 0.1 seconds. > > v = await g.athrow(ZeroDivisionError) > print(v) # Will print 'world' after > # sleeping 0.2 seconds. > > 5. ``agen.aclose()``: Returns an *awaitable*, that throws a > ``GeneratorExit`` exception into the generator. The *awaitable* can > either return a yielded value, if ``agen`` handled the exception, > or ``agen`` will be closed and the exception will propagate back > to the caller. > > 6. ``agen.__name__`` and ``agen.__qualname__``: readable and writable > name and qualified name attributes. > > 7. ``agen.ag_await``: The object that ``agen`` is currently *awaiting* > on, or ``None``. This is similar to the currently available > ``gi_yieldfrom`` for generators and ``cr_await`` for coroutines. > > 8. ``agen.ag_frame``, ``agen.ag_running``, and ``agen.ag_code``: > defined in the same way as similar attributes of standard generators. > > ``StopIteration`` and ``StopAsyncIteration`` are not propagated out of > asynchronous generators, and are replaced with a ``RuntimeError``. > > > Implementation Details > ---------------------- > > Asynchronous generator object (``PyAsyncGenObject``) shares the > struct layout with ``PyGenObject``. In addition to that, the > reference implementation introduces three new objects: > > 1. 
``PyAsyncGenASend``: the awaitable object that implements > ``__anext__`` and ``asend()`` methods. > > 2. ``PyAsyncGenAThrow``: the awaitable object that implements > ``athrow()`` and ``aclose()`` methods. > > 3. ``_PyAsyncGenWrappedValue``: every directly yielded object from an > asynchronous generator is implicitly boxed into this structure. This > is how the generator implementation can separate objects that are > yielded using regular iteration protocol from objects that are > yielded using asynchronous iteration protocol. > > ``PyAsyncGenASend`` and ``PyAsyncGenAThrow`` are awaitables (they have > ``__await__`` methods returning ``self``) and are coroutine-like objects > (implementing ``__iter__``, ``__next__``, ``send()`` and ``throw()`` > methods). Essentially, they control how asynchronous generators are > iterated: > > .. image:: pep-0525-1.png > :align: center > :width: 80% > > > PyAsyncGenASend and PyAsyncGenAThrow > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > ``PyAsyncGenASend`` is a coroutine-like object that drives ``__anext__`` > and ``asend()`` methods and implements the asynchronous iteration > protocol. > > ``agen.asend(val)`` and ``agen.__anext__()`` return instances of > ``PyAsyncGenASend`` (which hold references back to the parent > ``agen`` object.) > > The data flow is defined as follows: > > 1. When ``PyAsyncGenASend.send(val)`` is called for the first time, > ``val`` is pushed to the parent ``agen`` object (using existing > facilities of ``PyGenObject``.) > > Subsequent iterations over the ``PyAsyncGenASend`` objects push > ``None`` to ``agen``. > > When a ``_PyAsyncGenWrappedValue`` object is yielded, it > is unboxed, and a ``StopIteration`` exception is raised with the > unwrapped value as an argument. > > 2. When ``PyAsyncGenASend.throw(*exc)`` is called for the first time, > ``*exc`` is thrown into the parent ``agen`` object. > > Subsequent iterations over the ``PyAsyncGenASend`` objects push > ``None`` to ``agen``. 
> > When a ``_PyAsyncGenWrappedValue`` object is yielded, it > is unboxed, and a ``StopIteration`` exception is raised with the > unwrapped value as an argument. > > 3. ``return`` statements in asynchronous generators raise > a ``StopAsyncIteration`` exception, which is propagated through > ``PyAsyncGenASend.send()`` and ``PyAsyncGenASend.throw()`` methods. > > ``PyAsyncGenAThrow`` is very similar to ``PyAsyncGenASend``. The only > difference is that ``PyAsyncGenAThrow.send()``, when called for the first > time, throws an exception into the parent ``agen`` object (instead of > pushing a value into it.) > > > New Standard Library Functions and Types > ---------------------------------------- > > 1. ``types.AsyncGeneratorType`` -- type of asynchronous generator > object. > > 2. ``sys.set_asyncgen_finalizer()`` and ``sys.get_asyncgen_finalizer()`` > methods to set up asynchronous generator finalizers in event loops. > > 3. ``inspect.isasyncgen()`` and ``inspect.isasyncgenfunction()`` > introspection functions. > > > Backwards Compatibility > ----------------------- > > The proposal is fully backwards compatible. > > In Python 3.5 it is a ``SyntaxError`` to define an ``async def`` > function with a ``yield`` expression inside, therefore it's safe to > introduce asynchronous generators in 3.6. > > > Performance > =========== > > Regular Generators > ------------------ > > There is no performance degradation for regular generators. 
> The following micro benchmark runs at the same speed on CPython with > and without asynchronous generators:: > > def gen(): > i = 0 > while i < 100000000: > yield i > i += 1 > > list(gen()) > > > Improvements over asynchronous iterators > ---------------------------------------- > > The following micro-benchmark shows that asynchronous generators > are about **2.3x faster** than asynchronous iterators implemented in > pure Python:: > > N = 10 ** 7 > > async def agen(): > for i in range(N): > yield i > > class AIter: > def __init__(self): > self.i = 0 > > def __aiter__(self): > return self > > async def __anext__(self): > i = self.i > if i >= N: > raise StopAsyncIteration > self.i += 1 > return i > > > Design Considerations > ===================== > > > ``aiter()`` and ``anext()`` builtins > ------------------------------------ > > Originally, PEP 492 defined ``__aiter__`` as a method that should > return an *awaitable* object, resulting in an asynchronous iterator. > > However, in CPython 3.5.2, ``__aiter__`` was redefined to return > asynchronous iterators directly. To avoid breaking backwards > compatibility, it was decided that Python 3.6 will support both > ways: ``__aiter__`` can still return an *awaitable* with > a ``DeprecationWarning`` being issued. > > Because of this dual nature of ``__aiter__`` in Python 3.6, we cannot > add a synchronous implementation of ``aiter()`` built-in. Therefore, > it is proposed to wait until Python 3.7. > > > Asynchronous list/dict/set comprehensions > ----------------------------------------- > > Syntax for asynchronous comprehensions is unrelated to the asynchronous > generators machinery, and should be considered in a separate PEP. > > > Asynchronous ``yield from`` > --------------------------- > > While it is theoretically possible to implement ``yield from`` support > for asynchronous generators, it would require a serious redesign of the > generators implementation. 
> > ``yield from`` is also less critical for asynchronous generators, since > there is no need to provide a mechanism for implementing another coroutine > protocol on top of coroutines. And to compose asynchronous generators, a > simple ``async for`` loop can be used:: > > async def g1(): > yield 1 > yield 2 > > async def g2(): > async for v in g1(): > yield v > > > Why the ``asend()`` and ``athrow()`` methods are necessary > ---------------------------------------------------------- > > They make it possible to implement concepts similar to > ``contextlib.contextmanager`` using asynchronous generators. > For instance, with the proposed design, it is possible to implement > the following pattern:: > > @async_context_manager > async def ctx(): > await open() > try: > yield > finally: > await close() > > async with ctx(): > await ... > > Another reason is that it is possible to push data and throw exceptions > into asynchronous generators using the object returned from > ``__anext__``, but it is hard to do that correctly. Adding > explicit ``asend()`` and ``athrow()`` will pave a safe way to > accomplish that. > > In terms of implementation, ``asend()`` is a slightly more generic > version of ``__anext__``, and ``athrow()`` is very similar to > ``aclose()``. Therefore having these methods defined for asynchronous > generators does not add any extra complexity. > > > Example > ======= > > A working example with the current reference implementation (will > print numbers from 0 to 9 with one second delay):: > > async def ticker(delay, to): > for i in range(to): > yield i > await asyncio.sleep(delay) > > > async def run(): > async for i in ticker(1, 10): > print(i) > > > import asyncio > loop = asyncio.get_event_loop() > try: > loop.run_until_complete(run()) > finally: > loop.close() > > > Implementation > ============== > > The complete reference implementation is available at [1]_. > > > References > ========== > > .. 
[1] https://github.com/1st1/cpython/tree/async_gen > > > Copyright > ========= > > This document has been placed in the public domain. > > .. > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Aug 10 18:15:39 2016 From: brett at python.org (Brett Cannon) Date: Wed, 10 Aug 2016 22:15:39 +0000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <57AB6E2D.6050704@python.org> References: <57AB6E2D.6050704@python.org> Message-ID: On Wed, 10 Aug 2016 at 11:16 Steve Dower wrote: > [SNIP] > > Finally, the encoding of stdin, stdout and stderr are currently > (correctly) inferred from the encoding of the console window that Python > is attached to. However, this is typically a codepage that is different > from the system codepage (i.e. it's not mbcs) and is almost certainly > not Unicode. If users are starting Python from a console, they can use > "chcp 65001" first to switch to UTF-8, and then *most* functionality > works (input() has some issues, but those can be fixed with a slight > rewrite and possibly breaking readline hooks). > > It is also possible for Python to change the current console encoding to > be UTF-8 on initialize and change it back on finalize. (This would leave > the console in an unexpected state if Python segfaults, but console > encoding is probably the least of anyone's worries at that point.) So > I'm proposing actively changing the current console to be Unicode while > Python is running, and hence sys.std[in|out|err] will default to utf-8. 
> > So that's a broad range of changes, and I have little hope of figuring > out all the possible issues, back-compat risks, and flow-on effects on > my own. Please let me know (either on-list or off-list) how a change > like this would affect your projects, either positively or negatively, > and whether you have any specific experience with these changes/fixes > and think they should be approached differently. > > > To summarise the proposals (remembering that these would only affect > Python 3.6 on Windows): > > [SNIP] > * force the console encoding to UTF-8 on initialize and revert on finalize > Don't have enough Windows experience to comment on the other parts of this proposal, but for the console encoding I am a hearty +1 as I'm tired of Unicode characters failing to show up in the REPL. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Wed Aug 10 19:04:01 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 10 Aug 2016 17:04:01 -0600 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: <6985f794-7972-4c61-9d18-c46fe668beeb@googlegroups.com> Message-ID: On Wed, Aug 10, 2016 at 11:56 AM, Guido van Rossum wrote: > Sounds like you're thinking with your runtime hat on, not your type checker > hat. :-) > > On Tue, Aug 9, 2016 at 9:46 PM, Neil Girdhar wrote: >> >> With PEP 520 accepted, would it be possible to iterate over >> __definition_order__? Still, it would be really nice to be able to introspect a class's instance attributes at run-time. A stdlib helper for that would be great, e.g. "inspect.get_inst_attrs(cls)". At one point a few years back I wrote something like that derived from the signature of cls.__init__() and in the spirit of inspect.signature(). It turned out to be quite useful. 
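A helper along the lines Eric describes could be sketched like this. Note this is a reconstruction, not stdlib code: the name `get_inst_attrs` and the heuristic of reading `cls.__init__`'s parameters are assumptions about what such a helper would do.

```python
import inspect

def get_inst_attrs(cls):
    # Hypothetical helper: guess a class's instance attributes from the
    # parameters of cls.__init__, in the spirit of inspect.signature().
    sig = inspect.signature(cls.__init__)
    return [
        name for name, p in sig.parameters.items()
        if name != 'self'
        and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
    ]

class Point:
    def __init__(self, x, y=0):
        self.x = x
        self.y = y

assert get_inst_attrs(Point) == ['x', 'y']
```

It is only a heuristic, of course: attributes set outside `__init__`, or under names that differ from the parameter names, would be missed.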
Relatedly, it may make sense to recommend in PEP 8 that all instance attribute declarations in a class definition be grouped together and to do so right before the methods (right before __new__/__init__?). (...or disallow instance attribute declarations in the stdlib for now.) -eric From eryksun at gmail.com Wed Aug 10 19:04:00 2016 From: eryksun at gmail.com (eryk sun) Date: Wed, 10 Aug 2016 23:04:00 +0000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> Message-ID: On Wed, Aug 10, 2016 at 8:09 PM, Random832 wrote: > On Wed, Aug 10, 2016, at 15:22, Steve Dower wrote: >> >> Allowing library developers who support POSIX and Windows to just use >> bytes everywhere to represent paths. > > Okay, how is that use case impacted by it being mbcs instead of utf-8? Using 'mbcs' doesn't work reliably with arbitrary bytes paths in locales that use a DBCS codepage such as 932. If a sequence is invalid, it gets passed to the filesystem as the default Unicode character, so it won't successfully roundtrip. In the following example b"\x81\xad", which isn't defined in CP932, gets mapped to the codepage's default Unicode character, Katakana middle dot, which encodes back as b"\x81E": >>> locale.getpreferredencoding() 'cp932' >>> open(b'\x81\xad', 'w').close() >>> os.listdir('.') ['?'] >>> unicodedata.name(os.listdir('.')[0]) 'KATAKANA MIDDLE DOT' >>> '?'.encode('932') b'\x81E' This isn't a problem for single-byte codepages, since every byte value uniquely maps to a Unicode code point, even if it's simply b'\x81' => u"\x81". 
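The same failure can be reproduced off Windows with Python's own cp932 codec, which (unlike the ANSI API's default-character substitution shown above) raises on the undefined sequence rather than silently losing data:

```python
data = b'\x81\xad'  # the sequence from the example above; undefined in CP932

# Python's strict cp932 codec refuses the sequence outright...
try:
    data.decode('cp932')
    raise AssertionError('expected UnicodeDecodeError')
except UnicodeDecodeError:
    pass

# ...while a single-byte codepage such as Latin-1 round-trips
# every byte value unambiguously
assert data.decode('latin-1').encode('latin-1') == data
```

So the DBCS round-trip loss is a property of the Windows conversion behaviour, not of the codepage mapping itself.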
Obviously there's still the general problem of dealing with arbitrary Unicode filenames created by other programs, since the ANSI API can only return a best-fit encoding of the filename, which is useless for actually accessing the file. >> It probably also entails opening the file descriptor in bytes mode, >> which might break programs that pass the fd directly to CRT functions. >> Personally I wish they wouldn't, but it's too late to stop them now. > > The only thing O_TEXT does rather than O_BINARY is convert CRLF line > endings (and maybe end on ^Z), and I don't think we even expose the > constants for the CRT's unicode modes. Python 3 uses O_BINARY when opening files, unless you explicitly call os.open. Specifically, FileIO.__init__ adds O_BINARY to the open flags if the platform defines it. The Windows CRT reads the BOM for the Unicode modes O_WTEXT, O_U16TEXT, and O_U8TEXT. For O_APPEND | O_WRONLY mode, this requires opening the file twice, the first time with read access. See configure_text_mode() in "Windows Kits\10\Source\10.0.10586.0\ucrt\lowio\open.cpp". Python doesn't expose or use these Unicode text-mode constants. That's for the best because in Unicode mode the CRT invokes the invalid parameter handler when a buffer doesn't have an even number of bytes, i.e. a multiple of sizeof(wchar_t). Python could copy how configure_text_mode() handles the BOM, except it shouldn't write a BOM for new UTF-8 files. From guido at python.org Wed Aug 10 19:08:16 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Aug 2016 16:08:16 -0700 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: <6985f794-7972-4c61-9d18-c46fe668beeb@googlegroups.com> Message-ID: On Wed, Aug 10, 2016 at 4:04 PM, Eric Snow wrote: [...] > > Still, it would be really nice to be able to introspect a class's > instance attributes at run-time. A stdlib helper for that would be > great, e.g. "inspect.get_inst_attrs(cls)". 
At one point a few years > back I wrote something like that derived from the signature of > cls.__init__() and in the spirit of inspect.signature(). It turned > out to be quite useful. > Yes, the proposal will store variable annotations for globals and for classes in __annotations__ (one global, one per class). Just not for local variables. > Relatedly, it may make sense to recommend in PEP 8 that all instance > attribute declarations in a class definition be grouped together and > to do so right before the methods (right before __new__/__init__?). > (...or disallow instance attribute declarations in the stdlib for > now.) > Let's wait until we have some experience with how it's used before updating PEP 8... :-) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Wed Aug 10 19:30:41 2016 From: random832 at fastmail.com (Random832) Date: Wed, 10 Aug 2016 19:30:41 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> Message-ID: <1470871841.2523792.691926257.5A72710B@webmail.messagingengine.com> On Wed, Aug 10, 2016, at 19:04, eryk sun wrote: > Using 'mbcs' doesn't work reliably with arbitrary bytes paths in > locales that use a DBCS codepage such as 932. Er... utf-8 doesn't work reliably with arbitrary bytes paths either, unless you intend to use surrogateescape (which you could also do with mbcs). Is there any particular reason to expect all bytes paths in this scenario to be valid UTF-8? > Python 3 uses O_BINARY when opening files, unless you explicitly call > os.open. Specifically, FileIO.__init__ adds O_BINARY to the open flags > if the platform defines it. Fair enough. 
I wasn't sure, particularly considering that python does expose O_BINARY, O_TEXT, and msvcrt.setmode. I'm not sure I approve of os.open not also adding it (or perhaps adding it only if O_TEXT is not explicitly added), but... meh. > Python could copy how > configure_text_mode() handles the BOM, except it shouldn't write a BOM > for new UTF-8 files. I disagree. I think that *on windows* it should, just like *on windows* it should write CR-LF for line endings. From steve.dower at python.org Wed Aug 10 19:40:31 2016 From: steve.dower at python.org (Steve Dower) Date: Wed, 10 Aug 2016 16:40:31 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> Message-ID: <57ABBB6F.7050206@python.org> On 10Aug2016 1431, Chris Angelico wrote: > On Thu, Aug 11, 2016 at 6:09 AM, Random832 wrote: >> On Wed, Aug 10, 2016, at 15:22, Steve Dower wrote: >>>> Why? What's the use case? [byte paths] >>> >>> Allowing library developers who support POSIX and Windows to just use >>> bytes everywhere to represent paths. >> >> Okay, how is that use case impacted by it being mbcs instead of utf-8? > > AIUI, the data flow would be: Python bytes object -> decode to Unicode > text -> encode to UTF-16 -> Windows API. If you do the first > transformation using mbcs, you're guaranteed *some* result (all > Windows codepages have definitions for all byte values, if I'm not > mistaken), but a hard-to-predict one - and worse, one that can change > based on system settings. Also, if someone naively types > "bytepath.decode()", Python will default to UTF-8, *not* to the system > codepage. > > I'd rather a single consistent default encoding. I'm proposing to make that single consistent default encoding utf-8. It sounds like we're in agreement? 
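The point about the naive `bytepath.decode()` call is easy to check: the no-argument forms of `bytes.decode()` and `str.encode()` already mean UTF-8 on every platform, independent of any OS codepage.

```python
# bytes.decode() and str.encode() default to UTF-8 everywhere
bytepath = 'caf\u00e9'.encode()          # implicit UTF-8
assert bytepath == b'caf\xc3\xa9'
assert bytepath.decode() == 'caf\u00e9'  # implicit UTF-8 again
```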
>> What about only doing the deprecation warning if non-ascii bytes are >> present in the value? > > -1. Data-dependent warnings just serve to strengthen the feeling that > "weird characters" keep breaking your programs, instead of writing > your program to cope with all characters equally. It's like being > racist against non-ASCII characters :) Agreed. This won't happen. > On Thu, Aug 11, 2016 at 4:10 AM, Steve Dower wrote: >> To summarise the proposals (remembering that these would only affect Python >> 3.6 on Windows): >> >> * change sys.getfilesystemencoding() to return 'utf-8' >> * automatically decode byte paths assuming they are utf-8 >> * remove the deprecation warning on byte paths > > +1 on these. > >> * make the default open() encoding check for a BOM or else use utf-8 > > -0.5. Is there any precedent for this kind of data-based detection > being the default? An explicit "utf-sig" could do a full detection, > but even then it's not perfect - how do you distinguish UTF-32LE from > UTF-16LE that starts with U+0000? Do you say "UTF-32 is rare so we'll > assume UTF-16", or do you say "files starting U+0000 are rare, so > we'll assume UTF-32"? The BOM exists solely for data-based detection, and the UTF-8 BOM is different from the UTF-16 and UTF-32 ones. So we either find an exact BOM (which IIRC decodes as a no-op spacing character, though I have a feeling some version of Unicode redefined it exclusively for being the marker) or we use utf-8. But the main reason for detecting the BOM is that currently opening files with 'utf-8' does not skip the BOM if it exists. I'd be quite happy with changing the default encoding to: * utf-8-sig when reading (so the UTF-8 BOM is skipped if it exists) * utf-8 when writing (so the BOM is *not* written) This provides the best compatibility when reading/writing files without making any guesses. 
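The proposed pair of defaults maps onto codecs Python already ships; a quick sketch of the intended asymmetry:

```python
BOM = b'\xef\xbb\xbf'  # the UTF-8 encoding of U+FEFF

# reading: utf-8-sig strips a BOM when present, and tolerates its absence
assert (BOM + b'hello').decode('utf-8-sig') == 'hello'
assert b'hello'.decode('utf-8-sig') == 'hello'

# writing: plain utf-8 never emits a BOM (utf-8-sig would)
assert 'hello'.encode('utf-8') == b'hello'
assert 'hello'.encode('utf-8-sig') == BOM + b'hello'
```

That asymmetry (decode with utf-8-sig, encode with utf-8) is exactly why the read and write defaults are proposed separately.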
We could reasonably extend this to read utf-16 and utf-32 if they have a BOM, but that's an extension and not necessary for the main change. >> * force the console encoding to UTF-8 on initialize and revert on finalize > > -0 for Python itself; +1 for Python's interactive interpreter. > Programs that mess with console settings get annoying when they crash > out and don't revert properly. Unless there is *no way* that you could > externally kill the process without also bringing the terminal down, > there's the distinct possibility of messing everything up. The main problem here is that if the console is not forced to UTF-8 then it won't render any of the characters correctly. > Would it be possible to have a "sys.setconsoleutf8()" that changes the > console encoding and slaps in an atexit() to revert? That would at > least leave it in the hands of the app. Yes, but if the app is going to opt-in then I'd suggest the win_unicode_console package, which won't require any particular changes. It sounds like we'll have to look into effectively merging that package into the core. I'm afraid that'll come with a much longer tail of bugs (and will quite likely break code that expects to use file descriptors to access stdin/out), but it's the least impactful way to do it. 
Cheers, Steve From steve.dower at python.org Wed Aug 10 19:48:35 2016 From: steve.dower at python.org (Steve Dower) Date: Wed, 10 Aug 2016 16:48:35 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <1470871841.2523792.691926257.5A72710B@webmail.messagingengine.com> References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <1470871841.2523792.691926257.5A72710B@webmail.messagingengine.com> Message-ID: <57ABBD53.4090204@python.org> On 10Aug2016 1630, Random832 wrote: > On Wed, Aug 10, 2016, at 19:04, eryk sun wrote: >> Using 'mbcs' doesn't work reliably with arbitrary bytes paths in >> locales that use a DBCS codepage such as 932. > > Er... utf-8 doesn't work reliably with arbitrary bytes paths either, > unless you intend to use surrogateescape (which you could also do with > mbcs). > > Is there any particular reason to expect all bytes paths in this > scenario to be valid UTF-8? On Windows, all paths are effectively UCS-2 (they are defined as UTF-16, but surrogate pairs don't seem to be validated, which IIUC means it's really UCS-2), so while the majority can be encoded as valid UTF-8, there are some paths which cannot. (These paths are going to break many other tools though, such as PowerShell, so we won't be in bad company if we can't handle them properly in edge cases). surrogateescape is irrelevant because it's only for decoding from bytes. An alternative approach would be to replace mbcs with a ucs-2 encoding that is basically just a blob of the path that was returned from Windows (using the Unicode APIs). None of the manipulation functions would work on this though, since nearly every second character would be \x00, but it's the only way (besides using str) to maintain full fidelity for every possible path name. 
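The unpaired-surrogate edge case Steve mentions can be demonstrated with the existing error handlers; the `surrogatepass` handler exists for exactly this kind of ill-formed UTF-16 data (the filename below is made up for illustration):

```python
# A Windows filename may contain a lone surrogate: legal in UCS-2,
# ill-formed in UTF-16 and therefore unencodable as strict UTF-8
name = 'report\ud800.txt'

# strict UTF-8 rejects it...
try:
    name.encode('utf-8')
    raise AssertionError('expected UnicodeEncodeError')
except UnicodeEncodeError:
    pass

# ...but surrogatepass keeps full fidelity in both directions
raw = name.encode('utf-8', 'surrogatepass')
assert raw.decode('utf-8', 'surrogatepass') == name
```

This is the 1% case referred to below: representable with care, but not by a plain UTF-8 round-trip.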
Compromising on UTF-8 is going to increase consistency across platforms and across different Windows installations without increasing the rate of errors above what we currently see (given that invalid characters are currently replaced with '?'). It's not a 100% solution, but it's a 99% solution where the 1% is not handled well by anyone. Cheers, Steve From eryksun at gmail.com Wed Aug 10 19:49:21 2016 From: eryksun at gmail.com (eryk sun) Date: Wed, 10 Aug 2016 23:49:21 +0000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <1470871841.2523792.691926257.5A72710B@webmail.messagingengine.com> References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <1470871841.2523792.691926257.5A72710B@webmail.messagingengine.com> Message-ID: On Wed, Aug 10, 2016 at 11:30 PM, Random832 wrote: > Er... utf-8 doesn't work reliably with arbitrary bytes paths either, > unless you intend to use surrogateescape (which you could also do with > mbcs). > > Is there any particular reason to expect all bytes paths in this > scenario to be valid UTF-8? The problem is more so that data is lost without an error when using the legacy ANSI API. If the path is invalid UTF-8, Python will at least raise an exception when decoding it. To work around this, the developers may decide they need to just bite the bullet and use Unicode, or maybe there could be legacy Latin-1 and ANSI modes enabled by an environment variable or sys flag. 
From rosuav at gmail.com Wed Aug 10 20:41:01 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 11 Aug 2016 10:41:01 +1000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <57ABBB6F.7050206@python.org> References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <57ABBB6F.7050206@python.org> Message-ID: On Thu, Aug 11, 2016 at 9:40 AM, Steve Dower wrote: > On 10Aug2016 1431, Chris Angelico wrote: >> I'd rather a single consistent default encoding. > > I'm proposing to make that single consistent default encoding utf-8. It > sounds like we're in agreement? Yes, we are. I was disagreeing with Random's suggestion that mbcs would also serve. Defaulting to UTF-8 everywhere is (a) consistent on all systems, regardless of settings; and (b) consistent with bytes.decode() and str.encode(), both of which default to UTF-8. >> -0.5. Is there any precedent for this kind of data-based detection >> being the default? An explicit "utf-sig" could do a full detection, >> but even then it's not perfect - how do you distinguish UTF-32LE from >> UTF-16LE that starts with U+0000? Do you say "UTF-32 is rare so we'll >> assume UTF-16", or do you say "files starting U+0000 are rare, so >> we'll assume UTF-32"? > > > The BOM exists solely for data-based detection, and the UTF-8 BOM is > different from the UTF-16 and UTF-32 ones. So we either find an exact BOM > (which IIRC decodes as a no-op spacing character, though I have a feeling > some version of Unicode redefined it exclusively for being the marker) or we > use utf-8. > > But the main reason for detecting the BOM is that currently opening files > with 'utf-8' does not skip the BOM if it exists. 
I'd be quite happy with > changing the default encoding to: > > * utf-8-sig when reading (so the UTF-8 BOM is skipped if it exists) > * utf-8 when writing (so the BOM is *not* written) > > This provides the best compatibility when reading/writing files without > making any guesses. We could reasonably extend this to read utf-16 and > utf-32 if they have a BOM, but that's an extension and not necessary for the > main change. AIUI the utf-8-sig encoding is happy to decode something that doesn't have a signature, right? If so, then yes, I would definitely support that mild mismatch in defaults. Chew up that UTF-8 aBOMination and just use UTF-8 as is. I've almost never seen files stored in UTF-32 (even UTF-16 isn't all that common compared to UTF-8), so I wouldn't stress too much about that. Recognizing FE FF or FF FE and decoding as UTF-16 might be worth doing, but it could easily be retrofitted (that byte sequence won't decode as UTF-8). >>> * force the console encoding to UTF-8 on initialize and revert on >>> finalize >> >> >> -0 for Python itself; +1 for Python's interactive interpreter. >> Programs that mess with console settings get annoying when they crash >> out and don't revert properly. Unless there is *no way* that you could >> externally kill the process without also bringing the terminal down, >> there's the distinct possibility of messing everything up. > > > The main problem here is that if the console is not forced to UTF-8 then it > won't render any of the characters correctly. Ehh, that's annoying. Is there a way to guarantee, at the process level, that the console will be returned to "normal state" when Python exits? If not, there's the risk that people run a Python program and then the *next* program gets into trouble. But if that happens only on abnormal termination ("I killed Python from Task Manager, and it left stuff messed up so I had to close the console"), it's probably an acceptable risk. And the benefit sounds well worthwhile. 
Revising my recommendation to +0.9. ChrisA From eryksun at gmail.com Wed Aug 10 21:55:25 2016 From: eryksun at gmail.com (eryk sun) Date: Thu, 11 Aug 2016 01:55:25 +0000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <57ABBB6F.7050206@python.org> References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <57ABBB6F.7050206@python.org> Message-ID: On Wed, Aug 10, 2016 at 11:40 PM, Steve Dower wrote: > It sounds like we'll have to look into effectively merging that package into > the core. I'm afraid that'll come with a much longer tail of bugs (and will > quite likely break code that expects to use file descriptors to access > stdin/out), but it's the least impactful way to do it. Programs that use sys.std*.encoding but use the file descriptor seem like a weird case to me. Do you have an example? From steve at pearwood.info Wed Aug 10 23:14:04 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Aug 2016 13:14:04 +1000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <57ABBB6F.7050206@python.org> References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <57ABBB6F.7050206@python.org> Message-ID: <20160811031403.GD26300@ando.pearwood.info> On Wed, Aug 10, 2016 at 04:40:31PM -0700, Steve Dower wrote: > On 10Aug2016 1431, Chris Angelico wrote: > >>* make the default open() encoding check for a BOM or else use utf-8 > > > >-0.5. Is there any precedent for this kind of data-based detection > >being the default? There is precedent: the Python interpreter will accept a BOM instead of an encoding cookie when importing .py files. 
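One reason this particular heuristic is safer than data-based detection usually is: the bytes 0xFE and 0xFF can never occur in well-formed UTF-8, so checking for a UTF-16/UTF-32 BOM cannot misfire on genuine UTF-8 text.

```python
# None of the UTF-16/UTF-32 BOM byte sequences is valid UTF-8
boms = (b'\xfe\xff', b'\xff\xfe',
        b'\x00\x00\xfe\xff', b'\xff\xfe\x00\x00')
for bom in boms:
    try:
        bom.decode('utf-8')
        raise AssertionError('unexpectedly decoded as UTF-8')
    except UnicodeDecodeError:
        pass
```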
[Chris] > >An explicit "utf-sig" could do a full detection, > >but even then it's not perfect - how do you distinguish UTF-32LE from > >UTF-16LE that starts with U+0000? BOMs are a heuristic, nothing more. If you're reading arbitrary files that could start with anything, then of course they can guess wrong. But then if I dumped a bunch of arbitrary Unicode codepoints in your lap and asked you to guess the language, you would likely get it wrong too :-) [Chris] > >Do you say "UTF-32 is rare so we'll > >assume UTF-16", or do you say "files starting U+0000 are rare, so > >we'll assume UTF-32"? The way I have done auto-detection based on BOMs is you start by reading four bytes from the file in binary mode. (If there are fewer than four bytes, it cannot be a text file with a BOM.) Compare those first four bytes against the UTF-32 BOMs first, and the UTF-16 BOMs *second* (otherwise UTF-16 will shadow UTF-32). Note that there are two BOMs (big-endian and little-endian). Then check for UTF-8, and if you're really keen, UTF-7 and UTF-1. def bom2enc(bom, default=None): """Return encoding name from a four-byte BOM.""" if bom.startswith((b'\x00\x00\xFE\xFF', b'\xFF\xFE\x00\x00')): return 'utf_32' elif bom.startswith((b'\xFE\xFF', b'\xFF\xFE')): return 'utf_16' elif bom.startswith(b'\xEF\xBB\xBF'): return 'utf_8_sig' elif bom.startswith(b'\x2B\x2F\x76'): if len(bom) == 4 and bom[3] in b'\x2B\x2F\x38\x39': return 'utf_7' elif bom.startswith(b'\xF7\x64\x4C'): return 'utf_1' elif default is None: raise ValueError('no recognisable BOM signature') else: return default [Steve Dower] > The BOM exists solely for data-based detection, and the UTF-8 BOM is > different from the UTF-16 and UTF-32 ones. So we either find an exact > BOM (which IIRC decodes as a no-op spacing character, though I have a > feeling some version of Unicode redefined it exclusively for being the > marker) or we use utf-8. The Byte Order Mark is always U+FEFF encoded into whatever bytes your encoding uses. 
You should never use U+FEFF except as a BOM, but of course arbitrary Unicode strings might include it in the middle of the string Just Because. In that case, it may be interpreted as a legacy "ZERO WIDTH NON-BREAKING SPACE" character. But new content should never do that: you should use U+2060 "WORD JOINER" instead, and treat a U+FEFF inside the body of your file or string as an unsupported character. http://www.unicode.org/faq/utf_bom.html#BOM [Steve] > But the main reason for detecting the BOM is that currently opening > files with 'utf-8' does not skip the BOM if it exists. I'd be quite > happy with changing the default encoding to: > > * utf-8-sig when reading (so the UTF-8 BOM is skipped if it exists) > * utf-8 when writing (so the BOM is *not* written) Sounds reasonable to me. Rather than hard-coding that behaviour, can we have a new encoding that does that? "utf-8-readsig" perhaps. [Steve] > This provides the best compatibility when reading/writing files without > making any guesses. We could reasonably extend this to read utf-16 and > utf-32 if they have a BOM, but that's an extension and not necessary for > the main change. The use of a BOM is always a guess :-) Maybe I just happen to have a Latin1 file that starts with "???", or a Mac Roman file that starts with "???". Either case will be wrongly detected as UTF-8. That's the risk you take when using a heuristic. And if you don't want to use that heuristic, then you must specify the actual encoding in use. 
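[Editor's note: Steve's proposed asymmetric default (utf-8-sig when reading, utf-8 when writing) can be sketched as a small wrapper. This is an illustration added here, not code from the thread; `open_default` is a made-up name, not part of any proposal.]

```python
import codecs
import os
import tempfile

def open_default(path, mode='r'):
    """Sketch of the proposed default: skip a UTF-8 BOM when reading,
    never write one."""
    if any(c in mode for c in 'wxa'):
        return open(path, mode, encoding='utf-8')      # no BOM on write
    return open(path, mode, encoding='utf-8-sig')      # strip BOM on read

tmp = tempfile.mkdtemp()
bom_file = os.path.join(tmp, 'with_bom.txt')
new_file = os.path.join(tmp, 'new.txt')

# A file some other Windows tool wrote with a BOM reads back cleanly:
with open(bom_file, 'wb') as f:
    f.write(codecs.BOM_UTF8 + 'hello'.encode('utf-8'))
with open_default(bom_file) as f:
    assert f.read() == 'hello'

# Files we write ourselves carry no BOM:
with open_default(new_file, 'w') as f:
    f.write('hello')
with open(new_file, 'rb') as f:
    assert f.read() == b'hello'
```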
-- Steven D'Aprano From ncoghlan at gmail.com Wed Aug 10 23:17:30 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 11 Aug 2016 13:17:30 +1000 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com> <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de> <8acf8247-1b03-5f0f-0a74-a235f486e956@gmail.com> <78fb06dc-ee03-35ad-8b8f-9852ba14fd1e@mail.de> Message-ID: On 11 August 2016 at 03:17, Sven R. Kunze wrote: > On 09.08.2016 05:23, Nick Coghlan wrote: > > On 9 August 2016 at 08:37, Sven R. Kunze wrote: > >> From what I've heard in the wild, that most if not all pieces of async >> are mirroring existing Python features. So, building async basically builds >> a parallel structure in Python resembling Python. Async generators complete >> the picture. Some people (including me) are concerned by this because they >> feel that having two "almost the same pieces" is not necessarily a good >> thing to have. And not necessarily bad but it feels like duplicating code >> all over the place especially as existing functions are incompatible with >> async. >> > > It's a known problem that applies to programming language design in > general rather than being Python specific: http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ > > > If it's a such well-known **problem**, why does it even exist in Python in > the first place? ;-) > Because sync vs async *isn't* primarily about parallelism - it's about different approaches to modelling a problem space.
The key differences are akin to those between functional programming and object-oriented programming, rather than to those between using threads and processes for parallel code execution (when all you want is an even lighter weight alternative to threads without changing the programming model, then you want something like gevent). Glyph (of Twisted) wrote a good post about this back when asyncio was first being considered for the standard library: https://glyph.twistedmatrix.com/2014/02/unyielding.html So Python has support for both synchronous and asynchronous programming constructs for much the same reason we have both closures and classes: while technically isomorphic capabilities at a theoretical language design level, in practice, different problems lend themselves to different modelling techniques. I don't buy that one couldn't have avoided it. > You may want to check out some of the more event-driven programming oriented languages I mention in http://www.curiousefficiency.org/posts/2015/10/languages-to-improve-your-python.html#event-driven-programming-javascript-go-erlang-elixir The easy way to avoid it is to just not offer one or the other, the same way that pure functional languages don't offer the ability to define objects with mutable state, and primarily object-oriented languages may not offer support for full lexical closures. > Lately, I talked to friend of mine about async and his initial reaction > was like "hmm that reminds me of early programming days, where you have to > explicitly tell the scheduler to get control back". He's much older than > me, so I think it was interesting for him to see that history is repeating > again. > Yep, cooperative multi-threading predated pre-emptive multi-threading by *many* years (one of the big upgrades between Mac OS 9 and OS X was finally getting a pre-emptive scheduler).
The difference is that where an OS is scheduling CPU access between mutually unaware applications, application developers can assume that other control flows *within* the application space are *not* actively hostile. In that latter context, enabling local reasoning about where context switches may happen can be valuable from a maintainability perspective, hence the pursuit of "best of both worlds" approaches now that the programming language design community has extensive experience with both. > One of the big mindset shifts it encourages is to design as many support > libraries as possible as computational pipelines and message-driven state > machines, rather than coupling them directly to IO operations (which is the > way many of them work today). Brett Cannon started the Sans IO information > project to discuss this concept at http://sans-io.readthedocs.io/ > > Interesting shift indeed and I like small Lego bricks I can use to build a > big house. > > However, couldn't this be achieved without splitting the community? > You assume the current confusion around how the sync and async worlds interrelate will be permanent - it won't, any more than the confusion around the introduction of lexical closures and new-style classes was permanent back in the 2.2 era. It will be a process of convergence, first with the disparate async communities (Twisted, Tornado, gevent, etc) converging on the async/await syntactic abstraction on the asynchronous side of things, and then with standardisation of the sync/async gateways to make async implementations easy to consume from synchronous code (thread and process pools mean consumption of synchronous code from async is already relatively standardised) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Wed Aug 10 23:26:16 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 11 Aug 2016 13:26:16 +1000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <57AB6E2D.6050704@python.org> References: <57AB6E2D.6050704@python.org> Message-ID: On 11 August 2016 at 04:10, Steve Dower wrote: > > I suspect there's a lot of discussion to be had around this topic, so I want to get it started. There are some fairly drastic ideas here and I need help figuring out whether the impact outweighs the value. My main reaction would be that if Drekin (Adam Bartoš) agrees the changes natively solve the problems that https://pypi.python.org/pypi/win_unicode_console works around, it's probably a good idea. The status quo is also sufficiently broken from both a native Windows perspective and a cross-platform compatibility perspective that your proposals are highly unlikely to make things *worse* :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Aug 10 23:28:56 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 11 Aug 2016 13:28:56 +1000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> Message-ID: On 11 August 2016 at 13:26, Nick Coghlan wrote: > On 11 August 2016 at 04:10, Steve Dower wrote: >> >> I suspect there's a lot of discussion to be had around this topic, so I want to get it started. There are some fairly drastic ideas here and I need help figuring out whether the impact outweighs the value. > > My main reaction would be that if Drekin (Adam Bartoš) agrees the > changes natively solve the problems that > https://pypi.python.org/pypi/win_unicode_console works around, it's > probably a good idea. 
Also, a reminder that Adam has a couple of proposals on the tracker aimed at getting CPython to use a UTF-16-LE console on Windows: http://bugs.python.org/issue22555#msg242943 (last two issue references in that comment) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Wed Aug 10 23:46:22 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 11 Aug 2016 13:46:22 +1000 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com> <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de> <8acf8247-1b03-5f0f-0a74-a235f486e956@gmail.com> <78fb06dc-ee03-35ad-8b8f-9852ba14fd1e@mail.de> Message-ID: On Thu, Aug 11, 2016 at 1:17 PM, Nick Coghlan wrote: >> Lately, I talked to friend of mine about async and his initial reaction >> was like "hmm that reminds me of early programming days, where you have to >> explicitly tell the scheduler to get control back". He's much older than me, >> so I think it was interesting for him to see that history is repeating >> again. > > > Yep, cooperative multi-threading predated pre-emptive multi-threading by > *many* years (one of the big upgrades between Mac OS 9 and OS X was finally > getting a pre-emptive scheduler). The difference is that where an OS is > scheduling CPU access between mutually unaware applications, application > developers can assume that other control flows *within* the application > space are *not* actively hostile. > Mac OS lagged behind other operating systems in that. Back before 1990, OS/2 offered the PC world an alternative to the cooperative multitasking of Windows. The basic difference was: In Windows (that being the 3.x line at the time), an infinite loop in your program will bring the whole system down, but in OS/2, it'll only bring your program down. 
Windows NT then built on the OS/2 model, and Win2K, XP, and all subsequent versions of Windows have used that same preemption. Linux takes things slightly differently in the way it handles threads within a process, but still, processes are preemptively switched between. Though tongue-in-cheek, this talk shows (along the way) some of the costs of preemption, and thus some of the possibilities you'd open up if you could go back to cooperation. While I don't think the time is right for *operating systems* to go that route, it's definitely an option for tasks within one process, where (as Nick says) you can assume that it's not actively hostile. On the other hand, I've had times when I have *really* appreciated preemptive thread switching, as it's allowed me to fix problems in one of my threads by accessing another thread interactively, so maybe the best model is a collection of threads with a master. https://www.destroyallsoftware.com/talks/the-birth-and-death-of-javascript ChrisA From rosuav at gmail.com Thu Aug 11 00:09:00 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 11 Aug 2016 14:09:00 +1000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <20160811031403.GD26300@ando.pearwood.info> References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <57ABBB6F.7050206@python.org> <20160811031403.GD26300@ando.pearwood.info> Message-ID: On Thu, Aug 11, 2016 at 1:14 PM, Steven D'Aprano wrote: > On Wed, Aug 10, 2016 at 04:40:31PM -0700, Steve Dower wrote: > >> On 10Aug2016 1431, Chris Angelico wrote: >> >>* make the default open() encoding check for a BOM or else use utf-8 >> > >> >-0.5. Is there any precedent for this kind of data-based detection >> >being the default? > > There is precedent: the Python interpreter will accept a BOM instead of > an encoding cookie when importing .py files. 
Okay, that's good enough for me. > [Chris] >> >An explicit "utf-sig" could do a full detection, >> >but even then it's not perfect - how do you distinguish UTF-32LE from >> >UTF-16LE that starts with U+0000? > > BOMs are a heuristic, nothing more. If you're reading arbitrary files that > could start with anything, then of course they can guess wrong. But then > if I dumped a bunch of arbitrary Unicode codepoints in your lap and > asked you to guess the language, you would likely get it wrong too :-) I have my own mental heuristics, but I can't recognize one Cyrillic language from another. And some Slavic languages can be written with either Latin or Cyrillic letters, just to further confuse matters. Of course, "arbitrary Unicode codepoints" might not all come from one language, and might not be any language at all. (Do you wanna build a U+2603?) > [Chris] >> >Do you say "UTF-32 is rare so we'll >> >assume UTF-16", or do you say "files starting U+0000 are rare, so >> >we'll assume UTF-32"? > > The way I have done auto-detection based on BOMs is you start by reading > four bytes from the file in binary mode. (If there are fewer than four > bytes, it cannot be a text file with a BOM.) Interesting. Are you assuming that a text file cannot be empty? Because 0xFF 0xFE could represent an empty file in UTF-16, and 0xEF 0xBB 0xBF likewise for UTF-8. Or maybe you don't care about files with less than one character in them? > Compare those first four > bytes against the UTF-32 BOMs first, and the UTF-16 BOMs *second* > (otherwise UTF-16 will shadow UTF-32). Note that there are two BOMs > (big-endian and little-endian). Then check for UTF-8, and if you're > really keen, UTF-7 and UTF-1. For a default file-open encoding detection, I would minimize the number of options. The UTF-7 BOM could be the beginning of a file containing Base 64 data encoded in ASCII, which is a very real possibility. 
> elif bom.startswith(b'\x2B\x2F\x76'): > if len(bom) == 4 and bom[3] in b'\x2B\x2F\x38\x39': > return 'utf_7' So I wouldn't include UTF-7 in the detection. Nor UTF-1. Both are rare. Even UTF-32 doesn't necessarily have to be included. When was the last time you saw a UTF-32LE-BOM file? > [Steve] >> But the main reason for detecting the BOM is that currently opening >> files with 'utf-8' does not skip the BOM if it exists. I'd be quite >> happy with changing the default encoding to: >> >> * utf-8-sig when reading (so the UTF-8 BOM is skipped if it exists) >> * utf-8 when writing (so the BOM is *not* written) > > Sounds reasonable to me. > > Rather than hard-coding that behaviour, can we have a new encoding that > does that? "utf-8-readsig" perhaps. +1. Makes the documentation easier by having the default value for encoding not depend on the value for mode. ChrisA From random832 at fastmail.com Thu Aug 11 00:57:43 2016 From: random832 at fastmail.com (Random832) Date: Thu, 11 Aug 2016 00:57:43 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> Message-ID: <1470891463.1218630.692102449.5D597688@webmail.messagingengine.com> On Wed, Aug 10, 2016, at 17:31, Chris Angelico wrote: > AIUI, the data flow would be: Python bytes object Nothing _starts_ as a Python bytes object. It has to be read from somewhere or encoded in the source code as a literal. The scenario is very different for "defined internally within the program" (how are these not gonna be ASCII) vs "user input" (user input how? from the console? from tkinter? how'd that get converted to bytes?) 
vs "from a network or something like a tar file where it represents a path on some other system" (in which case it's in whatever encoding that system used, or *maybe* an encoding defined as part of the network protocol or file format). The use case has not been described adequately enough to answer my question. From p.f.moore at gmail.com Thu Aug 11 04:46:33 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 11 Aug 2016 09:46:33 +0100 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <1470871841.2523792.691926257.5A72710B@webmail.messagingengine.com> References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <1470871841.2523792.691926257.5A72710B@webmail.messagingengine.com> Message-ID: On 11 August 2016 at 00:30, Random832 wrote: >> Python could copy how >> configure_text_mode() handles the BOM, except it shouldn't write a BOM >> for new UTF-8 files. > > I disagree. I think that *on windows* it should, just like *on windows* > it should write CR-LF for line endings. Tools like git and hg, and cross platform text editors, handle transparently managing the differences between line endings for you. But nothing much handles BOM stripping/adding automatically. So while in theory the two cases are similar, in practice lack of tool support means that if we start adding BOMs on Windows (and requiring them so that we can detect UTF8) then we'll be setting up new interoperability problems for Python users, for little benefit. 
Paul From cory at lukasa.co.uk Thu Aug 11 04:46:38 2016 From: cory at lukasa.co.uk (Cory Benfield) Date: Thu, 11 Aug 2016 09:46:38 +0100 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com> <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de> <8acf8247-1b03-5f0f-0a74-a235f486e956@gmail.com> <78fb06dc-ee03-35ad-8b8f-9852ba14fd1e@mail.de> Message-ID: > On 11 Aug 2016, at 04:46, Chris Angelico wrote: > > Though tongue-in-cheek, this talk shows (along the way) some of the > costs of preemption, and thus some of the possibilities you'd open up > if you could go back to cooperation. While I don't think the time is > right for *operating systems* to go that route, it's definitely an > option for tasks within one process, where (as Nick says) you can > assume that it's not actively hostile. On the other hand, I've had > times when I have *really* appreciated preemptive thread switching, as > it's allowed me to fix problems in one of my threads by accessing > another thread interactively, so maybe the best model is a collection > of threads with a master. As glyph notes in the blog post Nick linked, the reason threads are a problematic programming model is exactly because of their pre-emptive nature. This essentially gives you combinatoric expansion of the excecution-order-possibility-space: you must now confirm that your program is safe to running its instructions in almost any order, because that is what pre-emptive multithreading allows. Now, as you point out Chris, pre-emptive thread switching can help in weird situations, but those weird situations certainly should not be the average case! 
Most programmers should not be writing I/O code that is multithreaded for the same reason that most programmers shouldn't be writing assembly code to speed up their Python programs: the benefit obtained from that approach is usually not outweighed by the increased engineering and maintenance costs. That's not to say that there aren't times where both of those ideas are good ideas: just that they're uncommon, and shouldn't be the first tool you pull out of your toolkit. While we're talking about function colour, we should note that you don't *have* to have function colour. A quick look at your average Twisted codebase that doesn't use @inlineCallbacks will quickly show you that you can write an asynchronous program using nothing but synchronous function calls, so long as you are careful. And this has always been true: in my first job I worked on a cooperative-multitasking program written entirely in C, and as we all know C has absolutely no method for describing function colour. Again, so long as you are careful and don't do any blocking I/O in your program, everything works just fine. But, here's the rub. People hate callbacks, and people are bad at threads. Not everyone: I'm perfectly happy writing Twisted code, thank you, and I know loads of programmers who swear blind that threading is easy and they never have any problems, and they are almost certainly right, I'm not going to deny their experience of their discipline. But the reality is that writing good threaded code and writing good callback code are both harder than writing good code using coloured functions: they require care, and discipline, and foresight, and an understanding of code execution that is much more complex than what is rendered in any single function. Whereas writing code with coloured functions is *easier*: the space of possible execution flows is lower, the yield points are all clearly marked, and most of the care and discipline you need is reduced. 
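[Editor's note: the "yield points are all clearly marked" observation can be made concrete with a toy asyncio example. This is a modern-Python sketch added as an illustration, not code from the thread: between awaits each coroutine runs without interruption, so the possible interleavings are visible in the source.]

```python
import asyncio

log = []

async def worker(name, steps):
    for i in range(steps):
        log.append((name, i))   # this block runs without interruption
        await asyncio.sleep(0)  # the only point where a switch can occur

async def main():
    # Two cooperating tasks; switches happen only at the awaits, so on
    # the default event loop they take turns in a simple round-robin.
    await asyncio.gather(worker('a', 2), worker('b', 2))

asyncio.run(main())

# Each worker's own steps are necessarily in order, and nothing can
# pre-empt the code between awaits.
assert [e for e in log if e[0] == 'a'] == [('a', 0), ('a', 1)]
assert [e for e in log if e[0] == 'b'] == [('b', 0), ('b', 1)]
```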
The general long term solution to this is not to remove function colour, because it helps people. The long term solution is to develop a programming culture that says that 99% of your code should be normal functions, and only the last tiny 1% should make any statements about function colour. This is part of what the Sans-I/O project is about: demonstrating that you should avoid painting your functions as async until the last possible second, and that the fewer times you write async/await/@asyncio.coroutine/yield from the easier your life gets. Cory -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Aug 11 05:07:42 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 11 Aug 2016 10:07:42 +0100 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <57ABBB6F.7050206@python.org> Message-ID: On 11 August 2016 at 01:41, Chris Angelico wrote: > I've almost never seen files stored in UTF-32 (even UTF-16 isn't all > that common compared to UTF-8), so I wouldn't stress too much about > that. Recognizing FE FF or FF FE and decoding as UTF-16 might be worth > doing, but it could easily be retrofitted (that byte sequence won't > decode as UTF-8). I see UTF-16 relatively often as a result of redirecting stdout in Powershell and forgetting that it defaults (stupidly, IMO) to UTF-16. >> The main problem here is that if the console is not forced to UTF-8 then it >> won't render any of the characters correctly. > Ehh, that's annoying. Is there a way to guarantee, at the process > level, that the console will be returned to "normal state" when Python > exits? If not, there's the risk that people run a Python program and > then the *next* program gets into trouble. 
There's also the risk that Python programs using subprocess.Popen start the subprocess with the console in a non-standard state. Should we be temporarily restoring the console codepage in that case? How does the following work?

    set codepage to UTF-8
    ...
    set codepage back
    spawn subprocess X, but don't wait for it
    set codepage to UTF-8
    ...
    ... At this point what codepage does Python see? What codepage does
        process X see? (Note that they are both sharing the same console).
    ...
    restore codepage

Paul From ncoghlan at gmail.com Thu Aug 11 08:00:08 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 11 Aug 2016 22:00:08 +1000 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com> <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de> <8acf8247-1b03-5f0f-0a74-a235f486e956@gmail.com> <78fb06dc-ee03-35ad-8b8f-9852ba14fd1e@mail.de> Message-ID: On 11 Aug 2016 18:48, "Cory Benfield" wrote: > > While we're talking about function colour, we should note that you don't *have* to have function colour. A quick look at your average Twisted codebase that doesn't use @inlineCallbacks will quickly show you that you can write an asynchronous program using nothing but synchronous function calls, so long as you are careful. And this has always been true: in my first job I worked on a cooperative-multitasking program written entirely in C, and as we all know C has absolutely no method for describing function colour. Again, so long as you are careful and don't do any blocking I/O in your program, everything works just fine. Twisted callbacks are still red functions - you call them via the event loop rather than directly, and only event loop aware functions know how to make that request of the event loop. 
async/await just makes the function colour locally visible with a clear syntax, rather than needing to be inferred from behavioural details of the function implementation. > The general long term solution to this is not to remove function colour, because it helps people. The long term solution is to develop a programming culture that says that 99% of your code should be normal functions, and only the last tiny 1% should make any statements about function colour. This is part of what the Sans-I/O project is about: demonstrating that you should avoid painting your functions as async until the last possible second, and that the fewer times you write async/await/@asyncio.coroutine/yield from the easier your life gets. This part I wholeheartedly agree with, though - the vast majority of code will ideally be synchronous *non-blocking* code, suitable for use in data transformation pipelines and other data modelling and manipulation operations. Similar to Unicode handling, "event-driven or blocking?" is a decision best made at *system boundaries*, with the bulk of the application code not needing to worry about that decision either way. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cory at lukasa.co.uk Thu Aug 11 09:16:04 2016 From: cory at lukasa.co.uk (Cory Benfield) Date: Thu, 11 Aug 2016 14:16:04 +0100 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com> <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de> <8acf8247-1b03-5f0f-0a74-a235f486e956@gmail.com> <78fb06dc-ee03-35ad-8b8f-9852ba14fd1e@mail.de> Message-ID: <95639967-5D79-4A12-9083-CBFF0F2A6AC9@lukasa.co.uk> > On 11 Aug 2016, at 13:00, Nick Coghlan wrote: > Twisted callbacks are still red functions - you call them via the event loop rather than directly, and only event loop aware functions know how to make that request of the event loop. > > async/await just makes the function colour locally visible with a clear syntax, rather than needing to be inferred from behavioural details of the function implementation. > That's not really true, though its trueness depends on what you mean when you say "Twisted callbacks". Like asyncio (which adopted this model from Twisted), Twisted has two separate things that might be called "callbacks". The first can be called "callbacks-by-convention": these are things like IProtocol, which define a model that Twisted will use to deliver data to your application. The second can be called "callbacks-by-implementation", which are Deferreds. For "callbacks-by-convention", I think the best way to describe it is that Twisted functions have a *convention* of being red, but they aren't *actually* red. They can be called whenever you like. 
For example, consider the following Twisted protocol:

from twisted.internet.protocol import Protocol

class PrinterProtocol(Protocol):
    def connectionMade(self, transport):
        self.transport = transport
        self.transport.write(
            b'GET / HTTP/1.1\r\n'
            b'Host: http2bin.org\r\n'
            b'Connection: close\r\n'
            b'\r\n'
        )

    def dataReceived(self, data):
        print data,

    def connectionLost(self, reason):
        print "Connection lost"

There is nothing here that requires execution inside a reactor. In fact, if you change the spelling of "write" to "sendall", you don't need anything that the Python 2.7 standard library doesn't give you:

import socket

def main():
    p = PrinterProtocol()
    s = socket.create_connection(('http2bin.org', 80))
    p.connectionMade(s)

    while True:
        data = s.recv(8192)
        if data:
            p.dataReceived(data)
        else:
            p.connectionLost(None)
            break

    s.close()

What matters here is that these functions are *just functions*. Unlike Python 3's async functions, they do not require a coroutine runner that understands their special yield values. You call them, they do a thing, they return synchronously. This is how callback code *has* to be constructed, and it's why callbacks are the lowest-common-denominator kind of async code. Conversely, async functions as defined in Python 3 really do need a coroutine runner to execute them or nothing happens. But I presume you were really talking about the second kind of Twisted callback-function, which is the kind that uses Deferreds. And once again, there is nothing inherent in these functions that gives them a colour. Consider this bit of code:

def showConnection(conn):
    print conn
    conn.close()

def doStuff():
    deferredConnection = makeConnection('http2bin.org', 80)
    deferredConnection.addCallback(showConnection)
    return deferredConnection

Your argument is that those functions are red. My argument is that they are uncoloured. 
For example, you can write makeConnection() like this:

def makeConnection(host, port):
    s = socket.create_connection((host, port))
    d = Deferred()
    d.callback(s)
    return d

The simple fact of returning a Deferred doesn't make your function async. This is one of Deferred's fairly profound differences from asyncio.Future, which genuinely *does* require an event loop: Deferred can be evaluated entirely synchronously, with no recourse to an event loop. This has the effect of *uncolouring* functions that work with Deferreds. Their control flow is obscured, sure, but they are nonetheless not like async functions. In fact, it would be entirely possible to replace Twisted's core event loop functions with ones that don't delegate to a reactor, and then have a synchronous version of Twisted. It would be pretty gloriously useless, but is entirely do-able. Cory From steve at pearwood.info Thu Aug 11 10:25:03 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 12 Aug 2016 00:25:03 +1000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <57ABBB6F.7050206@python.org> <20160811031403.GD26300@ando.pearwood.info> Message-ID: <20160811142502.GE26300@ando.pearwood.info> On Thu, Aug 11, 2016 at 02:09:00PM +1000, Chris Angelico wrote: > On Thu, Aug 11, 2016 at 1:14 PM, Steven D'Aprano wrote: > > The way I have done auto-detection based on BOMs is you start by reading > > four bytes from the file in binary mode. (If there are fewer than four > > bytes, it cannot be a text file with a BOM.) > > Interesting. Are you assuming that a text file cannot be empty?
Or maybe you don't care about files with > less than one character in them? I'll have to think about it some more :-) > For a default file-open encoding detection, I would minimize the > number of options. The UTF-7 BOM could be the beginning of a file > containing Base 64 data encoded in ASCII, which is a very real > possibility. I'm coming from the assumption that you're reading unformatted text in an unknown encoding, rather than some structured format. But we're getting off topic here. In context of Steve's suggestion, we should only autodetect UTF-8. In other words, if there's a UTF-8 BOM, skip it, otherwise treat the file as UTF-8. > When was the last time you saw a UTF-32LE-BOM file? Two minutes ago, when I looked at my test suite :-P -- Steve From random832 at fastmail.com Thu Aug 11 10:52:53 2016 From: random832 at fastmail.com (Random832) Date: Thu, 11 Aug 2016 10:52:53 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <20160811142502.GE26300@ando.pearwood.info> References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <57ABBB6F.7050206@python.org> <20160811031403.GD26300@ando.pearwood.info> <20160811142502.GE26300@ando.pearwood.info> Message-ID: <1470927173.2739053.692545393.4724F733@webmail.messagingengine.com> On Thu, Aug 11, 2016, at 10:25, Steven D'Aprano wrote: > > Interesting. Are you assuming that a text file cannot be empty? > > Hmmm... not consciously, but I guess I was. > > If the file is empty, how do you know it's text? Heh. That's the *other* thing that Notepad does wrong in the opinion of people coming from the Unix world - a Windows text file does not need to end with a [CR]LF, and normally will not. > But we're getting off topic here. In context of Steve's suggestion, we > should only autodetect UTF-8.
In other words, if there's a UTF-8 BOM, > skip it, otherwise treat the file as UTF-8. I think there's still room for UTF-16. It's two of the four encodings supported by Notepad, after all. From guettliml at thomas-guettler.de Thu Aug 11 11:22:36 2016 From: guettliml at thomas-guettler.de (=?UTF-8?Q?Thomas_G=c3=bcttler?=) Date: Thu, 11 Aug 2016 17:22:36 +0200 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com> <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de> <8acf8247-1b03-5f0f-0a74-a235f486e956@gmail.com> <78fb06dc-ee03-35ad-8b8f-9852ba14fd1e@mail.de> Message-ID: <57AC983C.1040304@thomas-guettler.de> On 10.08.2016 at 19:17, Sven R. Kunze wrote: > On 09.08.2016 05:23, Nick Coghlan wrote: >> On 9 August 2016 at 08:37, Sven R. Kunze > wrote: >> >> From what I've heard in the wild, most if not all pieces of async are mirroring existing Python features. So, >> building async basically builds a parallel structure in Python resembling Python. Async generators complete the >> picture. Some people (including me) are concerned by this because they feel that having two "almost the same >> pieces" is not necessarily a good thing to have. And not necessarily bad but it feels like duplicating code all >> over the place especially as existing functions are incompatible with async. >> >> >> It's a known problem that applies to programming language design in general rather than being Python specific: >> http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ > > > If it's such a well-known **problem**, why does it even exist in Python in the first place? ;-) > > I don't buy that one couldn't have avoided it.
> > > Lately, I talked to a friend of mine about async and his initial reaction was like "hmm that reminds me of early > programming days, where you have to explicitly tell the scheduler to get control back". He's much older than me, so I > think it was interesting for him to see that history is repeating again. Up to now there is only one answer to the question "Is `await` in Python3 Cooperative Multitasking?" http://stackoverflow.com/questions/38865050/is-await-in-python3-cooperative-multitasking The user there writes: [about await] That sounds quite like Cooperative multitasking to me. In 1996 I was a student at university and was told that preemptive multitasking is better. Since tools like http://python-rq.org/ can be implemented in Python2.7 I ask myself: why change the language? My opinion: if you want to do parallel processing, use a tool like python-rq or celery. Regards, Thomas Güttler -- Thomas Guettler http://www.thomas-guettler.de/ From steve.dower at python.org Thu Aug 11 11:31:46 2016 From: steve.dower at python.org (Steve Dower) Date: Thu, 11 Aug 2016 08:31:46 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <1470927173.2739053.692545393.4724F733@webmail.messagingengine.com> References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <57ABBB6F.7050206@python.org> <20160811031403.GD26300@ando.pearwood.info> <20160811142502.GE26300@ando.pearwood.info> <1470927173.2739053.692545393.4724F733@webmail.messagingengine.com> Message-ID: Unless someone else does the implementation, I'd rather add a utf8-readsig encoding that initially only skips a utf8 BOM - notably, you always get the same encoding, it just sometimes skips the first three bytes.
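Roughly, the intended behaviour can be sketched as a plain function (this is just an illustration, not the actual codec machinery, and the name is made up):

```python
def decode_utf8_readsig(raw):
    # Hypothetical sketch of a "utf8-readsig" decode: the encoding is
    # always UTF-8; the only special case is a leading UTF-8 BOM
    # (EF BB BF), which is silently skipped when present.
    bom = b'\xef\xbb\xbf'
    if raw.startswith(bom):
        raw = raw[len(bom):]
    return raw.decode('utf-8')
```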
I think we can change this later to detect and switch to utf16 without it being disastrous, though we've made it this far without it and frankly there are good reasons to "encourage" utf8 over utf16. My big concern is the console... I think that change is inevitably going to have to break someone, but I need to map out the possibilities first to figure out just how bad it'll be. Top-posted from my Windows Phone -----Original Message----- From: "Random832" Sent: 8/11/2016 7:54 To: "python-ideas at python.org" Subject: Re: [Python-ideas] Fix default encodings on Windows On Thu, Aug 11, 2016, at 10:25, Steven D'Aprano wrote: > > Interesting. Are you assuming that a text file cannot be empty? > > Hmmm... not consciously, but I guess I was. > > If the file is empty, how do you know it's text? Heh. That's the *other* thing that Notepad does wrong in the opinion of people coming from the Unix world - a Windows text file does not need to end with a [CR]LF, and normally will not. > But we're getting off topic here. In context of Steve's suggestion, we > should only autodetect UTF-8. In other words, if there's a UTF-8 BOM, > skip it, otherwise treat the file as UTF-8. I think there's still room for UTF-16. It's two of the four encodings supported by Notepad, after all. _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed...
URL: From rosuav at gmail.com Thu Aug 11 12:24:21 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 12 Aug 2016 02:24:21 +1000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <57ABBB6F.7050206@python.org> <20160811031403.GD26300@ando.pearwood.info> <20160811142502.GE26300@ando.pearwood.info> <1470927173.2739053.692545393.4724F733@webmail.messagingengine.com> Message-ID: On Fri, Aug 12, 2016 at 1:31 AM, Steve Dower wrote: > My big concern is the console... I think that change is inevitably going to > have to break someone, but I need to map out the possibilities first to > figure out just how bad it'll be. Obligatory XKCD: https://xkcd.com/1172/ Subprocess invocation has been mentioned. What about logging? Will there be issues with something that attempts to log to both console and file? ChrisA From cory at lukasa.co.uk Thu Aug 11 12:38:58 2016 From: cory at lukasa.co.uk (Cory Benfield) Date: Thu, 11 Aug 2016 17:38:58 +0100 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: <57AC983C.1040304@thomas-guettler.de> References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> <8717858e-a04d-a6a3-c5de-8101cb69a906@mail.de> <04406d41-6a05-341a-bfca-75cb6781af04@gmail.com> <51b77f51-6719-7c19-9445-b3afd35c8fe6@gmail.com> <3621051e-aaa9-427d-c0bb-e44e6e766ec7@mail.de> <8acf8247-1b03-5f0f-0a74-a235f486e956@gmail.com> <78fb06dc-ee03-35ad-8b8f-9852ba14fd1e@mail.de> <57AC983C.1040304@thomas-guettler.de> Message-ID: > On 11 Aug 2016, at 16:22, Thomas Güttler wrote: > > The user there writes: [about await] That sounds quite like Cooperative multitasking to me. > > In 1996 I was a student at university and was told that preemptive multitasking is better.
> > Since tools like http://python-rq.org/ can be implemented in Python2.7 I ask myself: why change the language? > > My opinion: if you want to do parallel processing, use a tool like python-rq or celery. I agree. async/await doesn't do parallel processing, it does concurrent processing. See also: https://blog.golang.org/concurrency-is-not-parallelism Cory -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Thu Aug 11 13:01:04 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 11 Aug 2016 11:01:04 -0600 Subject: [Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484 In-Reply-To: References: Message-ID: On Tue, Aug 9, 2016 at 5:32 PM, Eric Snow wrote: > There are a number of options here for identifying attributes > in a definition and even auto-generating parts of the class (e.g. > __init__). Let's look at several (with various objectives): > > # currently (with comments for type info) > # using attribute annotations and a decorator (and PEP 520) > # using attribute annotations and a metaclass (and PEP 520) > # using one class decorator and PEP 520 and comments for type info > # using one class decorator and comments for type info > # using one class decorator (and PEP 468) and comments for type info > # using a class decorator for each attribute Another approach that I've used in the past (along with a derivative):

    # using a non-data descriptor
    @as_namedtuple
    class Bee:
        """..."""
        name = Attr(str, 'Eric', doc="the bee's name")
        ancient_injury = Attr(bool, False)
        menagerie = Attr(bool, False)
        def half_a(self): ...

    # using a non-data descriptor along with PEP 487
    class Bee(Namedtuple):
        """..."""
        name = Attr(str, 'Eric', doc="the bee's name")
        ancient_injury = Attr(bool, False)
        menagerie = Attr(bool, False)
        def half_a(self): ...
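For concreteness, a bare-bones version of the Attr non-data descriptor assumed above might look like the following (a toy sketch of the idea, not the actual implementation):

```python
class Attr:
    """Toy non-data descriptor recording a type, a default, and a docstring."""

    def __init__(self, type, default, doc=None):
        self.type = type
        self.default = default
        self.__doc__ = doc

    def __set_name__(self, owner, name):
        # PEP 487 hook (3.6+): the descriptor learns its attribute name.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # As a non-data descriptor, a same-named entry in the instance
        # __dict__ shadows this lookup entirely; otherwise we fall back
        # to the declared default.
        return obj.__dict__.get(self.name, self.default)
```

Because there is no __set__, assigning to the attribute on an instance simply stores a plain value that takes precedence from then on.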
While the descriptor gives you docstrings (a la property), I expect it isn't as static-analysis-friendly as the proposed variable annotations. -eric From drekin at gmail.com Thu Aug 11 14:34:05 2016 From: drekin at gmail.com (=?UTF-8?B?QWRhbSBCYXJ0b8Wh?=) Date: Thu, 11 Aug 2016 20:34:05 +0200 Subject: [Python-ideas] Fix default encodings on Windows Message-ID: > > On 11 August 2016 at 04:10, Steve Dower > wrote: > >> I suspect there's a lot of discussion to be had around this topic, so I want to get it started. There are some fairly drastic ideas here and I need help figuring out whether the impact outweighs the value. > > My main reaction would be that if Drekin (Adam Bartoš) agrees the > changes natively solve the problems that https://pypi.python.org/pypi/win_unicode_console works around, it's > probably a good idea. > > The status quo is also sufficiently broken from both a native Windows > perspective and a cross-platform compatibility perspective that your > proposals are highly unlikely to make things *worse* :) > > Cheers, > Nick. > > The main idea of win_unicode_console is simple: to use WinAPI functions ReadConsoleW and WriteConsoleW to communicate with the interactive console on Windows and to wrap this in the standard Python IO hierarchy - that's why sys.std*.encoding would be 'utf-16-le': it corresponds to the widechar strings used by Windows wide APIs. But this is only about sys.std*.encoding, which I think is not so important. AFAIK sys.std*.encoding should be used only when you want to communicate in bytes (which I think is not a good idea), so it tells you which encoding sys.std*.buffer is assuming. In fact sys.std* may even not have the buffer attribute, so its encoding attribute would be useless in that case. Unfortunately, sys.std*.encoding is used in some other places - namely by the consumers of the old PyOS_Readline API (the tokenizer and input), which use it to decode the bytes returned.
Actually, the consumers assume different encodings (sys.stdin.encoding vs. sys.stdout.encoding), so it is impossible to write a correct readline hook when the encodings are not the same. So I think it would be nice to have a Python- and string-based implementation of readline hooks - a sys.readlinehook attribute, which would use sys.std* by default on Windows and GNU readline on Unix. Nevertheless, I think it is a good idea to have more 'utf-8' defaults (or 'utf-8-readsig' for open()). I don't know whether it helps with the console issue to open the standard streams in 'utf-8'. Adam Bartoš -------------- next part -------------- An HTML attachment was scrubbed... URL: From drekin at gmail.com Thu Aug 11 14:41:06 2016 From: drekin at gmail.com (=?UTF-8?B?QWRhbSBCYXJ0b8Wh?=) Date: Thu, 11 Aug 2016 20:41:06 +0200 Subject: [Python-ideas] Fix default encodings on Windows Message-ID: Eryk Sun wrote: > IMO, Python needs a C implementation of the win_unicode_console > module, using the wide-character APIs ReadConsoleW and WriteConsoleW. > Note that this sets sys.std*.encoding as UTF-8 and transcodes, so > Python code never has to work directly with UTF-16 encoded text. > > The transcoding wrappers with 'utf-8' encoding are used just as a workaround for the fact that the Python tokenizer cannot use utf-16-le and that the readlinehook machinery is unfortunately bytes-based. The transcoding wrapper just has encoding 'utf-8' and no buffer attribute, so there is no actual transcoding in sys.std* objects. It's just a signal for PyOS_Readline consumers, and the transcoding occurs in a custom readline hook. Nothing like this would be needed if PyOS_Readline was replaced by some Python API wrapper around sys.readlinehook that would be Unicode string based. Adam Bartoš -------------- next part -------------- An HTML attachment was scrubbed...
URL: From eryksun at gmail.com Fri Aug 12 08:31:19 2016 From: eryksun at gmail.com (eryk sun) Date: Fri, 12 Aug 2016 12:31:19 +0000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: Message-ID: On Thu, Aug 11, 2016 at 6:41 PM, Adam Bartoš wrote: > The transcoding wrappers with 'utf-8' encoding are used just as a workaround > for the fact that the Python tokenizer cannot use utf-16-le and that the > readlinehook machinery is unfortunately bytes-based. The transcoding wrapper > just has encoding 'utf-8' and no buffer attribute, so there is no actual > transcoding in sys.std* objects. It's just a signal for PyOS_Readline > consumers, and the transcoding occurs in a custom readline hook. Nothing > like this would be needed if PyOS_Readline was replaced by some Python API > wrapper around sys.readlinehook that would be Unicode string based. If win_unicode_console gets added to the standard library, I think it should provide at least a std*.buffer interface that transcodes between UTF-16 and UTF-8 (with errors='replace'), to make this as much of a drop-in replacement as possible. I know it's not required. For example, IDLE doesn't implement this. But I'm also sure there's code out there that uses stdout.buffer, including in the standard library. It's mostly test code (not including cases for piping output from a child process) and simple script interfaces, but if we don't have to break people's code, we really shouldn't. From eryksun at gmail.com Fri Aug 12 08:38:49 2016 From: eryksun at gmail.com (eryk sun) Date: Fri, 12 Aug 2016 12:38:49 +0000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <57ABBB6F.7050206@python.org> Message-ID: On Thu, Aug 11, 2016 at 9:07 AM, Paul Moore wrote: > set codepage to UTF-8 > ...
> set codepage back > spawn subprocess X, but don't wait for it > set codepage to UTF-8 > ... > ... At this point what codepage does Python see? What codepage does > process X see? (Note that they are both sharing the same console). The input and output codepages are global data in conhost.exe. They aren't tracked for each attached process (unlike input history and aliases). That's how chcp.com works in the first place. Otherwise its calls to SetConsoleCP and SetConsoleOutputCP would be pointless. But IMHO all talk of using codepage 65001 is a waste of time. I think the trailing garbage output with this codepage in Windows 7 is unacceptable. And getting EOF for non-ASCII input is a show stopper. The problem occurs in conhost. All you get is the EOF result from ReadFile/ReadConsoleA, so it can't be worked around. This kills the REPL and raises EOFError for input(). ISTM the only people who think codepage 65001 actually works are those using Windows 8+ who occasionally need to print non-OEM text and never enter (or paste) anything but ASCII text. From steve.dower at python.org Fri Aug 12 09:33:03 2016 From: steve.dower at python.org (Steve Dower) Date: Fri, 12 Aug 2016 06:33:03 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <57ABBB6F.7050206@python.org> Message-ID: I was thinking we would end up using the console API for input but stick with the standard handles for output, mostly to minimize the amount of magic switching we have to do. But since we can just switch the entire stream object in __std*__ once at startup if nothing is redirected it probably isn't that much of a simplification. I have some airport/aeroplane time today where I can experiment. 
Top-posted from my Windows Phone -----Original Message----- From: "eryk sun" Sent: 8/12/2016 5:40 To: "python-ideas" Subject: Re: [Python-ideas] Fix default encodings on Windows On Thu, Aug 11, 2016 at 9:07 AM, Paul Moore wrote: > set codepage to UTF-8 > ... > set codepage back > spawn subprocess X, but don't wait for it > set codepage to UTF-8 > ... > ... At this point what codepage does Python see? What codepage does > process X see? (Note that they are both sharing the same console). The input and output codepages are global data in conhost.exe. They aren't tracked for each attached process (unlike input history and aliases). That's how chcp.com works in the first place. Otherwise its calls to SetConsoleCP and SetConsoleOutputCP would be pointless. But IMHO all talk of using codepage 65001 is a waste of time. I think the trailing garbage output with this codepage in Windows 7 is unacceptable. And getting EOF for non-ASCII input is a show stopper. The problem occurs in conhost. All you get is the EOF result from ReadFile/ReadConsoleA, so it can't be worked around. This kills the REPL and raises EOFError for input(). ISTM the only people who think codepage 65001 actually works are those using Windows 8+ who occasionally need to print non-OEM text and never enter (or paste) anything but ASCII text. _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed...
URL: From p.f.moore at gmail.com Fri Aug 12 09:41:52 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 12 Aug 2016 14:41:52 +0100 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <57ABBB6F.7050206@python.org> Message-ID: On 12 August 2016 at 13:38, eryk sun wrote: >> ... At this point what codepage does Python see? What codepage does >> process X see? (Note that they are both sharing the same console). > > The input and output codepages are global data in conhost.exe. They > aren't tracked for each attached process (unlike input history and > aliases). That's how chcp.com works in the first place. Otherwise its > calls to SetConsoleCP and SetConsoleOutputCP would be pointless. That's what I expected, but hadn't had time to confirm (your point about chcp didn't occur to me). Thanks. > But IMHO all talk of using codepage 65001 is a waste of time. I think > the trailing garbage output with this codepage in Windows 7 is > unacceptable. And getting EOF for non-ASCII input is a show stopper. > The problem occurs in conhost. All you get is the EOF result from > ReadFile/ReadConsoleA, so it can't be worked around. This kills the > REPL and raises EOFError for input(). ISTM the only people who think > codepage 65001 actually works are those using Windows 8+ who > occasionally need to print non-OEM text and never enter (or paste) > anything but ASCII text. Agreed, mucking with global state that subprocesses need was sufficient for me, but the other issues you mention seem conclusive. 
I understand Steve's point about being an improvement over 100% wrong, but we've lived with the current state of affairs long enough that I think we should take whatever time is needed to do it right, rather than briefly postponing the inevitable with a partial solution. Paul PS I've spent the last week on a different project trying to "save time" with partial solutions to precisely this issue, so apologies if I'm in a particularly unforgiving mood about it right now :-( From random832 at fastmail.com Fri Aug 12 10:20:48 2016 From: random832 at fastmail.com (Random832) Date: Fri, 12 Aug 2016 10:20:48 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <57AB6E2D.6050704@python.org> References: <57AB6E2D.6050704@python.org> Message-ID: <1471011648.3049096.693513873.7CF5A957@webmail.messagingengine.com> On Wed, Aug 10, 2016, at 14:10, Steve Dower wrote: > * force the console encoding to UTF-8 on initialize and revert on > finalize > > So what are your concerns? Suggestions? As far as I know, the single biggest problem caused by the status quo for console encoding is "some string containing characters not in the console codepage is printed out; unhandled UnicodeEncodeError". Is there any particular reason not to use errors='replace'? Is there any particular reason for the REPL, when printing the repr of a returned object, not to replace characters not in the stdout encoding with backslash sequences? Does Python provide any mechanism to access the built-in "best fit" mappings for windows codepages (which mostly consist of removing accents from latin letters)? 
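To make the first suggestion concrete, here is what errors='replace' buys you, using cp437 as a stand-in for an OEM console codepage:

```python
# With strict errors, encoding a string for a legacy console codepage
# raises UnicodeEncodeError as soon as one character doesn't fit; with
# errors='replace' the unencodable characters degrade to '?' instead.
text = "caf\u00e9 \u2603"  # "café ☃" -- é is in cp437, the snowman is not

try:
    text.encode("cp437")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True

replaced = text.encode("cp437", errors="replace")
```

The whole print no longer dies just because one character has no mapping; the user sees a '?' where the snowman would have been.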
From random832 at fastmail.com Fri Aug 12 11:33:35 2016 From: random832 at fastmail.com (Random832) Date: Fri, 12 Aug 2016 11:33:35 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <57AB7BC0.4010804@python.org> References: <57AB6E2D.6050704@python.org> <57AB7BC0.4010804@python.org> Message-ID: <1471016015.3065611.693567457.65A9DC9B@webmail.messagingengine.com> On Wed, Aug 10, 2016, at 15:08, Steve Dower wrote: > That's the hope, though that module approaches the solution differently > and may still uses. An alternative way for us to fix this whole thing > would be to bring win_unicode_console into the standard library and use > it by default (or probably whenever PYTHONIOENCODING is not specified).

I have concerns about win_unicode_console:

- For the "text_transcoded" streams, stdout.encoding is utf-8. For the "text" streams, it is utf-16.
- There is no object, as far as I can find, which can be used as an unbuffered unicode I/O object.
- raw output streams silently drop the last byte if an odd number of bytes are written.
- The sys.stdout obtained via streams.enable does not support .buffer / .buffer.raw / .detach
- All of these objects provide a fileno() interface.
- When using os.read/write for data that represents text, the data still should be encoded in the console encoding and not in utf-8 or utf-16.

How important is it to preserve the validity of the conventional advice for "putting stdin/stdout in binary mode" using .buffer or .detach? I suspect this is mainly used for programs intended to have their output redirected, but today it 'kind of works' to run such a program on the console and inspect its output. How important is it for os.read/write(stdxxx.fileno()) to be consistent with stdxxx.encoding? Should errors='surrogatepass' be used? It's unlikely, but not impossible, to paste an invalid surrogate into the console.
With win_unicode_console, this results in a UnicodeDecodeError and, if this happened during a readline, disables the readline hook. Is it possible to break this by typing a valid surrogate pair that falls across a buffer boundary? From drekin at gmail.com Fri Aug 12 12:24:11 2016 From: drekin at gmail.com (=?UTF-8?B?QWRhbSBCYXJ0b8Wh?=) Date: Fri, 12 Aug 2016 18:24:11 +0200 Subject: [Python-ideas] Fix default encodings on Windows Message-ID: On Fri Aug 12 11:33:35 EDT 2016, Random832 wrote: > On Wed, Aug 10, 2016, at 15:08, Steve Dower wrote: >> That's the hope, though that module approaches the solution differently >> and may still uses. An alternative way for us to fix this whole thing >> would be to bring win_unicode_console into the standard library and use >> it by default (or probably whenever PYTHONIOENCODING is not specified). > > I have concerns about win_unicode_console: > - For the "text_transcoded" streams, stdout.encoding is utf-8. For the > "text" streams, it is utf-16. UTF-16 is the "native" encoding since it corresponds to the wide chars used by Read/WriteConsoleW. The UTF-8 is used just as a signal for the consumers of PyOS_Readline. > - There is no object, as far as I can find, which can be used as an > unbuffered unicode I/O object. There is no buffer just on those wrapping streams because the bytes I have are not in UTF-8. Adding one would mean a fake buffer that just decodes and writes to the text stream. AFAIK there is no guarantee that sys.std* objects have a buffer attribute and any code relying on that is incorrect. But I understand that there may be such code and we may want to be compatible. > - raw output streams silently drop the last byte if an odd number of > bytes are written. That's not true: it doesn't write an odd number of bytes, but returns the correct number of bytes written. If only one byte is given, it raises a ValueError.
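To spell out that contract with a toy model (just an illustration of the behaviour described, not the actual win_unicode_console class):

```python
class RawUTF16Writer:
    """Toy raw stream that only writes whole UTF-16 code units.

    An odd trailing byte is left unwritten and the returned count
    reflects that; a single lone byte can make no progress at all,
    hence the ValueError.
    """

    def __init__(self):
        self.buffer = bytearray()

    def write(self, b):
        if len(b) == 1:
            raise ValueError("cannot write a lone byte to a UTF-16 stream")
        n = len(b) - (len(b) % 2)  # largest even-length prefix
        self.buffer += bytes(b[:n])
        return n
```

A caller that checks the return value, as buffered writers do, will simply retry with the remaining byte plus whatever comes next.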
> > - The sys.stdout obtained via streams.enable does not support .buffer / > > .buffer.raw / .detach > > - All of these objects provide a fileno() interface. > > Is this wrong? If I remember, I provide it because of some check -- maybe in input() -- to be viewed as a stdio stream. > > - When using os.read/write for data that represents text, the data still > > should be encoded in the console encoding and not in utf-8 or utf-16. > I don't know what to do with this. Generally I wouldn't use bytes to communicate textual data. Regards, Adam Bartoš -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Aug 12 13:05:22 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 12 Aug 2016 10:05:22 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <57ABBB6F.7050206@python.org> Message-ID: On Fri, Aug 12, 2016 at 6:41 AM, Paul Moore wrote: > I > understand Steve's point about being an improvement over 100% wrong, > but we've lived with the current state of affairs long enough that I > think we should take whatever time is needed to do it right, Sure -- but this is such a freakin' mess that there may well not BE a "right" solution. In which case, something IS better than nothing. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed...
URL: From p.f.moore at gmail.com Fri Aug 12 13:19:01 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 12 Aug 2016 18:19:01 +0100 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <57ABBB6F.7050206@python.org> Message-ID: On 12 August 2016 at 18:05, Chris Barker wrote: > On Fri, Aug 12, 2016 at 6:41 AM, Paul Moore wrote: >> >> I >> understand Steve's point about being an improvement over 100% wrong, >> but we've lived with the current state of affairs long enough that I >> think we should take whatever time is needed to do it right, > > > Sure -- but this is such a freakin' mess that there may well not BE a "right" > solution. > > In which case, something IS better than nothing. Using Unicode APIs for console IO *is* better. PowerShell does it, and it works there. All I'm saying is that we should focus on that as our "improved solution", rather than looking at CP_UTF8 as a "quick and dirty" solution, as there's no evidence that people need "quick and dirty" (they have win_unicode_console if the current state of affairs isn't sufficient for them). I'm not arguing that we do nothing. Are you saying we should use CP_UTF8 *in preference* to wide character APIs? Or that we should implement CP_UTF8 first and then wide chars later? Or are we in violent agreement that we should implement wide chars? Paul From random832 at fastmail.com Fri Aug 12 14:35:13 2016 From: random832 at fastmail.com (Random832) Date: Fri, 12 Aug 2016 14:35:13 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: Message-ID: <1471026913.3106297.693776993.0F32D83A@webmail.messagingengine.com> On Fri, Aug 12, 2016, at 12:24, Adam Bartoš
wrote: > There is no buffer just on those wrapping streams because the bytes I > have are not in UTF-8. Adding one would mean a fake buffer that just > decodes and writes to the text stream. AFAIK there is no guarantee > that sys.std* objects have a buffer attribute and any code relying on > that is incorrect. But I understand that there may be such code and we > may want to be compatible. Yes that's what I meant, I just think it needs to be considered if we're thinking about making it (or something like it) the default Python sys.std*. Maybe the decision will be that maintaining compatibility with these cases isn't important. > > - The sys.stdout obtained via streams.enable does not support > > .buffer / .buffer.raw / .detach > > - All of these objects provide a fileno() interface. > > Is this wrong? If I remember, I provide it because of some check -- > maybe in input() -- to be viewed as a stdio stream. I don't know if it's *wrong* per se (same with the no buffer/raw thing etc), I'm just concerned about the possible effects on code that is written against the current implementation. From tritium-list at sdamon.com Fri Aug 12 16:10:48 2016 From: tritium-list at sdamon.com (tritium-list at sdamon.com) Date: Fri, 12 Aug 2016 16:10:48 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <57ABBB6F.7050206@python.org> Message-ID: <010d01d1f4d5$9e840a80$db8c1f80$@hotmail.com> > -----Original Message----- > From: Python-ideas [mailto:python-ideas-bounces+tritium- > list=sdamon.com at python.org] On Behalf Of Paul Moore > Sent: Friday, August 12, 2016 9:42 AM > To: eryk sun > Cc: python-ideas > Subject: Re: [Python-ideas] Fix default encodings on Windows > > On 12 August 2016 at 13:38, eryk sun wrote: > >> ...
At this point what codepage does Python see? What codepage does > >> process X see? (Note that they are both sharing the same console). > > > > The input and output codepages are global data in conhost.exe. They > > aren't tracked for each attached process (unlike input history and > > aliases). That's how chcp.com works in the first place. Otherwise its > > calls to SetConsoleCP and SetConsoleOutputCP would be pointless. > > That's what I expected, but hadn't had time to confirm (your point > about chcp didn't occur to me). Thanks. > > > But IMHO all talk of using codepage 65001 is a waste of time. I think > > the trailing garbage output with this codepage in Windows 7 is > > unacceptable. And getting EOF for non-ASCII input is a show stopper. > > The problem occurs in conhost. All you get is the EOF result from > > ReadFile/ReadConsoleA, so it can't be worked around. This kills the > > REPL and raises EOFError for input(). ISTM the only people who think > > codepage 65001 actually works are those using Windows 8+ who > > occasionally need to print non-OEM text and never enter (or paste) > > anything but ASCII text. > > Agreed, mucking with global state that subprocesses need was > sufficient for me, but the other issues you mention seem conclusive. I > understand Steve's point about being an improvement over 100% wrong, > but we've lived with the current state of affairs long enough that I > think we should take whatever time is needed to do it right, rather > than briefly postponing the inevitable with a partial solution. For the love of all that is holy and good, ignore that sentiment. We need ANY AND ALL improvements to this miserable console experience. 
> Paul > > PS I've spent the last week on a different project trying to "save > time" with partial solutions to precisely this issue, so apologies if > I'm in a particularly unforgiving mood about it right now :-( > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From chris.barker at noaa.gov Fri Aug 12 17:14:11 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 12 Aug 2016 14:14:11 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1470854785.1512364.691660185.6E0A73F9@webmail.messagingengine.com> <57AB7EEC.2020201@python.org> <1470859745.1539711.691724401.16839566@webmail.messagingengine.com> <57ABBB6F.7050206@python.org> Message-ID: On Fri, Aug 12, 2016 at 10:19 AM, Paul Moore wrote: > > In which case, something IS better than nothing. > > > I'm not arguing that we do nothing. Are you saying we should use > CP_UTF8 *in preference* to wide character APIs? Or that we should > implement CP_UTF8 first and then wide chars later? Honestly, I don't understand the details enough to argue either way. > Or are we in > violent agreement that we should implement wide chars? probably -- to the extend I understand the issues :-) But I am arguing that anything that makes it "better" that actually gets implemented is better than a "right" solution that no one has the time to make it happen, or that we can't agree on anyway. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From victor.stinner at gmail.com Fri Aug 12 19:03:38 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sat, 13 Aug 2016 01:03:38 +0200
Subject: [Python-ideas] Fix default encodings on Windows
In-Reply-To: <57AB6E2D.6050704@python.org>
References: <57AB6E2D.6050704@python.org>
Message-ID: 

Hello,

I'm in holiday and I'm writing on a phone, so sorry in advance for the
short answer.

In short: we should drop support for the bytes API. Just use Unicode on
all platforms, especially for filenames.

Sorry but most of these changes look like very bad ideas. Or maybe I
misunderstood something.

Windows bytes API are broken in different ways, in short your proposal
is to put another layer on top of it to try to workaround issues.

Unicode is complex. Unicode issues are hard to debug. Adding a new layer
makes debugging even harder. Is the bug in the input data? In the layer?
In the final Windows function?

In my experience on UNIX, the most important part is the
interoperability with other applications. I understand that Python 2
will speak ANSI code page but Python 3 will speak UTF-8. I don't
understand how it can work. Almost all Windows applications speak the
ANSI code page (I'm talking about stdin, stdout, pipes, ...). Do you
propose to first try to decode from UTF-8 or fallback on decoding from
the ANSI code page? What about encoding? Always encode to UTF-8?

About BOM: I hate them. Many applications don't understand them. Again,
think about Python 2. I recall vaguely that the Unicode standard
suggests to not use BOM (I have to check).

I recall a bug in gettext. The tool doesn't understand BOM. When I
opened the file in vim, the BOM was invisible (hidden). I had to use
hexdump to understand the issue! BOM introduces issues very difficult to
debug :-/ I also think that it goes in the wrong direction in term of
interoperability.

For the Windows console: I played with all Windows functions, tried all
fonts and many code pages.
I also read technical blog articles of Microsoft
employees. I gave up on this issue. It doesn't seem possible to support
fully Unicode the Windows console (at least the last time I checked). By
the way, it seems like Windows functions have bugs, and the code page 65001
fixes a few issues but introduces new issues...

Victor

Le 10 août 2016 20:16, "Steve Dower" a écrit :

> I suspect there's a lot of discussion to be had around this topic, so I
> want to get it started. There are some fairly drastic ideas here and I need
> help figuring out whether the impact outweighs the value.
>
> Some background: within the Windows API, the preferred encoding is UTF-16.
> This is a 16-bit format that is typed as wchar_t in the APIs that use it.
> These APIs are generally referred to as the *W APIs (because they have a W
> suffix).
>
> There are also (broadly deprecated) APIs that use an 8-bit format (char),
> where the encoding is assumed to be "the user's active code page". These
> are *A APIs. AFAIK, there are no cases where a *A API should be preferred
> over a *W API, and many newer APIs are *W only.
>
> In general, Python passes byte strings into the *A APIs and text strings
> into the *W APIs.
>
> Right now, sys.getfilesystemencoding() on Windows returns "mbcs", which
> translates to "the system's active code page". As this encoding generally
> cannot represent all paths on Windows, it is deprecated and Unicode strings
> are recommended instead. This, however, means you need to write
> significantly different code between POSIX (use bytes) and Windows (use
> text).
>
> ISTM that changing sys.getfilesystemencoding() on Windows to "utf-8" and
> updating path_converter() (Python/posixmodule.c; likely similar code in
> other places) to decode incoming byte strings would allow us to undeprecate
> byte strings and add the requirement that they *must* be encoded with
> sys.getfilesystemencoding().
I assume that this would allow cross-platform > code to handle paths similarly by encoding to whatever the sys module says > they should and using bytes consistently (starting this thread is meant to > validate/refute my assumption). > > (Yes, I know that people on POSIX should just change to using Unicode and > surrogateescape. Unfortunately, rather than doing that they complain about > Windows and drop support for the platform. If you want to keep hitting them > with the stick, go ahead, but I'm inclined to think the carrot is more > valuable here.) > > Similarly, locale.getpreferredencoding() on Windows returns a legacy value > - the user's active code page - which should generally not be used for any > reason. The one exception is as a default encoding for opening files when > no other information is available (e.g. a Unicode BOM or explicit encoding > argument). BOMs are very common on Windows, since the default assumption is > nearly always a bad idea. > > Making open()'s default encoding detect a BOM before falling back to > locale.getpreferredencoding() would resolve many issues, but I'm also > inclined towards making the fallback utf-8, leaving > locale.getpreferredencoding() solely as a way to get the active system > codepage (with suitable warnings about it only being useful for > back-compat). This would match the behavior that the .NET Framework has > used for many years - effectively, utf_8_sig on read and utf_8 on write. > > Finally, the encoding of stdin, stdout and stderr are currently > (correctly) inferred from the encoding of the console window that Python is > attached to. However, this is typically a codepage that is different from > the system codepage (i.e. it's not mbcs) and is almost certainly not > Unicode. 
If users are starting Python from a console, they can use "chcp > 65001" first to switch to UTF-8, and then *most* functionality works > (input() has some issues, but those can be fixed with a slight rewrite and > possibly breaking readline hooks). > > It is also possible for Python to change the current console encoding to > be UTF-8 on initialize and change it back on finalize. (This would leave > the console in an unexpected state if Python segfaults, but console > encoding is probably the least of anyone's worries at that point.) So I'm > proposing actively changing the current console to be Unicode while Python > is running, and hence sys.std[in|out|err] will default to utf-8. > > So that's a broad range of changes, and I have little hope of figuring out > all the possible issues, back-compat risks, and flow-on effects on my own. > Please let me know (either on-list or off-list) how a change like this > would affect your projects, either positively or negatively, and whether > you have any specific experience with these changes/fixes and think they > should be approached differently. > > > To summarise the proposals (remembering that these would only affect > Python 3.6 on Windows): > > * change sys.getfilesystemencoding() to return 'utf-8' > * automatically decode byte paths assuming they are utf-8 > * remove the deprecation warning on byte paths > * make the default open() encoding check for a BOM or else use utf-8 > * [ALTERNATIVE] make the default open() encoding check for a BOM or else > use sys.getpreferredencoding() > * force the console encoding to UTF-8 on initialize and revert on finalize > > So what are your concerns? Suggestions? > > Thanks, > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From victor.stinner at gmail.com Fri Aug 12 19:13:19 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sat, 13 Aug 2016 01:13:19 +0200
Subject: [Python-ideas] Fix default encodings on Windows
In-Reply-To: <57AB6E2D.6050704@python.org>
References: <57AB6E2D.6050704@python.org>
Message-ID: 

Le 10 août 2016 20:16, "Steve Dower" a écrit :
> So what are your concerns? Suggestions?

Add a new option specific to Windows to switch to UTF-8 everywhere, use
BOM, whatever you want, *but* don't change the defaults. IMO mbcs encoding
is the least worst encoding for the default.

I have an idea of a similar option for UNIX: ignore user preference
(LC_ALL, LC_CTYPE, LANG environment variables) and force UTF-8. It's a
common request on UNIX where UTF-8 is now the encoding of almost all
systems, whereas the C library continues to use ASCII when the POSIX locale
is used (which occurs in many cases). Perl already has such utf8 option.

Victor
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From victor.stinner at gmail.com Fri Aug 12 19:25:58 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sat, 13 Aug 2016 01:25:58 +0200
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To: 
References: <20160802182203.GF6608@ando.pearwood.info>
 <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com>
 <57A12B5A.90504@canterbury.ac.nz>
 <20160804132028.GL6608@ando.pearwood.info>
 <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp>
 <20160805060648.GU6608@ando.pearwood.info>
 <22436.41739.428069.534278@turnbull.sk.tsukuba.ac.jp>
 <20160805162901.GX6608@ando.pearwood.info>
 <22437.29809.206551.416746@turnbull.sk.tsukuba.ac.jp>
Message-ID: 

In short, a PEP is a summary of a long discussion. IMHO a PEP is required
to write down the rationale and lists most important alternative and
explain why the PEP is better.

The hard part is to write a short but "complete" PEP.
I tried to follow this discussion and I still to understand why my
proposition of "def clamp(min_val, value, max_val): return min(max(min_val,
value), max_val)" is not good. I expect that a PEP replies to this question
without to read the whole thread :-) I don't recall neither what was the
"conclusion" for NaN.

Victor

Le 10 août 2016 00:43, "Chris Barker" a écrit :
>
> Is this idea still alive?
>
> Despite the bike shedding, I think that some level of consensus may have
been reached. So I suggest that either Neil (because it was your idea) or
Steven (because you've had a lot of opinions, and done a lot of the
homework) or both, of course, put together a reference implementation and a
proposal, post it here, and see how it flies.
>
> It's one function, so hopefully won't need a PEP, but if your proposal
meets with a lot of resistance, then you could turn it into a PEP then. But
getting all this discussion summaries would be good as a first step.
>
> NOTE: I think it's a fine idea, but I've got way too many other things
I'd like to do first -- so I'm not going to push this forward...
>
> -CHB
>
>
>
>
> On Fri, Aug 5, 2016 at 10:24 PM, Stephen J. Turnbull <
turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:
>>
>> Steven D'Aprano writes:
>> > On Fri, Aug 05, 2016 at 11:30:35PM +0900, Stephen J. Turnbull wrote:
>> >
>> > > I can even think of a case where clamp could be used with a constant
>> > > control and a varying bound: S-s inventory control facing occasional
>> > > large orders in an otherwise continuous, stationary demand process.
>> >
>> > Sounds interesting. Is there a link to somewhere I could learn more
>> > about this?
>>
>> The textbook I use is Nancy Stokey, The Economics of Inaction
>> https://www.amazon.co.jp/s/ref=nb_sb_noss?__mk_ja_JP=%E3%82%AB%E3%82%BF%E3%82%AB%E3%83%8A&url=search-alias%3Daps&field-keywords=nancy+stokey+economics+inaction
>>
>> The example I gave is not a textbook example, but is an "obvious"
>> extension of the simplest textbook models.
>> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Fri Aug 12 19:48:19 2016 From: mertz at gnosis.cx (David Mertz) Date: Fri, 12 Aug 2016 16:48:19 -0700 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: References: <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp> <20160805060648.GU6608@ando.pearwood.info> <22436.41739.428069.534278@turnbull.sk.tsukuba.ac.jp> <20160805162901.GX6608@ando.pearwood.info> <22437.29809.206551.416746@turnbull.sk.tsukuba.ac.jp> Message-ID: On Fri, Aug 12, 2016 at 4:25 PM, Victor Stinner wrote: > I tried to follow this discussion and I still to understand why my > proposition of "def clamp(min_val, value, max_val): return min(max(min_val, > value), max_val)" is not good. I expect that a PEP replies to this question > without to read the whole thread :-) I don't recall neither what was the > "conclusion" for NaN. > This was the implementation I suggested (but I borrowed it from StackOverflow, I don't claim originality). 
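To make the version under discussion concrete, here is a runnable sketch of a min/max-style clamp. The None-default keyword signature and the explicit comparisons are assumptions for illustration only; the thread has not settled on a calling convention, and this is one possible answer to the open questions, not the agreed one:

```python
def clamp(value, min_val=None, max_val=None):
    # A bound of None means "unbounded on that side", so one-sided
    # clamping needs no sentinel such as float('inf'), and non-numeric
    # orderable types (e.g. strings) work unchanged.
    if min_val is not None and value < min_val:
        return min_val
    if max_val is not None and value > max_val:
        return max_val
    return value

print(clamp(5, 0, 10))        # 5  (inside the bounds: returned unchanged)
print(clamp(-3, 0, 10))       # 0  (below the lower bound: the bound wins)
print(clamp(99, max_val=10))  # 10 (one-sided clamp, no lower bound)
```

Because every comparison against NaN is False, this formulation returns the value unchanged when any argument is NaN: clamp(nan, 0, 10) is nan, and clamp(5, nan, nan) is 5 — again, one possible NaN policy, not necessarily the right one.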
There are a couple arguable bugs in that implementation, and several
questions I would want answered in a PEP. I'm not going to argue again
about the best answer, but we should explicitly answer what the result of
the following are (with some justified reasons):

clamp(5, nan, nan)
clamp(5, 0, nan)
clamp(5, nan, 10)
clamp(nan, 0, 10)
clamp(nan, 5, 5)

Also, min and max take the "first thing not greater/less than the rest".
Arguably that's not what we would want for clamp(). But maybe it is,
explain the reasons. E.g.:

>>> max(1, nan)
1
>>> max(nan, 1)
nan
>>> max(1.0, 1)
1.0
>>> max(1, 1.0)
1

This has the obvious implications for the semantics of clamp() if it is
based on min()/max() in the manner proposed.

Also, what is the calling syntax? Are the arguments strictly positional,
or do they have keywords? What are those default values if the arguments
are not specified for either or both of min_val/max_val? E.g., is this OK:

clamp(5, min_val=0)

If this is allowable to mean "unbounded on the top" then the simple
implementation will break using the most obvious default values:

clamp('foo', min_val='aaa')  # expect "lexical clamping"

>>> min(max("aaa", "foo"), float('inf'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: float() < str()
>>> min(max("aaa", "foo"), None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: NoneType() < str()
>>> min(max("aaa", "foo"), "zzz")
'foo'

Quite possibly this is exactly the behavior we want, but I'd like an
explanation for why.

--
Keeping medicines from the bloodstreams of the sick; food from the bellies
of the hungry; books from the hands of the uneducated; technology from the
underdeveloped; and putting advocates of freedom in prisons. Intellectual
property is to the 21st century what the slave trade was to the 16th.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From steve at pearwood.info Fri Aug 12 20:14:26 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 13 Aug 2016 10:14:26 +1000 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: References: <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp> <20160805060648.GU6608@ando.pearwood.info> <22436.41739.428069.534278@turnbull.sk.tsukuba.ac.jp> <20160805162901.GX6608@ando.pearwood.info> <22437.29809.206551.416746@turnbull.sk.tsukuba.ac.jp> Message-ID: <20160813001425.GI26300@ando.pearwood.info> On Sat, Aug 13, 2016 at 01:25:58AM +0200, Victor Stinner wrote: > I tried to follow this discussion and I still to understand why my > proposition of "def clamp(min_val, value, max_val): return min(max(min_val, > value), max_val)" is not good. I expect that a PEP replies to this question > without to read the whole thread :-) I said I would write up a summary, and I will. If you want to call it a PEP, I'm okay with that. I won't forget your proposal either :-) -- Steve From eryksun at gmail.com Fri Aug 12 22:44:00 2016 From: eryksun at gmail.com (eryk sun) Date: Sat, 13 Aug 2016 02:44:00 +0000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <1471011648.3049096.693513873.7CF5A957@webmail.messagingengine.com> References: <57AB6E2D.6050704@python.org> <1471011648.3049096.693513873.7CF5A957@webmail.messagingengine.com> Message-ID: On Fri, Aug 12, 2016 at 2:20 PM, Random832 wrote: > On Wed, Aug 10, 2016, at 14:10, Steve Dower wrote: >> * force the console encoding to UTF-8 on initialize and revert on >> finalize >> >> So what are your concerns? Suggestions? > > As far as I know, the single biggest problem caused by the status quo > for console encoding is "some string containing characters not in the > console codepage is printed out; unhandled UnicodeEncodeError". Is there > any particular reason not to use errors='replace'? 
If that's all you want then you can set PYTHONIOENCODING=:replace. Prepare to be inundated with question marks. Python's 'cp*' encodings are cross-platform, so they don't call Windows NLS APIs. If you want a best-fit encoding, then 'mbcs' is the only choice. Use chcp.com to switch to your system's ANSI codepage and set PYTHONIOENCODING=mbcs:replace. An 'oem' encoding could be added, but I'm no fan of these best-fit encodings. Writing question marks at least hints that the output is wrong. > Is there any particular reason for the REPL, when printing the repr of a > returned object, not to replace characters not in the stdout encoding > with backslash sequences? sys.displayhook already does this. It falls back on sys_displayhook_unencodable if printing the repr raises a UnicodeEncodeError. > Does Python provide any mechanism to access the built-in "best fit" > mappings for windows codepages (which mostly consist of removing accents > from latin letters)? As mentioned above, for output this is only available with 'mbcs'. For reading input via ReadFile or ReadConsoleA (and thus also C _read, fread, and fgets), the console already encodes its UTF-16 input buffer using a best-fit encoding to the input codepage. So there's no error in the following example, even though the result is wrong: >>> sys.stdin.encoding 'cp437' >>> s = '?' 
>>> s, ord(s)
('A', 65)

Jumping back to the codepage 65001 discussion, here's a function to
simulate the bad output that Windows Vista and 7 users see:

def write(text):
    writes = []
    buffer = text.replace('\n', '\r\n').encode('utf-8')
    while buffer:
        decoded = buffer.decode('utf-8', 'replace')
        buffer = buffer[len(decoded):]
        writes.append(decoded.replace('\r', '\n'))
    return ''.join(writes)

For example:

>>> greek = '?????????\n'
>>> write(greek)
'?????????\n\n????\n\n?\n\n'

It gets worse with characters that require 3 bytes in UTF-8:

>>> devanagari = '?????????\n'
>>> write(devanagari)
'?????????\n\n??????\n\n????\n\n??\n\n'

This problem doesn't exist in Windows 8+ because the old LPC-based
communication (LPC is an undocumented protocol that's used extensively
for IPC between Windows subsystems) with the console was rewritten to
use a kernel driver (condrv.sys). Now it works like any other device by
calling NtReadFile, NtWriteFile, and NtDeviceIoControlFile. Apparently
in the rewrite someone fixed the fact that the conhost code that handles
WriteFile and WriteConsoleA was incorrectly returning the number of
UTF-16 codes written instead of the number of bytes.

Unfortunately the rewrite also broke Ctrl+C handling because ReadFile no
longer sets the last error to ERROR_OPERATION_ABORTED when a console
read is interrupted by Ctrl+C. I'm surprised so few Windows users have
noticed or cared that Ctrl+C kills the REPL and misbehaves with input()
in the Windows 8/10 console.

The source of the Ctrl+C bug is an incorrect NTSTATUS code
STATUS_ALERTED, which should be STATUS_CANCELLED. The console has always
done this wrong, but before the rewrite there was common code for
ReadFile and ReadConsole that handled STATUS_ALERTED specially. It's
still there in ReadConsole, so Ctrl+C handling works fine in Unicode
programs that use ReadConsoleW (e.g. cmd.exe, powershell.exe). It also
works fine if win_unicode_console is enabled.
Finally, here's a ctypes example in Windows 10.0.10586 that shows the unsolvable problem with non-ASCII input when using codepage 65001: import ctypes, msvcrt conin = open(r'\\.\CONIN$', 'r+') hConin = msvcrt.get_osfhandle(conin.fileno()) kernel32 = ctypes.WinDLL('kernel32', use_last_error=True) nread = (ctypes.c_uint * 1)() ASCII-only input works: >>> buf = (ctypes.c_char * 100)() >>> kernel32.ReadFile(hConin, buf, 100, nread, None) spam 1 >>> nread[0], buf.value (6, b'spam\r\n') But it returns EOF if "a" is replaced by Greek "?": >>> buf = (ctypes.c_char * 100)() >>> kernel32.ReadFile(hConin, buf, 100, nread, None) sp?m 1 >>> nread[0], buf.value (0, b'') Notice that the read is successful but nread is 0. That signifies EOF. So the REPL will just silently quit as if you entered Ctrl+Z, and input() will raise EOFError. This can't be worked around. The problem is in conhost.exe, which assumes a request for N bytes wants N UTF-16 codes from the input buffer. This can only work with ASCII in UTF-8. From python at mrabarnett.plus.com Fri Aug 12 23:31:19 2016 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 13 Aug 2016 04:31:19 +0100 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: References: <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp> <20160805060648.GU6608@ando.pearwood.info> <22436.41739.428069.534278@turnbull.sk.tsukuba.ac.jp> <20160805162901.GX6608@ando.pearwood.info> <22437.29809.206551.416746@turnbull.sk.tsukuba.ac.jp> Message-ID: <890d9a3a-631c-d544-7ac6-3d21bdaec75b@mrabarnett.plus.com> On 2016-08-13 00:48, David Mertz wrote: > On Fri, Aug 12, 2016 at 4:25 PM, Victor Stinner > > wrote: > [snip] > > Also, what is the calling syntax? Are the arguments strictly positional, > or do they have keywords? 
What are those default values if the arguments > are not specified for either or both of min_val/max_val? E.g., is this OK: > > clamp(5, min_val=0) > I would've thought that the obvious default would be None, meaning "missing". From rosuav at gmail.com Fri Aug 12 23:43:40 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 13 Aug 2016 13:43:40 +1000 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: <890d9a3a-631c-d544-7ac6-3d21bdaec75b@mrabarnett.plus.com> References: <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp> <20160805060648.GU6608@ando.pearwood.info> <22436.41739.428069.534278@turnbull.sk.tsukuba.ac.jp> <20160805162901.GX6608@ando.pearwood.info> <22437.29809.206551.416746@turnbull.sk.tsukuba.ac.jp> <890d9a3a-631c-d544-7ac6-3d21bdaec75b@mrabarnett.plus.com> Message-ID: On Sat, Aug 13, 2016 at 1:31 PM, MRAB wrote: > On 2016-08-13 00:48, David Mertz wrote: >> >> On Fri, Aug 12, 2016 at 4:25 PM, Victor Stinner >> > wrote: >> > [snip] >> >> >> Also, what is the calling syntax? Are the arguments strictly positional, >> or do they have keywords? What are those default values if the arguments >> are not specified for either or both of min_val/max_val? E.g., is this >> OK: >> >> clamp(5, min_val=0) >> > I would've thought that the obvious default would be None, meaning > "missing". Doesn't really matter what the defaults are. That call means "clamp with a minimum of 0 and no maximum". It's been completely omitted. But yes, probably it would be min_val=None, max_val=None. ChrisA From turnbull.stephen.fw at u.tsukuba.ac.jp Sat Aug 13 04:12:32 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. 
Turnbull) Date: Sat, 13 Aug 2016 17:12:32 +0900 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <57AB6E2D.6050704@python.org> References: <57AB6E2D.6050704@python.org> Message-ID: <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> Steve Dower writes: > ISTM that changing sys.getfilesystemencoding() on Windows to > "utf-8" and updating path_converter() (Python/posixmodule.c; I think this proposal requires the assumption that strings intended to be interpreted as file names invariably come from the Windows APIs. I don't think that is true: Makefiles and similar, configuration files, all typically contain filenames. Zipfiles (see below). Python is frequently used as a glue language, so presumably receives such file name information as (more or less opaque) bytes objects over IPC channels. These just aren't under OS control, so the assumption will fail. Supporting Windows users in Japan means dealing with lots of crap produced by standard-oblivious software. Eg, Shift JIS filenames in zipfiles. AFAICT Windows itself never does that, but the majority of zipfiles I get from colleagues have Shift JIS in the directory (and it's the great majority if you assume that people who use ASCII transliterations are doing so because they know that non-Windows-users can't handle Shift JIS file names in zipfiles). So I believe bytes-oriented software must expect non-UTF-8 file names in Japan. UTF-8 may have penetration in the rest of the world, but the great majority of my Windows-using colleagues in Japan still habitually and by preference use Shift JIS in text files. I suppose that includes files that are used by programs, and thus file names, and probably extends to most Windows users here. I suspect a similar situation holds in China, where AIUI "GB is not just a good idea, it's the law,"[1] and possibly Taiwan (Big 5) and Korea (KSC) as those standards have always provided the benefits of (nearly) universal repertoires[2]. 
> and add the requirement that [bytes file names] *must* be encoded > with sys.getfilesystemencoding(). To the extent that this *can* work, it *already* works. Trying to enforce a particular encoding will simply break working code that depends on sys.getfilesystemencoding() matching the encoding that other programs use. You have no carrot. These changes enforce an encoding on bytes for Windows APIs but can't do so for data, and so will make file-names- are-just-bytes programmers less happy with Python, not more happy. The exception is the proposed console changes, because there you *do* perform all I/O with OS APIs. But I don't know anything about the Windows console except that nobody seems happy with it. > Similarly, locale.getpreferredencoding() on Windows returns a > legacy value - the user's active code page - which should generally > not be used for any reason. This is even less supportable, because it breaks much code that used to work without specifying an encoding. Refusing to respect the locale preferred encoding would force most Japanese scripters to specify encodings where they currently accept the system default, I suspect. On those occasions my Windows-using colleagues deliver text files, they are *always* encoded in Shift JIS. University databases the deliver CSV files allow selecting Shift JIS or UTF-8, and most people choose Shift JIS. And so on. In Japan, Shift JIS remains pervasive on Windows. I don't think Japan is special in this, except in the pervasiveness of Shift JIS. For everybody I think there will be more loss than benefit imposed. > BOMs are very common on Windows, since the default assumption is > nearly always a bad idea. I agree (since 1990!) that Shift JIS by default is a bad idea, but there's no question that it is still overwhelmingly popular. I suspect UTF-8 signatures are uncommon, too, as most UTF-8 originates on Mac or *nix platforms. 
> This would match the behavior that the .NET Framework has used for > many years - effectively, utf_8_sig on read and utf_8 on write. But .NET is a framework. It expects to be the world in which programs exist, no? Python is very frequently used as a glue language, and I suspect the analogy fails due to that distinction. Footnotes: [1] Strictly speaking, certain programs must support GB 18030. I don't think it's legally required to be the default encoding. [2] For example, the most restricted Japanese standard, JIS X 0208, includes not only "full-width" versions of ASCII characters, but the full Greek and Cyrillic alphabets, many math symbols, a full line drawing set, and much more besides the native syllabary and Han ideographs. The elderly Chinese GB 2312 not only includes Greek and Cyrillic, and the various symbols, but also the Japanese syllabaries. (And the more recent GB 18030 swallowed Unicode whole.) From drekin at gmail.com Sat Aug 13 06:28:45 2016 From: drekin at gmail.com (=?UTF-8?B?QWRhbSBCYXJ0b8Wh?=) Date: Sat, 13 Aug 2016 12:28:45 +0200 Subject: [Python-ideas] Fix default encodings on Windows Message-ID: On Fri Aug 12 19:03:38 EDT 2016 Victor Stinner wrote: > For the Windows console: I played with all Windows functions, tried all > fonts and many code pages. I also read technical blog articles of Microsoft > employees. I gave up on this issue. It doesn't seem possible to support > fully Unicode the Windows console (at least the last time I checked). By > the way, it seems like Windows functions have bugs, and the code page 65001 > fixes a few issues but introduces new issues... > > Do you mean that it doesn't seem possible to support Unicode on the Windows console by means of ANSI codepages? Because using the wide APIs seems to work (as win_unicode_console shows). 
There are some issues like non-BMP characters, which are encoded as surrogate pairs and the console doesn't understand them for display (shows two boxes), but this is just a matter of display and not of corruption of the actual data (e.g. you can copy the text from the console). Also, there seems to be no font that supports all of Unicode, and AFAIK you cannot configure the console to use multiple fonts, but again this is a display issue of the console window itself rather than of the essential communication between Python and the console. Adam Bartoš -------------- next part -------------- An HTML attachment was scrubbed... URL: From drekin at gmail.com Sat Aug 13 06:46:54 2016 From: drekin at gmail.com (=?UTF-8?B?QWRhbSBCYXJ0b8Wh?=) Date: Sat, 13 Aug 2016 12:46:54 +0200 Subject: [Python-ideas] Fix default encodings on Windows Message-ID: Stephen J. Turnbull writes: > The exception is the proposed console changes, because there you *do* > perform all I/O with OS APIs. But I don't know anything about the > Windows console except that nobody seems happy with it. > > I'm quite happy with it. I mean, it's far from perfect, and when you look at discussions on Stack Overflow regarding Unicode on the Windows console, almost everyone blames the Windows console, but I think that software often doesn't communicate with it correctly. When Windows Unicode wide APIs are used, it just works. Adam Bartoš -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Sat Aug 13 08:23:35 2016 From: random832 at fastmail.com (Random832) Date: Sat, 13 Aug 2016 08:23:35 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> Message-ID: <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> On Sat, Aug 13, 2016, at 04:12, Stephen J.
Turnbull wrote: > Steve Dower writes: > > ISTM that changing sys.getfilesystemencoding() on Windows to > > "utf-8" and updating path_converter() (Python/posixmodule.c; > > I think this proposal requires the assumption that strings intended to > be interpreted as file names invariably come from the Windows APIs. I > don't think that is true: Makefiles and similar, configuration files, > all typically contain filenames. Zipfiles (see below). And what's going to happen if you shovel those bytes into the filesystem without conversion on Linux, or worse, OSX? This problem isn't unique to Windows. > Python is frequently used as a glue language, so presumably receives > such file name information as (more or less opaque) bytes objects over > IPC channels. They *can't* be opaque. Someone has to decide what they mean, and you as the application developer might well have to step up and *be that someone*. If you don't, someone else will decide for you. > These just aren't under OS control, so the assumption will > fail. > > So I believe bytes-oriented software must expect non-UTF-8 file names > in Japan. The only way to deal with data representing filenames and destined for the filesystem on windows is to convert it, somehow, ultimately to UTF-16-LE. Not doing so is impossible, it's only a question of what layer it happens in. If you convert it using the wrong encoding, you lose. The only way to deal with it on Mac OS X is to convert it to UTF-8. If you don't, you lose. If you convert it using the wrong encoding, you lose. This proposal embodies an assumption that bytes from unknown sources used as filenames are more likely to be UTF-8 than in the locale ACP (i.e. "mbcs" in pythonspeak, and Shift-JIS in Japan). 
Personally, I think the whole edifice is rotten, and choosing one encoding over another isn't a solution; the only solution is to require the application to make a considered decision about what the bytes mean and pass its best effort at converting to a Unicode string to the API. This is true on Windows, it's true on OSX, and I would argue it's pretty close to being true on Linux except in a few very niche cases. So I think for the filesystem encoding we should stay the course, continuing to print a DeprecationWarning and maybe, just maybe, eventually actually deprecating it. On Windows and OSX, this "glue language" business of shoveling bytes from one place to another without caring what they mean can only last as long as they don't touch the filesystem. > You have no carrot. These changes enforce an encoding on bytes for > Windows APIs but can't do so for data, and so will make file-names- > are-just-bytes programmers less happy with Python, not more happy. I think the use case that the proposal has in mind is a file-names-are-just-bytes program (or set of programs) that reads from the filesystem, converts to bytes for a file/network, and then eventually does the reverse - either end may be on Windows. Using UTF-8 will allow those to make the round trip (strictly speaking, you may need surrogatepass, and OSX does its weird normalization thing), using any other encoding (except for perhaps GB18030) will not.
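The round trip described in the preceding message can be sketched in a few lines of Python. This is an illustrative example, not code from the thread; the filename is hypothetical, and the point is only that strict UTF-8 rejects the lone surrogates that Windows filenames may contain, while the 'surrogatepass' error handler preserves them:

```python
# Sketch: round-tripping a Windows-style filename through bytes with UTF-8.
# Windows filenames are sequences of UTF-16 code units and may contain lone
# surrogates; strict UTF-8 refuses to encode those, but the 'surrogatepass'
# error handler lets them survive an encode/decode round trip.

name = "caf\u00e9-\udcff.txt"  # hypothetical filename with a lone surrogate

try:
    name.encode("utf-8")  # strict UTF-8 raises on the lone surrogate
except UnicodeEncodeError:
    pass

raw = name.encode("utf-8", "surrogatepass")   # str -> bytes for a file/network
back = raw.decode("utf-8", "surrogatepass")   # bytes -> str again
assert back == name  # lossless round trip
```

With any ANSI code page in place of UTF-8 this round trip is not guaranteed, which is the asymmetry the message above is pointing at.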
From mertz at gnosis.cx Sat Aug 13 09:45:27 2016 From: mertz at gnosis.cx (David Mertz) Date: Sat, 13 Aug 2016 06:45:27 -0700 Subject: [Python-ideas] Consider adding clip or clamp function to math In-Reply-To: References: <20160802182203.GF6608@ando.pearwood.info> <1470163549.2394114.684030937.77AD807A@webmail.messagingengine.com> <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp> <20160805060648.GU6608@ando.pearwood.info> <22436.41739.428069.534278@turnbull.sk.tsukuba.ac.jp> <20160805162901.GX6608@ando.pearwood.info> <22437.29809.206551.416746@turnbull.sk.tsukuba.ac.jp> <890d9a3a-631c-d544-7ac6-3d21bdaec75b@mrabarnett.plus.com> Message-ID: None seems reasonable. But it does require some conditional checks rather than the simplest min-of-max. Not a bad answer, just something to be explicit about. On Aug 12, 2016 8:44 PM, "Chris Angelico" wrote: > On Sat, Aug 13, 2016 at 1:31 PM, MRAB wrote: > > On 2016-08-13 00:48, David Mertz wrote: > >> > >> On Fri, Aug 12, 2016 at 4:25 PM, Victor Stinner > >> > wrote: > >> > > [snip] > >> > >> > >> Also, what is the calling syntax? Are the arguments strictly positional, > >> or do they have keywords? What are those default values if the arguments > >> are not specified for either or both of min_val/max_val? E.g., is this > >> OK: > >> > >> clamp(5, min_val=0) > >> > > I would've thought that the obvious default would be None, meaning > > "missing". > > Doesn't really matter what the defaults are. That call means "clamp > with a minimum of 0 and no maximum". It's been completely omitted. > > But yes, probably it would be min_val=None, max_val=None. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
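The signature being debated in the messages above might look like the following sketch. It is illustrative only, not a decided API: None defaults mean "no bound on that side", so clamp(5, min_val=0) clamps only from below.

```python
# Illustrative sketch of the clamp() under discussion -- not a decided API.
# None for either bound means "unbounded on that side", which is why the
# body needs explicit conditional checks rather than a bare min-of-max.

def clamp(value, min_val=None, max_val=None):
    if min_val is not None and max_val is not None and min_val > max_val:
        raise ValueError("min_val must not exceed max_val")
    if min_val is not None and value < min_val:
        return min_val
    if max_val is not None and value > max_val:
        return max_val
    return value
```

When both bounds are given this reduces to min(max(value, min_val), max_val); the None checks are exactly the extra conditionals the thread mentions.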
URL: From arek.bulski at gmail.com Sat Aug 13 10:31:06 2016 From: arek.bulski at gmail.com (Arek Bulski) Date: Sat, 13 Aug 2016 16:31:06 +0200 Subject: [Python-ideas] From mailing list to GitHub issues Message-ID: I have been a subscriber for only a few weeks now, but I don't like the mailing list at all. First, I get all the topics even though Windows encoding is not of interest to me. Second, most of the text is auto quotes anyway. Third, editing posts can sometimes be helpful, for correcting typos and such. I think it would be beneficial to use GitHub issues instead, one for each topic and perhaps one for general notifications like announcing new topics or forum-wide announcements. Unfortunately it seems that moving away from existing ways always meets with a lot of inertia. On the other hand, Python is probably one of the most actively developed languages around, so maybe it is doable. I put my proposal on the forum floor to discuss. Cheers to all active participants. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p1himik at gmail.com Sat Aug 13 10:35:39 2016 From: p1himik at gmail.com (Eugene Pakhomov) Date: Sat, 13 Aug 2016 21:35:39 +0700 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: Message-ID: To ease the struggle of pressing "Mark as read" every time an uninteresting email arrives, look for the "Mute" button. In Gmail it's under the "More" drop-down. Apart from that, I completely agree. Maybe not necessarily GitHub, but something similar that's not email lists. On Sat, Aug 13, 2016 at 9:31 PM, Arek Bulski wrote: > I have been a subscriber only for few weeks now but I dont like the > mailing list at all. First, I get all the topics even tho Windows encoding > is not of my interest. Second, most of the text is auto quotes anyway. > Third, editing posts can sometimes be helpful, for correcting typos and > such.
> > I think it would be beneficial to use GitHub issues instead, one for each > topic and perhaps one for general notifications like announcing new topics > or forum wide announcements. > > Unfortunately it seems that moving away from existing ways always meets > with a lot of inertia. On the other hand, python is probably one of most > actively developed langs around so maybe it is doable. I put my proposal on > the forum floor to discuss. > > Cheers to all active participants. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Aug 13 11:41:48 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 14 Aug 2016 01:41:48 +1000 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: Message-ID: On Sun, Aug 14, 2016 at 12:31 AM, Arek Bulski wrote: > I have been a subscriber only for few weeks now but I dont like the mailing > list at all. First, I get all the topics even tho Windows encoding is not of > my interest. Second, most of the text is auto quotes anyway. Third, editing > posts can sometimes be helpful, for correcting typos and such. > > I think it would be beneficial to use GitHub issues instead, one for each > topic and perhaps one for general notifications like announcing new topics > or forum wide announcements. Strongly disagree. Yes, you have to cope with the topics you're not interested in, but a good email client will help you with that anyway. (In Gmail, for instance, "Mute this thread" does that for you.) You'd have to deal with that on *any* forum, so email is no different. GitHub Issues is *only* good for one purpose, and that is the management of one repository. 
You tried to start a general Python question on the PEPs repository, which isn't right. It emphasizes the PEP process as if it were the one and only way to discuss Python ideas, which it most certainly isn't - the vast majority of python-ideas threads don't result in PEPs. There are other places where discussion can happen, too. "BPO" (http://bugs.python.org/) is where changes to Python's core code end up - it might be moving to GitHub Issues, but if it does, it wouldn't be part of the PEPs repo, but part of the CPython repo. There's the python-dev mailing list, where a lot of traffic isn't specifically about changes to anything at all, but is about general policies and such. And python-list (aka comp.lang.python) also gets a lot of discussion, although you'd start a thread on python-list if you expect the answer to be more of "Here's how you can do that" than "Yes/no, we will/won't add that to the language". They're unlikely to shift to GitHub Issues. Ultimately, email is the best way that I've *ever* seen for discussing important matters like this. All the others (BPO, GH Issues, etc), and even social media (eg when Christine sends me a Twitter message), channel through to email - GitHub sends me an email any time an issue is created or commented on, etc. Every important discussion I've ever been involved with has been in one of my email inboxes, with the possible exception of real-time conversations - which then end up being ephemeral. The ONLY benefit you're stating for GH Issues is that it has per-topic notifications. Those would be broken the instant a discussion begins to wander, as you get the age-old problem of "is this a reply to that, or is it a new topic?" (answer: it's both), so I think you'd find the advantage over email isn't all that great anyway. Get Mozilla Thunderbird or Squirrel Mail or some other at least half-way decent mail client, and you should be able to cope with python-ideas. 
ChrisA From guido at python.org Sat Aug 13 12:26:16 2016 From: guido at python.org (Guido van Rossum) Date: Sat, 13 Aug 2016 09:26:16 -0700 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: Message-ID: I don't have time to respond at length, but I would just like to mention that I'm actually pretty tired of email threads getting off the rails and wouldn't mind looking at other approaches, including possibly a dedicated GitHub tracker (*not* the cpython or peps repo's tracker). There are other possible solutions too, e.g. discourse or MailMan3 + HyperKitty. Like Chris, I live in my inbox and everything of importance must come through there or I won't know about it, but that doesn't imply to me that it's the best way to manage discussions. I frequently go to the website (e.g. GitHub, bugs.python.org, discourse) for a better UI to peruse and manage a discussion. A few projects I'm on have no mailing list, only GitHub trackers, and they seem to work well for a variety of purposes, including newbie help, bugs, philosophical discussions, and debates on the future shape of specific features. It takes away the whole "is this a bug or is it a feature" insecurity that many people have before posting to a tracker (often out of fear of annoying the developers) -- you just post to the tracker and someone will triage it, and your head won't be bitten off. That's all I have time for. On Sat, Aug 13, 2016 at 8:41 AM, Chris Angelico wrote: > On Sun, Aug 14, 2016 at 12:31 AM, Arek Bulski > wrote: > > I have been a subscriber only for few weeks now but I dont like the > mailing > > list at all. First, I get all the topics even tho Windows encoding is > not of > > my interest. Second, most of the text is auto quotes anyway. Third, > editing > > posts can sometimes be helpful, for correcting typos and such. 
> > > > I think it would be beneficial to use GitHub issues instead, one for each > > topic and perhaps one for general notifications like announcing new > topics > > or forum wide announcements. > > Strongly disagree. > > Yes, you have to cope with the topics you're not interested in, but a > good email client will help you with that anyway. (In Gmail, for > instance, "Mute this thread" does that for you.) You'd have to deal > with that on *any* forum, so email is no different. > > GitHub Issues is *only* good for one purpose, and that is the > management of one repository. You tried to start a general Python > question on the PEPs repository, which isn't right. It emphasizes the > PEP process as if it were the one and only way to discuss Python > ideas, which it most certainly isn't - the vast majority of > python-ideas threads don't result in PEPs. > > There are other places where discussion can happen, too. "BPO" > (http://bugs.python.org/) is where changes to Python's core code end > up - it might be moving to GitHub Issues, but if it does, it wouldn't > be part of the PEPs repo, but part of the CPython repo. There's the > python-dev mailing list, where a lot of traffic isn't specifically > about changes to anything at all, but is about general policies and > such. And python-list (aka comp.lang.python) also gets a lot of > discussion, although you'd start a thread on python-list if you expect > the answer to be more of "Here's how you can do that" than "Yes/no, we > will/won't add that to the language". They're unlikely to shift to > GitHub Issues. > > Ultimately, email is the best way that I've *ever* seen for discussing > important matters like this. All the others (BPO, GH Issues, etc), and > even social media (eg when Christine sends me a Twitter message), > channel through to email - GitHub sends me an email any time an issue > is created or commented on, etc. 
Every important discussion I've ever > been involved with has been in one of my email inboxes, with the > possible exception of real-time conversations - which then end up > being ephemeral. > > The ONLY benefit you're stating for GH Issues is that it has per-topic > notifications. Those would be broken the instant a discussion begins > to wander, as you get the age-old problem of "is this a reply to that, > or is it a new topic?" (answer: it's both), so I think you'd find the > advantage over email isn't all that great anyway. Get Mozilla > Thunderbird or Squirrel Mail or some other at least half-way decent > mail client, and you should be able to cope with python-ideas. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Sat Aug 13 12:36:06 2016 From: phd at phdru.name (Oleg Broytman) Date: Sat, 13 Aug 2016 18:36:06 +0200 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: Message-ID: <20160813163606.GA28955@phdru.name> Hi! Let me completely disagree. On Sat, Aug 13, 2016 at 04:31:06PM +0200, Arek Bulski wrote: > I have been a subscriber only for few weeks now but I dont like the mailing > list at all. First, I get all the topics even tho Windows encoding is not > of my interest. Second, most of the text is auto quotes anyway. Third, > editing posts can sometimes be helpful, for correcting typos and such. > > I think it would be beneficial to use GitHub issues instead, one for each > topic and perhaps one for general notifications like announcing new topics > or forum wide announcements. > > Unfortunately it seems that moving away from existing ways always meets > with a lot of inertia. 
On the other hand, python is probably one of most > actively developed langs around so maybe it is doable. I put my proposal on > the forum floor to discuss. > > Cheers to all active participants. The advantages of email: -- Push technology: it's delivered to my mailbox and I don't need to visit the discussion site. -- I can filter incoming messages and deliver them to whatever mailboxes I prefer. -- I can read it in any interface I prefer -- there are mail user agents with Web, GUI, TUI and command line interfaces. -- I can filter and sort message in whatever order I prefer -- by discussion, by subtopic, by date, by author. -- I can download the entire mail archive or its part to process it offline -- read it, search through it, write messages while offline to send them later. The disadvantages of web chat/forum/trackers: -- It forces me to always be online. -- Very limited support for themes in the interface -- I can only read messages in whatever web interface the f..ing servers and the freaking browsers give me. -- Very limited functionality for message filtering, sorting and searching. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From phd at phdru.name Sat Aug 13 12:54:49 2016 From: phd at phdru.name (Oleg Broytman) Date: Sat, 13 Aug 2016 18:54:49 +0200 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: <20160813163606.GA28955@phdru.name> References: <20160813163606.GA28955@phdru.name> Message-ID: <20160813165449.GA31170@phdru.name> On Sat, Aug 13, 2016 at 06:36:06PM +0200, Oleg Broytman wrote: > Hi! Let me completely disagree. > > On Sat, Aug 13, 2016 at 04:31:06PM +0200, Arek Bulski wrote: > > I have been a subscriber only for few weeks now but I dont like the mailing > > list at all. First, I get all the topics even tho Windows encoding is not > > of my interest. Second, most of the text is auto quotes anyway. 
Third, > > editing posts can sometimes be helpful, for correcting typos and such. > > > > I think it would be beneficial to use GitHub issues instead, one for each > > topic and perhaps one for general notifications like announcing new topics > > or forum wide announcements. > > > > Unfortunately it seems that moving away from existing ways always meets > > with a lot of inertia. On the other hand, python is probably one of most > > actively developed langs around so maybe it is doable. I put my proposal on > > the forum floor to discuss. > > > > Cheers to all active participants. > > The advantages of email: > > -- Push technology: it's delivered to my mailbox and I don't need to > visit the discussion site. > -- I can filter incoming messages and deliver them to whatever mailboxes > I prefer. > -- I can read it in any interface I prefer -- there are mail user agents > with Web, GUI, TUI and command line interfaces. > -- I can filter and sort message in whatever order I prefer -- by > discussion, by subtopic, by date, by author. > -- I can download the entire mail archive or its part to process it > offline -- read it, search through it, write messages while offline > to send them later. -- I can edit messages in my preferred editor (browsers with extensions also allow that though I consider that less convenient). -- I can pipe messages through different programs -- pagers, decoders, decryptors, antispam filters. > The disadvantages of web chat/forum/trackers: > > -- It forces me to always be online. > -- Very limited support for themes in the interface -- I can only read > messages in whatever web interface the f..ing servers and the > freaking browsers give me. > -- Very limited functionality for message filtering, sorting and searching. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. 
From donald at stufft.io Sat Aug 13 13:16:01 2016 From: donald at stufft.io (Donald Stufft) Date: Sat, 13 Aug 2016 13:16:01 -0400 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: <20160813163606.GA28955@phdru.name> References: <20160813163606.GA28955@phdru.name> Message-ID: <00BD2FEA-3149-41F7-8D12-59B7C9F9CB41@stufft.io> > On Aug 13, 2016, at 12:36 PM, Oleg Broytman wrote: > > The advantages of email: I think one of the big trade offs here, is that the traditional mailing list can work very well if everyone involved takes the time to develop a custom tool chain that fits their own workflow perfectly and if they spend the time learning the deficiency of the systems to ensure they correctly work around them. The web forum thing can theoretically achieve much less of a theoretical ?maximum? for productivity, but it typically means that you can bring productivity gains to those who can?t or won?t spend time maintaining a custom mailing stack. Essentially it becomes a trade off between losing some of the flexibility/productivity for a handful of people in exchange for boosting productivity for most other folks. One of the big problems with mailing lists is that you have no control over the clients, so you can?t really achieve anything more robust than whatever the lowest common denominator is for all mail clients that are participating in the discussion. An examples: A thread is going off the rails and we wish to redirect them to a new topic or list while either closing the old topic or allowing discussion to continue in the original topic. With the traditional mailing list, your only real options are to tell people to stop and? hope they do that? Except that becomes a problem because people?s email can be severely delayed, people miss messages, etc. 
I have yet to see a mailing list where someone didn?t accidentally post something to the wrong place, and then you end up having 10+ people all scolding them for posting in the wrong place, meanwhile you have some people answering the question anyways, and it becomes a huge mess. Compare this to the experience with a web forum where you can just move the existing thread immediately, and/or redirect people to a new location and optionally close the old thread to no longer allow posting. I can go on and on, but by having some control over the client, these systems are able to add additional features that make the baseline UX of discussion much better, though perhaps worse for individual users who are willing to spend time carefully crafting their own experience. Consider that almost every advantage you listed for email, could also be considered a disadvantage in that because each of those things are _possible_, that the tooling has to attempt to handle all of those things sanely. ? Donald Stufft From steve.dower at python.org Sat Aug 13 13:24:32 2016 From: steve.dower at python.org (Steve Dower) Date: Sat, 13 Aug 2016 10:24:32 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> Message-ID: Just a heads-up that I've assigned http://bugs.python.org/issue1602 to myself and started a patch for the console changes. Let's move the console discussion back over there. Hopefully it will show up in 3.6.0b1, but if you're prepared to apply a patch and test on Windows, feel free to grab my work so far. There's a lot of "making sure other things aren't broken" left to do. 
Cheers, Steve From arek.bulski at gmail.com Sat Aug 13 13:40:41 2016 From: arek.bulski at gmail.com (Arek Bulski) Date: Sat, 13 Aug 2016 19:40:41 +0200 Subject: [Python-ideas] From mailing list to GitHub issues Message-ID: Praise the guide! (Guido) ?GitHub issues are also delivered by email, with full post content. Guido and others will be satisfied. And mailing lists also send you messages in whatever freakin interface they provide it. And on my android gmail app it aint pretty. Most of it is auto replies in plain text and replaying to a particular thread is kinda impossible. I had to bring my laptop to reply to this. As Donald pointed out, there are people who are not going to create custom email processing toolchains. I am one of them. Moderation is not of concern to me particularly but that is also an advantage. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve.dower at python.org Sat Aug 13 13:45:32 2016 From: steve.dower at python.org (Steve Dower) Date: Sat, 13 Aug 2016 10:45:32 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> Message-ID: On 13Aug2016 0523, Random832 wrote: > On Sat, Aug 13, 2016, at 04:12, Stephen J. Turnbull wrote: >> Steve Dower writes: >> > ISTM that changing sys.getfilesystemencoding() on Windows to >> > "utf-8" and updating path_converter() (Python/posixmodule.c; >> >> I think this proposal requires the assumption that strings intended to >> be interpreted as file names invariably come from the Windows APIs. I >> don't think that is true: Makefiles and similar, configuration files, >> all typically contain filenames. Zipfiles (see below). 
> > And what's going to happen if you shovel those bytes into the > filesystem without conversion on Linux, or worse, OSX? This problem > isn't unique to Windows. Yeah, this is basically my view too. If your path bytes don't come from the filesystem, you need to know the encoding regardless. But it's very reasonable to be able to round-trip. Currently, the following two lines of code can have different behaviour on Windows (i.e. the latter fails to open the file): >>> open(os.listdir('.')[-1]) >>> open(os.listdir(b'.')[-1]) On Windows, the filesystem encoding is inherently Unicode, which means you can't reliably round-trip filenames through the current code page. Changing all of Python to use the Unicode APIs internally and making the bytes encoding utf-8 (or utf-16-le, which would save a conversion) resolves this and doesn't really affect >> These just aren't under OS control, so the assumption will >> fail. >> >> So I believe bytes-oriented software must expect non-UTF-8 file names >> in Japan. Even on Japanese Windows, non-UTF-8 file names must be encodable with UTF-16 or they cannot exist on the file system. This moves the encoding boundary into the application, which is where it needed to be anyway for robust software - "Correct" path handling still requires decoding to text, and if you know that your source is the encoded with the active code page then byte_path.decode('mbcs', 'surrogateescape') is still valid. Cheers, Steve From phd at phdru.name Sat Aug 13 13:50:06 2016 From: phd at phdru.name (Oleg Broytman) Date: Sat, 13 Aug 2016 19:50:06 +0200 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: <00BD2FEA-3149-41F7-8D12-59B7C9F9CB41@stufft.io> References: <20160813163606.GA28955@phdru.name> <00BD2FEA-3149-41F7-8D12-59B7C9F9CB41@stufft.io> Message-ID: <20160813175006.GA32662@phdru.name> Good addition, makes me think. Thank you! 
On Sat, Aug 13, 2016 at 01:16:01PM -0400, Donald Stufft wrote: > > > On Aug 13, 2016, at 12:36 PM, Oleg Broytman wrote: > > > > The advantages of email: > > I think one of the big trade offs here, is that the traditional mailing list can work very well if everyone involved takes the time to develop a custom tool chain that fits their own workflow perfectly and if they spend the time learning the deficiency of the systems to ensure they correctly work around them. The web forum thing can theoretically achieve much less of a theoretical ???maximum??? for productivity, but it typically means that you can bring productivity gains to those who can???t or won???t spend time maintaining a custom mailing stack. > > Essentially it becomes a trade off between losing some of the flexibility/productivity for a handful of people in exchange for boosting productivity for most other folks. > > One of the big problems with mailing lists is that you have no control over the clients, so you can???t really achieve anything more robust than whatever the lowest common denominator is for all mail clients that are participating in the discussion. An examples: > > A thread is going off the rails and we wish to redirect them to a new topic or list while either closing the old topic or allowing discussion to continue in the original topic. With the traditional mailing list, your only real options are to tell people to stop and??? hope they do that? Except that becomes a problem because people???s email can be severely delayed, people miss messages, etc. I have yet to see a mailing list where someone didn???t accidentally post something to the wrong place, and then you end up having 10+ people all scolding them for posting in the wrong place, meanwhile you have some people answering the question anyways, and it becomes a huge mess. 
Compare this to the experience with a web forum where you can just move the existing thread immediately, and/or redirect people to a new location and optionally close the old thread to no longer allow posting. From the recent and not so recent discussions of Moxi Marlinspike about centralized vs decentralized solutions (unfederated messaging vs email/jabber): "Indeed, cannibalizing a federated application-layer protocol into a centralized service is almost a sure recipe for a successful consumer product today. It's what Slack did with IRC, what Facebook did with email, and what WhatsApp has done with XMPP. In each case, the federated service is stuck in time, while the centralized service is able to iterate into the modern world and beyond.". https://whispersystems.org/blog/the-ecosystem-is-moving/ The problem for me is that it's about consumers while I prefer to deal with powerful users. > I can go on and on, but by having some control over the client, these systems are able to add additional features that make the baseline UX of discussion much better, though perhaps worse for individual users who are willing to spend time carefully crafting their own experience. Consider that almost every advantage you listed for email, could also be considered a disadvantage in that because each of those things are _possible_, that the tooling has to attempt to handle all of those things sanely. > > ??? > Donald Stufft Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From phd at phdru.name Sat Aug 13 13:54:30 2016 From: phd at phdru.name (Oleg Broytman) Date: Sat, 13 Aug 2016 19:54:30 +0200 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: Message-ID: <20160813175430.GA313@phdru.name> On Sat, Aug 13, 2016 at 07:40:41PM +0200, Arek Bulski wrote: > Praise the guide! (Guido) > > ???GitHub issues are also delivered by email, with full post content. 
Guido
> and others will be satisfied.

I wouldn't be satisfied without the ability to reply to these messages by email. Our bug tracker (Roundup) is an example of the best of both worlds -- I can use both the web interface and email. I can create issues by email and reply to email sent to me by the tracker.

Oleg.

-- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN.

From phd at phdru.name Sat Aug 13 14:21:13 2016
From: phd at phdru.name (Oleg Broytman)
Date: Sat, 13 Aug 2016 20:21:13 +0200
Subject: [Python-ideas] From mailing list to GitHub issues
In-Reply-To:
References:
Message-ID: <20160813182113.GA1534@phdru.name>

On Sat, Aug 13, 2016 at 07:40:41PM +0200, Arek Bulski wrote: > And mailing lists also send you messages in > whatever freakin interface they provide it. And on my android gmail app it > aint pretty.

In my not so humble opinion, web interfaces, especially mobile ones, are even less pretty.

> As Donald pointed out, there are people who are not going to create > custom email processing toolchains.

In what way will they be helpful to the development of Python? Contributors have to install, learn, configure and use a lot of development tools, compared to which email tools are just toys. Python development is not in dire need of contributions; it's rather in dire need of good contributions, code reviews and documentation updates, hence it's in dire need of powerful users.

Every one of us was a novice sometime. Thank goodness, there were enough golden-hearted power users to teach me in my novice days. The Python community is certainly one of the best in this regard; it deals with novices in an amazingly gentle way. And the outcome of dealing with novices IMO should be: we drag novices (some of them kicking and screaming) to become power users. We should help novices but not downgrade our tools in the name of novices.

My opinion only.

Oleg.
-- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN.

From donald at stufft.io Sat Aug 13 14:36:05 2016
From: donald at stufft.io (Donald Stufft)
Date: Sat, 13 Aug 2016 14:36:05 -0400
Subject: [Python-ideas] From mailing list to GitHub issues
In-Reply-To: <20160813182113.GA1534@phdru.name>
References: <20160813182113.GA1534@phdru.name>
Message-ID:

> On Aug 13, 2016, at 2:21 PM, Oleg Broytman wrote:
>
>> As Donald pointed out, there are people who are not going to create >> custom email processing toolchains.
>
> In what way will they be helpful to the development of Python? > Contributors have to install, learn, configure and use a lot of > development tools, compared to which email tools are just toys.

Well, I personally generally do not have the time to sit there and craft some sort of complex email toolchain to deal with it. I just leave lists if their volume is too high for me to deal with or they don't provide me the tools to interact with them in a non-frustrating way. For example, I'm no longer subscribed to python-dev because of these reasons. I *think* I've positively impacted the development of Python, but maybe not! In any case I think that falling into the trap of thinking that anyone who is willing to contribute to Python is also willing to maintain a personal toolchain for dealing with the deficiencies in mailing lists is not a place we should be in.

That's not to say that a traditional mailing list may not represent the best trade-off -- I have my opinion, you have yours, but we should not start thinking that adding an obstacle course that a user must complete before they can meaningfully contribute is doing anything but self-selecting for people willing to run that particular obstacle course, not selecting for skill or likely impact.

--
Donald Stufft

From mertz at gnosis.cx Sat Aug 13 14:44:39 2016
From: mertz at gnosis.cx (David Mertz)
Date: Sat, 13 Aug 2016 11:44:39 -0700
Subject: [Python-ideas] From mailing list to GitHub issues
In-Reply-To:
References: <20160813182113.GA1534@phdru.name>
Message-ID:

I find email lists VASTLY easier to deal with than any newfangled web-based custom discussion forum. Part of that is that it is a uniform interface to every list I belong to, and I can choose my own MUA. With all those web things, every site works a little bit differently from every other one, which imposes an unnecessary cognitive burden (and usually simply lacks some desired capability).

On Aug 13, 2016 11:39 AM, "Donald Stufft" wrote: > > > On Aug 13, 2016, at 2:21 PM, Oleg Broytman wrote: > > > >> > >> As Donald pointed out, there are people who are not going to create > >> custom email processing toolchains. > > > > In what way will they be helpful to the development of Python? > > Contributors have to install, learn, configure and use a lot of > > development tools, compared to which email tools are just toys. > > > Well, I personally generally do not have the time to sit there and craft > some sort of complex email toolchain to deal with it. I just leave lists > if their volume is too high for me to deal with or they don't provide me > the tools to interact with them in a non-frustrating way. For example, I'm > no longer subscribed to python-dev because of these reasons. I *think* I've > positively impacted the development of Python, but maybe not! In any case > I think that falling into the trap of thinking that anyone who is willing > to contribute to Python is also willing to maintain a personal toolchain > for dealing with the deficiencies in mailing lists is not a place we should > be in. > > That's not to say that a traditional mailing list may not represent the > best > trade-off --
I have my opinion, you have yours, but we should not start > thinking > that adding an obstacle course that a user must complete before they can > meaningfully contribute is doing anything but self-selecting for people > willing > to run that particular obstacle course, not selecting for skill or likely > impact. > > -- > Donald Stufft > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From turnbull.stephen.fw at u.tsukuba.ac.jp Sat Aug 13 15:09:53 2016
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Sun, 14 Aug 2016 04:09:53 +0900
Subject: [Python-ideas] Fix default encodings on Windows
In-Reply-To: <1471091015.186990.694264945.485144DD@webmail.messagingengine.com>
References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com>
Message-ID: <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp>

Random832 writes:

> And what's going to happen if you shovel those bytes into the > filesystem without conversion on Linux, or worse, OSX?

Off topic. See Subject: field.

> This proposal embodies an assumption that bytes from unknown sources > used as filenames are more likely to be UTF-8 than in the locale ACP

Then it's irrelevant: most bytes are not from "unknown sources", they're from correspondents (or from yourself!) -- and for most users most of the time, those correspondents share the locale encoding with them. At least where I live, they use that encoding frequently.

> the only solution is to require the application to make a > considered decision

That's not a solution. Code is not written with every decision considered, and it never will be.
The (long-run) solution is a la Henry Ford: "you can encode text any way you want, as long as it's UTF-8". Then it won't matter if people ever make considered decisions about encoding! But trying to enforce that instead of letting it evolve naturally (as it is doing) will cause unnecessary pain for Python programmers, and I believe quite a lot of pain. I used to be in the "make them speak UTF-8" camp. But in the 15 years since PEP 263, experience has shown me that mostly it doesn't matter, and that when it does matter, you have to deal with the large variety of encodings anyway -- assuming UTF-8 is not a win. For use cases that can be encoding-agnostic because all cooperating participants share a locale encoding, making them explicitly specify the locale encoding is just a matter of "misery loves company". Please, let's not do things for that reason. > I think the use case that the proposal has in mind is a > file-names-are-just-bytes program (or set of programs) that reads > from the filesystem, converts to bytes for a file/network, and then > eventually does the reverse - either end may be on windows. You have misspoken somewhere. The programs under discussion do not "convert" input to bytes; they *receive* bytes, either from POSIX APIs or from Windows *A APIs, and use them as is. Unless I am greatly mistaken, Steve simply wants that to work as well on Windows as on POSIX platforms, so that POSIX programmers who do encoding-agnostic programming have one less barrier to supporting their software on Windows. But you'll have to ask Steve to rule on that. 
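[Editorial note: the encoding-agnostic round trip discussed in this thread is a property POSIX builds of CPython already provide through the "surrogateescape" error handler used by os.fsencode()/os.fsdecode(). A minimal sketch, assuming a POSIX platform with a UTF-8 (or ASCII) locale:]

```python
import os

# Bytes that are NOT valid UTF-8 (b"\xe9" is Latin-1 'e-acute'),
# as might come back from a POSIX filesystem API.
raw = b"caf\xe9.txt"

# os.fsdecode() decodes with the filesystem encoding plus the
# "surrogateescape" handler, so undecodable bytes become lone
# surrogate code points instead of raising UnicodeDecodeError.
name = os.fsdecode(raw)

# os.fsencode() maps those surrogates back to the original bytes:
# nothing is lost on the round trip.
assert os.fsencode(name) == raw
```

This round-trip guarantee is what lets encoding-agnostic POSIX code move filenames between str and bytes APIs without corruption; the proposal under discussion is about giving bytes callers the same guarantee on Windows by routing everything through the *W API and a UTF-8 bytes interface.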
Steve

From mafagafogigante at gmail.com Sat Aug 13 15:55:08 2016
From: mafagafogigante at gmail.com (Bernardo Sulzbach)
Date: Sat, 13 Aug 2016 16:55:08 -0300
Subject: [Python-ideas] From mailing list to GitHub issues
In-Reply-To:
References: <20160813182113.GA1534@phdru.name>
Message-ID: <43dc516b-1520-9c00-1509-79282026ee88@gmail.com>

On 08/13/2016 03:44 PM, David Mertz wrote: > I find email lists VASTLY easier to deal with than any newfangled web-based > custom discussion forum. Part of that is that it is a uniform interface to > every list I belong to, and I can choose my own MUA. With all those web > things, every site works a little bit differently from every other one, which > imposes an unnecessary cognitive burden (and usually simply lacks some > desired capability)

I completely agree. I really enjoy bug trackers and think that bugs should no longer be tracked within the repository (as in a 5K-line TODO.txt), but mailing lists are not, by any means, inferior to bug trackers as far as discussions go.

---

Also, you can reply to messages from GitHub by email (just reply and send a message with only your answer). However, because it allows for editing (which is a disaster, not an advantage), if you don't follow their links you may respond to an answer that no longer exists or was completely changed by the author or by a moderator. How to prevent this? Quote the text you are replying to. And we are back to email, but now with a web browser involved.

On top of this, I think that just making it easier for new contributors will not help with getting good and dedicated contributors. This is already pretty popular; getting it some extra GitHub stars and links should not help much.
From arek.bulski at gmail.com Sat Aug 13 16:30:44 2016
From: arek.bulski at gmail.com (Arkadiusz Bulski)
Date: Sat, 13 Aug 2016 22:30:44 +0200
Subject: [Python-ideas] From mailing list to GitHub issues
In-Reply-To:
References:
Message-ID: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com>

"I think that just making it easier for new contributors will not help with getting good and dedicated contributors."

This is exactly what Donald was talking about. You are creating an obstacle course for people to go through before they can contribute anything. I totally agree that we need *good contributions* but making it harder does not help. Remember what made Python the best language in the first place: ease of use?

From rob.cliffe at btinternet.com Sat Aug 13 16:46:47 2016
From: rob.cliffe at btinternet.com (Rob Cliffe)
Date: Sat, 13 Aug 2016 21:46:47 +0100
Subject: [Python-ideas] Consider adding clip or clamp function to math
In-Reply-To: <20160813001425.GI26300@ando.pearwood.info>
References: <57A12B5A.90504@canterbury.ac.nz> <20160804132028.GL6608@ando.pearwood.info> <22436.853.797870.835321@turnbull.sk.tsukuba.ac.jp> <20160805060648.GU6608@ando.pearwood.info> <22436.41739.428069.534278@turnbull.sk.tsukuba.ac.jp> <20160805162901.GX6608@ando.pearwood.info> <22437.29809.206551.416746@turnbull.sk.tsukuba.ac.jp> <20160813001425.GI26300@ando.pearwood.info>
Message-ID: <600e45a5-f976-b715-292c-87c0c3e3de3e@btinternet.com>

Far be it from me to dampen your enthusiasm, but it seems to me that the functionality of a clamp function is so simple, and yet has so many possible variations, that it's not worth providing it.
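[Editorial note: for reference, the whole function under discussion, in the form Victor proposed earlier in the thread (all three parameters mandatory), fits in a couple of lines:]

```python
def clamp(min_val, value, max_val):
    """Limit value to the closed range [min_val, max_val]."""
    # Victor's formulation from the thread:
    # clamp(min_val, value, max_val) == min(max(min_val, value), max_val)
    return min(max(min_val, value), max_val)

print(clamp(0, 5, 10))   # 5  (inside the range: passes through)
print(clamp(0, -3, 10))  # 0  (below the range: snaps to the lower bound)
print(clamp(0, 42, 10))  # 10 (above the range: snaps to the upper bound)
```

Each of the possible variations Rob alludes to (argument order, optional bounds, NaN handling) is a similarly small edit to this body.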
I.e., it's probably quicker for someone to write their own version (typically <= 4 lines of code) than to:
  look up the library version
  read *and understand* its specification
  decide whether it's suitable for their use case
  maybe: decide that it isn't (or they don't understand it) and write their own one anyway.
A custom version can also be optimised for the particular use case.

Regards
Rob Cliffe

On 13/08/2016 01:14, Steven D'Aprano wrote: > On Sat, Aug 13, 2016 at 01:25:58AM +0200, Victor Stinner wrote: >> I tried to follow this discussion and I still fail to understand why my >> proposition of "def clamp(min_val, value, max_val): return min(max(min_val, >> value), max_val)" is not good. I expect that a PEP replies to this question >> without having to read the whole thread :-) > I said I would write up a summary, and I will. If you want to call it a > PEP, I'm okay with that. I won't forget your proposal either :-) > >

From steve.dower at python.org Sat Aug 13 18:00:46 2016
From: steve.dower at python.org (Steve Dower)
Date: Sat, 13 Aug 2016 15:00:46 -0700
Subject: [Python-ideas] Fix default encodings on Windows
In-Reply-To: <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp>
References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp>
Message-ID:

The last point is correct: if you get bytes from a file system API, you should be able to pass them back in without losing information. CP_ACP (a.k.a. the *A API) does not allow this, so I'm proposing using the *W API everywhere and encoding to utf-8 when the user wants/gives bytes.

Top-posted from my Windows Phone

-----Original Message-----
From: "Stephen J.
Turnbull"
Sent: 8/13/2016 12:11
To: "Random832"
Cc: "python-ideas at python.org"
Subject: Re: [Python-ideas] Fix default encodings on Windows

Random832 writes:

> And what's going to happen if you shovel those bytes into the > filesystem without conversion on Linux, or worse, OSX?

Off topic. See Subject: field.

> This proposal embodies an assumption that bytes from unknown sources > used as filenames are more likely to be UTF-8 than in the locale ACP

Then it's irrelevant: most bytes are not from "unknown sources", they're from correspondents (or from yourself!) -- and for most users most of the time, those correspondents share the locale encoding with them. At least where I live, they use that encoding frequently.

> the only solution is to require the application to make a > considered decision

That's not a solution. Code is not written with every decision considered, and it never will be. The (long-run) solution is a la Henry Ford: "you can encode text any way you want, as long as it's UTF-8". Then it won't matter if people ever make considered decisions about encoding! But trying to enforce that instead of letting it evolve naturally (as it is doing) will cause unnecessary pain for Python programmers, and I believe quite a lot of pain.

I used to be in the "make them speak UTF-8" camp. But in the 15 years since PEP 263, experience has shown me that mostly it doesn't matter, and that when it does matter, you have to deal with the large variety of encodings anyway -- assuming UTF-8 is not a win. For use cases that can be encoding-agnostic because all cooperating participants share a locale encoding, making them explicitly specify the locale encoding is just a matter of "misery loves company". Please, let's not do things for that reason.
> I think the use case that the proposal has in mind is a > file-names-are-just-bytes program (or set of programs) that reads > from the filesystem, converts to bytes for a file/network, and then > eventually does the reverse - either end may be on windows.

You have misspoken somewhere. The programs under discussion do not "convert" input to bytes; they *receive* bytes, either from POSIX APIs or from Windows *A APIs, and use them as is. Unless I am greatly mistaken, Steve simply wants that to work as well on Windows as on POSIX platforms, so that POSIX programmers who do encoding-agnostic programming have one less barrier to supporting their software on Windows. But you'll have to ask Steve to rule on that.

Steve

From gokoproject at gmail.com Sat Aug 13 18:33:54 2016
From: gokoproject at gmail.com (John Wong)
Date: Sat, 13 Aug 2016 18:33:54 -0400
Subject: [Python-ideas] From mailing list to GitHub issues
In-Reply-To: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com>
References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com>
Message-ID:

On Sat, Aug 13, 2016 at 4:30 PM, Arkadiusz Bulski wrote: > "I think that just making it easier for new contributors > > will not help with getting good and dedicated contributors." > > This is exactly what Donald was talking about. You are creating an > obstacle course for people to go through before they can contribute > anything. I totally agree that we need **good contributions** but making > it harder does not help. Remember what made Python the best language in the > first place: ease of use? > >

+1 on all the reasons so far why GitHub isn't the only place.
In fact, I'd say an issue tracker in general is not necessarily a place for discussion except for very development-specific purposes. Whether it's Mozilla, Python or Cloud Foundry, Apache Cassandra, what not, from my experience the most meaningful discussion always happens over email and over some kind of personal messaging platform, e.g. IRC or Slack. It's very hard to quantify which platform is best or *better*.

Where I work, my development team is 98% exclusively remote and in a different timezone. So to get my message across, I have to either rely on email (and sometimes we don't answer each other's email), or set up a proper meeting, especially since we tend to gloss over important details or the email is overwhelmingly complex to digest (I have a tendency to write very long, very thorough emails, and some of my co-workers don't seem to be able to get it, perhaps because of a language barrier).

Since CPython development itself still uses its own tracker, discussing within GitHub is not ideal. There is an alternative, and that's using a forum. Rust ditched its mailing list and went straight to Discourse [1]. Elasticsearch also ditched its mailing list and went to Discourse (although the quality there is quite bad, from my own experience). It could be an alternative. But before we choose an alternative, let's think about the use cases. What is it that I can't do easily with a mailing list? Does tagging help you organize your discussions?

I haven't contributed anything code-wise to CPython development, I am more of a spectator, hoping to learn new things about Python and CPython from reading emails, so I am not in the best position to give you my opinion on defining a dev-release-production workflow. But as a reader, yeah, some kind of filtering, built into my reader client, would be really helpful. Having syntax highlighting is also useful. Also, I agree with Guido: a lot of discussions here end up tangent to the original discussion.
Off-topic discussion is welcome, but I recommend folks not go off topic too often; people forget to branch off, so off-topic discussion ends up polluting the thread. As a reader, I am tired of that.

Thanks.

John

[1]: https://www.reddit.com/r/rust/comments/2tdqgc/rustdev_say_goodbye_to_the_mailing_list/

From brett at python.org Sat Aug 13 19:17:57 2016
From: brett at python.org (Brett Cannon)
Date: Sat, 13 Aug 2016 23:17:57 +0000
Subject: [Python-ideas] From mailing list to GitHub issues
In-Reply-To:
References: <20160813182113.GA1534@phdru.name>
Message-ID:

On Sat, 13 Aug 2016 at 11:39 Donald Stufft wrote: > > > On Aug 13, 2016, at 2:21 PM, Oleg Broytman wrote: > > > >> > >> As Donald pointed out, there are people who are not going to create > >> custom email processing toolchains. > > > > In what way will they be helpful to the development of Python? > > Contributors have to install, learn, configure and use a lot of > > development tools, compared to which email tools are just toys. > > > Well, I personally generally do not have the time to sit there and craft > some sort of complex email toolchain to deal with it. I just leave lists > if their volume is too high for me to deal with or they don't provide me > the tools to interact with them in a non-frustrating way. For example, I'm > no longer subscribed to python-dev because of these reasons. I *think* I've > positively impacted the development of Python, but maybe not! In any case > I think that falling into the trap of thinking that anyone who is willing > to contribute to Python is also willing to maintain a personal toolchain > for dealing with the deficiencies in mailing lists is not a place we should > be in. > > That's not to say that a traditional mailing list may not represent the > best > trade-off --
I have my opinion, you have yours, but we should not start > thinking > that adding an obstacle course that a user must complete before they can > meaningfully contribute is doing anything but self-selecting for people > willing > to run that particular obstacle course, not selecting for skill or likely > impact. >

To give another example of how mailing lists put people at the mercy of how much time and expertise they have with their MUA, python-dev received https://mail.python.org/pipermail/python-dev/2016-August/145815.html today. Now in the archives that email looks fine, but the two opening paragraphs of that email came formatted a bit differently into my inbox (which is Google Inbox):

```
Hello, We are experimenting with a tool for inspecting how well languages and libraries support server certificate verification when establishing TLS connections. We are getting rather confusing results in our first major shootout of bundled CPython 2 and 3 versions in major, still supported OS distributions. We would love to get any insight into the test stubs and results. Maybe we are doing something horribly wrong?
```

That's a mess and the whole email is formatted like that. I actually have not read the email because of the formatting issue. As Oleg pointed out, when you go with a federated solution like mail, you are at the mercy of whatever tools people choose to use with the service. But when you use a centralized approach you know the experience is consistent for everyone and thus there's a certain level of quality control.

Now some say we don't need to include more people in discussions and we should let the power users continue to use their powerful workflows, while others say we should make it easier for all to participate.
For me this community is known for being welcoming and I want to foster that, and if that means some of us have to use slightly less powerful tools to manage conversations so that everyone gets a better experience overall, then I say that's a worthy tradeoff.

From rosuav at gmail.com Sat Aug 13 19:25:32 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Sun, 14 Aug 2016 09:25:32 +1000
Subject: [Python-ideas] From mailing list to GitHub issues
In-Reply-To:
References: <20160813182113.GA1534@phdru.name>
Message-ID:

On Sun, Aug 14, 2016 at 9:17 AM, Brett Cannon wrote: > That's a mess and the whole email is formatted like that. I actually have > not read the email because of the formatting issue. As Oleg pointed out, > when you go with a federated solution like mail, you are at the mercy of > whatever tools people choose to use with the service. But when you use a > centralized approach you know the experience is consistent for everyone and > thus there's a certain level of quality control. >

IMO that's an argument in favour of the federated approach.

ChrisA

From greg.ewing at canterbury.ac.nz Sat Aug 13 19:59:56 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 14 Aug 2016 11:59:56 +1200
Subject: [Python-ideas] From mailing list to GitHub issues
In-Reply-To: <20160813175006.GA32662@phdru.name>
References: <20160813163606.GA28955@phdru.name> <00BD2FEA-3149-41F7-8D12-59B7C9F9CB41@stufft.io> <20160813175006.GA32662@phdru.name>
Message-ID: <57AFB47C.4030401@canterbury.ac.nz>

Oleg Broytman wrote: > > From the recent and not so recent discussions of Moxie Marlinspike > about centralized vs decentralized solutions (unfederated messaging vs > email/jabber): "Indeed, cannibalizing a federated application-layer > protocol into a centralized service is almost a sure recipe for a > successful consumer product today.
It's what Slack did with IRC, what > Facebook did with email, and what WhatsApp has done with XMPP. In each > case, the federated service is stuck in time, while the centralized > service is able to iterate into the modern world and beyond."

If I've managed to unravel that pile of buzzwords and tortured metaphors correctly, what he seems to be saying is "Locking you into a proprietary messaging system is good for you, really, believe me."

-- Greg

From phd at phdru.name Sat Aug 13 22:34:28 2016
From: phd at phdru.name (Oleg Broytman)
Date: Sun, 14 Aug 2016 04:34:28 +0200
Subject: [Python-ideas] From mailing list to GitHub issues
In-Reply-To: <57AFB47C.4030401@canterbury.ac.nz>
References: <20160813163606.GA28955@phdru.name> <00BD2FEA-3149-41F7-8D12-59B7C9F9CB41@stufft.io> <20160813175006.GA32662@phdru.name> <57AFB47C.4030401@canterbury.ac.nz>
Message-ID: <20160814023428.GA14476@phdru.name>

On Sun, Aug 14, 2016 at 11:59:56AM +1200, Greg Ewing wrote: > Oleg Broytman wrote: > > > From the recent and not so recent discussions of Moxie Marlinspike > >about centralized vs decentralized solutions (unfederated messaging vs > >email/jabber): "Indeed, cannibalizing a federated application-layer > >protocol into a centralized service is almost a sure recipe for a > >successful consumer product today. It's what Slack did with IRC, what > >Facebook did with email, and what WhatsApp has done with XMPP. In each > >case, the federated service is stuck in time, while the centralized > >service is able to iterate into the modern world and beyond." > > If I've managed to unravel that pile of buzzwords and tortured > metaphors correctly, what he seems to be saying is "Locking you > into a proprietary messaging system is good for you, really, > believe me."

I disagree with him on many grounds, but many people agree. Email, as we constantly hear, is stuck in the 19th century.
Centralized proprietary messaging delivers people from the necessity to learn email tools and from the necessity to use those horrible tools (the fact that they need to learn their shiny new web tools, and that those proprietary apps are also quite horrible, is usually disregarded).

> -- > Greg

Oleg.

-- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN.

From guido at python.org Sat Aug 13 22:32:45 2016
From: guido at python.org (Guido van Rossum)
Date: Sat, 13 Aug 2016 19:32:45 -0700
Subject: [Python-ideas] From mailing list to GitHub issues
In-Reply-To:
References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com>
Message-ID:

On Sat, Aug 13, 2016 at 3:33 PM, John Wong wrote: > Whether it's Mozilla, Python or Cloud Foundry, Apache Cassandra, what not, > from my experience, the most meaningful discussion always happens over > email and over some kind of personal messaging platform, e.g. IRC or Slack. >

That's an odd juxtaposition, and not true in my experience. I've always hated IRC's culture (too much sniping and too much noise, and an intentional lack of records). While many operational decisions in the Python world are indeed taken on IRC (if that's where the few folks who need to take action on some issue get together), anything that requires some kind of long-lasting record does not belong there. At work I use Slack extensively, but again it only works for things that people who aren't online at that very moment can safely skip; for everything else we either create documents that are cooperatively edited, or use trackers or mailing lists (and often all three :-).

In the mypy world we simply don't have mailing lists; instead we use three GitHub trackers for everything (one for mypy, one for typeshed, and one for PEP-484 and the typing module). We have many excellent discussions there (in addition to bug reports and triage).
In the Python world many meaningful discussions happen on bugs.python.org, as python-ideas is often too polarized to be of much use (the current thread being no exception), while python-dev more and more becomes an "official" channel to be avoided until a decision has already been negotiated elsewhere (stuff I post there runs a serious risk of being quoted out of context in media channels I've never heard of).

In terms of putting barriers in place of newbie contributions, mailing lists appear more problematic than GitHub trackers, given how often we get a reply to a digest or an indecipherable jumble of quoting. Trackers also effectively avoid top-posting issues (which no amount of referring to mailing list etiquette can prevent). Note that all these alternatives still send email notifications (to those who want them), and trackers (including GitHub) also allow replies by email.

-- --Guido van Rossum (python.org/~guido)

From greg.ewing at canterbury.ac.nz Sat Aug 13 22:49:25 2016
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 14 Aug 2016 14:49:25 +1200
Subject: [Python-ideas] From mailing list to GitHub issues
In-Reply-To:
References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com>
Message-ID: <57AFDC35.4090501@canterbury.ac.nz>

Guido van Rossum wrote: > Note that all these alternatives still send email notifications (to > those who want them), and trackers (including GitHub) also allow replies > by email.

Can you start a new topic by email, or only reply to existing ones?
-- Greg

From donald at stufft.io Sat Aug 13 23:04:47 2016
From: donald at stufft.io (Donald Stufft)
Date: Sat, 13 Aug 2016 23:04:47 -0400
Subject: [Python-ideas] From mailing list to GitHub issues
In-Reply-To: <57AFDC35.4090501@canterbury.ac.nz>
References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com> <57AFDC35.4090501@canterbury.ac.nz>
Message-ID: <0C5A6591-EA44-4F6C-ACD7-A9C6F95F21FF@stufft.io>

> On Aug 13, 2016, at 10:49 PM, Greg Ewing wrote: > > Can you start a new topic by email, or only reply to > existing ones?

Depends on the specific system and the configuration setup for it. GitHub only allows replies; Discourse can optionally allow creation of new topics via email.

-- Donald Stufft

From rosuav at gmail.com Sun Aug 14 00:01:05 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Sun, 14 Aug 2016 14:01:05 +1000
Subject: [Python-ideas] From mailing list to GitHub issues
In-Reply-To:
References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com>
Message-ID:

On Sun, Aug 14, 2016 at 12:32 PM, Guido van Rossum wrote: > In terms of putting barriers in place of newbie contributions, mailing lists > appear more problematic than GitHub trackers, given how often we get a reply > to a digest or an indecipherable jumble of quoting. Trackers also > effectively avoid top-posting issues (which no amount of referring to > mailing list etiquette can prevent).

The biggest problem I'm seeing is with digests. Can that feature be flagged off as "DO NOT USE THIS UNLESS YOU KNOW WHAT YOU ARE ASKING FOR"? So many people seem to select digest mode, then get extremely confused by it.
ChrisA From brett at python.org Sun Aug 14 00:57:03 2016 From: brett at python.org (Brett Cannon) Date: Sun, 14 Aug 2016 04:57:03 +0000 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: <20160813182113.GA1534@phdru.name> Message-ID: On Sat, Aug 13, 2016, 16:26 Chris Angelico wrote: > On Sun, Aug 14, 2016 at 9:17 AM, Brett Cannon wrote: > > That's a mess and the whole email is formatted like that. I actually have > > not read the email because of the formatting issue. As Oleg pointed out, > > when you go with a federated solution like mail, you are the mercy of > > whatever tools people choose to use with the service. But when you use a > > centralized approach you know the experience is consistent for everyone > and > > thus there's a certain level of quality control. > > > > IMO that's an argument in favour of the federated approach. > I don't understand how me receiving a badly formatted email due to some disagreement between the sender's and my email client is a point of support? -Brett > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sun Aug 14 01:08:53 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 14 Aug 2016 15:08:53 +1000 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: <20160813182113.GA1534@phdru.name> Message-ID: On Sun, Aug 14, 2016 at 2:57 PM, Brett Cannon wrote: > On Sat, Aug 13, 2016, 16:26 Chris Angelico wrote: >> >> On Sun, Aug 14, 2016 at 9:17 AM, Brett Cannon wrote: >> > That's a mess and the whole email is formatted like that. I actually >> > have >> > not read the email because of the formatting issue. 
As Oleg pointed out, >> > when you go with a federated solution like mail, you are the mercy of >> > whatever tools people choose to use with the service. But when you use a >> > centralized approach you know the experience is consistent for everyone >> > and >> > thus there's a certain level of quality control. >> > >> >> IMO that's an argument in favour of the federated approach. > > > I don't understand how me receiving a badly formatted email due to some > disagreement between the sender's and my email client is a point of support? That in itself isn't. But your next point is that email lets people choose what they use, whereas centralized systems force everyone to use the exact same client (or whatever clients the one central authority provides - eg Slack offers web and desktop, and I think mobile). In fact, the entire *point* of the centralized systems is to force everyone to use a restricted set of clients, instead of having the freedom to choose. Would the world be a better place if everyone were forced to write all code in Python? ChrisA From p1himik at gmail.com Sun Aug 14 01:13:51 2016 From: p1himik at gmail.com (Eugene Pakhomov) Date: Sun, 14 Aug 2016 12:13:51 +0700 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: <20160813182113.GA1534@phdru.name> Message-ID: The world probably wouldn't be better, but I think python-ideas will be better on something like GitHub. People who like it will benefit from it, people who don't will still be able to use their email setup (as pointed out earlier - at least for the majority of cases). Regards, Eugene On Sun, Aug 14, 2016 at 12:08 PM, Chris Angelico wrote: > On Sun, Aug 14, 2016 at 2:57 PM, Brett Cannon wrote: > > On Sat, Aug 13, 2016, 16:26 Chris Angelico wrote: > >> > >> On Sun, Aug 14, 2016 at 9:17 AM, Brett Cannon wrote: > >> > That's a mess and the whole email is formatted like that. I actually > >> > have > >> > not read the email because of the formatting issue.
As Oleg pointed > out, > >> > when you go with a federated solution like mail, you are the mercy of > >> > whatever tools people choose to use with the service. But when you > use a > >> > centralized approach you know the experience is consistent for > everyone > >> > and > >> > thus there's a certain level of quality control. > >> > > >> > >> IMO that's an argument in favour of the federated approach. > > > > > > I don't understand how me receiving a badly formatted email due to some > > disagreement between the sender's and my email client is a point of > support? > > That in itself isn't. But your next point is that email lets people > choose what they use, whereas centralized systems force everyone to > use the exact same client (or whatever clients the one central > authority provides - eg Slack offers web and desktop, and I think > mobile). In fact, the entire *point* of the centralized systems is to > force everyone to use a restricted set of clients, instead of having > the freedom to choose. Would the world be a better place if everyone > were forced to write all code in Python? > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arek.bulski at gmail.com Sun Aug 14 04:47:36 2016 From: arek.bulski at gmail.com (Arkadiusz Bulski) Date: Sun, 14 Aug 2016 10:47:36 +0200 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com> References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com> Message-ID: <57b03025.a913190a.6f4c1.a901@mx.google.com> Just pointing out that there is an official organisation account on github already. All we need is someone to create a repo and people will immediately start posting there. 
After a week you will see for yourself that it simply works. https://github.com/python For me personally, mailing lists are as ephemeral as chats. I would be more than happy to talk to you folks over WhatsApp. ~~Arkadiusz Bulski~~ From: Arkadiusz Bulski -------------- next part -------------- An HTML attachment was scrubbed... URL: From anthony at xtfx.me Sun Aug 14 04:57:44 2016 From: anthony at xtfx.me (C Anthony Risinger) Date: Sun, 14 Aug 2016 03:57:44 -0500 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: <20160813182113.GA1534@phdru.name> Message-ID: We are all interacting from different points in our own personal development, in addition to growth and changes in how we interact with technology. I used to care about top posting. Email netiquette and rules and all that. I'd perform delicate inlining of responses to promote better readability and really give the readers the ideal experience. Or whatever. (I'm not sure inlining and bottom posting is even better. Top posting lets me read the most relevant discussion first and follow the signal back to source if I wanted at leisure. This is a good way to maximize time spent learning from refined thoughts instead of emergent ones) Mobile dominates my non-work net-time today. I don't want to get out a laptop to respond pretty. I have elementary kids now and life is faster. Email is almost by design static and unable to change. Text. Walls of it! Every message in this thread looks like a jagged unique snowflake. Email doesn't need to change. It's already the common denominator. The next thing will certainly email you! There is value in it and it is recognized. Should people younger than myself care about all the garbage I too see as a relic? Making discussion more accessible and individually relevant/impactful is all that really matters. We should explore ways to do that!
On Aug 14, 2016 12:14 AM, "Eugene Pakhomov" wrote: > The world probably wouldn't better, but I think python-ideas will be > better on something like GitHub. > People who like it will benefit from it, people who don't will still be > able to use their email setup (as pointed earlier - at least for the > majority of cases). > > Regards, > Eugene > > > On Sun, Aug 14, 2016 at 12:08 PM, Chris Angelico wrote: > >> On Sun, Aug 14, 2016 at 2:57 PM, Brett Cannon wrote: >> > On Sat, Aug 13, 2016, 16:26 Chris Angelico wrote: >> >> >> >> On Sun, Aug 14, 2016 at 9:17 AM, Brett Cannon >> wrote: >> >> > That's a mess and the whole email is formatted like that. I actually >> >> > have >> >> > not read the email because of the formatting issue. As Oleg pointed >> out, >> >> > when you go with a federated solution like mail, you are the mercy of >> >> > whatever tools people choose to use with the service. But when you >> use a >> >> > centralized approach you know the experience is consistent for >> everyone >> >> > and >> >> > thus there's a certain level of quality control. >> >> > >> >> >> >> IMO that's an argument in favour of the federated approach. >> > >> > >> > I don't understand how me receiving a badly formatted email due to some >> > disagreement between the sender's and my email client is a point of >> support? >> >> That in itself isn't. But your next point is that email lets people >> choose what they use, whereas centralized systems force everyone to >> use the exact same client (or whatever clients the one central >> authority provides - eg Slack offers web and desktop, and I think >> mobile). In fact, the entire *point* of the centralized systems is to >> force everyone to use a restricted set of clients, instead of having >> the freedom to choose. Would the world be a better place if everyone >> were forced to write all code in Python? 
>> >> ChrisA >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Sun Aug 14 06:18:09 2016 From: phd at phdru.name (Oleg Broytman) Date: Sun, 14 Aug 2016 12:18:09 +0200 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: <57b03025.a913190a.6f4c1.a901@mx.google.com> References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com> <57b03025.a913190a.6f4c1.a901@mx.google.com> Message-ID: <20160814101809.GA990@phdru.name> On Sun, Aug 14, 2016 at 10:47:36AM +0200, Arkadiusz Bulski wrote: > Just pointing out that there is an official organisation account on github already. All we need is someone to create a repo and people will immediately start posting there. After a week you will see for yourself that it simply works. > https://github.com/python I have been the lead developer and release manager for https://github.com/sqlobject for about 10 years. I have never seen long fruitful communications in a web issue tracker. The best discussions happen in our mailing list. So I doubt I can see anything in a week. > For me personally, mailing lists are as ephemeral as chats. I would be more than happy to talk to you folks over WhatsApp. Are you going to run Python tests on your smartphone? If not, wouldn't it be a little problematic to work on a real computer but to communicate on a phone? > ~~Arkadiusz Bulski~~ > > From: Arkadiusz Bulski Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN.
From phd at phdru.name Sun Aug 14 06:22:21 2016 From: phd at phdru.name (Oleg Broytman) Date: Sun, 14 Aug 2016 12:22:21 +0200 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: <20160813182113.GA1534@phdru.name> Message-ID: <20160814102221.GB990@phdru.name> On Sun, Aug 14, 2016 at 03:57:44AM -0500, C Anthony Risinger wrote: > Mobile dominates my non-work net-time today. I don't want to get out a > laptop to respond pretty. In what ways are you going to contribute without getting out to your laptop? > Email is almost by design static and unable to change. Text. Walls of it! We are talking about Python development, and the development is performed with texts. Python is written in C, Python code is written... well, in Python, documentation is written in reStructuredText. So discussions about all of this is, naturally, textual. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From p.f.moore at gmail.com Sun Aug 14 07:19:33 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 14 Aug 2016 12:19:33 +0100 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: <20160813182113.GA1534@phdru.name> Message-ID: On 13 August 2016 at 19:44, David Mertz wrote: > I find email list VASTLY easier to deal with than any newfangled web-based > custom discussion forum. Part of that is that it is a uniform interface to > every list I belong too, and I can choose my own MUA. With all those web > things, every site works a little bit different from every other one, that > imposes an unnecessary cognitive burden (and usually simply lacks some > desired capability) For me, the important point is "uniform interface to all lists". I like email for the forums I participate in, simply because I only need *one* browser tab open (gmail) and I don't have to remember or bookmark a variety of URLs. 
On forums that have their own web interface, I participate much less frequently, and tend to fall into much more of a "drive by" interaction, only contributing to "my" threads, rather than fully participating like I do on mailing lists. Whereas with my email lists, I read pretty much everything (sometimes only skimming, of course), which leads to me participating in threads I would otherwise have ignored. I can't really offer any opinion on the "mailing lists are more efficient" debate, as I simply dump all my lists into gmail, with a label per list, so I'm not exactly a power user, but the "single website for everything" aspect is the huge bonus for me. Would I follow python-ideas if it moved to a different forum? Certainly, if the new forum let me just click on something and from there on interact solely by email. Maybe, if I found that having a python-ideas tab permanently open was worthwhile. Otherwise, I don't know. Likely not, except on specific topics (but without an email feed, I don't know how I'd find out about such topics). And my participation would be much less frequent. (I leave it to others to judge whether that would be a good thing ;-)) In my opinions, forums tend to encourage a much more focused style of discussion. In one way, that's a good thing (and I'm sure many people would prefer python-ideas to have more focus). But it *also* tends to deter people from contributing - I can't quite express why, but there's somehow less of a sense of being an open debate with a forum. Maybe that's just me - it's certainly a subjective thing - but in a forum I'd expect the quality of discussion to increase, but the quantity (and breadth) to decrease. While I'm mentioning random thoughts, email replies to a forum like a github tracker are often a little disruptive, because etiquette is different. Email users tend to quote extensively for context, and often include signatures. 
Neither of these things is typically as necessary on a tracker (minimal, careful quoting is frequent, but not the extensive quoting common on mailing lists). So I could imagine a "mixed" interface actually being *less* comfortable for both types of participant. Paul From srkunze at mail.de Sun Aug 14 07:30:03 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Sun, 14 Aug 2016 13:30:03 +0200 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: <20160813182113.GA1534@phdru.name> Message-ID: On 14.08.2016 13:19, Paul Moore wrote: > In my opinions, forums tend to encourage a much more focused style of > discussion. In one way, that's a good thing (and I'm sure many people > would prefer python-ideas to have more focus). But it *also* tends to > deter people from contributing - I can't quite express why, but > there's somehow less of a sense of being an open debate with a forum. > Maybe that's just me - it's certainly a subjective thing - but in a > forum I'd expect the quality of discussion to increase, but the > quantity (and breadth) to decrease. The "why" comes from the linear nature of forum threads. I like it more, too. :) Sven From arek.bulski at gmail.com Sun Aug 14 07:57:03 2016 From: arek.bulski at gmail.com (Arek Bulski) Date: Sun, 14 Aug 2016 13:57:03 +0200 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: <57b03025.a913190a.6f4c1.a901@mx.google.com> References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com> <57b03025.a913190a.6f4c1.a901@mx.google.com> Message-ID: I throw a proposal on the table: let's create a "python-ideas" repo under the "python" account on GitHub and move this and only this thread onto it. If it fails, nothing but this thread is lost (not persisted in the mailing list) which would make no difference anyway. People made many points that are purely abstract. We need some hands-on experience to see if it works for us or doesn't. Guido, could you create a repo for us?
-------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Aug 14 08:15:10 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 14 Aug 2016 22:15:10 +1000 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com> <57b03025.a913190a.6f4c1.a901@mx.google.com> Message-ID: <20160814121510.GO26300@ando.pearwood.info> On Sun, Aug 14, 2016 at 01:57:03PM +0200, Arek Bulski wrote: > I throw a proposal on the table: let's create a "python-ideas" repo under > the "python" account on GitHub and move this and only this thread onto it. If > it fails, What is your definition of "fails"? If three people follow you onto GitHub, and say "Well, isn't this nice!", is that a success? It's all well and good to say "let's try it and see", but unless you have concrete, objective criteria for success or failure, all that will happen is that some person or group of people will decide on subjective grounds that they like the new way of doing things, or don't, and we all should, or shouldn't, change. -- Steve From gvanrossum at gmail.com Sun Aug 14 10:45:46 2016 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sun, 14 Aug 2016 07:45:46 -0700 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: <20160813182113.GA1534@phdru.name> Message-ID: A uniform interface works well enough for issue trackers. And the "freedom of choice" idea doesn't overrule all other concerns. Maybe we should just start a python-ideas tracker and see who comes. There's no reason it couldn't exist in addition to the mailing list. (Before you scream fragmentation, we have many lists already.) --Guido (mobile) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From nicholas.chammas at gmail.com Sun Aug 14 12:16:06 2016 From: nicholas.chammas at gmail.com (Nicholas Chammas) Date: Sun, 14 Aug 2016 16:16:06 +0000 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: <00BD2FEA-3149-41F7-8D12-59B7C9F9CB41@stufft.io> References: <20160813163606.GA28955@phdru.name> <00BD2FEA-3149-41F7-8D12-59B7C9F9CB41@stufft.io> Message-ID: On Sat, Aug 13, 2016 at 1:16 PM Donald Stufft donald at stufft.io wrote: I think one of the big trade offs here, is that the traditional mailing > list can work very well if everyone involved takes the time to develop a > custom tool chain that fits their own workflow perfectly and if they spend > the time learning the deficiency of the systems to ensure they correctly > work around them. The web forum thing can theoretically achieve much less > of a theoretical "maximum" for productivity, but it typically means that > you can bring productivity gains to those who can't or won't spend time > maintaining a custom mailing stack. > This is an excellent point, and is similar to one that was made on a similar discussion at the start of the year. That thread focused more on Discourse as a potential alternative to the mailing list for python-ideas, but a lot of the arguments being made here on both sides are a repetition of what was discussed there (and probably of what has been discussed several times over the course of many years). Nick -------------- next part -------------- An HTML attachment was scrubbed...
URL: From victor.stinner at gmail.com Sun Aug 14 12:20:10 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 14 Aug 2016 18:20:10 +0200 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> Message-ID: > The last point is correct: if you get bytes from a file system API, you should be able to pass them back in without losing information. CP_ACP (a.k.a. the *A API) does not allow this, so I'm proposing using the *W API everywhere and encoding to utf-8 when the user wants/gives bytes. You get troubles when the filename comes from a file, another application, a registry key, ... which is encoded to CP_ACP. Do you plan to transcode all these data? (decode from CP_ACP, encode back to UTF-8) -------------- next part -------------- An HTML attachment was scrubbed... URL: From anthony at xtfx.me Sun Aug 14 12:21:47 2016 From: anthony at xtfx.me (C Anthony Risinger) Date: Sun, 14 Aug 2016 11:21:47 -0500 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: <20160814102221.GB990@phdru.name> References: <20160813182113.GA1534@phdru.name> <20160814102221.GB990@phdru.name> Message-ID: On Aug 14, 2016 5:18 AM, "Oleg Broytman" wrote: > > On Sun, Aug 14, 2016 at 03:57:44AM -0500, C Anthony Risinger < anthony at xtfx.me> wrote: > > Mobile dominates my non-work net-time today. I don't want to get out a > > laptop to respond pretty. > > In what ways are you going to contribute without getting out to your > laptop? I meant it's more difficult to respond "properly" in interleaved style (like I am now, despite mobile) from the devices I'm on 95% of my list reading time. Unless I've crafted something gorgeous I still tend to feel a reservation to share. > > Email is almost by design static and unable to change. Text. Walls of it!
> > We are talking about Python development, and the development is > performed with texts. Python is written in C, Python code is written... > well, in Python, documentation is written in reStructuredText. So > discussions about all of this is, naturally, textual. Sure sure, but text is messy. Every message looks half-broken due to forced formatting (mixed HTML/plaintext and massive fluctuations in typography), wrapping (in portrait mode almost every message is very jagged due to inserted newlines by some MUA) and folding (when people switch quote styles proper folding by my MUA is hard, only top posting is free of this). I'm not sure X is better than straight email per se, but I am interested in moving the status quo forward and would try alternatives. -------------- next part -------------- An HTML attachment was scrubbed... URL: From donald at stufft.io Sun Aug 14 12:26:35 2016 From: donald at stufft.io (Donald Stufft) Date: Sun, 14 Aug 2016 12:26:35 -0400 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: <20160814101809.GA990@phdru.name> References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com> <57b03025.a913190a.6f4c1.a901@mx.google.com> <20160814101809.GA990@phdru.name> Message-ID: <612E9961-D29C-45EA-91B7-1D2ED41B6917@stufft.io> > On Aug 14, 2016, at 6:18 AM, Oleg Broytman wrote: > > Are you going to run Python tests on your smartphone? If not, > wouldn't it be a little problematic to work on a real computer but to > communicate on a phone? This is kind of silly, I can contribute to a discussion from a phone without needing to lay down a bunch of code in my text editor. Not every valuable contribution is writing code, and even for people who write code, not every thing they do requires that. --
Donald Stufft From steve.dower at python.org Sun Aug 14 13:49:21 2016 From: steve.dower at python.org (Steve Dower) Date: Sun, 14 Aug 2016 10:49:21 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> Message-ID: I plan to use only Unicode to interact with the OS and then utf8 within Python if the caller wants bytes. Currently we effectively use Unicode to interact with the OS and then CP_ACP if the caller wants bytes. All the *A APIs just decode strings and call the *W APIs, and encode the return values. I'm proposing that we move the decoding and encoding into Python and make it (nearly) lossless. In practice, this means all *A APIs are banned within the CPython source, and if we get/need bytes we have to convert to text first using the FS encoding, which will be utf8. Top-posted from my Windows Phone -----Original Message----- From: "Victor Stinner" Sent: 8/14/2016 9:20 To: "Steve Dower" Cc: "Stephen J. Turnbull" ; "python-ideas" ; "Random832" Subject: Re: [Python-ideas] Fix default encodings on Windows > The last point is correct: if you get bytes from a file system API, you should be able to pass them back in without losing information. CP_ACP (a.k.a. the *A API) does not allow this, so I'm proposing using the *W API everywhere and encoding to utf-8 when the user wants/gives bytes. You get troubles when the filename comes from a file, another application, a registry key, ... which is encoded to CP_ACP. Do you plan to transcode all these data? (decode from CP_ACP, encode back to UTF-8) -------------- next part -------------- An HTML attachment was scrubbed...
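Steve's "lossless" point can be illustrated with a small, hedged sketch (plain codec calls for illustration only, not CPython's actual Windows path handling): a legacy 8-bit code page such as cp1252, standing in here for CP_ACP, cannot represent every Unicode filename, so encoding to it destroys information, while utf-8 round-trips any name.

```python
# Hedged illustration, not CPython's implementation: why utf-8 can
# round-trip any (Unicode) Windows filename while a legacy code page
# like cp1252 (a stand-in for CP_ACP) cannot.
name = "caf\u00e9\u03c0.txt"  # filename containing a character outside cp1252 (the Greek pi)

# utf-8: lossless in both directions
assert name.encode("utf-8").decode("utf-8") == name

# legacy code page: the unrepresentable character is destroyed
lossy = name.encode("cp1252", errors="replace")
assert lossy.decode("cp1252") == "caf\u00e9?.txt"  # pi became '?'
```

This is why a bytes filename obtained from the *W APIs via utf-8 can always be passed back unchanged, whereas one obtained via CP_ACP may already have lost characters.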
URL: From brett at python.org Sun Aug 14 13:33:38 2016 From: brett at python.org (Brett Cannon) Date: Sun, 14 Aug 2016 17:33:38 +0000 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: <20160814121510.GO26300@ando.pearwood.info> References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com> <57b03025.a913190a.6f4c1.a901@mx.google.com> <20160814121510.GO26300@ando.pearwood.info> Message-ID: On Sun, 14 Aug 2016 at 05:16 Steven D'Aprano wrote: > On Sun, Aug 14, 2016 at 01:57:03PM +0200, Arek Bulski wrote: > > ?I throw a proposal on the table: lets create a "python-ideas" repo under > > "python" account on GitHub and move this and only this thread onto it. If > > it fails, > > What is your definition of "fails"? > > If three people follow you onto Github, and say "Well, isn't this > nice!", is that a success? > > Its all well and good to say "let's try it and see", but unless you have > concrete, object criteria for success or failure, all that will happen > is that some person or group of people will decide on subjective grounds > that they like the new way of doing things, or don't, and we all should, > or shouldn't, change. > Yep, and in the case of python-ideas that subjective decision falls on my shoulders because you can't measure happiness objectively very well. :( And as for creating a test GH repo for this, I'm thinking about it. But I should mention the only reason I'm thinking about it is because some of us have been discussing the cognitive overload of mailing lists as they currently stand behind the scenes and GH was potentially the next experiment. 
It has nothing to do with the OP as I believe it would have been a bit more appropriate/nicer to ask for suggestions on how to manage the email volume rather than coming in and saying, "I don't like this, so it should change" (a similar issue also came up on the peps repo w/ the OP trying to use that issue tracker for this same proposed purpose, so I'm trying to stay impartial in spite of how this idea has been presented). IOW I'm personally muting this thread as I am not hearing any new information on this topic and it's on me to make a decision as to whether a GitHub repo will be set up as an experiment. -------------- next part -------------- An HTML attachment was scrubbed... URL: From arek.bulski at gmail.com Sun Aug 14 15:35:39 2016 From: arek.bulski at gmail.com (Arek Bulski) Date: Sun, 14 Aug 2016 21:35:39 +0200 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com> <57b03025.a913190a.6f4c1.a901@mx.google.com> Message-ID: As I think Donald pointed out, it doesn't take a laptop to contribute. Did you all notice that Guido replied from a phone? Currently half of the mailing list mail is large auto quotes or subject date info. Lines are never broken the way they should be. Who wants to keep their mail toolchains, keep it. Don't make the rest of us put up with this shit. There is no definition of fails, just as I don't have a definition of consensus. People will stick to it or not. No voting, just participation. 14 Aug 2016 1:57 PM "Arek Bulski" wrote: I throw a proposal on the table: let's create a "python-ideas" repo under the "python" account on GitHub and move this and only this thread onto it. If it fails, nothing but this thread is lost (not persisted in the mailing list) which would make no difference anyway. People made many points that are purely abstract. We need some hands-on experience to see if it works for us or doesn't. Guido, could you create a repo for us?
-------------- next part -------------- An HTML attachment was scrubbed... URL: From xavier.combelle at gmail.com Sun Aug 14 19:05:47 2016 From: xavier.combelle at gmail.com (Xavier Combelle) Date: Mon, 15 Aug 2016 01:05:47 +0200 Subject: [Python-ideas] add a hash to .pyc to don't mess between .py and .pyc Message-ID: <1c9f8e60-ddad-fc7d-a162-3f60c9ab8dec@gmail.com> I have stumbled several times upon the following problem: I delete a module and the .pyc stays around, and by "magic" Python still uses the .pyc. A similar error happens (but less often) when, by some file system manipulation, the .pyc happens to be newer than the .py but corresponds to an older version of the .py. It is not a major problem, but it is still an existing problem, and I'm not the first one to have it. A Stack Overflow search leads to quite a lot of relevant answers http://stackoverflow.com/search?q=old+pyc and a Google search too https://www.google.fr/search?q=old+pyc moreover several of the Google results point to the bug trackers of various projects. (These results also show that .pyc files get stored in VCS repositories, but that is another, unrelated problem.) I even found a blog post using .pyc as a backdoor http://secureallthethings.blogspot.fr/2015/11/backdooring-python-via-pyc-pi-wa-si_9.html My idea to kill both birds with one stone would be to add a hash (likely cryptographic) of the .py file in the .pyc file, then read the .py file and check the hash. The additional first-startup cost will be just the hash calculation, which I think is cheap compared to other factors (especially input/output). On second startup the main additional cost will be the extra read of the .py files and the cheap hash calculations. I believe removing the bugs would be worth the performance cost. I know that some use cases rely on just the .pyc without keeping the .py around, for example by not distributing the source file.
But in my vision, this use case should be solved by an opt-in decision and not as a default. Several opt-in mechanisms could be envisioned: environment variables, command line switches, or a special compilation of the .pyc which explicitly asks not to check for the hash. -- Xavier From rosuav at gmail.com Sun Aug 14 19:23:13 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 15 Aug 2016 09:23:13 +1000 Subject: [Python-ideas] add a hash to .pyc to don't mess between .py and .pyc In-Reply-To: <1c9f8e60-ddad-fc7d-a162-3f60c9ab8dec@gmail.com> References: <1c9f8e60-ddad-fc7d-a162-3f60c9ab8dec@gmail.com> Message-ID: On Mon, Aug 15, 2016 at 9:05 AM, Xavier Combelle wrote: > I know that some use cases rely on just the .pyc without keeping the > .py around, for example by not distributing the source file. > But in my vision, this use case should be solved by an opt-in decision > and not as a default. Several opt-in mechanisms could be envisioned: > environment variables, command line switches, or a special compilation of > the .pyc which explicitly asks not to check for the hash. Of those, only the last one is truly viable - the application developer isn't necessarily the one choosing to make a sourceless module (it could be any library module anywhere in the tree, including the CPython standard library - sometimes that's distributed without .py files, to reduce interpreter on-disk size). So what this would mean is that a sourceless distro is not simply "delete the .py files and stuff keeps working", but "run this script and it'll recompile the .py files to stand-alone .pyc files". As such, I think the idea has merit; but it won't close the backdoor that you mentioned (anyone who wants to make that kind of attack would simply make a file that's marked as stand-alone). That said, though - anyone who can maliciously write to your file system has already won, whether they're writing pyc or py files. The only difference is how easily it's detected.
Fully loading and hashing the .py file seems like a paranoia option, and if you want that, just blow away all .pyc files, have your PYTHONPATH point to a read-only file system, and force the interpreter to compile everything fresh every time. How does this interact with the __pycache__ directory? ChrisA From wes.turner at gmail.com Sun Aug 14 20:45:23 2016 From: wes.turner at gmail.com (Wes Turner) Date: Sun, 14 Aug 2016 19:45:23 -0500 Subject: [Python-ideas] add a hash to .pyc to don't mess between .py and .pyc In-Reply-To: References: <1c9f8e60-ddad-fc7d-a162-3f60c9ab8dec@gmail.com> Message-ID: You can add a `make clean` build step: pyclean: find . -name '*.pyc' -delete You can delete all .pyc files - $ find . -name '*.pyc' -delete - http://manpages.ubuntu.com/manpages/precise/man1/pyclean.1.html #.pyc, .pyo You can rebuild all .pyc files (for a given directory): - $ python -m compileall -h - https://docs.python.org/2/library/compileall.html - https://docs.python.org/3/library/compileall.html You can, instead of building .pyc, build .pyo - https://docs.python.org/2/using/cmdline.html#envvar-PYTHONOPTIMIZE - https://docs.python.org/2/using/cmdline.html#cmdoption-O You can not write .pyc or .pyo w/ PYTHONDONTWRITEBYTECODE / -B - https://docs.python.org/2/using/cmdline.html#envvar-PYTHONDONTWRITEBYTECODE - https://docs.python.org/2/using/cmdline.html#cmdoption-B - If the files exist though, - https://docs.python.org/3/reference/import.html You can build a PEX (which rebuilds .pyc files) and test/deploy that: - https://github.com/pantsbuild/pex#integrating-pex-into-your-workflow - https://pantsbuild.github.io/python-readme.html#more-about-python-tests How .pyc files currently work: - http://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html - https://www.python.org/dev/peps/pep-3147/#flow-chart (*.pyc -> ./__pycache__) - http://raulcd.com/how-python-caches-compiled-bytecode.html You could add a hash of the .py source file in the header of the .pyc/.pyo 
object (as proposed) - The overhead of this hashing would be a significant performance regression - Instead, today, the build step can just pyclean or build a .zip/.WHL/.PEX which is expected to be a fresh build On Sun, Aug 14, 2016 at 6:23 PM, Chris Angelico wrote: > On Mon, Aug 15, 2016 at 9:05 AM, Xavier Combelle > wrote: > > I know that some use case makes a use of just using .pyc and not keeping > > .py around, for example by not distribute the source file. > > But in my vision, this uses case should be solved per opt-in decision > > and not as a default. Several opt-in mechanisms could be envisioned: > > environment variables, command line switches, special compilation of > > .pyc which explicitly ask to not check for the hash. > > Of those, only the last one is truly viable - the application > developer isn't necessarily the one choosing to make a sourceless > module (it could be any library module anywhere in the tree, including > the CPython standard library - sometimes that's distributed without > .py files, to reduce interpreter on-disk size). So what this would > mean is that a sourceless distro is not simply "delete the .py files > and stuff keeps working", but "run this script and it'll recompile the > .py files to stand-alone .pyc files". > > As such, I think the idea has merit; but it won't close the backdoor > that you mentioned (anyone who wants to make that kind of attack would > simply make a file that's marked as stand-alone). That said, though - > anyone who can maliciously write to your file system has already won, > whether they're writing pyc or py files. The only difference is how > easily it's detected. Fully loading and hashing the .py file seems > like a paranoia option, and if you want that, just blow away all .pyc > files, have your PYTHONPATH point to a read-only file system, and > force the interpreter to compile everything fresh every time. > > How does this interact with the __pycache__ directory? 
> > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Aug 14 20:49:32 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 15 Aug 2016 10:49:32 +1000 Subject: [Python-ideas] add a hash to .pyc to don't mess between .py and .pyc In-Reply-To: <1c9f8e60-ddad-fc7d-a162-3f60c9ab8dec@gmail.com> References: <1c9f8e60-ddad-fc7d-a162-3f60c9ab8dec@gmail.com> Message-ID: <20160815004931.GS26300@ando.pearwood.info> On Mon, Aug 15, 2016 at 01:05:47AM +0200, Xavier Combelle wrote: > I have stumbled upon several time with the following problem. > I delete a module and the .pyc stay around. and by "magic", python still > use the .pyc Upgrade to Python 3.2 or better, and the problem will go away. In 3.2 and above, the .pyc files are stored in a separate __pycache__ directory, and are only used if the .py file still exists. In Python 3.1 and older, you have: # directory in sys.path spam.py spam.pyc eggs.py eggs.pyc and if you delete eggs.py, Python will still use eggs.pyc. But in 3.2 and higher the cache keeps implementation and version specific byte-code files: spam.py eggs.py __pycache__/ +-- spam-cpython-32.pyc +-- spam-cpython-35.pyc +-- spam-pypy-33.pyc +-- eggs-cpython-34.pyc +-- eggs-cpython-35.pyc If you delete the eggs.py file, the eggs byte-code files won't be used. Byte-code only modules are still supported, but you have to explicitly opt-in to that by moving the .pyc file out of the __pycache__ directory and renaming it. 
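The cache layout described here can be inspected programmatically: `importlib.util` exposes the PEP 3147 path mapping in both directions (a small sketch; the `spam.py` path is purely illustrative):

```python
import os
import sys
import importlib.util

# Map a source path to its PEP 3147 bytecode path. The cache tag embedded
# in the file name identifies the interpreter and version (e.g. "cpython-35"),
# which is why several interpreters can share one __pycache__ directory.
cache_path = importlib.util.cache_from_source("spam.py")
print(cache_path)                    # e.g. __pycache__/spam.cpython-35.pyc
print(sys.implementation.cache_tag)  # the tag used in that name

# The mapping is reversible, so the .py/.pyc pairing stays unambiguous.
print(importlib.util.source_from_cache(cache_path))  # spam.py
```

Because the cache path is derived from the source path, a .pyc in `__pycache__` is never picked up once its .py is gone, which is exactly the behaviour described above.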
See PEP 3147 for more details: https://www.python.org/dev/peps/pep-3147/ -- Steve From xavier.combelle at gmail.com Sun Aug 14 22:35:32 2016 From: xavier.combelle at gmail.com (Xavier Combelle) Date: Mon, 15 Aug 2016 04:35:32 +0200 Subject: [Python-ideas] add a hash to .pyc to don't mess between .py and .pyc In-Reply-To: References: <1c9f8e60-ddad-fc7d-a162-3f60c9ab8dec@gmail.com> Message-ID: On 15/08/2016 02:45, Wes Turner wrote: > > You can add a `make clean` build step: > > pyclean: > find . -name '*.pyc' -delete > > You can delete all .pyc files > > - $ find . -name '*.pyc' -delete > - http://manpages.ubuntu.com/manpages/precise/man1/pyclean.1.html > #.pyc, .pyo > > You can rebuild all .pyc files (for a given directory): > > - $ python -m compileall -h > - https://docs.python.org/2/library/compileall.html > - https://docs.python.org/3/library/compileall.html > > > > You can, instead of building .pyc, build .pyo > > - https://docs.python.org/2/using/cmdline.html#envvar-PYTHONOPTIMIZE > - https://docs.python.org/2/using/cmdline.html#cmdoption-O > > You can not write .pyc or .pyo w/ PYTHONDONTWRITEBYTECODE / -B > > - https://docs.python.org/2/using/cmdline.html#envvar-PYTHONDONTWRITEBYTECODE > - https://docs.python.org/2/using/cmdline.html#cmdoption-B > - If the files exist though, > - https://docs.python.org/3/reference/import.html > > You can build a PEX (which rebuilds .pyc files) and test/deploy that: > > - https://github.com/pantsbuild/pex#integrating-pex-into-your-workflow > - https://pantsbuild.github.io/python-readme.html#more-about-python-tests > > How .pyc files currently work: > > - http://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html > - https://www.python.org/dev/peps/pep-3147/#flow-chart (*.pyc -> > ./__pycache__) > - http://raulcd.com/how-python-caches-compiled-bytecode.html > > You could add a hash of the .py source file in the header of the > .pyc/.pyo object (as proposed) > > - The overhead of this hashing would be a significant 
performance > regression > - Instead, today, the build step can just pyclean or build a > .zip/.WHL/.PEX which is expected to be a fresh build > The problem is not the option of you have to prevent the problem, the simplest way being to delete the .pyc file, It is easy to do once you spot it. The problem is that it randomly happen in normal workflow. To have an idea of the overhead of the whole hashing procedure I run the following script import sys from time import time from zlib import adler32 as h t2 =time() import decimal print(decimal.__file__) c1 = time()-t2 t1=time() r=h(open(decimal.__file__,'rb').read()) c2= time()-t1 print(c2,c1,c2/c1) decimal was chosen because it was the biggest file of the standard library. on 20 runs, the overhead was always between 1% and 1.5% So yes the overhead on the import process is measurable but very small. By consequence, I would not call it significant. Moreover the import process is only a part (and not the biggest one) of a whole. At the difference of my first mail I now consider only a non cryptographic hash/checksum as the only aim is to prevent accidental unmatch between .pyc and .py file. From mertz at gnosis.cx Sun Aug 14 22:38:34 2016 From: mertz at gnosis.cx (David Mertz) Date: Sun, 14 Aug 2016 19:38:34 -0700 Subject: [Python-ideas] add a hash to .pyc to don't mess between .py and .pyc In-Reply-To: References: <1c9f8e60-ddad-fc7d-a162-3f60c9ab8dec@gmail.com> Message-ID: Given that this issue does not affect modern Python versions, it sounds like time for the OP to upgrade. On Aug 14, 2016 7:36 PM, "Xavier Combelle" wrote: > On 15/08/2016 02:45, Wes Turner wrote: > > > > You can add a `make clean` build step: > > > > pyclean: > > find . -name '*.pyc' -delete > > > > You can delete all .pyc files > > > > - $ find . 
-name '*.pyc' -delete > > - http://manpages.ubuntu.com/manpages/precise/man1/pyclean.1.html > > #.pyc, .pyo > > > > You can rebuild all .pyc files (for a given directory): > > > > - $ python -m compileall -h > > - https://docs.python.org/2/library/compileall.html > > - https://docs.python.org/3/library/compileall.html > > > > > > > > You can, instead of building .pyc, build .pyo > > > > - https://docs.python.org/2/using/cmdline.html#envvar-PYTHONOPTIMIZE > > - https://docs.python.org/2/using/cmdline.html#cmdoption-O > > > > You can not write .pyc or .pyo w/ PYTHONDONTWRITEBYTECODE / -B > > > > - https://docs.python.org/2/using/cmdline.html#envvar- > PYTHONDONTWRITEBYTECODE > > - https://docs.python.org/2/using/cmdline.html#cmdoption-B > > - If the files exist though, > > - https://docs.python.org/3/reference/import.html > > > > You can build a PEX (which rebuilds .pyc files) and test/deploy that: > > > > - https://github.com/pantsbuild/pex#integrating-pex-into-your-workflow > > - https://pantsbuild.github.io/python-readme.html#more-about- > python-tests > > > > How .pyc files currently work: > > > > - http://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html > > - https://www.python.org/dev/peps/pep-3147/#flow-chart (*.pyc -> > > ./__pycache__) > > - http://raulcd.com/how-python-caches-compiled-bytecode.html > > > > You could add a hash of the .py source file in the header of the > > .pyc/.pyo object (as proposed) > > > > - The overhead of this hashing would be a significant performance > > regression > > - Instead, today, the build step can just pyclean or build a > > .zip/.WHL/.PEX which is expected to be a fresh build > > > The problem is not the option of you have to prevent the problem, the > simplest way being > to delete the .pyc file, It is easy to do once you spot it. The problem > is that it randomly happen in > normal workflow. 
> > To have an idea of the overhead of the whole hashing procedure I run the > following script > > import sys > > from time import time > from zlib import adler32 as h > t2 =time() > import decimal > print(decimal.__file__) > c1 = time()-t2 > t1=time() > r=h(open(decimal.__file__,'rb').read()) > c2= time()-t1 > print(c2,c1,c2/c1) > > decimal was chosen because it was the biggest file of the standard library. > on 20 runs, the overhead was always between 1% and 1.5% > So yes the overhead on the import process is measurable but very small. > By consequence, I would not call it significant. > Moreover the import process is only a part (and not the biggest one) of > a whole. > > At the difference of my first mail I now consider only a non > cryptographic hash/checksum > as the only aim is to prevent accidental unmatch between .pyc and .py file. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xavier.combelle at gmail.com Sun Aug 14 22:55:03 2016 From: xavier.combelle at gmail.com (Xavier Combelle) Date: Mon, 15 Aug 2016 04:55:03 +0200 Subject: [Python-ideas] add a hash to .pyc to don't mess between .py and .pyc In-Reply-To: <20160815004931.GS26300@ando.pearwood.info> References: <1c9f8e60-ddad-fc7d-a162-3f60c9ab8dec@gmail.com> <20160815004931.GS26300@ando.pearwood.info> Message-ID: <993695af-cb88-704c-cc08-34e27a0f1157@gmail.com> On 15/08/2016 02:49, Steven D'Aprano wrote: > On Mon, Aug 15, 2016 at 01:05:47AM +0200, Xavier Combelle wrote: >> I have stumbled upon several time with the following problem. >> I delete a module and the .pyc stay around. and by "magic", python still >> use the .pyc > Upgrade to Python 3.2 or better, and the problem will go away. 
> > In 3.2 and above, the .pyc files are stored in a separate __pycache__ > directory, and are only used if the .py file still exists. > Sorry for all the noise. I knew about __pycache__ in recent Python versions, but I totally missed that it also solves the stale .pyc problem. From wes.turner at gmail.com Sun Aug 14 22:56:11 2016 From: wes.turner at gmail.com (Wes Turner) Date: Sun, 14 Aug 2016 21:56:11 -0500 Subject: [Python-ideas] add a hash to .pyc to don't mess between .py and .pyc In-Reply-To: References: <1c9f8e60-ddad-fc7d-a162-3f60c9ab8dec@gmail.com> Message-ID: On Sun, Aug 14, 2016 at 9:35 PM, Xavier Combelle wrote: > On 15/08/2016 02:45, Wes Turner wrote: > > > > You can add a `make clean` build step: > > > > pyclean: > > find . -name '*.pyc' -delete > > > > You can delete all .pyc files > > > > - $ find . -name '*.pyc' -delete > > - http://manpages.ubuntu.com/manpages/precise/man1/pyclean.1.html > > #.pyc, .pyo > > > > You can rebuild all .pyc files (for a given directory): > > > > - $ python -m compileall -h > > - https://docs.python.org/2/library/compileall.html > > - https://docs.python.org/3/library/compileall.html > > > > > > > > You can, instead of building .pyc, build .pyo > > > > - https://docs.python.org/2/using/cmdline.html#envvar-PYTHONOPTIMIZE > > - https://docs.python.org/2/using/cmdline.html#cmdoption-O > > > > You can not write .pyc or .pyo w/ PYTHONDONTWRITEBYTECODE / -B > > > > - https://docs.python.org/2/using/cmdline.html#envvar-PYTHONDONTWRITEBYTECODE > > - https://docs.python.org/2/using/cmdline.html#cmdoption-B > > - If the files exist though, > > - https://docs.python.org/3/reference/import.html > > > > You can build a PEX (which rebuilds .pyc files) and test/deploy that: > > > > - https://github.com/pantsbuild/pex#integrating-pex-into-your-workflow > > - https://pantsbuild.github.io/python-readme.html#more-about-python-tests > > > > How .pyc files currently work: > > > > -
http://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html > > - https://www.python.org/dev/peps/pep-3147/#flow-chart (*.pyc -> > > ./__pycache__) > > - http://raulcd.com/how-python-caches-compiled-bytecode.html > > > > You could add a hash of the .py source file in the header of the > > .pyc/.pyo object (as proposed) > > > > - The overhead of this hashing would be a significant performance > > regression > > - Instead, today, the build step can just pyclean or build a > > .zip/.WHL/.PEX which is expected to be a fresh build > > > The problem is not the option of you have to prevent the problem, the > simplest way being > to delete the .pyc file, It is easy to do once you spot it. The problem > is that it randomly happen in > normal workflow. > IIUC, the timestamp in the .pyc header is designed to prevent this ocurrence? Reasons that the modification timestamp comparison could be off: - Time change - Daylight savings time - NTP drift adjustment? > To have an idea of the overhead of the whole hashing procedure I run the > following script > > import sys > > from time import time > from zlib import adler32 as h > t2 =time() > import decimal > print(decimal.__file__) > c1 = time()-t2 > t1=time() > r=h(open(decimal.__file__,'rb').read()) > c2= time()-t1 > print(c2,c1,c2/c1) > > decimal was chosen because it was the biggest file of the standard library. > on 20 runs, the overhead was always between 1% and 1.5% > So yes the overhead on the import process is measurable but very small. > By consequence, I would not call it significant. > Moreover the import process is only a part (and not the biggest one) of > a whole. > I agree that 1 to 1.5% is not significant. > At the difference of my first mail I now consider only a non > cryptographic hash/checksum > as the only aim is to prevent accidental unmatch between .pyc and .py file. 
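The non-cryptographic checksum scheme Xavier settles on can be sketched in a few lines. The helper names below are hypothetical; a real implementation would live in the import machinery and record the value in the .pyc header at compile time:

```python
import os
import tempfile
from zlib import adler32


def source_checksum(py_path):
    # The value the proposal would store in the .pyc header when compiling.
    with open(py_path, "rb") as f:
        return adler32(f.read())


def pyc_is_stale(py_path, stored_checksum):
    # True when the .py no longer matches the checksum recorded in the .pyc,
    # regardless of file timestamps.
    return source_checksum(py_path) != stored_checksum


# Demo against a throwaway "module".
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("x = 1\n")
    path = f.name
stored = source_checksum(path)      # recorded when the .pyc is written
print(pyc_is_stale(path, stored))   # False: source unchanged
with open(path, "w") as f:
    f.write("x = 2\n")              # source edited after "compilation"
print(pyc_is_stale(path, stored))   # True: mismatch detected
os.unlink(path)
```

Unlike the timestamp comparison, this catches the clock-skew cases Wes lists, at the cost of reading the .py on every import.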
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xavier.combelle at gmail.com Sun Aug 14 23:16:32 2016 From: xavier.combelle at gmail.com (Xavier Combelle) Date: Mon, 15 Aug 2016 05:16:32 +0200 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: <20160813182113.GA1534@phdru.name> Message-ID: <3ee038a6-fb14-6371-8385-335ac84cae4d@gmail.com> On 14/08/2016 16:45, Guido van Rossum wrote: > > A uniform interface works well enough for issue trackers. And the > "freedom of choice" idea doesn't overrule all other concerns. > > Maybe we should just start a python-ideas tracker and see who comes. > There's no reason it couldn't exist in addition to the mailing list. > (Before you scream fragmentation, we have many lists already.) > > --Guido (mobile) > > Does that mean to follow python-ideas, one will need to subscribe to the tracker and the mailing list or will exist a kind of bridge ? From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Aug 15 00:59:39 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Mon, 15 Aug 2016 13:59:39 +0900 Subject: [Python-ideas] Move this thread! [was: From mailing list to GitHub issues] In-Reply-To: <00BD2FEA-3149-41F7-8D12-59B7C9F9CB41@stufft.io> References: <20160813163606.GA28955@phdru.name> <00BD2FEA-3149-41F7-8D12-59B7C9F9CB41@stufft.io> Message-ID: <22449.19515.582961.103290@turnbull.sk.tsukuba.ac.jp> Please move this discussion to Overload-SIG. It's off-topic on -ideas now that the SIG exists. @overload-sig members: et tu? :-( I do greatly appreciate the restraint shown by those who forwarded to the SIG in lieu of reply here. 
:-D The SIG has several experimental channels using different discussion modes, and is prepared to instantiate more. That has the advantage of allowing "head to head" comparison of modes in one (but only one) realistic setting. It also encourages thinking about things like "Mailman with Feature X that Discourse has" (nb, preferring Mailman is my bias, YMMV). It is becoming clear from discussion in that SIG that the responsiveness of the projects providing various communiation platforms to our needs, or our willingness (and ability) to develop an appropriate fork, is likely to be important. https://discuss.python.org/c/overload-sig This is a Discourse forum, hosted by python.org. Register as usual for Discourse. Currently inactive, mostly we've moved to Mailman 3 *for now*. https://mail.python.org/pipermail/overload-sig/ This is a Mailman 2 list, obsoleted by the Mailman 3 list below. Archives only are available at that URL (they haven't yet been ported to the Mailman 3 list). Subscriptions have been moved to Mailman 3 permanently, and I think there's a consensus that Mailman 3 is a big win over Mailman 2 (though there is no consensus on comparison with other modes yet). https://mail.python.org/mm3/mailman3/lists/overload-sig at python.org/ This is a Mailman 3 list. Most likely you're not registered with Mailman 3 yet. Just click the Login button anyway, that will provide a registration link. We currently use Mozilla Persona for registration and authentication. You can use a couple of social auth systems for authentication, or register your email with Persona in the usual "get a token URI in the mail" fashion. GitHub We probably *soon* will create a github project for overload-sig. @Guido: It's *not* a good idea to try this out on -ideas yet (unless we don't care if the experimental animal dies). From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Aug 15 01:05:30 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. 
Turnbull) Date: Mon, 15 Aug 2016 14:05:30 +0900 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> Message-ID: <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> Steve Dower writes: > I plan to use only Unicode to interact with the OS and then utf8 > within Python if the caller wants bytes. This doesn't answer Victor's questions, or mine. This proposal requires identifying and transcoding bytes that represent text in encodings other than UTF-8. 1. How do you propose to identify "bytes that represent text (and might be filenames)" if they did *not* originate in a filesystem or console API? 2. How do you propose to identify the non-UTF-8 encoding, if you have forced all variables signifying bytes encodings to UTF-8? Additional considerations: As far as I can see, this is just a recipe for a different way to get mojibake. *The* way to avoid mojibake is to "let text be text" *internally*. Developers who insist on processing text as bytes are going to get what they deserve *in edge cases*. But mostly (ie, in the mono-encoding environments of most users) it just (barely ;-) works. And there are many use cases where you *can* process bytes that happen to encode text as "just bytes" (eg, low-level networking code). These cases have performance issues if the bytes-text-bytes-text-bytes double-round-trip implied for *stream content* (vs the OS APIs you're concerned with, which effectively round-trip text-bytes-text) is imposed on them. From tjreedy at udel.edu Mon Aug 15 01:21:48 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 15 Aug 2016 01:21:48 -0400 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: Message-ID: On 8/13/2016 1:40 PM, Arek Bulski wrote: > Praise the guide! 
(Guido) > > GitHub issues are also delivered by email, with full post content. > Guido and others will be satisfied. The mailing lists are currently mirrored on news.gmane.org, though that could change. This works great for me. The tracker mailings to my mailbox (only for issues I sign up for) also work for me. GitHub issues would not be mirrored, and my experience with getting email just for the devguide tracker left a negative impression with me. The one plus to me would be the reduction of quoting. -- Terry Jan Reedy From victor.stinner at gmail.com Mon Aug 15 04:31:56 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 15 Aug 2016 10:31:56 +0200 Subject: [Python-ideas] add a hash to .pyc to don't mess between .py and .pyc In-Reply-To: <1c9f8e60-ddad-fc7d-a162-3f60c9ab8dec@gmail.com> References: <1c9f8e60-ddad-fc7d-a162-3f60c9ab8dec@gmail.com> Message-ID: The purpose of .pyc is to optimize Python. With your proposed change, the number of syscalls is doubled (open, read, close) and you add extra work (computing the hash) when the .pyc is used. If your filesystem works correctly, you should not have to bother. Victor Le 15 août 2016 01:06, "Xavier Combelle" a écrit : > I have stumbled upon several time with the following problem. > I delete a module and the .pyc stay around. and by "magic", python still > use the .pyc > A similar error happen (but less often) when by some file system > manipulation the .pyc happen to be > newer than the .py but correspond to an older version of .py. It is not > a major problem but it is still an existing problem. > > I'm not the first one to have this problem. A stack overflow search lead > to quite a lot of relevant answers > http://stackoverflow.com/search?q=old+pyc and google search too > https://www.google.fr/search?q=old+pyc > moreover several result of google result in bug tracking of various > project.
(There is also in these result the fact that .pyc > are stored in VCS repositories but this is another problem not related) > I even found a blog post using .pyc as a backdoor > http://secureallthethings.blogspot.fr/2015/11/ > backdooring-python-via-pyc-pi-wa-si_9.html > > My idea to kill both bird in one stone would be to add a hash (likely to > be cryptographic) of the .py file in the .pyc file and read the .py file > and check the hash > The additional cost of first startup cost will be just the hash > calculation which I think is cheap comparing to other factors > (especially input output) > The additional second startup cost of a program the main cost will be > the additional read of .py files and the cheap hash calculations. > > I believe the removing of the bugs would worth the performance cost. > > I know that some use case makes a use of just using .pyc and not keeping > .py around, for example by not distribute the source file. > But in my vision, this uses case should be solved per opt-in decision > and not as a default. Several opt-in mechanisms could be envisioned: > environment variables, command line switches, special compilation of > .pyc which explicitly ask to not check for the hash. > > -- > Xavier > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arek.bulski at gmail.com Mon Aug 15 04:34:48 2016 From: arek.bulski at gmail.com (Arek Bulski) Date: Mon, 15 Aug 2016 10:34:48 +0200 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com> <57b03025.a913190a.6f4c1.a901@mx.google.com> Message-ID: But we do not care if the experimental animal dies, that is the point of doing the experiment. 
I registered at Discuss and kinda like it. Then I tried to create a new thread and my Android keyboard shows over the fields. Discuss, as it is now, doesn't work for mobile. 14 sie 2016 9:35 PM "Arek Bulski" napisał(a): > As i think Donald pointed it out, it doesnt take a laptop to contribute. > Did you all notice that Guido replied from a phone? > > Currently half of the mailing list mail is large auto quotes or subject > date info. Lines are never broken the way they should be. Who wants to keep > their mail toolchains, keep it. Dont make the rest of us put up with this > shit. > > There is no definition of fails just As i dont have a definition of > consensus. People will stick to it or not. No voting, just participation. > > 14 sie 2016 1:57 PM "Arek Bulski" napisał(a): > > I throw a proposal on the table: lets create a "python-ideas" repo under > "python" account on GitHub and move this and only this thread onto it. If > it fails, nothing but this thread is lost (not persisted in the mailing > list) which would make no difference anyway. People made many points that > are purely abstract. We need some hands on experience to see if it works > for us or doesnt. > > Guido, could you create a repo for us? > > > From arek.bulski at gmail.com Mon Aug 15 07:05:26 2016 From: arek.bulski at gmail.com (Arek Bulski) Date: Mon, 15 Aug 2016 13:05:26 +0200 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com> <57b03025.a913190a.6f4c1.a901@mx.google.com> Message-ID: For those talking abstract points, here is a screenshot of GitHub. Works like a charm on mobile. https://s3.postimg.org/e30lc3tk3/Screenshot_2016_08_15_13_00_59_com_browser_inter.png 15 sie 2016 10:34 AM "Arek Bulski" napisał(a): > But we do not care if the experimental animal dies, that is the point of > doing the experiment.
> > I registered at Discuss and kina like it. Then tried to create a new > thread and my Android keyboard shows over the fields. Discuss As it is now > doesnt work for mobile. > > 14 sie 2016 9:35 PM "Arek Bulski" napisa?(a): > >> As i think Donald pointed it out, it doesnt take a laptop to contribute. >> Did you all notice that Guido replied from a phone? >> >> Currently half of the mailing list mail is large auto quotes or subject >> date info. Lines are never broken the way they should be. Who wants to keep >> their mail toolchains, keep it. Dont make the rest of us put up with this >> shit. >> >> There is no definition of fails just As i dont have a definition of >> consensus. People will stick to it or not. No voting, just participation. >> >> 14 sie 2016 1:57 PM "Arek Bulski" napisa?(a): >> >> ?I throw a proposal on the table: lets create a "python-ideas" repo under >> "python" account on GitHub and move this and only this thread onto it. If >> it fails, nothing but this thread is lost (not persisted in the mailing >> list) which would make no difference anyway. People made many points that >> are purely abstract. We need some hands on experience to see if it works >> for us or doesnt. >> >> Guido, could you create a repo for us?? >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From arek.bulski at gmail.com Mon Aug 15 08:39:17 2016 From: arek.bulski at gmail.com (Arek Bulski) Date: Mon, 15 Aug 2016 14:39:17 +0200 Subject: [Python-ideas] From mailing list to GitHub issues In-Reply-To: <22449.43239.376984.533729@turnbull.sk.tsukuba.ac.jp> References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com> <57b03025.a913190a.6f4c1.a901@mx.google.com> <22449.43239.376984.533729@turnbull.sk.tsukuba.ac.jp> Message-ID: One person saying "this thread doesnt belong here" doesnt make it so. I have met too often in my life with a situation where people were chasing others away because they simply didnt like the particular topic. 
This thread still has its participants. You are replying on it as well. And until Discuss is no longer broken, no posting for me there. Will file a bug report though, as suggested. Sorry about the etiquette, will do better. I meant that people not using mailing lists are a considerable group and I was asking on their behalf. And I shall add "thou shalt not ask Guido directly" to the commandments. The repo should be under the official python account, I don't see much sense in starting a private one. But fine, there you go. We can migrate it later if you want. https://github.com/arekbulski/python-ideas/issues/1 From steve.dower at python.org Mon Aug 15 09:23:44 2016 From: steve.dower at python.org (Steve Dower) Date: Mon, 15 Aug 2016 06:23:44 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> Message-ID: I guess I'm not sure what your question is then. Using text internally is of course the best way to deal with it. But for those who insist on using bytes, this change at least makes Windows a feasible target without requiring manual encoding/decoding at every boundary. Top-posted from my Windows Phone -----Original Message----- From: "Stephen J. Turnbull" Sent: 8/14/2016 22:06 To: "Steve Dower" Cc: "Victor Stinner" ; "python-ideas" ; "Random832" Subject: RE: [Python-ideas] Fix default encodings on Windows Steve Dower writes: > I plan to use only Unicode to interact with the OS and then utf8 > within Python if the caller wants bytes. This doesn't answer Victor's questions, or mine.
This proposal requires identifying and transcoding bytes that represent text in encodings other than UTF-8. 1. How do you propose to identify "bytes that represent text (and might be filenames)" if they did *not* originate in a filesystem or console API? 2. How do you propose to identify the non-UTF-8 encoding, if you have forced all variables signifying bytes encodings to UTF-8? Additional considerations: As far as I can see, this is just a recipe for a different way to get mojibake. *The* way to avoid mojibake is to "let text be text" *internally*. Developers who insist on processing text as bytes are going to get what they deserve *in edge cases*. But mostly (ie, in the mono-encoding environments of most users) it just (barely ;-) works. And there are many use cases where you *can* process bytes that happen to encode text as "just bytes" (eg, low-level networking code). These cases have performance issues if the bytes-text-bytes-text-bytes double-round-trip implied for *stream content* (vs the OS APIs you're concerned with, which effectively round-trip text-bytes-text) is imposed on them. -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Mon Aug 15 09:41:08 2016 From: random832 at fastmail.com (Random832) Date: Mon, 15 Aug 2016 09:41:08 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> Message-ID: <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> On Mon, Aug 15, 2016, at 09:23, Steve Dower wrote: > I guess I'm not sure what your question is then. > > Using text internally is of course the best way to deal with it. 
But for > those who insist on using bytes, this change at least makes Windows a > feasible target without requiring manual encoding/decoding at every > boundary. Why isn't it already? What's "not feasible" about requiring manual encoding/decoding? Basically your assumption is that people using Python on windows and having to deal with files that contain filename data encoded as bytes are more likely to be dealing with data that is either UTF-8 anyway (coming from Linux or some other platform) or came from the current version of Python (which will encode things in UTF-8 under the change) than they are to deal with data that came from other Windows programs that encoded things in the codepage used by them and by other Windows users in the same country / who speak the same language. From steve.dower at python.org Mon Aug 15 12:35:18 2016 From: steve.dower at python.org (Steve Dower) Date: Mon, 15 Aug 2016 09:35:18 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> Message-ID: I'm still not sure we're talking about the same thing right now. For `open(path_as_bytes).read()`, are we talking about the way path_as_bytes is passed to the file system? Or the codec used to decide the returned string? Top-posted from my Windows Phone -----Original Message----- From: "Random832" Sent: ?8/?15/?2016 6:41 To: "Steve Dower" ; "Stephen J. Turnbull" Cc: "Victor Stinner" ; "python-ideas" Subject: Re: [Python-ideas] Fix default encodings on Windows On Mon, Aug 15, 2016, at 09:23, Steve Dower wrote: > I guess I'm not sure what your question is then. 
> > Using text internally is of course the best way to deal with it. But for > those who insist on using bytes, this change at least makes Windows a > feasible target without requiring manual encoding/decoding at every > boundary. Why isn't it already? What's "not feasible" about requiring manual encoding/decoding? Basically your assumption is that people using Python on windows and having to deal with files that contain filename data encoded as bytes are more likely to be dealing with data that is either UTF-8 anyway (coming from Linux or some other platform) or came from the current version of Python (which will encode things in UTF-8 under the change) than they are to deal with data that came from other Windows programs that encoded things in the codepage used by them and by other Windows users in the same country / who speak the same language. -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Mon Aug 15 12:54:03 2016 From: random832 at fastmail.com (Random832) Date: Mon, 15 Aug 2016 12:54:03 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> Message-ID: <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> On Mon, Aug 15, 2016, at 12:35, Steve Dower wrote: > I'm still not sure we're talking about the same thing right now. > > For `open(path_as_bytes).read()`, are we talking about the way > path_as_bytes is passed to the file system? Or the codec used to decide > the returned string? 
We are talking about the way path_as_bytes is passed to the filesystem, and in particular what encoding path_as_bytes is *actually* in, when it was obtained from a file or other stream opened in binary mode. From steve.dower at python.org Mon Aug 15 14:26:34 2016 From: steve.dower at python.org (Steve Dower) Date: Mon, 15 Aug 2016 11:26:34 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> Message-ID: <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> On 15Aug2016 0954, Random832 wrote: > On Mon, Aug 15, 2016, at 12:35, Steve Dower wrote: >> I'm still not sure we're talking about the same thing right now. >> >> For `open(path_as_bytes).read()`, are we talking about the way >> path_as_bytes is passed to the file system? Or the codec used to decide >> the returned string? > > We are talking about the way path_as_bytes is passed to the filesystem, > and in particular what encoding path_as_bytes is *actually* in, when it > was obtained from a file or other stream opened in binary mode. Okay good, we are talking about the same thing. Passing path_as_bytes in that location has been deprecated since 3.3, so we are well within our rights (and probably overdue) to make it a TypeError in 3.6. While it's obviously an invalid assumption, for the purposes of changing the language we can assume that no existing code is passing bytes into any functions where it has been deprecated. As far as I'm concerned, there are currently no filesystem APIs on Windows that accept paths as bytes. 
Given that, I'm proposing adding support for using byte strings encoded with UTF-8 in file system functions on Windows. This allows Python users to omit switching code like:

if os.name == 'nt':
    f = os.stat(os.listdir('.')[-1])
else:
    f = os.stat(os.listdir(b'.')[-1])

Or simply using the bytes variant unconditionally because they heard it was faster (sacrificing cross-platform correctness, since it may not correctly round-trip on Windows). My proposal is to remove all use of the *A APIs and only use the *W APIs. That completely removes the (already deprecated) use of bytes as paths. I then propose to change the (unused on Windows) sys.getfsdefaultencoding() to 'utf-8' and handle bytes being passed into filesystem functions by transcoding into UTF-16 and calling the *W APIs. This completely removes the active codepage from the chain, allows paths returned from the filesystem to correctly roundtrip via bytes in Python, and allows those bytes paths to be manipulated at '\' characters. (Frankly I don't mind what encoding we use, and I'd be quite happy to force bytes paths to be UTF-16-LE encoded, which would also round-trip invalid surrogate pairs. But that would prevent basic manipulation which seems to be a higher priority.) This does not allow you to take bytes from an arbitrary source and assume that they are correctly encoded for the file system. Python 3.3, 3.4 and 3.5 have been warning that doing that is deprecated and the path needs to be decoded to a known encoding first. At this stage, it's time for us to either make byte paths an error, or to specify a suitable encoding that can correctly round-trip paths. If this does not answer the question, I'm going to need the question to be explained more clearly for me.
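A minimal sketch of the round-trip property at stake here, with cp1252 standing in for an arbitrary ANSI code page (the filename is hypothetical, not from the thread):

```python
# Why UTF-8 can round-trip any Windows filename while an ANSI code page
# cannot: UTF-8 covers all of Unicode, a legacy code page does not.
name = "resum\u00e9-\u65e5\u672c\u8a9e.txt"  # hypothetical non-ANSI filename

# UTF-8 round-trips losslessly:
assert name.encode("utf-8").decode("utf-8") == name

# cp1252 (standing in for CP_ACP) cannot even represent the Japanese
# characters, so strict encoding fails instead of substituting characters:
try:
    name.encode("cp1252")
except UnicodeEncodeError as exc:
    print("lossy:", exc.reason)  # character maps to <undefined>
```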
Cheers, Steve From steve.dower at python.org Mon Aug 15 14:38:00 2016 From: steve.dower at python.org (Steve Dower) Date: Mon, 15 Aug 2016 11:38:00 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> Message-ID: <100ef43f-f710-f97e-25ae-07963ab762c5@python.org> On 15Aug2016 1126, Steve Dower wrote: > My proposal is to remove all use of the *A APIs and only use the *W > APIs. That completely removes the (already deprecated) use of bytes as > paths. I then propose to change the (unused on Windows) > sys.getfsdefaultencoding() to 'utf-8' and handle bytes being passed into > filesystem functions by transcoding into UTF-16 and calling the *W APIs. Of course, I meant sys.getfilesystemencoding() here. The C functions have "FSDefault" in many of the names, which is why I guessed the wrong Python variant. Cheers, Steve From barry at python.org Mon Aug 15 17:03:05 2016 From: barry at python.org (Barry Warsaw) Date: Mon, 15 Aug 2016 17:03:05 -0400 Subject: [Python-ideas] Digests (was Re: From mailing list to GitHub issues) References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com> Message-ID: <20160815170305.0b12ff6d@anarchist.wooz.org> On Aug 14, 2016, at 02:01 PM, Chris Angelico wrote: >The biggest problem I'm seeing is with digests. Can that feature be >flagged off as "DO NOT USE THIS UNLESS YOU KNOW WHAT YOU ARE ASKING >FOR"? So many people seem to select digest mode, then get extremely >confused by it. 
Yes, we can turn off digests for python-ideas, or any Mailman mailing list. I was tempted to JFDI, but it would mean that ~25% of list members would no longer get messages. That's because 254 out of 979 members are currently receiving digests. Let's give people a grace period, say of one week. You have until Monday 22-Aug-2016 to switch to non-digest delivery or read the mailing list through some other outlet (e.g. Gmane's NNTP interface) if you still want to get messages for python-ideas. Cheers, -Barry P.S. I am refraining from responding to other topics in this thread, since I think the proper place to do that is overload-sig. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From eryksun at gmail.com Mon Aug 15 21:19:03 2016 From: eryksun at gmail.com (eryk sun) Date: Tue, 16 Aug 2016 01:19:03 +0000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> Message-ID: On Mon, Aug 15, 2016 at 6:26 PM, Steve Dower wrote: > > (Frankly I don't mind what encoding we use, and I'd be quite happy to force bytes > paths to be UTF-16-LE encoded, which would also round-trip invalid surrogate > pairs. But that would prevent basic manipulation which seems to be a higher > priority.) The CRT manually decodes and encodes using the private functions __acrt_copy_path_to_wide_string and __acrt_copy_to_char. 
These use either the ANSI or OEM codepage, depending on the value returned by WinAPI AreFileApisANSI. CPython could follow suit. Doing its own encoding and decoding would enable using filesystem functions that will never get an [A]NSI version (e.g. GetFileInformationByHandleEx), while still retaining backward compatibility. Filesystem encoding could use WC_NO_BEST_FIT_CHARS and raise a warning when lpUsedDefaultChar is true. Filesystem decoding could use MB_ERR_INVALID_CHARS and raise a warning and retry without this flag for ERROR_NO_UNICODE_TRANSLATION (e.g. an invalid DBCS sequence). This could be implemented with a new "warning" handler for PyUnicode_EncodeCodePage and PyUnicode_DecodeCodePageStateful. A new 'fsmbcs' encoding could be added that checks AreFileApisANSI to choose between CP_ACP and CP_OEMCP. From chris.barker at noaa.gov Mon Aug 15 21:34:59 2016 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 15 Aug 2016 18:34:59 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> Message-ID: <8087808059974504431@unknownmsgid> > Given that, I'm proposing adding support for using byte strings encoded with UTF-8 in file system functions on Windows. This allows Python users to omit switching code like:
>
> if os.name == 'nt':
> f = os.stat(os.listdir('.')[-1])
> else:
> f = os.stat(os.listdir(b'.')[-1])

REALLY? Do we really want to encourage using bytes as paths?
IIUC, anyone that wants to platform-independentify that code just needs to use proper strings (or pathlib) for paths everywhere, yes? I understand that pre-surrogate-escape, there was a need for bytes paths, but those days are gone, yes? So why, at this late date, kludge what should be a deprecated pattern into the Windows build??? -CHB > My proposal is to remove all use of the *A APIs and only use the *W APIs. That completely removes the (already deprecated) use of bytes as paths. Yes, this is good. > I then propose to change the (unused on Windows) sys.getfsdefaultencoding() to 'utf-8' and handle bytes being passed into filesystem functions by transcoding into UTF-16 and calling the *W APIs. I'm really not sure utf-8 is magic enough to do this. Where do you imagine that utf-8 is coming from as bytes??? AIUI, while utf-8 is almost universal in *nix for file system names, folks do not want to count on it -- hence the use of bytes. And it is far less prevalent in the Windows world... > , allows paths returned from the filesystem to correctly roundtrip via bytes in Python, That you could do with native bytes (UTF-16, yes?) > . But that would prevent basic manipulation which seems to be a higher priority.) Still think Unicode is the answer to that... > At this stage, it's time for us to either make byte paths an error, +1.
:-) CHB From steve.dower at python.org Mon Aug 15 21:39:35 2016 From: steve.dower at python.org (Steve Dower) Date: Mon, 15 Aug 2016 18:39:35 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> Message-ID: <57B26ED7.5000107@python.org> On 15Aug2016 1819, eryk sun wrote: > On Mon, Aug 15, 2016 at 6:26 PM, Steve Dower wrote: >> >> (Frankly I don't mind what encoding we use, and I'd be quite happy to force bytes >> paths to be UTF-16-LE encoded, which would also round-trip invalid surrogate >> pairs. But that would prevent basic manipulation which seems to be a higher >> priority.) > > The CRT manually decodes and encodes using the private functions > __acrt_copy_path_to_wide_string and __acrt_copy_to_char. These use > either the ANSI or OEM codepage, depending on the value returned by > WinAPI AreFileApisANSI. CPython could follow suit. Doing its own > encoding and decoding would enable using filesystem functions that > will never get an [A]NSI version (e.g. GetFileInformationByHandleEx), > while still retaining backward compatibility. > > Filesystem encoding could use WC_NO_BEST_FIT_CHARS and raise a warning > when lpUsedDefaultChar is true. Filesystem decoding could use > MB_ERR_INVALID_CHARS and raise a warning and retry without this flag > for ERROR_NO_UNICODE_TRANSLATION (e.g. an invalid DBCS sequence). This > could be implemented with a new "warning" handler for > PyUnicode_EncodeCodePage and PyUnicode_DecodeCodePageStateful. 
A new > 'fsmbcs' encoding could be added that checks AreFileApisANSI to choose > betwen CP_ACP and CP_OEMCP. None of that makes it less complicated or more reliable. Warnings based on values are bad (they should be based on types) and using the *W APIs exclusively is the right way to go. The question then is whether we allow file system functions to return bytes, and if so, which encoding to use. This then directly informs what the functions accept, for the purposes of round-tripping. *Any* encoding that may silently lose data is a problem, which basically leaves utf-16 as the only option. However, as that causes other problems, maybe we can accept the tradeoff of returning utf-8 and failing when a path contains invalid surrogate pairs (which is extremely rare by comparison to characters outside of CP_ACP)? If utf-8 is unacceptable, we're back to the current situation and should be removing the support for bytes that was deprecated three versions ago. Cheers, Steve From ncoghlan at gmail.com Mon Aug 15 22:00:00 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 16 Aug 2016 12:00:00 +1000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <8087808059974504431@unknownmsgid> References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> Message-ID: On 16 August 2016 at 11:34, Chris Barker - NOAA Federal wrote: >> Given that, I'm proposing adding support for using byte strings encoded with UTF-8 in file system functions on Windows. 
This allows Python users to omit switching code like: >> >> if os.name == 'nt': >> f = os.stat(os.listdir('.')[-1]) >> else: >> f = os.stat(os.listdir(b'.')[-1]) > > REALLY? Do we really want to encourage using bytes as paths? IIUC, > anyone that wants to platform-independentify that code just needs to > use proper strings (or pat glib) for paths everywhere, yes? The problem is that bytes-as-paths actually *does* work for Mac OS X and systemd based Linux distros properly configured to use UTF-8 for OS interactions. This means that a lot of backend network service code makes that assumption, especially when it was originally written for Python 2, and rather than making it work properly on Windows, folks just drop Windows support as part of migrating to Python 3. At an ecosystem level, that means we're faced with a choice between implicitly encouraging folks to make their code *nix only, and finding a way to provide a more *nix like experience when running on Windows (where UTF-8 encoded binary data just works, and either other encodings lead to mojibake or else you use chardet to figure things out). Steve is suggesting that the latter option is preferable, a view I agree with since it lowers barriers to entry for Windows based developers to contribute to primarily *nix focused projects. > I understand that pre-surrogate-escape, there was a need for bytes > paths, but those days are gone, yes? No, UTF-8 encoded bytes are still the native language of network service development: http://utf8everywhere.org/ It also helps with cases where folks are switching back and forth between Python and other environments like JavaScript and Go where the UTF-8 assumption is more prevalent. > So why, at this late date, kludge what should be a deprecated pattern > into the Windows build??? 
Promoting cross-platform consistency often leads to enabling patterns that are considered a bad idea from a native platform perspective, and this strikes me as an example of that (just as the binary/text separation itself is a case where Python 3 diverged from the POSIX text model to improve consistency across *nix, Windows, JVM and CLR environments). Cheers, Nick. From eryksun at gmail.com Tue Aug 16 02:06:10 2016 From: eryksun at gmail.com (eryk sun) Date: Tue, 16 Aug 2016 06:06:10 +0000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <57B26ED7.5000107@python.org> References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <57B26ED7.5000107@python.org> Message-ID: >> On Mon, Aug 15, 2016 at 6:26 PM, Steve Dower >> wrote: > > and using the *W APIs exclusively is the right way to go. My proposal was to use the wide-character APIs, but transcoding CP_ACP without best-fit characters and raising a warning whenever the default character is used (e.g. substituting Katakana middle dot when creating a file using a bytes path that has an invalid sequence in CP932). This proposal was in response to the case made by Stephen Turnbull. If using UTF-8 is getting such heavy pushback, I thought half a solution was better than nothing, and it also sets up the infrastructure to easily switch to UTF-8 if that idea eventually gains acceptance. It could raise exceptions instead of warnings if that's preferred, since bytes paths on Windows are already deprecated. > *Any* encoding that may silently lose data is a problem, which basically > leaves utf-16 as the only option. 
However, as that causes other problems, > maybe we can accept the tradeoff of returning utf-8 and failing when a > path contains invalid surrogate pairs Are there any common sources of illegal UTF-16 surrogates in Windows filenames? I see that WTF-8 (Wobbly) was developed to handle this problem. A WTF-8 path would roundtrip back to the filesystem, but it should only be used internally in a program. From victor.stinner at gmail.com Tue Aug 16 06:28:58 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 16 Aug 2016 12:28:58 +0200 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <57B26ED7.5000107@python.org> Message-ID: 2016-08-16 8:06 GMT+02:00 eryk sun : > My proposal was to use the wide-character APIs, but transcoding CP_ACP > without best-fit characters and raising a warning whenever the default > character is used (e.g. substituting Katakana middle dot when creating > a file using a bytes path that has an invalid sequence in CP932). A problem with all these proposal is that they *add* new code to the CPython code base, code specific to Windows. There are very few core developers (1 or 2?) who work on this code specific to Windows. I would prefer to *drop* code specific to Windows rather that *adding* (or changing) code specific to Windows, just to make the CPython code base simpler to maintain. It's already annoying enough. It's common that a Python function has one implementation for all platforms except Windows, and a second implementation specific to Windows. 
An example: os.listdir() * ~150 lines of C code for the Windows implementation * ~100 lines of C code for the UNIX/BSD implementation * The Windows implementation is splitted in two parts: Unicode and bytes, so the code is basically duplicated (2 versions) If you remove the bytes support, the Windows function is reduced to 100 lines (-50). I'm not sure that modifying the API using byte would solve any issue on Windows, and there is an obvious risk of regression (mojibake when you concatenerate strings encoded to UTF-8 and to ANSI code page). I'm in favor of forcing developers to use Unicode on Windows, which is the correct way to use the Windows API. The side effect is that such code works perfectly well on UNIX/BSD ;-) To be clear: drop the deprecated code to support bytes on Windows. I already proposed to drop bytes support on Windows and most answers were "please keep them", so another option is to keep the "broken code" as the status quo... I really hate APIs using bytes on Windows because they use WideCharToMultiByte() (encode unicode to bytes) in a mode which is likely to lead to mojibake: unencodable characters are replaced with "best fit characters" or "?". https://unicodebook.readthedocs.io/operating_systems.html#encode-and-decode-functions In a perfect world, I would also propose to deprecate bytes filenames on UNIX, but I expect an insane flamewar on the definition of "UNIX", history of UNIX, etc. (non technical discussion, since Unicode works very well on Python 3...). 
Victor From p.f.moore at gmail.com Tue Aug 16 06:53:12 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 16 Aug 2016 11:53:12 +0100 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> Message-ID: On 15 August 2016 at 19:26, Steve Dower wrote: > Passing path_as_bytes in that location has been deprecated since 3.3, so we > are well within our rights (and probably overdue) to make it a TypeError in > 3.6. While it's obviously an invalid assumption, for the purposes of > changing the language we can assume that no existing code is passing bytes > into any functions where it has been deprecated. > > As far as I'm concerned, there are currently no filesystem APIs on Windows > that accept paths as bytes. [...] On 16 August 2016 at 03:00, Nick Coghlan wrote: > The problem is that bytes-as-paths actually *does* work for Mac OS X > and systemd based Linux distros properly configured to use UTF-8 for > OS interactions. This means that a lot of backend network service code > makes that assumption, especially when it was originally written for > Python 2, and rather than making it work properly on Windows, folks > just drop Windows support as part of migrating to Python 3. 
> > At an ecosystem level, that means we're faced with a choice between > implicitly encouraging folks to make their code *nix only, and finding > a way to provide a more *nix like experience when running on Windows > (where UTF-8 encoded binary data just works, and either other > encodings lead to mojibake or else you use chardet to figure things > out). > > Steve is suggesting that the latter option is preferable, a view I > agree with since it lowers barriers to entry for Windows based > developers to contribute to primarily *nix focused projects. So does this mean that you're recommending reverting the deprecation of bytes as paths in favour of documenting that bytes as paths is acceptable, but it will require an encoding of UTF-8 rather than the current behaviour? If so, that raises some questions: 1. Is it OK to backtrack on a deprecation by changing the behaviour like this? (I think it is, but others who rely on the current, deprecated, behaviour may not). 2. Should we be making "always UTF-8" the behaviour on all platforms, rather than just Windows (e.g., Unix systems which haven't got UTF-8 as their locale setting)? This doesn't seem to be a Windows-specific question any more (I'm assuming that if bytes-as-paths are deprecated, that's a cross-platform change, but see below). Having said all this, I can't find the documentation stating that bytes paths are deprecated - the open() documentation for 3.5 says "file is either a string or bytes object giving the pathname (absolute or relative to the current working directory) of the file to be opened or an integer file descriptor of the file to be wrapped" and there's no mention of a deprecation. Steve - could you provide a reference? 
Paul From eryksun at gmail.com Tue Aug 16 09:09:32 2016 From: eryksun at gmail.com (eryk sun) Date: Tue, 16 Aug 2016 13:09:32 +0000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> Message-ID: On Tue, Aug 16, 2016 at 10:53 AM, Paul Moore wrote: > > Having said all this, I can't find the documentation stating that > bytes paths are deprecated - the open() documentation for 3.5 says > "file is either a string or bytes object giving the pathname (absolute > or relative to the current working directory) of the file to be opened > or an integer file descriptor of the file to be wrapped" and there's > no mention of a deprecation. Bytes paths aren't deprecated on Unix -- only on Windows, and only for the os functions. You can see the deprecation warning with -Wall: >>> os.listdir(b'.') __main__:1: DeprecationWarning: The Windows bytes API has been deprecated, use Unicode filenames instead AFAIK this isn't documented. Since the Windows CRT's _open implementation uses MultiByteToWideChar without the flag MB_ERR_INVALID_CHARS, bytes paths should also be deprecated for io.open. The problem is that bad DBCS sequences are mapped silently to the default Unicode character instead of raising an error. From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Aug 16 09:49:05 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. 
Turnbull) Date: Tue, 16 Aug 2016 22:49:05 +0900 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> Message-ID: <22451.6609.54145.916305@turnbull.sk.tsukuba.ac.jp> Nick Coghlan writes: > At an ecosystem level, that means we're faced with a choice between > implicitly encouraging folks to make their code *nix only, and > finding a way to provide a more *nix like experience when running > on Windows (where UTF-8 encoded binary data just works, and either > other encodings lead to mojibake or else you use chardet to figure > things out). Most of the time we do know what the encoding is, we can just ask Windows (although Steve proposes to make Python fib about that, we could add other APIs). This change means that programs that until now could be encoding- agnostic and just pass around bytes on Windows, counting on Python to consistently convert those to the appropriate form for the API, can't do that any more. They have to find out what the encoding is, and transcode to UTF-8, or rewrite to do their processing as text. This is a potential burden on existing user code. I suppose that there are such programs, for the same reasons that networking programs tend to use bytes I/O: ports from Python 2, an (misplaced?) emphasis on performance, etc. > Steve is suggesting that the latter option is preferable, a view I > agree with since it lowers barriers to entry for Windows based > developers to contribute to primarily *nix focused projects. 
Sure, but do you have any idea what the costs might be? Aside from the code burden mentioned above, there's a reputational issue. Just yesterday I was having a (good-natured) Perl vs. Python discussion on my LUG ML, and two developers volunteered that they avoid Python because "the Python developers frequently break backward compatibility". These memes tend to go off on their own anyway, but this change will really feed that one. > Promoting cross-platform consistency often leads to enabling > patterns that are considered a bad idea from a native platform > perspective, and this strikes me as an example of that (just as the > binary/text separation itself is a case where Python 3 diverged > from the POSIX text model to improve consistency across *nix, > Windows, JVM and CLR environments). I would say rather Python 3 chose an across-the-board better, more robust model supporting internationalization and multilingualization properly. POSIX's text model is suitable at best for a fragile localization. This change, OTOH, is a step backward we wouldn't consider except for the intended effect on ease of writing networking code. That's important, but I really don't think that's going to be the only major effect, and I fear it won't be the most important effect. Of course that's FUD -- I have no data on potential burden to existing use cases, or harm to reputation. But neither do you and Steve. 
:-( From steve.dower at python.org Tue Aug 16 09:59:00 2016 From: steve.dower at python.org (Steve Dower) Date: Tue, 16 Aug 2016 06:59:00 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> Message-ID: Hmm, doesn't seem to be explicitly listed as a deprecation, though discussion from around that time makes it clear that everyone thought it was. I also found this proposal to use strict mbcs to decode bytes for use against the file system, which is basically the same as what I'm proposing now apart from the more limited encoding: https://mail.python.org/pipermail/python-dev/2011-October/114203.html It definitely results in less C code to maintain if we do the decode ourselves. We could use strict mbcs, but I'd leave the deprecation warnings in there. Or perhaps we provide an env var to use mbcs as the file system encoding but default to utf8 (I already have one for selecting legacy console encoding)? Callers should be asking the sys module for the encoding anyway, so I'd expect few libraries to be impacted, though applications might prefer it. Top-posted from my Windows Phone -----Original Message----- From: "Paul Moore" Sent: 8/16/2016 3:54 To: "Nick Coghlan" Cc: "python-ideas" Subject: Re: [Python-ideas] Fix default encodings on Windows On 15 August 2016 at 19:26, Steve Dower wrote: > Passing path_as_bytes in that location has been deprecated since 3.3, so we > are well within our rights (and probably overdue) to make it a TypeError in > 3.6.
While it's obviously an invalid assumption, for the purposes of > changing the language we can assume that no existing code is passing bytes > into any functions where it has been deprecated. > > As far as I'm concerned, there are currently no filesystem APIs on Windows > that accept paths as bytes. [...] On 16 August 2016 at 03:00, Nick Coghlan wrote: > The problem is that bytes-as-paths actually *does* work for Mac OS X > and systemd based Linux distros properly configured to use UTF-8 for > OS interactions. This means that a lot of backend network service code > makes that assumption, especially when it was originally written for > Python 2, and rather than making it work properly on Windows, folks > just drop Windows support as part of migrating to Python 3. > > At an ecosystem level, that means we're faced with a choice between > implicitly encouraging folks to make their code *nix only, and finding > a way to provide a more *nix like experience when running on Windows > (where UTF-8 encoded binary data just works, and either other > encodings lead to mojibake or else you use chardet to figure things > out). > > Steve is suggesting that the latter option is preferable, a view I > agree with since it lowers barriers to entry for Windows based > developers to contribute to primarily *nix focused projects. So does this mean that you're recommending reverting the deprecation of bytes as paths in favour of documenting that bytes as paths is acceptable, but it will require an encoding of UTF-8 rather than the current behaviour? If so, that raises some questions: 1. Is it OK to backtrack on a deprecation by changing the behaviour like this? (I think it is, but others who rely on the current, deprecated, behaviour may not). 2. Should we be making "always UTF-8" the behaviour on all platforms, rather than just Windows (e.g., Unix systems which haven't got UTF-8 as their locale setting)? 
This doesn't seem to be a Windows-specific question any more (I'm assuming that if bytes-as-paths are deprecated, that's a cross-platform change, but see below). Having said all this, I can't find the documentation stating that bytes paths are deprecated - the open() documentation for 3.5 says "file is either a string or bytes object giving the pathname (absolute or relative to the current working directory) of the file to be opened or an integer file descriptor of the file to be wrapped" and there's no mention of a deprecation. Steve - could you provide a reference? Paul _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Tue Aug 16 09:59:36 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 16 Aug 2016 14:59:36 +0100 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> Message-ID: On 16 August 2016 at 14:09, eryk sun wrote: > On Tue, Aug 16, 2016 at 10:53 AM, Paul Moore wrote: >> >> Having said all this, I can't find the documentation stating that >> bytes paths are deprecated - the open() documentation for 3.5 says >> "file is either a string or bytes object giving the pathname (absolute >> or relative to the current working directory) of the file to be opened >> or an integer file descriptor of 
the file to be wrapped" and there's >> no mention of a deprecation. > > Bytes paths aren't deprecated on Unix -- only on Windows, and only for > the os functions. You can see the deprecation warning with -Wall: > > >>> os.listdir(b'.') > __main__:1: DeprecationWarning: The Windows bytes API has been > deprecated, use Unicode filenames instead Thanks. So this remains a Windows-only issue (which is good). > AFAIK this isn't documented. It probably should be. Although if we're changing the deprecation to a behaviour change, then maybe there's no point. But some of the arguments here about breaking code are hinging on the idea that people currently using the bytes API are using an (on the way to being) unsupported feature and it's not really acceptable to take that position if the deprecation wasn't announced. If the objections being raised here (in the context of Japanese encodings and similar) would apply equally to the bytes API being removed, then it seems to me that we have a failure in our deprecation process, as those objections should have been addressed when we started the deprecation. Alternatively, if the deprecation of the os functions is OK, but it's the deprecation of open (and presumably io.open) that's the issue, then the whole process is somewhat problematic - it seems daft in the long term to deprecate bytes paths in os functions like os.open and yet allow them in the supposedly higher level io.open and the open builtin. (And in the short term, it's illogical to me that the deprecation isn't for open as well as the os functions). I don't have a view on whether the cost to Japanese users is sufficiently high that we should continue along the deprecation path (or even divert to an enforced-UTF8 approach that's just as problematic for them). But maybe it's worth a separate thread, specifically focused on the use of bytes paths, rather than being lumped in with other Windows encoding issues? 
Paul From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Aug 16 10:11:13 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Tue, 16 Aug 2016 23:11:13 +0900 Subject: [Python-ideas] Digests (was Re: From mailing list to GitHub issues) In-Reply-To: <20160815170305.0b12ff6d@anarchist.wooz.org> References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com> <20160815170305.0b12ff6d@anarchist.wooz.org> Message-ID: <22451.7937.487708.288246@turnbull.sk.tsukuba.ac.jp> Barry Warsaw writes: > On Aug 14, 2016, at 02:01 PM, Chris Angelico wrote: > > >The biggest problem I'm seeing is with digests. Can that feature be > >flagged off as "DO NOT USE THIS UNLESS YOU KNOW WHAT YOU ARE ASKING > >FOR"? So many people seem to select digest mode, then get extremely > >confused by it. "So many"? Of the 5018 posts on this list I've received since Sept 24, 2015 (some were private replies etc, but only a handful), exactly three were inappropriate replies to digests. In the first, on May 2, 2016, the poster very carefully changed the subject, provided a well-organized summary of the posts he had received, and then didn't trim. Failure to trim is a practice that has been common ever since Guido started top-posting from his phone a couple years back. I don't think this is a symptom of "confusion." In the second, on July 8, the poster top-posted on a quoted digest without any way to determine what content he was replying to. However, this person has been a reasonably frequent poster since, and has not done it again. Initial confusion, yes, but no real harm done. In the third, on August 2, we have the same as the second. Again, it hasn't happened again although this person has posted a few times since. (That may be due to your advice to turn off digests.) I don't think this is anywhere near as damaging to the list as the practice of top-posting (let alone the bottom-posts, which are more frequent than reply-to-digest).
> Yes, we can turn off digests for python-ideas, or any Mailman mailing list. > > I was tempted to JFDI, but it would mean that ~25% of list members would no > longer get messages. That's because 254 out of 979 members are currently > receiving digests. Is this really sufficient reason for eliminating a feature that more than 1 in 4 subscribers has explicitly chosen? Most of whom never post? How about instead adding "^Subject:.*Python-Ideas Digest, Vol \d+, Issue \d" to the spam filter, and so imposing moderation delay (or even rejection) on the poster? Steve From random832 at fastmail.com Tue Aug 16 10:59:59 2016 From: random832 at fastmail.com (Random832) Date: Tue, 16 Aug 2016 10:59:59 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> Message-ID: <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> On Tue, Aug 16, 2016, at 09:59, Paul Moore wrote: > It probably should be. Although if we're changing the deprecation to a > behaviour change, then maybe there's no point. But some of the > arguments here about breaking code are hinging on the idea that people > currently using the bytes API are using an (on the way to being) > unsupported feature and it's not really acceptable to take that > position if the deprecation wasn't announced. 
If the objections being > raised here (in the context of Japanese encodings and similar) would > apply equally to the bytes API being removed, There also seems to be an undercurrent in the discussions we're having now that using bytes paths and not unicode paths is somehow The Right Thing for unix-like OSes, and that breaking it (in whatever way) on windows causes code that Does The Right Thing on unix to require extra work to port to windows. That's seemingly both the rationale for the proposal itself and for the objections. From chris.barker at noaa.gov Tue Aug 16 11:33:04 2016 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 16 Aug 2016 08:33:04 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> Message-ID: <3578204020037430971@unknownmsgid> > There also seems to be an undercurrent in the discussions we're having > now that using bytes paths and not unicode paths is somehow The Right > Thing for unix-like OSes, Almost -- from my perusing of discussions from the last few years, there do seem to be some library developers and *nix aficionados that DO think it's The Right Thing -- after all, a char* has always worked, yes?
But these folks also seem to think that a *nix system with no way of knowing the encoding of the names in the file system (and there could be more than one) is not "broken" in any way. A note about "utf-8 everywhere": while maybe a good idea, it's my understanding that *nix developers absolutely do not want utf-8 to be assumed in the Python APIs. Rather, this is all about punting the handling of encodings down to the application level, rather than the OS and library level. Which is more backward compatible, but otherwise a horrible idea. And very much in conflict with Python 3's approach. So it seems odd to assume utf-8 on Windows, where it is less ubiquitous. Back to "The Right Thing" -- it's clear to me that everyone supporting this proposal is very much doing so because it's "The Pragmatic Thing". But it seems folks porting from py2 need to explicitly convert the calls from str to bytes anyway to get the bytes behavior. With surrogate escapes, now you need to do nothing. So we're really supporting code that was ported to py3 earlier in the game - but it seems a bad idea to cement that hackish solution in place. And if the file names in question are coming from a byte stream somehow, rather than file system API calls, then you really do need to know the encoding -- yes really! If a developer wants to assume utf-8, that's fine, but the developer should be making that decision, not Python itself. And not on Windows only. -CHB > and that breaking it (in whatever way) on > windows causes code that Does The Right Thing on unix to require extra > work to port to windows. That's seemingly both the rationale for the > proposal itself and for the objections.
From gvanrossum at gmail.com Tue Aug 16 11:36:45 2016 From: gvanrossum at gmail.com (Guido van Rossum) Date: Tue, 16 Aug 2016 08:36:45 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> Message-ID: I am going to mute this thread but I am worried about the outcome. Once there is agreement please check with me first. --Guido (mobile)
From steve.dower at python.org Tue Aug 16 11:56:57 2016 From: steve.dower at python.org (Steve Dower) Date: Tue, 16 Aug 2016 08:56:57 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <3578204020037430971@unknownmsgid> References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> Message-ID: <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org>

I just want to clearly address two points, since I feel like multiple posts have been unclear on them.

1. The bytes API was deprecated in 3.3 and it is listed in https://docs.python.org/3/whatsnew/3.3.html. Lack of mention in the docs is an unfortunate oversight, but it was certainly announced and the warning has been there for three released versions. We can freely change or remove the support now, IMHO.

2. Windows file system encoding is *always* UTF-16. There's no "assuming mbcs" or "assuming ACP" or "assuming UTF-8" or "asking the OS what encoding it is". We know exactly what the encoding is on every supported version of Windows. UTF-16.

This discussion is for the developers who insist on using bytes for paths within Python, and the question is, "how do we best represent UTF-16 encoded paths in bytes?" The choices are:

* don't represent them at all (remove bytes API)
* convert and drop characters not in the (legacy) active code page
* convert and fail on characters not in the (legacy) active code page
* convert and fail on invalid surrogate pairs
* represent them as UTF-16-LE in bytes (with embedded '\0' everywhere)

Currently we have the second option.
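The trade-offs among the second, third and fourth options above map onto standard codec error handlers, which makes them easy to demonstrate. This is a cross-platform sketch that uses UTF-8 as a stand-in codec, since the 'mbcs' codec itself exists only on Windows; the error-handler names are the real ones from the codecs machinery:

```python
raw = b"caf\xe9"  # Latin-1 bytes, invalid as UTF-8

# Second-option analogue: convert and silently replace what doesn't fit,
# much as MultiByteToWideChar does without MB_ERR_INVALID_CHARS.
assert raw.decode("utf-8", errors="replace") == "caf\ufffd"

# Third-option analogue: convert and fail.
try:
    raw.decode("utf-8", errors="strict")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# The fourth option concerns lone surrogates: a str holding one cannot
# be encoded to UTF-16 with the default strict handler...
lone = "x\ud800y"
try:
    lone.encode("utf-16-le")
except UnicodeEncodeError:
    print("lone surrogate rejected")

# ...while 'surrogatepass' keeps the raw code unit, which is also what
# makes the fifth (raw UTF-16-LE) representation lossless.
blob = lone.encode("utf-16-le", errors="surrogatepass")
assert blob.decode("utf-16-le", errors="surrogatepass") == lone
```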
My preference is the fourth option, as it will cause the least breakage of existing code and enable the most amount of code to just work in the presence of non-ACP characters. The fifth option is the best for round-tripping within Windows APIs. The only code that will break with any change is code that was using an already deprecated API. Code that correctly uses str to represent "encoding agnostic text" is unaffected. If you see an alternative choice to those listed above, feel free to contribute it. Otherwise, can we focus the discussion on these (or any new) choices? Cheers, Steve From chris.barker at noaa.gov Tue Aug 16 12:06:16 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 16 Aug 2016 09:06:16 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <3578204020037430971@unknownmsgid> References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> Message-ID: Just to make sure this is clear, the Pragmatic logic is thus: * There are more *nix-centric developers in the Python ecosystem than Windows-centric (or even Windows-agnostic) developers. * The bytes path approach works fine on *nix systems. * Whatever might be Right and Just -- the reality is that a number of projects, including important and widely used libraries and frameworks, use the bytes API for working with filenames and paths, etc. Therefore, there is a lot of code that does not work right on Windows. 
Currently, to get it to work right on Windows, you need to write Windows specific code, which many folks don't want or know how to do (or just can't support one way or the other). So the Solution is to either: (A) get everyone to use Unicode "properly", which will work on all platforms (but only on py3.5 and above?) or (B) kludge some *nix-compatible support for byte paths into Windows, that will work at least much of the time. It's clear (to me at least) that (A) is the "Right Thing", but real world experience has shown that it's unlikely to happen any time soon. Practicality beats Purity and all that -- this is a judgment call. Have I got that right? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From brenbarn at brenbarn.net Tue Aug 16 12:08:28 2016 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Tue, 16 Aug 2016 09:08:28 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> Message-ID: <57B33A7C.7080104@brenbarn.net> On 2016-08-16 08:56, Steve Dower wrote: > I just want to clearly address two points, since I feel like multiple > posts have been unclear on them. > > 1.
The bytes API was deprecated in 3.3 and it is listed in > https://docs.python.org/3/whatsnew/3.3.html. Lack of mention in the docs > is an unfortunate oversight, but it was certainly announced and the > warning has been there for three released versions. We can freely change > or remove the support now, IMHO. I strongly disagree with that. If using the code does not raise a visible warning (because DeprecationWarning is silent by default), and the documentation does not say it's deprecated, it hasn't actually been deprecated. Deprecation is the communicative act of saying "don't do this anymore". If that information is not communicated in the appropriate places (e.g., the docs), the deprecation has not occurred. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From chris.barker at noaa.gov Tue Aug 16 12:12:32 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 16 Aug 2016 09:12:32 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> Message-ID: Thanks for the clarity, Steve, a couple questions/thoughts: The choices are: > > * don't represent them at all (remove bytes API) > Would the bytes API be removed on *nix also? > * convert and drop characters not in the (legacy) active code page > * convert and fail on characters not in the (legacy) active code page > "Failure is not an option" -- These two seem like a plain old bad idea. 
* convert and fail on invalid surrogate pairs > where would an invalid surrogate pair come from? never from a file system API call, yes? * represent them as UTF-16-LE in bytes (with embedded '\0' everywhere) > would this be doing anything -- or just keeping whatever the Windows API takes/returns? i.e. exactly what is done on *nix? > The fifth option is the best for round-tripping within Windows APIs. > How is it better? only performance (i.e. no encoding/decoding required) -- or would it be more reliable as well? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Tue Aug 16 12:35:06 2016 From: random832 at fastmail.com (Random832) Date: Tue, 16 Aug 2016 12:35:06 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> Message-ID: <1471365306.1419044.697031409.610F9FFE@webmail.messagingengine.com> On Tue, Aug 16, 2016, at 12:12, Chris Barker wrote: > * convert and fail on invalid surrogate pairs > > where would an invalid surrogate pair come from? never from a file system > API call, yes? In principle it could, if the filesystem contains a file with an invalid surrogate pair. Nothing else, in general, prevents such a file from being created, though it's not easy to do so by accident. 
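The POSIX counterpart of Random832's point is easy to show, since any byte sequence can land in a filename there too. In this sketch (assuming a UTF-8 or ASCII filesystem encoding, as on most Linux systems), a stray 0x80 byte in a name surfaces in Python as the lone surrogate U+DC80 via the 'surrogateescape' handler, and round-trips back to the same bytes:

```python
import os
import sys
import tempfile

# The filesystem happily stores names that are not valid text.  A raw
# 0x80 byte in a name reaches Python as a lone surrogate through the
# 'surrogateescape' error handler (assuming a UTF-8/ASCII fs encoding).
with tempfile.TemporaryDirectory() as d:
    raw_name = b"oops_\x80"
    open(os.path.join(os.fsencode(d), raw_name), "w").close()

    (listed,) = os.listdir(d)
    print(repr(listed), "fs encoding:", sys.getfilesystemencoding())

    # Whatever the locale, the str form round-trips back to the bytes.
    assert os.fsencode(listed) == raw_name
```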
From rosuav at gmail.com Tue Aug 16 12:44:42 2016 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 17 Aug 2016 02:44:42 +1000 Subject: [Python-ideas] Digests (was Re: From mailing list to GitHub issues) In-Reply-To: <22451.7937.487708.288246@turnbull.sk.tsukuba.ac.jp> References: <57af836f.4d012e0a.1f0d2.b0bd@mx.google.com> <20160815170305.0b12ff6d@anarchist.wooz.org> <22451.7937.487708.288246@turnbull.sk.tsukuba.ac.jp> Message-ID: On Wed, Aug 17, 2016 at 12:11 AM, Stephen J. Turnbull wrote: > Barry Warsaw writes: > > On Aug 14, 2016, at 02:01 PM, Chris Angelico wrote: > > > > >The biggest problem I'm seeing is with digests. Can that feature be > > >flagged off as "DO NOT USE THIS UNLESS YOU KNOW WHAT YOU ARE ASKING > > >FOR"? So many people seem to select digest mode, then get extremely > > >confused by it. > > "So many"? Of the 5018 posts on this list I've received since Sept > 24, 2015 (some were private replies etc, but only a handful), exactly > three were inappropriate replies to digests. > [chomp details] > I don't think this is anywhere near as damaging to the list as the > practice of top-posting (let alone the bottom-posts, which are more > frequent than reply-to-digest). It's not just this list, though. I've seen the same phenomenon on other Mailman lists too. I think you're right that top-posting is more of an issue (certainly it's far more prevalent), but it's also a lot harder to solve. > > Yes, we can turn off digests for python-ideas, or any Mailman mailing list. > > > > I was tempted to JFDI, but it would mean that ~25% of list members would no > > longer get messages. That's because 254 out of 979 members are currently > > receiving digests. > > Is this really sufficient reason for eliminating a feature that more > than 1 in 4 subscribers has explicitly chosen? Most of whom never post? 
> > How about instead adding > > "^Subject:.*Python-Ideas Digest, Vol \d+, Issue \d" > > to the spam filter, and so imposing moderation delay (or even > rejection) on the poster? That would be a decent idea. I was thinking more of the sign-up screen, though. How many of those 25% of subscribers really want digests, and how many of them completely misunderstood this: """Would you like to receive list mail batched in a daily digest?""" and picked "Yes" because they want to receive mail every day, rather than having to go to some web page to read it? My counter-suggestion is to simply remove that option from the front page. Anyone who genuinely wants a digest can go into their settings and request it; Mailman's settings pages are a lot more verbose than a sign-up page can be. The obvious default would then be the sane one. ChrisA

From moritz.sichert at googlemail.com Tue Aug 16 12:46:46 2016 From: moritz.sichert at googlemail.com (Moritz Sichert) Date: Tue, 16 Aug 2016 18:46:46 +0200 Subject: [Python-ideas] PEP 525: Asynchronous Generators Message-ID: <7f7e3bb3-2ea1-53d1-0c05-78fe3c14e861@googlemail.com>

>>> 2. It's extremely unlikely that somebody will design a system that
>>> switches coroutine runners *while async/awaiting a coroutine*.
>> Yes, I guess so.
>>
>>> But even in this unlikely use case, you can
>>> easily stack finalizers following this pattern:
>>>
>>> old_finalizer = sys.get_asyncgen_finalizer()
>>> sys.set_asyncgen_finalizer(my_finalizer)
>>> try:
>>>     # do my thing
>>> finally:
>>>     sys.set_asyncgen_finalizer(old_finalizer)
>> That only works for synchronous code, though, because if this is done in a
>> coroutine, it might get suspended within the try block and would leak its
>> own finalizer into the outer world.
> set_asyncgen_finalizer is designed to be used *only* by coroutine
> runners. This is a low-level API that coroutines should never
> touch. (At least my experience working with coroutines says so...)
First of all, thanks for your work on this PEP! I think it really completes the async Python to a state where most synchronous code can be changed easily to be asynchronous. I'm not entirely sure about get/set_asyncgen_finalizer(), though. I've written a short function that converts an async iterator (that includes async generators) into a normal one:

import asyncio
import inspect
import sys

def sync_iter(async_iter, *, loop=None):
    async_iter = async_iter.__aiter__()
    is_asyncgen = inspect.isasyncgen(async_iter)
    if loop is None:
        loop = asyncio.get_event_loop()
    if is_asyncgen:
        def finalizer(gen):
            loop.run_until_complete(gen.aclose())
        old_finalizer = sys.get_asyncgen_finalizer()
        sys.set_asyncgen_finalizer(finalizer)
    try:
        while True:
            yield loop.run_until_complete(async_iter.__anext__())
    except StopAsyncIteration:
        return
    finally:
        if is_asyncgen:
            sys.set_asyncgen_finalizer(old_finalizer)

Now my questions:

- Is it correct to check if the async iterator is actually a generator and
  only in that case do the whole get/set_asyncgen_finalizer() thing? As I
  understand, the finalizer is not needed for "normal" async iterators,
  i.e. instances of classes with an __anext__() method.
- Would it make sense to call sys.set_asyncgen_finalizer(old_finalizer)
  after the first call of async_iter.__anext__() instead of only at the
  end? As I understand, the finalizer is set when the generator is started.
- Is loop.run_until_complete(gen.aclose()) a sensible finalizer? If so, is
  there even any other finalizer that would make sense? Maybe creating a
  task for gen.aclose() instead of waiting for it to be completed?

From srkunze at mail.de Tue Aug 16 13:06:32 2016
From: srkunze at mail.de (Sven R.
Kunze) Date: Tue, 16 Aug 2016 19:06:32 +0200 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> Message-ID: <8bff964e-de9b-20e4-25fc-22704227223f@mail.de> On 16.08.2016 18:06, Chris Barker wrote: > It's clear (to me at least) that (A) it the "Right Thing", but real > world experience has shown that it's unlikely to happen any time soon. > > Practicality beats Purity and all that -- this is a judgment call. Maybe, but even when it takes a lot of time to get it right, I always prefer the "Right Thing". My past experience taught me that everything will always come back to the "Right Thing" even partly as it is *surprise* the "Right Thing" (TM). Question is: are we in a hurry? Has somebody too little time to wait for the "Right Thing" to happen? Sven From steve.dower at python.org Tue Aug 16 13:44:31 2016 From: steve.dower at python.org (Steve Dower) Date: Tue, 16 Aug 2016 10:44:31 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <8bff964e-de9b-20e4-25fc-22704227223f@mail.de> References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <8bff964e-de9b-20e4-25fc-22704227223f@mail.de> Message-ID: On 16Aug2016 1006, Sven R. 
Kunze wrote: > Question is: are we in a hurry? Has somebody too little time to wait for > the "Right Thing" to happen? Not really in a hurry. It's just that I decided to attack a few of the encoding issues on Windows and this was one of them. Ideally I'd want to get the change in for 3.6.0b1 so there's plenty of testing time. But we've been waiting many years for this already so I guess a few more won't hurt. The current situation of making Linux developers write different path handling code for Windows vs Linux (or just use str for paths) is painful for some, but not as bad as the other issues I want to fix. Cheers, Steve From srkunze at mail.de Tue Aug 16 14:08:37 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 16 Aug 2016 20:08:37 +0200 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <8bff964e-de9b-20e4-25fc-22704227223f@mail.de> Message-ID: <119cdd17-4778-75b7-8823-21970b8ce09e@mail.de> On 16.08.2016 19:44, Steve Dower wrote: > On 16Aug2016 1006, Sven R. Kunze wrote: >> Question is: are we in a hurry? Has somebody too little time to wait for >> the "Right Thing" to happen? > > Not really in a hurry. It's just that I decided to attack a few of the > encoding issues on Windows and this was one of them. > > Ideally I'd want to get the change in for 3.6.0b1 so there's plenty of > testing time. But we've been waiting many years for this already so I > guess a few more won't hurt. 
The current situation of making Linux > developers write different path handling code for Windows vs Linux (or > just use str for paths) is painful for some, but not as bad as the > other issues I want to fix. > I assume one overall goal will be Windows and Linux programs handling paths the same way which I personally find a very good idea. And as long as such long-term goals are properly communicated, people are educated the right way and official deprecation phases are in place, everything is good, I guess. :) Sven From p.f.moore at gmail.com Tue Aug 16 14:20:40 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 16 Aug 2016 19:20:40 +0100 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> Message-ID: On 16 August 2016 at 16:56, Steve Dower wrote: > I just want to clearly address two points, since I feel like multiple posts > have been unclear on them. > > 1. The bytes API was deprecated in 3.3 and it is listed in > https://docs.python.org/3/whatsnew/3.3.html. Lack of mention in the docs is > an unfortunate oversight, but it was certainly announced and the warning has > been there for three released versions. We can freely change or remove the > support now, IMHO. For clarity, the statement was: """ issue 13374: The Windows bytes API has been deprecated in the os module. Use Unicode filenames, instead of bytes filenames, to not depend on the ANSI code page anymore and to support any filename. 
""" First of all, note that I'm perfectly OK with deprecating bytes paths. However, this statement specifically does *not* say anything about use of bytes paths outside of the os module (builtin open and the io module being the obvious places). Secondly, it appears that unfortunately the main Python documentation wasn't updated to state this. So while "we can freely change or remove the support now" may be true, it's not that simple - the debate here is at least in part about builtin open, and there's nothing anywhere that I can see that states that bytes support in open has been deprecated. Maybe there should have been, and maybe everyone involved at the time assumed that it was, but that's water under the bridge. > 2. Windows file system encoding is *always* UTF-16. There's no "assuming > mbcs" or "assuming ACP" or "assuming UTF-8" or "asking the OS what encoding > it is". We know exactly what the encoding is on every supported version of > Windows. UTF-16. > > This discussion is for the developers who insist on using bytes for paths > within Python, and the question is, "how do we best represent UTF-16 encoded > paths in bytes?" People passing bytes to open() have in my view, already chosen not to follow the standard advice of "decode incoming data at the boundaries of your application". They may have good reasons for that, but it's perfectly reasonable to expect them to take responsibility for manually tracking the encoding of the resulting bytes values flowing through their code. It is of course, also true that "works for me in my environment" is a viable strategy - but the maintenance cost of this strategy if things change (whether in Python, or in the environment) is on the application developers - they are hoping that cost is minimal, but that's a risk they choose to take. 
> The choices are: > > * don't represent them at all (remove bytes API) > * convert and drop characters not in the (legacy) active code page > * convert and fail on characters not in the (legacy) active code page > * convert and fail on invalid surrogate pairs > * represent them as UTF-16-LE in bytes (with embedded '\0' everywhere) Actually, with the exception of the last one (which seems "obviously not sensible") these all feel more to me like answers to the question "how do we best interpret bytes provided to us as UTF-16?". It's a subtle point, but IMO important. It's much easier to answer the question you posed, but what people are actually concerned about is interpreting bytes, not representing Unicode. The correct answer to "how do we interpret bytes" is "in the face of ambiguity, refuse to guess" - but people using the bytes API have *already* bought into the current heuristic for guessing, so changing affects them. > Currently we have the second option. > > My preference is the fourth option, as it will cause the least breakage of > existing code and enable the most amount of code to just work in the > presence of non-ACP characters. It changes the encoding used to interpret bytes. While it preserves more information in the "UTF-16 to bytes" direction, nobody really cares about that direction. And in the "bytes to UTF-16" direction, it changes the interpretation of basically all non-ASCII bytes. That's a lot of breakage. Although as already noted, it's only breaking things that currently work while relying on a (maybe) undocumented API (byte paths to builtin open isn't actually documented) and on an arguably bad default that nevertheless works for them. > The fifth option is the best for round-tripping within Windows APIs. > > The only code that will break with any change is code that was using an > already deprecated API. Code that correctly uses str to represent "encoding > agnostic text" is unaffected. Code using Unicode is unaffected, certainly. 
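The difference between the "drop" and "fail" conversion options can be sketched off Windows as well; cp1252 serves below as a stand-in for a legacy ANSI code page (the real Windows codec is 'mbcs', which only exists there):

```python
name = 'caf\xe9 \u03b1.txt'  # e-acute is in cp1252, Greek alpha is not

# "convert and drop characters not in the (legacy) active code page"
dropped = name.encode('cp1252', errors='replace')
assert dropped == b'caf\xe9 ?.txt'  # the alpha is silently lost

# "convert and fail on characters not in the (legacy) active code page"
try:
    name.encode('cp1252', errors='strict')
    failed = False
except UnicodeEncodeError:
    failed = True
assert failed

# UTF-8 can represent every character and round-trips losslessly
assert name.encode('utf-8').decode('utf-8') == name
```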
Ideally that means that only a tiny minority of users should be affected. Are we over-reacting to reports of standard practices in Japan? I've no idea. > If you see an alternative choice to those listed above, feel free to > contribute it. Otherwise, can we focus the discussion on these (or any new) > choices? Accept that we should have deprecated builtin open and the io module, but didn't do so. Extend the existing deprecation of bytes paths on Windows, to cover *all* APIs, not just the os module, But modify the deprecation to be "use of the Windows CP_ACP code page (via the ...A Win32 APIs) is deprecated and will be replaced with use of UTF-8 as the implied encoding for all bytes paths on Windows starting in Python 3.7". Document and publicise it much more prominently, as it is a breaking change. Then leave it one release for people to prepare for the change. Oh, and (obviously) check back with Guido on his view - he's expressed concern, but I for one don't have the slightest idea in this case what his preference would be... Paul From moloney at ohsu.edu Tue Aug 16 15:35:49 2016 From: moloney at ohsu.edu (Brendan Moloney) Date: Tue, 16 Aug 2016 19:35:49 +0000 Subject: [Python-ideas] Allow manual creation of DirEntry objects Message-ID: <5F6A858FD00E5F4A82E3206D2D854EF8B54E2368@EXMB09.ohsu.edu> Hi, I have been using the 'scandir' package (https://github.com/benhoyt/scandir) for a while now to speed up some directory tree processing code. Since Python 3.5 now includes 'os.scandir' in the stdlib (https://www.python.org/dev/peps/pep-0471/) I decided to try to make my code work with the built-in version if available. The first issue I hit was that the 'DirEntry' class was not actually being exposed (http://bugs.python.org/issue27038). However in the discussion of that bug I noticed that the constructor for the 'DirEntry' class was deliberately being left undocumented and that there was no clear way to manually create a DirEntry object from a path. 
I brought up my objections to this decision in the bug tracker and was asked to have the discussion over here on python-ideas. I have a bunch of functions that operate on DirEntry objects, typically doing some sort of filtering to select the paths I actually want to process. The overwhelming majority of the time these functions are going to be operating on DirEntry objects produced by the scandir function, but there are some cases where the user will be supplying the path themselves (for example, the root of a directory tree to process). In my current code base that uses the scandir package I just wrap these paths in a 'GenericDirEntry' object and then pass them through the filter functions the same as any results coming from the scandir function. With the decision to not expose any method in the stdlib to manually create a DirEntry object, I am stuck with no good options. The least bad option I guess would be to copy the GenericDirEntry code out of the scandir package into my own code base. This seems rather silly. I really don't understand the rationale for not giving users a way to create these objects themselves, and I haven't actually seen that explained anywhere. I guess people are unhappy with the overlap between pathlib.Path objects and DirEntry objects and this is a misguided attempt to prod people into using pathlib. I think a better approach is to document the differences between DirEntry and pathlib.Path objects and encourage users to default to using pathlib.Path unless they have good reasons for using DirEntry. Thanks, Brendan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Tue Aug 16 17:13:38 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Aug 2016 14:13:38 -0700 Subject: [Python-ideas] Allow manual creation of DirEntry objects In-Reply-To: <5F6A858FD00E5F4A82E3206D2D854EF8B54E2368@EXMB09.ohsu.edu> References: <5F6A858FD00E5F4A82E3206D2D854EF8B54E2368@EXMB09.ohsu.edu> Message-ID: It sounds fine to just submit a patch to add and document the DirEntry constructor. I don't think anyone intended to disallow your use case, it's more likely that nobody thought of it. On Tue, Aug 16, 2016 at 12:35 PM, Brendan Moloney wrote: > Hi, > > I have been using the 'scandir' package (https://github.com/benhoyt/ > scandir) for a while now to > speed up some directory tree processing code. Since Python 3.5 now > includes 'os.scandir' in the > stdlib (https://www.python.org/dev/peps/pep-0471/) I decided to try to > make my code work with > the built-in version if available. > > The first issue I hit was that the 'DirEntry' class was not actually being > exposed > (http://bugs.python.org/issue27038). However in the discussion of that > bug I noticed that the > constructor for the 'DirEntry' class was deliberately being left > undocumented and that there > was no clear way to manually create a DirEntry object from a path. I > brought up my objections > to this decision in the bug tracker and was asked to have the discussion > over here on > python-ideas. > > I have a bunch of functions that operate on DirEntry objects, typically > doing some sort of filtering > to select the paths I actually want to process. The overwhelming majority > of the time these functions > are going to be operating on DirEntry objects produced by the scandir > function, but there are some > cases where the user will be supplying the path themselves (for example, > the root of a directory tree > to process). 
In my current code base that uses the scandir package I just > wrap these paths in a > 'GenericDirEntry' object and then pass them through the filter functions > the same as any results > coming from the scandir function. > > With the decision to not expose any method in the stdlib to manually > create a DirEntry object, I am > stuck with no good options. The least bad option I guess would be to copy > the GenericDirEntry code > out of the scandir package into my own code base. This seems rather > silly. I really don't understand > the rationale for not giving users a way to create these objects > themselves, and I haven't actually seen > that explained anywhere. I guess people are unhappy with the overlap > between pathlib.Path objects > and DirEntry objects and this is a misguided attempt to prod people into > using pathlib. I think a better > approach is to document the differences between DirEntry and pathlib.Path > objects and encourage > users to default to using pathlib.Path unless they have good reasons for > using DirEntry. > > Thanks, > Brendan > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From cs at zip.com.au Tue Aug 16 19:02:06 2016 From: cs at zip.com.au (cs at zip.com.au) Date: Wed, 17 Aug 2016 09:02:06 +1000 Subject: [Python-ideas] Digests (was Re: From mailing list to GitHub issues) In-Reply-To: References: Message-ID: <20160816230206.GA19323@cskk.homeip.net> On 17Aug2016 02:44, Chris Angelico wrote: >On Wed, Aug 17, 2016 at 12:11 AM, Stephen J. Turnbull > wrote: >> Barry Warsaw writes: >> > On Aug 14, 2016, at 02:01 PM, Chris Angelico wrote: >> > >The biggest problem I'm seeing is with digests. 
Can that feature be >> > >flagged off as "DO NOT USE THIS UNLESS YOU KNOW WHAT YOU ARE ASKING >> > >FOR"? So many people seem to select digest mode, then get extremely >> > >confused by it. >> >> "So many"? Of the 5018 posts on this list I've received since Sept >> 24, 2015 (some were private replies etc, but only a handful), exactly >> three were inappropriate replies to digests. >> [chomp details] >> I don't think this is anywhere near as damaging to the list as the >> practice of top-posting (let alone the bottom-posts, which are more >> frequent than reply-to-digest). > >It's not just this list, though. I've seen the same phenomenon on >other Mailman lists too.

Yes, digests cause a lot of trouble when digest users switch from lurking to posting. (I also think they're a usability fail versus threaded messages). But Stephen's numbers seem plausible. There are other lists where things are worse, I think. Here the noise level is low. [...]

>> Is this really sufficient reason for eliminating a feature that more >> than 1 in 4 subscribers has explicitly chosen? Most of whom never post?

Personally I think digests are bad enough to actively discourage.

>> How about instead adding >> "^Subject:.*Python-Ideas Digest, Vol \d+, Issue \d" >> to the spam filter, and so imposing moderation delay (or even >> rejection) on the poster? > >That would be a decent idea. I was thinking more of the sign-up >screen, though. How many of those 25% of subscribers really want >digests, and how many of them completely misunderstood this: > >"""Would you like to receive list mail batched in a daily digest?""" > >and picked "Yes" because they want to receive mail every day, rather >than having to go to some web page to read it? My counter-suggestion >is to simply remove that option from the front page. Anyone who >genuinely wants a digest can go into their settings and request it; >Mailman's settings pages are a lot more verbose than a sign-up page >can be.
The obvious default would then be the sane one. This I support. I think the other driver for digests is people who don't filter their email. They think digests keep their inbox low volume, and have never really looked at the benefits of filtering lists into topic folders.

Cheers, Cameron Simpson

From victor.stinner at gmail.com Tue Aug 16 19:03:58 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 17 Aug 2016 01:03:58 +0200
Subject: [Python-ideas] Fix default encodings on Windows
In-Reply-To: <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org>
References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org>
Message-ID:

2016-08-16 17:56 GMT+02:00 Steve Dower :
> 2. Windows file system encoding is *always* UTF-16. There's no "assuming > mbcs" or "assuming ACP" or "assuming UTF-8" or "asking the OS what encoding > it is". We know exactly what the encoding is on every supported version of > Windows. UTF-16.

I think that you missed an important issue (or "use case") which is called the "Makefile problem" by Mercurial developers: https://www.mercurial-scm.org/wiki/EncodingStrategy#The_.22makefile_problem.22

I already explained it before, but maybe you misunderstood or just missed it, so here is a more concrete example. A runner.py script produces a bytes filename and sends it to a second read_file.py script through stdin/stdout. The read_file.py script opens the file using open(filename). The read_file.py script is run by Python 2, which works naturally on bytes. The question is how runner.py produces (encodes) the filename.
runner.py (script run by Python 3.7):
---
import os, sys, subprocess, tempfile

filename = 'h\xe9.txt'
content = b'foo bar'
print("filename unicode: %a" % filename)

root = os.path.realpath(os.path.dirname(__file__))
script = os.path.join(root, 'read_file.py')

old_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmpdir:
    os.chdir(tmpdir)

    with open(filename, 'wb') as fp:
        fp.write(content)

    filenameb = os.listdir(b'.')[0]
    # Python 3.5 encodes Unicode (UTF-16) to the ANSI code page
    # what if Python 3.7 encodes Unicode (UTF-16) to UTF-8?
    print("filename bytes: %a" % filenameb)

    proc = subprocess.Popen(['py', '-2', script],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    stdout = proc.communicate(filenameb)[0]
    print("File content: %a" % stdout)

    os.chdir(old_cwd)
---

read_file.py (run by Python 2):
---
import sys

filename = sys.stdin.read()
# Python 2 calls the Windows C open() function
# which expects a filename encoded to the ANSI code page
with open(filename) as fp:
    content = fp.read()
sys.stdout.write(content)
sys.stdout.flush()
---

read_file.py only works if the non-ASCII filename is encoded to the ANSI code page. The question is how you expect developers to handle such a use case. For example, are developers responsible for transcoding communicate() data (inputs and outputs) manually?

That's why I keep repeating that the ANSI code page is the best *default* encoding, because it is the encoding expected by other applications. I know that the ANSI code page is usually limited and caused various painful issues when handling non-ASCII data, but it's the status quo if you really want to handle data as bytes...

Sorry, I didn't read all emails of this long thread, so maybe I missed your answer to this issue.
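The mismatch in the two scripts can be reproduced without Windows; cp1252 stands in below for the ANSI code page (the real scripts go through 'mbcs'), and the only assumption is that the two sides pick different encodings:

```python
name = 'h\xe9.txt'  # the filename from runner.py

as_ansi = name.encode('cp1252')  # what a 3.5-style runner sends
as_utf8 = name.encode('utf-8')   # what a UTF-8-based runner would send
assert as_ansi == b'h\xe9.txt'
assert as_utf8 == b'h\xc3\xa9.txt'  # different bytes on the wire

# A child that decodes with the ANSI code page then sees mojibake
# instead of the real name, and its open() looks up the wrong file:
assert as_utf8.decode('cp1252') == 'h\xc3\xa9.txt'  # 'hÃ©.txt'
```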
Victor

From victor.stinner at gmail.com Tue Aug 16 19:11:21 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 17 Aug 2016 01:11:21 +0200
Subject: [Python-ideas] Allow manual creation of DirEntry objects
In-Reply-To: References: <5F6A858FD00E5F4A82E3206D2D854EF8B54E2368@EXMB09.ohsu.edu>
Message-ID:

2016-08-16 23:13 GMT+02:00 Guido van Rossum :
> It sounds fine to just submit a patch to add and document the DirEntry > constructor. I don't think anyone intended to disallow your use case, it's > more likely that nobody thought of it.

Currently, the DirEntry constructor expects data which comes from the opendir/readdir functions on UNIX/BSD or the FindFirstFile/FindNextFile functions on Windows. These functions are not exposed in Python, so it's unlikely that you can get the expected values. The DirEntry object was created to avoid syscalls in the common case thanks to data provided by these functions.

But I guess that Brendan wants to create a DirEntry object which would call os.stat() the first time that an attribute is read and then benefit from the cache. You lose the "no syscall" optimization, since at least one syscall is needed.

In this case, I guess that the constructor should be DirEntry(directory, entry_name) where os.path.join(directory, entry_name) is the full path.

An issue is how to document the behaviour of DirEntry. Objects created by os.scandir() would be "optimized", whereas objects created manually would be "less optimized".

DirEntry is designed for os.scandir(); it's very limited compared to pathlib. IMO pathlib would be a better candidate for "cached os.stat results" with a full API to access the file system.
Victor From victor.stinner at gmail.com Tue Aug 16 19:14:29 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 17 Aug 2016 01:14:29 +0200 Subject: [Python-ideas] Allow manual creation of DirEntry objects In-Reply-To: References: <5F6A858FD00E5F4A82E3206D2D854EF8B54E2368@EXMB09.ohsu.edu> Message-ID: By the way, for all these reasons, I'm not really excited by Python 3.6 change exposing os.DirEntry ( https://bugs.python.org/issue27038 ). Victor 2016-08-17 1:11 GMT+02:00 Victor Stinner : > 2016-08-16 23:13 GMT+02:00 Guido van Rossum : >> It sounds fine to just submit a patch to add and document the DirEntry >> constructor. I don't think anyone intended to disallow your use case, it's >> more likely that nobody thought of it. > > Currently, the DirEntry constructor expects data which comes from > opendir/readdir functions on UNIX/BSD or FindFirstFile/FindNextFile > functions on Windows. These functions are not exposed in Python, so > it's unlikely that you can get expected value. The DirEntry object was > created to avoid syscalls in the common case thanks to data provided > by these functions. > > But I guess that Brendan wants to create a DirEntry object which would > call os.stat() the first time that an attribute is read and then > benefit of the code. You loose the "no syscall" optimization, since at > least once syscall is needed. > > In this case, I guess that the constructor should be > DirEntry(directory, entry_name) where os.path.join(directory, > entry_name) is the full path. > > An issue is how to document the behaviour of DirEntry. Objects created > by os.scandir() would be "optimized", whereas objects created manually > would be "less optimized". > > DirEntry is designed for os.scandir(), it's very limited compared to > pathlib. IMO pathlib would be a better candidate for "cached os.stat > results" with a full API to access the file system. 
>
> Victor

From steve.dower at python.org Tue Aug 16 19:27:43 2016
From: steve.dower at python.org (Steve Dower)
Date: Tue, 16 Aug 2016 16:27:43 -0700
Subject: [Python-ideas] Fix default encodings on Windows
In-Reply-To: References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org>
Message-ID: <7239afca-62fa-a3b0-ffbc-784e6f9cf3c1@python.org>

On 16Aug2016 1603, Victor Stinner wrote:
> 2016-08-16 17:56 GMT+02:00 Steve Dower : >> 2. Windows file system encoding is *always* UTF-16. There's no "assuming >> mbcs" or "assuming ACP" or "assuming UTF-8" or "asking the OS what encoding >> it is". We know exactly what the encoding is on every supported version of >> Windows. UTF-16. > > I think that you missed a important issue (or "use case") which is > called the "Makefile problem" by Mercurial developers: > https://www.mercurial-scm.org/wiki/EncodingStrategy#The_.22makefile_problem.22 > > I already explained it before, but maybe you misunderstood or just > missed it, so here is a more concrete example.

I guess I misunderstood. The concrete example really helps, thank you. The problem here is that there is an application boundary without a defined encoding, right where you put the comment.

> filenameb = os.listdir(b'.')[0] > # Python 3.5 encodes Unicode (UTF-16) to the ANSI code page > # what if Python 3.7 encodes Unicode (UTF-16) to UTF-8?
> print("filename bytes: %a" % filenameb) > > proc = subprocess.Popen(['py', '-2', script], > stdin=subprocess.PIPE, stdout=subprocess.PIPE) > stdout = proc.communicate(filenameb)[0] > print("File content: %a" % stdout) If you are defining the encoding as 'mbcs', then you need to check that sys.getfilesystemencoding() == 'mbcs', and if it doesn't then reencode. Alternatively, since this script is the "new" code, you would use `os.listdir('.')[0].encode('mbcs')`, given that you have explicitly determined that mbcs is the encoding for the later transfer. Essentially, the problem is that this code is relying on a certain non-guaranteed behaviour of a deprecated API, where using sys.getfilesystemencoding() as documented would have prevented any issue (see https://docs.python.org/3/library/os.html#file-names-command-line-arguments-and-environment-variables). In one of the emails I think you missed, I called this out as the only case where code will break with a change to sys.getfilesystemencoding(). So yes, breaking existing code is something I would never do lightly. However, I'm very much of the opinion that the only code that will break is code that is already broken (or at least fragile) and that nobody is forced to take a major upgrade to Python or should necessarily expect 100% compatibility between major versions. Cheers, Steve From guido at python.org Tue Aug 16 19:50:15 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Aug 2016 16:50:15 -0700 Subject: [Python-ideas] Allow manual creation of DirEntry objects In-Reply-To: References: <5F6A858FD00E5F4A82E3206D2D854EF8B54E2368@EXMB09.ohsu.edu> Message-ID: On Tue, Aug 16, 2016 at 4:14 PM, Victor Stinner wrote: > By the way, for all these reasons, I'm not really excited by Python > 3.6 change exposing os.DirEntry ( https://bugs.python.org/issue27038 > ). > But that's separate from the constructor. 
We could expose the class with a constructor that always fails (the C code could construct instances through a backdoor). Exposing the type is useful for type annotations, e.g. def is_foobar(de: os.DirEntry) -> bool: ... and for the occasional isinstance() check. Also, what does the scandir package mentioned by the OP use as the constructor signature? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Tue Aug 16 19:50:51 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 17 Aug 2016 01:50:51 +0200 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <7239afca-62fa-a3b0-ffbc-784e6f9cf3c1@python.org> References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <7239afca-62fa-a3b0-ffbc-784e6f9cf3c1@python.org> Message-ID: 2016-08-17 1:27 GMT+02:00 Steve Dower : >> filenameb = os.listdir(b'.')[0] >> # Python 3.5 encodes Unicode (UTF-16) to the ANSI code page >> # what if Python 3.7 encodes Unicode (UTF-16) to UTF-8? >> print("filename bytes: %a" % filenameb) >> >> proc = subprocess.Popen(['py', '-2', script], >> stdin=subprocess.PIPE, stdout=subprocess.PIPE) >> stdout = proc.communicate(filenameb)[0] >> print("File content: %a" % stdout) > > > If you are defining the encoding as 'mbcs', then you need to check that > sys.getfilesystemencoding() == 'mbcs', and if it doesn't then reencode. Sorry, I don't understand. What do you mean by "defining an encoding"? It's not possible to modify sys.getfilesystemencoding() in Python. 
What does "reencode" mean? I'm lost.

> Alternatively, since this script is the "new" code, you would use
> `os.listdir('.')[0].encode('mbcs')`, given that you have explicitly
> determined that mbcs is the encoding for the later transfer.

My example is not new code. It is a very simplified script to explain
the issue that can occur in a large code base which *currently* works
well on Python 2 and Python 3 in the common case (only handle data
encodable to the ANSI code page).

> Essentially, the problem is that this code is relying on a certain
> non-guaranteed behaviour of a deprecated API, where using
> sys.getfilesystemencoding() as documented would have prevented any issue
> (see
> https://docs.python.org/3/library/os.html#file-names-command-line-arguments-and-environment-variables).

sys.getfilesystemencoding() is used in applications which store data
as Unicode, but we are talking about applications storing data as
bytes, no?

> So yes, breaking existing code is something I would never do lightly.
> However, I'm very much of the opinion that the only code that will break is
> code that is already broken (or at least fragile) and that nobody is forced
> to take a major upgrade to Python or should necessarily expect 100%
> compatibility between major versions.

Well, it's somehow the same issue that we had in Python 2:
applications work in most cases, but start to fail with non-ASCII
characters, or maybe only in some cases.

In this case, the ANSI code page is fine if all data can be encoded to
the ANSI code page. You start to get troubles when you start to use
characters not encodable to your ANSI code page. Last time I checked,
Microsoft Visual Studio behaved badly (has bugs) with such filenames.
It's the same for many applications. So it's not like Windows
applications already handle this case very well. So let me call it a
corner case.
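[Editor's note: the "corner case" Victor describes is easy to reproduce off-Windows. This sketch is mine, not from the thread; it uses cp1252 as a stand-in for a Western ANSI code page, since the 'mbcs' codec only exists on Windows.]

```python
# An ANSI code page works until the first character it cannot represent.
# cp1252 stands in here for a Western ANSI code page, because the
# 'mbcs' codec only exists on Windows.
ok_name = "caf\u00e9.txt"          # é exists in cp1252: round-trips fine
assert ok_name.encode("cp1252").decode("cp1252") == ok_name

bad_name = "\u4e2d\u6587.txt"      # CJK characters: not in cp1252
try:
    bad_name.encode("cp1252")
    failed = False
except UnicodeEncodeError:
    failed = True                  # breaks only once non-ANSI data appears
print("non-ANSI name rejected:", failed)
```

That is, everything appears to work until the first filename outside the active code page shows up, which is exactly why the bug stays hidden for so long.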
I'm not sure that it's worth it to explicitly break the Python
backward compatibility on Windows for such a corner case, especially
because it's already possible to fix applications by starting to use
Unicode everywhere (which would likely fix more issues than expected
as a side effect).

It's still unclear to me if it's simpler to modify an application
using bytes to start using Unicode (for filenames), or if your
proposition requires fewer changes.

My main concern is the "makefile issue" which requires more complex
code to transcode data between UTF-8 and the ANSI code page. To me, it's
like we are going back to Python 2 where no data had a known encoding
and mojibake was the default. If you manipulate strings in two
encodings, you're likely to make mistakes and concatenate two strings
encoded to two different encodings (=> mojibake).

Victor

From victor.stinner at gmail.com  Tue Aug 16 19:56:35 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 17 Aug 2016 01:56:35 +0200
Subject: [Python-ideas] Allow manual creation of DirEntry objects
In-Reply-To: 
References: <5F6A858FD00E5F4A82E3206D2D854EF8B54E2368@EXMB09.ohsu.edu>
 
Message-ID: 

2016-08-17 1:50 GMT+02:00 Guido van Rossum :
> We could expose the class with a
> constructor that always fails (the C code could construct instances through
> a backdoor).

Oh, in fact you cannot create an instance of os.DirEntry, it has no
(Python) constructor:

$ ./python
Python 3.6.0a4+ (default:e615718a6455+, Aug 17 2016, 00:12:17)
>>> import os
>>> os.DirEntry(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot create 'posix.DirEntry' instances

Only os.scandir() can produce such objects.

The question is still if it makes sense to allow creating DirEntry
objects in Python :-)

> Also, what does the scandir package mentioned by the OP use as the
> constructor signature?

The implementation of os.scandir() comes from the scandir package.
It contains the same code, and so has the same behaviour (DirEntry has
no constructor).

Victor

From brett at python.org  Tue Aug 16 19:35:14 2016
From: brett at python.org (Brett Cannon)
Date: Tue, 16 Aug 2016 23:35:14 +0000
Subject: [Python-ideas] Allow manual creation of DirEntry objects
In-Reply-To: 
References: <5F6A858FD00E5F4A82E3206D2D854EF8B54E2368@EXMB09.ohsu.edu>
 
Message-ID: 

On Tue, 16 Aug 2016 at 16:15 Victor Stinner wrote:

> By the way, for all these reasons, I'm not really excited by Python
> 3.6 change exposing os.DirEntry ( https://bugs.python.org/issue27038
> ).
>

It was exposed at Guido's request for type hinting in typeshed.

-Brett

>
> Victor
>
> 2016-08-17 1:11 GMT+02:00 Victor Stinner :
> > 2016-08-16 23:13 GMT+02:00 Guido van Rossum :
> >> It sounds fine to just submit a patch to add and document the DirEntry
> >> constructor. I don't think anyone intended to disallow your use case,
> it's
> >> more likely that nobody thought of it.
> >
> > Currently, the DirEntry constructor expects data which comes from
> > opendir/readdir functions on UNIX/BSD or FindFirstFile/FindNextFile
> > functions on Windows. These functions are not exposed in Python, so
> > it's unlikely that you can get the expected values. The DirEntry object
> > was created to avoid syscalls in the common case thanks to data provided
> > by these functions.
> >
> > But I guess that Brendan wants to create a DirEntry object which would
> > call os.stat() the first time that an attribute is read and then
> > benefit from the cache. You lose the "no syscall" optimization, since at
> > least one syscall is needed.
> >
> > In this case, I guess that the constructor should be
> > DirEntry(directory, entry_name) where os.path.join(directory,
> > entry_name) is the full path.
> >
> > An issue is how to document the behaviour of DirEntry. Objects created
> > by os.scandir() would be "optimized", whereas objects created manually
> > would be "less optimized".
> > > > DirEntry is designed for os.scandir(), it's very limited compared to > > pathlib. IMO pathlib would be a better candidate for "cached os.stat > > results" with a full API to access the file system. > > > > Victor > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve.dower at python.org Tue Aug 16 20:14:09 2016 From: steve.dower at python.org (Steve Dower) Date: Tue, 16 Aug 2016 17:14:09 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <7239afca-62fa-a3b0-ffbc-784e6f9cf3c1@python.org> Message-ID: On 16Aug2016 1650, Victor Stinner wrote: > 2016-08-17 1:27 GMT+02:00 Steve Dower : >>> filenameb = os.listdir(b'.')[0] >>> # Python 3.5 encodes Unicode (UTF-16) to the ANSI code page >>> # what if Python 3.7 encodes Unicode (UTF-16) to UTF-8? >>> print("filename bytes: %a" % filenameb) >>> >>> proc = subprocess.Popen(['py', '-2', script], >>> stdin=subprocess.PIPE, stdout=subprocess.PIPE) >>> stdout = proc.communicate(filenameb)[0] >>> print("File content: %a" % stdout) >> >> >> If you are defining the encoding as 'mbcs', then you need to check that >> sys.getfilesystemencoding() == 'mbcs', and if it doesn't then reencode. > > Sorry, I don't understand. What do you mean by "defining an encoding"? > It's not possible to modify sys.getfilesystemencoding() in Python. 
> What does "reencode" mean? I'm lost.

You are transferring text between two applications without specifying
what the encoding is. sys.getfilesystemencoding() does not apply to
proc.communicate() - you can use your choice of encoding for
communicating between two processes.

>> Alternatively, since this script is the "new" code, you would use
>> `os.listdir('.')[0].encode('mbcs')`, given that you have explicitly
>> determined that mbcs is the encoding for the later transfer.
>
> My example is not new code. It is a very simplified script to explain
> the issue that can occur in a large code base which *currently* works
> well on Python 2 and Python 3 in the common case (only handle data
> encodable to the ANSI code page).

If you are planning to run it with Python 3.6, then I'd argue it's
"new" code. When you don't want anything to change, you certainly
don't change the major version of your runtime.

>> Essentially, the problem is that this code is relying on a certain
>> non-guaranteed behaviour of a deprecated API, where using
>> sys.getfilesystemencoding() as documented would have prevented any issue
>> (see
>> https://docs.python.org/3/library/os.html#file-names-command-line-arguments-and-environment-variables).
>
> sys.getfilesystemencoding() is used in applications which store data
> as Unicode, but we are talking about applications storing data as
> bytes, no?

No, we're talking about how Python code communicates with the file
system. Applications can store their data however they like, but when
they pass it to a filesystem function they need to pass it as str or
bytes encoded with sys.getfilesystemencoding() (this has always been
the case).

>> So yes, breaking existing code is something I would never do lightly.
>> However, I'm very much of the opinion that the only code that will break is
>> code that is already broken (or at least fragile) and that nobody is forced
>> to take a major upgrade to Python or should necessarily expect 100%
>> compatibility between major versions.
>
> Well, it's somehow the same issue that we had in Python 2:
> applications work in most cases, but start to fail with non-ASCII
> characters, or maybe only in some cases.
>
> In this case, the ANSI code page is fine if all data can be encoded to
> the ANSI code page. You start to get troubles when you start to use
> characters not encodable to your ANSI code page. Last time I checked,
> Microsoft Visual Studio behaved badly (has bugs) with such filenames.
> It's the same for many applications. So it's not like Windows
> applications already handle this case very well. So let me call it a
> corner case.

The existence of bugs in other applications is not a good reason to help
people create new bugs.

> I'm not sure that it's worth it to explicitly break the Python
> backward compatibility on Windows for such corner case, especially
> because it's already possible to fix applications by starting to use
> Unicode everywhere (which would likely fix more issues than expected
> as a side effect).
>
> It's still unclear to me if it's simpler to modify an application
> using bytes to start using Unicode (for filenames), or if your
> proposition requires less changes.

My proposition requires fewer changes *when you target multiple
platforms and would prefer to use bytes*. It allows the below code to
be written as either branch without losing the ability to round-trip
whatever filename happens to be returned:

    if os.name == 'nt':
        f = open(os.listdir('.')[-1])
    else:
        f = open(os.listdir(b'.')[-1])

If you choose just the first branch (use str for paths), then you do
get a better result.
However, we have been telling people to do that since 3.0 (and made it easier in 3.2 IIRC) and it's now 3.5 and they are still complaining about not getting to use bytes for paths. So rather than have people say "Windows support is too hard", this change enables the second branch to be used on all platforms. > My main concern is the "makefile issue" which requires more complex > code to transcode data between UTF-8 and ANSI code page. To me, it's > like we are going back to Python 2 where no data had known encoding > and mojibake was the default. If you manipulate strings in two > encodings, it's likely to make mistakes and concatenate two strings > encoded to two different encodings (=> mojibake). Your makefile example is going back to Python 2, as it has no known encoding. If you want to associate an encoding with bytes, you decode it to text or you explicitly specify what the encoding should be. Your own example makes assumptions about what encoding the bytes have, which is why it has a bug. Cheers, Steve From brenbarn at brenbarn.net Tue Aug 16 22:15:27 2016 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Tue, 16 Aug 2016 19:15:27 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <7239afca-62fa-a3b0-ffbc-784e6f9cf3c1@python.org> Message-ID: <57B3C8BF.1040705@brenbarn.net> On 2016-08-16 17:14, Steve Dower wrote: > The existence of bugs in other applications is not a good reason to help > people create new bugs. 
I haven't been following all the details in this thread, but isn't the whole purpose of this proposed change to accommodate code (apparently on Linux?) that is buggy in that it assumes it can use bytes for paths without knowing the encoding? It seems like from one perspective allowing bytes in paths is just helping to accommodate a certain very widespread class of bugs. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From steve.dower at python.org Tue Aug 16 23:44:50 2016 From: steve.dower at python.org (Steve Dower) Date: Tue, 16 Aug 2016 20:44:50 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <57B3C8BF.1040705@brenbarn.net> References: <57AB6E2D.6050704@python.org> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <7239afca-62fa-a3b0-ffbc-784e6f9cf3c1@python.org> <57B3C8BF.1040705@brenbarn.net> Message-ID: <57B3DDB2.50106@python.org> On 16Aug2016 1915, Brendan Barnwell wrote: > On 2016-08-16 17:14, Steve Dower wrote: >> The existence of bugs in other applications is not a good reason to help >> people create new bugs. > > I haven't been following all the details in this thread, but isn't > the whole purpose of this proposed change to accommodate code > (apparently on Linux?) that is buggy in that it assumes it can use bytes > for paths without knowing the encoding? It seems like from one > perspective allowing bytes in paths is just helping to accommodate a > certain very widespread class of bugs. Using bytes on Linux (in Python) is incorrect but works reliably, while using bytes on Windows is incorrect and unreliable. This change makes it incorrect and reliable on both platforms. 
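[Editor's note: a portable middle ground worth mentioning here, sketched by the editor rather than taken from the thread: the stdlib helpers os.fsencode() and os.fsdecode() convert between str and bytes paths using sys.getfilesystemencoding() and its error handler, so the same code runs on both platforms without hard-coding 'mbcs' or 'utf-8'.]

```python
import os
import sys

# os.fsencode()/os.fsdecode() apply sys.getfilesystemencoding() and its
# associated error handler, so the conversion is correct whatever the
# filesystem encoding turns out to be on the current platform.
print("filesystem encoding:", sys.getfilesystemencoding())

path = "example.txt"
raw = os.fsencode(path)          # str -> bytes path
assert isinstance(raw, bytes)
assert os.fsdecode(raw) == path  # bytes -> str round-trips exactly
```

Code written this way is insulated from whichever default the interpreter eventually picks.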
I said at the start the correct alternative would be to actually force all developers to use str for paths everywhere. That seems infeasible, so I'm trying to at least improve the situation for Windows users who are running code written by Linux developers. Hence there are tradeoffs, rather than perfection. (Also, you took my quote out of context - it was referring to the fact that non-Python developers sometimes fail to get path encoding correct too. But your question was fair.) Cheers, Steve From steve.dower at python.org Tue Aug 16 23:51:40 2016 From: steve.dower at python.org (Steve Dower) Date: Tue, 16 Aug 2016 20:51:40 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <57B3C8BF.1040705@brenbarn.net> References: <57AB6E2D.6050704@python.org> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <7239afca-62fa-a3b0-ffbc-784e6f9cf3c1@python.org> <57B3C8BF.1040705@brenbarn.net> Message-ID: <57B3DF4C.20808@python.org> I've just created http://bugs.python.org/issue27781 with a patch removing use of the *A API from posixmodule.c and changing the default FS encoding to utf-8. Since we're still discussing whether the encoding should be utf-8 or something else, let's keep that here. But if you want to see how the changes would look, feel free to check out the patch and comment on the issue. When we reach some agreement here I'll try and summarize the points of view on the issue so we have a record there. 
Cheers,
Steve

From eryksun at gmail.com  Wed Aug 17 01:49:50 2016
From: eryksun at gmail.com (eryk sun)
Date: Wed, 17 Aug 2016 05:49:50 +0000
Subject: [Python-ideas] Fix default encodings on Windows
In-Reply-To: <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org>
References: <57AB6E2D.6050704@python.org>
	<22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp>
	<1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com>
	<1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com>
	<909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org>
	<8087808059974504431@unknownmsgid>
	<1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com>
	<3578204020037430971@unknownmsgid>
	<2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org>
Message-ID: 

On Tue, Aug 16, 2016 at 3:56 PM, Steve Dower wrote:
>
> 2. Windows file system encoding is *always* UTF-16. There's no "assuming
> mbcs" or "assuming ACP" or "assuming UTF-8" or "asking the OS what encoding
> it is". We know exactly what the encoding is on every supported version of
> Windows. UTF-16.

Internal filesystem details don't directly affect this issue, except
for how each filesystem handles invalid surrogates in names passed to
functions in the wide-character API. Some filesystems that are
available on Windows do reject a filename that has an invalid
surrogate, so I think any program that attempts to create such
malformed names is already broken. For example, with NTFS I can create
a file named "\ud800b\ud800a\ud800d", but trying this in a VirtualBox
shared folder fails because the VBoxSF filesystem can't transcode the
name to its internal UTF-8 encoding.

Thus I don't think supporting invalid surrogates should be a deciding
factor in favor of UTF-16, which I think is an impractical choice.
Bytes coming from files, databases, and the network are likely to be
either UTF-8 or some legacy encoding, so the practical choice is
between ANSI/OEM and UTF-8. The reliable choice is UTF-8.
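[Editor's note: the lone-surrogate problem described above is visible from pure Python. This sketch, added by the editor, shows why strict UTF-8 cannot represent such names while the 'surrogatepass' error handler can.]

```python
# A name containing a lone (unpaired) surrogate, of the kind NTFS will
# happily store but which is not a valid Unicode string.
name = "a\ud800b"

try:
    name.encode("utf-8")             # strict: rejects the lone surrogate
    raised = False
except UnicodeEncodeError:
    raised = True
assert raised

# 'surrogatepass' encodes it anyway, and the bytes round-trip exactly,
# which is what a bytes representation of such names would rely on.
raw = name.encode("utf-8", "surrogatepass")
assert raw.decode("utf-8", "surrogatepass") == name
```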
Using UTF-8 for bytes paths can be adopted at first in 3.6 as an option that gets enabled via an environment variable. If it's not enabled or explicitly disabled, show a visible warning (i.e. not requiring -Wall) that legacy bytes paths are deprecated. In 3.7 UTF-8 can become the default, but the same environment variable should allow opting out to use the legacy encoding. The infrastructure put in place to support this should be able to work either way. Victor, I haven't checked Steve's patch yet in issue 27781, but making this change should largely simplify the Windows support code in many cases, as the bytes path conversion can be centralized, and relatively few functions return strings that need to be encoded back as bytes. posixmodule.c will no longer need separate code paths that call *A functions, e.g.: CreateFileA, CreateDirectoryA, CreateHardLinkA, CreateSymbolicLinkA, DeleteFileA, RemoveDirectoryA, FindFirstFileA, MoveFileExA, GetFileAttributesA, GetFileAttributesExA, SetFileAttributesA, GetCurrentDirectoryA, SetCurrentDirectoryA, SetEnvironmentVariableA, ShellExecuteA From ncoghlan at gmail.com Wed Aug 17 03:18:36 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 17 Aug 2016 17:18:36 +1000 Subject: [Python-ideas] Allow manual creation of DirEntry objects In-Reply-To: References: <5F6A858FD00E5F4A82E3206D2D854EF8B54E2368@EXMB09.ohsu.edu> Message-ID: On 17 August 2016 at 09:56, Victor Stinner wrote: > 2016-08-17 1:50 GMT+02:00 Guido van Rossum : >> We could expose the class with a >> constructor that always fails (the C code could construct instances through >> a backdoor). > > Oh, in fact you cannot create an instance of os.DirEntry, it has no > (Python) constructor: > > $ ./python > Python 3.6.0a4+ (default:e615718a6455+, Aug 17 2016, 00:12:17) >>>> import os >>>> os.DirEntry(1) > Traceback (most recent call last): > File "", line 1, in > TypeError: cannot create 'posix.DirEntry' instances > > Only os.scandir() can produce such objects. 
>
> The question is still if it makes sense to allow to create DirEntry
> objects in Python :-)

I think it does, as it isn't really any different from someone calling
the stat() method on a DirEntry instance created by os.scandir(). It
also prevents folks attempting things like:

    def slow_constructor(dirname, entryname):
        for entry in os.scandir(dirname):
            if entry.name == entryname:
                entry.stat()
                return entry

Allowing DirEntry construction from Python further gives us a
straightforward answer to the "stat caching" question: "just use
os.DirEntry instances and call stat() to make the snapshot"

If folks ask why os.DirEntry caches results when pathlib.Path doesn't,
we have the answer that cache invalidation is a hard problem, and
hence we consider it useful in the lower level interface that is
optimised for speed, but problematic in the higher level one that is
more focused on cross-platform correctness of filesystem interactions.

I don't know whether it would make sense to allow a pre-existing stat
result to be passed to DirEntry, but it does seem like it might be
useful for adapting existing stat-based backend APIs to a more user
friendly DirEntry based front end API.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From turnbull.stephen.fw at u.tsukuba.ac.jp  Wed Aug 17 05:35:36 2016
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J.
Turnbull) Date: Wed, 17 Aug 2016 18:35:36 +0900 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> Message-ID: <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> Paul Moore writes: > On 16 August 2016 at 16:56, Steve Dower wrote: > > This discussion is for the developers who insist on using bytes > > for paths within Python, and the question is, "how do we best > > represent UTF-16 encoded paths in bytes?" That's incomplete, AFAICS. (Paul makes this point somewhat differently.) We don't want to represent paths in bytes on Windows if we can avoid it. Nor does UTF-16 really enter into it (except for the technical issue of invalid surrogate pairs). So a full statement is, "How do we best represent Windows file system paths in bytes for interoperability with systems that natively represent paths in bytes?" ("Other systems" refers to both other platforms and existing programs on Windows.) BTW, why "surrogate pairs"? Does Windows validate surrogates to ensure they come in pairs, but not necessarily in the right order (or perhaps sometimes they resolve to non-characters such as U+1FFFF)? Paul says: > People passing bytes to open() have in my view, already chosen not > to follow the standard advice of "decode incoming data at the > boundaries of your application". They may have good reasons for > that, but it's perfectly reasonable to expect them to take > responsibility for manually tracking the encoding of the resulting > bytes values flowing through their code. 
Abstractly true, but in practice there's no such need for those who made the choice! In a properly set up POSIX locale[1], it Just Works by design, especially if you use UTF-8 as the preferred encoding. It's Windows developers and users who suffer, not those who wrote the code, nor their primary audience which uses POSIX platforms. > It is of course, also true that "works for me in my environment" is > a viable strategy - but the maintenance cost of this strategy if > things change (whether in Python, or in the environment) is on the > application developers - they are hoping that cost is minimal, but > that's a risk they choose to take. Nick's point is that the risk is on Windows users and developers for the Windows platform who did *not* make that choice, but rather had it made for them by developers on a different platform where it Just Works. He argues that we should level the playing field. It's also relevant that those developers on the originating platform for the code typically resist complexifying changes to make things work on other platforms too (cf. Victor's advocacy of removing the bytes APIs on Windows). Victor's points are good IMO; he's not just resisting Windows, there are real resource consequences. > Code using Unicode is unaffected, certainly. Ideally that means that > only a tiny minority of users should be affected. Are we over-reacting > to reports of standard practices in Japan? I've no idea. AFAIK, India and Southeast Asia have already abandoned their indigenous standards in favor of Unicode/UTF-8, so it doesn't matter if they use str or bytes, either way Steve's proposal will Just Work. I don't know anything about Arabic, Hebrew, Cyrillic, and Eastern Europeans. That leaves China, which is like Japan in having had a practically universal encoding (ie, every script you'll actually see roundtrips, emoji being the only practical issue) since the 1970s. 
So I suspect Chinese also primarily use their local code page (GB2312
or GB18030) for plain text documents, possibly including .ini and
Makefiles.

Over-reaction?  I have no idea either.  Just a potentially widespread
risk, both to users and to Python's reputation for maintaining
compatibility.  (I don't think it's "fair", but among my acquaintances
Python has a poor rep -- Steve's argument that if you develop code for
3.5 you should expect to have to modify it to use it with 3.6 cuts no
ice with them.)

 > > If you see an alternative choice to those listed above, feel free
 > > to contribute it. Otherwise, can we focus the discussion on these
 > > (or any new) choices?
 >
 > Accept that we should have deprecated builtin open and the io module,
 > but didn't do so. Extend the existing deprecation of bytes paths on
 > Windows, to cover *all* APIs, not just the os module, but modify the
 > deprecation to be "use of the Windows CP_ACP code page (via the ...A
 > Win32 APIs) is deprecated and will be replaced with use of UTF-8 as
 > the implied encoding for all bytes paths on Windows starting in Python
 > 3.7". Document and publicise it much more prominently, as it is a
 > breaking change. Then leave it one release for people to prepare for
 > the change.

I like this one!  If my paranoid fears are realized, in practice it
might have to wait two releases, but at least this announcement should
get people who are at risk to speak up.  If they don't, then you can
just call me "Chicken Little" and go ahead!

Footnotes:
[1]  An oxymoron, but there you go.
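[Editor's note: the POSIX "Just Works by design" behaviour referred to above rests on the 'surrogateescape' error handler of PEP 383: undecodable bytes are smuggled into str as lone surrogates and restored exactly on encoding. A minimal sketch, added by the editor:]

```python
# A byte string that is not valid UTF-8, e.g. a Latin-1 filename
# encountered on a UTF-8 POSIX system.
raw = b"caf\xe9.txt"

# 'surrogateescape' maps each undecodable byte to a lone surrogate...
text = raw.decode("utf-8", "surrogateescape")
assert "\udce9" in text

# ...and encoding with the same handler restores the original bytes,
# so arbitrary byte paths survive a bytes -> str -> bytes round trip.
assert text.encode("utf-8", "surrogateescape") == raw
```

This is why code that decodes at the boundary loses nothing on POSIX, even for filenames that were never valid in the locale's encoding.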
From eryksun at gmail.com Wed Aug 17 09:37:32 2016 From: eryksun at gmail.com (eryk sun) Date: Wed, 17 Aug 2016 13:37:32 +0000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> Message-ID: On Wed, Aug 17, 2016 at 9:35 AM, Stephen J. Turnbull wrote: > BTW, why "surrogate pairs"? Does Windows validate surrogates to > ensure they come in pairs, but not necessarily in the right order (or > perhaps sometimes they resolve to non-characters such as U+1FFFF)? A program can pass the filesystem a name containing one or more surrogate codes that isn't in a valid UTF-16 surrogate pair (i.e. a leading code in the range D800-DBFF followed by a trailing code in the range DC00-DFFF). In the user-mode runtime library and kernel executive, nothing up to the filesystem driver checks for a valid UTF-16 string. Microsoft's filesystems remain compatible with UCS2 from the 90s and don't care that the name isn't legal UTF-16. The same goes for the in-memory filesystems used for named pipes (NPFS, \\.\pipe) and mailslots (MSFS, \\.\mailslot). But non-Microsoft filesystems don't necessarily store names as wide-character strings. They may use UTF-8, in which case an invalid UTF-16 name will cause the system call to fail because it's an invalid parameter. 
If the filesystem allows creating such a badly named file or directory, it can still be accessed using a regular unicode path, which is how things stand currently. I see that Victor has suggested using "surrogatepass" in issue 27781. That would allow seamless operation. The downside is that bytes have a higher chance of leaking out of Python than strings created by 'surrogateescape' on Unix. But since it isn't a proper Unicode string on disk, at least nothing has changed substantively by transcoding to "surrogatepass" UTF-8. From guido at python.org Wed Aug 17 10:20:24 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 17 Aug 2016 07:20:24 -0700 Subject: [Python-ideas] Allow manual creation of DirEntry objects In-Reply-To: References: <5F6A858FD00E5F4A82E3206D2D854EF8B54E2368@EXMB09.ohsu.edu> Message-ID: Brendan, The conclusion is that you should just file a bug asking for a working constructor -- or upload a patch if you want to. --Guido On Wed, Aug 17, 2016 at 12:18 AM, Nick Coghlan wrote: > On 17 August 2016 at 09:56, Victor Stinner > wrote: > > 2016-08-17 1:50 GMT+02:00 Guido van Rossum : > >> We could expose the class with a > >> constructor that always fails (the C code could construct instances > through > >> a backdoor). > > > > Oh, in fact you cannot create an instance of os.DirEntry, it has no > > (Python) constructor: > > > > $ ./python > > Python 3.6.0a4+ (default:e615718a6455+, Aug 17 2016, 00:12:17) > >>>> import os > >>>> os.DirEntry(1) > > Traceback (most recent call last): > > File "", line 1, in > > TypeError: cannot create 'posix.DirEntry' instances > > > > Only os.scandir() can produce such objects. > > > > The question is still if it makes sense to allow to create DirEntry > > objects in Python :-) > > I think it does, as it isn't really any different from someone calling > the stat() method on a DirEntry instance created by os.scandir(). 
It > also prevents folks attempting things like: > > def slow_constructor(dirname, entryname): > for entry in os.scandir(dirname): > if entry.name == entryname: > entry.stat() > return entry > > Allowing DirEntry construction from Python further gives us a > straightforward answer to the "stat caching" question: "just use > os.DirEntry instances and call stat() to make the snapshot" > > If folks ask why os.DirEntry caches results when pathlib.Path doesn't, > we have the answer that cache invalidation is a hard problem, and > hence we consider it useful in the lower level interface that is > optimised for speed, but problematic in the higher level one that is > more focused on cross-platform correctness of filesystem interactions. > > I don't know whether it would make sense to allow a pre-existing stat > result to be based to DirEntry, but it does seem like it might be > useful for adapting existing stat-based backend APIs to a more user > friendly DirEntry based front end API. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve.dower at python.org Wed Aug 17 11:33:20 2016 From: steve.dower at python.org (Steve Dower) Date: Wed, 17 Aug 2016 08:33:20 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> Message-ID: <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> On 17Aug2016 0235, Stephen J. Turnbull wrote: > Paul Moore writes: > > On 16 August 2016 at 16:56, Steve Dower wrote: > > > > This discussion is for the developers who insist on using bytes > > > for paths within Python, and the question is, "how do we best > > > represent UTF-16 encoded paths in bytes?" > > That's incomplete, AFAICS. (Paul makes this point somewhat > differently.) We don't want to represent paths in bytes on Windows if > we can avoid it. Nor does UTF-16 really enter into it (except for the > technical issue of invalid surrogate pairs). So a full statement is, > "How do we best represent Windows file system paths in bytes for > interoperability with systems that natively represent paths in bytes?" > ("Other systems" refers to both other platforms and existing programs > on Windows.) That's incorrect, or at least possible to interpret correctly as the wrong thing. The goal is "code compatibility with systems ...", not interoperability. 
Nothing about this will make it easier to take a path from Windows and use it on Linux or vice versa, but it will make it easier/more reliable to take code that uses paths on Linux and use it on Windows. > BTW, why "surrogate pairs"? Does Windows validate surrogates to > ensure they come in pairs, but not necessarily in the right order (or > perhaps sometimes they resolve to non-characters such as U+1FFFF)? Eryk answered this better than I would have. > Paul says: > > > People passing bytes to open() have in my view, already chosen not > > to follow the standard advice of "decode incoming data at the > > boundaries of your application". They may have good reasons for > > that, but it's perfectly reasonable to expect them to take > > responsibility for manually tracking the encoding of the resulting > > bytes values flowing through their code. > > Abstractly true, but in practice there's no such need for those who > made the choice! In a properly set up POSIX locale[1], it Just Works by > design, especially if you use UTF-8 as the preferred encoding. It's > Windows developers and users who suffer, not those who wrote the code, > nor their primary audience which uses POSIX platforms. You mentioned "locale", "preferred" and "encoding" in the same sentence, so I hope you're not thinking of locale.getpreferredencoding()? Changing that function is orthogonal to this discussion, despite the fact that in most cases it returns the same code page as what is going to be used by the file system functions (which in most cases will also be used by the encoding returned from sys.getfilesystemencoding()). When Windows developers and users suffer, I see it as my responsibility to reduce that suffering. Changing Python on Windows should do that without affecting developers on Linux, even though the Right Way is to change all the developers on Linux to use str for paths. > > > If you see an alternative choice to those listed above, feel free > > > to contribute it. 
Otherwise, can we focus the discussion on these > > > (or any new) choices? > > > > Accept that we should have deprecated builtin open and the io module, > > but didn't do so. Extend the existing deprecation of bytes paths on > > Windows, to cover *all* APIs, not just the os module, But modify the > > deprecation to be "use of the Windows CP_ACP code page (via the ...A > > Win32 APIs) is deprecated and will be replaced with use of UTF-8 as > > the implied encoding for all bytes paths on Windows starting in Python > > 3.7". Document and publicise it much more prominently, as it is a > > breaking change. Then leave it one release for people to prepare for > > the change. > > I like this one! If my paranoid fears are realized, in practice it > might have to wait two releases, but at least this announcement should > get people who are at risk to speak up. If they don't, then you can > just call me "Chicken Little" and go ahead! I don't think there's any reasonable way to noisily deprecate these functions within Python, but certainly the docs can be made clearer. People who explicitly encode with sys.getfilesystemencoding() should not get the deprecation message, but we can't tell whether they got their bytes from the right encoding or a RNG, so there's no way to discriminate. I'm going to put together a summary post here (hopefully today) and get those who have been contributing to basically sign off on it, then I'll take it to python-dev. The possible outcomes I'll propose will basically be "do we keep the status quo, undeprecate and change the functionality, deprecate the deprecation and undeprecate/change in a couple releases, or say that it wasn't a real deprecation so we can deprecate and then change functionality in a couple releases". 
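As background on the deprecation mechanics discussed above: a DeprecationWarning raised inside a library is silent by default, which is why "noisy" deprecation is hard. A generic sketch of how such a warning surfaces (the open_bytes_path wrapper below is hypothetical, not CPython's actual implementation):

```python
import warnings

def open_bytes_path(path):
    # Hypothetical wrapper: warn when a bytes path is used.
    if isinstance(path, bytes):
        warnings.warn("bytes paths are deprecated on Windows",
                      DeprecationWarning, stacklevel=2)
    return path  # a real implementation would delegate to os/io here

# Silent by default; a filter is needed to observe the warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    open_bytes_path(b"spam.txt")

assert len(caught) == 1
assert caught[0].category is DeprecationWarning
```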
Cheers, Steve From ncoghlan at gmail.com Wed Aug 17 12:01:20 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 18 Aug 2016 02:01:20 +1000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22446.54896.974310.64547@turnbull.sk.tsukuba.ac.jp> <1471091015.186990.694264945.485144DD@webmail.messagingengine.com> <22447.28801.926296.534902@turnbull.sk.tsukuba.ac.jp> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> Message-ID: On 17 August 2016 at 02:06, Chris Barker wrote: > Just to make sure this is clear, the Pragmatic logic is thus: > > * There are more *nix-centric developers in the Python ecosystem than > Windows-centric (or even Windows-agnostic) developers. > > * The bytes path approach works fine on *nix systems. For the given value of "works fine" that is "works fine, except when it doesn't, and then you end up with mojibake". > * Whatever might be Right and Just -- the reality is that a number of > projects, including important and widely used libraries and frameworks, use > the bytes API for working with filenames and paths, etc. > > Therefore, there is a lot of code that does not work right on Windows. > > Currently, to get it to work right on Windows, you need to write Windows > specific code, which many folks don't want or know how to do (or just can't > support one way or the other). > > So the Solution is to either: > > (A) get everyone to use Unicode "properly", which will work on all > platforms (but only on py3.5 and above?) > > or > > (B) kludge some *nix-compatible support for byte paths into Windows, that > will work at least much of the time. 
> > It's clear (to me at least) that (A) is the "Right Thing", but real world > experience has shown that it's unlikely to happen any time soon. > > Practicality beats Purity and all that -- this is a judgment call. > > Have I got that right? Yep, pretty much. Based on Stephen Turnbull's concerns, I wonder if we could make a whitelist of universal encodings that Python-on-Windows will use in preference to UTF-8 if they're configured as the current code page. If we accepted GB18030, GB2312, Shift-JIS, and ISO-2022-* as overrides, then problems would be significantly less likely. Another alternative would be to apply a similar solution as we do on Linux with regards to the "surrogateescape" error handler: there are some interfaces (like the standard streams) where we only enable that error handler specifically if the preferred encoding is reported as ASCII. In 2016, we're *very* skeptical about any properly configured system actually being ASCII-only (rather than that value showing up because the POSIX standards mandate it as the default), so we don't really believe the OS when it tells us that. The equivalent for Windows would be to disbelieve the configured code page only when it was reported as "mbcs" - for folks that had configured their system to use something other than the default, Python would believe them, just as we do on Linux. Cheers, Nick. 
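Nick's reference to the "surrogateescape" error handler can be illustrated concretely: undecodable bytes are smuggled through str as lone surrogates and restored byte-for-byte on encoding. A minimal sketch using an arbitrary Latin-1 byte:

```python
# A Latin-1 encoded name that is invalid as UTF-8.
raw = b"caf\xe9"

# Decoding with surrogateescape maps the bad byte 0xE9 to U+DCE9.
name = raw.decode("utf-8", errors="surrogateescape")
assert name == "caf\udce9"

# Encoding with the same handler restores the original bytes exactly.
assert name.encode("utf-8", errors="surrogateescape") == raw
```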
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve.dower at python.org Wed Aug 17 12:38:10 2016 From: steve.dower at python.org (Steve Dower) Date: Wed, 17 Aug 2016 09:38:10 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> Message-ID: <55cafbcb-7ead-6742-46ff-f6c5314920d7@python.org> On 17Aug2016 0901, Nick Coghlan wrote: > On 17 August 2016 at 02:06, Chris Barker wrote: >> So the Solution is to either: >> >> (A) get everyone to use Unicode "properly", which will work on all >> platforms (but only on py3.5 and above?) >> >> or >> >> (B) kludge some *nix-compatible support for byte paths into Windows, that >> will work at least much of the time. >> >> It's clear (to me at least) that (A) is the "Right Thing", but real world >> experience has shown that it's unlikely to happen any time soon. >> >> Practicality beats Purity and all that -- this is a judgment call. >> >> Have I got that right? > > Yep, pretty much. Based on Stephen Turnbull's concerns, I wonder if we > could make a whitelist of universal encodings that Python-on-Windows > will use in preference to UTF-8 if they're configured as the current > code page. If we accepted GB18030, GB2312, Shift-JIS, and ISO-2022-* > as overrides, then problems would be significantly less likely. 
> > Another alternative would be to apply a similar solution as we do on > Linux with regards to the "surrogateescape" error handler: there are > some interfaces (like the standard streams) where we only enable that > error handler specifically if the preferred encoding is reported as > ASCII. In 2016, we're *very* skeptical about any properly configured > system actually being ASCII-only (rather than that value showing up > because the POSIX standards mandate it as the default), so we don't > really believe the OS when it tells us that. > > The equivalent for Windows would be to disbelieve the configured code > page only when it was reported as "mbcs" - for folks that had > configured their system to use something other than the default, > Python would believe them, just as we do on Linux. The problem here is that "mbcs" is not configurable - it's a meta-encoder that uses whatever is configured as the "language (system locale) to use when displaying text in programs that do not support Unicode" (quote from the dialog where administrators can configure this). So there's nothing to disbelieve here. And even on machines where the current code page is "reliable", UTF-16 is still the actual encoding, which means UTF-8 is still a better choice for representing the path as a blob of bytes. Currently we have inconsistent encoding between different Windows machines and could either remove that inconsistency completely or simply reduce it for (approx.) English speakers. I would rather an extreme here - either make it consistent regardless of user configuration, or make it so broken that nobody can use it at all. (And note that the correct way to support *some* other FS encodings would be to change the return value from sys.getfilesystemencoding(), which breaks people who currently ignore that just as badly as changing it to utf-8 would.) 
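The information loss Steve describes is easy to reproduce on any platform. In this sketch, cp1252 stands in for a typical Windows active code page, since the "mbcs" codec itself only exists on Windows:

```python
name = "test\uab00.txt"

# A legacy code page cannot represent U+AB00, so encoding is lossy:
lossy = name.encode("cp1252", errors="replace")
assert lossy == b"test?.txt"  # the character is silently replaced

# UTF-8 round-trips every Unicode character without loss:
assert name.encode("utf-8").decode("utf-8") == name
```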
Cheers, Steve From storchaka at gmail.com Wed Aug 17 14:40:40 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 17 Aug 2016 21:40:40 +0300 Subject: [Python-ideas] Allow manual creation of DirEntry objects In-Reply-To: <5F6A858FD00E5F4A82E3206D2D854EF8B54E2368@EXMB09.ohsu.edu> References: <5F6A858FD00E5F4A82E3206D2D854EF8B54E2368@EXMB09.ohsu.edu> Message-ID: On 16.08.16 22:35, Brendan Moloney wrote: > I have a bunch of functions that operate on DirEntry objects, typically > doing some sort of filtering > to select the paths I actually want to process. The overwhelming > majority of the time these functions > are going to be operating on DirEntry objects produced by the scandir > function, but there are some > cases where the user will be supplying the path themselves (for example, > the root of a directory tree > to process). In my current code base that uses the scandir package I > just wrap these paths in a > 'GenericDirEntry' object and then pass them through the filter functions > the same as any results > coming from the scandir function. You can just create an object that duck-types DirEntry. See for example _DummyDirEntry in the os module. From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Aug 17 22:32:30 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. 
Turnbull) Date: Thu, 18 Aug 2016 11:32:30 +0900 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> Message-ID: <22453.7742.978788.404595@turnbull.sk.tsukuba.ac.jp> eryk sun writes: > On Wed, Aug 17, 2016 at 9:35 AM, Stephen J. Turnbull > wrote: > > BTW, why "surrogate pairs"? Does Windows validate surrogates to > > ensure they come in pairs, but not necessarily in the right order (or > > perhaps sometimes they resolve to non-characters such as U+1FFFF)? > > Microsoft's filesystems remain compatible with UCS2 So it's not just invalid surrogate *pairs*, it's invalid surrogates of all kinds. This means that it's theoretically possible (though I gather that it's unlikely in the extreme) for a real Windows filename to be indistinguishable from one generated by Python's surrogateescape handler. What happens when Python's directory manipulation functions on Windows encounter such a filename? Do they try to write it to the disk directory? Do they succeed? Does that depend on surrogateescape? Is there a reason in practice to allow surrogateescape at all on names in Windows filesystems, at least when using the *W API? You mention non-Microsoft filesystems; are they common enough to matter? I admit that as we converge on sanity (UTF-8 for text/* content, some kind of Unicode for filesystem names) none of this is very likely to matter, but I'm a worrywart.... 
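One way to frame Stephen's question: a str produced by surrogateescape necessarily contains lone surrogates, which a strict UTF-8 encode refuses. A hypothetical helper to test for that (not part of any stdlib API):

```python
def has_lone_surrogates(name: str) -> bool:
    # Strict UTF-8 rejects unpaired surrogate code points, so a
    # failed encode is a reliable indicator.
    try:
        name.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False

assert not has_lone_surrogates("test\uab00.txt")  # ordinary name
assert has_lone_surrogates("bad\udc80name")       # escaped byte 0x80
```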
Steve From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Aug 17 22:42:54 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Thu, 18 Aug 2016 11:42:54 +0900 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> Message-ID: <22453.8366.592660.590478@turnbull.sk.tsukuba.ac.jp> Steve Dower writes: > On 17Aug2016 0235, Stephen J. Turnbull wrote: > > So a full statement is, "How do we best represent Windows file > > system paths in bytes for interoperability with systems that > > natively represent paths in bytes?" ("Other systems" refers to > > both other platforms and existing programs on Windows.) > > That's incorrect, or at least possible to interpret correctly as > the wrong thing. The goal is "code compatibility with systems ...", > not interoperability. You're right, I stated that incorrectly. I don't have anything to add to your corrected version. > > In a properly set up POSIX locale[1], it Just Works by design, > > especially if you use UTF-8 as the preferred encoding. It's > > Windows developers and users who suffer, not those who wrote the > > code, nor their primary audience which uses POSIX platforms. > > You mentioned "locale", "preferred" and "encoding" in the same sentence, > so I hope you're not thinking of locale.getpreferredencoding()? 
Changing > that function is orthogonal to this discussion, You consistently ignore Makefiles, .ini, etc. It is *not* orthogonal, it is *the* reason for all opposition to your proposal or request that it be delayed. Filesystem names *are* text in part because they are *used as filenames in text*. > When Windows developers and users suffer, I see it as my responsibility > to reduce that suffering. Changing Python on Windows should do that > without affecting developers on Linux, even though the Right Way is to > change all the developers on Linux to use str for paths. I resent that. If I were a partisan Linux fanboy, I'd be cheering you on because I think your proposal is going to hurt an identifiable and large class of *Windows* users. I know about and fear this possibility because they use a language I love (Japanese) and an encoding I hate but have achieved a state of peaceful coexistence with (Shift JIS). And on the general principle, *I* don't disagree. I mentioned earlier that I use only the str interfaces in my own code on Linux and Mac OS X, and that I suspect that there are no real efficiency implications to using str rather than bytes for those interfaces. On the other hand, the programming convenience of reading the occasional "text" filename (or other text, such as XML tags) out of a binary stream and passing it directly to filesystem APIs cannot be denied. I think that the kind of usage you propose (a fixed, universal codec, universally accepted; ie, 'utf-8') is the best way to handle that in the long run. But as Grandmaster Lasker said, "Before the end game, the gods have placed the middle game." (Lord Keynes isn't relevant here, Python will outlive all of us. :-) > I don't think there's any reasonable way to noisily deprecate these > functions within Python, but certainly the docs can be made > clearer. 
People who explicitly encode with > sys.getfilesystemencoding() should not get the deprecation message, > but we can't tell whether they got their bytes from the right > encoding or a RNG, so there's no way to discriminate. I agree with you within Python; the custom is for DeprecationWarnings to be silent by default. As for "making noise", how about announcing the deprecation as like the top headline for 3.6, postponing the actual change to 3.7, and in the meantime you and Nick do a keynote duet at PyCon? (Your partner could be Guido, too, but Nick has been the most articulate proponent for this particular aspect of "inclusion". I think having a representative from the POSIX world explaining the importance of this for "all of us" would greatly multiply the impact.) Perhaps, given my proposed timing, a discussion at the language summit in '17 and the keynote in '18 would be the best timing. (OT, political: I've been strongly influenced in this proposal by recently reading http://blog.aurynn.com/contempt-culture. There's not as much of it in Python as in other communities I'm involved in, but I think this would be a good symbolic opportunity to express our opposition to it. "Inclusion" isn't just about gender and race!) > I'm going to put together a summary post here (hopefully today) and get > those who have been contributing to basically sign off on it, then I'll > take it to python-dev. The possible outcomes I'll propose will basically > be "do we keep the status quo, undeprecate and change the functionality, > deprecate the deprecation and undeprecate/change in a couple releases, > or say that it wasn't a real deprecation so we can deprecate and then > change functionality in a couple releases". FWIW, of those four, I dislike 'status quo' the most, and like 'say it wasn't real, deprecate and change' the best. 
Although I lean toward phrasing that as "we deprecated it, but we realize that practitioners are by and large not aware of the deprecation, and nobody expects the Spanish Inquisition". @Nick, if you're watching: I wonder if it would be possible to expand the "in the file system, bytes are UTF-8" proposal to POSIX as well, perhaps for 3.8? From eryksun at gmail.com Thu Aug 18 03:27:35 2016 From: eryksun at gmail.com (eryk sun) Date: Thu, 18 Aug 2016 07:27:35 +0000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <22453.7742.978788.404595@turnbull.sk.tsukuba.ac.jp> References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <22453.7742.978788.404595@turnbull.sk.tsukuba.ac.jp> Message-ID: On Thu, Aug 18, 2016 at 2:32 AM, Stephen J. Turnbull wrote: > > So it's not just invalid surrogate *pairs*, it's invalid surrogates of > all kinds. This means that it's theoretically possible (though I > gather that it's unlikely in the extreme) for a real Windows filename > to be indistinguishable from one generated by Python's surrogateescape > handler. Absolutely if the filesystem is one of Microsoft's such as NTFS, FAT32, exFAT, ReFS, NPFS (named pipes), MSFS (mailslots) -- and I'm pretty sure it's also possible with CDFS and UDFS. UDF allows any Unicode character except NUL. > What happens when Python's directory manipulation functions on Windows > encounter such a filename? Do they try to write it to the disk > directory? Do they succeed? Does that depend on surrogateescape? 
Python allows these 'Unicode' (but not strictly UTF compatible) strings, so it doesn't have a problem with such filenames, as long as it's calling the Windows wide-character APIs. > Is there a reason in practice to allow surrogateescape at all on names > in Windows filesystems, at least when using the *W API? You mention > non-Microsoft filesystems; are they common enough to matter? Previously I gave an example with a VirtualBox shared folder, which rejects names with invalid surrogates. I don't know how common that is in general. I typically switch between 2 guests on a Linux host and share folders between systems. In Windows I mount shared folders as directory symlinks in C:\Mount. I just tested another example that led to different results. Ext2Fsd is a free ext2/ext3 filesystem driver for Windows. I mounted an ext2 disk in Windows 10. Next, in Python I created a file named "\udc00b\udc00a\udc00d" in the root directory. Ext2Fsd defaults to using UTF-8 as the drive codepage, so I expected it to reject this filename, just like VBoxSF does. But it worked: >>> os.listdir('.')[-1] '\udc00b\udc00a\udc00d' As expected the ANSI API substitutes question marks for the surrogate codes: >>> os.listdir(b'.')[-1] b'?b?a?d' So what did Ext2Fsd write in this supposedly UTF-8 filesystem? I mounted the disk in Linux to check: >>> os.listdir(b'.')[-1] b'\xed\xb0\x80b\xed\xb0\x80a\xed\xb0\x80d' It blindly encoded the surrogate codes, creating invalid UTF-8. I think it's called WTF-8 (Wobbly Transformation Format). The file manager in Linux displays this file as "???b???a???d (invalid encoding)", and ls prints "???b???a???d". 
Python uses its surrogateescape error handler: >>> os.listdir('.')[-1] '\udced\udcb0\udc80b\udced\udcb0\udc80a\udced\udcb0\udc80d' The original name can be decoded using the surrogatepass error handler: >>> os.listdir(b'.')[-1].decode(errors='surrogatepass') '\udc00b\udc00a\udc00d' From steve.dower at python.org Thu Aug 18 09:23:16 2016 From: steve.dower at python.org (Steve Dower) Date: Thu, 18 Aug 2016 06:23:16 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <22453.8366.592660.590478@turnbull.sk.tsukuba.ac.jp> References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> <22453.8366.592660.590478@turnbull.sk.tsukuba.ac.jp> Message-ID: "You consistently ignore Makefiles, .ini, etc." Do people really do open('makefile', 'rb'), extract filenames and try to use them without ever decoding the file contents? I've honestly never seen that, and it certainly looks like the sort of thing Python 3 was intended to discourage. (As soon as you open(..., 'r') you're only affected by this change if you explicitly encode again with mbcs.) Top-posted from my Windows Phone -----Original Message----- From: "Stephen J. Turnbull" Sent: ?8/?17/?2016 19:43 To: "Steve Dower" Cc: "Paul Moore" ; "Python-Ideas" Subject: Re: [Python-ideas] Fix default encodings on Windows -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve.dower at python.org Thu Aug 18 11:25:33 2016 From: steve.dower at python.org (Steve Dower) Date: Thu, 18 Aug 2016 08:25:33 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> References: <57AB6E2D.6050704@python.org> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> Message-ID: Summary for python-dev. This is the email I'm proposing to take over to the main mailing list to get some actual decisions made. As I don't agree with some of the possible recommendations, I want to make sure that they're represented fairly. I also want to summarise the background leading to why we should consider making a change here at all, rather than simply leaving it alone. There's a chance this will all make its way into a PEP, depending on how controversial the core team thinks this is. Please let me know if you think I've misrepresented (or unfairly represented) any of the positions, or if you think I can simplify/clarify anything in here. Please don't treat this like a PEP review - it's just going to be an email to python-dev - but the more we can avoid having the discussions there we've already had here the better. Cheers, Steve --- Background ========== File system paths are almost universally represented as text in some encoding determined by the file system. In Python, we expose these paths via a number of interfaces, such as the os and io modules. 
Paths may be passed either direction across these interfaces, that is, from the filesystem to the application (for example, os.listdir()), or from the application to the filesystem (for example, os.unlink()). When paths are passed between the filesystem and the application, they are either passed through as a bytes blob or converted to/from str using sys.getfilesystemencoding(). The result of encoding a string with sys.getfilesystemencoding() is a blob of bytes in the native format for the default file system. On Windows, the native format for the filesystem is utf-16-le. The recommended platform APIs for accessing the filesystem all accept and return text encoded in this format. However, prior to Windows NT (and possibly further back), the native format was a configurable machine option and a separate set of APIs existed to accept this format. The option (the "active code page") and these APIs (the "*A functions") still exist in recent versions of Windows for backwards compatibility, though new functionality often only has a utf-16-le API (the "*W functions"). In Python, we recommend using str as the default format on Windows because it can correctly round-trip all the characters representable in utf-16-le. Our support for bytes explicitly uses the *A functions and hence the encoding for the bytes is "whatever the active code page is". Since the active code page cannot represent all Unicode characters, the conversion of a path into bytes can lose information without warning. As a demonstration of this: >>> open('test\uAB00.txt', 'wb').close() >>> import glob >>> glob.glob('test*') ['test\uab00.txt'] >>> glob.glob(b'test*') [b'test?.txt'] The Unicode character in the second call to glob is missing information. You can observe the same results in os.listdir() or any function that matches its result type to the parameter type. Why is this a problem? 
====================== While the obvious and correct answer is to just use str everywhere, it remains well known that on Linux and MacOS it is perfectly okay to use bytes when taking values from the filesystem and passing them back. Doing so also avoids the cost of decoding and reencoding, such that (theoretically) code like below should be faster because of the `b'.'`: >>> for f in os.listdir(b'.'): ... os.stat(f) ... On Windows, if a filename exists that cannot be encoded with the active code page, you will receive an error from the above code. These errors are why in Python 3.3 the use of bytes paths on Windows was deprecated (listed in the What's New, but not clearly obvious in the documentation - more on this later). The above code produces multiple deprecation warnings in 3.3, 3.4 and 3.5 on Windows. However, we still keep seeing libraries use bytes paths, which can cause unexpected issues on Windows. Given that the current approach of quietly recommending that library developers either write their code twice (once for bytes and once for str) or use str exclusively is not working, we should consider alternative mitigations. Proposals ========= There are two dimensions here - the fix and the timing. We can basically choose any fix and any timing. The main differences between the fixes are the balance between incorrect behaviour and backwards-incompatible behaviour. The main issue with respect to timing is whether or not we believe using bytes as paths on Windows was correctly deprecated in 3.3 and sufficiently advertised since to allow us to change the behaviour in 3.6. Fixes ----- Fix #1: Change sys.getfilesystemencoding() to utf-8 on Windows Currently the default filesystem encoding is 'mbcs', which is a meta-encoder that uses the active code page. In reality, our implementation uses the *A APIs and we don't explicitly decode bytes in order to pass them to the filesystem. 
This allows the OS to quietly replace invalid characters (the equivalent of 'mbcs:replace'). This proposal would remove all use of the *A APIs and only ever call the *W APIs. When paths are returned to Python as str, they will be decoded from utf-16-le. When paths are to be returned as bytes, we would transcode from utf-16-le to utf-8 using surrogatepass. Equally, when paths are provided as bytes, they are transcoded from utf-8 to utf-16-le and passed to the *W APIs. The choice of utf-8 is to ensure the ability to round-trip, while also allowing basic manipulation of paths as bytes (basically, locating and slicing at '\' characters). It is debated, but I believe this is not a backwards compatibility issue because: * byte paths in Python are specified as being encoded by sys.getfilesystemencoding() * byte paths on Windows have been deprecated for three versions Unfortunately, the deprecation is not explicitly called out anywhere in the docs apart from the What's New page, so there is an argument that it shouldn't be counted despite the warnings in the interpreter. However, this is more directly addressed in the discussion of timing below. Equally, sys.getfilesystemencoding() documents the specific return values for various platforms, as well as that it is part of the protocol for using bytes to represent filesystem strings. I believe both of these arguments are invalid, that the only code that will break as a result of this change is relying on deprecated functionality and not correctly following the encoding contract, and that the (probably noisy) breakage that will occur is less bad than the silent breakage that currently exists. As far as implementation goes, there is already a patch for this at http://bugs.python.org/issue27781. In short, we update the path converter to decode bytes (path->narrow) to Unicode (path->wide) and remove all the code that would call *A APIs. 
In my patch I've changed path->narrow to a flag that indicates whether to convert back to bytes on return, and also to prevent compilation of code that tries to use ->narrow as a string on Windows (maybe that will get too annoying for contributors? good discussion for the tracker IMHO). Fix #2: Do the mbcs decoding ourselves This is essentially the same as fix #1, but instead of changing to utf-8 we keep mbcs as the encoding. This approach will allow us to utilise new functionality that is only available as *W APIs, and also lets us be more strict about encoding/decoding to bytes. For example, rather than silently replacing Unicode characters with '?', we could warn or fail the operation, potentially modifying that behaviour with an environment variable or flag. Compared to fix #1, this will enable some new functionality but will not fix any of the problems immediately. New runtime errors may cause some problems to be more obvious and lead to fixes, provided library maintainers are interested in supporting Windows and adding a separate code path to treat filesystem paths as strings. Fix #3: Make bytes paths on Windows an error By preventing the use of bytes paths on Windows completely we prevent users from hitting encoding issues. However, we do this at the expense of usability. I don't have numbers of libraries that will simply fail on Windows if this "fix" is made, but given I've already had people directly email me and tell me about their problems we can safely assume it's non-zero. I'm really not a fan of this fix, because it doesn't actually make things better in a practical way, despite being more "pure". Timing #1: Change it in 3.6 This timing assumes that we believe the deprecation of using bytes for paths in Python 3.3 was sufficiently well advertised that we can freely make changes in 3.6. 
A typical deprecation cycle would be two versions before removal (though we also often leave things in forever when they aren't fundamentally broken), so we have passed that point and theoretically can remove or change the functionality without breaking it. In this case, we would announce in 3.6 that using bytes as paths on Windows is no longer deprecated, and that the encoding used is whatever is returned by sys.getfilesystemencoding(). Timing #2: Change it in 3.7 This timing assumes that the deprecation in 3.3 was valid, but acknowledges that it was not well publicised. For 3.6, we aggressively make it known that only strings should be used to represent paths on Windows and bytes are invalid and going to change in 3.7. (It has been suggested that I could use a keynote at PyCon to publicise this, and while I'd totally accept a keynote, I'd hate to subject a crowd to just this issue for an hour :) ). My concern with this approach is that there is no benefit to the change at all. If we aggressively publicise the fact that libraries that don't handle Unicode paths on Windows properly are using deprecated functionality and need to be fixed by 3.7 in order to avoid breaking (more precisely - continuing to be broken, but with a different error message), then we will alienate non-Windows developers further from the platform (net loss for the ecosystem) and convince some to switch to str everywhere (net gain for the ecosystem). The latter case removes the need to make any change in 3.7 at all, so we would really just be making noise about something that people haven't noticed and not necessarily going in and fixing anything. Timing #3: Change it in 3.8 This timing assumes that the deprecation in 3.3 was not sufficient and we need to start a new deprecation cycle. 
This is strengthened by the fact that the deprecation announcement does not explicitly include the io module or the builtin open() function, and so some developers may believe that using bytes for paths with these is okay despite the os module being deprecated. The one upside to this approach is that it would also allow us to change locale.getpreferredencoding() to utf-8 on Windows (to affect the default behaviour of open(..., 'r') ), which I don't believe is going to be possible without a new deprecation cycle. There is a strong argument that the following code should also round-trip regardless of platform: >>> with open('list.txt', 'w') as f: ... for i in os.listdir('.'): ... print(i, file=f) ... >>> with open('list.txt', 'r') as f: ... files = list(f) ... Currently, the default encoding for open() cannot represent all filenames that may be returned from listdir(). This may affect makefiles and configuration files that contain paths. Currently they will work correctly for paths that can be represented in the machine's active code page (though it should be noted that the *A APIs may be changed to use the OEM code page rather than the active code page, which would also break this case). Possibly resolving both issues simultaneously is worth waiting for two more releases? I'm not convinced the change to getfilesystemencoding() needs to wait for getpreferredencoding() to also change, or that they necessarily need to match, but it would not be hugely surprising to see the changes bundled together. I'll also note that there has been no discussion about changing getpreferredencoding() so far, though there have been a number of "+1" votes alongside some "+1 with significant concerns" votes. Changing the default encoding of the contents of data files is pretty scary, so I'm not in any rush to force it in. 
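Until and unless locale.getpreferredencoding() changes, the round-trip in the example above can be made reliable today by passing an explicit encoding to open() - a sketch, not part of any proposal, and it assumes the filenames themselves came back from listdir() as cleanly decoded str:

```python
import os
import tempfile

# Scratch directory and file, purely for illustration.
d = tempfile.mkdtemp()
open(os.path.join(d, 'example.txt'), 'w').close()

listing = os.path.join(d, 'list.txt')

# An explicit encoding='utf-8' means the listing file can hold any
# filename os.listdir() returns as str, regardless of the locale's
# preferred encoding.
with open(listing, 'w', encoding='utf-8') as f:
    for name in os.listdir(d):
        print(name, file=f)

with open(listing, 'r', encoding='utf-8') as f:
    files = [line.rstrip('\n') for line in f]

assert 'example.txt' in files
```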
Acknowledgements ================ Thanks to Stephen Turnbull, Eryk Sun, Victor Stinner and Random832 for their significant contributions and willingness to engage, and to everyone else on python-ideas for contributing to the discussion. From rosuav at gmail.com Thu Aug 18 11:29:00 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 19 Aug 2016 01:29:00 +1000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> Message-ID: On Fri, Aug 19, 2016 at 1:25 AM, Steve Dower wrote: >>>> open('test\uAB00.txt', 'wb').close() >>>> import glob >>>> glob.glob('test*') > ['test\uab00.txt'] >>>> glob.glob(b'test*') > [b'test?.txt'] > > The Unicode character in the second call to glob is missing information. You > can observe the same results in os.listdir() or any function that matches > its result type to the parameter type. Apologies if this is just noise, but I'm a little confused by this. The second call to glob doesn't have any Unicode characters at all, the way I see it - it's all bytes. Am I completely misunderstanding this? ChrisA From flying-sheep at web.de Thu Aug 18 11:05:00 2016 From: flying-sheep at web.de (Philipp A.) Date: Thu, 18 Aug 2016 15:05:00 +0000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-literal?= =?utf-8?q?s_impossible?= Message-ID: Hi, I originally posted this via google groups, which didn't make it through to the list proper, sorry! 
Read it here please: https://groups.google.com/forum/#!topic/python-ideas/V1U6DGL5J1s My arguments are basically: 1. f-literals are semantically not strings, but expressions. 2. Their escape sequences in the code parts are fundamentally both detrimental and superfluous (they're only in for convenience, as confirmed by Guido in the quote below) 3. They're detrimental because syntax highlighters are (by design) unable to handle this part of Python 3.6a4's grammar. This will cause code to be highlighted as parts of a string and therefore overlooked. I'm very sure this will cause bugs. 4. The fact that people see the embedded expressions as somehow 'part of the string' is confusing. My proposal is to redo their grammar: They shouldn't be parsed as strings and post-processed, but be their own thing. This also opens the door to potentially extending them with something like JavaScript's tagged templates. Without the limitations of the string tokenization code/rules, only the string parts would have escape sequences, and the expression parts would be regular Python code ('holes' in the literal). Below the mentioned quote and some replies to the original thread: Guido van Rossum wrote on Wed, 17 Aug 2016 at 20:11: > The explanation is honestly that the current approach is the most > straightforward for the implementation (it's pretty hard to intercept the > string literal before escapes have been processed) and nobody cares enough > about the edge cases to force the implementation to jump through more hoops. > > I really don't think this discussion should be reopened. If you disagree, > please start a new thread on python-ideas. > I really think it should. Please look at Python code with f-literals. If they're highlighted as strings throughout, you won't be able to spot which parts are code. If they're highlighted as code, the escaping rules guarantee that most highlighters can't correctly highlight Python anymore. I think that's a big issue for readability. 
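For readers skimming the argument, a minimal illustration of the behaviour being debated (Python 3.6 syntax; the variable name is made up): escape sequences live in the string parts, while the braced parts are ordinary expressions with an optional format spec after the colon.

```python
value = 13  # made-up variable for illustration

# The \n is an escape in a *string* part; {value + 1:04d} is an
# ordinary expression part followed by a format spec.
s = f"result:\n{value + 1:04d}"
print(s)
# result:
# 0014
```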
Brett Cannon wrote on Wed, 17 Aug 2016 at 20:28: > They are still strings, there is just post-processing on the string itself > to do the interpolation. > Sounds hacky to me. I'd rather see a proper parser for them, which of course would make my vision easy. > By doing it this way the implementation can use Python itself to do the > tokenizing of the string, while if you do the string interpolation > beforehand you would then need to do it entirely at the C level which is > very messy and painful since you're explicitly avoiding Python's automatic > handling of Unicode, etc. > Of course we reuse the tokenization for the string parts. As said, you can view an f-literal as an interleaved sequence of strings and expressions with an attached format specification. An <f'> starts the f-literal, and string contents follow. The only difference to other strings is <{>, which starts expression tokenization. Once the expression ends, an optional format spec follows, then a <}> to switch back to string tokenization. This repeats until (in string parsing mode) a <'> is encountered, which ends the f-literal. You also make it harder to work with Unicode-based variable names (or at > least explain it). If you have Unicode in a variable name but you can't use > \N{} in the string to help express it you then have to say "normal Unicode > support in the string applies everywhere *but* in the string interpolation > part". > I think you're just proving my point that the way f-literals work now is confusing. The embedded expressions are just normal Python. The embedded strings are just normal strings. You can simply switch between both using <{> and <[format]}>. Unicode in variable names works exactly the same as in all other Python code because it is regular Python code. 
Or another reason is you can explain f-strings as "basically > str.format_map(**locals(), **globals()), but without having to make the > actual method call" (and worrying about clashing keys but I couldn't think > of a way of using dict.update() in a single line). But with your desired > change it kills this explanation by saying f-strings aren't like this but > some magical string that does all of this stuff before normal string > normalization occurs. > No, it's simply that the expression parts (which for normal formatting sit inside the braces of .format(...)) are *interleaved* in between string parts. They're not part of the string - just regular, plain Python code. Cheers, and I really hope I've made a strong case, philipp -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Thu Aug 18 11:45:55 2016 From: random832 at fastmail.com (Random832) Date: Thu, 18 Aug 2016 11:45:55 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> Message-ID: <1471535155.2050843.699224841.42B2C38C@webmail.messagingengine.com> On Thu, Aug 18, 2016, at 11:29, Chris Angelico wrote: > >>>> glob.glob('test*') > > ['test\uab00.txt'] > >>>> glob.glob(b'test*') > > [b'test?.txt'] > > > > The Unicode character in the second call to glob is missing information. > > Apologies if this is just noise, but I'm a little confused by this. 
> The second call to glob doesn't have any Unicode characters at all, > the way I see it - it's all bytes. Am I completely misunderstanding > this? The unicode character is in the actual name of the actual file being matched. That the byte string returned by glob fails to represent that character in any encoding is the problem. Glob results don't exist in a vacuum, they're supposed to represent, and be usable to access, files that actually exist on the real filesystem. From steve.dower at python.org Thu Aug 18 11:54:26 2016 From: steve.dower at python.org (Steve Dower) Date: Thu, 18 Aug 2016 08:54:26 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> Message-ID: <80f137eb-8452-d35e-be3e-724ca62d3da1@python.org> On 18Aug2016 0829, Chris Angelico wrote: > The second call to glob doesn't have any Unicode characters at all, > the way I see it - it's all bytes. Am I completely misunderstanding > this? You're not the only one - I think this has been the most common misunderstanding. On Windows, the paths as stored in the filesystem are actually all text - more precisely, utf-16-le encoded bytes, represented as 16-bit characters strings. Converting to an 8-bit character representation only exists for compatibility with code written for other platforms (either Linux, or much older versions of Windows). 
The operating system has one way to do the conversion to bytes, which Python currently uses, but since we control that transformation I'm proposing an alternative conversion that is more reliable than compatible (with Windows 3.1... shouldn't affect compatibility with code that properly handles multibyte encodings, which should include anything developed for Linux in the last decade or two). Does that help? I tried to keep the explanation short and focused :) Cheers, Steve From rosuav at gmail.com Thu Aug 18 12:00:43 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 19 Aug 2016 02:00:43 +1000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <80f137eb-8452-d35e-be3e-724ca62d3da1@python.org> References: <57AB6E2D.6050704@python.org> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> <80f137eb-8452-d35e-be3e-724ca62d3da1@python.org> Message-ID: On Fri, Aug 19, 2016 at 1:54 AM, Steve Dower wrote: > On 18Aug2016 0829, Chris Angelico wrote: >> >> The second call to glob doesn't have any Unicode characters at all, >> the way I see it - it's all bytes. Am I completely misunderstanding >> this? > > > You're not the only one - I think this has been the most common > misunderstanding. > > On Windows, the paths as stored in the filesystem are actually all text - > more precisely, utf-16-le encoded bytes, represented as 16-bit characters > strings. > > Converting to an 8-bit character representation only exists for > compatibility with code written for other platforms (either Linux, or much > older versions of Windows). 
The operating system has one way to do the > conversion to bytes, which Python currently uses, but since we control that > transformation I'm proposing an alternative conversion that is more reliable > than compatible (with Windows 3.1... shouldn't affect compatibility with > code that properly handles multibyte encodings, which should include > anything developed for Linux in the last decade or two). > > Does that help? I tried to keep the explanation short and focused :) Ah, I think I see what you mean. There's a slight ambiguity in the word "missing" here. 1) The Unicode character in the result lacks some of the information it should have 2) The Unicode character in the file name is information that has now been lost. My reading was the first, but AIUI you actually meant the second. If so, I'd be inclined to reword it very slightly, eg: "The Unicode character in the second call to glob is now lost information." Is that a correct interpretation? ChrisA From steve.dower at python.org Thu Aug 18 12:07:42 2016 From: steve.dower at python.org (Steve Dower) Date: Thu, 18 Aug 2016 09:07:42 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> <80f137eb-8452-d35e-be3e-724ca62d3da1@python.org> Message-ID: <95b65bad-c9fa-daa0-08d1-fd2bdcec0376@python.org> On 18Aug2016 0900, Chris Angelico wrote: > On Fri, Aug 19, 2016 at 1:54 AM, Steve Dower wrote: >> On 18Aug2016 0829, Chris Angelico wrote: >>> >>> The second call to glob doesn't have any Unicode characters at all, >>> the way I see it - it's all bytes. Am I completely misunderstanding >>> this? 
>> >> >> You're not the only one - I think this has been the most common >> misunderstanding. >> >> On Windows, the paths as stored in the filesystem are actually all text - >> more precisely, utf-16-le encoded bytes, represented as 16-bit characters >> strings. >> >> Converting to an 8-bit character representation only exists for >> compatibility with code written for other platforms (either Linux, or much >> older versions of Windows). The operating system has one way to do the >> conversion to bytes, which Python currently uses, but since we control that >> transformation I'm proposing an alternative conversion that is more reliable >> than compatible (with Windows 3.1... shouldn't affect compatibility with >> code that properly handles multibyte encodings, which should include >> anything developed for Linux in the last decade or two). >> >> Does that help? I tried to keep the explanation short and focused :) > > Ah, I think I see what you mean. There's a slight ambiguity in the > word "missing" here. > > 1) The Unicode character in the result lacks some of the information > it should have > > 2) The Unicode character in the file name is information that has now been lost. > > My reading was the first, but AIUI you actually meant the second. If > so, I'd be inclined to reword it very slightly, eg: > > "The Unicode character in the second call to glob is now lost information." > > Is that a correct interpretation? I think so, though I find the wording a little awkward (and on rereading, my original wording was pretty bad). How about: "The second call to glob has replaced the Unicode character with '?', which means the actual filename cannot be recovered and the path is no longer valid." 
Cheers, Steve From rosuav at gmail.com Thu Aug 18 12:17:29 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 19 Aug 2016 02:17:29 +1000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: On Fri, Aug 19, 2016 at 1:05 AM, Philipp A. wrote: > the embedded expressions are just normal python. the embedded strings just > normal strings. you can simply switch between both using <{> and > <[format]}>. > The trouble with that way of thinking is that, to a human, the braces contain something. They don't "uncontain" it. Those braced expressions are still part of a string; they just have this bit of magic that gets them evaluated. Consider this: >>> "This is a number: {:0\u07c4}".format(13) 'This is a number: 0013' Format codes are just text, so I should be able to use Unicode escapes. Okay. Now let's make that an F-string. >>> f"This is a number: {13:0\u07c4}" 'This is a number: 0013' Format codes are still just text. So you'd have to say that the rules of text stop at an unbracketed colon, which is a pretty complicated rule to follow. The only difference between .format and f-strings is that the bit before the colon is the actual expression, rather than a placeholder that drags the value in from the format arguments. In human terms, that's not all that significant. IMO it doesn't matter that much either way - people will have to figure stuff out anyway. I like the idea that everything in the quotes is a string (and then parts of it get magically evaluated), but could live with there being some non-stringy parts in it. My suspicion is that what's easiest to code (ie easiest for the CPython parser) is also going to be easiest for all or most other tools (eg syntax highlighters). 
ChrisA From rosuav at gmail.com Thu Aug 18 12:18:25 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 19 Aug 2016 02:18:25 +1000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <95b65bad-c9fa-daa0-08d1-fd2bdcec0376@python.org> References: <57AB6E2D.6050704@python.org> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> <80f137eb-8452-d35e-be3e-724ca62d3da1@python.org> <95b65bad-c9fa-daa0-08d1-fd2bdcec0376@python.org> Message-ID: On Fri, Aug 19, 2016 at 2:07 AM, Steve Dower wrote: > I think so, though I find the wording a little awkward (and on rereading, my > original wording was pretty bad). How about: > > "The second call to glob has replaced the Unicode character with '?', which > means the actual filename cannot be recovered and the path is no longer > valid." I like that. Very clear and precise, without losing too much concision. Thank you for explaining, as Cameron Baum often says. ChrisA From random832 at fastmail.com Thu Aug 18 12:26:26 2016 From: random832 at fastmail.com (Random832) Date: Thu, 18 Aug 2016 12:26:26 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: <1471537586.2061186.699270681.2DC6B8BE@webmail.messagingengine.com> On Thu, Aug 18, 2016, at 12:17, Chris Angelico wrote: > The trouble with that way of thinking is that, to a human, the braces > contain something. They don't "uncontain" it. Those braced expressions > are still part of a string; they just have this bit of magic that gets > them evaluated. Consider this: There's a precedent. 
"$()" works this way in bash - call it a recursive parser context or whatever you like, but the point is that "$(command "argument with spaces")" works fine, and humans don't seem to have any trouble with it. Really it all comes down to what exactly the "bit of magic" is and how magical it is. > IMO it doesn't matter that much either way - people will have to > figure stuff out anyway. I like the idea that everything in the quotes > is a string (and then parts of it get magically evaluated), but could > live with there being some non-stringy parts in it. My suspicion is > that what's easiest to code (ie easiest for the CPython parser) is > also going to be easiest for all or most other tools (eg syntax > highlighters). Except the parser has to actually parse string literals into what string they represent (so it can apply a further transformation to the result). Syntax highlighters generally don't. From eryksun at gmail.com Thu Aug 18 12:39:45 2016 From: eryksun at gmail.com (eryk sun) Date: Thu, 18 Aug 2016 16:39:45 +0000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <95b65bad-c9fa-daa0-08d1-fd2bdcec0376@python.org> References: <57AB6E2D.6050704@python.org> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> <80f137eb-8452-d35e-be3e-724ca62d3da1@python.org> <95b65bad-c9fa-daa0-08d1-fd2bdcec0376@python.org> Message-ID: On Thu, Aug 18, 2016 at 4:07 PM, Steve Dower wrote: > On 18Aug2016 0900, Chris Angelico wrote: >> >> On Fri, Aug 19, 2016 at 1:54 AM, Steve Dower >> wrote: >>> >>> On 18Aug2016 0829, Chris Angelico wrote: >>>> >>>> >>>> The second call to glob doesn't have any Unicode characters at all, >>>> the way I see it - it's all bytes. 
Am I completely misunderstanding >>>> this? >>> >>> >>> >>> You're not the only one - I think this has been the most common >>> misunderstanding. >>> >>> On Windows, the paths as stored in the filesystem are actually all text - >>> more precisely, utf-16-le encoded bytes, represented as 16-bit characters >>> strings. >>> >>> Converting to an 8-bit character representation only exists for >>> compatibility with code written for other platforms (either Linux, or >>> much >>> older versions of Windows). The operating system has one way to do the >>> conversion to bytes, which Python currently uses, but since we control >>> that >>> transformation I'm proposing an alternative conversion that is more >>> reliable >>> than compatible (with Windows 3.1... shouldn't affect compatibility with >>> code that properly handles multibyte encodings, which should include >>> anything developed for Linux in the last decade or two). >>> >>> Does that help? I tried to keep the explanation short and focused :) >> >> >> Ah, I think I see what you mean. There's a slight ambiguity in the >> word "missing" here. >> >> 1) The Unicode character in the result lacks some of the information >> it should have >> >> 2) The Unicode character in the file name is information that has now been >> lost. >> >> My reading was the first, but AIUI you actually meant the second. If >> so, I'd be inclined to reword it very slightly, eg: >> >> "The Unicode character in the second call to glob is now lost >> information." >> >> Is that a correct interpretation? > > > I think so, though I find the wording a little awkward (and on rereading, my > original wording was pretty bad). How about: > > "The second call to glob has replaced the Unicode character with '?', which > means the actual filename cannot be recovered and the path is no longer > valid." 
They're all just characters in the context of Unicode, so I think it's clearest to use the character code, e.g.: The second call to glob has replaced the U+AB00 character with '?', which means ... From rosuav at gmail.com Thu Aug 18 12:44:25 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 19 Aug 2016 02:44:25 +1000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> <80f137eb-8452-d35e-be3e-724ca62d3da1@python.org> <95b65bad-c9fa-daa0-08d1-fd2bdcec0376@python.org> Message-ID: On Fri, Aug 19, 2016 at 2:39 AM, eryk sun wrote: > They're all just characters in the context of Unicode, so I think it's > clearest to use the character code, e.g.: > > The second call to glob has replaced the U+AB00 character with '?', > which means ... Technically the character has been replaced with the byte value 63, although at this point, we're getting into dangerous areas of bytes being interpreted in one way or another. ChrisA From steve.dower at python.org Thu Aug 18 12:50:11 2016 From: steve.dower at python.org (Steve Dower) Date: Thu, 18 Aug 2016 09:50:11 -0700 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: <39147140-e9cc-313f-f213-063f5143c122@python.org> I'm generally inclined to agree, especially as someone who is very likely to be implementing syntax highlighting and completion support within f-literals. 
I stepped out of the original discussion near the start as it looked like we were going to end up with interleaved strings and normal expressions, but if that's not the case then it is going to make it very difficult to provide a nice coding experience for them. On 18Aug2016 0805, Philipp A. wrote: > My proposal is to redo their grammar: > They shouldn't be parsed as strings and post-processed, but be their own > thing. This also opens the door to potentially extending it with something > like JavaScript's tagged templates. > > Without the limitations of the string tokenization code/rules, only the > string parts would have escape sequences, and the expression parts would > be regular python code ('holes' in the literal). This is where I thought we'd end up - the '{' character (unless escaped by, e.g. \N, which addresses a concern below) would terminate the string literal and start an expression, which may be followed by a ':' and a format code literal. The '}' character would open the next string literal, and this continues until the closing quote. > They are still strings, there is just post-processing on the string > itself to do the interpolation. > > > Sounds hacky to me. I'd rather see a proper parser for them I believe the proper parser is already used, but the issue is that escapes have already been dealt with. Of course, it shouldn't be too difficult for the tokenizer to recognize {} quoted expressions within an f-literal and not modify escapes. There are multiple ways to handle this. > Or another reason is you can explain f-strings as "basically > str.format_map(**locals(), **globals()), but without having to make > the actual method call" (and worrying about clashing keys but I > couldn't think of a way of using dict.update() in a single line). > But with your desired change it kills this explanation by saying > f-strings aren't like this but some magical string that does all of > this stuff before normal string normalization occurs.
> > > no, it's simply the expression parts (that for normal formatting are > inside of the braces of .format(...)) are *interleaved* in between > string parts. they're not part of the string. just regular plain python > code. Agreed. The .format_map() analogy breaks down very quickly when you consider f-literals like: >>> f'a { \'b\' }' 'a b' If the contents of the braces were simply keys in the namespace then we wouldn't be able to put string literals in there. But because it is an arbitrary expression, if we want to put string literals in the f-literal (bearing in mind that we may be writing something more like f'{x.partition(\'-\')[0]}'), the escaping rules become very messy very quickly. I don't think f'{x.partition('-')[0]}' is any less readable as a result of the reused quotes, and it will certainly be easier for highlighters to handle (assuming they're doing anything more complicated than simply displaying the entire expression in a different colour). So I too would like to see escapes made unnecessary within the expression part of an f-literal. Possibly if we put together a simple enough patch for the tokenizer it will be accepted? Cheers, Steve From steve.dower at python.org Thu Aug 18 13:15:11 2016 From: steve.dower at python.org (Steve Dower) Date: Thu, 18 Aug 2016 10:15:11 -0700 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: <39147140-e9cc-313f-f213-063f5143c122@python.org> References: <39147140-e9cc-313f-f213-063f5143c122@python.org> Message-ID: <1a77d6ae-0d7e-dc03-b2d9-5445c2439f30@python.org> On 18Aug2016 0950, Steve Dower wrote: > I'm generally inclined to agree, especially as someone who is very > likely to be implementing syntax highlighting and completion support > within f-literals.
"Let's make impossible" is just asking for a highly emotionally-charged discussion, which is best avoided in basically all circumstances, especially for less-frequent contributors to a community, and extra-especially when you haven't met most of the other contributors in person. Cheers, Steve From python at mrabarnett.plus.com Thu Aug 18 13:18:58 2016 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 18 Aug 2016 18:18:58 +0100 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> Message-ID: <2647db1c-c4d5-a95b-2f2e-2ff3999cd95d@mrabarnett.plus.com> On 2016-08-16 16:56, Steve Dower wrote: > I just want to clearly address two points, since I feel like multiple > posts have been unclear on them. > > 1. The bytes API was deprecated in 3.3 and it is listed in > https://docs.python.org/3/whatsnew/3.3.html. Lack of mention in the docs > is an unfortunate oversight, but it was certainly announced and the > warning has been there for three released versions. We can freely change > or remove the support now, IMHO. > > 2. Windows file system encoding is *always* UTF-16. There's no "assuming > mbcs" or "assuming ACP" or "assuming UTF-8" or "asking the OS what > encoding it is". We know exactly what the encoding is on every supported > version of Windows. UTF-16. > > This discussion is for the developers who insist on using bytes for > paths within Python, and the question is, "how do we best represent > UTF-16 encoded paths in bytes?" 
> > The choices are: > > * don't represent them at all (remove bytes API) > * convert and drop characters not in the (legacy) active code page > * convert and fail on characters not in the (legacy) active code page > * convert and fail on invalid surrogate pairs > * represent them as UTF-16-LE in bytes (with embedded '\0' everywhere) > > Currently we have the second option. > > My preference is the fourth option, as it will cause the least breakage > of existing code and enable the most amount of code to just work in the > presence of non-ACP characters. > > The fifth option is the best for round-tripping within Windows APIs. > > The only code that will break with any change is code that was using an > already deprecated API. Code that correctly uses str to represent > "encoding agnostic text" is unaffected. > > If you see an alternative choice to those listed above, feel free to > contribute it. Otherwise, can we focus the discussion on these (or any > new) choices? > Could we still call it 'mbcs', but use 'surrogateescape'?
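The trade-offs among the encode-direction choices above are easy to see in a quick sketch. This is only an illustration: cp1252 stands in for the machine's active code page (the real ACP varies per machine), and U+AB00 is an arbitrary example of a character the code page cannot represent.

```python
# Sketch of the bytes-path choices, with cp1252 standing in for the
# machine's active code page and U+AB00 as an unrepresentable character.
name = 'file\uab00.txt'

# "Convert and drop" (the current behaviour): lossy replacement.
lossy = name.encode('cp1252', errors='replace')
print(lossy)  # b'file?.txt' -- the original name cannot be recovered

# "Convert and fail": an error instead of silent data loss.
try:
    name.encode('cp1252', errors='strict')
except UnicodeEncodeError as exc:
    print('strict encoding failed:', exc.reason)

# "Represent as UTF-16-LE": faithful, but embedded NUL bytes everywhere.
print(name.encode('utf-16-le')[:4])  # b'f\x00i\x00'

# The utf-8 proposal: lossless and round-trippable.
assert name.encode('utf-8').decode('utf-8') == name
```

Note that `errors='replace'` on encoding is what produces the `'?'` substitution discussed earlier in the thread.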
From random832 at fastmail.com Thu Aug 18 13:24:15 2016 From: random832 at fastmail.com (Random832) Date: Thu, 18 Aug 2016 13:24:15 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <2647db1c-c4d5-a95b-2f2e-2ff3999cd95d@mrabarnett.plus.com> References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <2647db1c-c4d5-a95b-2f2e-2ff3999cd95d@mrabarnett.plus.com> Message-ID: <1471541055.2074437.699332881.291B2581@webmail.messagingengine.com> On Thu, Aug 18, 2016, at 13:18, MRAB wrote: > > If you see an alternative choice to those listed above, feel free to > > contribute it. Otherwise, can we focus the discussion on these (or any > > new) choices? > > > Could we use still call it 'mbcs', but use 'surrogateescape'? Er, this discussion is about converting *from* unicode (including arbitrary but usually valid characters) *to* bytes. 
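The direction point can be checked directly. A minimal sketch of how surrogateescape behaves in each direction:

```python
# surrogateescape exists to smuggle *undecodable bytes* into str
# (bytes -> str), then recover them losslessly on re-encode.
raw = b'abc\xff'                                   # 0xFF is not valid UTF-8
text = raw.decode('utf-8', errors='surrogateescape')
assert text == 'abc\udcff'                         # lone surrogate marks the byte
assert text.encode('utf-8', errors='surrogateescape') == raw  # round trip

# In the str -> bytes direction under discussion, it only maps those
# U+DC80..U+DCFF surrogates back to bytes; an ordinary character that
# the codec cannot represent still fails to encode:
try:
    'abc\uab00'.encode('cp1252', errors='surrogateescape')
except UnicodeEncodeError:
    print('U+AB00 still cannot be encoded to cp1252')
```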
From steve.dower at python.org Thu Aug 18 13:31:43 2016 From: steve.dower at python.org (Steve Dower) Date: Thu, 18 Aug 2016 10:31:43 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: <2647db1c-c4d5-a95b-2f2e-2ff3999cd95d@mrabarnett.plus.com> References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <2647db1c-c4d5-a95b-2f2e-2ff3999cd95d@mrabarnett.plus.com> Message-ID: On 18Aug2016 1018, MRAB wrote: > Could we use still call it 'mbcs', but use 'surrogateescape'? surrogateescape is used for escaping undecodable values when you want to represent arbitrary bytes in Unicode. It's the wrong direction for this situation - we are starting with valid Unicode and encoding to bytes (for the convenience of the Python developer who wants to use bytes everywhere). Bytes correctly encoded under mbcs can always be correctly decoded to Unicode ('correctly' implies that they were encoded with the same configuration as the machine doing the decoding - mbcs changes from machine to machine). So there's nothing to escape from mbcs->Unicode, and we don't control the definition of Unicode->mbcs well enough to be able to invent an escaping scheme while remaining compatible with the operating system's interpretation of mbcs (CP_ACP). (One way to look at the utf-8 proposal is saying "we will escape arbitrary Unicode characters within Python bytes strings and decode them at the Python-OS boundary". The main concern about this is the backwards compatibility issues around people taking arbitrarily encoded bytes and sharing them without including the encoding. 
Previously that would work on a subset of machines without Unicode support, but this change would only make it work within Python 3.6 and later. Hence the discussion about whether this whole thing was deprecated already or not.) Cheers, Steve From tjreedy at udel.edu Thu Aug 18 13:36:09 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 18 Aug 2016 13:36:09 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> Message-ID: On 8/18/2016 11:25 AM, Steve Dower wrote: > In this case, we would announce in 3.6 that using bytes as paths on > Windows is no longer deprecated, My understanding is that the first 2 fixes refine the deprecation rather than reversing it. And #3 simply applies it. -- Terry Jan Reedy From brett at python.org Thu Aug 18 13:38:07 2016 From: brett at python.org (Brett Cannon) Date: Thu, 18 Aug 2016 17:38:07 +0000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: On Thu, 18 Aug 2016 at 08:32 Philipp A. wrote: > [SNIP] > Brett Cannon wrote on Wed., Aug 17, 2016 at > 20:28: > >> They are still strings, there is just post-processing on the string >> itself to do the interpolation. >> > > Sounds hacky to me. I'd rather see a proper parser for them, which of > course would make my vision easy. > You say "hacky", I say "pragmatic".
And Python's code base is actually rather top-notch and so it isn't bad code, but simply a design decision you are disagreeing with. Please remember that you're essentially asking people to spend their personal time to remove working code and re-implement something that you have not volunteered to actually code up yourself. Don't forget that none of us get paid to work on Python full-time; a lucky couple of us get to spend one day a week on Python and we all take time away from our family to work on things when we can. Insulting someone's hard work that they did for free to try and improve Python is not going to motivate people to want to help out with this idea. And considering Eric Smith who originally implemented all of this is possibly the person in the best position to implement your idea just had his work called "hacky" by you is not really a great motivator for him. IOW you really need to be mindful of the tone of your emails (as does anyone else who ever asks for something to change while not being willing to put in the time and effort to actually produce the code to facilitate the change). You have now had both Steve and me point out your tone and so you're quickly approaching a threshold where people will stop pointing this out and simply ignore your emails, so please be mindful of how you phrase things. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve.dower at python.org Thu Aug 18 13:39:18 2016 From: steve.dower at python.org (Steve Dower) Date: Thu, 18 Aug 2016 10:39:18 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> Message-ID: On 18Aug2016 1036, Terry Reedy wrote: > On 8/18/2016 11:25 AM, Steve Dower wrote: > >> In this case, we would announce in 3.6 that using bytes as paths on >> Windows is no longer deprecated, > > My understanding is the the first 2 fixes refine the deprecation rather > than reversing it. And #3 simply applies it. #3 certainly just applies the deprecation. As for the first two, I don't see any reason to deprecate the functionality once the issues are resolved. If using utf-8 encoded bytes is going to work fine in all the same cases as using str, why discourage it? 
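What "utf-8 encoded bytes working in the same cases as str" looks like in practice can be sketched with os.fsencode/os.fsdecode, which apply sys.getfilesystemencoding() at the str/bytes boundary. This sketch assumes a UTF-8 filesystem encoding (typical on Linux, and what the proposal would make the Windows default):

```python
import os

# os.fsencode/os.fsdecode convert between str and bytes paths using the
# filesystem encoding, so the two representations round-trip cleanly.
path = 'data\uab00.txt'
raw = os.fsencode(path)            # str -> bytes
assert isinstance(raw, bytes)
assert os.fsdecode(raw) == path    # lossless round trip

# os functions accept either form; bytes in means bytes out.
names = os.listdir(os.fsencode('.'))
assert all(isinstance(n, bytes) for n in names)
```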
From eryksun at gmail.com Thu Aug 18 15:12:08 2016 From: eryksun at gmail.com (eryk sun) Date: Thu, 18 Aug 2016 19:12:08 +0000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> <80f137eb-8452-d35e-be3e-724ca62d3da1@python.org> <95b65bad-c9fa-daa0-08d1-fd2bdcec0376@python.org> Message-ID: On Thu, Aug 18, 2016 at 4:44 PM, Chris Angelico wrote: > On Fri, Aug 19, 2016 at 2:39 AM, eryk sun wrote: >> They're all just characters in the context of Unicode, so I think it's >> clearest to use the character code, e.g.: >> >> The second call to glob has replaced the U+AB00 character with '?', >> which means ... > > Technically the character has been replaced with the byte value 63, > although at this point, we're getting into dangerous areas of bytes > being interpreted in one way or another. Windows NLS codepages are all supersets of ASCII (no EBCDIC to worry about), and the default character when encoding is always b"?". The default Unicode character when decoding is also almost always "?", except Japanese uses U+30FB. From tjreedy at udel.edu Thu Aug 18 15:15:05 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 18 Aug 2016 15:15:05 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: <39147140-e9cc-313f-f213-063f5143c122@python.org> References: <39147140-e9cc-313f-f213-063f5143c122@python.org> Message-ID: On 8/18/2016 12:50 PM, Steve Dower wrote: > I'm generally inclined to agree, especially as someone who is very > likely to be implementing syntax highlighting and completion support > within f-literals. 
I consider these separate issues. IDLE currently provides filename completion support within strings while still highlighting the string one color. Even if it were enhanced to do name completion within an f-string, I am not sure I would want to put a mixture of colors within the string rather than leave it all one color. > I stepped out of the original discussion near the start as it looked > like we were going to end up with interleaved strings and normal > expressions, but if that's not the case then it is going to make it > very difficult to provide a nice coding experience for them. This is the crux of this thread. Is an f-string a single string that contains magically handled code, or interleaved strings using { and } as closing and opening quotes (which is backwards from their normal function of being opener and closer) and expressions? The latter view makes the grammar context sensitive, I believe, as } could only open a string if there is a previous f-tagged string an indefinite number of alternations back. It is not uncommon to write strings that consist completely of code. "for i in iterable: a.append(f(i))" to be written out or eval()ed or exec()ed. Does your environment have a mode to provide syntax highlighting and completion support for such things? What I think would be more useful would be the ability to syntax check such code strings while editing. A python-coded editor could just pass the extracted string to compile(). > I don't think f'{x.partition('-')[0]}' is any less readable as a result > of the reused quotes, I find it hard to not read f'{x.partition(' + ')[0]}' as string concatenation. > and it will certainly be easier for highlighters > to handle (assuming they're doing anything more complicated than simply > displaying the entire expression in a different colour).
Without the escapes, existing f-unaware highlighters like IDLE's will be broken in that they will highlight the single f-string as two strings with differently highlighted content in the middle. For f'{x.partition('if')[0]}', the 'if' is and will be erroneously highlighted as a keyword. I consider this breakage unacceptable. -- Terry Jan Reedy From random832 at fastmail.com Thu Aug 18 15:26:54 2016 From: random832 at fastmail.com (Random832) Date: Thu, 18 Aug 2016 15:26:54 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <39147140-e9cc-313f-f213-063f5143c122@python.org> Message-ID: <1471548414.3030016.699443441.135E65BF@webmail.messagingengine.com> On Thu, Aug 18, 2016, at 15:15, Terry Reedy wrote: > This is the crux of this thread. Is an f-string a single string that > contains magically handled code, or interleaved strings using { and } as > closing and opening quotes (which is backwards from their normal > function of being opener and closer) I'd rather conceptualize it as a sequence of two* kinds of thing: literal character sequences [as sequences of characters other than {] and expressions [started with {, and ended with a } that is not otherwise part of the expression] rather than treating { as a closing quote. In particular, treating } as an opening quote doesn't really work, since expressions can contain both strings (which may contain an unbalanced }) and dictionary/set literals (which contain balanced }'s which are not in quotes) - what ends the expression is a } at the top level. *or three, considering that escapes are used in the non-expression parts. > and expressions? The latter view > makes the grammar context sensitive, I believe, as } could only open a > string if there is a previous f-tagged string an indefinite number of > alternations back. } at the top level is otherwise a syntax error.
I don't know enough about the theoretical constructs involved to know if this makes it formally 'context sensitive' or not - I don't know that it's any more context sensitive than ) being valid if there is a matching (. Honestly, I'd be more worried about : than }. From steve.dower at python.org Thu Aug 18 15:30:49 2016 From: steve.dower at python.org (Steve Dower) Date: Thu, 18 Aug 2016 12:30:49 -0700 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <39147140-e9cc-313f-f213-063f5143c122@python.org> Message-ID: On 18Aug2016 1215, Terry Reedy wrote: > On 8/18/2016 12:50 PM, Steve Dower wrote: >> I don't think f'{x.partition('-')[0]}' is any less readable as a result >> of the reused quotes, > > I find it hard to not read f'{x.partition(' + ')[0]}' as string > concatenation. That's a fair counter-example. Though f'{x.partition(\' + \')[0]}' still reads like string concatenation to me at first glance. YMMV. >> and it will certainly be easier for highlighters >> to handle (assuming they're doing anything more complicated than simply >> displaying the entire expression in a different colour). > > Without the escapes, existing f-unaware highlighters like IDLE's will be > broken in that they will highlight the single f-string as two strings > with differently highlighted content in the middle. For > f'{x.partition('if')[0]}', the 'if' is and will be erroneously > highlighted as a keyword. I consider this breakage unacceptible. Won't it be broken anyway because of the new prefix? I'm sure there's a fairly straightforward way for a regex to say that a closing quote must not be preceded immediately by a backslash or by an open brace at all without a closing brace in between. Not having escapes within the expression makes it harder for everyone except the Python developer, in my opinion, and the rest of us ought to go out of our way for them. 
Cheers, Steve From moloney at ohsu.edu Thu Aug 18 15:40:59 2016 From: moloney at ohsu.edu (Brendan Moloney) Date: Thu, 18 Aug 2016 19:40:59 +0000 Subject: [Python-ideas] Allow manual creation of DirEntry objects In-Reply-To: References: <5F6A858FD00E5F4A82E3206D2D854EF8B54E2368@EXMB09.ohsu.edu> , Message-ID: <5F6A858FD00E5F4A82E3206D2D854EF8B54E26A6@EXMB09.ohsu.edu> Thanks, opened an issue here: http://bugs.python.org/issue27796 -Brendan ________________________________ From: gvanrossum at gmail.com [gvanrossum at gmail.com] on behalf of Guido van Rossum [guido at python.org] Sent: Wednesday, August 17, 2016 7:20 AM To: Nick Coghlan; Brendan Moloney Cc: Victor Stinner; python-ideas at python.org Subject: Re: [Python-ideas] Allow manual creation of DirEntry objects Brendan, The conclusion is that you should just file a bug asking for a working constructor -- or upload a patch if you want to. --Guido On Wed, Aug 17, 2016 at 12:18 AM, Nick Coghlan wrote: On 17 August 2016 at 09:56, Victor Stinner wrote: > 2016-08-17 1:50 GMT+02:00 Guido van Rossum: >> We could expose the class with a >> constructor that always fails (the C code could construct instances through >> a backdoor). > > Oh, in fact you cannot create an instance of os.DirEntry, it has no > (Python) constructor: > > $ ./python > Python 3.6.0a4+ (default:e615718a6455+, Aug 17 2016, 00:12:17) >>>> import os >>>> os.DirEntry(1) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: cannot create 'posix.DirEntry' instances > > Only os.scandir() can produce such objects. > > The question is still if it makes sense to allow to create DirEntry > objects in Python :-) I think it does, as it isn't really any different from someone calling the stat() method on a DirEntry instance created by os.scandir().
It also prevents folks attempting things like:

def slow_constructor(dirname, entryname):
    for entry in os.scandir(dirname):
        if entry.name == entryname:
            entry.stat()
            return entry

Allowing DirEntry construction from Python further gives us a straightforward answer to the "stat caching" question: "just use os.DirEntry instances and call stat() to make the snapshot" If folks ask why os.DirEntry caches results when pathlib.Path doesn't, we have the answer that cache invalidation is a hard problem, and hence we consider it useful in the lower level interface that is optimised for speed, but problematic in the higher level one that is more focused on cross-platform correctness of filesystem interactions. I don't know whether it would make sense to allow a pre-existing stat result to be passed to DirEntry, but it does seem like it might be useful for adapting existing stat-based backend APIs to a more user friendly DirEntry based front end API. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From chris.barker at noaa.gov Thu Aug 18 18:05:11 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 18 Aug 2016 15:05:11 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> <22453.8366.592660.590478@turnbull.sk.tsukuba.ac.jp> Message-ID: On Thu, Aug 18, 2016 at 6:23 AM, Steve Dower wrote: > "You consistently ignore Makefiles, .ini, etc." > > Do people really do open('makefile', 'rb'), extract filenames and try to > use them without ever decoding the file contents? > I'm sure they do :-( But this has always confused me - back in the python2 "good old days" text and binary mode were exactly the same on *nix -- so folks sometimes fell into the trap of opening binary files as text on *nix, and then it failing on Windows but I can't imagine why anyone would have done the opposite. So in porting to py3, they would have had to *add* that 'b' (and a bunch of b'filename') to keep the good old bytes is text interface. Why would anyone do that? Honestly confused. I've honestly never seen that, and it certainly looks like the sort of > thing Python 3 was intended to discourage. > exactly -- we really don't need to support folks reading text files in binary mode and not considering encoding... -CHB -- Christopher Barker, Ph.D.
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Thu Aug 18 20:11:58 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 19 Aug 2016 12:11:58 +1200 Subject: [Python-ideas] =?windows-1252?q?Let=92s_make_escaping_in_f-liter?= =?windows-1252?q?als_impossible?= In-Reply-To: References: Message-ID: <57B64ECE.40109@canterbury.ac.nz> Chris Angelico wrote: >>>>f"This is a number: {13:0\u07c4}" If I understand correctly, the proposal intends to make it easier for a syntax highlighter to treat f"This is a number: {foo[42]:0\u07c4}" as

f"This is a number: { foo[42] :0\u07c4}"
--------------------- ------- ----------
highlight as string   highlight  highlight as string
                      as code

I'm not sure an RE-based syntax highlighter would have any easier a time with that, because for the second part it would need to recognise ':' as starting a string, but only if it followed some stuff that was preceded by the beginning of an f-string. I'm not very familiar with syntax highlighters, so I don't know if they're typically smart enough to cope with things like that. -- Greg From steve at pearwood.info Thu Aug 18 20:18:30 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 19 Aug 2016 10:18:30 +1000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: <20160819001830.GW26300@ando.pearwood.info> On Fri, Aug 19, 2016 at 02:17:29AM +1000, Chris Angelico wrote: > Format codes are just text, I really think that is wrong. They're more like executable code. https://www.python.org/dev/peps/pep-0498/#expression-evaluation "Just text" implies it is data: result = "function(arg)" like the string on the right hand side of the = is data.
You wouldn't say that a function call was data (although it may *return* data): result = function(arg) or that it was "just text", and you shouldn't say the same about: result = f"{function(arg)}" either since they are functionally equivalent. Format codes are "just text" only in the sense that source code is "just text". It's technically correct and horribly misleading. > so I should be able to use Unicode > escapes. Okay. Now let's make that an F-string. > > >>> f"This is a number: {13:0\u07c4}" > 'This is a number: 0013' If your aim is to write obfuscated code, then, yes, you should be able to write something like that. *wink* I seem to recall that Java allows string escapes in ordinary expressions, so that instead of writing: result = function(arg) you could write: result = \x66\x75\x6e\x63\x74\x69\x6f\x6e\x28\x61\x72\x67\x29 instead. We can't, and shouldn't, allow anything like this in Python code. Should we allow it inside f-strings? result = f"{\x66\x75\x6e\x63\x74\x69\x6f\x6e\x28\x61\x72\x67\x29}" -- Steve From steve at pearwood.info Thu Aug 18 20:21:16 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 19 Aug 2016 10:21:16 +1000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: <1471537586.2061186.699270681.2DC6B8BE@webmail.messagingengine.com> References: <1471537586.2061186.699270681.2DC6B8BE@webmail.messagingengine.com> Message-ID: <20160819002116.GX26300@ando.pearwood.info> On Thu, Aug 18, 2016 at 12:26:26PM -0400, Random832 wrote: > There's a precedent. "$()" works this way in bash - call it a recursive > parser context or whatever you like, but the point is that "$(command > "argument with spaces")" works fine, and humans don't seem to have any > trouble with it. This is the first time I've ever seen anyone claim that humans don't have any trouble with bash escaping and evaluation rules.
-- Steve

From eric at trueblade.com Thu Aug 18 20:27:50 2016 From: eric at trueblade.com (Eric V. Smith) Date: Thu, 18 Aug 2016 20:27:50 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <39147140-e9cc-313f-f213-063f5143c122@python.org> Message-ID: <24c024c0-5c4d-45ce-13fd-571e13e1b486@trueblade.com>

On 8/18/2016 3:15 PM, Terry Reedy wrote:
> On 8/18/2016 12:50 PM, Steve Dower wrote:
> I find it hard to not read f'{x.partition(' + ')[0]}' as string
> concatenation.
>
>> and it will certainly be easier for highlighters
>> to handle (assuming they're doing anything more complicated than simply
>> displaying the entire expression in a different colour).
>
> Without the escapes, existing f-unaware highlighters like IDLE's will be
> broken in that they will highlight the single f-string as two strings
> with differently highlighted content in the middle. For
> f'{x.partition('if')[0]}', the 'if' is and will be erroneously
> highlighted as a keyword. I consider this breakage unacceptible.

Right. Because all strings (regardless of prefixes) are first parsed as strings, and then have their prefix "operator" applied, it's easy for a parser to ignore any string prefix character.

So something that parses or scans a Python file and currently understands u, b, and r to be string prefixes, just needs to add f to the prefixes it uses, and it can now at least understand f-strings (and fr-strings). It doesn't need to implement a full-blown expression parser just to find out where the end of a f-string is.

Eric.
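As a rough sketch of that point (simplified to one-line, single-quoted strings; the pattern names here are illustrative, not any tool's actual code), adding "f" and the fr/rf combinations to an existing prefix pattern is all such a scanner needs to keep finding the extent of a string literal:

```python
import re

# A scanner that already knows the u/b/r prefixes only needs "f" (and
# fr/rf) added to its prefix alternation; the body pattern is unchanged.
OLD_PREFIX = r"[uU]|[bB][rR]?|[rR][bB]?"
NEW_PREFIX = OLD_PREFIX + r"|[fF][rR]?|[rR][fF]"
STRING = re.compile(r"(?:%s)?'[^'\\\n]*(?:\\.[^'\\\n]*)*'" % NEW_PREFIX)

assert STRING.fullmatch("f'{x!r:>10}'")   # f-string found like any string
assert STRING.fullmatch("rf'{x}\\n'")     # raw f-string, either prefix order
assert STRING.fullmatch("'plain'")        # unprefixed strings still match
```

The expression inside the braces never matters: the string still ends at the first unescaped close quote.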
From rosuav at gmail.com Thu Aug 18 20:37:02 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 19 Aug 2016 10:37:02 +1000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: <20160819001830.GW26300@ando.pearwood.info> References: <20160819001830.GW26300@ando.pearwood.info> Message-ID: On Fri, Aug 19, 2016 at 10:18 AM, Steven D'Aprano wrote: > On Fri, Aug 19, 2016 at 02:17:29AM +1000, Chris Angelico wrote: > >> Format codes are just text, > > I really think that is wrong. They're more like executable code. > > https://www.python.org/dev/peps/pep-0498/#expression-evaluation > > "Just text" implies it is data: > > result = "function(arg)" > > like the string on the right hand side of the = is data. You wouldn't > say that a function call was data (although it may *return* data): > > result = function(arg) > > or that it was "just text", and you shouldn't say the same about: > > result = f"{function(arg)}" > > either since they are functionally equivalent. Format codes are "just > text" only in the sense that source code is "just text". Its technically > correct and horribly misleading. > By "format code", I'm talking about the bit after the colon, which isn't executable code, but is a directive that says how the result is to be formatted. These have existed since str.format() was introduced, and have always been text, not code. 
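That distinction is visible with the built-in format(): the spec after the colon is handed to the value's __format__ method as plain text, never evaluated as Python code.

```python
# A format spec (the part after the colon) is passed verbatim to the
# value's __format__ method; it is data, not executable code.
assert format(13, "04d") == "0013"
assert "{:04d}".format(13) == "0013"
assert f"{13:04d}" == "0013"            # the same spec, f-string form
assert (13).__format__("04d") == "0013"
```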
ChrisA

From tjreedy at udel.edu Fri Aug 19 01:07:58 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 19 Aug 2016 01:07:58 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <39147140-e9cc-313f-f213-063f5143c122@python.org> Message-ID:

On 8/18/2016 3:30 PM, Steve Dower wrote:
> On 18Aug2016 1215, Terry Reedy wrote:
>> On 8/18/2016 12:50 PM, Steve Dower wrote:
>>> I don't think f'{x.partition('-')[0]}' is any less readable as a result
>>> of the reused quotes,

Why are you reusing the single quote', which needs the escaping that you don't like, instead of any of at least 6 alternatives that do not need any escaping?

f'{x.partition("-")[0]}'
f'{x.partition("""-""")[0]}'
f"{x.partition('-')[0]}"
f'''{x.partition('-')[0]}'''
f"""{x.partition('-')[0]}"""
f"""{x.partition('''-''')[0]}"""

It seems to me that this is at least somewhat a strawman issue.

If you want to prohibit backslashed quote reuse in expressions, as in f'{x.partition(\'-\')[0]}', that is okay with me, as this is unnecessary* and arguably bad. The third alternative above is better. What breaks colorizers, and what I therefore object to, is the innovation of adding magical escaping of ' or " without \.

Or add a new style rule to PEP 8.

F-strings: avoid unnecessary escaping in the expression part of f-strings.
Good: f"{x.partition('-')[0]}"
Bad: f'{x.partition(\'-\')[0]}'

Then PEP-8 checkers will flag such usage.

*I am sure that there are possible complex expressions that would be prohibited by the rule that would be otherwise possible. But they should be extremely rare and possibly not the best solution anyway.

>> I find it hard to not read f'{x.partition(' + ')[0]}' as string
>> concatenation.

> That's a fair counter-example. Though f'{x.partition(\' + \')[0]}' still
> reads like string concatenation to me at first glance. YMMV.
When the outer and inner quotes are no longer the same, the effect is greatly diminished if not eliminated. >>> and it will certainly be easier for highlighters >>> to handle (assuming they're doing anything more complicated than simply >>> displaying the entire expression in a different colour). >> >> Without the escapes, existing f-unaware highlighters like IDLE's will be >> broken in that they will highlight the single f-string as two strings >> with differently highlighted content in the middle. For >> f'{x.partition('if')[0]}', the 'if' is and will be erroneously >> highlighted as a keyword. I consider this breakage unacceptible. > > Won't it be broken anyway because of the new prefix? No. IDLE currently handles f-strings just fine other than not coloring the 'f'. This is a minor issue and easily fixed by adding '|f' and if allowed, '|F' at the end of the current stringprefix re. > I'm sure there's a fairly straightforward way for a regex to say that a > closing quote must not be preceded immediately by a backslash or by an > open brace at all without a closing brace in between. I do not know that this is possible. Here is IDLE's current re for an unprefixed single quoted string. r"'[^'\\\n]*(\\.[^'\\\n]*)*'?" The close quote is optional because it must match a string that is in the process of being typed and is not yet closed. I consider providing a tested augmented re to be required for this proposal. Even then, making the parsing out of strings in Python code for colorizing version dependent is a problem in itself for colorizers not tied to a particular x.y version. Leaving prefixes aside, I can't remember string delimiter syntax changing since I learned it in 1.3. > Not having escapes within the expression makes it harder for everyone > except the Python developer, in my opinion, and the rest of us ought to > go out of our way for them. I am not sure that this says what you mean. 
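IDLE's pattern quoted above can be exercised directly; the optional close quote is exactly what lets it match a string that is still being typed:

```python
import re

# IDLE's re for an unprefixed single-quoted string, as quoted above; the
# trailing '? is optional so a not-yet-closed string still matches.
pat = re.compile(r"'[^'\\\n]*(\\.[^'\\\n]*)*'?")

assert pat.match("'abc' + x").group() == "'abc'"     # stops at the close quote
assert pat.match(r"'a\'b'").group() == r"'a\'b'"     # escaped quote handled
assert pat.match("'unclosed").group() == "'unclosed" # in-progress string
```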
-- Terry Jan Reedy From tjreedy at udel.edu Fri Aug 19 01:16:25 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 19 Aug 2016 01:16:25 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: <24c024c0-5c4d-45ce-13fd-571e13e1b486@trueblade.com> References: <39147140-e9cc-313f-f213-063f5143c122@python.org> <24c024c0-5c4d-45ce-13fd-571e13e1b486@trueblade.com> Message-ID: On 8/18/2016 8:27 PM, Eric V. Smith wrote: > On 8/18/2016 3:15 PM, Terry Reedy wrote: >> Without the escapes, existing f-unaware highlighters like IDLE's will be >> broken in that they will highlight the single f-string as two strings >> with differently highlighted content in the middle. For >> f'{x.partition('if')[0]}', the 'if' is and will be erroneously >> highlighted as a keyword. I consider this breakage unacceptible. > > Right. Because all strings (regardless of prefixes) are first parsed as > strings, and then have their prefix "operator" applied, it's easy for a > parser to ignore any sting prefix character. > > So something that parses or scans a Python file and currently > understands u, b, and r to be string prefixes, just needs to add f to > the prefixes it uses, and it can now at least understand f-strings (and > fr-strings). It doesn't need to implement a full-blown expression parser > just to find out where the end of a f-string is. Indeed, IDLE has one prefix re, which has changed occasionally and which I need to change for 3.6, and 4 res for the 4 unprefixed strings, which have been the same, AFAIK, for decades. It that prefixes all 4 string res with the prefix re and o or's the results together to get the 'string' re. 
-- Terry Jan Reedy From tjreedy at udel.edu Fri Aug 19 01:28:54 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 19 Aug 2016 01:28:54 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: <20160819001830.GW26300@ando.pearwood.info> References: <20160819001830.GW26300@ando.pearwood.info> Message-ID: On 8/18/2016 8:18 PM, Steven D'Aprano wrote: > On Fri, Aug 19, 2016 at 02:17:29AM +1000, Chris Angelico wrote: > >> Format codes are just text, > > I really think that is wrong. They're more like executable code. > > https://www.python.org/dev/peps/pep-0498/#expression-evaluation I agree with you here. I just note that the strings passed to exec, eval, and compile are also executable code strings (and nothing but!). But I don't remember a suggestion that *they* should by colored as anything other than a string. However, this thread has suggested to me that perhaps there *should* be a way to syntax check such strings in the editor rather than waiting for the runtime call. 
-- Terry Jan Reedy

From tjreedy at udel.edu Fri Aug 19 01:39:45 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 19 Aug 2016 01:39:45 -0400 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> Message-ID:

On 8/18/2016 1:39 PM, Steve Dower wrote:
> On 18Aug2016 1036, Terry Reedy wrote:
>> On 8/18/2016 11:25 AM, Steve Dower wrote:
>>
>>> In this case, we would announce in 3.6 that using bytes as paths on
>>> Windows is no longer deprecated,
>>
>> My understanding is the the first 2 fixes refine the deprecation rather
>> than reversing it. And #3 simply applies it.
>
> #3 certainly just applies the deprecation.
>
> As for the first two, I don't see any reason to deprecate the
> functionality once the issues are resolved. If using utf-8 encoded bytes
> is going to work fine in all the same cases as using str, why discourage
> it?

As I understand it, you are still proposing to remove the use of bytes encoded with anything other than utf-8 (and the corresponding *A internal functions) and in particular stop lossy path transformations. Am I wrong?
-- Terry Jan Reedy

From steve at pearwood.info Fri Aug 19 01:49:28 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 19 Aug 2016 15:49:28 +1000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: <24c024c0-5c4d-45ce-13fd-571e13e1b486@trueblade.com> References: <39147140-e9cc-313f-f213-063f5143c122@python.org> <24c024c0-5c4d-45ce-13fd-571e13e1b486@trueblade.com> Message-ID: <20160819054928.GY26300@ando.pearwood.info>

On Thu, Aug 18, 2016 at 08:27:50PM -0400, Eric V. Smith wrote:

> Right. Because all strings (regardless of prefixes) are first parsed as
> strings, and then have their prefix "operator" applied, it's easy for a
> parser to ignore any sting prefix character.

Is that why raw strings can't end with a backslash? If so, that's the first time I've seen an explanation of that fact which makes sense!

-- Steve

From elazarg at gmail.com Fri Aug 19 02:07:11 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Fri, 19 Aug 2016 06:07:11 +0000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <20160819001830.GW26300@ando.pearwood.info> Message-ID:

On Fri, 19 Aug 2016, 08:29, Terry Reedy wrote:

> On 8/18/2016 8:18 PM, Steven D'Aprano wrote:
> > On Fri, Aug 19, 2016 at 02:17:29AM +1000, Chris Angelico wrote:
> >
> >> Format codes are just text,
> >
> > I really think that is wrong. They're more like executable code.
> >
> > https://www.python.org/dev/peps/pep-0498/#expression-evaluation
>
> I agree with you here. I just note that the strings passed to exec,
> eval, and compile are also executable code strings (and nothing but!).
> But I don't remember a suggestion that *they* should by colored as
> anything other than a string.

But these are objects of type str, not string literals. If they were, I guess someone would have suggested such coloring.
~Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Fri Aug 19 02:57:50 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 19 Aug 2016 09:57:50 +0300 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <39147140-e9cc-313f-f213-063f5143c122@python.org> Message-ID: On 19.08.16 08:07, Terry Reedy wrote: > On 8/18/2016 3:30 PM, Steve Dower wrote: >> On 18Aug2016 1215, Terry Reedy wrote: >>> On 8/18/2016 12:50 PM, Steve Dower wrote: >>>> I don't think f'{x.partition('-')[0]}' is any less readable as a result >>>> of the reused quotes, > > Why are you reusing the single quote', which needs the escaping that you > don't like, instead of any of at least 6 alternatives that do not need > any escaping? > > f'{x.partition("-")[0]}' > f'{x.partition("""-""")[0]}' > f"{x.partition('-')[0]}" > f'''{x.partition('-')[0]}''' > f"""{x.partition('-')[0]}""" > f"""{x.partition('''-''')[0]}""" > > It seems to me that that this is at least somewhat a strawman issue. > > If you want to prohibit backslashed quote reuse in expressions, as in > f'{x.partition(\'-\')[0]}', that is okay with me, as this is > unnecessary* and arguably bad. The third alternative above is better. > What breaks colorizers, and what I therefore object to, is the > innovation of adding magical escaping of ' or " without \. > > Or add a new style rule to PEP 8. +1. It is even possible to add a SyntaxWarning in future. 
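A first cut at such a check could be a plain regex scan over source lines. A toy sketch (the pattern and function name are illustrative only; a real PEP 8 checker or SyntaxWarning would sit on a proper tokenizer):

```python
import re

# Flag a backslash-escaped quote appearing inside an f-string literal,
# per the suggested style rule. Deliberately naive: single-line only.
ESCAPED_QUOTE_IN_FSTRING = re.compile(r"""\b[fF]['"][^'"]*\\['"]""")

def flagged(line):
    return bool(ESCAPED_QUOTE_IN_FSTRING.search(line))

assert flagged("f'{x.partition(\\'-\\')[0]}'")    # Bad:  escaped quote reuse
assert not flagged('f"{x.partition(\'-\')[0]}"')  # Good: alternate quotes
```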
From ncoghlan at gmail.com Fri Aug 19 03:30:53 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 19 Aug 2016 17:30:53 +1000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> <22453.8366.592660.590478@turnbull.sk.tsukuba.ac.jp> Message-ID: On 19 August 2016 at 08:05, Chris Barker wrote: > On Thu, Aug 18, 2016 at 6:23 AM, Steve Dower wrote: >> >> "You consistently ignore Makefiles, .ini, etc." >> >> Do people really do open('makefile', 'rb'), extract filenames and try to >> use them without ever decoding the file contents? > > > I'm sure they do :-( > > But this has always confused me - back in the python2 "good old days" text > and binary mode were exactly the same on *nix -- so folks sometimes fell > into the trap of opening binary files as text on *nix, and then it failing > on Windows but I can't image why anyone would have done the opposite. > > So in porting to py3, they would have had to *add* that 'b' (and a bunch of > b'filename') to keep the good old bytes is text interface. > > Why would anyone do that? For a fair amount of *nix-centric code that primarily works with ASCII data, adding the 'b' prefix is the easiest way to get into the common subset of Python 2 & 3. 
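The stdlib's bridge between the two models for such code is os.fsencode()/os.fsdecode(), which convert between str and bytes paths using the filesystem encoding with the surrogateescape error handler:

```python
import os

# os.fsencode/os.fsdecode round-trip between the str and bytes views of a
# path via sys.getfilesystemencoding() + surrogateescape.
name = os.fsdecode(b"example.txt")
assert name == "example.txt"
assert os.fsencode(name) == b"example.txt"
```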
However, this means that such code is currently relying on deprecated functionality on Windows, and if we actually followed through on the deprecation with feature removal, Steve's expectation (which I agree with) is that many affected projects would just drop Windows support entirely, rather than changing their code to use str instead of bytes (at least under Python 3 on Windows).

The end result of Steve's proposed changes should be that such code would typically do the right thing across all of Mac OS X, Linux and Windows, as long as the latter two are configured to use "utf-8" as their default locale encoding or active code page (respectively).

Linux and Windows would still both have situations encountered with ssh environment variable forwarding and with East Asian system configurations that have the potential to result in mojibake, where these challenges come up mainly with network communications on Linux, and local file processing on Windows.

The reason I like Steve's proposal is that it gets us to a better baseline situation for cross-platform compatibility (including with the CLR and JVM API models), and replaces the status quo with three smaller as yet unsolved problems:

- network protocol interoperability on Linux systems configured with a non UTF-8 locale
- system access on Linux servers with a forwarded SSH environment that doesn't match the server settings
- processing file contents on Windows systems with an active code page other than UTF-8

For Linux, our answer is basically "UTF-8 is really the only system locale that works properly for other reasons, so we'll essentially wait for non-UTF-8 Linux systems to slowly age out of humanity's collective IT infrastructure"

For Windows, our preliminary answer is the same as the situation on Linux, which is why Stephen's concerned by the proposal - it reduces the incentive for folks to support Windows *properly*, by switching to modeling paths as text the way pathlib does.
However, it seems to me that those higher level pathlib APIs are the best way to encourage future code to be more Windows friendly - they sweep a lot of these messy low level concerns under the API rug, so more Python 3 native code will use str paths by default, with bytes paths mainly showing in Python 2/3 compatible code bases and some optimised data processing code.

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From eryksun at gmail.com Fri Aug 19 03:48:34 2016 From: eryksun at gmail.com (eryk sun) Date: Fri, 19 Aug 2016 07:48:34 +0000 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> Message-ID:

On Thu, Aug 18, 2016 at 3:25 PM, Steve Dower wrote:
> allow us to change locale.getpreferredencoding() to utf-8 on Windows

_bootlocale.getpreferredencoding would need to be hard coded to return 'utf-8' on Windows. _locale._getdefaultlocale() itself shouldn't return 'utf-8' as the encoding because the CRT doesn't allow it as a locale encoding.

site.aliasmbcs() uses getpreferredencoding, so it will need to be modified. The codecs module could add get_acp and get_oemcp functions based on GetACP and GetOEMCP, returning for example 'cp1252' and 'cp850'. Then aliasmbcs could call get_acp.

Adding get_oemcp would also help with decoding output from subprocess.Popen. There's been discussion about adding encoding and errors options to Popen, and what the default should be.
When writing to a pipe or file, some programs use OEM, some use ANSI, some use the console codepage if available, and far fewer use Unicode encodings. Obviously it's better to specify the encoding in each case if you know it. Regarding the locale module, how about modernizing _locale._getdefaultlocale to return the Windows locale name [1] from GetUserDefaultLocaleName? For example, it could return a tuple such as ('en-UK', None) and ('uz-Latn-UZ', None) -- always with the encoding set to None. The CRT accepts the new locale names, but it isn't quite up to speed. It still sets a legacy locale when the locale string is empty. In this case the high-level setlocale could call _getdefaultlocale. Also _parse_localename, which is called by getlocale, needs to return a tuple with the encoding as None. Currently it raises a ValueError for Windows locale names as defined by [1]. [1]: https://msdn.microsoft.com/en-us/library/dd373814 From p.f.moore at gmail.com Fri Aug 19 04:16:13 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 19 Aug 2016 09:16:13 +0100 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <39147140-e9cc-313f-f213-063f5143c122@python.org> Message-ID: On 19 August 2016 at 06:07, Terry Reedy wrote: > It seems to me that that this is at least somewhat a strawman issue. > > If you want to prohibit backslashed quote reuse in expressions, as in > f'{x.partition(\'-\')[0]}', that is okay with me, as this is unnecessary* > and arguably bad. The third alternative above is better. What breaks > colorizers, and what I therefore object to, is the innovation of adding > magical escaping of ' or " without \. > > Or add a new style rule to PEP 8. > > F-strings: avoid unnecessary escaping in the expression part of f-strings. > Good: f"{x.partition('-')[0]}" > Bad: f'{x.partition(\'-\')[0]}' > > Then PEP-8 checkers will flag such usage. +1. 
While substantial IDEs like PyCharm or PTVS may use a full-scale parser to do syntax highlighting, I suspect that many tools just use relatively basic regex parsing (Vim certainly does). For those tools, unescaped nested quotes will likely be extremely difficult, if not impossible, to parse correctly. Whereas the current behaviour is "just" standard string highlighting.

So if the Python parser were to change as proposed, I'd still argue strongly for a coding style that never uses any construct that would be interpreted differently from current behaviour (i.e., the changed behaviour should essentially be irrelevant). Given this, I think the argument to change, whether it's theoretically an improvement or not, is irrelevant, and practicality says there's no point in bothering. (Python's parser is intentionally simple, to make it easy for humans and tools to parse Python code - I'm not sure the proposed change to syntax meets that guideline for simple syntax).

Paul

From eric at trueblade.com Fri Aug 19 04:27:16 2016 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 19 Aug 2016 04:27:16 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <39147140-e9cc-313f-f213-063f5143c122@python.org> <24c024c0-5c4d-45ce-13fd-571e13e1b486@trueblade.com> Message-ID:

On 8/19/2016 1:16 AM, Terry Reedy wrote:
> On 8/18/2016 8:27 PM, Eric V. Smith wrote:
>> So something that parses or scans a Python file and currently
>> understands u, b, and r to be string prefixes, just needs to add f to
>> the prefixes it uses, and it can now at least understand f-strings (and
>> fr-strings). It doesn't need to implement a full-blown expression parser
>> just to find out where the end of a f-string is.
>
> Indeed, IDLE has one prefix re, which has changed occasionally and which
> I need to change for 3.6, and 4 res for the 4 unprefixed strings, which
> have been the same, AFAIK, for decades.
It that prefixes all 4 string
> res with the prefix re and o or's the results together to get the
> 'string' re.

For something else that would become significantly more complicated to implement, you need look no further than the stdlib's own tokenize module. So Python itself would require changes to parsers/lexers in Python/ast.c, IDLE, and Lib/tokenize.py. In addition it would require adding tokens to Include/tokens.h and the generated Lib/token.py, and everyone using those files would need to adapt.

Not that it's impossible, of course. But don't underestimate the amount of work this proposal would cause to the many places in and outside of Python that examine Python code.

Eric.

From ncoghlan at gmail.com Fri Aug 19 07:10:27 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 19 Aug 2016 21:10:27 +1000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <39147140-e9cc-313f-f213-063f5143c122@python.org> <24c024c0-5c4d-45ce-13fd-571e13e1b486@trueblade.com> Message-ID:

On 19 August 2016 at 18:27, Eric V. Smith wrote:
> For something else that would become significantly more complicated to
> implement, you need look no further than the stdlib's own tokenize module.
> So Python itself would require changes to parsers/lexers in Python/ast.c,
> IDLE, and Lib/tokenize.py. In addition it would require adding tokens to
> Include/tokens.h and the generated Lib/token.py, and everyone using those
> files would need to adapt.
>
> Not that it's impossible, of course. But don't underestimate the amount of
> work this proposal would cause to the many places in and outside of Python
> that examine Python code.
And if folks want to do something more clever than regex based single colour string highlighting, Python's own AST module is available to help them out: >>> tree = ast.parse("f'example{parsing:formatting}and trailer'") >>> ast.dump(tree) "Module(body=[Expr(value=JoinedStr(values=[Str(s='example'), FormattedValue(value=Name(id='parsing', ctx=Load()), conversion=-1, format_spec=Str(s='formatting')), Str(s='and trailer')]))])" Extracting the location of the field expression for syntax highlighting: >>> ast.dump(tree.body[0].value.values[1].value) "Name(id='parsing', ctx=Load())" (I haven't shown it in the example, but AST nodes have lineno and col_offset fields so you can relocate the original source code for processing) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mistersheik at gmail.com Fri Aug 19 07:11:58 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 19 Aug 2016 04:11:58 -0700 (PDT) Subject: [Python-ideas] Consider having collections.abc.Sequence implement __eq__ and __ne__ In-Reply-To: References: Message-ID: I mean zip(self, other) On Friday, August 19, 2016 at 6:46:57 AM UTC-4, Neil Girdhar wrote: > > Both Mapping and Set provide __eq__ and __ne__. I was wondering why not > have Sequence do the same? > > > class Sequence(Sized, Reversible, Container): > > def __eq__(self, other): > if not isinstance(other, Sequence): > return NotImplemented > if len(self) != len(other): > return False > for a, b in self, other: > if a != b: > return False > return True > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Fri Aug 19 06:46:57 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 19 Aug 2016 03:46:57 -0700 (PDT) Subject: [Python-ideas] Consider having collections.abc.Sequence implement __eq__ and __ne__ Message-ID: Both Mapping and Set provide __eq__ and __ne__. I was wondering why not have Sequence do the same? 
class Sequence(Sized, Reversible, Container):

    def __eq__(self, other):
        if not isinstance(other, Sequence):
            return NotImplemented
        if len(self) != len(other):
            return False
        for a, b in self, other:
            if a != b:
                return False
        return True

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From arek.bulski at gmail.com Fri Aug 19 08:01:41 2016 From: arek.bulski at gmail.com (Arek Bulski) Date: Fri, 19 Aug 2016 14:01:41 +0200 Subject: [Python-ideas] Consider having collections.abc.Sequence Message-ID:

Could use all(a==b for zip(seq,seq2))

-- Arkadiusz Bulski --

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From vgr255 at live.ca Fri Aug 19 08:35:47 2016 From: vgr255 at live.ca (Emanuel Barry) Date: Fri, 19 Aug 2016 12:35:47 +0000 Subject: [Python-ideas] Consider having collections.abc.Sequence In-Reply-To: References: Message-ID:

Arek Bulski wrote:
> Could use all(a==b for zip(seq,seq2))

Or even `all(itertools.starmap(operator.eq, zip(a, b)))` if you prefer, but this isn't about how easy or clever or obfuscated one can write that; it's about convenience. ABCs expose the lowest common denominator for concrete classes of their kind, and having __eq__ makes sense for Sequence (I'm surprised that it's not already in).

I think we can skip the Python-ideas thread and go straight to opening an issue and submitting a patch :) Neil, care to do that?

-Emanuel

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From mistersheik at gmail.com Fri Aug 19 08:43:16 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 19 Aug 2016 05:43:16 -0700 (PDT) Subject: [Python-ideas] Consider having collections.abc.Sequence In-Reply-To: References: Message-ID: Sure. http://bugs.python.org/issue27802 On Friday, August 19, 2016 at 8:36:39 AM UTC-4, Emanuel Barry wrote: > > Arek Bulski wrote: > > > Could use all(a==b for zip(seq,seq2)) > > > > Or even `all(itertools.starmap(operator.eq, zip(a, b)))` if you prefer, > but this isn?t about how easy or clever or obfuscated one can write that; > it?s about convenience. ABCs expose the lowest common denominator for > concrete classes of their kind, and having __eq__ makes sense for Sequence > (I?m surprised that it?s not already in). > > > > I think we can skip the Python-ideas thread and go straight to opening an > issue and submitting a patch :) Neil, care to do that? > > > > -Emanuel > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.barker at noaa.gov Fri Aug 19 13:01:40 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 19 Aug 2016 10:01:40 -0700 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> <22453.8366.592660.590478@turnbull.sk.tsukuba.ac.jp> Message-ID: On Fri, Aug 19, 2016 at 12:30 AM, Nick Coghlan wrote: > > So in porting to py3, they would have had to *add* that 'b' (and a bunch > of > > b'filename') to keep the good old bytes is text interface. > > > > Why would anyone do that? > > For a fair amount of *nix-centric code that primarily works with ASCII > data, adding the 'b' prefix is the easiest way to get into the common > subset of Python 2 & 3. > Sure -- but it's entirely unnecessary, yes? If you don't change your code, you'll get py2(bytes) strings as paths in py2, and py3 (Unicode) strings as paths on py3. So different, yes. But wouldn't it all work? So folks are making an active choice to change their code to get some perceived (real?) performance benefit??? However, as I understand it, py3 string paths did NOT "just work" in place of py2 paths before surrogate pairs were introduced (when was that?) -- so are we dealing with all of this because some (a lot, and important) libraries ported to py3 early in the game? What I'm getting at is whether there is anything other than inertia that keeps folks using bytes paths in py3 code? 
Maybe it wouldn't be THAT hard to get folks to make the switch: it's EASIER to port your code to py3 this way! -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Fri Aug 19 14:07:48 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 19 Aug 2016 14:07:48 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <20160819001830.GW26300@ando.pearwood.info> Message-ID: On 8/19/2016 2:07 AM, ????? wrote: > > > On Fri, 19 Aug 2016 at 08:29, Terry Reedy wrote: > > On 8/18/2016 8:18 PM, Steven D'Aprano wrote: > > On Fri, Aug 19, 2016 at 02:17:29AM +1000, Chris Angelico wrote: > > > >> Format codes are just text, > > > > I really think that is wrong. They're more like executable code. > > > > https://www.python.org/dev/peps/pep-0498/#expression-evaluation > > I agree with you here. I just note that the strings passed to exec, > eval, and compile are also executable code strings (and nothing but!). > But I don't remember a suggestion that *they* should be colored as > anything other than a string. > > > But these are objects of type str, not string literals. If they were, I > guess someone would have suggested such coloring. I was referring to strings created by string literals in the code sitting in an editor. -- Terry Jan Reedy From eric at trueblade.com Fri Aug 19 14:43:54 2016 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 19 Aug 2016 14:43:54 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: On 8/18/2016 11:05 AM, Philipp A. 
wrote: > Hi, I originally posted this via google groups, which didn't make it > through to the list proper, sorry! Read it here please: > https://groups.google.com/forum/#!topic/python-ideas/V1U6DGL5J1s Hi, Philipp. I'm including your original proposal here, so that it's archived properly. Here's what you'd like to have work: f'foo{repr('bar\n') * 3}baz' == 'foo"bar\n""bar\n""bar\n"baz' We've had this discussion before, but I can't find it in the archives. It's possible it happened off-list. The problem I have with your proposal is that it greatly complicates the implementation and it makes it harder for humans to reason about the code (for the same reasons). This is not just an ease of implementation issue: it's a cognitive burden issue. As it currently stands, Python strings, of all 'flavors' (raw, unicode, binary, and f-), are parsed the same way: an optional prefix, one or three quote characters, the body of the string, then matching quote characters. This is sufficiently simple that it can be (and often is) implemented with regular expressions. You also need to support combinations of prefixes, like raw f-strings (fr or rf). With your proposal, it's much more difficult to find the end of an f-string. I do not think this is a reasonable requirement. For example, consider the following: f'a{func({'a{':1,'3}':[{}, ')', '{']})}b' A regex will not be able to deal with the matching braces needed to find the end of the expression. You need to keep track of the nesting level of parens, braces, brackets, and quotes (at least those, I might have left something off). The way this currently is written in Python 3.6a4: f"a{func({'a{':1,'3}':[{}, ')', '{']})}b" It's trivially easy to find the end of the string. It's easy for both humans and the parsers. Now admittedly in order to execute or syntax highlight the existing f-strings, you need to perform this parsing. But of the 3 parsers that ship with Python (ast.c, tokenize.py, IDLE), only ast.c needs to do that currently. 
I don't think tokenize.py ever will, and IDLE might (but could use the ast module). I think many parsers (e.g. python-mode.el, etc.) will just be able to simply consume the f-strings without looking inside them and move on, and we shouldn't unnecessarily complicate them. > My arguments are basically: > > 1. f-literals are semantically not strings, but expressions. > 2. Their escape sequences in the code parts are fundamentally both > detrimental and superfluous (they're only in for convenience, as > confirmed by Guido in the quote below) I disagree that they're detrimental and superfluous. I'd say they're consistent with all other strings. > 3. They're detrimental because Syntax highlighters are (by design) > unable to handle this part of Python 3.6a4's grammar. This will > cause code to be highlighted as parts of a string and therefore > overlooked. i'm very sure this will cause bugs. I disagree. > 4. The fact that people see the embedded expressions as somehow 'part > of the string' is confusing. > > My proposal is to redo their grammar: > They shouldn't be parsed as strings and post-processed, but be their own > thing. This also opens the door to potentially extend them with something > like JavaScript's tagged templates. > > Without the limitations of the string tokenization code/rules, only the > string parts would have escape sequences, and the expression parts would > be regular python code ('holes' in the literal). > > Below the mentioned quote and some replies to the original thread: > > Guido van Rossum > wrote on > Wed., 17 Aug 2016 at 20:11: > > The explanation is honestly that the current approach is the most > straightforward for the implementation (it's pretty hard to > intercept the string literal before escapes have been processed) and > nobody cares enough about the edge cases to force the implementation > to jump through more hoops. > > I really don't think this discussion should be reopened. 
If you > disagree, please start a new thread on python-ideas. > > > I really think it should. Please look at python code with f-literals. if > they're highlighted as strings throughout, you won't be able to spot > which parts are code. if they're highlighted as code, the escaping rules > guarantee that most highlighters can't correctly highlight python > anymore. i think that's a big issue for readability. Maybe I'm the odd man out, but I really don't care if my editor ever syntax highlights within f-strings. I don't plan on putting anything more complicated than variable names in my f-strings, and I think PEP 8 should recommend something similar. > Brett Cannon > wrote on > Wed., 17 Aug 2016 at 20:28: > > They are still strings, there is just post-processing on the string > itself to do the interpolation. > > > Sounds hacky to me. I'd rather see a proper parser for them, which of > course would make my vision easy. You're saying if such a parser existed, it would be easy to use it to parse your version of f-strings? True enough! And as Nick points out, such a thing already exists in the ast module. But if your proposal were accepted, using such an approach would not be optional (only if you care about the inside of an f-string), it would be required to parse any python code even if you don't care about the contents of f-strings. > By doing it this way the implementation can use Python itself to do > the tokenizing of the string, while if you do the string > interpolation beforehand you would then need to do it entirely at > the C level which is very messy and painful since you're explicitly > avoiding Python's automatic handling of Unicode, etc. > > > of course we reuse the tokenization for the string parts. as said, you > can view an f-literal as interleaved sequence of strings and expressions > with an attached format specification. > > <f'> starts the f-literal, string contents follow. 
the only difference > to other strings is > <{> which starts expression tokenization. once the expression ends, an > optional <format> follows, then a > <}> to switch back to string tokenization > this repeats until (in string parsing mode) a > <'> is encountered which ends the f-literal. > > You also make it harder to work with Unicode-based variable names > (or at least explain it). If you have Unicode in a variable name but > you can't use \N{} in the string to help express it you then have to > say "normal Unicode support in the string applies everywhere *but* > in the string interpolation part". > > > i think you're just proving my point that the way f-literals work now is > confusing. > > the embedded expressions are just normal python. the embedded strings > just normal strings. you can simply switch between both using <{> and > <[format]}>. > > unicode in variable names works exactly the same as in all other python > code because it is regular python code. > > Or another reason is you can explain f-strings as "basically > str.format_map(**locals(), **globals()), but without having to make > the actual method call" (and worrying about clashing keys but I > couldn't think of a way of using dict.update() in a single line). > But with your desired change it kills this explanation by saying > f-strings aren't like this but some magical string that does all of > this stuff before normal string normalization occurs. > > > no, it's simply the expression parts (that for normal formatting are > inside of the braces of .format(...)) are *interleaved* in between > string parts. they're not part of the string. just regular plain python > code. I think that's our disagreement. I do see them as part of the string. > Cheers, and i really hope i've made a strong case, Thanks for your concern and your comments. But you've not swayed me. > philipp Eric. 
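Eric's point that the status-quo form is regex-friendly can be illustrated. The pattern below (a hypothetical sketch of the kind used by simple editor lexers, not CPython's actual tokenizer) finds the end of a double-quoted literal without tracking any nesting:

```python
import re

# Optional prefix, opening quote, then runs of non-quote/non-backslash
# characters interleaved with backslash escapes, then the closing quote.
DQ_STRING = re.compile(r'[fFrRbBuU]{0,2}"[^"\\\n]*(?:\\.[^"\\\n]*)*"')

code = '''x = f"a{func({'a{':1,'3}':[{}, ')', '{']})}b" + y'''
m = DQ_STRING.search(code)

# The nested braces, brackets, and single quotes never confuse the
# pattern, because only a bare double quote can end the literal.
assert m.group() == '''f"a{func({'a{':1,'3}':[{}, ')', '{']})}b"'''
```

Under Philipp's proposal, the same lexer would instead need to count parens, braces, brackets, and quote nesting before it could decide where the literal ends.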
From guido at python.org Fri Aug 19 14:57:56 2016 From: guido at python.org (Guido van Rossum) Date: Fri, 19 Aug 2016 11:57:56 -0700 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: I don't think we should take action now. Would it make sense, as a precaution, to declare the PEP provisional for one release? Then we can develop a sense of whether the current approach causes real problems. We could also emit some kind of warning if the expression part contains an escaped quote, since that's where a potential change would cause breakage. (Or we could leave that to the linters.) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Fri Aug 19 15:00:57 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 20 Aug 2016 05:00:57 +1000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: On Sat, Aug 20, 2016 at 4:43 AM, Eric V. Smith wrote: > Maybe I'm the odd man out, but I really don't care if my editor ever syntax > highlights within f-strings. I don't plan on putting anything more > complicated than variable names in my f-strings, and I think PEP 8 should > recommend something similar. I'd go further than "variable names", and happily include attribute access, subscripting (item access), etc, including a couple of levels of same: def __repr__(self): return f"{self.__class__.__name__}(foo={self.foo!r}, spam={self.spam!r})" But yes, I wouldn't put arbitrarily complex expressions into f-strings. Whether PEP 8 needs to explicitly say so or not, I would agree with you that it's a bad idea. 
ChrisA From steve.dower at python.org Fri Aug 19 15:45:42 2016 From: steve.dower at python.org (Steve Dower) Date: Fri, 19 Aug 2016 12:45:42 -0700 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: On 19Aug2016 1157, Guido van Rossum wrote: > I don't think we should take action now. > > Would it make sense, as a precaution, to declare the PEP provisional for > one release? Then we can develop a sense of whether the current approach > causes real problems. > > We could also emit some kind of warning if the expression part contains > an escaped quote, since that's where a potential change would cause > breakage. (Or we could leave that to the linters.) After reading the responses, I like the idea of explicitly discouraging any sort of string escapes within the expression (whether quotes or special characters), but think it's best left for the linters and style guides. Basically, avoid writing a literal f'{expr}' where you'd need to modify expr at all to rewrite it as: >>> x = expr >>> f'{x}' We will almost certainly be looking to enable code completions and syntax highlighting in Visual Studio within expressions, and we can't easily process the string and then parse it for this purpose, but I think we'll be able to gracefully degrade in cases where escapes that are valid in strings are not valid in code. Cheers, Steve From p.f.moore at gmail.com Fri Aug 19 15:48:51 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 19 Aug 2016 20:48:51 +0100 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: On 19 August 2016 at 19:43, Eric V. Smith wrote: > Maybe I'm the odd man out, but I really don't care if my editor ever syntax > highlights within f-strings. 
I don't plan on putting anything more > complicated than variable names in my f-strings, and I think PEP 8 should > recommend something similar. I agree entirely with this. You're definitely not alone. Paul From anthony at xtfx.me Fri Aug 19 16:11:32 2016 From: anthony at xtfx.me (C Anthony Risinger) Date: Fri, 19 Aug 2016 15:11:32 -0500 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: On Fri, Aug 19, 2016 at 1:43 PM, Eric V. Smith wrote: > > > With your proposal, it's much more difficult to find the end of an > f-string. I do not think this is a reasonable requirement. > > For example, consider the following: > f'a{func({'a{':1,'3}':[{}, ')', '{']})}b' > > A regex will not be able to deal with the matching braces needed to > find the end of the expression. You need to keep track of the nesting > level of parens, braces, brackets, and quotes (at least those, I might > have left something off). > > The way this currently is written in Python 3.6a4: > f"a{func({'a{':1,'3}':[{}, ')', '{']})}b" > > It's trivially easy to find the end of the string. It's easy for both > humans and the parsers. > It might be harder to find the end of an f-string in one shot, but I think that's the crux of the issue: to a reader/developer, is an f-string conceptually one thing or a compound thing? To me (someone who would like to see f-string expressions appear like normal expressions, without extra quoting, and proper syntax highlighting *always*, just like shell), this argument is essentially the same as trying to use a regex to find a closing bracket or brace or parse HTML. It's only hard (disregarding any underlying impl details) because that view regards f-strings as singular things with only one "end", when in reality an f-string is much much more like a compound expression that just happens to look like a string. 
If one rejects the notion that an f-string is "one thing", the boundaries can then be defined as either an unescaped closing quote or an unescaped opening curly brace. When that boundary is met, the highlighter switches to normal python syntax parsing just like it would have at the real end of the string. It also starts looking for a closing curly brace to denote the start of "another" string. There is a difference however in that f-string "expressions" also support format specifications. These are not proper Python expressions to begin with so they don't have any existing highlights. Not sure what they should look like. >> I really think it should. Please look at python code with f-literals. if >> they're highlighted as strings throughout, you won't be able to spot >> which parts are code. if they're highlighted as code, the escaping rules >> guarantee that most highlighters can't correctly highlight python >> anymore. i think that's a big issue for readability. >> > > Maybe I'm the odd man out, but I really don't care if my editor ever > syntax highlights within f-strings. I don't plan on putting anything more > complicated than variable names in my f-strings, and I think PEP 8 should > recommend something similar. > If things aren't highlighted properly I can't see them very well. If f-strings look like other strings in my editor I will absolutely gloss over them as normal strings, expecting no magic, until I later realize some other way. Since I spend significant amounts of time reading foreign code I could see this being quite obnoxious. It might be sufficient to highlight the entire f-string a different color, but honestly I don't think an f-string should ever ever... ever misrepresent itself as a "string", because it's not. It's code that outputs a string, in a compact and concise way. Proper syntax highlighting is one of the most important things in my day-to-day development. 
I loved the idea of f-strings when they were discussed previously, but I haven't used them yet. If they hide what they're really doing under the guise of being a string, I personally will use them much less, despite wanting to, and understanding their value. When I look at a string I want to immediately know just how literal it really is. -------------- next part -------------- An HTML attachment was scrubbed... URL: From anthony at xtfx.me Fri Aug 19 16:24:27 2016 From: anthony at xtfx.me (C Anthony Risinger) Date: Fri, 19 Aug 2016 15:24:27 -0500 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: On Fri, Aug 19, 2016 at 3:11 PM, C Anthony Risinger wrote: > > > When I look at a string I want to immediately know just how literal it > really is. > To further this point, editors today show me \n and \t and friends in a different color, because they are escapes, and this visually tells me the thing going into the string at that point is not what is literally in the code. A raw string does not highlight these because they are no longer escapes, and what you see is what you get. Probably f-strings will be used most in short strings, but they'll also be used for long, heredoc-like triple-quoted strings. It's not going to be fun picking expressions out of that when the wall-of-text contains no visual cues. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Fri Aug 19 16:39:02 2016 From: mertz at gnosis.cx (David Mertz) Date: Fri, 19 Aug 2016 13:39:02 -0700 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: I don't think I've ever used a syntax highlighter that changed color of \n in a string. I get the concept, but I haven't suffered for the absence of that. 
Moreover, although I haven't yet used them, I really doubt I want extra syntax highlighting in f-strings beyond simply the color strings appear as. Well, maybe a uniform distinction for f-string vs. some other kind of string, but nothing internal to the string. YMMV, but that would be my preference in my text editor. Curly braces are perfectly good visual distinction to me. On Aug 19, 2016 1:25 PM, "C Anthony Risinger" wrote: > On Fri, Aug 19, 2016 at 3:11 PM, C Anthony Risinger > wrote: >> >> >> When I look at a string I want to immediately know just how literal it >> really is. >> > > To further this point, editors today show me \n and \t and friends in a > different color, because they are escapes, and this visually tells me the > thing going into the string at that point is not what is literally in the > code. A raw string does not highlight these because they are no longer > escapes, and what you see is what you get. > > Probably f-strings will be used most in short strings, but they'll also be > used for long, heredoc-like triple-quoted strings. It's not going to be fun > picking expressions out of that when the wall-of-text contains no visual > cues. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anthony at xtfx.me Fri Aug 19 16:50:10 2016 From: anthony at xtfx.me (C Anthony Risinger) Date: Fri, 19 Aug 2016 15:50:10 -0500 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: On Fri, Aug 19, 2016 at 3:39 PM, David Mertz wrote: > I don't think I've ever used a syntax highlighter than changed color of \n > in a string. I get the concept, but I haven't suffered for the absence of > that. 
> > Moreover, although I haven't yet used them, I really doubt I want extra > syntax highlighting in f-strings beyond simply the color strings appear as. > Well, maybe a uniform distinction for f-string vs. some other kind of > string, but nothing internal to the string. > > YMMV, but that would be my preference in my text editor. Curly braces are > perfectly good visual distinction to me. > At least vim does this and so does Sublime Text IIRC. Maybe I spend a lot of time writing shell code too, but I very much appreciate the extra visual cue. The only real point I'm trying to make is that expressions within an f-string are an *escape*. They escape the normal semantics of a string literal and instead do something else for a while. Therefore, the escaped sections should not look like (or need to conform to) the rest of the string and they should not require quoting as if it were still within the string, because I escaped it already! -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Fri Aug 19 17:13:48 2016 From: mertz at gnosis.cx (David Mertz) Date: Fri, 19 Aug 2016 14:13:48 -0700 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: Ok. My .vimrc is probably different from yours. I'm sure you are right I *could* make that happen. But I haven't so far. On Aug 19, 2016 1:50 PM, "C Anthony Risinger" wrote: > On Fri, Aug 19, 2016 at 3:39 PM, David Mertz wrote: > >> I don't think I've ever used a syntax highlighter than changed color of >> \n in a string. I get the concept, but I haven't suffered for the absence >> of that. >> >> Moreover, although I haven't yet used them, I really doubt I want extra >> syntax highlighting in f-strings beyond simply the color strings appear as. >> Well, maybe a uniform distinction for f-string vs. some other kind of >> string, but nothing internal to the string. 
>> >> YMMV, but that would be my preference in my text editor. Curly braces are >> perfectly good visual distinction to me. >> > At least vim does this and so does Sublime Text IIRC. Maybe I spend a lot > of time writing shell code too, but I very much appreciate the extra visual > cue. > > The only real point I'm trying to make is that expressions within an > f-string are an *escape*. They escape the normal semantics of a string > literal and instead do something else for a while. Therefore, the escaped > sections should not look like (or need to conform to) the rest of the > string and they should not require quoting as if it were still within the > string, because I escaped it already! > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Fri Aug 19 19:09:11 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 20 Aug 2016 00:09:11 +0100 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: On 19 August 2016 at 21:50, C Anthony Risinger wrote: > The only real point I'm trying to make is that expressions within an > f-string are an *escape*. They escape the normal semantics of a string > literal and instead do something else for a while. Therefore, the escaped > sections should not look like (or need to conform to) the rest of the string > and they should not require quoting as if it were still within the string, > because I escaped it already! So, to me f'{x.partition(' + ')[0]}' reads as a string concatenation. I'm not sure how you'd expect a syntax highlighter to make it look like anything else, to be honest (given that you're arguing *not* to highlight the whole of the content of the f-string the same way). The *real* solution is not to write something like this, instead write f"{x.partition(' + ')[0]}" That makes it easy for *humans* to read. Computers parsing it is irrelevant. 
Once you do that, the proposal here (that unescaped quotes can be used in an f-string) also becomes irrelevant - this expression parses exactly the same way under both the current code and the proposed approach. And that's generally true - code that is clearly written should, in my mind, work the same way regardless. So the proposal ends up being merely "choose your preference as to which form of badly-written code is a syntax error". So the only relevance of syntax highlighting is how badly it fails when handling badly-written or syntactically incorrect code. And detecting an f-string just like you detect any other string, is *much* better behaved in that situation. Detecting a closing quote is simple, and isn't thrown off by incorrect nesting. If you want the *content* of an f-string to be highlighted as an expression, Vim can do that, it can apply specific syntax when in another syntax group such as an f-string, and I'm sure other editors can do this as well - but you should do that once you've determined where the f-string starts and ends. Paul From python at 2sn.net Fri Aug 19 23:13:51 2016 From: python at 2sn.net (Alexander Heger) Date: Sat, 20 Aug 2016 13:13:51 +1000 Subject: [Python-ideas] discontinue iterable strings Message-ID: standard python should discontinue to see strings as iterables of characters - length-1 strings. I see this as one of the biggest design flaws of python. It may have seemed genius at the time, but it has passed its usefulness for practical language use. For example, numpy has no issues >>> np.array('abc') array('abc', dtype='<U3') whereas, as all know, >>> list('abc') ['a', 'b', 'c'] Numpy was of course designed a lot later, with more experience in practical use (in mind). 
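The behavior Alexander objects to bites mostly in generic code that walks nested iterables: since iterating a str yields length-1 strs that are themselves iterable, naive recursion never terminates. A common workaround sketch under today's semantics (not part of any proposal here):

```python
def flatten(items):
    for x in items:
        # Without the str/bytes exclusion this recurses forever:
        # iterating the length-1 string 'a' yields 'a' again.
        if hasattr(x, '__iter__') and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x

assert list(flatten([['ab', 'c'], ['d', [1, 2]]])) == ['ab', 'c', 'd', 1, 2]
```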
Maybe a starting point for transition that latter operation also returns ['abc'] in the long run, could be to have an explicit split operator as recommended use, e.g., 'abc'.split() 'abc'.split('') 'abc'.chars() 'abc'.items() the latter two could return an iterator whereas the former two return lists (currently raise exceptions). Similar for bytes, etc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From egregius313 at gmail.com Fri Aug 19 23:24:26 2016 From: egregius313 at gmail.com (Edward Minnix) Date: Fri, 19 Aug 2016 23:24:26 -0400 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: Message-ID: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> This would introduce a major inconsistency. To do this, you would need to also strip strings of their status as sequences (in collections.abc, Sequence is a subclass of Iterable). Thus, making strings no longer iterable would also mean you could no longer take the length or slice of a string. While I believe your proposal was well intentioned, IMHO it would cause a giant inconsistency in Python (why would one of our core sequences not be iterable?) - Ed > On Aug 19, 2016, at 11:13 PM, Alexander Heger wrote: > > standard python should discontinue to see strings as iterables of characters - length-1 strings. I see this as one of the biggest design flaws of python. It may have seem genius at the time, but it has passed it usefulness for practical language use. For example, numpy has no issues > > >>> np.array('abc') > array('abc', dtype='<U3') > > whereas, as all know, > > >>> list('abc') > ['a', 'b', 'c'] > > Numpy was of course design a lot later, with more experience in practical use (in mind). 
> > Maybe a starting point for transition that latter operation also returns ['abc'] in the long run, could be to have an explicit split operator as recommended use, e.g., > > 'abc'.split() > 'abc'.split('') > 'abc'.chars() > 'abc'.items() > > the latter two could return an iterator whereas the former two return lists (currently raise exceptions). > Similar for bytes, etc. > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From random832 at fastmail.com Fri Aug 19 23:37:57 2016 From: random832 at fastmail.com (Random832) Date: Fri, 19 Aug 2016 23:37:57 -0400 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: Message-ID: <1471664277.2781336.700744857.0EA553A7@webmail.messagingengine.com> On Fri, Aug 19, 2016, at 23:13, Alexander Heger wrote: > Numpy was of course design a lot later, with more experience in practical > use (in mind). The meaning of np.array('abc') is a bit baffling to someone with no experience in numpy. It doesn't seem to be a one-dimensional array containing 'abc', as your next statement suggests. It seems to be a zero-dimensional array? > Maybe a starting point for transition that latter operation also returns > ['abc'] in the long run Just to be clear, are you proposing a generalized list(obj: non-iterable) constructor that returns [obj]? From anthony at xtfx.me Fri Aug 19 23:57:17 2016 From: anthony at xtfx.me (C Anthony Risinger) Date: Fri, 19 Aug 2016 22:57:17 -0500 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: On Fri, Aug 19, 2016 at 6:09 PM, Paul Moore wrote: > On 19 August 2016 at 21:50, C Anthony Risinger wrote: > > The only real point I'm trying to make is that expressions within an > > f-string are an *escape*. 
They escape the normal semantics of a string > > literal and instead do something else for a while. Therefore, the escaped > > sections should not look like (or need to conform to) the rest of the > string > > and they should not require quoting as if it were still within the > string, > > because I escaped it already! > > So, to me > > f'{x.partition(' + ')[0]}' > > reads as a string concatenation. I'm not sure how you'd expect a > syntax highlighter to make it look like anything else, to be honest > (given that you're arguing *not* to highlight the whole of the content > of the f-string the same way). > The two string parts are string-colored and the x.partition bits would look like any other code in the file. It won't look like concatenation at that point. Examples referencing f'{one_expr_and_no_real_string_in_here}' feel somewhat crafted to confuse because the start and end quotes are directly adjacent to the expression. str(...) is the same complexity. Usage in the wild will have plenty of string-stuff on one or both sides, otherwise, why? Shell or Ruby code is probably more representative of how f-strings will be used. I know a couple people have mentioned they won't/don't care about highlighting in an f-string, but I honestly don't know a single person that would prefer this, except maybe one devops guy I know that does everything on old-school green text because why not. I've also spent hours and hours staring at--and sometimes editing--code on barebones servers/containers and I've come to respect the role colors play in my ability to quickly scan and read code. > The *real* solution is not to write something like this, instead write > > f"{x.partition(' + ')[0]}" > Why? Why should I have to care what kind of quote I used at the start of the string? I thought I "escaped" the string at the `{` and now my brain has moved on to the expression? Am I still "inside" the string? etc... 
It's not the highlighting I care about per se; I think we have a small UX failure here. In a quality editor, everything about the {...} will tell me I'm writing a Python expression. It'll be colored like an expression. It'll do fancy completion like an expression. Aw shucks, it *IS* a Python expression! Except for one tiny detail: I'm not allowed to use the quote I use in 95% of all my Python code--without thinking--because I already used it at the string start :-( It's like this weird invisible ruh-roh-still-in-a-string state hangs over you despite everything else suggesting otherwise (highlighting and whatever fanciness helps people output code). The only time I personally use a different quote is when it somehow makes the data more amenable to the task at hand. The data! The literal data! Not the expressions I'm conveniently inlining with the help of f-strings. When I do it's a conscious decision and comes with a reason. Otherwise I'll use one type of quote exclusively (which depends on the lang, but more and more, it's simply doubles). The appeal of f-strings is the rapid inlining of whatever plus string data. "Whatever" is typically more complex than a simple attribute access or variable reference, though not much more complex, e.g. `object.method(key, "default")`. If I have to water it down for people to find it acceptable (such as creating simpler variables ahead-of-time) I'd probably just keep using .format(...). Because what I have gained with an f-string? The problem I have is the very idea that while inlining expressions I'm still somehow inside the string, and I have to think about that. It's not a *huge* overhead for an experienced, most-handsome developer such as myself, but still falls in that 5% territory (using a quote because I must vs. the one used 95%). Since f-strings are fabulous, I want to use them all the time! Alas, now I have to think about flip-flopping quotes.
I don't know what it's like to be taught programming but this seems like a possible negative interaction for new people (learning and using [one] quote combined with easy string building). I know it's not *that* big of a deal to switch quotes. I believe this simpler implementation out the gate (not a dig! still great!) will come at the cost of introducing a small Python oddity requiring explanation. Not just because it's at odds with other languages, but because it's at odds with what the editor is telling the user (free-form expression). tl;dr, UX is weaker when the editor implies a free-form expression in every possible way, but the writer can't use the quote they always use, and I think strings will be common in f-string expression sections. -- C Anthony -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Sat Aug 20 00:02:52 2016 From: random832 at fastmail.com (Random832) Date: Sat, 20 Aug 2016 00:02:52 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: <1471665772.2787655.700749457.3C792167@webmail.messagingengine.com> On Fri, Aug 19, 2016, at 19:09, Paul Moore wrote: > So, to me > > f'{x.partition(' + ')[0]}' > > reads as a string concatenation. I'm not sure how you'd expect a > syntax highlighter to make it look like anything else, to be honest One possible syntax highlighting scheme: - f' and ' are highlighted in pink, along with any plain text content of the string. - - Incidentally, any backslash escapes, not shown here, are highlighted in orange. - { and } are highlighted in blue; along with format specifiers, maybe, or maybe they get another color. - The code inside the expression is highlighted in orange. - Any keywords, builtins, constants, etc, within the expression are highlighted in their usual colors. - - In this example in particular, ' + ' and 0 are highlighted in pink.
A pink + is a character within a string, a gray or orange + is an operator. In terms of Vim's basic syntax groups: - C = Constant [pink] - P = PreProc [blue], by precedent as used for the $(...) delimiters in sh - S = Special [orange], by precedent as used for the content within $(...) in sh, and longstanding near-universal precedent, including in python, for backslash escapes. These would, naturally, have separate non-basic highlight groups, in case a particular user wanted to change one of them. f'foo {x.partition(' + ')[0]:aaa} bar\n' CCCCCCPSSSSSSSSSSSSCCCCCSSCSPPPPPCCCCSSC > (given that you're arguing *not* to highlight the whole of the content > of the f-string the same way). I'm not sure what you mean by "the same way" here, I haven't followed the discussion closely enough to know what statement by whom you're referring to here. From tritium-list at sdamon.com Sat Aug 20 00:07:40 2016 From: tritium-list at sdamon.com (tritium-list at sdamon.com) Date: Sat, 20 Aug 2016 00:07:40 -0400 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: Message-ID: <09fa01d1fa98$66af1be0$340d53a0$@hotmail.com> This is a feature, not a flaw. From: Python-ideas [mailto:python-ideas-bounces+tritium-list=sdamon.com at python.org] On Behalf Of Alexander Heger Sent: Friday, August 19, 2016 11:14 PM To: python-ideas Subject: [Python-ideas] discontinue iterable strings standard python should discontinue to see strings as iterables of characters - length-1 strings. I see this as one of the biggest design flaws of python. It may have seemed genius at the time, but it has passed its usefulness for practical language use. For example, numpy has no issues >>> np.array('abc') array('abc', dtype='<U3') whereas, as all know, >>> list('abc') ['a', 'b', 'c'] Numpy was of course designed a lot later, with more experience in practical use (in mind).
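The practical consequence is easy to show without numpy at all; a sketch with a hypothetical flatten() helper (not part of any library):

```python
def flatten(items):
    # Naive recursive flatten: descend into anything iterable.
    for x in items:
        if hasattr(x, '__iter__') and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x

# Drop the isinstance guard and 'ab' recurses forever:
# 'ab' -> 'a', 'b' -> 'a' -> ..., since a length-1 string iterates
# to more length-1 strings.
assert list(flatten([[1, 2], 'ab', [3, ['cd']]])) == [1, 2, 'ab', 3, 'cd']
```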
Maybe a starting point for transition that latter operation also returns ['abc'] in the long run, could be to have an explicit split operator as recommended use, e.g., 'abc'.split() 'abc'.split('') 'abc'.chars() 'abc'.items() the latter two could return an iterator whereas the former two return lists (currently raise exceptions). Similar for bytes, etc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Sat Aug 20 00:35:27 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 19 Aug 2016 21:35:27 -0700 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: <57B7DE0F.8030100@stoneleaf.us> On 08/19/2016 08:57 PM, C Anthony Risinger wrote: [...] > The appeal of f-strings is the rapid inlining of whatever plus string data. "Whatever" is typically more complex than a simple attribute access or variable reference, though not much more complex eg. `object.method(key, "default")`. If I have to water it down for people to find it acceptable (such as creating simpler variables ahead-of-time) I'd probably just keep using .format(...). Because what I have gained with an f-string? I suspect f-strings are in the same category as lambda -- if it's that complex, use the other tools instead. At this point I don't see this changing. If you want to make any headway you're going to have to do it with a complete alternate implementation, and even then I don't think you have good odds. -- ~Ethan~ From leewangzhong+python at gmail.com Sat Aug 20 01:47:46 2016 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Sat, 20 Aug 2016 01:47:46 -0400 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: Message-ID: On Aug 19, 2016 11:14 PM, "Alexander Heger" wrote: > > standard python should discontinue to see strings as iterables of characters - length-1 strings. I see this as one of the biggest design flaws of python. 
It may have seemed genius at the time, but it has passed its usefulness for practical language use. I'm bothered by it whenever I want to write code that takes a sequence and returns a sequence of the same type. But I don't think that the answer is to remove the concept of strings as sequences. And I don't want strings to be sequences of character code points, because that's forcing humans to think on the implementation level. Please explain the problem with the status quo, preferably with examples where it goes wrong. > For example, numpy has no issues > > >>> np.array('abc') > array('abc', dtype='<U3') > > whereas, as all know, > > >>> list('abc') > ['a', 'b', 'c'] > > Numpy was of course designed a lot later, with more experience in practical use (in mind). Numpy is for numbers. It was designed with numbers in mind. Numpy's relevant experience here is waaaay less than general Python's. -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Sat Aug 20 02:01:17 2016 From: random832 at fastmail.com (Random832) Date: Sat, 20 Aug 2016 02:01:17 -0400 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: Message-ID: <1471672877.2810004.700796057.25C5145B@webmail.messagingengine.com> On Sat, Aug 20, 2016, at 01:47, Franklin? Lee wrote: > That says, "This is a 0-length array of 3-char Unicode strings." Numpy > doesn't recognize the string as a specification of an array. Try > `np.array(4.)` and you'll get (IIRC) `array(4., dtype='float')`, which > has > shape `()`. Numpy probably won't let you index either one. What can you > even do with it? In my poking around I found that you can index it with [()] or access with .item() and .itemset(value). Still seems more like a party trick than the well-reasoned practical implementation choice he implied it was.
From brenbarn at brenbarn.net Sat Aug 20 02:25:44 2016 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Fri, 19 Aug 2016 23:25:44 -0700 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: <57B7F7E8.7080806@brenbarn.net> On 2016-08-19 13:11, C Anthony Risinger wrote: > It might be harder to find the end of an f-string in one shot, but I > think that's the crux of the issue: to a reader/developer, is an > f-string conceptually one thing or a compound thing? > > To me (someone who would like to see f-string expressions appear like > normal expressions, without extra quoting, and proper syntax > highlighting *always*, just like shell), this argument is essentially > the same as trying to use a regex to find a closing bracket or brace or > parse HTML. It's only hard (disregarding any underlying impl details) > because that view regards f-strings as singular things with only one > "end", when in reality an f-string is much much more like a compound > expression that just happens to look like a string. Personally I think that is a dangerous road to go down. It seems it would lead to the practice of doing all sorts of complicated things inside f-strings, which seems like a bad idea to me. In principle you could write your entire program in an f-string, but that doesn't mean we need to accommodate the sort of syntax highlighting that would facilitate that. To me it seems more prudent to just say that f-strings are (as the name implies) strings, and leave it at that. If I ever get to the point where what I'm doing in the f-string is so complicated that I really need syntax highlighting for it to look good, I'd take that as a sign that I should move some of that code out of the f-string into ordinary expressions. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." 
--author unknown From python at 2sn.net Sat Aug 20 02:26:48 2016 From: python at 2sn.net (Alexander Heger) Date: Sat, 20 Aug 2016 16:26:48 +1000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: Message-ID: On 20 August 2016 at 15:47, Franklin? Lee wrote: On Aug 19, 2016 11:14 PM, "Alexander Heger" wrote: > > standard python should discontinue to see strings as iterables of characters - length-1 strings. I see this as one of the biggest design flaws of python. It may have seemed genius at the time, but it has passed its usefulness for practical language use. I'm bothered by it whenever I want to write code that takes a sequence and returns a sequence of the same type. But I don't think that the answer is to remove the concept of strings as sequences. And I don't want strings to be sequences of character code points, because that's forcing humans to think on the implementation level. Please explain the problem with the status quo, preferably with examples where it goes wrong. > For example, numpy has no issues > > >>> np.array('abc') > array('abc', dtype='<U3') >>> a = np.array('abc') >>> a[()] 'abc' >>> a[()][2] 'c' The point is it does not try to disassemble it into elements as it would do with other iterables: >>> np.array([1,2,3]) array([1, 2, 3]) >>> np.array([1,2,3]).shape (3,) Numpy is for numbers. It was designed with numbers in mind. Numpy's relevant experience here is waaaay less than general Python's. But it does deal with strings as monolithic objects, doing away with many of the pitfalls of strings in Python. And yes, it does a lot about memory management, so it is fully aware of strings and bytes ... -------------- next part -------------- An HTML attachment was scrubbed...
URL: From python at 2sn.net Sat Aug 20 02:28:19 2016 From: python at 2sn.net (Alexander Heger) Date: Sat, 20 Aug 2016 16:28:19 +1000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> Message-ID: This would introduce a major inconsistency. To do this, you would need to also strip strings of their status as sequences (in collections.abc, Sequence is a subclass of Iterable). Thus, making strings no longer iterable would also mean you could no longer take the length or slice of a string. you can always define __len__ and __index__ independently. I do this for many objects. But it is a point I have not considered. While I believe your proposal was well intentioned, IMHO it would cause a giant inconsistency in Python (why would one of our core sequences not be iterable?) Yes, I am aware it will cause a lot of backward incompatibilities, but this is based on all the lengthy discussions about "string but not iterable" type determinations. If strings were not iterable, a lot of things would also be easier. You could also ask why an integer cannot be iterated over its bits. As has been noted, a string is one of the few objects whose components can be the object itself: 'a'[0] == 'a' I do not iterate over strings so often that it could not be done using, e.g., str.chars(): for c in str.chars(): print(c) On 20 August 2016 at 13:24, Edward Minnix wrote: > This would introduce a major inconsistency. To do this, you would need to > also strip strings of their status as sequences (in collections.abc, > Sequence is a subclass of Iterable). Thus, making strings no longer > iterable would also mean you could no longer take the length or slice of a > string. > > While I believe your proposal was well intentioned, IMHO it would cause a > giant inconsistency in Python (why would one of our core sequences not be > iterable?)
> > - Ed > > > On Aug 19, 2016, at 11:13 PM, Alexander Heger wrote: > > > > standard python should discontinue to see strings as iterables of > characters - length-1 strings. I see this as one of the biggest design > flaws of python. It may have seemed genius at the time, but it has passed its > usefulness for practical language use. For example, numpy has no issues > > > > >>> np.array('abc') > > array('abc', dtype='<U3') > > > > whereas, as all know, > > > > >>> list('abc') > > ['a', 'b', 'c'] > > > > Numpy was of course designed a lot later, with more experience in > practical use (in mind). > > > > Maybe a starting point for transition that latter operation also returns > ['abc'] in the long run, could be to have an explicit split operator as > recommended use, e.g., > > > > 'abc'.split() > > 'abc'.split('') > > 'abc'.chars() > > 'abc'.items() > > > > the latter two could return an iterator whereas the former two return > lists (currently raise exceptions). > > Similar for bytes, etc. > > > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brenbarn at brenbarn.net Sat Aug 20 02:32:03 2016 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Fri, 19 Aug 2016 23:32:03 -0700 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: <57B7F963.7060702@brenbarn.net> On 2016-08-19 20:57, C Anthony Risinger wrote: > In a quality editor, everything about the {...} will tell me I'm writing a > Python expression.
It'll be colored like an expression. It'll do fancy > completion like an expression. Aw shucks, it*IS* a Python expression! > Except for one tiny detail: I'm not allowed to use the quote I use in 95% > of all my Python code--without thinking--because I already used it at the > string start :-( It's like this weird invisible ruh-roh-still-in-a-string > state hangs over you despite everything else suggesting otherwise But it IS inside a string. That's why it's an f-string. The essence of your argument seems to be that you want expressions inside f-strings to act just like expressions outside of f-strings. But there's already a way to do that: just write the expression outside of the f-string. Then you can assign it to a variable, and refer to the variable in the f-string. The whole point of f-strings is that they allow expressions *inside strings*. It doesn't make sense to pretend those expressions are not inside strings. It's true that the string itself "isn't really a string" in the sense that it's put together at runtime rather than being a constant, but again, the point of f-strings is to make things like that writable as strings in source code. If you don't want to write them as strings, you can still concatenate separate string values or use various other solutions. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From python at 2sn.net Sat Aug 20 02:33:04 2016 From: python at 2sn.net (Alexander Heger) Date: Sat, 20 Aug 2016 16:33:04 +1000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: <1471672877.2810004.700796057.25C5145B@webmail.messagingengine.com> References: <1471672877.2810004.700796057.25C5145B@webmail.messagingengine.com> Message-ID: > > > That says, "This is a 0-length array of 3-char Unicode strings." Numpy > > doesn't recognize the string as a specification of an array. 
Try > > `np.array(4.)` and you'll get (IIRC) `array(4., dtype='float')`, which > > has > > shape `()`. Numpy probably won't let you index either one. What can you > > even do with it? > > In my poking around I found that you can index it with [()] or access > with .item() and .itemset(value). Still seems more like a party trick > than the well-reasoned practical implementation choice he implied it > was. my apologies about the confusion, the dim=() was not the point, but rather that numpy treats the string as a monolithic object rather than disassembling it as it would do with other iterables. I was just trying to give the simplest possible example. Numpy still does >>> np.array([[1,2],[3,4]]) array([[1, 2], [3, 4]]) >>> np.array([[1,2],[3,4]]).shape (2, 2) but not here >>> np.array(['ab','cd']) array(['ab', 'cd'], dtype='<U2') >>> np.array(['ab','cd']).shape (2,) -Alexander -------------- next part -------------- An HTML attachment was scrubbed... URL: From leewangzhong+python at gmail.com Sat Aug 20 03:42:42 2016 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Sat, 20 Aug 2016 03:42:42 -0400 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: Message-ID: On Aug 20, 2016 2:27 AM, "Alexander Heger" wrote: > The point is it does not try to disassemble it into elements as it would do with other iterables > > >>> np.array([1,2,3]) > array([1, 2, 3]) > >>> np.array([1,2,3]).shape > (3,) It isn't so much that strings are special, it's that lists and tuples are special. Very few iterables can be directly converted to Numpy arrays. Try `np.array({1,2})` and you get `array({1, 2}, dtype=object)`, a 0-dimensional array.
Numpy is NOT designed to mess around with strings, and Numpy does NOT have as high a proportion of programmers using it for strings, so Numpy does not have much insight into what's good and what's useful for programmers who need to mess around with strings. In summary, Numpy isn't a good example of "strings done right, through more experience", because they are barely "done" at all. > doing away with many of the pitfalls of strings in Python. Please start listing the pitfalls, and show how alternatives will be an improvement. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Aug 20 03:47:44 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 20 Aug 2016 17:47:44 +1000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> Message-ID: On Sat, Aug 20, 2016 at 4:28 PM, Alexander Heger wrote: > Yes, I am aware it will cause a lot of backward incompatibilities... Tell me, would you retain the ability to subscript a string to get its characters? >>> "asdf"[0] 'a' If not, you break a ton of code. If you do, they are automatically iterable *by definition*. Watch: class IsThisIterable: def __getitem__(self, idx): if idx < 5: return idx*idx raise IndexError >>> iti = IsThisIterable() >>> for x in iti: print(x) ... 0 1 4 9 16 So you can't lose iteration without also losing subscripting. ChrisA From turnbull.stephen.fw at u.tsukuba.ac.jp Sat Aug 20 04:18:18 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. 
Turnbull) Date: Sat, 20 Aug 2016 17:18:18 +0900 Subject: [Python-ideas] Fix default encodings on Windows In-Reply-To: References: <57AB6E2D.6050704@python.org> <22449.19866.890960.881902@turnbull.sk.tsukuba.ac.jp> <1471268468.1079232.695694825.0DF94146@webmail.messagingengine.com> <1471280043.1122673.695914769.6B59B8C1@webmail.messagingengine.com> <909dd51a-4288-e110-3bc0-7d92ce319d0c@python.org> <8087808059974504431@unknownmsgid> <1471359599.1398250.696922633.4B8CEB63@webmail.messagingengine.com> <3578204020037430971@unknownmsgid> <2ff7ffea-345e-a83f-752c-6dcbfdeea3af@python.org> <22452.12264.206515.900194@turnbull.sk.tsukuba.ac.jp> <8f952fd8-2926-d8a0-2b6d-6d02df044b43@python.org> <22453.8366.592660.590478@turnbull.sk.tsukuba.ac.jp> Message-ID: <22456.4682.970532.305927@turnbull.sk.tsukuba.ac.jp> Chris Barker writes: > Sure -- but it's entirely unnecessary, yes? If you don't change > your code, you'll get py2(bytes) strings as paths in py2, and py3 > (Unicode) strings as paths on py3. So different, yes. But wouldn't > it all work? The difference is that if you happen to have a file name on Unix that is *not* encoded in the default locale, bytes Just Works, while Something Bad happens with unicode (mixing Python 3 and Python 2 terminology for clarity). Also, in Python the C/POSIX default locale implied a codec of 'ascii' which is quite risky nowadays, so using unicode meant always being conscious of encodings. > So folks are making an active choice to change their code to get some > perceived (real?) performance benefit??? No, they're making a passive choice to not fix whut ain't broke nohow, but in Python 3 is spelled differently. It's the same order of change as "print stuff" (Python 2) to "print(stuff)" (Python 3), except that it's not as automatic. (Ie, where print is *always* a function call in Python 3, often in a Python 2 -> 3 port you're better off with str than bytes, especially before PEP 461 "% formatting for bytes".) 
> However, as I understand it, py3 string paths did NOT "just work" > in place of py2 paths before surrogate pairs were introduced (when > was that?) I'm not sure what you're referring to. Python 2 unicode and Python 3 str have been capable of representing (for values of "representing" that require appropriate choice of I/O codecs) the entire repertoire of Unicode since version 1.6 [sic!]. I suppose you mean PEP 383 (implemented in Python 3.1), which added a pseudo-encoding for unencodable bytes, ie, the surrogateescape error handler. This was never a major consideration in practice, however, as you could always get basically the same effect with the 'latin-1' codec. That is, the surrogateescape handler is primarily of benefit to those who are already convinced that fully conformant Unicode is the way to go. It doesn't make a difference to those who prefer bytes. > What I'm getting at is whether there is anything other than inertia > that keeps folks using bytes paths in py3 code? Maybe it wouldn't > be THAT hard to get folks to make the switch: it's EASIER to port > your code to py3 this way! It's not. First, encoding awareness is real work. If you try to DTRT, you open yourself up to UnicodeErrors anywhere in your code where there's a Python/rest-of-world boundary. If you just use bytes, you may be producing garbage, but your program doesn't stop running, and you can always argue it's either your upstream's or your downstream's fault. I *personally* have always found the work to be worthwhile, as my work always involves "real" text processing, and frequently not in pure ASCII. Second, there are a lot of low-level use cases where (1) efficiency matters and (2) all the processing actually done involves switching on byte values in the range 32-126. It makes sense to do that work on bytes, wouldn't you say? And to make the switch cases easier to read, it's common practice to form (or contort) those bytes into human words. 
These cases include a lot of the familiar acronyms: SMTP, HTTP, DNS, VCS, VM (as in "bytecode interpreter"), ... and the projects are familiar: Twisted, Mercurial, .... Bottom line: I'm with you! I think that "filenames are text" *should* be the default mode for Python programmers. But there are important use cases where it's sometimes more work to make that work than to make bytes work (on POSIX), and typically those cases also inherit largish, battle-tested code bases that assume a "bytes in, bytes through, bytes out" model. We can't deprecate "filenames as bytes" on POSIX yet, and if we want to encourage participation in projects that use that model by Windows-based programmers, we can't deprecate completely on Windows, either. From turnbull.stephen.fw at u.tsukuba.ac.jp Sat Aug 20 05:29:30 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Sat, 20 Aug 2016 18:29:30 +0900 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: <22456.8954.576042.596259@turnbull.sk.tsukuba.ac.jp> C Anthony Risinger writes: > The only time I personally use a different quote is when it somehow > makes the data more amenable to the task at hand. The data! The > literal data! Not the expressions I'm conveniently inlining with > the help of f-strings. You do *not* know that yet! *Nobody* does. Nobody has yet written an f-string in production code, let alone read thousands and written hundreds. Can you be sure that after you write a couple dozen f-strings you won't find that such "quote context" is carried over naturally from the way you write other strings? (Eg, because "I'm still in a string" is signaled by the highlighting of the surrounding stringish text.) I think the proposed changes to the PEP fall into the "Sometimes never is better than *right* now" category. 
The arguments I've seen so far are plausible but not founded in experience: it could easily go the other way, and I don't see potential for true disaster. > If I have to water it down for people to find it acceptable (such > as creating simpler variables ahead-of-time) I'd probably just keep > using .format(...). Because what I have gained with an f-string? I don't see a problem if you choose not to write f-strings. Would other people using that convention be hard for you to *read*? > Not just because it's at odds with other languages, but because > it's at odds with what the editor is telling the user (free-form > expression). There are no editors that will tell you such a thing yet. And if you trust an editor that *does* tell you that it's a free-form expression and use the same kind of quote that delimits the f-string, you won't actually create a syntax error. You're merely subject to the same kind of "problem" that you have if you don't write PEP8 conforming code. Regards, From p.f.moore at gmail.com Sat Aug 20 05:43:00 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 20 Aug 2016 10:43:00 +0100 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: On 20 August 2016 at 04:57, C Anthony Risinger wrote: > The two string parts are string-colored and the x.partition bits would look like > any other code in the file. It won't look like concatenation at that point. That's entirely subjective and theoretical (unless you've implemented it and reviewed the resulting look of the code). In my (equally subjective and theoretical) opinion it would still look like concatenation, and would confuse me. I made a deliberate point of saying that *to me* it looked like concatenation. YMMV - remember this tangent was started by people stating their opinions. Saying that your opinion differs doesn't invalidate their (my) view. 
> tl;dr, UX is weaker when the editor implies a free-form expression in every > possible way, but the writer can't use the quote they always use, and I > think strings will be common in f-string expression sections. FWIW I would instantly reject any code passed to me for review which used the same quote within an f-string as was used to delimit it, should this proposal be accepted. Also, a lot of code is read on media that doesn't do syntax highlighting - email, books, etc. A construct that needs syntax highlighting to be readable is problematic because of this. Paul From p.f.moore at gmail.com Sat Aug 20 05:47:10 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 20 Aug 2016 10:47:10 +0100 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: <1471665772.2787655.700749457.3C792167@webmail.messagingengine.com> References: <1471665772.2787655.700749457.3C792167@webmail.messagingengine.com> Message-ID: On 20 August 2016 at 05:02, Random832 wrote: > On Fri, Aug 19, 2016, at 19:09, Paul Moore wrote: >> So, to me >> >> f'{x.partition(' + ')[0]}' >> >> reads as a string concatenation. I'm not sure how you'd expect a >> syntax highlighter to make it look like anything else, to be honest > > One possible syntax highlighting scheme: Thanks for the detailed explanation and example. Yes, that may well be a reasonable highlighting scheme. I'd still object to reusing single quotes in the example given, though, as it would be confusing if printed, or in email, etc. And as a general principle, "needs syntax highlighting to be readable" is a problem to me. So I stand by my statement that as a style rule, f-strings should be written to work identically regardless of whether this proposal is implemented or not. 
Paul

From michael.selik at gmail.com Sat Aug 20 08:31:48 2016
From: michael.selik at gmail.com (Michael Selik)
Date: Sat, 20 Aug 2016 12:31:48 +0000
Subject: [Python-ideas] discontinue iterable strings
In-Reply-To: 
References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com>
Message-ID: 

On Sat, Aug 20, 2016 at 3:48 AM Chris Angelico wrote:

> On Sat, Aug 20, 2016 at 4:28 PM, Alexander Heger wrote:
> > Yes, I am aware it will cause a lot of backward incompatibilities...
>
> Tell me, would you retain the ability to subscript a string to get its
> characters?
>
> >>> "asdf"[0]
> 'a'
>

A separate character type would solve that issue. While Alexander Heger was
advocating for a "monolithic object," and may in fact not want subscripting,
I think he's more frustrated by the fact that iterating over a string gives
other strings. If instead a 1-length string were a different, non-iterable
type, that might avoid some problems.

However, special-casing a character as a different type would bring its own
problems. Note the annoyance of iterating over bytes and getting integers.

In case it's not clear, I should add that I disagree with this proposal and
do not want any change to strings.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From michael.selik at gmail.com Sat Aug 20 08:53:48 2016
From: michael.selik at gmail.com (Michael Selik)
Date: Sat, 20 Aug 2016 12:53:48 +0000
Subject: [Python-ideas] Consider having collections.abc.Sequence
In-Reply-To: 
References: 
Message-ID: 

On Fri, Aug 19, 2016 at 8:55 AM Neil Girdhar wrote:

> Sure.
>
> http://bugs.python.org/issue27802
>
>
> On Friday, August 19, 2016 at 8:36:39 AM UTC-4, Emanuel Barry wrote:
>>
>> Arek Bulski wrote:
>>
>> > Could use all(a==b for a, b in zip(seq,seq2))
>>
>>
>>
>> Or even `all(itertools.starmap(operator.eq, zip(a, b)))` if you prefer,
>> but this isn't about how easy or clever or obfuscated one can write that;
>> it's about convenience.
ABCs expose the lowest common denominator for
>> concrete classes of their kind, and having __eq__ makes sense for Sequence
>> (I'm surprised that it's not already in).
>>
>>
>>
>> I think we can skip the Python-ideas thread and go straight to opening an
>> issue and submitting a patch :) Neil, care to do that?
>>
>>
>>
>> -Emanuel
>>
> tuples and lists are both Sequences, yet are not equal to each other.

py> [1] == (1,)
False
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rosuav at gmail.com Sat Aug 20 09:53:51 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 20 Aug 2016 23:53:51 +1000
Subject: [Python-ideas] discontinue iterable strings
In-Reply-To: 
References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com>
Message-ID: 

On Sat, Aug 20, 2016 at 10:31 PM, Michael Selik wrote:
> On Sat, Aug 20, 2016 at 3:48 AM Chris Angelico wrote:
>>
>> On Sat, Aug 20, 2016 at 4:28 PM, Alexander Heger wrote:
>> > Yes, I am aware it will cause a lot of backward incompatibilities...
>>
>> Tell me, would you retain the ability to subscript a string to get its
>> characters?
>>
>> >>> "asdf"[0]
>> 'a'
>
>
> A separate character type would solve that issue. While Alexander Heger was
> advocating for a "monolithic object," and may in fact not want subscripting,
> I think he's more frustrated by the fact that iterating over a string gives
> other strings. If instead a 1-length string were a different, non-iterable
> type, that might avoid some problems.
>
> However, special-casing a character as a different type would bring its own
> problems. Note the annoyance of iterating over bytes and getting integers.
>
> In case it's not clear, I should add that I disagree with this proposal and
> do not want any change to strings.

Agreed. One of the handy traits of cross-platform code is that MANY
languages let you subscript a double-quoted string to get a
single-quoted character.
Compare these blocks of code: if ("asdf"[0] == 'a') write("The first letter of asdf is a.\n"); if ("asdf"[0] == 'a'): print("The first letter of asdf is a.") if ("asdf"[0] == 'a') console.log("The first letter of asdf is a.") if ("asdf"[0] == 'a') printf("The first letter of asdf is a.\n"); if ("asdf"[0] == 'a') echo("The first letter of asdf is a.\n"); Those are Pike, Python, JavaScript/ECMAScript, C/C++, and PHP, respectively. Two of them treat single-quoted and double-quoted strings identically (Python and JS). Two use double quotes for strings and single quotes for character (aka integer) constants (Pike and C). One has double quotes for interpolated and single quotes for non-interpolated strings (PHP). And just to mess you up completely, two (or three) of these define strings to be sequences of bytes (C/C++ and PHP, plus Python 2), two as sequences of Unicode codepoints (Python and Pike), and one as sequences of UTF-16 code units (JS). But in all five, subscripting a double-quoted string yields a single-quoted character. I'm firmly of the opinion that this should not change. Code clarity is not helped by creating a brand-new "character" type and not having a corresponding literal for it, and the one obvious literal, given the amount of prior art using it, would be some form of quote character - probably the apostrophe. Since that's not available, I think a character type would be a major hurdle to get over. ChrisA From elazarg at gmail.com Sat Aug 20 13:18:35 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Sat, 20 Aug 2016 17:18:35 +0000 Subject: [Python-ideas] Consider having collections.abc.Sequence In-Reply-To: References: Message-ID: On Sat, Aug 20, 2016 at 3:54 PM Michael Selik wrote: > On Fri, Aug 19, 2016 at 8:55 AM Neil Girdhar > wrote: > >> Sure. 
>>
>> http://bugs.python.org/issue27802
>>
>>
>> On Friday, August 19, 2016 at 8:36:39 AM UTC-4, Emanuel Barry wrote:
>>>
>>> Arek Bulski wrote:
>>>
>>> > Could use all(a==b for a, b in zip(seq,seq2))
>>>
>>>
>>>
>>> Or even `all(itertools.starmap(operator.eq, zip(a, b)))` if you prefer,
>>> but this isn't about how easy or clever or obfuscated one can write that;
>>> it's about convenience. ABCs expose the lowest common denominator for
>>> concrete classes of their kind, and having __eq__ makes sense for Sequence
>>> (I'm surprised that it's not already in).
>>>
>>>
>>>
>>> I think we can skip the Python-ideas thread and go straight to opening
>>> an issue and submitting a patch :) Neil, care to do that?
>>>
>>>
>>>
>>> -Emanuel
>>>
>>
> tuples and lists are both Sequences, yet are not equal to each other.
>
> py> [1] == (1,)
> False
>
>
As long as you treat them both as abc.Sequence instances, they _should_ be
equal. One can think of a static method Sequence.equals(a, b) for that
purpose. Of course, that's not how it's done in dynamic languages such as
Python (or Java!), so implementing the default __eq__ this way will break
symmetry.

~Elazar
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From elazarg at gmail.com Sat Aug 20 13:26:29 2016
From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=)
Date: Sat, 20 Aug 2016 17:26:29 +0000
Subject: [Python-ideas] discontinue iterable strings
In-Reply-To: 
References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com>
Message-ID: 

On Sat, Aug 20, 2016 at 4:54 PM Chris Angelico wrote:

> I'm firmly of the opinion that this should not change. Code clarity is
> not helped by creating a brand-new "character" type and not having a
> corresponding literal for it, and the one obvious literal, given the
> amount of prior art using it, would be some form of quote character -
> probably the apostrophe. Since that's not available, I think a
> character type would be a major hurdle to get over.
It's possible (though not so pretty or obvious) to use $a for the
character "a".

~Elazar
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From eric at trueblade.com Sat Aug 20 13:32:10 2016
From: eric at trueblade.com (Eric V. Smith)
Date: Sat, 20 Aug 2016 13:32:10 -0400
Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?=
 =?utf-8?q?ls_impossible?=
In-Reply-To: 
References: 
Message-ID: 

On 8/19/2016 2:57 PM, Guido van Rossum wrote:
> I don't think we should take action now.
>
> Would it make sense, as a precaution, to declare the PEP provisional for
> one release? Then we can develop a sense of whether the current approach
> causes real problems.
>
> We could also emit some kind of warning if the expression part contains
> an escaped quote, since that's where a potential change would cause
> breakage. (Or we could leave that to the linters.)

If anything, I'd make it an error to have any backslashes inside the
brackets of an f-string for 3.6. We could always remove this restriction
at a later date.

I say this because as soon as f-strings make it into the wild, we're
going to have a hard time breaking code in say 3.7 by saying "well, we
told you that f-strings might change". Although frankly, other than by
executive fiat (which I'm okay with), I don't see us ever resolving the
issue of whether f-strings are strings first, or whether the brackets put
you into "non-string mode". There are good arguments on both sides.

Moving to the implementation details, I'm not sure how easy it would be
to even find backslashes, though. IIRC, backslashes are replaced early,
before the f-string parser really gets to process the string. It might
require a new implementation of the f-string parser independent of
regular strings, which I likely would not have time for before beta 1.
Although since this would be a reduction in functionality, maybe it
doesn't have to get done by then.

I also haven't thought of how this would affect raw f-strings.
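To make the restriction concrete, here is a quick sketch (examples mine,
not Eric's code) of what banning backslashes inside the brackets would
mean in practice:

```python
# Sketch of the proposed rule: a backslash inside the braces of an
# f-string would be rejected, while string literals that use the *other*
# quote type remain legal. Examples are illustrative only.

x = "spam + eggs"

# Legal: the inner literal uses single quotes, the f-string double quotes.
first = f"{x.partition(' + ')[0]}"
print(first)  # spam

# Would be rejected under the proposal (backslash inside the braces):
#     f"{'\n'.join(['a', 'b'])}"
# The usual workaround is to hoist the backslash out of the braces:
sep = "\n"
joined = f"{sep.join(['a', 'b'])}"
print(repr(joined))  # 'a\nb'
```

Whether the hoisted variable is a wart or a readability win is, of course,
exactly what this thread is arguing about.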
In any event, I'll take a look at adding this restriction, just to get
an estimate of the magnitude of work involved. The easiest thing to do
might be to disallow backslashes in any part of an f-string for 3.6,
although that seems to be going too far.

Eric.

From random832 at fastmail.com Sat Aug 20 14:25:22 2016
From: random832 at fastmail.com (Random832)
Date: Sat, 20 Aug 2016 14:25:22 -0400
Subject: [Python-ideas] discontinue iterable strings
In-Reply-To: 
References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com>
Message-ID: <1471717522.2931769.701118945.091DB9CC@webmail.messagingengine.com>

On Sat, Aug 20, 2016, at 13:26, Elazar wrote:
> It's possible (though not so pretty or obvious) to use $a for the
> character "a".

This isn't Lisp. If I were inventing a character literal for Python I
would probably spell it c'a'.

From python at 2sn.net Sat Aug 20 16:39:21 2016
From: python at 2sn.net (Alexander Heger)
Date: Sun, 21 Aug 2016 06:39:21 +1000
Subject: [Python-ideas] discontinue iterable strings
In-Reply-To: 
References: 
Message-ID: 

>
> It isn't so much that strings are special, it's that lists and tuples are
> special. Very few iterables can be directly converted to Numpy arrays. Try
> `np.array({1,2})` and you get `array({1, 2}, dtype=object)`, a
> 0-dimensional array.
>

there is no representation for sets as unordered data, since in Numpy
everything is ordered

>
> But it does deal with strings as monolithic objects,
>
> Seems to me that Numpy treats strings as "I, uh, don't really know what
> you want me to do with this" objects. That kinda makes sense for Numpy,
> because, uh, what DO you want Numpy to do with strings?
>

if it were an iterable, it would be converted to an array of length-one
characters

> Numpy is NOT designed to mess around with strings, and Numpy does NOT have
> as high a proportion of programmers using it for strings, so Numpy does not
> have much insight into what's good and what's useful for programmers who
> need to mess around with strings.
> sometimes arrays of string-like objects are needed. In summary, Numpy isn't a good example of "strings done right, through more > experience", because they are barely "done" at all. > > > doing away with many of the pitfalls of strings in Python. > > Please start listing the pitfalls, and show how alternatives will be an > improvement. > The question is about determination of iterable objects. This has been discussed many times on this list. def f(x): try: for i in x: f(i) except: print(x) f((1,2,3)) f(('house', 'car', 'cup')) -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at 2sn.net Sat Aug 20 16:51:45 2016 From: python at 2sn.net (Alexander Heger) Date: Sun, 21 Aug 2016 06:51:45 +1000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> Message-ID: > > Agreed. One of the handy traits of cross-platform code is that MANY > languages let you subscript a double-quoted string to get a > single-quoted character. Compare these blocks of code: > > if ("asdf"[0] == 'a') > write("The first letter of asdf is a.\n"); > > if ("asdf"[0] == 'a'): > print("The first letter of asdf is a.") > > if ("asdf"[0] == 'a') > console.log("The first letter of asdf is a.") > > if ("asdf"[0] == 'a') > printf("The first letter of asdf is a.\n"); > > if ("asdf"[0] == 'a') > echo("The first letter of asdf is a.\n"); > > Those are Pike, Python, JavaScript/ECMAScript, C/C++, and PHP, > respectively. Two of them treat single-quoted and double-quoted > strings identically (Python and JS). Two use double quotes for strings > and single quotes for character (aka integer) constants (Pike and C). > One has double quotes for interpolated and single quotes for > non-interpolated strings (PHP). 
And just to mess you up completely, > two (or three) of these define strings to be sequences of bytes (C/C++ > and PHP, plus Python 2), two as sequences of Unicode codepoints > (Python and Pike), and one as sequences of UTF-16 code units (JS). But > in all five, subscripting a double-quoted string yields a > single-quoted character. > > I'm firmly of the opinion that this should not change. Code clarity is > not helped by creating a brand-new "character" type and not having a > corresponding literal for it, and the one obvious literal, given the > amount of prior art using it, would be some form of quote character - > probably the apostrophe. Since that's not available, I think a > character type would be a major hurdle to get over. I was not proposing a character type, only that strings are not iterable: for i in 'abc': print(i) TypeError: 'str' object is not iterable -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at 2sn.net Sat Aug 20 16:54:00 2016 From: python at 2sn.net (Alexander Heger) Date: Sun, 21 Aug 2016 06:54:00 +1000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> Message-ID: > > I was not proposing a character type, only that strings are not iterable: > > for i in 'abc': > print(i) > > TypeError: 'str' object is not iterable > but for i in 'abc'.chars(): print(i) a b c -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at 2sn.net Sat Aug 20 16:56:49 2016 From: python at 2sn.net (Alexander Heger) Date: Sun, 21 Aug 2016 06:56:49 +1000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> Message-ID: > > > Yes, I am aware it will cause a lot of backward incompatibilities... > > Tell me, would you retain the ability to subscript a string to get its > characters? > > >>> "asdf"[0] > 'a' > > If not, you break a ton of code. 
If you do, they are automatically > iterable *by definition*. Watch: > > class IsThisIterable: > def __getitem__(self, idx): > if idx < 5: return idx*idx > raise IndexError > > >>> iti = IsThisIterable() > >>> for x in iti: print(x) > ... > 0 > 1 > 4 > 9 > 16 > > So you can't lose iteration without also losing subscripting. > Python here does a lot of things implicitly. I always felt the (explicit) index operator in strings in many other languages sort of is syntactic sugar, in python it was taken to do literally the same things as on other objects. But it does not have to be that way. -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.selik at gmail.com Sat Aug 20 17:04:08 2016 From: michael.selik at gmail.com (Michael Selik) Date: Sat, 20 Aug 2016 21:04:08 +0000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> Message-ID: On Sat, Aug 20, 2016 at 4:57 PM Alexander Heger wrote: > So you can't lose iteration without also losing subscripting. >> > > Python here does a lot of things implicitly. I always felt the (explicit) > index operator in strings in many other languages sort of is syntactic > sugar, in python it was taken to do literally the same things as on other > objects. But it does not have to be that way. > You can quibble with the original design choice, but unless you borrow Guido's time machine, there's not much point to that discussion. Instead, let's talk about the benefits and problems that your change proposal would cause. Benefits: - no more accidentally using str as an iterable Problems: - any code that subscripts, slices, or iterates over a str will break Did I leave anything out? How would you weigh the benefits against the problems? How would you manage the upgrade path for code that's been broken? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From python at 2sn.net Sat Aug 20 17:27:53 2016
From: python at 2sn.net (Alexander Heger)
Date: Sun, 21 Aug 2016 07:27:53 +1000
Subject: [Python-ideas] discontinue iterable strings
In-Reply-To: 
References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com>
Message-ID: 

>
> So you can't lose iteration without also losing subscripting.
>>>
>>
>> Python here does a lot of things implicitly. I always felt the
>> (explicit) index operator in strings in many other languages sort of is
>> syntactic sugar, in python it was taken to do literally the same things as
>> on other objects. But it does not have to be that way.
>>
>
> You can quibble with the original design choice, but unless you borrow
> Guido's time machine, there's not much point to that discussion.
>

Just to be clear, at the time it was designed, it surely was a genius idea
with its obvious simplicity. I spend much of my time refactoring code and
interfaces from previous "genius" ideas, as usage matures.

> Instead, let's talk about the benefits and problems that your change
> proposal would cause.
>
> Benefits:
> - no more accidentally using str as an iterable
>
> Problems:
> - any code that subscripts, slices, or iterates over a str will break
>

I would try to keep indexing and slicing, but not iterating. Though there
have been comments that this may not be straightforward to implement. Not
sure if strings would need to acquire a "substring" attribute that can be
indexed and sliced.

Did I leave anything out?
> How would you weigh the benefits against the problems?
> How would you manage the upgrade path for code that's been broken?
>

First one needs to add the extension string attributes like
split()/split(''), chars(), and substring[] (Python 3.7).
It could already be used/test starting with Python 3.7 using 'from future import __monolythic_strings__`. -------------- next part -------------- An HTML attachment was scrubbed... URL: From elazarg at gmail.com Sat Aug 20 17:56:29 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Sat, 20 Aug 2016 21:56:29 +0000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> Message-ID: On Sun, Aug 21, 2016 at 12:28 AM Alexander Heger wrote: > Did I leave anything out? >> How would you weigh the benefits against the problems? >> How would you manage the upgrade path for code that's been broken? >> > > FIrst one needs to add the extension string attributes like > split()/split(''), chars(), and substring[] (Python 3.7). > > When indexing becomes disallowed (Python 3.10 / 4.0) attempts to iterate > (or slice) will raise TypeError. The fixes overall will be a lot easier > and obvious than introduction of unicode as default string type in Python > 3.0. It could already be used/test starting with Python 3.7 using 'from > future import __monolythic_strings__`. > > Is there any equivalent __future__ import with such deep semantic implications? Most imports I can think of are mainly syntactic. And what would it do? change the type of string literals? change the behavior of str methods locally in this module? globally? How will this play with 3rd party libraries? Sounds like it will break stuff in a way that cannot be locally fixed. ~Elazar -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From michael.selik at gmail.com Sat Aug 20 20:34:02 2016 From: michael.selik at gmail.com (Michael Selik) Date: Sun, 21 Aug 2016 00:34:02 +0000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> Message-ID: On Sat, Aug 20, 2016 at 5:27 PM Alexander Heger wrote: > - any code that subscripts, slices, or iterates over a str will break >> > > I would try to keep indexing and slicing, but not iterating. > So anything that wants to loop over a string character by character would need to construct a new object, like ``for c in list(s)``? That seems inefficient. I suppose you might advocate for a new type, some sort of stringview that would allow iteration over a string, but avoid allocating so much space as a list does, but that might bring us back to where we started. > The fixes overall will be a lot easier and obvious than introduction of > unicode as default string type in Python 3.0. > That's a bold claim. Have you considered what's at stake if that's not true? Anyway, why don't you write a proof of concept module for a non-iterable string, throw it on PyPI, and see if people like using it? -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Aug 20 22:52:48 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 21 Aug 2016 12:52:48 +1000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> Message-ID: <20160821025247.GZ26300@ando.pearwood.info> On Sun, Aug 21, 2016 at 12:34:02AM +0000, Michael Selik wrote: > So anything that wants to loop over a string character by character would > need to construct a new object, like ``for c in list(s)``? That seems > inefficient. 
I suppose you might advocate for a new type, some sort of
> stringview that would allow iteration over a string, but avoid allocating
> so much space as a list does, but that might bring us back to where we
> started.

If this was ten years ago, and we were talking about backwards
incompatible changes for the soon-to-be-started Python 3000, I might be
more responsive to changing strings to be an atomic type (like ints,
floats, etc) with a .chars() view that iterates over the characters.

Or something like what Go does (I think), namely to distinguish between
Chars and Strings: indexing a string gives you a Char, and Chars are not
indexable and not iterable.

But even then, the change probably would have required a PEP.

> > The fixes overall will be a lot easier and obvious than introduction of
> > unicode as default string type in Python 3.0.
>
> That's a bold claim. Have you considered what's at stake if that's not true?

Saying that these so-called "fixes" (we haven't established yet that
Python's string behaviour is a bug that needs fixing) will be easier and
more obvious than the change to Unicode is not that bold a claim. Pretty
much everything is easier and more obvious than changing to Unicode. :-)
(Possibly not bringing peace to the Middle East.)
-- Steve From rosuav at gmail.com Sun Aug 21 00:10:41 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 21 Aug 2016 14:10:41 +1000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: <20160821025247.GZ26300@ando.pearwood.info> References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <20160821025247.GZ26300@ando.pearwood.info> Message-ID: On Sun, Aug 21, 2016 at 12:52 PM, Steven D'Aprano wrote: >> > The fixes overall will be a lot easier and obvious than introduction of >> > unicode as default string type in Python 3.0. >> >> That's a bold claim. Have you considered what's at stake if that's not true? > > Saying that these so-called "fixes" (we haven't established yet that > Python's string behaviour is a bug that need fixing) will be easier and > more obvious than the change to Unicode is not that bold a claim. Pretty > much everything is easier and more obvious than changing to Unicode. :-) > (Possibly not bringing peace to the Middle East.) And yet it's so simple. We can teach novice programmers about two's complement [1] representations of integers, and they have no trouble comprehending that the abstract concept of "integer" is different from the concrete representation in memory. We can teach intermediate programmers how hash tables work, and how to improve their performance on CPUs with 64-byte cache lines - again, there's no comprehension barrier between "mapping from key to value" and "puddle of bytes in memory that represent that mapping". But so many programmers are entrenched in the thinking that a byte IS a character. > I think that while the suggestion does bring some benefit, the benefit > isn't enough to make up for the code churn and disruption it would > cause. But I encourage the OP to go through the standard library, pick a > couple of modules, and re-write them to see how they would look using > this proposal. 
Python still has a rule that you can iterate over anything that has __getitem__, and it'll be called with 0, 1, 2, 3... until it raises IndexError. So you have two options: Remove that rule, and require that all iterable objects actually define __iter__; or make strings non-subscriptable, which means you need to do something like "asdf".char_at(0) instead of "asdf"[0]. IMO the second option is a total non-flyer - good luck convincing anyone that THAT is an improvement. The first one is possible, but dramatically broadens the backward-compatibility issue. You'd have to search for any class that defines __getitem__ and not __iter__. If that *does* get considered, it wouldn't be too hard to have a compatibility function, maybe in itertools. def subscript(self): i = 0 try: while "moar indexing": yield self[i] i += 1 except IndexError: pass class Demo: def __getitem__(self, item): ... __iter__ = itertools.subscript But there'd have to be the full search of "what will this break", even before getting as far as making strings non-iterable. ChrisA [1] Not "two's compliment", although I'm told that Two can say some very nice things. From brenbarn at brenbarn.net Sun Aug 21 01:06:07 2016 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Sat, 20 Aug 2016 22:06:07 -0700 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <20160821025247.GZ26300@ando.pearwood.info> Message-ID: <57B936BF.1030008@brenbarn.net> On 2016-08-20 21:10, Chris Angelico wrote: >> >I think that while the suggestion does bring some benefit, the benefit >> >isn't enough to make up for the code churn and disruption it would >> >cause. But I encourage the OP to go through the standard library, pick a >> >couple of modules, and re-write them to see how they would look using >> >this proposal. > Python still has a rule that you can iterate over anything that has > __getitem__, and it'll be called with 0, 1, 2, 3... 
until it raises > IndexError. So you have two options: Remove that rule, and require > that all iterable objects actually define __iter__; or make strings > non-subscriptable, which means you need to do something like > "asdf".char_at(0) instead of "asdf"[0]. Isn't the rule that that __getitem__ iteration is available only if __iter__ is not explicitly defined? So there is a third option: retain __getitem__ but give this new modified string type an explicit __iter__ that raises TypeError. That said, I'm not sure I really support the overall proposal to change the behavior of strings. I agree that it is annoying that sometimes when you try to iterate over something you accidentally end up iterating over the characters of a string, but it's been that way for quite a while and changing it would be a significant behavior change. It seems like the main practical problem might be solved by just providing a standard library function iter_or_string or whatever, that just returns a one-item iterator if its argument is a string, or the normal iterator if not. It seems that gazillions of libraries already define such a function, and the main problem is just that, because there is no standard one, many people don't realize they need it until they accidentally iterate over a string and their code goes awry. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." 
--author unknown From ncoghlan at gmail.com Sun Aug 21 01:22:55 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Aug 2016 15:22:55 +1000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <20160821025247.GZ26300@ando.pearwood.info> Message-ID: On 21 August 2016 at 14:10, Chris Angelico wrote: > On Sun, Aug 21, 2016 at 12:52 PM, Steven D'Aprano wrote: >> I think that while the suggestion does bring some benefit, the benefit >> isn't enough to make up for the code churn and disruption it would >> cause. But I encourage the OP to go through the standard library, pick a >> couple of modules, and re-write them to see how they would look using >> this proposal. > > Python still has a rule that you can iterate over anything that has > __getitem__, and it'll be called with 0, 1, 2, 3... until it raises > IndexError. So you have two options: Remove that rule, and require > that all iterable objects actually define __iter__; or make strings > non-subscriptable, which means you need to do something like > "asdf".char_at(0) instead of "asdf"[0]. IMO the second option is a > total non-flyer - good luck convincing anyone that THAT is an > improvement. The first one is possible, but dramatically broadens the > backward-compatibility issue. You'd have to search for any class that > defines __getitem__ and not __iter__. That's not actually true - any type that defines __getitem__ can prevent iteration just by explicitly raising TypeError from __iter__. It would be *weird* to do so, but it's entirely possible. However, the real problem with this proposal (and the reason why the switch from 8-bit str to "bytes are effectively a tuple of ints" in Python 3 was such a pain), is that there are a lot of bytes and text processing operations that *really do* operate code point by code point. 
Scanning a path for directory separators, scanning a CSV (or other delimited format) for delimiters, processing regular expressions, tokenising according to a grammar, analysing words in a text for character popularity, answering questions like "Is this a valid identifier?" all involve looking at each character in a sequence individually, rather than looking at the character sequence as an atomic unit. The idiomatic pattern for doing that kind of "item by item" processing in Python is iteration (whether through the Python syntax and builtins, or through the CPython C API). Now, if we were designing a language from scratch today, there's a strong case to be made that the *right* way to represent text is to have a stream-like interface (e.g. StringIO, BytesIO) around an atomic type (e.g. CodePoint, int). But we're not designing a language from scratch - we're iterating on one with a 25 year history of design, development, and use. There may also be a case to be made for introducing an AtomicStr type into Python's data model that works like a normal string, but *doesn't* support indexing, slicing, or iteration, and is instead an opaque blob of data that nevertheless supports all the other usual string operations. (Similar to the way that types.MappingProxyType lets you provide a read-only view of an otherwise mutable mapping, and that collections.KeysView, ValuesView and ItemsView provide different interfaces for a common underlying mapping) But changing the core text type itself to no longer be suitable for use in text processing tasks? Not gonna happen :) Cheers, Nick. 
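As a small illustration (my examples, not Nick's) of how the tasks listed
above lean on treating text as a sequence of code points:

```python
# Sketch of the kind of code-point-by-code-point work described above.
# Each of these treats the string as an iterable/indexable sequence of
# characters; none of them would survive an opaque, non-iterable str.
from collections import Counter

path = "usr/local/bin"

# Scanning for directory separators inspects one character at a time.
separators = [i for i, ch in enumerate(path) if ch == "/"]
print(separators)  # [3, 9]

# Character popularity iterates over individual code points.
popularity = Counter("banana")
print(popularity.most_common(1))  # [('a', 3)]

# "Is this a valid identifier?"-style checks walk characters too
# (str.isidentifier() does the real job; this is the hand-rolled shape).
word = "spam_1"
ok = word[0].isalpha() and all(ch.isalnum() or ch == "_" for ch in word)
print(ok)  # True
```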
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Sun Aug 21 01:27:32 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 21 Aug 2016 15:27:32 +1000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: <57B936BF.1030008@brenbarn.net> References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <20160821025247.GZ26300@ando.pearwood.info> <57B936BF.1030008@brenbarn.net> Message-ID: On Sun, Aug 21, 2016 at 3:06 PM, Brendan Barnwell wrote: > On 2016-08-20 21:10, Chris Angelico wrote: >>> >>> >I think that while the suggestion does bring some benefit, the benefit >>> >isn't enough to make up for the code churn and disruption it would >>> >cause. But I encourage the OP to go through the standard library, pick a >>> >couple of modules, and re-write them to see how they would look using >>> >this proposal. >> >> Python still has a rule that you can iterate over anything that has >> __getitem__, and it'll be called with 0, 1, 2, 3... until it raises >> IndexError. So you have two options: Remove that rule, and require >> that all iterable objects actually define __iter__; or make strings >> non-subscriptable, which means you need to do something like >> "asdf".char_at(0) instead of "asdf"[0]. > > > Isn't the rule that that __getitem__ iteration is available only if > __iter__ is not explicitly defined? So there is a third option: retain > __getitem__ but give this new modified string type an explicit __iter__ that > raises TypeError. Hmm. It would somehow need to be recognized as "not iterable". I'm not sure how this detection is done; is it based on the presence/absence of __iter__, or is it by calling that method and seeing what comes back? If the latter, then sure, an __iter__ that raises would cover that. 
ChrisA From ncoghlan at gmail.com Sun Aug 21 01:32:44 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Aug 2016 15:32:44 +1000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <20160821025247.GZ26300@ando.pearwood.info> Message-ID: On 21 August 2016 at 15:22, Nick Coghlan wrote: > There may also be a case to be made for introducing an AtomicStr type > into Python's data model that works like a normal string, but > *doesn't* support indexing, slicing, or iteration, and is instead an > opaque blob of data that nevertheless supports all the other usual > string operations. (Similar to the way that types.MappingProxyType > lets you provide a read-only view of an otherwise mutable mapping, and > that collections.KeysView, ValuesView and ItemsView provide different > interfaces for a common underlying mapping) Huh, prompted by Brendan Barnwell's comment, I just realised that a discussion I was having with Graham Dumpleton at PyCon Australia about getting the wrapt module (or equivalent functionality) into Python 3.7 (not 3.6 just due to the respective timelines) is actually relevant here: given wrapt.ObjectProxy (see http://wrapt.readthedocs.io/en/latest/wrappers.html#object-proxy ) it shouldn't actually be that difficult to write an "atomic_proxy" implementation that wraps arbitrary container objects in a proxy that permits most operations, but actively *prevents* them from being treated as collections.abc.Container instances of any kind. So if folks are looking for a way to resolve the perennial problem of "How do I tell container processing algorithms to treat *this particular container* as an opaque blob?" that arise most often with strings and binary data, I'd highly recommend that as a potentially fruitful avenue to start exploring. Cheers, Nick. 
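The "treat this container as an opaque blob" problem is easy to reproduce: any generic flattening routine has to special-case strings, because iterating a string just yields more strings. A toy sketch (the function and its exact behaviour are illustrative, not from the thread):

```python
def flatten(items):
    """Recursively flatten nested iterables, treating strings as atoms."""
    for item in items:
        # Without this special case, a string like 'ab' would be broken
        # into 'a' and 'b', and iterating a 1-character string yields
        # that same string again - infinite recursion.
        if isinstance(item, (str, bytes)):
            yield item
        else:
            try:
                yield from flatten(item)
            except TypeError:  # not iterable: a scalar leaf
                yield item


print(list(flatten([1, [2, [3, 'ab']], 'cd'])))  # [1, 2, 3, 'ab', 'cd']
```

A scalar proxy as suggested above would let callers mark *any* container as atomic, instead of hard-coding str and bytes in every such routine.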
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From eryksun at gmail.com Sun Aug 21 02:02:47 2016 From: eryksun at gmail.com (eryk sun) Date: Sun, 21 Aug 2016 06:02:47 +0000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <20160821025247.GZ26300@ando.pearwood.info> <57B936BF.1030008@brenbarn.net> Message-ID: On Sun, Aug 21, 2016 at 5:27 AM, Chris Angelico wrote: > Hmm. It would somehow need to be recognized as "not iterable". I'm not > sure how this detection is done; is it based on the presence/absence > of __iter__, or is it by calling that method and seeing what comes > back? If the latter, then sure, an __iter__ that raises would cover > that. PyObject_GetIter calls __iter__ (i.e. tp_iter) if it's defined. To get a TypeError, __iter__ can return an object that's not an iterator, i.e. an object that doesn't have a __next__ method (i.e. tp_iternext). For example:

>>> class C:
...     def __iter__(self): return self
...     def __getitem__(self, index): return 42
...
>>> iter(C())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: iter() returned non-iterator of type 'C'

If __iter__ isn't defined but __getitem__ is defined, then PySeqIter_New is called to get a sequence iterator.

>>> class D:
...     def __getitem__(self, index): return 42
...
>>> it = iter(D())
>>> type(it)
<class 'iterator'>
>>> next(it)
42

From ncoghlan at gmail.com Sun Aug 21 02:16:48 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Aug 2016 16:16:48 +1000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <20160821025247.GZ26300@ando.pearwood.info> <57B936BF.1030008@brenbarn.net> Message-ID: On 21 August 2016 at 16:02, eryk sun wrote: > On Sun, Aug 21, 2016 at 5:27 AM, Chris Angelico wrote: >> Hmm. It would somehow need to be recognized as "not iterable". 
I'm not >> sure how this detection is done; is it based on the presence/absence >> of __iter__, or is it by calling that method and seeing what comes >> back? If the latter, then sure, an __iter__ that raises would cover >> that. > > PyObject_GetIter calls __iter__ (i.e. tp_iter) if it's defined. To get > a TypeError, __iter__ can return an object that's not an iterator, > i.e. an object that doesn't have a __next__ method (i.e. tp_iternext). I believe Chris's concern was that "isinstance(obj, collections.abc.Iterable)" would still return True. That's actually a valid concern, but Python 3.6 generalises the previously __hash__ specific "__hash__ = None" anti-registration mechanism to other protocols, including __iter__: https://hg.python.org/cpython/rev/72b9f195569c Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From michael.selik at gmail.com Sun Aug 21 02:34:56 2016 From: michael.selik at gmail.com (Michael Selik) Date: Sun, 21 Aug 2016 06:34:56 +0000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <20160821025247.GZ26300@ando.pearwood.info> <57B936BF.1030008@brenbarn.net> Message-ID: On Sun, Aug 21, 2016 at 1:27 AM Chris Angelico wrote: > Hmm. It would somehow need to be recognized as "not iterable". I'm not > sure how this detection is done; is it based on the presence/absence > of __iter__, or is it by calling that method and seeing what comes > back? If the latter, then sure, an __iter__ that raises would cover > that. > The detection of not hashable via __hash__ set to None was necessary, but not desirable. Better to have never defined the method/attribute in the first place. Since __iter__ isn't present on ``object``, we're free to use the better technique of not defining __iter__ rather than defining it as None, NotImplemented, etc. This is superior, because we don't want __iter__ to show up in a dir(), help(), or other tools. 
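As a concrete illustration of the fallback behaviour under discussion (a toy class, not from the thread): an object with only __getitem__ is iterable via the old index protocol, yet the Iterable ABC does not recognise it, because the ABC check only looks for __iter__:

```python
from collections.abc import Iterable


class Squares:
    # No __iter__ defined: iter() falls back to the legacy index
    # protocol, calling __getitem__ with 0, 1, 2, ... until the
    # method raises IndexError.
    def __getitem__(self, index):
        if index >= 5:
            raise IndexError(index)
        return index * index


print(list(Squares()))                  # [0, 1, 4, 9, 16]
print(isinstance(Squares(), Iterable))  # False
```

So the two notions of "iterable" already disagree for __getitem__-only classes, which is exactly why the anti-registration mechanism discussed here matters.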
-------------- next part -------------- An HTML attachment was scrubbed... URL: From eryksun at gmail.com Sun Aug 21 02:45:33 2016 From: eryksun at gmail.com (eryk sun) Date: Sun, 21 Aug 2016 06:45:33 +0000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <20160821025247.GZ26300@ando.pearwood.info> <57B936BF.1030008@brenbarn.net> Message-ID: On Sun, Aug 21, 2016 at 6:34 AM, Michael Selik wrote: > The detection of not hashable via __hash__ set to None was necessary, but > not desirable. Better to have never defined the method/attribute in the > first place. Since __iter__ isn't present on ``object``, we're free to use > the better technique of not defining __iter__ rather than defining it as > None, NotImplemented, etc. This is superior, because we don't want __iter__ > to show up in a dir(), help(), or other tools. The point is to be able to define __getitem__ without falling back on the sequence iterator. I wasn't aware of the recent commit that allows anti-registration of __iter__. This is perfect:

>>> class C:
...     __iter__ = None
...     def __getitem__(self, index): return 42
...
>>> iter(C())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'C' object is not iterable
>>> isinstance(C(), collections.abc.Iterable)
False

From michael.selik at gmail.com Sun Aug 21 02:53:35 2016 From: michael.selik at gmail.com (Michael Selik) Date: Sun, 21 Aug 2016 06:53:35 +0000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <20160821025247.GZ26300@ando.pearwood.info> <57B936BF.1030008@brenbarn.net> Message-ID: On Sun, Aug 21, 2016 at 2:46 AM eryk sun wrote: > On Sun, Aug 21, 2016 at 6:34 AM, Michael Selik > wrote: > > The detection of not hashable via __hash__ set to None was necessary, but > > not desirable. Better to have never defined the method/attribute in the > > first place. 
Since __iter__ isn't present on ``object``, we're free to > use > > the better technique of not defining __iter__ rather than defining it as > > None, NotImplemented, etc. This is superior, because we don't want > __iter__ > > to show up in a dir(), help(), or other tools. > > The point is to be able to define __getitem__ without falling back on > the sequence iterator. > > I wasn't aware of the recent commit that allows anti-registration of > __iter__. This is perfect:

> >>> class C:
> ...     __iter__ = None
> ...     def __getitem__(self, index): return 42
> ...
> >>> iter(C())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: 'C' object is not iterable
> >>> isinstance(C(), collections.abc.Iterable)
> False

For that to make sense, Iterable should be a parent of C, or C should be a subclass of something registered as an Iterable. Otherwise it'd be creating a general recommendation to say ``__iter__ = None`` on every non-Iterable class, which would be silly. -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.selik at gmail.com Sun Aug 21 02:56:11 2016 From: michael.selik at gmail.com (Michael Selik) Date: Sun, 21 Aug 2016 06:56:11 +0000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <20160821025247.GZ26300@ando.pearwood.info> <57B936BF.1030008@brenbarn.net> Message-ID: On Sun, Aug 21, 2016 at 2:53 AM Michael Selik wrote: > On Sun, Aug 21, 2016 at 2:46 AM eryk sun wrote: >> On Sun, Aug 21, 2016 at 6:34 AM, Michael Selik >> wrote: >> > The detection of not hashable via __hash__ set to None was necessary, >> but >> > not desirable. Better to have never defined the method/attribute in the >> > first place. Since __iter__ isn't present on ``object``, we're free to >> use >> > the better technique of not defining __iter__ rather than defining it as >> > None, NotImplemented, etc. 
This is superior, because we don't want >> __iter__ >> > to show up in a dir(), help(), or other tools. >> >> The point is to be able to define __getitem__ without falling back on >> the sequence iterator. >> >> I wasn't aware of the recent commit that allows anti-registration of >> __iter__. This is perfect:

>> >>> class C:
>> ...     __iter__ = None
>> ...     def __getitem__(self, index): return 42
>> ...
>> >>> iter(C())
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: 'C' object is not iterable
>> >>> isinstance(C(), collections.abc.Iterable)
>> False

> > For that to make sense, Iterable should be a parent of C, or C should be a > subclass of something registered as an Iterable. Otherwise it'd be creating > a general recommendation to say ``__iter__ = None`` on every non-Iterable > class, which would be silly. > I see your point for avoiding iterability when having __getitem__, but I hope that's seen as an anti-pattern that reduces flexibility. And I should learn to stop hitting the send button halfway through my email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Sun Aug 21 03:02:02 2016 From: random832 at fastmail.com (Random832) Date: Sun, 21 Aug 2016 03:02:02 -0400 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <20160821025247.GZ26300@ando.pearwood.info> <57B936BF.1030008@brenbarn.net> Message-ID: <1471762922.3059356.701402697.576E9016@webmail.messagingengine.com> On Sun, Aug 21, 2016, at 02:53, Michael Selik wrote: > For that to make sense, Iterable should be a parent of C, or C should > be a subclass of something registered as an Iterable. Otherwise it'd > be creating a general recommendation to say ``__iter__ = None`` on > every non-Iterable class, which would be silly. 
Er, we're talking about defining it on every non-iterable class *that defines __getitem__* (such as str in this thought experiment) From eryksun at gmail.com Sun Aug 21 03:14:09 2016 From: eryksun at gmail.com (eryk sun) Date: Sun, 21 Aug 2016 07:14:09 +0000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <20160821025247.GZ26300@ando.pearwood.info> <57B936BF.1030008@brenbarn.net> Message-ID: On Sun, Aug 21, 2016 at 6:53 AM, Michael Selik wrote: > For that to make sense, Iterable should be a parent of C, or C should be a > subclass of something registered as an Iterable. Otherwise it'd be creating > a general recommendation to say ``__iter__ = None`` on every non-Iterable > class, which would be silly. Iterable is a one-trick pony ABC that formerly just checked for an __iter__ method using any("__iter__" in B.__dict__ for B in C.__mro__). It was mentioned that the default __getitem__ iterator can be avoided by defining __iter__ as a callable that either directly or indirectly raises a TypeError, but that's an instance of Iterable, which is misleading. In 3.6 you can instead set `__iter__ = None`. At the low-level, slot_tp_iter has been updated to look for this with the following code:

    func = lookup_method(self, &PyId___iter__);
    if (func == Py_None) {
        Py_DECREF(func);
        PyErr_Format(PyExc_TypeError,
                     "'%.200s' object is not iterable",
                     Py_TYPE(self)->tp_name);
        return NULL;
    }

At the high level, Iterable.__subclasshook__ calls _check_methods(C, "__iter__"):

    def _check_methods(C, *methods):
        mro = C.__mro__
        for method in methods:
            for B in mro:
                if method in B.__dict__:
                    if B.__dict__[method] is None:
                        return NotImplemented
                    break
            else:
                return NotImplemented
        return True

From leewangzhong+python at gmail.com Sun Aug 21 03:33:23 2016 From: leewangzhong+python at gmail.com (Franklin? 
Lee) Date: Sun, 21 Aug 2016 03:33:23 -0400 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <20160821025247.GZ26300@ando.pearwood.info> Message-ID: On Aug 21, 2016 1:23 AM, "Nick Coghlan" wrote: > Now, if we were designing a language from scratch today, there's a > strong case to be made that the *right* way to represent text is to > have a stream-like interface (e.g. StringIO, BytesIO) around an atomic > type (e.g. CodePoint, int). But we're not designing a language from > scratch - we're iterating on one with a 25 year history of design, > development, and use. > > There may also be a case to be made for introducing an AtomicStr type > into Python's data model that works like a normal string, but > *doesn't* support indexing, slicing, or iteration, and is instead an > opaque blob of data that nevertheless supports all the other usual > string operations. (Similar to the way that types.MappingProxyType > lets you provide a read-only view of an otherwise mutable mapping, and > that collections.KeysView, ValuesView and ItemsView provide different > interfaces for a common underlying mapping) > > But changing the core text type itself to no longer be suitable for > use in text processing tasks? Not gonna happen :) Thought: A string, in compsci, is, sort of by definition, a sequence of characters. It is short for "string of characters", isn't it? If you were to create a new language, and you don't want to think of strings as char sequences, you might have a type called Text instead. Programmers could be required to call functions to get iterables, such as myText.chars(), myText.lines(), and even myText.words(). Thus, the proposal makes str try to be a Text type rather than the related but distinct String type. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From leewangzhong+python at gmail.com Sun Aug 21 03:51:19 2016 From: leewangzhong+python at gmail.com (Franklin? Lee) Date: Sun, 21 Aug 2016 03:51:19 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: On Aug 20, 2016 1:32 PM, "Eric V. Smith" wrote: > > On 8/19/2016 2:57 PM, Guido van Rossum wrote: >> >> I don't think we should take action now. >> >> Would it make sense, as a precaution, to declare the PEP provisional for >> one release? Then we can develop a sense of whether the current approach >> causes real problems. >> >> We could also emit some kind of warning if the expression part contains >> an escaped quote, since that's where a potential change would cause >> breakage. (Or we could leave that to the linters.) > > > If anything, I'd make it an error to have any backslashes inside the brackets of an f-string for 3.6. We could always remove this restriction at a later date. > > I say this because as soon as f-strings make it into the wild, we're going to have a hard time breaking code in say 3.7 by saying "well, we told you that f-strings might change". > > Although frankly, other than by executive fiat (which I'm okay with), I don't see us ever resolving the issue if f-strings are strings first, or if the brackets put you into "non-string mode". There are good arguments on both sides. > > Moving to the implementation details, I'm not sure how easy it would be to even find backslashes, though. IIRC, backslashes are replaced early, before the f-string parser really gets to process the string. It might require a new implementation of the f-string parser independent of regular strings, which I likely would not have time for before beta 1. Although since this would be a reduction in functionality, maybe it doesn't have to get done by then. > > I also haven't thought of how this would affect raw f-strings. 
> > In any event, I'll take a look at adding this restriction, just to get an estimate of the magnitude of work involved. The easiest thing to do might be to disallow backslashes in any part of an f-string for 3.6, although that seems to be going too far. > > > Eric. Speaking of which, how is this parsed? f"{'\n'}" If escape-handling is done first, the expression is a string literal holding an actual newline character (normally illegal), rather than an escape sequence which resolves to a newline character. If that one somehow works, how about this? f"{r'\n'}" I guess you'd have to write one of these: f"{'\\n'}" f"{'''\n''')" rf"{'\n'}" -------------- next part -------------- An HTML attachment was scrubbed... URL: From tritium-list at sdamon.com Sun Aug 21 03:55:37 2016 From: tritium-list at sdamon.com (tritium-list at sdamon.com) Date: Sun, 21 Aug 2016 03:55:37 -0400 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> Message-ID: <0c9c01d1fb81$67fdbc10$37f93430$@hotmail.com> That would require strings to also not be sequences, or to totally drop the sequence protocol. These are non-starters. They *will not* happen. Not they shouldn?t happen, or they probably won?t happen. They cannot and will not happen. That is a much bigger break than they were even willing to make between 2 and 3. From: Python-ideas [mailto:python-ideas-bounces+tritium-list=sdamon.com at python.org] On Behalf Of Alexander Heger Sent: Saturday, August 20, 2016 4:52 PM To: Chris Angelico Cc: python-ideas Subject: Re: [Python-ideas] discontinue iterable strings I was not proposing a character type, only that strings are not iterable: for i in 'abc': print(i) TypeError: 'str' object is not iterable -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rosuav at gmail.com Sun Aug 21 03:56:59 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 21 Aug 2016 17:56:59 +1000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: On Sun, Aug 21, 2016 at 5:51 PM, Franklin? Lee wrote: > Speaking of which, how is this parsed? > f"{'\n'}" > If escape-handling is done first, the expression is a string literal holding > an actual newline character (normally illegal), rather than an escape > sequence which resolves to a newline character. It's illegal. > If that one somehow works, how about this? > f"{r'\n'}" Also illegal. > I guess you'd have to write one of these:
> f"{'\\n'}"
> f"{'''\n''')"
> rf"{'\n'}"

Modulo the typo in the second one, these all result in the same code:

>>> dis.dis(lambda: f"{'\\n'}")
  1           0 LOAD_CONST               1 ('\n')
              2 FORMAT_VALUE             0
              4 RETURN_VALUE
>>> f"{'\\n'}"
'\n'

ChrisA From tritium-list at sdamon.com Sun Aug 21 04:08:47 2016 From: tritium-list at sdamon.com (tritium-list at sdamon.com) Date: Sun, 21 Aug 2016 04:08:47 -0400 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> Message-ID: <0ca101d1fb83$3ea5dd00$bbf19700$@hotmail.com> From: Python-ideas [mailto:python-ideas-bounces+tritium-list=sdamon.com at python.org] On Behalf Of ????? Sent: Saturday, August 20, 2016 5:56 PM To: python-ideas Subject: Re: [Python-ideas] discontinue iterable strings On Sun, Aug 21, 2016 at 12:28 AM Alexander Heger wrote: Did I leave anything out? How would you weigh the benefits against the problems? How would you manage the upgrade path for code that's been broken? First one needs to add the extension string attributes like split()/split(''), chars(), and substring[] (Python 3.7). When indexing becomes disallowed (Python 3.10 / 4.0) attempts to iterate (or slice) will raise TypeError. 
The fixes overall will be a lot easier and obvious than introduction of unicode as default string type in Python 3.0. It could already be used/tested starting with Python 3.7 using `from future import __monolythic_strings__`. Is there any equivalent __future__ import with such deep semantic implications? Most imports I can think of are mainly syntactic. And what would it do? change the type of string literals? change the behavior of str methods locally in this module? globally? How will this play with 3rd party libraries? Sounds like it will break stuff in a way that cannot be locally fixed. ~Elazar from __future__ import unicode_literals outright changes the type of object string literals make (in python 2). If you were to create a non-iterable, non-sequence text type (a horrible idea, IMO) the same thing can be done for that. From srkunze at mail.de Sun Aug 21 05:42:51 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Sun, 21 Aug 2016 11:42:51 +0200 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: <20160821025247.GZ26300@ando.pearwood.info> References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <20160821025247.GZ26300@ando.pearwood.info> Message-ID: <52ad93d1-1394-7099-b025-e7acccdc73f2@mail.de> On 21.08.2016 04:52, Steven D'Aprano wrote: > Saying that these so-called "fixes" (we haven't established yet that > Python's string behaviour is a bug that need fixing) will be easier and > more obvious than the change to Unicode is not that bold a claim. Agreed. Especially those "we need to distinguish between char and string" calls are somewhat irritating. I need to work with such languages at work sometimes and honestly: it sucks (but that may just be me). Furthermore, I don't see much benefit at all. First, the initial run and/or the first test will reveal the wrong behavior. Second, it just makes sense if people use a generic variable (say 'var') for different types of objects. But, well, people shouldn't do that in the first place. 
Third, it would make iterating over a string more cumbersome. Especially the last point makes me -1 on this proposal. My 2 cents, Sven From rosuav at gmail.com Sun Aug 21 06:13:03 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 21 Aug 2016 20:13:03 +1000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: <0ca101d1fb83$3ea5dd00$bbf19700$@hotmail.com> References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <0ca101d1fb83$3ea5dd00$bbf19700$@hotmail.com> Message-ID: On Sun, Aug 21, 2016 at 6:08 PM, wrote: > > from __future__ import unicode_literals outright changes the type of object string literals make (in python 2). If you were to create a non-iterable, non-sequence text type (a horrible idea, IMO) the same thing can be done for that. > It could; but that just changes what *literals* make. But what about other sources of strings - str()? bytes.decode()? format()? repr()? Which ones get changed, and which don't? There's no easy way to do this. ChrisA From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Aug 22 05:47:16 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Mon, 22 Aug 2016 18:47:16 +0900 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <20160821025247.GZ26300@ando.pearwood.info> Message-ID: <22458.51748.683203.658805@turnbull.sk.tsukuba.ac.jp> Nick Coghlan writes: > However, the real problem with this proposal (and the reason why the > switch from 8-bit str to "bytes are effectively a tuple of ints" in > Python 3 was such a pain), is that there are a lot of bytes and text > processing operations that *really do* operate code point by code > point. Sure, but code points aren't strings in any language I use except Python. And AFAIK strings are the only case in Python where a singleton *is* an element, and an element *is* a singleton. (Except it isn't: "ord('ab')" is a TypeError, even though "type('a')" returns "<class 'str'>".
) I thought this was cute when I first encountered it (it happens that I was studying how you can embed a set of elements into the semigroup of sequences of such elements in algebra at the time), but it has *never* been of practical use to me that indexing or iterating a str returns str (rather than a code point). "''.join(list('abc'))" being an identity is an interesting, and maybe useful, fact, but I've never missed it in languages that distinguish characters from strings. Perhaps that's because they generally have a split function defined so that "''.join('abc'.split(''))" is also available for that identity. (N.B. Python doesn't accept an empty separator, but Emacs Lisp does, where "'abc'.split('')" returns "['', 'a', 'b', 'c', '']". I guess it's too late to make this change, though.) The reason that switching to bytes is a pain is that we changed the return type of indexing bytes to something requiring conversion of literals. You can't write "bytething[i] == b'a'", you need to write "bytething[i] == ord(b'a')", and "b''.join(list(b'abc')) is an error, not an identity. Of course the world broke! > But we're not designing a language from scratch - we're iterating > on one with a 25 year history of design, development, and use. +1 to that. From ncoghlan at gmail.com Mon Aug 22 08:00:35 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 22 Aug 2016 22:00:35 +1000 Subject: [Python-ideas] discontinue iterable strings In-Reply-To: <22458.51748.683203.658805@turnbull.sk.tsukuba.ac.jp> References: <1E35BCAA-7FBB-4169-9F3D-8039971EF2A3@gmail.com> <20160821025247.GZ26300@ando.pearwood.info> <22458.51748.683203.658805@turnbull.sk.tsukuba.ac.jp> Message-ID: On 22 August 2016 at 19:47, Stephen J. 
Turnbull wrote: > Nick Coghlan writes: > > > However, the real problem with this proposal (and the reason why the > > switch from 8-bit str to "bytes are effectively a tuple of ints" in > > Python 3 was such a pain), is that there are a lot of bytes and text > > processing operations that *really do* operate code point by code > > point. > > Sure, but code points aren't strings in any language I use except > Python. And AFAIK strings are the only case in Python where a > singleton *is* an element, and an element *is* a singleton. Sure, but the main concern at hand ("list(strobj)" giving a broken out list of individual code points rather than TypeError) isn't actually related to the fact those individual items are themselves length-1 strings, it's related to the fact that Python normally considers strings to be a sequence type rather than a scalar value type. str is far from the only builtin container type that NumPy gives the scalar treatment when sticking it into an array:

>>> np.array("abc")
array('abc', dtype='<U3')
>>> np.array(b"abc")
array(b'abc', dtype='|S3')
>>> np.array({1, 2, 3})
array({1, 2, 3}, dtype=object)
>>> np.array({1:1, 2:2, 3:3})
array({1: 1, 2: 2, 3: 3}, dtype=object)

(Interestingly, both bytearray and memoryview get interpreted as "uint8" arrays, unlike the bytes literal - presumably the latter discrepancy is a requirement for compatibility with NumPy's str/unicode handling in Python 2) That's why I suggested that a scalar proxy based on wrapt.ObjectProxy that masked all container related protocols could be an interesting future addition to the standard library (especially if it has been battle-tested on PyPI first). 
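A minimal sketch of that scalar-proxy idea, without the wrapt dependency (the class name is invented, and wrapt.ObjectProxy would forward far more of the data model than this toy does). It leans on the Python 3.6+ convention of setting __iter__ = None to block iteration:

```python
class ScalarProxy:
    """Toy 'treat this container as an opaque scalar' wrapper."""

    # Anti-registration (Python 3.6+): iter() raises TypeError and
    # collections.abc.Iterable reports False for this class.
    __iter__ = None

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Everything else is forwarded to the wrapped object.
        return getattr(self._wrapped, name)


text = ScalarProxy('hello world')
print(text.upper())  # HELLO WORLD
# list(text) raises TypeError: 'ScalarProxy' object is not iterable
```

The same wrapper works for lists, dicts, or any other container a caller wants container-processing code to leave alone.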
"I want to take this container instance, and make it behave like it wasn't a container, even if other code tries to use it as a container" is usually what people are after when they find str iteration inconvenient, but "treat this container as a scalar value, but otherwise expose all of its methods" is an operation with applications beyond strings. Not-so-coincidentally, that approach would also give us a de facto "code point" type: it would be the result of applying the scalar proxy to a length 1 str instance. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From flying-sheep at web.de Tue Aug 23 04:30:20 2016 From: flying-sheep at web.de (Philipp A.) Date: Tue, 23 Aug 2016 08:30:20 +0000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: Sorry for replying so late, I had an email issue. First two important things: 1. mental model and intuition and 2. precedence. About how to think of them: I'm strongly of the opinion that the mental models of either an alternating sequence of strings and formatted expressions, or a string with "holes" for expressions, are better than "a string where parts are magically evaluated after it's created". That smells like "eval" instead of the usual separation of data and code. That's the same reason I dislike calling them "f-strings": they're undoubtedly **expressions evaluating to strings**, not string literals. Precedence exists in ruby's and CoffeeScript's string interpolation, Bash's $(), JavaScript's template literals, and many more: https://en.wikipedia.org/wiki/String_interpolation !!! *All* of them that support arbitrary code (and that I ever encountered) work the way I propose for python to work. @Brett Cannon: Pragmatism is only good as long as it only compromises on elegance, not usability. I think "partly evaluable f-strings" are harder to use than "f-literals with string and expression parts".
@Chris Angelico: All those things being illegal is surprising, which is another argument in favor of my proposal. @Guido van Rossum: I'd rather not like them to be preliminarily in the language in this form, considering Python's track record of not changing preliminary things anymore... but: @Eric V. Smith: Great idea with banning all backslashes for now. This is so close to release, so we could ban escape sequences and use all of the existing code, then write a new RFC to make sure things are optimal (which in my eyes means the holes/alternating sequence model instead of the thing we have now). Thank you all for your contributions to the discussion and again sorry for messing up and only now posting this correctly. Best, Philipp Chris Angelico wrote on Sun, 21 Aug 2016 at 09:57: > On Sun, Aug 21, 2016 at 5:51 PM, Franklin? Lee > wrote: > > Speaking of which, how is this parsed? > > f"{'\n'}" > > If escape-handling is done first, the expression is a string literal > holding > > an actual newline character (normally illegal), rather than an escape > > sequence which resolves to a newline character. > > It's illegal. > > > If that one somehow works, how about this? > > f"{r'\n'}" > > Also illegal. > > > I guess you'd have to write one of these: > > f"{'\\n'}" > > f"{'''\n''')" > > rf"{'\n'}" > > Modulo the typo in the second one, these all result in the same code: > > >>> dis.dis(lambda: f"{'\\n'}") > 1 0 LOAD_CONST 1 ('\n') > 2 FORMAT_VALUE 0 > 4 RETURN_VALUE > >>> f"{'\\n'}" > '\n' > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Tue Aug 23 08:18:18 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 23 Aug 2016 22:18:18 +1000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: On 21 August 2016 at 03:32, Eric V. Smith wrote: > If anything, I'd make it an error to have any backslashes inside the > brackets of an f-string for 3.6. We could always remove this restriction at > a later date. +1 for this if you can find a way to do it - it eliminates the problematic cases where the order of evaluation makes a difference, and ensures the parts within the braces can be reliably processed as normal Python code. > In any event, I'll take a look at adding this restriction, just to get an > estimate of the magnitude of work involved. The easiest thing to do might be > to disallow backslashes in any part of an f-string for 3.6, although that > seems to be going too far. Disallowing \t, \n, etc even in the plain text parts of the f-string would indeed be problematic. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From yselivanov.ml at gmail.com Wed Aug 24 15:14:25 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 24 Aug 2016 15:14:25 -0400 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: <7f7e3bb3-2ea1-53d1-0c05-78fe3c14e861@googlemail.com> References: <7f7e3bb3-2ea1-53d1-0c05-78fe3c14e861@googlemail.com> Message-ID: <2372bea6-661e-484a-9967-c83216fb2fa8@gmail.com> On 2016-08-16 12:46 PM, Moritz Sichert via Python-ideas wrote: >>>> 2. It's extremely unlikely that somebody will design a system that >>>> switches coroutine runners *while async/awaiting a coroutine*. >>> Yes, I guess so. 
>>> > >>> > >>>> But even in this unlikely use case, you can >>>> easily stack finalizers following this pattern: >>>> >>>> old_finalizer = sys.get_asyncgen_finalizer() >>>> sys.set_asyncgen_finalizer(my_finalizer) >>>> try: >>>> # do my thing >>>> finally: >>>> sys.set_asyncgen_finalizer(old_finalizer) >>> That only works for synchronous code, though, because if this is done in a >>> coroutine, it might get suspended within the try block and would leak its >>> own finalizer into the outer world. >> set_asyncgen_finalizer is designed to be used *only* by coroutine >> runners. This is a low-level API that coroutines should never >> touch. (At least my experience working with coroutines says so...) > First of all, thanks for your work in this PEP! I think it really completes the > async Python to a state where most synchronous code can be changed easily to be > asynchronous. Thank you! [..] > Now my questions: > - Is it correct to check if the async iterator is actually a generator and only > in that case do the whole get/set_asyncgen_finalizer() thing? As I understand, > the finalizer is not needed for "normal" async iterators, i.e. instances of > classes with a __anext__() method. set_asyncgen_finalizer is only going to be used for native AGs. Asynchronous iterator objects implemented in pure Python can store a reference to the running loop and implement __del__ to do any kind of finalization. > - Would it make sense to call sys.set_asyncgen_finalizer(old_finalizer) after > the first call of async_iter.__anext__() instead of only at the end? As I > understand the finalizer is set when the generator is started. > - Is loop.run_until_complete(gen.aclose()) a sensible finalizer? If so, is there > even any other finalizer that would make sense? Maybe creating a task for > gen.aclose() instead of waiting for it to be completed? Please read a new thread on python-dev. I think it answers your questions (in part) and asks a few more! 
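The __del__-based finalization described above for pure-Python async iterators might look roughly like this. It is a sketch with an invented class and resource; real cleanup code would need to be more careful about loops that have already closed:

```python
import asyncio

class Ticker:
    """A plain async iterator (not an async generator), finalized via __del__.

    Sketch only: remember the running loop at construction time and use it
    for cleanup, instead of relying on the sys.set_asyncgen_finalizer
    machinery (which only covers native async generators).
    """
    def __init__(self, limit=3):
        self._loop = asyncio.get_running_loop()  # keep a loop reference
        self._open = True                        # stand-in for a real resource
        self._count = 0
        self._limit = limit

    def __aiter__(self):
        return self

    async def __anext__(self):
        if not self._open or self._count >= self._limit:
            raise StopAsyncIteration
        self._count += 1
        return self._count

    def close(self):
        self._open = False

    def __del__(self):
        # If the iterator is garbage-collected while still "open",
        # schedule cleanup on the loop it was created under.
        if self._open and self._loop.is_running():
            self._loop.call_soon_threadsafe(self.close)

async def main():
    print([tick async for tick in Ticker()])  # [1, 2, 3]

asyncio.run(main())
```

Because the object carries its own loop reference and __del__, it needs none of the finalizer hand-off that native async generators require.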
Thank you, Yury From nicksjacobson at yahoo.com Wed Aug 24 22:29:05 2016 From: nicksjacobson at yahoo.com (Nick Jacobson) Date: Thu, 25 Aug 2016 02:29:05 +0000 (UTC) Subject: [Python-ideas] Adding optional parameter to shutil.rmtree to not delete root. References: <1733535507.1841020.1472092145046.JavaMail.yahoo.ref@mail.yahoo.com> Message-ID: <1733535507.1841020.1472092145046.JavaMail.yahoo@mail.yahoo.com> I've been finding that a common scenario is where I want to remove everything in a directory, but leave the (empty) root directory behind, not removing it. So for example, if I have a directory C:\foo and it contains subdirectory C:\foo\bar and file C:\foo\myfile.txt, and I want to remove the subdirectory (and everything in it) and file, leaving only C:\foo behind. (This is useful e.g. when the root directory has special permissions, so it wouldn't be so simple to remove it and recreate it again.) A number of ways to do this have been offered here: http://stackoverflow.com/questions/185936/delete-folder-contents-in-python But it'd be simpler if there were an optional parameter added to shutil.rmtree, called removeroot. It would default to true so as to not break any backward compatibility. If it's set to false, then it leaves the root directory in place. Thanks, Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From python-ideas at shalmirane.com Thu Aug 25 00:28:03 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Wed, 24 Aug 2016 21:28:03 -0700 Subject: [Python-ideas] SI scale factors in Python Message-ID: <20160825042803.GA2055@kundert.designers-guide.com> All, I propose that support for SI scale factors be added to Python. This would be very helpful for any program that heavily uses real numbers, such as those involved with scientific and engineering computation. There would be two primary changes. 
First, the lexer would be enhanced to take real literals with the following forms: c1 = 1nF (same as: c1 = 1e-9 # F ) c = 299.79M (same as: c = 299.79e6 ) f_hy = 1.4204GHz (same as: f_hy = 1.4204e9 # Hz) Basically a scale factor and units may follow a number, both of which are optional, but if the units are given the scale factor must also be given. Any units given could be kept with the number and would be accessible through an attribute or method call, or if it is felt that the cost of storing the units is too high, it may simply be discarded, in which case it is simply serving as documentation. The second change would be to the various real-to-string conversions available in Python. New formatting options would be provided to support SI scale factors. For example, print('Hydrogen line frequency: %q' % f_hy) Hydrogen line frequency: 1.4204GHz If the units are retained with the numbers, then %q (quantity) could be used to print numbers with SI scale factors and units, and %r (real) could be used to print the number with SI scale factors but without the units. A small package that fleshes out these ideas is available from https://github.com/KenKundert/engfmt It used to be that SI scale factors were only used by scientists and engineers, but over the last 20 years their popularity has increased and so now they are used everywhere. It is time for our programming languages to catch up. I find it a little shocking that no programming languages offer this feature yet, but engineering applications such as SPICE and Verilog have supported SI scale factors for a very long time. 
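The literal forms above can be mimicked today with a small helper function. This is a sketch, not the engfmt package's API: parse_si, its grammar, and the _SCALE table are invented here purely to illustrate what the proposed lexer change would accept:

```python
import re

# The common SI scale factors mapped to multipliers.
_SCALE = {'T': 1e12, 'G': 1e9, 'M': 1e6, 'k': 1e3,
          'm': 1e-3, 'u': 1e-6, 'n': 1e-9, 'p': 1e-12,
          'f': 1e-15, 'a': 1e-18}

def parse_si(text):
    """Parse '1.4204GHz' into (1.4204e9, 'Hz').  Sketch only."""
    match = re.fullmatch(r'([-+]?\d+\.?\d*)([TGMkmunpfa]?)([A-Za-z]*)', text)
    if not match:
        raise ValueError('not an SI quantity: %r' % text)
    number, scale, units = match.groups()
    return float(number) * _SCALE.get(scale, 1.0), units

print(parse_si('1nF'))        # (1e-09, 'F')
print(parse_si('299.79M'))    # a bare scale factor with no units also parses
```

Note that even this toy grammar hits the ambiguity discussed later in the thread: '1m' parses as milli, with no way to spell one unscaled metre.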
-Ken From random832 at fastmail.com Thu Aug 25 00:34:48 2016 From: random832 at fastmail.com (Random832) Date: Thu, 25 Aug 2016 00:34:48 -0400 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160825042803.GA2055@kundert.designers-guide.com> References: <20160825042803.GA2055@kundert.designers-guide.com> Message-ID: <1472099688.3585218.705492153.25E70216@webmail.messagingengine.com> On Thu, Aug 25, 2016, at 00:28, Ken Kundert wrote: > Basically a scale factor and units may follow a number, both of which are > optional, but if the units are given the scale factor must also be given. So you can have 1000mm or 0.001km but not 1m? From ian.g.kelly at gmail.com Thu Aug 25 00:37:21 2016 From: ian.g.kelly at gmail.com (Ian Kelly) Date: Wed, 24 Aug 2016 22:37:21 -0600 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <1472099688.3585218.705492153.25E70216@webmail.messagingengine.com> References: <20160825042803.GA2055@kundert.designers-guide.com> <1472099688.3585218.705492153.25E70216@webmail.messagingengine.com> Message-ID: On Wed, Aug 24, 2016 at 10:34 PM, Random832 wrote: > On Thu, Aug 25, 2016, at 00:28, Ken Kundert wrote: >> Basically a scale factor and units may follow a number, both of which are >> optional, but if the units are given the scale factor must also be given. > > So you can have 1000mm or 0.001km but not 1m? Sort of makes sense. Should 1m be interpreted as 1 meter or 0.001 (unitless)? From rosuav at gmail.com Thu Aug 25 00:47:42 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 25 Aug 2016 14:47:42 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160825042803.GA2055@kundert.designers-guide.com> References: <20160825042803.GA2055@kundert.designers-guide.com> Message-ID: On Thu, Aug 25, 2016 at 2:28 PM, Ken Kundert wrote: > I propose that support for SI scale factors be added to Python. 
This would > be very helpful for any program that heavily uses real numbers, such as those > involved with scientific and engineering computation. There would be two primary > changes. First, the lexer would be enhanced to take real literals with the > following forms: > > c1 = 1nF (same as: c1 = 1e-9 # F ) > c = 299.79M (same as: c = 299.79e6 ) > f_hy = 1.4204GHz (same as: f_hy = 1.4204e9 # Hz) > > Basically a scale factor and units may follow a number, both of which are > optional, but if the units are given the scale factor must also be given. Any > units given could be kept with the number and would be accessible through an > attribute or method call, or if it is felt that the cost of storing the units > are too high, it may simply be discarded, in which case it is simply serving as > documentation. If units are retained, what you have is no longer a simple number, but a value with a unit, and is a quite different beast. (For instance, addition would have to cope with unit mismatches (probably by throwing an error), and multiplication would have to combine the units (length * length = area).) That would be a huge new feature. I'd be inclined to require, for simplicity, that the scale factor and the unit be separated with a hash: c1 = 1n#F c = 299.79M f_hy = 1.4204G#Hz It reads almost as well as "GHz" does, but is clearly non-semantic. The resulting values would simply be floats, and the actual tag would be discarded - there'd be no difference between 1.4204G and 1420.4M, and the %q formatting code would render them the same way. Question, though: What happens with exa-? Currently, if the parser sees "1E", it'll expect to see another number, eg 1E+1 == 10.0. Will this double meaning cause confusion? ChrisA From eryksun at gmail.com Thu Aug 25 00:48:49 2016 From: eryksun at gmail.com (eryk sun) Date: Thu, 25 Aug 2016 04:48:49 +0000 Subject: [Python-ideas] Adding optional parameter to shutil.rmtree to not delete root. 
In-Reply-To: <1733535507.1841020.1472092145046.JavaMail.yahoo@mail.yahoo.com> References: <1733535507.1841020.1472092145046.JavaMail.yahoo.ref@mail.yahoo.com> <1733535507.1841020.1472092145046.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Thu, Aug 25, 2016 at 2:29 AM, Nick Jacobson via Python-ideas wrote: > I've been finding that a common scenario is where I want to remove > everything in a directory, but leave the (empty) root directory behind, not > removing it. > > So for example, if I have a directory C:\foo and it contains subdirectory > C:\foo\bar and file C:\foo\myfile.txt, and I want to remove the subdirectory > (and everything in it) and file, leaving only C:\foo behind. > > (This is useful e.g. when the root directory has special permissions, so it > wouldn't be so simple to remove it and recreate it again.) Here's a Windows workaround that clears the delete disposition after rmtree 'deletes' the directory. A Windows file or directory absolutely cannot be unlinked while there are handle or kernel references to it, and a handle with DELETE access can set and unset the delete disposition. This used to require the system call NtSetInformationFile, but Vista added SetFileInformationByHandle to the Windows API. import contextlib import ctypes import _winapi kernel32 = ctypes.WinDLL('kernel32', use_last_error=True) kernel32.SetFileInformationByHandle # Vista minimum (NT 6.0+) DELETE = 0x00010000 SHARE_ALL = 7 OPEN_EXISTING = 3 BACKUP = 0x02000000 FileDispositionInfo = 4 @contextlib.contextmanager def protect_file(path): hFile = _winapi.CreateFile(path, DELETE, SHARE_ALL, 0, OPEN_EXISTING, BACKUP, 0) try: yield if not kernel32.SetFileInformationByHandle( hFile, FileDispositionInfo, (ctypes.c_ulong * 1)(0), 4): raise ctypes.WinError(ctypes.get_last_error()) finally: kernel32.CloseHandle(hFile) For example: >>> os.listdir('test') ['dir1', 'dir2', 'file'] >>> with protect_file('test'): ... shutil.rmtree('test') ... 
>>> os.listdir('test') [] Another example: >>> open('file', 'w').close() >>> with protect_file('file'): ... os.remove('file') ... >>> os.path.exists('file') True From greg.ewing at canterbury.ac.nz Thu Aug 25 01:47:37 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Aug 2016 17:47:37 +1200 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: References: <20160825042803.GA2055@kundert.designers-guide.com> <1472099688.3585218.705492153.25E70216@webmail.messagingengine.com> Message-ID: <57BE8679.1060506@canterbury.ac.nz> Ian Kelly wrote: > Should 1m be interpreted as 1 meter or 0.001 (unitless)? I've never seen anyone use a scale factor prefix on its own with a dimensionless number. Sometimes informally the unit is omitted when it can be inferred from context (e.g. "1k" written next to a resistor symbol obviously means "1 kilohm"). But without that context it's ambiguous, so I don't think it should be allowed in program code. -- Greg From greg.ewing at canterbury.ac.nz Thu Aug 25 01:57:15 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Aug 2016 17:57:15 +1200 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: References: <20160825042803.GA2055@kundert.designers-guide.com> Message-ID: <57BE88BB.3030201@canterbury.ac.nz> Chris Angelico wrote: > If units are retained, what you have is no longer a simple number, but > a value with a unit, and is a quite different beast. (For instance, > addition would have to cope with unit mismatches (probably by throwing > an error), and multiplication would have to combine the units (length > * length = area).) And that can be surprisingly tricky. For example, newtons times metres equals joules -- but *only* if the force and the distance are in the same direction, otherwise it's torque rather than energy and the units are just newton-metres. 
-- Greg From rosuav at gmail.com Thu Aug 25 02:03:07 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 25 Aug 2016 16:03:07 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <57BE88BB.3030201@canterbury.ac.nz> References: <20160825042803.GA2055@kundert.designers-guide.com> <57BE88BB.3030201@canterbury.ac.nz> Message-ID: On Thu, Aug 25, 2016 at 3:57 PM, Greg Ewing wrote: > Chris Angelico wrote: >> >> If units are retained, what you have is no longer a simple number, but >> a value with a unit, and is a quite different beast. (For instance, >> addition would have to cope with unit mismatches (probably by throwing >> an error), and multiplication would have to combine the units (length >> * length = area).) > > > And that can be surprisingly tricky. For example, newtons > times metres equals joules -- but *only* if the force and > the distance are in the same direction, otherwise it's > torque rather than energy and the units are just > newton-metres. > Yeah. And a full-on unit-aware numeric system doesn't belong in the core language IMO. It belongs on PyPI, with an API like: length = N("100m") width = N("50m") area = length * width depth = N('2"') # inches volume = area * depth time = N("5 hours") flow_rate = volume/time print("Rain flowed through the pipe at", flow_rate) No core language changes needed for that. And since, in most cases, the values will come from user input anyway, a literal syntax won't be as important. ChrisA From xavier.combelle at gmail.com Thu Aug 25 02:08:34 2016 From: xavier.combelle at gmail.com (Xavier Combelle) Date: Thu, 25 Aug 2016 08:08:34 +0200 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160825042803.GA2055@kundert.designers-guide.com> References: <20160825042803.GA2055@kundert.designers-guide.com> Message-ID: <5e00697b-df4b-8c3d-c0b2-cb53628760c6@gmail.com> On 25/08/2016 06:28, Ken Kundert wrote: > All, > I propose that support for SI scale factors be added to Python. 
This would > be very helpful for any program that heavily uses real numbers, such as those > involved with scientific and engineering computation. There would be two primary > changes. First, the lexer would be enhanced to take real literals with the > following forms: > > c1 = 1nF (same as: c1 = 1e-9 # F ) > c = 299.79M (same as: c = 299.79e6 ) > f_hy = 1.4204GHz (same as: f_hy = 1.4204e9 # Hz) > There is little difference (except that it ask for a syntax modification which should be heavy weighted) between this proposition and c1 = 1*nF (same as: c1 = 1e-9 # F ) c = 299.79*M (same as: c = 299.79e6 ) f_hy = 1.4204*GHz (same as: f_hy = 1.4204e9 # Hz) with correct definition of the constants in a library. So a library would be welcome. From ian.g.kelly at gmail.com Thu Aug 25 02:11:15 2016 From: ian.g.kelly at gmail.com (Ian Kelly) Date: Thu, 25 Aug 2016 00:11:15 -0600 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <57BE88BB.3030201@canterbury.ac.nz> References: <20160825042803.GA2055@kundert.designers-guide.com> <57BE88BB.3030201@canterbury.ac.nz> Message-ID: On Wed, Aug 24, 2016 at 11:57 PM, Greg Ewing wrote: > Chris Angelico wrote: >> >> If units are retained, what you have is no longer a simple number, but >> a value with a unit, and is a quite different beast. (For instance, >> addition would have to cope with unit mismatches (probably by throwing >> an error), and multiplication would have to combine the units (length >> * length = area).) > > > And that can be surprisingly tricky. For example, newtons > times metres equals joules -- but *only* if the force and > the distance are in the same direction, otherwise it's > torque rather than energy and the units are just > newton-metres. I'd say that it more accurately depends on whether the distance represents a displacement or a position of application. If one pushes a shopping cart off-center, that produces both work and torque, with different "distance" vectors for each. 
Analytically, one is a cross-product and the other is a dot-product. The unit matching engine would have to understand the difference and know which one is being applied in the calculation. From python-ideas at shalmirane.com Thu Aug 25 04:19:55 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Thu, 25 Aug 2016 01:19:55 -0700 Subject: [Python-ideas] SI scale factors in Python Message-ID: <20160825081955.GA21350@kundert.designers-guide.com> > So you can have 1000mm or 0.001km but not 1m? If the scale factor is optional, then numbers like 1m are problematic because the m can represent either milli or meter. This is resolved by requiring the scale factor and defining a unity scale factor. I propose '_'. So 1m represents milli and 1_m represents 1 meter. > If units are retained, what you have is no longer a simple number, but > a value with a unit, and is a quite different beast. (For instance, > addition would have to cope with unit mismatches (probably by throwing > an error), and multiplication would have to combine the units (length > * length = area).) That would be a huge new feature. Indeed. I am not proposing that anything be done with the units other than possibly retain them for later output. Doing dimensional analysis on expressions would be a huge burden both for those implementing the language and for those using them in a program. Just allowing the units to be present, even it not retained, is a big advantage because it can bring a great deal of clarity to the meaning of the number. For example, even if the language does not flag an error when a user writes: vdiff = 1mV - 30uA the person that wrote the line will generally see it as a problem and fix it. In my experience, providing units is the most efficient form of documentation available in numerical programming in the sense that one or two additional characters can often clarify otherwise very confusing code. 
My feeling is that retaining the units on real literals is of little value if you don't also extend the real variable type to hold units, or to create another variable type that would carry the units. Extending reals does not seem like a good idea, but creating a new type, quantity, seems practical. In this case, the units would be rather ephemeral in that they would not survive any operation. Thus, the result of an operation between a quantity and either a integer, real or quantity would always be a real, meaning that the units are lost. In this way, units are very light-weight and only really serve as documentation (for both programmers and end users). But this idea of retaining the units is the least important aspect of this proposal. The important aspects are: 1. It allows numbers to be entered in a clean form that is easy to type and easy to interpret 2. It allows numbers to be output in a clean form that is easy to interpret. 3. In many cases it allows units to be inserted into the code in a very natural and clean way to improve the clarity of the code. > Question, though: What happens with exa-? Currently, if the parser > sees "1E", it'll expect to see another number, eg 1E+1 == 10.0. Will > this double meaning cause confusion? Oh, I did not see this. Both SPICE and Verilog limit the scale factors to the common ones (T, G, M, k, _, m, u, n, p, f, a). I work in electrical engineering, and in that domain exa never comes up. My suggestion would be to limit ourselves to the common scale factors as most people know them. Using P, E, Z, Y, z, and y often actually works against us as most people are not familiar with them and so cannot interpret them easily. 
-Ken From python-ideas at shalmirane.com Thu Aug 25 04:54:41 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Thu, 25 Aug 2016 01:54:41 -0700 Subject: [Python-ideas] SI scale factors in Python Message-ID: <20160825085441.GA29764@kundert.designers-guide.com> > Question, though: What happens with exa-? Currently, if the parser > sees "1E", it'll expect to see another number, eg 1E+1 == 10.0. Will > this double meaning cause confusion? Allow me to refine my answer to this question ... Yes, that is definitely problematic. I see two possible solutions. 1. Limit ourselves to the common scale factors: T, G, M, k, _, m, u, n, p, f, a 2. Or accept X in lieu of E. After all, the e is silent anyway. Thus, on input we accept ... 1Y -> 1e+24 1Z -> 1e+21 -> 1X -> 1e+18 <- only difference 1P -> 1e+15 1T -> 1e+12 1G -> 1e+09 1M -> 1e+06 1k -> 1e+03 1_ -> 1e+00 1m -> 1e-03 1u -> 1e-06 1n -> 1e-09 1p -> 1e-12 1f -> 1e-15 1a -> 1e-18 1z -> 1e-21 1y -> 1e-24 But on output we use ... 1Y -> 1e+24 optional 1Z -> 1e+21 optional -> 1E -> 1e+18 optional 1P -> 1e+15 optional 1T -> 1e+12 1G -> 1e+09 1M -> 1e+06 1k -> 1e+03 1_ -> 1e+00 1m -> 1e-03 1u -> 1e-06 1n -> 1e-09 1p -> 1e-12 1f -> 1e-15 1a -> 1e-18 1z -> 1e-21 optional 1y -> 1e-24 optional The optional scale factors are unfamiliar to most people, and if used might result in harder to read numbers. So I propose that '%r' only outputs the common scale factors, and %R outputs all the scale factors. Or we can use '#' in the format string to indicate the 'alternate' form should be used, in this case 'alternate' means that the extended set of scale factors should be used. 
-Ken From p.f.moore at gmail.com Thu Aug 25 05:06:04 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 25 Aug 2016 10:06:04 +0100 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160825085441.GA29764@kundert.designers-guide.com> References: <20160825085441.GA29764@kundert.designers-guide.com> Message-ID: On 25 August 2016 at 09:54, Ken Kundert wrote: > 1G -> 1e+09 > 1M -> 1e+06 > 1k -> 1e+03 While these suffixes are suitable for a scientific context, in a computing context, 1k=1024, 1M=1024*1024 and 1G=1024*1024*1024 make just as much, if not more, sense (and yes, I'm aware of Gigabyte vs Gibibyte). If "1M" were a legal Python literal,. I would expect a lot of confusion over what it meant - to the extent that it would hurt readability badly. Paul From jsbueno at python.org.br Thu Aug 25 05:55:10 2016 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Thu, 25 Aug 2016 06:55:10 -0300 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: References: <20160825085441.GA29764@kundert.designers-guide.com> Message-ID: On 25 August 2016 at 06:06, Paul Moore wrote: > On 25 August 2016 at 09:54, Ken Kundert wrote: >> 1G -> 1e+09 >> 1M -> 1e+06 >> 1k -> 1e+03 > > While these suffixes are suitable for a scientific context, in a > computing context, 1k=1024, 1M=1024*1024 and 1G=1024*1024*1024 make > just as much, if not more, sense (and yes, I'm aware of Gigabyte vs > Gibibyte). > > If "1M" were a legal Python literal,. I would expect a lot of > confusion over what it meant - to the extent that it would hurt > readability badly. > Paul So, the idea of adding fixed sufixes to the core language I regard as awful - due to these and other considerations - But maybe, oen thign to think about is about "operatorless" multiplication - jsut puting two tokens side by side, which currently yieds a Syntax Error could call `__mul__` (or a new `__direct_mul__` method on the second operator. 
That would enable a lot of interesting things in already existing packages like SymPy - and would allow a "physics measurements" packages that would take care of the ideas on the starting of the thread. I certainly would be more confortable for mathematicians and other people using Python in interactive environments such as iPython notebooks. But other than having this as a multiplication, I am against the whole thing. From rosuav at gmail.com Thu Aug 25 06:16:22 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 25 Aug 2016 20:16:22 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160825081955.GA21350@kundert.designers-guide.com> References: <20160825081955.GA21350@kundert.designers-guide.com> Message-ID: On Thu, Aug 25, 2016 at 6:19 PM, Ken Kundert wrote: >> So you can have 1000mm or 0.001km but not 1m? > > If the scale factor is optional, then numbers like 1m are problematic because > the m can represent either milli or meter. This is resolved by requiring the > scale factor and defining a unity scale factor. I propose '_'. So 1m represents > milli and 1_m represents 1 meter. This could also be hashed out using a constructor-only API. You'll probably want to avoid '_', as it's just been added as a comma separator for numeric literals. >> If units are retained, what you have is no longer a simple number, but >> a value with a unit, and is a quite different beast. (For instance, >> addition would have to cope with unit mismatches (probably by throwing >> an error), and multiplication would have to combine the units (length >> * length = area).) That would be a huge new feature. > > Indeed. I am not proposing that anything be done with the units other than > possibly retain them for later output. Doing dimensional analysis on expressions > would be a huge burden both for those implementing the language and for those > using them in a program. 
Just allowing the units to be present, even it not > retained, is a big advantage because it can bring a great deal of clarity to the > meaning of the number. For example, even if the language does not flag an error > when a user writes: > > vdiff = 1mV - 30uA > > the person that wrote the line will generally see it as a problem and fix it. How often do you do arithmetic on literals like that? More likely, what you'd do is tag your variable names, so it'll be something like: input_volts = 1m#V inefficiency = 30u#A vdiff = input_volts - inefficiency > In my experience, providing units is the most efficient form of documentation > available in numerical programming in the sense that one or two additional > characters can often clarify otherwise very confusing code. > > My feeling is that retaining the units on real literals is of little value if > you don't also extend the real variable type to hold units, or to create another > variable type that would carry the units. Extending reals does not seem like > a good idea, but creating a new type, quantity, seems practical. In this case, > the units would be rather ephemeral in that they would not survive any > operation. Thus, the result of an operation between a quantity and either > a integer, real or quantity would always be a real, meaning that the units are > lost. In this way, units are very light-weight and only really serve as > documentation (for both programmers and end users). > > But this idea of retaining the units is the least important aspect of this > proposal. The important aspects are: > 1. It allows numbers to be entered in a clean form that is easy to type and easy > to interpret > 2. It allows numbers to be output in a clean form that is easy to interpret. > 3. In many cases it allows units to be inserted into the code in a very natural > and clean way to improve the clarity of the code. The decimal.Decimal and fractions.Fractions types have no syntactic support. 
I would suggest imitating their styles initially, sorting out all the details of which characters mean what, and appealing for syntax once it's all settled - otherwise, it's too likely that something will end up being baked into the language half-baked, if that makes any sense. >> Question, though: What happens with exa-? Currently, if the parser >> sees "1E", it'll expect to see another number, eg 1E+1 == 10.0. Will >> this double meaning cause confusion? > > Oh, I did not see this. Both SPICE and Verilog limit the scale factors to the > common ones (T, G, M, k, _, m, u, n, p, f, a). I work in electrical engineering, > and in that domain exa never comes up. My suggestion would be to limit ourselves > to the common scale factors as most people know them. Using P, E, Z, Y, z, and > y often actually works against us as most people are not familiar with them and > so cannot interpret them easily. That seems pretty reasonable. At very least, it'd be something that can be extended later. Even femto and atto are rare enough that they could be dropped if necessary (pico too, perhaps). Easy scaling seems general enough to include in the language. Tagging numbers with units, though, feels like the domain of a third-party library. Maybe I'm wrong. ChrisA From mertz at gnosis.cx Thu Aug 25 06:24:36 2016 From: mertz at gnosis.cx (David Mertz) Date: Thu, 25 Aug 2016 03:24:36 -0700 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: References: <20160825085441.GA29764@kundert.designers-guide.com> Message-ID: I have trouble imagining situations in which SI units would help readability over scientific notation. On the one hand reading it always involves mental conversion to the exponents, so this is just an extra step. E.g. 
this equality is evident at a glance: 1e3 * 1e6 * 1e-9 == 1 This one would take me a few seconds, and I'd need to run it twice in my head to make sure I wasn't doing it wrong: 1k * 1M * 1n == 1 The more complicated the expression, the bigger the cognitive advantage for numeric exponents. I think the OP is drawn astray by the utility of domain-specific, familiar dimensional units. 1MeV means something in a domain. So does 1ps. Or 1km. But really it's the units not the exponent that are helping us (plus knowing that our specific domain deals with dimensional quantities of a certain scale). But even past the ease of mental arithmetic, I don't see why one would use literals very often to start with. If they are constants, give them a better name! If they are variables, likewise. If it helps your domain, add the units to the *names*, e.g.: lightyear_m = 9.5e15 galaxy_radians = calculate_arc() dist_m = lightyear_m * galaxy_radians * arc_len That looks a lot clearer to me than sticking together raw numbers, however spelled. On Aug 25, 2016 2:56 AM, "Joao S. O. Bueno" wrote: > On 25 August 2016 at 06:06, Paul Moore wrote: > > On 25 August 2016 at 09:54, Ken Kundert > wrote: > >> 1G -> 1e+09 > >> 1M -> 1e+06 > >> 1k -> 1e+03 > > > > While these suffixes are suitable for a scientific context, in a > > computing context, 1k=1024, 1M=1024*1024 and 1G=1024*1024*1024 make > > just as much, if not more, sense (and yes, I'm aware of Gigabyte vs > > Gibibyte). > > > > If "1M" were a legal Python literal, I would expect a lot of > > confusion over what it meant - to the extent that it would hurt > > readability badly. > > Paul > > > So, the idea of adding fixed suffixes to the core language I regard as > awful - due to these and other considerations - > But maybe one thing to think about is "operatorless" > multiplication - just putting two tokens side by side, which currently > yields a SyntaxError, could call `__mul__` > (or a new `__direct_mul__` method) on the second operand.
> > That would enable a lot of interesting things in already existing > packages like SymPy - and would allow a "physics measurements" > package that would take care of the ideas at the start of the > thread. > > It would certainly be more comfortable for mathematicians and other > people using Python in interactive environments such as IPython > notebooks. > > But other than having this as a multiplication, I am against the whole > thing. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Aug 25 07:24:05 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 25 Aug 2016 21:24:05 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160825042803.GA2055@kundert.designers-guide.com> References: <20160825042803.GA2055@kundert.designers-guide.com> Message-ID: <20160825112405.GH26300@ando.pearwood.info> On Wed, Aug 24, 2016 at 09:28:03PM -0700, Ken Kundert wrote: > All, > I propose that support for SI scale factors be added to Python. I think there's something to be said for these ideas, but you are combining multiple ideas into one suggestion. First off, units with quantities: I think that is an excellent idea, but one best supported by a proper unit library that supports more than just SI units. There are already a few of those. See for example this Stack Overflow question: http://stackoverflow.com/questions/2125076/unit-conversion-in-python Sympy also does dimensional analysis: http://docs.sympy.org/latest/modules/physics/unitsystems/examples.html Google for more. If I try to add 30 feet to 6 metres and get either 36 feet or 36 metres, then your unit system is *worse* than useless, it is actively harmful.
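[Editor's illustration: the failure mode Steven describes can be made concrete with a toy dimensioned type. This is purely hypothetical sketch code — real libraries such as pint do this properly — showing a value-with-unit pair that refuses mixed-unit arithmetic instead of silently producing "36 of anything":]

```python
class Quantity:
    """Toy value-with-unit pair that rejects mixed-unit addition."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def __add__(self, other):
        # Adding feet to metres (or kilograms to metres) must be an error,
        # not a silent addition of the bare numbers.
        if not isinstance(other, Quantity) or other.unit != self.unit:
            raise TypeError("incompatible units: %r + %r"
                            % (self.unit, getattr(other, "unit", other)))
        return Quantity(self.value + other.value, self.unit)

    def __repr__(self):
        return "%g %s" % (self.value, self.unit)

print(Quantity(9.144, "m") + Quantity(6, "m"))   # 15.144 m
try:
    Quantity(30, "ft") + Quantity(6, "m")
except TypeError as exc:
    print("rejected:", exc)
```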
I don't mind if I get 15.144 metres or 49.685039 feet or even 5.0514947e-08 lightseconds, but I better not get 36 of anything. And likewise for adding 30 kilograms to 6 metres. That has to be an error, or this system will just be an attractive nuisance, luring people into a false sense of security while actually not protecting them from dimensional and unit conversion bugs at all. So I am an extremely strong -1 to the suggestion that we allow unit suffixes on numeric quantities but treat them as a no-op. Should Python support a unit conversion library in the standard library? I think perhaps not -- there's plenty of competition in the unit conversion ecosystem, both in Python and out of it, and I don't think that there's any one library that is both "best of breed" enough and stable enough to put into the std lib. Remember that the std lib is where good libraries go to die: once they hit the std lib, stability becomes much, much more important than new features. But if you wish to argue differently, I'll be willing to hear your suggestions. Now, on to the second part of the suggestion: support for SI prefixes. I think this is simple enough, and useful enough, that we could make it part of the std lib -- and possibly even in 3.6 (possibly on a provisional basis). The library could be dead simple: # prefixes.py # Usage: # from prefixes import * # x = 123*M # like 123000000 # y = 45*Ki # like 45*1024 # SI unit prefixes # http://physics.nist.gov/cuu/Units/prefixes.html Y = yotta = 10**24 Z = zetta = 10**21 [...] k = kilo = 10**3 K = k # not strictly an SI prefix, but common enough to allow it m = milli = 1e-3 µ = micro = 1e-6 # A minor PEP-8 violation, but (I hope) forgivable. u = µ # not really an SI prefix, but very common # etc # International Electrotechnical Commission (IEC) binary prefixes # http://physics.nist.gov/cuu/Units/binary.html Ki = kibi = 1024 Mi = mebi = 1024**2 Gi = 1024**3 # etc That's practically it.
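[Editor's illustration: the module sketched above can be exercised as a self-contained snippet. A few of the constants are inlined here rather than imported, and plain `u` stands in for the Greek mu:]

```python
# A handful of the proposed constants, written inline so this runs as-is.
G = giga = 10**9
M = mega = 10**6
k = kilo = 10**3
m = milli = 1e-3
u = micro = 1e-6          # ASCII stand-in for the mu symbol
Ki = kibi = 1024
Mi = mebi = 1024**2

print(123 * M)            # 123000000
print(45 * Ki)            # 46080
print(3 * G)              # 3000000000
```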
Of course, this simple implementation would allow usage that was a technical violation of the SI system: x = 45*M*µ*Ki as well as usage that is semantically meaningless: x = 45 + M but those abuses are best covered by "consenting adults". (In other words, if you don't like it, don't do it.) And it wouldn't support the obsolete binary prefixes that use SI symbols with binary values (K=1024, M=1024**2, etc), but that's a good thing. They're an abomination that needs to die as soon as possible. -- Steve From rosuav at gmail.com Thu Aug 25 07:44:18 2016 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 25 Aug 2016 21:44:18 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160825112405.GH26300@ando.pearwood.info> References: <20160825042803.GA2055@kundert.designers-guide.com> <20160825112405.GH26300@ando.pearwood.info> Message-ID: On Thu, Aug 25, 2016 at 9:24 PM, Steven D'Aprano wrote: > If I try to add 30 feet to 6 metres and get either 36 feet or 36 metres, > then your unit system is *worse* than useless, it is actively harmful. I > don't mind if I get 15.144 metres or 49.685039 feet or even > 5.0514947e-08 lightseconds, but I better not get 36 of anything. > To be fair, under the OP's proposal, you wouldn't get 36 *of anything* - you'd just get the bare number 36. So it's not worse than useless, just useless. ChrisA From victor.stinner at gmail.com Thu Aug 25 09:14:39 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 25 Aug 2016 15:14:39 +0200 Subject: [Python-ideas] Adding optional parameter to shutil.rmtree to not delete root. In-Reply-To: <1733535507.1841020.1472092145046.JavaMail.yahoo@mail.yahoo.com> References: <1733535507.1841020.1472092145046.JavaMail.yahoo.ref@mail.yahoo.com> <1733535507.1841020.1472092145046.JavaMail.yahoo@mail.yahoo.com> Message-ID: Maybe add a different function rather than adding a flag? Something like shutil.remove_dir_files().
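[Editor's illustration: a helper of the kind being suggested can already be written with existing primitives. A sketch — the name `remove_dir_contents` is invented here for illustration:]

```python
import os
import shutil

def remove_dir_contents(path):
    """Delete everything inside *path*, but leave *path* itself in place."""
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isdir(full) and not os.path.islink(full):
            shutil.rmtree(full)      # recurse into real subdirectories
        else:
            os.remove(full)          # files and symlinks

# remove_dir_contents('/some/dir')  # '/some/dir' survives, with its permissions
```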
Victor On Aug 25, 2016 4:32 AM, "Nick Jacobson via Python-ideas" <python-ideas at python.org> wrote: > I've been finding that a common scenario is where I want to remove > everything in a directory, but leave the (empty) root directory behind, not > removing it. > > So for example, if I have a directory C:\foo and it contains subdirectory > C:\foo\bar and file C:\foo\myfile.txt, and I want to remove the > subdirectory (and everything in it) and file, leaving only C:\foo behind. > > (This is useful e.g. when the root directory has special permissions, so > it wouldn't be so simple to remove it and recreate it again.) > > A number of ways to do this have been offered here: > http://stackoverflow.com/questions/185936/delete-folder-contents-in-python > > But it'd be simpler if there were an optional parameter added to > shutil.rmtree, called removeroot. It would default to true so as to not > break any backward compatibility. If it's set to false, then it leaves the > root directory in place. > > Thanks, > > Nick > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Aug 25 07:35:39 2016 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 25 Aug 2016 04:35:39 -0700 Subject: [Python-ideas] PEP 525: Asynchronous Generators In-Reply-To: References: <9c691f58-aa1f-3797-c75b-b1272e730da9@gmail.com> Message-ID: Hi Yury et al, Apologies for not participating in this discussion before -- unfortunately I've been dealing with some boring medical issues and haven't been able to contribute as much, or as early, as I should have... But I do have two concerns, one minor and one not.
Minor comment: this is purely an aesthetic issue, but the PEP should probably have some discussion of the choice between "async yield" vs bare "yield" as the syntax. Arguments for "async yield" would be consistency with the other async syntax ("async for" etc.) and potentially reduced confusion about how inside an async def, "await" yields, and "yield" yields, but in totally different senses. Arguments for "yield" would be terseness, I guess. Probably this is just something for Guido to pick... Major issue: I have serious concerns about the whole async finalizer design. 1) there is a very general problem of how to handle cleanup in async code, which isn't specific to async generators at all -- really any object that holds resources that need async cleanup has exactly the same issue. So the special-purpose solution here makes me uncomfortable. 2) our understanding of this problem (resource cleanup in async code) is *really* immature. You can check the async-sig archives for some discussion -- I think it's safe to say that all the "experts" are extremely confused about this problem's shape/scope/best practices. I guess probably there are 10ish people who kind of understand the problem and 0 people who deeply understand it. (I was working on at least writing something up to try and make the issue more accessible, but have not managed yet, which I feel bad about :-(.) But given this, trying to freeze one particular strategy into the language on the 3.6 timeline raises a red flag for me. 3) specifically, I don't understand how the proposed solution in the PEP can work reliably in the presence of non-prompt garbage collection (e.g. in the presence of cycles on CPython, or on PyPy in general). What happens when the async generator object gets collected after the event loop exits? We could require event loops to force a full collection before exiting, but this is a rather uncomfortable coupling!
(Do all Python implementations even have a way to force a guaranteed-to-collect-ALL-garbage collection? What are the performance implications of async code triggering such full collections at inopportune times?) And even this wouldn't necessarily solve the problem, since async generator frames could still be "live" from the GC point of view, even if we know that their resources should be released. (E.g. consider an exception thrown from an async gen that propagates out of the event loop, so the event loop exits while the async gen's frame is pinned by the traceback object.) Python in general has been moving towards more explicit/deterministic cleanup through "with" statements, and I think this is great. My feeling is that we should try to continue in this vein, rather than jumping through hoops to let people get away with sloppy coding that relies on __del__ and usually works except when it doesn't. For example, what if we added an optional __abreak__ coroutine to the async iterator protocol, with the semantics that an "async for" will always either iterate to exhaustion, or else call __abreak__, i.e. # new style code async for x in aiter: ... becomes sugar for # 3.5 style code try: async for x in aiter: ... finally: if for_loop_exited_early and hasattr(aiter, "__abreak__"): await aiter.__abreak__(...) Basically the idea here is that currently, what we should *really* be doing if we care about reliable+deterministic resource cleanup is never writing bare "for" loops but instead always writing async with ... as aiter: async for ... in aiter: ... But that's really tiresome and no one does it, so let's build the "with" into the for loop. Of course there are cases where you want to reuse a single iterator object in multiple loops; this would look like: # doing something clever, so we have to explicitly take charge of cleanup ourselves: async with ...
as aiter: # read header (= first non-comment line) async for header in iterlib.unclosing(aiter): if not header.startswith("#"): break # continue with same iterator to read body async for line in iterlib.unclosing(aiter): ... where iterlib.unclosing returns a simple proxy object that passes through __anext__ and friends but swallows __abreak__ to make it a no-op. Or maybe not, this isn't a fully worked out proposal :-). My main concern is that we don't rush to make a decision for 3.6, skip the hard work of considering different designs, and end up with something we regret in retrospect. Some other options to consider: - add async generators to 3.6, but defer the full cleanup discussion for now -- so in 3.6 people will have to use "async with aclosing(aiter): ..." or whatever, but at least we can start getting experience with these objects now, and worry about making them more pleasant to use in the 3.7 time frame. - defer the whole thing to 3.7. Obviously this would be really sad, but... To hold us over until then, it turns out that a library based solution is actually pretty ergonomic -- see https://github.com/njsmith/async_generator . I should add asend/athrow/aclose (I think I have them on a branch), but otherwise it's pretty much as capable and easy to use as "real" async generators. So while I'd love to see these in the language, it doesn't feel super urgent to me. -n On Aug 2, 2016 3:31 PM, "Yury Selivanov" wrote: Hi, This is a new PEP to add asynchronous generators to Python 3.6. The PEP is also available at [1]. There is a reference implementation [2] that supports everything that the PEP proposes to add. [1] https://www.python.org/dev/peps/pep-0525/ [2] https://github.com/1st1/cpython/tree/async_gen Thank you! 
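[Editor's illustration: the `aclosing(aiter)` helper Nathaniel mentions above was not in the stdlib at the time, but a minimal version is only a few lines. `Ticker` below is an invented stand-in for an async iterator that needs explicit cleanup:]

```python
import asyncio

class aclosing:
    """Async context manager that guarantees ``await thing.aclose()`` on exit."""
    def __init__(self, thing):
        self.thing = thing
    async def __aenter__(self):
        return self.thing
    async def __aexit__(self, *exc_info):
        await self.thing.aclose()

class Ticker:
    """Stand-in async iterator whose resources need explicit release."""
    def __init__(self, to):
        self.i, self.to, self.closed = 0, to, False
    def __aiter__(self):
        return self
    async def __anext__(self):
        if self.i >= self.to:
            raise StopAsyncIteration
        self.i += 1
        return self.i
    async def aclose(self):
        self.closed = True

async def main():
    ticker = Ticker(10)
    async with aclosing(ticker) as it:
        async for n in it:
            if n == 3:
                break              # early exit: aclose() still runs
    return ticker.closed

loop = asyncio.new_event_loop()
print(loop.run_until_complete(main()))   # True
loop.close()
```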
PEP: 525 Title: Asynchronous Generators Version: $Revision$ Last-Modified: $Date$ Author: Yury Selivanov Discussions-To: Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 28-Jul-2016 Python-Version: 3.6 Post-History: 02-Aug-2016 Abstract ======== PEP 492 introduced support for native coroutines and ``async``/``await`` syntax to Python 3.5. It is proposed here to extend Python's asynchronous capabilities by adding support for *asynchronous generators*. Rationale and Goals =================== Regular generators (introduced in PEP 255) enabled an elegant way of writing complex *data producers* and have them behave like an iterator. However, currently there is no equivalent concept for the *asynchronous iteration protocol* (``async for``). This makes writing asynchronous data producers unnecessarily complex, as one must define a class that implements ``__aiter__`` and ``__anext__`` to be able to use it in an ``async for`` statement. Essentially, the goals and rationale for PEP 255, applied to the asynchronous execution case, hold true for this proposal as well. Performance is an additional point for this proposal: in our testing of the reference implementation, asynchronous generators are **2x** faster than an equivalent implemented as an asynchronous iterator. 
As an illustration of the code quality improvement, consider the following class that prints numbers with a given delay once iterated:: class Ticker: """Yield numbers from 0 to `to` every `delay` seconds.""" def __init__(self, delay, to): self.delay = delay self.i = 0 self.to = to def __aiter__(self): return self async def __anext__(self): i = self.i if i >= self.to: raise StopAsyncIteration self.i += 1 if i: await asyncio.sleep(self.delay) return i The same can be implemented as a much simpler asynchronous generator:: async def ticker(delay, to): """Yield numbers from 0 to `to` every `delay` seconds.""" for i in range(to): yield i await asyncio.sleep(delay) Specification ============= This proposal introduces the concept of *asynchronous generators* to Python. This specification presumes knowledge of the implementation of generators and coroutines in Python (PEP 342, PEP 380 and PEP 492). Asynchronous Generators ----------------------- A Python *generator* is any function containing one or more ``yield`` expressions:: def func(): # a function return def genfunc(): # a generator function yield We propose to use the same approach to define *asynchronous generators*:: async def coro(): # a coroutine function await smth() async def asyncgen(): # an asynchronous generator function await smth() yield 42 The result of calling an *asynchronous generator function* is an *asynchronous generator object*, which implements the asynchronous iteration protocol defined in PEP 492. It is a ``SyntaxError`` to have a non-empty ``return`` statement in an asynchronous generator. Support for Asynchronous Iteration Protocol ------------------------------------------- The protocol requires two special methods to be implemented: 1. An ``__aiter__`` method returning an *asynchronous iterator*. 2. An ``__anext__`` method returning an *awaitable* object, which uses ``StopIteration`` exception to "yield" values, and ``StopAsyncIteration`` exception to signal the end of the iteration. 
Asynchronous generators define both of these methods. Let's manually iterate over a simple asynchronous generator:: async def genfunc(): yield 1 yield 2 gen = genfunc() assert gen.__aiter__() is gen assert await gen.__anext__() == 1 assert await gen.__anext__() == 2 await gen.__anext__() # This line will raise StopAsyncIteration. Finalization ------------ PEP 492 requires an event loop or a scheduler to run coroutines. Because asynchronous generators are meant to be used from coroutines, they also require an event loop to run and finalize them. Asynchronous generators can have ``try..finally`` blocks, as well as ``async with``. It is important to provide a guarantee that, even when partially iterated, and then garbage collected, generators can be safely finalized. For example:: async def square_series(con, to): async with con.transaction(): cursor = con.cursor( 'SELECT generate_series(0, $1) AS i', to) async for row in cursor: yield row['i'] ** 2 async for i in square_series(con, 1000): if i == 100: break The above code defines an asynchronous generator that uses ``async with`` to iterate over a database cursor in a transaction. The generator is then iterated over with ``async for``, which interrupts the iteration at some point. The ``square_series()`` generator will then be garbage collected, and without a mechanism to asynchronously close the generator, Python interpreter would not be able to do anything. To solve this problem we propose to do the following: 1. Implement an ``aclose`` method on asynchronous generators returning a special *awaitable*. When awaited it throws a ``GeneratorExit`` into the suspended generator and iterates over it until either a ``GeneratorExit`` or a ``StopAsyncIteration`` occur. This is very similar to what the ``close()`` method does to regular Python generators, except that an event loop is required to execute ``aclose()``. 2. 
Raise a ``RuntimeError``, when an asynchronous generator executes a ``yield`` expression in its ``finally`` block (using ``await`` is fine, though):: async def gen(): try: yield finally: await asyncio.sleep(1) # Can use 'await'. yield # Cannot use 'yield', # this line will trigger a # RuntimeError. 3. Add two new methods to the ``sys`` module: ``set_asyncgen_finalizer()`` and ``get_asyncgen_finalizer()``. The idea behind ``sys.set_asyncgen_finalizer()`` is to allow event loops to handle generators finalization, so that the end user does not need to care about the finalization problem, and it just works. When an asynchronous generator is iterated for the first time, it stores a reference to the current finalizer. If there is none, a ``RuntimeError`` is raised. This provides a strong guarantee that every asynchronous generator object will always have a finalizer installed by the correct event loop. When an asynchronous generator is about to be garbage collected, it calls its cached finalizer. The assumption is that the finalizer will schedule an ``aclose()`` call with the loop that was active when the iteration started. For instance, here is how asyncio is modified to allow safe finalization of asynchronous generators:: # asyncio/base_events.py class BaseEventLoop: def run_forever(self): ... old_finalizer = sys.get_asyncgen_finalizer() sys.set_asyncgen_finalizer(self._finalize_asyncgen) try: ... finally: sys.set_asyncgen_finalizer(old_finalizer) ... def _finalize_asyncgen(self, gen): self.create_task(gen.aclose()) ``sys.set_asyncgen_finalizer()`` is thread-specific, so several event loops running in parallel threads can use it safely. Asynchronous Generator Object ----------------------------- The object is modeled after the standard Python generator object. Essentially, the behaviour of asynchronous generators is designed to replicate the behaviour of synchronous generators, with the only difference in that the API is asynchronous. 
The following methods and properties are defined: 1. ``agen.__aiter__()``: Returns ``agen``. 2. ``agen.__anext__()``: Returns an *awaitable*, that performs one asynchronous generator iteration when awaited. 3. ``agen.asend(val)``: Returns an *awaitable*, that pushes the ``val`` object in the ``agen`` generator. When the ``agen`` has not yet been iterated, ``val`` must be ``None``. Example:: async def gen(): await asyncio.sleep(0.1) v = yield 42 print(v) await asyncio.sleep(0.2) g = gen() await g.asend(None) # Will return 42 after sleeping # for 0.1 seconds. await g.asend('hello') # Will print 'hello' and # raise StopAsyncIteration # (after sleeping for 0.2 seconds.) 4. ``agen.athrow(typ, [val, [tb]])``: Returns an *awaitable*, that throws an exception into the ``agen`` generator. Example:: async def gen(): try: await asyncio.sleep(0.1) yield 'hello' except ZeroDivisionError: await asyncio.sleep(0.2) yield 'world' g = gen() v = await g.asend(None) print(v) # Will print 'hello' after # sleeping for 0.1 seconds. v = await g.athrow(ZeroDivisionError) print(v) # Will print 'world' after # sleeping 0.2 seconds. 5. ``agen.aclose()``: Returns an *awaitable*, that throws a ``GeneratorExit`` exception into the generator. The *awaitable* can either return a yielded value, if ``agen`` handled the exception, or ``agen`` will be closed and the exception will propagate back to the caller. 6. ``agen.__name__`` and ``agen.__qualname__``: readable and writable name and qualified name attributes. 7. ``agen.ag_await``: The object that ``agen`` is currently *awaiting* on, or ``None``. This is similar to the currently available ``gi_yieldfrom`` for generators and ``cr_await`` for coroutines. 8. ``agen.ag_frame``, ``agen.ag_running``, and ``agen.ag_code``: defined in the same way as similar attributes of standard generators. ``StopIteration`` and ``StopAsyncIteration`` are not propagated out of asynchronous generators, and are replaced with a ``RuntimeError``.
Implementation Details ---------------------- Asynchronous generator object (``PyAsyncGenObject``) shares the struct layout with ``PyGenObject``. In addition to that, the reference implementation introduces three new objects: 1. ``PyAsyncGenASend``: the awaitable object that implements ``__anext__`` and ``asend()`` methods. 2. ``PyAsyncGenAThrow``: the awaitable object that implements ``athrow()`` and ``aclose()`` methods. 3. ``_PyAsyncGenWrappedValue``: every directly yielded object from an asynchronous generator is implicitly boxed into this structure. This is how the generator implementation can separate objects that are yielded using regular iteration protocol from objects that are yielded using asynchronous iteration protocol. ``PyAsyncGenASend`` and ``PyAsyncGenAThrow`` are awaitables (they have ``__await__`` methods returning ``self``) and are coroutine-like objects (implementing ``__iter__``, ``__next__``, ``send()`` and ``throw()`` methods). Essentially, they control how asynchronous generators are iterated: .. image:: pep-0525-1.png :align: center :width: 80% PyAsyncGenASend and PyAsyncGenAThrow ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``PyAsyncGenASend`` is a coroutine-like object that drives ``__anext__`` and ``asend()`` methods and implements the asynchronous iteration protocol. ``agen.asend(val)`` and ``agen.__anext__()`` return instances of ``PyAsyncGenASend`` (which hold references back to the parent ``agen`` object.) The data flow is defined as follows: 1. When ``PyAsyncGenASend.send(val)`` is called for the first time, ``val`` is pushed to the parent ``agen`` object (using existing facilities of ``PyGenObject``.) Subsequent iterations over the ``PyAsyncGenASend`` objects, push ``None`` to ``agen``. When a ``_PyAsyncGenWrappedValue`` object is yielded, it is unboxed, and a ``StopIteration`` exception is raised with the unwrapped value as an argument. 2. 
When ``PyAsyncGenASend.throw(*exc)`` is called for the first time, ``*exc`` is thrown into the parent ``agen`` object. Subsequent iterations over the ``PyAsyncGenASend`` objects, push ``None`` to ``agen``. When a ``_PyAsyncGenWrappedValue`` object is yielded, it is unboxed, and a ``StopIteration`` exception is raised with the unwrapped value as an argument. 3. ``return`` statements in asynchronous generators raise ``StopAsyncIteration`` exception, which is propagated through ``PyAsyncGenASend.send()`` and ``PyAsyncGenASend.throw()`` methods. ``PyAsyncGenAThrow`` is very similar to ``PyAsyncGenASend``. The only difference is that ``PyAsyncGenAThrow.send()``, when called first time, throws an exception into the parent ``agen`` object (instead of pushing a value into it.) New Standard Library Functions and Types ---------------------------------------- 1. ``types.AsyncGeneratorType`` -- type of asynchronous generator object. 2. ``sys.set_asyncgen_finalizer()`` and ``sys.get_asyncgen_finalizer()`` methods to set up asynchronous generators finalizers in event loops. 3. ``inspect.isasyncgen()`` and ``inspect.isasyncgenfunction()`` introspection functions. Backwards Compatibility ----------------------- The proposal is fully backwards compatible. In Python 3.5 it is a ``SyntaxError`` to define an ``async def`` function with a ``yield`` expression inside, therefore it's safe to introduce asynchronous generators in 3.6. Performance =========== Regular Generators ------------------ There is no performance degradation for regular generators.
The following micro-benchmark runs at the same speed on CPython with and without asynchronous generators:: def gen(): i = 0 while i < 100000000: yield i i += 1 list(gen()) Improvements over asynchronous iterators ---------------------------------------- The following micro-benchmark shows that asynchronous generators are about **2.3x faster** than asynchronous iterators implemented in pure Python:: N = 10 ** 7 async def agen(): for i in range(N): yield i class AIter: def __init__(self): self.i = 0 def __aiter__(self): return self async def __anext__(self): i = self.i if i >= N: raise StopAsyncIteration self.i += 1 return i Design Considerations ===================== ``aiter()`` and ``anext()`` builtins ------------------------------------ Originally, PEP 492 defined ``__aiter__`` as a method that should return an *awaitable* object, resulting in an asynchronous iterator. However, in CPython 3.5.2, ``__aiter__`` was redefined to return asynchronous iterators directly. To avoid breaking backwards compatibility, it was decided that Python 3.6 will support both ways: ``__aiter__`` can still return an *awaitable* with a ``DeprecationWarning`` being issued. Because of this dual nature of ``__aiter__`` in Python 3.6, we cannot add a synchronous implementation of ``aiter()`` built-in. Therefore, it is proposed to wait until Python 3.7. Asynchronous list/dict/set comprehensions ----------------------------------------- Syntax for asynchronous comprehensions is unrelated to the asynchronous generators machinery, and should be considered in a separate PEP. Asynchronous ``yield from`` --------------------------- While it is theoretically possible to implement ``yield from`` support for asynchronous generators, it would require a serious redesign of the generators implementation. ``yield from`` is also less critical for asynchronous generators, since there is no need to provide a mechanism for implementing another coroutine protocol on top of coroutines.
And to compose asynchronous generators a simple ``async for`` loop can be used:: async def g1(): yield 1 yield 2 async def g2(): async for v in g1(): yield v Why the ``asend()`` and ``athrow()`` methods are necessary ---------------------------------------------------------- They make it possible to implement concepts similar to ``contextlib.contextmanager`` using asynchronous generators. For instance, with the proposed design, it is possible to implement the following pattern:: @async_context_manager async def ctx(): await open() try: yield finally: await close() async with ctx(): await ... Another reason is that it is possible to push data and throw exceptions into asynchronous generators using the object returned from ``__anext__`` object, but it is hard to do that correctly. Adding explicit ``asend()`` and ``athrow()`` will pave a safe way to accomplish that. In terms of implementation, ``asend()`` is a slightly more generic version of ``__anext__``, and ``athrow()`` is very similar to ``aclose()``. Therefore having these methods defined for asynchronous generators does not add any extra complexity. Example ======= A working example with the current reference implementation (will print numbers from 0 to 9 with one second delay):: async def ticker(delay, to): for i in range(to): yield i await asyncio.sleep(delay) async def run(): async for i in ticker(1, 10): print(i) import asyncio loop = asyncio.get_event_loop() try: loop.run_until_complete(run()) finally: loop.close() Implementation ============== The complete reference implementation is available at [1]_. References ========== .. [1] https://github.com/1st1/cpython/tree/async_gen Copyright ========= This document has been placed in the public domain. .. 
Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Aug 25 12:44:50 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 26 Aug 2016 02:44:50 +1000 Subject: [Python-ideas] Adding optional parameter to shutil.rmtree to not delete root. In-Reply-To: References: <1733535507.1841020.1472092145046.JavaMail.yahoo.ref@mail.yahoo.com> <1733535507.1841020.1472092145046.JavaMail.yahoo@mail.yahoo.com> Message-ID: On 25 August 2016 at 23:14, Victor Stinner wrote: > Maybe add a different function rather add a flag? Something like > shutil.remove_dir_files(). The typical Python term for the concept would be "clear", giving shutil.clear_dir(). Like the ".clear()" method on containers, it would delete the contents, but leave the container itself alone. rmtree() could then be a thin wrapper around clear_dir() that also deletes the base directory. Alternatively, if we wanted to stick with the "rm" prefix, we could use "shutil.rmcontents()" as the name. The main downside I'd see to that is that I'd half expect it to work on files as well (truncating them to a length of zero), while clear_dir() is unambiguous. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From benhoyt at gmail.com Thu Aug 25 13:31:10 2016 From: benhoyt at gmail.com (Ben Hoyt) Date: Thu, 25 Aug 2016 13:31:10 -0400 Subject: [Python-ideas] Atomic counter / atomic increment Message-ID: I had to implement a simple atomic counter the other day to count the total number of requests processed in a multi-threaded Python web server. 
I was doing a demo of "how cool Python is" to my colleagues, and they were
generally wowed, but one of the things that made them do a double-take
(coming mostly from Scala/Java) was that there was no atomic counter in the
standard library.

The fact that I and many other folks have implemented such things makes me
wonder if it should be in the standard library. It's pretty simple to
implement, basically the handful of lines of code below (full version on
GitHub Gist at
https://gist.github.com/benhoyt/8c8a8d62debe8e5aa5340373f9c509c7):

    import threading

    class AtomicCounter:
        def __init__(self, initial=0):
            self.value = initial
            self._lock = threading.Lock()

        def increment(self, num=1):
            with self._lock:
                self.value += num
                return self.value

And if you just want a one-off and don't want to write a class, it's like so:

    import threading

    counter_lock = threading.Lock()
    counter = 0

    with counter_lock:
        counter += 1
        value = counter
    print(value)

But it could be this much more obvious code:

    import threading

    counter = threading.AtomicCounter()
    value = counter.increment()
    print(value)

Thoughts? Would such a class make a good candidate for the standard library?
(API could probably be improved.)

-Ben
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mal at egenix.com  Thu Aug 25 14:01:30 2016
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 25 Aug 2016 20:01:30 +0200
Subject: [Python-ideas] Atomic counter / atomic increment
In-Reply-To: 
References: 
Message-ID: <57BF327A.4060802@egenix.com>

On 25.08.2016 19:31, Ben Hoyt wrote:
> I had to implement a simple atomic counter the other day to count the total
> number of requests processed in a multi-threaded Python web server.
>
> I was doing a demo of "how cool Python is" to my colleagues, and they were
> generally wowed, but one of the things that made them do a double-take
> (coming mostly from Scala/Java) was that there was no atomic counter in the
> standard library.
>
> The fact that I and many other folks have implemented such things makes me
> wonder if it should be in the standard library.

As long as Python uses a GIL to protect C level function
calls, you can use an iterator for this:

    import itertools
    x = itertools.count()
    ...
    mycount = next(x)
    ...

The trick here is that the CALL_FUNCTION byte code will trigger the
increment of the iterator. Since this is implemented in C, the GIL will
serve as lock on the iterator while it is being incremented.

With Python 4.0, this will all be different, though :-)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Aug 25 2016)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...           http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...           http://zope.egenix.com/
________________________________________________________________________

::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/
                      http://www.malemburg.com/

From python-ideas at shalmirane.com  Thu Aug 25 14:02:11 2016
From: python-ideas at shalmirane.com (Ken Kundert)
Date: Thu, 25 Aug 2016 11:02:11 -0700
Subject: [Python-ideas] SI scale factors in Python
Message-ID: <20160825180211.GA1472@kundert.designers-guide.com>

All,
    This proposal basically has two parts. One part is that Python should
naturally support printing real numbers with SI scale factors. Currently there
are three formats for printing real numbers: %f, %e, %g. They all become
difficult to read for even moderately large or small numbers. Exponential
notation is hard for humans to read. That is why SI scale factors have largely
replaced exponential notation everywhere except in programming.
Adding another
format for printing real numbers in human readable form seems like a modest
extension that is long past due in all programming languages. I am asking that
Python be the leader here. I am sure other languages will pick it up once it is
implemented in Python.

The second part is the logical dual to the first: input. People should be able
to enter numbers in Python using SI scale factors. This means as real literals,
such as 2.4G, but it should also work with casting, float('2.4G'). Once you
allow SI scale factors on numbers, the natural tendency is for people to want to
add units, which is a good thing because it gives important information about
the number. We should allow it because it improves the code by making it more
self documenting. Even if the language completely ignores the units, we have
still gained by allowing the units to be there, just like we gain when we allow
users to add comments to their code even though the compiler ignores them.

Some people have suggested that we take the next step and use the units for
dimensional analysis, but that is highly problematic because you cannot do
dimensional analysis unless everything is specified with the correct units, and
that can be a huge burden for the user. So instead, I am suggesting that we
provide simple hooks that simply allow access to the units. That way people can
build dimensional analysis packages using the units if they felt the need.

-Ken

From benhoyt at gmail.com  Thu Aug 25 14:10:42 2016
From: benhoyt at gmail.com (Ben Hoyt)
Date: Thu, 25 Aug 2016 14:10:42 -0400
Subject: [Python-ideas] Atomic counter / atomic increment
In-Reply-To: <57BF327A.4060802@egenix.com>
References: <57BF327A.4060802@egenix.com>
Message-ID: 

> As long as Python uses a GIL to protect C level function
> calls, you can use an iterator for this:
>
> import itertools
> x = itertools.count()
> ...
> mycount = next(x)
>

Yeah, that's a neat hack -- I saw it recommended on StackOverflow, and saw
it used in the standard library somewhere. I think that's probably okay in
the *CPython* stdlib, because it's CPython so you know it has the GIL. But
this wouldn't work in other Python implementations, would it (IronPython
and Jython don't have a GIL). Or when itertools.count() is implemented in
pure Python on some system? Seems like it could blow up in someone's face
when they're least expecting it. I also think using *iter*tools is a pretty
non-obvious way to get a thread-safe counter.

-Ben
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mal at egenix.com  Thu Aug 25 14:46:44 2016
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 25 Aug 2016 20:46:44 +0200
Subject: [Python-ideas] Atomic counter / atomic increment
In-Reply-To: 
References: <57BF327A.4060802@egenix.com>
Message-ID: <57BF3D14.80807@egenix.com>

On 25.08.2016 20:10, Ben Hoyt wrote:
>> As long as Python uses a GIL to protect C level function
>> calls, you can use an iterator for this:
>>
>> import itertools
>> x = itertools.count()
>> ...
>> mycount = next(x)
>>
>
> Yeah, that's a neat hack -- I saw it recommended on StackOverflow, and saw
> it used in the standard library somewhere. I think that's probably okay in
> the *CPython* stdlib, because it's CPython so you know it has the GIL. But
> this wouldn't work in other Python implementations, would it (IronPython
> and Jython don't have a GIL). Or when itertools.count() is implemented in
> pure Python on some system? Seems like it could blow up in someone's face
> when they're least expecting it. I also think using *iter*tools is a pretty
> non-obvious way to get a thread-safe counter.

All true. Having an implementation in threading which hides away
the details would be nice. On CPython using iterators would certainly
be one of the most efficient ways of doing this.
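The two approaches discussed in this thread can be put side by side in a
self-contained sketch. The class below is just Ben's recipe restated;
neither it nor the iterator trick is an existing stdlib API:

```python
import itertools
import threading

class AtomicCounter:
    """Lock-based counter: portable across Python implementations."""
    def __init__(self, initial=0):
        self.value = initial
        self._lock = threading.Lock()

    def increment(self, num=1):
        # Returning inside the lock keeps the read-after-write atomic too.
        with self._lock:
            self.value += num
            return self.value

# GIL-dependent alternative: calling next() on a C-implemented iterator
# is atomic under CPython's GIL, but that is an implementation detail,
# not a language guarantee, and can break on GIL-free implementations.
gil_counter = itertools.count(1)

counter = AtomicCounter()
print(counter.increment())    # 1
print(counter.increment(5))   # 6
print(next(gil_counter))      # 1
```

The lock-based version is the one that stays correct if the GIL
assumption ever fails, which is the portability concern Ben raises.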
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Aug 25 2016) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From guido at python.org Thu Aug 25 14:46:43 2016 From: guido at python.org (Guido van Rossum) Date: Thu, 25 Aug 2016 11:46:43 -0700 Subject: [Python-ideas] Atomic counter / atomic increment In-Reply-To: References: <57BF327A.4060802@egenix.com> Message-ID: The only reason I can think of for wanting this simple class in the stdlib would be that on GIL-free Python implementations you could replace it with a lock-free version. But I think we'd probably want to rethink a bunch of other datastructures to be lock-free, and a 3rd party package on PyPI makes more sense to me than jumping right into the stdlib. On Thu, Aug 25, 2016 at 11:10 AM, Ben Hoyt wrote: > > As long as Python uses a GIL to protect C level function >> calls, you can use an iterator for this: >> >> import itertools >> x = itertools.count() >> ... >> mycount = next(x) >> > > Yeah, that's a neat hack -- I saw it recommended on StackOverflow, and saw > it used in the standard library somewhere. I think that's probably okay in > the *CPython* stdlib, because it's CPython so you know it has the GIL. But > this wouldn't work in other Python implementations, would it (IronPython > and Jython don't have a GIL). Or when itertools.count() is implemented in > pure Python on some system? 
Seems like it could blow up in someone's face > when they're least expecting it. I also think using *iter*tools is a pretty > non-obvious way to get a thread-safe counter. > > -Ben > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicksjacobson at yahoo.com Thu Aug 25 15:27:42 2016 From: nicksjacobson at yahoo.com (Nick Jacobson) Date: Thu, 25 Aug 2016 19:27:42 +0000 (UTC) Subject: [Python-ideas] Adding optional parameter to shutil.rmtree to not delete root. In-Reply-To: References: <1733535507.1841020.1472092145046.JavaMail.yahoo.ref@mail.yahoo.com> <1733535507.1841020.1472092145046.JavaMail.yahoo@mail.yahoo.com> Message-ID: <1469289453.2481903.1472153262615.JavaMail.yahoo@mail.yahoo.com> +1 for clear_dir() I agree that there's no other obvious meaning that it could have. On Thursday, August 25, 2016 9:44 AM, Nick Coghlan wrote: On 25 August 2016 at 23:14, Victor Stinner wrote: > Maybe add a different function rather add a flag? Something like > shutil.remove_dir_files(). The typical Python term for the concept would be "clear", giving shutil.clear_dir(). Like the ".clear()" method on containers, it would delete the contents, but leave the container itself alone. rmtree() could then be a thin wrapper around clear_dir() that also deletes the base directory. Alternatively, if we wanted to stick with the "rm" prefix, we could use "shutil.rmcontents()" as the name. The main downside I'd see to that is that I'd half expect it to work on files as well (truncating them to a length of zero), while clear_dir() is unambiguous. Cheers, Nick. -- Nick Coghlan? |? ncoghlan at gmail.com? |? 
Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Thu Aug 25 15:50:31 2016 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 25 Aug 2016 15:50:31 -0400 Subject: [Python-ideas] Adding optional parameter to shutil.rmtree to not delete root. In-Reply-To: <1469289453.2481903.1472153262615.JavaMail.yahoo@mail.yahoo.com> References: <1733535507.1841020.1472092145046.JavaMail.yahoo.ref@mail.yahoo.com> <1733535507.1841020.1472092145046.JavaMail.yahoo@mail.yahoo.com> <1469289453.2481903.1472153262615.JavaMail.yahoo@mail.yahoo.com> Message-ID: <712E42BD-72AB-49CA-A14F-0DA1F22BA3AC@gmail.com> > On Aug 25, 2016, at 3:27 PM, Nick Jacobson via Python-ideas wrote: > > +1 for clear_dir() > > I agree that there's no other obvious meaning that it could have. +1, but I think "cleardir" would better match naming conventions in the shutil module. (My personal rule of thumb on the use of underscores in function names is to omit them if any name component is abbreviated. So either spell it out as clear_directory or shorten as cleardir.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Aug 25 16:03:32 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 25 Aug 2016 21:03:32 +0100 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160825180211.GA1472@kundert.designers-guide.com> References: <20160825180211.GA1472@kundert.designers-guide.com> Message-ID: On 25 August 2016 at 19:02, Ken Kundert wrote: > This proposal basically has two parts. One part is that Python should > naturally support printing real numbers with SI scale factors. Currently there > are three formats for printing real number: %f, %d, %g. They all become > difficult to read for even moderately large or small numbers. Exponential > notation is hard for humans to read. 
That is why SI scale factors have largely > replaced exponential notation everywhere except in programming. Adding another > format for printing real numbers in human readable form seems like a modest > extension that is long past due in all programming languages. I am asking that > Python be the leader here. I am sure other languages will pick it up once it is > implemented in Python. This part would be easy to implement as a small PyPI module, with a formatting function siformat(num) -> str. If that proved popular (and it well might) then there's more of a case for adding it as a language feature, Maybe as a custom string formatting code, or maybe just as a standard library function. But without that sort of real-world experience, it's likely that any proposal will get bogged down in "what if" theoretical debate. > The second part is the logical dual to the first: input. People should be able > to enter numbers in Python using SI scale factors. This means as real literals, > such as 2.4G, but it should also work with casting, float('2.4G'). Once you > allow SI scale factors on numbers, the natural tendency is for people to want to > add units, which is a good thing because it gives important information about > the number. We should allow it because it improves the code by making it more > self documenting. Even if the language completely ignores the units, we have > still gained by allowing the units to be there, just like we gain when we allow > user to add comments to their code even though the compiler ignores them. This is much more problematic. Currently, even the long-established decimal and rational types don't enjoy syntactic support, so I'd be surprised if SI scale factors got syntax support first. But following their lead, by (again) starting with a conversion function along the lines of SI("3k") -> 3000 would be a good test of applicability. It could easily go in the same module as you're using to trial the string formatting function above. 
I'd expect that a function to do this conversion would be a good way of thrashing out the more controversial aspects of the proposal - whether E means "exponent" or "exa", whether M and G get misinterpreted as computing-style 2**20 and 2**30, etc. Having real world experience of how to solve these questions would be invaluable in moving forward with a proposal to add language support. > Some people have suggested that we take the next step and use the units for > dimensional analysis, but that is highly problematic because you cannot do > dimensional analysis unless everything is specified with the correct units, and > that can be a huge burden for the user. So instead, I am suggesting that we > provide simple hooks that simply allow access to the units. That way people can > build dimensional analysis packages using the units if they felt the need. Dimensional analysis packages already exist (I believe) and they don't rely on syntactic support. Any proposal to add something to the language *really* needs to demonstrate that 1. Those packages are currently suffering from the lack of language support. 2. The proposed change would allow them to resolve existing problems that they haven't been able to address any other way. 3. The proposed change isn't some sort of "attractive nuisance" for naive users, leading them to think they can write dimensionally correct programs *without* using one of the existing packages. Python has a track record of being open to adding syntactic support if it demonstrably helps 3rd party tools (for example, the matrix multiplication operator was added specifically to help the numeric Python folks address a long-standing issue they had), so this is a genuine possibility - but such proposals need support from the groups they are intended to help. 
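The SI("3k") -> 3000 conversion function described above could be
prototyped in a few lines. This is only a sketch: `si_parse` and its
prefix table are made-up names, not an existing library, and 'E' (exa)
is deliberately left out because it collides with exponent notation,
exactly the ambiguity noted above:

```python
# Hypothetical prefix table; 'E' omitted to dodge the exponent clash.
_SI_PREFIXES = {
    'T': 1e12, 'G': 1e9, 'M': 1e6, 'k': 1e3,
    'm': 1e-3, 'u': 1e-6, 'n': 1e-9, 'p': 1e-12,
}

def si_parse(text):
    """Parse a number with an optional SI scale factor: '3k' -> 3000.0."""
    text = text.strip()
    if text and text[-1] in _SI_PREFIXES:
        return float(text[:-1]) * _SI_PREFIXES[text[-1]]
    return float(text)

print(si_parse('3k'))     # 3000.0
print(si_parse('2.4G'))
print(si_parse('500n'))
```

Real-world use of such a helper would quickly surface the corner cases
(binary vs decimal 'M' and 'G', trailing units after the prefix) before
any syntax change is proposed.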
At the moment, I'm not even aware of a particular "dimensional analysis
with Python" community, or any particular "best of breed" package in this
area that might lead such a proposal - and a language change of this
nature probably does need that sort of backing.

That's not to say there's no room for debate here - the proposal is
interesting, and not without precedent (for example Windows Powershell
supports constants of the form 1MB, 1GB - which ironically are
computing-style 2**20 and 2**30 rather than SI-style 10**6 and 10**9).
But there's a pretty high bar for a language change like this, and it's
worth doing the groundwork to avoid wasting a lot of time on something
that's not going to be accepted in its current form.

Hope this helps,
Paul

From python-ideas at shalmirane.com  Thu Aug 25 16:06:05 2016
From: python-ideas at shalmirane.com (Ken Kundert)
Date: Thu, 25 Aug 2016 13:06:05 -0700
Subject: [Python-ideas] SI scale factors in Python
Message-ID: <20160825200605.GB1472@kundert.designers-guide.com>

Here is a fairly typical example that illustrates the usefulness of
supporting SI scale factors and units in Python.

This is simple simulation code that is used to test a current mirror
driving a 100kOhm resistor ...

Here is some simulation code that uses SI scale factors

    for delta in [-500nA, 0, 500nA]:
        input = 2.75uA + delta
        wait(1us)
        expected = 100kOhm*(2.75uA + delta)
        tolerance = 2.2mV
        fails = check_output(expected, tolerance)
        print('%s: I(in)=%rA, measured V(out)=%rV, expected V(out)=%rV, diff=%rV.' % (
            'FAIL' if fails else 'pass',
            input, get_output(), expected, get_output() - expected
        ))

with the output being:

    pass: I(in)=2.25uA, measured V(out)=226.7mV, expected V(out)=225mV, diff=1.7mV.
    pass: I(in)=2.75uA, measured V(out)=276.8mV, expected V(out)=275mV, diff=1.8mV.
    FAIL: I(in)=3.25uA, measured V(out)=327.4mV, expected V(out)=325mV, diff=2.4mV.

And the same code in Python today ...
    for delta in [-5e-7, 0, 5e-7]:
        input = 2.75e-6 + delta
        wait(1e-6)
        expected = 1e5*(2.75e-6 + delta)
        tolerance = 2.2e-3
        fails = check_output(expected, tolerance)
        print('%s: I(in)=%eA, measured V(out)=%eV, expected V(out)=%eV, diff=%eV.' % (
            'FAIL' if fails else 'pass',
            input, get_output(), expected, get_output() - expected
        ))

with the output being:

    pass: I(in)=2.25e-6A, measured V(out)=226.7e-3V, expected V(out)=225e-3V, diff=1.7e-3V.
    pass: I(in)=2.75e-6A, measured V(out)=276.8e-3V, expected V(out)=275e-3V, diff=1.8e-3V.
    FAIL: I(in)=3.25e-6A, measured V(out)=327.4e-3V, expected V(out)=325e-3V, diff=2.4e-3V.

There are two things to notice. In the first example the numbers are easier
to read: for example, 500nA is easier to read than 5e-7. Second, there is
information in the units that provides useful information. One can easily
see that the input signal is a current and that the output is a voltage.
Furthermore, anybody can look at this code and do a simple sanity check on
the expressions even if they don't understand the system being simulated.
They can check the units on the input and output expressions to assure that
they are all consistent.

-Ken

From rosuav at gmail.com  Thu Aug 25 18:43:21 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 26 Aug 2016 08:43:21 +1000
Subject: [Python-ideas] SI scale factors in Python
In-Reply-To: <20160825200605.GB1472@kundert.designers-guide.com>
References: <20160825200605.GB1472@kundert.designers-guide.com>
Message-ID: 

On Fri, Aug 26, 2016 at 6:06 AM, Ken Kundert wrote:
> Here is some simulation code that uses SI scale factors
>
>     for delta in [-500nA, 0, 500nA]:
>         input = 2.75uA + delta
>         wait(1us)
>         expected = 100kOhm*(2.75uA + delta)
>         tolerance = 2.2mV
>         fails = check_output(expected, tolerance)
>         print('%s: I(in)=%rA, measured V(out)=%rV, expected V(out)=%rV, diff=%rV.' % (
>             'FAIL' if fails else 'pass',
>             input, get_output(), expected, get_output() - expected
>         ))
>
> And the same code in Python today ...
>
>     for delta in [-5e-7, 0, 5e-7]:
>         input = 2.75e-6 + delta
>         wait(1e-6)
>         expected = 1e5*(2.75e-6 + delta)
>         tolerance = 2.2e-3
>         fails = check_output(expected, tolerance)
>         print('%s: I(in)=%eA, measured V(out)=%eV, expected V(out)=%eV, diff=%eV.' % (
>             'FAIL' if fails else 'pass',
>             input, get_output(), expected, get_output() - expected
>         ))

Rename a few things:

    for deltaA in [-5e-7, 0, 5e-7]:
        inputA = 2.75e-6 + deltaA
        wait(1e-6)
        expectedA = 1e5*(2.75e-6 + deltaA)
        toleranceV = 2.2e-3
        fails = check_output(expectedA, toleranceV)
        print('%s: I(in)=%eA, measured V(out)=%eV, expected V(out)=%eV, diff=%eV.' % (
            'FAIL' if fails else 'pass',
            inputA, get_output(), expectedA, get_output() - expectedA
        ))

I may have something wrong here (for instance, I'm not sure if
check_output should be taking an expected amperage and a tolerance in
volts), but you get the idea: it's the variables, not the literals, that
get tagged.

Another way to do this would be to use MyPy and type hinting to create
several subtypes of floating-point number:

    class V(float): pass
    class A(float): pass

    for delta in [A(-5e-7), A(0), A(5e-7)]:
        input = A(2.75e-6) + delta
        # etc

or in some other way have static analysis tag the variables, based on
their origins.

> There are two things to notice. In the first example the numbers are easier to
> read: for example, 500nA is easier to read the 5e-7.

This is useful, but doesn't depend on the "A" at the end.

> Second, there is
> information in the units that provides provides useful information. One can
> easily see that the input signal is a current and that the output is a voltage.
> Furthermore, anybody can look at this code and do a simple sanity check on the
> expressions even if they don't understand the system being simulated. They can
> check the units on the input and output expressions to assure that they are all
> consistent.
These two are better served by marking the variables rather than the
literals; in fact, if properly done, the tagging can be verified by a
program, not just a human. (That's what tools like MyPy are aiming to do.)

ChrisA

From xavier.combelle at gmail.com  Thu Aug 25 19:35:04 2016
From: xavier.combelle at gmail.com (Xavier Combelle)
Date: Fri, 26 Aug 2016 01:35:04 +0200
Subject: [Python-ideas] SI scale factors in Python
In-Reply-To: <20160825200605.GB1472@kundert.designers-guide.com>
References: <20160825200605.GB1472@kundert.designers-guide.com>
Message-ID: <56ad62dc-fcfd-760a-9d65-c661a3b2468f@gmail.com>

On 25/08/2016 22:06, Ken Kundert wrote:
> Here is a fairly typical example that illustrates the usefulness of supporting
> SI scale factors and units in Python.
>
> This is simple simulation code that is used to test a current mirror driving an
> 100kOhm resistor ...
>
> Here is some simulation code that uses SI scale factors
>
>     for delta in [-500nA, 0, 500nA]:
>         input = 2.75uA + delta
>         wait(1us)
>         expected = 100kOhm*(2.75uA + delta)
>         tolerance = 2.2mV
>         fails = check_output(expected, tolerance)
>         print('%s: I(in)=%rA, measured V(out)=%rV, expected V(out)=%rV, diff=%rV.' % (
>             'FAIL' if fails else 'pass',
>             input, get_output(), expected, get_output() - expected
>         ))
>
> with the output being:
>
>     pass: I(in)=2.25uA, measured V(out)=226.7mV, expected V(out)=225mV, diff=1.7mV.
>     pass: I(in)=2.75uA, measured V(out)=276.8mV, expected V(out)=275mV, diff=1.8mV.
>     FAIL: I(in)=3.25uA, measured V(out)=327.4mV, expected V(out)=325mV, diff=2.4mV.
>
> And the same code in Python today ...
>
>     for delta in [-5e-7, 0, 5e-7]:
>         input = 2.75e-6 + delta
>         wait(1e-6)
>         expected = 1e5*(2.75e-6 + delta)
>         tolerance = 2.2e-3
>         fails = check_output(expected, tolerance)
>         print('%s: I(in)=%eA, measured V(out)=%eV, expected V(out)=%eV, diff=%eV.'
% (
>             'FAIL' if fails else 'pass',
>             input, get_output(), expected, get_output() - expected
>         ))
>
> with the output being:
>
>     pass: I(in)=2.25e-6A, measured V(out)=226.7e-3V, expected V(out)=225e-3V, diff=1.7e-3V.
>     pass: I(in)=2.75e-6A, measured V(out)=276.8e-3V, expected V(out)=275e-3V, diff=1.8e-3V.
>     FAIL: I(in)=3.25e-6A, measured V(out)=327.4e-3V, expected V(out)=325e-3V, diff=2.4e-3V.
>
> There are two things to notice. In the first example the numbers are easier to
> read: for example, 500nA is easier to read the 5e-7. Second, there is
> information in the units that provides provides useful information. One can
> easily see that the input signal is a current and that the output is a voltage.
> Furthermore, anybody can look at this code and do a simple sanity check on the
> expressions even if they don't understand the system being simulated. They can
> check the units on the input and output expressions to assure that they are all
> consistent.
>
> -Ken
> _______________________________________________
>

And the same code with python today

    def wait(delay):
        pass

    def check_output(expected, tolerance):
        return False

    def get_output():
        return 1*uA

    def f():
        for delta in [-500*nA, 0, 500*nA]:
            input = 2.75*uA + delta
            wait(1*us)
            expected = 100*kOhm*(2.75*uA + delta)
            tolerance = 2.2*mV
            fails = check_output(expected, tolerance)
            print('%s: I(in)=%rA, measured V(out)=%rV, expected V(out)=%rV, diff=%rV.' % (
                'FAIL' if fails else 'pass',
                input, get_output(), expected, get_output() - expected
            ))

    f()

    pass: I(in)=2.25e-05A, measured V(out)=1e-05V, expected V(out)=22.5V, diff=-22.49999V.
    pass: I(in)=2.75e-05A, measured V(out)=1e-05V, expected V(out)=27.5V, diff=-27.49999V.
    pass: I(in)=3.2500000000000004e-05A, measured V(out)=1e-05V, expected V(out)=32.50000000000001V, diff=-32.499990000000004V.

Do you really see fundamental difference with your original code?
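Xavier's rewrite above relies only on ordinary names bound to scale
factors. Isolating just that trick in a runnable sketch (the constant
names are assumptions, and Ohm's law stands in for the simulator calls):

```python
# SI prefixes as plain multiplicative constants -- today's Python,
# no new literal syntax required.
nA = 1e-9
uA = 1e-6
mV = 1e-3
kOhm = 1e3

for delta in (-500*nA, 0.0, 500*nA):
    current = 2.75*uA + delta
    expected = 100*kOhm * current          # Ohm's law: V = R * I
    print('I(in)=%.2fuA -> expected V(out)=%.1fmV'
          % (current / uA, expected / mV))
```

Dividing by the same constants on output recovers the scaled display
(2.25uA -> 225.0mV, and so on), which is most of the readability Ken is
after, without any language change.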
From steve at pearwood.info Thu Aug 25 19:50:53 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 26 Aug 2016 09:50:53 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: References: <20160825180211.GA1472@kundert.designers-guide.com> Message-ID: <20160825235052.GJ26300@ando.pearwood.info> On Thu, Aug 25, 2016 at 09:03:32PM +0100, Paul Moore wrote: > That's not to say there's no room for debate here - the proposal is > interesting, and not without precedent (for example Windows Powershell > supports constants of the form 1MB, 1GB That's great! I know a few command line tools and scripts which do that, and it's really useful. > - which ironically are > computing-style 2*20 and 2*30 rather than SI-style 10*6 and 10*9). Do I remember correctly that Windows file Explorer displays disk sizes is decimal SI units? If so, how very Microsoft, to take a standard and confuse it rather than encourage it :-( Historically, there are *three* different meanings for "MB", only one of which is an official standard: http://physics.nist.gov/cuu/Units/binary.html -- Steve From steve at pearwood.info Thu Aug 25 20:14:53 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 26 Aug 2016 10:14:53 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160825180211.GA1472@kundert.designers-guide.com> References: <20160825180211.GA1472@kundert.designers-guide.com> Message-ID: <20160826001453.GK26300@ando.pearwood.info> On Thu, Aug 25, 2016 at 11:02:11AM -0700, Ken Kundert wrote: > Once you > allow SI scale factors on numbers, the natural tendency is for people to want to > add units, which is a good thing because it gives important information about > the number. We should allow it because it improves the code by making it more > self documenting. 
Even if the language completely ignores the units, we have > still gained by allowing the units to be there, just like we gain when we allow > user to add comments to their code even though the compiler ignores them. This is dangerously wrong, and the analogy with comments is misleading. Everyone knows that comments are ignored by the interpreter, and even then, the ideal is to write self-documenting code, not comments: "At Resolver we've found it useful to short-circuit any doubt and just refer to comments in code as 'lies'. " --Michael Foord paraphrases Christian Muirhead on python-dev, 2009-03-22 This part of your proposal would be *worse*: you would fool the casual or naive user into believing that Python did dimensional analysis, while in fact not doing so. You would give them a false sense of security. Don't think of people writing code like this: result = 23mA + 75MHz which is obviously wrong. Think about them writing code like this: total = sum_resistors_in_parallel(input, extra) where the arguments may themselves have been passed to the current function as parameters from somewhere else. Or they may be data values read from a file. Their definitions may be buried deep in another part of the program. Their units aren't obvious to the reader without serious work. This part of your proposal makes the language *worse*: we lose the simple data validation that "23mA" is not a valid number, but without gaining the protection of dimensional analysis. To give an analogy, you are suggesting that we stick a sticker on the dashboard of our car saying "Airbag" but without actually installing an airbag. And you've removed the seat belt. The driver has to read the manual to learn that the "Airbag" is just a label, not an actual functioning airbag. 
> Some people have suggested that we take the next step and use the units for > dimensional analysis, but that is highly problematic because you cannot do > dimensional analysis unless everything is specified with the correct units, and > that can be a huge burden for the user. What? That's the *whole point* of dimensional analysis: to ensure that the user is not adding a length to a weight and then treating the result as a time. To say that "it is too hard to specify the correct units, so we should just ignore the units" boggles my mind. Any reasonable dimensional program should perform automatic unit conversions: you can add inches to metres, but not inches to pounds. There are already many of these available for Python. -- Steve From ncoghlan at gmail.com Thu Aug 25 22:19:53 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 26 Aug 2016 12:19:53 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160825200605.GB1472@kundert.designers-guide.com> References: <20160825200605.GB1472@kundert.designers-guide.com> Message-ID: On 26 August 2016 at 06:06, Ken Kundert wrote: > Here is a fairly typical example that illustrates the usefulness of supporting > SI scale factors and units in Python. Ken, To build a persuasive case, you'll find it's necessary to stop comparing Python-with-syntactic-support to Python-with-no-syntactic-support-and-no-third-party-libraries, and instead bring in the existing dimensional analysis libraries and tooling, and show how this change would improve *those*. 
That is, compare your proposed syntax *not* to plain Python code, but to Python code using some of the support libraries Steven D'Aprano mentioned, whether that's SymPy (as in http://docs.sympy.org/latest/modules/physics/unitsystems/examples.html ) or one of the other unit conversion libraries (as in http://stackoverflow.com/questions/2125076/unit-conversion-in-python ) It's also worth explicitly highlighting that the main intended beneficiaries would *NOT* be traditional software applications (where the data entry UI is clearly distinct from the source code) and instead engineers, scientists, and other data analysts using environments like Project Jupyter, where the code frequently *is* the data entry UI. And given that target audience, a further question that needs to be addressed is whether or not native syntactic support would be superior to what's already possible through Jupyter/IPython cell magics like https://bitbucket.org/birkenfeld/ipython-physics The matrix multiplication PEP (https://www.python.org/dev/peps/pep-0465/ ) is one of the best examples to study on how to make this kind of case well - it surveys the available options, explains how they all have a shared challenge with the status quo, and requests the simplest possible enabling change to the language definition to help them solve the problem. One potentially fruitful argument to pursue might be to make Python a better tool for teaching maths and science concepts at primary and secondary level (since Jupyter et al are frequently seen as introducing too much tooling complexity to be accessible at that level), but again, you'd need to explore whether or not anyone is currently using Python in that way, and what their feedback is in terms of the deficiencies of the status quo. Regards, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From random832 at fastmail.com Thu Aug 25 23:34:23 2016 From: random832 at fastmail.com (Random832) Date: Thu, 25 Aug 2016 23:34:23 -0400 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160825235052.GJ26300@ando.pearwood.info> References: <20160825180211.GA1472@kundert.designers-guide.com> <20160825235052.GJ26300@ando.pearwood.info> Message-ID: <1472182463.1965630.706572961.46E190EF@webmail.messagingengine.com> On Thu, Aug 25, 2016, at 19:50, Steven D'Aprano wrote: > Historically, there are *three* different meanings for "MB", only one of > which is an official standard: > > http://physics.nist.gov/cuu/Units/binary.html The link doesn't work for me... is the third one the 1,024,000 bytes implicit in describing standard-formatted floppy disks as "1.44 MB" (they are actually 1440 KiB: 80 tracks, 2 sides, 18 512-byte sectors) or "1.2 MB" (15 sectors). From python-ideas at shalmirane.com Thu Aug 25 23:46:54 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Thu, 25 Aug 2016 20:46:54 -0700 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160826001453.GK26300@ando.pearwood.info> References: <20160825180211.GA1472@kundert.designers-guide.com> <20160826001453.GK26300@ando.pearwood.info> Message-ID: <20160826034654.GA24190@kundert.designers-guide.com> On Fri, Aug 26, 2016 at 10:14:53AM +1000, Steven D'Aprano wrote: > On Thu, Aug 25, 2016 at 11:02:11AM -0700, Ken Kundert wrote: > > > Even if the language completely ignores the units, we have still gained by > > allowing the units to be there, just like we gain when we allow user to add > > comments to their code even though the compiler ignores them. > > This part of your proposal would be *worse*: you would fool the casual or > naive user into believing that Python did dimensional analysis, while in fact > not doing so. You would give them a false sense of security.
This idea is new to general purpose languages, but it has been used for over 40 years in the circuit design community. Specifically, SPICE, an extremely heavily used circuit simulation package, introduced this concept in 1974. In SPICE the scale factor is honored but anything after the scale factor is ignored. Being both a heavy user and developer of SPICE, I can tell you that in all that time this issue has never come up. In fact, the users never expected there to be any support for dimensional analysis, nor did they request it. > Don't think of people writing code like this: > > result = 23mA + 75MHz > > which is obviously wrong. Think about them writing code like this: > > total = sum_resistors_in_parallel(input, extra) You say that '23mA + 75MHz' is obviously wrong, but it is only obviously wrong because the units are included, which is my point. If I had written '0.023 + 75e6', it would not be obviously wrong. > > Some people have suggested that we take the next step and use the units for > > dimensional analysis, but that is highly problematic because you cannot do > > dimensional analysis unless everything is specified with the correct units, > > and that can be a huge burden for the user. > > What? > > That's the *whole point* of dimensional analysis: to ensure that the > user is not adding a length to a weight and then treating the result as > a time. To say that "it is too hard to specify the correct units, so we > should just ignore the units" boggles my mind. > > Any reasonable dimensional program should perform automatic unit > conversions: you can add inches to metres, but not inches to pounds. > There are already many of these available for Python. Indeed that is the point of dimensional analysis. However, despite the availability of all these packages, they are rarely if ever used because you have to invest a tremendous effort before they can be effective.
For example, consider the simple case of Ohm's Law: V = R*I To perform dimensional analysis we need to know the units of V, R, and I. These are variables not literals, so some mechanism needs to be provided for specifying the units of variables, even those that don't exist yet, like V. And what if the following is encountered: V = I Dimensional analysis says this is wrong, but it may be that the resistance is simply being suppressed because it is unity. False positives of this sort are a tremendous problem with this form of automated dimensional analysis. Then there are things like this: V = 2*sin(f*t) In this case dimensional analysis (DA) indicates an error, but it is the wrong error. DA will complain about the fact that a dimensionless number is being assigned to a variable intended to carry a voltage. But in this case 2 has units of voltage, but they were not explicitly specified, so this is another false positive. The real error is the argument of the sin function. The sin function expects radians, which is dimensionless, and f*t is dimensionless, so there is no complaint, but it is not in radians, and so there should be an error. You could put a 'unit' of radians on pi, but that is not right either. Really it is 2*pi that gives you radians, and if you put radians on pi and then used pi to compute the area of a unit circle, you would get pi*r^2 where r=1 meter, and the resulting units would be radians*m^2, which is nonsensical. Turns out there are many kinds of dimensionless numbers. For example, the following represents a voltage amplifier: Av = 4 # voltage gain (V/V) Ai = 0.03 # current gain (A/A) Vout = Ai*Vin In this case Ai is expected to be unitless, which it is, so there is no error. However Ai is the ratio of two currents, not two voltages, so there actually should be an error. Now consider an expression that contains an arbitrary function call: V = f(I) How do we determine the units of the return value of f?
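To make the specification burden concrete, here is a minimal sketch of the machinery even toy dimensional checking requires (a hypothetical Quantity class invented for illustration, not any existing package):

```python
import math

class Quantity:
    """A number tagged with a units string (illustration only)."""
    def __init__(self, value, units):
        self.value = value
        self.units = units          # e.g. 'V', 'A', 'Ohm'

    def __add__(self, other):
        # Addition is only meaningful between like dimensions.
        if not isinstance(other, Quantity) or other.units != self.units:
            raise TypeError('dimension mismatch in addition')
        return Quantity(self.value + other.value, self.units)

    def __mul__(self, other):
        if isinstance(other, Quantity):
            # 'Ohm*A' ought to simplify to 'V', but only if the user
            # supplies a rule table saying so.
            return Quantity(self.value * other.value,
                            '%s*%s' % (self.units, other.units))
        # A bare number multiplies through silently: the 2 in
        # 2*sin(f*t) carries no units, exactly the hole described above.
        return Quantity(self.value * other, self.units)

R = Quantity(100e3, 'Ohm')
I = Quantity(2.75e-6, 'A')
V = R * I      # units come out as 'Ohm*A', not 'V'
```

Every variable had to be tagged by hand, and V = R*I still is not recognized as a voltage without yet more user-supplied rules.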
Now, finally consider the BSIM4 MOSFET model equations. They are described in http://www-device.eecs.berkeley.edu/bsim/Files/BSIM4/BSIM460/doc/BSIM460_Manual.pdf If you look at this document you will find over 200 pages of extremely complicated and tedious model equations. The parameters of these models can have extremely complicated units. Well beyond anything I am proposing for real literals. For example, consider NOIA, the 'flicker noise parameter A'. It has units of (eV)^-1 s^(1-EF) m^-3. Where EF is some number between 0 and 1. That will never work with dimensional analysis because it does not even make sense from the perspective of dimensional analysis. Describing all of the units in those equations would be a huge undertaking, and in the end they would end up with errors they cannot get rid of. Dimensional analysis is a seductive siren that, in the end, demands a great deal of you and generally delivers very little. And it has little to do with my proposal, which is basically this: Numbers with SI scale factor and units have become very popular. Using them is a very common way of expressing either large or small numbers. And that is true in the scientific and engineering communities, in the programming community (even the linux sort command supports sorting on numbers with SI scale factors: --human-numeric-sort), and even in popular culture. Python should support them. And I mean support with a capital S. I can come up with many different hacks to support these ideas in Python today, and I have. But this should not be a hack. This should be built into the language front and center. It should be the preferred way that we specify and output real numbers.
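For comparison, here is what the string-based hack looks like today — a sketch only (the helper name parse_si is invented, and it adopts SPICE's rule of ignoring anything after the scale factor):

```python
import re

# SI scale factors; note that uppercase 'E' (exa) is what collides
# with exponent notation in literal syntax, hence the X proposal.
_SCALE = {
    'Y': 1e24, 'Z': 1e21, 'E': 1e18, 'P': 1e15, 'T': 1e12, 'G': 1e9,
    'M': 1e6, 'k': 1e3, 'm': 1e-3, 'u': 1e-6, 'n': 1e-9, 'p': 1e-12,
    'f': 1e-15, 'a': 1e-18, 'z': 1e-21, 'y': 1e-24,
}

def parse_si(text):
    """Parse '2.75uA' or '100k' to a float.

    Trailing units are ignored, as in SPICE.  Exponents must be
    lowercase ('1e6'), since an uppercase 'E' reads as exa here.
    """
    m = re.match(r'([-+]?[0-9.]+(?:e[-+]?\d+)?)\s*([YZEPTGMkmunpfazy]?)', text)
    if m is None:
        raise ValueError('not a number: %r' % text)
    return float(m.group(1)) * _SCALE.get(m.group(2), 1.0)
```

Because it lives in a string parser rather than the language, it only helps at the data-entry boundary, which is the gap the proposal aims at.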
-Ken From ncoghlan at gmail.com Fri Aug 26 01:34:03 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 26 Aug 2016 15:34:03 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160826034654.GA24190@kundert.designers-guide.com> References: <20160825180211.GA1472@kundert.designers-guide.com> <20160826001453.GK26300@ando.pearwood.info> <20160826034654.GA24190@kundert.designers-guide.com> Message-ID: On 26 August 2016 at 13:46, Ken Kundert wrote: > On Fri, Aug 26, 2016 at 10:14:53AM +1000, Steven D'Aprano wrote: >> On Thu, Aug 25, 2016 at 11:02:11AM -0700, Ken Kundert wrote: >> >> > Even if the language completely ignores the units, we have still gained by >> > allowing the units to be there, just like we gain when we allow user to add >> > comments to their code even though the compiler ignores them. >> >> This part of your proposal would be *worse*: you would fool the casual or >> naive user into believing that Python did dimensional analysis, while in fact >> not doing so. You would give them a false sense of security. > > This idea is new to general purpose languages, but it has been used for over 40 > years in the circuit design community. Specifically, SPICE, an extremely heavily > used circuit simulation package, introduced this concept in 1974. In SPICE the > scale factor is honored but any thing after the scale factor is ignored. Being > both a heavy user and developer of SPICE, I can tell you that in all that time > this issue has never come up. In fact, the users never expected there to be any > support for dimensional analysis, nor did they request it. [snip] > And it has little to do with my proposal, which is basically this: > > Numbers with SI scale factor and units have become very popular. Using them is > a very common way of expressing either large or small numbers. 
And that is true > in the scientific and engineering communities, in the programming community > (even the linux sort command supports sorting on numbers with SI scale factors: > --human-numeric-sort), and even in popular culture. > > Python should support them. And I mean support with a capital S. I can come up > with many different hacks to support these ideas in Python today, and I have. > But this should not be a hack. This should be built into the language front and > center. It should be the preferred way that we specify and output real numbers. Thanks for the additional background Ken - that does start to build a much more compelling case. I now think there's another analogy you'll be able to draw on to make it even more compelling at a language design level: just because the *runtime* doesn't do dimensional analysis on static unit annotations doesn't mean that sufficiently clever static analysers couldn't do so at some point in the future. That then puts this proposal squarely in the same category as function annotations and gradual typing: semantic annotations that more clearly expressed developer intent, and aren't checked at runtime, but can be checked by a human during code review, and (optionally) by static analysers as a quality gate. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From pavol.lisy at gmail.com Fri Aug 26 01:35:36 2016 From: pavol.lisy at gmail.com (Pavol Lisy) Date: Fri, 26 Aug 2016 07:35:36 +0200 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160825081955.GA21350@kundert.designers-guide.com> References: <20160825081955.GA21350@kundert.designers-guide.com> Message-ID: On 8/25/16, Ken Kundert wrote: [...] > Just allowing the units to be present, even if not > > retained, is a big advantage because it can bring a great deal of clarity to > the > meaning of the number.
For example, even if the language does not flag an > error > when a user writes: > > vdiff = 1mV - 30uA It reminds me: "Metric mishap caused loss of NASA's Mars Climate Orbiter." It could be nice to have language support helping to avoid something similar. [...] > 2. Or accept X in lieu of E. After all, the e is silent anyway. Thus, on input > we accept ... > > 1Y -> 1e+24 > 1Z -> 1e+21 > -> 1X -> 1e+18 <- only difference > 1P -> 1e+15 Are SI prefixes frozen? Would it not be safer to use E_ instead of X, in case of possible future new prefixes? ------ What you are proposing reminds me of "[Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484", except with constant type declarations instead. Sorry, this is just a really quick idea, but it could be good to have parser support for checking metric mishaps. distance1 = 1:km # or? -> distance1:length = 1:km distance2 = 1000:cm # or? -> distance2:length = 1000:cm length = distance1 + distance2 # our parser could yell :) (or compiler could translate it with warning) From levkivskyi at gmail.com Fri Aug 26 02:31:10 2016 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Fri, 26 Aug 2016 09:31:10 +0300 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: References: <20160825180211.GA1472@kundert.designers-guide.com> <20160826001453.GK26300@ando.pearwood.info> <20160826034654.GA24190@kundert.designers-guide.com> Message-ID: On 26 August 2016 at 08:34, Nick Coghlan wrote: > On 26 August 2016 at 13:46, Ken Kundert > wrote: > > On Fri, Aug 26, 2016 at 10:14:53AM +1000, Steven D'Aprano wrote: > >> On Thu, Aug 25, 2016 at 11:02:11AM -0700, Ken Kundert wrote: > >> > >> > Even if the language completely ignores the units, we have still > gained by > >> > allowing the units to be there, just like we gain when we allow user > to add > >> > comments to their code even though the compiler ignores them.
> >> > >> This part of your proposal would be *worse*: you would fool the casual > or > >> naive user into believing that Python did dimensional analysis, while > in fact > >> not doing so. You would give them a false sense of security. > > > > This idea is new to general purpose languages, but it has been used for > over 40 > > years in the circuit design community. Specifically, SPICE, an extremely > heavily > > used circuit simulation package, introduced this concept in 1974. In > SPICE the > > scale factor is honored but any thing after the scale factor is > ignored. Being > > both a heavy user and developer of SPICE, I can tell you that in all > that time > > this issue has never come up. In fact, the users never expected there to > be any > > support for dimensional analysis, nor did they request it. > > [snip] > > > And it has little to do with my proposal, which is basically this: > > > > Numbers with SI scale factor and units have become very popular. Using > them is > > a very common way of expressing either large or small numbers. And that > is true > > in the scientific and engineering communities, in the programming > community > > (even the linux sort command supports sorting on numbers with SI scale > factors: > > --human-numeric-sort), and even in popular culture. > > > > Python should support them. And I mean support with a capital S. I can > come up > > with many different hacks to support these ideas in Python today, and I > have. > > But this should not be a hack. This should be built into the language > front and > > center. It should be the preferred way that we specify and output real > numbers. > > Thanks for the additional background Ken - that does start to build a > much more compelling case. 
> > I now think there's another analogy you'll be able to draw on to make > it even more compelling at a language design level: just because the > *runtime* doesn't do dimensional analysis on static unit annotations > doesn't mean that sufficiently clever static analysers couldn't do so > at some point in the future. That then puts this proposal squarely in > the same category as function annotations and gradual typing: semantic > annotations that more clearly expressed developer intent, and aren't > checked at runtime, but can be checked by a human during code review, > and (optionally) by static analysers as a quality gate. > Unfortunately, I didn't read the whole thread, but it seems to me that this would be just a more sophisticated version of NewType. mypy type checker already supports NewType (not sure about pytype). So that one can write (assuming PEP 526): USD = NewType('USD', float) EUR = NewType('EUR', float) amount = EUR(100) # later in code new_amount: USD = amount # flagged as error by type checker The same idea applies to physical units. Of course type checkers do not know that e.g. 1m / 1s is 1 m/s, but it is something they could be taught (for example by adding @overload for division operator). -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Fri Aug 26 02:54:03 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 26 Aug 2016 16:54:03 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160826034654.GA24190@kundert.designers-guide.com> References: <20160825180211.GA1472@kundert.designers-guide.com> <20160826001453.GK26300@ando.pearwood.info> <20160826034654.GA24190@kundert.designers-guide.com> Message-ID: <20160826065403.GL26300@ando.pearwood.info> On Thu, Aug 25, 2016 at 08:46:54PM -0700, Ken Kundert wrote: > This idea is new to general purpose languages, For the record, at least some HP calculators include a "units" data type as part of the programming language "RPL", e.g. the HP-28 and HP-48 series. I've been using those for 20+ years so I'm quite familiar with how useful this feature can be. > but it has been used for over 40 > years in the circuit design community. Specifically, SPICE, an extremely heavily > used circuit simulation package, introduced this concept in 1974. In SPICE the > scale factor is honored but any thing after the scale factor is ignored. Being > both a heavy user and developer of SPICE, I can tell you that in all that time > this issue has never come up. In fact, the users never expected there to be any > support for dimensional analysis, nor did they request it. I can't comment about the circuit design community, but you're trying to extrapolate from a single specialist application to a general purpose programming language used by people of many, many varied levels of expertise, of competence, with many different needs. It makes a lot of sense for applications to allow SI prefixes as suffixes within a restricted range. 
For example, the dd application allows the user to specify the amount of data to copy using either bytes or blocks, with optional suffixes: BLOCKS and BYTES may be followed by the following multiplicative suffixes: xM M, c 1, w 2, b 512, kB 1000, K 1024, MB 1000*1000, M 1024*1024, GB 1000*1000*1000, G 1024*1024*1024, and so on for T, P, E, Z, Y. (Quoting from the man page.) That makes excellent sense for a specialist application where numeric quantities always mean the same thing, or in this case, one of two things. As purely multiplicative suffixes, that even makes sense for Python: earlier I said that it was a good idea to add a simple module defining SI and IEC multiplicative constants in the std lib so that we could do x = 42*M or similar. But that's a far cry from allowing and ignoring units. > > Don't think of people writing code like this: > > > > result = 23mA + 75MHz > > > > which is obviously wrong. Think about them writing code like this: > > > > total = sum_resistors_in_parallel(input, extra) > > You say that '23mA + 75MHz' is obviously wrong, but it is only obviously wrong > because the units are included, which is my point. If I had written '0.023 > + 75e6', it would not be obviously wrong. I understand your point and the benefit of dimensional analysis. But the problem is, as users of specialised applications we may be used to doing direct arithmetic on numeric literal values, with or without attached units: 23mA + 75MHz # error is visible 23 + 75 # error is hidden but as *programmers* we rarely do that. Generally speaking, it is rare to be doing arithmetic on literals where we might have the opportunity to attach a unit. We're doing arithmetic on *variables* that have come from elsewhere.
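(As an aside, the SI and IEC constants module mentioned above needs nothing new from the language — a sketch, with illustrative names:)

```python
# SI (decimal) multipliers
k, M, G, T = 1e3, 1e6, 1e9, 1e12
m, u, n, p = 1e-3, 1e-6, 1e-9, 1e-12

# IEC (binary) multipliers -- exact integers
Ki, Mi, Gi, Ti = 2 ** 10, 2 ** 20, 2 ** 30, 2 ** 40

size = 42 * M       # 42 million, the decimal megabyte
mem = 512 * Mi      # 512 mebibytes, exactly 536870912 bytes
```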
Reading the source code doesn't show us something that might be improved by adding a unit: # we hardly ever see this 23 + 75 # we almost always see something like this input + argument At best, we can choose descriptive variable names that hint what the correct dimensions should be: weight_of_contents + weight_of_container The argument that units would make it easier for the programmer to spot errors is, I think, invalid, because the programmer will hardly ever get to see the units. [...] > Indeed that is the point of dimensional analysis. However, despite the > availability of all these packages, they are rarely if ever used because you > have to invest a tremendous effort before they can be effective. For example, > consider the simple case of Ohms Law: > > V = R*I > > To perform dimensional analysis we need to know the units of V, R, and I. These > are variables not literals, so some mechanism needs to be provided for > specifying the units of variables, even those that don't exist yet, like V. This is not difficult, and you exaggerate the amount of effort required. To my sorrow, I'm not actually familiar with any of the Python libraries for this, so I'll give an example using the HP-48GX RPL language. Suppose I have a value which I am expecting to be current in amperes. (On the HP calculator, it will be passed to me on the stack, but the equivalent in Python will be an argument passed to a function.) For simplicity, let's assume that if it is a raw number, I will trust that the user knows what they are doing and just convert it to a unit object with dimension "ampere", otherwise I expect some sort of unit object which is dimensionally compatible: 1_A CONV is the RPL program to perform this conversion on the top of the stack, and raise an error if the dimensions are incompatible. Converting to a more familiar Python-like API, I would expect something like: current = Unit("A").convert(current) or possibly: current = Unit("A", current) take your pick.
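Fleshed out, such an API might look like this minimal sketch (the class and its conversion table are purely illustrative, not any existing library):

```python
class Unit:
    """A dimension tag with a coerce-and-check entry point (sketch)."""

    # factors that convert compatible unit names into the base unit
    _FACTORS = {
        'A': {'A': 1.0, 'mA': 1e-3, 'uA': 1e-6},
        'V': {'V': 1.0, 'mV': 1e-3},
    }

    def __init__(self, name):
        self.name = name

    def convert(self, value):
        # Raw numbers are trusted, as in the RPL example above.
        if isinstance(value, (int, float)):
            return float(value)
        # Otherwise expect a (number, unit-name) pair.
        number, units = value
        try:
            return number * self._FACTORS[self.name][units]
        except KeyError:
            raise TypeError('cannot convert %r to %r' % (units, self.name))

current = Unit('A').convert((23, 'mA'))    # coerced into amperes
```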
That's not a "tremendous" amount of effort, it is comparable to ensuring that I'm using (say) floats in the first place: if not isinstance(current, float): raise TypeError > And what if the following is encountered: > > V = I > > Dimensional analysis says this is wrong, That's because it is wrong. > but it may be that the resistance > is simply being suppressed because it is unity. Your specialist experience in the area of circuit design is misleading you. There's effectively only one unit of resistance, the ohm, although my "units" program also lists: R_K 25812.807 ohm abohm abvolt / abamp intohm 1.000495 ohm kilohm kiloohm megohm megaohm microhm microohm ohm V/A siemensunit 0.9534 ohm statohm statvolt / statamp So even here with resistance "unity" is ambiguous. Do you mean one intohm, one microhm, one statohm or something else? I'll grant you that in the world of circuit design perhaps it could only mean the SI ohm. But you're not in the world of circuit design any more, you are dealing with a programming language that will be used by people for many, many different purposes, for whom "unity" might mean (for example): 1 foot per second 1 foot per minute 1 metre per second 1 kilometre per hour 1 mile per hour 1 lightspeed 1 knot 1 mach Specialist applications might be able to take shortcuts in dimensional analysis when "everybody knows" what the suppressed units must be. General purpose programming languages *cannot*. It is better NOT to offer the illusion of dimensional analysis than to mislead the user into thinking they are covered when they are not. Better to let them use a dedicated units package, not build a half-baked bug magnet into the language syntax.
-- Steve From rosuav at gmail.com Fri Aug 26 03:04:33 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 26 Aug 2016 17:04:33 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160826065403.GL26300@ando.pearwood.info> References: <20160825180211.GA1472@kundert.designers-guide.com> <20160826001453.GK26300@ando.pearwood.info> <20160826034654.GA24190@kundert.designers-guide.com> <20160826065403.GL26300@ando.pearwood.info> Message-ID: On Fri, Aug 26, 2016 at 4:54 PM, Steven D'Aprano wrote: > But you're not in the world of circuit design any more, you are dealing > with a programming language that will be used by people for many, > many different purposes, for whom "unity" might mean (for example): > > 1 foot per second > 1 foot per minute > 1 metre per second > 1 kilometre per hour > 1 mile per hour > 1 lightspeed > 1 knot > 1 mach 1 byte per character ChrisA From ncoghlan at gmail.com Fri Aug 26 04:31:30 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 26 Aug 2016 18:31:30 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160826065403.GL26300@ando.pearwood.info> References: <20160825180211.GA1472@kundert.designers-guide.com> <20160826001453.GK26300@ando.pearwood.info> <20160826034654.GA24190@kundert.designers-guide.com> <20160826065403.GL26300@ando.pearwood.info> Message-ID: On 26 August 2016 at 16:54, Steven D'Aprano wrote: > At best, we can choose descriptive variable names that hint what the > correct dimensions should be: > > weight_of_contents + weight_of_container > > > The argument that units would make it easier for the programmer to spot > errors is, I think, invalid, because the programmer will hardly ever get > to see the units. This is based on a narrowly construed definition of "programming" though. 
It makes more sense in the context of interactive data analysis and similar activities, where Python is being used as a scripting language, rather than as a full-fledged applications programming language. So let's consider the following hypothetical: 1. We add SI multiplier support to the numeric literal syntax, with "E" unilaterally replaced with "X" (for both input and output) to avoid the ambiguity with exponential notation 2. One or more domain specific libraries adopt Ivan Levkivskyi's suggestion of using PEP 526 to declare units Then Ken's example becomes: from circuit_units import A, V, Ohm, seconds delta: A for delta in [-500n, 0, 500n]: input: A = 2.75u + delta wait(seconds(1u)) expected: V = Ohm(100k)*input tolerance: V = 2.2m fails = check_output(expected, tolerance) print('%s: I(in)=%rA, measured V(out)=%rV, expected V(out)=%rV, diff=%rV.' % ( 'FAIL' if fails else 'pass', input, get_output(), expected, get_output() - expected )) The only new pieces there beyond PEP 526 itself are the SI unit multiplier on literals, and the type annotations declared in the circuit_units module. To actually get a typechecker to be happy with the code, Ohm.__mul__ would need to be overloaded as returning a V result when the RHS is categorised as A. An environment focused on circuit simulation could pre-import some of those symbols so users didn't need to do it explicitly. Cheers, Nick. 
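P.S. A runtime sketch of the Ohm.__mul__ overload described above (hypothetical circuit_units internals — a static checker would express the same relationship with @overload stubs rather than this runtime check):

```python
class A(float):
    """Current in amperes (sketch)."""

class V(float):
    """Voltage in volts (sketch)."""

class Ohm(float):
    """Resistance in ohms: Ohm * A yields V, per Ohm's law (sketch)."""
    def __mul__(self, other):
        if isinstance(other, A):
            return V(float(self) * float(other))
        # anything else falls back to plain float arithmetic
        return NotImplemented

expected = Ohm(100e3) * A(2.75e-6)   # a V instance, ~0.275
```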
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Fri Aug 26 05:18:12 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 26 Aug 2016 19:18:12 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <1472182463.1965630.706572961.46E190EF@webmail.messagingengine.com> References: <20160825180211.GA1472@kundert.designers-guide.com> <20160825235052.GJ26300@ando.pearwood.info> <1472182463.1965630.706572961.46E190EF@webmail.messagingengine.com> Message-ID: <20160826091812.GM26300@ando.pearwood.info> On Thu, Aug 25, 2016 at 11:34:23PM -0400, Random832 wrote: > On Thu, Aug 25, 2016, at 19:50, Steven D'Aprano wrote: > > Historically, there are *three* different meanings for "MB", only one of > > which is an official standard: > > > > http://physics.nist.gov/cuu/Units/binary.html > > The link doesn't work for me... is the third one the 1,024,000 bytes > implicit in describing standard-formatted floppy disks as "1.44 MB" > (they are actually 1440 bytes: 80 tracks, 2 sides, 18 512-byte sectors) > or "1.2 MB" (15 sectors). Quoting from the above document: Historical context* Once upon a time, computer professionals noticed that 2**10 was very nearly equal to 1000 and started using the SI prefix "kilo" to mean 1024. That worked well enough for a decade or two because everybody who talked kilobytes knew that the term implied 1024 bytes. But, almost overnight a much more numerous "everybody" bought computers, and the trade computer professionals needed to talk to physicists and engineers and even to ordinary people, most of whom know that a kilometer is 1000 meters and a kilogram is 1000 grams. Then data storage for gigabytes, and even terabytes, became practical, and the storage devices were not constructed on binary trees, which meant that, for many practical purposes, binary arithmetic was less convenient than decimal arithmetic. The result is that today "everybody" does not "know" what a megabyte is. 
When discussing computer memory, most manufacturers use megabyte to mean 2**20 = 1 048 576 bytes, but the manufacturers of computer storage devices usually use the term to mean 1 000 000 bytes. Some designers of local area networks have used megabit per second to mean 1 048 576 bit/s, but all telecommunications engineers use it to mean 10**6 bit/s. And if two definitions of the megabyte are not enough, a third megabyte of 1 024 000 bytes is the megabyte used to format the familiar 90 mm (3 1/2 inch), "1.44 MB" diskette. The confusion is real, as is the potential for incompatibility in standards and in implemented systems. -- Steve From steve at pearwood.info Fri Aug 26 06:01:29 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 26 Aug 2016 20:01:29 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: References: <20160825081955.GA21350@kundert.designers-guide.com> Message-ID: <20160826100129.GN26300@ando.pearwood.info> On Fri, Aug 26, 2016 at 07:35:36AM +0200, Pavol Lisy wrote: > On 8/25/16, Ken Kundert wrote: > > [...] > > > Just allowing the units to be present, even if not > > > > retained, is a big advantage because it can bring a great deal of clarity to > > the > > meaning of the number. For example, even if the language does not flag an > > error > > when a user writes: > > > > vdiff = 1mV - 30uA > > It reminds me: "Metric mishap caused loss of NASA's Mars Climate > Orbiter." It could be nice to have language support helping to avoid > something similar. This proposal won't help to avoid this sort of disastrous misuse of units. It will make that sort of mistake *easier*, not harder, by giving the user a false sense of security. A good description of the Mars Orbiter mishap can be found here, with a link to the NASA report: http://pint.readthedocs.io/en/0.7.2/ Suppose I am programming the Mars lander.
I read in some thruster data, in pound-force seconds: thrust = sm_forces(arg) # say, it returns 100 lbf·s I don't expect to see the tag "lbf·s" anywhere unless I explicitly print the value out or view it in a debugger. So the tag gives me no visual assistance in avoiding unit conversion bugs. It is worse than having no unit attached at all, because now I have the false sense of security that it is tagged with a unit. Much later on I pass that to a function that expects the thrust to be in Newton seconds: result = fire_engines(thrust) There's no dimensional analysis, so I could just as easily pass 100 kilograms per second cubed, or 100 volts. I have no protection from passing wrong units. But let's ignore that possibility, and trust that I do actually pass a thrust rather than something completely different. The original poster Ken has said that he doesn't want to do unit conversions. So I pass a measurement in pound force seconds, which is compatible with Newton seconds, and quite happily use 100 lbf·s as if it were 100 N·s. End result: a repeat of the original Mars lander debacle, when my lander crashes directly into the surface of Mars, due to a failure to convert units. This could have been avoided if I had used a real units package that applied the conversion factor 1 lbf·s ≈ 4.45 N·s, but Ken's suggestion won't prevent that. You can't avoid bugs caused by using the wrong units by just labelling values with a unit. You actually have to convert from the wrong units to the right units, something this proposal avoids. I think that Ken is misled by his experience in one narrow field, circuit design, where everyone uses precisely the same SI units and there are no conversions needed. This is a field where people can drop dimensions because everyone understands what you mean to say that a current equals a voltage. But in the wider world, that's disastrous. Take v = s/t (velocity equals distance over time).
If I write v = s because it is implicitly understood that the time t is "one": s = 100 miles v = s Should v be understood as 100 miles per hour or 100 miles per second or 100 miles per year? That sort of ambiguity doesn't come up in circuit design, but it is common elsewhere. -- Steve From steve at pearwood.info Fri Aug 26 07:29:18 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 26 Aug 2016 21:29:18 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: References: <20160825180211.GA1472@kundert.designers-guide.com> <20160826001453.GK26300@ando.pearwood.info> <20160826034654.GA24190@kundert.designers-guide.com> <20160826065403.GL26300@ando.pearwood.info> Message-ID: <20160826112918.GO26300@ando.pearwood.info> On Fri, Aug 26, 2016 at 06:31:30PM +1000, Nick Coghlan wrote: > On 26 August 2016 at 16:54, Steven D'Aprano wrote: > > At best, we can choose descriptive variable names that hint what the > > correct dimensions should be: > > > > weight_of_contents + weight_of_container > > > > > > The argument that units would make it easier for the programmer to spot > > errors is, I think, invalid, because the programmer will hardly ever get > > to see the units. > > This is based on a narrowly construed definition of "programming" > though. It makes more sense in the context of interactive data > analysis and similar activities, where Python is being used as a > scripting language, rather than as a full-fledged applications > programming language. I don't think that is right. I'd put it the other way: as important as interactive use is, it is a subset of general programming, not a superset of it. I *love* using Python as a calculator (which is why this thread is inspiring me to investigate the unit conversion/tracking packages already available). But even when using Python as a calculator, oh I'm sorry, "for interactive data analysis" *wink*, there are going to be plenty of opportunities for me to write: x = ... y = ... 
# much later on z = x + y so that I don't necessarily see the units directly there on the screen by the time I actually go to use them. Likewise if I'm reading my values from a data file. IPython even generalises the magic variable _ into a potentially unlimited series of magic variables _1 _2 _3 etc, and it is normal to be using values taken from a variable rather than as a literal. The point is that Ken's examples of calculations on literals are misleading, because only a fraction of calculations involve literals. And likely a small fraction, if you consider all the people using Python for scripting, server-side programming, application programming, etc rather than just the subset of them using it for interactive use. By the way, here's another programming language designed for interactive use as a super-calculator. Frink does dimensional analysis and unit conversions: http://frinklang.org/#SampleCalculations If we're serious about introducing dimension and unit handling to Python, we should look at: - existing Python libraries; - HP calculators; - Frink; at the very least. -- Steve From levkivskyi at gmail.com Fri Aug 26 07:49:51 2016 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Fri, 26 Aug 2016 14:49:51 +0300 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160826100129.GN26300@ando.pearwood.info> References: <20160825081955.GA21350@kundert.designers-guide.com> <20160826100129.GN26300@ando.pearwood.info> Message-ID: On 26 August 2016 at 13:01, Steven D'Aprano wrote: > On Fri, Aug 26, 2016 at 07:35:36AM +0200, Pavol Lisy wrote: > > On 8/25/16, Ken Kundert wrote: > > > > [...] > > > > > Just allowing the units to be present, even it not > > > > > > retained, is a big advantage because it can bring a great deal of > clarity to > > > the > > > meaning of the number. 
For example, even if the language does not flag > an > > > error > > > when a user writes: > > > > > > vdiff = 1mV - 30uA > > > > It reminds me: "Metric mishap caused loss of NASA's Mars Climate > > orbiter. It could be nice to have language support helping to avoid > > something similar. > [snip] > Take v = s/t (velocity equals distance over time). If I write v = s > because it is implicitly understood that the time t is "one": > > s = 100 miles > v = s > > Should v be understood as 100 miles per hour or 100 miles per second or > 100 miles per year? That sort of ambiguity doesn't come up in circuit > design, but it is common elsewhere. > If one writes this as from units import m, s, miles s = miles(100) v: m/s = s This could be flagged as an error by a static type checker. Let me add some clarifications here: 1. By defining __mul__ and __truediv__ on m, s, and other units one can achieve the desirable semantics 2. Arbitrary (reasonable) unit can be described by a tuple of 7 rational numbers (powers of basic SI units, m/s will be e.g. (1, -1, 0, 0, 0, 0, 0)), if one also wants non-SI units, then there will be one more float number in the tuple. 3. It is impossible to write down all the possible overloads for operations on units, e.g. 1 m / 1 s should be 1 m/s, 1 m/s / 1 s should be 1 m/s**2, and so on to infinity. Only a finite number of overloads can be described with PEP 484 type hints. 4. It is very easy to specify all overloads with very basic dependent types, unit will depend on the above mentioned tuple, and multiplication should be overloaded like this (I write three numbers instead of seven for simplicity): class Unit(Dependent[k,l,m]): def __mul__(self, other: Unit[ko, lo, mo]) -> Unit[k+ko, l+lo, m+mo]: ... 5. Currently neither "mainstream" Python type checkers nor PEP 484 support dependent types. 6. For those who are not familiar with dependent types, this concept is very similar to generics. Generic type (e.g. 
List) is like a "function" that takes a concrete type (e.g. int) and "returns" another concrete type (e.g. List[int], lists of integers). Dependent types do the same, but they are allowed to also receive values, not only types, as "arguments". The most popular example is matrices of fixed size n by m: Mat[n, m]. The matrix multiplication then could be overloaded as class Mat(Dependent[n, m]): def __matmul__(self, other: Mat[m, k]) -> Mat[n, k]: ... 7. I like the formulation by Nick: if e.g. the library circuit_units defines sufficiently many overloads, then it will safely cover 99.9% of use cases *without* dependent types. (An operation for which the type checker does not find an overload will be flagged as an error, although the operation might be correct). -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Fri Aug 26 07:52:07 2016 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Fri, 26 Aug 2016 14:52:07 +0300 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: References: <20160825081955.GA21350@kundert.designers-guide.com> <20160826100129.GN26300@ando.pearwood.info> Message-ID: On 26 August 2016 at 14:49, Ivan Levkivskyi wrote: > On 26 August 2016 at 13:01, Steven D'Aprano wrote: > >> On Fri, Aug 26, 2016 at 07:35:36AM +0200, Pavol Lisy wrote: >> > On 8/25/16, Ken Kundert wrote: >> > >> > [...] >> > >> > > Just allowing the units to be present, even it not >> > > >> > > retained, is a big advantage because it can bring a great deal of >> clarity to >> > > the >> > > meaning of the number. For example, even if the language does not >> flag an >> > > error >> > > when a user writes: >> > > >> > > vdiff = 1mV - 30uA >> > >> > It reminds me: "Metric mishap caused loss of NASA's Mars Climate >> > orbiter. It could be nice to have language support helping to avoid >> > something similar. >> > > [snip] > > >> Take v = s/t (velocity equals distance over time). 
If I write v = s >> because it is implicitly understood that the time t is "one": >> >> s = 100 miles >> v = s >> >> Should v be understood as 100 miles per hour or 100 miles per second or >> 100 miles per year? That sort of ambiguity doesn't come up in circuit >> design, but it is common elsewhere. >> > > If one writes this as > > from units import m, s, miles > > s = miles(100) > v: m/s = s > Sorry for a name collision in this example. It should read: dist = miles(100) vel: m/s = dist -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri Aug 26 08:47:18 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 26 Aug 2016 22:47:18 +1000 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis Message-ID: <20160826124716.GP26300@ando.pearwood.info> Ken has made what I consider a very reasonable suggestion, to introduce SI prefixes to Python syntax for numbers. For example, typing 1K will be equivalent to 1000. However, there are some complexities that have been glossed over. (1) Are the results floats, ints, or something else? I would expect that 1K would be int 1000, not float 1000. But what about fractional prefixes, like 1m? Should that be a float or a decimal? If I write 7981m I would expect 7.981, not 7.9809999999999999, so maybe I want a decimal float, not a binary float? Actually, what I would really want is for the scale factor to be tracked separately. If I write 7981m * 1M, I should end up with 7981000 as an int, not a float. Am I being unreasonable? Obviously if I write 1.1K then I'm expecting a float. So I'm not *entirely* unreasonable :-) (2) Decimal or binary scale factors? The SI units are all decimal, and I think if we support these, we should insist that K == 1000, not 1024. For binary scale factors, there is the IEC standard: http://physics.nist.gov/cuu/Units/binary.html which defines Ki = 2**10, Mi = 2**20, etc. 
(Fortunately this doesn't have to deal with fractional prefixes.) So it would be easy enough to support them as well. (3) µ or u, k or K? I'm going to go to the barricades to fight for the real SI prefixes µ and k to be supported. If people want to support the common fakes u and K as well, that's fine, I have no objection, but I think that it's important to support the actual prefixes too. (Python 3 assumes UTF-8 as the default encoding, so it shouldn't cause any technical difficulties to support µ as syntax. The political difficulties though...) (4) What about E? E is tricky if we want 1E to be read as the integer 10**18, because it matches the floating point syntax 1E (which is currently a syntax error). So there's a nasty bit of ambiguity where it may be unclear whether or not 1E is intended as an int or an incomplete float, and then there's 1E1E which might be read as 1E1*10**18 or as just an error. Replacing E with (say) X is risky. The two largest current SI prefixes are Z and Y, it seems very likely that the next one added (if that ever happens) will be X. Actually, using any other letter risks clashing with a future expansion of the SI prefixes. (5) What about other numeric types? Just because there's no syntactic support for Fraction and Decimal shouldn't mean we can't use these scale factors with them. (6) What happens to int(), float() etc? I wouldn't want int("23K") to suddenly change from being an error to returning 23000. Presumably we would want int to take an optional argument to allow the interpretation of scale factors. This gives us an advantage: int("23E", scale=True) is unambiguously an int, and we can ignore the fact that it looks like a float. (7) What about repr() and str()? I don't think that the repr() or str() of numeric types should change. But perhaps format() could grow some new codes to display numbers using either the most obvious scale factor, or some specific scale factor. 
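[A sketch of point (7) above: what a "most obvious scale factor" display could look like, implemented as an ordinary helper rather than a format code. The name si_format and the digits parameter are inventions for illustration, not anything proposed in this thread.]

```python
import math

# Decimal SI prefixes indexed by power-of-ten exponent (multiples of 3).
_PREFIXES = {-24: 'y', -21: 'z', -18: 'a', -15: 'f', -12: 'p', -9: 'n',
             -6: 'µ', -3: 'm', 0: '', 3: 'k', 6: 'M', 9: 'G', 12: 'T',
             15: 'P', 18: 'E', 21: 'Z', 24: 'Y'}

def si_format(x, digits=4):
    """Format x using the 'most obvious' SI scale factor."""
    if x == 0:
        return '0'
    # Pick the power of 1000 that puts the mantissa in [1, 1000),
    # clamped to the range of defined prefixes.
    exp = 3 * (math.floor(math.log10(abs(x))) // 3)
    exp = min(24, max(-24, exp))
    mantissa = x / 10**exp
    return f'{mantissa:.{digits}g}{_PREFIXES[exp]}'
```

So si_format(2.4e9) gives '2.4G' and si_format(3.3e-6) gives '3.3µ'; values with no sensible prefix, like 42, come out unchanged.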
* * * This leads to my first proposal: require an explicit numeric prefix on numbers before scale factors are allowed, similar to how we treat non-decimal bases. 8M # remains a syntax error 0s8M # unambiguously an int with a scale factor of M = 10**6 0s1E1E # a float 1E1 with a scale factor of E = 10**18 0s1.E # a float 1. with a scale factor of E, not an exponent int('8M') # remains a ValueError int('0s8M', base=0) # returns 8*10**6 Or if that's too heavy (two whole characters, plus the suffix!) perhaps we could have a rule that the suffix must follow the final underscore of the number: 8_M # int 8*10**6 123_456_789_M # int 123456789*10**6 123_M_456 # still an error 8._M # float 8.0*10**6 int() and float() take a keyword-only argument to allow a scale factor when converting from strings: int("8_M") # remains an error int("8_M", scale=True) # allowed This solves the problem with E and floats. It's only a scale factor if it immediately follows the final underscore in the float, otherwise it is the regular exponent sign. Proposal number two: don't make any changes to the syntax, but treat these as *literally* numeric scale factors. Add a simple module to the std lib defining the various factors: k = kilo = 10**3 M = mega = 10**6 G = giga = 10**9 etc. and then allow the user to literally treat them as scale factors by multiplying: from scaling import * int_value = 8*M float_value = 8.0*M fraction_value = Fraction(1, 8)*M decimal_value = Decimal("1.2345")*M and so forth. The biggest advantage of this is that there are no syntactic changes needed, it is completely backwards compatible, it works with any numeric type and even non-numbers: py> x = [None]*M py> len(x) 1000000 You can even scale by multiple factors: x = 8*M*k Disadvantages: none I can think of. (Some cleverness may be needed to have fractional scale values work with both floats and Decimals, but that shouldn't be hard.) 
-- Steve From rosuav at gmail.com Fri Aug 26 09:34:18 2016 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 26 Aug 2016 23:34:18 +1000 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: <20160826124716.GP26300@ando.pearwood.info> References: <20160826124716.GP26300@ando.pearwood.info> Message-ID: On Fri, Aug 26, 2016 at 10:47 PM, Steven D'Aprano wrote: > (1) Are the results floats, ints, or something else? > > I would expect that 1K would be int 1000, not float 1000. But what about > fractional prefixes, like 1m? Should that be a float or a decimal? > > If I write 7981m I would expect 7.981, not 7.9809999999999999, so maybe > I want a decimal float, not a binary float? Introduce "d" as a prefix meaning 1, and this could be the way of creating something that people have periodically asked for: Decimal literals. (Though IIRC there were some complexities involving Decimal literals and decimal.getcontext(), which would have to be resolved before 1m could represent a Decimal.) > Actually, what I would really want is for the scale factor to be tracked > separately. If I write 7981m * 1M, I should end up with 7981000 as an > int, not a float. Am I being unreasonable? Easy. Make them Fraction literals instead. You'll end up with 7981000/1 as a rational, rather than a pure int, but if you want accurate handling of SI prefixes, rationals will serve you fairly well. > Obviously if I write 1.1K then I'm expecting a float. So I'm not > *entirely* unreasonable :-) Obviously :) > (2) Decimal or binary scale factors? > > The SI units are all decimal, and I think if we support these, we should > insist that K == 1000, not 1024. For binary scale factors, there is the > IEC standard: > > http://physics.nist.gov/cuu/Units/binary.html > > which defines Ki = 2**10, Mi = 2**20, etc. (Fortunately this doesn't > have to deal with fractional prefixes.) So it would be easy enough to > support them as well. 
from __future__ import binary_scale_factors as scale_factors from __future__ import decimal_scale_factors as scale_factors # tongue only partly in cheek > (3) µ or u, k or K? > > I'm going to go to the barricades to fight for the real SI prefixes µ > and k to be supported. If people want to support the common fakes u and > K as well, that's fine, I have no objection, but I think that its > important to support the actual prefixes too. I would strongly support the use of µ and weakly u. With k vs K, no opinion. If both can be supported without being confusing, grab 'em both. With output formats, it's less clear, but I would still be inclined toward µ for output. > (4) What about E? > > E is tricky if we want 1E to be read as the integer 10**18, because it > matches the floating point syntax 1E (which is currently a syntax > error). So there's a nasty bit of ambiguity where it may be unclear > whether or not 1E is intended as an int or an incomplete float, and then > there's 1E1E which might be read as 1E1*10**18 or as just an error. It's worse than that. Currently, 1E+2 is a perfectly legal 100.0 (float), but under this proposal, it would be a constant expression yielding 1_000_000_000_000_000_002, so it wouldn't just be giving meaning to things that are currently errors. > Replacing E with (say) X is risky. The two largest current SI prefixes > are Z and Y, it seems very likely that the next one added (if that ever > happens) will be X. Actually, using any other letter risks clashing with > a future expansion of the SI prefixes. Anything's risky. Probably the least risky option is to simply stop before Exa and implement the feature without. > (7) What about repr() and str()? > > I don't think that the repr() or str() of numeric types should change. > But perhaps format() could grow some new codes to display numbers using > either the most obvious scale factor, or some specific scale factor. Agreed. 
And I'd have them simply pick the one most obvious - if you want a specific factor, you can simply invert and display. > This leads to my first proposal: require an explicit numeric prefix on > numbers before scale factors are allowed, similar to how we treat > non-decimal bases. > > 8M # remains a syntax error > 0s8M # unambiguously an int with a scale factor of M = 10**6 > > 0s1E1E # a float 1E1 with a scale factor of E = 10**18 > 0s1.E # a float 1. with a scale factor of E, not an exponent > > int('8M') # remains a ValueError > int('0s8M', base=0) # returns 8*10**6 Hmm, interesting. Feels clunky but could work. > Or if that's too heavy (two whole characters, plus the suffix!) perhaps > we could have a rule that the suffix must follow the final underscore > of the number: > > 8_M # int 8*10*6 > 123_456_789_M # int 123456789*10**6 > 123_M_456 # still an error > 8._M # float 8.0*10**6 This sounds better IMO. It's not legal syntax in any version of Python older than 3.6, so there's minimal backward compatibility trouble. ChrisA From guido at python.org Fri Aug 26 10:42:00 2016 From: guido at python.org (Guido van Rossum) Date: Fri, 26 Aug 2016 07:42:00 -0700 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: <20160826124716.GP26300@ando.pearwood.info> References: <20160826124716.GP26300@ando.pearwood.info> Message-ID: On Fri, Aug 26, 2016 at 5:47 AM, Steven D'Aprano wrote: > Ken has made what I consider a very reasonable suggestion, to introduce > SI prefixes to Python syntax for numbers. For example, typing 1K will be > equivalent to 1000. > > However, there are some complexities that have been glossed over. > [...] > Please curb your enthusiasm. This is not going to happen. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From python at mrabarnett.plus.com Fri Aug 26 11:06:44 2016 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 26 Aug 2016 16:06:44 +0100 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160826065403.GL26300@ando.pearwood.info> References: <20160825180211.GA1472@kundert.designers-guide.com> <20160826001453.GK26300@ando.pearwood.info> <20160826034654.GA24190@kundert.designers-guide.com> <20160826065403.GL26300@ando.pearwood.info> Message-ID: <304e12ac-c523-089a-d3f5-e4018f41349a@mrabarnett.plus.com> On 2016-08-26 07:54, Steven D'Aprano wrote: [snip] > Specialist applications might be able to take shortcuts in > dimensional analysis when "everybody knows" what the suppressed units > must be. General purpose programming languages *cannot*. It is better > NOT to offer the illusion of dimensional analysis than to mislead the > user into thinking they are covered when they are not. > > Better to let them use a dedicated units package, not build a > half-baked bug magnet into the language syntax. > If you're going to have units, you might also include (for want of a name) "colours" (or "flavours"), which behave slightly differently with respect to arithmetic operators. For example: When you add 2 values, they must have the same units and same colours. The result will have the same units and colours. # "amp" is a unit, "current" is a colour. # The result is a current measured in amps. 1 amp current + 2 amp current == 3 amp current When you divide 2 values, they could have the same or different units, but must have the same colours. The result will have a combination of the units (some might also cancel out), but will have the same colours. # "amp" is a unit, "current" is a colour. # The result is a ratio of currents. 
6 amp current / 2 amp current == 3 current From python at mrabarnett.plus.com Fri Aug 26 11:27:56 2016 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 26 Aug 2016 16:27:56 +0100 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: References: <20160826124716.GP26300@ando.pearwood.info> Message-ID: On 2016-08-26 14:34, Chris Angelico wrote: > On Fri, Aug 26, 2016 at 10:47 PM, Steven D'Aprano wrote: [snip] >> Or if that's too heavy (two whole characters, plus the suffix!) perhaps >> we could have a rule that the suffix must follow the final underscore >> of the number: >> >> 8_M # int 8*10*6 >> 123_456_789_M # int 123456789*10**6 >> 123_M_456 # still an error >> 8._M # float 8.0*10**6 > > This sounds better IMO. It's not legal syntax in any version of Python > older than 3.6, so there's minimal backward compatibility trouble. > According to Wikipedia, it's recommended that there be a space between the number and the units, thus not "1kg" but "1 kg". As we don't put spaces inside numbers in Python, insisting on an underscore instead would seem to be a reasonable compromise. From steve at pearwood.info Fri Aug 26 11:35:18 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 27 Aug 2016 01:35:18 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: References: <20160825081955.GA21350@kundert.designers-guide.com> <20160826100129.GN26300@ando.pearwood.info> Message-ID: <20160826153517.GR26300@ando.pearwood.info> On Fri, Aug 26, 2016 at 02:49:51PM +0300, Ivan Levkivskyi wrote: > 1. By defining __mul__ and __truediv__ on m, s, and other units one can > achieve the desirable semantics I'm not entirely sure about that. I'm not even close to an expert on the theory of types, so I welcome correction, but it doesn't seem reasonable to me to model units as types. Or at least not using a standard type checker. ("Define standard," I hear you ask. 
Um, the sort of type checker that Lingxiao Jiang and Zhendong Su had in mind when they wrote Osprey?) http://web.cs.ucdavis.edu/~su/publications/icse06-unit.pdf Okay, so you certainly can do dimensional analysis with *some* type-checkers. But should you? I think that units are orthogonal to types: I can have a float of unit "gram" and a Fraction of unit "gram", and they shouldn't necessarily be treated as the same type. Likewise I can have a float of unit "gram" and a float of unit "inch", and while they are the same type, they aren't the same dimension. So I think that you need *two* distinct checkers, one to check types, and one to check dimensions (and do unit conversions), even if they're both built on the same or similar technologies. Another issue is that a decent dimensional system should allow the user to create their own dimensions, not just their own units. The seven standard SI dimensions are a good starting point, but there are applications where you may want more; currency and bits are two common ones. And no, you (probably) don't want bits to be a dimensionless number: it makes no sense to add "7 bits" and "2 radians". Your application may want to track "number of cats" and "number of dogs" as separate dimensions, rather than treat both as dimensionless quantities. And then use the type-checker to ensure that they are both ints, not floats. > 2. Arbitrary (reasonable) unit can be described by a tuple of 7 rational > numbers > (powers of basic SI units, m/s will be e.g. (1, -1, 0, 0, 0, 0, 0)), if one > wants also > non SI units, then there will be one more float number in the tuple. A decent unit converter/dimension analyser needs to support arbitrary dimensions, not just the seven SI dimensions. But let's not get bogged down with implementation details. > 3. It is impossible to write down all the possible overloads for operations > on units, > e.g. 1 m / 1 s should be 1 m/s, 1 m/s / 1 s should be 1 m/s**2, > and so on to infinity. 
Only finite number of overloads can be described > with PEP 484 type hints. Right. This is perhaps why the authors of Osprey say that "standard type checking algorithms are not powerful enough to handle units because of their abelian group nature (e.g., being commutative, multiplicative, and associative)." Another factor: dimensions should support rational powers, not just integer powers. -- Steve From ncoghlan at gmail.com Fri Aug 26 11:50:00 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 27 Aug 2016 01:50:00 +1000 Subject: [Python-ideas] Barriers to decimal literals (redux) (was Re: SI scale factors alone, without units or dimensional analysis) Message-ID: On 26 August 2016 at 23:34, Chris Angelico wrote: > Introduce "d" as a prefix meaning 1, and this could be the way of > creating something that people have periodically asked for: Decimal > literals. > > (Though IIRC there were some complexities involving Decimal literals > and decimal.getcontext(), which would have to be resolved before 1m > could represent a Decimal.) The most likely candidate for an actual Decimal literal is fixed precision decimal128 - the context dependence of the variable precision decimals currently in the standard library is fine for a custom class, but would be incredibly unintuitive for something that looked similar to a float literal or an int literal (assuming we could get it to work sensibly in the first place). However, actually implementing that would be rather a lot of work, for an unclear pay-off, as the folks that really need decimals seem happy enough with the variable precision ones (especially following Stefan Krah's creation and contribution of the C accelerated implementation), and the vagaries of binary floating point are going to be a factor in the computing landscape for many years to come even if programming language designers do end up collectively settling on decimal128 as the baseline type for decimal data processing. Cheers, Nick. 
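[The context dependence Nick refers to is easy to demonstrate with today's decimal module: construction ignores the context precision while arithmetic honours it, which is exactly the behaviour that would be surprising attached to a literal. A minimal illustration, not a proposal:]

```python
from decimal import Decimal, getcontext

getcontext().prec = 3  # three significant digits, for *arithmetic* only

a = Decimal('1.2345')      # construction keeps every digit given
b = Decimal('1.2345') + 0  # arithmetic rounds to the context precision

# a is still 1.2345, but b has been rounded to 1.23 by the "+ 0".
```

A hypothetical 1.2345d literal would have to pick one of these two behaviours, and neither choice would match every reader's intuition.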
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From pavol.lisy at gmail.com Fri Aug 26 13:06:38 2016 From: pavol.lisy at gmail.com (Pavol Lisy) Date: Fri, 26 Aug 2016 19:06:38 +0200 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: <20160826124716.GP26300@ando.pearwood.info> References: <20160826124716.GP26300@ando.pearwood.info> Message-ID: On 8/26/16, Steven D'Aprano wrote: [...] > from scaling import * > int_value = 8*M > float_value = 8.0*M > fraction_value = Fraction(1, 8)*M > decimal_value = Decimal("1.2345")*M [...] > Disadvantages: none I can think of. Really interesting idea, but from my POV a little disadvantage is "import *" which brings a bunch of one-letter variables into the namespace. From tjreedy at udel.edu Fri Aug 26 16:54:54 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 26 Aug 2016 16:54:54 -0400 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: <20160826124716.GP26300@ando.pearwood.info> References: <20160826124716.GP26300@ando.pearwood.info> Message-ID: On 8/26/2016 8:47 AM, Steven D'Aprano wrote: > This leads to my first proposal: require an explicit numeric prefix on > numbers before scale factors are allowed, similar to how we treat > non-decimal bases. > > 8M # remains a syntax error -1 for the syntax, +1 for keeping it an error > 0s8M # unambiguously an int with a scale factor of M = 10**6 -.5, better 0a for a in b,o,x is used for numeral base, which is related to scaling of each numeral individually. 0s would be good for ancient sexagesimal (base 60) notation, which we still use for time and circle degrees. > Or if that's too heavy (two whole characters, plus the suffix!) perhaps > we could have a rule that the suffix must follow the final underscore > of the number: > > 8_M # int 8*10*6 > 123_456_789_M # int 123456789*10**6 -.1, better I do not remember seeing this use of SI scale factors divorced from units. 
I can see how it works well for the relatively small community of EEs, but I expect it would only make Python more confusing for many others, especially school kids. > Proposal number two: don't make any changes to the syntax, but treat > these as *literally* numeric scale factors. Add a simple module to the > std lib defining the various factors: +1 for PyPI, currently +-0 for stdlib '*' is easier to type than '_'. > k = kilo = 10**3 > M = mega = 10**6 > G = giga = 10**9 > > etc. and then allow the user to literally treat them as scale factors by > multiplying: > > from scaling import * > int_value = 8*M > float_value = 8.0*M > fraction_value = Fraction(1, 8)*M > decimal_value = Decimal("1.2345")*M A main use for me would be large ints: us_debt = 19*G. But I would also want to be able to write 19.3*G and get an int, and that would not work. The new _ syntax will alleviate the problem in a different way. 19_300_000_000 will work. Rounded 0s for counts do not always come in groups of 3. > and so forth. The biggest advantage of this is that there is no > syntactic changes needed, it is completely backwards compatible, it > works with any numeric type and even non-numbers: and for PyPI, it does not need pydev + Guido approval. > Disadvantages: none I can think of. Someone mentioned cluttering namespace with 20-30 single char names. For readers, remembering what each letter means is a further burden. > (Some cleverness may be needed to have fractional scale values work with > both floats and Decimals, but that shouldn't be hard.) Make M, G, etc, instances of a class with def __mul__(self, other) that conditions on type of other. 
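[Terry's closing suggestion can also deliver his 19.3*G-as-int wish: a __rmul__ that conditions on the type of the other operand and collapses exact float products back to int. The class name is invented for illustration, and the collapse is only reliable when the float product is exactly representable:]

```python
class Scale:
    """A scale factor whose __rmul__ conditions on the operand's type."""

    def __init__(self, factor):
        self.factor = factor  # an exact int such as 10**9

    def __rmul__(self, other):
        result = other * self.factor
        # Collapse a float that came out whole back to an exact int,
        # so 19.3*G yields 19300000000 rather than 1.93e+10.
        if isinstance(other, float) and result.is_integer():
            return int(result)
        return result

    __mul__ = __rmul__

G = giga = Scale(10**9)
```

Both 19*G and 19.3*G then come out as ints; floats whose product is not a whole number pass through unchanged.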
-- Terry Jan Reedy From python-ideas at shalmirane.com Fri Aug 26 17:25:12 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Fri, 26 Aug 2016 14:25:12 -0700 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: References: <20160826124716.GP26300@ando.pearwood.info> Message-ID: <20160826212512.GE9468@kundert.designers-guide.com> Okay, so I talked to Guido about this, and all he was trying to convey is that there is an extremely high bar that must be reached before he will consider changing the base language, which of course is both prudent and expected. I'd like to continue the discussion because I believe there is some chance that we could reach that bar even though Guido is clearly skeptical. At this point I'd like to suggest that there be two more constraints we consider: 1. whatever form we choose to be used when outputting numbers should be the same form we use when inputting numbers (so %r should produce valid python input, as do all the other format codes), 2. and whatever form we choose to be used when outputting numbers should look natural to end users. So ideas like using 2.4*G, 2.4*GHz, or 0s2.4G wouldn't really work because we would not want to output numbers to end users in this form. Even 2.4_GHz, while better, would still look somewhat unnatural to end users. One more thing to consider is that we already have a precedent here. Python already accepts a suffix on real numbers: j signifies an imaginary number. In this case the above constraints are satisfied. For example, 2j is a natural form to show any end user that understands imaginary numbers, and 2j is acceptable input to the language. To be consistent with that, it seems like 2G or 2GHz should be preferred over 2_G or 2_GHz. Of course, this brings up another issue, how do we handle imaginary numbers with scale factors. The possibilities include: 1. you don't get them, you can either specify j or a scale factor, but not both 2. 
you do get them, but if we allow units, then j should be first 3. you do get them, but we don't allow units and j could be first or second I like choice 2 myself. Also, to be consistent with j, and because I think it is simpler overall, I think 2G should be a real number, not an integer. Similarly, I think 2Gi, if we accept it, should also be a real number, simply for consistency. One last thing, we can accept 273K as input for 273,000, but when we output it we must use k to avoid confusion with Kelvin (and because that is the standard). Also, we can use µ for inputting or outputting 1e-6, but we must always accept u as valid input. -Ken On Fri, Aug 26, 2016 at 07:42:00AM -0700, Guido van Rossum wrote: > On Fri, Aug 26, 2016 at 5:47 AM, Steven D'Aprano > wrote: > > > Ken has made what I consider a very reasonable suggestion, to introduce > > SI prefixes to Python syntax for numbers. For example, typing 1K will be > > equivalent to 1000. > > > > However, there are some complexities that have been glossed over. > > [...] > > > > Please curb your enthusiasm. This is not going to happen. > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From python-ideas at shalmirane.com Fri Aug 26 18:24:49 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Fri, 26 Aug 2016 15:24:49 -0700 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160826100129.GN26300@ando.pearwood.info> References: <20160825081955.GA21350@kundert.designers-guide.com> <20160826100129.GN26300@ando.pearwood.info> Message-ID: <20160826222449.GG9468@kundert.designers-guide.com> Steven, This keeps coming up, so let me address it again. 
First, I concede that you are correct that my proposal does not provide dimensional analysis, so any dimensional errors that exist in this new code will not be caught by Python itself, as is currently the case. However, you should concede that by bringing the units front and center in the language, they are more likely to be caught by the user themselves. Yes, it is true that my proposal only addresses units on literals and not variables, expressions, functions, etc. But my proposal addresses not only real literals in the program itself, but also real values in the input and output. Extending this to address variables, expressions, functions, etc. would only make sense if the units were checked, which of course is dimensional analysis. It is my position that dimensional analysis is so difficult and burdensome that there is no way it should be in the base Python language. If available, it should be as an add-on. This proposal is more about adding capabilities to the base language that happen to make dimensional analysis easier and more attractive than about providing dimensional analysis itself. Second, I concede that there is some chance that users may be lulled into a false sense of complacency and that some dimensional errors would get missed by these otherwise normally very diligent users. But I would point out that I have been intensively using and supporting languages that provide this feature for 40 years and have never seen it. Finally, let's consider the incident on Mars. The problem occurred because one software package output numbers in English units (what were they thinking?) that were then entered into another program that was expecting metric units. The only way this could have been caught in an automated fashion is if the first package output the units for its numbers, and the second package accessed and checked those units. And it is precisely this that I am trying to make easier and more likely with this extension. 
Of the three steps that must occur, output the units, input the units, check the units, this proposal addresses the first two. -Ken On Fri, Aug 26, 2016 at 08:01:29PM +1000, Steven D'Aprano wrote: > On Fri, Aug 26, 2016 at 07:35:36AM +0200, Pavol Lisy wrote: > > On 8/25/16, Ken Kundert wrote: > > > > [...] > > > > > Just allowing the units to be present, even if not > > > > > > retained, is a big advantage because it can bring a great deal of clarity to > > > the > > > meaning of the number. For example, even if the language does not flag an > > > error > > > when a user writes: > > > > > > vdiff = 1mV - 30uA > > > > It reminds me: "Metric mishap caused loss of NASA's Mars Climate > > orbiter." It could be nice to have language support helping to avoid > > something similar. > > This proposal won't help to avoid this sort of disastrous misuse of > units. It will make that sort of mistake *easier*, not harder, by giving > the user the false sense of security. > > A good description of the Mars Orbiter mishap can be found here, with > a link to the NASA report: > > http://pint.readthedocs.io/en/0.7.2/ > > Suppose I am programming the Mars lander. I read in some thruster data, > in pound-force seconds: > > thrust = sm_forces(arg) # say, it returns 100 lbf·s > > I don't expect to see the tag "lbf·s" anywhere unless I explicitly print > the value out or view it in a debugger. So the tag gives me no visual > assistance in avoiding unit conversion bugs. It is worse than having no > unit attached at all, because now I have the false sense of security > that it is tagged with a unit. > > Much later on I pass that to a function that expects the thrust to be in > Newton seconds: > > result = fire_engines(thrust) > > There's no dimensional analysis, so I could just as easily pass 100 > kilograms per second cubed, or 100 volts. I have no protection from > passing wrong units. 
But let's ignore that possibility, and trust that I > do actually pass a thrust rather than something completely different. > > The original poster Ken has said that he doesn't want to do unit > conversions. So I pass a measurement in pound-force seconds, which is > compatible with Newton seconds, and quite happily use 100 lbf·s as if it > were 100 N·s. > > End result: a repeat of the original Mars lander debacle, when my lander > crashes directly into the surface of Mars, due to a failure to convert > units. This could have been avoided if I had used a real units package > that applied the conversion factor 1 lbf·s = 4.45 N·s, but Ken's > suggestion won't prevent that. > > You can't avoid bugs caused by using the wrong units by just labelling > values with a unit. You actually have to convert from the wrong units to > the right units, something this proposal avoids. > > I think that Ken is misled by his experience in one narrow field, > circuit design, where everyone uses precisely the same SI units and > there are no conversions needed. This is a field where people can drop > dimensions because everyone understands what you mean to say that a > current equals a voltage. But in the wider world, that's disastrous. > > Take v = s/t (velocity equals distance over time). If I write v = s > because it is implicitly understood that the time t is "one": > > s = 100 miles > v = s > > Should v be understood as 100 miles per hour or 100 miles per second or > 100 miles per year? That sort of ambiguity doesn't come up in circuit > design, but it is common elsewhere. 
> > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From python at mrabarnett.plus.com Fri Aug 26 18:29:57 2016 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 26 Aug 2016 23:29:57 +0100 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: <20160826212512.GE9468@kundert.designers-guide.com> References: <20160826124716.GP26300@ando.pearwood.info> <20160826212512.GE9468@kundert.designers-guide.com> Message-ID: On 2016-08-26 22:25, Ken Kundert wrote: > Okay, so I talked to Guido about this, and all he was trying to convey is that > there is an extremely high bar that must be reached before he will consider > changing the base language, which of course is both prudent and expected. > > I'd like to continue the discussion because I believe there is some chance > that we could reach that bar even though Guido is clearly skeptical. > > At this point I'd like to suggest that there be two more constraints we > consider: > 1. whatever form we choose to be used when outputting numbers should be the same > form we use when inputting numbers (so %r should produce valid Python input, > as do all the other format codes), > 2. and whatever form we choose to be used when outputting numbers should look > natural to end users. > > So ideas like using 2.4*G, 2.4*GHz, or 0s2.4G wouldn't really work because we > would not want to output numbers to end users in this form. Even 2.4_GHz, while > better, would still look somewhat unnatural to end users. > "2.4" would not look natural to a lot of people; they would expect "2,4". Either you force the end user to accept what Python uses for a decimal point and digit grouping, or you convert on input and output. 
And if you're doing that for the decimal point and digit grouping, you might as well also do that for scale factors and units. > One more thing to consider is that we already have a precedent here. Python > already accepts a suffix on real numbers: j signifies an imaginary number. In > this case the above constraints are satisfied. For example, 2j is a natural form > to show any end user that understands imaginary numbers, and 2j is acceptable > input to the language. To be consistent with that, it seems like 2G or 2GHz > should be preferred over 2_G or 2_GHz. > > Of course, this brings up another issue, how do we handle imaginary numbers with > scale factors. The possibilities include: > 1. you don't get them, you can either specify j or a scale factor, but not both > 2. you do get them, but if we allow units, then j should be first > 3. you do get them, but we don't allow units and j could be first or second > > I like choice 2 myself. Also, to be consistent with j, and because I think it is > simpler overall, I think 2G should be a real number, not an integer. Similarly, > I think 2Gi, if we accept it, should also be a real number, simply for > consistency. > > One last thing, we can accept 273K as input for 273,000, but when we output it > we must use k to avoid confusion with Kelvin (and because that is the standard). > Also, we can use µ for inputting or outputting 1e-6, but we must always > accept u as valid input. 
> Interestingly, it's not possible to use J as a unit (Joule): >>> 1J 1j From steve at pearwood.info Fri Aug 26 23:48:29 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 27 Aug 2016 13:48:29 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160826222324.GF9468@kundert.designers-guide.com> References: <20160825081955.GA21350@kundert.designers-guide.com> <20160826100129.GN26300@ando.pearwood.info> <20160826222324.GF9468@kundert.designers-guide.com> Message-ID: <20160827034829.GS26300@ando.pearwood.info> On Fri, Aug 26, 2016 at 03:23:24PM -0700, Ken Kundert wrote: > Second, I concede that there is some chance that users may be lulled into > a false sense of complacency and that some dimensional errors would get missed > by these otherwise normally very diligent users. But I would point out that > I have been intensively using and supporting languages that provide this feature > for 40 years and have never seen it. In your first post, you said that there were no languages at all that supported units as a language feature, and suggested that Python should lead the way here: I find it a little shocking that no programming languages offer this feature yet Now you say you've been using these "languages" plural for forty years. Would you like to rephrase your claim? I am unable to reconcile the discrepancy. (There are three languages that I know of that support units as a first class language feature, RPL, Frink and Fortress. None of them are 40 years old.) -- Steve From greg.ewing at canterbury.ac.nz Fri Aug 26 21:16:42 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 27 Aug 2016 13:16:42 +1200 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: <20160826124716.GP26300@ando.pearwood.info> References: <20160826124716.GP26300@ando.pearwood.info> Message-ID: <57C0E9FA.8080907@canterbury.ac.nz> Steven D'Aprano wrote: > Obviously if I write 1.1K then I'm expecting a float. 
Why is it obvious that you're expecting a float and not a decimal in that case? > The SI units are all decimal, and I think if we support these, we should > insist that K == 1000, not 1024. For binary scale factors, there is the > IEC standard: Or perhaps allow the multiplier to be followed by 'b' or 'B' (bits/bytes/binary) to signal a binary scale factor. > and then > there's 1E1E which might be read as 1E1*10**18 or as just an error. I don't think it's necessary or desirable to support having both a scale factor *and* an exponent, so I'd go for making it an error. You can always write 1E * 1E18 etc. if you need to. > 8M # remains a syntax error > 0s8M # unambiguously an int with a scale factor of M = 10**6 That looks ugly and hard to read to me. If we're to have that, I'm not sure 's' is the best character, since it suggest something to do with strings. > Proposal number two: don't make any changes to the syntax, but treat > these as *literally* numeric scale factors. > > k = kilo = 10**3 > M = mega = 10**6 > G = giga = 10**9 > > int_value = 8*M > float_value = 8.0*M > fraction_value = Fraction(1, 8)*M > decimal_value = Decimal("1.2345")*M I like this! > You can even scale by multiple factors: > > x = 8*M*K Which also offers a neat solution to the "floppy megabytes" problem: k = 1000 kB = 1024 floppy_size = 1.44*k*kB -- Greg From turnbull.stephen.fw at u.tsukuba.ac.jp Sat Aug 27 02:24:49 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. 
Turnbull) Date: Sat, 27 Aug 2016 15:24:49 +0900 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: <20160826212512.GE9468@kundert.designers-guide.com> References: <20160826124716.GP26300@ando.pearwood.info> <20160826212512.GE9468@kundert.designers-guide.com> Message-ID: <22465.12849.783819.73749@turnbull.sk.tsukuba.ac.jp> Ken Kundert writes: > I'd like to continue the discussion because I believe there is some > chance that we could reach that bar even though Guido is clearly > skeptical. OK. > input to the language. To be consistent with that, it seems like 2G > or 2GHz should be preferred over 2_G or 2_GHz. Sure, *in the user interface*. The Python interpreter REPL is a user interface, but it's a "bare minimum" intended to expose the language and no more. Interfaces like Jupyter can provide the heuristic that an identifier following a numeric literal is a "unit" (whatever we decide that should be, a type as I suggest, semantically null as you seem to prefer, or something else) without changing the language. This separation of concerns also allows Python (the language) to experiment with different implementations of "unit" while Jupyter maintains its user interface without change. If you want to change the *language* you need to provide answers to the following. I have no answers to them that I like, but maybe you can do better. How about 2.4Gaunitwithaveryveryveryveryveryverylongname? Consider the chemical unit "mol". How do you distinguish "1 mol" from "1/1000 ol"? Similarly, how do you distinguish "1 joule" from "1 imaginary oule"? If you allow both naked prefixes and prefixed units, how do you distinguish "1/10 a" from "10" when both are represented "1da"? > One last thing, we can accept 273K as input for 273,000, but when > we output it we must use k to avoid confusion with Kelvin (and > because that is the standard). I think that's unacceptable. 
If "273K" has the valid interpretation "0 degrees Celsius" and we're going to accept units at all, we must not ask users to type 273000mK or even 2730dK. So if we ever accept "1K" to mean 1000, we're kinda hosed for accepting units. I think units syntax is broken anyway per the examples above, and Guido already pronounced on "naked scale prefixes": > On Fri, Aug 26, 2016 at 07:42:00AM -0700, Guido van Rossum wrote: > > Please curb your enthusiasm. This is not going to happen. +1 Guido may have retracted this pronouncement in private mail, but by that same token, he can reinstate it. I've learned to trust his first reactions; backing off this way is a symptom of openness to new ideas, not inaccuracy of the first reaction. Steve From steve at pearwood.info Sat Aug 27 03:05:34 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 27 Aug 2016 17:05:34 +1000 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: <57C0E9FA.8080907@canterbury.ac.nz> References: <20160826124716.GP26300@ando.pearwood.info> <57C0E9FA.8080907@canterbury.ac.nz> Message-ID: <20160827070532.GT26300@ando.pearwood.info> On Sat, Aug 27, 2016 at 01:16:42PM +1200, Greg Ewing wrote: > Steven D'Aprano wrote: > > >Obviously if I write 1.1K then I'm expecting a float. > > Why is it obvious that you're expecting a float and not > a decimal in that case? Because if you search the list archives you'll see that, in the short term at least, I'm not in favour of changing the default floating point type from binary floats to decimal floats *wink* Also, I didn't say "binary float", I might have meant "decimal float" :-) But all joking aside, you are making a good point. Since the SI units are all powers of ten, maybe this should be linked to decimal and integer numbers rather than binary floats. Let's look ahead to the day (Python 4? Python 5?) where there is a built-in decimal floating point type with fixed precision. 
The existing Decimal type will remain variable precision. Whatever suffixes we allow now will limit our choices for this hypothetical "decimal128" (say) type. So if we have: 123d to mean 123 deci-units, or 123*10**-1, then we've just eliminated the ability to make 123d a decimal. Now that's not necessarily a reason to reject this proposal, but it does add a complication. (And frankly, I'd rather get built-in fixed precision decimals than syntax for scale factors. But that's another story.) > >The SI units are all decimal, and I think if we support these, we should > >insist that K == 1000, not 1024. For binary scale factors, there is the > >IEC standard: > > Or perhaps allow the multiplier to be followed by > 'b' or 'B' (bits/bytes/binary) to signal a binary scale > factor. Why sure! Let's ignore the perfectly good, well-known existing official standard to invent our own standard that clashes with the de facto standard use of "b" for bits and "B" for bytes! *wink* > >8M # remains a syntax error > >0s8M # unambiguously an int with a scale factor of M = 10**6 > > That looks ugly and hard to read to me. > > If we're to have that, I'm not sure 's' is the best character, > since it suggest something to do with strings. To be honest, I agree. Even though I suggested 0s as a prefix, I don't actually like it. I much prefer the rule "any scale factor must follow the last underscore of the number". > >Proposal number two: don't make any changes to the syntax, but treat > >these as *literally* numeric scale factors. > > > >k = kilo = 10**3 > >M = mega = 10**6 > >G = giga = 10**9 > > > >int_value = 8*M > >float_value = 8.0*M > >fraction_value = Fraction(1, 8)*M > >decimal_value = Decimal("1.2345")*M > > I like this! I've started experimenting with this, and when I get time, I'll put it on PyPI so that people can experiment with it too. 
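For what such an experiment might look like (a sketch of the idea, not Steven's actual code), module-level constants are enough, and the IEC binary factors can sit alongside the decimal ones:

```python
from fractions import Fraction

# Decimal (SI) scale factors, per "proposal number two".
k = kilo = 10**3
M = mega = 10**6
G = giga = 10**9

# IEC binary scale factors, so k always means 1000 and Ki means 1024.
Ki = 2**10
Mi = 2**20
Gi = 2**30

# The factors preserve the numeric type of whatever they multiply,
int_value = 8 * M                    # an int
float_value = 8.0 * M                # a float
fraction_value = Fraction(1, 8) * M  # a Fraction
# and they compose: the "1.44 MB" floppy is really 1.44 * k * Ki bytes.
floppy_bytes = 1.44 * k * Ki
```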
> >You can even scale by multiple factors: > > > >x = 8*M*K > > Which also offers a neat solution to the "floppy megabytes" > problem: > > k = 1000 > kB = 1024 > > floppy_size = 1.44*k*kB Indeed, although I would write that as k*Ki. -- Steve From mertz at gnosis.cx Sat Aug 27 03:06:39 2016 From: mertz at gnosis.cx (David Mertz) Date: Sat, 27 Aug 2016 00:06:39 -0700 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: <57C0E9FA.8080907@canterbury.ac.nz> References: <20160826124716.GP26300@ando.pearwood.info> <57C0E9FA.8080907@canterbury.ac.nz> Message-ID: >> Proposal number two: don't make any changes to the syntax, but treat these as *literally* numeric scale factors. >> k = kilo = 10**3 >> M = mega = 10**6 >> G = giga = 10**9 >> >> int_value = 8*M float_value = 8.0*M >> fraction_value = Fraction(1, 8)*M >> decimal_value = Decimal("1.2345")*M This is the only variant I've seen that I would consider "not awful." Of course, this involves no change in the language, but just a module on PyPI. Of the awful options, a suffix underscore and multiplier (1.1_G) is the least awful. It's a little bit reminiscent of the optional internal underscores being added to literals. -------------- next part -------------- An HTML attachment was scrubbed... URL: From python-ideas at shalmirane.com Sat Aug 27 04:44:54 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Sat, 27 Aug 2016 01:44:54 -0700 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160827034829.GS26300@ando.pearwood.info> References: <20160825081955.GA21350@kundert.designers-guide.com> <20160826100129.GN26300@ando.pearwood.info> <20160826222324.GF9468@kundert.designers-guide.com> <20160827034829.GS26300@ando.pearwood.info> Message-ID: <20160827084454.GD14864@kundert.designers-guide.com> SPICE, written by Larry Nagel, introduced the concept in 1972. 
It is a circuit simulator, and the language involved was a netlist language: basically a list of components, the nodes they were connected to, and their values. It looked like this:

R1 1 0 1K
C1 1 0 1nF
I1 1 0 1mA

SPICE was an incredibly influential program used by virtually all circuit designers for decades. Interestingly, it was very likely the first open source software project. It was developed at Berkeley as a free and open source project, well before those terms were in common use, and it was highly influential on the BSD UNIX developers, also at Berkeley, who in turn were influential on Stallman at MIT. Verilog, a hardware modeling language, adopted the concept on a small scale (just for time) in the 1980's. Then in the early 90's Verilog-A was created, a version of Verilog designed to allow people to model analog circuits. It allowed use of SI scale factors for all real numbers. A few years later Verilog-AMS was released. It combined Verilog and Verilog-A. It also allows SI scale factors on all real numbers. I developed Verilog-A as well as Spectre, a replacement for SPICE, and so I am intimately familiar with the language issues, the implementation issues, and the user issues of use of SI scale factors in particular, and computational programming in general. So SPICE was a netlist language, Verilog was a modeling language. I was not aware of any general purpose programming languages that offer support for SI scale factors or units. RPL, Frink, and Fortress are new to me. I took a quick look at Frink and it does not look like a general purpose programming language either, more like a calculator language. That is, of course, what RPL is. Neither really looks up to taking on a serious computational task. Fortress looks like a general purpose programming language, but little detail seems to remain about this language, and I found nothing on units or scale factors. 
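For readers who have not used SPICE: its suffixes are case-insensitive, which is why mega is spelled MEG (a bare M already means milli) and why the F in 1nF is read as a unit, not as femto. A rough sketch of the parsing rule (my own illustration, not SPICE source code):

```python
import re

# SPICE scale factors; MEG must be checked before M, since a bare M
# means milli.  Any trailing letters after the factor (the "F" in
# "1nF", the "A" in "1mA") are a unit and are ignored.
_SPICE_FACTORS = [('MEG', 1e6), ('T', 1e12), ('G', 1e9), ('K', 1e3),
                  ('M', 1e-3), ('U', 1e-6), ('N', 1e-9), ('P', 1e-12),
                  ('F', 1e-15)]

def spice_value(token):
    """Parse a SPICE number like '1K', '1nF', or '2.5MEG' to a float."""
    m = re.match(r'([0-9.]+)([A-Za-z]*)', token)
    number, suffix = float(m.group(1)), m.group(2).upper()
    for name, factor in _SPICE_FACTORS:
        if suffix.startswith(name):
            return number * factor
    return number
```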
-Ken On Sat, Aug 27, 2016 at 01:48:29PM +1000, Steven D'Aprano wrote: > On Fri, Aug 26, 2016 at 03:23:24PM -0700, Ken Kundert wrote: > > > Second, I concede that there is some chance that users may be lulled into > > a false sense of complacency and that some dimensional errors would get missed > > by these otherwise normally very diligent users. But I would point out that > > I have been intensively using and supporting languages that provide this feature > > for 40 years and have never seen it. > > In your first post, you said that there were no languages at all > that supported units as a language feature, and suggested that Python > should lead the way here: > > I find it a little shocking that no programming languages offer this > feature yet > > Now you say you've been using these "languages" plural for forty years. > Would you like to rephrase your claim? I am unable to reconcile the > discrepancy. > > (There are three languages that I know of that support units as a first > class language feature, RPL, Frink and Fortress. None of them are 40 > years old.) > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From greg.ewing at canterbury.ac.nz Sat Aug 27 04:46:49 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 27 Aug 2016 20:46:49 +1200 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: <20160827070532.GT26300@ando.pearwood.info> References: <20160826124716.GP26300@ando.pearwood.info> <57C0E9FA.8080907@canterbury.ac.nz> <20160827070532.GT26300@ando.pearwood.info> Message-ID: <57C15379.7070603@canterbury.ac.nz> Steven D'Aprano wrote: >>Why is it obvious that you're expecting a float and not >>a decimal in that case? 
> > Because if you search the list archives you'll see that, in the short > term at least, I'm not in favour of changing the default floating point > type from binary floats to decimal floats *wink* I only asked because earlier in the same post you said you wanted 7981m to be decimal. Given that, I wouldn't have guessed you wanted 1.1K to *not* be decimal. -- Greg From python-ideas at shalmirane.com Sat Aug 27 04:52:09 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Sat, 27 Aug 2016 01:52:09 -0700 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: <22465.12849.783819.73749@turnbull.sk.tsukuba.ac.jp> References: <20160826124716.GP26300@ando.pearwood.info> <20160826212512.GE9468@kundert.designers-guide.com> <22465.12849.783819.73749@turnbull.sk.tsukuba.ac.jp> Message-ID: <20160827085209.GF14864@kundert.designers-guide.com> On Sat, Aug 27, 2016 at 03:24:49PM +0900, Stephen J. Turnbull wrote: > If you want to change the *language* you need to provide answers to > the following. I have no answers to them that I like, but maybe you > can do better. > > How about 2.4Gaunitwithaveryveryveryveryveryverylongname? Why would we care if the user wants to use a long name for their units? We don't care if they use a long name for their variables. > Consider the chemical unit "mol". How do you distinguish "1 mol" from > "1/1000 ol"? The rule is you cannot give a unit without a scale factor, and the unity scale factor is _, so if you wanted to say 1 mol you would use 1_mol. 1mol means one milli ol. > Similarly, how do you distinguish "1 joule" from "1 imaginary oule"? Again, you cannot give units without a scale factor, so 1 joule is 1_J. For one imaginary joule, it would be 1j_J. These look a little strange, but that is because they use the unity scale factor, which is the one that is currently not in heavy use. Other scale factors look much more natural. For example, 1 milli mol is 1mmol. 1 kilo joule is 1kJ. 
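Ken's convention (a unit may only follow a scale factor, with _ standing for the unity factor) is easy to prototype outside the language. A sketch; the function name and the exact factor set are mine:

```python
import re

_SCALE = {'T': 1e12, 'G': 1e9, 'M': 1e6, 'k': 1e3, 'K': 1e3, '_': 1.0,
          'm': 1e-3, 'u': 1e-6, 'n': 1e-9, 'p': 1e-12, 'f': 1e-15,
          'a': 1e-18}

def parse_scaled(text):
    """Split a scaled literal into (value, unit), e.g. '1kJ' -> (1000.0, 'J').

    A unit may appear only after a scale factor; '_' is the unity
    factor, so '1mol' reads as one milli-ol while '1_mol' is one mole.
    K is accepted on input even though k is the canonical output form.
    """
    m = re.fullmatch(r'([0-9.]+)([TGMKk_munpfa])([A-Za-z]*)', text)
    if m is None:
        raise ValueError('missing scale factor: %r' % text)
    number, factor, unit = m.groups()
    return float(number) * _SCALE[factor], unit
```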
> If you allow both naked prefixes and prefixed units, how do you > distinguish "1/10 a" from "10" when both are represented "1da"? I suggest that we do not support the h (=100), da (=10), d (=0.1), or c (=0.01) scale factors. The primary supported scale factors should be TGMk_munpfa. The extended set would include YZEP and zy. > > One last thing, we can accept 273K as input for 273,000, but when > > we output it we must use k to avoid confusion with Kelvin (and > > because that is the standard). > > I think that's unacceptable. If "273K" has the valid interpretation > "0 degrees Celsius" and we're going to accept units at all, we must > not ask users to type 273000mK or even 2730dK. So if we ever accept > "1K" to mean 1000, we're kinda hosed for accepting units. I think > units syntax is broken anyway per the examples above, and Guido > already pronounced on "naked scale prefixes": The valid interpretation of 273K is 273,000. If you want 273 Kelvin, you would use 273_K. > Steve From arek.bulski at gmail.com Sat Aug 27 05:04:00 2016 From: arek.bulski at gmail.com (Arek Bulski) Date: Sat, 27 Aug 2016 11:04:00 +0200 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis Message-ID: SI units are a standard that was kind of imposed top down on the computer science community. But we learned to use KB MB so why not keep the de facto standard we already have? Kibibytes and mebibytes were never really adopted. 1K == 1000 1KB == 1024 1M == 1000**2 1MB == 1024**2 Suffixes, simple. int_value = 8M float_value = 8.0M or float("8M") fraction_value = Fraction(1M, 8) or Fraction("1M/8") decimal_value = Decimal("1.2345M") Suffixes are by definition at the end of a literal. So 1E1E == 1E1 * 1E -------------- next part -------------- An HTML attachment was scrubbed... URL: From turnbull.stephen.fw at u.tsukuba.ac.jp Sat Aug 27 08:04:18 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. 
Turnbull) Date: Sat, 27 Aug 2016 21:04:18 +0900 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <304e12ac-c523-089a-d3f5-e4018f41349a@mrabarnett.plus.com> References: <20160825180211.GA1472@kundert.designers-guide.com> <20160826001453.GK26300@ando.pearwood.info> <20160826034654.GA24190@kundert.designers-guide.com> <20160826065403.GL26300@ando.pearwood.info> <304e12ac-c523-089a-d3f5-e4018f41349a@mrabarnett.plus.com> Message-ID: <22465.33218.897900.442788@turnbull.sk.tsukuba.ac.jp> MRAB writes: > When you divide 2 values, they could have the same or different units, > but must have the same colours. The result will have a combination of > the units (some might also cancel out), but will have the same colours. > > # "amp" is a unit, "current" is a colour. > # The result is a ratio of currents. > 6 amp current / 2 amp current == 3 current I don't understand why a ratio would retain color. What's the application? For example, in circuit analysis, if "current" is a color, I would expect "potential" and "resistance" to be colors, too. But from I1*R1 = V = I2*R2, we have I1/I2 = R2/R1 in a parallel circuit, so unitless ratios of color current become unitless ratios of color resistance. Furthermore that ratio might arise from physical phenomena such as temperature-varying resistance, and be propagated to another physical phenomenon such as the deflection of a meter's needle. What might color tell us about these computations? (Note: I'm pretty sure that MRAB didn't choose "current" as a parallel to "resistance". Nevertheless, the possibility of propagation of values across color boundaries seems to be necessary, and I don't see how color is going to be used.) My own take is that units specify possible operations, i.e., they are nothing more than a partial specification of a type. 
Rather than speculate on additional attributes that might be useful in conjunction with units, we should see if there are convenient ways to describe the constraints that units produce on behavior of types. I.e., by creating types VoltType(UnitType), AmpereType(UnitType), and OhmType(UnitType), and specifying the "equation" VoltType = AmpereType * OhmType on those types, the __mul__ and __div__ operators would be modified to implement the expected operations as a function of UnitType. That is, there would be a helper impose_type_expression_equivalence() that would take the string "VoltType = AmpereType * OhmType" and manipulate the derived type methods appropriately. One aspect of this approach is that we can conveniently derive concrete units such as V = VoltType(1) # Some users might prefer the unit # Volt, variable V, thus "VoltType" mA = AmpereType(1e-3) # SI scale prefix! kΩ = OhmType(1e3) # Unicode! They don't serve the OP's request (he *doesn't* want type checking, he *does* want syntax), but I prefer these anyway: 10*V == (2*mA)*(5*kΩ) Developers of types for circuit analysis derived from UnitType might prefer different names that reflect the type being measured rather than the unit, e.g. Current or CurrentType instead of AmpereType. There is no problem with units like Joule (in syntax-based proposals, it collides with the imaginary unit) and Kelvin (in syntax-based proposals, it collides with a non-SI prefix that nevertheless is so commonly used that both the OP and Steven d'Aprano say should be recognized as "kilo"). Another advantage (IMHO) is that "reified" units can be treated as equivalent to "real" (aka standard or base) units. What I mean is that New York is not approximately 4 Mm from Los Angeles (that would give most people a WTF feeling), it's about 4000 km. While I realize programmers will be able to do that conversion, this flexibility allows people to use units that feel natural to them. 
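A minimal runnable sketch of this units-as-types bookkeeping (my own illustration: it tracks dimensions in a dict at run time, so the equation Volt = Ampere * Ohm falls out of the arithmetic rather than being checked statically, and kOhm is spelled in ASCII to keep the sketch portable):

```python
class Unit:
    """A magnitude plus a dict of base-unit exponents."""

    def __init__(self, magnitude, dims):
        self.magnitude = magnitude
        self.dims = dims  # e.g. {'A': 1, 'Ohm': 1} for volts

    def _combine(self, other, sign):
        # Add (or subtract) exponents, dropping any that cancel to zero.
        dims = dict(self.dims)
        for base, exp in other.dims.items():
            dims[base] = dims.get(base, 0) + sign * exp
        return {b: e for b, e in dims.items() if e}

    def __mul__(self, other):
        if isinstance(other, Unit):
            return Unit(self.magnitude * other.magnitude,
                        self._combine(other, +1))
        return Unit(self.magnitude * other, self.dims)

    __rmul__ = __mul__

    def __truediv__(self, other):
        return Unit(self.magnitude / other.magnitude,
                    self._combine(other, -1))

    def __eq__(self, other):
        return (self.dims == other.dims
                and abs(self.magnitude - other.magnitude) < 1e-12)


V = Unit(1, {'A': 1, 'Ohm': 1})   # Volt = Ampere * Ohm by construction
mA = Unit(1e-3, {'A': 1})
kOhm = Unit(1e3, {'Ohm': 1})
```

With this, 10*V == (2*mA)*(5*kOhm) holds, and dividing a voltage by a current times a resistance yields a dimensionless result.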
If you want to use miles and feet, you can define them as

ft = LengthType(12/39.37)  # base unit is meter per SI
mi = 5280*ft

very conveniently. Using this approach, Ken's example that Nick rewrote to use type hinting would look like this:

from circuit_units import uA, V, mV, kOhm, u_second, VoltType
us = u_second        # Use project convention.
                     # u_second = SecondType(1e-6)
                     # A gratuitous style change.
expected: VoltType   # With so few declarations,
                     # I prefer "predeclaration".
                     # There is no millivolt type,
                     # so derived units are all
                     # consistent with this variable.
                     # A gratuitous style change.
for delta in [-0.5*uA, 0*uA, 0.5*uA]:  # uA = AmpereType(1e-6)
                                       # I dislike [-0.5, 0, 0.5]*uA,
                                       # but it could be implemented.
    input = 2.75*uA + delta
    wait(1*us)                   # The "1*" is redundant.
    expected = (100*kOhm)*input  # kOhm = OhmType(1e3)
    tolerance = 2.2*mV           # mV = VoltType(1e-3)
    fails = check_output(expected, tolerance)
    print('%s: I(in)=%rA, measured V(out)=%rV, expected V(out)=%rV, diff=%rV.' % (
        'FAIL' if fails else 'pass', input, get_output(),
        expected, get_output() - expected
    ))

Hmm: need for only *one* variable declaration. This is very much to my personal taste, YMMV. The main question is whether this device could support efficient computation. All of these units are objects with math dunders that have to dispatch on type (or else they need to produce "expression Types" such as Ampere_Ohm, but I don't think type checkers would automatically know that Volt = Ampere_Ohm). This clearly can't compare to the efficiency of NewType(float). But AIUI, one NewType(float) can't mix with another, which is not the behavior we want here. We could do

VoltType = NewType('Volt', float)
AmpereType = NewType('Ampere', float)
WattType = NewType('Watt', float)

def power(potential, current):
    return WattType(float(potential)*float(current))

but this is not very readable, and error-prone IMO.
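Written out concretely, the dunder dispatch described here might look like the following minimal sketch. It is not an existing library: the class names and the _PRODUCTS table are illustrative, and only the single equation VoltType = AmpereType * OhmType is encoded.

```python
# Minimal sketch of "units as types" with hand-written dunder dispatch.
# Illustrative only: _PRODUCTS plays the role a helper such as
# impose_type_expression_equivalence() would fill, and encodes just
# VoltType = AmpereType * OhmType.

class Unit:
    def __init__(self, value):
        self.value = float(value)

    def __rmul__(self, scalar):  # supports 2*mA, 5*kOhm, ...
        return type(self)(scalar * self.value)

    def __mul__(self, other):
        if isinstance(other, Unit):
            result_type = _PRODUCTS.get((type(self), type(other)))
            if result_type is None:
                raise TypeError('cannot multiply %s by %s'
                                % (type(self).__name__, type(other).__name__))
            return result_type(self.value * other.value)
        return type(self)(self.value * other)

    def __eq__(self, other):
        return type(self) is type(other) and self.value == other.value

class AmpereType(Unit): pass
class OhmType(Unit): pass
class VoltType(Unit): pass

# The "equation" VoltType = AmpereType * OhmType, in both orders.
_PRODUCTS = {
    (AmpereType, OhmType): VoltType,
    (OhmType, AmpereType): VoltType,
}

V = VoltType(1)
mA = AmpereType(1e-3)
kOhm = OhmType(1e3)

assert (2*mA)*(5*kOhm) == 10*V  # dimensionally consistent, so it holds
```

Dimensionally inconsistent products (say, volts times amperes, with no entry in the table) raise TypeError at runtime rather than being caught statically, which is the trade-off against the NewType version.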
It's also less efficient than the "zero cost" that NewType promises for types like User_ID (https://www.python.org/dev/peps/pep-0484/#newtype-helper-function). I suppose it would be feasible (though ugly) to provide two implementations of VoltType, one as a "real" class as I propose above, and the other simply VoltType = float. The former would be used for type checking, the latter for production computations. Perhaps such a process could be automated in mypy? A final advantage is that I suppose that it should be possible to implement "color" as MRAB proposes through the Python type system. We don't have to define it now, but can take advantage of the benefits of "units as types" approach immediately. I don't know how to implement impose_type_expression_equivalence(), yet, so this is not a proposal for the stdlib. But the necessary definitions by hand are straightforward, though tedious. Individual implementations of units can be done *now*, without change to Python, AFAICS. Computational efficiency is an issue, but one that doesn't matter to educational applications, for example. Steve From mal at egenix.com Sat Aug 27 08:36:11 2016 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 27 Aug 2016 14:36:11 +0200 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <22465.33218.897900.442788@turnbull.sk.tsukuba.ac.jp> References: <20160825180211.GA1472@kundert.designers-guide.com> <20160826001453.GK26300@ando.pearwood.info> <20160826034654.GA24190@kundert.designers-guide.com> <20160826065403.GL26300@ando.pearwood.info> <304e12ac-c523-089a-d3f5-e4018f41349a@mrabarnett.plus.com> <22465.33218.897900.442788@turnbull.sk.tsukuba.ac.jp> Message-ID: <57C1893B.20008@egenix.com> I've been following this discussion on and off for a while, but still fail to see how SI units, factors or the like are a use case which is general enough to warrant changing the language. 
There are packages available on PyPI for dealing with this in a similar way we deal with decimal literals in Python: C extension: https://pypi.python.org/pypi/cfunits/ http://pythonhosted.org/cfunits/cfunits.Units.html (interfaces to the udunits-2 lib: http://www.unidata.ucar.edu/software/udunits/udunits-2.2.20/doc/udunits/udunits2.html) Pure python: https://pypi.python.org/pypi/units/ IMHO, a literal notation like "2 m" is more likely related to a missing operator which should be flagged as SyntaxError than the declaration of an integer with associated unit. By keeping such analysis to string to object conversion tools/functions you make the intent explicit, which allows for better error reporting. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Aug 27 2016) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From xavier.combelle at gmail.com Sat Aug 27 10:51:28 2016 From: xavier.combelle at gmail.com (Xavier Combelle) Date: Sat, 27 Aug 2016 16:51:28 +0200 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160827084454.GD14864@kundert.designers-guide.com> References: <20160825081955.GA21350@kundert.designers-guide.com> <20160826100129.GN26300@ando.pearwood.info> <20160826222324.GF9468@kundert.designers-guide.com> <20160827034829.GS26300@ando.pearwood.info> <20160827084454.GD14864@kundert.designers-guide.com> Message-ID: <9b635489-2562-4fff-5505-e0be9acfb138@gmail.com>

On 27/08/2016 10:44, Ken Kundert wrote:
> SPICE, written by Larry Nagel, introduced the concept in 1972. It is a circuit
> simulator, and the language involved was a netlist language: basically a list of
> components, the nodes they were connected to, and their values. It looked like
> this:
>
> R1 1 0 1K
> C1 1 0 1nF
> I1 1 0 1mA
>
> SPICE was an incredibly influential program used by virtually all circuit
> designers for decades. Interestingly, it was very likely the first open source
> software project. It was developed at Berkeley as a free and open source
> project, well before those terms were in common use, and it was highly
> influential on the BSD UNIX developers, also at Berkeley, which in turn were
> influential on Stallman at MIT.
>
> Verilog, a hardware modeling language, adopted the concept on a small scale (just
> for time) in the 1980's. Then in the early 90's Verilog-A was created,
> a version of Verilog designed to allow people to model analog circuits. It
> allowed use of SI scale factors for all real numbers. A few years later
> Verilog-AMS was released. It combined Verilog and Verilog-A. It also allows SI
> scale factors on all real numbers.
> I developed Verilog-A as well as Spectre,
> a replacement for SPICE, and so I am intimately familiar with language issues,
> the implementation issues, and the user issues of use of SI scale factors in
> particular, and computational programming in general.
>
> So SPICE was a netlist language, Verilog was a modeling language. I was not
> aware of any general purpose programming languages that offer support for SI
> scale factors or units. RPL, Frink, and Fortress are new to me. I took a quick
> look at Frink and it does not look like a general purpose programming language
> either, more like a calculator language. That is, of course, what RPL is.
> Neither really look up to taking on a serious computational task. Fortress looks
> like a general purpose programming language, but little detail seems to remain
> about this language, and I found nothing on units or scale factors.
>
> -Ken
>
Both examples (SPICE and Verilog) are electronic design languages. So their task is easy: they allow only electronic units. It is very likely that your experience in these languages can't be carried over to a general purpose language. I know that F# uses units of measure (see for example: https://fsharpforfunandprofit.com/posts/units-of-measure/ or http://stevenpemberton.net/blog/2015/03/11/FSharp-Units-Of-Measure/ ), but this experience can hardly transpose to Python, as F# is a heavily statically type-checked language.

From turnbull.stephen.fw at u.tsukuba.ac.jp Sat Aug 27 11:00:47 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J.
Turnbull) Date: Sun, 28 Aug 2016 00:00:47 +0900 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: <20160827084204.GC14864@kundert.designers-guide.com> References: <20160826124716.GP26300@ando.pearwood.info> <20160826212512.GE9468@kundert.designers-guide.com> <22465.12849.783819.73749@turnbull.sk.tsukuba.ac.jp> <20160827084204.GC14864@kundert.designers-guide.com> Message-ID: <22465.43807.8775.459823@turnbull.sk.tsukuba.ac.jp>

Ken Kundert writes:
> The rule is you cannot give unit without a scale factor, and the
> unity scale factor is _, so if you wanted to say 1 mol you would
> use 1_mol. 1mol means one milli ol. These look a little strange,
> but that is because they use the unity scale factor, which is the
> one that is currently not in heavy use.

One reason I like Python is that it has relatively few of these irregularities and ambiguities (in comparison to other languages and non-programming contexts with similar usage). For me, that counts against this proposal. BTW, where is "_" used as the unit scale prefix?

> I suggest that we do not support the h (=100), da (=10), d (=0.1),
> or c (=0.01) scale factors.

I don't think it's reasonable to exclude those. Around me, cm, dB, and ha (centimeters, decibels, and hectares) are in common use. What happened to "support with a capital S"? I don't speak for anybody but myself, but I think this proposal has gotten less interesting/acceptable with each post. I'm going to wait and see if the "units are types" approach goes anywhere. I think it's probably the only one that has wings, but that's because it requires no change to the language.
From mertz at gnosis.cx Sat Aug 27 13:47:08 2016 From: mertz at gnosis.cx (David Mertz) Date: Sat, 27 Aug 2016 10:47:08 -0700 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <9b635489-2562-4fff-5505-e0be9acfb138@gmail.com> References: <20160825081955.GA21350@kundert.designers-guide.com> <20160826100129.GN26300@ando.pearwood.info> <20160826222324.GF9468@kundert.designers-guide.com> <20160827034829.GS26300@ando.pearwood.info> <20160827084454.GD14864@kundert.designers-guide.com> <9b635489-2562-4fff-5505-e0be9acfb138@gmail.com> Message-ID: It really feels like the OP simply wants Python to become a language for circuit design, with no consideration of general purpose usability, nor of other domains. Little in the proposal translates well outside his particular domain, and the differences between domains simply make the proposed additions opportunities for new errors. On Aug 27, 2016 7:52 AM, "Xavier Combelle" wrote: > > > On 27/08/2016 10:44, Ken Kundert wrote: > > SPICE, written by Larry Nagel, introduced the concept in 1972. It is a > circuit > > simulator, and the language involved was a netlist language: basically a > list of > > components, the nodes there were connected to, and their values. It > looked like > > this: > > > > R1 1 0 1K > > C1 1 0 1nF > > I1 1 0 1mA > > > > SPICE was an incredibly influential program used by virtually all circuit > > designers for decades. Interesting, it was very likely the first open > source > > software project. It was developed at Berkeley as a free and open source > > project, well before those terms were in common use, and it was highly > > influential on the BSD UNIX developers, also at Berkeley, which in turn > were > > influential on Stallman at MIT. > > > > Verilog, a hardware modeling language adopted the concept in a small > scale (just > > for time) in the 1980's. Then in the early 90's Verilog-A was created, > > a version of Verilog designed to allow people to model analog circuits.
> It > > allowed use of SI scale factors for all real numbers. A few years later > > Verilog-AMS was released. It combined Verilog and Verilog-A. It also > allows SI > > scale factors on all real numbers. I developed Verilog-A as well as > Spectre, > > a replacement for SPICE, and so I am intimately familiar with language > issues, > > the implementation issues, and the user issues of use of SI scale > factors in > > particular, and computational programming in general. > > > > So SPICE was a netlist language, Verilog was a modeling language. I was > not > > aware of any general purpose programming languages that offer supports > for SI > > scale factors or units. RPL, Frink, and Fortress are new to me. I took a > quick > > look at Frink and it does not look like a general purpose programming > language > > either, more like a calculator language. That is, of course, what RPL is. > > Neither really look up to taking on a serious computational task. > Fortress looks > > like a general purpose programming language, but little detail seems to > remain > > about this language, and I found nothing on units or scale factors. > > > > -Ken > > > Both example (SPICE and Verilog) are electronic design languages. > So their task is easy, they allow only electronic units. It is very > likely that your experience > in these languages can't be used to translate a general purpose language. > I know that F# use unit of measure (see for example: > https://fsharpforfunandprofit.com/posts/units-of-measure/ or > http://stevenpemberton.net/blog/2015/03/11/FSharp-Units-Of-Measure/ ) > this experience can hardly transpose to python has it is an heavily > statically type checked language. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Sat Aug 27 14:25:15 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 28 Aug 2016 04:25:15 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160826222324.GF9468@kundert.designers-guide.com> References: <20160825081955.GA21350@kundert.designers-guide.com> <20160826100129.GN26300@ando.pearwood.info> <20160826222324.GF9468@kundert.designers-guide.com> Message-ID: <20160827182514.GV26300@ando.pearwood.info>

On Fri, Aug 26, 2016 at 03:23:24PM -0700, Ken Kundert wrote:
> Steven,
> This keeps coming up, so let me address it again.
>
> First, I concede that you are correct that my proposal does not provide
> dimensional analysis, so any dimensional errors that exist in this new code will
> not be caught by Python itself, as is currently the case.
>
> However, you should concede that by bringing the units front and center in the
> language, they are more likely to be caught by the user themselves.

I do not concede any such thing at all. At best this might apply under some circumstances with extremely simple formulae when working directly with literals, in other words when using Python like a souped-up calculator. (Which is a perfectly reasonable way to use Python -- I do that myself. But it's not the only way to use Python.) But counting against that is that there will be other cases where what should be an error will sneak past because it happens to look like a valid scale factor or unit:

x += 1Y  # oops, I fat-fingered the Y when I meant 16

> It is
> my position that dimensional analysis is so difficult and burdensome that there
> is no way it should be in the base Python language. If available, it should be
> as an add on.

This is a strange and contradictory position to take. If dimensional analysis is so "difficult and burdensome", how do you expect the user to do it in their head by just looking at the source code?
It is your argument above that users will be able to catch dimensional errors just by looking at the units in the source code, but here, just one sentence later, you claim that dimensional analysis is so difficult and burdensome that users cannot deal with it even with the assistance of the interpreter. I cannot reconcile those two beliefs. If you think that dimensional analysis is both important and "difficult and burdensome", then surely we should want to automate as much of it as possible? Of course the easy cases are easy: torque = 45_N * 18_m is obviously correct, but the hard cases are not. As far as I can tell, your suggested syntax doesn't easily support compound units, let alone more realistic cases of formulae from sciences other than electrical engineering:

# Van der Waals equation
pressure = (5_mol * 6.022140857e23/1_mol * 1.38064852e-23_J/1_K * 340_K
            / (2.5_m**3 - 5_mol * 0.1281_m**3/1_mol)
            - (5_mol)**2*(19.7483_L*1_L*1_bar/(1_mol)**2)/(2.5_m**3)**2
            )

I'm not even sure if I've got that right after checking it three times. I believe it is completely unrealistic to expect the reader to spot dimensional errors by eye in anything but the most trivial cases. Here is how I would do the same calculation in sympy. For starters, rather than using a bunch of "magic constants" directly in the formula, I would set them up as named variables. That's just good programming practice whether there are units involved or not.

# Grab the units we need.
from sympy.physics.units import mol, J, K, m, Pa, bar, liter as L
# And some constants.
from sympy.physics.units import avogadro_constant as N_a, boltzmann as k
R = N_a*k
# Define our variables.
n = 5*mol
T = 340*K
V = 2.5*m**3
# Van der Waals constants for carbon tetrachloride.
a = 19.7483*L**2*bar/mol**2
b = 0.1281*m**3/mol
# Apply Van der Waals equation to calculate the pressure
p = n*R*T/(V - n*b) - n**2*a/V**2
# Print the result in Pascal.
print p/Pa

Sympy (apparently) doesn't warn you if your units are incompatible, it just treats them as separate terms in an expression:

py> 2*m + 3*K
3*K + 2*m

which probably makes sense from the point of view of a computer algebra system (adding two metres and three degrees Kelvin is no weirder than adding x and y). But from a unit conversion point of view, I think sympy is the wrong solution. Nevertheless, it still manages to give the right result, and in a form that is easy to understand, easy to read, and easy to confirm is correct. (If p/Pa is not a pure number, then I know the units are wrong. That's not ideal, but it's better than having to track the units myself. There are better solutions than sympy, I just picked this because I happened to have it already installed.)

> This proposal is more about adding capabilities to the base
> language that happen to make dimensional analysis easier and more attractive
> than about providing dimensional analysis itself.

I think it is an admirable aim to want to make unit tracking easier in Python. That doesn't imply that this is the right way to go about it. Perhaps you should separate your suggested syntax from your ultimate aim. Instead of insisting that your syntax is the One Right Way to get units into Python, how about thinking about what other possible syntax might work? Here's a possibility, thrown out just to be shot down:

# Van der Waals constants for carbon tetrachloride.
a = 19.7483 as L**2*bar/mol**2
b = 0.1281 as m**3/mol

I think that's better than:

a = 19.7483_L * (1_L) * (1_bar) / (1_mol)**2
b = 0.1281_m * (1_m)**2 / 1_mol

and *certainly* better than trying to have the interpreter guess whether:

19.7483_L**2*bar/mol**2

means 19.7483 with units L**2*bar/mol**2 or 19.7483_L squared, times bar, divided by mol**2

-- Steve

From rosuav at gmail.com Sat Aug 27 15:22:54 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 28 Aug 2016 05:22:54 +1000 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: <20160827182514.GV26300@ando.pearwood.info> References: <20160825081955.GA21350@kundert.designers-guide.com> <20160826100129.GN26300@ando.pearwood.info> <20160826222324.GF9468@kundert.designers-guide.com> <20160827182514.GV26300@ando.pearwood.info> Message-ID:

On Sun, Aug 28, 2016 at 4:25 AM, Steven D'Aprano wrote:
>
> Sympy (apparently) doesn't warn you if your units are incompatible, it
> just treats them as separate terms in an expression:
>
> py> 2*m + 3*K
> 3*K + 2*m
>
> which probably makes sense from the point of view of a computer algebra
> system (adding two metres and three degrees Kelvin is no weirder than
> adding x and y). But from a unit conversion point of view, I think
> sympy is the wrong solution.

As a generic tool, I would say this is correct. It keeps things simple and straight-forward. Worst case, you see a strange result at the end, rather than getting an instant exception; in fact, it's very similar to NaN, in that some operations might cancel out the "error" status.
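Carrying the mismatch along, as sympy does, is one defensible policy; the opposite policy, raising immediately on incompatible units, also takes only a few lines. The following is an illustrative sketch, not a real package; the Quantity class and its dict-of-exponents representation are invented for this example:

```python
# Sketch of the opposite policy to sympy's: track dimensions as a dict
# of exponents and raise immediately on incompatible addition.
# Illustrative only, not a real package.
class Quantity:
    def __init__(self, value, dims):
        self.value = value
        self.dims = dict(dims)   # e.g. {'m': 1} for meters

    def __add__(self, other):
        if self.dims != other.dims:
            raise TypeError('incompatible units: %r vs %r'
                            % (self.dims, other.dims))
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        dims = dict(self.dims)
        for d, exp in other.dims.items():
            dims[d] = dims.get(d, 0) + exp
            if dims[d] == 0:
                del dims[d]      # dimension cancelled out
        return Quantity(self.value * other.value, dims)

two_m = Quantity(2, {'m': 1})
three_m = Quantity(3, {'m': 1})
three_K = Quantity(3, {'K': 1})

print((two_m + three_m).value)   # fine: same dimensions
# two_m + three_K would raise TypeError instead of yielding 3*K + 2*m
```

Which policy is better is exactly the generic-tool versus unit-checker question being debated here.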
ChrisA

From levkivskyi at gmail.com Sat Aug 27 18:17:20 2016 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Sun, 28 Aug 2016 01:17:20 +0300 Subject: [Python-ideas] SI scale factors in Python In-Reply-To: References: <20160825081955.GA21350@kundert.designers-guide.com> <20160826100129.GN26300@ando.pearwood.info> <20160826222324.GF9468@kundert.designers-guide.com> <20160827182514.GV26300@ando.pearwood.info> Message-ID:

On 26 August 2016 at 18:35, Steven D'Aprano wrote:
> I think that units are orthogonal to types: I can have a float of unit
> "gram" and a Fraction of unit "gram", and they shouldn't necessarily be
> treated as the same type.

I think you are mixing here what I sometimes call classes (i.e. runtime implementation) and types (i.e., static "interface" declaration). In these terms I think units are types. But probably it is a more philosophical question and could be a matter of taste.

On 27 August 2016 at 15:04, Stephen J. Turnbull < turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:
> def power(potential, current):
> return WattType(float(Volt)*float(Ampere))

One doesn't need to call float on NewTypes derived from it; they will be cast "automatically", so that:

def power(I, V):
    return WattType(I*V)

should be sufficient. Concerning the speed vs. flexibility issue, one could use stub files:

# content of units.py
def WattType(x):
    return x
# etc.

# contents of units.pyi
class WattType:
    def __init__(self, x: float) -> None: ...
    def __mul__(self, other: float) -> WattType: ...
    # over 9000 complicated overloads and stuff

In such a way units will be very fast at runtime but will be thoroughly checked by static type checkers. As I understand there are two separate parts of the proposal: 1) suffixes like micro, kilo, etc. -- but Guido does not like this idea yet 2) physical units -- this part I think could be 99% solved by PEP 484 and PEP 526 (it is not 100% because this will require dependent types).
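The NewType variant can be written out as a complete runnable sketch. The type names follow the thread; the print call and the swapped-argument caveat are illustrative:

```python
# Runnable sketch of the NewType approach from this thread: a checker
# sees three distinct types, while at runtime everything is an ordinary
# float (NewType adds no wrapper object and no overhead).
from typing import NewType

VoltType = NewType('VoltType', float)
AmpereType = NewType('AmpereType', float)
WattType = NewType('WattType', float)

def power(I: AmpereType, V: VoltType) -> WattType:
    # No float() casts needed: I and V already *are* floats at runtime.
    return WattType(I * V)

p = power(AmpereType(2e-3), VoltType(10.0))
print(p)

# Caveat: nothing stops power(VoltType(10.0), AmpereType(2e-3)) at
# runtime -- only a static checker such as mypy flags the swap.
```

This is the "zero cost" trade-off: full speed at runtime, with all the unit discipline living in the type checker.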
-- Ivan On 27 August 2016 at 22:22, Chris Angelico wrote: > On Sun, Aug 28, 2016 at 4:25 AM, Steven D'Aprano > wrote: > > > > Sympy (apparently) doesn't warn you if your units are incompatible, it > > just treats them as separate terms in an expression: > > > > py> 2*m + 3*K > > 3*K + 2*m > > > > which probably makes sense from the point of view of a computer algebra > > system (adding two metres and three degrees Kelvin is no weirder than > > adding x and y). But from a unit conversion point of view, I think > > sympy is the wrong solution. > > As a generic tool, I would say this is correct. It keeps things simple > and straight-forward. Worst case, you see a strange result at the end, > rather than getting an instant exception; in fact, it's very similar > to NaN, in that some operations might cancel out the "error" status. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arek.bulski at gmail.com Sat Aug 27 18:39:42 2016 From: arek.bulski at gmail.com (Arek Bulski) Date: Sun, 28 Aug 2016 00:39:42 +0200 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: References: Message-ID: Just to mention, these K M G suffixes are dimensionless. They can be used simply out of convenience, like 4K is a shorthand for 4000. And 9G is definitely easier to write and *therefore less prone to error* than a full literal. Dimensional analysis is fine but not the only reason to add these. pozdrawiam, Arkadiusz Bulski 2016-08-27 11:04 GMT+02:00 Arek Bulski : > SI units are a standard that was kind of imposed top down on the computer > science community. But we learned to use KB MB so why no keep the defacto > standard we already have? 
Kibibytes and mebibytes were never really
> adopted.
>
> 1K == 1000
> 1KB == 1024
> 1M == 1000**2
> 1MB == 1024**2
>
> Suffixes, simple.
>
> int_value = 8M
> float_value = 8.0M or float("8M")
> fraction_value = Fraction(1M, 8) or Fraction("1M/8")
> decimal_value = Decimal("1.2345M")
>
> Suffixes are by definition at the end of a literal. So
>
> 1E1E == 1E1 * 1E
>

-------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sun Aug 28 06:37:00 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 28 Aug 2016 11:37:00 +0100 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: References: Message-ID:

On 27 August 2016 at 23:39, Arek Bulski wrote:
> They can be used simply out of convenience, like 4K is a shorthand for 4000.
> And 9G is definitely easier to write and *therefore less prone to error*
> than a full literal.

I dispute "less prone to error". Like it or not, there are a lot of people who would interpret "4K" as 4096. You can argue whether they are right or wrong all you like, but the fact remains that 4K is open to misinterpretation in a way that 4096 or 4000 is not. The (minimal) convenience of being able to write 4K rather than 4000 doesn't, in my mind, outweigh the risk. And if you're concerned about larger numbers, such as 16000000000, and the need to count zeroes, I'd argue that you should name such a constant - and Python 3.6 will allow you to write it as 16_000_000_000 in any case, making things completely clear.
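The 4K ambiguity is easy to demonstrate concretely; the expand() helper below is purely illustrative, not a proposed API:

```python
# The 4K ambiguity in one function: the same literal means different
# numbers under the decimal (SI) and binary conventions.
# expand() is an illustrative helper, not a proposed API.
def expand(text, binary=False):
    suffixes = {'K': 1, 'M': 2, 'G': 3, 'T': 4}
    base = 1024 if binary else 1000
    if text and text[-1] in suffixes:
        return int(text[:-1]) * base ** suffixes[text[-1]]
    return int(text)

print(expand('4K'))               # 4000
print(expand('4K', binary=True))  # 4096
```

Two readers of the same literal can reach for different conventions, which is the misinterpretation risk in a nutshell.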
Paul

From steve at pearwood.info Sun Aug 28 08:22:49 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 28 Aug 2016 22:22:49 +1000 Subject: [Python-ideas] SI scale factors alone, without units or dimensional analysis In-Reply-To: References: Message-ID: <20160828122247.GW26300@ando.pearwood.info>

On Sun, Aug 28, 2016 at 11:37:00AM +0100, Paul Moore wrote:
> And if you're concerned about larger numbers, such as 16000000000, and
> the need to count zeroes, I'd argue that you should name such a
> constant - and Python 3.6 will allow you to write it as 16_000_000_000
> in any case, making things completely clear.

Since we're talking about integers, you can write 16*10**9 and the peephole optimizer in recent versions of Python will do the maths for you at compile-time. Or just write 16e9 and use a float. By the way, I see at least one language (Frink) uses EE for exact integer exponents:

16e9 => float 16.0*10.0**9
16ee9 => int 16*10**9

Anyone think this is a good idea?

-- Steve

From dvl at psu.edu Sun Aug 28 12:24:16 2016 From: dvl at psu.edu (ROGER GRAYDON CHRISTMAN) Date: Sun, 28 Aug 2016 12:24:16 -0400 Subject: [Python-ideas] A proposal to rename the term "duck typing" Message-ID: <1472401456l.14614654l.0l@psu.edu>

After sending this proposal to the Python list, it was suggested that I redirect it here ("recommended" likely being too strong a word). We have a term in our lexicon "duck typing" that traces its origins, in part, to a quote along the lines of "If it walks like a duck, and talks like a duck, ..." But I would assert that it would be far more Pythonic to use this quote instead:

BEDEMIR: How do you know she is a witch?
VILLAGER #1: She looks like one.

In that case, it would be far more appropriate for us to call this sort of type analysis "witch typing"

For additional supporting evidence:

WITCH: I'm not a witch. I'm not a witch.
BEDEMIR: But you are dressed as one.
WITCH: They dressed me up like this.
CROWD: No, we didn't -- no.
WITCH: And this isn't my nose, it's a false one.
BEDEMIR: Well?
VILLAGER #1: Well, we did do the nose.
BEDEMIR: The nose?
VILLAGER #1: And the hat -- but she is a witch!

The listed operations very much resemble how one defines an iterator in Python. One can take any object and simply add two methods, named __iter__ and __next__, and (voila!) you now have an iterator. And it is known to be such because of these methods. Now, to be intellectually honest, I must acknowledge a very cogent opposing argument from a Mr. Cameron Simpson in Australia, who pointed out that the true identification of a witch was not based on these superficial appearances, but instead based on the weight of a duck -- which of course, is very strongly suggestive that "duck typing" is appropriate. After recovering from the confusion that I suffered from him having a name other than "Bruce", I tried to consider more fully the logic of his argument. It seems that the test here is as follows:

VILLAGER #1: If... she.. weighs the same as a duck, she's made of wood.
BEDEMIR: And therefore--?
VILLAGER #1: A witch!

Now, "duck typing" strictly applied would seem to me to say that weighing the same as a duck would imply that she was a duck. This is not explicitly stated in the source material -- but if that was the implicit understanding, then we would have "is-a" relationships relating ducks to wooden things to witches. It's here that I admittedly get a bit befuddled. If being a duck is sufficient evidence of witchcraft, we would be saying that all ducks are witches. On the other hand, if we are to use this test universally, then that sounds like all witches are ducks. I do not have enough samples to fully determine whether the set of witches and the set of ducks are identical sets, but I would be inclined to think otherwise, especially if we furthermore have to consider the equivalence with the set of things made of wood.
I think I might find it far simpler to accept the villagers' initial intuitions about the type-inference rules for witches. I certainly would not want to have to set up a three-layer class hierarchy any time I wished to do dynamic type inference, so I'll just go back to the initial question. It was not whether the woman was a duck, but whether she was a witch. Hence, I believe a more Pythonic name for this sort of type inference should be "witch typing"

Roger Christman Electrical Engineering and Computer Science Pennsylvania State University -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sun Aug 28 13:39:35 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 28 Aug 2016 13:39:35 -0400 Subject: [Python-ideas] A proposal to rename the term "duck typing" In-Reply-To: <1472401456l.14614654l.0l@psu.edu> References: <1472401456l.14614654l.0l@psu.edu> Message-ID:

On 8/28/2016 12:24 PM, ROGER GRAYDON CHRISTMAN wrote:
> After sending this proposal to replace 'duck typing' with 'witch typing'
> to the Python list, where it was mostly viewed as a joke it was suggested that I redirect it here by Mark Lawrence, who likes to joke also.

I recommend that people not waste time on this. -- Terry Jan Reedy

From bruce at leban.us Sun Aug 28 13:41:42 2016 From: bruce at leban.us (Bruce Leban) Date: Sun, 28 Aug 2016 10:41:42 -0700 Subject: [Python-ideas] A proposal to rename the term "duck typing" In-Reply-To: <1472401456l.14614654l.0l@psu.edu> References: <1472401456l.14614654l.0l@psu.edu> Message-ID:

On Sunday, August 28, 2016, ROGER GRAYDON CHRISTMAN wrote:
>
> We have a term in our lexicon "duck typing" that traces its origins, in
> part to a quote along the lines of
> "If it walks like a duck, and talks like a duck, ..."
>
> ...
>
> In that case, it would be far more appropriate for us to call this sort
> of type analysis "witch typing"
>
I believe the duck is out of the bag on this one.
First, the "duck test" that you quote above is over 100 years old. https://en.m.wikipedia.org/wiki/Duck_test So that's entrenched. Second, this isn't a Python-only term anymore, and language is notoriously hard to change prescriptively. Third, I think the duck test is more appropriate than the witch test, which involves the testers faking the results. --- Bruce -- --- Bruce Check out my puzzle book and get it free here: http://J.mp/ingToConclusionsFree (available on iOS) -------------- next part -------------- An HTML attachment was scrubbed... URL: From python-ideas at shalmirane.com Sun Aug 28 21:44:05 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Sun, 28 Aug 2016 18:44:05 -0700 Subject: [Python-ideas] real numbers with SI scale factors Message-ID: <20160829014404.GB29601@kundert.designers-guide.com> Wow, things have suddenly turned very negative. I understand that this is very normal for these types of forums where it is easy to lose sight of the big picture. So let me try to refocus this discussion again. MOTIVATION The way the scientific and engineering communities predominantly write real numbers is by using SI scale factors. These numbers almost always represent physical quantities, so it is common to write the number with scale factor and units. So for example, the distance to Andromeda is 780kpc, the pressure at the bottom of the Mariana Trench is 108MPa, the total power produced by a typical hurricane is 600TW, the number of base pairs in human DNA is 3.2Gb, and the Bohr radius is 53pm. These numbers are the language of science and engineering, but in recent years they have also entered the realm of popular culture. For example, an article by Ars Technica calculates the value of the music that can fit on a classic iPod at over $8G (that is the total penalty that could be assessed if they were all stolen copyright works). 
http://arstechnica.com/tech-policy/2012/06/from-gigabytes-to-petadollars-copyright-math-begets-copyright-currency/ In all of these examples the numbers are either very large or very small, and they employ the use of SI scale factors to make them easy to write and communicate. This way of writing numbers is so well established that it was formally standardized as part of the International System of Units (the Système international d'unités) in 1960. The problem is that most general purpose programming languages do not support this way of writing numbers. Instead they force people to convert the numbers to and from exponential notation, which is rather inconvenient and hard to read, write, say, or hear (it is much easier to say or hear 53 picometers than 5.3e-11 meters). When working with a general purpose programming language, the above numbers become: 780kpc -> 7.8e+05 108MPa -> 1.08e+08 600TW -> 6e+14 3.2Gb -> 3.2e+09 53pm -> 5.3e-11 $8G -> 8e+09 Notice that the numbers become longer, harder to read, harder to type, harder to say, and harder to hear. NEXT STEP So the question is, should Python accommodate this widely used method of writing real numbers? Python already has many ways of writing numbers. For example, Python allows integers to be written in binary, octal, decimal and hex. Any number you can write in one form you can write in another, so the only real reason for providing all these ways is the convenience of the users. Don't Python's users in the scientific and engineering communities deserve the same treatment? These are, after all, core communities for Python. Now, Python is a very flexible language, and it is certainly possible to add simple libraries to make it easy to read and write numbers with SI scale factors. I have written such a library (engfmt). But with such libraries this common form of writing numbers remains a second class citizen. 
The language itself does not understand SI scale factors; instead, any time you want to input or output numbers in their natural form you must manually convert them yourself. Changing Python so that it understands SI scale factors on real numbers as first class citizens inherently requires a change to the base language; it cannot be done solely through libraries. The question before you is, should we do it? The same question confronted the developers of Python when it was decided to add binary, octal and hexadecimal number support to Python. You could have done it with libraries, but you didn't because binary, octal and hexadecimal numbers were too common and important to be left as second class citizens. Well, use of binary, octal and hexadecimal numbers is tiny compared to the use of real numbers with SI scale factors. Before we expend any more effort on this topic, let's put aside the question of how it should be done, or how it will be used after it's done, and just focus on whether we do it at all. Should Python support real numbers specified with SI scale factors as first class citizens? What do you think? -Ken From thalesfc at gmail.com Sun Aug 28 21:50:39 2016 From: thalesfc at gmail.com (Thales filizola costa) Date: Sun, 28 Aug 2016 18:50:39 -0700 Subject: [Python-ideas] a multiProcess scheduler Message-ID: Hey guys, I was recently involved in a job change, and for that I have been doing a lot of programming interviews (white board questions). One common question on those interviews was: "how to implement a scheduler?" followed up by "how to make it multi-processing?". I have to confess that I only had a clue on how to do that. After the interview period, I started searching for a solution for that, and could not find one. The std python implementation for a scheduler says "No multi-threading is implied; you are supposed to hack that yourself, or use a single instance per application." 
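For concreteness, a toy version of that "hack it yourself" approach might look like the sketch below: a heap orders jobs by due time, and each due job is dispatched to its own worker. This is only an illustrative sketch, not the implementation discussed in this thread; it uses the thread-backed multiprocessing.dummy so the workers can share the results list in a demo, and you would swap in multiprocessing.Process (plus a Queue for results) for real process isolation.

```python
# Toy multi-worker scheduler sketch: a heap orders jobs by due time,
# and each job runs in its own worker so a slow job cannot delay
# later ones.  multiprocessing.dummy exposes the multiprocessing API
# backed by threads, which keeps this demo portable and lets workers
# share the `results` list; swap in multiprocessing.Process (plus a
# Queue) for real process isolation.
import heapq
import time
from multiprocessing.dummy import Process

class MultiScheduler:
    def __init__(self):
        self._jobs = []   # heap of (due_time, seq, fn, args)
        self._seq = 0     # tie-breaker; also avoids comparing fns

    def enter(self, delay, fn, *args):
        heapq.heappush(self._jobs,
                       (time.monotonic() + delay, self._seq, fn, args))
        self._seq += 1

    def run(self):
        workers = []
        while self._jobs:
            due, _, fn, args = heapq.heappop(self._jobs)
            time.sleep(max(0.0, due - time.monotonic()))
            worker = Process(target=fn, args=args)
            worker.start()
            workers.append(worker)
        for worker in workers:
            worker.join()

results = []
sched = MultiScheduler()
sched.enter(0.05, results.append, 'second')   # enqueued first, due later
sched.enter(0.01, results.append, 'first')
sched.run()
print(results)  # jobs fire in due-time order, not insertion order
```

The dispatch loop itself stays single-threaded (only one place pops the heap), which sidesteps the locking questions that the sched docs leave to the caller.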
So, I hacked my own implementation of a multi-process scheduler in Python: https://github.com/thalesfc/Multprocess-Scheduler What do you guys think? How to improve it? Is it relevant enough to be incorporated into std Python? Thanks, Thales. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sun Aug 28 22:32:35 2016 From: mertz at gnosis.cx (David Mertz) Date: Sun, 28 Aug 2016 19:32:35 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829014404.GB29601@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> Message-ID: -1 on Python ever having any syntactic support for SI scale factors. It makes the language needlessly complicated, has no benefit I've discerned (vs using libraries), and is a magnet for a large class of bugs. Btw, the argument below feels dishonest in another respect. Within a domain there is a general size scale of quantities of interest. I worked in a molecular dynamics lab for a number of years, and we would deal with simulated timesteps of a few femtoseconds. A total simulation might run into microseconds (or with our custom supercomputer, a millisecond). There were lots of issues I don't begin to understand of exactly how many femtoseconds might be possible to squeeze into a timestep while retaining good behavior. But the numbers of interest were in the range 0.5-5, and anyone in the field knows that. In contrast, cosmologists deal with intervals of petaseconds. Yeah, I know it's not as simple as that, but just to get at the symmetry. No one would write 2.5e-15 every time they were doing something with an MD timestep. The scaling, if anywhere at all, would be defined once as a general factor at the boundaries. The number of interest is simply, e.g. 2.5, not some large negative exponent on that. 
In fact, at a certain point I proposed that we should deal with rounding issues by calling the minimum domain specific time unit an attosecond, and only use integers in using this unit. That wasn't what was adopted, but it wasn't absurd. If we had done that, we simply deal with, say 1500 "inherent units" in the program. The fact it related to a physical quantity is at most something for documentation (this principle isn't different because we used floats in this case). On Aug 28, 2016 8:44 PM, "Ken Kundert" wrote: > Wow, things have suddenly turned very negative. I understand that this is > very > normal for these types of forums where it is easy to lose sight of the big > picture. So let me try to refocus this discussion again. > > > MOTIVATION > > The way the scientific and engineering communities predominately write real > numbers is by using SI scale factors. These numbers almost always > represent > physical quantities, so it is common to write the number with scale factor > and > units. So for example, the distance to Andromeda is 780kpc, the pressure > at the > bottom of the Mariana Trench is 108MPa, the total power produced by a > typical > hurricane is 600TW, the number of base pairs in human DNA is 3.2Gb, and > the Bohr > radius is 53pm. These numbers are the language of science and > engineering, but > in recent years they have also entered the realm of popular culture. For > example, an article by Ars Technica that calculates that the value of the > music > that can fit on a classic iPod as over $8G (that is the total penalty that > can > be accessed if they were all stolen copyright works). > > http://arstechnica.com/tech-policy/2012/06/from-gigabytes- > to-petadollars-copyright-math-begets-copyright-currency/ > > In all of these examples the numbers are either very large or very small, > and > they employ the use of SI scale factors to make them easy to write and > communicate. 
This way of writing numbers is so well established that it > was > formally standardized as part of the International System of Units (the > Système > international d'unités) in 1960. > > The problem is that most general purpose programming languages do not > support > this way of writing numbers. Instead they force people to convert the > numbers to > and from exponential notation, which is rather inconvenient and hard to > read, > write, say, or hear (it is much easier to say or hear 53 picometers than > 5.3e-11 > meters). > > When working with a general purpose programming language, the above numbers > become: > > 780kpc -> 7.8e+05 > 108MPa -> 1.08e+08 > 600TW -> 6e+14 > 3.2Gb -> 3.2e+09 > 53pm -> 5.3e-11 > $8G -> 8e+09 > > Notice that the numbers become longer, harder to read, harder to type, > harder to > say, and harder to hear. > > > NEXT STEP > > So the question is, should Python accommodate this widely used method of > writing > real numbers? Python already has many ways of writing numbers. For example, > Python allows integers to be written in binary, octal, decimal and hex. Any > number you can write in one form you can write in another, so the only real > reason for providing all these ways is the convenience of the users. Don't > Python's users in the scientific and engineering communities deserve the > same > treatment? These are, after all, core communities for Python. > > Now, Python is a very flexible language, and it is certainly possible to > add > simple libraries to make it easy to read and write numbers with SI scale > factors. I have written such a library (engfmt). But with such libraries > this > common form of writing numbers remains a second class citizen. The language > itself does not understand SI scale factors, instead any time you want to > input > or output numbers in their natural form you must manually convert them > yourself. 
> > Changing Python so that it understands SI scale factors on real numbers as > first > class citizens innately requires a change to the base language; it cannot > be > done solely through libraries. The question before you is, should we do > it? > > The same question confronted the developers of Python when it was decided > to add > binary, octal and hexadecimal number support to Python. You could have > done it > with libraries, but you didn't because binary, octal and hexadecimal > numbers > were too common and important to be left as second class citizens. Well, > use of > binary, octal and hexadecimal numbers is tiny compared to the use of real > numbers with SI scale factors. > > Before we expend any more effort on this topic, let's put aside the > question of > how it should be done, or how it will be used after its done, and just > focus on > whether we do it at all. Should Python support real numbers specified with > SI > scale factors as first class citizens? > > What do you think? > > -Ken > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rosuav at gmail.com Sun Aug 28 22:33:16 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 29 Aug 2016 12:33:16 +1000 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829014404.GB29601@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> Message-ID: On Mon, Aug 29, 2016 at 11:44 AM, Ken Kundert wrote: > When working with a general purpose programming language, the above numbers > become: > > 780kpc -> 7.8e+05 > 108MPa -> 1.08e+08 > 600TW -> 6e+14 > 3.2Gb -> 3.2e+09 > 53pm -> 5.3e-11 > $8G -> 8e+09 > > Notice that the numbers become longer, harder to read, harder to type, harder to > say, and harder to hear. > And easier to compare. The SI prefixes are almost consistent in using uppercase for larger units and lowercase for smaller, but not quite; and there's no particular pattern in which letter is larger. For someone who isn't extremely familiar with them, that makes them completely unordered - which is larger, peta or exa? Which is smaller, nano or pico? Plus, there's a limit to how far you can go with these kinds of numbers, currently yotta at e+24. Exponential notation scales to infinity (to 1e308 in IEEE 64-bit binary floats, but plenty further in decimal.Decimal - I believe its limit is about 1e+(1e6), and REXX on OS/2 had a limit of 1e+(1e10) for its arithmetic), remaining equally readable at all scales. So we can't get rid of exponential notation, no matter what happens. Mathematics cannot usefully handle a system in which we have to represent large exponents with ridiculous compound scale factors: sys.float_info.max = 179.76931348623157*Y*Y*Y*Y*Y*Y*Y*Y*Y*Y*Y*Y*E (It's even worse if the Exa collision means you stop at Peta. 179.76931348623157*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*M, anyone?) Which means that these tags are a duplicate way of representing a specific set of exponents. 
> Before we expend any more effort on this topic, let's put aside the question of > how it should be done, or how it will be used after its done, and just focus on > whether we do it at all. Should Python support real numbers specified with SI > scale factors as first class citizens? Except that those are exactly the important questions to be answered. How *could* it be done? With the units stripped off, your examples become: 780k == 7.8e+05 == 780*k 108M == 1.08e+08 == 108*M 600T == 6e+14 == 600*T 3.2G == 3.2e+09 == 3.2*G 53p == 5.3e-11 == 53*p 8G == 8e+09 == 8*G Without any support whatsoever, you can already use the third column notation, simply by creating this module: # si.py k, M, G, T, P, E, Z, Y = 1e3, 1e6, 1e9, 1e12, 1e15, 1e18, 1e21, 1e24 m, µ, n, p, f, a, z, y = 1e-3, 1e-6, 1e-9, 1e-12, 1e-15, 1e-18, 1e-21, 1e-24 u = µ K = k And using it as "from si import *" at the top of your code. Do we see a lot of code in the wild doing this? "[H]ow it will be used after it's done" is exactly the question that this would answer. > Don't Python's users in the scientific and engineering communities deserve > the same treatment? These are, after all, core communities for Python. Yes. That's why we have things like the @ matrix multiplication operator (because the numeric computational community asked for it), and %-formatting for bytes strings (because the networking, mainly HTTP serving, community asked for it). Python *does* have a history of supporting things that are needed by specific sub-communities of Python coders. But there first needs to be a demonstrable need. How much are people currently struggling due to the need to transform "gigapascal" into "e+9"? Can you show convoluted real-world code that would be made dramatically cleaner by language support? 
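On the same "libraries can already do this" note, even the string form being asked for is only a few lines of library code. The helper below is a hypothetical sketch for illustration, not the actual engfmt API, and it inherits the notation's own ambiguity (a bare trailing 'm' is read as the milli prefix, not as metres):

```python
# Sketch of a library-level SI-scale-factor reader: split a string
# into mantissa, optional scale prefix, and trailing unit, then apply
# the prefix's power of ten.  Hypothetical helper, not engfmt's API.
# Caveat inherited from the notation itself: a bare 'm' is parsed as
# the milli prefix here, not as a metres unit.
import re

_PREFIXES = {'y': -24, 'z': -21, 'a': -18, 'f': -15, 'p': -12, 'n': -9,
             'u': -6, 'µ': -6, 'm': -3, '': 0, 'k': 3, 'M': 6, 'G': 9,
             'T': 12, 'P': 15, 'E': 18, 'Z': 21, 'Y': 24}

def from_si(text):
    """Return (value, unit): from_si('780kpc') -> (780000.0, 'pc')."""
    match = re.fullmatch(
        r'([-+]?\d+\.?\d*)\s*([yzafpnuµmkMGTPEZY]?)([A-Za-z]*)', text)
    if not match:
        raise ValueError(f'cannot parse {text!r}')
    mantissa, prefix, unit = match.groups()
    return float(mantissa) * 10.0 ** _PREFIXES[prefix], unit

print(from_si('780kpc'))  # (780000.0, 'pc')
print(from_si('108MPa'))  # (108000000.0, 'Pa')
print(from_si('3.2Gb'))
```

Unlike a literal syntax, this keeps the unit string around instead of silently discarding it, which is the point several replies in this thread make.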
ChrisA From rosuav at gmail.com Sun Aug 28 22:38:03 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 29 Aug 2016 12:38:03 +1000 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> Message-ID: On Mon, Aug 29, 2016 at 12:32 PM, David Mertz wrote: > In fact, at a certain point I proposed that we should deal with rounding > issues by calling the minimum domain specific time unit an attosecond, and > only use integers in using this unit. That wasn't what was adopted, but it > wasn't absurd. If we had done that, we simply deal with, say 1500 "inherent > units" in the program. The fact it related to a physical quantity is at most > something for documentation (this principle isn't different because we used > floats in this case). Definitely not absurd. I've done the same kind of thing numerous times (storing monetary values in cents, distances in millimeters, or timepoints in music in milliseconds), because it's just way, WAY simpler than working with fractional values. So the SI prefix gets attached to the (implicit) *unit*, not to the value. I believe this is the correct way to handle things. ChrisA From python at mrabarnett.plus.com Sun Aug 28 22:45:46 2016 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 29 Aug 2016 03:45:46 +0100 Subject: [Python-ideas] a multiProcess scheduler In-Reply-To: References: Message-ID: On 2016-08-29 02:50, Thales filizola costa wrote: > Hey guys, > > I was recently involved in a job change, and for that I have been doing > a lot of programming interviews (white board questions). One common > question on those interviews were: "how to implement a scheduler?" > follow up by "how to make it multi-processing?". I have to confess that > I only had a clue on how to do that. > > After the interview period, I started searching for a solution for that, > and could not find one. 
The std python implementation for a scheduler > says "No multi-threading is implied; you are supposed to hack that > yourself, or use a single instance per application." > > So, I hacked my own implementation of a multi-process scheduler in python: > > https://github.com/thalesfc/Multprocess-Scheduler > > What do you guys think? How to improve it? Is it relevant enough to be > incorporated to std python ? > Try putting it on PyPI and see how much use it gets. From python at mrabarnett.plus.com Sun Aug 28 22:43:45 2016 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 29 Aug 2016 03:43:45 +0100 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829014404.GB29601@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> Message-ID: On 2016-08-29 02:44, Ken Kundert wrote: [snip] > > The way the scientific and engineering communities predominately write real > numbers is by using SI scale factors. These numbers almost always represent > physical quantities, so it is common to write the number with scale factor and > units. So for example, the distance to Andromeda is 780kpc, the pressure at the > bottom of the Mariana Trench is 108MPa, the total power produced by a typical > hurricane is 600TW, the number of base pairs in human DNA is 3.2Gb, and the Bohr > radius is 53pm. These numbers are the language of science and engineering, but > in recent years they have also entered the realm of popular culture. For > example, an article by Ars Technica that calculates that the value of the music > that can fit on a classic iPod as over $8G (that is the total penalty that can > be accessed if they were all stolen copyright works). > > http://arstechnica.com/tech-policy/2012/06/from-gigabytes-to-petadollars-copyright-math-begets-copyright-currency/ > For currency, it's usually "million" or "m"/"M", "billion" or "bn" (or maybe "b"/"B"), "trillion" (or maybe "tn" or "t"/"T"). 
Using a suffixed SI scale factor with a prefixed currency symbol is not that common, in my experience. [snip] > When working with a general purpose programming language, the above numbers > become: > > 780kpc -> 7.8e+05 > 108MPa -> 1.08e+08 > 600TW -> 6e+14 > 3.2Gb -> 3.2e+09 > 53pm -> 5.3e-11 > $8G -> 8e+09 > > Notice that the numbers become longer, harder to read, harder to type, harder to > say, and harder to hear. > There's also "engineering notation", where the exponent is a multiple of 3. [snip] > The same question confronted the developers of Python when it was decided to add > binary, octal and hexadecimal number support to Python. You could have done it > with libraries, but you didn't because binary, octal and hexadecimal numbers > were too common and important to be left as second class citizens. Well, use of > binary, octal and hexadecimal numbers is tiny compared to the use of real > numbers with SI scale factors. > I expect that octal and hexadecimal number support was there from the start. CPython is written in C and Python borrowed the notation. The binary notation was added in Python 2.6 and followed the same pattern as the hexadecimal notation. The octal notation of a leading "0" was later replaced with a clearer one that followed the same pattern. C had octal and hexadecimal from the start. (Actually, I'm not entirely sure about hexadecimal, octal being the preferred form, but if it wasn't there from the very start, it was an early addition.) C descends from BCPL, which had octal and hexadecimal, and BCPL dates from 1967. There are other languages too that had hexadecimal and octal. They've been around in programming languages for decades. How many languages have scale factors? Does Fortran? Not that I know of. 
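[Worth noting alongside the engineering-notation point above: the numeric half of SI scale factors, an exponent that is a multiple of 3, is already reachable from the stdlib via decimal.Decimal, just without the letter prefixes:]

```python
# Decimal.to_eng_string() renders engineering notation, i.e. an
# exponent that is always a multiple of 3 -- the numeric half of
# SI scale factors, minus the letter prefixes.
from decimal import Decimal

for text in ('7.8e5', '1.08e8', '6e14', '5.3e-11'):
    print(Decimal(text).to_eng_string())
# 780E+3
# 108E+6
# 600E+12
# 53E-12
```

Mapping those exponents onto k/M/T/p would then be a table lookup in user code, which is part of why the "do it as a library" position keeps coming up in this thread.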
[snip] From bruce at leban.us Sun Aug 28 22:56:00 2016 From: bruce at leban.us (Bruce Leban) Date: Sun, 28 Aug 2016 19:56:00 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829014404.GB29601@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> Message-ID: On Sun, Aug 28, 2016 at 6:44 PM, Ken Kundert wrote: > When working with a general purpose programming language, the above numbers > become: > > 780kpc -> 7.8e+05 > 108MPa -> 1.08e+08 > 600TW -> 6e+14 > 3.2Gb -> 3.2e+09 > 53pm -> 5.3e-11 > $8G -> 8e+09 > These are not equal. 780kpc is a *number with units*. 7.8e+05 == 780000 is a *unitless number*. All the numbers on the right hand side above have no units so I can't tell which are pc or W or m or $. It's asking for trouble to go halfway in representing units. On the left hand side, 780kpc + 108MPa is invalid while 780kpc + 53pm is valid. On the right hand side, sums of any two numbers are valid as they would be with the unitless SI prefixes. So if you want to solve this problem, write a module that supports units. For example, m(780, 'kpc') == m('780kpc') == m(780, kpc) and it's legal to write m(780, kpc) + m(53, pm) but an exception gets raised if you write m(2, kW) + m(3, kh) instead of m(2, kW) * m(3, kh) == m(6, MWh). In fact, several people have already done that. Here are three I found in < 1 minute of searching: https://pint.readthedocs.io/en/0.7.2/ https://pypi.python.org/pypi/units/ https://python-measurement.readthedocs.io/en/latest/ Should support for this ever get added to the core language? I doubt it. But if one of these modules becomes enormously popular and semi-standard, you never know. I think you're much more likely to get this into your Python code by way of a preprocessor. --- Bruce Check out my puzzle book and get it free here: http://J.mp/ingToConclusionsFree (available on iOS) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From python-ideas at shalmirane.com Sun Aug 28 23:01:31 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Sun, 28 Aug 2016 20:01:31 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> Message-ID: <20160829030131.GC29601@kundert.designers-guide.com> > It makes the language needlessly complicated, has no benefit I've discerned > (vs using libraries), and is a magnet for a large class of bugs. Well, the comment about bugs is speculation that does not fit with the extensive experience in the electrical engineering community. But other than that, these arguments could be used against supporting binary, octal, and hexadecimal notation. Are you saying building those into the language was a mistake? On Sun, Aug 28, 2016 at 07:32:35PM -0700, David Mertz wrote: > -1 on Python ever having any syntactic support for SI scale factors. > > > Btw, the argument below feels dishonest in another respect. Within a domain > there is a general size scale of quantities of interest. I worked in a > molecular dynamics lab for a number of years, and we would deal with > simulated timesteps of a few femtoseconds. A total simulation might run > into microseconds (or with our custom supercomputer, a millisecond). > > There were lots of issues I don't begin to understand of exactly how many > femtoseconds might be possible to squeeze in a timesteps while retaining > good behavior. But the numbers of interest were in the range 0.5-5, and > anyone in the field knows that. > > In contrast, cosmologists deal with intervals of petaseconds. Yeah, I know > it's not as simple as that, but just to get at the symmetry. > > No one would write 2.5e-15 every time they were doing something with an MD > timestep. The scaling, if anywhere at all, would be defined once as a > general factor at the boundaries. The number of interest is simply, e.g. > 2.5, not some large negative exponent on that. 
> In fact, at a certain point I proposed that we should deal with rounding > issues by calling the minimum domain specific time unit an attosecond, and > only use integers in using this unit. That wasn't what was adopted, but it > wasn't absurd. If we had done that, we simply deal with, say 1500 "inherent > units" in the program. The fact it related to a physical quantity is at > most something for documentation (this principle isn't different because we > used floats in this case). Yes, without a convenient way of specifying real numbers, the computation communities have to resort to things like this. And they can work for a while, but over time the scale of things often changes, and a good choice of scale can turn bad after a few years. For example, when I first started in electrical engineering, the typical size of capacitors was in the microfarad range, and software would just assume that if you gave a capacitance it was in uF. But then, with the advancement of technology the capacitors got smaller. They went from uF to nF to pF and now a growing fraction of capacitors are specified in the fF range. The fact that SPICE allowed values to be specified with SI scale factors meant that it continued to be easy to use over the years, whereas the programs that hard coded the scale of their numbers became increasingly difficult to use and then eventually became simply absurd. Even your example is a good argument for specifying numbers in SI scale factors. If I am using one of your molecular simulators, I don't want to specify the simulation time range as being from 1 to 1_000_000_000_000 fs. That is ridiculous. There are 12 orders of magnitude between the minimum resolvable time and the maximum. There are only two practical ways of representing values over such a wide range: using SI scale factors and using exponential notation. And we can tell which is preferred. You said femtoseconds, you did not say 1e-15 seconds. Even you prefer SI scale factors. 
-Ken > On Aug 28, 2016 8:44 PM, "Ken Kundert" wrote: > > > Wow, things have suddenly turned very negative. I understand that this is > > very > > normal for these types of forums where it is easy to lose sight of the big > > picture. So let me try to refocus this discussion again. > > > > > > MOTIVATION > > > > The way the scientific and engineering communities predominately write real > > numbers is by using SI scale factors. These numbers almost always > > represent > > physical quantities, so it is common to write the number with scale factor > > and > > units. So for example, the distance to Andromeda is 780kpc, the pressure > > at the > > bottom of the Mariana Trench is 108MPa, the total power produced by a > > typical > > hurricane is 600TW, the number of base pairs in human DNA is 3.2Gb, and > > the Bohr > > radius is 53pm. These numbers are the language of science and > > engineering, but > > in recent years they have also entered the realm of popular culture. For > > example, an article by Ars Technica that calculates that the value of the > > music > > that can fit on a classic iPod as over $8G (that is the total penalty that > > can > > be accessed if they were all stolen copyright works). > > > > http://arstechnica.com/tech-policy/2012/06/from-gigabytes- > > to-petadollars-copyright-math-begets-copyright-currency/ > > > > In all of these examples the numbers are either very large or very small, > > and > > they employ the use of SI scale factors to make them easy to write and > > communicate. This way of writing numbers is so well established that it > > was > > formally standardized as part of the International System of Units (the > > Système > > international d'unités) in 1960. > > > > The problem is that most general purpose programming languages do not > > support > > this way of writing numbers. 
Instead they force people to convert the > > numbers to > > and from exponential notation, which is rather inconvenient and hard to > > read, > > write, say, or hear (it is much easier to say or hear 53 picometers than > > 5.3e-11 > > meters). > > > > When working with a general purpose programming language, the above numbers > > become: > > > > 780kpc -> 7.8e+05 > > 108MPa -> 1.08e+08 > > 600TW -> 6e+14 > > 3.2Gb -> 3.2e+09 > > 53pm -> 5.3e-11 > > $8G -> 8e+09 > > > > Notice that the numbers become longer, harder to read, harder to type, > > harder to > > say, and harder to hear. > > > > > > NEXT STEP > > > > So the question is, should Python accommodate this widely used method of > > writing > > real numbers? Python already has many ways of writing numbers. For example, > > Python allows integers to be written in binary, octal, decimal and hex. Any > > number you can write in one form you can write in another, so the only real > > reason for providing all these ways is the convenience of the users. Don't > > Python's users in the scientific and engineering communities deserve the > > same > > treatment? These are, after all, core communities for Python. > > > > Now, Python is a very flexible language, and it is certainly possible to > > add > > simple libraries to make it easy to read and write numbers with SI scale > > factors. I have written such a library (engfmt). But with such libraries > > this > > common form of writing numbers remains a second class citizen. The language > > itself does not understand SI scale factors, instead any time you want to > > input > > or output numbers in their natural form you must manually convert them > > yourself. > > > > Changing Python so that it understands SI scale factors on real numbers as > > first > > class citizens innately requires a change to the base language; it cannot > > be > > done solely through libraries. The question before you is, should we do > > it? 
> > > > The same question confronted the developers of Python when it was decided > > to add > > binary, octal and hexadecimal number support to Python. You could have > > done it > > with libraries, but you didn't because binary, octal and hexadecimal > > numbers > > were too common and important to be left as second class citizens. Well, > > use of > > binary, octal and hexadecimal numbers is tiny compared to the use of real > > numbers with SI scale factors. > > > > Before we expend any more effort on this topic, let's put aside the > > question of > > how it should be done, or how it will be used after its done, and just > > focus on > > whether we do it at all. Should Python support real numbers specified with > > SI > > scale factors as first class citizens? > > > > What do you think? > > > > -Ken > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > From brenbarn at brenbarn.net Sun Aug 28 23:26:38 2016 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Sun, 28 Aug 2016 20:26:38 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829014404.GB29601@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> Message-ID: <57C3AB6E.1030107@brenbarn.net> On 2016-08-28 18:44, Ken Kundert wrote: > When working with a general purpose programming language, the above numbers > become: > > 780kpc -> 7.8e+05 > 108MPa -> 1.08e+08 > 600TW -> 6e+14 > 3.2Gb -> 3.2e+09 > 53pm -> 5.3e-11 > $8G -> 8e+09 > > Notice that the numbers become longer, harder to read, harder to type, harder to > say, and harder to hear. You've continually repeated this assertion, but I don't buy it. For the general case, exponential notation is easier to read because you can always see exactly what the exponent is as a number. 
To read SI units, you have to know all the SI prefixes. This may well be common within scientific communities, but to say that it is "easier" is really a bit much. The same is true of "harder to type". "kpc" is three characters; e+5 is also three (note that you don't need to write e+05), and one of those is a number that transparently indicates how many places to move the decimal, whereas all of the letters in "kpc" are opaque unless you already know what the number is meant to represent. If you have concrete evidence (e.g., from actual user experience research) showing that it is across-the-board "easier" to read or type SI prefixes than exponential notation, that would be good to see. In the absence of that, these assertions are just doubling down on the same initial claim, namely that adding SI units to Python would make things more convenient *for those using it to compute with literally-entered quantities in SI units*. I quite agree that that is likely true, but to my mind that is not enough to justify the disruption of adding it at the syntactic level. (Unless, again, you have some actual evidence showing that this particular kind of use of numeric literals occurs in a large proportion of Python code.) > Before we expend any more effort on this topic, let's put aside the question of > how it should be done, or how it will be used after it's done, and just focus on > whether we do it at all. Should Python support real numbers specified with SI > scale factors as first class citizens? My current opinion is no. There are lots of things that are common. (Should we include a spellchecker in Python because many people frequently make spelling errors?) The fact that SI units are de rigueur in the physical science community isn't enough. I would want to see some actual attempt to quantify how much benefit there would be in the PYTHON community (which of course includes, but is not limited to, those using Python for physical-science computations).
-- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From python-ideas at shalmirane.com Sun Aug 28 23:29:45 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Sun, 28 Aug 2016 20:29:45 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> Message-ID: <20160829032945.GD29601@kundert.designers-guide.com> On Mon, Aug 29, 2016 at 12:33:16PM +1000, Chris Angelico wrote: > On Mon, Aug 29, 2016 at 11:44 AM, Ken Kundert > wrote: > > When working with a general purpose programming language, the above numbers > > become: > > > > 780kpc -> 7.8e+05 > > 108MPa -> 1.08e+08 > > 600TW -> 6e+14 > > 3.2Gb -> 3.2e+09 > > 53pm -> 5.3e-11 > > $8G -> 8e+09 > > > > Notice that the numbers become longer, harder to read, harder to type, harder to > > say, and harder to hear. > > > > And easier to compare. The SI prefixes are almost consistent in using > uppercase for larger units and lowercase for smaller, but not quite; > and there's no particular pattern in which letter is larger. For > someone who isn't extremely familiar with them, that makes them > completely unordered - which is larger, peta or exa? Which is smaller, > nano or pico? Plus, there's a limit to how far you can go with these > kinds of numbers, currently yotta at e+24. Exponential notation scales > to infinity (to 1e308 in IEEE 64-bit binary floats, but plenty further > in decimal.Decimal - I believe its limit is about 1e+(1e6), and REXX > on OS/2 had a limit of 1e+(1e10) for its arithmetic), remaining > equally readable at all scales. > > So we can't get rid of exponential notation, no matter what happens. 
> Mathematics cannot usefully handle a system in which we have to > represent large exponents with ridiculous compound scale factors: > > sys.float_info.max = 179.76931348623157*Y*Y*Y*Y*Y*Y*Y*Y*Y*Y*Y*Y*E > > (It's even worse if the Exa collision means you stop at Peta. > 179.76931348623157*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*M, anyone?) > > Which means that these tags are a duplicate way of representing a > specific set of exponents. > Yes, of course. No one is suggesting abandoning exponential notation. I am not suggesting we force people to use SI scale factors, only that we allow them to. What I am suggesting is that we stop saying to them things like 'you must use exponential notation because we have decided that it's better. See, you can easily compare the size of numbers by looking at the exponents.' What is wrong with having two ways of doing things? We have many ways of specifying the value of the integer 16: 0b10000, 0o20, 16, 0x10, 16L, .... > > Before we expend any more effort on this topic, let's put aside the question > > of > > how it should be done, or how it will be used after it's done, and just focus on > > whether we do it at all. Should Python support real numbers specified with SI > > scale factors as first class citizens? > > Except that those are exactly the important questions to be answered. > How *could* it be done? With the units stripped off, your examples > become: > > 780k == 7.8e+05 == 780*k > 108M == 1.08e+08 == 108*M > 600T == 6e+14 == 600*T > 3.2G == 3.2e+09 == 3.2*G > 53p == 5.3e-11 == 53*p > 8G == 8e+09 == 8*G > > Without any support whatsoever, you can already use the third column > notation, simply by creating this module: > > # si.py > k, M, G, T, P, E, Z, Y = 1e3, 1e6, 1e9, 1e12, 1e15, 1e18, 1e21, 1e24 > m, µ, n, p, f, a, z, y = 1e-3, 1e-6, 1e-9, 1e-12, 1e-15, 1e-18, 1e-21, 1e-24 > u = µ > K = k > > And using it as "from si import *" at the top of your code. Do we see > a lot of code in the wild doing this?
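[Editor's note: the si.py sketch above does behave as claimed; a self-contained check, with the relevant constants reproduced inline, looks like this.]

```python
# Constants reproduced from the si.py sketch above, plus checks that the
# third-column notation really does match the exponential forms.
k, M, G, T = 1e3, 1e6, 1e9, 1e12
p = 1e-12

assert 780*k == 7.8e+05
assert 600*T == 6e+14
assert 8*G == 8e+09
assert abs(53*p - 5.3e-11) < 1e-24  # equal up to float rounding
```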
"[H]ow it will be used after > it's done" is exactly the question that this would answer. Because by focusing on the implementation details, we miss the big picture. We have already done that, and we ended up going down countless ratholes. > > > Don't Python's users in the scientific and engineering communities deserve > > the same treatment? These are, after all, core communities for Python. > > Yes. That's why we have things like the @ matrix multiplication > operator (because the numeric computational community asked for it), > and %-formatting for bytes strings (because the networking, mainly > HTTP serving, community asked for it). Python *does* have a history of > supporting things that are needed by specific sub-communities of > Python coders. But there first needs to be a demonstrable need. How > much are people currently struggling due to the need to transform > "gigapascal" into "e+9"? Can you show convoluted real-world code that > would be made dramatically cleaner by language support? Can you show code that would have been convoluted if Python had used a library rather than built-in support for hexadecimal numbers? So, in summary, you are suggesting that we tell the scientific and engineering communities that we refuse to provide native support for their preferred way of writing numbers because: 1. our way is better, 2. their way is bad because some uneducated person might see the numbers and not understand them, 3. we already have way of representing numbers that we came up with in the '60s and we simply cannot do another, 4. well we could do it, but we have decided that if you would only adapt to this new way of doing things that we just came up with, then we would not have to do any work, and that is better for us. Oh and this this new way of writing numbers, it only works in the program itself. Your out of luck when it comes to IO. These do not seem like good reasons for not doing this. 
-Ken From ncoghlan at gmail.com Sun Aug 28 23:32:23 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 29 Aug 2016 13:32:23 +1000 Subject: [Python-ideas] a multiProcess scheduler In-Reply-To: References: Message-ID: On 29 August 2016 at 11:50, Thales filizola costa wrote: > What do you guys think? How to improve it? Is it relevant enough to be > incorporated to std python ? There are actually quite a few distributed schedulers out there (which can expand beyond a single machine), but "python multiprocess scheduler" isn't likely to bring them up in a web search (as when you're limited to a single machine, multiprocessing.Pool or concurrent.futures.ProcessPoolExecutor is generally already good enough). At a Python level, Celery is probably the most popular option for that: http://www.celeryproject.org/ Another well-established option is Kamaelia: http://www.kamaelia.org/Home.html Dask is a more recent alternative specifically focused on computational tasks: http://dask.pydata.org/en/latest/ Once you move outside Python specific tooling, there are even more language independent options to play with, including the likes of Mesos and Kubernetes. Cheers, Nick. P.S. It's a fairly sad indictment of our industry that people think this is a sensible question to ask in developer interviews - the correct answer from a business efficiency perspective is "I wouldn't, I would use an existing open source task scheduler rather than inventing my own", just as the correct answer to "How would you implement a sort algorithm?" from that perspective is "I wouldn't, as the Python standard library's sorting implementation is vastly superior to anything I could come up with in 5 minutes on a whiteboard, and the native sorting capabilities of databases are also excellent". Reimplementing existing software from first principles is a great learning exercise, but it's not particularly relevant to the task of day-to-day software construction in most organisations. 
(Alternatively, if the answer the interviewer is looking for is "I wouldn't, I would use...", then it may be an unfair "Gotcha!" question, and those aren't cool either, since they expect the interviewee to be able to read the interviewer's mind, rather than just answering the specific question they were asked) -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From brenbarn at brenbarn.net Sun Aug 28 23:40:58 2016 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Sun, 28 Aug 2016 20:40:58 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829032945.GD29601@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829032945.GD29601@kundert.designers-guide.com> Message-ID: <57C3AECA.1030207@brenbarn.net> On 2016-08-28 20:29, Ken Kundert wrote: > What is wrong with have two ways of doing things? We have many ways of > specifying the value of the integer 16: 0b10000, 0o20, 16, 0x10, 16L, .... Zen of Python: "There should be one-- and preferably only one --obvious way to do it." If Python didn't have binary or octal notation and someone came here proposing it, I would not support it, for the same reasons I don't support your proposal. If someone proposed eliminating binary or octal notation for Python 4 (or even maybe Python 3.8), I would probably support it for the same reason. Those notations are not useful enough to justify their existence. Hexadecimal is more justifiable as it is far more widely used, but I would be more open to removing hexadecimal than I would be to adding octal. Also, "L" as a long-integer suffix is already gone in Python 3. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." 
--author unknown From steve at pearwood.info Sun Aug 28 23:45:20 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 29 Aug 2016 13:45:20 +1000 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <57C3AB6E.1030107@brenbarn.net> References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> Message-ID: <20160829034520.GY26300@ando.pearwood.info> On Sun, Aug 28, 2016 at 08:26:38PM -0700, Brendan Barnwell wrote: > On 2016-08-28 18:44, Ken Kundert wrote: > >When working with a general purpose programming language, the above numbers > >become: > > > > 780kpc -> 7.8e+05 [...] For the record, I don't know what kpc might mean. "kilo pico speed of light"? So I looked it up using units, and it is kilo-parsecs. That demonstrates that unless your audience is intimately familiar with the domain you are working with, adding units (especially units that aren't actually used for anything) adds confusion. Python is not a specialist application targeted at a single domain. It is a general purpose programming language where you can expect a lot of cross-domain people (e.g. a system administrator asked to hack on a script in a domain they know nothing about). > You've continually repeated this assertion, but I don't buy it. For > the general case, exponential notation is easier to read because you can > always see exactly what the exponent is as a number. To read SI units, > you have to know all the SI prefixes. This may well be common within > scientific communities, but to say that it is "easier" is really a bit > much. The same is true of "harder to type". "kpc" is three characters; > e+5 is also three (note that you don't need to write e+05), You don't have to write e+5 either, just e5 is sufficient. > and one of > those is a number that transparently indicates how many places to move > the decimal, whereas all of the letters in "kpc" are opaque unless you > already know what the number is meant to represent.
> > If you have concrete evidence (e.g., from actual user experience > research) showing that it is across-the-board "easier" to read or type > SI prefixes than exponential notation, that would be good to see. I completely believe Ken that within a single tightly focussed user community, using their expected conventions (including SI prefixes) works really well. But Python users do not belong to a single tightly focussed user community. -- Steve From ncoghlan at gmail.com Mon Aug 29 00:16:37 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 29 Aug 2016 14:16:37 +1000 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829014404.GB29601@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> Message-ID: On 29 August 2016 at 11:44, Ken Kundert wrote: > When working with a general purpose programming language, the above numbers > become: > > 780kpc -> 7.8e+05 > 108MPa -> 1.08e+08 > 600TW -> 6e+14 > 3.2Gb -> 3.2e+09 > 53pm -> 5.3e-11 > $8G -> 8e+09 A better comparison here would be to engineering notation with comments stating the units, since Python doesn't restrict the mantissa to a single integer digit (unlike strict scientific notation): 780kpc -> 780e3 # parsecs 108MPa -> 108e6 # pascals 600TW -> 600e12 # watts 3.2Gb -> 3.2e9 # base pairs 53pm -> 53e-12 # meters $8G -> 8e9 # dollars The fundamental requirements for readable code are: - code authors want their code to be readable - the language makes readable code possible So this starts to look like a style guide recommendation: 1. use engineering notation rather than scientific notation 2. annotate your literals with units if they're not implied by context I find the pressure example a particularly interesting one, as I don't know any meteorologists that work with Pascals directly - they work with hectopascals or kilopascals instead. 
That would make the second example more likely to be one of: 108e3 # kilopascals 1.08e6 # hectopascals Similarly, depending on what you're doing (and this gets into the "natural unit of work" concept David Mertz raised), your base unit of mass may be micrograms, milligrams, grams, kilograms, or tonnes, and engineering notation lets you freely shift those scaling factors between your literals and your (implied or explicit) units, while native SI scaling would be very confusing if you're working with anything other than the base SI unit. Accordingly, I'm starting to wonder if a better way for us to go here might be to finally introduce the occasionally-discussed-but-never-written Informational PEP that spells out "reserved syntax for Python supersets", where we carve out certain things we plan *NOT* to do with the base language, so folks writing Python supersets (e.g. for electronics design, data analysis or code generation) can use that syntax without needing to worry about future versions of Python potentially treading on their toes. Specifically, in addition to this SI scaling idea, I'm thinking we could document: - the Cython extensions for C code generation in .pyx files (cdef, ctypedef, cimport, nogil, NULL) - the IPython extensions for cell magics and shell invocation (unary '%', unary '!'), and maybe their help syntax (postfix '?') The reason I think this may be worth doing is that some of these ideas are ones that only make sense *given a suitably constrained domain*. Python supersets like Cython and IPython get to constrain their target domain in a way that makes these extensions appropriate there in a way that wouldn't be appropriate at the level of the base language, but we can still be explicit at the base language level that we're not doing certain things because we're delegating them to a tool with a more focused target audience. Cheers, Nick. 
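[Editor's note: the style recommendation above — engineering notation, with the exponent snapped to a multiple of 3 — can also be generated mechanically. The helper below is a rough sketch; the name `eng` and its formatting choices are invented here, not part of any standard library.]

```python
import math

def eng(value):
    """Render a float in engineering notation (exponent a multiple of 3)."""
    if value == 0:
        return '0e0'
    exp3 = 3 * (math.floor(math.log10(abs(value))) // 3)  # snap to multiple of 3
    mantissa = value / 10**exp3
    return f'{mantissa:g}e{exp3}'

print(eng(780e3))    # 780e3  (parsecs)
print(eng(1.08e8))   # 108e6  (pascals)
print(eng(5.3e-11))  # 53e-12 (meters)
```

Note that a mantissa which rounds up to 1000 would not be renormalized by this sketch; a production version would handle that edge case.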
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Mon Aug 29 00:32:38 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 29 Aug 2016 14:32:38 +1000 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <57C3AECA.1030207@brenbarn.net> References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829032945.GD29601@kundert.designers-guide.com> <57C3AECA.1030207@brenbarn.net> Message-ID: On Mon, Aug 29, 2016 at 1:40 PM, Brendan Barnwell wrote: > On 2016-08-28 20:29, Ken Kundert wrote: >> >> What is wrong with have two ways of doing things? We have many ways of >> specifying the value of the integer 16: 0b10000, 0o20, 16, 0x10, 16L, .... > > > Zen of Python: "There should be one-- and preferably only one > --obvious way to do it." > > If Python didn't have binary or octal notation and someone came here > proposing it, I would not support it, for the same reasons I don't support > your proposal. If someone proposed eliminating binary or octal notation for > Python 4 (or even maybe Python 3.8), I would probably support it for the > same reason. Those notations are not useful enough to justify their > existence. Hexadecimal is more justifiable as it is far more widely used, > but I would be more open to removing hexadecimal than I would be to adding > octal. > I agree with you on octal - there are very few places where it's the one obvious way to do things, and you can always use int("755",8) if you have data that's represented best octally. But hex is incredibly helpful when you do any sort of bit manipulations, and decimal quickly becomes unwieldy and error-prone. 
Here's some code from Lib/stat.py: S_IFDIR = 0o040000 # directory S_IFCHR = 0o020000 # character device S_IFBLK = 0o060000 # block device S_IFREG = 0o100000 # regular file S_IFIFO = 0o010000 # fifo (named pipe) S_IFLNK = 0o120000 # symbolic link S_IFSOCK = 0o140000 # socket file These are shown in octal, because Unix file modes are often written in octal. If Python didn't support octal, the obvious alternative would be hex: S_IFDIR = 0x4000 # directory S_IFCHR = 0x2000 # character device S_IFBLK = 0x6000 # block device S_IFREG = 0x8000 # regular file S_IFIFO = 0x1000 # fifo (named pipe) S_IFLNK = 0xA000 # symbolic link S_IFSOCK = 0xC000 # socket file About comparable for these; not as good for the actual permission bits, since there are three blocks of three bits. Python could manage without octal literals, as long as hex literals are available. (I don't support their *removal*, because that's completely unnecessary backward incompatibility; but if Python today didn't have octal support, I wouldn't support its addition.) But the decimal equivalents? No thank you. S_IFDIR = 16384 # directory S_IFCHR = 8192 # character device S_IFBLK = 24756 # block device S_IFREG = 32768 # regular file S_IFIFO = 4096 # fifo (named pipe) S_IFLNK = 40960 # symbolic link S_IFSOCK = 49152 # socket file One of these is wrong. Which one? You know for certain that each of these values has at most two bits set. Can you read these? If you're familiar with the powers of two, you should have no trouble eyeballing the single-bit examples, but what about the others? We need hex constants for anything that involves bitwise manipulations. Having binary constants is nice, but (like with octal) not strictly necessary; but we need at least one out of bin/oct/hex. (Also, 16L doesn't actually mean the integer 16 - it means the *long* integer 16, which is as different from 16 as 16.0 is.) 
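[Editor's note: for completeness, here is how constants like these are typically consumed — mask out the file-type field and compare, an operation that is easy to check by eye in octal or hex. The constant values below match the ones quoted from Lib/stat.py.]

```python
# File-type test using the mask-and-compare idiom from the stat module.
# S_IFMT masks out the file-type bit field; comparing the masked value
# against a type constant is trivial to verify by eye in octal or hex.
S_IFMT  = 0o170000   # bit mask for the file-type bit field
S_IFDIR = 0o040000   # directory
S_IFREG = 0o100000   # regular file

def is_dir(mode):
    return (mode & S_IFMT) == S_IFDIR

assert is_dir(0o040755)       # directory, permissions rwxr-xr-x
assert not is_dir(0o100644)   # regular file, permissions rw-r--r--
print(hex(S_IFDIR), oct(S_IFDIR))  # 0x4000 0o40000
```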
ChrisA From ncoghlan at gmail.com Mon Aug 29 00:35:15 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 29 Aug 2016 14:35:15 +1000 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <57C3AECA.1030207@brenbarn.net> References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829032945.GD29601@kundert.designers-guide.com> <57C3AECA.1030207@brenbarn.net> Message-ID: On 29 August 2016 at 13:40, Brendan Barnwell wrote: > On 2016-08-28 20:29, Ken Kundert wrote: >> >> What is wrong with have two ways of doing things? We have many ways of >> specifying the value of the integer 16: 0b10000, 0o20, 16, 0x10, 16L, .... > > > Zen of Python: "There should be one-- and preferably only one > --obvious way to do it." > > If Python didn't have binary or octal notation and someone came here > proposing it, I would not support it, for the same reasons I don't support > your proposal. If someone proposed eliminating binary or octal notation for > Python 4 (or even maybe Python 3.8), I would probably support it for the > same reason. Those notations are not useful enough to justify their > existence. Hexadecimal is more justifiable as it is far more widely used, > but I would be more open to removing hexadecimal than I would be to adding > octal. Octal literals were on the Python 3 chopping block, with only two things saving them: - *nix file permissions (i.e. the existing sysadmin user base) - the proposal to switch to "0o" as the prefix The addition of "0b" was to make bitwise operators easier to work with, rather than requiring folks to mentally convert between binary and hexadecimal just to figure out how to set a particular bit flag, with the requirement to understand binary math being seen as an essential requirement for working with computers at the software development level (since it impacts so many things, directly and indirectly). 
Hexadecimal then sticks around as a way of more concisely writing binary literals. However, the readability-as-a-general-purpose-language argument in the case of SI scaling factors goes as follows: - exponential notation (both the scientific and engineering variants) falls into the same "required to understand computers" category as binary and hexadecimal notation - for folks that have memorised the SI scaling factors, the engineering notation equivalents should be just as readable - for folks that have not memorised the SI scaling factors, the engineering notation equivalents are *more* readable - therefore, at the language level, this is a style guide recommendation to use engineering notation for quantitative literals over scientific notation (since engineering notation is easier to mentally convert to SI prefixes) However, once we're talking domain specific languages (like circuit design), rather than a general purpose programming language, then knowledge of the SI prefixes can be included in the assumed set of user knowledge, and made a language level feature. Cheers, Nick.
>> > Don't Python's users in the scientific and engineering communities deserve >> > the same treatment? These are, after all, core communities for Python. >> >> Yes. That's why we have things like the @ matrix multiplication >> operator (because the numeric computational community asked for it), >> and %-formatting for bytes strings (because the networking, mainly >> HTTP serving, community asked for it). Python *does* have a history of >> supporting things that are needed by specific sub-communities of >> Python coders. But there first needs to be a demonstrable need. How >> much are people currently struggling due to the need to transform >> "gigapascal" into "e+9"? Can you show convoluted real-world code that >> would be made dramatically cleaner by language support? > > Can you show code that would have been convoluted if Python had used a library > rather than built-in support for hexadecimal numbers? See my other email, with examples of bit flags. It's not too bad if you only ever work with a single bit at a time, but bit masks combine beautifully in binary, fairly cleanly in hex, and really badly in decimal. Hex is a great trade-off between clean bit handling and compact representation. (Octal is roughly the same trade-off, and in days of yore was the one obvious choice, but hex has overtaken it.) > So, in summary, you are suggesting that we tell the scientific and engineering > communities that we refuse to provide native support for their preferred way of > writing numbers because: > 1. our way is better, Is more general, yes. If all you have is SI prefixes, you're badly scuppered. If all you have is exponential notation, you can do everything. > 2. their way is bad because some uneducated person might see the numbers and not > understand them, Is, again, less general. It's a way of writing numbers that makes sense only in a VERY narrow area. > 3. 
we already have way of representing numbers that we came up with in the '60s > and we simply cannot do another, False. > 4. well we could do it, but we have decided that if you would only adapt to this > new way of doing things that we just came up with, then we would not have to > do any work, and that is better for us. Oh and this this new way of writing > numbers, it only works in the program itself. Your out of luck when it comes > to IO. I'm not sure what you mean by "IO" here, but if you're writing a program that accepts text strings and prints text strings, it's free to do whatever it wants. > These do not seem like good reasons for not doing this. Not worded the way you have them, no, because you've aimed for an extremely emotional argument instead of answering concrete questions like "where's the code that this would improve". Find some real-world code that would truly benefit from this. Show us how it's better. Something that I don't think you've acknowledged is that the SI scaling markers are *prefixes to units*, not *suffixes to numbers*. That is to say, you don't have "132G" of a thing called a "pascal", you have "132" of a thing called a "gigapascal". Outside of a unit-based system, SI prefixes don't really have meaning. I don't remember ever, in a scientific context, seeing a coefficient of friction listed as "30 milli-something"; it's always "0.03". So if unitless values are simply numbers, and Python's ints and floats are unitless, they won't see much benefit from prefixes-on-nothing. ChrisA From thalesfc at gmail.com Mon Aug 29 01:53:40 2016 From: thalesfc at gmail.com (Thales filizola costa) Date: Sun, 28 Aug 2016 22:53:40 -0700 Subject: [Python-ideas] a multiProcess scheduler In-Reply-To: References: Message-ID: Hi Nick, I have just checked all the links you posted, they are indeed very interesting and very efficient. 
However, I think those are very complicated in terms of installation and setup, and I still see a lot of uses for a multi-process scheduler. 2016-08-28 20:32 GMT-07:00 Nick Coghlan : > On 29 August 2016 at 11:50, Thales filizola costa > wrote: > > What do you guys think? How to improve it? Is it relevant enough to be > > incorporated to std python ? > > There are actually quite a few distributed schedulers out there (which > can expand beyond a single machine), but "python multiprocess > scheduler" isn't likely to bring them up in a web search (as when > you're limited to a single machine, multiprocessing.Pool or > concurrent.futures.ProcessPoolExecutor is generally already good > enough). > > At a Python level, Celery is probably the most popular option for > that: http://www.celeryproject.org/ > > Another well-established option is Kamaelia: http://www.kamaelia.org/Home.html > > Dask is a more recent alternative specifically focused on > computational tasks: http://dask.pydata.org/en/latest/ > > Once you move outside Python specific tooling, there are even more > language independent options to play with, including the likes of > Mesos and Kubernetes. > > Cheers, > Nick. > > P.S. It's a fairly sad indictment of our industry that people think > this is a sensible question to ask in developer interviews - the > correct answer from a business efficiency perspective is "I wouldn't, > I would use an existing open source task scheduler rather than > inventing my own", just as the correct answer to "How would you > implement a sort algorithm?" from that perspective is "I wouldn't, as > the Python standard library's sorting implementation is vastly > superior to anything I could come up with in 5 minutes on a > whiteboard, and the native sorting capabilities of databases are also > excellent".
Reimplementing existing software from first principles is > a great learning exercise, but it's not particularly relevant to the > task of day-to-day software construction in most organisations. > > (Alternatively, if the answer the interviewer is looking for is "I > wouldn't, I would use...", then it may be an unfair "Gotcha!" > question, and those aren't cool either, since they expect the > interviewee to be able to read the interviewer's mind, rather than > just answering the specific question they were asked) > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From python-ideas at shalmirane.com Mon Aug 29 03:07:58 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Mon, 29 Aug 2016 00:07:58 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829034520.GY26300@ando.pearwood.info> References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> <20160829034520.GY26300@ando.pearwood.info> Message-ID: <20160829070758.GA19357@kundert.designers-guide.com> On Mon, Aug 29, 2016 at 01:45:20PM +1000, Steven D'Aprano wrote: > On Sun, Aug 28, 2016 at 08:26:38PM -0700, Brendan Barnwell wrote: > > On 2016-08-28 18:44, Ken Kundert wrote: > > >When working with a general purpose programming language, the above numbers > > >become: > > > > > > 780kpc -> 7.8e+05 > [...] > > For the record, I don't know what kpc might mean. "kilo pico speed of > light"? So I looked it up using units, and it is kilo-parsecs. That > demonstrates that unless your audience is intimately familiar with the > domain you are working with, adding units (especially units that aren't > actually used for anything) adds confusion. > > Python is not a specialist application targetted at a single domain. It > is a general purpose programming language where you can expect a lot of > cross-domain people (e.g. 
a system administrator asked to hack on a > script in a domain they know nothing about). I talked to an astrophysicist about your comments, and what she said was: 1. She would love it if Python had built-in support for real numbers with SI scale factors 2. I told her about my library for reading and writing numbers with SI scale factors, and she was much less enthusiastic because using it would require convincing the rest of the group, which would be too much effort. 3. She was amused by the "kilo pico speed of light" comment, but she was adamant that the fact that you, or some system administrator, does not understand what kpc means has absolutely no effect on her desire to use SI scale factors. Her comment: I did not write it for him. 4. She pointed out that the software she writes and uses is intended either for herself or other astrophysicists. No system administrators involved. > > You've continually repeated this assertion, but I don't buy it. For > the general case, exponential notation is easier to read because you can > always see exactly what the exponent is as a number. To read SI units, > you have to know all the SI prefixes. This may well be common within > scientific communities, but to say that it is "easier" is really a bit > much. The same is true of "harder to type". "kpc" is three characters; > e+5 is also three (note that you don't need to write e+05), > > You don't have to write e+5 either, just e5 is sufficient. > > > and one of > those is a number that transparently indicates how many places to move > the decimal, whereas all of the letters in "kpc" are opaque unless you > already know what the number is meant to represent. > > > > If you have concrete evidence (e.g., from actual user experience > > research) showing that it is across-the-board "easier" to read or type > SI prefixes than exponential notation, that would be good to see. 
> > I completely believe Ken that within a single tightly focussed user > community, using their expected conventions (including SI prefixes) > works really well. But Python users do not belong to a single tightly > focussed user community. You think that Python is only used by generalists? That is silly. Have you seen SciPy? If you think that, take a look at Casa (casaguides.nrao.edu). It is written by astrophysicists for astrophysicists doing observations on radio telescope arrays. That is pretty specialized. -Ken From brenbarn at brenbarn.net Mon Aug 29 03:18:02 2016 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Mon, 29 Aug 2016 00:18:02 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829070758.GA19357@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> <20160829034520.GY26300@ando.pearwood.info> <20160829070758.GA19357@kundert.designers-guide.com> Message-ID: <57C3E1AA.5050107@brenbarn.net> On 2016-08-29 00:07, Ken Kundert wrote: >> >I completely believe Ken that within a single tightly focussed user >> >community, using their expected conventions (including SI prefixes) >> >works really well. But Python users do not belong to a single tightly >> >focussed user community. > You think that Python is only used by generalists? That is silly. Have you seen > SciPy? If you think that, take a look at Casa (casaguides.nrao.edu). It is > written by astrophysicists for astrophysicists doing observations on radio > telescope arrays. That is pretty specialized. I think you misunderstand. My position (reiterated by the text you quote from Steven D'Aprano) is not that Python is used only by generalists. It is that we shouldn't change Python in a way that ONLY helps specialists. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." 
--author unknown From python-ideas at shalmirane.com Mon Aug 29 03:31:57 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Mon, 29 Aug 2016 00:31:57 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> Message-ID: <20160829073156.GC19357@kundert.designers-guide.com> > There are other languages too that had hexadecimal and octal. > > They've been around in programming languages for decades. > > How many languages have scale factors? > > Does Fortran? Not that I know of. > The reason why hexadecimal and octal are in general purpose languages and real numbers with SI scale factors are not is because languages are developed by computer scientists and not by scientists. I keep using SPICE and Verilog as examples of languages that support SI scale factors, and that is because they are the extremely rare cases where the languages were either developed or specified by end users and not by computer scientists. The reason why computer scientists tend to add hexadecimal and octal numbers to their languages and not SI scale factors is that they use hexadecimal and octal numbers, and as we have seen by this discussion, are rather unfamiliar with real numbers with SI scale factors. It is easy for them to justify adding hex because they know from personal experience that it is useful, but if you don't use widely scaled real numbers day in and day out it is hard to understand just how tedious exponential notation is and how useful it would be to use SI scale factors. 
From python-ideas at shalmirane.com Mon Aug 29 03:56:35 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Mon, 29 Aug 2016 00:56:35 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <57C3E1AA.5050107@brenbarn.net> References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> <20160829034520.GY26300@ando.pearwood.info> <20160829070758.GA19357@kundert.designers-guide.com> <57C3E1AA.5050107@brenbarn.net> Message-ID: <20160829075635.GE19357@kundert.designers-guide.com> On Mon, Aug 29, 2016 at 12:18:02AM -0700, Brendan Barnwell wrote: > On 2016-08-29 00:07, Ken Kundert wrote: > > > >I completely believe Ken that within a single tightly focussed user > > > >community, using their expected conventions (including SI prefixes) > > > >works really well. But Python users do not belong to a single tightly > > > >focussed user community. > > You think that Python is only used by generalists? That is silly. Have you seen > > SciPy? If you think that, take a look at Casa (casaguides.nrao.edu). It is > > written by astrophysicists for astrophysicists doing observations on radio > > telescope arrays. That is pretty specialized. > > I think you misunderstand. My position (reiterated by the text you quote > from Steven D'Aprano) is not that Python is used only by generalists. It is > that we shouldn't change Python in a way that ONLY helps specialists. > But surely we should consider changing Python if the change benefits a wide variety of specialists, especially if the change is small and fits cleanly into the language. In this case, our specialists come from most of the disciplines of science and engineering. That is a pretty big group. 
-Ken From rosuav at gmail.com Mon Aug 29 04:02:29 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 29 Aug 2016 18:02:29 +1000 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829070758.GA19357@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> <20160829034520.GY26300@ando.pearwood.info> <20160829070758.GA19357@kundert.designers-guide.com> Message-ID: On Mon, Aug 29, 2016 at 5:07 PM, Ken Kundert wrote: > > I talked to an astrophysicist about your comments, and what she said was: > 1. She would love it if Python had built-in support for real numbers with SI > scale factors > 2. I told her about my library for reading and writing numbers with SI scale > factors, and she was much less enthusiastic because using it would require > convincing the rest of the group, which would be too much effort. > 3. She was amused by the "kilo pico speed of light" comment, but she was adamant > that the fact that you, or some system administrator, does not understand > what kpc means has absolutely no effect on her desire to use SI scale > factors. Her comment: I did not write it for him. > 4. She pointed out that the software she writes and uses is intended either for > herself or other astrophysicists. No system administrators involved. So can you share some of her code, and show how the ability to scale unitless numbers would help it? ChrisA From python-ideas at shalmirane.com Mon Aug 29 05:08:18 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Mon, 29 Aug 2016 02:08:18 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829032945.GD29601@kundert.designers-guide.com> Message-ID: <20160829090817.GF19357@kundert.designers-guide.com> > > These do not seem like good reasons for not doing this. 
> > Not worded the way you have them, no, because you've aimed for an > extremely emotional argument instead of answering concrete questions > like "where's the code that this would improve". Find some real-world > code that would truly benefit from this. Show us how it's better. Sorry, I am trying very hard not to let my emotions show through, and instead provide answers, examples, and comments that are well reasoned and well supported. I do find it frustrating that I appear to be the only one involved in the conversation that has a strong background in numerical computation, meaning that I have to carry one side of the argument without any support. It is especially frustrating when that background is used as a reason to discount my position. Let me try to make the case in an unemotional way. It is hard to justify the need for SI scale factors being built into the language with an example because it is relatively simple to do the conversion. For example ... With built-in support for SI scale factors: h_line = 1.4204GHz print('hline = {:r}Hz'.format(h_line)) ... In Python today: from engfmt import Quantity h_line = Quantity('1.4204GHz') print('hline = {:q}'.format(h_line)) h_line = float(h_line) ... Not really much harder to use the library. This is very similar to the situation with octal numbers ... With built-in support for octal numbers: S_IFREG = 0o100000 # regular file Without built-in support for octal numbers: S_IFREG = int('100000', base=8) # regular file So just giving a simple example is not enough to see the importance of native support. The problem with using a library is that you always have to convert from SI scale factors as the number is input and then convert back as the number is output. So you can spend a fair amount of effort converting to and from representations that support SI scale factors. Not a big deal if there are only a few, but it can be burdensome if there is a large number. 
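To make the comparison concrete, here is a rough sketch of what such a conversion layer can look like in plain Python. This is illustrative only: it is not Ken's engfmt library, and the prefix table and formatting rules below are my own assumptions.

```python
import re

# Toy SI-scale-factor reader/writer (illustrative; not the engfmt API).
_PREFIXES = {'y': -24, 'z': -21, 'a': -18, 'f': -15, 'p': -12, 'n': -9,
             'u': -6, 'm': -3, '': 0, 'k': 3, 'M': 6, 'G': 9, 'T': 12,
             'P': 15, 'E': 18, 'Z': 21, 'Y': 24}

def from_si(text):
    """Parse a string like '1.4204GHz' or '100k' into a float.

    Trailing units are ignored. Note the inherent ambiguity: a bare
    'm' is always read as milli here, never as metres.
    """
    m = re.fullmatch(r'([-+]?[0-9.]+(?:[eE][-+]?\d+)?)\s*'
                     r'([yzafpnumkMGTPEZY]?)([A-Za-z]*)', text.strip())
    if not m:
        raise ValueError('cannot parse %r' % text)
    mantissa, prefix, _units = m.groups()
    return float(mantissa) * 10 ** _PREFIXES[prefix]

def to_si(value, units=''):
    """Format a float using the largest prefix that keeps the mantissa >= 1."""
    for prefix, exp in sorted(_PREFIXES.items(), key=lambda kv: -kv[1]):
        if abs(value) >= 10 ** exp:
            return '%.4g%s%s' % (value / 10 ** exp, prefix, units)
    return '%.4g%s' % (value, units)
```

With something like this, h_line = from_si('1.4204GHz') and to_si(h_line, 'Hz') round-trip the example without language support; the burden Ken describes is that every input and output boundary of a program needs such calls.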
But the real benefit to building it in as a native capability is that it puts pressure on the rest of the ecosystem to also adopt the new way of representing real numbers. For example, now the interchange packages and formats (Pickle, YAML, etc.) need to come up with a way of passing the information without losing its essential character. This in turn puts pressure on other languages to follow suit. It would also put pressure on documenting and formatting packages, such as Sphinx, Jinja, and matplotlib, to adapt. Now it becomes easier to generate clean documentation. Also the interactive environments, such as ipython, need to adapt. The more this occurs, the better life gets for scientists and engineers. > Something that I don't think you've acknowledged is that the SI > scaling markers are *prefixes to units*, not *suffixes to numbers*. > That is to say, you don't have "132G" of a thing called a "pascal", > you have "132" of a thing called a "gigapascal". Outside of a > unit-based system, SI prefixes don't really have meaning. I don't > remember ever, in a scientific context, seeing a coefficient of > friction listed as "30 milli-something"; it's always "0.03". So if > unitless values are simply numbers, and Python's ints and floats are > unitless, they won't see much benefit from prefixes-on-nothing. Yeah, this is why I suggested that we support the ability for users to specify units with the numbers, but it is not a hard and fast rule. Once you add support for SI scale factors, people find them so convenient that they tend to use them whether they are units or not. For example, it is common for circuit designers to specify the gain of an amplifier using SI scale factors even though gain is often unitless. For example, gain=50k. Also, electrical engineers will often drop the units when they are obvious, especially if they are long. For example, it is common to see a resistance specified as 100k. 
When values are given in a table and all the values in a column have the same units, it is common to give numbers with scale factors but without units to save space. -Ken From tjreedy at udel.edu Mon Aug 29 05:12:39 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 29 Aug 2016 05:12:39 -0400 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829014404.GB29601@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> Message-ID: On 8/28/2016 9:44 PM, Ken Kundert wrote: > The way the scientific and engineering communities predominately write real > numbers is by using SI scale factors. I don't believe it, not with naked scale factors as you have proposed. I have worked in science and I never saw naked scale factors until this proposal. The scale factors are usually attached to units. > These numbers almost always represent > physical quantities, so it is common to write the number with scale factor and > units. The scale factor is part of the unit, and people now learn this in grade school, I presume. > So for example, the distance to Andromeda is 780kpc, the pressure at the > bottom of the Mariana Trench is 108MPa, the total power produced by a typical > hurricane is 600TW, the number of base pairs in human DNA is 3.2Gb, and the Bohr > radius is 53pm. These are all scaled units and to me not relevant to the proposed addition of scale factors without units. At this point I quit reading. 
-- Terry Jan Reedy From rosuav at gmail.com Mon Aug 29 05:37:28 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 29 Aug 2016 19:37:28 +1000 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829090817.GF19357@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829032945.GD29601@kundert.designers-guide.com> <20160829090817.GF19357@kundert.designers-guide.com> Message-ID: On Mon, Aug 29, 2016 at 7:08 PM, Ken Kundert wrote: >> > These do not seem like good reasons for not doing this. >> >> Not worded the way you have them, no, because you've aimed for an >> extremely emotional argument instead of answering concrete questions >> like "where's the code that this would improve". Find some real-world >> code that would truly benefit from this. Show us how it's better. > > Sorry, I am trying very hard not to let my emotions show through, and instead > provide answers, examples, and comments that are well reasoned and well > supported. An "emotional argument" doesn't necessarily mean that your emotions are governing everything - it's more that your line of reasoning is to play on other people's emotions, rather than on concrete data. Your primary argument has been "But think of all the scientific developers - don't you care about them??", without actually giving us code to work with. (In a recent post, you did at least give notes from a conversation with a lady of hard science, but without any of her code, we still can't evaluate the true benefit of unitless SI scaling.) > I do find it frustrating that I appear to be the only one involved in > the conversation that has a strong background in numerical computation, meaning > that I have to carry one side of the argument without any support. It is > especially frustrating when that background is used as a reason to discount my > position. Let me try to make the case in an unemotional way. 
No no no; your background is not a reason to discount your position. However, on its own, it's insufficient justification for your position. Suppose I come to python-ideas and say "Hey, the MUD community would really benefit from a magic decoder that would use UTF-8 where possible, ISO-8859-1 as fall-back, and Windows-1252 for characters not in 8859-1". Apart from responding that 8859-1 is a complete subset of 1252, there's not really a lot that you could discuss about that proposal, unless I were to show you some of my code. I can tell you about the number of MUDs that I play, the number of MUD clients that I've written, and some stats from my MUD server, and say "The MUD community needs this support", but it's of little value compared to actual code. (For the record, a two-step decode of "UTF-8, fall back on 1252" is exactly what I do... in half a dozen lines of code. So this does NOT need to be implemented.) That's why I keep asking you for code examples. Real-world code, taken from important projects, that would be significantly improved by this proposal. It has to be Python 3 compatible (unless you reckon that this is the killer feature that will make people take the jump from 2.7), and it has to be enough of an improvement that its authors will be willing to drop support for <3.6 (which might be a trivial concern, eg if the author expects to be the only person running the code). > So just giving a simple example is not enough to see the importance of native > support. The problem with using a library is that you always have to convert > from SI scale factors as the number is input and then converting back as the > number is output. So you can spend a fair amount of effort converting too and > from representations that support SI scale factors. Not a big deal if there is > only a few, but can be burdensome if there is a large number. 
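(For reference, the "half a dozen lines" two-step decode Chris mentions above is presumably something along these lines; his actual code is not shown, so this is only a guess at its shape.)

```python
def decode_lenient(raw: bytes) -> str:
    """Decode as UTF-8 where possible, falling back to Windows-1252."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        try:
            # cp1252 covers the smart-quote/dash range that latin-1
            # treats as control characters...
            return raw.decode('cp1252')
        except UnicodeDecodeError:
            # ...but cp1252 leaves a few byte values undefined, so
            # latin-1 (which maps all 256 bytes) is the last resort.
            return raw.decode('latin-1')
```

The latin-1 final step guarantees the function never raises, which is what makes a catch-all like this workable for legacy network text.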
Maybe; or maybe you'd already be doing a lot of I/O work, and it's actually quite trivial to slap in one little function call at one call site, and magically apply it to everything you do. Without code, we can't know. (I sound like a broken record here.) > But the real > benefit to building it in as a native capability is that it puts pressure on the > rest of the ecosystem to also adopt the new way of representing real numbers. > For example, now the interchange packages and formats (Pickle, YaML, etc.) need > to come up with a way of passing the information without losing its essential > character. Ewwww, I doubt it. That would mean a whole lot of interchange formats would need to have a whole new form of support. You didn't mention JSON, but that's a lot more common than Pickle; and JSON is unlikely to gain a feature like this unless ECMAScript does (because JSON is supposed to be a strict subset of JavaScript's Object Notation). Pickle - at least the one in Python - doesn't need any way to store non-semantic information, so unless you intend for the scale factor to be a fundamental part of the number, it won't need changes. (People don't manually edit pickle files the way they edit JSON files.) YAML might need to be enhanced. That's the only one I can think of. And it's a big fat MIGHT.
Are SI-scaled numbers somehow different from raw numbers, or are they equivalent forms (like 1.1e3 and 1100.0)? If they're equivalent forms, how do you decide how to represent on output? Are all these questions to be answered globally, across all systems, or is it okay for one group to decide one thing and another another? >> Something that I don't think you've acknowledged is that the SI >> scaling markers are *prefixes to units*, not *suffixes to numbers*. >> That is to say, you don't have "132G" of a thing called a "pascal", >> you have "132" of a thing called a "gigapascal". Outside of a >> unit-based system, SI prefixes don't really have meaning. I don't >> remember ever, in a scientific context, seeing a coefficient of >> friction listed as "30 milli-something"; it's always "0.03". So if >> unitless values are simply numbers, and Python's ints and floats are >> unitless, they won't see much benefit from prefixes-on-nothing. > > Yeah, this is why I suggested that we support the ability for users to specify > units with the numbers, but it is not a hard and fast rule. Once you add support > for SI scale factors, people find them so convenient that they tend to use them > whether they are units or not. For example, it is common for circuit designers > to specify the gain of an amplifier using SI scale factors even though gain is > often unitless. For example, gain=50k. Also, electrical engineers will often > drop the units when they are obvious, especially if they are long. For example, > it is common to see a resistance specified as 100k. When values are given in > a table and all the values in a column have the same units, it is common to give > numbers with scale factors but without units to save space. Gain of 50k, that makes reasonable sense. Resistance as 100k is a shorthand for "100 kiloohms", and it's still fundamentally a unit-bearing value. All of this would be fine if you were building a front-end that was designed *SOLELY* for electrical engineers. 
So maybe that's the best way. Fork IDLE or iPython and build the very best electrical engineering interactive Python; it doesn't matter, then, how crazy it is for everyone else. You can mess with stuff on the way in and on the way out, you can interpret numbers as unitless values despite being written as "100kPa", and you can figure out what to do in all the edge cases based on actual real-world usage. You'd have your own rules for backward compatibility, rather than being bound by Python's release model (18 months between feature improvements, and nothing gets dropped without a VERY good reason), so you could chop and change as you have need. The base language would still be Python, but it'd be so optimized for electrical engineers that you'd never want to go back to vanilla CPython 3.6. Sound doable? ChrisA From ncoghlan at gmail.com Mon Aug 29 07:13:12 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 29 Aug 2016 21:13:12 +1000 Subject: [Python-ideas] a multiProcess scheduler In-Reply-To: References: Message-ID: On 29 August 2016 at 15:53, Thales filizola costa wrote: > Hi Nick, > > I have just checked all the links you posted, they are indeed very > interesting and very efficient. However, I think those are very complicated > in terms of installation and setup, and I still see a lot of uses for a > multi-process scheduler. Potentially, but one of the big challenges you'll face is to establish how it differs from using asyncio in the current process to manage tasks dispatched to other processes via run_in_executor, and when specifically it would be a useful thing for a developer to have in the builtin toolkit (vs being something they can install from PyPI). 
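For illustration, the building block Nick refers to looks like this in practice: asyncio provides the scheduling in the current process, while run_in_executor dispatches CPU-bound jobs to worker processes. (A sketch using the current asyncio API; this is the stdlib baseline, not the scheduler being proposed.)

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n):
    # Stand-in for a CPU-heavy task; runs in a worker process.
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # The event loop acts as the scheduler: each job is farmed out
        # to a separate process and all are awaited concurrently.
        jobs = [loop.run_in_executor(pool, cpu_bound, n)
                for n in (10_000, 20_000, 30_000)]
        return await asyncio.gather(*jobs)

if __name__ == '__main__':
    results = asyncio.run(main())
```

Anything a new multiprocess scheduler adds has to justify itself relative to this dozen or so lines.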
Don't get me wrong, I think it's really cool that you were able to implement this - there's just a big gap between "implementing this was useful to me" and "this is sufficiently useful in a wide enough range of cases not otherwise addressed by the standard library that it should be added as a new standard application building block". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Aug 29 07:38:00 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 29 Aug 2016 21:38:00 +1000 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829090817.GF19357@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829032945.GD29601@kundert.designers-guide.com> <20160829090817.GF19357@kundert.designers-guide.com> Message-ID: On 29 August 2016 at 19:08, Ken Kundert wrote: > Also the interactive environments, such as ipython, need to > adapt. The more this occurs, the better life gets for scientists and engineers. This theory of change is backwards - we follow IPython and Project Jupyter when it comes to understanding what's a desirable UX for scientists (primarily) and engineers (somewhat), rather than the other way around. (Ditto for SciPy and Numpy for the computational requirements side of things - in addition to the already referenced https://www.python.org/dev/peps/pep-0465/ for matrix multiplication, there's also https://www.python.org/dev/peps/pep-0357/ which defined the __index__ protocol, and https://www.python.org/dev/peps/pep-3118/ which defined a rich C-level protocol for shaped data export. 
Even before there was a PEP process, extended slicing and the Ellipsis literal were added for the benefit of folks writing multidimensional array indexing libraries) So if your aim is to make a "scientists & engineers will appreciate it" UX argument, then you're unlikely to gain much traction here if you haven't successfully made that argument in the Project Jupyter and/or SciPy ecosystems first - if there was a popular "%%siunits" cell magic, or a custom Project Jupyter kernel that added support for SI literals, we'd be having a very different discussion (and you wouldn't feel so alone in making the case for the feature). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mertz at gnosis.cx Mon Aug 29 07:46:36 2016 From: mertz at gnosis.cx (David Mertz) Date: Mon, 29 Aug 2016 04:46:36 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829090817.GF19357@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829032945.GD29601@kundert.designers-guide.com> <20160829090817.GF19357@kundert.designers-guide.com> Message-ID: I teach working scientists about numeric computing on a daily basis. There are a few special fields where Ken's ideas are the norm, at least in informal notation. The large majority of working scientists would find a syntax change like he proposes an annoyance and nuisance. Alienating and confusing everyone who isn't a circuit designer is a bad goal. It's not going to happen in python. If you really want this syntax, you need to use a different language, or maybe write a preprocessor that turns a slightly different language back into Python. On Aug 29, 2016 4:09 AM, "Ken Kundert" wrote: > > > These do not seem like good reasons for not doing this. > > > > Not worded the way you have them, no, because you've aimed for an > > extremely emotional argument instead of answering concrete questions > > like "where's the code that this would improve". 
Find some real-world > > code that would truly benefit from this. Show us how it's better. > > Sorry, I am trying very hard not to let my emotions show through, and > instead > provide answers, examples, and comments that are well reasoned and well > supported. I do find it frustrating that I appear to be the only one > involved in > the conversation that has a strong background in numerical computation, > meaning > that I have to carry one side of the argument without any support. It is > especially frustrating when that background is used as a reason to > discount my > position. Let me try to make the case in an unemotional way. > > It is hard to justify the need for SI scale factors being built into the > language with an example because it is relatively simple to do the > conversion. > For example ... > > With built-in support for SI scale factors: > > h_line = 1.4204GHz > print('hline = {:r}Hz'.format(h_line)) > ... > > In Python today: > > from engfmt import Quantity > h_line = Quantity('1.4204GHz') > print('hline = {:q}'.format(h_line)) > h_line = float(h_line) > ... > > Not really much harder to use the library. This is very similar to the > situation > with octal numbers ... > > With built-in support for octal numbers: > > S_IFREG = 0o100000 # regular file > > With out built-in support for octal numbers: > > S_IFREG = int('100000', base=8) # regular file > > So just giving a simple example is not enough to see the importance of > native > support. The problem with using a library is that you always have to > convert > from SI scale factors as the number is input and then converting back as > the > number is output. So you can spend a fair amount of effort converting too > and > from representations that support SI scale factors. Not a big deal if > there is > only a few, but can be burdensome if there is a large number. 
But the real > benefit to building it in a native capability is that it puts pressure on > the > rest of the ecosystem to also adopt the new way of representing real > numbers. > For example, now the interchange packages and formats (Pickle, YaML, etc.) > need > to come up with a way of passing the information without losing its > essential > character. This in turn puts pressure on other languages to follow suit. > It > would also put pressure on documenting and formatting packages, such as > Sphinx, > Jinja, and matplotlib, to adapt. Now it becomes easier to generate clean > documentation. Also the interactive environments, such as ipython, need to > adapt. The more this occurs, the better life gets for scientists and > engineers. > > > > Something that I don't think you've acknowledged is that the SI > > scaling markers are *prefixes to units*, not *suffixes to numbers*. > > That is to say, you don't have "132G" of a thing called a "pascal", > > you have "132" of a thing called a "gigapascal". Outside of a > > unit-based system, SI prefixes don't really have meaning. I don't > > remember ever, in a scientific context, seeing a coefficient of > > friction listed as "30 milli-something"; it's always "0.03". So if > > unitless values are simply numbers, and Python's ints and floats are > > unitless, they won't see much benefit from prefixes-on-nothing. > > Yeah, this is why I suggested that we support the ability for users to > specify > units with the numbers, but it is not a hard and fast rule. Once you add > support > for SI scale factors, people find them so convenient that they tend to use > them > whether they are units or not. For example, it is common for circuit > designers > to specify the gain of an amplifier using SI scale factors even though > gain is > often unitless. For example, gain=50k. Also, electrical engineers will > often > drop the units when they are obvious, especially if they are long. 
For > example, > it is common to see a resistance specified as 100k. When values are given > in > a table and all the values in a column have the same units, it is common > to give > numbers with scale factors by without units to save space. > > -Ken > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Mon Aug 29 08:10:01 2016 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Mon, 29 Aug 2016 13:10:01 +0100 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829014404.GB29601@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> Message-ID: On 29/08/2016 02:44, Ken Kundert wrote: > > Changing Python so that it understands SI scale factors on real numbers as first > class citizens innately requires a change to the base language; it cannot be > done solely through libraries. The question before you is, should we do it? > No, no, no, if the people who provide this http://www.scipy.org/ can do without it. Now would you please be kind enough to give up with this dead horse before I take a ride to the Clifton Suspension Bridge or Beachy Head, whichever is closest. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From stephanh42 at gmail.com Mon Aug 29 08:35:26 2016 From: stephanh42 at gmail.com (Stephan Houben) Date: Mon, 29 Aug 2016 14:35:26 +0200 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> Message-ID: Note that the Sage computer algebra system uses Python with some syntactic changes implemented by a "pre-parser". 
The current proposal could be implemented in a similar way and then integrated in, say, IPython. If it would prove to be wildly popular, then it would make a stronger case for incorporation in the core. Stephan On 29 Aug 2016, 2:12 p.m., "Mark Lawrence via Python-ideas" < python-ideas at python.org> wrote: > On 29/08/2016 02:44, Ken Kundert wrote: > >> >> Changing Python so that it understands SI scale factors on real numbers >> as first >> class citizens innately requires a change to the base language; it cannot >> be >> done solely through libraries. The question before you is, should we do >> it? >> >> > No, no, no, if the people who provide this http://www.scipy.org/ can do > without it. > > Now would you please be kind enough to give up with this dead horse before > I take a ride to the Clifton Suspension Bridge or Beachy Head, whichever is > closest. > > -- > My fellow Pythonistas, ask not what our language can do for you, ask > what you can do for our language. > > Mark Lawrence > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Mon Aug 29 08:55:13 2016 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Mon, 29 Aug 2016 13:55:13 +0100 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> Message-ID: On 29/08/2016 13:35, Stephan Houben wrote: > Note that the Sage computer algebra system uses Python with some > syntactic changes implemented by a "pre-parser". > > The current proposal could be implemented in a similar way and then > integrated in, say, IPython. > > If it would prove to be wildly popular, then it would make a stronger > case for incorporation in the core.
> > Stephan > > > On 29 Aug 2016, 2:12 p.m., "Mark Lawrence via Python-ideas" > > wrote: > > On 29/08/2016 02:44, Ken Kundert wrote: > > > Changing Python so that it understands SI scale factors on real > numbers as first > class citizens innately requires a change to the base language; > it cannot be > done solely through libraries. The question before you is, > should we do it? > > No, no, no, if the people who provide this http://www.scipy.org/ can > do without it. > > Now would you please be kind enough to give up with this dead horse > before I take a ride to the Clifton Suspension Bridge or Beachy > Head, whichever is closest. > As IPython is a core part of scipy, which I linked above, why would the developers want to incorporate this suggestion? I'd have also thought that if this idea was to be "wildly popular" it would have been done years ago. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From erik.m.bray at gmail.com Mon Aug 29 09:05:50 2016 From: erik.m.bray at gmail.com (Erik Bray) Date: Mon, 29 Aug 2016 15:05:50 +0200 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829070758.GA19357@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> <20160829034520.GY26300@ando.pearwood.info> <20160829070758.GA19357@kundert.designers-guide.com> Message-ID: On Mon, Aug 29, 2016 at 9:07 AM, Ken Kundert wrote: > On Mon, Aug 29, 2016 at 01:45:20PM +1000, Steven D'Aprano wrote: >> On Sun, Aug 28, 2016 at 08:26:38PM -0700, Brendan Barnwell wrote: >> > On 2016-08-28 18:44, Ken Kundert wrote: >> > >When working with a general purpose programming language, the above numbers >> > >become: >> > > >> > > 780kpc -> 7.8e+05 >> [...] >> >> For the record, I don't know what kpc might mean. "kilo pico speed of >> light"? So I looked it up using units, and it is kilo-parsecs. That
That >> demonstrates that unless your audience is intimately familiar with the >> domain you are working with, adding units (especially units that aren't >> actually used for anything) adds confusion. >> >> Python is not a specialist application targetted at a single domain. It >> is a general purpose programming language where you can expect a lot of >> cross-domain people (e.g. a system administrator asked to hack on a >> script in a domain they know nothing about). > > I talked to astrophysicist about your comments, and what she said was: > 1. She would love it if Python had built in support for real numbers with SI > scale factors > 2. I told her about my library for reading and writing numbers with SI scale > factors, and she was much less enthusiastic because using it would require > convincing the rest of the group, which would be too much effort. > 3. She was amused by the "kilo pico speed of light" comment, but she was adamant > that the fact that you, or some system administrator, does not understand > what kpc means has absolutely no affect on her desired to use SI scale > factors. Her comment: I did not write it for him. > 4. She pointed out that the software she writes and uses is intended either for > herself of other astrophysicists. No system administrators involved. Astropy also has a very powerful units package--originally derived from pyunit I think but long since diverged and grown: http://docs.astropy.org/en/stable/units/index.html It was originally developed especially for astronomy/astrophysics use and has some pre-defined units that many other packages don't have, as well as support for logarithmic units like decibel and optional (and customizeable) unit equivalences (e.g. frequency/wavelength or flux/power). That said, its power extends beyond astronomy and I heard through last week's EuroScipy that even some biology people have been using it. There's been some (informal) talk about splitting it out from Astropy into a stand-alone package. 
This is tricky since almost everything in Astropy has been built around it (dimensional calculations are always used where possible), but not impossible. One of the other big advantages of astropy.units is the Quantity class representing scale+dimension values. This is deeply integrated into Numpy so that units can be attached to Numpy arrays, and all Numpy ufuncs can operate on them in a dimensionally meaningful way. The needs for this have driven a number of recent features in Numpy. This is work that, unfortunately, could never be integrated into the Python stdlib. From ncoghlan at gmail.com Mon Aug 29 09:07:33 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 29 Aug 2016 23:07:33 +1000 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> Message-ID: On 29 August 2016 at 22:55, Mark Lawrence via Python-ideas wrote: > As iPython is a core part of scipy, which I linked above, why would the > developers want to incorporate this suggestion? I'd have also thought that > if this idea was to be "wildly popular" it would have been done years ago. While "If this was a good idea, it would have been done years ago" is a useful rule of thumb, it's also useful to look for ways to test that heuristic to see if there were actually just other incidental barriers in the way of broader adoption. That's particularly so for cases like this one, where a common practice in a handful of domains has failed to make the leap into the broader computing context of general purpose programming. One of the nice things about IPython for this kind of experimentation is that it's *significantly* more pluggable than the default interpreter (where you can do some interesting things with import hooks, but it's hard to change the default REPL). 
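As a concrete illustration of that pluggability, a source transformer can be appended to one of IPython's input-transformation hook lists. The sketch below is illustrative rather than a finished design: the `input_transformers_cleanup` attribute exists in IPython 7+, but the registration API has varied across versions, so treat the commented registration line as an assumption. The transform itself is plain Python operating on a list of source lines:

```python
import re

# Scale-factor expansion for a few common SI prefixes (illustrative subset).
_SCALE = {'n': 1e-9, 'u': 1e-6, 'm': 1e-3, 'k': 1e3, 'M': 1e6, 'G': 1e9}
_PATTERN = re.compile(r'\b(\d+(?:\.\d+)?)([numkMG])\b')

def scale_factor_transform(lines):
    """Expand 'x = 10k' into 'x = (10*1000.0)' before IPython parses it."""
    return [_PATTERN.sub(
        lambda m: '({}*{!r})'.format(m.group(1), _SCALE[m.group(2)]),
        line) for line in lines]

# Inside an IPython session, registration would look something like:
#   get_ipython().input_transformers_cleanup.append(scale_factor_transform)
```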
That means it's easier for people to try out ideas as code pre-processors, rather than as full Python implementations (in this case, something that translated SI unit suffixes into runtime scaling factors) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From erik.m.bray at gmail.com Mon Aug 29 09:08:04 2016 From: erik.m.bray at gmail.com (Erik Bray) Date: Mon, 29 Aug 2016 15:08:04 +0200 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> <20160829034520.GY26300@ando.pearwood.info> <20160829070758.GA19357@kundert.designers-guide.com> Message-ID: On Mon, Aug 29, 2016 at 3:05 PM, Erik Bray wrote: > On Mon, Aug 29, 2016 at 9:07 AM, Ken Kundert > wrote: >> On Mon, Aug 29, 2016 at 01:45:20PM +1000, Steven D'Aprano wrote: >>> On Sun, Aug 28, 2016 at 08:26:38PM -0700, Brendan Barnwell wrote: >>> > On 2016-08-28 18:44, Ken Kundert wrote: >>> > >When working with a general purpose programming language, the above numbers >>> > >become: >>> > > >>> > > 780kpc -> 7.8e+05 >>> [...] >>> >>> For the record, I don't know what kpc might mean. "kilo pico speed of >>> light"? So I looked it up using units, and it is kilo-parsecs. That >>> demonstrates that unless your audience is intimately familiar with the >>> domain you are working with, adding units (especially units that aren't >>> actually used for anything) adds confusion. >>> >>> Python is not a specialist application targetted at a single domain. It >>> is a general purpose programming language where you can expect a lot of >>> cross-domain people (e.g. a system administrator asked to hack on a >>> script in a domain they know nothing about). >> >> I talked to astrophysicist about your comments, and what she said was: >> 1. She would love it if Python had built in support for real numbers with SI >> scale factors >> 2. 
I told her about my library for reading and writing numbers with SI scale >> factors, and she was much less enthusiastic because using it would require >> convincing the rest of the group, which would be too much effort. >> 3. She was amused by the "kilo pico speed of light" comment, but she was adamant >> that the fact that you, or some system administrator, does not understand >> what kpc means has absolutely no affect on her desired to use SI scale >> factors. Her comment: I did not write it for him. >> 4. She pointed out that the software she writes and uses is intended either for >> herself of other astrophysicists. No system administrators involved. > > Astropy also has a very powerful units package--originally derived > from pyunit I think but long since diverged and grown: > > http://docs.astropy.org/en/stable/units/index.html > > It was originally developed especially for astronomy/astrophysics use > and has some pre-defined units that many other packages don't have, as > well as support for logarithmic units like decibel and optional (and > customizeable) unit equivalences (e.g. frequency/wavelength or > flux/power). > > That said, its power extends beyond astronomy and I heard through last > week's EuroScipy that even some biology people have been using it. > There's been some (informal) talk about splitting it out from Astropy > into a stand-alone package. This is tricky since almost everything in > Astropy has been built around it (dimensional calculations are always > used where possible), but not impossible. > > One of the other big advantages of astropy.units is the Quantity class > representing scale+dimension values. This is deeply integrated into > Numpy so that units can be attached to Numpy arrays, and all Numpy > ufuncs can operate on them in a dimensionally meaningful way. The > needs for this have driven a number of recent features in Numpy. This > is work that, unfortunately, could never be integrated into the Python > stdlib. 
I'll also add that syntactic support for units has rarely been an issue in Astropy. The existing algebraic rules for units work fine with Python's existing order of operations. It can be *nice* to be able to write "1m" instead of "1 * m" but ultimately it doesn't add much for clarity (and if really desired could be handled with a preparser--something I've considered adding for Astropy sources via codecs). Best, Erik From erik.m.bray at gmail.com Mon Aug 29 09:13:31 2016 From: erik.m.bray at gmail.com (Erik Bray) Date: Mon, 29 Aug 2016 15:13:31 +0200 Subject: [Python-ideas] A proposal to rename the term "duck typing" In-Reply-To: References: <1472401456l.14614654l.0l@psu.edu> Message-ID: On Sun, Aug 28, 2016 at 7:41 PM, Bruce Leban wrote: > > > On Sunday, August 28, 2016, ROGER GRAYDON CHRISTMAN wrote: >> >> >> We have a term in our lexicon "duck typing" that traces its origins, in >> part to a quote along the lines of >> "If it walks like a duck, and talks like a duck, ..." >> >> ... >> >> In that case, it would be far more appropriate for us to call this sort >> of type analysis "witch typing" > > > I believe the duck is out of the bag on this one. First the "duck test" that > you quote above is over 100 years old. > https://en.m.wikipedia.org/wiki/Duck_test So that's entrenched. > > Second this isn't a Python-only term anymore and language is notoriously > hard to change prescriptively. > > Third I think the duck test is more appropriate than the witch test which > involves the testers faking the results. Agreed. It's also fairly problematic given that you're deriving the term from a sketch about witch hunts. While the Monty Python sketch is hilarious, and it's the ignorant mob that's the butt of the joke rather than the "witch", this joke doesn't necessarily play well universally, especially given that there are places today where women are being killed for being "witches".
Best, Erik From rosuav at gmail.com Mon Aug 29 09:18:28 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 29 Aug 2016 23:18:28 +1000 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> Message-ID: On Mon, Aug 29, 2016 at 10:55 PM, Mark Lawrence via Python-ideas wrote: > I'd have also thought that if this idea was to be "wildly popular" it would > have been done years ago. Here's my question, though, if you want to see the lanterns so badly, why haven't you gone before? -- Flynn Rider, to Rapunzel There are a good few reasons, one of which is simply "nobody's actually done the work to implement it". And a lot of end users might be excited to use something if it were implemented, but wouldn't think to ask for it if nobody mentioned it. Spotting the feature that isn't there is a pretty hard thing to do (unless you're comparing two products and can say "what I want is program X but with feature Q from program Y"). ChrisA From steve at pearwood.info Mon Aug 29 09:20:24 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 29 Aug 2016 23:20:24 +1000 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> Message-ID: <20160829132024.GA26300@ando.pearwood.info> On Mon, Aug 29, 2016 at 02:35:26PM +0200, Stephan Houben wrote: > Note that the Sage computer algebra system uses Python with some syntactic > changes implemented by a "pre-parser". > > The current proposal could be implemented in a similar way and then > integrated in, say, Ipython. > > If it would prove to be wildly popular, then it would make a stronger case > for incorporation in the core. Indeed. My own personal feeling is that eventually unit tracking and dimensional checking will be considered as mainstream as garbage collection and type checking. 
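To make that concrete, here is a toy sketch, in pure Python and deliberately not the API of astropy or any other existing library, of what minimal unit tracking involves: values carry their dimension exponents, adding mismatched dimensions raises, and multiplication combines (and cancels) dimensions:

```python
class Quantity:
    """A value plus a dict of dimension exponents, e.g. {'m': 1, 's': -1}."""

    def __init__(self, value, units):
        self.value = value
        self.units = dict(units)

    def __add__(self, other):
        if self.units != other.units:
            raise TypeError('dimension mismatch: {} vs {}'.format(
                self.units, other.units))
        return Quantity(self.value + other.value, self.units)

    def __mul__(self, other):
        units = dict(self.units)
        for dim, power in other.units.items():
            units[dim] = units.get(dim, 0) + power
            if units[dim] == 0:
                del units[dim]      # dimensions cancel out entirely
        return Quantity(self.value * other.value, units)

# 780 kpc, as in the examples earlier in the thread:
distance = Quantity(780e3, {'pc': 1})
```

Adding `distance` to a time would raise immediately, which is exactly the class of bug such tracking is meant to catch.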
But I don't think Python should try to blaze this trail, especially not with the current proposal. In the meantime, if Ken is right about this being of interest to scientists, Sage and IPython would be the most likely places to start. -- Steve From srkunze at mail.de Mon Aug 29 15:30:03 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 29 Aug 2016 21:30:03 +0200 Subject: [Python-ideas] a multiProcess scheduler In-Reply-To: References: Message-ID: On 29.08.2016 05:32, Nick Coghlan wrote: > (Alternatively, if the answer the interviewer is looking for is "I > wouldn't, I would use...", then it may be an unfair "Gotcha!" > question, and those aren't cool either, since they expect the > interviewee to be able to read the interviewer's mind, rather than > just answering the specific question they were asked) That at least would have been my response. ;) On the other hand, I would rather discuss why one would ever implement that instead of re-using it. Sven From srkunze at mail.de Mon Aug 29 16:45:14 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 29 Aug 2016 22:45:14 +0200 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <57C3AECA.1030207@brenbarn.net> References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829032945.GD29601@kundert.designers-guide.com> <57C3AECA.1030207@brenbarn.net> Message-ID: <8d1fa479-3e4b-ea1e-fbb2-0c273763d339@mail.de> On 29.08.2016 05:40, Brendan Barnwell wrote: > On 2016-08-28 20:29, Ken Kundert wrote: >> What is wrong with have two ways of doing things? We have many ways of >> specifying the value of the integer 16: 0b10000, 0o20, 16, 0x10, 16L, >> .... > > Zen of Python: "There should be one-- and preferably only one > --obvious way to do it." > > If Python didn't have binary or octal notation and someone came > here proposing it, I would not support it, for the same reasons I > don't support your proposal. 
If someone proposed eliminating binary > or octal notation for Python 4 (or even maybe Python 3.8), I would > probably support it for the same reason. Those notations are not > useful enough to justify their existence. Hexadecimal is more > justifiable as it is far more widely used, but I would be more open to > removing hexadecimal than I would be to adding octal. > > Also, "L" as a long-integer suffix is already gone in Python 3. > And now we have '_' in numbers. So much for "preferably one way". Sven From srkunze at mail.de Mon Aug 29 16:55:28 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 29 Aug 2016 22:55:28 +0200 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829032945.GD29601@kundert.designers-guide.com> <20160829090817.GF19357@kundert.designers-guide.com> Message-ID: On 29.08.2016 11:37, Chris Angelico wrote: > That's why I keep asking you for code examples. Real-world code, taken > from important projects, that would be significantly improved by this > proposal. There were no reasonable real-world code examples taken from important projects that would be significantly improved by underscores in numbers. Still, we got them, so your argument here is void. > It has to be Python 3 compatible (unless you reckon that > this is the killer feature that will make people take the jump from > 2.7), and it has to be enough of an improvement that its authors will > be willing to drop support for <3.6 (which might be a trivial concern, > eg if the author expects to be the only person running the code).
Smith) Date: Mon, 29 Aug 2016 17:12:59 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: Message-ID: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> On 8/23/2016 8:18 AM, Nick Coghlan wrote: > On 21 August 2016 at 03:32, Eric V. Smith wrote: >> If anything, I'd make it an error to have any backslashes inside the >> brackets of an f-string for 3.6. We could always remove this restriction at >> a later date. > > +1 for this if you can find a way to do it - it eliminates the > problematic cases where the order of evaluation makes a difference, > and ensures the parts within the braces can be reliably processed as > normal Python code. I've been looking at this, and I agree it's the best thing to do, for now (and possibly forever). I'm just not convinced I can get it done before alpha 1. Assuming I can get the coding done, I think I should update PEP 498 to say there can be no backslashes inside the curly braces. That's my preferred outcome. If I can't get it done by alpha 1, then I think the options are: 1. Leave f-strings as they are now, and that's how they'll always be. 2. Leave f-strings as they are now, but mark them as provisional and warn people that the backslash restrictions will show up in an upcoming release. 3. Disallow any backslashes anywhere in f-strings for 3.6, and relax the restriction in 3.7 to make it only inside braces where the restriction is enforced. 4. Remove f-strings from 3.6, and add them in 3.7 with the "no backslash inside braces" restriction. I'm not wild about 2: people will ignore this and will write code that will break in 3.7. I'm also not wild about 3, since it's too restrictive. I'm open to suggestions. Eric. From srkunze at mail.de Mon Aug 29 17:16:22 2016 From: srkunze at mail.de (Sven R. 
Kunze) Date: Mon, 29 Aug 2016 23:16:22 +0200 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829073156.GC19357@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829073156.GC19357@kundert.designers-guide.com> Message-ID: I didn't follow the previous discussion so far, so excuse me if I repeat something somebody already mentioned. But these are intriguing points you made here. On 29.08.2016 09:31, Ken Kundert wrote: > The reason why hexadecimal and octal are in general purpose languages and real > numbers with SI scale factors are not is because languages are developed by > computer scientists and not by scientists. I keep using SPICE and Verilog as > examples of a languages that supports SI scale factors, and that is because they > are the extremely rare cases where the languages were either developed or > specified by end users and not by computer scientists. > > The reason why computer scientists tend to add hexadecimal and octal numbers to > their languages and not SI scale factors is that they use hexadecimal and octal > numbers, and as we have seen by this discussion, are rather unfamiliar with real > numbers with SI scale factors. It is easy for them to justify adding hex because > they know from personal experience that it is useful, but if you don't use > widely scaled real numbers day in and day out it is hard to understand just how > tedious exponential notation is and how useful it would be to use SI scale > factors. I didn't know that THERE ARE languages that already feature SI factors. You could be right about their development. 
I for one wouldn't have an issue with this being in Python for the following reasons: 1) I wouldn't use it as I don't have the use-cases right now 2) if I would need to read such code, it wouldn't hurt my reading experience as I am used to SI 3) there will be two classes of code here: a) code that has use for it and thus uses it quite extensively and b) code that doesn't; depending on where you work you will encounter this feature or you don't even know it exists (this is true for many features in Python which is a good thing: each domain should use what is the best tool for them) The biggest issue I have is the following: SI scale factors without SI units do not make much sense, I think (especially considering those syntax changes). So, the potential, if any, can only be illustrated in combination with them. But Python does not feature any SI units so far as those are provided by external packages. If you can resolve that, I am +1 on this proposal, but otherwise just +0. Sven PS: If I think about it this way, I might have a use-case in a small side-project. From eric at trueblade.com Mon Aug 29 17:16:16 2016 From: eric at trueblade.com (Eric V. Smith) Date: Mon, 29 Aug 2016 17:16:16 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> Message-ID: Oops, I meant beta 1 where I said alpha 1. Eric. On 8/29/2016 5:12 PM, Eric V. Smith wrote: > On 8/23/2016 8:18 AM, Nick Coghlan wrote: >> On 21 August 2016 at 03:32, Eric V. Smith wrote: >>> If anything, I'd make it an error to have any backslashes inside the >>> brackets of an f-string for 3.6. We could always remove this >>> restriction at >>> a later date.
>> >> +1 for this if you can find a way to do it - it eliminates the >> problematic cases where the order of evaluation makes a difference, >> and ensures the parts within the braces can be reliably processed as >> normal Python code. > > I've been looking at this, and I agree it's the best thing to do, for > now (and possibly forever). > > I'm just not convinced I can get it done before alpha 1. Assuming I can > get the coding done, I think I should update PEP 498 to say there can be > no backslashes inside the curly braces. That's my preferred outcome. > > If I can't get it done by alpha 1, then I think the options are: > 1. Leave f-strings as they are now, and that's how they'll always be. > 2. Leave f-strings as they are now, but mark them as provisional and > warn people that the backslash restrictions will show up in an > upcoming release. > 3. Disallow any backslashes anywhere in f-strings for 3.6, and relax the > restriction in 3.7 to make it only inside braces where the > restriction is enforced. > 4. Remove f-strings from 3.6, and add them in 3.7 with the "no backslash > inside braces" restriction. > > I'm not wild about 2: people will ignore this and will write code that > will break in 3.7. I'm also not wild about 3, since it's too restrictive. > > I'm open to suggestions. > > Eric. > From ethan at stoneleaf.us Mon Aug 29 17:26:24 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 29 Aug 2016 14:26:24 -0700 Subject: [Python-ideas] =?windows-1252?q?Let=92s_make_escaping_in_f-liter?= =?windows-1252?q?als_impossible?= In-Reply-To: References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> Message-ID: <57C4A880.8090102@stoneleaf.us> On 08/29/2016 02:16 PM, Eric V. Smith wrote: >> I've been looking at this, and I agree it's the best thing to do, for >> now (and possibly forever). >> >> I'm just not convinced I can get it done before alpha 1. Isn't the f-string feature already in place? Update the PEP, then it's a bugfix. 
;) -- ~Ethan~ From alex.rudy at gmail.com Mon Aug 29 17:33:28 2016 From: alex.rudy at gmail.com (Alex Rudy) Date: Mon, 29 Aug 2016 14:33:28 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> <20160829034520.GY26300@ando.pearwood.info> <20160829070758.GA19357@kundert.designers-guide.com> Message-ID: > On Aug 29, 2016, at 06:08, Erik Bray wrote: > > On Mon, Aug 29, 2016 at 3:05 PM, Erik Bray > wrote: >> On Mon, Aug 29, 2016 at 9:07 AM, Ken Kundert >> wrote: >>> On Mon, Aug 29, 2016 at 01:45:20PM +1000, Steven D'Aprano wrote: >>>> On Sun, Aug 28, 2016 at 08:26:38PM -0700, Brendan Barnwell wrote: >>>>> On 2016-08-28 18:44, Ken Kundert wrote: >>>>>> When working with a general purpose programming language, the above numbers >>>>>> become: >>>>>> >>>>>> 780kpc -> 7.8e+05 >>>> [...] >>>> >>>> For the record, I don't know what kpc might mean. "kilo pico speed of >>>> light"? So I looked it up using units, and it is kilo-parsecs. That >>>> demonstrates that unless your audience is intimately familiar with the >>>> domain you are working with, adding units (especially units that aren't >>>> actually used for anything) adds confusion. >>>> >>>> Python is not a specialist application targetted at a single domain. It >>>> is a general purpose programming language where you can expect a lot of >>>> cross-domain people (e.g. a system administrator asked to hack on a >>>> script in a domain they know nothing about). >>> >>> I talked to astrophysicist about your comments, and what she said was: >>> 1. She would love it if Python had built in support for real numbers with SI >>> scale factors >>> 2. I told her about my library for reading and writing numbers with SI scale >>> factors, and she was much less enthusiastic because using it would require >>> convincing the rest of the group, which would be too much effort. >>> 3. 
She was amused by the "kilo pico speed of light" comment, but she was adamant >>> that the fact that you, or some system administrator, does not understand >>> what kpc means has absolutely no affect on her desired to use SI scale >>> factors. Her comment: I did not write it for him. >>> 4. She pointed out that the software she writes and uses is intended either for >>> herself of other astrophysicists. No system administrators involved. >> >> Astropy also has a very powerful units package--originally derived >> from pyunit I think but long since diverged and grown: >> >> http://docs.astropy.org/en/stable/units/index.html >> >> It was originally developed especially for astronomy/astrophysics use >> and has some pre-defined units that many other packages don't have, as >> well as support for logarithmic units like decibel and optional (and >> customizeable) unit equivalences (e.g. frequency/wavelength or >> flux/power). >> >> That said, its power extends beyond astronomy and I heard through last >> week's EuroScipy that even some biology people have been using it. >> There's been some (informal) talk about splitting it out from Astropy >> into a stand-alone package. This is tricky since almost everything in >> Astropy has been built around it (dimensional calculations are always >> used where possible), but not impossible. >> >> One of the other big advantages of astropy.units is the Quantity class >> representing scale+dimension values. This is deeply integrated into >> Numpy so that units can be attached to Numpy arrays, and all Numpy >> ufuncs can operate on them in a dimensionally meaningful way. The >> needs for this have driven a number of recent features in Numpy. This >> is work that, unfortunately, could never be integrated into the Python >> stdlib. > > I'll also add that syntactic support for units has rarely been an > issue in Astropy. The existing algebraic rules for units work fine > with Python's existing order of operations. 
It can be *nice* to be > able to write "1m" instead of "1 * m" but ultimately it doesn't add > much for clarity (and if really desired could be handled with a > preparser--something I've considered adding for Astropy sources (via > codecs). I just want to add, as an astrophysicist who uses astropy.units: the astropy solution is pretty great, and I don't mind the library overhead. I'd much rather have astropy.units, which does dimensional analysis, as well as handling SI prefixes for 2 reasons: 1. I don't normally see or use SI prefixes without units, so bare SI prefixes are fairly worthless to me as a scientist. IF the units are going to be there, I'd much rather have a library that does a good job at dimensional analysis, and has taken my domain-specific concerns into account, for reasons fairly well covered in this thread. 2. I don't find it cumbersome at all to use something like astropy.units which provides both the prefix and units for my code on input and output. The added syntactic weight of a single import, plus multiplication, is really not that big a burden, and makes it both clear what I am trying to write, and easy for the library to maintain this meaning when I use the variable later. e.g. from astropy.units import * distance = 10 * km If that multiplication symbol is really too much to handle, then I'd rather see Python support implicit multiplication as suggested above (i.e. "10 km" is parsed as "10 * km") and domain-specific libraries can support SI prefixes and units. ~ Alex > > Best, > Erik > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Mon Aug 29 17:33:41 2016 From: eric at trueblade.com (Eric V.
Smith) Date: Mon, 29 Aug 2016 17:33:41 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: <57C4A880.8090102@stoneleaf.us> References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> <57C4A880.8090102@stoneleaf.us> Message-ID: <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> On 8/29/2016 5:26 PM, Ethan Furman wrote: > On 08/29/2016 02:16 PM, Eric V. Smith wrote: > >>> I've been looking at this, and I agree it's the best thing to do, for >>> now (and possibly forever). >>> >>> I'm just not convinced I can get it done before alpha 1. > > Isn't the f-string feature already in place? Yes. It's been in 3.6 for quite a while (maybe a year?). > Update the PEP, then it's a bugfix. ;) Heh. I guess that's true. But it's sort of a big change, so shipping beta 1 with the code not agreeing with the PEP rubs me the wrong way. Or, I could stop worrying and typing emails, and instead just get on with it! Eric. From steve.dower at python.org Mon Aug 29 17:40:10 2016 From: steve.dower at python.org (Steve Dower) Date: Mon, 29 Aug 2016 14:40:10 -0700 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> <57C4A880.8090102@stoneleaf.us> <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> Message-ID: <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> On 29Aug2016 1433, Eric V. Smith wrote: > On 8/29/2016 5:26 PM, Ethan Furman wrote: >> Update the PEP, then it's a bugfix. ;) > > Heh. I guess that's true. But it's sort of a big change, so shipping > beta 1 with the code not agreeing with the PEP rubs me the wrong way. > > Or, I could stop worrying and typing emails, and instead just get on > with it! I like this approach :) But I agree. 
Release Manager Ned has the final say, but I think this change can comfortably go in during the beta period. (I also disagree that it's a big change - nobody could agree on the 'obvious' behaviour of backslashes anyway, so chances are people would avoid them anyway, and there was strong consensus on advising people to avoid them.) Cheers, Steve From eric at trueblade.com Mon Aug 29 17:45:59 2016 From: eric at trueblade.com (Eric V. Smith) Date: Mon, 29 Aug 2016 17:45:59 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> <57C4A880.8090102@stoneleaf.us> <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> Message-ID: On 8/29/2016 5:40 PM, Steve Dower wrote: > On 29Aug2016 1433, Eric V. Smith wrote: >> On 8/29/2016 5:26 PM, Ethan Furman wrote: >>> Update the PEP, then it's a bugfix. ;) >> >> Heh. I guess that's true. But it's sort of a big change, so shipping >> beta 1 with the code not agreeing with the PEP rubs me the wrong way. >> >> Or, I could stop worrying and typing emails, and instead just get on >> with it! > > I like this approach :) > > But I agree. Release Manager Ned has the final say, but I think this > change can comfortably go in during the beta period. (I also disagree > that it's a big change - nobody could agree on the 'obvious' behaviour > of backslashes anyway, so chances are people would avoid them anyway, > and there was strong consensus on advising people to avoid them.) By "big", I meant "a lot of C code changes". And you'd be surprised by the percentage of the tests that are devoted to backslashes inside braces! Eric. 
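For readers following the exchange above, the rule being settled here can be sketched in a few lines. This is a hypothetical illustration, not code from the thread: under the 3.6-era rule, a backslash escape such as '\n' is fine in the literal text of an f-string but was rejected inside the {...} replacement fields, so the usual advice was to hoist the escape out into a name first.

```python
# Hypothetical illustration (not code from the thread) of the 3.6-era
# f-string rule under discussion: backslashes are allowed in the literal
# part, but were rejected inside {...} fields, so escapes get hoisted
# out into a variable first.
names = ["spam", "eggs"]

newline = "\n"                      # hoisted out of the braces
listing = f"items:{newline}{newline.join(names)}"

print(listing)
```

(For what it's worth, PEP 701 in Python 3.12 later lifted this restriction, but at the time of this thread the hoisting pattern above was the recommended style.)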
From python-ideas at shalmirane.com Mon Aug 29 19:24:42 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Mon, 29 Aug 2016 16:24:42 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829070758.GA19357@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> <20160829034520.GY26300@ando.pearwood.info> <20160829070758.GA19357@kundert.designers-guide.com> Message-ID: <20160829232442.GB3534@kundert.designers-guide.com> > I talked to the astrophysicist about your comments, and what she said was: > 1. She would love it if Python had built-in support for real numbers with SI > scale factors > 2. I told her about my library for reading and writing numbers with SI scale > factors, and she was much less enthusiastic because using it would require > convincing the rest of the group, which would be too much effort. > 3. She was amused by the "kilo pico speed of light" comment, but she was adamant > that the fact that you, or some system administrator, does not understand > what kpc means has absolutely no effect on her desire to use SI scale > factors. Her comment: I did not write it for him. > 4. She pointed out that the software she writes and uses is intended either > for herself or other astrophysicists. No system administrators involved. It has been pointed out to me that the above comes off as being condescending towards Steven, system administrators and language developers in general. For this I am profoundly sorry. It was not my intent. My only point was that the output of these numerical programs is often so highly specialized that only the authors and their peers understand it. Let me go further in saying that if anything I have said in this discussion has come off as critical or insulting, please know that that was not my intent.
I have tremendous respect for what you all have accomplished and I am extremely appreciative of all the feedback and help you have given me. -Ken From songofacandy at gmail.com Mon Aug 29 20:05:29 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Tue, 30 Aug 2016 09:05:29 +0900 Subject: [Python-ideas] Force UTF-8 option regardless locale Message-ID: On Tue, Aug 30, 2016 at 8:14 AM, Victor Stinner wrote: > > I proposed the idea, but I'm not sure that we can have a single option > for Linux and Windows. Moreover, I never really worked on trying to > implement "-X utf8" on Linux, because it looks like "misconfigured > systems" are less and less common nowadays. I see very few user > requests in this direction. Some people love tiny Linux images for Docker and Raspberry Pi. They don't have any locale other than C. Some ops love LANG=C or LC_ALL=C to avoid troubles and unexpected performance regressions caused by locale (e.g. the sort command is much slower on ja_JP.utf8). I want to write scripts using UTF-8 for stdio and the filesystem encoding. Sometimes people run my script in the C locale, and sometimes in a misconfigured locale, because SSH sends a LANG that the system doesn't have. So I wonder if Python should have a "Force UTF-8" option, and whether the option should be a configure option or a site-wide installation option, because: * a command line option cannot be set in a shebang * setting an environment variable may be forgotten when writing scripts like crontab entries The option may also make startup a bit faster, because it can skip setting the locale at startup. Any thoughts? How should the option be set?
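In the absence of such an option, scripts that need UTF-8 stdio regardless of locale typically rewrap the streams by hand. The following is a rough sketch of that workaround, not part of the proposal; in a real script the binary stream passed in would typically be sys.stdout.buffer (and setting PYTHONIOENCODING=utf-8 covers the standard streams too, though as noted above an environment variable is easy to forget):

```python
import io

def utf8_writer(binary_stream):
    # Wrap a binary stream as a UTF-8 text stream, ignoring whatever
    # encoding the locale would have chosen.  In a real script the
    # stream passed in would typically be sys.stdout.buffer.
    return io.TextIOWrapper(binary_stream, encoding="utf-8", errors="replace")

# Demonstrate against an in-memory buffer rather than real stdio:
buf = io.BytesIO()
out = utf8_writer(buf)
out.write("filename: \u65e5\u672c\u8a9e.txt\n")
out.flush()
```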
-- INADA Naoki From Nikolaus at rath.org Mon Aug 29 21:59:15 2016 From: Nikolaus at rath.org (Nikolaus Rath) Date: Mon, 29 Aug 2016 18:59:15 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829032945.GD29601@kundert.designers-guide.com> (Ken Kundert's message of "Sun, 28 Aug 2016 20:29:45 -0700") References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829032945.GD29601@kundert.designers-guide.com> Message-ID: <87zinv9fv0.fsf@vostro.rath.org> On Aug 28 2016, Ken Kundert wrote: > So, in summary, you are suggesting that we tell the scientific and engineering > communities that we refuse to provide native support for their preferred way of > writing numbers because: I think you're making some incorrect assumptions here. Who, exactly, do you mean with "we" and "them"? I consider myself part of the scientific community and think your proposal is a bad idea, and Google finds some Python modules from you, but no prior CPython contributions... Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F "Time flies like an arrow, fruit flies like a Banana." From steve at pearwood.info Mon Aug 29 22:07:53 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 30 Aug 2016 12:07:53 +1000 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160829232442.GB3534@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> <20160829034520.GY26300@ando.pearwood.info> <20160829070758.GA19357@kundert.designers-guide.com> <20160829232442.GB3534@kundert.designers-guide.com> Message-ID: <20160830020752.GD26300@ando.pearwood.info> On Mon, Aug 29, 2016 at 04:24:42PM -0700, Ken Kundert wrote: > > [...] > > Her comment: I did not write it for him. > > [...]
> It has been pointed out to me that the above comes off as being condescending > towards Steven, system administrators and language developers in general. For > this I am profoundly sorry. It was not my intent. My only point was that the > output of these numerical programs is often so highly specialized that only the > authors and their peers understand it. No offense taken, Ken! I completely understand what your astrophysicist friend means, and I don't expect that she should write code for me. But we have to consider code written for her, and me, and you, and system administrators, children learning their first language, Java gurus, Haskell experts, people whose only other language was BASIC in 1980, animators, scientists, web developers, and many, many other disparate groups of people. We have to do it without breaking backwards compatibility. And somehow we have to try to balance all those different needs without compromising the essential "Pythonicity" of the language. The culture of Python is very conservative. I don't know of any features in Python that haven't come from some other language. Sometimes, as with significant indentation, only a single language had the feature at the time Python copied it. Sometimes a feature is not accepted unless it is in widespread use. It's a good sign that unit tracking is (slowly) becoming a language feature, like in F#, but I think you doomed your proposal as soon as you said (paraphrasing) "no other language does this, Python should lead the way and blaze this trail". (That's actually not the case, but when the prior art is niche languages like F# and Frink and calculator languages like RPL, it was always going to be a tough sell.) Sometimes it just means that the time is not right for a new feature. The ternary if operator was resisted for many years until a compelling reason to add it was found, then it was accepted rapidly.
Maybe the time will never be right: nearly everyone agrees that there is no reason to actively avoid having multi-statement anonymous lambda functions, if only we can find the right syntax. But nobody has found the right syntax that isn't ambiguous, or goes against the style of Python, or requires changes to the parser that are unacceptable for other reasons. Personally, I think that your proposal has a lot of merit, it's just the details that I don't like. Absent an excellent reason why it MUST be a language feature, it should stay in libraries, where people are free to experiment, or in projects like IPython and Sage, which can explore their own interactive interpreters and add new features that Python the language can't. And maybe, in Python 3.7 or 3.9 or 4.9, somebody will come up with the right proposal, or the right syntax, or notice that (let's imagine) JavaScript or C++ has done it and the world didn't end, and Python will get unit tracking and/or multiplicative scaling factors as a language feature and you'll be vindicated. Personally, I hope it does. But it has to be done right, and I'm not convinced your proposal is the right way. So until then, I'm happy to stick to it being in libraries. But most importantly, thanks for caring about this! -- Steve From mertz at gnosis.cx Mon Aug 29 22:34:57 2016 From: mertz at gnosis.cx (David Mertz) Date: Mon, 29 Aug 2016 19:34:57 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829032945.GD29601@kundert.designers-guide.com> <20160829090817.GF19357@kundert.designers-guide.com> Message-ID: On Mon, Aug 29, 2016 at 1:55 PM, Sven R. Kunze wrote: > There were no reasonable real-world code examples taken from important > projects that would be significantly improved by underscores in numbers.
> I recall dozens of real world examples that came up during the discussion, and have written numerous such examples in code of my own. This is something that directly affects readability in code I write almost every day, and is a clear and obvious win. I taught examples *today* where I would have badly liked to have underscore separators, and the awkwardness was obvious to me and my students. Writing, e.g. `range(int(1e7))` feels contrived (but is usually the way I do it). Writing `range(10000000)` is nearly impossible to parse visually. In contrast, writing `range(10_000_000)` will be immediately clear and obvious. None of those things can be said of SI units as Python literals. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. From python-ideas at shalmirane.com Mon Aug 29 23:19:20 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Mon, 29 Aug 2016 20:19:20 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <87zinv9fv0.fsf@vostro.rath.org> References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829032945.GD29601@kundert.designers-guide.com> <87zinv9fv0.fsf@vostro.rath.org> Message-ID: <20160830031920.GB19296@kundert.designers-guide.com> Nikolaus, I have belatedly realized that this kind of hyperbole is counterproductive. So let me back away from that statement and instead try to understand your reasons for not liking the proposal. Do you think there is no value in being able to naturally read and write numbers with SI scale factors from Python? Or is your issue with something about my proposal?
-Ken On Mon, Aug 29, 2016 at 06:59:15PM -0700, Nikolaus Rath wrote: > On Aug 28 2016, Ken Kundert wrote: > > So, in summary, you are suggesting that we tell the scientific and engineering > > communities that we refuse to provide native support for their preferred way of > > writing numbers because: > > I think you're making some incorrect assumptions here. Who, exactly, do > you mean with "we" and "them"? > > I consider myself part of the scientific community and think your > proposal is a bad idea, and Google finds some Python modules from you, > but no prior CPython contributions... > > Best, > -Nikolaus > > -- > GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F > Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F > > "Time flies like an arrow, fruit flies like a Banana." From ncoghlan at gmail.com Mon Aug 29 23:24:50 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 30 Aug 2016 13:24:50 +1000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> <57C4A880.8090102@stoneleaf.us> <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> Message-ID: On 30 August 2016 at 07:40, Steve Dower wrote: > On 29Aug2016 1433, Eric V. Smith wrote: >> >> On 8/29/2016 5:26 PM, Ethan Furman wrote: >>> >>> Update the PEP, then it's a bugfix. ;) >> >> >> Heh. I guess that's true. But it's sort of a big change, so shipping >> beta 1 with the code not agreeing with the PEP rubs me the wrong way. >> >> Or, I could stop worrying and typing emails, and instead just get on >> with it!
> > > I like this approach :) It would be good to update the PEP to say "No backslash escapes allowed inside braces" and file a bug against 3.6 for allowing it, though :) > But I agree. Release Manager Ned has the final say, but I think this change > can comfortably go in during the beta period. (I also disagree that it's a > big change - nobody could agree on the 'obvious' behaviour of backslashes > anyway, so chances are people would avoid them anyway, and there was strong > consensus on advising people to avoid them.) +1 - the beta deadline is "no new features", rather than "no further changes to features we already added". The beta period wouldn't be very useful if we couldn't make changes to new features based on user feedback :) (e.g. PEP 492's native coroutines needed some fairly major surgery during the 3.5 beta, as the Cython and Tornado folks found that some of the design decisions we'd made in the initial object model were major barriers to interoperability with other event loops and third party coroutine implementations) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From python-ideas at shalmirane.com Mon Aug 29 23:48:55 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Mon, 29 Aug 2016 20:48:55 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> <20160829034520.GY26300@ando.pearwood.info> <20160829070758.GA19357@kundert.designers-guide.com> Message-ID: <20160830034854.GC19296@kundert.designers-guide.com> Erik, One aspect of astropy.units that differs significantly from what I am proposing is that with astropy.units a user would explicitly specify the scale factor along with the units, and that scale factor would not change even if the value became very large or very small. 
For example: >>> from astropy import units as u >>> d_andromeda = 7.8e5 * u.parsec >>> print(d_andromeda) 780000.0 pc >>> d_sun = 93e6*u.imperial.mile >>> print(d_sun.to(u.parsec)) 4.850441695494146e-06 pc >>> print(d_andromeda.to(u.kpc)) 780.0 kpc >>> print(d_sun.to(u.kpc)) 4.850441695494146e-09 kpc I can see where this can be helpful at times, but it kind of goes against the spirit of SI scale factors, where you are generally expected to 'normalize' the scale factor (use the scale factor that results in the digits presented before the decimal point falling between 1 and 999). So I would expect d_andromeda = 780 kpc d_sun = 4.8504 upc Is the normalization available in astropy.units and I just did not find it? Is there some reason not to provide the normalization? It seems to me that pre-specifying the scale factor might be preferred if one is generating data for a table and the magnitudes of the values are all known in advance to within 2-3 orders of magnitude. It also seems to me that if these assumptions were not true, then normalizing the scale factors would generally be preferred. Do you believe that? -Ken On Mon, Aug 29, 2016 at 03:05:50PM +0200, Erik Bray wrote: > Astropy also has a very powerful units package--originally derived > from pyunit I think but long since diverged and grown: > > http://docs.astropy.org/en/stable/units/index.html > > It was originally developed especially for astronomy/astrophysics use > and has some pre-defined units that many other packages don't have, as > well as support for logarithmic units like decibel and optional (and > customizeable) unit equivalences (e.g. frequency/wavelength or > flux/power). > > That said, its power extends beyond astronomy and I heard through last > week's EuroScipy that even some biology people have been using it. > There's been some (informal) talk about splitting it out from Astropy > into a stand-alone package.
This is tricky since almost everything in > Astropy has been built around it (dimensional calculations are always > used where possible), but not impossible. From ncoghlan at gmail.com Mon Aug 29 23:49:21 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 30 Aug 2016 13:49:21 +1000 Subject: [Python-ideas] Force UTF-8 option regardless locale In-Reply-To: References: Message-ID: On 30 August 2016 at 10:05, INADA Naoki wrote: > On Tue, Aug 30, 2016 at 8:14 AM, Victor Stinner > wrote: >> >> I proposed the idea, but I'm not sure that we can have a single option >> for Linux and Windows. Moreover, I never really worked on trying to >> implement "-X utf8" on Linux, because it looks like "misconfigured >> systems" are less and less common nowadays. I see very few user >> requests in this direction. > > Some people love tiny Linux images for Docker and Raspberry Pi. They > don't have any locale other than C. We run into this for CentOS images as well - the Docker images currently still default to C, as they don't have C.UTF-8 available (although you can set LANG=en_US.UTF-8 in your Dockerfile) (I think Fedora has started defaulting to C.UTF-8 now, but I haven't actually checked recently) > Some ops love LANG=C or LC_ALL=C to avoid troubles and unexpected > performance regressions caused by locale (e.g. the sort command is much > slower on ja_JP.utf8).
Broad availability of C.UTF-8 will hopefully help mitigate that behaviour, but there's still a long transition ahead on that front, as it seems unlikely "LANG=C" will ever be redefined to mean "LANG=C.UTF-8", so folks have to explicitly request "LANG=C.ASCII" to get the old US-centric behaviour :( > I want to write scripts using UTF-8 for stdio and the filesystem encoding. > Sometimes people run my script in the C locale, and sometimes in a > misconfigured > locale, because SSH sends a LANG that the system doesn't have. > > So I wonder if Python should have a "Force UTF-8" option, > and whether the option should be a configure option or a site-wide installation option, because: > > * a command line option cannot be set in a shebang > * setting an environment variable may be forgotten when writing scripts > like crontab entries > > The option may also make startup a bit faster, because it can skip setting the locale > at startup. > > Any thoughts? > How should the option be set? While I agree this is a good way to go, we unfortunately don't have a lot of precedent to work with here :( The closest we've had to date to a "CPython runtime configuration file" is the implementation-dependent cert verification config file in PEP 493: https://www.python.org/dev/peps/pep-0493/#backporting-pep-476-to-earlier-python-versions Since that was designed specifically as a migration tool for the RHEL system Python, it glosses over a lot of things we'd need to care about for a proper config file, like: - how it works when running from a local checkout - how (or if) to support parallel installations - how (or if) to support virtual environments - how (or if) to support per-user overrides - how (or if) to support environment variable overrides - how (or if) to support command line overrides - how to support Windows - whether we're defining this as a CPython-only thing, or whether we'd expect other implementations to support it as well However, a config file was desirable in the cert verification case for the same reasons you mention here: so it can be
visible system wide, without requiring changes to environment variables or command invocations. We do have a per-venv config file (pyvenv.cfg), but that's currently an implementation detail of the 'venv' module, rather than a clearly defined standard format. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Aug 30 00:20:30 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 30 Aug 2016 14:20:30 +1000 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160830034854.GC19296@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> <20160829034520.GY26300@ando.pearwood.info> <20160829070758.GA19357@kundert.designers-guide.com> <20160830034854.GC19296@kundert.designers-guide.com> Message-ID: On 30 August 2016 at 13:48, Ken Kundert wrote: > >>> d_sun = 93e6*u.imperial.mile > >>> print(d_sun.to(u.parsec)) > 4.850441695494146e-06 pc The "imperial.mile" example here highlights one key benefit that expression based approaches enjoy over dedicated syntax: easy access to Python's existing namespace features. As a quick implementation sketch, consider something like: >>> si_scaling = dict(k=1000, m=0.001) >>> class SIScale: ... def __getattr__(self, key): ... return SIUnit(si_scaling[key]) ... >>> class SIUnit: ... def __init__(self, value): ... self.value = value ... def __getattr__(self, ignored): ... return self.value ... 
>>> si = SIScale() >>> 500 * si.k.m 500000 >>> 500 * si.k.parsec 500000 >>> 500 * si.m.m 0.5 >>> 500 * si.m.N 0.5 You could also relatively easily adapt that so that there was only one level of lookup, and you could write the examples without the second dot (you'd just need to do some parsing of the key value in __getattr__ to separate the SI prefix from the nominal units) One particular benefit of this kind of approach is that you automatically avoid the "E" ambiguity problem, since there's nothing wrong with "si.E" from Python's perspective. You also gain an easy hook to attach interactive help: "help(si)" (or si? in IPython terms) Expanding out to full dimensional analysis with something like astropy.units also becomes relatively straightforward, just by changing the kind of value that __getattr__ returns. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From thalesfc at gmail.com Tue Aug 30 00:23:45 2016 From: thalesfc at gmail.com (Thales filizola costa) Date: Mon, 29 Aug 2016 21:23:45 -0700 Subject: [Python-ideas] a multiProcess scheduler In-Reply-To: References: Message-ID: > > Potentially, but one of the big challenges you'll face is to establish > how it differs from using asyncio in the current process to manage > tasks dispatched to other processes via run_in_executor, and when > specifically it would be a useful thing for a developer to have in the > builtin toolkit (vs being something they can install from PyPI). That's a good point. You are right. From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Aug 30 04:05:55 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J.
Turnbull) Date: Tue, 30 Aug 2016 17:05:55 +0900 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <8d1fa479-3e4b-ea1e-fbb2-0c273763d339@mail.de> References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829032945.GD29601@kundert.designers-guide.com> <57C3AECA.1030207@brenbarn.net> <8d1fa479-3e4b-ea1e-fbb2-0c273763d339@mail.de> Message-ID: <22469.15971.324029.825930@turnbull.sk.tsukuba.ac.jp> Sven R. Kunze writes: > And now we have '_' in numbers. > > So much for "preferably one way". Poor example. There used to be no way to group long strings of numerals for better readability. Now there is exactly one way. The Zen is not an axe. It's a scalpel. From victor.stinner at gmail.com Tue Aug 30 04:29:58 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 30 Aug 2016 10:29:58 +0200 Subject: [Python-ideas] Force UTF-8 option regardless locale In-Reply-To: References: Message-ID: On 30 August 2016 at 02:05, "INADA Naoki" wrote: > How should the option be set? I propose to add a new -X utf8 option. Maybe if the use case is important, we might add a PYTHONUTF8 environment variable. The problem is that I'm not sure that an env var is the right way to configure Python in such environments. But an env var shouldn't hurt and it is common to add a new env var with a new cmdline option. I added PYTHONFAULTHANDLER=1/-X faulthandler for faulthandler and PYTHONTRACEMALLOC=N/-X tracemalloc=N for tracemalloc. Victor
From tjreedy at udel.edu Tue Aug 30 05:31:14 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 30 Aug 2016 05:31:14 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> <57C4A880.8090102@stoneleaf.us> <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> Message-ID: On 8/29/2016 5:33 PM, Eric V. Smith wrote: > On 8/29/2016 5:26 PM, Ethan Furman wrote: >> On 08/29/2016 02:16 PM, Eric V. Smith wrote: >> >>>> I've been looking at this, and I agree it's the best thing to do, for >>>> now (and possibly forever). >>>> >>>> I'm just not convinced I can get it done before alpha 1. >> >> Isn't the f-string feature already in place? > > Yes. It's been in 3.6 for quite a while (maybe a year?). > >> Update the PEP, then it's a bugfix. ;) You can do bug fixes during the beta series. > Heh. I guess that's true. But it's sort of a big change, so shipping > beta 1 with the code not agreeing with the PEP rubs me the wrong way. > Or, I could stop worrying and typing emails, and instead just get on > with it! > > Eric.
-- Terry Jan Reedy From p.f.moore at gmail.com Tue Aug 30 05:49:44 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 30 Aug 2016 10:49:44 +0100 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160830031920.GB19296@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829032945.GD29601@kundert.designers-guide.com> <87zinv9fv0.fsf@vostro.rath.org> <20160830031920.GB19296@kundert.designers-guide.com> Message-ID: On 30 August 2016 at 04:19, Ken Kundert wrote: > Do you think there is no value in being able to naturally read and write numbers > with SI scale factors from Python? Or is your issue with something about my > proposal? Ken, Answering these questions from my perspective (and thanks for taking note of the comments and toning down your delivery, by the way) I have an issue with the way your proposal is vague about the relationship between SI scales and units - it's been mentioned a couple of times, but never adequately addressed, that scales are tightly linked to units (we routinely talk about kilometers and milligrams, but almost never about kilos or millis). There have been some strong (and justified, IMO) objections to adding units without giving them proper semantic meaning, and your response to that seems to be to talk for a while about scale factors in isolation, but then go straight back to examples using scaled units. It's hard to understand exactly what you're proposing when your examples don't match your suggestions.
If we assume you *are* simply talking about pure scales (k=1000, M=1000000 etc), then you haven't addressed the many suggestions of alternatives, with anything more substantive than "well, I (and colleagues I know) prefer scale factors", plus some examples with scaled *units* again. Your comparisons tend to show your preferred approach in the best light, while using the least attractive alternative options. And there's almost no proper discussions of pros and cons. In short, you offer almost nothing in the way of objective arguments for your proposals. You mention "reading and writing numbers with scale factors from Python". It's easy enough to do external IO with scale factors, you just read strings and parse them as you wish. A language syntax only affects internal constants - and it's not clear to me that the benefit is significant even then, as I'd expect (as a matter of good style) that any constant needing this type of syntax should be named anyway. Again, this isn't something you address. You've offered no examples of real-world code in existing public projects that would be improved by your proposal. While that's not always necessary to a successful proposal, it certainly makes it more compelling, and helps to confirm that a proposal isn't limited to "niche" areas. So to summarise, I don't think you've made objective arguments for your proposal (your *subjective* enthusiasm for the proposal has never been in doubt), or addressed many of the comments that have already been made. To be honest, I don't think there's much chance of your proposal being accepted at this point in time. As Steven noted, Python tends not to be a leader in matters like this, and so the lack of mainstream prior art is probably sufficient to kill this proposal. 
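(To illustrate the point about external IO: a few lines of ordinary Python are enough to parse SI-scaled strings. The `from_si` helper and its prefix table below are hypothetical, written just for this sketch, not an existing library:)

```python
import re

# Hypothetical helper sketching the point above: SI-scaled input is easy
# to handle with plain string parsing, no new language syntax required.
SCALE = {
    'y': 1e-24, 'z': 1e-21, 'a': 1e-18, 'f': 1e-15, 'p': 1e-12,
    'n': 1e-9,  'u': 1e-6,  'm': 1e-3,  '':  1.0,   'k': 1e3,
    'M': 1e6,   'G': 1e9,   'T': 1e12,  'P': 1e15,  'E': 1e18,
}

def from_si(text):
    """Parse a string like '50k' or '2.3u' into a float."""
    m = re.fullmatch(r'([-+]?[\d.]+(?:[eE][-+]?\d+)?)\s*([yzafpnumkMGTPE]?)',
                     text.strip())
    if m is None:
        raise ValueError('not an SI-scaled number: {!r}'.format(text))
    return float(m.group(1)) * SCALE[m.group(2)]

print(from_si('50k'))    # 50000.0
```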
But for your own benefit (and the benefit of any future proposals you may make in this or other areas - please don't feel put off by the fact that this specific proposal has met with a lot of resistance) you might want to review the thread and consider what a PEP might look like for this discussion, and how you would have incorporated and responded to the objections raised here - https://www.python.org/dev/peps/pep-0001/#what-belongs-in-a-successful-pep is a good summary of the sort of things you should be looking at. There's no need to actually complete a PEP or post it, the proposal here hasn't reached a stage where a PEP is useful, but thinking about the PEP structure might help you understand the context a bit better. I hope this helps, Paul From srkunze at mail.de Tue Aug 30 06:47:38 2016 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 30 Aug 2016 12:47:38 +0200 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829032945.GD29601@kundert.designers-guide.com> <20160829090817.GF19357@kundert.designers-guide.com> Message-ID: <4a24a707-d012-c17c-a72b-0db365067787@mail.de> On 30.08.2016 04:34, David Mertz wrote: > On Mon, Aug 29, 2016 at 1:55 PM, Sven R. Kunze > wrote: > > There were no reasonable real-world code examples taken from > important projects that would be significantly improved by > underscores in numbers. > > > I recall dozens of real-world examples that came up during the > discussion, and have written numerous such examples in code of my > own. This is something that directly affects readability in code I > write almost every day, and is a clear and obvious win. > I taught examples *today* where I would have badly liked to have > underscore separators, and that awkwardness was obvious to me and the > students. Writing, e.g. `range(int(1e7))` feels contrived (but is > usually the way I do it).
Writing `range(10000000)` is nearly > impossible to parse visually. In contrast, writing > `range(10_000_000)` will be immediately clear and obvious. > > None of those things can be said of SI units as Python literals. Huh? None of those things? I do think you exaggerate quite a lot here. If your real-world example works for underscores, it works for SI units and scales as well. I for one have no use for either, so having such a distance to the subject at hand, I don't see this as a compelling argument against/for his proposal. Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Tue Aug 30 08:13:05 2016 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 30 Aug 2016 08:13:05 -0400 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> <57C4A880.8090102@stoneleaf.us> <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> Message-ID: <39d41984-354b-4409-6a8f-4f559bfaf580@trueblade.com> On 08/30/2016 07:02 AM, Philipp A. wrote: > Very cool of you to get this going! Thanks for raising the issue. > I hope the outcome is to ban escapes within braced code parts of > f-literals for 3.6 and add them 'the right way' in 3.7: f'foo{ bar['\n'] > }baz' There's debate on whether that's the right way, and I personally think it's probably not. Personally, I'd be happy with the only change being to not allow backslashes inside braces. But that's not an argument that I'm willing to get into now, since I need to get this rolling for beta 1. > Also the name 'f-strings' is really misleading: They're composite > expressions that evaluate to strings. They can only be considered > strings if you have no braced code parts in them. So I'm also still in > favor of renaming them (e.g. to 'f-literals'). I don't have much of an opinion here.
I think there's not a lot of confusion to be had by calling them f-strings, but I think someone who works with teaching Python might have a better handle on that. Eric. From elazarg at gmail.com Tue Aug 30 08:36:22 2016 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Tue, 30 Aug 2016 12:36:22 +0000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: <39d41984-354b-4409-6a8f-4f559bfaf580@trueblade.com> References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> <57C4A880.8090102@stoneleaf.us> <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> <39d41984-354b-4409-6a8f-4f559bfaf580@trueblade.com> Message-ID: > On Tue, Aug 30, 2016 at 3:13 PM Eric V. Smith wrote: > >On 08/30/2016 07:02 AM, Philipp A. wrote: > > Also the name 'f-strings' is really misleading: They're composite > > expressions that evaluate to strings. They can only be considered > > strings if you have no braced code parts in them. So I'm also still in > > favor of renaming them (e.g. to 'f-literals'). > I don't have much of an opinion here. I think there's not a lot of > confusion to be had by calling them f-strings, but I think someone who > works with teaching Python might have a better handle on that. The problem is that "literal" is a technical term in the domain of compilers, and not a well-known or self-explanatory term. Especially for beginners. I'd suggest something like "Format expression". ~Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From flying-sheep at web.de Tue Aug 30 08:56:45 2016 From: flying-sheep at web.de (Philipp A.)
Date: Tue, 30 Aug 2016 12:56:45 +0000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> <57C4A880.8090102@stoneleaf.us> <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> <39d41984-354b-4409-6a8f-4f559bfaf580@trueblade.com> Message-ID: Eric V. Smith schrieb am Di., 30. Aug. 2016 um 14:13 Uhr: > There's debate on whether that's the right way, and I personally think it's > probably not. Personally, I'd be happy with the only change being to not > allow backslashes inside braces. But that's not an argument that I'm > willing to get into now, since I need to get this rolling for beta 1. And exactly that's why I'm very happy with the way things are going: Banning escapes is the right short-term solution, and possibly the best compromise if we can't find consensus on how escapes should behave inside of those braces. אלעזר schrieb am Di., 30. Aug. 2016 um 14:37 Uhr: > The problem is that "literal" is a technical term in the domain of > compilers, and not a well-known or self-explanatory term. Especially for > beginners. I'd suggest something like "Format expression". > That sounds fine as well! My issue is just that it's as much of a string as a call of a (string returning) function/method or an expression concatenating strings: ''.join(things) # would you call this a string? '{!r}'.format(x) # or this? it's basically the same as this 'f-string': f'{x!r}' 'start' + 'end' # or this? it's a concatenation of two strings, just like f'start{ "end" }' Best, Philipp -------------- next part -------------- An HTML attachment was scrubbed... URL: From flying-sheep at web.de Tue Aug 30 07:02:25 2016 From: flying-sheep at web.de (Philipp A.)
Date: Tue, 30 Aug 2016 11:02:25 +0000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> <57C4A880.8090102@stoneleaf.us> <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> Message-ID: Hi Eric, Very cool of you to get this going! I hope the outcome is to ban escapes within braced code parts of f-literals for 3.6 and add them 'the right way' in 3.7: f'foo{ bar['\n'] }baz' It really is how things work in every single language that I ever encountered that has template literals / string interpolation / f-literals / whatchacallit, so in order to be logical and non-surprising, we should do it that way (eventually). Also the name 'f-strings' is really misleading: They're composite expressions that evaluate to strings. They can only be considered strings if you have no braced code parts in them. So I'm also still in favor of renaming them (e.g. to 'f-literals'). Best, Philipp -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Tue Aug 30 09:43:03 2016 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 30 Aug 2016 23:43:03 +1000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> <57C4A880.8090102@stoneleaf.us> <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> <39d41984-354b-4409-6a8f-4f559bfaf580@trueblade.com> Message-ID: On Tue, Aug 30, 2016 at 10:56 PM, Philipp A. wrote: > My issue is just that it's as much of a string as a call of a (string > returning) function/method or an expression concatenating strings: > > ''.join(things) # would you call this a string? > '{!r}'.format(x) # or this?
it's basically the same as this 'f-string': > f'{x!r}' > 'start' + 'end' # or this? it's a concatenation of two strings, just like > f'start{ "end" }' Yes, an f-string is really a form of expression, not a literal. But prior art has generally been to have similar constructs referred to as "interpolated strings" or similar terms: https://en.wikipedia.org/wiki/String_interpolation Plenty of languages have some such feature, and it's usually considered a type of string. Notice the close parallels between actual string literals used as format strings ("I have %d apples" % apples) and f-strings (f"I have {apples} apples"), and how this same parallel can be seen in many other languages. Yes, it may be a join expression to the compiler, but it's a string to the programmer. ChrisA From python-ideas at shalmirane.com Mon Aug 29 18:58:49 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Mon, 29 Aug 2016 15:58:49 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> <20160829034520.GY26300@ando.pearwood.info> <20160829070758.GA19357@kundert.designers-guide.com> Message-ID: <20160829225849.GA3534@kundert.designers-guide.com> Chris, I was not able to get an astrophysics example, but I do have a reasonable one that performs a spectral analysis of the output of an analog-to-digital converter, something radio astronomers are known to do. I am including the code, but it requires a rather large data file to run, which I will not include. The code uses my 'engfmt' library from PyPI to perform conversion to SI form. In this example, there is no need for conversion from SI form.
    #!/usr/bin/env python3
    import numpy as np
    from numpy.fft import fft, fftfreq, fftshift
    import matplotlib as mpl
    mpl.use('SVG')
    from matplotlib.ticker import FuncFormatter
    import matplotlib.pyplot as pl
    from engfmt import Quantity, set_preferences

    set_preferences(spacer=' ')

    def mag(spectrum):
        return np.absolute(spectrum)

    def freq_fmt(val, pos):
        return Quantity(val, 'Hz').to_eng()

    def volt_fmt(val, pos):
        return Quantity(val, 'V').to_eng()

    freq_formatter = FuncFormatter(freq_fmt)
    volt_formatter = FuncFormatter(volt_fmt)

    data = np.fromfile('delta-sigma.smpl', sep=' ')
    time, wave = data.reshape((2, len(data)//2), order='F')
    timestep = time[1] - time[0]
    nonperiodicity = wave[-1] - wave[0]
    period = timestep * len(time)
    print('timestep = {}'.format(Quantity(timestep, 's')))
    print('nonperiodicity = {}'.format(Quantity(nonperiodicity, 'V')))
    print('timepoints = {}'.format(len(time)))
    print('freq resolution = {}'.format(Quantity(1/period, 'Hz')))

    window = np.kaiser(len(time), 11)/0.37
        # beta=11 corresponds to alpha=3.5 (beta = pi*alpha)
        # the processing gain with alpha=3.5 is 0.37
    #window = 1
    windowed = window*wave
    spectrum = 2*fftshift(fft(windowed))/len(time)
    freq = fftshift(fftfreq(len(wave), timestep))

    fig = pl.figure()
    ax = fig.add_subplot(111)
    ax.plot(freq, mag(spectrum))
    ax.set_yscale('log')
    ax.xaxis.set_major_formatter(freq_formatter)
    ax.yaxis.set_major_formatter(volt_formatter)
    pl.savefig('spectrum.svg')
    ax.set_xlim((0, 1e6))
    pl.savefig('spectrum-zoomed.svg')

When run, this program prints the following diagnostics to stdout:

    timestep = 20 ns
    nonperiodicity = 2.3 pV
    timepoints = 27994
    freq resolution = 1.7861 kHz

It also generates two SVG files. I have converted one to PNG and attached it. A few comments:

1. The data in the input file ('delta-sigma.smpl') has low dynamic range and is machine generated, and not really meant for direct human consumption. As such, it does not benefit from using SI scale factors.
But there are certainly cases where the data has both high dynamic range and is intended for people to examine it directly. In those cases it would be very helpful if NumPy were able to directly read the file. As the language exists today, I would need to read the file myself, manually convert it, and feed the result to NumPy.

2. Many of these numbers that are output do have high dynamic range and are intended to be consumed directly by humans. These benefit from using SI scale factors. For example, the 'freq resolution' can vary from Hz to MHz and 'nonperiodicity' can vary from fV to mV.

3. Extra effort was expended to make the axis labels on the graph use SI scale factors so as to make the results 'publication quality'. My hope is that if Python accepted SI literals directly, then both NumPy and Matplotlib would also be extended to accept/use these formats directly, eliminating the need for me to do the conversions and manage the axes.

-Ken On Mon, Aug 29, 2016 at 06:02:29PM +1000, Chris Angelico wrote: > On Mon, Aug 29, 2016 at 5:07 PM, Ken Kundert > wrote: > > > > I talked to an astrophysicist about your comments, and what she said was: > > 1. She would love it if Python had built-in support for real numbers with SI > > scale factors > > 2. I told her about my library for reading and writing numbers with SI scale > > factors, and she was much less enthusiastic because using it would require > > convincing the rest of the group, which would be too much effort. > > 3. She was amused by the "kilo pico speed of light" comment, but she was adamant > > that the fact that you, or some system administrator, does not understand > > what kpc means has absolutely no effect on her desire to use SI scale > > factors. Her comment: I did not write it for him. > > 4. She pointed out that the software she writes and uses is intended either for > > herself or other astrophysicists. No system administrators involved.
> > So can you share some of her code, and show how the ability to scale > unitless numbers would help it? > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- A non-text attachment was scrubbed... Name: spectrum-zoomed.svg.png Type: image/png Size: 49011 bytes Desc: not available URL: From steve at pearwood.info Tue Aug 30 09:41:10 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 30 Aug 2016 23:41:10 +1000 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160830034854.GC19296@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> <20160829034520.GY26300@ando.pearwood.info> <20160829070758.GA19357@kundert.designers-guide.com> <20160830034854.GC19296@kundert.designers-guide.com> Message-ID: <20160830134109.GE26300@ando.pearwood.info> On Mon, Aug 29, 2016 at 08:48:55PM -0700, Ken Kundert wrote: > >>> print(d_sun.to(u.kpc)) > 4.850441695494146e-09 kpc > > I can see where this can be helpful at times, but it kind of goes against the > spirit of SI scale factors, where you are generally expected to 'normalize' the > scale factor (use the scale factor that results in the digits presented before > the decimal point falling between 1 and 999). So I would expect > > d_andromeda = 780 kpc > d_sun = 4.8504 upc Let me see if I get this straight... you *explicitly* asked for the distance to the sun in kpc (kiloparsecs), but you expected a result in µpc (microparsecs)? When you ask the waiter for a short black, do you get upset that he doesn't bring you a latte with soy milk? *wink* I can see that such a normalising function would be useful, but I don't think it should be the default. (If I ask for millimetres, I want millimetres, not gigametres.)
I've written and used code like that for bytes; it makes sense to apply it to other measurement units. But only if the caller requests normalisation, never by default. I don't think there is any such general expectation that values should be normalised in that way, and certainly not that your conversion program should automatically do it for you. For example, see this list of long-lived radioactive isotopes: http://w.astro.berkeley.edu/~dperley/areopagus/isotopetable.html Values above 650,000,000,000 (650e9) years are shown in "scientific format", not "engineering format", e.g. Selenium-82 is given as 1.1 x 10^20 rather than 110 x 10^18. Likewise: http://www.nist.gov/pml/data/halflife-html.cfm displays a range of units (minutes, hours, days) with the base value ranging up to over ten thousand, e.g. Ti-44 is shown as 22154 ± 456 days. This is NIST, which makes it pretty official. I don't think there's any general expectation that values should be shown in the range 1 to 999. (Perhaps in certain specialist areas.) -- Steve From jmcs at jsantos.eu Tue Aug 30 09:56:42 2016 From: jmcs at jsantos.eu (=?UTF-8?B?Sm/Do28gU2FudG9z?=) Date: Tue, 30 Aug 2016 13:56:42 +0000 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <4a24a707-d012-c17c-a72b-0db365067787@mail.de> References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829032945.GD29601@kundert.designers-guide.com> <20160829090817.GF19357@kundert.designers-guide.com> <4a24a707-d012-c17c-a72b-0db365067787@mail.de> Message-ID: On Tue, 30 Aug 2016 at 12:48 Sven R. Kunze wrote: > On 30.08.2016 04:34, David Mertz wrote: > > On Mon, Aug 29, 2016 at 1:55 PM, Sven R. Kunze wrote: > >> There were no reasonable real-world code examples taken from important >> projects that would be significantly improved by underscores in numbers. >> > > I recall dozens of real-world examples that came up during the discussion, > and have written numerous such examples in code of my own. This is
This is > something that directly affects readability in code I write almost every > day, and is a clear and obvious win. > > I taught examples *today* where I would have badly liked to have > underscore separators, and it was obvious to me and students that > awkwardness. Writing, e.g. `range(int(1e7))` feel contrives (but usually > the way I do it). Writing `range(10000000)` is nearly impossible to parse > visually. In contrast, writing `range(10_000_000)` will be immediately > clear and obvious. > > None of those things can be said of SI units as Python literals. > > > Hu? None of those things? I do think you exaggerate quite a lot here. > > If your real-world example works for underscores, it works for SI units > and scales as well. > > There is obvious way now: G = Ghz = 1000 frequency = 1 * Ghz You can even have a non naive version Ghz that supports conversions and unit checking when doing arithmetic with it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From flying-sheep at web.de Tue Aug 30 12:19:02 2016 From: flying-sheep at web.de (Philipp A.) Date: Tue, 30 Aug 2016 16:19:02 +0000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> <57C4A880.8090102@stoneleaf.us> <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> <39d41984-354b-4409-6a8f-4f559bfaf580@trueblade.com> Message-ID: Sorry, but I'm afraid you are projecting your thinking onto others. The syntactical constructs are called ?string interpolations?, not ?interpolated strings?. I.e. they're interpolations (a certain type of action) on strings. Strings are the objects, not the subjects. Strings are data, we have code/expressions that look like strings with holes, but in reality, only the parts outside of the braces are strings. I hope I explained my semantics here adequately. 
Even if they're internally post-processed strings in the CPython code: that's an implementation detail, not a description of the way they work for Python users. Best, Philipp Chris Angelico schrieb am Di., 30. Aug. 2016, 15:43: > On Tue, Aug 30, 2016 at 10:56 PM, Philipp A. wrote: > > My issue is just that it?s as much of a string as a call of a (string > > returning) function/method or an expression concatenating strings: > > > > ''.join(things) # would you call this a string? > > '{!r}'.format(x) # or this? it?s basically the same as this ?f-string?: > > f'{x!r}' > > 'start' + 'end' # or this? it?s a concatenation of two strings, just > like > > f'start{ "end" }' > > Yes, an f-string is really a form of expression, not a literal. But > prior art has generally been to have similar constructs referred to as > "interpolated strings" or similar terms: > > https://en.wikipedia.org/wiki/String_interpolation > > Plenty of languages have some such feature, and it's usually > considered a type of string. Notice the close parallels between actual > string literals used as format strings ("I have %d apples" % apples) > and f-strings (f"I have {apples} apples"), and how this same parallel > can be seen in many other languages. Yes, it may be a join expression > to the compiler, but it's a string to the programmer. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Tue Aug 30 12:32:50 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Aug 2016 09:32:50 -0700 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> <57C4A880.8090102@stoneleaf.us> <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> <39d41984-354b-4409-6a8f-4f559bfaf580@trueblade.com> Message-ID: Please just call it f-string and move on, we've had the naming debate previously, it's no longer productive. Regarding eventually supporting f'{'x'}', that will have to be a new PEP to extend PEP 498. (I previously thought it would be an incompatibility, but since f'{' is currently invalid, it's not.) However it's a huge change conceptually and implementation-wise, and I don't blame Eric if he doesn't want to be responsible for it. So it has to be a new PEP, to be introduced in 3.7 at the earliest. -- --Guido van Rossum (python.org/~guido) From steve at pearwood.info Tue Aug 30 12:34:39 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 31 Aug 2016 02:34:39 +1000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <57C4A880.8090102@stoneleaf.us> <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> <39d41984-354b-4409-6a8f-4f559bfaf580@trueblade.com> Message-ID: <20160830163439.GI26300@ando.pearwood.info> On Tue, Aug 30, 2016 at 11:43:03PM +1000, Chris Angelico wrote: > On Tue, Aug 30, 2016 at 10:56 PM, Philipp A. wrote: > > My issue is just that it's as much of a string as a call of a (string > > returning) function/method or an expression concatenating strings: > > > > ''.join(things) # would you call this a string? > > '{!r}'.format(x) # or this?
it's basically the same as this 'f-string': > > f'{x!r}' > > 'start' + 'end' # or this? it's a concatenation of two strings, just like > > f'start{ "end" }' > > Yes, an f-string is really a form of expression, not a literal. But > prior art has generally been to have similar constructs referred to as > "interpolated strings" or similar terms: > > https://en.wikipedia.org/wiki/String_interpolation *shrug* A misleading name is misleading no matter how many people use it. If we started calling function calls: func(arg, x, y, z+1) "s-strings" for "source code strings", because you write them as source code, and changed the syntax to func"arg, x, y, z+1" we'd all recognise what a mistake it is to call this a string, string delimiters or not. The *result* of calling it may be a string, but the expression itself is a kind of function call. [nasty thought] Why don't we allow func"*args" as syntactic sugar for str(func(*args))? [/remove tongue from cheek] > Plenty of languages have some such feature, and it's usually > considered a type of string. The result is a type of string. The expression is not. > Notice the close parallels between actual > string literals used as format strings ("I have %d apples" % apples) > and f-strings (f"I have {apples} apples"), That's a misleading comparison. The *template string* is "I have %d apples" and that is just a string. The *format operator* is % and the formatting operation or expression is the entire expression. Since f-strings can contain arbitrary expressions, the analogy is not with the template string, but a function (method or operator) call: f"I have {fruit - bananas - oranges} apples" is not equivalent to "I have {} apples", but to "I have {} apples".format(fruit - bananas - oranges) which is clearly a method call that merely returns a string, not a string itself.
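The difference can be demonstrated in a couple of lines; the names (`template`, `report`, `apples`) are invented purely for the illustration, and the f-string line needs Python 3.6 or later:

```python
# A plain format string is data: it can be stored, passed around, and
# applied in different contexts.
template = "I have {} apples"

def report(n):
    return template.format(n)

# An f-string is code: it evaluates immediately, using the names in the
# enclosing scope, and only the resulting str remains.
apples = 3
message = f"I have {apples} apples"

print(report(5))   # I have 5 apples
print(message)     # I have 3 apples
```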
Consequently, there's no way to delay evaluation of such an f-string, any more than you can delay evaluation of the expression[1]: func(fruit, 2*bananas, 3/oranges) You can't generate a template, and pass it to different environments to be formatted. If you need to evaluate f"I have {apples} apples" in six different contexts, you have to repeat yourself: it is equivalent to a function call, complete with implicit arguments, not a function object, and certainly not equivalent to a template string. > and how this same parallel > can be seen in many other languages. Yes, it may be a join expression > to the compiler, but it's a string to the programmer. Only if the programmer is fooled by the mere use of string delimiters. It really isn't a string. It is executable code that returns a string. It is basically a distant cousin to eval(), in disguise. I really wish we had a better name for these things than f-strings :-( [1] Tricks with eval() excluded. -- Steve From guido at python.org Tue Aug 30 12:43:05 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Aug 2016 09:43:05 -0700 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> <57C4A880.8090102@stoneleaf.us> <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> <39d41984-354b-4409-6a8f-4f559bfaf580@trueblade.com> Message-ID: Philipp, you need to stop debating this issue *now*. You need to write a PEP that can go into Python 3.7. Further debate at the current level (a hair-width close to name-calling) is not going to sway anyone. (This actually goes for Chris too -- nothing is obviously going to change Philipp's mind, so you might as well stop debating and save all the onlookers the embarrassment.) 
-- --Guido van Rossum (python.org/~guido) From flying-sheep at web.de Tue Aug 30 12:56:38 2016 From: flying-sheep at web.de (Philipp A.) Date: Tue, 30 Aug 2016 16:56:38 +0000 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> <57C4A880.8090102@stoneleaf.us> <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> <39d41984-354b-4409-6a8f-4f559bfaf580@trueblade.com> Message-ID: Hi Guido, thanks for calling me out. Yikes, I'm terribly sorry that it came across that way! I'll write the RFC. Should I expand the existing one (this would need Chris' pending changes though) or write a new one? My goals were to sound factual and terse, not to insult anyone. And I don't see the flaws in my phrasing, so it seems I'm still sometimes bad at written communication. @everyone who perceived it as Guido did: It would be really nice if you could pinpoint the phrases and reasons that make it seem that I mean it that way. (In a private mail to me) Best, Philipp Guido van Rossum schrieb am Di., 30. Aug. 2016, 18:43: > Philipp, you need to stop debating this issue *now*. > > You need to write a PEP that can go into Python 3.7. Further debate at > the current level (a hair-width close to name-calling) is not going to > sway anyone. > > (This actually goes for Chris too -- nothing is obviously going to > change Philipp's mind, so you might as well stop debating and save all > the onlookers the embarrassment.) > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From guido at python.org Tue Aug 30 13:03:32 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Aug 2016 10:03:32 -0700 Subject: [Python-ideas] =?utf-8?q?Let=E2=80=99s_make_escaping_in_f-litera?= =?utf-8?q?ls_impossible?= In-Reply-To: References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> <57C4A880.8090102@stoneleaf.us> <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> <39d41984-354b-4409-6a8f-4f559bfaf580@trueblade.com> Message-ID: You need to write a new PEP. And it's PEP, not RFC. On Tue, Aug 30, 2016 at 9:56 AM, Philipp A. wrote: > Hi Guido, thanks for calling me out. > > Yikes, I'm terribly sorry that it came across that way! > > I'll write the RFC. Should I expand the existing one (this would need Chris' > pending changes though) or write a new one? > > My goals were to sound factual and terse, not to insult anyone. And I don't > see the flaws in my phrasing, so it seems I'm still sometimes bad at written > communication. > > @everyone who perceived it as Guido did: It would be really nice if you > could pinpoint the phrases and reasons that make it seem that I mean it that > way. (In a private mail to me) > > Best, Philipp > > > Guido van Rossum schrieb am Di., 30. Aug. 2016, 18:43: >> >> Philipp, you need to stop debating this issue *now*. >> >> You need to write a PEP that can go into Python 3.7. Further debate at >> the current level (a hair-width close to name-calling) is not going to >> sway anyone. >> >> (This actually goes for Chris too -- nothing is obviously going to >> change Philipp's mind, so you might as well stop debating and save all >> the onlookers the embarrassment.)
>> >> -- >> --Guido van Rossum (python.org/~guido) -- --Guido van Rossum (python.org/~guido) From steve at pearwood.info Tue Aug 30 13:30:27 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 31 Aug 2016 03:30:27 +1000 Subject: [Python-ideas] Let's make escaping in f-literals impossible In-Reply-To: References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> <57C4A880.8090102@stoneleaf.us> <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> Message-ID: <20160830173027.GJ26300@ando.pearwood.info> On Tue, Aug 30, 2016 at 11:02:25AM +0000, Philipp A. wrote: > Hi Eric, > > Very cool of you to get this going! > > I hope the outcome is to ban escapes within braced code parts of f-literals > for 3.6 and add them 'the right way' in 3.7: f'foo{ bar['\n'] }baz' That looks like you are doing a key lookup on bar: bar = {'\n': 'something'} f'foo{ bar['\n'] }baz' looks like it will return 'foosomethingbaz'. I expect that syntax will confuse an awful lot of people. -- Steve From guido at python.org Tue Aug 30 13:39:43 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Aug 2016 10:39:43 -0700 Subject: [Python-ideas] Let's make escaping in f-literals impossible In-Reply-To: <20160830173027.GJ26300@ando.pearwood.info> References: <30cd8ad6-d04c-10bf-0fb9-618becae8641@trueblade.com> <57C4A880.8090102@stoneleaf.us> <77cf7986-4325-8b75-4615-320ce70b517e@trueblade.com> <00a5ef2d-c17c-7fb9-5cab-1684ab9b986a@python.org> <20160830173027.GJ26300@ando.pearwood.info> Message-ID: On Tue, Aug 30, 2016 at 10:30 AM, Steven D'Aprano wrote: [...] > That looks like you are doing a key lookup on bar: > > bar = {'\n': 'something'} > f'foo{ bar['\n'] }baz' > > looks like it will return 'foosomethingbaz'. I expect that syntax will > confuse an awful lot of people. Can we please stop debating this? This observation has been made about 100 times by now. 
-- --Guido van Rossum (python.org/~guido) From python-ideas at shalmirane.com Tue Aug 30 14:48:04 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Tue, 30 Aug 2016 11:48:04 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160830134109.GE26300@ando.pearwood.info> References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> <20160829034520.GY26300@ando.pearwood.info> <20160829070758.GA19357@kundert.designers-guide.com> <20160830034854.GC19296@kundert.designers-guide.com> <20160830134109.GE26300@ando.pearwood.info> Message-ID: <20160830184804.GB2363@kundert.designers-guide.com> Steve, Actually I initially asked for the distances in parsecs and was expecting that they would be presented in a convenient format. So, to frame it in terms of your analogy, I ordered a short black and became upset when I was delivered 8oz of coffee in a 55 gallon drum. This seems to be one of those unstated assumptions that have caused confusion in these discussions. Sometimes you want to fix the prefix, sometimes you don't. For example, the Bel (B) is a unit of measure for ratios, but we never use it directly, we always use decibels (dB). Nobody uses mB or kB or even B, it is always dB. But with other units we do use the scale factors and we do tend to normalize the presentation. For example, nobody says Usain Bolt won the 100000mm dash, or the 0.1km dash. Similarly when people refer to the length of the Olympic road race in Rio, they say 56km, not 56000m. This is really only an issue with output. What I am suggesting is adding support for the second case into the stdlib. For example:

    >>> print('Attenuation = {:.1f}dB at {:r}m.'.format(-13.7, 50e3))
    Attenuation = -13.7dB at 50km. 
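The "normalize the presentation" behaviour Ken describes -- pick the scale factor that leaves the digits before the decimal point between 1 and 999 -- fits in a few lines of pure Python. A minimal sketch (the name si_format, its prefix table, and the default precision are illustrative assumptions, not anything actually proposed in this thread):

```python
from math import floor, log10

def si_format(value, precision=4):
    """Format a float with an auto-selected SI scale factor (sketch)."""
    prefixes = {-18: "a", -15: "f", -12: "p", -9: "n", -6: "u", -3: "m",
                0: "", 3: "k", 6: "M", 9: "G", 12: "T", 15: "P", 18: "E"}
    if value == 0:
        return "0"
    exp = 3 * floor(log10(abs(value)) / 3)  # engineering exponent: multiple of 3
    exp = max(-18, min(18, exp))            # clamp to the prefixes we know
    mantissa = value / 10 ** exp            # now 1 <= |mantissa| < 1000
    return "%.*g%s" % (precision, mantissa, prefixes[exp])

print(si_format(50e3) + "m")        # 50km
print(si_format(2.9979e8) + "m/s")  # 299.8Mm/s
```

A hypothetical {:r} format code would only need to call something like this; a helper of this shape is essentially what the later messages in the thread converge on as a third-party library.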
-Ken On Tue, Aug 30, 2016 at 11:41:10PM +1000, Steven D'Aprano wrote: > On Mon, Aug 29, 2016 at 08:48:55PM -0700, Ken Kundert wrote: > > > >>> print(d_sun.to(u.kpc)) > > 4.850441695494146e-09 kpc > > > > I can see where this can be helpful at times, but it kind of goes against the > > spirit of SI scale factors, where you are generally expected to 'normalize' the > > scale factor (use the scale factor that results in the digits presented before > > the decimal point falling between 1 and 999). So I would have expected > > > > d_andromeda = 780 kpc > > d_sun = 4.8504 upc > > Let me see if I get this straight... you *explicitly* asked for the > distance to the sun in kpc (kiloparsecs), but you expected a result in > µpc (microparsecs)? > > When you ask the waiter for a short black, do you get upset that > he doesn't bring you a latte with soy milk? *wink* > > I can see that such a normalising function would be useful, but I don't > think it should be the default. (If I ask for millimetres, I want > millimetres, not gigametres.) I've written and used code like that for > bytes, it makes sense to apply it to other measurement units. But only > if the caller requests normalisation, never by default. > > I don't think there is any such general expectation that values should > be normalised in that way, and certainly not that your conversion > program should automatically do it for you. For example, see this list > of long-lived radioactive isotopes: > > http://w.astro.berkeley.edu/~dperley/areopagus/isotopetable.html > > Values above 650,000,000,000 (650e9) years are shown in "scientific > format", not "engineering format", e.g. Selenium-82 is given > as 1.1 x 10^20 rather than 110 x 10^18. > > Likewise: > > http://www.nist.gov/pml/data/halflife-html.cfm > > displays a range of units (minutes, hours, days) with the base value > ranging up to over ten thousand, e.g. Ti-44 is shown as 22154 ± 456 > days. This is NIST, which makes it pretty official. 
I don't think > there's any general expectation that values should be shown in the range > 1 to 999. (Perhaps in certain specialist areas.) > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From guido at python.org Tue Aug 30 14:59:19 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Aug 2016 11:59:19 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160830184804.GB2363@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> <20160829034520.GY26300@ando.pearwood.info> <20160829070758.GA19357@kundert.designers-guide.com> <20160830034854.GC19296@kundert.designers-guide.com> <20160830134109.GE26300@ando.pearwood.info> <20160830184804.GB2363@kundert.designers-guide.com> Message-ID: On Tue, Aug 30, 2016 at 11:48 AM, Ken Kundert wrote: > [...] Similarly when people refer to the length of > the Olympic road race in Rio, they say 56km, not 56000m. However I can't help to point out that if I said the distance to the sun is 149.6 Gm, most people would do a double-take. > This is really only an issue with output. So maybe the proposal should be toned down to just a way to request SI units when formatting numbers? 
-- --Guido van Rossum (python.org/~guido) From python-ideas at shalmirane.com Tue Aug 30 15:28:27 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Tue, 30 Aug 2016 12:28:27 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> <20160829034520.GY26300@ando.pearwood.info> <20160829070758.GA19357@kundert.designers-guide.com> <20160830034854.GC19296@kundert.designers-guide.com> <20160830134109.GE26300@ando.pearwood.info> <20160830184804.GB2363@kundert.designers-guide.com> Message-ID: <20160830192827.GD2363@kundert.designers-guide.com> Guido, I am in the process of summarizing the discussion as a way of wrapping this up. As part of that I will be making a proposal that I think has a chance of being accepted, and it will largely be what you suggest. -Ken On Tue, Aug 30, 2016 at 11:59:19AM -0700, Guido van Rossum wrote: > On Tue, Aug 30, 2016 at 11:48 AM, Ken Kundert > wrote: > > [...] Similarly when people refer to the length of > > the Olympic road race in Rio, they say 56km, not 56000m. > > However I can't help to point out that if I said the distance to the > sun is 149.6 Gm, most people would do a double-take. > > > This is really only an issue with output. > > So maybe the proposal should be toned down to just a way to request SI > units when formatting numbers? > > -- > --Guido van Rossum (python.org/~guido) From mal at egenix.com Tue Aug 30 15:45:43 2016 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 30 Aug 2016 21:45:43 +0200 Subject: [Python-ideas] Force UTF-8 option regardless locale In-Reply-To: References: Message-ID: <57C5E267.7020400@egenix.com> On 30.08.2016 10:29, Victor Stinner wrote: > On 30 Aug 2016 02:05, "INADA Naoki" wrote: >> How should the option be set? > > I propose to add a new -X utf8 option. Maybe if the use case is important, > we might add a PYTHONUTF8 environment variable. 
> > The problem is that I'm not sure that an env var is the right way to > configure Python on such environment? But an env var shouldn't hurt and it > is common to add a new env var with a new cmdline option. > > I added PYTHONFAULTHANDLER=1/-X faulthandler for faulthandler and > PYTHONTRACEMALLOC=N/-X tracemalloc=N for tracemalloc. In PyRun we simply define a default for PYTHONIOENCODING and set this to utf-8: http://www.egenix.com/products/python/PyRun/doc/#_Toc452660008 The encoding guessing is still available by setting the env var to "" (but this is hardly used). So far this has been working great. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Aug 30 2016) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From python-ideas at shalmirane.com Tue Aug 30 16:34:27 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Tue, 30 Aug 2016 13:34:27 -0700 Subject: [Python-ideas] real numbers with SI scale factors: next steps Message-ID: <20160830203427.GE2363@kundert.designers-guide.com> Okay, let's try to wrap this up. In summary I proposed three things: 1. A change to the Python lexer to accept SI literal as an alternative, but not replacement to, E-notation. As an optional feature, simple units could be added to the end but would be largely ignored. 
So the following would be accepted:

    freq = 2.4GHz
    r = 1k
    l = 10nm

The idea in accepting units was to allow them to be specified when convenient as additional documentation on the meaning of the number.

Objections:
a. Acceptance of the abbreviation for Exa (E) overlaps with E-notation (1E+1 could represent 1e18 + 1 or 10). A suggestion to change the prefix from E to X conflicts with a proposal to use X, W, and V to represent 10^27, 10^30, and 10^33 (en.wikipedia.org/wiki/Metric_prefix)
b. Allowing the units to be specified will lead some users to assume a dimensional analysis is being performed when in fact the units are ignored. This false sense of security could lead to bugs.
c. The proposal only supports simple units, not compound units such as m/s. So even if hooks were provided to allow access to the units to support an add-on dimensional analysis capability, an additional mechanism would have to be provided to support compound units.
d. Many people objected to allowing the use of naked scale factors as a perversion of the standard.

2. A change to the float() function so that it accepts SI scale factors and units. This extension naturally follows from the first: the float function should accept anything the Python parser accepts. For example:

    freq = float('2.4GHz')
    r = float('1k')
    l = float('10nm')

Objections:
a. The Exa objection from the above proposal is problematic here as well.
b. Things that used to be errors are now no longer errors. This could cause problems if a program was counting on float('1k') to be an error.

3. A change to the various string formatting mechanisms to allow outputting real numbers with SI scale factors:

    >>> print('Speed of light in a vacuum: {:r}m/s.'.format(2.9979e+08))
    Speed of light in a vacuum: 299.79 Mm/s.

    >>> print('Speed of sound in water: %rm/s.' % 1481)
    Speed of sound in water: 1.481 km/s.

Objections: No objections were raised that I recall, however here is something else to consider: a. 
Should we also provide mechanism for the binary scale factors (Ki, Mi, ..., Yi)? For example: '{:b}B'.format(2**30) --> 1 GiB. On proposed extension 1 (native support for SI literals) my conclusion is that we did not reach any sense of consensus and there was considerable opposition to my proposal. There was much less discussion on extensions 2 & 3, so it is hard to say whether consensus was reached. So, given all this, I would like to make the following recommendations: 1. No action should be taken. 2. The main justification to modifying float() was to make it consistent with the extended Python language. Without extension 1, this justification goes away. However the need to be able to easily convert strings of numbers with SI scale factors into floats still exists. This should be handled by adding a library or extending an existing library. 3. Allowing numbers to be formatted with SI prefixes is useful and not controversial. The 'r' and 'b' format codes should be added to the various string formatting mechanisms. What do you think? -Ken From p.f.moore at gmail.com Tue Aug 30 17:07:17 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 30 Aug 2016 22:07:17 +0100 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: <20160830203427.GE2363@kundert.designers-guide.com> References: <20160830203427.GE2363@kundert.designers-guide.com> Message-ID: On 30 August 2016 at 21:34, Ken Kundert wrote: > So, given all this, I would like to make the following recommendations: > 1. No action should be taken. > 2. The main justification to modifying float() was to make it consistent with > the extended Python language. Without extension 1, this justification goes > away. However the need to be able to easily convert strings of numbers with > SI scale factors into floats still exists. This should be handled by adding > a library or extending an existing library. > 3. 
Allowing numbers to be formatted with SI prefixes is useful and not > controversial. The 'r' and 'b' format codes should be added to the various > string formatting mechanisms. > > What do you think? Thanks for the summary (which I mostly elided) which I think was fair. Regarding (3), the only one that remains proposed, I think it would be useful to see a 3rd-party library implementation of the formatting operation proposed. This would allow any corner cases or controversial points to be ironed out before proposing it for direct incorporation in the string formatting mini-language. Furthermore, in Python 3.6, it will be possible to write f"The value is {si_format(the_val)}" directly, using PEP 498 f-strings. The combination of a 3rd party function and f-strings may even make special formatting support unnecessary - but that will be easier to establish with practical experience. And there's little or no downside - the proposed feature won't be possible before 3.7, so we may as well use the lifetime of the 3.6 release to gain that experience. Paul From guido at python.org Tue Aug 30 17:16:28 2016 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Aug 2016 14:16:28 -0700 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: References: <20160830203427.GE2363@kundert.designers-guide.com> Message-ID: Given that something like this gets proposed from time to time, I wonder if it would make sense to actually write up (1) and (2) as a PEP that is immediately marked rejected. The PEP should make it clear *why* it is rejected. This would be a handy reference doc to have around the next time the idea comes up. -- --Guido van Rossum (python.org/~guido) From srkunze at mail.de Tue Aug 30 17:51:06 2016 From: srkunze at mail.de (Sven R. 
Kunze) Date: Tue, 30 Aug 2016 23:51:06 +0200 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: <20160830203427.GE2363@kundert.designers-guide.com> References: <20160830203427.GE2363@kundert.designers-guide.com> Message-ID: Thanks a lot for this comprehensive summary. :) Find my comments below. On 30.08.2016 22:34, Ken Kundert wrote: > Okay, let's try to wrap this up. In summary I proposed three things: > > 1. A change to the Python lexer to accept SI literal as an alternative, but not > replacement to, E-notation. As an optional feature, simple units could be > added to the end but would be largely ignored. So the following would be > accepted: > > freq = 2.4GHz > r = 1k > l = 10nm > > The idea in accepting units was to allow them to be specified when convenient > as additional documentation on the meaning of the number. > > Objections: > a. Acceptance of the abbreviation for Exa (E) overlaps with E-notation (1E+1 > could represent 1e18 + 1 or 10). A suggestion to change the prefix from > E to X conflicts with a proposal to use X, W, and V to represent 10^27, > 10^30, and 10^33 (en.wikipedia.org/wiki/Metric_prefix) I think this results from the possibility of omitting the SI units. > b. Allowing the units to be specified will lead some users to assume > a dimensional analysis is being performed when in fact the units are > ignored. This false sense of security could lead to bugs. Same can be said for variable annotations for which a PEP is in the works. > c. The proposal only supports simple units, not compound units such as m/s. > So even if hooks were provided to allow access to the units to support an > add-on dimensional analysis capability, an additional mechanism would have > to be provided to support compound units. I get the feeling that SI syntax should only work when the hook is provided. 
So this could be the dealbreaker here: only enabling it when the hook is provided changes the syntax/semantics of valid Python code depending on the presence of some hidden hooks. Enabling the syntax regardless of a working hook has the side effects you describe above. So, no matter how it is done, it always carries some downside. > d. Many people objected to allowing the use of naked scale factors as > a perversion of the standard. Remove this and it also solves 1.a. > > 2. A change to the float() function so that it accepts SI scale factors and > units. This extension naturally follows from the first: the float function > should accept anything the Python parser accepts. For example: > > freq = float('2.4GHz') > r = float('1k') > l = float('10nm') > > Objections: > a. The Exa objection from the above proposal is problematic here as well. > b. Things that used to be errors are now no longer errors. This could cause > problems if a program was counting on float('1k') to be an error. > > > 3. A change to the various string formatting mechanisms to allow outputting real > numbers with SI scale factors: > > >>> print('Speed of light in a vacuum: {:r}m/s.'.format(2.9979e+08)) > Speed of light in a vacuum: 299.79 Mm/s. > > >>> print('Speed of sound in water: %rm/s.' % 1481) > Speed of sound in water: 1.481 km/s. > > Objections: > No objections were raised that I recall, however here is something else to > consider: > > a. Should we also provide mechanism for the binary scale factors (Ki, Mi, > ..., Yi)? For example: '{:b}B'.format(2**30) --> 1 GiB. > > On proposed extension 1 (native support for SI literals) my conclusion is that > we did not reach any sense of consensus and there was considerable opposition to > my proposal. There was much less discussion on extensions 2 & 3, so it is hard > to say whether consensus was reached. > > So, given all this, I would like to make the following recommendations: > 1. No action should be taken. > 2. 
The main justification to modifying float() was to make it consistent with > the extended Python language. Without extension 1, this justification goes > away. However the need to be able to easily convert strings of numbers with > SI scale factors into floats still exists. This should be handled by adding > a library or extending an existing library. > 3. Allowing numbers to be formatted with SI prefixes is useful and not > controversial. The 'r' and 'b' format codes should be added to the various > string formatting mechanisms. > > What do you think? I like your conclusion. It seems a technical note is still missing on why this won't happen the way you proposed it (maybe the hook + missing stdlib package for SI units). :) Aren't there some packages already available for recommendation 3? Sven From barry at python.org Tue Aug 30 18:23:15 2016 From: barry at python.org (Barry Warsaw) Date: Tue, 30 Aug 2016 18:23:15 -0400 Subject: [Python-ideas] real numbers with SI scale factors: next steps References: <20160830203427.GE2363@kundert.designers-guide.com> Message-ID: <20160830182315.4ce6625f@anarchist.wooz.org> On Aug 30, 2016, at 02:16 PM, Guido van Rossum wrote: >Given that something like this gets proposed from time to time, I >wonder if it would make sense to actually write up (1) and (2) as a >PEP that is immediately marked rejected. The PEP should make it clear >*why* it is rejected. This would be a handy reference doc to have >around the next time the idea comes up. There certainly is precedent: e.g. PEPs 404 and 666. :) Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From songofacandy at gmail.com Tue Aug 30 19:34:22 2016 From: songofacandy at gmail.com (INADA Naoki) Date: Wed, 31 Aug 2016 08:34:22 +0900 Subject: [Python-ideas] Force UTF-8 option regardless locale In-Reply-To: <57C5E267.7020400@egenix.com> References: <57C5E267.7020400@egenix.com> Message-ID: On Wed, Aug 31, 2016 at 4:45 AM, M.-A. Lemburg wrote: > On 30.08.2016 10:29, Victor Stinner wrote: >> Le 30 ao?t 2016 02:05, "INADA Naoki" a ?crit : >>> How should the option be set? >> >> I propose to add a new -X utf8 option. Maybe if the use case is important, >> we might add an PYTHONUTF8 environment variable. >> >> The problem is that I'm not sure that an env var is the right way to >> configure Python on such environment? But an env var shouldn't hurt and it >> is common to add a new env var with a new cmdline option. >> >> I added PYTHONFAULTHANDLER=1/-X faulthandler for faulthandler and >> PYTHONTRACEMALLOC=N/-X tracemalloc=N for tracemalloc. > > In PyRun we simply define a default for PYTHONIOENCODING and > set this to utf-8: > > http://www.egenix.com/products/python/PyRun/doc/#_Toc452660008 > > The encoding guessing is still available by setting the env > var to "" (but this is hardly used). > > So far this has been working great. My concern is, people other than me running Python scripts on such systems (which has only C locale). Most unix commands runs well in C locale. But only Python script get many trouble. * locale error when just running Python script. (when bad LANG setting). * Unicode error happen when stdout is piped, while runs well when without pipe (when LANG=C, and no PYTHONIOENCODING set). * open() without explicit `encoding='utf-8'` runs well on Mac and LANG=*.utf8 environment. But UnicodeError happen on LANG=C environment. (Actually speaking, I and my company doesn't use UTF-8 filename. So we don't get trouble about fsencoding. 
But some other companies may.) On such system, site-wide configuration to override `nl_langinfo(CODESET)` may help people. Otherwise: 1 Face locale error when running Python script, and write LANG=C to their .bashrc. 2 Face UnicodeError when piping from Python script, and write PYTHONIOENCODING=utf-8 in their .bashrc. 3 Face UnicodeError when reading/writing from text file, and add explicit `encoding='utf-8'` (This bug may be not found on CI environment having *.UTF-8 locale, and happens in production environment) 4 Finally, people feel Python is troublesome language, and they don't want to use Python anymore. I know about `/etc/environment` file. But OPs doesn't like adding lines to it only for Python. They feel "Perl (or Ruby) is better than Python". This is why I think configuration option or site-wide configuration is desirable even if we have PYTHON(IO|FS|PREFERRED)ENCODINGS environment variables. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Experts (#1, Aug 30 2016) >>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>>> Python Database Interfaces ... http://products.egenix.com/ >>>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ > ________________________________________________________________________ > > ::: We implement business ideas - efficiently in both time and costs ::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > http://www.malemburg.com/ > -- INADA Naoki From steve at pearwood.info Tue Aug 30 22:05:52 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 31 Aug 2016 12:05:52 +1000 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: <20160830203427.GE2363@kundert.designers-guide.com> References: <20160830203427.GE2363@kundert.designers-guide.com> Message-ID: <20160831020551.GN26300@ando.pearwood.info> On Tue, Aug 30, 2016 at 01:34:27PM -0700, Ken Kundert wrote: > 3. A change to the various string formatting mechanisms to allow outputting real > numbers with SI scale factors: This is somewhat similar to a library I wrote for formatting bytes: https://pypi.python.org/pypi/byteformat Given that feature freeze for 3.6 is two weeks away, I don't think that this proposal will appear before 3.7. So I'm interested, but I'm less interested *right now*. So for now I'll limit myself to only a few observations. > >>> print('Speed of light in a vacuum: {:r}m/s.'.format(2.9979e+08)) > Speed of light in a vacuum: 299.79 Mm/s. Do you think that {:r} might be confused with {!r}? What's the mnemonic here? Why "r" for scale factor? > >>> print('Speed of sound in water: %rm/s.' % 1481) > Speed of sound in water: 1.481 km/s. I doubt that you'll get any new % string formatting codes. That's a legacy interface, *not* deprecated but unlikely to get new features added, and it is intended to closely match the C printf codes. A few more questions: (1) Why no support for choosing a particular scale? If this only auto-scales, I'm not interested. (2) Support for full prefix names, so we can format (say) "kilograms" as well as "kg"? (3) Scientific notation and engineering notation? (4) 1e5 versus 1×10^5 notation? (5) Is this really something that format() needs to understand? 
We can get a *much* richer and more powerful interface by turning it into a generalised numeric pretty-printing library, at the cost of a little less convenience. > 3. Allowing numbers to be formatted with SI prefixes is useful and not > controversial. I wouldn't quite go that far. You made an extremely controversial request (new syntax for scaling prefixes + ignored units) and nearly all the attention was on that. For what it's worth, I have no need for a format code which *only* auto-selects the scaling factor. If I don't have at least the option to choose which scaling factor I get, and hence the prefix, this is of little or no use to me, I likely wouldn't use it, and as far as I am concerned the nuisance value of having yet another format string code to learn outweighs the benefit. -- Steve From rosuav at gmail.com Tue Aug 30 22:35:24 2016 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 31 Aug 2016 12:35:24 +1000 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: <20160831020551.GN26300@ando.pearwood.info> References: <20160830203427.GE2363@kundert.designers-guide.com> <20160831020551.GN26300@ando.pearwood.info> Message-ID: On Wed, Aug 31, 2016 at 12:05 PM, Steven D'Aprano wrote: > (5) Is this really something that format() needs to understand? We can > get a *much* richer and more powerful interface by turning it into a > generalised numeric pretty-printing library, at the cost of a little less > convenience. Or just have a subclass of int or float that defines __format__, and can do whatever it likes - including specifying the scale, if you so choose. 
Say, something like:

    {:s}  -- autoscale, prefix
    {:S}  -- autoscale, full word
    {:sM} -- scale to mega, print "M"
    {:SM} -- scale to mega, print "Mega"

etc ChrisA From python-ideas at shalmirane.com Wed Aug 31 00:08:01 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Tue, 30 Aug 2016 21:08:01 -0700 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: <20160831020551.GN26300@ando.pearwood.info> References: <20160830203427.GE2363@kundert.designers-guide.com> <20160831020551.GN26300@ando.pearwood.info> Message-ID: <20160831040801.GF2363@kundert.designers-guide.com> > What's the mnemonic here? Why "r" for scale factor? My thinking was that r stands for real like f stands for float. With the base 2 scale factors, b stands for binary. > (1) Why no support for choosing a particular scale? If this only auto-scales, > I'm not interested. Auto-scaling is kind of the point. There is really little need for a special mechanism if you're going to specify the scale factor yourself.

    >>> print('Attenuation = {:.1f} dB at {:r}m.'.format(-13.7, 50e3))
    Attenuation = -13.7 dB at 50 km.

If you wanted to force the second number to be in km, you use a %f format and scale the argument:

    >>> print('Attenuation = {:.1f} dB at {:.1f} km.'.format(-13.7, 50e3/1e3))
    Attenuation = -13.7 dB at 50.0 km.

> (2) Support for full prefix names, so we can format (say) "kilograms" as well > as "kg"? This assumes that somehow this code can access the units so that it can switch between long form 'grams' and short form 'g'. That is a huge expansion in the complexity for what seems like a small benefit. > (3) Scientific notation and engineering notation? > > (4) 1e5 versus 1×10^5 notation? Ah, okay. But all of these require auto-scaling. And I was still thinking that we need to provide input and output capability (ie, we still need to be able to convert whatever format we output back from strings into floats). Are you thinking that we should parse 1×10^5? 
And why 1×10^5 and not 1×10⁵? > (5) Is this really something that format() needs to understand? We can get > a *much* richer and more powerful interface by turning it into a generalised > numeric pretty-printing library, at the cost of a little less convenience. This is suddenly a much bigger project than what I was envisioning. -Ken From p.f.moore at gmail.com Wed Aug 31 03:00:42 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 31 Aug 2016 08:00:42 +0100 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: <20160831040801.GF2363@kundert.designers-guide.com> References: <20160830203427.GE2363@kundert.designers-guide.com> <20160831020551.GN26300@ando.pearwood.info> <20160831040801.GF2363@kundert.designers-guide.com> Message-ID: On 31 August 2016 at 05:08, Ken Kundert wrote: > Auto-scaling is kind of the point. There is really little need for a special > mechanism if you're going to specify the scale factor yourself. > > >>> print('Attenuation = {:.1f} dB at {:r}m.'.format(-13.7, 50e3)) > Attenuation = -13.7 dB at 50 km. > > If you wanted to force the second number to be in km, you use a %f format and > scale the argument: > > >>> print('Attenuation = {:.1f} dB at {:.1f} km.'.format(-13.7, 50e3/1e3)) > Attenuation = -13.7 dB at 50.0 km. This argument can just as easily be used against your proposal: If you want auto-scaling you use a %s format and a suitable library function:

    >>> print('Attenuation = {:.1f} dB at {}m.'.format(-13.7, scale(50e3)))
    Attenuation = -13.7 dB at 50 km.

Anything that's going to be included in the language has to consider other requirements than just your own. > This is suddenly a much bigger project than what I was envisioning. You're going to have to write the scaling code one way or the other. Writing it in Python and publishing it as a library is *far* easier than writing it in C and hooking it into the format mechanism. 
You can leave others to offer pull requests to your library to add extra types of formatting. IMO, it's probably time to write some code. Publish a library on PyPI (call it a "prototype" if you like) implementing the scale() function above, publicise it here and elsewhere, and see what reception it gets. Paul From rosuav at gmail.com Wed Aug 31 03:07:11 2016 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 31 Aug 2016 17:07:11 +1000 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: <20160831040801.GF2363@kundert.designers-guide.com> References: <20160830203427.GE2363@kundert.designers-guide.com> <20160831020551.GN26300@ando.pearwood.info> <20160831040801.GF2363@kundert.designers-guide.com> Message-ID: On Wed, Aug 31, 2016 at 2:08 PM, Ken Kundert wrote: > > What's the mnemonic here? Why "r" for scale factor? > > My thinking was that r stands for real like f stands for float. > With the base 2 scale factors, b stands for binary. "Real" has historically often been a synonym for "float", and it doesn't really say that it'll be shown in engineering notation. But then, we currently have format codes 'e', 'f', and 'g', and I don't think there's much logic there beyond "exponential", "floating-point", and... "general format"? I think that's a back-formation, frankly, and 'g' was used simply because it comes nicely after 'e' and 'f'. (C's decision, not Python's, fwiw.) I'll stick with 'r' for now, but it could just as easily become 'h' to avoid confusion with %r for repr. >> (2) Support for full prefix names, so we can format (say) "kilograms" as well >> as "kg"? > > This assumes that somehow this code can access the units so that it can switch > between long form 'grams' and short form 'g'. That is a huge expansion in the > complexity for what seems like a small benefit. > AIUI, it's just giving the full word. 
class ScaledNumber(float):
    invert = {"µ": 1e6, "m": 1e3, "": 1, "k": 1e-3, "M": 1e-6}
    words = {"µ": "micro", "m": "milli", "": "", "k": "kilo", "M": "mega"}
    aliases = {"u": "µ"}
    def autoscale(self):
        if self < 1e-6: return None
        if self < 1e-3: return "µ"
        if self < 1: return "m"
        if self < 1e3: return ""
        if self < 1e6: return "k"
        if self < 1e9: return "M"
        return None
    def __format__(self, fmt):
        if fmt == "r" or fmt == "R":
            scale = self.autoscale()
            fmt = fmt + scale if scale else "f"
        if fmt.startswith("r"):
            scale = self.aliases.get(fmt[1], fmt[1])
            return "%g%s" % (self * self.invert[scale], scale)
        if fmt.startswith("R"):
            scale = self.aliases.get(fmt[1], fmt[1])
            return "%g %s" % (self * self.invert[scale], self.words[scale])
        return super().__format__(fmt)

>>> range = ScaledNumber(50e3)
>>> print('Attenuation = {:.1f} dB at {:r}m.'.format(-13.7, range))
Attenuation = -13.7 dB at 50km.
>>> print('Attenuation = {:.1f} dB at {:R}meters.'.format(-13.7, range))
Attenuation = -13.7 dB at 50 kilometers.
>>> print('Attenuation = {:.1f} dB at {:rM}m.'.format(-13.7, range))
Attenuation = -13.7 dB at 0.05Mm.
>>> print('Attenuation = {:.1f} dB at {:RM}meters.'.format(-13.7, range))
Attenuation = -13.7 dB at 0.05 megameters.

It's a minor flexibility, but could be very useful. As you see, it's still not at all unit-aware; but grammatically, these formats only make sense if followed by an actual unit name. (And not an SI base unit, necessarily - you have to use "gram", not "kilogram", lest you get silly constructs like "microkilogram" for milligram.) Note that this *already works*. You do have to use an explicit class for your scaled numbers, since Python doesn't want you monkey-patching the built-in float type, but if you were to request that float.__format__ grow support for this, it'd be a relatively non-intrusive change. This class could live on PyPI until one day becoming subsumed into core, or just be a permanent third-party float formatting feature. 
ChrisA From python-ideas at shalmirane.com Wed Aug 31 03:47:45 2016 From: python-ideas at shalmirane.com (Ken Kundert) Date: Wed, 31 Aug 2016 00:47:45 -0700 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: References: <20160830203427.GE2363@kundert.designers-guide.com> <20160831020551.GN26300@ando.pearwood.info> <20160831040801.GF2363@kundert.designers-guide.com> Message-ID: <20160831074745.GA13259@kundert.designers-guide.com> Thanks Chris. I had misunderstood Steve's request, and I was thinking of something much more complicated. Your code is very helpful. -Ken On Wed, Aug 31, 2016 at 05:07:11PM +1000, Chris Angelico wrote: > On Wed, Aug 31, 2016 at 2:08 PM, Ken Kundert > wrote: > > > What's the mnemonic here? Why "r" for scale factor? > > > > My thinking was that r stands for real like f stands for float. > > With the base 2 scale factors, b stands for binary. > > "Real" has historically often been a synonym for "float", and it > doesn't really say that it'll be shown in engineering notation. But > then, we currently have format codes 'e', 'f', and 'g', and I don't > think there's much logic there beyond "exponential", "floating-point", > and... "general format"? I think that's a back-formation, frankly, and > 'g' was used simply because it comes nicely after 'e' and 'f'. (C's > decision, not Python's, fwiw.) I'll stick with 'r' for now, but it > could just as easily become 'h' to avoid confusion with %r for repr. > > >> (2) Support for full prefix names, so we can format (say) "kilograms" as well > >> as "kg"? > > > > This assumes that somehow this code can access the units so that it can switch > > between long form 'grams' and short form 'g'. That is a huge expansion in the > > complexity for what seems like a small benefit. > > > > AIUI, it's just giving the full word. 
>
> class ScaledNumber(float):
>     invert = {"µ": 1e6, "m": 1e3, "": 1, "k": 1e-3, "M": 1e-6}
>     words = {"µ": "micro", "m": "milli", "": "", "k": "kilo", "M": "mega"}
>     aliases = {"u": "µ"}
>     def autoscale(self):
>         if self < 1e-6: return None
>         if self < 1e-3: return "µ"
>         if self < 1: return "m"
>         if self < 1e3: return ""
>         if self < 1e6: return "k"
>         if self < 1e9: return "M"
>         return None
>     def __format__(self, fmt):
>         if fmt == "r" or fmt == "R":
>             scale = self.autoscale()
>             fmt = fmt + scale if scale else "f"
>         if fmt.startswith("r"):
>             scale = self.aliases.get(fmt[1], fmt[1])
>             return "%g%s" % (self * self.invert[scale], scale)
>         if fmt.startswith("R"):
>             scale = self.aliases.get(fmt[1], fmt[1])
>             return "%g %s" % (self * self.invert[scale], self.words[scale])
>         return super().__format__(fmt)
>
> >>> range = ScaledNumber(50e3)
> >>> print('Attenuation = {:.1f} dB at {:r}m.'.format(-13.7, range))
> Attenuation = -13.7 dB at 50km.
> >>> print('Attenuation = {:.1f} dB at {:R}meters.'.format(-13.7, range))
> Attenuation = -13.7 dB at 50 kilometers.
> >>> print('Attenuation = {:.1f} dB at {:rM}m.'.format(-13.7, range))
> Attenuation = -13.7 dB at 0.05Mm.
> >>> print('Attenuation = {:.1f} dB at {:RM}meters.'.format(-13.7, range))
> Attenuation = -13.7 dB at 0.05 megameters.
>
> It's a minor flexibility, but could be very useful. As you see, it's
> still not at all unit-aware; but grammatically, these formats only
> make sense if followed by an actual unit name. (And not an SI base
> unit, necessarily - you have to use "gram", not "kilogram", lest you
> get silly constructs like "microkilogram" for milligram.)
>
> Note that this *already works*. You do have to use an explicit class
> for your scaled numbers, since Python doesn't want you monkey-patching
> the built-in float type, but if you were to request that
> float.__format__ grow support for this, it'd be a relatively
> non-intrusive change. 
> This class could live on PyPI until one day
> becoming subsumed into core, or just be a permanent third-party float
> formatting feature.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/ From steve at pearwood.info Wed Aug 31 08:14:06 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 31 Aug 2016 22:14:06 +1000 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: <20160831040801.GF2363@kundert.designers-guide.com> References: <20160830203427.GE2363@kundert.designers-guide.com> <20160831020551.GN26300@ando.pearwood.info> <20160831040801.GF2363@kundert.designers-guide.com> Message-ID: <20160831121406.GR26300@ando.pearwood.info> On Tue, Aug 30, 2016 at 09:08:01PM -0700, Ken Kundert wrote: > > What's the mnemonic here? Why "r" for scale factor? > > My thinking was that r stands for real like f stands for float. Hmmm. Do you know many mathematicians who use SI prefixes when talking about real numbers? I don't think "real number" is relevant to SI prefixes. > With the base 2 scale factors, b stands for binary. Well, obviously :-) > > (1) Why no support for choosing a particular scale? If this only auto-scales, > > I'm not interested. > > Auto-scaling is kind of the point. There is really little need for a special > mechanism if you're going to specify the scale factor yourself. The point is not to have to repeat yourself. If I have to scale numbers in lots of places, I don't want to have to re-write the same code in each of them. I want to call a function. Understand that I'm not against auto-scaling. I think it is a good idea. But I strongly disagree that it is the *only* way to do this. 
If there's code in the std lib to format numbers to some scale, I should be able to loop through a bunch of numbers and format them all in a consistent unit if I so choose, without having to do my own formatting. It's not that I don't want you to be able to auto-scale. I just want the choice of being able to use a consistent scale or not. [...] > If you wanted to force the second number to be in km, you use a %f format and > scale the argument: > > >>> print('Attenuation = {:.1f} dB at {:.1f} km.'.format(-13.7, 50e3/1e3)) > Attenuation = -13.7 dB at 50 km. *shrug* Well, you could do exactly the same thing. You only need a short function that determines the scale you want, and then scale it yourself. The point of making this a standard function is so that we don't have to keep re-writing the same code. > > (2) Support for full prefix names, so we can format (say) "kilograms" as well > > as "kg"? > > This assumes that somehow this code can access the units so that it can switch > between long form 'grams' and short form 'g'. That is a huge expansion in the > complexity for what seems like a small benefit. No, I'm talking about choosing between "M" or "mega". The actual unit itself is up to the caller to supply. You have definitely prodded my interest in the output side of this. I'm rather busy at the moment, but in the coming weeks I think I'll brush the cobwebs off byteformat and see what can be done. https://pypi.python.org/pypi/byteformat in case you want to have a play with it. 
-- Steve From ncoghlan at gmail.com Wed Aug 31 08:21:57 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 31 Aug 2016 22:21:57 +1000 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: References: <20160830203427.GE2363@kundert.designers-guide.com> <20160831020551.GN26300@ando.pearwood.info> <20160831040801.GF2363@kundert.designers-guide.com> Message-ID: On 31 August 2016 at 17:07, Chris Angelico wrote: > On Wed, Aug 31, 2016 at 2:08 PM, Ken Kundert > wrote: >> > What's the mnemonic here? Why "r" for scale factor? >> >> My thinking was that r stands for real like f stands for float. >> With the base 2 scale factors, b stands for binary. > > "Real" has historically often been a synonym for "float", and it > doesn't really say that it'll be shown in engineering notation. But > then, we currently have format codes 'e', 'f', and 'g', and I don't > think there's much logic there beyond "exponential", "floating-point", > and... "general format"? I think that's a back-formation, frankly, and > 'g' was used simply because it comes nicely after 'e' and 'f'. (C's > decision, not Python's, fwiw.) I'll stick with 'r' for now, but it > could just as easily become 'h' to avoid confusion with %r for repr. "h" would be a decent choice - it's not only a continuation of the e/f/g pattern, it's also very commonly used as a command line flag for "human-readable output" in system utilities that print numbers. The existing "alternate form" marker in string formatting could be used to request the use of the base 2 scaling prefixes rather than the base 10 ones: "#h". Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From erik.m.bray at gmail.com Wed Aug 31 11:49:20 2016 From: erik.m.bray at gmail.com (Erik Bray) Date: Wed, 31 Aug 2016 17:49:20 +0200 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160830034854.GC19296@kundert.designers-guide.com> References: <20160829014404.GB29601@kundert.designers-guide.com> <57C3AB6E.1030107@brenbarn.net> <20160829034520.GY26300@ando.pearwood.info> <20160829070758.GA19357@kundert.designers-guide.com> <20160830034854.GC19296@kundert.designers-guide.com> Message-ID: On Tue, Aug 30, 2016 at 5:48 AM, Ken Kundert wrote: > Erik, > One aspect of astropy.units that differs significantly from what I am > proposing is that with astropy.units a user would explicitly specify the scale > factor along with the units, and that scale factor would not change even if the > value became very large or very small. For example: > > >>> from astropy import units as u > >>> d_andromeda = 7.8e5 * u.parsec > >>> print(d_andromeda) > 780000.0 pc > > >>> d_sun = 93e6*u.imperial.mile > >>> print(d_sun.to(u.parsec)) > 4.850441695494146e-06 pc > > >>> print(d_andromeda.to(u.kpc)) > 780.0 kpc > > >>> print(d_sun.to(u.kpc)) > 4.850441695494146e-09 kpc > > I can see where this can be helpful at times, but it kind of goes against the > spirit of SI scale factors, where you are generally expected to 'normalize' the > scale factor (use the scale factor that results in the digits presented before > the decimal point falling between 1 and 999). So I would expect > > d_andromeda = 780 kpc > d_sun = 4.8504 upc > > Is the normalization available in astropy.units and I just did not find it? > Is there some reason not to provide the normalization? > > It seems to me that pre-specifying the scale factor might be preferred if one is > generating data for a table and all the magnitudes of the values are known in > advance to within 2-3 orders of magnitude. 
> > It also seems to me that if these assumptions were not true, then normalizing > the scale factors would generally be preferred. > > Do you believe that? Hi Ken, I see what you're getting at, and that's a good idea. There's also nothing in the current implementation preventing it, and I think I'll even suggest this to Astropy (with proper attribution)! I think there are reasons not to always do this, but it's a nice option to have. Point being nothing about this particular feature requires special support from the language, unless I'm missing something obvious. And given that Astropy (or any other units library) is third-party chances are a feature like this will land in place a lot faster than it has any chance of showing up in Python :) Best, Erik > On Mon, Aug 29, 2016 at 03:05:50PM +0200, Erik Bray wrote: >> Astropy also has a very powerful units package--originally derived >> from pyunit I think but long since diverged and grown: >> >> http://docs.astropy.org/en/stable/units/index.html >> >> It was originally developed especially for astronomy/astrophysics use >> and has some pre-defined units that many other packages don't have, as >> well as support for logarithmic units like decibel and optional (and >> customizeable) unit equivalences (e.g. frequency/wavelength or >> flux/power). >> >> That said, its power extends beyond astronomy and I heard through last >> week's EuroScipy that even some biology people have been using it. >> There's been some (informal) talk about splitting it out from Astropy >> into a stand-alone package. This is tricky since almost everything in >> Astropy has been built around it (dimensional calculations are always >> used where possible), but not impossible. >> >> One of the other big advantages of astropy.units is the Quantity class >> representing scale+dimension values. 
This is deeply integrated into >> Numpy so that units can be attached to Numpy arrays, and all Numpy >> ufuncs can operate on them in a dimensionally meaningful way. The >> needs for this have driven a number of recent features in Numpy. This >> is work that, unfortunately, could never be integrated into the Python >> stdlib. From guido at python.org Wed Aug 31 12:19:22 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 31 Aug 2016 09:19:22 -0700 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: References: <20160830203427.GE2363@kundert.designers-guide.com> <20160831020551.GN26300@ando.pearwood.info> <20160831040801.GF2363@kundert.designers-guide.com> Message-ID: On Wed, Aug 31, 2016 at 5:21 AM, Nick Coghlan wrote: > On 31 August 2016 at 17:07, Chris Angelico wrote: >> On Wed, Aug 31, 2016 at 2:08 PM, Ken Kundert >> wrote: >>> > What's the mnemonic here? Why "r" for scale factor? >>> >>> My thinking was that r stands for real like f stands for float. >>> With the base 2 scale factors, b stands for binary. >> >> "Real" has historically often been a synonym for "float", and it >> doesn't really say that it'll be shown in engineering notation. But >> then, we currently have format codes 'e', 'f', and 'g', and I don't >> think there's much logic there beyond "exponential", "floating-point", >> and... "general format"? I think that's a back-formation, frankly, and >> 'g' was used simply because it comes nicely after 'e' and 'f'. (C's >> decision, not Python's, fwiw.) I'll stick with 'r' for now, but it >> could just as easily become 'h' to avoid confusion with %r for repr. > > "h" would be a decent choice - it's not only a continuation of the > e/f/g pattern, it's also very commonly used as a command line flag for > "human-readable output" in system utilities that print numbers. I like it. So after all the drama we're just talking about adding an 'h' format code that's like 'g' but uses SI scale factors instead of exponents. 
I guess we need to debate what it should do if the value is way out of range of the SI scale system -- what's it going to do when I pass it 1e50? I propose that it should fall back to 'g' style then, but use "engineering" style where exponents are always a multiple of 3.) > The existing "alternate form" marker in string formatting could be > used to request the use of the base 2 scaling prefixes rather than the > base 10 ones: "#h". Not sure about this one. -- --Guido van Rossum (python.org/~guido) From python at mrabarnett.plus.com Wed Aug 31 13:05:45 2016 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 31 Aug 2016 18:05:45 +0100 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: <20160831040801.GF2363@kundert.designers-guide.com> References: <20160830203427.GE2363@kundert.designers-guide.com> <20160831020551.GN26300@ando.pearwood.info> <20160831040801.GF2363@kundert.designers-guide.com> Message-ID: <9fd36c9f-8ea6-26b1-ff59-eb85b6e69f74@mrabarnett.plus.com> On 2016-08-31 05:08, Ken Kundert wrote: >> What's the mnemonic here? Why "r" for scale factor? > > My thinking was that r stands for real like f stands for float. > With the base 2 scale factors, b stands for binary. > 'b' already means binary: >>> '{:b}'.format(100) '1100100' From python at mrabarnett.plus.com Wed Aug 31 13:07:14 2016 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 31 Aug 2016 18:07:14 +0100 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: References: <20160830203427.GE2363@kundert.designers-guide.com> <20160831020551.GN26300@ando.pearwood.info> <20160831040801.GF2363@kundert.designers-guide.com> Message-ID: <233afe30-c9b1-5f89-8839-fd73fb48023a@mrabarnett.plus.com> On 2016-08-31 17:19, Guido van Rossum wrote: > On Wed, Aug 31, 2016 at 5:21 AM, Nick Coghlan wrote: >> On 31 August 2016 at 17:07, Chris Angelico wrote: >>> On Wed, Aug 31, 2016 at 2:08 PM, Ken Kundert >>> wrote: >>>> > What's the mnemonic here? 
Why "r" for scale factor? >>>> >>>> My thinking was that r stands for real like f stands for float. >>>> With the base 2 scale factors, b stands for binary. >>> >>> "Real" has historically often been a synonym for "float", and it >>> doesn't really say that it'll be shown in engineering notation. But >>> then, we currently have format codes 'e', 'f', and 'g', and I don't >>> think there's much logic there beyond "exponential", "floating-point", >>> and... "general format"? I think that's a back-formation, frankly, and >>> 'g' was used simply because it comes nicely after 'e' and 'f'. (C's >>> decision, not Python's, fwiw.) I'll stick with 'r' for now, but it >>> could just as easily become 'h' to avoid confusion with %r for repr. >> >> "h" would be a decent choice - it's not only a continuation of the >> e/f/g pattern, it's also very commonly used as a command line flag for >> "human-readable output" in system utilities that print numbers. > > I like it. So after all the drama we're just talking about adding an > 'h' format code that's like 'g' but uses SI scale factors instead of > exponents. I guess we need to debate what it should do if the value is > way out of range of the SI scale system -- what's it going to do when > I pass it 1e50? I propose that it should fall back to 'g' style then, > but use "engineering" style where exponents are always a multiple of > 3.) > >> The existing "alternate form" marker in string formatting could be >> used to request the use of the base 2 scaling prefixes rather than the >> base 10 ones: "#h". > > Not sure about this one. > Does the 'type' have to be a single character? If not, how about 'hb' for binary scaling? 
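To make the 'h' / 'hb' idea concrete, here is a rough sketch. The helper name format_h and the IEC spellings (Ki, Mi, ...) are assumptions for illustration, not part of any proposal in this thread; binary mode only scales upward, since there are no fractional binary prefixes.

```python
_DEC = ["", "k", "M", "G", "T", "P", "E"]
_BIN = ["", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei"]

def format_h(value, binary=False, sig=4):
    # Auto-scale with SI prefixes (base 1000), or with IEC prefixes
    # (base 1024) in binary mode.
    base = 1024.0 if binary else 1000.0
    prefixes = _BIN if binary else _DEC
    v = float(value)
    i = 0
    while abs(v) >= base and i < len(prefixes) - 1:
        v /= base
        i += 1
    return ("%.*g %s" % (sig, v, prefixes[i])).rstrip()

print(format_h(50e3))                    # 50 k
print(format_h(3 * 2**20, binary=True))  # 3 Mi
```

Whether the binary variant is then spelled 'hb' or '#h' is purely a question of where it hangs off the format spec.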
From Nikolaus at rath.org Wed Aug 31 13:31:22 2016 From: Nikolaus at rath.org (Nikolaus Rath) Date: Wed, 31 Aug 2016 10:31:22 -0700 Subject: [Python-ideas] real numbers with SI scale factors In-Reply-To: <20160830031920.GB19296@kundert.designers-guide.com> (Ken Kundert's message of "Mon, 29 Aug 2016 20:19:20 -0700") References: <20160829014404.GB29601@kundert.designers-guide.com> <20160829032945.GD29601@kundert.designers-guide.com> <87zinv9fv0.fsf@vostro.rath.org> <20160830031920.GB19296@kundert.designers-guide.com> Message-ID: <87r394onf9.fsf@thinkpad.rath.org> On Aug 29 2016, Ken Kundert wrote: > Nikolaus, > I have belatedly realized that this kind of hyperbole is counter productive. > So let me back away from that statement and instead try to understand your > reasons for not liking the proposal. > > Do you think there is no value to be able to naturally read and write numbers > with SI scale factors from Python? Or is your issue with something about my > proposal?

* I think there is no value gained by being able to write 32.3M instead of 32.3e6. I think the second one is clear to everyone who uses SI prefixes, while the first one just introduces a lot of complexities. Most of them have been mentioned already:

- no deducible ordering if one doesn't know the prefixes
- potential for ambiguity with Exa
- question about base 2 vs base 10, e.g. what do you expect to be stored in *size* if you read this: "size = 10M # we need that many bytes"
- rather arbitrary naming ("M" and "m" vs "T" and "p").

* I think having SI *Unit* support "32.3 kN" would be nice, but only if there is a space between number and unit, and only if the unit actually gets attached to the number. But your proposal would result in 1km + 1µN == 2 being true.

Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F “Time flies like an arrow, fruit flies like a Banana.” 
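Nikolaus's input-side objections are easy to make concrete with a parsing sketch (parse_si and its prefix table are purely illustrative, not a proposed API): 'E' can never safely double as an exa prefix because '1E3' already parses as exponent notation, and a parser has to simply pick base 10 for 'M'; it cannot guess that "size = 10M" meant bytes in base 2.

```python
import re

# Deliberately no "E" (exa): "1E3" is already exponent notation.
_PREFIX = {"Y": 1e24, "Z": 1e21, "P": 1e15, "T": 1e12, "G": 1e9,
           "M": 1e6, "k": 1e3, "m": 1e-3, "u": 1e-6, "n": 1e-9,
           "p": 1e-12, "f": 1e-15, "a": 1e-18, "z": 1e-21, "y": 1e-24}

def parse_si(text):
    # Accept an ordinary float literal optionally followed by one
    # SI prefix letter, e.g. "50k", "4.7u", "32.3M", "1E3".
    m = re.fullmatch(r"([-+]?[0-9.]+(?:[eE][-+]?\d+)?)\s*([YZPTGMkmunpfazy]?)",
                     text.strip())
    if not m:
        raise ValueError("not an SI-scaled number: %r" % text)
    num, prefix = m.groups()
    return float(num) * _PREFIX.get(prefix, 1.0)

print(parse_si("50k"))  # 50000.0
print(parse_si("1E3"))  # 1000.0 (exponent, not exa)
print(parse_si("10M"))  # 10000000.0 (base 10, not 10*2**20)
```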
From random832 at fastmail.com Wed Aug 31 13:43:00 2016 From: random832 at fastmail.com (Random832) Date: Wed, 31 Aug 2016 13:43:00 -0400 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: References: <20160830203427.GE2363@kundert.designers-guide.com> <20160831020551.GN26300@ando.pearwood.info> <20160831040801.GF2363@kundert.designers-guide.com> Message-ID: <1472665380.41376.711754849.48190D2E@webmail.messagingengine.com> On Wed, Aug 31, 2016, at 12:19, Guido van Rossum wrote: > On Wed, Aug 31, 2016 at 5:21 AM, Nick Coghlan wrote: > > "h" would be a decent choice - it's not only a continuation of the > > e/f/g pattern, it's also very commonly used as a command line flag for > > "human-readable output" in system utilities that print numbers. > > I like it. So after all the drama we're just talking about adding an > 'h' format code that's like 'g' but uses SI scale factors instead of > exponents. I guess we need to debate what it should do if the value is > way out of range of the SI scale system -- what's it going to do when > I pass it 1e50? I propose that it should fall back to 'g' style then, > but use "engineering" style where exponents are always a multiple of > 3.) One thing to consider is that this is very likely to be used with a unit (e.g. "%hA" intending to display in amperes), so maybe it should put a space after it? Though really people are probably going to want "1 A" vs "1 kA" in that case, rather than "1 A" vs "1kA". Also, maybe consider that "1*10^50" [or, slightly less so, 1.0*10**50] is more human-readable than "1e+50". Er, with engineering style it'd be 100e+48 etc, but same basic issue. Also, is it really necessary to use single-character codes not shared with any other language? The only rationale here seems to be a desire to support everything in % and its limited grammar rather than requiring anyone to use format. 
If this feature is only supported in format a more verbose description of the desired format could be used. What if, for example, you want engineering style without SI scale factors? What should the "precision" field mean? %f takes a number of places after the decimal point whereas %e/%g takes a number of significant digits. Engineering or SI-scale-factor format suggests a third possibility: number of decimal places to be shown after the displayed decimal point, e.g. "%.1h" % 1.2345 * 10 ** x for x in range(10): "1.2", "12.3", "123.5", "1.2k", "12.3k", "123.5k", "1.2M", "12.3M", "123.5M". And the actual -h behavior of those system utilities you mentioned is "123k", "1.2M", "12M", with the effect being that the value always fits within a four-character field width, but this isn't a fixed number of decimal places *or* significant digits. > > The existing "alternate form" marker in string formatting could be > > used to request the use of the base 2 scaling prefixes rather than the > > base 10 ones: "#h". If base 2 scaling prefixes are used, should "engineering style" mean 2**[multiple of 10] instead of 10**[multiple of 3]? > Not sure about this one. 
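For comparison, the coreutils rounding behaviour described above can be reproduced in a few lines (the name human_1024 is invented for this sketch): at most one decimal place below 10, whole numbers above, always rounded up, so 1025 bytes shows as "1.1K" and the widest result is the five-character "1023K" case.

```python
import math

def human_1024(nbytes):
    # ls -h style: base-1024 suffixes, always rounding *up*.
    if nbytes < 1024:
        return str(nbytes)
    v = float(nbytes)
    for suffix in "KMGTPE":
        v /= 1024.0
        if v < 1024 or suffix == "E":
            if v < 10:
                return "%.1f%s" % (math.ceil(v * 10) / 10, suffix)
            return "%d%s" % (math.ceil(v), suffix)

print(human_1024(1025))     # 1.1K
print(human_1024(1047552))  # 1023K
print(human_1024(1300000))  # 1.3M
```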
From random832 at fastmail.com Wed Aug 31 13:50:44 2016 From: random832 at fastmail.com (Random832) Date: Wed, 31 Aug 2016 13:50:44 -0400 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: <1472665380.41376.711754849.48190D2E@webmail.messagingengine.com> References: <20160830203427.GE2363@kundert.designers-guide.com> <20160831020551.GN26300@ando.pearwood.info> <20160831040801.GF2363@kundert.designers-guide.com> <1472665380.41376.711754849.48190D2E@webmail.messagingengine.com> Message-ID: <1472665844.45040.711818297.6A0BCBBB@webmail.messagingengine.com> On Wed, Aug 31, 2016, at 13:43, Random832 wrote: > And the actual -h behavior of those system utilities you mentioned is > "123k", "1.2M", "12M", with the effect being that the value always fits > within a four-character field width, but this isn't a fixed number of > decimal places *or* significant digits. I just did some testing... it can go to five characters when binary prefixes are used for e.g. "1023K". Also, interesting quirk - it always rounds up. 1025 bytes is "1.1K", and in SI mode, 1001 bytes is "1.1k" From eric at trueblade.com Wed Aug 31 13:48:37 2016 From: eric at trueblade.com (Eric V. Smith) Date: Wed, 31 Aug 2016 13:48:37 -0400 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: <233afe30-c9b1-5f89-8839-fd73fb48023a@mrabarnett.plus.com> References: <20160830203427.GE2363@kundert.designers-guide.com> <20160831020551.GN26300@ando.pearwood.info> <20160831040801.GF2363@kundert.designers-guide.com> <233afe30-c9b1-5f89-8839-fd73fb48023a@mrabarnett.plus.com> Message-ID: On 08/31/2016 01:07 PM, MRAB wrote: > On 2016-08-31 17:19, Guido van Rossum wrote: >> On Wed, Aug 31, 2016 at 5:21 AM, Nick Coghlan wrote: >>> "h" would be a decent choice - it's not only a continuation of the >>> e/f/g pattern, it's also very commonly used as a command line flag for >>> "human-readable output" in system utilities that print numbers. >> >> I like it. 
So after all the drama we're just talking about adding an >> 'h' format code that's like 'g' but uses SI scale factors instead of >> exponents. I guess we need to debate what it should do if the value is >> way out of range of the SI scale system -- what's it going to do when >> I pass it 1e50? I propose that it should fall back to 'g' style then, >> but use "engineering" style where exponents are always a multiple of >> 3.) Would you also want h to work with integers? >>> The existing "alternate form" marker in string formatting could be >>> used to request the use of the base 2 scaling prefixes rather than the >>> base 10 ones: "#h". >> >> Not sure about this one. >> '#' already has a meaning for float's 'g' format: >>> format(1.0, 'g') '1' >>> format(1.0, '#g') '1.00000' So I think you'd want to pick another type character to mean base 2 scaling, or another character other than #. But it gets cryptic pretty quickly. You could indeed use type == 'b' for floats to mean base 2 scaling, since it has no current meaning, but I'm not sure that's a great idea because 'b' means binary for integers, and if you want to also be able to scale ints (see above), then there's a conflict. Maybe type == 'z'? Or, use something like '@' (or whatever) instead of '#' to mean "the other alternate form", base 2 scaling. > Does the 'type' have to be a single character? As a practical matter, yes, it should just be a single character. You could make a special case for 'h' and 'hb', but I would not recommend that. Explaining it in the documentation would be confusing. Eric. 
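The fallback Guido proposes for out-of-range values, 'g'-style output with the exponent forced to a multiple of 3, is likewise only a few lines (eng is an invented name for the sketch):

```python
from math import floor, log10

def eng(value, sig=4):
    # Engineering notation: the exponent is always a multiple of 3,
    # so the mantissa lands in [1, 1000).
    if value == 0:
        return "0"
    exp = 3 * floor(log10(abs(value)) / 3)
    return "%.*ge%+d" % (sig, value / 10 ** exp, exp)

print(eng(1e50))    # 100e+48
print(eng(32e7))    # 320e+6
print(eng(4.7e-5))  # 47e-6
```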
From Nikolaus at rath.org Wed Aug 31 14:07:08 2016 From: Nikolaus at rath.org (Nikolaus Rath) Date: Wed, 31 Aug 2016 11:07:08 -0700 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: (Guido van Rossum's message of "Wed, 31 Aug 2016 09:19:22 -0700") References: <20160830203427.GE2363@kundert.designers-guide.com> <20160831020551.GN26300@ando.pearwood.info> <20160831040801.GF2363@kundert.designers-guide.com> Message-ID: <87h9a0olrn.fsf@thinkpad.rath.org> On Aug 31 2016, Guido van Rossum wrote: > On Wed, Aug 31, 2016 at 5:21 AM, Nick Coghlan wrote: >> On 31 August 2016 at 17:07, Chris Angelico wrote: >>> On Wed, Aug 31, 2016 at 2:08 PM, Ken Kundert >>> wrote: >>>> > What's the mnemonic here? Why "r" for scale factor? >>>> >>>> My thinking was that r stands for real like f stands for float. >>>> With the base 2 scale factors, b stands for binary. >>> >>> "Real" has historically often been a synonym for "float", and it >>> doesn't really say that it'll be shown in engineering notation. But >>> then, we currently have format codes 'e', 'f', and 'g', and I don't >>> think there's much logic there beyond "exponential", "floating-point", >>> and... "general format"? I think that's a back-formation, frankly, and >>> 'g' was used simply because it comes nicely after 'e' and 'f'. (C's >>> decision, not Python's, fwiw.) I'll stick with 'r' for now, but it >>> could just as easily become 'h' to avoid confusion with %r for repr. >> >> "h" would be a decent choice - it's not only a continuation of the >> e/f/g pattern, it's also very commonly used as a command line flag for >> "human-readable output" in system utilities that print numbers. > > I like it. So after all the drama we're just talking about adding an > 'h' format code that's like 'g' but uses SI scale factors instead of > exponents. I guess we need to debate what it should do if the value is > way out of range of the SI scale system -- what's it going to do when > I pass it 1e50? 
I propose that it should fall back to 'g' style then, > but use "engineering" style where exponents are always a multiple of > 3.) There's also the important nitpick of whether 32e7 is best rendered as 320 M or 0.32 G. There are valid applications for both. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F "Time flies like an arrow, fruit flies like a banana." From shane at hathawaymix.org Wed Aug 31 17:46:13 2016 From: shane at hathawaymix.org (Shane Hathaway) Date: Wed, 31 Aug 2016 15:46:13 -0600 Subject: [Python-ideas] Extending expressions using ellipsis Message-ID: <49797ff7-5066-df52-c962-1148ddd4f2e4@hathawaymix.org> Hi, I write a lot of SQLAlchemy code that looks more or less like this: rows = ( dbsession.query(Table1) .join( Table2, Table2.y == Table1.y) .filter(Table1.x == xx) .all()) The expressions get very long and nearly always need to be spread to multiple lines. I've tried various styles and have chosen the style above as the most tasteful available. Pros of the existing syntax: - It's possible to indent clearly and consistently. - Nested indentation works out OK. - It's flexible; I can combine lines or separate them for emphasis. Cons: - Extra parentheses are required. - The indentation is not enforced by the parser, so I have unnecessary freedom that could let various mistakes slip through. - The closing parenthesis has to move every time I append to or reorder the expression, leading to diff noise in version control. (Alternatively, I could put the closing parenthesis on its own line, but that consumes precious vertical reading space.) I'd like to suggest a small change to the Python parser that would make long expressions read better: rows = dbsession.query(Table1) ...
.join( Table2, Table2.y == Table1.y) .filter(Table1.x == xx) .all() The idea is to use an ellipsis at the end of a line to spread an expression over multiple indented lines, terminated by a return to an earlier indentation level. You can still indent more deeply as needed, as shown above by the join() method call. This syntax has all the pros of the existing syntax and resolves all the cons: - No extra parentheses are required. - The indentation is enforced, so my mistakes are more likely to be caught early. - Without a closing parenthesis, there is no diff noise when I append to or reorder an expression. I've thought about using a colon instead of an ellipsis, but in Python, a colon starts a list of statements; that's not my goal. Instead, I'm looking for ways to use parser-enforced indentation to avoid mistakes and help my code read better without changing any semantics. Feedback is welcome! Shane From ckaynor at zindagigames.com Wed Aug 31 20:21:19 2016 From: ckaynor at zindagigames.com (Chris Kaynor) Date: Wed, 31 Aug 2016 17:21:19 -0700 Subject: [Python-ideas] Extending expressions using ellipsis In-Reply-To: <49797ff7-5066-df52-c962-1148ddd4f2e4@hathawaymix.org> References: <49797ff7-5066-df52-c962-1148ddd4f2e4@hathawaymix.org> Message-ID: Guido's time machine strikes again, though using a slash (\) rather than an ellipsis: >>> '.'\ ... .join( ... ( ... '1', ... '2', ... ) ... ) '1.2' This is from Python 2.7.10 (what I have on the machine I am currently on), though I'm fairly sure it has worked for quite a bit longer than that. Chris On Wed, Aug 31, 2016 at 2:46 PM, Shane Hathaway wrote: > Hi, > > I write a lot of SQLAlchemy code that looks more or less like this: > > rows = ( > dbsession.query(Table1) > .join( > Table2, Table2.y == Table1.y) > .filter(Table1.x == xx) > .all()) > > The expressions get very long and nearly always need to be spread to > multiple lines. I've tried various styles and have chosen the style above > as the most tasteful available. 
> > Pros of the existing syntax: > > - It's possible to indent clearly and consistently. > - Nested indentation works out OK. > - It's flexible; I can combine lines or separate them for emphasis. > > Cons: > > - Extra parentheses are required. > - The indentation is not enforced by the parser, so I have unnecessary > freedom that could let various mistakes slip through. > - The closing parenthesis has to move every time I append to or reorder > the expression, leading to diff noise in version control. (Alternatively, I > could put the closing parenthesis on its own line, but that consumes > precious vertical reading space.) > > I'd like to suggest a small change to the Python parser that would make > long expressions read better: > > rows = dbsession.query(Table1) ... > .join( > Table2, Table2.y == Table1.y) > .filter(Table1.x == xx) > .all() > > The idea is to use an ellipsis at the end of a line to spread an > expression over multiple indented lines, terminated by a return to an > earlier indentation level. You can still indent more deeply as needed, as > shown above by the join() method call. > > This syntax has all the pros of the existing syntax and resolves all the > cons: > > - No extra parentheses are required. > - The indentation is enforced, so my mistakes are more likely to be caught > early. > - Without a closing parenthesis, there is no diff noise when I append to > or reorder an expression. > > I've thought about using a colon instead of an ellipsis, but in Python, a > colon starts a list of statements; that's not my goal. Instead, I'm looking > for ways to use parser-enforced indentation to avoid mistakes and help my > code read better without changing any semantics. > > Feedback is welcome! 
> > Shane > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Wed Aug 31 21:22:00 2016 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Wed, 31 Aug 2016 20:22:00 -0500 Subject: [Python-ideas] Extending expressions using ellipsis In-Reply-To: References: <49797ff7-5066-df52-c962-1148ddd4f2e4@hathawaymix.org> Message-ID: On Aug 31, 2016 7:22 PM, "Chris Kaynor" wrote: > > Guido's time machine strikes again, GAH! We should've just used that for PEPs 484 and 526; instead of trying to prove type hints are useful, Guido could've just: 1. Go 50 years into the future. 2. Make note of Python's world domination (Perl is overrated). 3. Grab random examples of code using type hints and note that no nuclear missile switches have gone off like last time\b\b\b\b\b\b\b\b\b\b has never happened. 4. Bring them back here and shove it in the rationale. Problem solved! (Unless of course, the time machine accidentally sets off one of those missile switches like last time\b\b\b\b\b\b\b\b\b\b has never happened.) > though using a slash (\) rather than an ellipsis: > > >>> '.'\ > ... .join( > ... ( > ... '1', > ... '2', > ... ) > ... ) > '1.2' > > This is from Python 2.7.10 (what I have on the machine I am currently on), though I'm fairly sure it has worked for quite a bit longer than that. > > Chris > > On Wed, Aug 31, 2016 at 2:46 PM, Shane Hathaway wrote: >> >> Hi, >> >> I write a lot of SQLAlchemy code that looks more or less like this: >> >> rows = ( >> dbsession.query(Table1) >> .join( >> Table2, Table2.y == Table1.y) >> .filter(Table1.x == xx) >> .all()) >> >> The expressions get very long and nearly always need to be spread to multiple lines. 
I've tried various styles and have chosen the style above as the most tasteful available. >> >> Pros of the existing syntax: >> >> - It's possible to indent clearly and consistently. >> - Nested indentation works out OK. >> - It's flexible; I can combine lines or separate them for emphasis. >> >> Cons: >> >> - Extra parentheses are required. >> - The indentation is not enforced by the parser, so I have unnecessary freedom that could let various mistakes slip through. >> - The closing parenthesis has to move every time I append to or reorder the expression, leading to diff noise in version control. (Alternatively, I could put the closing parenthesis on its own line, but that consumes precious vertical reading space.) >> >> I'd like to suggest a small change to the Python parser that would make long expressions read better: >> >> rows = dbsession.query(Table1) ... >> .join( >> Table2, Table2.y == Table1.y) >> .filter(Table1.x == xx) >> .all() >> >> The idea is to use an ellipsis at the end of a line to spread an expression over multiple indented lines, terminated by a return to an earlier indentation level. You can still indent more deeply as needed, as shown above by the join() method call. >> >> This syntax has all the pros of the existing syntax and resolves all the cons: >> >> - No extra parentheses are required. >> - The indentation is enforced, so my mistakes are more likely to be caught early. >> - Without a closing parenthesis, there is no diff noise when I append to or reorder an expression. >> >> I've thought about using a colon instead of an ellipsis, but in Python, a colon starts a list of statements; that's not my goal. Instead, I'm looking for ways to use parser-enforced indentation to avoid mistakes and help my code read better without changing any semantics. >> >> Feedback is welcome! 
>> >> Shane >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something's wrong. http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Aug 31 21:25:28 2016 From: guido at python.org (Guido van Rossum) Date: Wed, 31 Aug 2016 18:25:28 -0700 Subject: [Python-ideas] Extending expressions using ellipsis In-Reply-To: <49797ff7-5066-df52-c962-1148ddd4f2e4@hathawaymix.org> References: <49797ff7-5066-df52-c962-1148ddd4f2e4@hathawaymix.org> Message-ID: On Wed, Aug 31, 2016 at 2:46 PM, Shane Hathaway wrote: [...] > I'd like to suggest a small change to the Python parser that would make long > expressions read better: > > rows = dbsession.query(Table1) ... > .join( > Table2, Table2.y == Table1.y) > .filter(Table1.x == xx) > .all() > > The idea is to use an ellipsis at the end of a line to spread an expression > over multiple indented lines, terminated by a return to an earlier > indentation level. You can still indent more deeply as needed, as shown > above by the join() method call. > > This syntax has all the pros of the existing syntax and resolves all the > cons: > > - No extra parentheses are required. > - The indentation is enforced, so my mistakes are more likely to be caught > early. > - Without a closing parenthesis, there is no diff noise when I append to or > reorder an expression. 
> > I've thought about using a colon instead of an ellipsis, but in Python, a > colon starts a list of statements; that's not my goal. Instead, I'm looking > for ways to use parser-enforced indentation to avoid mistakes and help my > code read better without changing any semantics. (And no, this isn't equivalent to using '\'.) Would this be enforced in the grammar or by the lexer? Since you say you expect the indentation to be enforced, that suggests it would be done by the grammar, but then the question is how you would modify the grammar? You could take the rule that says an expression can be followed by ".NAME" and extend it to also allow "... INDENT xxxxx DEDENT" where the xxxxx is whatever's allowed at ".NAME" (i.e. ".NAME" followed by other tails like "(......)" or "[.......]"). But then you could only use this new idea for chaining method calls, and not for spreading other large expressions across multiple lines. -- --Guido van Rossum (python.org/~guido) From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Aug 31 23:56:40 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Thu, 1 Sep 2016 12:56:40 +0900 Subject: [Python-ideas] Extending expressions using ellipsis In-Reply-To: References: <49797ff7-5066-df52-c962-1148ddd4f2e4@hathawaymix.org> Message-ID: <22471.42744.446205.235372@turnbull.sk.tsukuba.ac.jp> Chris Kaynor writes: > Guido's time machine strikes again, though using a slash (\) rather than > an ellipsis: > > >>> '.'\ > ... .join( > ... ( > ... '1', > ... '2', > ... ) > ... ) > '1.2' > > This is from Python 2.7.10 (what I have on the machine I am currently on), > though I'm fairly sure it has worked for quite a bit longer than that. I expect Shane is aware of that. There are three issues. First, the ellipsis is more visible, and because it's syntactic rather than lexical, trailing whitespace (line comments!) isn't a problem. 
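The fragility of the lexical approach is easy to demonstrate: a backslash continuation only works when the backslash is the very last character on the physical line, so a stray trailing space (or a line comment) breaks it. A quick check using compile():

```python
# '\' must be the last character on the physical line; a trailing space
# after it turns the continuation into a SyntaxError.
src_ok = "x = 1 + \\\n    2\n"
src_bad = "x = 1 + \\ \n    2\n"   # note the space after the backslash

compile(src_ok, "<demo>", "exec")       # compiles fine
try:
    compile(src_bad, "<demo>", "exec")
    continuation_survived = True
except SyntaxError:
    continuation_survived = False       # this branch is taken
```

A syntactic marker like the proposed ellipsis would not have this failure mode, since the tokenizer would never treat it as a line-joining escape.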
Second, more important, Shane wants indentation enforced by the parser, which requires a syntactic line break. I'll add a third, which Shane may have meant to imply. That is, to get '\' to work with more general expressions (specifically, embedded function calls with arguments that are long enough to themselves require physical line breaks), you need to put in an escaped newline at each such. Shane's idea is that at the *first* physical linebreak, you put in the ellipsis, and after that your expression *must* obey pythonic indentation rules until the end of that expression. N.B. "Pythonic" rather than "Python" because Python currently doesn't have indentation rules for expressions, and the analogy to suite indentation will be imperfect, I suspect. Analogy (very inaccurate): "readable" regexp syntax. As yet, I have no opinion of the proposal itself, but it's clearly more powerful than your reply suggests. From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Aug 31 23:57:45 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Thu, 1 Sep 2016 12:57:45 +0900 Subject: [Python-ideas] real numbers with SI scale factors: next steps In-Reply-To: <1472665844.45040.711818297.6A0BCBBB@webmail.messagingengine.com> References: <20160830203427.GE2363@kundert.designers-guide.com> <20160831020551.GN26300@ando.pearwood.info> <20160831040801.GF2363@kundert.designers-guide.com> <1472665380.41376.711754849.48190D2E@webmail.messagingengine.com> <1472665844.45040.711818297.6A0BCBBB@webmail.messagingengine.com> Message-ID: <22471.42809.452598.705168@turnbull.sk.tsukuba.ac.jp> Random832 writes: > Also, interesting quirk - it always rounds up. 
1025 bytes is "1.1K", and > in SI mode, 1001 bytes is "1.1k" That seems to be the right approach: in system administration, these numbers are used mostly to understand resource usage, and underestimates are almost never what you want, while quite large overestimates are tolerable, and are typically limited because the actual precision of calculations is much higher than that of the "human-readable" output. I don't know if that would be true in general-purpose programming. I suspect not.
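The round-up behavior Random832 describes can be sketched in a few lines with math.ceil; the helper name human_size and the one-decimal rounding are assumptions for illustration, not the actual implementation of any particular utility:

```python
import math

def human_size(n, base=1024, prefixes="KMGTPE"):
    # ls/df-style "human-readable" output: scale down by `base` and always
    # round *up* to one decimal place, so usage is never understated.
    if n < base:
        return str(n)
    for i, prefix in enumerate(prefixes, start=1):
        scaled = n / base ** i
        if scaled < base or i == len(prefixes):
            value = math.ceil(scaled * 10) / 10  # round up, one decimal
            return f"{value:g}{prefix}"
```

This reproduces both quoted quirks: human_size(1025) gives "1.1K", and with base=1000 and prefixes="kMGTPE", 1001 bytes gives "1.1k".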