From nnorwitz at gmail.com Thu Jun  1 06:19:25 2006
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Wed, 31 May 2006 21:19:25 -0700
Subject: [Python-3000] Using a list for *args (was: Type annotations: annotating generators)
In-Reply-To: <43aa6ff70605311233i6f8195fdye2ed52fc559830ea@mail.gmail.com>
References: <43aa6ff70605271348y352921f6he107ba1f40a0393a@mail.gmail.com>
	<43aa6ff70605311233i6f8195fdye2ed52fc559830ea@mail.gmail.com>
Message-ID: 

On 5/31/06, Collin Winter wrote:
>
> All in all, the tuple->list change was minimally invasive.
>
> Overall, I've chosen to keep the external interfaces of the changed
> modules/packages the same; if there's a desire to change them later,
> this SVN commit can be used to figure out where adjustments should be
> made. Most of the changes involve the test suite, primarily where
> higher-order functions are concerned.
>
> I've submitted a patch to implement this change as SF #1498441
> (http://python.orf/sf/1498441); it's assigned to Guido.

.org that is :-)

Could you run a benchmark before and after this patch?  I'd like to
know the speed diff.  Something like:

./python.exe -mtimeit 'def foo(*args): pass' 'foo()'
./python.exe -mtimeit 'def foo(*args): pass' 'foo(1)'
./python.exe -mtimeit 'def foo(*args): pass' 'foo(1, 2)'
./python.exe -mtimeit 'def foo(*args): pass' 'foo(1, 2, 3)'
./python.exe -mtimeit 'def foo(*args): pass' 'foo(*range(10))'

You can post the speeds in the patch.
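Those one-liners can also be driven from a single script with the timeit module; a sketch for illustration only, since the absolute numbers depend entirely on the machine and the build being measured:

```python
import timeit

# Script-driven version of the timeit one-liners above.  We take the best
# of several repeats, as the command-line tool does, to reduce noise.
setup = 'def foo(*args): pass'
calls = ['foo()', 'foo(1)', 'foo(1, 2)', 'foo(1, 2, 3)', 'foo(*range(10))']

results = {}
for call in calls:
    secs = min(timeit.repeat(call, setup, number=100000, repeat=3))
    results[call] = secs / 100000 * 1e6  # microseconds per loop
    print('%-16s %.3f usec/loop' % (call, results[call]))
```

Running this against an interpreter built with and without the patch gives the before/after comparison Neal is asking for.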
Thanks, n From ncoghlan at gmail.com Thu Jun 1 12:04:00 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 01 Jun 2006 20:04:00 +1000 Subject: [Python-3000] packages in the stdlib In-Reply-To: References: <44716940.9000300@acm.org> <4472B196.7070506@acm.org> <447BC126.8050107@acm.org> <1149080922.5718.20.camel@fsol> <1149095977.5718.51.camel@fsol> <1149102266.5718.62.camel@fsol> Message-ID: <447EBB90.20104@gmail.com> Brett Cannon wrote: > So perhaps there is a way to create some kind of "virtual packages" or > "categories" in which existing modules could register themselves. This > could allow third-party modules (e.g. "gtk") to register themselves in > stdlib-supplied virtual packages (e.g. "gui"), for documentation and > findability purposes. "import gui; help(gui)" would give you the list of > available modules. > > > I see possible problems with this because then we run into your issue > with packaging; where do things go? At least with the stdlib we have > sufficient control to make sure things follow a standard in terms of > where thing should end up. > > I would rather do an all-or-nothing solution to the whole package > hierarchy for the stdlib. Does anyone else have an opinion on any of > this since this ending up just being fundamental differences in how two > people like to organize modules? Hmm, much as I hate to jump on a Web bandwagon, this just rang the 'tagging' bell in my head. XML, for instance, would fit in the "filefmt" category, but it also fits in categories like "datastruct", "markup", "parsing", "net" and "protocol". If Py3k improves the ability to access metadata associated with packages and modules (name, docstring, '__all__', etc) without actually importing the same, then it would be possible for "help(tag='gui')" to return one line descriptions for all modules that are marked as being 'gui' related. Cheers, Nick. 
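A rough sketch of what that metadata harvest could look like: one-line summaries are collected by parsing module source rather than importing it. The per-module tag metadata Nick describes does not exist, so the `tag` parameter below is purely hypothetical and is ignored:

```python
import ast
import os
import sysconfig

def module_summaries(tag=None):
    """Collect one-line summaries of stdlib modules without importing them.
    ``tag`` is where the hypothetical per-module tag metadata would plug in;
    since no such metadata exists, it is ignored in this sketch."""
    libdir = sysconfig.get_path('stdlib')
    summaries = {}
    for filename in sorted(os.listdir(libdir)):
        if not filename.endswith('.py'):
            continue
        with open(os.path.join(libdir, filename),
                  encoding='utf-8', errors='replace') as f:
            try:
                tree = ast.parse(f.read())   # parse only -- nothing is executed
            except (SyntaxError, ValueError):
                continue
        doc = ast.get_docstring(tree) or ''
        summaries[filename[:-3]] = doc.splitlines()[0] if doc else ''
    return summaries

summaries = module_summaries()
for name in sorted(summaries)[:5]:
    print('%-12s %s' % (name, summaries[name]))
```

A real `help(tag='gui')` would filter this mapping on the tag before printing, which is exactly the part that needs the new metadata.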
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From p.f.moore at gmail.com Thu Jun 1 13:29:28 2006 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 1 Jun 2006 12:29:28 +0100 Subject: [Python-3000] packages in the stdlib In-Reply-To: References: <44716940.9000300@acm.org> <4472B196.7070506@acm.org> <447BC126.8050107@acm.org> <1149080922.5718.20.camel@fsol> <1149095977.5718.51.camel@fsol> <430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com> Message-ID: <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> On 5/31/06, Brett Cannon wrote: > Why would a 3rd-party module be installed into the stdlib namespace? > net.jabber wouldn't exist unless it was in the stdlib or the module's author > decided to be snarky and inject their module into the stdlib namespace. Do you really want the stdlib to "steal" all of the simple names (like net, gui, data, ...)? While I don't think it's a particularly good idea for 3rd party modules to use such names, I'm not too keen on having them made effectively "reserved", either. And if there was a "net" package which contained all the networking modules in the stdlib, then yes I would expect a 3rd party developer of a jabber module to want to take advantage of the hierarchy and inject itself into the "net" namespace. Which would actually make name collisions worse rather than better. [Although, evidence from the current os module seems to imply that this is less of an issue than I'm claiming, in practice...] Paul. 
From ronaldoussoren at mac.com Thu Jun 1 16:51:03 2006 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 1 Jun 2006 16:51:03 +0200 Subject: [Python-3000] packages in the stdlib In-Reply-To: <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> References: <44716940.9000300@acm.org> <4472B196.7070506@acm.org> <447BC126.8050107@acm.org> <1149080922.5718.20.camel@fsol> <1149095977.5718.51.camel@fsol> <430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com> <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> Message-ID: <477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com> On 1-jun-2006, at 13:29, Paul Moore wrote: > On 5/31/06, Brett Cannon wrote: >> Why would a 3rd-party module be installed into the stdlib namespace? >> net.jabber wouldn't exist unless it was in the stdlib or the >> module's author >> decided to be snarky and inject their module into the stdlib >> namespace. > > Do you really want the stdlib to "steal" all of the simple names (like > net, gui, data, ...)? While I don't think it's a particularly good > idea for 3rd party modules to use such names, I'm not too keen on > having them made effectively "reserved", either. That was my feeling too, except that I haven't made my mind up on the merit of having 3th-party modules inside such packages. I don't think the risk of nameclashes would be greater than it is now, there's already an implicit nameing convention, or rather several of them ;-), for naming modules in the standard library. The main problem I have with excluding 3th-party libraries from such generic toplevel packages in the standard library is that this increases the separation between stdlib and other code. I'd rather see a lean&mean standard library with a standard mechanism for adding more libraries and perhaps a central list of good libraries. 
> > And if there was a "net" package which contained all the networking > modules in the stdlib, then yes I would expect a 3rd party developer > of a jabber module to want to take advantage of the hierarchy and > inject itself into the "net" namespace. Which would actually make name > collisions worse rather than better. [Although, evidence from the > current os module seems to imply that this is less of an issue than > I'm claiming, in practice...] I suppose that's at least partially not an issue at the moment because you can only add stuff to existing packages through hacks. I wouldn't touch libraries that inject themselves into existing packages through .pth hackery because of the juckyness of it [*]. Ronald From brett at python.org Thu Jun 1 17:44:02 2006 From: brett at python.org (Brett Cannon) Date: Thu, 1 Jun 2006 08:44:02 -0700 Subject: [Python-3000] packages in the stdlib In-Reply-To: <477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com> References: <44716940.9000300@acm.org> <447BC126.8050107@acm.org> <1149080922.5718.20.camel@fsol> <1149095977.5718.51.camel@fsol> <430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com> <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> <477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com> Message-ID: On 6/1/06, Ronald Oussoren wrote: > > > On 1-jun-2006, at 13:29, Paul Moore wrote: > > > On 5/31/06, Brett Cannon wrote: > >> Why would a 3rd-party module be installed into the stdlib namespace? > >> net.jabber wouldn't exist unless it was in the stdlib or the > >> module's author > >> decided to be snarky and inject their module into the stdlib > >> namespace. > > > > Do you really want the stdlib to "steal" all of the simple names (like > > net, gui, data, ...)? While I don't think it's a particularly good > > idea for 3rd party modules to use such names, I'm not too keen on > > having them made effectively "reserved", either. 
> > That was my feeling too, except that I haven't made my mind up on the > merit of having 3th-party modules inside such packages. I don't think > the risk of nameclashes would be greater than it is now, there's > already an implicit nameing convention, or rather several of > them ;-), for naming modules in the standard library. Right. And as Paul said in his email, the os module has shown this is not an issue. As long as the names are known ahead of time there is not much of a problem. The main problem I have with excluding 3th-party libraries from such > generic toplevel packages in the standard library is that this > increases the separation between stdlib and other code. I'd rather > see a lean&mean standard library with a standard mechanism for adding > more libraries and perhaps a central list of good libraries. Well, personally I would like to clean up the stdlib, but I don't want to make it too lean since the whole "Batteries Included" thing is handy. As for sanctioned libraries that don't come included, that could be possible, but the politics of picking the libraries could be nasty. > > > And if there was a "net" package which contained all the networking > > modules in the stdlib, then yes I would expect a 3rd party developer > > of a jabber module to want to take advantage of the hierarchy and > > inject itself into the "net" namespace. Which would actually make name > > collisions worse rather than better. [Although, evidence from the > > current os module seems to imply that this is less of an issue than > > I'm claiming, in practice...] > > I suppose that's at least partially not an issue at the moment > because you can only add stuff to existing packages through hacks. I > wouldn't touch libraries that inject themselves into existing > packages through .pth hackery because of the juckyness of it [*]. Yeah, something better than .pth files would be good. -Brett -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20060601/5b596939/attachment.htm From jcarlson at uci.edu Thu Jun 1 18:12:34 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Thu, 01 Jun 2006 09:12:34 -0700 Subject: [Python-3000] packages in the stdlib In-Reply-To: <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> References: <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> Message-ID: <20060601090456.6993.JCARLSON@uci.edu> "Paul Moore" wrote: > > On 5/31/06, Brett Cannon wrote: > > Why would a 3rd-party module be installed into the stdlib namespace? > > net.jabber wouldn't exist unless it was in the stdlib or the module's author > > decided to be snarky and inject their module into the stdlib namespace. > > Do you really want the stdlib to "steal" all of the simple names (like > net, gui, data, ...)? While I don't think it's a particularly good > idea for 3rd party modules to use such names, I'm not too keen on > having them made effectively "reserved", either. This is one reason why I was suggesting the 'py' (or other) top level package; then we would really have py.net, py.gui, py.data, etc., which would presumably avoid name collisions, and wouldn't reserve the generic names. As for 3rd party modules, that is those modules that would (or should) go into the site-packages right now, I'm not sure I like the idea of having them inject themselves into the "package heirarchy" of the standard library, though it wouldn't be too terribly difficult with an import hook combined with a setup hook*. - Josiah * The setup hook creates and/or modifies a special "3rd party packages" bit of metadata (presumably in XML). This metadata describes two things; where the module lies in the heirarchy registry, and where it actually lies in the filesystem. The import hook would adjust the __all__ or module/package dictionary on import to include the names of the modules that are importable, as known by the metadata registry. 
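The import-hook half of Josiah's proposal can be sketched with the modern importlib machinery: a registry maps names under a virtual category package to real installed modules, and a meta-path finder resolves imports against it. The `vcat` package and its registry entries are made up for the example; nothing like this exists in the stdlib:

```python
import importlib
import importlib.abc
import importlib.util
import sys

# Hypothetical registry: virtual names -> real modules (None marks the
# virtual package itself).  In the proposal this would be read from the
# metadata written at install time.
REGISTRY = {
    'vcat': None,
    'vcat.json': 'json',       # pretend 'json' registered under this category
    'vcat.socket': 'socket',
}

class VirtualPackageFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, fullname, path=None, target=None):
        if fullname in REGISTRY:
            return importlib.util.spec_from_loader(
                fullname, self, is_package=(REGISTRY[fullname] is None))
        return None

    def exec_module(self, module):
        real = REGISTRY[module.__spec__.name]
        if real is not None:
            # Alias the real module by copying its public names.
            for name, value in vars(importlib.import_module(real)).items():
                if not name.startswith('_'):
                    setattr(module, name, value)

sys.meta_path.insert(0, VirtualPackageFinder())

vjson = importlib.import_module('vcat.json')
print(vjson.dumps([1, 2]))   # prints [1, 2]
```

The setup-hook half (writing the registry at install time) is the part that would need the tooling support Josiah mentions.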
From bingham at cenix-bioscience.com Thu Jun 1 18:03:16 2006 From: bingham at cenix-bioscience.com (Aaron Bingham) Date: Thu, 01 Jun 2006 18:03:16 +0200 Subject: [Python-3000] packages in the stdlib In-Reply-To: <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> References: <44716940.9000300@acm.org> <4472B196.7070506@acm.org> <447BC126.8050107@acm.org> <1149080922.5718.20.camel@fsol> <1149095977.5718.51.camel@fsol> <430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com> <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> Message-ID: <447F0FC4.1030906@cenix-bioscience.com> Paul Moore wrote: >On 5/31/06, Brett Cannon wrote: > > >>Why would a 3rd-party module be installed into the stdlib namespace? >>net.jabber wouldn't exist unless it was in the stdlib or the module's author >>decided to be snarky and inject their module into the stdlib namespace. >> >> > >Do you really want the stdlib to "steal" all of the simple names (like >net, gui, data, ...)? While I don't think it's a particularly good >idea for 3rd party modules to use such names, I'm not too keen on >having them made effectively "reserved", either. > > I'm confused. As far as I can see, a reserved prefix (the "py" or "stdlib" package others have mentioned) is the only reliable way to avoid naming conflicts with 3rd-party packages with a growing standard library. I suspect we wll be going round and round in circles here as long as a reserved prefix is ruled out. IMO, multiple reserved prefixes ("net", "gui", etc.) is much worse than one. Could someone please explain for my sake why a single reserved prefix is not acceptable? 
Thanks, -- -------------------------------------------------------------------- Aaron Bingham Senior Software Engineer Cenix BioScience GmbH -------------------------------------------------------------------- From brett at python.org Thu Jun 1 18:33:59 2006 From: brett at python.org (Brett Cannon) Date: Thu, 1 Jun 2006 09:33:59 -0700 Subject: [Python-3000] packages in the stdlib In-Reply-To: <447F0FC4.1030906@cenix-bioscience.com> References: <44716940.9000300@acm.org> <447BC126.8050107@acm.org> <1149080922.5718.20.camel@fsol> <1149095977.5718.51.camel@fsol> <430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com> <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> <447F0FC4.1030906@cenix-bioscience.com> Message-ID: On 6/1/06, Aaron Bingham wrote: > > Paul Moore wrote: > > >On 5/31/06, Brett Cannon wrote: > > > > > >>Why would a 3rd-party module be installed into the stdlib namespace? > >>net.jabber wouldn't exist unless it was in the stdlib or the module's > author > >>decided to be snarky and inject their module into the stdlib namespace. > >> > >> > > > >Do you really want the stdlib to "steal" all of the simple names (like > >net, gui, data, ...)? While I don't think it's a particularly good > >idea for 3rd party modules to use such names, I'm not too keen on > >having them made effectively "reserved", either. > > > > > I'm confused. As far as I can see, a reserved prefix (the "py" or > "stdlib" package others have mentioned) is the only reliable way to > avoid naming conflicts with 3rd-party packages with a growing standard > library. I suspect we wll be going round and round in circles here as > long as a reserved prefix is ruled out. IMO, multiple reserved prefixes > ("net", "gui", etc.) is much worse than one. Could someone please > explain for my sake why a single reserved prefix is not acceptable? Guido doesn't like it. =) And he said he is going to ignore this topic probably until we get a good consensus on what we want. 
If we can get almost everyone for it we may be able to convince him to change his mind. That being said, I don't think the root name is needed if we keep the hierarchy flat. We have done fine so far without it. But if we do have one level of package organization then I think the root 'py' would be good. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060601/ac079110/attachment-0001.html From tjreedy at udel.edu Thu Jun 1 19:41:40 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 1 Jun 2006 13:41:40 -0400 Subject: [Python-3000] packages in the stdlib References: <44716940.9000300@acm.org> <447BC126.8050107@acm.org><1149080922.5718.20.camel@fsol><1149095977.5718.51.camel@fsol><430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com><79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com><477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com> Message-ID: "Brett Cannon" wrote in message news:bbaeab100606010844s552e7918i481301082e706ac6 at mail.gmail.com... >Well, personally I would like to clean up the stdlib, but I don't want to >make it >too lean since the whole "Batteries Included" thing is handy. Definitely as to both. > As for sanctioned libraries that don't come included, that could be > possible, > but the politics of picking the libraries could be nasty. Sanctioning possibly multiple libraries (with non-clashing names) in a category shoud be less nasty than picking just one to include and distribute with the library. The criteria for sanction should be similar to that for inclusion -- such as maturity and commitment -- but without having to be 'the best'. As a user, I think I would want to be able to plug things in. 
Terry Jan Reedy From tjreedy at udel.edu Thu Jun 1 20:13:08 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 1 Jun 2006 14:13:08 -0400 Subject: [Python-3000] packages in the stdlib References: <44716940.9000300@acm.org><4472B196.7070506@acm.org> <447BC126.8050107@acm.org> <1149080922.5718.20.camel@fsol> <1149095977.5718.51.camel@fsol> <430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com> <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> <447F0FC4.1030906@cenix-bioscience.com> Message-ID: "Aaron Bingham" wrote in message news:447F0FC4.1030906 at cenix-bioscience.com... > I'm confused. As far as I can see, a reserved prefix (the "py" or > "stdlib" package others have mentioned) is the only reliable way to > avoid naming conflicts with 3rd-party packages with a growing standard > library. True, but.. > I suspect we wll be going round and round in circles here as > long as a reserved prefix is ruled out. IMO, multiple reserved prefixes > ("net", "gui", etc.) is much worse than one. But much better than a hundred or more ;-) > Could someone please > explain for my sake why a single reserved prefix is not acceptable? Because you have to type it over and over. Because it is pure nuisance for simple usage of python with imports only or almost only from the standard lib. Because it does nothing to organize the standard lib. Because it would be in addition to any set of organizing prefixes such as 'net', 'gui', etc, which are much more informative from a user viewpoint. There are two separate issues being discussed here: 1) reducing/eliminating name clashes between stdlib and other modules; 2) organing the stdlib with a shallow hierarchy. For the former, yes, a prefix on stdlib modules would work, but this most common case could/should be the default. Requiring instead a prefix on all *other* imports would accomplish the same. 
For instance, 's' for imports from site-packages and 'l' for imports of local modules on sys.path (which would then not have lib and lib/site-packages on it). But the problem I see with this approach is that it says that the most important thing about a module is where it comes from, rather than what it does.

For the latter (2 above), I think those who want such mostly agree in principle on a mostly two-level hierarchy with about 10-20 short names for the top level, using the lib docs as a starting point for the categories. The top-level files should have nice doc strings so that "import xyz; help(xyz)" gives a nice list of the contents of xyz. To deal with the problem of cross-categorization, this doc could also have a 'See Also' section listing modules that might have been put in xyz and might be sought in xyz but which were actually put elsewhere.

Up in the air is the question of plugging in other modules not included in the stdlib. With useful categories, this strikes me as a useful thing to do. From a usage viewpoint, what a module does is more important than who wrote it and who distributes it. When it becomes trivial to grab and install non-stdlib modules, the distinction between stdlib and not becomes even less important. If there is an approved list of plugins for each top-level package, that can be included in the doc as well. What would be really nice is if trying to import an uninstalled approved module would trigger an attempt to download and install it (in the appropriate package).
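Terry's "import triggers an install attempt" idea was never adopted (auto-installing at import time is generally considered unsafe), but it can be sketched with modern tooling; pip here postdates the thread and stands in for whatever installer the approved list would use:

```python
import importlib
import subprocess
import sys

def import_or_install(name, dist=None):
    """Sketch of 'importing an uninstalled approved module triggers a
    download'.  ``dist`` is the distribution to install when the import
    fails; both names here are whatever the approved list maps them to."""
    try:
        return importlib.import_module(name)
    except ImportError:
        subprocess.check_call(
            [sys.executable, '-m', 'pip', 'install', dist or name])
        return importlib.import_module(name)

mod = import_or_install('json')   # already present, so no install happens
```

A fuller version would consult the approved-plugins list before ever touching the network, rather than installing arbitrary names.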
Terry Jan Reedy From collinw at gmail.com Thu Jun 1 21:32:39 2006 From: collinw at gmail.com (Collin Winter) Date: Thu, 1 Jun 2006 21:32:39 +0200 Subject: [Python-3000] Using a list for *args (was: Type annotations: annotating generators) In-Reply-To: References: <43aa6ff70605271348y352921f6he107ba1f40a0393a@mail.gmail.com> <43aa6ff70605311233i6f8195fdye2ed52fc559830ea@mail.gmail.com> Message-ID: <43aa6ff70606011232v7e415faax52c0e900b03164f@mail.gmail.com> On 6/1/06, Neal Norwitz wrote: > Could you run a benchmark before and after this patch? I'd like to > know speed diff. (Sorry you got this twice, Neal.) I've attached the benchmarks as a comment on the patch, but I'll repeat them here. All times are usecs per loop. ./python -mtimeit 'def foo(*args): pass' 'foo()' As tuple: 1.56 As list: 1.7 ./python -mtimeit 'def foo(*args): pass' 'foo(1)' As tuple: 1.75 As list: 2.04 ./python -mtimeit 'def foo(*args): pass' 'foo(1, 2)' As tuple: 1.87 As list: 2.15 ./python -mtimeit 'def foo(*args): pass' 'foo(1, 2, 3)' As tuple: 1.95 As list: 2.3 ./python -mtimeit 'def foo(*args): pass' 'foo(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)' As tuple: 2.67 As list: 2.97 Collin Winter From mcherm at mcherm.com Thu Jun 1 22:19:36 2006 From: mcherm at mcherm.com (Michael Chermside) Date: Thu, 01 Jun 2006 13:19:36 -0700 Subject: [Python-3000] Using a list for *args (was: Type annotations:annotating generators) Message-ID: <20060601131936.srakny7zvu3lwok0@login.werra.lunarpages.com> Collin Winter writes: > I've attached the benchmarks as a comment on the patch, but I'll > repeat them here. All times are usecs per loop. [statistics showing list is about 15% slower] My memory is fuzzy here. Can someone repeat for me the reasons why we wanted to use list? Were we just trying it out to see how it worked, or was there a desire to change? Was the desire to change because it improved some uses of the C api, or was it just for "purity" in use of tuples vs lists? 
I'm not a "need for speed" kind of guy, but I can't remember what the advantages of the list approach were supposed to be. ------- By the way I'm curious about the following also: # interpolating a list (I presume there's no advantage, but just checking) ./python -mtimeit 'def foo(*args): pass' 'foo(*range(10))' # calling a function that doesn't use *args ./python -mtimeit 'def foo(): pass' 'foo()' ./python -mtimeit 'def foo(x): pass' 'foo(1)' ./python -mtimeit 'def foo(x,y): pass' 'foo(1,2)' ./python -mtimeit 'def foo(x,y,z): pass' 'foo(1,2,3)' -- Michael Chermside PS: Thanks, Collin, for trying this. I have to admit, I'm surprised at how well-contained the changes turned out to be. From mike.klaas at gmail.com Thu Jun 1 23:16:25 2006 From: mike.klaas at gmail.com (Mike Klaas) Date: Thu, 1 Jun 2006 14:16:25 -0700 Subject: [Python-3000] packages in the stdlib In-Reply-To: References: <44716940.9000300@acm.org> <1149080922.5718.20.camel@fsol> <1149095977.5718.51.camel@fsol> <430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com> <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> <447F0FC4.1030906@cenix-bioscience.com> Message-ID: <3d2ce8cb0606011416t30333f4aq641c00d557760eae@mail.gmail.com> Terry Reedy wrote: > Because you have to type it over and over. 
hmm, With the right context manager:

import py
with py as py:
    from gui import tkinter
    import net
    with net as net:
        import httplib
        import urllib

-Mike

From ronaldoussoren at mac.com Fri Jun  2 00:08:14 2006
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Fri, 2 Jun 2006 00:08:14 +0200
Subject: [Python-3000] packages in the stdlib
In-Reply-To: 
References: <44716940.9000300@acm.org> <447BC126.8050107@acm.org>
	<1149080922.5718.20.camel@fsol> <1149095977.5718.51.camel@fsol>
	<430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com>
	<79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com>
	<477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com>
Message-ID: 

On 1-jun-2006, at 17:44, Brett Cannon wrote:
>
>
> On 6/1/06, Ronald Oussoren wrote:
> On 1-jun-2006, at 13:29, Paul Moore wrote:
>
> > On 5/31/06, Brett Cannon wrote:
> >> Why would a 3rd-party module be installed into the stdlib
> namespace?
> >> net.jabber wouldn't exist unless it was in the stdlib or the
> >> module's author
> >> decided to be snarky and inject their module into the stdlib
> >> namespace.
> >
> > Do you really want the stdlib to "steal" all of the simple names
> (like
> > net, gui, data, ...)? While I don't think it's a particularly good
> > idea for 3rd party modules to use such names, I'm not too keen on
> > having them made effectively "reserved", either.
>
> That was my feeling too, except that I haven't made my mind up on the
> merit of having 3rd-party modules inside such packages. I don't think
> the risk of name clashes would be greater than it is now; there's
> already an implicit naming convention, or rather several of
> them ;-), for naming modules in the standard library.
>
> Right.  And as Paul said in his email, the os module has shown this
> is not an issue.  As long as the names are known ahead of time
> there is not much of a problem.

And as I noted that's probably because the only ways to transparently
patch the os module are evil .pth tricks and replacing files in the
standard library.
Neither are considered good style and therefore not something you'd do if you want someone to use your library. How would you react to a library on the cheeseshop that claims to add some useful functions to the os module? I'd be very, very hesitant to use such (hypothetical) library. There is however nothing wrong with naming a library tftplib. That's would be the obvious name for a library that supports TFTP and blends nicely with the standard library naming convention for network libraries. It would be nice if 3th-party libraries could blend in even with a more structured standard library, even if that would only be possible for carefully selected portions of the standard library. Not that this is a really big issue, the range of libraries on the cheeseshop is much, much larger than the functionality covered by the standard library ;-). There are of course also good reason for not wanting to follow the standard library conventions. Even if the stdlib would contain a 'gui' package for gui libraries and you could extend that from 3th-party code I'd not use that convention in PyObjC because its package structure explicitly mirrors that of the Objective-C libraries it wraps. > > The main problem I have with excluding 3th-party libraries from such > generic toplevel packages in the standard library is that this > increases the separation between stdlib and other code. I'd rather > see a lean&mean standard library with a standard mechanism for adding > more libraries and perhaps a central list of good libraries. > > Well, personally I would like to clean up the stdlib, but I don't > want to make it too lean since the whole "Batteries Included" thing > is handy. As for sanctioned libraries that don't come included, > that could be possible, but the politics of picking the libraries > could be nasty. How would that be more nasty than picking libraries to be included in the standard library? 
A sanctioned library list would be a level between the standard library and random 3th-party code, basicly to avoid tying the release cycle of "obviously useful" software to that of python itself. > > > > > And if there was a "net" package which contained all the networking > > modules in the stdlib, then yes I would expect a 3rd party developer > > of a jabber module to want to take advantage of the hierarchy and > > inject itself into the "net" namespace. Which would actually make > name > > collisions worse rather than better. [Although, evidence from the > > current os module seems to imply that this is less of an issue than > > I'm claiming, in practice...] > > I suppose that's at least partially not an issue at the moment > because you can only add stuff to existing packages through hacks. I > wouldn't touch libraries that inject themselves into existing > packages through .pth hackery because of the juckyness of it [*]. > > Yeah, something better than .pth files would be good. There's nothing wrong with .pth files per-se and I use them regularly. The juckyness is in .pth files that contain lines that start with 'import', those can do very scary things such as hot- patching the standard library during python startup. That's something I don't like to see in production code. 
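The distinction Ronald is drawing can be demonstrated directly: a plain line in a .pth file just adds a directory to sys.path, while a line starting with `import` is executed at startup. `site.addsitedir()` processes a directory's .pth files the same way interpreter startup does, so the demo below (with made-up file names) is self-contained:

```python
import os
import site
import sys
import tempfile

d = tempfile.mkdtemp()
extra = os.path.join(d, 'mylibs')
os.mkdir(extra)

# The benign kind of .pth line: a directory to append to sys.path.
with open(os.path.join(d, 'example.pth'), 'w') as f:
    f.write(extra + '\n')

# The scary kind: an 'import' line is *executed* during .pth processing --
# this is the hot-patching hook being objected to (harmless payload here).
with open(os.path.join(d, 'evil.pth'), 'w') as f:
    f.write("import os; os.environ['PTH_RAN'] = '1'\n")

site.addsitedir(d)               # processes both .pth files
assert extra in sys.path         # the path line took effect
assert os.environ['PTH_RAN'] == '1'  # ...and the import line really ran
```

Nothing stops that `import` line from monkey-patching stdlib modules before any user code runs, which is exactly why it reads as hackery.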
Ronald From brett at python.org Fri Jun 2 00:15:08 2006 From: brett at python.org (Brett Cannon) Date: Thu, 1 Jun 2006 15:15:08 -0700 Subject: [Python-3000] packages in the stdlib In-Reply-To: References: <44716940.9000300@acm.org> <1149080922.5718.20.camel@fsol> <1149095977.5718.51.camel@fsol> <430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com> <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> <477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com> Message-ID: On 6/1/06, Ronald Oussoren wrote: > > > On 1-jun-2006, at 17:44, Brett Cannon wrote: > > > > > > > On 6/1/06, Ronald Oussoren wrote: > > On 1-jun-2006, at 13:29, Paul Moore wrote: > > > > > On 5/31/06, Brett Cannon wrote: > > >> Why would a 3rd-party module be installed into the stdlib > > namespace? > > >> net.jabber wouldn't exist unless it was in the stdlib or the > > >> module's author > > >> decided to be snarky and inject their module into the stdlib > > >> namespace. > > > > > > Do you really want the stdlib to "steal" all of the simple names > > (like > > > net, gui, data, ...)? While I don't think it's a particularly good > > > idea for 3rd party modules to use such names, I'm not too keen on > > > having them made effectively "reserved", either. > > > > That was my feeling too, except that I haven't made my mind up on the > > merit of having 3th-party modules inside such packages. I don't think > > the risk of nameclashes would be greater than it is now, there's > > already an implicit nameing convention, or rather several of > > them ;-), for naming modules in the standard library. > > > > Right. And as Paul said in his email, the os module has shown this > > is not an issue. As long as the names are known ahead of time > > there is not much of a problem. > > And as I noted that's probably because the only ways to transparently > patch the os module are evil .pth tricks and replacing files in the > standard library. 
Neither are considered good style and therefore not > something you'd do if you want someone to use your library. How would > you react to a library on the cheeseshop that claims to add some > useful functions to the os module? I'd be very, very hesitant to use > such (hypothetical) library. Exactly; I wouldn't touch it. Which is why I don't like this idea of having third-party modules add themselves to some stdlib package. There is however nothing wrong with naming a library tftplib. That's > would be the obvious name for a library that supports TFTP and blends > nicely with the standard library naming convention for network > libraries. It would be nice if 3th-party libraries could blend in > even with a more structured standard library, even if that would only > be possible for carefully selected portions of the standard library. No, there is no problem. If we stick with a flat stdlib this is what I would push for. Not that this is a really big issue, the range of libraries on the > cheeseshop is much, much larger than the functionality covered by the > standard library ;-). There are of course also good reason for not > wanting to follow the standard library conventions. Even if the > stdlib would contain a 'gui' package for gui libraries and you could > extend that from 3th-party code I'd not use that convention in PyObjC > because its package structure explicitly mirrors that of the > Objective-C libraries it wraps. > > > > > The main problem I have with excluding 3th-party libraries from such > > generic toplevel packages in the standard library is that this > > increases the separation between stdlib and other code. I'd rather > > see a lean&mean standard library with a standard mechanism for adding > > more libraries and perhaps a central list of good libraries. > > > > Well, personally I would like to clean up the stdlib, but I don't > > want to make it too lean since the whole "Batteries Included" thing > > is handy. 
As for sanctioned libraries that don't come included, > > that could be possible, but the politics of picking the libraries > > could be nasty. > > How would that be more nasty than picking libraries to be included in > the standard library? A sanctioned library list would be a level > between the standard library and random 3rd-party code, basically to > avoid tying the release cycle of "obviously useful" software to that > of python itself. They are both nasty, and there is a reason why modules that have competitors don't get added easily. Look at all of the discussion it took to get pysqlite added; it took two tries and a lot of emails to get that cleared. -Brett > > > > > > > And if there was a "net" package which contained all the networking > > > modules in the stdlib, then yes I would expect a 3rd party developer > > > of a jabber module to want to take advantage of the hierarchy and > > > inject itself into the "net" namespace. Which would actually make > > name > > > collisions worse rather than better. [Although, evidence from the > > > current os module seems to imply that this is less of an issue than > > > I'm claiming, in practice...] > > > > I suppose that's at least partially not an issue at the moment > > because you can only add stuff to existing packages through hacks. I > > wouldn't touch libraries that inject themselves into existing > > packages through .pth hackery because of the yuckiness of it [*]. > > > > Yeah, something better than .pth files would be good. > > There's nothing wrong with .pth files per se and I use them > regularly. The yuckiness is in .pth files that contain lines that > start with 'import', those can do very scary things such as hot- > patching the standard library during python startup. That's something > I don't like to see in production code. > > Ronald > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20060601/45e00ef9/attachment.html From greg.ewing at canterbury.ac.nz Fri Jun 2 03:03:07 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 02 Jun 2006 13:03:07 +1200 Subject: [Python-3000] Wild idea: Deferred Evaluation & Implicit Lambda In-Reply-To: <200605301243.57103.tdickenson@geminidataloggers.com> References: <447C1AB3.5000602@acm.org> <200605301243.57103.tdickenson@geminidataloggers.com> Message-ID: <447F8E4B.6030205@canterbury.ac.nz> Toby Dickenson wrote: > The ?? operator first evaluated its left operand. If that succeeds its value > is returned. If that raised an exception it evaluates and returns its right > operand. That allowed your example to be written: > > value = a[key] ?? b[key] ?? 0 That wouldn't make sense so much in Python, because you don't usually want to catch all exceptions, only particular ones. So the operator would need to be parameterised somehow with the exception to catch, which would make it much less concise. A more Pythonic way would be something like value = a.get(key) or b.get(key) or 0 If some of your legitimate values can be false, you might need to use functions that attempt to get a value and return it wrapped somehow. The thought has just occurred that what *might* be useful here is an operator that works like "or", except that the only value it recognises as "false" is None. Not sure what to call it, though... -- Greg From greg.ewing at canterbury.ac.nz Fri Jun 2 03:14:32 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 02 Jun 2006 13:14:32 +1200 Subject: [Python-3000] weakrefs and cyclic references In-Reply-To: <1d85506f0605311102la44fb40j65954db9bad9a29c@mail.gmail.com> References: <1d85506f0605311102la44fb40j65954db9bad9a29c@mail.gmail.com> Message-ID: <447F90F8.6060000@canterbury.ac.nz> tomer filiba wrote: > you can solve the problem using > weakref.proxy > ... > so why not do this automatically? 
I would *not* want to have some of my references chosen at random and automatically made into weak ones. I may temporarily create a cycle and later break it by removing one of the references. If the other remaining reference had been picked for auto-conversion into a weak reference, I would lose the last reference to my object. (Besides being undesirable, it would also be extremely difficult to implement efficiently.) What might be useful is an easier way of *explicitly* creating and using weak references. We already have WeakKeyDictionary and WeakValueDictionary which behave just like ordinary dicts except that they weakly reference things. I'm thinking it would be nice to have a way of declaring any attribute to be a weak reference. Then it could be read and written in the usual way, without all the code that uses it having to know about its weakness. This could probably be done fairly easily with a suitable property descriptor. -- Greg From bingham at cenix-bioscience.com Fri Jun 2 11:16:20 2006 From: bingham at cenix-bioscience.com (Aaron Bingham) Date: Fri, 02 Jun 2006 11:16:20 +0200 Subject: [Python-3000] packages in the stdlib In-Reply-To: References: <44716940.9000300@acm.org><4472B196.7070506@acm.org> <447BC126.8050107@acm.org> <1149080922.5718.20.camel@fsol> <1149095977.5718.51.camel@fsol> <430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com> <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> <447F0FC4.1030906@cenix-bioscience.com> Message-ID: <448001E4.5070003@cenix-bioscience.com> Terry Reedy wrote: >"Aaron Bingham" wrote in message >news:447F0FC4.1030906 at cenix-bioscience.com... > > >>I'm confused. As far as I can see, a reserved prefix (the "py" or >>"stdlib" package others have mentioned) is the only reliable way to >>avoid naming conflicts with 3rd-party packages with a growing standard >>library. >> >> > >True, but.. > > > >>I suspect we will be going round and round in circles here as >>long as a reserved prefix is ruled out. 
IMO, multiple reserved prefixes >>("net", "gui", etc.) is much worse than one. >> >> > >But much better than a hundred or more ;-) > > The fewer the better of course. >> Could someone please >>explain for my sake why a single reserved prefix is not acceptable? >> >> > >Because you have to type it over and over. > A tiny amount of pain for everyone to save a large amount of pain for a few (when their name gets used by a new stdlib package). >There are two separate issues being discussed here: >1) reducing/eliminating name clashes between stdlib and other modules; >2) organizing the stdlib with a shallow hierarchy. > >For the former, yes, a prefix on stdlib modules would work, but this most >common case could/should be the default. > What the most common case is depends on what you are doing. For people writing one-off scripts, stdlib imports will dominate; in the code I work on, stdlib imports are only a small fraction (I'd guess about 10%) of all imports. >Requiring instead a prefix on all >*other* imports would accomplish the same. For instance, 's' for imports >from site-packages and 'l' for imports of local modules on sys.path (which >would then not have lib and lib/site-packages on it). > > True, but having the name of a module depend on how you choose to install it on a particular machine seems dangerous. >But the problem I see with this approach is that it says that the most >important thing about a module is where it comes from, rather than what it >does. > > Which is more important depends on what you are thinking about. If I am just trying to get something working quickly, what the module does is most important; if I am trying to minimize external dependencies, where the module comes from is most important. 
>For the latter (2 above), I think those who want such mostly agree in >principle on a mostly two-level hierarchy with about 10-20 short names for >the top-level, using the lib docs as a starting point for the categories > > That's fine with me, but I still think we need a top-level prefix. >Up in the air is the question of plugging in other modules not included in >the stdlib. With useful categories, this strikes me as a useful thing to >do. From a usage viewpoint, what a module does is more important than who >wrote it and who distributes it. > This strikes me as asking for naming conflicts. An alternative approach would be to have a system of categories for documentation purposes that are not related to the package names. Python could include support for searching by package category. >When it become trivial to grab and >install non-stdlib modules, then the distinction between stdlib and not >becomes even less important. > The distinction is still very important if I want my code to run with minimal fuss on anyone's machine. Cheers, -- -------------------------------------------------------------------- Aaron Bingham Senior Software Engineer Cenix BioScience GmbH -------------------------------------------------------------------- From gmccaughan at synaptics-uk.com Fri Jun 2 11:12:32 2006 From: gmccaughan at synaptics-uk.com (Gareth McCaughan) Date: Fri, 2 Jun 2006 10:12:32 +0100 Subject: [Python-3000] packages in the stdlib In-Reply-To: <3d2ce8cb0606011416t30333f4aq641c00d557760eae@mail.gmail.com> References: <44716940.9000300@acm.org> <3d2ce8cb0606011416t30333f4aq641c00d557760eae@mail.gmail.com> Message-ID: <200606021012.33299.gmccaughan@synaptics-uk.com> On Thursday 2006-06-01 22:16, Mike Klaas wrote: > Terry Reedy wrote: > > Because you have to type it over and over. 
> > hmm, With the right context manager: > > import py > with py as py: > from gui import tkinter > import net > with net as net: > import httplib > import urllib That's neat, but I think it's worse in both brevity and clarity than import tkinter import httplib, urllib or even (though here things would change as the number of imported libraries in each category tends to infinity, which in practice it probably doesn't) than import gui.tkinter import net.httplib import net.urllib -- g From ncoghlan at gmail.com Fri Jun 2 12:53:39 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 02 Jun 2006 20:53:39 +1000 Subject: [Python-3000] weakrefs and cyclic references In-Reply-To: <447F90F8.6060000@canterbury.ac.nz> References: <1d85506f0605311102la44fb40j65954db9bad9a29c@mail.gmail.com> <447F90F8.6060000@canterbury.ac.nz> Message-ID: <448018B3.7050102@gmail.com> Greg Ewing wrote: > What might be useful is an easier way of *explicitly* > creating and using weak references. > > We already have WeakKeyDictionary and WeakValueDictionary > which behave just like ordinary dicts except that they > weakly reference things. I'm thinking it would be nice > to have a way of declaring any attribute to be a weak > reference. Then it could be read and written in the > usual way, without all the code that uses it having > to know about its weakness. > > This could probably be done fairly easily with a suitable > property descriptor. Something like the following? (although you could do a simpler version without the callback support) (untested!) class WeakAttr(object): """Descriptor to define weak instance attributes name is the name of the attribute callback is an optional callback function If supplied, the callback function is called with the instance and the attribute name as arguments after a currently referenced object is finalized. 
""" def __init__(self, name, callback=None): self._name = name self._callback = callback def __get__(self, obj, cls): if obj is None: return self attr_ref = getattr(obj, self._name) if attr_ref is not None: return attr_ref() return None def __set__(self, obj, value): name = self._name if value is None: setattr(obj, name, None) else: cb = self._callback if cb is not None: _cb = cb def cb(dead_ref): if dead_ref is getattr(obj, name): # Object that went away is still # the one referred to by the # attribute, so invoke the callback _cb(obj, name) attr_ref = weakref.ref(value, cb) setattr(obj, self._name, ) def __delete__(self, obj): delattr(obj, self._name) -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From tomerfiliba at gmail.com Fri Jun 2 14:54:21 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Fri, 2 Jun 2006 14:54:21 +0200 Subject: [Python-3000] weakrefs and cyclic references In-Reply-To: <448018B3.7050102@gmail.com> References: <1d85506f0605311102la44fb40j65954db9bad9a29c@mail.gmail.com> <447F90F8.6060000@canterbury.ac.nz> <448018B3.7050102@gmail.com> Message-ID: <1d85506f0606020554v3b478434s4b5233f50e1010cc@mail.gmail.com> dang, you posted before me :) anyway, please check my implementation as well http://sebulba.wikispaces.com/recipe+weakattr i also included some demos. anyway, i'd like to have this or the other weakattr implementation included in weakref.py. it's a pretty useful feature to have in the stdlib, for example: from weakref import weakattr class blah(object): someattr = weakattr() def __init__(self): self.someattr = self just like properties. -tomer On 6/2/06, Nick Coghlan wrote: > Greg Ewing wrote: > > What might be useful is an easier way of *explicitly* > > creating and using weak references. 
> > > > We already have WeakKeyDictionary and WeakValueDictionary > > which behave just like ordinary dicts except that they > > weakly reference things. I'm thinking it would be nice > > to have a way of declaring any attribute to be a weak > > reference. Then it could be read and written in the > > usual way, without all the code that uses it having > > to know about its weakness. > > > > This could probably be done fairly easily with a suitable > > property descriptor. > > Something like the following? (although you could do a simpler version without > the callback support) (untested!) > > class WeakAttr(object): > """Descriptor to define weak instance attributes > > name is the name of the attribute > callback is an optional callback function > > If supplied, the callback function is called with the > instance and the attribute name as arguments after a currently > referenced object is finalized. > """ > def __init__(self, name, callback=None): > self._name = name > self._callback = callback > > def __get__(self, obj, cls): > if obj is None: > return self > attr_ref = getattr(obj, self._name) > if attr_ref is not None: > return attr_ref() > return None > > def __set__(self, obj, value): > name = self._name > if value is None: > setattr(obj, name, None) > else: > cb = self._callback > if cb is not None: > _cb = cb > def cb(dead_ref): > if dead_ref is getattr(obj, name): > # Object that went away is still > # the one referred to by the > # attribute, so invoke the callback > _cb(obj, name) > attr_ref = weakref.ref(value, cb) > setattr(obj, self._name, attr_ref) > > def __delete__(self, obj): > delattr(obj, self._name) > > > > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > --------------------------------------------------------------- > http://www.boredomandlaziness.org > From collinw at gmail.com Fri Jun 2 18:05:40 2006 From: collinw at gmail.com (Collin Winter) Date: Fri, 2 Jun 2006 18:05:40 +0200 Subject: [Python-3000] Using a list for *args 
(was: Type annotations:annotating generators) In-Reply-To: <20060601131936.srakny7zvu3lwok0@login.werra.lunarpages.com> References: <20060601131936.srakny7zvu3lwok0@login.werra.lunarpages.com> Message-ID: <43aa6ff70606020905n5b4472cexfc61d4edfca396c2@mail.gmail.com> On 6/1/06, Michael Chermside wrote: > Collin Winter writes: > > I've attached the benchmarks as a comment on the patch, but I'll > > repeat them here. All times are usecs per loop. > [statistics showing list is about 15% slower] > > My memory is fuzzy here. Can someone repeat for me the reasons > why we wanted to use list? Were we just trying it out to see how > it worked, or was there a desire to change? Was the desire to > change because it improved some uses of the C api, or was it > just for "purity" in use of tuples vs lists? > > I'm not a "need for speed" kind of guy, but I can't remember what > the advantages of the list approach were supposed to be. The main reason (in my mind, at least) was tuple/list purity. > By the way I'm curious about the following also: > > # interpolating a list (I presume there's no advantage, but just checking) > ./python -mtimeit 'def foo(*args): pass' 'foo(*range(10))' Tuple: 4.22 List: 4.57 > # calling a function that doesn't use *args > ./python -mtimeit 'def foo(): pass' 'foo()' Tuple: 1.5 List: 1.51 > ./python -mtimeit 'def foo(x): pass' 'foo(1)' Tuple: 1.62 List: 1.59 > ./python -mtimeit 'def foo(x,y): pass' 'foo(1,2)' Tuple: 1.7 List: 1.7 > ./python -mtimeit 'def foo(x,y,z): pass' 'foo(1,2,3)' Tuple: 1.84 List: 1.83 Collin Winter From talin at acm.org Fri Jun 2 19:42:03 2006 From: talin at acm.org (Talin) Date: Fri, 02 Jun 2006 10:42:03 -0700 Subject: [Python-3000] packages in the stdlib In-Reply-To: References: <44716940.9000300@acm.org> <447BC126.8050107@acm.org> <1149080922.5718.20.camel@fsol> <1149095977.5718.51.camel@fsol> <430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com> <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> 
<477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com> Message-ID: <4480786B.3060400@acm.org> Ronald Oussoren wrote: > On 1-jun-2006, at 17:44, Brett Cannon wrote: >>I suppose that's at least partially not an issue at the moment >>because you can only add stuff to existing packages through hacks. I >>wouldn't touch libraries that inject themselves into existing >>packages through .pth hackery because of the yuckiness of it [*]. >> >>Yeah, something better than .pth files would be good. > > > There's nothing wrong with .pth files per se and I use them > regularly. The yuckiness is in .pth files that contain lines that > start with 'import', those can do very scary things such as hot- > patching the standard library during python startup. That's something > I don't like to see in production code. Reading over this thread, it seems to me that there is a cross-linkage between the "reorganize standard library" task and the "refactor import machinery" task - in that much of the arguments about the stdlib names seem to hinge on policy decisions as to (a) whether 3rd party libs should be allowed to co-mingle with the stdlib modules, and (b) what kinds of co-mingling should be allowed ('monkeypatching', et al), and (c) what specific import mechanisms should these 3rd-party modules have access to in order to do this co-mingling. Moreover, past threads on the topic of import machinery have given me the vague sense that there is a lot of accumulated cruft in the way that packages are built, distributed, and imported; that a lot of features and additions have been made to the various distutils / setuptools / import tools in order to solve various problems that have cropped up from time to time, and that certain people are rather dissatisfied with the overall organization (or lack thereof) and inelegance of these additions, in particular their lack of a OOWTDI. 
I say 'vague sense' because even after reading all these threads, I only have a murky idea of what actual *problems* all of these various improvements are trying to solve. Given the cruft-disposal-themed mission statement of Py3000, it seems to me that it would make a lot of sense for someone to actually write down what all this stuff is actually trying to accomplish; And from there perhaps open the discussion as to whether there is some other, more sublimely beautiful and obviously simpler way to accomplish the same thing. As for the specific case of .pth files, the general concept, as far as I can tell, is that having to modify environment variables to include additional packages sucks; And it particularly sucks on non-Unixy platforms such as Windows and Jython. That in itself seems like a laudable goal, assuming of course that one has also listed the various use cases for why a package wouldn't simply be dumped in 'site-packages' with no need to modify anything. 
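For concreteness, the .pth behavior being debated here can be sketched in a few lines. This is a simplified model of what site.py does with each line of a .pth file, not the real implementation: ordinary lines are paths appended to sys.path, while lines beginning with "import" are executed at startup, which is exactly the hot-patching hook Ronald objects to.

```python
import sys

def process_pth_line(line, path_exists=lambda p: True):
    """Simplified sketch of how site.py treats one line of a .pth file.

    The real processing also handles duplicates, errors, and encodings;
    the point here is just the two behaviours under discussion.
    """
    line = line.rstrip()
    if not line or line.startswith("#"):
        return "ignored"
    if line.startswith(("import ", "import\t")):
        # This is the scary part: arbitrary code runs at every
        # interpreter startup, e.g. "import some_patcher".
        exec(line)
        return "executed"
    if path_exists(line):
        sys.path.append(line)  # the benign, common case
        return "added"
    return "missing"
```

(The two-branch behavior -- path lines versus import lines -- is the documented site-module mechanism; everything else here is illustrative.)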
-- Talin From brett at python.org Fri Jun 2 20:20:19 2006 From: brett at python.org (Brett Cannon) Date: Fri, 2 Jun 2006 11:20:19 -0700 Subject: [Python-3000] packages in the stdlib In-Reply-To: <4480786B.3060400@acm.org> References: <44716940.9000300@acm.org> <1149095977.5718.51.camel@fsol> <430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com> <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> <477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com> <4480786B.3060400@acm.org> Message-ID: On 6/2/06, Talin wrote: > > Ronald Oussoren wrote: > > On 1-jun-2006, at 17:44, Brett Cannon wrote: > >>I suppose that's at least partially not an issue at the moment > >>because you can only add stuff to existing packages through hacks. I > >>wouldn't touch libraries that inject themselves into existing > >>packages through .pth hackery because of the yuckiness of it [*]. > >> > >>Yeah, something better than .pth files would be good. > > > > > > There's nothing wrong with .pth files per se and I use them > > regularly. The yuckiness is in .pth files that contain lines that > > start with 'import', those can do very scary things such as hot- > > patching the standard library during python startup. That's something > > I don't like to see in production code. > > Reading over this thread, it seems to me that there is a cross-linkage > between the "reorganize standard library" task and the "refactor import > machinery" task - in that much of the arguments about the stdlib names > seem to hinge on policy decisions as to (a) whether 3rd party libs > should be allowed to co-mingle with the stdlib modules, and (b) what > kinds of co-mingling should be allowed ('monkeypatching', et al), and > (c) what specific import mechanisms should these 3rd-party modules have > access to in order to do this co-mingling. Personally, I am not advocating any change in imports nor any mingling of third-party code with the stdlib. 
Moreover, past threads on the topic of import machinery have given me > the vague sense that there is a lot of accumulated cruft in the way that > packages are built, distributed, and imported; that a lot of features > and additions have been made to the various distutils / setuptools / > import tools in order to solve various problems that have cropped up > from time to time, and that certain people are rather dissatisfied with > the overall organization (or lack thereof) and inelegance of these > additions, in particular their lack of a OOWTDI. > > I say 'vague sense' because even after reading all these threads, I only > have a murky idea of what actual *problems* all of these various > improvements are trying to solve. > > Given the cruft-disposal-themed mission statement of Py3000, it seems to > me that it would make a lot of sense for someone to actually write down > what all this stuff is actually trying to accomplish; And from there > perhaps open the discussion as to whether there is some other, more > sublimely beautiful and obviously simpler way to accomplish the same > thing. Well, for me, the reorganization is to help make finding the module you want easier, both in the docs and at the interpreter. This includes grouping and renaming modules to be more reasonable and follow a consistent naming scheme. -Brett As for the specific case of .pth files, the general concept, as far > as I can tell, is that having to modify environment variables to include > additional packages sucks; And it particularly sucks on non-Unixy > platforms such as Windows and Jython. That in itself seems like a > laudable goal, assuming of course that one has also listed the various > use cases for why a package wouldn't simply be dumped in 'site-packages' > with no need to modify anything. 
> > So before starting the work of sketching out broad categories of package > names, it seems to me that step 1 and 2 are (1) identifying a set of > requirements for package creation/distribution/location/etc, and (2) > identify how the design of (1) will impact on the conventions of package > organization. > > -- Talin > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060602/e4067afb/attachment.html From tjreedy at udel.edu Fri Jun 2 20:53:15 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 2 Jun 2006 14:53:15 -0400 Subject: [Python-3000] packages in the stdlib References: <44716940.9000300@acm.org><4472B196.7070506@acm.org> <447BC126.8050107@acm.org> <1149080922.5718.20.camel@fsol> <1149095977.5718.51.camel@fsol> <430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com> <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> <447F0FC4.1030906@cenix-bioscience.com> <448001E4.5070003@cenix-bioscience.com> Message-ID: "Aaron Bingham" wrote in message news:448001E4.5070003 at cenix-bioscience.com... >[me] >>For the latter (2 above), I think those who want such mostly agree in >>principle on a mostly two-level hierarchy with about 10-20 short names >>for >>the top-level, using the lib docs as a starting point for the categories > That's fine with me, but I still think we need a top-level prefix. I think that 10-20 reserved names is hardly such a burden that we would need anything more on top to avoid collisions -- especially if the list is fixed. The current problem is that modules can be added to the stdlib that clash with existing 3rd party modules. 
That would no longer happen under my variation of the classification proposal, which would include a misc package. >>When it become trivial to grab and >>install non-stdlib modules, then the distinction between stdlib and not >>becomes even less important. >> > The distinction is still very important if I want my code to run with > minimal fuss on anyone's machine. Under the hypothesis 'trivial to install...' then the extra fuss would be small. Do you really consider 'little extra fuss' to be the same as 'lots of extra fuss'? Terry Jan Reedy From jimjjewett at gmail.com Fri Jun 2 21:49:32 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 2 Jun 2006 15:49:32 -0400 Subject: [Python-3000] packages in the stdlib In-Reply-To: <4480786B.3060400@acm.org> References: <44716940.9000300@acm.org> <1149095977.5718.51.camel@fsol> <430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com> <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> <477FA127-7229-42BC-AECF-44BFE8BA977A@mac.com> <4480786B.3060400@acm.org> Message-ID: On 6/2/06, Talin wrote: > ... it seems to me that there is a cross-linkage > between the "reorganize standard library" task and the "refactor import > machinery" task Eventually, yes. As Brett pointed out, "reorganize the standard library" stands on its own, and is intended to make finding modules easier. The tasks get linked when the library again grows, or when 3rd-party packages try to replace (or superset) the functionality. Then we might start caring that package X is exactly the stdlib package X (which was sufficient and tested against), or that it be Xplus (which the sysadmin or user says is a faster and bugfixed superset). >- in that much of the arguments about the stdlib names > seem to hinge on policy decisions as to (a) whether 3rd party libs > should be allowed to co-mingle with the stdlib modules, By default, yes, but it should be easy to tell which you have if you do care. 
So (pretending that wx is in the stdlib, because it has a short name) import UI.wx # Import a module claiming to implement the wx interface import py.UI.wx # Import exactly the wx that was installed with the standard lib. > and (b) what > kinds of co-mingling should be allowed ('monkeypatching', et al), and > (c) what specific import mechanisms should these 3rd-party modules have > access to in order to do this co-mingling. These are not related to the stdlib reorg. The only catch is that with a deeper namespace, some 3rd party packages will know where they belong, and it makes sense to let them say so. Namespace packages (let alone tags) are not in the standard library now, so this can't be done as cleanly. > Moreover, past threads on the topic of import machinery have given me > the vague sense that there is a lot of accumulated cruft in the way that > packages are ... > I say 'vague sense' because even after reading all these threads, I only > have a murky idea of what actual *problems* all of these various > improvements are trying to solve. Those that I'm vaguely aware of: (1) It is hard to split packages. The idiom of module.py importing _module sort of works for a two-way split of a single module, but splitting modules and subpackages across different locations doesn't work so nicely. Putting .pyc in one place and .py in another is a recurring minor itch. (2) Every import extension reinvents the wheel, and only one wheel at a time. Whether the file is in a zip archive or not should be unrelated to whether it is a .pyo file or a cheetah template --- but currently isn't. (3) As these one-off wheels build up, it becomes difficult to know where something really came from (and how), so it is harder to find the "real" package and easier to test (or ship) the wrong version. 
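The split idiom mentioned in (1) is worth seeing spelled out. A minimal sketch, with made-up names ("_module" and "frob" are hypothetical; the stdlib uses the same pattern in places like heapq falling back from _heapq):

```python
# module.py -- pure-Python half of a two-way split.

def frob(x):
    """Slow but portable fallback implementation."""
    return 2 * x

try:
    # If a compiled _module is importable, its version of frob shadows
    # the fallback above.  Note both halves must be findable on the
    # same sys.path, which is why splitting a package across locations
    # doesn't work so nicely.
    from _module import frob  # hypothetical accelerated module
except ImportError:
    pass
```

The fallback only "sort of works", as Jim says: it handles one module and its accelerator, but nothing like a subpackage scattered over several install locations.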
-jJ From tomerfiliba at gmail.com Fri Jun 2 22:06:59 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Fri, 2 Jun 2006 22:06:59 +0200 Subject: [Python-3000] a slight change to __[get|set|del]item__ Message-ID: <1d85506f0606021306k32ad723bx826d7cd7debae4dd@mail.gmail.com> Guido wrote: > Because the (...) in a function call isn't a tuple. > > I'm with Oleg -- a[x, y] is *intentionally* the same as a[(x, y)]. > This is a feature; you can write > > t = x, y # or t = (x, y) > > and later > > a[t] well is func((1,2,3)) the same as func(1,2,3)? no. so why should container[1, 2, 3] be the same as container[(1,2,3)]? you say it's a feature. is it intentionally *ambiguous*? what you'd want in that case is t = (1, 2, 3) container[*t] or something like that. i guess it's a dead subject, but i wanted to have that clarified. -tomer From steven.bethard at gmail.com Fri Jun 2 22:50:30 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Fri, 2 Jun 2006 14:50:30 -0600 Subject: [Python-3000] have iter(mapping) generate (key, value) pairs Message-ID: I'd like to suggest that we (at least briefly) re-consider the decision that iterating over a mapping generates the keys, not the (key, value) pairs. This was addressed somewhat in `PEP 234`_, with the pros and cons basically being: * From a purity standpoint, iterating over keys keeps the symmetry between ``if x in y`` and ``for x in y`` * From a practicality standpoint, iterating over keys means that most of the time, you'll also have to do a ``mapping[key]``, since most iterations access both the keys and values. I only bring this up now because Python 3000 is our opportunity to review old decisions, and I think there's one more argument for iterating over (key, value) pairs that was not discussed. Iterating over (key, value) pairs allows functions like dict() and dict.update() to accept both mappings and (key, value) iterables, without having to check for a .keys() function. 
Just to clarify the point, here's the code in UserDict.DictMixin:: def update(self, other=None, **kwargs): # Make progressively weaker assumptions about "other" if other is None: pass elif hasattr(other, 'iteritems'): # iteritems saves memory and lookups for k, v in other.iteritems(): self[k] = v elif hasattr(other, 'keys'): for k in other.keys(): self[k] = other[k] else: for k, v in other: self[k] = v if kwargs: self.update(kwargs) Note that even though the `Language Reference`_ defines mappings in terms of __len__, __getitem__, __setitem__, __delitem__ and __iter__, UserDict.DictMixin.update has to assume that all mappings have a .keys() method. For comparison, here's what it would look like if mappings iterated over (key, value) pairs:: def update(self, other=None, **kwargs): if other is not None: for k, v in other: self[k] = v if kwargs: self.update(kwargs) As far as backwards compatibility is concerned, if you need to write code that works in both Python 2.X and Python 3000, you just need to be explicit, e.g. using dict.iteritems() or dict.iterkeys() as necessary. (Yes, I know that .iter* is going to be dropped, but that's a backward compatibility concern for another PEP, not this one.) .. _PEP 234:http://www.python.org/dev/peps/pep-0234/ .. _Language Reference: http://docs.python.org/ref/sequence-types.html STeVe -- Grammar am for people who can't think for myself. --- Bucky Katt, Get Fuzzy From guido at python.org Fri Jun 2 23:28:13 2006 From: guido at python.org (Guido van Rossum) Date: Fri, 2 Jun 2006 14:28:13 -0700 Subject: [Python-3000] have iter(mapping) generate (key, value) pairs In-Reply-To: References: Message-ID: This was already considered and rejected. See PEP 3099. On 6/2/06, Steven Bethard wrote: > I'd like to suggest that we (at least briefly) re-consider the > decision that iterating over a mapping generates the keys, not the > (key, value) pairs. 
This was addressed somewhat in `PEP 234`_, with > the pros and cons basically being: > > * From a purity standpoint, iterating over keys keeps the symmetry > between ``if x in y`` and ``for x in y`` > * From a practicality standpoint, iterating over keys means that most > of the time, you'll also have to do a ``mapping[key]``, since most > iterations access both the keys and values. > > I only bring this up now because Python 3000 is our opportunity to > review old decisions, and I think there's one more argument for > iterating over (key, value) pairs that was not discussed. Iterating > over (key, value) pairs allows functions like dict() and dict.update() > to accept both mappings and (key, value) iterables, without having to > check for a .keys() function. Just to clarify the point, here's the > code in UserDict.DictMixin:: > > def update(self, other=None, **kwargs): > # Make progressively weaker assumptions about "other" > if other is None: > pass > elif hasattr(other, 'iteritems'): # iteritems saves memory and lookups > for k, v in other.iteritems(): > self[k] = v > elif hasattr(other, 'keys'): > for k in other.keys(): > self[k] = other[k] > else: > for k, v in other: > self[k] = v > if kwargs: > self.update(kwargs) > > Note that even though the `Language Reference`_ defines mappings in > terms of __len__, __getitem__, __setitem__, __delitem__ and __iter__, > UserDict.DictMixin.update has to assume that all mappings have a > .keys() method. > > For comparison, here's what it would look like if mappings iterated > over (key, value) pairs:: > > def update(self, other=None, **kwargs): > if other is not None: > for k, v in other: > self[k] = v > if kwargs: > self.update(kwargs) > > As far as backwards compatibility is concerned, if you need to write > code that works in both Python 2.X and Python 3000, you just need to > be explicit, e.g. using dict.iteritems() or dict.iterkeys() as > necessary. 
(Yes, I know that .iter* is going to be dropped, but > that's a backward compatibility concern for another PEP, not this > one.) > > > .. _PEP 234:http://www.python.org/dev/peps/pep-0234/ > .. _Language Reference: http://docs.python.org/ref/sequence-types.html > > STeVe > -- > Grammar am for people who can't think for myself. > --- Bucky Katt, Get Fuzzy > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mcherm at mcherm.com Fri Jun 2 23:30:05 2006 From: mcherm at mcherm.com (Michael Chermside) Date: Fri, 02 Jun 2006 14:30:05 -0700 Subject: [Python-3000] have iter(mapping) generate (key, value) pairs Message-ID: <20060602143005.mzagusmhwv5cow08@login.werra.lunarpages.com> Steven Bethard writes: > I'd like to suggest that we (at least briefly) re-consider the > decision that iterating over a mapping generates the keys, not the > (key, value) pairs. I agree, now is the best time for reconsidering the decision. My opinion on the matter itself is that I was unsure before we did it, but that use has convinced me that iter() returning the keys turns out to be very natural. Since I write "for x in myDict" a LOT this outweighs any minor implementation details in dict() and dict.update(). I say the original decision was a Python success story: it's one of those examples that I look back on whenever my confidence in Guido's intuition on syntax needs shoring up.
-- Michael Chermside From ncoghlan at gmail.com Sat Jun 3 01:54:33 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 03 Jun 2006 09:54:33 +1000 Subject: [Python-3000] have iter(mapping) generate (key, value) pairs In-Reply-To: References: Message-ID: <4480CFB9.6020301@gmail.com> Steven Bethard wrote: > Note that even though the `Language Reference`_ defines mappings in > terms of __len__, __getitem__, __setitem__, __delitem__ and __iter__, > UserDict.DictMixin.update has to assume that all mappings have a > .keys() method. A slightly different proposal: Add an iteritems() builtin with the following definition: def iteritems(obj): # Check for mapping first try: items = obj.items # or __items__ if you prefer except AttributeError: pass else: return iter(items()) # Check for sequence next if hasattr(obj, "__getitem__"): return enumerate(obj) # Fall back on normal iteration return iter(obj) Then update the language reference so that the presence of an items() (or __items__()) method is the defining characteristic that makes something a mapping instead of a sequence. After all, we've been trying to think of a way to denote that anyway. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From tomerfiliba at gmail.com Sat Jun 3 22:51:57 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Sat, 3 Jun 2006 22:51:57 +0200 Subject: [Python-3000] iostack and sock2 Message-ID: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com> hi all some time ago i wrote this huge post about stackable IO and the need for a new socket module. i've made some progress with those, and i'd like to receive feedback. * a working alpha version of the new socket module (sock2) is available for testing and tweaking with at http://sebulba.wikispaces.com/project+sock2 * i'm working on a version of iostack...
but i don't expect to make a public release until mid july. in the meanwhile, i started a wiki page on my site for it (motivation, plans, design): http://sebulba.wikispaces.com/project+iostack with lots of pretty-formatted info. i remember people saying that stating `read(n)` returns exactly `n` bytes is problematic, can you elaborate? btw, Guido said he'd review it, but he's too busy, and i'd like to receive comments from other people as well. thanks. -tomer From ncoghlan at gmail.com Sun Jun 4 05:52:19 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 04 Jun 2006 13:52:19 +1000 Subject: [Python-3000] iostack and sock2 In-Reply-To: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com> References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com> Message-ID: <448258F3.3070808@gmail.com> tomer filiba wrote: > hi all > > some time ago i wrote this huge post about stackable IO and the > need for a new socket module. i've made some progress with > those, and i'd like to receive feedback. > > * a working alpha version of the new socket module (sock2) is > available for testing and tweaking with at > http://sebulba.wikispaces.com/project+sock2 > > * i'm working on a version of iostack... but i don't expect to make > a public release until mid july. in the meanwhile, i started a wiki > page on my site for it (motivation, plans, design): > http://sebulba.wikispaces.com/project+iostack Nice, very nice. Some things that don't appear to have been considered in the iostack design yet: - non-blocking IO and timeouts (e.g. on NetworkStreams) - interaction with (replacement of?) the select module Some other random thoughts about the current writeup: The design appears to implicitly assume that it is best to treat all streams as IO streams, and raise an exception if an output operation is accessed on an input-only stream (or vice versa). 
This seems like a reasonable idea to me, but it should be mentioned explicitly (e.g an alternative approach would be to define InputStream and OutputStream, and then have an IOStream that inherited from both of them). The common Stream API should include a flush() write method, so that application code doesn't need to care whether or not it is dealing with buffered IO when forcing output to be displayed. Any operations that may touch the filesystem or network shouldn't be properties - attribute access should never raise IOError (this is a guideline that came out of the Path discussion). (e.g. the 'position' property is probably a bad idea, because x.position may then raise an IOError) The stream layer hierarchy needs to be limited to layers that both expose and use the normal bytes-based Stream API. A separate stream interface concept is needed for something that can be used by the application, but cannot have other layers stacked on top of it. Additionally, any "bytes-in-bytes-out" transformation operation can be handled as a single codec layer that accepts an encoding function and a decoding function. This can then be used for compression layers, encryption layers, Golay encoding, A-law companding, AV codecs, etc. . . StreamLayer * ForwardingLayer - forwards all data written or read to another stream * BufferingLayer - buffers data using given buffer size * CodecLayer - encodes data written, decodes data read StreamInterface * TextInterface - text oriented interface to a stream * BytesInterface - byte oriented interface to a stream * RecordInterface - record (struct) oriented interface to a stream * ObjectInterface - object (pickle) oriented interface to a stream The key point about the stream interfaces is that while they will provide a common mechanism for getting at the underlying stream, their interfaces are otherwise unconstrained. The BytesInterface differs from a normal low-level stream primarily in the fact that it *is* line-iterable. 
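[Editor's note: the "bytes-in-bytes-out" codec layer Nick describes can be sketched in a few lines. Everything here, the CodecLayer name and its signature included, is hypothetical illustration rather than actual iostack code; zlib stands in for any encode/decode pair.]

```python
import io
import zlib

class CodecLayer:
    """Hypothetical sketch of a 'bytes-in-bytes-out' codec layer:
    wraps a stream, encoding on write and decoding on read."""

    def __init__(self, stream, encode, decode):
        self._stream = stream
        self._encode = encode
        self._decode = decode

    def write(self, data):
        # Transform outgoing bytes before passing them down the stack.
        return self._stream.write(self._encode(data))

    def read(self):
        # Read everything from the underlying stream, then decode.
        return self._decode(self._stream.read())

# Any file-like object can play the part of the underlying stream;
# the same layer class would serve for compression, encryption, etc.
raw = io.BytesIO()
layer = CodecLayer(raw, zlib.compress, zlib.decompress)
layer.write(b"hello " * 100)
raw.seek(0)
print(layer.read() == b"hello " * 100)  # True
```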
On the topic of line buffering, the Python 2.x IO stack treats binary files as line iterable, using '\n' as a line separator (well, more strictly it's a record separator, since we're talking about binary files). There's actually an RFE on SF somewhere about making the record separator configurable in the 2.x IO stack (I raised the tracker item ages ago when someone else made the suggestion). However, the streams produced by iostack's 'file' helper are not currently line-iterable. Additionally, the 'textfile' helper tries to handle line terminators while the data is still bytes, while Unicode defines line endings in terms of characters. As I understand it, "\x0A" (LF), "\x0D" (CR), "\x0D\x0A" (CRLF), "\x85" (NEL), "\x0C" (FF), "\u2028" (LS), "\u2029" (PS) should all be treated as line terminators as far as Unicode is concerned. So I think line buffering and making things line iterable should be left to the TextInterface and BytesInterface layers. TextInterface would be most similar to the current file interface, only working on Unicode strings instead of 8-bit strings (as well as using the Unicode definition of what constitutes a line ending). BytesInterface would work with binary files, returning a bytes object for each record. So I'd tweak the helper functions to look like: def file(filename, mode = "r", bufsize = -1, line_sep="\n"): f = FileStream(filename, mode) # a bufsize of 0 or None means unbuffered if bufsize: f = BufferingLayer(f, bufsize) # Use bytes interface to make file line-iterable return BytesInterface(f, line_sep) def textfile(filename, mode = "r", bufsize = -1, encoding = None): f = FileStream(filename, mode) # a bufsize of 0 or None means unbuffered if bufsize: f = BufferingLayer(f, bufsize) # Text interface deals with line terminators correctly return TextInterface(f, encoding) > with lots of pretty-formatted info. i remember people saying > that stating `read(n)` returns exactly `n` bytes is problematic, > can you elaborate?
I can see that behaviour being seriously annoying when you get to the end of the stream. I'd far prefer for the stream to just give me the last bit when I ask for it and then tell me *next* time that there isn't anything left. This has worked well for a long time with the existing read method of file objects. If you want a method with the other behaviour, add a "readexact" API, rather than changing the semantics of "read" (although I'd be really curious to hear the use case for the other behaviour). (Take a look at the s3.recv(100) line in your Sock2 example - how irritating would it be for that to raise EOFError because you only got a few bytes?) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ronaldoussoren at mac.com Sun Jun 4 10:45:28 2006 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Sun, 4 Jun 2006 10:45:28 +0200 Subject: [Python-3000] packages in the stdlib In-Reply-To: References: <44716940.9000300@acm.org> <4472B196.7070506@acm.org> <447BC126.8050107@acm.org> <1149080922.5718.20.camel@fsol> <1149095977.5718.51.camel@fsol> <430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com> <79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com> <447F0FC4.1030906@cenix-bioscience.com> <448001E4.5070003@cenix-bioscience.com> Message-ID: <63094894-36B0-4487-9992-F913D9749F08@mac.com> On 2-jun-2006, at 20:53, Terry Reedy wrote: > > "Aaron Bingham" wrote in message > news:448001E4.5070003 at cenix-bioscience.com... >> [me] >>> For the latter (2 above), I think those who want such mostly >>> agree in >>> principle on a mostly two-level hierarchy with about 10-20 short >>> names >>> for >>> the top-level, using the lib docs as a starting point for the >>> categories > >> That's fine with me, but I still think we need a top-level prefix. 
> I think that 10-20 reserved names is hardly such a burden that we > would > need anything more on top to avoid collisions -- especially if the > list is > fixed. The current problem is that modules can be added to the > stdlib > that clash with existing 3rd party modules. That would no longer > happen > under my variation of the classification proposal, which would > include a > misc package. I'm -lots on a package named "misc". That's really poor naming, almost as bad as "util". Misc is the "we don't know what to do with these"-category and completely unobvious for anyone that doesn't already know where to look. It seems to me that misc would end up containing all modules and packages that don't fit in one of the preconceived toplevel packages and don't have enough peers in the misc package to move them to their own toplevel package. Ronald From tjreedy at udel.edu Sun Jun 4 21:18:15 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 4 Jun 2006 15:18:15 -0400 Subject: [Python-3000] packages in the stdlib References: <44716940.9000300@acm.org> <4472B196.7070506@acm.org><447BC126.8050107@acm.org><1149080922.5718.20.camel@fsol><1149095977.5718.51.camel@fsol><430FBE78-D78A-4346-BD18-27D1D038AFE7@mac.com><79990c6b0606010429j221aa7c1m677c878e9286fd91@mail.gmail.com><447F0FC4.1030906@cenix-bioscience.com><448001E4.5070003@cenix-bioscience.com> <63094894-36B0-4487-9992-F913D9749F08@mac.com> Message-ID: "Ronald Oussoren" wrote in message news:63094894-36B0-4487-9992-F913D9749F08 at mac.com... >I'm -lots on a package named "misc". That's really poor naming, >almost as bad as "util".
Misc is the "we don't know what to do with >these"-category and completely unobvious for anyone that doesn't >already know where to look. It seems to me that misc would end up >containing all modules and packages that don't fit in one of the >preconceived toplevel packages and don't have enough peers in the >misc package to move them to their own toplevel package. --- Without a misc package, we either need to have an all-inclusive set of top level categories (difficult) or else put the oddballs at top level, which counteracts the purpose of having categories. The latter improperly highlights the oddballs and increases the chances of name-clashes -- especially when more are added. I have found catch-all categories very useful in other contexts. Terry Jan Reedy From tomerfiliba at gmail.com Sun Jun 4 21:45:24 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Sun, 4 Jun 2006 12:45:24 -0700 Subject: [Python-3000] iostack and sock2 In-Reply-To: <448258F3.3070808@gmail.com> References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com> <448258F3.3070808@gmail.com> Message-ID: <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com> you certainly have good points there. i'll start with the easy ones: >Some things that don't appear to have been considered in the iostack design yet: > - non-blocking IO and timeouts (e.g. on NetworkStreams) NetworkStreams have a readavail() method, which reads all the available in-queue data, as well as may_read and may_write properties. besides, because of the complexity of sockets (so many different options, protocols, etc), i'd leave the timeout to the socket itself. i.e. s = TcpSocket(...) s.timeout = 2 ns = NetworkStream(s) ns.read(100) > - interaction with (replacement of?) the select module well, it's too hard to design for a nonexistent module. select is all there is that's platform independent.
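[Editor's note: for context, the portable baseline referred to here is select.select(), which takes three lists of waitable objects plus an optional timeout and returns the subsets that are ready. A minimal sketch using a local socket pair (assuming a platform with socket.socketpair()):]

```python
import select
import socket

# A connected pair of sockets, so one end can make the other readable.
a, b = socket.socketpair()
b.sendall(b"ping")

# select() is the lowest-common-denominator polling call: it behaves
# the same way on every platform, unlike poll/kqueue/epoll.
readable, writable, errored = select.select([a], [a], [], 1.0)
print(a in readable)   # True -- data is waiting
print(a.recv(4))       # b'ping'
a.close()
b.close()
```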
random idea: * select is virtually platform independent * improved polling is inconsistent * kqueue is BSD-only * epoll is linux-only * windows has none of those maybe introduce a new select module that has select-objects, like the Poll() class, that will default to using select(), but could use kqueue/epoll when possible? s = Select((sock1, "r"), (sock2, "rw"), (sock3, "x")) res = s.wait(timeout = 1) for sock, events in res: .... - - - - - > The common Stream API should include a flush() write method, so that > application code doesn't need to care whether or not it is dealing with > buffered IO when forcing output to be displayed. i object. it would soon lead to things like today's StringIO, that defines isatty and flush, although it's completely meaningless. having to implement functions "just because" is ugly. i would suggest a different approach -- PseudoLayers. these are mockup layers that provide a do-nothing function only for interface consistency. each layer would define its own pseudo layer, for example: class BufferingLayer(Layer): def flush(self): ... class PseudoBufferingLayer(Layer): def flush(self): pass when you pass an unbuffered stream to a function that expects it to be buffered (requires flush, etc), you would just wrap it with the pseudo-layer. this would allow arbitrary mockup APIs to be defined by users (why should flush be that special?) - - - - - > e.g an alternative approach would be to > define InputStream and OutputStream, and then have an IOStream that inherited > from both of them). hrrm... i need to think about this more. one problem i already see: class InputStream: def close(self): ... def read(self, count): ... class OutputStream: def close(self): ... def write(self, data): ... class NetworkStream(InputStream, OutputStream): ... which version of close() gets called? - - - - - > e.g.
the 'position' property is probably a bad idea, because x.position may then raise an IOError i guess it's a reasonable approach, but i'm a "usability beats purity" guy. f.position = 0 or f.position += 10 is so much more convenient than seek()ing and tell()ing. we can also optimize += by defining a Position type where __iadd__(n) uses seek(n, "curr") instead of seek(n + tell(), "start") btw, you can first test the "seekable" attribute, to see if positioning would work. and in the worst case, i'd vote for converting IOErrors to ValueErrors... def _set_pos(self, n): try: self.seek(n) except IOError: raise ValueError("invalid position value", n) so that f.position = -10 raises a ValueError, which is logical - - - - - > The stream layer hierarchy needs to be limited to layers that both expose and > use the normal bytes-based Stream API. A separate stream interface concept is > needed for something that can be used by the application, but cannot have > other layers stacked on top of it. yeah, i wanted to do so myself, but couldn't find a good definition of what is stackable and what's not. but i like the idea. i'll think some more about that as well. > The BytesInterface differs from a normal low-level > stream primarily in the fact that it *is* line-iterable. but what's a line in a binary file? how does that make sense? binary files are usually made of records, headers, pointers, arrays of records (tables)... think of what ELF32 looks like, or a database, or core dumps -- those are binary files. what would a "line" mean to a .tar.bz2 file? - - - - - > Additionally, the 'textfile' helper tries to handle line terminators while the data is still bytes, while Unicode defines line endings > in terms of characters. As I understand it, "\x0A" (LF), "\x0D" (CR), > [...] well, currently, the TextLayer reads the stream character by character, until it finds "\n"...
the specific encoding of "\n" depends on the layer's encoding, but i don't deal with all the weird cases you mentioned. - - - - - random idea: when compiled with universal line support, python unicode should equate "\n" to any of the forementioned characters. i.e. u"\n" == u"\u2028" # True the fact unicode is stupid shouldn't make programming unicode as stupid: a newline is a newline! but then again, it could be solved with a isnewline(ch) function instead, without messing the internals of the unicode type... so that's clearly (-1). i just write it "for the record". - - - - - > I can see that behaviour being seriously annoying when you get to the end of > the stream. I'd far prefer for the stream to just give me the last bit when I > ask for it and then tell me *next* time that there isn't anything left. well, today it's done like so: while True: x = f.read(100) if not x: break in iostack, that would be done like so: try: while True: x = f.read(100) except EOFError: last_x = f.readall() # read all the leftovers (0 <= leftovers < 100) a little longer, but not illogical > If you want a method with the other behaviour, add a "readexact" API, rather > than changing the semantics of "read" (although I'd be really curious to hear > the use case for the other behaviour). well, when i work with files/sockets, i tend to send data structures over them, like records, frames, protocols, etc. if a record is said to be x bytes long, and read(x) returns less than x bytes, my code has to loop until it gets enough bytes. for example, a record-codec: class RecordCodec: .... def read(self): raw = self.substream.read(struct.calcsize(self.format)) return struct.unpack(self.format, raw) if substream.read() returns less than the expected number of bytes, as is the case with sockets, the RecordCodec would have to perform its own buffering... and it happens in so many places today. imho, any framework must follow the DRY principal... 
i wish i could expand this acronym, but then i'd repeat myself ;) since the normal use-case for read(n) is expecting n bytes, read(n) is the standard API, while readany(n) can be used for unknown lengths. and when your IO library will be packed with useful things like FramingLayer, or SerializingLayer, you would just use such frames or whatever to transfer arbitrary lengths of data, without thinking twice. it would just become the natural way of doing that. imagine how cool it could be -- SerializingLayer could mean the end of specializied protocols and statemachines. you just send an object that could take care of its own (a ChatMessage would have a .show() method, etc.), - - - - - and you still have readany >>> my_netstream.readany(100) "hello" perhaps it should be renamed readupto(n) as for code that interacts with ugly protocols like HTTP, you could use: s = TextInterface(my_netstream, "ascii") header = [] for line in s: if not line: break header.append(line) - - - - - thanks for the ideas. -tomer On 6/3/06, Nick Coghlan wrote: > tomer filiba wrote: > > hi all > > > > some time ago i wrote this huge post about stackable IO and the > > need for a new socket module. i've made some progress with > > those, and i'd like to receive feedback. > > > > * a working alpha version of the new socket module (sock2) is > > available for testing and tweaking with at > > http://sebulba.wikispaces.com/project+sock2 > > > > * i'm working on a version of iostack... but i don't expect to make > > a public release until mid july. in the meanwhile, i started a wiki > > page on my site for it (motivation, plans, design): > > http://sebulba.wikispaces.com/project+iostack > > Nice, very nice. > > Some things that don't appear to have been considered in the iostack design yet: > - non-blocking IO and timeouts (e.g. on NetworkStreams) > - interaction with (replacement of?) 
the select module > > Some other random thoughts about the current writeup: > > The design appears to implicitly assume that it is best to treat all streams > as IO streams, and raise an exception if an output operation is accessed on an > input-only stream (or vice versa). This seems like a reasonable idea to me, > but it should be mentioned explicitly (e.g an alternative approach would be to > define InputStream and OutputStream, and then have an IOStream that inherited > from both of them). > > The common Stream API should include a flush() write method, so that > application code doesn't need to care whether or not it is dealing with > buffered IO when forcing output to be displayed. > > Any operations that may touch the filesystem or network shouldn't be > properties - attribute access should never raise IOError (this is a guideline > that came out of the Path discussion). (e.g. the 'position' property is > probably a bad idea, because x.position may then raise an IOError) > > The stream layer hierarchy needs to be limited to layers that both expose and > use the normal bytes-based Stream API. A separate stream interface concept is > needed for something that can be used by the application, but cannot have > other layers stacked on top of it. Additionally, any "bytes-in-bytes-out" > transformation operation can be handled as a single codec layer that accepts > an encoding function and a decoding function. This can then be used for > compression layers, encryption layers, Golay encoding, A-law companding, AV > codecs, etc. . . 
> > StreamLayer > * ForwardingLayer - forwards all data written or read to another stream > * BufferingLayer - buffers data using given buffer size > * CodecLayer - encodes data written, decodes data read > > StreamInterface > * TextInterface - text oriented interface to a stream > * BytesInterface - byte oriented interface to a stream > * RecordInterface - record (struct) oriented interface to a stream > * ObjectInterface - object (pickle) oriented interface to a stream > > The key point about the stream interfaces is that while they will provide a > common mechanism for getting at the underlying stream, their interfaces are > otherwise unconstrained. The BytesInterface differs from a normal low-level > stream primarily in the fact that it *is* line-iterable. > > On the topic of line buffering, the Python 2.x IO stack treats binary files as > line iterable, using '\n' as a line separator (well, more strictly it's a > record separator, since we're talking about binary files). > > There's actually an RFE on SF somewhere about making the record separator > configurable in the 2.x IO stack (I raised the tracker item ages ago when > someone else made the suggestion). > > However, the streams produced by iostack's 'file' helper are not currently > line-iterable. Additionally, the 'textfile' helper tries to handle line > terminators while the data is still bytes, while Unicode defines line endings > in terms of characters. As I understand it, "\x0A" (CR), "\x0D" (LF), > "\x0A\x0D" (CRLF), "\x85" (NEL), "\x0C" (FF), "\u2028" (LS), "\u2029" (PS) > should all be treated as line terminators as far as Unicode is concerned. > > So I think line buffering and making things line iterable should be left to > the TextInterface and BytesInterface layers. TextInterface would be most > similar to the currently file interface, only working on Unicode strings > instead of 8-bit strings (as well as using the Unicode definition of what > constitutes a line ending). 
BytesInterface would work with binary files, > returning a bytes object for each record. > > So I'd tweak the helper functions to look like: > > def file(filename, mode = "r", bufsize = -1, line_sep="\n"): > f = FileStream(filename, mode) > # a bufsize of 0 or None means unbuffered > if bufsize: > f = BufferingLayer(f, bufsize) > # Use bytes interface to make file line-iterable > return BytesInterface(f, line_sep) > > def textfile(filename, mode = "r", bufsize = -1, encoding = None): > f = FileStream(filename, mode) > # a bufsize of 0 or None means unbuffered > if bufsize: > f = BufferingLayer(f, bufsize) > # Text interface deals with line terminators correctly > return TextInterface(f, encoding) > > > with lots of pretty-formatted info. i remember people saying > > that stating `read(n)` returns exactly `n` bytes is problematic, > > can you elaborate? > > I can see that behaviour being seriously annoying when you get to the end of > the stream. I'd far prefer for the stream to just give me the last bit when I > ask for it and then tell me *next* time that there isn't anything left. This > has worked well for a long time with the existing read method of file objects. > If you want a method with the other behaviour, add a "readexact" API, rather > than changing the semantics of "read" (although I'd be really curious to hear > the use case for the other behaviour). > > (Take a look at the s3.recv(100) line in your Sock2 example - how irritating > would it be for that to raise EOFError because you only got a few bytes?) > > Cheers, > Nick. 
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > --------------------------------------------------------------- > http://www.boredomandlaziness.org > From jcarlson at uci.edu Sun Jun 4 22:25:58 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 04 Jun 2006 13:25:58 -0700 Subject: [Python-3000] iostack and sock2 In-Reply-To: <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com> References: <448258F3.3070808@gmail.com> <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com> Message-ID: <20060604131031.69CF.JCARLSON@uci.edu> "tomer filiba" wrote: [snip] > > - interaction with (replacement of?) the select module > > well, it's too hard to design for a nonexisting module. select is all there > is that's platform independent. It is /relatively/ platform independent. > random idea: > * select is virtually platform independent > * improved polling is inconsistent > * kqueue is BSD-only > * epoll is linux-only > * windows has none of those Windows doesn't currently have a module designed to do this kind of thing, but it is possible to have a higher-performance method for Windows using various bits from the win32file module from pywin32 (I have been contemplating writing one, but I haven't had the time). [snip] > - - - - - > > > e.g an alternative approach would be to > > define InputStream and OutputStream, and then have an IOStream that inherited > > from both of them). > > hrrm... i need to think about this more. one problem i already see: > > class InputStream: > def close(self):.... > def read(self, count): ... > > class OutputStream: > def close(self):.... > def write(self, data)... > > class NetworkStream(InputStream, OutputStream): > ... > > which version of close() gets called? Both, you use super(). > - - - - - > > > e.g. the 'position' property is > > probably a bad idea, because x.position may then raise an IOError > > i guess it's reasonable approach, but i'm a "usability beats purity" guy. 
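[Editor's note: the cooperative-super() answer can be illustrated concretely. With new-style classes, each close() in the MRO runs exactly once, so both parents get a chance to clean up and the shared base is not closed twice. The class names here are illustrative, not from iostack, and the zero-argument super() is the modern spelling of super(Cls, self).]

```python
class Stream:
    def __init__(self):
        self.closed = []

    def close(self):
        # End of the cooperative chain.
        self.closed.append("base")

class InputStream(Stream):
    def close(self):
        self.closed.append("input")   # e.g. discard read buffer
        super().close()

class OutputStream(Stream):
    def close(self):
        self.closed.append("output")  # e.g. flush write buffer
        super().close()

class NetworkStream(InputStream, OutputStream):
    pass

# MRO: NetworkStream -> InputStream -> OutputStream -> Stream,
# so every close() runs once, in that order.
ns = NetworkStream()
ns.close()
print(ns.closed)  # ['input', 'output', 'base']
```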
> f.position = 0 > or > f.position += 10 > > is so much more convenient than seek()ing and tell()ing. we can also > optimize += by defining a Position type where __iadd__(n) uses > seek(n, "curr") instead of seek(n + tell(), "start") > > btw, you can first test the "seakable" attribute, to see if positioning > would work. > > and in the worst case, i'd vote for converting IOErrors to ValueErrors... > > def _set_pos(self, n) > try: > self.seek(n) > except IOError: > raise ValueError("invalid position value", n) > > so that > f.position = -10 > raises a ValueError, which is logical Raising a ValueError on an unseekable stream would be confusing. [snip] > - - - - - > > random idea: > when compiled with universal line support, python unicode should > equate "\n" to any of the forementioned characters. > i.e. > > u"\n" == u"\u2028" # True I'm glad that you later decided for yourself that such a thing would be utterly and completely foolish. > - - - - - > > > I can see that behaviour being seriously annoying when you get to the end of > > the stream. I'd far prefer for the stream to just give me the last bit when I > > ask for it and then tell me *next* time that there isn't anything left. > > well, today it's done like so: > > while True: > x = f.read(100) > if not x: > break > > in iostack, that would be done like so: > > try: > while True: > x = f.read(100) > except EOFError: > last_x = f.readall() # read all the leftovers (0 <= leftovers < 100) > > a little longer, but not illogical > > > If you want a method with the other behaviour, add a "readexact" API, rather > > than changing the semantics of "read" (although I'd be really curious to hear > > the use case for the other behaviour). > > well, when i work with files/sockets, i tend to send data structures over them, > like records, frames, protocols, etc. if a record is said to be x bytes long, > and read(x) returns less than x bytes, my code has to loop until it gets > enough bytes. 
Rather than changing what people expect with the current .read() method, why not offer a different method called .readexact(n), which will read exactly n bytes, performing buffering as necessary. You can then optimize by using cStringIOs, lists of strings, resizable bytes, or whatever other method you want (but be careful never to .read(bignum) unless you change the underlying .read() implementation; right now it allocates a buffer of size bignum, which can cause huge amounts of malloc/realloc thrashing, and generally causes MemoryErrors). [snip] - Josiah From jcarlson at uci.edu Sun Jun 4 22:42:41 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 04 Jun 2006 13:42:41 -0700 Subject: [Python-3000] iostack and sock2 In-Reply-To: <448258F3.3070808@gmail.com> References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com> <448258F3.3070808@gmail.com> Message-ID: <20060604132632.69D2.JCARLSON@uci.edu> Nick Coghlan wrote: [snip] > Any operations that may touch the filesystem or network shouldn't be > properties - attribute access should never raise IOError (this is a guideline > that came out of the Path discussion). (e.g. the 'position' property is > probably a bad idea, because x.position may then raise an IOError) I agree completely. > The stream layer hierarchy needs to be limited to layers that both expose and > use the normal bytes-based Stream API. A separate stream interface concept is > needed for something that can be used by the application, but cannot have > other layers stacked on top of it. Additionally, any "bytes-in-bytes-out" > transformation operation can be handled as a single codec layer that accepts > an encoding function and a decoding function. This can then be used for > compression layers, encryption layers, Golay encoding, A-law companding, AV > codecs, etc. . . 
> > StreamLayer > * ForwardingLayer - forwards all data written or read to another stream > * BufferingLayer - buffers data using given buffer size > * CodecLayer - encodes data written, decodes data read > > StreamInterface > * TextInterface - text oriented interface to a stream > * BytesInterface - byte oriented interface to a stream > * RecordInterface - record (struct) oriented interface to a stream > * ObjectInterface - object (pickle) oriented interface to a stream I think these are generally OK. [snip] As I've been reading the updated IO stack discussions since Tomer brought it up months ago, I've been generally -1 on the idea of rewriting the IO stack. I didn't know why at first, but I've figured out that it is a combination of "I enjoy writing wire protocols" and "it would be very nice if my old socket/file software continued to work in py3k". Obviously the first part will generally not be an issue (and wouldn't be sufficiently compelling to refuse the change) with the updated IO stack, but the second will be. That is, if we switched from the current IO methods to the stack, all old socket and file handling software seem as though they will break. This sounds to me like gratuitous breakage. On the other hand, I wouldn't mind a new IO stack module or package that defined wrappers and such for files, sockets, etc., along with the StreamLayer and StreamInterface bits somewhere. One could then add an interface to the previously mentioned module for asynchronous IO on *nix (I can't remember its name), with a (hopefully) updated implementation for Windows, falling back to an implementation that uses select on platforms where an updated method is not available. Whether or not we would want to make this updated select-like framework available to old sockets, files, etc., is a separate discussion. 
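[As a concreteness check on the taxonomy above, the proposed codec layer -- a single "bytes-in-bytes-out" layer taking an encoding function and a decoding function -- fits in a few lines; the class and attribute names here are a sketch, not any real API:]

```python
import io

class CodecLayer(object):
    """Bytes-in-bytes-out transformation layer over another stream:
    applies encode() on the way down, decode() on the way up."""
    def __init__(self, substream, encode, decode):
        self.substream = substream
        self.encode = encode
        self.decode = decode
    def write(self, data):
        return self.substream.write(self.encode(data))
    def read(self, count):
        return self.decode(self.substream.read(count))
    def close(self):
        self.substream.close()

# demo with a trivial involutive "codec"
raw = io.BytesIO()
layer = CodecLayer(raw, bytes.swapcase, bytes.swapcase)
layer.write(b"Hello")       # stored encoded as b"hELLO"
raw.seek(0)
roundtrip = layer.read(5)   # decoded back to b"Hello"
```

[The same shape would serve for compression or encryption layers, with the encode/decode pair swapped out.]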
- Josiah From greg.ewing at canterbury.ac.nz Mon Jun 5 00:52:07 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 05 Jun 2006 10:52:07 +1200 Subject: [Python-3000] iostack and sock2 In-Reply-To: <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com> References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com> <448258F3.3070808@gmail.com> <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com> Message-ID: <44836417.5080209@canterbury.ac.nz> tomer filiba wrote: > NetworkStreams have a readavail() method, which reads all the available > in-queue data, as well as a may_read and a may_write properties I'm -1 on having multiple kinds of read methods which are available only on some kinds of streams. The basic interface of a stream should be dirt simple. Given a read-up-to-n-bytes method, it's easy to implement read-exactly-n-bytes on top of it in a completely generic way. So provide it as a function that operates on a stream, or a method inherited from a generic base class. > maybe introduce a new select module that has select-objects, like > the Poll() class, that will default to using select(), but could use > kqueue/epoll when possible? My current opinion on select-like functionality is that you shouldn't need to import a module for it at all. Rather, you should be able to attach a callback directly to a stream. Then there just needs to be a wait_for_something_to_happen() function somewhere (perhaps with a timeout). Underneath, the implementation would use select, poll, or whatever is most fun on the platform concerned. -- Greg From mcherm at mcherm.com Mon Jun 5 15:52:07 2006 From: mcherm at mcherm.com (Michael Chermside) Date: Mon, 05 Jun 2006 06:52:07 -0700 Subject: [Python-3000] a slight change to __[get|set|del]item__ Message-ID: <20060605065207.93v8tbf3rx5w84sg@login.werra.lunarpages.com> Tomer writes: > well is func((1,2,3)) the same as func(1,2,3)? no. 
> so why should container[1, 2, 3] be the same as container[(1,2,3)]? > you say it's a feature. is it intentionally *ambiguous*? > > what you'd want in that case is > t = (1, 2, 3) > container[*t] > or something like that. > > i guess it's a dead subject, but i wanted to have that clarified. There's no ambiguity, the rule is like this: Parentheses are a piece of syntax that is used for grouping everywhere *except* in function/method argument lists (both function declarations and invocations). Empty parentheses are also used to indicate an empty tuple. The comma is a piece of syntax that has special meaning in function declarations, function/method invocations, list literals, and dictionary literals (I think that's the full list of exceptions). Everywhere else it indicates tuple creation. Admittedly, it's slightly odd that a special exception to the meaning of parentheses is made for the syntax of functions, but there is a LONG and powerful historical convention that makes this the most widely accepted syntax for function invocation. Both Smalltalk and Lisp were brilliant languages whose popularity was (IMO) severely wounded by failing to maintain this syntax for function invocation. Using the comma to separate items in collections makes good sense too. List and dictionary literals obviously fall into this category. Making tuple be "the collection with no syntax" was a clever syntactical trick that allows things like these: return a, b x, y = y, x for i, x in enumerate(aList): to feel completely natural yet still be regular in syntax. 
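[The subscription case can be checked directly: `container[1, 2, 3]` and `container[(1, 2, 3)]` hand `__getitem__` the very same tuple, so there is nothing for the language to be ambiguous about. The toy class below is only for illustration:]

```python
class ShowKey(object):
    # toy container whose __getitem__ just returns the key it received
    def __getitem__(self, key):
        return key

c = ShowKey()
assert c[1, 2, 3] == (1, 2, 3)     # commas build a tuple...
assert c[(1, 2, 3)] == c[1, 2, 3]  # ...parentheses only group it

# and tuple-as-"collection with no syntax" in action:
x, y = 1, 2
x, y = y, x
```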
-- Michael Chermside From tomerfiliba at gmail.com Mon Jun 5 18:36:30 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Mon, 5 Jun 2006 18:36:30 +0200 Subject: [Python-3000] iostack and sock2 In-Reply-To: <44836417.5080209@canterbury.ac.nz> References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com> <448258F3.3070808@gmail.com> <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com> <44836417.5080209@canterbury.ac.nz> Message-ID: <1d85506f0606050936j23ec04b8y1990e11cfc0b0cf0@mail.gmail.com> > I'm -1 on having multiple kinds of read methods which > are available only on some kinds of streams. The > basic interface of a stream should be dirt simple. it's a convenience method. instead of doing it yourself every time, readavail() returns all the available data in the socket's buffers. the basic interface should be simple and spartan, but does that mean every deriving class must not extend it? from personal experience, of myself and others i've worked with, i can tell you readavail() would be very useful. for reference, .NET sockets have it. so of course .NET is NOT a model of great design, but it does show you trends and needs of programmers. > My current opinion on select-like functionality is > that you shouldn't need to import a module for it at > all. Rather, you should be able to attach a callback > directly to a stream. Then there just needs to be > a wait_for_something_to_happen() function somewhere > (perhaps with a timeout). yes, that's how i'd do it, but then how would you wait for multiple streams? compare select([sock1, sock2, sock3], [], []) to sock1.async_read(100, callback) how can you block/wait for multiple streams? 
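[For the record, the conventional answer to the question above, with today's toolbox, is still select(): block on a list of descriptors and dispatch whichever become ready. A minimal sketch -- the `wait_readable` helper is hypothetical, only the `select.select()` call underneath is real:]

```python
import select
import socket

def wait_readable(streams, timeout=None):
    """Block until at least one of the given sockets is readable;
    a thin wrapper over select.select()."""
    ready, _, _ = select.select(streams, [], [], timeout)
    return ready

# demo: one end of a socket pair becomes readable after a send
a, b = socket.socketpair()
b.sendall(b"ping")
data = None
for s in wait_readable([a], timeout=5):
    data = s.recv(4)
a.close()
b.close()
```

[A callback-per-stream API could be layered on top of exactly this: register callbacks, then have the wait function call them for each ready stream.]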
-tomer On 6/5/06, Greg Ewing < greg.ewing at canterbury.ac.nz> wrote: > > tomer filiba wrote: > > > NetworkStreams have a readavail() method, which reads all the available > > in-queue data, as well as a may_read and a may_write properties > > I'm -1 on having multiple kinds of read methods which > are available only on some kinds of streams. The > basic interface of a stream should be dirt simple. > > Given a read-up-to-n-bytes method, it's easy to implement > read-exactly-n-bytes on top of it in a completely > generic way. So provide it as a function that operates > on a stream, or a method inherited from a generic base > class. > > > maybe introduce a new select module that has select-objects, like > > the Poll() class, that will default to using select(), but could use > > kqueue/epoll when possible? > > My current opinion on select-like functionality is > that you shouldn't need to import a module for it at > all. Rather, you should be able to attach a callback > directly to a stream. Then there just needs to be > a wait_for_something_to_happen() function somewhere > (perhaps with a timeout). > > Underneath, the implementation would use select, > poll, or whatever is most fun on the platform > concerned. > > -- > Greg > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060605/77c3b07a/attachment.html From tomerfiliba at gmail.com Mon Jun 5 19:16:40 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Mon, 5 Jun 2006 19:16:40 +0200 Subject: [Python-3000] iostack and sock2 In-Reply-To: <20060604131031.69CF.JCARLSON@uci.edu> References: <448258F3.3070808@gmail.com> <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com> <20060604131031.69CF.JCARLSON@uci.edu> Message-ID: <1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com> > > well, it's too hard to design for a nonexisting module. select is all there > > is that's platform independent. 
> > It is /relatively/ platform independent. if it runs on windows, linux, *bsd, solaris, it's virtually platform independent. i don't consider the nokia N60 or whatever the name was, as well as other esoteric environments, as "platforms", at least not such that should be taken into consideration when designing APIs and standard modules. > I didn't know why at first, but I've figured > out that it is a combination of "I enjoy writing wire protocols" and "it > would be very nice if my old socket/file software continued to work in > py3k". [...] > Rather than changing what people expect with the current .read() method, > why not offer a different method called .readexact(n), which will read > exactly n bytes, performing buffering as necessary. okay, i give up on read(n) returning n bytes. that being said, and taking into account the "helpers" i suggested (a function named file/open that is API-compliant to today's file) -- i'd assume 80% of the code would be compatible. after all, the major use-cases of IO are files and sockets. if we keep those looking the same, at least the core APIs, most code should work fine. again, don't forget sock2 is separate from iostack, and can be used by itself. it has send/recv like normal sockets, is select()able, etc... the only adaptation needed for legacy code is converting "import socket" to "import sock2" (which would be unnecessary if it became the standard socket module), as well as converting s = socket.socket() s.connect(...) to s = socket.TcpSocket(...) grepping through the source can pinpoint these locations. > > random idea: > > when compiled with universal line support, python unicode should > > equate "\n" to any of the aforementioned characters. > > i.e. > > > > u"\n" == u"\u2028" # True > > I'm glad that you later decided for yourself that such a thing would be > utterly and completely foolish. it's not foolish, it's bad. 
these are different things (foolish being "lacking a proper rationale", and bad being "destroying the very foundations of python"). but again, it was kept "for the record". > > f.position = -10 > > raises a ValueError, which is logical > > Raising a ValueError on an unseekable stream would be confusing. true, but so are TypeErrors for ArgumentErrors, or TypeErrors for HashErrors, etc. besides, why shouldn't attributes raise IOError? after all you are working with *IO*, so "s.position = -10" raising an IOError isn't all too strange. anyway, that's a technicality and the rest of the framework can suffer delaying that decision for later. > > class NetworkStream(InputStream, OutputStream): > > ... > > > > which version of close() gets called? > > Both, you use super(). if an InputStream and OutputStream are just interfaces, that's fine, but still, i don't find it acceptable for one method to be defined by two interfaces, and then have it intersected in a deriving class. perhaps the hierarchy should be class Stream: def close property closed def seek def tell class InputStream(Stream): def read def readexact def readall class OutputStream(Stream): def write but then, most of the streams, like files, pipes and sockets, would need to derive from both InputStream and OutputStream. another issue: class InputFile(InputStream): ... class OutputFile(OutputStream): ... class File(InputStream, OutputStream): .... i think there's gonna be much duplication of code, because File can't inherit from InputFile and OutputFile, as they are each a separate stream, while File is a single InOutStream. and a huge class hierarchy makes attribute lookups slower. -tomer On 6/4/06, Josiah Carlson wrote: > > "tomer filiba" wrote: > [snip] > > > - interaction with (replacement of?) the select module > > > > well, it's too hard to design for a nonexisting module. select is all there > > is that's platform independent. > > It is /relatively/ platform independent. 
> > > random idea: > > * select is virtually platform independent > > * improved polling is inconsistent > > * kqueue is BSD-only > > * epoll is linux-only > > * windows has none of those > > Windows doesn't currently have a module designed to do this kind of > thing, but it is possible to have a higher-performance method for > Windows using various bits from the win32file module from pywin32 (I > have been contemplating writing one, but I haven't had the time). > > [snip] > > > - - - - - > > > > > e.g an alternative approach would be to > > > define InputStream and OutputStream, and then have an IOStream that inherited > > > from both of them). > > > > hrrm... i need to think about this more. one problem i already see: > > > > class InputStream: > > def close(self):.... > > def read(self, count): ... > > > > class OutputStream: > > def close(self):.... > > def write(self, data)... > > > > class NetworkStream(InputStream, OutputStream): > > ... > > > > which version of close() gets called? > > Both, you use super(). > > > - - - - - > > > > > e.g. the 'position' property is > > > probably a bad idea, because x.position may then raise an IOError > > > > i guess it's reasonable approach, but i'm a "usability beats purity" guy. > > f.position = 0 > > or > > f.position += 10 > > > > is so much more convenient than seek()ing and tell()ing. we can also > > optimize += by defining a Position type where __iadd__(n) uses > > seek(n, "curr") instead of seek(n + tell(), "start") > > > > btw, you can first test the "seakable" attribute, to see if positioning > > would work. > > > > and in the worst case, i'd vote for converting IOErrors to ValueErrors... > > > > def _set_pos(self, n) > > try: > > self.seek(n) > > except IOError: > > raise ValueError("invalid position value", n) > > > > so that > > f.position = -10 > > raises a ValueError, which is logical > > Raising a ValueError on an unseekable stream would be confusing. 
> > [snip] > > - - - - - > > > > random idea: > > when compiled with universal line support, python unicode should > > equate "\n" to any of the forementioned characters. > > i.e. > > > > u"\n" == u"\u2028" # True > > I'm glad that you later decided for yourself that such a thing would be > utterly and completely foolish. > > > - - - - - > > > > > I can see that behaviour being seriously annoying when you get to the end of > > > the stream. I'd far prefer for the stream to just give me the last bit when I > > > ask for it and then tell me *next* time that there isn't anything left. > > > > well, today it's done like so: > > > > while True: > > x = f.read(100) > > if not x: > > break > > > > in iostack, that would be done like so: > > > > try: > > while True: > > x = f.read(100) > > except EOFError: > > last_x = f.readall() # read all the leftovers (0 <= leftovers < 100) > > > > a little longer, but not illogical > > > > > If you want a method with the other behaviour, add a "readexact" API, rather > > > than changing the semantics of "read" (although I'd be really curious to hear > > > the use case for the other behaviour). > > > > well, when i work with files/sockets, i tend to send data structures over them, > > like records, frames, protocols, etc. if a record is said to be x bytes long, > > and read(x) returns less than x bytes, my code has to loop until it gets > > enough bytes. > > Rather than changing what people expect with the current .read() method, > why not offer a different method called .readexact(n), which will read > exactly n bytes, performing buffering as necessary. You can then > optimize by using cStringIOs, lists of strings, resizable bytes, or > whatever other method you want (but be careful never to .read(bignum) > unless you change the underlying .read() implementation; right now it > allocates a buffer of size bignum, which can cause huge amounts of > malloc/realloc thrashing, and generally causes MemoryErrors). 
> > [snip] > > - Josiah > > From rasky at develer.com Mon Jun 5 20:00:41 2006 From: rasky at develer.com (Giovanni Bajo) Date: Mon, 5 Jun 2006 20:00:41 +0200 Subject: [Python-3000] iostack and sock2 References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com> Message-ID: <016c01c688c9$f5b212d0$bf03030a@trilan> tomer filiba wrote: > some time ago i wrote this huge post about stackable IO and the > need for a new socket module. i've made some progress with > those, and i'd like to receive feedback. > > * a working alpha version of the new socket module (sock2) is > available for testing and tweaking with at > http://sebulba.wikispaces.com/project+sock2 > > * i'm working on a version of iostack... but i don't expect to make > a public release until mid july. in the meanwhile, i started a wiki > page on my site for it (motivation, plans, design): > http://sebulba.wikispaces.com/project+iostack > with lots of pretty-formatted info. i remember people saying > that stating `read(n)` returns exactly `n` bytes is problematic, > can you elaborate? Hi Tomer, this is great stuff you're doing! It's something that's really needed in my opinion. Basically, right now there's only a convention of passing around duck-typed things which have a "read" method, and that's all! It's nice to better define this duck-typed interface, and it seems you're doing very good progress on that. I hope I have more time to properly comment on this later (I'll wait for the first iteration of comments). One thing I would like to raise is the issue of KeyboardInterrupt. I find very inconvenient that a normal application doing a very simple blocking read from a socket can't be interrupted by a CTRL+C sequence. Usually, what I do is to setup a timeout on the sockets (eg. 0.4 seconds) and then simply retry if the data has not arrived yet. 
But this changes the code from: data = sock.recv(10) to: while 1: try: data = sock.recv(10) except socket.timeout: # just so that CTRL+C is processed continue else: break which is IMO counter-intuitive and un-pythonic. It's such a convoluted code that it happened once to me that another programmer collapsed this back into the bare sock.recv() because he couldn't immediately see why that complexity was required (of course, comments might have helped and stuff, but I guess you see my point). I believe that this kind of things ought to work by default with the minimum possible amount of code. Specifically, I think that the new iostack should allow blocking mode without trapping CTRL+C by default (which is the normal behaviour expected). I'm not sure if it's worth doing the auto-retry trick internally (bleah), or implement blocking calls with a call to select() so that you can also wait on signals, or something like that; I don't have a suggestion at this point, but I thought it was worth to raise the issue. -- Giovanni Bajo From jcarlson at uci.edu Mon Jun 5 20:44:15 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Mon, 05 Jun 2006 11:44:15 -0700 Subject: [Python-3000] iostack and sock2 In-Reply-To: <1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com> References: <20060604131031.69CF.JCARLSON@uci.edu> <1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com> Message-ID: <20060605110457.69DD.JCARLSON@uci.edu> "tomer filiba" wrote: > > > > well, it's too hard to design for a nonexisting module. select is all there > > > is that's platform independent. > > > > It is /relatively/ platform independent. > > if it runs on windows, linux, *bsd, solaris, it's virtually platform > independent. > i don't consider the nokia N60 or whatever the name was, as well as other > esoteric environments, as "platforms", at least not such that should be taken > into consideration when designing APIs and standard modules. 
[the following snipped from a different reply of yours] > compare > select([sock1, sock2, sock3], [], []) > to > sock1.async_read(100, callback) > > how can you block/wait for multiple streams? Depending on the constants defined during compile time, the file handle limit can be lower or higher than expected (I once used a version with a 32 handle limit; was a bit frustrating). Also, as discussed in the 'epoll implementation' thread on python-dev, an IOCP implementation for Windows could perhaps be written in such a way to be compatible with the libevent-python project. Visiting the libevent-python example script ( http://python-hpio.net/trac/browser/Projects/libevent-python/trunk/exa mples/echo_server.py) shows us how you can do such things. [snip] > > > random idea: > > > when compiled with universal line support, python unicode should > > > equate "\n" to any of the forementioned characters. > > > i.e. > > > > > > u"\n" == u"\u2028" # True > > > > I'm glad that you later decided for yourself that such a thing would be > > utterly and completely foolish. > > it's not foolish, it's bad. these are different things (foolish being "lacking > a proper rationale", and bad being "destroying the very foundations of > python"). but again, it was kept "for the record". I don't believe it would "[destroy] the very foundations of python" (unicode is not the very foundation of Python, and it wouldn't destroy unicode, only change its comparison semantics), but I do believe it "[lacks] a proper rationale". That is; unicode.split() should work as expected (if not, it should be fixed), and it seems as though line iteration over files with an encoding specified should deal with those other line endings - though its behavior in regards to universal newlines should probably be discussed. > > > f.position = -10 > > > raises a ValueError, which is logical > > > > Raising a ValueError on an unseekable stream would be confusing. 
> > true, but so are TypeErrors for ArgumentErrors, or TypeErrors for HashErrors, > etc. besides, why shouldn't attributes raise IOError? after all you are working > with *IO*, so "s.position = -10" raising an IOError isn't all too strange. > anyway, that's a technicality and the rest of the framework can suffer delaying > that decision for later. What other properties on other classes do is their own business. We are talking about this particular implementation of this particular feature on this particular class (or set of related classes). If given the choice of a ValueError or an IOError on f.position failure, I would opt for IOError; but I would prefer f.seek() and f.tell(), because with f.seek() you can use the "whence" parameter to get absolute or relative seeking. > > > class NetworkStream(InputStream, OutputStream): > > > ... > > > > > > which version of close() gets called? > > > > Both, you use super(). > > if an InputStream and OutputStream are just interfaces, that's fine, > but still, i don't find it acceptable for one method to be defined by > two interfaces, and then have it intersected in a deriving class. So have both InputStream and OutputStream use super to handle other possible .close() calls, rather than making their subclasses do so. > perhaps the hierarchy should be > > class Stream: > def close > property closed > def seek > def tell > > class InputStream(Stream): > def read > def readexact > def readall > > class OutputStream(Stream): > def write > > but then, most of the streams, like files, pipes and sockets, > would need to derive from both InputStream and OutputStream. But then there are other streams where you want to call two *different* .close() methods, and the above would only allow for 1. Closing multiple times shouldn't be a problem for most streams, but not closing enough could be a problem. > another issue: > > class InputFile(InputStream) > ... > class OutputFile(OutputStream): > ... 
> class File(InputStream, OutputStream): > .... > > i think there's gonna be much duplication of code, because FIle can't > inherit from InputFile and OutputFile, as they are each a separate stream, > while File is a single InOutStream. > > and a huge class hierarchy makes attribute lookups slower. Have you tried to measure this? In my tests (with Python 2.3), it's somewhere on the order of .2 microseconds per operation of difference between the original class and a 7th level subclass (i subclasses h, which subclasses g, which subclasses f, which subclasses, e, ...). - Josiah From tomerfiliba at gmail.com Mon Jun 5 21:06:48 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Mon, 5 Jun 2006 21:06:48 +0200 Subject: [Python-3000] iostack and sock2 In-Reply-To: <016c01c688c9$f5b212d0$bf03030a@trilan> References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com> <016c01c688c9$f5b212d0$bf03030a@trilan> Message-ID: <1d85506f0606051206q20d9663am41bc47832c2486d4@mail.gmail.com> hey > One thing I would like to raise is the issue of KeyboardInterrupt. I find > very inconvenient that a normal application doing a very simple blocking > read from a socket can't be interrupted by a CTRL+C sequence. Usually, what > I do is to setup a timeout on the sockets (eg. 0.4 seconds) and then simply > retry if the data has not arrived yet. But this changes the code from: from my experience with linux and solaris, this CTRL+C problem only happens on windows machines. but then again, windows can't select() on anything but sockets, so there's not gonna be a generic solution. setting timeouts has some issues (inefficiency, platform dependency, etc.). but it's a good point to take into account. i'll see where that fits. -tomer On 6/5/06, Giovanni Bajo wrote: > tomer filiba wrote: > > > some time ago i wrote this huge post about stackable IO and the > > need for a new socket module. i've made some progress with > > those, and i'd like to receive feedback. 
> > > > * a working alpha version of the new socket module (sock2) is > > available for testing and tweaking with at > > http://sebulba.wikispaces.com/project+sock2 > > > > * i'm working on a version of iostack... but i don't expect to make > > a public release until mid july. in the meanwhile, i started a wiki > > page on my site for it (motivation, plans, design): > > http://sebulba.wikispaces.com/project+iostack > > with lots of pretty-formatted info. i remember people saying > > that stating `read(n)` returns exactly `n` bytes is problematic, > > can you elaborate? > > Hi Tomer, this is great stuff you're doing! It's something that's really > needed in my opinion. Basically, right now there's only a convention of > passing around duck-typed things which have a "read" method, and that's all! > It's nice to better define this duck-typed interface, and it seems you're > doing very good progress on that. I hope I have more time to properly > comment on this later (I'll wait for the first iteration of comments). > > One thing I would like to raise is the issue of KeyboardInterrupt. I find > very inconvenient that a normal application doing a very simple blocking > read from a socket can't be interrupted by a CTRL+C sequence. Usually, what > I do is to setup a timeout on the sockets (eg. 0.4 seconds) and then simply > retry if the data has not arrived yet. But this changes the code from: > > data = sock.recv(10) > > to: > > while 1: > try: > data = sock.recv(10) > except socket.timeout: > # just so that CTRL+C is processed > continue > else: > break > > which is IMO counter-intuitive and un-pythonic. It's such a convoluted code > that it happened once to me that another programmer collapsed this back into > the bare sock.recv() because he couldn't immediately see why that complexity > was required (of course, comments might have helped and stuff, but I guess > you see my point). 
> > I believe that this kind of things ought to work by default with the minimum > possible amount of code. Specifically, I think that the new iostack should > allow blocking mode without trapping CTRL+C by default (which is the normal > behaviour expected). I'm not sure if it's worth doing the auto-retry trick > internally (bleah), or implement blocking calls with a call to select() so > that you can also wait on signals, or something like that; I don't have a > suggestion at this point, but I thought it was worth to raise the issue. > -- > Giovanni Bajo > > From tomerfiliba at gmail.com Mon Jun 5 21:26:21 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Mon, 5 Jun 2006 21:26:21 +0200 Subject: [Python-3000] iostack and sock2 In-Reply-To: <20060605110457.69DD.JCARLSON@uci.edu> References: <20060604131031.69CF.JCARLSON@uci.edu> <1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com> <20060605110457.69DD.JCARLSON@uci.edu> Message-ID: <1d85506f0606051226q7fa13a54jc69442fe029f5993@mail.gmail.com> > I don't believe it would "[destroy] the very foundations of python" > (unicode is not the very foundation of Python, and it wouldn't destroy > unicode, only change its comparison semantics), but I do believe it > "[lacks] a proper rationale" no, it would break the basic rules of comparison. all of a sudden, 0x0a == 0x2028 == 0x85, etc. so you can't tell whether you got a "\x0a" character or a "\x85" one... it would make python inconsistent. but this discussion is silly, let's quit it. we are both -1 on it. > That is; unicode.split() should work as > expected (if not, it should be fixed), and it seems as though line > iteration over files with an encoding specified should deal with those > other line endings - though its behavior in regards to universal > newlines should probably be discussed. 
unicode being native to python is gonna be one big pain to implement :) > If given the > choice of a ValueError or an IOError on f.position failure, I would opt > for IOError; so would i. > but I would prefer f.seek() and f.tell(), because with > f.seek() you can use the "whence" parameter to get absolute or relative > seeking. yes, but +=/-= can be overriden to provide "efficient seeking". and, just thought about it: just like negative indexes of sequences, negative positions should be relative to the end of the stream. for example: f.position = 4 # absolute -- seek(4, "start") f.position += 6 # relative to current -- seek(6, "curr") f.position = -7 # relative to end of stream -- seek(-7, "end") that's easy to implement and easy AND efficient to work with. > But then there are other streams where you want to call two *different* > .close() methods, and the above would only allow for 1. Closing > multiple times shouldn't be a problem for most streams, but not closing > enough could be a problem. hrrm... what do you mean by "closing multiple times"? like socket.shutdown for reading or for writing? but other than sockets, what else can be closed in multiple ways? you can't close the "reading" of a file, while keeping it open for writing. f = FileStream(...) InputStream.close(f) f.write(...) # exception: stream closed > Visiting the libevent-python example script ( > http://python-hpio.net/trac/browser/Projects/libevent-python/trunk/exa > mples/echo_server.py) shows us how you can do such things. i didn't see this before. i'll look into it. > > and a huge class hierarchy makes attribute lookups slower. > Have you tried to measure this? In my tests (with Python 2.3), it's > somewhere on the order of .2 microseconds per operation of difference no, i guess i fell for the common urban legend. sorry. thanks for the feedback. -tomer On 6/5/06, Josiah Carlson wrote: > > "tomer filiba" wrote: > > > > > > well, it's too hard to design for a nonexisting module. 
select is all there > > > > is that's platform independent. > > > > > > It is /relatively/ platform independent. > > > > if it runs on windows, linux, *bsd, solaris, it's virtually platform > > independent. > > i don't consider the nokia N60 or whatever the name was, as well as other > > esoteric environments, as "platforms", at least not such that should be taken > > into consideration when designing APIs and standard modules. > > [the following snipped from a different reply of yours] > > compare > > select([sock1, sock2, sock3], [], []) > > to > > sock1.async_read(100, callback) > > > > how can you block/wait for multiple streams? > > Depending on the constants defined during compile time, the file handle > limit can be lower or higher than expected (I once used a version with a > 32 handle limit; was a bit frustrating). Also, as discussed in the > 'epoll implementation' thread on python-dev, an IOCP implementation for > Windows could perhaps be written in such a way to be compatible with the > libevent-python project. Visiting the libevent-python example script ( > http://python-hpio.net/trac/browser/Projects/libevent-python/trunk/exa > mples/echo_server.py) shows us how you can do such things. > > [snip] > > > > random idea: > > > > when compiled with universal line support, python unicode should > > > > equate "\n" to any of the forementioned characters. > > > > i.e. > > > > > > > > u"\n" == u"\u2028" # True > > > > > > I'm glad that you later decided for yourself that such a thing would be > > > utterly and completely foolish. > > > > it's not foolish, it's bad. these are different things (foolish being "lacking > > a proper rationale", and bad being "destroying the very foundations of > > python"). but again, it was kept "for the record". 
> > I don't believe it would "[destroy] the very foundations of python" > (unicode is not the very foundation of Python, and it wouldn't destroy > unicode, only change its comparison semantics), but I do believe it > "[lacks] a proper rationale". That is; unicode.split() should work as > expected (if not, it should be fixed), and it seems as though line > iteration over files with an encoding specified should deal with those > other line endings - though its behavior in regards to universal > newlines should probably be discussed. > > > > > > f.position = -10 > > > > raises a ValueError, which is logical > > > > > > Raising a ValueError on an unseekable stream would be confusing. > > > > true, but so are TypeErrors for ArgumentErrors, or TypeErrors for HashErrors, > > etc. besides, why shouldn't attributes raise IOError? after all you are working > > with *IO*, so "s.position = -10" raising an IOError isn't all too strange. > > anyway, that's a technicality and the rest of the framework can suffer delaying > > that decision for later. > > What other properties on other classes do is their own business. We are > talking about this particular implementation of this particular feature > on this particular class (or set of related classes). If given the > choice of a ValueError or an IOError on f.position failure, I would opt > for IOError; but I would prefer f.seek() and f.tell(), because with > f.seek() you can use the "whence" parameter to get absolute or relative > seeking. > > > > > > class NetworkStream(InputStream, OutputStream): > > > > ... > > > > > > > > which version of close() gets called? > > > > > > Both, you use super(). > > > > if an InputStream and OutputStream are just interfaces, that's fine, > > but still, i don't find it acceptable for one method to be defined by > > two interfaces, and then have it intersected in a deriving class. 
> > So have both InputStream and OutputStream use super to handle other > possible .close() calls, rather than making their subclasses do so. > > > > perhaps the hierarchy should be > > > > class Stream: > > def close > > property closed > > def seek > > def tell > > > > class InputStream(Stream): > > def read > > def readexact > > def readall > > > > class OutputStream(Stream): > > def write > > > > but then, most of the streams, like files, pipes and sockets, > > would need to derive from both InputStream and OutputStream. > > But then there are other streams where you want to call two *different* > .close() methods, and the above would only allow for 1. Closing > multiple times shouldn't be a problem for most streams, but not closing > enough could be a problem. > > > > another issue: > > > > class InputFile(InputStream) > > ... > > class OutputFile(OutputStream): > > ... > > class File(InputStream, OutputStream): > > .... > > > > i think there's gonna be much duplication of code, because FIle can't > > inherit from InputFile and OutputFile, as they are each a separate stream, > > while File is a single InOutStream. > > > > and a huge class hierarchy makes attribute lookups slower. > > Have you tried to measure this? In my tests (with Python 2.3), it's > somewhere on the order of .2 microseconds per operation of difference > between the original class and a 7th level subclass (i subclasses h, > which subclasses g, which subclasses f, which subclasses, e, ...). 
> > > - Josiah > > From greg.ewing at canterbury.ac.nz Tue Jun 6 02:26:17 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 06 Jun 2006 12:26:17 +1200 Subject: [Python-3000] iostack and sock2 In-Reply-To: <1d85506f0606050936j23ec04b8y1990e11cfc0b0cf0@mail.gmail.com> References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com> <448258F3.3070808@gmail.com> <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com> <44836417.5080209@canterbury.ac.nz> <1d85506f0606050936j23ec04b8y1990e11cfc0b0cf0@mail.gmail.com> Message-ID: <4484CBA9.6070507@canterbury.ac.nz> tomer filiba wrote: > I wrote: > > My current opinion on select-like functionality is > > that you shouldn't need to import a module for it at > > all. Rather, you should be able to attach a callback > > directly to a stream. Then there just needs to be > > a wait_for_something_to_happen() function somewhere > > (perhaps with a timeout). > > yes, that's how i'd do it, but then how would you wait for > multiple streams? # somewhere in the program stream1.on_readable = handle_stream1 # somewhere else stream2.on_readable = handle_stream2 # and the main loop says while program_is_running(): wait_for_streams() The wait_for_streams function waits for activity on any stream which has a callback, and calls it. (BTW, I actually think this sort of functionality should be part of the OS kernel, with event-driven programs and libraries being so important nowadays. Sort of like being able to define signal handlers for file descriptors instead of having a small, fixed number of signals.) 
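[Editorial aside: Greg's per-stream callback model can be approximated on top of select(). The function names follow his sketch; the registry mechanics below are an illustration, not a proposed API.]

```python
import select
import socket

_callbacks = {}  # stream -> callback, fired when the stream is readable

def set_on_readable(stream, callback):
    _callbacks[stream] = callback

def wait_for_streams(timeout=None):
    # Wait for activity on any stream that has a callback, and call it.
    ready, _, _ = select.select(list(_callbacks), [], [], timeout)
    for stream in ready:
        _callbacks[stream](stream)

# somewhere in the program
a, b = socket.socketpair()
received = []
set_on_readable(a, lambda s: received.append(s.recv(100)))

# somewhere else, data arrives; and the main loop dispatches it
b.sendall(b"ping")
wait_for_streams(1.0)
print(received)  # -> [b'ping']
```

The waiting function, not the caller, decides which callback fires — which is the point of Greg's design.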
-- Greg From greg.ewing at canterbury.ac.nz Tue Jun 6 02:32:51 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 06 Jun 2006 12:32:51 +1200 Subject: [Python-3000] iostack and sock2 In-Reply-To: <1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com> References: <448258F3.3070808@gmail.com> <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com> <20060604131031.69CF.JCARLSON@uci.edu> <1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com> Message-ID: <4484CD33.10400@canterbury.ac.nz> tomer filiba wrote: > okay, i give up on read(n) returning n bytes. An idea I had about this some time ago was that read() could be callable with two arguments: f.read(min_bytes, max_bytes) The two variations we're considering would then be special cases of this: f.read(0, num_bytes) # current read() behaviour f.read(num_bytes, num_bytes) # record-oriented read() behaviour -- Greg From greg.ewing at canterbury.ac.nz Tue Jun 6 02:57:07 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 06 Jun 2006 12:57:07 +1200 Subject: [Python-3000] iostack and sock2 In-Reply-To: <1d85506f0606051226q7fa13a54jc69442fe029f5993@mail.gmail.com> References: <20060604131031.69CF.JCARLSON@uci.edu> <1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com> <20060605110457.69DD.JCARLSON@uci.edu> <1d85506f0606051226q7fa13a54jc69442fe029f5993@mail.gmail.com> Message-ID: <4484D2E3.20402@canterbury.ac.nz> tomer filiba wrote: > yes, but +=/-= can be overriden to provide "efficient seeking". and, just > thought about it: just like negative indexes of sequences, negative positions > should be relative to the end of the stream. 
for example: > > f.position = 4 # absolute -- seek(4, "start") > f.position += 6 # relative to current -- seek(6, "curr") > f.position = -7 # relative to end of stream -- seek(-7, "end") How would you seek to exactly the end of the file, without introducing signed integer zeroes to Python?-) -- Greg From jimjjewett at gmail.com Tue Jun 6 02:57:24 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 5 Jun 2006 20:57:24 -0400 Subject: [Python-3000] [Python-Dev] Stdlib Logging questions (PEP 337 SoC) In-Reply-To: <5.1.1.6.0.20060604231709.02f07700@mail.telecommunity.com> References: <5.1.1.6.0.20060604231709.02f07700@mail.telecommunity.com> Message-ID: On 6/4/06, Phillip J. Eby wrote: > can we please delay the import until it's actually needed? i.e., > until after some logging option is enabled? I have asked her to make this change. I don't like the extra conditional dance it causes, but I agree that not wanting to log is a valid use case. On the other hand, the one-time import cost is pretty low for a long-running process, and eventually gets paid if any other module calls logging. Would it make more sense to offer a null package that can be installed earlier in the search path if you want to truly disable logging? -jJ From jimjjewett at gmail.com Tue Jun 6 03:05:15 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 5 Jun 2006 21:05:15 -0400 Subject: [Python-3000] [Python-Dev] Stdlib Logging questions (PEP 337 SoC) In-Reply-To: References: <5.1.1.6.0.20060604231709.02f07700@mail.telecommunity.com> Message-ID: oops -- this was meant for python-dev, not python-3000. 
From lunz at falooley.org Tue Jun 6 03:44:36 2006 From: lunz at falooley.org (Jason Lunz) Date: Tue, 6 Jun 2006 01:44:36 +0000 (UTC) Subject: [Python-3000] iostack and sock2 References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com> <448258F3.3070808@gmail.com> <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com> <44836417.5080209@canterbury.ac.nz> <1d85506f0606050936j23ec04b8y1990e11cfc0b0cf0@mail.gmail.com> <4484CBA9.6070507@canterbury.ac.nz> Message-ID: greg.ewing at canterbury.ac.nz said: > (BTW, I actually think this sort of functionality should be part of > the OS kernel, with event-driven programs and libraries being so > important nowadays. Sort of like being able to define signal handlers > for file descriptors instead of having a small, fixed number of > signals.) do you mean that hypothetically? That's supported on linux, but I don't know how portable it is. You *can* define signal handlers for file descriptors. See F_SETSIG in fcntl(2), and sigaction(2). Jason

From talin at acm.org Tue Jun 6 09:53:38 2006 From: talin at acm.org (Talin) Date: Tue, 06 Jun 2006 00:53:38 -0700 Subject: [Python-3000] String formatting: Conversion specifiers Message-ID: <44853482.3050103@acm.org>

I've been slowly working on PEP 3101, specifically fleshing out the details, and there are a couple of issues that I wanted to run by the group mind here.

Originally, I decided to punt on the issue of field conversion specifiers (i.e. %2.2s etc.) and simply say that they were unchanged from the existing implementation. However, I've been looking over the source for PyString_Format, and I'm thinking that the code for handling field conversions is a lot more complicated than what we really need here.

Here is a list of the conversion types that are currently supported by the % operator. First thing you notice is an eerie similarity between this and the documentation for 'sprintf'. :)

    Conversion   Meaning                                                   Notes
    d            Signed integer decimal.
    i            Signed integer decimal.
    o            Unsigned octal.                                           (1)
    u            Unsigned decimal.
    x            Unsigned hexadecimal (lowercase).                         (2)
    X            Unsigned hexadecimal (uppercase).                         (2)
    e            Floating point exponential format (lowercase).
    E            Floating point exponential format (uppercase).
    f            Floating point decimal format.
    F            Floating point decimal format.
    g            Same as "e" if exponent is greater than -4 or less than
                 precision, "f" otherwise.
    G            Same as "E" if exponent is greater than -4 or less than
                 precision, "F" otherwise.
    c            Single character (accepts integer or single character
                 string).
    r            String (converts any python object using repr()).         (3)
    s            String (converts any python object using str()).          (4)
    %            No argument is converted, results in a "%" character in
                 the result.

Now, unlike C, in Python we already know the type of the thing we're going to print. So there's no need to tell the system 'this is a float' or 'this is an integer'. The only way I could see this being useful is if you had a type and wanted it to print out as some different type - but is that really the proper role of the string formatter? Similarly, what does it mean to have an 'unsigned' quantity in Python? If you say "print this negative number as unsigned", what does that mean? Does it take the absolute value, or does it do what C does and take the number modulo 2^32? Neither seems particularly correct or intuitive to me.

So I decided to sit down and rethink the whole conversion specifier system. I looked at the docs for the '%' operator, and some other languages, and here is what I came up with (this is an excerpt from the revised PEP.) Oh, and I should mention that I have a working implementation of what is described below.

--------------------------
Standard Conversion Specifiers

Most built-in types will support a standard set of conversion specifiers. These are similar in concept to the conversion specifiers used by the existing '%' operator, however there are also a number of significant differences.
The general form of the standard conversion specifier is:

    [flags][length][.precision][type]

The brackets ([]) indicate an optional field. The flags can be one of the following:

    '+' - indicates that a sign should be used for both positive as
          well as negative numbers (normally only negative numbers
          will have a sign.)
    '<' - Forces the field to be left-aligned within the available
          space (This is the default.)
    '>' - Forces the field to be right-aligned within the available
          space.
    '0' - Causes any leftover space in the field to be filled with
          leading zeros. Note that this option also implies that the
          field is right-aligned.
    ' ' - Causes the leftover space in the field to be filled with
          spaces.

'length' is the minimum field width. If not specified, then the field width will be determined by the content.

For a numeric value, 'precision' is the number of digits after the decimal point that should be displayed.

Finally, the 'type' determines how the data should be presented. It is generally only used for numeric types - string types do not need to indicate a type. The available types are:

    'b' - Binary. Outputs the number in base 2.
    'c' - Character. Converts the integer to the corresponding
          unicode character before printing.
    'd' - Decimal Integer. Prints only the whole-number portion of
          the number.
    'e' - Exponent notation. Prints the number in scientific
          notation using the letter 'e' to indicate the exponent.
    'E' - Exponent notation. Same as 'e' except it uses an upper
          case 'E' as the separator character.
    'f' - Fixed point. Displays the number as a fixed-point number.
    'F' - Fixed point. Same as 'f'.
    'g' - General format. This prints the number as a fixed-point
          number, unless the number is too large, in which case it
          switches to exponent notation.
    'G' - General format. Same as 'g' except switches to 'E' if the
          number gets too large.
    'n' - Number. This is the same as 'g', except that it uses the
          current locale setting to insert the appropriate number
          separator characters.
    'o' - Octal format. Outputs the number in base 8.
    'r' - Repr format. Outputs the value in a format which is likely
          to be readable by the interpreter. Also works with
          non-numeric fields.
    'x' - Hex format. Outputs the number in base 16, using lower-case
          letters for the upper digits.
    'X' - Hex format. Outputs the number in base 16, using upper-case
          letters for the upper digits.
    '%' - Percentage. Multiplies the number by 100 and displays in
          fixed ('f') format, followed by a percent sign.

For non-built-in types, the conversion specifiers will be specific to that type. An example is the 'datetime' class, whose conversion specifiers might look something like the arguments to the strftime() function:

    "Today is: {0:a b d H:M:S Y}".format(datetime.now())

-- Talin

From rasky at develer.com Tue Jun 6 10:34:11 2006 From: rasky at develer.com (Giovanni Bajo) Date: Tue, 6 Jun 2006 10:34:11 +0200 Subject: [Python-3000] iostack and sock2 References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com><016c01c688c9$f5b212d0$bf03030a@trilan> <1d85506f0606051206q20d9663am41bc47832c2486d4@mail.gmail.com> Message-ID: <009301c68943$fc37fe60$3db72997@bagio> tomer filiba wrote: >> One thing I would like to raise is the issue of KeyboardInterrupt. I >> find very inconvenient that a normal application doing a very simple >> blocking read from a socket can't be interrupted by a CTRL+C >> sequence. Usually, what I do is to setup a timeout on the sockets >> (eg. 0.4 seconds) and then simply retry if the data has not arrived >> yet. But this changes the code from: > > from my experience with linux and solaris, this CTRL+C problem only > happens on windows machines. but then again, windows can't select() > on anything but sockets, so there's not gonna be a generic solution. Windows has WaitForMultipleObjects() which can be used to multiplex between sockets and other handles.
Giovanni Bajo From ncoghlan at gmail.com Tue Jun 6 11:51:32 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 06 Jun 2006 19:51:32 +1000 Subject: [Python-3000] iostack and sock2 In-Reply-To: <4484CD33.10400@canterbury.ac.nz> References: <448258F3.3070808@gmail.com> <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com> <20060604131031.69CF.JCARLSON@uci.edu> <1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com> <4484CD33.10400@canterbury.ac.nz> Message-ID: <44855024.1010408@gmail.com> Greg Ewing wrote: > tomer filiba wrote: > >> okay, i give up on read(n) returning n bytes. > > An idea I had about this some time ago was that read() > could be callable with two arguments: > > f.read(min_bytes, max_bytes) > > The two variations we're considering would then be special > cases of this: > > f.read(0, num_bytes) # current read() behaviour > > f.read(num_bytes, num_bytes) # record-oriented read() behaviour You can even makes this backwards compatible by having the min_bytes argument default to 0. (whether or not the order of the two arguments should be reversed in that case is debatable, though) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Tue Jun 6 11:47:28 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 06 Jun 2006 19:47:28 +1000 Subject: [Python-3000] iostack and sock2 In-Reply-To: <4484D2E3.20402@canterbury.ac.nz> References: <20060604131031.69CF.JCARLSON@uci.edu> <1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com> <20060605110457.69DD.JCARLSON@uci.edu> <1d85506f0606051226q7fa13a54jc69442fe029f5993@mail.gmail.com> <4484D2E3.20402@canterbury.ac.nz> Message-ID: <44854F30.60500@gmail.com> Greg Ewing wrote: > tomer filiba wrote: > >> yes, but +=/-= can be overriden to provide "efficient seeking". 
and, just >> thought about it: just like negative indexes of sequences, negative positions >> should be relative to the end of the stream. for example: >> >> f.position = 4 # absolute -- seek(4, "start") >> f.position += 6 # relative to current -- seek(6, "curr") >> f.position = -7 # relative to end of stream -- seek(-7, "end") > > How would you seek to exactly the end of the file, > without introducing signed integer zeroes to Python?-) Since it doesn't mean anything else, you could define a position of "None" as meaning 'just past the last valid byte in the file' (i.e., right at the end). Then "f.position = None" would seek to the end. This actually matches the way None behaves when it is used as the endpoint of a slice: range(3)[0:None] returns [0, 1, 2]. If that's not intuitive enough for your tastes, then a class attribute would also work: f.position = f.END (f.END would be serving as 'signed zero', since f.position -= 1 and f.position = -1 would do the same thing) FWIW, I also realised my objection to properties raising IOError doesn't apply for IO streams - unlike path objects, an IO stream *only* makes sense if you have access to the underlying IO layer. So documenting certain properties as potentially raising IOError seems legitimate in this case. I'm glad I thought of that, since I like this API a lot better than mucking around with passing strings to seek() ;) Cheers, Nick.
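[Editorial aside: the None/f.END variant of the position property can be sketched over any seekable stream. PositionedStream and its details below are illustrative, not the proposed iostack API.]

```python
import io

class PositionedStream:
    END = None  # sentinel meaning 'just past the last valid byte'

    def __init__(self, raw):
        self.raw = raw  # any seekable file-like object

    @property
    def position(self):
        return self.raw.tell()

    @position.setter
    def position(self, value):
        if value is None:    # f.position = f.END -> end of stream
            self.raw.seek(0, io.SEEK_END)
        elif value < 0:      # negative -> relative to the end
            self.raw.seek(value, io.SEEK_END)
        else:                # non-negative -> absolute
            self.raw.seek(value, io.SEEK_SET)

f = PositionedStream(io.BytesIO(b"0123456789"))
f.position = 4        # absolute
f.position += 3       # tell() + 3, i.e. relative to current
f.position = -3       # three bytes back from the end
print(f.raw.read())   # -> b'789'
f.position = f.END    # exactly at the end; no signed zero needed
```

Augmented assignment falls out for free: the getter supplies tell(), the setter seeks to the sum, which is equivalent to a relative seek whenever the result is non-negative.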
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ronaldoussoren at mac.com Tue Jun 6 12:06:19 2006 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Tue, 6 Jun 2006 12:06:19 +0200 Subject: [Python-3000] iostack and sock2 In-Reply-To: <44855024.1010408@gmail.com> References: <448258F3.3070808@gmail.com> <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com> <20060604131031.69CF.JCARLSON@uci.edu> <1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com> <4484CD33.10400@canterbury.ac.nz> <44855024.1010408@gmail.com> Message-ID: On 6-jun-2006, at 11:51, Nick Coghlan wrote: > Greg Ewing wrote: >> tomer filiba wrote: >> >>> okay, i give up on read(n) returning n bytes. >> >> An idea I had about this some time ago was that read() >> could be callable with two arguments: >> >> f.read(min_bytes, max_bytes) >> >> The two variations we're considering would then be special >> cases of this: >> >> f.read(0, num_bytes) # current read() behaviour >> >> f.read(num_bytes, num_bytes) # record-oriented read() behaviour > > You can even make this backwards compatible by having the > min_bytes argument > default to 0. (whether or not the order of the two arguments should be > reversed in that case is debatable, though) I'm slightly worried about this thread. Async I/O and "read exactly N bytes" don't really match up. I don't know about the other mechanisms, but at least with select and poll when the system says you can read from a file descriptor you're only guaranteed that one call to read(2)/recv(2)/... won't block. The implementation of a python read method that returns exactly the number of bytes that you requested will have to call the read system call in a loop and hence might block. There's also the issue of error handling: what happens when the first call to the read system call doesn't return enough data and the second call fails?
Does this raise an exception (I suppose it does) and if so, what happens with the data that was returned by the first call to the read system call? All in all I'm not too thrilled by having this behaviour. It is handy when implementing record-oriented I/O, but not when doing line- oriented I/O. BTW. Has anyone looked at the consequences of the new iostack and sock2 for libraries like Twisted? Ronald From ncoghlan at gmail.com Tue Jun 6 12:45:17 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 06 Jun 2006 20:45:17 +1000 Subject: [Python-3000] String formatting: Conversion specifiers In-Reply-To: <44853482.3050103@acm.org> References: <44853482.3050103@acm.org> Message-ID: <44855CBD.3010507@gmail.com> Talin wrote: > So I decided to sit down and rethink the whole conversion specifier > system. I looked at the docs for the '%' operator, and some other > languages, and here is what I came up with (this is an excerpt from the > revised PEP.) Generally nice, but I'd format the writeup a bit differently (see below) and reorder the elements so that an arbitrary character can be supplied as the fill character and the old ' ' sign flag behaviour remains available. I'd also design it so that the standard conversion specifiers are available 'for free' (i.e., they work for any class, unless the class author deliberately replaces them with something else). Cheers, Nick. -------------------------------- Standard Conversion Specifiers If an object does not define its own conversion specifiers, a standard set of conversion specifiers are used. These are similar in concept to the conversion specifiers used by the existing '%' operator, however there are also a number of significant differences. The standard conversion specifiers fall into three major categories: string conversions, integer conversions and floating point conversions. The general form of a string conversion specifier is: [[fill][align]width][type] The brackets ([]) indicate an optional field. 
'width' is a decimal integer defining the minimum field width. If not specified, then the field width will be determined by the content.

If the minimum field width is defined, then the optional align flag can be one of the following:

    '<' - Forces the field to be left-aligned within the available
          space (This is the default.)
    '>' - Forces the field to be right-aligned within the available
          space.

The optional 'fill' character defines the character to be used to pad the field to the minimum width. The alignment flag must be supplied if the character is a number other than 0 (otherwise the character would be interpreted as part of the field width specifier).

Finally, the 'type' determines how the data should be presented. The available string conversion types are:

    's' - String format. Invokes str() on the object. This is the
          default conversion specifier type.
    'r' - Repr format. Invokes repr() on the object.

The general form of an integer conversion specifier is:

    [[fill][align]width][sign]type

The 'fill', 'align' and 'width' fields are as for string conversion specifiers. The 'sign' field can be one of the following:

    '+'  - indicates that a sign should be used for both positive as
           well as negative numbers
    '-'  - indicates that a sign should be used only for negative
           numbers (this is the default behaviour)
    ' '  - indicates that a leading space should be used on positive
           numbers
    '()' - indicates that negative numbers should be surrounded by
           parentheses

There are several integer conversion types. All invoke int() on the object before attempting to format it. The available integer conversion types are:

    'b' - Binary. Outputs the number in base 2.
    'c' - Character. Converts the integer to the corresponding
          unicode character before printing.
    'd' - Decimal Integer. Outputs the number in base 10.
    'o' - Octal format. Outputs the number in base 8.
    'x' - Hex format. Outputs the number in base 16, using lower-case
          letters for the digits above 9.
    'X' - Hex format. Outputs the number in base 16, using upper-case
          letters for the digits above 9.

The general form of a floating point conversion specifier is:

    [[fill][align]width][.precision][sign]type

The 'fill', 'align', 'width' and 'sign' fields are as for integer conversion specifiers. The 'precision' field is a decimal number indicating how many digits should be displayed after the decimal point.

There are several floating point conversion types. All invoke float() on the object before attempting to format it. The available floating point conversion types are:

    'e' - Exponent notation. Prints the number in scientific
          notation using the letter 'e' to indicate the exponent.
    'E' - Exponent notation. Same as 'e' except it uses an upper
          case 'E' as the separator character.
    'f' - Fixed point. Displays the number as a fixed-point number.
    'F' - Fixed point. Same as 'f'.
    'g' - General format. This prints the number as a fixed-point
          number, unless the number is too large, in which case it
          switches to 'e' exponent notation.
    'G' - General format. Same as 'g' except switches to 'E' if the
          number gets too large.
    'n' - Number. This is the same as 'g', except that it uses the
          current locale setting to insert the appropriate number
          separator characters.
    '%' - Percentage. Multiplies the number by 100 and displays in
          fixed ('f') format, followed by a percent sign.

Objects are able to define their own conversion specifiers to replace the standard ones.
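[Editorial aside: most of the scheme above survived into the format-spec mini-language that PEP 3101 eventually produced, so the proposed specifiers can be spot-checked with the built-in format() of a current Python — the '()' sign flag is one detail that did not land.]

```python
# Spot-checks of the proposed standard conversion specifiers, as they
# behave in the mini-language PEP 3101 ultimately standardized.
print(format(255, 'x'))     # 'ff'        hex, lower-case digits
print(format(6, 'b'))       # '110'       binary
print(format(97, 'c'))      # 'a'         integer -> character
print(format(42, '*>8d'))   # '******42'  fill '*', right-align, width 8
print(format(3, '+d'))      # '+3'        sign shown on positives too
print(format(2.5, '.3e'))   # '2.500e+00' exponent notation
print(format(0.25, '.0%'))  # '25%'       percentage type
```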
An example is the 'datetime' class, whose conversion specifiers might look something like the arguments to the strftime() function: "Today is: {0:a b d H:M:S Y}".format(datetime.now()) -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From qrczak at knm.org.pl Tue Jun 6 12:49:34 2006 From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) Date: Tue, 06 Jun 2006 12:49:34 +0200 Subject: [Python-3000] iostack and sock2 In-Reply-To: <4484CD33.10400@canterbury.ac.nz> (Greg Ewing's message of "Tue, 06 Jun 2006 12:32:51 +1200") References: <448258F3.3070808@gmail.com> <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com> <20060604131031.69CF.JCARLSON@uci.edu> <1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com> <4484CD33.10400@canterbury.ac.nz> Message-ID: <87fyiih83l.fsf@qrnik.zagroda> Greg Ewing writes: > The two variations we're considering would then be special > cases of this: > > f.read(0, num_bytes) # current read() behaviour > > f.read(num_bytes, num_bytes) # record-oriented read() behaviour Current read() reads at least 1 byte. -- __("< Marcin Kowalczyk \__/ qrczak at knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ From mcherm at mcherm.com Tue Jun 6 14:23:17 2006 From: mcherm at mcherm.com (Michael Chermside) Date: Tue, 06 Jun 2006 05:23:17 -0700 Subject: [Python-3000] String formatting: Conversion specifiers Message-ID: <20060606052317.yrbvse4311q80go0@login.werra.lunarpages.com> Talin writes: > So I decided to sit down and rethink the whole conversion specifier > system. +1: good idea! Nick Coghlan writes: > Generally nice, but I'd format the writeup a bit differently (see below) and > reorder the elements so that an arbitrary character can be supplied as the > fill character and the old ' ' sign flag behaviour remains available. 
+1: nice tweak

-- Michael Chermside

From greg.ewing at canterbury.ac.nz  Tue Jun 6 15:20:02 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 07 Jun 2006 01:20:02 +1200
Subject: [Python-3000] iostack and sock2
In-Reply-To:
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
	<448258F3.3070808@gmail.com>
	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
	<44836417.5080209@canterbury.ac.nz>
	<1d85506f0606050936j23ec04b8y1990e11cfc0b0cf0@mail.gmail.com>
	<4484CBA9.6070507@canterbury.ac.nz>
Message-ID: <44858102.30306@canterbury.ac.nz>

Jason Lunz wrote:
> greg.ewing at canterbury.ac.nz said:
>
> > Sort of like being able to define signal handlers
> > for file descriptors instead of having a small, fixed number of
> > signals.)
>
> That's supported on linux, but I don't know how portable it is.
> See F_SETSIG in fcntl(2), and sigaction(2).

According to the man page, it's Linux-specific.

It's not quite the same thing, anyway. What I had in mind was
attaching the handler itself directly to the file descriptor, rather
than going through a signal number. That way, different pieces of code
can use the mechanism independently on different file descriptors
without having to coordinate over sharing a signal handler.

-- Greg

From greg.ewing at canterbury.ac.nz  Tue Jun 6 15:27:44 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 07 Jun 2006 01:27:44 +1200
Subject: [Python-3000] iostack and sock2
In-Reply-To:
References: <448258F3.3070808@gmail.com>
	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
	<20060604131031.69CF.JCARLSON@uci.edu>
	<1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com>
	<4484CD33.10400@canterbury.ac.nz>
	<44855024.1010408@gmail.com>
Message-ID: <448582D0.7080506@canterbury.ac.nz>

Ronald Oussoren wrote:
> I'm slightly worried about this thread. Async I/O and "read exactly N
> bytes" don't really match up.
I don't know about the other mechanisms, > but at least with select and poll when the system says you can read > from a file descriptor you're only guaranteed that one call to > read(2)/recv(2)/... won't block. The implementation of a python read > method that returns exactly the number of bytes that you requested will > have to call the read system call in a loop and hence might block. This is one case where the callback model of async i/o may help. If there were a way to say "don't call me until you've got n bytes ready", the descriptor could become ready multiple times and multiple reads performed behind the scenes, then when enough bytes are there, your callback is called. -- Greg From greg.ewing at canterbury.ac.nz Tue Jun 6 15:30:55 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 07 Jun 2006 01:30:55 +1200 Subject: [Python-3000] iostack and sock2 In-Reply-To: <87fyiih83l.fsf@qrnik.zagroda> References: <448258F3.3070808@gmail.com> <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com> <20060604131031.69CF.JCARLSON@uci.edu> <1d85506f0606051016w366b231as2cbc413ed1bbf7e6@mail.gmail.com> <4484CD33.10400@canterbury.ac.nz> <87fyiih83l.fsf@qrnik.zagroda> Message-ID: <4485838F.50407@canterbury.ac.nz> Marcin 'Qrczak' Kowalczyk wrote: > Greg Ewing writes: >> f.read(0, num_bytes) # current read() behaviour > > Current read() reads at least 1 byte. Except if EOF is reached before getting any bytes. In that case, if min_bytes is 0, the call simply returns 0 bytes. If min_bytes is greater than 0, it raises EOFError. 
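[Greg's min/max read semantics can be sketched against ordinary file objects. `read_between` is a hypothetical illustrative helper, not an API proposed in the thread; it follows the rule above: at EOF with nothing read, return empty only when min_bytes is 0, otherwise raise EOFError.]

```python
import io

def read_between(fp, min_bytes, max_bytes):
    # Hypothetical helper: return at most max_bytes, looping until
    # at least min_bytes have arrived or EOF is hit.
    data = fp.read(max_bytes)
    while len(data) < min_bytes:
        more = fp.read(max_bytes - len(data))
        if not more:                  # EOF before min_bytes arrived
            break
        data += more
    if not data and min_bytes > 0:
        raise EOFError                # EOF with nothing read at all
    return data

f = io.BytesIO(b'abcdef')
print(read_between(f, 4, 4))    # b'abcd'  (record-oriented read)
print(read_between(f, 0, 10))   # b'ef'    (current read() behaviour)
print(read_between(f, 0, 10))   # b''      (EOF, min_bytes == 0)
```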
-- Greg

From lunz at falooley.org  Tue Jun 6 16:58:55 2006
From: lunz at falooley.org (Jason Lunz)
Date: Tue, 6 Jun 2006 10:58:55 -0400
Subject: [Python-3000] iostack and sock2
In-Reply-To: <44858102.30306@canterbury.ac.nz>
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
	<448258F3.3070808@gmail.com>
	<1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com>
	<44836417.5080209@canterbury.ac.nz>
	<1d85506f0606050936j23ec04b8y1990e11cfc0b0cf0@mail.gmail.com>
	<4484CBA9.6070507@canterbury.ac.nz>
	<44858102.30306@canterbury.ac.nz>
Message-ID: <20060606145855.GC14823@knob.reflex>

On Wed, Jun 07, 2006 at 01:20:02AM +1200, Greg Ewing wrote:
> It's not quite the same thing, anyway. What I had in mind was
> attaching the handler itself directly to the file descriptor, rather
> than going through a signal number. That way, different pieces of code
> can use the mechanism independently on different file descriptors
> without having to coordinate over sharing a signal handler.

I imagine if one were going to do this, that would be hidden in the
stdlib. The OS gives you the primitive needed to implement it, but yes,
there would have to be some infrastructure attached to the signal
handler to look up the fd on each SIGIO and multiplex the event out to
that fd's registered handler.

From the point of view of the code attaching the handler to the fd,
that would all be pretty transparent, though.

This all reminds me of something I've been wondering about - how do
people feel about beefing up the stdlib's support for os primitives?
There are plenty of things in the os module that are unix-only, but at
the same time I've had to code things myself in C (like file descriptor
passing over a unix socket, for example). One of the things I like
about python is that it doesn't take a java-like approach of only
exposing lowest-common-denominator OS facilities.
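[The look-up-the-fd-and-dispatch infrastructure described above is, in portable readiness-based form, what the stdlib's much later selectors module ended up providing — an anachronism relative to this 2006 thread, shown purely as an illustrative sketch of "handler attached directly to the descriptor":]

```python
import os
import selectors

# Each descriptor carries its own handler as registration data --
# no shared signal handler, no signal-number coordination between
# independent pieces of code.
sel = selectors.DefaultSelector()
r, w = os.pipe()

def on_readable(fd):
    return os.read(fd, 100)

sel.register(r, selectors.EVENT_READ, data=on_readable)

os.write(w, b'ping')
for key, events in sel.select(timeout=1):
    # key.data is exactly the handler attached to this fd
    print(key.data(key.fileobj))   # b'ping'
```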
os.select() is a good example - it's far more useful on unix than on Windows because of the platforms' respective implementations. But precisely because of that, it strikes me that python ought to expose the windows equivalent of nonblocking i/o and select/poll - i think that's overlapped i/o? Something like win32all, iow, could have a place in the standard distribution. Jason From janssen at parc.com Tue Jun 6 18:29:27 2006 From: janssen at parc.com (Bill Janssen) Date: Tue, 6 Jun 2006 09:29:27 PDT Subject: [Python-3000] String formatting: Conversion specifiers In-Reply-To: Your message of "Tue, 06 Jun 2006 00:53:38 PDT." <44853482.3050103@acm.org> Message-ID: <06Jun6.092931pdt."58641"@synergy1.parc.xerox.com> > Here is a list of the conversion types that are currently supported by > the % operator. First thing you notice is an eerie similarity between > this and the documentation for 'sprintf'. :) Yes. This is (or was) a significant advantage to the system. Many people already had mastered the C/C++ printf system of specifiers, and could use Python's with no mental upgrades. Is that no longer thought to be an advantage? > So there's no need to tell the system 'this is a float' > or 'this is an integer'. Except that the type specifier can affect the interpretation of the rest of the format string. For example, %.3f means to print three fractional digits. > The only way I could see this being useful is > if you had a type and wanted it to print out as some different type - > but is that really the proper role of the string formatter? Isn't that exactly what the string formatter does? I've got a binary value and want to express it as a different type, a string? Type punning at the low levels is often a useful debugging tool. 
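[Bill's point — that the type code steers how the rest of the specifier is read, and can re-express a value in another notation — is easy to demonstrate with the % operator itself:]

```python
# With 'f' the precision counts fractional digits; with 'e' it counts
# digits after the point in exponent form; 'x' re-expresses an integer
# in hex -- the "type punning" that is handy when debugging.
print('%.3f' % 2.5)        # 2.500
print('%.3e' % 2.5)        # 2.500e+00
print('%10.3f' % 2.5)      #      2.500  (minimum field width of 10)
print('%x' % 3735928559)   # deadbeef
```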
Bill

From jcarlson at uci.edu  Tue Jun 6 18:38:05 2006
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 06 Jun 2006 09:38:05 -0700
Subject: [Python-3000] iostack and sock2
In-Reply-To: <448582D0.7080506@canterbury.ac.nz>
References: <448582D0.7080506@canterbury.ac.nz>
Message-ID: <20060606091601.6A02.JCARLSON@uci.edu>

Greg Ewing wrote:
>
> Ronald Oussoren wrote:
>
> > I'm slightly worried about this thread. Async I/O and "read exactly N
> > bytes" don't really match up. I don't know about the other mechanisms,
> > but at least with select and poll when the system says you can read
> > from a file descriptor you're only guaranteed that one call to
> > read(2)/recv(2)/... won't block. The implementation of a python read
> > method that returns exactly the number of bytes that you requested will
> > have to call the read system call in a loop and hence might block.
>
> This is one case where the callback model of async i/o may
> help. If there were a way to say "don't call me until you've
> got n bytes ready", the descriptor could become ready
> multiple times and multiple reads performed behind the
> scenes, then when enough bytes are there, your callback
> is called.

class ReadExactly:
    def __init__(self, callback, current_count):
        self.remaining = current_count
        self.callback = callback
        self.buffer = []
    def __call__(self, data):
        while data:
            if len(data) >= self.remaining:
                b, self.buffer = self.buffer, []
                b.append(data[:self.remaining])
                # consume the delivered prefix; the callback is
                # expected to set .remaining for the next record
                data = data[self.remaining:]
                self.remaining = 0
                self.callback(''.join(b), reader=self)
            else:
                self.buffer.append(data)
                self.remaining -= len(data)
                break

Generally though, it's a bit easier to handle the piecewise reading,
etc., as part of the async socket class. The asynchat module uses
handle_read(), collect_incoming_data(data), and found_terminator();
a semantic I've borrowed for my own asynchronous socket classes and
have been fairly happy with.
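[The incremental-buffering pattern in Josiah's sketch can be made self-contained and testable for the length-prefixed case. FrameReader, its 4-byte big-endian header, and the feed() name are illustrative assumptions for this sketch, not asynchat's API:]

```python
import struct

class FrameReader:
    """Incremental parser for length-prefixed frames: a 4-byte
    big-endian length header, then that many payload bytes."""
    def __init__(self, on_frame):
        self.on_frame = on_frame
        self.buf = b''
        self.need = 4            # waiting for the length header first
        self.in_header = True

    def feed(self, data):
        # Accepts data in arbitrary pieces, as a socket delivers it.
        self.buf += data
        while len(self.buf) >= self.need:
            chunk, self.buf = self.buf[:self.need], self.buf[self.need:]
            if self.in_header:
                self.need = struct.unpack('>I', chunk)[0]
                self.in_header = False
            else:
                self.on_frame(chunk)
                self.need, self.in_header = 4, True

frames = []
r = FrameReader(frames.append)
# Two frames delivered in awkward pieces, as a socket might
r.feed(struct.pack('>I', 5) + b'he')
r.feed(b'llo' + struct.pack('>I', 2))
r.feed(b'ok')
print(frames)   # [b'hello', b'ok']
```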
asynchat implements the handle_read() portion, which knows about
line-terminated protocols (pop, smtp, http, ...) as well as protocols
using the 'read X bytes semantic', where X can be fixed or variable.
(read 4 bytes, decode, read X bytes, decode, read 4...) If the new
asynchronous class had some equivalent functionality, and some
reasonable set of default behavior (overridable via subclass or
flags), you could get arbitrarily desired behavior; from "call this
thing whenever you get data", to "call this thing when you have gotten
X bytes", to "call this thing when you have found this ''line''
terminator X", etc.

 - Josiah

From tomerfiliba at gmail.com  Tue Jun 6 19:43:15 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Tue, 6 Jun 2006 19:43:15 +0200
Subject: [Python-3000] iostack and sock2
In-Reply-To: <009301c68943$fc37fe60$3db72997@bagio>
References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com>
	<016c01c688c9$f5b212d0$bf03030a@trilan>
	<1d85506f0606051206q20d9663am41bc47832c2486d4@mail.gmail.com>
	<009301c68943$fc37fe60$3db72997@bagio>
Message-ID: <1d85506f0606061043wef3f0d7jd25cb11a5f3123ef@mail.gmail.com>

WaitForMultipleObjects doesn't work on sockets or files...

On 6/6/06, Giovanni Bajo wrote:
> tomer filiba wrote:
>
> >> One thing I would like to raise is the issue of KeyboardInterrupt. I
> >> find very inconvenient that a normal application doing a very simple
> >> blocking read from a socket can't be interrupted by a CTRL+C
> >> sequence. Usually, what I do is to setup a timeout on the sockets
> >> (eg. 0.4 seconds) and then simply retry if the data has not arrived
> >> yet. But this changes the code from:
> >
> > from my experience with linux and solaris, this CTRL+C problem only
> > happens on windows machines. but then again, windows can't select()
> > on anything but sockets, so there's not gonna be a generic solution.
>
> Windows has WaitForMultipleObjects() which can be used to multiplex between
> sockets and other handles.
> > Giovanni Bajo > > From talin at acm.org Tue Jun 6 20:07:09 2006 From: talin at acm.org (Talin) Date: Tue, 06 Jun 2006 11:07:09 -0700 Subject: [Python-3000] String formatting: Conversion specifiers In-Reply-To: <44855CBD.3010507@gmail.com> References: <44853482.3050103@acm.org> <44855CBD.3010507@gmail.com> Message-ID: <4485C44D.70007@acm.org> Nick Coghlan wrote: > Talin wrote: > >> So I decided to sit down and rethink the whole conversion specifier >> system. I looked at the docs for the '%' operator, and some other >> languages, and here is what I came up with (this is an excerpt from >> the revised PEP.) > > > Generally nice, but I'd format the writeup a bit differently (see below) > and reorder the elements so that an arbitrary character can be supplied > as the fill character and the old ' ' sign flag behaviour remains > available. Looks good - thanks for the feedback. My only comment is that I think that I would still like to have the sign field before the width. I'm pretty sure that this can be parsed unambiguously. > I'd also design it so that the standard conversion specifiers are > available 'for free' (i.e., they work for any class, unless the class > author deliberately replaces them with something else). > > Cheers, > Nick. > > -------------------------------- > > Standard Conversion Specifiers > > If an object does not define its own conversion specifiers, a standard > set of conversion specifiers are used. These are similar in concept to > the conversion specifiers used by the existing '%' operator, however > there are also a number of significant differences. The standard > conversion specifiers fall into three major categories: string > conversions, integer conversions and floating point conversions. > > The general form of a string conversion specifier is: > > [[fill][align]width][type] > > The brackets ([]) indicate an optional field. > > 'width' is a decimal integer defining the minimum field width. 
> If not specified, then the field width will be determined by > the content. > > If the minimum field width is defined, then the optional align > flag can be one of the following: > > '<' - Forces the field to be left-aligned within the available > space (This is the default.) > '>' - Forces the field to be right-aligned within the > available space. > > The optional 'fill' character defines the character to be used to > pad the field to the minimum width. The alignment flag must be > supplied if the character is a number other than 0 (otherwise the > character would be interpreted as part of the field width specifier). > > Finally, the 'type' determines how the data should be presented. > > The available string conversion types are: > > 's' - String format. Invokes str() on the object. > This is the default conversion specifier type. > 'r' - Repr format. Invokes repr() on the object. > > > The general form of an integer conversion specifier is: > > [[fill][align]width][sign]type > > The 'fill', 'align' and 'width' fields are as for string conversion > specifiers. > > The 'sign' field can be one of the following: > > '+' - indicates that a sign should be used for both > positive as well as negative numbers > '-' - indicates that a sign should be used only for negative > numbers (this is the default behaviour) > ' ' - indicates that a leading space should be used on > positive numbers > '()' - indicates that negative numbers should be surrounded > by parentheses > > There are several integer conversion types. All invoke int() on the > object before attempting to format it. > > The available integer conversion types are: > > 'b' - Binary. Outputs the number in base 2. > 'c' - Character. Converts the integer to the corresponding > unicode character before printing. > 'd' - Decimal Integer. Outputs the number in base 10. > 'o' - Octal format. Outputs the number in base 8. > 'x' - Hex format. 
Outputs the number in base 16, using lower-case
>        letters for the digits above 9.
> 'X' - Hex format. Outputs the number in base 16, using upper-case
>        letters for the digits above 9.
>
> The general form of a floating point conversion specifier is:
>
>     [[fill][align]width][.precision][sign]type
>
> The 'fill', 'align', 'width' and 'sign' fields are as for
> integer conversion specifiers.
>
> The 'precision' field is a decimal number indicating how many digits
> should be displayed after the decimal point.
>
> There are several floating point conversion types. All invoke
> float() on the object before attempting to format it.
>
> The available floating point conversion types are:
>
>     'e' - Exponent notation. Prints the number in scientific
>           notation using the letter 'e' to indicate the exponent.
>     'E' - Exponent notation. Same as 'e' except it uses an upper
>           case 'E' as the separator character.
>     'f' - Fixed point. Displays the number as a fixed-point
>           number.
>     'F' - Fixed point. Same as 'f'.
>     'g' - General format. This prints the number as a fixed-point
>           number, unless the number is too large, in which case
>           it switches to 'e' exponent notation.
>     'G' - General format. Same as 'g' except switches to 'E'
>           if the number gets too large.
>     'n' - Number. This is the same as 'g', except that it uses the
>           current locale setting to insert the appropriate
>           number separator characters.
>     '%' - Percentage. Multiplies the number by 100 and displays
>           in fixed ('f') format, followed by a percent sign.
>
> Objects are able to define their own conversion specifiers to replace
> the standard ones.
An example is the 'datetime' class, whose > conversion specifiers might look something like the arguments > to the strftime() function: > > "Today is: {0:a b d H:M:S Y}".format(datetime.now()) > > > From rasky at develer.com Tue Jun 6 20:15:07 2006 From: rasky at develer.com (Giovanni Bajo) Date: Tue, 6 Jun 2006 20:15:07 +0200 Subject: [Python-3000] iostack and sock2 References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com> <016c01c688c9$f5b212d0$bf03030a@trilan> <1d85506f0606051206q20d9663am41bc47832c2486d4@mail.gmail.com> <009301c68943$fc37fe60$3db72997@bagio> <1d85506f0606061043wef3f0d7jd25cb11a5f3123ef@mail.gmail.com> Message-ID: <01e401c68995$2480c8b0$3db72997@bagio> > On 6/6/06, Giovanni Bajo wrote: >> tomer filiba wrote: >> >>>> One thing I would like to raise is the issue of KeyboardInterrupt. >>>> I find very inconvenient that a normal application doing a very >>>> simple blocking read from a socket can't be interrupted by a CTRL+C >>>> sequence. Usually, what I do is to setup a timeout on the sockets >>>> (eg. 0.4 seconds) and then simply retry if the data has not arrived >>>> yet. But this changes the code from: >>> >>> from my experience with linux and solaris, this CTRL+C problem only >>> happens on windows machines. but then again, windows can't select() >>> on anything but sockets, so there's not gonna be a generic solution. >> >> Windows has WaitForMultipleObjects() which can be used to multiplex >> between sockets and other handles. >> > WaitForMultipleObjects doesnt work on sockets of files... You can use WSAAsyncSelect to activate message notification for socket events, and then wait with MsgWaitForMultipleObjects. Qt has a very good portable "reactor" implementation (QEventLoop in Qt3, could be renamed in Qt4) where you can register various events, including socket notifications and of course normal window messages. The implementation in Qt4 is GPL so you can have a look (src/corelib/kernel/qeventdispatcher_win.cpp). 
It *is* possible to have a single point of event dispatching under Windows too, and it is even possible to have it wrapped portably as Qt did. This is why I do expect Python to be able to handle this kind of things. Whatever portable poll/epoll/kqueue-kind of thing we end up with Py3k, it should use a similar technique under Windows to make sure normal messages are still processed. You really don't want to wait on sockets only and ignore Window messages altogether anyway. Giovanni Bajo From tomerfiliba at gmail.com Tue Jun 6 20:24:47 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Tue, 6 Jun 2006 20:24:47 +0200 Subject: [Python-3000] iostack and sock2 In-Reply-To: <01e401c68995$2480c8b0$3db72997@bagio> References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com> <016c01c688c9$f5b212d0$bf03030a@trilan> <1d85506f0606051206q20d9663am41bc47832c2486d4@mail.gmail.com> <009301c68943$fc37fe60$3db72997@bagio> <1d85506f0606061043wef3f0d7jd25cb11a5f3123ef@mail.gmail.com> <01e401c68995$2480c8b0$3db72997@bagio> Message-ID: <1d85506f0606061124t411c5b44l8b067b46812160a4@mail.gmail.com> > You can use WSAAsyncSelect to activate message notification for socket events, > and then wait with MsgWaitForMultipleObjects. i remember reading in the winsock manual that these two methods are slower, and not suitable for servers. > It *is* possible to have a single point of event dispatching under Windows too, > and it is even possible to have it wrapped portably as Qt did. This is why I do > expect Python to be able to handle this kind of things i agree, but it needs much more thinking and research, and i will look into it. but i'm not sure it should be part of the iostack. this kind of sync/async io needs more thought anyway. -tomer On 6/6/06, Giovanni Bajo wrote: > > On 6/6/06, Giovanni Bajo wrote: > >> tomer filiba wrote: > >> > >>>> One thing I would like to raise is the issue of KeyboardInterrupt. 
> >>>> I find very inconvenient that a normal application doing a very > >>>> simple blocking read from a socket can't be interrupted by a CTRL+C > >>>> sequence. Usually, what I do is to setup a timeout on the sockets > >>>> (eg. 0.4 seconds) and then simply retry if the data has not arrived > >>>> yet. But this changes the code from: > >>> > >>> from my experience with linux and solaris, this CTRL+C problem only > >>> happens on windows machines. but then again, windows can't select() > >>> on anything but sockets, so there's not gonna be a generic solution. > >> > >> Windows has WaitForMultipleObjects() which can be used to multiplex > >> between sockets and other handles. > >> > > WaitForMultipleObjects doesnt work on sockets of files... > > You can use WSAAsyncSelect to activate message notification for socket events, > and then wait with MsgWaitForMultipleObjects. > > Qt has a very good portable "reactor" implementation (QEventLoop in Qt3, could > be renamed in Qt4) where you can register various events, including socket > notifications and of course normal window messages. The implementation in Qt4 > is GPL so you can have a look (src/corelib/kernel/qeventdispatcher_win.cpp). > > It *is* possible to have a single point of event dispatching under Windows too, > and it is even possible to have it wrapped portably as Qt did. This is why I do > expect Python to be able to handle this kind of things. Whatever portable > poll/epoll/kqueue-kind of thing we end up with Py3k, it should use a similar > technique under Windows to make sure normal messages are still processed. You > really don't want to wait on sockets only and ignore Window messages altogether > anyway. 
> > Giovanni Bajo > > From rasky at develer.com Tue Jun 6 20:30:01 2006 From: rasky at develer.com (Giovanni Bajo) Date: Tue, 6 Jun 2006 20:30:01 +0200 Subject: [Python-3000] iostack and sock2 References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com> <016c01c688c9$f5b212d0$bf03030a@trilan> <1d85506f0606051206q20d9663am41bc47832c2486d4@mail.gmail.com> <009301c68943$fc37fe60$3db72997@bagio> <1d85506f0606061043wef3f0d7jd25cb11a5f3123ef@mail.gmail.com> <01e401c68995$2480c8b0$3db72997@bagio> <1d85506f0606061124t411c5b44l8b067b46812160a4@mail.gmail.com> Message-ID: <01f001c68997$3935ec20$3db72997@bagio> tomer filiba wrote: >> You can use WSAAsyncSelect to activate message notification for >> socket events, and then wait with MsgWaitForMultipleObjects. > > i remember reading in the winsock manual that these two methods are > slower, and not suitable for servers. Might be FUD or outdated: have a link? Anyway, you can't have a long-running process which does not poll messages under Windows (the task is immediately marked as "not responding"), so I'm not sure what you compare it to, when you say "slower". What is the other "faster" method? WSAAsyncEvent? The only other way to go I can think of is multithreading (that is exactly how I do write servers in Python nowadays). >> It *is* possible to have a single point of event dispatching under >> Windows too, and it is even possible to have it wrapped portably as >> Qt did. This is why I do expect Python to be able to handle this >> kind of things > > i agree, but it needs much more thinking and research, and i will > look into it. but i'm not sure it should be part of the iostack. this > kind of sync/async io needs more thought anyway. Agreed. Thanks! 
Giovanni Bajo From jcarlson at uci.edu Tue Jun 6 21:46:49 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Tue, 06 Jun 2006 12:46:49 -0700 Subject: [Python-3000] iostack and sock2 In-Reply-To: <01f001c68997$3935ec20$3db72997@bagio> References: <1d85506f0606061124t411c5b44l8b067b46812160a4@mail.gmail.com> <01f001c68997$3935ec20$3db72997@bagio> Message-ID: <20060606124525.6A0E.JCARLSON@uci.edu> "Giovanni Bajo" wrote: > tomer filiba wrote: > > >> You can use WSAAsyncSelect to activate message notification for > >> socket events, and then wait with MsgWaitForMultipleObjects. > > > > i remember reading in the winsock manual that these two methods are > > slower, and not suitable for servers. > > Might be FUD or outdated: have a link? Anyway, you can't have a long-running > process which does not poll messages under Windows (the task is immediately > marked as "not responding"), so I'm not sure what you compare it to, when you > say "slower". What is the other "faster" method? WSAAsyncEvent? The only other > way to go I can think of is multithreading (that is exactly how I do write > servers in Python nowadays). If I've read the reports correctly, WSA* is technically limited to 512 file handles, and practically limited to fewer (you start getting a particular kind of exception). As a suggested replacement, they offer IO Completion Ports. - Josiah From tomerfiliba at gmail.com Tue Jun 6 22:14:58 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Tue, 6 Jun 2006 22:14:58 +0200 Subject: [Python-3000] iostack, continued Message-ID: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com> the old thread was getting too nested, so i made a summary of the key points raised during that discussion: http://sebulba.wikispaces.com/project+iostack+todo is there anything else i missed? any more comments to add to the summary? i'll have time to incorporate part of these issues on the weekend, not before. i'll also update the design document on my site accordingly. 
thanks for all the comments so far, they have already proved
very helpful and furtile.

-tomer

From tjreedy at udel.edu  Wed Jun 7 00:20:39 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 6 Jun 2006 18:20:39 -0400
Subject: [Python-3000] iostack, continued
References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com>
Message-ID:

"tomer filiba" wrote in message
news:1d85506f0606061314x615f07e0g748dbdba6ef97aae at mail.gmail.com...
> thanks for all the comments so far, they have already proved
> very helpful and furtile.

I think you meant fertile (as opposed to futile ;-)

From rasky at develer.com  Wed Jun 7 02:31:07 2006
From: rasky at develer.com (Giovanni Bajo)
Date: Wed, 7 Jun 2006 02:31:07 +0200
Subject: [Python-3000] iostack, continued
References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com>
Message-ID: <028301c689c9$aac61d60$3db72997@bagio>

tomer filiba wrote:
> the old thread was getting too nested, so i made a summary
> of the key points raised during that discussion:
>
> http://sebulba.wikispaces.com/project+iostack+todo
>
> is there anything else i missed? any more comments to add
> to the summary?

About this part: "properties raising IOError", I would like to recall
that Guido pronounced on The Way properties should be used in Py3k. It
should already be written as part of a PEP 3000+ (don't remember
which). Part of the pronouncement was that reading/writing properties
should never have side-effects. I guess this kills the argument on
"position" being a property?
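[For concreteness, the disputed design — exposing the seek offset as a property — looks like this on a hypothetical wrapper (PositionedStream is an invented name for illustration); whether setting it counts as a "side effect" is exactly the question under debate:]

```python
import io

class PositionedStream:
    # Hypothetical wrapper exposing the seek offset as a property.
    def __init__(self, fileobj):
        self._f = fileobj

    @property
    def position(self):
        return self._f.tell()

    @position.setter
    def position(self, offset):
        self._f.seek(offset)

    def read(self, n=-1):
        return self._f.read(n)

s = PositionedStream(io.BytesIO(b'hello world'))
s.position = 6
print(s.read())      # b'world'
print(s.position)    # 11
```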
Giovanni Bajo From rasky at develer.com Wed Jun 7 02:32:37 2006 From: rasky at develer.com (Giovanni Bajo) Date: Wed, 7 Jun 2006 02:32:37 +0200 Subject: [Python-3000] iostack, continued References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com> Message-ID: <028701c689c9$e070f9d0$3db72997@bagio> tomer filiba wrote: > the old thread was getting too nested, so i made a summary > of the key points raised during that discussion: > > http://sebulba.wikispaces.com/project+iostack+todo > > is there anything else i missed? any more comments to add > to the summary? About this: > Streams be line-iterable (like today's file)? But what does a line mean to a binary file? Only text-files have a notion of lines. Maybe iterating records, at least for those wrapped streams with the notion of records. Giovanni Bajo From greg.ewing at canterbury.ac.nz Wed Jun 7 02:45:20 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 07 Jun 2006 12:45:20 +1200 Subject: [Python-3000] iostack and sock2 In-Reply-To: <20060606145855.GC14823@knob.reflex> References: <1d85506f0606031351o493a09d3l1eb8d4e2e7a4ccca@mail.gmail.com> <448258F3.3070808@gmail.com> <1d85506f0606041245t2b2ad9afh476541505d4dc273@mail.gmail.com> <44836417.5080209@canterbury.ac.nz> <1d85506f0606050936j23ec04b8y1990e11cfc0b0cf0@mail.gmail.com> <4484CBA9.6070507@canterbury.ac.nz> <44858102.30306@canterbury.ac.nz> <20060606145855.GC14823@knob.reflex> Message-ID: <448621A0.7050900@canterbury.ac.nz> Jason Lunz wrote: > I imagine if one were going to do this, that would be hidden in the > stdlib. Having it in libc would be okay. The important thing is that the implementation should allow your handlers to get called even if some library call is blocked and not being cooperative. 
-- Greg From greg.ewing at canterbury.ac.nz Wed Jun 7 03:05:10 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 07 Jun 2006 13:05:10 +1200 Subject: [Python-3000] iostack, continued In-Reply-To: <028301c689c9$aac61d60$3db72997@bagio> References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com> <028301c689c9$aac61d60$3db72997@bagio> Message-ID: <44862646.7020403@canterbury.ac.nz> Giovanni Bajo wrote: > About this part: "properties raising IOError", I would like to remember that > Guido pronounced on The Way properties should be used in Py3k. Part of the > pronouncement was that reading/writing properties should never have > side-effects. That's meaningless without a definition of what counts as a "side effect". Writing to a property must have *some* effect on the state of something, otherwise it's pointless. I'm guessing he meant it shouldn't affect the state of anything outside that object. But then we need to decide what counts as part of the state of a file object. Does it include the value of the file position of the underlying file descriptor? If it does, then file.position = foo is a legitimate usage of a property. -- Greg From jcarlson at uci.edu Wed Jun 7 06:41:03 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Tue, 06 Jun 2006 21:41:03 -0700 Subject: [Python-3000] iostack, continued In-Reply-To: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com> References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com> Message-ID: <20060606213547.6A27.JCARLSON@uci.edu> "tomer filiba" wrote: > > the old thread was getting too nested, so i made a summary > of the key points raised during that discussion: > > http://sebulba.wikispaces.com/project+iostack+todo > > is there anything else i missed? any more comments to add > to the summary? * """But then there are other streams where you want to call two *different* .close() methods, and the above would only allow for 1. 
Closing multiple times shouldn't be a problem for most streams, but not closing enough could be a problem.""" * """hrrm... what do you mean by "closing multiple times"? like socket.shutdown for reading or for writing? but other than sockets, what else can be closed in multiple ways? you can't close the "reading" of a file, while keeping it open for writing. That's not what I meant. What I meant was that most streams don't care if you close them twice. That is a = file(...);a.close();a.close() is OK. However, not all streams are robust against *not* closing. From my perspective, having each of the reading and writing stream classes include their own .close() method is perfectly reasonable, and if they happen to refer to the same stream (say a file opened in r+ or w+ mode), then .close()ing it twice is fine. But if they refer to different files, sockets, what have you (I'm sure someone will have a use case for these), and you don't .close() one of the streams, then that could be a problem. - Josiah From talin at acm.org Wed Jun 7 07:36:49 2006 From: talin at acm.org (Talin) Date: Tue, 06 Jun 2006 22:36:49 -0700 Subject: [Python-3000] String formatting: Conversion specifiers In-Reply-To: <44855CBD.3010507@gmail.com> References: <44853482.3050103@acm.org> <44855CBD.3010507@gmail.com> Message-ID: <448665F1.5060501@acm.org> Nick Coghlan wrote: > Talin wrote: > >> So I decided to sit down and rethink the whole conversion specifier >> system. I looked at the docs for the '%' operator, and some other >> languages, and here is what I came up with (this is an excerpt from >> the revised PEP.) > > > Generally nice, but I'd format the writeup a bit differently (see below) > and reorder the elements so that an arbitrary character can be supplied > as the fill character and the old ' ' sign flag behaviour remains > available. 
> > I'd also design it so that the standard conversion specifiers are > available 'for free' (i.e., they work for any class, unless the class > author deliberately replaces them with something else). > > Cheers, > Nick. > I've taken your proposal as a base, and made some additional changes to it. In addition, I've gone ahead and implemented a prototype of the built-in formatter based on the revised text. Note that I decided not to have different specifier syntax for each different data type - the reason is that I have a single parser that parses the conversion specifier, and it always parses precision, sign, etc., even if they are not used by that particular format type. So instead, it is simply the case that some specifier options aren't used for some format types. Here is the new text for the section: Standard Conversion Specifiers If an object does not define its own conversion specifiers, a standard set of conversion specifiers is used. These are similar in concept to the conversion specifiers used by the existing '%' operator, however there are also a number of significant differences. The standard conversion specifiers fall into three major categories: string conversions, integer conversions and floating point conversions. The general form of a standard conversion specifier is: [[fill]align][sign][width][.precision][type] The brackets ([]) indicate an optional field. The optional align flag can be one of the following: '<' - Forces the field to be left-aligned within the available space (This is the default.) '>' - Forces the field to be right-aligned within the available space. '=' - Forces the padding to be placed immediately after the sign, if any. This is used for printing fields in the form '+000000120'. Note that unless a minimum field width is defined, the field width will always be the same size as the data to fill it, so that the alignment option has no meaning in this case.
The optional 'fill' character defines the character to be used to pad the field to the minimum width. The alignment flag must be supplied if the fill character is a digit other than '0' (otherwise the character would be interpreted as part of the field width specifier). A '0' fill character without an alignment flag implies an alignment type of '='. The 'sign' field can be one of the following: '+' - indicates that a sign should be used for both positive as well as negative numbers '-' - indicates that a sign should be used only for negative numbers (this is the default behaviour) ' ' - indicates that a leading space should be used on positive numbers '()' - indicates that negative numbers should be surrounded by parentheses 'width' is a decimal integer defining the minimum field width. If not specified, then the field width will be determined by the content. The 'precision' field is a decimal number indicating how many digits should be displayed after the decimal point. Finally, the 'type' determines how the data should be presented. If the type field is absent, an appropriate type will be assigned based on the value to be formatted ('d' for integers and longs, 'g' for floats, and 's' for everything else). The available string conversion types are: 's' - String format. Invokes str() on the object. This is the default conversion specifier type. 'r' - Repr format. Invokes repr() on the object. There are several integer conversion types. All invoke int() on the object before attempting to format it. The available integer conversion types are: 'b' - Binary. Outputs the number in base 2. 'c' - Character. Converts the integer to the corresponding unicode character before printing. 'd' - Decimal Integer. Outputs the number in base 10. 'o' - Octal format. Outputs the number in base 8. 'x' - Hex format. Outputs the number in base 16, using lower-case letters for the digits above 9. 'X' - Hex format.
Outputs the number in base 16, using upper-case letters for the digits above 9. There are several floating point conversion types. All invoke float() on the object before attempting to format it. The available floating point conversion types are: 'e' - Exponent notation. Prints the number in scientific notation using the letter 'e' to indicate the exponent. 'E' - Exponent notation. Same as 'e' except it uses an upper-case 'E' as the separator character. 'f' - Fixed point. Displays the number as a fixed-point number. 'F' - Fixed point. Same as 'f'. 'g' - General format. This prints the number as a fixed-point number, unless the number is too large, in which case it switches to 'e' exponent notation. 'G' - General format. Same as 'g' except it switches to 'E' if the number gets too large. 'n' - Number. This is the same as 'g', except that it uses the current locale setting to insert the appropriate number separator characters. '%' - Percentage. Multiplies the number by 100 and displays in fixed ('f') format, followed by a percent sign. Objects are able to define their own conversion specifiers to replace the standard ones. An example is the 'datetime' class, whose conversion specifiers might look something like the arguments to the strftime() function: "Today is: {0:a b d H:M:S Y}".format(datetime.now()) Finally, I have two questions: 1) Where would be a good place to stick the rough prototype? I don't want to post it here, it's rather long. 2) I'd like to know if anyone out there wants to take over the task of implementing 3102 so that I can focus my attention on 3101. I have fairly limited bandwidth at the moment, and 3101 is by far the more complex proposal.
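[Editorial note: the specifier grammar above is nearly identical to the format-spec mini-language Python eventually shipped, so the described behaviour can be sketched with the built-in format(). This is an illustration only; the '()' sign option and the implicit int()/float() coercion in this draft did not survive into the final language.]

```python
# [[fill]align][sign][width][.precision][type], per the draft above
print(format(120, '=+10d'))         # '+      120' -- '=' pads after the sign
print(format(-3.14159, '*>12.3f'))  # '******-3.142' -- '*' fill, right-aligned
print(format(255, 'b'), format(255, 'x'), format(255, 'X'))  # 11111111 ff FF
print(format(0.25, '.1%'))          # '25.0%' -- percentage type
```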
-- Talin From rasky at develer.com Wed Jun 7 11:37:39 2006 From: rasky at develer.com (Giovanni Bajo) Date: Wed, 7 Jun 2006 11:37:39 +0200 Subject: [Python-3000] iostack, continued References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com> <028301c689c9$aac61d60$3db72997@bagio> <44862646.7020403@canterbury.ac.nz> Message-ID: <03b601c68a16$05352380$3db72997@bagio> Greg Ewing wrote: >> About this part: "properties raising IOError", I would like to >> remember that Guido pronounced on The Way properties should be used >> in Py3k. Part of the pronouncement was that reading/writing >> properties should never have side-effects. > > That's meaningless without a definition of what counts as a > "side effect". Writing to a property must have *some* effect > on the state of something, otherwise it's pointless. > > I'm guessing he meant it shouldn't affect the state of anything > outside that object. But then we need to decide what counts > as part of the state of a file object. Does it include the > value of the file position of the underlying file descriptor? > If it does, then file.position = foo is a legitimate usage > of a property. I believe what he meant was that property change should not affect the state of anything but the *Python*'s object. Giovanni Bajo From ncoghlan at gmail.com Wed Jun 7 14:56:46 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 07 Jun 2006 22:56:46 +1000 Subject: [Python-3000] String formatting: Conversion specifiers In-Reply-To: <448665F1.5060501@acm.org> References: <44853482.3050103@acm.org> <44855CBD.3010507@gmail.com> <448665F1.5060501@acm.org> Message-ID: <4486CD0E.6010602@gmail.com> Talin wrote: > I've taken your proposal as a base, and made some additional changes to > it. In addition, I've gone ahead and implemented a prototype of the > built-in formatter based on the revised text. I like it! As for somewhere to put the prototype, a patch or RFE tracker item isn't too bad for holding a single Python file. 
We can always figure out a better place (such as somewhere in the SVN sandbox) later. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Wed Jun 7 15:21:35 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 07 Jun 2006 23:21:35 +1000 Subject: [Python-3000] String formatting: Conversion specifiers In-Reply-To: <06Jun6.092931pdt."58641"@synergy1.parc.xerox.com> References: <06Jun6.092931pdt."58641"@synergy1.parc.xerox.com> Message-ID: <4486D2DF.3080405@gmail.com> Bill Janssen wrote: >> Here is a list of the conversion types that are currently supported by >> the % operator. First thing you notice is an eerie similarity between >> this and the documentation for 'sprintf'. :) > > Yes. This is (or was) a significant advantage to the system. Many > people already had mastered the C/C++ printf system of specifiers, and > could use Python's with no mental upgrades. Is that no longer thought > to be an advantage? It's still to be preferred. Talin's latest version is still close enough to printf that using printf style formatters will 'do the right thing'. {0:s}, {0:.3f}, {0:5d}, {0:+8x} are all equivalent to their printf counterparts %s, %.3f, %5d, %+8x. (I thought doing it that way would be ambiguous, but Talin was able to figure out a way to preserve the compatibility while still adding the features we wanted). The proposed Py3k version just adds some enhancements: - choose an arbitrary fill character - choose left or right alignment in the filled field - choose to have the sign before or after the field padding - choose to use () to denote negative numbers - choose to output integers as binary numbers It also allows a class to override the handling of the formatting string entirely (so things like datetime can be first-class citizens in the formatting world). 
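[Editorial note: the claimed printf equivalences can be spot-checked directly. A sketch using str.format() as it later landed; the specifiers behave identically on both sides.]

```python
# printf-style specifiers carry over to the brace syntax unchanged
assert '{0:.3f}'.format(2.5) == '%.3f' % 2.5 == '2.500'
assert '{0:5d}'.format(42) == '%5d' % 42 == '   42'
assert '{0:+8x}'.format(255) == '%+8x' % 255 == '     +ff'
print('printf equivalences hold')
```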
>> So there's no need to tell the system 'this is a float' >> or 'this is an integer'. > > Except that the type specifier can affect the interpretation of the > rest of the format string. For example, %.3f means to print three > fractional digits. It's possible to define the format string independently of the type specifier - it's just that some of the fields only have an effect when certain type specifiers are used (e.g. precision is ignored for string and integer type specifiers). Talin's point is that because Python objects know their own type the formatting system can figure out a reasonable default type specifier (f for floats, d for integers and s for everything else). This means the whole conversion specifier can be made optional, including the type specifier. >> The only way I could see this being useful is >> if you had a type and wanted it to print out as some different type - >> but is that really the proper role of the string formatter? > > Isn't that exactly what the string formatter does? I've got a binary > value and want to express it as a different type, a string? Type > punning at the low levels is often a useful debugging tool. The latest version of the proposal explicitly states which builtin (str(), repr(), int() or float()) will be called before the value is formatted for each of the standard type specifiers.
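[Editorial note: the 'reasonable default type specifier' behaviour described here is visible in format() as it later shipped -- a sketch, since the draft's exact defaults differed slightly.]

```python
# with no type code, each value formats according to its own type
assert format(42) == '42'        # int behaves like 'd'
assert format(2.5) == '2.5'      # float behaves like 'g'
assert format('hi') == 'hi'      # everything else behaves like 's'
# the rest of the spec still applies without an explicit type code
assert format(2.5, '>8') == '     2.5'
print('default type specifiers ok')
```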
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Wed Jun 7 15:32:45 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 07 Jun 2006 23:32:45 +1000 Subject: [Python-3000] iostack, continued In-Reply-To: <03b601c68a16$05352380$3db72997@bagio> References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com> <028301c689c9$aac61d60$3db72997@bagio> <44862646.7020403@canterbury.ac.nz> <03b601c68a16$05352380$3db72997@bagio> Message-ID: <4486D57D.20005@gmail.com> Giovanni Bajo wrote: > Greg Ewing wrote: >> I'm guessing he meant it shouldn't affect the state of anything >> outside that object. But then we need to decide what counts >> as part of the state of a file object. Does it include the >> value of the file position of the underlying file descriptor? >> If it does, then file.position = foo is a legitimate usage >> of a property. > > > I believe what he meant was that property change should not affect the state of > anything but the *Python*'s object. I believe the original context where the question came up was for Path objects - Guido (rightly) objected to touching the file system as a side effect of accessing the attributes of a conceptual object like a path string. With a position attribute on actual file IO objects, it should be possible to set it up so that the file object only invokes tell() when you try to *change* the position. When you simply access the attribute, it will return the answer from an internal variable (it needs to do this anyway in order to take buffering into account). And having attribute modification on a file object touch the file system really doesn't seem particularly unreasonable. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From tanzer at swing.co.at Wed Jun 7 15:56:24 2006 From: tanzer at swing.co.at (Christian Tanzer) Date: Wed, 07 Jun 2006 15:56:24 +0200 Subject: [Python-3000] String formatting: Conversion specifiers In-Reply-To: Your message of "Wed, 07 Jun 2006 23:21:35 +1000." <4486D2DF.3080405@gmail.com> Message-ID: Nick Coghlan wrote: > Bill Janssen wrote: > >> Here is a list of the conversion types that are currently supported by > >> the % operator. First thing you notice is an eerie similarity between > >> this and the documentation for 'sprintf'. :) > > > > Yes. This is (or was) a significant advantage to the system. Many > > people already had mastered the C/C++ printf system of specifiers, and > > could use Python's with no mental upgrades. Is that no longer thought > > to be an advantage? (snip) > It's possible to define the format string independently of the type specifier > - its just that some of the fields only have an effect when certain type > specifiers are used (e.g. precision is ignored for string and integer type > specifiers). For strings, it isn't (not by Python at least): Python 2.4.2 (#1, May 30 2006, 13:47:24) [GCC 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> "%3.3s" % "abcdef" 'abc' -- Christian Tanzer http://www.c-tanzer.at/ From tomerfiliba at gmail.com Wed Jun 7 18:54:29 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Wed, 7 Jun 2006 18:54:29 +0200 Subject: [Python-3000] iostack, continued In-Reply-To: <03b601c68a16$05352380$3db72997@bagio> References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com> <028301c689c9$aac61d60$3db72997@bagio> <44862646.7020403@canterbury.ac.nz> <03b601c68a16$05352380$3db72997@bagio> Message-ID: <1d85506f0606070954q50a7c7bau66fdabfe690cdb7@mail.gmail.com> > I believe what he meant was that property change should not affect the state of > anything but the *Python*'s object. for reference, in sock2 i use properties to change the socket options of sockets. instead of doing if not s.getsockopt(SOL_SOCK, SOCK_REBINDADDR): s.setsockopt(SOL_SOCK, SOCK_REBINDADDR, 1) you can just do if not s.rebind_addr: s.rebind_addr = True which is much easier (both to maintain and read). these property-options also take care of platform-dependent options (like the linger struct, which is different between winsock and bsd sockets) i can't speak for Guido now, but at first, when i proposed this options-via-properties mechanism, Guido was in favor. he agreed setsockopt is a highly non-pythonic way of doing things. besides, the context is different. a path object is not a stream object. they stand for different things. so you can't generalize like that -- the decision must be made on a per-case basis another key issue to consider here is convenience. it's much more convenient to use .position than .seek and tell. for example: original_pos = f.position try: ... do something with f except IOError: f.position = original_pos seek and tell are much more cumbersome. they will remain there, of course, if only for backwards compatibility.
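[Editorial note: the position property assumed by the try/except example above can be sketched in a few lines over seek()/tell(); class and attribute names here are illustrative, not iostack's actual API. Negative values seek relative to the end, matching the iostack proposal discussed later in the thread.]

```python
import io
import os

class PositionedFile:
    """Minimal wrapper exposing a 'position' property over seek()/tell()."""
    def __init__(self, f):
        self._f = f

    @property
    def position(self):
        return self._f.tell()

    @position.setter
    def position(self, value):
        if value >= 0:
            self._f.seek(value, os.SEEK_SET)   # absolute seek
        else:
            self._f.seek(value, os.SEEK_END)   # relative-to-end seek

f = PositionedFile(io.BytesIO(b'hello world'))
f.position = 6
assert f._f.read(5) == b'world'
f.position = -5          # five bytes back from the end
assert f._f.read(5) == b'world'
```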
-tomer On 6/7/06, Giovanni Bajo wrote: > Greg Ewing wrote: > > >> About this part: "properties raising IOError", I would like to > >> remember that Guido pronounced on The Way properties should be used > >> in Py3k. Part of the pronouncement was that reading/writing > >> properties should never have side-effects. > > > > That's meaningless without a definition of what counts as a > > "side effect". Writing to a property must have *some* effect > > on the state of something, otherwise it's pointless. > > > > I'm guessing he meant it shouldn't affect the state of anything > > outside that object. But then we need to decide what counts > > as part of the state of a file object. Does it include the > > value of the file position of the underlying file descriptor? > > If it does, then file.position = foo is a legitimate usage > > of a property. > > > I believe what he meant was that property change should not affect the state of > anything but the *Python*'s object. > > Giovanni Bajo > > From tjreedy at udel.edu Wed Jun 7 20:17:49 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 7 Jun 2006 14:17:49 -0400 Subject: [Python-3000] iostack, continued References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com><028301c689c9$aac61d60$3db72997@bagio><44862646.7020403@canterbury.ac.nz><03b601c68a16$05352380$3db72997@bagio> <1d85506f0606070954q50a7c7bau66fdabfe690cdb7@mail.gmail.com> Message-ID: "tomer filiba" wrote in message news:1d85506f0606070954q50a7c7bau66fdabfe690cdb7 at mail.gmail.com... > instead of doing > if not s.getsockopt(SOL_SOCK, SOCK_REBINDADDR): > s.setsockopt(SOL_SOCK, SOCK_REBINDADDR, 1) > > you can just do > if not s.rebind_addr: > s.rebind_addr = True > > which is much easier (both to maintain and read). these property- > options also take care of platform dependent options (like the > linger struct, which is different between winsock and bsd sockets) Very nice. Much more 'pythonic'. 
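[Editorial note: the options-via-properties idea tomer describes can be sketched with an ordinary data descriptor over the real getsockopt/setsockopt API. SOCK_REBINDADDR is sock2's own name; the sketch below uses the standard SO_REUSEADDR instead, and the Sock/OptionProperty names are illustrative.]

```python
import socket

class OptionProperty:
    """Descriptor exposing one socket option as a boolean attribute."""
    def __init__(self, level, opt):
        self.level, self.opt = level, opt
    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        return bool(obj.sock.getsockopt(self.level, self.opt))
    def __set__(self, obj, value):
        obj.sock.setsockopt(self.level, self.opt, int(value))

class Sock:
    reuse_addr = OptionProperty(socket.SOL_SOCKET, socket.SO_REUSEADDR)
    def __init__(self):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

s = Sock()
s.reuse_addr = True      # replaces the setsockopt() incantation
assert s.reuse_addr
s.sock.close()
```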
> i can't speak for Guido now, but at first, when i proposed this > options-via-properties mechanism, Guido was in favor. he agreed > setsockopt is a highly non-pythonic way of doing things. > > besides, the context is different. a path object is not a stream > object. they stand for different things. so you can't generalize > like that -- the decision must be made on a per-case basis I agreed with Guido's pronouncement in its context. I also don't see it as applying to f.position (unless he explicitly says so). The Python file object is supposed to be a fairly direct proxy for the OS's file object. > another key issue to consider here is convenience. it's much > more convenient to use .position than .seek and tell. for example: So I also like this. > original_pos = f.position > try: > ... do something with f > except IOError: > f.position = original_pos > > seek and tell are much more cumbersome. they will remain there, > of course, if only for backwards compatibility. Tell and seek go back half a century to manipulations of serial media like magnetic tape;-) For random-access disks, their meaning is somewhat virtual or metaphorical rather than actual. To me, it's time to let go of them, as much as possible, and use a more modern API. Terry Jan Reedy From greg.ewing at canterbury.ac.nz Thu Jun 8 01:34:05 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 08 Jun 2006 11:34:05 +1200 Subject: [Python-3000] iostack, continued In-Reply-To: <03b601c68a16$05352380$3db72997@bagio> References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com> <028301c689c9$aac61d60$3db72997@bagio> <44862646.7020403@canterbury.ac.nz> <03b601c68a16$05352380$3db72997@bagio> Message-ID: <4487626D.1080200@canterbury.ac.nz> Giovanni Bajo wrote: > I believe what he meant was that property change should not affect the state of > anything but the *Python*'s object. Then what counts as part of the Python object?
If the object is wrapping a C struct from some library, is it okay for a property to change a member of that struct? -- Greg From greg.ewing at canterbury.ac.nz Thu Jun 8 01:42:40 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 08 Jun 2006 11:42:40 +1200 Subject: [Python-3000] iostack, continued In-Reply-To: <4486D57D.20005@gmail.com> References: <1d85506f0606061314x615f07e0g748dbdba6ef97aae@mail.gmail.com> <028301c689c9$aac61d60$3db72997@bagio> <44862646.7020403@canterbury.ac.nz> <03b601c68a16$05352380$3db72997@bagio> <4486D57D.20005@gmail.com> Message-ID: <44876470.7060806@canterbury.ac.nz> Nick Coghlan wrote: > With a position attribute on actual file IO objects, it should be > possible to set it up so that the file object only invokes tell() when > you try to *change* the position. When you simply access the attribute, > it will return the answer from an internal variable (it needs to do this > anyway in order to take buffering into account). Be careful -- in Unix it's possible for different file descriptors to share the same position pointer. For unbuffered streams at least, this should be reflected in the relevant properties or whatever is being used. -- Greg From greg.ewing at canterbury.ac.nz Thu Jun 8 12:14:34 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 08 Jun 2006 22:14:34 +1200 Subject: [Python-3000] Assignment decorators, anyone? Message-ID: <4487F88A.7080307@canterbury.ac.nz> I think I've come across a use case for @decorators on assignment statements. I have a function which is used like this: my_property = overridable_property('my_property', "This is my property.") However, it sucks a bit to have to write the name of the property twice. I just got bitten by changing the name of one of my properties and forgetting to change it in both places. If decorators could be applied to assignment statements, I'd be able to write it as something like @overridable_property my_property = "This is my property." 
(This would require the semantics of assignment decoration to be defined so that the assigned name is passed to the decorator function as well as the value being assigned.) On the other hand, maybe this is a use case for the "make" statement that was proposed earlier. -- Greg From tomerfiliba at gmail.com Sat Jun 10 10:55:17 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Sat, 10 Jun 2006 10:55:17 +0200 Subject: [Python-3000] enhanced descriptors Message-ID: <1d85506f0606100155v4e1c63d1jc66b21806055add4@mail.gmail.com> disclaimer: i'm not sure this suggestion is feasible to implement, because of the way descriptors work, but it's something we should consider adding. ---- as you may remember, in iostack, we said the position property should act like the following: f.position = n # absolute seek (n >= 0) f.position = -n # relative-to-end seek f.position += n # relative-to-current seek so i wrote a data descriptor. implementing the first two is easy, but the third version is tricky. doing x.y += z on a data descriptor translates to x.__set__(y, x.__get__(y) + z) in my case, it means first tell()ing, adding the offset, and then seek()ing to the new position. this works, of course, but it requires two system calls instead of one. what i wished i had was x.__iadd__(y, z) so my suggestion is as follows: data descriptors must define __get__ and __set__. if they also define one of the inplace-operators (__iadd__, etc), it will be called instead of first __get__()ing and then __set__()ing. however, the inplace operators would have to use a different signature than the normal operators -- instead of __iadd__(self, other) they would be defined as __iadd__(self, obj, value). therefore, i suggest adding __set_iadd__, or something in that spirit, to solve the ambiguity.
for example, my position descriptor would look like: class PositionDesc(object): def __get__(self, obj, cls): if obj is None: return self return obj.tell() def __set__(self, obj, value): if value >= 0: obj.seek(value, "start") else: obj.seek(value, "end") def __set_iadd__(self, obj, value): obj.seek(value, "curr") ... p = f.position # calls __get__ f.position = 5 # calls __set__ f.position = -5 # calls __set__ f.position += 10 # calls __set_iadd__ now there are two issues: * is it even possible to implement (without overcomplicating the descriptors mechanism)? * is it generally useful? i can't answer the first question, but it would surely be useful in iostack; and besides, for symmetry's sake, if x += y calls x.__iadd__(y), it should be optimized for descriptors as well. i'd hate having to do two system calls for something that can be done with one using seek(). -tomer From greg.ewing at canterbury.ac.nz Sat Jun 10 13:25:50 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 10 Jun 2006 23:25:50 +1200 Subject: [Python-3000] enhanced descriptors In-Reply-To: <1d85506f0606100155v4e1c63d1jc66b21806055add4@mail.gmail.com> References: <1d85506f0606100155v4e1c63d1jc66b21806055add4@mail.gmail.com> Message-ID: <448AAC3E.7020001@canterbury.ac.nz> tomer filiba wrote: > so my suggestion is as follows: > data descriptors must define __get__ and __set__. if they > also define one of the inplace-operators (__iadd___, etc), > it will be called instead of first __get__()ing and then > __set__()ing. > > however, the inplace operators would have to use a different > signature than the normal operators -- instead of > __iadd__(self, other) > they would be defined as > __iadd__(self, obj, value). This could be done, although it would require some large changes to the way things work. Currently the attribute access and inplace operation are done by separate bytecodes, so by the time the += gets processed, the whole descriptor business is finished with. 
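[Editorial note: Greg's point that the attribute access and the in-place operation compile to separate bytecodes is easy to verify with the dis module. A sketch; the exact instruction names vary between interpreter versions, but an attribute load, an in-place add, and an attribute store always appear as distinct instructions.]

```python
import dis

def bump(x, z):
    x.y += z   # attribute load, then +=, then attribute store

# by the time the += runs, the descriptor's __get__ has already
# returned a plain value, so the descriptor never sees the += itself
dis.dis(bump)
```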
What would be needed is to combine the attribute access and += operator into a single "add to attribute" operation. So there would be an ADD_TO_ATTRIBUTE bytecode, and a corresponding __iaddattr__ method or some such implementing it. Then of course you'd want corresponding methods for all the other inplace operators applied to attributes. And probably a third set for obj[index] += value etc. That's getting to be a ridiculously large set of methods. It could be cut down considerably by having just one in-place method of each kind, parameterised by a code indicating the arithmetic operation (like __richcmp__):

    Syntax                    Method
    obj.attr OP= value        obj.__iattr__(op, value)
    obj[index] OP= value      obj.__iitem__(op, value)

It might be worth writing a PEP about this. Getting back to the problem at hand, there's another way it might be handled using current Python. Instead of a normal int, the position descriptor could return an instance of an int subclass with an __iadd__ method that manipulates the file position. There's one further problem with all of this, though. Afterwards, the result of the += is going to get assigned back to the position property. If you want to avoid making another redundant system call, you'll somehow have to detect when the value being assigned is the result of a += and ignore it. -- Greg From tomerfiliba at gmail.com Sat Jun 10 16:43:53 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Sat, 10 Jun 2006 16:43:53 +0200 Subject: [Python-3000] enhanced descriptors In-Reply-To: <448AAC3E.7020001@canterbury.ac.nz> References: <1d85506f0606100155v4e1c63d1jc66b21806055add4@mail.gmail.com> <448AAC3E.7020001@canterbury.ac.nz> Message-ID: <1d85506f0606100743g5cb7a803yb9674a2058141cac@mail.gmail.com> well, adding bytecodes is out-of-the-question for me.
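[Editorial note: Greg's int-subclass idea above can be sketched as follows -- a position read returns an int subclass whose __iadd__ performs a single relative seek. Names are illustrative; the aliasing and laziness drawbacks tomer raises in his reply still apply to this sketch.]

```python
import io
import os

class Position(int):
    """An int carrying its file, so += becomes one relative seek."""
    def __new__(cls, value, f):
        self = super().__new__(cls, value)
        self._f = f
        return self
    def __iadd__(self, offset):
        # one seek(offset, SEEK_CUR) instead of tell() + absolute seek;
        # seek() returns the new absolute position
        return Position(self._f.seek(offset, os.SEEK_CUR), self._f)

f = io.BytesIO(b'abcdefghij')
pos = Position(f.tell(), f)
pos += 3                     # seeks the underlying stream
assert int(pos) == 3
assert f.read(2) == b'de'
```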
i did think of doing a position-proxy class, but it has lots of drawbacks as well: * lots of methods to implement (to make it look like an int) * lazy evaluation -- should only perform tell() when requested, not before. for example, calling __repr__ or __add__ would have to tell(), while __iadd__ would not... nasty code * it would be slower: adding logic to __set__, and an int-like object (never as fast as a real int), etc. * and, worst of all, it would have unavoidable undesired behavior: desired behavior: f.position += 2 undesired behavior: p = f.position p += 2 # this would seek()!!! any good solution would require lots of magic, so i guess i'm just gonna pull off the += optimization. two system calls are not worth writing such an ugly code. the solution must come from "enhanced descriptors". - - - - - - - - - > What would be needed is to combine the attribute access > and += operator into a single "add to attribute" operation. > So there would be an ADD_TO_ATTRIBUTE bytecode, and a > corresponding __iaddattr__ method or some such implementing > it. i'm afraid that's not possible, because the compiler can't tell that x.y+=z is a descriptor assignment. > Then of course you'd want corresponding methods for all > the other inplace operators applied to attributes. And > probably a third set for obj[index] += value etc. no, i don't think so. indexing should first __get__ the object, and then index it. these are two separate operations. only the inplace operators should be optimized into one function. - - - - - - - - - > It might be worth writing a PEP about this. well, you asked for it :) preliminary pep: STORE_INPLACE_ATTR today, x.y += z is translated to x.__setattr__("y", x.__getattr__("y") +/+= z) depending on y (if it supports __iadd__ or only __add__) the proposed change is to replace this scheme by __setiattr__ - set inplace attr. it takes three arguments: name, operation, and value. it is invoked by the new bytecode instruction: STORE_INPLACE_ATTR. 
the new instruction's layout looks like this:

    TOS+2: value
    TOS+1: operation code (1=add, 2=sub, 3=mul, ...)
    TOS:   object
    STORE_INPLACE_ATTR

the need of this new special method is to optimize the inplace operators, for both normal attributes and descriptors. examples: for normal assignment, the normal behavior is retained x.y = 5 ==> x.__setattr__("y", 5) for augmented assignment, the inplace version (__setiattr__) is used instead: x.y += 5 ==> x.__setiattr__("y", operator.add, 5) the STORE_INPLACE_ATTR instruction would convert the operation code into the corresponding function from the `operator` module, to make __setiattr__ simpler. descriptors: the descriptor protocol is also extended with the __iset__ method -- inplace __set__. if the attribute is a data descriptor, __setiattr__ will try to call __iset__; if it does not exist, it would default to __get__ and then __set__. sketch implementation:

    def __setiattr__(self, name, op, value):
        attr = getattr(self, name)
        # descriptors
        if hasattr(attr, "__iset__"):
            attr.__iset__(self, op, value)
            return
        if hasattr(attr, "__set__"):
            result = op(attr.__get__(self, self.__class__), value)
            attr.__set__(self, result)
            return
        # normal attributes
        inplace_op_name = "__i%s__" % (op.__name__,)  # ugly!!
        if hasattr(attr, inplace_op_name):
            getattr(attr, inplace_op_name)(value)
        else:
            setattr(self, name, op(attr, value))

issues: should it be just one special method (__setiattr__) or a method per-operation (__setiadd__, __setisub__)? multiple methods mean each method is simpler, but also cause code duplication, and lots of new method slots. notes: if the STORE_INPLACE_ATTR instruction does not find __setiattr__, it can always default to __setattr__, the same way it's done today. -tomer On 6/10/06, Greg Ewing wrote: > tomer filiba wrote: > > so my suggestion is as follows: > data descriptors must define __get__ and __set__.
if they > > also define one of the inplace-operators (__iadd___, etc), > > it will be called instead of first __get__()ing and then > > __set__()ing. > > > > however, the inplace operators would have to use a different > > signature than the normal operators -- instead of > > __iadd__(self, other) > > they would be defined as > > __iadd__(self, obj, value). > > This could be done, although it would require some large > changes to the way things work. Currently the attribute > access and inplace operation are done by separate bytecodes, > so by the time the += gets processed, the whole descriptor > business is finished with. > > What would be needed is to combine the attribute access > and += operator into a single "add to attribute" operation. > So there would be an ADD_TO_ATTRIBUTE bytecode, and a > corresponding __iaddattr__ method or some such implementing > it. > > Then of course you'd want corresponding methods for all > the other inplace operators applied to attributes. And > probably a third set for obj[index] += value etc. > > That's getting to be a ridiculously large set of methods. > It could be cut down considerably by having just one > in-place method of each kind, parameterised by a code > indicating the arithmetic operation (like __richcmp__): > > Syntax Method > obj.attr OP= value obj.__iattr__(op, value) > obj[index] OP= value obj.__iitem__(op, value) > > It might be worth writing a PEP about this. > > Getting back to the problem at hand, there's another way > it might be handled using current Python. Instead of a > normal int, the position descriptor could return an > instance of an int subclass with an __iadd__ method that > manipulates the file position. > > There's one further problem with all of this, though. > Afterwards, the result of the += is going get assigned > back to the position property. 
If you want to avoid
> making another redundant system call, you'll somehow
> have to detect when the value being assigned is the
> result of a += and ignore it.
>
> --
> Greg
>

From greg.ewing at canterbury.ac.nz Sun Jun 11 01:50:09 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 11 Jun 2006 11:50:09 +1200
Subject: [Python-3000] enhanced descriptors
In-Reply-To: <1d85506f0606100743g5cb7a803yb9674a2058141cac@mail.gmail.com>
References: <1d85506f0606100155v4e1c63d1jc66b21806055add4@mail.gmail.com> <448AAC3E.7020001@canterbury.ac.nz> <1d85506f0606100743g5cb7a803yb9674a2058141cac@mail.gmail.com>
Message-ID: <448B5AB1.8050309@canterbury.ac.nz>

tomer filiba wrote:
> i did think of doing a position-proxy class, but it has lots of
> drawbacks as well:
> * lots of methods to implement (to make it look like an int)

Not if you subclass it from int, and inherit all its behaviour. The only things you'd need to add are a reference to the base file object, and __iadd__ and __isub__ methods.

> * lazy evaluation -- should only perform tell() when requested,
> not before. for example, calling __repr__ or __add__ would have
> to tell(), while __iadd__ would not... nasty code

Yes, I hadn't thought of that. Quite nasty, especially since code that got the position of a file, did something else that changed the file position, and *then* used the position it got before, would get unexpected results.

> any good solution would require lots of magic, so i guess
> i'm just gonna pull off the += optimization. two system calls
> are not worth writing such an ugly code.

Yes, I'm coming to the same conclusion. Without changing the language, the desired behaviour isn't reasonably attainable.

> i'm afraid that's not possible, because the compiler can't tell
> that x.y+=z is a descriptor assignment.

It wouldn't operate at the descriptor level, it would operate on the object itself, i.e. it would call

    x.__iattr__('y', '+=', z)

or some such.
So in your case you wouldn't use a descriptor for this part, but give the file object an __iattr__ method.

> no, i don't think so. indexing should first __get__ the object,
> and then index it. these are two separate operations. only the
> inplace operators should be optimized into one function.

I'm talking about using an in-place operator on the result of an indexing operation, e.g.

    x[i] += y

which is a closely analogous situation. Not needed for this particular use case -- I'm just thinking ahead.

--
Greg

From talin at acm.org Sun Jun 11 03:07:05 2006
From: talin at acm.org (Talin)
Date: Sat, 10 Jun 2006 18:07:05 -0700
Subject: [Python-3000] PEP 3101 update
Message-ID: <448B6CB9.9050601@acm.org>

Here's the latest PEP 3101 - I've incorporated changes based on suggestions from a lot of folks. This version incorporates:

-- a detailed specification for conversion type fields
-- description of error handling behavior
-- 'strict' vs. 'lenient' error handling flag
-- compound field names
-- braces are now escaped using {{ instead of \{

--------------------------------------------------------------------

PEP: 3101
Title: Advanced String Formatting
Version: $Revision: 46845 $
Last-Modified: $Date: 2006-06-10 17:59:06 -0700 (Sat, 10 Jun 2006) $
Author: Talin
Status: Draft
Type: Standards
Content-Type: text/plain
Created: 16-Apr-2006
Python-Version: 3.0
Post-History: 28-Apr-2006, 6-May-2006, 10-Jun-2006

Abstract

This PEP proposes a new system for built-in string formatting operations, intended as a replacement for the existing '%' string formatting operator.

Rationale

Python currently provides two methods of string interpolation:

- The '%' operator for strings. [1]
- The string.Template module. [2]

The scope of this PEP will be restricted to proposals for built-in string formatting operations (in other words, methods of the built-in string type).

The '%' operator is primarily limited by the fact that it is a binary operator, and therefore can take at most two arguments.
One of those arguments is already dedicated to the format string, leaving all other variables to be squeezed into the remaining argument. The current practice is to use either a dictionary or a tuple as the second argument, but as many people have commented [3], this lacks flexibility. The "all or nothing" approach (meaning that one must choose between only positional arguments, or only named arguments) is felt to be overly constraining.

While there is some overlap between this proposal and string.Template, it is felt that each serves a distinct need, and that one does not obviate the other. In any case, string.Template will not be discussed here.

Specification

The specification will consist of the following parts:

- Specification of a new formatting method to be added to the
  built-in string class.
- Specification of a new syntax for format strings.
- Specification of a new set of class methods to control the
  formatting and conversion of objects.
- Specification of an API for user-defined formatting classes.
- Specification of how formatting errors are handled.

Note on string encodings: Since this PEP is being targeted at Python 3.0, it is assumed that all strings are unicode strings, and that the use of the word 'string' in the context of this document will generally refer to a Python 3.0 string, which is the same as the Python 2.x unicode object.

If it should happen that this functionality is backported to the 2.x series, then it will be necessary to handle both regular strings as well as unicode objects. All of the function call interfaces described in this PEP can be used for both strings and unicode objects, and in all cases there is sufficient information to be able to properly deduce the output string type (in other words, there is no need for two separate APIs).
In all cases, the type of the template string dominates - that is, the result of the conversion will always result in an object that contains the same representation of characters as the input template string.

String Methods

The built-in string class will gain a new method, 'format', which takes an arbitrary number of positional and keyword arguments:

    "The story of {0}, {1}, and {c}".format(a, b, c=d)

Within a format string, each positional argument is identified with a number, starting from zero, so in the above example, 'a' is argument 0 and 'b' is argument 1. Each keyword argument is identified by its keyword name, so in the above example, 'c' is used to refer to the third argument.

Format Strings

Brace characters ('curly braces') are used to indicate a replacement field within the string:

    "My name is {0}".format('Fred')

The result of this is the string:

    "My name is Fred"

Braces can be escaped by doubling:

    "My name is {0} :-{{}}".format('Fred')

Which would produce:

    "My name is Fred :-{}"

The element within the braces is called a 'field'. Fields consist of a 'field name', which can either be simple or compound, and an optional 'conversion specifier'.

Simple and Compound Field Names

Simple field names are either names or numbers. If numbers, they must be valid base-10 integers; if names, they must be valid Python identifiers. A number is used to identify a positional argument, while a name is used to identify a keyword argument.

A compound field name is a combination of multiple simple field names in an expression:

    "My name is {0.name}".format(file('out.txt'))

This example shows the use of the 'getattr' or 'dot' operator in a field expression. The dot operator allows an attribute of an input value to be specified as the field value.

The types of expressions that can be used in a compound name have been deliberately limited in order to prevent potential security exploits resulting from the ability to place arbitrary Python expressions inside of strings.
Only two operators are supported, the '.' (getattr) operator, and the '[]' (getitem) operator. An example of the 'getitem' syntax:

    "My name is {0[name]}".format(dict(name='Fred'))

It should be noted that the use of 'getitem' within a string is much more limited than its normal use. In the above example, the string 'name' really is the literal string 'name', not a variable named 'name'. The rules for parsing an item key are the same as for parsing a simple name - in other words, if it looks like a number, then it's treated as a number; if it looks like an identifier, then it is used as a string. It is not possible to specify arbitrary dictionary keys from within a format string.

Conversion Specifiers

Each field can also specify an optional set of 'conversion specifiers' which can be used to adjust the format of that field. Conversion specifiers follow the field name, with a colon (':') character separating the two:

    "My name is {0:8}".format('Fred')

The meaning and syntax of the conversion specifiers depends on the type of object that is being formatted, however many of the built-in types will recognize a standard set of conversion specifiers.

Conversion specifiers can themselves contain replacement fields. For example, a field whose field width it itself a parameter could be specified via:

    "{0:{1}}".format(a, b, c)

Note that the doubled '}' at the end, which would normally be escaped, is not escaped in this case. The reason is because the '{{' and '}}' syntax for escapes is only applied when used *outside* of a format field. Within a format field, the brace characters always have their normal meaning.

The syntax for conversion specifiers is open-ended, since except than doing field replacements, the format() method does not attempt to interpret them in any way; it merely passes all of the characters between the first colon and the matching brace to the various underlying formatter methods.
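The field syntax described above survived largely intact into the released implementation (str.format in Python 2.6/3.0, where the draft's 'conversion specifier' was renamed 'format specifier'). A quick sketch of the behavior using the shipped method:

```python
# Exercising the field syntax described above with str.format() as it
# eventually shipped (Python 2.6+). Behavior on these points matches
# the draft, though the terminology later changed.
from collections import namedtuple

# positional and keyword fields
assert "The story of {0}, {1}, and {c}".format('a', 'b', c='d') == \
       "The story of a, b, and d"

# brace escaping by doubling
assert "My name is {0} :-{{}}".format('Fred') == "My name is Fred :-{}"

# compound field names: '.' (getattr) and '[]' (getitem)
Point = namedtuple('Point', 'x y')
assert "x is {0.x}".format(Point(1, 2)) == "x is 1"
assert "My name is {0[name]}".format(dict(name='Fred')) == "My name is Fred"

# a specifier that is itself a replacement field; the inner '}}' pair
# is not treated as an escape inside a field
assert "{0:{1}}".format(3.14159, '.3') == "3.14"
```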
Standard Conversion Specifiers

If an object does not define its own conversion specifiers, a standard set of conversion specifiers is used. These are similar in concept to the conversion specifiers used by the existing '%' operator, however there are also a number of significant differences. The standard conversion specifiers fall into three major categories: string conversions, integer conversions and floating point conversions.

The general form of a standard conversion specifier is:

    [[fill]align][sign][width][.precision][type]

The brackets ([]) indicate an optional field. The optional align flag can be one of the following:

    '<' - Forces the field to be left-aligned within the available
          space (This is the default.)
    '>' - Forces the field to be right-aligned within the available
          space.
    '=' - Forces the padding to be placed between immediately after
          the sign, if any. This is used for printing fields in the
          form '+000000120'.

Note that unless a minimum field width is defined, the field width will always be the same size as the data to fill it, so that the alignment option has no meaning in this case.

The optional 'fill' character defines the character to be used to pad the field to the minimum width. The alignment flag must be supplied if the character is a number other than 0 (otherwise the character would be interpreted as part of the field width specifier). A zero fill character without an alignment flag implies an alignment type of '='.

The 'sign' field can be one of the following:

    '+'  - indicates that a sign should be used for both positive as
           well as negative numbers
    '-'  - indicates that a sign should be used only for negative
           numbers (this is the default behaviour)
    ' '  - indicates that a leading space should be used on positive
           numbers
    '()' - indicates that negative numbers should be surrounded by
           parentheses

'width' is a decimal integer defining the minimum field width. If not specified, then the field width will be determined by the content.
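The fill, align, sign and width elements described above carried over almost unchanged into the format-spec mini-language that Python eventually shipped, so they can be sketched with the later format() builtin (note that the '()' sign option above did not survive into the final version):

```python
# A sketch of the fill/align/sign/width elements using the format()
# builtin from the implementation Python eventually shipped; the '()'
# sign option described in the draft was dropped before the final PEP.
assert format('x', '<5') == 'x    '   # '<' left-aligns (string default)
assert format('x', '>5') == '    x'   # '>' right-aligns
assert format(7, '*>4') == '***7'     # explicit fill character with align
assert format(120, '+') == '+120'     # '+' signs positive numbers too
assert format(120, '+010') == '+000000120'   # zero fill implies '=':
assert format(-120, '010') == '-000000120'   # padding goes after the sign
```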
The 'precision' field is a decimal number indicating how many digits should be displayed after the decimal point.

Finally, the 'type' determines how the data should be presented. If the type field is absent, an appropriate type will be assigned based on the value to be formatted ('d' for integers and longs, 'g' for floats, and 's' for everything else.)

The available string conversion types are:

    's' - String format. Invokes str() on the object. This is the
          default conversion specifier type.
    'r' - Repr format. Invokes repr() on the object.

There are several integer conversion types. All invoke int() on the object before attempting to format it. The available integer conversion types are:

    'b' - Binary. Outputs the number in base 2.
    'c' - Character. Converts the integer to the corresponding
          unicode character before printing.
    'd' - Decimal Integer. Outputs the number in base 10.
    'o' - Octal format. Outputs the number in base 8.
    'x' - Hex format. Outputs the number in base 16, using
          lower-case letters for the digits above 9.
    'X' - Hex format. Outputs the number in base 16, using
          upper-case letters for the digits above 9.

There are several floating point conversion types. All invoke float() on the object before attempting to format it. The available floating point conversion types are:

    'e' - Exponent notation. Prints the number in scientific
          notation using the letter 'e' to indicate the exponent.
    'E' - Exponent notation. Same as 'e' except it uses an upper
          case 'E' as the separator character.
    'f' - Fixed point. Displays the number as a fixed-point number.
    'F' - Fixed point. Same as 'f'.
    'g' - General format. This prints the number as a fixed-point
          number, unless the number is too large, in which case it
          switches to 'e' exponent notation.
    'G' - General format. Same as 'g' except switches to 'E' if the
          number gets too large.
    'n' - Number. This is the same as 'g', except that it uses the
          current locale setting to insert the appropriate number
          separator characters.
    '%' - Percentage. Multiplies the number by 100 and displays in
          fixed ('f') format, followed by a percent sign.

Objects are able to define their own conversion specifiers to replace the standard ones. An example is the 'datetime' class, whose conversion specifiers might look something like the arguments to the strftime() function:

    "Today is: {0:a b d H:M:S Y}".format(datetime.now())

Controlling Formatting

A class that wishes to implement a custom interpretation of its conversion specifiers can implement a __format__ method:

    class AST:
        def __format__(self, specifiers):
            ...

The 'specifiers' argument will be either a string object or a unicode object, depending on the type of the original format string. The __format__ method should test the type of the specifiers parameter to determine whether to return a string or unicode object. It is the responsibility of the __format__ method to return an object of the proper type.

string.format() will format each field using the following steps:

    1) See if the value to be formatted has a __format__ method. If
       it does, then call it.
    2) Otherwise, check the internal formatter within string.format
       that contains knowledge of certain builtin types.
    3) Otherwise, call str() or unicode() as appropriate.

User-Defined Formatting Classes

There will be times when customizing the formatting of fields on a per-type basis is not enough. An example might be an accounting application, which displays negative numbers in parentheses rather than using a negative sign.

The string formatting system facilitates this kind of application-specific formatting by allowing user code to directly invoke the code that interprets format strings and fields. User-written code can intercept the normal formatting operations on a per-field basis, substituting their own formatting methods.
For example, in the aforementioned accounting application, there could be an application-specific number formatter, which reuses the string.format templating code to do most of the work. The API for such an application-specific formatter is up to the application; here are several possible examples:

    cell_format("The total is: {0}", total)

    TemplateString("The total is: {0}").format(total)

Creating an application-specific formatter is relatively straightforward. The string and unicode classes will have a class method called 'cformat' that does all the actual work of formatting; the built-in format() method is just a wrapper that calls cformat. The type signature for the cformat function is as follows:

    cformat(template, format_hook, args, kwargs)

The parameters to the cformat function are:

    -- The format template string.
    -- A callable 'format hook', which is called once per field
    -- A tuple containing the positional arguments
    -- A dict containing the keyword arguments

The cformat function will parse all of the fields in the format string, and return a new string (or unicode) with all of the fields replaced with their formatted values.

The format hook is a callable object supplied by the user, which is invoked once per field, and which can override the normal formatting for that field. For each field, the cformat function will attempt to call the field format hook with the following arguments:

    format_hook(value, conversion)

The 'value' field corresponds to the value being formatted, which was retrieved from the arguments using the field name. The 'conversion' argument is the conversion spec part of the field, which will be either a string or unicode object, depending on the type of the original format string.

The field_hook will be called once per field. The field_hook may take one of two actions:

    1) Return a string or unicode object that is the result of the
       formatting operation.
    2) Return None, indicating that the field_hook will not process
       this field and the default formatting should be used. This
       decision should be based on the type of the value object, and
       the contents of the conversion string.

Error handling

The string formatting system has two error handling modes, which are controlled by the value of a class variable:

    string.strict_format_errors = True

The 'strict_format_errors' flag defaults to False, or 'lenient' mode. Setting it to True enables 'strict' mode. The current mode determines how errors are handled, depending on the type of the error. The types of errors that can occur are:

    1) Reference to a missing or invalid argument from within a
       field specifier. In strict mode, this will raise an
       exception. In lenient mode, this will cause the value of the
       field to be replaced with the string '?name?', where 'name'
       will be the type of error (KeyError, IndexError, or
       AttributeError). So for example:

       >>> string.strict_format_errors = False
       >>> print 'Item 2 of argument 0 is: {0[2]}'.format( [0,1] )
       "Item 2 of argument 0 is: ?IndexError?"

    2) Unused argument. In strict mode, this will raise an
       exception. In lenient mode, this will be ignored.

    3) Exception raised by underlying formatter. These exceptions
       are always passed through, regardless of the current mode.

Alternate Syntax

Naturally, one of the most contentious issues is the syntax of the format strings, and in particular the markup conventions used to indicate fields. Rather than attempting to exhaustively list all of the various proposals, I will cover the ones that are most widely used already.

- Shell variable syntax: $name and $(name) (or in some variants, ${name}). This is probably the oldest convention out there, and is used by Perl and many others. When used without the braces, the length of the variable is determined by lexically scanning until an invalid character is found.
This scheme is generally used in cases where interpolation is implicit - that is, in environments where any string can contain interpolation variables, and no special substitution function need be invoked. In such cases, it is important to prevent the interpolation behavior from occurring accidentally, so the '$' (which is otherwise a relatively uncommonly-used character) is used to signal when the behavior should occur. It is the author's opinion, however, that in cases where the formatting is explicitly invoked, less care needs to be taken to prevent accidental interpolation, in which case a lighter and less unwieldy syntax can be used.

- Printf and its cousins ('%'), including variations that add a field index, so that fields can be interpolated out of order.

- Other bracket-only variations. Various MUDs (Multi-User Dungeons) such as MUSH have used brackets (e.g. [name]) to do string interpolation. The Microsoft .Net libraries use braces ({}), and a syntax which is very similar to the one in this proposal, although the syntax for conversion specifiers is quite different. [4]

- Backquoting. This method has the benefit of minimal syntactical clutter, however it lacks many of the benefits of a function call syntax (such as complex expression arguments, custom formatters, etc.).

- Other variations include Ruby's #{}, PHP's {$name}, and so on.

Some specific aspects of the syntax warrant additional comments:

1) Backslash character for escapes. The original version of this PEP used backslash rather than doubling to escape a bracket. This worked because backslashes in Python string literals that don't conform to a standard backslash sequence such as '\n' are left unmodified. However, this caused a certain amount of confusion, and led to potential situations of multiple recursive escapes, i.e. '\\\\{' to place a literal backslash in front of a bracket.

2) The use of the colon character (':') as a separator for conversion specifiers.
This was chosen simply because that's what .Net uses.

Sample Implementation

A rough prototype of the underlying 'cformat' function has been coded in Python, however it needs much refinement before being submitted.

Backwards Compatibility

Backwards compatibility can be maintained by leaving the existing mechanisms in place. The new system does not collide with any of the method names of the existing string formatting techniques, so both systems can co-exist until it comes time to deprecate the older system.

References

    [1] Python Library Reference - String formatting operations
        http://docs.python.org/lib/typesseq-strings.html

    [2] Python Library Reference - Template strings
        http://docs.python.org/lib/node109.html

    [3] [Python-3000] String formatting operations in python 3k
        http://mail.python.org/pipermail/python-3000/2006-April/000285.html

    [4] Composite Formatting - [.Net Framework Developer's Guide]
        http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp?frame=true

Copyright

This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

From ncoghlan at gmail.com Sun Jun 11 07:31:18 2006
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 11 Jun 2006 15:31:18 +1000
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <448B6CB9.9050601@acm.org>
References: <448B6CB9.9050601@acm.org>
Message-ID: <448BAAA6.5060407@gmail.com>

Talin wrote:
> Conversion Specifiers
>
> Each field can also specify an optional set of 'conversion
> specifiers' which can be used to adjust the format of that field.
> Conversion specifiers follow the field name, with a colon (':')
> character separating the two:
>
>     "My name is {0:8}".format('Fred')
>
> The meaning and syntax of the conversion specifiers depends on the
> type of object that is being formatted, however many of the
> built-in types will recognize a standard set of conversion
> specifiers.
Given the changes below, this paragraph should now read something like,

    The meaning and syntax of the conversion specifiers depends on
    the type of object that is being formatted, however there is a
    standard set of conversion specifiers used for any object that
    does not override them.

> Conversion specifiers can themselves contain replacement fields.
> For example, a field whose field width it itself a parameter
> could be specified via:

Typo: s/width it itself/width is itself/

> The syntax for conversion specifiers is open-ended, since except
> than doing field replacements, the format() method does not
> attempt to interpret them in any way; it merely passes all of the
> characters between the first colon and the matching brace to
> the various underlying formatter methods.

Again, this paragraph has been overtaken by events.

    The syntax for conversion specifiers is open-ended, since a
    class can override the standard conversion specifiers. In such
    cases, the format() method merely passes all of the characters
    between the first colon and the matching brace to the relevant
    underlying formatting method.

> Standard Conversion Specifiers

It's probably worth avoiding describing the elements of the conversion specifier as fields - something neutral like 'element' should do.

> '=' - Forces the padding to be placed between immediately
>       after the sign, if any. This is used for printing fields
>       in the form '+000000120'.

Typo: s/placed between immediately/placed immediately/

> The 'precision' field is a decimal number indicating how many
> digits should be displayed after the decimal point.

Someone pointed out that for string conversions ('s' & 'r'), this field should determine how many characters are displayed.

    The 'precision' is a decimal number indicating how many digits
    should be displayed after the decimal point in a floating point
    conversion. In a string conversion the field indicates how many
    characters will be used from the field content.
The precision is ignored for integer conversions.

> There are several integer conversion types. All invoke int() on
> the object before attempting to format it.

Having another look at existing str-% behaviour, this should instead say:

    There are several integer conversion types. All will raise
    TypeError if the supplied object does not have an __index__
    method.

> There are several floating point conversion types. All invoke
> float() on the object before attempting to format it.

Similar to integers, this should instead say:

    There are several floating point conversion types. All will
    raise TypeError if the supplied object is not a float or
    decimal instance.

> Controlling Formatting

I'm becoming less and less satisfied with the idea that to get a string version of a float, I do this:

    x = str(val)

But if I want to control the precision, I have to write:

    x = "{0:.3}".format(val)  # Even worse than the current "%.3f" % val!!

Why can't I instead write:

    x = str(val, ".3")

IOW, why don't we change the signature of 'str' to accept a conversion specifier as an optional second argument?

Then the interpretation of conversion specifiers in format strings is straightforward - the conversion specifier becomes the second argument to str().

Then it would be str() that does the dispatch of the standard conversion specifiers as described above if __format__ is not provided. Here's the description of controlling formatting in that case:

------------------------------------------------
Controlling Formatting

A class that wishes to implement a custom interpretation of its conversion specifier can implement a __format__ method:

    class AST:
        def __format__(self, specifier):
            ...

str.format() will always format each field by invoking str() with two arguments: the value to be formatted and the conversion specifier. If the field does not include a conversion specifier then it defaults to None.
The signature of str() is updated to accept a conversion specifier as the second argument (defaulting to None).

When the conversion specifier is None, the __str__() method of the passed in object is invoked (if present) falling back to __repr__() otherwise (aside from using unicode instead of 8-bit strings, this is unchanged from Python 2.x).

If the conversion specifier is not None, then the object's __format__() method is invoked if present. Otherwise, the standard conversion specifiers described above are used.

This means that where, in Python 2.x, controlling the precision of a float's string output required switching from the str() builtin to string formatting, Python 3k permits the conversion specifier to be added to the call to the builtin.

    x = str(val)        # Unformatted
    x = str(val, '.3')  # Limited to 3 decimal places

This works for types with custom format specifiers, too:

    today = str(datetime.now(), 'a b d H:M:S Y')

> User-Defined Formatting Classes
>
> There will be times when customizing the formatting of fields
> on a per-type basis is not enough. An example might be an
> accounting application, which displays negative numbers in
> parentheses rather than using a negative sign.

This is now a bad example, because we moved it into the standard conversion specifiers :)

> The format hook is a callable object supplied by the user, which
> is invoked once per field, and which can override the normal
> formatting for that field. For each field, the cformat function
> will attempt to call the field format hook with the following
> arguments:
>
>     format_hook(value, conversion)

With my str() proposal above, the default format hook becomes 'str' itself.

Cheers,
Nick.
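Python 3 ultimately adopted a close cousin of this proposal: a separate format() builtin, rather than a second argument to str(), which dispatches to __format__ when defined and falls back to the standard specifiers otherwise. A sketch of that dispatch (the Money class is invented for illustration):

```python
# Sketch of the dispatch described above, using the format() builtin
# Python eventually shipped instead of a two-argument str().
# The Money class is a hypothetical example, not from the thread.

assert format(3.14159, '.3f') == '3.142'   # standard specifier fallback

class Money:
    def __init__(self, amount):
        self.amount = amount

    def __format__(self, spec):
        # custom interpretation of the specifier: 'parens' renders
        # negative amounts accounting-style
        if spec == 'parens' and self.amount < 0:
            return '(%.2f)' % -self.amount
        return format(self.amount, spec or '.2f')

assert format(Money(-3.5), 'parens') == '(3.50)'
assert format(Money(3.5), '') == '3.50'
```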
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From talin at acm.org Sun Jun 11 08:40:08 2006
From: talin at acm.org (Talin)
Date: Sat, 10 Jun 2006 23:40:08 -0700
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <448BAAA6.5060407@gmail.com>
References: <448B6CB9.9050601@acm.org> <448BAAA6.5060407@gmail.com>
Message-ID: <448BBAC8.4090403@acm.org>

Nick Coghlan wrote:
>> Conversion Specifiers

By the way, good feedback. I've incorporated most of the text changes into the PEP. I'd like to discuss a few of your suggestions in more detail before proceeding.

>> There are several integer conversion types. All invoke int() on
>> the object before attempting to format it.
>
> Having another look at existing str-% behaviour, this should instead say:
>
> There are several integer conversion types. All will raise TypeError
> if the supplied object does not have an __index__ method.

This is a new 2.5 feature, correct?

>> There are several floating point conversion types. All invoke
>> float() on the object before attempting to format it.
>
> Similar to integers, this should instead say:
>
> There are several floating point conversion types. All will raise
> TypeError if the supplied object is not a float or decimal instance.

This seems to close off opportunities for type-punning, which some on this list have asked for. If you want to print an int as a float, well, why not?

>> Controlling Formatting
>
> I'm becoming less and less satisfied with the idea that to get a string
> version of a float, I do this:
>
>     x = str(val)
>
> But if I want to control the precision, I have to write:
>
>     x = "{0:.3}".format(val)  # Even worse than the current "%.3f" % val!!

I've been thinking about this very issue. A lot. I noticed that PyString_Format has a lot of internal functionality which has no pure-python equivalent.
Like you say, it would be nice to have a simple method to format a single scalar value. I wasn't thinking about adding a parameter to str(), but instead some newly-named function (e.g. 'format'), although I realize that has compatibility problems. I think that a second param to str() is better; however, you will have to compete against any other possible claims for that valuable second parameter.

As an aside, have a look at this function which I was just now working on. I'm not entirely sure it is correct, but I think you can see the motivation behind it:

    from math import floor, log

    # Pure python implementation of the C printf 'e' format specifier
    def sci(val, precision, letter='e'):
        sign = ''
        if val < 0:
            sign = '-'
            val = -val
        exp = int(floor(log(val, 10)))
        val *= 10 ** -exp
        if val == floor(val):
            val = int(val)
        else:
            val = round(val, precision)
        if val >= 10.0:
            exp += 1
            val = val * 0.1
        esign = '+'
        if exp < 0:
            exp = -exp
            esign = '-'
        if exp < 10:
            exp = '0' + str(exp)
        else:
            exp = str(exp)
        return sign + str(val) + letter + esign + exp

> Why can't I instead write:
>
>     x = str(val, ".3")
>
> IOW, why don't we change the signature of 'str' to accept a conversion
> specifier as an optional second argument?
>
> Then the interpretation of conversion specifiers in format strings is
> straightforward - the conversion specifier becomes the second argument
> to str().

So my only question then is: What about classes that override __str__? Do they get the conversion specifier or not?

One way to resolve this would be to go even further and bury the call to __format__ inside str! In other words, if you pass a second argument to str(), it will first check to see if there's a __format__ function, and if not, then it will fall back to __str__.

> Then it would be str() that does the dispatch of the standard conversion
> specifiers as described above if __format__ is not provided.

My biggest concern about this is that PEP 3101 is getting kind of large, because we keep thinking of new issues related to string formatting.
I'm wondering if maybe this idea of yours could be split off into a
separate PEP. Other than that, I think it's a pretty good idea.

-- Talin

From collinw at gmail.com  Sun Jun 11 14:00:57 2006
From: collinw at gmail.com (Collin Winter)
Date: Sun, 11 Jun 2006 14:00:57 +0200
Subject: [Python-3000] Third-party annotation libraries vs the stdlib
Message-ID: <43aa6ff70606110500n616b3f4cya30d114417ecc36e@mail.gmail.com>

In working on the annotations PEP, I've run into more issues concerning
the balance of responsibility between third-party libraries and the
stdlib.

So far, the trend has been to push responsibility for manipulating and
interpreting annotations into libraries, keeping core Python free from
any built-in semantics for the annotation expressions. However, nearly
all the issues that have been discussed on this list go against the
flow: the proposed Function() and Generator() classes, used for
expressing higher-order functions and generator functions,
respectively; type operations, like "T1 & T2" or "T1 | T2"; and the
type parameterisation mechanism.

Shipping any of these things with Python raises a number of other
issues/questions that would need to be dealt with:

1. If Function() and Generator() ship in the stdlib, where do they go?
In types? In a new module?

Also, if Function() and Generator() come with Python, how do we make
sure that third-party libraries can use them with minimal extra
overhead (e.g., wrapping layers to make the shipped Function() and
Generator() objects compatible with the library's internal
architecture)?

2. If "T1 & T2" is possible with core Python (i.e., no external
libraries), what does "type(T1 & T2)" return? Is "type(T1 & T2)" the
same as "type(T1 | T2)"?

What can you do with these objects in core Python? Can you subclass
from "T1 & T2"? Does "issubclass(T1, T1 | T2)" return True? What about
"isinstance(5, int | dict)"?

Are "T1 & T2" and "T1 | T2" the only defined operations? What about
xor or not?

3.
Similar questions are raised by having the "T1[x, y, z]"
parameterisation method present in core Python: what is the type of
"tuple[int, int]"? What can you do with it? Does "isinstance((5, 6, 7),
tuple[int, int, int])" return True? Do they have the same & and |
operations as other built-in types? What happens when you mix
parameterised types and non-parameterised types, e.g., "tuple[int,
(int, int)]"?

Based on the complexity involved in specifying all of these issues, I
say we punt: let the third-party libraries handle this. Addressing the
above issues from this perspective:

1. Shipping Function() and Generator() objects is a (relative) piece
of cake.

2. In my own experience with this kind of stuff, there's very little
need to express and-ing and or-ing of type expressions. Third-party
libraries can provide this on their own via And() and Or()
classes/functions/whatevers. If some particular library absolutely
insists on using the & and | operators, there might be some metaclass
wizardry that could accomplish this, but I'm not saying I know what it
is.

3. The questions raised by the special type parameterisation mechanism
can be removed by simply omitting the mechanism. In particular, using
regular tuples/lists/dicts/etc instead of the tuple[]/list[]/dict[]
spelling completely removes the issue of mixing parameterised and
non-parameterised expressions.

To sum up: I propose that -- to combat these issues -- I limit the PEP
to discussing how to supply annotations (the annotation syntax and C
API) and how to read them back later (via __signature__).

Collin Winter

From mcherm at mcherm.com  Mon Jun 12 14:58:59 2006
From: mcherm at mcherm.com (Michael Chermside)
Date: Mon, 12 Jun 2006 05:58:59 -0700
Subject: [Python-3000] iostack, continued
Message-ID: <20060612055859.eocfygr98rg0scoo@login.werra.lunarpages.com>

Greg Ewing writes:
> Be careful -- in Unix it's possible for different file
> descriptors to share the same position pointer.

Really? I had no idea.
How does one invoke this behavior? How does current python (2.4)
behave when subjected to this?

-- Michael Chermside

From steven.bethard at gmail.com  Mon Jun 12 20:03:21 2006
From: steven.bethard at gmail.com (Steven Bethard)
Date: Mon, 12 Jun 2006 12:03:21 -0600
Subject: [Python-3000] Assignment decorators, anyone?
In-Reply-To: <4487F88A.7080307@canterbury.ac.nz>
References: <4487F88A.7080307@canterbury.ac.nz>
Message-ID: 

On 6/8/06, Greg Ewing wrote:
> I think I've come across a use case for @decorators
> on assignment statements.
>
> I have a function which is used like this:
>
>     my_property = overridable_property('my_property', "This is my property.")
>
> However, it sucks a bit to have to write the name of
> the property twice. I just got bitten by changing the
> name of one of my properties and forgetting to change
> it in both places.
>
> If decorators could be applied to assignment statements,
> I'd be able to write it as something like
>
>     @overridable_property
>     my_property = "This is my property."
>
> (This would require the semantics of assignment
> decoration to be defined so that the assigned name
> is passed to the decorator function as well as the
> value being assigned.)
>
> On the other hand, maybe this is a use case for
> the "make" statement that was proposed earlier.

Yes, `PEP 359`_ provided functionality like this, but since it's
withdrawn, another option for you is something like::

    class my_property:
        __metaclass__ = overridable_property
        text = "This is my property."

where overridable_property looks something like:

    def overridable_property(name, bases, namespace):
        text = namespace.pop('text')
        # do whatever you normally do with name and text

(The metaclass is called with the class name, its bases, and its
namespace dict; this is basically all the "make" statement was doing
under the covers anyway.) Of course, the end result is that you use a
class statement to create something that isn't a class, but at least
you manage to avoid writing "my_property" twice.

..
_PEP 359: http://www.python.org/dev/peps/pep-0359/

STeVe

--
Grammar am for people who can't think for myself.
    --- Bucky Katt, Get Fuzzy

From tomerfiliba at gmail.com  Mon Jun 12 21:06:38 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Mon, 12 Jun 2006 21:06:38 +0200
Subject: [Python-3000] enhanced descriptors, part 2
Message-ID: <1d85506f0606121206l1a7fb3abq76d4a916b36a764@mail.gmail.com>

hrrrmpff. there really is a need for "enhanced descriptors", or
something like that. i'm having serious trouble implementing the
position property, as python is currently limited in this area.

the rules were:

* when you assign a positive integer, it's an absolute position to
  seek to
* when you assign a negative integer, it's relative to the end, as is
  the case with slices
* when you assign None, it's the ultimate last position -- seek(0,
  "end"), although you would use f.END instead of None directly
* when you use the +=/-= operators, it's relative to the current
  position (if optimized via __iadd__, can reduce one unnecessary
  system call)

but descriptors don't support augmented __set__()ing. one solution
would be to return an int-like object, where __iadd__ would seek
relative to the current position. aside from being slower than
expected, complicating __set__, and requiring position caching, this
has a major drawback:

    f.position += 4   # this assignment seeks
    p = f.position
    p += 4            # and this assignment seeks as well!

so that's out of the question, and we'll have to suffer two system
calls, at least in the experimental branch. maybe the C-branch could
utilize under-the-hood tricks to avoid that.

the current code of the descriptor looks like this:

    class PositionDesc(object):
        def __get__(self, obj, cls):
            if obj is None:
                return self
            return obj.tell()

        def __set__(self, obj, value):
            if value is None:
                obj.seek(0, "end")
            elif value < 0:
                obj.seek(value, "end")
            else:
                obj.seek(value, "start")

but now we come to another problem... files became cyclic! or sorta...
if f.position < x, then f.position -= x would assign a negative value
to f.position. this, in turn, seeks relative to the end of the file,
thus making the file behave like a semi-cyclic entity with a
not-so-intuitive behavior... for example, assuming a file size of 100,
and a current position of 70:

    pos = 70
    pos -= 71  ==>  pos = -1  ==>  pos = (100 - 1)  ==>  pos = 99

baaaaaah!

in the original design, i wanted to raise an exception if seeking
relative to the current position got negative... but due to the
aforementioned technical limitations, it's not possible.

the whole issue could be solved if the descriptor protocol supported
augmented assignment -- but it requires, of course, a drastic change
to the language, something like the suggested STORE_INPLACE_ATTR or
the __iattr__ suggested by Greg. will Guido pronounce on his choice
(or discard it altogether)?

-tomer
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060612/c36705e9/attachment.htm

From brett at python.org  Mon Jun 12 22:41:14 2006
From: brett at python.org (Brett Cannon)
Date: Mon, 12 Jun 2006 13:41:14 -0700
Subject: [Python-3000] We should write a PEP on what goes into the stdlib
Message-ID: 

Right now a discussion is going on in python-dev about what is
reasonable for special needs of developers who bring in modules to the
stdlib. This of course brings up the idea of slimming down the stdlib,
having sumo releases, etc.

That makes me think perhaps we should start thinking about collectively
coming up with guidelines (which end up in a PEP; and yes, I am
volunteering to write it) on deciding what is needed to accept a module
into the stdlib. We can then use this to go through what is already
there, trim out the fluff, and get a list going of what will end up
disappearing early on so people can know long in advance.

Now this has nothing to do with a stdlib renaming or anything.
This is purely about figuring out what is required for accepting a
module and for pruning out what we don't want that we currently have.

So, to start this discussion, here are my ideas...

First, the modules must have been in the wild and used by the
community. This has worked well so far by making sure the code is
stable and that the API is good.

Second, the code must follow Python coding guidelines. This means not
just proper formatting and naming, but also that good unit tests are
included as well. It also means that the module might need to be
renamed. Documentation must also be provided in the proper format
before acceptance. All of this must be done *before* anything is
checked in (use a branch if needed to hold the work on the transition).

Third, a PEP discussing why the module should go in. Basically, a
documented case for why the module should be distributed in Python. It
also gives python-dev a central document to read and refer to when
voting on whether something should be let into the stdlib. Can also
document differences between the public version and the one in the
stdlib.

Fourth, the contributor must have signed a contribution agreement.

Fifth, contributors realize that Python developers have any and all
rights to check in changes to the code. They can do something like how
Barry maintains external email releases and document that in the PEP.
This is probably one of the more contentious ideas laid out here. But
we need to worry about keeping the stdlib easily maintained; since
python-dev takes on responsibility for code once it's checked in, we
need to keep this as simple as possible. Basically this eliminates PEP
360 for Py3K.

Now, another thing is backwards compatibility. Do we worry about
portability to older versions like we do now with PEP 291, or do all
new modules checked in give up the right to force developers to keep
the code compatible to a certain version? This is another ease of
maintenance/nice to external release issue.
And that external release/ease of maintenance is going to be the
sticky point in all of this. We need to find a good balance or we risk
scaring away people from contributing code.

-Brett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060612/6e5b2c70/attachment.htm

From rhettinger at ewtllc.com  Mon Jun 12 23:44:27 2006
From: rhettinger at ewtllc.com (Raymond Hettinger)
Date: Mon, 12 Jun 2006 14:44:27 -0700
Subject: [Python-3000] We should write a PEP on what goes into the stdlib
In-Reply-To: 
References: 
Message-ID: <448DE03B.2050205@ewtllc.com>

Brett Cannon wrote:

> This is purely about figuring out what is required for accepting a
> module and for pruning out what we don't want that we currently have.

Well intentioned, but futile. Each case ultimately gets decided on its
merits. Any one reason for inclusion or exclusion can be outweighed by
some other reason. There isn't a consistent ruleset that explains
clearly why decimal, elementtree, email, and textwrap were included
while Cheetah, Twisted, numpy, and BeautifulSoup were not.

Overly general rules are likely to be rife with exceptions and amount
to useless administrivia. I don't think these contentious issues can
be decided in advance. The specifics of each case are more relevant
than a laundry list of generalizations.

> First, the modules must have been in the wild and used by the
> community. This has worked well so far by making sure the code is
> stable and that the API is good.

Nice guideline, but the decimal module did not meet that test. For
AST, the stability criterion was tossed and the ultimate API is still
in limbo. Itertools went in directly. However, the tried and true
mxTools never went in, and the venerable bytecodehacks never had a
chance.

> Second, the code must follow Python coding guidelines.

We already have a PEP for that.

> Third, a PEP discussing why the module should go in.
We don't need a PEP for every module. If the python-dev discussion says
we want it and Guido approves, then it is a done deal.

> Now, another thing is backwards compatibility.

Isn't there already a PEP where people can add portability restrictions
(i.e. having decimal continue to work on Py2.3)?

From brett at python.org  Tue Jun 13 00:23:08 2006
From: brett at python.org (Brett Cannon)
Date: Mon, 12 Jun 2006 15:23:08 -0700
Subject: [Python-3000] We should write a PEP on what goes into the stdlib
In-Reply-To: <448DE03B.2050205@ewtllc.com>
References: <448DE03B.2050205@ewtllc.com>
Message-ID: 

One thing I forgot to say in the initial email was that I am being
intentionally heavy-handed with restrictions on people to get some
dialog going and see where people think things are okay and not.

On 6/12/06, Raymond Hettinger wrote:
>
> Brett Cannon wrote:
>
> > This is purely about figuring out what is required for accepting a
> > module and for pruning out what we don't want that we currently have.
>
> Well intentioned, but futile. Each case ultimately gets decided on its
> merits. Any one reason for inclusion or exclusion can be outweighed by
> some other reason. There isn't a consistent ruleset that explains
> clearly why decimal, elementtree, email, and textwrap were included
> while Cheetah, Twisted, numpy, and BeautifulSoup were not.

True. And notice none of my points say that some package must have
been used in the community for X number of months or have Y number of
users across Z operating systems. That is not the point of the PEP.
The points I laid out are not that rigid and are basically what we
follow, but centralized in a single place. Plus it codifies how we
want to handle contributed code, in terms of how flexible we are
willing to be about how we touch contributors' code in the repository.
A PEP on this would give us something to point to when people email
the list saying, "I want to get this module added to the stdlib"; it
would keep us from repeating the same lines over and over, and it
would let people know what we expect.

> Overly general rules are likely to be rife with exceptions and amount
> to useless administrivia. I don't think these contentious issues can
> be decided in advance. The specifics of each case are more relevant
> than a laundry list of generalizations.

I don't think the points made are that unreasonable. Following
formatting guidelines, signing a contributor agreement, etc. are not
useless administrivia. The PEP requirement maybe. And stating what
python-dev is willing to do in terms of maintenance I think is totally
reasonable to state up front.

> > First, the modules must have been in the wild and used by the
> > community. This has worked well so far by making sure the code is
> > stable and that the API is good.
>
> Nice guideline, but the decimal module did not meet that test.

Right, so? The decimal module would have most likely been picked up
eventually; maybe not 2.3 but at some point. Having it available
during dev would have counted as use in the community anyway.

> For AST, the stability criterion was tossed and the ultimate API is
> still in limbo.

The AST is not a stdlib thing, in my opinion. That was back-end stuff.
Plus you can't provide AST access directly without mucking with the
internals anyway, so that basically requires dev within at least a
branch.

> Itertools went in directly.

Once again, fine, but would that have prevented it from ever going in?
I doubt that. I know you did a lot of asking the community for what to
include and such. Had you done that externally while working on it and
then proposed it to python-dev once you were satisfied with the
implementation, it probably would have gone right in.

> However, the tried and true mxTools never went in, and the venerable
> bytecodehacks never had a chance.
> > Second, the code must follow Python coding guidelines.
>
> We already have a PEP for that.

Yeah, and yet we still accept stuff that does not necessarily follow
those PEPs. I am not saying we need to write those PEPs, again, but
say that those PEPs *must* be followed.

> > Third, a PEP discussing why the module should go in.
>
> We don't need a PEP for every module. If the python-dev discussion
> says we want it and Guido approves, then it is a done deal.

Look at pysqlite. We went through that discussion twice. Most module
discussions end up being rather long, and having a single place where
stuff is written would be nice. But I don't view this as a necessary
step.

> > Now, another thing is backwards compatibility.
>
> Isn't there already a PEP where people can add portability
> restrictions (i.e. having decimal continue to work on Py2.3)?

Yep, PEP 291. What I am asking here is whether contributors should be
able to request compatibility restrictions on the source code at all.

As I said, I purposely went heavy-handed in this to get feedback from
people. The points I made are all very python-dev friendly and not
external developer friendly. But we need to discuss that to get an
idea of what python-dev is willing to do to get external contributions
for the stdlib.

-Brett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060612/2a122c84/attachment.htm

From rudyrudolph at excite.com  Tue Jun 13 00:49:45 2006
From: rudyrudolph at excite.com (Rudy Rudolph)
Date: Mon, 12 Jun 2006 18:49:45 -0400 (EDT)
Subject: [Python-3000] PEP 3101 update
Message-ID: <20060612224945.8AF9B2F5C3@xprdmxin.myway.com>

Is it possible to support two additional string formatting features
without overly complicating the whole thing? It would be nice to have
align centered and align on decimal point.

Centered would add fill chars both before and after the value.
If an odd number of fill chars must be added, the extra char is after
the value.

Align on decimal is not necessary with 'f' formatting because you can
right align the fractional part and get the same effect. However, with
'g' formatting the number of digits after the point may vary and there
may not even be a decimal point. In this case, a column of numbers
should be aligned at the last digit of the integer part.

If these are desirable, we would need to choose suitable symbols for
align. Either '><' or '|' seems appropriate for centered.

Rudy

_______________________________________________
Join Excite! - http://www.excite.com
The most personalized portal on the Web!

From greg.ewing at canterbury.ac.nz  Tue Jun 13 02:45:51 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 13 Jun 2006 12:45:51 +1200
Subject: [Python-3000] iostack, continued
In-Reply-To: <20060612055859.eocfygr98rg0scoo@login.werra.lunarpages.com>
References: <20060612055859.eocfygr98rg0scoo@login.werra.lunarpages.com>
Message-ID: <448E0ABF.7040704@canterbury.ac.nz>

Michael Chermside wrote:
> Greg Ewing writes:
>
>> Be careful -- in Unix it's possible for different file
>> descriptors to share the same position pointer.
>
> Really? I had no idea.
>
> How does one invoke this behavior?

It happens every time you fork, and the child process inherits copies
of the stdin/out/err descriptors. If e.g. stdin is coming from a disk
file, and the child reads part of the file, and then the parent reads
some more, it will start reading where the child left off.

Another way is to use dup() or dup2() to make a copy of a file
descriptor.

> How does current python (2.4)
> behave when subjected to this?

Calls in the os module behave the same as their underlying system
calls. File objects behave however the platform's C stdio library
behaves. Buffering makes things a bit messy. Usually it's not a
problem, because normally parent and child processes don't both read
or write the same disk file.
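To see the shared position pointer directly, here is a small
demonstration using os.dup() from Python (POSIX semantics; purely
illustrative):

```python
import os
import tempfile

# Create a small file to experiment with.
fd, path = tempfile.mkstemp()
os.write(fd, b'abcdefghij')
os.close(fd)

fd1 = os.open(path, os.O_RDONLY)
fd2 = os.dup(fd1)   # fd2 shares fd1's file offset

os.read(fd1, 3)     # advance the offset through fd1

# The duplicate sees the same offset:
shared_offset = os.lseek(fd2, 0, os.SEEK_CUR)
print(shared_offset)   # prints 3

os.close(fd1)
os.close(fd2)
os.remove(path)
```

Both descriptors refer to the same open file description, so reading
through one moves the position seen by the other.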
If they do, some flushing calls might be necessary.

-- Greg

From greg.ewing at canterbury.ac.nz  Tue Jun 13 04:05:57 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 13 Jun 2006 14:05:57 +1200
Subject: [Python-3000] PEP 3101 update
In-Reply-To: <20060612224945.8AF9B2F5C3@xprdmxin.myway.com>
References: <20060612224945.8AF9B2F5C3@xprdmxin.myway.com>
Message-ID: <448E1D85.2050409@canterbury.ac.nz>

Rudy Rudolph wrote:
> However, with 'g' formatting the number of digits after the point may
> vary and there may not even be a decimal point. In this case, a
> column of numbers should be aligned at the last digit of the integer
> part.

How do you do that when you're formatting one string at a time?

-- Greg

From tony at printra.net  Tue Jun 13 05:06:40 2006
From: tony at printra.net (Tony Lownds)
Date: Mon, 12 Jun 2006 20:06:40 -0700
Subject: [Python-3000] Third-party annotation libraries vs the stdlib
In-Reply-To: <43aa6ff70606110500n616b3f4cya30d114417ecc36e@mail.gmail.com>
References: <43aa6ff70606110500n616b3f4cya30d114417ecc36e@mail.gmail.com>
Message-ID: <9C7E76CA-F4A0-4C94-B312-891F5A9B93BB@printra.net>

On Jun 11, 2006, at 5:00 AM, Collin Winter wrote:
> In working on the annotations PEP, I've run into more issues
> concerning the balance of responsibility between third-party libraries
> and the stdlib.
>
> So far, the trend has been to push responsibility for manipulating and
> interpreting annotations into libraries, keeping core Python free from
> any built-in semantics for the annotation expressions. However, nearly
> all the issues that have been discussed on this list go against the
> flow: the proposed Function() and Generator() classes, used for
> expressing higher-order functions and generator functions,
> respectively; type operations, like "T1 & T2" or "T1 | T2"; and the
> type parameterisation mechanism.
>
> Shipping any of these things with Python raises a number of other
> issues/questions that would need to be dealt with:
>
> 1.
If Function() and Generator() ship in the stdlib, where do they go?
> In types? In a new module?

The types module seems like a decent place.

> Also, if Function() and Generator() come with Python, how do we make
> sure that third-party libraries can use them with minimal extra
> overhead (e.g., wrapping layers to make the shipped Function() and
> Generator() objects compatible with the library's internal
> architecture)?

That's an issue for third party libraries.

> 2. If "T1 & T2" is possible with core Python (i.e., no external
> libraries), what does "type(T1 & T2)" return? Is "type(T1 & T2)" the
> same as "type(T1 | T2)"?

These operations could return objects that describe the types and
nothing else. It doesn't make sense for the result of T1 | T2 to be a
type object.

    class TypeUnion:
        def __init__(self, *types):
            self.types = types
        def __repr__(self):
            return '(%s)' % ' | '.join(map(repr, self.types))
        def __or__(self, other):
            ...

> What can you do with these objects in core Python? Can you subclass
> from "T1 & T2"? Does "issubclass(T1, T1 | T2)" return True? What about
> "isinstance(5, int | dict)"?

isinstance could be extended to work with TypeUnion objects. Only type
objects are sensible for issubclass. It makes more sense for another
predicate to determine subtype relationships. And I think it makes
more sense for third party packages to provide the specific
definitions and mechanisms for determining subtype relationships.
Coming to a common and usable definition would be too difficult
otherwise.

I wanted to suggest that core Python's isinstance be extended to work
with types and subtyping definitions be left up to third party
packages, but that won't work. You can't tell whether a given callable
is a valid instance of a Function() without a subtype predicate.

> Are "T1 & T2" and "T1 | T2" the only defined operations? What about
> xor or not?

I can't think of any useful semantics for this.

> 3.
Similar questions are raised by having the "T1[x, y, z]"
> parameterisation method present in core Python: what is the type of
> "tuple[int, int]"? What can you do with it?

It could be a type object that is a subclass of tuple. It could also
be an object that describes the type, like TypeUnion above.

> Does "isinstance((5, 6, 7), tuple[int, int, int])" return True?

For new style classes, isinstance(obj, T) is roughly equivalent to
issubclass(type(obj), T). Let's say tuple[int, int, int] is a subclass
of tuple. The result of type((5, 6, 7)) won't change -- it's the tuple
type object. So isinstance((5, 6, 7), tuple[int, int, int]) would
return False. That is misleading. I think it would be better if
tuple[int, int] returned something that isn't a type, so that uses of
isinstance are not misleading.

Another idea would be to provide a different way to check that an
instance is a valid member of a type. I bet this would get rejected
quickly.

    >>> int.ismember(5)
    True
    >>> (int | dict).ismember(5)
    True

> Do they have the same & and | operations as other built-in types?

Sure, why not.

> What happens when you mix parameterised types and non-parameterised
> types, e.g., "tuple[int, (int, int)]"?

Is the question whether the parameterization mechanism should enforce
that its parameters are valid types?

> Based on the complexity involved in specifying all of these issues, I
> say we punt: let the third-party libraries handle this.
[...]
> To sum up: I propose that -- to combat these issues -- I limit the PEP
> to discussing how to supply annotations (the annotation syntax and C
> API) and how to read them back later (via __signature__).

+1

I think the annotations PEP should definitely punt on this and also
punt on definitions of And(), Function(), Generator(), etc. Unless
those are what is returned by __signature__? Annotations syntax and
the __signature__ object API could also be independent PEPs.

It would be really nice to have a common language for type annotations.
This lets authors of type-annotated code use the same annotations with
a variety of third-party packages. From the issues above this seems
hard to accomplish in a way that integrates well with the rest of
Python.

-Tony

From thomas at python.org  Tue Jun 13 09:39:40 2006
From: thomas at python.org (Thomas Wouters)
Date: Tue, 13 Jun 2006 09:39:40 +0200
Subject: [Python-3000] [Python-Dev] xrange vs. int.__getslice__
In-Reply-To: <448E6A74.3010409@renet.ru>
References: <448E6A74.3010409@renet.ru>
Message-ID: <9e804ac0606130039o29ce1f39neff8af92e8faeff7@mail.gmail.com>

On 6/13/06, Vladimir 'Yu' Stepanov wrote:
>
> Has the xrange function ever bothered you? :) I suggest replacing it.

http://www.python.org/dev/peps/pep-0204/

(If you must really discuss this, which would probably be futile and
senseless, please do it on python-3000 only.)

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help
me spread!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060613/0625c63e/attachment.html

From vys at renet.ru  Tue Jun 13 09:34:12 2006
From: vys at renet.ru (Vladimir 'Yu' Stepanov)
Date: Tue, 13 Jun 2006 11:34:12 +0400
Subject: [Python-3000] xrange vs. int.__getslice__
Message-ID: <448E6A74.3010409@renet.ru>

Has the xrange function ever bothered you? :) I suggest replacing it.

---------------------------------------------
for i in xrange(100): pass
vs.
for i in int[:100]: pass
---------------------------------------------

---------------------------------------------
for i in xrange(1000, 1020): pass
vs.
for i in int[1000:1020]: pass
---------------------------------------------

---------------------------------------------
for i in xrange(200, 100, -2): pass
vs.
for i in int[200:100:-2]: pass
---------------------------------------------

From vys at renet.ru  Tue Jun 13 10:11:17 2006
From: vys at renet.ru (Vladimir 'Yu' Stepanov)
Date: Tue, 13 Jun 2006 12:11:17 +0400
Subject: [Python-3000] [Python-Dev] xrange vs. int.__getslice__
In-Reply-To: <9e804ac0606130039o29ce1f39neff8af92e8faeff7@mail.gmail.com>
References: <448E6A74.3010409@renet.ru>
	<9e804ac0606130039o29ce1f39neff8af92e8faeff7@mail.gmail.com>
Message-ID: <448E7325.4010000@renet.ru>

Thomas Wouters wrote:
> http://www.python.org/dev/peps/pep-0204/
>
> (If you must really discuss this, which would probably be futile and
> senseless, please do it on python-3000 only.)

It certainly looks very similar. But PEP 204 demands a change to the
parser and presents its new design as a replacement for the range
function, whereas my proposal can be considered a replacement for the
xrange function. No change to the syntax of the language is necessary.
Thanks.

From mcherm at mcherm.com  Tue Jun 13 15:28:30 2006
From: mcherm at mcherm.com (Michael Chermside)
Date: Tue, 13 Jun 2006 06:28:30 -0700
Subject: [Python-3000] We should write a PEP on what goes into the stdlib
Message-ID: <20060613062830.55hi3ynpppdd8gc4@login.werra.lunarpages.com>

Brett writes:
> That makes me think perhaps we should start thinking about collectively
> coming up with guidelines [...] on deciding what is needed to accept a
> module into the stdlib.

Raymond replies:
> Each case ultimately gets decided on its merits. Any one reason for
> inclusion or exclusion can be outweighed by some other reason. [...]
> Overly general rules are likely to be rife with exceptions and amount
> to useless administrivia. I don't think these contentious issues can
> be decided in advance. The specifics of each case are more relevant
> than a laundry list of generalizations.

I agree. If we have a PEP with rules for acceptance, then every time
we don't follow those rules exactly we will be accused of favoritism.
If we have informal rules like today and decide things on a
case-by-case basis, then everything is fine.

Rather than a formal PEP, how about a wiki page (which is necessarily
less of a formal "rule") that describes a good process to get your
module accepted. The obvious things (release to the community, get
wide usage, be recognized as best-of-breed, agree to donate code,
agree to support for some time) could all be listed there. It's just
as easy to refer someone to a wiki page as it is to refer them to a
PEP, but it doesn't make it seem like we're bound to follow a
particular process.

-- Michael Chermside

From mcherm at mcherm.com  Tue Jun 13 15:39:27 2006
From: mcherm at mcherm.com (Michael Chermside)
Date: Tue, 13 Jun 2006 06:39:27 -0700
Subject: [Python-3000] enhanced descriptors, part 2
Message-ID: <20060613063927.cpbh2nslum2owgsw@login.werra.lunarpages.com>

tomer writes:
> there really is a need for "enhanced descriptors", or something like
> that. i'm having serious trouble implementing the position property,
> as python is currently limited in this area.

No, this doesn't necessarily imply that we need "enhanced
descriptors"; an alternative solution is to change the intended API
for file positions. After all, the original motivation for using a
property was that it was (a) nice to use, (b) easy to read, and (c)
possible to implement. If (c) isn't true then perhaps we rethink the
API. After all, how bad would it be to use the following:

    f.position    -- used to access the current position
    f.seek_to(x)  -- seek to an absolute position (may be relative to end)
    f.seek_by(x)  -- seek by a relative amount

Or even go half-way:

    f.position      -- used to access the current position
    f.position = x  -- seek to an absolute position (may be relative to end)
    f.seek_by(x)    -- seek by a relative amount

Properties are nice, but there's nothing wrong with methods either.
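Concretely, the half-way variant might be sketched like this (the
class and method names below are illustrative only, not from any
actual iostack code):

```python
import os

class SeekableFile:
    """Illustrative wrapper: position as a property, relative seeks as
    a method."""

    def __init__(self, path, mode='rb'):
        self._f = open(path, mode)

    @property
    def position(self):
        # used to access the current position
        return self._f.tell()

    @position.setter
    def position(self, value):
        # absolute seek; negative values are relative to the end
        if value < 0:
            self._f.seek(value, os.SEEK_END)
        else:
            self._f.seek(value, os.SEEK_SET)

    def seek_by(self, offset):
        # relative seek, done in a single underlying seek call
        self._f.seek(offset, os.SEEK_CUR)

    def read(self, n=-1):
        return self._f.read(n)

    def close(self):
        self._f.close()
```

Assigning to position handles the absolute and end-relative cases,
while seek_by() covers the relative case without any of the augmented-
assignment tricks discussed in the other thread.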
If we went with the second approach, people might foolishly use "f.position += 4" where they intended "f.seek_by(4)" and it would still work fine, it just wouldn't be optimized. That's really not so bad. -- Michael Chermside From bborcic at gmail.com Tue Jun 13 16:34:18 2006 From: bborcic at gmail.com (Boris Borcic) Date: Tue, 13 Jun 2006 16:34:18 +0200 Subject: [Python-3000] xrange vs. int.__getslice__ In-Reply-To: <448E6A74.3010409@renet.ru> References: <448E6A74.3010409@renet.ru> Message-ID: Vladimir 'Yu' Stepanov wrote: > You were bothered yet with function xrange ? :) I suggest to replace it. > > --------------------------------------------- > for i in xrange(100): pass > vs. > for i in int[:100]: pass > --------------------------------------------- in a similar vein (slices on types) ----------------------------------------------- (slice(1,10),Ellipsis,slice(1,10)) vs slice[1:10,...,1:10] ----------------------------------------------- Boris -- "On naît tous les mètres du même monde" From rrr at ronadam.com Tue Jun 13 19:32:08 2006 From: rrr at ronadam.com (Ron Adam) Date: Tue, 13 Jun 2006 12:32:08 -0500 Subject: [Python-3000] We should write a PEP on what goes into the stdlib In-Reply-To: References: Message-ID: Brett Cannon wrote: > So, to start this discussion, here are my ideas... > > First, the modules must have been in the wild and used by the > community. This has worked well so far by making sure the code is > stable and that the API is good. Those modules and packages that necessary parts of Python depend on should probably be near the top of your list. Just what counts as a necessary part could be discussed. Possibly a short list would include... * Modules needed to manage, test and document the Python installation. * Modules needed to run, edit and test Python programs. * Modules needed to document programs. * Modules needed to package, install and distribute programs. * Modules needed for platform compatibility.
After including these, and the modules and packages these depend on, there might not be all that much to remove. Which would leave... * Modules and packages that are so popular that it doesn't make sense not to install them. All else could probably either be an optionally installed package included in the distribution or an easy-to-install egg package. I don't think determining what goes into the stdlib is as difficult as people think. It all seems pretty practical to me (although not trivial to do when taken as a whole). Maybe adding a few guidelines as to what should not be in the standard lib would be a good way to prevent it from growing too large. I.e., modules that haven't been tested sufficiently in the wild or by python-dev, or modules rarely needed or used... etc. Listing the inverse of these as reasons for inclusion seems to be the suggested approach here, but that seems to me to be working from the wrong end, in my humble opinion. Ron From rudyrudolph at excite.com Tue Jun 13 19:33:30 2006 From: rudyrudolph at excite.com (Rudy Rudolph) Date: Tue, 13 Jun 2006 13:33:30 -0400 (EDT) Subject: [Python-3000] PEP 3101 update Message-ID: <20060613173330.9388899E4A@xprdmxin.myway.com> Rudy Rudolph wrote: >It would be nice to have align centered and align on decimal point. >However, with 'g' formatting the number of digits after the point may >vary and there may not even be a decimal point. In this case, a column >of numbers should be aligned at the last digit of the integer part. Greg Ewing wrote: >How do you do that when you're formatting one string at a time? I thought the whole idea of alignment specifications was so that we could print one line at a time but get everything to line up. To right align, we use '>' for align and specify the last position relative to what came before. That relative position is known as the field width. To decimal align, we use a different char for align and specify the decimal position relative to what came before.
One possible way to do this is with, for example, '9.3g' which means fill chars and digits in the first 5 positions, a decimal point if necessary in the sixth position, and digits and fill chars in the last three positions. If the same format string is used for every line printed, the decimals line up, just like the right sides line up with right alignment. Well, actually the last digits of the integer portions line up even if there isn't a decimal point, just like with a decimal tab in MS Word. There are certainly other ways to specify the same thing and I don't much care what the format is. It should be easy enough both to settle on a format and to implement, and it would certainly be useful. Rudy _______________________________________________ Join Excite! - http://www.excite.com The most personalized portal on the Web! From tomerfiliba at gmail.com Tue Jun 13 19:48:25 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Tue, 13 Jun 2006 19:48:25 +0200 Subject: [Python-3000] enhanced descriptors, part 2 In-Reply-To: <20060613063927.cpbh2nslum2owgsw@login.werra.lunarpages.com> References: <20060613063927.cpbh2nslum2owgsw@login.werra.lunarpages.com> Message-ID: <1d85506f0606131048i49ee0982jf8729e130b2a8a41@mail.gmail.com> > f.position -- used to access the current position > f.position = x -- seek to an absolute position (may be relative to end) > f.seek_by(x) -- seek by a relative amount > > Properties are nice, but there's nothing wrong with methods either. If > we went with the second approach, people might foolishly use > "f.position += 4" where they intended "f.seek_by(x)" and it would still > work fine, it just wouldn't be optimized. That's really not so bad. okay, i'm fine with that. but i'm not happy with the fact it's not *possible* to implement such things in python. perhaps with time more use-cases will show it's needed. until then... 
;) -tomer On 6/13/06, Michael Chermside wrote: > tomer writes: > > there really is a need for "enhanced descriptors", or something like > > that. i'm having serious trouble implementing the position property, > > as python is currently limited in this area. > > No, this doesn't necessarily imply that we need "enhanced descriptors", > an alternative solution is to change the intended API for file > positions. After all, the original motivation for using a property > was that it was (a) nice to use, (b) easy to read, and (c) possible to > implement. If (c) isn't true then perhaps we rethink the API. > > After all, how bad would it be to use the following: > > f.position -- used to access the current position > f.seek_to(x) -- seek to an absolute position (may be relative to end) > f.seek_by(x) -- seek by a relative amount > > Or even go half-way: > > f.position -- used to access the current position > f.position = x -- seek to an absolute position (may be relative to end) > f.seek_by(x) -- seek by a relative amount > > Properties are nice, but there's nothing wrong with methods either. If > we went with the second approach, people might foolishly use > "f.position += 4" where they intended "f.seek_by(x)" and it would still > work fine, it just wouldn't be optimized. That's really not so bad. 
> > -- Michael Chermside > > From greg.ewing at canterbury.ac.nz Wed Jun 14 02:39:58 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 14 Jun 2006 12:39:58 +1200 Subject: [Python-3000] enhanced descriptors, part 2 In-Reply-To: <20060613063927.cpbh2nslum2owgsw@login.werra.lunarpages.com> References: <20060613063927.cpbh2nslum2owgsw@login.werra.lunarpages.com> Message-ID: <448F5ADE.2090400@canterbury.ac.nz> Michael Chermside wrote: > f.position = x -- seek to an absolute position (may be relative to end) although the "relative to end" part would still admit the circularity problem (if it's considered to be a problem - personally I'm not too worried what happens if you're silly enough to try to seek before the beginning of a file). -- Greg From greg.ewing at canterbury.ac.nz Wed Jun 14 02:51:41 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 14 Jun 2006 12:51:41 +1200 Subject: [Python-3000] PEP 3101 update In-Reply-To: <20060613173330.9388899E4A@xprdmxin.myway.com> References: <20060613173330.9388899E4A@xprdmxin.myway.com> Message-ID: <448F5D9D.3040702@canterbury.ac.nz> Rudy Rudolph wrote: > '9.3g' which means fill chars and > digits in the first 5 positions, a decimal point if necessary in the > sixth position, and digits and fill chars in the last three positions. So what you're really asking for is an option for suppressing trailing zeroes after a decimal point (and replacing them with spaces). That makes sense, although I think calling it "decimal align" would be confusing. It confused me, because I was thinking of what this means in a word processor, where you're aligning decimal points with some predetermined absolute position. 
-- Greg From rudyrudolph at excite.com Wed Jun 14 21:45:14 2006 From: rudyrudolph at excite.com (Rudy Rudolph) Date: Wed, 14 Jun 2006 15:45:14 -0400 (EDT) Subject: [Python-3000] PEP 3101 update Message-ID: <20060614194514.3F76B8B354@xprdmxin.myway.com> Greg Ewing wrote: >So what you're really asking for is an option for >suppressing trailing zeroes after a decimal point >(and replacing them with spaces). >That makes sense, although I think calling it >"decimal align" would be confusing. It confused >me, because I was thinking of what this means in >a word processor, where you're aligning decimal >points with some predetermined absolute position. Formatting with 'g' instead of 'f' already suppresses trailing zeroes (and the decimal point if there is no fractional part). Calling it "decimal align" is just as valid as your "right align" and "left align". That is, they align the current piece relative to what was printed before; none gives you absolute positioning. However, if a) the same format string is used for every line, b) all fields specify a width, and c) nothing exceeds its format width, then all the columns will align left, right, decimal, or whatever. That's one of the main uses of format strings, using relative positioning one line at a time to achieve a poor-man's table with the fields in all lines aligned in columns as if the positions had been specified absolutely. Center- and decimal-align nicely round out the left-align, right-align, and pad-after-sign formatting already proposed, and are easy to implement. I therefore ask that they be added to the PEP. BTW, I very much like the proposal in the PEP. Issue to consider: Can decimal alignment be specified together with pad-after-sign? Rudy _______________________________________________ Join Excite! - http://www.excite.com The most personalized portal on the Web!
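Rudy's column-alignment idea -- 'g' formatting plus padding so the last integer digit (and any decimal point) always lands in the same column -- can be sketched in a few lines. The helper below and its fixed 5+1+3 layout (mirroring the '9.3g' example) are illustrative only, not part of the PEP:

```python
def decimal_align(value, int_width=5, frac_width=3):
    """Hypothetical helper: format with 'g' (which already drops
    trailing zeroes), then pad so the decimal point -- or the last
    integer digit, when there is no point -- lines up in a column.
    Layout matches '9.3g': 5 integer positions, an optional point,
    and 3 fractional positions."""
    s = '%.*g' % (frac_width, value)
    if '.' in s:
        intpart, fracpart = s.split('.')
        fracpart = '.' + fracpart
    else:
        intpart, fracpart = s, ''
    # Right-justify the integer part, left-justify point + fraction.
    return intpart.rjust(int_width) + fracpart.ljust(frac_width + 1)

for v in (3.14159, 100, 0.25):
    print(repr(decimal_align(v)))
```

Each result is nine characters wide with the point (or its would-be position) in the sixth column, so formatting one line at a time still stacks into a decimal-aligned column, like a decimal tab in a word processor.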
From greg.ewing at canterbury.ac.nz Thu Jun 15 03:06:36 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 15 Jun 2006 13:06:36 +1200 Subject: [Python-3000] PEP 3101 update In-Reply-To: <20060614194514.3F76B8B354@xprdmxin.myway.com> References: <20060614194514.3F76B8B354@xprdmxin.myway.com> Message-ID: <4490B29C.5050402@canterbury.ac.nz> Rudy Rudolph wrote: > Calling it "decimal align" is just as > valid as your "right align" and "left align". But "decimal align" raises the question "align with *what*?" The answer to that is far less obvious than it is with "left" and "right", IMO. Also, the output of %f is *already* "decimal aligned" in this sense. The only difference between the current behaviour of %f and your suggested "decimal align" is that trailing zeroes would be suppressed. So it would make a lot more sense to me to call it "suppress trailing zeroes" instead. -- Greg From talin at acm.org Thu Jun 15 06:57:26 2006 From: talin at acm.org (Talin) Date: Wed, 14 Jun 2006 21:57:26 -0700 Subject: [Python-3000] PEP 3101 update In-Reply-To: <20060614194514.3F76B8B354@xprdmxin.myway.com> References: <20060614194514.3F76B8B354@xprdmxin.myway.com> Message-ID: <4490E8B6.7010601@acm.org> Rudy Rudolph wrote: > Center- and decimal-align nicely round out the left-align, > right-align, and pad-after-sign formatting already proposed, > and are easy to implement. I therefore ask that they be added > to the PEP. BTW, I very much like the proposal in the PEP. The basic idea of a decimal align option sounds good to me. I even started working up an implementation, but I haven't had the time to finish it -- I decided to go with the '^' character as an alignment symbol. Greg Ewing's point that this is effectively the same as "pad with spaces" is correct, however I don't think that's the way most people think of it - in other words, generally what people ask for is "line up all the decimal points".
Here's my concern however: PEP 3101 is getting rather large, because of all of these little details that are ancillary to the primary proposal of a 'format' method for string objects. I've already pushed back on Nick Coghlan's otherwise excellent suggestion of allowing the same set of conversion specifiers to be used as a second argument to str() for this reason. (I thought about breaking out the conversion specifiers into a separate PEP, but since they aren't meaningful by themselves it makes no sense to accept one PEP and reject another, and also because then I'd have 3 PEPs in the Python-3000 queue [including 3102], and right now 2 is as much as I want to deal with.) Because 3101 is targeted at Python-3000, and because 3000 is scheduled for release in the distant future, I have no sense as to what the timetable is for acceptance or adoption of this PEP; as far as I know, it could be a year or more before a decision is made, and the PEP might be rejected at the end of that time. So from my point of view, I am faced with the prospect of an ever-expanding PEP as people continue to think of new suggestions over the course of the next year, all of which may come to naught. My feeling is that a good PEP should contain a limited number of BDFL decisions - that is, it should be possible for Guido to go down the checklist and accept / reject / suggest changes to a small number of essential bullet points. I fear that PEP 3101 is going to turn into a kind of omnibus bill with all kinds of little amendments to deal with. For this reason, I'd like to put some sort of limit on lower-level details of the PEP, and let that detail be filled in via the normal feature request / patch submission process once the PEP has actually been accepted. I guess what I also need to do is find some place to post my prototype so that people can criticize it and submit patches to it.
What would be ideal for my purposes would be if there was a "research" branch in the Python svn so that wild-eyed radicals such as myself could check in code that is still being discussed by the community and is not yet intended for inclusion in the main tree. This would also allow people who have suggestions to submit patches to the prototype, rather than having to ask me to do it for them. -- Talin From talin at acm.org Fri Jun 16 07:17:49 2006 From: talin at acm.org (Talin) Date: Thu, 15 Jun 2006 22:17:49 -0700 Subject: [Python-3000] We should write a PEP on what goes into the stdlib In-Reply-To: <20060613062830.55hi3ynpppdd8gc4@login.werra.lunarpages.com> References: <20060613062830.55hi3ynpppdd8gc4@login.werra.lunarpages.com> Message-ID: <44923EFD.50804@acm.org> Michael Chermside wrote: > I agree. If we have a PEP with rules for acceptance, then every time we > don't follow those rules exactly we will be accused of favoritism. If > we have informal rules like today and decide things on a case-by-case > basis, then everything is fine. Let me make a suggestion that might help resolve the disagreement. One of my favorite podcasts is "Life of a Law Student", (http://www.lifeofalawstudent.com/) in which a first year law student named Neil Wehneman makes a daily podcast of what he learned in law school that day. One of the ideas that he talks about (Intro to the Law #2) is the difference between a "Rule" and a "Standard": A 'rule' is a definitive test, intended to provide certainty. An example is the speed limit - you are either exceeding the speed limit, or you aren't. A 'standard', on the other hand (at least, in its legal definition) is a set of factors to be weighed by a judge when making a decision. Its purpose is to provide flexibility, allowing human judgement to stay in the loop, but at the same time giving a framework for making those judgements in a consistent way. An example of a standard is fair use under copyright law. 
When a judge decides whether something is fair use, they use a standard consisting of a number of factors, including the amount of the work copied, the commercial or non-commercial use of the work, and so on. Note that none of these factors are a simple "yes/no" decision - instead, a judgement must be made as to how much a particular case fits the standard. A use of a work can be completely commercial, completely noncommercial, or something inbetween. To the extent that it is noncommercial, that weighs in favor of it being declared fair use; To the extent that it is commercial, that weighs against. So what I would suggest, then, is the creation of a standard (in this legal sense) for what factors should be considered in deciding whether to include something in the stdlib. Moreover, the standard should be clearly labeled as such - to prevent people from interpreting the document as a set of hard rules that they can use to beat other people over the head with. So for example, it might say something like: "To the extent that the module has enjoyed widespread adoption and use within the Python community, this weighs in favor of inclusion." and so on. -- Talin From brett at python.org Fri Jun 16 19:01:46 2006 From: brett at python.org (Brett Cannon) Date: Fri, 16 Jun 2006 10:01:46 -0700 Subject: [Python-3000] We should write a PEP on what goes into the stdlib In-Reply-To: <44923EFD.50804@acm.org> References: <20060613062830.55hi3ynpppdd8gc4@login.werra.lunarpages.com> <44923EFD.50804@acm.org> Message-ID: On 6/15/06, Talin wrote: > > Michael Chermside wrote: > > I agree. If we have a PEP with rules for acceptance, then every time we > > don't follow those rules exactly we will be accused of favoritism. If > > we have informal rules like today and decide things on a case-by-case > > basis, then everything is fine. > > Let me make a suggestion that might help resolve the disagreement. 
> > One of my favorite podcasts is "Life of a Law Student", > (http://www.lifeofalawstudent.com/) in which a first year law student > named Neil Wehneman makes a daily podcast of what he learned in law > school that day. One of the ideas that he talks about (Intro to the Law > #2) is the difference between a "Rule" and a "Standard": > > A 'rule' is a definitive test, intended to provide certainty. An example > is the speed limit - you are either exceeding the speed limit, or you > aren't. > > A 'standard', on the other hand (at least, in its legal definition) is a > set of factors to be weighed by a judge when making a decision. Its > purpose is to provide flexibility, allowing human judgement to stay in > the loop, but at the same time giving a framework for making those > judgements in a consistent way. > > An example of a standard is fair use under copyright law. When a judge > decides whether something is fair use, they use a standard consisting of > a number of factors, including the amount of the work copied, the > commercial or non-commercial use of the work, and so on. > > Note that none of these factors are a simple "yes/no" decision - > instead, a judgement must be made as to how much a particular case fits > the standard. A use of a work can be completely commercial, completely > noncommercial, or something inbetween. To the extent that it is > noncommercial, that weighs in favor of it being declared fair use; To > the extent that it is commercial, that weighs against. > > So what I would suggest, then, is the creation of a standard (in this > legal sense) for what factors should be considered in deciding whether > to include something in the stdlib. > > Moreover, the standard should be clearly labeled as such - to prevent > people from interpreting the document as a set of hard rules that they > can use to beat other people over the head with. 
> > So for example, it might say something like: "To the extent that the > module has enjoyed widespread adoption and use within the Python > community, this weighs in favor of inclusion." and so on. At this point, I am dropping the PEP idea and I am going to make it a general doc at python.org/dev/ when I take my intro doc ( http://www.python.org/dev/intro/) and break it out into individual docs for bugs, patches, committing, and getting things into the stdlib or language. So basically I am going with the Standards approach. =) -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060616/8fddb173/attachment.htm From guido at python.org Mon Jun 19 19:22:38 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 19 Jun 2006 10:22:38 -0700 Subject: [Python-3000] We should write a PEP on what goes into the stdlib In-Reply-To: References: <20060613062830.55hi3ynpppdd8gc4@login.werra.lunarpages.com> <44923EFD.50804@acm.org> Message-ID: I'm coming late to this, and am folding all my comments in a single email. Short version: Brett, please go ahead! Here are some comments. [Raymond] > There isn't a consistent ruleset that explains > clearly why decimal, elementtree, email, and textwrap were included > while Cheetah, Twisted, numpy, and BeautifulSoup were not. Oh yes there is. Just look at the names alone. Also release cycles. > Overly general rules are likely to be rife with exceptions and amount to > useless administrivia. I don't think these contentious issues can be > decided in advance. The specifics of each case are more relevant than a > laundry list of generalizations. It still makes sense to have a list of guidelines. (a) This tells potential contributors how high the bar is set. (b) This helps the discussion if "fairness" is invoked (why did module X get accepted?). > We don't need a PEP for every module.
If the python-dev discussion says > we want it and Guido approves, then it is a done deal. Agreed (this is the only part where I disagree with Brett). This is not to say that a PEP wouldn't be helpful in some cases; but it's not a requirement. A PEP is helpful when it is likely that the discussion will be long or contentious. [Michael Chermside] > I agree. If we have a PEP with rules for acceptance, then every time we > don't follow those rules exactly we will be accused of favoritism. If > we have informal rules like today and decide things on a case-by-case > basis, then everything is fine. I don't think that not having rules avoids accusations of favoritism. There are many rules and guidelines that are being applied quite consistently when something is proposed for stdlib inclusion (Brett didn't even enumerate all of them; for example an oft-cited rule is that the contributor must commit to maintaining the code for several years). It only makes sense to write these up in one place. Of course we shouldn't create the expectation that anything that matches the rules is automatically accepted (that would be insane). > Rather than a formal PEP, how about a wiki page Absolutely not! Wikis have no official status. Not only can anybody edit them; there's no process in place to remove them when they are outdated. [Talin] > So what I would suggest, then, is the creation of a standard (in this > legal sense) for what factors should be considered in deciding whether > to include something in the stdlib. > > Moreover, the standard should be clearly labeled as such - to prevent > people from interpreting the document as a set of hard rules that they > can use to beat other people over the head with. Sounds like a good idea. Not so different from what I said above about automatic acceptance based on matching the rules. 
[Brett] > At this point, I am dropping the PEP idea and I am going to make it a > general doc at python.org/dev/ when a take my intro doc > (http://www.python.org/dev/intro/) and break it out into individual > docs for bugs, patches, committing, and getting things into the stdlib > or language. I think this is fine; but I don't think it would be wrong to do it as a PEP. Having a PEP makes it a bit easier for the community to participate in discussing its contents, so I think I would have favored a PEP, but the important thing is that the standard we apply is documented somewhere. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jun 19 22:59:12 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 19 Jun 2006 13:59:12 -0700 Subject: [Python-3000] PEP 3101 update In-Reply-To: <4490E8B6.7010601@acm.org> References: <20060614194514.3F76B8B354@xprdmxin.myway.com> <4490E8B6.7010601@acm.org> Message-ID: Hi Talin, Here's how I see it. The probability of this PEP being accepted doesn't really depend on whether that particular proposed feature is present. Given all possible proposed features, it's probably better to err on the side of exclusion -- a PEP like this is more likely to be rejected due to excessive baggage than due to lack of functionality, as long as it covers all the functionality it's replacing. So, I'm with you: try to get the PEP implemented and accepted before adding too many new features. Once there's an accepted framework, it's easier to add features. (Perhaps there's one use for this particular proposed feature; since it requires adding yet another parameter to certain formatting functions, it would be a good test for the generality of the API. Personally, I wonder if at some point we'll want to pass an arbitrary argument list, and/or keyword args? That would be a more useful feature to add to consider for the PEP than a specific decimal alignment, since it is a feature in support of extensibility.) 
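The extensibility Guido hints at can be made concrete with the __format__ hook that PEP 3101 itself proposes: the conversion specifier is passed through to the object being formatted, so new formatting behaviours can live in user-defined types instead of in new specifier syntax. The Money class and its 'whole' specifier below are invented for illustration:

```python
# Sketch of specifier pass-through: the formatted object interprets
# its own conversion specifier via __format__.  The Money class and
# the 'whole' spec are made up for this example.
class Money:
    def __init__(self, amount):
        self.amount = amount

    def __format__(self, spec):
        # 'spec' is whatever followed the ':' in the format string.
        if spec == 'whole':
            return '$%d' % round(self.amount)
        return '$%.2f' % self.amount

price = Money(3.75)
print(format(price, 'whole'))      # builtin passes the spec straight through
print('{0:whole}'.format(price))   # same hook, reached via str.format()
```

The point is that neither spelling required teaching the core format machinery anything about currency; the type carries its own formatting logic.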
--Guido On 6/14/06, Talin wrote: > Rudy Rudolph wrote: > > Center- and decimal-align nicely round out the left-align, > > right-align, and pad-after-sign formatting already proposed, > > and are easy to implement. I therefore ask that they be added > > to the PEP. BTW, I very much like the proposal in the PEP. > > The basic idea of a decimal align option sounds good to me. I even > started working up an implementation, but I haven't had the time to > finish it -- I decided to go with the '^' character as an alignment symbol. > > Greg Ewing's point that this is effectively the same as "pad with > spaces" is correct, however I don't think that's the way most people > think of it - in other words, generally what people ask for is "line up > all the decimal points". > > Here's my concern however: PEP 3101 is getting rather large, because of > all of these little details that are ancilliary to the primary proposal > of a 'format' method for string objects. I've already pushed back on > Nick Coghlan's otherwise excellent suggestion of allowing the same set > of conversion specifiers to be used as a second argument to str() for > this reason. > > (I thought about breaking out the conversion specifiers into a separate > PEP, but since they aren't meaningful by themselves it makes no sense to > accept one PEP and reject another, and also because then I'd have 3 PEPs > in the Python-3000 queue [including 3102], and right now 2 is as much as > I want to deal with.) > > Because 3101 is targeted at Python-3000, and because 3000 is scheduled > for release in the distant future, I have no sense as to what the > timetable is for acceptance or adoption of this PEP; As far as I know, > it could be a year or more before a decision is made, and the PEP might > be rejected at the end of that time. 
So from my point of view, I am > faced with the prospect of an ever-expanding PEP as people continue to > think of new suggestions over the course of the next year, all of which > may come to naught. > > My feeling is that a good PEP should contain a limited number of BDFL > decisions - that is, it should be possible for Guido to go down the > checklist and accept / reject / suggest changes to a small number of > essential bullet points. I fear that PEP 3101 is going to turn into a > kind of omnibus bill with all kinds of little amendments to deal with. > > For this reason, I'd like to put some sort of limit on lower-level > details of the PEP, and let that detail be filled in via the normal > feature request / patch submission process once the PEP has actually > been accepted. > > I guess what I also need to do is find some place to post my prototype > so that people can criticize it and submit patches to it. What would be > ideal for my purposes would be if there was a "research" branch in the > Python svn so that wild-eyed radicals such as myself could check in code > that is still being discussed by the community and is not yet intended > for inclusion in the main tree. This would also allow people who have > suggestions to submit patches to the prototype, rather than having to > ask me to do it for them. 
> > -- Talin > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From talin at acm.org Tue Jun 20 20:21:49 2006 From: talin at acm.org (Talin) Date: Tue, 20 Jun 2006 11:21:49 -0700 Subject: [Python-3000] PEP 3101 update In-Reply-To: References: <20060614194514.3F76B8B354@xprdmxin.myway.com> <4490E8B6.7010601@acm.org> Message-ID: <44983CBD.3000703@acm.org> Guido van Rossum wrote: > Hi Talin, > > Here's how I see it. > > The probability of this PEP being accepted doesn't really depend on > whether that particular proposed feature is present. Given all > possible proposed features, it's probably better to err on the side of > exclusion -- a PEP like this is more likely to be rejected due to > excessive baggage than due to lack of functionality, as long as it > covers all the functionality it's replacing. So, I'm with you: try to > get the PEP implemented and accepted before adding too many new > features. Once there's an accepted framework, it's easier to add > features. > > (Perhaps there's one use for this particular proposed feature; since > it requires adding yet another parameter to certain formatting > functions, it would be a good test for the generality of the API. > Personally, I wonder if at some point we'll want to pass an arbitrary > argument list, and/or keyword args? That would be a more useful > feature to add to consider for the PEP than a specific decimal > alignment, since it is a feature in support of extensibility.) Well, one of the design goals for conversion specifiers is conciseness. I suspect you would get a lot of complaints if the conversion specifiers grew much longer than they currently are. So it's a balancing act between compressibility and readability.
There are two reasons for this: First, TOOWTDI. Anything you can do with a conversion specifier can be done by pre-processing the parameter that you pass into the format function. Allowing arbitrary conversion syntax would essentially mean creating a new language-within-a-language that would duplicate functionality that is better expressed by function calls. Secondly, the conversion specifiers should not visually dominate or distract from the format string. That is, when reading the format string, you should be able to mentally skip over the conversion strings without too much trouble. This is much easier if they are short. So in other words, I'm not trying to make the most general API possible, what I am doing instead is looking for various "low hanging fruit", that is useful features that can be expressed in one or two characters without sacrificing overall readability. Anything that requires more than that should be done by function calls. While you are here, I'd like to ask a couple questions: 1) Do you have any reaction to Brett Cannon's idea that we add a second, optional argument to str() that accepts exactly the same conversion specifier syntax? Should I incorporate that into the PEP, or should that be a separate PEP? 2) What's your feeling (and this isn't just directed at you) about having a sandbox area in the svn repository that's open to general modification, kind of like the code version of a wiki? Or, to put it another way, what's the best place to put my code so that people have the ability to hack on it? 
-- Talin From guido at python.org Tue Jun 20 20:39:44 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 20 Jun 2006 11:39:44 -0700 Subject: [Python-3000] PEP 3101 update In-Reply-To: <44983CBD.3000703@acm.org> References: <20060614194514.3F76B8B354@xprdmxin.myway.com> <4490E8B6.7010601@acm.org> <44983CBD.3000703@acm.org> Message-ID: On 6/20/06, Talin wrote: > While you are here, I'd like to ask a couple questions: > > 1) Do you have any reaction to Brett Cannon's idea that we add a second, > optional argument to str() that accepts exactly the same conversion > specifier syntax? Should I incorporate that into the PEP, or should that > be a separate PEP? Not so keen. This seems to be a completely different use of str(). If we want that API it should be called something else. I don't see an advantage of overloading str(). > 2) What's your feeling (and this isn't just directed at you) about > having a sandbox area in the svn repository that's open to general > modification, kind of like the code version of a wiki? Or, to put it > another way, what's the best place to put my code so that people have > the ability to hack on it? The svn access controls make this impossible AFAIK (but I know very little about them). I suggest you use one of the more distributed alternatives, e.g. Mercurial (I keep hearing good things about it). 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From talin at acm.org Thu Jun 22 04:07:59 2006 From: talin at acm.org (Talin) Date: Wed, 21 Jun 2006 19:07:59 -0700 Subject: [Python-3000] PEP 3101 update In-Reply-To: References: <20060614194514.3F76B8B354@xprdmxin.myway.com> <4490E8B6.7010601@acm.org> <44983CBD.3000703@acm.org> Message-ID: <4499FB7F.9030604@acm.org> Guido van Rossum wrote: > On 6/20/06, Talin wrote: > >> While you are here, I'd like to ask a couple questions: >> >> 1) Do you have any reaction to Brett Cannon's idea that we add a second, >> optional argument to str() that accepts exactly the same conversion >> specifier syntax? Should I incorporate that into the PEP, or should that >> be a separate PEP? > > > Not so keen. This seems to be a completely different use of str(). If > we want that API it should be called something else. I don't see an > advantage of overloading str(). Before we dismiss that too quickly, let me do a better job of explaining the general idea. The motivation for this is converting an arbitrary value to string form - which is exactly what str() does. Only in this case, we want to be able to have some control over the formatting of that string. Converting single values to strings using operator % looks something like this: s = "%2.2g" % f With str.format(), the single-conversion case gets a bit more wordy: s = "{0:2.2g}".format( f ) Instead of all that, why not allow the conversion to be passed to the str() constructor directly: s = str( f, "2.2g" ) It doesn't actually have to be called "str", you could say, for example: s = str.convert( f, "2.2g" ) However, the str() form is more concise and more readable than any of the alternatives presented here. I think it's pretty clear what is intended (especially since C# has a similar syntax for its "ToString()" method.) 
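As a rough sketch, the proposed single-value conversion can be emulated with a plain function built on existing %-formatting (convert is illustrative only; neither str(f, "2.2g") nor str.convert exists):

```python
# Illustrative stand-in for the proposed str(value, spec) /
# str.convert(value, spec) spellings.
def convert(value, spec=""):
    if not spec:
        return str(value)          # no specifier: plain str() behavior
    return ("%" + spec) % value    # apply the conversion specifier

assert convert(0.12345, "2.2g") == "0.12"
```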
In any case, I think there's a pretty good argument that one ought to be able to convert single values without having to embed them as fields within a string template. Now, my personal motive for this is that it allows me to cut my PEP in half - because the logic of the conversion specifiers can be isolated and used directly, without having to go through format(). -- Talin From greg.ewing at canterbury.ac.nz Thu Jun 22 09:42:08 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 22 Jun 2006 19:42:08 +1200 Subject: [Python-3000] PEP 3101 update In-Reply-To: <4499FB7F.9030604@acm.org> References: <20060614194514.3F76B8B354@xprdmxin.myway.com> <4490E8B6.7010601@acm.org> <44983CBD.3000703@acm.org> <4499FB7F.9030604@acm.org> Message-ID: <449A49D0.8020108@canterbury.ac.nz> Talin wrote: > s = str.convert( f, "2.2g" ) If format is a string method, then you will already be able to do s = str.format("2.2g", f) if you want. -- Greg From ncoghlan at gmail.com Thu Jun 22 11:49:34 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 22 Jun 2006 19:49:34 +1000 Subject: [Python-3000] PEP 3101 update In-Reply-To: <449A49D0.8020108@canterbury.ac.nz> References: <20060614194514.3F76B8B354@xprdmxin.myway.com> <4490E8B6.7010601@acm.org> <44983CBD.3000703@acm.org> <4499FB7F.9030604@acm.org> <449A49D0.8020108@canterbury.ac.nz> Message-ID: <449A67AE.3090002@gmail.com> Greg Ewing wrote: > Talin wrote: > >> s = str.convert( f, "2.2g" ) > > If format is a string method, then you will already be > able to do > > s = str.format("2.2g", f) > > if you want. Nope. Given the current PEP, it'd have to be one of the following: s = "{0:2.2g}".format(f) s = str.format("{0:2.2g}", f) However, I realised that there's an approach that is aesthetically pleasing and doesn't require using str() for this - simply consider the leading '{0:' and trailing '}' to be implicit if there are no braces at all in the supplied format string. 
Then you could do things like: >>> "b".format(10) 1010 >>> "o".format(10) 12 >>> "x".format(10) a >>> "X".format(10) A >>> "2.2g".format(10) 10.00 Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From murman at gmail.com Thu Jun 22 15:43:50 2006 From: murman at gmail.com (Michael Urman) Date: Thu, 22 Jun 2006 08:43:50 -0500 Subject: [Python-3000] PEP 3101 update In-Reply-To: <449A67AE.3090002@gmail.com> References: <20060614194514.3F76B8B354@xprdmxin.myway.com> <4490E8B6.7010601@acm.org> <44983CBD.3000703@acm.org> <4499FB7F.9030604@acm.org> <449A49D0.8020108@canterbury.ac.nz> <449A67AE.3090002@gmail.com> Message-ID: On 6/22/06, Nick Coghlan wrote: > However, I realised that there's an approach that is aesthetically pleasing > and doesn't require using str() for this - simply consider the leading '{0:' > and trailing '}' to be implicit if there are no braces at all in the supplied > format string. 
> > Then you could do things like: [examples with missing quotes omitted] And >>> "The implicit braces scare me, for I am weak".format(10) 'ValueError' (Assuming lenient mode, and that str.format raises ValueError for such a case) Michael -- Michael Urman http://www.tortall.net/mu/blog From guido at python.org Thu Jun 22 18:54:51 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 22 Jun 2006 09:54:51 -0700 Subject: [Python-3000] PEP 3101 update In-Reply-To: <449A67AE.3090002@gmail.com> References: <20060614194514.3F76B8B354@xprdmxin.myway.com> <4490E8B6.7010601@acm.org> <44983CBD.3000703@acm.org> <4499FB7F.9030604@acm.org> <449A49D0.8020108@canterbury.ac.nz> <449A67AE.3090002@gmail.com> Message-ID: On 6/22/06, Nick Coghlan wrote: > However, I realised that there's an approach that is aesthetically pleasing > and doesn't require using str() for this - simply consider the leading '{0:' > and trailing '}' to be implicit if there are no braces at all in the supplied > format string. -1. Explicit is better than implicit. It would encourage Python to guess when there are no braces but there is a format argument, instead of throwing an exception. To Talin: I'm all for a way to say blah(x, "2.2g") instead of the more verbose "{2.2g}".format(x). In fact it would probably be great if the latter was officially defined as a way to spell the former combined with literal text: "foo{2.2g}bar{3.3f}spam".format(x, y) is shorter and more readable than "foo" + blah(x, "2.2g") + "bar" + blah(y, "3.3f") + "spam" What I object to is only the spelling of blah(x, f) as str(x, f). Perhaps a static string method; but probably better some other built-in or something in a new stdlib module. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From talin at acm.org Fri Jun 23 08:14:01 2006 From: talin at acm.org (Talin) Date: Thu, 22 Jun 2006 23:14:01 -0700 Subject: [Python-3000] PEP 3101 update In-Reply-To: References: <20060614194514.3F76B8B354@xprdmxin.myway.com> <4490E8B6.7010601@acm.org> <44983CBD.3000703@acm.org> <4499FB7F.9030604@acm.org> <449A49D0.8020108@canterbury.ac.nz> <449A67AE.3090002@gmail.com> Message-ID: <449B86A9.8010009@acm.org> Guido van Rossum wrote: > To Talin: I'm all for a way to say blah(x, "2.2g") instead of the more > verbose "{2.2g}".format(x). In fact it would probably be great if the > latter was officially defined as a way to spell the former combined > with literal text: > > "foo{2.2g}bar{3.3f}spam".format(x, y) > > is shorter and mor readable than > > "foo" + blah(x, "2.2g") + "bar" + blah(y, "3.3f") + "spam" > > What I object to is only the spelling of blah(x, f) as str(x, f). > Perhaps a static string method; but probably better some other > built-in or something in a new stdlib module. OK, how about this: y.tostr("3.3f") Essentially I'm proposing adding an overridable method named 'tostr' (or some better name if you can think of one) to class 'object'. Advantages over a builtin: -- Doesn't add another global name -- Easily overridable by subclasses (gets rid of the need for a __format__ call in my PEP.) -- If we make the conversion argument optional, it could eventually replace the magic __str__ method. 
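A minimal sketch of how an overridable tostr might behave, emulated here as a mixin since no such method exists on object:

```python
# Hypothetical stand-in for the proposed object.tostr(); sketched as a
# mixin because 'object' itself cannot be extended in this sketch.
class ToStrMixin:
    def tostr(self, spec=""):
        if not spec:
            return str(self)          # mimics the __str__ fallback
        return ("%" + spec) % self    # apply the conversion specifier

class Price(float, ToStrMixin):
    pass

assert Price(2.5).tostr("3.3f") == "2.500"
```

Subclasses would override tostr directly, which is what removes the need for a separate __format__ hook in the proposal.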
-- Talin From guido at python.org Fri Jun 23 19:27:21 2006 From: guido at python.org (Guido van Rossum) Date: Fri, 23 Jun 2006 10:27:21 -0700 Subject: [Python-3000] PEP 3101 update In-Reply-To: <449B86A9.8010009@acm.org> References: <20060614194514.3F76B8B354@xprdmxin.myway.com> <4490E8B6.7010601@acm.org> <44983CBD.3000703@acm.org> <4499FB7F.9030604@acm.org> <449A49D0.8020108@canterbury.ac.nz> <449A67AE.3090002@gmail.com> <449B86A9.8010009@acm.org> Message-ID: On 6/22/06, Talin wrote: > Guido van Rossum wrote: > > To Talin: I'm all for a way to say blah(x, "2.2g") instead of the more > > verbose "{2.2g}".format(x). In fact it would probably be great if the > > latter was officially defined as a way to spell the former combined > > with literal text: > > > > "foo{2.2g}bar{3.3f}spam".format(x, y) > > > > is shorter and mor readable than > > > > "foo" + blah(x, "2.2g") + "bar" + blah(y, "3.3f") + "spam" > > > > What I object to is only the spelling of blah(x, f) as str(x, f). > > Perhaps a static string method; but probably better some other > > built-in or something in a new stdlib module. > > OK, how about this: > > y.tostr("3.3f") > > Essentially I'm proposing adding an overridable method named 'tostr' (or > some better name if you can think of one) to class 'object'. > > Advantages over a builtin: > > -- Doesn't add another global name > -- Easily overridable by subclasses (gets rid of the need for a > __format__ call in my PEP.) > -- If we make the conversion argument optional, it could eventually > replace the magic __str__ method. I'm not sure that every object should have this method. Please consider making it just a method in a stdlib module. Perhaps it could use overloaded functions. IMO the PEP would do best not to add new builtins or object methods. (A __format__ method is OK since it just mimics the standard idiom for providing overridable type-specific operations; but perhaps overloadable functions are a better alternative.) 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From ncoghlan at gmail.com Sat Jun 24 04:47:40 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 24 Jun 2006 12:47:40 +1000 Subject: [Python-3000] PEP 3101 update In-Reply-To: References: <20060614194514.3F76B8B354@xprdmxin.myway.com> <4490E8B6.7010601@acm.org> <44983CBD.3000703@acm.org> <4499FB7F.9030604@acm.org> <449A49D0.8020108@canterbury.ac.nz> <449A67AE.3090002@gmail.com> <449B86A9.8010009@acm.org> Message-ID: <449CA7CC.6040506@gmail.com> Guido van Rossum wrote: > "foo{2.2g}bar{3.3f}spam".format(x, y) Getting a format string like that to work would be tricky. With the current PEP, it would need to be: "foo{0:2.2g}bar{1:3.3f}spam".format(x, y) It should be possible to simplify that without ambiguity to: "foo{:2.2g}bar{:3.3f}spam".format(x, y) by having an internal counter in the format function that kept track of how many fields had been encountered that didn't refer to a specific position or name. That is, either the braces were empty ('{}'), or there was nothing before the conversion specifier ('{:}'). To get shorter than that, however, you'd be getting into territory where the interpreter is trying to guess the programmer's intent (e.g. is '{0f}' intentionally just a conversion specifier, or is it a typo for '{0:f}'?). So I think going that far falls foul of EIBTI, the same way my idea of an implicit "{0:" and "}" did. I like the internal counter concept though - it means that purely positional stuff can be written without any additional mental overhead, and with the braces being the only additional typing when compared to the status quo. That way, if you didn't have any field formatting you wanted to do, you could just write: "{} picks up the {}. It is {}!".format(person, thing, adjective) > I'm not sure that every object should have this method. > > Please consider making it just a method in a stdlib module. > > Perhaps it could use overloaded functions. 
> > IMO the PEP would do best not to add new builtins or object methods. > (A __format__ method is OK since it just mimics the standard idiom for > providing overridable type-specific operations; but perhaps > overloadable functions are a better alternative.) Since the PEP calls "2.2g" and friends conversion specifiers, how about we use an overloaded function "string.convert"? # In string.py @overloaded def convert(obj, spec): """Converts an object to a string using a conversion specifier""" # Default handling is to convert as per PEP 3101 # (AKA the "format_builtin_type" function in Talin's prototype) Objects with alternate conversion specifiers (like datetime objects) would simply overload the function: # In datetime.py @atimport("string") def _string_overloads(module): """Register function overloads in string module""" overload(module.convert, time)(time.strftime) overload(module.convert, date)(date.strftime) overload(module.convert, datetime)(datetime.strftime) The "cformat" function from Talin's prototype could then be named "string.format", with the signature: # In string.py def format(fmt, positional=(), named=None, field_hook=None): """Create a formatted string from positional and named values""" # Format as per PEP 3101 # (AKA the "cformat" function in Talin's prototype) Finally, the format method of str objects would use the above: # Method of str objects def format(self, *args, **kwds): from string import format return format(self, args, kwds) So if you had an existing tuple and/or dictionary, you could do "from string import format" and use the function directly in order to save creation of an unnecessary copy of the containers. Cheers, Nick. 
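The 'overloaded', 'overload' and 'atimport' decorators in the sketch above are hypothetical; the same dispatch-by-type idea can be illustrated with functools.singledispatch, which dispatches on the type of the first argument:

```python
from functools import singledispatch
from datetime import date

@singledispatch
def convert(obj, spec):
    """Convert an object to a string using a conversion specifier."""
    # Default handling: defer to the built-in format machinery.
    return format(obj, spec)

@convert.register(date)
def _(obj, spec):
    # Dates interpret the specifier as strftime codes instead.
    return obj.strftime(spec)

assert convert(2.5, "3.3f") == "2.500"
assert convert(date(2006, 6, 24), "%Y-%m") == "2006-06"
```

Types with alternate specifier languages register an overload without the default implementation needing to know about them.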
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From talin at acm.org Sat Jun 24 08:17:05 2006 From: talin at acm.org (Talin) Date: Fri, 23 Jun 2006 23:17:05 -0700 Subject: [Python-3000] PEP 3101 update In-Reply-To: References: <20060614194514.3F76B8B354@xprdmxin.myway.com> <4490E8B6.7010601@acm.org> <44983CBD.3000703@acm.org> Message-ID: <449CD8E1.7030306@acm.org> Guido van Rossum wrote: > On 6/20/06, Talin wrote: > The svn access controls make this impossible AFAIK (but I know very > little about them). I suggest you use one of the more distributed > alternatives, e.g. Mercurial (I keep hearing good things about it). All right, I spent some time playing around with Mercurial and so far I am pretty impressed. Particularly with the fact that it can be used as a .cgi script. Within an hour after downloading the Mercurial source I was able to: -- Compile and install the package on my laptop -- Create an initial repository -- Check in my prototype -- Compile and install Mercurial on the web server machine (I have an account at bluehost.com) -- Propagate the changes from my laptop to the server -- Set up Mercurial to function as a .cgi script -- Write a .htaccess file to tell Apache to use it You can see the result here: http://www.viridia.org/hg/python/string_format (My god, there's even an RSS feed. Sheesh!) I'd invite all interested parties to take a look at the code. Some of it's pretty experimental and I am sure that there are better ways to do it. But right now it's primarily intended as a proof of concept. -- Talin From tomerfiliba at gmail.com Sat Jun 24 16:25:28 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Sat, 24 Jun 2006 16:25:28 +0200 Subject: [Python-3000] sock2 v0.6 Message-ID: <1d85506f0606240725g4702c7bfw37021a1297c197e2@mail.gmail.com> i updated the sock2 package. 
this release: * added all the socket options that are defined in socketmodule.c * redesigned the DNS module * updated the design docs on the site http://sebulba.wikispaces.com/project+sock2 please download and mess with it a little, and send back your comments. thanks. -tomer -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20060624/a62d59c1/attachment.html From murman at gmail.com Sat Jun 24 17:55:09 2006 From: murman at gmail.com (Michael Urman) Date: Sat, 24 Jun 2006 10:55:09 -0500 Subject: [Python-3000] PEP 3101 update In-Reply-To: <449CA7CC.6040506@gmail.com> References: <20060614194514.3F76B8B354@xprdmxin.myway.com> <44983CBD.3000703@acm.org> <4499FB7F.9030604@acm.org> <449A49D0.8020108@canterbury.ac.nz> <449A67AE.3090002@gmail.com> <449B86A9.8010009@acm.org> <449CA7CC.6040506@gmail.com> Message-ID: On 6/23/06, Nick Coghlan wrote: > I like the internal counter concept though - it means that purely positional > stuff can be written without any additional mental overhead, and with the > braces being the only additional typing when compared to the status quo. > > That way, if you didn't have any field formatting you wanted to do, you could > just write: > > "{} picks up the {}. It is {}!".format(person, thing, adjective) I don't like this, as it makes it easy to fall into a localization trap which the original programmer may have no reason to predict. While it would be possible to add indices to the translated format string (unlike the usual C/C++ equivalent), it would make things much more confusing, possibly tempting constructs like "{1} ... {0} ... {}". I doubt most translators would be intimately familiar with Python's format specification rules. I would much prefer the consistent explicit counter (or lookup-key) in the format specifier. Michael -- Michael Urman http://www.tortall.net/mu/blog
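Michael's localization point can be made concrete: explicit indices let a translated template reorder fields while the call site stays unchanged (the reordered template below is invented for illustration, not a real translation):

```python
# With explicit indices, a translator can reorder fields freely;
# the calling code never changes.
english = "{0} picks up the {1}."
reordered = "The {1} is picked up by {0}."

assert english.format("Bob", "sword") == "Bob picks up the sword."
assert reordered.format("Bob", "sword") == "The sword is picked up by Bob."
```

With implicit sequential counting, the reordered template would have no way to swap the arguments without renumbering every field.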