From starsareblueandfaraway at gmail.com Thu May 5 16:37:04 2011 From: starsareblueandfaraway at gmail.com (Roy Hyunjin Han) Date: Thu, 5 May 2011 10:37:04 -0400 Subject: [Python-ideas] [Python-Dev] What if replacing items in a dictionary returns the new dictionary? In-Reply-To: References: <20110429143406.GA441@iskra.aviel.ru> Message-ID: >> 2011/4/29 Roy Hyunjin Han : >> It would be convenient if replacing items in a dictionary returns the >> new dictionary, in a manner analogous to str.replace(). What do you >> think? >> >> # Current behavior >> x = {'key1': 1} >> x.update(key1=3) == None >> x == {'key1': 3} # Original variable has changed >> >> # Possible behavior >> x = {'key1': 1} >> x.replace(key1=3) == {'key1': 3} >> x == {'key1': 1} # Original variable is unchanged >> > 2011/5/5 Giuseppe Ottaviano : > In general nothing stops you to use a proxy object that returns itself > after each method call, something like > > class using(object): > def __init__(self, obj): > self._wrappee = obj > > def unwrap(self): > return self._wrappee > > def __getattr__(self, attr): > def wrapper(*args, **kwargs): > getattr(self._wrappee, attr)(*args, **kwargs) > return self > return wrapper > > > d = dict() > print using(d).update(dict(a=1)).update(dict(b=2)).unwrap() > # prints {'a': 1, 'b': 2} > l = list() > print using(l).append(1).append(2).unwrap() > # prints [1, 2] Cool! I never thought of that. That's a great snippet. I'll forward this to the python-ideas list. I don't think the python-dev people want this discussion to continue on their mailing list. From starsareblueandfaraway at gmail.com Thu May 5 16:42:57 2011 From: starsareblueandfaraway at gmail.com (Roy Hyunjin Han) Date: Thu, 5 May 2011 10:42:57 -0400 Subject: [Python-ideas] [Python-Dev] What if replacing items in a dictionary returns the new dictionary? In-Reply-To: References: <20110429143406.GA441@iskra.aviel.ru> Message-ID: >> ? ?# Possible behavior >> ? ?x = {'key1': 1} >> ? ?x.replace(key1=3) == {'key1': 3} >> ? ?x == {'key1': 1} # Original variable is unchanged >> > 2011/5/5 Giuseppe Ottaviano : > class using(object): > ? ?def __init__(self, obj): > ? ? ? ?self._wrappee = obj > > ? ?def unwrap(self): > ? ? ? ?return self._wrappee > > ? ?def __getattr__(self, attr): > ? ? ? ?def wrapper(*args, **kwargs): > ? ? ? ? ? ?getattr(self._wrappee, attr)(*args, **kwargs) > ? ? ? ? ? ?return self > ? ? ? ?return wrapper The only thing I would add is obj.copy(), to ensure that the original dictionary is unchanged. class using(object): def __init__(self, obj): self._wrappee = obj.copy() From starsareblueandfaraway at gmail.com Thu May 5 17:19:16 2011 From: starsareblueandfaraway at gmail.com (Roy Hyunjin Han) Date: Thu, 5 May 2011 11:19:16 -0400 Subject: [Python-ideas] [Python-Dev] What if replacing items in a dictionary returns the new dictionary? In-Reply-To: References: <20110429143406.GA441@iskra.aviel.ru> Message-ID: 2011/5/5 Giuseppe Ottaviano : >> The only thing I would add is obj.copy(), to ensure that the original >> dictionary is unchanged. >> >> class using(object): >> ? ?def __init__(self, obj): >> ? ? ? ?self._wrappee = obj.copy() > > My example was just a proof of concept, there are many other things > that may need to be taken care of (for example, non-callable > attributes). > BTW, the copy should be done outside. If the object is copied, I'd say > "using" is a poor choice of name for the proxy. You're right, I would need to do more work to get it to mimic the underlying object. 
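A minimal sketch of what that extra work might look like (an illustration added here, not code from the thread): pass non-callable attributes straight through, and leave the copy to the caller, as Giuseppe suggests.

class using(object):
    def __init__(self, obj):
        self._wrappee = obj

    def unwrap(self):
        return self._wrappee

    def __getattr__(self, attr):
        value = getattr(self._wrappee, attr)
        if not callable(value):
            return value  # non-callable attributes are returned as-is
        def wrapper(*args, **kwargs):
            value(*args, **kwargs)
            return self
        return wrapper

d = {'key1': 1}
result = using(d.copy()).update(key1=3).unwrap()
# result == {'key1': 3}, and d is still {'key1': 1} because the caller copied first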
I think I will stick with Oleg's suggestion to subclass dict for now; it's great for unit tests. Thanks for the idea, though.

class ReplaceableDict(dict):

    def replace(self, **kwargs):
        'Works for replacing string-based keys'
        return dict(self.items() + kwargs.items())

From moloney at ohsu.edu Thu May 5 23:41:06 2011 From: moloney at ohsu.edu (Brendan Moloney) Date: Thu, 5 May 2011 14:41:06 -0700 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> Message-ID: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu>

Hello, I posted this on python-dev, but was told that this is the more appropriate list. Currently if I do:

$ import pkg

Then none of the public subpackages/submodules are automatically pulled into the 'pkg' namespace. I can do:

$ from pkg import *

To get all of the public subpackages/submodules, but that dumps them all into the current namespace. Why not allow:

$ import pkg.*

This would allow easier interactive use (by eliminating the need to import individual subpackages/submodules) while keeping the 'pkg' namespace around. Thanks, Brendan Moloney

From benjamin at python.org Fri May 6 00:00:35 2011 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 5 May 2011 22:00:35 +0000 (UTC) Subject: [Python-ideas] Allow 'import star' with namespaces References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> Message-ID: Brendan Moloney writes: > This would allow easier interactive use (by eliminating the need to import individual > subpackages/submodules) while keeping the 'pkg' namespace around. import * is generally frowned upon, so encouraging its use by extending it is not a good idea.

From moloney at ohsu.edu Fri May 6 00:24:16 2011 From: moloney at ohsu.edu (Brendan Moloney) Date: Thu, 5 May 2011 15:24:16 -0700 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu>, Message-ID: <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> Benjamin Peterson [benjamin at python.org] wrote: > import * is generally frowned upon, so encouraging its use by extending it is > not a good idea. Well it is frowned upon precisely because it pollutes the current namespace. This change would eliminate that issue.

From dag.odenhall at gmail.com Fri May 6 09:20:26 2011 From: dag.odenhall at gmail.com (dag.odenhall at gmail.com) Date: Fri, 6 May 2011 09:20:26 +0200 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> Message-ID: On 6 May 2011 00:24, Brendan Moloney wrote: > Benjamin Peterson [benjamin at python.org] wrote: >> import * is generally frowned upon, so encouraging its use by extending it is >> not a good idea. > > Well it is frowned upon precisely because it pollutes the current namespace. This change would eliminate that issue. I like this idea, except it's inconsistent with from-import-star, which does *not* get you sub-packages or modules. 
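For reference, a rough sketch of what the proposed 'import pkg.*' could amount to today, assuming the package names its public submodules in __all__ (an added illustration; import_star is a hypothetical helper, not an existing or proposed API):

import importlib

def import_star(package_name):
    pkg = importlib.import_module(package_name)
    for name in getattr(pkg, '__all__', ()):
        try:
            # import_module binds the submodule as an attribute of the parent package
            importlib.import_module(package_name + '.' + name)
        except ImportError:
            pass  # __all__ may list plain names as well as submodules
    return pkg

# pkg = import_star('pkg')  # pkg.submodule is then usable without further imports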
From g.brandl at gmx.net Fri May 6 09:44:02 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 06 May 2011 09:44:02 +0200 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> Message-ID: On 06.05.2011 09:20, dag.odenhall at gmail.com wrote: > On 6 May 2011 00:24, Brendan Moloney wrote: >> Benjamin Peterson [benjamin at python.org] wrote: >>> import * is generally frowned upon, so encouraging its use by extending it is >>> not a good idea. >> >> Well it is frowned upon precisely because it pollutes the current namespace. This change would eliminate that issue. > > I like this idea, except it's inconsistent with from-import-star, the > latter which does *not* get you sub-packages or modules. And that's for a reason: it's not easy (I think it's even impossible, because for example individual submodules can change __path__) to determine all importable submodules of a package. So ``import pkg.*`` would not have any behavior other than ``import pkg``. Georg From matt at whoosh.ca Fri May 6 19:51:24 2011 From: matt at whoosh.ca (Matt Chaput) Date: Fri, 06 May 2011 13:51:24 -0400 Subject: [Python-ideas] 1_000_000 Message-ID: <4DC4351C.2000109@whoosh.ca> Not sure if this has been proposed before: A syntax change to allow underscores as thousands separators in literal numbers to improve readability, e.g.: for i in range(1, 1_000_000): pass I believe D allows this and while it's a small thing it really is much more readable. Worth a PEP? Thanks, Matt From janssen at parc.com Fri May 6 21:11:59 2011 From: janssen at parc.com (Bill Janssen) Date: Fri, 6 May 2011 12:11:59 PDT Subject: [Python-ideas] thoughts on regular expression improvements Message-ID: <98999.1304709119@parc.com> I've been doing a lot of RE hacking lately, and some possible improvements suggest themselves. 1. Multiple occurrences of a named group Right now, you can compose RE's with x = re.compile("...") y = re.compile("..." + x.pattern + "...") But if x contains named groups, you run into trouble if you have something like z = re.compile("..." + x.pattern + "..." + x.pattern + "...") which can easily happen if x could occur at various places in z. The issue is that a named group is only allowed once, which isn't a bad error-prevention mechanism, but it would be nice if it could occur more than once (in alternative subexpressions), perhaps enabled by a another RE flag. 2. Easier composition. Writing y = re.compile("..." + x.pattern + "...") seems a tad groty, to use a term from my childhood, and affords the RE engine no purchase on the composition, which can be an issue if the flags for x are different from the flags for y. If the first argument to re.compile could be a tuple or list, you could write y = re.compile(["...", x, "..."]) and the engine could see that "..." is a string, and that x is a RE, and could inspect x as necessary. 3. Edit distances. The RE engine TRE (http://laurikari.net/tre/about/) supports fuzzy matching of strings, using edit distances. One can write an expression like "(total){~2}" which would any string that's "total" with no more than two edit errors. You can also specify insertions, deletions, and substitution limits separately with "+", "-", and "#". That would be nice to have... 
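A rough sketch of the kind of composition point 2 asks for, using only the existing re module (an added illustration; compile_parts is a made-up helper, not a proposed API, and it does nothing about mismatched flags, which is part of the point):

import re

def compile_parts(parts, flags=0):
    # accept a mix of plain strings and already-compiled patterns
    source = ''.join(p.pattern if hasattr(p, 'pattern') else p for p in parts)
    return re.compile(source, flags)

date = re.compile(r'\d{4}-\d{2}-\d{2}')
span = compile_parts([r'^from\s+', date, r'\s+to\s+', date, r'$'])
# span.match('from 2011-05-06 to 2011-05-07') succeeds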
Bill From moloney at ohsu.edu Fri May 6 21:49:08 2011 From: moloney at ohsu.edu (Brendan Moloney) Date: Fri, 6 May 2011 12:49:08 -0700 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> , Message-ID: <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu> dag.odenhall at gmail.com wrote: > I like this idea, except it's inconsistent with from-import-star, the > latter which does *not* get you sub-packages or modules. Georg Brandl [g.brandl at gmx.net] wrote: > And that's for a reason: it's not easy (I think it's even impossible, because > for example individual submodules can change __path__) to determine all > importable submodules of a package. > So ``import pkg.*`` would not have any behavior other than ``import pkg``. When I said all _public_ sub-packages and modules I was referring to those listed in the __all__ attribute of 'pkg'. Thus it would behave in the exact same way as from-import-star except you don't pollute the current namespace. Brendan From dirkjan at ochtman.nl Fri May 6 21:58:36 2011 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Fri, 6 May 2011 21:58:36 +0200 Subject: [Python-ideas] thoughts on regular expression improvements In-Reply-To: <98999.1304709119@parc.com> References: <98999.1304709119@parc.com> Message-ID: On Fri, May 6, 2011 at 21:11, Bill Janssen wrote: > I've been doing a lot of RE hacking lately, and some possible > improvements suggest themselves. Have you looked at the regex module? Cheers, Dirkjan From ethan at stoneleaf.us Fri May 6 22:12:00 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 06 May 2011 13:12:00 -0700 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu> References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> , <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu> Message-ID: <4DC45610.3040803@stoneleaf.us> Brendan Moloney wrote: > dag.odenhall at gmail.com wrote: >> I like this idea, except it's inconsistent with from-import-star, the >> latter which does *not* get you sub-packages or modules. > > Georg Brandl [g.brandl at gmx.net] wrote: >> And that's for a reason: it's not easy (I think it's even impossible, because >> for example individual submodules can change __path__) to determine all >> importable submodules of a package. > >> So ``import pkg.*`` would not have any behavior other than ``import pkg``. > > When I said all _public_ sub-packages and modules I was referring to those > listed in the __all__ attribute of 'pkg'. Thus it would behave in the exact > same way as from-import-star except you don't pollute the current namespace. I'm not catching the vision -- could you put together a short example that would illustrate? 
~Ethan~ From janssen at parc.com Fri May 6 22:28:12 2011 From: janssen at parc.com (Bill Janssen) Date: Fri, 6 May 2011 13:28:12 PDT Subject: [Python-ideas] thoughts on regular expression improvements In-Reply-To: References: <98999.1304709119@parc.com> Message-ID: <641.1304713692@parc.com> Dirkjan Ochtman wrote: > On Fri, May 6, 2011 at 21:11, Bill Janssen wrote: > > I've been doing a lot of RE hacking lately, and some possible > > improvements suggest themselves. > > Have you looked at the regex module? >From Python 1.4? Not in a long time... Bill From janssen at parc.com Fri May 6 22:32:18 2011 From: janssen at parc.com (Bill Janssen) Date: Fri, 6 May 2011 13:32:18 PDT Subject: [Python-ideas] thoughts on regular expression improvements In-Reply-To: References: <98999.1304709119@parc.com> Message-ID: <818.1304713938@parc.com> Dirkjan Ochtman wrote: > On Fri, May 6, 2011 at 21:11, Bill Janssen wrote: > > I've been doing a lot of RE hacking lately, and some possible > > improvements suggest themselves. > > Have you looked at the regex module? Ah, you mean the PyPI "regex". Looks like it has "branch reset", which might support my #1? Using the same group name multiple times? I don't see fuzzy matches, or support for composition, though. Bill From jsbueno at python.org.br Fri May 6 22:42:53 2011 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Fri, 6 May 2011 17:42:53 -0300 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: <4DC45610.3040803@stoneleaf.us> References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu> <4DC45610.3040803@stoneleaf.us> Message-ID: On Fri, May 6, 2011 at 5:12 PM, Ethan Furman wrote: > Brendan Moloney wrote: >> >> dag.odenhall at gmail.com wrote: >>> >>> I like this idea, except it's inconsistent with from-import-star, the >>> latter which does *not* get you sub-packages or modules. >> >> Georg Brandl [g.brandl at gmx.net] wrote: >>> >>> And that's for a reason: it's not easy (I think it's even impossible, >>> because >>> for example individual submodules can change __path__) to determine all >>> importable submodules of a package. >> >>> So ``import pkg.*`` would not have any behavior other than ``import >>> pkg``. >> >> When I said all _public_ sub-packages and modules I was referring to those > >> listed in the ?__all__ attribute of 'pkg'. ?Thus it would behave in the >> exact >> same way as from-import-star except you don't pollute the current >> namespace. > > > I'm not catching the vision -- could you put together a short example that > would illustrate? The idea is to be able to do operate witha single import when submodules would have to be implicited imported - like xml.etree.ElementTree : [gwidion at powerpuff ~]$ python Python 2.6.1 (r261:67515, Apr 12 2009, 04:14:16) [GCC 4.3.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> import xml >>> xml.etree Traceback (most recent call last): File "", line 1, in AttributeError: 'module' object has no attribute 'etree' >>> import xml.etree >>> xml.etree.ElementTree Traceback (most recent call last): File "", line 1, in AttributeError: 'module' object has no attribute 'ElementTree' >>> import xml.etree.ElementTree >>> xml.etree.ElementTree > > ~Ethan~ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From moloney at ohsu.edu Fri May 6 22:50:14 2011 From: moloney at ohsu.edu (Brendan Moloney) Date: Fri, 6 May 2011 13:50:14 -0700 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: <4DC45610.3040803@stoneleaf.us> References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> , <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu>, <4DC45610.3040803@stoneleaf.us> Message-ID: <5E25C96030E66B44B9CFAA95D3DE5919351310A7B5@EX-MB08.ohsu.edu> Ethan Furman [ethan at stoneleaf.us] wrote: > I'm not catching the vision -- could you put together a short example > that would illustrate? The motivation is really just for interactive usage (much like the current from-import-star). If 'pkg' contains a number of sub-packages/modules that take a while to import, it makes sense to not automatically import them into the 'pkg' namespace (in the pkg.__init__ module). Putting the sub-package/module names into the __all__ list gives interactive users the ability to import everything in one go using from-import-star. Unfortunately the from-import-star usage pollutes the current namespace, and thus its use is discouraged. So really the vision is that developers can make their packages convenient for interactive use (by setting the __all__ attribute) without requiring users to use a discouraged language feature or making regular import of the package slow. Brendan From ericsnowcurrently at gmail.com Fri May 6 22:52:09 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 6 May 2011 14:52:09 -0600 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: <4DC45610.3040803@stoneleaf.us> References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu> <4DC45610.3040803@stoneleaf.us> Message-ID: On Fri, May 6, 2011 at 2:12 PM, Ethan Furman wrote: > Brendan Moloney wrote: > >> dag.odenhall at gmail.com wrote: >> >>> I like this idea, except it's inconsistent with from-import-star, the >>> latter which does *not* get you sub-packages or modules. >>> >> >> Georg Brandl [g.brandl at gmx.net] wrote: >> >>> And that's for a reason: it's not easy (I think it's even impossible, >>> because >>> for example individual submodules can change __path__) to determine all >>> importable submodules of a package. >>> >> >> So ``import pkg.*`` would not have any behavior other than ``import >>> pkg``. >>> >> >> When I said all _public_ sub-packages and modules I was referring to those >> > > listed in the __all__ attribute of 'pkg'. Thus it would behave in the > exact > > same way as from-import-star except you don't pollute the current > namespace. 
> > > I'm not catching the vision -- could you put together a short example that > would illustrate? > > He's saying that the package would be imported like normal. Then all "public" sub-modules of the package would automatically imported and bound to the namespace of the object that resulted from the import of the package. The trickery is that __all__ in the __init__.py would change meaning somewhat, and, do you bind the submodules into the package's module object or something else? If you have a list of the submodules you want imported then you can already accomplish this: import parent for mod in parent.__all_submodules__: __import__("parent.{}".format(mod)) Of course, this does not bind the submodules to the namespace of the package module, but I suppose you could try that with one more step. I am not sure of the specific import mechanism with regards to name binding, but that would seem to be a conflict with the way imported names for submodules are bound. -eric ~Ethan~ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dag.odenhall at gmail.com Fri May 6 22:59:05 2011 From: dag.odenhall at gmail.com (dag.odenhall at gmail.com) Date: Fri, 6 May 2011 22:59:05 +0200 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: <4DC45610.3040803@stoneleaf.us> References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu> <4DC45610.3040803@stoneleaf.us> Message-ID: On 6 May 2011 22:12, Ethan Furman wrote: > Brendan Moloney wrote: >> >> dag.odenhall at gmail.com wrote: >>> >>> I like this idea, except it's inconsistent with from-import-star, the >>> latter which does *not* get you sub-packages or modules. >> >> Georg Brandl [g.brandl at gmx.net] wrote: >>> >>> And that's for a reason: it's not easy (I think it's even impossible, >>> because >>> for example individual submodules can change __path__) to determine all >>> importable submodules of a package. >> >>> So ``import pkg.*`` would not have any behavior other than ``import >>> pkg``. >> >> When I said all _public_ sub-packages and modules I was referring to those > >> listed in the ?__all__ attribute of 'pkg'. ?Thus it would behave in the >> exact >> same way as from-import-star except you don't pollute the current >> namespace. If you're going to require listing in __all__ anyway, you might as well use what already works: import the modules in the package, and you can then import the package and access the modules as attributes: pkg/__init__.py: from . import mod script.py: import pkg pkg.mod #=> pkg/mod.py From dag.odenhall at gmail.com Fri May 6 23:06:18 2011 From: dag.odenhall at gmail.com (dag.odenhall at gmail.com) Date: Fri, 6 May 2011 23:06:18 +0200 Subject: [Python-ideas] 1_000_000 In-Reply-To: <4DC4351C.2000109@whoosh.ca> References: <4DC4351C.2000109@whoosh.ca> Message-ID: On 6 May 2011 19:51, Matt Chaput wrote: > Not sure if this has been proposed before: A syntax change to allow > underscores as thousands separators in literal numbers to improve > readability, e.g.: > > ?for i in range(1, 1_000_000): > ? ?pass > > I believe D allows this and while it's a small thing it really is much more > readable. Ruby too. 
You could also use e-notation[1]: 1e6, in your example. In many situations it's even more readable because you don't need to "count the zeros". This is already supported in Python. [1] http://en.wikipedia.org/wiki/Scientific_notation#E_notation From nadeem.vawda at gmail.com Fri May 6 23:23:05 2011 From: nadeem.vawda at gmail.com (Nadeem Vawda) Date: Fri, 6 May 2011 23:23:05 +0200 Subject: [Python-ideas] 1_000_000 In-Reply-To: References: <4DC4351C.2000109@whoosh.ca> Message-ID: On Fri, May 6, 2011 at 11:06 PM, dag.odenhall at gmail.com wrote: > You could also use e-notation[1]: 1e6, in your example. 1e6 is a float, though. If you use it in that example, range() complains that its arguments must be integers. From solipsis at pitrou.net Fri May 6 23:24:07 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 6 May 2011 23:24:07 +0200 Subject: [Python-ideas] 1_000_000 References: <4DC4351C.2000109@whoosh.ca> Message-ID: <20110506232407.2bd211a1@pitrou.net> On Fri, 6 May 2011 23:06:18 +0200 "dag.odenhall at gmail.com" wrote: > On 6 May 2011 19:51, Matt Chaput wrote: > > Not sure if this has been proposed before: A syntax change to allow > > underscores as thousands separators in literal numbers to improve > > readability, e.g.: > > > > ?for i in range(1, 1_000_000): > > ? ?pass > > > > I believe D allows this and while it's a small thing it really is much more > > readable. > > Ruby too. > > You could also use e-notation[1]: 1e6, in your example. In many > situations it's even more readable because you don't need to "count > the zeros". This is already supported in Python. Yes, but it gives a float, not an integer: >>> for i in range(0, 1e6): pass ... Traceback (most recent call last): File "", line 1, in TypeError: 'float' object cannot be interpreted as an integer Regards Antoine. From kirubakaran at gmail.com Fri May 6 23:25:56 2011 From: kirubakaran at gmail.com (Kirubakaran) Date: Fri, 6 May 2011 14:25:56 -0700 Subject: [Python-ideas] 1_000_000 In-Reply-To: <20110506232407.2bd211a1@pitrou.net> References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> Message-ID: How about range(10**60) ? - Kirubakaran. On Fri, May 6, 2011 at 2:24 PM, Antoine Pitrou wrote: > On Fri, 6 May 2011 23:06:18 +0200 > "dag.odenhall at gmail.com" > wrote: > > On 6 May 2011 19:51, Matt Chaput < > matt-KKMwxO2wslj3fQ9qLvQP4Q at public.gmane.org> wrote: > > > Not sure if this has been proposed before: A syntax change to allow > > > underscores as thousands separators in literal numbers to improve > > > readability, e.g.: > > > > > > for i in range(1, 1_000_000): > > > pass > > > > > > I believe D allows this and while it's a small thing it really is much > more > > > readable. > > > > Ruby too. > > > > You could also use e-notation[1]: 1e6, in your example. In many > > situations it's even more readable because you don't need to "count > > the zeros". This is already supported in Python. > > Yes, but it gives a float, not an integer: > > >>> for i in range(0, 1e6): pass > ... > Traceback (most recent call last): > File "", line 1, in > TypeError: 'float' object cannot be interpreted as an integer > > > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kirubakaran at gmail.com Fri May 6 23:26:14 2011 From: kirubakaran at gmail.com (Kirubakaran) Date: Fri, 6 May 2011 14:26:14 -0700 Subject: [Python-ideas] 1_000_000 In-Reply-To: References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> Message-ID: (fixed typo) How about range(10**6) ? - Kirubakaran. On Fri, May 6, 2011 at 2:25 PM, Kirubakaran wrote: > How about range(10**60) ? > > - Kirubakaran. > > > On Fri, May 6, 2011 at 2:24 PM, Antoine Pitrou wrote: > >> On Fri, 6 May 2011 23:06:18 +0200 >> "dag.odenhall at gmail.com" >> wrote: >> > On 6 May 2011 19:51, Matt Chaput < >> matt-KKMwxO2wslj3fQ9qLvQP4Q at public.gmane.org> wrote: >> > > Not sure if this has been proposed before: A syntax change to allow >> > > underscores as thousands separators in literal numbers to improve >> > > readability, e.g.: >> > > >> > > for i in range(1, 1_000_000): >> > > pass >> > > >> > > I believe D allows this and while it's a small thing it really is much >> more >> > > readable. >> > >> > Ruby too. >> > >> > You could also use e-notation[1]: 1e6, in your example. In many >> > situations it's even more readable because you don't need to "count >> > the zeros". This is already supported in Python. >> >> Yes, but it gives a float, not an integer: >> >> >>> for i in range(0, 1e6): pass >> ... >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: 'float' object cannot be interpreted as an integer >> >> >> Regards >> >> Antoine. >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matt at whoosh.ca Fri May 6 23:36:47 2011 From: matt at whoosh.ca (Matt Chaput) Date: Fri, 06 May 2011 17:36:47 -0400 Subject: [Python-ideas] 1_000_000 In-Reply-To: References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> Message-ID: <4DC469EF.5000408@whoosh.ca> On 06/05/2011 5:26 PM, Kirubakaran wrote: > (fixed typo) > How about range(10**6) ? Both 1e6 (if it worked in the example) and 10**6 both require a bit of work (at least for my non-mathematician brain) to decode as "1 million", whereas with 1_000_000 you're not so much counting the zeros in your head as counting the *groups* of zeros visually. For me it's much more readable at a glance. Also, obviously the 10**6 trick doesn't work so well if the example is: for i in range(47_284_345): pass Matt From kirubakaran at gmail.com Fri May 6 23:37:10 2011 From: kirubakaran at gmail.com (Kirubakaran) Date: Fri, 6 May 2011 14:37:10 -0700 Subject: [Python-ideas] 1_000_000 In-Reply-To: References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> Message-ID: Ah, thanks. Sorry, I don't know how I failed to see that. On Fri, May 6, 2011 at 2:30 PM, Andre Roberge wrote: > I believe that the original suggestion was meant to be more general than > the specific suggestions for powers of 10. For example, consider the > following hypothetical: > > for i in range(1, 1_111_111_111, 1024): > pass > > where the _ really helps in figuring out the size. > > Andr? > > > On Fri, May 6, 2011 at 6:26 PM, Kirubakaran wrote: > >> (fixed typo) >> How about range(10**6) ? >> >> - Kirubakaran. >> >> >> On Fri, May 6, 2011 at 2:25 PM, Kirubakaran wrote: >> >>> How about range(10**60) ? >>> >>> - Kirubakaran. 
>>> >>> >>> On Fri, May 6, 2011 at 2:24 PM, Antoine Pitrou wrote: >>> >>>> On Fri, 6 May 2011 23:06:18 +0200 >>>> "dag.odenhall at gmail.com" >>>> wrote: >>>> > On 6 May 2011 19:51, Matt Chaput < >>>> matt-KKMwxO2wslj3fQ9qLvQP4Q at public.gmane.org> wrote: >>>> > > Not sure if this has been proposed before: A syntax change to allow >>>> > > underscores as thousands separators in literal numbers to improve >>>> > > readability, e.g.: >>>> > > >>>> > > for i in range(1, 1_000_000): >>>> > > pass >>>> > > >>>> > > I believe D allows this and while it's a small thing it really is >>>> much more >>>> > > readable. >>>> > >>>> > Ruby too. >>>> > >>>> > You could also use e-notation[1]: 1e6, in your example. In many >>>> > situations it's even more readable because you don't need to "count >>>> > the zeros". This is already supported in Python. >>>> >>>> Yes, but it gives a float, not an integer: >>>> >>>> >>> for i in range(0, 1e6): pass >>>> ... >>>> Traceback (most recent call last): >>>> File "", line 1, in >>>> TypeError: 'float' object cannot be interpreted as an integer >>>> >>>> >>>> Regards >>>> >>>> Antoine. >>>> >>>> >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> http://mail.python.org/mailman/listinfo/python-ideas >>>> >>> >>> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bruce at leapyear.org Fri May 6 23:38:19 2011 From: bruce at leapyear.org (Bruce Leban) Date: Fri, 6 May 2011 14:38:19 -0700 Subject: [Python-ideas] 1_000_000 In-Reply-To: References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> Message-ID: None of these answers address the original suggestion. Matt didn't say that he only wanted this for numbers of the form 10^N; he just gave that as an example. Consider these examples instead: - 1_234_000 - 9.876_543_210 - 0xFEFF_0042 I'm not advocating this change (nor against it); I just think the discussion should be focused on the actual idea. I do have a question: Is _ just ignored in numbers or are there more complex rules? - 1_2345_6789 (can I use groups of other sizes instead?) - 1_2_3_4_5 (ditto) - 1_234_6789 (do all the groups need to be the same size?) - 1_ (must the _ only be in between 2 digits?) - 1__234 (what about multiple _s?) - 9.876_543_210 (can it be used to the right of the decimal point?) - 0xFEFF_0042 (can it be used in hex, octal or binary numbers?) - int('123_456') (do other functions accept this syntax too?) --- Bruce Puzzazz newsletter: http://j.mp/puzzazz-news-2011-04 including April Fools! Blog post: http://www.vroospeak.com Ironically, a glaring Google grammatical error On Fri, May 6, 2011 at 2:26 PM, Kirubakaran wrote: > (fixed typo) > How about range(10**6) ? > > - Kirubakaran. > > > On Fri, May 6, 2011 at 2:25 PM, Kirubakaran wrote: > >> How about range(10**60) ? >> >> - Kirubakaran. 
>> >> >> On Fri, May 6, 2011 at 2:24 PM, Antoine Pitrou wrote: >> >>> On Fri, 6 May 2011 23:06:18 +0200 >>> "dag.odenhall at gmail.com" >>> wrote: >>> > On 6 May 2011 19:51, Matt Chaput < >>> matt-KKMwxO2wslj3fQ9qLvQP4Q at public.gmane.org> wrote: >>> > > Not sure if this has been proposed before: A syntax change to allow >>> > > underscores as thousands separators in literal numbers to improve >>> > > readability, e.g.: >>> > > >>> > > for i in range(1, 1_000_000): >>> > > pass >>> > > >>> > > I believe D allows this and while it's a small thing it really is >>> much more >>> > > readable. >>> > >>> > Ruby too. >>> > >>> > You could also use e-notation[1]: 1e6, in your example. In many >>> > situations it's even more readable because you don't need to "count >>> > the zeros". This is already supported in Python. >>> >>> Yes, but it gives a float, not an integer: >>> >>> >>> for i in range(0, 1e6): pass >>> ... >>> Traceback (most recent call last): >>> File "", line 1, in >>> TypeError: 'float' object cannot be interpreted as an integer >>> >>> >>> Regards >>> >>> Antoine. >>> >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> http://mail.python.org/mailman/listinfo/python-ideas >>> >> >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sat May 7 00:04:43 2011 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 6 May 2011 23:04:43 +0100 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu> <4DC45610.3040803@stoneleaf.us> Message-ID: On 6 May 2011 21:52, Eric Snow wrote: > He's saying that the package would be imported like normal. ?Then all > "public" sub-modules of the package would automatically imported and bound > to the namespace of the object that resulted from the import of the package. There is no means of determining what submodules of a package exist. Check PEP 302 for details - finders find modules ant they can do so any way they like - there's nothing in the protocol to enumerate subpackages, so you can't do it (if faced with a general PEP 302 finder). Paul. From ethan at stoneleaf.us Sat May 7 00:40:06 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 06 May 2011 15:40:06 -0700 Subject: [Python-ideas] 1_000_000 In-Reply-To: References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> Message-ID: <4DC478C6.3010801@stoneleaf.us> Bruce Leban wrote: > None of these answers address the original suggestion. Matt didn't say > that he only wanted this for numbers of the form 10^N; he just gave that > as an example. > > Consider these examples instead: > > * 1_234_000 > * 9.876_543_210 > * 0xFEFF_0042 > > I'm not advocating this change (nor against it); I just think the > discussion should be focused on the actual idea. I do have a question: > > Is _ just ignored in numbers or are there more complex rules? > > * 1_2345_6789 (can I use groups of other sizes instead?) > * 1_2_3_4_5 (ditto) > * 1_234_6789 (do all the groups need to be the same size?) 
> * 1_ (must the _ only be in between 2 digits?) > * 1__234 (what about multiple _s?) > * 9.876_543_210 (can it be used to the right of the decimal point?) > * 0xFEFF_0042 (can it be used in hex, octal or binary numbers?) > * int('123_456') (do other functions accept this syntax too?) I would say it's ignored. Have the rule be something like number_string.replace('_',''). The only wrinkle is that currently '_1' is usable name, and that should probably be disallowed if the above change took place. I'm +1 on the idea. ~Ethan~ From alexander.belopolsky at gmail.com Sat May 7 00:42:59 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 6 May 2011 18:42:59 -0400 Subject: [Python-ideas] 1_000_000 In-Reply-To: <4DC478C6.3010801@stoneleaf.us> References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> <4DC478C6.3010801@stoneleaf.us> Message-ID: On Fri, May 6, 2011 at 6:40 PM, Ethan Furman wrote: .. > The only wrinkle is that currently '_1' is usable name, and that should > probably be disallowed if the above change took place. -1_000 if _1 becomes invalid as an identifier. +0 otherwise. From fdrake at acm.org Sat May 7 00:45:23 2011 From: fdrake at acm.org (Fred Drake) Date: Fri, 6 May 2011 18:45:23 -0400 Subject: [Python-ideas] 1_000_000 In-Reply-To: <4DC478C6.3010801@stoneleaf.us> References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> <4DC478C6.3010801@stoneleaf.us> Message-ID: On Fri, May 6, 2011 at 6:40 PM, Ethan Furman wrote: > The only wrinkle is that currently '_1' is usable name, and that should > probably be disallowed if the above change took place. Why? I've never seen a leading thousands separator in practice. For example, ,123,456 isn't generally accepted usage, so why should _123_456 be considered acceptable? (I'm not taking a position on the proposal here; just commenting on the problem of breaking code by making _1 a number instead of an identifier.) -Fred -- Fred L. Drake, Jr.? ? "Give me the luxuries of life and I will willingly do without the necessities." ?? --Frank Lloyd Wright From ethan at stoneleaf.us Sat May 7 00:58:50 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 06 May 2011 15:58:50 -0700 Subject: [Python-ideas] 1_000_000 In-Reply-To: References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> <4DC478C6.3010801@stoneleaf.us> Message-ID: <4DC47D2A.9090808@stoneleaf.us> Alexander Belopolsky wrote: > On Fri, May 6, 2011 at 6:40 PM, Ethan Furman wrote: > .. >> The only wrinkle is that currently '_1' is usable name, and that should >> probably be disallowed if the above change took place. > > -1_000 if _1 becomes invalid as an identifier. > > +0 otherwise. So you use _8127 style names for your objects* then? ~Ethan~ *Okay, avoiding the word 'variables' can make for some slightly odd sounding sentences! ;) From ethan at stoneleaf.us Sat May 7 01:02:08 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 06 May 2011 16:02:08 -0700 Subject: [Python-ideas] 1_000_000 In-Reply-To: References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> <4DC478C6.3010801@stoneleaf.us> Message-ID: <4DC47DF0.1020001@stoneleaf.us> Fred Drake wrote: > On Fri, May 6, 2011 at 6:40 PM, Ethan Furman wrote: >> The only wrinkle is that currently '_1' is usable name, and that should >> probably be disallowed if the above change took place. > > Why? I've never seen a leading thousands separator in practice. 
For example, > > ,123,456 > > isn't generally accepted usage, so why should > > _123_456 > > be considered acceptable? > > (I'm not taking a position on the proposal here; just commenting on the problem > of breaking code by making _1 a number instead of an identifier.) I see it as a readability issue -- if you have 1_024 and _1025 (etc, etc), where one is a number and the other a name, confusion can easily result. ~Ethan~ From fdrake at acm.org Sat May 7 00:59:02 2011 From: fdrake at acm.org (Fred Drake) Date: Fri, 6 May 2011 18:59:02 -0400 Subject: [Python-ideas] 1_000_000 In-Reply-To: <4DC47D2A.9090808@stoneleaf.us> References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> <4DC478C6.3010801@stoneleaf.us> <4DC47D2A.9090808@stoneleaf.us> Message-ID: On Fri, May 6, 2011 at 6:58 PM, Ethan Furman wrote: > So you use _8127 style names for your objects* then? Code generators often use such names, though. Since _1234 is currently a legal identifier, you'd be breaking backward compatibility. I understand the motivation for a thousands separator, at least (though I'll admit, I don't find it compelling; *all* big numbers in code are too magical). -Fred -- Fred L. Drake, Jr.? ? "Give me the luxuries of life and I will willingly do without the necessities." ?? --Frank Lloyd Wright From cs at zip.com.au Sat May 7 00:51:38 2011 From: cs at zip.com.au (Cameron Simpson) Date: Sat, 7 May 2011 08:51:38 +1000 Subject: [Python-ideas] 1_000_000 In-Reply-To: <4DC478C6.3010801@stoneleaf.us> References: <4DC478C6.3010801@stoneleaf.us> Message-ID: <20110506225138.GA2323@cskk.homeip.net> On 06May2011 15:40, Ethan Furman wrote: | Bruce Leban wrote: | >Is _ just ignored in numbers or are there more complex rules? | > | > * 1_2345_6789 (can I use groups of other sizes instead?) | > * 1_2_3_4_5 (ditto) | > * 1_234_6789 (do all the groups need to be the same size?) | > * 1_ (must the _ only be in between 2 digits?) | > * 1__234 (what about multiple _s?) | > * 9.876_543_210 (can it be used to the right of the decimal point?) | > * 0xFEFF_0042 (can it be used in hex, octal or binary numbers?) | > * int('123_456') (do other functions accept this syntax too?) | | I would say it's ignored. Have the rule be something like | number_string.replace('_',''). | | The only wrinkle is that currently '_1' is usable name, and that | should probably be disallowed if the above change took place. | | I'm +1 on the idea. Personally I'm be for ignoring the _ also, save that I would forbid it at the start or end, so no _1 or 1_. And I would permit it in hex code etc. I'm +0.5, myself. Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ A strong conviction that something must be done is the parent of many bad measures. - Daniel Webster From python at mrabarnett.plus.com Sat May 7 01:41:33 2011 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 07 May 2011 00:41:33 +0100 Subject: [Python-ideas] 1_000_000 In-Reply-To: <20110506225138.GA2323@cskk.homeip.net> References: <4DC478C6.3010801@stoneleaf.us> <20110506225138.GA2323@cskk.homeip.net> Message-ID: <4DC4872D.60004@mrabarnett.plus.com> On 06/05/2011 23:51, Cameron Simpson wrote: > On 06May2011 15:40, Ethan Furman wrote: > | Bruce Leban wrote: > |>Is _ just ignored in numbers or are there more complex rules? > |> > |> * 1_2345_6789 (can I use groups of other sizes instead?) > |> * 1_2_3_4_5 (ditto) > |> * 1_234_6789 (do all the groups need to be the same size?) > |> * 1_ (must the _ only be in between 2 digits?) 
> |> * 1__234 (what about multiple _s?) > |> * 9.876_543_210 (can it be used to the right of the decimal point?) > |> * 0xFEFF_0042 (can it be used in hex, octal or binary numbers?) > |> * int('123_456') (do other functions accept this syntax too?) > | > | I would say it's ignored. Have the rule be something like > | number_string.replace('_',''). > | > | The only wrinkle is that currently '_1' is usable name, and that > | should probably be disallowed if the above change took place. > | > | I'm +1 on the idea. > > Personally I'm be for ignoring the _ also, save that I would forbid it > at the start or end, so no _1 or 1_. > > And I would permit it in hex code etc. > > I'm +0.5, myself. > As far as I remember, Ada also permits it, but has the rule that it can occur only between digits. If we follow that, then: 1_2345_6789 => Yes 1_2_3_4_5 => Yes 1_234_6789 => Yes 1_ => No _1 => No 1__234 => No 9.876_543_210 => Yes 9._876_543_210 => No 9_.876_543_210 => No 0xFEFF_0042 => Yes int('123_456') => Yes From bruce at leapyear.org Sat May 7 01:44:21 2011 From: bruce at leapyear.org (Bruce Leban) Date: Fri, 6 May 2011 16:44:21 -0700 Subject: [Python-ideas] 1_000_000 In-Reply-To: <4DC4872D.60004@mrabarnett.plus.com> References: <4DC478C6.3010801@stoneleaf.us> <20110506225138.GA2323@cskk.homeip.net> <4DC4872D.60004@mrabarnett.plus.com> Message-ID: I'm opposed to changing int so that int('123_456') ignores the _ as that will change the behavior of existing code and could break apps. Alternatively, if you want to change int how about int('123_456', separator='_') ignores the _. That would also admit int('123,456', separator=',') --- Bruce * * On Fri, May 6, 2011 at 4:41 PM, MRAB wrote: > On 06/05/2011 23:51, Cameron Simpson wrote: > >> On 06May2011 15:40, Ethan Furman wrote: >> | Bruce Leban wrote: >> |>Is _ just ignored in numbers or are there more complex rules? >> |> >> |> * 1_2345_6789 (can I use groups of other sizes instead?) >> |> * 1_2_3_4_5 (ditto) >> |> * 1_234_6789 (do all the groups need to be the same size?) >> |> * 1_ (must the _ only be in between 2 digits?) >> |> * 1__234 (what about multiple _s?) >> |> * 9.876_543_210 (can it be used to the right of the decimal >> point?) >> |> * 0xFEFF_0042 (can it be used in hex, octal or binary numbers?) >> |> * int('123_456') (do other functions accept this syntax too?) >> | >> | I would say it's ignored. Have the rule be something like >> | number_string.replace('_',''). >> | >> | The only wrinkle is that currently '_1' is usable name, and that >> | should probably be disallowed if the above change took place. >> | >> | I'm +1 on the idea. >> >> Personally I'm be for ignoring the _ also, save that I would forbid it >> at the start or end, so no _1 or 1_. >> >> And I would permit it in hex code etc. >> >> I'm +0.5, myself. >> >> As far as I remember, Ada also permits it, but has the rule that it can > occur only between digits. If we follow that, then: > > 1_2345_6789 => Yes > 1_2_3_4_5 => Yes > 1_234_6789 => Yes > 1_ => No > _1 => No > 1__234 => No > 9.876_543_210 => Yes > 9._876_543_210 => No > 9_.876_543_210 => No > 0xFEFF_0042 => Yes > int('123_456') => Yes > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ironfroggy at gmail.com Sat May 7 01:55:11 2011 From: ironfroggy at gmail.com (Calvin Spealman) Date: Fri, 6 May 2011 19:55:11 -0400 Subject: [Python-ideas] 1_000_000 In-Reply-To: References: <4DC478C6.3010801@stoneleaf.us> <20110506225138.GA2323@cskk.homeip.net> <4DC4872D.60004@mrabarnett.plus.com> Message-ID: On Fri, May 6, 2011 at 7:44 PM, Bruce Leban wrote: > I'm opposed to changing int so that int('123_456') ignores the _ as that > will change the behavior of existing code and could break apps. > Alternatively, if you want to change int how about int('123_456', > separator='_') ignores the _. That would also admit int('123,456', > separator=',') > --- Bruce > > > On Fri, May 6, 2011 at 4:41 PM, MRAB wrote: >> >> On 06/05/2011 23:51, Cameron Simpson wrote: >>> >>> On 06May2011 15:40, Ethan Furman ?wrote: >>> | Bruce Leban wrote: >>> |>Is _ just ignored in numbers or are there more complex rules? >>> |> >>> |> ? ? * 1_2345_6789 ?(can I use groups of other sizes instead?) >>> |> ? ? * 1_2_3_4_5 ?(ditto) >>> |> ? ? * 1_234_6789 ?(do all the groups need to be the same size?) >>> |> ? ? * 1_ ? (must the _ only be in between 2 digits?) >>> |> ? ? * 1__234 ? (what about multiple _s?) >>> |> ? ? * 9.876_543_210 ? (can it be used to the right of the decimal >>> point?) >>> |> ? ? * 0xFEFF_0042 ? (can it be used in hex, octal or binary numbers?) >>> |> ? ? * int('123_456') ? (do other functions accept this syntax too?) >>> | >>> | I would say it's ignored. ?Have the rule be something like >>> | number_string.replace('_',''). >>> | >>> | The only wrinkle is that currently '_1' is usable name, and that >>> | should probably be disallowed if the above change took place. >>> | >>> | I'm +1 on the idea. >>> >>> Personally I'm be for ignoring the _ also, save that I would forbid it >>> at the start or end, so no _1 or 1_. >>> >>> And I would permit it in hex code etc. >>> >>> I'm +0.5, myself. >>> >> As far as I remember, Ada also permits it, but has the rule that it can >> occur only between digits. If we follow that, then: >> >> ? ?1_2345_6789 => Yes >> ? ?1_2_3_4_5 => Yes >> ? ?1_234_6789 => Yes >> ? ?1_ => No >> ? ?_1 => No >> ? ?1__234 => No >> ? ?9.876_543_210 => Yes >> ? ?9._876_543_210 => No >> ? ?9_.876_543_210 => No >> ? ?0xFEFF_0042 => Yes >> ? ?int('123_456') => Yes >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > I am +0 on the whole idea, but +0.5 if is not an underscore, which I think is ugly. Would it conflict with any other syntax rules if numbers allowed a space separator? for i in range(1 111 111): foo(i) It looks cleaner and in a fixed-font should be just as obvious about separator placement. -- Read my blog! I depend on your acceptance of my opinion! I am interesting! 
http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From greg.ewing at canterbury.ac.nz Sat May 7 01:56:04 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 07 May 2011 11:56:04 +1200 Subject: [Python-ideas] 1_000_000 In-Reply-To: <4DC4351C.2000109@whoosh.ca> References: <4DC4351C.2000109@whoosh.ca> Message-ID: <4DC48A94.2030608@canterbury.ac.nz> Matt Chaput wrote: > Not sure if this has been proposed before: A syntax change to allow > underscores as thousands separators in literal numbers to improve > readability, It has, but it received a rather lukewarm response last time. An alternative would be to allow spaces. -- Greg From pjenvey at underboss.org Sat May 7 01:59:35 2011 From: pjenvey at underboss.org (Philip Jenvey) Date: Fri, 6 May 2011 16:59:35 -0700 Subject: [Python-ideas] 1_000_000 In-Reply-To: <4DC4872D.60004@mrabarnett.plus.com> References: <4DC478C6.3010801@stoneleaf.us> <20110506225138.GA2323@cskk.homeip.net> <4DC4872D.60004@mrabarnett.plus.com> Message-ID: On May 6, 2011, at 4:41 PM, MRAB wrote: > On 06/05/2011 23:51, Cameron Simpson wrote: >> On 06May2011 15:40, Ethan Furman wrote: >> | Bruce Leban wrote: >> |>Is _ just ignored in numbers or are there more complex rules? >> |> >> |> * 1_2345_6789 (can I use groups of other sizes instead?) >> |> * 1_2_3_4_5 (ditto) >> |> * 1_234_6789 (do all the groups need to be the same size?) >> |> * 1_ (must the _ only be in between 2 digits?) >> |> * 1__234 (what about multiple _s?) >> |> * 9.876_543_210 (can it be used to the right of the decimal point?) >> |> * 0xFEFF_0042 (can it be used in hex, octal or binary numbers?) >> |> * int('123_456') (do other functions accept this syntax too?) >> | >> | I would say it's ignored. Have the rule be something like >> | number_string.replace('_',''). >> | >> | The only wrinkle is that currently '_1' is usable name, and that >> | should probably be disallowed if the above change took place. >> | >> | I'm +1 on the idea. >> >> Personally I'm be for ignoring the _ also, save that I would forbid it >> at the start or end, so no _1 or 1_. >> >> And I would permit it in hex code etc. >> >> I'm +0.5, myself. >> > As far as I remember, Ada also permits it, but has the rule that it can > occur only between digits. If we follow that, then: > > 1_2345_6789 => Yes > 1_2_3_4_5 => Yes > 1_234_6789 => Yes > 1_ => No > _1 => No > 1__234 => No > 9.876_543_210 => Yes > 9._876_543_210 => No > 9_.876_543_210 => No > 0xFEFF_0042 => Yes > int('123_456') => Yes Java 7 also adds this feature. Its rules: You can place underscores only between digits; you cannot place underscores in the following places: ? At the beginning or end of a number ? Adjacent to a decimal point in a floating point literal ? Prior to an F or L suffix ? 
In positions where a string of digits is expected The following examples demonstrate valid and invalid underscore placements in numeric literals: float pi1 = 3_.1415F; // Invalid; cannot put underscores adjacent to a decimal point float pi2 = 3._1415F; // Invalid; cannot put underscores adjacent to a decimal point long socialSecurityNumber1 = 999_99_9999_L; // Invalid; cannot put underscores prior to an L suffix int x1 = _52; // This is an identifier, not a numeric literal int x2 = 5_2; // OK (decimal literal) int x3 = 52_; // Invalid; cannot put underscores at the end of a literal int x4 = 5_______2; // OK (decimal literal) int x5 = 0_x52; // Invalid; cannot put underscores in the 0x radix prefix int x6 = 0x_52; // Invalid; cannot put underscores at the beginning of a number int x7 = 0x5_2; // OK (hexadecimal literal) int x8 = 0x52_; // Invalid; cannot put underscores at the end of a number int x9 = 0_52; // OK (octal literal) int x10 = 05_2; // OK (octal literal) int x11 = 052_; // Invalid; cannot put underscores at the end of a number (From http://download.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html ) -- Philip Jenvey From dholth at gmail.com Sat May 7 02:16:21 2011 From: dholth at gmail.com (Daniel Holth) Date: Fri, 6 May 2011 20:16:21 -0400 Subject: [Python-ideas] AttributeError: __exit__ Message-ID: I just learned about Python internals from The ZODB transaction module. In Python < 2.7, the module works as a transaction manager. More or less: manager = Foo() __exit__ = manager.__exit__ __enter__ = manager.__enter__ After Python 2.7, it doesn't work. import transaction with transaction: pass >>> AttributeError: __exit__ It should be obvious to even the most casual observer that the exception is because, after Python 2.7, the with: statement has its own opcode that bypasses transaction.__getattribute__('__exit__') -> transaction.__dict__['__exit__']. Instead, CPython calls special_lookup(), looks for __exit__ on the module type, not the instance, doesn't find it, and raises the AttributeError. Instead, import sys sys.__exit__ >>> AttributeError: 'module' object has no attribute '__exit__' The interpreter should at least explain the AttributeError in the same way as it does when the user triggers it directly. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat May 7 02:38:05 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 6 May 2011 17:38:05 -0700 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu> <4DC45610.3040803@stoneleaf.us> Message-ID: The point is that the pkg should use __all__ to declare what submodules exist. That's what it was invented for! On May 6, 2011 3:05 PM, "Paul Moore" wrote: > On 6 May 2011 21:52, Eric Snow wrote: >> He's saying that the package would be imported like normal. Then all >> "public" sub-modules of the package would automatically imported and bound >> to the namespace of the object that resulted from the import of the package. > > There is no means of determining what submodules of a package exist. 
> Check PEP 302 for details - finders find modules ant they can do so > any way they like - there's nothing in the protocol to enumerate > subpackages, so you can't do it (if faced with a general PEP 302 > finder). > > Paul. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat May 7 02:41:33 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 6 May 2011 17:41:33 -0700 Subject: [Python-ideas] AttributeError: __exit__ In-Reply-To: References: Message-ID: Please file a bug. On May 6, 2011 5:17 PM, "Daniel Holth" wrote: > > I just learned about Python internals from The ZODB transaction module. In Python < 2.7, the module works as a transaction manager. More or less: > > manager = Foo() > __exit__ = manager.__exit__ > __enter__ = manager.__enter__ > > After Python 2.7, it doesn't work. > > import transaction > with transaction: pass > >>> AttributeError: __exit__ > > It should be obvious to even the most casual observer that the exception is because, after Python 2.7, the with: statement has its own opcode that bypasses transaction.__getattribute__('__exit__') -> transaction.__dict__['__exit__']. Instead, CPython calls special_lookup(), looks for __exit__ on the module type, not the instance, doesn't find it, and raises the AttributeError. > > Instead, > > import sys > sys.__exit__ > >>> AttributeError: 'module' object has no attribute '__exit__' > > The interpreter should at least explain the AttributeError in the same way as it does when the user triggers it directly. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben+python at benfinney.id.au Sat May 7 02:44:09 2011 From: ben+python at benfinney.id.au (Ben Finney) Date: Sat, 07 May 2011 10:44:09 +1000 Subject: [Python-ideas] 1 246 358 (was: 1_000_000) References: <4DC4351C.2000109@whoosh.ca> <4DC48A94.2030608@canterbury.ac.nz> Message-ID: <87pqnvmjae.fsf_-_@benfinney.id.au> Greg Ewing writes: > An alternative would be to allow spaces. I would prefer to allow space between digits in a numeric literal. 1 2345 6789 1 2 3 4 5 6789 1 234 6789 1 234 567 89 9.876 543 210 0xFEFF 0042 This nicely parallels the fact that space can separate chunks of a string literal. But that still leaves the following inconsistency: int('1 234 567') That will currently raise a ValueError. Should it continue to do so under this proposal? -- \ ?You say ?Carmina?, and I say ?Burana?, You say ?Fortuna?, and | `\ I say ?cantata?, Carmina, Burana, Fortuna, cantata, Let's Carl | _o__) the whole thing Orff.? ?anonymous | Ben Finney From guido at python.org Sat May 7 02:54:15 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 6 May 2011 17:54:15 -0700 Subject: [Python-ideas] 1 246 358 (was: 1_000_000) In-Reply-To: <87pqnvmjae.fsf_-_@benfinney.id.au> References: <4DC4351C.2000109@whoosh.ca> <4DC48A94.2030608@canterbury.ac.nz> <87pqnvmjae.fsf_-_@benfinney.id.au> Message-ID: Too ambiguous, too hard to parse. I like the _ proposal. On May 6, 2011 5:45 PM, "Ben Finney" wrote: > Greg Ewing writes: > >> An alternative would be to allow spaces. > > I would prefer to allow space between digits in a numeric literal. 
> > 1 2345 6789 > 1 2 3 4 5 6789 > 1 234 6789 > 1 234 567 89 > 9.876 543 210 > 0xFEFF 0042 > > This nicely parallels the fact that space can separate chunks of a > string literal. > > But that still leaves the following inconsistency: > > int('1 234 567') > > That will currently raise a ValueError. Should it continue to do so > under this proposal? > > -- > \ ?You say ?Carmina?, and I say ?Burana?, You say ?Fortuna?, and | > `\ I say ?cantata?, Carmina, Burana, Fortuna, cantata, Let's Carl | > _o__) the whole thing Orff.? ?anonymous | > Ben Finney > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Sat May 7 02:55:52 2011 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 07 May 2011 01:55:52 +0100 Subject: [Python-ideas] 1 246 358 In-Reply-To: <87pqnvmjae.fsf_-_@benfinney.id.au> References: <4DC4351C.2000109@whoosh.ca> <4DC48A94.2030608@canterbury.ac.nz> <87pqnvmjae.fsf_-_@benfinney.id.au> Message-ID: <4DC49898.4070900@mrabarnett.plus.com> On 07/05/2011 01:44, Ben Finney wrote: > Greg Ewing writes: > >> An alternative would be to allow spaces. > > I would prefer to allow space between digits in a numeric literal. > > 1 2345 6789 > 1 2 3 4 5 6789 > 1 234 6789 > 1 234 567 89 > 9.876 543 210 > 0xFEFF 0042 > > This nicely parallels the fact that space can separate chunks of a > string literal. > > But that still leaves the following inconsistency: > > int('1 234 567') > > That will currently raise a ValueError. Should it continue to do so > under this proposal? > I prefer there not to be whitespace inside tokens. String literals are an exception, they are explicitly delimited. From steve at pearwood.info Sat May 7 04:00:11 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 07 May 2011 12:00:11 +1000 Subject: [Python-ideas] 1_000_000 In-Reply-To: References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> Message-ID: <4DC4A7AB.8000803@pearwood.info> Bruce Leban wrote: > Consider these examples instead: > > - 1_234_000 > - 9.876_543_210 > - 0xFEFF_0042 > > I'm not advocating this change (nor against it); I just think the discussion > should be focused on the actual idea. I do have a question: > > Is _ just ignored in numbers or are there more complex rules? > > - 1_2345_6789 (can I use groups of other sizes instead?) > - 1_2_3_4_5 (ditto) > - 1_234_6789 (do all the groups need to be the same size?) +1 on all of these. I don't particularly like the look of _ as a number separator, but it's hard to think of any alternatives other than space, and some separator is better than long sequences of digits. I'm -0.5 on spaces even though it looks MUCH better, because it's too easy to leave the commas out in lists etc: L = [1, 2, 3, 4 5, 6, 7, 8, 9, 10] # oops, wanted 4 & 5 not 45 (Admittedly if the items where strings, the same failure mode applies.) > - 1_ (must the _ only be in between 2 digits?) > - 1__234 (what about multiple _s?) -1 on allowing either _1 or 1_ as numbers. -0 on allowing doubled underscores. > - 9.876_543_210 (can it be used to the right of the decimal point?) > - 0xFEFF_0042 (can it be used in hex, octal or binary numbers?) +1 on these two. > - int('123_456') (do other functions accept this syntax too?) That's a tricky one... I'd say No, but I'm not entirely sure. 
It's easy enough to say: int('123_456'.replace('_', '')) albeit a tad verbose. Also easy to say: int('123' '456') which is less verbose. And it will change the behaviour of the int function. So I don't think we need to support separators inside strings. We can always change our mind later and add it in, but it's much harder to take it out later. -- Steven From steve at pearwood.info Sat May 7 04:00:43 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 07 May 2011 12:00:43 +1000 Subject: [Python-ideas] 1_000_000 In-Reply-To: <4DC47DF0.1020001@stoneleaf.us> References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> <4DC478C6.3010801@stoneleaf.us> <4DC47DF0.1020001@stoneleaf.us> Message-ID: <4DC4A7CB.7030100@pearwood.info> Ethan Furman wrote: > I see it as a readability issue -- if you have 1_024 and _1025 (etc, > etc), where one is a number and the other a name, confusion can easily > result. I don't think there will be *that* much confusion though. _1025 can occur on the LHS of an assignment, 1_024 cannot. And we already distinguish between x1234 and 0x1234 without much confusion. -- Steven From guido at python.org Sat May 7 05:45:18 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 6 May 2011 20:45:18 -0700 Subject: [Python-ideas] 1_000_000 In-Reply-To: <4DC4A7AB.8000803@pearwood.info> References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> <4DC4A7AB.8000803@pearwood.info> Message-ID: On Fri, May 6, 2011 at 7:00 PM, Steven D'Aprano wrote: > Bruce Leban wrote: > >> Consider these examples instead: >> >> ? - 1_234_000 >> ? - 9.876_543_210 >> ? - 0xFEFF_0042 >> >> I'm not advocating this change (nor against it); I just think the >> discussion >> should be focused on the actual idea. I do have a question: >> >> Is _ just ignored in numbers or are there more complex rules? >> >> ? - 1_2345_6789 ?(can I use groups of other sizes instead?) >> ? - 1_2_3_4_5 ?(ditto) >> ? - 1_234_6789 ?(do all the groups need to be the same size?) > > +1 on all of these. I don't particularly like the look of _ as a number > separator, but it's hard to think of any alternatives other than space, and > some separator is better than long sequences of digits. > > I'm -0.5 on spaces even though it looks MUCH better, because it's too easy > to leave the commas out in lists etc: > > L = [1, 2, 3, 4 5, 6, 7, 8, 9, 10] ?# oops, wanted 4 & 5 not 45 > > (Admittedly if the items where strings, the same failure mode applies.) And it does sometimes bite. So let's not do more of that. (In retrospect 'xxx' + 'yyy' would have been good enough.) >> ? - 1_ ? (must the _ only be in between 2 digits?) >> ? - 1__234 ? (what about multiple _s?) > > -1 on allowing either _1 or 1_ as numbers. > > -0 on allowing doubled underscores. > > >> ? - 9.876_543_210 ? (can it be used to the right of the decimal point?) >> ? - 0xFEFF_0042 ? (can it be used in hex, octal or binary numbers?) > > +1 on these two. Steven channels me well so far. Fine points about _ in floats: IMO the _ should be allowed to appear between any two digits, or between the last digit and the 'e' in the exponent, or between the 'e' and a following digit. But not adjacent to the '.' or to the '+' or '-' in the exponent. So 3.141_593 yes, 3_.14 no. Fine points about _ in bin/oct/hex literals: 0x_dead_beef yes, 0_xdeadbeef no. 
(The overall rule seems to be that it must be internal to alphanumeric strings, except that leading 0x, 0o or 0b must not be separated -- somehow I find 0_x_dead_beef would be a disservice to human readers.)

>>    - int('123_456')   (do other functions accept this syntax too?)
>
> That's a tricky one... I'd say No, but I'm not entirely sure. It's easy
> enough to say:
>
> int('123_456'.replace('_', ''))
>
> albeit a tad verbose. Also easy to say:
>
> int('123' '456')
>
> which is less verbose.

But that's not how it'll be used. The argument will be provided by the user of the code.

> And it will change the behaviour of the int function.
> So I don't think we need to support separators inside strings.

I think it's fine, the same reason why we want to write 1_234_567 in code sometimes applies to input or command line arguments too, and I see little harm.

> We can always change our mind later and add it in, but it's much harder to
> take it out later.

It seems entirely harmless here. Also for float().

It would also be nice to have an easy way to emit _ in suitable places. Maybe this could be added to the .format() language for numbers? It would be nice if you could tell it to emit an _ every N positions.

--
--Guido van Rossum (python.org/~guido)

From cs at zip.com.au Sat May 7 06:29:11 2011
From: cs at zip.com.au (Cameron Simpson)
Date: Sat, 7 May 2011 14:29:11 +1000
Subject: [Python-ideas] 1_000_000
In-Reply-To:
References:
Message-ID: <20110507042911.GA14472@cskk.homeip.net>

On 06May2011 19:55, Calvin Spealman wrote:
| I am +0 on the whole idea, but +0.5 if is not an underscore, which I
| think is ugly.

I think the underscore is one of the better choices:
- it is very visible, unlike a dot or comma
- it is "low" or "flat", not intruding into the glyph space of the digits, leaving things easy to read
- it is already widely used (perl (sorry), Ada (where I first encountered it, now that someone else has mentioned it), etc.), i.e. it is a pre-existing idiom with successful use

| Would it conflict with any other syntax rules if
| numbers allowed a space separator?
|
| for i in range(1 111 111):
| foo(i)
|
| It looks cleaner and in a fixed-font should be just as obvious about
| separator placement.

I'm very -1 on this one. Like another recent proposal it takes a common typing error and turns it into legal syntax. Code that once would fail to compile because the author dropped a comma between values now runs, with silent breakage (the new stuff isn't even the wrong type!)

Cheers,
--
Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/

It's there as a sop to former Ada programmers. :-)
- Larry Wall regarding 10_000_000 in <11556 at jpl-devvax.JPL.NASA.GOV>

From cs at zip.com.au Sat May 7 06:30:09 2011
From: cs at zip.com.au (Cameron Simpson)
Date: Sat, 7 May 2011 14:30:09 +1000
Subject: [Python-ideas] 1_000_000
In-Reply-To: <4DC4872D.60004@mrabarnett.plus.com>
References: <4DC4872D.60004@mrabarnett.plus.com>
Message-ID: <20110507043009.GA15772@cskk.homeip.net>

On 07May2011 00:41, MRAB wrote:
| As far as I remember, Ada also permits it,

That's where I first encountered it myself.

| but has the rule that it can
| occur only between digits. If we follow that, then:
|
| 1_2345_6789 => Yes
| 1_2_3_4_5 => Yes
| 1_234_6789 => Yes
| 1_ => No
| _1 => No
| 1__234 => No
| 9.876_543_210 => Yes
| 9._876_543_210 => No
| 9_.876_543_210 => No
| 0xFEFF_0042 => Yes
| int('123_456') => Yes

+1 to this.
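For anyone who wants to poke at it, the between-digits-only rule is small enough to express as a regular expression. A rough sketch (decimal and float forms only; the helper name is invented for illustration -- this is a model of the proposed rule, not a patch to the tokenizer):

    import re

    # Underscores allowed only *between* digits: no leading, trailing or doubled _.
    _UNDERSCORED = re.compile(r"""
        ^[0-9](?:_?[0-9])*              # integer part
        (?:\.[0-9](?:_?[0-9])*)?        # optional fractional part, same rule
        $
    """, re.VERBOSE)

    def follows_rule(text):
        return _UNDERSCORED.match(text) is not None

    assert follows_rule("1_2345_6789")
    assert follows_rule("9.876_543_210")
    assert not follows_rule("1_")            # trailing underscore
    assert not follows_rule("_1")            # leading underscore
    assert not follows_rule("1__234")        # doubled underscore
    assert not follows_rule("9._876_543_210")

Hex literals (the 0xFEFF_0042 case) would need an extra branch, but the shape of the rule is the same.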
Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ It is impossible to travel faster than light, and certainly not desirable as ones hat keeps blowing off. - Woody Allen From ben+python at benfinney.id.au Sat May 7 07:03:42 2011 From: ben+python at benfinney.id.au (Ben Finney) Date: Sat, 07 May 2011 15:03:42 +1000 Subject: [Python-ideas] 1 246 358 References: <4DC4351C.2000109@whoosh.ca> <4DC48A94.2030608@canterbury.ac.nz> <87pqnvmjae.fsf_-_@benfinney.id.au> <4DC49898.4070900@mrabarnett.plus.com> Message-ID: <87hb97m79t.fsf@benfinney.id.au> MRAB writes: > On 07/05/2011 01:44, Ben Finney wrote: > > I would prefer to allow space between digits in a numeric literal. [?] > > This nicely parallels the fact that space can separate chunks of a > > string literal. > I prefer there not to be whitespace inside tokens. String literals are > an exception, they are explicitly delimited. That's a good justification for the special case. Okay, I withdraw my proposal. -- \ ?Facts are stubborn things; and whatever may be our wishes, our | `\ inclinations, or the dictates of our passion, they cannot alter | _o__) the state of facts and evidence.? ?John Adams, 1770-12-04 | Ben Finney From lac at openend.se Sat May 7 07:05:37 2011 From: lac at openend.se (Laura Creighton) Date: Sat, 07 May 2011 07:05:37 +0200 Subject: [Python-ideas] 1_000_000 In-Reply-To: Message from MRAB of "Sat, 07 May 2011 00:41:33 BST." <4DC4872D.60004@mrabarnett.plus.com> References: <4DC478C6.3010801@stoneleaf.us> <20110506225138.GA2323@cskk.homeip.net> <4DC4872D.60004@mrabarnett.plus.com> Message-ID: <201105070505.p4755b2E014146@theraft.openend.se> If you disallow variable names of the form _ you will break a huge amount of my automatically generated code. Admittedly, it wouldn't be hard to change things so that the generated variables are now X instead, but that happens to be the way I have written it now. Laura From greg.ewing at canterbury.ac.nz Sat May 7 09:29:47 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 07 May 2011 19:29:47 +1200 Subject: [Python-ideas] 1_000_000 In-Reply-To: <4DC47D2A.9090808@stoneleaf.us> References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> <4DC478C6.3010801@stoneleaf.us> <4DC47D2A.9090808@stoneleaf.us> Message-ID: <4DC4F4EB.9080007@canterbury.ac.nz> Ethan Furman wrote: > So you use _8127 style names for your objects* then? I can easily imagine a code generator producing names like that to reduce the chance of collision with a user's names. -- Greg From greg.ewing at canterbury.ac.nz Sat May 7 09:36:07 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 07 May 2011 19:36:07 +1200 Subject: [Python-ideas] 1_000_000 In-Reply-To: References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> <4DC478C6.3010801@stoneleaf.us> <4DC47D2A.9090808@stoneleaf.us> Message-ID: <4DC4F667.5000104@canterbury.ac.nz> Fred Drake wrote: > I understand the motivation for a thousands separator, at least (though > I'll admit, I don't find it compelling; *all* big numbers in code are > too magical). Bigness is a relative concept. Avogadro's number is fairly big in absolute terms, but you can hold that many molecules in your hand quite easily. Although writing it as 6_020_000_000_000_000_000_000_000_000 probably wouldn't be very helpful. 
-- Greg From greg.ewing at canterbury.ac.nz Sat May 7 09:41:43 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 07 May 2011 19:41:43 +1200 Subject: [Python-ideas] 1_000_000 In-Reply-To: <4DC47DF0.1020001@stoneleaf.us> References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> <4DC478C6.3010801@stoneleaf.us> <4DC47DF0.1020001@stoneleaf.us> Message-ID: <4DC4F7B7.3090204@canterbury.ac.nz> Ethan Furman wrote: > I see it as a readability issue -- if you have 1_024 and _1025 (etc, > etc), where one is a number and the other a name, confusion can easily > result. But probably not much worse than the confusion you can get today between 1234e6 and _1234e6, or O000001 and 0000001. There will always be ways of creating confusing-looking code if you put your mind to it. :-) -- Greg From greg.ewing at canterbury.ac.nz Sat May 7 09:46:57 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 07 May 2011 19:46:57 +1200 Subject: [Python-ideas] 1_000_000 In-Reply-To: References: <4DC478C6.3010801@stoneleaf.us> <20110506225138.GA2323@cskk.homeip.net> <4DC4872D.60004@mrabarnett.plus.com> Message-ID: <4DC4F8F1.4090904@canterbury.ac.nz> Bruce Leban wrote: > I'm opposed to changing int so that int('123_456') ignores the _ as that > will change the behavior of existing code and could break apps. But int('123_456', 0) should perhaps work? (On the grounds that it parses numbers using the same syntax as Python source.) -- Greg From greg.ewing at canterbury.ac.nz Sat May 7 09:51:35 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 07 May 2011 19:51:35 +1200 Subject: [Python-ideas] 1_000_000 In-Reply-To: References: <4DC478C6.3010801@stoneleaf.us> <20110506225138.GA2323@cskk.homeip.net> <4DC4872D.60004@mrabarnett.plus.com> Message-ID: <4DC4FA07.2000506@canterbury.ac.nz> Philip Jenvey wrote: > int x4 = 5_______2; // OK (decimal literal) Hmmm, that one looks really weird -- maybe it should be disallowed as well? -- Greg From steve at pearwood.info Sat May 7 10:18:22 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 07 May 2011 18:18:22 +1000 Subject: [Python-ideas] 1_000_000 In-Reply-To: <4DC4FA07.2000506@canterbury.ac.nz> References: <4DC478C6.3010801@stoneleaf.us> <20110506225138.GA2323@cskk.homeip.net> <4DC4872D.60004@mrabarnett.plus.com> <4DC4FA07.2000506@canterbury.ac.nz> Message-ID: <4DC5004E.30308@pearwood.info> Greg Ewing wrote: > Philip Jenvey wrote: > >> int x4 = 5_______2; // OK (decimal literal) > > Hmmm, that one looks really weird -- maybe it should be > disallowed as well? I don't think we need disallow it merely over an aesthetic judgement (although it does look weird *grins*). There is precedence with separators in collections: >>> t = (1,,,,2) File "", line 1 t = (1,,,,2) ^ SyntaxError: invalid syntax Like consecutive commas, consecutive underscores are likely to indicate a typo rather than a deliberate decision. So I'm +1 on strictly enforcing a single underscore between digits. 
-- Steven From greg.ewing at canterbury.ac.nz Sat May 7 10:27:14 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 07 May 2011 20:27:14 +1200 Subject: [Python-ideas] 1_000_000 In-Reply-To: <4DC5004E.30308@pearwood.info> References: <4DC478C6.3010801@stoneleaf.us> <20110506225138.GA2323@cskk.homeip.net> <4DC4872D.60004@mrabarnett.plus.com> <4DC4FA07.2000506@canterbury.ac.nz> <4DC5004E.30308@pearwood.info> Message-ID: <4DC50262.2080300@canterbury.ac.nz> Steven D'Aprano wrote: > Like consecutive commas, consecutive underscores are likely to indicate > a typo rather than a deliberate decision. Well, yes, that's really the rationale I had in mind. Although it would provide an amusingly funky way of introducing dividing line comments into your code: class A: ... ... ... 0____________________________________0 class B: ... ... ... You could even decorate it with scissors for a bit more panache: 0_____8<0_____8<0_____8<0_____8<0_____0 -- Greg From p.f.moore at gmail.com Sat May 7 10:58:56 2011 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 7 May 2011 09:58:56 +0100 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu> <4DC45610.3040803@stoneleaf.us> Message-ID: On 7 May 2011 01:38, Guido van Rossum wrote: > The point is that the pkg should use __all__ to declare what submodules > exist. That's what it was invented for! Hmm, OK. I missed that. But how would that work? p1/__init__.py: __all__ = ['p2', 'foo'] def foo(): print "p1.foo" p1/p2/__init__.py: __all__ = ['foo'] def foo(): print "p1.foo" If I import p1, p1.__all__ shows me that p2 and foo are public. p1.foo exists and I can tell it's not a module. p1.p2 doesn't exist in the p1 namespace at the moment, so how do I tell that I need to import it? Just assume all nonexistent names are subpackages, and import them? That doesn't seem like a very robust approach. A proof of concept in the form of a Python implementation (as a function) would help me understand, I guess. (But I still doubt that even if it's implementable, the feature is much practical use...) Paul. From dirkjan at ochtman.nl Sat May 7 14:16:48 2011 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sat, 7 May 2011 14:16:48 +0200 Subject: [Python-ideas] thoughts on regular expression improvements In-Reply-To: <818.1304713938@parc.com> References: <98999.1304709119@parc.com> <818.1304713938@parc.com> Message-ID: On Fri, May 6, 2011 at 22:32, Bill Janssen wrote: > Ah, you mean the PyPI "regex". ?Looks like it has "branch reset", which > might support my #1? ?Using the same group name multiple times? > > I don't see fuzzy matches, or support for composition, though. I might've been more specific: I think MRAB is working on regex as a playground for new regex-module things (and potentially a replacement for stdlib re), so it might be a good place to implement these kinds of things or discuss them. 
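In the meantime, the "composition" part can be approximated reasonably well with plain stdlib re by building patterns out of named fragments -- a rough sketch (the fragment names are made up, and it is obviously no substitute for real support in the engine):

    import re

    # Reusable pieces, composed by ordinary string substitution.
    fragments = {
        "year":  r"(?P<year>\d{4})",
        "month": r"(?P<month>0[1-9]|1[0-2])",
        "day":   r"(?P<day>0[1-9]|[12]\d|3[01])",
    }
    iso_date = re.compile(r"{year}-{month}-{day}".format(**fragments))

    m = iso_date.match("2011-05-07")
    assert m and m.group("month") == "05"

That only covers composition, though; reusing a group name and fuzzy matching really do need engine support.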
Cheers, Dirkjan From guido at python.org Sat May 7 16:41:55 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 7 May 2011 07:41:55 -0700 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu> <4DC45610.3040803@stoneleaf.us> Message-ID: On Sat, May 7, 2011 at 1:58 AM, Paul Moore wrote: > On 7 May 2011 01:38, Guido van Rossum wrote: >> The point is that the pkg should use __all__ to declare what submodules >> exist. That's what it was invented for! > > Hmm, OK. I missed that. But how would that work? > > p1/__init__.py: > > __all__ = ['p2', 'foo'] > def foo(): print "p1.foo" > > p1/p2/__init__.py: > > __all__ = ['foo'] > def foo(): print "p1.foo" > > If I import p1, p1.__all__ shows me that p2 and foo are public. p1.foo > exists and I can tell it's not a module. p1.p2 doesn't exist in the p1 > namespace at the moment, so how do I tell that I need to import it? > Just assume all nonexistent names are subpackages, and import them? > That doesn't seem like a very robust approach. Do whatever "from pkg import *" does today. Though the recursive application is new. I think (if we do this) it should be recursive. The implementation is straightforward, though the consequences may not be (think cyclic imports). > A proof of concept in the form of a Python implementation (as a > function) would help me understand, I guess. (But I still doubt that > even if it's implementable, the feature is much practical use...) It deviates from "import what you use" for sure. OTOH it is a better alternative to "from pkg import *" because it does not pollute the namespace. I believe Java users are used to this. -- --Guido van Rossum (python.org/~guido) From python at mrabarnett.plus.com Sat May 7 17:32:53 2011 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 07 May 2011 16:32:53 +0100 Subject: [Python-ideas] 1_000_000 In-Reply-To: <4DC4F8F1.4090904@canterbury.ac.nz> References: <4DC478C6.3010801@stoneleaf.us> <20110506225138.GA2323@cskk.homeip.net> <4DC4872D.60004@mrabarnett.plus.com> <4DC4F8F1.4090904@canterbury.ac.nz> Message-ID: <4DC56625.6010704@mrabarnett.plus.com> On 07/05/2011 08:46, Greg Ewing wrote: > Bruce Leban wrote: >> I'm opposed to changing int so that int('123_456') ignores the _ as >> that will change the behavior of existing code and could break apps. > > But int('123_456', 0) should perhaps work? (On the grounds that > it parses numbers using the same syntax as Python source.) > There's also the argument that if you forbid it then the programmer may have to write: int(string.replace("_", "")) in order to let the user include underscores, which would make it too permissive. If the user entered "_10", the above code would accept it. 
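A stricter wrapper is only a few lines, for what it's worth -- something along these lines (the function name is invented, and it is just a sketch of the idea, not a proposal for int() itself):

    import re

    def int_with_underscores(text):
        # Underscores accepted only between digit groups; everything else is rejected.
        if not re.match(r"^[+-]?\d+(?:_\d+)*$", text):
            raise ValueError("invalid integer literal: %r" % (text,))
        return int(text.replace("_", ""))

    assert int_with_underscores("123_456") == 123456
    assert int_with_underscores("-1_000") == -1000
    # int_with_underscores("_10") and int_with_underscores("1__0") raise ValueError,
    # which the bare replace()-then-int() approach would happily accept.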
From g.brandl at gmx.net Sat May 7 18:11:21 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 07 May 2011 18:11:21 +0200 Subject: [Python-ideas] 1_000_000 In-Reply-To: <4DC50262.2080300@canterbury.ac.nz> References: <4DC478C6.3010801@stoneleaf.us> <20110506225138.GA2323@cskk.homeip.net> <4DC4872D.60004@mrabarnett.plus.com> <4DC4FA07.2000506@canterbury.ac.nz> <4DC5004E.30308@pearwood.info> <4DC50262.2080300@canterbury.ac.nz> Message-ID: On 07.05.2011 10:27, Greg Ewing wrote: > Steven D'Aprano wrote: > >> Like consecutive commas, consecutive underscores are likely to indicate >> a typo rather than a deliberate decision. > > Well, yes, that's really the rationale I had in mind. > > Although it would provide an amusingly funky way of > introducing dividing line comments into your code: > > class A: > ... > ... > ... > > 0____________________________________0 +1__________________________________________________________0! Georg From g.brandl at gmx.net Sat May 7 18:12:06 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 07 May 2011 18:12:06 +0200 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu> References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> , <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu> Message-ID: On 06.05.2011 21:49, Brendan Moloney wrote: > dag.odenhall at gmail.com wrote: >> I like this idea, except it's inconsistent with from-import-star, the >> latter which does *not* get you sub-packages or modules. > > Georg Brandl [g.brandl at gmx.net] wrote: >> And that's for a reason: it's not easy (I think it's even impossible, >> because for example individual submodules can change __path__) to determine >> all importable submodules of a package. > >> So ``import pkg.*`` would not have any behavior other than ``import pkg``. > > When I said all _public_ sub-packages and modules I was referring to those > listed in the __all__ attribute of 'pkg'. Thus it would behave in the exact > same way as from-import-star except you don't pollute the current namespace. Right -- I forgot about __all__. Georg From dholth at gmail.com Sat May 7 19:15:07 2011 From: dholth at gmail.com (Daniel Holth) Date: Sat, 7 May 2011 13:15:07 -0400 Subject: [Python-ideas] AttributeError: __exit__ In-Reply-To: References: Message-ID: OK. I will reopen the related bug that was immediately closed with a suggestion to check with the python-ideas mailing list. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dholth at gmail.com Sat May 7 19:49:58 2011 From: dholth at gmail.com (Daniel Holth) Date: Sat, 7 May 2011 13:49:58 -0400 Subject: [Python-ideas] proposal: module-level __init__ Message-ID: __all__ is very useful when doing import *, which is frowned upon. As an alternative, allow modules to contain a function called __init__ that defines that module's exported symbols by way of the global statement. By importing modules that are used, but not intended to be exported, inside the __init__ function, programmers avoid cases such as the unintentional 'somemodule.sys' (referring to a module by its non-canonical name) that makes it harder to refactor larger projects. 
Before: __all__ = ['a', 'b'] import sys def a(): pass def b(): pass def c(): pass After: def __init__(): global a, b import sys def a(): pass def b(): pass def c(): pass __init__() -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Sat May 7 19:56:50 2011 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 07 May 2011 19:56:50 +0200 Subject: [Python-ideas] proposal: module-level __init__ In-Reply-To: References: Message-ID: <4DC587E2.3040301@egenix.com> Daniel Holth wrote: > __all__ is very useful when doing import *, which is frowned upon. As an > alternative, allow modules to contain a function called __init__ that > defines that module's exported symbols by way of the global statement. By > importing modules that are used, but not intended to be exported, inside the > __init__ function, programmers avoid cases such as the unintentional > 'somemodule.sys' (referring to a module by its non-canonical name) that > makes it harder to refactor larger projects. > > Before: > > __all__ = ['a', 'b'] > import sys > def a(): pass > def b(): pass > def c(): pass > > After: > > def __init__(): > global a, b > import sys > def a(): pass > def b(): pass > def c(): pass > > __init__() This is already possible and used in modules where you don't want to clutter up the global namespace. Where's the novelty ? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 07 2011) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2011-06-20: EuroPython 2011, Florence, Italy 44 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From fdrake at acm.org Sat May 7 21:26:22 2011 From: fdrake at acm.org (Fred Drake) Date: Sat, 7 May 2011 15:26:22 -0400 Subject: [Python-ideas] 1_000_000 In-Reply-To: <4DC50262.2080300@canterbury.ac.nz> References: <4DC478C6.3010801@stoneleaf.us> <20110506225138.GA2323@cskk.homeip.net> <4DC4872D.60004@mrabarnett.plus.com> <4DC4FA07.2000506@canterbury.ac.nz> <4DC5004E.30308@pearwood.info> <4DC50262.2080300@canterbury.ac.nz> Message-ID: On Sat, May 7, 2011 at 4:27 AM, Greg Ewing wrote: > You could even decorate it with scissors for a bit > more panache: > > 0_____8<0_____8<0_____8<0_____8<0_____0 Heh. Thanks for the swell tip, Martha Stewart! -Fred -- Fred L. Drake, Jr.? ? "Give me the luxuries of life and I will willingly do without the necessities." ?? --Frank Lloyd Wright From eric at trueblade.com Sat May 7 21:51:36 2011 From: eric at trueblade.com (Eric Smith) Date: Sat, 07 May 2011 15:51:36 -0400 Subject: [Python-ideas] 1_000_000 In-Reply-To: References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> <4DC4A7AB.8000803@pearwood.info> Message-ID: <4DC5A2C8.5080305@trueblade.com> On 05/06/2011 11:45 PM, Guido van Rossum wrote: > It would also be nice to have an easy way to emit _ in suitable > places. Maybe this could be added to the .format() language for > numbers? It would be nice if you could tell it to emit an _ every N > positions. We already support commas (PEP 378). Adding underscores in the same way would be easy. 
However, you can't specify N, it's always 3.

Eric.

From guido at python.org Sat May 7 23:06:12 2011
From: guido at python.org (Guido van Rossum)
Date: Sat, 7 May 2011 14:06:12 -0700
Subject: [Python-ideas] 1_000_000
In-Reply-To: <4DC5A2C8.5080305@trueblade.com>
References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> <4DC4A7AB.8000803@pearwood.info> <4DC5A2C8.5080305@trueblade.com>
Message-ID:

On Sat, May 7, 2011 at 12:51 PM, Eric Smith wrote:
> On 05/06/2011 11:45 PM, Guido van Rossum wrote:
>
>> It would also be nice to have an easy way to emit _ in suitable
>> places. Maybe this could be added to the .format() language for
>> numbers? It would be nice if you could tell it to emit an _ every N
>> positions.
>
> We already support commas (PEP 378). Adding underscores in the same way
> would be easy. However, you can't specify N, it's always 3.

Which would suck for non-decimal formats. :-( Also there seem to be some countries where the conventions for formatting currency use groupings other than 1000. E.g. http://www.ozgrid.com/forum/showthread.php?t=10226 (though specifying N wouldn't be enough there).

--
--Guido van Rossum (python.org/~guido)

From jeanpierreda at gmail.com Sun May 8 00:38:47 2011
From: jeanpierreda at gmail.com (Devin Jeanpierre)
Date: Sat, 7 May 2011 18:38:47 -0400
Subject: [Python-ideas] 1_000_000
In-Reply-To:
References: <4DC4351C.2000109@whoosh.ca> <20110506232407.2bd211a1@pitrou.net> <4DC4A7AB.8000803@pearwood.info> <4DC5A2C8.5080305@trueblade.com>
Message-ID:

>> On 05/06/2011 11:45 PM, Guido van Rossum wrote:
> Which would suck for non-decimal formats. :-( Also there seem to be
> some countries where the conventions for formatting currency use
> groupings other than 1000. E.g.
> http://www.ozgrid.com/forum/showthread.php?t=10226 (though specifying
> N wouldn't be enough there).

Wouldn't something like that be the job of locale.currency()?

Devin Jeanpierre

From jeanpierreda at gmail.com Sun May 8 02:57:29 2011
From: jeanpierreda at gmail.com (Devin Jeanpierre)
Date: Sat, 7 May 2011 20:57:29 -0400
Subject: [Python-ideas] Rename python.exe to python3.exe on Windows
Message-ID:

Hello,

On most *nix systems, Python 3.x is available as the python3 executable, and Python 2.x as the 'python' executable. This lets both exist side-by-side and be usable from the command-line. The alternative (used by Arch) is to name Python 2.x 'python2', and 3.x 'python'.

The Windows distribution of Python does neither: it names them both 'python.exe', meaning that you can't install and use both at once. Moreover, if you install Python 2.7 and then Python 3.2, the default handler for .py files is set to Python 3.2, and changing it to 2.7 is difficult because of a quirk in Explorer that forces you to choose between two non-distinguishable "python.exe"s. This is made much more difficult if in fact you installed five or so different Python versions. Also any automated tests using something like Cram that use python3 will not work, and any batch scripts that use python.exe will work differently depending on the host system. (It wouldn't be awful to get python-X.Y.exe executables, either).

The downside of this is that any code that tries to use C:\Python3Y\python.exe breaks. Such code is probably broken anyway, there are multiple Ys around, and Python can be installed in My Documents or wherever.

PEP 397 should relieve the issues with opening .py files, making some of this unnecessary with that change, as well.
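(As an aside: when you genuinely can't tell which of several identically named python.exe's a file association is invoking, the quickest diagnostic I know of is to have the script report it itself:

    import sys
    print(sys.version)
    print(sys.executable)   # full path of the interpreter that actually ran this file

That behaves the same on 2.x and 3.x, so it is safe to drop into any .py file while untangling associations.)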
I'm guessing that it would also be appropriate to rename pythonw.exe to python3w.exe. I doubt that particular change matters at all, it's solely to do with opening .pyw files, and that should be handled by PEP 397. I'd appreciate any thoughts or comments you might have. Thanks for your time, Devin Jeanpierre From ben+python at benfinney.id.au Sun May 8 03:21:52 2011 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 08 May 2011 11:21:52 +1000 Subject: [Python-ideas] Rename python.exe to python3.exe on Windows References: Message-ID: <87zkmykmvj.fsf@benfinney.id.au> Devin Jeanpierre writes: > On most *nix systems, Python 3.x is available as the python3 > executable, and Python 2.x as the 'python' executable. This lets both > exist side-by-side and be usable from the command-line. More importantly, it ensures that programs written for older Python 2.x will continue to run with the default ?python?. If the default ?python? were Python 3.x, programs expecting Python 2.x would most likely break due to backward incompatibility. So it's best if the ?python? program invokes only Python 2.x. -- \ ?To label any subject unsuitable for comedy is to admit | `\ defeat.? ?Peter Sellers | _o__) | Ben Finney From steve at pearwood.info Sun May 8 04:28:07 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 08 May 2011 12:28:07 +1000 Subject: [Python-ideas] Rename python.exe to python3.exe on Windows In-Reply-To: <87zkmykmvj.fsf@benfinney.id.au> References: <87zkmykmvj.fsf@benfinney.id.au> Message-ID: <4DC5FFB7.6050605@pearwood.info> Ben Finney wrote: > If the default ?python? were Python 3.x, programs expecting Python 2.x > would most likely break due to backward incompatibility. So it's best if > the ?python? program invokes only Python 2.x. The first sentence is true. The second is a value judgement, not a statement of fact, and the people behind Arch Linux disagree with you. http://www.archlinux.org/news/python-is-now-python-3/ I say, good on 'em. I wish I could find the quote somebody made about Arch being the distro that makes Gentoo seem cautious and conservative... something about Arch moving forward so the Gentoo folks know which mistakes not to make? -- Steven From stephen at xemacs.org Mon May 9 12:39:17 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 09 May 2011 19:39:17 +0900 Subject: [Python-ideas] Rename python.exe to python3.exe on Windows In-Reply-To: <4DC5FFB7.6050605@pearwood.info> References: <87zkmykmvj.fsf@benfinney.id.au> <4DC5FFB7.6050605@pearwood.info> Message-ID: <87pqnsdup6.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > I wish I could find the quote somebody made about Arch being the distro > that makes Gentoo seem cautious and conservative... something about Arch > moving forward so the Gentoo folks know which mistakes not to make? The only thing history teaches us is that nobody learns from others' history: $ python Python 3.1.3 (r313:86834, Feb 22 2011, 18:52:21) [GCC 4.3.5] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> $ There are a couple of ebuilds that break because of this. 
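(A lot of that breakage is scripts assuming 'python' means 2.x without ever checking. A guard at the top costs almost nothing -- a sketch:

    import sys
    if sys.version_info[0] != 2:
        sys.exit("this script expects Python 2.x; run it with python2")

It doesn't fix anything, but it makes the failure mode obvious instead of leaving users to puzzle over a traceback.)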
From ncoghlan at gmail.com Mon May 9 16:04:16 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 10 May 2011 00:04:16 +1000 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu> <4DC45610.3040803@stoneleaf.us> Message-ID: On Sat, May 7, 2011 at 6:52 AM, Eric Snow wrote: > If you have a list of the submodules you want imported then you can already > accomplish this: > import parent > for mod in parent.__all_submodules__: > ? ? __import__("parent.{}".format(mod)) > Of course, this does not bind the submodules to the namespace of the package > module It actually does, as binding the submodule name in the parent package namespace is part of the responsibility of __import__(): >>> import logging >>> logging.handlers Traceback (most recent call last): File "", line 1, in AttributeError: 'module' object has no attribute 'handlers' >>> __import__("logging.handlers") >>> logging.handlers This is one of the reasons circular imports are such a pain - we pre-bind them in sys.modules, and remove them again if the import fails, but we don't currently do that in the parent package namespace, so circular imports sometimes work and sometime break depending not only on which names are accessed but also *how* they're accessed (e.g. in a/b/c.py, "import a.b.c" will work, "import a.b.c; c = a.b.c" will fail with AttributeError and "from a.b import c" will fail with ImportError). >?I am not sure > of the specific import mechanism with regards to name binding, but that > would seem to be a conflict with the way imported names for submodules are > bound. Nope, it's basically the same as what happens automatically when the modules are imported normally. Indeed, as near as I can tell, this request amounts to asking for syntactic sugar that does something roughly along the lines of: def _subnames(pkg_name, subnames): for subname in subnames: yield ".".join(pkg_name, subname) def import_all(pkg): try: pkg_all = pkg.__all__ except AttributeError: pass else: names = list(_subnames(pkg.__name__, pkg_all)) for name in names: mod = importlib.import_module(name) try: mod_all = mod.__all__ except AttributeError: pass else: names.extend(_subnames(mod.__name__, mod_all) I can see a case being made to provide that as a function in pkgutil (or perhaps importlib itself), but I don't see any reason to give it dedicated syntax. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From grosser.meister.morti at gmx.net Mon May 9 18:43:13 2011 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Mon, 09 May 2011 18:43:13 +0200 Subject: [Python-ideas] Rename python.exe to python3.exe on Windows In-Reply-To: <87pqnsdup6.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87zkmykmvj.fsf@benfinney.id.au> <4DC5FFB7.6050605@pearwood.info> <87pqnsdup6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4DC819A1.3000508@gmx.net> I would say in every Python installation there should be a binary with the version number attached. I think in most (all?) Linux distributions this is already the case. E.g. there is python2.7 and python3.2. There is also python2, that links to some python2.x, and python3 that links to some python3.x, and then there is python, that links to any of the above. 
Under Linux/Mac OS X we already add a line like this to our scripts: #!/usr/bin/env python Or better: #!/usr/bin/env python3 I say it should be documented that the first is deprecated and the latter form shall be used. Using "#!/usr/bin/env python" should mean "this script is written so that it can be run in *any* python version", which is pretty unrealistic. "#!/usr/bin/env python3" should mean "this script is written so that it can be run in any python 3.x version" and so on. Of course there are scripts that do not use this right. They should be considered as broken and be fixed. (Maybe print deprecation warnings if possible?) Now on Windows there is no #! mechanism. I think it would be worthwhile to fix this and implement a python-dispatcher for Windows. This would then parse the #!-line, drop the "/usr/bin/env" part (if it exists) and lookup the right Python binary form a registry variable. I don't know if there are any registry variables set in a Windows Python installation that let you find the binary of a certain version, but I think it would be a good thing. This way correct scripts would just work under Unix (Linux, Mac, BSD) and Windows. And under Windows you would not have any problems with file type associations. *.py and *.pyw files just have to be associated with the dispatcher. It should not matter if the dispatcher is from a Python 2.x or Python 3.x installation. -panzi On 05/09/2011 12:39 PM, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > I wish I could find the quote somebody made about Arch being the distro > > that makes Gentoo seem cautious and conservative... something about Arch > > moving forward so the Gentoo folks know which mistakes not to make? > > The only thing history teaches us is that nobody learns from > others' history: > > $ python > Python 3.1.3 (r313:86834, Feb 22 2011, 18:52:21) > [GCC 4.3.5] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> > $ > > There are a couple of ebuilds that break because of this. From ncoghlan at gmail.com Mon May 9 18:55:32 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 10 May 2011 02:55:32 +1000 Subject: [Python-ideas] Rename python.exe to python3.exe on Windows In-Reply-To: <4DC819A1.3000508@gmx.net> References: <87zkmykmvj.fsf@benfinney.id.au> <4DC5FFB7.6050605@pearwood.info> <87pqnsdup6.fsf@uwakimon.sk.tsukuba.ac.jp> <4DC819A1.3000508@gmx.net> Message-ID: On Tue, May 10, 2011 at 2:43 AM, Mathias Panzenb?ck wrote: > Now on Windows there is no #! mechanism. I think it would be worthwhile to > fix this and implement a python-dispatcher for Windows. This would then > parse the #!-line, drop the "/usr/bin/env" part (if it exists) and lookup > the right Python binary form a registry variable. I don't know if there are > any registry variables set in a Windows Python installation that let you > find the binary of a certain version, but I think it would be a good thing. > > This way correct scripts would just work under Unix (Linux, Mac, BSD) and > Windows. And under Windows you would not have any problems with file type > associations. *.py and *.pyw files just have to be associated with the > dispatcher. It should not matter if the dispatcher is from a Python 2.x or > Python 3.x installation. Since this came up not all that long ago, I'll point people to PEP 394 (for the current draft recommendation regarding symlinks on *nix systems) and PEP 397 (for proposed Windows launcher semantics). Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From guido at python.org Mon May 9 19:02:59 2011 From: guido at python.org (Guido van Rossum) Date: Mon, 9 May 2011 10:02:59 -0700 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu> <4DC45610.3040803@stoneleaf.us> Message-ID: On Mon, May 9, 2011 at 7:04 AM, Nick Coghlan wrote: > This is one of the reasons circular imports are such a pain - we > pre-bind them in sys.modules, and remove them again if the import > fails, but we don't currently do that in the parent package namespace, > so circular imports sometimes work and sometime break depending not > only on which names are accessed but also *how* they're accessed (e.g. > in a/b/c.py, "import a.b.c" will work, "import a.b.c; c = a.b.c" will > fail with AttributeError and "from a.b import c" will fail with > ImportError). Maybe that's something we could strive to fix? > I can see a case being made to provide that as a function in pkgutil > (or perhaps importlib itself), but I don't see any reason to give it > dedicated syntax. +1 -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Mon May 9 19:16:52 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 10 May 2011 03:16:52 +1000 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu> <4DC45610.3040803@stoneleaf.us> Message-ID: On Tue, May 10, 2011 at 3:02 AM, Guido van Rossum wrote: > On Mon, May 9, 2011 at 7:04 AM, Nick Coghlan wrote: >> This is one of the reasons circular imports are such a pain - we >> pre-bind them in sys.modules, and remove them again if the import >> fails, but we don't currently do that in the parent package namespace, >> so circular imports sometimes work and sometime break depending not >> only on which names are accessed but also *how* they're accessed (e.g. >> in a/b/c.py, "import a.b.c" will work, "import a.b.c; c = a.b.c" will >> fail with AttributeError and "from a.b import c" will fail with >> ImportError). > > Maybe that's something we could strive to fix? The relevant bug is still open: http://bugs.python.org/issue992389 My recollection is that the division of responsibility between the core import code and PEP 302 loaders gets a little confused on this point (although I don't recall if that's a real confusion or just an artefact of the structure of the legacy import code). It will hopefully be a little easier to fix once importlib takes over from import.c and the pre-PEP 302 legacy stuff goes away. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From ericsnowcurrently at gmail.com Mon May 9 20:55:20 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 9 May 2011 12:55:20 -0600 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu> <4DC45610.3040803@stoneleaf.us> Message-ID: On Mon, May 9, 2011 at 8:04 AM, Nick Coghlan wrote: > On Sat, May 7, 2011 at 6:52 AM, Eric Snow > wrote: > > > If you have a list of the submodules you want imported then you can > already > > accomplish this: > > import parent > > for mod in parent.__all_submodules__: > > __import__("parent.{}".format(mod)) > > > Of course, this does not bind the submodules to the namespace of the > package > > module > > It actually does, as binding the submodule name in the parent package > namespace is part of the responsibility of __import__(): > > >>> import logging > >>> logging.handlers > Traceback (most recent call last): > File "", line 1, in > AttributeError: 'module' object has no attribute 'handlers' > >>> __import__("logging.handlers") > > >>> logging.handlers > > > Well, dang it. Not sure how I missed this before: $ python3 >>> import temp >>> dir(temp) ['__builtins__', '__cached__', '__doc__', '__file__', '__name__', '__package__', '__path__'] $ python3 >>> import temp.mod >>> dir(temp) ['__builtins__', '__cached__', '__doc__', '__file__', '__name__', '__package__', '__path__', 'mod'] So the sub-module name binding mechanism is simply to bind the package module and then bind the submodules to it. However, "import temp.mod as something_else" and "from temp import mod" don't do this, which makes sense. This is one of the reasons circular imports are such a pain - we > pre-bind them in sys.modules, and remove them again if the import > fails, but we don't currently do that in the parent package namespace, > so circular imports sometimes work and sometime break depending not > only on which names are accessed but also *how* they're accessed (e.g. > in a/b/c.py, "import a.b.c" will work, "import a.b.c; c = a.b.c" will > fail with AttributeError and "from a.b import c" will fail with > ImportError). > > > I am not sure > > of the specific import mechanism with regards to name binding, but that > > would seem to be a conflict with the way imported names for submodules > are > > bound. > > Nope, it's basically the same as what happens automatically when the > modules are imported normally. Indeed, as near as I can tell, this > request amounts to asking for syntactic sugar that does something > roughly along the lines of: > > def _subnames(pkg_name, subnames): > for subname in subnames: > yield ".".join(pkg_name, subname) > > def import_all(pkg): > try: > pkg_all = pkg.__all__ > except AttributeError: > pass > else: > names = list(_subnames(pkg.__name__, pkg_all)) > for name in names: > mod = importlib.import_module(name) > try: > mod_all = mod.__all__ > except AttributeError: > pass > else: > names.extend(_subnames(mod.__name__, mod_all) > > This works as long as __all__ only contains submodule names, right? > I can see a case being made to provide that as a function in pkgutil > (or perhaps importlib itself), but I don't see any reason to give it > dedicated syntax. > > +1 -eric > Cheers, > Nick. 
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwm at mired.org Tue May 10 16:47:54 2011 From: mwm at mired.org (Mike Meyer) Date: Tue, 10 May 2011 10:47:54 -0400 Subject: [Python-ideas] Minor tweak to PEP 8? Message-ID: <20110510104754.4689cc5e@bhuda.mired.org> PEP eight has an interesting omission in the "Code Layout" section. It doesn't say how to indent continuation lines when code is wrapped to comply with the line length limits. It has examples, but no textual guides. Which means you can do a rock-stupid word warp (with no indentation on the continuation lines), point at the resulting mess, and say "See? If we follow this part of the PEP, we get really ugly code!". Mail doing just that is what prompted this suggestion. I therefore propose adding a sentence or two to this section, something along the lines of: The continuation line(s) should be indented to reflect the structure of the statement being continued. This should be at least one space beyond the first open parenthesis that is not closed on the continued line, if present. Nothing hard and fast, just a requirement to use good sense and the minimal indent resulting from doing so. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From mikegraham at gmail.com Tue May 10 17:51:33 2011 From: mikegraham at gmail.com (Mike Graham) Date: Tue, 10 May 2011 11:51:33 -0400 Subject: [Python-ideas] Minor tweak to PEP 8? In-Reply-To: <20110510104754.4689cc5e@bhuda.mired.org> References: <20110510104754.4689cc5e@bhuda.mired.org> Message-ID: On Tue, May 10, 2011 at 10:47 AM, Mike Meyer wrote: > PEP eight has an interesting omission in the "Code Layout" section. It > doesn't say how to indent continuation lines when code is wrapped to > comply with the line length limits. It has examples, but no textual > guides. Which means you can do a rock-stupid word warp (with no > indentation on the continuation lines), point at the resulting mess, > and say "See? If we follow this part of the PEP, we get really ugly > code!". Mail doing just that is what prompted this suggestion. > > I therefore propose adding a sentence or two to this section, > something along the lines of: > > ? ?The continuation line(s) should be indented to reflect the > ? ?structure of the statement being continued. This should be at > ? ?least one space beyond the first open parenthesis that is not > ? ?closed on the continued line, if present. > > Nothing hard and fast, just a requirement to use good sense and the > minimal indent resulting from doing so. > > ? ? Message-ID: <87y62ejl2j.fsf@benfinney.id.au> Mike Graham writes: > For this actual rule, I am -1, as I think this is too limiting. And often results in hideous code :-) I'm ?1 also. Please don't make the indentation of continuation lines dependent on the content of the opening line. > Sometimes the indentation is too far and the best style is > > self.other_thing.some_long_method_name( > foo, > barMightBeSortOfLongNaturally, > baz........ I assume you meant a four-column (not three-column) additional indent. +1 if so, this matches the indentation style I advocate for continuation lines. -- \ ?I believe in making the world safe for our children, but not | `\ our children's children, because I don't think children should | _o__) be having sex.? 
?Jack Handey | Ben Finney From sklass at pointcircle.com Wed May 11 04:59:16 2011 From: sklass at pointcircle.com (Steven Klass) Date: Tue, 10 May 2011 19:59:16 -0700 Subject: [Python-ideas] Minor tweak to PEP 8? In-Reply-To: <87y62ejl2j.fsf@benfinney.id.au> References: <20110510104754.4689cc5e@bhuda.mired.org> <87y62ejl2j.fsf@benfinney.id.au> Message-ID: +1 for this method self.some_insane_long_method_which_should_have_originator_shot( True, None, keyword = foobar) :-) On Tue, May 10, 2011 at 2:35 PM, Ben Finney wrote: > Mike Graham writes: > > > For this actual rule, I am -1, as I think this is too limiting. > > And often results in hideous code :-) > > I'm ?1 also. Please don't make the indentation of continuation lines > dependent on the content of the opening line. > > > Sometimes the indentation is too far and the best style is > > > > self.other_thing.some_long_method_name( > > foo, > > barMightBeSortOfLongNaturally, > > baz........ > > I assume you meant a four-column (not three-column) additional indent. > > +1 if so, this matches the indentation style I advocate for continuation > lines. > > -- > \ ?I believe in making the world safe for our children, but not | > `\ our children's children, because I don't think children should | > _o__) be having sex.? ?Jack Handey | > Ben Finney > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Steven M. Klass ? 1 (480) 225-1112 ? sklass at pointcircle.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmjohnson.mailinglist at gmail.com Wed May 11 05:19:29 2011 From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson) Date: Tue, 10 May 2011 17:19:29 -1000 Subject: [Python-ideas] Minor tweak to PEP 8? In-Reply-To: References: <20110510104754.4689cc5e@bhuda.mired.org> <87y62ejl2j.fsf@benfinney.id.au> Message-ID: Can we all at least agree that continuation lines should always be at least one space more indented than the parent line? So, for example, this would be right out: for item in items: modified_item = self.frobincation_with_spengulizer( item, True, False, spam=None) The arguments should at least line up with the o in modified, if not the f. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben+python at benfinney.id.au Wed May 11 05:50:42 2011 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 11 May 2011 13:50:42 +1000 Subject: [Python-ideas] Minor tweak to PEP 8? References: <20110510104754.4689cc5e@bhuda.mired.org> <87y62ejl2j.fsf@benfinney.id.au> Message-ID: <87tyd1ki99.fsf@benfinney.id.au> "Carl M. Johnson" writes: > Can we all at least agree that continuation lines should always be at > least one space more indented than the parent line? At least one standard (four-column) indentation level further than the opening line. If you think you need to break that so your multi-line string will have the right content, think again: use the ?textwrap.dedent? function . -- \ ?Drop your trousers here for best results.? ?dry cleaner, | `\ Bangkok | _o__) | Ben Finney From mwm at mired.org Wed May 11 08:00:22 2011 From: mwm at mired.org (Mike Meyer) Date: Wed, 11 May 2011 02:00:22 -0400 Subject: [Python-ideas] Minor tweak to PEP 8? 
In-Reply-To: <87tyd1ki99.fsf@benfinney.id.au> References: <20110510104754.4689cc5e@bhuda.mired.org> <87y62ejl2j.fsf@benfinney.id.au> <87tyd1ki99.fsf@benfinney.id.au> Message-ID: <20110511020022.15d26754@bhuda.mired.org> On Wed, 11 May 2011 13:50:42 +1000 Ben Finney wrote: > "Carl M. Johnson" > writes: > > > Can we all at least agree that continuation lines should always be at > > least one space more indented than the parent line? > > At least one standard (four-column) indentation level further than the > opening line. Still overly strict. Consider: f(long_named_argument_one, calculated_value_two(with_arguments), another_argument) The two-space indent is perfectly reasonable here, as it aligns the first element (a function argument) with the same function's argument above it. In some cases, a similar one-space indent is also reasonable. I stand by my second proposal (reworded): Continuation lines should be indented to reflect the structure of the code. The indentation should either align with similar elements or match the surrounding source. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From ben+python at benfinney.id.au Wed May 11 10:05:12 2011 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 11 May 2011 18:05:12 +1000 Subject: [Python-ideas] Minor tweak to PEP 8? References: <20110510104754.4689cc5e@bhuda.mired.org> <87y62ejl2j.fsf@benfinney.id.au> <87tyd1ki99.fsf@benfinney.id.au> <20110511020022.15d26754@bhuda.mired.org> Message-ID: <87liydk6h3.fsf@benfinney.id.au> Mike Meyer writes: > On Wed, 11 May 2011 13:50:42 +1000 > Ben Finney wrote: > > At least one standard (four-column) indentation level further than the > > opening line. > > Still overly strict. Consider: > > f(long_named_argument_one, calculated_value_two(with_arguments), > another_argument) > > The two-space indent is perfectly reasonable here Maybe so; I'm not saying it's unreasonable. I'm saying it's *more* reasonable to not have the indentation level depend on the opening line. This generally involves breaking the opening line at a bracketing token, such as ?"""?, ?(?, ?[?, etc., as Carl's suggestion showed, so there's no parameter on that line for lining up. Also, that function needs to be renamed to something more descriptive :-) -- \ ?Kissing a smoker is like licking an ashtray.? ?anonymous | `\ | _o__) | Ben Finney From palla74 at gmail.com Wed May 11 11:10:54 2011 From: palla74 at gmail.com (Palla) Date: Wed, 11 May 2011 11:10:54 +0200 Subject: [Python-ideas] EuroPython: Early Bird will end in 2 days! Message-ID: Hi all, If you plan to attend, you could save quite a bit on registration fees! Buy your ticket now! http://ep2011.europython.eu/registration/ The end of Early bird is on May 12th, Friday, 23:59:59 CEST. We'd like to ask to you to forward this post to anyone that you feel may be interested. We have an amazing lineup of tutorials, events and talks. We have some excellent keynote speakers and a very complete partner program... but early bird registration ends in 2 days! Right now, you still get discounts on talks and tutorials so if you plan to attend Register Now: http://ep2011.europython.eu/registration/ While you are booking, remember to have a look at the partner program and our offer for a prepaid, data+voice+tethering SIM. 
All the best, -- ->PALLA From ncoghlan at gmail.com Wed May 11 14:21:35 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 11 May 2011 22:21:35 +1000 Subject: [Python-ideas] Allow 'import star' with namespaces In-Reply-To: References: <5E25C96030E66B44B9CFAA95D3DE5919351310A7AE@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7AF@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B2@EX-MB08.ohsu.edu> <5E25C96030E66B44B9CFAA95D3DE5919351310A7B4@EX-MB08.ohsu.edu> <4DC45610.3040803@stoneleaf.us> Message-ID: On Tue, May 10, 2011 at 4:55 AM, Eric Snow wrote: > So the sub-module name binding mechanism is simply to bind the package > module and then bind the submodules to it. ?However, "import temp.mod as > something_else" and "from temp import mod" don't do this, which makes sense. Not quite - both of the latter options change the name binding behaviour in the *current* module, but temp.mod will be set to the imported module regardless. It's part of the import process, whereas the namebinding in the current module happens later (the underlying complexity of all this is why importlib.import_module() was added to replace direct invocation of __import__(). The latter has quite a weird signature in order to support the various incarnations of the import statement). Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From p.f.moore at gmail.com Wed May 11 15:27:54 2011 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 11 May 2011 14:27:54 +0100 Subject: [Python-ideas] Minor tweak to PEP 8? In-Reply-To: References: <20110510104754.4689cc5e@bhuda.mired.org> <87y62ejl2j.fsf@benfinney.id.au> Message-ID: On 11 May 2011 04:19, Carl M. Johnson wrote: > Can we all at least agree that continuation lines should always be at least > one space more indented than the parent line? Like it or not, some_string = """\ Text to be used for something incredibly exciting!""" is not uncommon. I know about textwrap.dedent, but having to use a Python function call to code a literal has always made me uncomfortable. I'm not saying it's right or wrong, just that there are reasonable arguments why it might be reasonable. What's wrong with just saying that continuation lines should be formatted as appropriate to ensure readability, and leave it at that? I know people have various standards of readability, but I'm willing to assume that PEP 8 is targeted at people with some level of common sense (anyone who is arguing "letter of the law" over something daft like the example that started the thread is clearly trolling and could find loopholes in anything, so why bother trying to convince them?) Paul. From guido at python.org Wed May 11 16:23:33 2011 From: guido at python.org (Guido van Rossum) Date: Wed, 11 May 2011 07:23:33 -0700 Subject: [Python-ideas] Minor tweak to PEP 8? 
In-Reply-To: References: <20110510104754.4689cc5e@bhuda.mired.org> <87y62ejl2j.fsf@benfinney.id.au> Message-ID: At Google we use the following rule (from http://google-styleguide.googlecode.com/svn/trunk/pyguide.html#Indentation): Yes: # Aligned with opening delimiter foo = long_function_name(var_one, var_two, var_three, var_four) # 4-space hanging indent; nothing on first line foo = long_function_name( var_one, var_two, var_three, var_four) No: # Stuff on first line forbidden foo = long_function_name(var_one, var_two, var_three, var_four) # 2-space hanging indent forbidden foo = long_function_name( var_one, var_two, var_three, var_four) I propose we somehow incorporate these two allowed alternatives into PEP 8. They both serve a purpose. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed May 11 16:54:32 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 12 May 2011 00:54:32 +1000 Subject: [Python-ideas] Minor tweak to PEP 8? In-Reply-To: References: <20110510104754.4689cc5e@bhuda.mired.org> <87y62ejl2j.fsf@benfinney.id.au> Message-ID: <4DCAA328.3050105@pearwood.info> Paul Moore wrote: > What's wrong with just saying that continuation lines should be > formatted as appropriate to ensure readability, and leave it at that? +1 I think that specifying exactly how to indent continuation lines, or even whether or not to indent them, is way too controlling for my tastes. I don't believe it makes that much difference. Like the brace wars, if there actually was any objective, meaningful, consistent benefit of one style over the others, there would be no argument about it. Instead, it's all subjective, vague, and far from consistent. -- Steven From mwm at mired.org Wed May 11 18:24:38 2011 From: mwm at mired.org (Mike Meyer) Date: Wed, 11 May 2011 12:24:38 -0400 Subject: [Python-ideas] Minor tweak to PEP 8? In-Reply-To: <4DCAA328.3050105@pearwood.info> References: <20110510104754.4689cc5e@bhuda.mired.org> <87y62ejl2j.fsf@benfinney.id.au> <4DCAA328.3050105@pearwood.info> Message-ID: <20110511122438.2ad526b9@bhuda.mired.org> On Thu, 12 May 2011 00:54:32 +1000 Steven D'Aprano wrote: > Paul Moore wrote: > > What's wrong with just saying that continuation lines should be > > formatted as appropriate to ensure readability, and leave it at that? > +1 -1. This sentiment is adequately expressed by the "A Foolish Consistency ..." section. It shouldn't need repeating. > I think that specifying exactly how to indent continuation lines, or > even whether or not to indent them, is way too controlling for my > tastes. I don't believe it makes that much difference. Like the brace > wars, if there actually was any objective, meaningful, consistent > benefit of one style over the others, there would be no argument about > it. Instead, it's all subjective, vague, and far from consistent. If you don't believe it makes much different "whether or not to indent them", I suggest you align all continuation lines on the left hand side of the page in code you have to maintain and then report back to us. As for there being no benefit for one choice over another - that's true about almost everything in the PEP (four space indent instead of tabs? 80 character or 79 characters lines? spaces around = with exceptions? No spaces before/after "." and after open parens or before close parens? etc.). The goal is consistency. 
The important thing isn't so much what we choose as that we choose something so it'll be consistent when it doesn't make any difference. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From steve at pearwood.info Wed May 11 18:55:08 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 12 May 2011 02:55:08 +1000 Subject: [Python-ideas] Minor tweak to PEP 8? In-Reply-To: <20110511122438.2ad526b9@bhuda.mired.org> References: <20110510104754.4689cc5e@bhuda.mired.org> <87y62ejl2j.fsf@benfinney.id.au> <4DCAA328.3050105@pearwood.info> <20110511122438.2ad526b9@bhuda.mired.org> Message-ID: <4DCABF6C.1020000@pearwood.info> Mike Meyer wrote: >> I think that specifying exactly how to indent continuation lines, or >> even whether or not to indent them, is way too controlling for my >> tastes. I don't believe it makes that much difference. Like the brace >> wars, if there actually was any objective, meaningful, consistent >> benefit of one style over the others, there would be no argument about >> it. Instead, it's all subjective, vague, and far from consistent. > > If you don't believe it makes much different "whether or not to indent > them", I suggest you align all continuation lines on the left hand > side of the page in code you have to maintain and then report back to > us. In general, that would be an *outdent*, rather than not indenting. As a matter of fact, there is at least one situation where I don't indent continuation lines: if condition: do_something("some long piece of text, most likely" " but not always an error message, which uses implicit" " concatenation over multiple lines blah blah blah blah" % spam) # and it's perfectly maintainable, thanks for asking. The fact that I have bare strings (with a leading space) and/or a binary operator is more than enough clue that the lines form a block. Indenting would be superfluous, and counter-productive, as it would reduce the space available on each line. -- Steven From mat at matlehmann.de Wed May 11 18:44:04 2011 From: mat at matlehmann.de (Matthias Lehmann) Date: Wed, 11 May 2011 18:44:04 +0200 Subject: [Python-ideas] triple-quoted strings and indendation Message-ID: Hi all, two times in one day I read about the problems of triple-quoted strings and indendation (one time on stackoverflow, one time one this list). Python is well known for its readability and its use of idendation to this end. But with triple-quoted strings, nice indendation is not possible without the need to post-process the resulting string. Problem ======= Most often, the desired result of >>> some_string = """Hello ... World.""" is simply Hello World instead of Hello World. Idea ===== What about the idea, to use a string-flag to indicate, that the triple-quoted string is to be trimmed. Like: >>> some_string = t"""Hello ... World.""" This would blend in with the 'u' and 'r' flags that already exist. The triple-quoted string is trimmed to remove all whitespace up to the column where the first line of the string started OR all common whitespace of the subsequent lines, if the subsequent lines start on a column before the first line. The second rule makes it possible to also write: >>> some_string = t"""Hello ... World.""" Pros ===== The advantages above textwrap.dedent are: 1) textwrap.dedent only removes whitespace common to ALL lines, so to achieve the desired result, one has to add an additional newline >>> some_string = """ ... Hello ... 
World.""" >>> result = textwrap.dedent(some_string)[1:] 2) Also, it does not work if one actually does want some common whitespace before all lines: >>> some_string = """ ... Hello ... World.""" >>> result = textwrap.dedent(some_string)[1:] gives again Hello World which is not what I wanted. But >>> some_string = t""" Hello ... World.""" would give Hello World. 3) And finally, to quote a post from earlier today: "I know about textwrap.dedent, but having to use a Python function call to code a literal has always made me uncomfortable." Problems ========= Common indentation style for triple-quoted strings (as far as I know) is >>> foo = """blubber ... bla""" (align to first quote-char) but with this auto-trimming, it would look better to use >>> foo = t"""blubber ... bar""" (align to first char after triple quotes) The other style would still work, though - as long as one does not want to preserve leading whitespace. Maybe the t flag could also cause a leading and trailing newline to be removed, so that >>> foo = t""" ... Hello ... World. ... """ would also result in Hello World. Maybe something like this has been proposed before - please be kind if it is an old hat. Mat From phd at phdru.name Wed May 11 23:27:48 2011 From: phd at phdru.name (Oleg Broytman) Date: Thu, 12 May 2011 01:27:48 +0400 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: References: Message-ID: <20110511212748.GA20130@iskra.aviel.ru> On Wed, May 11, 2011 at 06:44:04PM +0200, Matthias Lehmann wrote: > What about the idea, to use a string-flag to indicate, that the > triple-quoted string is to be trimmed. Like: PEP 295 http://www.python.org/dev/peps/pep-0295/ was rejected in 2002. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From steve at pearwood.info Wed May 11 23:44:44 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 12 May 2011 07:44:44 +1000 Subject: [Python-ideas] Minor tweak to PEP 8? In-Reply-To: References: <20110510104754.4689cc5e@bhuda.mired.org> <87y62ejl2j.fsf@benfinney.id.au> Message-ID: <4DCB034C.8010508@pearwood.info> Guido van Rossum wrote: > At Google we use the following rule (from > http://google-styleguide.googlecode.com/svn/trunk/pyguide.html#Indentation): > > Yes: # Aligned with opening delimiter > foo = long_function_name(var_one, var_two, > var_three, var_four) I cringe whenever I see that. If people are going to bother lining things up other than at 4-space indents, they should at least line them up in a visually attractive place. The delimiter should surround the arguments, not line up with them: foo = long_function_name(var_one, var_two, var_three, var_four) although the effect may be spoiled if you're reading this in a non-monospaced font. This is analogous to the way that professional typesetters use hanging punctuation: http://desktoppub.about.com/od/typelayout/ss/hangingquotes.htm "Li Europan lingues es membres del sam familie. Lor separat existentie es un myth. Por scientie, musica, sport etc, litot Europa usa li sam vocabular. Li lingues differe solmen in li grammatica, li pronunciation e li plu commun vocabules." compared to: "Li Europan lingues es membres del sam familie. Lor separat existentie es un myth. Por scientie, musica, sport etc, litot Europa usa li sam vocabular. Li lingues differe solmen in li grammatica, li pronunciation e li plu commun vocabules."
On the other hand, there's a good argument for not spending the time to neatly line up blocks of code (other than at the usual multiples of four spaces), whether it is to the delimiter or not. It's the same argument against doing this: fee_fi_fo_fum = "something" # Align the equals foo = "something else" # and/or the hashes. When actively changing code lined up like that, you can easily spend more time aligning things than programming. I have a hard time reconciling the advice in PEP 8 against such alignments with the current suggestion. -- Steven From mal at egenix.com Wed May 11 23:50:43 2011 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 11 May 2011 23:50:43 +0200 Subject: [Python-ideas] Minor tweak to PEP 8? In-Reply-To: <4DCB034C.8010508@pearwood.info> References: <20110510104754.4689cc5e@bhuda.mired.org> <87y62ejl2j.fsf@benfinney.id.au> <4DCB034C.8010508@pearwood.info> Message-ID: <4DCB04B3.3000100@egenix.com> Steven D'Aprano wrote: > Guido van Rossum wrote: >> At Google we use the following rule (from >> http://google-styleguide.googlecode.com/svn/trunk/pyguide.html#Indentation): >> >> >> Yes: # Aligned with opening delimiter >> foo = long_function_name(var_one, var_two, >> var_three, var_four) > > I cringe whenever I see that. If people are going to bother lining > things up other than at 4-space indents, they should at least line them > up in a visually attractive place. The delimiter should surround the > arguments, not line up with them: > > foo = long_function_name(var_one, var_two, > var_three, var_four) See the link Guido posted: that's what they use. Looks like the MUA dropped a blank or there was a tab/space issue involved. Whitespace tends to be mysterious sometimes ;-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 11 2011) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2011-06-20: EuroPython 2011, Florence, Italy 40 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From tjreedy at udel.edu Thu May 12 00:17:40 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 11 May 2011 18:17:40 -0400 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: References: Message-ID: On 5/11/2011 12:44 PM, Matthias Lehmann wrote: > Hi all, > > two times in one day I read about the problems of triple-quoted strings > and indendation (one time on stackoverflow, one time one this list). > Python is well known for its readability and its use of idendation to > this end. But with triple-quoted strings, nice indendation is not > possible without the need to post-process the resulting string. Three partial solutions: 1. Strings are constants. Define them at the top of the module, in global scope. I remember seeing this promoted as a good coding practice once -- easy to find, modify, translate. text = '''\ LIne 1 linklnlsf 2 and finally, we are done. ''' I would consider this for strings, at least long strings, displayed to end-users, with mnemonic names. 2. For doc strings, especially for top level classes, do not worry. 
def whip_up(**args): '''Return some delicious munchies made from inputs. The keyword values should be edible and preferably yummy. Whip_up will do the best it can which what you give it. ''' Having help(whip_up) print Return some delicious munchies made from inputs. The keyword values should be edible and preferably yummy. Whip_up will do the best it can which what you give it. is not a problem. It might even be a virtue. 3. 'It is not necessarily so bad.' I have a test function with several tests that compares an expected string, given as a literal, to actual output captured with StringIO. Since this is a test_main in the file, run with __name__ == '__main__', I do not want to put the strings in the main part of the file (1 above). At first, the following bothered me. expected = '''\ Line 1 Line 2 ''' I like Python's indentation! But it does not bother me so much anymore. IDLE colors the literals green, so they can be semi-ignored. Having the full screen width available can be a plus. If I were using textwrap.dedent much, I might give it a short nickname like 'de' would be visible while I want see it but ignorable when I do not. If one wants a custom dedent rule, like the one you described, write a custom function. -- Terry Jan Reedy From greg.ewing at canterbury.ac.nz Thu May 12 00:22:27 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 12 May 2011 10:22:27 +1200 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: References: Message-ID: <4DCB0C23.9080709@canterbury.ac.nz> I have an idea of my own concerning multi-line strings. Many of the problems of triple-quoted strings stem from the fact that they're trying to be expressions that sit in-line with the rest of the code. As we've seen with all the attempts to fit multi-line function bodies into lambdas, that doesn't really work. So instead of a multi-line string *expression*, I think we need a *statement*. string adverisement: | Python Egg Incubator! | | Hatch your eggs in half the time. Get yours | today for only $39.99! -- Greg From guido at python.org Thu May 12 00:22:21 2011 From: guido at python.org (Guido van Rossum) Date: Wed, 11 May 2011 15:22:21 -0700 Subject: [Python-ideas] Minor tweak to PEP 8? In-Reply-To: <4DCB034C.8010508@pearwood.info> References: <20110510104754.4689cc5e@bhuda.mired.org> <87y62ejl2j.fsf@benfinney.id.au> <4DCB034C.8010508@pearwood.info> Message-ID: On Wed, May 11, 2011 at 2:44 PM, Steven D'Aprano wrote: > Guido van Rossum wrote: >> >> At Google we use the following rule (from >> >> http://google-styleguide.googlecode.com/svn/trunk/pyguide.html#Indentation): >> >> Yes: ?# Aligned with opening delimiter >> ? ? ? foo = long_function_name(var_one, var_two, >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? var_three, var_four) > > I cringe whenever I see that. If people are going to bother lining things up > other than at 4-space indents, they should at least line them up in a > visually attractive place. The delimiter should surround the arguments, not > line up with them: > > ? ? ? ?foo = long_function_name(var_one, var_two, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? var_three, var_four) > > although the effect may be spoiled if you're reading this in a > non-monospaced font. I used rich text in gmail and it looks aligned to me. Sorry if it doesn't for you; as MAL said, follow the link to see how it's supposed to look. > This is analogous to the way that professional > typesetters use handing punctuation: > > http://desktoppub.about.com/od/typelayout/ss/hangingquotes.htm > > ? 
"Li Europan lingues es membres del sam familie. Lor separat > ? ?existentie es un myth. Por scientie, musica, sport etc, litot > ? ?Europa usa li sam vocabular. Li lingues differe solmen in li > ? ?grammatica, li pronunciation e li plu commun vocabules." > > compared to: > > ? ?"Li Europan lingues es membres del sam familie. Lor separat > ? ?existentie es un myth. Por scientie, musica, sport etc, litot > ? ?Europa usa li sam vocabular. Li lingues differe solmen in li > ? ?grammatica, li pronunciation e li plu commun vocabules." > > > On the other hand, there's a good argument for not spending the time to > neatly line up blocks of code (other than at the usual multiples of four > spaces), whether it is to the delimiter or not. Emacs automatically does this for me. I spend zero time aligning code. > It's the same argument > against doing this: > > fee_fi_fo_fum = "something" ? ? ? # Align the equals > foo ? ? ? ? ? = "something else" ?# and/or the hashes. > > When actively changing code lined up like that, you can easily spend more > time aligning things than programming. > > I have a hard time reconciling the advice in PEP 8 against such alignments > with the current suggestion. Hardly; that is about spaces *between* tokens. This is about indentation. The amount of degradation in non-monospace fonts is quite different. Indentation still looks indented, just not aligned with [the first character inside] the open parenthesis, whereas internal spaces look completely jumbled. IF PEP 8 was still mine I would add this specific rule from the Google style guide. If people want to bikeshed it to death, go ahead, I will probably mute the thread. -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Thu May 12 00:40:45 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 12 May 2011 10:40:45 +1200 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: References: Message-ID: <4DCB106D.4010105@canterbury.ac.nz> Terry Reedy wrote: > If I were using textwrap.dedent much, I might give it a short nickname > like 'de' Wild idea: make the unary + operator on strings do textwrap.dedent() on them. -- Greg From bruce at leapyear.org Thu May 12 00:57:32 2011 From: bruce at leapyear.org (Bruce Leban) Date: Wed, 11 May 2011 15:57:32 -0700 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: <4DCB106D.4010105@canterbury.ac.nz> References: <4DCB106D.4010105@canterbury.ac.nz> Message-ID: On Wed, May 11, 2011 at 3:40 PM, Greg Ewing wrote: > > Wild idea: make the unary + operator on strings do > textwrap.dedent() on them. > > Wouldn't the unary - operator make more sense since it's removing spaces? But I would prefer that it use a slightly friendlier form of dedent: def dedent_for_literal(s): if s and s[0] == '\n': s = s[1:] if s and s[-1] == '\n': s = s[:-1] return textwrap.dedent(s) That said, is this such a wart on the language that it's worth changing? --- Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at cheimes.de Thu May 12 01:58:05 2011 From: lists at cheimes.de (Christian Heimes) Date: Thu, 12 May 2011 01:58:05 +0200 Subject: [Python-ideas] Threading hooks and disable gc per thread Message-ID: <4DCB228D.2010904@cheimes.de> Hello, today I've spent several hours debugging a segfault in JCC [1]. JCC is a framework to wrap Java code for Python. It's most prominently used in PyLucene [2]. 
You can read more about my debugging in [3] With JCC every Python thread must be registered at the JVM through JCC. An unattached thread that accesses a wrapped Java object leads to errors and may even cause a segfault. Accessing also includes garbage collection. A code line like a = {} or "a b c".split() can segfault since the allocation of a dict or a bound method runs through _PyObject_GC_New(), which may trigger a cyclic garbage collection run. If the current thread isn't attached to the JVM but triggers a gc.collect() with some Java objects in a cycle, the interpreter crashes. It's quite complicated and hard to "fix" third party tools to attach all threads created in the third party library. The issue could be solved with a simple on_thread_start hook in the threading module. However, there is more to it. In order to free memory, threads must also be detached from the JVM when they have ended. A second on_thread_stop hook isn't enough since the bound methods may also lead to a gc.collect() run after the thread is detached. I propose three changes to Python in order to fix the issue: on thread start hook -------------------- Similar to the atexit module, third party modules can register a callable with *args and **kwargs. The functions are called inside the newly created thread just before the target is called. The best place for the hook list is threading.Thread._bootstrap_inner() right before the try: self.run() except: block. Exceptions are ignored during the call but reported to the user at the end (same as atexit's atexit_callfunc()) on thread end hook ------------------ Same as on thread start hook but the callables are called inside the dying thread after self.run(). gc.disable_thread(), gc.enable_thread(), gc.isenabled_thread() -------------------------------------------------------------- Right now almost any code can trigger a gc.collect() run non-deterministically. Some applications like JCC want to control whether gc.collect() is wanted on a thread level. This could be solved with a new flag in PyThreadState. PyThreadState->gc_enabled is enabled by default. When the flag is false, _PyObject_GC_Malloc() doesn't start a gc.collect() run for that thread. The collection is delayed until another thread or the main thread triggers it. The three functions should also have a C equivalent so C code can prevent gc in a thread. Thoughts? Christian [1] http://lucene.apache.org/pylucene/jcc/index.html [2] http://lucene.apache.org/pylucene/ [3] http://mail-archives.apache.org/mod_mbox/lucene-pylucene-dev/201105.mbox/browser From ben+python at benfinney.id.au Thu May 12 04:41:02 2011 From: ben+python at benfinney.id.au (Ben Finney) Date: Thu, 12 May 2011 12:41:02 +1000 Subject: [Python-ideas] Minor tweak to PEP 8? References: <20110510104754.4689cc5e@bhuda.mired.org> <87y62ejl2j.fsf@benfinney.id.au> Message-ID: <878vuck5dt.fsf@benfinney.id.au> Guido van Rossum writes: > Yes: # Aligned with opening delimiter > foo = long_function_name(var_one, var_two, > var_three, var_four) This is needlessly dependent on the content of the opening line; if that changes, the rest needs to change. It begs for the indentation to get mis-aligned when other lines are edited. > # 4-space hanging indent; nothing on first line > foo = long_function_name( > var_one, var_two, var_three, > var_four) This one doesn't have the previous problem, which is why it's what I recommend. I would be happy to see the latter explicitly recommended in PEP 8.
If the price of that is to have the former also recommended, I'd grumble but it would be an improvement. > No: # Stuff on first line forbidden > foo = long_function_name(var_one, var_two, > var_three, var_four) > > # 2-space hanging indent forbidden > foo = long_function_name( > var_one, var_two, var_three, > var_four) I agree with pointing to both of these as bad examples. -- \ ?People demand freedom of speech to make up for the freedom of | `\ thought which they avoid.? ?Soren Aabye Kierkegaard (1813?1855) | _o__) | Ben Finney From mat at matlehmann.de Thu May 12 09:24:53 2011 From: mat at matlehmann.de (Matthias Lehmann) Date: Thu, 12 May 2011 09:24:53 +0200 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: <20110511212748.GA20130@iskra.aviel.ru> References: <20110511212748.GA20130@iskra.aviel.ru> Message-ID: > PEP 295 http://www.python.org/dev/peps/pep-0295/ was rejected in 2002. > > Oleg. Oh, thanks for the link, I was almost sure that something like that was proposed before - sorry I didn't thoroughly search the PEPs beforehand. I still think that indendation of triple-quoted strings is a wart of the language - a small one, but still a wart. But it's been discussed and rejected before - and probably with good reasons. Mat From mat at matlehmann.de Thu May 12 09:39:43 2011 From: mat at matlehmann.de (Matthias Lehmann) Date: Thu, 12 May 2011 09:39:43 +0200 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: <4DCB106D.4010105@canterbury.ac.nz> References: <4DCB106D.4010105@canterbury.ac.nz> Message-ID: > Wild idea: make the unary + operator on strings do > textwrap.dedent() on them. > The disadvantage compared to a string flag is, that this unary operator has no knowledge of the current indendation level within the code - so this solution looks similar in code x = +""" foo bar""" vs x = t""" foo bar""" the results is different, though. foo bar vs foo bar From phd at phdru.name Thu May 12 12:15:57 2011 From: phd at phdru.name (Oleg Broytman) Date: Thu, 12 May 2011 14:15:57 +0400 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: References: <20110511212748.GA20130@iskra.aviel.ru> Message-ID: <20110512101557.GA5286@iskra.aviel.ru> On Thu, May 12, 2011 at 09:24:53AM +0200, Matthias Lehmann wrote: > > PEP 295 http://www.python.org/dev/peps/pep-0295/ was rejected in 2002. > Oh, thanks for the link, I was almost sure that something like that > was proposed before - sorry I didn't thoroughly search the PEPs > beforehand. > > I still think that indendation of triple-quoted strings is a wart of > the language - a small one, but still a wart. But it's been > discussed and rejected before - and probably with good reasons. My opinion is: -- I don't think it's a wart; -- If it's a wart it's quite small; -- It's very easy to fix by calling dedent(); -- Fixing it by changing the language means to change the language for very little gain; changing the language must not be done lightly. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. 
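For concreteness, a minimal sketch of the dedent() idiom under discussion; it assumes nothing beyond the standard textwrap module, and the function name is only illustrative:

    import textwrap

    def banner():
        # The literal keeps the surrounding code's indentation;
        # dedent() strips the common leading whitespace afterwards,
        # and the backslash after the opening quotes avoids a leading
        # blank line.
        return textwrap.dedent("""\
            Hello
            World.
            """)

    print(banner())  # prints "Hello" and "World." with no leading spaces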
From p.f.moore at gmail.com Thu May 12 12:18:03 2011 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 12 May 2011 11:18:03 +0100 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: References: Message-ID: On 11 May 2011 17:44, Matthias Lehmann wrote: > 3) And finally to quote a post from earlier today > "I know about textwrap.dedent, but having to use a > Python function call to code a literal has always made me > uncomfortable." As the writer of that comment, I'd like to add a -1 to this proposal :-) My intent was to point out that I'm willing to have indentation oddities rather than use dedent. In my view, the problem isn't important enough to warrant extra syntax. Sorry, :-) Paul. From mat at matlehmann.de Thu May 12 12:32:49 2011 From: mat at matlehmann.de (Matthias Lehmann) Date: Thu, 12 May 2011 12:32:49 +0200 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: References: Message-ID: Am 12.05.2011 12:18, schrieb Paul Moore: > On 11 May 2011 17:44, Matthias Lehmann wrote: >> 3) And finally to quote a post from earlier today >> "I know about textwrap.dedent, but having to use a >> Python function call to code a literal has always made me >> uncomfortable." > > As the writer of that comment, I'd like to add a -1 to this proposal :-) > > My intent was to point out that I'm willing to have indentation > oddities rather than use dedent. In my view, the problem isn't > important enough to warrant extra syntax. > > Sorry, :-) > Paul. I didn't mean to misuse your comment - I hope this is not your perception. Thanks for your feedback. Mat From amaramrahul at users.sourceforge.net Thu May 12 15:44:00 2011 From: amaramrahul at users.sourceforge.net (Rahul Amaram) Date: Thu, 12 May 2011 19:14:00 +0530 Subject: [Python-ideas] Suggestion for Style Guide for Python Code PEP 8 Message-ID: <4DCBE420.20008@users.sourceforge.net> Hi, I was wondering if the following programming recommendation would be added to the Style Guide for Python Code (PEP 8) page. The preferred way for checking if a key (k) exists in a dictionary (d) is "if k in d". This is faster than "if k in d.keys()" and this has superseded "d.has_key(k)" Cheers, Rahul. From p.f.moore at gmail.com Thu May 12 15:54:07 2011 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 12 May 2011 14:54:07 +0100 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: References: Message-ID: On 12 May 2011 11:32, Matthias Lehmann wrote: > I didn't mean to misuse your comment - I hope this is not your perception. Not at all. I understood your message, just wanted to clarify the thinking behind my original statement. Your quote was entirely fair. Paul. From solipsis at pitrou.net Thu May 12 19:26:45 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 12 May 2011 19:26:45 +0200 Subject: [Python-ideas] PEP-3151 pattern-matching References: <4D9D792D.2020403@egenix.com> <4D9ECE64.7070402@egenix.com> Message-ID: <20110512192645.02de509b@pitrou.net> On Fri, 08 Apr 2011 10:59:16 +0200 "M.-A. Lemburg" wrote: > > > > I think EnvironmentError, WindowsError, VMSError, OSError, mmap.error > > and select.error should definitely all be merged with IOError, as they > > aren't used consistently enough to make handling them differently > > reliable even in current code. > > Their use may be inconsistent in a few places, but those cases > are still well-defined by the implementation, so code relying > on that well-defined behavior will break in subtle ways. 
Another quirk occurred to me today: select.error doesn't derive from EnvironmentError, and so it doesn't have the errno attribute (even though the select module "correctly" instantiates it with a (errno, message) tuple). Also, its str() is borked: >>> e = select.error(4, "interrupted") >>> str(e) "(4, 'interrupted')" >>> raise e Traceback (most recent call last): File "", line 1, in select.error: (4, 'interrupted') Regards Antoine. From mal at egenix.com Thu May 12 21:06:18 2011 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 12 May 2011 21:06:18 +0200 Subject: [Python-ideas] PEP-3151 pattern-matching In-Reply-To: <20110512192645.02de509b@pitrou.net> References: <4D9D792D.2020403@egenix.com> <4D9ECE64.7070402@egenix.com> <20110512192645.02de509b@pitrou.net> Message-ID: <4DCC2FAA.7010409@egenix.com> Antoine Pitrou wrote: > On Fri, 08 Apr 2011 10:59:16 +0200 > "M.-A. Lemburg" wrote: >>> >>> I think EnvironmentError, WindowsError, VMSError, OSError, mmap.error >>> and select.error should definitely all be merged with IOError, as they >>> aren't used consistently enough to make handling them differently >>> reliable even in current code. >> >> Their use may be inconsistent in a few places, but those cases >> are still well-defined by the implementation, so code relying >> on that well-defined behavior will break in subtle ways. > > Another quirk occurred to me today: select.error doesn't derive from > EnvironmentError, and so it doesn't have the errno attribute (even > though the select module "correctly" instantiates it with a (errno, > message) tuple). Also, its str() is borked: > >>>> e = select.error(4, "interrupted") >>>> str(e) > "(4, 'interrupted')" >>>> raise e > Traceback (most recent call last): > File "", line 1, in > select.error: (4, 'interrupted') Works fine in Python 2.7: >>> import select >>> e = select.error(4, "intr") >>> e error(4, 'intr') >>> try: ... raise e ... except select.error, x: ... code, text = x ... print code,text ... 4 intr >>> Note that existing code will not look for an attribute that doesn't exist :-) It'll unwrap the tuple and work from there or use the .args attribute to get at the constructor args. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 12 2011) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2011-06-20: EuroPython 2011, Florence, Italy 39 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Thu May 12 21:22:11 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 12 May 2011 21:22:11 +0200 Subject: [Python-ideas] PEP-3151 pattern-matching In-Reply-To: <4DCC2FAA.7010409@egenix.com> References: <4D9D792D.2020403@egenix.com> <4D9ECE64.7070402@egenix.com> <20110512192645.02de509b@pitrou.net> <4DCC2FAA.7010409@egenix.com> Message-ID: <1305228131.3548.4.camel@localhost.localdomain> Le jeudi 12 mai 2011 ? 21:06 +0200, M.-A. Lemburg a ?crit : > Note that existing code will not look for an attribute that > doesn't exist :-) True. 
My point is that not having "errno" makes it even more obscure how to check for different kinds of select errors. Also, given that other "environmental" errors will have an "errno" giving the POSIX error code, it's easy to get surprised. Regards Antoine. From g.brandl at gmx.net Thu May 12 22:12:47 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 12 May 2011 22:12:47 +0200 Subject: [Python-ideas] Suggestion for Style Guide for Python Code PEP 8 In-Reply-To: <4DCBE420.20008@users.sourceforge.net> References: <4DCBE420.20008@users.sourceforge.net> Message-ID: On 12.05.2011 15:44, Rahul Amaram wrote: > Hi, > I was wondering if the following programming recommendation would be > added to the Style Guide for Python Code (PEP 8) page. > > The preferred way for checking if a key (k) exists in a dictionary (d) > is "if k in d". This is faster than "if k in d.keys()" and this has > superseded "d.has_key(k)" While "k in d" is certainly the right way, this is not the sort of thing that should be added to PEP 8. There must be dozens of such little idioms and anti-idioms, and listing them all is way beyond the PEP's scope. (And has_key is gone in py3k anyway.) Georg From g.brandl at gmx.net Thu May 12 22:15:07 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 12 May 2011 22:15:07 +0200 Subject: [Python-ideas] PEP-3151 pattern-matching In-Reply-To: <4DCC2FAA.7010409@egenix.com> References: <4D9D792D.2020403@egenix.com> <4D9ECE64.7070402@egenix.com> <20110512192645.02de509b@pitrou.net> <4DCC2FAA.7010409@egenix.com> Message-ID: On 12.05.2011 21:06, M.-A. Lemburg wrote: > Antoine Pitrou wrote: >> On Fri, 08 Apr 2011 10:59:16 +0200 >> "M.-A. Lemburg" wrote: >>>> >>>> I think EnvironmentError, WindowsError, VMSError, OSError, mmap.error >>>> and select.error should definitely all be merged with IOError, as they >>>> aren't used consistently enough to make handling them differently >>>> reliable even in current code. >>> >>> Their use may be inconsistent in a few places, but those cases >>> are still well-defined by the implementation, so code relying >>> on that well-defined behavior will break in subtle ways. >> >> Another quirk occurred to me today: select.error doesn't derive from >> EnvironmentError, and so it doesn't have the errno attribute (even >> though the select module "correctly" instantiates it with a (errno, >> message) tuple). Also, its str() is borked: >> >>>>> e = select.error(4, "interrupted") >>>>> str(e) >> "(4, 'interrupted')" >>>>> raise e >> Traceback (most recent call last): >> File "", line 1, in >> select.error: (4, 'interrupted') > > Works fine in Python 2.7: > >>>> import select >>>> e = select.error(4, "intr") >>>> e > error(4, 'intr') Note that this is the repr(), while Antoine showed the str(). But the str() looks correct to me as well (for an exception that doesn't derive from EnvironmentError). Georg From solipsis at pitrou.net Thu May 12 23:17:29 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 12 May 2011 23:17:29 +0200 Subject: [Python-ideas] PEP-3151 pattern-matching References: <4D9D792D.2020403@egenix.com> <4D9ECE64.7070402@egenix.com> <20110512192645.02de509b@pitrou.net> <4DCC2FAA.7010409@egenix.com> Message-ID: <20110512231729.6b91b373@pitrou.net> On Thu, 12 May 2011 22:15:07 +0200 Georg Brandl wrote: > > But the str() looks correct to me as well (for an exception that doesn't > derive from EnvironmentError). It's technically correct, sure. 
The point is that "technically correct" translates to "humanly bogus" here, because of the broken I/O exception hierarchy. Regards Antoine. From g.brandl at gmx.net Fri May 13 07:07:56 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 13 May 2011 07:07:56 +0200 Subject: [Python-ideas] PEP-3151 pattern-matching In-Reply-To: <20110512231729.6b91b373@pitrou.net> References: <4D9D792D.2020403@egenix.com> <4D9ECE64.7070402@egenix.com> <20110512192645.02de509b@pitrou.net> <4DCC2FAA.7010409@egenix.com> <20110512231729.6b91b373@pitrou.net> Message-ID: On 12.05.2011 23:17, Antoine Pitrou wrote: > On Thu, 12 May 2011 22:15:07 +0200 > Georg Brandl wrote: >> >> But the str() looks correct to me as well (for an exception that doesn't >> derive from EnvironmentError). > > It's technically correct, sure. The point is that "technically correct" > translates to "humanly bogus" here, because of the broken I/O > exception hierarchy. Yep, and I'm all for fixing it with PEP 3151 :) Georg From clockworksaint at gmail.com Fri May 13 14:34:50 2011 From: clockworksaint at gmail.com (Weeble) Date: Fri, 13 May 2011 13:34:50 +0100 Subject: [Python-ideas] Suggestion for Style Guide for Python Code PEP 8 Message-ID: On 12.05.2011 15:44, Rahul Amaram wrote: > The preferred way for checking if a key (k) exists in a dictionary (d) > is "if k in d". This is faster than "if k in d.keys()" and this has > superseded "d.has_key(k)" While 'k in d' is the right way to do it, I feel the claim it's faster that 'k in d.keys()' is somewhat weak. While this is technically true, it's a constant overhead, not some cost linear in the size of the collection - at least in Python 3 - because .keys() returns a view. >>> timeit("'1234567' in d", "d=dict((str(x),x) for x in range(5000000))") 0.09317641210044993 >>> timeit("'1234567' in d.keys()", "d=dict((str(x),x) for x in range(5000000))") 0.1938305479460105 >>> timeit("'1234567' in dkeys", "dkeys=dict((str(x),x) for x in range(5000000)).keys()") 0.0903750153983367 So "x in d.keys()" is slower than "x in d", but only by the cost of a method lookup. I don't see any reason ever to recommend using "x in d.keys()", but I think it's misleading to say that this is because of performance reasons, assuming that we are talking about Python 3. (I also completely agree with everything Georg said, FWIW.) From stephen at xemacs.org Fri May 13 15:13:19 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 13 May 2011 22:13:19 +0900 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: References: <4DCB106D.4010105@canterbury.ac.nz> Message-ID: <87vcxed9qo.fsf@uwakimon.sk.tsukuba.ac.jp> Matthias Lehmann writes: > > Wild idea: make the unary + operator on strings do > > textwrap.dedent() on them. > > > The disadvantage compared to a string flag is, that this unary operator > has no knowledge of the current indendation level within the code But then your complaint is against text.dedent, not against Python syntax. (That's no reason you can't have 2 complaints, of course.) From mat at matlehmann.de Fri May 13 16:13:57 2011 From: mat at matlehmann.de (Matthias Lehmann) Date: Fri, 13 May 2011 16:13:57 +0200 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: <87vcxed9qo.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4DCB106D.4010105@canterbury.ac.nz> <87vcxed9qo.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Am 13.05.2011 15:13, schrieb Stephen J. 
Turnbull: > Matthias Lehmann writes: > > > Wild idea: make the unary + operator on strings do > > > textwrap.dedent() on them. > > > > > The disadvantage compared to a string flag is, that this unary operator > > has no knowledge of the current indendation level within the code > > But then your complaint is against text.dedent, not against Python > syntax. (That's no reason you can't have 2 complaints, of course.) Well, it's not the fault of textwrap.dedent, that is has no notion of the indendation-level of its argument. As far as I know, that is something, only the parser knows (not that I know anything about the Python parser). Mat From stephen at xemacs.org Fri May 13 18:14:29 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 14 May 2011 01:14:29 +0900 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: References: <4DCB106D.4010105@canterbury.ac.nz> <87vcxed9qo.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87r582d1cq.fsf@uwakimon.sk.tsukuba.ac.jp> Matthias Lehmann writes: > Am 13.05.2011 15:13, schrieb Stephen J. Turnbull: > > Matthias Lehmann writes: > > > > Wild idea: make the unary + operator on strings do > > > > textwrap.dedent() on them. > > > > > > > The disadvantage compared to a string flag is, that this unary operator > > > has no knowledge of the current indendation level within the code > > > > But then your complaint is against text.dedent, not against Python > > syntax. (That's no reason you can't have 2 complaints, of course.) > > Well, it's not the fault of textwrap.dedent, that is has no notion of > the indendation-level of its argument. As far as I know, that is > something, only the parser knows (not that I know anything about the > Python parser). Oh, I thought you were referring to the indentation within the string (on the first line), not where the string begins. Sorry! But I think there's real trouble here, because there are different styles of indentation, as we've seen. You'd have to enforce one for triple-quoted strings, but that's likely to conflict with many developers' ideas about the matter. That's really not something the parser should be doing .... From bruce at leapyear.org Fri May 13 19:17:01 2011 From: bruce at leapyear.org (Bruce Leban) Date: Fri, 13 May 2011 10:17:01 -0700 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: <87r582d1cq.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4DCB106D.4010105@canterbury.ac.nz> <87vcxed9qo.fsf@uwakimon.sk.tsukuba.ac.jp> <87r582d1cq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, May 13, 2011 at 9:14 AM, Stephen J. Turnbull wrote: > Matthias Lehmann writes: > > Well, it's not the fault of textwrap.dedent, that is has no notion of > > the indendation-level of its argument. As far as I know, that is > > something, only the parser knows (not that I know anything about the > > Python parser). > > Oh, I thought you were referring to the indentation within the string > (on the first line), not where the string begins. Sorry! > > But I think there's real trouble here, because there are different > styles of indentation, as we've seen. You'd have to enforce one for > triple-quoted strings, but that's likely to conflict with many > developers' ideas about the matter. That's really not something the > parser should be doing .... 
If this feature were to be added, we would surely want to ignore the indentation on the first line regardless of the previous line since it shouldn't depend on whether or not I use two or four space indents: fun_func(-""" multiple lines """) # ^^^^ don't want these spaces in my string but unless we force people to follow the convention that you must have a line break after the opening """ we would need to ignore indentation starting with the second line for people who use this style: fun_func(-"""foo bar more""") Now personally, I'd probably follow that first style but if this were a language feature I wouldn't think it should only work for one style. Here's pseudo-code: if s[0] == '\n': # style = first case above strip first character and strip indentation starting with first line else if s[0] == ' ': strip indentation starting with first line # style = """\ else: strip indentation starting with second line # style = second case above --- Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From mat at matlehmann.de Fri May 13 23:00:35 2011 From: mat at matlehmann.de (Matthias Lehmann) Date: Fri, 13 May 2011 23:00:35 +0200 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: References: <4DCB106D.4010105@canterbury.ac.nz> <87vcxed9qo.fsf@uwakimon.sk.tsukuba.ac.jp> <87r582d1cq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Am 13.05.2011 19:17, schrieb Bruce Leban: > If this feature were to be added, we would surely want to ignore the > indentation on the first line regardless of the previous line since it > shouldn't depend on whether or not I use two or four space indents: > > fun_func(-""" > multiple > lines > """) > # ^^^^ don't want these spaces in my string > > but unless we force people to follow the convention that you must have a > line break after the opening """ we would need to ignore indentation > starting with the second line for people who use this style: > > fun_func(-"""foo > bar > more""") > > Now personally, I'd probably follow that first style but if this were a > language feature I wouldn't think it should only work for one style. > Here's pseudo-code: > > if s[0] == '\n': # style = first case above > strip first character and strip indentation starting with first > line > else if s[0] == ' ': > strip indentation starting with first line # style = """\ > else: > strip indentation starting with second line # style = second > case above > The prototyped code for trimming of triple-quoted string as I proposed were: def trim(start_column, lines): """ start_column: start-column of first line of the triple-quoted string lines: the lines of the string """ n = start_column for line in lines[1:]: m = get_index_of_first_non_whitespace_char(line) n = min(n, m) result = [] if len(lines[0]) > 0: result.append(lines[1]) for line in lines[1:]: result.append(line[n:]) if len(lines[-1]) == 0: result = result[:-1] return '\n'.join(result) The crux is to have the start_column available to the function, everything else could be done just with a function. With this, following indendation styles are possible: func(t"""foo bar more""") func(t"""foo bar more""") func(t""" foo bar more """) All this would be possible with a function, too. 
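For instance, a rough self-contained version of such a function; the column-finding helper is spelled out here, the names are only illustrative, and this is a sketch of the idea rather than a final implementation:

    def first_nonspace_column(line):
        # Column of the first non-whitespace character, or None for
        # lines that are empty or whitespace-only.
        stripped = line.lstrip()
        return len(line) - len(stripped) if stripped else None

    def trim(start_column, text):
        lines = text.split('\n')
        # Trim at the start column of the literal, or further left if
        # some later line starts before that column.
        n = start_column
        for line in lines[1:]:
            col = first_nonspace_column(line)
            if col is not None:
                n = min(n, col)
        result = []
        if lines[0]:                  # keep a non-empty first line as-is
            result.append(lines[0])
        result.extend(line[n:] for line in lines[1:])
        if result and not result[-1].strip():
            result.pop()              # drop a trailing whitespace-only line
        return '\n'.join(result)

    # With the opening quotes at column 9, as in the first example above:
    print(trim(9, 'foo\n         bar\n         more'))
    # -> foo
    #    bar
    #    more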
The start_column is really only needed to support cases like this: func(t""" keep white space """) From rrr at ronadam.com Sat May 14 03:28:49 2011 From: rrr at ronadam.com (Ron Adam) Date: Fri, 13 May 2011 20:28:49 -0500 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: <4DCB0C23.9080709@canterbury.ac.nz> References: <4DCB0C23.9080709@canterbury.ac.nz> Message-ID: On 05/11/2011 05:22 PM, Greg Ewing wrote: > I have an idea of my own concerning multi-line strings. > > Many of the problems of triple-quoted strings stem from > the fact that they're trying to be expressions that > sit in-line with the rest of the code. As we've seen > with all the attempts to fit multi-line function bodies > into lambdas, that doesn't really work. > > So instead of a multi-line string *expression*, I think > we need a *statement*. > > string advertisement: > | Python Egg Incubator! > | > | Hatch your eggs in half the time. Get yours > | today for only $39.99! If in the above, '|' is used as the start of a line-terminated string, it would be a nicer way of typing... string advertisement: " Python Egg Incubator!\n" "\n" " Hatch your eggs in half the time. Get yours\n" " today for only $39.99!\n" I think that would only require a small patch to tokenize.c. It would result in a blank line being added to the end of the paragraph, but maybe that's not so bad. The hard parts are finding the best symbol, '|' is already used, and whether or not to try to handle raw and byte strings would be a concern as well. We don't want to allow quotes to go unterminated as that is usually an error that needs to be caught. Whether or not it's desirable to do this is another thing. ;-) Cheers, Ron From stephen at xemacs.org Sat May 14 03:50:48 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 14 May 2011 10:50:48 +0900 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: References: <4DCB106D.4010105@canterbury.ac.nz> <87vcxed9qo.fsf@uwakimon.sk.tsukuba.ac.jp> <87r582d1cq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87hb8ycao7.fsf@uwakimon.sk.tsukuba.ac.jp> Matthias Lehmann writes: > func(t""" > foo > bar > more > """) This style is possible without help from the parser, by taking the last line as a hint for the indent to trim. From amaramrahul at users.sourceforge.net Sat May 14 06:50:33 2011 From: amaramrahul at users.sourceforge.net (Rahul Amaram) Date: Sat, 14 May 2011 10:20:33 +0530 Subject: [Python-ideas] Suggestion for Style Guide for Python Code PEP 8 In-Reply-To: References: <4DCBE420.20008@users.sourceforge.net> Message-ID: <4DCE0A19.6000109@users.sourceforge.net> Thanks for the reply, Georg and Weeble. It would be nice if these kinds of minor programming guidelines were also included in some page, probably titled "Extended Python Guidelines" :). The reason is that novice programmers in Python who have worked in other languages tend to use the same style of coding as in those languages. So, for instance, to check for the existence of a key in a dictionary, it is extremely likely that they'd either look for a has_key method or get a list of all the keys and search in it. Anyway, as you said, there might be a lot of such small idioms in Python which may not make sense to cover in PEP 8, but if they are really the recommended way of doing an operation, then we probably should have them documented in one place. Regards, Rahul.
On Friday 13 May 2011 01:42 AM, Georg Brandl wrote: > On 12.05.2011 15:44, Rahul Amaram wrote: > >> Hi, >> I was wondering if the following programming recommendation would be >> added to the Style Guide for Python Code (PEP 8) page. >> >> The preferred way for checking if a key (k) exists in a dictionary (d) >> is "if k in d". This is faster than "if k in d.keys()" and this has >> superseded "d.has_key(k)" >> > While "k in d" is certainly the right way, this is not the sort of thing > that should be added to PEP 8. There must be dozens of such little > idioms and anti-idioms, and listing them all is way beyond the PEP's scope. > > (And has_key is gone in py3k anyway.) > > Georg > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From greg.ewing at canterbury.ac.nz Sat May 14 13:15:50 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 14 May 2011 23:15:50 +1200 Subject: [Python-ideas] triple-quoted strings and indendation In-Reply-To: References: <4DCB0C23.9080709@canterbury.ac.nz> Message-ID: <4DCE6466.1000403@canterbury.ac.nz> Ron Adam wrote: > > On 05/11/2011 05:22 PM, Greg Ewing wrote: > >> string adverisement: >> | Python Egg Incubator! >> | >> | Hatch your eggs in half the time. Get yours >> | today for only $39.99! > > I think that would only require a small patch to tokanize.c. It would > result in a blank line being added to the end of the paragraph, No, the idea is that a newline wouldn't be added to the last line. If you wanted that, you would have to add an empty line at the end: string foo: | This line ends with a newline. | > The hard parts are finding the best symbol, '|' is already used, In a different context, though. There shouldn't be any ambiguity. I'd much rather use '|' than anything else, because it makes such a nice vertical boundary line. -- Greg From g.brandl at gmx.net Sat May 14 16:17:11 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 14 May 2011 16:17:11 +0200 Subject: [Python-ideas] Suggestion for Style Guide for Python Code PEP 8 In-Reply-To: <4DCE0A19.6000109@users.sourceforge.net> References: <4DCBE420.20008@users.sourceforge.net> <4DCE0A19.6000109@users.sourceforge.net> Message-ID: On 14.05.2011 06:50, Rahul Amaram wrote: > Thanks for the reply George and Weeble. It would nice if these kind of > minor programming guidelines are also included in some page probably > titled "Extended Python Guidelines" :). The reason being novice > programmers in python who have worked in previous languages tend to use > the same style of coding as in other languages. So, for instance, to > check for the existence of a key in a dictionary, it is extremely likely > that they'd either look for a has_key method or get a list of all the > keys and search in it. Anyway, as you said, there might a lot of such > small idioms in python, which may not make sense to cover in PEP 8 but > if they are really the recommended way of doing the operation, then we > probably should have them documented in one place. I'd hope that simple things like "k in d" are already in every tutorial on Python that's worth anything... 
Georg From grosser.meister.morti at gmx.net Sat May 14 19:57:32 2011 From: grosser.meister.morti at gmx.net (=?UTF-8?B?TWF0aGlhcyBQYW56ZW5iw7Zjaw==?=) Date: Sat, 14 May 2011 19:57:32 +0200 Subject: [Python-ideas] a few decorator recipes In-Reply-To: References: <4DBAF64D.30500@gmx.net> Message-ID: <4DCEC28C.7040504@gmx.net> So there is a standard place to store such metadata. See how I use it here (scroll all the way down): https://bitbucket.org/panzi/functools_plus/src -panzi On 04/30/2011 09:17 PM, Benjamin Peterson wrote: > Mathias Panzenb?ck writes: >> >> def annotations(**annots): >> def deco(obj): >> if hasattr(obj,'__annotations__'): >> obj.__annotations__.update(annots) >> else: >> obj.__annotations__ = annots >> return obj >> return deco > > Why would you want to do that? > >> >> def setannot(obj, key, value): > > I don't see the point. > > From greg at krypto.org Sat May 14 20:51:37 2011 From: greg at krypto.org (Gregory P. Smith) Date: Sat, 14 May 2011 11:51:37 -0700 Subject: [Python-ideas] Minor tweak to PEP 8? In-Reply-To: References: <20110510104754.4689cc5e@bhuda.mired.org> <87y62ejl2j.fsf@benfinney.id.au> Message-ID: On Wed, May 11, 2011 at 7:23 AM, Guido van Rossum wrote: > At Google we use the following rule (from > http://google-styleguide.googlecode.com/svn/trunk/pyguide.html#Indentation): > > Yes:? # Aligned with opening delimiter > ? ? ? foo = long_function_name(var_one, var_two, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?var_three, var_four) > > ? ? ? # 4-space hanging indent; nothing on first line > ? ? ? foo = long_function_name( > ? ? ? ? ? var_one, var_two, var_three, > ? ? ? ? ? var_four) and note that this should be "8-space hanging indent" if it goes into pep8. The rule is really "double your code indentation hanging indent" so that you can never confuse the two visually. it works well. > > No: ? # Stuff on first line forbidden > ? ? ? foo = long_function_name(var_one, var_two, > ? ? ? ? ? var_three, var_four) > > ? ? ? # 2-space hanging indent forbidden > ? ? ? foo = long_function_name( > ? ? ? ? var_one, var_two, var_three, > ? ? ? ? var_four) > > I propose we somehow incorporate these two allowed alternatives into PEP 8. > They both serve a purpose. > > -- > --Guido van Rossum (python.org/~guido) > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > From guido at python.org Sat May 14 21:01:50 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 14 May 2011 12:01:50 -0700 Subject: [Python-ideas] Minor tweak to PEP 8? In-Reply-To: References: <20110510104754.4689cc5e@bhuda.mired.org> <87y62ejl2j.fsf@benfinney.id.au> Message-ID: Indeed. Somebody update PEP 8 please! On Sat, May 14, 2011 at 11:51 AM, Gregory P. Smith wrote: > On Wed, May 11, 2011 at 7:23 AM, Guido van Rossum wrote: >> At Google we use the following rule (from >> http://google-styleguide.googlecode.com/svn/trunk/pyguide.html#Indentation): >> >> Yes:? # Aligned with opening delimiter >> ? ? ? foo = long_function_name(var_one, var_two, >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?var_three, var_four) >> >> ? ? ? # 4-space hanging indent; nothing on first line >> ? ? ? foo = long_function_name( >> ? ? ? ? ? var_one, var_two, var_three, >> ? ? ? ? ? var_four) > > and note that this should be "8-space hanging indent" if it goes into > pep8. ?The rule is really "double your code indentation hanging > indent" so that you can never confuse the two visually. ?it works > well. > >> >> No: ? 
# Stuff on first line forbidden >> ? ? ? foo = long_function_name(var_one, var_two, >> ? ? ? ? ? var_three, var_four) >> >> ? ? ? # 2-space hanging indent forbidden >> ? ? ? foo = long_function_name( >> ? ? ? ? var_one, var_two, var_three, >> ? ? ? ? var_four) >> >> I propose we somehow incorporate these two allowed alternatives into PEP 8. >> They both serve a purpose. >> >> -- >> --Guido van Rossum (python.org/~guido) >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> >> > -- --Guido van Rossum (python.org/~guido) From greg at krypto.org Sat May 14 21:21:09 2011 From: greg at krypto.org (Gregory P. Smith) Date: Sat, 14 May 2011 12:21:09 -0700 Subject: [Python-ideas] Threading hooks and disable gc per thread In-Reply-To: <4DCB228D.2010904@cheimes.de> References: <4DCB228D.2010904@cheimes.de> Message-ID: On Wed, May 11, 2011 at 4:58 PM, Christian Heimes wrote: > Hello, > > today I've spent several hours debugging a segfault in JCC [1]. JCC is a > framework to wrap Java code for Python. It's most prominently used in > PyLucene [2]. You can read more about my debugging in [3] > > With JCC every Python thread must be registered at the JVM through JCC. > An unattached thread, that accesses a wrapped Java object, leads to > errors and may even cause a segfault. Accessing also includes garbage > collection. A code line like > > ? a = {} > > or > ? "a b c".split() > > can segfault since the allocation of a dict or a bound method runs > through _PyObject_GC_New(), which may trigger a cyclic garbage > collection run. If the current thread isn't attached to the JVM but > triggers a gc.collect() with some Java objects in a cycle, the > interpreter crashes. It's quite complicated and hard to "fix" third > party tools to attach all threads created in the third party library. > > The issue could be solved with a simple on_thread_start hook in the > threading module. However there is more to it. In order to free memory > threads must also be detached from the JVM, when a thread has ended. A > second on_thread_stop hook isn't enough since the bound methods may also > lead to a gc.collect() run after the thread is detached. > > I propose three changes to Python in order to fix the issue: > > on thread start hook > -------------------- > > Similar to the atexit module, third party modules can register a > callable with *args and **kwargs. The functions are called inside the > newly created thread just before the target is called. The best place > for the hook list is threading.Thread._bootstrap_inner() right before > the try: self.run() except: block. Exceptions are ignored during the > call but reported to the user at the end (same as atexit's > atexit_callfunc()) > > > on thread end hook > ------------------ > > Same as on thread start hook but the callables are called inside the > dying thread after self.run(). > Makes sense to me. Something that needs clarifying: when the process dies (main python thread has exited and all remaining python threads are daemon threads) the on thread end hook will _not_ be called. +1 This is really two separate feature requests. The above thread hooks and the below gc hooks. > gc.disable_thread(), gc.enable_thread(), gc.isenabled_thread() > -------------------------------------------------------------- > > Right now almost any code can trigger a gc.collect() run > non-deterministicly. 
Some application like JCC want to control if > gc.collect() is wanted on a thread level. This could be solved with a > new flat in PyThreadState. PyThreadState->gc_enabled is enabled by > default. When the flag is false, _PyObject_GC_Malloc() doesn't start a > gc.collect() run for that thread. The collection is delayed until > another thread or the main thread triggers it. > > The three functions should also have a C equivalent so C code can > prevent gc in a thread. This also sounds useful since we are a long long way from concurrent gc. (and whenever we gain that, we'd need a way to control when it can or can't happen or to register the gc threads with the anything that needs to know about 'em, JCC, etc..) +1 -gps From lists at cheimes.de Sun May 15 03:04:28 2011 From: lists at cheimes.de (Christian Heimes) Date: Sun, 15 May 2011 03:04:28 +0200 Subject: [Python-ideas] Threading hooks and disable gc per thread In-Reply-To: References: <4DCB228D.2010904@cheimes.de> Message-ID: <4DCF269C.20101@cheimes.de> Am 14.05.2011 21:21, schrieb Gregory P. Smith: > Makes sense to me. > > Something that needs clarifying: when the process dies (main python > thread has exited and all remaining python threads are daemon threads) > the on thread end hook will _not_ be called. Good catch! This gotcha should be mentioned in the docs. A daemon thread can end at any point in its life cycle. It's not an issue for my use case. For JCC the hook just frees some resources that are freed anyway when the process ends. Other use cases may need a more deterministic cleanup, but that's out of the scope for my proposal. Users can get around the issue with an atexit hook, though. > This also sounds useful since we are a long long way from concurrent > gc. (and whenever we gain that, we'd need a way to control when it > can or can't happen or to register the gc threads with the anything > that needs to know about 'em, JCC, etc..) I though of a concurrent GC, too. A dedicated GC thread could improve response time of a GUI or web application if we could separate the cyclic garbage detection into two steps. Even on a fast machine, a full GC sweep with millions of objects in gen2 can take a long time up to a second, in which the interpreter is locked. I assume that the scanning a million objects takes most of the time. If it would be possible to have a scan without the GIL held and then remove the objects in a second step with the GIL acquired, response time could increase. However that would require a major redesign of the traverse and visit slots. Back to my proposal. My initial proposal was missing one feature. It should be possible to alter the default setting for PyThreadState->gc_enabled, too. JCC could use the additional API to make sure, non attached threads don't run the GC. Example how JCC could use the feature: lucene.initVM() initializes the Java VM and attaches the current thread. This is usually done in the main thread before any other thread is started. The function would call PyThread_set_gc_enabled(0) to set the default value for new thread states and to prevent any new thread from starting a cyclic GC collect. lucene.getVM().attachCurrentThread() creates some thread local objects in a TLS and registers the current thread at the Java VM. This would run PyObject_GC_set_thread_enabled(1) to allow GC collect in the current thread. lucene.getVMEnv().detachCurrentThread() cleans up the TLS and unregisters the thread, so a PyObject_GC_set_thread_enabled(0) is required. 
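Roughly, the Python-level flow described above might look like the sketch below. gc.enable_thread() and gc.disable_thread() are the names proposed in this thread and do not exist today, and do_java_work() is just a stand-in for application code that touches wrapped Java objects:

import gc
import lucene

lucene.initVM()   # attaches the main thread and, under this proposal, flips the
                  # default so threads unknown to the JVM never trigger cyclic GC

def worker():
    env = lucene.getVMEnv()
    env.attachCurrentThread()
    gc.enable_thread()           # proposed API: this thread may trigger cyclic GC again
    try:
        do_java_work()           # stand-in for code using wrapped Java objects
    finally:
        gc.disable_thread()      # proposed API: stop triggering GC from this thread
        env.detachCurrentThread()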
The implementation is rather simple: - a new static int variable for the default setting and a new flag in the PyThreadState struct - check PyThreadState_Get()->gc_enabled in _PyObject_GC_Malloc() - four small functions to set and get the default and thread setting - three Python functions in the gc module to enable, disable and get the flag from the current PyThreadState - a function to get the global flag. I'm not sure if we should expose the global switch for Python code. The attached patch already has all C functionality. If I hear more +1, then I'll write two small PEPs for both feature requests. Christian -------------- next part -------------- A non-text attachment was scrubbed... Name: gc_thread.diff Type: text/x-patch Size: 3331 bytes Desc: not available URL: From ncoghlan at gmail.com Sun May 15 12:40:31 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 15 May 2011 20:40:31 +1000 Subject: [Python-ideas] Suggestion for Style Guide for Python Code PEP 8 In-Reply-To: References: <4DCBE420.20008@users.sourceforge.net> <4DCE0A19.6000109@users.sourceforge.net> Message-ID: On Sun, May 15, 2011 at 12:17 AM, Georg Brandl wrote: > I'd hope that simple things like "k in d" are already in every tutorial > on Python that's worth anything... In this particular case, the official docs are already quite explicit: """has_key() is deprecated in favor of key in d.""" http://docs.python.org/library/stdtypes.html#dict.has_key Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Sun May 15 13:13:56 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 15 May 2011 21:13:56 +1000 Subject: [Python-ideas] Threading hooks and disable gc per thread In-Reply-To: <4DCB228D.2010904@cheimes.de> References: <4DCB228D.2010904@cheimes.de> Message-ID: On Thu, May 12, 2011 at 9:58 AM, Christian Heimes wrote: > on thread start hook > -------------------- > > Similar to the atexit module, third party modules can register a > callable with *args and **kwargs. The functions are called inside the > newly created thread just before the target is called. The best place > for the hook list is threading.Thread._bootstrap_inner() right before > the try: self.run() except: block. Exceptions are ignored during the > call but reported to the user at the end (same as atexit's > atexit_callfunc()) > > > on thread end hook > ------------------ > > Same as on thread start hook but the callables are called inside the > dying thread after self.run(). So the plan is to have threading.Thread support the hooks, while _thread.start_new_thread and creation of thread states at the C level (including via PyGILState_Ensure) will bypass them? That actually sounds reasonable to me (+0), but the PEP should at least discuss the rationale for the choice of level for the new feature. I also suggest storing the associated hook lists at the threading.Thread class object level rather than at the threading module level (supporting such modularity of state being a major advantage of only providing this feature at the higher level). The PEP should also go into detail as to why having these hooks in a custom Thread subclass isn't sufficient (e.g. needing to support threads created by third party libraries, but note that such a rationale has a problem due to the _thread.start_new_thread loophole). 
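For concreteness, the simplest form of that subclass approach is something like the sketch below; none of these names exist in the stdlib, they are purely illustrative:

import threading

class HookedThread(threading.Thread):
    # class-level hook lists; run() executes in the new thread, so the hooks
    # are called inside that thread, as the proposal requires
    start_hooks = []
    end_hooks = []

    def run(self):
        for hook in type(self).start_hooks:
            hook()
        try:
            super().run()
        finally:
            for hook in type(self).end_hooks:
                hook()

# e.g. HookedThread.start_hooks.append(attach_to_jvm)   # hypothetical callables
#      HookedThread.end_hooks.append(detach_from_jvm)

A plain attribute lookup like type(self).start_hooks only ever finds the most derived list, which leads directly to the composability question below.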
Composability through inheritance should also be discussed - the hook invocation should probably walk the MRO so it is easy to create Thread subclasses that include class specific hooks without inadvertently skipping the hooks installed on threading.Thread. The possibility of passing exception information to thread_end hooks (ala __exit__ methods) should be considered, along with the general relationship between the threading hooks and the context management protocol. > gc.disable_thread(), gc.enable_thread(), gc.isenabled_thread() > -------------------------------------------------------------- > > Right now almost any code can trigger a gc.collect() run > non-deterministicly. Some application like JCC want to control if > gc.collect() is wanted on a thread level. This could be solved with a > new flat in PyThreadState. PyThreadState->gc_enabled is enabled by > default. When the flag is false, _PyObject_GC_Malloc() doesn't start a > gc.collect() run for that thread. The collection is delayed until > another thread or the main thread triggers it. > > The three functions should also have a C equivalent so C code can > prevent gc in a thread. The default setting for this should go in the interpreter state object rather than in a static variable (subinterpreters can then inherit the state of their parent interpreter when they are first created). Otherwise sounds reasonable. (+0) Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From dag.odenhall at gmail.com Tue May 17 16:14:34 2011 From: dag.odenhall at gmail.com (dag.odenhall at gmail.com) Date: Tue, 17 May 2011 16:14:34 +0200 Subject: [Python-ideas] PEP-3151 pattern-matching In-Reply-To: References: <4D9D792D.2020403@egenix.com> <4D9ECE64.7070402@egenix.com> <20110512192645.02de509b@pitrou.net> <4DCC2FAA.7010409@egenix.com> <20110512231729.6b91b373@pitrou.net> Message-ID: Excuse me if this has already been discussed, but couldn't __instancecheck__ be used to add exception types that match with more precision? From pyideas at rebertia.com Wed May 18 00:41:40 2011 From: pyideas at rebertia.com (Chris Rebert) Date: Tue, 17 May 2011 15:41:40 -0700 Subject: [Python-ideas] PEP-3151 pattern-matching In-Reply-To: References: <4D9D792D.2020403@egenix.com> <4D9ECE64.7070402@egenix.com> <20110512192645.02de509b@pitrou.net> <4DCC2FAA.7010409@egenix.com> <20110512231729.6b91b373@pitrou.net> Message-ID: On Tue, May 17, 2011 at 7:14 AM, dag.odenhall at gmail.com wrote: > Excuse me if this has already been discussed, but couldn't > __instancecheck__ be used to add exception types that match with more > precision? Somewhat related bug: http://bugs.python.org/issue12029 Cheers, Chris From dag.odenhall at gmail.com Wed May 18 12:07:28 2011 From: dag.odenhall at gmail.com (dag.odenhall at gmail.com) Date: Wed, 18 May 2011 12:07:28 +0200 Subject: [Python-ideas] PEP-3151 pattern-matching In-Reply-To: References: <4D9D792D.2020403@egenix.com> <4D9ECE64.7070402@egenix.com> <20110512192645.02de509b@pitrou.net> <4DCC2FAA.7010409@egenix.com> <20110512231729.6b91b373@pitrou.net> Message-ID: On 18 May 2011 00:41, Chris Rebert wrote: > On Tue, May 17, 2011 at 7:14 AM, dag.odenhall at gmail.com > wrote: >> Excuse me if this has already been discussed, but couldn't >> __instancecheck__ be used to add exception types that match with more >> precision? > > Somewhat related bug: > http://bugs.python.org/issue12029 Interesting. If that is intentional I'd advocate against it unless there's a strong argument for it. 
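To make the __instancecheck__ idea concrete, it would look roughly like the sketch below -- noting that, per the issue linked above, the except clause currently matches on the concrete exception type and ignores __instancecheck__, which is exactly the behaviour in question:

import errno

class BrokenPipeMeta(type):
    def __instancecheck__(cls, exc):
        # "an IOError whose errno is EPIPE (32)"
        return isinstance(exc, IOError) and exc.errno == errno.EPIPE

class BrokenPipe(IOError, metaclass=BrokenPipeMeta):
    pass

# isinstance(IOError(errno.EPIPE, 'broken pipe'), BrokenPipe) is True, but
# "except BrokenPipe:" does not catch such an error today, because exception
# matching bypasses __instancecheck__ (see the bug report above).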
Another idea (also likely already proposed) would be to match against instances as well, by the 'args' attribute: try: ... except IOError(32): # isinstance IOError and .args == (32,) ... If this seems crazy consider that it's (to some extent) similar to the behavior of 'raise'. From jeanpierreda at gmail.com Wed May 18 14:24:04 2011 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Wed, 18 May 2011 08:24:04 -0400 Subject: [Python-ideas] PEP-3151 pattern-matching In-Reply-To: References: <4D9D792D.2020403@egenix.com> <4D9ECE64.7070402@egenix.com> <20110512192645.02de509b@pitrou.net> <4DCC2FAA.7010409@egenix.com> <20110512231729.6b91b373@pitrou.net> Message-ID: On Wed, May 18, 2011 at 6:07 AM, dag.odenhall at gmail.com wrote: > Interesting. If that is intentional I'd advocate against it unless > there's a strong argument for it. > > Another idea (also likely already proposed) would be to match against > instances as well, by the 'args' attribute: > > try: > ? ?... > except IOError(32): ?# isinstance IOError and .args == (32,) > ? ?... > > If this seems crazy consider that it's (to some extent) similar to the > behavior of 'raise'. Unfortunately, as described it wouldn't match IOError(32, 'Blah blah blah'). Although maybe it makes sense to create an Anything builtin, which is equal to everything, such that IOError(X, Y) == IOError(X, Anything) == IOError(X, Z) for all X, Y, and Z (except stupid X like X = nan). I do like it. Devin Jeanpierre From jeanpierreda at gmail.com Wed May 18 14:46:29 2011 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Wed, 18 May 2011 08:46:29 -0400 Subject: [Python-ideas] Rename python.exe to python3.exe on Windows In-Reply-To: <4DC5FFB7.6050605@pearwood.info> References: <87zkmykmvj.fsf@benfinney.id.au> <4DC5FFB7.6050605@pearwood.info> Message-ID: I think going the Arch route might be unrealistic, because there will be no Python 2.8, and thus no chance to rename python.exe to python2.exe. Arch can do it because they have their own distribution of Python, Microsoft Windows does not. All I can think of is having Python 3 installers also install symlinks or batch scripts for Python 2 installations. This could break because you might install a Python 2 installation after Python 3 (and in an unexpected place, if the installer tries to predict things). If 3.3 has its executable renamed, the worst situation is that python refers to 3.1 or 3.2, and python3 to 3.3. This could be resolved by removing 3.2 or 3.1 from the PATH (and if you still wanted to access them, adding python31.exe and python32.exe symlinks somewhere on the PATH). This situation is increasingly unlikely to occur as time goes on and fewer people put 3.2 or 3.1 on the PATH at all. Devin Jeanpierre On Sat, May 7, 2011 at 10:28 PM, Steven D'Aprano wrote: > Ben Finney wrote: > >> If the default ?python? were Python 3.x, programs expecting Python 2.x >> would most likely break due to backward incompatibility. So it's best if >> the ?python? program invokes only Python 2.x. > > The first sentence is true. The second is a value judgement, not a statement > of fact, and the people behind Arch Linux disagree with you. > > http://www.archlinux.org/news/python-is-now-python-3/ > > I say, good on 'em. > > I wish I could find the quote somebody made about Arch being the distro that > makes Gentoo seem cautious and conservative... something about Arch moving > forward so the Gentoo folks know which mistakes not to make? 
> > > > -- > Steven > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From ethan at stoneleaf.us Wed May 18 22:10:15 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 18 May 2011 13:10:15 -0700 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: <4DD3EC7A.8070801@stoneleaf.us> References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> Message-ID: <4DD427A7.3060606@stoneleaf.us> As those who have to work with byte strings know, when retrieving a single character from a byte string, what you get back is not a byte string, but an int -- a rather important distinction from unicode strings (str). This has the frustrating side-effect of b'abc'[2] == b'c' being False. It is far too late to change that particular behavior of the byte string (returning int's, that is) -- however, it may not be too late for a non-backwards-incompatible change: have the bytes class' __eq__ method be modified so that it 1) checks to see if the bytes instance is length 1 2) checks to see if a) the other object is an int, and b) 0 <= other_obj < 256 3) if 1 and 2, make the comparison between the int and its single element instead of returning NotImplemented? This makes sense to me -- after all, the bytes class is an array of ints in range(256); it is a special case, but doesn't feel any more special than passing an int into bytes() giving a string of that many null bytes; and it would get rid of the, in my opinion ugly, idiom of some_var[i:i+1] == b'd' It would also not require a new literal syntax. Thoughts? ~Ethan~ From ethan at stoneleaf.us Wed May 18 23:11:10 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 18 May 2011 14:11:10 -0700 Subject: [Python-ideas] [Python-Dev] Python 3.x and bytes In-Reply-To: <4DD426C2.7060706@v.loewis.de> References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD41E2B.7000404@stoneleaf.us> <4DD426C2.7060706@v.loewis.de> Message-ID: <4DD435EE.7090903@stoneleaf.us> Martin v. L?wis wrote [from python-dev]: > Immutable objects that compare equal should hash equal; > so we would also have to change the hashing of byte strings. Not sure > whether that, in turn, has undesirable consequences. I thought it was the other-way-round -- if they hash equal, they should compare equal? Or is this just for immutables? > In addition, equality should be transitive, so b'A' == 65.0. I'm not sure what you're getting at... we could certainly have step 2 check for a number instead of an int, and then step 3 could extract the one element, giving an int, and then let that int compare itself with the other number, whether it be int, float, fraction, what-have-you. 
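Spelled out as code, the proposed comparison rule is roughly the following; a subclass is used purely for illustration, the actual proposal is to change bytes itself:

class EqBytes(bytes):
    # illustrative only: a length-1 value also compares equal to its int
    def __eq__(self, other):
        if isinstance(other, int) and len(self) == 1 and 0 <= other < 256:
            return self[0] == other
        return bytes.__eq__(self, other)

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # hashing is left untouched, which is exactly Martin's objection:
    # EqBytes(b'A') == 65 is True while hash(EqBytes(b'A')) != hash(65)
    __hash__ = bytes.__hash__

# Note EqBytes(b'abc')[2:3] is still plain bytes, so every value would need
# wrapping -- part of the reason the change is proposed for bytes itself.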
~Ethan~ From fdrake at acm.org Wed May 18 23:04:13 2011 From: fdrake at acm.org (Fred Drake) Date: Wed, 18 May 2011 17:04:13 -0400 Subject: [Python-ideas] [Python-Dev] Python 3.x and bytes In-Reply-To: <4DD435EE.7090903@stoneleaf.us> References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD41E2B.7000404@stoneleaf.us> <4DD426C2.7060706@v.loewis.de> <4DD435EE.7090903@stoneleaf.us> Message-ID: On Wed, May 18, 2011 at 5:11 PM, Ethan Furman wrote: > I thought it was the other-way-round -- if they hash equal, they should > compare equal? ?Or is this just for immutables? Two values that compare equal must have equal hashes. Having equal hashes does not imply equality. -Fred -- Fred L. Drake, Jr.? ? "Give me the luxuries of life and I will willingly do without the necessities." ?? --Frank Lloyd Wright From greg.ewing at canterbury.ac.nz Thu May 19 00:13:09 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 19 May 2011 10:13:09 +1200 Subject: [Python-ideas] PEP-3151 pattern-matching In-Reply-To: References: <4D9D792D.2020403@egenix.com> <4D9ECE64.7070402@egenix.com> <20110512192645.02de509b@pitrou.net> <4DCC2FAA.7010409@egenix.com> <20110512231729.6b91b373@pitrou.net> Message-ID: <4DD44475.1050004@canterbury.ac.nz> Devin Jeanpierre wrote: > On Wed, May 18, 2011 at 6:07 AM, dag.odenhall at gmail.com > wrote: > >>except IOError(32): # isinstance IOError and .args == (32,) >> ... > > Unfortunately, as described it wouldn't match IOError(32, 'Blah blah > blah'). Also it's a bit magical -- normally one doesn't expect Haskell-like pattern matching in Python. Maybe something more explicit would be better: try: ... except IOError as e with e.errno == 32: ... -- Greg From tjreedy at udel.edu Thu May 19 05:10:01 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 18 May 2011 23:10:01 -0400 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: <4DD427A7.3060606@stoneleaf.us> References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> Message-ID: On 5/18/2011 4:10 PM, Ethan Furman wrote: > As those who have to work with byte strings know, when retrieving a > single character from a byte string, what you get back is not a byte > string, but an int -- a rather important distinction from unicode > strings (str). For all sequences, slicing (if it works at all) returns a subsequence (possibly of length 0, which is why slicing can work with out-of-bounds slice points). For all (built-in) sequences except for strings, indexing returns a member of the sequence (which is why it raises an exception for out-of-bounds indexes). Leaving aside extension and user-defined sequences, strings are unique in instead returning a length-1 subsequence So bytes are normal while strings are anomolous! Why that anomaly? The immediate reason is that Python does not have a separate character type. Why not? Guido might best answer (but he might say 'my gut instinct'), but I can think of a few reasons. 1. That is how it is in the (math) theory of strings. 'A' is both a char and a string of length one. There is no separate 'char' type that cannot be added (concatenated) to other strings of whatever length. 2. (Related) This pragmatically works best for Python. 3. Python follows Occam's principle by not introducing types without necessity. 
And a separate char type is not *necessary*. 4. Text strings are homegeneous arrays (like the arrays in the array module), unlike heterogeneous tuples and lists. So they need not be sequences of Python objects, and for efficiency, would not be even if there were a character type. Like other arrays, they contain the information needed to produce Python objects on demand without actually containing such objects in the way tuples, lists, and dicts do. I do, however, understand the tendency to think of bytes as strings because of both Python's history and the remnant string interface. For people using non-Latin (non-ascii) alphabets, the 'convenience' of replacing some bytes with ascii-chars might be less convenient. -- Terry Jan Reedy From jeanpierreda at gmail.com Thu May 19 07:02:32 2011 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Thu, 19 May 2011 01:02:32 -0400 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> Message-ID: On Wed, May 18, 2011 at 11:10 PM, Terry Reedy wrote: > For all sequences, slicing (if it works at all) returns a subsequence > (possibly of length 0, which is why slicing can work with out-of-bounds > slice points). For all (built-in) sequences except for strings, indexing > returns a member of the sequence (which is why it raises an exception for > out-of-bounds indexes). Leaving aside extension and user-defined sequences, > strings are unique in instead returning a length-1 subsequence So bytes are > normal while strings are anomolous! I don't see the necessity of saying that length-1 strings aren't members of strings. For all definitions I can think of for "member of the sequence", they are. You get them when you iterate over them, you get them when you use index access, they work with .index(). They have a sort of infinite regress / cycle to them ("it's strings all the way down"), but you can get that with lists too (x = []; x.append(x); y = x + x -- compare with x = 'a'; y = x + x). > 1. That is how it is in the (math) theory of strings. 'A' is both a char and > a string of length one. There is no separate 'char' type that cannot be > added (concatenated) to other strings of whatever length. At least in the context of formal language theory (e.g. Sipser's Introduction to the Theory of Computation), characters (symbols) are a separate thing from strings. You have your alphabet, Sigma, which is an arbitrary set, and strings are finite sequences of elements from Sigma. In Python's case, it's chosen an alphabet where all elements are length-1 strings in the alphabet. I don't think that's really well-formed using this definition of string and ZFC, and the usual definitions of finite sequences (functions or linked-lists). It doesn't really matter, you can model it in something else. > I do, however, understand the tendency to think of bytes as strings because > of both Python's history and the remnant string interface. I would add the syntax of bytes literals to the list of similarities. br'\foo' versus r'\foo' makes them very similar. > For people using non-Latin (non-ascii) alphabets, the 'convenience' of > replacing some bytes with ascii-chars might be less convenient. Eh, actually I think what was suggested was having w.g. b'\x42' == 0x42 by making singleton bytes objects equal to the appropriate integer. 
This would work for all bytes, not just those smaller than 128. Devin Jeanpierre From dag.odenhall at gmail.com Thu May 19 10:33:11 2011 From: dag.odenhall at gmail.com (dag.odenhall at gmail.com) Date: Thu, 19 May 2011 10:33:11 +0200 Subject: [Python-ideas] PEP-3151 pattern-matching In-Reply-To: <4DD44475.1050004@canterbury.ac.nz> References: <4D9D792D.2020403@egenix.com> <4D9ECE64.7070402@egenix.com> <20110512192645.02de509b@pitrou.net> <4DCC2FAA.7010409@egenix.com> <20110512231729.6b91b373@pitrou.net> <4DD44475.1050004@canterbury.ac.nz> Message-ID: On 19 May 2011 00:13, Greg Ewing wrote: > Devin Jeanpierre wrote: >> >> On Wed, May 18, 2011 at 6:07 AM, dag.odenhall at gmail.com >> wrote: >> >>> except IOError(32): ?# isinstance IOError and .args == (32,) >>> ?... >> >> Unfortunately, as described it wouldn't match IOError(32, 'Blah blah >> blah'). > > Also it's a bit magical -- normally one doesn't expect > Haskell-like pattern matching in Python. > > Maybe something more explicit would be better: > > ?try: > ? ?... > ?except IOError as e with e.errno == 32: > ? ?... Then we're back where we started in which case I prefer 'if' as the keyword. From andrew at acooke.org Thu May 19 14:27:57 2011 From: andrew at acooke.org (andrew cooke) Date: Thu, 19 May 2011 08:27:57 -0400 Subject: [Python-ideas] Type Metadata (and related ideas) Message-ID: <20110519122757.GA11553@acooke.org> Hi, I just finished working on a project that plays around with ABCs and function annotations. The idea was to allow for more delcarative code by adding tools to describe Pytohn data in more detail. While I don't think the result is suitable for adding to Python (it's way too big a change, and it's not yet proven), the process of making something consistent involved working through a lot of ideas about "types in Python" that I recorded at http://www.acooke.org/pytyp.pdf Part of that paper (pages 8 and 9) describes some issues that caused particular problems, including: * The lack of annotations on type generators makes it hard to use annotations as a way of completely describing types. * There seems to be a missing ABC for __getitem__ (which unites lists, dicts and tuples). * As ever, mutability is complicated :o) If we had copy on write lists (which already exist), perhaps we could hash instances efficiently? (OK, this may be already discussed, but I had to mention it) * Given duck typing, shouldn't AttributeError be a TypeError (or vice versa?) * For this particular use-case, an __instancehook__ (which would work much like __subclasshook__) for ABCMeta would have been useful (as described in the paper, polymorphism in Python occurs at the instance level, so asking about the types of instances makes a surprising amount of sense, if done right). Anyway, apologies if some or all of this is old news or inapprorpiate. I just thought people here might find it interesting (you can do things like type check functions and use dynamic dispatch by type - all in a fairly pythonic way... (imho)) Cheers, Andrew PS The project home and more docs are at http://www.acooke.org/pytyp/ ; the code is at http://code.google.com/p/pytyp/ ; pypi page is http://pypi.python.org/pypi/pytyp From jackdied at gmail.com Fri May 20 06:46:14 2011 From: jackdied at gmail.com (Jack Diederich) Date: Fri, 20 May 2011 00:46:14 -0400 Subject: [Python-ideas] function defaults and an empty() builtin Message-ID: During a code review I got asked a question about a "pythonic" idiom I've been asked about before. 
The code was like this: def func(optional=None): if optional is None: optional = [] The question was why the optional value wasn't set to an empty list in the first place. The answer is that Really Bad Things can happen if someone actually goes and manipulates that empty list because all future callers will see the modified version. I don't think this defensive programming practice is yet passe - I can think of lots of unit tests that wouldn't trigger bad behavior. You would have to intentionally provoke it by adding some unit tests to be sure. What would make my life a little easier is a builtin container named "empty()" that emulates all builtin containers and raises an exception for any add/subtract manipulations. Something like: class empty(): def _bad_user(self, *args): raise ValueError("empty objects are empty") append = pop = __getitem__ = add = setdeafult = __ior__ = __iand__ = _bad_user def _empty(self): return [] items = keys = values = get = _empty return nothing when asked for something and raise a ValueError when any attempt is made to add/remove items. -Jack From stephen at xemacs.org Fri May 20 07:44:30 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 20 May 2011 14:44:30 +0900 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> Message-ID: <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> Terry Reedy writes: > For people using non-Latin (non-ascii) alphabets, the 'convenience' of > replacing some bytes with ascii-chars might be less convenient. For us, the convenience remains. Japanese mail is transmitted via SMTP, and the control function "hello" is still spelled "EHLO" in Japanese mail. Farsi web pages are formatted by HTML, and the control function "new line" is spelled "
" in Farsi, of course. It's the pain that comes from the inevitable mixing of binary protocol that looks like text with real text, turning the whole into an unintelligible garble, that hurts so much harder for people who can't properly write their names in ASCII. ???????????????-ly y'rs, From steve at pearwood.info Fri May 20 07:57:04 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 20 May 2011 15:57:04 +1000 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: References: Message-ID: <201105201557.05197.steve@pearwood.info> On Fri, 20 May 2011 02:46:14 pm Jack Diederich wrote: > During a code review I got asked a question about a "pythonic" idiom > I've been asked about before. The code was like this: > > def func(optional=None): > if optional is None: > optional = [] > > The question was why the optional value wasn't set to an empty list > in the first place. The answer is that Really Bad Things can happen > if someone actually goes and manipulates that empty list because all > future callers will see the modified version. Assuming that this behaviour is not intended. However, I agree that, in general, the behaviour of mutable defaults in Python is a Gotcha. > I don't think this defensive programming practice is yet passe - I > can think of lots of unit tests that wouldn't trigger bad behavior. > You would have to intentionally provoke it by adding some unit tests > to be sure. Er, yes... how is that different from any other behaviour, good or bad? You have to write the unit tests to test the behaviour you want to test for, or else it won't be tested. > What would make my life a little easier is a builtin container named > "empty()" that emulates all builtin containers and raises an > exception for any add/subtract manipulations. Something like: I don't think that this idea will actually be as useful as you think it will, but in any case, why does it need to be a built-in? def func(optional=empty()): ... works just as well whether empty is built-in or not. But as I said, I don't think this will fly. What's the point? If you don't pass an argument for optional, and get a magic empty list, your function will raise an exception as soon as it tries to do something with the list. To my mind, that makes it rather useless. If you want the function to raise an exception if the default value is used, surely it's better to just make the argument non-optional. But perhaps I've misunderstood something. -- Steven D'Aprano From masklinn at masklinn.net Fri May 20 08:37:30 2011 From: masklinn at masklinn.net (Masklinn) Date: Fri, 20 May 2011 08:37:30 +0200 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: <201105201557.05197.steve@pearwood.info> References: <201105201557.05197.steve@pearwood.info> Message-ID: <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> On 2011-05-20, at 07:57 , Steven D'Aprano wrote: > > But as I said, I don't think this will fly. What's the point? If you > don't pass an argument for optional, and get a magic empty list, your > function will raise an exception as soon as it tries to do something > with the list. To my mind, that makes it rather useless. If you want > the function to raise an exception if the default value is used, surely > it's better to just make the argument non-optional. > > But perhaps I've misunderstood something. > That Jack's object would be an empty, immutable collection. Not an arbitrary object. 
The idea is rather similar to Java's Collections.empty* (emptySet, emptyMap and emptyList), which could be fine solutions to this issue indeed. There is already `frozenset` for sets, but there is no way to instantiate an immutable list or dict in Python right now, as far as I know (tuples don't work as several mixed list&tuple operations yield an error, and I don't like using tuples as sequences personally). The ability to either make a collection (list or dict, maybe via separate functions) immutable or to create a special immutable empty variant thereof would work nicely. From tjreedy at udel.edu Fri May 20 10:28:39 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 20 May 2011 04:28:39 -0400 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5/20/2011 1:44 AM, Stephen J. Turnbull wrote: > > For people using non-Latin (non-ascii) alphabets, the 'convenience' of > > replacing some bytes with ascii-chars might be less convenient. > > For us, the convenience remains. I understood the thrust of this thread being that doing text manipulation with bytes sometimes bites -- because bytes are not text. Someone writing email or html bodies in Japanese or Farsi will not even try that, but will use str (unicode) and encode to bytes only when done, most likely transparently.. As far as I noticed, Ethan did not explain why he was extracting single bytes and comparing to a constant, so it is hard to know if he was even using them properly. > Japanese mail is transmitted via > SMTP, and the control function "hello" is still spelled "EHLO" in > Japanese mail. I am not familiar with that control function, but if it is part of the SMTP protocol, it has nothing to do with the language of the payload. For programming a wire protocol that encodes abstract functions in ascii chars, then the ascii char representation of bytes in convenient. That is why it was chosen as the default. > Farsi web pages are formatted by HTML, and the control > function "new line" is spelled "
" in Farsi, of course. When writing the html *text* body, sure. But I presume browsers decode encoded bytes to unicode *before* parsing the text. If so, it does not really matter that '
' gets encoded to b'
'. > It's the pain that comes from the inevitable mixing of binary protocol > that looks like text with real text, turning the whole into an > unintelligible garble, that hurts so much harder for people who can't > properly write their names in ASCII. > > ???????????????-ly y'rs, -- Terry Jan Reedy From steve at pearwood.info Fri May 20 13:54:30 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 20 May 2011 21:54:30 +1000 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> Message-ID: <201105202154.30833.steve@pearwood.info> On Fri, 20 May 2011 04:37:30 pm you wrote: > On 2011-05-20, at 07:57 , Steven D'Aprano wrote: > > But as I said, I don't think this will fly. What's the point? If > > you don't pass an argument for optional, and get a magic empty > > list, your function will raise an exception as soon as it tries to > > do something with the list. To my mind, that makes it rather > > useless. If you want the function to raise an exception if the > > default value is used, surely it's better to just make the argument > > non-optional. > > > > But perhaps I've misunderstood something. > > That Jack's object would be an empty, immutable collection. Not an > arbitrary object. Yes, I get that, but what's the point? What's an actual use-case for it? What's the point of having an immutable collection that has the same methods as a list, but raises an exception if you use them? Most importantly, why single out an *empty* immutable list for special treatment, instead of providing a general immutable list type? It seems to me that all this suggested pattern does is use a too-clever and round-about way of turning a buggy function into an exception for the caller, possibly a long way from where the error actually exists. I can't think of any reason I would use this special empty() value as a default instead of either: - fix the function to not use the same default list; or - if using a default value causes problems, don't use a default value [...] > The ability to either make a collection (list or dict, maybe via > separate functions) immutable or to create a special immutable empty > variant thereof would work nicely. These are two different issues. Being able to freeze an object would be handy, but a special dedicated empty immutable list strikes me as completely pointless. -- Steven D'Aprano From masklinn at masklinn.net Fri May 20 14:14:09 2011 From: masklinn at masklinn.net (Masklinn) Date: Fri, 20 May 2011 14:14:09 +0200 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: <201105202154.30833.steve@pearwood.info> References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> Message-ID: On 2011-05-20, at 13:54 , Steven D'Aprano wrote: > On Fri, 20 May 2011 04:37:30 pm you wrote: >> On 2011-05-20, at 07:57 , Steven D'Aprano wrote: >>> But as I said, I don't think this will fly. What's the point? If >>> you don't pass an argument for optional, and get a magic empty >>> list, your function will raise an exception as soon as it tries to >>> do something with the list. To my mind, that makes it rather >>> useless. If you want the function to raise an exception if the >>> default value is used, surely it's better to just make the argument >>> non-optional. 
>>> >>> But perhaps I've misunderstood something. >> >> That Jack's object would be an empty, immutable collection. Not an >> arbitrary object. > > Yes, I get that, but what's the point? What's an actual use-case for it? > What's the point of having an immutable collection that has the same > methods as a list, but raises an exception if you use them? Not if you use them, if you *modify* them. I'm guessing the point is to be able to avoid the `if collection is None` dance when the collection is not *supposed* to be modified: an immutable collection would immediately raise on modification, acting as a precondition/invariant and ensuring mutation is not introduced on the original collection. > Most importantly, why single out an *empty* immutable list for special > treatment, instead of providing a general immutable list type? I'm pretty sure I mentioned that as a good idea in the following two paragraphs of my comment. But in Jack's case, I'm guessing it's because the Python bug of collections-as-default-values is most generally encountered with empty collections. > I can't think of any reason I would use this special empty() value as a > default instead of either: > > - fix the function to not use the same default list; or > - if using a default value causes problems, don't use a default value But that's the very issue: the mutable-collection-default is a common bug, and one which may be quite hard to debug in the long term (not just that, but it may not even be visible as a bug ? even though data is corrupted ? and manifest itself as an even harder to track memory leak). By making that default-empty-collection immutable, mutations of the default-argument collection become obvious (they blow up), and the function can be fixed. It's much easier to track this down than a strange memory leak. On 2011-05-20, at 13:54 , Steven D'Aprano wrote: >> The ability to either make a collection (list or dict, maybe via >> separate functions) immutable or to create a special immutable empty >> variant thereof would work nicely. > These are two different issues. Being able to freeze an object would be > handy, but a special dedicated empty immutable list strikes me as > completely pointless. An immutable empty collection can be a system-wide singleton, and extremely cheap to use. It makes for a good default value or default object member when you expect the collection to never be modified. Using `collections.empty_list` is also more readable and clearer than, say, `collections.freeze([])`. From ethan at stoneleaf.us Fri May 20 15:05:37 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 20 May 2011 06:05:37 -0700 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4DD66721.3010508@stoneleaf.us> Terry Reedy wrote: > As far as I noticed, Ethan did not explain why he was extracting single > bytes and comparing to a constant, so it is hard to know if he was even > using them properly. The header of a .dbf file details the field composition such as name, size, type, etc. The type is C for character, L for logical, etc, and the end of the field definition block is signaled by a CR byte. 
So in one spot of my code I (used to) have a comparison if hdr[0] == b'\x0d': # end of fields which I have changed to if hdr[0] == 0x0d: and elsewhere: field_type = hdr[11] which is now field_type = chr(hdr[11]) since the first 127 positions of unicode are ASCII. However, I can see this silently producing errors for values between 128 and 255 -- consider: --> chr(0xa1) '?' --> b'\xa1'.decode('cp1251') '\u040e' So because my single element access to the byte string lost its bytes type, I may no longer get the correct result. ~Ethan~ From dsdale24 at gmail.com Fri May 20 15:14:41 2011 From: dsdale24 at gmail.com (Darren Dale) Date: Fri, 20 May 2011 09:14:41 -0400 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: <4DD66721.3010508@stoneleaf.us> References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> <4DD66721.3010508@stoneleaf.us> Message-ID: On Fri, May 20, 2011 at 9:05 AM, Ethan Furman wrote: > Terry Reedy wrote: >> >> As far as I noticed, Ethan did not explain why he was extracting single >> bytes and comparing to a constant, so it is hard to know if he was even >> using them properly. > > The header of a .dbf file details the field composition such as name, size, > type, etc. ?The type is C for character, L for logical, etc, and the end of > the field definition block is signaled by a CR byte. > > So in one spot of my code I (used to) have a comparison > > if hdr[0] == b'\x0d': # end of fields > > which I have changed to > > if hdr[0] == 0x0d: > > and elsewhere: > > field_type = hdr[11] > > which is now > > field_type = chr(hdr[11]) > > since the first 127 positions of unicode are ASCII. > > However, I can see this silently producing errors for values between 128 and > 255 -- consider: > > --> chr(0xa1) > '?' > --> b'\xa1'.decode('cp1251') > '\u040e' > > So because my single element access to the byte string lost its bytes type, > I may no longer get the correct result. Can you use a single element stride as a workaround? >>> b'01234' b'01234' >>> b'01234'[0] 48 >>> b'01234'[0:1] b'0' From p.f.moore at gmail.com Fri May 20 15:28:19 2011 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 20 May 2011 14:28:19 +0100 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: <4DD66721.3010508@stoneleaf.us> References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> <4DD66721.3010508@stoneleaf.us> Message-ID: On 20 May 2011 14:05, Ethan Furman wrote: > Terry Reedy wrote: >> >> As far as I noticed, Ethan did not explain why he was extracting single >> bytes and comparing to a constant, so it is hard to know if he was even >> using them properly. > > The header of a .dbf file details the field composition such as name, size, > type, etc. ?The type is C for character, L for logical, etc, and the end of > the field definition block is signaled by a CR byte. > > So in one spot of my code I (used to) have a comparison > > if hdr[0] == b'\x0d': # end of fields > > which I have changed to > > if hdr[0] == 0x0d: This seems to me to be an improvement, regardless... > and elsewhere: > > field_type = hdr[11] > > which is now > > field_type = chr(hdr[11]) > > since the first 127 positions of unicode are ASCII. 
That seems reasonable, if you have a fixed set of known-ASCII values that are field types. If you care about detecting invalid files, then do a field_type in 'CL...' test to validate and you're fine. > However, I can see this silently producing errors for values between 128 and > 255 -- consider: > > --> chr(0xa1) > '?' > --> b'\xa1'.decode('cp1251') > '\u040e' But those aren't valid field codes, so why do you care? And why are you using cp1251? I thought you said they were ASCII? As I said, if you're checking for error values, just start with either a check for specific values, or simply check the field type is <128. > So because my single element access to the byte string lost its bytes type, > I may no longer get the correct result. I still don't see your problem here... Paul. From ncoghlan at gmail.com Fri May 20 17:03:04 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 21 May 2011 01:03:04 +1000 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> Message-ID: I share Steve's puzzlement as the intended use case. To get value from the magic empty immutable list, you will have to explicitly test that calling your function with the default value does the right thing. But if you're writing an explicit test, having that test call the function *twice* to confirm correct use of the 'is None' idiom will work just as well. There are limits to how much we can help people that don't test their code. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Fri May 20 17:16:46 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 21 May 2011 01:16:46 +1000 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: <4DD66721.3010508@stoneleaf.us> References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> <4DD66721.3010508@stoneleaf.us> Message-ID: On Fri, May 20, 2011 at 11:05 PM, Ethan Furman wrote: > which is now > > field_type = chr(hdr[11]) This is definitely a modelling problem, and exactly the kind of thinking that the bytes model in Py3k is intended to combat. Bytes are not text, even when you're dealing primarily with ASCII. The world where that mindset worked consistently and reliably is ancient history (and many non-English speakers still suffer annoying software glitches due to the fact that English speakers have been able to get by with only ASCII for so long). If you want a subscript on a bytes object to create another bytes object, then slice it, just as you would a list. If you want the integer value, index it. > So because my single element access to the byte string lost its bytes type, I may no longer get the correct result. Umm, no. You may not get the correct result because you're telling Python to interpret a value as a Unicode code point when it is actually no such thing (given your example, I assume it is actually cp1251 encoded text). Therefore, instead of: chr(hdr[11]) # Only makes sense for a sequence of Unicode code points you want something like: hdr[11:12].decode('cp1251') # Makes sense for a cp1251 encoded byte sequence Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From ethan at stoneleaf.us Fri May 20 18:31:04 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 20 May 2011 09:31:04 -0700 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> Message-ID: <4DD69748.3030608@stoneleaf.us> Masklinn wrote: > I'm guessing the point is to be able to avoid the `if collection is None` > dance when the collection is not *supposed* to be modified: an immutable > collection would immediately raise on modification, acting as a > precondition/invariant and ensuring mutation is not introduced on the > original collection. If the function can't proceed properly without an actual parameter, why supply a default? Make it required, and then the function will blow up when it's called without one. I suppose there could be a case where one is going to iterate through a collection, and useful work may still happen if said collection is empty, and one is feeling too lazy to create an empty one on the spot where the function is called and so relies an the immutable empty default... but if one knows all that one should be able to not call any mutating methods. But the original problem is that an empty list is used as the default because an actual list is expected. I think the problem has been misunderstood -- it's not *if* the list gets modified, but *when* -- so you would have the same dance, only instead of None, your now saying if default == empty(): default = [] So you haven't saved a thing, and still don't really get the purpose behind mutable defaults. ~Ethan~ From janssen at parc.com Fri May 20 19:35:12 2011 From: janssen at parc.com (Bill Janssen) Date: Fri, 20 May 2011 10:35:12 PDT Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> <4DD66721.3010508@stoneleaf.us> Message-ID: <30322.1305912912@parc.com> Nick Coghlan wrote: > On Fri, May 20, 2011 at 11:05 PM, Ethan Furman wrote: > > which is now > > > > field_type = chr(hdr[11]) > > This is definitely a modelling problem, and exactly the kind of > thinking that the bytes model in Py3k is intended to combat. > > Bytes are not text, even when you're dealing primarily with ASCII. The To me, that's the crux of this issue, and that's the reason this will keep coming up again and again, and that's the reason people will continue to want to "improve" the 'bytes' type to be more 'string-like'. The problem, of course, is that bytes often *are* text, in the sense that the byte sequence contains an encoded string, and the programmer both knows that and wants that. Even for non-ASCII strings. Because Python is widely used for processing encoded strings of various kinds, and programmers hate to decode/encode just to work on them *as* strings. Mind you, that's exactly the wrong thing to do, in my opinion. It just gets us back to the bad old days of Python 2, where strings were often kept in a sequence of bytes which had no way of indicating what encoding it had. But changing the mindset of programmers? Hard to do, very hard to do. 
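Applied to the dbf header discussed earlier in the thread, that mindset shift amounts to something like this; the offsets follow Ethan's description and the codepage is whatever the table actually declares:

# hdr: one 32-byte field descriptor record, as in Ethan's message

# staying in bytes: compare ints or length-1 slices
end_of_fields = (hdr[0] == 0x0d)
field_type = hdr[11:12]                  # b'C', b'L', ...

# decoding once at the boundary and working with text afterwards
text = hdr[:32].decode('cp1251')         # or whatever codepage the table declares
field_type = text[11]                    # 'C', 'L', ...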
Personally, I think a more realistic approach might be to (a) improve the implementation of 'str()' so that it avoids unnecessary decode/encode operations, decoding only when necessary (yes, that means there would be multiple C-level representations for a 'str'), and then (b) making 'bytes' less useful as strings. Bill From masklinn at masklinn.net Fri May 20 19:51:30 2011 From: masklinn at masklinn.net (Masklinn) Date: Fri, 20 May 2011 19:51:30 +0200 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: <4DD69748.3030608@stoneleaf.us> References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <4DD69748.3030608@stoneleaf.us> Message-ID: On 2011-05-20, at 18:31 , Ethan Furman wrote: > Masklinn wrote: >> I'm guessing the point is to be able to avoid the `if collection is None` >> dance when the collection is not *supposed* to be modified: an immutable >> collection would immediately raise on modification, acting as a >> precondition/invariant and ensuring mutation is not introduced on the >> original collection. > > If the function can't proceed properly without an actual parameter, why supply a default? It can, where did you get the idea that it could not? That's the point of the default parameter. > But the original problem is that an empty list is used as the default because an actual list is expected. I think the problem has been misunderstood -- it's not *if* the list gets modified, but *when* -- so you would have the same dance, only instead of None, your now saying > > if default == empty(): > default = [] > > So you haven't saved a thing, and still don't really get the purpose behind mutable defaults. No, the point of empty() (or whatever it would be called) would very much be to forbid mutation of the default parameter. I used the word *if* because that is precisely what I meant: if the default parameter is modified, an error has been introduced into the function. empty() is both an empty list (because the code iterates over a list for instance, or maps it, or what have you) and an assertion that this list is *not* to be modified. From masklinn at masklinn.net Fri May 20 20:15:18 2011 From: masklinn at masklinn.net (Masklinn) Date: Fri, 20 May 2011 20:15:18 +0200 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> Message-ID: <19A0EE0D-2881-489B-B344-C56ABB501C27@masklinn.net> On 2011-05-20, at 17:03 , Nick Coghlan wrote: > I share Steve's puzzlement as the intended use case. > > To get value from the magic empty immutable list, you will have to > explicitly test that calling your function with the default value does > the right thing. Why is that? The value of the empty immutable list (there's nothing magic to it) would be an eternal assertion that an incorrect behavior (trying to mutate the default parameter) can not be introduced in the function. It is no different than adding `assert` calls in the code. > But if you're writing an explicit test, having that test call the > function *twice* to confirm correct use of the 'is None' idiom will > work just as well. But that's the point: do you *always* use the `is None` idiom? And do you really love it? When you know the function body you just wrote does not perform any modification to the collection? 
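For concreteness, nothing magical is needed; a rough sketch of the sort of object I mean (the names are purely illustrative, not a concrete proposal for the builtin):

    class _Empty(tuple):
        "An always-empty sequence that raises loudly on attempted mutation."
        __slots__ = ()
        def append(self, item):
            raise TypeError("default argument is read-only")
        def extend(self, items):
            raise TypeError("default argument is read-only")

    def empty():
        return _Empty()

    def func(items=empty()):
        # iteration, len() and membership all behave like an empty list;
        # only the calls that would have been bugs anyway blow up
        return [item for item in items]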
There are 17 functions or methods with list default parameters and 133 with dict default parameters in the Python standard library. Surely some of them legitimately make use of a mutable default parameter as some kind of process-wide cache or accumulator, but I would doubt the majority does (why would SMTP.sendmail need to accumulate data in its mail_options parameter across runs?) Do you know for sure that no mutation of these 150+ parameters will ever be introduced, that all of these functions and methods are sufficiently tested, called often enough that the introduction of a mutation of the default parameter in themselves or one of their callees would *never* be able to pass muster? From ethan at stoneleaf.us Fri May 20 20:56:48 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 20 May 2011 11:56:48 -0700 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <4DD69748.3030608@stoneleaf.us> Message-ID: <4DD6B970.10102@stoneleaf.us> Masklinn wrote: > On 2011-05-20, at 18:31 , Ethan Furman wrote: >> Masklinn wrote: >>> I'm guessing the point is to be able to avoid the `if collection is None` >>> dance when the collection is not *supposed* to be modified: an immutable >>> collection would immediately raise on modification, acting as a >>> precondition/invariant and ensuring mutation is not introduced on the >>> original collection. >> >> If the function can't proceed properly without an actual parameter, why supply >> a default? > > It can, where did you get the idea that it could not? That's the point of the > default parameter. Yes, I am aware. And the point of providing an empty list as a default is so you have a list to add things to -- so what have you gained by providing an empty frozen list as a default? Seems to me all you have now is a built-in time bomb -- every call = a blow up. >> But the original problem is that an empty list is used as the default because >> an actual list is expected. I think the problem has been misunderstood -- it's >> not *if* the list gets modified, but *when* -- so you would have the same dance, >> only instead of None, your now saying >> >> if default == empty(): >> default = [] >> >> So you haven't saved a thing, and still don't really get the purpose behind >> mutable defaults. > > No, the point of empty() (or whatever it would be called) would very much be to > forbid mutation of the default parameter. I used the word *if* because that is > precisely what I meant: if the default parameter is modified, an error has been > introduced into the function. > > empty() is both an empty list (because the code iterates over a list for instance, > or maps it, or what have you) and an assertion that this list is *not* to be > modified. So what happens when you provide a *real* list, that is to be modified? Not modify it? Or have code that is constantly checking to see if it's okay to modify the list because it might be the immutable empty() object? 
~Ethan~ From masklinn at masklinn.net Fri May 20 20:48:15 2011 From: masklinn at masklinn.net (Masklinn) Date: Fri, 20 May 2011 20:48:15 +0200 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: <4DD6B970.10102@stoneleaf.us> References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <4DD69748.3030608@stoneleaf.us> <4DD6B970.10102@stoneleaf.us> Message-ID: On 2011-05-20, at 20:56 , Ethan Furman wrote: > Masklinn wrote: >> On 2011-05-20, at 18:31 , Ethan Furman wrote: >>> Masklinn wrote: >>>> I'm guessing the point is to be able to avoid the `if collection is None` >>>> dance when the collection is not *supposed* to be modified: an immutable >>>> collection would immediately raise on modification, acting as a >>>> precondition/invariant and ensuring mutation is not introduced on the >>>> original collection. > >> >>> If the function can't proceed properly without an actual parameter, why supply > >> a default? > > >> It can, where did you get the idea that it could not? That's the point of the > > default parameter. > > Yes, I am aware. And the point of providing an empty list as a default is so you have a list to add things to Not at all, you may just want to iterate on it, or accumulate it. There are cases of exactly this in the standard library itself. > -- so what have you gained by providing an empty frozen list as a default? Seems to me all you have now is a built-in time bomb -- every call = a blow up. See above, your assumption is flawed and all reasoning following it is nonsense. >>> But the original problem is that an empty list is used as the default because > >> an actual list is expected. I think the problem has been misunderstood -- it's > >> not *if* the list gets modified, but *when* -- so you would have the same dance, > >> only instead of None, your now saying >>> >>> if default == empty(): >>> default = [] >>> >>> So you haven't saved a thing, and still don't really get the purpose behind > >> mutable defaults. > > >> No, the point of empty() (or whatever it would be called) would very much be to > > forbid mutation of the default parameter. I used the word *if* because that is > > precisely what I meant: if the default parameter is modified, an error has been > > introduced into the function. >> empty() is both an empty list (because the code iterates over a list for instance, > > or maps it, or what have you) and an assertion that this list is *not* to be > > modified. > > So what happens when you provide a *real* list, that is to be modified? Again, this default parameter is for functions which *are not supposed to* modify collections they were provided as parameters (which is the vast majority of functions, really). > Not modify it? Or have code that is constantly checking to see if it's okay to modify the list because it might be the immutable empty() object? This "objection" is absolutely nonsensical. Please cease. 
From ethan at stoneleaf.us Fri May 20 21:12:45 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 20 May 2011 12:12:45 -0700 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: <19A0EE0D-2881-489B-B344-C56ABB501C27@masklinn.net> References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <19A0EE0D-2881-489B-B344-C56ABB501C27@masklinn.net> Message-ID: <4DD6BD2D.5050801@stoneleaf.us> Masklinn wrote: > On 2011-05-20, at 17:03 , Nick Coghlan wrote: >> I share Steve's puzzlement as the intended use case. >> >> To get value from the magic empty immutable list, you will have to >> explicitly test that calling your function with the default value does >> the right thing. > Why is that? The value of the empty immutable list (there's nothing magic > to it) would be an eternal assertion that an incorrect behavior (trying > to mutate the default parameter) can not be introduced in the function. > > It is no different than adding `assert` calls in the code. > >> But if you're writing an explicit test, having that test call the >> function *twice* to confirm correct use of the 'is None' idiom will >> work just as well. > > But that's the point: do you *always* use the `is None` idiom? And do > you really love it? When you know the function body you just wrote > does not perform any modification to the collection? In this scenario: def func(mylist=empty()): do_some_stuff_with_mylist mylist is the empty() object, you *know* func() does not modify mylist, you are wrong (heh) and it does... but your program always calls func() with an actual list -- how is empty() going to save you then? Hint: it won't. And if you're thinking a unittest would catch that -- yes it would, but it would also catch it without empty() (make a copy first, call the func(), compare afterwards -- different? Mutation!) ~Ethan~ From tjreedy at udel.edu Fri May 20 21:05:17 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 20 May 2011 15:05:17 -0400 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: <4DD66721.3010508@stoneleaf.us> References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> <4DD66721.3010508@stoneleaf.us> Message-ID: On 5/20/2011 9:05 AM, Ethan Furman wrote: > The header of a .dbf file details the field composition such as name, > size, type, etc. The type is C for character, L for logical, etc, and > the end of the field definition block is signaled by a CR byte. At the level of bytes, these are small int codes. For English speakers, it is convenient that most map to ascii chars that are the first letters of an English name of the type. This convinience is somewhat lost for non-English non-latin-alphabet speakers who cannot do the same. > So in one spot of my code I (used to) have a comparison > > if hdr[0] == b'\x0d': # end of fields > > which I have changed to > > if hdr[0] == 0x0d: Some people dislike magic constants in code and would suggest defining them at the top of the file (or even in a separate module) with comment that define and explain the protocol. # Field type codes T_log = ... # Logical field with T or F T_char= ... 
# Variable length char field T_efdb= 0x0d # End of field definition block Take your pick of how to define the constants: >>> 0x0d == 13 == 0o15 == 0b1101 == ord(b'\r') == ord('\r') == b'\r'[0] True In 3.x, the identifies and comments can use any characters and language, so this works for everyone. -- Terry Jan Reedy From ethan at stoneleaf.us Fri May 20 21:22:52 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 20 May 2011 12:22:52 -0700 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <4DD69748.3030608@stoneleaf.us> <4DD6B970.10102@stoneleaf.us> Message-ID: <4DD6BF8C.9020704@stoneleaf.us> Masklinn wrote: > On 2011-05-20, at 20:56 , Ethan Furman wrote: >> Masklinn wrote: >>> Ethan wrote: >>>> If the function can't proceed properly without an actual parameter, why supply >>>> a default? >>> >>> It can, where did you get the idea that it could not? That's the point of the >>> default parameter. >> >> Yes, I am aware. And the point of providing an empty list as a default is so you have a list to add things to > > Not at all, you may just want to iterate on it, or accumulate it. There are cases of exactly this in the standard library itself. Um, isn't accumulating modifying? Or do you mean accumulating in a global or class instance? And why would you iterate over an empty list? If you have an example from the stdlib I'd love to see it (seriously -- I'm always up for learning something). ~Ethan~ From masklinn at masklinn.net Fri May 20 21:10:25 2011 From: masklinn at masklinn.net (Masklinn) Date: Fri, 20 May 2011 21:10:25 +0200 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: <4DD6BD2D.5050801@stoneleaf.us> References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <19A0EE0D-2881-489B-B344-C56ABB501C27@masklinn.net> <4DD6BD2D.5050801@stoneleaf.us> Message-ID: <9202D06F-4D45-45C9-A487-9C8097494716@masklinn.net> On 2011-05-20, at 21:12 , Ethan Furman wrote: > > In this scenario: > > def func(mylist=empty()): > do_some_stuff_with_mylist > > mylist is the empty() object, you *know* func() does not modify mylist, you are wrong (heh) and it does... but your program always calls func() with an actual list -- how is empty() going to save you then? It will not until one day func() is called without an actual list, and then you get a clear and immediate error instead of silent data corruption, or a memory leak, which are generally the result of an improperly mutated default argument collection and much harder to spot. As I wrote in part of the message you quoted (but ignored), empty() acts as an assertion. The assertion is that the default parameter will never be modified, and if the assertion fails, an error is generated. That's it. That's a pretty common bug in Python, and it solves it. No more, and no less. From tjreedy at udel.edu Fri May 20 21:11:12 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 20 May 2011 15:11:12 -0400 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <4DD69748.3030608@stoneleaf.us> Message-ID: On 5/20/2011 1:51 PM, Masklinn wrote: I am as puzzled as other people. 
> empty() is both an empty list (because the code iterates over a list > for instance, or maps it, or what have you) and an assertion that > this list is *not* to be modified. So use () as the default. It has all the methods of [] except for the mutation methods. -- Terry Jan Reedy From bruce at leapyear.org Fri May 20 21:10:55 2011 From: bruce at leapyear.org (Bruce Leban) Date: Fri, 20 May 2011 12:10:55 -0700 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: <19A0EE0D-2881-489B-B344-C56ABB501C27@masklinn.net> References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <19A0EE0D-2881-489B-B344-C56ABB501C27@masklinn.net> Message-ID: It seems to me that a better way of doing this is: def func(optional_list=[]) optional_list = freeze(optional_list) That is, if I expect the list to be immutable when it's empty, why wouldn't I expect it to be immutable when it's not empty? The only case where the immutability of the empty list matters is if there's a bug that changes the list (or a called function changes its signature when that wasn't expected). Wouldn't it be worth protecting against that when the list isn't empty as well? Of course PEP 351 was rejected so there is no freeze() builtin. (Although personally, I don't agree with all the arguments against it. For example, I don't think that freezing a dict has to be hashable. I also think that immutable objects are useful in unit testing where it's allows me to easily be sure that a passed in dict isn't changed by a function.) Anyway, in the case of a list I suspect that this is pretty close to what you want: def func(optional_list=[]) optional_list = tuple(optional_list) --- Bruce Latest blog post: http://www.vroospeak.com Your social security number is a very poor password Learn how to hack web apps: http://j.mp/gruyere-security (learn how to write buggy Python too) On Fri, May 20, 2011 at 11:15 AM, Masklinn wrote: > On 2011-05-20, at 17:03 , Nick Coghlan wrote: > > I share Steve's puzzlement as the intended use case. > > > > To get value from the magic empty immutable list, you will have to > > explicitly test that calling your function with the default value does > > the right thing. > Why is that? The value of the empty immutable list (there's nothing magic > to it) would be an eternal assertion that an incorrect behavior (trying > to mutate the default parameter) can not be introduced in the function. > > It is no different than adding `assert` calls in the code. > > > But if you're writing an explicit test, having that test call the > > function *twice* to confirm correct use of the 'is None' idiom will > > work just as well. > But that's the point: do you *always* use the `is None` idiom? And do > you really love it? When you know the function body you just wrote > does not perform any modification to the collection? > > There are 17 functions or methods with list default parameters and > 133 with dict default parameters in the Python standard library. > > Surely some of them legitimately make use of a mutable default > parameter as some kind of process-wide cache or accumulator, but > I would doubt the majority does (why would SMTP.sendmail need to > accumulate data in its mail_options parameter across runs?) 
> > Do you know for sure that no mutation of these 150+ parameters will > ever be introduced, that all of these functions and methods are > sufficiently tested, called often enough that the introduction of > a mutation of the default parameter in themselves or one of their > callees would *never* be able to pass muster? > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From masklinn at masklinn.net Fri May 20 21:18:30 2011 From: masklinn at masklinn.net (Masklinn) Date: Fri, 20 May 2011 21:18:30 +0200 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: <4DD6BF8C.9020704@stoneleaf.us> References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <4DD69748.3030608@stoneleaf.us> <4DD6B970.10102@stoneleaf.us> <4DD6BF8C.9020704@stoneleaf.us> Message-ID: On 2011-05-20, at 21:22 , Ethan Furman wrote: > Masklinn wrote: >> On 2011-05-20, at 20:56 , Ethan Furman wrote: >>> Masklinn wrote: > >>> Ethan wrote: >>>>> If the function can't proceed properly without an actual parameter, why supply >>>>> a default? > >>> >>>> It can, where did you get the idea that it could not? That's the point of the >>>> default parameter. > >> >>> Yes, I am aware. And the point of providing an empty list as a default is so you have a list to add things to > > >> Not at all, you may just want to iterate on it, or accumulate it. There are cases of exactly this in the standard library itself. > > Um, isn't accumulating modifying? No. `reduce` does not alter the list in place, nor does `sum`, `any` or iterating on the list. > Or do you mean accumulating in a global or class instance? Accumulating can be done in anything. > And why would you iterate over an empty list? Because you're iterating period, and that it's an empty list has no influence on your behavior. You'll simply do nothing during your iteration, because the iteration count will be 0. Why special-case empty lists when there is no need to? Same with dict, `get` works on empty dicts as well as on any other such collection. > If you have an example from the stdlib I'd love to see it (seriously -- I'm always up for learning something). Mailcap does that line 170: `subst` takes an empty list as a default parameter, forwards that parameter to `findparam` which iterates on the list to try and find the param. If it can't find the param in the list, it simply returns an empty string. An empty list is simply a case where it will never find the param, and it will Just Work. No need to create a special case. Have you really never done such a thing? From masklinn at masklinn.net Fri May 20 21:27:42 2011 From: masklinn at masklinn.net (Masklinn) Date: Fri, 20 May 2011 21:27:42 +0200 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <19A0EE0D-2881-489B-B344-C56ABB501C27@masklinn.net> Message-ID: On 2011-05-20, at 21:10 , Bruce Leban wrote: > It seems to me that a better way of doing this is: > > def func(optional_list=[]) > optional_list = freeze(optional_list) > > That is, if I expect the list to be immutable when it's empty, why wouldn't > I expect it to be immutable when it's not empty? 
The only case where the > immutability of the empty list matters is if there's a bug that changes the > list (or a called function changes its signature when that wasn't expected). > Wouldn't it be worth protecting against that when the list isn't empty as > well? Absolutely, but the mutation of the default parameters seems to be the main problem (historically): it's a memory leak, and it's a global data corruption, where modifying a provided parameter is a local data corruption (unless the object passed in is global of course). Ideally, you could just add a decorator or an annotation doing that for you without additional work and name mutation within the function. > Anyway, in the case of a list I suspect that this is pretty close to what > you want: > > def func(optional_list=[]) > optional_list = tuple(optional_list) But does not necessarily work depending on what callees demand (a callee may be trying to concatenate that to a list of its own, and concatenating lists and tuples does not work). Plus, it does not help with dicts, which can expose the same issue. From bruce at leapyear.org Fri May 20 21:53:06 2011 From: bruce at leapyear.org (Bruce Leban) Date: Fri, 20 May 2011 12:53:06 -0700 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <19A0EE0D-2881-489B-B344-C56ABB501C27@masklinn.net> Message-ID: On Fri, May 20, 2011 at 12:27 PM, Masklinn wrote: > > Absolutely, but the mutation of the default parameters seems to be the main > problem (historically): it's a memory leak, and it's a global data > corruption, > where modifying a provided parameter is a local data corruption (unless the > object passed in is global of course). > > Agreed, but that wasn't how I interpreted Jack's question. I think there are two issues (with the first one the one that I think Jack was targeting): (1) I want to make sure a parameter is immutable so I don't accidentally change it (and none of the functions I call can do that). Akin to declaring a parameter const in C-like languages. For example, is_ip_blocked(ip_address, blocked_ip_list). (That's not just ip_address in blocked_ip_list if the list contains CIDR addresses.) We could add code to make the values immutable or use annotations: @const def func(x : const, y : const = []): pass Personally, I like declaring the contract that a parameter is not being modified explicitly and I would like a shallow freeze() function. (2) The gotcha that the default value is the same value every time rather than a new value. Lot's of ways to deal with this but none of them work without educating people how the feature works. For example: @copy def func(x : copy, y : copy = []): pass --- Bruce Latest blog post: http://www.vroospeak.com Your social security number is a very poor password Learn how to hack web apps: http://j.mp/gruyere-security -------------- next part -------------- An HTML attachment was scrubbed... URL: From stutzbach at google.com Fri May 20 22:21:02 2011 From: stutzbach at google.com (Daniel Stutzbach) Date: Fri, 20 May 2011 13:21:02 -0700 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: References: Message-ID: On Thu, May 19, 2011 at 9:46 PM, Jack Diederich wrote: > return nothing when asked for something and raise a ValueError when > any attempt is made to add/remove items. 
> Couldn't you just use the empty immutable version for whatever type the optional might be? For sequences, use (). For sets, use frozenset(). For dicts, use ... oh. Crap. -- Daniel Stutzbach -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Fri May 20 22:24:08 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 20 May 2011 14:24:08 -0600 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <19A0EE0D-2881-489B-B344-C56ABB501C27@masklinn.net> Message-ID: On Fri, May 20, 2011 at 1:53 PM, Bruce Leban wrote: > On Fri, May 20, 2011 at 12:27 PM, Masklinn wrote: > >> >> Absolutely, but the mutation of the default parameters seems to be the >> main >> problem (historically): it's a memory leak, and it's a global data >> corruption, >> where modifying a provided parameter is a local data corruption (unless >> the >> object passed in is global of course). >> >> > Agreed, but that wasn't how I interpreted Jack's question. I think there > are two issues (with the first one the one that I think Jack was targeting): > > (1) I want to make sure a parameter is immutable so I don't accidentally > change it (and none of the functions I call can do that). Akin to declaring > a parameter const in C-like languages. For example, is_ip_blocked(ip_address, > blocked_ip_list). (That's not just ip_address in blocked_ip_list if the > list contains CIDR addresses.) We could add code to make the values > immutable or use annotations: > > @const > def func(x : const, y : const = []): > pass > > Personally, I like declaring the contract that a parameter is not being > modified explicitly and I would like a shallow freeze() function. > > (2) The gotcha that the default value is the same value every time rather > than a new value. Lot's of ways to deal with this but none of them work > without educating people how the feature works. For example: > > @copy > def func(x : copy, y : copy = []): > pass > > One bad solution to both would be to have the language enforce that default values cannot be of mutable type. Though not a valid solution, that idea highlights the only two reasons I can see for using mutable defaults: - caching across function calls - having your default be of the same type as your expected argument (a documentation of sorts) The idiom of using None that Jack originally described is an acceptable alternative to using mutable defaults. It identifies no expectations on the type of the argument. It explicitly indicates that None is a valid argument. It implies that it will be special-cased in the function/class. It is easy to be consistent using None regardless of the expected type of the argument. The advantage of Jack's original proposal is that in cases where you are not modifying the argument your won't need to plug the correct object in for None, so that if statement he included would not be necessary. -eric > --- Bruce > Latest blog post: http://www.vroospeak.com Your social security number is > a very poor password > Learn how to hack web apps: http://j.mp/gruyere-security > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jackdied at gmail.com Sat May 21 01:10:01 2011 From: jackdied at gmail.com (Jack Diederich) Date: Fri, 20 May 2011 19:10:01 -0400 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> Message-ID: On Fri, May 20, 2011 at 11:03 AM, Nick Coghlan wrote: > I share Steve's puzzlement as the intended use case. > > To get value from the magic empty immutable list, you will have to > explicitly test that calling your function with the default value does > the right thing. > > But if you're writing an explicit test, having that test call the > function *twice* to confirm correct use of the 'is None' idiom will > work just as well. > > There are limits to how much we can help people that don't test their code. The use case isn't very fancy, it's to have a generic empty iterable as a place holder in function defs instead of doing the "if x is None" dance. Unit tests make using a real empty iterable less likely to trigger bad behavior, but because the behavior of real empty iterables in function defs is tricky and non-intuitive, unit tests for some of those functions would need some extra boilerplate. That might not be a bad tradeoff compared to adding extra "if x is None" checks for each optional arg to a function. FYI, here is the code that triggered the query. The "if None" check is mostly habit with a small dose of pedagogical reinforcement for other devs that would read it. def query_sphinx(search_text, include=None, exclude=None): if include is None: include = {} if exclude is None: exclude = {} query = sphinxapi.client() for field, values in include.items(): query.SetFilter(field, values) for field, values in exclude.items(): query.SetFilter(field, values, exclude=True) return query.query(search_text) -Jack From ericsnowcurrently at gmail.com Sat May 21 02:03:17 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 20 May 2011 18:03:17 -0600 Subject: [Python-ideas] __implements__ on arguments to ABCMeta.register Message-ID: ABCMeta.register is great. It adds the cls argument to the _abc_registry of the ABC. However, the class that was passed in does not get touched. If you then want to find out to which classes a class has been registered, you can't find out from that class. Whereas your can find out from an abstract base class which classes have been registered to it. I propose having ABCMeta.register add/update a special method __implements__ to the class that is getting registered. This would not be done to builtin/extension types. It adds the ABC to the __implements__ of the subclass that is getting registered. Something along these lines, right before the final return in the method: if not hasattr(subclass, "__implements__"): try: subclass.__implements__ = {cls} except TypeError: pass else: subclass.__implements__.add(cls) This is a small addition, but I realize it [potentially] adds another special method to classes, so it's not trivial. The use case is that I want to be able to validate that a class implements all of the abstract methods of all the classes to which it has been registered. I don't have a programmatic way of discovering that set without asking every class out there. This is an easy way to accomplish this (for non-extension/non-builtin types). An alternative is to subclass ABCMeta and tack this on, but that only works for my ABCs. 
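For illustration, that subclassing route only takes a few lines -- a rough sketch, where the metaclass name is mine rather than a proposed API:

    import abc

    class ImplementsABCMeta(abc.ABCMeta):
        """ABCMeta variant whose register() also records the ABC on the
        class being registered (builtin/extension types are skipped)."""
        def register(cls, subclass):
            super().register(subclass)
            try:
                implemented = set(subclass.__dict__.get("__implements__", ()))
                implemented.add(cls)
                subclass.__implements__ = implemented
            except TypeError:
                pass  # e.g. built-in types reject new attributes
            return subclass

    class MyABC(metaclass=ImplementsABCMeta):
        pass

    class Concrete:
        pass

    MyABC.register(Concrete)
    print(Concrete.__implements__)  # {<class '__main__.MyABC'>}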
Another is to use a class decorator to do this any place I do a register (or even to do the register too), but again, only for the places that I do the registration. Anyway, if it's useful to me then it may be useful to others, so I wanted to put this out there. I expect this has come up before, particularly during discussions about PEP 3119. However, I wasn't able to track down anything specifically about doing this sort of "reverse registration". And, of course, I may be overestimating the value of this functionality. If this does not seem that valuable to anyone else, then no big deal. :) -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From dbaker3448 at gmail.com Sat May 21 02:57:23 2011 From: dbaker3448 at gmail.com (Dan Baker) Date: Fri, 20 May 2011 19:57:23 -0500 Subject: [Python-ideas] Filtered "for" loop with list-comprehension-like syntax Message-ID: One common pattern I run across when parsing plain text data files is that I want to skip over blank lines when processing. If I wanted to build a list of all non-blank lines in the file, I could simply do: lines = [line for line in input_file if line.strip()] But as a loop, it almost invariably gets written as: for line in input_file: if not line.strip(): continue # do the real processing It seems odd that "for x in y if z" is allowed in comprehensions but not in a regular for loop. Why not let for x in y if z: do_stuff(x) be a shorthand for for x in y: if not z: continue do_stuff(x) Similarly, I occasionally have multiple sections that need to be handled differently. One way to write this is: for line in input_file: if is_section_delimiter(line): break do_stuff_1(line) for line in input_file: # this picks up where the last one left off if is_section_delimiter(line): break do_stuff_2(line) etc. (This is a little bit of a weird idiom with files since repeated iteration over them remembers where it left off, at least in 2.7.) It would be nice to have this shorthand for it: for line in input_file while not is_section_delimiter(line): do_stuff_1(line) for line in input_file while not is_section_delimiter(line): do_stuff_2(line) etc. This makes it more immediately clear (to me, at least) that it stops at the end of the section. This could also be added to comprehensions; it's somewhat tricky to emulate in comprehensions now. I think the easiest way to do the equivalent of [f(x) for x in y while z] with a comprehension is a = [(f(x) if z else None) for x in y] try: idx = a.index(None) except ValueError: # no None found pass else: # truncate before first None a = a[:idx] but even that fails if None is a potentially valid result of f(x) (or if you forget to use the try/except block and z was always True), and it processes the entire list even though it may throw out a sizable chunk of it immediately after. The only totally safe way I can think of to do it now is by unpacking it into a loop: a = [] for x in y: if not z: break a.append(f(x)) I think adding these would make such idioms a little more readable, but it might not be enough of a gain to justify a syntax addition. Thoughts? Dan Baker From ben+python at benfinney.id.au Sat May 21 03:51:51 2011 From: ben+python at benfinney.id.au (Ben Finney) Date: Sat, 21 May 2011 11:51:51 +1000 Subject: [Python-ideas] Filtered "for" loop with list-comprehension-like syntax References: Message-ID: <87liy0ddmw.fsf@benfinney.id.au> Dan Baker writes: > It seems odd that "for x in y if z" is allowed in comprehensions but > not in a regular for loop. 
Why not let > > for x in y if z: > do_stuff(x) > > be a shorthand for > > for x in y: > if not z: > continue > do_stuff(x) This can already be spelled: for x in (w for w in y if z): do_stuff(x) Which is not to forestall discussion of the proposed language change, but only to point out that there is an existing idiom for this. > Similarly, I occasionally have multiple sections that need to be > handled differently. One way to write this is: > for line in input_file: > if is_section_delimiter(line): > break > do_stuff_1(line) > for line in input_file: # this picks up where the last one left off > if is_section_delimiter(line): > break > do_stuff_2(line) > etc. That looks like it would be better modelled with an explicit state transition when the condition is encountered, without stopping the iteration: handlers = [do_stuff_1, do_stuff_2, do_stuff_3] handle_line = handlers.pop(0) for line in input_file: if is_section_delimiter(line): handle_line = handlers.pop(0) handle_line(line) -- \ ?Philosophy is questions that may never be answered. Religion | `\ is answers that may never be questioned.? ?anonymous | _o__) | Ben Finney From guido at python.org Sat May 21 04:10:13 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 20 May 2011 19:10:13 -0700 Subject: [Python-ideas] Filtered "for" loop with list-comprehension-like syntax In-Reply-To: References: Message-ID: On Fri, May 20, 2011 at 5:57 PM, Dan Baker wrote: > One common pattern I run across when parsing plain text data files is > that I want to skip over blank lines when processing. If I wanted to > build a list of all non-blank lines in the file, I could simply do: > > lines = [line for line in input_file if line.strip()] > > But as a loop, it almost invariably gets written as: > > for line in input_file: > ? if not line.strip(): > ? ? ?continue > ? # do the real processing > > It seems odd that "for x in y if z" is allowed in comprehensions but > not in a regular for loop. Why not let > > for x in y if z: > ? do_stuff(x) > > be a shorthand for > > for x in y: > ? if not z: > ? ? ?continue > ? do_stuff(x) Yes, "why not" indeed. Because you can already do that in any number of different ways; you showed one (two if you count the comprehension), another is for x in y: if z: do_stuff(x) Do we really need more ways to spell the same thing? (Hint: this is a rhetorical question. I recommend you study the zen of Python before replying.) > Similarly, I occasionally have multiple sections that need to be > handled differently. One way to write this is: > for line in input_file: > ? if is_section_delimiter(line): > ? ? ?break > ? do_stuff_1(line) > for line in input_file: # this picks up where the last one left off > ? if is_section_delimiter(line): > ? ? ? break > ? do_stuff_2(line) > etc. > (This is a little bit of a weird idiom with files since repeated > iteration over them remembers where it left off, at least in 2.7.) > > It would be nice to have this shorthand for it: > for line in input_file while not is_section_delimiter(line): > ? do_stuff_1(line) > for line in input_file while not is_section_delimiter(line): > ? do_stuff_2(line) > etc. > > This makes it more immediately clear (to me, at least) Ay, there's the rub. More syntactical options means more things to learn for every single Python user. One of the attractions of Python is that it is relatively small and simple. Let's keep it that way! > that it stops > at the end of the section. 
This could also be added to comprehensions; > it's somewhat tricky to emulate in comprehensions now. I think the > easiest way to do the equivalent of [f(x) for x in y while z] with a > comprehension is > a = [(f(x) if z else None) for x in y] > try: > ? idx = a.index(None) > except ValueError: # no None found > ? pass > else: # truncate before first None > ? a = a[:idx] > but even that fails if None is a potentially valid result of f(x) (or > if you forget to use the try/except block and z was always True), and > it processes the entire list even though it may throw out a sizable > chunk of it immediately after. The only totally safe way I can think > of to do it now is by unpacking it into a loop: > a = [] > for x in y: > ? if not z: > ? ? ?break > ? a.append(f(x)) > > I think adding these would make such idioms a little more readable, > but it might not be enough of a gain to justify a syntax addition. > Thoughts? Indeed it is not enough. -- --Guido van Rossum (python.org/~guido) From steve at pearwood.info Sat May 21 04:13:21 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 21 May 2011 12:13:21 +1000 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: <19A0EE0D-2881-489B-B344-C56ABB501C27@masklinn.net> References: <19A0EE0D-2881-489B-B344-C56ABB501C27@masklinn.net> Message-ID: <201105211213.21596.steve@pearwood.info> On Sat, 21 May 2011 04:15:18 am you wrote: > On 2011-05-20, at 17:03 , Nick Coghlan wrote: > > I share Steve's puzzlement as the intended use case. > > > > To get value from the magic empty immutable list, you will have to > > explicitly test that calling your function with the default value > > does the right thing. > > Why is that? The value of the empty immutable list (there's nothing > magic to it) would be an eternal assertion that an incorrect behavior > (trying to mutate the default parameter) can not be introduced in the > function. It's not the caller's responsibility to avoid mangling the internals of the function. It is the function's responsibility to avoid exposing those internals. This suggestion seems crazy to me. Let me try to explain from the point of view of the caller. Suppose I call func and get a list back: x = func(a) So I can treat x as a list, because that's what it is: x.append(None) But if I fail to pass an argument, and the default empty() is used, I get something that looks like a list: y = func() hasattr(y, "append") # returns True but blows up when I try to use it: y.append(None) # raise an exception All because the function author doesn't want me modifying the return result. And why does the author care what I do with the result? Because he's exposing the default function value in such a way that the caller can mangle it. If it causes problems when the function returns the default value, stop returning the default value! Don't push the burden onto the caller by dropping a landmine into their code. Now you can "fix" this, for some definition of "fix", by documenting the fact that not passing the argument will result in something other than a list: "If you don't pass an argument, and use the default, then you will get back an immutable empty sequence that has the same API as a list but that will raise an exception if you try to mutate it." This is downright awful API design. As the caller, I simply don't care about the function author's difficulties in ensuring that the default value is not modified. That's Not My Problem. Fix your own buggy code. 
(Not that it is actually difficult: the idiom for mutable default values is two simple lines.) What Is My Problem is that rather than fix his function, the author has dumped the problem in my lap. Now I have this immutable empty sequence that is useless to me. I either have to detect it and change it myself: result = func(*args) # args could be empty if result is empty(): # Fix stupid design flaw in func result = [] or I have to remember to never, under any circumstances, call func() without supplying an argument. [...] > There are 17 functions or methods with list default parameters and > 133 with dict default parameters in the Python standard library. > > Surely some of them legitimately make use of a mutable default > parameter as some kind of process-wide cache or accumulator, but > I would doubt the majority does (why would SMTP.sendmail need to > accumulate data in its mail_options parameter across runs?) > > Do you know for sure that no mutation of these 150+ parameters will > ever be introduced, that all of these functions and methods are > sufficiently tested, called often enough that the introduction of > a mutation of the default parameter in themselves or one of their > callees would *never* be able to pass muster? Fine, you've discovered 150 potentially buggy functions in the standard library. If the authors didn't remember to use the default=None idiom in their functions, what makes you think that they'd remember to use default=empty() instead? This suggested idiom is counterproductive. The function author doesn't save any work -- he still has to remember not to write default=[] in his functions. The author's burden is increased, because now he has to choose between three idioms instead of two: # use this when default is like a cache default=[] # use this when you need to mutate default within the function default=None if default is None: default = [] # use this when you want to return the default value but don't want # the caller to mutate it default=empty() And the caller's burden is increased, because now he has to deal with this immutable list instead of a real list. -- Steven D'Aprano From steve at pearwood.info Sat May 21 05:12:58 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 21 May 2011 13:12:58 +1000 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: <9202D06F-4D45-45C9-A487-9C8097494716@masklinn.net> References: <4DD6BD2D.5050801@stoneleaf.us> <9202D06F-4D45-45C9-A487-9C8097494716@masklinn.net> Message-ID: <201105211312.58826.steve@pearwood.info> On Sat, 21 May 2011 05:10:25 am Masklinn wrote: > On 2011-05-20, at 21:12 , Ethan Furman wrote: > > In this scenario: > > > > def func(mylist=empty()): > > do_some_stuff_with_mylist > > > > mylist is the empty() object, you *know* func() does not modify > > mylist, you are wrong (heh) and it does... but your program always > > calls func() with an actual list -- how is empty() going to save > > you then? > > It will not until one day func() is called without an actual list, > and then you get a clear and immediate error instead of silent data > corruption, or a memory leak, I wouldn't call it a memory leak. As I understand it, a memory leak is normally understood to mean that your program is assigning memory in such a way that neither you, nor the compiler, can free it, not that you merely haven't noticed that you're assigning memory. Since the default value is exposed, either the function or the caller can free that memory. 
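To see that concretely, the accumulated state is not lost at all -- it sits on the function object where anyone can reach it (a quick sketch):

    def f(x, acc=[]):
        acc.append(x)
        return acc

    f(1)
    f(2)
    print(f.__defaults__)     # ([1, 2],) -- still referenced by f itself
    del f.__defaults__[0][:]  # and anyone holding f can empty it again

Unpleasant, perhaps, but not a leak.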
> which are generally the result of an > improperly mutated default argument collection and much harder to > spot. You are simply wrong there. There is no reason to imagine that the caller will *immediately* attempt to modify the result: y = func() # returns immutable empty list y.append(None) The attempt to mutate y might not happen until much later, in some distant part of the code, in another function, or module, or thread, or even another process. There is no limit to how distant in time or space the exception could be. It's a landmine waiting to blow up, not an assertion. y = func() # ... much later data = {'key': y} # ... much later still params.update(data) response = connect('something', params) def connect(x, params): a = params.get('key', []) a.append('something') When connect fails, there's nothing to associate the error with the mistake of calling func() without supplying an argument. But note that calling func() without an argument is supposed to be legal. Why is it a mistake? It's only a mistake because func exposes internal data to the caller, and then compounds that bug by punishing the caller for inadvertently modifying that internal data rather than not exposing it in the first place. This is *astonishingly* awful design. [...] > That's it. That's a pretty common bug in Python, and it solves it. No > more, and no less. This doesn't solve the problem, it just creates a new one. If people can't remember to use the "if default is None" idiom, what makes you think they will remember to use empty()? And if they do remember, they're just disguising their bug as the caller's mistake. -- Steven D'Aprano From steve at pearwood.info Sat May 21 05:19:49 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 21 May 2011 13:19:49 +1000 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: References: <4DD6BF8C.9020704@stoneleaf.us> Message-ID: <201105211319.49732.steve@pearwood.info> On Sat, 21 May 2011 05:18:30 am Masklinn wrote: > Why special-case empty lists when there is no need to? That's a remarkable statement. What is empty() except a special case for empty lists? If you and Jack are serious about this proposal, it would require at least two such functions, emptylist and emptydict, not just empty(). And even if you are right that it solves the problem of default=[] (which you aren't, but for the sake of the argument lets pretend), it doesn't solve the general issue of mutable defaults. As I said, having a freeze() function that creates an immutable list might be a good idea, although not for the default argument issue. (I'm not entirely sure how that differs from tuple, but that's another issue...) But special casing a frozen empty list seems silly, and the use-case given by the OP, and defended by you, is actively harmful. -- Steven D'Aprano From greg.ewing at canterbury.ac.nz Sat May 21 05:57:30 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 21 May 2011 15:57:30 +1200 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <4DD69748.3030608@stoneleaf.us> <4DD6B970.10102@stoneleaf.us> Message-ID: <4DD7382A.6020104@canterbury.ac.nz> Masklinn wrote: > Again, this default parameter is for functions which *are not supposed to* modify > collections they were provided as parameters (which is the vast majority of functions, > really). 
Your empty() default would do nothing to catch attempts to modify a passed-in list. If you're worried about the function erroneously modifying the default value, you should be just as worried about that. -- Greg From dbaker3448 at gmail.com Sat May 21 06:06:04 2011 From: dbaker3448 at gmail.com (Dan Baker) Date: Fri, 20 May 2011 23:06:04 -0500 Subject: [Python-ideas] Filtered "for" loop with list-comprehension-like syntax In-Reply-To: References: Message-ID: I had a feeling that might be the answer. Sometimes a little syntactic sugar isn't bad, but even reasonable people won't always agree on which kinds - and too far down that road lies Perl. Thanks anyway. Dan From greg.ewing at canterbury.ac.nz Sat May 21 06:10:55 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 21 May 2011 16:10:55 +1200 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <19A0EE0D-2881-489B-B344-C56ABB501C27@masklinn.net> Message-ID: <4DD73B4F.5000902@canterbury.ac.nz> Masklinn wrote: > Absolutely, but the mutation of the default parameters seems to be the main > problem (historically): The most common problem regarding default parameters is mutation of them by functions which *are* supposed to modify the parameter. IMO you're trying to solve an almost-nonexistent problem. > it's a global data corruption, > where modifying a provided parameter is a local data corruption It's just as damaging, though -- the program still produces incorrect results. -- Greg From guido at python.org Sat May 21 06:14:51 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 20 May 2011 21:14:51 -0700 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: <4DD73B4F.5000902@canterbury.ac.nz> References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <19A0EE0D-2881-489B-B344-C56ABB501C27@masklinn.net> <4DD73B4F.5000902@canterbury.ac.nz> Message-ID: Please end this thread. The original empty() proposal is clearly not working, and no modification of it is going to work. The recommended pattern is very clear and matches the Zen of Python: Explicit is better than implicit. -- --Guido van Rossum (python.org/~guido) From cs at zip.com.au Sat May 21 07:25:04 2011 From: cs at zip.com.au (Cameron Simpson) Date: Sat, 21 May 2011 15:25:04 +1000 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: References: Message-ID: <20110521052504.GA25306@cskk.homeip.net> On 20May2011 15:11, Terry Reedy wrote: | On 5/20/2011 1:51 PM, Masklinn wrote: | | I am as puzzled as other people. | | >empty() is both an empty list (because the code iterates over a list | >for instance, or maps it, or what have you) and an assertion that | >this list is *not* to be modified. | | So use () as the default. It has all the methods of [] except for | the mutation methods. You're missing the point. This thread is about providing a complex solution to a common problem. Your technique of providing a simple solution to the problem doesn't help the thread persist. [ Hmm, I see my random sig quoter has hit the money again:-) Truly, the quote below was pot luck! ] Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ If you can keep your head while all those about you are losing theirs, perhaps you don't understand the situation. 
- Paul Wilson

From tjreedy at udel.edu Sat May 21 23:06:01 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 21 May 2011 17:06:01 -0400
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To:
References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <4DD69748.3030608@stoneleaf.us> <4DD6B970.10102@stoneleaf.us> <4DD6BF8C.9020704@stoneleaf.us>
Message-ID:

On 5/20/2011 3:18 PM, Masklinn wrote:
>
> Because you're iterating period, and that it's an empty list has no
> influence on your behavior. You'll simply do nothing during your
> iteration, because the iteration count will be 0. Why special-case
> empty lists when there is no need to?

An empty tuple () works fine for that. You never explained in any way I could remotely understand why you want something else.

--
Terry Jan Reedy

From tjreedy at udel.edu Sat May 21 23:48:55 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 21 May 2011 17:48:55 -0400
Subject: [Python-ideas] function defaults and an empty() builtin
In-Reply-To:
References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info>
Message-ID:

On 5/20/2011 7:10 PM, Jack Diederich wrote:
> The use case isn't very fancy, it's to have a generic empty iterable

If () and frozenset() are not generic enough for *you*, try this:

    def empty():
        raise StopIteration
        yield

    emp = empty()
    for i in emp: print('something')
    for i in emp: print('something')
    # prints nothing, both times.

Actually, iter(()), iter([]), iter({}) behave the same when iterated.

> as a place holder in function defs instead of doing the "if x is None"

That idiom is for a completely different use case: when one wants a new empty *mutable* on every call, that will be filled with values and, typically, returned.

> FYI, here is the code that triggered the query. The "if None" check
> is mostly habit with a small dose of pedagogical reinforcement for
> other devs that would read it.
>
> def query_sphinx(search_text, include=None, exclude=None):
>     if include is None:
>         include = {}
>     if exclude is None:
>         exclude = {}

Since you are not mutating include and exclude, there is no point to this noise. In fact, I consider it wrong because it actually *misleads* other devs who would expect something put into each of them and returned. The proper way to write this is

    def query_sphinx(search_text, include={}, exclude={}):

which documents that the parameters should be dicts (or similar) and that they are read-only. Your version implies that it would be ok to write to them, which is wrong.
> > query = sphinxapi.client() > > for field, values in include.items(): > query.SetFilter(field, values) > for field, values in exclude.items(): > query.SetFilter(field, values, exclude=True) > > return query.query(search_text) -- Terry Jan Reedy From tjreedy at udel.edu Sat May 21 23:57:44 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 21 May 2011 17:57:44 -0400 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <19A0EE0D-2881-489B-B344-C56ABB501C27@masklinn.net> Message-ID: On 5/20/2011 3:10 PM, Bruce Leban wrote: > It seems to me that a better way of doing this is: > > def func(optional_list=[]) > optional_list = freeze(optional_list) If a function only needs a read-only sequence, it should not require or to said to require a list. "def func(optional_seq = ()):". If it only iterates through an input collection, it should only require an iterable: "def func(optional_iter=()): it = iter(optional_iter)". If you are paranoid and want the function to raise on any attempt to do much of anything with the input, replace '()' with 'iter()'. -- Terry Jan Reedy From tjreedy at udel.edu Sun May 22 00:53:14 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 21 May 2011 18:53:14 -0400 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: <19A0EE0D-2881-489B-B344-C56ABB501C27@masklinn.net> References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> <19A0EE0D-2881-489B-B344-C56ABB501C27@masklinn.net> Message-ID: On 5/20/2011 2:15 PM, Masklinn wrote: = > There are 17 functions or methods with list default parameters and > 133 with dict default parameters in the Python standard library. Could you give the re or whatever that you used to find these? I might want to look and possibly change a few. > > Surely some of them legitimately make use of a mutable default > parameter as some kind of process-wide cache or accumulator, but > I would doubt the majority does (why would SMTP.sendmail need to > accumulate data in its mail_options parameter across runs?) I suspect that that nearly all of these uses are for read-only inputs. In Python 3, () could replace [] in such cases, as tuples now have all the read-only sequence methods (they once had no methods). Of course, even that does not protect against perhaps crazy code like if input: # skips empty args, default or not That suggests that all functions that are supposes to only read an input sequences should be tested with tuples. Actually, if only an iterable is needed, then such should be tested with non-seequence iterables. That is actually very easy to produce: for instance, iter((1,2,3)). Thinking about it more, it the only use of an arg is to iterate through key,value pairs, then the default could be 'iter({})' insteaad of '{}' to better document the usage. > Do you know for sure that no mutation of these 150+ parameters will > ever be introduced, that all of these functions and methods are > sufficiently tested, called often enough that the introduction of > a mutation of the default parameter in themselves or one of their > callees would *never* be able to pass muster? No. However, anyone qualified for push access to the central source should know that defaults should be treated as read-only unless documented otherwise. This is especially true for {}. 
So I consider it a somewhat paranoid worry, in the absence of cases where revisers *have* introduced mutation where not present before. That aside, developers have and are improving the test suite. That was the focus of the recent post-PyCon sprint. It continues with a test improvement most every day. If you want to join us volunteers to improve tests further, please do. -- Terry Jan Reedy From tjreedy at udel.edu Sun May 22 01:01:36 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 21 May 2011 19:01:36 -0400 Subject: [Python-ideas] Filtered "for" loop with list-comprehension-like syntax In-Reply-To: References: Message-ID: On 5/20/2011 8:57 PM, Dan Baker wrote: > It seems odd that "for x in y if z" is allowed in comprehensions but > not in a regular for loop. Comprehensions are expressions and therefore need everything packed into them that is needed. For statements and if statements are statements, and both can be followed in their suite by as many statements as needed, so there is no *need* to pack more than is necessary into the header line. Even doc strings, which are conceptually part of the header, were put down into the suite. -- Terry Jan Reedy From tjreedy at udel.edu Sun May 22 01:41:20 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 21 May 2011 19:41:20 -0400 Subject: [Python-ideas] Use iter() for defaults (was Re: function defaults and an empty() builtin) In-Reply-To: References: <201105201557.05197.steve@pearwood.info> <2E9DE818-85B4-4601-9170-DBD041B53F56@masklinn.net> <201105202154.30833.steve@pearwood.info> Message-ID: On 5/21/2011 5:48 PM, Terry Reedy wrote:
>> def query_sphinx(search_text, include=None, exclude=None):
>>     if include is None:
>>         include = {}
>>     if exclude is None:
>>         exclude = {}
>
> Since you are not mutating include and exclude, there is no point to
> this noise. In fact, I consider it wrong because it actually *misleads*
> other devs who would expect something put into each of them and
> returned. The proper way to write this is
>
> def query_sphinx(search_text, include={}, exclude={}):
>
> which documents that the parameters should be dicts (or similar) and
> that they are read only. Your version implies that it would be ok to
> write to them, which is wrong.
>
>>
>> query = sphinxapi.client()
>>
>> for field, values in include.items():
>>     query.SetFilter(field, values)
>> for field, values in exclude.items():
>>     query.SetFilter(field, values, exclude=True)
>>
>> return query.query(search_text)

Here is a back-compatible rewrite that expands the domain for 'include' and 'exclude' to iterables of key-value pairs. It both documents and ensures that query_sphinx() will do nothing but iterate through key-value pairs from the last two args.

def query_sphinx(search_text, include=iter({}.values()), exclude=iter({}.values())):
    if isinstance(include, dict):
        include = include.items()
    if isinstance(exclude, dict):
        exclude = exclude.items()
    query = sphinxapi.client()
    for field, values in include:
        query.SetFilter(field, values)
    for field, values in exclude:
        query.SetFilter(field, values, exclude=True)
    return query.query(search_text)

Lifting effectively constant expressions out of a loop is a standard technique. In this case, the 'loop' is whatever would cause repeated calls to the function without explicit args. The small define-time cost of the extra calls would eventually be saved at runtime by reusing the dict_values iterators instead of creating equivalent ones over and over.
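A minimal sketch of the evaluate-once behaviour this rewrite relies on (illustrative only; time.time() just makes the single evaluation visible):

import time

def stamp(t=time.time()):     # the default expression runs once, when the def executes
    return t

first = stamp()
second = stamp()
assert first == second        # the default is not re-evaluated on later calls
assert stamp(t=0.0) == 0.0    # an explicit argument bypasses the default entirely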
-- Terry Jan Reedy From rob.cliffe at btinternet.com Sun May 22 13:41:45 2011 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Sun, 22 May 2011 12:41:45 +0100 Subject: [Python-ideas] function defaults and an empty() builtin In-Reply-To: <201105211319.49732.steve@pearwood.info> References: <4DD6BF8C.9020704@stoneleaf.us> <201105211319.49732.steve@pearwood.info> Message-ID: <4DD8F679.6060703@btinternet.com> On 21/05/2011 04:19, Steven D'Aprano wrote: > On Sat, 21 May 2011 05:18:30 am Masklinn wrote: >> Why special-case empty lists when there is no need to? > That's a remarkable statement. What is empty() except a special case for > empty lists? > > If you and Jack are serious about this proposal, it would require at > least two such functions, emptylist and emptydict, not just empty(). > And even if you are right that it solves the problem of default=[] > (which you aren't, but for the sake of the argument lets pretend), it > doesn't solve the general issue of mutable defaults. > Or all collections could have a "mutable" attribute which, once it has been set to False, can never subsequently be reset to True. Then you could merge lists and tuples into a single type, ditto sets and frozen sets, and you get immutable dictionaries as well. Plus a considerable simplification of the language. From stephen at xemacs.org Sun May 22 17:46:20 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 23 May 2011 00:46:20 +0900 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87boyubuwj.fsf@uwakimon.sk.tsukuba.ac.jp> Terry Reedy writes: > As far as I noticed, Ethan did not explain why he was extracting single > bytes and comparing to a constant, so it is hard to know if he was even > using them properly. It doesn't really matter whether Ethan is using them properly. It's clear there are such uses, though I don't know how important they are, so we may as well assume Ethan's is one such. > > Japanese mail is transmitted via SMTP, and the control function > > "hello" is still spelled "EHLO" in Japanese mail. > > I am not familiar with that control function, but if it is part of > the SMTP protocol, it has nothing to do with the language of the > payload. Precisely my point. Therefore a payload represented as bytes should be treated as *uninterpreted* bytes, except where interpretations are defined for those bytes. This works for SMTP, because RFC 822 *deliberately* specifies headers to be encoded in ASCII (not "ASCII-compatible") in order that the payload (header) manipulations specified by RFC 821 and friends be guaranteed correct. Nevertheless, people frequently request mail processing features that require manipulations of MIME part bodies and even plain RFC 822 message bodies. These cannot be guaranteed correct unless done by decoding and reencoding, but bytes-oriented manipulations generally "work" in monolingual contexts (or seem to, and any problems can always be blamed on MS Outlook). There are several such features that come up over and over again on Mailman lists and sometimes in the Python Email SIG, and I'm sure the same is true for web protocols. > > Farsi web pages are formatted by HTML, and the control > > function "new line" is spelled "
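<br>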
" in Farsi, of course. > > When writing the html *text* body, sure. But I presume browsers decode > encoded bytes to unicode *before* parsing the text. If so, it does not > really matter that '
<br>' gets encoded to b'<br>
'. HTML is not exclusively processed by browsers. It is often processed by servers and middleware that don't know they're speaking HTML, and according to several experts' testimony, they're in a freakin' hurry to push bytes out the door, there's no time for Unicode (decoding and encoding, OMG how inefficient!) Such developers want to write their libraries using bytes *and* literals that can be used both for binary protocols and for text protocols (urlparse seems to be the canonical example). The convenience of using bytes in a string-like way (eg, the b'' literal) in manipulating many binary protocols is clear. That convenience is just as great for people who are at substantial risk of mojibake if bytes are used to do text manipulations on the encoded form, as well as for people who face little risk (eg, those who use only American English). The question is how far to go with polymorphism, etc. I think that Nick's urlparse work gets the balance about right, and see only danger in more stringlike bytes (eg, by returning b'b' for b'bytes'[0]). OTOH, there are some changes that might be useful but seem very low-risk, such as a c'b' literal that means 98, not b'b'. From ncoghlan at gmail.com Mon May 23 07:46:05 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 23 May 2011 15:46:05 +1000 Subject: [Python-ideas] __implements__ on arguments to ABCMeta.register In-Reply-To: References: Message-ID: On Sat, May 21, 2011 at 10:03 AM, Eric Snow wrote: > This is a small addition, but I realize it [potentially] adds another > special method to classes, so it's not trivial. > The use case is that I want to be able to validate that a class implements > all of the abstract methods of all the classes to which it has been > registered. ?I don't have a programmatic way of discovering that set without > asking every class out there. ?This is an easy way to accomplish this (for > non-extension/non-builtin types). ?An alternative is to subclass ABCMeta and > tack this on, but that only works for my ABCs. ?Another is to use a class > decorator to do this any place I do a register (or even to do the register > too), but again, only for the places that I do the registration. > Anyway, if it's useful to me then it may be useful to others, so I wanted to > put this out there. ?I expect this has come up before, particularly during > discussions about PEP 3119. ?However, I wasn't able to track down anything > specifically about doing this sort of "reverse registration". ?And, of > course, I may be overestimating the value of this functionality. ?If this > does not seem that valuable to anyone else, then no big deal. ?:) An alternative approach to the same idea was to be able to register callbacks with ABCs to track registration and deregistration operations on that ABC and any subclasses. This has the advantage of working with arbitrary objects, including those without mutable __dict__ attributes. Such an approach would start by building a type map (via ABC.__subclasses__) and then using the callback hooks to keep the mapping up to date. I believe there is an open tracker item for that concept, but I can't currently find a reference to it. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From ncoghlan at gmail.com Mon May 23 08:02:18 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 23 May 2011 16:02:18 +1000 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: <87boyubuwj.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> <87boyubuwj.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, May 23, 2011 at 1:46 AM, Stephen J. Turnbull wrote: > The question is how far to go with polymorphism, etc. ?I think that > Nick's urlparse work gets the balance about right, and see only danger > in more stringlike bytes (eg, by returning b'b' for b'bytes'[0]). > OTOH, there are some changes that might be useful but seem very > low-risk, such as a c'b' literal that means 98, not b'b'. If we did go with an ord() literal, I would actually favour something more like 0'b'. However, as Maciej pointed out off-list, adding a new literal type because calls to builtin functions have a relatively high overhead in CPython even with constant arguments probably isn't a good idea. Better to just write "ord('b')" and use PyPy to make it fast (Alternative for use with -O rather than PyPy: "ordb = 98; assert ordb == ord('b')"). Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From stefan_ml at behnel.de Mon May 23 09:19:26 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 23 May 2011 09:19:26 +0200 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> <87boyubuwj.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Nick Coghlan, 23.05.2011 08:02: > On Mon, May 23, 2011 at 1:46 AM, Stephen J. Turnbull wrote: >> The question is how far to go with polymorphism, etc. I think that >> Nick's urlparse work gets the balance about right, and see only danger >> in more stringlike bytes (eg, by returning b'b' for b'bytes'[0]). >> OTOH, there are some changes that might be useful but seem very >> low-risk, such as a c'b' literal that means 98, not b'b'. > > If we did go with an ord() literal, I would actually favour something > more like 0'b'. > > However, as Maciej pointed out off-list, adding a new literal type > because calls to builtin functions have a relatively high overhead in > CPython even with constant arguments probably isn't a good idea. > Better to just write "ord('b')" and use PyPy to make it fast Even CPython could optimise b'x'[0] into a constant, if people ever find this to be a bottleneck. Stefan From lists at cheimes.de Mon May 23 15:40:39 2011 From: lists at cheimes.de (Christian Heimes) Date: Mon, 23 May 2011 15:40:39 +0200 Subject: [Python-ideas] Threading hooks and disable gc per thread In-Reply-To: References: <4DCB228D.2010904@cheimes.de> Message-ID: <4DDA63D7.80305@cheimes.de> Am 15.05.2011 13:13, schrieb Nick Coghlan: (Sorry for the delay, I was swamped with work again) > So the plan is to have threading.Thread support the hooks, while > _thread.start_new_thread and creation of thread states at the C level > (including via PyGILState_Ensure) will bypass them? 
> > That actually sounds reasonable to me (+0), but the PEP should at > least discuss the rationale for the choice of level for the new > feature. I also suggest storing the associated hook lists at the > threading.Thread class object level rather than at the threading > module level (supporting such modularity of state being a major > advantage of only providing this feature at the higher level). I've considered both places, too. _thread.start_new_thread() as well as PyGILState_Ensure() would require a considerable amount of C coding for a feature that won't affect performance in a noticeable way. This is my answer against an C implementation in _thread.start_new_thread(). It's far too much work for a feature that can be implemented in Python easily. An implementation in the pure Python threading module will work on PyPy, IronPython and Jython instantly. I consider any library, that bypasses the threading module, broken, too. PyGILState_Ensure() or PyThreadState_New() are a different beast. I concur, it would the best place for the hooks if I could think of a way to implement the on-thread-stop hook. I don't see a way to execute some code at the end of a thread without cooperation from the calling code. > The PEP should also go into detail as to why having these hooks in a > custom Thread subclass isn't sufficient (e.g. needing to support > threads created by third party libraries, but note that such a > rationale has a problem due to the _thread.start_new_thread loophole). Understood. > Composability through inheritance should also be discussed - the hook > invocation should probably walk the MRO so it is easy to create Thread > subclasses that include class specific hooks without inadvertently > skipping the hooks installed on threading.Thread. Good idea! Do you think, it's sufficient to have hook methods like class Thread: _start_hooks = [] def on_thread_starting(self): for hook, args, kwargs in self._start_hooks: hook(*args, **kwargs) ? Subclasses of threading.Thread can easily overwrite the hook method and call its parent's on_thread_starting(). > The possibility of passing exception information to thread_end hooks > (ala __exit__ methods) should be considered, along with the general > relationship between the threading hooks and the context management > protocol. That's an interesting idea! I'll consider it. >> gc.disable_thread(), gc.enable_thread(), gc.isenabled_thread() >> -------------------------------------------------------------- > > The default setting for this should go in the interpreter state object > rather than in a static variable (subinterpreters can then inherit the > state of their parent interpreter when they are first created). > > Otherwise sounds reasonable. (+0) A subinterpreter flag isn't enough. All subinterpreters share a common GC list. A gc.collect() inside a subinterpreter run affects the entire interpreter and not just the one subinterpreter. I've to think about the issue of subinterpreters ... If I understand the code correctly, gc.get_objects() punches a hole in the subinterpreter isolation. It returns all tracked objects of the current process -- from all subinterpreters. Is this a design issue? The fact isn't mentioned in http://docs.python.org/c-api/init.html#bugs-and-caveats. 
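A short usage sketch of the hook list proposed above, reusing the _start_hooks / on_thread_starting names from the quoted sketch (a proposal in this thread, not an existing threading API; run() stands in here for the interpreter-level call site):

import threading

results = []

class HookedThread(threading.Thread):
    _start_hooks = []                     # (hook, args, kwargs) tuples

    def on_thread_starting(self):
        for hook, args, kwargs in self._start_hooks:
            hook(*args, **kwargs)

    def run(self):
        self.on_thread_starting()         # would be invoked by the thread machinery
        super().run()

HookedThread._start_hooks.append((results.append, ('started',), {}))
t = HookedThread(target=lambda: results.append('ran'))
t.start()
t.join()
assert results == ['started', 'ran']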
Christian From ncoghlan at gmail.com Mon May 23 16:27:04 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 24 May 2011 00:27:04 +1000 Subject: [Python-ideas] Threading hooks and disable gc per thread In-Reply-To: <4DDA63D7.80305@cheimes.de> References: <4DCB228D.2010904@cheimes.de> <4DDA63D7.80305@cheimes.de> Message-ID: On Mon, May 23, 2011 at 11:40 PM, Christian Heimes wrote: > Am 15.05.2011 13:13, schrieb Nick Coghlan: >> Composability through inheritance should also be discussed - the hook >> invocation should probably walk the MRO so it is easy to create Thread >> subclasses that include class specific hooks without inadvertently >> skipping the hooks installed on threading.Thread. > > Good idea! > > Do you think, it's sufficient to have hook methods like > > class Thread: > ? ?_start_hooks = [] > > ? ?def on_thread_starting(self): > ? ? ? ?for hook, args, kwargs in self._start_hooks: > ? ? ? ? ? ?hook(*args, **kwargs) > > ? Subclasses of threading.Thread can easily overwrite the hook method > and call its parent's on_thread_starting(). I was actually thinking of making life even easier for subclasses: class Thread: start_hooks = [] @classmethod def _on_thread_starting(cls): # Hooks in parent classes are called before hooks in child classes hook_sources = reversed(cls.__mro__) for hook_source in hook_sources: # Arguable design decision here: only look at Thread subclasses, not any mixins if not issubclass(hook_source, Thread): continue hooks = hook_src.__dict__.get("start_hooks", ()) for hook, args, kwargs in hooks: hook(*args, **kwargs) With the parent method explicitly walking the whole MRO in reverse, any subclass hooks will naturally be invoked after any parent hooks without any particular effort on the part of the subclass implementor - the just need to provide and populate a "start_hooks" attribute. The alternative would mean that overriding "_start_hooks" in a subclass would block ready access to the main hooks in Thread. >>> gc.disable_thread(), gc.enable_thread(), gc.isenabled_thread() >>> -------------------------------------------------------------- >> >> The default setting for this should go in the interpreter state object >> rather than in a static variable (subinterpreters can then inherit the >> state of their parent interpreter when they are first created). >> >> Otherwise sounds reasonable. (+0) > > A subinterpreter flag isn't enough. All subinterpreters share a common > GC list. A gc.collect() inside a subinterpreter run affects the entire > interpreter and not just the one subinterpreter. I've to think about the > issue of subinterpreters ... > > If I understand the code correctly, gc.get_objects() punches a hole in > the subinterpreter isolation. It returns all tracked objects of the > current process -- from all subinterpreters. Is this a design issue? The > fact isn't mentioned in > http://docs.python.org/c-api/init.html#bugs-and-caveats. It's quite possible - there's a reason that heavy use of subinterpreters has a "this may fail in unexpected ways" rider attached. Still, this is the kind of thing a PEP will hopefully do a reasonable job of flushing out and resolving. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From sturla at molden.no Mon May 23 18:39:07 2011 From: sturla at molden.no (Sturla Molden) Date: Mon, 23 May 2011 18:39:07 +0200 Subject: [Python-ideas] [Python-Dev] CPython optimization: storing reference counters outside of objects In-Reply-To: <4DD9E9A7.50807@v.loewis.de> References: <4DD9E9A7.50807@v.loewis.de> Message-ID: <4DDA8DAB.2060209@molden.no> Den 23.05.2011 06:59, skrev "Martin v. L?wis": > > My expectation is that your approach would likely make the issues > worse in a multi-CPU setting. If you put multiple reference counters > into a contiguous block of memory, unrelated reference counters will > live in the same cache line. Consequentially, changing one reference > counter on one CPU will invalidate the cached reference counters of > that cache line on other CPU, making your problem a) actually worse. In a multi-threaded setting with concurrent thread accessing reference counts, this would certainly worsen the situation. In a single-threaded setting, this will likely be an improvement. CPython, however, has a GIL. Thus there is only one concurrently active thread with access to reference counts. On a thread switch in the interpreter, I think the performance result will depend on the nature of the Python code: If threads share a lot of objects, it could help to reduce the number of dirty cache lines. If threads mainly work on private objects, it would likely have the effect you predict. Which will dominate is hard to tell. Instead, we could use multiple heaps: Each Python thread could manage it's own heap for malloc and free (cf. HeapAlloc and HeapFree in Windows). Objects local to one thread only reside in the locally managed heap. When an object becomes shared by seveeral Python threads, it is moved from a local heap to the global heap of the process. Some objects, such as modules, would be stored directly onto the global heap. This way, objects only used by only one thread would never dirty cache lines used by other threads. This would also be a way to reduce the CPython dependency on the GIL. Only the global heap would need to be protected by the GIL, whereas the local heaps would not need any global synchronization. (I am setting follow-up to the Python Ideas list, it does not belong on Python dev.) Sturla Molden From fuzzyman at gmail.com Mon May 23 20:16:44 2011 From: fuzzyman at gmail.com (Michael Foord) Date: Mon, 23 May 2011 19:16:44 +0100 Subject: [Python-ideas] Implementing __dir__ (moving dir implementation to object.__dir__?) Message-ID: Hello all, I'm looking at implementing __dir__ for a class (mock.Mock as it happens) to include some dynamically added attributes, the canonical use case according to the documentation: http://docs.python.org/dev/reference/datamodel.html?highlight=__dir__#object.__dir__ What I would like to do is report all the "standard attributes", and then add any dynamically created attributes. So the question is, how do I obtain the "standard list" (the list that dir would normally report in the absence of a custom __dir__ implementation)? There is no object.__dir__ (despite the fact that this is how it is documented...) and obviously calling dir(self) within __dir__ is doomed to failure. 
The best I have come up with is: def __dir__(self): return dir(type(self)) + list(self.__dict__) + self._get_dynamic_attributes() This works (absent multiple inheritance), but it would be nice to just be able to do: def __dir__(self): standard = super().__dir__() return standard + self._get_dynamic_attributes() Moving the relevant parts of the implementation of dir into object.__dir__ would be one way to solve that. All the best, Michael Foord -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From amauryfa at gmail.com Mon May 23 20:22:14 2011 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Mon, 23 May 2011 20:22:14 +0200 Subject: [Python-ideas] CPython optimization: storing reference counters outside of objects In-Reply-To: References: <4DD9E9A7.50807@v.loewis.de> <4DDA8DAB.2060209@molden.no> Message-ID: Hi, 2011/5/23 Sturla Molden : > Instead, we could use multiple heaps: > > Each Python thread could manage it's own heap for malloc and free (cf. > HeapAlloc and HeapFree in Windows). Objects local to one thread only reside > in the locally managed heap. > > When an object becomes shared by seveeral Python threads, it is moved from a > local heap to the global heap of the process. Some objects, such as modules, > would be stored directly onto the global heap. Does this mean that the PyObject* address would change? How would you update all the places that store moved references? -- Amaury Forgeot d'Arc From fuzzyman at gmail.com Mon May 23 22:52:23 2011 From: fuzzyman at gmail.com (Michael Foord) Date: Mon, 23 May 2011 21:52:23 +0100 Subject: [Python-ideas] Implementing __dir__ (moving dir implementation to object.__dir__?) In-Reply-To: References: Message-ID: On 23 May 2011 19:16, Michael Foord wrote: > Hello all, > > I'm looking at implementing __dir__ for a class (mock.Mock as it happens) > to include some dynamically added attributes, the canonical use case > according to the documentation: > > > http://docs.python.org/dev/reference/datamodel.html?highlight=__dir__#object.__dir__ > > What I would like to do is report all the "standard attributes", and then > add any dynamically created attributes. > > So the question is, how do I obtain the "standard list" (the list that dir > would normally report in the absence of a custom __dir__ implementation)? > > There is no object.__dir__ (despite the fact that this is how it is > documented...) and obviously calling dir(self) within __dir__ is doomed to > failure. > > The best I have come up with is: > > def __dir__(self): > return dir(type(self)) + list(self.__dict__) + > self._get_dynamic_attributes() > Better version which orders and removes duplicates: return sorted(set((dir(type(self)) + list(self.__dict__) + self._get_dynamic_attributes())) > > This works (absent multiple inheritance), but it would be nice to just be > able to do: > > def __dir__(self): > standard = super().__dir__() > return standard + self._get_dynamic_attributes() > > Moving the relevant parts of the implementation of dir into object.__dir__ > would be one way to solve that. > > All the best, > > Michael Foord > > -- > > http://www.voidspace.org.uk/ > > May you do good and not evil > May you find forgiveness for yourself and forgive others > May you share freely, never taking more than you give. 
> -- the sqlite blessing http://www.sqlite.org/different.html > > > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Mon May 23 23:29:18 2011 From: sturla at molden.no (Sturla Molden) Date: Mon, 23 May 2011 23:29:18 +0200 Subject: [Python-ideas] CPython optimization: storing reference counters outside of objects In-Reply-To: References: <4DD9E9A7.50807@v.loewis.de> <4DDA8DAB.2060209@molden.no> Message-ID: <4DDAD1AE.2060207@molden.no> Den 23.05.2011 20:22, skrev Amaury Forgeot d'Arc: > > Does this mean that the PyObject* address would change? > How would you update all the places that store moved references? > That is a good point. How does the generational GC of .NET and Java deal with object relocation? Perhaps we don't need to allocate new memory and memcpy. A heap is called a "heap" because it is a priority queue of contiguous memory buffers -- free size being the criterion for partial sorting. So we pop the buffer (or parts of it?) containing the PyObject off one heap and paste it to another, the PyObject* will not change. This might not be efficient for cache lines however. Also, there is the question of attributes. Preferably a Python object and its attributes should reside on the same cache line. Sturla From greg.ewing at canterbury.ac.nz Mon May 23 23:33:35 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 24 May 2011 09:33:35 +1200 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> <87boyubuwj.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4DDAD2AF.1050800@canterbury.ac.nz> Stefan Behnel wrote: > Even CPython could optimise b'x'[0] into a constant, if people ever find > this to be a bottleneck. The need to write such circumlocutions would still be a nuisance, though. -- Greg From alexander.belopolsky at gmail.com Tue May 24 00:03:38 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 23 May 2011 18:03:38 -0400 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: <4DDAD2AF.1050800@canterbury.ac.nz> References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> <87boyubuwj.fsf@uwakimon.sk.tsukuba.ac.jp> <4DDAD2AF.1050800@canterbury.ac.nz> Message-ID: On Mon, May 23, 2011 at 5:33 PM, Greg Ewing wrote: > Stefan Behnel wrote: > >> Even CPython could optimise b'x'[0] into a constant, if people ever find >> this to be a bottleneck. > > The need to write such circumlocutions would still be a > nuisance, though. Not a nuisance enough to warrant a syntax change, IMO. Note that one of the proposed alternatives, 0'b' visually is very similar to b'x'[0]. There are plenty of other options available to users. My own favorite is probably, if bytesdata[i] == 98: # ord('b') .. 
In some cases, when single-byte values have protocol mnemonics, it may be more appropriate to give them descriptive names: quit_code = ord('q') if bytesdata[i] == quit_code: .. Finally, I find it rare to have single-byte codes at fixed positions in protocols. More often such codes are found after splitting the bytes data on some kind of separator. From steve at pearwood.info Tue May 24 01:58:23 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 24 May 2011 09:58:23 +1000 Subject: [Python-ideas] Implementing __dir__ (moving dir implementation to object.__dir__?) In-Reply-To: References: Message-ID: <201105240958.23648.steve@pearwood.info> On Tue, 24 May 2011 04:16:44 am Michael Foord wrote: > Hello all, > > I'm looking at implementing __dir__ for a class (mock.Mock as it > happens) to include some dynamically added attributes, the canonical > use case according to the documentation: [...] > Moving the relevant parts of the implementation of dir into > object.__dir__ would be one way to solve that. I haven't yet needed to write a custom __dir__, but your proposal makes sense to me. +1 -- Steven D'Aprano From benjamin at python.org Tue May 24 02:10:24 2011 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 24 May 2011 00:10:24 +0000 (UTC) Subject: [Python-ideas] =?utf-8?q?Implementing_=5F=5Fdir=5F=5F_=28moving_d?= =?utf-8?b?aXIgaW1wbGVtZW50YXRpb24gdG8Jb2JqZWN0Ll9fZGlyX18/KQ==?= References: Message-ID: Michael Foord writes: > Moving the relevant parts of the implementation of dir into object.__dir__ would be one way to solve that. Sounds fine to me. Do file a bug report. From bruce at leapyear.org Tue May 24 02:18:41 2011 From: bruce at leapyear.org (Bruce Leban) Date: Mon, 23 May 2011 17:18:41 -0700 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> <87boyubuwj.fsf@uwakimon.sk.tsukuba.ac.jp> <4DDAD2AF.1050800@canterbury.ac.nz> Message-ID: I like c'x'. It's easy to read and very explicitly constant and clear what the value is 'x'. (Some other letter instead of 'c' would be fine as well.) I don't like this: > if bytesdata[i] == 121: # ord('x') because it looks a heck of a lot like: > if bytesdata[i] == 120: # ord('x') and only one of those is correct. That's a very easy bug to miss. I like it even less without the comment. I don't care for: > if bytesdata[i] == ord('x'): because while ord is a builtin, it's not invulnerable to being changed. In contrast, string constants and numbers are truly constant. I recognize that the compiler can optimize: > if bytesdata[i] == b'x'[0]: but that looks like chicken scratches to me. Someone suggested using 0'x' which I don't quite get. It looks too much like 0x to me and the I've always read the leading zero to mean 'this is a number'. Also, this was raised in the context of bytes and not all characters fit in a byte. So c'?' ord('?') work but b'?'[0] won't. Is there a learning curve? Yes, but minor IMHO and if you don't know it, it's obvious when you see it that you don't know it. --- Bruce Follow me: http://www.twitter.com/Vroo Latest tweet: SO disappointed end of the world didn't happen AGAIN! #y2k #rapture Now waiting for 2038! #unixrapture -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alexander.belopolsky at gmail.com Tue May 24 02:40:51 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 23 May 2011 20:40:51 -0400 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> <87boyubuwj.fsf@uwakimon.sk.tsukuba.ac.jp> <4DDAD2AF.1050800@canterbury.ac.nz> Message-ID: 2011/5/23 Bruce Leban : > I like c'x'. It's easy to read and very explicitly constant and clear what > the value is 'x'. (Some other letter instead of 'c' would be fine as well.) -0 from me Mainly because unlike b'..' or r'..' constructs, no meaning is proposed for c'xyz'. BTW, is it too soon to assign new meaning to back-quotes? In py3k they no longer stand for repr(), so we can probably reuse them for ord()? On the other hand, this is likely to be a bad idea for the same reasons as syntax for repr() was. From stutzbach at google.com Tue May 24 02:54:49 2011 From: stutzbach at google.com (Daniel Stutzbach) Date: Mon, 23 May 2011 17:54:49 -0700 Subject: [Python-ideas] __implements__ on arguments to ABCMeta.register In-Reply-To: References: Message-ID: On Fri, May 20, 2011 at 5:03 PM, Eric Snow wrote: > > The use case is that I want to be able to validate that a class implements > all of the abstract methods of all the classes to which it has been > registered. > If you're going down that road, would you be willing to write a patch for http://bugs.python.org/issue9731 along the way? > I don't have a programmatic way of discovering that set without asking > every class out there. > I agree it would be nice to have a way to ask a class "which ABCs do you implement?" It would be handy for introspection and debugging purposes. -- Daniel Stutzbach -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Tue May 24 03:21:05 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 23 May 2011 19:21:05 -0600 Subject: [Python-ideas] __implements__ on arguments to ABCMeta.register In-Reply-To: References: Message-ID: On Mon, May 23, 2011 at 6:54 PM, Daniel Stutzbach wrote: > On Fri, May 20, 2011 at 5:03 PM, Eric Snow wrote: >> >> The use case is that I want to be able to validate that a class implements >> all of the abstract methods of all the classes to which it has been >> registered. >> > > If you're going down that road, would you be willing to write a patch for > http://bugs.python.org/issue9731 along the way? > > >> Interesting. I was motivated in a similar situation to write a validater in the same vein [1]. In fact, working on that is where I got thinking about something like __implements__. The class I wrote would work with registered classes in addition to subclasses, if there were such a mechanism. -eric [1] http://code.activestate.com/recipes/577711-validating-classes-and-objects-against-an-abstract/ > I don't have a programmatic way of discovering that set without asking >> every class out there. >> > > I agree it would be nice to have a way to ask a class "which ABCs do you > implement?" It would be handy for introspection and debugging purposes. > > -- > Daniel Stutzbach > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ericsnowcurrently at gmail.com Tue May 24 03:30:00 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 23 May 2011 19:30:00 -0600 Subject: [Python-ideas] __implements__ on arguments to ABCMeta.register In-Reply-To: References: Message-ID: On Sun, May 22, 2011 at 11:46 PM, Nick Coghlan wrote: > On Sat, May 21, 2011 at 10:03 AM, Eric Snow > wrote: > > This is a small addition, but I realize it [potentially] adds another > > special method to classes, so it's not trivial. > > The use case is that I want to be able to validate that a class > implements > > all of the abstract methods of all the classes to which it has been > > registered. I don't have a programmatic way of discovering that set > without > > asking every class out there. This is an easy way to accomplish this > (for > > non-extension/non-builtin types). An alternative is to subclass ABCMeta > and > > tack this on, but that only works for my ABCs. Another is to use a class > > decorator to do this any place I do a register (or even to do the > register > > too), but again, only for the places that I do the registration. > > Anyway, if it's useful to me then it may be useful to others, so I wanted > to > > put this out there. I expect this has come up before, particularly > during > > discussions about PEP 3119. However, I wasn't able to track down > anything > > specifically about doing this sort of "reverse registration". And, of > > course, I may be overestimating the value of this functionality. If this > > does not seem that valuable to anyone else, then no big deal. :) > > An alternative approach to the same idea was to be able to register > callbacks with ABCs to track registration and deregistration > operations on that ABC and any subclasses. This has the advantage of > working with arbitrary objects, including those without mutable > __dict__ attributes. Such an approach would start by building a type > map (via ABC.__subclasses__) and then using the callback hooks to keep > the mapping up to date. > > That would be pretty cool. A simple __implements__ like I described it would definitely be less flexible. > I believe there is an open tracker item for that concept, but I can't > currently find a reference to it. > > I believe you are talking about http://bugs.python.org/issue5405. -eric > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue May 24 04:21:53 2011 From: guido at python.org (Guido van Rossum) Date: Mon, 23 May 2011 19:21:53 -0700 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> <87boyubuwj.fsf@uwakimon.sk.tsukuba.ac.jp> <4DDAD2AF.1050800@canterbury.ac.nz> Message-ID: 2011/5/23 Bruce Leban : > I like c'x'. It's easy to read and very explicitly constant and clear what > the value is 'x'. (Some other letter instead of 'c' would be fine as well.) We shouldn't add any new notation to create integers from characters to the language. It's too small a use case for adding new syntax. I would focus on agreeing on the notation that is most readable; personally I vote for ord('x'). 
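A small sketch of the style being converged on here: call ord() once, bind the result to a name, and compare byte values (which are ints in Python 3) against that name.

NEWLINE = ord('\n')    # 10

def count_lines(data):
    count = 0
    for byte in data:              # iterating over bytes yields ints
        if byte == NEWLINE:
            count += 1
    return count

assert count_lines(b'spam\neggs\n') == 2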
-- --Guido van Rossum (python.org/~guido) From stephen at xemacs.org Tue May 24 04:40:46 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 24 May 2011 11:40:46 +0900 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> <87boyubuwj.fsf@uwakimon.sk.tsukuba.ac.jp> <4DDAD2AF.1050800@canterbury.ac.nz> Message-ID: <87wrhgaki9.fsf@uwakimon.sk.tsukuba.ac.jp> Bruce Leban writes: > I recognize that the compiler can optimize: > > > if bytesdata[i] == b'x'[0]: > > but that looks like chicken scratches to me. Using named constants should fix that, and is better style anyway. > Someone suggested using 0'x' which I don't quite get. It looks too much like > 0x to me True but minor, IMO YMMV. > and the I've always read the leading zero to mean 'this is a > number'. That's precisely Nick's point in suggesting it! From ncoghlan at gmail.com Tue May 24 07:13:53 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 24 May 2011 15:13:53 +1000 Subject: [Python-ideas] __implements__ on arguments to ABCMeta.register In-Reply-To: References: Message-ID: On Tue, May 24, 2011 at 11:30 AM, Eric Snow wrote: > On Sun, May 22, 2011 at 11:46 PM, Nick Coghlan wrote: >> I believe there is an open tracker item for that concept, but I can't >> currently find a reference to it. > > I believe you are talking about?http://bugs.python.org/issue5405. That's the one (my tracker-fu failed me when I was trying to find it). I added a link from that issue back to the archive of this thread on python.org. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Tue May 24 07:19:27 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 24 May 2011 15:19:27 +1000 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: <87wrhgaki9.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> <87boyubuwj.fsf@uwakimon.sk.tsukuba.ac.jp> <4DDAD2AF.1050800@canterbury.ac.nz> <87wrhgaki9.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, May 24, 2011 at 12:40 PM, Stephen J. Turnbull wrote: > Bruce Leban writes: > ?> and the I've always read the leading zero to mean 'this is a > ?> number'. > > That's precisely Nick's point in suggesting it! Indeed :) Still, I've come around to the point of view that the simplest and clearest way to write it is simply "ord('x')", and if that is in a time-critical inner loop, save the value in a named variable. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From fuzzyman at gmail.com Tue May 24 11:24:43 2011 From: fuzzyman at gmail.com (Michael Foord) Date: Tue, 24 May 2011 10:24:43 +0100 Subject: [Python-ideas] Implementing __dir__ (moving dir implementation to object.__dir__?) In-Reply-To: References: Message-ID: On 24 May 2011 01:10, Benjamin Peterson wrote: > Michael Foord writes: > > Moving the relevant parts of the implementation of dir into > object.__dir__ > would be one way to solve that. > > Sounds fine to me. Do file a bug report. > > Thanks. 
http://bugs.python.org/issue12166 All the best, Michael > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Tue May 24 14:47:07 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 24 May 2011 21:47:07 +0900 Subject: [Python-ideas] Python 3.x and bytes In-Reply-To: References: <4DD2C2A5.3080403@stoneleaf.us> <4DD2D89D.4000303@stoneleaf.us> <4DD2F661.2050005@stoneleaf.us> <4DD35B9C.3030702@canterbury.ac.nz> <4DD3EC7A.8070801@stoneleaf.us> <4DD427A7.3060606@stoneleaf.us> <87ei3uaptt.fsf@uwakimon.sk.tsukuba.ac.jp> <87boyubuwj.fsf@uwakimon.sk.tsukuba.ac.jp> <4DDAD2AF.1050800@canterbury.ac.nz> <87wrhgaki9.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87k4dg9sfo.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > Still, I've come around to the point of view that the simplest and > clearest way to write it is simply "ord('x')", and if that is in a > time-critical inner loop, save the value in a named variable. +1. Actually, I prefer the latter. I feel that the former is just a complicated and expensive magic number in almost all cases. From songofacandy at gmail.com Wed May 25 19:29:48 2011 From: songofacandy at gmail.com (INADA Naoki) Date: Thu, 26 May 2011 02:29:48 +0900 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. Message-ID: Hi, all. There are some situation that I want to use bytes as a string in real world. (I use the 'bstr' for bytes as a string below) Sadly, Python 3's bytes is not bytestring. For example, when I want to make 'cat -n' that is transparent to encoding, Python 3 doesn't permit b'{0:6d}'.format(n) and '{0:6d}'.format(n).encode('ascii') is circuitous way against simple requirements. I think the best way to handle such situation with Python 3 is using 'latin1' codec. For example, encoding transparent 'cat -n' is: import sys fin = open(sys.stdin.fileno(), 'r', encoding='latin1') fout = open(sys.stdout.fileno(), 'w', encoding='latin1') for n, L in enumerate(fin): fout.write('{0:5d}\t{1}'.format(n, L)) If using 'latin1' is Pythonic way to handle encoding transparent string, I think Python should provide another alias like 'bytes'. Any thoughts? -- INADA Naoki? From tjreedy at udel.edu Thu May 26 03:58:58 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 25 May 2011 21:58:58 -0400 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: On 5/25/2011 1:29 PM, INADA Naoki wrote: > Sadly, Python 3's bytes is not bytestring. By intention. > import sys > fin = open(sys.stdin.fileno(), 'r', encoding='latin1') > fout = open(sys.stdout.fileno(), 'w', encoding='latin1') > for n, L in enumerate(fin): > fout.write('{0:5d}\t{1}'.format(n, L)) > > If using 'latin1' is Pythonic way to handle encoding transparent string, > I think Python should provide another alias like 'bytes'. I presume that you mean you would like to write fin = open(sys.stdin.fileno(), 'r', encoding='bytes') fout = open(sys.stdout.fileno(), 'w', encoding='bytes') If such a thing were added, the 256 bytes should directly map to the first 256 codepoints. 
I don't know if 'latin1' does that or not. In any case, one can rewrite the above without decoding input lines. with open('tem.py', 'rb') as fin, open('tem2.txt', 'wb') as fout: for n, L in enumerate(fin): fout.write('{0:5d}\t'.format(n).encode('ascii')) fout.write(L) (sys.x.fineno raises fineno AttributeError in IDLE.) -- Terry Jan Reedy From songofacandy at gmail.com Thu May 26 04:57:24 2011 From: songofacandy at gmail.com (INADA Naoki) Date: Thu, 26 May 2011 11:57:24 +0900 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: On Thu, May 26, 2011 at 10:58 AM, Terry Reedy wrote: > On 5/25/2011 1:29 PM, INADA Naoki wrote: > >> Sadly, Python 3's bytes is not bytestring. > > By intention. Yes, I know. But I feel sad because it cause many confusions. Bytes supports some string methods. >>> b"foo".capitalize() # Oh, b'Foo' >>> b"foo".isalpha() # alphabets in not-string? True >>> b"foo%d" % 3 Traceback (most recent call last): File "", line 1, in TypeError: unsupported operand type(s) for %: 'bytes' and 'int' > >> import sys >> fin = open(sys.stdin.fileno(), 'r', encoding='latin1') >> fout = open(sys.stdout.fileno(), 'w', encoding='latin1') >> for n, L in enumerate(fin): >> ? ? fout.write('{0:5d}\t{1}'.format(n, L)) >> >> If using 'latin1' is Pythonic way to handle encoding transparent string, >> I think Python should provide another alias like 'bytes'. > > I presume that you mean you would like to write > fin = open(sys.stdin.fileno(), 'r', encoding='bytes') > fout = open(sys.stdout.fileno(), 'w', encoding='bytes') > > If such a thing were added, the 256 bytes should directly map to the first > 256 codepoints. I don't know if 'latin1' does that or not. In any case, Yes, 'latin1' directly maps 256 bytes to 256 codepoints. > one > can rewrite the above without decoding input lines. > > with open('tem.py', 'rb') as fin, open('tem2.txt', 'wb') as fout: > ?for n, L in enumerate(fin): > ? ?fout.write('{0:5d}\t'.format(n).encode('ascii')) > ? ?fout.write(L) > > (sys.x.fineno raises fineno AttributeError in IDLE.) > There are 2 problems. 1) binary mode doesn't support line buffering. So I should disable buffering and this may cause performance regression. 2) Requiring .encode('ascii') is less attractive when using Python as a scripting language in Unix. But latin1 approach has disadvantage of performance and memory usage. I think Python 3 doesn't provide easy and efficient way to implement encoding transparent command like 'cat -n'. It's very sad. -- INADA Naoki? From tjreedy at udel.edu Thu May 26 06:09:42 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 26 May 2011 00:09:42 -0400 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: On 5/25/2011 10:57 PM, INADA Naoki wrote: > Bytes supports some string methods. As exactly specified in 4.6.5. Bytes and Byte Array Methods There is really no need to repeat what everyone reading this knows. > I wrote >> with open('tem.py', 'rb') as fin, open('tem2.txt', 'wb') as fout: >> for n, L in enumerate(fin): >> fout.write('{0:5d}\t'.format(n).encode('ascii')) >> fout.write(L) >> >> (sys.x.fineno raises fineno AttributeError in IDLE.) >> > > There are 2 problems. > > 1) binary mode doesn't support line buffering. So I should disable buffering > and this may cause performance regression. *nix already has a c-coded cat command; Windows has copy commands. So there is no need to design Python for this. 
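As an aside illustrating the property under discussion (a hedged sketch, not from the thread): the latin-1 codec maps byte value N to code point N for all 256 values, so a decode/encode round trip through it is lossless.

data = bytes(range(256))
text = data.decode('latin-1')
assert [ord(ch) for ch in text] == list(range(256))   # byte N maps to code point N
assert text.encode('latin-1') == data                 # and back again, losslessly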
Cat is usually used with files rather than terminals ans screens. When it is used with terminals and screens, the extra encode/decode does not matter. Realistic Python programs that actually do something with the text need to decode with the actual encoding, regardless of byte source. So I do not think we need a bytes alias for latin_1. The docs might mention that it is essentially a do-nothing codec. -- Terry Jan Reedy From serge.hulne at gmail.com Thu May 26 07:29:47 2011 From: serge.hulne at gmail.com (Serge Hulne) Date: Thu, 26 May 2011 07:29:47 +0200 Subject: [Python-ideas] Suggestion: Integrate the script "pindent.py" as standard command for formatting pyhton code Message-ID: Suggestion: Integrate the script "pindent.py" as standard command for formatting pyhton code Here is the link; http://svn.python.org/projects/python/trunk/Tools/scripts/pindent.py Pindent stands for "Pyton indent": Goal : 1. It provides bloc delimiters (end of blocks) in the for of comments (like "#end if" or "#end for" etc ... ) 2. This allows one to check / restore the indentation of Python code, in cases where> 1. A copy/paste went wrong 2. The indentation of a Python source got corrupted when the script was posted on web page, send via email etc ... 3. Standardise (fix) sources which happily mix whitespaces and tabs 4. Make Python code more readable for developers used to end of blocs delimiters (Ruby, C, C++, C#,Java, etc ...) Basically the idea is the same as the Go language "gofmt" (Go format). Example: #------------------- - Before using pindent: #!/usr/bin env python i = 0 for c in "hello world": if c == 'l': i+=1 print "number of occurrences of `l` :", i #------------------ - After using indent: #!/usr/bin env python i = 0 for c in "hello world": if c == 'l': i+=1 print "number of occurrences of `l` :", i # end if # end for Serge Hulne -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu May 26 07:42:18 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 26 May 2011 15:42:18 +1000 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: On Thu, May 26, 2011 at 3:29 AM, INADA Naoki wrote: > There are some situation that I want to use bytes as a string in real world. Breaking the bytes-are-text mental model is something we deliberately set out to do with Python 3 (because it is wrong). In today's global environment, programmers *need* to learn about text encoding issues as treating bytes as text without finding out the encoding first is a surefire way to get unintelligible mojibake. If "What does 'latin-1' mean?" is a question that gets them there, then that's fine. You *cannot* transparently handle data in arbitrary encodings, as the meanings of the bytes change based on the encoding (this is especially true when dealing with non-ASCII compatible encodings). That said, decoding and reencoding via 'ascii' (strict 7-bit) or 'latin-1' (full 8-bit) is the easiest way to handle both strings and bytes input reasonably efficiently. See urllib.parse for examples on how to do that. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From stefan_ml at behnel.de Thu May 26 11:15:19 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 26 May 2011 11:15:19 +0200 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. 
In-Reply-To: References: Message-ID: Terry Reedy, 26.05.2011 03:58: > If such a thing were added, the 256 bytes should directly map to the first > 256 codepoints. I don't know if 'latin1' does that or not. Yes, Unicode was specifically designed to support that. The first 128 code points are identical with the ASCII encoding, the first 256 code points are identical with the Latin-1 encoding. See also PEP 393, which exploits this feature. http://www.python.org/dev/peps/pep-0393/ That being said, I don't see the point of aliasing "latin-1" to "bytes" in the codecs. That sounds confusing to me. Stefan From cmjohnson.mailinglist at gmail.com Thu May 26 12:53:36 2011 From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson) Date: Thu, 26 May 2011 00:53:36 -1000 Subject: [Python-ideas] Suggestion: Integrate the script "pindent.py" as standard command for formatting pyhton code In-Reply-To: References: Message-ID: On Wed, May 25, 2011 at 7:29 PM, Serge Hulne wrote: > Basically the idea is the same as the Go language "gofmt" (Go format). > Something like gofmt is imaginable for Python. Block delimiters are not. Never gonna happen. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmjohnson.mailinglist at gmail.com Thu May 26 12:59:58 2011 From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson) Date: Thu, 26 May 2011 00:59:58 -1000 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: On Wed, May 25, 2011 at 11:15 PM, Stefan Behnel wrote: > Yes, Unicode was specifically designed to support that. The first 128 code > points are identical with the ASCII encoding, the first 256 code points are > identical with the Latin-1 encoding. > > See also PEP 393, which exploits this feature. > > http://www.python.org/dev/peps/pep-0393/ > > That being said, I don't see the point of aliasing "latin-1" to "bytes" in > the codecs. That sounds confusing to me. "bytes" is probably the wrong name for it, but I think using some name to signal "I'm not really using this encoding, I just need to be able to pass these bytes into and out of a string without losing any bits" might be better than using "latin-1" if we're forced to take up this hack. (My gut feeling is that it would be better if we could avoid using the "latin-1" hack all together, but apparently wiser minds than me have decided we have no other choice.) Maybe we could call it "passthrough"? And we could add a documentation note that if you use "passthrough" to decode some bytes you must, must, must use it to encode them later, since the string you manipulate won't really contain unicode codepoints, just a transparent byte encoding? -- Carl -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Thu May 26 13:13:29 2011 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 26 May 2011 13:13:29 +0200 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: <4DDE35D9.2040204@egenix.com> Carl M. Johnson wrote: > On Wed, May 25, 2011 at 11:15 PM, Stefan Behnel wrote: > > >> Yes, Unicode was specifically designed to support that. The first 128 code >> points are identical with the ASCII encoding, the first 256 code points are >> identical with the Latin-1 encoding. >> >> See also PEP 393, which exploits this feature. >> >> http://www.python.org/dev/peps/pep-0393/ >> >> That being said, I don't see the point of aliasing "latin-1" to "bytes" in >> the codecs. 
That sounds confusing to me. > > > "bytes" is probably the wrong name for it, but I think using some name to > signal "I'm not really using this encoding, I just need to be able to pass > these bytes into and out of a string without losing any bits" might be > better than using "latin-1" if we're forced to take up this hack. (My gut > feeling is that it would be better if we could avoid using the "latin-1" > hack all together, but apparently wiser minds than me have decided we have > no other choice.) Maybe we could call it "passthrough"? And we could add a > documentation note that if you use "passthrough" to decode some bytes you > must, must, must use it to encode them later, since the string you > manipulate won't really contain unicode codepoints, just a transparent byte > encoding? If you really wish to carry around binary data in a Unicode object, then you should use a codec that maps the 256 code points in a byte to either a private code point area or use a hack like the surrogateescape approach defined in PEP 383: http://www.python.org/dev/peps/pep-0383/ By using 'latin-1' you can potentially have the binary data leak into other text data of your application, or worse, have it converted to a different encoding on output, e.g. when sending the data to a UTF-8 pipe. In any case, this is bound to create hard to detect problems. Better use bytes to begin with. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 26 2011) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2011-05-23: Released eGenix mx Base 3.2.0 http://python.egenix.com/ 2011-05-25: Released mxODBC 3.1.1 http://python.egenix.com/ 2011-06-20: EuroPython 2011, Florence, Italy 25 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From serge.hulne at gmail.com Thu May 26 13:15:37 2011 From: serge.hulne at gmail.com (Serge Hulne) Date: Thu, 26 May 2011 13:15:37 +0200 Subject: [Python-ideas] Suggestion: Integrate the script "pindent.py" as standard command for formatting pyhton code In-Reply-To: References: Message-ID: Actually these are "fake" bloc delimiters (in the shape of comments, see example in the original post). By this I mean they are used by the formatting tool (pindent) only, not by the language (Python itself). They are (generated by and used by) pindent for the sake of being able to fix the indent level in python code when : 1. A copy / paste went bad (e.g. the last line of a for bloc has been "pasted at the wrong indentation level"). 2. A source file lost all indentation when been mailed because, say, the tabs have been stripped 3. etc... I do not see how there can be an equivalent of gofmt if there is no *indication* of the end of the blocs (independent of the indentation, that is). It is my feeling that without such a tool Python is inherently very vulnerable to glitches occurring at editing time: 1. Copy / paste glitch that passes unnoticed, does not generate an exception but alters the logic of the program. 2. Tab key inadvertently hit. 3. 
Difficulty in assessing the target indentation level when a part of a bloc has to be pasted in a different part of the code. Serge Hulne. On Thu, May 26, 2011 at 7:29 AM, Serge Hulne wrote: > Suggestion: Integrate the script "pindent.py" as standard command for > formatting pyhton code > > Here is the link; > http://svn.python.org/projects/python/trunk/Tools/scripts/pindent.py > > Pindent stands for "Pyton indent": > > Goal : > > 1. It provides bloc delimiters (end of blocks) in the for of comments > (like "#end if" or "#end for" etc ... ) > 2. This allows one to check / restore the indentation of Python code, > in cases where> > 1. A copy/paste went wrong > 2. The indentation of a Python source got corrupted when the script > was posted on web page, send via email etc ... > 3. Standardise (fix) sources which happily mix whitespaces and tabs > 4. Make Python code more readable for developers used to end of > blocs delimiters (Ruby, C, C++, C#,Java, etc ...) > > Basically the idea is the same as the Go language "gofmt" (Go format). > > Example: > > #------------------- > - Before using pindent: > > #!/usr/bin env python > > i = 0 > for c in "hello world": > if c == 'l': > i+=1 > print "number of occurrences of `l` :", i > > #------------------ > - After using indent: > > #!/usr/bin env python > > i = 0 > for c in "hello world": > if c == 'l': > i+=1 > print "number of occurrences of `l` :", i > # end if > # end for > > > Serge Hulne > -------------- next part -------------- An HTML attachment was scrubbed... URL: From masklinn at masklinn.net Thu May 26 13:17:07 2011 From: masklinn at masklinn.net (Masklinn) Date: Thu, 26 May 2011 13:17:07 +0200 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: On 2011-05-26, at 12:59 , Carl M. Johnson wrote: > On Wed, May 25, 2011 at 11:15 PM, Stefan Behnel wrote: >> Yes, Unicode was specifically designed to support that. The first 128 code >> points are identical with the ASCII encoding, the first 256 code points are >> identical with the Latin-1 encoding. >> >> See also PEP 393, which exploits this feature. >> >> http://www.python.org/dev/peps/pep-0393/ >> >> That being said, I don't see the point of aliasing "latin-1" to "bytes" in >> the codecs. That sounds confusing to me. > > > "bytes" is probably the wrong name for it, but I think using some name to > signal "I'm not really using this encoding, I just need to be able to pass > these bytes into and out of a string without losing any bits" might be > better than using "latin-1" if we're forced to take up this hack. (My gut > feeling is that it would be better if we could avoid using the "latin-1" > hack all together, but apparently wiser minds than me have decided we have > no other choice.) Maybe we could call it "passthrough"? And we could add a > documentation note that if you use "passthrough" to decode some bytes you > must, must, must use it to encode them later, since the string you > manipulate won't really contain unicode codepoints, just a transparent byte > encoding? Considering the original use case, which seems to be mostly about being able to use .format, would it make more sense to be able to create "byte patterns", with formats similar to those of str.format but not identical (e.g. better control on layout would be nice, something similar to Erlang's bit syntax for putting binaries together). This would be useful to put together byte sequences from existing values to e.g. output binary formats. 
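For concreteness, the kind of byte-sequence assembly being discussed can
already be approximated with the existing struct module plus plain
concatenation; a rough sketch (the record layout and names below are purely
illustrative, not taken from this thread):

import struct

def build_record(seq_no, payload):
    # Fixed ASCII framing bytes plus explicitly packed big-endian integers.
    header = b'REC ' + struct.pack('>I', seq_no)
    length = struct.pack('>H', len(payload))
    return header + length + payload

build_record(7, b'hello')
# -> b'REC \x00\x00\x00\x07\x00\x05hello'

What this sketch cannot do, and what the "byte pattern" idea is aiming at,
is express the literal framing bytes and the packed fields in a single
format string the way str.format does for text.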
From jimjjewett at gmail.com Thu May 26 15:45:51 2011 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 26 May 2011 09:45:51 -0400 Subject: [Python-ideas] Suggestion: Integrate the script "pindent.py" as standard command for formatting pyhton code In-Reply-To: References: Message-ID: On Thu, May 26, 2011 at 7:15 AM, Serge Hulne wrote: > Actually these are "fake" bloc delimiters (in the shape of comments, see > example in the original post). They are inherently bad, because they are extra noise. The question is whether they add enough value to make up for that. > A copy / paste went bad (e.g. the last line of a for bloc has been "pasted > at the wrong indentation level"). For me, it is usually either the entire bloc, or just the first line that is wrong. > A source file lost all indentation when been mailed because, say, the tabs > have been stripped > etc... This has been an annoyance on the python lists lately; I'm not sure why, but a lot of the recent code has come through (at least on my gmail account) without indentation at all. The catch is, I have usually been able to figure out where the indents/dedents should go; if I can't, it is a sign that the function is too long. And these extra comments only make the functions longer... -jJ From brian.curtin at gmail.com Thu May 26 16:06:44 2011 From: brian.curtin at gmail.com (Brian Curtin) Date: Thu, 26 May 2011 09:06:44 -0500 Subject: [Python-ideas] Suggestion: Integrate the script "pindent.py" as standard command for formatting pyhton code In-Reply-To: References: Message-ID: On Thu, May 26, 2011 at 00:29, Serge Hulne wrote: > Suggestion: Integrate the script "pindent.py" as standard command for > formatting pyhton code > > Here is the link; > http://svn.python.org/projects/python/trunk/Tools/scripts/pindent.py > > Pindent stands for "Pyton indent": > > Goal : > > 1. It provides bloc delimiters (end of blocks) in the for of comments > (like "#end if" or "#end for" etc ... ) > 2. This allows one to check / restore the indentation of Python code, > in cases where> > 1. A copy/paste went wrong > 2. The indentation of a Python source got corrupted when the script > was posted on web page, send via email etc ... > 3. Standardise (fix) sources which happily mix whitespaces and tabs > 4. Make Python code more readable for developers used to end of > blocs delimiters (Ruby, C, C++, C#,Java, etc ...) > > Basically the idea is the same as the Go language "gofmt" (Go format). > > Example: > > #------------------- > - Before using pindent: > > #!/usr/bin env python > > i = 0 > for c in "hello world": > if c == 'l': > i+=1 > print "number of occurrences of `l` :", i > > #------------------ > - After using indent: > > #!/usr/bin env python > > i = 0 > for c in "hello world": > if c == 'l': > i+=1 > print "number of occurrences of `l` :", i > # end if > # end for This is already included in the Python source tree, so I'm not sure what further inclusion/integration you are suggesting. I don't find this style necessary nor is it really a good style to promote, especially because Python isn't Ruby, C++, or any of the languages you listed. The only time I've found it sort-of ok to do this is if a block nested in other blocks spans more than the height of one monitor view, which isn't often. Even then, most IDEs and editors handle this by having optional guides for block beginning and ending. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From g.rodola at gmail.com Thu May 26 16:12:55 2011 From: g.rodola at gmail.com (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=) Date: Thu, 26 May 2011 16:12:55 +0200 Subject: [Python-ideas] Suggestion: Integrate the script "pindent.py" as standard command for formatting pyhton code In-Reply-To: References: Message-ID: Brian Curtin : > This is already included in the Python source tree, so I'm not sure what > further inclusion/integration you are suggesting. Really? I honestly fail to understand why one would want to use such a tool at all. It always assumes the worst scenario (bad indentation / mixed tab spaces / copy & paste went bad) and tries to solve it by adding unnecessary cruft. 2011/5/26 Serge Hulne : > Make Python code more readable for developers used to end of blocs > delimiters (Ruby, C, C++, C#,Java, etc ...) Unless the block code is very long and/or not nicely written it's *less* readable. --- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/ From ncoghlan at gmail.com Thu May 26 16:55:55 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 27 May 2011 00:55:55 +1000 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: On Thu, May 26, 2011 at 9:17 PM, Masklinn wrote: > Considering the original use case, which seems to be mostly about being able to use .format, would it make more sense to be able to create "byte patterns", with formats similar to those of str.format but not identical (e.g. better control on layout would be nice, something similar to Erlang's bit syntax for putting binaries together). > > This would be useful to put together byte sequences from existing values to e.g. output binary formats. We already have an entire module dedicated to the task of handling binary formats: http://docs.python.org/py3k/library/struct "format(n, '6d').encode('ascii')" is the right way to get the string representation of a number as ASCII bytes. However, the programmer needs to be aware that concatenating those bytes with an encoding that is not ASCII compatible (such as UTF-16, UTF-32, or many of the Asian encodings) will result in a sequence of unusable garbage. It is far, far safer to transform everything into the text domain, work with it there, then encode back when the manipulation is complete. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From masklinn at masklinn.net Thu May 26 17:56:48 2011 From: masklinn at masklinn.net (Masklinn) Date: Thu, 26 May 2011 17:56:48 +0200 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: <4B82F12B-186A-4729-BC9C-E894811FBB2B@masklinn.net> On 2011-05-26, at 16:55 , Nick Coghlan wrote: > On Thu, May 26, 2011 at 9:17 PM, Masklinn wrote: >> Considering the original use case, which seems to be mostly about being able to use .format, would it make more sense to be able to create "byte patterns", with formats similar to those of str.format but not identical (e.g. better control on layout would be nice, something similar to Erlang's bit syntax for putting binaries together). >> >> This would be useful to put together byte sequences from existing values to e.g. output binary formats. > > We already have an entire module dedicated to the task of handling > binary formats: http://docs.python.org/py3k/library/struct Sure, but: 1. 
It does not matter overly much, there are many cases where this did not stop the core team from agreeing the problem was insufficiently well solved (latest instance: string formatting, the current builtin solution being predated by an other builtin and at least one previous stdlib solution) 2. struct suffers from a bunch of issues - it ranks low in discoverability, people who have not bit-twiddled much in C may not realize that a struct (in C) is just an interpretation pattern on a byte string, and it's advertised as an interaction between Python and C structs, not arbitrary bytes patterns/building - struct format strings are "wonky" (in that they're nothing like those of str.format) - struct format strings simply can't deal with mixing literal "character bytes" and format specs, making formats with fixed ascii structures significantly less readable > "format(n, '6d').encode('ascii')" is the right way to get the string > representation of a number as ASCII bytes. However, the programmer > needs to be aware that concatenating those bytes with an encoding that > is not ASCII compatible (such as UTF-16, UTF-32, or many of the Asian > encodings) will result in a sequence of unusable garbage. It is far, > far safer to transform everything into the text domain, work with it > there, then encode back when the manipulation is complete. Sure, but as you noted this is not even always done in the stdlib, why third-party developers would be expected to be in a better situation? And between jumping through a semi-arbitrary decode/encode cycle whose semantics are completely ignored and being able to just specify a bytes pattern, which seems stranger? And I'm probably overstating its importance, but erlang seems to do rather well with its bit syntax. Which is much closer to str.format than to struct.pack (in API, in looks, in complexity, ?) From benjamin at python.org Thu May 26 22:19:37 2011 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 26 May 2011 20:19:37 +0000 (UTC) Subject: [Python-ideas] Suggestion: Integrate the script "pindent.py" as standard command for formatting pyhton code References: Message-ID: Serge Hulne writes: > > > Suggestion: Integrate the script "pindent.py" A more useful script in my opinion is "reindent.py". > > A copy/paste went wrong > The indentation of a Python source got corrupted when the script was posted on web page, send via email etc ... > Standardise (fix) sources which happily mix whitespaces and tabs Since it does just this and nothing else. From tjreedy at udel.edu Fri May 27 00:26:21 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 26 May 2011 18:26:21 -0400 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: On 5/26/2011 7:17 AM, Masklinn wrote: > Considering the original use case, to prefix ascii-encoded numbers to lines in an unknown but ascii-compatible encoding*, and considering the responses since my last post, I have changed from -0 to -1 to the alias proposal. 1. The use case does not need the fake decoding and is better off without it. 2. I suspect the uses cases where fake decoding is both needed and sufficient are relatively rare. 3. Fake decoding is dangerous (Lemburg). 4. People who know enough to use it safely should already know about how latin-1 relates to unicode, and therefore do not need an alias. 5. Other people should not be encouraged to use it as a fake. 
*I meant to ask earlier whether there are ascii-incompatible encodings for which the original code and my revision would not work. I gather from the responses that yes, there are some. -- Terry Jan Reedy From greg.ewing at canterbury.ac.nz Fri May 27 01:28:47 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 27 May 2011 11:28:47 +1200 Subject: [Python-ideas] Suggestion: Integrate the script "pindent.py" as standard command for formatting pyhton code In-Reply-To: References: Message-ID: <4DDEE22F.6000807@canterbury.ac.nz> Serge Hulne wrote: > It is my feeling that without such a tool Python is inherently very > vulnerable to glitches occurring at editing time: > > 1. Copy / paste glitch that passes unnoticed, does not generate > an exception but alters the logic of the program. > 2. Tab key inadvertently hit. > 3. Difficulty in assessing the target indentation level when a > part of a bloc has to be pasted in a different part of the > code. How much actual experience have you had writing and editing Python code? While it might seem from a theoretical viewpoint that these problems should exist, in my experience they occur very rarely, if at all. Even sending Python code by email seems to be fine most of the time as long as you indent it with spaces, unless there is some particularly braindamaged piece of software in the way. All the Python mailing lists and newsgroups I frequent seem to handle space-indented Python just fine. I don't think any tool to add block-delimiting comments is going to gain much adoption, because the uglification of the code that it results in is grossly out of proportion to the actual magnitude of the problem. -- Greg From greg.ewing at canterbury.ac.nz Fri May 27 01:34:51 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 27 May 2011 11:34:51 +1200 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: <4DDEE39B.5090400@canterbury.ac.nz> Masklinn wrote: > would it make more sense to be able to create "byte patterns", > with formats similar to those of str.format but not identical (e.g. better > control on layout would be nice, something similar to Erlang's bit syntax for > putting binaries together). Sounds a lot like struct.pack. Maybe struct.pack and struct.unpack could be made available as methods of bytes? I don't think this would address the OP's use case, though, because he seems to actually want a textual format whose output is encoded in ascii. -- Greg From songofacandy at gmail.com Fri May 27 04:02:52 2011 From: songofacandy at gmail.com (INADA Naoki) Date: Fri, 27 May 2011 11:02:52 +0900 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: On Fri, May 27, 2011 at 7:26 AM, Terry Reedy wrote: > On 5/26/2011 7:17 AM, Masklinn wrote: >> >> Considering the original use case, > > to prefix ascii-encoded numbers to lines in an unknown but ascii-compatible > encoding*, > and considering the responses since my last post, I have changed from -0 to > -1 to the alias proposal. > > 1. The use case does not need the fake decoding and is better off without > it. > 2. I suspect the uses cases where fake decoding is both needed and > sufficient are relatively rare. > 3. Fake decoding is dangerous (Lemburg). > 4. People who know enough to use it safely should already know about how > latin-1 relates to unicode, and therefore do not need an alias. > 5. Other people should not be encouraged to use it as a fake. 
OK, I understand that using 'latin1' is just a hack and not Pythonic way. Then, I hope bytes has a fast and efficient "format" method like: >>> b'{0} {1}'.format(23, b'foo') # accepts int, float, bytes, bool, None 23 foo >>> b'{0}'.format('foo') # raises TypeError for other types. TypeError And line buffering in binary mode is also nice. > *I meant to ask earlier whether there are ascii-incompatible encodings for > which the original code and my revision would not work. I gather from the > responses that yes, there are some. > > -- > Terry Jan Reedy > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- INADA Naoki? From stephen at xemacs.org Fri May 27 04:59:58 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 27 May 2011 11:59:58 +0900 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: <87y61s97bl.fsf@uwakimon.sk.tsukuba.ac.jp> INADA Naoki writes: > Any thoughts? -1 TOOWTDI. No alias, please. It's just an idiom people who need the functionality will need to learn (but see comment on urllib.parse below). As Terry says, it's hard to believe that use of the latin1 codec and str for internal processing is going to be a bottleneck in practical applications. I wonder if it would be possible to generalize Nick's work on urllib.parse to a more general class. From ncoghlan at gmail.com Fri May 27 06:41:04 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 27 May 2011 14:41:04 +1000 Subject: [Python-ideas] Suggestion: Integrate the script "pindent.py" as standard command for formatting pyhton code In-Reply-To: <4DDEE22F.6000807@canterbury.ac.nz> References: <4DDEE22F.6000807@canterbury.ac.nz> Message-ID: On Fri, May 27, 2011 at 9:28 AM, Greg Ewing wrote: > Even sending Python code by email seems to be fine most of > the time as long as you indent it with spaces, unless there > is some particularly braindamaged piece of software in the > way. All the Python mailing lists and newsgroups I frequent > seem to handle space-indented Python just fine. Email is generally fine, but quite a few commenting systems are braindead when it comes to handling whitespace correctly. Even there, a simple leading dot on each line can generally resolve the issue, or else you put the code on a code pasting site and just link to it from the comment. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Fri May 27 07:11:41 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 27 May 2011 15:11:41 +1000 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: <87y61s97bl.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87y61s97bl.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, May 27, 2011 at 12:59 PM, Stephen J. Turnbull wrote: > I wonder if it would be possible to generalize Nick's work on > urllib.parse to a more general class. I thought about that when I was implementing it, and I don't really think so. The decode/encode cycle in urllib.parse is based on a few key elements: 1. The URL standard itself mandates a 7-bit ASCII bytestream. 
The implicit conversion accordingly uses the ascii codec with strict error handling, so if you want to handle malformed URLs, you still have to do your own decoding and pass in already decoded text strings rather than the raw bytes (as there is no way for the library to guess an appropriate encoding for any non-ASCII bytes it encounters). 2. The affected urllib.parse APIs are all stateless - the output is determined by the inputs. Accordingly, it was fairly straightforward to coerce all of the arguments to strings and also create a "coerce result" callable that is either a no-op that just returns its argument (string inputs) or calls .encode() on its input and returns that (bytes/bytearray inputs) 3. All of the operations that returned tuples were updated to return namedtuple subclasses with an encode() method that passed the encoding command down to the individual tuple elements. These subclasses all came in matched pairs (one that held only strings, another that held only bytes). The argument coercion function could probably be extracted and placed in the string module, but it isn't all that useful on its own - it's adequate if you're only returning single strings, but needs to be matched with an appropriately designed class hierarchy if you're returning anything more complicated. I believe RDM used a similar design pattern of parallel bytes and string based return types to get the email package into a more usable state for 3.2. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Fri May 27 07:24:25 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 27 May 2011 15:24:25 +1000 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: On Fri, May 27, 2011 at 12:02 PM, INADA Naoki wrote: > Then, I hope bytes has a fast and efficient "format" method like: >>>> b'{0} {1}'.format(23, b'foo') ?# accepts int, float, bytes, bool, None > 23 foo >>>> b'{0}'.format('foo') ?# raises TypeError for other types. > TypeError What method is invoked to convert the numbers to text? What encoding is used to convert those numbers to text? How does this operation avoid also converting the *bytes* object to text and then reencoding it? Bytes are not text. Struggling against that is a recipe for making life hard for yourself in Python 3. That said, there *may* still be a place for bytes.format(). However, proper attention needs to be paid to the encoding issues, and the question of how arbitrary types can be supported (including how to handle the fast path for existing bytes() and bytearray() objects). The pedagogic cost of making it even harder than it already is to convince people that bytes are not text would also need to be considered. > And line buffering in binary mode is also nice. The Python 3 IO stack already provides b'\n' based line buffering for binary files. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From songofacandy at gmail.com Fri May 27 08:14:57 2011 From: songofacandy at gmail.com (INADA Naoki) Date: Fri, 27 May 2011 15:14:57 +0900 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. 
In-Reply-To: References: Message-ID: On Fri, May 27, 2011 at 2:24 PM, Nick Coghlan wrote: > On Fri, May 27, 2011 at 12:02 PM, INADA Naoki wrote: >> Then, I hope bytes has a fast and efficient "format" method like: >>>>> b'{0} {1}'.format(23, b'foo') ?# accepts int, float, bytes, bool, None >> 23 foo >>>>> b'{0}'.format('foo') ?# raises TypeError for other types. >> TypeError > > What method is invoked to convert the numbers to text? Doesn't invoke any methods. Please imagine stdio's pritnf. > What encoding > is used to convert those numbers to text? > How does this operation > avoid also converting the *bytes* object to text and then reencoding > it? I've wrote a wrong example. >>>>> b'{0} {1}'.format(23, b'foo') ?# accepts int, float, bytes, bool, None >> 23 foo This should be b'23 foo'. Numbers encoded by ascii. > > Bytes are not text. Struggling against that is a recipe for making > life hard for yourself in Python 3. I love unicode and use unicode when I can use it. But this is a problem in the real world. For example, Python 2 is convenient for analyzing line based logs containing some different encodings. Python 3 > > That said, there *may* still be a place for bytes.format(). However, > proper attention needs to be paid to the encoding issues, and the > question of how arbitrary types can be supported (including how to > handle the fast path for existing bytes() and bytearray() objects). > The pedagogic cost of making it even harder than it already is to > convince people that bytes are not text would also need to be > considered. > >> And line buffering in binary mode is also nice. > > The Python 3 IO stack already provides b'\n' based line buffering for > binary files. But the doc says that "1 to select line buffering (only usable in text mode)," http://docs.python.org/dev/library/functions.html#open > > Cheers, > Nick. > > -- > Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia > -- INADA Naoki? From ncoghlan at gmail.com Fri May 27 08:37:31 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 27 May 2011 16:37:31 +1000 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: On Fri, May 27, 2011 at 4:14 PM, INADA Naoki wrote: > But the doc says that "1 to select line buffering (only usable in text mode)," > http://docs.python.org/dev/library/functions.html#open True, I was thinking about the public API (readline/readlines) rather than the underlying buffering. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Fri May 27 08:45:13 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 27 May 2011 16:45:13 +1000 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: On Fri, May 27, 2011 at 4:14 PM, INADA Naoki wrote: > I love unicode and use unicode when I can use it. > But this is a problem in the real world. > For example, Python 2 is convenient for analyzing line based logs > containing some different encodings. Python 3 ...deliberately makes that difficult because it is *wrong*. Binary files containing a mixture of encodings cannot be safely treated as text. The closest it is possible to get is to support only ASCII compatible encodings by decoding it as ASCII with the "surrogateescape" error handler so that bytes with the high order bit set can be faithfully reproduced on reencoding. 
However, such code will potentially fail once it encounters a non-ASCII compatible encoding, such as UTF-16 or -32. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From stephen at xemacs.org Fri May 27 10:46:48 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 27 May 2011 17:46:48 +0900 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: <87k4dc8r9j.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > On Fri, May 27, 2011 at 12:02 PM, INADA Naoki wrote: > > Then, I hope bytes has a fast and efficient "format" method like: I still don't see a use case for a fast and efficient bytes.format() method. The latin-1 codec is O(n) with a very small coefficient. It seems to me this is "really" all about TOOWTDI: we'd like to be able to interpolate data received as arguments into a data stream using the same idiom everywhere, whether the stream consists of text, bytes, or class Froooble instances. (I admit I don't offhand know how you'd spell "{0}" in a Froooble stream.) OK, so at present only bytes is a plausible application, but I'm willing to go there. Then, if it turns out that the latin-1 codec imposes too high overhead on .format() in some application, the concerned parties can optimize it. > >>>> b'{0} {1}'.format(23, b'foo') ?# accepts int, float, bytes, bool, None I don't see a use case for accepting bool or None. I hadn't thought about float, but are you really gonna need it? On-the-fly generation of CSS "'{0}em'.format(0.5)" or something like that, I guess? > > 23 foo > >>>> b'{0}'.format('foo') ?# raises TypeError for other types. Philip Eby has a use case for accepting str as long as the ascii codec in strict error mode works on the particular instances of str. Although I'm not sure he would consider a .format() method efficient enough, ISTR he wanted the compiler to convert literals. > > TypeError > > What method is invoked to convert the numbers to text? What encoding > is used to convert those numbers to text? How does this operation > avoid also converting the *bytes* object to text and then reencoding > it? OTOH, Nick, aren't you making this harder than it needs to be? After all, > Bytes are not text. Precisely. So bytes.format() need not handle *all* text-like manipulations, just protocol magic that puns ASCII-encoded text. If a bytes object is displayed sorta like text, then it *is* *all* bytes in the ASCII repertoire (not even the right half of Latin-1 is allowed). In bytes.format(), bytes are bytes, they don't get encoded, they just get interpolated into the bytes object being created. For other stuff, especially integers, if there is a conventional represention for it in ASCII, it *might* be an appropriate conversion for bytes.format() (but see above for my reservations about several common Python types). str (Unicode) might be converted via the ascii codec in strict errors mode, although the purist in me really would rather not go there. AFAICS, this handles all use cases presented so far. > The pedagogic cost of making it even harder than it already is to > convince people that bytes are not text would also need to be > considered. This bothers me quite a bit, but my sense is that practicality is going to beat purity (into a bloody pulp :-P) once again. From ncoghlan at gmail.com Fri May 27 11:27:54 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 27 May 2011 19:27:54 +1000 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. 
In-Reply-To: <87k4dc8r9j.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87k4dc8r9j.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, May 27, 2011 at 6:46 PM, Stephen J. Turnbull wrote: > ?> What method is invoked to convert the numbers to text? What encoding > ?> is used to convert those numbers to text? How does this operation > ?> avoid also converting the *bytes* object to text and then reencoding > ?> it? > > OTOH, Nick, aren't you making this harder than it needs to be? ?After > all, To me, the defining feature of str.format() over str.__mod__() is the ability for types to provide their own __format__ methods, rather than being limited to a predefined set of types known to the interpreter. If bytes were to reuse the same name, then I'd want to see similar flexibility. Now, a *different* bytes method (bytes.interpolate, perhaps?), limited to specific types may make sense, but such an alternative *shouldn't* be conflated with the text formatting API. However, proponents of such an addition need to clearly articulate their use cases and proposed solution in a PEP to make it clear that they aren't merely trying to perpetuate the bytes/text confusion that plagues 2.x 8-bit strings. We can almost certainly do better when it comes to constructing byte sequences from component parts, but simply saying "oh, just add a format() method to bytes objects" doesn't cut it, since the associated magic methods for str.format are all string based, and bytes interpolation also needs to address encoding issues for anything that isn't already a byte sequence. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From masklinn at masklinn.net Fri May 27 11:41:32 2011 From: masklinn at masklinn.net (Masklinn) Date: Fri, 27 May 2011 11:41:32 +0200 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: <87k4dc8r9j.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 2011-05-27, at 11:27 , Nick Coghlan wrote: > On Fri, May 27, 2011 at 6:46 PM, Stephen J. Turnbull wrote: >> > What method is invoked to convert the numbers to text? What encoding >> > is used to convert those numbers to text? How does this operation >> > avoid also converting the *bytes* object to text and then reencoding >> > it? >> >> OTOH, Nick, aren't you making this harder than it needs to be? After >> all, > > To me, the defining feature of str.format() over str.__mod__() is the > ability for types to provide their own __format__ methods, rather than > being limited to a predefined set of types known to the interpreter. > If bytes were to reuse the same name, then I'd want to see similar > flexibility. > > Now, a *different* bytes method (bytes.interpolate, perhaps?), limited > to specific types may make sense, but such an alternative *shouldn't* > be conflated with the text formatting API. > > However, proponents of such an addition need to clearly articulate > their use cases and proposed solution in a PEP to make it clear that > they aren't merely trying to perpetuate the bytes/text confusion that > plagues 2.x 8-bit strings. > > We can almost certainly do better when it comes to constructing byte > sequences from component parts, but simply saying "oh, just add a > format() method to bytes objects" doesn't cut it, since the associated > magic methods for str.format are all string based, and bytes > interpolation also needs to address encoding issues for anything that > isn't already a byte sequence. I don't see anything I could disagree with. Especially not in the last paragraph. 
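To make the distinction concrete, here is a minimal sketch of the kind of
narrow, type-limited bytes interpolation alluded to above (the 'interpolate'
helper is hypothetical, not an existing or proposed bytes method; it
deliberately accepts only bytes and numbers, and renders numbers as ASCII):

def interpolate(template, *args):
    # Fill b'{}' slots in an ASCII template with bytes or ASCII-encodable numbers.
    parts = template.split(b'{}')
    if len(parts) != len(args) + 1:
        raise ValueError('argument count does not match template slots')
    out = [parts[0]]
    for arg, tail in zip(args, parts[1:]):
        if isinstance(arg, (bytes, bytearray)):
            out.append(bytes(arg))
        elif isinstance(arg, (int, float)):
            out.append(str(arg).encode('ascii'))
        else:
            raise TypeError('only bytes and numbers are supported')
        out.append(tail)
    return b''.join(out)

interpolate(b'{} {}', 23, b'foo')   # -> b'23 foo'

Because the accepted types are fixed, no __format__-style extension
machinery and no encoding guesswork is involved, which is exactly the
trade-off being debated.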
From theller at ctypes.org Fri May 27 12:04:40 2011 From: theller at ctypes.org (Thomas Heller) Date: Fri, 27 May 2011 12:04:40 +0200 Subject: [Python-ideas] Threading hooks and disable gc per thread In-Reply-To: <4DCB228D.2010904@cheimes.de> References: <4DCB228D.2010904@cheimes.de> Message-ID: <4DDF7738.2050809@ctypes.org> Am 12.05.2011 01:58, schrieb Christian Heimes: > Hello, > > today I've spent several hours debugging a segfault in JCC [1]. JCC is a > framework to wrap Java code for Python. It's most prominently used in > PyLucene [2]. You can read more about my debugging in [3] > > With JCC every Python thread must be registered at the JVM through JCC. > An unattached thread, that accesses a wrapped Java object, leads to > errors and may even cause a segfault. Accessing also includes garbage > collection. A code line like > > a = {} > > or > "a b c".split() > > can segfault since the allocation of a dict or a bound method runs > through _PyObject_GC_New(), which may trigger a cyclic garbage > collection run. If the current thread isn't attached to the JVM but > triggers a gc.collect() with some Java objects in a cycle, the > interpreter crashes. It's quite complicated and hard to "fix" third > party tools to attach all threads created in the third party library. I have a somewhat similar problem and just noticed this thread. In our software, we have multiple threads, and we use a lot of COM objects. COM object also have the requirement that they must only be used in the same thread (in the same apartment, to be exact) that created them. This also applies to cleaning up with the garbage collector. Ok, when the com object is part of some Python structures that include reference cycles, then the cycle gc tries to clean up the ref cycle and cleans up the COM object. This can happen in ANY thread, and in some cases the program crashes or the thread hangs. Here is my idea to fix this from within Python: The COM objects, when created, keep the name of the currently executing thread. In the __del__ method, where the cleanup of the COM object happens by calling the COM .Release() method, a check is made if the current thread is the allowed one or not. If it is the wrong thread, the COM object is kept alive by appending it to some list. The list is stored in a global dictionary indexed by the thread name. The remaining goal is to clear the lists in the dict inside the valid thread - which is done on every creation of a COM object, on every destruction of a COM object, and in the CoUninitialize function that every thread using COM must call before it is ending. At least that's my plan. Maybe you can use a similar approach? Thomas From stephen at xemacs.org Fri May 27 12:20:24 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 27 May 2011 19:20:24 +0900 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: Message-ID: <87ipsw8mxj.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > On Fri, May 27, 2011 at 4:14 PM, INADA Naoki wrote: > > I love unicode and use unicode when I can use it. > > But this is a problem in the real world. > > For example, Python 2 is convenient for analyzing line based logs > > containing some different encodings. Where's the use case for bytes here? > > Python 3 > > ...deliberately makes that difficult because it is *wrong*. Nick, you should have stopped there. 
:-) I can see very little difference between Python 2 and Python 3 in this use case, except that Python 2 makes it much easier to write easily crashable programs. In both versions, the safe thing to do for such a program is either to slurp the whole log with open(log, encoding=, errors=) (that's Python 3 code; Python 2 makes this more tedious, in fact). But no need for reading as bytes in Python 3 visible here, move along, people! Alternatively, one could write a function that reads lines from the log as bytes, and tries different encodings for each line (perhaps interacting with the user) and eventually uses some default encoding and a nonfatal error handler to get *something*. This requires reading as bytes, but it's no easier to write in Python 2 AFAICS. Granted, such a function will not easily be portable between Python 2 and 3, but that's a different problem. > Binary files containing a mixture of encodings cannot be safely > treated as text. "Safety" is use-case-dependent. I suppose Inada-san considers using Python 2 strs to receive file input safe enough for his log analyzer. While we shouldn't encourage that (and either errors='ignore' or errors='surrogateescape' should be easy enough for him in the log analysis case[1]), I don't think we should demand GIGO with 100% fidelity in all use cases, either. Footnotes: [1] In new code. Again, a port of existing Python 2 code to Python 3 might not be trivial, depending on how he handles unexpected encodings and how pervasively they are manipulated in his program. From stephen at xemacs.org Fri May 27 13:07:42 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 27 May 2011 20:07:42 +0900 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: <87k4dc8r9j.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87fwo08kqp.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > To me, the defining feature of str.format() over str.__mod__() is the > ability for types to provide their own __format__ methods, Ah, so you object to the _spelling_, not the requested functionality. (At least, not all of it.) All is clear now! OK, I retract my suggestion, but I'll let you beat up on anybody who dredges it up in the future. Specifically, I think that calling it "bytes.format" (a) is discoverable and (b) it is not obvious to me that __format_bytes__ functionality for arbitrary types is a bad thing, although I personally have no use case and am unlikely to catch one for a while (thus at most I'm now -0, and could easily be persuaded to lower that). > bytes interpolation also needs to address encoding issues for > anything that isn't already a byte sequence. Sure, but my proposal here still stands: whatever the API is, and whatever types it supports, the assumption is that interpolation uses the conventional ASCII representation for the given type (and for interpolations implemented in stdlib there had better be universal agreement on what that convention is). 
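For the 'cat -n'-style use case behind this subthread, the
decode/process/encode approach recommended above can be sketched as follows;
it assumes ASCII-compatible input, with the surrogateescape error handler
(PEP 383) keeping any undecodable bytes round-trippable on output:

import sys

fin = open(sys.stdin.fileno(), 'r', encoding='ascii', errors='surrogateescape')
fout = open(sys.stdout.fileno(), 'w', encoding='ascii', errors='surrogateescape')
for n, line in enumerate(fin, 1):
    fout.write('{0:5d}\t{1}'.format(n, line))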
From steve at pearwood.info Fri May 27 13:21:00 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 27 May 2011 21:21:00 +1000 Subject: [Python-ideas] Suggestion: Integrate the script "pindent.py" as standard command for formatting pyhton code In-Reply-To: References: Message-ID: <201105272121.01078.steve@pearwood.info> On Thu, 26 May 2011 09:15:37 pm Serge Hulne wrote: > It is my feeling that without such a tool Python is inherently very > vulnerable to glitches occurring at editing time: I can't think of any language that is invulnerable to the errors you list. All languages are vulnerable to glitches occurring at edit time. Picking your second example: > 2. Tab key inadvertently hit. If you inadvertently hit the tab key in the middle of a line: n = le n(mylist) # oops, hit the tab key! do you expect it to keep working? No. Then why treat the start of the line any different? There might be some places that, *by chance*, an extra tab won't break the code: n = len( mylist) but you shouldn't rely on that. In general, you should expect ANY and EVERY mutation of source code could break your code, and avoid tools or practices that insert arbitrary changes you didn't intend. Don't let your cat walk on the keyboard while editing source code, don't put your code through a tool that turns text into fake Swedish, and don't use tools that mangle whitespace. It is commonsense really. There are broken tools out there -- especially web forum software -- that arbitrarily mutate whitespace in source code. Those tools are broken, and should be avoided. If you can't avoid them, you have my sympathy, but that's your problem, not Python's, and Python doesn't need to be integrated with a tool for fixing broken source code. -- Steven D'Aprano From ncoghlan at gmail.com Fri May 27 13:51:53 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 27 May 2011 21:51:53 +1000 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: <87fwo08kqp.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87k4dc8r9j.fsf@uwakimon.sk.tsukuba.ac.jp> <87fwo08kqp.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, May 27, 2011 at 9:07 PM, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > ?> To me, the defining feature of str.format() over str.__mod__() is the > ?> ability for types to provide their own __format__ methods, > > Ah, so you object to the _spelling_, not the requested functionality. > (At least, not all of it.) ?All is clear now! > > OK, I retract my suggestion, but I'll let you beat up on anybody who > dredges it up in the future. ?Specifically, I think that calling it > "bytes.format" (a) is discoverable and (b) it is not obvious to me > that __format_bytes__ functionality for arbitrary types is a bad > thing, although I personally have no use case and am unlikely to catch > one for a while (thus at most I'm now -0, and could easily be > persuaded to lower that). In the specific case of adding bytes.format(), it's the weight of the backing machinery that bothers me - the PEP 3101 implementation isn't small, and providing a parallel API for bytes without slowing down the existing string implementation would be problematic (code re-use would likely slow down the common case even further, while avoiding re-use would likely end up duplicating a lot of code). However, *if* a solid set of use cases for direct bytes interpolation can be identified (and that's a big if), then it may be possible to devise a narrower, more focused API that doesn't require such a heavy back end to support it. 
But the use cases have to come first, and ones that are better expressed
via techniques such as ASCII decoding with the surrogateescape error
handler to support round-tripping don't count.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From donspauldingii at gmail.com Fri May 27 15:35:58 2011
From: donspauldingii at gmail.com (Don Spaulding)
Date: Fri, 27 May 2011 08:35:58 -0500
Subject: [Python-ideas] Suggestion: Integrate the script "pindent.py" as standard command for formatting pyhton code
In-Reply-To: <201105272121.01078.steve@pearwood.info>
References: <201105272121.01078.steve@pearwood.info>
Message-ID: 

On Fri, May 27, 2011 at 6:21 AM, Steven D'Aprano wrote:

> and Python doesn't
> need to be integrated with a tool for fixing broken source code.
>

Doesn't it? I thought something like this was already integrated. At least,
since switching to Python, my source code looks a lot less broken.

I don't know about this "pindent" script, but don't take out whatever it is
in Python that makes my source code look so good.  :-P

From jimjjewett at gmail.com Fri May 27 16:01:32 2011
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 27 May 2011 10:01:32 -0400
Subject: [Python-ideas] Threading hooks and disable gc per thread
In-Reply-To: <4DDF7738.2050809@ctypes.org>
References: <4DCB228D.2010904@cheimes.de> <4DDF7738.2050809@ctypes.org>
Message-ID: 

On Fri, May 27, 2011 at 6:04 AM, Thomas Heller wrote:
> Here is my idea to fix this from within Python:
> The COM objects, when created, keep the name of the currently executing
> thread. In the __del__ method, where the cleanup of the COM object
> happens by calling the COM .Release() method, a check is made if the
> current thread is the allowed one or not.

Of course, this means that multiple COM objects in the same cycle become
uncollectable, which again argues for the __close__ idiom. (Just like
__del__ except that it can be run more than once, and if there are
multiples in a cycle, they are run in arbitrary order instead of deferred.)

Alternatively, you might get away with some wonky proxy objects as part of
the COM wrapping.

-jJ

From stephen at xemacs.org Fri May 27 17:18:39 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 28 May 2011 00:18:39 +0900
Subject: [Python-ideas] Suggestion: Integrate the script "pindent.py" as standard command for formatting pyhton code
In-Reply-To: 
References: <201105272121.01078.steve@pearwood.info>
Message-ID: <87boyo894g.fsf@uwakimon.sk.tsukuba.ac.jp>

Don Spaulding writes:

> At least, since switching to Python, my source code looks a lot
> less broken.

QOTW!

From ronaldoussoren at mac.com Fri May 27 13:28:51 2011
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Fri, 27 May 2011 13:28:51 +0200
Subject: [Python-ideas] Suggestion: Integrate the script "pindent.py" as standard command for formatting pyhton code
In-Reply-To: 
References: 
Message-ID: 

On 26 May, 2011, at 13:15, Serge Hulne wrote:
>
> It is my feeling that without such a tool Python is inherently very vulnerable to glitches occurring at editing time:
> 1. Copy / paste glitch that passes unnoticed, does not generate an exception but alters the logic of the program.
> 2. Tab key inadvertently hit.
> 3. Difficulty in assessing the target indentation level when a part of a bloc has to be pasted in a different part of the code.

You seem to be arguing for the addition of block delimiters to the language
(even if only in comments), you might want to try
"from __future__ import braces".

Ronald

From greg.ewing at canterbury.ac.nz Sat May 28 02:55:58 2011
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 28 May 2011 12:55:58 +1200
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To: References: Message-ID: <4DE0481E.7010005@canterbury.ac.nz> Nick Coghlan wrote: > The pedagogic cost of making it even harder than it already is to > convince people that bytes are not text would also need to be > considered. I think that boat was missed some time ago. If there were ever a serious intention to teach people that bytes are not text by limiting the feature set of bytes, it would have been better served by not giving bytes *any* features that assumed a particular encoding. As it is, bytes has quite a lot of features that implicitly treat it as ascii-encoded text: the literal and repr() forms, capitalize(), expandtabs(), lower(), splitlines(), swapcase(), title(), upper(), and all the is*() methods. Accepting all of that, and then saying "Oh, no, we couldn't possibly provide a format() method, because bytes are not text" seems a tad inconsistent. -- Greg From ncoghlan at gmail.com Sat May 28 03:16:14 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 28 May 2011 11:16:14 +1000 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: <4DE0481E.7010005@canterbury.ac.nz> References: <4DE0481E.7010005@canterbury.ac.nz> Message-ID: On Sat, May 28, 2011 at 10:55 AM, Greg Ewing wrote: > Nick Coghlan wrote: > >> The pedagogic cost of making it even harder than it already is to >> convince people that bytes are not text would also need to be >> considered. > > I think that boat was missed some time ago. If there were > ever a serious intention to teach people that bytes are not > text by limiting the feature set of bytes, it would have > been better served by not giving bytes *any* features that > assumed a particular encoding. > > As it is, bytes has quite a lot of features that implicitly > treat it as ascii-encoded text: the literal and repr() > forms, capitalize(), expandtabs(), lower(), splitlines(), > swapcase(), title(), upper(), and all the is*() methods. > > Accepting all of that, and then saying "Oh, no, we couldn't > possibly provide a format() method, because bytes are not > text" seems a tad inconsistent. Originally we didn't have all of that - more and more of it crept back in at the behest of several binary protocol folks (including me, if I recall correctly). The urllib.parse experience has convinced me that giving in to that pressure was a mistake. We went for a premature optimisation, and screwed up the bytes API as a result. Yes, there is a potential performance issue with the decode/process/encode model, but simple keeping a bunch of string methods in the bytes API was the wrong answer (and something that isn't actually all that useful in practice, for the reasons brought up in this and other recent threads). Perhaps it is time to resurrect the idea of an explicit 'ascii' type? Add a'' literals, support the full string API as well as the bytes API, deprecate all string APIs on bytes and bytearray objects. The other thing I have learned in trying to deal with some of these issues is that ASCII-encoded text really *is* special, compared to all other encodings, due to its widespread use in a multitude of networking protocols and other formats. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From greg.ewing at canterbury.ac.nz Sat May 28 04:00:13 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 28 May 2011 14:00:13 +1200 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. 
In-Reply-To: References: <4DE0481E.7010005@canterbury.ac.nz> Message-ID: <4DE0572D.5000506@canterbury.ac.nz> Nick Coghlan wrote: > Perhaps it is time to resurrect the idea of an explicit 'ascii' type? > Add a'' literals, support the full string API as well as the bytes > API, deprecate all string APIs on bytes and bytearray objects. That sounds like an idea worth pursuing. Maybe also introduce an x'...' literal for bytes at the same time, with a view to eventually deprecating and removing the b'...' syntax. I don't think I would remove *all* the string methods from bytes, only the ones that assume ascii encoding. Searching and replacing substrings etc. still makes sense on arbitrary bytes. How would ascii behave when mixed with unicode strings? Should it automatically coerce to unicode, or should an explicit decode() be required? -- Greg From ethan at stoneleaf.us Sat May 28 04:23:43 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 27 May 2011 19:23:43 -0700 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: <4DE0572D.5000506@canterbury.ac.nz> References: <4DE0481E.7010005@canterbury.ac.nz> <4DE0572D.5000506@canterbury.ac.nz> Message-ID: <4DE05CAF.9050603@stoneleaf.us> Greg Ewing wrote: > Nick Coghlan wrote: > >> Perhaps it is time to resurrect the idea of an explicit 'ascii' type? >> Add a'' literals, support the full string API as well as the bytes >> API, deprecate all string APIs on bytes and bytearray objects. > > That sounds like an idea worth pursuing. Maybe also introduce an > x'...' literal for bytes at the same time, with a view to eventually > deprecating and removing the b'...' syntax. > > I don't think I would remove *all* the string methods from bytes, > only the ones that assume ascii encoding. Searching and replacing > substrings etc. still makes sense on arbitrary bytes. > > How would ascii behave when mixed with unicode strings? Should it > automatically coerce to unicode, or should an explicit decode() > be required? And what happens when a char > 127 hits the ascii stream? As for unicode interoperation, I'm inclined to let it be implicit, since ascii directly overlaps unicode. Depending, of course, on the answer to the above question. ~Ethan~ From eric at trueblade.com Sat May 28 11:43:54 2011 From: eric at trueblade.com (Eric Smith) Date: Sat, 28 May 2011 05:43:54 -0400 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: <87k4dc8r9j.fsf@uwakimon.sk.tsukuba.ac.jp> <87fwo08kqp.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4DE0C3DA.4060005@trueblade.com> On 5/27/2011 7:51 AM, Nick Coghlan wrote: > In the specific case of adding bytes.format(), it's the weight of the > backing machinery that bothers me - the PEP 3101 implementation isn't > small, and providing a parallel API for bytes without slowing down the > existing string implementation would be problematic (code re-use would > likely slow down the common case even further, while avoiding re-use > would likely end up duplicating a lot of code). However, *if* a solid > set of use cases for direct bytes interpolation can be identified (and > that's a big if), then it may be possible to devise a narrower, more > focused API that doesn't require such a heavy back end to support it. In Python 2.x str.format() and unicode.format() share the same implementation, using the Objects/stringlib mechanism of #defines and multiple includes. So while you do get the compiled code included twice, there's only one source file that implements them both. 
I don't think there's any concern about performance issues. And Python 3.x has the exact same implementation, although it's only included for unicode strings. It would not be difficult to add .format() for bytes. There have been various discussions over the years of how to actually do that. I think the most recent one was to add an __bformat__ method. I'm not saying any of this is a good idea or desirable. I'm just saying it would be easy to do and wouldn't hurt the performance of unicode.format(). Eric. From ncoghlan at gmail.com Sat May 28 12:29:46 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 28 May 2011 20:29:46 +1000 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: <4DE0C3DA.4060005@trueblade.com> References: <87k4dc8r9j.fsf@uwakimon.sk.tsukuba.ac.jp> <87fwo08kqp.fsf@uwakimon.sk.tsukuba.ac.jp> <4DE0C3DA.4060005@trueblade.com> Message-ID: On Sat, May 28, 2011 at 7:43 PM, Eric Smith wrote: > There have been various discussions over the years of how to actually do > that. I think the most recent one was to add an __bformat__ method. Python 2.x was different, as the automatic unicode coercion meant class developers still only needed to provide __str__ (or __unicode__ if they wanted to return non-ASCII data). __bformat__ (and similar ideas) are somewhat different beasts due to the encoding issues involved. Those aren't insurmountable, but they're things that don't come up with pure unicode handling (2.x unicode, 3.x str) or data that is essentially assumed to be latin-1 encoded in many cases (2.x str) > I'm not saying any of this is a good idea or desirable. I'm just saying > it would be easy to do and wouldn't hurt the performance of > unicode.format(). I'm still not sure about that, since the 2.x str.format() pretty much ignores the associated encoding problems, and I don't believe perpetuating that behaviour would be appropriate for 3.x bytes. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Sat May 28 12:47:48 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 28 May 2011 20:47:48 +1000 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: <4DE05CAF.9050603@stoneleaf.us> References: <4DE0481E.7010005@canterbury.ac.nz> <4DE0572D.5000506@canterbury.ac.nz> <4DE05CAF.9050603@stoneleaf.us> Message-ID: On Sat, May 28, 2011 at 12:23 PM, Ethan Furman wrote: > Greg Ewing wrote: >> How would ascii behave when mixed with unicode strings? Should it >> automatically coerce to unicode, or should an explicit decode() >> be required? > > And what happens when a char > 127 hits the ascii stream? These are the kinds of questions that make it clear that the answer here is far from being as simple as merely adding more string methods to the existing bytes type. The underlying data model is simply *wrong* for working with bytes as if they were text. For a previous, more flexible, incarnation of this idea, Barry's post is the earlier record I found of the idea of a byte sequence oriented type that carried its encoding metadata along with it: http://mail.python.org/pipermail/python-dev/2010-June/100777.html However, supporting multi-byte codes (and other stateful codecs like ShiftJIS) poses problems for slicing operations (just as it does for us already in Unicode slicing). 
Hence the possibility of strictly limiting this to 7-bit ASCII - the main problem with most bytes-as-text suggestions is that they don't work for arbitrary subsets of the codecs available in the standard library and it generally isn't entirely clear which codecs will work and which ones won't. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From paul at colomiets.name Sun May 29 20:55:21 2011 From: paul at colomiets.name (Paul Colomiets) Date: Sun, 29 May 2011 21:55:21 +0300 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: <4DE0C3DA.4060005@trueblade.com> References: <87k4dc8r9j.fsf@uwakimon.sk.tsukuba.ac.jp> <87fwo08kqp.fsf@uwakimon.sk.tsukuba.ac.jp> <4DE0C3DA.4060005@trueblade.com> Message-ID: On Sat, May 28, 2011 at 12:43 PM, Eric Smith wrote: > > And Python 3.x has the exact same implementation, although it's only > included for unicode strings. It would not be difficult to add .format() > for bytes. > > There have been various discussions over the years of how to actually do > that. I think the most recent one was to add an __bformat__ method. Well, that's actually great idea I think. format method on bytes could produce some data which is not an ascii, and eventually became struct.pack on steroids. The struct.pack has plenty of problems: * unable to use named fields, which is usefull to describe big structures * all fields are fixed-length, which is unfortunate for today's trend of variable length integers * can't specify separators between fields I also use str(intvalue).encode('ascii') idiom a lot. So probably I'd suggest to have something like __bformat__ with format values somewhat similar to ones struct.pack has along with str-like ones for integers. Also it might be useful to have `!len` conversion for bytes fields, for easier encoding of length-prefixed strings. To show an example, here is how two-chunk png file can be encoded: (b"\x89PNG\r\n\x1A\n" b"{s1!len:>L}IHDR{s1}{crc1:>L}" b"{s2!len:>L}IDAT{s2}{crc2:>L}\0\0\0\0IEND".format( s1=section1, crc1=crc(section1), s2=section2, crc2=crc(section2))) -- Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Mon May 30 04:39:45 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 30 May 2011 11:39:45 +0900 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: <4DE0572D.5000506@canterbury.ac.nz> References: <4DE0481E.7010005@canterbury.ac.nz> <4DE0572D.5000506@canterbury.ac.nz> Message-ID: <877h989aj2.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > How would ascii behave when mixed with unicode strings? Should it > automatically coerce to unicode, Definitely not! Bytes are not text, and the programmer must say when they want those bytes decoded. The Python translator must not be asked to guess. > or should an explicit decode() be required? Simplest. But IMHO worth considering is an implicit coercion of Unicode to ascii via decode() with strict errors. Remember, Unicode is an invertible mapping of characters to abstract integers, which may be represented in various different ways, such as bytes, 32-bit words, or UTF-8. So in some sense there is no violation of the Unicode type here. 
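For concreteness, the strict round trip being discussed can already be spelled with nothing but the existing str/bytes API; a minimal sketch (the 'HELO ' literal is simply the example used in this thread):

command = 'HELO '                        # text known to be ASCII
wire = command.encode('ascii')           # str -> bytes; raises UnicodeEncodeError if not ASCII
assert wire.decode('ascii') == command   # bytes -> str; exact inverse, so the coercion is lossless

try:
    'H\u00c9LO '.encode('ascii')         # non-ASCII text is rejected, not guessed at
except UnicodeEncodeError:
    pass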
Sorry, I can't explain more clearly at the moment, but I have a strong sense that coercion (ASCII) bytes -> Unicode *changes* or maybe even "destroys" the type of the byte, while the coercion (ASCII) Unicode -> bytes takes an abstract type "Unicode" and refines to a concrete type "bytes". Among other things, this is always reversible. This takes into account the common usage of punning natural language encoded in ASCII on binary protocol magic numbers. Then one could write stuff like my_pipe.write('HELO ' + my_fqdn) while true pedants would of course write my_pipe.write(b'HELO ' + my_fqdn) This doesn't explain how to make it easy to ensure that my_fqdn is bytes, of course, and that makes me uneasy about whether this would actually be useful, or merely confusing. (However, there are use cases where it is claimed that 'HELO ' is needed both as str and as bytes.) From ncoghlan at gmail.com Mon May 30 06:45:10 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 30 May 2011 14:45:10 +1000 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: <877h989aj2.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4DE0481E.7010005@canterbury.ac.nz> <4DE0572D.5000506@canterbury.ac.nz> <877h989aj2.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, May 30, 2011 at 12:39 PM, Stephen J. Turnbull wrote: > (However, there are use > cases where it is claimed that 'HELO ' is needed both as str and as > bytes.) My current opinion is that all of this still needs more experimentation outside the core before we start fiddling any further with the builtins (we blinked once in the lead-up to 3.0 by allowing bytes and bytearray to retain a lot of string methods that assume an ASCII compatible encoding, and I now have my doubts about the wisdom of even that step). I don't have a good answer on how to deal with the real world situations where the *use case* blurs the bytes/text distinction (typically by embedding ASCII text inside an otherwise binary protocol), and given the potential to backslide into the bad old days of 8-bit strings, I'm not prepared to guess, either. 3.x has largely cleared the decks to allow a better solution to evolve in this space by making it harder to blur the line accidentally, and decode()/manipulate/encode() already nicely covers many stateless use cases. If it turns out we need another type, or some other API, to deal gracefully with any use cases where that isn't enough, then so be it. However, I think we need to let the status quo run for a while longer and see what people actually using the current types in production come up with. The bytes/text division in Python 3 is by far the biggest conceptual change between the two languages, so it's going to take some time before we can figure out how many of the problems encountered are real issues with the split model not covering some use cases and how many are just people (including us) taking time to get used to the sharp division between the two worlds. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From raymond.hettinger at gmail.com Mon May 30 06:58:52 2011 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sun, 29 May 2011 21:58:52 -0700 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: <4DE0481E.7010005@canterbury.ac.nz> <4DE0572D.5000506@canterbury.ac.nz> <877h989aj2.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On May 29, 2011, at 9:45 PM, Nick Coghlan wrote: > On Mon, May 30, 2011 at 12:39 PM, Stephen J. 
Turnbull
> wrote:
>> (However, there are use
>> cases where it is claimed that 'HELO ' is needed both as str and as
>> bytes.)
>
> My current opinion is that all of this still needs more
> experimentation outside the core before we start fiddling any further
> with the builtins (we blinked once in the lead-up to 3.0 by allowing
> bytes and bytearray to retain a lot of string methods that assume an
> ASCII compatible encoding, and I now have my doubts about the wisdom
> of even that step). I don't have a good answer on how to deal with the
> real world situations where the *use case* blurs the bytes/text
> distinction (typically by embedding ASCII text inside an otherwise
> binary protocol), and given the potential to backslide into the bad
> old days of 8-bit strings, I'm not prepared to guess, either.

+1

Raymond

From tjreedy at udel.edu  Mon May 30 22:04:36 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 30 May 2011 16:04:36 -0400
Subject: [Python-ideas] Bytes formatting (was Re: Adding 'bytes' as alias for 'latin_1' codec)
In-Reply-To:
References: <87k4dc8r9j.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

Changing the subject to what it has actually become.

On 5/27/2011 5:27 AM, Nick Coghlan wrote:
> We can almost certainly do better when it comes to constructing byte
> sequences from component parts, but simply saying "oh, just add a
> format() method to bytes objects" doesn't cut it, since the associated
> magic methods for str.format are all string based,

STRING FORMATTING

From a modern and Python viewpoint, string formatting is about
interpolating text representations of objects into a text template. By
default, the text representation is str(object).

Exception 1. str.format has an optional conversion specifier "!s/r/a" to
specify repr(object) or ascii(object) instead of str(object). (It can also
be used to override exception 2.) This is not relevant to bytes formatting.

Exception 2. str.format, like % formatting, does special processing of
numbers. Electronic computing was originally used only to compute numbers,
and text formatting was originally about formatting numbers, usually in
tables, with optional text decoration. That is why the maximum field size
for string interpolation is still called 'precision'. There are numerous
variations in number formatting, and most of the complication of format
specifications arises therefrom.

BYTES FORMATTING

If the desired result consists entirely of text encoded with one encoding,
the current recommended method is to construct the text and encode. I think
this is the proper method and do not think that anything we add should be
aimed at this use case.

There are two other current methods to assemble bytes from pieces. One is
concatenation; it has the same advantages and disadvantages as string
concatenation. Another, overlooked in the current discussion so far, is
in-place editing of a bytearray by index and slice assignment. It has the
disadvantage of having to know the correct indexes and slice points.

If we add another bytes formatting function or method, I think it should be
about interpolating bytes into a bytes template. The use cases would be
anything other than mono-encoded text -- text with multiple encodings or
non-text bytes possibly intermixed with encoded text.

> and bytes interpolation also needs to address encoding issues
> for anything that isn't already a byte sequence.

As indicated above, I disagree if 'encoding' means 'text encoding'. Let
.encode handle encoding issues.
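To make the two existing assembly methods concrete before the proposal itself, here is a small, purely illustrative sketch that builds the same length-prefixed record first by concatenation and then by bytearray slice assignment (the b'REC' tag and field layout are invented for the example):

import struct

payload = b'spam'

# Concatenation of bytes pieces.
record1 = b'REC' + struct.pack('>H', len(payload)) + payload

# In-place editing of a preallocated bytearray by index and slice
# assignment; the writer has to know the correct offsets.
record2 = bytearray(3 + 2 + len(payload))
record2[0:3] = b'REC'
record2[3:5] = struct.pack('>H', len(payload))
record2[5:] = payload

assert bytes(record2) == record1    # both give b'REC\x00\x04spam'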
PROPOSAL

A bytes template uses b'{' and b'}' to mark interpolation fields and other
ascii bytes within as needed. It uses the ascii equivalent of the string
field_name spec. It does not have a conversion spec. The format_spec should
have the minimum needed for existing public protocols. How much more is up
for discussion. We need use cases.

One possibility to keep in mind is that a bytes template could be
constructed by an ascii-compatible encoding of formatted text. Specs for
bytes fields can be protected in a text template by doubling the braces.

>>> '{} {{byte-field-spec}}'.format(1).encode()
b'1 {byte-field-spec}'

A major issue is what to do with numbers. Sometimes they need to be ascii
encoded, sometimes binary encoded. The baseline is to do nothing extra and
require all args to be bytes. I think this may be appropriate for floats as
they are seldom specifically used in protocols. I think the same may be
true for ints with signs. So I think we mainly need to consider counts
(unsigned ints) for possible exceptional processing.

Option 0. As stated, no special number specs.

Option 1. Use a subset of the current int spec to produce ascii encodings;
use struct.pack for binary encodings. (How many of the current integer
presentation types would be needed?)

Option 2. Use an adaptation of the struct.pack mini-language to produce
binary encodings; use encoded str.format for ascii encodings. (The latter
might be done as part of a text-to-bytes-template process as indicated
above.)

Option 3. Combine options 1 and 2. This might best be done by replacing the
omitted 'conversion' field with a 'number-encoding' field, b'!a' or b'!b',
to indicate ascii or binary conversion and corresponding interpretation of
the format spec. (In other words, do not try to combine the number to text
and number to binary mini-languages, but add a 'prefix' to specify which is
being used.)

--
Terry Jan Reedy

From guido at python.org  Mon May 30 22:27:05 2011
From: guido at python.org (Guido van Rossum)
Date: Mon, 30 May 2011 13:27:05 -0700
Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
In-Reply-To:
References: <4DE0481E.7010005@canterbury.ac.nz>
	<4DE0572D.5000506@canterbury.ac.nz>
	<877h989aj2.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Sun, May 29, 2011 at 9:45 PM, Nick Coghlan wrote:
> On Mon, May 30, 2011 at 12:39 PM, Stephen J. Turnbull
> wrote:
>> (However, there are use
>> cases where it is claimed that 'HELO ' is needed both as str and as
>> bytes.)
>
> My current opinion is that all of this still needs more
> experimentation outside the core before we start fiddling any further
> with the builtins (we blinked once in the lead-up to 3.0 by allowing
> bytes and bytearray to retain a lot of string methods that assume an
> ASCII compatible encoding, and I now have my doubts about the wisdom
> of even that step). I don't have a good answer on how to deal with the
> real world situations where the *use case* blurs the bytes/text
> distinction (typically by embedding ASCII text inside an otherwise
> binary protocol), and given the potential to backslide into the bad
> old days of 8-bit strings, I'm not prepared to guess, either.
>
> 3.x has largely cleared the decks to allow a better solution to evolve
> in this space by making it harder to blur the line accidentally, and
> decode()/manipulate/encode() already nicely covers many stateless use
> cases. If it turns out we need another type, or some other API, to
> deal gracefully with any use cases where that isn't enough, then so be
> it.
However, I think we need to let the status quo run for a while > longer and see what people actually using the current types in > production come up with. The bytes/text division in Python 3 is by far > the biggest conceptual change between the two languages, so it's going > to take some time before we can figure out how many of the problems > encountered are real issues with the split model not covering some use > cases and how many are just people (including us) taking time to get > used to the sharp division between the two worlds. Well said, Nick. We ought to attempt to live with the current situation for quite a bit longer before stirring the pot again. My feeling is that one of the main reasons why this topic keeps coming up is simply that it is different from Python 2 -- this is "the year of Python 3" so more people than ever before are discovering the differences between Python 2 and 3. Most people's minds probably haven't switched over, and the solutions and attitudes that worked in Python 2 don't always work so well in Python 3. Let's also remember that while Python is not exactly blazing a new trail here, it is also not following the most conservative course. Most languages of Python's vintage or older are still using a model that blurs the line between text and binary data, representing Unicode text as bytes that happen to be encoded in some encoding. Even if the language assumes a default encoding this doesn't mean that all data manipulated is actually text encoded in that encoding -- it just means that you may get nonsense when you use text operations on data that uses some other encoding, just as you get nonsense when you use text operations on binary data (e.g. using readlines() on a JPEG file). Python lets you do this too, to some extent, with some of the text operations on bytes data, and this is definitely a compromise. I hope that we have built in just enough friction to remind people that this is not the best way to deal with text most of the time, while still allowing advanced users who are writing e.g. parsers for Internet protocols to stay at the bytes layer at a reasonable cost. Personally I think we got this close enough to right that we won't having to rethink the whole thing, even if small tweaks might be possible; but there's no need to rush. -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Tue May 31 02:38:07 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 31 May 2011 12:38:07 +1200 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: <877h989aj2.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4DE0481E.7010005@canterbury.ac.nz> <4DE0572D.5000506@canterbury.ac.nz> <877h989aj2.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4DE4386F.1030905@canterbury.ac.nz> Stephen J. Turnbull wrote: > Greg Ewing writes: > > > How would ascii behave when mixed with unicode strings? Should it > > automatically coerce to unicode, > > Definitely not! Bytes are not text, and the programmer must say when > they want those bytes decoded. But the proposed 'ascii' type *is* text, though. Whether it's a good idea to auto-coerce I'm not sure, but it's not obviously wrong to do so. 
-- Greg From python at mrabarnett.plus.com Tue May 31 04:11:59 2011 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 31 May 2011 03:11:59 +0100 Subject: [Python-ideas] Bytes formatting (was Re: Adding 'bytes' as alias for 'latin_1' codec) In-Reply-To: References: <87k4dc8r9j.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4DE44E6F.50708@mrabarnett.plus.com> On 30/05/2011 21:04, Terry Reedy wrote: > Changing the subject to what it has actually become. > PROPOSAL > > A bytes template uses b'{' and b'}' to mark interpolation fields and > other ascii bytes within as needed. It uses the ascii equivalent of the > string field_name spec. It does not have a conversion spec. The > format_spec should have the minimum needed for existing public > protocols. How much more is up for discussion. We need use cases. > > One possibility to keep in mind is that a bytes template could > constructed by an ascii-compatible encoding of formatted text. Specs for > bytes fields can be protected in a text template by doubling the braces. > > >>> '{} {{byte-field-spec}}'.format(1).encode() > b'1 {byte-field-spec}' > > A major issue is what to do with numbers. Sometimes they needed to be > ascii encoded, sometime binary encoded. The baseline is to do nothing > extra and require all args to be bytes. I think this may be appropriate > for floats as they are seldom specifically used in protocols. I think > the same may be true for ints with signs. So I think we mainly need to > consider counts (unsigned ints) for possible exceptional processing. > > Option 0. As stated, no special number specs. > > Option 1. Use a subset of the current int spec to produce ascii > encodings; use struct.pack for binary encodings. (How many of the > current integer presentation types would be needed?) > > Option 2. Use an adaptation of the struct.pack mini-language to produce > binary encodings; use encoded str.format for ascii encodings. (The > latter might be done as part of a text-to-bytes-template process as > indicated above.) > > Option 3. Combine options 1 and 2. This might best be done by replacing > the omitted 'conversion' field with a 'number-encoding' field, b'!a' or > b'!b', to indicate ascii or binary conversion and corresponding > interpretation of the format spec. (In other words, do not try to > combine the number to text and number to binary mini-languages, but add > a 'prefix' to specify which is being used.) > Perhaps something like this: # Format int as byte. b"{:b}".format(128) returns b"\x80" # Format int as double-byte. b"{:2b}".format(0x100) returns b"\x00\x01" or b"\x01\x00" # Format int as double-byte, little-endian. b"{:<2b}".format(0x100) returns b"\x00\x01" # Format int as double-byte, big-endian. b"{:>2b}".format(0x100) returns b"\x01\x00" # Format list of ints as signed bytes. b"{:s}".format([1, -2, 3]) returns b"\x01\xFE\x03" # Format list of ints as unsigned bytes. b"{:u}".format([1, 254, 3]) returns b"\x01\xFE\x03" # Format ASCII-only string as bytes. b"{:a}".format("abc") returns b"abc" From stephen at xemacs.org Tue May 31 07:51:47 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 31 May 2011 14:51:47 +0900 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: <4DE4386F.1030905@canterbury.ac.nz> References: <4DE0481E.7010005@canterbury.ac.nz> <4DE0572D.5000506@canterbury.ac.nz> <877h989aj2.fsf@uwakimon.sk.tsukuba.ac.jp> <4DE4386F.1030905@canterbury.ac.nz> Message-ID: <87ipsr76z0.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > Stephen J. 
Turnbull wrote: > > Greg Ewing writes: > > > > > How would ascii behave when mixed with unicode strings? Should it > > > automatically coerce to unicode, > > > > Definitely not! Bytes are not text, and the programmer must say when > > they want those bytes decoded. > > But the proposed 'ascii' type *is* text, though. If it's intended that the 'ascii' type *be* text, I don't see the point. It *is* Unicode (with a restricted range), and no coercion is necessary between str and 'ascii', just a change of representation. This can be done completely transparently[1], no need for a new type, except that some effort on the part of implementer can be saved by imposing ongoing annoyance on the application programmer. But even as a separate type, 'ascii' still can't mix with bytes safely, for the same reason that str can't mix with bytes: 'ascii' and str have a known fixed encoding (Unicode), and bytes have an unknown, variable encoding (possibly the non-encoding 'binary'). YAGNI... Footnotes: [1] For some use cases it might be useful to allow specifying the representation in advance, as a micro-optimization. From greg.ewing at canterbury.ac.nz Tue May 31 09:32:18 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 31 May 2011 19:32:18 +1200 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: <87ipsr76z0.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4DE0481E.7010005@canterbury.ac.nz> <4DE0572D.5000506@canterbury.ac.nz> <877h989aj2.fsf@uwakimon.sk.tsukuba.ac.jp> <4DE4386F.1030905@canterbury.ac.nz> <87ipsr76z0.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4DE49982.3090208@canterbury.ac.nz> Stephen J. Turnbull wrote: > But even as a separate type, 'ascii' still can't mix with bytes > safely, Yes, it can, because it's also bytes. :-) If you're using the special ascii type at all, rather than an ordinary str, it's precisely because you want to mix it with bytes. Making that part hard would defeat the purpose, -- Greg From ncoghlan at gmail.com Tue May 31 10:24:30 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 31 May 2011 18:24:30 +1000 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: <4DE49982.3090208@canterbury.ac.nz> References: <4DE0481E.7010005@canterbury.ac.nz> <4DE0572D.5000506@canterbury.ac.nz> <877h989aj2.fsf@uwakimon.sk.tsukuba.ac.jp> <4DE4386F.1030905@canterbury.ac.nz> <87ipsr76z0.fsf@uwakimon.sk.tsukuba.ac.jp> <4DE49982.3090208@canterbury.ac.nz> Message-ID: On Tue, May 31, 2011 at 5:32 PM, Greg Ewing wrote: > Stephen J. Turnbull wrote: > >> But even as a separate type, 'ascii' still can't mix with bytes >> safely, > > Yes, it can, because it's also bytes. :-) > > If you're using the special ascii type at all, rather > than an ordinary str, it's precisely because you want > to mix it with bytes. Making that part hard would > defeat the purpose, Indeed, the specific use case here is working with ASCII snippets embedded within ASCII compatible encodings (or otherwise demarcated from the 8-bit data). As I stated elsewhere, we still need more usage of Python 3 in production before we can find out whether or not this is a significant enough use case to require builtin support, or if third party libraries will be up to the task. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From stephen at xemacs.org Tue May 31 11:08:06 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 31 May 2011 18:08:06 +0900 Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. 
In-Reply-To: <4DE49982.3090208@canterbury.ac.nz> References: <4DE0481E.7010005@canterbury.ac.nz> <4DE0572D.5000506@canterbury.ac.nz> <877h989aj2.fsf@uwakimon.sk.tsukuba.ac.jp> <4DE4386F.1030905@canterbury.ac.nz> <87ipsr76z0.fsf@uwakimon.sk.tsukuba.ac.jp> <4DE49982.3090208@canterbury.ac.nz> Message-ID: <87ei3f6xvt.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > Stephen J. Turnbull wrote: > > > But even as a separate type, 'ascii' still can't mix with bytes > > safely, > > Yes, it can, because it's also bytes. :-) To the extent that's safe, you may as well just use str and force encoding with the ascii codec and strict errors (as I suggested earlier). AFAICS, the argument that the visual signal of the special literal syntax helps is bogus. It doesn't help with variables; variables aren't typed in Python. It's still just as possible to type a'?????', although it might make the mistake a little more visible. And in most cases, the use case for this feature will be very stylized, with a very small vocabulary of ASCII puns, written as literals at the point of combination with a bytes object. Anything else I can think of should be handled as text, via conversion to str. I just don't see a use case for an 'ascii' type, vs. coercing str to bytes and raising an error if the str is not all-ASCII. > If you're using the special ascii type at all, rather > than an ordinary str, it's precisely because you want > to mix it with bytes. Making that part hard would > defeat the purpose, Indeed. Most alleged use cases for "mixing" *should* be made hard to do by operating on bytes directly. Cf. the mixed-encoding log file example. From janssen at parc.com Tue May 31 18:16:46 2011 From: janssen at parc.com (Bill Janssen) Date: Tue, 31 May 2011 09:16:46 PDT Subject: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec. In-Reply-To: References: <4DE0481E.7010005@canterbury.ac.nz> Message-ID: <79306.1306858606@parc.com> Nick Coghlan wrote: > Perhaps it is time to resurrect the idea of an explicit 'ascii' type? > Add a'' literals, support the full string API as well as the bytes > API, deprecate all string APIs on bytes and bytearray objects. The > other thing I have learned in trying to deal with some of these issues > is that ASCII-encoded text really *is* special, compared to all other > encodings, due to its widespread use in a multitude of networking > protocols and other formats. I like the deprecations you suggest, but I'd prefer to see a more general solution: the 'str' type extended so that it had two possible representations for strings, the current format and an "encoded" format, which would be kept as an array of bytes plus an encoding. It would transcode only as necessary -- for example, the 're' module might require the current Unicode encoding. An explicit method would be added to allow the user to force transcoding. This would complicate life at the C level, to be sure. Though, perhaps not so much, given the proper macrology. 
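A rough Python-level sketch of the kind of dual-representation object being described, decoding only when a text operation actually needs it (the class name and details are invented for illustration; the real suggestion concerns the C implementation of str):

class EncodedText:
    # Illustrative only: keep the raw bytes plus their encoding and
    # transcode lazily, the first time the text form is required.
    def __init__(self, raw, encoding):
        self._raw = raw
        self._encoding = encoding
        self._text = None

    def force(self):
        # Explicitly force transcoding to the text representation.
        if self._text is None:
            self._text = self._raw.decode(self._encoding)
        return self._text

    def __str__(self):
        return self.force()

    def encode(self, encoding=None):
        # Asking for the original encoding back needs no transcoding.
        if encoding is None or encoding == self._encoding:
            return self._raw
        return self.force().encode(encoding)

s = EncodedText(b'caf\xc3\xa9', 'utf-8')
assert s.encode() == b'caf\xc3\xa9'    # untouched bytes round-trip
assert str(s) == 'caf\xe9'             # decoded only when required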
Bill

From tjreedy at udel.edu  Tue May 31 20:08:33 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 31 May 2011 14:08:33 -0400
Subject: [Python-ideas] Bytes formatting (was Re: Adding 'bytes' as alias for 'latin_1' codec)
In-Reply-To:
References: <4DE0481E.7010005@canterbury.ac.nz>
	<4DE0572D.5000506@canterbury.ac.nz>
	<877h989aj2.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4DE4386F.1030905@canterbury.ac.nz>
	<87ipsr76z0.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4DE49982.3090208@canterbury.ac.nz>
Message-ID:

On 5/31/2011 4:24 AM, Nick Coghlan wrote:
> On Tue, May 31, 2011 at 5:32 PM, Greg Ewing
>> If you're using the special ascii type at all, rather
>> than an ordinary str, it's precisely because you want
>> to mix it with bytes. Making that part hard would
>> defeat the purpose,
>
> Indeed, the specific use case here is working with ASCII snippets
> embedded within ASCII compatible encodings (or otherwise demarcated
> from the 8-bit data).

My proposal for a function that interpolates bytes into bytes covers this
case. There is no need for a new class at all. I agree that experience and
experimentation is needed before adding anything to the stdlib. But here is
a baseline version in Python:

from itertools import zip_longest
import re

field = re.compile(b'{}')

def bformat(template, *inserts):
    temlits = re.split(field, template)  # template literals
    res = bytearray()
    for t,i in zip_longest(temlits, inserts, fillvalue=b''):
        res.extend(t)
        res.extend(i)
    return res

print(bformat(b'xxx{}yyy{}zzz', b'help', b'me'))
# bytearray(b'xxxhelpyyymezzz')

This is, of course, not limited to the ascii subset of bytes.

print(bformat(b'xx\xaa{}yy\xbb{}zzz', b'h\xeeelp', b'm\xeee'))
#bytearray(b'xx\xaah\xeeelpyy\xbbm\xeeezzz')

The next step would be to change the field re to allow a field spec between
{} and add capturing parens so that re.split keeps the field specs. Then
use those to format the inserted bytes or, later, ints.

--
Terry Jan Reedy

From tjreedy at udel.edu  Tue May 31 20:18:03 2011
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 31 May 2011 14:18:03 -0400
Subject: [Python-ideas] Bytes formatting (was Re: Adding 'bytes' as alias for 'latin_1' codec)
In-Reply-To: <4DE44E6F.50708@mrabarnett.plus.com>
References: <87k4dc8r9j.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4DE44E6F.50708@mrabarnett.plus.com>
Message-ID:

On 5/30/2011 10:11 PM, MRAB wrote:
> On 30/05/2011 21:04, Terry Reedy wrote:
>> Option 3. Combine options 1 and 2. This might best be done by replacing
>> the omitted 'conversion' field with a 'number-encoding' field, b'!a' or
>> b'!b', to indicate ascii or binary conversion and corresponding
>> interpretation of the format spec. (In other words, do not try to
>> combine the number to text and number to binary mini-languages, but add
>> a 'prefix' to specify which is being used.)

Unless someone has a better idea of how to combine than I do ;-).

> Perhaps something like this:
>
> # Format int as byte.
> b"{:b}".format(128) returns b"\x80"
>
> # Format int as double-byte.
> b"{:2b}".format(0x100) returns b"\x00\x01" or b"\x01\x00"
>
> # Format int as double-byte, little-endian.
> b"{:<2b}".format(0x100) returns b"\x00\x01"
>
> # Format int as double-byte, big-endian.
> b"{:>2b}".format(0x100) returns b"\x01\x00"
>
> # Format list of ints as signed bytes.
> b"{:s}".format([1, -2, 3]) returns b"\x01\xFE\x03"
>
> # Format list of ints as unsigned bytes.
> b"{:u}".format([1, 254, 3]) returns b"\x01\xFE\x03"
>
> # Format ASCII-only string as bytes.
> b"{:a}".format("abc") returns b"abc"

Interesting.
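For comparison, several of those conversions already have direct spellings with existing tools (Python 3.2 int.to_bytes, bytes(), and struct); listing them next to the proposed specs, purely as a reference point:

import struct

(128).to_bytes(1, 'big')         # b'\x80'            ~ proposed {:b}
(0x100).to_bytes(2, 'little')    # b'\x00\x01'        ~ proposed {:<2b}
(0x100).to_bytes(2, 'big')       # b'\x01\x00'        ~ proposed {:>2b}
struct.pack('3b', 1, -2, 3)      # b'\x01\xfe\x03'    ~ proposed {:s}
bytes([1, 254, 3])               # b'\x01\xfe\x03'    ~ proposed {:u}
'abc'.encode('ascii')            # b'abc'             ~ proposed {:a}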
The core ideas of my proposal are

* There are bytes construction cases not sensibly handled by text
interpolation followed by encoding. Bytes concatenation and bytearray
manipulation may be awkward, or follow patterns that can usefully be
captured in a new function.

* Bytes interpolation should only deal with bytes and maybe ints and have
nothing to do with text encoding.

* Design details should be based on use cases and experimentation with
suggestions such as the above by people who would be the users of such a
function. Experimental functions should be uploaded to pypi.

--
Terry Jan Reedy
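As one possible experiment along the lines sketched above, the baseline bformat can be extended so that whatever appears between b'{' and b'}' is handed to struct.pack for int inserts, while bytes inserts are copied as-is. Everything here is illustrative: the spec syntax is invented, not a worked-out design.

import re
import struct
from itertools import zip_longest

# Experimental sketch only: a capturing group keeps the field specs,
# ints are packed with struct using that spec, bytes are inserted as-is.
field = re.compile(br'\{([^{}]*)\}')

def bformat2(template, *inserts):
    parts = field.split(template)        # literal, spec, literal, spec, ..., literal
    literals, specs = parts[::2], parts[1::2]
    res = bytearray()
    for lit, spec, ins in zip_longest(literals, specs, inserts, fillvalue=b''):
        res.extend(lit)
        if isinstance(ins, int):
            res.extend(struct.pack(spec.decode('ascii'), ins))
        else:
            res.extend(ins)
    return bytes(res)

# A length-prefixed chunk: 4-byte big-endian count, ascii tag, raw payload.
payload = b'\x01\x02\x03'
print(bformat2(b'{>L}IDAT{}', len(payload), payload))
# b'\x00\x00\x00\x03IDAT\x01\x02\x03'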