From paul at colomiets.name Tue Oct 1 21:17:11 2013 From: paul at colomiets.name (Paul Colomiets) Date: Tue, 1 Oct 2013 22:17:11 +0300 Subject: [Python-ideas] pprint in displayhook In-Reply-To: References: <0BFAAF4A-F5C8-48FA-9C82-1B60D164A033@gmail.com> Message-ID: Hi, On Sun, Sep 29, 2013 at 11:38 PM, Serhiy Storchaka wrote: > > What should be changed in pprint? > Would be nice if it support custom types. Just my 2 cents -- Paul From robert.kern at gmail.com Tue Oct 1 22:00:27 2013 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 01 Oct 2013 21:00:27 +0100 Subject: [Python-ideas] pprint in displayhook In-Reply-To: References: <0BFAAF4A-F5C8-48FA-9C82-1B60D164A033@gmail.com> Message-ID: On 2013-10-01 20:17, Paul Colomiets wrote: > Hi, > > On Sun, Sep 29, 2013 at 11:38 PM, Serhiy Storchaka wrote: >> >> What should be changed in pprint? > > Would be nice if it support custom types. For what it's worth, I would like to point out that IPython uses an adaptation of Armin Ronacher's pretty.py for pretty-printing as the default displayhook. It is a nice design that supports custom types after-the-fact. https://github.com/ipython/ipython/blob/master/IPython/lib/pretty.py Armin's original code: http://dev.pocoo.org/hg/sandbox/file/tip/pretty/pretty.py -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From ncoghlan at gmail.com Wed Oct 2 01:20:34 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 2 Oct 2013 09:20:34 +1000 Subject: [Python-ideas] pprint in displayhook In-Reply-To: References: <0BFAAF4A-F5C8-48FA-9C82-1B60D164A033@gmail.com> Message-ID: On 2 Oct 2013 05:45, "Paul Colomiets" wrote: > > Hi, > > On Sun, Sep 29, 2013 at 11:38 PM, Serhiy Storchaka wrote: > > > > What should be changed in pprint? > > > > Would be nice if it support custom types. Fixing pprint to allow customisation was a key part of the rationale for functools.singledispatch. I guess Lukasz just hasn't had time to work on the follow-up patch to refactor the pprint module (or else I just missed it on the tracker, which is entirely plausible). Cheers, Nick. > > Just my 2 cents > > -- > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed Oct 2 02:56:45 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 2 Oct 2013 10:56:45 +1000 Subject: [Python-ideas] pprint in displayhook In-Reply-To: References: <0BFAAF4A-F5C8-48FA-9C82-1B60D164A033@gmail.com> Message-ID: <20131002005644.GI7989@ando> On Sun, Sep 29, 2013 at 11:38:30PM +0300, Serhiy Storchaka wrote: > 28.09.13 07:17, Raymond Hettinger ???????(??): > >This might be a reasonable idea if pprint were in better shape. > >I think substantial work needs to be done on it, before it would > >be worthy of becoming the default method of display. > > What should be changed in pprint? I would like to see pprint be smarter about printing lists and dicts. At the moment, a long list is either printed all on one line, like the default display, or one item per line. This can end up as one long, narrow column, which is worse than the default. I'd like to see it be smarter about using multiple columns. E.g. pprint([1, 2, 3, ... 1000]) rather than this: [1, 2, 3, ... 
998, 999, 1000] something like this: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000] -- Steven From robert.kern at gmail.com Wed Oct 2 17:31:58 2013 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 02 Oct 2013 16:31:58 +0100 Subject: [Python-ideas] pprint in displayhook In-Reply-To: <20131002005644.GI7989@ando> References: <0BFAAF4A-F5C8-48FA-9C82-1B60D164A033@gmail.com> <20131002005644.GI7989@ando> Message-ID: On 2013-10-02 01:56, Steven D'Aprano wrote: > On Sun, Sep 29, 2013 at 11:38:30PM +0300, Serhiy Storchaka wrote: >> 28.09.13 07:17, Raymond Hettinger ???????(??): >>> This might be a reasonable idea if pprint were in better shape. >>> I think substantial work needs to be done on it, before it would >>> be worthy of becoming the default method of display. >> >> What should be changed in pprint? > > I would like to see pprint be smarter about printing lists and dicts. At > the moment, a long list is either printed all on one line, like the > default display, or one item per line. This can end up as one long, > narrow column, which is worse than the default. I'd like to see it be > smarter about using multiple columns. > > E.g. pprint([1, 2, 3, ... 1000]) > > rather than this: > > [1, > 2, > 3, > ... > 998, > 999, > 1000] > > something like this: > > [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, > ... > 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000] As someone who has used pretty-printing as their default displayhook for a decade now via IPython, I have to say that this case happens much less often than one might expect. It *is* irritating the rare times it does come up, but less so than what I expect we would see from the false positives of a more intelligent algorithm. But I withhold final judgement until I see the actual results of such an algorithm. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From paul at colomiets.name Wed Oct 2 22:20:47 2013 From: paul at colomiets.name (Paul Colomiets) Date: Wed, 2 Oct 2013 23:20:47 +0300 Subject: [Python-ideas] pprint in displayhook In-Reply-To: References: <0BFAAF4A-F5C8-48FA-9C82-1B60D164A033@gmail.com> Message-ID: Hi, On Wed, Oct 2, 2013 at 2:20 AM, Nick Coghlan wrote: > Fixing pprint to allow customisation was a key part of the rationale for > functools.singledispatch. I guess Lukasz just hasn't had time to work on the > follow-up patch to refactor the pprint module (or else I just missed it on > the tracker, which is entirely plausible). > Nice. Any chances it will be in time for python 3.4? We are waiting for it for about a decade :) -- Paul From g.rodola at gmail.com Thu Oct 3 19:09:51 2013 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Thu, 3 Oct 2013 19:09:51 +0200 Subject: [Python-ideas] Allow from foo import bar* Message-ID: I suppose this has already been proposed in past but couldn't find any online reference so here goes. When it comes to module constant imports I usually like being explicit it's OK with me as long as I have to do: >>> from resource import (RLIMIT_CORE, RLIMIT_CPU, RLIMIT_FSIZE) Nevertheless in case the existence of certain constants depends on the platform in use I end up doing: >>> if hasattr(resource, "RLIMIT_MSGQUEUE"): # linux only .... import resource.RLIMIT_MSGQUEUE .... >>> if hasattr(resource, "RLIMIT_NICE"): # linux only .... import resource.RLIMIT_NICE .... 
...or worse, if for simplicity I'm willing to simply import all RLIMIT_* constants I'll have to do this: >>> import resource >>> import sys >>> for name in dir(resource): .... if name.startswith('RLIMIT_'): .... setattr(sys.modules[__name__], name, getattr(resource, name)) ...or just give up and use: from resource import * ...which of course will pollute the namespace with unnecessary stuff. So why not just allow "from resource import RLIMIT_*" syntax? Another interesting variation might be: >>> from socket import AF_*, SOCK_* >>> AF_INET, AF_INET6, SOCK_STREAM, SOCK_DGRAM (2, 10, 1, 2) On the other hand mixing "*" and "common" imports would be forbidden: >>> from socket import AF_*, socket, File "", line 1 from socket import AF_*, socket ^ SyntaxError: invalid syntax; Thoughts? --- Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Oct 3 19:16:37 2013 From: guido at python.org (Guido van Rossum) Date: Thu, 3 Oct 2013 10:16:37 -0700 Subject: [Python-ideas] Allow from foo import bar* In-Reply-To: References: Message-ID: Hm. Why not just use "import socket" and then use "socket.AF_"? On Thu, Oct 3, 2013 at 10:09 AM, Giampaolo Rodola' wrote: > I suppose this has already been proposed in past but couldn't find > any online reference so here goes. > When it comes to module constant imports I usually like being explicit > it's OK with me as long as I have to do: > > >>> from resource import (RLIMIT_CORE, RLIMIT_CPU, RLIMIT_FSIZE) > > Nevertheless in case the existence of certain constants depends on the > platform in use I end up doing: > > >>> if hasattr(resource, "RLIMIT_MSGQUEUE"): # linux only > .... import resource.RLIMIT_MSGQUEUE > .... > >>> if hasattr(resource, "RLIMIT_NICE"): # linux only > .... import resource.RLIMIT_NICE > .... > > > ...or worse, if for simplicity I'm willing to simply import all RLIMIT_* > constants I'll have to do this: > > >>> import resource > >>> import sys > >>> for name in dir(resource): > .... if name.startswith('RLIMIT_'): > .... setattr(sys.modules[__name__], name, getattr(resource, name)) > > ...or just give up and use: > > from resource import * > > ...which of course will pollute the namespace with unnecessary stuff. > So why not just allow "from resource import RLIMIT_*" syntax? > Another interesting variation might be: > > > >>> from socket import AF_*, SOCK_* > >>> AF_INET, AF_INET6, SOCK_STREAM, SOCK_DGRAM > (2, 10, 1, 2) > > > On the other hand mixing "*" and "common" imports would be forbidden: > > >>> from socket import AF_*, socket, > File "", line 1 > from socket import AF_*, socket > ^ > SyntaxError: invalid syntax; > > > Thoughts? > > > --- Giampaolo > https://code.google.com/p/pyftpdlib/ > https://code.google.com/p/psutil/ > https://code.google.com/p/pysendfile/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From python at mrabarnett.plus.com Thu Oct 3 19:25:46 2013 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 03 Oct 2013 18:25:46 +0100 Subject: [Python-ideas] Allow from foo import bar* In-Reply-To: References: Message-ID: <524DA89A.6090608@mrabarnett.plus.com> On 03/10/2013 18:09, Giampaolo Rodola' wrote: > I suppose this has already been proposed in past but couldn't find > any online reference so here goes. > When it comes to module constant imports I usually like being explicit > it's OK with me as long as I have to do: > > >>> from resource import (RLIMIT_CORE, RLIMIT_CPU, RLIMIT_FSIZE) > > Nevertheless in case the existence of certain constants depends on the > platform in use I end up doing: > > >>> if hasattr(resource, "RLIMIT_MSGQUEUE"): # linux only > .... import resource.RLIMIT_MSGQUEUE > .... > >>> if hasattr(resource, "RLIMIT_NICE"): # linux only > .... import resource.RLIMIT_NICE > .... > > > ...or worse, if for simplicity I'm willing to simply import all RLIMIT_* > constants I'll have to do this: > > >>> import resource > >>> import sys > >>> for name in dir(resource): > .... if name.startswith('RLIMIT_'): > .... setattr(sys.modules[__name__], name, getattr(resource, name)) > > ...or just give up and use: > > from resource import * > > ...which of course will pollute the namespace with unnecessary stuff. > So why not just allow "from resource import RLIMIT_*" syntax? > Another interesting variation might be: > > > >>> from socket import AF_*, SOCK_* > >>> AF_INET, AF_INET6, SOCK_STREAM, SOCK_DGRAM > (2, 10, 1, 2) > > > On the other hand mixing "*" and "common" imports would be forbidden: > > >>> from socket import AF_*, socket, > File "", line 1 > from socket import AF_*, socket > ^ > SyntaxError: invalid syntax; > > > Thoughts? > If you're importing RLIMIT_MSGQUEUE, then presumably you're using it somewhere(!), but if it's platform-specific, you'll still need to check which platform the code is running on anyway before trying to use it... From storchaka at gmail.com Thu Oct 3 20:42:03 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 03 Oct 2013 21:42:03 +0300 Subject: [Python-ideas] Allow from foo import bar* In-Reply-To: References: Message-ID: 03.10.13 20:09, Giampaolo Rodola' ???????(??): > Another interesting variation might be: > > >>> from socket import AF_*, SOCK_* > >>> AF_INET, AF_INET6, SOCK_STREAM, SOCK_DGRAM > (2, 10, 1, 2) >>> from socket import AddressFamily, SocketType >>> globals().update(AddressFamily.__members__) >>> globals().update(SocketType.__members__) >>> AF_INET, AF_INET6, SOCK_STREAM, SOCK_DGRAM (, , , ) From g.rodola at gmail.com Thu Oct 3 20:43:09 2013 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Thu, 3 Oct 2013 20:43:09 +0200 Subject: [Python-ideas] Allow from foo import bar* In-Reply-To: References: Message-ID: > Hm. Why not just use "import socket" and then use "socket.AF_"? That's what I usually do as well (because explicit is better than implicit) but from my understanding when it comes to constants it is generally not considered a bad practice to import them directly into the module namespace. I guess my specific case is bit different though. I have all these constants defined in a _linux.py submodule which I import from __init__.py in order to expose them publicly. 
And this is how I do that: # Linux >= 2.6.36 if _psplatform.HAS_PRLIMIT: from psutil._pslinux import (RLIM_INFINITY, RLIMIT_AS, RLIMIT_CORE, RLIMIT_CPU, RLIMIT_DATA, RLIMIT_FSIZE, RLIMIT_LOCKS, RLIMIT_MEMLOCK, RLIMIT_NOFILE, RLIMIT_NPROC, RLIMIT_RSS, RLIMIT_STACK) if hasattr(_psplatform, "RLIMIT_MSGQUEUE"): RLIMIT_MSGQUEUE = _psplatform.RLIMIT_MSGQUEUE if hasattr(_psplatform, "RLIMIT_NICE"): RLIMIT_NICE = _psplatform.RLIMIT_NICE if hasattr(_psplatform, "RLIMIT_RTPRIO"): RLIMIT_RTPRIO = _psplatform.RLIMIT_RTPRIO if hasattr(_psplatform, "RLIMIT_RTTIME"): RLIMIT_RTTIME = _psplatform.RLIMIT_RTTIME if hasattr(_psplatform, "RLIMIT_SIGPENDING"): RLIMIT_SIGPENDING = _psplatform.RLIMIT_SIGPENDING In *this specific case* a "from _psplatform import RLIM*" would have solved my problem nicely. On one hand this might look like encouraging wildcard import usage, but I think it's the opposite. Sometimes people use "from foo import *" just because "from foo import bar*" is not available. --- Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/ On Thu, Oct 3, 2013 at 7:16 PM, Guido van Rossum wrote: > Hm. Why not just use "import socket" and then use "socket.AF_"? > > > On Thu, Oct 3, 2013 at 10:09 AM, Giampaolo Rodola' wrote: > >> I suppose this has already been proposed in past but couldn't find >> any online reference so here goes. >> When it comes to module constant imports I usually like being explicit >> it's OK with me as long as I have to do: >> >> >>> from resource import (RLIMIT_CORE, RLIMIT_CPU, RLIMIT_FSIZE) >> >> Nevertheless in case the existence of certain constants depends on the >> platform in use I end up doing: >> >> >>> if hasattr(resource, "RLIMIT_MSGQUEUE"): # linux only >> .... import resource.RLIMIT_MSGQUEUE >> .... >> >>> if hasattr(resource, "RLIMIT_NICE"): # linux only >> .... import resource.RLIMIT_NICE >> .... >> >> >> ...or worse, if for simplicity I'm willing to simply import all RLIMIT_* >> constants I'll have to do this: >> >> >>> import resource >> >>> import sys >> >>> for name in dir(resource): >> .... if name.startswith('RLIMIT_'): >> .... setattr(sys.modules[__name__], name, getattr(resource, name)) >> >> ...or just give up and use: >> >> from resource import * >> >> ...which of course will pollute the namespace with unnecessary stuff. >> So why not just allow "from resource import RLIMIT_*" syntax? >> Another interesting variation might be: >> >> >> >>> from socket import AF_*, SOCK_* >> >>> AF_INET, AF_INET6, SOCK_STREAM, SOCK_DGRAM >> (2, 10, 1, 2) >> >> >> On the other hand mixing "*" and "common" imports would be forbidden: >> >> >>> from socket import AF_*, socket, >> File "", line 1 >> from socket import AF_*, socket >> ^ >> SyntaxError: invalid syntax; >> >> >> Thoughts? >> >> >> --- Giampaolo >> https://code.google.com/p/pyftpdlib/ >> https://code.google.com/p/psutil/ >> https://code.google.com/p/pysendfile/ >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> > > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Oct 3 21:00:39 2013 From: guido at python.org (Guido van Rossum) Date: Thu, 3 Oct 2013 12:00:39 -0700 Subject: [Python-ideas] Allow from foo import bar* In-Reply-To: References: Message-ID: Hm. 
It seems a pretty small use case for what would be a major implementation challenge -- I'm sure there would be lots of issues implementing this cleanly given all the special casing for import *, and the special handling of importlib during bootstrap. On Thu, Oct 3, 2013 at 11:43 AM, Giampaolo Rodola' wrote: > > Hm. Why not just use "import socket" and then use "socket.AF_"? > > That's what I usually do as well (because explicit is better than > implicit) but from my understanding when it comes to constants it is > generally not considered a bad practice to import them directly into the > module namespace. > I guess my specific case is bit different though. > I have all these constants defined in a _linux.py submodule which I import > from __init__.py in order to expose them publicly. > And this is how I do that: > > # Linux >= 2.6.36 > if _psplatform.HAS_PRLIMIT: > from psutil._pslinux import (RLIM_INFINITY, RLIMIT_AS, RLIMIT_CORE, > RLIMIT_CPU, RLIMIT_DATA, RLIMIT_FSIZE, > RLIMIT_LOCKS, RLIMIT_MEMLOCK, > RLIMIT_NOFILE, > RLIMIT_NPROC, RLIMIT_RSS, > RLIMIT_STACK) > if hasattr(_psplatform, "RLIMIT_MSGQUEUE"): > RLIMIT_MSGQUEUE = _psplatform.RLIMIT_MSGQUEUE > if hasattr(_psplatform, "RLIMIT_NICE"): > RLIMIT_NICE = _psplatform.RLIMIT_NICE > if hasattr(_psplatform, "RLIMIT_RTPRIO"): > RLIMIT_RTPRIO = _psplatform.RLIMIT_RTPRIO > if hasattr(_psplatform, "RLIMIT_RTTIME"): > RLIMIT_RTTIME = _psplatform.RLIMIT_RTTIME > if hasattr(_psplatform, "RLIMIT_SIGPENDING"): > RLIMIT_SIGPENDING = _psplatform.RLIMIT_SIGPENDING > > > In *this specific case* a "from _psplatform import RLIM*" would have > solved my problem nicely. > On one hand this might look like encouraging wildcard import usage, but I > think it's the opposite. > Sometimes people use "from foo import *" just because "from foo import > bar*" is not available. > > > --- Giampaolo > https://code.google.com/p/pyftpdlib/ > https://code.google.com/p/psutil/ > https://code.google.com/p/pysendfile/ > > > On Thu, Oct 3, 2013 at 7:16 PM, Guido van Rossum wrote: > >> Hm. Why not just use "import socket" and then use "socket.AF_"? >> >> >> On Thu, Oct 3, 2013 at 10:09 AM, Giampaolo Rodola' wrote: >> >>> I suppose this has already been proposed in past but couldn't find >>> any online reference so here goes. >>> When it comes to module constant imports I usually like being explicit >>> it's OK with me as long as I have to do: >>> >>> >>> from resource import (RLIMIT_CORE, RLIMIT_CPU, RLIMIT_FSIZE) >>> >>> Nevertheless in case the existence of certain constants depends on the >>> platform in use I end up doing: >>> >>> >>> if hasattr(resource, "RLIMIT_MSGQUEUE"): # linux only >>> .... import resource.RLIMIT_MSGQUEUE >>> .... >>> >>> if hasattr(resource, "RLIMIT_NICE"): # linux only >>> .... import resource.RLIMIT_NICE >>> .... >>> >>> >>> ...or worse, if for simplicity I'm willing to simply import all RLIMIT_* >>> constants I'll have to do this: >>> >>> >>> import resource >>> >>> import sys >>> >>> for name in dir(resource): >>> .... if name.startswith('RLIMIT_'): >>> .... setattr(sys.modules[__name__], name, getattr(resource, name)) >>> >>> ...or just give up and use: >>> >>> from resource import * >>> >>> ...which of course will pollute the namespace with unnecessary stuff. >>> So why not just allow "from resource import RLIMIT_*" syntax? 
>>> Another interesting variation might be: >>> >>> >>> >>> from socket import AF_*, SOCK_* >>> >>> AF_INET, AF_INET6, SOCK_STREAM, SOCK_DGRAM >>> (2, 10, 1, 2) >>> >>> >>> On the other hand mixing "*" and "common" imports would be forbidden: >>> >>> >>> from socket import AF_*, socket, >>> File "", line 1 >>> from socket import AF_*, socket >>> ^ >>> SyntaxError: invalid syntax; >>> >>> >>> Thoughts? >>> >>> >>> --- Giampaolo >>> https://code.google.com/p/pyftpdlib/ >>> https://code.google.com/p/psutil/ >>> https://code.google.com/p/pysendfile/ >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> >>> >> >> >> -- >> --Guido van Rossum (python.org/~guido) >> > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.rodola at gmail.com Thu Oct 3 21:13:34 2013 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Thu, 3 Oct 2013 21:13:34 +0200 Subject: [Python-ideas] Allow from foo import bar* In-Reply-To: References: Message-ID: On Thu, Oct 3, 2013 at 9:00 PM, Guido van Rossum wrote: > Hm. It seems a pretty small use case for what would be a major > implementation challenge -- I'm sure there would be lots of issues > implementing this cleanly given all the special casing for import *, and > the special handling of importlib during bootstrap. > Fair enough. --- Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua at landau.ws Thu Oct 3 21:59:42 2013 From: joshua at landau.ws (Joshua Landau) Date: Thu, 3 Oct 2013 20:59:42 +0100 Subject: [Python-ideas] Allow from foo import bar* In-Reply-To: References: Message-ID: On 3 October 2013 19:43, Giampaolo Rodola' wrote: >> Hm. Why not just use "import socket" and then use "socket.AF_"? > > That's what I usually do as well (because explicit is better than implicit) > but from my understanding when it comes to constants it is generally not > considered a bad practice to import them directly into the module namespace. > I guess my specific case is bit different though. > I have all these constants defined in a _linux.py submodule which I import > from __init__.py in order to expose them publicly. > And this is how I do that: > > # Linux >= 2.6.36 > if _psplatform.HAS_PRLIMIT: > from psutil._pslinux import (RLIM_INFINITY, RLIMIT_AS, RLIMIT_CORE, > RLIMIT_CPU, RLIMIT_DATA, RLIMIT_FSIZE, > RLIMIT_LOCKS, RLIMIT_MEMLOCK, ... > > In *this specific case* a "from _psplatform import RLIM*" would have solved > my problem nicely. Or we change the module such that we can do from psutil._pslinux import RLIMIT and then use RLIMIT.CORE, RLIMIT.CPU, RLIMIT.LOCKS, etc. From rymg19 at gmail.com Thu Oct 3 22:43:01 2013 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Thu, 3 Oct 2013 15:43:01 -0500 Subject: [Python-ideas] Allow from foo import bar* In-Reply-To: References: Message-ID: Well, that looks painful! I agree with Joshua: If you are doing something like that, namespaces work best. If you're really that desperate, why not something like this: globals().update({name: getattr(_psplatform, name) for name in dir(_psplatform) if name.startswith('RLIMIT')}) On Thu, Oct 3, 2013 at 1:43 PM, Giampaolo Rodola' wrote: > > Hm. Why not just use "import socket" and then use "socket.AF_"? 
> > That's what I usually do as well (because explicit is better than > implicit) but from my understanding when it comes to constants it is > generally not considered a bad practice to import them directly into the > module namespace. > I guess my specific case is bit different though. > I have all these constants defined in a _linux.py submodule which I import > from __init__.py in order to expose them publicly. > And this is how I do that: > > # Linux >= 2.6.36 > if _psplatform.HAS_PRLIMIT: > from psutil._pslinux import (RLIM_INFINITY, RLIMIT_AS, RLIMIT_CORE, > RLIMIT_CPU, RLIMIT_DATA, RLIMIT_FSIZE, > RLIMIT_LOCKS, RLIMIT_MEMLOCK, > RLIMIT_NOFILE, > RLIMIT_NPROC, RLIMIT_RSS, > RLIMIT_STACK) > if hasattr(_psplatform, "RLIMIT_MSGQUEUE"): > RLIMIT_MSGQUEUE = _psplatform.RLIMIT_MSGQUEUE > if hasattr(_psplatform, "RLIMIT_NICE"): > RLIMIT_NICE = _psplatform.RLIMIT_NICE > if hasattr(_psplatform, "RLIMIT_RTPRIO"): > RLIMIT_RTPRIO = _psplatform.RLIMIT_RTPRIO > if hasattr(_psplatform, "RLIMIT_RTTIME"): > RLIMIT_RTTIME = _psplatform.RLIMIT_RTTIME > if hasattr(_psplatform, "RLIMIT_SIGPENDING"): > RLIMIT_SIGPENDING = _psplatform.RLIMIT_SIGPENDING > > > In *this specific case* a "from _psplatform import RLIM*" would have > solved my problem nicely. > On one hand this might look like encouraging wildcard import usage, but I > think it's the opposite. > Sometimes people use "from foo import *" just because "from foo import > bar*" is not available. > > > --- Giampaolo > https://code.google.com/p/pyftpdlib/ > https://code.google.com/p/psutil/ > https://code.google.com/p/pysendfile/ > > > On Thu, Oct 3, 2013 at 7:16 PM, Guido van Rossum wrote: > >> Hm. Why not just use "import socket" and then use "socket.AF_"? >> >> >> On Thu, Oct 3, 2013 at 10:09 AM, Giampaolo Rodola' wrote: >> >>> I suppose this has already been proposed in past but couldn't find >>> any online reference so here goes. >>> When it comes to module constant imports I usually like being explicit >>> it's OK with me as long as I have to do: >>> >>> >>> from resource import (RLIMIT_CORE, RLIMIT_CPU, RLIMIT_FSIZE) >>> >>> Nevertheless in case the existence of certain constants depends on the >>> platform in use I end up doing: >>> >>> >>> if hasattr(resource, "RLIMIT_MSGQUEUE"): # linux only >>> .... import resource.RLIMIT_MSGQUEUE >>> .... >>> >>> if hasattr(resource, "RLIMIT_NICE"): # linux only >>> .... import resource.RLIMIT_NICE >>> .... >>> >>> >>> ...or worse, if for simplicity I'm willing to simply import all RLIMIT_* >>> constants I'll have to do this: >>> >>> >>> import resource >>> >>> import sys >>> >>> for name in dir(resource): >>> .... if name.startswith('RLIMIT_'): >>> .... setattr(sys.modules[__name__], name, getattr(resource, name)) >>> >>> ...or just give up and use: >>> >>> from resource import * >>> >>> ...which of course will pollute the namespace with unnecessary stuff. >>> So why not just allow "from resource import RLIMIT_*" syntax? >>> Another interesting variation might be: >>> >>> >>> >>> from socket import AF_*, SOCK_* >>> >>> AF_INET, AF_INET6, SOCK_STREAM, SOCK_DGRAM >>> (2, 10, 1, 2) >>> >>> >>> On the other hand mixing "*" and "common" imports would be forbidden: >>> >>> >>> from socket import AF_*, socket, >>> File "", line 1 >>> from socket import AF_*, socket >>> ^ >>> SyntaxError: invalid syntax; >>> >>> >>> Thoughts? 
>>> >>> >>> --- Giampaolo >>> https://code.google.com/p/pyftpdlib/ >>> https://code.google.com/p/psutil/ >>> https://code.google.com/p/pysendfile/ >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> >>> >> >> >> -- >> --Guido van Rossum (python.org/~guido) >> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- Ryan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Oct 3 23:44:43 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 4 Oct 2013 07:44:43 +1000 Subject: [Python-ideas] Allow from foo import bar* In-Reply-To: References: Message-ID: On 4 Oct 2013 06:01, "Joshua Landau" wrote: > > On 3 October 2013 19:43, Giampaolo Rodola' wrote: > >> Hm. Why not just use "import socket" and then use "socket.AF_"? > > > > That's what I usually do as well (because explicit is better than implicit) > > but from my understanding when it comes to constants it is generally not > > considered a bad practice to import them directly into the module namespace. > > I guess my specific case is bit different though. > > I have all these constants defined in a _linux.py submodule which I import > > from __init__.py in order to expose them publicly. > > And this is how I do that: > > > > # Linux >= 2.6.36 > > if _psplatform.HAS_PRLIMIT: > > from psutil._pslinux import (RLIM_INFINITY, RLIMIT_AS, RLIMIT_CORE, > > RLIMIT_CPU, RLIMIT_DATA, RLIMIT_FSIZE, > > RLIMIT_LOCKS, RLIMIT_MEMLOCK, > ... > > > > In *this specific case* a "from _psplatform import RLIM*" would have solved > > my problem nicely. > > Or we change the module such that we can do > > from psutil._pslinux import RLIMIT > > and then use RLIMIT.CORE, RLIMIT.CPU, RLIMIT.LOCKS, etc. Another Enum candidate, perhaps? Cheers, Nick. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Fri Oct 4 21:17:14 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 04 Oct 2013 22:17:14 +0300 Subject: [Python-ideas] pprint in displayhook In-Reply-To: <20131002005644.GI7989@ando> References: <0BFAAF4A-F5C8-48FA-9C82-1B60D164A033@gmail.com> <20131002005644.GI7989@ando> Message-ID: 02.10.13 03:56, Steven D'Aprano ???????(??): > I would like to see pprint be smarter about printing lists and dicts. At > the moment, a long list is either printed all on one line, like the > default display, or one item per line. This can end up as one long, > narrow column, which is worse than the default. I'd like to see it be > smarter about using multiple columns. http://bugs.python.org/issue19132 From storchaka at gmail.com Tue Oct 8 13:17:59 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 08 Oct 2013 14:17:59 +0300 Subject: [Python-ideas] Add "has_surrogates" flags to string object Message-ID: Here is an idea about adding a mark to PyUnicode object which allows fast answer to the question if a string has surrogate code. This mark has one of three possible states: * String doesn't contain surrogates. * String contains surrogates. * It is still unknown. 
We can combine this with "is_ascii" flag in 2-bit value: * String is ASCII-only (and doesn't contain surrogates). * String is not ASCII-only and doesn't contain surrogates. * String is not ASCII-only and contains surrogates. * String is not ASCII-only and it is still unknown if it contains surrogate. By default a string is created in "unknown" state (if it is UCS2 or UCS4). After first request it can be switched to "has surrogates" or "hasn't surrogates". State of the result of concatenating or slicing can be determined from states of input strings. This will allow faster UTF-16 and UTF-32 encoding (and perhaps even a little faster UTF-8 encoding) and converting to wchar_t* if string hasn't surrogates (this is true in most cases). From masklinn at masklinn.net Tue Oct 8 13:38:19 2013 From: masklinn at masklinn.net (Masklinn) Date: Tue, 8 Oct 2013 13:38:19 +0200 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: References: Message-ID: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> On 2013-10-08, at 13:17 , Serhiy Storchaka wrote: > Here is an idea about adding a mark to PyUnicode object which allows fast answer to the question if a string has surrogate code. This mark has one of three possible states: > > * String doesn't contain surrogates. > * String contains surrogates. > * It is still unknown. > > We can combine this with "is_ascii" flag in 2-bit value: > > * String is ASCII-only (and doesn't contain surrogates). > * String is not ASCII-only and doesn't contain surrogates. > * String is not ASCII-only and contains surrogates. > * String is not ASCII-only and it is still unknown if it contains surrogate. Isn't that redundant with the kind under shortest form representation? From solipsis at pitrou.net Tue Oct 8 13:43:43 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 8 Oct 2013 13:43:43 +0200 Subject: [Python-ideas] Add "has_surrogates" flags to string object References: Message-ID: <20131008134343.2e084051@pitrou.net> Le Tue, 08 Oct 2013 14:17:59 +0300, Serhiy Storchaka a ?crit : > Here is an idea about adding a mark to PyUnicode object which allows > fast answer to the question if a string has surrogate code. This mark > has one of three possible states: > > * String doesn't contain surrogates. > * String contains surrogates. > * It is still unknown. > > We can combine this with "is_ascii" flag in 2-bit value: > > * String is ASCII-only (and doesn't contain surrogates). > * String is not ASCII-only and doesn't contain surrogates. > * String is not ASCII-only and contains surrogates. > * String is not ASCII-only and it is still unknown if it contains > surrogate. > > By default a string is created in "unknown" state (if it is UCS2 or > UCS4). After first request it can be switched to "has surrogates" or > "hasn't surrogates". State of the result of concatenating or slicing > can be determined from states of input strings. Not true for slicing (you can take a non-surrogates slice of a surrogates string). Other than that, this sounds reasonable to me, provided that the patch isn't too complex and the perf improvements are worth it. Regards Antoine. 
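To make the propagation rules concrete, here is a minimal pure-Python sketch of the proposed tri-state flag, following the rules discussed above: concatenation preserves both definite states, while slicing (per Antoine's remark) can only preserve the "no surrogates" state. The names (SurrogateState, scan, concat_state, slice_state) are hypothetical illustrations; the actual proposal concerns a private bit on the C-level PyUnicode object, not a Python-level API.

    from enum import Enum

    class SurrogateState(Enum):
        NO = 0        # known to contain no surrogate code points
        YES = 1       # known to contain at least one surrogate code point
        UNKNOWN = 2   # not determined yet (the default for UCS2/UCS4 strings)

    def scan(s):
        # The lazy fallback: scan once, then the result can be cached.
        return (SurrogateState.YES
                if any(0xD800 <= ord(c) <= 0xDFFF for c in s)
                else SurrogateState.NO)

    def concat_state(a, b):
        # a + b has surrogates if either operand does, and is surrogate-free
        # only if both operands are known to be surrogate-free.
        if SurrogateState.YES in (a, b):
            return SurrogateState.YES
        if a is SurrogateState.NO and b is SurrogateState.NO:
            return SurrogateState.NO
        return SurrogateState.UNKNOWN

    def slice_state(state):
        # A slice of a surrogate-free string is surrogate-free; a slice of a
        # string that has surrogates may or may not contain one.
        return state if state is SurrogateState.NO else SurrogateState.UNKNOWN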
From storchaka at gmail.com Tue Oct 8 13:43:51 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 08 Oct 2013 14:43:51 +0300 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> References: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> Message-ID: 08.10.13 14:38, Masklinn ???????(??): > On 2013-10-08, at 13:17 , Serhiy Storchaka wrote: > >> Here is an idea about adding a mark to PyUnicode object which allows fast answer to the question if a string has surrogate code. This mark has one of three possible states: >> >> * String doesn't contain surrogates. >> * String contains surrogates. >> * It is still unknown. >> >> We can combine this with "is_ascii" flag in 2-bit value: >> >> * String is ASCII-only (and doesn't contain surrogates). >> * String is not ASCII-only and doesn't contain surrogates. >> * String is not ASCII-only and contains surrogates. >> * String is not ASCII-only and it is still unknown if it contains surrogate. > > Isn't that redundant with the kind under shortest form representation? No, it isn't redundant. '\udc80' is UCS2 string with surrogate code, and '\udc80\U00010000' is UCS4 string with surrogate code. UCS2 string without surrogate codes can be encoded in UTF-16 by memcpy(). From mal at egenix.com Tue Oct 8 13:58:00 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 08 Oct 2013 13:58:00 +0200 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: References: Message-ID: <5253F348.3010204@egenix.com> On 08.10.2013 13:17, Serhiy Storchaka wrote: > Here is an idea about adding a mark to PyUnicode object which allows fast answer to the question if > a string has surrogate code. This mark has one of three possible states: > > * String doesn't contain surrogates. > * String contains surrogates. > * It is still unknown. > > We can combine this with "is_ascii" flag in 2-bit value: > > * String is ASCII-only (and doesn't contain surrogates). > * String is not ASCII-only and doesn't contain surrogates. > * String is not ASCII-only and contains surrogates. > * String is not ASCII-only and it is still unknown if it contains surrogate. > > By default a string is created in "unknown" state (if it is UCS2 or UCS4). After first request it > can be switched to "has surrogates" or "hasn't surrogates". State of the result of concatenating or > slicing can be determined from states of input strings. > > This will allow faster UTF-16 and UTF-32 encoding (and perhaps even a little faster UTF-8 encoding) > and converting to wchar_t* if string hasn't surrogates (this is true in most cases). 
I guess you could use one bit from the kind structure for that: /* Character size: - PyUnicode_WCHAR_KIND (0): * character type = wchar_t (16 or 32 bits, depending on the platform) - PyUnicode_1BYTE_KIND (1): * character type = Py_UCS1 (8 bits, unsigned) * all characters are in the range U+0000-U+00FF (latin1) * if ascii is set, all characters are in the range U+0000-U+007F (ASCII), otherwise at least one character is in the range U+0080-U+00FF - PyUnicode_2BYTE_KIND (2): * character type = Py_UCS2 (16 bits, unsigned) * all characters are in the range U+0000-U+FFFF (BMP) * at least one character is in the range U+0100-U+FFFF - PyUnicode_4BYTE_KIND (4): * character type = Py_UCS4 (32 bits, unsigned) * all characters are in the range U+0000-U+10FFFF * at least one character is in the range U+10000-U+10FFFF */ unsigned int kind:3; For some reason, it allocates 3 bits, but only 2 bits are used. The again, the state struct is unsigned int, so there's still plenty of room for extra flags. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Oct 08 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-10-14: PyCon DE 2013, Cologne, Germany ... 6 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From masklinn at masklinn.net Tue Oct 8 13:58:20 2013 From: masklinn at masklinn.net (Masklinn) Date: Tue, 8 Oct 2013 13:58:20 +0200 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: References: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> Message-ID: On 2013-10-08, at 13:43 , Serhiy Storchaka wrote: > 08.10.13 14:38, Masklinn ???????(??): >> On 2013-10-08, at 13:17 , Serhiy Storchaka wrote: >> >>> Here is an idea about adding a mark to PyUnicode object which allows fast answer to the question if a string has surrogate code. This mark has one of three possible states: >>> >>> * String doesn't contain surrogates. >>> * String contains surrogates. >>> * It is still unknown. >>> >>> We can combine this with "is_ascii" flag in 2-bit value: >>> >>> * String is ASCII-only (and doesn't contain surrogates). >>> * String is not ASCII-only and doesn't contain surrogates. >>> * String is not ASCII-only and contains surrogates. >>> * String is not ASCII-only and it is still unknown if it contains surrogate. >> >> Isn't that redundant with the kind under shortest form representation? > > No, it isn't redundant. '\udc80' is UCS2 string with surrogate code, and '\udc80\U00010000' is UCS4 string with surrogate code. I don't know the details of the flexible string representation, but I believed the names fit what was actually in memory. UCS2 does not have surrogate pairs, thus surrogate codes make no sense in UCS2, they're a UTF-16 concept. Likewise for UCS4. Surrogate codes are not codepoints, they have no reason to appear in either UCS2 or UCS4 outside of encoding errors. > UCS2 string without surrogate codes can be encoded in UTF-16 by memcpy(). 
Surrogate codes prevent that (modulo objections above) for slicing (not that it's a big issue I think, a guard can just check whether it's slicing within a surrogate pair, that only requires checking the first and last 2 bytes of the range) but not for concatenation right? From victor.stinner at gmail.com Tue Oct 8 14:23:09 2013 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 8 Oct 2013 14:23:09 +0200 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: References: Message-ID: I like the idea. I prefer to add another flag (1 bit), instead of having a complex with 4 different values. Your idea looks specific to the PEP 393, so I prefer to keep the flag private. Otherwise it would be hard for other implementations of Python to implement the function getting the flag value. Victor 2013/10/8 Serhiy Storchaka : > Here is an idea about adding a mark to PyUnicode object which allows fast > answer to the question if a string has surrogate code. This mark has one of > three possible states: > > * String doesn't contain surrogates. > * String contains surrogates. > * It is still unknown. > > We can combine this with "is_ascii" flag in 2-bit value: > > * String is ASCII-only (and doesn't contain surrogates). > * String is not ASCII-only and doesn't contain surrogates. > * String is not ASCII-only and contains surrogates. > * String is not ASCII-only and it is still unknown if it contains surrogate. > > By default a string is created in "unknown" state (if it is UCS2 or UCS4). > After first request it can be switched to "has surrogates" or "hasn't > surrogates". State of the result of concatenating or slicing can be > determined from states of input strings. > > This will allow faster UTF-16 and UTF-32 encoding (and perhaps even a little > faster UTF-8 encoding) and converting to wchar_t* if string hasn't > surrogates (this is true in most cases). From steve at pearwood.info Tue Oct 8 15:02:08 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 9 Oct 2013 00:02:08 +1100 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: References: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> Message-ID: <20131008130208.GX7989@ando> On Tue, Oct 08, 2013 at 01:58:20PM +0200, Masklinn wrote: > > On 2013-10-08, at 13:43 , Serhiy Storchaka wrote: > > > 08.10.13 14:38, Masklinn ???????(??): > >> On 2013-10-08, at 13:17 , Serhiy Storchaka wrote: > >> > >>> Here is an idea about adding a mark to PyUnicode object which > >>> allows fast answer to the question if a string has surrogate code. > >>> This mark has one of three possible states: [...] > >> Isn't that redundant with the kind under shortest form representation? > > > > No, it isn't redundant. '\udc80' is UCS2 string with surrogate code, and '\udc80\U00010000' is UCS4 string with surrogate code. > > I don't know the details of the flexible string representation, but I > believed the names fit what was actually in memory. UCS2 does not > have surrogate pairs, thus surrogate codes make no sense in UCS2, > they're a UTF-16 concept. Likewise for UCS4. Surrogate codes are not > codepoints, they have no reason to appear in either UCS2 or UCS4 > outside of encoding errors. I welcome correction, but I think you're mistaken. Python 3.3 strings don't have surrogate *pairs*, but they can contain surrogate *code points*. Unicode states: "Isolated surrogate code points have no interpretation; consequently, no character code charts or names lists are provided for this range." 
http://www.unicode.org/charts/PDF/UDC00.pdf http://www.unicode.org/charts/PDF/UD800.pdf So technically surrogates are "non-characters". That doesn't mean they are forbidden though; you can certainly create them, and encode them to UTF-16 and -32: py> surr = '\udc80' py> import unicodedata as ud py> ud.category(surr) 'Cs' py> surr.encode('utf-16') b'\xff\xfe\x80\xdc' py> surr.encode('utf-32') b'\xff\xfe\x00\x00\x80\xdc\x00\x00' However, you cannot encode single surrogates to UTF-8: py> surr.encode('utf-8') Traceback (most recent call last): File "", line 1, in UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 0: surrogates not allowed as per the standard: http://www.unicode.org/faq/utf_bom.html#utf8-5 I *think* you are supposed to be able to encode surrogate *pairs* to UTF-8, if I'm reading the FAQ correctly, but it seems Python 3.3 doesn't support that. In any case, it is certainly legal to have Unicode strings containing non-characters, including surrogates, and you can encode them to UTF-16 and -32. However, it looks like surrogates won't round trip in UTF-16, but they will in UTF-32: py> surr.encode('utf-16').decode('utf-16') == surr Traceback (most recent call last): File "", line 1, in UnicodeDecodeError: 'utf16' codec can't decode bytes in position 2-3: unexpected end of data py> surr.encode('utf-32').decode('utf-32') == surr True So... I'm not sure why this will be useful. Presumably Unicode strings containing surrogate code points will be rare, and you can't encode them to UTF-8 at all, and you can't round trip them from UTF-16. -- Steven From stephen at xemacs.org Tue Oct 8 15:31:07 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 08 Oct 2013 22:31:07 +0900 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: References: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> Message-ID: <87vc17khyc.fsf@uwakimon.sk.tsukuba.ac.jp> Masklinn writes: > I don't know the details of the flexible string representation, but I > believed the names fit what was actually in memory. UCS2 does not > have surrogate pairs, thus surrogate codes make no sense in UCS2, > they're a UTF-16 concept. Likewise for UCS4. Surrogate codes are not > codepoints, they have no reason to appear in either UCS2 or UCS4 > outside of encoding errors. True, but Python doesn't actually use UCS2 or UCS4 internally. It uses UCS2 or UCS4 plus a row of codes from the surrogate area to represent undecodable bytes. This feature is optional (enabled by using the appropriate error= setting in the codec), but I don't suppose it's going to go away. 
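A short interpreter session illustrating the PEP 383 behaviour described above: with the "surrogateescape" error handler, undecodable bytes are represented in the str as lone surrogates in the U+DC80..U+DCFF range and round-trip back to the original bytes, while a strict encode still rejects them. This is standard Python 3 behaviour (the exact traceback text may vary slightly between versions):

    >>> raw = b'abc\xff'                # 0xFF can never appear in valid UTF-8
    >>> s = raw.decode('utf-8', 'surrogateescape')
    >>> s
    'abc\udcff'
    >>> s.encode('utf-8', 'surrogateescape') == raw
    True
    >>> s.encode('utf-8')               # strict encoding still refuses it
    Traceback (most recent call last):
      ...
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in
    position 3: surrogates not allowed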
From masklinn at masklinn.net Tue Oct 8 15:48:18 2013 From: masklinn at masklinn.net (Masklinn) Date: Tue, 8 Oct 2013 15:48:18 +0200 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: <20131008130208.GX7989@ando> References: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> <20131008130208.GX7989@ando> Message-ID: On 2013-10-08, at 15:02 , Steven D'Aprano wrote: [snipped early part as any response would be superseded by or redundant with the stuff below] > However, you cannot encode single surrogates to UTF-8: > > py> surr.encode('utf-8') > Traceback (most recent call last): > File "", line 1, in > UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in > position 0: surrogates not allowed > > as per the standard: > > http://www.unicode.org/faq/utf_bom.html#utf8-5 > > I *think* you are supposed to be able to encode surrogate *pairs* to > UTF-8, if I'm reading the FAQ correctly I'm reading the opposite, from http://www.unicode.org/faq/utf_bom.html#utf8-4: > there is a widespread practice of generating pairs of three byte > sequences in older software, especially software which pre-dates the > introduction of UTF-16 or that is interoperating with UTF-16 > environments under particular constraints. Such an encoding is not > conformant to UTF-8 as defined. Pairs of 3-byte sequences would be encoding each surrogate directly to UTF-8, whereas a single 4-byte sequence would be decoding the surrogate pair to a codepoint and encoding that codepoint to UTF-8. My reading of the FAQ makes the second interpretation the only valid one. So you can't encode surrogates (either lone or paired) to UTF-8, you can encode the codepoint encoded by a surrogate pair. > In any case, it is certainly legal to have Unicode strings > containing non-characters, including surrogates, and you can encode them > to UTF-16 and ?32. The UTF-32 section has similar note to UTF-8: http://www.unicode.org/faq/utf_bom.html#utf32-7 > A: If an unpaired surrogate is encountered when converting ill-formed > UTF-16 data, any conformant converter must treat this as an error. By > representing such an unpaired surrogate on its own, the resulting UTF-32 > data stream would become ill-formed. While it faithfully reflects the > nature of the input, Unicode conformance requires that encoding form > conversion always results in valid data stream. and the UTF-16 section points out: http://www.unicode.org/faq/utf_bom.html#utf16-7 > Q: Are there any 16-bit values that are invalid? > A: Unpaired surrogates are invalid in UTFs. These include any value in > the range D80016 to DBFF16 not followed by a value in the range DC0016 > to DFFF16, or any value in the range DC0016 to DFFF16 not preceded by a > value in the range D80016 to DBFF16. As far as I can read the FAQ, it is always invalid to encode a surrogate, surrogates are not to be considered codepoints (they're not just noncharacters[0], noncharacters are codepoints), and a lone surrogate in a UTF-16 stream means the stream is corrupted, which should result in an error during transcoding to anything (unless some recovery mode is used to replace corrupted characters by some mark during decoding I guess). > So... I'm not sure why this will be useful. Presumably Unicode strings > containing surrogate code points will be rare And they're a sign of corrupted stream. The FAQ reads a bit strangely, I think because it's written from the viewpoint that the "internal encoding" will be UTF-16, and UTF-8 and UTF-32 are transcoding from that. 
Which does not apply to CPython and the FSR. Parsing the FAQ with that viewpoint, I believe a CPython string (unicode) must not contain surrogate codes: a surrogate pair should have been decoded from UTF-16 to a codepoint (then identity-encoded to UCS4) and a single surrogate should have been caught by the UTF-16 decoder and should have triggered the error handler at that point. A surrogate code in a CPython string means the string is corrupted[1]. Surrogates *may* appear in binary data, while building a UTF-16 bytestream by hand. [0] since "noncharacter" has a well-defined meaning in unicode, and only applies to 66 codepoints, a much smaller range than surrogates: http://www.unicode.org/faq/private_use.html#noncharacters [1] note that this hinges on my understanding of "UCS2" in FSR being actual UCS2, if it's UCS2-with-surrogates with a heuristic for switching between UCS2 and UCS4 depending on the number of surrogate pairs in the string it does not apply From steve at pearwood.info Tue Oct 8 16:20:09 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 9 Oct 2013 01:20:09 +1100 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: References: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> <20131008130208.GX7989@ando> Message-ID: <20131008142009.GY7989@ando> On Tue, Oct 08, 2013 at 03:48:18PM +0200, Masklinn wrote: > On 2013-10-08, at 15:02 , Steven D'Aprano wrote: > > py> surr.encode('utf-8') > > Traceback (most recent call last): > > File "", line 1, in > > UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in > > position 0: surrogates not allowed > > > > as per the standard: > > > > http://www.unicode.org/faq/utf_bom.html#utf8-5 > > > > I *think* you are supposed to be able to encode surrogate *pairs* to > > UTF-8, if I'm reading the FAQ correctly > > I'm reading the opposite, from http://www.unicode.org/faq/utf_bom.html#utf8-4: > > > there is a widespread practice of generating pairs of three byte > > sequences in older software, especially software which pre-dates the > > introduction of UTF-16 or that is interoperating with UTF-16 > > environments under particular constraints. Such an encoding is not > > conformant to UTF-8 as defined. > > Pairs of 3-byte sequences would be encoding each surrogate directly to > UTF-8, whereas a single 4-byte sequence would be decoding the surrogate > pair to a codepoint and encoding that codepoint to UTF-8. My reading > of the FAQ makes the second interpretation the only valid one. It's not that clear to me. I fear the Unicode FAQs don't distinguish between Unicode strings and bytes well enough for my liking :( But for the record, my interpretion is that if you have a pair of code points constisting of the same values as a valid surrogate pair, you should be able to encode to UTF-8. To give a concrete example: Given: c = '\N{LINEAR B SYLLABLE B038 E}' # \U00010001 c.encode('utf-8') => b'\xf0\x90\x80\x81' and: c.encode('utf-16BE') # encodes as a surrogate pair => b'\xd8\x00\xdc\x01' then those same surrogates, taken as codepoints, should be encodable as UTF-8: '\ud800\udc01'.encode('utf-8') => b'\xf0\x90\x80\x81' I'd actually be disappointed if that were the case; I think that would be a poor design. But if that's what the Unicode standard demands, Python ought to support it. But hopefully somebody will explain to me why my interpretation is wrong :-) [...] 
> The FAQ reads a bit strangely, I think because it's written from the > viewpoint that the "internal encoding" will be UTF-16, and UTF-8 and > UTF-32 are transcoding from that. Which does not apply to CPython and > the FSR. Hmmm... well, that might explain it. If it's written by Java programmers for Java programmers, they may very well decide that having spent 20 years trying to convince people that string != ASCII, they're now going to convince them that string == UTF-16 instead :/ > Parsing the FAQ with that viewpoint, I believe a CPython string (unicode) > must not contain surrogate codes: a surrogate pair should have been > decoded from UTF-16 to a codepoint (then identity-encoded to UCS4) and a > single surrogate should have been caught by the UTF-16 decoder and > should have triggered the error handler at that point. A surrogate code > in a CPython string means the string is corrupted[1]. I think that interpretation is a bit strong. I think it would be fair to say that CPython strings may contain surrogates, but you can't encode them to bytes using the UTFs. Nor are there any byte sequences that can be decoded to surrogates using the UTFs. This essentially means that you can only get surrogates in a string using (e.g.) chr() or \u escapes, and you can't then encode them to bytes using UTF encodings. > Surrogates *may* appear in binary data, while building a UTF-16 > bytestream by hand. But there you're talking about bytes, not byte strings. Byte strings can contain any bytes you like :-) -- Steven From turnbull at sk.tsukuba.ac.jp Tue Oct 8 16:31:25 2013 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Tue, 08 Oct 2013 23:31:25 +0900 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: References: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> <20131008130208.GX7989@ando> Message-ID: <87txgrkf5u.fsf@uwakimon.sk.tsukuba.ac.jp> Masklinn writes: > The FAQ reads a bit strangely, I think because it's written from the > viewpoint that the "internal encoding" will be UTF-16, and UTF-8 and > UTF-32 are transcoding from that. Which does not apply to CPython and > the FSR. No, it's written from the viewpoint that it says *nothing* about internal encodings, only about the encodings used in interchange of textual data, and about certain aspects of the processes that may receive and generate such data (eg, when data matches a Unicode regular expression, or how bidirectional text should appear visually). > Parsing the FAQ with that viewpoint, I believe a CPython string (unicode) > must not contain surrogate codes: No, it says no such thing. All the Unicode Standard (and the FAQ) says is that if Python generates output that purports to be text encoded in Unicode, it may not contain surrogate codes except where those codes are used according to UTF-16 to encode characters in planes 2 to 17, and if it receives data alleged to be Unicode in some transformation format, it must raise an error if it receives surrogates other than a correctly formed surrogate pair in text known to be encoded as UTF-16. In fact (as I wrote before without proper citation), the internal encoding of Python has been extended by PEP 383 to use a subset of the surrogate space to represent undecodable bytes in an octet stream, when the error handler is set to "surrogateescape". Furthermore, there is nothing to stop a Python unicode from containing any code unit (including both surrogates and other non-characters like 0xFFFF). 
Checking of the rules you cite is done by codecs, at encoding and decoding time. From masklinn at masklinn.net Tue Oct 8 16:40:58 2013 From: masklinn at masklinn.net (Masklinn) Date: Tue, 8 Oct 2013 16:40:58 +0200 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: <20131008142009.GY7989@ando> References: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> <20131008130208.GX7989@ando> <20131008142009.GY7989@ando> Message-ID: On 2013-10-08, at 16:20 , Steven D'Aprano wrote > I'd actually be disappointed if that were the case; I think that would > be a poor design. But if that's what the Unicode standard demands, > Python ought to support it. That would be really weird, it'd mean an *encoder* has to translate a surrogate pair into the actual codepoint in some sort of weird UTF-specific normalization pass. > But hopefully somebody will explain to me why my interpretation is wrong > :-) > > [...] >> The FAQ reads a bit strangely, I think because it's written from the >> viewpoint that the "internal encoding" will be UTF-16, and UTF-8 and >> UTF-32 are transcoding from that. Which does not apply to CPython and >> the FSR. > > Hmmm... well, that might explain it. If it's written by Java programmers > for Java programmers, they may very well decide that having spent 20 > years trying to convince people that string != ASCII, they're now > going to convince them that string == UTF-16 instead :/ To be fair, it's not just java programmers, IIRC ICU uses UTF-16 as the internal encoding. >> Parsing the FAQ with that viewpoint, I believe a CPython string (unicode) >> must not contain surrogate codes: a surrogate pair should have been >> decoded from UTF-16 to a codepoint (then identity-encoded to UCS4) and a >> single surrogate should have been caught by the UTF-16 decoder and >> should have triggered the error handler at that point. A surrogate code >> in a CPython string means the string is corrupted[1]. > > I think that interpretation is a bit strong. I think it would be fair to > say that CPython strings may contain surrogates, but you can't encode > them to bytes using the UTFs. Nor are there any byte sequences that can > be decoded to surrogates using the UTFs. > > This essentially means that you can only get surrogates in a string > using (e.g.) chr() or \u escapes, and you can't then encode them to > bytes using UTF encodings. > >> Surrogates *may* appear in binary data, while building a UTF-16 >> bytestream by hand. > > But there you're talking about bytes, not byte strings. Byte strings can > contain any bytes you like :-) Yes, that's basically what I mean: I think surrogates only make sense in a bytestream, not in a unicode stream. Although I did not remember/was not aware of PEP 383 (thank you Stephen) which makes the Unicode spec irrelevant to what Python string contains. On 2013-10-08, at 16:31 , Stephen J. Turnbull wrote: > Furthermore, there is nothing to stop a Python unicode from containing > any code unit (including both surrogates and other non-characters like > 0xFFFF). Checking of the rules you cite is done by codecs, at > encoding and decoding time. 
noncharacters are a very different case for what it's worth, their own FAQ clearly notes that they are valid full-fledged codepoints and must be encoded and preserved by UTFs: http://www.unicode.org/faq/private_use.html#nonchar7 From random832 at fastmail.us Tue Oct 8 17:27:52 2013 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 08 Oct 2013 11:27:52 -0400 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: References: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> Message-ID: <1381246072.12709.31490813.0B5674DF@webmail.messagingengine.com> On Tue, Oct 8, 2013, at 7:58, Masklinn wrote: > I don't know the details of the flexible string representation, but I > believed the names fit what was actually in memory. UCS2 does not > have surrogate pairs, thus surrogate codes make no sense in UCS2, > they're a UTF-16 concept. Likewise for UCS4. Surrogate codes are not > codepoints, they have no reason to appear in either UCS2 or UCS4 > outside of encoding errors. They can also occur due to slicing a ctypes unicode buffer, due to PEP 383, or due to native UTF-16 filenames that contain invalid surrogates. The latter two also create situations where you need to generate them. From storchaka at gmail.com Tue Oct 8 17:55:25 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 08 Oct 2013 18:55:25 +0300 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: <20131008130208.GX7989@ando> References: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> <20131008130208.GX7989@ando> Message-ID: 08.10.13 16:02, Steven D'Aprano ???????(??): > So... I'm not sure why this will be useful. This is a bug. http://bugs.python.org/issue12892 From storchaka at gmail.com Tue Oct 8 18:16:57 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 08 Oct 2013 19:16:57 +0300 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: <5253F348.3010204@egenix.com> References: <5253F348.3010204@egenix.com> Message-ID: 08.10.13 14:58, M.-A. Lemburg ???????(??): > I guess you could use one bit from the kind structure > for that: The kind of string should be equal to the size of character unit. This assumption is used in a lot of code. From storchaka at gmail.com Tue Oct 8 18:21:57 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 08 Oct 2013 19:21:57 +0300 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: References: Message-ID: 08.10.13 15:23, Victor Stinner ???????(??): > I like the idea. I prefer to add another flag (1 bit), instead of > having a complex with 4 different values. We need at least 3-states value: yes, no, may be. But combining with is_ascii flag we need only one additional bit. I think that it shouldn't be more complex. > Your idea looks specific to the PEP 393, so I prefer to keep the flag > private. Otherwise it would be hard for other implementations of > Python to implement the function getting the flag value. Yes, of course. From mal at egenix.com Tue Oct 8 18:28:51 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 08 Oct 2013 18:28:51 +0200 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: References: <5253F348.3010204@egenix.com> Message-ID: <525432C3.3070905@egenix.com> On 08.10.2013 18:16, Serhiy Storchaka wrote: > 08.10.13 14:58, M.-A. Lemburg ???????(??): >> I guess you could use one bit from the kind structure >> for that: > > The kind of string should be equal to the size of character unit. 
This assumption is used in a lot > of code. Ok, then just add the flag to the end of the list... we'd still have at least 7 bits left on most platforms, IICC. PS: I guess this use of kind should be documented clearly somewhere. The unicodeobject.h file only hints at this and for PyUnicode_WCHAR_KIND this interpretation cannot be used. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Oct 08 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2013-10-14: PyCon DE 2013, Cologne, Germany ... 6 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From bruce at leapyear.org Tue Oct 8 22:37:54 2013 From: bruce at leapyear.org (Bruce Leban) Date: Tue, 8 Oct 2013 13:37:54 -0700 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: <20131008142009.GY7989@ando> References: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> <20131008130208.GX7989@ando> <20131008142009.GY7989@ando> Message-ID: On Tue, Oct 8, 2013 at 7:20 AM, Steven D'Aprano wrote: > Given: > > c = '\N{LINEAR B SYLLABLE B038 E}' # \U00010001 > c.encode('utf-8') > => b'\xf0\x90\x80\x81' > > and: > > c.encode('utf-16BE') # encodes as a surrogate pair > => b'\xd8\x00\xdc\x01' > > then those same surrogates, taken as codepoints, should be encodable as > UTF-8: > > '\ud800\udc01'.encode('utf-8') > => b'\xf0\x90\x80\x81' > > > I'd actually be disappointed if that were the case; I think that would > be a poor design. But if that's what the Unicode standard demands, > Python ought to support it. > The FAQ is explicit that this is wrong: "The definition of UTF-8 requires that supplementary characters (those using surrogate pairs in UTF-16) be encoded with a single four byte sequence." http://www.unicode.org/faq/utf_bom.html#utf8-4 It goes on to say that there is a widespread practice of doing it anyway in older software. Therefore, it might be acceptable to accept these mis-encoded characters when *decoding* but they should never be generated when *encoding*. I'd prefer not to have that on by default given the history of overlong UTF-8 bugs (e.g., see http://blogs.msdn.com/b/michael_howard/archive/2008/08/22/overlong-utf-8-escapes-bite.aspx). Essentially if different decoders follow different rules, then you can sometimes sneak stuff through the permissive decoders. Notwithstanding that, there is a different unicode encoding CESU-8 which does the opposite: it always encodes those characters requiring surrogate pairs as 6 bytes consisting of two UTF-8-style encodings of the individual surrogate codepoints. Python doesn't support this and the request to support it was rejected: http://bugs.python.org/issue12742 --- Bruce I'm hiring: http://www.cadencemd.com/info/jobs Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From greg.ewing at canterbury.ac.nz Wed Oct 9 00:49:29 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 09 Oct 2013 11:49:29 +1300 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: References: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> <20131008130208.GX7989@ando> <20131008142009.GY7989@ando> Message-ID: <52548BF9.4070802@canterbury.ac.nz> Bruce Leban wrote: > The FAQ is explicit that this is wrong: "The definition of UTF-8 > requires that supplementary characters (those using surrogate pairs in > UTF-16) be encoded with a single four byte > sequence." http://www.unicode.org/faq/utf_bom.html#utf8-4 Python's internal string representation is not UTF-16, though, so this doesn't apply directly. Seems to me it hinges on whether a pair of surrogate code points appearing in a Python string are meant to represent a single character or not. I would say not, because otherwise they would have been stored as a single code unit. -- Greg From steve at pearwood.info Wed Oct 9 02:55:07 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 9 Oct 2013 11:55:07 +1100 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: References: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> <20131008130208.GX7989@ando> <20131008142009.GY7989@ando> Message-ID: <20131009005507.GB7989@ando> On Tue, Oct 08, 2013 at 01:37:54PM -0700, Bruce Leban wrote: > On Tue, Oct 8, 2013 at 7:20 AM, Steven D'Aprano wrote: > > > Given: > > > > c = '\N{LINEAR B SYLLABLE B038 E}' # \U00010001 > > c.encode('utf-8') > > => b'\xf0\x90\x80\x81' > > > > and: > > > > c.encode('utf-16BE') # encodes as a surrogate pair > > => b'\xd8\x00\xdc\x01' > > > > then those same surrogates, taken as codepoints, should be encodable as > > UTF-8: > > > > '\ud800\udc01'.encode('utf-8') > > => b'\xf0\x90\x80\x81' > > > > > > I'd actually be disappointed if that were the case; I think that would > > be a poor design. But if that's what the Unicode standard demands, > > Python ought to support it. > > > > The FAQ is explicit that this is wrong: "The definition of UTF-8 requires > that supplementary characters (those using surrogate pairs in UTF-16) be > encoded with a single four byte sequence." > http://www.unicode.org/faq/utf_bom.html#utf8-4 And if you count the number of bytes, you will see four of them: '\ud800\udc01'.encode('utf-8') => b'\xf0' b'\x90' b'\x80' b'\x81' I stress that Python 3.3 doesn't actually do this, but my reading of the FAQ suggests that it should. The question isn't what UTF-8 should do with supplmentary characters (those outside the BMP). That is well-defined, and Python 3.3 gets it right. The question is what it should do with pairs of surrogates. Ill-formed surrogates are rightly illegal when encoding to UTF-8: # a lone surrogate is illegal '\ud800'.encode('utf-8') must be treated as an error # two high surrogates, or two low surrogates '\udc01\udc01'.encode('utf-8') must be treated as an error '\ud800\ud800'.encode('utf-8') must be treated as an error # if they're in the wrong order '\udc01\ud800'.encode('utf-8') must be treated as an error The only thing that I'm not sure is how to deal with *valid* pairs of surrogates: '\ud800\udc01'.encode('utf-8') should do what? I personally would hope that this too should raise, which is Python's current behaviour, but my reading of the FAQs is that it should be treated as if there were an implicit UTF-16 conversion. (I hope I'm wrong!) 
That is: 1) treat the sequence of code points as if it were a sequence of two 16-bit values b'\xd8\x00' b'\xdc\x01' 2) implicitly decode it using UTF-16 to get U+10001 3) encode U+10001 using UTF-8 to get b'\xf0\x90\x80\x81' That would be (in my opinion) *horrible*, but that's my reading of the Unicode FAQ. The question asks: "How do I convert a UTF-16 surrogate pair such as to UTF-8?" and the answer seems to be: "The definition of UTF-8 requires that supplementary characters (those using surrogate pairs in UTF-16) be encoded with a single four byte sequence." which doesn't actually answer the question (the question is about SURROGATE PAIRS, the answer is about SUPPLEMENTARY CHARACTERS) but suggests the above horrible interpretation. What I'm hoping for is a definite source that explains what the UTF-8 encoder is supposed to do with a Unicode string containing surrogates. (And presumably the other UTF encoders as well, although I haven't tried thinking about them yet.) > It goes on to say that there is a widespread practice of doing it anyway in > older software. Therefore, it might be acceptable to accept these > mis-encoded characters when *decoding* but they should never be generated > when *encoding*. They are talking about the practice of generating six bytes, two three-byte sequences. You should notice that I'm not generating six bytes anywhere. -- Steven From stephen at xemacs.org Wed Oct 9 04:03:46 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 09 Oct 2013 11:03:46 +0900 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: <52548BF9.4070802@canterbury.ac.nz> References: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> <20131008130208.GX7989@ando> <20131008142009.GY7989@ando> <52548BF9.4070802@canterbury.ac.nz> Message-ID: <87eh7vjj3x.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > Bruce Leban wrote: > > The FAQ is explicit that this is wrong: "The definition of UTF-8 > > requires that supplementary characters (those using surrogate pairs in > > UTF-16) be encoded with a single four byte > > sequence." http://www.unicode.org/faq/utf_bom.html#utf8-4 > > Python's internal string representation is not UTF-16, though, > so this doesn't apply directly. It applies directly to Steven's examples, since they use .encode() and .decode(). > Seems to me it hinges on whether a pair of surrogate code > points appearing in a Python string are meant to represent > a single character or not. Only (a subset of) low surrogates is valid in a Python string, so a pair can't possibly respresent a supplementary character in UTF-16 encoding. From tjreedy at udel.edu Wed Oct 9 04:43:54 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 08 Oct 2013 22:43:54 -0400 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: <20131009005507.GB7989@ando> References: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> <20131008130208.GX7989@ando> <20131008142009.GY7989@ando> <20131009005507.GB7989@ando> Message-ID: On 10/8/2013 8:55 PM, Steven D'Aprano wrote: > '\ud800\udc01'.encode('utf-8') > => b'\xf0' b'\x90' b'\x80' b'\x81' > > I stress that Python 3.3 doesn't actually do this, but my reading of the > FAQ suggests that it should. And I already explained on python-list why that reading is wrong; transcoding a utf-16 string (sequence of 2-byte words, subject to validity rules) is different from encoding unicode text (character sequence, and surrogates are not characters). 
A utf-16 to utf-8 transcoder should (must) do the above, but in 3.3+, the utf-8 codec is no longer the utf-16 trancoder that it effectively was for narrow builds. Each utf form defines a one to one mapping between unicode texts and valid code unit sequences. (Unicode Standard, Chapter 3, definition D79.) Having both '\U00010001' and '\ud800\udc01' map to b'\xf0\x90\x80\x81' would violate that important property. '\ud800\udc01' represents a character in utf-16 but not in python's flexible string representation. The latter uses one code unit (of variable size per string) per character, instead of a variable number of code units (of one size for all strings) per character. Because machines have not conceptual, visual, or aural memory, but only byte memory, they must re-encode abstract characters to bytes to remember them. In pre 3.3 narrow builds, where utf-16 was used internally, decoding and encoding amounted to transcoding bytes encodings into the utf-16 encoding, and vice versa. So utf-8 b'\xf0\x90\x80\x81' and utf-16 '\ud800\udc01' were mapped into each other. Whether the mapping was done directly or indirectly, via the character codepoint value, did not matter to the user. In any case FSR no longer uses multiple-code-unit encodings internally, and '\ud800\udc01', even though allowed for practical reasons, does not represent and is not the same as '\U00010001'. The proposed 'has_surrogates' flag amounts to an 'not strictly valid' flag. Only the FSR implementors can decide if it is worth the trouble. -- Terry Jan Reedy From stephen at xemacs.org Wed Oct 9 06:29:04 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 09 Oct 2013 13:29:04 +0900 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: <20131009005507.GB7989@ando> References: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> <20131008130208.GX7989@ando> <20131008142009.GY7989@ando> <20131009005507.GB7989@ando> Message-ID: <87a9ijjcdr.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > What I'm hoping for is a definite source that explains what the UTF-8 > encoder is supposed to do with a Unicode string containing > surrogates. According to PEP 383, which provides a special mechanism for roundtripping input that claims to be a particular encoding but does not conform to that encoding, when encoding to UTF-8, if the errors= parameter *is* surrogateescape *and* the value is in the first row of the low surrogate range, it is masked by 0xff and emitted as a single byte. In all other cases of surrogates, it should raise an error. A conforming Unicode codec must not emit UTF-8 which would decode to a surrogate. These cases can occur in valid Python programs because chr() is unconstrained (for example). On input, Unicode conformance means that when using the surrogateescape handler, an alleged UTF-8 stream containing a 6-byte sequence that would algorithmically decode to a surrogate pair should be represented internally as a sequence of 6 surrogates from the first row of the low surrogate range. If the surrogateescape handler is not in use, it should raise an error. Sorry about not testing actual behavior, gotta run to a meeting. I forget what PEP 383 says about other Unicode codecs. 
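For the record, the UTF-8 half of this is easy to check interactively; a minimal sketch of the PEP 383 round trip described above (the byte value is an arbitrary example, not anything from this thread):

>>> raw = b'abc\xff'                      # not valid UTF-8
>>> s = raw.decode('utf-8', 'surrogateescape')
>>> s
'abc\udcff'
>>> s.encode('utf-8', 'surrogateescape')  # round-trips back to the original bytes
b'abc\xff'
>>> s.encode('utf-8')                     # the default 'strict' handler raises UnicodeEncodeError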
From bruce at leapyear.org Wed Oct 9 04:09:11 2013 From: bruce at leapyear.org (Bruce Leban) Date: Tue, 8 Oct 2013 19:09:11 -0700 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: <20131009005507.GB7989@ando> References: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net> <20131008130208.GX7989@ando> <20131008142009.GY7989@ando> <20131009005507.GB7989@ando> Message-ID: Sorry. I don't think what I said contributed to the conversation very well. Let me try again. On Tue, Oct 8, 2013 at 5:55 PM, Steven D'Aprano wrote: > On Tue, Oct 08, 2013 at 01:37:54PM -0700, Bruce Leban wrote: > > The question isn't what UTF-8 should do with supplmentary characters > (those outside the BMP). That is well-defined, and Python 3.3 gets it > right. The question is what it should do with pairs of surrogates. > Ill-formed surrogates are rightly illegal when encoding to UTF-8: > > The only thing that I'm not sure is how to deal with *valid* > pairs of surrogates: > > '\ud800\udc01'.encode('utf-8') should do what? > > I don't think that's valid. While it is a sequence of Unicode *codepoints *(Python definition of unicode string) it is not a sequence of Unicode * characters*. Arguably, Python should insist that a Unicode string be a sequence of Unicode characters and reject '\ud800\udc01' at compile time just as it does '\U01010101' as those are all not valid Unicode characters. However, I concede that is unlikely to happen. Here's how I read the FAQ. Most of this FAQ is written in terms of converting one representation to another. Python strings are not one of those representations. A *Unicode transformation format* (UTF) is an algorithmic mapping from every Unicode code point (except surrogate code points) to a unique byte sequence. http://www.unicode.org/faq/utf_bom.html#gen2 To convert UTF-X to UTF-Y, you convert the UTF-X to a sequence of characters and then convert that to UTF-Y. Note that this excludes surrogate code points -- they are not representable in the sequence of code points that a UTF defines. The definition of UTF-32 says: Any Unicode character can be represented as a single 32-bit unit in UTF-32. This single 4 code unit corresponds to the Unicode scalar value, which is the abstract number associated with a Unicode character. http://www.unicode.org/faq/utf_bom.html#utf32-1 Thus a surrogate codepoint is NOT allowed in UTF-32 as it is not a character and if it is encountered it should be treated as an error. --- Bruce I'm hiring: http://www.cadencemd.com/info/jobs Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Fri Oct 11 14:12:37 2013 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 11 Oct 2013 14:12:37 +0200 Subject: [Python-ideas] Add "has_surrogates" flags to string object In-Reply-To: References: Message-ID: 2013/10/8 Serhiy Storchaka : > Here is an idea about adding a mark to PyUnicode object which allows fast > answer to the question if a string has surrogate code. This mark has one of > three possible states: > > * String doesn't contain surrogates. > * String contains surrogates. > * It is still unknown. > > We can combine this with "is_ascii" flag in 2-bit value: > > * String is ASCII-only (and doesn't contain surrogates). > * String is not ASCII-only and doesn't contain surrogates. > * String is not ASCII-only and contains surrogates. 
> * String is not ASCII-only and it is still unknown if it contains surrogate. > > By default a string is created in "unknown" state (if it is UCS2 or UCS4). > After first request it can be switched to "has surrogates" or "hasn't > surrogates". State of the result of concatenating or slicing can be > determined from states of input strings. > > This will allow faster UTF-16 and UTF-32 encoding (and perhaps even a little > faster UTF-8 encoding) and converting to wchar_t* if string hasn't > surrogates (this is true in most cases). Knowing if a string contains any surrogate character would also speedup marshal and pickle modules: http://bugs.python.org/issue19219#msg199465 Victor From mistersheik at gmail.com Fri Oct 11 20:29:43 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 11 Oct 2013 11:29:43 -0700 (PDT) Subject: [Python-ideas] An exhaust() function for iterators In-Reply-To: References: Message-ID: <5a7a21a5-bd7e-4bc7-a80f-e6d6154f0e13@googlegroups.com> This was also my thought. On Sunday, September 29, 2013 4:42:20 PM UTC-4, Serhiy Storchaka wrote: > > 29.09.13 07:06, Clay Sweetser ???????(??): > > I would like to propose that this function, or one very similar to it, > > be added to the standard library, either in the itertools module, or > > the standard namespace. > > If nothing else, doing so would at least give a single *obvious* way > > to exhaust an iterator, instead of the several miscellaneous methods > > available. > > I prefer optimize the for loop so that it will be most efficient way (it > is already most obvious way). > > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Fri Oct 11 20:51:20 2013 From: mertz at gnosis.cx (David Mertz) Date: Fri, 11 Oct 2013 11:51:20 -0700 Subject: [Python-ideas] An exhaust() function for iterators In-Reply-To: <5a7a21a5-bd7e-4bc7-a80f-e6d6154f0e13@googlegroups.com> References: <5a7a21a5-bd7e-4bc7-a80f-e6d6154f0e13@googlegroups.com> Message-ID: It is hard to imagine that doing this: for _ in side_effect_iter: pass Could EVER realistically spend a significant share of its time in the loop code. Side effects almost surely need to do something that vastly overpowers the cost of the loop itself (maybe some I/O, maybe some computation), or there's no point in using a side-effect iterator. I know you *could* technically write: def side_effect_iter(N, obj): for n in range(N): obj.val = n yield True And probably something else whose only side effect was changing some value that doesn't need real computation. But surely writing that and exhausting that iterator is NEVER the best way to code such a thing. On the other hand, a more realistic one like this: def side_effect_iter(N): for n in range(N): val = complex_computation(n) write_to_slow_disk(val) yield True Is going to take a long time in each iteration, and there's no reason to care that the loop isn't absolutely optimal speed. On Fri, Oct 11, 2013 at 11:29 AM, Neil Girdhar wrote: > This was also my thought. > > > On Sunday, September 29, 2013 4:42:20 PM UTC-4, Serhiy Storchaka wrote: > >> 29.09.13 07:06, Clay Sweetser ???????(??): >> > I would like to propose that this function, or one very similar to it, >> > be added to the standard library, either in the itertools module, or >> > the standard namespace. 
>> > If nothing else, doing so would at least give a single *obvious* way >> > to exhaust an iterator, instead of the several miscellaneous methods >> > available. >> >> I prefer optimize the for loop so that it will be most efficient way (it >> is already most obvious way). >> >> ______________________________**_________________ >> Python-ideas mailing list >> Python... at python.org >> https://mail.python.org/**mailman/listinfo/python-ideas >> >> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Fri Oct 11 21:02:42 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 11 Oct 2013 22:02:42 +0300 Subject: [Python-ideas] An exhaust() function for iterators In-Reply-To: References: <5a7a21a5-bd7e-4bc7-a80f-e6d6154f0e13@googlegroups.com> Message-ID: 11.10.13 21:51, David Mertz wrote: > It is hard to imagine that doing this: > > for _ in side_effect_iter: pass > > Could EVER realistically spend a significant share of its time in the > loop code. When I wrote a test for tee() (issue #13454) I needed very fast iterator exhaustion. There were one or two other similar cases. From mistersheik at gmail.com Fri Oct 11 20:38:33 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 11 Oct 2013 11:38:33 -0700 (PDT) Subject: [Python-ideas] Extremely weird itertools.permutations Message-ID: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> "It is universally agreed that a list of n distinct symbols has n! permutations. However, when the symbols are not distinct, the most common convention, in mathematics and elsewhere, seems to be to count only distinct permutations." -- http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original. Should we consider fixing itertools.permutations to output only unique permutations (if possible, although I realize that would break code)? It is completely non-obvious to have permutations returning duplicates. For a non-breaking compromise what about adding a flag? Best, Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Fri Oct 11 21:29:35 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 11 Oct 2013 22:29:35 +0300 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> Message-ID: 11.10.13 21:38, Neil Girdhar wrote: > Should we consider fixing itertools.permutations to output only > unique permutations (if possible, although I realize that would break > code)? It is completely non-obvious to have permutations returning > duplicates. For a non-breaking compromise what about adding a flag? I think this should be a separate function.
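As a strawman, such a separate helper could be as small as the sketch below (illustrative only, not a worked-out API: it deduplicates by equality, so the elements must be hashable, and it has to remember every distinct tuple it has yielded):

from itertools import permutations

def unique_permutations(iterable, r=None):
    # Yield only the permutations not seen before (compared by equality).
    seen = set()
    for perm in permutations(iterable, r):
        if perm not in seen:
            seen.add(perm)
            yield perm

For example, list(unique_permutations('aab')) gives the three distinct arrangements instead of six.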
From mertz at gnosis.cx Fri Oct 11 22:02:11 2013 From: mertz at gnosis.cx (David Mertz) Date: Fri, 11 Oct 2013 13:02:11 -0700 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> Message-ID: What would you like this hypothetical function to output here: >>> from itertools import permutations >>> from decimal import Decimal as D >>> from fractions import Fraction as F >>> items = (3, 3.0, D(3), F(3,1), "aa", "AA".lower(), "a"+"a") >>> list(permutations(items)) It's neither QUITE equality nor identity you are looking for, I think, in nonredundant_permutation(): >> "aa" == "AA".lower(), "aa" is "AA".lower() (True, False) >>> "aa" == "a"+"a", "aa" is "a"+"a" (True, True) >>> D(3) == 3.0, D(3) is 3.0 (True, False) On Fri, Oct 11, 2013 at 11:38 AM, Neil Girdhar wrote: > "It is universally agreed that a list of n distinct symbols has n! > permutations. However, when the symbols are not distinct, the most common > convention, in mathematics and elsewhere, seems to be to count only > distinct permutations." ? > http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original > . > > > Should we consider fixing itertools.permutations and to output only unique > permutations (if possible, although I realize that would break code). It is > completely non-obvious to have permutations returning duplicates. For a > non-breaking compromise what about adding a flag? > > Best, > Neil > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Oct 11 22:19:22 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 11 Oct 2013 13:19:22 -0700 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> Message-ID: <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> I think equality is perfectly reasonable here. The fact that {3.0, 3} only has one member seems like the obvious precedent to follow here. 
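A quick illustration of that precedent, nothing more than equality at work:

>>> {3.0, 3}
{3.0}
>>> from decimal import Decimal
>>> from fractions import Fraction
>>> len({3, 3.0, Decimal(3), Fraction(3, 1)})
1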
Sent from a random iPhone On Oct 11, 2013, at 13:02, David Mertz wrote: > What would you like this hypothetical function to output here: > > >>> from itertools import permutations > >>> from decimal import Decimal as D > >>> from fractions import Fraction as F > >>> items = (3, 3.0, D(3), F(3,1), "aa", "AA".lower(), "a"+"a") > >>> list(permutations(items)) > > It's neither QUITE equality nor identity you are looking for, I think, in nonredundant_permutation(): > > >> "aa" == "AA".lower(), "aa" is "AA".lower() > (True, False) > >>> "aa" == "a"+"a", "aa" is "a"+"a" > (True, True) > >>> D(3) == 3.0, D(3) is 3.0 > (True, False) > > On Fri, Oct 11, 2013 at 11:38 AM, Neil Girdhar wrote: >> "It is universally agreed that a list of n distinct symbols has n! permutations. However, when the symbols are not distinct, the most common convention, in mathematics and elsewhere, seems to be to count only distinct permutations." ? http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original. >> >> >> Should we consider fixing itertools.permutations and to output only unique permutations (if possible, although I realize that would break code). It is completely non-obvious to have permutations returning duplicates. For a non-breaking compromise what about adding a flag? >> >> Best, >> Neil >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From jon.brandvein at gmail.com Fri Oct 11 23:19:38 2013 From: jon.brandvein at gmail.com (Jonathan Brandvein) Date: Fri, 11 Oct 2013 17:19:38 -0400 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> Message-ID: I think it's fair to use {3.0, 3} as precedent. But note that transitivity is not required by the __eq__() method. In cases of intransitive equality (A == B == C but not A == C), I imagine the result should be ill-defined in the same way that sorting is when the key function is inconsistent. Jon On Fri, Oct 11, 2013 at 4:19 PM, Andrew Barnert wrote: > I think equality is perfectly reasonable here. The fact that {3.0, 3} only > has one member seems like the obvious precedent to follow here. 
> > Sent from a random iPhone > > On Oct 11, 2013, at 13:02, David Mertz wrote: > > What would you like this hypothetical function to output here: > > >>> from itertools import permutations > >>> from decimal import Decimal as D > >>> from fractions import Fraction as F > >>> items = (3, 3.0, D(3), F(3,1), "aa", "AA".lower(), "a"+"a") > >>> list(permutations(items)) > > It's neither QUITE equality nor identity you are looking for, I think, in > nonredundant_permutation(): > > >> "aa" == "AA".lower(), "aa" is "AA".lower() > (True, False) > >>> "aa" == "a"+"a", "aa" is "a"+"a" > (True, True) > >>> D(3) == 3.0, D(3) is 3.0 > (True, False) > > On Fri, Oct 11, 2013 at 11:38 AM, Neil Girdhar wrote: > >> "It is universally agreed that a list of n distinct symbols has n! >> permutations. However, when the symbols are not distinct, the most common >> convention, in mathematics and elsewhere, seems to be to count only >> distinct permutations." ? >> http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original >> . >> >> >> Should we consider fixing itertools.permutations and to output only >> unique permutations (if possible, although I realize that would break >> code). It is completely non-obvious to have permutations returning >> duplicates. For a non-breaking compromise what about adding a flag? >> >> Best, >> Neil >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Fri Oct 11 23:25:56 2013 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 11 Oct 2013 22:25:56 +0100 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> Message-ID: <52586CE4.9030002@mrabarnett.plus.com> On 11/10/2013 20:29, Serhiy Storchaka wrote: > 11.10.13 21:38, Neil Girdhar ???????(??): >> Should we consider fixing itertools.permutations and to output only >> unique permutations (if possible, although I realize that would break >> code). It is completely non-obvious to have permutations returning >> duplicates. For a non-breaking compromise what about adding a flag? > > I think this should be separated function. 
> +1 From mertz at gnosis.cx Fri Oct 11 22:27:34 2013 From: mertz at gnosis.cx (David Mertz) Date: Fri, 11 Oct 2013 13:27:34 -0700 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> Message-ID: Andrew & Neil (or whoever): Is this *really* what you want: >>> from itertools import permutations >>> def nonredundant_permutations(seq): ... return list(set(permutations(seq))) ... >>> pprint(list(permutations([F(3,1), D(3.0), 3.0]))) [(Fraction(3, 1), Decimal('3'), 3.0), (Fraction(3, 1), 3.0, Decimal('3')), (Decimal('3'), Fraction(3, 1), 3.0), (Decimal('3'), 3.0, Fraction(3, 1)), (3.0, Fraction(3, 1), Decimal('3')), (3.0, Decimal('3'), Fraction(3, 1))] >>> pprint(list(nonredundant_permutations([F(3,1), D(3.0), 3.0]))) [(Fraction(3, 1), Decimal('3'), 3.0)] It seems odd to me to want that. On the other hand, I provide a one-line implementation of the desired behavior if anyone wants it. Moreover, I don't think the runtime behavior of my one-liner is particularly costly... maybe not the best possible, but the best big-O possible. On Fri, Oct 11, 2013 at 1:19 PM, Andrew Barnert wrote: > I think equality is perfectly reasonable here. The fact that {3.0, 3} only > has one member seems like the obvious precedent to follow here. > > Sent from a random iPhone > > On Oct 11, 2013, at 13:02, David Mertz wrote: > > What would you like this hypothetical function to output here: > > >>> from itertools import permutations > >>> from decimal import Decimal as D > >>> from fractions import Fraction as F > >>> items = (3, 3.0, D(3), F(3,1), "aa", "AA".lower(), "a"+"a") > >>> list(permutations(items)) > > It's neither QUITE equality nor identity you are looking for, I think, in > nonredundant_permutation(): > > >> "aa" == "AA".lower(), "aa" is "AA".lower() > (True, False) > >>> "aa" == "a"+"a", "aa" is "a"+"a" > (True, True) > >>> D(3) == 3.0, D(3) is 3.0 > (True, False) > > On Fri, Oct 11, 2013 at 11:38 AM, Neil Girdhar wrote: > >> "It is universally agreed that a list of n distinct symbols has n! >> permutations. However, when the symbols are not distinct, the most common >> convention, in mathematics and elsewhere, seems to be to count only >> distinct permutations." ? >> http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original >> . >> >> >> Should we consider fixing itertools.permutations and to output only >> unique permutations (if possible, although I realize that would break >> code). It is completely non-obvious to have permutations returning >> duplicates. For a non-breaking compromise what about adding a flag? >> >> Best, >> Neil >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. 
Intellectual property is > to the 21st century what the slave trade was to the 16th. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Fri Oct 11 23:35:41 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 11 Oct 2013 17:35:41 -0400 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> Message-ID: > Moreover, I don't think the runtime behavior of my one-liner is particularly costly? It is *extremely* costly. There can be n! permutations, so for even, say, 12 elements, you are looking at many gigabytes of memory needlessly used. One big motivator for itertools is not to have to do this. I'm curious how you would solve this problem: https://www.kattis.com/problems/industrialspy efficiently in Python. I did it by using a unique-ifying generator, but ideally this would not be necessary. Ideally, Python would do exactly what C++ does with next_permutation. Best, Neil On Fri, Oct 11, 2013 at 4:27 PM, David Mertz wrote: > Andrew & Neil (or whoever): > > Is this *really* what you want: > > >>> from itertools import permutations > >>> def nonredundant_permutations(seq): > ... return list(set(permutations(seq))) > ... > >>> pprint(list(permutations([F(3,1), D(3.0), 3.0]))) > [(Fraction(3, 1), Decimal('3'), 3.0), > (Fraction(3, 1), 3.0, Decimal('3')), > (Decimal('3'), Fraction(3, 1), 3.0), > (Decimal('3'), 3.0, Fraction(3, 1)), > (3.0, Fraction(3, 1), Decimal('3')), > (3.0, Decimal('3'), Fraction(3, 1))] > > >>> pprint(list(nonredundant_permutations([F(3,1), D(3.0), 3.0]))) > [(Fraction(3, 1), Decimal('3'), 3.0)] > > It seems odd to me to want that. On the other hand, I provide a one-line > implementation of the desired behavior if anyone wants it. Moreover, I > don't think the runtime behavior of my one-liner is particularly costly... > maybe not the best possible, but the best big-O possible. > > > > On Fri, Oct 11, 2013 at 1:19 PM, Andrew Barnert wrote: > >> I think equality is perfectly reasonable here. The fact that {3.0, 3} >> only has one member seems like the obvious precedent to follow here. >> >> Sent from a random iPhone >> >> On Oct 11, 2013, at 13:02, David Mertz wrote: >> >> What would you like this hypothetical function to output here: >> >> >>> from itertools import permutations >> >>> from decimal import Decimal as D >> >>> from fractions import Fraction as F >> >>> items = (3, 3.0, D(3), F(3,1), "aa", "AA".lower(), "a"+"a") >> >>> list(permutations(items)) >> >> It's neither QUITE equality nor identity you are looking for, I think, in >> nonredundant_permutation(): >> >> >> "aa" == "AA".lower(), "aa" is "AA".lower() >> (True, False) >> >>> "aa" == "a"+"a", "aa" is "a"+"a" >> (True, True) >> >>> D(3) == 3.0, D(3) is 3.0 >> (True, False) >> >> On Fri, Oct 11, 2013 at 11:38 AM, Neil Girdhar wrote: >> >>> "It is universally agreed that a list of n distinct symbols has n! >>> permutations. 
However, when the symbols are not distinct, the most common >>> convention, in mathematics and elsewhere, seems to be to count only >>> distinct permutations." ? >>> http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original >>> . >>> >>> >>> Should we consider fixing itertools.permutations and to output only >>> unique permutations (if possible, although I realize that would break >>> code). It is completely non-obvious to have permutations returning >>> duplicates. For a non-breaking compromise what about adding a flag? >>> >>> Best, >>> Neil >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> >>> >> >> >> -- >> Keeping medicines from the bloodstreams of the sick; food >> from the bellies of the hungry; books from the hands of the >> uneducated; technology from the underdeveloped; and putting >> advocates of freedom in prisons. Intellectual property is >> to the 21st century what the slave trade was to the 16th. >> >> >> >> -- >> Keeping medicines from the bloodstreams of the sick; food >> from the bellies of the hungry; books from the hands of the >> uneducated; technology from the underdeveloped; and putting >> advocates of freedom in prisons. Intellectual property is >> to the 21st century what the slave trade was to the 16th. >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Fri Oct 11 23:38:41 2013 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 11 Oct 2013 22:38:41 +0100 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> Message-ID: <52586FE1.8040803@mrabarnett.plus.com> On 11/10/2013 21:27, David Mertz wrote: > Andrew & Neil (or whoever): > > Is this *really* what you want: > > >>> from itertools import permutations > >>> def nonredundant_permutations(seq): > ... return list(set(permutations(seq))) > ... 
> >>> pprint(list(permutations([F(3,1), D(3.0), 3.0]))) > [(Fraction(3, 1), Decimal('3'), 3.0), > (Fraction(3, 1), 3.0, Decimal('3')), > (Decimal('3'), Fraction(3, 1), 3.0), > (Decimal('3'), 3.0, Fraction(3, 1)), > (3.0, Fraction(3, 1), Decimal('3')), > (3.0, Decimal('3'), Fraction(3, 1))] > > >>> pprint(list(nonredundant_permutations([F(3,1), D(3.0), 3.0]))) > [(Fraction(3, 1), Decimal('3'), 3.0)] > > It seems odd to me to want that. On the other hand, I provide a > one-line implementation of the desired behavior if anyone wants it. > Moreover, I don't think the runtime behavior of my one-liner is > particularly costly... maybe not the best possible, but the best big-O > possible. > n! gets very big very fast, so that can be a very big set. If you sort the original items first then it's much easier to yield unique permutations without having to remember them. (Each would be > than the previous one, although you might have to map them to orderable keys if they're not orderable themselves, e.g. a mixture of integers and strings.) > > > On Fri, Oct 11, 2013 at 1:19 PM, Andrew Barnert > wrote: > > I think equality is perfectly reasonable here. The fact that {3.0, > 3} only has one member seems like the obvious precedent to follow here. > > Sent from a random iPhone > > On Oct 11, 2013, at 13:02, David Mertz > wrote: > >> What would you like this hypothetical function to output here: >> >> >>> from itertools import permutations >> >>> from decimal import Decimal as D >> >>> from fractions import Fraction as F >> >>> items = (3, 3.0, D(3), F(3,1), "aa", "AA".lower(), "a"+"a") >> >>> list(permutations(items)) >> >> It's neither QUITE equality nor identity you are looking for, I >> think, in nonredundant_permutation(): >> >> >> "aa" == "AA".lower(), "aa" is "AA".lower() >> (True, False) >> >>> "aa" == "a"+"a", "aa" is "a"+"a" >> (True, True) >> >>> D(3) == 3.0, D(3) is 3.0 >> (True, False) >> >> On Fri, Oct 11, 2013 at 11:38 AM, Neil Girdhar >> > wrote: >> >> "It is universally agreed that a list of n distinct symbols >> has n! permutations. However, when the symbols are not >> distinct, the most common convention, in mathematics and >> elsewhere, seems to be to count only distinct permutations." ? >> http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original. >> >> >> Should we consider fixing itertools.permutations and to output >> only unique permutations (if possible, although I realize that >> would break code). It is completely non-obvious to have >> permutations returning duplicates. For a non-breaking >> compromise what about adding a flag? 
>> From mistersheik at gmail.com Fri Oct 11 23:38:27 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 11 Oct 2013 17:38:27 -0400 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> Message-ID: My code, which was the motivation for this suggestion: import itertools as it import math def is_prime(n): for i in range(2, int(math.floor(math.sqrt(n))) + 1): if n % i == 0: return False return n >= 2 def unique(iterable): # Should not be necessary in my opinion seen = set() for x in iterable: if x not in seen: seen.add(x) yield x n = int(input()) for _ in range(n): x = input() print(sum(is_prime(int("".join(y))) for len_ in range(1, len(x) + 1) for y in unique(it.permutations(x, len_)) if y[0] != '0')) On Fri, Oct 11, 2013 at 5:35 PM, Neil Girdhar wrote: > > Moreover, I don't think the runtime behavior of my one-liner is > particularly costly? > > It is *extremely* costly. There can be n! permutations, so for even, say, > 12 elements, you are looking at many gigabytes of memory needlessly used. > One big motivator for itertools is not to have to do this. I'm curious > how you would solve this problem: > https://www.kattis.com/problems/industrialspy efficiently in Python. I > did it by using a unique-ifying generator, but ideally this would not be > necessary. Ideally, Python would do exactly what C++ does with > next_permutation. > > Best, > > Neil > > > On Fri, Oct 11, 2013 at 4:27 PM, David Mertz wrote: > >> Andrew & Neil (or whoever): >> >> Is this *really* what you want: >> >> >>> from itertools import permutations >> >>> def nonredundant_permutations(seq): >> ... return list(set(permutations(seq))) >> ... >> >>> pprint(list(permutations([F(3,1), D(3.0), 3.0]))) >> [(Fraction(3, 1), Decimal('3'), 3.0), >> (Fraction(3, 1), 3.0, Decimal('3')), >> (Decimal('3'), Fraction(3, 1), 3.0), >> (Decimal('3'), 3.0, Fraction(3, 1)), >> (3.0, Fraction(3, 1), Decimal('3')), >> (3.0, Decimal('3'), Fraction(3, 1))] >> >> >>> pprint(list(nonredundant_permutations([F(3,1), D(3.0), 3.0]))) >> [(Fraction(3, 1), Decimal('3'), 3.0)] >> >> It seems odd to me to want that. On the other hand, I provide a one-line >> implementation of the desired behavior if anyone wants it. Moreover, I >> don't think the runtime behavior of my one-liner is particularly costly... >> maybe not the best possible, but the best big-O possible. >> >> >> >> On Fri, Oct 11, 2013 at 1:19 PM, Andrew Barnert wrote: >> >>> I think equality is perfectly reasonable here. The fact that {3.0, 3} >>> only has one member seems like the obvious precedent to follow here. >>> >>> Sent from a random iPhone >>> >>> On Oct 11, 2013, at 13:02, David Mertz wrote: >>> >>> What would you like this hypothetical function to output here: >>> >>> >>> from itertools import permutations >>> >>> from decimal import Decimal as D >>> >>> from fractions import Fraction as F >>> >>> items = (3, 3.0, D(3), F(3,1), "aa", "AA".lower(), "a"+"a") >>> >>> list(permutations(items)) >>> >>> It's neither QUITE equality nor identity you are looking for, I think, >>> in nonredundant_permutation(): >>> >>> >> "aa" == "AA".lower(), "aa" is "AA".lower() >>> (True, False) >>> >>> "aa" == "a"+"a", "aa" is "a"+"a" >>> (True, True) >>> >>> D(3) == 3.0, D(3) is 3.0 >>> (True, False) >>> >>> On Fri, Oct 11, 2013 at 11:38 AM, Neil Girdhar wrote: >>> >>>> "It is universally agreed that a list of n distinct symbols has n! >>>> permutations. 
However, when the symbols are not distinct, the most common >>>> convention, in mathematics and elsewhere, seems to be to count only >>>> distinct permutations." ? >>>> http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original >>>> . >>>> >>>> >>>> Should we consider fixing itertools.permutations and to output only >>>> unique permutations (if possible, although I realize that would break >>>> code). It is completely non-obvious to have permutations returning >>>> duplicates. For a non-breaking compromise what about adding a flag? >>>> >>>> Best, >>>> Neil >>>> >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> >>>> >>> >>> >>> -- >>> Keeping medicines from the bloodstreams of the sick; food >>> from the bellies of the hungry; books from the hands of the >>> uneducated; technology from the underdeveloped; and putting >>> advocates of freedom in prisons. Intellectual property is >>> to the 21st century what the slave trade was to the 16th. >>> >>> >>> >>> -- >>> Keeping medicines from the bloodstreams of the sick; food >>> from the bellies of the hungry; books from the hands of the >>> uneducated; technology from the underdeveloped; and putting >>> advocates of freedom in prisons. Intellectual property is >>> to the 21st century what the slave trade was to the 16th. >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> >>> >> >> >> -- >> Keeping medicines from the bloodstreams of the sick; food >> from the bellies of the hungry; books from the hands of the >> uneducated; technology from the underdeveloped; and putting >> advocates of freedom in prisons. Intellectual property is >> to the 21st century what the slave trade was to the 16th. >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> -- >> >> --- >> You received this message because you are subscribed to a topic in the >> Google Groups "python-ideas" group. >> To unsubscribe from this topic, visit >> https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. >> To unsubscribe from this group and all its topics, send an email to >> python-ideas+unsubscribe at googlegroups.com. >> For more options, visit https://groups.google.com/groups/opt_out. >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Fri Oct 11 20:50:00 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 11 Oct 2013 11:50:00 -0700 (PDT) Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> Message-ID: <26ae5ce9-6709-41a1-9dc2-6f8d5bc2f0bd@googlegroups.com> Note that if permutations is made to return only unique permutations, the behaviour of defining unique elements by index can be recovered using: ([it[index] for index in indexes] for indexes in itertools.permutations(range(len(it)))) On Friday, October 11, 2013 2:38:33 PM UTC-4, Neil Girdhar wrote: > > "It is universally agreed that a list of n distinct symbols has n! > permutations. 
However, when the symbols are not distinct, the most common > convention, in mathematics and elsewhere, seems to be to count only > distinct permutations." ? > http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original > . > > > Should we consider fixing itertools.permutations and to output only unique > permutations (if possible, although I realize that would break code). It is > completely non-obvious to have permutations returning duplicates. For a > non-breaking compromise what about adding a flag? > > Best, > Neil > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Fri Oct 11 23:48:25 2013 From: mertz at gnosis.cx (David Mertz) Date: Fri, 11 Oct 2013 14:48:25 -0700 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> Message-ID: OK, you're right. Just using set() has bad worst case memory costs. I was thinking of the case where there actually WERE lots of equalities, and hence the resulting list would be much smaller than N!. But of course that's not general. It takes more than one line, but here's an incremental version: def nonredundant_permutations(seq): seq = sorted(seq) last = None for perm in permutations(seq): if perm != last: yield perm last = perm On Fri, Oct 11, 2013 at 2:35 PM, Neil Girdhar wrote: > > Moreover, I don't think the runtime behavior of my one-liner is > particularly costly? > > It is *extremely* costly. There can be n! permutations, so for even, say, > 12 elements, you are looking at many gigabytes of memory needlessly used. > One big motivator for itertools is not to have to do this. I'm curious > how you would solve this problem: > https://www.kattis.com/problems/industrialspy efficiently in Python. I > did it by using a unique-ifying generator, but ideally this would not be > necessary. Ideally, Python would do exactly what C++ does with > next_permutation. > > Best, > > Neil > > > On Fri, Oct 11, 2013 at 4:27 PM, David Mertz wrote: > >> Andrew & Neil (or whoever): >> >> Is this *really* what you want: >> >> >>> from itertools import permutations >> >>> def nonredundant_permutations(seq): >> ... return list(set(permutations(seq))) >> ... >> >>> pprint(list(permutations([F(3,1), D(3.0), 3.0]))) >> [(Fraction(3, 1), Decimal('3'), 3.0), >> (Fraction(3, 1), 3.0, Decimal('3')), >> (Decimal('3'), Fraction(3, 1), 3.0), >> (Decimal('3'), 3.0, Fraction(3, 1)), >> (3.0, Fraction(3, 1), Decimal('3')), >> (3.0, Decimal('3'), Fraction(3, 1))] >> >> >>> pprint(list(nonredundant_permutations([F(3,1), D(3.0), 3.0]))) >> [(Fraction(3, 1), Decimal('3'), 3.0)] >> >> It seems odd to me to want that. On the other hand, I provide a one-line >> implementation of the desired behavior if anyone wants it. Moreover, I >> don't think the runtime behavior of my one-liner is particularly costly... >> maybe not the best possible, but the best big-O possible. >> >> >> >> On Fri, Oct 11, 2013 at 1:19 PM, Andrew Barnert wrote: >> >>> I think equality is perfectly reasonable here. The fact that {3.0, 3} >>> only has one member seems like the obvious precedent to follow here. 
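To make the precedent Andrew is pointing at concrete: values of different numeric types that compare equal collapse to a single member in a set, and the member that survives is simply whichever was added first. Any set()-based deduplication of permutations inherits exactly that behaviour. A quick check in the interpreter, using nothing beyond the standard library:

>>> from decimal import Decimal
>>> from fractions import Fraction
>>> {3.0, 3}
{3.0}
>>> {3, 3.0, Decimal('3'), Fraction(3, 1)}
{3}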
>>> >>> Sent from a random iPhone >>> >>> On Oct 11, 2013, at 13:02, David Mertz wrote: >>> >>> What would you like this hypothetical function to output here: >>> >>> >>> from itertools import permutations >>> >>> from decimal import Decimal as D >>> >>> from fractions import Fraction as F >>> >>> items = (3, 3.0, D(3), F(3,1), "aa", "AA".lower(), "a"+"a") >>> >>> list(permutations(items)) >>> >>> It's neither QUITE equality nor identity you are looking for, I think, >>> in nonredundant_permutation(): >>> >>> >> "aa" == "AA".lower(), "aa" is "AA".lower() >>> (True, False) >>> >>> "aa" == "a"+"a", "aa" is "a"+"a" >>> (True, True) >>> >>> D(3) == 3.0, D(3) is 3.0 >>> (True, False) >>> >>> On Fri, Oct 11, 2013 at 11:38 AM, Neil Girdhar wrote: >>> >>>> "It is universally agreed that a list of n distinct symbols has n! >>>> permutations. However, when the symbols are not distinct, the most common >>>> convention, in mathematics and elsewhere, seems to be to count only >>>> distinct permutations." ? >>>> http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original >>>> . >>>> >>>> >>>> Should we consider fixing itertools.permutations and to output only >>>> unique permutations (if possible, although I realize that would break >>>> code). It is completely non-obvious to have permutations returning >>>> duplicates. For a non-breaking compromise what about adding a flag? >>>> >>>> Best, >>>> Neil >>>> >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> >>>> >>> >>> >>> -- >>> Keeping medicines from the bloodstreams of the sick; food >>> from the bellies of the hungry; books from the hands of the >>> uneducated; technology from the underdeveloped; and putting >>> advocates of freedom in prisons. Intellectual property is >>> to the 21st century what the slave trade was to the 16th. >>> >>> >>> >>> -- >>> Keeping medicines from the bloodstreams of the sick; food >>> from the bellies of the hungry; books from the hands of the >>> uneducated; technology from the underdeveloped; and putting >>> advocates of freedom in prisons. Intellectual property is >>> to the 21st century what the slave trade was to the 16th. >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> >>> >> >> >> -- >> Keeping medicines from the bloodstreams of the sick; food >> from the bellies of the hungry; books from the hands of the >> uneducated; technology from the underdeveloped; and putting >> advocates of freedom in prisons. Intellectual property is >> to the 21st century what the slave trade was to the 16th. >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> -- >> >> --- >> You received this message because you are subscribed to a topic in the >> Google Groups "python-ideas" group. >> To unsubscribe from this topic, visit >> https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. >> To unsubscribe from this group and all its topics, send an email to >> python-ideas+unsubscribe at googlegroups.com. >> For more options, visit https://groups.google.com/groups/opt_out. 
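Going back to the unique() helper at the top of this exchange: it is essentially the unique_everseen recipe from the itertools documentation, and it is the lazy way to filter duplicates out of permutations() when the elements are hashable. The catch the thread keeps circling around is the memory cost: the seen set can grow as large as the number of distinct results, which in the worst case approaches n!. A keyed sketch, assuming nothing beyond the standard library:

from itertools import permutations

def unique_everseen(iterable, key=None):
    # Lazily yield items, skipping any whose key has already been seen.
    # 'seen' grows with the number of *distinct* items yielded, which for
    # permutations of mostly-distinct elements can still approach n!.
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element

Used as ["".join(p) for p in unique_everseen(permutations('aaabb', 3))] it should give the seven distinct triples ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba'] in first-seen order, without building the full 60-element list first.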
>> >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Fri Oct 11 23:51:06 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 11 Oct 2013 17:51:06 -0400 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> Message-ID: Unfortunately, that doesn't quite work? list("".join(x) for x in it.permutations('aaabb', 3)) ['aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', 'abb', 'aba', 'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', 'abb', 'aba', 'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', 'abb', 'aba', 'aba', 'abb', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba'] On Fri, Oct 11, 2013 at 5:48 PM, David Mertz wrote: > OK, you're right. Just using set() has bad worst case memory costs. I > was thinking of the case where there actually WERE lots of equalities, and > hence the resulting list would be much smaller than N!. But of course > that's not general. It takes more than one line, but here's an incremental > version: > > def nonredundant_permutations(seq): > seq = sorted(seq) > last = None > for perm in permutations(seq): > if perm != last: > yield perm > last = perm > > > On Fri, Oct 11, 2013 at 2:35 PM, Neil Girdhar wrote: > >> > Moreover, I don't think the runtime behavior of my one-liner is >> particularly costly? >> >> It is *extremely* costly. There can be n! permutations, so for even, >> say, 12 elements, you are looking at many gigabytes of memory needlessly >> used. One big motivator for itertools is not to have to do this. I'm >> curious how you would solve this problem: >> https://www.kattis.com/problems/industrialspy efficiently in Python. I >> did it by using a unique-ifying generator, but ideally this would not be >> necessary. Ideally, Python would do exactly what C++ does with >> next_permutation. >> >> Best, >> >> Neil >> >> >> On Fri, Oct 11, 2013 at 4:27 PM, David Mertz wrote: >> >>> Andrew & Neil (or whoever): >>> >>> Is this *really* what you want: >>> >>> >>> from itertools import permutations >>> >>> def nonredundant_permutations(seq): >>> ... return list(set(permutations(seq))) >>> ... >>> >>> pprint(list(permutations([F(3,1), D(3.0), 3.0]))) >>> [(Fraction(3, 1), Decimal('3'), 3.0), >>> (Fraction(3, 1), 3.0, Decimal('3')), >>> (Decimal('3'), Fraction(3, 1), 3.0), >>> (Decimal('3'), 3.0, Fraction(3, 1)), >>> (3.0, Fraction(3, 1), Decimal('3')), >>> (3.0, Decimal('3'), Fraction(3, 1))] >>> >>> >>> pprint(list(nonredundant_permutations([F(3,1), D(3.0), 3.0]))) >>> [(Fraction(3, 1), Decimal('3'), 3.0)] >>> >>> It seems odd to me to want that. On the other hand, I provide a >>> one-line implementation of the desired behavior if anyone wants it. >>> Moreover, I don't think the runtime behavior of my one-liner is >>> particularly costly... 
maybe not the best possible, but the best big-O >>> possible. >>> >>> >>> >>> On Fri, Oct 11, 2013 at 1:19 PM, Andrew Barnert wrote: >>> >>>> I think equality is perfectly reasonable here. The fact that {3.0, 3} >>>> only has one member seems like the obvious precedent to follow here. >>>> >>>> Sent from a random iPhone >>>> >>>> On Oct 11, 2013, at 13:02, David Mertz wrote: >>>> >>>> What would you like this hypothetical function to output here: >>>> >>>> >>> from itertools import permutations >>>> >>> from decimal import Decimal as D >>>> >>> from fractions import Fraction as F >>>> >>> items = (3, 3.0, D(3), F(3,1), "aa", "AA".lower(), "a"+"a") >>>> >>> list(permutations(items)) >>>> >>>> It's neither QUITE equality nor identity you are looking for, I think, >>>> in nonredundant_permutation(): >>>> >>>> >> "aa" == "AA".lower(), "aa" is "AA".lower() >>>> (True, False) >>>> >>> "aa" == "a"+"a", "aa" is "a"+"a" >>>> (True, True) >>>> >>> D(3) == 3.0, D(3) is 3.0 >>>> (True, False) >>>> >>>> On Fri, Oct 11, 2013 at 11:38 AM, Neil Girdhar wrote: >>>> >>>>> "It is universally agreed that a list of n distinct symbols has n! >>>>> permutations. However, when the symbols are not distinct, the most common >>>>> convention, in mathematics and elsewhere, seems to be to count only >>>>> distinct permutations." ? >>>>> http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original >>>>> . >>>>> >>>>> >>>>> Should we consider fixing itertools.permutations and to output only >>>>> unique permutations (if possible, although I realize that would break >>>>> code). It is completely non-obvious to have permutations returning >>>>> duplicates. For a non-breaking compromise what about adding a flag? >>>>> >>>>> Best, >>>>> Neil >>>>> >>>>> _______________________________________________ >>>>> Python-ideas mailing list >>>>> Python-ideas at python.org >>>>> https://mail.python.org/mailman/listinfo/python-ideas >>>>> >>>>> >>>> >>>> >>>> -- >>>> Keeping medicines from the bloodstreams of the sick; food >>>> from the bellies of the hungry; books from the hands of the >>>> uneducated; technology from the underdeveloped; and putting >>>> advocates of freedom in prisons. Intellectual property is >>>> to the 21st century what the slave trade was to the 16th. >>>> >>>> >>>> >>>> -- >>>> Keeping medicines from the bloodstreams of the sick; food >>>> from the bellies of the hungry; books from the hands of the >>>> uneducated; technology from the underdeveloped; and putting >>>> advocates of freedom in prisons. Intellectual property is >>>> to the 21st century what the slave trade was to the 16th. >>>> >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> >>>> >>> >>> >>> -- >>> Keeping medicines from the bloodstreams of the sick; food >>> from the bellies of the hungry; books from the hands of the >>> uneducated; technology from the underdeveloped; and putting >>> advocates of freedom in prisons. Intellectual property is >>> to the 21st century what the slave trade was to the 16th. >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> >>> -- >>> >>> --- >>> You received this message because you are subscribed to a topic in the >>> Google Groups "python-ideas" group. 
>>> To unsubscribe from this topic, visit >>> https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. >>> To unsubscribe from this group and all its topics, send an email to >>> python-ideas+unsubscribe at googlegroups.com. >>> For more options, visit https://groups.google.com/groups/opt_out. >>> >>> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sat Oct 12 00:03:48 2013 From: mertz at gnosis.cx (David Mertz) Date: Fri, 11 Oct 2013 15:03:48 -0700 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> Message-ID: Bummer. You are right, Neil. I saw MRAB's suggestion about sorting, and falsely thought that would be general; but obviously it's not. So I guess the question is whether there is ANY way to do this without having to accumulate a 'seen' set (which can grow to size N!). The answer isn't jumping out at me, but that doesn't mean there's not a way. I don't want itertools.permutations() to do "equality filtering", but assuming some other function in itertools were to do that, how could it do so algorithmically? Or whatever, same question if it is itertools.permutations(seq, distinct=True) as the API. On Fri, Oct 11, 2013 at 2:51 PM, Neil Girdhar wrote: > Unfortunately, that doesn't quite work? > > list("".join(x) for x in it.permutations('aaabb', 3)) > ['aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', 'abb', 'aba', > 'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', > 'abb', 'aba', 'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', > 'aba', 'aba', 'abb', 'aba', 'aba', 'abb', 'baa', 'baa', 'bab', 'baa', > 'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba', 'baa', 'baa', > 'bab', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba'] > > > On Fri, Oct 11, 2013 at 5:48 PM, David Mertz wrote: > >> OK, you're right. Just using set() has bad worst case memory costs. I >> was thinking of the case where there actually WERE lots of equalities, and >> hence the resulting list would be much smaller than N!. But of course >> that's not general. It takes more than one line, but here's an incremental >> version: >> >> def nonredundant_permutations(seq): >> seq = sorted(seq) >> last = None >> for perm in permutations(seq): >> if perm != last: >> yield perm >> last = perm >> >> >> On Fri, Oct 11, 2013 at 2:35 PM, Neil Girdhar wrote: >> >>> > Moreover, I don't think the runtime behavior of my one-liner is >>> particularly costly? >>> >>> It is *extremely* costly. There can be n! permutations, so for even, >>> say, 12 elements, you are looking at many gigabytes of memory needlessly >>> used. One big motivator for itertools is not to have to do this. I'm >>> curious how you would solve this problem: >>> https://www.kattis.com/problems/industrialspy efficiently in Python. >>> I did it by using a unique-ifying generator, but ideally this would not be >>> necessary. 
Ideally, Python would do exactly what C++ does with >>> next_permutation. >>> >>> Best, >>> >>> Neil >>> >>> >>> On Fri, Oct 11, 2013 at 4:27 PM, David Mertz wrote: >>> >>>> Andrew & Neil (or whoever): >>>> >>>> Is this *really* what you want: >>>> >>>> >>> from itertools import permutations >>>> >>> def nonredundant_permutations(seq): >>>> ... return list(set(permutations(seq))) >>>> ... >>>> >>> pprint(list(permutations([F(3,1), D(3.0), 3.0]))) >>>> [(Fraction(3, 1), Decimal('3'), 3.0), >>>> (Fraction(3, 1), 3.0, Decimal('3')), >>>> (Decimal('3'), Fraction(3, 1), 3.0), >>>> (Decimal('3'), 3.0, Fraction(3, 1)), >>>> (3.0, Fraction(3, 1), Decimal('3')), >>>> (3.0, Decimal('3'), Fraction(3, 1))] >>>> >>>> >>> pprint(list(nonredundant_permutations([F(3,1), D(3.0), 3.0]))) >>>> [(Fraction(3, 1), Decimal('3'), 3.0)] >>>> >>>> It seems odd to me to want that. On the other hand, I provide a >>>> one-line implementation of the desired behavior if anyone wants it. >>>> Moreover, I don't think the runtime behavior of my one-liner is >>>> particularly costly... maybe not the best possible, but the best big-O >>>> possible. >>>> >>>> >>>> >>>> On Fri, Oct 11, 2013 at 1:19 PM, Andrew Barnert wrote: >>>> >>>>> I think equality is perfectly reasonable here. The fact that {3.0, 3} >>>>> only has one member seems like the obvious precedent to follow here. >>>>> >>>>> Sent from a random iPhone >>>>> >>>>> On Oct 11, 2013, at 13:02, David Mertz wrote: >>>>> >>>>> What would you like this hypothetical function to output here: >>>>> >>>>> >>> from itertools import permutations >>>>> >>> from decimal import Decimal as D >>>>> >>> from fractions import Fraction as F >>>>> >>> items = (3, 3.0, D(3), F(3,1), "aa", "AA".lower(), "a"+"a") >>>>> >>> list(permutations(items)) >>>>> >>>>> It's neither QUITE equality nor identity you are looking for, I think, >>>>> in nonredundant_permutation(): >>>>> >>>>> >> "aa" == "AA".lower(), "aa" is "AA".lower() >>>>> (True, False) >>>>> >>> "aa" == "a"+"a", "aa" is "a"+"a" >>>>> (True, True) >>>>> >>> D(3) == 3.0, D(3) is 3.0 >>>>> (True, False) >>>>> >>>>> On Fri, Oct 11, 2013 at 11:38 AM, Neil Girdhar wrote: >>>>> >>>>>> "It is universally agreed that a list of n distinct symbols has n! >>>>>> permutations. However, when the symbols are not distinct, the most common >>>>>> convention, in mathematics and elsewhere, seems to be to count only >>>>>> distinct permutations." ? >>>>>> http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original >>>>>> . >>>>>> >>>>>> >>>>>> Should we consider fixing itertools.permutations and to output only >>>>>> unique permutations (if possible, although I realize that would break >>>>>> code). It is completely non-obvious to have permutations returning >>>>>> duplicates. For a non-breaking compromise what about adding a flag? >>>>>> >>>>>> Best, >>>>>> Neil >>>>>> >>>>>> _______________________________________________ >>>>>> Python-ideas mailing list >>>>>> Python-ideas at python.org >>>>>> https://mail.python.org/mailman/listinfo/python-ideas >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Keeping medicines from the bloodstreams of the sick; food >>>>> from the bellies of the hungry; books from the hands of the >>>>> uneducated; technology from the underdeveloped; and putting >>>>> advocates of freedom in prisons. Intellectual property is >>>>> to the 21st century what the slave trade was to the 16th. 
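For reference, the std::next_permutation behaviour Neil keeps pointing to (stepping a sorted, mutually comparable sequence through each distinct arrangement in lexicographic order, using O(1) extra space) is easy to sketch in pure Python. This is the classic in-place algorithm, not anything that exists in the stdlib:

def next_permutation(seq):
    # Rearrange the list 'seq' in place into the next permutation in
    # lexicographic order.  Returns False if 'seq' was already the last
    # (descending) permutation; unlike the C++ version, this sketch then
    # leaves 'seq' untouched rather than resetting it to sorted order.
    # Because the comparisons are strict, repeated elements are handled
    # the way C++ handles them: each *distinct* arrangement comes up once.
    i = len(seq) - 2
    while i >= 0 and not (seq[i] < seq[i + 1]):
        i -= 1
    if i < 0:
        return False
    j = len(seq) - 1
    while not (seq[i] < seq[j]):
        j -= 1
    seq[i], seq[j] = seq[j], seq[i]
    seq[i + 1:] = reversed(seq[i + 1:])
    return True

Starting from sorted input and looping until it returns False visits each distinct permutation exactly once; for example, items = sorted('aab') steps through 'aab', 'aba', 'baa'.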
>>>>> >>>>> >>>>> >>>>> -- >>>>> Keeping medicines from the bloodstreams of the sick; food >>>>> from the bellies of the hungry; books from the hands of the >>>>> uneducated; technology from the underdeveloped; and putting >>>>> advocates of freedom in prisons. Intellectual property is >>>>> to the 21st century what the slave trade was to the 16th. >>>>> >>>>> _______________________________________________ >>>>> Python-ideas mailing list >>>>> Python-ideas at python.org >>>>> https://mail.python.org/mailman/listinfo/python-ideas >>>>> >>>>> >>>> >>>> >>>> -- >>>> Keeping medicines from the bloodstreams of the sick; food >>>> from the bellies of the hungry; books from the hands of the >>>> uneducated; technology from the underdeveloped; and putting >>>> advocates of freedom in prisons. Intellectual property is >>>> to the 21st century what the slave trade was to the 16th. >>>> >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> >>>> -- >>>> >>>> --- >>>> You received this message because you are subscribed to a topic in the >>>> Google Groups "python-ideas" group. >>>> To unsubscribe from this topic, visit >>>> https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. >>>> To unsubscribe from this group and all its topics, send an email to >>>> python-ideas+unsubscribe at googlegroups.com. >>>> For more options, visit https://groups.google.com/groups/opt_out. >>>> >>>> >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> >>> >> >> >> -- >> Keeping medicines from the bloodstreams of the sick; food >> from the bellies of the hungry; books from the hands of the >> uneducated; technology from the underdeveloped; and putting >> advocates of freedom in prisons. Intellectual property is >> to the 21st century what the slave trade was to the 16th. >> > > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Sat Oct 12 00:19:34 2013 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 11 Oct 2013 23:19:34 +0100 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> Message-ID: <52587976.1000901@mrabarnett.plus.com> On 11/10/2013 23:03, David Mertz wrote: > Bummer. You are right, Neil. I saw MRAB's suggestion about sorting, > and falsely thought that would be general; but obviously it's not. > > So I guess the question is whether there is ANY way to do this without > having to accumulate a 'seen' set (which can grow to size N!). The > answer isn't jumping out at me, but that doesn't mean there's not a way. > > I don't want itertools.permutations() to do "equality filtering", but > assuming some other function in itertools were to do that, how could it > do so algorithmically? Or whatever, same question if it is > itertools.permutations(seq, distinct=True) as the API. 
> Here's an implementation: def unique_permutations(iterable, count=None, key=None): def perm(items, count): if count: prev_item = object() for i, item in enumerate(items): if item != prev_item: for p in perm(items[ : i] + items[i + 1 : ], count - 1): yield [item] + p prev_item = item else: yield [] if key is None: key = lambda item: item items = sorted(iterable, key=key) if count is None: count = len(items) yield from perm(items, count) And some results: >>> print(list("".join(x) for x in unique_permutations('aaabb', 3))) ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba'] >>> print(list(unique_permutations([0, 'a', 0], key=str))) [[0, 0, 'a'], [0, 'a', 0], ['a', 0, 0]] From mistersheik at gmail.com Sat Oct 12 00:23:36 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 11 Oct 2013 18:23:36 -0400 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: <52587976.1000901@mrabarnett.plus.com> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> Message-ID: Beautiful!! On Fri, Oct 11, 2013 at 6:19 PM, MRAB wrote: > On 11/10/2013 23:03, David Mertz wrote: > >> Bummer. You are right, Neil. I saw MRAB's suggestion about sorting, >> and falsely thought that would be general; but obviously it's not. >> >> So I guess the question is whether there is ANY way to do this without >> having to accumulate a 'seen' set (which can grow to size N!). The >> answer isn't jumping out at me, but that doesn't mean there's not a way. >> >> I don't want itertools.permutations() to do "equality filtering", but >> assuming some other function in itertools were to do that, how could it >> do so algorithmically? Or whatever, same question if it is >> itertools.permutations(seq, distinct=True) as the API. >> >> Here's an implementation: > > def unique_permutations(iterable, count=None, key=None): > def perm(items, count): > if count: > prev_item = object() > > for i, item in enumerate(items): > if item != prev_item: > for p in perm(items[ : i] + items[i + 1 : ], count - > 1): > yield [item] + p > > prev_item = item > > else: > yield [] > > if key is None: > key = lambda item: item > > items = sorted(iterable, key=key) > > if count is None: > count = len(items) > > yield from perm(items, count) > > > And some results: > > >>> print(list("".join(x) for x in unique_permutations('aaabb', 3))) > ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba'] > >>> print(list(unique_**permutations([0, 'a', 0], key=str))) > [[0, 0, 'a'], [0, 'a', 0], ['a', 0, 0]] > > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/**mailman/listinfo/python-ideas > > -- > > --- You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit https://groups.google.com/d/** > topic/python-ideas/**dDttJfkyu2k/unsubscribe > . > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe@**googlegroups.com > . > For more options, visit https://groups.google.com/**groups/opt_out > . > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mertz at gnosis.cx Sat Oct 12 00:45:04 2013 From: mertz at gnosis.cx (David Mertz) Date: Fri, 11 Oct 2013 15:45:04 -0700 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> Message-ID: I realize after reading http://stackoverflow.com/questions/6284396/permutations-with-unique-valuesthat my version was ALMOST right: def nonredundant_permutations(seq, r=None): last = () for perm in permutations(sorted(seq), r): if perm > last: yield perm last = perm I can't look only for inequality, but must use the actual comparison. >>> ["".join(x) for x in nonredundant_permutations('aaabb',3)] ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba'] >>> list(nonredundant_permutations([F(3,1), D(3.0), 3.0])) [(Fraction(3, 1), Decimal('3'), 3.0)] Of course, this approach DOES rely on the order in which itertools.permutations() returns values. However, it's a bit more compact than MRAB's version. On Fri, Oct 11, 2013 at 3:23 PM, Neil Girdhar wrote: > Beautiful!! > > > On Fri, Oct 11, 2013 at 6:19 PM, MRAB wrote: > >> On 11/10/2013 23:03, David Mertz wrote: >> >>> Bummer. You are right, Neil. I saw MRAB's suggestion about sorting, >>> and falsely thought that would be general; but obviously it's not. >>> >>> So I guess the question is whether there is ANY way to do this without >>> having to accumulate a 'seen' set (which can grow to size N!). The >>> answer isn't jumping out at me, but that doesn't mean there's not a way. >>> >>> I don't want itertools.permutations() to do "equality filtering", but >>> assuming some other function in itertools were to do that, how could it >>> do so algorithmically? Or whatever, same question if it is >>> itertools.permutations(seq, distinct=True) as the API. >>> >>> Here's an implementation: >> >> def unique_permutations(iterable, count=None, key=None): >> def perm(items, count): >> if count: >> prev_item = object() >> >> for i, item in enumerate(items): >> if item != prev_item: >> for p in perm(items[ : i] + items[i + 1 : ], count - >> 1): >> yield [item] + p >> >> prev_item = item >> >> else: >> yield [] >> >> if key is None: >> key = lambda item: item >> >> items = sorted(iterable, key=key) >> >> if count is None: >> count = len(items) >> >> yield from perm(items, count) >> >> >> And some results: >> >> >>> print(list("".join(x) for x in unique_permutations('aaabb', 3))) >> ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba'] >> >>> print(list(unique_**permutations([0, 'a', 0], key=str))) >> [[0, 0, 'a'], [0, 'a', 0], ['a', 0, 0]] >> >> >> ______________________________**_________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/**mailman/listinfo/python-ideas >> >> -- >> >> --- You received this message because you are subscribed to a topic in >> the Google Groups "python-ideas" group. >> To unsubscribe from this topic, visit https://groups.google.com/d/** >> topic/python-ideas/**dDttJfkyu2k/unsubscribe >> . >> To unsubscribe from this group and all its topics, send an email to >> python-ideas+unsubscribe@**googlegroups.com >> . >> For more options, visit https://groups.google.com/**groups/opt_out >> . 
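A note on why the strict '>' comparison succeeds where the earlier '!=' check did not: with r < len(seq), equal r-tuples are not adjacent in the output (Neil's 'aaabb' listing shows 'aaa' turning up in six separate places), so comparing against only the previous tuple misses duplicates. Keeping only tuples strictly greater than the last one kept instead relies on the input being sorted, so that each distinct tuple's first appearance comes after every smaller tuple has already shown up; that is the property the results above depend on. A quick sanity check of that behaviour, with the filter pulled out on its own (keep_records is just a helper name for this check):

from itertools import permutations

def keep_records(perms):
    # The filter from the generator above: yield a permutation only if it
    # is strictly greater than the last one yielded.
    last = ()
    for p in perms:
        if p > last:
            yield p
            last = p

filtered = list(keep_records(permutations(sorted('aaabb'), 3)))
assert set(filtered) == set(permutations('aaabb', 3))   # nothing missed
assert len(filtered) == len(set(filtered))              # nothing repeated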
>> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Oct 12 01:49:55 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 12 Oct 2013 09:49:55 +1000 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> Message-ID: On 12 Oct 2013 08:45, "David Mertz" wrote: > > > I realize after reading http://stackoverflow.com/questions/6284396/permutations-with-unique-valuesthat my version was ALMOST right: > > def nonredundant_permutations(seq, r=None): > last = () > for perm in permutations(sorted(seq), r): > if perm > last: > yield perm > last = perm > > I can't look only for inequality, but must use the actual comparison. > > >>> ["".join(x) for x in nonredundant_permutations('aaabb',3)] > ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba'] > >>> list(nonredundant_permutations([F(3,1), D(3.0), 3.0])) > [(Fraction(3, 1), Decimal('3'), 3.0)] > > Of course, this approach DOES rely on the order in which itertools.permutations() returns values. However, it's a bit more compact than MRAB's version. As there is no requirement that entries in a sequence handled by itertools.permutations be sortable, so the original question of why this isn't done by default has been answered (the general solution risks consuming too much memory, while the memory efficient solution constrains the domain to only sortable sequences). Cheers, Nick. > > > > > On Fri, Oct 11, 2013 at 3:23 PM, Neil Girdhar wrote: >> >> Beautiful!! >> >> >> On Fri, Oct 11, 2013 at 6:19 PM, MRAB wrote: >>> >>> On 11/10/2013 23:03, David Mertz wrote: >>>> >>>> Bummer. You are right, Neil. I saw MRAB's suggestion about sorting, >>>> and falsely thought that would be general; but obviously it's not. >>>> >>>> So I guess the question is whether there is ANY way to do this without >>>> having to accumulate a 'seen' set (which can grow to size N!). The >>>> answer isn't jumping out at me, but that doesn't mean there's not a way. >>>> >>>> I don't want itertools.permutations() to do "equality filtering", but >>>> assuming some other function in itertools were to do that, how could it >>>> do so algorithmically? Or whatever, same question if it is >>>> itertools.permutations(seq, distinct=True) as the API. 
>>>> >>> Here's an implementation: >>> >>> def unique_permutations(iterable, count=None, key=None): >>> def perm(items, count): >>> if count: >>> prev_item = object() >>> >>> for i, item in enumerate(items): >>> if item != prev_item: >>> for p in perm(items[ : i] + items[i + 1 : ], count - 1): >>> yield [item] + p >>> >>> prev_item = item >>> >>> else: >>> yield [] >>> >>> if key is None: >>> key = lambda item: item >>> >>> items = sorted(iterable, key=key) >>> >>> if count is None: >>> count = len(items) >>> >>> yield from perm(items, count) >>> >>> >>> And some results: >>> >>> >>> print(list("".join(x) for x in unique_permutations('aaabb', 3))) >>> ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba'] >>> >>> print(list(unique_permutations([0, 'a', 0], key=str))) >>> [[0, 0, 'a'], [0, 'a', 0], ['a', 0, 0]] >>> >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> >>> -- >>> >>> --- You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group. >>> To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. >>> To unsubscribe from this group and all its topics, send an email to python-ideas+unsubscribe at googlegroups.com. >>> For more options, visit https://groups.google.com/groups/opt_out. >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Sat Oct 12 01:53:31 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 11 Oct 2013 19:53:31 -0400 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> Message-ID: Yes, that's all true. I want to suggest that the efficient unique permutations solution is very important to have. Sortable sequences are very common. There are itertools routines that only work with =-comparable elements (e.g. groupby), so it's not a stretch to have a permutations that is restricted to <-comparable elements. Best, Neil On Fri, Oct 11, 2013 at 7:49 PM, Nick Coghlan wrote: > > On 12 Oct 2013 08:45, "David Mertz" wrote: > > > > > > I realize after reading > http://stackoverflow.com/questions/6284396/permutations-with-unique-valuesthat my version was ALMOST right: > > > > def nonredundant_permutations(seq, r=None): > > last = () > > for perm in permutations(sorted(seq), r): > > if perm > last: > > yield perm > > last = perm > > > > I can't look only for inequality, but must use the actual comparison. 
> > > > >>> ["".join(x) for x in nonredundant_permutations('aaabb',3)] > > ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba'] > > >>> list(nonredundant_permutations([F(3,1), D(3.0), 3.0])) > > [(Fraction(3, 1), Decimal('3'), 3.0)] > > > > Of course, this approach DOES rely on the order in which > itertools.permutations() returns values. However, it's a bit more compact > than MRAB's version. > > As there is no requirement that entries in a sequence handled by > itertools.permutations be sortable, so the original question of why this > isn't done by default has been answered (the general solution risks > consuming too much memory, while the memory efficient solution constrains > the domain to only sortable sequences). > > Cheers, > Nick. > > > > > > > > > > > On Fri, Oct 11, 2013 at 3:23 PM, Neil Girdhar > wrote: > >> > >> Beautiful!! > >> > >> > >> On Fri, Oct 11, 2013 at 6:19 PM, MRAB > wrote: > >>> > >>> On 11/10/2013 23:03, David Mertz wrote: > >>>> > >>>> Bummer. You are right, Neil. I saw MRAB's suggestion about sorting, > >>>> and falsely thought that would be general; but obviously it's not. > >>>> > >>>> So I guess the question is whether there is ANY way to do this without > >>>> having to accumulate a 'seen' set (which can grow to size N!). The > >>>> answer isn't jumping out at me, but that doesn't mean there's not a > way. > >>>> > >>>> I don't want itertools.permutations() to do "equality filtering", but > >>>> assuming some other function in itertools were to do that, how could > it > >>>> do so algorithmically? Or whatever, same question if it is > >>>> itertools.permutations(seq, distinct=True) as the API. > >>>> > >>> Here's an implementation: > >>> > >>> def unique_permutations(iterable, count=None, key=None): > >>> def perm(items, count): > >>> if count: > >>> prev_item = object() > >>> > >>> for i, item in enumerate(items): > >>> if item != prev_item: > >>> for p in perm(items[ : i] + items[i + 1 : ], count > - 1): > >>> yield [item] + p > >>> > >>> prev_item = item > >>> > >>> else: > >>> yield [] > >>> > >>> if key is None: > >>> key = lambda item: item > >>> > >>> items = sorted(iterable, key=key) > >>> > >>> if count is None: > >>> count = len(items) > >>> > >>> yield from perm(items, count) > >>> > >>> > >>> And some results: > >>> > >>> >>> print(list("".join(x) for x in unique_permutations('aaabb', 3))) > >>> ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba'] > >>> >>> print(list(unique_permutations([0, 'a', 0], key=str))) > >>> [[0, 0, 'a'], [0, 'a', 0], ['a', 0, 0]] > >>> > >>> > >>> _______________________________________________ > >>> Python-ideas mailing list > >>> Python-ideas at python.org > >>> https://mail.python.org/mailman/listinfo/python-ideas > >>> > >>> -- > >>> > >>> --- You received this message because you are subscribed to a topic in > the Google Groups "python-ideas" group. > >>> To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. > >>> To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > >>> For more options, visit https://groups.google.com/groups/opt_out. 
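One way to put numbers on why this matters for problems like the industrial-spy one: the number of raw permutations can dwarf the number of distinct ones, and any filter layered on top of itertools.permutations() still has to walk every raw tuple, while a recursive generator along the lines of MRAB's only branches into distinct choices at each position. A small illustration, pure standard library:

from itertools import permutations

raw = sum(1 for _ in permutations('aaaaaab'))   # walks all 7! = 5040 tuples
distinct = len(set(permutations('aaaaaab')))    # only 7 of them differ
print(raw, distinct)                            # 5040 7

With more repetition the gap grows factorially, which is why the thread keeps coming back to generating distinct permutations directly instead of filtering them after the fact.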
> >> > >> > >> > >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > >> https://mail.python.org/mailman/listinfo/python-ideas > >> > > > > > > > > -- > > Keeping medicines from the bloodstreams of the sick; food > > from the bellies of the hungry; books from the hands of the > > uneducated; technology from the underdeveloped; and putting > > advocates of freedom in prisons. Intellectual property is > > to the 21st century what the slave trade was to the 16th. > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Oct 12 02:20:17 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 11 Oct 2013 17:20:17 -0700 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> Message-ID: I think this is worth having even for 3.3 and 2.x, so I'd suggest sending a patch to more-itertools (https://github.com/erikrose/more-itertools) as well as here. Sent from a random iPhone On Oct 11, 2013, at 16:53, Neil Girdhar wrote: > Yes, that's all true. I want to suggest that the efficient unique permutations solution is very important to have. Sortable sequences are very common. There are itertools routines that only work with =-comparable elements (e.g. groupby), so it's not a stretch to have a permutations that is restricted to <-comparable elements. > > Best, > Neil > > > On Fri, Oct 11, 2013 at 7:49 PM, Nick Coghlan wrote: >> >> On 12 Oct 2013 08:45, "David Mertz" wrote: >> > >> > >> > I realize after reading http://stackoverflow.com/questions/6284396/permutations-with-unique-values that my version was ALMOST right: >> > >> > def nonredundant_permutations(seq, r=None): >> > last = () >> > for perm in permutations(sorted(seq), r): >> > if perm > last: >> > yield perm >> > last = perm >> > >> > I can't look only for inequality, but must use the actual comparison. >> > >> > >>> ["".join(x) for x in nonredundant_permutations('aaabb',3)] >> > ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba'] >> > >>> list(nonredundant_permutations([F(3,1), D(3.0), 3.0])) >> > [(Fraction(3, 1), Decimal('3'), 3.0)] >> > >> > Of course, this approach DOES rely on the order in which itertools.permutations() returns values. However, it's a bit more compact than MRAB's version. >> >> As there is no requirement that entries in a sequence handled by itertools.permutations be sortable, so the original question of why this isn't done by default has been answered (the general solution risks consuming too much memory, while the memory efficient solution constrains the domain to only sortable sequences). >> >> Cheers, >> Nick. >> >> > >> > >> > >> > >> > On Fri, Oct 11, 2013 at 3:23 PM, Neil Girdhar wrote: >> >> >> >> Beautiful!! >> >> >> >> >> >> On Fri, Oct 11, 2013 at 6:19 PM, MRAB wrote: >> >>> >> >>> On 11/10/2013 23:03, David Mertz wrote: >> >>>> >> >>>> Bummer. You are right, Neil. I saw MRAB's suggestion about sorting, >> >>>> and falsely thought that would be general; but obviously it's not. >> >>>> >> >>>> So I guess the question is whether there is ANY way to do this without >> >>>> having to accumulate a 'seen' set (which can grow to size N!). 
The >> >>>> answer isn't jumping out at me, but that doesn't mean there's not a way. >> >>>> >> >>>> I don't want itertools.permutations() to do "equality filtering", but >> >>>> assuming some other function in itertools were to do that, how could it >> >>>> do so algorithmically? Or whatever, same question if it is >> >>>> itertools.permutations(seq, distinct=True) as the API. >> >>>> >> >>> Here's an implementation: >> >>> >> >>> def unique_permutations(iterable, count=None, key=None): >> >>> def perm(items, count): >> >>> if count: >> >>> prev_item = object() >> >>> >> >>> for i, item in enumerate(items): >> >>> if item != prev_item: >> >>> for p in perm(items[ : i] + items[i + 1 : ], count - 1): >> >>> yield [item] + p >> >>> >> >>> prev_item = item >> >>> >> >>> else: >> >>> yield [] >> >>> >> >>> if key is None: >> >>> key = lambda item: item >> >>> >> >>> items = sorted(iterable, key=key) >> >>> >> >>> if count is None: >> >>> count = len(items) >> >>> >> >>> yield from perm(items, count) >> >>> >> >>> >> >>> And some results: >> >>> >> >>> >>> print(list("".join(x) for x in unique_permutations('aaabb', 3))) >> >>> ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba'] >> >>> >>> print(list(unique_permutations([0, 'a', 0], key=str))) >> >>> [[0, 0, 'a'], [0, 'a', 0], ['a', 0, 0]] >> >>> >> >>> >> >>> _______________________________________________ >> >>> Python-ideas mailing list >> >>> Python-ideas at python.org >> >>> https://mail.python.org/mailman/listinfo/python-ideas >> >>> >> >>> -- >> >>> >> >>> --- You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group. >> >>> To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. >> >>> To unsubscribe from this group and all its topics, send an email to python-ideas+unsubscribe at googlegroups.com. >> >>> For more options, visit https://groups.google.com/groups/opt_out. >> >> >> >> >> >> >> >> _______________________________________________ >> >> Python-ideas mailing list >> >> Python-ideas at python.org >> >> https://mail.python.org/mailman/listinfo/python-ideas >> >> >> > >> > >> > >> > -- >> > Keeping medicines from the bloodstreams of the sick; food >> > from the bellies of the hungry; books from the hands of the >> > uneducated; technology from the underdeveloped; and putting >> > advocates of freedom in prisons. Intellectual property is >> > to the 21st century what the slave trade was to the 16th. >> > >> > _______________________________________________ >> > Python-ideas mailing list >> > Python-ideas at python.org >> > https://mail.python.org/mailman/listinfo/python-ideas >> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From python at mrabarnett.plus.com Sat Oct 12 03:55:23 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 12 Oct 2013 02:55:23 +0100 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> Message-ID: <5258AC0B.1090603@mrabarnett.plus.com> On 12/10/2013 00:49, Nick Coghlan wrote: > > On 12 Oct 2013 08:45, "David Mertz" > wrote: > > > > > > I realize after reading > http://stackoverflow.com/questions/6284396/permutations-with-unique-values > that my version was ALMOST right: > > > > def nonredundant_permutations(seq, r=None): > > last = () > > for perm in permutations(sorted(seq), r): > > if perm > last: > > yield perm > > last = perm > > > > I can't look only for inequality, but must use the actual comparison. > > > > >>> ["".join(x) for x in nonredundant_permutations('aaabb',3)] > > ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba'] > > >>> list(nonredundant_permutations([F(3,1), D(3.0), 3.0])) > > [(Fraction(3, 1), Decimal('3'), 3.0)] > > > > Of course, this approach DOES rely on the order in which > itertools.permutations() returns values. However, it's a bit more > compact than MRAB's version. > > As there is no requirement that entries in a sequence handled by > itertools.permutations be sortable, so the original question of why this > isn't done by default has been answered (the general solution risks > consuming too much memory, while the memory efficient solution > constrains the domain to only sortable sequences). > OK, here's a new implementation: def unique_permutations(iterable, count=None): def perm(items, count): if count: prev_item = object() for i, item in enumerate(items): if item != prev_item: for p in perm(items[ : i] + items[i + 1 : ], count - 1): yield [item] + p prev_item = item else: yield [] items = list(iterable) keys = {} for item in items: keys.setdefault(item, len(keys)) items.sort(key=keys.get) if count is None: count = len(items) yield from perm(items, count) From steve at pearwood.info Sat Oct 12 04:06:48 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 12 Oct 2013 13:06:48 +1100 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> Message-ID: <20131012020647.GH7989@ando> On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote: > "It is universally agreed that a list of n distinct symbols has n! > permutations. However, when the symbols are not distinct, the most common > convention, in mathematics and elsewhere, seems to be to count only > distinct permutations." ? I dispute this entire premise. Take a simple (and stereotypical) example, picking balls from an urn. Say that you have three Red and Two black balls, and randomly select without replacement. If you count only unique permutations, you get only four possibilities: py> set(''.join(t) for t in itertools.permutations('RRRBB', 2)) {'BR', 'RB', 'RR', 'BB'} which implies that drawing RR is no more likely than drawing BB, which is incorrect. 
The right way to model this experiment is not to count distinct permutations, but actual permutations: py> list(''.join(t) for t in itertools.permutations('RRRBB', 2)) ['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB'] which makes it clear that there are two ways of drawing BB compared to six ways of drawing RR. If that's not obvious enough, consider the case where you have two thousand red balls and two black balls -- do you really conclude that there are the same number of ways to pick RR as BB? So I disagree that counting only distinct permutations is the most useful or common convention. If you're permuting a collection of non-distinct values, you should expect non-distinct permutations. I'm trying to think of a realistic, physical situation where you would only want distinct permutations, and I can't. > Should we consider fixing itertools.permutations and to output only unique > permutations (if possible, although I realize that would break code). Absolutely not. Even if you were right that it should return unique permutations, and I strongly disagree that you were, the fact that it would break code is a deal-breaker. -- Steven From python at mrabarnett.plus.com Sat Oct 12 04:34:33 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 12 Oct 2013 03:34:33 +0100 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: <5258AC0B.1090603@mrabarnett.plus.com> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> <5258AC0B.1090603@mrabarnett.plus.com> Message-ID: <5258B539.10307@mrabarnett.plus.com> On 12/10/2013 02:55, MRAB wrote: > On 12/10/2013 00:49, Nick Coghlan wrote: >> >> On 12 Oct 2013 08:45, "David Mertz" > > wrote: >> > >> > >> > I realize after reading >> http://stackoverflow.com/questions/6284396/permutations-with-unique-values >> that my version was ALMOST right: >> > >> > def nonredundant_permutations(seq, r=None): >> > last = () >> > for perm in permutations(sorted(seq), r): >> > if perm > last: >> > yield perm >> > last = perm >> > >> > I can't look only for inequality, but must use the actual comparison. >> > >> > >>> ["".join(x) for x in nonredundant_permutations('aaabb',3)] >> > ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba'] >> > >>> list(nonredundant_permutations([F(3,1), D(3.0), 3.0])) >> > [(Fraction(3, 1), Decimal('3'), 3.0)] >> > >> > Of course, this approach DOES rely on the order in which >> itertools.permutations() returns values. However, it's a bit more >> compact than MRAB's version. >> >> As there is no requirement that entries in a sequence handled by >> itertools.permutations be sortable, so the original question of why this >> isn't done by default has been answered (the general solution risks >> consuming too much memory, while the memory efficient solution >> constrains the domain to only sortable sequences). >> > OK, here's a new implementation: > [snip] I've just realised that I don't need to sort them at all. 
Here's a new improved implementation: def unique_permutations(iterable, count=None): def perm(items, count): if count: seen = set() for i, item in enumerate(items): if item not in seen: for p in perm(items[ : i] + items[i + 1 : ], count - 1): yield [item] + p seen.add(item) else: yield [] items = list(iterable) if count is None: count = len(items) yield from perm(items, count) From mertz at gnosis.cx Sat Oct 12 04:36:26 2013 From: mertz at gnosis.cx (David Mertz) Date: Fri, 11 Oct 2013 19:36:26 -0700 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: <5258AC0B.1090603@mrabarnett.plus.com> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> <5258AC0B.1090603@mrabarnett.plus.com> Message-ID: Hi MRAB, I'm confused by your implementation. In particular, what do these lines do? # [...] items = list(iterable) keys = {} for item in items: keys.setdefault(item, len(keys)) items.sort(key=keys.get) I cannot understand how these can possibly have any effect (other than the first line that makes a concrete list out of an iterable). We loop through the list in its natural order. E.g. say the list is '[a, b, c]' (where those names are any types of objects whatsoever). The loop gives us: keys == {a:0, b:1, c:2} When we do a sort on 'key=keys.get()' how can that ever possibly change the order of 'items'? There's also a bit of a flaw in that your implementation blows up if anything yielded by iterable isn't hashable: >>> list(unique_permutations([ [1,2],[3,4],[5,6] ])) Traceback (most recent call last): File "", line 1, in TypeError: unhashable type: 'list' There's no problem doing this with itertools.permutations: >>> list(permutations([[1,2],[3,4],[5,6]])) [([1, 2], [3, 4], [5, 6]), ([1, 2], [5, 6], [3, 4]), ([3, 4], [1, 2], [5, 6]), ([3, 4], [5, 6], [1, 2]), ([5, 6], [1, 2], [3, 4]), ([5, 6], [3, 4], [1, 2])] This particular one also succeeds with my nonredundant_permutations: >>> list(nonredundant_permutations([[1,2],[3,4],[5,6]])) [([1, 2], [3, 4], [5, 6]), ([1, 2], [5, 6], [3, 4]), ([3, 4], [1, 2], [5, 6]), ([3, 4], [5, 6], [1, 2]), ([5, 6], [1, 2], [3, 4]), ([5, 6], [3, 4], [1, 2])] However, my version *DOES* fail when things cannot be compared under inequality: >>> list(nonredundant_permutations([[1,2],3,4])) Traceback (most recent call last): File "", line 1, in File "", line 3, in nonredundant_permutations TypeError: unorderable types: int() < list() This also doesn't afflict itertools.permutations. Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Sat Oct 12 04:37:02 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 11 Oct 2013 22:37:02 -0400 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: <20131012020647.GH7989@ando> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> Message-ID: I think it's pretty indisputable that permutations are formally defined this way (and I challenge you to find a source that doesn't agree with that). 
I'm sure you know that your idea of using permutations to evaluate a multinomial distribution is not efficient. A nicer way to evaluate probabilities is to pass your set through a collections.Counter, and then use the resulting dictionary with scipy.stats.multinomial (if it exists yet). I believe most people will be surprised that len(permutations(iterable)) does count unique permutations. Best, Neil On Fri, Oct 11, 2013 at 10:06 PM, Steven D'Aprano wrote: > On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote: > > "It is universally agreed that a list of n distinct symbols has n! > > permutations. However, when the symbols are not distinct, the most common > > convention, in mathematics and elsewhere, seems to be to count only > > distinct permutations." ? > > I dispute this entire premise. Take a simple (and stereotypical) > example, picking balls from an urn. > > Say that you have three Red and Two black balls, and randomly select > without replacement. If you count only unique permutations, you get only > four possibilities: > > py> set(''.join(t) for t in itertools.permutations('RRRBB', 2)) > {'BR', 'RB', 'RR', 'BB'} > > which implies that drawing RR is no more likely than drawing BB, which > is incorrect. The right way to model this experiment is not to count > distinct permutations, but actual permutations: > > py> list(''.join(t) for t in itertools.permutations('RRRBB', 2)) > ['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', > 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB'] > > which makes it clear that there are two ways of drawing BB compared to > six ways of drawing RR. If that's not obvious enough, consider the case > where you have two thousand red balls and two black balls -- do you > really conclude that there are the same number of ways to pick RR as BB? > > So I disagree that counting only distinct permutations is the most > useful or common convention. If you're permuting a collection of > non-distinct values, you should expect non-distinct permutations. > > I'm trying to think of a realistic, physical situation where you would > only want distinct permutations, and I can't. > > > > Should we consider fixing itertools.permutations and to output only > unique > > permutations (if possible, although I realize that would break code). > > Absolutely not. Even if you were right that it should return unique > permutations, and I strongly disagree that you were, the fact that it > would break code is a deal-breaker. > > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mertz at gnosis.cx Sat Oct 12 04:48:23 2013 From: mertz at gnosis.cx (David Mertz) Date: Fri, 11 Oct 2013 19:48:23 -0700 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> Message-ID: Related to, but not quite the same as Steven D'Aprano's point, I would find it very strange for itertools.permutations() to return a list that was narrowed to equal-but-not-identical items. This is why I've raised the example of 'items=[Fraction(3,1), Decimal(3.0), 3.0]' several times. I've created the Fraction, Decimal, and float for distinct reasons to get different behaviors and available methods. When I want to look for the permutations of those I don't want "any old random choice of equal values" since presumably I've given them a type for a reason. On the other hand, I can see a little bit of sense that 'itertools.permutations([3,3,3,3,3,3,3])' doesn't *really* need to tell me a list of 7!==5040 things that are exactly the same as each other. On the other hand, I don't know how to generalize that, since my feeling is far less clear for 'itertools.permutations([1,2,3,4,5,6,6])' ... there's redundancy, but there's also important information in the probability and count of specific sequences. My feeling, however, is that if one were to trim down the results from a permutations-related function, it is more interesting to me to only eliminate IDENTICAL items, not to eliminate merely EQUAL ones. On Fri, Oct 11, 2013 at 7:37 PM, Neil Girdhar wrote: > I think it's pretty indisputable that permutations are formally defined > this way (and I challenge you to find a source that doesn't agree with > that). I'm sure you know that your idea of using permutations to evaluate > a multinomial distribution is not efficient. A nicer way to evaluate > probabilities is to pass your set through a collections.Counter, and then > use the resulting dictionary with scipy.stats.multinomial (if it exists > yet). > > I believe most people will be surprised that len(permutations(iterable)) > does count unique permutations. > > Best, > > Neil > > > On Fri, Oct 11, 2013 at 10:06 PM, Steven D'Aprano wrote: > >> On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote: >> > "It is universally agreed that a list of n distinct symbols has n! >> > permutations. However, when the symbols are not distinct, the most >> common >> > convention, in mathematics and elsewhere, seems to be to count only >> > distinct permutations." ? >> >> I dispute this entire premise. Take a simple (and stereotypical) >> example, picking balls from an urn. >> >> Say that you have three Red and Two black balls, and randomly select >> without replacement. If you count only unique permutations, you get only >> four possibilities: >> >> py> set(''.join(t) for t in itertools.permutations('RRRBB', 2)) >> {'BR', 'RB', 'RR', 'BB'} >> >> which implies that drawing RR is no more likely than drawing BB, which >> is incorrect. The right way to model this experiment is not to count >> distinct permutations, but actual permutations: >> >> py> list(''.join(t) for t in itertools.permutations('RRRBB', 2)) >> ['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', >> 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB'] >> >> which makes it clear that there are two ways of drawing BB compared to >> six ways of drawing RR. 
If that's not obvious enough, consider the case >> where you have two thousand red balls and two black balls -- do you >> really conclude that there are the same number of ways to pick RR as BB? >> >> So I disagree that counting only distinct permutations is the most >> useful or common convention. If you're permuting a collection of >> non-distinct values, you should expect non-distinct permutations. >> >> I'm trying to think of a realistic, physical situation where you would >> only want distinct permutations, and I can't. >> >> >> > Should we consider fixing itertools.permutations and to output only >> unique >> > permutations (if possible, although I realize that would break code). >> >> Absolutely not. Even if you were right that it should return unique >> permutations, and I strongly disagree that you were, the fact that it >> would break code is a deal-breaker. >> >> >> >> -- >> Steven >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> -- >> >> --- >> You received this message because you are subscribed to a topic in the >> Google Groups "python-ideas" group. >> To unsubscribe from this topic, visit >> https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. >> To unsubscribe from this group and all its topics, send an email to >> python-ideas+unsubscribe at googlegroups.com. >> For more options, visit https://groups.google.com/groups/opt_out. >> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Sat Oct 12 04:55:06 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 11 Oct 2013 22:55:06 -0400 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> Message-ID: I honestly think that Python should stick to the mathematical definition of permutations rather than some kind of consensus of the tiny minority of people here. When next_permutation was added to C++, I believe the whole standards committee discussed it and they came up with the thing that makes the most sense. The fact that dict and set use equality is I think the reason that permutations should use equality. Neil On Fri, Oct 11, 2013 at 10:48 PM, David Mertz wrote: > Related to, but not quite the same as Steven D'Aprano's point, I would > find it very strange for itertools.permutations() to return a list that was > narrowed to equal-but-not-identical items. > > This is why I've raised the example of 'items=[Fraction(3,1), > Decimal(3.0), 3.0]' several times. I've created the Fraction, Decimal, and > float for distinct reasons to get different behaviors and available > methods. When I want to look for the permutations of those I don't want > "any old random choice of equal values" since presumably I've given them a > type for a reason. 
> > On the other hand, I can see a little bit of sense that > 'itertools.permutations([3,3,3,3,3,3,3])' doesn't *really* need to tell me > a list of 7!==5040 things that are exactly the same as each other. On the > other hand, I don't know how to generalize that, since my feeling is far > less clear for 'itertools.permutations([1,2,3,4,5,6,6])' ... there's > redundancy, but there's also important information in the probability and > count of specific sequences. > > My feeling, however, is that if one were to trim down the results from a > permutations-related function, it is more interesting to me to only > eliminate IDENTICAL items, not to eliminate merely EQUAL ones. > > > On Fri, Oct 11, 2013 at 7:37 PM, Neil Girdhar wrote: > >> I think it's pretty indisputable that permutations are formally defined >> this way (and I challenge you to find a source that doesn't agree with >> that). I'm sure you know that your idea of using permutations to evaluate >> a multinomial distribution is not efficient. A nicer way to evaluate >> probabilities is to pass your set through a collections.Counter, and then >> use the resulting dictionary with scipy.stats.multinomial (if it exists >> yet). >> >> I believe most people will be surprised that len(permutations(iterable)) >> does count unique permutations. >> >> Best, >> >> Neil >> >> >> On Fri, Oct 11, 2013 at 10:06 PM, Steven D'Aprano wrote: >> >>> On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote: >>> > "It is universally agreed that a list of n distinct symbols has n! >>> > permutations. However, when the symbols are not distinct, the most >>> common >>> > convention, in mathematics and elsewhere, seems to be to count only >>> > distinct permutations." ? >>> >>> I dispute this entire premise. Take a simple (and stereotypical) >>> example, picking balls from an urn. >>> >>> Say that you have three Red and Two black balls, and randomly select >>> without replacement. If you count only unique permutations, you get only >>> four possibilities: >>> >>> py> set(''.join(t) for t in itertools.permutations('RRRBB', 2)) >>> {'BR', 'RB', 'RR', 'BB'} >>> >>> which implies that drawing RR is no more likely than drawing BB, which >>> is incorrect. The right way to model this experiment is not to count >>> distinct permutations, but actual permutations: >>> >>> py> list(''.join(t) for t in itertools.permutations('RRRBB', 2)) >>> ['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', >>> 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB'] >>> >>> which makes it clear that there are two ways of drawing BB compared to >>> six ways of drawing RR. If that's not obvious enough, consider the case >>> where you have two thousand red balls and two black balls -- do you >>> really conclude that there are the same number of ways to pick RR as BB? >>> >>> So I disagree that counting only distinct permutations is the most >>> useful or common convention. If you're permuting a collection of >>> non-distinct values, you should expect non-distinct permutations. >>> >>> I'm trying to think of a realistic, physical situation where you would >>> only want distinct permutations, and I can't. >>> >>> >>> > Should we consider fixing itertools.permutations and to output only >>> unique >>> > permutations (if possible, although I realize that would break code). >>> >>> Absolutely not. Even if you were right that it should return unique >>> permutations, and I strongly disagree that you were, the fact that it >>> would break code is a deal-breaker. 
>>> >>> >>> >>> -- >>> Steven >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> >>> -- >>> >>> --- >>> You received this message because you are subscribed to a topic in the >>> Google Groups "python-ideas" group. >>> To unsubscribe from this topic, visit >>> https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. >>> To unsubscribe from this group and all its topics, send an email to >>> python-ideas+unsubscribe at googlegroups.com. >>> For more options, visit https://groups.google.com/groups/opt_out. >>> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Oct 12 04:57:08 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 11 Oct 2013 19:57:08 -0700 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> Message-ID: <6123349B-FFCB-42FE-B973-B3C8251302C4@yahoo.com> On Oct 11, 2013, at 19:48, David Mertz wrote: > My feeling, however, is that if one were to trim down the results from a permutations-related function, it is more interesting to me to only eliminate IDENTICAL items, not to eliminate merely EQUAL ones. I agree with the rest of your message, but I still think you're wrong here. Anyone who is surprised by distinct_permutations((3.0, 3)) treating the two values the same would be equally surprised by {3.0, 3} having only one member. Or by groupby((3.0, 'a'), (3, 'b')) only having one group. And so on. In Python, sets, dict keys, groups, etc. work by ==. That was a choice that could have been made differently, but Python made that choice long ago, and has applied it completely consistently, and it would be very strange to choose differently in this case. From ncoghlan at gmail.com Sat Oct 12 06:35:13 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 12 Oct 2013 14:35:13 +1000 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> Message-ID: On 12 Oct 2013 12:56, "Neil Girdhar" wrote: > > I honestly think that Python should stick to the mathematical definition of permutations rather than some kind of consensus of the tiny minority of people here. When next_permutation was added to C++, I believe the whole standards committee discussed it and they came up with the thing that makes the most sense. The fact that dict and set use equality is I think the reason that permutations should use equality. Why should the behaviour of hash based containers limit the behaviour of itertools? Python required a permutation solution that is memory efficient and works with arbitrary objects, so that's what itertools provides. 
However, you'd also like a memory efficient iterator for *mathematical* permutations that pays attention to object values and filters out equivalent results. I *believe* the request is equivalent to giving a name to the following genexp: (k for k, grp in groupby(permutations(sorted(input)))) That's a reasonable enough request (although perhaps more suited to the recipes section in the itertools docs), but conflating it with complaints about the way the existing iterator works is a good way to get people to ignore you (especially now the language specific reasons for the current behaviour have been pointed out, along with confirmation of the fact that backwards compatibility requirements would prohibit changing it even if we wanted to). Cheers, Nick. > > Neil > > > On Fri, Oct 11, 2013 at 10:48 PM, David Mertz wrote: >> >> Related to, but not quite the same as Steven D'Aprano's point, I would find it very strange for itertools.permutations() to return a list that was narrowed to equal-but-not-identical items. >> >> This is why I've raised the example of 'items=[Fraction(3,1), Decimal(3.0), 3.0]' several times. I've created the Fraction, Decimal, and float for distinct reasons to get different behaviors and available methods. When I want to look for the permutations of those I don't want "any old random choice of equal values" since presumably I've given them a type for a reason. >> >> On the other hand, I can see a little bit of sense that 'itertools.permutations([3,3,3,3,3,3,3])' doesn't *really* need to tell me a list of 7!==5040 things that are exactly the same as each other. On the other hand, I don't know how to generalize that, since my feeling is far less clear for 'itertools.permutations([1,2,3,4,5,6,6])' ... there's redundancy, but there's also important information in the probability and count of specific sequences. >> >> My feeling, however, is that if one were to trim down the results from a permutations-related function, it is more interesting to me to only eliminate IDENTICAL items, not to eliminate merely EQUAL ones. >> >> >> On Fri, Oct 11, 2013 at 7:37 PM, Neil Girdhar wrote: >>> >>> I think it's pretty indisputable that permutations are formally defined this way (and I challenge you to find a source that doesn't agree with that). I'm sure you know that your idea of using permutations to evaluate a multinomial distribution is not efficient. A nicer way to evaluate probabilities is to pass your set through a collections.Counter, and then use the resulting dictionary with scipy.stats.multinomial (if it exists yet). >>> >>> I believe most people will be surprised that len(permutations(iterable)) does count unique permutations. >>> >>> Best, >>> >>> Neil >>> >>> >>> On Fri, Oct 11, 2013 at 10:06 PM, Steven D'Aprano wrote: >>>> >>>> On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote: >>>> > "It is universally agreed that a list of n distinct symbols has n! >>>> > permutations. However, when the symbols are not distinct, the most common >>>> > convention, in mathematics and elsewhere, seems to be to count only >>>> > distinct permutations." ? >>>> >>>> I dispute this entire premise. Take a simple (and stereotypical) >>>> example, picking balls from an urn. >>>> >>>> Say that you have three Red and Two black balls, and randomly select >>>> without replacement. 
If you count only unique permutations, you get only >>>> four possibilities: >>>> >>>> py> set(''.join(t) for t in itertools.permutations('RRRBB', 2)) >>>> {'BR', 'RB', 'RR', 'BB'} >>>> >>>> which implies that drawing RR is no more likely than drawing BB, which >>>> is incorrect. The right way to model this experiment is not to count >>>> distinct permutations, but actual permutations: >>>> >>>> py> list(''.join(t) for t in itertools.permutations('RRRBB', 2)) >>>> ['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', >>>> 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB'] >>>> >>>> which makes it clear that there are two ways of drawing BB compared to >>>> six ways of drawing RR. If that's not obvious enough, consider the case >>>> where you have two thousand red balls and two black balls -- do you >>>> really conclude that there are the same number of ways to pick RR as BB? >>>> >>>> So I disagree that counting only distinct permutations is the most >>>> useful or common convention. If you're permuting a collection of >>>> non-distinct values, you should expect non-distinct permutations. >>>> >>>> I'm trying to think of a realistic, physical situation where you would >>>> only want distinct permutations, and I can't. >>>> >>>> >>>> > Should we consider fixing itertools.permutations and to output only unique >>>> > permutations (if possible, although I realize that would break code). >>>> >>>> Absolutely not. Even if you were right that it should return unique >>>> permutations, and I strongly disagree that you were, the fact that it >>>> would break code is a deal-breaker. >>>> >>>> >>>> >>>> -- >>>> Steven >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> >>>> -- >>>> >>>> --- >>>> You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group. >>>> To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. >>>> To unsubscribe from this group and all its topics, send an email to python-ideas+unsubscribe at googlegroups.com. >>>> For more options, visit https://groups.google.com/groups/opt_out. >>> >>> >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> >> >> >> >> -- >> Keeping medicines from the bloodstreams of the sick; food >> from the bellies of the hungry; books from the hands of the >> uneducated; technology from the underdeveloped; and putting >> advocates of freedom in prisons. Intellectual property is >> to the 21st century what the slave trade was to the 16th. > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sat Oct 12 07:10:21 2013 From: stephen at xemacs.org (Stephen J. 
Turnbull)
Date: Sat, 12 Oct 2013 14:10:21 +0900
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To:
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
	<20131012020647.GH7989@ando>
Message-ID: <87pprbgjlu.fsf@uwakimon.sk.tsukuba.ac.jp>

Neil Girdhar writes:

> I honestly think that Python should stick to the mathematical
> definition of permutations rather than some kind of consensus of
> the tiny minority of people here.

Is there an agreed mathematical definition of permutations of *sequences*? Every definition I can find refers to permutations of *sets*.

I think any categorist would agree that there are a large number of maps of _Sequence_ to _Set_, in particular the two obviously useful ones[1]: the one that takes each element of the sequence to a *different* element of the corresponding set, and the one that takes equal elements of the sequence to the *same* element of the corresponding set. The corresponding set need not be the underlying set of the sequence, and which one is appropriate presumably depends on applications.

> When next_permutation was added to C++, I believe the whole
> standards committee discussed it and they came up with the thing
> that makes the most sense.

To the negligible (in several senses of the word) fraction of humanity that participates in C++ standardization. Python is not C++ (thanking all the Roman and Greek gods, and refusing to identify Zeus with Jupiter, nor Aphrodite with Venus).

> The fact that dict and set use equality is I think the reason that
> permutations should use equality.

Sequences are not sets, and dict is precisely the wrong example for you to use, since it makes exactly the point that values that are identical may be bound to several different keys. We don't unify keys in a dict just because the values are identical (or equal). Similarly, in representing a sequence as a set, we use a set of ordered pairs, with the first component a unique integer indicating position, and the second the sequence element.

Since there are several useful mathematical ways to convert sequences to sets, and in particular one very similar, if not identical, to the one you like is enshrined in the very convenient constructor set(), I think it's useful to leave it as it is.

> It is universally agreed that a list of n distinct symbols has n!
> permutations.

But that's because there's really no sensible definition of "underlying set" for such a list except the set containing exactly the same elements as the list.[2] But there is no universal agreement that "permutations of a list" is a sensible phrase. For example, although the Wikipedia article Permutation refers to lists of permutations, linked list representations of data, to the "list of objects" for use in Cauchy's notation, and to the cycle representation as a list of sequences, it doesn't once refer to permutation of a list. They're obviously not averse to discussing lists, but the word used for the entity being permuted is invariably "set".

Footnotes:
[1] And some maps not terribly useful for our purposes, such as one that
    maps all sequences to a singleton.

[2] A categorist would disagree, but that's not interesting.
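[Sketch for reference, not a message from the thread: the "nonredundant_permutations" approach David compares below -- sort the input, then keep only tuples strictly greater than the last one yielded -- can be written roughly as follows. The name is illustrative, and the elements must be orderable with <.]

from itertools import permutations

def distinct_permutations(iterable, r=None):
    # Keep a permutation tuple only when it is strictly greater than the
    # previous tuple kept; with a sorted input this retains exactly one
    # copy of each distinct permutation, as discussed in the messages below.
    last = None
    for p in permutations(sorted(iterable), r):
        if last is None or p > last:
            last = p
            yield p

# e.g. [''.join(p) for p in distinct_permutations('aaabb', 3)]
# -> ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']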
From mertz at gnosis.cx Sat Oct 12 07:26:07 2013 From: mertz at gnosis.cx (David Mertz) Date: Fri, 11 Oct 2013 22:26:07 -0700 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> Message-ID: What you propose, Nick, is definitely different from the several functions that have been bandied about here. I.e. >>> def nick_permutations(items, r=None): ... return (k for k, grp in groupby(permutations(sorted(items),r))) >>> ["".join(p) for p in nonredundant_permutations('aaabb', 3)] ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba'] >>> ["".join(p) for p in nick_permutations('aaabb', 3)] ['aaa', 'aab', 'aaa', 'aab', 'aba', 'abb', 'aba', 'abb', 'aaa', 'aab', 'aaa', 'aab', 'aba', 'abb', 'aba', 'abb', 'aaa', 'aab', 'aaa', 'aab', 'aba', 'abb', 'aba', 'abb', 'baa', 'bab', 'baa', 'bab', 'baa', 'bab', 'bba', 'baa', 'bab', 'baa', 'bab', 'baa', 'bab', 'bba'] >>> ["".join(p) for p in permutations('aaabb', 3)] ['aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', 'abb', 'aba', 'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', 'abb', 'aba', 'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', 'abb', 'aba', 'aba', 'abb', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba'] If I'm thinking of this right, what you give is equivalent to the initial flawed version of 'nonredundant_permutations()' that I suggested, which used '!=' rather than the correct '>' in comparing to the 'last' tuple. FWIW, I deliberately chose the name 'nonredundant_permutations' rather than MRAB's choice of 'unique_permutations' because I think what the filtering does is precisely NOT to give unique ones. Or rather, not to give ALL unique ones, but only those defined by equivalence (i.e. rather than identity). My name is ugly, and if there were to be a function like it in itertools, a better name should be found. But such a name should emphasize that it is "filter by equivalence classes" ... actually, maybe this suggests another function which is instead "filter by identity of tuples", potentially also added to itertools. On Fri, Oct 11, 2013 at 9:35 PM, Nick Coghlan wrote: > > On 12 Oct 2013 12:56, "Neil Girdhar" wrote: > > > > I honestly think that Python should stick to the mathematical definition > of permutations rather than some kind of consensus of the tiny minority of > people here. When next_permutation was added to C++, I believe the whole > standards committee discussed it and they came up with the thing that makes > the most sense. The fact that dict and set use equality is I think the > reason that permutations should use equality. > > Why should the behaviour of hash based containers limit the behaviour of > itertools? > > Python required a permutation solution that is memory efficient and works > with arbitrary objects, so that's what itertools provides. > > However, you'd also like a memory efficient iterator for *mathematical* > permutations that pays attention to object values and filters out > equivalent results. 
> > I *believe* the request is equivalent to giving a name to the following > genexp: > > (k for k, grp in groupby(permutations(sorted(input)))) > > That's a reasonable enough request (although perhaps more suited to the > recipes section in the itertools docs), but conflating it with complaints > about the way the existing iterator works is a good way to get people to > ignore you (especially now the language specific reasons for the current > behaviour have been pointed out, along with confirmation of the fact that > backwards compatibility requirements would prohibit changing it even if we > wanted to). > > Cheers, > Nick. > > > > > Neil > > > > > > On Fri, Oct 11, 2013 at 10:48 PM, David Mertz wrote: > >> > >> Related to, but not quite the same as Steven D'Aprano's point, I would > find it very strange for itertools.permutations() to return a list that was > narrowed to equal-but-not-identical items. > >> > >> This is why I've raised the example of 'items=[Fraction(3,1), > Decimal(3.0), 3.0]' several times. I've created the Fraction, Decimal, and > float for distinct reasons to get different behaviors and available > methods. When I want to look for the permutations of those I don't want > "any old random choice of equal values" since presumably I've given them a > type for a reason. > >> > >> On the other hand, I can see a little bit of sense that > 'itertools.permutations([3,3,3,3,3,3,3])' doesn't *really* need to tell me > a list of 7!==5040 things that are exactly the same as each other. On the > other hand, I don't know how to generalize that, since my feeling is far > less clear for 'itertools.permutations([1,2,3,4,5,6,6])' ... there's > redundancy, but there's also important information in the probability and > count of specific sequences. > >> > >> My feeling, however, is that if one were to trim down the results from > a permutations-related function, it is more interesting to me to only > eliminate IDENTICAL items, not to eliminate merely EQUAL ones. > >> > >> > >> On Fri, Oct 11, 2013 at 7:37 PM, Neil Girdhar > wrote: > >>> > >>> I think it's pretty indisputable that permutations are formally > defined this way (and I challenge you to find a source that doesn't agree > with that). I'm sure you know that your idea of using permutations to > evaluate a multinomial distribution is not efficient. A nicer way to > evaluate probabilities is to pass your set through a collections.Counter, > and then use the resulting dictionary with scipy.stats.multinomial (if it > exists yet). > >>> > >>> I believe most people will be surprised that > len(permutations(iterable)) does count unique permutations. > >>> > >>> Best, > >>> > >>> Neil > >>> > >>> > >>> On Fri, Oct 11, 2013 at 10:06 PM, Steven D'Aprano > wrote: > >>>> > >>>> On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote: > >>>> > "It is universally agreed that a list of n distinct symbols has n! > >>>> > permutations. However, when the symbols are not distinct, the most > common > >>>> > convention, in mathematics and elsewhere, seems to be to count only > >>>> > distinct permutations." ? > >>>> > >>>> I dispute this entire premise. Take a simple (and stereotypical) > >>>> example, picking balls from an urn. > >>>> > >>>> Say that you have three Red and Two black balls, and randomly select > >>>> without replacement. 
If you count only unique permutations, you get > only > >>>> four possibilities: > >>>> > >>>> py> set(''.join(t) for t in itertools.permutations('RRRBB', 2)) > >>>> {'BR', 'RB', 'RR', 'BB'} > >>>> > >>>> which implies that drawing RR is no more likely than drawing BB, which > >>>> is incorrect. The right way to model this experiment is not to count > >>>> distinct permutations, but actual permutations: > >>>> > >>>> py> list(''.join(t) for t in itertools.permutations('RRRBB', 2)) > >>>> ['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', > 'RB', > >>>> 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB'] > >>>> > >>>> which makes it clear that there are two ways of drawing BB compared to > >>>> six ways of drawing RR. If that's not obvious enough, consider the > case > >>>> where you have two thousand red balls and two black balls -- do you > >>>> really conclude that there are the same number of ways to pick RR as > BB? > >>>> > >>>> So I disagree that counting only distinct permutations is the most > >>>> useful or common convention. If you're permuting a collection of > >>>> non-distinct values, you should expect non-distinct permutations. > >>>> > >>>> I'm trying to think of a realistic, physical situation where you would > >>>> only want distinct permutations, and I can't. > >>>> > >>>> > >>>> > Should we consider fixing itertools.permutations and to output only > unique > >>>> > permutations (if possible, although I realize that would break > code). > >>>> > >>>> Absolutely not. Even if you were right that it should return unique > >>>> permutations, and I strongly disagree that you were, the fact that it > >>>> would break code is a deal-breaker. > >>>> > >>>> > >>>> > >>>> -- > >>>> Steven > >>>> _______________________________________________ > >>>> Python-ideas mailing list > >>>> Python-ideas at python.org > >>>> https://mail.python.org/mailman/listinfo/python-ideas > >>>> > >>>> -- > >>>> > >>>> --- > >>>> You received this message because you are subscribed to a topic in > the Google Groups "python-ideas" group. > >>>> To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. > >>>> To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > >>>> For more options, visit https://groups.google.com/groups/opt_out. > >>> > >>> > >>> > >>> _______________________________________________ > >>> Python-ideas mailing list > >>> Python-ideas at python.org > >>> https://mail.python.org/mailman/listinfo/python-ideas > >>> > >> > >> > >> > >> -- > >> Keeping medicines from the bloodstreams of the sick; food > >> from the bellies of the hungry; books from the hands of the > >> uneducated; technology from the underdeveloped; and putting > >> advocates of freedom in prisons. Intellectual property is > >> to the 21st century what the slave trade was to the 16th. > > > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mertz at gnosis.cx Sat Oct 12 07:38:19 2013 From: mertz at gnosis.cx (David Mertz) Date: Fri, 11 Oct 2013 22:38:19 -0700 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: <6123349B-FFCB-42FE-B973-B3C8251302C4@yahoo.com> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <6123349B-FFCB-42FE-B973-B3C8251302C4@yahoo.com> Message-ID: Hi Andrew, I've sort of said as much in my last reply to Nick. But maybe I can clarify further. I can imagine *someone* wanting a filtering of permutations by either identify or equality. Maybe, in fact, by other comparisons also for generality. This might suggest an API like the following: equal_perms = distinct_permutations(items, r, filter_by=operator.eq) ident_perms = distinct_permutations(items, r, filter_by=operator.is_) Or even perhaps, in some use-case that isn't clear to me, e.g. start_same_perms = distinct_permutations(items, r, filter_by=lambda a,b: a[0]==b[0]) Or perhaps more plausibly, some predicate that, e.g. tests if two returned tuples are the same under case normalization of the strings within them. I guess the argument then would be what the default value of 'filter_by' might be... but that seems less important to me if there were an option to pass a predicate as you liked. On Fri, Oct 11, 2013 at 7:57 PM, Andrew Barnert wrote: > On Oct 11, 2013, at 19:48, David Mertz wrote: > > > My feeling, however, is that if one were to trim down the results from a > permutations-related function, it is more interesting to me to only > eliminate IDENTICAL items, not to eliminate merely EQUAL ones. > > I agree with the rest of your message, but I still think you're wrong > here. Anyone who is surprised by distinct_permutations((3.0, 3)) treating > the two values the same would be equally surprised by {3.0, 3} having only > one member. Or by groupby((3.0, 'a'), (3, 'b')) only having one group. And > so on. > > In Python, sets, dict keys, groups, etc. work by ==. That was a choice > that could have been made differently, but Python made that choice long > ago, and has applied it completely consistently, and it would be very > strange to choose differently in this case. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sat Oct 12 07:48:25 2013 From: mertz at gnosis.cx (David Mertz) Date: Fri, 11 Oct 2013 22:48:25 -0700 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <6123349B-FFCB-42FE-B973-B3C8251302C4@yahoo.com> Message-ID: Btw. My implementation of nonredundant_permutations *IS* guaranteed to work by the docs for Python 3.4. Actually for Python 2.7+. That is, it's not just an implementation accident (as I thought before I checked), but a promised API of itertools.permutations that: Permutations are emitted in lexicographic sort order. So, if the input iterable is sorted, the permutation tuples will be produced in sorted order. 
As long as that holds, my function will indeed behave correctly (but of course, with the limitation that it blows up if different items in the argument iterable cannot be compared using operator.lt(). On Fri, Oct 11, 2013 at 10:38 PM, David Mertz wrote: > Hi Andrew, > > I've sort of said as much in my last reply to Nick. But maybe I can > clarify further. I can imagine *someone* wanting a filtering of > permutations by either identify or equality. Maybe, in fact, by other > comparisons also for generality. > > This might suggest an API like the following: > > equal_perms = distinct_permutations(items, r, filter_by=operator.eq) > ident_perms = distinct_permutations(items, r, filter_by=operator.is_) > > Or even perhaps, in some use-case that isn't clear to me, e.g. > > start_same_perms = distinct_permutations(items, r, filter_by=lambda a,b: > a[0]==b[0]) > > Or perhaps more plausibly, some predicate that, e.g. tests if two returned > tuples are the same under case normalization of the strings within them. > > I guess the argument then would be what the default value of 'filter_by' > might be... but that seems less important to me if there were an option to > pass a predicate as you liked. > > > > On Fri, Oct 11, 2013 at 7:57 PM, Andrew Barnert wrote: > >> On Oct 11, 2013, at 19:48, David Mertz wrote: >> >> > My feeling, however, is that if one were to trim down the results from >> a permutations-related function, it is more interesting to me to only >> eliminate IDENTICAL items, not to eliminate merely EQUAL ones. >> >> I agree with the rest of your message, but I still think you're wrong >> here. Anyone who is surprised by distinct_permutations((3.0, 3)) treating >> the two values the same would be equally surprised by {3.0, 3} having only >> one member. Or by groupby((3.0, 'a'), (3, 'b')) only having one group. And >> so on. >> >> In Python, sets, dict keys, groups, etc. work by ==. That was a choice >> that could have been made differently, but Python made that choice long >> ago, and has applied it completely consistently, and it would be very >> strange to choose differently in this case. > > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Oct 12 08:34:46 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 12 Oct 2013 17:34:46 +1100 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> Message-ID: <20131012063445.GI7989@ando> On Fri, Oct 11, 2013 at 10:55:06PM -0400, Neil Girdhar wrote: > I honestly think that Python should stick to the mathematical definition of > permutations rather than some kind of consensus of the tiny minority of > people here. So do I. And that is exactly what itertools.permutations already does. 
-- Steven From mistersheik at gmail.com Sat Oct 12 08:55:25 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 12 Oct 2013 02:55:25 -0400 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> Message-ID: Hi Nick, Rereading my messages, I feel like I haven't been as diplomatic as I wanted. Like everyone here, I care a lot about Python and I want to see it become as perfect as it can be made. If my wording has been too strong, it's only out of passion for Python. I acknowledged in my initial request that it would be impossible to change the default behaviour of itertools.permutations. I understand that that ship has sailed. I think my best proposal is to have an efficient distinct_permutations function in itertools. It should be in itertools so that it is discoverable. It should be a function rather one of the recipes proposed to make it as efficient as possible. (Correct me if I'm wrong, but like the set solution, groupby is also not so efficient.) I welcome the discussion and hope that the most efficient implementation someone here comes up with will be added one day to itertools. Best, Neil On Sat, Oct 12, 2013 at 12:35 AM, Nick Coghlan wrote: > > On 12 Oct 2013 12:56, "Neil Girdhar" wrote: > > > > I honestly think that Python should stick to the mathematical definition > of permutations rather than some kind of consensus of the tiny minority of > people here. When next_permutation was added to C++, I believe the whole > standards committee discussed it and they came up with the thing that makes > the most sense. The fact that dict and set use equality is I think the > reason that permutations should use equality. > > Why should the behaviour of hash based containers limit the behaviour of > itertools? > > Python required a permutation solution that is memory efficient and works > with arbitrary objects, so that's what itertools provides. > > However, you'd also like a memory efficient iterator for *mathematical* > permutations that pays attention to object values and filters out > equivalent results. > > I *believe* the request is equivalent to giving a name to the following > genexp: > > (k for k, grp in groupby(permutations(sorted(input)))) > > That's a reasonable enough request (although perhaps more suited to the > recipes section in the itertools docs), but conflating it with complaints > about the way the existing iterator works is a good way to get people to > ignore you (especially now the language specific reasons for the current > behaviour have been pointed out, along with confirmation of the fact that > backwards compatibility requirements would prohibit changing it even if we > wanted to). > > Cheers, > Nick. > > > > > Neil > > > > > > On Fri, Oct 11, 2013 at 10:48 PM, David Mertz wrote: > >> > >> Related to, but not quite the same as Steven D'Aprano's point, I would > find it very strange for itertools.permutations() to return a list that was > narrowed to equal-but-not-identical items. > >> > >> This is why I've raised the example of 'items=[Fraction(3,1), > Decimal(3.0), 3.0]' several times. I've created the Fraction, Decimal, and > float for distinct reasons to get different behaviors and available > methods. When I want to look for the permutations of those I don't want > "any old random choice of equal values" since presumably I've given them a > type for a reason. 
> >> > >> On the other hand, I can see a little bit of sense that > 'itertools.permutations([3,3,3,3,3,3,3])' doesn't *really* need to tell me > a list of 7!==5040 things that are exactly the same as each other. On the > other hand, I don't know how to generalize that, since my feeling is far > less clear for 'itertools.permutations([1,2,3,4,5,6,6])' ... there's > redundancy, but there's also important information in the probability and > count of specific sequences. > >> > >> My feeling, however, is that if one were to trim down the results from > a permutations-related function, it is more interesting to me to only > eliminate IDENTICAL items, not to eliminate merely EQUAL ones. > >> > >> > >> On Fri, Oct 11, 2013 at 7:37 PM, Neil Girdhar > wrote: > >>> > >>> I think it's pretty indisputable that permutations are formally > defined this way (and I challenge you to find a source that doesn't agree > with that). I'm sure you know that your idea of using permutations to > evaluate a multinomial distribution is not efficient. A nicer way to > evaluate probabilities is to pass your set through a collections.Counter, > and then use the resulting dictionary with scipy.stats.multinomial (if it > exists yet). > >>> > >>> I believe most people will be surprised that > len(permutations(iterable)) does count unique permutations. > >>> > >>> Best, > >>> > >>> Neil > >>> > >>> > >>> On Fri, Oct 11, 2013 at 10:06 PM, Steven D'Aprano > wrote: > >>>> > >>>> On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote: > >>>> > "It is universally agreed that a list of n distinct symbols has n! > >>>> > permutations. However, when the symbols are not distinct, the most > common > >>>> > convention, in mathematics and elsewhere, seems to be to count only > >>>> > distinct permutations." ? > >>>> > >>>> I dispute this entire premise. Take a simple (and stereotypical) > >>>> example, picking balls from an urn. > >>>> > >>>> Say that you have three Red and Two black balls, and randomly select > >>>> without replacement. If you count only unique permutations, you get > only > >>>> four possibilities: > >>>> > >>>> py> set(''.join(t) for t in itertools.permutations('RRRBB', 2)) > >>>> {'BR', 'RB', 'RR', 'BB'} > >>>> > >>>> which implies that drawing RR is no more likely than drawing BB, which > >>>> is incorrect. The right way to model this experiment is not to count > >>>> distinct permutations, but actual permutations: > >>>> > >>>> py> list(''.join(t) for t in itertools.permutations('RRRBB', 2)) > >>>> ['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', > 'RB', > >>>> 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB'] > >>>> > >>>> which makes it clear that there are two ways of drawing BB compared to > >>>> six ways of drawing RR. If that's not obvious enough, consider the > case > >>>> where you have two thousand red balls and two black balls -- do you > >>>> really conclude that there are the same number of ways to pick RR as > BB? > >>>> > >>>> So I disagree that counting only distinct permutations is the most > >>>> useful or common convention. If you're permuting a collection of > >>>> non-distinct values, you should expect non-distinct permutations. > >>>> > >>>> I'm trying to think of a realistic, physical situation where you would > >>>> only want distinct permutations, and I can't. > >>>> > >>>> > >>>> > Should we consider fixing itertools.permutations and to output only > unique > >>>> > permutations (if possible, although I realize that would break > code). 
> >>>> > >>>> Absolutely not. Even if you were right that it should return unique > >>>> permutations, and I strongly disagree that you were, the fact that it > >>>> would break code is a deal-breaker. > >>>> > >>>> > >>>> > >>>> -- > >>>> Steven > >>>> _______________________________________________ > >>>> Python-ideas mailing list > >>>> Python-ideas at python.org > >>>> https://mail.python.org/mailman/listinfo/python-ideas > >>>> > >>>> -- > >>>> > >>>> --- > >>>> You received this message because you are subscribed to a topic in > the Google Groups "python-ideas" group. > >>>> To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. > >>>> To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > >>>> For more options, visit https://groups.google.com/groups/opt_out. > >>> > >>> > >>> > >>> _______________________________________________ > >>> Python-ideas mailing list > >>> Python-ideas at python.org > >>> https://mail.python.org/mailman/listinfo/python-ideas > >>> > >> > >> > >> > >> -- > >> Keeping medicines from the bloodstreams of the sick; food > >> from the bellies of the hungry; books from the hands of the > >> uneducated; technology from the underdeveloped; and putting > >> advocates of freedom in prisons. Intellectual property is > >> to the 21st century what the slave trade was to the 16th. > > > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Sat Oct 12 09:02:47 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 12 Oct 2013 03:02:47 -0400 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <6123349B-FFCB-42FE-B973-B3C8251302C4@yahoo.com> Message-ID: Why not just use the standard python way to generalize this: "key" rather than the nonstandard "filter_by". On Sat, Oct 12, 2013 at 1:38 AM, David Mertz wrote: > Hi Andrew, > > I've sort of said as much in my last reply to Nick. But maybe I can > clarify further. I can imagine *someone* wanting a filtering of > permutations by either identify or equality. Maybe, in fact, by other > comparisons also for generality. > > This might suggest an API like the following: > > equal_perms = distinct_permutations(items, r, filter_by=operator.eq) > ident_perms = distinct_permutations(items, r, filter_by=operator.is_) > > Or even perhaps, in some use-case that isn't clear to me, e.g. > > start_same_perms = distinct_permutations(items, r, filter_by=lambda a,b: > a[0]==b[0]) > > Or perhaps more plausibly, some predicate that, e.g. tests if two returned > tuples are the same under case normalization of the strings within them. > > I guess the argument then would be what the default value of 'filter_by' > might be... but that seems less important to me if there were an option to > pass a predicate as you liked. > > > > On Fri, Oct 11, 2013 at 7:57 PM, Andrew Barnert wrote: > >> On Oct 11, 2013, at 19:48, David Mertz wrote: >> >> > My feeling, however, is that if one were to trim down the results from >> a permutations-related function, it is more interesting to me to only >> eliminate IDENTICAL items, not to eliminate merely EQUAL ones. 
>> >> I agree with the rest of your message, but I still think you're wrong >> here. Anyone who is surprised by distinct_permutations((3.0, 3)) treating >> the two values the same would be equally surprised by {3.0, 3} having only >> one member. Or by groupby((3.0, 'a'), (3, 'b')) only having one group. And >> so on. >> >> In Python, sets, dict keys, groups, etc. work by ==. That was a choice >> that could have been made differently, but Python made that choice long >> ago, and has applied it completely consistently, and it would be very >> strange to choose differently in this case. > > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sat Oct 12 09:09:32 2013 From: mertz at gnosis.cx (David Mertz) Date: Sat, 12 Oct 2013 00:09:32 -0700 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <6123349B-FFCB-42FE-B973-B3C8251302C4@yahoo.com> Message-ID: On Sat, Oct 12, 2013 at 12:02 AM, Neil Girdhar wrote: > Why not just use the standard python way to generalize this: "key" rather > than the nonstandard "filter_by". > Yes, 'key' is a much better name than what I suggested. I'm not quite sure how best to implement this still. I guess MRAB's recursive approach should work, even though I like the simplicity of my style that takes full advantage of the existing itertools.permutations() (and uses 1/3 as many lines of--I think clearer--code). His has the advantage, however, that it doesn't require operator.lt() to work... however, without benchmarking, I have a pretty strong feeling that my suggestion will be faster since it avoids all that recursive call overhead. Maybe I'm wrong about that though. > On Sat, Oct 12, 2013 at 1:38 AM, David Mertz wrote: > >> Hi Andrew, >> >> I've sort of said as much in my last reply to Nick. But maybe I can >> clarify further. I can imagine *someone* wanting a filtering of >> permutations by either identify or equality. Maybe, in fact, by other >> comparisons also for generality. >> >> This might suggest an API like the following: >> >> equal_perms = distinct_permutations(items, r, filter_by=operator.eq) >> ident_perms = distinct_permutations(items, r, filter_by=operator.is_) >> >> Or even perhaps, in some use-case that isn't clear to me, e.g. >> >> start_same_perms = distinct_permutations(items, r, filter_by=lambda >> a,b: a[0]==b[0]) >> >> Or perhaps more plausibly, some predicate that, e.g. tests if two >> returned tuples are the same under case normalization of the strings within >> them. >> >> I guess the argument then would be what the default value of 'filter_by' >> might be... but that seems less important to me if there were an option to >> pass a predicate as you liked. >> >> >> >> On Fri, Oct 11, 2013 at 7:57 PM, Andrew Barnert wrote: >> >>> On Oct 11, 2013, at 19:48, David Mertz wrote: >>> >>> > My feeling, however, is that if one were to trim down the results from >>> a permutations-related function, it is more interesting to me to only >>> eliminate IDENTICAL items, not to eliminate merely EQUAL ones. 
>>> >>> I agree with the rest of your message, but I still think you're wrong >>> here. Anyone who is surprised by distinct_permutations((3.0, 3)) treating >>> the two values the same would be equally surprised by {3.0, 3} having only >>> one member. Or by groupby((3.0, 'a'), (3, 'b')) only having one group. And >>> so on. >>> >>> In Python, sets, dict keys, groups, etc. work by ==. That was a choice >>> that could have been made differently, but Python made that choice long >>> ago, and has applied it completely consistently, and it would be very >>> strange to choose differently in this case. >> >> >> >> >> -- >> Keeping medicines from the bloodstreams of the sick; food >> from the bellies of the hungry; books from the hands of the >> uneducated; technology from the underdeveloped; and putting >> advocates of freedom in prisons. Intellectual property is >> to the 21st century what the slave trade was to the 16th. >> > > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Sat Oct 12 09:17:43 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 12 Oct 2013 03:17:43 -0400 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: <20131012063445.GI7989@ando> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> Message-ID: I'm sorry, but I can't find a reference supporting the statement that the current permutations function is consistent with the mathematical definition. Perhaps you would like to find a reference? A quick search yielded the book "the Combinatorics of Permutations": http://books.google.ca/books?id=Op-nF-mBR7YC&lpg=PP1 Please look in the chapter "Permutation of multisets". Best, Neil On Sat, Oct 12, 2013 at 2:34 AM, Steven D'Aprano wrote: > On Fri, Oct 11, 2013 at 10:55:06PM -0400, Neil Girdhar wrote: > > I honestly think that Python should stick to the mathematical definition > of > > permutations rather than some kind of consensus of the tiny minority of > > people here. > > So do I. And that is exactly what itertools.permutations already does. > > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > -------------- next part -------------- An HTML attachment was scrubbed... 
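[Sketch for reference, not a message from the thread: one way MRAB's recursive approach could grow the key= parameter discussed above; the names and signature are illustrative. Keys must be hashable, and the default key deduplicates by equality, matching dict/set semantics.]

def unique_permutations(iterable, r=None, key=None):
    items = list(iterable)
    if r is None:
        r = len(items)
    if key is None:
        key = lambda item: item   # default: deduplicate by equality

    def perm(pool, r):
        # Recursively build r-tuples, skipping any item whose key has
        # already been tried at the current position.
        if r == 0:
            yield ()
            return
        seen = set()
        for i, item in enumerate(pool):
            k = key(item)
            if k in seen:
                continue
            seen.add(k)
            for rest in perm(pool[:i] + pool[i + 1:], r - 1):
                yield (item,) + rest

    yield from perm(items, r)

# e.g. list(unique_permutations('aab'))
# -> [('a', 'a', 'b'), ('a', 'b', 'a'), ('b', 'a', 'a')]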
URL: From steve at pearwood.info Sat Oct 12 09:35:31 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 12 Oct 2013 18:35:31 +1100 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> Message-ID: <20131012073531.GJ7989@ando> On Fri, Oct 11, 2013 at 10:37:02PM -0400, Neil Girdhar wrote: > I think it's pretty indisputable that permutations are formally defined > this way (and I challenge you to find a source that doesn't agree with > that). If by "this way" you mean "unique permutations only", then yes it *completely* disputable, and I am doing so right now. I'm not arguing one way or the other for a separate "unique_permutations" generator, just that the existing permutations generator does the right thing. If you're satisfied with that answer, you can stop reading now, because the rest of my post is going to be rather long: TL;DR: If you want a unique_permutations generator, that's a reasonable request. If you insist on changing permutations, that's unreasonable, firstly because the current behaviour is correct, and secondly because backwards compatibility would constrain it to keep the existing behaviour even if it were wrong. . . . Still here? Okay then, let me justify why I say the current behaviour is correct. Speaking as a math tutor who has taught High School level combinatorics for 20+ years, I've never come across any text book or source that defines permutations in terms of unique permutations only. In every case that I can remember, or that I still have access to, unique permutations is considered a different kind of operation ("permutations ignoring duplicates", if you like) rather than the default. E.g. "Modern Mathematics 6" by Fitzpatrick and Galbraith has a separate section for permutations with repetition, gives the example of taking permutations from the word "MAMMAL", and explicitly contrasts situations where you consider the three letters M as "different" from when you consider them "the same". But in all such cases, such a situation is discussed as a restriction on permutations, not an expansion, that is: * there are permutations; * sometimes you want to only consider unique permutations; rather than: * there are permutations, which are always unique; * sometimes you want to consider things which are like permutations except they're not necessarily unique. I'd even turn this around and challenge you to find a source that *does* define them as always unique. Here's a typical example, from the Collins Dictionary of Mathematics: [quote] **permutation** or **ordered arrangement** n. 1 an ordered arrangement of a specified number of objects selected from a set. The number of distinct permutations of r objects from n is n!/(n-r)! usually written n P r or n P r. For example there are six distinct permutations of two objects selected out of three: <1,2>, <1,3>, <2,1>, <2,3>, <3,1>, <3,2>. Compare COMBINATION. 2. any rearrangement of all the elements of a finite sequence, such as (1,3,2) and (3,1,2). It is *odd* or *even* according as the number of exchanges of position yielding it from the original order is odd or even. It is a *cyclic permutation* if it merely advances all the elements a fixed number of places; that is, if it is a CYCLE of maximal LENGTH. A *transposition* is a cycle of degree two, and all permutations factor as products of transpositions. See also SIGNATURE. 3. 
any BIJECTION of a set to itself, where the set may be finite or infinite. [end quote] The definition makes no comment about how to handle duplicate elements, but we can derive an answer for that: 1) We're told how many permutations there are. Picking r elements out of n gives us n!/(n-r)!. If you throw away duplicate permutations, you will fall short. 2) The number of permutations shouldn't depend on the specific entities being permuted. Permutations of (1, 2, 3, 4) and (A, B, C, D) should be identical. If your set of elements contains duplicates, such as (Red ball, Red ball, Red ball, Black ball, Black ball), we can put the balls into 1:1 correspondence with integers (1, 2, 3, 4, 5), permute the integers, then reverse the mapping to get balls again. If we do this, we ought to get the same result as just permuting the balls directly. (That's not to say that there are never cases where we don't care to distinguish betweem one red ball and another. But in general we do distinguish between them.) I think this argument may hinge on what you consider *distinct*. In this context, if I permute the string "RRRBB", I consider all three characters to be distinct. Object identity is an implementation detail (not all programming languages have "objects"); even equality is an irrelevant detail. If I'm choosing to permute "RRRBB" rather than "RB", then clearly *to me* there must be some distinguishing factor between the three Rs and two Bs. Another source is Wolfram Mathworld: http://mathworld.wolfram.com/Permutation.html which likewise says nothing about discarding repeated permutations when there are repeated elements. See also their page on "Ball Picking": http://mathworld.wolfram.com/BallPicking.html Last but not least, here's a source which clearly distinguishes permutations from "permutations with duplicates": http://mathcentral.uregina.ca/QQ/database/QQ.09.07/h/beth3.html and even gives a distinct formula for calculating the number of permutations. Neither Wolfram Mathworld nor the Collins Dictionary of Maths consider this formula important enough to mention, which suggests strongly that it should be considered separate from the default permutations. (A little like cyclic permutations, which are different again.) -- Steven From steve at pearwood.info Sat Oct 12 09:39:30 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 12 Oct 2013 18:39:30 +1100 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: <20131012073531.GJ7989@ando> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012073531.GJ7989@ando> Message-ID: <20131012073930.GK7989@ando> On Sat, Oct 12, 2013 at 06:35:31PM +1100, Steven D'Aprano wrote: > I think this argument may hinge on what you consider *distinct*. In this > context, if I permute the string "RRRBB", I consider all three > characters to be distinct. /s/three/five/ -- Steven From bauertomer at gmail.com Sat Oct 12 10:18:35 2013 From: bauertomer at gmail.com (TB) Date: Sat, 12 Oct 2013 11:18:35 +0300 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: <20131012073531.GJ7989@ando> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012073531.GJ7989@ando> Message-ID: <525905DB.8070509@gmail.com> On 10/12/2013 10:35 AM, Steven D'Aprano wrote: > If you want a unique_permutations generator, that's a reasonable > request. 
If you insist on changing permutations, that's unreasonable, > firstly because the current behaviour is correct, and secondly because > backwards compatibility would constrain it to keep the existing > behaviour even if it were wrong. > I agree that backwards compatibility should be kept, but the current behaviour of itertools.permutations is (IMHO) surprising. So here are my 2c: Until I tried it myself, I was sure that it will be like the corresponding permutations functions in Sage: sage: list(Permutations("aba")) [['a', 'a', 'b'], ['a', 'b', 'a'], ['b', 'a', 'a']] or Mathematica: http://www.wolframalpha.com/input/?i=permutations+of+{a%2C+b%2C+a} Currently the docstring of itertools.permutations just says "Return successive r-length permutations of elements in the iterable", without telling what happens with input of repeated elements. The full doc in the reference manual is better in that regard, but I think at least one example with repeated elements would be nice. Regards, TB From mistersheik at gmail.com Sat Oct 12 10:20:24 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 12 Oct 2013 04:20:24 -0400 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: <525905DB.8070509@gmail.com> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012073531.GJ7989@ando> <525905DB.8070509@gmail.com> Message-ID: +1 On Sat, Oct 12, 2013 at 4:18 AM, TB wrote: > On 10/12/2013 10:35 AM, Steven D'Aprano wrote: > >> If you want a unique_permutations generator, that's a reasonable >> request. If you insist on changing permutations, that's unreasonable, >> firstly because the current behaviour is correct, and secondly because >> backwards compatibility would constrain it to keep the existing >> behaviour even if it were wrong. >> >> I agree that backwards compatibility should be kept, but the current > behaviour of itertools.permutations is (IMHO) surprising. > > So here are my 2c: Until I tried it myself, I was sure that it will be > like the corresponding permutations functions in Sage: > > sage: list(Permutations("aba")) > [['a', 'a', 'b'], ['a', 'b', 'a'], ['b', 'a', 'a']] > > or Mathematica: http://www.wolframalpha.com/**input/?i=permutations+of+{a% > **2C+b%2C+a} > > Currently the docstring of itertools.permutations just says "Return > successive r-length permutations of elements in the iterable", without > telling what happens with input of repeated elements. The full doc in the > reference manual is better in that regard, but I think at least one example > with repeated elements would be nice. > > Regards, > TB > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/**mailman/listinfo/python-ideas > > -- > > --- You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit https://groups.google.com/d/** > topic/python-ideas/**dDttJfkyu2k/unsubscribe > . > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe@**googlegroups.com > . > For more options, visit https://groups.google.com/**groups/opt_out > . > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abarnert at yahoo.com Sat Oct 12 10:22:59 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 12 Oct 2013 01:22:59 -0700 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> Message-ID: On Oct 11, 2013, at 23:55, Neil Girdhar wrote: > I think my best proposal is to have an efficient distinct_permutations function in itertools. It should be in itertools so that it is discoverable. It should be a function rather one of the recipes proposed to make it as efficient as possible. (Correct me if I'm wrong, but like the set solution, groupby is also not so efficient.) > > I welcome the discussion and hope that the most efficient implementation someone here comes up with will be added one day to itertools. I think getting something onto PyPI (whether as part of more-itertools or elsewhere) and/or the ActiveState recipes (and maybe StackOverflow and CodeReview) is the best way to get from here to there. Continuing to discuss it here, you've only got the half dozen or so people who are on this list and haven't tuned out this thread to come up with the most efficient implementation. Put it out in the world and people will begin giving you comments/bug reports/rants calling you an idiot for missing the obvious more efficient way to do it, and then you can use their code. And then, when you're satisfied with it, you have a concrete proposal for something to add to itertools in python X.Y+1 instead of some implementation to be named later to add one day. I was also going to suggest that you drop the argument about whether this is the one true definition of sequence permutation and just focus on whether it's a useful thing to have, but it looks like you're way ahead of me there, so never mind. From breamoreboy at yahoo.co.uk Sat Oct 12 10:28:55 2013 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 12 Oct 2013 09:28:55 +0100 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: <525905DB.8070509@gmail.com> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012073531.GJ7989@ando> <525905DB.8070509@gmail.com> Message-ID: On 12/10/2013 09:18, TB wrote: > Currently the docstring of itertools.permutations just says "Return > successive r-length permutations of elements in the iterable", without > telling what happens with input of repeated elements. The full doc in > the reference manual is better in that regard, but I think at least one > example with repeated elements would be nice. > > Regards, > TB I look forward to seeing your suggested doc patch on the Python bug tracker. -- Roses are red, Violets are blue, Most poems rhyme, But this one doesn't. Mark Lawrence From tjreedy at udel.edu Sat Oct 12 10:41:48 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 12 Oct 2013 04:41:48 -0400 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: <20131012073531.GJ7989@ando> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012073531.GJ7989@ando> Message-ID: On 10/12/2013 3:35 AM, Steven D'Aprano wrote: > I'd even turn this around and challenge you to find a source that *does* > define them as always unique. Here's a typical example, from the Collins > Dictionary of Mathematics: > > > [quote] > **permutation** or **ordered arrangement** n. 1 an ordered arrangement > of a specified number of objects selected from a set. 
The number of > distinct permutations of r objects from n is > > n!/(n-r)! > > usually written nPr or ⁿPᵣ. For example there are six distinct permutations of two > objects selected out of three: <1,2>, <1,3>, <2,1>, <2,3>, <3,1>, <3,2>. > Compare COMBINATION. The items of a set are, by definition of a set, distinct, so the question of different but equal permutations does not arise. > 2. any rearrangement of all the elements of a finite sequence, such as > (1,3,2) and (3,1,2). It is *odd* or *even* according as the number of > exchanges of position yielding it from the original order is odd or > even. It is a *cyclic permutation* if it merely advances all the > elements a fixed number of places; that is, if it is a CYCLE of maximal > LENGTH. A *transposition* is a cycle of degree two, and all permutations > factor as products of transpositions. See also SIGNATURE. The items of a sequence may be duplicates. But in the treatments of permutations I have seen (admittedly not all of them), they are considered to be distinguished by position, so that one may replace the item by counts 1 to n and vice versa. > 3. any BIJECTION of a set to itself, where the set may be finite or > infinite. > [end quote] Back to a set of distinct items again. You are correct that itertools.permutations does the right thing by standard definition. > Last but not least, here's a source which clearly distinguishes > permutations from "permutations with duplicates": > > http://mathcentral.uregina.ca/QQ/database/QQ.09.07/h/beth3.html > > and even gives a distinct formula for calculating the number of > permutations. Neither Wolfram Mathworld nor the Collins Dictionary of > Maths consider this formula important enough to mention, which suggests > strongly that it should be considered separate from the default > permutations. The question is whether this particular variation is important enough to put in itertools. It is not a combinatorics module and did not start with permutations. -- Terry Jan Reedy From ncoghlan at gmail.com Sat Oct 12 17:07:58 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 13 Oct 2013 01:07:58 +1000 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> Message-ID: On 12 Oct 2013 17:18, "Neil Girdhar" wrote: > > I'm sorry, but I can't find a reference supporting the statement that > the current permutations function is consistent with the mathematical > definition. Perhaps you would like to find a reference? A quick search > yielded the book "the Combinatorics of Permutations": > http://books.google.ca/books?id=Op-nF-mBR7YC&lpg=PP1 Please look in the > chapter "Permutation of multisets". Itertools effectively produces the permutation of (index, value) pairs. Hence Steven's point about the permutations of a list not being mathematically defined, so you have to decide what set to map it to in order to decide what counts as a unique value. The mapping itertools uses considers position in the iterable relevant, so exchanging two values that are themselves equivalent is still considered a distinct permutation, since their original position is taken into account.
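To see that concretely (just an interactive illustration of the behaviour being described, nothing new):

>>> from itertools import permutations
>>> list(permutations('aba', 2))
[('a', 'b'), ('a', 'a'), ('b', 'a'), ('b', 'a'), ('a', 'a'), ('a', 'b')]

The two 'a' values come from different positions, so swapping them still counts as a separate permutation, which is why every result appears twice.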
Like a lot of mathematics, it's a matter of paying close attention to which entities are actually being manipulated and how the equivalence classes are being defined :) Hence the current proposal amounts to adding another variant that provides the permutations of an unordered multiset instead of those of a set of (index, value) 2-tuples (with the indices stripped from the results). One interesting point is that combining collections.Counter.elements() with itertools.permutations() currently does the wrong thing, since itertools.permutations() *always* considers iterable order significant, while for collections.Counter.elements() it's explicitly arbitrary. Cheers, Nick. > > Best, > > Neil > > > On Sat, Oct 12, 2013 at 2:34 AM, Steven D'Aprano wrote: >> >> On Fri, Oct 11, 2013 at 10:55:06PM -0400, Neil Girdhar wrote: >> > I honestly think that Python should stick to the mathematical definition of >> > permutations rather than some kind of consensus of the tiny minority of >> > people here. >> >> So do I. And that is exactly what itertools.permutations already does. >> >> >> >> -- >> Steven >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> -- >> >> --- >> You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group. >> To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. >> To unsubscribe from this group and all its topics, send an email to python-ideas+unsubscribe at googlegroups.com. >> For more options, visit https://groups.google.com/groups/opt_out. > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Sat Oct 12 18:55:31 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 12 Oct 2013 17:55:31 +0100 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> <5258AC0B.1090603@mrabarnett.plus.com> Message-ID: <52597F03.90509@mrabarnett.plus.com> On 12/10/2013 03:36, David Mertz wrote: > Hi MRAB, > > I'm confused by your implementation. In particular, what do these lines do? > > # [...] > items = list(iterable) > keys = {} > for item in items: > keys.setdefault(item, len(keys)) > items.sort(key=keys.get) > > I cannot understand how these can possibly have any effect (other than > the first line that makes a concrete list out of an iterable). > > We loop through the list in its natural order. E.g. say the list is > '[a, b, c]' (where those names are any types of objects whatsoever). > The loop gives us: > > keys == {a:0, b:1, c:2} > > When we do a sort on 'key=keys.get()' how can that ever possibly change > the order of 'items'? > You're assuming that no item is equal to any other. Try this: keys = {} for item in [1, 2, 2.0]: keys.setdefault(item, len(keys)) You'll get: keys == {1: 0, 2: 1} because 2 == 2.0. 
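Following that through with a slightly longer input shows what the sort is for (a throwaway illustration, not part of the implementation itself):

items = [2.0, 1, 2, 1.0]
keys = {}
for item in items:
    keys.setdefault(item, len(keys))
items.sort(key=keys.get)
print(items)   # [2.0, 2, 1, 1.0]

Equal items end up adjacent, ordered by first appearance, which is the property the rest of the implementation relies on to skip duplicates.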
> There's also a bit of a flaw in that your implementation blows up if > anything yielded by iterable isn't hashable: > > >>> list(unique_permutations([ [1,2],[3,4],[5,6] ])) > Traceback (most recent call last): > File "", line 1, in > TypeError: unhashable type: 'list' > That is true, so here is yet another implementation: ----8<----------------------------------------8<---- def distinct_permutations(iterable, count=None): def perm(items, count): if count: prev_item = object() for i, item in enumerate(items): if item != prev_item: for p in perm(items[ : i] + items[i + 1 : ], count - 1): yield [item] + p prev_item = item else: yield [] hashable_items = {} unhashable_items = [] for item in iterable: try: hashable_items[item].append(item) except KeyError: hashable_items[item] = [item] except TypeError: for key, values in unhashable_items: if key == item: values.append(item) break else: unhashable_items.append((item, [item])) items = [] for values in hashable_items.values(): items.extend(values) for key, values in unhashable_items: items.extend(values) if count is None: count = len(items) yield from perm(items, count) ----8<----------------------------------------8<---- It uses a dict for speed, with the fallback of a list for unhashable items. > > >>> list(permutations([[1,2],[3,4],[5,6]])) > [([1, 2], [3, 4], [5, 6]), ([1, 2], [5, 6], [3, 4]), ([3, 4], [1, > 2], [5, 6]), > ([3, 4], [5, 6], [1, 2]), ([5, 6], [1, 2], [3, 4]), ([5, 6], [3, > 4], [1, 2])] > > This particular one also succeeds with my nonredundant_permutations: > > >>> list(nonredundant_permutations([[1,2],[3,4],[5,6]])) > [([1, 2], [3, 4], [5, 6]), ([1, 2], [5, 6], [3, 4]), ([3, 4], [1, > 2], [5, 6]), > ([3, 4], [5, 6], [1, 2]), ([5, 6], [1, 2], [3, 4]), ([5, 6], [3, > 4], [1, 2])] > My result is: >>> list(distinct_permutations([[1,2],[3,4],[5,6]])) [[[1, 2], [3, 4], [5, 6]], [[1, 2], [5, 6], [3, 4]], [[3, 4], [1, 2], [5, 6]], [[3, 4], [5, 6], [1, 2]], [[5, 6], [1, 2], [3, 4]], [[5, 6], [3, 4], [1, 2]]] > However, my version *DOES* fail when things cannot be compared under > inequality: > > >>> list(nonredundant_permutations([[1,2],3,4])) > Traceback (most recent call last): > File "", line 1, in > File "", line 3, in nonredundant_permutations > TypeError: unorderable types: int() < list() > > This also doesn't afflict itertools.permutations. > My result is: >>> list(distinct_permutations([[1,2],3,4])) [[3, 4, [1, 2]], [3, [1, 2], 4], [4, 3, [1, 2]], [4, [1, 2], 3], [[1, 2], 3, 4], [[1, 2], 4, 3]] From mertz at gnosis.cx Sat Oct 12 18:56:13 2013 From: mertz at gnosis.cx (David Mertz) Date: Sat, 12 Oct 2013 09:56:13 -0700 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> Message-ID: On Sat, Oct 12, 2013 at 8:07 AM, Nick Coghlan wrote: > One interesting point is that combining collections.Counter.elements() > with itertools.permutations() currently does the wrong thing, since > itertools.permutations() *always* considers iterable order significant, > while for collections.Counter.elements() it's explicitly arbitrary. 
> I hadn't thought about it, but as I read the docs for 3.4 (and it's the same back through 2.7), not only would both of these be permissible in a Python implementation: >>> list(collections.Counter({'a':2,'b':1}).elements()) ['a', 'a', 'b'] Or: >>> list(collections.Counter({'a':2,'b':1}).elements()) ['b', 'a', 'a'] But even this would be per documentation (although really unlikely as an implementation): >>> list(collections.Counter({'a':2,'b':1}).elements()) ['a', 'b', 'a'] -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From raymond.hettinger at gmail.com Sat Oct 12 19:34:26 2013 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sat, 12 Oct 2013 10:34:26 -0700 Subject: [Python-ideas] An exhaust() function for iterators In-Reply-To: References: Message-ID: <1657273B-685C-4335-A9E4-5DF5775DE620@gmail.com> On Sep 28, 2013, at 9:06 PM, Clay Sweetser wrote: > > As it turns out, the fastest and most efficient method available in > the standard library is collections.deque's __init__ and extend > methods. That technique is shown in the itertools docs in the consume() recipe. It is the fastest way in CPython (in PyPy, a straight for-loop will likely be the fastest). I didn't immortalize it as a real itertool because I think most code is better-off with a straight for-loop. The itertools were inspired by functional languages and intended to be used in a functional style where iterators with side-effects would be considered bad form. A regular for-loop is only a little bit slower, but it has a number of virtues including clarity, signal checking, and thread switching. In a real application, the speed difference of consume() vs a for-loop is likely to be insignificant if the iterator is doing anything interesting at all. Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Sat Oct 12 20:56:55 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 12 Oct 2013 14:56:55 -0400 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> Message-ID: Yes, you're right and I understand what's been done although like the 30 upvoters to the linked stackoverflow question, I find the current behaviour surprising and would like to see a distinct_permutations function. How do I start to submit a patch? Neil On Sat, Oct 12, 2013 at 11:07 AM, Nick Coghlan wrote: > > On 12 Oct 2013 17:18, "Neil Girdhar" wrote: > > > > I'm sorry, but I can't find a reference supporting the statement that > the current permutations function is consistent with the mathematical > definition. Perhaps you would like to find a reference? A quick search > yielded the book "the Combinatorics of Permutations": > http://books.google.ca/books?id=Op-nF-mBR7YC&lpg=PP1 Please look in the > chapter "Permutation of multisets". > > Itertools effectively produces the permutation of (index, value) pairs. > Hence Steven's point about the permutations of a list not being > mathematically defined, so you have to decide what set to map it to in > order to decide what counts as a unique value. 
The mapping itertools uses > considers position in the iterable relevant so exchanging two values that > are themselves equivalent is still considered a distinct permutation since > their original position is taken into account. Like a lot of mathematics, > it's a matter of paying close attention to which entities are actually > being manipulated and how the equivalence classes are being defined :) > > Hence the current proposal amounts to adding another variant that provides > the permutations of an unordered multiset instead of those of a set of > (index, value) 2-tuples (with the indices stripped from the results). > > One interesting point is that combining collections.Counter.elements() > with itertools.permutations() currently does the wrong thing, since > itertools.permutations() *always* considers iterable order significant, > while for collections.Counter.elements() it's explicitly arbitrary. > > Cheers, > Nick. > > > > > Best, > > > > Neil > > > > > > On Sat, Oct 12, 2013 at 2:34 AM, Steven D'Aprano > wrote: > >> > >> On Fri, Oct 11, 2013 at 10:55:06PM -0400, Neil Girdhar wrote: > >> > I honestly think that Python should stick to the mathematical > definition of > >> > permutations rather than some kind of consensus of the tiny minority > of > >> > people here. > >> > >> So do I. And that is exactly what itertools.permutations already does. > >> > >> > >> > >> -- > >> Steven > >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > >> https://mail.python.org/mailman/listinfo/python-ideas > >> > >> -- > >> > >> --- > >> You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > >> To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. > >> To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > >> For more options, visit https://groups.google.com/groups/opt_out. > > > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From raymond.hettinger at gmail.com Sun Oct 13 02:44:38 2013 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sat, 12 Oct 2013 17:44:38 -0700 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> Message-ID: On Oct 12, 2013, at 11:56 AM, Neil Girdhar wrote: > , I find the current behaviour surprising and would like to see a distinct_permutations function. > How do I start to submit a patch? You can submit your patch at http://bugs.python.org and assign it to me (the module designer and maintainer). That said, the odds of it being accepted are slim. There are many ways to write combinatoric functions (Knuth has a whole book on the subject) and I don't aspire to include multiple variants unless there are strong motivating use cases. 
In general, if someone wants to eliminate duplicates from the population, they can do so easily with: permutations(set(population), n) The current design solves the most common use cases and it has some nice properties such as: * permutations is a subsequence of product * no assumptions are made about the comparability or orderability of members of the population * len(list(permutations(range(n), r))) == n! / (n-r)! just like you were taught in school * it is fast For more exotic needs, I think it is appropriate to look outside the standard library to more full-featured combinatoric libraries (there are several listed at pypi.python.org). Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Sun Oct 13 03:24:36 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 12 Oct 2013 21:24:36 -0400 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> Message-ID: Hi Raymond, I agree with you on the consistency point with itertools.product. That's a great point. However, permutations(set(population)) is not the correct way to take the permutations of a multiset. Please take a look at how permutations are taken from a multiset in any of the papers I linked or any paper that you can find on the internet. The number of permutations of a multiset is n! / \prod a_i!, where the a_i are the element counts -- just like I was taught in school. There is currently no fast way to find these permutations of a multiset and it is a common operation for solving problems. What is needed, I think, is a function multiset_permutations that accepts an iterable. Best, Neil On Sat, Oct 12, 2013 at 8:44 PM, Raymond Hettinger < raymond.hettinger at gmail.com> wrote: > > On Oct 12, 2013, at 11:56 AM, Neil Girdhar wrote: > > , I find the current behaviour surprising and would like to see a > distinct_permutations function. > > How do I start to submit a patch? > > > You can submit your patch at http://bugs.python.org and assign it to me > (the module designer and maintainer). > > That said, the odds of it being accepted are slim. > There are many ways to write combinatoric functions > (Knuth has a whole book on the subject) and I don't > aspire to include multiple variants unless there are > strong motivating use cases. > > In general, if someone wants to eliminate duplicates > from the population, they can do so easily with: > > permutations(set(population), n) > > The current design solves the most common use cases > and it has some nice properties such as: > * permutations is a subsequence of product > * no assumptions are made about the comparability > or orderability of members of the population > * len(list(permutations(range(n), r))) == n! / (n-r)! > just like you were taught in school > * it is fast > > For more exotic needs, I think it is appropriate to look > outside the standard library to more full-featured > combinatoric libraries (there are several listed at > pypi.python.org). > > > Raymond > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Sun Oct 13 03:11:18 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 12 Oct 2013 18:11:18 -0700 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> Message-ID: <5259F336.9070203@stoneleaf.us> On 10/12/2013 05:44 PM, Raymond Hettinger wrote: > > On Oct 12, 2013, at 11:56 AM, Neil Girdhar > wrote: > >> , I find the current behaviour surprising and would like to see a distinct_permutations function. >> How do I start to submit a patch? > > You can submit your patch at http://bugs.python.org and assign it to me (the module designer and maintainer). > > That said, the odds of it being accepted are slim. +1 About the only improvement I can see would be a footnote in the itertools doc table that lists the different combinatorics. Being a naive permutations user myself I would have made the mistake of thinking that "r-length tuples, all possible orderings, no repeated elements" meant no repeated values. The longer text for permutations makes it clear how it works. My rst-foo is not good enough to link from the table down into the permutation text where the distinction is made clear. If no one beats me to a proposed patch I'll see if I can figure it out. -- ~Ethan~ From steve at pearwood.info Sun Oct 13 03:47:42 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 13 Oct 2013 12:47:42 +1100 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> Message-ID: <20131013014742.GR7989@ando> On Sat, Oct 12, 2013 at 05:44:38PM -0700, Raymond Hettinger wrote: > In general, if someone wants to eliminate duplicates > from the population, they can do so easily with: > > permutations(set(population), n) In fairness Raymond, the proposal is not to eliminate duplicates from the population, but from the permutations themselves. Consider the example I gave earlier, where you're permuting "RRRBB" two items at a time. There are 20 permutations including duplicates, but sixteen of them are repeated: py> list(''.join(t) for t in permutations("RRRBB", 2)) ['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB'] py> set(''.join(t) for t in permutations("RRRBB", 2)) {'BR', 'RB', 'RR', 'BB'} But if you eliminate duplicates from the population first, you get only two permutations: py> list(''.join(t) for t in permutations(set("RRRBB"), 2)) ['BR', 'RB'] If it were just a matter of calling set() on the output of permutations, that would be trivial enough. But, you might care about order, or elements might not be hashable, or you might have a LOT of permutations to generate before discarding: population = "R"*1000 + "B"*500 set(''.join(t) for t in permutations(population, 2)) # takes a while... In my opinion, if unique_permutations is no more efficient than calling set on the output of permutations, it's not worth it. But if somebody can come up with an implementation which is significantly more efficient, without making unreasonable assumptions about orderability, hashability or even comparability, then in my opinion that might be worthwhile. 
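Just to give that a concrete shape, here is one rough, untested sketch of the kind of thing I mean (note that it leans on hashability, which is exactly the sort of assumption I'd prefer to avoid, but it never builds a duplicate result in the first place):

from collections import Counter

def unique_permutations(iterable, r=None):
    # Recurse over element counts, so equal results are never generated twice.
    counts = Counter(iterable)
    n = sum(counts.values())
    r = n if r is None else r
    if r > n:
        return
    def gen(r):
        if r == 0:
            yield ()
            return
        for elem in counts:
            if counts[elem]:
                counts[elem] -= 1
                for rest in gen(r - 1):
                    yield (elem,) + rest
                counts[elem] += 1
    yield from gen(r)

py> sorted(''.join(t) for t in unique_permutations("RRRBB", 2))
['BB', 'BR', 'RB', 'RR']

Only the four distinct results are ever generated, instead of twenty that mostly get thrown away.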
-- Steven From raymond.hettinger at gmail.com Sun Oct 13 05:03:43 2013 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sat, 12 Oct 2013 20:03:43 -0700 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: <20131013014742.GR7989@ando> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> <20131013014742.GR7989@ando> Message-ID: On Oct 12, 2013, at 6:47 PM, Steven D'Aprano wrote: > the proposal is not to eliminate duplicates from > the population, but from the permutations themselves. I'm curious about the use cases for this. Other than red/blue marble examples and some puzzle problems, does this come-up in any real problems? Do we actually need this? Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Sun Oct 13 09:38:40 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sun, 13 Oct 2013 03:38:40 -0400 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> <20131013014742.GR7989@ando> Message-ID: My intuition is that we want Python to be "complete". Many other languages can find the permutations of a multiset. Python has a permutations function. Many people on stackoverflow expected that function to be able to find those permutations. One suggestion: Why not make it so that itertools.permutations checks if its argument is an instance of collections.Mapping? If it is, we could interpret the items as a mapping from elements to positive integers, which is a compact representation of a multiset. Then, it could do the right thing for that case. Best, Neil On Sat, Oct 12, 2013 at 11:03 PM, Raymond Hettinger < raymond.hettinger at gmail.com> wrote: > > On Oct 12, 2013, at 6:47 PM, Steven D'Aprano wrote: > > the proposal is not to eliminate duplicates from > the population, but from the permutations themselves. > > > I'm curious about the use cases for this. > Other than red/blue marble examples and some puzzle problems, > does this come-up in any real problems? Do we actually need this? > > > Raymond > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Oct 13 11:27:54 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 13 Oct 2013 19:27:54 +1000 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> <20131013014742.GR7989@ando> Message-ID: On 13 October 2013 17:38, Neil Girdhar wrote: > My intuition is that we want Python to be "complete". Many other languages > can find the permutations of a multiset. Python has a permutations > function. 
Many people on stackoverflow expected that function to be able to > find those permutations. Nope, we expressly *don't* want the standard library to be "complete", because that would mean growing to the size of PyPI (or larger). There's always going to be scope for applications to adopt new domain specific dependencies with more in-depth support than that provided by the standard library. Many standard library modules are in fact deliberately designed as "stepping stone" modules that will meet the needs of code which have an incidental relationship to that task, but will need to be replaced with something more sophisticated for code directly related to that domain. Many times, that means they will ignore as irrelevant distinctions that are critical in certain contexts, simply because they don't come up all that often outside those specific domains, and addressing them involves making the core module more complicated to use for more typical cases. In this case, the proposed alternate permutations mechanism only makes a difference when: 1. The data set contains equivalent values 2. Input order is not considered significant, so exchanging equivalent values should *not* create a new permutation (i.e. multiset permutations rather than sequence permutations). If users aren't likely to encounter situations where that makes a difference, then providing both in the standard library isn't being helpful, it's being actively user hostile by asking them to make a decision they're not yet qualified to make for the sake of the few experts that specifically need . Hence Raymond's request for data modelling problems outside the "learning or studying combinatorics" context to make the case for standard library inclusion. Interestingly, I just found another language which has the equivalent of the currrent behaviour of itertools.permutations: Haskell has it as Data.List.permutations. As far as I can tell, Haskell doesn't offer support for multiset permutations in the core, you need an additional package like Math.Combinatorics (see: http://hackage.haskell.org/package/multiset-comb-0.2.3/docs/Math-Combinatorics-Multiset.html#g:4). Since iterator based programming in Python is heavily inspired by Haskell, this suggests that the current behaviour of itertools.permutations is appropriate and that Raymond is right to be dubious about including multiset permutations support directly in the standard library. Those interested in improving the experience of writing combinatorics code in Python may wish to look into helping out with the combinatorics package on PyPI: http://phillipmfeldman.org/Python/for_developers.html (For example, politely approach Phillip to see if he is interested in hosting it on GitHub or BitBucket, providing Sphinx docs on ReadTheDocs, improving the PyPI metadata, etc - note I have no experience with this package, it's just the first hit for "python combinatorics") > One suggestion: Why not make it so that itertools.permutations checks if its > argument is an instance of collections.Mapping? If it is, we could > interpret the items as a mapping from elements to positive integers, which > is a compact representation of a multiset. Then, it could do the right > thing for that case. 
If you want to go down the path of only caring about hashable values, you may want to argue for a permutations method on collections.Counter (it's conceivable that approach has the potential to be even faster than an approach based on accepting and processing an arbitrary iterable, since it can avoid generating repeated values in the first place). A Counter based multiset permutation algorithm was actually posted to python-list back in 2009, just after collections.Counter was introduced: https://mail.python.org/pipermail/python-list/2009-January/521685.html I just created an updated version of that recipe and posted it as https://bitbucket.org/ncoghlan/misc/src/default/multiset_permutations.py Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From breamoreboy at yahoo.co.uk Sun Oct 13 13:05:30 2013 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sun, 13 Oct 2013 12:05:30 +0100 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> <20131013014742.GR7989@ando> Message-ID: On 13/10/2013 08:38, Neil Girdhar wrote: > My intuition is that we want Python to be "complete". No thank you. I much prefer "Python in a Nutshell" the size it is now, I'm not interested in competing with (say) "Java in a Nutshell". -- Roses are red, Violets are blue, Most poems rhyme, But this one doesn't. Mark Lawrence From oscar.j.benjamin at gmail.com Sun Oct 13 17:54:16 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Sun, 13 Oct 2013 16:54:16 +0100 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> Message-ID: On 11 October 2013 22:38, Neil Girdhar wrote: > My code, which was the motivation for this suggestion: > > import itertools as it > import math > > def is_prime(n): > for i in range(2, int(math.floor(math.sqrt(n))) + 1): > if n % i == 0: > return False > return n >= 2 I don't really understand what your code is doing but I just wanted to point out that the above will fail for large integers (maybe not relevant in your case): >>> is_prime(2**19937-1) Traceback (most recent call last): File "", line 1, in File "tmp.py", line 3, in is_prime for i in range(2, int(math.floor(math.sqrt(n))) + 1): OverflowError: long int too large to convert to float Even without the OverflowError I suspect that there are primes p > ~1e16 such that is_prime(p**2) would incorrectly return True. This is a consequence of depending on FP arithmetic in what should be exact computation. The easy fix is to break when i**2 > n avoiding the tricky sqrt operation. Alternatively you can use an exact integer sqrt function to fix this: def sqrt_floor(y): try: x = int(math.sqrt(y)) except OverflowError: x = y while not (x ** 2 <= y < (x+1) ** 2): x = (x + y // x) // 2 return x Oscar From mistersheik at gmail.com Sun Oct 13 20:29:38 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sun, 13 Oct 2013 14:29:38 -0400 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> Message-ID: Did you read the problem? Anyway, let's not get off topic (permutations). 
Neil On Sun, Oct 13, 2013 at 11:54 AM, Oscar Benjamin wrote: > On 11 October 2013 22:38, Neil Girdhar wrote: > > My code, which was the motivation for this suggestion: > > > > import itertools as it > > import math > > > > def is_prime(n): > > for i in range(2, int(math.floor(math.sqrt(n))) + 1): > > if n % i == 0: > > return False > > return n >= 2 > > I don't really understand what your code is doing but I just wanted to > point out that the above will fail for large integers (maybe not > relevant in your case): > > >>> is_prime(2**19937-1) > Traceback (most recent call last): > File "", line 1, in > File "tmp.py", line 3, in is_prime > for i in range(2, int(math.floor(math.sqrt(n))) + 1): > OverflowError: long int too large to convert to float > > Even without the OverflowError I suspect that there are primes p > > ~1e16 such that is_prime(p**2) would incorrectly return True. This is > a consequence of depending on FP arithmetic in what should be exact > computation. The easy fix is to break when i**2 > n avoiding the > tricky sqrt operation. Alternatively you can use an exact integer sqrt > function to fix this: > > def sqrt_floor(y): > try: > x = int(math.sqrt(y)) > except OverflowError: > x = y > while not (x ** 2 <= y < (x+1) ** 2): > x = (x + y // x) // 2 > return x > > > Oscar > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Sun Oct 13 21:02:56 2013 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 13 Oct 2013 14:02:56 -0500 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: <5258B539.10307@mrabarnett.plus.com> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com> Message-ID: [MRAB, posts a beautiful solution] I don't really have a use for this, but it was a lovely programming puzzle, so I'll include an elaborate elaboration of MRAB's algorithm below. And that's the end of my interest in this ;-) It doesn't require that elements be orderable or even hashable. It does require that they can be compared for equality, but it's pretty clear that if we _do_ include something like this, "equality" has to be pluggable. By default, this uses `operator.__eq__`, but any 2-argument function can be used. E.g., use `operator.is_` to make it believe that only identical objects are equal. Or pass a lambda to distinguish by type too (e.g., if you don't want 3 and 3.0 to be considered equal). Etc. The code is much lower-level, to make it closer to an efficient C implementation. No dicts, no sets, no slicing or concatenation of lists, etc. It sticks to using little integers (indices) as far as possible, which can be native in C (avoiding mounds of increfs and decrefs). Also, because "equality" is pluggable, it may be a slow operation. The `equal()` function is only called here during initial setup, to partition the elements into equivalence classes. Where N is len(iterables), at best `equal()` is called N-1 times (if all elements happen to be equal), and at worst N*(N-1)/2 times (if no elements happen to be equal), all independent of `count`. It assumes `equal()` is transitive. It doesn't always return permutations in the same order as MRAB's function, because - to avoid any searching - it iterates over equivalence classes instead of over the original iterables. 
This is the simplest differing example I can think of: >>> list(unique_permutations("aba", 2)) [('a', 'b'), ('a', 'a'), ('b', 'a')] For the first result, MRAB's function first picks the first 'a', then removes it from the iterables and recurses on ("ba", 1). So it finds 'b' next, and yields ('a', 'b') (note: this is the modified unique_permutations() below - MRAB's original actually yielded lists, not tuples). But: >>> list(up("aba", 2)) [('a', 'a'), ('a', 'b'), ('b', 'a')] Different order! That's because "up" is iterating over (conceptually) [EquivClass(first 'a', second 'a'), EquivClass('b')] It first picks the first `a`, then adjusts list pointers (always a fast, constant-time operation) so that it recurses on [EquivClass(second 'a'), EquivClass('b')] So it next finds the second 'a', and yields (first 'a', second 'a') as its first result. Maybe this will make it clearer: >>> list(up(["a1", "b", "a2"], 2, lambda x, y: x[0]==y[0])) [('a1', 'a2'), ('a1', 'b'), ('b', 'a1')] No, I guess that didn't make it clearer - LOL ;-) Do I care? No. Anyway, here's the code. Have fun :-) # MRAB's beautiful solution, modified in two ways to be # more like itertools.permutations: # 1. Yield tuples instead of lists. # 2. When count > len(iterable), don't yield anything. def unique_permutations(iterable, count=None): def perm(items, count): if count: seen = set() for i, item in enumerate(items): if item not in seen: for p in perm(items[:i] + items[i+1:], count - 1): yield [item] + p seen.add(item) else: yield [] items = list(iterable) if count is None: count = len(items) if count > len(items): return for p in perm(items, count): yield tuple(p) # New code, ending in generator `up()`. import operator # In C, this would be a struct of native C types, # and the brief methods would be coded inline. class ENode: def __init__(self, initial_index=None): self.indices = [initial_index] # list of equivalent indices self.current = 0 self.prev = self.next = self def index(self): "Return current index." return self.indices[self.current] def unlink(self): "Remove self from list." self.prev.next = self.next self.next.prev = self.prev def insert_after(self, x): "Insert node x after self." x.prev = self x.next = self.next self.next.prev = x self.next = x def advance(self): """Advance the current index. If we're already at the end, remove self from list. .restore() undoes everything .advance() did.""" assert self.current < len(self.indices) self.current += 1 if self.current == len(self.indices): self.unlink() def restore(self): "Undo what .advance() did." assert self.current <= len(self.indices) if self.current == len(self.indices): self.prev.insert_after(self) self.current -= 1 def build_equivalence_classes(items, equal): ehead = ENode() # headed, doubly-linked circular list of equiv classes for i, elt in enumerate(items): e = ehead.next while e is not ehead: if equal(elt, items[e.indices[0]]): # Add (index of) elt to this equivalence class. e.indices.append(i) break e = e.next else: # elt not equal to anything seen so far: append # new equivalence class. 
e = ENode(i) ehead.prev.insert_after(e) return ehead def up(iterable, count=None, equal=operator.__eq__): def perm(i): if i: e = ehead.next assert e is not ehead while e is not ehead: result[count - i] = e.index() e.advance() yield from perm(i-1) e.restore() e = e.next else: yield tuple(items[j] for j in result) items = tuple(iterable) if count is None: count = len(items) if count > len(items): return ehead = build_equivalence_classes(items, equal) result = [None] * count yield from perm(count) From python at mrabarnett.plus.com Sun Oct 13 21:30:42 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 13 Oct 2013 20:30:42 +0100 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com> Message-ID: <525AF4E2.6010301@mrabarnett.plus.com> On 13/10/2013 20:02, Tim Peters wrote: > [MRAB, posts a beautiful solution] > > I don't really have a use for this, but it was a lovely programming > puzzle, so I'll include an elaborate elaboration of MRAB's algorithm > below. And that's the end of my interest in this ;-) > > It doesn't require that elements be orderable or even hashable. It > does require that they can be compared for equality, but it's pretty > clear that if we _do_ include something like this, "equality" has to > be pluggable. By default, this uses `operator.__eq__`, but any > 2-argument function can be used. E.g., use `operator.is_` to make it > believe that only identical objects are equal. Or pass a lambda to > distinguish by type too (e.g., if you don't want 3 and 3.0 to be > considered equal). Etc. > [snip] I posted yet another implementation after that one. From oscar.j.benjamin at gmail.com Sun Oct 13 21:34:09 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Sun, 13 Oct 2013 20:34:09 +0100 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> Message-ID: On 13 October 2013 19:29, Neil Girdhar wrote: > Did you read the problem? I did but since you showed some code that you said you were working on I thought you'd be interested to know that it could be improved. > Anyway, let's not get off topic (permutations). Getting back to your proposal, I disagree that permutations should be "fixed". The current behaviour is correct. If I was asked to define a permutation I would have given definition #3 from Steven's list: a bijection from a set to itself. Formally a permutation of a collection of non-unique elements is not defined. They may also be uses for a function like the one that you proposed but I've never needed it (and I have used permutations a few times) and no one in this thread (including you) has given a use-case for this. Oscar From mistersheik at gmail.com Sun Oct 13 21:39:19 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sun, 13 Oct 2013 15:39:19 -0400 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> Message-ID: On Sun, Oct 13, 2013 at 3:34 PM, Oscar Benjamin wrote: > On 13 October 2013 19:29, Neil Girdhar wrote: > > Did you read the problem? 
> > I did but since you showed some code that you said you were working on > I thought you'd be interested to know that it could be improved. > The code solves the problem according to its specification :) (The numbers are less than 1e8.) > > Anyway, let's not get off topic (permutations). > > Getting back to your proposal, I disagree that permutations should be > "fixed". The current behaviour is correct. If I was asked to define a > permutation I would have given definition #3 from Steven's list: a > bijection from a set to itself. Formally a permutation of a collection > of non-unique elements is not defined. > > They may also be uses for a function like the one that you proposed > but I've never needed it (and I have used permutations a few times) > and no one in this thread (including you) has given a use-case for > this. > > > Oscar > The problem is a use-case. Did you read it? Did you try solving it? -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Sun Oct 13 22:04:21 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 13 Oct 2013 21:04:21 +0100 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> Message-ID: <525AFCC5.4070309@mrabarnett.plus.com> On 13/10/2013 20:34, Oscar Benjamin wrote: > On 13 October 2013 19:29, Neil Girdhar wrote: >> Did you read the problem? > > I did but since you showed some code that you said you were working on > I thought you'd be interested to know that it could be improved. > >> Anyway, let's not get off topic (permutations). > > Getting back to your proposal, I disagree that permutations should be > "fixed". The current behaviour is correct. If I was asked to define a > permutation I would have given definition #3 from Steven's list: a > bijection from a set to itself. Formally a permutation of a collection > of non-unique elements is not defined. > > They may also be uses for a function like the one that you proposed > but I've never needed it (and I have used permutations a few times) > and no one in this thread (including you) has given a use-case for > this. > Here's a use case: anagrams. From mistersheik at gmail.com Sun Oct 13 22:56:55 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Sun, 13 Oct 2013 16:56:55 -0400 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> <20131013014742.GR7989@ando> Message-ID: Executive summary: Thanks for discussion everyone. I'm now convinced that itertools.permutations is fine as it is. I am not totally convinced that multiset_permutations doesn't belong in itertools, or else there should be a standard combinatorics library. On Sun, Oct 13, 2013 at 5:27 AM, Nick Coghlan wrote: > On 13 October 2013 17:38, Neil Girdhar wrote: > > My intuition is that we want Python to be "complete". Many other > languages > > can find the permutations of a multiset. Python has a permutations > > function. Many people on stackoverflow expected that function to be > able to > > find those permutations. > > Nope, we expressly *don't* want the standard library to be "complete", > because that would mean growing to the size of PyPI (or larger). 
> There's always going to be scope for applications to adopt new domain > specific dependencies with more in-depth support than that provided by > the standard library. > By complete I meant that just as if you were to add the "error function, erf" to math, you would want to add an equivalent version to cmath. When I saw the permutation function in itertools, I expected that it would work on both sets and multisets, or else there would be another function that did. > > Many standard library modules are in fact deliberately designed as > "stepping stone" modules that will meet the needs of code which have > an incidental relationship to that task, but will need to be replaced > with something more sophisticated for code directly related to that > domain. Many times, that means they will ignore as irrelevant > distinctions that are critical in certain contexts, simply because > they don't come up all that often outside those specific domains, and > addressing them involves making the core module more complicated to > use for more typical cases. > Good point. > > In this case, the proposed alternate permutations mechanism only makes > a difference when: > > 1. The data set contains equivalent values > 2. Input order is not considered significant, so exchanging equivalent > values should *not* create a new permutation (i.e. multiset > permutations rather than sequence permutations). > > If users aren't likely to encounter situations where that makes a > difference, then providing both in the standard library isn't being > helpful, it's being actively user hostile by asking them to make a > decision they're not yet qualified to make for the sake of the few > experts that specifically need . Hence Raymond's request for data > modelling problems outside the "learning or studying combinatorics" > context to make the case for standard library inclusion. > > Interestingly, I just found another language which has the equivalent > of the currrent behaviour of itertools.permutations: Haskell has it as > Data.List.permutations. As far as I can tell, Haskell doesn't offer > support for multiset permutations in the core, you need an additional > package like Math.Combinatorics (see: > > http://hackage.haskell.org/package/multiset-comb-0.2.3/docs/Math-Combinatorics-Multiset.html#g:4 > ). > > Since iterator based programming in Python is heavily inspired by > Haskell, this suggests that the current behaviour of > itertools.permutations is appropriate and that Raymond is right to be > dubious about including multiset permutations support directly in the > standard library. > > You've convinced me that itertools permutations is doing the right thing :) I'm not sure if multiset permutations should be in the standard library or not. It is very useful. > Those interested in improving the experience of writing combinatorics > code in Python may wish to look into helping out with the > combinatorics package on PyPI: > http://phillipmfeldman.org/Python/for_developers.html (For example, > politely approach Phillip to see if he is interested in hosting it on > GitHub or BitBucket, providing Sphinx docs on ReadTheDocs, improving > the PyPI metadata, etc - note I have no experience with this package, > it's just the first hit for "python combinatorics") > > > One suggestion: Why not make it so that itertools.permutations checks if > its > > argument is an instance of collections.Mapping? 
If it is, we could > > interpret the items as a mapping from elements to positive integers, > which > > is a compact representation of a multiset. Then, it could do the right > > thing for that case. > > If you want to go down the path of only caring about hashable values, > you may want to argue for a permutations method on collections.Counter > (it's conceivable that approach has the potential to be even faster > than an approach based on accepting and processing an arbitrary > iterable, since it can avoid generating repeated values in the first > place). > > A Counter based multiset permutation algorithm was actually posted to > python-list back in 2009, just after collections.Counter was > introduced: > https://mail.python.org/pipermail/python-list/2009-January/521685.html > > Nice find! > I just created an updated version of that recipe and posted it as > https://bitbucket.org/ncoghlan/misc/src/default/multiset_permutations.py > > Why not just define multiset_permutations to accept a dict (dict is a base class of Counter)? Since you're going to convert from an iterable (with duplicates) to a dict (via Counter) anyway, why not accept it as such. Users who want an interface similar to itertools.permutations can pass their iterable through Counter first. Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Sun Oct 13 23:22:06 2013 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 13 Oct 2013 16:22:06 -0500 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: <525AF4E2.6010301@mrabarnett.plus.com> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com> <525AF4E2.6010301@mrabarnett.plus.com> Message-ID: [Tim] >> [MRAB, posts a beautiful solution] >> >> I don't really have a use for this, but it was a lovely programming >> puzzle, so I'll include an elaborate elaboration of MRAB's algorithm >> below. And that's the end of my interest in this ;-) >> >> It doesn't require that elements be orderable or even hashable. It >> does require that they can be compared for equality, but it's pretty >> clear that if we _do_ include something like this, "equality" has to >> be pluggable. >> ... [MRAB] > I posted yet another implementation after that one. I know. I was talking about the beautiful one ;-) The later one could build equivalence classes faster (than mine) in many cases, but I don't care much about the startup costs. I care a lot more about: 1. Avoiding searches in the recursive function; i.e., this: for i, item in enumerate(items): if item != prev_item: Making such tests millions (billions ...) of times adds up - and equality testing may not be cheap. The algorithm I posted does no item testing after the setup is done (none in its recursive function). 2. Making "equality" pluggable. Your later algorithm bought "find equivalence classes" speed for hashable elements by using a dict, but a dict's notion of equality can't be changed. So, make equality pluggable, and that startup-time speed advantage vanishes for all but operator.__eq__'s idea of equality. 
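To make the pluggable-equality point concrete, here is a minimal sketch using
the up() generator posted earlier in this thread (assumed to be in scope); the
case-insensitive lambda is just an illustrative choice:

>>> import operator
>>> list(up("aA", 2))    # default equality: operator.__eq__, so 'a' != 'A'
[('a', 'A'), ('A', 'a')]
>>> list(up("aA", 2, equal=lambda x, y: x.lower() == y.lower()))
[('a', 'A')]
>>> list(up([3, 3.0], 2, equal=operator.is_))    # identity, not value
[(3, 3.0), (3.0, 3)]

With the default value equality, 3 and 3.0 collapse into a single equivalence
class, so that last call would instead yield only (3, 3.0).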
From oscar.j.benjamin at gmail.com Mon Oct 14 01:10:51 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 14 Oct 2013 00:10:51 +0100 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com> <525AF4E2.6010301@mrabarnett.plus.com> Message-ID: On 13 October 2013 22:22, Tim Peters wrote: > 2. Making "equality" pluggable. Your later algorithm bought "find > equivalence classes" speed for hashable elements by using a dict, but > a dict's notion of equality can't be changed. So, make equality > pluggable, and that startup-time speed advantage vanishes for all but > operator.__eq__'s idea of equality. It sounds like you want Antoine's TransformDict: http://www.python.org/dev/peps/pep-0455/ Oscar From oscar.j.benjamin at gmail.com Mon Oct 14 01:32:34 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 14 Oct 2013 00:32:34 +0100 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com> <525AF4E2.6010301@mrabarnett.plus.com> Message-ID: On 14 October 2013 00:23, Tim Peters wrote: > [Tim] >>> 2. Making "equality" pluggable. Your later algorithm bought "find >>> equivalence classes" speed for hashable elements by using a dict, but >>> a dict's notion of equality can't be changed. So, make equality >>> pluggable, and that startup-time speed advantage vanishes for all but >>> operator.__eq__'s idea of equality. > > [Oscar Benjamin] >> It sounds like you want Antoine's TransformDict: >> http://www.python.org/dev/peps/pep-0455/ > > Not really in this case - I want a two-argument function ("are A and B > equal?"). Not all plausible cases of that can be mapped to a > canonical hashable key. For example, consider permutations of a list > of lists, where the user doesn't want int and float elements of the > lists to be considered equal when they happen to have the same value. > Is that a stretch? Oh ya ;-) Will this do? d = TransformDict(lambda x: (type(x), x)) Oscar From tim.peters at gmail.com Mon Oct 14 01:44:14 2013 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 13 Oct 2013 18:44:14 -0500 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com> <525AF4E2.6010301@mrabarnett.plus.com> Message-ID: [Oscar Benjamin] > Will this do? > d = TransformDict(lambda x: (type(x), x)) No. In the example I gave, *lists* will be passed as x (it was a list of lists: the lists are the elements of the permutations, and they happen to have internal structure of their own). So the `type(x)` there is useless (it will always be the list type), while the lists themselves would still be compared by operator.__eq__. Not to mention that the constructed tuple isn't hashable anyway (x is a list), so can't be used by TransformDict. 
So that idea doesn't work several times over ;-) From oscar.j.benjamin at gmail.com Mon Oct 14 01:55:29 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 14 Oct 2013 00:55:29 +0100 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com> <525AF4E2.6010301@mrabarnett.plus.com> Message-ID: On 14 October 2013 00:44, Tim Peters wrote: > [Oscar Benjamin] >> Will this do? >> d = TransformDict(lambda x: (type(x), x)) > > No. In the example I gave, *lists* will be passed as x (it was a list > of lists: the lists are the elements of the permutations, and they > happen to have internal structure of their own). So the `type(x)` > there is useless (it will always be the list type), while the lists > themselves would still be compared by operator.__eq__. > > Not to mention that the constructed tuple isn't hashable anyway (x is > a list), so can't be used by TransformDict. > > So that idea doesn't work several times over ;-) Damn, you're right. I obviously didn't think that one through hard enough. Okay how about this? d = TransformDict(lambda x: (tuple(map(type, x)), tuple(x))) Oscar From tim.peters at gmail.com Mon Oct 14 01:23:51 2013 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 13 Oct 2013 18:23:51 -0500 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com> <525AF4E2.6010301@mrabarnett.plus.com> Message-ID: [Tim] >> 2. Making "equality" pluggable. Your later algorithm bought "find >> equivalence classes" speed for hashable elements by using a dict, but >> a dict's notion of equality can't be changed. So, make equality >> pluggable, and that startup-time speed advantage vanishes for all but >> operator.__eq__'s idea of equality. [Oscar Benjamin] > It sounds like you want Antoine's TransformDict: > http://www.python.org/dev/peps/pep-0455/ Not really in this case - I want a two-argument function ("are A and B equal?"). Not all plausible cases of that can be mapped to a canonical hashable key. For example, consider permutations of a list of lists, where the user doesn't want int and float elements of the lists to be considered equal when they happen to have the same value. Is that a stretch? Oh ya ;-) From oscar.j.benjamin at gmail.com Mon Oct 14 02:20:19 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 14 Oct 2013 01:20:19 +0100 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com> <525AF4E2.6010301@mrabarnett.plus.com> Message-ID: On 14 October 2013 01:15, Tim Peters wrote: > [Oscar Benjamin] >> ... >> Damn, you're right. I obviously didn't think that one through hard >> enough. Okay how about this? >> d = TransformDict(lambda x: (tuple(map(type, x)), tuple(x))) > > Oscar, please give this up - it's not going to work. 
`x` can be any > object whatsoever, with arbitrarily complex internal structure, and > the user can have an arbitrarily convoluted idea of what "equal" > means. Did I mention that these lists don't *only* have ints and > floats as elements, but also nested sublists? Oh ya - they also want > a float and a singleton list containing the same float to be > considered equal ;-) Etc. That does seem contrived but then I guess the whole problem is however.... > Besides, you're trying to solve a problem I didn't have to begin with > ;-) That is, I don't care much about the cost of building equivalence > classes - it's a startup cost for the generator, not an "inner loop" > cost. Even if you could bash every case into a different convoluted > hashable tuple, in general it's going to be - in this specific problem > - far easier for the user to define an equal() function they like, > working directly on the two objects. That doesn't require an endless > sequence of tricks. okay I see what you mean. Oscar From tim.peters at gmail.com Mon Oct 14 02:15:00 2013 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 13 Oct 2013 19:15:00 -0500 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <52587976.1000901@mrabarnett.plus.com> <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com> <525AF4E2.6010301@mrabarnett.plus.com> Message-ID: [Oscar Benjamin] > ... > Damn, you're right. I obviously didn't think that one through hard > enough. Okay how about this? > d = TransformDict(lambda x: (tuple(map(type, x)), tuple(x))) Oscar, please give this up - it's not going to work. `x` can be any object whatsoever, with arbitrarily complex internal structure, and the user can have an arbitrarily convoluted idea of what "equal" means. Did I mention that these lists don't *only* have ints and floats as elements, but also nested sublists? Oh ya - they also want a float and a singleton list containing the same float to be considered equal ;-) Etc. Besides, you're trying to solve a problem I didn't have to begin with ;-) That is, I don't care much about the cost of building equivalence classes - it's a startup cost for the generator, not an "inner loop" cost. Even if you could bash every case into a different convoluted hashable tuple, in general it's going to be - in this specific problem - far easier for the user to define an equal() function they like, working directly on the two objects. That doesn't require an endless sequence of tricks. From felix at groebert.org Mon Oct 14 14:25:53 2013 From: felix at groebert.org (=?ISO-8859-1?Q?Felix_Gr=F6bert?=) Date: Mon, 14 Oct 2013 14:25:53 +0200 Subject: [Python-ideas] pytaint: taint tracking in python Message-ID: Hi, I'd like to start a discussion on adding a security feature: taint tracking. As part of his internship, Marcin (cc) has been working on a patch to cpython-2.7.5 which is available online. We also published a design document and slides. https://github.com/felixgr/pytaint The idea behind taint tracking (or taint checking) is that we mark ('taint') untrusted data and prevent the programmer from using it in sensitive places (called sinks). A standard use case would be in a web application, where data extracted from HTTP requests is tainted and a database connection is sensitive sink. 
In other words: objects returned by http request have a property indicating taint, and when one of them is passed to database connection, a TaintException is raised. The idea itself is not new (Ruby and Perl have it; there are also some python libraries floating around) and pretty much noone uses it - however with a few improvements, it can be made viable. Firstly, we introduce different kinds of taint (motivation: a string may be attack vector for many classes of attacks - e.g. XSS, SQLi - and we need different escaping for that). Secondly, we allow to easily apply it to existing software - a programmer can simply write a config file specifying taint sources, sensitive sinks and taint cleaners, and enable tracking by adding one line to his app. We think it's a very useful feature for developing most of webapps and other security-sensitive application in Python, any thoughts on this? Thanks, Felix -------------- next part -------------- An HTML attachment was scrubbed... URL: From dickinsm at gmail.com Mon Oct 14 14:29:04 2013 From: dickinsm at gmail.com (Mark Dickinson) Date: Mon, 14 Oct 2013 13:29:04 +0100 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> <20131013014742.GR7989@ando> Message-ID: On Sun, Oct 13, 2013 at 9:56 PM, Neil Girdhar wrote: > > By complete I meant that just as if you were to add the "error function, > erf" to math, you would want to add an equivalent version to cmath. > An interesting choice of example. *Why* would you want to do so? Since you bring this up, I assume you're already aware that math.erf exists but cmath.erf does not. I believe there are good, practical reasons *not* to add cmath.erf, in spite of the existence of math.erf. Not least of these is that cmath.erf would be significantly more complicated to implement and of significantly less interest to users. And perhaps there's a parallel with itertools.permutations and the proposed itertools.multiset_permutations here... -- Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Mon Oct 14 14:37:59 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Mon, 14 Oct 2013 08:37:59 -0400 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> <20131013014742.GR7989@ando> Message-ID: Actually I didn't notice that. It seems weird to find erf in math, but erf for complex numbers in scipy.special. It's just about organization and user discovery. I realize that from the developer's point of view, erf for complex numbers is complicated, but why does the user care? On Mon, Oct 14, 2013 at 8:29 AM, Mark Dickinson wrote: > On Sun, Oct 13, 2013 at 9:56 PM, Neil Girdhar wrote: > >> >> By complete I meant that just as if you were to add the "error function, >> erf" to math, you would want to add an equivalent version to cmath. >> > > An interesting choice of example. *Why* would you want to do so? > > Since you bring this up, I assume you're already aware that math.erf > exists but cmath.erf does not. I believe there are good, practical reasons > *not* to add cmath.erf, in spite of the existence of math.erf. Not least > of these is that cmath.erf would be significantly more complicated to > implement and of significantly less interest to users. 
And perhaps there's > a parallel with itertools.permutations and the proposed > itertools.multiset_permutations here... > > -- > Mark > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Mon Oct 14 15:11:42 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 14 Oct 2013 14:11:42 +0100 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> <20131013014742.GR7989@ando> Message-ID: On 14 October 2013 13:37, Neil Girdhar wrote: > > Actually I didn't notice that. It seems weird to find erf in math, but erf > for complex numbers in scipy.special. It's just about organization and user > discovery. I realize that from the developer's point of view, erf for > complex numbers is complicated, but why does the user care? This is the first time I've seen a suggestion that there should be cmath.erf. So I would say that most users don't care about having a complex error function. Whoever would take the time to implement the complex error function might instead spend that time implementing and maintaining something that users do care about. Oscar From ncoghlan at gmail.com Mon Oct 14 15:15:06 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 14 Oct 2013 23:15:06 +1000 Subject: [Python-ideas] pytaint: taint tracking in python In-Reply-To: References: Message-ID: On 14 October 2013 22:25, Felix Gr?bert wrote: > We think it's a very useful feature for developing most of webapps and other > security-sensitive application in Python, any thoughts on this? It's definitely an interesting idea, and the idea of pursuing it initially as a separate project to optionally harden Python 2 applications is a good one. Longer term, before it can be considered for inclusion as a language feature: 1. It needs to work with Python 3 (which has a substantially different text model), as Python 2 is no longer receiving new features. 2. The performance impact needs to be assessed when the feature is disabled (the default) and when various sources and sinks are defined. The performance numbers comparing http://hg.python.org/benchmarks/ between vanilla CPython 2.7.5 and pytaint may also be of interest to potential users of the Python 2.7 version. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mistersheik at gmail.com Mon Oct 14 15:15:06 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Mon, 14 Oct 2013 09:15:06 -0400 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> <20131013014742.GR7989@ando> Message-ID: Look I don't want it, and anyway it's already in scipy.special. I just organizational symmetry. I expected to find complex versions of math functions in cmath ?not in scipy special. On Mon, Oct 14, 2013 at 9:11 AM, Oscar Benjamin wrote: > On 14 October 2013 13:37, Neil Girdhar wrote: > > > > Actually I didn't notice that. It seems weird to find erf in math, but > erf > > for complex numbers in scipy.special. It's just about organization and > user > > discovery. I realize that from the developer's point of view, erf for > > complex numbers is complicated, but why does the user care? > > This is the first time I've seen a suggestion that there should be > cmath.erf. 
So I would say that most users don't care about having a > complex error function. Whoever would take the time to implement the > complex error function might instead spend that time implementing and > maintaining something that users do care about. > > > Oscar > -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Mon Oct 14 15:26:59 2013 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Mon, 14 Oct 2013 14:26:59 +0100 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012063445.GI7989@ando> <20131013014742.GR7989@ando> Message-ID: On 14/10/2013 14:15, Neil Girdhar wrote: > Look I don't want it, and anyway it's already in scipy.special. I just > organizational symmetry. I expected to find complex versions of math > functions in cmath ?not in scipy special. > Why are you comparing core Python modules with third party ones? -- Roses are red, Violets are blue, Most poems rhyme, But this one doesn't. Mark Lawrence From abarnert at yahoo.com Mon Oct 14 18:07:26 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 14 Oct 2013 09:07:26 -0700 Subject: [Python-ideas] pytaint: taint tracking in python In-Reply-To: References: Message-ID: On Oct 14, 2013, at 5:25, Felix Gr?bert wrote: > The idea itself is not new (Ruby and Perl have it; there are also some python libraries floating around) and pretty much noone uses it - however with a few improvements, it can be made viable. A good part of the reason no one uses it is that SQL injection is always given as the motivation for the idea, but it's not a very good solution for that problem, and there's already a well-known better solution: parameterized queries. SQL isn't the only case where you build executable strings--a document formatter might build Postscript code; a forum might build HTML (maybe even with embedded JS); a game might even read Python code from an in-game console or untrusted mod that's allowed to run in a different globals environment but not the main one; etc. Has anyone successfully used perl's long-standing taint mode for any such purposes? If not, can you demonstrate using it in python? I don't think that would be _necessary_ for a python taint mode implementation to be considered useful, but it would certainly help get attention to the idea. From raymond.hettinger at gmail.com Mon Oct 14 19:56:23 2013 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 14 Oct 2013 10:56:23 -0700 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> <20131013014742.GR7989@ando> Message-ID: <501BBA96-9DEF-4417-A18A-70FC65729329@gmail.com> On Oct 13, 2013, at 1:56 PM, Neil Girdhar wrote: > I'm now convinced that itertools.permutations is fine as it is. I am not totally convinced that multiset_permutations doesn't belong in itertools, Now that we have a good algorithm, I'm open to adding this to itertools, but it would need to have a name that didn't create any confusion with respect to the existing tools, perhaps something like: anagrams(population, r) Return an iterator over a all distinct r-length permutations of the population. Unlike permutations(), element uniqueness is determined by value rather than by position. Also, anagrams() makes no guarantees about the order the tuples are generated. 
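A rough sketch of the intended behaviour - hypothetical, since anagrams() is
only a proposal at this point - phrased in terms of the unique_permutations()
code posted earlier in the thread (assumed to be in scope):

>>> from itertools import permutations
>>> list(permutations("aba", 2))          # uniqueness by position
[('a', 'b'), ('a', 'a'), ('b', 'a'), ('b', 'a'), ('a', 'a'), ('a', 'b')]
>>> list(unique_permutations("aba", 2))   # uniqueness by value
[('a', 'b'), ('a', 'a'), ('b', 'a')]

anagrams("aba", 2) would be expected to produce those same three distinct
tuples, in some unspecified order.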
Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From bruce at leapyear.org Mon Oct 14 20:03:21 2013 From: bruce at leapyear.org (Bruce Leban) Date: Mon, 14 Oct 2013 11:03:21 -0700 Subject: [Python-ideas] pytaint: taint tracking in python In-Reply-To: References: Message-ID: There's another good use case for tainting: html injection (XSS). There's a good solution for that too but XSS is still prevalent because it's easy to build html by concatenating strings without escaping and template systems make it too easy to inject strings without escaping (or put another way, they make it equally easy to inject escaped strings as unescaped strings). However, the issue is not just tainting but typing as well. When I have a string, I need to know if it's raw text or html text. If it's html text, I need to know if it's safe (generated by the program or user input that's been sanitized (carefully)) or unsafe (raw user input). I'm not sure it isn't --- Bruce I'm hiring: http://www.cadencemd.com/info/jobs Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security On Mon, Oct 14, 2013 at 9:07 AM, Andrew Barnert wrote: > On Oct 14, 2013, at 5:25, Felix Gr?bert wrote: > > > The idea itself is not new (Ruby and Perl have it; there are also some > python libraries floating around) and pretty much noone uses it - however > with a few improvements, it can be made viable. > > A good part of the reason no one uses it is that SQL injection is always > given as the motivation for the idea, but it's not a very good solution for > that problem, and there's already a well-known better solution: > parameterized queries. > > SQL isn't the only case where you build executable strings--a document > formatter might build Postscript code; a forum might build HTML (maybe even > with embedded JS); a game might even read Python code from an in-game > console or untrusted mod that's allowed to run in a different globals > environment but not the main one; etc. Has anyone successfully used perl's > long-standing taint mode for any such purposes? If not, can you demonstrate > using it in python? > > I don't think that would be _necessary_ for a python taint mode > implementation to be considered useful, but it would certainly help get > attention to the idea. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Mon Oct 14 22:28:44 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Mon, 14 Oct 2013 16:28:44 -0400 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: <501BBA96-9DEF-4417-A18A-70FC65729329@gmail.com> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> <20131013014742.GR7989@ando> <501BBA96-9DEF-4417-A18A-70FC65729329@gmail.com> Message-ID: Excellent! My top two names are 1. multiset_permutations (reflects the mathematical name) 2. anagrams Note that we may also want to add multiset_combinations. It hasn't been part of this discussion, but it may be part of another discussion and I wanted to point this out as I know many of you are future-conscious. We seem to be all agreed that we want to accept "r", the length of the permutation desired. 
With permutations, the *set* is passed in as a iterable representing distinct elements. With multiset_permutations, there are three ways to pass in the *multiset*: - 1. an iterable whose elements (or an optional key function applied to which) are compared using __eq__ - 2. a dict (of which collections.Counter) is a subclass - 3. an iterable whose elements are key-value pairs and whose values are counts Example uses: 1. multiset_permutations(word) 2. multiset_permutations(Counter(word)) 3. multiset_permutations(Counter(word).items()) >From a dictionary: 1. multiset_permutations(itertools.chain.from_iterable(itertools.repeat(k, v) for k, v in d.items())) 2. multiset_permutations(d) 3. multiset_permutations(d.items()) >From an iterable of key-value pairs: 1. multiset_permutations(itertools.chain.from_iterable(itertools.repeat(k, v) for k, v in it)) 2. multiset_permutations({k: v for k, v in it}) 3. multiset_permutations(it) The advantage of 2 is that no elements are compared by multiset_permutations (so it is simpler and faster). The advantage of 3 is that no elements are compared, and they need not be comparable or hashable. This version is truly a generalization of the "permutations" function. This way, for any input "it" you could pass to permutations, you could equivalently pass zip(it, itertools.repeat(1)) to multiset_permutations. Comments? Neil On Mon, Oct 14, 2013 at 1:56 PM, Raymond Hettinger < raymond.hettinger at gmail.com> wrote: > > On Oct 13, 2013, at 1:56 PM, Neil Girdhar wrote: > > I'm now convinced that itertools.permutations is fine as it is. I am not > totally convinced that multiset_permutations doesn't belong in itertools, > > > Now that we have a good algorithm, I'm open to adding this to itertools, > but it would need to have a name that didn't create any confusion > with respect to the existing tools, perhaps something like: > > anagrams(population, r) > > Return an iterator over a all distinct r-length permutations > of the population. > > Unlike permutations(), element uniqueness is determined > by value rather than by position. Also, anagrams() makes > no guarantees about the order the tuples are generated. > > > > Raymond > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bruce at leapyear.org Mon Oct 14 22:52:40 2013 From: bruce at leapyear.org (Bruce Leban) Date: Mon, 14 Oct 2013 13:52:40 -0700 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: <525AFCC5.4070309@mrabarnett.plus.com> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <525AFCC5.4070309@mrabarnett.plus.com> Message-ID: On Sun, Oct 13, 2013 at 1:04 PM, MRAB wrote: > Here's a use case: anagrams. > For what it's worth, I've written anagram-finding code, and I didn't do it with permutations. The faster approach is to create a dictionary mapping a canonical form of each word to a list of words, e.g., { 'ACT': ['ACT', 'CAT'], 'AET': ['ATE', 'EAT', 'ETA', 'TEA'] } This requires extra work to build the map but you do that just once when you read the dictionary and then every lookup is O(1) not O(len(word)). This canonical form approach is useful for other word transformations that are used in puzzles, e.g., words that are have the same consonants (ignoring vowels). --- Bruce I'm hiring: http://www.cadencemd.com/info/jobs Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security P.S. 
Yes, I know: if you play Scrabble, TAE is also a word. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Tue Oct 15 00:59:54 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 14 Oct 2013 18:59:54 -0400 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> <20131013014742.GR7989@ando> <501BBA96-9DEF-4417-A18A-70FC65729329@gmail.com> Message-ID: On 10/14/2013 4:28 PM, Neil Girdhar wrote: > Excellent! > > My top two names are > 1. multiset_permutations (reflects the mathematical name) > 2. anagrams I like anagrams. I did not completely get what this issue was about until someone finally mentioned anagrams as use case. -- Terry Jan Reedy From tim.peters at gmail.com Tue Oct 15 02:48:17 2013 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 14 Oct 2013 19:48:17 -0500 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: <501BBA96-9DEF-4417-A18A-70FC65729329@gmail.com> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> <20131013014742.GR7989@ando> <501BBA96-9DEF-4417-A18A-70FC65729329@gmail.com> Message-ID: [Raymond Hettinger] > Now that we have a good algorithm, I'm open to adding this to itertools, I remain reluctant, because I still haven't seen a compelling use case. Yes, it generates all distinct r-letter anagrams - but so what? LOL ;-) Seriously, I've written anagram programs several times in my life, and generating "all possible" never occurred to me because it's so crushingly inefficient. > but it would need to have a name that didn't create any confusion > with respect to the existing tools, perhaps something like: > > anagrams(population, r) "anagrams" is great! Inspired :-) What about an optional argument to define what the _user_ means by "equality"? The algorithm I posted had an optional `equal=operator.__eq__` argument. Else you're going to be pushed to add a clumsy `TransformAnagrams` later <0.4 wink>. > Return an iterator over a all distinct r-length permutations > of the population. > > Unlike permutations(), element uniqueness is determined > by value rather than by position. Also, anagrams() makes > no guarantees about the order the tuples are generated. Well, MRAB's algorithm (and my rewrite) guarantees that _if_ the elements support a total order, and appear in the iterable in non-decreasing order, then the anagrams are generated in non-decreasing lexicographic order. And that may be a useful guarantee (hard to tell without a real use case, though!). There's another ambiguity I haven't seen addressed explicitly. Consider this: >>> from fractions import Fraction >>> for a in anagrams([3, 3.0, Fraction(3)], 3): ... print(a) (3, 3.0, Fraction(3, 1)) All the algorithms posted here work to show all 3 elements in this case. But why? If the elements all equal, then other outputs "should be" acceptable too. Like (3, 3, 3) or (3.0, Fraction(3, 1), 3.0) etc. All those outputs compare equal! This isn't visible if, e.g., the iterable's elements are letters (where a == b if and only if str(a) == str(b), so the output looks the same no matter what). 
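Spelling out why those three objects land in a single equivalence class (a
quick check, with the up() generator posted earlier in the thread assumed to
be in scope):

>>> from fractions import Fraction
>>> 3 == 3.0 == Fraction(3)
True
>>> len({3, 3.0, Fraction(3)})    # they hash alike too
1
>>> list(up([3, 3.0, Fraction(3)], 3))
[(3, 3.0, Fraction(3, 1))]

One class, one output tuple - the algorithm just happens to echo back the
distinct objects it was given rather than repeating a single canonical
representative.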
At least "my" algorithm could be simplified materially if it only saved (and iterated over) a (single) canonical representative for each equivalence class, instead of saving entire equivalence classes and then jumping through hoops to cycle through each equivalence class's elements. But, for some reason, output (3, 3, 3) just "looks wrong" above. I'm not sure why. From mistersheik at gmail.com Tue Oct 15 03:17:26 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Mon, 14 Oct 2013 21:17:26 -0400 Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com> <525AFCC5.4070309@mrabarnett.plus.com> Message-ID: Here are a couple people looking for the function that doesn't exist (yet?) http://stackoverflow.com/questions/9660085/python-permutations-with-constraints/9660395#9660395 http://stackoverflow.com/questions/15592299/generating-unique-permutations-in-python On Mon, Oct 14, 2013 at 4:52 PM, Bruce Leban wrote: > > On Sun, Oct 13, 2013 at 1:04 PM, MRAB wrote: > >> Here's a use case: anagrams. >> > > For what it's worth, I've written anagram-finding code, and I didn't do it > with permutations. The faster approach is to create a dictionary mapping a > canonical form of each word to a list of words, e.g., > > { > 'ACT': ['ACT', 'CAT'], > 'AET': ['ATE', 'EAT', 'ETA', 'TEA'] > } > > This requires extra work to build the map but you do that just once when > you read the dictionary and then every lookup is O(1) not O(len(word)). > This canonical form approach is useful for other word transformations that > are used in puzzles, e.g., words that are have the same consonants > (ignoring vowels). > > > --- Bruce > I'm hiring: http://www.cadencemd.com/info/jobs > Latest blog post: Alice's Puzzle Page http://www.vroospeak.com > Learn how hackers think: http://j.mp/gruyere-security > > P.S. Yes, I know: if you play Scrabble, TAE is also a word. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Tue Oct 15 03:17:56 2013 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 15 Oct 2013 02:17:56 +0100 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> <20131013014742.GR7989@ando> <501BBA96-9DEF-4417-A18A-70FC65729329@gmail.com> Message-ID: <525C97C4.2030601@mrabarnett.plus.com> On 15/10/2013 01:48, Tim Peters wrote: > [Raymond Hettinger] >> Now that we have a good algorithm, I'm open to adding this to itertools, > > I remain reluctant, because I still haven't seen a compelling use > case. Yes, it generates all distinct r-letter anagrams - but so what? 
> LOL ;-) Seriously, I've written anagram programs several times in my > life, and generating "all possible" never occurred to me because it's > so crushingly inefficient. > > >> but it would need to have a name that didn't create any confusion >> with respect to the existing tools, perhaps something like: >> >> anagrams(population, r) > > "anagrams" is great! Inspired :-) > > What about an optional argument to define what the _user_ means by > "equality"? The algorithm I posted had an optional > `equal=operator.__eq__` argument. Else you're going to be pushed to > add a clumsy `TransformAnagrams` later <0.4 wink>. > >> Return an iterator over a all distinct r-length permutations >> of the population. >> >> Unlike permutations(), element uniqueness is determined >> by value rather than by position. Also, anagrams() makes >> no guarantees about the order the tuples are generated. > > Well, MRAB's algorithm (and my rewrite) guarantees that _if_ the > elements support a total order, and appear in the iterable in > non-decreasing order, then the anagrams are generated in > non-decreasing lexicographic order. And that may be a useful > guarantee (hard to tell without a real use case, though!). > [snip] I can see that one disadvantage of my algorithm is that the worst-case storage requirement is O(n^2) (I think). This is because the set of first items could have N members, the set of second items could have N-1 members, etc. On the other hand, IMHO, the sheer number of permutations will become a problem long before the memory requirement does! :-) From steve at pearwood.info Tue Oct 15 03:27:18 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 15 Oct 2013 12:27:18 +1100 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <20131013014742.GR7989@ando> Message-ID: <20131015012718.GZ7989@ando> On Mon, Oct 14, 2013 at 08:37:59AM -0400, Neil Girdhar wrote: > Actually I didn't notice that. It seems weird to find erf in math, but erf > for complex numbers in scipy.special. It's just about organization and > user discovery. I realize that from the developer's point of view, erf for > complex numbers is complicated, but why does the user care? 99% of users don't care about math.errf at all. Of those who do, 99% don't care about cmath.errf. I'd like to see cmath.errf because I'm a maths junkie, but if I were responsible for *actually doing the work* I'd make the same decision to leave cmath.errf out and leave it for a larger, more complete library like scipy. There are an infinitely large number of potential programs which could in principle be added to Python's std lib, and only a finite number of person-hours to do the work. And there are costs to adding software to the std lib, not just benefits. -- Steven From mistersheik at gmail.com Tue Oct 15 03:29:24 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Mon, 14 Oct 2013 21:29:24 -0400 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: <20131015012718.GZ7989@ando> References: <20131013014742.GR7989@ando> <20131015012718.GZ7989@ando> Message-ID: You make a good point. It was just a random example to illustrate that desire for completeness. On Mon, Oct 14, 2013 at 9:27 PM, Steven D'Aprano wrote: > On Mon, Oct 14, 2013 at 08:37:59AM -0400, Neil Girdhar wrote: > > Actually I didn't notice that. It seems weird to find erf in math, but > erf > > for complex numbers in scipy.special. It's just about organization and > > user discovery. 
I realize that from the developer's point of view, erf > for > > complex numbers is complicated, but why does the user care? > > 99% of users don't care about math.errf at all. Of those who do, 99% > don't care about cmath.errf. I'd like to see cmath.errf because I'm a > maths junkie, but if I were responsible for *actually doing the work* > I'd make the same decision to leave cmath.errf out and leave it for a > larger, more complete library like scipy. > > There are an infinitely large number of potential programs which could > in principle be added to Python's std lib, and only a finite number of > person-hours to do the work. And there are costs to adding software to > the std lib, not just benefits. > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Oct 15 03:39:30 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 15 Oct 2013 11:39:30 +1000 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <20131013014742.GR7989@ando> <20131015012718.GZ7989@ando> Message-ID: On 15 October 2013 11:29, Neil Girdhar wrote: > You make a good point. It was just a random example to illustrate that > desire for completeness. The desire for conceptual purity and consistency is a good one, it just needs to be balanced against the practical constraints of writing, maintaining, documenting, teaching and learning the standard library. "It isn't worth the hassle" is the answer to a whole lot of "Why not X?" questions in software development :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tim.peters at gmail.com Tue Oct 15 03:40:00 2013 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 14 Oct 2013 20:40:00 -0500 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: <525C97C4.2030601@mrabarnett.plus.com> References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> <20131013014742.GR7989@ando> <501BBA96-9DEF-4417-A18A-70FC65729329@gmail.com> <525C97C4.2030601@mrabarnett.plus.com> Message-ID: [MRAB] > I can see that one disadvantage of my algorithm is that the worst-case > storage requirement is O(n^2) (I think). This is because the set of > first items could have N members, the set of second items could have > N-1 members, etc. On the other hand, IMHO, the sheer number of > permutations will become a problem long before the memory requirement > does! :-) My rewrite is O(N) space (best and worst cases). I _think_ yours is too, but I understand my rewrite better by now ;-) Each element of the iterable appears in exactly one ENode: the `ehead` list is a partitioning of the input iterable. 
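As a quick sanity check of that partitioning claim (assuming the
build_equivalence_classes() posted earlier in this thread is in scope):

import operator

items = list("mississippi")
ehead = build_equivalence_classes(items, operator.__eq__)

# Walk the circular list of equivalence classes and count stored indices.
sizes = []
e = ehead.next
while e is not ehead:
    sizes.append(len(e.indices))
    e = e.next

print(sizes)                     # [1, 4, 4, 2] -- classes for 'm', 'i', 's', 'p'
print(sum(sizes) == len(items))  # True: every input index appears exactly once

Eleven letters, eleven index slots spread over four ENodes, no matter how
lopsided the classes are.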
From mistersheik at gmail.com Tue Oct 15 03:40:52 2013 From: mistersheik at gmail.com (Neil Girdhar) Date: Mon, 14 Oct 2013 21:40:52 -0400 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <20131013014742.GR7989@ando> <20131015012718.GZ7989@ando> Message-ID: On Mon, Oct 14, 2013 at 9:39 PM, Nick Coghlan wrote: > > Totally agree. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Tue Oct 15 04:45:33 2013 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 14 Oct 2013 21:45:33 -0500 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com> <20131012020647.GH7989@ando> <20131012063445.GI7989@ando> <20131013014742.GR7989@ando> Message-ID: One example of prior art: Maxima, which I use in its wxMaxima incarnation. """ Function: permutations(a) Returns a set of all distinct permutations of the members of the list or set a. Each permutation is a list, not a set. When a is a list, duplicate members of a are included in the permutations """ Examples from a Maxima shell: > permutations([1, 2. 3]); {[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]} > permutations([[1, 2], [1, 2], [2, 3]]) {[[1,2],[1,2],[2,3]], [[1,2],[2,3],[1,2]], [[2,3],[1,2],[1,2]]} > permutations({1, 1.0, 1, 1.0}) {[1,1.0],[1.0,1]} That last one may be surprising at first, but note that it's the first example where I passed a _set_ (instead of a list). And: > {1, 1.0, 1, 1.0} {1,1.0} Best I can tell, Maxima has no builtin function akin to our permutations(it, r) when r < len(it). But Maxima has a huge number of builtin functions, and I often struggle to find ones I _want_ in its docs ;-) From breamoreboy at yahoo.co.uk Tue Oct 15 09:30:21 2013 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Tue, 15 Oct 2013 08:30:21 +0100 Subject: [Python-ideas] Extremely weird itertools.permutations In-Reply-To: References: <20131013014742.GR7989@ando> <20131015012718.GZ7989@ando> Message-ID: On 15/10/2013 02:39, Nick Coghlan wrote: > > The desire for conceptual purity and consistency is a good one, it > just needs to be balanced against the practical constraints of > writing, maintaining, documenting, teaching and learning the standard > library. > > "It isn't worth the hassle" is the answer to a whole lot of "Why not > X?" questions in software development :) > > Cheers, > Nick. > Would our volunteers be more inclined to take on the hassle if they got double time on Saturdays and triple time on Sundays? :) -- Roses are red, Violets are blue, Most poems rhyme, But this one doesn't. Mark Lawrence From felix at groebert.org Tue Oct 15 11:58:41 2013 From: felix at groebert.org (=?ISO-8859-1?Q?Felix_Gr=F6bert?=) Date: Tue, 15 Oct 2013 11:58:41 +0200 Subject: [Python-ideas] pytaint: taint tracking in python In-Reply-To: References: Message-ID: 1. Please correct me if I misunderstand the Python project, but if the idea is deemed 'good' by this list, a PEP can follow and the feature can be included in Python 3? It is not necessary to have a Python 3 implementation beforehand? The existing Python 2.7.5 pytaint implementation is intended to be run by users who need tainting in Python 2 but can also serve as a reference / benchmark / proof-of-concept implementation for this discussion. 2. I haven't had the time to publish benchmarks yet but I plan to. Also, of course, the cpython tests pass and we added additional taint tracking tests. 
We also ran the internal tests of our python codebase with the pytaint interpreter. This produced only negligible failures, mostly because some C extensions hadn't been recompiled to work with the redefined string objects.

Regarding taint tracking as a feature for python:

First of all, taint tracking is a general language feature and can be considered for additional applications besides security. When it comes to the security community, taint tracking is certainly controversial. Nevertheless, my pytaint announcement received 50 retweets and 30 favs from a part of the security community, if that counts for something ;)

As Andrew and Bruce mention, there are other solutions to XSS and SQLi: template systems and parameterized queries. Another library solution exists for shell injection: pipes.quote. However, all these solutions require the developer to pick the correct library and method. We have empirical indicators that this works, but maybe only in 70% of cases. The rest of the developers are introducing new vulnerabilities. Thus, an additional language-based feature can help to mitigate the remaining 30% of cases. A web app framework (or a python-developing company) can maintain and ship a pytaint configuration which will raise a TaintError exception in those 30% of cases and prevent the vulnerability from being exploited. This argument follows the principle of defense-in-depth: why have just one security feature (e.g. pipes.quote) if we can offer several security features to the developer? This has previously worked well for system security: ASLR, DEP, etc.

Regarding the relation to typing:

We are using Merits on purpose to be able to distinguish between different forms of string cleaning. Today, most HTML template systems don't even make a distinction between different escaping contexts. However, with a pytaint Merit configuration for raw HTML, URLs, HTML attribute contents, CSS attributes and JS strings, you would be able to make sure that your string is cleaned for the specific context you're using it in. This can be implemented for each template system individually, but it would be easier to just write a pytaint config. If you don't clean strings based on browser context, you will run into problems: a string is cleaned with HTML-entity encoding but used in a