[Python-ideas] Extremely weird itertools.permutations

Sat Oct 12 07:26:07 CEST 2013

What you propose, Nick, is definitely different from the several functions
that have been bandied about here.  I.e.

>>> def nick_permutations(items, r=None):
...     return (k for k, grp in groupby(permutations(sorted(items),r)))

>>> ["".join(p) for p in nonredundant_permutations('aaabb', 3)]
['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']

>>> ["".join(p) for p in nick_permutations('aaabb', 3)]
['aaa', 'aab', 'aaa', 'aab', 'aba', 'abb', 'aba', 'abb', 'aaa', 'aab',
'aaa', 'aab', 'aba', 'abb', 'aba', 'abb', 'aaa', 'aab', 'aaa', 'aab',
'aba', 'abb', 'aba', 'abb', 'baa', 'bab', 'baa', 'bab', 'baa', 'bab',
'bba', 'baa', 'bab', 'baa', 'bab', 'baa', 'bab', 'bba']

>>> ["".join(p) for p in permutations('aaabb', 3)]
['aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', 'abb', 'aba',
'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba',
'abb', 'aba', 'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab',
'aba', 'aba', 'abb', 'aba', 'aba', 'abb', 'baa', 'baa', 'bab', 'baa',
'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba', 'baa', 'baa',
'bab', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba']

If I'm thinking of this right, what you give is equivalent to the initial
flawed version of 'nonredundant_permutations()' that I suggested, which
used '!=' rather than the correct '>' in comparing to the 'last' tuple.

FWIW, I deliberately chose the name 'nonredundant_permutations' rather than
MRAB's choice of 'unique_permutations' because I think what the filtering
does is precisely NOT to give unique ones. Or rather, not to give ALL
unique ones, but only those defined by equivalence (i.e. rather than
identity).

My name is ugly, and if there were to be a function like it in itertools, a
better name should be found. But such a name should emphasize that it is
"filter by equivalence classes" ... actually, maybe this suggests another
function which is instead "filter by identity of tuples", potentially also
added to itertools.

On Fri, Oct 11, 2013 at 9:35 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

>
> On 12 Oct 2013 12:56, "Neil Girdhar" <mistersheik at gmail.com> wrote:
> >
> > I honestly think that Python should stick to the mathematical definition
> of permutations rather than some kind of consensus of the tiny minority of
> people here.  When next_permutation was added to C++, I believe the whole
> standards committee discussed it and they came up with the thing that makes
> the most sense.  The fact that dict and set use equality is I think the
> reason that permutations should use equality.
>
> Why should the behaviour of hash based containers limit the behaviour of
> itertools?
>
> Python required a permutation solution that is memory efficient and works
> with arbitrary objects, so that's what itertools provides.
>
> However, you'd also like a memory efficient iterator for *mathematical*
> permutations that pays attention to object values and filters out
> equivalent results.
>
> I *believe* the request is equivalent to giving a name to the following
> genexp:
>
>     (k for k, grp in groupby(permutations(sorted(input))))
>
> That's a reasonable enough request (although perhaps more suited to the
> recipes section in the itertools docs), but conflating it with complaints
> about the way the existing iterator works is a good way to get people to
> ignore you (especially now the language specific reasons for the current
> behaviour have been pointed out, along with confirmation of the fact that
> backwards compatibility requirements would prohibit changing it even if we
> wanted to).
>
> Cheers,
> Nick.
>
> >
> > Neil
> >
> >
> > On Fri, Oct 11, 2013 at 10:48 PM, David Mertz <mertz at gnosis.cx> wrote:
> >>
> >> Related to, but not quite the same as Steven D'Aprano's point, I would
> find it very strange for itertools.permutations() to return a list that was
> narrowed to equal-but-not-identical items.
> >>
> >> This is why I've raised the example of 'items=[Fraction(3,1),
> Decimal(3.0), 3.0]' several times.  I've created the Fraction, Decimal, and
> float for distinct reasons to get different behaviors and available
> methods.  When I want to look for the permutations of those I don't want
> "any old random choice of equal values" since presumably I've given them a
> type for a reason.
> >>
> >> On the other hand, I can see a little bit of sense that
> 'itertools.permutations([3,3,3,3,3,3,3])' doesn't *really* need to tell me
> a list of 7!==5040 things that are exactly the same as each other.  On the
> other hand, I don't know how to generalize that, since my feeling is far
> less clear for 'itertools.permutations([1,2,3,4,5,6,6])' ... there's
> redundancy, but there's also important information in the probability and
> count of specific sequences.
> >>
> >> My feeling, however, is that if one were to trim down the results from
> a permutations-related function, it is more interesting to me to only
> eliminate IDENTICAL items, not to eliminate merely EQUAL ones.
> >>
> >>
> >> On Fri, Oct 11, 2013 at 7:37 PM, Neil Girdhar <mistersheik at gmail.com>
> wrote:
> >>>
> >>> I think it's pretty indisputable that permutations are formally
> defined this way (and I challenge you to find a source that doesn't agree
> with that).  I'm sure you know that your idea of using permutations to
> evaluate a multinomial distribution is not efficient.  A nicer way to
> evaluate probabilities is to pass your set through a collections.Counter,
> and then use the resulting dictionary with scipy.stats.multinomial (if it
> exists yet).
> >>>
> >>> I believe most people will be surprised that
> len(permutations(iterable)) does count unique permutations.
> >>>
> >>> Best,
> >>>
> >>> Neil
> >>>
> >>>
> >>> On Fri, Oct 11, 2013 at 10:06 PM, Steven D'Aprano <steve at pearwood.info>
> wrote:
> >>>>
> >>>> On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote:
> >>>> > "It is universally agreed that a list of n distinct symbols has n!
> >>>> > permutations. However, when the symbols are not distinct, the most
> common
> >>>> > convention, in mathematics and elsewhere, seems to be to count only
> >>>> > distinct permutations." —
> >>>>
> >>>> I dispute this entire premise. Take a simple (and stereotypical)
> >>>> example, picking balls from an urn.
> >>>>
> >>>> Say that you have three Red and Two black balls, and randomly select
> >>>> without replacement. If you count only unique permutations, you get
> only
> >>>> four possibilities:
> >>>>
> >>>> py> set(''.join(t) for t in itertools.permutations('RRRBB', 2))
> >>>> {'BR', 'RB', 'RR', 'BB'}
> >>>>
> >>>> which implies that drawing RR is no more likely than drawing BB, which
> >>>> is incorrect. The right way to model this experiment is not to count
> >>>> distinct permutations, but actual permutations:
> >>>>
> >>>> py> list(''.join(t) for t in itertools.permutations('RRRBB', 2))
> >>>> ['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB',
> 'RB',
> >>>> 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB']
> >>>>
> >>>> which makes it clear that there are two ways of drawing BB compared to
> >>>> six ways of drawing RR. If that's not obvious enough, consider the
> case
> >>>> where you have two thousand red balls and two black balls -- do you
> >>>> really conclude that there are the same number of ways to pick RR as
> BB?
> >>>>
> >>>> So I disagree that counting only distinct permutations is the most
> >>>> useful or common convention. If you're permuting a collection of
> >>>> non-distinct values, you should expect non-distinct permutations.
> >>>>
> >>>> I'm trying to think of a realistic, physical situation where you would
> >>>> only want distinct permutations, and I can't.
> >>>>
> >>>>
> >>>> > Should we consider fixing itertools.permutations and to output only
> unique
> >>>> > permutations (if possible, although I realize that would break
> code).
> >>>>
> >>>> Absolutely not. Even if you were right that it should return unique
> >>>> permutations, and I strongly disagree that you were, the fact that it
> >>>> would break code is a deal-breaker.
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Steven
> >>>> _______________________________________________
> >>>> Python-ideas mailing list
> >>>> Python-ideas at python.org
> >>>> https://mail.python.org/mailman/listinfo/python-ideas
> >>>>
> >>>> --
> >>>>
> >>>> ---
> >>>> You received this message because you are subscribed to a topic in
> the Google Groups "python-ideas" group.
> >>>> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe.
> >>>> To unsubscribe from this group and all its topics, send an email to
> python-ideas+unsubscribe at googlegroups.com.
> >>>> For more options, visit https://groups.google.com/groups/opt_out.
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Python-ideas mailing list
> >>> Python-ideas at python.org
> >>> https://mail.python.org/mailman/listinfo/python-ideas
> >>>
> >>
> >>
> >>
> >> --
> >> Keeping medicines from the bloodstreams of the sick; food
> >> from the bellies of the hungry; books from the hands of the
> >> uneducated; technology from the underdeveloped; and putting
> >> advocates of freedom in prisons.  Intellectual property is
> >> to the 21st century what the slave trade was to the 16th.
> >
> >
> >
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at python.org
> > https://mail.python.org/mailman/listinfo/python-ideas
> >
>

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20131011/b4d4965a/attachment-0001.html>