[Tutor] feedback on permutations script
Steven D'Aprano
steve at pearwood.info
Sun Jul 11 05:29:58 CEST 2010
On Sun, 11 Jul 2010 02:48:38 am Isaac wrote:
> This is my interpretation of an algorithm to generate all
> permutations of items in a list. I would appreciate any feedback
> regarding advice for improvements.
Some of this is going to just be personal style, some of this is going
to be slightly stronger semi-official recommended style (PEP 8 if you
want to google further), and right at the end some hints regarding the
algorith.
> #!/usr/bin/python
If you're on Linux, it is generally recommended to use a shebang line
of "#!/usr/bin/env python". Python can be installed in different
places, while /usr/bin is the standard location for env. Of course,
this isn't entirely portable either...
> def add_at_index(new_item, target_array, target_index):
> """
> new_item is a single list item.
> target_array is a list or iterable.
> target_index is a number.
>
> This function returns a new list that inserts new_item inside
> target_array at target_array's target_index. The new list will be 1
> element longer than before.
> """
>
> new_list = target_array[:]
>
> new_list.insert(target_index, new_item)
> return new_list
This has misleading function and argument names: it's called
add_at_index and takes an argument called target_array, but:
(1) target_array is not expected to be an array, but a list. People
forget that Python *does* have a standard array type. Just import array
to be reminded :)
(2) target_array is documented as being any iterable, but that's simply
not correct. Try passing a tuple, string or iterator and watch it
explode.
(3) The function and argument names imply that it *adds* to a *target*,
but that's not what it does. It makes a copy. Unfortunately Python
doesn't have a common convention for indicating functions that make a
copy.
(4) target_index is said to be a number. Can you pass floats? Fractions?
Decimals? Complex numbers?
(5) And the function is awfully verbose for such a tiny little thing.
I'm not very happy with the function name, but can't think of anything
better that's not excessively long, so I'll keep the name. Here is how
I would re-write the code:
def add_at_index(item, target, where):
"""Return a copy of target as a list and insert item at index where.
item: any object
target: any finite iterable
where: an integer index
"""
new_list = list(target)
new_list.insert(where, item)
return new_list
> def append_lists(working_list, list_of_lists):
> # appends every list in list_of_lists to working_list; returns
> working_list
> for _list in list_of_lists:
> working_list.append(_list)
> return working_list
You like those verbose variable names, don't you? :)
You have a comment, but no doc-string. Is that deliberate?
You have named a local variable "_list". This isn't *wrong* exactly, but
it's very unusual. The standard convention in Python is that names
starting with a single underscore are "private" names, but local
variables inside a function are always private! Nothing outside of the
function can get access to or see that variable, so marking it private
is unnecessary.
If your only reason for using the underscore is to distinguish the
variable from the built-in "list", then consider these alternatives:
* since this variable is just a generic name, you can use any
old generic name: L, mylist, alist, seq, ...
* where you absolutely have to use the word "list", the usual
Python convention to avoid shadowing the built-in is to
append the underscore to the end of the name: list_
* in a function this small, where you don't use the built-in
list for anything, feel free to shadow the built-in. It won't
hurt, although some people will hate it, and code checkers like
PyLint or PyChecker may complain.
According to the comment and the name, list_of_lists must contain only
lists. But why? There doesn't seem to be anything in the code that
*requires* the items to be lists. This seems to be more of a
multi-append than an append-lists function. So re-writing it:
def multi_append(target, items):
"""Return target list with each item of items appended.
target: a list
items: any sequence or iterable
"""
for item in items:
target.append(item)
return target
But this function is entirely unnecessary, because that's exactly what
the "extend" method does:
target.extend(items)
> def insert_each(orphan_item, target_array):
> """
> orphan_item is a single list item.
> target_array is a list or iterable.
>
> This function returns a list of lists that place orphan_item in
> between and at
> the ends of each list element in target_array
> """
>
> new_array_length = len(target_array) + 1
> array_of_arrays = []
>
> for count in range(new_array_length):
> array_of_arrays.append(add_at_index(orphan_item,
> target_array, count))
>
> return array_of_arrays
The description of this function is complicated enough that it needs an
example in the doc-string. Here's my version:
def insert_each(item, target):
"""Return a list of lists generated from target list by
inserting item into each position of the copy:
>>> insert_each(0, [1, 2, 3])
[[0, 1, 2, 3], [1, 0, 2, 3], [1, 2, 0, 3], [1, 2, 3, 0]]
item: any object
target: any finite sequence or iterable.
"""
result = []
for i in range(len(target) + 1):
new_list = add_at_index(item, target, i)
result.append(new_list)
return result
> def permute(original_list, list_of_lists):
> """
> accept a list of items, original_list, to insert one at a time
> into list_of_lists
> using the function insert_each. Called for the first time,
> new_list is a list containing an empty list. Then call
> recursively using an updated list_of_lists, called new_list_of_lists
> below, and the remaining items in original list.
> """
This tells *how* the function works, but not *why*, or *what* it is
supposed to do. It says to set new_list to [[]] but there is no
argument new_list, so what is the user supposed to do with it?
Similarly, your variable names tend to describe what they are made of,
rather than what they represent. "list_of_lists" isn't just some
arbitrary list containing other lists. It is a list of permutations.
The fact that each permutation is a list is unimportant.
If the user must call a function with a particular argument, then don't
make that a requirement. Just have the function create it itself.
There's no need to have the comments "call recursively". Presumably the
reader can actually read. If you're reading a function called "permute"
and see a call to permute(), then it's pretty safe to bet that it's a
recursive call. That's not quite so bad as the archetypal Bad Comment:
x = x+1 # add one to x
Wow, thanks Captain Obvious, what would we do without you?
*rolls eyes*
but it is approaching it. The only time it does need a comment is if,
somehow, it ISN'T a recursive call.
So, re-writing using the updated functions above and (IMO) more sensible
variable names, but leaving the basic algorithm alone:
def permute(seq, _permutations=None):
"""Return a list of all the permutations of seq.
seq: a finite sequence or iterable.
_permutations: private argument, do not supply this.
>>> permute([1, 2, 3])
[[1, 2, 3], [2, 1, 3], [2, 3, 1], ... [3, 2, 1]]
"""
if _permutations is None:
_permutations = [[]]
if not isinstance(seq, list):
seq = list(seq)
try:
item = seq.pop()
except IndexError:
# The final iteration has been completed, so we're done.
return _permutations
if _permutations == [[]]:
_permutations = [[item]]
else:
temp = []
for perm in _permutations[:]:
perm = insert_each(item, perm)
temp.extend(perm)
_permutations = temp
return permute(seq, _permutations)
A couple of comments:
* This is going to be inefficient even for quite small sizes
of the input list. It seems to me that you're doing a lot
of unnecessary copying of lists and inserting. But don't
take that as gospel, I'd need to think about it a bit more
to be sure of that.
* However the basic approach -- to generate all the
permutations at once -- truly is very inefficient. The
number of permutations grows exponentially fast, and so do
the time and memory requirements. A better approach is to
generate the permutations lazily, one at a time: see Python
generators and the yield statement.
* Recursion in Python isn't as efficient as some other
languages, so this will be unnecessarily slow compared to
a version using iteration. For small lists you won't notice
the difference; for large lists you'll run out of memory
first; but for intermediate lists you could see some gains
by re-writing using a loop.
--
Steven D'Aprano
More information about the Tutor
mailing list