suggestion for a small addition to the Python 3 list class

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Apr 21 15:53:27 EDT 2013


On Sun, 21 Apr 2013 09:09:20 -0500, Robert Yacobellis wrote:

> Greetings,
>  
> I'm an instructor of Computer Science at Loyola University, Chicago, and
> I and Dr. Harrington (copied on this email) teach sections of COMP 150,
> Introduction to Computing, using Python 3.  One of the concepts we teach
> students is the str methods split() and join().  I have a suggestion for
> a small addition to the list class: add a join() method to lists.  It
> would work in a similar way to how join works for str's, except that the
> object and method parameter would be reversed: <list object>.join(<str
> object>).

That proposed interface doesn't make much sense to me. You're performing 
a string operation ("make a new string, using this string as a 
separator") not a list operation, so it's not really appropriate as a 
list method. It makes much more sense as a string method.

It is also much more practical as a string method. This way, only two 
objects need a join method: strings, and bytes (or if you prefer, Unicode 
strings and byte strings). Otherwise, you would need to duplicate the 
method in every possible iterable object:

- lists
- tuples
- dicts
- OrderedDicts
- sets
- frozensets
- iterators
- generators
- every object that obeys the sequence protocol
- every object that obeys the iterator protocol

(with the exception of iterable objects such as range objects that cannot 
contain strings). Every object would have to contain code that does 
exactly the same thing in every detail: walk the iterable, checking that 
each item is a string, and build up a new string with the given separator:

class list: # also tuple, dict, set, frozenset, etc...
    def join(self, separator):
        ...


Not only does that create a lot of duplicated code, but it also increases 
the burden on anyone creating an iterable class, including iterators and 
sequences. Anyone who writes their own iterable class has to write their 
own join method, which is actually trickier than it seems at first 
glance. (See below.)
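Because str.join accepts any iterable, a class that only implements the iterator protocol can already be joined without writing a join method of its own. A minimal sketch; the Countdown class is a hypothetical example, not something from the original proposal:

```python
class Countdown:
    """Hypothetical iterable that yields the numbers n..1 as strings."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # A generator method is enough to satisfy the iterator protocol.
        for i in range(self.n, 0, -1):
            yield str(i)


# Countdown defines no join method; str.join does all the work.
print(", ".join(Countdown(3)))  # prints 3, 2, 1
```

The author of Countdown never had to think about separators, type checks or efficient string building.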

Any half-decent programmer would recognise the duplicated code and factor 
it out into an external function that takes a separator and an iterable 
object:

def join(iterable, separator):
    # common code goes here... it's *all* common code, every object's
    # join method is identical
    ...


That's exactly what Python already does, except it swaps the order of the 
arguments:

def join(separator, iterable):
    ...


and promotes it to a method on strings instead of a bare function.
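Because that one method lives on the separator, it works with every kind of iterable in the list above, and bytes has a matching method for byte strings. A short illustration (the dict example relies on insertion ordering, which is guaranteed from Python 3.7 onward):

```python
words = ["spam", "eggs", "ham"]

print("-".join(words))                     # list: spam-eggs-ham
print("-".join(("spam", "eggs")))          # tuple: spam-eggs
print("-".join({"spam": 1, "eggs": 2}))    # dict, joins the keys: spam-eggs
print("-".join(sorted({"spam", "eggs"})))  # set, sorted for a stable order: eggs-spam
print("-".join(w.upper() for w in words))  # generator: SPAM-EGGS-HAM
print(b"-".join([b"spam", b"eggs"]))       # bytes: b'spam-eggs'
```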


> Rationale: When I teach students about split(), I can intuitively tell
> them split() splits the string on its left on white space or a specified
> string.  Explaining the current str join() method to them doesn't seem
> to make as much sense: use the string on the left to join the items in
> the list??

Yes, exactly. Makes perfect sense to me.


> If the list class had a join method, it would be more
> intuitive to say "join the items in the list using the specified string
> (the method's argument)."

You can still say that. You just have to move the parenthetical aside:

"Join the items in the list (the method's argument) using the specified 
string."
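For teaching, it may also help that the two methods are inverses when an explicit separator is used: split breaks a string apart on the separator, and join, called on that same separator, puts the pieces back together. A small sketch:

```python
line = "red,green,,blue"
sep = ","

parts = line.split(sep)    # ['red', 'green', '', 'blue']
rebuilt = sep.join(parts)  # 'red,green,,blue'
assert rebuilt == line

# With no argument, split() treats runs of whitespace as a single
# separator and drops leading/trailing whitespace, so the round trip
# is not exact in that case.
messy = "  red   green "
print(" ".join(messy.split()))  # prints red green
```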



> This is similar to Scala's List mkString() method.


This is one place where Scala gets it wrong. In my opinion, as a list 
method, mkString ought to operate on the entire list, not its individual 
items. The nearest equivalent in Python would be converting a list to a 
string using the repr() or str() functions:

py> str([1, 2, 3])
'[1, 2, 3]'


(which of course call the special methods __repr__ or __str__ on the 
list).
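For comparison, what Scala's mkString(sep) does to a list of arbitrary items (convert each item to a string, then join them with the separator) can be written in Python by converting the items explicitly before calling str.join:

```python
items = [1, 2, 3]

print(str(items))                  # the whole list as a string: [1, 2, 3]
print(", ".join(map(str, items)))  # each item converted, then joined: 1, 2, 3
```

Keeping the conversion explicit means str.join itself never has to guess how non-string items should be rendered.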


> I've attached a proposed implementation in Python code which is a little
> more general than what I've described.  In this implementation the list
> can contain elements of any type, and the separator can also be any data
> type, not just str.

Just for the record, the implementation you provide will be O(N**2) due 
to the repeated string concatenation, which means it will be *horribly* 
slow for large enough lists. It's actually quite difficult to efficiently 
join a lot of strings without using the str.join method. Repeated string 
concatenation will, in general, be slow due to the repeated copying of 
intermediate results.
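The attached implementation isn't reproduced here, so the following is only a hypothetical reconstruction of the repeated-concatenation pattern, next to the str.join version. In general each `+=` can copy everything built so far, for O(N**2) total work. CPython can sometimes resize a string in place when nothing else refers to it, so the slowdown is not guaranteed to show up on every interpreter or workload. No timings are claimed here; run it and see.

```python
import timeit


def join_by_concatenation(items, sep):
    """Hypothetical naive join: builds the result one piece at a time."""
    result = ""
    for i, item in enumerate(items):
        if i:
            result += sep
        # In general, each += may copy the whole intermediate string.
        result += str(item)
    return result


def join_by_str_join(items, sep):
    """Convert each item to str, then let str.join build the result."""
    return sep.join(map(str, items))


items = list(range(10_000))
assert join_by_concatenation(items, ",") == join_by_str_join(items, ",")

for func in (join_by_concatenation, join_by_str_join):
    seconds = timeit.timeit(lambda: func(items, ","), number=5)
    print(f"{func.__name__}: {seconds:.4f}s")
```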

By shifting the burden of writing a join method onto everyone who creates 
a sequence type, we would end up with a lot of slow code.

If you must have a convenience (inconvenience?) method on lists, the 
right way to do it is like this:

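class list2(list):
    def join(self, sep=' '):
        # Delegate to str.join or bytes.join, which already build the
        # result efficiently and raise TypeError for non-string items.
        if isinstance(sep, (str, bytes)):
            return sep.join(self)
        raise TypeError('separator must be str or bytes, not %s'
                        % type(sep).__name__)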





-- 
Steven



More information about the Python-list mailing list