[Python-ideas] The non-obvious nature of str.join (was Re: sum(...) limitation)

Stephen J. Turnbull stephen at xemacs.org
Wed Aug 13 06:38:25 CEST 2014


Wolfgang Maier writes:

 > Exactly. So my point was that when you don't subclass str, but instead 
 > use a wrapper around it, you can give it as str-like an interface as you 
 > want so the thing looks and feels like a string to users, it will still 
 > not work as part of an iterable passed to .join

You mean this behavior?

wideload:~ 12:42$ python3.2           
>>> class N:
...  def __init__(self, s=''):           
...   self.s = s
...  def __str__(self):
...   return self.s
... 
>>> " ".join(['a', N('b')])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 1: expected str instance, N found
>>> ' '.join(str(x) for x in ['a', N('b')])
'a b'
>>> 

Given that every object is str-able, I don't think we want to
give "str(x) for x in" semantics to str.join.  So I think the answer
is "if you want N to automatically acquire all the behaviors of
str, make it a subclass of str".

I can't think of a use case where subclassing would be problematic.
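A minimal sketch of the subclassing route, assuming Python 3. The `tag` attribute is a hypothetical extra field, added only to show that a str subclass can still carry its own data. Because str is immutable, the value has to be supplied in `__new__` rather than in `__init__` as in the wrapper above.

```python
class N(str):
    """Hypothetical str subclass: instances *are* str objects,
    so str.join accepts them without any conversion."""

    def __new__(cls, s='', tag=None):
        # str is immutable, so the string value is fixed here, not in __init__
        obj = super().__new__(cls, s)
        obj.tag = tag  # subclasses without __slots__ still get a __dict__
        return obj


joined = " ".join(['a', N('b', tag='demo')])
print(joined)  # a b
```

No explicit str(x) conversion is needed, because the join check is an isinstance test that the subclass passes.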

 > Sum, on the other hand, knows how to use .__add__ and .__radd__.

It seems to me that that's a strong argument against "summing strings"
with the current implementation of sum(), given how easy it is to
construct types for which the "sum" of an iterable can be implemented
efficiently and gives the same answer as the generic algorithm based
on '+', while the generic algorithm itself is inefficient (just make
the type immutable).
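To make the efficiency point concrete, here is a sketch with a hypothetical immutable type, `Frozen` (not from the thread), whose '+' always builds a new tuple and tallies how many elements it copies. sum() accepts it as a start value because, as far as I know, CPython only refuses str, bytes and bytearray starts. The "join-style" function is likewise a hypothetical illustration.

```python
class Frozen:
    """Hypothetical immutable sequence wrapper. Every '+' builds a new
    tuple, copying both operands; `copied` tallies elements copied."""
    copied = 0

    def __init__(self, items=()):
        self.items = tuple(items)

    def __add__(self, other):
        Frozen.copied += len(self.items) + len(other.items)
        return Frozen(self.items + other.items)


def join_style(parts):
    # Collect everything into one buffer, then build the result once.
    buf = []
    for p in parts:
        buf.extend(p.items)
    Frozen.copied += len(buf)
    return Frozen(buf)


parts = [Frozen([i]) for i in range(1000)]

Frozen.copied = 0
slow = sum(parts, Frozen())  # generic '+'-based algorithm
slow_cost = Frozen.copied

Frozen.copied = 0
fast = join_style(parts)
fast_cost = Frozen.copied

print(slow.items == fast.items, slow_cost, fast_cost)  # True 500500 1000
```

Both give the same answer, but the generic '+' loop copies 1 + 2 + ... + 1000 elements (quadratic), while the join-style version copies each element once.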

I suppose most Sequence types are arrays of pointers at the C level,
or otherwise implement '+=' in place (without copying the left
operand), so either the join-style "just memmove the arrays into a
sufficiently large buffer" approach, or iterated '+=', does the trick
for an efficient generic sum.

This is just guesswork, though.
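One way to read the "join-style or iterated '+='" idea, as a hypothetical sketch (`efficient_sum` is not a real builtin and not a proposal from the thread): use the type's own join for str and bytes, and otherwise copy `start` once and extend the copy with '+='. It only stays linear when '+=' is genuinely in place, as for list; for tuple, '+=' rebinds to a new object on each step, so it degrades back to the quadratic behavior.

```python
import copy


def efficient_sum(iterable, start):
    """Hypothetical linear-time sum for sequences (not the builtin sum)."""
    if isinstance(start, str):
        # join-style: str.join sizes one buffer and copies each piece once
        return start + "".join(iterable)
    if isinstance(start, bytes):
        return start + b"".join(iterable)
    # Copy once so the caller's start object is never mutated...
    result = copy.copy(start)
    for item in iterable:
        # ...then extend in place; linear only if '+=' really is in place
        result += item
    return result


print(efficient_sum(["a", "b", "c"], ""))  # abc
print(efficient_sum([[1], [2, 3]], [0]))   # [0, 1, 2, 3]
```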