I come to praise .join, not to bury it...

Alex Martelli aleaxit at yahoo.com
Wed Mar 7 04:54:15 EST 2001


"Greg Ewing" <greg at cosc.canterbury.ac.nz> wrote in message
news:3AA58BD0.3324C709 at cosc.canterbury.ac.nz...
> Alex Martelli wrote:
> >
> > No!  string_join (in stringobject.c) currently just _checks_
> > PyString_Check on the items -- if any item fails that, it
> > delegates to PyUnicode_Join (if PyUnicode_Check OK's that).
>
> And that's somehow not a typeswitch? If it's a regular
> string do this... if it's a Unicode string do that...
> smells like a typeswitch to me...

It's (needlessly) coded as a typeswitch in the Python 2
sources for string_join, yes.  The correct coding to
reflect the design would be to use the existing standard
interface to ask each item to _behave like_ a string,
rather than resorting to the "IS of identity" -- Korzybski's
intimations against it are perfectly valid in the context
of polymorphic programming, no less than in epistemology;
in fact, just about any test for 'IS object X of exact type
Y' you can find in the Python sources, unless they are
shortcut-like 'accelerators' for a special case _before_
the more-general request 'please o mr X try to behave like
a Y, are you able to?', can be seen as implementation
defects, in my personal opinion.
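
To make the distinction concrete, here is a minimal sketch in present-day
Python rather than in the C of stringobject.c. The names Tag, as_text,
join_by_identity and join_by_behaviour are hypothetical, and str() merely
stands in for "the existing standard interface"; the actual C-level
interface is not shown here.

```python
class Tag:
    """Hypothetical non-str object that can still behave like a string."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return "<%s>" % self.name


def join_by_identity(sep, items):
    # Typeswitch style: refuse any item that IS not exactly a str.
    items = list(items)
    for item in items:
        if type(item) is not str:
            raise TypeError("expected str, got %s" % type(item).__name__)
    return sep.join(items)


def as_text(item):
    # The exact-type test is acceptable here only as an accelerator:
    # it sits in front of the general request, it does not replace it.
    if type(item) is str:
        return item
    return str(item)  # general request: please behave like a string


def join_by_behaviour(sep, items):
    # Interface style: every item is asked to behave like a string.
    return sep.join([as_text(item) for item in items])
```

With these definitions, join_by_behaviour(", ", ["a", Tag("b")]) yields
"a, <b>", while join_by_identity raises TypeError on the same input.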

But we're arguing about the DESIGN, not the _current
implementation_ thereof, with its limitations and bugs.


> You seemed to be arguing that dispatching on the joiner
> somehow resulted in an efficiency gain by going straight
> to the right code for the kind of strings being joined,
> and I was pointing out that that's not the case. But
> maybe I misunderstood what you were saying.

The joiner-dispatch DESIGN *affords* efficiency gains;
the implementation, at one stage, may not take advantage
of the enabling-effects of a good design, but that's no
reason to make the design worse, contorting it to reflect
the current limitations of an implementation!


> > You gain a clean approach, which the implementation could
> > (and should) exploit to access the items in the sequence
> > through the _appropriate_ standard interface
>
> It's only clean according to one measure of cleanness.
> It could be considered cleaner to treat all the arguments
> uniformly, and access them *all* through a standard
> interface.

And *WHAT* 'standard interface' would apply on the joiner
object, pray?  The only response I see that makes sense is
"a joiner interface" -- something that the joiner object
exposes, which has a method to be called to do the join.
Which, surprise surprise, is EXACTLY what we have now --
the method in question being called 'join' (why not?!).

Note that pairwise-joining of items is NOT sufficient,
because of efficiency considerations -- O(N) versus
O(N squared).  (In other terms: general joining of a
sequence may be _functionally_ constructed by iterating
a join-two-items primitive, BUT that is unacceptable as
it leads to O(N*N) performance).
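
A small sketch of the two constructions, in present-day Python. The
function names are hypothetical, and the cost remarks in the comments
describe the general case of immutable strings, not a measurement.

```python
from functools import reduce


def join_pairwise(sep, items):
    # Functionally correct: iterate a join-two-items primitive.
    # Each '+' builds a new string and copies everything accumulated
    # so far, so total copying grows roughly as N*N for N items of
    # similar size.
    items = list(items)
    if not items:
        return ""
    return reduce(lambda acc, item: acc + sep + item, items)


def join_all_at_once(sep, items):
    # The joiner sees the whole sequence in one call, so it is free
    # to size the result once and copy each item once.
    return sep.join(items)
```

Both functions return the same string for the same input; only the
amount of copying differs, which is why the primitive must take the
whole sequence and not just two items.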


> > There are NO technical costs associated with these
> > advantages.
>
> I disagree. I've mentioned one already -- it bloats the
> string object, and makes the Python core larger and
> less modular.

The string object and the string module are both in
the Python core, so moving functionality between them
can in no way be considered 'bloating'.  Somebody
who's _subsetting_ the Python core is (by definition)
operating outside of Python's problem-space, and
his or her problems are not technical Python problems.


> > Only technical advantages and costs
> > can be fairly put on the balance
>
> I disagree with that, too. I think it's quite possible
> and appropriate for a small technical advantage to be
> outweighed by a larger aesthetic one. For example, there
> are technical advantages to braces instead of indentation:
> the scanner is easier to implement, it's easier to write
> programs which generate code, and there's no chance of
> space-tab confusion. On the indentation side, there's the
> advantage that indentation and block structure always
> agree. If you're counting technical advantages, that's
> one in favour of indentation and three against. But I
> doubt Guido would switch on that basis.

'Counting' technical advantages is as idiotic as 'counting'
coins as a measure of wealth -- if I have one 1-dollar
coin, and you have three 5-cent ones, such an idiot counter
would conclude that you must be richer than me by a
three to one margin.  *Obviously*, the balancing of
technical considerations must take account of their
importance and relevance; a 'count' is inappropriate.

For the brace/indentation case, I would count only the
space/tab issue as being of any substantial worth -- the
minuscule extra workload on the scanner, and the need
for the output-stage of source-generators to keep track
of nesting (advisable on other grounds anyway) being
absolutely-unessential trifles.  As the space/tab issue
can easily be ameliorated via tabnanny & friends, I judge
the blocking==indenting advantage strongly dominant (and
I'm happy Guido's technical evaluation is the same:-).


> > Note that I've seen NO arguments AT ALL for having
> > .join be a method on SEQUENCE object ... despite this
> > being often mentioned on an irrational-purely-aesthetical
> > basis.
>
> I don't think the basis is irrational at all. It
> seems to me that there are two quite distinct possible
> reasons for deciding to make something a method of a
> particular object.
>
> One is to get polymorphism on that object. The other

So far, so good.

> is because the object makes sense as the direct object
> (in the English grammar sense) of the operation, so
> that you can naturally read x.verb(y) as "verb x
> with y" or "verb x using y".

And here, we disagree VERY deeply.  Natural language is
NO sensible basis on which to approach software design!
Please re-read Wittgenstein's works (the Philosophical
Investigations will suffice) for many deep, convincing
arguments on the issue -- W's youthful hopes to clean up
natural language of 'inappropriate superstructure' and
make it a usably solid basis for anything rigorous (as
vigorously, even brilliantly, pursued in the Tractatus)
being inevitably dashed by the deep nature of language.

Yeah, yeah, I know -- the world is full of assassinated
trees, tragically chomped into pulp for the purpose of
making paper on which such absurdities as basing software
design on natural-language foibles are _advised_.  But
then, in the literature for any subject you can find many
widespread examples of both obvious and subtle absurdities.

When you do O-O design in the context of a language that
does NOT force you to express any function as a method
of something, x.verb(y) MEANS that object x has some say
(which it may of course choose to delegate) on how verb
is implemented, while object y does not necessarily have
any such implementation-freedom aspect regarding verb.
(This assumes a single-dispatch setting -- multimethods,
such as Dylan's, may change the picture drastically).
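
As an illustration of the receiver "having a say", here is a sketch
with a hypothetical joiner class, OxfordJoiner, and a hypothetical
helper, render. Neither is part of Python; the only assumption about
a joiner is that it exposes a join method taking a sequence of strings.

```python
class OxfordJoiner:
    """Hypothetical joiner: commas, with 'and' before the last item."""

    def join(self, items):
        items = list(items)
        if len(items) < 3:
            return " and ".join(items)
        return ", ".join(items[:-1]) + ", and " + items[-1]


def render(joiner, items):
    # Single dispatch: the joiner decides how 'join' is carried out;
    # the items are just data handed to it.
    return joiner.join(items)
```

render(", ", ["a", "b", "c"]) gives "a, b, c", while
render(OxfordJoiner(), ["a", "b", "c"]) gives "a, b, and c". The
caller is unchanged, and the sequence has no part in the choice.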

If the language forces you to express everything as a
method, then the implications are correspondingly weaker --
but, still there to some extent, since you MIGHT have
chosen to introduce a third-party object to act (to all
intents and purposes) as a 'collection of functions', to
be able to write functions.verb(x,y) and avoid the
(supposedly undesired) asymmetry between x and y's roles.


> When these two considerations pull in the same
> direction, all is well. And with most of the string
> methods they do, but in the case of join, they don't.
> There are good reasons for not making it a method of
> the sequence, but when it is made a method of the
> joiner, the resulting expression sounds backwards[1].

Suppose I'm building a line of a CSV file, a popular
textual format where fields are separated (aka joined)
with commas -- the acronym stands for 'comma-separated
values'.  So, what I want to do is
    'comma-separate these values'
and, since 'separate' equates to join, this equates to:
    'comma-join these values'
so, I code:
    comma.join(these_values)
*WHAT'S BACKWARDS ABOUT IT*?!  (Please note that the
CSV acronym IS an English one -- maybe _your_ English
native-speaker preference would have been for 'VSWC',
'Values Separated With Commas', but English native
speakers from _North_ of the Equator coined this term,
so, maybe it's a Coriolis-effect issue...?-).
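
Spelled out as something runnable, with sample values that are purely
illustrative (real CSV output also needs quoting of fields that
themselves contain commas, which this sketch ignores):

```python
comma = ","
these_values = ["spam", "eggs", "42"]  # illustrative sample fields

# 'comma-join these values'
line = comma.join(these_values)
```

Here line ends up as the single string "spam,eggs,42".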


> So we are faced with two alternatives, neither
> of which are entirely good. In such cases, my
> preference is not to choose either alternative --
> i.e. make join a function, not a method.

A definite technical loss on the altar of endlessly
debatable aesthetic grounds?  ***Include me out***!


Alex





