[Python-Dev] Re: [Python-checkins] python/dist/src/Lib/test test_string.py, 1.25, 1.26

Tim Peters tim.peters at gmail.com
Thu Aug 26 20:21:25 CEST 2004


[Walter Dörwald]
> I'm working on it, however I discovered that unicode.join()
> doesn't optimize this special case:
>
> s = "foo"
> assert "".join([s]) is s
>
> u = u"foo"
> assert u"".join([s]) is s
> 
> The second assertion fails.

Well, in that example it *has* to fail, because the input (s) wasn't a
unicode string to begin with, but u"".join() must return a unicode
string.  Maybe you intended to say that

    assert u"".join([u]) is u

fails (which is also true today, but doesn't need to be true tomorrow).
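
As a minimal sketch of the check being discussed, here it is for
present-day Python 3, where str is already the unicode type (an
assumption; the thread itself is about the separate str and unicode
types of Python 2).  The helper name join_returns_same_object is
hypothetical.  Only equality is guaranteed by the language; whether
the identical object comes back is an implementation detail, so the
sketch reports it rather than asserting it.

```python
def join_returns_same_object(sep, item):
    """Return True if sep.join([item]) hands back the very same object.

    Hypothetical helper for illustration; the result depends on the
    interpreter's join() implementation, not on the language definition.
    """
    return sep.join([item]) is item


s = "foo"
joined = "".join([s])

# Equality always holds for a one-item join.
assert joined == s

# Identity is the optimization in question: it may be True or False
# depending on the implementation, so it is only recorded here.
same_object = join_returns_same_object("", s)
print("one-item join returned the same object:", same_object)
```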

> I'd say that this test (joining a one item sequence returns
> the item itself) should be removed because it tests an
> implementation detail.

Nevertheless, it's an important pragmatic detail.  We should never throw
away a test just because rearrangement makes a test less convenient.

> I'm not sure, whether the optimization should be added to
> unicode.find().

Believing you mean join(), yes.  Doing common endcases efficiently in
C code is an important quality-of-implementation concern, lest people
need to add reams of optimization test-&-branch guesses in their own
Python code.  For example, the SpamBayes tokenizer has many passes
that split input strings on magical separators of one kind or another,
pasting the remaining pieces together again via string.join().  It's
explicitly noted in the code that special-casing the snot out of
"separator wasn't found" in Python is a lot slower than letting
string.join(single_element_list) just return the list element, so that
simple, uniform Python code works well in all cases.  It's expected
that *most* of these SB passes won't find the separator they're
looking for, and it's important not to make endless copies of
unboundedly large strings in the expected case.  The more heavily used
unicode strings become, the more important it is that they treat users
kindly in such cases too.
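
The split-and-rejoin pattern described above can be sketched roughly
as follows.  This is not SpamBayes code: the function name
remove_separator and the sample inputs are made up for illustration,
and it is written for Python 3.  The point is that the same uniform
line of code handles both the "separator found" and the "separator not
found" cases, leaving it to join() to do something cheap when split()
produces a one-element list.

```python
def remove_separator(text, sep):
    """Split text on sep and paste the pieces back together.

    Hypothetical example.  When sep does not occur in text, split()
    yields a one-element list, and an implementation that special-cases
    a one-item join can return that element without copying it.
    """
    return "".join(text.split(sep))


# Separator present: the pieces are pasted together without it.
found = remove_separator("a--b--c", "--")

# Separator absent (the expected case in the passes described above):
# the result equals the input, with no special-casing in Python code.
original = "no separator here"
not_found = remove_separator(original, "--")

assert found == "abc"
assert not_found == original
```

The alternative, testing "sep in text" before every split, is the kind
of test-and-branch guessing in Python code that the paragraph above
argues should be unnecessary.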

