[Python-Dev] Re: Re: [Python-checkins] python/dist/src/Objects unicodeobject.c, 2.219, 2.220

Tim Peters tim.peters at gmail.com
Fri Aug 27 16:30:27 CEST 2004


[M.-A. Lemburg]
> Hmm, you've now made PyUnicode_Join() work with iterators
> whereas PyString_Join() only works for sequences.

They have both worked with iterators since the release in which
iterators were introduced.  Nothing changed now in this respect.

> What are the performance implications of this for PyUnicode_Join() ?

None.

> Since the string and Unicode implementations have to be in sync,
> we'd also need to convert PyString_Join() to work on iterators.

It already does.  I replied earlier this week on the same topic --
maybe you didn't see that, or maybe you misunderstand what
PySequence_Fast does.

> Which brings up the second question:
> What are the performance implications of this for PyString_Join() ?

None.

> The join operation is a widely used method, so both implementations
> need to be as fast as possible. It may be worthwhile making the
> PySequence_Fast() approach a special case in both routines and
> using the iterator approach as fallback if no sequence is found.

string_join already uses PySequence_Fast; the Unicode join didn't, and
still doesn't.  When the argument is exactly a list or a tuple,
PySequence_Fast would make the Unicode join quicker.  In every other
case, PySequence_Fast materializes a concrete tuple holding the full
result of the iteration, so it could consume more memory.  That's
probably a good tradeoff, though.
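
A rough Python-level model of the behavior just described, for
illustration only: the helper name sequence_fast is made up here, and
this is a sketch of the idea, not the C source.  Exact lists and tuples
come back untouched; anything else is drained once into a tuple.

```python
def sequence_fast(obj):
    """Hypothetical Python model of the PySequence_Fast behavior described above."""
    # Exact list or tuple: hand back the same object, no copy made.
    if type(obj) in (list, tuple):
        return obj
    # Anything else (generator, iterator, ...) is iterated exactly
    # once and materialized as a concrete tuple.
    return tuple(obj)


pieces = ['a', 'b', 'c']
same = sequence_fast(pieces)           # the very same list object
copied = sequence_fast(iter(pieces))   # a new tuple built from the iterator
```

The extra memory mentioned above is the tuple built in the second
branch; the first branch costs nothing.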

> Note that PyString_Join() with iterator support will also
> have to be careful about not trying to iterate twice,

It already is.  Indeed, the primary reason it uses PySequence_Fast is
to guarantee that it never iterates over an iterator argument more
than once.  The Unicode join doesn't have that potential problem.
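
To make the "never more than once" point concrete, here is a sketch in
Python rather than C.  The names counting and two_pass_join are
invented for the example, and the size-then-copy structure is only the
general shape of such a routine, not a transcription of string_join.

```python
def counting(pieces, log):
    """Yield each piece, recording every item handed out in `log`."""
    for piece in pieces:
        log.append(piece)
        yield piece


def two_pass_join(sep, iterable):
    """Illustrative two-pass join; not the real string_join."""
    # Materialize first, so the argument itself is iterated only once.
    seq = iterable if type(iterable) in (list, tuple) else tuple(iterable)
    # Pass 1: compute the total length of the result.
    total = sum(len(s) for s in seq) + len(sep) * max(len(seq) - 1, 0)
    # Pass 2: copy the pieces and separators into the output.
    out = []
    for i, s in enumerate(seq):
        if i:
            out.append(sep)
        out.append(s)
    result = ''.join(out)
    assert len(result) == total
    return result


log = []
joined = two_pass_join(' ', counting(['a', 'b', 'c'], log))
```

Both passes run over seq, so log records each piece exactly once.
Running the two passes over the generator directly would leave the
second pass with nothing to copy.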

> so it will have to use logic similar to the one applied
> in PyString_Format() where the work already done up to the
> point where it finds a Unicode string is reused when calling
> PyUnicode_Format().

>>> def g():
...     for piece in 'a', 'b', u'c', 'd': # force Unicode promotion on 3rd yield
...         yield piece
...
>>> ' '.join(g())
u'a b c d'
>>>

