[Python-Dev] Re: PEP 292 - Simpler String Substitutions

ncoghlan at iinet.net.au ncoghlan at iinet.net.au
Tue Aug 24 01:49:52 CEST 2004


Quoting Raymond Hettinger <python at rcn.com>:

> > This code snippet is littered everywhere in my applications:
> > 
> >     string.join([str(x) for x in iterable])
> > 
> > It's tedious and makes code hard to read.  Do we need a PEP to fix this?
> 
> A PEP would be overkill.
> 
> Still, it would be helpful to do PEP-like things such as reference
> implementation, soliciting comments, keeping an issue list, etc.
> 
> A minor issue is that the implementation automatically shifts to Unicode
> upon encountering a Unicode string.  So you would need to test for this
> before coercing to a string.

Perhaps have string join coerce to string, and Unicode join coerce to the
separator's encoding. If we do that, the existing string->Unicode promotion code
should handle the switch between the two join types.

> Also, join works in multiple passes.  The proposal should be specific
> about where stringizing occurs.  IIRC, you need the object length on the
> first pass, but the error handling and switchover to Unicode occur on
> the second.

Having been digging in the guts of string join last week, I'm pretty sure the
handover to the Unicode join happens on the first 'how much space do we need'
pass. Essentially, all of the work done so far is thrown away, and the Unicode
join starts from scratch. (If you know you have Unicode, you're better off using
a Unicode separator to avoid this unnecessary work. </tangent>)

We then have special casing of zero length and single item sequences, before
dropping into the 'build the new string' loop.

By flagging the need for a 'stringisation' operation in the failed side of the
'PyUnicode_Check' that occurs during the first pass (to see if we should hand
over to the Unicode join), we could avoid unnecessarily slowing the pure string
cases.

To keep the speed of the pure-string case, we would need to guarantee that the
sequence consists of only strings when we run through the final pass to build
the new string. So we would need an optional second pass that constructs a new
sequence, containing any strings from the original sequence, plus 'stringised'
versions of the non-strings.

The final pass could remain as-is. The only possible difference is that it may
be operating on the new 'stringised' sequence rather than the old one.
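
As a rough illustration of the flow described above, here is a minimal sketch
in modern Python. It is only a model of the idea, not the C implementation:
the name join_stringising is hypothetical, and the real first pass would also
compute the required buffer size and handle the Unicode handover.

```python
def join_stringising(sep, iterable):
    """Hypothetical model of the proposed join; not CPython's actual code."""
    items = list(iterable)

    # First pass: note whether any item is not already a string.
    # (In the C code this is where the size calculation and the
    # PyUnicode_Check handover would also happen.)
    needs_stringising = False
    for item in items:
        if not isinstance(item, str):
            needs_stringising = True
            break

    # Optional second pass: only runs when the flag was set, so the
    # pure-string case pays nothing extra here.
    if needs_stringising:
        items = [item if isinstance(item, str) else str(item)
                 for item in items]

    # Final pass: unchanged, and guaranteed to see only strings.
    return sep.join(items)
```

With this sketch, join_stringising(", ", ["a", 1, 2.5]) gives "a, 1, 2.5",
and a sequence of plain strings skips the second pass entirely.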

The alternative implementation (checking each item's type as it is added to the
new string in the final pass) has the significant downside of slowing down the
existing case of joining only strings.
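
For comparison, the alternative can be modelled in a few lines of modern
Python. Again this is only a sketch under my own assumptions (the function
name join_checking_each_item is hypothetical); the point is that the type
check sits inside the loop that builds the result.

```python
def join_checking_each_item(sep, iterable):
    """Hypothetical model of the alternative; not CPython's actual code."""
    parts = []
    for item in iterable:
        # This check runs for every item, even when the whole sequence
        # is already strings, which is the cost described above.
        if not isinstance(item, str):
            item = str(item)
        parts.append(item)
    return sep.join(parts)
```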

Either implementation should still be a lot faster than ''.join([str(x) for x in
seq]) though.
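
For reference, this is the behaviour being worked around, shown as it looks
in modern Python 3 (the variable names are just for illustration): join
rejects non-string items, so callers have to stringise them first.

```python
seq = [1, "two", 3.0]

# join refuses non-string items outright.
try:
    "".join(seq)
    rejected = False
except TypeError:
    rejected = True

# The usual workaround: build a temporary list of strings first.
workaround = "".join([str(x) for x in seq])
```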

Time to go knock out some code, I think. . .

Cheers,
Nick.

P.S. "'\n'.join(locals().items())" sure would be pretty, though


