[Python-Dev] Auto-str and auto-unicode in join

M.-A. Lemburg mal at egenix.com
Fri Aug 27 11:02:05 CEST 2004


Nick Coghlan wrote:
> Tim Peters wrote:
> 
>> I needed a break from intractable database problems, and am almost
>> done with PyUnicode_Join().  I'm not doing auto-unicode(), though, so
>> there will still be plenty of fun left for Nick!
> 
> 
> I actually got that mostly working (off slightly out-of-date CVS though).
> 
> Joining a sequence of 10 integers with auto-str seems to take about 60% 
> of the time of joining a str(x) list comprehension over that same sequence 
> (and the PySequence_Fast call means that a generator is slightly slower 
> than a list comprehension!). For a sequence that mixes strings and 
> non-strings, the gains could only be larger.
> 
> However, there is one somewhat curly problem I'm not sure what to do about.
> 
> To avoid slowing down the common case of string join (a list of only 
> strings) it is necessary to do the promotion to string in the type-check 
> & size-calculation pass.
> 
> That's fine in the case of a list that consists only of strings and 
> non-basestrings, or in the case of a unicode separator: every 
> non-basestring is converted using either PyObject_Str or PyObject_Unicode.
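A minimal pure-Python sketch of that single-pass promotion, assuming Python 3 `str` stands in for the C string type. `auto_str_join` is a hypothetical name and this is only an illustration of the two-pass shape (type check, promotion and sizing first, then copying), not the actual patch:

```python
from collections.abc import Iterable


def auto_str_join(sep: str, items: Iterable[object]) -> str:
    """Hypothetical auto-str join: like sep.join(), but non-strings are
    promoted with str() during the type-check/size pass."""
    # Pass 1: type check, promotion and size calculation in one loop.
    # Building a new list also materialises iterators (as PySequence_Fast
    # does) and leaves the caller's sequence untouched.
    parts = []
    total = 0
    for item in items:
        if not isinstance(item, str):
            item = str(item)  # promotion, standing in for PyObject_Str
        parts.append(item)
        total += len(item)
    if parts:
        total += len(sep) * (len(parts) - 1)

    # Pass 2: copy every piece into a buffer of exactly `total` characters.
    out = [""] * total
    pos = 0
    for i, part in enumerate(parts):
        if i:
            out[pos:pos + len(sep)] = sep
            pos += len(sep)
        out[pos:pos + len(part)] = part
        pos += len(part)
    assert pos == total
    return "".join(out)
```

For example, `auto_str_join(", ", [1, "two", 3.0])` returns `'1, two, 3.0'`.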
> 
> Where it gets weird is something like this:
>     ''.join([an_int, a_unicode_str])
>     u''.join([an_int, a_unicode_str])

This gives you a TypeError, so it's a non-issue (.join() does
not do an implicit call to str(obj) on the list elements).
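The same holds in Python 3, where `str` is the Unicode type: `str.join` accepts only strings and never calls `str()` on the elements. A small check (the helper name is made up for illustration):

```python
def join_raises_type_error(sep: str, items: list) -> bool:
    """Return True if sep.join(items) rejects the items with TypeError."""
    try:
        sep.join(items)
    except TypeError:
        return True
    return False


print(join_raises_type_error("", [42, "abc"]))    # True: 42 is not a str
print(join_raises_type_error("", ["42", "abc"]))  # False
print("".join(str(x) for x in [42, "abc"]))       # explicit str() needed: 42abc
```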

The real issue is the case where you have [a_str, a_unicode_obj]
and for that the current implementation already does the right
thing, namely to look for Unicode objects in the length checking pass.
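To illustrate that delegation, here is a rough Python 3 model, assuming `bytes` plays the role of the Python 2 `str` type, `str` the role of `unicode`, and ASCII the default encoding. `py2_style_join` is a hypothetical name; it models the behaviour, not the C code:

```python
def py2_style_join(sep: bytes, items: list) -> "bytes | str":
    """Rough model of Python 2 str.join: bytes stand in for str and
    str stands in for unicode; ASCII is the assumed default encoding."""
    # Length-checking pass: verify types and look for Unicode objects
    # (the C code also sums the item lengths here).
    for i, item in enumerate(items):
        if isinstance(item, str):
            # A Unicode object was found: hand the whole join over to the
            # Unicode join, decoding the byte strings as ASCII.
            usep = sep.decode("ascii")
            return usep.join(
                x.decode("ascii") if isinstance(x, bytes) else x for x in items
            )
        if not isinstance(item, bytes):
            # No implicit str() call: non-strings are rejected.
            raise TypeError(
                f"sequence item {i}: expected string, {type(item).__name__} found"
            )
    # Only byte strings: ordinary byte-string join.
    return sep.join(items)
```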

> In the first case, the int will first be converted to a string via 
> PyObject_Str, and then that string representation is what will get 
> converted to Unicode after the detection of the unicode string causes 
> the join to be handed over to the Unicode join.
> 
> In the latter case, the int is converted directly to Unicode.
> 
> So my question would be, is it reasonable to expect that 
> PyObject_Unicode(PyObject_Str(some_object)) give the same answer as 
> PyObject_Unicode(some_object)?
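In general the two paths need not agree: a Python 2 class could define `__str__` and `__unicode__` independently. A rough Python 3 analogue, assuming `__bytes__` plays the Python 2 `__str__` role and `__str__` the `__unicode__` role (the `Temperature` class is hypothetical):

```python
class Temperature:
    """Hypothetical type whose byte and text forms differ."""

    def __init__(self, celsius: float) -> None:
        self.celsius = celsius

    def __bytes__(self) -> bytes:
        # ASCII-only form, standing in for a Python 2 __str__
        return f"{self.celsius} deg C".encode("ascii")

    def __str__(self) -> str:
        # Full text form, standing in for a Python 2 __unicode__
        return f"{self.celsius} \u00b0C"


t = Temperature(21.5)
via_bytes = bytes(t).decode("ascii")  # like PyObject_Unicode(PyObject_Str(t))
direct = str(t)                       # like PyObject_Unicode(t)
print(via_bytes == direct)            # False
```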
>
> If not, then the string join would have to do something whereby it kept 
> a 'pristine' version of the sequence around to hand over to the Unicode 
> join.
> 
> My first attempt at implementing this feature had that property, but 
> also had the effect of introducing about a 1% slowdown in the standard 
> sequence-of-strings case (it introduced an extra if statement to see if 
> a 'stringisation' pass was required after the initial type-checking and 
> sizing pass). For sequences longer than 10 strings, I imagine the 
> relative slowdown would be much smaller.
> 
> Hmm. . . I think I see a way to implement this, while still avoiding 
> adding any code to the standard path through the function. It'd be 
> slower for the case where an iterator is passed in, and we automatically 
> invoke PyObject_Str but don't end up delegating to Unicode join, though, 
> as it involves making a copy of the sequence that only gets used if the 
> Unicode join is invoked. (If the original object is a real sequence, 
> rather than an iterator, there is no extra overhead - we have to make 
> the copy anyway, to avoid mutating the user's sequence).
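A minimal sketch of the "pristine copy" variant, again a rough Python 3 model with `bytes` for the Python 2 `str` type, `str` for `unicode`, and ASCII as the assumed default encoding. `auto_join`, `_to_bytes` and `_to_text` are hypothetical names:

```python
def _to_bytes(obj) -> bytes:
    """Promote to a byte string (stands in for PyObject_Str)."""
    if isinstance(obj, bytes):
        return obj
    if getattr(type(obj), "__bytes__", None) is not None:
        return bytes(obj)
    return str(obj).encode("ascii")


def _to_text(obj) -> str:
    """Convert directly to text (stands in for PyObject_Unicode)."""
    if isinstance(obj, bytes):
        return obj.decode("ascii")
    return str(obj)


def auto_join(sep: bytes, items) -> "bytes | str":
    """Hypothetical auto-str join that keeps the original items so a
    hand-over to the Unicode join converts each object directly."""
    # `pristine` keeps the untouched items; for an iterator this copy is
    # the extra cost. `promoted` holds byte strings for the fast path.
    pristine = list(items)
    promoted = []
    for item in pristine:
        if isinstance(item, str):
            # A Unicode object turned up: restart from the original items,
            # so no object is converted to text via its byte form.
            return _to_text(sep).join(_to_text(x) for x in pristine)
        promoted.append(_to_bytes(item))
    return sep.join(promoted)
```

With this arrangement, an object whose byte and text forms differ is converted straight to text whenever the join ends up on the Unicode path.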
> 
> If people are definitely interested in this feature, I could probably 
> put a patch together next week.
> 
> Regards,
> Nick.

-- 
Marc-Andre Lemburg
eGenix.com
