[Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c

M.-A. Lemburg mal at egenix.com
Mon Oct 24 20:32:22 CEST 2005


Guido van Rossum wrote:
> On 10/24/05, Phil Thompson <phil at riverbankcomputing.co.uk> wrote:
> 
>>I'm implementing a string-like object in an extension module and trying to
>>make it as interoperable with the standard string object as possible. To do
>>this I'm implementing the relevant slots and the buffer interface. For most
>>things this is fine, but there are a small number of methods in
>>stringobject.c that don't use the buffer interface - and I don't understand
>>why.
>>
>>Specifically...
>>
>>string_contains() doesn't, which means that...
>>
>>    MyString("foo") in "foobar"
>>
>>...doesn't work.
>>
>>s.join(sequence) only allows sequence to contain string or unicode objects.
>>
>>s.strip([chars]) only allows chars to be a string or unicode object. Same for
>>lstrip() and rstrip().
>>
>>s.ljust(width[, fillchar]) only allows fillchar to be a string object (not
>>even a unicode object). Same for rjust() and center().
>>
>>Other methods happily allow types that support the buffer interface as well as
>>string and unicode objects.
>>
>>I'm happy to submit a patch - I just wanted to make sure that this behaviour
>>wasn't intentional for some reason.
> 
> 
> A concern I'd have with fixing this is that Unicode objects also
> support the buffer API. In any situation where either str or unicode
> is accepted I'd be reluctant to guess whether a buffer object was
> meant to be str-like or Unicode-like. I think this covers all the
> cases you mention here.

The situation is a little better than that: the buffer
interface has a slot called getcharbuffer, which the string
methods use when they find that a string argument is not of
type str or unicode.

A few don't, but I guess we could fix this.

str.split() and .[lr]strip() support the getcharbuffer
interface; str.join() currently doesn't. The Unicode object also
leaves out a few cases, among them the ones you mentioned.
If it's better for interoperability, I guess we should make an
effort and let all of them support the getcharbuffer interface.
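
To illustrate the fallback order described above, here is a rough
sketch in Python rather than the actual C code in stringobject.c.
It is only a model: bytes stands in for the 2.x str type, str stands
in for unicode, and the getcharbuffer() method, the CharBufferOnly
class and the contains() helper are all hypothetical names made up
for this illustration.

```python
class CharBufferOnly:
    """Hypothetical string-like type that is neither str nor unicode."""

    def __init__(self, text):
        self._data = text.encode("ascii")

    def getcharbuffer(self):
        # Stand-in for the C-level getcharbuffer slot (an assumption of
        # this sketch): hand back the raw character data.
        return self._data


def contains(haystack, needle):
    """Model of a string method that honours the getcharbuffer fallback."""
    if isinstance(needle, bytes):
        # Plays the role of the plain str case.
        return needle in haystack
    if isinstance(needle, str):
        # Plays the role of the unicode case: compare as text.
        return needle in haystack.decode("ascii")
    # Neither str nor unicode: fall back to the character buffer hook.
    hook = getattr(needle, "getcharbuffer", None)
    if hook is None:
        raise TypeError("expected a string or character buffer object")
    return hook() in haystack


print(contains(b"foobar", CharBufferOnly("foo")))
```

A method that skips the last step rejects CharBufferOnly outright,
which is the kind of gap reported for string_contains() and
str.join().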

> We need to support this better in Python 3000; but I'm not sure you
> can do much better in Python 2.x; subclassing from str is unlikely to
> work for you because then too many places are going to assume the
> internal representation is also the same as for str.

As a first step, I'd suggest implementing the getcharbuffer
slot. That will already go a long way.

-- 
Marc-Andre Lemburg
eGenix.com


