[Python-Dev] Bug or feature? Unicode vs t#

M.-A. Lemburg mal@lemburg.com
Fri, 12 Oct 2001 13:20:10 +0200


Paul Prescod wrote:
> 
> "M.-A. Lemburg" wrote:
> >
> >...
> >
> > Since hexlify() uses a parser marker which does not involve a
> > type check, there's no way to have it reject Unicode objects.
> 
> Well, we do have the option of changing the code!
> 
> We could have hexlify check that its argument type is not Unicode or,
> more likely, remove the buffer interface from Unicode objects.

As mentioned in that email, Unicode objects do not expose their
internals via the "s#" or "t#" parser markers. Removing the buffer
interface from Unicode would not solve the problem for these markers.
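
For illustration only, here is a small sketch of what "rejecting Unicode
objects" looks like in present-day Python 3. That is an assumption of this
sketch: the thread predates Python 3, and the behaviour shown is not that of
the interpreter under discussion. In Python 3, binascii.hexlify() accepts
bytes-like objects and raises TypeError for str. The helper name
hexlify_or_error is hypothetical.

```python
import binascii


def hexlify_or_error(value):
    """Return the hex digits as str, or the name of the exception raised.

    Hypothetical helper, written against Python 3 semantics.
    """
    try:
        return binascii.hexlify(value).decode("ascii")
    except TypeError as exc:
        return type(exc).__name__


# bytes are accepted and hexlified as-is
print(hexlify_or_error(b"abc"))
# a text string is rejected rather than silently exposing its storage
print(hexlify_or_error("abc"))
# the caller has to pick an external encoding explicitly
print(hexlify_or_error("abc".encode("utf-8")))
```

The point of the last call is that the choice of encoding is made by the
caller, not by the argument parser.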

> I think
> that's the logical outcome of Guido's belief that Python programmers do
> NOT need access to the internal representation of string objects. The
> buffer interface seems to be only in existence to give access to
> internal representations of objects.

I can't follow you here: access to the internals is always possible
at C level, and even at Python level programmers can use the
unicode-internal codec to peek at the internals. For 8-bit strings, the
internal and external representations don't differ at all.

The question Guido raised really boils down to whether Unicode objects
should provide the getreadbuffer interface or not. I don't think it
is used in many places since most Python programmers writing
C extensions will use the parser markers to get at their arguments.
OTOH, getreadbuffer is meant to access the internals of an object,
e.g. arrays have this interface too, so this would be an argument
for not removing it.
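
As a rough sketch of which objects offer a readable buffer, the following
uses memoryview() from present-day Python 3 as a stand-in for the
getreadbuffer slot. That substitution is an assumption of this sketch, not
the C-level interface discussed above. The helper name exposes_buffer is
hypothetical.

```python
import array


def exposes_buffer(obj):
    """Return True if obj can be viewed through memoryview (Python 3).

    Hypothetical helper; memoryview stands in for getreadbuffer here.
    """
    try:
        # The context manager releases the view right after the check.
        with memoryview(obj):
            return True
    except TypeError:
        return False


print(exposes_buffer(array.array("H", [1, 2, 3])))  # arrays expose their storage
print(exposes_buffer(b"abc"))                       # so do 8-bit strings
print(exposes_buffer("abc"))                        # Python 3 text strings do not

# For 8-bit strings the buffer contents equal the external value.
assert memoryview(b"abc").tobytes() == b"abc"
```

In Python 3 the text string type ended up without a buffer interface, while
arrays kept theirs.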

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/