[Python-3000] string C API

Tue Oct 3 17:53:15 CEST 2006

Jim Jewett wrote:
> In python 3, a string object might look like
> 
> #define PyObject_str_HEAD   \
>    PyObject_VAR_HEAD   \
>    long ob_shash;   \
>    PyObject *cache;
> 
> with a typical concrete implementation looking like
> 
> typedef struct {
>    PyObject_str_HEAD
>    PyObject *encoding;  /* concrete method implementation, not just codecs */
>    data
> } PyAbstractUnicodeObject;

I think Josiah is proposing a different implementation:

typedef struct {
  PyObject_VAR_HEAD
  long ob_shash;
  enum {L1, L2, L4} ob_elemsize;
  ucs4 ob_sval[1]; /* could be interpreted as char* or ucs2* as well */
} PyUnicodeObject;
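
To make the idea concrete, here is a minimal standalone sketch of such a width-tagged string. It is an illustration only, not CPython code: the names flex_str, elem_size, flex_pick_size and flex_get are hypothetical, and the data is held behind a pointer instead of inline in the object so that the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not CPython's actual API. */
typedef enum { ELEM_1 = 1, ELEM_2 = 2, ELEM_4 = 4 } elem_size;

typedef struct {
    size_t      length;    /* number of code points */
    elem_size   elemsize;  /* bytes used per code point */
    const void *data;      /* uint8_t*, uint16_t* or uint32_t*, per elemsize */
} flex_str;

/* Pick the narrowest element size that can hold every code point in src. */
static elem_size flex_pick_size(const uint32_t *src, size_t n)
{
    uint32_t max = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i] > max)
            max = src[i];
    }
    if (max < 0x100)
        return ELEM_1;
    if (max < 0x10000)
        return ELEM_2;
    return ELEM_4;
}

/* Read code point i; the switch is the only per-access cost of the tag. */
static uint32_t flex_get(const flex_str *s, size_t i)
{
    assert(i < s->length);
    switch (s->elemsize) {
    case ELEM_1: return ((const uint8_t  *)s->data)[i];
    case ELEM_2: return ((const uint16_t *)s->data)[i];
    default:     return ((const uint32_t *)s->data)[i];
    }
}
```

The point of the design is that there is still exactly one string type with one known layout; only the element width varies, and every consumer can see it.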

> implementors of concrete string types.

Why should they care how the Python string type is implemented?
Few people implement their own string types, and those can just
implement their own type, with no concern whatsoever for the builtin
string type.

> Python is normally pretty good about duck typing, but str is a
> notorious exception.

Nonsense. Python is good about duck typing, period. Just look
at UserString for an example of how to implement a new string
type.

You seem to be talking about polymorphism through inheritance.
Python does not support that well for any of the builtin types;
I do think that all these types should be final (in the Java
sense).

> I also expect that the number of concrete types in the core itself may
> increase if it is easy to do that.  I don't think any single person
> would care enough to maintain all of UCS4, UCS2, Latin-1, Latin-2,
> UTF-8, and NSString versions; it wouldn't surprise me if there were
> someone who cared enough to maintain each of those.

I doubt any kind of "pluggable" representation could work in a
reasonable way. With that generality, you lose any information
as to what the internal representation is, and then code becomes
tedious to write and slow to run.
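
The trade-off can be sketched in C as follows. This is a hypothetical illustration, not a proposal and not a benchmark: pstr, pstr_ops and the latin1_* functions are invented for the example. With a pluggable representation, generic code can only reach the data through function pointers, one call per element; with a known representation, the same search can go straight to memchr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "pluggable" string: the layout is hidden behind a vtable. */
typedef struct pstr pstr;

typedef struct {
    size_t   (*length)(const pstr *s);
    uint32_t (*get)(const pstr *s, size_t i);
} pstr_ops;

struct pstr {
    const pstr_ops *ops;   /* representation-specific operations */
    const void     *data;  /* opaque to generic code */
    size_t          n;     /* number of stored elements */
};

/* One concrete representation: Latin-1, one byte per code point. */
static size_t latin1_length(const pstr *s) { return s->n; }

static uint32_t latin1_get(const pstr *s, size_t i)
{
    return ((const uint8_t *)s->data)[i];
}

static const pstr_ops latin1_ops = { latin1_length, latin1_get };

/* Generic search: works for any representation, but pays one indirect
   call per element and cannot use anything smarter. */
static ptrdiff_t pstr_find(const pstr *s, uint32_t cp)
{
    assert(s != NULL && s->ops != NULL);
    size_t n = s->ops->length(s);
    for (size_t i = 0; i < n; i++) {
        if (s->ops->get(s, i) == cp)
            return (ptrdiff_t)i;
    }
    return -1;
}

/* The same search when the representation is known to be Latin-1. */
static ptrdiff_t latin1_find(const uint8_t *buf, size_t n, uint32_t cp)
{
    if (cp > 0xFF)
        return -1;  /* cannot occur in a Latin-1 buffer */
    const uint8_t *hit = memchr(buf, (int)cp, n);
    return hit ? (ptrdiff_t)(hit - buf) : -1;
}
```

Every operation that wants the fast path has to be written once per representation, which is the tedium; everything else falls back to the element-by-element loop, which is the slowness.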

Regards,
Martin