[issue1943] improved allocation of PyUnicode objects

Wed Jun 3 23:31:26 CEST 2009

Guido van Rossum <guido at python.org> added the comment:

On Wed, Jun 3, 2009 at 1:41 PM, Antoine Pitrou <report at bugs.python.org> wrote:
> Apart from the example Marc-André just posted (and which is a 0.0.1
> proof of concept he apparently just wrote), the number of users is,
> AFAICT, zero.

IIUC Marc-Andre extracted that from a larger code base (MX) which he
owns and has been maintaining for a decade or so.

> Unless there's some closed source extension which happens to extend
> unicode as a C subtype.

I believe part of MX is closed source.

> Now, as for easing the subclassing of unicode in C, there are probably
> several possibilities which range from devising a clever set of macros
> to abusing the ob_size field for a tagged pointer. People who really
> care should do a concrete proposal (and I don't know who these people
> are, apart from Marc-André).

Not really if the core code uses a macro that depends on the layout of
the object (i.e. the data immediately following the header, like old
8-bit strings), unless you change the core (or the macro) to only use
this if the type matches exactly, and for subtypes use a more
expensive API. But that would slow down unnecessarily for subclasses
written in Python (of which there are plenty).

But I would like to point out that few people if any have ever
complained about the contiguous allocation for 8-bit strings in Python
[0-2].x. And we certainly wouldn't have given in. Now that Unicode is
no longer some fancy-schmancy advanced concept but the basis for *all*
Python string processing I think we should apply the same policy.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1943>
_______________________________________