[Python-Dev] PEP 393: Special-casing ASCII-only strings

"Martin v. Löwis" martin at v.loewis.de
Thu Sep 15 23:39:13 CEST 2011


> I like it. If we start with such an optimization, we can also remove data
> from strings allocated by the new API (it can be computed: object pointer +
> size of the structure). See my email for my proposal of structures:
>     Re: [Python-Dev] PEP 393 review
>     Thu Aug 25 00:29:19 2011

I agree it is tempting to drop the data pointer. However, I'm not sure
how many different structures we would end up with, and whether the C
aliasing rules would defeat this (you cannot interpret a struct X* as a
struct Y*, unless either X is the first field of Y or vice versa).

Thinking about this, the following may work:
- ASCIIObject: state, length, hash, wstr*, data follow
- SingleBlockUnicode: ASCIIObject, wstr_len,
                       utf8*, utf8_len, data follow
- UnicodeObject: SingleBlockUnicode, data pointer, no data follow

This is essentially your proposal, except that the wstr_len is dropped 
for ASCII strings, and that it uses nested structs.
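
As a rough sketch of what the nesting could look like in C (all struct
names, field names and field types below are my assumptions for
illustration, and the usual object header is left out): each struct has
the smaller one as its first member, so a pointer to the larger struct
can legitimately be viewed as a pointer to the smaller one, and the data
of the single-block variants is found at "object pointer + size of the
structure".

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical names and types throughout; only the field list comes
   from the proposal. The object header (refcount, type) is omitted. */
typedef struct {
    unsigned int state;
    ptrdiff_t length;
    long hash;
    wchar_t *wstr;
    /* character data follows in the same allocation */
} SketchASCIIObject;

typedef struct {
    SketchASCIIObject ascii;   /* first member: may be viewed as SketchASCIIObject* */
    ptrdiff_t wstr_len;
    char *utf8;
    ptrdiff_t utf8_len;
    /* character data follows in the same allocation */
} SketchSingleBlockUnicode;

typedef struct {
    SketchSingleBlockUnicode single;   /* first member again */
    void *data;                        /* separate block; no data follows */
} SketchUnicodeObject;

/* Data of a single-block object: object pointer + size of the structure. */
static void *sketch_ascii_data(SketchASCIIObject *op)
{
    return (void *)(op + 1);
}

static void *sketch_single_block_data(SketchSingleBlockUnicode *op)
{
    return (void *)(op + 1);
}

/* The nested first members all start at offset 0, which is what makes
   the pointer conversions between the three views well defined. */
static void sketch_layout_check(void)
{
    SketchUnicodeObject u = {0};
    assert((void *)&u == (void *)&u.single);
    assert((void *)&u == (void *)&u.single.ascii);
    assert(offsetof(SketchUnicodeObject, single) == 0);
    assert(offsetof(SketchSingleBlockUnicode, ascii) == 0);
}
```

Note that the two data helpers take differently typed pointers on
purpose: the caller has to know which of the single-block layouts it
holds before it can compute where the data starts.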

The single-block variants would always be "ready"; the full unicode
object is ready only if the data pointer is set.
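
A minimal sketch of that readiness rule, assuming the object carries
some tag saying which layout it has (the enum, the struct names and the
function below are hypothetical and much reduced; how the layout would
really be encoded in the state field is not decided here):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout tag; stands in for whatever the state field encodes. */
enum sketch_layout { SKETCH_ASCII, SKETCH_SINGLE_BLOCK, SKETCH_FULL };

/* Reduced stand-in for the common leading fields. */
typedef struct {
    enum sketch_layout layout;
    ptrdiff_t length;
} SketchHeader;

/* Reduced stand-in for the full object: header first, then the pointer. */
typedef struct {
    SketchHeader base;
    void *data;
} SketchFull;

/* Single-block layouts are always ready; the full layout is ready
   only once its data pointer has been set. */
static int sketch_is_ready(const SketchHeader *h)
{
    if (h->layout != SKETCH_FULL)
        return 1;
    /* Valid only because h really points at the first member of a SketchFull. */
    return ((const SketchFull *)h)->data != NULL;
}

static void sketch_ready_check(void)
{
    char buf[3];
    SketchFull f = { { SKETCH_FULL, 3 }, NULL };
    assert(!sketch_is_ready(&f.base));   /* data pointer not set yet */
    f.data = buf;
    assert(sketch_is_ready(&f.base));    /* ready once data is attached */
}
```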

I'll try it out, unless somebody can punch a hole in this proposal :-)

Regards,
Martin


