[issue1943] improved allocation of PyUnicode objects

Adam Olsen report at bugs.python.org
Sun Jan 10 20:52:15 CET 2010


Adam Olsen <rhamph at gmail.com> added the comment:

Points against the subclassing argument:

* We have a null-termination invariant.  For byte strings this was part of the public API, and I'm not sure that's changed for unicode strings; aren't you arguing that we should maximize how much of our implementation is a public API?  This prevents lazy slicing.

* UTF-16 and UTF-32 are rarely used encodings, especially for longer strings (i.e. files).  For shorter strings (APIs) the unicode object overhead is more significant, and we'd need a way to slave the buffer's lifetime to that of the unicode object (hard to do).  For longer strings UTF-8 would be much more useful, but that's been shot down before.

* Subclassing unicode so you can change the meaning of the fields (i.e. allocating your own buffer) is a gross hack.  It relies far too much on fine details of the implementation and is fragile (what if you miss the dummy byte needed by fastsearch?)  Most of the possible options could, if they function correctly, be applied directly to the base type as a patch, so it's moot.

* If you dislike PyVarObject in general (I think the API is ugly too) you should argue for a general policy discouraging future use of it, not just get in the way of the one place where it's most appropriate.
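
To make the first point concrete, here is a minimal sketch of why the null-termination invariant gets in the way of lazy slicing.  The lazy_slice struct and both helpers are hypothetical names made up for illustration; nothing like them exists in CPython.  A slice that borrows its parent's buffer can only satisfy data[length] == '\0' when it happens to end where the parent ends:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "lazy slice": a view that borrows the parent's buffer
   instead of copying it.  This is an illustration, not a CPython struct. */
typedef struct {
    const char *data;   /* points into the parent's buffer */
    size_t length;      /* number of characters in the view */
} lazy_slice;

/* Build a view over parent[start:stop] without copying anything. */
static lazy_slice make_slice(const char *parent, size_t start, size_t stop)
{
    lazy_slice s;
    assert(start <= stop && stop <= strlen(parent));
    s.data = parent + start;
    s.length = stop - start;
    return s;
}

/* Returns 1 if the view satisfies the invariant data[length] == '\0'. */
static int is_null_terminated(lazy_slice s)
{
    return s.data[s.length] == '\0';
}
```

With a parent of "hello world", make_slice(parent, 0, 5) fails the check because the character after the view is the space, while make_slice(parent, 6, 11) passes only because it shares the parent's terminator.  Any slice that stops short of the end would need its own copy to keep the invariant.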

Terry: PyVarObjects would be much easier to subclass if the type object stored an offset to the beginning of the variable section, so it could be automatically recalculated for subclasses based on the size of the struct.  This'd mean the PyBytesObject struct would no longer end with a char ob_sval[1].  The downside is a tiny bit more math when accessing the variable section (as the offset is no longer constant).
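
A rough sketch of the offset idea, using simplified stand-in structs.  Every toy_* name here is hypothetical and the layout is an assumption for illustration only, not CPython's actual object layout or a proposed patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type object: records where the variable section starts. */
typedef struct toy_type {
    size_t var_offset;   /* bytes from object start to variable section */
} toy_type;

/* Hypothetical variable-size object header. */
typedef struct {
    const toy_type *type;
    size_t size;         /* number of items in the variable section */
} toy_var_head;

/* Base struct: note there is no trailing char ob_sval[1]. */
typedef struct {
    toy_var_head head;
    long hash;
} toy_bytes;

/* A subclass can add fixed fields without disturbing the base layout. */
typedef struct {
    toy_bytes base;
    void *extra;
} toy_bytes_sub;

/* The offset is derived from the struct size of each type. */
static const toy_type toy_bytes_type = { sizeof(toy_bytes) };
static const toy_type toy_bytes_sub_type = { sizeof(toy_bytes_sub) };

/* The extra math: the offset is loaded from the type at run time
   instead of being a compile-time constant. */
static char *toy_var_data(void *obj)
{
    toy_var_head *head = obj;
    assert(obj != NULL);
    return (char *)obj + head->type->var_offset;
}

/* Allocate fixed part plus variable part in one block and copy src in. */
static void *toy_new(const toy_type *type, const char *src, size_t n)
{
    toy_var_head *head = malloc(type->var_offset + n + 1);
    if (head == NULL)
        return NULL;
    head->type = type;
    head->size = n;
    memcpy(toy_var_data(head), src, n);
    toy_var_data(head)[n] = '\0';   /* keep the null-termination invariant */
    return head;
}
```

Calling toy_new(&toy_bytes_type, "abc", 3) and toy_new(&toy_bytes_sub_type, "abc", 3) both yield objects whose toy_var_data() points at "abc"; the subclass's data simply begins after its extra field.  The cost is the one pointer chase through head->type on each access.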

----------
nosy: +Rhamphoryncus

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1943>
_______________________________________

