[Python-3000] PyUnicodeObject implementation

Sun Sep 7 15:52:54 CEST 2008

Stefan Behnel <stefan_ml <at> behnel.de> writes:
> 
> From a Cython perspective, I find the lack of efficient subclassing after such
> a change particularly striking. That seriously bit me in Py2 when I tried
> making XML text content a bit more intelligent in lxml (i.e. make it remember
> what XML element it originated from).

I've used a library that behaved this way (I think it was BeautifulSoup).
After using it several times in a row, I noticed that my program's memory
consumption had exploded. The problem was that the library returned objects
that looked innocently like strings but internally kept a reference to a
multi-megabyte HTML tree. The solution was to convert them explicitly to str
before storing them for later use, which defeated the point of having a
str-derived type.
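
To make the problem concrete, here is a minimal sketch of the effect. It is not
BeautifulSoup's actual code: Tree, ContextString and extract_title are
hypothetical names made up for illustration, and the weak reference is only
there to observe whether the tree is still alive.

```python
import gc
import weakref


class Tree:
    """Stand-in for a large parsed document (hypothetical)."""

    def __init__(self):
        self.payload = bytearray(1024 * 1024)  # pretend this is the HTML tree


class ContextString(str):
    """Hypothetical str subclass that remembers where it came from."""

    def __new__(cls, value, origin):
        self = super().__new__(cls, value)
        self.origin = origin  # strong reference to the whole tree
        return self


def extract_title(tree):
    # Looks like it returns a harmless string.
    return ContextString("Some title", tree)


tree = Tree()
tree_ref = weakref.ref(tree)  # lets us check whether the tree is still alive

title = extract_title(tree)
del tree
gc.collect()
# The "string" still holds the tree through its origin attribute.
tree_alive_with_subclass = tree_ref() is not None

title = str(title)  # copy into an exact str, dropping the hidden reference
gc.collect()
tree_alive_after_str = tree_ref() is not None
```

With the subclass instance stored, the tree stays reachable; only after the
explicit str() conversion can it be freed.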

In these cases I think it's much friendlier to the user of the API to use
composition rather than inheritance. Or simply return a raw string and let
the user keep the context separately if they want to.
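
A rough sketch of what I mean by composition, with hypothetical names
(Element, TextResult, get_text) that don't correspond to any real library's
API. The wrapper makes the back-reference explicit, and the text itself stays
a plain str:

```python
class Element:
    """Stand-in for an XML/HTML element (hypothetical)."""

    def __init__(self, tag, text):
        self.tag = tag
        self.text = text


class TextResult:
    """Hypothetical wrapper: a plain str plus the element it came from."""

    __slots__ = ("text", "origin")

    def __init__(self, text, origin):
        self.text = text      # exact str, carries no hidden references
        self.origin = origin  # the context is visible in the API


def get_text(element):
    return TextResult(element.text, element)


result = get_text(Element("title", "Some title"))

# The caller decides what to keep: storing result.text keeps only the string,
# while storing result keeps the element alive, and visibly so.
stored = result.text
```

Here the user who only wants the string stores result.text and nothing else
is retained by accident.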

PS: What do you call "efficient subclassing"? If you look at the current
implementation of unicode_subtype_new() in unicodeobject.c, it isn't very
efficient (everything, including the raw data buffer, is allocated twice).