[Python-Dev] Question about the current implementation of str

Nick Coghlan ncoghlan at gmail.com
Sat Apr 9 03:18:10 EDT 2016


On 9 April 2016 at 10:56, Larry Hastings <larry at hastings.org> wrote:
>
>
> I have a straightforward question about the str object, specifically the
> PyUnicodeObject.  I've tried reading the source to answer the question
> myself but it's nearly impenetrable.  So I was hoping someone here who
> understands the current implementation could answer it for me.
>
> Although the str object is immutable from Python's perspective, the C object
> itself is mutable.  For example, for dynamically-created strings the hash
> field may be lazy-computed and cached inside the object.  I was wondering if
> there were other fields like this.  For example, are there similar
> lazy-computed cached objects for the different encoded versions (utf8, utf16)
> of the str?  What would really help is an exhaustive list of the fields of a
> str object that may ever change after the object's initial creation.

https://www.python.org/dev/peps/pep-0393/#specification should have
most of the relevant details.

Aside from the hash and the interned-or-not flag in the state, most
things should be locked once the string is ready, except that
generating the utf-8 and wchar_t forms is deferred until they're
needed if they're not the same as the canonical form.
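
As a rough illustration of that pattern (a value that looks immutable
from the outside while a few internal fields are filled in on first
use), here is a toy Python model. It is only a sketch: the class
LazyStr and its method names are hypothetical, and it does not mirror
the actual PyUnicodeObject layout or the wchar_t form. It only shows
the "compute once, then cache inside the object" behaviour for the
hash and the UTF-8 form.

```python
class LazyStr:
    """Toy model of an immutable-looking string with lazily filled caches.

    Hypothetical illustration only; this is not CPython's PyUnicodeObject.
    """

    __slots__ = ("_data", "_hash", "_utf8")

    def __init__(self, data: str) -> None:
        self._data = data    # canonical form, fixed at creation
        self._hash = None    # filled in on the first hash() call
        self._utf8 = None    # filled in on the first utf8() call

    def __hash__(self) -> int:
        # Compute the hash once, then reuse the stored value.
        if self._hash is None:
            self._hash = hash(self._data)
        return self._hash

    def utf8(self) -> bytes:
        # Encode once, then hand back the same cached bytes object.
        if self._utf8 is None:
            self._utf8 = self._data.encode("utf-8")
        return self._utf8

    def filled_caches(self) -> list:
        """Return the names of the caches that have been populated so far."""
        names = []
        if self._hash is not None:
            names.append("hash")
        if self._utf8 is not None:
            names.append("utf8")
        return names
```

A fresh LazyStr reports no filled caches; calling hash() on it and then
utf8() populates them one at a time, and the visible string value never
changes. Any code that assumes the object's memory is frozen after
creation has to account for exactly these late writes.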

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

