Mutable strings

Alex Martelli aleax at aleax.it
Mon Sep 22 08:31:58 EDT 2003


Hans-Joachim Widmaier wrote:
   ...
>> I realised that Python doesn't *need* mutable strings.
> 
> Mutable strings come to *my* mind whenever I have to play with huge
> binary data.  Working with tens of megabytes is inherently somewhat
> slow.

But mutable strings are not the best place to keep "huge binary
data".  Lists of smaller blocks, arrays of bytes, and lists of
arrays can be much more appropriate data structures.


>> Python strings (and integers and floats) are all immutable for a very
>> good reason:  dictionaries can't reliably use mutable objects as keys.
> 
> All understood. But then, I don't want to use my 32-MB binary blob as
> a key.

Since you don't in fact need to use it in any of the ways typically
applicable only to strings, it doesn't need to be a string.


>> convert strings into lists and back, my anxiety went away.   These cover
>> most usage of strings that might convince you you need mutability.
> 
> Converting said blob 'efficiently' to a list is something that I
> certainly would not call 'efficient' - if not for the conversion
> itself, then for the memory consumption as list.

A typical case might be one where the blob is, e.g., in fact made
up of 65K sectors of 512 bytes each.  In this case, the extra memory
consumption due to keeping the blob in memory as a list of 65K small
strings rather than one big string is, I would guess, about 1%.  So,
who cares?  And similarly if the "substrings" are of different sizes,
just as long as you only have a few tens of thousands of such
substrings.  It's quite unusual that the "intrinsic structure" of
the blob is in fact one big undifferentiated 32MB thingy -- when it
is, you're unlikely to need it in memory, or if you do you're
unlikely to be able to apply any processing mutation to it sensibly;
and for those unusual and unlikely cases, arrays of bytes are often
just fine (after all, C has nothing BUT arrays of bytes [or of other
fixed entities], yet it's quite suitable for some such processing).
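
A minimal sketch of the list-of-sectors idea, written for current
Python 3 (which postdates this post, so the bytes type is an
assumption here).  The names SECTOR_SIZE, NUM_SECTORS, blob and
sectors are made up for illustration, and the sector count is kept
small:

```python
SECTOR_SIZE = 512
NUM_SECTORS = 1024  # illustrative only; the case above has about 65K sectors

# A hypothetical blob of zero bytes, split into a list of small sectors.
blob = bytes(SECTOR_SIZE * NUM_SECTORS)
sectors = [blob[i:i + SECTOR_SIZE] for i in range(0, len(blob), SECTOR_SIZE)]

# "Mutating" the blob means replacing one list item; the other
# sectors are neither copied nor touched.
sectors[3] = b"\xff" * SECTOR_SIZE

# Reassemble only if a single contiguous object is really needed.
result = b"".join(sectors)
```

Each replacement costs one 512-byte allocation rather than a copy of
the whole blob, which is the point of keeping the data as a list.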


> I don't think strings are immutable because they ought to be that way
> (e.g. some CS guru teaches that "mutable strings are the root of all
> evil"). They're immutable because that allows them to be used as
> dictionary keys. And it was found that this doesn't affect the
> usefulness of the language too much.

Wrong.  Consider Java, as far back as its very first version: it had
no dictionary type built into the language on which strings might be
keys, yet it still decided to make its strings immutable.  This should
make it obvious that the interest of using strings as dict keys cannot
possibly be the sole motivation for the decision to make strings
immutable in a language.
Rather, the deeper motivation is connected to wanting strings to be
ATOMIC, ELEMENTARY types, just like numbers; and to lots of useful
practical returns of that choice.  All you lose is the "ability" to
"confuse" (type-pun) between strings and arrays of bytes in many
situations, but that's an ability best lost in many cases.  It's not
an issue of "evil" -- a close-to-the-hardware low-level language
like C has excellent reasons to choose a different, close-to-HW
semantics -- but in a higher-level language I think Python's and
Java's choice to have strings immutable works better than (e.g.)
Perl's and Ruby's to have them mutable.
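
A small illustration of what "atomic, like numbers" buys in practice,
assuming current Python 3 (bytearray did not exist when this was
written and stands in here for "a mutable sequence of bytes"; the
variable names are made up):

```python
# Immutable string: "changing" t rebinds the name t to a new object,
# so another name bound to the old value can never be surprised.
s = "spam"
t = s
t += "!"

# Mutable byte buffer: the in-place change is visible through every
# name that refers to the same object.
buf = bytearray(b"spam")
alias = buf
alias += b"!"
```

After this runs, s is still "spam" while buf has changed along with
alias, since both names refer to one mutable object.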


> Still, I can see a use for mutable strings. Or better, mutable binary
> data, made up of bytes. (where 'byte' is the smallest individually
> addressable memory unit blabla, ... you get the meaning. Just to not
> invite nit-pickers on that term.)

Just "import array" and you have your "mutable binary data made up
of bytes".  So, what's the problem?  Type-punning between THAT type,
and strings, is just not all that useful.
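
A short sketch of the array approach, assuming current Python 3,
where the conversion method is spelled tobytes() (older versions used
tostring()).  The buffer size and values are arbitrary:

```python
import array

# Typecode "B" means unsigned bytes; start with 16 zero bytes.
data = array.array("B", b"\x00" * 16)

# In-place mutation of a single byte and of a slice.
data[0] = 0x7F
data[4:8] = array.array("B", [1, 2, 3, 4])

# Explicit conversion to an immutable byte string when one is needed.
snapshot = data.tobytes()
```

The conversion is an explicit call and not an implicit pun between
the two types.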


>> "explicit is better than implicit".
> 
> Yes, definitely: Let there be another type.

But, there IS one!  So, what's wrong with it...?!


Alex




