[Python-3000] How will unicode get used?

"Martin v. Löwis" martin at v.loewis.de
Sun Sep 24 18:31:12 CEST 2006


Fredrik Lundh wrote:
>> I don't think reducing memory consumption is that important, for current
>> hardware. Java and .NET have demonstrated that you can do "real"
>> applications with that approach.
> 
> I've spent more time optimizing Python's string types than most, and 
> that doesn't match my experiences at all.  Operations on wide chars are 
> often faster than one might think, but any processor can copy X bytes of 
> data faster than it can copy X*4 bytes of data, and I doubt that's going 
> to change soon.

These statements don't contradict each other. You are saying that
there is a measurable, perhaps significant, difference between copying
single-byte and double-byte strings. I can believe this.

My claim is that this still isn't that important, and that it will
be "fast enough", anyway. In many cases, the application will be
IO-bound, so the cost of string operations might be negligible,
either way.
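
To put a rough number on the copying cost, here is a minimal sketch.
It is an illustration only: it copies raw byte buffers rather than
real string objects, it uses the 1-byte vs. 4-byte ratio from the
quoted text, and the buffer size and repeat counts are arbitrary
assumptions.

```python
import timeit

# Number of "characters" per buffer; an arbitrary size for illustration.
N = 1_000_000

narrow = bytes(N)      # 1 byte per character
wide = bytes(N * 4)    # 4 bytes per character (UCS-4 sized)


def copy(buf):
    # bytearray(buf) allocates a new buffer and copies the data into it.
    return bytearray(buf)


def best_time(buf, number=20, repeat=5):
    # Take the minimum over several runs to reduce scheduling noise.
    return min(timeit.repeat(lambda: copy(buf), number=number, repeat=repeat))


t_narrow = best_time(narrow)
t_wide = best_time(wide)

print(f"narrow: {t_narrow:.4f}s  wide: {t_wide:.4f}s")
```

The absolute numbers depend on the machine. The comparison that
matters for the argument is between these times and the time the
application spends waiting on IO.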

Of course, both statements generalize across an unspecified set of
applications, so it is a matter of personal preference.

>> I think supporting multiple representations at run-time would really
>> be terrible. Any API of the "give me the data" kind would either have
>> to expose the choice of representations, or perform a copy.
> 
> Unless you can guarantee that *all* external APIs that a Python 
> extension might want to use will use exactly the same internal 
> representation as Python, that's something that we have to deal with anyway.

APIs will certainly allow a Python string object to be created from
different kinds of memory buffers. Creation is a fairly small part
of the API; I believe it would noticeably simplify the
implementation if there is only a single internal representation.
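
As a Python-level analogy (not the C API under discussion), the idea
of "many input buffer formats, one resulting string type" can be
sketched with the standard codecs. The helper name make_string is
hypothetical and exists only for this illustration.

```python
def make_string(buffer: bytes, encoding: str) -> str:
    # Hypothetical helper: whatever layout the external buffer uses,
    # the result is always the same single string type.
    return buffer.decode(encoding)


text = "Löwis"

# The same text as it might arrive from four different external APIs.
sources = {
    "latin-1": text.encode("latin-1"),
    "utf-8": text.encode("utf-8"),
    "utf-16-le": text.encode("utf-16-le"),
    "utf-32-le": text.encode("utf-32-le"),
}

strings = [make_string(buf, enc) for enc, buf in sources.items()]

# The buffers differ in size, but every result is an equal str.
sizes = {enc: len(buf) for enc, buf in sources.items()}
print(sizes)
print(all(s == text for s in strings))
```

Only the creation step needs to know about the external layout; code
that works with the resulting strings sees one representation.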


>> Either alternative would produce many programming errors in extension
>> modules.
> 
> And even if that was true (which I don't believe), "many" would still
> be "very small" compared to the problems that reference counting and 
> error handling are causing.

We will see. Of course, we need a specification or an implementation
first to judge that.

Regards,
Martin

