Differences between Python 2.2 and Python 2.3

Martin v. Löwis martin at v.loewis.de
Sun Dec 7 12:31:49 EST 2003


"Fredrik Lundh" <fredrik at pythonware.com> writes:

> otoh, it would make sense to use 8-bit strings to store Unicode strings
> that happen to contain only Unicode code points in the full 8-bit range
> (0..255).

I'm not sure about the advantages. It would give a more efficient
representation, yes, but at the cost of a slower implementation. Codecs
often cannot know in advance whether a string will contain only
latin-1 (unless they are the latin-1 or the ascii codec), so they
would need to scan over the input first.
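To make the cost concrete, here is a minimal Python sketch of a decoder that wants to store its result compactly. The function name `decode_compact` and its `("latin-1", ...)` / `("wide", ...)` return convention are hypothetical, made up for illustration; this is not how the CPython codecs work. Only the latin-1 and ascii codecs can skip the check. Every other codec pays for an extra pass (here, over the decoded text) before it knows which representation to use.

```python
import codecs

# Canonical codec names whose output is always in the range 0..255.
_NARROW_CODECS = {"iso8859-1", "ascii"}

def decode_compact(data: bytes, encoding: str):
    """Hypothetical decoder: return ("latin-1", bytes) if every decoded
    code point fits in 0..255, otherwise ("wide", str)."""
    text = data.decode(encoding)
    if codecs.lookup(encoding).name in _NARROW_CODECS:
        # latin-1 and ascii cannot produce code points above 255,
        # so these codecs can skip the extra scan.
        return "latin-1", text.encode("latin-1")
    # Any other codec cannot know the answer in advance: it has to
    # walk the decoded text before choosing a representation.
    if all(ord(ch) < 256 for ch in text):
        return "latin-1", text.encode("latin-1")
    return "wide", text
```

Pure latin-1 text arriving through, say, UTF-8 has to be scanned to the end before the compact form can be chosen; the scan only stops early when it meets a code point above 255, such as `€`.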

In addition, operations like PyUnicode_AsUnicode, which hands the
caller a pointer to the object's internal Py_UNICODE buffer, would be
very difficult to implement (unless you have *two* representation
pointers in the Unicode object, at which point the memory savings are
questionable).
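The two-pointer variant can be sketched the same way. `CompactText`, `as_unicode` and `buffer_bytes` are hypothetical names; `as_unicode` only mimics the contract of PyUnicode_AsUnicode (return a wide buffer that stays valid as long as the object does), and an `array` of unsigned ints stands in for the C `Py_UNICODE` buffer.

```python
from array import array

class CompactText:
    """Hypothetical text object: stores latin-1 text at one byte per
    character, but must also be able to hand out a wide buffer."""

    def __init__(self, text: str):
        if any(ord(ch) > 255 for ch in text):
            raise ValueError("sketch only handles code points 0..255")
        self._narrow = text.encode("latin-1")  # 1 byte per character
        self._wide = None                      # second representation pointer

    def as_unicode(self) -> array:
        # The caller may keep the returned buffer for the object's whole
        # lifetime, so the wide copy is cached instead of rebuilt.
        if self._wide is None:
            # list() turns the bytes into ints; passing bytes directly
            # would make array() reinterpret them as raw machine values.
            self._wide = array("I", list(self._narrow))
        return self._wide

    def buffer_bytes(self) -> int:
        """Bytes held by the character buffers (object overhead ignored)."""
        total = len(self._narrow)
        if self._wide is not None:
            total += len(self._wide) * self._wide.itemsize
        return total
```

Once `as_unicode()` has been called, the object carries both buffers. Assuming a 4-byte item size, that is 5 bytes per character instead of the 4 a plain wide string would need.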

> I assume you meant:
> 
>     Yes, all library functions that expect *text* strings should support
>     Unicode objects.

Correct.

> having written Python's Unicode string type, I'm now thinking that
> it might have been better to use a polymorphic "text" type with
> either UTF-8 or encoded char or wchar buffers, and do dynamic
> translation based on usage patterns.  I've been playing with this
> idea in Pytte, but as usual, there's so much code, and so little
> time...

"Better" in what sense? Would it even be better if you had to preserve
all the C-level API that we currently have?

Regards,
Martin
