differences between 22 and python 23
Martin v. Löwis
martin at v.loewis.de
Sun Dec 7 12:31:49 EST 2003
"Fredrik Lundh" <fredrik at pythonware.com> writes:
> otoh, it would make sense to use 8-bit strings to store Unicode strings
> that happen to contain only Unicode code points in the full 8-bit range
> (0..255).
I'm not sure about the advantages. It would give a more efficient
representation, yes, but at the cost of a slower implementation. Codecs
often cannot know in advance whether a string will contain only
latin-1 (unless they are the latin-1 or the ascii codec), so they
would need to scan over the input first.
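
A minimal sketch of that extra scan, in present-day Python rather than
the C implementation under discussion. The names fits_in_8_bits and
choose_width are hypothetical and exist only for illustration; the
point is that a codec other than latin-1 or ascii has to look at every
decoded code point before it can pick the narrow representation.

```python
def fits_in_8_bits(text):
    """Return True if every code point in text lies in the range 0..255."""
    return all(ord(ch) < 256 for ch in text)


def choose_width(data, encoding):
    """Decode data, then pick a storage width in bytes per character.

    Hypothetical helper: the decode is one pass over the input and the
    range check is a second pass, which is the extra cost in question.
    """
    text = data.decode(encoding)
    return 1 if fits_in_8_bits(text) else 2


# "café" stays within 0..255; the euro sign (U+20AC) does not.
narrow = choose_width("café".encode("utf-8"), "utf-8")
wide = choose_width("€".encode("utf-8"), "utf-8")
```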
In addition, operations like PyUnicode_AsUnicode would be very
difficult to implement (unless you have *two* representation pointers
in the Unicode object - at which point the memory savings are
questionable).
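
To illustrate the two-pointer problem, here is a rough Python model,
not the actual C structure. The class NarrowText and its methods are
hypothetical. It assumes an accessor in the style of
PyUnicode_AsUnicode must hand out a wide buffer, so an object stored
narrowly has to build one on demand and keep it alive next to the
narrow one.

```python
import array


class NarrowText:
    """Toy model of a string stored as 8-bit data with a lazy wide copy."""

    def __init__(self, narrow):
        self._narrow = bytes(narrow)  # first representation: one byte per character
        self._wide = None             # second representation, created on demand

    def as_wide(self):
        """Return a wide buffer, building and caching it on first use."""
        if self._wide is None:
            # list() makes each byte one array item instead of raw memory.
            self._wide = array.array("H", list(self._narrow))
        return self._wide

    def buffer_bytes(self):
        """Total bytes held by the character buffers of this object."""
        total = len(self._narrow)
        if self._wide is not None:
            total += len(self._wide) * self._wide.itemsize
        return total


t = NarrowText(b"abc")
before = t.buffer_bytes()  # only the narrow buffer exists
w = t.as_wide()
after = t.buffer_bytes()   # narrow buffer plus the cached wide buffer
```

Once as_wide has been called, the object holds more memory than a
plain wide string of the same length would.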
> I assume you meant:
>
> Yes, all library functions that expect *text* strings should support
> Unicode objects.
Correct.
> having written Python's Unicode string type, I'm now thinking that
> it might have been better to use a polymorphic "text" type with
> either UTF-8 or encoded char or wchar buffers, and do dynamic
> translation based on usage patterns. I've been playing with this
> idea in Pytte, but as usual, there's so much code, and so little
> time...
"Better" in what sense? Would it even be better if you had to preserve
all the C-level API that we currently have?
Regards,
Martin