[Python-Dev] Unicode proposal: %-formatting ?
Tim Peters
tim_one@email.msn.com
Tue, 16 Nov 1999 00:38:32 -0500
[MAL]
> I wonder how we could add %-formatting to Unicode strings without
> duplicating the PyString_Format() logic.
>
> First, do we need Unicode object %-formatting at all ?
Sure -- in the end, all the world speaks Unicode natively and encodings
become historical baggage. Granted I won't live that long, but I may last
long enough to see encodings become almost purely an I/O hassle, with all
computation done in Unicode.
> Second, here is an emulation using strings and <default encoding>
> that should give an idea of how one could work with the different
> encodings:
>
> s = '%s %i abcäöü' # a Latin-1 encoded string
> t = (u,3)
What's u? A Unicode object? Another Latin-1 string? A default-encoded
string? How does the following know the difference?
> # Convert Latin-1 s to a <default encoding> string via Unicode
> s1 = unicode(s,'latin-1').encode()
>
> # The '%s' will now add u in <default encoding>
> s2 = s1 % t
>
> # Finally, convert the <default encoding> encoded string to Unicode
> u1 = unicode(s2)
I don't expect this actually works: for example, change %s to %4s.
Assuming u is either UTF-8 or Unicode, PyString_Format isn't smart enough to
know that some (or all) characters in u consume multiple bytes, so it can't
extract "the right" number of bytes from u. I think % formatting has to know
the truth of what you're doing.
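A minimal sketch of the %4s problem, assuming a modern Python 3 interpreter
(which postdates this thread) where bytes objects support % formatting. The
variable names are illustrative only. It compares a field width counted in
bytes against the same width counted in characters:

```python
# Sketch only: Python 3 stands in for the byte-string vs. Unicode split
# discussed above; none of these names come from the original proposal.
u = "\u00e4\u00f6"                  # two characters: a-umlaut, o-umlaut
as_utf8 = u.encode("utf-8")         # the same text as four UTF-8 bytes

# Byte-level formatting: the width of 4 is measured in bytes, so the
# four-byte value already "fills" the field and gets no padding.
byte_result = (b"%4s" % as_utf8).decode("utf-8")

# Character-level formatting: the width of 4 is measured in characters,
# so the two-character value is padded with two spaces.
char_result = "%4s" % u

print(repr(byte_result))
print(repr(char_result))
```

The two results differ in length, which is the sense in which a byte-oriented
formatter cannot pick "the right" number of bytes without knowing the encoding.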
> Note that .encode() defaults to the current setting of
> <default encoding>.
>
> Provided u maps to Latin-1, an alternative would be:
>
> u1 = unicode('%s %i abcäöü' % (u.encode('latin-1'),3), 'latin-1')
More interesting is fmt % tuple where everything is Unicode; people can muck
with Latin-1 directly today using regular strings, so the example above
mostly shows artificial convolution.
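For comparison, a minimal sketch of the all-Unicode case, again assuming
modern Python 3, where every str is Unicode. The argument value is
hypothetical. Formatting works on characters throughout, and an encoding is
chosen only when the result leaves the program:

```python
# Sketch only: both the format string and the argument are Unicode,
# so no encode/decode round trip is needed before formatting.
fmt = "%s %i abc\u00e4\u00f6\u00fc"   # format string ending in a-, o-, u-umlaut
u = "gr\u00fc\u00dfe"                 # hypothetical Unicode argument
u1 = fmt % (u, 3)                      # formatting operates on characters

# Encodings appear only at the I/O boundary.
latin1_bytes = u1.encode("latin-1")
utf8_bytes = u1.encode("utf-8")
print(len(u1), len(latin1_bytes), len(utf8_bytes))
```

The character count of u1 is the same whichever encoding is chosen for
output; only the byte counts differ.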