(Simple?) Unicode Question

Sat Aug 29 15:09:12 EDT 2009

On Sat, 29 Aug 2009 08:26:54 +0000, Steven D'Aprano wrote:

> Python only needs to know when you convert the text to or from bytes. I 
> can do this:
> 
> >>> s = "hello"
> >>> t = "world"
> >>> print(' '.join([s, t]))
> hello world
> 
> and not need to care anything about encodings.
> 
> So long as your terminal has a sensible encoding, and you have a good 
> quality font, you should be able to print any string you can create.

UTF-8 isn't a particularly sensible encoding for terminals.

And "Unicode font" is an oxymoron. You can merge a whole bunch of fonts
together and stuff them into a TTF file; that doesn't make them "a font",
though.

>>> I may be wrong, but I believe that's part of the idea behind the
>>> separation of string and bytes types in Python 3.x. I believe, if you
>>> are using Python 3.x, you don't need the character encoding mumbo jumbo
>>> at all ;-)
>> 
>> Nothing has changed in that regard. You still need to decode and encode
>> text and for that you have to know the encoding.
> 
> You only need to worry about encoding when you convert from bytes to 
> text, and vice versa. Admittedly, the most common time you need to do 
> that is when reading input from files, but if all your text strings are 
> generated by Python, and not output anywhere, you shouldn't need to care 
> about encodings.

Why would you generate text strings and not output them anywhere?

The main advantage of using Unicode internally is that you can associate
encodings with the specific points where data needs to be converted
to/from bytes, rather than having to carry the encoding details around the
program.
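
As a minimal sketch of that pattern in Python 3 (the function names and the
choice of latin-1 and utf-8 are purely illustrative assumptions, not taken
from any real program): bytes are decoded once on the way in, everything in
the middle works on str, and the result is encoded once on the way out.

```python
def shout(text):
    # Pure text processing: operates on str and knows nothing about encodings.
    return text.upper()


def convert(raw, in_encoding, out_encoding):
    # Hypothetical helper: the only place where encodings are mentioned.
    text = raw.decode(in_encoding)       # bytes -> str at the input boundary
    result = shout(text)                 # internal work is done on Unicode text
    return result.encode(out_encoding)   # str -> bytes at the output boundary


# Example: input arrives as Latin-1 bytes, output is wanted as UTF-8 bytes.
latin1_input = "café".encode("latin-1")
utf8_output = convert(latin1_input, "latin-1", "utf-8")
```

Only convert() has to know which encodings are in play; shout() and anything
else in the middle never sees them.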