(Simple?) Unicode Question

Sat Aug 29 22:36:49 EDT 2009

On Sat, 29 Aug 2009 20:09:12 +0100, Nobody wrote:

> On Sat, 29 Aug 2009 08:26:54 +0000, Steven D'Aprano wrote:
> 
>> Python only needs to know when you convert the text to or from bytes. I
>> can do this:
>> 
>>>>> s = "hello"
>>>>> t = "world"
>>>>> print(' '.join([s, t]))
>> hello world
>> 
>> and not need to care anything about encodings.
>> 
>> So long as your terminal has a sensible encoding, and you have a good
>> quality font, you should be able to print any string you can create.
> 
> UTF-8 isn't a particularly sensible encoding for terminals.

Did I mention UTF-8?

Out of curiosity, why do you say that UTF-8 isn't sensible for terminals?

> And "Unicode font" is an oxymoron. You can merge a whole bunch of fonts
> together and stuff them into a TTF file; that doesn't make them "a
> font", though.

I never mentioned "Unicode font" either. In any case, there's no reason 
why a skillful designer can't make a single font which covers the entire 
Unicode range in a consistent style.

>>>> I may be wrong, but I believe that's part of the idea behind the
>>>> separation of string and bytes types in Python 3.x. I believe, if you
>>>> are using Python 3.x, you don't need the character encoding mumbo
>>>> jumbo at all ;-)
>>> 
>>> Nothing has changed in that regard. You still need to decode and
>>> encode text and for that you have to know the encoding.
>> 
>> You only need to worry about encoding when you convert from bytes to
>> text, and vice versa. Admittedly, the most common time you need to do
>> that is when reading input from files, but if all your text strings are
>> generated by Python, and not output anywhere, you shouldn't need to
>> care about encodings.
> 
> Why would you generate text strings and not output them anywhere?

Who knows? It doesn't matter -- the point is that you can if you want to. 
You only need to worry about encodings at input and output, therefore 
logically if you don't do I/O you can process strings all day long and 
never worry about encodings at all.
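A small Python 3 sketch of that "encode/decode only at the edges" pattern follows. The input bytes are a made-up stand-in for data read from a file or socket, and UTF-8 is an assumed encoding chosen just for the example. Everything between the decode and the encode is plain str work that never mentions an encoding:

```python
# Stand-in for bytes arriving from outside (file, socket, ...); UTF-8 is an assumption.
raw_in = "caf\u00e9 na\u00efve".encode("utf-8")

# Input boundary: bytes -> str. This is the only place the input encoding matters.
text = raw_in.decode("utf-8")

# Pure text processing: no encoding is involved here at all.
words = text.upper().split()
result = " ".join(reversed(words))

# Output boundary: str -> bytes. Again, the encoding is named only here.
raw_out = result.encode("utf-8")
```

If the program never reads or writes bytes, the first and last steps simply disappear, which is the point being made above.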

> The main advantage of using Unicode internally is that you can associate
> encodings with the specific points where data needs to be converted
> to/from bytes, rather than having to carry the encoding details around
> the program.

Surely the main advantage of Unicode is that it gives you a full and 
consistent range of characters not limited to the 128 characters provided 
by ASCII?
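To illustrate the range point, here is a rough Python 3 sketch using an arbitrary sample string (chosen only for illustration) that mixes Latin, Greek and CJK characters. The str holds all of them without complaint, ASCII cannot encode them, and a Unicode encoding such as UTF-8 round-trips them:

```python
s = "Gr\u00fc\u00dfe, \u0395\u03bb\u03bb\u03ac\u03b4\u03b1, \u65e5\u672c"  # "Grüße, Ελλάδα, 日本"

# Code points above 127 lie outside the 128 characters ASCII provides.
code_points = [ord(c) for c in s]
non_ascii = [c for c in s if ord(c) > 127]

# Trying to squeeze the text into ASCII fails...
try:
    s.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# ...while UTF-8 can represent every character and decode back to the same str.
utf8 = s.encode("utf-8")
roundtrip = utf8.decode("utf-8")
```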

-- 
Steven