(Simple?) Unicode Question
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Sat Aug 29 22:36:49 EDT 2009
On Sat, 29 Aug 2009 20:09:12 +0100, Nobody wrote:
> On Sat, 29 Aug 2009 08:26:54 +0000, Steven D'Aprano wrote:
>
>> Python only needs to know when you convert the text to or from bytes. I
>> can do this:
>>
>>>>> s = "hello"
>>>>> t = "world"
>>>>> print(' '.join([s, t]))
>> hello world
>>
>> and not need to care anything about encodings.
>>
>> So long as your terminal has a sensible encoding, and you have a good
>> quality font, you should be able to print any string you can create.
>
> UTF-8 isn't a particularly sensible encoding for terminals.
Did I mention UTF-8?
Out of curiosity, why do you say that UTF-8 isn't sensible for terminals?
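On what a "sensible encoding" means in practice: in Python 3, print() writes to sys.stdout, which is a text stream that encodes each str using the terminal's encoding. The sketch below simulates that with io.TextIOWrapper over an in-memory buffer. The function name terminal_bytes and the sample encodings are illustrative assumptions, not anything from this thread.

```python
import io


def terminal_bytes(text: str, encoding: str) -> bytes:
    """Simulate print() writing to a terminal that uses `encoding` (hypothetical helper)."""
    buffer = io.BytesIO()
    # sys.stdout in Python 3 is likewise a text layer over a byte stream.
    # newline="" disables newline translation so the output is the same on every OS.
    stream = io.TextIOWrapper(buffer, encoding=encoding, newline="")
    print(text, file=stream)  # raises UnicodeEncodeError if the encoding can't represent text
    stream.flush()
    return buffer.getvalue()


sample = "hello \u03c0"
print(terminal_bytes(sample, "utf-8"))  # a UTF-8 "terminal" can show any character

try:
    terminal_bytes(sample, "ascii")  # an ASCII "terminal" cannot show pi
except UnicodeEncodeError as exc:
    print("this 'terminal' cannot show it:", exc.reason)
```

The string itself is the same in both calls; only the encoding chosen at the output boundary decides whether it can be printed.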
> And "Unicode font" is an oxymoron. You can merge a whole bunch of fonts
> together and stuff them into a TTF file; that doesn't make them "a
> font", though.
I never mentioned "Unicode font" either. In any case, there's no reason
why a skillful designer can't make a single font which covers the entire
Unicode range in a consistent style.
>>>> I may be wrong, but I believe that's part of the idea behind
>>>> separation of string and bytes types in Python 3.x. I believe, if you
>>>> are using Python 3.x, you don't need the character encoding mumbo
>>>> jumbo at all ;-)
>>>
>>> Nothing has changed in that regard. You still need to decode and
>>> encode text and for that you have to know the encoding.
>>
>> You only need to worry about encoding when you convert from bytes to
>> text, and vice versa. Admittedly, the most common time you need to do
>> that is when reading input from files, but if all your text strings are
>> generated by Python, and not output anywhere, you shouldn't need to
>> care about encodings.
>
> Why would you generate text strings and not output them anywhere?
Who knows? It doesn't matter -- the point is that you can if you want to.
You only need to worry about encodings at input and output, therefore
logically if you don't do I/O you can process strings all day long and
never worry about encodings at all.
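A minimal sketch of that pattern, assuming Python 3 and UTF-8 input (both assumptions; the function name process is hypothetical): decode once where bytes come in, work only with str in between, and encode once where bytes go out. Strings created and used entirely inside Python never touch an encoding at all.

```python
def process(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Decode at input, work purely on str, encode at output (hypothetical helper)."""
    # Input boundary: the only place the encoding of the incoming data matters.
    text = raw.decode(encoding)

    # Pure text processing: no encoding is involved here.
    result = " ".join(word.capitalize() for word in text.split())

    # Output boundary: an encoding is needed again to produce bytes.
    return result.encode(encoding)


# Text generated and used inside Python: no encoding anywhere.
s = "hello"
t = "world"
joined = " ".join([s, t])
print(joined)

# Hypothetical input bytes standing in for a file or socket read.
data = "caf\u00e9 na\u00efve".encode("utf-8")
print(process(data))
```

Keeping the decode and encode at the edges means the encoding is named in exactly two places instead of being carried through the whole program.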
> The main advantage of using Unicode internally is that you can associate
> encodings with the specific points where data needs to be converted
> to/from bytes, rather than having to carry the encoding details around
> the program.
Surely the main advantage of Unicode is that it gives you a full and
consistent range of characters not limited to the 128 characters provided
by ASCII?
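To illustrate the range point, a short Python 3 sketch (the sample string is arbitrary): characters far beyond ASCII's 128 code points are ordinary str values, ASCII cannot encode them, and UTF-8, one encoding of Unicode, round-trips them without loss.

```python
# Characters well outside ASCII's 128 code points are ordinary str values.
text = "na\u00efve caf\u00e9 \u03c0 \u2248 3.14159"

# Every character has a Unicode code point; several here are above 127.
codes = [ord(ch) for ch in text]
print(max(codes))  # largest code point in the string

# ASCII cannot represent them, so encoding to ASCII fails...
try:
    text.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError as exc:
    fits_in_ascii = False
    print("ASCII failed:", exc.reason)

# ...while UTF-8 round-trips the text losslessly.
round_trips = text.encode("utf-8").decode("utf-8") == text
print(round_trips)
```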
--
Steven