[Tutor] Unicode trouble

Øyvind python at kapitalisten.no
Thu Dec 1 12:25:11 CET 2005


>The important question is, what is actual encoding of your source data?
>>
>> Is there anything else I could try?

>Understand why the above question is important, then answer it. Until you
>do you are just thrashing around in the dark.

The source is a text document that, as far as I know, contains only English
and Norwegian letters. It can be opened with Notepad and Excel. I tried to
run through it in Python with:

f = open('c://file.txt')

for line in f:
    print(line)  # print each line, not the file object f

and that doesn't seem to give any problem. It prints all characters
without any trouble.

How would I find out what encoding the document is in? All I can find is
that when I open Notepad and select Font/Script, it says 'Western'.
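
One rough way to narrow this down is to read the raw bytes and see which
encodings can decode them without an error. The sketch below is only an
illustration: the function name guess_encoding and the candidate list are
made up for this example, and the sample bytes assume the file stores "Ø" the
way Windows code page 1252 would. A clean decode does not prove the guess is
right. latin-1 in particular accepts any byte sequence, so it is tried last.

```python
def guess_encoding(raw, candidates=("ascii", "utf-8", "cp1252", "latin-1")):
    """Return the first candidate that decodes raw without error, else None.

    Hypothetical helper for illustration; the candidate list is an assumption.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        return name
    return None


# Assumed sample: "Øyvind" as a cp1252 ("Western") file would store it.
sample = u"\u00d8yvind".encode("cp1252")
print(guess_encoding(sample))
```

For a real file, the bytes could be read with open(path, 'rb').read() and
passed to the same function.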

Might the problem only be related to Win32com, not Python since Python
prints it without trouble?

>Do you know what a character encoding is? Do you understand the
>difference between utf-8 and latin-1?

Earlier, characters had values 0-127 (ASCII). Now you have a wider
choice. In our part of the world we can use an extended version, Latin-1,
which uses the values up to 255 for additional letters. UTF-8 is an
encoding of Unicode, which contains far more characters than ASCII.

My knowledge about character encoding doesn't go much farther than this.
Simply said, I understand that the document that I want to read includes
characters beyond Ascii, and therefore I need to use UTF-8 or Latin-1. Why
I should use one instead of the other, I have no idea.
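
To make the difference concrete, here is a small sketch showing how the same
letter turns into different bytes under the two encodings, and what happens
when bytes are decoded with the wrong one. The variable names are only for
illustration, and it does not say which encoding the file in question
actually uses.

```python
text = u"\u00f8"  # the Norwegian letter "ø"

latin1_bytes = text.encode("latin-1")  # one byte per character
utf8_bytes = text.encode("utf-8")      # two bytes for this character

print(len(latin1_bytes))
print(len(utf8_bytes))

# Decoding UTF-8 bytes as Latin-1 never fails, but yields two wrong characters.
wrong = utf8_bytes.decode("latin-1")

# Decoding Latin-1 bytes as UTF-8 fails here, because the single byte 0xF8
# is not a valid UTF-8 sequence.
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
```

So the choice is not a matter of preference: the bytes have to be decoded
with the same encoding that was used to write them.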


