Interpreting non-ascii characters.

ddtl this at is.invalid
Wed Jul 18 02:46:23 EDT 2007


On Wed, 18 Jul 2007 08:29:58 +1000, John Machin <sjmachin at lexicon.net> wrote:
> ...

I have a bunch of directories and files from different systems
(each directory contains files from a single system). The file
names are all in Russian but encoded differently: koi8-r, win-1251,
utf-8 and so on. I want to transliterate them into plain ASCII so
that they are readable regardless of the system; personally I use
both Linux and Windows. So what I do is read each file name with
os.listdir, convert it to a list of characters ('foo.txt' =>
['f', 'o', ... , 't'], except that the file names are in Russian),
transliterate it (some Russian letters have to be transliterated
into 2 or even 3 Latin letters), and then rename the file.
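
A minimal sketch of the transliteration step described above, for
illustration only. It assumes Python 3, where os.listdir returns
already-decoded text; the table TRANSLIT and the function names
transliterate and rename_all are hypothetical, and the letter
mapping is just one of several possible schemes.

```python
import os

# One possible Cyrillic-to-Latin table (an assumption; other schemes exist).
# Some letters map to 2 or more Latin letters, and the hard and soft
# signs are simply dropped.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ё": "yo", "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k",
    "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "s", "т": "t", "у": "u", "ф": "f", "х": "kh", "ц": "ts",
    "ч": "ch", "ш": "sh", "щ": "shch", "ъ": "", "ы": "y", "ь": "",
    "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(name):
    """Return name with Russian letters replaced by Latin equivalents."""
    out = []
    for ch in name:
        lower = ch.lower()
        if lower in TRANSLIT:
            latin = TRANSLIT[lower]
            # Keep the case of the original letter ("Щ" -> "Shch").
            out.append(latin.capitalize() if ch.isupper() else latin)
        else:
            # Anything not in the table (ASCII, digits, dots) is kept.
            out.append(ch)
    return "".join(out)


def rename_all(directory):
    """Rename every entry in directory to its transliterated name."""
    for old in os.listdir(directory):
        new = transliterate(old)
        src = os.path.join(directory, old)
        dst = os.path.join(directory, new)
        # Skip names that need no change or would overwrite something.
        if new != old and not os.path.exists(dst):
            os.rename(src, dst)
```

With this table, transliterate("Щука.txt") gives "Shchuka.txt", and
a plain ASCII name such as "foo.txt" comes back unchanged.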

It seems, though, that I have solved the problem after all. I
thought that my Windows (2000) used win-1251 and Linux used koi8-r,
and because of that I couldn't understand the strange codes I got
while experimenting with locally created Cyrillic file names. In
fact Linux uses utf-8 and Windows uses cp866, so once I realized
that and read the article you suggested, I solved the problem.
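
To illustrate where such "strange codes" come from, here is a small
sketch, assuming Python 3 and its standard codecs. The sample name
and the helpers show_encodings and decode_name are made up for the
example. The same Russian name becomes a different byte sequence
under each encoding, and decoding with the wrong codec silently
produces different text.

```python
# A sample Cyrillic file name (hypothetical).
NAME = "файл.txt"

# The encodings mentioned in this thread, by their Python codec names.
ENCODINGS = ("koi8-r", "cp1251", "cp866", "utf-8")


def show_encodings(name=NAME):
    """Return a dict mapping each encoding to the raw bytes of name."""
    return {enc: name.encode(enc) for enc in ENCODINGS}


def decode_name(raw, encoding):
    """Decode raw bytes as text, replacing bytes the codec rejects."""
    return raw.decode(encoding, errors="replace")
```

For instance, decode_name(NAME.encode("cp866"), "cp866") returns the
original name, while decoding the same bytes as "cp1251" does not.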

Thanks.



