convert Unicode filenames to good-looking ASCII

coldpizza vriolk at gmail.com
Thu May 6 11:53:49 EDT 2010


Hello,

I need to convert accented Unicode characters in the names of some audio
files to similar-looking ASCII characters. The following code seems to
work on Windows:

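import os
import sys
import glob

EXT = '*.*'

# unicode pattern: glob returns unicode filenames
lst_uni = glob.glob(unicode(EXT))

# 'chcp' switches the console code page (a Windows-only command);
# a byte-string pattern makes glob return byte-string filenames
os.system('chcp 437')
lst_asci = glob.glob(EXT)
print sys.stdout.encoding

# assumes both globs list the files in the same order
for uni_name, asci_name in zip(lst_uni, lst_asci):
    try:
        os.rename(uni_name, asci_name)
    except Exception as e:
        print e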

On Windows it converts most of the accented characters in the Latin-1
range. It does not work on Linux, since it calls 'chcp', which is a
Windows command.

My questions are: (1) *why* does it work on Windows, and (2) what is
the proper, portable way to convert Unicode characters to similar-looking
plain ASCII characters?
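
One portable approach avoids code pages entirely. The sketch below is written for Python 3 and the helper names `to_ascii`, `plan_renames` and `asciify_filenames` are hypothetical. It lists the directory with a `str` path, so names come back as Unicode on both Windows and Linux. It builds an ASCII approximation with `unicodedata.normalize('NFKD', ...)`, which decomposes each accented letter, and then drops the leftover accents. Finally it renames the files. Characters with no decomposition (for example `ß` or CJK text) are simply dropped. Names that are already ASCII, reduce to an empty string, or would collide with an existing name are skipped, so it is worth reviewing the planned renames before applying them.

```python
import os
import unicodedata


def to_ascii(name):
    # NFKD splits 'é' into 'e' + a combining accent; encoding to ASCII
    # with errors='ignore' then drops the accent, along with any other
    # character that has no ASCII decomposition.
    decomposed = unicodedata.normalize('NFKD', name)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


def plan_renames(names):
    """Return (old, new) pairs for names that change under to_ascii.

    Names that are already ASCII, that reduce to an empty string, or
    whose ASCII form is already taken are left alone.
    """
    taken = set(names)
    plan = []
    for old in sorted(names):
        new = to_ascii(old)
        if not new or new == old or new in taken:
            continue
        taken.add(new)
        plan.append((old, new))
    return plan


def asciify_filenames(directory):
    # A str path makes os.listdir return str (Unicode) names on both
    # Windows and Linux, so no console code-page switching is needed.
    plan = plan_renames(os.listdir(directory))
    for old, new in plan:
        os.rename(os.path.join(directory, old),
                  os.path.join(directory, new))
    return plan
```

Calling `plan_renames(os.listdir(some_dir))` first shows what would change without touching any files. `asciify_filenames(some_dir)` then applies the same plan and returns the `(old, new)` pairs it renamed.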

That is, how do I properly do conversions like these?
 ü  > u
 é  > e
 â  > a
 ä  > a
 à  > a
 á  > a
 ç  > c
 ê  > e
 ë  > e
 è  > e
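
Every character in this list has a Unicode canonical decomposition into a base letter plus a combining mark. Stripping the marks therefore reproduces the table without a hand-written mapping. The following is a minimal Python 3 sketch, and `strip_accents` is a hypothetical helper name. Unlike an ASCII encode with `errors='ignore'`, it removes only the combining marks and leaves other characters untouched. As a result, its output is not guaranteed to be pure ASCII.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after NFKD decomposition.

    Characters that are not combining marks are kept, so letters
    without a decomposition (e.g. 'ß') survive unchanged.
    """
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(c for c in decomposed if not unicodedata.combining(c))


# 'é' decomposes into a base letter plus a combining accent:
print([unicodedata.name(c) for c in unicodedata.normalize('NFKD', 'é')])
# ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']

# The mappings listed in the question:
for ch in 'üéâäàáçêëè':
    print(ch, '>', strip_accents(ch))
```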

Is there any way to do this other than creating my own character
replacement table?
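
Normalization covers accented letters, but a handful of letters (ß, æ, ø, ł and a few others) have no Unicode decomposition. An ASCII encode with `errors='ignore'` silently drops them. A hand-written table is therefore still needed for those letters, but only for them. The Python 3 sketch below uses a deliberately small, illustrative table. Its contents are an assumption, not a complete list.

```python
import unicodedata

# Illustrative, deliberately small table (an assumption, not a complete
# list): letters that have no Unicode decomposition, so NFKD alone would
# leave them intact and the ASCII encode would silently drop them.
NO_DECOMPOSITION = str.maketrans({
    'ß': 'ss', 'æ': 'ae', 'Æ': 'AE', 'œ': 'oe', 'Œ': 'OE',
    'ø': 'o', 'Ø': 'O', 'ł': 'l', 'Ł': 'L', 'đ': 'd', 'Đ': 'D',
})


def to_ascii(text):
    # 1. Replace the letters NFKD cannot split apart.
    text = text.translate(NO_DECOMPOSITION)
    # 2. Split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize('NFKD', text)
    # 3. Drop whatever is still not ASCII (mostly the combining marks).
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii('Straße, Ærø, Łódź, crème brûlée'))
# Strasse, AEro, Lodz, creme brulee
```

If a larger, externally maintained table is acceptable as a dependency, the third-party Unidecode package provides one through `unidecode.unidecode()`.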


