Becoming Unicode Aware

Michael Foord fuzzyman at gmail.com
Wed Oct 27 06:56:22 EDT 2004


I'm trying to become 'unicode-aware'... *sigh*. What's that quote - 'a
native speaker of ascii will never learn to speak unicode like a
native'. The trouble is I think I've been a native speaker of latin-1
without realising it.

My main problem with understanding unicode is what to do with
arbitrary text without an encoding specified. To the best of my
knowledge the technical term for this situation is 'buggered'. E.g. I
have a CGI guestbook script. Is the only way of knowing what encoding
the user is typing in to ask them?
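
When the encoding really is unknown, one common fallback is to try a
short list of candidates in order. The sketch below is only a guess at
what that could look like: the name `decode_with_fallbacks` and the
candidate list are assumptions, and because latin-1 accepts any byte
sequence it can "succeed" on text that was really in something else.

```python
def decode_with_fallbacks(raw, candidates=("utf-8", "latin-1")):
    """Try each candidate encoding in turn (hypothetical helper).

    Returns (text, encoding_used). This is guesswork, not detection:
    the first encoding that decodes without error wins.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one.
            continue
    raise ValueError("none of the candidate encodings could decode the input")
```

Putting utf-8 first matters: it rejects most input that is not valid
utf-8, whereas latin-1 never fails and so only makes sense as the last
resort.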

Anyway - ConfigObj reads config files from plain text files. Is there
a standard for specifying the encoding within the text file? I know
Python scripts have a method - should I just use that?
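
The method Python scripts use is the PEP 263 declaration: a comment
such as `# -*- coding: utf-8 -*-` on the first or second line. Reusing
that convention for config files is an assumption, not an established
standard, and the name `detect_encoding` is made up for this sketch:

```python
import re

# Regular expression taken from PEP 263. Applying it to config files
# rather than Python source is an assumption of this sketch.
CODING_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def detect_encoding(raw_lines, default="ascii"):
    """Return the encoding declared in the first two lines, else default.

    raw_lines is a list of undecoded byte strings.
    """
    for line in raw_lines[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return default
```

The declaration has to be matched against the raw bytes, since the
file cannot be decoded until the encoding is known.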

Also - suppose I know the encoding, or let the programmer specify it,
is the following sufficient for reading the files in:

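def afunction(setoflines, encoding='ascii'):
    decoded = []
    for line in setoflines:
        # Decode each byte string only when an encoding is given
        if encoding:
            line = line.decode(encoding)
        decoded.append(line)
    return decoded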
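
An alternative to decoding line by line is to let the file object do
it. This is a minimal sketch using the standard library's `io.open`;
the name `read_lines` and its `path` argument are assumptions made for
illustration:

```python
import io

def read_lines(path, encoding="ascii"):
    """Read a text file and return its lines as unicode strings.

    Hypothetical helper: io.open decodes while reading, so bytes that
    are invalid in the given encoding raise UnicodeDecodeError.
    """
    with io.open(path, "r", encoding=encoding) as handle:
        # Drop the trailing newline from each decoded line.
        return [line.rstrip(u"\n") for line in handle]
```

Either way the default of 'ascii' is strict: any byte above 127 raises
an error rather than being silently misread.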

Regards,


Fuzzy
http://www.voidspace.org.uk/atlantibots/pythonutils.html


