Becoming Unicode Aware

Diez B. Roggisch deetsNOSPAM at web.de
Wed Oct 27 06:56:32 EDT 2004


> My main problem with understanding unicode is what to do with
> arbitrary text without an encoding specified. To the best of my
> knowledge the technical term for this situation is 'buggered'. E.g. I
> have a CGI guestbook script. Is the only way of knowing what encoding
> the user is typing in, to ask them ?

Unfortunately the HTTP standard seems to lack a specification of how the
encoding of form data is to be transferred. But it seems that most browsers
which understand the encoding your page is delivered in will use that same
encoding for replying.
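
In practice that means: declare the charset of the page that carries the
form, then decode whatever comes back with that same charset. A rough sketch
of the idea, not a definitive recipe. PAGE_ENCODING, content_type_header and
decode_form_value are names I made up for illustration, and it assumes the
submitted value reaches you as a byte string:

```python
# Assumption: the guestbook page is served as UTF-8.
PAGE_ENCODING = 'utf-8'

def content_type_header(encoding=PAGE_ENCODING):
    """Build the header that tells the browser which charset the page uses."""
    return 'Content-Type: text/html; charset=%s' % encoding

def decode_form_value(raw, encoding=PAGE_ENCODING):
    """Decode a submitted byte string, assuming the browser answered in the
    page's encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # The browser did not reply in the page's encoding after all;
        # keep what can be decoded and mark the rest as replacement characters.
        return raw.decode(encoding, 'replace')
```

The fallback is a design choice: a guestbook entry with a few replacement
characters is probably better than a script that dies on odd input.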

 
> Anyway - ConfigObj reads config files from plain text files. Is there
> a standard for specifying the encoding within the text file ? I know
> python scripts have a method - should I just use that ?

No idea what ConfigObj is - is it your own config parser?
 
> Also - suppose I know the encoding, or let the programmer specify, is
> the following sufficient for reading the files in :
> 
> def afunction(setoflines, encoding='ascii'):
>     for line in setoflines:
>         if encoding:
>             line = line.decode(encoding)

Yes, it should be - but why the if? It is unnecessary: with the default of
'ascii' its condition is true unless the caller deliberately passes an empty
value - and you _want_ it that way, as the result of afunction should always
be unicode objects, no matter what encoding was used.
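
Without the if, it could look something like this. Only a sketch: the name
decode_lines and the sample data are mine, not from your code, and it assumes
the lines come in as byte strings:

```python
def decode_lines(lines, encoding='ascii'):
    """Decode every byte-string line, so callers always get unicode back."""
    return [line.decode(encoding) for line in lines]

# Hypothetical sample input: two lines of a UTF-8 encoded config file.
raw = [b'name = caf\xc3\xa9\n', b'port = 8080\n']
decoded = decode_lines(raw, encoding='utf-8')
```

If the bytes do not match the given encoding, decode raises a
UnicodeDecodeError, which is arguably what you want for a config file.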


-- 
Regards,

Diez B. Roggisch


