[Web-SIG] WSGI configuration and character encoding.

Alan Kennedy py-web-sig at xhaus.com
Tue Nov 30 19:03:30 CET 2004


[Alan Kennedy]
 >> So if we're going to use ConfigParser *and* support encodings, then
 >> we  need to either
 >>
 >> A: Make the user specify the encoding *outside* the configuration
 >> file
 >> B: Require some form of "magic string" at the top of the file so that
 >> we can guess the encoding. And write the guessing algorithm.

[Phillip J. Eby]
 > As long as the encoding is restricted to basically the same set of
 > encodings that work for Python source code, it should only be
 > necessary to have the encoding specified as a configuration variable
 > in the file.
 >
 > However, if it's considered desirable to also detect a BOM, we can
 > implement that by reading the first four bytes of the file, and then
 > either backing up if there's no BOM, or wrapping the file object with
 > the appropriate decoding wrapper before passing it to ConfigParser.
 >
 > Of course, at that point we could just as well implement the exact
 > same detection algorithm as PEP 263, except that we could also support
 > wide encodings as long as there's a BOM.
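
For reference, the BOM check described above could be sketched roughly as follows in present-day Python. This is only an illustration under stated assumptions: the function name `load_config_with_bom` and the `default` parameter are made up here, the stream is assumed to be seekable and positioned at its start, and the `configparser`/`io` modules are the Python 3 ones, not the ConfigParser under discussion in the thread.

```python
import codecs
import configparser
import io

# Longest BOMs first: the UTF-32-LE BOM starts with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def load_config_with_bom(raw, default="ascii"):
    """Parse .ini data from a seekable binary stream positioned at offset 0.

    Hypothetical helper: reads the first four bytes, skips a BOM if one is
    found, otherwise backs up and falls back to `default`.
    """
    head = raw.read(4)
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            raw.seek(len(bom))  # continue reading just past the BOM
            break
    else:
        encoding = default
        raw.seek(0)  # no BOM: back up to the start of the file
    parser = configparser.ConfigParser()
    parser.read_file(io.TextIOWrapper(raw, encoding=encoding))
    return parser
```

Even this small sketch has an ambiguity: a BOM-prefixed UTF-16-LE file whose first character is U+0000 is indistinguishable from a UTF-32-LE BOM.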

I'm really, really, really, really, *really* against us trying to come 
up with our own solution to the encoding problem. There are just too 
many pitfalls and special cases.

Take XML 1.1, for example. XML 1.0 did not permit the NEL character 
0x85, the line terminator used on IBM EBCDIC systems, as a line 
terminator. XML 1.1 tried to rectify that omission, and despite the 
fact that dozens of clever people (i.e. the W3C XML working group) 
worked on the problem, and the spec was reviewed by literally thousands 
of eyeballs worldwide, they *all* *still* got it wrong!

XML 1.1: Dead on Arrival
http://norman.walsh.name/2004/09/30/xml11

I strongly urge that we adopt a solution that already has built-in 
encoding support, e.g. Python or XML.

Failing that, if we want to use ConfigParser, I see three ways forward:

1. Make the user specify the encoding of the config file *outside* the 
config file itself.
2. Approach ein den deutsche-enkoding-bots on python-dev, e.g. MAL or 
MvL, and ask their advice.
3. Spend days or weeks bending our brains about how to make ConfigParser 
also do encodings, and about whether the proposed approach works or not. 
And what about WSGI implementors? I shudder to think what a poorly 
chosen solution could do to them.
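
As an illustration of option 1, here is a minimal sketch in which the caller supplies the encoding from outside the file (a command-line flag, a deployment setting, or similar). The function name `load_config` and its parameters are hypothetical, and it assumes the Python 3 `configparser` module, which is not the ConfigParser available at the time of this thread.

```python
import configparser


def load_config(path, encoding):
    """Parse an .ini file whose encoding is supplied by the caller.

    Hypothetical helper: nothing inside the file is inspected to work out
    the encoding, so no detection algorithm is needed.
    """
    parser = configparser.ConfigParser()
    with open(path, encoding=encoding) as f:
        parser.read_file(f)
    return parser
```

A call such as `load_config("wsgi.ini", "windows-1252")` would then decode the file before the parser ever sees it; the file name is made up for the example.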

Just my €0,02.

Lastly, here's a wild suggestion: How about a hybrid approach? We use 
ConfigParser and the nice .ini syntax, but we wrap it in a simple XML 
wrapper, just so that we don't have to worry about encodings. For example:

#----begin----
<?xml version="1.0" encoding="windows-1252"?>
<wsgi_config>

[server]
webmaster: aláin_ó_cinnéide at spam.org

</wsgi_config>
#-----end-----

Ugly, but perfectly functional and trivial to implement too.
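
A rough sketch of how the wrapper might be read, assuming present-day Python (`xml.etree.ElementTree` and the Python 3 `configparser`); the function name `load_wrapped_config` is made up for illustration. It relies on the XML parser to honour the encoding declaration, so it only works for encodings that parser supports, and any `<` or `&` in the .ini text would have to be escaped as in any XML document.

```python
import configparser
import xml.etree.ElementTree as ET


def load_wrapped_config(data):
    """Parse bytes holding .ini text inside a <wsgi_config> element.

    Hypothetical helper: the XML parser decodes the bytes according to the
    XML declaration, and the element text is handed to ConfigParser.
    """
    root = ET.fromstring(data)
    if root.tag != "wsgi_config":
        raise ValueError("expected a <wsgi_config> root element")
    parser = configparser.ConfigParser()
    parser.read_string(root.text or "")
    return parser
```

Note that the XML declaration has to be the very first thing in the file, so the begin/end marker lines above are only delimiters for this message, not part of the file.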

Regards,

Alan.

