[Python-Dev] Can the cgi module be made Unicode-aware?

Martin v. Loewis martin@v.loewis.de
11 Apr 2002 18:26:55 +0200


Skip Montanaro <skip@pobox.com> writes:

> I did some reading before nodding off last night.  The <form> tag takes an
> optional "accept-charset" attribute, which can be a list.  

No, it doesn't - that's a proprietary extension. Or, maybe I'm missing
something: where did you find a statement that this is "official" in
any sense?

> As far as I can tell, the underlying data encoding of the form's data is
> generally going to be implicit.  

Unfortunately. RFC 1867 specifies that browsers should put a
Content-Type header on each part of a multipart/form-data message,
but none of the current browsers does.

> Adding an "accept-charset" attribute to the <form> does appear to
> have some effect on Content-Type in some instances, but not in all.

It might depend on the browser, since it's proprietary.

> The cgi programmer can't rely on charset information coming from the browser
> and will need a way to tell the cgi module what the charset of the incoming
> data is.  I think FieldStorage and MiniFieldStorage need optional charset
> parameters and I think the charset needs to be used from the Content-Type
> header, if present.

Of course, if you also have uploaded files, a single charset cannot be
applied to everything: the file data never follow the encoding - only
the "text" fields do.
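
To make the distinction concrete, here is a rough sketch, not the cgi
module's API. It parses a multipart/form-data body with the standard
email package, decodes only the text fields, and leaves uploaded file
data as raw bytes. The names decode_form and default_charset are
hypothetical; the fallback charset is an assumption the caller has to
supply, since the browser often sends none.

```python
from email.parser import BytesParser


def decode_form(body, content_type, default_charset="utf-8"):
    """Return {field name: str for text fields, bytes for file uploads}.

    Hypothetical helper for illustration only; it reads the whole
    request body into memory, so it is not suited to large uploads.
    """
    # Give the MIME parser a Content-Type header so it can find the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    msg = BytesParser().parsebytes(raw)
    if not msg.is_multipart():
        raise ValueError("expected a multipart/form-data body")

    fields = {}
    for part in msg.get_payload():
        name = part.get_param("name", header="content-disposition")
        data = part.get_payload(decode=True)  # always bytes
        if part.get_filename() is not None:
            # Uploaded file: never apply the form's charset to it.
            fields[name] = data
        else:
            # Text field: prefer the part's own charset, if the browser
            # sent one, and fall back to the caller-supplied default.
            charset = part.get_content_charset() or default_charset
            fields[name] = data.decode(charset)
    return fields
```

A charset given in a part's own Content-Type wins over the default, so
the programmer's choice is only used where the browser stayed silent.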

Regards,
Martin