[mod_python] Knowing the encoding of the URI

Daniel Chiaramello daniel.chiaramello at golog.net
Tue Mar 3 11:26:49 EST 2009


Hello everybody.

I am using mod_python, and I have run into a problem that I don't know 
how to solve elegantly...

The problem is that I don't know the encoding of the 
<req.unparsed_uri> strings...

My script runs in China, and I receive requests encoded in either "utf-8" 
or "gb18030"...

The way I handle that is the following:

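        uri = req.unparsed_uri

        # A strict UTF-8 decode that round-trips means the URI is already UTF-8.
        try:
            uri_utf8 = uri.decode("utf-8").encode("utf-8")
            found_encoding = (uri_utf8 == uri)
        except UnicodeError:
            found_encoding = False

        if not found_encoding:
            # Otherwise try GB18030 and, on success, normalize the URI to UTF-8.
            try:
                uri_gb18030 = uri.decode("gb18030").encode("gb18030")
                found_encoding = (uri_gb18030 == uri)
            except UnicodeError:
                found_encoding = False

            if found_encoding:
                uri = uri.decode("gb18030").encode("utf-8")
            else:
                # Raise a real exception object rather than a bare string.
                raise ValueError("### Failed to find encoding for uri '%s'..." % uri)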

I am not very happy with this approach.
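
For what it's worth, the same try-each-encoding idea can probably be folded 
into one small helper. This is only a sketch: the name normalize_uri and its 
"encodings" parameter are made up for illustration and are not part of 
mod_python. It assumes the URI arrives as a raw byte string, and it only 
guesses the encoding, since many valid UTF-8 byte sequences also decode 
under GB18030 (which is why UTF-8 has to be tried first).

```python
def normalize_uri(raw, encodings=("utf-8", "gb18030")):
    """Return (text, encoding_name) for the first candidate encoding
    that decodes the raw byte string without error.

    Hypothetical helper, not a mod_python API. The order of `encodings`
    matters: the strictest encoding (UTF-8) should come first.
    """
    for name in encodings:
        try:
            # Decoding is strict by default, so invalid input raises.
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("could not determine the encoding of URI %r" % (raw,))
```

In the handler this would be called as 
"text, encoding = normalize_uri(req.unparsed_uri)", and the result could then 
be re-encoded with text.encode("utf-8") if a UTF-8 byte string is needed.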

So, is there a way to find out which encoding the <unparsed_uri> uses, 
or at least a better way to determine it?
I noticed the "content_encoding" member of the request, but it is always 
set to None...


Thanks for your attention,
Daniel


