Unicode/UTF-8 confusion

Tom Stambaugh tms at zeetix.com
Sat Mar 15 16:33:24 EDT 2008


I appreciate the answers the community has provided, but I think I need to add 
some additional context.

I use a trick to pass the information into my browser client 
application. The browser requests the information from the server through a 
form whose target is a hidden iframe. The server wraps the string it 
serializes in HTML that embeds it in an onload handler, like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta 
http-equiv="content-type" content="text/html; charset=UTF-8" /><script 
type="text/javascript">//<![CDATA[
function vpage_load() {
    var aParent = window.parent;
    if (!aParent || !aParent.document || !aParent.document.vpage) {
        alert("No parent, parent.document, or parent.document.vpage");
        return;}
    var aSerializedObject = '%(jsonString)s';
    if (aParent && aParent._clientApplication) {
        aParent._clientApplication.loadObject(aSerializedObject, 
window.document, '' + window.document.location, true)}
    else {
        alert("No parent or no clientApplication")}
    }
    //]]></script>
    </head>
    <body onload="vpage_load();">
        <input id="state" value="" type="text" />
        <textarea id="vpage"></textarea>
        </body></html>

When this HTML finishes loading, its onload handler fires and in turn calls 
the "loadObject" method of the _clientApplication that is waiting for the 
result. The clientApplication then unpacks aSerializedObject into the 
browser.
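
To make the server side of this concrete, here is a minimal sketch of how the page might be rendered. It assumes the standard library json module (whose dumps call simplejson also provides) and Python's % string formatting, which the %(jsonString)s placeholder suggests. PAGE_TEMPLATE and render_page are hypothetical names, and the template is cut down to the one line that matters. The payload deliberately contains no apostrophes or backslashes, so it needs no extra escaping.

```python
import json

# Reduced, hypothetical stand-in for the full page template; only the
# %(jsonString)s placeholder is taken from the original HTML.
PAGE_TEMPLATE = (
    '<script type="text/javascript">\n'
    "var aSerializedObject = '%(jsonString)s';\n"
    "</script>"
)


def render_page(obj):
    """Serialize obj to JSON and substitute it into the template."""
    json_string = json.dumps(obj)
    return PAGE_TEMPLATE % {"jsonString": json_string}


# Payload with no apostrophes or backslashes in its serialized form.
page = render_page({"id": 1, "title": "hello"})
print(page)
```

The serialized text ends up inside a single-quoted JavaScript string literal, which is why apostrophes and backslashes in the payload matter.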

Once back in the browser, the loadObject method calls JSON.parse on 
aSerializedObject, the JSON string we're discussing. A serialized object 
typically contains many (at least tens, and sometimes several hundred) HTML 
fragments. It contains at most a handful of apostrophes. That means there 
are MANY more double quotes than apostrophes if I delimit attributes with 
double quotes.
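
As a rough illustration of that imbalance, the sketch below serializes one made-up HTML fragment and counts both kinds of quote in the result. The fragment is hypothetical, not taken from my application, and the standard library json module stands in for simplejson.

```python
import json

# Hypothetical fragment: double-quoted attributes, one apostrophe in the text.
fragment = '<a class="nav" href="/home">Tom\'s page</a>'
serialized = json.dumps({"html": fragment})

# json.dumps escapes the inner double quotes as \" and leaves the
# apostrophe untouched.
double_quotes = serialized.count('"')
apostrophes = serialized.count("'")

print(serialized)
print(double_quotes, apostrophes)  # prints: 8 1
```

With hundreds of fragments the ratio only gets more lopsided, so a single-quoted JavaScript literal leaves far fewer characters to escape.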

In order to successfully pass the escapes to the server, I already have to 
double each backslash. At the end of the day, it's easier -- and results 
in better performance -- to convert each apostrophe to its Unicode 
equivalent, as I originally asked.
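
One way to do both steps after serialization is sketched below. This is a post-processing approach under stated assumptions, not something simplejson does for me: to_js_single_quoted is a hypothetical helper, \u0027 is assumed to be the escape meant by "Unicode equivalent", and the standard library json module stands in for simplejson.

```python
import json


def to_js_single_quoted(obj):
    """Serialize obj for embedding in a single-quoted JavaScript literal.

    Hypothetical helper: plain string replacement on the serializer output.
    """
    text = json.dumps(obj)
    # Replace each apostrophe with the JSON escape \u0027 so it cannot
    # terminate the surrounding single-quoted JavaScript literal.
    text = text.replace("'", "\\u0027")
    # Double every backslash so the JavaScript literal hands the JSON
    # escapes through to JSON.parse unchanged.
    text = text.replace("\\", "\\\\")
    return text


obj = {"html": '<a href="/x">Tom\'s</a>'}
escaped = to_js_single_quoted(obj)
print(escaped)

# Simulate the browser: the JavaScript literal halves the backslashes,
# then JSON.parse (json.loads here) decodes the remaining escapes.
round_tripped = json.loads(escaped.replace("\\\\", "\\"))
```

Both replacements are single passes over the string in C, so the cost should be small next to the serialization itself, though I have not measured it.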

I just want to know if there's a faster way to persuade simplejson to 
accomplish the feat.





