Unicode/UTF-8 confusion
Tom Stambaugh
tms at zeetix.com
Sat Mar 15 16:33:24 EDT 2008
I appreciate the answers the community has provided; I think I need to add
some context.
I use a trick to pass the information into my browser client application.
The browser requests the information from the server through a form whose
target is a hidden iframe. The server wraps the string it serializes in
HTML that embeds it in an onload handler, like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta
http-equiv="content-type" content="text/html; charset=UTF-8" /><script
type="text/javascript">//<![CDATA[
function vpage_load() {
    var aParent = window.parent;
    if (!aParent || !aParent.document || !aParent.document.vpage) {
        alert("No parent, parent.document, or parent.document.vpage");
        return;
    }
    var aSerializedObject = '%(jsonString)s';
    if (aParent && aParent._clientApplication) {
        aParent._clientApplication.loadObject(aSerializedObject,
            window.document, '' + window.document.location, true);
    } else {
        alert("No parent or no clientApplication");
    }
}
//]]></script>
</head>
<body onload="vpage_load();">
<input id="state" value="" type="text" />
<textarea id="vpage"></textarea>
</body></html>
When this HTML finishes loading, its onload handler fires. That handler in
turn calls the loadObject method of the _clientApplication that is waiting
for the result, and the clientApplication then unpacks aSerializedObject
into the browser.
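On the server side, the '%(jsonString)s' placeholder is the named-mapping form of Python's old-style %-formatting. A minimal sketch, assuming that is how the page gets produced; the template is cut down to the one relevant line, and render_page is a hypothetical name:

```python
# Hypothetical, trimmed-down copy of the template above: only the line that
# holds the placeholder is kept. A literal '%' anywhere else in the template
# would have to be written as '%%'.
PAGE_TEMPLATE = """<script type="text/javascript">//<![CDATA[
var aSerializedObject = '%(jsonString)s';
//]]></script>"""


def render_page(json_string):
    """Fill the named %(jsonString)s placeholder.

    json_string must already be safe inside a single-quoted JavaScript
    literal; this function does no escaping of its own.
    """
    return PAGE_TEMPLATE % {"jsonString": json_string}


print(render_page('{"answer": 42}'))
```

The substituted value is not re-scanned for '%', so only the template itself has to avoid stray percent signs.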
Once back in the browser, the loadObject method calls JSON.parse on
aSerializedObject, the JSON string we're discussing. A serialized object
typically contains many HTML fragments (at least tens, sometimes several
hundred) but at most a handful of apostrophes. If I delimit attributes with
double quotes, that means there are MANY more double quotes than
apostrophes, which is why the template wraps the string in single quotes:
only an apostrophe can close the JavaScript literal early.
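To put rough numbers on the quote imbalance, here is a sketch using Python's json module (simplejson offers the same dumps interface) on a made-up payload of 50 fragments plus one apostrophe. The payload is invented purely for illustration. Every key and every string value costs two double quotes, and each double-quoted attribute adds two more (escaped as \", but still present in the text).

```python
import json

# Hypothetical payload shaped like the one described: many HTML fragments
# with double-quoted attributes, and a single apostrophe.
payload = {"f%d" % i: '<div class="item" id="i%d">Item</div>' % i
           for i in range(50)}
payload["note"] = "<p>It's done</p>"

text = json.dumps(payload)
print(text.count('"'), text.count("'"))  # 404 1
```

Wrapping that text in double quotes would mean escaping hundreds of characters; wrapping it in single quotes leaves only the lone apostrophe (plus the backslashes) to deal with.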
In order to get the escapes through to JSON.parse intact, I already have to
double each backslash. In the end, it's easier (and it performs better) to
convert each apostrophe to its Unicode escape, \u0027, as I originally
asked.
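A minimal sketch of that post-processing, assuming the JSON text is produced first and then patched with two plain string replacements. js_single_quoted_payload is a hypothetical helper name, and the sketch uses the standard-library json module, which shares simplejson's dumps/loads interface. As far as I know, neither module has an option aimed at apostrophes specifically, so this only makes the two steps explicit. Order matters: apostrophes are converted first so that the backslash in each new \u0027 gets doubled along with the rest.

```python
import json  # simplejson exposes the same dumps()/loads() interface


def js_single_quoted_payload(obj):
    """Serialize obj so it can sit inside a single-quoted JavaScript string
    literal and still reach JSON.parse with every escape intact."""
    text = json.dumps(obj)  # ensure_ascii=True: non-ASCII already \uXXXX
    # Step 1: apostrophes only ever occur inside JSON strings, so each one
    # can become the JSON escape \u0027; the literal's quote never appears.
    text = text.replace("'", "\\u0027")
    # Step 2: double every backslash (including the new ones), because the
    # JavaScript literal turns each \\ back into a single \ before
    # JSON.parse sees the text.
    text = text.replace("\\", "\\\\")
    return text


print(js_single_quoted_payload({"html": '<div class="note">It\'s here</div>'}))
# {"html": "<div class=\\"note\\">It\\u0027s here</div>"}
```

In the browser, the literal collapses each doubled backslash, so JSON.parse receives the original escapes plus \u0027, which it decodes back to an apostrophe.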
I just want to know if there's a faster way to persuade simplejson to
accomplish the feat.