XML, Unicode
pcarey at lexmark.com
Tue Oct 1 07:37:41 EDT 2002
Win2k, Python2.2
Hello gurus.
I'm working on a small application that lets users build Web pages from a
simple interface.
No big deal; users pick what components they want, fill out a few fields,
and choose the languages they need.
The XML is parsed and assembled into html pages.
Everything is encoded in UTF-8, and everything seems to work pretty well
(even the RTL languages, Hebrew and Arabic).
My questions (and they might seem dumb) are thus:
1) Is it "safe" to use UTF-8 encoding for html pages? (These pages will be
seen by lots of folks around the world.)
2) I use codecs.open("filename.html", "w+", "utf8") to create the html
pages, encoded in utf-8; is this correct?
3) The xml is all utf-8, and it appears that, when building strings,
non-utf8 strings are coerced?
#one of the f(x)s...%<---------
def makeHTML(self, cNode):
    cText = self.unTagXML(cNode.toxml(), cNode.tagName)
    cTagStart = "<"+cNode.parentNode.getAttribute("htmlTag")+">"
    cTagEnd = "</"+cNode.parentNode.getAttribute("htmlTag")+">"
    cLink = cNode.parentNode.getAttribute("filename")
    if (len(cLink) != 0): #===>if the component has a filename attribute...
        cLinkStart = '<a href="'+cLink+'">'
        cLinkEnd = '</a>'
    else:
        cLinkStart = ""
        cLinkEnd = ""
    #===>ADDING UTF-8 TO REGULAR OLD STRINGS SEEMS TO WORK JUST FINE...
    componentString = cTagStart+cLinkStart+cText+cLinkEnd+cTagEnd
    self.guiFile.write(componentString+'\r\n')
4) Why do I hafta use '\r\n' for the "Newline character" instead of '\n'?
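To make questions 2 and 4 concrete, here is a minimal sketch of just the file-writing part. The names write_page and read_raw, the file name, and the sample Hebrew text are made up for illustration and are not from my application. As far as I can tell from the codecs documentation, codecs.open always opens the underlying file in binary mode, so a '\n' would be written as-is rather than translated to '\r\n' on Windows; the sketch only reproduces that behavior.

```python
import codecs
import os
import tempfile


def write_page(path, lines, newline=u"\r\n"):
    """Write each text line to path as UTF-8, followed by newline.

    codecs.open encodes on write and opens the file in binary mode,
    so the newline string lands in the file exactly as given.
    """
    out = codecs.open(path, "w", "utf-8")
    try:
        for line in lines:
            out.write(line + newline)
    finally:
        out.close()


def read_raw(path):
    """Return the raw bytes of the file, with no decoding."""
    f = open(path, "rb")
    try:
        return f.read()
    finally:
        f.close()


if __name__ == "__main__":
    # Hypothetical sample: a paragraph holding a short Hebrew word.
    sample = u"<p>\u05e9\u05dc\u05d5\u05dd</p>"
    path = os.path.join(tempfile.mkdtemp(), "page.html")
    write_page(path, [sample])
    print(repr(read_raw(path)))
```

Passing newline=u"\n" instead leaves a bare line feed in the file, which is presumably why some Windows editors show everything on one line.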
Again, everything seems to work great; I'm just a little gun-shy about royally screwing up.
I've ordered "Unicode: A Primer" (seen it referenced a lot), so hopefully I can get a better understanding
of the unicode aspect, but, in the meantime, I thought I'd ask you guys (and gals).
Thanks for any input,
I hope I've been clear on this.
PETE