XML, Unicode

Tue Oct 1 07:37:41 EDT 2002

Win2k, Python 2.2
Hello gurus.

I'm working on a small application that lets users build Web pages from a
simple interface.

No big deal; users pick which components they want, fill out a few fields,
and choose the languages they need.

The XML is parsed and assembled into HTML pages.

Everything is encoded in UTF-8, and everything seems to work pretty well
(even the RTL languages, Hebrew and Arabic).

My questions (and they might seem dumb) are thus:

1) Is it "safe" to use UTF-8 encoding for HTML pages? (These pages will be
seen by lots of folks around the world.)

2) I use codecs.open("filename.html", "w+", "utf8") to create the HTML
pages, encoded in UTF-8; is this correct?

3) The XML is all UTF-8, and it appears that, when I build strings,
plain (non-UTF-8) strings are coerced automatically. Is that right?

      # One of the methods (snipped from the class):

      def makeHTML(self, cNode):
          # Strip the node's own tags from its serialized XML.
          cText = self.unTagXML(cNode.toxml(), cNode.tagName)
          # The parent element says which HTML tag wraps the text.
          cTag = cNode.parentNode.getAttribute("htmlTag")
          cTagStart = "<" + cTag + ">"
          cTagEnd = "</" + cTag + ">"
          cLink = cNode.parentNode.getAttribute("filename")
          if cLink:  # the component has a filename attribute
              cLinkStart = '<a href="' + cLink + '">'
              cLinkEnd = '</a>'
          else:
              cLinkStart = ""
              cLinkEnd = ""

          # Adding UTF-8 to regular old strings seems to work just fine...
          componentString = cTagStart + cLinkStart + cText + cLinkEnd + cTagEnd
          self.guiFile.write(componentString + '\r\n')

4) Why do I hafta use '\r\n' for the "Newline character" instead of '\n'?
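
To make questions 2) through 4) concrete, here is a minimal sketch of the same pattern in isolation. The sample XML, the names build_line and write_page, and the way the text is extracted (standing in for unTagXML) are made up for illustration and are not the application's actual code. It assumes minidom hands back text strings rather than UTF-8 bytes, so the encoding to UTF-8 happens only when the codecs writer puts the text on disk. The codecs documentation says codecs.open always opens the underlying file in binary mode, so no newline translation takes place and '\r\n' has to be written explicitly if Windows line endings are wanted.

```python
import codecs
import os
import tempfile
from xml.dom import minidom

# Hypothetical sample document; attribute names mirror the ones in the post.
SAMPLE_XML = (u'<component htmlTag="h1" filename="about.html">'
              u'<text>\u05e9\u05dc\u05d5\u05dd</text></component>')


def build_line(node):
    # Assemble one HTML fragment as text (not bytes) from a child node.
    parent = node.parentNode
    tag = parent.getAttribute("htmlTag")
    link = parent.getAttribute("filename")  # "" when the attribute is absent
    # Joining the text children stands in for the post's unTagXML helper.
    text = u"".join([child.data for child in node.childNodes
                     if child.nodeType == child.TEXT_NODE])
    if link:
        text = u'<a href="%s">%s</a>' % (link, text)
    return u"<%s>%s</%s>" % (tag, text, tag)


def write_page(path, lines):
    # codecs.open encodes on write and uses binary mode, so the line
    # ending written here reaches the file unchanged on every platform.
    out = codecs.open(path, "w", "utf-8")
    try:
        for line in lines:
            out.write(line + u"\r\n")
    finally:
        out.close()


if __name__ == "__main__":
    doc = minidom.parseString(SAMPLE_XML.encode("utf-8"))
    node = doc.getElementsByTagName("text")[0]
    target = os.path.join(tempfile.mkdtemp(), "page.html")
    write_page(target, [build_line(node)])
    # Inspect the raw bytes to see the encoding and the line ending.
    print(repr(open(target, "rb").read()))
```

Reading the file back in binary mode and decoding it as UTF-8 should give exactly the text that was written, line ending included.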

Again, everything seems to work great; I'm just a little gun-shy about royally screwing up.
I've ordered "Unicode: A Primer" (I've seen it referenced a lot), so hopefully I can get a better
understanding of the Unicode side of things, but in the meantime I thought I'd ask you guys (and gals).

Thanks for any input;
I hope I've been clear on this.

PETE