[I18n-sig] Mixed encodings and XML

uche.ogbuji@fourthought.com uche.ogbuji@fourthought.com
Wed, 13 Dec 2000 18:17:51 -0700


> uche.ogbuji@fourthought.com writes:
> > > Convert all of those sections into Unicode, using UTF-8 as the
> > > encoding form. You could write a trivial Python script to do this for
> > > you.
> > 
> > Not what I need, unfortunately.  The whole point of the exercise is
> > to have examples in the actual encodings.
> 
> And the point of that is what? They will display (most probably) as
> gibberish within the browser... or is that the point?

Good question.  I have not tried Chen Chien-Hsun's original HTML.  Perhaps 
even that won't work in a browser, which would make sense.  What does a 
browser do with a document that has

<META HTTP-EQUIV='Content-Type' CONTENT='text/html; charset=iso-8859-1'>
                                                            ^^^^^^^^^^
                                                            !!!!???!!!!

in the header and then runs into a big patch of UCS-2 or BIG5?
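
For what it's worth, the effect can be approximated outside a browser.  The 
following is only a sketch in present-day Python 3, not a claim about what any 
particular browser does: it takes a sample string of my own choosing, encodes 
it as Big5, and then reads those bytes back as iso-8859-1, which is what the 
META declaration above tells a consumer to do.  The function name 
misread_as_latin1 is hypothetical.

```python
def misread_as_latin1(text, actual_encoding="big5"):
    """Encode text in its real encoding, then decode it as ISO-8859-1.

    ISO-8859-1 maps every byte value to some character, so the decode
    step never raises an error; it just yields the wrong characters.
    """
    raw = text.encode(actual_encoding)
    return raw.decode("iso-8859-1")


sample = "中文"  # assumed sample text, not taken from Chen's document
garbled = misread_as_latin1(sample)
print(repr(garbled))

# The bytes themselves are undamaged: re-encoding as ISO-8859-1 and
# decoding with the real encoding recovers the original text.
print(garbled.encode("iso-8859-1").decode("big5") == sample)
```

The point of the round trip at the end is that nothing is lost in the file 
itself; only the declared charset is wrong for that patch of bytes.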

My guess is that it displays gibberish, as you suggest.  In this case, I think 
there's no point expecting HTML generated from XML to do any better, and it 
simply makes sense to break out the differently encoded portions into 
separate, linked files.
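
A rough sketch of that breaking-out step, in present-day Python 3 and under my 
own assumptions: each example is written to its own HTML file whose bytes and 
META charset agree, and the main document only links to it.  The names 
write_sample and PAGE, and the file naming, are hypothetical.  This only covers 
ASCII-compatible encodings such as BIG5; a UCS-2 example would need different 
handling, since the META line itself would not be readable as ASCII.

```python
import html
from pathlib import Path

# Minimal page template; the charset placeholder is filled per file.
PAGE = (
    "<html><head>\n"
    "<meta http-equiv='Content-Type' "
    "content='text/html; charset={charset}'>\n"
    "</head><body><pre>{body}</pre></body></html>\n"
)


def write_sample(directory, name, text, charset):
    """Write text to <directory>/<name>.html, encoded in charset.

    The declared charset and the actual byte encoding are the same,
    so a consumer that trusts the META line decodes the file correctly.
    Returns the path of the file written.
    """
    path = Path(directory) / (name + ".html")
    page = PAGE.format(charset=charset, body=html.escape(text))
    path.write_bytes(page.encode(charset))
    return path


def link_to(path, label):
    """Return an HTML link for the main (single-encoding) document."""
    return "<a href='{0}'>{1}</a>".format(path.name, html.escape(label))
```

The main page can then stay in one encoding and carry only the links.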

Chen, does this make sense?

> > Hmm?  My docbook tool is simply 4XSLT, which handles the individual encodings 
> > just fine now.
> 
> Sure, but if you want to generate a LaTeX (and from there PDF or PS)
> version you're screwed, AFAIK. If you are just generating HTML then
> you're OK.

Yeah.  HTML is all I'm generating for now.

Thanks much.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python