[XML-SIG] how to get the 'codepage' from a xml document

Thomas B. Passin tpassin@comcast.net
Fri, 10 Jan 2003 08:41:33 -0500


[Remy C. Cool]
> > > just have to find out how to get this implemented in a such a way
> > > that I can pass the encoding to the parser.
> >
> > Why do you think you need to do this? A compliant parser is going
> > to be autodetecting the encoding if you don't force it to use
> > something else. Why do you want to do the autodetect externally?
>
> My application appends/inserts data into an existing xml file ... some
> what like a print queue. So I need the encoding to be able to create
> 'the new' xml file in the same encoding as the original and I don't
> like to hardcode the encoding into the source. It uses no external
> entity's (except for a DTD in plain ASCII) so that's not a problem.
>

I do not think you are looking at things quite the right way here.  When you
read the source and parse it, the resulting characters should no longer be
"encoded" - they are in the computer's internal format.  You should read the
external file into which you want to insert the new material.  It will now
be in the internal format too, and the two can be combined. When you write
the combined file, you can specify the encoding to use.
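
A rough sketch of that flow in Python, using the standard library's
xml.etree.ElementTree. The function name append_job, the element names and the
sample documents are made up for illustration; they are not from the
application under discussion. The point is only that both inputs are decoded
by the parser, combined as character data, and encoded once on output.

```python
import xml.etree.ElementTree as ET


def append_job(queue_bytes, job_bytes, out_encoding):
    """Parse both documents, append the new element, and serialize.

    The parser reads each document's own encoding declaration, so the
    two inputs may use different encodings.
    """
    queue = ET.fromstring(queue_bytes)   # now plain character data in memory
    job = ET.fromstring(job_bytes)       # likewise, whatever its file encoding
    queue.append(job)
    # The output encoding is chosen here, once, for the combined document.
    return ET.tostring(queue, encoding=out_encoding)


# Hypothetical sample data: an ISO-8859-1 queue and a UTF-8 job.
queue_doc = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    "<queue><job>caf\u00e9</job></queue>"
).encode("iso-8859-1")
job_doc = (
    '<?xml version="1.0" encoding="utf-8"?>'
    "<job>na\u00efve</job>"
).encode("utf-8")

merged = append_job(queue_doc, job_doc, "iso-8859-1")
```

Here merged is a byte string in ISO-8859-1, even though one of the inputs
arrived as UTF-8.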

It is true that you still have to decide what encoding to use for the
output, but you no longer have a mix-and-match problem.  Anyway, if you have
to figure out the external file's encoding, you can always read the first
line of the external file, look for "encoding = " in the XML declaration,
look at the byte order mark if necessary, and do a simple-minded detection.
It is bound to be good enough for writing the merged output in your situation.
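
Such a simple-minded detection might look like the following Python sketch.
The function name guess_xml_encoding and the UTF-8 fallback are my own
assumptions (UTF-8 being what XML assumes when nothing is declared). It checks
only the UTF-8 and UTF-16 byte order marks and then the XML declaration; it
does not attempt UTF-32 or UTF-16 without a byte order mark, so it is no
substitute for what a compliant parser does.

```python
import codecs
import re

# Byte order marks checked by this sketch, with the encoding name to report.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Matches the encoding pseudo-attribute inside an XML declaration.
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def guess_xml_encoding(data, default="utf-8"):
    """Guess the encoding of an XML document given its raw bytes."""
    head = data[:200]                    # the declaration sits at the very start
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name                  # a byte order mark settles it
    match = _DECL.match(head)
    if match:
        return match.group(1).decode("ascii")
    return default                       # nothing declared
```

In use, read the file in binary mode, pass the bytes to guess_xml_encoding,
and hand the result to whatever writes the merged document.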

Cheers,

Tom P