[XML-SIG] how to get the 'codepage' from a xml document
Mike Brown
mike@skew.org
Fri, 10 Jan 2003 02:55:15 -0700 (MST)
Remy C. Cool wrote:
> I found that the class InputSource has a method getEncoding
It's not what you're looking for. It only tells you what you previously set
with setEncoding(). The InputSource does not peek into the stream to
autodetect the encoding.
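A small sketch of that behaviour, using Python's xml.sax.xmlreader.InputSource (the Python counterpart of the SAX class; the sample document bytes are made up for illustration):

```python
import io
from xml.sax.xmlreader import InputSource

# The document declares iso-8859-1, but InputSource never reads the stream itself.
source = InputSource()
source.setByteStream(io.BytesIO(b'<?xml version="1.0" encoding="iso-8859-1"?><doc/>'))

before = source.getEncoding()   # None: nothing has been set yet
source.setEncoding("utf-8")
after = source.getEncoding()    # "utf-8": simply what was passed to setEncoding()

print(before, after)
```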
As it says in the spec at
http://www.saxproject.org/apidoc/org/xml/sax/InputSource.html :
    The SAX parser will use the InputSource object to determine how to read XML
    input. If there is a character stream available, the parser will read that
    stream directly, disregarding any text encoding declaration found in that
    stream. If there is no character stream, but there is a byte stream, the
    parser will use that byte stream, using the encoding specified in the
    InputSource or else (if no encoding is specified) autodetecting the
    character encoding using an algorithm such as the one in the XML
    specification.
For example, if you are receiving the byte stream over HTTP or some other
MIME-based protocol, a Content-Type header may have contained a charset
parameter that indicated the encoding. This would take precedence over the
self-declared encoding. You would use setEncoding() to indicate that the byte
stream being wrapped by the InputSource is to be decoded according to that
encoding and not according to whatever is mentioned in the prolog.
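A rough sketch of the HTTP case in Python, assuming the Content-Type header value and the body bytes are already in hand. The helper parse_with_http_charset and the TextCollector handler are hypothetical names invented for this illustration, and the behaviour shown is that of the stdlib expat-based reader:

```python
import io
import xml.sax
from email.message import Message
from xml.sax.xmlreader import InputSource


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that just accumulates character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_http_charset(body, content_type):
    """Parse body (bytes), letting the header's charset override the prolog."""
    # Borrow the email package's MIME header parsing to extract the charset.
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset()  # None if there is no charset parameter

    source = InputSource()
    source.setByteStream(io.BytesIO(body))
    if charset:
        # External information wins over the self-declared encoding.
        source.setEncoding(charset)

    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)
    return "".join(handler.chunks)


# No encoding declaration in the body; the header supplies the encoding.
text = parse_with_http_charset(b"<doc>caf\xe9</doc>", "text/xml; charset=iso-8859-1")
print(text)
```

If the header carries no charset parameter, the sketch leaves the InputSource encoding unset and the parser falls back to its own autodetection.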
> ... now I
> just have to find out how to get this implemented in a such a way
> that I can pass the encoding to the parser.
Why do you think you need to do this? A compliant parser will autodetect the
encoding unless you force it to use something else. Why do you want to do the
autodetection externally?
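To illustrate, here is a minimal sketch in which the same text is handed to Python's default SAX parser in three encodings with no hint from the caller. The text_of helper and TextCollector handler are hypothetical names used only for this example:

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that just accumulates character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def text_of(xml_bytes):
    """Parse raw bytes and return the character data; no encoding is passed in."""
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return "".join(handler.chunks)


# Encoding named in the XML declaration.
latin1_doc = '<?xml version="1.0" encoding="iso-8859-1"?><doc>caf\u00e9</doc>'.encode("iso-8859-1")
# Encoding signalled by a byte order mark (Python's "utf-16" codec writes one).
utf16_doc = "<doc>caf\u00e9</doc>".encode("utf-16")
# No declaration and no BOM: the parser assumes UTF-8.
utf8_doc = "<doc>caf\u00e9</doc>".encode("utf-8")

results = [text_of(doc) for doc in (latin1_doc, utf16_doc, utf8_doc)]
print(results)
```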
> Does anyone know where I can find an example on how to do this?
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52257
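What follows is not the code from that recipe. It is a simplified sketch of the kind of detection described in Appendix F of the XML specification: check for a byte order mark, then for the byte pattern of a leading "<" or "<?", then read the encoding declaration. The function name sniff_xml_encoding is made up for this example, EBCDIC detection is left out, and the UTF-8 fallback is the XML default for documents with no other indication.

```python
import codecs
import re

# (BOM, encoding) pairs; UTF-32 comes first because its little-endian BOM
# begins with the same two bytes as the UTF-16 little-endian BOM.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# How a leading "<" (UTF-32) or "<?" (UTF-16) looks when there is no BOM.
_SIGNATURES = [
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
]

# Encoding declaration in an ASCII-compatible prolog.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data):
    """Guess the encoding of an XML document from its first bytes."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name
    match = _DECL.match(data[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # XML default when nothing else is indicated


print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><doc/>'))
```

A compliant parser already does this internally, so a function like this is mainly useful when the bytes have to be decoded before they ever reach a parser.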
Mike
--
Mike J. Brown | http://skew.org/~mike/resume/
Denver, CO, USA | http://skew.org/xml/