[XML-SIG] how to get the 'codepage' from a xml document

Mike Brown mike@skew.org
Fri, 10 Jan 2003 02:55:15 -0700 (MST)


Remy C. Cool wrote:
> I found that the class InputSource has a method getEncoding

It's not what you're looking for. It only tells you what you previously set
with setEncoding(). The InputSource does not peek into the stream to 
autodetect the encoding.
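
A minimal sketch of that behaviour, assuming Python's xml.sax.xmlreader.InputSource
(the sample document and variable names are made up for illustration):

```python
import io
from xml.sax.xmlreader import InputSource

# A document that declares its own encoding in the prolog.
data = '<?xml version="1.0" encoding="iso-8859-1"?><doc/>'.encode("iso-8859-1")

source = InputSource()
source.setByteStream(io.BytesIO(data))

# Nothing was set, so nothing is reported; the stream is not inspected.
before = source.getEncoding()

# getEncoding() simply echoes whatever setEncoding() was given.
source.setEncoding("utf-8")
after = source.getEncoding()

print(before, after)
```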

As the InputSource API documentation says at
http://www.saxproject.org/apidoc/org/xml/sax/InputSource.html

  The SAX parser will use the InputSource object to determine how to read XML 
  input. If there is a character stream available, the parser will read that 
  stream directly, disregarding any text encoding declaration found in that 
  stream. If there is no character stream, but there is a byte stream, the 
  parser will use that byte stream, using the encoding specified in the 
  InputSource or else (if no encoding is specified) autodetecting the 
  character encoding using an algorithm such as the one in the XML 
  specification.

For example, if you are receiving the byte stream over HTTP or some other
MIME-based protocol, the Content-Type header may carry a charset parameter
that indicates the encoding. That external information takes precedence over
the encoding the document declares for itself. You would use setEncoding() to
say that the byte stream wrapped by the InputSource is to be decoded according
to that encoding, and not according to whatever is mentioned in the prolog.
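
Here is a rough sketch of the HTTP case in Python. The helper names
(source_from_http, TextCollector, parse_text) and the sample data are
hypothetical, and it assumes the default expat-based SAX driver, which
passes the InputSource encoding on to expat as an override:

```python
import io
import xml.sax
from email.message import Message
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


def source_from_http(body, content_type):
    """Wrap raw bytes in an InputSource, honoring a charset parameter if any."""
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset()  # None when no charset parameter
    source = InputSource()
    source.setByteStream(io.BytesIO(body))
    if charset is not None:
        # External information overrides the declaration in the prolog.
        source.setEncoding(charset)
    return source


class TextCollector(ContentHandler):
    """Collect character data so the decoded text can be inspected."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text(source):
    handler = TextCollector()
    xml.sax.parse(source, handler)
    return "".join(handler.chunks)


# The prolog claims utf-8, but the bytes are really iso-8859-1,
# and the (made-up) Content-Type header says so.
body = '<?xml version="1.0" encoding="utf-8"?><doc>caf\u00e9</doc>'.encode("iso-8859-1")
source = source_from_http(body, "text/xml; charset=ISO-8859-1")
text = parse_text(source)
```

If the header has no charset parameter, the helper leaves the encoding
unset and the parser falls back to its own autodetection.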

> ... now I 
> just have to find out how to get this implemented in a such a way 
> that I can pass the encoding to the parser. 

Why do you think you need to do this? A compliant parser autodetects the
encoding unless you force it to use something else. Why do you want to do the
autodetection externally?
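
To illustrate, here is a small sketch that hands raw bytes to Python's
xml.sax without naming an encoding anywhere outside the documents
themselves. The helper names and sample documents are made up, and it
assumes the default expat driver:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Collect character data so the decoded text can be inspected."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def text_of(data):
    """Parse bytes and return the text content; no encoding is passed in."""
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return "".join(handler.chunks)


# Same content, two different encodings; each document describes itself.
latin1 = '<?xml version="1.0" encoding="iso-8859-1"?><doc>caf\u00e9</doc>'.encode("iso-8859-1")
utf16 = '<?xml version="1.0" encoding="utf-16"?><doc>caf\u00e9</doc>'.encode("utf-16")

latin1_text = text_of(latin1)
utf16_text = text_of(utf16)
```

Both calls should yield the same Unicode string, because the parser works
out the encoding from the byte order mark and the prolog on its own.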
 
> Does anyone know where I can find an example on how to do this?

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52257
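
If you really do need the name of the encoding outside the parser, the
general idea looks something like the sketch below. This is not the
recipe's code, only a simplified illustration: sniff_encoding is a
hypothetical helper, and it ignores UTF-32, EBCDIC and UTF-16 without a
byte order mark, all of which the algorithm in the XML specification
covers.

```python
import codecs
import re

# Byte order marks checked first, since they come before the prolog.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# The encoding pseudo-attribute of an XML declaration at the very start.
_DECL = re.compile(rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def sniff_encoding(data):
    """Guess the encoding of an XML document from its first bytes."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # No byte order mark and no declaration: XML defaults to UTF-8.
    return "utf-8"


declared = sniff_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><doc/>')
```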

Mike

-- 
  Mike J. Brown   |  http://skew.org/~mike/resume/
  Denver, CO, USA |  http://skew.org/xml/