[Expat-discuss] Parser query

Kev Buckley k.buckley@lancaster.ac.uk
Wed, 15 Aug 2001 15:55:18 +0100 (BST)


Hello,

I'm trying to understand what bytes the expat parser is returning
when it reads the numeric character references for the &szlig; and
&oslash; characters.

Bit of background.

I'm dumping data off a Palm, which produces

octal 337 (dec 223)  for the &szlig;
      370 (dec 248)  for the &oslash;

But if I have &#223; (ß) and &#248; (ø) in my XML doc, when expat
parses the doc I seem to get back two bytes for each, as follows:

octal 303 + octal 237 for the ß
octal 303 + octal 270 for the ø
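
For what it's worth, this can be reproduced outside my own code. The
following is only a minimal sketch, assuming Python's xml.parsers.expat
binding rather than the C API I am actually using; the helper name
parsed_octal_bytes is made up for illustration. It parses a one-element
document and prints the character data as octal bytes, on the assumption
that the output is UTF-8 (note that the second byte for &#248; comes out
as octal 270):

```python
import xml.parsers.expat


def parsed_octal_bytes(xml_text):
    """Parse xml_text with expat and return its character data
    as a list of octal strings, one per UTF-8 byte."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # expat may deliver character data in several pieces, so collect them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    text = "".join(chunks)
    # Assumption: we look at the text as UTF-8, expat's default output encoding.
    return [format(b, "o") for b in text.encode("utf-8")]


print(parsed_octal_bytes("<doc>&#223;</doc>"))  # ['303', '237']
print(parsed_octal_bytes("<doc>&#248;</doc>"))  # ['303', '270']
```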


Other "extended shift" characters from the Palm seem to translate OK,
in that their &#nnn; codes get parsed as 

octal 302 + octal NNN  

where NNN equates to the decimal nnn used in the numeric entity code.

e.g., the trademark symbol

octal 231 (dec 153)  ->   &#153;  which is parsed as

octal 302 + octal 231
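
The 302 and 303 prefixes look like the lead bytes of two-byte UTF-8
sequences. As a hedged sketch of that arithmetic (the function name
utf8_two_bytes is hypothetical, and this is hand-rolled for illustration,
not expat's own code): code points 128..191 would get lead byte octal 302
with the value itself as the second byte, while 192..255 would get lead
byte octal 303 with the value minus 64 (decimal) as the second byte.

```python
def utf8_two_bytes(code_point):
    """Encode a code point in the range 128..2047 as two UTF-8 bytes by hand."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("two-byte UTF-8 covers 128..2047 only")
    lead = 0xC0 | (code_point >> 6)     # 110xxxxx: the upper bits
    trail = 0x80 | (code_point & 0x3F)  # 10xxxxxx: the low six bits
    return lead, trail


for dec in (153, 223, 248):
    lead, trail = utf8_two_bytes(dec)
    # Cross-check the hand-rolled result against Python's own encoder.
    assert bytes([lead, trail]) == chr(dec).encode("utf-8")
    print(f"dec {dec} (octal {dec:o}) -> octal {lead:o} + octal {trail:o}")

# dec 153 (octal 231) -> octal 302 + octal 231
# dec 223 (octal 337) -> octal 303 + octal 237
# dec 248 (octal 370) -> octal 303 + octal 270
```

If that reading is right, the parser is not mangling the characters; each
one is simply coming back as a two-byte UTF-8 sequence rather than as the
single byte the Palm produced.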


If the above makes sense to anyone, do you have any clues as to what
I have missed in trying to get these characters through the expat
parser?


Kevin

-- 
Regards,

----------------------------------------------------------------------
*  Kevin M. Buckley              e-mail: K.Buckley@lancaster.ac.uk   *
*                                                                    *
*  Systems Administrator                                             *
*  Computer Centre                                                   *
*  Lancaster University          Voice:  +44 (0) 1524 5 93718        *
*  LANCASTER. LA1 4YW            Fax  :  +44 (0) 1524 5 25113        *
*  England.                                                          *
*                                                                    *
*  My PC runs Linux/GNU, you still computing the Bill Gate$' way ?   *
----------------------------------------------------------------------