[Expat-discuss] BOM Header causes an access violation exception

Oxley, Nicholas noxley@rsasecurity.com
Mon, 25 Jun 2001 16:08:39 +0100


Hi,

I'm using expat to parse UTF-8 data. I've been using the Windows 2000
Notepad to edit and save files in UTF-8 format so that I can test with Unicode
characters. I've found that when I don't strip the byte order mark from the
UTF-8 file that Notepad generates, expat crashes with an access violation.
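
For reference, a minimal sketch of stripping the BOM before the data reaches
the parser, assuming the whole file is already in memory. skip_utf8_bom is a
hypothetical helper written for this post, not part of expat:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of expat): if the buffer starts with the
   UTF-8 byte order mark EF BB BF, return a pointer just past it and
   reduce *len by 3; otherwise return buf unchanged. */
static const char *skip_utf8_bom(const char *buf, size_t *len)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

    if (buf != NULL && *len >= sizeof bom &&
        memcmp(buf, bom, sizeof bom) == 0) {
        *len -= sizeof bom;
        return buf + sizeof bom;
    }
    return buf;
}
```

The returned pointer and adjusted length would then be passed to XML_Parse()
in place of the original buffer. This only works around the crash; it does
not explain it.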

In the initScan() function (xmltok.c ln. 1488) it correctly identifies that
the first three bytes of the data buffer are in fact a byte order
mark, and returns XML_TOK_BOM.

In the doProlog() function, XmlTokenRole() then returns XML_ROLE_NONE
(which does nothing in this context). The data pointer is then advanced
(s = next; ln. 3252), but next is null at that point, so XmlPrologTok is
called with the parameters:

XmlPrologTok(utf8_encoding, s (NULL ptr), end (reference to some heap
memory), &next (address of s));

Then, on ln. 983, there is a switch (BYTE_TYPE(enc, ptr)), and boom: the
macro it expands to dereferences the null pointer.


#define BYTE_TYPE(enc, p) SB_BYTE_TYPE(enc, p)

#define SB_BYTE_TYPE(enc, p) \
  (((struct normal_encoding *)(enc))->type[(unsigned char)*(p)])
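
To illustrate why the lookup faults, here is a reduced sketch. The struct is
a simplified stand-in that models only the type table, and checked_byte_type
is a hypothetical guard written for this post, not something expat provides
and not a proposed patch:

```c
#include <stddef.h>

/* Simplified stand-in for expat's struct normal_encoding: only the
   byte-type table is modelled here. */
struct normal_encoding_sketch {
    unsigned char type[256];
};

/* Same shape as SB_BYTE_TYPE: *(p) is evaluated to form the index, so a
   NULL p faults before the table is ever read. */
#define SKETCH_BYTE_TYPE(enc, p) \
  (((const struct normal_encoding_sketch *)(enc))->type[(unsigned char)*(p)])

/* Hypothetical guarded lookup: returns -1 instead of dereferencing a
   NULL pointer or one at or past the end of the buffer. */
static int checked_byte_type(const struct normal_encoding_sketch *enc,
                             const char *p, const char *end)
{
    if (p == NULL || end == NULL || p >= end)
        return -1;
    return SKETCH_BYTE_TYPE(enc, p);
}
```

A guard like this would only hide the symptom; the open question is still
why next is null after the BOM token.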


If anyone could shed any light on this problem, that'd be great.


Kindest Regards,

Nicholas.