[Expat-discuss] BOM Header causes an access violation exception
Oxley, Nicholas
noxley@rsasecurity.com
Mon, 25 Jun 2001 16:08:39 +0100
Hi,
I'm using expat to parse UTF-8 data. I've been using the Windows 2000
Notepad to edit and save files in UTF-8 format so that I can test with
Unicode characters. I've found that when I don't strip the byte order mark
from the UTF-8 file that Notepad generates, expat crashes with an access
violation.
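For reference, the workaround of stripping the mark can be sketched roughly as below. This is only an illustration: utf8_bom_length is a made-up helper name, not an expat function, and it assumes the whole document (or at least its first three bytes) is already in one buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of expat: returns how many leading bytes
   to skip, i.e. 3 if the buffer starts with the UTF-8 byte order mark
   (EF BB BF) and 0 otherwise. */
size_t utf8_bom_length(const char *data, size_t len)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

    /* Precondition: a non-empty buffer must not be NULL. */
    assert(data != NULL || len == 0);

    if (len >= sizeof bom && memcmp(data, bom, sizeof bom) == 0)
        return sizeof bom;
    return 0;
}
```

With n = utf8_bom_length(buf, len), the caller would hand buf + n and len - n to XML_Parse instead of the raw buffer.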
In the initScan() function (xmltok.c ln. 1488) it correctly identifies that
the first three bytes of the data buffer are in fact a byte order mark, and
returns XML_TOK_BOM.
In the doProlog() function, XmlTokenRole() then returns XML_ROLE_NONE (which
does nothing in this context). The data pointer is advanced
( s = next; ln. 3252), but next is null at that point, so XmlPrologTok is
then called with the parameters:
XmlPrologTok(utf8_encoding, s (NULL ptr), end (reference to some heap
memory), &next (address of s));
Then, on ln. 983, there is a switch (BYTE_TYPE(enc, ptr)), and boom: the
macro expands to a dereference of ptr, which is the null pointer. The macros
are defined as:
#define BYTE_TYPE(enc, p) SB_BYTE_TYPE(enc, p)
#define SB_BYTE_TYPE(enc, p) \
(((struct normal_encoding *)(enc))->type[(unsigned char)*(p)])
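To show why the macro faults, here is a cut-down sketch. The struct below is a simplified stand-in that models only the type table, not expat's real struct normal_encoding, and checked_byte_type is a hypothetical guard I made up for illustration, not something expat provides.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for expat's struct normal_encoding: only the
   per-byte type table that the macro indexes is modelled here. */
struct normal_encoding {
    unsigned char type[256];
};

/* Same shape as the macro quoted above: it reads *p first, then uses
   that byte as an index into the table. A NULL p faults on the read. */
#define SB_BYTE_TYPE(enc, p) \
    (((struct normal_encoding *)(enc))->type[(unsigned char)*(p)])

/* Hypothetical guard (not expat code): returns -1 instead of
   dereferencing when ptr is NULL or no input remains before end. */
int checked_byte_type(struct normal_encoding *enc,
                      const char *ptr, const char *end)
{
    assert(enc != NULL);
    if (ptr == NULL || ptr >= end)
        return -1;
    return SB_BYTE_TYPE(enc, ptr);
}
```

A guard like this would only hide the symptom. The underlying question is still why next comes back null after the XML_TOK_BOM token.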
If anyone could shed any light on this problem, that'd be great.
Kindest Regards,
Nicholas.