[Expat-discuss] not well-formed (invalid token) error

Nick MacDonald nickmacd at gmail.com
Wed Apr 8 01:17:06 CEST 2009


Krishna:

I'm afraid you're fighting a losing battle...  You're trying to use
the wrong tool.  I have never looked into XHTML, but my gut instinct
tells me it would be a valid XML expression of HTML... and that is
where you need to go if you wish to continue to use eXpat as you're
currently attempting.  You could try to write a front-end filter
that converts HTML that is not valid XML into valid XML, but that
seems, at first blush, to be a lot of work and probably not worth
the payoff.  There must be some HTML parser out there that you could
use...  but I have never had the need for one and so cannot point
you to a specific one.
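As a rough illustration of the front-end filter idea: Expat has no option to accept unquoted attribute values, because XML 1.0 requires every attribute value to be enclosed in single or double quotes. The sketch below uses a hypothetical helper, quote_attr_values (not part of Expat), that wraps unquoted values inside tags in double quotes before the text is handed to XML_Parse. It assumes a caller-supplied fixed-size output buffer and deliberately ignores comments, CDATA sections, <script> bodies, and characters such as & or < inside values, so it is a starting point rather than a general HTML-to-XML converter.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Expat): copy `in` to `out`, wrapping
 * unquoted attribute values that appear inside tags in double quotes,
 * e.g. <a href=foo> becomes <a href="foo">.
 * Returns the number of bytes written (excluding the NUL terminator),
 * or -1 if `out` (of size `outsz`) is too small.
 * Naive sketch: comments, CDATA, <script> bodies, and '&' or '<' inside
 * values are not handled.
 */
static long quote_attr_values(const char *in, char *out, size_t outsz)
{
    size_t o = 0;
    int in_tag = 0;

/* Append one byte, always leaving room for the final NUL. */
#define EMIT(ch) do { if (o + 1 >= outsz) return -1; out[o++] = (ch); } while (0)

    while (*in) {
        char c = *in++;

        if (!in_tag) {                 /* text content: copy verbatim */
            if (c == '<')
                in_tag = 1;
            EMIT(c);
        } else if (c == '>') {         /* end of tag */
            in_tag = 0;
            EMIT(c);
        } else if (c == '"' || c == '\'') {
            /* Already-quoted value: copy through the matching quote. */
            EMIT(c);
            while (*in && *in != c)
                EMIT(*in++);
            if (*in)
                EMIT(*in++);
        } else if (c == '=') {
            EMIT(c);
            while (isspace((unsigned char)*in))
                EMIT(*in++);
            if (*in && *in != '"' && *in != '\'' && *in != '>') {
                /* Unquoted value: runs until whitespace or '>'. */
                EMIT('"');
                while (*in && !isspace((unsigned char)*in) && *in != '>')
                    EMIT(*in++);
                EMIT('"');
            }
        } else {
            EMIT(c);
        }
    }
#undef EMIT

    if (o >= outsz)                    /* only reachable when outsz == 0 */
        return -1;
    out[o] = '\0';
    return (long)o;
}
```

For example, <a href=http://www.yahoo.com>Yahoo!</a> comes out as <a href="http://www.yahoo.com">Yahoo!</a>, while values that are already quoted (including ones containing '=') pass through unchanged.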

Good luck,
  Nick

On Tue, Apr 7, 2009 at 6:34 PM, Krishna Kondaka <kkondaka at yahoo.com> wrote:
>
> Thank you very much Nick!
>
> I am also getting an error while parsing '<a href=http://www.yahoo.com>' because it expects the attribute value to be in double quotes, like <a href="http://www.yahoo.com">. Is there any way to instruct expat not to expect double quotes?
>
> Thanks
> Krishna
>
>
>
> ----- Original Message ----
> From: Nick MacDonald <nickmacd at gmail.com>
> To: Krishna Kondaka <kkondaka at yahoo.com>
> Sent: Tuesday, April 7, 2009 2:55:54 PM
> Subject: Re: [Expat-discuss] not well-formed (invalid token) error
>
> I should also have noted that you could have coded the hr tags as <hr/>.
>
> On Tue, Apr 7, 2009 at 5:49 PM, Nick MacDonald <nickmacd at gmail.com> wrote:
>> HTML is NOT valid XML.  eXpat is NOT an HTML parser; it is an XML
>> parser.  Your file, fixed below, gets past the <hr> errors (I added
>> the missing </hr> end tags).
>>
>> Nick
>>
>>> <html>
>>>        <head>
>>>                <title>
>>>                        TEST
>>>                </title>
>>>        </head>
>>>        <body>
>>>                <hr></hr>
>>>                <a href=http://www.yahoo.com>Yahoo!</a>
>>>                <hr></hr>
>>>        </body>
>>> </html>
>>
>>
>> On Tue, Apr 7, 2009 at 5:39 PM, Krishna Kondaka <kkondaka at yahoo.com> wrote:
>>>
>>> Hi
>>>
>>> I am trying to parse a very simple HTML file, but I am getting a 'not well-formed (invalid token)' error. Is there anything I can do to make this work without getting errors?
>>>
>>> Thanks in advance!
>>>
>>>
>>> The HTML file:
>>>
>>> <html>
>>>        <head>
>>>                <title>
>>>                        TEST
>>>                </title>
>>>        </head>
>>>        <body>
>>>                <hr>
>>>                <a href=http://www.yahoo.com>Yahoo!</a>
>>>                <hr>
>>>        </body>
>>> </html>
>>>
>>>
>>> The C code (start and end handlers, plus the parse routine) is as follows:
>>>
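>>> #include <stdio.h>
>>> #include <string.h>
>>> #include <expat.h>
>>>
>>> /* buginf() is the poster's own debug-print routine; it is mapped to
>>>  * printf here (assumption) so the excerpt compiles stand-alone. */
>>> #define buginf printf
>>>
>>> static char curr_elem[256];        /* most recent start-tag name */
>>> static char curr_data_buf[4096];   /* character data collected so far */
>>> static char *curr_data_bufp = curr_data_buf;
>>>
>>> /* The poster's cdata_handler was not shown; this hypothetical version
>>>  * appends each chunk of character data to curr_data_buf. */
>>> static void
>>> cdata_handler(void *data, const XML_Char *s, int len)
>>> {
>>>    size_t used = (size_t)(curr_data_bufp - curr_data_buf);
>>>    size_t room = sizeof curr_data_buf - 1 - used;
>>>    size_t n = (size_t)len < room ? (size_t)len : room;
>>>
>>>    (void)data;
>>>    memcpy(curr_data_bufp, s, n);
>>>    curr_data_bufp += n;
>>>    *curr_data_bufp = '\0';
>>> }
>>>
>>> static void
>>> elem_start(void *data, const char *el, const char **attr)
>>> {
>>>  int i;
>>>  int *Depth = (int *)data;
>>>
>>>  /* Bounded copy so a long element name cannot overflow curr_elem. */
>>>  snprintf(curr_elem, sizeof curr_elem, "%s", el);
>>>  for (i = 0; i < *Depth; i++)
>>>    buginf("  ");
>>>
>>>  buginf("Element(%s)->", el);
>>>
>>>  /* attr is a NULL-terminated array of name/value pairs. */
>>>  for (i = 0; attr[i]; i += 2) {
>>>    buginf("Attribute(%s)='%s':", attr[i], attr[i + 1]);
>>>  }
>>>  buginf("\n");
>>>
>>>  (*Depth)++;
>>> }
>>>
>>> static void
>>> elem_end(void *data, const char *el)
>>> {
>>>    int i, *Depth = data;
>>>
>>>    /* Print collected text when this end tag matches the most recent
>>>     * start tag, then reset the buffer. */
>>>    if (el && !strcmp(el, curr_elem) && *curr_data_buf) {
>>>        for (i = 0; i < *Depth; i++)
>>>            buginf("  ");
>>>        buginf("Value='%s'\n", curr_data_buf);
>>>        curr_data_buf[0] = '\0';
>>>        curr_data_bufp = curr_data_buf;
>>>    }
>>>    (*Depth)--;
>>> }
>>>
>>> static void xml_test_parse (int len, char *bufp)
>>> {
>>>    int depth = 0;
>>>    XML_Parser parser;
>>>
>>>    parser = XML_ParserCreate(NULL);
>>>    if (!parser)
>>>        return;
>>>    XML_SetUserData(parser, &depth);
>>>    XML_SetElementHandler(parser, elem_start, elem_end);
>>>    XML_SetCharacterDataHandler(parser, cdata_handler);
>>>
>>>    /* isFinal is XML_TRUE: the whole document is in bufp. */
>>>    if (XML_Parse(parser, bufp, len, XML_TRUE) == XML_STATUS_ERROR) {
>>>        /* XML_GetCurrentLineNumber returns XML_Size, so cast for %lu. */
>>>        buginf("%s at line %lu\n",
>>>                XML_ErrorString(XML_GetErrorCode(parser)),
>>>                (unsigned long)XML_GetCurrentLineNumber(parser));
>>>    }
>>>    XML_ParserFree(parser);
>>> }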
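Along the same lines as Nick's <hr/> suggestion, the <hr> tags in the original file could also be rewritten automatically before parsing. The following is a minimal sketch with a hypothetical helper, close_void_tags (not part of Expat), that turns start tags of HTML void elements (elements that never take an end tag, such as hr, br and img; the list here is taken from HTML5 and may need adjusting) into XML empty-element tags like <hr/>. It assumes attribute values are already quoted and contain no '>', ignores comments and CDATA, and should not be run on input that already has explicit end tags such as <hr></hr>, since that would leave a stray </hr>.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* HTML void elements (HTML5 list; an assumption, adjust as needed). */
static const char *const void_elems[] = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr", NULL
};

/* Case-insensitive check: is name[0..len) one of void_elems? */
static int is_void_elem(const char *name, size_t len)
{
    for (size_t i = 0; void_elems[i]; i++) {
        const char *v = void_elems[i];
        size_t j = 0;
        while (j < len && v[j] && tolower((unsigned char)name[j]) == v[j])
            j++;
        if (j == len && v[j] == '\0')
            return 1;
    }
    return 0;
}

/*
 * Hypothetical helper: copy `in` to `out`, rewriting "<hr>" as "<hr/>"
 * (and likewise for other void elements). Tags already ending in "/>"
 * are left alone. Returns bytes written (excluding the NUL), or -1 if
 * `out` (of size `outsz`) is too small.
 */
static long close_void_tags(const char *in, char *out, size_t outsz)
{
    size_t o = 0;

    while (*in) {
        if (in[0] == '<' && isalpha((unsigned char)in[1])) {
            const char *name = in + 1;
            size_t len = 0;
            while (isalnum((unsigned char)name[len]))
                len++;
            const char *gt = strchr(name, '>');
            if (gt && is_void_elem(name, len) && gt[-1] != '/') {
                size_t n = (size_t)(gt - in);      /* bytes before '>' */
                if (o + n + 3 > outsz)             /* n + "/>" + NUL */
                    return -1;
                memcpy(out + o, in, n);
                o += n;
                out[o++] = '/';
                out[o++] = '>';
                in = gt + 1;
                continue;
            }
        }
        if (o + 2 > outsz)                         /* one byte + NUL */
            return -1;
        out[o++] = *in++;
    }
    if (o + 1 > outsz)
        return -1;
    out[o] = '\0';
    return (long)o;
}
```

Run over the body of the original file, each <hr> becomes <hr/>; the unquoted href attribute still has to be dealt with separately before Expat will accept the document.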

