[Expat-discuss] Parsing question from a newbie.

Thomas, John jct at sharplabs.com
Wed Mar 30 03:37:17 CEST 2011


Ladies and gentlemen,

I'm sorry for the dumb*** question, but I am an expat newbie.  And, for
that matter, I am an XML newbie.  I may make some silly assumptions in
my problem statement below.  Please feel free to thrash me with my own
ignorance.

Suppose that I have an XML snippet of the form:

<myXML>
  <size_1>
    <height>8.5</height>
    <width>11.0</width>
  </size_1>
  <size_2>
    <height>8.0</height>
    <width>10.0</width>
  </size_2>
</myXML>

Note that the leaf element names height and width each appear twice.  I
presume that this snippet is legal XML, despite this redundancy.  (Each
pair can, in theory, be disambiguated by its context; i.e., size_1 vs.
size_2.)  Perhaps this is NOT legal XML, but I shall proceed with my
questions assuming that it is.

Now suppose that I have the following application code (adapted
shamelessly from "elements.c"):


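#include <stdio.h>
#include <expat.h>

/* As in elements.c: printf length modifier matching XML_Size. */
#ifdef XML_LARGE_SIZE
#define XML_FMT_INT_MOD "ll"
#else
#define XML_FMT_INT_MOD "l"
#endif

static void XMLCALL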
startElement(void *userData, const char *name, const char **atts)
{
}

static void XMLCALL
value_data_handler(void *userData, const char *buf_ptr, int len)
{
}

static void XMLCALL
default_handler(void *userData, const char *buf_ptr, int len)
{
}

static void XMLCALL
endElement(void *userData, const char *name)
{
}

int
main(int argc, char *argv[])
{
  char buf[BUFSIZ];
  XML_Parser parser = XML_ParserCreate(NULL);
  int done;
  int depth = 0;
  XML_SetUserData(parser, &depth);
  XML_SetElementHandler(parser, startElement, endElement);
  XML_SetCharacterDataHandler(parser, value_data_handler);
  XML_SetDefaultHandler(parser, default_handler);
  do {
    int len = (int)fread(buf, 1, sizeof(buf), stdin);
    done = len < sizeof(buf);
    if (XML_Parse(parser, buf, len, done) == XML_STATUS_ERROR) {
      fprintf(stderr,
              "%s at line %" XML_FMT_INT_MOD "u\n",
              XML_ErrorString(XML_GetErrorCode(parser)),
              XML_GetCurrentLineNumber(parser));
      return 1;
    }
  } while (!done);
  XML_ParserFree(parser);
  return 0;
}

If my understanding is correct, expat calls the value_data_handler()
function when it has located the start and stop tags for an XML element.
And the data handler function is written by me, the developer, to parse
element values (height and width in my example) out of the XML.  In
order for my code to disambiguate these redundant element names, there
must be some way to determine the context for each call.  

In other words, I would expect one of the parameters to the
value_data_handler() call to be something equivalent to
"myXML:size_1:height" to distinguish this call from the one with the
context equivalent to "myXML:size_2:height".

Unless I am missing something, the only parameters that I get from expat
to this call are:
1) A pointer to my own "userData" data structure.
2) A pointer to the XML data buffer at the point corresponding to
   the value that is to be parsed.
3) The length of the data value text, before the element-ending
   markup token.
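
To make those three parameters concrete, here is a minimal sketch of a
handler with the same shape as value_data_handler().  The struct
text_buf accumulator and the helper names are hypothetical, invented
for illustration.  Two properties of expat matter here: the text passed
to a character data handler is NOT NUL-terminated, so only len bytes
may be read, and one run of text may arrive split across several calls.
Whitespace between tags (newlines, indentation) is delivered the same
way, which is one possible source of very short chunks.

```c
#include <ctype.h>
#include <string.h>

#ifndef XMLCALL
#define XMLCALL  /* normally supplied by <expat.h>; empty fallback so
                    this sketch compiles on its own */
#endif

/* Hypothetical accumulator that userData is assumed to point to. */
struct text_buf {
  char   text[256];
  size_t used;
};

/* Hypothetical helper: returns 1 if the len bytes at s are all
 * whitespace (e.g. the newline and indentation between two tags). */
static int is_all_space(const char *s, int len)
{
  int i;
  for (i = 0; i < len; i++) {
    if (!isspace((unsigned char)s[i]))
      return 0;
  }
  return 1;
}

/* Same signature as value_data_handler(): appends one chunk to the
 * accumulator and keeps it NUL-terminated.  Text that does not fit is
 * silently truncated in this sketch. */
static void XMLCALL
collect_text(void *userData, const char *buf_ptr, int len)
{
  struct text_buf *tb = (struct text_buf *)userData;
  size_t room = sizeof(tb->text) - 1 - tb->used;
  size_t n;

  if (len <= 0)
    return;
  n = (size_t)len < room ? (size_t)len : room;
  memcpy(tb->text + tb->used, buf_ptr, n);  /* read exactly n bytes */
  tb->used += n;
  tb->text[tb->used] = '\0';
}
```

With a handler like this, the accumulator would typically be reset in
the start-element handler and examined in the end-element handler,
using is_all_space() to skip chunks that contain only layout
whitespace.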

First question: Why does expat give a length value of 1 when there is no
text between <tokenA></tokenA>?

I know that the userData structure COULD hold this information that I
seek provided that I, the developer, put it there.  But I would have to
know how and when to OBTAIN that contextual data in the first place.
And I do not.  So ....

Second question: How (and when) do I obtain the context information for
a call to value_data_handler()?  Does it exist in a convenient form and
is there an expat-provided function call to get it?
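
For illustration, one possible way to carry such context in userData is
to build the path in the start-element handler and trim it in the
end-element handler, so that the character data handler can simply read
it.  This is only a sketch under stated assumptions: struct parse_ctx,
CTX_PATH_LEN and the ctx_* function names are hypothetical, the ':'
separator just mirrors the "myXML:size_1:height" notation above, and
the path is assumed to fit in the fixed buffer (there is no overflow
recovery).

```c
#include <stdio.h>
#include <string.h>

#ifndef XMLCALL
#define XMLCALL  /* normally supplied by <expat.h>; empty fallback so
                    this sketch compiles on its own */
#endif

#define CTX_PATH_LEN 256  /* hypothetical limit on the path length */

/* Hypothetical structure that userData is assumed to point to. */
struct parse_ctx {
  char path[CTX_PATH_LEN];  /* e.g. "myXML:size_1:height" */
};

/* Start tag: append ":name" (or just "name" for the root element). */
static void XMLCALL
ctx_start(void *userData, const char *name, const char **atts)
{
  struct parse_ctx *ctx = (struct parse_ctx *)userData;
  size_t used = strlen(ctx->path);

  (void)atts;  /* attributes are not needed for the path */
  snprintf(ctx->path + used, sizeof(ctx->path) - used,
           "%s%s", used ? ":" : "", name);
}

/* End tag: drop the last component, which is this element's name. */
static void XMLCALL
ctx_end(void *userData, const char *name)
{
  struct parse_ctx *ctx = (struct parse_ctx *)userData;
  size_t used = strlen(ctx->path);
  size_t n = strlen(name);

  /* Also remove the separator unless this was the root element. */
  ctx->path[used > n ? used - n - 1 : 0] = '\0';
}

/* Character data: the current path tells us which element we are in.
 * buf_ptr is not NUL-terminated, hence the "%.*s" with len. */
static void XMLCALL
ctx_text(void *userData, const char *buf_ptr, int len)
{
  struct parse_ctx *ctx = (struct parse_ctx *)userData;
  printf("%s = \"%.*s\"\n", ctx->path, len, buf_ptr);
}
```

These would be registered with XML_SetUserData(parser, &ctx),
XML_SetElementHandler(parser, ctx_start, ctx_end) and
XML_SetCharacterDataHandler(parser, ctx_text), with ctx.path
initialized to the empty string.  As written, ctx_text() would also
print the whitespace-only chunks that fall between tags.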

And finally,

Third question: If I am looking at this problem "all wrong", what is the
"right way" to look at it?

Thank you for your patience and (in advance) for your help.

John C Thomas
Sharp Laboratories of America
jct at sharplabs.com



