[Expat-discuss] RE: Character Data

Dirk Dierckx brc@fourlittlemice.com
Mon, 13 Aug 2001 09:20:07 +0200


"Vassilii Nemtchinov" <vnemchin@hotmail.com> wrote:
>I know that the subject of handling character data has already been
>discussed here. Still, I would like somebody to provide some suggestions on
>the subject. Since we cannot assume that the character data arrives in one
>chunk, for various reasons, in my solution I am allocating a buffer in the
>start tag handler and I am also setting a flag indicating that I've
>encountered the beginning of the element. I keep adding character data to
>the buffer in the character handler until I reset the flag in the end
>handler. I can see several problems in using this method. First, it seems
>that the whole purpose of an event-driven parser has been defeated, since I
>have to set sentinels myself and cannot rely entirely on the parser. Secondly,
>in the worst case I have to allocate as many sentinels as I have elements in
>the document (same goes for separate buffers for character data). I am sure
>that somebody has found a better solution for getting character data.

I'll explain (in short) how I do it.  I use something like the
following:

#include <stdlib.h> /* malloc, realloc */
#include <string.h> /* strcpy, memcpy */

struct SParserContext
{
	size_t m_szElementValueSize; /* init: m_szElementValueSize = (size_t)0U; */
	size_t m_szElementValueLen; /* init: m_szElementValueLen = (size_t)0U; */
	char *m_pchElementValue; /* init: m_pchElementValue = NULL; */
	char **m_ppchCurrentElementValue; /* init: m_ppchCurrentElementValue = NULL; */
	...
};

and use this structure in the following way:

void
assignElementValueToCurrentElement(struct SParserContext *psCtx)
{
	if(psCtx->m_ppchCurrentElementValue
		&& NULL == *(psCtx->m_ppchCurrentElementValue)
		&& psCtx->m_szElementValueLen > (size_t)0U)
	{
		char *pchCopy = (char*)malloc(
			psCtx->m_szElementValueLen + (size_t)1U);
		if(pchCopy)
		{
			/* m_pchElementValue is always terminated by processTagData. */
			strcpy(pchCopy, psCtx->m_pchElementValue);
			*(psCtx->m_ppchCurrentElementValue) = pchCopy;
		}
	}

	/* Make sure the element value is empty. */
	psCtx->m_szElementValueLen = (size_t)0U;
}

void
processOpenTag(void *pvUserData,
		   const char *pcchElement,
		   const char **ppcchAttributes)
{
	struct SParserContext *psCtx = (struct SParserContext*)pvUserData;

	/* 
		If we have a value stored in m_pchElementValue at this point
		=> it will be the full value of the *previously encountered* element.
	*/
	assignElementValueToCurrentElement(psCtx);

	...
	/* Your new element (pcchElement) needs a char * for its value
	   (~pchThisElementValueBuffer), initially NULL. It must outlive
	   this handler, since it is written to later, so ... */
	pchThisElementValueBuffer = NULL;
	psCtx->m_ppchCurrentElementValue = &pchThisElementValueBuffer;
	...
}

void
processCloseTag(void *pvUserData,
		    const char *pcchElement)
{
	struct SParserContext *psCtx = (struct SParserContext*)pvUserData;

	/*
		If we have a value stored in m_pchElementValue at this point
		=> it will be the full value of the *current* (~ pcchElement) element.
	*/
	assignElementValueToCurrentElement(psCtx);
}

/*
	Beware: Decoding of pcchData has been left out of this sample code !!!
*/
void
processTagData(void *pvUserData,
		   const char *pcchData,
		   int iDataLen)
{
	const size_t cszIncrement = (size_t)512U;
	struct SParserContext *psCtx = (struct SParserContext*)pvUserData;
	size_t szDataLen = (size_t)iDataLen, szBufferSize;

	szBufferSize = psCtx->m_szElementValueLen + szDataLen + (size_t)1U;
	if(szBufferSize > psCtx->m_szElementValueSize)
	{
		char *pchNew = NULL;

		/*
			Not enough memory available in m_pchElementValue to append
			pcchData to it, create/enlarge m_pchElementValue.
		*/
		szBufferSize = ((szBufferSize / cszIncrement) + (size_t)1U)
			* cszIncrement;
		if(psCtx->m_szElementValueSize)
			pchNew = (char*)realloc(psCtx->m_pchElementValue, szBufferSize);
		else
			pchNew = (char*)malloc(szBufferSize);
		if(pchNew)
		{
			psCtx->m_szElementValueSize = szBufferSize;
			psCtx->m_pchElementValue = pchNew;
		}
	}

	if(szBufferSize <= psCtx->m_szElementValueSize)
	{
		char *pchString = &(psCtx->m_pchElementValue[psCtx->m_szElementValueLen]);

		memcpy(pchString, pcchData, szDataLen);
		pchString[szDataLen] = '\0'; /* m_pchElementValue is always terminated. */
		psCtx->m_szElementValueLen += szDataLen;
	}
}
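
To try the accumulation logic without expat, here is a minimal sketch of
the same grow-and-append idea on its own. The names TextBuf, textBufAppend
and textBufTake are hypothetical and not part of the sample above or of
expat; the 512-byte increment is simply carried over from processTagData.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the accumulation fields of SParserContext. */
struct TextBuf
{
	size_t size; /* bytes allocated */
	size_t len;  /* bytes used, excluding the terminating '\0' */
	char *data;  /* NULL until the first append */
};

/* Append one chunk, as a character data handler would.
   Returns 1 on success, 0 if memory could not be obtained. */
int
textBufAppend(struct TextBuf *buf, const char *chunk, int chunkLen)
{
	const size_t increment = (size_t)512U;
	size_t n, needed;

	assert(buf != NULL && chunk != NULL && chunkLen >= 0);
	n = (size_t)chunkLen;
	needed = buf->len + n + (size_t)1U;
	if(needed > buf->size)
	{
		/* Grow in steps of increment, always leaving room for needed. */
		size_t newSize = ((needed / increment) + (size_t)1U) * increment;
		/* realloc(NULL, n) behaves like malloc(n). */
		char *p = (char*)realloc(buf->data, newSize);
		if(!p)
			return 0; /* the old buffer is still valid and unchanged */
		buf->data = p;
		buf->size = newSize;
	}
	memcpy(buf->data + buf->len, chunk, n);
	buf->len += n;
	buf->data[buf->len] = '\0'; /* keep the buffer terminated */
	return 1;
}

/* Hand the accumulated text to the caller and empty the buffer for reuse.
   Returns a malloc'ed copy (caller frees), or NULL if nothing was stored
   or the copy could not be allocated. */
char *
textBufTake(struct TextBuf *buf)
{
	char *copy = NULL;

	assert(buf != NULL);
	if(buf->len > (size_t)0U)
	{
		copy = (char*)malloc(buf->len + (size_t)1U);
		if(copy)
			memcpy(copy, buf->data, buf->len + (size_t)1U);
	}
	buf->len = (size_t)0U;
	return copy;
}
```

Starting from a zero-initialised TextBuf, three appends of "Hel", "lo, "
and "world" followed by textBufTake() give back "Hello, world", and the
same allocation is then reused for the next element.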

PS.: The use of psCtx->m_ppchCurrentElementValue is purely to make this
sample complete; in your particular implementation you should of course
use a method that is appropriate for your problem.
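
As one illustration of putting something else in place of
m_ppchCurrentElementValue, the sketch below records each flushed value in
a small array. Everything in it is an assumption made for the example:
the names (SDemoContext, flushText, demo*) are hypothetical, the buffers
are fixed-size to keep it short, and the handlers are called by hand
rather than by expat, replaying the events for <a>one<b>two</b></a>.

```c
#include <assert.h>
#include <string.h>

#define MAX_VALUES 8
#define MAX_TEXT 64

/* Hypothetical context: one shared text buffer and a list of flushed
   values. Fixed sizes keep the sketch short; real code would grow them. */
struct SDemoContext
{
	char m_achText[MAX_TEXT];
	size_t m_szTextLen;
	char m_aachValues[MAX_VALUES][MAX_TEXT];
	int m_iValueCount;
};

/* Store the pending text (if any) as one value and empty the buffer. */
static void
flushText(struct SDemoContext *psCtx)
{
	if(psCtx->m_szTextLen > 0 && psCtx->m_iValueCount < MAX_VALUES)
	{
		memcpy(psCtx->m_aachValues[psCtx->m_iValueCount],
			psCtx->m_achText, psCtx->m_szTextLen + 1);
		psCtx->m_iValueCount++;
	}
	psCtx->m_szTextLen = 0;
}

void
demoOpenTag(void *pvUserData, const char *pcchElement,
	const char **ppcchAttributes)
{
	(void)pcchElement;
	(void)ppcchAttributes;
	flushText((struct SDemoContext*)pvUserData);
}

void
demoCloseTag(void *pvUserData, const char *pcchElement)
{
	(void)pcchElement;
	flushText((struct SDemoContext*)pvUserData);
}

void
demoTagData(void *pvUserData, const char *pcchData, int iDataLen)
{
	struct SDemoContext *psCtx = (struct SDemoContext*)pvUserData;
	size_t szDataLen = (size_t)iDataLen;

	assert(psCtx != NULL);
	/* Chunks that would overflow the fixed buffer are dropped here. */
	if(iDataLen > 0 && psCtx->m_szTextLen + szDataLen < MAX_TEXT)
	{
		memcpy(psCtx->m_achText + psCtx->m_szTextLen, pcchData, szDataLen);
		psCtx->m_szTextLen += szDataLen;
		psCtx->m_achText[psCtx->m_szTextLen] = '\0';
	}
}

/* Replay the events for <a>one<b>two</b></a>, with "two" split in two. */
void
demoReplay(struct SDemoContext *psCtx)
{
	memset(psCtx, 0, sizeof *psCtx);
	demoOpenTag(psCtx, "a", NULL);
	demoTagData(psCtx, "one", 3);
	demoOpenTag(psCtx, "b", NULL); /* flushes "one" */
	demoTagData(psCtx, "t", 1);
	demoTagData(psCtx, "wo", 2);
	demoCloseTag(psCtx, "b");      /* flushes "two" */
	demoCloseTag(psCtx, "a");      /* nothing pending */
}
```

After demoReplay() the context holds two values, "one" and "two", in that
order: the text before <b> is flushed by the open tag, and the split text
inside <b> is joined before the close tag flushes it.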

Hope this code has done more good than bad ;-).

Regards,
Dirk.

-----Original Message-----
From: expat-discuss-admin@lists.sourceforge.net
[mailto:expat-discuss-admin@lists.sourceforge.net]On Behalf Of
expat-discuss-request@lists.sourceforge.net
Sent: Sunday, August 12, 2001 9:04 PM
To: expat-discuss@lists.sourceforge.net
Subject: Expat-discuss digest, Vol 1 #94 - 1 msg


Message: 1
From: "Vassilii Nemtchinov" <vnemchin@hotmail.com>
To: expat-discuss@lists.sourceforge.net
Date: Sun, 12 Aug 2001 02:40:12 +0000
Subject: [Expat-discuss] Character Data
