[XML-SIG] Handling large amounts of character data

Fredrik Lundh fredrik@pythonware.com
Tue, 9 Apr 2002 00:56:29 +0200


Jean-Francois wrote:

> Here is the problem, it seems that the parser inserts some sort of "newline"
> when lines are really long, which breaks a whole lot of things.

more likely, the parser calls char_data more than once for the same
element, without calling the start tag or end tag handler in between.
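
a quick way to see this is to record every call to the character data
handler. the sketch below assumes the expat bindings in the standard
library (xml.parsers.expat) and uses a made-up <coordinates> document;
how the text gets split is up to the parser, so the number of chunks
may differ on your setup.

```python
import xml.parsers.expat

chunks = []

def char_data(data):
    # record each fragment exactly as the parser hands it over
    chunks.append(data)

parser = xml.parsers.expat.ParserCreate()
parser.buffer_text = False  # the default: no merging of adjacent fragments
parser.CharacterDataHandler = char_data

# hypothetical sample data, not taken from the original document
text = "1,2 3,4\n5,6 7,8\n9,10"
parser.Parse("<coordinates>" + text + "</coordinates>", True)

# one element, but possibly several handler calls
print(len(chunks), "calls:", chunks)
```

however the text is split, joining the fragments gives back the
original character data.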

> Here is my character data handler:
> 
>         def char_data(data):
>                 global nextchardata, xynodes
>                 if ( nextchardata == 'coordinates' ):
>                         print data
>                         data = strip(data)
>                         if ( data != '' ):
>                                 if ( xynodes == None ):
>                                         xynodes = data
>                                 else:
>                                         xynodes = xynodes+' '+data
>                                 nextchardata = None
> 
> if I do a "print data" RIGHT AFTER the "def" line, I see ALL the data dumped
> to screen, with "newlines" appearing after a given size "chunk" it seems.
> 
> I've tried to remove those using the "replace" function, but to no avail.
> I'm not even sure what they are ...

the print statement adds a newline after everything it prints, so each
chunk passed to char_data ends up on a line of its own.

> Here's the kicker:
> 
> If I put that "print data" as the first line within the first "if" statement,
> boum, my data just got stripped, only the FIRST line of those now
> "multiline" data sets is now available!!! I didn't even do anything to it! 

of course you did: the last thing you do in your handler is to set
nextchardata to None, which causes the "coordinates" test to fail
on the next call, so every chunk after the first one is ignored.

try resetting nextchardata in the end-tag handler instead.
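
something like the following, as a rough sketch. your start and end tag
handlers weren't posted, so the ones here are assumptions, as is the
sample document; only char_data follows your code (minus the reset).

```python
import xml.parsers.expat

nextchardata = None
xynodes = None

def start_element(name, attrs):
    # assumed: remember which element we are inside
    global nextchardata
    nextchardata = name

def end_element(name):
    # reset here, not in char_data, so later chunks still pass the test
    global nextchardata
    nextchardata = None

def char_data(data):
    global xynodes
    if nextchardata == "coordinates":
        data = data.strip()
        if data:
            if xynodes is None:
                xynodes = data
            else:
                xynodes = xynodes + " " + data

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.CharacterDataHandler = char_data

# hypothetical sample document
parser.Parse("<shape><coordinates>1,2 3,4\n5,6 7,8</coordinates></shape>", True)
print(xynodes)
```

note that this still glues the chunks together with a space, so a value
that happens to be split in the middle would come out as two values.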

(a more robust approach is to use a list or character buffer to collect
string fragments in the character handler, reset that list in the start
tag handler, and process it in the end tag handler.)
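
a minimal sketch of the buffer approach, again assuming expat and a
<coordinates> element holding whitespace-separated values. the class
and function names (CoordinateCollector, parse_coordinates) are made up
for illustration.

```python
import xml.parsers.expat

class CoordinateCollector:
    def __init__(self):
        self.buffer = None   # None means "not inside <coordinates>"
        self.xynodes = []    # all values collected so far

    def start_element(self, name, attrs):
        if name == "coordinates":
            self.buffer = []  # reset the buffer in the start tag handler

    def char_data(self, data):
        if self.buffer is not None:
            self.buffer.append(data)  # just collect; don't strip or split yet

    def end_element(self, name):
        if name == "coordinates" and self.buffer is not None:
            # process the complete text in the end tag handler
            text = "".join(self.buffer)
            self.xynodes.extend(text.split())
            self.buffer = None

def parse_coordinates(document):
    collector = CoordinateCollector()
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = collector.start_element
    parser.EndElementHandler = collector.end_element
    parser.CharacterDataHandler = collector.char_data
    parser.Parse(document, True)
    return collector.xynodes

# hypothetical sample document
print(parse_coordinates("<shape><coordinates>1,2 3,4\n5,6 7,8</coordinates></shape>"))
```

since the fragments are joined before anything is stripped or split, it
doesn't matter where the parser decides to cut the text.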

</F>