expat having problems with entities (&)

nnguyen nguyenn at gmail.com
Fri Dec 11 17:01:14 EST 2009


On Dec 11, 4:39 pm, Rami Chowdhury <rami.chowdh... at gmail.com> wrote:
> On Fri, Dec 11, 2009 at 13:23, nnguyen <nguy... at gmail.com> wrote:
>
> > Any ideas on any expat tricks I'm missing out on? I'm also inclined to
> > try another parser that can keep the string together when there are
> > entities, or at least ampersands.
>
> IIRC expat explicitly does not guarantee that character data will be
> handed to the CharacterDataHandler in complete blocks. If you're
> certain you want to stay at such a low level, I would just modify your
> char_data method to append character data to self.current_data rather
> than replacing it. Personally, if I had the option (e.g. Python 2.5+)
> I'd use ElementTree...
>

Well, the appending trick worked. From some logging I figured out that
char_data was being called several times, once for each piece of text
around an entity, before the parser reached the subfield's end element
(which is kind of obvious when you think about it). So I switched to +=
and made sure to clear current_data when the parser hits a subfield end
element.
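
For anyone finding this in the archives, here is a rough sketch of the
idea, not my actual code. The class name SubfieldCollector, the helper
parse_subfields, the in_subfield flag and the "subfield" element name
are made up for illustration; only char_data and current_data come from
the thread above.

```python
import xml.parsers.expat


class SubfieldCollector:
    """Hypothetical handler class; collects the text of each <subfield>."""

    def __init__(self):
        self.current_data = ""
        self.subfields = []
        self.in_subfield = False

    def start_element(self, name, attrs):
        if name == "subfield":
            # Start from an empty buffer for every new subfield.
            self.in_subfield = True
            self.current_data = ""

    def char_data(self, data):
        # expat may deliver one text node in several chunks (for
        # example around an entity), so append instead of assigning.
        if self.in_subfield:
            self.current_data += data

    def end_element(self, name):
        if name == "subfield":
            # All chunks have arrived by now: store them and reset.
            self.subfields.append(self.current_data)
            self.current_data = ""
            self.in_subfield = False


def parse_subfields(xml_text):
    """Parse a complete XML string and return the subfield texts."""
    collector = SubfieldCollector()
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = collector.start_element
    parser.EndElementHandler = collector.end_element
    parser.CharacterDataHandler = collector.char_data
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return collector.subfields
```

Calling parse_subfields() on a record whose subfield contains an
escaped ampersand should give back the text in one piece, however many
times expat calls char_data along the way. Setting parser.buffer_text =
True before parsing also asks expat to merge adjacent chunks where it
can, but appending is what I ended up relying on.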

Thanks!


