[issue43483] Loss of content in simple (but oversize) SAX parsing

Larry Trammell report at bugs.python.org
Wed Mar 17 11:58:32 EDT 2021


Larry Trammell <ridgerat at nwi.net> added the comment:

Assuming that my understanding is completely correct, the situation is that the XML parser has unspecified behavior.  This is true in any text content handler, at any time, and applies to the expat parser as well as SAX.  In rare cases, the behavior of the current implementation (and of many past ones) seems inconsistent and can catch users by surprise -- even some who are relatively knowledgeable (which does not include me).

This is a little abstract, but two things could be done to improve the situation:

1. Modify the implementation so that the behavior remains unspecified but falls more in line with plausible expectations of the users.  This makes things a little more complicated for the implementer, but does not invalidate the documentation of present or past versions. 

2. Update the documentation to expose the new constraints on the previously unspecified behavior, giving users a better chance to recognize and prepare for any remaining difficulties.  However, the implementation changes could be made even without these documentation changes.
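
To make the unspecified behavior concrete, here is a minimal sketch of a handler that does not depend on how the parser chooses to split text.  It is an illustration only, not part of any proposed change: the class name TextCollector, the element name "item", and the payload size are all made up for the example.  The handler assumes that characters() may be called any number of times for one run of text, so it accumulates the pieces and joins them only when the element ends.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: gathers the text of each <item> element."""

    def __init__(self):
        super().__init__()
        self.chunks = []   # pieces of text seen for the current element
        self.calls = 0     # how many times characters() was invoked
        self.texts = []    # completed text, one entry per element

    def startElement(self, name, attrs):
        if name == "item":
            self.chunks = []

    def characters(self, content):
        # The parser may deliver one run of text in several calls,
        # so never treat a single call as the whole content.
        self.calls += 1
        self.chunks.append(content)

    def endElement(self, name):
        if name == "item":
            self.texts.append("".join(self.chunks))
            self.chunks = []


# An oversize text payload (size chosen arbitrarily for illustration).
payload = "x" * 200_000
document = ("<item>" + payload + "</item>").encode("utf-8")

collector = TextCollector()
xml.sax.parseString(document, collector)

# The joined text is complete no matter how it was chunked.
print(len(collector.texts[0]), "characters in", collector.calls, "call(s)")
```

A handler that instead stores only the most recent characters() argument can silently keep just the final fragment of an oversize text, which appears to be the kind of surprise described above.  How many calls actually occur depends on the parser version and its buffering, so the sketch makes no assumption about that number.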

So I remain confused about whether this is really a "bug" -- it is an "easy but unfortunate implementation choice" that is technically not wrong, even if sometimes baffling.  Established applications that already use older parser versions are relatively unlikely to start failing given the kind of documents they process, so backport changes might be helpful but do not seem urgent. 

Eric, with this clarification, what is your opinion about how to properly post a new issue -- improvement or bug fix?  I can provide a more detailed technical explanation once a new issue is posted.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue43483>
_______________________________________

