[New-bugs-announce] [issue35729] XML.etree bug

Igor Nowicki report at bugs.python.org
Sun Jan 13 03:17:55 EST 2019


New submission from Igor Nowicki <thesmilingcatofcheshire at gmail.com>:

Suppose we have a big XML file that we can't load into memory all at once. We then use the `iterparse` function from the xml.etree.ElementTree module to parse it element by element.

The problem is that the module doesn't let this run smoothly: it starts outputting wrong data after loading 16 KiB (16*1024 bytes; I found this value by looking into the source code). Elements that have a large number of children are reported as having just a few.

To reproduce the problem, I created this example program. It creates simple XML files of progressively larger size and tracks how many children of the main objects are counted. For small objects we get the actual number, 100 children. For larger and larger sizes we get smaller numbers, going down to just a few.
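
A minimal sketch of this kind of reproduction follows. It is not the attached find_records.py: the helper names (build_xml, child_counts), the tag names, and the padding sizes are all assumptions made for illustration. It counts the children of each <record> element at "start" events and at "end" events, since the ElementTree documentation notes that when a "start" event is emitted an element's children may or may not be present yet. Whether the attached script counted at "start" events is itself an assumption.

```python
import io
import xml.etree.ElementTree as ET

CHILDREN_PER_RECORD = 100  # every <record> really has this many children


def build_xml(num_records, padding):
    """Build an XML document as bytes (hypothetical layout).

    <root> holds num_records <record> elements; each record holds
    CHILDREN_PER_RECORD <child> elements whose text is `padding`
    characters long, so a larger padding gives a larger document.
    """
    child = "<child>" + "x" * padding + "</child>"
    record = "<record>" + child * CHILDREN_PER_RECORD + "</record>"
    return ("<root>" + record * num_records + "</root>").encode("ascii")


def child_counts(xml_bytes, event):
    """Return len(record) for every <record> seen at the given event."""
    counts = []
    for _, elem in ET.iterparse(io.BytesIO(xml_bytes), events=(event,)):
        if elem.tag == "record":
            # Number of children attached to the element at this moment.
            counts.append(len(elem))
    return counts


if __name__ == "__main__":
    for padding in (0, 10, 100, 1000):
        data = build_xml(5, padding)
        at_start = child_counts(data, "start")
        at_end = child_counts(data, "end")
        print(len(data), "bytes; min children at start:", min(at_start),
              "at end:", min(at_end))
```

In this sketch the counts taken at "end" events are always 100, while the counts taken at "start" events can be lower once the document is larger than what has been read so far.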

----------
components: Library (Lib)
files: find_records.py
messages: 333549
nosy: Igor Nowicki
priority: normal
severity: normal
status: open
title: XML.etree bug
type: performance
versions: Python 3.6
Added file: https://bugs.python.org/file48046/find_records.py

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35729>
_______________________________________

