[Tutor] Trying to parse a HUGE(1gb) xml file in python

Alan Gauld alan.gauld at btinternet.com
Tue Dec 21 11:45:22 CET 2010


"David Hutto" <smokefloat at gmail.com> wrote

> That's what I was saying above: XML seems to be the hog because of
> its user-defined tags. Is that somewhat a confirmation of my hunch
> that it's the length of the user's predefined tags that adds to the
> above mess, and that maybe a reduced tag system in accordance with
> XML might be better, or that simple <a> and <b> tags in the XML (other
> files), with an index pointing to a and b, would be better?

Shorter tags reduce the data volume by a bit (and it can be a
big bit if the names are all 20 characters long!), but the inherent tag
structure, even with single-character names, will still often outweigh
the data content.

<i>
5
</i>

Ignoring the line breaks, that is 8 bytes to describe an int which
could be represented in a single byte in binary (or even in CSV).
Even if the int were a 64-bit binary value (8 bytes), the minimal tag
structure (7 bytes for <i></i>) would still take up about as much
space as the data itself. Of course, if the data content is a long
string then simple tags become cost-effective (think <p> in XHTML)...
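
If you want to check the arithmetic yourself, here is a rough sketch
using only the standard library. The helper names (xml_element,
overhead) and the long tag name are made up for illustration, and it
assumes ASCII text with no whitespace between the tags:

```python
import struct


def xml_element(tag, value):
    """Return a one-line XML element as ASCII bytes (no whitespace)."""
    return "<{0}>{1}</{0}>".format(tag, value).encode("ascii")


def overhead(tag, value):
    """Bytes spent on markup: everything except the value's own text."""
    return len(xml_element(tag, value)) - len(str(value))


# Hypothetical long tag name, only to show how name length scales the cost.
LONG_TAG = "customer_order_quantity"

short = xml_element("i", 5)          # b"<i>5</i>"
verbose = xml_element(LONG_TAG, 5)

one_byte = struct.pack("B", 5)       # unsigned 8-bit integer
eight_bytes = struct.pack("<q", 5)   # signed 64-bit integer, little-endian

print("short element:  ", len(short), "bytes, markup:", overhead("i", 5))
print("verbose element:", len(verbose), "bytes, markup:", overhead(LONG_TAG, 5))
print("binary int:     ", len(one_byte), "byte or", len(eight_bytes), "bytes")
```

The markup cost for a simple element is twice the tag name length
plus 5 bytes for the angle brackets and slash, whatever the data is,
which is why it dominates when the values are small.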

HTH,


-- 
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/



