[Tutor] Trying to parse a HUGE(1gb) xml file in python

Tue Dec 21 11:09:18 CET 2010

On Tue, Dec 21, 2010 at 4:58 AM, Alan Gauld <alan.gauld at btinternet.com> wrote:
>
> "David Hutto" <smokefloat at gmail.com> wrote
>
>>> I sympathize with you. I wonder who thought that building a 1GB XML file
>>> was a good thing.
>
>> That was just the first listing:
>>
>>
>> http://www.google.com/search?client=ubuntu&channel=fs&q=parsing+gigabyte+xml+python&ie=utf-8&oe=utf-8
>
> Eeek! One of the listings says:
>
>> 22 Jan 2009 ... Stripping Illegal Characters from XML in Python >>
>
> ... I'd be asking Python to process 6.4 gigabytes of CSV into
> 6.5 gigabytes of XML 1. ..... In fact, what happened was that
> the parsing didn't work and the whole db was ...
>
> And I thought a 1 GB file was extreme... Do these people stop to think that
> with XML as much as 80% of their "data" is just description (i.e. the tags)?
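Alan's 80% is an estimate, and the real proportion can be checked on a sample of your own data. Below is a minimal sketch, using only the standard library's xml.etree.ElementTree, that reports what fraction of a document's characters is markup rather than character data. The helper name markup_fraction and the sample record are hypothetical. It parses the whole string in memory, so it is meant for a small extract, not the full 1 GB file.

```python
import xml.etree.ElementTree as ET


def markup_fraction(xml_text: str) -> float:
    """Hypothetical helper: share of characters in xml_text that are not data.

    "Data" here means element text and tails with surrounding whitespace
    stripped. Everything else counts as markup: tag names, attribute names
    AND values, indentation, and the extra characters of escapes such as
    &amp; (which itertext() returns as a single "&").
    """
    root = ET.fromstring(xml_text)
    data_chars = sum(len(s.strip()) for s in root.itertext())
    return 1 - data_chars / len(xml_text)


# Hypothetical record with long, descriptive tag names.
sample = (
    "<customerRecord>"
    "<customerFirstName>Ann</customerFirstName>"
    "<customerLastName>Lee</customerLastName>"
    "<customerAccountNumber>42</customerAccountNumber>"
    "</customerRecord>"
)

print(f"{markup_fraction(sample):.0%} of the sample is markup")
```

Because attribute values are counted as markup, this overstates the overhead for formats that keep their data in attributes; it also counts characters, not bytes on disk.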

That's what I was saying above: XML seems to be the hog because of its
user-defined tags. Is that some confirmation of my hunch that it's the
length of the user's predefined tags that adds to the mess above? And
would a leaner tag set that still conforms to XML be better, or simple
<a> and <b> tags in the XML (or other files), with an index that says
what a and b stand for?
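The short-tags-plus-index idea can be tried out directly. Below is a minimal sketch with xml.etree.ElementTree that replaces every tag name with a short code (t0, t1, ...) and returns the index needed to turn the codes back into the original names. The helper names shorten_tags and restore_tags, the code scheme, and the sample record are hypothetical; the sketch assumes tags without namespaces and loads the whole document into memory.

```python
import xml.etree.ElementTree as ET


def shorten_tags(xml_text: str) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: rename each distinct tag to a short code.

    Returns the rewritten XML and the index {original_name: code}.
    Assumes no namespaced tags ("{uri}name" would lose its namespace).
    """
    root = ET.fromstring(xml_text)
    index: dict[str, str] = {}
    for elem in root.iter():  # document order, root first
        if elem.tag not in index:
            index[elem.tag] = f"t{len(index)}"
        elem.tag = index[elem.tag]
    return ET.tostring(root, encoding="unicode"), index


def restore_tags(short_xml: str, index: dict[str, str]) -> str:
    """Hypothetical helper: undo shorten_tags using the saved index."""
    reverse = {code: name for name, code in index.items()}
    root = ET.fromstring(short_xml)
    for elem in root.iter():
        elem.tag = reverse[elem.tag]
    return ET.tostring(root, encoding="unicode")


# Hypothetical record with long, descriptive tag names.
sample = (
    "<customerRecord>"
    "<customerFirstName>Ann</customerFirstName>"
    "<customerLastName>Lee</customerLastName>"
    "<customerAccountNumber>42</customerAccountNumber>"
    "</customerRecord>"
)

short_xml, index = shorten_tags(sample)
print(short_xml)  # <t0><t1>Ann</t1><t2>Lee</t2><t3>42</t3></t0>
print(index)
print(len(sample), "->", len(short_xml), "characters")
print(restore_tags(short_xml, index) == sample)  # True
```

All of the saving comes from the tag names, which is the overhead Alan describes. The trade-off is that the file is no longer self-describing: anyone reading it also needs the index. A general-purpose compressor such as gzip typically recovers much of the same space without changing the format, because long tag names repeat many times.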