[Tutor] Trying to parse a HUGE(1gb) xml file in python

Stefan Behnel stefan_ml at behnel.de
Tue Dec 21 10:28:38 CET 2010


Hi,

I wonder why you reply to my e-mail without replying to what I wrote in it.


David Hutto, 21.12.2010 10:12:
> .
>>>>> I sympathize with you. I wonder who thought that building a 1GB XML file
>>>>> was a good thing.

This was written by Steven D'Aprano.


> If it is:
>
> XML stands for eXtensible Markup Language.
>
> XML is designed to transport and store data.
>
>
> Then what other file medium would you suggest as the tagging means.

There are different file formats for structured and semi-structured data. 
XML certainly isn't the only one, and people have been defining specific 
formats for their specific use cases for ages, for better or worse each time.

Personally, I don't think GB-sized XML files are bad per se. It depends on 
the use case, and it depends on what's considered a suitable solution in a 
given environment. Also note that XML tends to compress pretty well, and 
that it's sometimes faster to parse gzipped XML than uncompressed XML, 
because reading less data from disk can outweigh the cost of decompressing 
it. So the serialised file size by itself isn't an argument, either.
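
To illustrate the point about gzipped input, here is a minimal sketch that 
streams a large XML file with the standard library's iterparse(), reading 
either a plain or a gzip-compressed file. The function name count_records 
and the "record" tag are made up for this example and are not taken from the 
original poster's data. Whether the gzipped variant is actually faster 
depends on the machine and the storage, so measure before relying on it.

```python
import gzip
import xml.etree.ElementTree as ET


def count_records(path, tag="record"):
    """Count <tag> elements without loading the whole document into memory.

    'path' may name a plain XML file or a gzip-compressed one (*.gz).
    The tag name "record" is a placeholder for this sketch.
    """
    # gzip.open() returns a file-like object that decompresses on the fly,
    # so the parser never sees the compressed bytes.
    opener = gzip.open if path.endswith(".gz") else open
    count = 0
    with opener(path, "rb") as stream:
        # "end" events fire once an element has been read completely.
        for event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == tag:
                count += 1
                # Drop the element's text and children to keep memory low.
                # The emptied element itself stays attached to its parent.
                elem.clear()
    return count
```

Calling count_records("data.xml.gz") and count_records("data.xml") on the 
same content should return the same number; only the amount of data read 
from disk differs.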


> You have a file with tags, you can't parse and store the data in any
> file anymore than the next, right?
>
> So the tags and how they are marked by any module or file extension
> searcher shouldn't matter, right?

I don't think I can extract the intended meaning from the assembled words 
you use here.

Stefan


