Trying to parse a HUGE (1 GB) XML file

Stefan Behnel stefan_ml at behnel.de
Tue Dec 28 01:08:13 EST 2010


Roy Smith, 28.12.2010 00:21:
> To go back to my earlier example of
>
>          <Parental-Advisory>FALSE</Parental-Advisory>
>
> using 432 bits to store 1 bit of information, stuff like that doesn't
> happen in marked-up text documents.  Most of the file is CDATA (do they
> still use that term in XML, or was that an SGML-ism only?).  The markup
> is a relatively small fraction of the data.  I'm happy to pay a factor
> of 2 or 3 to get structured text that can be machine processed in useful
> ways.  I'm not willing to pay a factor of 432 to get tabular data when
> there's plenty of other much more reasonable ways to encode it.

If the above only appears once in a large document, I don't care how much 
space it takes. If it appears all over the place, it will compress down to 
a couple of bits, so I don't care about the space, either.
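
The compression claim is easy to check. The following is a minimal sketch, not a benchmark: it assumes zlib (DEFLATE) as the compressor and repeats the quoted element back to back, which is the best case for any compressor. The names ELEMENT and compressed_bits_per_occurrence are made up for this illustration. In a real document the element would be interleaved with other data and the numbers would be less dramatic.

```python
import zlib

# The element from the quoted example, one occurrence per line.
# Assumption: ASCII encoding and no indentation.
ELEMENT = b"<Parental-Advisory>FALSE</Parental-Advisory>\n"


def compressed_bits_per_occurrence(repeats: int, level: int = 9) -> float:
    """Average number of compressed bits spent on each repetition."""
    raw = ELEMENT * repeats
    compressed = zlib.compress(raw, level)
    return len(compressed) * 8 / repeats


if __name__ == "__main__":
    # Compare a single occurrence with many repetitions.
    for n in (1, 1000, 100000):
        bits = compressed_bits_per_occurrence(n)
        print(f"{n:>7} occurrences: {bits:.2f} bits each")
```

With a single occurrence there is nothing to exploit, so the cost stays at a few hundred bits. With thousands of repetitions the per-occurrence cost drops well below one byte.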

It's readability that counts here. Try to reverse engineer a binary format 
that stores the above information in 1 bit.

Stefan



