Trying to parse a HUGE (1 GB) XML file

Stefan Sonnenberg-Carstens stefan.sonnenberg at pythonmeister.com
Thu Dec 23 16:34:06 EST 2010


On 23.12.2010 21:27, Nobody wrote:
> On Wed, 22 Dec 2010 23:54:34 +0100, Stefan Sonnenberg-Carstens wrote:
>
>> Normally (what is normal, anyway?) such files are auto-generated
>> and bear an apparent similarity to a database query result,
>> encapsulated in XML.
>> Most of the time, the structure is the same for every "row" in there.
>> So a very unpythonic but fast way would be to let awk reassemble the
>> records and write them in CSV format to stdout.
> awk works well if the input is formatted such that each line is a record;
You shouldn't tell awk that.
> it's not so good otherwise. XML isn't a line-oriented format; in
> particular, there are many places where both newlines and spaces are just
> whitespace. A number of XML generators will "word wrap" the resulting XML
> to make it more human readable, so line-oriented tools aren't a good idea.
I've never had the opportunity to see awk fail at this task :-)
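
For comparison, a streaming XML parser avoids the line-wrapping problem described above while still producing the kind of CSV output mentioned earlier. The sketch below swaps awk for Python's standard-library xml.etree.ElementTree.iterparse. The function name xml_rows_to_csv, the record tag "row", and the flat layout (records are direct children of the root, each holding simple child elements) are assumptions made for illustration; they are not taken from the thread.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_rows_to_csv(source, out, record_tag="row"):
    """Stream <record_tag> elements from `source` and write one CSV line per record.

    Hypothetical helper. Assumes records are direct children of the root element
    and that each record contains simple child elements (tag = column, text = value).
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep it so processed
    # records can be detached, keeping memory bounded on very large files.
    _, root = next(context)
    writer = None
    for event, elem in context:
        # "end" fires only once the whole element has been read, no matter
        # how the generator wrapped it across lines.
        if event != "end" or elem.tag != record_tag:
            continue
        fields = {child.tag: (child.text or "").strip() for child in elem}
        if writer is None:
            # Column order comes from the first record (assumption: all records share it).
            writer = csv.DictWriter(out, fieldnames=list(fields))
            writer.writeheader()
        writer.writerow(fields)
        # Drop finished records from the root so they can be garbage-collected.
        root.clear()


# The second record is "word wrapped", which would trip up a line-oriented tool.
sample = b"""<?xml version="1.0"?>
<rows>
  <row><id>1</id><name>Alice</name></row>
  <row>
    <id>2</id>
    <name>
      Bob
    </name>
  </row>
</rows>"""

buf = io.StringIO()
xml_rows_to_csv(io.BytesIO(sample), buf)
print(buf.getvalue())
```

For a real 1 GB file, `source` would be a filename or a file opened in binary mode and `out` a file opened with `newline=""`, as the csv module recommends. If the records sit deeper than one level below the root, the `root.clear()` call would need to target their actual parent instead.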

With large datasets, I always have serious doubts when someone says "XML".
But I don't want to start a flame war.


