[Tutor] Trying to parse a HUGE(1gb) xml file in python

Stefan Behnel stefan_ml at behnel.de
Tue Dec 21 09:52:22 CET 2010


Chris Fuller, 21.12.2010 03:27:
> This isn't XML, it's an abomination of XML.  Best to not treat it as XML.
> Good thing you're only after one class of tags.  Here's what I'd do.  I'll
> give a general solution, but there are two parameters / four cases that could
> make the code simpler, I'll just point them out at the end.
>
> Iterate over the file descriptor, reading in line-by-line.  This will be slow
> on a huge file, but probably not so bad if you're only doing it once.

Note that this may well be *slower* than using a 
real XML parser:

http://effbot.org/zone/celementtree.htm#benchmarks
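For comparison, here is a minimal sketch of the streaming approach with a real parser, using `iterparse()` from the standard library's `xml.etree.ElementTree`. On Python 3.3+ this module uses the C accelerator automatically. The function name `iter_tag_text` and the tag name `record` are hypothetical placeholders for "the one class of tags" being extracted. The function yields each element's text as soon as its end tag is parsed, then clears the root so memory does not grow with the file size.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Union


def iter_tag_text(source: Union[str, BinaryIO], tag: str) -> Iterator[str]:
    """Yield the text of every <tag> element while streaming the document.

    `tag` is a hypothetical example name; with XML namespaces the tag
    must be given in "{namespace-uri}localname" form.
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a reference
    # to the root so already-processed children can be discarded.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem.text or ""
            # Drop everything parsed so far from the root so the
            # in-memory tree stays small even for a 1 GB input.
            root.clear()


if __name__ == "__main__":
    sample = b"<root><record>a</record><other>x</other><record>b</record></root>"
    for text in iter_tag_text(io.BytesIO(sample), "record"):
        print(text)
```

For a real file, pass the path instead of a `BytesIO`, e.g. `iter_tag_text("huge.xml", "record")`, where the file name is again only illustrative. This only helps if the file is well-formed XML, though. If it really is the "abomination" described above, the parser will raise `ET.ParseError`, and a line-based approach may be the only option.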

Stefan


