[Tutor] Trying to parse a HUGE(1gb) xml file in python

David Hutto smokefloat at gmail.com
Tue Dec 21 10:19:26 CET 2010


On Tue, Dec 21, 2010 at 4:17 AM, David Hutto <smokefloat at gmail.com> wrote:
> On Tue, Dec 21, 2010 at 4:10 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> David Hutto, 21.12.2010 09:55:
>>>
>>> On Tue, Dec 21, 2010 at 3:52 AM, Stefan Behnel wrote:
>>>>
>>>> Chris Fuller, 21.12.2010 03:27:
>>>>>
>>>>> This isn't XML, it's an abomination of XML.  Best to not treat it as XML.
>>>>> Good thing you're only after one class of tags.  Here's what I'd do.  I'll
>>>>> give a general solution, but there are two parameters / four cases that
>>>>> could make the code simpler, I'll just point them out at the end.
>>>>>
>>>>> Iterate over the file descriptor, reading in line-by-line.  This will be
>>>>> slow on a huge file, but probably not so bad if you're only doing it once.
>>>>
>>>> Note that it's not unlikely that this is actually *slower* than using a
>>>> real XML parser:
>>>
>>> Or maybe a 'real' language like C or C++, to speed up or, in Python's
>>> case, bypass the interpreter?
>>
>> While this may be a little faster than Python code (although I suspect that
>> benchmarking is needed to prove either way), I doubt that it's worth the
>> overhead in code writing. If I can write a couple of lines of Python code
>> that are easy to validate and almost as fast as C code, why would I want to
>> write and debug hundreds of lines of code in C or C++, just to see that I
>> need to tune my benchmark to notice the difference?
>
> Don't get me wrong, I love the simplicity too, but if you know you
> really do need it along the way, then you should start thinking past
> the easy code and toward the harder code for your project. Just as
> every language has its place, so does Python.

If I want to write a programming language, it might not be the best
idea to base a language that needs speed on Python. I should maybe
use what it's based on, or refine my own optimizations. That's just
to be a little clearer about my perspective.
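
To make the comparison upthread concrete, here is a minimal sketch of
both approaches. It is not anyone's actual code from this thread. The
tag name "record" and the sample data are made up for illustration,
the line scan assumes each element sits alone on one line, and the
iterparse version only works if the input is well-formed XML, which
the original file may not be.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; "record" is a made-up tag name for illustration.
SAMPLE = (
    "<root>\n"
    "<record>alpha</record>\n"
    "<other>skip me</other>\n"
    "<record>beta</record>\n"
    "</root>\n"
)


def texts_by_lines(lines, tag="record"):
    """Plain line scan; assumes each element sits alone on one line."""
    start, end = "<%s>" % tag, "</%s>" % tag
    for line in lines:
        line = line.strip()
        if line.startswith(start) and line.endswith(end):
            # Slice off the opening and closing tags, keep the text between.
            yield line[len(start):-len(end)]


def texts_by_iterparse(source, tag="record"):
    """Incremental parse; only works if the input is well-formed XML."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem.text
        elem.clear()  # drop the element's content to keep memory use low


if __name__ == "__main__":
    print(list(texts_by_lines(io.StringIO(SAMPLE))))
    print(list(texts_by_iterparse(io.BytesIO(SAMPLE.encode("utf-8")))))
```

Which one is faster on a 1gb file is exactly the kind of thing that
would need measuring, for example with the timeit module on a slice of
the real data.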


>
>
>>
>> But then, people even write XML handling code in Java, where neither
>> performance nor code size is a suitable argument.
>>
>> Stefan



-- 
They're installing the breathalyzer on my email account next week.

