Lisp is the winner in the DOM parsing contest! 8-]

Peter Hansen peter at engcorp.com
Mon Jul 12 09:17:58 EDT 2004


Alex Mizrahi wrote:

> (message (Hello 'Peter)
>  >> I have a 3 MB XML document with about 150,000 lines (I think it has
>  >> about 200,000 elements) which I want to parse into a DOM to work
>  >> with.
> 
>  PH> Often, problems with performance come down to using the wrong
>  PH> algorithm, or using the wrong architecture for the problem at hand.
> 
> I see nothing wrong with loading 3 MB of data into RAM. However,
> implementation details made it 100 times larger, and that was the problem.

What is problematic about that for you?
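For a rough sense of where a "100 times larger" footprint can come from, the sketch below compares peak memory while building a tree with xml.dom.minidom and with xml.etree.ElementTree. It is only an illustration under assumptions: the document is synthetic (make_sample and peak_memory are hypothetical helpers), it relies on the modern standard library (tracemalloc needs Python 3.4 or later), and the actual figures will vary with Python version and platform.

```python
import tracemalloc
import xml.dom.minidom
import xml.etree.ElementTree as ET


def make_sample(n_elements: int) -> bytes:
    """Build a synthetic XML document with n_elements <item> children (hypothetical data)."""
    items = "".join(f'<item id="{i}">value {i}</item>' for i in range(n_elements))
    return f"<root>{items}</root>".encode("utf-8")


def peak_memory(parse, data: bytes) -> int:
    """Return the peak bytes traced by tracemalloc while parse(data) builds its tree."""
    tracemalloc.start()
    try:
        tree = parse(data)  # keep a reference so the tree is alive when the peak is read
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    del tree  # release the parsed tree now that the peak has been recorded
    return peak


if __name__ == "__main__":
    data = make_sample(20_000)
    print(f"input size: {len(data):,} bytes")
    for name, parse in (("minidom", xml.dom.minidom.parseString), ("ElementTree", ET.fromstring)):
        print(f"{name:12} peak: {peak_memory(parse, data):,} bytes")
```

The ratio of peak memory to input size is the number worth looking at; no particular ratio is claimed here.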

>  PH> Are you absolutely certain that using a full in-memory DOM
>  PH> representation is the best for your problem?  It seems very unlikely
>  PH> to me that it really is...
> 
> The format I'm dealing with is quite chaotic, and I'm going to work with it
> interactively: track down for myself where the data I need lies and see
> how I can extract it.

You didn't mention this before.  If you're doing it interactively,
which I assume means with you actually typing lines of code that
will be executed in real-time, as you hit ENTER, then why the heck
are you concerned about the RAM footprint (i.e. there's nothing wrong
with loading 100MB of data into RAM for such a case either) or
even performance (since clearly you are going to spend many times
longer working with the data than it takes to parse it, even with
some of the slower methods)?

Like I said, pick the right architecture for your problem domain...
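If the job turns out to be a batch extraction rather than interactive exploration, a streaming parser avoids holding the whole tree at all. A sketch using ElementTree's iterparse, assuming the records of interest are children of the root element; stream_texts and the tag name rec are hypothetical:

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator


def stream_texts(source, tag: str) -> Iterator[str]:
    """Yield the text of each <tag> element while discarding processed records.

    `source` is a filename or binary file object. Assumes the records are
    children of the root element (hypothetical layout).
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem.text or ""
            root.clear()  # drop finished records so memory stays roughly constant


if __name__ == "__main__":
    data = b"<root><rec>a</rec><rec>b</rec><other/><rec/></root>"
    print(list(stream_texts(io.BytesIO(data), "rec")))  # ['a', 'b', '']
```

Clearing the root after each record is what keeps memory bounded; without it, iterparse still builds the full tree incrementally.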

-Peter
