[Tutor] how best to store and process variable amounts of paired data

Thu Apr 22 15:18:44 EDT 2004

On 22 Apr 2004, Brian van den Broek <bvande at po-box.mcgill.ca> wrote:

> I can think of three main methods for storing and using the extracted data.

> 1) Iterate over the file and build a dictionary as I go, using the
> identified numerical ids as the keys, and the title strings as the
> values. Then work by iteration over the dictionary keys.

> 2) Iterate over the file and build two lists, one of the ids and one of
> the title strings. Then:
>     a) make a dictionary from the lists and work with the dictionary, or
>     b) Just work from the lists themselves, iterating over the indices.

> 3) Parse the file, building tuples (id, string title) as I go and
> putting them in a list. Then iterate over the list, and read each
> tuple value as needed.

> So, since there may be 1000's of (id, title) pairs, I am wanting to
> choose the best method -- best here being defined as some compromise
> between high speed and small memory footprint.

I think method 3 best matches your wishes.  If you build a
dictionary and then want to iterate over its values, a list has to be
built first.  If you use two lists you could zip() them, which gives you
a new list of tuples.  So if you can build the tuples from the start,
why not do it that way?
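
A minimal sketch of method 3 might look like the following.  The file
layout is not given in this thread, so the "<numeric id> <title>" line
format, the name parse_pairs and the sample data are all assumptions
made up for illustration, and the code uses current Python 3 syntax.

```python
# Hypothetical input format: each line is "<numeric id> <title>".
# The real file layout is not shown in this thread.
def parse_pairs(lines):
    """Return a list of (id, title) tuples built from an iterable of lines."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split once on whitespace: the id first, the rest is the title.
        id_text, title = line.split(None, 1)
        pairs.append((int(id_text), title))
    return pairs


# Made-up sample data standing in for the real file.
sample = ["101 First title", "102 Second title", "", "103 Third title"]
for item_id, title in parse_pairs(sample):
    print(item_id, title)
```

Each tuple unpacks directly in the for loop, so no index bookkeeping is
needed.  A line holding an id but no title would raise ValueError here;
real input may need a check for that.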

> Pointers as to which method here listed would be the way to go? Or
> some other way I've overlooked? Or, I am worrying about something that
> doesn't really matter?

Only a test will show whether it really matters (perhaps you will spend
more time thinking about it than your PC needs to process the data).
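
One way to run such a test is the standard timeit module.  The sketch
below is only an illustration: the data is synthetic (5000 invented
pairs), the function names are hypothetical, and it times nothing but
the building step, so the numbers say little about your real file.

```python
import timeit

# Synthetic stand-in data; the real ids and titles are not shown in this thread.
ids = list(range(5000))
titles = ["title %d" % i for i in ids]


def build_dict():
    """Build a dictionary mapping id -> title (methods 1 and 2a)."""
    return dict(zip(ids, titles))


def build_tuple_list():
    """Build a list of (id, title) tuples (method 3)."""
    return list(zip(ids, titles))


for func in (build_dict, build_tuple_list):
    # timeit.timeit calls func 200 times and returns the total seconds.
    seconds = timeit.timeit(func, number=200)
    print("%-18s %.4f s for 200 runs" % (func.__name__, seconds))
```

If both timings turn out to be a small fraction of a second, the choice
can be made on readability instead.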

A fourth method would be to write a generator function, which gives you
an iterator that yields those tuples.  Then you needn't build a list of
tuples to iterate over them, and it's also pretty fast.
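
A generator along those lines could be sketched like this.  Again the
"<numeric id> <title>" line format, the name iter_pairs and the sample
data are assumptions for illustration only, since the real file layout
is not given here.

```python
# Hypothetical input format: each line is "<numeric id> <title>".
def iter_pairs(lines):
    """Yield (id, title) tuples one at a time, without building a list."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split once on whitespace: the id first, the rest is the title.
        id_text, title = line.split(None, 1)
        yield int(id_text), title


# Made-up sample data; an open file object could be passed instead,
# because iterating over a file yields its lines.
sample = ["101 First title", "102 Second title", "103 Third title"]
for item_id, title in iter_pairs(sample):
    print(item_id, title)
```

Only one pair exists in memory at a time.  If you later need the pairs
more than once, list(iter_pairs(...)) or dict(iter_pairs(...)) will
collect them.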

   Karl
-- 
Please do *not* send copies of replies to me.
I read the list