[Tutor] how best to store and process variable amounts of
paired data
Karl Pflästerer
sigurd at 12move.de
Thu Apr 22 15:18:44 EDT 2004
On 22 Apr 2004, Brian van den Broek <- bvande at po-box.mcgill.ca wrote:
> I can think of three main methods for storing and using the extracted data.
> 1) Iterate over the file and build a dictionary as I go, using the
> identified numerical ids as the keys, and the title strings as the
> values. Then work by iteration over the dictionary keys.
> 2) Iterate over the file and build two lists, one of the ids and one of
> the title strings. Then:
> a) make a dictionary from the lists and work with the dictionary, or
> b) Just work from the lists themselves, iterating over the indices.
> 3) Parse the file, building (id, title string) tuples as I go and
> putting them in a list. Then iterate over the list, and read each
> tuple value as needed.
> So, since there may be thousands of (id, title) pairs, I want to
> choose the best method, "best" here meaning some compromise
> between high speed and a small memory footprint.
I think method 3 best matches your needs. If you build a dictionary
and afterwards want to iterate over its values with values(), a new
list has to be built. If you use two lists, you could zip() them, and
you would get a new list of tuples. So if you can build the tuples
from the start, why not do it that way?
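A minimal sketch of method 3, assuming (hypothetically, since the file format isn't described) that each relevant line holds a numeric id, whitespace, and then the title. The helper name `parse_pairs` is made up for illustration. The last lines show that zipping two parallel lists yields the same pairs, just via an extra step.

```python
import io

def parse_pairs(lines):
    """Build a list of (id, title) tuples in a single pass.

    Hypothetical format: a numeric id, whitespace, then the title.
    Lines that do not match are skipped.
    """
    pairs = []
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2 and parts[0].isdigit():
            pairs.append((int(parts[0]), parts[1].strip()))
    return pairs

# io.StringIO stands in for a real file; any iterable of lines works.
sample = io.StringIO("101 First title\n# comment line\n205 Second title\n")
pairs = parse_pairs(sample)

for ident, title in pairs:
    print(ident, title)

# Method 2 keeps two parallel lists; zip() pairs them up again,
# which is an extra step compared with building the tuples directly.
ids = [ident for ident, _ in pairs]
titles = [title for _, title in pairs]
assert list(zip(ids, titles)) == pairs
```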
> Any pointers as to which of the methods listed here is the way to go?
> Or some other way I've overlooked? Or am I worrying about something
> that doesn't really matter?
Only a test will show whether it really matters (perhaps you will spend
more time thinking about it than your PC needs to process the data).
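One way to run such a test is a quick timing with the standard `timeit` module. This sketch uses synthetic data of an assumed size instead of the real file, and the per-item work (summing ids and title lengths) is only a placeholder. Actual timings depend on the machine and Python version, so no results are shown here.

```python
import timeit

# Synthetic data standing in for the parsed file (size is an assumption).
N = 5000
pairs = [(i, "title %d" % i) for i in range(N)]
as_dict = dict(pairs)
ids = [i for i, _ in pairs]
titles = [t for _, t in pairs]

def use_tuples():
    # Method 3: iterate over a list of (id, title) tuples.
    total = 0
    for ident, title in pairs:
        total += ident + len(title)
    return total

def use_dict():
    # Method 1: iterate over the dictionary's items.
    total = 0
    for ident, title in as_dict.items():
        total += ident + len(title)
    return total

def use_two_lists():
    # Method 2b: iterate over indices into two parallel lists.
    total = 0
    for i in range(len(ids)):
        total += ids[i] + len(titles[i])
    return total

# All three must compute the same result before their speed is compared.
assert use_tuples() == use_dict() == use_two_lists()

for func in (use_tuples, use_dict, use_two_lists):
    seconds = timeit.timeit(func, number=100)
    print("%-14s %.4f s" % (func.__name__, seconds))
```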
A fourth method would be to write a generator function, which gives you
an iterator that yields those tuples. That way you needn't build a list
of tuples just to iterate over them, memory use stays small because the
tuples are produced one at a time, and it's also pretty fast.
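A minimal sketch of such a generator, assuming a hypothetical line format (a numeric id, whitespace, then the title), since the original file layout isn't described. The name `iter_pairs` is illustrative only.

```python
import io

def iter_pairs(lines):
    """Lazily yield (id, title) tuples, one per matching line.

    Hypothetical format: a numeric id, whitespace, then the title.
    No list is built; each tuple is produced only when requested.
    """
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2 and parts[0].isdigit():
            yield int(parts[0]), parts[1].strip()

# io.StringIO stands in for an open file; a real file object is also
# an iterator over its lines, so it can be passed in the same way.
source = io.StringIO("7 Alpha\n\n12 Beta\n")
for ident, title in iter_pairs(source):
    print(ident, title)
```

A generator can be consumed only once, so a second pass means calling `iter_pairs()` again on a reopened or rewound file. If several passes are needed, building the list of tuples once may be the better trade-off.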
Karl
--
Please do *not* send copies of replies to me.
I read the list