[Tutor] how best to store and process variable amounts of paired data

Brian van den Broek bvande at po-box.mcgill.ca
Thu Apr 22 13:34:25 EDT 2004


Hi all,

I'm starting a project to write a bunch of functions for parsing the 
datafiles of a particular application I use. Thanks to help from the group 
I now understand how to work with files :-) but I have a question about 
efficient storage of the information I extract.

I want to extract from the files two types of lines that always come in 
pairs where one is a unique numerical id and the other a title string. The 
id numbers are unique, but not necessarily in numerical order. The 
associated strings are not necessarily unique. The lines are separated by 
variable amounts of data, but in each case, I am sure I know how to 
extract only the information of interest. :-)  After extraction I want to 
do something with those pairs. My tasks will leave the pairs unchanged in 
the extracted data I store, so an immutable type is fine. There can be 
anywhere from a small number of pairs to tens of thousands, depending on 
the particular datafile in question.

I can think of three main methods for storing and using the extracted data.

1) Iterate over the file and build a dictionary as I go, using the 
identified numerical ids as the keys, and the title strings as the values. 
Then work by iteration over the dictionary keys.

2) Iterate over the file and build two lists, one of the ids and one of the 
title strings. Then:
    a) make a dictionary from the lists and work with the dictionary, or
    b) Just work from the lists themselves, iterating over the indices.

3) Parse the file, building tuples (id, title string) as I go and putting 
them in a list. Then iterate over the list, and read each tuple value as 
needed.
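
A minimal sketch of the three options, assuming the (id, title) pairs have 
already been extracted from the file. The sample data and the function 
names are made up for illustration; they do not come from the real 
datafiles.

```python
# Hypothetical stand-in for the extracted data: (id, title) pairs in file
# order. Ids are unique but unordered; titles may repeat.
SAMPLE_PAIRS = [(42, "Intro"), (7, "Notes"), (19, "Intro")]


def as_dict(pairs):
    """Method 1: build a dictionary keyed by the unique id."""
    result = {}
    for id_, title in pairs:
        result[id_] = title
    return result


def as_parallel_lists(pairs):
    """Method 2: build two lists that are kept in step with each other."""
    ids, titles = [], []
    for id_, title in pairs:
        ids.append(id_)
        titles.append(title)
    return ids, titles


def as_tuple_list(pairs):
    """Method 3: build a list of (id, title) tuples."""
    return [(id_, title) for id_, title in pairs]


by_id = as_dict(SAMPLE_PAIRS)
ids, titles = as_parallel_lists(SAMPLE_PAIRS)
from_lists = dict(zip(ids, titles))  # method 2a: dictionary from the lists
records = as_tuple_list(SAMPLE_PAIRS)
```

The dictionary gives direct lookup by id, while the list of tuples keeps 
the pairs in the order they appeared in the file.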

So, since there may be thousands of (id, title) pairs, I want to choose 
the best method -- best here being defined as some compromise between high 
speed and small memory footprint.
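
One way to put rough numbers on that compromise is to measure it. The 
sketch below assumes 10,000 made-up pairs and uses the standard library's 
timeit and sys.getsizeof. Note that sys.getsizeof reports only the 
container's own size, not the ids and strings it refers to, so the figures 
are a rough guide at best. All names here are hypothetical.

```python
import sys
import timeit
from functools import partial

N = 10000  # assumed number of pairs, for illustration only
pairs = [(i, "title %d" % i) for i in range(N)]

as_dict = dict(pairs)   # method 1 storage
as_list = list(pairs)   # method 3 storage


def lookup_in_dict(mapping, wanted):
    """Find a title by id with a direct dictionary lookup."""
    return mapping[wanted]


def lookup_in_list(records, wanted):
    """Find a title by id with a linear scan over the tuples."""
    for id_, title in records:
        if id_ == wanted:
            return title
    return None


# Size of each container itself (contents are not included).
dict_bytes = sys.getsizeof(as_dict)
list_bytes = sys.getsizeof(as_list)

# Time repeated lookups of the last id, the worst case for the list scan.
dict_time = timeit.timeit(partial(lookup_in_dict, as_dict, N - 1), number=100)
list_time = timeit.timeit(partial(lookup_in_list, as_list, N - 1), number=100)

print("dict: %d bytes, %.6f s" % (dict_bytes, dict_time))
print("list: %d bytes, %.6f s" % (list_bytes, list_time))
```

If the pairs are only ever processed in one pass from start to finish, the 
lookup timings matter much less than they would for repeated access by id.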

Any pointers as to which of the methods listed here would be the way to 
go? Or is there some other way I've overlooked? Or am I worrying about 
something that doesn't really matter?

Perhaps my problem is, last name notwithstanding, I'm not Dutch:
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
;-)

Thanks and best to all,

Brian vdB




