Data/File Structure and Algorithm for Retrieving Sorted Data Chunk Efficiently

William Park opengeometry at yahoo.ca
Tue Oct 5 15:49:27 EDT 2004


Jane Austine <janeaustine50 at hotmail.com> wrote:
> Hello.
> 
> I have student_names and their scores, which are between 0..100.
> 
> I want to retrieve in the sorted order in terms of score, Nth
> student's name to N+10th student's name along with their scores.
> (suppose a web page showing top 10 students and there are "next page"
> link and if you follow the link there shows next top 10 students with
> "prev" and "next" page links.)
> 
> The list is dynamically resized -- elements are added or deleted. The
> list could go fairly long enough that pulling all the data on memory
> at once is not a good choice.
> 
> What algorithm and data/file structure are most efficient? (the
> environment is "cgi" based web environment)
> 
> Simplest approach would be using btree from bsddb. But there are only
> "first"/"last" and "next"/"prev" moves. If I have to retrieve 20..29th
> students, I have got to "first" and then "next" 19 times before I
> finally get to 20th student, which is a waste of time(and memory).
> Maybe I could keep the cursor position among sessions if I were using
> a standalone server. However, it is in CGI environment.
> 
> What can I do instead? What data structure allows you to access
> nth..n+10th data from the sorted list in a constant time?

This is an example of "premature optimization".  How many student do you
have anyways?  10,000 students with 100 bytes for name and score comes
to 1MB.  Just keep the list as textfile, and read a block at time.



More information about the Python-list mailing list