Data/File Structure and Algorithm for Retrieving Sorted Data Chunk Efficiently

Peter L Hansen peter at engcorp.com
Tue Oct 5 07:47:42 EDT 2004


Jane Austine wrote:
> I have student_names and their scores, which are between 0..100.
> 
> I want to retrieve in the sorted order in terms of score, Nth
> student's name to N+10th student's name along with their scores.
> (suppose a web page showing top 10 students and there are "next page"
> link and if you follow the link there shows next top 10 students with
> "prev" and "next" page links.)
> 
> The list is dynamically resized -- elements are added or deleted. The
> list could go fairly long enough that pulling all the data on memory
> at once is not a good choice.

"Fairly long enough"?  I imagine you'd have to have thousands of
students before just doing the simplest thing (ignore performance
and read the whole list) would become a noticable problem.

It's rare in an environment like the web, with all the user and
network delays that are present, that optimizing at the level
you are trying to do is worth the effort.

> What algorithm and data/file structure are most efficient? (the
> environment is "cgi" based web environment)

And unless mod_python is involved, CGI adds yet another significant
delay, masking any slight difference in performance from the optimal
approach and the simplistic one.

> Simplest approach would be using btree from bsddb. 

Nope, simplest would be just use a list and ignore the theoretical
but unnoticable performance issues.  Remember, the actual memory
allocation and manipulations of the list elements happens in C code,
not in Python, so it is very fast.

> What can I do instead? What data structure allows you to access
> nth..n+10th data from the sorted list in a constant time?

If you've got more than 1000 students, it _might_ be worth
analyzing this further. :-)

-Peter



More information about the Python-list mailing list