- Re: Data/File Structure and Algorithm for Retrieving Sorted Data Chunk Efficiently

Jane Austine janeaustine50 at hotmail.com
Tue Oct 5 08:36:54 EDT 2004


Peter L. Hansen wrote:
> 
> Jane Austine wrote:
> > I have student_names and their scores, which are between 0..100.
> > 
> > I want to retrieve in the sorted order in terms of score, Nth
> > student's name to N+10th student's name along with their scores.
> > (suppose a web page showing top 10 students and there are "next page"
> > link and if you follow the link there shows next top 10 students with
> > "prev" and "next" page links.)
> > 
> > The list is dynamically resized -- elements are added or deleted. The
> > list could go fairly long enough that pulling all the data on memory
> > at once is not a good choice.
> 
> "Fairly long enough"?  I imagine you'd have to have thousands of
> students before just doing the simplest thing (ignore performance
> and read the whole list) would become a noticable problem.

First of all, thank you for replying me.

Yes, I do have more than a thousand.

> 
> It's rare in an environment like the web, with all the user and
> network delays that are present, that optimizing at the level
> you are trying to do is worth the effort.
>

Yes. Especially when there are CPU load limits(process going on like a
few seconds using more than 10% will get warning) on the server, and
it is shared among a few users(using a web hosting service).
 
> > What algorithm and data/file structure are most efficient? (the
> > environment is "cgi" based web environment)
> 
> And unless mod_python is involved, CGI adds yet another significant
> delay, masking any slight difference in performance from the optimal
> approach and the simplistic one.
>

True. So the situation is not very good, and that's why I need a
clever scheme. :)

> > Simplest approach would be using btree from bsddb. 
> 
> Nope, simplest would be just use a list and ignore the theoretical
> but unnoticable performance issues.  Remember, the actual memory
> allocation and manipulations of the list elements happens in C code,
> not in Python, so it is very fast.
> 
> > What can I do instead? What data structure allows you to access
> > nth..n+10th data from the sorted list in a constant time?
> 
> If you've got more than 1000 students, it _might_ be worth
> analyzing this further. :-)
> 

Actually, it's not exactly students. They are some kind of elements
and they have a few attributes(columns) like timestamp and etc, and
can be sorted by a few columns(single key sorting though).

Most of the time I suppose the lists would grow up to 10000 ~ 90000
elements.

-Jane



More information about the Python-list mailing list