- Re: Data/File Structure and Algorithm for Retrieving Sorted Data Chunk Efficiently
Jane Austine
janeaustine50 at hotmail.com
Tue Oct 5 08:36:54 EDT 2004
Peter L. Hansen wrote:
>
> Jane Austine wrote:
> > I have student_names and their scores, which are between 0..100.
> >
> > I want to retrieve in the sorted order in terms of score, Nth
> > student's name to N+10th student's name along with their scores.
> > (suppose a web page showing top 10 students and there are "next page"
> > link and if you follow the link there shows next top 10 students with
> > "prev" and "next" page links.)
> >
> > The list is dynamically resized -- elements are added or deleted. The
> > list could go fairly long enough that pulling all the data on memory
> > at once is not a good choice.
>
> "Fairly long enough"? I imagine you'd have to have thousands of
> students before just doing the simplest thing (ignore performance
> and read the whole list) would become a noticable problem.
First of all, thank you for replying me.
Yes, I do have more than a thousand.
>
> It's rare in an environment like the web, with all the user and
> network delays that are present, that optimizing at the level
> you are trying to do is worth the effort.
>
Yes. Especially when there are CPU load limits(process going on like a
few seconds using more than 10% will get warning) on the server, and
it is shared among a few users(using a web hosting service).
> > What algorithm and data/file structure are most efficient? (the
> > environment is "cgi" based web environment)
>
> And unless mod_python is involved, CGI adds yet another significant
> delay, masking any slight difference in performance from the optimal
> approach and the simplistic one.
>
True. So the situation is not very good, and that's why I need a
clever scheme. :)
> > Simplest approach would be using btree from bsddb.
>
> Nope, simplest would be just use a list and ignore the theoretical
> but unnoticable performance issues. Remember, the actual memory
> allocation and manipulations of the list elements happens in C code,
> not in Python, so it is very fast.
>
> > What can I do instead? What data structure allows you to access
> > nth..n+10th data from the sorted list in a constant time?
>
> If you've got more than 1000 students, it _might_ be worth
> analyzing this further. :-)
>
Actually, it's not exactly students. They are some kind of elements
and they have a few attributes(columns) like timestamp and etc, and
can be sorted by a few columns(single key sorting though).
Most of the time I suppose the lists would grow up to 10000 ~ 90000
elements.
-Jane
More information about the Python-list
mailing list