Data/File Structure and Algorithm for Retrieving Sorted Data Chunk Efficiently

Changjune Kim juneaftn1 at kornet.net
Tue Oct 5 09:53:59 EDT 2004


"Jane Austine" <janeaustine50 at hotmail.com> wrote in message
news:ba1e306f.0410050436.59d455bf at posting.google.com...
> Peter L. Hansen wrote:
> >
> > Jane Austine wrote:
> > > I have student_names and their scores, which are between 0..100.
> > >
> > > I want to retrieve in the sorted order in terms of score, Nth
> > > student's name to N+10th student's name along with their scores.
> > > (suppose a web page showing top 10 students and there are "next page"
> > > link and if you follow the link there shows next top 10 students with
> > > "prev" and "next" page links.)
> > >
> > > The list is dynamically resized -- elements are added or deleted. The
> > > list could go fairly long enough that pulling all the data on memory
> > > at once is not a good choice.
> >
> > "Fairly long enough"?  I imagine you'd have to have thousands of
> > students before just doing the simplest thing (ignore performance
> > and read the whole list) would become a noticable problem.
>
> First of all, thank you for replying me.
>
> Yes, I do have more than a thousand.
>
> >
> > It's rare in an environment like the web, with all the user and
> > network delays that are present, that optimizing at the level
> > you are trying to do is worth the effort.
> >
>
> Yes. Especially when there are CPU load limits(process going on like a
> few seconds using more than 10% will get warning) on the server, and
> it is shared among a few users(using a web hosting service).
>
> > > What algorithm and data/file structure are most efficient? (the
> > > environment is "cgi" based web environment)
> >
> > And unless mod_python is involved, CGI adds yet another significant
> > delay, masking any slight difference in performance from the optimal
> > approach and the simplistic one.
> >
>
> True. So the situation is not very good, and that's why I need a
> clever scheme. :)
>
> > > Simplest approach would be using btree from bsddb.
> >
> > Nope, simplest would be just use a list and ignore the theoretical
> > but unnoticable performance issues.  Remember, the actual memory
> > allocation and manipulations of the list elements happens in C code,
> > not in Python, so it is very fast.
> >
> > > What can I do instead? What data structure allows you to access
> > > nth..n+10th data from the sorted list in a constant time?
> >
> > If you've got more than 1000 students, it _might_ be worth
> > analyzing this further. :-)
> >
>
> Actually, it's not exactly students. They are some kind of elements
> and they have a few attributes(columns) like timestamp and etc, and
> can be sorted by a few columns(single key sorting though).
>
> Most of the time I suppose the lists would grow up to 10000 ~ 90000
> elements.
>
> -Jane

If using standalone RDBMS engine(like MySQL) is not your choice, and you
want to stick to bsddb, take a look at test_basics.py in the lib/bsddb/test
directory. There you will see BTreeRecnoTestCase.






More information about the Python-list mailing list