Re: Data/File Structure and Algorithm for Retrieving Sorted Data Chunk Efficiently
From: Changjune Kim (juneaftn1_at_kornet.net)
Date: Tue, 5 Oct 2004 22:54:15 +0900
"Jane Austine" <email@example.com> wrote in message
> Peter L. Hansen wrote:
> > Jane Austine wrote:
> > > I have student_names and their scores, which are between 0..100.
> > >
> > > I want to retrieve in the sorted order in terms of score, Nth
> > > student's name to N+10th student's name along with their scores.
> > > (suppose a web page showing top 10 students and there are "next page"
> > > link and if you follow the link there shows next top 10 students with
> > > "prev" and "next" page links.)
> > >
> > > The list is dynamically resized -- elements are added or deleted. The
> > > list could go fairly long enough that pulling all the data on memory
> > > at once is not a good choice.
> > "Fairly long enough"? I imagine you'd have to have thousands of
> > students before just doing the simplest thing (ignore performance
> > and read the whole list) would become a noticable problem.
> First of all, thank you for replying me.
> Yes, I do have more than a thousand.
> > It's rare in an environment like the web, with all the user and
> > network delays that are present, that optimizing at the level
> > you are trying to do is worth the effort.
> Yes. Especially when there are CPU load limits(process going on like a
> few seconds using more than 10% will get warning) on the server, and
> it is shared among a few users(using a web hosting service).
> > > What algorithm and data/file structure are most efficient? (the
> > > environment is "cgi" based web environment)
> > And unless mod_python is involved, CGI adds yet another significant
> > delay, masking any slight difference in performance from the optimal
> > approach and the simplistic one.
> True. So the situation is not very good, and that's why I need a
> clever scheme. :)
> > > Simplest approach would be using btree from bsddb.
> > Nope, simplest would be just use a list and ignore the theoretical
> > but unnoticable performance issues. Remember, the actual memory
> > allocation and manipulations of the list elements happens in C code,
> > not in Python, so it is very fast.
> > > What can I do instead? What data structure allows you to access
> > > nth..n+10th data from the sorted list in a constant time?
> > If you've got more than 1000 students, it _might_ be worth
> > analyzing this further. :-)
> Actually, it's not exactly students. They are some kind of elements
> and they have a few attributes(columns) like timestamp and etc, and
> can be sorted by a few columns(single key sorting though).
> Most of the time I suppose the lists would grow up to 10000 ~ 90000
If using standalone RDBMS engine(like MySQL) is not your choice, and you
want to stick to bsddb, take a look at test_basics.py in the lib/bsddb/test
directory. There you will see BTreeRecnoTestCase.