Surprising (for me) benchmark results...

John J. Lee phrxy at csv.warwick.ac.uk
Wed May 2 01:05:38 EDT 2001


On Tue, 1 May 2001, Daniel Berlin wrote:
> On Wed, 2 May 2001, Rainer Deyke wrote:
> > "Courageous" <jkraska1 at san.rr.com> wrote in message
> > news:rrouet0f4un3h3eo8pq6nlk5idgkmhhhca at 4ax.com...
> > > On Wed, 02 May 2001 01:11:50 GMT, "Rainer Deyke" <root at rainerdeyke.com>
> > wrote:
> > > >"Daniel Berlin" <dan at www.cgsoftware.com> wrote in message
> > > >news:mailman.988762570.8591.python-list at python.org...
> > > >> Actually, the sort is probably not the bottleneck.
> > > >> As the file gets larger, the bottleneck becomes disk time, not sort
> > time.
> > > >Disk access is O(N); sorting is typically O(N log N).  Therefore as the
[...]
> > > Disregarding the algorithm and this specific situation in question, you
> > are
> > > both correct; this is because while disk access may scale O(N), this
> > > scaling is multiplied by a huge constant factor that can in many cases
> > > dwarf the sorting algorithm, depending on the number of entries and
> > > the like.

The best answer, of course.  Mind you, I don't know whether the curves
*ever* cross for feasible N.  Does anyone here know, though direct
experience or calculation?  The other thing is that for small changes in
N, disk access doesn't necessarily go as N (caches, that sort of thing?),
and maybe sorting doesn't go exactly as N ln N.

> > I was responding to this: "As the file gets larger, the bottleneck becomes
> > disk time, not sort time."  This is just plain incorrect (assuming larger
> > file = more entries, not longer entries).
> No, it's not.
> You still have to get your damn file off the disk.
> He was timing a sorting program in it's entirety, not the sort.
> Therefore, the disk i/o time matters. And with it taking such a short time
> for each program run, it could possibly matter a *whole lot*.
> > If disk access dominates  on large
> > files, it dominates even more on small files.
> Err, that doesn't make it not dominate on small files.

And if it (disk access) dominates on small files, then it dominates less
on large files, until eventually the sort dominates.  Mm?

To repeat what you said again:

> > > >> As the file gets larger, the bottleneck becomes disk time, not sort
> > time.

(you can't get any more smug and annoying than a good 'Mm?', can you? :)


John




More information about the Python-list mailing list