Surprising (for me) benchmark results...

Daniel Berlin dan at www.cgsoftware.com
Tue May 1 23:07:24 EDT 2001


On Wed, 2 May 2001, Rainer Deyke wrote:

> "Courageous" <jkraska1 at san.rr.com> wrote in message
> news:rrouet0f4un3h3eo8pq6nlk5idgkmhhhca at 4ax.com...
> > On Wed, 02 May 2001 01:11:50 GMT, "Rainer Deyke" <root at rainerdeyke.com>
> wrote:
> >
> > >"Daniel Berlin" <dan at www.cgsoftware.com> wrote in message
> > >news:mailman.988762570.8591.python-list at python.org...
> > >> Actually, the sort is probably not the bottleneck.
> > >> As the file gets larger, the bottleneck becomes disk time, not sort
> time.
> > >
> > >Disk access is O(N); sorting is typically O(N log N).  Therefore as the
> > >number of entries in the file size increases, the time taken by the sort
> > >becomes more significant and the time taken by disk access becomes less
> > >significant.  This is assuming that the file fits into memory (as was the
> > >case here).
> >
> > Disregarding the algorithm and this specific situation in question, you
> are
> > both correct; this is because while disk access may scale O(N), this
> > scaling is multiplied by a huge constant factor that can in many cases
> > dwarf the sorting algorithm, depending on the number of entries and
> > the like.
>
> I was responding to this: "As the file gets larger, the bottleneck becomes
> disk time, not sort time."  This is just plain incorrect (assuming larger
> file = more entries, not longer entries).
No, it's not.
You still have to get your damn file off the disk.
He was timing a sorting program in it's entirety, not the sort.
Therefore, the disk i/o time matters. And with it taking such a short time
for each program run, it could possibly matter a *whole lot*.
> If disk access dominates  on large
> files, it dominates even more on small files.
Err, that doesn't make it not dominate on small files.
--Dan





More information about the Python-list mailing list