Amount of files on a drive?

Steve Holden sholden at holdenweb.com
Thu Feb 7 16:31:36 EST 2002


"Robin Munn" <rmunn at pobox.com> wrote ...
> On 6 Feb 2002 04:57:54 GMT, Philip Swartzleonard <starx at pacbell.net> wrote:
> >G. Willoughby || Tue 05 Feb 2002 07:56:57a:
> >
[file counting code]
> >
> >Well, first determine how long it actually takes (time it), and see how it
> >compares to say, selecting everything on the drive and hitting properties
> >in the context menu to get a count (i.e. see how good windows itself is at
> >it). It took my 800mhz athlon about 35 seconds to tell me that i have 6.15
> >gigs in 101,064 files in 3,930 folders on my C drive. Counting on that
> >scale can get slow.
>
> Also, the bottleneck here is going to be the drive's access time, not
> the processor speed.

True to a degree, although multi-level caching will likely mean that it's
difficult to determine the effects directly.

>     The access time is usually measured in milliseconds
> and is the average time for the drive's read/write head to move from one
> location to another on the disk surface. This operation is sometimes
> called a "seek". Since the drive has to perform at least one seek for
> every file, and usually more, the total time spent waiting for the hard
> drive far outweighs the time spent by the processor.
>
This is a completely incorrect assumption. Even the original code is only
visiting the directories, not the actual files themselves. A directory is
just a special file containing data about the files and directories it
contains.

If the whole directory is on a single cylinder (likely for most directories,
but false if a directory is extended after the cylinder it was created on is
full), at most one seek is required to get the heads into position to read
it. Since the content of the files is not being read, no further seeks are
required per file.
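
To make the point concrete, here is a minimal sketch of counting from directory listings alone, with the timing Philip suggested wrapped around it. It assumes a current Python 3, where os.walk() replaces the old walk() callback API; the names count_entries() and timed_count() are mine, not from the original code. No file is ever opened, so the cost is whatever it takes to read the directories themselves.

```python
import os
import time


def count_entries(top):
    """Count files and folders below top using directory listings only."""
    n_files = 0
    n_dirs = 0
    # os.walk yields one (path, subdirectory names, file names) tuple per
    # directory visited; file contents are never read.
    for dirpath, dirnames, filenames in os.walk(top):
        n_dirs += len(dirnames)
        n_files += len(filenames)
    return n_files, n_dirs


def timed_count(top):
    """Return (files, folders, elapsed seconds) for a walk of top."""
    start = time.perf_counter()
    n_files, n_dirs = count_entries(top)
    elapsed = time.perf_counter() - start
    return n_files, n_dirs, elapsed
```

Run timed_count() twice on the same tree and compare: a second run that is much faster than the first suggests caching at work, which is exactly why the raw disk effects are hard to measure directly.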

> On the other hand, I do see one thing you could do to improve your code.
> Change the countFiles() function thus:
>
> ------------------------------------------------------------------
> def countFiles(arg, dir, files):
>     count = count + len(files)
> ------------------------------------------------------------------
>
That's certainly a good suggestion, although disk accesses are almost
certain to dominate timings to the extent that it will make hardly any
difference <0.8 wink>. Also, as you point out below, the original code
couldn't possibly work as shown anyhow, so we're seeing only the flavor of
the real program.

> Also, I presume you're using a global variable to hold your count,
> unlike the trimmed-down example code you showed us. In the code you
> showed us above, the count variable would be a variable local to
> countFiles() and would not hold its value across separate invocations of
> countFiles(). Instead, you would do:
>
> ------------------------------------------------------------------
> count = 0
>
> def countFiles(arg, dir, files):
>     global count
>     count = count + len(files)
> ------------------------------------------------------------------
>
This is good advice. Of course, the only reason the original was cast the
way it was is so that the OP could use the walk() function - countFiles() is
the callback for walk().
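
For what it's worth, the callback signature also leaves room to do without the global: walk() hands its third argument straight through to the callback, so a mutable object can carry the count. The sketch below is an illustration only. It assumes Python 3, where os.path.walk() no longer exists, so walk() here is a hypothetical stand-in built on os.walk(), and count_names() is my own name. Note that the old API passed the callback every name in a directory, subdirectories included, so len(files) counts folders as well.

```python
import os


def walk(top, func, arg):
    """Hypothetical stand-in for the old os.path.walk(top, func, arg)."""
    for dirpath, dirnames, filenames in os.walk(top):
        # The old API passed one list holding every name in the directory,
        # subdirectories as well as files.
        func(arg, dirpath, dirnames + filenames)


def countFiles(arg, dir, files):
    # arg is a one-element list acting as a mutable counter, so no
    # global statement is needed.
    arg[0] = arg[0] + len(files)


def count_names(top):
    """Return the number of directory entries found below top."""
    total = [0]
    walk(top, countFiles, total)
    return total[0]
```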

> The "global count" statement inside countFiles() specifies that from
> here on, the name count will refer not to a variable local to
> countFiles() but instead to the global variable of that name.
>
> Gee, this is reminding me of that student blooper (probably an urban
> legend) that went, "The Iliad was not written by Homer, but by another
> man of that name." :)

someone-also-called-steve-ly y'rs  - steve
--
Consulting, training, speaking: http://www.holdenweb.com/
Author, Python Web Programming: http://pydish.holdenweb.com/pwp/

"This is Python.  We don't care much about theory, except where it
intersects with useful practice."  Aahz Maruch on c.l.py






