[Python-3000] Removal of os.path.walk

Guido van Rossum guido at python.org
Thu May 1 01:02:31 CEST 2008


There is one use case I can see for an iterator-version of
os.listdir() (to be named os.opendir()): when globbing a huge
directory looking for a certain pattern. Using os.listdir() you end up
needing enough memory to hold all of the names at once. Using
os.opendir() you would need only enough memory to hold all of the
names THAT MATCH.
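
A rough sketch of that use case follows. It assumes os.scandir() (which
only arrived in Python 3.5) as a stand-in for the proposed os.opendir();
the helper names iter_matching() and list_matching() are made up for
illustration and are not part of any API.

```python
import fnmatch
import os


def iter_matching(path, pattern):
    """Yield names in `path` that match `pattern`, one entry at a time."""
    # Assumption: os.scandir() plays the role of the proposed os.opendir().
    # It hands back directory entries lazily instead of building a full list.
    with os.scandir(path) as entries:
        for entry in entries:
            if fnmatch.fnmatch(entry.name, pattern):
                yield entry.name


def list_matching(path, pattern):
    """Same result via os.listdir(), which holds every name in memory first."""
    return [name for name in os.listdir(path)
            if fnmatch.fnmatch(name, pattern)]
```

With the generator, only the matching names ever need to be kept by the
caller, e.g. matches = list(iter_matching(".", "*.log")). Directory order
is arbitrary in both versions, so sort the results before comparing them.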

On Wed, Apr 30, 2008 at 3:50 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > There's a big difference between "not enough memory" and "directory
>  > consumes lots of memory".  My company has some directories with several
>  > hundred thousand entries, so using an iterator would be appreciated
>  > (although by the time we upgrade to Python 3.x, we probably will have
>  > fixed that architecture).
>  >
>  > But even then, we're talking tens of megabytes at worst, so it's not a
>  > killer -- just painful.
>
>  But what kind of operation do you want to perform on that directory?
>
>  I would expect that usually, you either
>
>  a) refer to a single file, which you are either going to create, or
>    want to process. In that case, you know the name in advance, so
>    you open/stat/mkdir/unlink/rmdir the file, without caring how
>    many files exist in the directory,
>  or
>
>  b) need to process all files, to count/sum/backup/remove them;
>    in this case, you will need the entire list in the process,
>    and reading them one-by-one is likely going to slow down
>    the entire operation, instead of speeding it up.
>
>  So in neither case do you actually need to read the entries incrementally.
>
>  That the C APIs provide chunk-wise processing is just because
>  dynamic memory management is so painful to write in C that the
>  caller is just asked to pass a limited-size output buffer, which then
>  gets refilled in subsequent read calls. Originally, the APIs would
>  return a single entry at a time from the file system, which was
>  super-slow. Today, SysV all-singing all-dancing getdents provides
>  multiple entries at a time, for performance reasons.
>
>  Regards,
>  Martin



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

