os.walk() and threads problems (is os.walk thread safe?)

Chris Mellon arkanes at gmail.com
Tue Nov 13 14:46:45 EST 2007


On Nov 13, 2007 1:06 PM, Marcus Alves Grando <marcus at sbh.eng.br> wrote:
> Diez B. Roggisch wrote:
> > Marcus Alves Grando wrote:
> >
> >> Diez B. Roggisch wrote:
> >>> Marcus Alves Grando wrote:
> >>>
> >>>> Hello list,
> >>>>
> >>>> I have a strange problem with os.walk and threads in a Python script. I
> >>>> have one script that creates some threads and consumes a Queue. For every
> >>>> value in the Queue, this script runs os.walk() and prints the root dir. But if I
> >>>> increase the number of threads, the results are inconsistent compared with one
> >>>> thread.
> >>>>
> >>>> For example, run this code plus sort with one thread, then run it again
> >>>> with ten threads and see diff(1).
> >>> I don't see any difference. I ran it with 1 and 10 workers + sorted the
> >>> output. No diff whatsoever.
> >> Did you test in a dir with many subdirs? Like /usr or /usr/ports (in
> >> FreeBSD), for example?
> >
> > Yes, over 1000 subdirs/files.
>
> Strange, because for me it occurs every time.
>
> >
> >>> And I don't know what you mean by diff(1) - was that supposed to be some
> >>> output?
> >> No. One thread produces one result and ten threads produce another result
> >> with fewer lines.
> >>
> >> See the example below:
> >>
> >> @@ -13774,8 +13782,6 @@
> >>   /usr/compat/linux/proc/44
> >>   /usr/compat/linux/proc/45
> >>   /usr/compat/linux/proc/45318
> >> -/usr/compat/linux/proc/45484
> >> -/usr/compat/linux/proc/45532
> >>   /usr/compat/linux/proc/45857
> >>   /usr/compat/linux/proc/45903
> >>   /usr/compat/linux/proc/46
> >
> > I'm not sure what that directory is, but to me that looks like the
> > Linux /proc dir, containing process ids. Which incidentally changes
> > between the two runs, as more threads will have process id aliases.
>
> My example was not good enough. I ran this script on the ports directory of
> FreeBSD and on IMAP folders on my Linux server; same thing.
>
> @@ -182,7 +220,6 @@
>   /usr/ports/archivers/p5-POE-Filter-Bzip2
>   /usr/ports/archivers/p5-POE-Filter-LZF
>   /usr/ports/archivers/p5-POE-Filter-LZO
> -/usr/ports/archivers/p5-POE-Filter-LZW
>   /usr/ports/archivers/p5-POE-Filter-Zlib
>   /usr/ports/archivers/p5-PerlIO-gzip
>   /usr/ports/archivers/p5-PerlIO-via-Bzip2
> @@ -234,7 +271,6 @@
>   /usr/ports/archivers/star-devel
>   /usr/ports/archivers/star-devel/files
>   /usr/ports/archivers/star/files
> -/usr/ports/archivers/stuffit
>   /usr/ports/archivers/szip
>   /usr/ports/archivers/tardy
>   /usr/ports/archivers/tardy/files
>
>

Are you just diffing the output? There's no guarantee that
os.walk() will always return entries in the same order, or that your different
worker threads will produce the same output in the same order. On my
system, for example, I get a different order of subdirectory output
when I run with 10 threads than with 1.
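
One way to take ordering out of the picture is to collect the roots into a
set and compare the sets rather than diffing text. The following is only a
rough sketch, not the original poster's script: it assumes Python 3 (the
`queue` module), and the names `walk_roots` and `build_demo_tree` are made up
for illustration. It walks a small temporary tree instead of /usr/ports.

```python
import os
import queue
import tempfile
import threading


def walk_roots(top_dirs, num_threads):
    """Walk every directory in top_dirs with num_threads workers.

    Returns the set of root directories reported by os.walk(), so the
    result does not depend on the order in which threads finish.
    """
    work = queue.Queue()
    for d in top_dirs:
        work.put(d)

    seen = set()
    lock = threading.Lock()

    def worker():
        while True:
            try:
                top = work.get_nowait()
            except queue.Empty:
                return  # no more work for this thread
            roots = [root for root, dirs, files in os.walk(top)]
            with lock:  # guard the shared set
                seen.update(roots)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


def build_demo_tree():
    """Create a small throwaway tree: 5 top-level dirs with 4 subdirs each."""
    base = tempfile.mkdtemp()
    for i in range(5):
        for j in range(4):
            os.makedirs(os.path.join(base, "top%d" % i, "sub%d" % j))
    return base


if __name__ == "__main__":
    base = build_demo_tree()
    tops = [os.path.join(base, name) for name in sorted(os.listdir(base))]
    one = walk_roots(tops, 1)
    ten = walk_roots(tops, 10)
    print("missing with 10 threads:", sorted(one - ten))
    print("extra with 10 threads:", sorted(ten - one))
```

If both differences come back empty on a tree that is not changing, the
mismatch in the diff is more likely an ordering or output-interleaving
issue than directories actually being skipped.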

walk() requires that stat() works for the next directory that will be
walked. It might be remotely possible that stat() is failing for some
reason and some directories are being lost (this is probably not going
to be reproducible). If you can reproduce it, try using pdb to see
what's going on inside walk().
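
Before reaching for pdb, it may be worth noting that os.walk() by default
silently ignores errors it hits while listing a directory, and that it
accepts an `onerror` callback which receives the OSError instance. A minimal
sketch, assuming Python 3 (the helper name `walk_with_errors` is
hypothetical), that records those errors instead of dropping them:

```python
import os
import tempfile


def walk_with_errors(top):
    """Walk top and return (roots, errors).

    errors holds the OSError instances that os.walk() would otherwise
    swallow when it fails to list a directory.
    """
    roots = []
    errors = []
    # onerror is called with the OSError; here we simply collect it.
    for root, dirs, files in os.walk(top, onerror=errors.append):
        roots.append(root)
    return roots, errors


if __name__ == "__main__":
    # A path that does not exist, to force a listing error.
    missing = os.path.join(tempfile.mkdtemp(), "does-not-exist")
    roots, errors = walk_with_errors(missing)
    print("roots:", roots)
    for err in errors:
        print("walk error:", err)
```

If the error list stays empty while directories still go missing, the
problem probably lies elsewhere, for example in how the threads write their
output.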


