[issue32453] shutil.rmtree can have O(n^2) performance on large dirs

Serhiy Storchaka report at bugs.python.org
Tue Jan 2 03:22:14 EST 2018


Serhiy Storchaka <storchaka+cpython at gmail.com> added the comment:

Thanks for your investigations Niklas. I ran my benchmark on a spinning disk, but significant nonlinearity is observed only for the size of directory around 1000000 and more. And yes, sorting by inode number helps.

$ for j in 1 2 3 4 6 8 12 16; do i=$((j*100000)); mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" python3.7-unpatched -c 'import shutil; shutil.rmtree("x")'; done
1.01 100000
3.80 200000
3.64 300000
4.89 400000
8.72 600000
11.86 800000
56.80 1200000
209.82 1600000

$ for j in 1 2 3 4 6 8 12 16; do i=$((j*100000)); mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" python3.7-patched -c 'import shutil; shutil.rmtree("x")'; done
0.97 100000
2.42 200000
3.84 300000
4.48 400000
7.07 600000
10.01 800000
15.53 1200000
23.24 1600000

$ for j in 1 2 3 4 6 8 12 16; do i=$((j*100000)); mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" env rm -rf x; done
0.68 100000
1.34 200000
2.10 300000
3.95 400000
5.95 600000
10.28 800000
47.66 1200000
89.32 1600000

On an SSD:

$ for j in 1 2 3 4 6 8 12 16; do i=$((j*100000)); mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" python3.7-unpatched -c 'import shutil; shutil.rmtree("x")'; done
1.00 100000
1.93 200000
2.90 300000
4.98 400000
7.05 600000
9.87 800000
21.45 1200000
36.19 1600000

$ for j in 1 2 3 4 6 8 12 16; do i=$((j*100000)); mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" python3.7-patched -c 'import shutil; shutil.rmtree("x")'; done
0.96 100000
1.99 200000
3.09 300000
4.85 400000
7.55 600000
9.44 800000
16.03 1200000
21.28 1600000

$ for j in 1 2 3 4 6 8 12 16; do i=$((j*100000)); mkdir x; (cd x && seq $i|xargs touch); env time -f "%e $i" env rm -rf x; done
0.67 100000
1.38 200000
2.41 300000
2.82 400000
5.24 600000
7.02 800000
18.60 1200000
30.58 1600000

Interesting that patched Python is faster than 'rm' (GNU coreutils 8.26) for large numbers.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue32453>
_______________________________________


More information about the Python-bugs-list mailing list