mergeall 2.2, with os.scandir() speed optimization

Mark Lutz lutz at rmi.net
Fri Sep 25 02:18:04 CEST 2015


There's a new version of the mergeall folder tree synchronization tool, which
uses Python 3.5's os.scandir(), if available, to radically speed up its tree
comparison phase.  In testing on Windows 7 and 10, the new call speeds up
mergeall comparisons by a factor of 5 to 10, depending on the devices used.
This is due entirely to the elimination of system calls that os.scandir()
affords.
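
As a rough illustration of where the saved system calls come from, here is a
minimal sketch (not mergeall's actual code; the function names are made up for
this example, and os.scandir() requires Python 3.5 or later).  The listdir()
version asks the operating system about each file separately, while the
scandir() version reuses information the directory scan already returned:

```python
import os

def sizes_listdir(dirpath):
    """Map file name -> size using os.listdir() plus per-file queries."""
    result = {}
    for name in os.listdir(dirpath):
        path = os.path.join(dirpath, name)
        if os.path.isfile(path):                    # one stat call per name
            result[name] = os.path.getsize(path)    # and another per file
    return result

def sizes_scandir(dirpath):
    """Map file name -> size using os.scandir() directory entries."""
    result = {}
    for entry in os.scandir(dirpath):
        # is_file() can usually be answered from the directory scan itself;
        # on Windows, entry.stat() is also served from the scan's cached
        # data for non-symlink entries, so no extra system call is needed.
        if entry.is_file():
            result[entry.name] = entry.stat().st_size
    return result
```

Both functions should return the same dictionary for the same folder; only
the number of operating system queries differs.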

The savings are especially significant for large archives.  For a 78 GB target
use case of 50k files in 3k folders, comparison runtime fell from 40 to 7
seconds on a fast USB stick (about 6x); from 112 to 16 seconds on a slower
stick (7x); and from 600 to 60 seconds on an ancient single-core machine (10x).

Also note that the scandir() call is standard in the os module as of 3.5, but
is also available for older Python releases, including 2.7 and older 3.X, via
a PyPI package.  mergeall uses either form if present, and falls back on the
original os.listdir() scheme as a last resort to continue supporting older
Pythons (though using scandir() is now strongly recommended, for obvious
reasons!).
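
A fallback of this kind might be sketched as follows.  This is an assumption
about the general shape, not mergeall's actual code: the PyPI package is
assumed here to be the one named "scandir", and list_entries() is a
hypothetical helper invented for this example.

```python
import os

try:
    from os import scandir              # standard in Python 3.5 and later
except ImportError:
    try:
        from scandir import scandir     # assumed PyPI backport for older Pythons
    except ImportError:
        scandir = None                  # last resort: os.listdir() below

def list_entries(dirpath):
    """Return a sorted list of (name, is_dir) pairs for dirpath."""
    if scandir is not None:
        # Fast path: entry type comes from the directory scan when possible.
        return sorted((entry.name, entry.is_dir()) for entry in scandir(dirpath))
    # Slow path: one extra query per name via os.path.isdir().
    return sorted((name, os.path.isdir(os.path.join(dirpath, name)))
                  for name in os.listdir(dirpath))
```

Callers use list_entries() the same way regardless of which variant was
found at import time, so older Pythons keep working, only more slowly.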

All of which seems proof that language improvement and backward compatibility 
are not necessarily mutually exclusive.  The details:

2.2 changes:
    http://learning-python.com/mergeall/docs/Usage-Overview.html#optimizations

Main README: 
    http://learning-python.com/mergeall/Readme.html

Usage guide:
    http://learning-python.com/mergeall/docs/Usage-Overview.html

GUI screenshot:
    http://learning-python.com/mergeall/examples/Screenshots/main-quit-help.png

Download the package:
    http://learning-python.com/downloads/mergeall.zip

Cheers,
--M. Lutz (http://www.rmi.net/~lutz | http://learning-python.com)

