bz2 & cpu usage

Kirk Job-Sluder kirk-news at jobsluder.net
Tue Oct 19 17:23:13 EDT 2004


Sorry for the late post, the original scrolled off the server.

 > I'd like to keep at least 50% of the cpu free while doing bz2 file
 > compression. Currently, bz2 compression takes between 80 & 100 percent
 > of the cpu and the Windows GUI becomes almost useless. How can I lower
 > the strain on the cpu and still do compression? I'm willing for the
 > compression process to take longer.
 >
 > Thanks,
 >
 > Brad
 >
 > def compress_file(filename):
 >     path = r"C:\repository_backup"
 >     print path
 >     for root, dirs, files in os.walk(path):
 >         for f in files:
 >             if f == filename:
 >                 print "Compressing", f
 >                 x = file(os.path.join(root, f), 'rb')
 >                 os.chdir(path)
 >                 y = bz2.BZ2File(f + ".bz2", 'w')
 >                 while True:
 >                     data = x.read(1024000)
 >                     time.sleep(0.1)
 >                     if not data:
 >                         break
 >                     y.write(data)
 >                     time.sleep(0.1)
 >                 y.close()
 >                 x.close()
 >                 return

One of the issues you may be running into is memory.  Under Windows, 
using up 90% of the CPU shouldn't affect GUI performance (much), but 
swapping does.  Use the system monitor control panel to check for 
excessive swapping.  According to the bzip2 man page, the maximum 
block size is 900KB, so you might be running into problems reading 
your file 1024KB at a time.  For compression, bzip2 needs roughly 8x 
the block size in memory, so with the default 900KB block size you 
are looking at about 7.2MB plus some bookkeeping memory.
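If memory turns out to be the problem, the bz2 module's compresslevel 
argument maps level N to an N x 100KB block, so level 1 caps the working 
set at under 1MB.  A sketch (the function name, file names, and chunk 
size here are just for illustration):

```python
import bz2

def compress_file_lowmem(src_path, dst_path, chunk_size=100 * 1024):
    """Compress src_path to dst_path with bzip2's smallest block size.

    compresslevel=1 means a 100KB block, so compression needs roughly
    8 x 100KB of working memory instead of 8 x 900KB at the default
    level 9.  Reading in 100KB chunks also keeps the read buffer small.
    """
    with open(src_path, "rb") as src, \
         bz2.BZ2File(dst_path, "wb", compresslevel=1) as dst:
        while True:
            data = src.read(chunk_size)
            if not data:
                break
            dst.write(data)
```

You pay for the smaller block with a somewhat worse compression ratio, 
but for a backup job that runs in the background that is usually a fair 
trade.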

Another issue is that you might be better off downloading bzip2 for 
Windows and letting the standalone bzip2 implementation handle file 
input and output.  Shelling out to it might be more efficient in spite 
of the cost of spawning a new process.
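If you go that route, something like this would do it (a sketch: it 
assumes the bzip2 binary is on the PATH and uses only the standard 
-1..-9 block-size flags and -k to keep the input file):

```python
import shutil
import subprocess

def bzip2_command(path, level=9, keep=True):
    """Build an argv list for the external bzip2 tool."""
    cmd = ["bzip2", "-%d" % level]
    if keep:
        cmd.append("-k")  # keep the original file instead of deleting it
    cmd.append(path)
    return cmd

def compress_external(path):
    """Spawn bzip2 on `path` if the binary is installed.

    Returns True on success, False if bzip2 is not available (in which
    case you would fall back to the bz2 module).
    """
    if shutil.which("bzip2") is None:
        return False
    return subprocess.call(bzip2_command(path)) == 0
```

On Windows you would point the command at wherever you unpacked 
bzip2.exe rather than relying on the PATH.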

A third issue is that bzip2 achieves its high compression ratio at the 
expense of CPU time and memory.  It might be worth considering whether 
gzip occupies the sweet spot between small archive size and low CPU 
usage.
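A quick way to check whether gzip is good enough for your data, using 
the stdlib gzip and bz2 modules on an in-memory sample (the helper name 
is just for illustration):

```python
import bz2
import gzip

def compare_sizes(data):
    """Return (gzip_size, bz2_size) for the same input bytes.

    gzip (deflate) is much lighter on CPU and memory; bz2 usually
    compresses tighter but works harder for it.
    """
    return (len(gzip.compress(data)), len(bz2.compress(data)))
```

Run it over a representative chunk of your repository backup; if the 
two sizes are close, gzip wins on CPU.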

Fourth, how many of those files are incompressible?  I've noticed that 
bzip2 tries very hard to eke out some savings even from incompressible 
files.  A filename filter for files that should not be compressed (png, 
jpg, gif, sx*) might be worth adding here.
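Such a filter might look like this (the pattern list is only a guess at 
which formats in your backup are already compressed; extend it to 
taste):

```python
import fnmatch

# Formats that are already compressed; recompressing them wastes CPU
# for little or no gain.  sx* covers the zip-based OpenOffice files.
SKIP_PATTERNS = ["*.png", "*.jpg", "*.gif", "*.sx*", "*.bz2", "*.gz", "*.zip"]

def should_compress(filename):
    """Return False for files whose formats are already compressed."""
    name = filename.lower()
    return not any(fnmatch.fnmatch(name, pat) for pat in SKIP_PATTERNS)
```

Calling should_compress(f) before opening the file in your os.walk loop 
would skip the hopeless cases up front.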


