bz2 & cpu usage

Brad Tilley bradtilley at gmail.com
Wed Oct 20 11:45:53 EDT 2004


Kirk Job-Sluder wrote:
> Sorry for the late post, the original scrolled off the server.
> 
>  > I'd like to keep at least 50% of the cpu free while doing bz2 file
>  > compression. Currently, bz2 compression takes between 80 & 100 percent
>  > of the cpu and the Windows GUI becomes almost useless. How can I lower
>  > the strain on the cpu and still do compression? I'm willing for the
>  > compression process to take longer.
>  >
>  > Thanks,
>  >
>  > Brad
>  >
>  > import bz2
>  > import os
>  > import time
>  >
>  > def compress_file(filename):
>  >     path = r"C:\repository_backup"
>  >     print(path)
>  >     for root, dirs, files in os.walk(path):
>  >         for f in files:
>  >             if f != filename:
>  >                 continue  # keep scanning instead of returning on the first mismatch
>  >             print("Compressing", f)
>  >             # The archive goes into the top-level folder, as before, without os.chdir().
>  >             target = os.path.join(path, f + ".bz2")
>  >             with open(os.path.join(root, f), 'rb') as x, bz2.BZ2File(target, 'w') as y:
>  >                 while True:
>  >                     data = x.read(1024000)
>  >                     if not data:
>  >                         break
>  >                     y.write(data)
>  >                     time.sleep(0.1)  # give the CPU a short rest between chunks
> 
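
The fixed 0.1 second sleeps above do little when each 1 MB chunk takes far longer than that to compress. A rough sketch of one alternative is to sleep in proportion to the time each chunk took, so the process is busy for only about a chosen fraction of wall-clock time. The name throttled_bz2 and the duty and chunk_size parameters are made up for illustration, and wall-clock time is only an approximation of CPU time.

```python
import bz2
import time


def throttled_bz2(src_path, dst_path, duty=0.5, chunk_size=256 * 1024):
    """Compress src_path to dst_path, aiming to be busy for roughly
    `duty` of wall-clock time (hypothetical helper, not a stdlib API)."""
    if not 0 < duty <= 1:
        raise ValueError("duty must be in (0, 1]")
    with open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        while True:
            started = time.perf_counter()
            data = src.read(chunk_size)
            if not data:
                break
            dst.write(data)
            busy = time.perf_counter() - started
            # Idle long enough that busy / (busy + idle) is about `duty`.
            time.sleep(busy * (1 - duty) / duty)
```

With duty=0.5 the job should take roughly twice as long and leave about half of one core idle. Smaller chunks make the pauses shorter and more frequent, which tends to feel smoother in a GUI.
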
> One of the issues you may be running into is memory.  Under Windows,
> using up 90% of the CPU shouldn't affect GUI performance (much) but
> swapping does.  According to the bzip2 man page, the maximum block size
> is 900KB so you might be running into problems reading your file 1024KB
> at a time.  Use the system monitor control panel to check for excessive
> swapping.  Bzip2 uses 8x<blocksize> memory.  So with the default setting
> of a 900KB block size, you are looking at 7.2M + some bookkeeping memory.
> 
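
To put numbers on the memory estimate: the bzip2 man page gives the compression footprint as roughly 400k + (8 x block size), with the block size being 100k times the compression level. The sketch below just does that arithmetic. The function name is made up for illustration, and the figures are the man page's approximations, not measurements of Python's bz2 module.

```python
def bzip2_compress_memory_kb(level):
    """Approximate bzip2 compression memory in KB for levels 1..9,
    using the man page's 400k + 8 x block size rule of thumb."""
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9")
    block_kb = 100 * level          # level 9 means a 900k block
    return 400 + 8 * block_kb


for level in (1, 5, 9):
    print(level, bzip2_compress_memory_kb(level), "KB")
```

If memory really is the bottleneck, a smaller block size can be requested from Python with the compresslevel argument, for example bz2.BZ2File(name, 'w', compresslevel=1), at the cost of a somewhat larger archive.
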
> Another issue is that you might be better off downloading bzip2 for
> Windows and letting the standalone bzip2 program handle file input and
> output.  Using a shell command here might be more efficient in spite of
> spawning a new process.
> 
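
A minimal sketch of the external-program route, assuming a bzip2 executable is on PATH. The helper names are made up for illustration. On Windows the child is started at below-normal priority so the GUI gets the CPU first; on other platforms that constant does not exist and the flag falls back to 0.

```python
import shutil
import subprocess


def bzip2_command(path, level=9, keep=True):
    """Build the argument list for the bzip2 command-line tool."""
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9")
    cmd = ["bzip2", "-%d" % level]
    if keep:
        cmd.append("-k")            # keep the input file
    cmd.append(path)
    return cmd


def run_bzip2(path, level=9):
    """Run bzip2 on path, at reduced priority where Windows allows it."""
    if shutil.which("bzip2") is None:
        raise FileNotFoundError("bzip2 executable not found on PATH")
    # BELOW_NORMAL_PRIORITY_CLASS is only defined on Windows (Python 3.7+).
    flags = getattr(subprocess, "BELOW_NORMAL_PRIORITY_CLASS", 0)
    subprocess.run(bzip2_command(path, level), check=True, creationflags=flags)
```

Lowering the priority does not reduce total CPU use, but it lets interactive programs preempt the compressor, which is usually what matters for a sluggish desktop.
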
> A third issue is that bzip2 achieves high compression efficiency at the 
> expense of CPU time and memory.  It might be worth considering whether 
> gzip might occupy the sweet spot compromise between minimal archive size 
> and minimal cpu usage.
> 
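
One way to check whether gzip is the better trade-off is to measure both on a sample of the real data. The sketch below compares compressed size and CPU time for the standard gzip and bz2 modules. The function name and the sample text are placeholders, and results will vary a lot with the input.

```python
import bz2
import gzip
import time


def compare_compressors(data):
    """Return {name: (compressed_size, cpu_seconds)} for gzip and bz2."""
    results = {}
    for name, compress in (("gzip", gzip.compress), ("bz2", bz2.compress)):
        started = time.process_time()
        packed = compress(data)
        results[name] = (len(packed), time.process_time() - started)
    return results


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog\n" * 20000
    for name, (size, cpu) in compare_compressors(sample).items():
        print("%-5s %8d bytes  %.3f s CPU" % (name, size, cpu))
```

Running this on a few representative files from the backup set should show whether the extra space saved by bz2 is worth the extra CPU time.
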
> Fourth, how many of those files are incompressible?  I've noticed that
> bzip2 tries really hard to eke out some form of savings from
> incompressible files.  A filename filter for files that should not be
> compressed (png, jpg, gif, sx*) might be worth doing here.
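
A small sketch of such a filename filter, using the extensions mentioned above. The SKIP_PATTERNS tuple and the should_compress name are made up for illustration, and the list would need extending for other already-compressed formats.

```python
import fnmatch

# Patterns for files that are already compressed (from the list above).
SKIP_PATTERNS = ("*.png", "*.jpg", "*.gif", "*.sx*")


def should_compress(filename):
    """Return False for names matching an already-compressed pattern."""
    name = filename.lower()         # match case-insensitively on any OS
    return not any(fnmatch.fnmatchcase(name, pat) for pat in SKIP_PATTERNS)
```

Inside the os.walk loop this becomes a one-line check, for example skipping any f for which should_compress(f) is false.
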

Thanks for the tips. I installed 512MB of ECC RAM and the problem went away.


