multicpu bzip2 using os.system or queue using python script

harijay harijay at gmail.com
Tue Jul 27 11:22:15 EDT 2010


I want to quickly bzip2-compress several hundred gigabytes of data
on my 8-core, 16 GB RAM workstation.
Currently I am using a simple Python script to compress a whole
directory tree, pairing an os.walk call with an os.system call to
bzip2.

I see that bzip2 only uses a single CPU while the other CPUs
remain relatively idle.

I am a newbie with queues and threads, but I am wondering how
I can implement this so that I have four threads running bzip2
(via os.system calls, I guess), each probably using its own
CPU, that drain files from a queue as they compress them.
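A sketch of that queue-and-worker-threads idea, assuming bzip2 is on the PATH (the worker count, the sentinel scheme, and the compress_all/worker names are my own invention, not an established recipe). Threads are enough here because each worker spends its time blocked waiting on an external bzip2 process, which runs outside the Python interpreter, so four child processes can occupy four cores concurrently:

```python
import os
import subprocess
import threading
try:
    from queue import Queue          # Python 3
except ImportError:
    from Queue import Queue          # Python 2

def worker(q, command):
    # Pull paths off the queue until a None sentinel arrives,
    # running 'command <path>' on each one.
    while True:
        path = q.get()
        if path is None:
            q.task_done()
            return
        subprocess.call(command + [path])
        q.task_done()

def compress_all(top=os.curdir, workers=4, command=None):
    if command is None:
        command = ["bzip2"]          # assumed to be on the PATH
    q = Queue()
    threads = [threading.Thread(target=worker, args=(q, command))
               for _ in range(workers)]
    for t in threads:
        t.start()
    # Fill the queue with every file that is not already compressed.
    for root, dirs, files in os.walk(top):
        for name in files:
            if not name.endswith(".bz2"):
                q.put(os.path.join(root, name))
    for _ in threads:
        q.put(None)                  # one shutdown sentinel per worker
    q.join()
```

Passing the command as an argument list to subprocess.call (rather than a string through os.system) also sidesteps shell-quoting problems with odd filenames.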


Thanks for your suggestions in advance

hari


My single-threaded script is pasted below.

import os

for root, dirlist, filelist in os.walk(os.curdir):
	for path in [os.path.join(root, name) for name in filelist]:
		if not path.endswith(".bz2"):  # skip files that are already compressed
			print "Compressing %s" % path
			os.system('bzip2 "%s"' % path)  # note: quoting still breaks on filenames containing '"'
			print ":DONE"




