BitVector

Ross Ridge rridge at caffeine.csclub.uwaterloo.ca
Mon Mar 10 14:52:00 EDT 2008


DanielJohnson  <diffuser78 at gmail.com> wrote:
>I am trying to solve a genetic algorithm problem where I want to read
>a bitvector of very large size (say 10000) and manipulate bits based
>on certain algorithms.
>
>I am a newbie in Python. What data structure are good to read such
>huge data set. Are there any built in classes for bit fiddling.

A 10000 bit data set is by no means huge.  Depending on exactly what
operations you want to perform converting your "bitvector" to a byte
array, one bit per byte may be fast way to go.  This increases the size
of data set by 8 times, but it's probably the fastest way in Python to
test and set individual bits.

Here's a couple of functions I wrote for very quickly converting a
"bitvector" in the form of a string (eg. read from a file) in to a
byte array:

	import string
	import array
	import binascii

	_tr_16 = string.maketrans("0123456789abcdef",
				  "\x00\x01\x02\x03"
				  "\x10\x11\x12\x13"
				  "\x20\x21\x22\x23"
				  "\x30\x31\x32\x33")
	_tr_4 = string.maketrans("0123",
				 "\x00\x01"
				 "\x10\x11")
	_tr_2 = string.maketrans("01", "\x00\x01")
	
	def string_to_bit_array(s):
		"""Convert a string to an array containing a sequence of bits."""
		s = binascii.hexlify(s).translate(_tr_16)
		s = binascii.hexlify(s).translate(_tr_4)
		s = binascii.hexlify(s).translate(_tr_2)
		a = array.array('B', s)
		return a
	
	_tr_rev_2 = string.maketrans("\x00\x01", "01")
	_tr_rev_4 = string.maketrans("\x00\x01"
				     "\x10\x11",
				     "0123")
	_tr_rev_16 = string.maketrans("\x00\x01\x02\x03"
				      "\x10\x11\x12\x13"
				      "\x20\x21\x22\x23"
				      "\x30\x31\x32\x33",
				      "0123456789abcdef")
	def bit_array_to_string(a):
		"""Convert an array containing a sequence of bits to a string."""
		remainder = len(a) % 8
		if remainder != 0:
			a.fromlist([0] * (8 - remainder))
		s = a.tostring()
		s = binascii.unhexlify(s.translate(_tr_rev_2))
		s = binascii.unhexlify(s.translate(_tr_rev_4))	
		return binascii.unhexlify(s.translate(_tr_rev_16))

I've used these functions to implement a data compression algorithim
in Python.  The algorithm still runs very slow in Python, but it runs
much faster than it would have if I had used Python's bitwise operators.

					Ross Ridge

-- 
 l/  //	  Ross Ridge -- The Great HTMU
[oo][oo]  rridge at csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/ 
 db  //	  



More information about the Python-list mailing list