XORing long strings opimization?

Tue Nov 4 16:27:43 EST 2003

Noen wrote:

> def XOR(s1,s2):
>      """ XOR string s1 with s2 """
>      output = ""
>      # Argument check
>      if (type(s1) or type(s2)) != type(""):
>          raise TypeError, "Arguments are not strings"
>      if len(s1) != len(s2):
>          raise ValueError, "Size differs in strings"
>      # Xoring
>      for i in range(len(s1)):
>          output += chr(ord(s1[i]) ^ ord(s2[i]))
>      return output

I keep nagging:

>>> xor3.XOR("", 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "xor3.py", line 7, in XOR
    if len(s1) != len(s2):
TypeError: len() of unsized object

Must be in a nasty mood this evening :-)
Seriously, I think that "first make it right, then make it fast" is a
reasonable design principle.
You can check the type

if type(s1) != type("") or type(s2) != type(""): 
...

or

if not isinstance(s1, str) or not isinstance(s2, str):
...

or even 

if not type(s1) == type(s2) == type(""):
...

but personally I would omit the check altogether. The second check could be
relaxed to len(s1) <= len(s2), assuming that s1 is the data to be encrypted
and s2 the random character sequence that makes your encryption NSA-proof.
For (much) weaker encryption schemes, you could omit it completely and
cycle over the characters of s2 (not recommended).

As to speed, I would suppose the following to be somewhat faster:

def xor(s1, s2):
    return "".join([chr(ord(c1) ^ ord(c2)) for c1, c2 in itertools.izip(s1,
s2)])

It favours "".join(character list) over consecutive s += tail operations,
which need to build "many" strings as intermediate results.

I think, further speedups could be achieved building a lookup table for all
character combinations.

Peter