regex/lambda black magic

John Machin sjmachin at lexicon.net
Thu May 25 17:11:17 EDT 2006


On 26/05/2006 4:33 AM, Andrew Robert wrote:
> Hi Everyone,
> 
> 
> Thanks for all of your patience on this.
> 
> I finally got it to work.
> 
> 
> Here is the completed test code showing what is going on.

Consider doing what you should have done at the start: state what you 
are trying to achieve. Not very many people have the patience that Max 
showed in ploughing through code that was both fugly and broken in order 
to determine what it should have been doing.

What is the motivation for encoding characters like the following?
,./<>;':"`~!@#$^&*()-+=[]\{}|

> 
> Not cleaned up yet but it works for proof-of-concept purposes.
> 
> 
> 
> #!/usr/bin/python
> 
> import re,base64
> 
> # Evaluate captured character as hex
> def ret_hex(value):
> 	return '%'+base64.b16encode(value)

This is IMHO rather pointless and obfuscatory, calling a function in a 
module when it can be done by a standard language feature. Why did you 
change it from the original "%%%2X" % value (which would have been 
better IMHO done as "%%%02X" % value)?
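
A minimal sketch of the format-operator version, assuming value is a 
one-character string (hence the ord() call, which is my assumption about 
how the function is called). The "02" matters: "%2X" pads with a space, 
so a tab would come out as "% 9" rather than "%09".

```python
def ret_hex(value):
    # value is assumed to be a single character; ord() gives its integer code.
    # "%%" emits a literal percent sign, "%02X" two zero-padded uppercase hex digits.
    return "%%%02X" % ord(value)


print(ret_hex("!"))
print(ret_hex("\t"))
```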

> 
> # Evaluate the value of whatever was matched
> def enc_hex_match(match):
>  	return ret_hex(match.group(0))

Why a second level of function call?
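
For illustration, the two functions could be collapsed into a single 
callback passed straight to re.sub. This is only a sketch; the sample 
input string is made up.

```python
import re


def enc_hex_match(match):
    # match.group(0) is the single character matched by the pattern.
    return "%%%02X" % ord(match.group(0))


# Every character that is neither a word character nor whitespace is replaced.
encoded = re.sub(r"[^\w\s]", enc_hex_match, "a+b=c")
print(encoded)
```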

> 
> def ret_ascii(value):
> 	return base64.b16decode(value)

See above.


> 
> # Evaluate the value of whatever was matched
> def enc_ascii_match(match):
> 
> 	arg=match.group()
> 
> 	#remove the artifically inserted % sign

Don't bother with a separate step; just skip it when converting:
return chr(int(match.group()[1:], 16))
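
Spelled out as a complete callback, that might look like the sketch 
below. The matched text comes from match.group(), and chr() turns the 
parsed integer back into a character, because re.sub expects the 
callback to return a string. The name dec_hex_match and the sample input 
are hypothetical.

```python
import re


def dec_hex_match(match):
    # match.group() looks like "%2B"; skip the "%" and parse the two hex digits.
    return chr(int(match.group()[1:], 16))


decoded = re.sub(r"%[0-9A-F][0-9A-F]", dec_hex_match, "a%2Bb%3Dc")
print(decoded)
```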

> 	arg=arg[1:]
> 
> 	# decode the result
>  	return ret_ascii(arg)
> 
> def file_encoder():
> 	# Read each line, pass any matches on line to function for
> 	# line in file.readlines():
> 	output=open(r'e:\pycode\sigh.new','wb')
> 	for line in open(r'e:\pycode\sigh.txt','rb'):
> 		 output.write( (re.sub('[^\w\s]',enc_hex_match, line)) )
> 	output.close()

Why are you opening the file with "rb" but then reading it a line at a time?
For a binary file, the whole file may be one "line"; it would be safer 
to read() blocks of, say, 8 KB.
For a text file, the only point of the binary mode might be to avoid any 
sort of problem caused by OS-dependent definitions of "newline", i.e. 
CRLF vs LF. I note that as \r and \n are whitespace, you are not 
encoding them as %0D and %0A; is this deliberate?
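
A rough sketch of the block-reading approach for the encoder, written 
against file-like objects opened in binary mode. It assumes a Python 
version where bytes objects support the % operator (3.5 or later); the 
names encode_stream, _enc and BLOCK_SIZE are made up for illustration. 
Like the original pattern, it leaves \r and \n unencoded.

```python
import io
import re

BLOCK_SIZE = 8192  # 8 KB, the block size suggested above


def _enc(match):
    # match.group() is a one-byte bytes object; ord() gives its integer value.
    return b"%%%02X" % ord(match.group())


def encode_stream(src, dst, block_size=BLOCK_SIZE):
    # Copy src to dst block by block, percent-encoding every byte that is
    # neither an ASCII word character nor whitespace.
    while True:
        block = src.read(block_size)
        if not block:
            break
        dst.write(re.sub(rb"[^\w\s]", _enc, block))


# Small in-memory demonstration with a deliberately tiny block size.
src = io.BytesIO(b"a+b\r\n")
dst = io.BytesIO()
encode_stream(src, dst, block_size=2)
print(dst.getvalue())
```

Encoding works block by block because each byte is handled on its own. 
Decoding the same way needs more care: a %XX sequence can straddle a 
block boundary, so a decoder would have to carry the tail of one block 
over into the next.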

> 
> def file_decoder():
> 	# Read each line, pass any matches on line to function for
> 	# line in file.readlines():
> 
> 	output=open(r'e:\pycode\sigh.new2','wb')
> 	for line in open(r'e:\pycode\sigh.new','rb'):
> 		output.write(re.sub('%[0-9A-F][0-9A-F]',enc_ascii_match, line))
> 	output.close()
> 
> 
> 
> 
> file_encoder()
> 
> file_decoder()


