Stripping ASCII codes when parsing

Mon Oct 17 16:13:50 EDT 2005

David Pratt wrote:

> I am working with a text format that advises stripping any ASCII control
> characters (0 - 30), and also the ASCII pipe character (124), from the
> data as part of parsing. I think many of these characters are from a
> different time. Since I have never seen most of these characters in
> text, I am not sure how these first 30 control characters are all
> represented (other than, say, tab (\t), newline (\n), and carriage
> return (\r)), so what should I do to remove these characters if they
> are ever encountered? Many thanks.

Use the string's translate method.  Pass an identity translation table
(all 256 characters, each mapping to itself) as the first argument, and as
the second argument, a string containing every character you wish to
delete.  This would look something like:

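	# Identity table: each of the 256 byte values maps to itself.
	IDENTITY_MAP = ''.join([chr(x) for x in range(256)])
	# Characters to delete: ASCII 0-31 plus the pipe (124).
	# (Python 2: range() returns a list, so it can be concatenated.)
	BAD_MAP = ''.join([chr(x) for x in range(32) + [124]])

	aNewString = aString.translate(IDENTITY_MAP, BAD_MAP)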
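The two-argument form above is the Python 2 byte-string API.  In Python 3,
str.translate takes a single mapping table, and str.maketrans accepts a third
argument listing characters to delete.  A minimal sketch of the same idea,
assuming the same deletion set (ASCII 0 through 31 plus the pipe);
strip_bad_chars is a hypothetical helper name, not part of any library:

```python
# Python 3 sketch. Assumption: delete ASCII 0-31 and '|' (124), as above.
BAD_CHARS = ''.join(chr(x) for x in range(32)) + '|'

# The third argument to str.maketrans lists characters to map to None,
# i.e. to delete; no other characters are changed.
DELETE_TABLE = str.maketrans('', '', BAD_CHARS)

def strip_bad_chars(s):
    """Return s with ASCII control codes 0-31 and '|' removed."""
    return s.translate(DELETE_TABLE)

print(strip_bad_chars('a|b\tc\x1fd\r\n'))  # prints "abcd"
```

Building the table once at module level avoids rebuilding it for every line
parsed.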

Note that ASCII 31 (US, the unit separator) is also a control character,
which is why the range above runs from 0 through 31 rather than 0 through 30.
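As for how these characters are represented: in Python source and in repr()
output, only a few have short escapes (\t, \n, \r); the rest appear as \xNN
hex escapes.  A small sketch (Python 3; control_char_reprs is a hypothetical
helper name) that lists all 32 so stray ones are easy to recognize:

```python
def control_char_reprs():
    """Map each ASCII control code 0-31 to its repr() form."""
    return {code: repr(chr(code)) for code in range(32)}

for code, text in control_char_reprs().items():
    # e.g. 9 '\t', 10 '\n', 11 '\x0b', 13 '\r', 31 '\x1f'
    print(code, text)
```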

-- 
Erik Max Francis && max at alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
   The believer is happy; the doubter is wise.
   -- (an Hungarian proverb)