Stripping ASCII codes when parsing

David Pratt fairwinds at eastlink.ca
Mon Oct 17 11:21:14 EDT 2005


Many thanks Steve. This is good information, and I think it should work
fine. I was doing a string.replace in a cleanData() method with the
following characters, but I don't know if that would have covered
everything. The list contains all the control characters that I really
know about in normal use. Checking ord(c) < 32 sounds like a much better
and more comprehensive way to go. So I guess instead of string.replace, I
should do a   ... for char in ...   loop and evaluate each character,
correct? Or is there a better way of eliminating these other than reading
the string character by character?

'\a','\b','\e','\f','\n','\r','\t','\v','|'
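
A minimal sketch of the ord(c) < 32 approach, assuming Python 3 (the
thread predates it) and hypothetical function names clean_data and
clean_data_loop. It shows two ways to drop every control character plus
the pipe without chaining replace calls. Note that '\e' in the list above
is not an escape sequence Python recognises; the escape character would
be written '\x1b'.

```python
# Code points 0-31 are the ASCII control characters; 124 is the pipe '|'.
# Mapping a code point to None tells str.translate to delete it.
_STRIP_TABLE = {code: None for code in range(32)}
_STRIP_TABLE[ord("|")] = None


def clean_data(text):
    """Return text with ASCII control characters and '|' removed."""
    return text.translate(_STRIP_TABLE)


def clean_data_loop(text):
    """Same result, testing each character explicitly."""
    return "".join(c for c in text if ord(c) >= 32 and c != "|")


cleaned = clean_data("a\tb|c\x00d\n")  # "abcd"
```

Both versions also remove tab, newline and carriage return, since those
are below 32. If line breaks need to survive, they would have to be left
out of the table.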

Regards,
David


On Monday, October 17, 2005, at 06:04 AM, Steve Holden wrote:

> David Pratt wrote:
>> I am working with a text format that advises to strip any ascii control
>> characters (0 - 30) as part of parsing data and also the ascii pipe
>> character (124) from the data. I think many of these characters are
>> from a different time. Since I have never seen most of these characters
>> in text I am not sure how these first 30 control characters are all
>> represented (other than say tab (\t), newline(\n), line return(\r) ) so
>> what should I do to remove these characters if they are ever
>> encountered. Many thanks.
>
> You will find the ord() function useful: control characters all have
> ord(c) < 32.
>
> You can also use the chr() function to return a character whose ord() is
> a specific value, and you can use hex escapes to include arbitrary
> control characters in string literals:
>
>    myString = "\x00\x01\x02"
>
> regards
>   Steve
> -- 
> Steve Holden       +44 150 684 7255  +1 800 494 3119
> Holden Web LLC                     www.holdenweb.com
> PyCon TX 2006                  www.python.org/pycon/
>
> -- 
> http://mail.python.org/mailman/listinfo/python-list
>
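
The relationship between ord(), chr() and hex escapes described in the
quoted reply can be checked with a short sketch, assuming Python 3. The
variable names are illustrative only.

```python
my_string = "\x00\x01\x02"

# ord() gives the numeric code of each character.
codes = [ord(c) for c in my_string]

# chr() goes the other way: code 9 is the tab character.
tab = chr(9)

# A hex escape can spell any character; 0x7c is 124, the pipe.
pipe = "\x7c"

# Every character in my_string counts as a control character.
all_control = all(ord(c) < 32 for c in my_string)
```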


