Filtering out non-readable characters

Steve Holden steve at holdenweb.com
Fri Jul 29 10:27:21 EDT 2005


Adriaan Renting wrote:
> def StripNoPrint(self, S):
>         from string import printable
>         return "".join([ ch for ch in S if ch in printable ])
> 
> 
> Adriaan Renting        | Email: renting at astron.nl
> ASTRON                 | Phone: +31 521 595 217
> P.O. Box 2             | GSM:   +31 6 24 25 17 28
> NL-7990 AA Dwingeloo   | FAX:   +31 521 597 332
> The Netherlands        | Web: http://www.astron.nl/~renting/
> 
>>>>"MKoool" <mohan at terabolic.com> 07/16/05 2:33 AM >>>
> 
> I have a file with binary and ascii characters in it.  I massage the
> data and convert it to a more readable format, however it still comes
> up with some binary characters mixed in.  I'd like to write something
> to just replace all non-printable characters with '' (I want to delete
> non-printable characters).
> 
> I am having trouble figuring out an easy python way to do this... is
> the easiest way to just write some regular expression that does
> something like replace [^\p] with ''?
> 
> Or is it better to go through every character and do ord(character),
> check the ascii values?
> 
> What's the easiest way to do something like this?
> 
> thanks
> 
I'd consider using the string's translate() method for this. Provide it 
with two arguments: the first should be a string of the 256 characters 
with ordinals 0 to 255 (because you won't be changing any characters, so 
you need a translate table that effects the null transformation) and the 
second argument should be a string containing all the characters you want 
to remove.

So

  >>> tt = "".join([chr(i) for i in range(256)])

generates the null translate table quite easily. Then

  >>> import string
  >>> ds = tt.translate(tt, string.printable)

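sets ds to be all the non-printable characters (according to the string 
module, anyway): translating tt through the null table changes nothing, 
and the second argument deletes every printable character from the result.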

Now you should be able to remove the non-printable characters from s by 
writing

  >>> s = s.translate(tt, ds)
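
The two-argument form of translate() shown above belongs to Python 2 
byte strings. As a minimal sketch of the same idea on Python 3, assuming 
the data is handled as bytes, bytes.translate() accepts None as the table 
(meaning "change nothing") plus a set of bytes to delete. The names 
ALL_BYTES, NON_PRINTABLE and strip_non_printable are made up for 
illustration.

```python
import string

# All 256 single-byte values in order; this plays the role of the
# null translate table.
ALL_BYTES = bytes(range(256))

# string.printable is pure ASCII, so it encodes to one byte per character.
PRINTABLE = string.printable.encode("ascii")

# Delete the printable bytes from the full set; what remains is the
# set of non-printable bytes (per the string module's definition).
NON_PRINTABLE = ALL_BYTES.translate(None, PRINTABLE)


def strip_non_printable(data: bytes) -> bytes:
    """Return data with every byte outside string.printable removed."""
    # A table of None leaves the surviving bytes unchanged.
    return data.translate(None, NON_PRINTABLE)
```

With this sketch, strip_non_printable(b"abc\x00def") returns b"abcdef". 
Newlines and tabs survive because string.printable includes whitespace.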

regards
  Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC             http://www.holdenweb.com/



