Filtering out non-readable characters
Steve Holden
steve at holdenweb.com
Fri Jul 29 10:27:21 EDT 2005
Adriaan Renting wrote:
> def StripNoPrint(self, S):
>     from string import printable
>     return "".join([ ch for ch in S if ch in printable ])
>
>
> Adriaan Renting | Email: renting at astron.nl
> ASTRON | Phone: +31 521 595 217
> P.O. Box 2 | GSM: +31 6 24 25 17 28
> NL-7990 AA Dwingeloo | FAX: +31 521 597 332
> The Netherlands | Web: http://www.astron.nl/~renting/
>
>>>>"MKoool" <mohan at terabolic.com> 07/16/05 2:33 AM >>>
>
> I have a file with binary and ascii characters in it. I massage the
> data and convert it to a more readable format, however it still comes
> up with some binary characters mixed in. I'd like to write something
> to just replace all non-printable characters with '' (I want to delete
> non-printable characters).
>
> I am having trouble figuring out an easy python way to do this... is
> the easiest way to just write some regular expression that does
> something like replace [^\p] with ''?
>
> Or is it better to go through every character and do ord(character),
> check the ascii values?
>
> What's the easiest way to do something like this?
>
> thanks
>
I'd consider using the string's translate() method for this. Provide it
with two arguments: the first should be a string of the 256 characters
with ordinals 0 to 255 (because you won't be changing any characters, you
need a translate table that effects the null transformation) and the
second argument should be a string containing all the characters you want
to remove.
So
>>> tt = "".join([chr(i) for i in range(256)])
generates the null translate table quite easily. Then
>>> import string
>>> ds = tt.translate(tt, string.printable)
sets ds to be all the non-printable characters (according to the string
module, anyway).
Now you should be able to remove the non-printable characters from s by
writing
>>> s = s.translate(tt, ds)
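The two-argument form of translate() shown above belongs to Python 2
strings. For anyone on Python 3, here is a rough sketch of the same idea
using bytes.translate(), which still accepts a delete argument and takes
None as the table when no characters are being remapped. The names
all_bytes, non_printable and strip_non_printable are mine, chosen for
illustration, and the sketch assumes the data is handled as bytes rather
than text.

```python
import string

# Every byte value from 0 to 255, playing the role of tt above.
all_bytes = bytes(range(256))

# The characters the string module regards as printable, as bytes.
printable = string.printable.encode("ascii")

# Deleting the printable bytes from all_bytes leaves the non-printable
# ones, which plays the role of ds above. A table of None means
# "change nothing, only delete".
non_printable = all_bytes.translate(None, printable)


def strip_non_printable(data):
    """Return a copy of data (bytes) with non-printable bytes removed."""
    return data.translate(None, non_printable)
```

Python 3 text strings have a one-argument translate() that takes a
mapping instead, where mapping an ordinal to None deletes that character.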
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/