[Tutor] Removing control characters

Mark Tolonen metolone+gmane at gmail.com
Fri Feb 20 02:16:26 CET 2009


"Kent Johnson" <kent37 at tds.net> wrote in message 
news:1c2a2c590902191500y71600feerff0b73a88fb49eed at mail.gmail.com...
> On Thu, Feb 19, 2009 at 5:41 PM, Dinesh B Vadhia
> <dineshbvadhia at hotmail.com> wrote:
>> Okay, here is a combination of Mark's suggestions and yours:
>
>>> # replace unwanted chars in string s with " "
>>> t = "".join([(" " if n in c else n) for n in s if n not in c])
>>> t
>> 'Product ConceptsHard candy with an innovative twist, Internet Archive:
>> Wayback Machine. [online] Mar. 25, 2004. Retrieved from the Internet
>> <URL: http://www.confectionery-innovations.com>.'
>>
>> This last bit doesn't work, i.e. replacing the unwanted chars with " ",
>> e.g. 'ConceptsHard'.  What's missing?
>
> The "if n not in c" at the end of the list comp rejects the unwanted
> characters from the result immediately. What you wrote is the same as
> t = "".join([n for n in s if n not in c])
>
> because "n in c" will never be true in the first conditional.
>
> BTW if you care about performance, this is the wrong approach. At
> least use a set for c; better would be to use translate().
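
As a minimal sketch of Kent's point: dropping the trailing filter lets the
conditional expression see every character, so it can do the replacement.
Here s and c are made-up sample values, not Dinesh's actual data.

```python
# Hypothetical sample inputs: s is the text to clean, c the unwanted characters.
s = 'Product Concepts\nHard candy\twith a twist'
c = set(chr(i) for i in range(32))  # ASCII control characters, in a set for fast lookup

# With the trailing "if n not in c" filter, unwanted characters are dropped
# before the conditional expression ever sees them, so words run together.
dropped = "".join([n for n in s if n not in c])

# Without the filter, every character reaches the conditional expression,
# which substitutes a space for each unwanted one.
t = "".join(" " if n in c else n for n in s)

print(dropped)  # Product ConceptsHard candywith a twist
print(t)        # Product Concepts Hard candy with a twist
```

This is still a per-character membership test in Python code, so translate()
is likely to be faster on long strings.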

Sorry, I didn't catch the "replace with space" part.  Kent is right:
translate() is what you want.  join() is still handy for building the
256-character translation table:

>>> import string
>>> table = ''.join(' ' if n < 32 or n > 126 else chr(n)
...                 for n in xrange(256))
>>> string.translate('here is\x01my\xffstring', table)
'here is my string'
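
The session above is Python 2. In Python 3, xrange() and string.translate()
are gone, and str.translate() takes a mapping keyed by code point instead of
a 256-character string. A rough equivalent, as a sketch only (the helper name
clean is made up for illustration), might look like this:

```python
# Python 3 sketch: map every code point below 256 that falls outside
# printable ASCII (32..126) to a space. Code points missing from the
# dict are left unchanged by str.translate().
table = {n: ' ' for n in range(256) if n < 32 or n > 126}

def clean(text):
    """Replace control and non-ASCII Latin-1 characters with spaces.

    'clean' is a hypothetical helper name, not part of any library.
    """
    return text.translate(table)

print(clean('here is\x01my\xffstring'))  # here is my string
```

Note that this table only covers code points 0 to 255; characters above
that range pass through untouched unless the mapping is extended.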

-Mark



