Stripping unencodable characters from a string

Dave Angel davea at davea.name
Tue May 5 15:00:38 EDT 2015


On 05/05/2015 02:19 PM, Paul Moore wrote:

You need to specify that you're using Python 3.4 (or whichever) when 
starting a new thread.

> I want to write a string to an already-open file (sys.stdout, typically). However, I *don't* want encoding errors, and the string could be arbitrary Unicode (in theory). The best way I've found is
>
>      data = data.encode(file.encoding, errors='replace').decode(file.encoding)
>      file.write(data)
>
> (I'd probably use backslashreplace rather than replace, but that's a minor point).
>
> Is that the best way? The multiple re-encoding dance seems a bit clumsy, but it was the best I could think of.
>
> Thanks,
> Paul.
>

If you're going to take charge of the encoding of the file, why not just 
open the file in binary mode, and do it all with
     file.write(data.encode(myencoding, errors='replace'))

I can't see the benefit of two encodes and a decode just to write a 
string to the file.
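
To make that concrete, here is a minimal sketch of the binary-mode approach, 
assuming Python 3. The helper name write_lossy and the io.BytesIO stand-in 
for the real file are made up for illustration; they are not from your code.

```python
import io


def write_lossy(binary_file, data, encoding, errors="replace"):
    """Encode once, substituting anything the codec can't represent,
    and write the resulting bytes to a file opened in binary mode.

    write_lossy is a hypothetical helper name used for this sketch.
    """
    encoded = data.encode(encoding, errors=errors)
    binary_file.write(encoded)
    return encoded


# io.BytesIO stands in for a file opened with mode "wb".
demo = io.BytesIO()
write_lossy(demo, "caf\u00e9 \u2603", "ascii")
# Each character that ASCII can't encode is written as b"?".
```

For sys.stdout itself, the binary layer in Python 3 is sys.stdout.buffer, so 
the same helper could be pointed at that. If you mix it with ordinary 
print() calls, flush sys.stdout first so the output stays in order.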

Alternatively, there's probably a way to open the file using 
codecs.open(), and reassign it to sys.stdout.
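
I haven't tested this against your setup, but here is a rough sketch of that 
idea. Note that it swaps in io.TextIOWrapper rather than codecs.open(), 
because codecs.open() wants a filename and you have an already-open stream. 
The function name lenient_text_stream is made up for illustration, and the 
sketch assumes Python 3, where the stream exposes a binary .buffer.

```python
import io


def lenient_text_stream(binary_stream, encoding, errors="backslashreplace"):
    """Wrap a binary stream in a text layer whose error handler
    substitutes unencodable characters instead of raising.

    lenient_text_stream is a hypothetical name used for this sketch.
    """
    return io.TextIOWrapper(binary_stream, encoding=encoding, errors=errors)


# io.BytesIO stands in for sys.stdout.buffer in this demonstration.
raw = io.BytesIO()
out = lenient_text_stream(raw, "ascii")
out.write("snowman: \u2603")
out.flush()  # push the text layer's pending data down to the binary stream

# Reassigning stdout would look like this (left commented out on purpose):
# import sys
# sys.stdout = lenient_text_stream(sys.stdout.buffer, sys.stdout.encoding)
```

After that, plain write() calls take a str and the wrapper does the single 
encode for you, so there's no round trip through bytes and back.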


-- 
DaveA


