Stripping unencodable characters from a string
Paul Moore
p.f.moore at gmail.com
Tue May 5 15:24:56 EDT 2015
On Tuesday, 5 May 2015 20:01:04 UTC+1, Dave Angel wrote:
> On 05/05/2015 02:19 PM, Paul Moore wrote:
>
> You need to specify that you're using Python 3.4 (or whichever) when
> starting a new thread.
Sorry. 2.6, 2.7, and 3.3+. It's for use in a cross-version library.
> If you're going to take charge of the encoding of the file, why not just
> open the file in binary, and do it all with
> file.write(data.encode( myencoding, errors='replace') )
I don't have control of the encoding of the file. It's typically sys.stdout, which is already open. I can't replace sys.stdout (because the main program which calls my library code wouldn't like me messing with global state behind its back). And sys.stdout isn't open in binary mode.
> I can't see the benefit of two encodes and a decode just to write a
> string to the file.
Nor can I - that's my point. But if all I have is an open text-mode file with the "strict" error mode, I have to incur one encode, and I have to make sure that no characters are passed to that encode which can't be encoded.
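For concreteness, here is a minimal sketch of the encode/decode round trip I mean. The name make_encodable is just illustrative, and falling back to ASCII when the stream has no usable .encoding attribute is my own assumption, not something the stream guarantees. The text argument is expected to be a unicode string (unicode on 2.x, str on 3.x).

```python
def make_encodable(text, stream, errors="replace"):
    """Return text with characters the stream cannot encode substituted.

    'text' must be a unicode string (unicode on 2.x, str on 3.x).
    """
    # Assumption: if the stream has no .encoding (or it is None),
    # fall back to ASCII as the lowest common denominator.
    encoding = getattr(stream, "encoding", None) or "ascii"
    # Encode with a lenient error handler, then decode straight back.
    # The result contains only characters the strict encoder accepts.
    # Positional 'errors' is used because 2.6 rejects it as a keyword.
    return text.encode(encoding, errors).decode(encoding)
```

The caller would then do something like sys.stdout.write(make_encodable(text, sys.stdout)), so the stream's own strict encode never sees a character it can't handle. That is the one unavoidable encode, plus an extra encode and decode up front.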
If there were a codec method to identify un-encodable characters, that might be an alternative (although it's quite possible that the encode/decode dance would be faster anyway, as it's mostly in C - not that performance is key here).
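Lacking such a method, the closest I can get in pure Python is something like the sketch below, which tries each character in turn. The function name is hypothetical, and this is only an illustration of the idea: it makes one encode call per character, and on narrow 2.x builds it would see non-BMP characters as two separate surrogates.

```python
def unencodable_chars(text, encoding):
    """Return the set of characters in text that 'encoding' rejects."""
    bad = set()
    for ch in text:
        try:
            # Strict error handling is the default, so any character the
            # codec cannot represent raises UnicodeEncodeError.
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.add(ch)
    return bad
```

With that set in hand, the offending characters could be stripped or substituted before writing, but it is hard to see this beating the round trip through the codec.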
> Alternatively, there's probably a way to open the file using
> codecs.open(), and reassign it to sys.stdout.
As I said, I have to work with the file (sys.stdout or whatever) that I'm given. I can't reopen or replace it.
Paul