getting fileinput to do errors='ignore' or 'replace'?
Peter Otten
__peter__ at web.de
Thu Dec 3 11:11:37 EST 2015
Adam Funk wrote:
> On 2015-12-03, Adam Funk wrote:
>
>> I'm having trouble with some input files that are almost all proper
>> UTF-8 but with a couple of troublesome characters mixed in, which I'd
>> like to ignore instead of throwing ValueError. I've found the
>> openhook for the encoding
>>
>>   for line in fileinput.input(options.files,
>>                               openhook=fileinput.hook_encoded("utf-8")):
>>       do_stuff(line)
>>
>> which the documentation describes as "a hook which opens each file
>> with codecs.open(), using the given encoding to read the file", but
>> I'd like codecs.open() to also have the errors='ignore' or
>> errors='replace' effect. Is it possible to do this?
>
> I forgot to mention: this is for Python 2.7.3 & 2.7.10 (on different
> machines).
Have a look at the source of fileinput.hook_encoded:

    def hook_encoded(encoding):
        import io
        def openhook(filename, mode):
            mode = mode.replace('U', '').replace('b', '') or 'r'
            return io.open(filename, mode, encoding=encoding, newline='')
        return openhook
You can use it as a template to write your own factory function:

    def my_hook_encoded(encoding, errors=None):
        import io
        def openhook(filename, mode):
            mode = mode.replace('U', '').replace('b', '') or 'r'
            return io.open(
                filename, mode,
                encoding=encoding, newline='',
                errors=errors)
        return openhook

    for line in fileinput.input(
            options.files,
            openhook=my_hook_encoded("utf-8", errors="ignore")):
        do_stuff(line)
Another option is to create the function on the fly:

    import functools
    import io

    for line in fileinput.input(
            options.files,
            openhook=functools.partial(
                io.open, encoding="utf-8", errors="replace")):
        do_stuff(line)
(codecs.open() instead of io.open() should also work)
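
To see what the two error handlers actually do to a bad byte, here is a small self-contained sketch. The helper names hook_with_errors and read_lines and the sample bytes are made up for illustration; the hook follows the same pattern as my_hook_encoded above, minus newline=''. It was written with Python 3 in mind, and I would expect it to behave the same way on 2.7.

```python
import fileinput
import io
import os
import tempfile


def hook_with_errors(encoding, errors):
    # Hypothetical factory, same shape as my_hook_encoded above.
    def openhook(filename, mode):
        mode = mode.replace('U', '').replace('b', '') or 'r'
        return io.open(filename, mode, encoding=encoding, errors=errors)
    return openhook


def read_lines(path, errors):
    # Collect every line fileinput yields for a single file.
    lines = []
    stream = fileinput.input(
        [path], openhook=hook_with_errors("utf-8", errors))
    try:
        for line in stream:
            lines.append(line)
    finally:
        stream.close()
    return lines


# Sample file: valid UTF-8 except for one stray 0xff byte.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"caf\xc3\xa9 \xff ok\n")

ignored = read_lines(path, "ignore")    # bad byte silently dropped
replaced = read_lines(path, "replace")  # bad byte becomes U+FFFD
os.remove(path)

print(repr(ignored))
print(repr(replaced))
```

With errors="ignore" the offending byte simply vanishes, so you cannot tell afterwards that anything was wrong; errors="replace" leaves a U+FFFD marker you can search for later.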