Changing strings in files

Chris Angelico rosuav at gmail.com
Tue Nov 10 13:40:56 EST 2020


On Wed, Nov 11, 2020 at 5:36 AM Eli the Bearded <*@eli.users.panix.com> wrote:
> Read first N lines of a file. If all parse as valid UTF-8, consider it text.
> That's probably the rough method file(1) and Perl's -T use. (In
> particular allow no nulls. Maybe allow ISO-8859-1.)
>

ISO-8859-1 is basically "allow any byte values", so all you'd be doing
is checking for a lack of NUL bytes. I'd definitely recommend
mandating UTF-8, as that's a very good way of recognizing valid text,
but if you can't do that then the simple NUL check is all you really
need.
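
For what it's worth, here is a rough sketch of both checks in Python. The function names, the 8192-byte sample size, and sampling by bytes rather than by "first N lines" are all my own assumptions for illustration; this is not what file(1) or Perl's -T actually do.

```python
import codecs


def looks_like_text(data: bytes, require_utf8: bool = True,
                    complete: bool = True) -> bool:
    """Heuristic only: no NUL bytes, and (optionally) valid UTF-8.

    Pass complete=False when `data` is just a prefix of a larger file,
    so a multi-byte sequence cut off at the end is not counted as an error.
    """
    # NUL is technically valid UTF-8 (U+0000), so it needs its own check.
    if b"\x00" in data:
        return False
    if not require_utf8:
        # The "ISO-8859-1" case: any byte value is acceptable.
        return True
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # With final=False, a truncated trailing sequence is buffered
        # rather than raising; invalid bytes still raise.
        decoder.decode(data, final=complete)
    except UnicodeDecodeError:
        return False
    return True


def file_looks_like_text(path, sample_size: int = 8192,
                         require_utf8: bool = True) -> bool:
    """Apply the heuristic to the first `sample_size` bytes of a file."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    # A short read means we saw the whole file, so the data is complete.
    return looks_like_text(sample, require_utf8,
                           complete=len(sample) < sample_size)
```

Calling file_looks_like_text(path) gives the strict UTF-8 test; passing require_utf8=False falls back to the bare NUL check.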

And let's be honest here, there aren't THAT many binary files that
manage to contain a total of zero NULs, so you won't get many false
hits :)

ChrisA


More information about the Python-list mailing list