Changing strings in files

Serhiy Storchaka storchaka at gmail.com
Wed Nov 11 03:03:54 EST 2020


10.11.20 22:40, Dennis Lee Bieber wrote:
> 	Testing for extension in a list of exclusions would be much faster than
> scanning the contents of a file, and the few that do get through would have
> to be scanned anyway.

Then the simplest method should work: read the first 512 bytes and check
if they contain b'\0'. The chance that a random sequence of 512 bytes does
not contain NUL is (1-1/256)**512 = 0.13. So this will filter out 87% of
binary files. Likely more, because binary files usually have some
structure, and reserve a fixed size for integers. Most integers are much
less than the maximal value, so their higher bits and bytes are zeroes. You
can also decrease the probability of false results by increasing the
size of the tested data or by testing a few other byte values (b'\1', b'\2',
etc). Anything more sophisticated is just a waste of your time.
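
A minimal sketch of that check, for illustration only. The function names
(sample_has_nul, looks_binary, miss_probability) and the sample_size
parameter are my own placeholders, not something from the earlier messages
in this thread:

```python
def sample_has_nul(sample: bytes) -> bool:
    """Return True if the byte sample contains at least one NUL byte."""
    return b'\0' in sample


def looks_binary(path, sample_size=512):
    """Guess whether a file is binary from its first sample_size bytes.

    Hypothetical helper: reads only the head of the file, so it stays
    cheap even for large files.
    """
    with open(path, 'rb') as f:
        return sample_has_nul(f.read(sample_size))


def miss_probability(sample_size=512):
    """Chance that sample_size uniformly random bytes contain no NUL."""
    return (1 - 1 / 256) ** sample_size
```

Files for which looks_binary() returns True would be skipped, and only the
rest opened as text. miss_probability(1024) is the square of
miss_probability(512), which is why a larger sample reduces the number of
binary files that slip through.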


