Python version of Perl's "if (-T ..)" and "if (-B ...)"?

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Fri Feb 12 12:20:06 EST 2010


On Fri, 12 Feb 2010 15:14:07 +0100, Christian Heimes wrote:

> Lloyd Zusman wrote:
>> .... The -T  and -B  switches work as follows. The first block or so
>> .... of the file is examined for odd characters such as strange control
>> .... codes or characters with the high bit set. If too many strange
>> .... characters (>30%) are found, it's a -B file; otherwise it's a -T
>> .... file. Also, any file containing null in the first block is ....
>> considered a binary file. [ ... ]
> 
> That's a butt-ugly heuristic that will lead to lots of false positives
> if your text happens to be UTF-16 encoded, or is non-English text
> encoded as UTF-8.

And a hell of a lot of false negatives if the file is binary.
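
To make both failure modes concrete, here is a rough Python sketch of the
heuristic as described in the quoted text. It is not Perl's actual
implementation: the 512-byte block size, the exact set of "odd" bytes, and
the names block_looks_binary and file_looks_binary are my own assumptions.

```python
# Sketch of the quoted -T/-B description. Assumptions: "first block" is
# 512 bytes, and "odd" means a control byte other than tab/LF/FF/CR,
# DEL, or any byte with the high bit set.
BLOCK_SIZE = 512
ALLOWED_CONTROLS = frozenset(b"\t\n\f\r")  # iterating bytes yields ints


def is_odd(byte: int) -> bool:
    """True for a 'strange' byte under the assumptions above."""
    if byte >= 0x80 or byte == 0x7F:
        return True
    return byte < 0x20 and byte not in ALLOWED_CONTROLS


def block_looks_binary(block: bytes, threshold: float = 0.30) -> bool:
    """Apply the heuristic to one block of bytes."""
    if not block:
        return False  # assumption: treat an empty block as text
    if b"\x00" in block:
        return True  # any null byte means binary
    odd = sum(1 for byte in block if is_odd(byte))
    return odd / len(block) > threshold


def file_looks_binary(path: str, block_size: int = BLOCK_SIZE) -> bool:
    """Apply the heuristic to the first block of a file."""
    with open(path, "rb") as f:
        return block_looks_binary(f.read(block_size))


# False positives: ordinary text that gets flagged as binary.
print(block_looks_binary("hello".encode("utf-16")))      # nulls in UTF-16
print(block_looks_binary("Привет мир".encode("utf-8")))  # mostly high-bit bytes

# False negative: binary junk diluted by printable bytes stays under 30%.
print(block_looks_binary(b"A" * 100 + b"\xff" * 10))
```

The first two calls report True for perfectly good text, and the last one
reports False for data that is clearly not text.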

The way I've always seen it, a file is binary if it contains even a single
binary character *anywhere* in the file, not just in the first block.
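
A minimal sketch of that stricter test, scanning the whole stream rather
than one block. What counts as a "binary character" is an assumption here:
I've taken it to mean a null or other control byte apart from tab, LF, FF
and CR, plus DEL. The names is_binary_byte and stream_has_binary_byte are
made up for illustration.

```python
import io

# Assumption: control bytes that are normal in text files.
TEXT_CONTROLS = frozenset(b"\t\n\f\r")  # iterating bytes yields ints


def is_binary_byte(byte: int) -> bool:
    """True for a control byte that is not ordinary text whitespace."""
    return byte == 0x7F or (byte < 0x20 and byte not in TEXT_CONTROLS)


def stream_has_binary_byte(stream, chunk_size: int = 65536) -> bool:
    """Return True as soon as any chunk contains a 'binary' byte."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return False  # reached end of stream without a hit
        if any(is_binary_byte(byte) for byte in chunk):
            return True


# A single stray control byte at the very end is enough.
print(stream_has_binary_byte(io.BytesIO(b"text " * 1000 + b"\x01")))

# UTF-8 text passes, because high-bit bytes are not counted here.
print(stream_has_binary_byte(io.BytesIO("Привет\n".encode("utf-8"))))
```

With a real file you would pass open(path, "rb") instead of the BytesIO
object. Note that under this definition UTF-16 text still counts as binary
because of its null bytes, so you would have to decode it first.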


-- 
Steven


