Finding nonprintable characters?
Steven Majewski
sdm7g at Virginia.EDU
Tue Feb 19 14:35:40 EST 2002
On Tue, 19 Feb 2002, VanL wrote:
>
> I have a function
>
> isBinary(filehandle)
>
> that I'm not sure how to implement. I've decided to define binary as
> containing characters above \x80. But what is the best way to do this?
>
> 1. iterate through xreadlines(), so the whole thing doesn't get loaded into
> memory?
I would use file.read( bytes ) -- if it's binary, then you probably
don't need to read the whole file in. Most programs I've seen that
try to determine 'binaryness' only check the first N bytes anyway.
( I've seen some that want a certain percentage of non-printing chars
per block -- not just a single out of range char. )
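A minimal Python 3 sketch of that first-N-bytes approach follows. It implements the isBinary function from the question under the hypothetical name is_binary. It assumes the file was opened in binary mode (open(path, 'rb')), and the default blocksize of 1024 is an arbitrary choice:

```python
import io

def is_binary(f, blocksize=1024):
    """Return True if the first blocksize bytes of f contain a byte above 0x7f.

    Assumes f is opened in binary mode, so read() returns bytes and
    iterating over them yields ints in range(256).
    """
    block = f.read(blocksize)  # only the first block is read, not the whole file
    return any(b > 0x7f for b in block)

# In-memory files stand in for real ones here.
print(is_binary(io.BytesIO(b"plain ASCII text\n")))     # False
print(is_binary(io.BytesIO(b"\x89PNG\r\n\x1a\n....")))  # True
```

Checking a single block keeps memory use constant regardless of file size. The tradeoff is that a file whose first high byte appears after blocksize bytes is reported as text.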
> 2. String searching? If so, for what string? Searching for anything
> greater than \x7f?
>
> 3. Re searching? for what class?
>
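For the regular-expression route in question 3, one suitable class is a range covering bytes 0x80 through 0xff. The following is a minimal Python 3 sketch. It assumes binary-mode input, which requires a bytes pattern. The helper name has_high_byte and the 1024-byte default are hypothetical:

```python
import io
import re

# Character class matching any single byte from 0x80 to 0xff.
# A bytes pattern is required because binary-mode reads return bytes.
HIGH_BYTE = re.compile(rb"[\x80-\xff]")

def has_high_byte(f, blocksize=1024):
    """Return True if the first blocksize bytes of binary-mode file f
    contain any byte in the range 0x80-0xff."""
    return HIGH_BYTE.search(f.read(blocksize)) is not None

print(has_high_byte(io.BytesIO(b"hello")))        # False
print(has_high_byte(io.BytesIO(b"caf\xc3\xa9")))  # True
```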
How about something like:
filter( lambda c: ord(c) > 0x7f, file.read( blocksize ) )
or, as you note, skip the ord() call and compare each character directly
against an octal or hex string literal ('\177' or '\x7f'). With a list
comprehension it would be something like:
[ c for c in file.read( blocksize ) if c > '\x7f' ]
but a list comprehension gives you a list, while filter() on a string
yields a string. If you want a ratio rather than a single out-of-range
char, divide the length of the filtered value by the length of the block
actually read (convert to float first to avoid integer division; the
block can be shorter than blocksize if the file is small).
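In Python 3 the filter-on-a-string trick behaves differently. filter() always returns an iterator, and iterating over bytes yields ints. The following is a sketch of the ratio check under those rules. The names high_byte_ratio and looks_binary are hypothetical, and the 0.3 threshold is an arbitrary illustrative cutoff rather than a standard value:

```python
import io

def high_byte_ratio(f, blocksize=1024):
    """Fraction of bytes above 0x7f in the first block read from binary-mode f.

    Divides by the number of bytes actually read, which can be smaller
    than blocksize for short files; an empty file gives 0.0.
    """
    block = f.read(blocksize)
    if not block:
        return 0.0
    high = sum(1 for b in block if b > 0x7f)  # bytes iterate as ints
    return high / len(block)  # true division in Python 3, no float() needed

def looks_binary(f, blocksize=1024, threshold=0.3):
    # threshold is an illustrative assumption; tune it for your data
    return high_byte_ratio(f, blocksize) > threshold

print(high_byte_ratio(io.BytesIO(b"ab\x80\xff")))     # 0.5
print(looks_binary(io.BytesIO(b"mostly text \xe9")))  # False
```

A single accented character in otherwise plain text stays well under the cutoff. This is the point of using a ratio instead of flagging the first out-of-range byte.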
-- Steve Majewski
More information about the Python-list mailing list