Finding nonprintable characters?
Gustavo Cordova
gcordova at hebmex.com
Tue Feb 19 14:13:40 EST 2002
>
> Hello,
>
Gweepnings.
>
> I have a function
>
> isBinary(filehandle)
>
Shades of Perl's "if(-b $filename)" !!!!
>
> that I'm not sure how to implement.
>
damn. :-(
> I've decided to define binary as
> containing characters above \x80. But what is the best way
> to do this?
>
> 1. iterate through xreadline, so the whole thing doesn't get
> loaded into memory?
>
def isBinary(filehandle):
    # Save current position.
    lastPos = filehandle.tell()
    # Search for binary chars.
    line = filehandle.readline()
    while line:
        if .... (find a char > \x7F) how??
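One possible way to fill in that missing test, as a rough sketch only: it assumes a modern Python 3, a seekable file opened in binary mode ('rb'), and the made-up name is_binary_by_lines. Iterating over a bytes line yields integers, so each byte can be compared against 0x80 directly.

```python
import io

def is_binary_by_lines(filehandle):
    """Return True if any byte >= 0x80 shows up; file must be binary-mode and seekable."""
    last_pos = filehandle.tell()            # save current position
    try:
        for line in filehandle:             # one line at a time, not the whole file
            if any(byte >= 0x80 for byte in line):
                return True
        return False
    finally:
        filehandle.seek(last_pos)           # put the file pointer back

print(is_binary_by_lines(io.BytesIO(b"plain ascii\n")))   # False
print(is_binary_by_lines(io.BytesIO(b"caf\xe9\n")))       # True
```

One caveat: a binary file with no newlines at all arrives as a single huge "line", so this does not really bound memory use.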
>
> 2. String searching? If so, for what string? Searching for anything
> greater than \x7f?
>
But where??
In the first line?
Char by char?
In the first n Kb of text?
>
> 3. Re searching? for what class?
>
A class like [\x80-\xFF], I'd think.
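A small sketch of such a class in use, assuming Python 3, the standard re module, and data read as bytes. The class here covers 0x80 through 0xFF; DEL (\x7F) is still 7-bit ASCII, so it is left out. The names HIGH_BYTE and has_high_bytes are made up for the example.

```python
import re

# Matches any single byte in the range 0x80-0xFF, i.e. outside 7-bit ASCII.
HIGH_BYTE = re.compile(rb'[\x80-\xff]')

def has_high_bytes(data):
    """Return True if the bytes object contains at least one byte above 0x7F."""
    return HIGH_BYTE.search(data) is not None

print(has_high_bytes(b"just text\n"))    # False
print(has_high_bytes(b"\x7f"))           # False (DEL is not above 0x7F)
print(has_high_bytes(b"abc\x80def"))     # True
```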
>
> Thanks in advance,
>
> Van
>
My suggestion:
1. read a block of the file, say, the first 2Kb.
2. Scan with a regex like r'[\x80-\xFF]'.
3. If no chars found, then it's text, else it's binary.
import re

def isBinary(filehandle, blockSize=2048):
    start = filehandle.read(blockSize)
    filehandle.seek(0)
    # Check for "binary" chars.
    if re.search(r'[\x80-\xFF]', start):
        # Sure enough, it's one of them dastardly BINARY files!
        return 1
    # Wait! Is there at least ONE \n in the text?
    if not re.search(r'\n', start):
        # Shuks, seem'd decent enough, but no newline: call it binary.
        return 1
    # OK, you're good, I guess: it's text.
    return None
More or less. I added the '\n' requirement for "textyness",
because 2Kb without a single newline doesn't seem quite
texty to me. Of course, you might think differently.
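For anyone trying this on a current Python 3, here is a hedged sketch of the same block-scan idea. It assumes the file is opened in binary mode ('rb') and is seekable. It also deviates from the above in two ways that are my own assumptions: it restores the caller's position rather than rewinding to 0, and it only applies the newline rule when a full block was read, so short or empty files are not flagged just for lacking a '\n'.

```python
import io
import re

HIGH_BYTE = re.compile(rb'[\x80-\xff]')   # any byte outside 7-bit ASCII

def is_binary(filehandle, block_size=2048):
    """Guess from the first block whether a binary-mode, seekable file is binary."""
    last_pos = filehandle.tell()           # remember where the caller was
    filehandle.seek(0)
    start = filehandle.read(block_size)    # only the first block is examined
    filehandle.seek(last_pos)              # restore the caller's position
    if HIGH_BYTE.search(start):
        return True                        # found a "binary" byte
    if len(start) == block_size and b'\n' not in start:
        return True                        # a full block with no newline: not texty
    return False

print(is_binary(io.BytesIO(b"hello\nworld\n")))   # False
print(is_binary(io.BytesIO(b"\x89PNG\r\n")))      # True
print(is_binary(io.BytesIO(b"a" * 4096)))         # True
```

With a real file this would be called as is_binary(open(path, 'rb')), where path is whatever file you want to test.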
Good luck :-)
-gustavo
-gus