Detecting Binary content in files

ritu ritu_bhandari27 at yahoo.com
Tue Mar 31 17:26:08 EDT 2009


On Mar 31, 10:19 am, Josh Dukes <josh.du... at microvu.com> wrote:
> There might be another way but off the top of my head:
>
> #!/usr/bin/env python
>
> def isbin(filename):
>    fd=open(filename,'rb')
>    for b in fd.read():
>        if ord(b) > 127:
>            fd.close()
>            return True
>    fd.close()
>    return False
>
> for f in ['/bin/bash', '/etc/passwd']:
>    print "%s is binary: " % f, isbin(f)
>
> Of course this would detect Unicode files as being binary and maybe
> that's not what you want. How are you thinking about doing it in
> Perl exactly?

With Perl, I'm thinking of doing something like the following:

if ( ( -B $filename ||
       $filename =~ /\.pdf$/ ) &&
     -s $filename > 0 ) {
    return(1);
}

My isbin method should return true for any file that isn't entirely
ASCII text, so for my purposes classifying a Unicode file as 'binary'
would be fine. Thanks very much for your response.
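
For reference, here is a minimal sketch of how the Perl condition above
might be mirrored in Python 3, assuming "binary" simply means "contains
at least one byte above 127". The names has_non_ascii and looks_binary
and the 64 KiB chunk size are made up for illustration. It does not try
to reproduce whatever heuristic Perl's -B uses internally; it only
applies the strict ASCII test described here, plus the .pdf and
non-empty checks.

```python
import os


def has_non_ascii(path, chunk_size=64 * 1024):
    """Return True if any byte in the file is above 127 (hypothetical helper)."""
    with open(path, "rb") as fd:
        # Read in chunks so a large file is never loaded into memory at once.
        for chunk in iter(lambda: fd.read(chunk_size), b""):
            # Iterating over bytes in Python 3 yields ints, so no ord() is needed.
            if any(byte > 127 for byte in chunk):
                return True
    return False


def looks_binary(path):
    """Mirror the Perl test: (non-ASCII content or a .pdf name) and non-empty."""
    if os.path.getsize(path) == 0:
        return False
    return path.endswith(".pdf") or has_non_ascii(path)
```

Something like looks_binary('/etc/passwd') would then be the call to
make from the main script. Note that control characters such as NUL are
below 128, so this test still counts them as ASCII.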

>
> On Tue, 31 Mar 2009 09:23:05 -0700 (PDT)
>
> ritu <ritu_bhandar... at yahoo.com> wrote:
> > Hi,
>
> > I'm wondering if Python has a utility to detect binary content in
> > files? Or if anyone has any ideas on how that can be accomplished? I
> > haven't been able to find any useful information to accomplish this
> > (my other option is to fire off a Perl script from within my Python
> > script that will tell me whether the file is binary), so any pointers
> > will be appreciated.
>
> > Thanks,
> > Ritu
>
> --
>
> Josh Dukes
> MicroVu IT Department



