Checking Common File Types

Sun Dec 1 22:05:42 EST 2013

On Monday, December 2, 2013 5:11:15 AM UTC+5:30, jade wrote:
> > To: pytho... at python.org
> > From: wlf... at ix.netcom.com
> > Subject: Re: Checking Common File Types
> > Date: Sun, 1 Dec 2013 18:23:22 -0500
> > 
> > On Sun, 1 Dec 2013 18:27:16 +0000, jade <jade... at msn.com> declaimed the
> > following:
> > 
> > >Hello,
> > >I'm trying to create a script that checks all the files in my 'downloaded' directory against common file types and then tells me how many of the files in that directory are neither GIF nor JPG files. I'm familiar with basic Python, but this is the first time I've attempted anything like this, and I'm looking for a little help or a point in the right direction.
> > >
> > >file_sigs = {'\xFF\xD8\xFF':('JPEG','jpg'),  '\x47\x49\x46':('GIF','gif')}
> > 
> > 	Apparently you presume the file extensions are inaccurate, as you are
> > digging into the files for signatures.
> > 
> > >def readFile():
> > >    filename = r'c:/temp/downloads'
> > >    fh = open(filename, 'r')
> > >    file_sig = fh.read(4)
> > >    print '[*] check_sig() File:',filename #, 'Hash Sig:', binascii.hexlify(file_sig)
> > 
> > 	Note: if you are hardcoding forward slashes, you don't need the raw
> > indicator...
> > 
> > 	That said, what is "c:/temp/downloads"? You apparently are opening IT
> > as the file to be examined. Is it supposed to be a directory containing
> > many files, a file containing a list of files, ???
> > 
> > 	What is "check_sig" -- it looks like a function you haven't defined --
> > but it's inside the quotes making a string literal that will never be
> > called anyway.
> > 
> > 	If you are just concerned with one directory of files, you might want
> > to read the help file on the glob module, along with os.path
> > (join/splitext/etc). Or just string methods...
> > 
> > >>> import glob
> > >>> import os.path
> > >>> TARGET = os.path.join(os.environ["USERPROFILE"],
> > ... 	"documents/BW-conversion/*")
> > >>> files = glob.glob(TARGET)
> > >>> for fn in files:
> > ... 	fp, fx = os.path.splitext(fn)
> > ... 	print "File %s purports to be of type %s" % (fn, fx.upper())
> > ... 
> > File C:\Users\Wulfraed\documents/BW-conversion\BW-1.jpg purports to be of type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\BW-2.jpg purports to be of type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\BW-3.jpg purports to be of type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\BW-4.jpg purports to be of type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\BWConv.html purports to be of type .HTML
> > File C:\Users\Wulfraed\documents/BW-conversion\roo_b1.jpg purports to be of type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\roo_b2.jpg purports to be of type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\roo_b3.jpg purports to be of type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\roo_b4.jpg purports to be of type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\roo_b5.jpg purports to be of type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\roo_b6.jpg purports to be of type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\roo_col.jpg purports to be of type .JPG
> > >>> 
> > -- 
> > 	Wulfraed                 Dennis Lee Bieber         AF6VN
> >     wlf... at ix.netcom.com    HTTP://wlfraed.home.netcom.com/
> > 
> > -- 
> > https://mail.python.org/mailman/listinfo/python-list
>
>
>
> Hi, thanks for all your replies. I realised soon after asking for help that I was trying to read the wrong number of bytes, so I set about completely rewriting my code (after a coffee break).
>
> import sys, os, binascii
>
> def readfile():
>     dictionary = {'474946':('GIF', 'gif'), 'ffd8ff':('JPEG', 'jpeg')}
>     try:
>         files = os.listdir('C:\\Temp\\downloads')
>         for item in files:
>             f = open('C:\\Temp\\downloads\\'+ item, 'r')
>             file_sig = f.read(3)
>             file_sig_hex = binascii.hexlify(file_sig)
>
>             if file_sig_hex in dictionary:
>                 print item + ' is a image file, it is a ' + file_sig
>             else:
>                 print item + ' is not an image file, it is' +file_sig
>
>             print file_sig_hex
>
>     except:
>         print 'Error. Try again'
>
>     finally:
>         if 'f' in locals():
>             f.close()
>
> def main():
>     readfile()
>
> if __name__ == '__main__':
>     main()
>
> Right now my script prints 'Error. Try again', but when I comment out this part of the code:
>
>             if file_sig_hex in dictionary:
>                 print item + ' is a image file' + dictionary
>             else:
>                 print item + ' is not an image file, is it' +dictionary
>
>
> it prints the file signatures to the screen. What I'm trying to do with the if statement is have it tell me whether the file is an image and give me its signature; if it is not an image, I want it to tell me so, still give me its signature, and tell me what type of file it is. Can anyone point out an obvious error?
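Dennis's remark that the extensions are apparently not trusted suggests a complementary check: compare the type a file's extension claims (via `os.path.splitext`, as in his example) with the type its first bytes actually indicate. This is a minimal sketch written for Python 3 (the thread's code is Python 2). It opens files in binary mode and compares raw bytes directly instead of hexlified strings. The `EXTENSIONS` table and the names `compare` and `mismatches` are hypothetical, and only the JPEG (`FF D8 FF`) and GIF (`GIF`) signatures from the thread are covered.

```python
import os

# Leading bytes of the two formats discussed in the thread.
SIGNATURES = {b"\xff\xd8\xff": "JPEG", b"GIF": "GIF"}
# Hypothetical table: which type each extension claims to be.
EXTENSIONS = {".jpg": "JPEG", ".jpeg": "JPEG", ".gif": "GIF"}


def compare(name, header):
    """Return (claimed_type, actual_type) for a file name and its first bytes."""
    claimed = EXTENSIONS.get(os.path.splitext(name)[1].lower())
    actual = SIGNATURES.get(header[:3])  # None if no known signature matches
    return claimed, actual


def mismatches(directory):
    """Yield (name, claimed, actual) for regular files whose extension lies."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # listdir also returns subdirectories
        with open(path, "rb") as f:  # binary mode: signatures are raw bytes
            claimed, actual = compare(name, f.read(3))
        if claimed != actual:
            yield name, claimed, actual
```

A file such as `BWConv.html` yields `(None, None)` and is not reported, since it neither claims nor turns out to be an image.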

You are catching all exceptions; that throws away all the debugging finesse that Python offers you. Don't.
http://stackoverflow.com/questions/10594113/bad-idea-to-catch-all-exceptions-in-python
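A plausible culprit in the code above: the variant quoted near the end concatenates a string with the `dictionary` dict, which raises `TypeError`, and the bare `except:` reports that (or any other failure, such as `open()` being handed a subdirectory) as the same 'Error. Try again'. This is a minimal sketch, assuming Python 3, that catches only `OSError` around the file access and builds messages with `str.format`, so a genuine bug still produces a full traceback. `describe` and `report` are hypothetical names.

```python
import os

SIGNATURES = {b"\xff\xd8\xff": "JPEG", b"GIF": "GIF"}


def describe(name, header):
    """Build the report line for one file from its first bytes."""
    kind = SIGNATURES.get(header[:3])
    # str.format converts its arguments, so mixing str, bytes and dict here
    # cannot raise the TypeError that '+' concatenation does.
    if kind is not None:
        return "{} is an image file, it is a {}".format(name, kind)
    return "{} is not an image file, signature {}".format(name, header.hex())


def report(directory):
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        try:
            with open(path, "rb") as f:  # closed automatically, even on error
                header = f.read(3)
        except OSError as exc:
            # Only I/O problems (subdirectories, permissions) are expected;
            # every other exception propagates with its traceback.
            print("skipping {}: {}".format(name, exc))
            continue
        print(describe(name, header))
```

The `with` block also closes every file, whereas the `finally` clause in the original only closes the last one opened.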

On a different note: you seem to be using Google Groups, which causes some nuisance for other people on the list:
https://wiki.python.org/moin/GoogleGroupsPython

Here's a more automated solution; see my post here:
https://groups.google.com/forum/#!topic/comp.lang.python/Cf6adRN3KGs
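The linked post is not reproduced here. Independently of it, a minimal Python 3 sketch of the task from the original question (counting how many files in a directory are neither GIF nor JPEG, judged by signature rather than extension) might look like this. The names `classify`, `read_headers` and `tally` are hypothetical.

```python
import os
from collections import Counter

# Leading bytes ("magic numbers") of the two formats in the question.
SIGNATURES = {b"\xff\xd8\xff": "JPEG", b"GIF": "GIF"}


def classify(header):
    """Map the first bytes of a file to 'JPEG', 'GIF', or 'other'."""
    return SIGNATURES.get(header[:3], "other")


def read_headers(directory):
    """Yield the first three bytes of every regular file in directory."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():  # skip subdirectories
                with open(entry.path, "rb") as f:  # binary mode
                    yield f.read(3)


def tally(headers):
    """Count how many headers fall into each category."""
    return Counter(classify(h) for h in headers)
```

Usage, with the directory from the thread: `tally(read_headers(r"C:\Temp\downloads"))["other"]` gives the number of files that are neither GIF nor JPEG. Keeping `classify` and `tally` free of file I/O makes them easy to test on byte strings.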