How to determine the MIME type without the file name extension?

Phoe6 orsenthil at gmail.com
Tue Jul 18 09:53:25 EDT 2006


Hi all,
         I had a filesystem crash, and when I recovered the data the
files had random names without extensions. I decided to write a
script to determine each file's type and create a copy of the file
with the matching extension.
---
Method 1:
# File extension utility.

import os
import mimetypes
import shutil

def main():

    for root,dirs,files in os.walk(r'C:\Senthil\test'):
        for each in files:
            fname = os.path.join(root,each)
            print fname
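            mtype,entype = mimetypes.guess_type(fname)
            # guess_type() returns (None, None) when the name gives no hint,
            # and guess_extension() cannot handle a type of None.
            fext = None
            if mtype is not None:
                fext = mimetypes.guess_extension(mtype)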
            if fext is not None:
                try:
                    newname = fname + fext
                    print newname
                    shutil.copyfile(fname,newname)
                except (IOError,os.error), why:
                    print "Can't copy %s to %s: %s" % (fname,newname,str(why))


if __name__ == "__main__":
    main()

----
The problem with this script is that mimetypes.guess_type(filename)
fails when the file name has no extension: it guesses from the name
alone and never looks at the file contents.
How do I get around this problem?
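
One way around a name-based guess is to look at the first bytes of each
file instead. The following is a minimal sketch of that idea, not a
replacement for mimetypes. The signature table, the function name
guess_ext_from_content and the '.ole' placeholder extension are
assumptions made for illustration, and the table covers only the
formats mentioned in this post. The sketch targets Python 3.

```python
# Leading-byte signatures for the formats discussed in this post.
# Illustrative only: real-world detection needs a much larger table.
SIGNATURES = [
    (b"%PDF-", ".pdf"),
    (b"{\\rtf", ".rtf"),
    # OLE2 compound file header, shared by legacy .doc, .xls and .ppt,
    # so '.ole' is only a placeholder (hypothetical) extension here.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", ".ole"),
]


def guess_ext_from_content(path):
    """Return an extension based on the file's first bytes, or None."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    for magic, ext in SIGNATURES:
        if head.startswith(magic):
            return ext
    return None
```

Inside the os.walk loop, guess_ext_from_content(fname) could stand in
for the guess_type/guess_extension pair; a return value of None means
the file should be left alone.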

As it was a Linux box, I tried using the file command to get the work done.
----
Method 2:

import os
import shutil
import re

def detext(filename):
	cin,cout,cerr = os.popen3('file ' + filename)
	fileoutput = cout.read()
	rtf = re.compile('Rich Text Format data')
	doc = re.compile('Microsoft Office Document')
	pdf = re.compile('PDF')

	if rtf.search(fileoutput) is not None:
		shutil.copyfile(filename,filename + '.rtf')
	if doc.search(fileoutput) is not None:
		shutil.copyfile(filename,filename + '.doc')

	if pdf.search(fileoutput) is not None:
		shutil.copyfile(filename,filename + '.pdf')

def main():
	for root,dirs,files in os.walk(os.getcwd()):
		for each in files:
			fname = os.path.join(root,each)
			detext(fname)

if __name__ == '__main__':
	main()

----
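Building the command as a single string ('file ' + filename) breaks
when a recovered file name contains spaces or quotes. A hedged sketch
of the same call with an argument list, written for Python 3, is given
here. The names describe, ext_for and KNOWN are hypothetical, and the
'-b' flag (brief output, without the file name prefix) assumes the
common Unix file utility.

```python
import subprocess

# Substrings of the `file` description mapped to extensions, mirroring
# the regular expressions used in Method 2.
KNOWN = [
    ("Rich Text Format", ".rtf"),
    ("PDF", ".pdf"),
]


def describe(path, command=("file", "-b")):
    """Return the text that `command` prints for `path`."""
    # A list of arguments bypasses the shell, so spaces or quotes in
    # the file name cannot split or alter the command line.
    result = subprocess.run(
        list(command) + [path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.strip()


def ext_for(description):
    """Map a description string to an extension, or None if unknown."""
    for needle, ext in KNOWN:
        if needle in description:
            return ext
    return None
```

A call such as ext_for(describe(fname)) would then replace the body of
detext, with the copy performed only when the result is not None.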
The problem with using file is that it reported both .xls (MS Excel)
and .doc (MS Word) files as Microsoft Word Document. I need to tell
the .xls and .doc files apart, and I don't know whether file will be
helpful here.
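
Legacy .doc and .xls files share the same OLE2 compound file
container, which is why the leading bytes alone cannot separate them.
One heuristic, sketched below for Python 3, is to look for the main
stream name inside the container: Word files carry a stream called
"WordDocument", Excel files one called "Workbook" (or "Book" in older
versions), and these names are stored as UTF-16LE. Scanning the raw
bytes for the names is an assumption-laden shortcut rather than a real
compound file parser, and guess_office_ext is a hypothetical name.

```python
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"

# Stream names as they appear (UTF-16LE) in the container's directory.
WORD_STREAM = "WordDocument".encode("utf-16-le")
EXCEL_STREAMS = (
    "Workbook".encode("utf-16-le"),
    "Book".encode("utf-16-le"),
)


def guess_office_ext(path):
    """Return '.doc', '.xls', or None if the file is not recognized."""
    # Reads the whole file into memory, which is fine for a sketch but
    # may be too costly for very large files.
    with open(path, "rb") as handle:
        data = handle.read()
    if not data.startswith(OLE_MAGIC):
        return None
    if WORD_STREAM in data:
        return ".doc"
    if any(name in data for name in EXCEL_STREAMS):
        return ".xls"
    return None
```

Because this is a plain byte search, a file that merely contains one
of these names in its text could be misclassified, so the results are
worth spot-checking before the originals are deleted.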

--
If the first approach, with mimetypes, can be made to work, that would be great!
Has anyone faced this problem? How did you solve it?

thanks,
Senthil

http://phoe6.livejournal.com



