Howto Determine mimetype without the file name extension?
Justin Azoff
justin.azoff at gmail.com
Tue Jul 18 11:00:16 EDT 2006
Phoe6 wrote:
> Hi all,
> I had a filesystem crash and when I retrieved the data back
> the files had random names without extensions. I decided to write a
> script to determine the file extension and create a new file with
> the extension.
[...]
> but the problem with using file was it recognized both .xls (MS Excel)
> and .doc (MS Word) as Microsoft Word Document only. I need to separate
> the .xls and .doc files, I don't know if file will be helpful here.
You may want to try the gnome.vfs module:
    info = gnome.vfs.get_file_info(filename,
                                   gnome.vfs.FILE_INFO_GET_MIME_TYPE)
    info.mime_type  # mime type
If all of your documents are .xls and .doc, you could also use one of
the CLI tools that convert .doc to text, like catdoc. These tools will
fail on an .xls document, so you can run one and check its output: .doc
files would output a lot, while .xls files would output an error or nothing.
The gnome.vfs module is probably your best bet though :-)
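For what it's worth, the catdoc check could be sketched roughly as below. This is only a sketch in Python 3, not something I have run against real .doc/.xls files: the function names (converts_to_text, guess_doc_or_xls) and the min_chars threshold are made up for illustration, and it simply assumes the converter behaves as described above (plenty of output for .doc, an error or nothing for .xls).

```python
import subprocess


def converts_to_text(filename, command=("catdoc",), min_chars=1):
    """Return True if `command filename` exits cleanly and prints some text.

    `command` is the converter to run; catdoc is assumed here, but any
    tool that takes a file name as its last argument would do.
    """
    try:
        result = subprocess.run(
            list(command) + [filename],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The converter is not installed or could not be started.
        return False
    text = result.stdout.strip()
    return result.returncode == 0 and len(text) >= min_chars


def guess_doc_or_xls(filename, command=("catdoc",)):
    """Guess 'doc' if the converter produced text, otherwise 'xls'.

    Only meaningful when every file is known to be one of the two.
    """
    return "doc" if converts_to_text(filename, command) else "xls"
```

You would call guess_doc_or_xls(fname) only on files that file already reported as Microsoft Office documents, and probably want to raise min_chars if the converter prints a little noise for spreadsheets.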
Additionally, I would reorganize your program a bit. Something like:
    import os
    import re
    import subprocess

    types = (
        ('rtf', 'Rich Text Format data'),
        ('doc', 'Microsoft Office Document'),
        ('pdf', 'PDF'),
        ('txt', 'ASCII English text'),
    )

    def get_magic(filename):
        pipe = subprocess.Popen(['file', filename], stdout=subprocess.PIPE)
        output = pipe.stdout.read()
        pipe.wait()
        return output

    def detext(filename):
        fileoutput = get_magic(filename)
        for ext, pattern in types:
            if pattern in fileoutput:
                return ext

    def allfiles(path):
        # Walk the directory that was passed in, not the current directory.
        for root, dirs, files in os.walk(path):
            for each in files:
                fname = os.path.join(root, each)
                yield fname

    def fixnames(path):
        for fname in allfiles(path):
            extension = detext(fname)
            print fname, extension #....

    def main():
        path = os.getcwd()
        fixnames(path)

    if __name__ == '__main__':
        main()
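The program above only prints what it finds. One possible way to do the actual renaming is sketched below, in Python 3. The helper name add_extension is hypothetical, and the policy of skipping files with no detected extension or with a name clash is just my assumption about what you would want.

```python
import os


def add_extension(fname, extension):
    """Rename fname to fname.<extension>.

    Returns the new name, or None if nothing was renamed (no extension
    was detected, or the target name is already taken).
    """
    if not extension:
        return None
    new_name = "%s.%s" % (fname, extension)
    if os.path.exists(new_name):
        # Never overwrite an existing file.
        return None
    os.rename(fname, new_name)
    return new_name
```

Inside fixnames you would then call add_extension(fname, extension) in place of the print. I would do a dry run that only prints the planned renames first, since this is recovered data.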
Short functions that just do one thing are always best.
To change that to use gnome.vfs, just change the types list to be a
dictionary like
    types = {
        'application/msword': 'doc',
        'application/vnd.ms-powerpoint': 'ppt',
    }
and then
    import gnome.vfs

    def get_mime(filename):
        info = gnome.vfs.get_file_info(filename,
                                       gnome.vfs.FILE_INFO_GET_MIME_TYPE)
        return info.mime_type

    def detext(filename):
        mime_type = get_mime(filename)
        return types.get(mime_type)
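If gnome.vfs is not available, the same dictionary lookup should also work with the MIME type reported by file itself. The sketch below (Python 3) swaps gnome.vfs for `file -b --mime-type`; that is my substitution, not the approach above, and I can't promise file separates .xls from .doc any better in this mode. The 'application/vnd.ms-excel' entry and the get_mime parameter on detext are additions for illustration.

```python
import subprocess

TYPES = {
    "application/msword": "doc",
    "application/vnd.ms-powerpoint": "ppt",
    "application/vnd.ms-excel": "xls",  # added for illustration
}


def get_mime(filename):
    """Ask the file command for the bare MIME type of filename."""
    output = subprocess.check_output(["file", "-b", "--mime-type", filename])
    return output.decode("ascii", "replace").strip()


def detext(filename, get_mime=get_mime):
    """Map the file's MIME type to an extension, or None if unknown."""
    return TYPES.get(get_mime(filename))
```

Passing get_mime in as a parameter means you can switch between file and gnome.vfs (or a fake, for testing) without touching detext.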
--
- Justin