Determining when a file is an Open Office Document

Robert Marshall spam at chezmarshall.freeserve.co.uk
Fri Jan 19 16:41:53 EST 2007


On Fri, 19 Jan 2007, Steven D'Aprano wrote:

> On Fri, 19 Jan 2007 12:22:04 +1100, Ben Finney wrote:
> 
>> tubby <tubby at bandaheart.com> writes:
>> 
>>> Silly question, but here goes... what's a good way to determine
>>> when a file is an Open Office document? I could look at the file
>>> extension, but it seems there would be a better way.
>> <snip>
>> The Unix 'file' command determines the type of a file by its
>> contents, not its name. This functionality is essentially a
>> database of "magic" byte patterns mapping to file types,
> 
> Ah, another lousy, unreliable way to make a definite statement about
> the actual contents of a file. Looking at magic bytes inside a file
> is hardly bullet-proof (although file seems to be moderately
> reliable in practice, at least under Linux).
> 
> Simple example: is the file consisting of two bytes "\x09\x0A" meant
> to be a text file with a tab and a newline, or a binary file
> consisting of a single two-byte int? There's no way to tell just
> from the contents.  
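
To make that ambiguity concrete, here is a minimal Python sketch (my own
illustration, not from the quoted post; the function name
interpretations is made up) that reads the same two bytes three ways:

```python
import struct

def interpretations(data):
    """Return several equally valid readings of the same two bytes."""
    return {
        "text": data.decode("ascii"),               # read as ASCII text
        "uint16_be": struct.unpack(">H", data)[0],  # big-endian 16-bit int
        "uint16_le": struct.unpack("<H", data)[0],  # little-endian 16-bit int
    }

print(interpretations(b"\x09\x0a"))
```

All three readings succeed (a tab plus a newline, or the integers 2314
and 2569), so nothing in the bytes themselves says which one was
intended.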

And see, for example, the problem that development versions of Emacs
are (were?) having with C files that started with #define and were then
treated as graphics files!

http://thread.gmane.org/gmane.emacs.devel/64823/focus=65228
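
For the original question, a content-based check that does not depend
on a magic-number database might look like the sketch below. It assumes
the file is in the OpenDocument format, which as far as I know is a ZIP
archive holding a member named "mimetype" whose content starts with
application/vnd.oasis.opendocument. The function name odf_mimetype is
made up for illustration, and the older OpenOffice.org 1.x formats are
not handled.

```python
import zipfile

# Assumed prefix of the media type declared by OpenDocument files.
ODF_PREFIX = "application/vnd.oasis.opendocument."

def odf_mimetype(path):
    """Return the declared OpenDocument media type, or None if not ODF."""
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        try:
            raw = archive.read("mimetype")
        except KeyError:
            # A ZIP archive without a "mimetype" member (e.g. a plain .zip).
            return None
    declared = raw.decode("ascii", "replace").strip()
    return declared if declared.startswith(ODF_PREFIX) else None
```

Calling odf_mimetype("letter.odt") (a hypothetical file name) would
return the declared type for a Writer document and None for an ordinary
ZIP archive or a non-ZIP file. It is still only as trustworthy as the
file's own claim about itself, which is the point made above.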


Robert
-- 
La grenouille songe..dans son château d'eau
Links and things http://rmstar.blogspot.com/


