Determine file type (binary or text)

Karl Scalet news at yebu.de
Wed Aug 13 08:39:57 EDT 2003


Michael Peuser wrote:
> Hi,
> yes there is more than just Unix in the world ;-)
> Windows directories have no means to specify their contents type in any way.

That's even more true with Linux/Unix, since there is no need for
anything like line-terminator conversion.

> The approved method is using three-letter extensions, though this rule is
> not strictly followed (lots of files without extensions nowadays!)
> 
> When I had a similar problem I read 1000 characters, counted the number of
> <32 and >255 characters and classified the file as "binary" when this quota
> exceeded 20%. I have no idea whether it will work well with Chinese Unicode
> files or some funny depositories or project files that store uncompressed texts....
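
A minimal sketch of the sampling heuristic quoted above, for Python 3.
The names looks_binary, sample_size and threshold are made up for
illustration. It also assumes the file is read as raw bytes, so no value
can exceed 255; only control bytes below 32 are counted, and tab, line
feed, form feed and carriage return are treated as ordinary text (that
exclusion is an assumption, not part of the original description).

```python
def looks_binary(path, sample_size=1000, threshold=0.20):
    """Guess whether a file is binary by sampling its first bytes.

    Hypothetical helper: counts control bytes (< 32) other than common
    whitespace and compares their share of the sample to a threshold.
    """
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if not sample:
        return False  # assumption: treat an empty file as text
    # Control bytes that routinely appear in plain text.
    text_controls = b"\t\n\f\r"
    suspicious = sum(1 for b in sample if b < 32 and b not in text_controls)
    return suspicious / len(sample) > threshold
```

As the quoted message suspects, this is fragile for some encodings: ASCII
text stored as UTF-16 has a NUL in every other byte and would be
classified as binary.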

Based on the idea from Mr. "bromden", why not use mimetypes.MimeTypes()
and guess_type('file://...') and analyze the returned string?
This should work on Windows / Linux / Unix / whatever.
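
A rough sketch of that idea, assuming Python 3 and the module-level
mimetypes.guess_type(). The helper name guess_is_text is made up for
illustration. Note that guess_type() only looks at the file name or URL
(in practice its extension), never at the file contents, so a file
without an extension yields no answer.

```python
import mimetypes

def guess_is_text(filename):
    """Return True/False when the extension maps to a known MIME type,
    or None when mimetypes cannot guess (hypothetical helper)."""
    mime_type, encoding = mimetypes.guess_type(filename)
    if encoding is not None:
        # e.g. 'gzip' for notes.txt.gz: the bytes on disk are compressed
        return False
    if mime_type is None:
        return None  # unknown or missing extension
    return mime_type.startswith("text/")
```

A caller could fall back to inspecting the file contents whenever the
result is None.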


Karl


> 
> Kindly
> Michael P
> 
> "Sami Viitanen" <none at none.net> wrote in message
> news:v7p_a.1558$k4.32814 at news2.nokia.com...
> 
>>Works well in Unix but I'm making a script that works on both
>>Unix and Windows.
>>
>>Win doesn't have that 'file -bi' command.
>>
>>"bromden" <bromden at gazeta.pl.no.spam> wrote in message
>>news:bhd559$ku9$1 at absinth.dialog.net.pl...
>>
>>>>How can I check if a file is binary or text?
>>>
>>> >>> import os
>>> >>> f = os.popen('file -bi test.py', 'r')
>>> >>> f.read().startswith('text')
>>>1
>>>
>>>(btw, f.read() returns 'text/x-java; charset=us-ascii\n')
>>>
>>>--
>>>bromden[at]gazeta.pl
>>>
>>
>>
> 
> 




