[Python-ideas] os.path.isbinary

Mathias Panzenböck grosser.meister.morti at gmx.net
Wed Jul 31 23:20:14 CEST 2013


On 07/31/2013 07:15 PM, Clay Sweetser wrote:
>
> On Jul 31, 2013 12:22 PM, "Eli Bendersky" <eliben at gmail.com> wrote:
>  >
>  > On Wed, Jul 31, 2013 at 8:40 AM, Ryan <rymg19 at gmail.com> wrote:
>  >>
>  >> Here's something more interesting than my shlex idea.
>  >>
>  >> os.path is, pretty much, the Python FS toolbox, along with shutil. But there's one feature missing: checking whether a file is binary. It isn't hard; see http://code.activestate.com/recipes/173220/. But writing 50 lines of code for such a common task isn't really Pythonic.
>  >>
>  >> So...
>  >>
>  >> What if os.path had a binary checker that works just like isfile:
>  >> os.path.isbinary('/nothingness/is/eternal') # Returns boolean
> Besides the high chance of false positives, what makes this method (and the problem it tries to solve) so difficult is that binary files may contain large amounts of what is considered text, and text files may contain pieces of binary data.
> For example, consider a Windows executable file: much of the data in such a file is considered binary data, but there are defined sections where strings and text resources are stored. Any heuristic algorithm like the one mentioned will be insufficient in such cases.
> Although I can't think of a situation offhand where the opposite is true (binary data embedded in what is considered to be a text file), I'm pretty sure such a situation exists.

One could consider PDF to be such a format (text with embedded binary data).
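
To make the problem concrete, here is a minimal sketch of the kind of byte-counting heuristic being discussed. It is not the ActiveState recipe or Perl's actual implementation; the name looks_binary, the 30% threshold, and the set of "text" bytes are all assumptions made for illustration. The PDF-like sample is hand-made, not a valid PDF.

```python
def looks_binary(data: bytes, threshold: float = 0.30) -> bool:
    """Guess whether a block of bytes is binary (hypothetical helper).

    Assumptions: a NUL byte means binary; otherwise the block is binary
    if more than `threshold` of its bytes fall outside printable ASCII
    and a few common control characters. Non-ASCII text (e.g. UTF-8)
    therefore counts against "text" here.
    """
    if not data:
        return False  # treat an empty block as text
    if b"\x00" in data:
        return True
    text_bytes = bytes(range(0x20, 0x7F)) + b"\n\r\t\f\b"
    # translate(None, delete) removes every byte listed in `delete`,
    # leaving only the bytes we do not consider text.
    nontext = data.translate(None, text_bytes)
    return len(nontext) / len(data) > threshold


# A hand-made, PDF-like fragment: textual structure around a binary stream.
pdf_like = (
    b"%PDF-1.4\n1 0 obj\n<< /Length 8 >>\nstream\n"
    + bytes([0x89, 0xFE, 0x9C, 0xE2, 0xD3, 0xA1, 0xB7, 0xC4])
    + b"\nendstream\nendobj\n"
)

print(looks_binary(b"MZ\x90\x00\x03"))   # executable-like header with a NUL byte
print(looks_binary(b"hello world\n"))    # plain ASCII text
print(looks_binary(pdf_like))            # mixed content
```

With these assumptions the mixed fragment is classified as text, because the binary stream is small relative to the textual structure around it. Sample a different block of the same file and the answer could flip, which is exactly the weakness of such a heuristic.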

>  >
>  >
>  >
>  > Some time ago I put on a gas mask and dove into the Perl source code to figure out how its "is binary" and "is text" operators work: http://eli.thegreenplace.net/2011/10/19/perls-guess-if-file-is-text-or-binary-implemented-in-python/
>  >
>  > I would recommend against including such a simplistic heuristic in the Python stdlib.
>  >
>  > Eli
>  >
>  >

