How to know if a file is a text file

Philip Semanchuk philip at semanchuk.com
Sat Nov 14 12:51:30 EST 2009


On Nov 14, 2009, at 11:02 AM, Luca Fabbri wrote:

> Hi all.
>
> I'm looking for a way to load an arbitrary file from the
> system and determine whether it is plain text.
> The mimetypes module has some nice methods, but, for example, it
> doesn't work for files without an extension.

Hi Luca,
You have to define what you mean by a "text" file. It might seem
obvious, but it's not.

Do you mean just ASCII text? Or will you accept Unicode too? Unicode  
text can be more difficult to detect because you have to guess the  
file's encoding (unless it has a BOM; most don't).
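
To make the ASCII-versus-Unicode question concrete, here is a minimal sketch in Python 3. The function name guess_text_encoding and the decision to try only ASCII and UTF-8 when there is no BOM are assumptions made for illustration; a real definition of "text" might accept other encodings.

```python
import codecs

# UTF-32 marks are tested before UTF-16 because BOM_UTF32_LE begins
# with the same two bytes as BOM_UTF16_LE.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_text_encoding(data):
    """Return a codec name that decodes `data` strictly, or None.

    Hypothetical helper: it only knows about BOM-marked Unicode,
    plain ASCII, and BOM-less UTF-8.
    """
    # If a BOM is present, trust it and check that the rest decodes.
    for bom, name in _BOMS:
        if data.startswith(bom):
            try:
                data.decode(name)
                return name
            except UnicodeDecodeError:
                return None
    # No BOM: fall back to guessing, narrowest encoding first.
    for name in ("ascii", "utf-8"):
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            pass
    return None
```

Note that a successful decode is not proof of "text": control bytes such as NUL decode as ASCII without error, so this answers only the encoding part of the question.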

And do you need to verify that every single byte in the file is
"text"? If the file is 1 GB, do you still want to examine every
single byte?
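
If examining only the start of the file is acceptable, one possible heuristic looks like the sketch below (Python 3). The names sample_looks_like_text and looks_like_text, the 8192-byte sample size, the set of allowed control characters, and the 30% threshold are all assumptions chosen for illustration, not established rules.

```python
# Control characters commonly found in text files:
# BEL, BS, TAB, LF, FF, CR, ESC.
_ALLOWED_CONTROL = {7, 8, 9, 10, 12, 13, 27}


def sample_looks_like_text(sample, max_control_ratio=0.30):
    """Heuristically decide whether a bytes sample looks like text."""
    if not sample:
        return True  # assumption: an empty file counts as text
    if b"\x00" in sample:
        # NUL bytes are rare in ASCII or UTF-8 text. This also
        # rejects UTF-16 and UTF-32 text, which may not be wanted.
        return False
    control = sum(1 for b in sample if b < 32 and b not in _ALLOWED_CONTROL)
    return control / len(sample) < max_control_ratio


def looks_like_text(path, sample_size=8192):
    """Read at most `sample_size` bytes from `path` and apply the heuristic."""
    with open(path, "rb") as f:
        return sample_looks_like_text(f.read(sample_size))
```

Because only a prefix is read, a 1 GB file costs the same as a small one, but a file that starts with text and continues with binary data will be misclassified.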

If you give us your own (specific!) definition of what "text" means,  
or perhaps a description of the problem you're trying to solve, then  
maybe we can help you better.

Cheers
Philip


