Finding non ascii characters in a set of files

Larry Bates lbates at websafe.com
Fri Feb 23 10:44:40 EST 2007


Peter Bengtsson wrote:
> On Feb 23, 2:38 pm, b... at yahoo.com wrote:
>> Hi,
>>
>> I'm updating my program to Python 2.5, but I keep running into
>> encoding problems. I have no encoding declarations at the start of
>> any of my scripts. What I'd like to do is scan a directory and list
>> all the files in it that contain a non-ASCII character. How would I
>> go about doing this?
>>
> 
> How about something like this:
> content = open('file.py', 'rb').read()
> try:
>     content.decode('ascii')  # raises if any byte is outside 0-127
> except UnicodeDecodeError:
>     print("file.py contains non-ASCII characters")
> 
> 
The next problem will be that non-text files will contain non-ASCII
characters (bytes).  The other 'issue' is that the OP didn't say how
large the files were, so reading each file whole with .read() might be
a problem.
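
A minimal sketch that combines the original question with both caveats,
written for Python 3.7+ (for bytes.isascii()) rather than the thread's
Python 2.5. It walks the directory recursively with os.walk (recursion
is an assumption; the OP only said "a directory"), reads each file in
fixed-size binary chunks so a large file is never held in memory whole,
and skips any file whose first chunk contains a NUL byte, on the
heuristic assumption that such files are binary. That heuristic will
also skip UTF-16 text, which is full of NUL bytes. The names
has_non_ascii, find_non_ascii_files and the 64 KiB CHUNK_SIZE are
hypothetical, chosen for illustration.

```python
import os

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size; read files piecewise


def has_non_ascii(path, chunk_size=CHUNK_SIZE):
    """Return True if the file at `path` contains any byte >= 0x80.

    Files whose first chunk contains a NUL byte are treated as binary
    and reported as False (a heuristic, not a guarantee).
    """
    with open(path, "rb") as f:
        chunk = f.read(chunk_size)
        if b"\x00" in chunk:
            return False  # heuristic: a NUL byte suggests a binary file
        while chunk:
            if not chunk.isascii():  # bytes.isascii() needs Python 3.7+
                return True
            chunk = f.read(chunk_size)
    return False


def find_non_ascii_files(directory):
    """Yield paths under `directory` (recursively) with non-ASCII bytes."""
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                if has_non_ascii(path):
                    yield path
            except OSError:
                continue  # unreadable file or broken symlink: skip it
```

Usage would be along the lines of
`for path in find_non_ascii_files('.'): print(path)`. Checking raw bytes
rather than decoded text means a multi-byte UTF-8 sequence split across
a chunk boundary cannot hide a match, since every byte of such a
sequence is already >= 0x80.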

-Larry