Finding non ascii characters in a set of files
Tim Arnold
tiarno at sas.com
Fri Feb 23 12:17:46 EST 2007
"Peter Bengtsson" <peterbe at gmail.com> wrote in message
news:1172243566.906121.189930 at h3g2000cwc.googlegroups.com...
> On Feb 23, 2:38 pm, b... at yahoo.com wrote:
>> Hi,
>>
>> I'm updating my program to Python 2.5, but I keep running into
>> encoding problems. I have no encodings defined at the start of any of
>> my scripts. What I'd like to do is scan a directory and list all the
>> files in it that contain a non ascii character. How would I go about
>> doing this?
>>
>
> How about something like this:
> content = open('file.py').read()
> try:
>     content.encode('ascii')
> except UnicodeDecodeError:
>     print "file.py contains non-ascii characters"
>
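A minimal sketch of the directory scan the original question asks for, built on the whole-file check quoted above. It assumes Python 3, where opening a file in `'rb'` mode yields `bytes` and the check becomes `bytes.decode('ascii')`. The helper names `has_non_ascii` and `find_non_ascii_files` are hypothetical. Only the top level of the directory is scanned; subdirectories are skipped.

```python
import os


def has_non_ascii(path):
    """Return True if the file at *path* contains any byte outside 0-127."""
    # Binary mode: read raw bytes so no encoding is applied implicitly.
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return True
    return False


def find_non_ascii_files(directory):
    """List regular files directly inside *directory* that contain non-ASCII bytes."""
    found = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip subdirectories and other non-regular entries.
        if os.path.isfile(path) and has_non_ascii(path):
            found.append(path)
    return found
```

Usage: `for path in find_non_ascii_files("."): print(path)`. To descend into subdirectories as well, `os.walk` can replace `os.listdir`.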
Here's what I do (I need to know the line number).
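def checkfile(filename):
    # Read raw lines, then decode each one separately so that a bad
    # line cannot disturb the reading of the lines after it.
    f = open(filename, 'rb')
    try:
        lines = f.readlines()
    finally:
        f.close()
    print('Total lines: %d' % len(lines))
    for i, line in enumerate(lines):
        try:
            line.decode('ascii')
        except UnicodeDecodeError:
            # enumerate() counts from 0; report 1-based line numbers
            print('problem: line %d' % (i + 1))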
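A Python 3 sketch of the same per-line idea, returning positions instead of printing them so the result can be reused. The function name `non_ascii_positions` is hypothetical. It also reports the column of the first offending byte, taken from the `start` attribute of `UnicodeDecodeError`; since every byte before it is ASCII, that byte offset equals the character column in any ASCII-compatible encoding such as UTF-8 or Latin-1.

```python
def non_ascii_positions(path):
    """Return (line, column) pairs, both 1-based, for each line of *path*
    that contains a non-ASCII byte; column points at the first such byte."""
    positions = []
    # Binary mode yields each line as raw bytes, so nothing is decoded implicitly.
    with open(path, "rb") as f:
        for line_no, line in enumerate(f, start=1):
            try:
                line.decode("ascii")
            except UnicodeDecodeError as exc:
                # exc.start is the 0-based index of the first byte that failed.
                positions.append((line_no, exc.start + 1))
    return positions
```

Usage: `for line, col in non_ascii_positions("file.py"): print("file.py:%d:%d" % (line, col))` prints locations in the familiar `file:line:column` form.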
More information about the Python-list mailing list