Finding non ascii characters in a set of files

Tim Arnold tiarno at sas.com
Fri Feb 23 12:17:46 EST 2007


"Peter Bengtsson" <peterbe at gmail.com> wrote in message 
news:1172243566.906121.189930 at h3g2000cwc.googlegroups.com...
> On Feb 23, 2:38 pm, b... at yahoo.com wrote:
>> Hi,
>>
>> I'm updating my program to Python 2.5, but I keep running into
>> encoding problems. I have no encodings defined at the start of any of
>> my scripts. What I'd like to do is scan a directory and list all the
>> files in it that contain a non-ASCII character. How would I go about
>> doing this?
>>
>
> How about something like this:
> content = open('file.py').read()
> try:
>    content.encode('ascii')
> except UnicodeDecodeError:
>    print "file.py contains non-ascii characters"
>
Here's what I do (I need to know the line number).

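def checkfile(filename):
    # Read raw bytes so that each line is tested on its own; a decode
    # error on one line then cannot throw off the count for later lines.
    f = open(filename, 'rb')
    lines = f.readlines()
    f.close()
    print('Total lines: %d' % len(lines))
    for i, line in enumerate(lines):
        try:
            line.decode('ascii')
        except UnicodeDecodeError:
            # Report 1-based line numbers.
            print('problem: line %d' % (i + 1))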
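
Since the original question was about a whole directory, here is a rough sketch of how the per-line check could be applied to every file in one. The names non_ascii_lines and scan_directory are my own, made up for illustration, and it assumes you only want the regular files directly inside the directory (no recursion) and that checking raw bytes against ASCII is what you are after.

```python
import os

def non_ascii_lines(path):
    """Return the 1-based numbers of the lines in `path` that hold a non-ASCII byte."""
    bad = []
    f = open(path, 'rb')  # binary mode: test the raw bytes, line by line
    try:
        for number, line in enumerate(f):
            try:
                line.decode('ascii')
            except UnicodeDecodeError:
                bad.append(number + 1)
    finally:
        f.close()
    return bad

def scan_directory(directory):
    """Map each offending file name in `directory` to its list of bad line numbers."""
    report = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):  # skip subdirectories (assumption: no recursion wanted)
            bad = non_ascii_lines(path)
            if bad:
                report[name] = bad
    return report
```

Calling scan_directory('.') should give back a dict such as {'foo.py': [3, 17]} for the current directory, with clean files left out; os.walk could replace os.listdir if subdirectories matter.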





