Finding non-ASCII characters in a set of files

Fri Feb 23 15:13:25 EST 2007

"Marc 'BlackJack' Rintsch" <bj_666 at gmx.net> wrote in message 
news:pan.2007.02.23.18.41.03.317525 at gmx.net...
> In <ern7jr$rkn$1 at foggy.unx.sas.com>, Tim Arnold wrote:
>
>> Here's what I do (I need to know the line number).
>>
>> import os,sys,codecs
>> def checkfile(filename):
>>     f = codecs.open(filename,encoding='ascii')
>>
>>     lines = open(filename).readlines()
>>     print 'Total lines: %d' % len(lines)
>>     for i in range(0,len(lines)):
>>         try:
>>             l = f.readline()
>>         except:
>>             num = i+1
>>             print 'problem: line %d' % num
>>
>>     f.close()
>
> I see a `NameError` here.  Where does `i` come from?  And there's no need
> to read the file twice.  Untested:
>
> import os, sys, codecs
>
> def checkfile(filename):
>    f = codecs.open(filename,encoding='ascii')
>
>    try:
>        for num, line in enumerate(f):
>            pass
>    except UnicodeError:
>        print 'problem: line %d' % num
>
>    f.close()
>
> Ciao,
> Marc 'BlackJack' Rintsch

Well, I take it back: that code doesn't work, or at least it doesn't for 
my test case.
But thanks anyway; I'm sticking with my original code. The 'i' came from 
'for i in range'.
--Tim
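
For reference, here is a minimal sketch of another way to get the line
numbers, written for Python 3 rather than the Python 2 used in the thread.
It is not the code either poster ran. It assumes lines are separated by
'\n' and reads the file in binary mode, decoding each line on its own, so
a decoding error can only be raised for the line that actually holds the
offending byte. The names find_non_ascii_lines and report are made up for
this illustration.

```python
def find_non_ascii_lines(filename):
    """Return the 1-based numbers of all lines that contain non-ASCII bytes."""
    bad_lines = []
    # Binary mode: nothing is decoded until we ask for it, one line at a time.
    with open(filename, 'rb') as f:
        for num, raw in enumerate(f, 1):
            try:
                raw.decode('ascii')
            except UnicodeDecodeError:
                # Record the line and keep going, so every bad line is found.
                bad_lines.append(num)
    return bad_lines


def report(filenames):
    """Print the problem lines for each file in a set of files."""
    for name in filenames:
        for num in find_non_ascii_lines(name):
            print('%s: problem: line %d' % (name, num))
```

Calling report(sys.argv[1:]) (after import sys) would check every file
named on the command line. Unlike a version that stops at the first
exception, this one keeps scanning and lists every offending line.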