Finding non ascii characters in a set of files
Tim Arnold
tiarno at sas.com
Fri Feb 23 15:13:25 EST 2007
"Marc 'BlackJack' Rintsch" <bj_666 at gmx.net> wrote in message
news:pan.2007.02.23.18.41.03.317525 at gmx.net...
> In <ern7jr$rkn$1 at foggy.unx.sas.com>, Tim Arnold wrote:
>
>> Here's what I do (I need to know the line number).
>>
>> import os,sys,codecs
>> def checkfile(filename):
>>     f = codecs.open(filename,encoding='ascii')
>>
>>     lines = open(filename).readlines()
>>     print 'Total lines: %d' % len(lines)
>>     for i in range(0,len(lines)):
>>         try:
>>             l = f.readline()
>>         except:
>>             num = i+1
>>             print 'problem: line %d' % num
>>
>>     f.close()
>
> I see a `NameError` here. Where does `i` come from? And there's no need
> to read the file twice. Untested:
>
> import os, sys, codecs
>
> def checkfile(filename):
>     f = codecs.open(filename,encoding='ascii')
>
>     try:
>         for num, line in enumerate(f):
>             pass
>     except UnicodeError:
>         print 'problem: line %d' % num
>
>     f.close()
>
> Ciao,
> Marc 'BlackJack' Rintsch
Well, I take it back: that code doesn't work, or at least it doesn't for
my test case.
But thanks anyway; I'm sticking with my original code. The 'i' came from
'for i in range'.
--Tim
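
For anyone landing on this thread later, here is a minimal sketch of the
same check, written for Python 3 (an assumption; the code quoted above is
Python 2). Two things about the enumerate version can be read straight
from the code: enumerate counts from 0, so the reported number is one less
than the line number, and the loop ends at the first error, so later bad
lines are never reported. The sketch avoids both by reading the file as
bytes and decoding one line at a time. The name find_non_ascii_lines is
made up for this example and does not come from the thread.

```python
import sys


def find_non_ascii_lines(filename):
    """Return the 1-based numbers of the lines that contain non-ASCII bytes."""
    bad_lines = []
    # Open in binary mode so that reading the file itself can never fail
    # on a bad byte; each line is decoded separately below.
    with open(filename, 'rb') as f:
        for num, raw in enumerate(f, start=1):
            try:
                raw.decode('ascii')
            except UnicodeDecodeError:
                # Record the line and keep going, so every bad line is found.
                bad_lines.append(num)
    return bad_lines


if __name__ == '__main__':
    # Check every file named on the command line.
    for name in sys.argv[1:]:
        for num in find_non_ascii_lines(name):
            print('%s: problem: line %d' % (name, num))
```

Because the file is read only once and lines are split on the newline
byte, the reported numbers should match what a text editor shows for
files with ordinary line endings.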