Finding non-ASCII characters in a set of files

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Fri Feb 23 13:41:04 EST 2007


In <ern7jr$rkn$1 at foggy.unx.sas.com>, Tim Arnold wrote:

> Here's what I do (I need to know the line number).
> 
> import os,sys,codecs
> def checkfile(filename):
>     f = codecs.open(filename,encoding='ascii')
> 
>     lines = open(filename).readlines()
>     print 'Total lines: %d' % len(lines)
>     for i in range(0,len(lines)):
>         try:
>             l = f.readline()
>         except:
>             num = i+1
>             print 'problem: line %d' % num
> 
>     f.close()

I see a `NameError` here.  Where does `i` come from?  And there's no need
to read the file twice.  Untested:

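import codecs

def checkfile(filename):
    f = codecs.open(filename, encoding='ascii')
    num = 0  # number of lines decoded successfully so far
    try:
        for line in f:
            num += 1
    except UnicodeError:
        # The failing line is the one after the last good one.
        # Only the first bad line is reported.
        print 'problem: line %d' % (num + 1)
    finally:
        f.close()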
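If you want every offending line rather than just the first, here is a rough sketch. It assumes Python 3 (the code above is Python 2) and takes a slightly different route from `codecs.open`: it reads raw bytes and tries `bytes.decode('ascii')` on each line separately, so one bad line doesn't end the scan. `find_non_ascii_lines` and `check_files` are made-up names for this example.

```python
def find_non_ascii_lines(lines):
    """Return (line_number, column) for each line containing a non-ASCII byte.

    `lines` is any iterable of byte strings, e.g. a file opened in 'rb' mode.
    Line numbers and columns are 1-based; the column is the byte offset of
    the first non-ASCII byte on that line.
    """
    problems = []
    for line_no, line in enumerate(lines, 1):
        try:
            line.decode('ascii')
        except UnicodeDecodeError as exc:
            # exc.start is the index of the first byte that is not ASCII
            problems.append((line_no, exc.start + 1))
    return problems


def check_files(filenames):
    """Print a filename:line:column report for every offending line."""
    for filename in filenames:
        # Binary mode: each line is decoded on its own, so an error on one
        # line does not stop the scan of the rest of the file.
        with open(filename, 'rb') as f:
            for line_no, col in find_non_ascii_lines(f):
                print('%s:%d:%d: non-ASCII byte' % (filename, line_no, col))
```

For a set of files, something like `check_files(glob.glob('*.txt'))` should do. The column counts bytes, not characters, so for UTF-8 input it points at the first byte of the offending multi-byte sequence.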

Ciao,
	Marc 'BlackJack' Rintsch


