Finding non ascii characters in a set of files

John Machin sjmachin at lexicon.net
Fri Feb 23 11:47:38 EST 2007


On Feb 24, 2:44 am, Larry Bates <lba... at websafe.com> wrote:
> Peter Bengtsson wrote:
> > On Feb 23, 2:38 pm, b... at yahoo.com wrote:
> >> Hi,
>
> >> I'm updating my program to Python 2.5, but I keep running into
> >> encoding problems. I have no encodings defined at the start of any of
> >> my scripts. What I'd like to do is scan a directory and list all the
> >> files in it that contain a non ascii character. How would I go about
> >> doing this?
>
> > How about something like this:
> > content = open('file.py').read()
> > try:
> >     content.decode('ascii')
> > except UnicodeDecodeError:
> >     print "file.py contains non-ascii characters"
>
> The next problem will be that non-text files will contain non-ASCII
> characters (bytes).  The other 'issue' is that OP didn't say how large
> the files were, so .read() might be a problem.
>
> -Larry

The way I read it, the OP's problem is to determine in one big hit
which Python source files need a

# coding: whatever

line at the top to stop Python 2.5 complaining ... I hope none of
them are so big as to choke .read().
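
For what it's worth, here is a rough sketch of that scan. It is not
anything posted in this thread. It reads each file as bytes in
fixed-size chunks, so a large file never has to fit in memory, and it
only looks at *.py files, so non-text files are skipped. The names
has_non_ascii and find_non_ascii_sources and the 64 KB chunk size are
my own assumptions, and it is written for a current Python rather than
2.5 (where the with statement would need a __future__ import).

```python
import os


def has_non_ascii(path, chunk_size=64 * 1024):
    """Return True if the file at `path` contains any non-ASCII byte."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Reached end of file without finding a non-ASCII byte.
                return False
            try:
                # ASCII is one byte per character, so chunk boundaries
                # cannot split a character and cause a false positive.
                chunk.decode("ascii")
            except UnicodeDecodeError:
                return True


def find_non_ascii_sources(directory):
    """List the .py files directly inside `directory` that contain
    non-ASCII bytes (hypothetical helper, no recursion)."""
    hits = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.endswith(".py") and os.path.isfile(path) and has_non_ascii(path):
            hits.append(path)
    return hits
```

Calling find_non_ascii_sources('.') should give the paths that probably
need a coding line. It does not descend into subdirectories; os.walk
would be the obvious way to extend it.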

Cheers,
John



