Finding non-ASCII characters in a set of files

Toby A Inkster usenet200702 at tobyinkster.co.uk
Sat Feb 24 06:21:14 EST 2007


bg_ie wrote:

> What I'd like to do is scan a directory and list all the
> files in it that contain a non ascii character.

Not quite sure what your intention is. If you're planning a one-time scan
of a directory for non-ASCII characters in files, so that you can manually
fix those files up, then this Perl one-liner will do the trick. At the
command line, type:

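	perl -ne 'print "$ARGV:$.\n" if /[\x80-\xFF]/; close ARGV if eof;' *

This will print the name of each file that contains non-ASCII characters,
together with the line numbers on which those characters appear. The
"close ARGV if eof" resets the line counter $. at the end of each file;
without it the numbering runs on from one file into the next. Note that
this also operates on binary files such as images, so you may want to be
more specific with the wildcard, e.g.:

	perl -ne 'print "$ARGV:$.\n" if /[\x80-\xFF]/; close ARGV if eof;' *.py *.txt *.*htm*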
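
Since this is the Python list, roughly the same check can be sketched in
Python. Treat this as a minimal sketch rather than a finished tool: it
assumes the file names are passed on the command line, as with the Perl
version, and the function name non_ascii_lines is made up for illustration.
It reads raw bytes rather than text, so it makes no guess about encodings.

```python
import os
import re
import sys

# Any byte outside the 7-bit ASCII range (same character class as the Perl regex).
NON_ASCII = re.compile(rb"[\x80-\xff]")


def non_ascii_lines(path):
    """Yield the 1-based numbers of lines in `path` that contain a non-ASCII byte.

    Hypothetical helper for illustration; not part of any library.
    """
    # Binary mode: nothing is decoded, lines are split on b"\n".
    with open(path, "rb") as handle:
        for number, line in enumerate(handle, start=1):
            if NON_ASCII.search(line):
                yield number


if __name__ == "__main__":
    for path in sys.argv[1:]:
        # Skip directories and anything else that is not a regular file.
        if not os.path.isfile(path):
            continue
        for number in non_ascii_lines(path):
            print(f"{path}:{number}")
```

Saved under some name of your choosing, say findnonascii.py, it would be
run as "python findnonascii.py *.py *.txt". The line numbers restart for
each file because every file gets its own enumerate().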

-- 
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux

* = I'm getting there!


