stripping unwanted chars from string

Edward Elliott nobody at 127.0.0.1
Wed May 3 23:36:57 EDT 2006


I'm looking for the "best" way to strip a large set of chars from a filename
string (my definition of best usually means succinct and readable). I
only want to allow alphanumeric chars, dashes, and periods. This is what I
would write in Perl (bless me father, for I have sinned...):

$filename =~ tr/a-zA-Z0-9_.-//cd, or equivalently
$filename =~ s/[^\w.-]//g

I could just use re.sub like the second example, but that's a bit of overkill.
I'm trying to figure out if there's a good way to do the same thing with
string methods.  string.translate seems to do what I want; the problem is
specifying the set of chars to remove.  Obviously hardcoding them all is a
non-starter.

Working with chars seems to be a bit of a pain.  There's no equivalent of
the range function for chars; one has to do something like this:

>>> [chr(x) for x in range(ord('a'), ord('z')+1)]
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
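
For comparison, the standard string module already ships the usual
character ranges as constants, so the chr/ord loop can often be skipped.
A minimal sketch, assuming Python 3 (the names lowercase and keep are just
illustrative):

```python
import string

# Ready-made ranges from the standard library; no chr/ord loop needed.
lowercase = list(string.ascii_lowercase)

# Characters to keep: ASCII letters, digits, dash, and period.
# Append "_" here to mirror what \w allows.
keep = set(string.ascii_letters + string.digits + "-.")
```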

Do that twice for letters, once for numbers, add in a few others, and I get
the chars I want to keep.  Then I'd invert the set and call translate. 
It's a mess and not worth the trouble.  Unless there's some way to expand a
compact representation of a char list and obtain its complement, it looks
like I'll have to use a regex.
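
One way the invert-and-translate idea might look, as a rough sketch
assuming Python 3, where str.translate accepts a dict that maps code points
to None for deletion. Rather than inverting over every possible char, the
complement is taken only over the chars that actually occur in the input.
The function name strip_unwanted is hypothetical:

```python
import string

# Characters allowed to survive: ASCII letters, digits, dash, and period.
KEEP = frozenset(string.ascii_letters + string.digits + "-.")

def strip_unwanted(filename):
    """Return filename with every character outside KEEP removed."""
    # Complement of KEEP, restricted to chars present in this string.
    toss = set(filename) - KEEP
    # dict.fromkeys maps each code point to None, which translate() deletes.
    return filename.translate(dict.fromkeys(map(ord, toss)))

cleaned = strip_unwanted("my file (v2).tar-gz!")
```

Here cleaned comes out as "myfilev2.tar-gz". Whether this counts as more
readable than a regex is a matter of taste.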

Ideally, there would be a mythical charset module that works like this:

>>> keep = charset.expand (r'\w.-') # or r'a-zA-Z0-9_.-'
>>> toss = charset.invert (keep)
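
To make the wish concrete, here is a rough sketch of what those two
functions might do, assuming Python 3. Both expand and invert are
hypothetical stand-ins for the imagined charset module, not an existing
library; expand understands only literal chars and x-y ranges (no \w), and
invert takes the complement over the first 256 code points unless told
otherwise:

```python
def expand(spec):
    """Expand a compact spec such as 'a-zA-Z0-9_.-' into a set of chars."""
    chars = set()
    i = 0
    while i < len(spec):
        # 'x-y' with a char on both sides of the dash denotes a range.
        if i + 2 < len(spec) and spec[i + 1] == "-":
            lo, hi = ord(spec[i]), ord(spec[i + 2])
            chars.update(chr(c) for c in range(lo, hi + 1))
            i += 3
        else:
            # Anything else, including a trailing dash, is a literal char.
            chars.add(spec[i])
            i += 1
    return chars

def invert(keep, universe=None):
    """Return the chars in universe (default: code points 0-255) not in keep."""
    if universe is None:
        universe = (chr(c) for c in range(256))
    return set(universe) - set(keep)

keep = expand("a-zA-Z0-9_.-")
toss = invert(keep)
```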

Sadly I can find no such beast.  Anyone have any insight?  As of now,
regexes look like the best solution.
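
For reference, the regex version is short. A minimal sketch, assuming
Python 3 and an explicit ASCII character class (in Python 3, \w on str
patterns also matches underscores and non-ASCII word characters, which is
wider than "alphanumeric"). The names _UNWANTED and strip_unwanted_re are
illustrative:

```python
import re

# Anything that is not an ASCII letter, digit, period, or dash.
# The dash comes last in the class so it is taken literally.
_UNWANTED = re.compile(r"[^A-Za-z0-9.-]")

def strip_unwanted_re(filename):
    """Return filename with every disallowed character removed."""
    return _UNWANTED.sub("", filename)

cleaned = strip_unwanted_re("my file (v2).tar-gz!")
```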



