stripping unwanted chars from string

John Machin sjmachin at lexicon.net
Thu May 4 05:43:40 EDT 2006


On 4/05/2006 4:30 PM, Edward Elliott wrote:
> Bryan wrote:
>>  >>> keepchars = set(string.letters + string.digits + '-.')
> 
> Now that looks a lot better.  Just don't forget the underscore. :)
> 

*Looks* better than the monkey business. Perhaps I should point out to 
those of the studio audience who are huddled in an ASCII bunker (if any) 
that string.letters provides the characters considered to be alphabetic 
in whatever the locale is currently set to. There is no guarantee that 
the operating system won't permit filenames containing other characters, 
ones that the file's creator would quite reasonably consider to be 
alphabetic. And of course there are languages that have characters that 
one would not want to strip but can scarcely be described as alphanumeric.

 >>> import os
 >>> os.listdir(u'.')
[u'\xc9t\xe9_et_hiver.doc', u'\u041c\u043e\u0441\u043a\u0432\u0430.txt', 
u'\u5f20\u654f.txt']

 >>> import string
 >>> string.letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

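Calling
    import locale; locale.setlocale(locale.LC_ALL, '')
first would make string.letters cover the accented letters in the first 
file name above (for me, in my locale), but that's all; the Cyrillic and 
Chinese names would still be stripped.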
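
To make the point concrete, here is a minimal sketch, not anything from the thread itself. It assumes Python 3, where str is Unicode and string.ascii_letters replaces string.letters. It compares an ASCII whitelist with a test based on str.isalnum(), using the three file names listed above. The function names strip_ascii and strip_unicode are made up for illustration.

```python
import string

# ASCII-only whitelist, in the spirit of the quoted keepchars set,
# with the underscore added.
ASCII_KEEP = set(string.ascii_letters + string.digits + '-._')


def strip_ascii(name):
    """Keep only characters found in the ASCII whitelist."""
    return ''.join(ch for ch in name if ch in ASCII_KEEP)


def strip_unicode(name, extra='-._'):
    """Keep anything Unicode classes as a letter or digit, plus `extra`."""
    return ''.join(ch for ch in name if ch.isalnum() or ch in extra)


# The three file names from the os.listdir() output above.
names = [
    '\xc9t\xe9_et_hiver.doc',
    '\u041c\u043e\u0441\u043a\u0432\u0430.txt',
    '\u5f20\u654f.txt',
]

for name in names:
    # ascii() keeps the output readable on any terminal encoding.
    print(ascii(strip_ascii(name)), ascii(strip_unicode(name)))
```

The ASCII whitelist reduces the first name to 't_et_hiver.doc' and the other two to '.txt', while the isalnum() version leaves all three intact. Even that is not a complete answer: combining marks (for example a decomposed accent such as 'e\u0301') are not alphanumeric by this test and would still be stripped.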


