Making safe file names

Roy Smith roy at panix.com
Tue May 7 20:22:17 EDT 2013


In article <mailman.1428.1367972114.3114.python-list at python.org>,
 Dave Angel <davea at davea.name> wrote:

> On 05/07/2013 03:58 PM, Andrew Berg wrote:
> > Currently, I keep Last.fm artist data caches to avoid unnecessary API calls 
> > and have been naming the files using the artist name. However,
> > artist names can have characters that are not allowed in file names for 
> > most file systems (e.g., C/A/T has forward slashes). Are there any
> > recommended strategies for naming such files while avoiding conflicts (I 
> > wouldn't want to run into problems for an artist named C-A-T or
> > CAT, for example)? I'd like to make the files easily identifiable, and 
> > there really are no limits on what characters can be in an artist name.
> >
> 
> So what you need first is a list of allowable characters for all your 
> target OS versions.  And don't forget that the allowable characters may 
> vary depending on the particular file system(s) mounted on a given OS.
> 
> You also need to decide how to handle Unicode characters, since file 
> name encoding differs between operating systems.  On Windows with NTFS, 
> filenames are Unicode, while on Unix, filenames are bytes.  So on one of 
> those, you will be encoding/decoding if your code is to be mostly portable.
> 
> Don't forget that ls and rm may not use the same encoding you're using. 
> So it may not be enough to make the names legal; you may also want them 
> to be easily typeable in the shell.

One possible tool that may help you here is unidecode 
(https://pypi.python.org/pypi/Unidecode).  It doesn't solve your whole 
problem, but it does help get Unicode text into a form which is both 
7-bit clean and human readable.
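
Transliteration is lossy, though, so on its own it can't keep C/A/T, 
C-A-T and CAT apart.  If avoiding collisions matters more than pretty 
names, one option is to percent-encode the artist name.  The following is 
only a rough sketch using the standard library (Python 3): the helper 
names cache_filename() and artist_from_filename() and the ".json" suffix 
are made up for illustration, not taken from anyone's code.

```python
from urllib.parse import quote, unquote


def cache_filename(artist, suffix=".json"):
    """Map an artist name to a 7-bit-clean file name (hypothetical helper)."""
    # safe="" means "/" is escaped as well; ASCII letters, digits and a few
    # punctuation characters such as "_", "." and "-" pass through unchanged.
    # Non-ASCII characters are encoded as UTF-8 and then percent-escaped.
    return quote(artist, safe="") + suffix


def artist_from_filename(filename, suffix=".json"):
    """Invert cache_filename() to recover the original artist name."""
    if not filename.endswith(suffix):
        raise ValueError("unexpected file name: %r" % filename)
    # Slice by length so that an empty suffix also works.
    return unquote(filename[:len(filename) - len(suffix)])


for name in ("C/A/T", "C-A-T", "CAT", "Björk"):
    print(name, "->", cache_filename(name))
```

The mapping is reversible, so distinct names always get distinct encoded 
strings, and plain ASCII names stay readable.  It does not cover 
everything: a case-insensitive file system will still treat CAT and cat 
as the same file, and heavily escaped names can get long.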


