Making safe file names

Jens Thoms Toerring jt at toerring.de
Tue May 7 18:37:01 EDT 2013


Andrew Berg <bahamutzero8825 at gmail.com> wrote:
> Currently, I keep Last.fm artist data caches to avoid unnecessary API calls
> and have been naming the files using the artist name. However, artist names
> can have characters that are not allowed in file names for most file systems
> (e.g., C/A/T has forward slashes). Are there any recommended strategies for
> naming such files while avoiding conflicts (I wouldn't want to run into
> problems for an artist named C-A-T or CAT, for example)? I'd like to make
> the files easily identifiable, and there really are no limits on what
> characters can be in an artist name.

It's not clear in what context you need this. You
could e.g. replace all characters not allowed by the file
system by their hexadecimal (ASCII) values, preceded by a
'%' (so '/' would be changed to '%2F'), and also encode a '%'
itself in a name as '%25'. Then you have a well-defined
two-way mapping ("isomorphic" if I remember my math-learning
days correctly) between the original name and the
way you store it. E.g.

  "C/A/T"  would become  "C%2FA%2FT"

and

  "C%2FA/T"  would become  "C%252FA%2FT"

You can translate back and forth between them with not too
much effort.
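
A minimal sketch of that mapping in Python could look like
the following. The function names and the set of characters
treated as unsafe are just assumptions for illustration, so
adjust them to whatever your file system actually forbids:

```python
# Characters assumed to be unsafe in file names (an illustrative
# guess, not a complete list for any particular file system).
UNSAFE = set('/\\:*?"<>|') | {"\0"}
ESCAPE = "%"


def encode_name(name):
    """Replace unsafe characters (and the escape character itself)
    by '%' followed by their two-digit hexadecimal code."""
    out = []
    for ch in name:
        if ch == ESCAPE or ch in UNSAFE:
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def decode_name(encoded):
    """Reverse encode_name(): turn each '%XX' back into one character."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == ESCAPE:
            out.append(chr(int(encoded[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)
```

With this, encode_name("C/A/T") gives "C%2FA%2FT", while "C-A-T"
and "CAT" are left unchanged, so the three names don't clash.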

Of course, that assumes that '%' is a character allowed by
your file system - otherwise pick some other one, any allowed
one will do in principle. The result is a bit harder for a
human to interpret, but that's rather likely not much of a
problem. You probably will have seen that kind of scheme used
in URLs. The concept is rather old and called 'escape
character', i.e. you have one character that assumes a special
meaning and that also gets "escaped" itself wherever it
appears literally.

If, on the other hand, those names never have to be translated
back to the original name, another strategy would be to use the
SHA1 hash value of the artist's name. Since clashes between SHA1
hash values are rather hard to produce, it's a rather safe method
of converting something (i.e. the artist's name) to a number. The
drawback, of course, is that you can't translate back from the
hash value to the original name (if that were simple the
whole thing wouldn't work;-)
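
As a rough sketch, using Python's hashlib module (the function
name is made up, and encoding the name as UTF-8 before hashing
is my assumption):

```python
import hashlib


def hashed_name(artist):
    """Return the SHA1 hash of the artist's name as 40 hex digits,
    which contains only characters that are safe in file names."""
    # Hash functions work on bytes, so encode the string first
    # (UTF-8 is an assumption here).
    return hashlib.sha1(artist.encode("utf-8")).hexdigest()
```

The same name always gives the same file name, and "C/A/T",
"C-A-T" and "CAT" end up with three different ones.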

                       Regards, Jens
-- 
  \   Jens Thoms Toerring  ___      jt at toerring.de
   \__________________________      http://toerring.de


