Making safe file names

Andrew Berg bahamutzero8825 at gmail.com
Wed May 8 00:49:46 EDT 2013


On 2013.05.07 22:40, Steven D'Aprano wrote:
> There aren't any characters outside of UTF-8 :-) UTF-8 covers the entire 
> Unicode range, unlike other encodings like Latin-1 or ASCII.
You are correct. I'm not sure what I was thinking.
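
For the record, this is easy to check from the interpreter. A minimal sketch follows; the helper name `encodable` and the sample characters are made up for illustration.

```python
# Sample code points: ASCII, Latin-1 range, CJK, astral plane, last code point.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F3B5", "\U0010FFFF"]

def encodable(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

for ch in samples:
    # Columns: code point, ASCII ok?, Latin-1 ok?, UTF-8 ok?
    print(hex(ord(ch)),
          encodable(ch, "ascii"),
          encodable(ch, "latin-1"),
          encodable(ch, "utf-8"))
```

Only the UTF-8 column is True for every sample; ASCII and Latin-1 give up as soon as the code point leaves their range.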

>> I don't understand. I have no intention of changing Unicode characters.
> 
> Of course you do. You even talk below about Unicode characters like * 
> and ? not being allowed on NTFS systems.
I worded that incorrectly. What I meant, of course, is that I intend to preserve as many characters as possible and have no need to stay within ASCII.
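
One way to do that, as a rough sketch rather than a finished solution: escape only the characters the file system rejects and leave everything else (including non-ASCII) alone. The names `RESERVED`, `escape_name` and `unescape_name` are hypothetical, and the percent-style escape is just one possible choice. The reserved set is the one Windows documents for file names, plus '%' so that the escape itself can be undone unambiguously.

```python
# Characters Windows rejects in file names, plus '%' (our escape character).
RESERVED = set('<>:"/\\|?*%')

def escape_name(name):
    """Replace reserved and control characters with %XX; keep the rest."""
    out = []
    for ch in name:
        if ch in RESERVED or ord(ch) < 32:
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)

def unescape_name(escaped):
    """Invert escape_name: turn every %XX back into its character."""
    out = []
    i = 0
    while i < len(escaped):
        if escaped[i] == "%":
            out.append(chr(int(escaped[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(escaped[i])
            i += 1
    return "".join(out)

print(escape_name("AC/DC: What? *Live*"))
```

Because '%' is escaped too, distinct names always map to distinct file names, so the mapping stays reversible. This sketch does not handle reserved device names such as CON or NUL, trailing dots and spaces, or length limits; those would need separate treatment.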

> If you have an artist with control characters in their name, like newline 
> or carriage return or NUL, I think it is fair to just drop the control 
> characters and then give the artist a thorough thrashing with a halibut.
While the thrashing with a halibut may be warranted (though I personally would use a rubber chicken), dropping characters can make two different names collide, and such conflicts are problematic.

> Does your mapping really need to be guaranteed reversible? If you have an 
> artist called "JoeBlow", and another artist called "Joe\0Blow", and a 
> third called "Joe\nBlow", does it *really* matter if your application 
> conflates them?
Yes and yes. Some artists like to be real cute with their names and make witch house artist names look tame in comparison, and some may
choose to use names similar to some very popular artists. I've also seen people scrobble fake artists with names that look like real artist
names (using things like a non-breaking space instead of a regular space) with different artist pictures in order to confuse and troll
people. If I could remember which user profiles did this, I'd link them. Last.fm is a silly place.
As I said before though, I don't think control characters are even allowed in artist names (likely for technical reasons).
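
To illustrate the look-alike problem (a small sketch using the "Joe Blow" example from above, not real Last.fm data): the two names below render almost identically but compare unequal, and a compatibility normalization such as NFKC would fold them together, which is exactly the conflation I want to avoid.

```python
import unicodedata

regular = "Joe Blow"
lookalike = "Joe\u00a0Blow"  # non-breaking space instead of a regular space

# The strings are different as far as Python is concerned.
print(regular == lookalike)                                  # False

# NFKC maps the no-break space to a plain space, conflating the two names.
print(unicodedata.normalize("NFKC", lookalike) == regular)   # True

# The odd character can still be identified by name.
print(unicodedata.name(lookalike[3]))                        # NO-BREAK SPACE
```

So any normalization step applied before building the file name would have to be chosen carefully, or skipped, if the two artists must stay distinct.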
-- 
CPython 3.3.1 | Windows NT 6.2.9200 / FreeBSD 9.1


