encode short string as filename (unix/windows)

robert no-spam at no-spam-no-spam.com
Mon Mar 27 11:40:38 EST 2006


Jean-Paul Calderone wrote:


> punycode is used by dns.  A commonly used email codec is 
> quoted-printable.  Here's an example of each:
> 
>    >>> u'Helló world'.encode('utf-8').encode('quopri')
>    'Hell=C3=B3=20world'
>    >>> u'Helló world'.encode('punycode')
>    'Hell world-jbb'
>    >>>
> Note the extra trip through utf-8 for quoted-printable, as it is not 
> implemented in Python as a character encoding, but a byte encoding, so 
> you cannot (safely) apply it to a unicode string.
> 
> Jean-Paul
> 

>>> u'Helló world\\/\x00'.encode('punycode')
'Hell world\\/\x00-elb'
>>> u'Helló world\\/\x00'.encode('utf-8').encode('quopri')
'Hell=C3=B3=20world\\/=00'
>>>


Neither of them escapes \ or /, and the base64-style codecs have the same
kind of problem.
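On Python 3 the same experiment needs a small change: bytes objects have no .encode() method, so quoted-printable has to be reached through codecs.encode() as a bytes-to-bytes codec. A minimal sketch (the variable names are only illustrative) that checks whether the path separators survive each codec:

```python
import codecs

s = 'Helló world\\/\x00'

# punycode is a str -> bytes codec: every ASCII character (including '\\',
# '/' and '\x00') is copied through unchanged; only non-ASCII text is
# encoded into the suffix after the final '-'.
puny = s.encode('punycode')

# quopri is a bytes -> bytes codec in Python 3, so encode to UTF-8 first.
# Printable ASCII such as '/' and '\\' is left as it is.
qp = codecs.encode(s.encode('utf-8'), 'quopri')

for name, out in (('punycode', puny), ('quopri', qp)):
    # Show the output and whether each separator is still present.
    print(name, out, b'/' in out, b'\\' in out)
```

Both separators come out unescaped for both codecs, which is why neither is usable as a filename encoding on its own.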

So I finally ended up resorting to regular expressions :-( , but this does
minimal visual damage to common strings:


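import re

# Escape control chars (\x00-\x1f), the characters Windows forbids
# ("\/*?:<>|; '/' is also the Unix separator) and '+' itself, the escape marker.
def encode_as_filename(s):
    def _(m): return "+%02X" % ord(m.group(0))
    return re.sub(r'[\x00-\x1f"\\/*?:<>|+]', _, s)

def decode_from_filename(s):
    def _(m): return chr(int(m.group(0)[1:], 16))
    return re.sub(r'\+[0-9A-F]{2}', _, s)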


>>> newsletter.encode_as_filename('robert@?/\\+\n\x00:+test')
'robert@+3F+2F+5C+2B+0A+00+3A+2Btest'
>>> newsletter.decode_from_filename(_)
'robert@?/\\+\n\x00:+test'
>>>
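Because '+' is both an ordinary character and the escape marker, the scheme only round-trips if every literal '+' is escaped as well. A quick way to convince yourself is a brute-force check like the sketch below (Python 3). The functions are restated so the block runs on its own; the escaped set here (all control characters \x00-\x1f plus "\/*?:<>|+), the FORBIDDEN set and check_round_trip are illustrative assumptions, not necessarily the exact code from the post.

```python
import re

# Assumed escaped set: control characters, the Windows-reserved "\/*?:<>|,
# and '+' because it marks the start of every escape.
_UNSAFE = re.compile(r'[\x00-\x1f"\\/*?:<>|+]')
_ESCAPE = re.compile(r'\+[0-9A-F]{2}')

# Characters Windows rejects in a filename component ('/' also covers Unix).
FORBIDDEN = set('"\\/*?:<>|') | {chr(i) for i in range(32)}


def encode_as_filename(s):
    # Replace each unsafe character with '+' and its two-digit hex code.
    return _UNSAFE.sub(lambda m: "+%02X" % ord(m.group(0)), s)


def decode_from_filename(s):
    # Turn every '+XX' escape back into the character it stands for.
    return _ESCAPE.sub(lambda m: chr(int(m.group(0)[1:], 16)), s)


def check_round_trip(samples):
    """Return every sample that fails to decode back to itself or whose
    encoded form still contains a forbidden character."""
    failures = []
    for s in samples:
        encoded = encode_as_filename(s)
        if FORBIDDEN & set(encoded) or decode_from_filename(encoded) != s:
            failures.append(s)
    return failures


# Every single character up to U+017F, plus text that merely looks like an escape.
samples = [chr(i) for i in range(0x180)] + ['+2B', 'a+b', 'robert@?/\\+\n\x00:+test']
print(check_round_trip(samples))  # prints []
print(encode_as_filename('+2B'))  # prints +2B2B
```

The '+2B' case shows why '+' must be in the escaped set: a literal "+2B" in the input becomes "+2B2B", so the decoder never mistakes ordinary text for an escape.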


Robert