encode short string as filename (unix/windows)

Jean-Paul Calderone exarkun at divmod.com
Mon Mar 27 11:21:28 EST 2006


On Mon, 27 Mar 2006 18:13:17 +0200, "Diez B. Roggisch" <deets at nospam.web.de> wrote:
>robert wrote:
>
>> want to encode/decode an arbitrary short 8-bit string as a safe filename.
>> is there a good already builtin encoding to do this (without too much
>> inflation) ? or re.sub expression?
>
>You could use the base64-encoder. Disadvantage is clearly that you can't
>easily read your original text. Alternatively, there is that encoding that
>is used by e.g. emails if you have an umlaut in a name. I _think_ it is
>called puny-code, but I'm not sure how and if you can use that from within
>python - google yourself :)

Punycode is used by DNS (for internationalized domain names).  A commonly used email codec is quoted-printable.  Here's an example of each:

    >>> u'Helló world'.encode('utf-8').encode('quopri')
    'Hell=C3=B3=20world'
    >>> u'Helló world'.encode('punycode')
    'Hell world-jbb'
    >>> 

Note the extra trip through utf-8 for quoted-printable: Python implements it as a byte encoding rather than a character encoding, so you cannot (safely) apply it directly to a unicode string.
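
For anyone trying this on Python 3, where str has no bytes-to-bytes codecs behind .encode(), here is a rough equivalent of the session above.  It is only a sketch: it assumes the standard quopri module with quotetabs=True is an acceptable stand-in for the 'quopri' codec, and the variable names are made up for illustration.  Keep in mind that neither encoding escapes characters such as '/' or ':', so the result is not automatically a safe filename.

```python
import quopri

text = "Hell\u00f3 world"

# Quoted-printable is a bytes-to-bytes transform, so go through UTF-8 first.
# quotetabs=True also escapes spaces and tabs (an assumption made here to
# match the '=20' seen in the session above).
qp = quopri.encodestring(text.encode("utf-8"), quotetabs=True)

# Punycode is a real character encoding: str in, bytes out.
puny = text.encode("punycode")

print(qp)    # b'Hell=C3=B3=20world'
print(puny)  # b'Hell world-jbb'

# Both encodings are reversible.
assert quopri.decodestring(qp).decode("utf-8") == text
assert puny.decode("punycode") == text
```

The round-trip checks at the end matter more than the exact output: whichever codec you pick, you need to be able to get the original string back from the filename.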

Jean-Paul


