[Python-Dev] Unicode docs

Tue, 15 May 2001 03:33:06 -0400

I don't know that the Unicode docs need massive work, but the docs that are
there simply don't answer the technical questions people have:  they're too
thin.

Let's keep it simple.  Contrast the Library manual's:

    unicode(string[, encoding[, errors]])
    Decodes string using the codec for encoding. Error handling is
    done according to errors. The default behavior is to decode UTF-8
    in strict mode, meaning that encoding errors raise ValueError. See
    also the codecs module.

with Andrew's description (from http://www.amk.ca/python/2.0/):

    unicode(string [, encoding] [, errors])
    Creates a Unicode string from an 8-bit string. encoding is a
    string naming the encoding to use. The errors parameter specifies
    the treatment of characters that are invalid for the current
    encoding; passing 'strict' as the value causes an exception
    to be raised on any encoding error, while 'ignore' causes errors
    to be silently ignored and 'replace' uses U+FFFD, the official
    replacement character, in case of any problems.

The latter addresses several *fundamental* questions untouched by the former:
what are the datatypes of the arguments and the result, what values does
errors accept, and what do they mean?  The first blurb answers some others:
what's the default encoding, and which exception is raised?  Neither is
complete on its own, but the reference manual should have a complete answer
to all such questions.  It doesn't have to go on at great length.
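
To show how little it takes, here is a rough sketch of the three errors
values in action.  It is an assumption-laden illustration, not manual text:
it uses the Python 3 spelling bytes.decode() as a stand-in for the 2.0-era
unicode() builtin, and the sample bytes are made up for the purpose.

```python
# Sketch only: bytes.decode() stands in for unicode(string, encoding, errors).
# The input is hypothetical: Latin-1 bytes whose final byte is not valid UTF-8.
data = b"caf\xe9"

# 'strict' raises; UnicodeDecodeError is a subclass of ValueError.
strict_error = None
try:
    data.decode("utf-8", "strict")
except ValueError as exc:
    strict_error = exc

# 'ignore' silently drops the undecodable byte.
ignored = data.decode("utf-8", "ignore")

# 'replace' substitutes U+FFFD, the official replacement character.
replaced = data.decode("utf-8", "replace")

print(type(strict_error).__name__)
print(repr(ignored))
print(repr(replaced))
```

A handful of lines like that settles the datatype, default, and exception
questions all at once.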

A round-trip example would be invaluable.
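
Something along these lines, perhaps.  Again this is only a sketch under
stated assumptions: it is written with Python 3's str.encode() and
bytes.decode() rather than the unicode() builtin discussed above, and the
sample text is invented.

```python
# Round trip: text -> UTF-8 bytes -> text again.
# Sample string is hypothetical: "cafe" with e-acute, a space, a euro sign.
text = "caf\u00e9 \u20ac"

encoded = text.encode("utf-8")      # 8-bit string (bytes)
decoded = encoded.decode("utf-8")   # back to a Unicode string

# Non-ASCII characters take more than one byte in UTF-8,
# so the encoded form is longer than the text it came from.
print(len(text), len(encoded))
print(decoded == text)
```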

If Fred wanted to incorporate a brief overview too, a light rework of
Andrew/Moshe's writeup would be an excellent start.