[Python-ideas] Add encoding attribute to bytes
Terry Reedy
tjreedy at udel.edu
Fri Nov 6 02:15:36 CET 2009
A Python interpreter has one encoding for floats, ints, and strings.
sys.float_info and sys.int_info give details about the first two,
although they are mostly invisible to user code. (I presume they are
attached to sys rather than float and int precisely because of this.) A
couple of recent posts have discussed making the unicode encoding (UCS2
vs. UCS4) both less visible and more discoverable to extensions.
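
For reference, these interpreter-wide details can be inspected as in the small sketch below. The printed values depend on the build, so none are shown here.

```python
import sys

# Interpreter-wide representation details, shared by every float and int.
print(sys.float_info.mant_dig)      # number of bits in the float mantissa
print(sys.int_info.bits_per_digit)  # bits per internal int "digit"

# sys.maxunicode distinguishes the unicode builds:
# 0xFFFF on a narrow (UCS2) build, 0x10FFFF on a wide (UCS4) build.
print(hex(sys.maxunicode))
```
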
Bytes are nearly always an encoding of *something*, but the particular
encoding used is instance-specific. As Guido has said, the programmer
must keep track. But how? In an OO language, one obvious way is as an
attribute of the instance. That would be carried with the instance and
make it self-identifying.
What I do not know is whether it is feasible to give an immutable instance of a
builtin class a mutable attribute slot. If it were, I think this could
make 3.x bytes easier and more transparent to use. When a string is
encoded to bytes, the attribute would be set. If it were then pickled,
the attribute would be stored with it and restored with it, and less
easily lost. If it were then decoded, the attribute would be used. If it
were sent to the net, the attribute would be used to set the appropriate
headers. The reverse process would apply from net to bytes to (unicode)
text.
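
To make the idea concrete, here is a rough sketch of how it could behave, using a subclass rather than a change to the builtin bytes type. The names TaggedBytes, from_text, to_text and content_type_header are hypothetical, invented for this illustration. Nothing like them exists in the stdlib.

```python
import pickle


class TaggedBytes(bytes):
    """Hypothetical bytes subclass that carries its encoding with it."""

    def __new__(cls, data=b"", encoding=None):
        self = super().__new__(cls, data)
        # Subclass instances have a __dict__, so the attribute can be set
        # even though the byte content itself stays immutable.
        self.encoding = encoding
        return self

    @classmethod
    def from_text(cls, text, encoding="utf-8"):
        # Encoding a string sets the attribute.
        return cls(text.encode(encoding), encoding)

    def to_text(self):
        # Decoding uses the stored attribute instead of a guess.
        if self.encoding is None:
            raise ValueError("no encoding recorded for these bytes")
        return self.decode(self.encoding)

    def __reduce__(self):
        # Store the encoding alongside the data when pickled.
        return (type(self), (bytes(self), self.encoding))


def content_type_header(data):
    # Hypothetical helper: build a header value from the tag.
    return "text/plain; charset=" + data.encoding


if __name__ == "__main__":
    raw = TaggedBytes.from_text("h\u00e9llo", "latin-1")
    restored = pickle.loads(pickle.dumps(raw))
    assert restored.encoding == "latin-1"
    assert restored.to_text() == "h\u00e9llo"
    print(content_type_header(restored))
```

The real proposal would put the slot on bytes itself, so that str.encode and the stdlib could set and read it without any wrapper class.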
Bytes representing other types of data, such as media, could also be
tagged, not just those representing text.
This would be a proposal for 3.3 at the earliest. It would involve
revising stdlib modules, as appropriate, to use the new info.
Terry Jan Reedy