[Python-Dev] bytes.from_hex()

Thu Mar 2 04:25:20 CET 2006

[My apologies, Greg; I meant to send this to the whole list. I really
need a list-reply button in GMail.]

On 3/1/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> I don't like that, because it creates a dependency
> (conceptually, at least) between the bytes type and
> the unicode type.

I only find half of this bothersome. The unicode type has a pretty
clear dependency on the bytestring type: all I/O needs to be done in
bytes. Various APIs may mask this by accepting unicode values and
transparently doing the right thing, but from the theoretical
standpoint we pretend there is no simple serialization of unicode
values. But the reverse is not true: the bytestring type has no
dependency on unicode.
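
As a minimal sketch of that one-way dependency, here is how it looks in
present-day Python 3, where str plays the role of the unicode type and
bytes the role of the bytestring type. The BytesIO object is only an
assumed stand-in for a real file or socket, and UTF-8 is an arbitrary
choice of codec.

```python
import io

text = "na\u00efve caf\u00e9"       # conceptually text, held in memory as str
raw = text.encode("utf-8")          # explicit serialization before any I/O

buf = io.BytesIO()                  # stand-in for a file or socket (assumption)
buf.write(raw)                      # the I/O layer only ever sees bytes

restored = buf.getvalue().decode("utf-8")   # back to text after reading
assert restored == text
assert len(raw) == len(text) + 2    # the two accented characters take 2 bytes each
```

The bytes value never needs to know it came from text; only the text
side needs a codec to get in and out of bytes.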

As a matter of practicality vs. purity, however, I think it's a good
choice to let the bytestring type have a tie to unicode, much like the
str type implicitly does now. But you're absolutely right that adding a
.tounicode raises the question: why not a .tointeger?

To try to step back and summarize the viewpoints I've seen so far,
there are three main requirements.

  1) We want things that are conceptually text to be stored in memory
as unicode values.
  2) We want there to be some unambiguous conversion via codecs
between bytestrings and unicode values. This should help teaching,
learning, and remembering unicode.
  3) We want a way to apply and reverse compressions, encodings,
encryptions, etc., which are not only between bytestrings and unicode
values; they may be between any two arbitrary types. This allows
writing practical programs.

There seems to be little disagreement over 1, provided there is a
sufficiently efficient implementation, or sufficient string powers in
the bytestring type. To satisfy both 2 and 3, there seem to be a couple
of options. What other requirements do we have?

For (2):
  a) Restrict the existing helpers to be only bytestring.decode and
unicode.encode, possibly enforcing output types of the opposite kind,
and removing bytestring.encode
  b) Add new methods with these semantics, e.g. bytestring.udecode and
unicode.uencode
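
For illustration only, option (a) resembles how Python 3.4 and later
behave: str.encode always yields bytes, bytes.decode always yields str,
bytes has no .encode method, and non-text codecs are refused by the
methods. The sketch below assumes such an interpreter; method_accepts
is a hypothetical helper written just for this example.

```python
text = "hello"
data = text.encode("utf-8")                  # str.encode always yields bytes
assert isinstance(data, bytes)
assert isinstance(data.decode("utf-8"), str)
assert not hasattr(data, "encode")           # no bytestring.encode at all


def method_accepts(value, method, encoding):
    """Hypothetical helper: True if value.<method>(encoding) is accepted."""
    try:
        getattr(value, method)(encoding)
    except LookupError:                      # raised for non-text codecs (3.4+)
        return False
    return True


# Neither a str -> str codec nor a bytes -> bytes codec passes through
# the methods; both are rejected as "not a text encoding".
rot13_ok = method_accepts("hello", "encode", "rot_13")
hex_ok = method_accepts(b"6869", "decode", "hex_codec")
assert rot13_ok is False
assert hex_ok is False
```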

For (3):
  c) Create new helpers codecs.encode(obj, encoding, errors) and
codecs.decode(obj, encoding, errors)
  d) [Keep existing bytestring and unicode helper methods as is, and]
require use of codecs.getencoder() and codecs.getdecoder() for
arbitrary starting object types
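
A rough sketch of what (c) and (d) look like in use, assuming a
present-day Python 3 where codecs.encode(), codecs.decode() and
codecs.getencoder() all exist. The hex_codec and rot_13 codecs are
picked only because they do not convert between bytestrings and
unicode values.

```python
import codecs

# (c) type-agnostic helpers: the codec decides the input and output types
hex_bytes = codecs.encode(b"hi", "hex_codec")          # bytes -> bytes
assert hex_bytes == b"6869"
assert codecs.decode(hex_bytes, "hex_codec") == b"hi"

rot = codecs.encode("hello", "rot_13")                 # str -> str
assert rot == "uryyb"
assert codecs.decode(rot, "rot_13") == "hello"

# (d) look the codec function up explicitly; it returns a tuple of
# (output, amount of input consumed)
encoder = codecs.getencoder("hex_codec")
output, consumed = encoder(b"hi")
assert (output, consumed) == (b"6869", 2)
```

The trade-off is mostly ergonomic: (c) hides the tuple return and the
lookup step, while (d) needs no new API.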

Obviously 2a and 3d do not work together (2a restricts the existing
helper methods, while 3d keeps them as is), but 2b and 3c each work with
either option for the other requirement. What other options do we have?

Michael
--
Michael Urman  http://www.tortall.net/mu/blog