[Python-Dev] Re: adding a bytes sequence type to Python
Guido van Rossum
guido at python.org
Thu Aug 12 18:10:48 CEST 2004
> >> 1. Make bytes a synonym for str.
>
> Guido> Hmm... I worry that a simple alias would just encourage confused
> Guido> usage, since the compiler won't check. I'd rather see bytes an
> Guido> alias for a bytes array as defined by the array module.
>
> You're right. This could probably be added now ("now" being 2.5) with
> little or no problem. My thought was to get the name in there quickly
> (could be done in 2.4) with some supporting documentation so people could
> begin modifying their code.
I think very few people would do so until the semantics of bytes were
clearer. Let's just put it in, meaning a byte array, when we're ready.
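
For readers unfamiliar with the array module, here is a minimal sketch of
the kind of byte array meant above. It assumes typecode 'B' (unsigned
8-bit integers) is the relevant one, and it is written for a current
Python 3 interpreter rather than the 2.x releases discussed in this
thread. The variable names are illustrative only.

```python
from array import array

# Typecode 'B' stores unsigned 8-bit integers, i.e. values 0..255.
buf = array('B', [0x89, 0x50, 0x4E, 0x47])

# Unlike a string, the array is mutable.
buf.append(0x0D)
buf[0] = 0x00

# Values that do not fit in one byte are rejected at run time.
try:
    buf.append(256)
except OverflowError:
    overflow_rejected = True
else:
    overflow_rejected = False

as_list = buf.tolist()
```

The point of the sketch is that such an object has its own type and range
checks, so it cannot be silently confused with str the way a plain alias
could.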
> >> 2. Warn about the use of bytes as a variable name.
>
> Guido> Is this really needed? Builtins don't byte variable names.
>
> I suppose this could be dispensed with. Let pychecker handle it.
>
> >> 3. Introduce b"..." literals as a synonym for current string
> >> literals, and have them *not* generate warnings if non-ascii
> >> characters were used in them without a coding cookie.
>
> Guido> I expect all sorts of problems with that, such as what it would
> Guido> mean if Unicode or multibyte characters are used in the source.
>
> My intent in proposing b"..." literals was that they would be
> allowed in any source file. Their contents would not be interpreted
> in any way.
But they would be manipulated if they were non-ASCII and the source
file was converted to a different encoding. Better be safe and only
allow printable ASCII and hex escapes there.
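
A rough sketch of the re-encoding concern, under the assumption that the
same source line is saved once as Latin-1 and once as UTF-8. The variable
names are hypothetical, and the code targets a current Python 3
interpreter. It compares the bytes that end up on disk for a raw
non-ASCII character against a pure-ASCII hex escape.

```python
# The same intended byte value 0xE9, spelled two ways in source text.
raw_source = 'magic = "\u00e9"'      # contains the raw character U+00E9
escaped_source = 'magic = "\\xe9"'   # contains backslash, x, e, 9 (ASCII only)

# Save the raw spelling under two different source encodings.
latin1 = raw_source.encode("latin-1")
utf8 = raw_source.encode("utf-8")

# The raw character becomes one byte in Latin-1 but two bytes in UTF-8,
# so converting the file changes what sits between the quotes.
raw_differs = latin1 != utf8

# The escaped spelling is plain ASCII and is identical in both encodings.
escaped_same = escaped_source.encode("latin-1") == escaped_source.encode("utf-8")
```

Restricting the literal to printable ASCII plus hex escapes keeps the
file's bytes stable no matter which ASCII-compatible encoding it is
stored in.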
> One simple use case: identifying the magic number of a
> binary file type of some sort. That might well be a constant to
> programmers manipulating that sort of file and have nothing to do
> with Unicode at all.
Not a very strong use case; this could easily be done using just hex.
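
As one possible illustration of doing this with hex alone, the sketch
below checks for the well-known 8-byte PNG signature. The helper name
has_magic is hypothetical, and bytes.fromhex is a Python 3 facility that
did not exist when this message was written.

```python
# The PNG file signature, written purely as hex digits.
PNG_MAGIC = bytes.fromhex("89504e470d0a1a0a")


def has_magic(data, magic=PNG_MAGIC):
    """Return True if data begins with the given magic number."""
    return data[:len(magic)] == magic


looks_like_png = has_magic(PNG_MAGIC + b"rest of file")
looks_like_gif = has_magic(b"GIF89a")
```

No non-ASCII character ever appears in the source, so the constant is
unaffected by the encoding of the source file.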
> Guido> Do we really need byte array literals at all?
>
> I think so. Martin already pointed out an example where a string
> literal is used today for a sequence of bytes that's put out on the
> wire as-is. It's just convenient that the protocol was developed in
> such a way that most of its meta-data is plain ASCII.
See my response to that.
--Guido van Rossum (home page: http://www.python.org/~guido/)