[Python-Dev] Re: adding a bytes sequence type to Python

Guido van Rossum guido at python.org
Thu Aug 12 18:10:48 CEST 2004


>     >> 1. Make bytes a synonym for str.
> 
>     Guido> Hmm...  I worry that a simple alias would just encourage confused
>     Guido> usage, since the compiler won't check.  I'd rather see bytes an
>     Guido> alias for a bytes array as defined by the array module.
> 
> You're right.  This could probably be added now ("now" being 2.5) with
> little or no problem.  My thought was to get the name in there quickly
> (could be done in 2.4) with some supporting documentation so people could
> begin modifying their code.

I think very few people would do so until the semantics of bytes were
clearer.  Let's just add it, meaning a byte array, when we're ready.
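
For illustration, a minimal sketch of what "byte array as defined by the
array module" could look like.  It assumes a current Python 3 and the 'B'
(unsigned char) typecode; the variable names are made up for the example,
and none of this is a settled design for bytes.

```python
from array import array

# Typecode 'B' is an unsigned char: every item is an integer in 0..255.
buf = array('B', [0x50, 0x79])   # the two bytes that spell "Py" in ASCII
buf.append(0x21)                 # mutable, unlike a str

first = buf[0]                   # indexing yields an int, not a 1-char string
raw = buf.tobytes()              # snapshot of the contents as raw bytes
```

The point of the sketch is the difference from str: items are small
integers and the container is mutable.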

>     >> 2. Warn about the use of bytes as a variable name.
> 
>     Guido> Is this really needed?  Builtins don't byte variable names.
> 
> I suppose this could be dispensed with.  Let pychecker handle it.
> 
>     >> 3. Introduce b"..." literals as a synonym for current string
>     >> literals, and have them *not* generate warnings if non-ascii
>     >> characters were used in them without a coding cookie.
> 
>     Guido> I expect all sorts of problems with that, such as what it would
>     Guido> mean if Unicode or multibyte characters are used in the source.
> 
> My intent in proposing b"..." literals was that they would be
> allowed in any source file.  Their contents would not be interpreted
> in any way.

But they would be manipulated if they were non-ASCII and the source
file was converted to a different encoding.  Better be safe and only
allow printable ASCII and hex escapes there.
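
A rough sketch of the problem, assuming a current Python 3.  The same
visible text yields different bytes depending on the file encoding, while
a body made of printable ASCII plus hex escapes is unaffected.  The helper
is_portable_literal_body is hypothetical, written only for this example.

```python
# What an editor would show: c, a, f, e-acute.
text = "caf\u00e9"

# The bytes stored on disk depend on the source file's encoding.
latin1_bytes = text.encode("latin-1")   # e-acute is one byte, 0xe9
utf8_bytes = text.encode("utf-8")       # e-acute is two bytes, 0xc3 0xa9


def is_portable_literal_body(body):
    """Hypothetical check: True if body contains only printable ASCII.

    Hex escapes such as \\xe9 pass, because the backslash, 'x', and the
    hex digits are themselves printable ASCII characters.
    """
    return all(0x20 <= ord(ch) <= 0x7e for ch in body)
```

With this check, the raw text r"caf\xe9" is accepted and "caf" followed
by a literal e-acute is rejected, which matches the restriction above.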

> One simple use case: identifying the magic number of a
> binary file type of some sort.  That might well be a constant to
> programmers manipulating that sort of file and have nothing to do
> with Unicode at all.

Not a very strong use case; this could easily be done using just hex.
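
As a sketch of the hex-only approach, assuming a current Python 3 and
using the PNG signature as the example magic number.  The names PNG_MAGIC
and has_png_magic are invented for illustration.

```python
# The 8-byte PNG signature, written purely as hex digits.
PNG_MAGIC = bytes.fromhex("89504e470d0a1a0a")


def has_png_magic(data):
    """Return True if data begins with the PNG signature."""
    return data[:len(PNG_MAGIC)] == PNG_MAGIC
```

Nothing here depends on the encoding of the source file, since the
constant contains only ASCII hex digits.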

>     Guido> Do we really need byte array literals at all?  
> 
> I think so.  Martin already pointed out an example where a string
> literal is used today for a sequence of bytes that's put out on the
> wire as-is.  It's just convenient that the protocol was developed in
> such a way that most of its meta-data is plain ASCII.

See my response to that.

--Guido van Rossum (home page: http://www.python.org/~guido/)

