[Python-3000] Thoughts on new I/O library and bytecode

Wed Feb 21 04:44:22 CET 2007

[Note: changed subject]

On 2/20/07, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "Guido van Rossum" <guido at python.org> wrote:
> > On 2/20/07, Paul Moore <p.f.moore at gmail.com> wrote:
> > > (I have similar concerns over the "new IO" proposals I've
> > > seen, but there's nothing concrete there yet, so I'll save that
> > > argument for another day...)
> >
> > Then you should also have misgivings about the Unicode/str
> > unification. If you are cool with that, I don't see how we can avoid
> > redoing the I/O library.
>
> I'm not so sure.  The return type on socket.recv and os.read could be
> changed to bytes (seemingly without much difficulty),

Yes, that's the plan anyway.

> and likely could
> even be changed to *take* a bytes object as the destination buffer
> (ditto for files opened as 'raw').

This already works -- bytes support the buffer API.
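
To illustrate reading into a caller-supplied buffer, here is a minimal sketch. It assumes Python 3 as it eventually shipped, where the mutable buffer type is bytearray (at the time of this message the Py3k bytes type was itself mutable). The variable names are illustrative only.

```python
import io
import socket

# One preallocated, mutable buffer that is reused for every read.
buf = bytearray(16)

# Raw file-like objects fill any writable buffer through readinto().
raw = io.BytesIO(b"hello world")
n = raw.readinto(buf)
file_chunk = bytes(buf[:n])

# Sockets offer the same pattern through recv_into().
a, b = socket.socketpair()
try:
    a.sendall(b"ping")
    m = b.recv_into(buf)
    sock_chunk = bytes(buf[:m])
finally:
    a.close()
    b.close()
```

Both calls return the number of bytes written, so the caller slices the buffer rather than trusting its full length.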

> From there, the only remaining changes are updating the standard
> library so that code reading from socket, os.read, etc. expects a
> bytes object, and raising an exception on any attempt to write a
> unicode object.

Sure.

> Of course, even with the proposed updated I/O library, every one of
> those modules would have to be changed anyway.

Right. But I expect the higher-level APIs (sock.makefile()) to be
relatively stable.

> Then again, I've been "eh?" on the whole I/O library thing, and
> generally annoyed at the "everything is unicode" idea.

Well, unless you remove the str type, how are you going to get rid of
the endless problems with unicode where mixing unicode and str
sometimes works and sometimes doesn't?

> Converting all
> libraries that currently deal with IO is going to be a pain, especially
> if it does any sort of parsing of mixed binary and non-unicode textual
> data (like http headers combined with binary posted data or a utf-8
> encoded stream).

Yeah, I'm not looking forward to that, but I expect it'll be
relatively straightforward once we figure out the right patterns;
there's just a lot of code to convert. But that's the whole Py3k plan.
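
One possible pattern for the mixed case mentioned above (textual headers followed by binary data) is to split on bytes first and decode only the textual part. This is a hedged sketch, not a pattern settled in this thread; split_http_message is a hypothetical helper, and latin-1 is chosen only because it maps every byte to a character.

```python
def split_http_message(raw: bytes):
    """Split a raw HTTP-style message into decoded headers and an untouched body."""
    # Work on bytes until the header/body boundary is found.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no header/body separator found")

    # Only the header block is decoded to text.
    lines = head.decode("latin-1").split("\r\n")
    start_line, header_lines = lines[0], lines[1:]

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # The body stays as bytes; the caller decides whether to decode it.
    return start_line, headers, body


raw = (b"POST /upload HTTP/1.1\r\n"
       b"Content-Type: application/octet-stream\r\n"
       b"Content-Length: 4\r\n"
       b"\r\n"
       b"\x00\xff\x10\x80")
start_line, headers, body = split_http_message(raw)
```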

> As a heavy user of quite a few of the current standard library IO
> modules (SocketServer, asyncore, urllib, socket, etc.) and as someone
> who has the "opportunity" to write line-level protocols, I'd be quite
> happy with the following...
>
> 1) add bytes (or add features to array)
> 2) rename unicode to text (or str)
> 3) rename str to bin (or some other sufficiently clear name)

So you'd have THREE types (bytes, text, bin)? Or are you proposing bin
instead of bytes, contrary to what you suggested above?

> 4) make string literals 'hello' be unicode
> 5) allow b'constant' to be the renamed str
> 6) add a mandatory 3rd argument to file/open which is the codec to use
> for reading

And how does that help users or compatibility?

> 7) offer a new function for opening 'binary' files (which are opened as
> 'rb' or 'wb' whenever 'r' or 'w' are passed, respectively), which will
> remove confusion on Windows platforms

This is a red herring, or perhaps I don't understand this part of your
proposal. What's wrong with 'rb'?
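
For comparison, a short sketch of the two ways of opening the same file, assuming the open() of Python 3 as it later shipped, where the codec is an optional encoding keyword rather than the mandatory third argument proposed in item 6. The file name and contents are invented for the example.

```python
import os
import tempfile

payload = "caf\u00e9\n".encode("utf-8")   # 6 bytes: the accented letter takes two

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    with open(path, "wb") as f:
        f.write(payload)

    # 'rb' returns the bytes exactly as stored, on every platform.
    with open(path, "rb") as f:
        as_bytes = f.read()

    # Text mode decodes with the named codec; newline="" disables
    # newline translation so the comparison is exact.
    with open(path, "r", encoding="utf-8", newline="") as f:
        as_text = f.read()
```

The byte count and the character count differ, which is the distinction the binary/text split makes visible.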

> Indeed, it isn't as revolutionary as "everything is unicode", but it
> would allow the standard library to be updated with a relative minimum
> of fuss and muss, without needing to intermix...
>     x = bytes.decode('latin-1').USEFUL_UNICODE_METHOD(...)
> or
>     sock.send(unicode.encode('latin-1'))

Actually, with the renamings and everything, it's just about as
disruptive as the current proposal, so I'm unclear why you think this
is so different.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)