python3: why writing to socket require bytes type while writing to a file require str ?

Steven D'Aprano steve at pearwood.info
Fri Jul 22 06:41:57 EDT 2016


On Fri, 22 Jul 2016 06:52 pm, Yubin Ruan wrote:

> Hi,
>     I'm migrating my code to python3 now and find it hard to deal with
>     python's 'str' and 'bytes' type. It's kind of painful. One thing I
>     find really confusing is that, writing to a socket requires argument
>     to be of type 'bytes'

Correct. A socket is a low-level endpoint for transmitting bytes between
processes.
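
A minimal sketch of what that means in practice. It assumes a connected
pair of sockets from socket.socketpair() as a stand-in for a real network
connection; the names "left" and "right" are made up for the example. Text
has to be encoded to bytes before sending and decoded after receiving:

```python
import socket

# socketpair() returns two already-connected sockets, standing in for a
# real client/server connection.
left, right = socket.socketpair()

message = "caf\u00e9"                # 'café': 4 characters
payload = message.encode("utf-8")    # str -> bytes, done explicitly
left.sendall(payload)
left.close()

# recv() returns bytes; keep reading until the peer has closed.
chunks = []
while True:
    chunk = right.recv(1024)
    if not chunk:
        break
    chunks.append(chunk)
right.close()

received = b"".join(chunks)
text = received.decode("utf-8")      # bytes -> str, done explicitly
```

Note that the four-character string becomes five bytes on the wire,
because the accented letter takes two bytes in UTF-8.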


>     (otherwise python throw 'str does not support 
>     buffer interface...' exception), while writing to a file requires
>     argument to be of type 'str'. Why is that? 

Only files opened in text mode expect str (text strings). Files opened in
binary mode expect bytes.

This is for convenience: Python is a high-level programming language, and so
for the majority of Python programmers, writing to text files should accept
*text*, not bytes. Python allows you to specify the encoding that the file
will use (the default depends on your platform and locale; it is commonly
UTF-8 on Linux and macOS), and will automatically encode when writing to
the file and decode when reading from it.
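
As a rough illustration, the following sketch writes a str in text mode
with an explicit encoding, then reopens the same file in binary mode to
show the bytes that actually landed on disk. The scratch file name is
hypothetical, and newline="" is passed only to keep line-ending
translation out of the picture:

```python
import os
import tempfile

text = "caf\u00e9\n"    # 'café' plus a newline: 5 characters

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text mode: pass str, and Python encodes it with the given encoding.
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write(text)

# Binary mode: no decoding, so we see the raw bytes on disk.
with open(path, "rb") as f:
    raw = f.read()

# Text mode again: Python decodes the bytes back into a str.
with open(path, "r", encoding="utf-8", newline="") as f:
    round_trip = f.read()
```

Passing encoding= explicitly, as here, avoids depending on whatever the
platform default happens to be.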


>     In standard UNIX interface, 
>     everything is file, which mean that writing to a socket is the same as
>     writing to a normal file. 

Python is not Unix-only and doesn't exclusively follow the Unix philosophy.
For example, None is not a file, nor are floats or lists.

A Python text file is an object which provides an interface for writing
Unicode text strings to low-level byte-oriented files using some encoding.
Python binary files are similar objects, except they skip the encoding part
and so require bytes.

Files opened in text mode will also, by default, convert line endings
automatically: on writing, "\n" becomes the ending used by your operating
system, and on reading, the endings found in the file become "\n".
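
A small sketch of that translation. To get the same result on every
platform it forces Windows-style endings with newline="\r\n" rather than
relying on the operating system's default; the file name is hypothetical:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")

# newline="\r\n" forces Windows-style endings on any platform. With the
# default (newline=None), "\n" would be written as os.linesep instead.
with open(path, "w", encoding="ascii", newline="\r\n") as f:
    f.write("one\ntwo\n")

# Binary mode shows what is really stored in the file.
with open(path, "rb") as f:
    on_disk = f.read()      # b"one\r\ntwo\r\n"

# Reading in text mode with the default newline=None converts the
# endings back to "\n".
with open(path, "r", encoding="ascii") as f:
    as_text = f.read()      # "one\ntwo\n"
```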


>     This is true is python2. But thing doesn't 
>     seem to work in python 3. Does anyone have idea why is this in python3
>     ? In my understanding, writing to a file would requires that 
>     everything be written byte by byte. So writing objects of type 'bytes'
>     to socket makes sense. But, how could python write to file using
>     unicode(type 'str')? Does python encode 'str'(Unicode) automatically
>     before writing things to file? 

Yes.

But remember, only if the file is opened in text mode.

To open the file in binary mode, add "b" to the mode:

open("foo", "rb")
open("foo", "wb")
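
To make the distinction concrete, here is a short sketch (the file names
are hypothetical) showing that each mode accepts only its own type and
raises TypeError for the other:

```python
import os
import tempfile

folder = tempfile.mkdtemp()
binary_path = os.path.join(folder, "foo.bin")
text_path = os.path.join(folder, "foo.txt")

binary_error = None
text_error = None

# Binary mode: bytes are accepted, str is rejected.
with open(binary_path, "wb") as f:
    f.write(b"\x00\x01hello")
    try:
        f.write("hello")
    except TypeError as exc:
        binary_error = exc

# Text mode: str is accepted, bytes are rejected.
with open(text_path, "w", encoding="utf-8") as f:
    f.write("hello")
    try:
        f.write(b"hello")
    except TypeError as exc:
        text_error = exc

with open(binary_path, "rb") as f:
    stored = f.read()
```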

>     If it's true, then why can't it do that 
>     automatic encoding when I trying to write a 'str' to socket ?

Probably because nobody has requested that functionality before.
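
That said, if you want file-like behaviour on top of a socket, the
standard library's socket.makefile() method gets close: it wraps the
socket in a file object, and in text mode that object does the encoding
and decoding for you. A minimal sketch, again assuming a socketpair() as
a stand-in for a real connection:

```python
import socket

left, right = socket.socketpair()

# makefile() wraps each socket in a file object. In text mode it encodes
# str on write and decodes bytes on read, much as open() does.
writer = left.makefile("w", encoding="utf-8", newline="\n")
reader = right.makefile("r", encoding="utf-8", newline="\n")

writer.write("caf\u00e9\n")
writer.flush()              # the wrapper buffers, so push the data out

line = reader.readline()    # already decoded to str

writer.close()
reader.close()
left.close()
right.close()
```

The underlying socket still only ever sees bytes; the wrapper just does
the conversion at the boundary.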



-- 
Steven
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.




More information about the Python-list mailing list