file.read Method Documentation (Python 2.7.10)

Roel Schroeven roel at roelschroeven.net
Wed Jan 11 12:49:39 EST 2023


Chris Angelico wrote on 11/01/2023 at 18:36:
> On Thu, 12 Jan 2023 at 04:31, Stephen Tucker <stephen_tucker at sil.org> wrote:
> > 1. Create BOM.txt
> > 2. Input three bytes at once from BOM.txt and print them
> > 3. Input three bytes one at a time from BOM.txt and print them
>
> All of these correctly show that a file, in binary mode, reads and writes bytes.
>
> > 4. Input three bytes at once from BOM.txt and print them
> > >>> import codecs
> > >>> myfil = codecs.open ("BOM.txt", mode="rb", encoding="UTF-8")
>
> This is now a codecs file, NOT a vanilla file object. See its docs here:
>
> https://docs.python.org/2.7/library/codecs.html#codecs.open
>
> The output is "codec-dependent" but I would assume that UTF-8 will
> yield Unicode text strings.
>
> > 5. Attempt to input three bytes one at a time from BOM.txt and print them
> > -------------------------------------------------------------------------
> >
> > >>> myfil = codecs.open ("BOM.txt", mode="rb", encoding="UTF-8")
> > >>> myBOM_4 = myfil.read (1)
> > >>> myBOM_4
> > u'\ufeff'
>
> > A. The attempt at Part 5 actually inputs all three bytes when we ask it to input just the first one!
>
> On the contrary; you asked it for one *character* and it read one character.

Not exactly. You're right of course that things opened with 
codecs.open() behave differently from vanilla file objects.
codecs.open() returns a StreamReaderWriter instance, which combines 
StreamReader and StreamWriter. For read(), StreamReader is what matters 
(documented at 
https://docs.python.org/3.11/library/codecs.html#codecs.StreamReader). 
Its read() method is:

read(size=-1, chars=-1, firstline=False)

_size_ indicates the approximate maximum number of encoded bytes or code 
points to read for decoding. The decoder can modify this setting as 
appropriate.

_chars_ indicates the number of decoded code points or bytes to return. 
The read() method will never return more data than requested, but it 
might return less, if there is not enough available.

When only one parameter is provided, without name, it's _size_. So 
myfil.read(1) asks to read enough bytes to decode 1 code point 
(approximately). That's totally consistent with the observed behavior.
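
To make the size/chars distinction concrete, here is a small sketch. It 
is not the original BOM.txt session: it assumes an in-memory io.BytesIO 
stream holding the UTF-8 BOM followed by "abc" (sample data I made up), 
wrapped with codecs.getreader("utf-8") instead of codecs.open(). The 
variable names are mine as well. As far as I can tell it behaves the same 
on 2.7 and 3.x for this input.

```python
import codecs
import io

# Sample data (an assumption, not the original file): UTF-8 BOM + "abc".
data = b"\xef\xbb\xbfabc"

# Case 1: a single positional argument is taken as size.
raw = io.BytesIO(data)
reader = codecs.getreader("utf-8")(raw)
first = reader.read(1)      # keeps reading until one code point decodes
consumed = raw.tell()       # bytes taken from the underlying stream

# Case 2: size and chars given separately.
raw2 = io.BytesIO(data)
reader2 = codecs.getreader("utf-8")(raw2)
one_char = reader2.read(16, 1)   # read up to ~16 bytes, return 1 code point
consumed2 = raw2.tell()          # all 6 bytes were pulled in one go
rest = reader2.read()            # the remainder comes from the reader's buffer

print(repr(first), consumed)               # BOM code point, 3 bytes consumed
print(repr(one_char), consumed2, repr(rest))
```

In case 1 the reader has to pull all three BOM bytes before it can 
return a single code point, which is what Stephen saw. In case 2 the 
reader fetches more bytes than it hands back and keeps the extra decoded 
text in its own buffer.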

-- 
"Peace cannot be kept by force. It can only be achieved through understanding."
         -- Albert Einstein


