[Tutor] How to read the first so many Unicode characters from a file?

boB Stepp robertvstepp at gmail.com
Fri Jun 25 23:10:22 EDT 2021


Say I have a text file with a mixture of ASCII and non-ASCII
characters (assumed to be UTF-8) and I want to read the first N
characters from the file.  The first thing that comes to mind is:

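# Text mode; name the encoding explicitly, since the default is platform-dependent
with open(filename, encoding="utf-8") as f:
    N_characters = f.read(N)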

But my reading of the docs is that this will read N bytes.  My
understanding is that some UTF-8 characters take more than one byte
(up to four), so it seems this won't work in general.  Does Python
provide an easy way to do this?  Or is my understanding flawed?
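For comparison, here is a minimal sketch that checks the behaviour directly. It assumes Python 3, where open() in text mode returns an io.TextIOWrapper whose read(size) is documented to return at most size *characters*. Only a file opened in binary mode ("rb") counts *bytes*. The sample string, N = 4, and the temporary file are made up purely for illustration.

```python
import os
import tempfile

# Illustrative sample: ASCII mixed with 2-, 3- and 4-byte UTF-8 characters
# ("\u00e9" is e-acute, "\u20ac" is the euro sign, "\U0001F600" is an emoji).
sample = "a\u00e9\u20ac\U0001F600bc"
N = 4

# UTF-8 length, in bytes, of each character in the sample.
byte_lengths = [len(ch.encode("utf-8")) for ch in sample]
print(byte_lengths)  # [1, 2, 3, 4, 1, 1]

# Write the sample to a temporary file created just for this demo.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)
    filename = tmp.name

try:
    # Text mode: read(N) returns at most N characters, as a str.
    with open(filename, encoding="utf-8") as f:
        first_chars = f.read(N)

    # Binary mode: read(N) returns at most N bytes, as a bytes object.
    with open(filename, "rb") as f:
        first_bytes = f.read(N)
finally:
    os.remove(filename)

print(len(first_chars), first_chars == sample[:N])  # 4 True
print(first_bytes)  # b'a\xc3\xa9\xe2'
```

In this run the binary read stops partway through the 3-byte euro sign, so calling first_bytes.decode("utf-8") would raise UnicodeDecodeError. The text-mode read returns four whole characters.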

TIA!
boB Stepp

