[Tutor] How to read the first so many Unicode characters from a file?

Sat Jun 26 00:39:47 EDT 2021

On Fri, Jun 25, 2021 at 11:25 PM Eryk Sun <eryksun at gmail.com> wrote:
>
> On 6/25/21, boB Stepp <robertvstepp at gmail.com> wrote:
> > Say I have a text file with a mixture of ASCII and non-ASCII
> > characters (assumed to be UTF-8) and I wanted to read the first N
> > characters from the file.  The first thought that comes to mind is:
> >
> > with open(filename) as f:
> >     N_characters = f.read(N)
>
> Assuming Python 3, you're opening the file in text mode, which reads
> characters, not bytes. That said, you're using the default encoding
> that's based on the platform and locale. In Windows this will be the
> process ANSI code page, unless UTF-8 mode is enabled (e.g. `python -X
> utf8`). You can explicitly decode the file as UTF-8 via open(filename,
> encoding='utf-8').
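Following that advice, here is a minimal sketch of reading the first N characters with the encoding given explicitly. It assumes Python 3 and a file that really is UTF-8. The helper name `read_first_chars` and the sample text are illustrative only and are not from the thread.

```python
import os
import tempfile

def read_first_chars(path, n):
    """Return the first n characters (code points) of a UTF-8 text file."""
    # Text mode with an explicit encoding: read(n) counts decoded characters
    # and does not depend on the platform/locale default encoding.
    with open(path, encoding='utf-8') as f:
        return f.read(n)

# Demo with a throwaway file that mixes ASCII and non-ASCII characters.
sample = 'naïve café ☕ and more'
with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.txt',
                                 delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    first = read_first_chars(path, 12)
    print(first)  # naïve café ☕
finally:
    os.remove(path)
```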

Ah, foolish me.  I thought I was reading about text streams in the
docs, but I was actually in a section on bytes.  That, combined with
where I am in the book I'm reading, misled me.
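To see the difference between those two parts of the docs, here is a small sketch. The sample text and temporary file are made up for the demo. In binary mode `read(N)` counts bytes, while in text mode it counts decoded characters. With multi-byte UTF-8 characters the two results differ, and reading an arbitrary number of bytes can even cut a character in half.

```python
import os
import tempfile

text = 'é' * 5  # five characters, each encoded as two bytes in UTF-8

with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(text.encode('utf-8'))
    path = tmp.name

try:
    # Binary mode: read(4) returns 4 *bytes*, i.e. only 2 encoded characters.
    with open(path, 'rb') as f:
        first_bytes = f.read(4)
    # Text mode: read(4) returns 4 decoded *characters* (8 bytes of input).
    with open(path, encoding='utf-8') as f:
        first_chars = f.read(4)
finally:
    os.remove(path)

print(first_bytes)  # b'\xc3\xa9\xc3\xa9'
print(first_chars)  # éééé
```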

If I specify the encoding at the top of the program file, will that
suffice to overcome the Windows code page issue, i.e. the default
encoding being the ANSI code page rather than UTF-8?

Thanks!
boB Stepp