[Tutor] Unknown encoded file types.

Alan Gauld alan.gauld at yahoo.co.uk
Sun Feb 7 07:28:43 EST 2021


> On 07Feb2021 22:02, Sean Murphy <mhysnm1964 at gmail.com> wrote:
>> My understanding of the difference between readline and read is how the
>> information is stored. Readline stores it in a list while read stores as a
>> string.

You are thinking of readlines() - note the 's'!

readline() returns a single line as a string.
readlines() returns all the lines as a list of strings.
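
A small sketch of the difference, using io.StringIO as a stand-in for a real file (the sample text is made up for illustration):

```python
import io

# In-memory text stream standing in for an opened text file (hypothetical contents).
text = "first\nsecond\nthird\n"
f = io.StringIO(text)

whole = f.read()            # the entire contents as one string

f.seek(0)                   # rewind to the start
one_line = f.readline()     # a single line as a string, newline included

f.seek(0)
all_lines = f.readlines()   # every line, as a list of strings

print(repr(whole))
print(repr(one_line))
print(all_lines)
```

The same three methods behave this way on a file object returned by open() in text mode.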


> So UTF8 uses a variable number of bytes per ordinal; among its
> features are (a) it is compact for Western alphabets and (b) it is
> identical to ASCII for the characters which are in the ASCII range.
> UTF16 uses 2 bytes per ordinal, less compact but fixed width.

Being picky, utf16 can extend to 4 bytes: any code point above U+FFFF
is encoded as a pair of 2-byte units (a surrogate pair).
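
As a quick check, here is a sketch comparing one character inside the 16 bit range with one beyond it. The two sample characters are my own choice, and "utf-16-le" is used so that Python does not prepend a byte order mark to the output:

```python
# EURO SIGN, U+20AC: fits in the 16 bit range.
bmp_char = "\u20ac"
# GRINNING FACE, U+1F600: beyond the 16 bit range.
astral_char = "\U0001F600"

# "utf-16-le" fixes the byte order, so no BOM is added to the result.
bmp_bytes = bmp_char.encode("utf-16-le")
astral_bytes = astral_char.encode("utf-16-le")

print(len(bmp_bytes))      # 2
print(len(astral_bytes))   # 4
```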

> There are ordinals in Unicode beyond the 16 bit range, BTW.

Just so.

> But any UTF16 encoding will be an even number of bytes.

This is true, unlike utf8, where a single character can be 1, 2, 3 or
4 bytes long.
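
To see both points side by side, a sketch that encodes four sample characters (chosen by me to hit each utf8 length) and counts the bytes:

```python
# One sample character for each utf8 length: U+0041, U+00E9, U+20AC, U+1F600.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Bytes needed per character in each encoding.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
# "utf-16-le" avoids the byte order mark that plain "utf-16" would add.
utf16_lengths = [len(ch.encode("utf-16-le")) for ch in samples]

print(utf8_lengths)    # [1, 2, 3, 4]
print(utf16_lengths)   # [2, 2, 2, 4]

# Every utf16 length is even, whatever the character.
all_even = all(n % 2 == 0 for n in utf16_lengths)
print(all_even)        # True
```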

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



