read Unicode characters one by one in python2

Sun Feb 25 15:56:44 EST 2018

On Mon, Feb 26, 2018 at 3:57 AM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Mon, 26 Feb 2018 01:50:16 +1100, Chris Angelico wrote:
>
>> If you actually need character-by-character, you'd need "for character
>> in fh.read()" rather than iterating over the file itself. Iterating over
>> a file yields lines.
>
> Indeed. But I wonder if there's a performance cost/gain to iterating over
> each line, rather than reading one char at a time?
>
> for line in file:
>     for c in line:
>         ...
>
> Too lazy to actually test it myself, but just tossing this idea out in
> case anyone else cares to give it a try.
>

Depends on the size of the file. For a small file, you can read the
whole thing into memory in a single disk operation, and splitting it
into lines afterwards is wasted effort. For a gigantic file, reading
everything into RAM means a crazy-expensive transfer/copy, so it'd be
HEAPS more efficient to work line by line, particularly if you don't
need the whole file.
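
For anyone who wants to time it, here's a rough sketch of the two
approaches side by side. The function names are made up for
illustration, and I'm assuming io.open with an explicit encoding (UTF-8
here, purely as a guess) so that both yield Unicode characters on
Python 2 as well as Python 3:

```python
import io

def chars_whole_file(path, encoding="utf-8"):
    """Read the entire file into memory, then yield each character."""
    with io.open(path, encoding=encoding) as fh:
        for c in fh.read():
            yield c

def chars_by_line(path, encoding="utf-8"):
    """Hold only one line in memory at a time, yielding each character."""
    with io.open(path, encoding=encoding) as fh:
        for line in fh:
            for c in line:
                yield c
```

Both should produce the same sequence of characters; the difference is
only in how much of the file sits in memory at once.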

But if you do want to stop partway through, having nested loops means
a simple "break" only exits the inner loop, and the outer loop carries
on with the next line. So that's a different consideration.
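
One possible way around the nested-loop problem (a sketch, not the only
option) is to push both loops into a generator, so the consuming code
has a single loop and a single "break" stops everything. The names
iter_chars and take_until are hypothetical, invented for this example;
"lines" can be any iterable of strings, such as an open file:

```python
def iter_chars(lines):
    """Flatten an iterable of lines into a stream of characters."""
    for line in lines:
        for c in line:
            yield c

def take_until(lines, stop):
    """Collect characters up to (not including) the first `stop`."""
    seen = []
    for c in iter_chars(lines):
        if c == stop:
            break  # single loop, so this ends the whole scan
        seen.append(c)
    return seen
```

Since the generator is lazy, lines after the stopping point are never
requested from the underlying iterable.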

ChrisA