How to read from a file to an arbitrary delimiter efficiently?

Marko Rauhamaa marko at pacujo.net
Sat Feb 27 12:47:34 EST 2016


Dennis Lee Bieber <wlfraed at ix.netcom.com>:

> On Sat, 27 Feb 2016 21:40:17 +1100, Steven D'Aprano <steve at pearwood.info>
> declaimed the following:
>>Thanks for finding the issue, but the solutions given don't suit my
>>use case. I don't want an iterator that operates on pre-read blocks, I
>>want something that will read a record from a file, and leave the file
>>pointer one entry past the end of the record.
>>
>>Oh, and records are likely fairly short, but there may be a lot of them.
>
> 	Considering that most of the world has settled on the view that
> files are just linear streams (curse you, UNIX) anything working with
> "records" has to build the concept on top of the stream. Either by
> making records "fixed width" (allowing for fast random access:
> recNum*recLen => seek position), though likely giving up the stream
> access... Or by wrapping the stream with something that does
> parsing/buffering.

It may be instructive to see how the Linux/UNIX utility head(1)
operates. It actually reads its input greedily, a block at a time, but
once it has seen enough lines it uses lseek(2) to move the file offset
back to just past the last line it printed.
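
The same trick can be played from Python. What follows is only a rough
sketch, not what head(1) literally does: read_record, its delim and
blocksize parameters are names I made up, and it assumes a binary,
seekable file object whose tell()/seek() offsets are plain byte counts.

```python
import io


def read_record(f, delim=b"\n", blocksize=4096):
    """Return one record, up to and including delim, from a seekable
    binary file, leaving the file position just past the delimiter."""
    start = f.tell()
    buf = bytearray()
    while True:
        block = f.read(blocksize)          # greedy read, may overshoot
        if not block:
            return bytes(buf)              # EOF: last record has no delimiter
        # Resume the search slightly before the new block in case the
        # delimiter straddles two blocks.
        search_from = max(0, len(buf) - len(delim) + 1)
        buf += block
        idx = buf.find(delim, search_from)
        if idx != -1:
            end = idx + len(delim)
            f.seek(start + end)            # give back what we over-read
            return bytes(buf[:end])


# Small demonstration on an in-memory file.
demo = io.BytesIO(b"alpha;;beta;;gamma")
print(read_record(demo, b";;"))   # b'alpha;;'
print(demo.tell())                # 7
```

Short records in a big file still cost one read() and one seek() each,
so the block size is worth tuning for the data at hand.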

Not all file-like objects can seek (a pipe, for one, cannot), so head(1)
may fail to operate as advertised:

========================================================================
$ seq 10000 >/tmp/data.txt
$ {
> head -n 5 >/dev/null
> head -n 5
> } </tmp/data.txt
6
7
8
9
10
$ cat /tmp/data.txt | {
> head -n 5 >/dev/null
> head -n 5
> }

1861
1862
1863
1864
$
========================================================================
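
When the input cannot seek, the only way I know to avoid eating data
past the delimiter is not to read past it in the first place. A sketch
under that assumption (read_record_unseekable is a made-up name, and
the stream is assumed to be a binary one):

```python
import os


def read_record_unseekable(f, delim=b"\n"):
    """Return one record, up to and including delim, reading a single
    byte at a time so nothing beyond the delimiter is consumed."""
    buf = bytearray()
    while not buf.endswith(delim):
        byte = f.read(1)
        if not byte:
            break                  # EOF before a delimiter was seen
        buf += byte
    return bytes(buf)


# Demonstration on a pipe, which cannot seek.
r, w = os.pipe()
os.write(w, b"one|two|three")
os.close(w)
with os.fdopen(r, "rb", buffering=0) as pipe:
    print(pipe.seekable())                     # False
    print(read_record_unseekable(pipe, b"|"))  # b'one|'
    print(read_record_unseekable(pipe, b"|"))  # b'two|'
```

With buffering=0 every read(1) turns into a system call, which is slow
but leaves the descriptor exactly one byte past the delimiter. With a
buffered file object the reads are cheap, but the buffer itself has
already pulled data off the descriptor, which matters only if something
else shares it, as the two head processes do above.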


Marko


