How to read from a file to an arbitrary delimiter efficiently?

Chris Angelico rosuav at gmail.com
Thu Feb 25 02:30:25 EST 2016


On Thu, Feb 25, 2016 at 5:50 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
>
> # Read a chunk of bytes/characters from an open file.
> def chunkiter(f, delim):
>     buffer = []
>     b = f.read(1)
>     while b:
>         buffer.append(b)
>         if b in delim:
>             yield ''.join(buffer)
>             buffer = []
>         b = f.read(1)
>     if buffer:
>         yield ''.join(buffer)

How bad is it if you over-read? If it's absolutely critical that you
not consume anything from the file beyond the current chunk, then yeah,
it's going to be slow. But if you're never going to read the file using
anything other than this iterator, the best thing to do is to read
more at a time. Simple and naive method:

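import re

def chunkiter(f, delim):
    """Don't use [ or ] as the delimiter, kthx"""
    buffer = ""
    b = f.read(256)
    while b:
        buffer += b
        # Every piece but the last is complete; keep the tail for next time.
        *parts, buffer = re.split("["+delim+"]", buffer)
        yield from parts
        b = f.read(256)  # fetch the next block ("" at EOF ends the loop)
    if buffer: yield buffer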
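
If the restriction on delimiter characters is a problem, one possible variant is sketched below. It escapes the delimiters with re.escape so that characters such as ], ^, - and \ are safe to use. The names chunkiter_escaped and blocksize are hypothetical and not from this thread, delim is assumed to be a non-empty string of single-character delimiters, and like the version above it drops the delimiters instead of keeping them as the quoted code does.

```python
import io
import re

def chunkiter_escaped(f, delim, blocksize=256):
    """Yield delimiter-separated chunks from text file f (delimiters dropped).

    Hypothetical variant: blocksize is an assumed tuning knob, and delim
    must be a non-empty string of single-character delimiters.
    """
    # Build a character class with every delimiter character escaped.
    pattern = re.compile("[" + re.escape(delim) + "]")
    buffer = ""
    while True:
        block = f.read(blocksize)
        if not block:
            break
        buffer += block
        # All pieces but the last are complete; the last may continue
        # into the next block, so it stays in the buffer.
        *parts, buffer = pattern.split(buffer)
        yield from parts
    if buffer:
        yield buffer

# Small block size on purpose, so chunks span several reads.
chunks = list(chunkiter_escaped(io.StringIO("a|b]c|d"), "|]", blocksize=3))
print(chunks)  # ['a', 'b', 'c', 'd']
```

Two consecutive delimiters produce an empty chunk, and a trailing delimiter does not produce a final empty one, which may or may not be what you want.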

How well does that perform?
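
One rough way to find out is sketched below, using timeit on an in-memory io.StringIO, so it measures Python-level overhead and not disk I/O. The helper names, the blocksize parameter, and the sample data are all made up for illustration, and no particular result is claimed.

```python
import io
import re
import timeit

def chunkiter_bytewise(f, delim):
    """One character per read, as in the quoted code (keeps delimiters)."""
    buffer = []
    b = f.read(1)
    while b:
        buffer.append(b)
        if b in delim:
            yield "".join(buffer)
            buffer = []
        b = f.read(1)
    if buffer:
        yield "".join(buffer)

def chunkiter_blocks(f, delim, blocksize=256):
    """Block reads plus re.split (drops delimiters); blocksize is assumed."""
    pattern = re.compile("[" + re.escape(delim) + "]")
    buffer = ""
    b = f.read(blocksize)
    while b:
        buffer += b
        *parts, buffer = pattern.split(buffer)
        yield from parts
        b = f.read(blocksize)
    if buffer:
        yield buffer

# Made-up sample data: 20,000 short records separated by "|".
DATA = "some record text|" * 20000

def consume(func):
    # Exhaust the iterator so the whole input is actually read.
    for _ in func(io.StringIO(DATA), "|"):
        pass

t_bytewise = timeit.timeit(lambda: consume(chunkiter_bytewise), number=3)
t_blocks = timeit.timeit(lambda: consume(chunkiter_blocks), number=3)
print(f"bytewise: {t_bytewise:.3f}s  blocks: {t_blocks:.3f}s")
```

For a real answer you would want to run this against an actual file object and your real delimiters, since buffering in the file layer changes the picture.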

ChrisA


