[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack

Andrew Barnert report at bugs.python.org
Sun Jul 20 02:41:35 CEST 2014


Andrew Barnert added the comment:

While we're at it, Douglas Alan's solution wouldn't be an ideal solution even if it were a builtin. A fileLineIter obviously doesn't support the stream API. It means you end up with two objects that share the same file, but have separate buffers and out-of-sync file pointers. And it's a lot slower.

That being said, I think it may be useful enough to put in the stdlib—even more so if you pull the resplit-an-iterator-of-strings code out:

def resplit(strings, separator):
    partialLine = None
    for s in strings:
        if partialLine:
            partialLine += s
        else:
            partialLine = s
        if not s:
            break
        lines = partialLine.split(separator)
        partialLine = lines.pop()
        yield from lines
    if partialLine:
        yield partialLine
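
As a quick sanity check of the generator above, here is a minimal sketch that repeats the function so it runs on its own. The sample chunks are made up for illustration. It suggests that a separator split across two chunks is still recognized, and that the same code works on bytes:

```python
def resplit(strings, separator):
    """Re-split an iterable of str (or bytes) chunks on separator."""
    partialLine = None
    for s in strings:
        if partialLine:
            partialLine += s
        else:
            partialLine = s
        if not s:
            break
        lines = partialLine.split(separator)
        # The last piece may be incomplete; hold it back for the next chunk.
        partialLine = lines.pop()
        yield from lines
    if partialLine:
        yield partialLine


# Hypothetical chunks: the '\r\n' separator straddles the first two.
text_records = list(resplit(['one\r', '\ntwo\r\nthr', 'ee'], '\r\n'))

# Same function on bytes; the trailing separator leaves no empty record.
byte_records = list(resplit([b'a\0b', b'c\0'], b'\0'))

print(text_records)
print(byte_records)
```

Note that a final record is only emitted when it is non-empty, so input ending in the separator does not produce a trailing empty string.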

Now, you can do this:

from functools import partial

with open('rdm-example') as f:
    chunks = iter(partial(f.read, 8192), '')
    lines = resplit(chunks, '\0')
    lines = (line + '\n' for line in lines)
    # lines is lazy, so consume it here, while f is still open

# Or, if you're just going to strip off the newlines anyway:
with open('file-0-example') as f:
    chunks = iter(partial(f.read, 8192), '')
    lines = resplit(chunks, '\0')

# Or, if you have a binary file:
with open('binary-example', 'rb') as f:
    chunks = iter(partial(f.read, 8192), b'')
    lines = resplit(chunks, b'\0')

# Or, if I understand ysj.ray's example:
with open('ysj.ray-example') as f:
    chunks = iter(partial(f.read, 8192), '')
    lines = resplit(chunks, '\r\n')
    records = resplit(lines, '\t')

# Or, if you have something that isn't a file at all:
lines = resplit((packet.body for packet in packets), '\n')
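
The caveat from the first paragraph (separate buffers and out-of-sync file pointers) applies to the chunked approach too. The following is only a sketch: it uses io.StringIO as a stand-in for a real file and a deliberately tiny chunk size of 4, both of which are assumptions made for illustration, and it repeats resplit so that it runs on its own:

```python
import io
from functools import partial


def resplit(strings, separator):
    partialLine = None
    for s in strings:
        if partialLine:
            partialLine += s
        else:
            partialLine = s
        if not s:
            break
        lines = partialLine.split(separator)
        partialLine = lines.pop()
        yield from lines
    if partialLine:
        yield partialLine


f = io.StringIO('a\0b\0c\0d')           # stand-in for an open text file
chunks = iter(partial(f.read, 4), '')   # tiny chunk size, for illustration
records = resplit(chunks, '\0')

first = next(records)   # only one record has been handed out so far...
position = f.tell()     # ...but the stream has already advanced a full chunk
rest = list(records)

print(first, position, rest)
```

So any code that reads from f directly after pulling a record from the iterator will skip whatever is sitting in the iterator's partial buffer.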

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1152248>
_______________________________________

