[issue1152248] Add support for reading records with arbitrary separators to the standard IO stack
Andrew Barnert
report at bugs.python.org
Sun Jul 20 02:41:35 CEST 2014
Andrew Barnert added the comment:
While we're at it, Douglas Alan's solution wouldn't be an ideal solution even if it were a builtin. A fileLineIter obviously doesn't support the stream API. It means you end up with two objects that share the same file, but have separate buffers and out-of-sync file pointers. And it's a lot slower.
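To illustrate the out-of-sync problem, here is a minimal sketch. The helper `first_record` and the tiny chunk size are hypothetical, made up for this illustration; this is not Douglas Alan's actual code. The wrapper reads ahead into its own buffer, so the underlying file's position ends up past the record boundary:

```python
import io
from functools import partial

def first_record(f, separator, chunk_size=8):
    """Hypothetical helper: read chunks from f until one full record is seen.

    Returns the record plus whatever was read past it, which now lives
    only in this helper's private buffer, not in the file object.
    """
    buffered = ''
    for chunk in iter(partial(f.read, chunk_size), ''):
        buffered += chunk
        if separator in buffered:
            record, _, leftover = buffered.partition(separator)
            return record, leftover
    return buffered, ''

f = io.StringIO('alpha\0beta\0gamma\0')
record, leftover = first_record(f, '\0')

# The file pointer is at the end of the first chunk, not at the end of
# the first record, so reading from f directly skips the leftover text.
rest = f.read()
```

Anyone who mixes calls on the wrapper with calls on `f` has to reconcile `leftover` with `rest` by hand, which is the problem a real stream-API solution would avoid.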
That being said, I think it may be useful enough to put in the stdlib—even more so if you pull the resplit-an-iterator-of-strings code out:
def resplit(strings, separator):
    partialLine = None
    for s in strings:
        if partialLine:
            partialLine += s
        else:
            partialLine = s
        if not s:
            break
        lines = partialLine.split(separator)
        partialLine = lines.pop()
        yield from lines
    if partialLine:
        yield partialLine
Now, you can do this:
from functools import partial

# Note: resplit is lazy, so consume `lines` inside each with block,
# while the file is still open.
with open('rdm-example') as f:
    chunks = iter(partial(f.read, 8192), '')
    lines = resplit(chunks, '\0')
    lines = (line + '\n' for line in lines)

# Or, if you're just going to strip off the newlines anyway:
with open('file-0-example') as f:
    chunks = iter(partial(f.read, 8192), '')
    lines = resplit(chunks, '\0')

# Or, if you have a binary file:
with open('binary-example', 'rb') as f:
    chunks = iter(partial(f.read, 8192), b'')
    lines = resplit(chunks, b'\0')

# Or, if I understand ysj.ray's example:
with open('ysj.ray-example') as f:
    chunks = iter(partial(f.read, 8192), '')
    lines = resplit(chunks, '\r\n')
    records = resplit(lines, '\t')

# Or, if you have something that isn't a file at all:
lines = resplit((packet.body for packet in packets), '\n')
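As a quick check that does not need any files, the sketch below repeats the resplit function from this message and feeds it hand-made chunks (the chunk contents are invented for illustration). It shows that a multi-character separator split across two chunks is still found, and that the same code works on bytes:

```python
def resplit(strings, separator):
    partialLine = None
    for s in strings:
        if partialLine:
            partialLine += s
        else:
            partialLine = s
        if not s:
            break
        lines = partialLine.split(separator)
        # The last piece may be an incomplete record; hold it back
        # until the next chunk arrives.
        partialLine = lines.pop()
        yield from lines
    if partialLine:
        yield partialLine

# The '\r\n' separator straddles the boundary between the two chunks.
text_records = list(resplit(['a\r', '\nb\r\nc'], '\r\n'))

# Same function, bytes input; the second record spans two chunks.
byte_records = list(resplit([b'x\0y', b'z\0'], b'\0'))
```

One thing to be aware of: because the final piece is only yielded when it is non-empty, a trailing separator does not produce an empty last record.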
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1152248>
_______________________________________