[Chicago] is there really no built-in file/iter split() thing?

Kumar McMillan kumar.mcmillan at gmail.com
Fri Nov 30 22:49:55 CET 2007


[In the hope that Chris has another awesome response...]

Here is another: I have a big SQL file (45M) and need to iterate through
the statements (no fancy SQL parsing, I just want the statements).
Assuming open('big.sql').read().split(';') would be a dumb idea, since it
loads the whole file into memory at once, I couldn't find anything in the
stdlib to do this.  What am I missing?  I thought the tokenize module
would do it, but I couldn't see how at first glance.

def readsplit(filelike, token):
    """Yield each chunk between occurrences of token in a file-like object.

    For example::

        >>> from io import StringIO
        >>> [c for c in readsplit(StringIO('''bad; ass; elf in
        ... the forest;'''), ';')]
        ...
        ['bad', ' ass', ' elf in\\nthe forest', '']
        >>> [c for c in readsplit(StringIO(''';
        ... 1,2,3;
        ...    and 4; and
        ... even 5'''), ';')]
        ...
        ['', '\\n1,2,3', '\\n   and 4', ' and\\neven 5']
        >>>

    """
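    buf = []  # holds the unfinished chunk carried over from earlier lines
    for line in filelike:
        # Join the carried-over text with the new line before splitting.
        buf.append(line)
        line = ''.join(buf)
        buf[:] = []
        # Split on the token passed in, not a hardcoded ';'.
        chunks = line.split(token)
        # Every piece except the last is terminated by a token.
        for chunk in chunks[:-1]:
            yield chunk
        # The last piece may continue on the next line, so keep it.
        buf.append(chunks[-1])
    if len(buf):
        # Whatever follows the final token (possibly '').
        yield ''.join(buf)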
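
A possible variation, in case the file has very long lines or no newlines at all: read fixed-size blocks instead of lines. This is only a sketch. The name readsplit_blocks and the blocksize parameter are made up for illustration and are not from the stdlib. Unlike readsplit(), it mirrors str.split() on empty input and yields a single '' rather than nothing.

```python
from io import StringIO


def readsplit_blocks(filelike, token, blocksize=64 * 1024):
    """Yield the text between occurrences of token, reading fixed-size blocks.

    Hypothetical variant of readsplit(): memory use is bounded by roughly
    blocksize plus the longest chunk, even if the input has no newlines.
    """
    pending = ''  # text after the last token seen so far
    while True:
        block = filelike.read(blocksize)
        if not block:
            break
        pieces = (pending + block).split(token)
        # The last piece may be incomplete, or may end with part of a
        # multi-character token, so carry it over to the next block.
        pending = pieces.pop()
        for piece in pieces:
            yield piece
    # Whatever follows the final token (possibly '').
    yield pending


# Usage: strip whitespace and drop empty statements.
sql = StringIO("create table t (id int);\ninsert into t values (1);\n")
statements = [s.strip() for s in readsplit_blocks(sql, ';', blocksize=8)]
statements = [s for s in statements if s]
print(statements)
# ['create table t (id int)', 'insert into t values (1)']
```

The tiny blocksize in the usage lines is only there to exercise the carry-over across block boundaries; for a 45M file the default should be fine.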
-------------- next part --------------
A non-text attachment was scrubbed...
Name: readsplit.py
Type: text/x-python
Size: 976 bytes
Desc: not available
Url : http://mail.python.org/pipermail/chicago/attachments/20071130/b5be0dc4/attachment.py 

