File reading using delimiters

Mon Jun 9 20:32:38 EDT 2003

kylotan at hotmail.com (Kylotan) wrote in message news:<153fa67.0306090752.218d23b1 at posting.google.com>...
> All the examples of reading files in Python seem to concern reading a
> line at a time. But this is not much good to me as I want to be able
> to read up to arbitrary delimiters without worrying about how many
> lines I'm spanning. With my rudimentary Python knowledge I'm having to
> read in multiple lines, concatenate them, search for the delimiter,
> split the result if necessary, and carry forward whatever was after
> the delimiter to the next operation. Is there a better way of reading
> until a certain character is encountered, and no more?

My Pexpect module is good for this type of scanning.
    http://pexpect.sourceforge.net/
Your code might then look like the following examples. Pexpect can match single
characters, arbitrary strings, regular expressions, or a list of any of these.
Note that Pexpect works on file descriptors. It doesn't operate
directly on file-like objects, so it won't work on a StringIO object
(maybe I'll add that feature in a future version). But true file objects
have a file descriptor, so this should do what you want.

Store this test data in a file called "my_happy_file":
    This is data for the first chunk
    FIRST_DELIMITER This is now data for the second chunk.
    Notice that this chunk can span multiple lines.
    The delimiter can also be specified as a regular expression.
    The delimiter does not need to be on a separate line. SECOND_DELIMITER

--- First example -------------------------------------------------------------
import pexpect
fin = open('my_happy_file', 'r')
# Uses the file descriptor of fin. (Later Pexpect releases provide this
# through pexpect.fdpexpect.fdspawn instead of pexpect.spawn.)
reader = pexpect.spawn(fin.fileno())
reader.expect('FIRST_DELIMITER')
first_chunk = reader.before  # Everything before the matched delimiter.
reader.expect('SEC.*_DELIMITER')  # Delimiters may be regular expressions.
second_chunk = reader.before
print(first_chunk)
print(second_chunk)
-------------------------------------------------------------------------------

Note that since you can look for a regular expression with subgroups,
you could also match all your fields with one regular expression:

--- Second example ------------------------------------------------------------
import pexpect
fin = open('my_happy_file', 'r')
reader = pexpect.spawn(fin.fileno())
# One pattern with two subgroups captures both chunks in a single expect().
reader.expect('(.*)FIRST_DELIMITER(.*)SECOND_DELIMITER')
print(reader.match.group(1))
print(reader.match.group(2))
-------------------------------------------------------------------------------
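
If you would rather not depend on file descriptors (for example, to read
from a StringIO object), the buffering approach you describe can be wrapped
up in a small helper. This is only a rough sketch using the standard library,
not part of Pexpect. The DelimitedReader class, its read_until method, and
the chunk_size parameter are names made up for illustration, and it handles
plain string delimiters only, not regular expressions.

```python
import io


class DelimitedReader:
    """Hypothetical helper: read text from a file-like object up to a delimiter."""

    def __init__(self, stream, chunk_size=4096):
        self.stream = stream          # any object with a read(n) method
        self.chunk_size = chunk_size  # how much to read at a time
        self.buffer = ''              # text read but not yet returned

    def read_until(self, delimiter):
        """Return the text before the next delimiter and consume the delimiter.

        If the stream ends first, return whatever text is left.
        """
        while True:
            before, found, after = self.buffer.partition(delimiter)
            if found:
                # Carry forward whatever came after the delimiter.
                self.buffer = after
                return before
            data = self.stream.read(self.chunk_size)
            if not data:
                # End of stream: hand back the remainder.
                self.buffer = ''
                return before
            self.buffer += data


# Assumed sample input, modelled on the test data above.
sample = io.StringIO(
    "This is data for the first chunk\n"
    "FIRST_DELIMITER This is now data for the second chunk.\n"
    "The delimiter does not need to be on a separate line. SECOND_DELIMITER\n"
)
reader = DelimitedReader(sample, chunk_size=16)
first_chunk = reader.read_until('FIRST_DELIMITER')
second_chunk = reader.read_until('SECOND_DELIMITER')
print(first_chunk)
print(second_chunk)
```

The small chunk_size here is only to show that a delimiter split across two
reads is still found, since unmatched text stays in the buffer until the next
read is appended to it.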

Yours,
Noah