Discussion on some Code Issues

Fri Jul 6 03:35:16 EDT 2012

subhabangalore at gmail.com wrote:

[Please don't top-post]

>> start = 0
>> for match in re.finditer(r"\$", data):
>>     end = match.start()
>>     print(start, end)
>>     print(data[start:end])
>>     start = match.end()

> That is a nice one. I am wondering whether I can write "for lines in f" sort of
> code, which is easy, but then how do I find the slices?

You have to keep track of both the offset of the line and the offset within 
the line:

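import re

def offsets(lines, pos=0):
    # Yield each item together with its absolute start offset.
    for line in lines:
        yield pos, line
        pos += len(line)

# lines is any iterable of lines, e.g. the open file.
start = 0
for line_start, line in offsets(lines):
    # re.split() with a capturing group keeps the "$" separators as parts.
    for pos, part in offsets(re.split(r"(\$)", line), line_start):
        if part == "$":
            print(start, pos)
            start = pos + 1  # the next text begins right after the "$"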

(untested code, I'm assuming that the file ends with a $)
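
For what it's worth, here is a self-contained sketch of the same idea that 
can be run without a file. The sample string and the dollar_slices() helper 
are my own inventions for illustration; the line list just stands in for 
what iterating over the file would give you.

```python
import re

def offsets(items, pos=0):
    """Yield (absolute_offset, item) for consecutive string items."""
    for item in items:
        yield pos, item
        pos += len(item)

def dollar_slices(lines):
    """Yield (start, end) for every text that is terminated by a "$"."""
    start = 0
    for line_start, line in offsets(lines):
        # The capturing group makes re.split() keep each "$" as its own part.
        for pos, part in offsets(re.split(r"(\$)", line), line_start):
            if part == "$":
                yield start, pos
                start = pos + 1

# Hypothetical sample data standing in for the file contents.
data = "first text$second\ntext$third$"
lines = data.splitlines(keepends=True)  # same pieces as "for line in f"

slices = list(dollar_slices(lines))
texts = [data[start:end] for start, end in slices]
print(slices)
print(texts)
```

Note that the offsets are character positions in the text as read, so they 
match slices of the string; they are not necessarily byte positions that 
you could pass to f.seek().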

> btw do you know whether I can convert an index position in the file to a
> position in the list, provided I am building the list from the same file
> we are reading?

Use a lookup list with the end positions of the texts and then find the 
relevant text with bisect.

>>> import bisect
>>> ends = [10, 20, 50]
>>> filepos = 15
>>> bisect.bisect(ends, filepos)  # position 15 belongs to the second text
1
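
A small sketch of how that lookup might be wrapped up, assuming the end 
positions are exclusive (like slice ends). The text_index() helper is 
hypothetical, not something from the standard library; it only adds a check 
for positions past the last text.

```python
import bisect

# Hypothetical end offsets (exclusive) of three texts in the file.
ends = [10, 20, 50]

def text_index(filepos, ends):
    """Return the list index of the text that contains filepos."""
    index = bisect.bisect(ends, filepos)
    if index == len(ends):
        raise ValueError("position %d is past the last text" % filepos)
    return index

indices = [text_index(p, ends) for p in (0, 9, 10, 15, 49)]
print(indices)
```

With bisect.bisect() (an alias for bisect_right()) a position equal to an 
end offset is counted as part of the following text, which is what you want 
when the ends are exclusive.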