64 bit offsets?

jay thompson jayryan.thompson at gmail.com
Wed Oct 6 21:35:05 EDT 2010


As nice as it would be to use 64-bit offsets, I am instead mmapping the file
in 1GB chunks and getting the results I need. I would still be interested in
a 64-bit solution, though.
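
For anyone wanting to try the chunked approach, here is a minimal sketch of
one way it could look. It is not the exact code I am running: the function
name find_offsets and the chunk_size and overlap parameters are made up for
illustration, and it is written for Python 3 (bytes pattern, mmap used as a
context manager). Each window is mapped with `overlap` extra bytes past its
end so that a match straddling a chunk boundary can still complete, which
assumes no match is longer than `overlap` bytes.

```python
import mmap
import os
import re


def find_offsets(path, pattern, chunk_size=1 << 30, overlap=4096):
    """Yield absolute start offsets of matches, scanning the file in windows.

    `pattern` is a bytes regex. `overlap` is an assumed upper bound on the
    length of any single match (hypothetical parameter, tune as needed).
    """
    regex = re.compile(pattern)
    gran = mmap.ALLOCATIONGRANULARITY
    # The mmap offset must be a multiple of the allocation granularity,
    # so round the chunk size down to one (but never below one unit).
    chunk_size = max(gran, chunk_size - chunk_size % gran)
    file_size = os.path.getsize(path)

    with open(path, 'rb') as f:
        base = 0
        while base < file_size:
            # Map the chunk plus the overlap, clipped to the end of the file.
            length = min(chunk_size + overlap, file_size - base)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=base) as window:
                # Keep only matches that start inside this chunk; a match
                # starting in the overlap is reported by the next window.
                starts = [m.start() for m in regex.finditer(window)
                          if m.start() < chunk_size]
            for start in starts:
                # Plain Python int arithmetic, so the result can exceed 2GB.
                yield base + start
            base += chunk_size
```

Something like `for offset in find_offsets(file_path, b"pattern"):
write_to_sqlite(offset)` would then replace the original loop. One caveat:
each window is scanned independently, so a pattern that can overlap itself,
or whose match depends on context far outside the window, may behave
differently than a single pass over the whole file would.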

jt

On Wed, Oct 6, 2010 at 2:41 PM, jay thompson <jayryan.thompson at gmail.com> wrote:

> Hello everyone,
>
> I'm trying to extract some data from a large memory-mapped file (the
> largest is ~30GB) with re.finditer() and the match object's start().
> Python's regular expression module is great, but the value returned by
> start() is 32 bits (signed, so I can really only address 2GB). I was
> wondering if anyone here had some suggestions on how to get the long
> offsets I need. By the way, I can't break up the file because the pattern
> I'm looking for can occur anywhere and on any boundary.
>
> Also, is seek() limited to 32-bit addresses?
>
> This is what I have in Python 2.7 AMD64:
>
>
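> import mmap
> import re
>
> # Placeholder pattern, compiled once up front.
> pattern = re.compile("pattern")
>
> with open(file_path, 'rb') as f:
>     # Length 0 maps the whole file; the mapping is read-only.
>     file_map = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
>
>     for match in pattern.finditer(file_map):
>         # start() is the value that is limited to 32 bits here.
>         offset = match.start()
>         write_to_sqlite(offset)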
>
>
>
>
>
> --
> "It's quite difficult to remind people that all this stuff was here for a
> million years before people. So the idea that we are required to manage it
> is ridiculous. What we are having to manage is us."   ...Bill Ballantine,
> marine biologist.
>
>

