regular expressions with other string classes

Andrew Dalke dalke at dalkescientific.com
Wed Sep 19 06:19:29 EDT 2001


[Note: proposed change to mmap.mmap at the end of this message.]

Nicholas FitzRoy-Dale wrote
> I'm doing regular expression search on an MMAPed file without issues.
> From code to index an mbox-style file:

> mboxMap = mmap.mmap (handle.fileno(), getFileLength (self.sourceFilename),
>   mmap.MAP_SHARED, mmap.PROT_READ)

Well, there's my problem.  I've nearly no clue on how to use
the mmap module, so I assumed I could use the defaults.

Just tried out what you did, and it works.

The following doesn't work:

$ ./python
Python 2.2a3+ (#7, Sep 19 2001, 03:31:19)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> handle = open("/etc/passwd", "r")
>>> import mmap
>>> map = mmap.mmap(handle.fileno(), 0)
>>> import re
>>> pat = re.compile("^(.*(?=dalke).*)$", re.MULTILINE)
>>> m = pat.search(map)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: expected string or buffer
>>>

In this case, the mmap call is missing the MAP_SHARED flag
(which is the default, so should be fine).  It's also missing
the PROT_READ flag, so the default of PROT_READ | PROT_WRITE
is used.

I guess that's the problem, but you can see why the error message
threw me off.



Would it be useful if mmap.mmap was changed to something like

def proposed_mmap(file, size = 0, flags = mmap.MAP_SHARED,
                  prot = None):
  # Roughly like PyObject_AsFileDescriptor in Objects/fileobject.c
  if hasattr(file, "fileno"):
    fileno = file.fileno()
  else:
    fileno = file

  # See if we need to figure out the default for "prot"
  if prot is None:
    # File-like object may have a "mode" attribute defined
    # If so, use it, otherwise default to "rw"
    mode = getattr(file, "mode", "rw")
    prot = 0
    if "r" in mode:
        prot |= mmap.PROT_READ
    if "w" in mode:
        prot |= mmap.PROT_WRITE
    if prot == 0:
        prot = mmap.PROT_NONE

  return mmap.mmap(fileno, size, flags, prot)

This would allow people like me to do

  handle = open("filename")
  map = mmap.mmap(handle)

and have it just work.  (Unless I do 'mmap.mmap(open("filename"))'
since then I'll lose a reference count and the file handle gets
closed from underneath me.  I think.)

Comments?

                    Andrew
                    dalke at dalkescientific.com






More information about the Python-list mailing list