"sgrep" wrapper module

Darrell news@dorb.com
Sat, 08 May 99 12:46:51 GMT


[Moderator's note: "sgrep (structured grep) is a tool for searching text
files and filtering text streams using structural criteria.  The data model
of sgrep is based on regions, which are non-empty substrings of text."]
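
To make the region model in the note above concrete, here is a minimal pure-Python sketch (modern Python 3, not sgrep itself). It treats a region as a (start, end) pair with an exclusive end and filters outer regions by whether they enclose an inner one. The helper names `phrase_regions`, `spans_between`, `containing` and `not_containing` are hypothetical, and the pairing rule in `spans_between` is a simplification that may differ from sgrep's own `..` operator.

```python
import re

def phrase_regions(text, phrase):
    """Return (start, end) pairs, end exclusive, for each occurrence of phrase."""
    return [(m.start(), m.end()) for m in re.finditer(re.escape(phrase), text)]

def spans_between(text, start_phrase, end_phrase):
    """Pair each start phrase with the nearest following end phrase.

    Simplified, non-nesting pairing; an assumption, not sgrep's exact rule.
    """
    regions = []
    pos = 0
    while True:
        s = text.find(start_phrase, pos)
        if s == -1:
            break
        e = text.find(end_phrase, s + len(start_phrase))
        if e == -1:
            break
        end = e + len(end_phrase)
        regions.append((s, end))
        pos = end  # continue searching after the region just closed
    return regions

def containing(outer, inner):
    """Keep outer regions that fully enclose at least one inner region."""
    return [(s, e) for (s, e) in outer
            if any(s <= i_s and i_e <= e for (i_s, i_e) in inner)]

def not_containing(outer, inner):
    """Keep outer regions that enclose no inner region."""
    kept = set(containing(outer, inner))
    return [r for r in outer if r not in kept]

sample = "<AutoTagger><para>x</para></AutoTagger> <AutoTagger>y</AutoTagger>"
taggers = spans_between(sample, "<AutoTagger>", "</AutoTagger>")
paras = phrase_regions(sample, "para")
with_para = containing(taggers, paras)
without_para = not_containing(taggers, paras)
```

Here `with_para` holds the first AutoTagger region and `without_para` the second; slicing `sample[s:e]` recovers the matched text for any region.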

I wrapped sgrep for Python using SWIG. This is very much a first attempt to
see how it would work.
I modified sgrep a bit to avoid reading files from disk, and fixed a memory
leak and a GP fault.

The test file was 4.7 MB of SGML. These tests aren't very scientific, but if
I waited until everything was perfect, it might be a while.

Source with VC6 workspace and Linux Makefile:
http://www.dorb.com/darrell/

Be sure to look at this for new features.
http://www.cs.helsinki.fi/~jjaakkol/sgrep/README.txt

--Darrell

<P><A HREF="http://www.dorb.com/darrell/">sgrep wrapper</A> - module
to use the <A HREF="http://www.cs.helsinki.fi/~jjaakkol/sgrep/README.txt">sgrep</A>
structured text/*ML search tool from within Python.  (06-May-99)

##### Test sgrep. Used 31 MB. The results are stored in an array class.
>>> import time
>>> from sgrep import *
>>> args=['','"Auto"']
>>> t1= time.time()
>>> buf=open('dism.dsr').read()
>>> l=[]
>>> for x in range(100):
...     l.append(sgrepArgs(args,buf))
...
>>> print time.time()-t1
31.4170000553
>>>
##### Test re. Used 63 MB. The results are stored in a Python list of
##### Python objects.
##### I believe that accounts for the difference in memory and partly in
##### performance.
>>> l=[]
>>> import time, re
>>> from sgrep import *
>>> args=['','"Auto"']
>>> t1= time.time()
>>> buf=open('dism.dsr').read()
>>> l=[]
>>> for x in range(100):
...     l.append(re.findall("Auto",buf))
...
>>> print time.time()-t1
47.5
##### Test sgrep with slightly more complicated queries
>>> import time
>>> from sgrep import *
>>> args=['','("AutoTagger".."/AutoTagger") containing "para"']
>>> t1= time.time()
>>> cc=sgrepArgs(args,'now is the now time')
>>> buf=open('dism.dsr').read()
>>> l=[]
>>> for x in range(100):
...     l.append(sgrepArgs(args,buf))
...
>>> print time.time()-t1
33.1720000505
>>>
>>>
>>> import time
>>> from sgrep import *
>>> args=['','"para" not in ("AutoTagger".."/AutoTagger")']
>>> t1= time.time()
>>> buf=open('dism.dsr').read()
>>> l=[]
>>> for x in range(100):
...     l.append(sgrepArgs(args,buf))
...
>>> print time.time()-t1
33.2180000544
>>> print len(l[0])
6424
>>>
>>>
>>> import time
>>> from sgrep import *
>>> args=['','("AutoTagger".."/AutoTagger") not containing "para"']
>>> t1= time.time()
>>> buf=open('dism.dsr').read()
>>> l=[]
>>> for x in range(100):
...     l.append(sgrepArgs(args,buf))
...
>>> print time.time()-t1
33.1720000505
>>> print len(l[0])
295
>>>
>>>
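
The timings above use time.time() and include reading the file inside the timed span. As a rough sketch of how the re baseline could be re-measured in modern Python 3, the helper below times only the search loop. The name `time_findall` and the synthetic sample buffer are assumptions for illustration; the sgrep wrapper is not covered here because its API is known only from the calls shown in this post.

```python
import re
import time

def time_findall(pattern, text, repeats=100):
    """Time `repeats` runs of findall over `text`, excluding any file I/O.

    Returns (elapsed_seconds, list_of_result_lists).
    """
    compiled = re.compile(pattern)
    start = time.perf_counter()  # monotonic, high-resolution clock
    results = [compiled.findall(text) for _ in range(repeats)]
    elapsed = time.perf_counter() - start
    return elapsed, results

# Synthetic stand-in for the SGML test file, which is not available here.
sample_buf = "<AutoTagger>Auto</AutoTagger>\n" * 1000
elapsed, results = time_findall("Auto", sample_buf, repeats=5)
print("%d searches took %.4f s, %d hits each"
      % (len(results), elapsed, len(results[0])))
```

Keeping every result list alive, as the original loop does, also keeps the memory cost of the Python string objects in the measurement, which matches the comparison being made.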

-- 
----------- comp.lang.python.announce (moderated) ----------
Article Submission Address:  python-announce@python.org
Python Language Home Page:   http://www.python.org/
Python Quick Help Index:     http://www.python.org/Help.html
------------------------------------------------------------