Any fancy grep utility replacements out there?

John J. Lee jjl at pobox.com
Sun Apr 6 08:08:58 EDT 2008


"samslists at gmail.com" <samslists at gmail.com> writes:

> So I need to recursively grep a bunch of gzipped files.  This can't be
> easily done with grep, rgrep or zgrep.  (I'm sure given the right
> pipeline including using the find command it could be done....but
> seems like a hassle).
>
> So I figured I'd find a fancy next generation grep tool.  Thirty
> minutes of searching later I find a bunch in Perl, and even one in
> Ruby.  But I can't find anything that interesting or up to date for
> Python.  Does anyone know of something?
>
> Thanks

There must be a million of these scripts out there, maybe one per
programmer :-) Here's mine:

http://codespeak.net/svn/user/jjlee/trunk/pygrep/


It doesn't do zip files.  It has the usual file / dir blacklisting
feature (for avoiding backup files, etc.).

Oddities of this particular script are support for searching for
Python tokens in .py files, doctests, doctest files, and preppy 2
.prep template files.  It also outputs in a format that allows you to
click on matches in emacs.
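
For the original question (recursive search through gzipped files), here is a minimal sketch of the general idea. It is not pygrep's code: the blacklist patterns and the names `SKIP_DIRS`, `SKIP_FILES`, `open_text` and `grep_tree` are made up for illustration, and the sketch assumes the files are text that can be decoded as UTF-8 (undecodable bytes are replaced).

```python
import fnmatch
import gzip
import os
import re

# Hypothetical blacklist, for illustration only (not taken from pygrep).
SKIP_DIRS = {".svn", ".git", "CVS"}
SKIP_FILES = ["*~", "*.bak", "*.pyc", "#*#"]


def open_text(path):
    """Open path as text, decompressing on the fly if the name ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt", encoding="utf-8", errors="replace")


def grep_tree(pattern, root):
    """Yield (path, line_number, line) for every matching line under root."""
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune blacklisted directories in place so os.walk never enters them.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            if any(fnmatch.fnmatch(name, pat) for pat in SKIP_FILES):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open_text(path) as f:
                    for lineno, line in enumerate(f, 1):
                        if regex.search(line):
                            yield path, lineno, line.rstrip("\n")
            except (OSError, EOFError):
                # Unreadable, corrupt or truncated file: skip it.
                continue
```

Printing each hit as `"%s:%d:%s" % (path, lineno, line)` gives the usual grep-style `file:line:text` output, which emacs's grep and compilation modes can turn into clickable matches. Whether pygrep uses exactly that format is an assumption here.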

A few years back I was going to release it in the hope that other
people would write plugins for other templating systems, but then I
stopped doing lots of web stuff.

Actually, tokenizing on a simple fixed "word boundary" rule seems to
work just as well in many cases (pygrep doesn't do that).  Proper
tokenization can still be quite handy, though: searching for a
particular Python name, string or number can be just what's needed
(pygrep does support that, e.g. <no options>, -sep, -sebp, -nep).
Most of the time I use the -t option, which is a plain substring
match, because it's fast and good enough for most cases (most search
strings are longish and so don't give lots of false positives).  The
default is tokenized search for files it knows how to tokenize (.py,
.prep, etc.) and substring match for every other file that's not
blacklisted.  I find this good for small projects, but too slow
(there's no caching) for large projects.
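
To make the three kinds of match concrete, here is a small sketch using only the standard library. It is not how pygrep is implemented; the function names and the `SOURCE` sample are invented for the example. It contrasts a plain substring test, a fixed word-boundary regex, and a search restricted to Python NAME tokens via the `tokenize` module.

```python
import io
import re
import tokenize


def substring_match(line, needle):
    """Plain substring test: fast, but matches inside longer words too."""
    return needle in line


def word_match(line, needle):
    """Fixed "word boundary" rule: needle must stand alone as a word."""
    return re.search(r"\b" + re.escape(needle) + r"\b", line) is not None


def name_lines(source, name):
    """Return the line numbers where `name` occurs as a Python NAME token."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string == name:
            hits.append(tok.start[0])
    return hits


# Sample input: "count" appears as a name, inside another name,
# inside a string literal, and inside a comment.
SOURCE = '''count = 1
recount = count + 1
print("count me in")  # count again
'''
```

On `SOURCE`, the substring and word-boundary tests both hit the third line, because they cannot tell a string or comment from code, while `name_lines(SOURCE, "count")` reports only lines 1 and 2.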

Somebody at work has a nice little web-based tool that you can run as
a local server and that turns tokens (e.g. Python names, though it's
based on some fast simple tokenizer that doesn't know about Python)
into links you can click on.  The CSS is written so the link styling
doesn't show up until you hover the mouse over a token, IIRC.  It
seems very efficient for exploring/reading and navigating source
code.  The only reason I don't use it is that it's not integrated
with emacs.  It would be great if somebody could do the same in
emacs, with back / forward buttons :-)
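
A rough sketch of how that kind of token linking could work, purely as a guess at the idea and not a description of the actual tool: the `/search?q=` URL scheme, the `tok` CSS class and the function name `linkify` are all hypothetical.

```python
import html
import re

# Simple language-agnostic tokenizer: anything that looks like an identifier.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Link styling stays invisible until the mouse hovers over a token.
CSS = (
    "a.tok { color: inherit; text-decoration: none; } "
    "a.tok:hover { text-decoration: underline; }"
)


def linkify(source, search_url="/search?q="):
    """Return HTML in which every identifier-like token is a search link."""
    out = []
    pos = 0
    for m in TOKEN.finditer(source):
        # Escape the text between tokens (operators, quotes, whitespace).
        out.append(html.escape(source[pos:m.start()]))
        word = m.group(0)  # only [A-Za-z0-9_], so safe in HTML and in a URL
        out.append('<a class="tok" href="%s%s">%s</a>' % (search_url, word, word))
        pos = m.end()
    out.append(html.escape(source[pos:]))
    return "".join(out)
```

A local server would wrap the result in a `<pre>` block together with `CSS` and answer each link with a list of matches for the clicked token.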


John



More information about the Python-list mailing list