Regex for strings utility

Skip Montanaro skip at pobox.com
Tue Jul 17 15:25:52 EDT 2001


    rhys> I'm trying to write a script which operates like the Unix
    rhys> 'strings' utility but I'm having difficulties with the regex.
    ...
    rhys> I'm getting a Syntax Error: Invalid Token at the closing brace to
    rhys> the pattern.

You have a few problems.  First, the pattern needs to be a string, so it
has to be enclosed in quotes.  Second, the for statement needs to end with
a colon.  Third, given the way you imported re, you need to refer to the
findall function as re.findall.

Here's a slightly revised version of your script:

    #!/usr/bin/env python

    # strings program

    import sys, re

    f = open(sys.argv[1])
    line = f.readline()
    # printable ASCII, space (octal 040) through ~ (octal 176), plus
    # any other whitespace
    pattern = re.compile(r"[\040-\176\s]{4,}")

    while line:
        # find runs of 4 or more matching characters in this line
        matches = re.findall(pattern, line)
        for match in matches:
            print(match)
        line = f.readline()
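
For what it's worth, on Python 3 the same idea might be sketched roughly as
follows, reading the data as bytes so that non-text content can't trip up
decoding.  The function name find_strings and the min_len parameter are my
own inventions for illustration, not part of your script, and I've assumed
you'd want newlines left out of the character class so each run stays on
one line.

```python
import re

def find_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes found in data.

    Hypothetical helper for illustration: data is a bytes object, and the
    result is a list of str, one entry per run.
    """
    # \x20-\x7e is space through ~; tab is allowed as well.
    # The raw bytes literal leaves the escapes for the re module to interpret.
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(data)]
```

Something like find_strings(open(sys.argv[1], "rb").read()) would then give
you the list to print, at the cost of reading the whole file into memory.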

-- 
Skip Montanaro (skip at pobox.com)
http://www.mojam.com/
http://www.musi-cal.com/



