Regex for strings utility

Bengt Richter bokr at accessone.com
Tue Jul 17 16:31:27 EDT 2001


On Tue, 17 Jul 2001 14:25:52 -0500, Skip Montanaro <skip at pobox.com> wrote:

>
>    rhys> I'm trying to write a script which operates like the Unix
>    rhys> 'strings' utility but I'm having difficulties with the regex.
>    ...
>    rhys> I'm getting a Syntax Error: Invalid Token at the closing brace to
>    rhys> the pattern.
>
>You have a couple problems.  First, the pattern needs to be a string, so it
>has to be enclosed in quotes.  Second, the terminating character for the for
>loop needs to be a colon.  Third, based upon the way you imported re, you
>need to refer to the findall function as re.findall.
>
>Here's a slightly revised version of your script:
>
>    #!/usr/bin/env python
>
>    # strings program
>
>    import sys, re
>
>    f = open(sys.argv[1])
>    line = f.readline()
>    pattern = re.compile("[\040-\126\s]{4,}")
Fourth ... ;-)
I don't think he means \126. That is octal for 'V'; the tilde is
decimal 126, which is \176 in octal:
 >>> "\126 is not \176"
 'V is not ~'
And since \s already covers the space (\040), the class would be clearer as
     pattern = re.compile(r"[!-~\s]{4,}")
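To see the difference concretely, here is a small sketch in Python 3
syntax. The names old_class, new_class, and sample are just illustrative;
they are not from the original script.

```python
import re

# Octal escapes in string literals: \126 is 'V', \176 is '~', \040 is space.
assert "\126" == "V"
assert "\176" == "~"
assert "\040" == " "

# The range as posted stops at 'V', so lowercase letters fall outside it.
old_class = re.compile("[\040-\126\\s]{4,}")
# The range '!' through '~' covers all printable ASCII; \s adds whitespace.
new_class = re.compile(r"[!-~\s]{4,}")

sample = "hello world"
print(old_class.findall(sample))   # []
print(new_class.findall(sample))   # ['hello world']
```

With the \126 upper bound, nothing in the sample qualifies as a run of
four or more, because every lowercase letter sorts after 'V'.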
>
>    while line:
>            # regular expression to match strings >=4 chars goes here
>            matches = re.findall(pattern, line)
>            for match in matches:
>                    print match
>            line = f.readline()
>
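For what it's worth, a 'strings'-like tool usually wants to look at raw
bytes rather than text lines. Below is a minimal sketch under that
assumption, in Python 3 syntax. The helper name find_strings is
hypothetical, and this is one possible approach, not the script from the
thread.

```python
import re

# Runs of 4 or more printable ASCII bytes or ASCII whitespace.
PATTERN = re.compile(rb"[!-~\s]{4,}")

def find_strings(data):
    """Return the printable runs found in a bytes object, decoded as ASCII."""
    return [run.decode("ascii") for run in PATTERN.findall(data)]

# Example on an in-memory buffer; for a real file, pass
# open(path, "rb").read() instead.
blob = b"\x00\x01abcd\x02xy\x00hello world\xff"
for s in find_strings(blob):
    print(s)
```

Note that because \s is in the class, a run can include newlines and tabs,
so a match may span several lines of the input.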



