Regex for strings utility
Bengt Richter
bokr at accessone.com
Tue Jul 17 16:31:27 EDT 2001
On Tue, 17 Jul 2001 14:25:52 -0500, Skip Montanaro <skip at pobox.com> wrote:
>
> rhys> I'm trying to write a script which operates like the Unix
> rhys> 'strings' utility but I'm having difficulties with the regex.
> ...
> rhys> I'm getting a Syntax Error: Invalid Token at the closing brace to
> rhys> the pattern.
>
>You have a couple problems. First, the pattern needs to be a string, so it
>has to be enclosed in quotes. Second, the terminating character for the for
>loop needs to be a colon. Third, based upon the way you imported re, you
>need to refer to the findall function as re.findall.
>
>Here's a slightly revised version of your script:
>
> #!/usr/bin/env python
>
> # strings program
>
> import sys, re
>
> f = open(sys.argv[1])
> line = f.readline()
> pattern = re.compile("[\040-\126\s]{4,}")
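Fourth ... ;-)
I don't think he means \126, since
>>> "\126 is not \176"
'V is not ~'
Octal \126 is 'V', so the range stops there and leaves out 'W' through 'Z',
all the lowercase letters and several punctuation marks; the top of
printable ASCII is \176 ('~'). And since \s already covers the space
(\040), it would be clearer as
pattern = re.compile(r"[!-~\s]{4,}")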
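A quick Python 3 check makes the \126 problem concrete. This is an illustrative sketch: the sample string is invented, and raw strings are used so that the regex engine, rather than the string literal, interprets the octal escapes and \s.

```python
import re

# The class from the quoted script: octal \040 (space) through \126 ('V'), plus whitespace.
quoted = re.compile(r"[\040-\126\s]{4,}")
# The suggested class: '!' (octal \041) through '~' (octal \176), plus whitespace.
printable = re.compile(r"[!-~\s]{4,}")

sample = "hello WORLD"  # made-up input for illustration

# \126 is 'V', so lowercase letters and 'W'..'Z' fall outside the quoted range.
print(quoted.findall(sample))
# With '~' as the upper bound, the whole printable ASCII run matches.
print(printable.findall(sample))
```

It prints ['ORLD'] and then ['hello WORLD']: with \126 as the upper bound only the run after the 'W' survives, while the !-~ class keeps the whole string.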
>
> while line:
>     # find runs of 4 or more printable characters in this line
>     matches = re.findall(pattern, line)
>     for match in matches:
>         print(match)
>     line = f.readline()
>
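For comparison, here is a minimal Python 3 sketch of the whole script built around the [!-~\s]{4,} class. It is not the original poster's code: the function names find_strings and print_strings are hypothetical, and it assumes the input should be scanned as raw bytes (the usual targets are binary files), so it uses a bytes pattern and opens the file in "rb" mode.

```python
#!/usr/bin/env python3
# Illustrative Python 3 sketch of a 'strings'-like script; function names are hypothetical.
import re
import sys


def find_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes, decoded to str.

    The class covers '!' (0x21) through '~' (0x7E) plus ASCII whitespace;
    a bytes pattern lets it scan binary data without decoding first.
    """
    pattern = re.compile(rb"[!-~\s]{%d,}" % min_len)
    # Every matched byte is ASCII, so decoding cannot fail.
    return [run.decode("ascii") for run in pattern.findall(data)]


def print_strings(path, min_len=4):
    """Print each printable run found in the file at path."""
    # Open in binary mode: object files and executables are not valid text.
    with open(path, "rb") as f:
        data = f.read()
    for s in find_strings(data, min_len):
        print(s)


if __name__ == "__main__":
    # Scan every file named on the command line; do nothing if none is given.
    for path in sys.argv[1:]:
        print_strings(path)
```

Reading the whole file at once, instead of line by line, keeps a printable run that crosses a newline together (\s matches \n), at the cost of holding the file in memory. For example, find_strings(b"\x00\x01hello\x02ab\x03WORLD!\xff") returns ['hello', 'WORLD!'].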
More information about the Python-list mailing list