Using a function for regular expression substitution

MRAB python at mrabarnett.plus.com
Sun Aug 29 13:14:15 EDT 2010


On 29/08/2010 15:22, naugiedoggie wrote:
> Hello,
>
> I'm having a problem with using a function as the replacement in
> re.sub().
>
> Here is the function:
>
> def normalize(s) :
>      return urllib.quote(string.capwords(urllib.unquote(s.group('provider'))))

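This normalises the provider and returns only that. re.sub replaces the
whole match with whatever the function returns, and the match includes
"Search_Provider=", so the parameter name is lost. The rest of the line,
outside the match, is left alone.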
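A small Python 3 sketch of that behaviour (urllib.quote/unquote are urllib.parse.quote/unquote there). The log line is made up for illustration; the function is the original one, ported:

```python
import re
import string
from urllib.parse import quote, unquote  # Python 3 home of urllib.quote/unquote

provider_matcher = re.compile(r'(?P<search>Search_Provider)=(?P<provider>[^&]+)')

def normalize(m):
    # Returns ONLY the normalised provider value; re.sub puts this in place of
    # the whole match, which also covered "Search_Provider=".
    return quote(string.capwords(unquote(m.group('provider'))))

# Hypothetical log line, invented for this example.
line = "GET /q?Search_Type=web&Search_Provider=acme%20search&page=2"
result = provider_matcher.sub(normalize, line)
print(result)  # GET /q?Search_Type=web&Acme%20Search&page=2
```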

I think you might want this:

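import string
import urllib

def normalize(s):
     provider = urllib.quote(string.capwords(urllib.unquote(s.group('provider'))))
     return s.group('search') + '=' + provider

It returns the parameter name and '=', followed by the normalised
provider. re.sub puts that in place of the match, so everything before
and after the match is kept as it was. (A match object can't be sliced,
so take the pieces from its groups.)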
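The same idea as a runnable Python 3 sketch: rebuild the part of the match that comes before the provider group (taken from the matched string via m.start() and m.start('provider')), then append the normalised value. The sample line is hypothetical:

```python
import re
import string
from urllib.parse import quote, unquote  # Python 3 names for urllib.quote/unquote

provider_matcher = re.compile(r'(?P<search>Search_Provider)=(?P<provider>[^&]+)')

def normalize(m):
    # Text of the match before the provider group, i.e. "Search_Provider=".
    prefix = m.string[m.start():m.start('provider')]
    value = quote(string.capwords(unquote(m.group('provider'))))
    # The provider group ends the match, so there is no suffix to add back.
    return prefix + value

# Hypothetical log line, invented for this example.
line = "GET /q?Search_Type=web&Search_Provider=acme%20search&page=2"
result = provider_matcher.sub(normalize, line)
print(result)  # GET /q?Search_Type=web&Search_Provider=Acme%20Search&page=2
```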

>
> The purpose of this function is to proper-case the words contained in
> a URL query string parameter value.  I'm massaging data in web log
> files.
>
> In case it matters, the regex pattern looks like this:
>
> provider_pattern = r'(?P<search>Search_Provider)=(?P<provider>[^&]+)'
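It helps to look at exactly what that pattern matches, since that whole span is what re.sub replaces. A quick check on a made-up line (Python 3):

```python
import re

provider_pattern = r'(?P<search>Search_Provider)=(?P<provider>[^&]+)'

# Hypothetical log line, invented for this example.
m = re.search(provider_pattern, "GET /q?Search_Type=web&Search_Provider=acme%20search&page=2")

print(m.group(0))           # Search_Provider=acme%20search  (the span re.sub replaces)
print(m.group('search'))    # Search_Provider
print(m.group('provider'))  # acme%20search
```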
>
> The call looks like this:
>
> <code>
> re.sub(matcher,normalize,line)
> </code>
>
> Where line is the log line entry.
>
> What I get back is first the entire line with the normalization of the
> parameter value, but missing the parameter; then appended to that
> string is the entire line again, with the query parameter back in
> place pointing to the normalized string.
>
> <code>
>>>> fileReader = open(log,'r')
>>>>
>>>> lines = fileReader.readlines()
>>>> for line in lines:
> 	if line.find('Search_Type') != -1 and line.find('Search_Provider') != -1 :

These can be replaced by:

	if 'Search_Type' in line and 'Search_Provider' in line:

> 		re.sub(provider_matcher,normalize,line)

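re.sub returns a new string (strings are immutable, so it can't change
line in place), and you're throwing that result away! At the interactive
prompt the discarded result is echoed, which is the first of the two
lines you're seeing; the print then shows the unchanged line.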

		line = re.sub(provider_matcher,normalize,line)
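A minimal Python 3 illustration of the difference. upper_provider is a hypothetical stand-in for normalize, and the line is made up:

```python
import re

provider_matcher = re.compile(r'(?P<search>Search_Provider)=(?P<provider>[^&]+)')

def upper_provider(m):
    # Hypothetical stand-in for normalize(): keeps the name, upper-cases the value.
    return m.group('search') + '=' + m.group('provider').upper()

line = "a=1&Search_Provider=acme&b=2"

provider_matcher.sub(upper_provider, line)  # new string is built, then discarded
unchanged = line                            # line still holds the original text

line = provider_matcher.sub(upper_provider, line)  # keep the result this time
print(unchanged)  # a=1&Search_Provider=acme&b=2
print(line)       # a=1&Search_Provider=ACME&b=2
```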

> 		print line,'\n'
> </code>
>
> The output of the print is like this:
>
> <code>
> 'log-entry parameter=value&normalized-string&parameter=value\n
> log-entry parameter=value&parameter=normalized-string&parameter=value'
> </code>
>
> The goal is to massage the specified entries in the log files and
> write the entire log back into a new file.  The new file has to be
> exactly the same as the old one, with the exception of the entries
> I've altered with my function.
>
> No doubt I'm doing something trivially wrong, but I've tried to
> reproduce the structure as defined in the documentation.
>
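For the stated goal (copy the whole log to a new file, changing only the matching entries), here is a sketch in Python 3. It assumes the filenames are supplied by the caller, and it assumes latin-1 as the file encoding: latin-1 maps every byte to one character, so lines that aren't rewritten go back out unchanged. newline='' keeps the original line endings. The per-line logic is split out so it can be checked without touching files:

```python
import re
import string
from urllib.parse import quote, unquote  # Python 3 names for urllib.quote/unquote

provider_matcher = re.compile(r'(?P<search>Search_Provider)=(?P<provider>[^&]+)')

def normalize(m):
    # Keep "Search_Provider=" and proper-case the decoded value.
    value = quote(string.capwords(unquote(m.group('provider'))))
    return m.group('search') + '=' + value

def normalize_lines(lines):
    # Yield every line; only lines mentioning both parameters are rewritten.
    for line in lines:
        if 'Search_Type' in line and 'Search_Provider' in line:
            line = provider_matcher.sub(normalize, line)
        yield line

def rewrite_log(src_path, dst_path):
    # Assumption: latin-1 round-trips arbitrary bytes; newline='' preserves
    # whatever line endings the original log uses.
    with open(src_path, encoding='latin-1', newline='') as src, \
         open(dst_path, 'w', encoding='latin-1', newline='') as dst:
        dst.writelines(normalize_lines(src))
```

Usage would be something like rewrite_log('access.log', 'access.normalized.log'), where both names are placeholders. Iterating over the file object avoids reading the whole log into memory with readlines().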



More information about the Python-list mailing list