[Tutor] Regex

Alan Gauld alan.gauld at yahoo.co.uk
Tue May 4 05:51:12 EDT 2021


On 04/05/2021 04:56, P L wrote:

> Having difficulty on an assignment, I have to extract a part of a string, 
> from one key word to another character.

Is it a character or another substring? There is quite a big difference.
Based on your code I'll assume you mean substring.

> string = "Jan 31 01:33:12 ubuntu.local ticky: ERROR Tried to add information to closed ticket (mcintosh)"
> 
> I would like to extract from ERROR to ticket, so "ERROR Tried to add information to closed ticket"

You have jumped directly to regex, but that would always be my last
resort. Have you considered finding the index of the two strings and
then extracting a slice?

>>> s = "Jan 31 01:33:12 ubuntu.local ticky: ERROR Tried to add information to closed ticket (mcintosh)"
>>> start = s.index('ERROR')
>>> end = s.index('ticket')+len('ticket')
>>> s[start:end]
'ERROR Tried to add information to closed ticket'
>>>

You can then generalize that to use string variables for the
start/end markers rather than literal strings.
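
As a rough sketch of that generalization (the function name
extract_between is just an illustrative choice, and it uses find()
rather than index() so that a missing marker gives None instead of
raising ValueError):

```python
def extract_between(text, start_marker, end_marker):
    """Return the slice of text from start_marker through end_marker.

    Both markers are included in the result. Returns None if either
    marker cannot be found.
    """
    start = text.find(start_marker)
    if start == -1:
        return None
    # Look for the end marker only after the start marker.
    end = text.find(end_marker, start + len(start_marker))
    if end == -1:
        return None
    return text[start:end + len(end_marker)]


line = ("Jan 31 01:33:12 ubuntu.local ticky: ERROR Tried to add "
        "information to closed ticket (mcintosh)")
print(extract_between(line, "ERROR", "ticket"))
# ERROR Tried to add information to closed ticket
```

Searching for the end marker from just past the start marker means an
earlier occurrence of the end marker in the line is ignored.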

> Here's what I have so far:
> 
> with open("syslog.log", "r") as file:
>     for error in file:
>         if 'ERROR' not in error:
>             continue
>         pattern = r'(?:ERROR) [\w].+'
>         result = re.search(pattern, error)

If you are going to use the same pattern many times, it's usually
better (i.e. faster) to compile the regex outside the loop and
then use the compiled version inside.
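
Something along these lines, keeping your pattern exactly as it is and
only moving where it gets compiled. To keep the sketch self-contained
I've used io.StringIO as a stand-in for your syslog.log file, and the
second log line is made up purely for illustration:

```python
import io
import re

# Compile once, before the loop starts.
error_re = re.compile(r'(?:ERROR) [\w].+')

# Stand-in for open("syslog.log", "r"); the INFO line is invented.
log = io.StringIO(
    "Jan 31 01:33:12 ubuntu.local ticky: ERROR Tried to add "
    "information to closed ticket (mcintosh)\n"
    "Jan 31 01:34:00 ubuntu.local ticky: INFO Created ticket (noel)\n"
)

results = []
for line in log:
    # Reuse the compiled pattern on every line.
    match = error_re.search(line)
    if match:
        results.append(match.group())

print(results)
```

Since search() returns None for lines without 'ERROR', the separate
"if 'ERROR' not in error" test is no longer strictly needed.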

As for your regex, I don't see how it will stop at 'ticket'.
The .+ will match everything to the end of the string.

I'd use the much simpler pattern: 'ERROR.+ticket'
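
A quick comparison of the two patterns on your sample line, just as a
sketch (the variable names are mine):

```python
import re

line = ("Jan 31 01:33:12 ubuntu.local ticky: ERROR Tried to add "
        "information to closed ticket (mcintosh)")

# Your pattern: the trailing .+ runs on to the end of the line.
original = re.search(r'(?:ERROR) [\w].+', line)
print(original.group())
# ERROR Tried to add information to closed ticket (mcintosh)

# The simpler pattern ends the match at 'ticket'.
simple = re.search(r'ERROR.+ticket', line)
print(simple.group())
# ERROR Tried to add information to closed ticket
```

One caveat: .+ is greedy, so if 'ticket' appeared more than once after
ERROR the match would extend to the last occurrence. If that matters
for your data, 'ERROR.+?ticket' stops at the first one instead.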

> but my results are the following:
> "ERROR Tried to add information to closed ticket (>"

No idea how you got that.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



