Using regexes versus "in" membership test?

Victor Hooi victorhooi at gmail.com
Thu Dec 13 01:35:28 EST 2012


Heya,

See my original first post =):

> Would I be substantially better off using a list of strings and using "in" against each line, then using a second pass of regex only on the matched lines? 

Based on what Steven said, and what I know about the logs in question, it's definitely better to do it that way.

However, I'd still like to tidy up the regex itself and fix any glaring issues with it.
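
For reference, here is a minimal sketch of the two-pass idea: a cheap "in" test first, then a regex only on the lines that pass. It is based purely on the sample line quoted below. The names KEYWORDS, LINE_RE, PAIR_RE and parse_line are made up for illustration, and the assumed layout (a timestamp followed by five bracketed fields, the last one holding the tag=value pairs) is a guess from that one sample, not something I've checked against the full logs.

```python
import re

# Hypothetical list of cheap substrings to look for. "ErrorCode" comes from
# the sample line; the list itself is an assumption.
KEYWORDS = ["ErrorCode"]

# Assumed layout: timestamp, then five bracketed fields, the last of which
# holds the tag=value pairs.
LINE_RE = re.compile(
    r'^\s*(?P<timestamp>\d{2}:\d{2}:\d{2}\.\d+)\s+'
    r'\[(?P<level>[^\]]*)\]\s+'
    r'\[(?P<module>[^\]]*)\]\s+'
    r'\[(?P<function>[^\]]*)\]\s+'
    r'\[(?P<flag>[^\]]*)\]\s+'
    r'\[(?P<pairs>.*)\]\s*$'
)

# A value is either a double-quoted string or a run of non-space characters.
PAIR_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_line(line):
    """Return a dict of fields, or None if the line is not of interest."""
    # First pass: plain substring test, no regex involved.
    if not any(keyword in line for keyword in KEYWORDS):
        return None
    # Second pass: run the regex only on lines that passed the first test.
    match = LINE_RE.match(line)
    if match is None:
        return None  # false positive from the first pass, discard it
    fields = match.groupdict()
    fields["level"] = fields["level"].strip()  # "Info  " -> "Info"
    fields["pairs"] = {
        key: value.strip('"')
        for key, value in PAIR_RE.findall(fields["pairs"])
    }
    return fields
```

Calling parse_line() on each line of the file and skipping the None results gives the "in first, regex second" behaviour. Anything the first pass lets through by mistake is simply dropped when the regex fails to match.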

Cheers,
Victor

On Thursday, 13 December 2012 17:19:57 UTC+11, Chris Angelico wrote:
> On Thu, Dec 13, 2012 at 5:10 PM, Victor Hooi <victorhooi at gmail.com> wrote:
> > Are there any other general pointers you might give for that regex? The lines I'm trying to match look something like this:
> >
> >     07:40:05.793627975 [Info  ] [SOME_MODULE] [SOME_FUNCTION] [SOME_OTHER_FLAG] [RequestTag=0 ErrorCode=3 ErrorText="some error message" ID=0:0x0000000000000000 Foo=1 Bar=5 Joe=5]
> >
> > Essentially, I'd want to strip out the timestamp, logging-level, module, function etc - and possibly the tag-value pairs?
>
> If possible, can you do a simple test to find out whether or not you
> want a line and then do more complex parsing to get the info you want
> out of it? For instance, perhaps the presence of the word "ErrorCode"
> is all you need to check - it wouldn't hurt if you have a few percent
> of false positives that get discarded during the parse phase, it'll
> still be quicker to do a single string-in-string check than a complex
> regex to figure out if you even need to process the line at all.
>
> ChrisA



