Better Regex and exception handling for this small code

Ganesh Pal ganesh1pal at gmail.com
Tue Jul 18 12:52:28 EDT 2017


Thanks Cameron Simpson for you suggestion  and reply quite helpful :)

On Wed, Jul 12, 2017 at 5:06 AM, Cameron Simpson <cs at zip.com.au> wrote:

> On 11Jul2017 22:01, Ganesh Pal <ganesh1pal at gmail.com> wrote:
>
>> I am trying to open a file and check if there is a pattern  has  changed
>> after the task got completed?
>>
>> file data:
>> ........................................................
>>
>> #tail -f /file.txt
>> ..........................................
>> Note: CRC:algo = 2, split_crc = 1, unused = 0, initiator_crc = b6b20a65,
>> journal_crc = d2097b00
>> Note: Task completed successfully.
>> Note: CRC:algo = 2, split_crc = 1, unused = 0, initiator_crc = d976d35e,
>> journal_crc = a176af10
>>
>>
>> I  have the below piece of code  but would like to make this better more
>> pythonic , I found regex pattern and exception handling poor here , any
>> quick suggestion in your spare time is welcome.
>>
>>
>> #open the existing file if the flag is set and check if there is a match
>>
>> log_file='/file.txt'
>> flag_is_on=1
>>
>
> Use "True" instead of "1". A flag is a Boolean thing, and should use a
> Boolean value. This lets you literally speak "true" and 'false" rather than
> imoplicitly saying that "0 means false and nonzero means true".
>
> data = None
>>
>
> There is no need to initialise data here because you immediately overwrite
> it below.
>
> with open(log_file, 'r') as f:
>>     data = f.readlines()
>>
>> if flag_is_on:
>>
>
> Oh yes. Just name this variable "flag". "_is_on" is kind of implicit.
>
>    logdata = '\n'.join(data)
>>
>
> Do other parts of your programme deal with the file data as lines? If not,
> there is little point to reading the file and breaking it up into lines
> above, then joining them together against here. Just go:
>
>  with open(log_file) as f:
>      log_data = f.read()
>
>    reg = "initiator_crc =(?P<ini_crc>[\s\S]*?), journal_crc"
>>
>
> Normally we write regular expressions as "raw" python strings, thus:
>
>    reg = r'initiator_crc =(?P<ini_crc>[\s\S]*?), journal_crc'
>
> because backslashes etc are punctuation inside normal strings. Within a
> "raw" string started with r' nothing is special until the closing '
> character. This makes writing regular expressions more reliable.
>
> Also, why the character range "[\s\S]"? That says whitespace or
> nonwhitespace i.e. any character. If you want any character, just say ".".
>
>    crc = re.findall(re.compile(reg), logdata)
>>
>
> It is better to compile a regexp just the once, getting a Regexp object,
> and then you just use the compiled object.
>
>    if not crc:
>>        raise Exception("Pattern not found in  logfile")
>>
>
> ValueError would be a more appropriate exception here; plain old
> "Exception" is pretty vague.
>
>    checksumbefore = crc[0].strip()
>>    checksumafter = crc[1].strip()
>>
>
> Your regexp cannot start or end with whitespace. Those .strip calls are
> not doing anything for you.
>
> This reads like you expect there to be exactly 2 matches in the file. What
> if there are more or fewer?
>
>    logging.info("checksumbefore :%s and  checksumafter:%s"
>>                      % (checksumbefore, checksumafter))
>>
>>    if checksumbefore == checksumafter:
>>       raise Exception("checksum not macthing")
>>
>
> Don't you mean != here?
>
> I wouldn't be raising exceptions in this code. Personally I would make
> this a function that returns True or False. Exceptions are a poor way of
> returning "status" or other values. They're really for "things that should
> not have happened", hence their name.
>
> It looks like you're scanning a log file for multiple lines and wanting to
> know if successive ones change. Why not write a function like this
> (untested):
>
>  RE_CRC_LINE = re.compile(r'initiator_crc =(?P<ini_crc>[\s\S]*?),
> journal_crc')
>
>  def check_for_crc_changes(logfile):
>      old_crc_text = ''
>      with open(logfile) as f:
>          for line in f:
>              m = RE_CRC_LINE.match(line)
>              if not m:
>                  # uninteresting line
>                  continue
>              crc_text = m.group(0)
>              if crc_text != old_crc_text:
>                  # found a change
>                  return True
>      if old_crc_text == '':
>          # if this is really an error, you might raise this exception
>          # but maybe no such lines is just normal but boring
>          raise ValueError("no CRC lines seen in logfile %r" % (logfile,))
>      # found no changes
>      return False
>
> See that there is very little sanity checking. In an exception supporting
> language like Python you can often write code as if it will always succeed
> by using things which will raise exceptions if things go wrong. Then
> _outside_ the function you can catch any exceptions that occur (such as
> being unable to open the log file).
>
> Cheers,
> Cameron Simpson <cs at zip.com.au>
>



More information about the Python-list mailing list