[Tutor] how to print lines which contain matching words or strings

Avi Gross avigross at verizon.net
Tue Nov 20 11:25:14 EST 2018


Asad,

Thank you for the clarification. I am glad that you stated (albeit at the
end) that you wanted a better idea of how to do it than the code you
display. I stripped out the earlier parts of the discussion for storage
considerations but they can be found in the archives if needed.

There are several ways to look at your code.

One is to discuss it as it stands.

The other is to discuss how it could be, and there are often many people
who champion one style or another.

I will work with your style but first point out the more compact form many
favor. As has been pointed out, people coming from languages like C may try
to write in a similar style even in a language that supports other ways.

So if your goal is what you say, then all you need is doable in very few
lines of code.

The basic idea is iteration. You can use it several times.

You have a file. In Python (at least recent versions) the opened file is an
iterator. So the outline of your program can look like:

for line in open(...):
	process_line(line, re_list)

I snuck in a function called process_line that you need to define or replace
by code. I also snuck in a list of regular expressions you would create,
perhaps above the loop.
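
A minimal sketch of that outline, written for Python 3 (so print is a
function). The body of process_line, the two patterns and the throwaway
temporary file are assumptions made only so the example runs on its own;
substitute your own file name and patterns.

```python
import os
import re
import tempfile


def process_line(line, re_list):
    """Return True if any compiled pattern in re_list is found in line."""
    for pat in re_list:
        if pat.search(line):
            return True
    return False


# Hypothetical patterns, created above the loop as described.
re_list = [re.compile("ERR-1"), re.compile(r"\d{6}")]

# Write a small throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("all is well\nERR-1 disk full\npatch 123456 applied\n")
    path = tmp.name

matched = []
with open(path) as handle:   # the file object yields one line at a time
    for line in handle:
        if process_line(line, re_list):
            matched.append(line.rstrip("\n"))
os.remove(path)

print(matched)   # ['ERR-1 disk full', 'patch 123456 applied']
```

Using "with" closes the file when the loop is done, which a bare
open(...) inside the for statement leaves to the interpreter.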

I will not give you a tutorial on regular expressions. Suffice it to say
they tend to be strings. You do not search for 123 but rather for "123" or
str(123) or anything that becomes a single string.

Here is one of many ways to learn how to make proper expressions and use
them:

https://docs.python.org/2/howto/regex.html

Since you want to repeatedly use the same expressions for each line, you may
want to compile each one and have a list of the compiled versions. 

If you have a list like this:

re_str = [ "ABC", "123", "(and)|(AND)", "[_A-Za-z][_A-Za-z0-9]*" ]

you can use a loop such as a list comprehension like this:

re_comp = [ re.compile(pattern) for pattern in re_str ]

So in the function above, or in-line, you can loop over the expressions for
each line sort of like this:

for pat in re_comp:
	<<IF MATCHES pat against line>> print line  and break out.

The latter line is not actual Python code but a place you use whatever
matching function you want. The variable "pat" holds each compiled pattern
one at a time so pat.search(line) or pat.match(line) and so on can be used
depending on your need. Since you actually do not care what matches you have
lots of leeway.

There are many other ways, but this one is quite simple and broad and adjusts
to any number or type of pattern if properly used.
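
Put together, the compiled list and the inner loop might look like the
sketch below (Python 3). The sample lines are made up for illustration and
stand in for lines read from a file; pat.search() is chosen here because it
finds a pattern anywhere in the line, while pat.match() only looks at the
start.

```python
import re

re_str = ["ABC", "123", "(and)|(AND)", "[_A-Za-z][_A-Za-z0-9]*"]
re_comp = [re.compile(pattern) for pattern in re_str]

# Made-up lines standing in for the contents of a file.
lines = ["xxABCxx", "99 1234", "salt AND pepper", "42 + 7", "hello"]

printed = []
for line in lines:
    for pat in re_comp:
        if pat.search(line):
            print(line)
            printed.append(line)
            break   # leave the pattern loop only; the next line is still read

# search() scans the whole string, match() is anchored at the start.
found_by_search = re_comp[1].search("99 1234") is not None
found_by_match = re_comp[1].match("99 1234") is not None
```

Only "42 + 7" is skipped, since it contains none of the four patterns. The
break here sits inside the inner loop, so it stops trying further patterns
for the current line without ending the loop over the lines.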

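Back to your code. There is no need to use a raw string for a normal
filename, but it is harmless. The mode, however, must be the quoted string
'r'; a bare r is a name, and unless it is defined elsewhere it raises a
NameError.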

f3 = open(r'file1.txt',r)

Why file1 is read into variable f3 remains a harmless mystery.

But then I see you using another style by reading the entire file into
memory:

f = f3.readlines()
d = []

Nothing wrong with that, although the example above shows how to process one
line at a time. So far, you seem to want to make a list of lines that match
and not print till later.

for linenum in range(len(f)):

OK, that is valid Python but far from optimal. Yes, you can loop over
indices of the list f using the length. But since such a list of strings is
an iterable, you could have done something similar to the method I showed
above:

for line in f:
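
As a small illustration (Python 3, with a made-up list standing in for the
result of readlines()), the two loops visit the same lines, and enumerate()
supplies the index in the cases where a line number is really wanted:

```python
# Made-up stand-in for f = f3.readlines()
f = ["first line\n", "ERR-1 here\n", "last line\n"]

# Index-based loop, as in the original code.
by_index = []
for linenum in range(len(f)):
    by_index.append(f[linenum])

# Direct iteration over the list.
direct = []
for line in f:
    direct.append(line)

# enumerate() gives both the number and the line; start=1 counts from one.
numbered = []
for linenum, line in enumerate(f, start=1):
    numbered.append((linenum, line.rstrip("\n")))

print(numbered[1])   # (2, 'ERR-1 here')
```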

But going with what you have, you decided to create a series of individual
if statements.

        if re.search("ERR-1" ,f[linenum])
           print f[linenum]
           break

        if re.search("\d\d\d\d\d\d",f[linenum])   --- > seach for a patch
number length of six digits for example 123456
           print f[line]
           break

and so on.

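Ignoring the comment in the code that makes it fail, this is close to valid
but not Pythonic. Each if needs a trailing colon, f[line] should be
f[linenum], and the break leaves the loop over the lines, so nothing after
the first matching line is ever examined.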

One consideration is that the if statement can look like this:

if (condition1 and (condition2 or condition3)) ...

So you could do a list of "or" statements in one if.

In pseudocode:

if (matches(line, re1) or matches(line, re2) ... or ...)

The above, if properly written with N parts, will evaluate to true as soon
as the first condition matches. You can then print or copy for later
printing. No break needed. But note each of the pseudo-code matches() calls
must return something Python treats as true or false.
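
Here is one way the pseudocode could be made concrete (Python 3). The
helper matches() is hypothetical; it records each pattern it is asked about
only to make the short-circuit visible. re.search() returns a match object
or None, which is why its result can be used as a condition.

```python
import re

calls = []   # records which patterns were actually tried


def matches(line, pattern):
    """Hypothetical helper: True if pattern is found anywhere in line."""
    calls.append(pattern)
    return re.search(pattern, line) is not None


line = "patch 123456 applied"

if matches(line, "ERR-1") or matches(line, r"\d{6}") or matches(line, "Breakfast"):
    print(line)

# "Breakfast" was never tested because the second condition was true.
tried = list(calls)

# any() with a generator short-circuits the same way for a list of patterns.
patterns = ["ERR-1", r"\d{6}", "Breakfast"]
found = any(re.search(p, line) for p in patterns)
```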

The extended form of "if" is another way:

if condition1:
	Something
elif condition2:
	Something else
elif condition3:
	Have fun
else:
	whatever


I note you made an empty list with d = [] but you never used it. My initial
guess was that you wanted to add lines to the list. Since you printed
instead, is it needed?

You asked about using dictionaries. Yes, you can store just about anything
in dictionaries and iterate over them, though in older versions of Python
the order is arbitrary (since Python 3.7 it is insertion order). But a list
of strings or compiled regular expressions would work fine for this
application. Having said that, you can make a dictionary but what would be
the key? The key has to be something hashable (typically immutable) and is
there any obvious advantage?
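
If a dictionary is used anyway, one possible choice of key is a short
descriptive label, which at least lets the program report which check
fired. This is only a sketch under that assumption (Python 3.7 or later,
where dictionaries keep insertion order); the labels and the helper
first_label() are invented for illustration.

```python
import re

# Hypothetical labels as keys, compiled patterns as values.
checks = {
    "error": re.compile("ERR-1"),
    "patch number": re.compile(r"\d{6}"),
    "greeting": re.compile("Good Morning"),
}


def first_label(line):
    """Return the label of the first pattern found in line, or None."""
    for label, pat in checks.items():
        if pat.search(line):
            return label
    return None


print(first_label("ERR-1 while applying patch 123456"))   # error
```

A list of (label, pattern) tuples would do the same job, which is why the
plain list remains the simpler choice here.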

If you care about efficiency, some final notes.

The order of the searches might matter. The most commonly found ones should
be tested first and the rarest ones last. Since the algorithm does not want
to print a line multiple times, it will stop evaluating when it finds what
it wants.

Regular expressions are powerful but also not cheap. If many or all the
things you are searching for are simple text, consider using normal string
functions as was discussed earlier. One approach would be to have two
modules and something like this:

if string_search(...):
	print it
elif re_search(...):
	print it
else:
	skip it
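
A sketch of that split in Python 3, using plain functions rather than
separate modules. The search terms are taken from the code under
discussion; the function bodies and the helper wanted() are assumptions
about how string_search and re_search might be filled in.

```python
import re

# Simple text is checked with the "in" operator, no regular expression needed.
plain_terms = ["ERR-1", "Good Morning", "Breakfast"]
# Only the patterns that really need it are compiled.
regexes = [re.compile(r"\d{6}")]


def string_search(line):
    return any(term in line for term in plain_terms)


def re_search(line):
    return any(pat.search(line) for pat in regexes)


def wanted(line):
    if string_search(line):
        return True
    elif re_search(line):
        return True
    else:
        return False


for line in ["Breakfast at 8", "patch 123456", "nothing here"]:
    if wanted(line):
        print(line)
```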

And be warned you may make spurious matches. If you search for "and" you
will match "sand" and "ampersand". You need to make the regular expression
understand you want a word boundary before and after or whatever your need
is. You can do very powerful but expensive things like only matching if you
find the same word at least three times on that line, no matter what the
word is. You may need to tell it whether a match should be greedy and many
other considerations such as ignoring case. 
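
The points above can be sketched with the standard re module (Python 3).
The sample strings are made up, and the three-times pattern is just one
possible way to write such a check with a backreference, not the only one.

```python
import re

# \b marks a word boundary; re.IGNORECASE matches "and", "AND", "And", ...
word_and = re.compile(r"\band\b", re.IGNORECASE)
hit_inside_words = word_and.search("sand, ampersand")     # None
hit_whole_word = word_and.search("salt AND pepper")       # a match object

# Same whole word at least three times on the line: capture a word,
# then require it to appear again twice via the backreference \1.
thrice = re.compile(r"\b(\w+)\b(?:.*\b\1\b){2}")
repeated = thrice.search("the cat and the dog and the bird")
not_repeated = thrice.search("one two three two")

# Greedy .* takes as much as it can; .*? takes as little as it can.
greedy = re.search(r"<.*>", "<a><b>").group()
lazy = re.search(r"<.*?>", "<a><b>").group()

print(repeated.group(1), greedy, lazy)   # the <a><b> <a>
```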

Have fun.

Avi




-----Original Message-----
From: Tutor <tutor-bounces+avigross=verizon.net at python.org> On Behalf Of
Asad
Sent: Monday, November 19, 2018 10:15 PM
To: tutor at python.org
Subject: Re: [Tutor] how to print lines which contain matching words or
strings

Hi Avi Gross /All,

             Thanks for the reply. Yes, you are correct. I would like to
open a file, process it a line at a time, select just the lines that meet
my criteria, and print them while ignoring the rest. I have created the
following code:


   import re
   import os

   f3 = open(r'file1.txt',r)
   f = f3.readlines()
   d = []
   for linenum in range(len(f)):
        if re.search("ERR-1" ,f[linenum])
           print f[linenum]
           break
        if re.search("\d\d\d\d\d\d",f[linenum])   --- > seach for a patch
number length of six digits for example 123456
           print f[line]
           break
        if re.search("Good Morning",f[linenum])
           print f[line]
           break
        if re.search("Breakfast",f[linenum])
           print f[line]
           break
        ...
        further 5 more hetrogeneus if conditions I have

=======================================================================
This is a beginner's approach to printing the lines which match the if
conditions.

How should I make it better? Maybe create a dictionary of search items or a
list and then iterate over the lines in a file to print the lines matching
the condition.


Please advise,

Thanks,



