[Tutor] list comprehension, testing for multiple conditions

Steven D'Aprano steve at pearwood.info
Wed Aug 22 16:53:53 CEST 2012


On 22/08/12 20:28, Pete O'Connell wrote:
> Hi. The next step for me to parse the file as I want to is to change
> lines that look like this:
> f 21/21/21 22/22/22 24/24/23 23/23/24
> into lines that look like this:
> f 21 22 23 24

In English, what is the rule you are applying here? My guess is:

"Given three numbers separated by slashes, ignore the first two numbers
and keep the third."

E.g. "17/25/97" => 97.

Am I close?
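
If it is, string methods do the extraction in one step:

>>> "17/25/97".split("/")[-1]
'97'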


> Below is my terribly slow loop for doing this. Any suggestions about
> how to make this code more efficient would be greatly appreciated

What makes you say it is "terribly slow"? Perhaps it is as fast as it
could be under the circumstances. (Maybe it takes a long time because
you have a lot of data, not because it is slow.)

The first lesson of programming is not to be too concerned about speed
until your program is correct.

Like most such guidelines, this is not entirely true -- you don't want
to write code which is unnecessarily slow. But the question you should
be asking is, "is it fast enough?" rather than "is it fast?".

Also, the sad truth is that Python tends to be slower than some other
languages. (It's also faster than some others.) But the
general process is:

1) write something that works correctly;

2) if it is too slow, try to speed it up in Python;

3) if that's still too slow, try using something like Cython or PyPy;

4) if all else fails, now that you have a working prototype, rewrite
it in C, Java, Lisp or Haskell.

Once they see how much more work is involved in writing fast C code,
most people decide that "fast enough" is fast enough :)
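
And if you do reach step 2, measure before you guess. As a rough
sketch, the standard library's timeit module can compare two ways of
doing the same small job (here, two made-up ways of pulling the last
item out of a slash-separated term):

import timeit

setup = "line = '21/21/21'"
print(timeit.timeit("line.split('/')[-1]", setup=setup, number=100000))
print(timeit.timeit("line[line.rfind('/')+1:]", setup=setup, number=100000))

Whichever prints the smaller number is the faster of the two -- on
your data, with your Python, which is the only measurement that counts.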


> with open(fileName) as lines:
>      theGoodLines = [line.strip("\n") for line in lines if "vn" not in
> line and "vt" not in line and line != "\n"]

I prefer to write code in chains of filters.

with open(fileName) as lines:
     # get rid of leading and trailing whitespace, including newlines
     lines = (line.strip() for line in lines)
     # ignore blanks
     lines = (line for line in lines if line)
     # ignore lines containing "vn" or "vt"
     theGoodLines = [line for line in lines if not ("vn" in line or "vt" in line)]

Note that only the last step is a list comprehension using [ ]; the others
are generator expressions using ( ) instead.

Will the above be faster than your version? I have no idea. But I think it
is more readable and understandable. Some people might disagree.
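
To see the chain in action without touching a real file, here is a
self-contained sketch. A plain list of lines stands in for the open
file, and the data is made up apart from the "f" line from your post:

lines = ["vn 1 2 3\n", "\n", "vt 4 5\n",
         "f 21/21/21 22/22/22 24/24/23 23/23/24\n"]
lines = (line.strip() for line in lines)
lines = (line for line in lines if line)
theGoodLines = [line for line in lines if not ("vn" in line or "vt" in line)]
print(theGoodLines)
# prints ['f 21/21/21 22/22/22 24/24/23 23/23/24']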


> for i in range(len(theGoodLines)):
>      if theGoodLines[i][0] == "f":
>          aGoodLineAsList = theGoodLines[i].split(" ")
>          theGoodLines[i] = aGoodLineAsList[0] + " " +
> aGoodLineAsList[1].split("/")[-1] + " " +
> aGoodLineAsList[2].split("/")[-1] + " " +
> aGoodLineAsList[3].split("/")[-1] + " " +
> aGoodLineAsList[4].split("/")[-1]


Start with a helper function:

def extract_last_item(term):
     """Extract the item from a term like a/b/c"""
     return term.split("/")[-1]


for i, line in enumerate(theGoodLines):
     if line[0] == "f":
         terms = line.split()
         theGoodLines[i] = " ".join([extract_last_item(t) for t in terms])
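
On the example line from your first message, that produces exactly the
output you asked for:

>>> line = "f 21/21/21 22/22/22 24/24/23 23/23/24"
>>> " ".join([extract_last_item(t) for t in line.split()])
'f 21 22 23 24'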



See how you go with that.



-- 
Steven

