[Tutor] list comprehension, testing for multiple conditions

Pete O'Connell pedrooconnell at gmail.com
Wed Aug 22 23:23:36 CEST 2012


On Thu, Aug 23, 2012 at 2:53 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> On 22/08/12 20:28, Pete O'Connell wrote:
>>
>> Hi. The next step for me to parse the file as I want to is to change
>> lines that look like this:
>> f 21/21/21 22/22/22 24/24/23 23/23/24
>> into lines that look like this:
>> f 21 22 23 24
>
>
> In English, what is the rule you are applying here? My guess is:
>
> "Given three numbers separated by slashes, ignore the first two numbers
> and keep the third."
>
> E.g. "17/25/97" => 97.
>
> Am I close?

Hi Steve, yes that is correct
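
For concreteness, here is a minimal sketch of that rule applied to a single
face line. The name convert_face_line is made up for illustration, and the
sketch assumes every term after the leading "f" has the form a/b/c.

```python
def convert_face_line(line):
    """Keep only the last slash-separated number of each term (hypothetical helper)."""
    terms = line.split()
    # terms[0] is the "f" marker; the remaining terms look like "21/21/21"
    return " ".join([terms[0]] + [t.split("/")[-1] for t in terms[1:]])


print(convert_face_line("f 21/21/21 22/22/22 24/24/23 23/23/24"))
# prints: f 21 22 23 24
```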

>
>
>
>> Below is my terribly slow loop for doing this. Any suggestions about
>> how to make this code more efficient would be greatly appreciated
>
>
> What makes you say it is "terribly slow"? Perhaps it is as fast as it
> could be under the circumstances. (Maybe it takes a long time because
> you have a lot of data, not because it is slow.)

OK, maybe I am wrong about it being slow (I thought for loops were
slower than list comprehensions). But I do know I need it to be as fast
as possible if I need to run it on a thousand files, each with hundreds
of thousands of lines.
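
One way to find out rather than guess is to time both forms on the same
data. The following is only a sketch: the sample data is synthetic, the
function names are invented here, and the timings will vary with the machine
and the Python version.

```python
import timeit

# Synthetic stand-in for the face lines of one file (made-up data, not a real file)
sample = ["f 21/21/21 22/22/22 24/24/23 23/23/24"] * 10000


def last_items(line):
    """Keep the last slash-separated item of each whitespace-separated term."""
    return " ".join(t.split("/")[-1] for t in line.split())


def with_for_loop(lines):
    # Explicit loop that appends to a list
    result = []
    for line in lines:
        result.append(last_items(line))
    return result


def with_comprehension(lines):
    # Same work expressed as a list comprehension
    return [last_items(line) for line in lines]


for func in (with_for_loop, with_comprehension):
    seconds = timeit.timeit(lambda: func(sample), number=10)
    print(func.__name__, round(seconds, 3))
```

Both functions do the same per-line work, so any difference measured is the
overhead of the loop form itself.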

>
> The first lesson of programming is not to be too concerned about speed
> until your program is correct.
>
> Like most such guidelines, this is not entirely true -- you don't want
> to write code which is unnecessarily slow. But the question you should
> be asking is, "is it fast enough?" rather than "is it fast?".
>
> Also, the sad truth is that Python tends to be slower than some other
> languages. (It's also faster than some other languages too.) But the
> general process is:
>
> 1) write something that works correctly;
>
> 2) if it is too slow, try to speed it up in Python;
>
> 3) if that's still too slow, try using something like Cython or PyPy;
>
> 4) if all else fails, now that you have a working prototype, re-write
> it again in C, Java, Lisp or Haskell.
>
> Once they see how much more work is involved in writing fast C code,
> most people decide that "fast enough" is fast enough :)

OK I will keep it as is and see if I can live with it.

Thanks
Pete

>
>
>
>> with open(fileName) as lines:
>>      theGoodLines = [line.strip("\n") for line in lines
>>                      if "vn" not in line and "vt" not in line and line != "\n"]
>
>
> I prefer to write code in chains of filters.
>
> with open(fileName) as lines:
>     # get rid of leading and trailing whitespace, including newlines
>     lines = (line.strip() for line in lines)
>     # ignore blanks
>     lines = (line for line in lines if line)
>     # ignore lines containing "vn" or "vt"
>     theGoodLines = [line for line in lines if not ("vn" in line or "vt" in line)]
>
> Note that only the last step is a list comprehension using [ ], the others
> are generator expressions using ( ) instead.
>
> Will the above be faster than your version? I have no idea. But I think it
> is more readable and understandable. Some people might disagree.
>
>
>
>> for i in range(len(theGoodLines)):
>>      if theGoodLines[i][0] == "f":
>>          aGoodLineAsList = theGoodLines[i].split(" ")
>>          theGoodLines[i] = (aGoodLineAsList[0] + " " +
>>                             aGoodLineAsList[1].split("/")[-1] + " " +
>>                             aGoodLineAsList[2].split("/")[-1] + " " +
>>                             aGoodLineAsList[3].split("/")[-1] + " " +
>>                             aGoodLineAsList[4].split("/")[-1])
>
>
>
> Start with a helper function:
>
> def extract_last_item(term):
>     """Extract the last item from a term like a/b/c"""
>     return term.split("/")[-1]
>
>
> for i, line in enumerate(theGoodLines):
>     if line[0] == "f":
>         terms = line.split()
>         theGoodLines[i] = " ".join([extract_last_item(t) for t in terms])
>
>
>
> See how you go with that.
>
>
>
> --
> Steven

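Putting the quoted suggestions together, a self-contained sketch might look
like the following. It reads from an in-memory io.StringIO instead of a real
file so that it can be run as is. The sample data and the names SAMPLE and
clean_lines are invented for illustration; only extract_last_item comes from
the quoted message.

```python
import io

# Made-up sample in the style of the file being parsed
SAMPLE = """v 1.0 2.0 3.0
vn 0.0 1.0 0.0
vt 0.5 0.5

f 21/21/21 22/22/22 24/24/23 23/23/24
"""


def extract_last_item(term):
    """Extract the last item from a term like a/b/c."""
    return term.split("/")[-1]


def clean_lines(lines):
    """Filter the lines lazily, then rewrite the face ("f") lines."""
    # get rid of leading and trailing whitespace, including newlines
    lines = (line.strip() for line in lines)
    # ignore blanks
    lines = (line for line in lines if line)
    # ignore lines containing "vn" or "vt"
    lines = (line for line in lines if not ("vn" in line or "vt" in line))
    result = []
    for line in lines:
        if line[0] == "f":
            # "f" has no slash, so extract_last_item leaves it unchanged
            line = " ".join(extract_last_item(t) for t in line.split())
        result.append(line)
    return result


with io.StringIO(SAMPLE) as f:
    print(clean_lines(f))
# prints: ['v 1.0 2.0 3.0', 'f 21 22 23 24']
```

Nothing is read from the input until the final for loop runs, so the
intermediate steps never build a full list in memory.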

