[Tutor] question about strip() and list comprehension

Peter Otten __peter__ at web.de
Wed Apr 9 10:07:35 CEST 2014


Steven D'Aprano wrote:

> On Tue, Apr 08, 2014 at 02:38:13PM -0600, Jared Nielsen wrote:
>> Hello,
>> Could someone explain why and how this list comprehension with strip()
>> works?
>> 
>> f = open('file.txt')
>> t = [t for t in f.readlines() if t.strip()]
>> f.close()
>> print "".join(t)
>> 
>> I had a very long file of strings filled with blank lines I wanted to
>> remove. I did some Googling and found the above code snippet, but no
>> clear explanation as to why it works. I'm particularly confused by how
>> "if t.strip()" is removing the blank lines.
> 
> It isn't. Rather, what it is doing is *preserving* the non-blank lines.
> 
> The call to strip() removes any leading and trailing whitespace, so if
> the line is blank or contains nothing but whitespace, it reduces down to
> the empty string:
> 
> py> '    '.strip()
> ''
> 
> Like other empty sequences and containers, the empty string is
> considered to be "like False", falsey:
> 
> py> bool('')
> False
> 
> 
> So your list comprehension (re-written to use a more meaningful name)
> which looks like this:
> 
>     [line for line in f.readlines() if line.strip()]
> 
> iterates over each line in the file, tests if there is anything left
> over after stripping the leading/trailing whitespace, and only
> accumulates the lines that are non-blank. It is equivalent to this
> for-loop:
> 
>     accumulator = []
>     for line in f.readlines():
>         if line.strip():  # like "if bool(line.strip())"
>             accumulator.append(line)
> 
> 
>> I also don't fully understand the 'print "".join(t)'.
> 
> I presume you understand what print does :-) so it's only the "".join(t)
> that has you confused. This is where the interactive interpreter is
> brilliant, you can try things out for yourself and see what they do. Do
> you know how to start the interactive interpreter?
> 
> (If not, ask and we'll tell you.)
> 
> py> t = ['Is', 'this', 'the', 'right', 'place', 'for', 'an', 'argument?']
> py> ''.join(t)
> 'Isthistherightplaceforanargument?'
> py> ' '.join(t)
> 'Is this the right place for an argument?'
> py> '--+--'.join(t)
> 'Is--+--this--+--the--+--right--+--place--+--for--+--an--+--argument?'
> 
> 
> In your case, you have a series of lines, so each line will end with a
> newline:
> 
> py> t = ['line 1\n', 'line 2\n', 'line 3\n']
> py> ''.join(t)
> 'line 1\nline 2\nline 3\n'
> py> print ''.join(t)
> line 1
> line 2
> line 3
> 
> 
>> The above didn't remove the leading white space on several lines, so I
>> made the following addition:
>> 
>> f = open('file.txt')
>> t = [t for t in f.readlines() if t.strip()]
>> f.close()
>> s = [x.lstrip() for x in t]
>> print "".join(s)
> 
> 
> You can combine those two list comps into a single one:
> 
> f = open('file.txt')
> lines = [line.lstrip() for line in f.readlines() if line.strip()]
> f.close()
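
A quick way to convince yourself that the combined comprehension gives the
same result as the two-step version is to run both on the same data. The
following is only a sketch: the sample text is made up, io.StringIO stands in
for the real file.txt, and it is written for Python 3 rather than the
Python 2 used elsewhere in this thread.

```python
import io

# Made-up sample content standing in for file.txt (an assumption).
sample = "first\n\n   indented\n   \t\nlast\n"

# Two steps, as in the original question: drop blank lines, then lstrip().
f = io.StringIO(sample)
kept = [line for line in f.readlines() if line.strip()]
two_step = [line.lstrip() for line in kept]
f.close()

# One step: filter and lstrip() in a single list comprehension.
f = io.StringIO(sample)
one_step = [line.lstrip() for line in f.readlines() if line.strip()]
f.close()
```

Both one_step and two_step end up holding the same three lines.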

For those readers who are starting out with Python but are not absolute 
beginners, let me just mention that in

[line.lstrip() for line in f.readlines() if line.strip()]

the readlines() call is superfluous -- it reads every line of the file into a 
list, putting them all into memory when you need only one at a time. Together 
with the list that the comprehension builds, that effectively more than 
doubles the amount of memory needed. So get into the habit of iterating over 
a file directly

for line in f:
   ....
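
To see that direct iteration yields exactly the same lines as readlines(),
here is a small sketch. It assumes Python 3 and uses io.StringIO with made-up
content as a stand-in for a real file; the two result lists exist only so the
outcomes can be compared, which of course gives up the memory advantage.

```python
import io

# Made-up content; io.StringIO behaves like an open text file here.
sample = "line 1\nline 2\nline 3\n"

# readlines() builds the complete list of lines before the loop starts ...
with_readlines = []
for line in io.StringIO(sample).readlines():
    with_readlines.append(line)

# ... while iterating over the file object hands over one line at a time.
direct = []
for line in io.StringIO(sample):
    direct.append(line)
```

The loop bodies are identical; only the source of the lines differs.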

In the case where you want to filter and modify lines you can omit the list 
with all modified lines, too:

import sys
for line in f:
    line = line.lstrip()
    sys.stdout.write(line)

Here sys.stdout.write() writes to stdout like print, but expects a string 
and doesn't add a newline. Note that I didn't add

if line:
    sys.stdout.write(line)

-- lstrip() turns a blank line into the empty string, and it doesn't matter 
much whether you write an empty string or not. Now what about the listcomp? 
An experienced programmer would use a generator expression, which is similar 
to the listcomp but only deals with the current line. Your script will never 
run out of memory, no matter whether the file has one billion or one billion 
billion lines (disk space and runtime are another matter). A genexp looks 
similar to a listcomp

lines = (line.lstrip() for line in f)

but has the limitation here that you can only iterate over the lines while 
the file is still open. Together with a sophisticated way to close the file

with open("file.txt") as f:
    ... # do something with the file
print "f is now closed without an explicit close() call"
print "even if an error occurred while processing the file"

the whole script that removes empty lines and leading whitespace can be 
written as

import sys
with open("file.txt") as f:
    sys.stdout.writelines(line.lstrip() for line in f)
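
If you want to try this without touching a real file, the following sketch
may help. It assumes Python 3; the function name strip_blank_and_leading, the
temporary file and its contents are all invented for illustration, and an
io.StringIO buffer takes the place of sys.stdout so the result can be
inspected.

```python
import io
import os
import tempfile


def strip_blank_and_leading(path, out):
    """Write the lines of `path` to `out`, minus blank lines and leading whitespace."""
    with open(path) as f:
        # lstrip() turns a whitespace-only line into "", and writing ""
        # produces no output, so blank lines vanish without an explicit test.
        out.writelines(line.lstrip() for line in f)


# Build a small throwaway input file (the content is made up).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("  one\n\n\t\n    two\nthree\n")

buffer = io.StringIO()
strip_blank_and_leading(path, buffer)
os.remove(path)

result = buffer.getvalue()
```

Passing sys.stdout instead of buffer gives the behaviour of the short script
above.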

What was I saying? Ah: FORGET ABOUT file.readlines(). You hardly ever need 
it.
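
One caveat about the genexp limitation mentioned above, as a sketch under the
assumption of Python 3 and a made-up temporary file: a generator expression
that is first consumed after the with block has ended fails, because the file
is already closed. If the lines really have to outlive the file, a list
comprehension over the file object inside the with block does the job, still
without readlines().

```python
import os
import tempfile

# Throwaway input file with made-up content.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("  a\n\n  b\n")

with open(path) as f:
    lazy = (line.lstrip() for line in f)  # nothing has been read yet

# The file is closed now, so asking the generator for a line fails.
try:
    next(lazy)
    outcome = "read a line"
except ValueError:
    outcome = "ValueError"

# Building the list while the file is still open works fine.
with open(path) as f:
    kept = [line.lstrip() for line in f if line.strip()]

os.remove(path)
```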



