Ignoring comments - Parsing input file

Alex Martelli aleax at aleax.it
Sun Oct 13 05:25:53 EDT 2002


hunj wrote:

> A compact version :-)
> # use tuple or list if you like
> 
> lines = [line for line in open('text') if not line.startswith('#') and
> len(line.strip())]

Marginally compact-er (:-):

lines = [line for line in open('text') if line[:1]!='#' and line.strip()]

"not line.startswith('#')" is more readable than "line[:1]!='#'" (which
in turn is equivalent to "line[0]!='#'" in this case, since a line read
from the file is never empty), while the slice is a bit shorter, so
there's little to choose between them.  However, I do believe it's
better NOT to duplicate the work Python itself is doing internally
anyway: the len() call around line.strip() is redundant, since checking
the length is just what Python does internally to decide whether the
string line.strip() is true or false.
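
A quick check of that claim, as a small sketch (the sample strings are
made up for illustration): testing line.strip() directly should give the
same answer as testing len(line.strip()).

```python
# Made-up sample lines: empty, whitespace-only, and non-blank.
samples = ['', '   \n', '\t', 'x = 1\n', '  # note\n']

# Truth value of the stripped string itself...
direct = [bool(s.strip()) for s in samples]

# ...versus the truth value of its length.
via_len = [bool(len(s.strip())) for s in samples]
```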


Neither your snippet nor my variant manages to remove a comment line
that starts with some whitespace followed by a # and then the comment,
i.e. an indented comment line.  That's not hard though:

lines = [line for line in open('text') if line.strip()[:1] not in ('','#')]

line.strip()[:1] is '' when line is all-whitespace, otherwise it is the
first non-whitespace character in string line.
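
To see the effect without a real file, here is a small sketch that runs
the same comprehension over an in-memory stand-in for open('text'); the
sample contents are made up for illustration.

```python
import io

# Hypothetical file contents, held in memory instead of open('text').
text = io.StringIO(
    "alpha\n"
    "# top-level comment\n"
    "   \n"
    "    # indented comment\n"
    "beta  # trailing\n"
)

# Same test as above: drop blank lines and lines whose first
# non-whitespace character is '#'.
lines = [line for line in text if line.strip()[:1] not in ('', '#')]
```

Note that a comment trailing some other content on the same line is left
alone; only whole lines are dropped.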


None of these variants deals properly with multiline strings that may
happen to contain whitespace-only lines and/or #'s that must be
respected.  For THAT need, I'd DEFINITELY suggest using module tokenize!
Not as compact as the above idioms, but correctness has its pluses
too :-).
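
A minimal sketch of that idea, assuming the input is Python source and
using today's tokenize.generate_tokens; the helper name
significant_lines is made up for illustration.  It keeps every line
covered by at least one token other than a comment, a newline or an
indentation marker, so lines inside a triple-quoted string survive even
when they are blank or start with #.

```python
import io
import tokenize

# Token types that do not, by themselves, make a line worth keeping.
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}

def significant_lines(source):
    """Return the lines of `source` that are not blank or comment-only.

    Hypothetical helper: `source` is assumed to be valid Python text.
    """
    keep = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in _SKIP:
            # A token may span several rows (e.g. a triple-quoted
            # string); mark every row it covers.
            keep.update(range(tok.start[0], tok.end[0] + 1))
    # Split lines the same way the tokenizer read them (1-based rows).
    all_lines = io.StringIO(source).readlines()
    return [line for row, line in enumerate(all_lines, 1) if row in keep]
```

As with the one-liners, a comment trailing code on the same line is left
in place; only whole lines are dropped.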


Alex



