Weird behaviour from shlex - line no

Dave Angel davea at davea.name
Sat Sep 28 09:29:19 EDT 2013


On 28/9/2013 02:26, Daniel Stojanov wrote:

> Can somebody explain this. The line number reported by shlex depends
> on the previous token. I want to be able to tell if I have just popped
> the last token on a line.
>

I agree that it seems weird.  However, I don't think you have made
clear why it's not what you (and I) expect.

import shlex

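def parseit(string):
    # Print every token together with the lexer's lineno right after it is read.
    print
    print "Parsing -", string
    first = shlex.shlex(string)
    token = "dummy"
    while token:    # get_token() returns an empty string at end of input
        token = first.get_token()
        print token, " -- line", first.lineno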

parseit("word1 word2\nword3")     #first
parseit("word1 word2,\nword3")    #second
parseit("word1 word2,word3\nword4")
parseit("word1 word2+,?\nword3")

This will display the lineno attribute for every token.

shlex is documented at:

http://docs.python.org/2/library/shlex.html

And lineno is documented on that page as:

"""shlex.lineno
Source line number (count of newlines seen so far plus one).
"""

It's not at all clear what "seen so far" is intended to mean, but in
practice, the line number has already been incremented by the time the
last token on a line is returned. Thus your first example:

Parsing - word1 word2
word3
word1  -- line 1
word2  -- line 2
word3  -- line 2
  -- line 2

word2 already has the incremented line number, even though it sits on
line 1 of the input.

But when the last token on the line is punctuation rather than a word
(letters, digits, underscore), lineno isn't incremented for it.  Thus
your second example:

Parsing - word1 word2,
word3
word1  -- line 1
word2  -- line 1
,  -- line 1                      #we would expect this to be "line 2"
word3  -- line 2
  -- line 2
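
The same comparison can be made without depending on how print lays
things out. What follows is only a minimal sketch: token_linenos is a
made-up helper name, not part of shlex, and it is written so that it
should run on either Python 2 or 3. The results look consistent with
lineno being bumped at the moment the lexer reads the newline character,
which happens while it is looking for the end of a word but not when a
punctuation character ends the word first. That is my reading of the
behaviour, not something the documentation states.

```python
import shlex

def token_linenos(text):
    """Return a list of (token, lineno) pairs for text.

    Hypothetical helper for illustration; lineno is sampled right
    after each call to get_token().
    """
    lexer = shlex.shlex(text)
    pairs = []
    token = lexer.get_token()
    while token:                      # empty string signals end of input
        pairs.append((token, lexer.lineno))
        token = lexer.get_token()
    return pairs

# Last token on line 1 is a word: it already reports line 2.
first = token_linenos("word1 word2\nword3")

# Last token on line 1 is a comma: it still reports line 1.
second = token_linenos("word1 word2,\nword3")
```

Comparing first and second side by side shows the same difference as
the two printed runs above.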

Anybody else have some explanation or advice for Daniel, other than
preprocessing the string by stripping any non letters off the end of the
line?
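
One more possibility, offered only as a sketch and not as what shlex
intends: skip lineno altogether and run a fresh lexer over each physical
line, so that "last token on the line" is simply the last token that
lexer produces. This assumes no quoted string spans a newline (shlex
would raise ValueError for an unclosed quote on a line), and the helper
name tokens_with_line_info is my own invention.

```python
import shlex

def tokens_with_line_info(text):
    """Return (token, line_number, is_last_on_line) triples.

    Hypothetical helper, not part of shlex. Each physical line gets its
    own lexer, so the line number comes from enumerate() rather than
    from shlex.lineno.
    """
    result = []
    for line_number, line in enumerate(text.splitlines(), 1):
        lexer = shlex.shlex(line)
        tokens = []
        token = lexer.get_token()
        while token:                  # empty string signals end of this line
            tokens.append(token)
            token = lexer.get_token()
        for index, token in enumerate(tokens):
            is_last = (index == len(tokens) - 1)
            result.append((token, line_number, is_last))
    return result

info = tokens_with_line_info("word1 word2,\nword3")
```

Here the comma is flagged as the last token of line 1 and word3 as the
last token of line 2, which is the information Daniel was after. The
cost is that any lexer state is lost between lines.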

-- 
DaveA




