a little parsing challenge ☺

John O'Hagan research at johnohagan.com
Mon Jul 25 01:57:06 EDT 2011


On Thu, 21 Jul 2011 05:58:48 -0700 (PDT)
Xah Lee <xahlee at gmail.com> wrote:

[...]

> > > On Sunday, July 17, 2011 2:48:42 AM UTC-7, Raymond Hettinger wrote:
> > >> On Jul 17, 12:47 am, Xah Lee <xah... at gmail.com> wrote:
> > >>> i hope you'll participate. Just post solution here. Thanks.
> >
> > >>http://pastebin.com/7hU20NNL
> >
> > > just installed py3.
> > > there seems to be a bug.
> > > in this file
> >
> > >http://xahlee.org/p/time_machine/tm-ch04.html
> >
> > > there's a mismatched double curly quote. at position 28319.
> >
> > > the python code above doesn't seem to spot it?

[...]

> >
> > That script doesn't check that the balance is zero at the end of file.
> >
> > Patch:
> >
> > --- ../xah-raymond-old.py       2011-07-19 20:05:13.000000000 +0200
> > +++ ../xah-raymond.py   2011-07-19 20:03:14.000000000 +0200
> > @@ -16,6 +16,8 @@
> >          elif c in closers:
> >              if not stack or c != stack.pop():
> >                  return i
> > +    if stack:
> > +        return i
> >      return -1
> >
> >  def scan(directory, encoding='utf-8'):
> 
> Thanks a lot for the fix Raymond.
> 
> Though, the code seems to have a minor problem.
> It works, but the report is wrong.
> e.g. output:
> 
> 30068: c:/Users/h3/web/xahlee_org/p/time_machine\tm-ch04.html
> 
> that 30068 position is the last char in the file.
> The correct should be 28319. (or at least point somewhere in the file
> at a bracket char that doesn't match.)
> 

[...]

If you want to know where the brackets that remain unclosed at EOF were opened, you have to keep the indices as well as the characters in the stack. You also can't return until the scan is complete, because anything still in the stack might turn out to be the earliest error. Easy enough to implement:

def checkmatch(string):  # skipping the file handling
    openers = {'[': ']', '(': ')', '{': '}'}  # etc.
    closers = set(openers.values())
    still_open, close_errors = [], []
    # positions are 1-based
    for index, char in enumerate(string, start=1):
        if char in openers:
            # remember where each opener was seen
            still_open.append((index, char))
        elif char in closers:
            if still_open and char == openers[still_open[-1][1]]:
                still_open.pop()
            else:
                # closer that doesn't match the opener on top of the stack
                close_errors.append((index, char))
    if still_open or close_errors:
        # earliest of: first unclosed opener, first bad closer
        return min(still_open[:1] + close_errors[:1])[0]
    return None  # everything balanced


although you might as well return still_open + close_errors and see them all.
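
A minimal sketch of that variant, assuming the same bracket table as above. The name all_mismatches is hypothetical, not something from the thread, and the result is sorted by position purely for readability:

```python
def all_mismatches(string):
    """Return every unbalanced (index, char) pair, sorted by position.

    Hypothetical variant of checkmatch; indices are 1-based as above.
    """
    openers = {'[': ']', '(': ')', '{': '}'}  # extend with other pairs as needed
    closers = set(openers.values())
    still_open, close_errors = [], []
    for index, char in enumerate(string, start=1):
        if char in openers:
            still_open.append((index, char))
        elif char in closers:
            if still_open and char == openers[still_open[-1][1]]:
                still_open.pop()
            else:
                close_errors.append((index, char))
    # Unclosed openers and bad closers together, in order of appearance.
    return sorted(still_open + close_errors)
```

With this, all_mismatches("(a[b)c") reports [(1, '('), (3, '['), (5, ')')], and a balanced string gives an empty list.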

Regards,

John


