Re: a little parsing challenge ☺
Xah Lee
xahlee at gmail.com
Thu Jul 21 08:58:48 EDT 2011
On Jul 19, 11:07 am, Thomas Jollans <t... at jollybox.de> wrote:
> On 19/07/11 18:54, Xah Lee wrote:
>
> > On Sunday, July 17, 2011 2:48:42 AM UTC-7, Raymond Hettinger wrote:
> >> On Jul 17, 12:47 am, Xah Lee <xah... at gmail.com> wrote:
> >>> i hope you'll participate. Just post solution here. Thanks.
>
> >> http://pastebin.com/7hU20NNL
>
> > Just installed py3.
> > There seems to be a bug. In this file
>
> > http://xahlee.org/p/time_machine/tm-ch04.html
>
> > there's a mismatched double curly quote at position 28319.
>
> > The Python code above doesn't seem to spot it?
>
> > Here's the elisp script's output when run on that directory:
>
> > Error file: c:/Users/h3/web/xahlee_org/p/time_machine/tm-ch04.html
> > ["“" 28319]
> > Done deal!
>
> That script doesn't check that the balance is zero at the end of the file.
>
> Patch:
>
> --- ../xah-raymond-old.py 2011-07-19 20:05:13.000000000 +0200
> +++ ../xah-raymond.py 2011-07-19 20:03:14.000000000 +0200
> @@ -16,6 +16,8 @@
>          elif c in closers:
>              if not stack or c != stack.pop():
>                  return i
> +    if stack:
> +        return i
>      return -1
>
>  def scan(directory, encoding='utf-8'):
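For reference, here is a minimal runnable sketch of the checker with the patch above applied. The delimiter table is an assumption (the pastebin script's exact set is not reproduced here), and indexes are 0-based. It shows why the patched version points at the end of the file: the added `return i` runs after the loop has finished, when `i` still holds the index of the last character read.

```python
# Hypothetical reconstruction of the patched checker.
# The delimiter table is an assumption; indexes are 0-based.
OPENERS = {'(': ')', '[': ']', '{': '}', '“': '”'}
CLOSERS = set(OPENERS.values())


def check_balance(characters):
    """Return -1 if balanced, else an index where the check failed."""
    stack = []  # expected closers, innermost last
    for i, c in enumerate(characters):
        if c in OPENERS:
            stack.append(OPENERS[c])
        elif c in CLOSERS:
            if not stack or c != stack.pop():
                return i
    if stack:      # end-of-file check added by the patch
        return i   # i is still the index of the LAST character read
    return -1


text = 'a “quoted (word) here'
print(check_balance(text))  # prints 20, the last index, not the “ at 2
```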
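Thanks a lot for the fix, Raymond.
However, the patched code seems to have a minor problem.
It detects the error, but the reported position is wrong.
For example, the output is:
30068: c:/Users/h3/web/xahlee_org/p/time_machine\tm-ch04.html
Position 30068 is the last character in the file, because after the loop finishes, the loop variable i still holds the index of the last character read.
The correct position is 28319 (or the report should at least point somewhere in the file at a bracket character that doesn't match).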
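One possible way to make the report point at the unmatched bracket is to push each opener's position onto the stack together with its expected closer. When the stack is still non-empty at the end of the file, the checker then reports the earliest opener that was never closed. This is a sketch under assumptions, not a fix posted in the thread: the delimiter table is again hypothetical, and indexes are 0-based.

```python
# Sketch: report the position of the unmatched opener, not the end of file.
# The delimiter table is an assumption; indexes are 0-based.
OPENERS = {'(': ')', '[': ']', '{': '}', '“': '”'}
CLOSERS = set(OPENERS.values())


def check_balance(characters):
    """Return -1 if balanced, else the index of the offending delimiter.

    A stray or mismatched closer is reported at its own index; an opener
    that is never closed is reported at the opener's index.
    """
    stack = []  # pairs of (expected closer, index of the opener)
    for i, c in enumerate(characters):
        if c in OPENERS:
            stack.append((OPENERS[c], i))
        elif c in CLOSERS:
            if not stack or c != stack.pop()[0]:
                return i
    if stack:
        # Earliest opener still waiting for its closer.
        return stack[0][1]
    return -1


text = 'a “quoted (word) here'
print(check_balance(text))  # prints 2, the index of the “
```

If the elisp script reports Emacs buffer positions, which start at 1, add 1 to this 0-based index before comparing the two reports.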
Today, I tried 3 more scripts (2 fixed Python 3 versions and 1 Ruby version); all failed again. I've reported the problems I encountered in the Python and Ruby newsgroups. If you are the author, a fix would be very much appreciated.
I'll get back to your code and eventually write a blog post summarizing all the different language versions.
I'm off to test that elaborate Perl regex now... fingers crossed.
Xah. Mood: quite discouraged.
More information about the Python-list mailing list