Re: a little parsing challenge ☺

Xah Lee xahlee at gmail.com
Mon Jul 18 10:39:28 EDT 2011


On Jul 17, 12:47 am, Xah Lee <xah... at gmail.com> wrote:
> 2011-07-16
>
> folks, this one will be interesting one.
>
> the problem is to write a script that can check a dir of text files
> (and all subdirs) and reports if a file has any mismatched matching
> brackets.
>
Ok, here's my solution (pasted at bottom). I haven't tried to make it
elegant or terse, yet, seeing that many are already much elegent than
i could possibly do so with my code.

my solution basically use a stack. (i think all of us are doing
similar) Here's the steps:

• Go thru the file char by char, find a bracket char.
• check if the one on stack is a matching opening char. If so remove
it. Else, push the current onto the stack.
• Repeat the above till end of file.
• If the stack is not empty, then the file got mismatched brackets.
Report it.
• Do the above on all files.

Many elegant solutions. Raymond Hettinger is very quick, posted a
solution only after a hour or so when i posted it. Many others are
very short, very nice. Thank you all for writing them. I haven't
studied them yet. I'll run them all and post a summary in 2 days. (i
have few thousands files to run this test thru, many of them have
mismatched brackets. So i have good data to test with.)

PS we still lack a perl, Scheme lisp, tcl, lua versions. These
wouldn't be hard and would be interesting to read.  If you are picking
up one of these lang, this would be a good exercise.  Haskell too. I
particularly would like to see a javascript version ran from command
line. Maybe somebody can put this exercise to Google folks ... they
are like the js gods.

also, now that we have these home-brewed code, how'd a parser expert
do it? Is it possible to make it even simpler by using some parser
tools? (have no idea what those lex yacc do, or modern incarnations)
I've also been thinking whether this can be done with Parsing
Expression Grammar. That would make the code semantics really elegant
(as opposed home-cooked stack logic).

 Xah

;; -*- coding: utf-8 -*-
;; 2011-07-15, Xah Lee
;; go thru a file, check if all brackets are properly matched.
;; e.g. good: (…{…}… “…”…)
;; bad: ( [)]
;; bad: ( ( )

(setq inputDir "~/web/xahlee_org/p/") ; must end in slash

(defvar matchPairs '() "a alist. For each air, the car is opening
char, cdr is closing char.")

(setq matchPairs '(
                   ("(" . ")")
                   ("{" . "}")
                   ("[" . "]")
                   ("“" . "”")
                   ("‹" . "›")
                   ("«" . "»")
                   ("【" . "】")
                   ("〈" . "〉")
                   ("《" . "》")
                   ("「" . "」")
                   ("『" . "』")
                   )
      )

(defvar searchRegex "" "regex string of all pairs to search.")
(setq searchRegex "")
(mapc
 (lambda (mypair) ""
   (setq searchRegex (concat searchRegex (regexp-quote (car mypair))
"|" (regexp-quote (cdr mypair)) "|") )
   )
 matchPairs)

(setq searchRegex (replace-regexp-in-string "|$" "" searchRegex t
t)) ; remove the ending “|”

(setq searchRegex (replace-regexp-in-string "|" "\\|" searchRegex t
t)) ; change | to \\| for regex “or” operation

(defun my-process-file (fpath)
  "process the file at fullpath fpath ..."
  (let (myBuffer (ii 0) myStack ξchar ξpos)

    (setq myStack '() ) ; each element is a vector [char position]
    (setq ξchar "")

    (setq myBuffer (get-buffer-create " myTemp"))
    (set-buffer myBuffer)
    (insert-file-contents fpath nil nil nil t)

    (goto-char 1)
    (while (search-forward-regexp searchRegex nil t)
      (setq ξpos (point)  )
      (setq ξchar (buffer-substring-no-properties ξpos (- ξpos 1))  )

      ;; (princ (format "-----------------------------\nfound char: %s
\n" ξchar) )

      (let ((isClosingCharQ nil) (matchedOpeningChar nil) )
        (setq isClosingCharQ (rassoc ξchar matchPairs))
        (when isClosingCharQ (setq matchedOpeningChar (car
isClosingCharQ) ) )

        ;; (princ (format "isClosingCharQ is: %s\n" isClosingCharQ) )
        ;; (princ (format "matchedOpeningChar is: %s\n"
matchedOpeningChar) )

        (if
            (and
             (car myStack) ; not empty
             (equal (elt (car myStack) 0) matchedOpeningChar )
             )
            (progn
              ;; (princ (format "matched this bottom item on stack: %s
\n" (car myStack)) )
              (setq myStack (cdr myStack) )
              )
          (progn
            ;; (princ (format "did not match this bottom item on
stack: %s\n" (car myStack)) )
            (setq myStack (cons (vector ξchar ξpos) myStack) ) )
          )
        )
      ;; (princ "current stack: " )
      ;; (princ myStack )
      ;; (terpri )
      )

    (when (not (equal myStack nil))
      (princ "Error file: ")
      (princ fpath)
      (print (car myStack) )
      )
    (kill-buffer myBuffer)
    ))


;; (require 'find-lisp)

(let (outputBuffer)
  (setq outputBuffer "*xah match pair output*" )
  (with-output-to-temp-buffer outputBuffer
    (mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$"))
    (princ "Done deal!")
    )
  )



More information about the Python-list mailing list