a little parsing challenge ☺
Thomas 'PointedEars' Lahn
PointedEars at web.de
Mon Jul 18 17:59:06 EDT 2011
Ian Kelly wrote:
> Billy Mays wrote:
>> I gave it a shot. It doesn't do any of the Unicode delims, because let's
>> face it, Unicode is for goobers.
>
> Uh, okay...
>
> Your script also misses the requirement of outputting the index or row
> and column of the first mismatched bracket.
Thanks to Python's expressiveness, this can be easily remedied (see below).

I also do not follow Billy's comment about Unicode. Unicode and the fact
that Python supports it *natively* cannot be appreciated enough in a
globalized world.

However, I have learned a lot about being pythonic from his posting (take
those generator expressions, for example!), and the idea of looking at the
top of a stack for reference is a really good one. Thank you, Billy!

Here is my improvement of his code, which should fill the mentioned gaps.
I have also reversed the order in the report line as I think it is more
natural this way. I have tested the code superficially with a directory
containing a single text file. Watch for word-wrap:

# encoding: utf-8
'''
Created on 2011-07-18
@author: Thomas 'PointedEars' Lahn <PointedEars at web.de>, based on an idea of
Billy Mays <81282ed9a88799d21e77957df2d84bd6514d9af6 at myhashismyemail.com>
in <news:j01ph6$knt$1 at speranza.aioe.org>
'''
import sys, os
pairs = {u'}': u'{', u')': u'(', u']': u'[',
         u'”': u'“', u'›': u'‹', u'»': u'«',
         u'】': u'【', u'〉': u'〈', u'》': u'《',
         u'」': u'「', u'』': u'『'}
valid = set(v for pair in pairs.items() for v in pair)
if __name__ == '__main__':
    for dirpath, dirnames, filenames in os.walk(sys.argv[1]):
        for name in filenames:
            # stack of (opener, line, column) for brackets still open
            stack = []
            # stays None as long as the file is balanced
            first_bad = None
            # you can use chardet etc. instead
            encoding = 'utf-8'
            with open(os.path.join(dirpath, name), 'r') as f:
                chars = ((c, line_no, col) for line_no, line in enumerate(f)
                         for col, c in enumerate(line.decode(encoding))
                         if c in valid)
                for c, line_no, col in chars:
                    if c in pairs:
                        if stack and stack[-1][0] == pairs[c]:
                            stack.pop()
                        else:
                            # closing bracket without a matching opener
                            first_bad = (c, line_no + 1, col + 1)
                            break
                    else:
                        stack.append((c, line_no + 1, col + 1))
            if first_bad is None and stack:
                # opening bracket that was never closed
                first_bad = stack[0]
            print '%s: %s' % (name, "good" if first_bad is None
                              else "bad '%s' at %s:%s" % first_bad)
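
For readers on Python 3, the same stack technique can be sketched as a standalone function that takes already-decoded text. This is only an illustration under a few assumptions: the names `PAIRS`, `OPENERS` and `first_mismatch` are made up for this sketch, and when an opening bracket is never closed it reports the earliest one still open, which is one possible reading of "first mismatched bracket".

```python
# Python 3 sketch; PAIRS, OPENERS and first_mismatch are illustrative names.
PAIRS = {'}': '{', ')': '(', ']': '[',
         '”': '“', '›': '‹', '»': '«',
         '】': '【', '〉': '〈', '》': '《',
         '」': '「', '』': '『'}
OPENERS = set(PAIRS.values())


def first_mismatch(text):
    """Return (char, line, column) of the first unmatched bracket, or None.

    Line and column are 1-based.
    """
    stack = []  # (opener, line, column) for every bracket still open
    for line_no, line in enumerate(text.splitlines(), 1):
        for col, c in enumerate(line, 1):
            if c in OPENERS:
                stack.append((c, line_no, col))
            elif c in PAIRS:
                if stack and stack[-1][0] == PAIRS[c]:
                    stack.pop()
                else:
                    # closer that does not match the top of the stack
                    return (c, line_no, col)
    # assumption: an unclosed opener is reported at its own position
    return stack[0] if stack else None
```

For example, `first_mismatch("f(x]")` returns `(']', 1, 4)`, and a balanced string such as `"a (b [c] d)"` returns `None`.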
--
PointedEars
Bitte keine Kopien per E-Mail. / Please do not Cc: me.