SGML to Python memory tree

François Pinard pinard at iro.umontreal.ca
Wed May 17 08:03:21 EDT 2000


Hi, gang.  Once I get started on sharing little pieces of code... :-)

For the Translation Project, I have some Python code that reads `nsgmls'
output into a memory tree.  It does not process attributes, as I did
not have any in my little application.  This code is surprisingly short,
given what it does.  (It had to work with Python 1.5.1, which is why it
works around the missing `LIST.pop()'.)


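import os
import string
import sys

# Stand-in for the usual gettext-style `_' marker; returns text unchanged.
def _(text):
    return text

def read_sgml_file(name):
    # Return the document element as nested tuples (tag, child, ...),
    # or None (after a message on stderr) if no `C' line was seen.
    stack = []                  # enclosing elements, still open
    current = []                # element being built: [tag, child, ...]
    # Avoid docbk30, which raises some unanalysed interference.
    for line in os.popen('SGML_CATALOG_FILES= nsgmls %s' % name).readlines():
        if line[0] == '(':
            # Start tag: save the parent, open a new element.
            stack.append(current)
            current = [string.lower(line[1:-1])]
            continue
        if line[0] == ')':
            # End tag: freeze the element, attach it to its parent.
            element = tuple(current)
            current = stack[-1]
            del stack[-1]       # no LIST.pop() in Python 1.5.1
            current.append(element)
            continue
        if line[0] == '-':
            # Character data: undo the newline and tab escapes.
            line = line[1:-1]
            line = string.replace(line, '\\n', '\n')
            line = string.replace(line, '\\011', '\t')
            line = string.rstrip(line)
            current.append(line)
            continue
        if line[0] == 'C':
            # `C' marks a conforming document: return the top element.
            return current[0]
        # Any other line type (attributes included) is ignored.
    sys.stderr.write(_("SGML in `%s' is not conformant.\n") % name)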
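
For anyone wanting to try the idea without `nsgmls' at hand, here is a
rough sketch of the same loop for a later Python (3.x assumed), reading
from a list of lines instead of a pipe.  The name `parse_esis' and the
sample input are made up for illustration; they are not part of the code
above, and the sketch ignores attributes just as the original does.

```python
def parse_esis(lines):
    """Build a nested-tuple tree from nsgmls-style output lines.

    Hypothetical helper: same logic as read_sgml_file, minus the pipe.
    """
    stack = []      # enclosing elements, still open
    current = []    # element being built: [tag, child, ...]
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == '(':
            # Start tag: save the parent, open a new element.
            stack.append(current)
            current = [rest.lower()]
        elif code == ')':
            # End tag: freeze the element, attach it to its parent.
            element = tuple(current)
            current = stack.pop()
            current.append(element)
        elif code == '-':
            # Character data: same two escapes as the original code.
            text = rest.replace('\\n', '\n').replace('\\011', '\t')
            current.append(text.rstrip())
        elif code == 'C':
            # Conforming document: return the top element.
            return current[0]
    return None     # no `C' line seen


# Made-up sample, in the shape of the line types handled above.
sample = [
    "(DOC\n",
    "(TITLE\n",
    "-Hello\n",
    ")TITLE\n",
    "(PARA\n",
    "-First line\\nsecond line\n",
    ")PARA\n",
    ")DOC\n",
    "C\n",
]
tree = parse_esis(sample)
print(tree)
```

Each element comes out as a tuple whose first item is the lowercased tag
name, followed by its children in document order: strings for text,
nested tuples for sub-elements.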

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard





