Parsing problems: A journey from a text file to a directory tree

Michael J. Fromberger Michael.J.Fromberger at Clothing.Dartmouth.EDU
Tue Sep 18 14:51:51 EDT 2007


In article <1189958074.670245.17910 at n39g2000hsh.googlegroups.com>,
 "Martin M." <martinmichel at ame-electroplating.com> wrote:

> Hi everybody,
> 
> Some of my colleagues want me to write a script for easy folder and
> subfolder creation on the Mac.
> 
> The script is supposed to scan a text file containing directory trees
> in the following format:
> 
> [New client]
> |-Invoices
> |-Offers
> |--Denied
> |--Accepted
> |-Delivery notes
> 
> As you can see, the folder hierarchy is expressed by the amounts of
> minuses, each section header framed by brackets (like in Windows
> config files).
> 
> After the scan process, the script is supposed to show a dialog, where
> the user can choose from the different sections (e.g. 'Alphabet',
> 'Months', 'New client' etc.). Then the script will create the
> corresponding folder hierarchy in the currently selected folder (done
> via AppleScript).
> 
> But currently I simply don't know how to parse these folder lists and
> how to save them in an array accordingly.
> 
> First I thought of an array like this:
> 
> dirtreedb = {'New client': {'Invoices': {}, 'Offers': {'Denied': {},
> 'Accepted': {}}, 'Delivery notes': {}}}
> 
> But this doesn't do the trick, as I also have to save the hierarchy
> level of the current folder as well...
> 
> Argh, I really don't get my head around this problem and I need your
> help. I have the feeling, that the answer is not that complicated, but
> I just don't get it right now...

Hello, Martin,

A good way to approach this problem is to recognize that each section of 
your proposed configuration represents a kind of depth-first traversal 
of the tree structure you propose to create.  Thus, you can reconstruct 
the tree by keeping track at all times of the path from the "root" of 
the tree to the "current location" in the tree.

Below is one possible implementation of this idea in Python.  In short, 
the function keeps track of a stack of dictionaries, each of which 
represents the contents of some directory in your hierarchy.  As you 
encounter "|--" lines, entries are pushed to or popped from the stack 
according to whether the nesting level has increased or decreased.

This code is not heavily tested, but I hope it is clear.  (The leading
"." on each line only protects the indentation; strip it before
running.)

.import re
.
.def parse_folders(input):
.    """Read input from a file-like object that describes directory
.    structures to be created.  The input format is:
.
.    [Top-level name]
.    |-Subdirectory1
.    |--SubSubDirectory1
.    |--SubSubDirectory2
.    |---SubSubSubDirectory1
.    |-Subdirectory2
.    |-Subdirectory3
.
.    The input may consist of any number of such groups.  The result is
.    a dictionary structure in which each key names a directory, and
.    the corresponding value is a dictionary structure showing the
.    contents of that directory, possibly empty.
.    """
.
.    # This expression matches "header" lines, defining a new section.
.    new_re  = re.compile(r'\[([\w ]+)\]\s*$')
.
.    # This expression matches "nesting" lines, defining subdirectories.
.    more_re = re.compile(r'(\|-+)([\w ]+)$')
.    
.    out = {}        # Root:  Maps section names to subtrees.
.    state = [out]   # Stack of dictionaries, current path.
.
.    for line in input:
.        m = new_re.match(line)
.        if m:       # New section begins here...
.            key = m.group(1).strip()
.            out[key] = {}
.            state = [out, out[key]]
.            continue
.
.        m = more_re.match(line)
.        if m:       # Add a directory to an existing section
.            assert state
.            
.            new_level = len(m.group(1))
.            key = m.group(2).strip()
.            
.            while new_level < len(state):
.                state.pop()
.
.            state[-1][key] = {}
.            state.append(state[-1][key])
.
.    return out

To call this, pass a file-like object to parse_folders(), e.g.:

test1 = '''
[New client]
|-Invoices
|-Offers
|--Denied
|--Accepted
|---Reasons
|---Rhymes
|-Delivery notes
'''

from io import StringIO   # on Python 2: from StringIO import StringIO
result = parse_folders(StringIO(test1))

As the documentation suggests, the result is a nested dictionary 
structure, representing the folder structure you encoded.  I hope this 
helps.
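
Since the end goal is to create real folders, here is a minimal sketch
of one way to turn such a nested dictionary into directories on disk.
The names iter_paths() and create_tree() are made up for this example
and are not part of the code above, and the dictionary is a
hand-written stand-in for the parser's output rather than a captured
result.  It assumes the section name is only a template label, so the
section's children are created directly beneath the chosen base
directory.

```python
import os
import tempfile

def iter_paths(tree, prefix=""):
    """Yield one relative path per directory in a nested dict of dicts."""
    for name, subtree in tree.items():
        path = os.path.join(prefix, name)
        yield path
        # Recurse into the children of this directory, if any.
        for sub in iter_paths(subtree, path):
            yield sub

def create_tree(base, tree):
    """Create every directory described by `tree` beneath `base`."""
    for rel in iter_paths(tree):
        full = os.path.join(base, rel)
        if not os.path.isdir(full):
            os.makedirs(full)

# Hand-written stand-in for the nested dictionary described above.
example = {
    'New client': {
        'Invoices': {},
        'Offers': {
            'Denied': {},
            'Accepted': {'Reasons': {}, 'Rhymes': {}},
        },
        'Delivery notes': {},
    },
}

# Use a scratch directory so that nothing real is touched.
base = tempfile.mkdtemp()
create_tree(base, example['New client'])
```

In a real script, base would be the folder selected by the user, and
the key passed in would be whichever section was picked in the dialog.
Note also that the parser silently skips lines its patterns do not
match, so a folder name containing, say, a hyphen or a dot would never
reach this step.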

Cheers,
-M

-- 
Michael J. Fromberger             | Lecturer, Dept. of Computer Science
http://www.dartmouth.edu/~sting/  | Dartmouth College, Hanover, NH, USA


