[Tutor] storing and saving file tree structure

Sun Jan 24 04:27:37 EST 2021

On 24/01/2021 06:45, mhysnm1964 at gmail.com wrote:

> D:\authors\a\Anne rice\title\filename.pdf
> 
> Title - is the title of the book and the filename could be a PDF, Mob, MP3,
> ETC. depending where I have purchased the book. What I am trying to do is
> bring all the information into a spreadsheet. Hence why I am trying to bring
> in the whole directory structure into a data struct like a dict.

You could use a database instead.
SQLite comes with python and can be run in memory rather than on disk.
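
As a rough sketch of the database idea (the table name, column names and
sample rows below are made up for illustration, they are not from your
code), an in-memory SQLite table with a UNIQUE constraint throws away the
duplicates for you:

```python
import sqlite3

# ":memory:" gives a database that lives in RAM only, nothing touches the disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per (author, title) pair, duplicates forbidden.
conn.execute(
    "CREATE TABLE books (author TEXT, title TEXT, UNIQUE (author, title))"
)

# Made-up sample data; the repeated pair stands in for several files
# (track1.mp3, track2.mp3, ...) living in the same title directory.
rows = [
    ("Anne Rice", "title"),
    ("Anne Rice", "title"),
    ("Anne Rice", "another title"),
]

# INSERT OR IGNORE skips any row that would violate the UNIQUE constraint.
conn.executemany("INSERT OR IGNORE INTO books VALUES (?, ?)", rows)

unique_rows = conn.execute(
    "SELECT author, title FROM books ORDER BY author, title"
).fetchall()
conn.close()
```

unique_rows then holds each author/title pair once, ready to be written
out as CSV rows.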

> *	Some directories have multiple files. Thus when you use pathlib you
> will get multiple path objects. For example:
> D:\authors\a\Anne rice\title\track1.mp3
> D:\authors\a\Anne rice\title\track2.mp3
> D:\authors\a\Anne rice\title\track3.mp3
> 
> *	I do not want the filenames to be included, only the directory names
> which I have worked out.
> *	The last directory is normally the title of the book. This is how I
> have structured the directory.
> *	I want to remove duplicate entries of author names and titles.
> *	Want to import into Excel - I have information on this part. Either
> directly into a spreadsheet or use CSV.

Rewording the requirement.

You have a set of authors and each author has a set of books associated.
Is that it?

> how to write the code to remove duplicates. Dictionaries are really great to
> identify duplicate keys because you can use a simple if test. Finding
> duplicates in a list is more challenging. 

So don't use a list, use a set. Sets remove duplicates automatically
(i.e. they don't allow them to exist!)

> Books = {"Anne Rice": []} # dict with a list.

Books = {"Anne Rice": set()} # dict with a set
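
A minimal sketch of how that might be filled in, assuming you already have
author/title pairs in hand (the pairs list here is invented sample data):

```python
books = {}

# Made-up (author, title) pairs; the repeat stands in for several files
# found in the same title directory.
pairs = [
    ("Anne Rice", "title"),
    ("Anne Rice", "title"),
    ("Anne Rice", "other title"),
]

for author, title in pairs:
    # setdefault returns the author's set, creating an empty one first
    # if the author is not in the dict yet. Adding a title that is
    # already in the set does nothing.
    books.setdefault(author, set()).add(title)
```

No duplicate test is needed anywhere: the set takes care of it.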

> Only methods I have found to identify duplicates within lists is using for
> loops. Thus I was trying to work out how to use dictionaries instead and
> could not. Creating nested dictionaries dynamically is beyond my ability.

It's really not difficult. Let's pretend you wanted all the files
associated with each book:

Books = {"Anne rice": {"Book Title": [list,of,files]}}

Personally I use formatting to show the layout better if I'm building
it statically, but in your case you are loading it dynamically from
your files.

Books = {
         "Anne rice": {
                       "Book1": [
                                list,
                                of,
                                files
                                ],
                       "Book2": [
                                More,
                                Files
                                ]
                       },
         "Next author": {
                         etc...
                 }
        }

But since you don't need that, just use a set instead of a list.
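
For completeness, if you ever did want the nested author/title/files
form, one way to build it dynamically is to chain setdefault() calls. This
is only a sketch and the records list is invented sample data:

```python
books = {}

# Made-up (author, title, filename) records, as might be extracted
# from the path parts of each file.
records = [
    ("Anne rice", "Book1", "track1.mp3"),
    ("Anne rice", "Book1", "track2.mp3"),
    ("Anne rice", "Book2", "book.pdf"),
]

for author, title, filename in records:
    # The first setdefault gets (or creates) the author's inner dict,
    # the second gets (or creates) the file list for that title.
    books.setdefault(author, {}).setdefault(title, []).append(filename)
```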

> import  re, csv
> 
> from pathlib import Path

> def csv_export (data):
>     # dumps the data list to a csv and has to be an list

You can write dicts to a CSV file, one dict per row, and the dict keys
become the column headings. Look at csv.DictWriter.
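
Something along these lines, as a sketch only. The column names "author"
and "title" are my assumption, and io.StringIO stands in for the real
file so the example runs on its own:

```python
import csv
import io

# Made-up data in the dict-of-sets shape.
books = {"Anne Rice": {"title", "other title"}}

# With a real file you would use:
#     open('my-books.csv', 'w', newline="")
fp = io.StringIO()

# fieldnames lists the dict keys to write and fixes the column order.
writer = csv.DictWriter(fp, fieldnames=["author", "title"])
writer.writeheader()

# Sets are unordered, so sort to get a stable row order.
for author in sorted(books):
    for title in sorted(books[author]):
        writer.writerow({"author": author, "title": title})

csv_text = fp.getvalue()
```

The header row comes from fieldnames, and each writerow() call takes a
dict with those same keys.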

>     with open ('my-books.csv', 'w', newline="") as fp:
>         writer = csv.writer(fp)
>         writer.writerows(data)
> # end def  
> 
> books = {}
> bookPath = []
> dirList = Path(r"e:\authors") # starting directory
> 
> for path in dirList.rglob('*'): # loading the whole directory structure.
> 
>     if not path.is_dir(): # Checks to see if is a file.
> 
>         bookPath = list(path.relative_to(dirList).parts) # extracts the file path as a tuple without "e:\author".
> 
>         bookPath.pop() # removes the file from the path parts, as we only want directory names.
> 
>        author = bookPath[1] # author name is always the 2nd element.
> 
>         if author in books: # check for existing keys

It's a dictionary, why do you care? Just add the books to the author
entry: with setdefault() or a defaultdict, if the entry exists it will
work, and if it doesn't the entry will be created.

> 
>             if  bookPath[-1] not in books[author]: # trying to find duplicate titles but fails.

If you use a set you don't need to check. But...
In what way does it fail? It should succeed with an in test even
if it's not very efficient.

>                 books[author].append(bookPath)
>             # end if 
>         else: # creates new entries for dict.
>             books[author] = bookPath
>         # end if 
>     # end if 
> # end for

One of the reasons Python uses indentation is to avoid all these
end markers and their misleading, and thus bug-forming, implications.
It's rather ironic that you are putting them back in as comments! :)

> I suspect I might have to do recursive functions but not sure how to do
> this. I always have challenges with recursive logic. I hope someone can help
> and the above makes sense.

It looks like it should work, although you only check the books if the
author is already there. Where is the code to handle the case where it's
a new author?

But if you use a dict of sets you avoid all of that checking business.
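
Putting it together, here is a rough sketch of the whole walk with a dict
of sets. The function name collect_books is hypothetical, and it assumes
the letter/author/title layout from your example. The temporary directory
is only there so the sketch can run without your real e:\authors tree:

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def collect_books(root):
    """Map each author directory name to the set of its title directory names."""
    books = defaultdict(set)  # a missing author gets an empty set automatically
    for path in Path(root).rglob('*'):
        if path.is_file():
            # Directory parts only: slicing off the last part drops the filename.
            parts = path.relative_to(root).parts[:-1]
            if len(parts) >= 3:  # assumed layout: letter / author / title
                # Author is the 2nd element, title is the last directory.
                books[parts[1]].add(parts[-1])
    return books


# Build a small throwaway tree that mimics the layout in the post.
with tempfile.TemporaryDirectory() as tmp:
    title_dir = Path(tmp) / "a" / "Anne rice" / "title"
    title_dir.mkdir(parents=True)
    for name in ("track1.mp3", "track2.mp3", "track3.mp3"):
        (title_dir / name).touch()
    result = dict(collect_books(tmp))
```

Three files in one title directory still give a single title, with no
if tests and no recursion of your own (rglob does the recursing).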

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos