[Tutor] filename comparison

Peter Otten __peter__ at web.de
Wed Jan 12 05:25:08 EST 2022


On 11/01/2022 11:34, mhysnm1964 at gmail.com wrote:
> All,
>
> Problem Description: I have over 8000 directories. In each directory there
> is a text file and a MP3 file. Below is the file naming structure for an MP3
> or text file:
>
> Other Mother plot.txt
> Other Mother.mp3
>
> What should occur:
>
> * Each directory should have both the above two files.
> * There can be multiple MP3 and text files in the same directory.
> * I want to find out which directories do not have a plot text file
>   associated to the already existing mp3 file.
> * I want to find out which plot text file does not have a mp3 file.
>
> I have already managed to walk the directory structure using os.walk. But I
> am struggling with the best method of comparing the existing files.
>
> Anyone have any ideas how to approach this problem? As I am completely stuck
> on how to resolve this.

Given the filenames in one directory, your task reduces to a few set
operations on the names with their respective suffixes removed:

 >>> files = ["foo.mp3", "bar.mp3", "bar plot.txt", "baz plot.txt"]

Get the "stems" from the text files:

 >>> txt_only = {n[:-9] for n in files if n.endswith(" plot.txt")}
 >>> txt_only
{'bar', 'baz'}

The same for the audio files:

 >>> mp3_only = {n[:-4] for n in files if n.endswith(".mp3")}
 >>> mp3_only
{'bar', 'foo'}

Try turning the above into a function stems(files, suffix) that works 
for arbitrary suffixes.
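
One possible shape for such a function, as a minimal sketch rather than the
only answer: the name stems() and its signature come from the suggestion
above, while the slicing via len() is my own choice so that an empty suffix
does not wipe out the whole name.

```python
def stems(files, suffix):
    """Return the set of names in `files` ending with `suffix`, suffix removed."""
    # Slice by explicit length instead of n[:-len(suffix)]: the latter would
    # turn every name into "" when suffix is the empty string.
    return {n[:len(n) - len(suffix)] for n in files if n.endswith(suffix)}


files = ["foo.mp3", "bar.mp3", "bar plot.txt", "baz plot.txt"]
txt_stems = stems(files, " plot.txt")  # stems of the plot text files
mp3_stems = stems(files, ".mp3")       # stems of the audio files
```

Note that endswith() is case-sensitive, so a file called "foo.MP3" would be
ignored here; whether that matters depends on how your files are named.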

Now for the beauty of this approach. The stems of the mp3 files that lack
a text file are a plain set difference:

 >>> missing_txt = mp3_only - txt_only
 >>> missing_txt
{'foo'}

Swap the two sets to get the stems of the missing mp3 files.
Before reporting, you may want to add the suffixes back:

 >>> {n + " plot.txt" for n in missing_txt}
{'foo plot.txt'}
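
For completeness, the swapped direction might look like the sketch below. It
reuses the two example sets from above; the names missing_mp3 and
missing_mp3_names are just illustrative, and sorting is only there to make
the report order predictable.

```python
txt_only = {"bar", "baz"}   # stems taken from the " plot.txt" files
mp3_only = {"bar", "foo"}   # stems taken from the ".mp3" files

# Text files whose audio file is missing: operands swapped compared to
# the missing_txt case.
missing_mp3 = txt_only - mp3_only

# Put the suffix back and sort so the report is stable between runs.
missing_mp3_names = sorted(n + ".mp3" for n in missing_mp3)
```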

PS: Getting the stems of the complete pairs is just as easy:

 >>> txt_only & mp3_only
{'bar'}
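
Since you already have the os.walk() loop, here is a rough sketch of how the
pieces could be tied together. The function name report_unpaired() and the
shape of the tuples it yields are my own assumptions, not something from
your code; it also assumes the suffixes are exactly ".mp3" and " plot.txt".

```python
import os


def stems(files, suffix):
    """Return the set of names in `files` ending with `suffix`, suffix removed."""
    return {n[:len(n) - len(suffix)] for n in files if n.endswith(suffix)}


def report_unpaired(root):
    """Yield (dirpath, missing_txt, missing_mp3) for directories with unpaired files."""
    for dirpath, dirnames, filenames in os.walk(root):
        mp3 = stems(filenames, ".mp3")
        txt = stems(filenames, " plot.txt")
        missing_txt = mp3 - txt   # audio present, plot text absent
        missing_mp3 = txt - mp3   # plot text present, audio absent
        if missing_txt or missing_mp3:
            yield dirpath, sorted(missing_txt), sorted(missing_mp3)
```

You could then loop over report_unpaired("/path/to/your/collection") and
print each directory together with the two lists. Directories where every
file has its partner are skipped, as are directories containing neither kind
of file.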


