[Tutor] filename comparison
Peter Otten
__peter__ at web.de
Wed Jan 12 05:25:08 EST 2022
On 11/01/2022 11:34, mhysnm1964 at gmail.com wrote:
> All,
>
>
>
> Problem Description: I have over 8000 directories. In each directory there
> is a text file and an MP3 file. Below is the file naming structure for an MP3
> or text file:
>
>
>
> Other Mother plot.txt
>
> Other Mother.mp3
>
>
>
> What should occur:
>
> * Each directory should have both the above two files.
> * There can be multiple MP3 and text files in the same directory.
> * I want to find out which directories do not have a plot text file
> associated with the already existing mp3 file.
> * I want to find out which plot text file does not have an mp3 file.
>
>
>
> I have already managed to walk the directory structure using os.walk. But I
> am struggling with the best method of comparing the existing files.
>
>
>
> Does anyone have any ideas on how to approach this problem? I am completely
> stuck on how to resolve this.
Given the filenames in one directory, your task reduces to a few set
operations on the names without their respective suffixes:
>>> files = ["foo.mp3", "bar.mp3", "bar plot.txt", "baz plot.txt"]
Get the "stems" from the text files:
>>> txt_only = {n[:-9] for n in files if n.endswith(" plot.txt")}
>>> txt_only
{'bar', 'baz'}
The same for the audio files:
>>> mp3_only = {n[:-4] for n in files if n.endswith(".mp3")}
>>> mp3_only
{'bar', 'foo'}
Try turning the above into a function stems(files, suffix) that works
for arbitrary suffixes.
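One possible way to write that function is sketched below. The name
stems(files, suffix) comes from the suggestion above; the body is only an
illustration, and it assumes the suffix match is case-sensitive (so
"foo.MP3" would not be picked up).

```python
def stems(files, suffix):
    """Return the names in `files` that end with `suffix`, with the suffix removed."""
    # Slice by length instead of using a negative index so that an
    # empty suffix still returns the full name.
    return {name[:len(name) - len(suffix)] for name in files if name.endswith(suffix)}


files = ["foo.mp3", "bar.mp3", "bar plot.txt", "baz plot.txt"]
print(sorted(stems(files, " plot.txt")))  # ['bar', 'baz']
print(sorted(stems(files, ".mp3")))       # ['bar', 'foo']
```

The results are sorted only to make the printed order predictable; sets
themselves have no defined order.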
Now the beauty of this approach: getting the stems of the missing text
files is a single set difference:
>>> missing_txt = mp3_only - txt_only
>>> missing_txt
{'foo'}
Swap the two sets to get the stems of the missing mp3 files.
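Spelled out as a small script, the swapped difference looks like this. It
reuses the sample list and set names from above; missing_mp3 is a name
chosen here for illustration.

```python
files = ["foo.mp3", "bar.mp3", "bar plot.txt", "baz plot.txt"]

# Stems of the plot text files and of the audio files, as above.
txt_only = {n[:-9] for n in files if n.endswith(" plot.txt")}
mp3_only = {n[:-4] for n in files if n.endswith(".mp3")}

# Stems that have a plot text file but no mp3 file.
missing_mp3 = txt_only - mp3_only
print(missing_mp3)  # {'baz'}
```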
Before reporting you may want to add the suffixes:
>>> {n + " plot.txt" for n in missing_txt}
{'foo plot.txt'}
PS: Getting the stems of the complete pairs is just as easy:
>>> txt_only & mp3_only
{'bar'}
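To tie this back to os.walk, which you already have working, here is a
rough sketch that applies the same set operations to every directory. The
function name find_unpaired and its parameters are made up for this
example, and it assumes the suffixes are exactly ".mp3" and " plot.txt"
with matching case.

```python
import os


def find_unpaired(root, mp3_suffix=".mp3", txt_suffix=" plot.txt"):
    """Yield (directory, missing_txt, missing_mp3) for directories with unpaired files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        # Stems of the audio files and of the plot text files in this directory.
        mp3 = {n[:len(n) - len(mp3_suffix)] for n in filenames if n.endswith(mp3_suffix)}
        txt = {n[:len(n) - len(txt_suffix)] for n in filenames if n.endswith(txt_suffix)}

        # Put the suffixes back so the report shows the expected filenames.
        missing_txt = {stem + txt_suffix for stem in mp3 - txt}
        missing_mp3 = {stem + mp3_suffix for stem in txt - mp3}

        if missing_txt or missing_mp3:
            yield dirpath, missing_txt, missing_mp3
```

You could then loop over find_unpaired(top), where top is your top-level
directory, and print each tuple. Directories where every file has its
partner are skipped, and so are directories that contain neither kind of
file.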