
I am merging two movie libraries and am looking to "de-duplicate" manually via bash scripting.

Here is my thought process so far:

  • Find all files with same name regardless of extension
  • Delete smaller file (I have storage for days! and prefer quality!)

I may want to build on this later, so it would help if the delete step were kept separate. My thought is that I could use ffmpeg to inspect the videos and pick the better one, but I'm guessing bigger size = best option, and that is simpler to code.

I posted on Software Recs but didn't get what I wanted, so I realized bash is my best bet. My "find" knowledge is limited, though, and most of the answers I am finding are way too complicated. I figure this should be a simple thing.

Eg: Find files with same name but different content?

  • I think I am overcomplicating it. I'd prefer "Regardless of Extension" and I bet somebody will answer that way, but it would likely be much easier to use a list of possible extensions, which I do know, e.g. (MKV/MP4/M4V/AVI/MPEG) – FreeSoftwareServers Apr 13 '18 at 03:52
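
The extension-list idea from the comment above could be sketched roughly like this. It assumes a find that supports -iname (GNU and BSD find both do); the function name find_videos is made up for illustration, and the extension list is the one given in the comment.

```bash
# find_videos is a hypothetical helper name, not part of any tool.
# Lists regular files whose extension is one of the known video types,
# matching case-insensitively (-iname), under the given directory.
find_videos() {
    local dir="${1:-.}"
    find "$dir" -type f \( \
        -iname '*.mkv' -o -iname '*.mp4' -o -iname '*.m4v' \
        -o -iname '*.avi' -o -iname '*.mpeg' \)
}
```

Calling find_videos with no argument searches the current directory. Subtitle files such as .srt never match, so they drop out without any extra filtering.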

2 Answers


This is a nice way I wrote to just find the repeating files ignoring extension:

find . -exec bash -c 'basename "$0" ".${0##*.}"' {} \; | sort | uniq --repeated

Then I wrapped it in this loop to find the smaller of the two files for each:

for i in $(find . -exec bash -c 'basename "$0" ".${0##*.}"' {} \; | sort | uniq --repeated); do find . -name "$i*" -printf '%s %p\n' | sort -n | head -1 | cut -d ' ' -f 2-; done

Finally one more loop to (interactively, with rm -i so there's a prompt before every one) delete all those files:

for j in $(for i in $(find . -exec bash -c 'basename "$0" ".${0##*.}"' {} \; | sort | uniq --repeated); do find . -name "$i*" -printf '%s %p\n' | sort -n | head -1 | cut -d ' ' -f 2-; done); do rm -i "$j"; done

As this involves doing two finds on your directory, surely there is a better way, but this should work for simple cases. Note that the for loops split the command output on whitespace, so file names containing spaces will not be handled correctly. It also assumes you're working from the current directory; to run it on a different one, just change the . argument in both find commands.
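
For what it's worth, here is a rough single-pass sketch of the same idea that also copes with spaces in file names. It is not the loop above, just one possible alternative. It assumes GNU find (for -printf) and file names without tabs or newlines. The function name smaller_duplicates is made up for illustration.

```bash
# smaller_duplicates is a hypothetical helper name.
# For every group of files sharing a base name (directory and last
# extension ignored), print the path of the smallest file in the group.
smaller_duplicates() {
    local dir="${1:-.}"
    # Emit "size<TAB>path" per regular file, smallest first.
    find "$dir" -type f -printf '%s\t%p\n' | sort -n |
    awk -F'\t' '
        {
            stem = $2
            sub(/^.*\//, "", stem)      # drop the directory part
            sub(/\.[^.]*$/, "", stem)   # drop the last extension
            count[stem]++
            # Input is sorted by size, so the first file seen is the smallest.
            if (count[stem] == 1) smallest[stem] = $2
        }
        END {
            for (s in count) if (count[s] > 1) print smallest[s]
        }' | sort
}
```

The function only prints candidates, so the delete step stays separate: the list can be saved to a file and reviewed before anything is removed.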

habs
  • 433

This is how I ended up doing it, because I needed to exclude .srt files.

Find Files with same name but different extension:

ls * | sed 's/.\{4\}$//' | sort | uniq -d

Note: This depends on the extension being exactly 4 characters including the dot, e.g. .XYZ. It wouldn't work for .mpeg, but all my movies are mp4/mkv/m4v.

Ignore .srt Files:

ls * | awk '!/\.srt$/' | sed 's/.\{4\}$//' | sort | uniq -d
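
If the 4-character assumption ever becomes a problem, a rough alternative is to strip whatever follows the last dot with shell parameter expansion instead of sed. This is only a sketch: the function name list_repeated_stems is made up, it looks at one directory only (no recursion), and it skips only lowercase .srt.

```bash
# list_repeated_stems is a hypothetical helper name.
# Prints each base name that occurs more than once in the directory,
# ignoring the extension (of any length) and skipping .srt subtitles.
list_repeated_stems() {
    local dir="${1:-.}" f name
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # regular files only
        name="${f##*/}"                  # strip the directory part
        case "$name" in
            *.srt) continue ;;           # ignore subtitle files
        esac
        printf '%s\n' "${name%.*}"       # strip the last extension
    done | sort | uniq -d
}
```

With Film.mkv, Film.mpeg and Film.srt in the same directory, this reports Film once, since the .mpeg extension is handled the same way as the 3-letter ones.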

I posted a separate thread on comparing file sizes. Harry's answer does a great job, but I realized I had .srt issues to mitigate.

Compare two file sizes and delete smaller file
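
For the size comparison itself, a minimal sketch (not taken from the linked thread) might look like the following. The function name smaller_of is made up, and on a tie it simply reports the first file.

```bash
# smaller_of is a hypothetical helper name.
# Prints the path of the smaller of two files, measured in bytes.
# If both are the same size, the first argument is printed.
smaller_of() {
    local a="$1" b="$2" size_a size_b
    size_a=$(wc -c < "$a") || return 1
    size_b=$(wc -c < "$b") || return 1
    if [ "$size_a" -le "$size_b" ]; then
        printf '%s\n' "$a"
    else
        printf '%s\n' "$b"
    fi
}
```

Because it only prints a path, the delete stays a separate, interactive step, for example rm -i -- "$(smaller_of Movie.mkv Movie.mp4)".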