
While exploring options for merging two folders, I came across a very powerful tool called rmlint. It has useful documentation (including a Gentle Guide).

I described my scenario in an earlier question, which received some great answers:

How to merge two folders so as to remove identical files from one, while keeping checksummed differentials?

I was leaning towards the rdfind answer, but as I was researching it a bit I stumbled upon rmlint and found the developer's discussion on duplicate isolation to be quite elucidating.

While reviewing all of this, I found a couple of interesting arguments: --merge-directories and --honour-dir-layout

I thus tried an incantation as follows:

rmlint -T "bi,bl,df,dd" --progress --merge-directories --honour-dir-layout A B

Unfortunately, given how large my data set is, the saved script I'm supposed to execute is enormous, and so far I haven't been able to isolate a smaller, manageable subset to test on and gain some confidence before running it for real. I looked for a way to do a trial run that prints what it would do, rather than having to read through the script and work out the resulting actions myself, but I can't find such an option (maybe I'm just bleary-eyed and overlooking it?).

I therefore thought I should ask here:

Has anyone had success merging duplicate data sets with rmlint? If so, what arguments would you suggest for merging two folders so that the goals of my earlier question are reasonably met?

To briefly restate: the ultimate goal is to get everything that is unique to B into A, and to delete everything in B that is already present in A. Anything with a data contents conflict between A and B (i.e., the same file with differing contents) should be left in both places for manual comparison, so that these files are relatively easy to find in B after execution.
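For reference, the goal above can be pinned down as a plain-shell sketch. This is not rmlint or rdfind: it pairs files by their relative path under A and B and compares contents with cmp, so unlike rmlint's hashing it will not notice identical contents stored under different names. The function name merge_dirs and the DRY_RUN variable are hypothetical; with DRY_RUN=1 it only prints the planned actions, which is the kind of trial run described above.

```shell
#!/bin/sh
# Plain-shell sketch of the stated goal, NOT rmlint: files are matched by relative path.
# Assumptions: POSIX sh with find/cmp/mv/rm; file names contain no newlines.
# merge_dirs is a hypothetical helper; set DRY_RUN=1 to only print the planned actions.
set -eu

merge_dirs() {  # usage: merge_dirs A B
  a=$1 b=$2
  (cd "$b" && find . -type f) | while IFS= read -r rel; do
    rel=${rel#./}
    if [ ! -e "$a/$rel" ]; then
      # Unique to B: move it to the same relative location in A.
      printf 'move      %s -> %s\n' "$b/$rel" "$a/$rel"
      if [ "${DRY_RUN:-0}" != 1 ]; then
        mkdir -p "$(dirname "$a/$rel")"
        mv "$b/$rel" "$a/$rel"
      fi
    elif cmp -s "$a/$rel" "$b/$rel"; then
      # Identical contents already present in A: remove the copy from B.
      printf 'delete    %s\n' "$b/$rel"
      [ "${DRY_RUN:-0}" = 1 ] || rm -- "$b/$rel"
    else
      # Same path, different contents: leave both copies for manual comparison.
      printf 'conflict  %s\n' "$b/$rel"
    fi
  done
}
```

Running `DRY_RUN=1 merge_dirs A B` first and reading its output before running `merge_dirs A B` keeps the real run reviewable. Anything reported as a conflict stays in B, so after the real run B contains only conflicts (plus any directories that were emptied, which this sketch does not remove).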


Indeed, rmlint seems better suited to the task than rdfind. I like that it outputs a shell script, which you can examine to verify that it doesn't propose anything you didn't intend.

For your use case, I was drawn to the section of the manual that talks about Flagging original directories, since you clearly have an "original" directory and a "duplicate" one.

This example looks like you could use it as a starting point:

# Find all files on /media/portable that can be safely deleted:
$ rmlint --keep-all-tagged --must-match-tagged /media/portable // ~

Note that your original directory comes after the //, which I found a bit surprising (by default the tool seems to prefer keeping the file from the earlier argument). In your case that would be: rmlint --keep-all-tagged --must-match-tagged B // A.
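Since the question mentions difficulty carving a small test subset out of the real data, one option is to try the suggested command on a throwaway fixture first. The sketch below builds a tiny A/B pair in a temporary directory (the file names are made up), runs the invocation above with only the shell-script output enabled (-o sh:rmlint.sh), and prints the generated script without executing it. It assumes rmlint is on PATH and skips that step otherwise. The generated script documents its own options via ./rmlint.sh -h; the versions I'm aware of include a -n switch that only prints what would be done, but verify that on your installed version before relying on it.

```shell
#!/bin/sh
# Sketch: inspect what rmlint would propose on a tiny throwaway copy before the real run.
# Assumptions: POSIX sh and mktemp; rmlint is optional here (skipped if not installed).
set -u

work=$(mktemp -d)
mkdir -p "$work/A/docs" "$work/B/docs"
printf 'same\n' > "$work/A/docs/same.txt"
printf 'same\n' > "$work/B/docs/same.txt"      # expected to be listed as a removable duplicate
printf 'old\n'  > "$work/A/docs/conflict.txt"
printf 'new\n'  > "$work/B/docs/conflict.txt"  # same name, different contents
printf 'only\n' > "$work/B/docs/unique.txt"    # exists only in B

if command -v rmlint > /dev/null 2>&1; then
  cd "$work" || exit 1
  # A (after //) is tagged as the original; only write the script, do not run it.
  rmlint --keep-all-tagged --must-match-tagged B // A -o sh:rmlint.sh
  # Review the proposed actions before executing anything.
  cat rmlint.sh
else
  echo "rmlint not installed; fixture left in $work" >&2
fi
```

Because the script is only generated and printed, every fixture file is still in place afterwards, so the same fixture can be reused while experimenting with other flags such as --merge-directories or --honour-dir-layout.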

(Note: I don't have personal experience with rmlint, I'm just going by the documentation)