I need to remove files on Linux that have identical sizes but not identical content, which is why fdupes is not an option.

I tried the following command, but it did not remove all files with identical sizes, and I have no idea why:

last=-1; find . -type f -name '*.png' -printf '%f\0' | sort -nz | while read -d '' i; do s=$(stat -c '%s' "$i"); [[ $s = $last ]] && rm "$i"; last=$s; done

Any ideas? What did I do wrong?

EDIT: I made a mistake in the initial post. I need to keep one file from the given size, for example:

1.png    # 23,5 Kb
2.png    # 24,6 Kb
4.png    # 24,6 Kb > remove
8.png    # 24,6 Kb > remove
16.png   # 23,5 Kb

Basically I want to remove duplicates, but by size only, not by checksum.

Charles

1 Answer

Since you seem to be on a GNU system, you could do something like:

(export LC_ALL=C
find . -name '*.png' -type f -printf '%20s %p\0' |
  sort -z |
  uniq -zuDw20 |
  cut -zb22- |
  xargs -r0 echo rm -f --
)

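find prints, for each file, its size right-aligned in a 20-character field, then a space and the file path. Once sort has placed entries of equal size next to each other, uniq -zuDw20 compares only the first 20 bytes of each entry and reports all but the last entry of every group that shares them. cut -zb22- then strips the size field, so only the paths reach xargs.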

Remove the echo once the printed rm command lists the files you expect.
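To see which entries the uniq step selects, the pipeline above can be tried in a scratch directory first. The following is only a sketch: the directory comes from mktemp and the four sample file names and contents are made up for illustration. It assumes GNU find, sort, uniq and cut, and replaces the final xargs step with tr so the selection is printed one path per line.

```shell
#!/bin/bash
# Scratch directory so that nothing outside it is touched.
demo=$(mktemp -d)
cd "$demo" || exit 1

# Hypothetical sample files: a.png, b.png and c.png are 5 bytes each but
# differ in content; d.png is 3 bytes.
printf 'aaaaa' > a.png
printf 'bbbbb' > b.png
printf 'ccccc' > c.png
printf 'ddd'   > d.png

# Same pipeline as in the answer, except that tr turns the NUL-delimited
# list into lines instead of handing it to rm.
selected=$(
  export LC_ALL=C
  find . -name '*.png' -type f -printf '%20s %p\0' |
    sort -z |
    uniq -zuDw20 |
    cut -zb22- |
    tr '\0' '\n'
)
printf '%s\n' "$selected"
```

With these sample files the selection is ./a.png and ./b.png: c.png sorts last among the three 5-byte files and is therefore the one kept, and d.png has a size of its own, so it is never listed.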

Among the things you did wrong:

  • read -d '' i should be IFS= read -rd '' i; otherwise leading and trailing whitespace is stripped from the file names and backslashes are treated as escape characters. See Understanding "IFS= read -r line"
  • %f is only the file name, not its full path, so that will only work for files in the current directory.
  • you're comparing the size of each file with the size of the previous one, but the list is sorted by name, not by size, so files of the same size are not necessarily consecutive in it.
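For comparison, here is a sketch of how the loop from the question could look with those three points addressed. The function name remove_same_size and the sample files are hypothetical, and keeping the first file of each size in sort order is an assumption, since the question does not say which duplicate should survive. It relies on bash and on GNU find and sort, and it only prints the rm commands.

```shell
#!/bin/bash
# Hypothetical helper: print an rm command for every *.png file under the
# directory "$1" whose size equals that of the entry sorted just before it.
remove_same_size() {
  local last=-1 entry size path
  # Emit "size<TAB>full path" records, NUL-terminated, then sort them
  # numerically by size so that equal sizes become consecutive.
  find "$1" -type f -name '*.png' -printf '%s\t%p\0' |
    LC_ALL=C sort -z -n |
    while IFS= read -rd '' entry; do
      size=${entry%%$'\t'*}   # text before the first tab
      path=${entry#*$'\t'}    # text after the first tab
      if [[ $size = "$last" ]]; then
        echo rm -f -- "$path" # drop the echo to really delete
      fi
      last=$size
    done
}

# Demo in a scratch directory with made-up files.
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'aaaaa' > "$demo/a.png"
printf 'bbbbb' > "$demo/b.png"
printf 'ccccc' > "$demo/sub/c.png"
printf 'ddd'   > "$demo/d.png"

result=$(remove_same_size "$demo")
printf '%s\n' "$result"
```

Unlike the uniq pipeline, which keeps the last entry of each group, this loop keeps the first one, so in the demo a.png survives while b.png and sub/c.png are listed for removal.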
  • Thank you! I have made a mistake and I need to keep one file from every size found. Basically I want to remove duplicates, but not by checksum and by size only. – Charles Nov 27 '19 at 14:10
  • @Charles, yes, that's what I understood and what my solution should be doing. How do you determine which of the dups to keep? – Stéphane Chazelas Nov 27 '19 at 14:14
  • First of all, thank you, Stéphane! You are my hero, your solution is so elegant and works perfectly. I needed to remove duplicate screenshots and unfortunately metadata were different, so file sizes were identical but checksums did not match. Your solution worked great. – Charles Nov 27 '19 at 14:28
  • @Charles, you should accept the answer then, so that other users with a similar problem will know that is the right one. Just click on the upward arrow at the beginning of the answer. – Eduardo Trápani Nov 27 '19 at 14:37
  • @EduardoTrápani, the "upward arrow" is to "upvote" which the OP can't do at the moment as they don't have enough reputation. To accept it's the "tick" shaped button which is to say: "that's the best answer and works for me". – Stéphane Chazelas Nov 27 '19 at 14:42
  • Sorry @StéphaneChazelas , yes, that's it. – Eduardo Trápani Nov 27 '19 at 14:53