I have a text file containing the paths that I want to keep, like:

/mnt/cache/vfs/cf/A/file
/mnt/cache/vfs/cf/B/file2

I want to delete everything else under /mnt/cache/vfs/cf that is not listed in my text file.

So, for example, /mnt/cache/vfs/cf/Z/file3 would be deleted.

The text file is huge, and it contains filenames with spaces and possibly accents or other special characters.

Freedo

3 Answers

I would list the files in the filesystem, remove those that exist in your set of files to be kept, and delete the remainder.

Here I've used NUL-terminated filenames throughout, so that xargs cannot confuse a filename containing spaces with its space-separated parts:

# Dry run: print every file under the tree that is not in the keep list
find /mnt/cache/vfs/cf -type f -print0 |
    LC_ALL=C sort -z |
    LC_ALL=C comm -z -23 - <(LC_ALL=C sort list-of-files-to-keep.list | tr '\n' '\0') |
    xargs -0 -r printf '%s\n'

Replace printf '%s\n' with rm -- when you are ready to perform the deletions.
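To see why NUL-terminated names matter with xargs, here is a minimal sketch that counts the arguments a command receives for a single name containing a space. The name `my file.txt` is a made-up example, and `xargs -0` is assumed to be available (it is in GNU and BSD xargs).

```shell
# Hypothetical filename containing a space.
name='my file.txt'

# Newline-delimited input: xargs splits on whitespace, so the name becomes two arguments.
split_count=$(printf '%s\n' "$name" | xargs sh -c 'echo "$#"' sh)

# NUL-delimited input: the name arrives as a single argument.
nul_count=$(printf '%s\0' "$name" | xargs -0 sh -c 'echo "$#"' sh)

echo "newline input: $split_count argument(s); NUL input: $nul_count argument(s)"
```

With newline-delimited input, rm would be asked to delete `my` and `file.txt` rather than the intended file.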

The comm command takes two sorted files and compares them line by line. The first column of output holds entries found only in the first file, the second holds entries found only in the second file, and the third holds entries found in both. The -1, -2, and -3 options suppress the corresponding column, so comm -23 outputs only the lines that are present in the first file alone (-, i.e. stdin).
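The column behaviour can be tried on a tiny sample. This is only a sketch: the shortened /cf/... paths and the temporary file names are made up for illustration, and both inputs are written already sorted.

```shell
# Throwaway directory for the two sample lists.
tmpdir=$(mktemp -d)

# "found" stands in for the find output, "keep" for the list of files to keep.
printf '%s\n' /cf/A/file /cf/B/file2 /cf/Z/file3 > "$tmpdir/found"
printf '%s\n' /cf/A/file /cf/B/file2 > "$tmpdir/keep"

# -23 suppresses columns 2 and 3, leaving only lines unique to the first file.
to_delete=$(LC_ALL=C comm -23 "$tmpdir/found" "$tmpdir/keep")
echo "$to_delete"   # prints: /cf/Z/file3

rm -rf "$tmpdir"
```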

I've forced the locale to C so that sort and comm work consistently with each other (comm requires sorted input), and so that every line is sorted in a deterministic manner (some locales treat certain sets of characters as equal when sorting, so the characters in such a set may be ordered inconsistently).
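As a small illustration of the C locale's ordering, the sketch below sorts four sample letters. In the C locale the comparison is by byte value, so all uppercase ASCII letters come before all lowercase ones; other locales may interleave them, which is why sort and comm should both run under the same setting.

```shell
# Sort a mixed-case sample in the C locale, where comparison is by byte value.
sorted=$(printf '%s\n' b A a B | LC_ALL=C sort | paste -sd ' ' -)
echo "$sorted"   # prints: A B a b
```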

Chris Davies

What I would do:

shopt -s extglob
cd /mnt/cache/vfs/cf
{   printf 'rm !('; awk -F'/mnt/cache/vfs/cf' '{print $2}' file |
    paste -sd '|'
} | sed 's/$/)/'

When you are happy with the output, you can pipe the whole snippet into bash:

shopt -s extglob
cd /mnt/cache/vfs/cf
{   printf 'rm !('
    awk -F'/mnt/cache/vfs/cf' '{print $2}' file |
    paste -sd '|'
} | sed 's/$/)/' | 
    bash

See http://mywiki.wooledge.org/glob#extglob
and https://www.gnu.org/software/bash/manual/html_node/Pattern-Matching.html

I would use a simple one-liner like this:

for FILE in $(ls /mnt/cache/vfs/cf); do grep "$FILE" keep_files.txt || rm "/mnt/cache/vfs/cf/$FILE"; done

Depending on how many files you want to delete, though, I would recommend always moving them into a temporary directory first, so that you don't remove a file you need by accident :)
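The "move instead of delete" idea could be sketched roughly as follows. The function name `quarantine`, its argument order, and the demo tree are all hypothetical; the sketch only shows moving files into a holding directory while preserving their relative paths, so that names from different subdirectories cannot collide.

```shell
# quarantine SRC_ROOT DEST_ROOT PATH...
# Move each PATH (which must live under SRC_ROOT) to the same relative
# location under DEST_ROOT instead of deleting it.
quarantine() {
    src_root=$1
    dest_root=$2
    shift 2
    for path in "$@"; do
        rel=${path#"$src_root"/}                       # path relative to SRC_ROOT
        mkdir -p -- "$dest_root/$(dirname -- "$rel")"  # recreate the parent directory
        mv -- "$path" "$dest_root/$rel"
    done
}

# Demo on a throwaway tree (made-up names, including one with a space).
root=$(mktemp -d)
holding=$(mktemp -d)
mkdir -p "$root/Z"
touch "$root/Z/file 3"

quarantine "$root" "$holding" "$root/Z/file 3"
ls "$holding/Z"
```

Once the holding directory has been checked, it can be removed in one step; until then nothing has been lost.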

Bog
  • The code is poor, sorry. (1) Bash pitfall number one. (2) No quotes. (3) Expanded $FILE you're giving to grep may look like a regular expression that matches more than you expect. Besides, ls will give you basenames, not full pathnames. – Kamil Maciorowski Oct 17 '23 at 08:18
  • @KamilMaciorowski Oh wow, thanks for the website recommendation. And yeah I know, I still have a lot to learn how to write Bash-Scripts without any flaws. Thats why I said that he should move the files into a directory before deleting them. But yeah I can understand that you find this oneliner poorly written^^ – Bog Oct 17 '23 at 08:25
  • This doesn't work for me. I only match the files I want to keep :( I have tried everything already and nothing works :( – Freedo Oct 18 '23 at 08:25
  • @Freedo you haven't tried "everything", but you might have tried everything you can think of. There are (so far) two other answers here to consider – Chris Davies Oct 19 '23 at 11:18
  • Some simple improvements: 1. use fgrep to prevent interpreting as regex 2. use bash wildcard expansion. This did what I want: for folder in *; do fgrep "$folder" /path/to/keep.txt || rm -rv "$folder"; done As always, echo the results BEFORE you run rm!! – Gargravarr Nov 12 '23 at 15:44