
I have a set of files, all named with the convention file_[number]_[abcd].bin, where [number] is a number in the range 0 to the size of the drive in MB. That is, there are file_0_a.bin, file_0_b.bin, file_0_c.bin and file_0_d.bin, then the same four files for 1, and so on.

The number of files is determined at run time from the size of the partition. I need to delete all of the files that have been created, but in a pseudo-random order and in blocks of a size I can specify; e.g. where there are 1024 files, delete 512, then delete another 512.

I currently have the following function for doing it, which I call the required number of times, but as files are removed it becomes progressively less likely to pick one that still exists, to the point where it might never complete. Obviously, this is somewhat less than ideal.

What is another method that I can use to delete all of the files in a random order?

deleteRandFile() # $1 - total number of files
{
    i=$((RANDOM % $1))
    j=$((RANDOM % 4))   # 0-3; note RANDOM%3 would never select the _d file

    case $j in
    0)
        file="${dest_dir}/file_${i}_a.bin";;
    1)
        file="${dest_dir}/file_${i}_b.bin";;
    2)
        file="${dest_dir}/file_${i}_c.bin";;
    3)
        file="${dest_dir}/file_${i}_d.bin";;
    esac

    if ! [[ -f $file ]]; then
        deleteRandFile "$1"   # retry until we hit a file that still exists
    else
        rm -- "$file"
    fi

    return 0
}

Edit: I'm trying to delete in random order so that I can fragment the files as much as possible. This is part of a script that begins by filling a drive with 1MB files, deletes them 1024 at a time, then fills the 'gap' with one 1GB file. Rinse and repeat until you have some very fragmented 1GB files.
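For reference, a scaled-down sketch of one fill/delete pass of that cycle might look like the following. Everything here is a stand-in for illustration: the real script would derive `total` from the partition size, write 1MB files, and then fill the freed space with a 1GB file.

```shell
# Sketch of one fill/delete pass, scaled down so it runs anywhere.
# dest_dir, total and block are illustrative stand-ins.
dest_dir=$(mktemp -d)
total=8     # number of [number] slots (real script: partition size in MB)
block=4     # files to delete per pass

# 1. fill: four files per slot (a-d); tiny files stand in for the 1MB ones
for i in $(seq 0 $((total - 1))); do
    for s in a b c d; do
        printf 'x' > "$dest_dir/file_${i}_${s}.bin"
    done
done

# 2. delete $block files chosen at random (GNU shuf); safe here only
#    because the generated names contain no whitespace
ls "$dest_dir" | shuf -n "$block" | while read -r f; do
    rm -f "$dest_dir/$f"
done

# 3. the real script would now fill the freed space with one 1GB file
#    and repeat until the drive is thoroughly fragmented
```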

Yann
  • Maybe it would help if you can explain why it matters in what order you delete the files. – Nate Eldredge Oct 08 '14 at 21:25
  • @NateEldredge Sure, uh...I'm trying to fragment a hard drive and kill a file system. I have my reasons. – Yann Oct 08 '14 at 21:37
  • In zsh, you would use the *.bin(o+functionName) notation as in http://unix.stackexchange.com/a/9831 – ignis Oct 09 '14 at 07:06

2 Answers


If you want to delete all the files, then, on a GNU system, you could do:

cd -P -- "$destdir" &&
  printf '%s\0' * | # print the list of files as zero terminated records
    sort -Rz |      # random sort (shuffle) the zero terminated records
    xargs -r0 rm -f # pass the input if non-empty (-r) understood as 0-terminated
                    # records (-0) as arguments to rm -f

If you want to delete only a certain number of the files matching a regexp, you'd insert something like this between the sort and the xargs:

awk -v RS='\0' -v ORS='\0' -v n=1024 '/regexp/ {print; if (--n == 0) exit}'
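Putting the pieces together, a self-contained, scaled-down demo of the limited random delete might look like this (the directory, the file count, and n=8 are stand-ins; GNU sort, awk and xargs are assumed):

```shell
# Demo: create 16 dummy files, then randomly delete the first 8 shuffled
# names matching the pattern.  destdir and n=8 are illustrative stand-ins.
destdir=$(mktemp -d)
for i in 0 1 2 3; do
    for s in a b c d; do
        : > "$destdir/file_${i}_${s}.bin"
    done
done

cd -P -- "$destdir" &&
  printf '%s\0' * |   # NUL-terminated list of names
    sort -Rz |        # shuffle the NUL-terminated records
    awk -v RS='\0' -v ORS='\0' -v n=8 \
        '/file_[0-9]+_[a-d]\.bin/ { print; if (--n == 0) exit }' |
    xargs -r0 rm -f   # delete the first n shuffled matches
```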

With zsh, you could do:

shuffle() REPLY=$RANDOM
rm -f file_<->_[a-d].bin(.o+shuffle[1,1024])
  • I don't suppose you'd mind explaining a bit more fully? – Yann Oct 08 '14 at 10:41
  • I need to learn to type faster, or ignore the review queues to get an answer in ;-) – Anthon Oct 08 '14 at 10:42
  • This looks pretty good, do you mind going into a bit more detail about what the xargs -r0 flag does? – Yann Oct 08 '14 at 10:48
  • Thanks, the last thing is how would I need to alter this to make it not delete the lot at once, but delete, say, 1024 at a time? Would piping it through something to get the first 1024 entries that match a regex work? – Yann Oct 08 '14 at 10:55
  • Awesome, that's perfect. Barring something that does it in fewer commands, I'll accept this in a little bit. – Yann Oct 08 '14 at 11:10
  • I wound out modifying this slightly to become the command ls -l | sort -Rz | awk -v n=0 '/file_[0-9]+_[abcd].bin/ { if( ++n <= 1024) {print $9 ; }}' | xargs -r0 rm -f – Yann Oct 08 '14 at 12:34
  • I wonder why you prefer to use "printf" here, instead of a find . -maxdepth 1 -type f -print0 ? (Just reading, as I can not test (no gnu here... :( ), but printf here apparently only add 1 NUL to the entire list of files (all on 1 line, if none have newlines embedded)? Isn't it prone to errors if files have spaces/newlines etc in their pathname? Please explain why this works :) and does sort -Rz shuffle also inside each (here possibly only 1) line? – Olivier Dulac Oct 09 '14 at 05:40
  • @OlivierDulac, no. printf '%s\0' * formats each argument as %s\0 so is like find except that it excludes dot files, sorts the list, doesn't fork a process. And you can do printf '%s\0' file_*_[a-d].bin. find may be better if you only want regular files though you could also use zsh and its globbing qualifiers. – Stéphane Chazelas Oct 09 '14 at 07:48
  • @StéphaneChazelas: thanks a lot, quite handy trick indeed (if files are not too numerous to fit). I didn't knew printf '%s' * would "loop" over all files, i thought it would try to fit it all in a single %s. thx! – Olivier Dulac Oct 09 '14 at 12:48
  • @OlivierDulac, printf is built in most shells, so there's no limit on the number of arguments (other than the available memory). – Stéphane Chazelas Oct 09 '14 at 13:04

Here's a potential alternative using find and shuf:

$ find $destdir -type f | shuf | xargs rm -f

This will find all the files in $destdir and then use the shuf command to shuffle their order, and then pass the list on to xargs rm -f for deletion.

To gate how many files are deleted:

$ find $destdir -type f | shuf | head -X | xargs rm -f

Where -X is the number of files that you want to delete, for example, head -100.
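As a concrete sketch of that gated form, combined with a -name filter so unrelated files are left alone (the counts and names below are made up for the demo; shuf is from GNU coreutils):

```shell
# Demo: 12 files matching the pattern plus one unrelated file; randomly
# delete 5 matches.  This relies on the names containing no whitespace or
# quotes, since xargs splits its plain-text input on those.
destdir=$(mktemp -d)
for i in 0 1 2; do
    for s in a b c d; do
        : > "$destdir/file_${i}_${s}.bin"
    done
done
: > "$destdir/keep.txt"

find "$destdir" -type f -name 'file_*_[abcd].bin' | shuf | head -n 5 | xargs rm -f
```

Afterwards 7 of the 12 matching files remain, and keep.txt is untouched.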

slm
  • It's a good answer, but it doesn't let me specify how many files to delete. (also I don't have shuf on the system, but I didn't specify what I had, so that doesn't stop this being a good answer) – Yann Oct 08 '14 at 12:32
  • I'm confused, why do you need to specify a number to the delete function? – slm Oct 08 '14 at 12:35
  • I want to delete the first n files in the shuffled list. I'll edit the question to make that more clear. – Yann Oct 08 '14 at 12:36
  • @Yann4 - like that? – slm Oct 08 '14 at 12:38
  • Exactly like that. At the moment, the other answer wins out for me, as it doesn't involve re-compiling a kernel, but this is still a great answer, does exactly the job, +1. – Yann Oct 08 '14 at 12:40
  • Although, having said that, it doesn't let me match the file name to a regex, it'll just delete everything in the folder. But that could be altered by find -regex right? – Yann Oct 08 '14 at 12:44
  • @Yann4 - correct, this was to model an approach, you tune it however you need it. Remember it's just a list of files coming through the pipe, so you can insert a grep -E 'pat1|pat2' in the pipe chain too, to get which files you want. But it would be most efficient to do this on find. – slm Oct 08 '14 at 12:51
  • @Yann4 I'm confused but curious about that "doesn't involve re-compiling a kernel" - is that some kind of meme I missed? – Volker Siegel Oct 08 '14 at 17:12
  • @VolkerSiegel I'm working on an embedded environment, and while the various stuff runs, I had to recompile it to get bash and awk, as that counts as an extra. Although, I could roll with that being a meme – Yann Oct 08 '14 at 21:36
  • This answer is unsafe as written, at least in general. find outputs literal strings separated by newlines, and xargs reads a shell-quoted, whitespace-delimited list of names as input. A malicious name in the input can trick it into deleting something very different from what you intended to delete. – R.. GitHub STOP HELPING ICE Oct 08 '14 at 23:08
  • @R - look at the requirements for the format of filenames that the OP is using. This is perfectly safe given that! – slm Oct 08 '14 at 23:09
  • @slm, that is no excuse for writing incorrect code or not pointing out the limitations. – Stéphane Chazelas Oct 09 '14 at 07:52
  • @StéphaneChazelas - I disagree. If the requirements are stipulated that code only needs to deal with X then it should be perfectly fine for that code to deal with X. The OP even leads with this information as the first sentence. – slm Oct 09 '14 at 11:04