
I have a set of files, all named with the convention file_[number]_[abcd].bin, where [number] is a number in the range 0 to the size of the drive in MB. That is, there are file_0_a.bin, file_0_b.bin, file_0_c.bin and file_0_d.bin, then the same four files for 1, and so on.

The number of files is determined at run time from the size of the partition. I need to delete all of the files that have been created, but in a pseudo-random order and in blocks of a size I can specify; e.g. where there are 1024 files, delete 512, then delete another 512.

I currently have the following function for doing it, which I call the required number of times, but as files are removed it becomes progressively less likely to pick one that still exists, to the point where it might never complete. Obviously, this is somewhat less than ideal.

What is another method that I can use to delete all of the files in a random order?

deleteRandFile() # $1 - total number of files
{
    i=$((RANDOM % $1))
    j=$((RANDOM % 4))   # 0-3; note RANDOM%3 would never select the _d file

    case $j in
    0)
        file="${dest_dir}/file_${i}_a.bin";;
    1)
        file="${dest_dir}/file_${i}_b.bin";;
    2)
        file="${dest_dir}/file_${i}_c.bin";;
    3)
        file="${dest_dir}/file_${i}_d.bin";;
    esac

    if ! [[ -f $file ]]; then
        deleteRandFile "$1"   # retry until we hit a file that still exists
    else
        rm -- "$file"
    fi

    return 0
}

Edit: I'm trying to delete in random order so that I can fragment the files as much as possible. This is part of a script that begins by filling a drive with 1MB files, deletes them 1024 at a time, then fills the 'gap' with one 1GB file. Rinse and repeat until you have some very fragmented 1GB files.
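For reference, a scaled-down sketch of one fill/delete pass of that cycle might look like the following. Everything here is a stand-in for illustration: the real script would derive `total` from the partition size, write 1MB files, and then fill the freed space with a 1GB file.

```shell
# Sketch of one fill/delete pass, scaled down so it runs anywhere.
# dest_dir, total and block are illustrative stand-ins.
dest_dir=$(mktemp -d)
total=8     # number of [number] slots (real script: partition size in MB)
block=4     # files to delete per pass

# 1. fill: four files per slot (a-d); tiny files stand in for the 1MB ones
for i in $(seq 0 $((total - 1))); do
    for s in a b c d; do
        printf 'x' > "$dest_dir/file_${i}_${s}.bin"
    done
done

# 2. delete $block files chosen at random (GNU shuf); safe here only
#    because the generated names contain no whitespace
ls "$dest_dir" | shuf -n "$block" | while read -r f; do
    rm -f "$dest_dir/$f"
done

# 3. the real script would now fill the freed space with one 1GB file
#    and repeat until the drive is thoroughly fragmented
```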

Yann
  • Maybe it would help if you can explain why it matters in what order you delete the files. – Nate Eldredge Oct 08 '14 at 21:25
  • @NateEldredge Sure, uh...I'm trying to fragment a hard drive and kill a file system. I have my reasons. – Yann Oct 08 '14 at 21:37
  • In zsh, you would use the *.bin(o+functionName) notation as in http://unix.stackexchange.com/a/9831 – ignis Oct 09 '14 at 07:06

2 Answers


If you want to delete all the files, then, on a GNU system, you could do:

cd -P -- "$destdir" &&
  printf '%s\0' * | # print the list of files as zero terminated records
    sort -Rz |      # random sort (shuffle) the zero terminated records
    xargs -r0 rm -f # pass the input if non-empty (-r) understood as 0-terminated
                    # records (-0) as arguments to rm -f

If you want to delete only a certain number of the files matching a regexp, you'd insert something like this between the sort and the xargs:

awk -v RS='\0' -v ORS='\0' -v n=1024 '/regexp/ {print; if (--n == 0) exit}'
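Putting the pieces together, a self-contained, scaled-down demo of the limited random delete might look like this (the directory, the file count, and n=8 are stand-ins; GNU sort, awk and xargs are assumed):

```shell
# Demo: create 16 dummy files, then randomly delete the first 8 shuffled
# names matching the pattern.  destdir and n=8 are illustrative stand-ins.
destdir=$(mktemp -d)
for i in 0 1 2 3; do
    for s in a b c d; do
        : > "$destdir/file_${i}_${s}.bin"
    done
done

cd -P -- "$destdir" &&
  printf '%s\0' * |   # NUL-terminated list of names
    sort -Rz |        # shuffle the NUL-terminated records
    awk -v RS='\0' -v ORS='\0' -v n=8 \
        '/file_[0-9]+_[a-d]\.bin/ { print; if (--n == 0) exit }' |
    xargs -r0 rm -f   # delete the first n shuffled matches
```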

With zsh, you could do:

shuffle() REPLY=$RANDOM
rm -f file_<->_[a-d].bin(.o+shuffle[1,1024])
  • I don't suppose you'd mind explaining a bit more fully? – Yann Oct 08 '14 at 10:41
  • I need to learn to type faster, or ignore the review queues to get an answer in ;-) – Anthon Oct 08 '14 at 10:42
  • This looks pretty good, do you mind going into a bit more detail about what the xargs -r0 flag does? – Yann Oct 08 '14 at 10:48
  • Thanks, the last thing is how would I need to alter this to make it not delete the lot at once, but delete, say, 1024 at a time? Would piping it through something to get the first 1024 entries that match a regex work? – Yann Oct 08 '14 at 10:55
  • Awesome, that's perfect. Barring something that does it in fewer commands, I'll accept this in a little bit. – Yann Oct 08 '14 at 11:10
  • I wound out modifying this slightly to become the command ls -l | sort -Rz | awk -v n=0 '/file_[0-9]+_[abcd].bin/ { if( ++n <= 1024) {print $9 ; }}' | xargs -r0 rm -f – Yann Oct 08 '14 at 12:34
  • I wonder why you prefer to use "printf" here, instead of a find . -maxdepth 1 -type f -print0 ? (Just reading, as I can not test (no gnu here... :( ), but printf here apparently only add 1 NUL to the entire list of files (all on 1 line, if none have newlines embedded)? Isn't it prone to errors if files have spaces/newlines etc in their pathname? Please explain why this works :) and does sort -Rz shuffle also inside each (here possibly only 1) line? – Olivier Dulac Oct 09 '14 at 05:40
  • @OlivierDulac, no. printf '%s\0' * formats each argument as %s\0 so is like find except that it excludes dot files, sorts the list, doesn't fork a process. And you can do printf '%s\0' file_*_[a-d].bin. find may be better if you only want regular files though you could also use zsh and its globbing qualifiers. – Stéphane Chazelas Oct 09 '14 at 07:48
  • @StéphaneChazelas: thanks a lot, quite handy trick indeed (if files are not too numerous to fit). I didn't knew printf '%s' * would "loop" over all files, i thought it would try to fit it all in a single %s. thx! – Olivier Dulac Oct 09 '14 at 12:48
  • @OlivierDulac, printf is built in most shells, so there's no limit on the number of arguments (other than the available memory). – Stéphane Chazelas Oct 09 '14 at 13:04

Here's a potential alternative using find and shuf:

$ find $destdir -type f | shuf | xargs rm -f

This will find all the files in $destdir and then use the shuf command to shuffle their order, and then pass the list on to xargs rm -f for deletion.

To gate how many files are deleted:

$ find $destdir -type f | shuf | head -X | xargs rm -f

Where -X is the number of files that you want to delete, for example, head -100.
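As a concrete sketch of that gated form, combined with a -name filter so unrelated files are left alone (the counts and names below are made up for the demo; shuf is from GNU coreutils):

```shell
# Demo: 12 files matching the pattern plus one unrelated file; randomly
# delete 5 matches.  This relies on the names containing no whitespace or
# quotes, since xargs splits its plain-text input on those.
destdir=$(mktemp -d)
for i in 0 1 2; do
    for s in a b c d; do
        : > "$destdir/file_${i}_${s}.bin"
    done
done
: > "$destdir/keep.txt"

find "$destdir" -type f -name 'file_*_[abcd].bin' | shuf | head -n 5 | xargs rm -f
```

Afterwards 7 of the 12 matching files remain, and keep.txt is untouched.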

slm
  • It's a good answer, but it doesn't let me specify how many files to delete. (also I don't have shuf on the system, but I didn't specify what I had, so that doesn't stop this being a good answer) – Yann Oct 08 '14 at 12:32
  • I'm confused, why do you need to specify a number to the delete function? – slm Oct 08 '14 at 12:35
  • I want to delete the first n files in the shuffled list. I'll edit the question to make that more clear. – Yann Oct 08 '14 at 12:36
  • @Yann4 - like that? – slm Oct 08 '14 at 12:38
  • Exactly like that. At the moment, the other answer wins out for me, as it doesn't involve re-compiling a kernel, but this is still a great answer, does exactly the job, +1. – Yann Oct 08 '14 at 12:40
  • Although, having said that, it doesn't let me match the file name to a regex, it'll just delete everything in the folder. But that could be altered by find -regex right? – Yann Oct 08 '14 at 12:44
  • @Yann4 - correct, this was to model an approach, you tune it however you need it. Remember it's just a list of files coming through the pipe, so you can insert a grep -E 'pat1|pat2' in the pipe chain too, to get which files you want. But it would be most efficient to do this on find. – slm Oct 08 '14 at 12:51
  • @Yann4 I'm confused but curious about that "doesn't involve re-compiling a kernel" - is that some kind of meme I missed? – Volker Siegel Oct 08 '14 at 17:12
  • @VolkerSiegel I'm working on an embedded environment, and while the various stuff runs, I had to recompile it to get bash and awk, as that counts as an extra. Although, I could roll with that being a meme – Yann Oct 08 '14 at 21:36
  • This answer is unsafe as written, at least in general. find outputs literal strings separated by newlines, and xargs reads a shell-quoted, whitespace-delimited list of names as input. A malicious name in the input can trick it into deleting something very different from what you intended to delete. – R.. GitHub STOP HELPING ICE Oct 08 '14 at 23:08
  • @R - look at the requirements for the format of filenames that the OP is using. This is perfectly safe given that! – slm Oct 08 '14 at 23:09
  • @slm, that is no excuse for writing incorrect code or not pointing out the limitations. – Stéphane Chazelas Oct 09 '14 at 07:52
  • @StéphaneChazelas - I disagree. If the requirements are stipulated that code only needs to deal with X then it should be perfectly fine for that code to deal with X. The OP even leads with this information as the first sentence. – slm Oct 09 '14 at 11:04