
I am trying to create a loop to sort the files within each of many directories by size, then copy the largest two to another location, keeping the directory structure (shown below).

folder/sample 1  
       .../s1.fastq.gz  
       .../s2.fastq.gz  
       .../s3.fastq.gz  
       .../s4.fastq.gz  
folder/sample 2  
       .../s1.fastq.gz  
       .../s2.fastq.gz  
       .../s3.fastq.gz  
       .../s4.fastq.gz  

I'm new to Linux, so I'm struggling. I tried:

#!/bin/bash
mkdir newfolder

for dir in folder/*
do
echo $dir
ls -S $dir/*.gz | head -n +2 | cp -T newfolder

done

However, I get the following error.

cp: missing destination file operand after 'newfolder'

How do I correctly feed the largest files into cp?

I've also tried using xargs, but I get the error

xargs: invalid option -- 'w'

because I am not correctly feeding in one line at a time.

3 Answers


zsh would be a much better choice of shell for that than bash:

#! /bin/zsh -
ret=0
for dir (folder/*(/)) {
  two_largest_files=($dir/*.gz(N.OL[1,2]))
  if (($#two_largest_files)) {
    mkdir -p newfolder/$dir:t &&
      cp -v $two_largest_files newfolder/$dir:t/ || ret=$?
  }
}
exit $ret

(Note that -v, for verbose output, is not supported by all cp implementations; replace it with (set -x; cp $two...) if yours doesn't support it.)
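For reference, the glob qualifiers doing the work here are N (nullglob: expand to nothing rather than raise an error when nothing matches), . (regular files only), OL (order by file length, i.e. size, descending) and [1,2] (keep only the first two matches). A quick demo in a scratch directory (file names and sizes are made up):

```shell
zsh -c '
  cd "$(mktemp -d)"
  head -c 300 /dev/zero > big.gz     # 300 bytes
  head -c 200 /dev/zero > mid.gz     # 200 bytes
  head -c 100 /dev/zero > small.gz   # 100 bytes
  print -rl -- *.gz(N.OL[1,2])       # the two largest: big.gz, mid.gz
'
```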


There are two issues with your code:

  1. Never, ever try to parse the output of ls; use stat instead.
  2. When there are many files, or filenames contain "funny" characters (such as the space in "sample 1"), use find and xargs. Refer to man find and man xargs for further info.
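Point 1 is easy to demonstrate: word-splitting the output of ls mangles names containing spaces, while a NUL-delimited find keeps them intact. A throwaway demo in a temporary directory (the file names are made up):

```shell
cd "$(mktemp -d)"                       # scratch directory
touch 'sample 1.gz' 'sample 2.gz'       # names with spaces, as in the question

for f in $(ls); do printf 'ls gave: %s\n' "$f"; done
# word-splits into four tokens: "sample", "1.gz", "sample", "2.gz"

find . -type f -print0 | xargs -0 -n1 printf 'find gave: %s\n'
# two intact file names, spaces and all
```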

Do something like:

mkdir newdir

find . -type f -name '*.gz' -print0 |
  xargs -0 -r stat --printf='%s:%n\0' |
  sort -rnz |
  head -zn 2 |
  cut -zd: -f2- |
  xargs -0 -r cp -t newdir

Warning! Untested code (I'm on my phone). Replace the last line with

xargs -0 -r echo cp -t newdir

until it works.

For the curious, see https://mywiki.wooledge.org/ParsingLs
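The pipeline above copies the overall two largest files into one flat directory. If you also want the per-sample layout from the question, the same idea fits in a loop; here is a bash sketch assuming GNU find, sort, head and cut, where "folder" and "newfolder" are the names from the question and find's -printf stands in for stat:

```shell
#!/bin/bash
for dir in folder/*/; do                 # each sample directory
  name=$(basename "$dir")
  mkdir -p "newfolder/$name"
  # Emit "SIZE<TAB>PATH" records, NUL-terminated end to end:
  find "$dir" -maxdepth 1 -type f -name '*.gz' -printf '%s\t%p\0' |
    sort -znr  |                         # biggest size first, numerically
    head -zn 2 |                         # keep the two largest
    cut -zf2-  |                         # drop the size column
    xargs -0 -r cp -t "newfolder/$name"
done
```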


This is quite complex. First, you shouldn't parse the output of ls: filenames can contain newlines (and other whitespace), so things get messy. It's best to use NUL as the record delimiter throughout the pipeline. This is an example:

for dir in folder/*
do
    echo "$dir"
    find "$dir" -type f -exec du -h0 {} + | sort -hrz | head -zn 2 |
        sed -z 's/^[^[:space:]]*[[:space:]]*//' | xargs -0I@ cp -v @ newfolder
done
  1. find finds the files in the given "$dir" (note the quotes). It also applies du to all of them to get their sizes.
  2. sort sorts the results by size.
  3. head limits to top 2.
  4. sed gets rid of the size value before the file name.
  5. xargs builds the actual command using the arguments from the pipeline.

A NUL delimiter normally has to be requested in every command, hence the z flags in sort, head and sed, and the 0 in du and xargs; du's -0 is what terminates each record with a NUL in the first place.
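As a quick illustration of the NUL-delimited stages, here is the sort/head part fed with fake du-style records (sizes and names below are made up; cut is shown as an alternative to the sed step):

```shell
printf '%s\t%s\0' 1.2M big.gz 900K small.gz 8.0K tiny.gz |
  sort -hrz |          # largest human-readable size first
  head -zn 2 |         # keep the top two records
  cut -zf2- |          # drop the size column (tab-delimited)
  xargs -0 -n1 echo
# prints big.gz, then small.gz
```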

(I don't know why you use cp's -T flag, which treats the destination as a regular file rather than a directory. My example drops it and uses -v instead, to give feedback.)