5

I am trying to copy the five largest files from a certain directory to my pwd. Using cp specific/directory/$(ls -S specific/directory | head -n 5) ./ copies the first file and then produces "cannot stat" errors for the rest of the files in the list.

Why is the pipe working for the first item and failing for the rest?

Sudoh
  • 257

4 Answers

6

Note: all my solutions handle files only, as requested, and work with any file name
(even ones containing special characters).

If you want to use ls -S

do it the right way:

ls --zero -S | head -z -n5 | xargs -r0 cp -t ./other/dir --

Requires recent GNU coreutils.

coreutils 9.1-1 here.

Another way, using bash and recent GNU find:

findutils 4.9.0-4 here.

Based on this:

shopt -s nullglob
cd specific/directory/ || exit
print0 () { 
    [ "$#" -eq 0 ] || printf '%s\0' "$@"
}
readarray -td '' files < <(
    print0 * |
    find -files0-from - -maxdepth 0 -type f -printf '%b\t%p\0' |
    sort -rzn |
    cut -z -f2-    # keep everything after the first tab, so names containing tabs survive
) 
cp -av -- "${files[@]:0:5}" "$OLDPWD"/
  • ${files[@]:0:5} expands to at most 5 elements of the files array, starting at index 0 (that is, the five largest files).
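
To see what the slice does in isolation, here is a small bash sketch. The array contents are made-up placeholders, not real files:

```bash
# Hypothetical array standing in for the size-sorted file list.
files=(huge big medium small tiny)

# Offset 0, length 3: the first three elements.
first_three=("${files[@]:0:3}")
printf '%s\n' "${first_three[@]}"
# huge
# big
# medium

# Asking for more elements than exist is not an error; you get what is there.
short=(only two)
slice=("${short[@]:0:5}")
echo "${#slice[@]}"   # 2
```

This is why the cp line above keeps working when the directory holds fewer than five files.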

For older tools, via Perl in any shell:

perl -MFile::Copy -e '@f = sort { -s $b <=> -s $a } grep { -f $_ } <*>; copy($_, "./other/dir/$_") for splice(@f, 0, 5)'
6

Using zsh you can avoid all the pitfalls associated with parsing and sorting the output of ls:

cp -n -- specific/directory/*(.DOL[1,5]) ./

or with GNU cp (for the -t option):

cp -n -t ./ -- specific/directory/*(.DOL[1,5])

where the glob qualifiers are

  • . matches plain (regular) files only (not directories, symlinks, FIFOs, or sockets)
  • D enables the GLOB_DOTS option for this pattern, so hidden files are included; omit it if you want to exclude them
  • OL[1,5] orders the results by file length (size), largest first, and selects the first 5

and the -n option prevents cp from clobbering existing files in the case of a name collision.

steeldriver
  • 81,074
2

EDIT: New answer, works more completely:

The original fails because the directory name is prepended only to the first result; the remaining results are looked up in the current directory, where they do not exist, hence the "no such file" errors.

A way that works without find is to take advantage of the -F option to ls, which appends a trailing character indicating the type of each entry. The following is an incomplete answer that removes directories from the listing via grep; a more complete answer would remove other file types that should be excluded. The sed command removes the * that -F appends to executables.

source="<some directory name>"
destination='.'
someCount=5 # e.g.
while IFS= read -r; do
    cp -- "${source}/${REPLY}" "${destination}"
done <<<"$(ls -FS -- "${source}" | grep -v '/$' | head -n "${someCount}" | sed 's/\*$//')"
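
As a quick illustration of the suffixes that -F appends, here is a sketch run in a throwaway directory. The names subdir, plain.txt and script.sh are made up for the demo:

```sh
tmp=$(mktemp -d) || exit
mkdir "$tmp/subdir"
: > "$tmp/plain.txt"
: > "$tmp/script.sh" && chmod +x "$tmp/script.sh"

listing=$(ls -F "$tmp")
printf '%s\n' "$listing"
# plain.txt
# script.sh*
# subdir/

# Same filtering as above: drop directories, then strip the "*" marker.
filtered=$(printf '%s\n' "$listing" | grep -v '/$' | sed 's/\*$//')
printf '%s\n' "$filtered"
# plain.txt
# script.sh
```

Other suffixes, such as @ for symlinks, | for FIFOs and = for sockets, are not handled by this filter, which is why the answer calls itself incomplete.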

ORIGINAL ANSWER:

Assume the largest files are one, two, three, and four. The command in the question ends up being

cp specific/directory/one two three four .

Since two, three, and four do not exist in ., the command fails. Something akin to

source=specific/directory
set -f # disable globbing
IFS='
'      # split on newlines only
for file in $(ls -S "$source" | head -n 5); do
   cp "${source}/${file}" .
done

would do it.

WARNING: This will break if there are any newlines in any of the file names (or if your ls mangles filenames even when not printing to the terminal).
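
A minimal sketch of that failure mode, assuming an ls that prints names literally when its output is a pipe (which GNU ls does by default). The file name is made up:

```sh
tmp=$(mktemp -d) || exit
nl='
'
: > "$tmp/one${nl}file"      # a single file whose name contains a newline

lines=$(ls "$tmp" | wc -l)
echo "$((lines))"            # 2, although the directory holds one file
```

Any loop that splits this listing on newlines would try to copy two files, "one" and "file", neither of which exists.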

  • 1
    Oh, do you mean my accidentally omitting the "head" command? Oops. That should be $(ls $source | head -${someCount}), provided someCount is set. – Peter Whittaker May 01 '23 at 18:09
  • 1
    Please avoid parsing ls output, and use more quotes! http://mywiki.wooledge.org/Quotes. Moreover, you include directories when the OP asked for files only – Gilles Quénot May 01 '23 at 18:10
  • 1
    Fair point re folders. Starting to look more like a job for find, re the other answer. – Peter Whittaker May 01 '23 at 18:15
  • It occurs to me that there is another solution that doesn't require find, and that takes advantage of the -F flag to ls. It's convoluted, but it's an interesting hack: while IFS=\ read -r; do cp "${source}/${REPLY}" .; done <<<"$(ls "${source}" -FS | xargs -0 | grep -v '/$' | head -${someCount} | sed 's/\*$//')"

    FWIW.

    – Peter Whittaker May 01 '23 at 19:02
  • 2
    ls | xargs -0 is a non sense. You should use ls --zero – Gilles Quénot May 01 '23 at 19:11
  • Interesting. What platform are you on? AlmaLinux 9.1, RHEL 7.9, and current Mac OS all give variations of $ ls --zero ls: unrecognized option '--zero' – Peter Whittaker May 01 '23 at 19:17
  • So don't use a version of ls that doesn't support --zero. I use ls (GNU coreutils) 9.1 – Gilles Quénot May 01 '23 at 19:23
  • So xargs -0 isn’t a “non sense”, then, at least not for those who may not be in a position to pull in a more recent coreutils. – Peter Whittaker May 01 '23 at 19:32
  • Fixed that up a bit. With recent versions of GNU ls one might want to check if the output is still quoted when not printing to a terminal (I can't remember). – ilkkachu May 01 '23 at 21:31
  • 3
    xargs -0 isn't nonsense as such, not at all. But ls | xargs -0 is a bit silly, since ls by default doesn't print any NULs. ls --zero plus some helpers would be a very good answer, actually. (also remember you can [edit] the answer) – ilkkachu May 01 '23 at 21:34
  • You are absolutely correct. I mistakenly added that in preserving spaces, but of course it is useless for that and completely unnecessary in this case. – Peter Whittaker May 01 '23 at 22:11
2

To integrate the other answers:


TL; DR: See below for viable solutions for bash and POSIX shells.


Why is the pipe working for the first item and failing for the rest?

Because the shell does not behave as your command assumes it does.

The $(ls -S | head) Command Substitution is indeed replaced by its output, and is indeed pasted immediately adjacent to the right of the cp specific/directory snippet, but:

  1. because you haven't double-quoted it (which is itself wrong more often than not), the Command Substitution's output undergoes Word Splitting according to the IFS variable. By default IFS is set to <space>, <tab> and <newline>, and a <newline> happens to be what ls -S | head uses to separate the file names, so each name ends up as a separate, independent argument to your cp command. Note that in this case double-quoting the Command Substitution would not help, as you have probably found out already.
  2. the shell does not duplicate the specific/directory/ snippet for each of the names either (that would be the job of a Brace Expansion, but it would be tricky to get right in this case). Hence only the first of the now-separate names gets the directory prefix and is therefore reachable by cp, while the other 4 names are expected to be present in the current directory, which obviously they aren't (and even if they were, cp would have complained about them being the same files as the ones in the destination directory ./).
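
The two points above can be observed directly by swapping cp for printf, which prints each argument it receives on its own line. This is only a sketch in a throwaway directory; demo/dir and the file names f1 to f3 are made up:

```sh
tmp=$(mktemp -d) && cd "$tmp" || exit
mkdir -p demo/dir
for i in 1 2 3; do head -c "$((i * 100))" /dev/zero > "demo/dir/f$i"; done

# Same shape as the command in the question, with printf instead of cp.
args=$(printf '<%s>\n' demo/dir/$(ls -S demo/dir | head -n 3))
printf '%s\n' "$args"
# <demo/dir/f3>
# <f2>
# <f1>
```

Only the first argument carries the directory prefix; the others are bare names that cp would look up in the current directory.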

Could it be made to "work"? In principle yes, but it would be fragile: it falls apart as soon as one of the file names contains one of the characters specified in the IFS variable. Even worse, if combined with an uncontrolled eval it could be used for the most classic of command injections if you don't have full control of the filenames in specific/directory. (Plus, see note 1 below).


Possible solutions for bash and POSIX shells

Besides the ls --zero solution available with GNU coreutils v9.0 onwards, as mentioned in other answers, the operation can also be done safely (see note 1 below) with GNU ls from coreutils v8.25 (circa 2016) onwards, which provides the --quoting-style variants for shells. For this we need to use eval, as this is the only way of benefiting from that ls option, which is designed to work with eval.

As usual, eval needs to be handled with extra care, if used at all. Here we use it for the output of the ls command only, relying on ls to quote filenames correctly for the shell as per its documented behavior. For additional care one may invoke, for instance, /bin/ls, i.e. the explicit full path to an ls executable that provides the wanted --quoting-style option, rather than risk running whichever ls happens to be first in $PATH, or some rogue exported function (or even alias) purposefully named ls.
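
To make the round trip concrete, here is a small sketch of how such quoted output can be read back with eval. It assumes a GNU ls that supports --quoting-style=shell-escape-always, and the two file names are made up:

```sh
tmp=$(mktemp -d) && cd "$tmp" || exit
: > 'two words'
: > "it's"

# One quoted name per line, joined onto a single line so that eval
# sees one command rather than several.
quoted=$(ls --quoting-style=shell-escape-always | tr '\n' ' ')
printf '%s\n' "$quoted"

eval "set -- $quoted"
count=$# first=$1 second=$2
printf '%s\n' "$count" "$first" "$second"
# 2
# it's
# two words
```

The positional parameters end up holding the original names, spaces and quote characters included.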

So, with bash:

(
  set -o pipefail \
    && o="$(/bin/ls -S --quoting-style=shell-escape-always -- specific/directory \
            | head -n 5 | tr '\n' ' ')" \
    && eval "set -- $o" \
    && (("$#")) && cp -n -- "${@/#/specific/directory/}" .
)

You can easily change the number of the first n-files by changing the head -n 5.

Note that in the snippet above I've added extra safety and error checks, but pragmatically the whole thing can be trimmed down to the essential commands, if you are absolutely positive about your ls version and it having no real reason to fail or output stray characters.

(cd specific/directory && \
 eval "cp -n -- $(ls -S --quoting-style=shell-escape-always | head -n 5 | tr '\n' ' ')"' "$OLDPWD"')

An equivalent of the above solution for POSIX shells can also work safely (see note 1 below), although it is not ideal because it needs to load into memory the entire list of files printed by ls. As we cannot trim that list before it reaches the shell, the source directory must not contain so many files as to exhaust the available memory, or else the shell will die before running the cp command:

(
  set -- && cd specific/directory \
    && o="$(/bin/ls -rSxw 0 --quoting-style=shell-always)" && eval "set -- $o" \
    && [ "$#" -gt 0 ] && n="$(($# - 5))" && shift "$(($n > 0 ? $n : 0))" \
    && cp -n -- "$@" "$OLDPWD"
)

Here you change the number of the first n-files by changing the $(($# - 5)) bit.
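
The shift arithmetic can be tried on its own. In this sketch the positional parameters are placeholder names, assumed to be sorted smallest first as ls -rS would list them:

```sh
set -- f1 f2 f3 f4 f5 f6 f7    # stand-ins for names sorted smallest first
n=$(($# - 5))                  # how many names to discard from the front
shift "$((n > 0 ? n : 0))"     # never shift by a negative amount
kept="$*"
echo "$kept"                   # f3 f4 f5 f6 f7

set -- f1 f2 f3                # fewer than 5 names: nothing is discarded
n=$(($# - 5))
shift "$((n > 0 ? n : 0))"
kept_few="$*"
echo "$kept_few"               # f1 f2 f3
```

The clamp to 0 is what lets the full version cope with directories holding fewer than five files.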

Just as with the bash version, this one can be trimmed down a bit too, as long as you are again positive about the required pre-conditions. In addition to the requirements of the trimmed-down bash version, this one also needs at least n files to be present in the source directory, or else the shift command will fail and make the shell abort prematurely (e.g. if you have fewer than 5 files in specific/directory, this trimmed-down version won't copy them).

(
  set -- && cd specific/directory \
    && eval "set -- $(ls -rSxw 0 --quoting-style=shell-always)" \
    && shift "$(($# - 5))" && cp -n -- "$@" "$OLDPWD"
)

1 NOTE: for simplicity of explanation, the solutions above do not check that the files are actually regular files (i.e. not directories, symlinks, sockets, named FIFOs or device files). Therefore, if your source directory does have "files" of those kinds among the n largest (even ones that effectively hold 0 bytes), the solutions above will include those names in the final cp command. This is particularly relevant for symlinks and directories, whose size is always greater than 0 and depends on their contents, and which may hence rank higher than regular files in ls -S. Naturally we could loop over the filenames to test their types and discard the non-regular ones, but it would get increasingly complex, especially when replacing the discarded ones with the next in rank. Please see the other answers to handle these cases sanely, as my solutions here already stretch quite a bit what bash and POSIX shells are capable of.
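
For illustration only, the loop mentioned in the note could look roughly like the sketch below. It assumes a GNU ls that supports --quoting-style=shell-always and -w 0, loads the whole listing into memory like the POSIX solution above, and uses a hypothetical helper name, copy_largest_regular, that does not come from any of the answers:

```sh
# Hypothetical helper: copy the COUNT largest regular files from SRC to DEST.
# Variables are global because POSIX sh has no "local".
copy_largest_regular() {
  src=$1 dest=$2 count=${3:-5} kept=0
  # Largest first, all names on one line, quoted for re-reading by the shell.
  list=$(ls -S -xw 0 --quoting-style=shell-always -- "$src") || return
  eval "set -- $list"
  # Walk the original names; rebuild "$@" with the ones we keep.
  for name in "$@"; do
    shift
    # -f follows symlinks, so exclude them explicitly with -L.
    if [ "$kept" -lt "$count" ] && [ -f "$src/$name" ] && [ ! -L "$src/$name" ]; then
      set -- "$@" "$src/$name"
      kept=$((kept + 1))
    fi
  done
  [ "$kept" -gt 0 ] && cp -n -- "$@" "$dest"
}
```

A call such as copy_largest_regular specific/directory . 5 would then skip directories and symlinks and fill their places with the next regular files in size order.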

LL3
  • 5,418