
For example, is there a way to run

cp -R folder1 /Volume/Data

or

some_cp folder1 peter@192.168.1.87/some/path/folder1

and have it copy 15GB of files and then sleep for 20 or 30 seconds? That is, whenever the command has copied 15GB, it would sleep for 20 or 30 seconds, or any specified amount of time, so that the hard drive can rest and does not get too stressed out.

Alternatively, instead of going by the amount of data, could it copy files for 3 minutes, then rest for 1 minute, and so on?

nonopolarity
  • rsync has the --bwlimit option to limit by bandwidth. There's also trickle, but again IIRC that's bandwidth based – Chris Davies Jun 01 '20 at 12:50
  • so both rsync and trickle can limit it somehow by bandwidth? – nonopolarity Jun 01 '20 at 13:29
  • Yes, that's right. You would have to work out an appropriate bandwidth to keep your hard disk happy. I can give you an example (of both) as an answer but they would only be examples. – Chris Davies Jun 01 '20 at 13:41
  • You are trying to solve a problem that does not exist generally. Hard drives don't need to rest and don't get "stressed out". – Pedro Jun 01 '20 at 14:30
  • Hard drives can get overheated (and in such cases they need a fairly long time to rest). But the best solution to that problem is to improve the cooling of the drive or to replace it with a more energy efficient drive. The interface to external drives can get overloaded. I have some very bad experiences from old 32-bit systems, but time to flush the buffers and to rest is probably good also when writing a lot to a USB pendrive in a 64-bit operating system. – sudodus Jun 01 '20 at 18:27

2 Answers


Shellscript

I made a bash shellscript that iterates until everything is copied. When there are big files, it is important to reuse what was already copied in the previous iteration, and it is nice to 'see' the progress of the copy process.

#!/bin/bash

########################################################################

function usage {

 echo "
Usage:   $0 source-dir/ target-dir  # copies content of source-dir
         $0 source-dir  target-dir  # copies source-dir (with subdirs)
"
 exit
}
########################################################################

# main

########################################################################

if [ $# -ne 2 ]
then
 usage
fi
if ! test -d "$1"
then
 echo "$1 is not a directory"
 if test -f "$1"
 then
  echo "but $1 is a file :-)"
 else
  echo "and $1 is not a file :-("
  usage
 fi
fi
if ! test -d "${2##*:}"  # allowing network directories
then
 echo "$2 is not a directory :-("
 usage
else
 echo "$2 is a directory :-)"
fi

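while true
do
 echo "copying ..."
 timeout --foreground 25 rsync --info=progress2 --partial -Ha "$1" "$2"
 status=$?
 if [ "$status" -eq 0 ]
 then
  break  # rsync finished by itself: everything is copied
 elif [ "$status" -ne 124 ]
 then
  # 124 means that timeout interrupted rsync; anything else is a real error
  echo "rsync failed with exit status $status" >&2
  exit "$status"
 fi
 echo "flushing the buffers ..."
 sync
 echo "sleeping for 5 seconds ..."
 sleep 5
done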
echo "final flushing of buffers ..."
sync
echo "Done :-)"

Usage

When you make the shellscript executable and run it without any parameters, you get the following help message:

Usage:   ./rsyncer-w-pause source-dir/ target-dir  # copies content of source-dir
         ./rsyncer-w-pause source-dir  target-dir  # copies source-dir (with subdirs)

I tested the shellscript by copying some ISO files with Linux distros from a slow USB pendrive to my hard disk drive. This way there were several iterations, and the copying was interrupted in the middle of the ISO files, but the part already copied could be reused by the next iteration. So I have checked that it also works for copying big files.

To be used for real copying, I think you should increase the time to copy (from 25 seconds), and you should also increase the time to sleep (from 5 seconds). Use the time intervals that are best for your particular task.
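
One way to make those two intervals easy to change is to wrap the loop in a function that takes them as arguments. The following is only a sketch: run_with_pauses and its parameter names are made up for illustration, and it assumes GNU timeout, which reports exit status 124 when it had to interrupt the command. It stops as soon as the command finishes on its own, successfully or not.

```shell
#!/bin/bash
# Run a resumable command in slices: RUN_SECS of work, then REST_SECS of rest.
# Hypothetical helper for illustration; it is not part of the script above.
run_with_pauses() {
  local run_secs=$1 rest_secs=$2 status
  shift 2
  while true; do
    timeout --foreground "$run_secs" "$@"
    status=$?
    # 124 means timeout cut the command short; any other status ends the loop.
    [ "$status" -eq 124 ] || return "$status"
    sync                  # flush the write buffers before resting
    sleep "$rest_secs"
  done
}

# Example: copy for 3 minutes, then rest for 1 minute, until rsync is done.
# run_with_pauses 180 60 rsync --info=progress2 --partial -Ha source-dir/ target-dir
```

The command must be able to pick up where it left off, which is what rsync with --partial does; a plain cp would start over in every slice.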

Comments about the command line options and parameters

The command timeout stops the command it controls when the time limit is reached, even if that is in the middle of copying a file.

   --foreground
          when not running timeout directly from a shell prompt,
          allow COMMAND to read from the TTY and get TTY signals; in  this
          mode, children of COMMAND will not be timed out
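
To see how a script can tell an interrupted run from a finished one, here is a small experiment. It assumes GNU coreutils timeout; sleep and true merely stand in for a slow and a fast copy.

```shell
#!/bin/bash
# timeout ends the command after 1 second and reports exit status 124.
timeout --foreground 1 sleep 5
interrupted=$?

# A command that finishes in time passes its own exit status through.
timeout --foreground 5 true
finished=$?

echo "interrupted: $interrupted, finished: $finished"
# prints: interrupted: 124, finished: 0
```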

The command rsync is a powerful copying command. See man rsync to get a complete description of the possible options.

   --info=FLAGS
          This option lets you have fine-grained control over the informa‐
          tion output you want to see.  An individual  flag  name  may  be
          followed  by a level number, with 0 meaning to silence that out‐
          put, 1 being  the  default  output  level,  and  higher  numbers
          increasing  the  output  of  that  flag  (for those that support
          higher levels).  Use --info=help to see all the  available  flag
          names,  what they output, and what flag names are added for each
          increase in the verbose level.  Some examples:

              rsync -a --info=progress2 src/ dest/
              rsync -avv --info=stats2,misc1,flist0 src/ dest/

   --partial
          By default, rsync will delete any partially transferred file  if
          the  transfer  is  interrupted. In some circumstances it is more
          desirable to keep partially transferred files. Using the  --par‐
          tial  option  tells  rsync to keep the partial file which should
          make a subsequent transfer of the rest of the file much faster.

You may or may not like the --hard-links option,

   -H, --hard-links
          This tells rsync to look for hard-linked files in the source and
          link together the corresponding files on the destination.  With‐
          out  this option, hard-linked files in the source are treated as
          though they were separate files.

   -a, --archive
          This  is equivalent to -rlptgoD. It is a quick way of saying you
          want recursion and want to preserve almost everything  (with  -H
          being  a  notable  omission).   The  only exception to the above
          equivalence is when --files-from is specified, in which case  -r
          is not implied.

          Note that -a does not preserve hardlinks, because finding multi‐
          ply-linked files is expensive.  You must separately specify -H.

Finally, read this explanation about the parameters (source and target),

          rsync -avz foo:src/bar /data/tmp

   This would recursively transfer all files from the directory src/bar on
   the  machine foo into the /data/tmp/bar directory on the local machine.
   The files are transferred in "archive" mode, which  ensures  that  sym‐
   bolic  links,  devices,  attributes,  permissions, ownerships, etc. are
   preserved in the transfer.  Additionally, compression will be  used  to
   reduce the size of data portions of the transfer.

          rsync -avz foo:src/bar/ /data/tmp

   A  trailing slash on the source changes this behavior to avoid creating
   an additional directory level at the destination.  You can think  of  a
   trailing / on a source as meaning "copy the contents of this directory"
   as opposed to "copy the directory by  name",  but  in  both  cases  the
   attributes  of the containing directory are transferred to the contain‐
   ing directory on the destination.  In other words, each of the  follow‐
   ing  commands copies the files in the same way, including their setting
   of the attributes of /dest/foo:

          rsync -av /src/foo /dest
          rsync -av /src/foo/ /dest/foo
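
The effect of the trailing slash is easy to check in a scratch directory. This sketch assumes that rsync is installed; the directory and file names under the temporary directory are made up for the test.

```shell
#!/bin/bash
# Scratch directories, so that nothing real is touched.
work=$(mktemp -d)
mkdir -p "$work/src/foo" "$work/dest1" "$work/dest2"
echo "hello" > "$work/src/foo/file.txt"

# No trailing slash: the directory foo itself is created inside dest1.
rsync -a "$work/src/foo" "$work/dest1"

# Trailing slash: only the content of foo is copied into dest2.
rsync -a "$work/src/foo/" "$work/dest2"

# List what arrived where.
find "$work/dest1" "$work/dest2" -type f
```

The first copy ends up as dest1/foo/file.txt and the second as dest2/file.txt.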
sudodus

You can get the PID of the copy process, pause it, and resume it later.

You can also use rsync and stop it whenever you want; when you start it again, it will copy only the remaining files.

But I suggest using rsync and pausing the rsync process.

Assuming the rsync PID is 1234,

to pause:
kill -s STOP 1234

to continue:
kill -s CONT 1234
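
The two kill commands can be combined into a loop that gives the "copy for 3 minutes, rest for 1 minute" pattern from the question. This is a rough sketch and has not been tried on a large copy: duty_cycle is a hypothetical helper name, and for copies over the network long pauses may run into the timeouts mentioned in the comments below.

```shell
#!/bin/bash
# Alternate between letting a process run and pausing it, until it exits.
# Usage: duty_cycle PID RUN_SECONDS REST_SECONDS   (hypothetical helper)
duty_cycle() {
  local pid=$1 run_secs=$2 rest_secs=$3
  # kill -0 sends no signal; it only checks that the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    sleep "$run_secs"                          # let the process work
    kill -s STOP "$pid" 2>/dev/null || break   # it finished in the meantime
    sleep "$rest_secs"                         # let the drive rest
    kill -s CONT "$pid" 2>/dev/null || break
  done
}

# Example: start rsync in the background, then run 3 minutes, rest 1 minute.
# rsync -a source-dir/ target-dir &
# duty_cycle "$!" 180 60
```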
binarysta
  • interesting... does this have the granularity of, if the rsync has a 8MB buffer and trying to fill up the buffer, the kill can actually stop at the machine code level when the buffer is 12% filled (say, even below the UNIX read() system call level)? – nonopolarity Jun 01 '20 at 14:34
  • @nonopolarity actually I don't know this, but you also can use nice to change the priority of the process manually. And as --bwlimit mentioned before it both limits the size of the blocks that rsync writes, and tries to keep the average transfer rate (over socket) at the requested limit. – binarysta Jun 01 '20 at 14:49
  • I'm not convinced that stopping an I/O bound process like this isn't going to create problems. This could be a solution but I would expect a confirmation that this has been tried and tested. – xenoid Jun 01 '20 at 15:54
  • @xenoid that's why I suggest to pause rsync and not simple copy in order to avoid problems. BTW, seems this guy has tested with 25TB https://networklessons.com/uncategorized/pause-linux-process-with-sigstop-sigcont – binarysta Jun 01 '20 at 15:56
  • @xenoid I've done this kind of thing when I've discovered two or three simultaneous rsync processes are thrashing my (spinning) disks. It works provided you don't hit the network timeouts that will cause rsync to lose connection to the remote. – Chris Davies Jun 01 '20 at 19:10