1

I have a script I run regularly using cron. I would like to get notified by email when these scripts fail. I do not wish to be notified every time they run and produce any output at all.

As such, I am using the script Cronic to run my jobs inside cron, which should mean only error output gets sent, and not just any output.

However, in one script I have a bash function to detect if a directory is empty:

# function to check if directory is empty outputs 0 if empty, or
# some positive integer if it does not
is_dir_empty () 
{
  ls -A "$1" | wc -c
}

This function is used in an if statement like this:

if [ "$(is_dir_empty "${local_backup_location}/${folder_to_backup_basename}")" -eq 0 ]; then
    # save space by removing diffs older than 1 month
    rdiff-backup --remove-older-than 1M --force ${local_backup_location}/${folder_to_backup_basename} || logger -t ${logger_name}  "No existing data backup"
fi

occasionally I get an email like this from cronic

Cronic detected failure or error output for the command:
/usr/local/sbin/backup_uploads

RESULT CODE: 0

ERROR OUTPUT: ls -A /mnt/storage-2/data_upload_backup/upload

STANDARD OUTPUT: <snip>

TRACE-ERROR OUTPUT:

  • folder_to_backup=/mnt/storage-1/sftp_data/sftpuser1/upload
  • backup_server=ed-mh-x86001.mydomain.tld
  • backup_folder=data_upload_backup
  • remote_backup_storage_root=/mnt/storage-2
  • local_backup_storage_root=/mnt/storage-2
  • logger_name=backup_uploads
  • error_report_dir=/root/rdiff-backup-errors
  • email_receiver=sysadmin@mydomain.tld
  • problems=0
  • lockfilename=/root/.backup_uploads_lockfile.pid
  • test -e /root/.backup_uploads_lockfile.pid
  • echo
  • basename /mnt/storage-1/sftp_data/sftpuser1/upload
  • folder_to_backup_basename=upload
  • hostname
  • remote_server_backup_folder=/mnt/storage-2/data_upload_backup_ed-mh-pi01
  • mkdir -p /root/rdiff-backup-errors
  • logger -t backup_uploads Starting backup for folder /mnt/storage-1/sftp_data/sftpuser1/upload
  • test -d /mnt/storage-2
  • local_backup_location=/mnt/storage-2/data_upload_backup
  • mkdir -p /mnt/storage-2/data_upload_backup
  • logger -t backup_uploads Backing up to /mnt/storage-2/data_upload_backup
  • mkdir -p /root/rdiff-backup-errors
  • test -d /mnt/storage-2/data_upload_backup/upload
  • is_dir_empty /mnt/storage-2/data_upload_backup/upload
    • wc -c

ls -A /mnt/storage-2/data_upload_backup/upload

  • [ 26 -eq 0 ]
  • logger -t backup_uploads Starting backup for /mnt/storage-1/sftp_data/sftpuser1/upload
  • hostname
  • rdiff-backup --ssh-no-compression /mnt/storage-1/sftp_data/sftpuser1/upload /mnt/storage-2/data_upload_backup/upload/
  • testssh root@ed-mh-x86001.mydomain.tld
  • remote_backup_machine_accessible=0
  • [ 0 -eq 0 ]
  • remote_backup_location=root@ed-mh-x86001.mydomain.tld:/mnt/storage-2/data_upload_backup_ed-mh-pi01
  • logger -t backup_uploads copying local backup up to root@ed-mh-x86001.mydomain.tld:/mnt/storage-2/data_upload_backup_ed-mh-pi01
  • rsync -aP --delete-after /mnt/storage-2/data_upload_backup/ root@ed-mh-x86001.mydomain.tld:/mnt/storage-2/data_upload_backup_ed-mh-pi01
  • rm -f /root/.backup_uploads_lockfile.pid
  • logger -t backup_uploads Lock file /root/.backup_uploads_lockfile.pid deleted, end of script

I do not get this every time the script is run, just occasionally, so perhaps when the directory is actually empty or something like this.

So my question is why am I getting error output from this command, and how to prevent it (at least in the obvious context where i don't want it here)?

This question is very similar to another I asked myself here, but I assure you it is different!

crobar
  • 223

3 Answers3

2

That's down to how that cronic captures the error output of your command and how it tries to separate out the xtrace output from that.

It looks for lines starting with one or more + in there.

That's a crude heuristic. If some error message starts with + it will be classified as trace output. If a xtrace line is interleaved with an error message line or another xtrace line for another command started in parallel (like in your case), then the + may not be at the start the line and it will be classified as xtrace output. If the script uses a different PS4, that will not work properly, etc.

For bash scripts, to separate the error output from the xtrace output, you should use BASH_XTRACEFD instead:

BASH_XTRACEFD=7 bash -x /path/to/your-script > out 2> err 7> trace
1

Similar issue to the one in the linked question. See this part in the output:

+ + wc -c
ls -A /mnt/storage-2/data_upload_backup/upload

When cronic parses the output to pick up the strace lines, it sees the line without a + in front, and thinks that's part of regular error output.

The output there is messed up, it should be just showing the normal xtrace output for those two commands, i.e.:

+ ls -A /mnt/storage-2/data_upload_backup/upload
+ wc -c

What has to happen here, is that Dash doesn't bother to buffer the whole output line before calling write() to send it to stderr, but instead does it piece-by-piece. And since there's two shell processes doing that, one for each part of the pipeline, and they run at the same time, the outputs can get mixed up. I can repeat that on a loaded system with not too much trouble.

For that result, it's enough for it to write the + first, and then the command, but it seems to be worse, I got outputs like this, too:

+ is_dir_empty .
+ + lswc -A -c .

That is, even the individual words get interleaved.

We can see the individual writes with strace:

$ strace -etrace=write -f dash -xc 'echo aa bb cc'
write(2, "+ ", 2+ )                       = 2
write(2, "echo", 4echo)                     = 4
write(2, " aa", 3 aa)                      = 3
write(2, " bb", 3 bb)                      = 3
write(2, " cc", 3 cc)                      = 3
write(2, "\n", 1
)
...

I can't think of a way to work around that, short of fixing it in the shell.

Bash doesn't seem to have the issue, so maybe switch to using it.

ilkkachu
  • 138,973
0

This might or might not be related, but never parse ls output; not even here.

A simple

number_of_items () { echo $#; }
is_empty() {
 shopt -s dotglob
 shopt -s nullglob
 value=$(number_of_items $1/*)
 [[ "${value}" -eq 0 ]]
}

would do, so that you can

if is_empty "${local_backup_location}/${folder_to_backup_basename}" ; then
  …

What you actually want is contains_accessible_things

number_of_items () { echo $#; }
contains_accessible_things() {
 shopt -s dotglob
 shopt -s nullglob
 # check for directory and readability 
 [[ -d "$1" && -r "$1" && ( "$(number_of_items $1/*)" -gt 0 ) ]]
}
… 
if contains_accessible_things "${local_backup_location}/${folder_to_backup_basename}" ; then
  do_stuff
fi
  • I don't know why your answer was downvoted, while not the root cause of the problem it seems possibly useful, but I'm not really qualified to say it was wrong to be downvoted. – crobar Jan 06 '22 at 22:12
  • @crobar thanks for the kind words! Yeah, it was a bit of a wild guess. – Marcus Müller Jan 06 '22 at 22:34