Bash function produces error output, why?

Question

I have a script I run regularly using cron. I would like to get notified by email when these scripts fail. I do not wish to be notified every time they run and produce any output at all.

As such, I am using the script Cronic to run my jobs inside cron, which should mean only error output gets sent, and not just any output.

However, in one script I have a bash function to detect if a directory is empty:

# function to check if directory is empty outputs 0 if empty, or
# some positive integer if it does not
is_dir_empty () 
{
  ls -A "$1" | wc -c
}

This function is used in an if statement like this:

if [ "$(is_dir_empty "${local_backup_location}/${folder_to_backup_basename}")" -eq 0 ]; then
    # save space by removing diffs older than 1 month
    rdiff-backup --remove-older-than 1M --force ${local_backup_location}/${folder_to_backup_basename} || logger -t ${logger_name}  "No existing data backup"
fi

occasionally I get an email like this from cronic

Cronic detected failure or error output for the command:
/usr/local/sbin/backup_uploads
RESULT CODE: 0
ERROR OUTPUT:
ls -A /mnt/storage-2/data_upload_backup/upload
STANDARD OUTPUT:
<snip>
TRACE-ERROR OUTPUT:

folder_to_backup=/mnt/storage-1/sftp_data/sftpuser1/upload
backup_server=ed-mh-x86001.mydomain.tld
backup_folder=data_upload_backup
remote_backup_storage_root=/mnt/storage-2
local_backup_storage_root=/mnt/storage-2
logger_name=backup_uploads
error_report_dir=/root/rdiff-backup-errors
email_receiver=sysadmin@mydomain.tld
problems=0
lockfilename=/root/.backup_uploads_lockfile.pid
test -e /root/.backup_uploads_lockfile.pid
echo
basename /mnt/storage-1/sftp_data/sftpuser1/upload
folder_to_backup_basename=upload
hostname
remote_server_backup_folder=/mnt/storage-2/data_upload_backup_ed-mh-pi01
mkdir -p /root/rdiff-backup-errors
logger -t backup_uploads Starting backup for folder /mnt/storage-1/sftp_data/sftpuser1/upload
test -d /mnt/storage-2
local_backup_location=/mnt/storage-2/data_upload_backup
mkdir -p /mnt/storage-2/data_upload_backup
logger -t backup_uploads Backing up to /mnt/storage-2/data_upload_backup
mkdir -p /root/rdiff-backup-errors
test -d /mnt/storage-2/data_upload_backup/upload
is_dir_empty /mnt/storage-2/data_upload_backup/upload

wc -c



ls -A /mnt/storage-2/data_upload_backup/upload

[ 26 -eq 0 ]
logger -t backup_uploads Starting backup for /mnt/storage-1/sftp_data/sftpuser1/upload
hostname
rdiff-backup --ssh-no-compression /mnt/storage-1/sftp_data/sftpuser1/upload /mnt/storage-2/data_upload_backup/upload/
testssh root@ed-mh-x86001.mydomain.tld
remote_backup_machine_accessible=0
[ 0 -eq 0 ]
remote_backup_location=root@ed-mh-x86001.mydomain.tld:/mnt/storage-2/data_upload_backup_ed-mh-pi01
logger -t backup_uploads copying local backup up to root@ed-mh-x86001.mydomain.tld:/mnt/storage-2/data_upload_backup_ed-mh-pi01
rsync -aP --delete-after /mnt/storage-2/data_upload_backup/ root@ed-mh-x86001.mydomain.tld:/mnt/storage-2/data_upload_backup_ed-mh-pi01
rm -f /root/.backup_uploads_lockfile.pid
logger -t backup_uploads Lock file /root/.backup_uploads_lockfile.pid deleted, end of script

I do not get this every time the script is run, just occasionally, so perhaps when the directory is actually empty or something like this.

So my question is why am I getting error output from this command, and how to prevent it (at least in the obvious context where i don't want it here)?

This question is very similar to another I asked myself here, but I assure you it is different!

score 2 · Answer 1 · answered Jan 06 '22 at 17:28

That's down to how that cronic captures the error output of your command and how it tries to separate out the xtrace output from that.

It looks for lines starting with one or more + in there.

That's a crude heuristic. If some error message starts with + it will be classified as trace output. If a xtrace line is interleaved with an error message line or another xtrace line for another command started in parallel (like in your case), then the + may not be at the start the line and it will be classified as xtrace output. If the script uses a different PS4, that will not work properly, etc.

For bash scripts, to separate the error output from the xtrace output, you should use BASH_XTRACEFD instead:

BASH_XTRACEFD=7 bash -x /path/to/your-script > out 2> err 7> trace

ilkkachu · Accepted Answer · 2022-01-06T17:42:18.027

Similar issue to the one in the linked question. See this part in the output:

+ + wc -c
ls -A /mnt/storage-2/data_upload_backup/upload

When cronic parses the output to pick up the strace lines, it sees the line without a + in front, and thinks that's part of regular error output.

The output there is messed up, it should be just showing the normal xtrace output for those two commands, i.e.:

+ ls -A /mnt/storage-2/data_upload_backup/upload
+ wc -c

What has to happen here, is that Dash doesn't bother to buffer the whole output line before calling write() to send it to stderr, but instead does it piece-by-piece. And since there's two shell processes doing that, one for each part of the pipeline, and they run at the same time, the outputs can get mixed up. I can repeat that on a loaded system with not too much trouble.

For that result, it's enough for it to write the + first, and then the command, but it seems to be worse, I got outputs like this, too:

+ is_dir_empty .
+ + lswc -A -c .

That is, even the individual words get interleaved.

We can see the individual writes with strace:

$ strace -etrace=write -f dash -xc 'echo aa bb cc'
write(2, "+ ", 2+ )                       = 2
write(2, "echo", 4echo)                     = 4
write(2, " aa", 3 aa)                      = 3
write(2, " bb", 3 bb)                      = 3
write(2, " cc", 3 cc)                      = 3
write(2, "\n", 1
)
...

I can't think of a way to work around that, short of fixing it in the shell.

Bash doesn't seem to have the issue, so maybe switch to using it.

Thanks, I'm going to switch to bash for this script and see if it goes away! — crobar, Jan 06 '22 at 21:49
@crobar, any script with pipes would be susceptible to that issue, though. — ilkkachu, Jan 06 '22 at 22:18

Marcus Müller · Answer 3 · 2022-01-06T17:22:58.820

0

This might or might not be related, but never parse ls output; not even here.

A simple

number_of_items () { echo $#; }
is_empty() {
 shopt -s dotglob
 shopt -s nullglob
 value=$(number_of_items $1/*)
 [[ "${value}" -eq 0 ]]
}

would do, so that you can

if is_empty "${local_backup_location}/${folder_to_backup_basename}" ; then
  …

What you actually want is contains_accessible_things

number_of_items () { echo $#; }
contains_accessible_things() {
 shopt -s dotglob
 shopt -s nullglob
 # check for directory and readability 
 [[ -d "$1" && -r "$1" && ( "$(number_of_items $1/*)" -gt 0 ) ]]
}
… 
if contains_accessible_things "${local_backup_location}/${folder_to_backup_basename}" ; then
  do_stuff
fi

edited Jan 06 '22 at 17:22

answered Jan 06 '22 at 17:17

Marcus Müller

28,836

I don't know why your answer was downvoted, while not the root cause of the problem it seems possibly useful, but I'm not really qualified to say it was wrong to be downvoted. – crobar Jan 06 '22 at 22:12
@crobar thanks for the kind words! Yeah, it was a bit of a wild guess. – Marcus Müller Jan 06 '22 at 22:34

Bash function produces error output, why?

3 Answers3