Why does this script output corrupt files when run automatically with crontab?

Question

I have a script that scans a folder for all .mp3 files and creates an index. Next it waits 5s otherwise it doesn't work correctly. Lastly, it removes bad characters from the song names.

I have this running automatically with sudo crontab:

#!/bin/bash
#Creates index file.
find /var/www/html/uploads/Music/ -name '*.mp3' > /var/www/html/uploads/Music/songs.index | sleep 5s | sudo sed -i -e 's/\/var\/www\/html/\.\./g' /var/www/html/uploads/Music/songs.index

For some reason when it's run through crontab it creates a file that has to be converted to text and back to an index in order to read it. When you run the same script manually it works fine. What am I missing?

crontab file:

1 * * * * /home/aaeadmin/bin/midnightRun.bash

Kusalananda · Accepted Answer · 2019-03-30T00:28:38.320

It is likely because you run the commands in a pipeline.

Each part of a pipeline is started at the same time and run concurrently with the other parts of the same pipeline. This means that the find command starts at exactly the same time as the sed command. It is only data passed through the pipeline, from the standard output of one command into the standard input of the next, that synchronises the different commands.

Also note that you don't actually use the pipeline aspect of the pipeline. There is no data being passed between the commands.

Your script would be better written as

#!/bin/bash
#Creates index file.

find /var/www/html/uploads/Music/ -name '*.mp3' > /var/www/html/uploads/Music/songs.index

sed -i -e 's/\/var\/www\/html/\.\./g' /var/www/html/uploads/Music/songs.index

Here, the find command would finish executing before the sed command started. I have also removed the sudo command as it is clearly not needed (if find can write into the file, then sed can read and modify it without sudo).

If you find that you do need sudo to write songs.index, I would suggest that you instead run this cron job in the crontab belonging to a user with permissions to write to the target directory /var/www/html/uploads/Music.

A pipelined solution would be

#!/bin/sh
#Creates index file.

find /var/www/html/uploads/Music/ -name '*.mp3' |    
sed -e 's,^/var/www/html,..,' >/var/www/html/uploads/Music/songs.index

Here, find writes directly to the sed command, and the result of the sed command is written to the index file. The communication of data (pathnames) from find into sed keeps the two processes synchronised and sed would wait for find to produce the next line of input before continuing (and vice versa; find would wait for sed to process the data before trying to output more data).

I've also made the sed command easier to read and I've anchored the regular expression to the start of the line and removed the unnecessary g flag at the end. Since it now reads from the find command, I've also removed the -i option (strictly speaking, the -e option could also be removed).

Another thing I've changed is the #! line. You're not using any bash-specific features, so we may as well run the script with a (possibly) more light-weight shell.

If you want to write the filenames of the found files to the index, with the initial /var/www/html replaced by .. you could also do that directly from find:

#!/bin/sh
#Creates index file.

find /var/www/html/uploads/Music/ -type f -name '*.mp3' -exec sh -c '
    for pathname do
        printf "../%s\n" "${pathname#/var/www/html/}"
    done' sh {} + >/var/www/html/uploads/Music/songs.index

This single find command would find the pathnames all regular files (i.e. not directories etc.) whose names matched the given pattern. For batches of these pathnames, a short inline shell script is called. The script simply iterates over the current batch of pathnames and prints them out slightly modified.

The modification to the pathname is done through the parameter substitution ${pathname#/var/www/html/}. This removes the string /var/www/html/ from the start of the value in $pathname. The printf format string used would then ensure that this part is replaced by ../.

Why does this script output corrupt files when run automatically with crontab?

1 Answers1