3

I have a list of files with names prefix_0000.mp3 ... prefix_x.mp3, where max(x) = 9999.

I have the bash script:

...
sox prefix_*.mp3 script_name_output.mp3 # this fails because the maximum number of input files is 348
rm prefix_*.mp3
...

How can I best split the ordered list of mp3 files into sublists (while retaining the order), sox them batch by batch, and remove the files that are no longer needed, all in a bash script?

xralf
  • 15,415

3 Answers

2

(edited for clarity, and to make it safer)

This should work if there are no gaps in the file sequence. Just replace LAST=0 with the last 4-digit number in your sequence. You’ll be left with script_name_output.mp3.

# make a backup in case anything goes wrong
mkdir backup && cp *.mp3 backup

# enter last 4-digit number in the file sequence
LAST=0
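LASTNN__=${LAST:0:2}   # first two digits: index of the last group
LAST__NN=${LAST:2:2}   # last two digits: last file within that group

# sox 100 files at a time
# (10# forces base 10, so 08 and 09 are not read as octal;
#  LASTNN__ itself is left unchanged for the steps below)
for i in $(seq -f "%02g" 0 $((10#$LASTNN__ - 1))); do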
  LIST=$(paste -sd' ' <(seq -f "prefix_$i%02g.mp3" 0 99));
  sox $LIST script_name_output_$i.mp3;
done

# sox the last group
LAST_LIST=$(paste -sd' ' \
  <(seq -f "prefix_${LASTNN__}%02g.mp3" 0 $LAST__NN))
sox $LAST_LIST script_name_output_${LASTNN__}.mp3

# concatenate all the sox'ed files
OUTPUT_LIST=$(paste -sd' ' \
  <(seq -f "script_name_output_%02g.mp3" 0 $LASTNN__))
sox $OUTPUT_LIST script_name_output.mp3

# delete the intermediate files
rm $OUTPUT_LIST

# delete input files if everything worked
rm prefix_*.mp3
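
Rather than typing LAST by hand, it could probably be derived from the file names. A minimal sketch, assuming the files sit in the current directory and always use exactly four digits; last_index is a hypothetical helper name, not part of the script above:

```shell
# last_index: print the four-digit number of the highest-numbered
# prefix_NNNN.mp3 in the current directory (hypothetical helper).
last_index() {
    local files=(prefix_[0-9][0-9][0-9][0-9].mp3)
    # An unmatched glob stays literal, so check that the first entry exists.
    [ -e "${files[0]}" ] || return 1
    # Globs expand in sorted order; with fixed-width numbers the last is the highest.
    local last=${files[${#files[@]}-1]}
    last=${last#prefix_}
    printf '%s\n' "${last%.mp3}"
}
```

It would then be used as LAST=$(last_index) in place of LAST=0.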
msp
  • 51
  • @xralf: I see you've put a bounty on this issue, and that "the current answers do not contain enough detail". What details do you feel are missing from my answer? I'm happy to clarify any confusion. Have you tried running the first code block I provided? Did it not work? – msp Sep 23 '16 at 03:44
  • @xralf: I've updated my answer for clarity, and to make it safer. Let me know if you're still unable to get it to work. – msp Sep 23 '16 at 15:27
  • I accepted other solution because it was more universal and less cryptic (I'm not used to various bash shortcuts) – xralf Sep 25 '16 at 16:48
2

First, gather the list into a Bash array. If the files are in the current directory, you can use

files=(prefix_????.mp3)

Alternatively, you can use find and sort,

IFS=$'\n' ;
files=($(find . -name 'prefix_*.mp3' -printf '%p\n' | sort -d))

Setting IFS tells Bash to split only at newlines. If your file and directory names do not contain spaces, you can omit it.

Alternatively, you can read the file names from a file, say filelist, one name per line, and no empty lines,

IFS=$'\n'
files=($(<filelist))

If you might have empty lines in there, use

IFS=$'\n'
files=($(sed -e '/^$/ d' filelist))
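
On bash 4 or later, mapfile can fill the array without changing IFS, and the lines it reads are not subject to word splitting or glob expansion. A minimal sketch, assuming a list file with one name per line; read_list is a hypothetical helper name:

```shell
# read_list FILE: store the non-empty lines of FILE in the global array "files".
read_list() {
    # grep -v '^$' drops empty lines; mapfile -t strips the trailing newlines.
    mapfile -t files < <(grep -v '^$' "$1")
}
```

Calling read_list filelist leaves the names in files, ready for the loop that follows.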

Next, decide how many files you want in each slice, the name of the temporary accumulator file, as well as the final combined file name:

s=100
src="combined-in.mp3"
out="combined-out.mp3"

Then, we just need to slice the list, and process each sublist:

while (( ${#files[@]} > 0 )); do
    n=${#files[@]}

    # Slice files array into sub and left.
    if (( n <= s )); then
        sub=("${files[@]}")
        left=()
    else
        (( n-= s ))
        sub=("${files[@]:0:s}")
        left=("${files[@]:s:n}")
    fi

    # If there is no source file, but there is
    # a sum file, rename sum to source.
    if [ ! -e "$src" -a -e "$out" ]; then
        mv -f "$out" "$src"
    fi

    # If there is a source file, include it first.
    if [ -e "$src" ]; then
        sub=("$src" "${sub[@]}")
    fi

    # Run command.
    if ! sox "${sub[@]}" "$out" ; then
        rm -f "$out"
        echo "Failed!"
        break
    fi

    rm -f "$src"
    echo "Done up to ${sub[-1]}."
    files=("${left[@]}")

    # rm -f "${sub[@]}"
done

If sox reports a failure, the loop will break early. Otherwise, it will output the last name in the batch processed.

We use an if for the sox command to detect the failure, and remove the output file if indeed a failure occurred. Because we also postpone modifying the files array until after a successful sox command, we can safely edit/fix individual files, and then just rerun the while loop, to continue where we stopped.

If you are short on disk space, you can uncomment the second-to-last line, rm -f "${sub[@]}", to remove all files that have been successfully combined.


Note that the loop above re-encodes the initial part of the audio over and over again, once for every batch.

As I explained in a comment below, the results will be much better if you concatenate the files first using ffmpeg (without recoding using sox), possibly followed by a recoding pass using sox. (Or, you could recode each first, of course.)

First, you create a pipe-separated list (string) of the file names,

files="$(ls -1 prefix_????.mp3 | tr '\n' '|')"

remove the final superfluous pipe,

files="${files%|}"

and feed them to ffmpeg, with no recoding:

ffmpeg -i "concat:$files" -codec copy output.mp3
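
The pipe-separated list can also be built without parsing ls output, by globbing into an array and joining it. A minimal sketch; join_by_pipe is a hypothetical helper name, and file names that themselves contain | would still break the concat: syntax:

```shell
# join_by_pipe ARG...: print the arguments joined with "|" (hypothetical helper).
join_by_pipe() {
    # "$*" joins the positional parameters with the first character of IFS.
    local IFS='|'
    printf '%s\n' "$*"
}
```

With files=(prefix_????.mp3), the ffmpeg call would become ffmpeg -i "concat:$(join_by_pipe "${files[@]}")" -codec copy output.mp3.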

Note that you may wish to run

ulimit -n hard

to raise the number of open files to the maximum allowed for the current process (the hard limit); ulimit -n shows the current (soft) limit, and ulimit -Hn shows the hard limit. (I don't recall whether ffmpeg concat: opens the sources sequentially or all at once.)
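
Before raising anything, a script could check whether the current soft limit already covers the number of inputs. A minimal sketch; enough_fds is a hypothetical helper name, and the margin of 8 descriptors for the standard streams and the output file is an assumption, not a documented requirement of sox or ffmpeg:

```shell
# enough_fds COUNT: succeed if the soft open-file limit allows COUNT inputs
# plus a small margin (the margin of 8 is an assumption).
enough_fds() {
    local need=$(( $1 + 8 )) soft
    soft=$(ulimit -Sn)
    [ "$soft" = unlimited ] || (( soft >= need ))
}
```

For example, enough_fds "${#files[@]}" || ulimit -n hard would raise the limit only when the file count seems to require it.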

If you do this more than once, I'd put it all into a simple script:

#!/bin/bash
export LANG=C LC_ALL=C
if [ $# -le 2 -o "$1" = "-h" -o "$1" = "--help" ]; then
    exec >&2
    printf '\n'
    printf 'Usage: %s [ -h | --help ]\n' "$0"
    printf '       %s OUTPUT INPUT1 .. INPUTn\n' "$0"
    printf '\n'
    printf 'Inputs may be audio mp3 or MPEG media files.\n'
    printf '\n'
    exit 1
fi

output="$1"
shift 1
ulimit -n hard

inputs="$(printf '%s|' "${@}")"
inputs="${inputs%|}"

ffmpeg -i "concat:$inputs" -codec copy "$output"
retval=$?

if [ $retval -ne 0 ]; then
    rm -f "$output"
    echo "Failed!"
    exit $retval
fi

# To remove all inputs now, uncomment the following line:
# rm -f "${@}"
echo "Success."
exit 0

Note that because I use -codec copy instead of -acodec copy, the above should work for all kinds of MPEG files, not just mp3 audio files.

  • I accepted the answer, but I get this error: can't open input file `prefix_0098.mp3': Too many open files, followed by Failed! – xralf Sep 25 '16 at 19:19
  • @xralf: Reduce s; use for example s=50 (instead of s=100). – Nominal Animal Sep 25 '16 at 19:45
  • This was the first thing I've done and there is still this error. – xralf Sep 25 '16 at 19:46
  • @xralf: You found a bug in my snippet; now fixed. I had n >= s (more files than we want in one sublist), when I obviously meant n <= s, immediately after the # Slice files array comment. (So, instead of slicing, it used the entire array for the first sublist. Ouch. Apologies for the idiotic bug.) – Nominal Animal Sep 25 '16 at 19:50
  • Yes, you're right. I wasn't thinking on it, but instead tried to test what is in echo "${sub[@]}" before break. But, thinking is better practise :-) if you're not multitasking like me today – xralf Sep 25 '16 at 20:16
  • I discovered other kind of problem. The more files, the more noise is in the resulting file. I noticed that the problem is because, there is a file we concatenate repeatedly. So, the solution is good logically, but from the listening point of view, the result is unusable. I wonder if sox has some switch to get rid of this problem. Or if the bash script can treat temporary files in a different way (not to use one temporary many times) – xralf Sep 25 '16 at 21:42
  • @xralf: The noise occurs, because you end up re-encoding the initial bit over and over again. You can minimize this by encoding the files in groups, and finally merging the groups. (Right now, the snippet re-encodes the initial part over and over again.) – Nominal Animal Sep 25 '16 at 22:28
  • A better solution would be to use e.g. ffmpeg to concatenate the files without re-encoding (and finally re-encode with sox if necessary or desired). First, combine the names into a pipe-separated list: files="$(ls -1 prefix_????.mp3 | tr '\n' '|')", then remove the trailing pipe, files="${files%|}", and finally run ffmpeg -i "concat:$files" -acodec copy output.mp3 to concatenate them all into output.mp3. – Nominal Animal Sep 25 '16 at 22:41
1

You may be able to raise the file descriptor limit:

ulimit -n 11000

As a regular user, you should be able to raise that limit up to the hard limit. See ulimit -Hn for the current hard limit.
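
A script that wants a particular limit could clamp its request to the hard limit, so that the ulimit call does not fail with "Operation not permitted". A minimal sketch; raise_nofile is a hypothetical helper name:

```shell
# raise_nofile WANT: set the soft open-file limit to WANT, or to the hard
# limit if WANT exceeds it (hypothetical helper).
raise_nofile() {
    local want=$1 hard
    hard=$(ulimit -Hn)
    if [ "$hard" != unlimited ] && (( want > hard )); then
        want=$hard
    fi
    ulimit -Sn "$want"
}
```

Calling raise_nofile 11000 early in the script then gives it as many descriptors as the hard limit permits, which may still be fewer than the number of files.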

A non-root process cannot raise the hard limit (that's the whole point: the administrator sets it to prevent ordinary users from abusing system resources). If you have superuser access via sudo, you can start a new non-superuser shell with both the hard and soft limits raised:

sudo HOME="$HOME" zsh -c 'ulimit -HSn 100000; USERNAME=$SUDO_USER; zsh'

Or that sox command:

sudo HOME="$HOME" zsh -c 'ulimit -HSn 100000; USERNAME=$SUDO_USER
                          sox prefix_*.mp3 script_name_output.mp3'

If on Linux, you can also call the prlimit command as root to raise the limit of your shell (and its children):

bash-4.3$ ulimit -n
1024
bash-4.3$ ulimit -Hn
65536
bash-4.3$ sudo prlimit --nofile=100000:100000 --pid="$$"
bash-4.3$ ulimit -Hn
100000
bash-4.3$ ulimit -n
100000

Otherwise, you could do the job in 2 steps: concatenate the files in groups of 347 files and then concatenate the intermediary files.

With zsh:

intermediate_concat() sox "$@" intermediate.$((++n)).mp3
autoload zargs
n=0
zargs -n 347 prefix_*.mp3 -- intermediate_concat
sox intermediate.*.mp3(n) script_name_output.mp3
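
For bash, the same two-step approach might look like the sketch below. concat_in_groups and the CONCAT variable are hypothetical names; CONCAT defaults to sox and exists only so that another command can be substituted. The sketch assumes that the number of groups itself stays below the open-file limit:

```shell
# concat_in_groups SIZE OUTPUT FILE...
# Combine FILEs in groups of SIZE, then merge the intermediate files into
# OUTPUT. Each input passes through the command at most twice.
concat_in_groups() {
    local size=$1 out=$2 part i=0
    local -a parts=()
    shift 2
    (( $# > 0 )) || return 1
    while (( $# > 0 )); do
        part=$(printf '%s.part%04d.mp3' "${out%.mp3}" "$i")
        # "${@:1:size}" is the next group of at most SIZE inputs.
        if ! "${CONCAT:-sox}" "${@:1:size}" "$part"; then
            rm -f "${parts[@]}" "$part"
            return 1
        fi
        parts+=("$part")
        shift $(( $# < size ? $# : size ))
        i=$(( i + 1 ))
    done
    # Merge the intermediate files in order; keep them if the merge fails.
    "${CONCAT:-sox}" "${parts[@]}" "$out" || { rm -f "$out"; return 1; }
    rm -f "${parts[@]}"
}
```

It would be called as concat_in_groups 347 script_name_output.mp3 prefix_*.mp3. The inputs are left in place, so they can be removed once the result has been checked.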
  • That's strange. I used ulimit -n 11000 at the beginning of my script and it writes sox FAIL formats: can't open input file `prefix_0098.mp3': Too many open files – xralf Sep 25 '16 at 15:14
  • ulimit: open files: cannot modify limit: Operation not permitted. I overlooked this error at the beginning. Should I always use sudo for my script? – xralf Sep 25 '16 at 16:51
  • @xralf, see edit. – Stéphane Chazelas Sep 25 '16 at 19:34