Currently, my bash script splits by number of lines. However, I'd like to split a file into X pieces, each of those having total lines equal to the file length divided by X. The script is run as follows:

./script.sh input_file.tsv

So far, in the script, I have this:

INPUT_FILE=$1
SPLIT_NUM_THREADS=15
TOTAL_LINES=$(wc -l < $INPUT_FILE)
SPLIT_NUM=$( echo "scale=6; $TOTAL_LINES / $SPLIT_NUM_THREADS" | bc)

The following issues exist:

  • Using $INPUT_FILE to get TOTAL_LINES gets me the error "ambiguous redirect", but using simply "input.tsv" does not. What's wrong there?
  • SPLIT_NUM is a float, how do I convert it to an int so it can split by lines?

How can I resolve these issues and split a file by number of pieces?

switch87
Befall
  • I don't get that "ambiguous redirect" error (GNU bash, Version 4.2.53). It appears if an unset or empty variable is used. Please put echo "$INPUT_FILE" before the line with the error (though I don't see a possible problem yet). – Hauke Laging Nov 19 '14 at 23:09
  • Oh FFS I was running the script and forgetting to put the input file in the command, DERP. That's fixed, thank you. All I need to do is get a rounded number for splitting, any idea there? – Befall Nov 19 '14 at 23:15
  • Try SPLIT_NUM=$(expr '(' $TOTAL_LINES + $SPLIT_NUM_THREADS - 1 ')' / $SPLIT_NUM_THREADS ). There are more compact ways to do this, depending on your shell. – Mark Plotnick Nov 19 '14 at 23:26
  • @MarkPlotnick that worked perfectly, thanks so much! – Befall Nov 19 '14 at 23:53
  • Maybe you can also make use of something like this for the float part. –  Nov 20 '14 at 03:14
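
As a sketch of the more compact form mentioned in the comments, assuming bash (or another shell with $(( )) arithmetic), the same round-up division can be written without expr. The helper name ceil_div is hypothetical and only used for illustration.

```shell
#!/usr/bin/env bash
# ceil_div is a hypothetical helper name, not part of the original script.
# Integer division rounded up: (a + b - 1) / b, done with shell arithmetic.
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

TOTAL_LINES=52
SPLIT_NUM_THREADS=15
SPLIT_NUM=$(ceil_div "$TOTAL_LINES" "$SPLIT_NUM_THREADS")
echo "$SPLIT_NUM"
```

With 52 lines and 15 parts this yields 4 lines per piece. Note that splitting by that line count (for example with split -l "$SPLIT_NUM") then produces 13 files rather than 15, so rounding up limits the number of pieces but does not guarantee exactly X of them.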

1 Answer

Each part gets the integer quotient ($((a/b))) of lines. If the total number of lines is not a multiple of the number of parts, i.e. the remainder ($((a%b))) is not zero, then the spare lines have to be distributed over the parts. One solution is to give each of the first a%b parts one additional line.

SPLIT_NUM_THREADS=15
TOTAL_LINES=52
for((i=0;i<$((TOTAL_LINES%SPLIT_NUM_THREADS));i++)); do
  echo $((TOTAL_LINES/SPLIT_NUM_THREADS+1))
done
4
4
4
4
4
4
4
for((i=$((TOTAL_LINES%SPLIT_NUM_THREADS));i<SPLIT_NUM_THREADS;i++)); do
  echo $((TOTAL_LINES/SPLIT_NUM_THREADS))
done
3
3
3
3
3
3
3
3
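
One way to turn these sizes into actual files, as a minimal sketch rather than a definitive implementation: the function name split_evenly and the default output prefix part_ are assumptions made for illustration. It assumes bash and that the input ends with a newline, since wc -l counts newline characters.

```shell
#!/usr/bin/env bash
# split_evenly is a hypothetical helper; the "part_" default prefix is an assumption too.
# Usage: split_evenly INPUT_FILE NUM_PARTS [PREFIX]
split_evenly() {
  local input=$1 parts=$2 prefix=${3:-part_}
  local total base extra start=1 size i
  total=$(wc -l < "$input")
  base=$(( total / parts ))    # lines that every part gets
  extra=$(( total % parts ))   # the first $extra parts get one more line
  for (( i = 0; i < parts; i++ )); do
    size=$(( base + (i < extra ? 1 : 0) ))
    if (( size > 0 )); then
      # Copy lines start..start+size-1 into the i-th output file.
      sed -n "${start},$(( start + size - 1 ))p" "$input" > "${prefix}${i}"
    else
      # More parts than lines: still create an empty file for this part.
      : > "${prefix}${i}"
    fi
    start=$(( start + size ))
  done
}
```

Called as split_evenly input_file.tsv 15, this would write part_0 through part_14. The sketch rereads the input once per part, which keeps it short but is not the fastest approach for very large files.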
Hauke Laging