How to add header to multiple files using awk

Question

I would like to add a header line containing whitespace to multiple files.

Here is what I have so far:

#!/bin/bash
# script name is "add_header.sh"
# ARG1 = HEADER STRING
# ARG2,3,... = ARRAY OF FILES TO ADD HEADER TO, RELATIVE DIRECTORY
HEADER=$1 
shift
for FILE in $@; do
    awk -v HEADER=$HEADER FILE=$FILE 'BEGIN{print HEADER} {print}' FILE > FILE.new
done

Unfortunately, when I run it on my use case, it fails because of the whitespace in the header:

touch file1 file2 file3
./add_header.sh "some header with spaces" file1 file2 file3

which gives the following error:

awk: fatal: cannot open file `with' for reading (No such file or directory)
awk: fatal: cannot open file `with' for reading (No such file or directory)
awk: fatal: cannot open file `with' for reading (No such file or directory)

Is there a way to escape whitespace inside a bash variable? I have tried putting \ before each space, but the error then changes to:

./add_header.sh "some\ header\ with\ spaces" file1 file2 file3
awk: fatal: cannot open file `with\' for reading (No such file or directory)
awk: fatal: cannot open file `with\' for reading (No such file or directory)
awk: fatal: cannot open file `with\' for reading (No such file or directory)

which means the whitespace is still not being escaped.

FILE does not exist. You need to use "$FILE" at the end of the command, e.g. awk -v HEADER="$HEADER" -v FILE="$FILE" 'BEGIN{print HEADER} {print > FILE".new"}' "$FILE" — sseLtaH, Nov 14 '21 at 15:42
Quote your variables header="$1", awk HEADER="$header" .... Avoid upper case variable names. Always paste your script into https://shellcheck.net, a syntax checker, or install shellcheck locally. Make using shellcheck part of your development process. — waltinator, Nov 14 '21 at 15:44
Thanks @waltinator. I've combined your feedback with HatLess — Jia Geng, Nov 14 '21 at 16:16
@HatLess I indeed forgot to add the file argument. I also forgot to add a new -v flag for the FILE argument. However, {print > FILE".new"} does not work. — Jia Geng, Nov 14 '21 at 16:19

Kusalananda · Answer 1 · 2021-11-14T17:03:24.490

#!/bin/sh

header=$1; shift

for pathname do
    { printf '%s\n' "$header"; cat -- "$pathname"; } >"$pathname.new"
done

There is no real need for awk here as we want to concatenate the header and the old file contents. We do this by simply outputting the header string with printf and then using cat to output the file's contents. We redirect the output of both printf and cat to a new file.
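
As a quick check of the approach above, here is a minimal sketch that wraps the same printf and cat logic in a function and runs it on scratch files. The function name add_header, the file names, and the use of mktemp -d (widely available, but not required by POSIX) are assumptions made for the demonstration, not part of the original script.

```shell
#!/bin/sh
# Hypothetical wrapper around the printf + cat approach, for demonstration only.
add_header() {
    header=$1; shift
    for pathname do
        # Header first, then the unchanged file contents, into "<name>.new".
        { printf '%s\n' "$header"; cat -- "$pathname"; } >"$pathname.new"
    done
}

# Scratch files: one with a space in its name, one empty.
tmpdir=$(mktemp -d) || exit 1
printf 'line 1\nline 2\n' >"$tmpdir/file one"
: >"$tmpdir/empty"

add_header 'some header with spaces' "$tmpdir/file one" "$tmpdir/empty"

# Show the result for the first file.
cat -- "$tmpdir/file one.new"
```

Because both the header and the pathnames are quoted everywhere, spaces in either are passed through untouched, and an empty input file still produces a .new file holding just the header.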

If you really want to do it with awk, either loop over the files as in the code above, or let awk process all the files without an explicit shell loop.

First variation with an explicit shell loop:

#!/bin/sh
header=$1; shift
for pathname do
    header=$header awk 'BEGIN { print ENVIRON["header"] }; 1' "$pathname" >"$pathname.new"
done

This would be the slowest of the awk variants in this answer, as it invokes awk once per file.

Second variation without a shell loop (requires an awk that understands BEGINFILE, like GNU awk does):

#!/bin/sh
header=$1; shift
header=$header awk '
    BEGINFILE { fname = FILENAME ".new"; print ENVIRON["header"] >(fname) }
    { print >(fname) }' "$@"

Third variation (portable variant of the last piece of code):

#!/bin/sh
header=$1; shift
header=$header awk '
    FNR == 1 { fname = FILENAME ".new"; print ENVIRON["header"] >(fname) }
    { print >(fname) }' "$@"

Thank you very much for the detailed answer. I will go for the first suggestion. One question: why can't we use echo $header instead of printf? Is it because of compatibility issues with echo? — Jia Geng, Nov 14 '21 at 17:16
@JiaGeng First of all, using $header unquoted may invoke word splitting and filename globbing (e.g. if the header is * * * hello * * *, each * would be replaced by the matching filenames). Secondly, echo may change the data before printing it (interpreting certain backslash sequences like \t and \n). Thirdly, if the header is simply -n, you'd get unexpected results. See also more details here: https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo — Kusalananda, Nov 14 '21 at 17:26
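
The three cases from the comment above can be tried with a short sketch like the following. The function name show is made up for the illustration. The echo behaviour described in the comments varies between shells, so it is left commented out rather than given a single expected result.

```shell
#!/bin/sh
# Hypothetical helper: print one string exactly as given, followed by a newline.
show() {
    printf '%s\n' "$1"
}

# The troublesome headers mentioned in the comment above.
show '* * * hello * * *'   # quoted, so no word splitting and no globbing
show 'tab\there'           # the backslash and the t are printed literally
show '-n'                  # treated as data, not as an option

# For comparison only (results depend on the shell):
# echo 'tab\there'   # dash turns \t into a tab; bash, by default, does not
# echo -n            # many implementations print nothing at all
```

With a fixed %s format string, the data is never interpreted by printf, which is why the output does not depend on what the header happens to contain.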

score 0 · Answer 2 · answered Nov 14 '21 at 17:13

The right way to do this, assuming you have no empty input files, would be:

#!/usr/bin/env bash
header=$1 
shift
awk -v header="$header" '
    FNR==1 { close(out); out=FILENAME ".new"; $0=header ORS $0 }
    { print > out }
' "$@"

The above will work using any awk in any shell on every Unix box. If you can have empty input files it'd need a tweak.
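
The answer does not say what the tweak for empty input files would be. One possible sketch, which is an assumption and not the answer author's code, is to write the header to every output file in a BEGIN block and then append the input lines. The wrapper name add_header and the scratch files are made up for the example. Note that awk -v interprets backslash escapes in the header, and that an operand of the form name=value would be taken by awk as a variable assignment rather than a file.

```shell
#!/bin/sh
# Sketch only: add_header is a hypothetical wrapper, not part of the answer above.
add_header() {
    header=$1; shift
    awk -v header="$header" '
        BEGIN {
            # Create every output file up front with just the header,
            # so that empty input files still get a ".new" file.
            for (i = 1; i < ARGC; i++) {
                out = ARGV[i] ".new"
                print header > out
                close(out)
            }
        }
        # On the first line of each input, switch to its output file.
        FNR == 1 { close(out); out = FILENAME ".new" }
        # Append, because the file already holds the header.
        { print >> out }
    ' "$@"
}

# Scratch files for a quick check (mktemp -d is assumed to be available).
tmpdir=$(mktemp -d) || exit 1
printf 'a\nb\n' >"$tmpdir/data"
: >"$tmpdir/empty"
add_header 'some header with spaces' "$tmpdir/data" "$tmpdir/empty"
```

After this runs, both data.new and empty.new exist, and the latter contains only the header line.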

score -2 · Answer 3 · answered Nov 14 '21 at 16:26


Here is the modified version which works for me:

#!/bin/bash
header=$1
shift
for file in $@; do 
    awk -v HEADER="$header" 'BEGIN{print HEADER} {print}' "$file" > "$file".new
done

I tried to use {print > [FILE].new} inside the awk expression but it did not work. It just printed to stdout. Maybe because new files cannot be created within awk.

Jia Geng


You need to quote $@, or you won't be able to cope with filenames containing whitespace or filename globbing characters. – Kusalananda Nov 14 '21 at 16:39
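
The difference the comment describes can be seen with a small sketch. The two function names and the sample filenames are invented for the illustration, and the counts in the comments assume the default IFS.

```shell
#!/bin/sh
# Count loop iterations when $@ is left unquoted (subject to word splitting).
count_unquoted() {
    n=0
    for f in $@; do n=$((n + 1)); done
    printf '%s\n' "$n"
}

# Count loop iterations when "$@" is quoted (one iteration per argument).
count_quoted() {
    n=0
    for f in "$@"; do n=$((n + 1)); done
    printf '%s\n' "$n"
}

count_unquoted 'my file.txt' other.txt   # prints 3: "my file.txt" is split in two
count_quoted   'my file.txt' other.txt   # prints 2: each argument stays intact
```

In the script above, the unquoted form would make awk look for files named my and file.txt, neither of which exists.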
