
I want to merge 28 files that have different names but the same data structure, using the script below:

$ cp mohan.csv Consolidate.csv
$ for fname in line
    do 
      cat $fname | sed '1d' >> Consolidate.csv
    done < input.txt

where input.txt contains:

mohan.csv
babu.csv
mahesh.csv
datvik.csv
... etc

and

$ cat mohan.csv
no,name,dept
1,xyz,hr
2,abc,sales

Output of my script:

$ cat Consolidate.csv
no,name,dept
1,xyz,hr
2,abc,sales
babu.csv
mahesh.csv
datvik.csv
... etc

Please help me with this.


3 Answers


You can use a variety of tools for this job. I assume that you want to keep the first file's header and get rid of all the others, so it is just a matter of appending to an untouched copy of the first file while stripping the header from every other file.

First, remove mohan.csv from input.txt:

$ cat input.txt
babu.csv
mahesh.csv
datvik.csv
... etc

Then:

$ cp mohan.csv consolidate.csv
$ for file in $(<input.txt); do    # unquoted: the shell splits it into one word per filename
    sed '/no,name,dept/d' "$file" >> consolidate.csv
  done

Or with read:

$ cp mohan.csv consolidate.csv
$ while read -r file; do
    sed '/no,name,dept/d' "$file" >> consolidate.csv
  done < input.txt

Or, simpler still, see the pure-sed answer at https://unix.stackexchange.com/a/204343/72707
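
That linked answer aside, here is a minimal sketch of a sed-based variant, assuming GNU sed (whose -s/--separate option makes the 1 address apply to each input file rather than to the whole concatenated stream) and whitespace-free filenames:

$ cp mohan.csv consolidate.csv
$ xargs sed -s '1d' < input.txt >> consolidate.csv    # drop line 1 of every listed file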

Cbhihe
  • 2,701

This solution assumes that the list of files is potentially longer than what a single invocation of some external command could handle (so the filenames in input.txt can't all be listed on the command line in one go).

Assume that the list of positional parameters (i.e., the list of arguments to some script) contains the pathnames of the files that we'd like to concatenate, and that we'd like to take the header from only the first of these if the output file does not already exist. If the output file already exists, we can assume that the header was already written to it.

The minimal code for doing this in the shell would be something like

[ ! -e outfile ] && head -n 1 -- "$1" >outfile
awk 'FNR != 1' "$@" >>outfile

Here, head -n 1 is used to get only the header from the very first file and write it to outfile, if outfile does not already exist. Then, awk is used to extract all but the first line from each file, appending those lines to outfile.

This tiny script could be executed with all the input files from your input.txt file, assuming the filenames in input.txt are properly quoted or otherwise "simple" (no embedded whitespace or quote characters):

xargs sh -c '
    out=$1; shift
    [ ! -e "$out" ] && head -n 1 -- "$1" >"$out"
    awk "FNR != 1" "$@" >>"$out"
' sh Consolidate.csv <input.txt

This uses xargs to run a small in-line sh -c script, taking the input from input.txt as arguments. The output filename, Consolidate.csv, is given as the first argument and is received into out in the in-line script (the lone sh just after the closing quote becomes the in-line script's $0).

If the list in input.txt is very long, xargs will arrange for our in-line script to be called several times, with batches of arguments read from the file.
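
If you'd rather keep the logic in a file than in an in-line script, the same code could be saved as a small standalone script and fed to xargs the same way (merge-csv.sh is just a hypothetical name for illustration):

$ cat merge-csv.sh
#!/bin/sh
# Usage: merge-csv.sh outfile [csvfile...]
out=$1; shift
# Write the header only if the output file does not already exist.
[ ! -e "$out" ] && head -n 1 -- "$1" >"$out"
# Append all data lines (everything except line 1 of each file).
awk 'FNR != 1' "$@" >>"$out"
$ chmod +x merge-csv.sh
$ xargs ./merge-csv.sh Consolidate.csv <input.txt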

Kusalananda
  • 333,661

How about

awk 'NR==1 || FNR > 1' $(< input.txt )

Don't quote the command substitution, so that it produces a list of file names on awk's command line (limited by the system's ARG_MAX parameter). Should your file names contain whitespace characters, resort to an array instead:

readarray -t Arr < input.txt    # bash: one array element per line of input.txt
awk 'NR==1 || FNR > 1' "${Arr[@]}"

Should your header span more than one line, adapt the two numbers being compared in the awk script, as sketched below.
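
For instance, with a (hypothetical) two-line header in every file:

awk 'NR <= 2 || FNR > 2' "${Arr[@]}"

Here NR <= 2 keeps the header lines of the first file only, while FNR > 2 keeps everything after the header in each subsequent file.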

RudiC
  • 8,969