
Consider a large number of CSV files (*.csv) living in some folder. They all have exactly the same header.

How can I efficiently concatenate them all into a single CSV file with the same single header?


I found a number of solutions that solve similar but more specific problems.


My current awk attempt doesn't work:

$ cat concat_my_csv_files.sh
    #!/usr/bin/env zsh
    awk '
        FNR==1 && NR!=1 { while (/^<header>/) getline; }
        1 {print}
    ' $1/*.csv > $2

$ ./concat_my_csv_files.sh /some/path/to/csv/files/ full_join.csv

But when I run:

grep -F column_A full_join.csv

it matches several rows, so the header line is still being repeated in the output.

2 Answers

awk '
    NR == 1 {print}
    FNR == 1 {next}
    {print}
' *.csv

The NR variable is the record number of all the input.
The FNR variable is the record number of only the current file.

This prints the first line awk sees (the header of the first file), then skips the first line of every file, printing all the other lines.
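As a quick sanity check, you can try the same awk program on two throwaway files (the temp directory and file names here are invented for the demo) and confirm the header appears only once:

```shell
#!/bin/sh
# Demo of the NR/FNR trick on two small made-up CSV files.
dir=$(mktemp -d)
printf 'id,name\n1,alice\n' > "$dir/a.csv"
printf 'id,name\n2,bob\n'   > "$dir/b.csv"

awk '
    NR == 1 {print}     # the very first line overall: the header of the first file
    FNR == 1 {next}     # first line of *each* file: skip (the first was already printed)
    {print}             # every other line: pass through
' "$dir"/*.csv
# Prints:
# id,name
# 1,alice
# 2,bob

rm -rf "$dir"
```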

glenn jackman
    Note, this can be logically shortened to: awk 'NR == 1 || FNR > 1' *.csv – glenn jackman Oct 05 '15 at 21:25
  • How are we supposed to help you when you don't show us what your files look like? In the absence of actual information, I'm assuming your "header" is one line at the top of each file. – glenn jackman Oct 06 '15 at 01:07

Basically you want "head -n 1 on any one file; tail -n +2 on all of them":

set -- *.csv
head -n 1 "$1"
tail -q -n +2 "$@"

The -q flag (supported by GNU and BSD tail) suppresses the "==> file <==" banners that tail prints when given more than one file. If your sh script already receives the *.csv files as positional arguments, omit the set -- line.
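A minimal sketch of this approach on two throwaway files (the temp directory and file names are invented for the demo):

```shell
#!/bin/sh
# Demo of the head/tail approach on two small made-up CSV files.
dir=$(mktemp -d)
printf 'id,name\n1,alice\n' > "$dir/a.csv"
printf 'id,name\n2,bob\n'   > "$dir/b.csv"

cd "$dir"
set -- *.csv
head -n 1 "$1"        # header from the first file only
tail -q -n +2 "$@"    # line 2 onward of every file; -q suppresses per-file banners
# Prints:
# id,name
# 1,alice
# 2,bob

cd /
rm -rf "$dir"
```

Without -q, GNU tail would interleave "==> a.csv <==" style headers between the files' contents, corrupting the combined CSV.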