Here is an expression I type on a routine basis because everyone uses headed CSV files, and I must pass around the header for my purposes as well:

cat foo.csv | awk -F',' 'BEGIN{start=0}{if(start==0){start = 1; print $0; next;} ...}'

There must be some trivial setting to allow the header to pass through, and avoid either creating a BASH script wrapper (and towing that around in my brain) or writing this over and over.

Is there such a setting?

Chris
  • That seems to be a convoluted way of doing NR == 1 {print; next;} – muru Feb 10 '20 at 15:41
  • Highly related, probably dupe: https://unix.stackexchange.com/questions/11856/sort-but-keep-header-line-at-the-top (in particular, awk version in https://unix.stackexchange.com/a/71949/70524) – muru Feb 10 '20 at 15:44
  • You can start reading https://stackoverflow.com/tags/awk/info to learn to write more idiomatic awk. – glenn jackman Feb 10 '20 at 16:15
  • Just slap a BEGIN{if(getline>0)print} at the beginning of your script. If you want to process multiple files, then the NR>1{...} you see in thousands of examples (including the answer to this Q) is WRONG; you need FNR>1{...} instead: you either want to skip the header of each file, or if you don't, you also don't want to do a pointless test for each line. –  Feb 10 '20 at 16:36
  • @mosvy I had a warning about NR/FNR in the answer, but I guess I should highlight it more prominently. – AdminBee Feb 10 '20 at 16:55

1 Answer

I assume that you still want to perform text-processing operations with awk on this CSV file. If so, I would recommend adding a condition on the "line number" to it, as in:

awk -F',' 'NR==1{print} NR>1{ your code here }' foo.csv

Here, NR is the awk builtin variable for the "record number", which by default equals the line number (note that when processing multiple files, NR is the global count of records read so far; the per-file line number is FNR). You can also omit the header from the output by leaving out the NR==1{...} part.
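To make the NR/FNR distinction concrete, here is a minimal sketch. The file names, the columns, and the "double the second field" step are made up for illustration and merely stand in for "your code here".

```shell
# Hypothetical sample data: two CSV files that share the same header.
tmpdir=$(mktemp -d)
printf 'name,qty\napple,2\npear,5\n' > "$tmpdir/a.csv"
printf 'name,qty\nplum,7\n' > "$tmpdir/b.csv"

# Single file: pass the header through untouched, process every other line.
single=$(awk -F',' -v OFS=',' 'NR==1{print} NR>1{$2 = $2 * 2; print}' "$tmpdir/a.csv")
printf '%s\n' "$single"

# Several files: NR==1 matches only the very first line read, while FNR>1
# skips the first line of each file, so the header is printed exactly once.
multi=$(awk -F',' -v OFS=',' 'NR==1{print} FNR>1{$2 = $2 * 2; print}' "$tmpdir/a.csv" "$tmpdir/b.csv")
printf '%s\n' "$multi"

rm -rf "$tmpdir"
```

With NR>1 in the second command, the header of b.csv would have been treated as a data line, which is the pitfall the NR/FNR remark is about.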

If in the end you will be using print in your manipulations anyway, you can "golf" this to

awk -F',' 'NR>1{ your code here }1' foo.csv

the 1 being an always-true pattern with no action, so awk applies its default action: "print the resulting line ($0)".
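A small sketch of that shortened form, assuming made-up sample data and a hypothetical "add one to the second field" step in place of "your code here":

```shell
# The block only modifies data lines; the trailing 1 prints every line,
# so the header passes through unchanged and the data lines come out modified.
golfed=$(printf 'name,qty\napple,2\npear,5\n' |
  awk -F',' -v OFS=',' 'NR>1{$2 = $2 + 1}1')
printf '%s\n' "$golfed"
```

Note that with this form the block itself should not call print, otherwise each data line would be printed twice.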

Also:

  • you don't need to cat a file to pipe it to awk, just supply it as command-line argument
  • variables that are uninitialized evaluate to 0 in numeric context (and to the empty string in string context), so you don't need the start=0 statement in your BEGIN section
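Both points can be seen in a minimal sketch of the original flag-based approach. The variable name seen, the sample data, and the "print the first field" step are assumptions for illustration only.

```shell
# Hypothetical sample file.
csv=$(mktemp)
printf 'name,qty\napple,2\npear,5\n' > "$csv"

# No cat and no BEGIN block: seen is uninitialized, so !seen is true for the
# first record only. The file is passed directly as a command-line argument.
flagged=$(awk -F',' '!seen{seen = 1; print; next} {print $1}' "$csv")
printf '%s\n' "$flagged"

rm -f "$csv"
```

This behaves like the NR==1 test for a single file; the NR-based condition is usually the shorter way to write it.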
AdminBee
  • the left to right paradigm is habitual. Do you know of any symbolic way to get rid of cats if I just want the left to right pipeline? happy to add another question if it isn't a simple response – Chris Feb 10 '20 at 16:01
  • I don't, but maybe someone more knowledgeable can comment on this. If not, feel free to open another question if there is no answer on SE yet. – AdminBee Feb 10 '20 at 16:08
  • You can use redirect: < foo.csv awk -F',' 'NR>1{ your code here }1' but why is passing a file as an arg not left to right? – user1794469 Feb 10 '20 at 17:00
  • @user1794469 is correct. Also compare the output of awk '{print FILENAME; exit}' file to using an alternative like < file awk '{print FILENAME; exit}' and notice that doing the latter (just like with cat file |) robs you of the ability to access the input file name. The only real benefit of letting the shell open the file instead of awk is that, if the file can't be opened and you're also redirecting output to a file, the output file isn't created at all rather than being created but empty. Not worth the tradeoff in general. – Ed Morton Feb 10 '20 at 20:45