I am trying to write a script that goes through a document, finds the greatest character length in each column, and returns it. This script returns 78,78,78,78, whereas what I'm aiming for is 10,11,14,51.

for ((i=1;i<=4;i++)); do
  awk -F"|" '{ print length($i) }' contact_d.csv | sort -nr | sed '1!d';
done

contact_d.csv contains the following (a sample of dummy data):

Barrera|Wilkinson|(09) 1466 1886|eu@dignissim.co.uk
Hopkins|Sellers|(07) 3814 2364|faucibus.orci@libero.co.uk
Hunter|Calderon|(01) 3984 0139|Proin@Uttincidunt.ca

Does anyone have any insight as to why the for loop isn't returning what I am aiming for?

1 Answer

The following code should work:

awk -F'|' '{for (i=1;i<=NF;i++) {len=length($i); if (len>lval[i]) {lval[i]=len; lpos[i]=FNR;}}} END{for (i in lval) printf("Longest value of column %d: %d (line %d)\n",i,lval[i],lpos[i])}' contact_d.csv

For the sample above, it prints:

Longest value of column 1: 7 (line 1)
Longest value of column 2: 9 (line 1)
Longest value of column 3: 14 (line 1)
Longest value of column 4: 26 (line 2)

  • This script will, for every line, loop over all fields (from 1 to NF, the number of fields) and check whether the length of the field (temporarily stored in the variable len) is greater than the longest length found so far, which is stored in the array variable lval under the index of the field (= column) number.

  • On the first line, lval is not yet initialized, so every lval[i] behaves as if it were 0 (strictly speaking, an unset awk variable acts as 0 in a numeric context and as the empty string in a string context).

  • If the length of field i on the current line is greater than the value stored in lval[i], the script stores the current length of the field in lval[i] and the current line number (which is available through the "automatic" variable FNR) in the array variable lpos.

  • At the end of the file (END condition), it prints the longest length and the corresponding line number for every column. I use the construct for (i in lval), which loops over all indices present in the array lval, so I don't have to save the number of columns in an extra variable, as would be necessary for something like for (i=1;i<=ncols;i++). In the END block, the concept of "number of fields" becomes somewhat ill-defined, although in practice awk will often keep the value of NF from the last line of the file.
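
The steps above may be easier to follow when the one-liner is spread over several lines. The following is only a sketch of a variation, not the answer's exact command: the function name longest_per_column is hypothetical, and it tracks the column count in a variable ncols (also not in the original) so that the END block can print the columns in ascending order.

```shell
# longest_per_column is a hypothetical helper name. It reads the files given
# as arguments, or standard input when there are none.
longest_per_column() {
    awk -F'|' '
    {
        if (NF > ncols) ncols = NF        # widest row seen so far
        for (i = 1; i <= NF; i++) {
            len = length($i)
            if (len > lval[i]) {          # an unset lval[i] compares as 0
                lval[i] = len             # new maximum for column i
                lpos[i] = FNR             # line on which it occurred
            }
        }
    }
    END {
        # A counted loop prints the columns in order; "for (i in lval)"
        # leaves the iteration order up to the awk implementation.
        for (i = 1; i <= ncols; i++)
            printf("Longest value of column %d: %d (line %d)\n", i, lval[i], lpos[i])
    }' "$@"
}

# Usage with the three sample rows from the question:
printf '%s\n' \
    'Barrera|Wilkinson|(09) 1466 1886|eu@dignissim.co.uk' \
    'Hopkins|Sellers|(07) 3814 2364|faucibus.orci@libero.co.uk' \
    'Hunter|Calderon|(01) 3984 0139|Proin@Uttincidunt.ca' |
    longest_per_column
```

For the sample rows this should print the same four lines as shown above (7, 9, 14 and 26). Keeping ncols costs one extra variable, but it avoids relying on NF inside END and on the order in which an awk implementation walks the array.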

Note that it is rarely necessary to call awk in a shell loop; it can do most of the things you would need a loop for by itself.

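As for why your original attempt failed: you are trying to pass a shell variable ($i) into an awk script whose code is enclosed in single quotes (as is recommended), but single quotes stop the shell from expanding variables. awk therefore sees its own variable i, which is unset and evaluates to 0, so $i means $0, the whole line. That is why every iteration printed the same number: the length of the longest whole line. Switching to double quotes would not help either: the shell would then substitute the digit itself, and length(1) is the length of the string "1", not of the first field.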
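
If the shell loop is to be kept, one common way to repair it is to hand the column number to awk with the -v option. The sketch below rests on a few assumptions: the helper name col_max and the awk variable col are made up for illustration, and the loop is written as a plain for list so that it also runs in a POSIX sh.

```shell
# Inside single quotes the shell leaves $i alone, so awk sees its own unset
# variable i, which counts as 0, and $0 is the whole line.
echo 'ab|cde' | awk -F'|' '{ print length($i) }'    # prints 6, the full line length

# col_max is a hypothetical helper; usage: col_max COLUMN [FILE...]
col_max() {
    col=$1
    shift
    # -v copies the shell value into the awk variable col, so $col is that field.
    awk -F'|' -v col="$col" '{ print length($col) }' "$@" | sort -nr | sed '1!d'
}

# The three sample rows from the question, kept in a shell variable.
data='Barrera|Wilkinson|(09) 1466 1886|eu@dignissim.co.uk
Hopkins|Sellers|(07) 3814 2364|faucibus.orci@libero.co.uk
Hunter|Calderon|(01) 3984 0139|Proin@Uttincidunt.ca'

for i in 1 2 3 4; do
    printf '%s\n' "$data" | col_max "$i"
done
```

For the sample rows the loop should print 7, 9, 14 and 26, one per line. It still reads the input once per column, which is why the single awk pass is usually the better choice.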

AdminBee