107

I am trying to sum certain numbers in a column using awk. I would like to sum just column 3 of the "smiths" to get a total of 212. I can sum the whole column using awk but not just the "smiths". I have:

awk 'BEGIN {FS = "|"} ; {sum+=$3} END {print sum}' filename.txt

Also I am using putty. Thank you for any help.

smiths|Login|2
olivert|Login|10
denniss|Payroll|100
smiths|Time|200
smiths|Logout|10
Braiam
  • 35,991
jake
  • 1,215
  • 2
  • 10
  • 9

5 Answers5

129
awk -F '|' '$1 ~ /smiths/ {sum += $3} END {print sum}' inputfilename
  • The -F flag sets the field separator; I put it in single quotes because it is a special shell character.
  • Then $1 ~ /smiths/ applies the following {code block} only to lines where the first field matches the regex /smiths/.
  • The rest is the same as your code.

Note that since you're not really using a regex here, just a specific value, you could just as easily use:

awk -F '|' '$1 == "smiths" {sum += $3} END {print sum}' inputfilename

Which checks string equality. This is equivalent to using the regex /^smiths$/, as mentioned in another answer, which includes the ^ anchor to only match the start of the string (the start of field 1) and the $ anchor to only match the end of the string. Not sure how familiar you are with regexes. They are very powerful, but for this case you could use a string equality check just as easily.

KiriSakow
  • 105
Wildcard
  • 36,499
  • 6
    By the way, my go-to favorite awk reference is http://www.grymoire.com/Unix/Awk.html. Very helpful page. – Wildcard Nov 14 '15 at 05:10
  • 2
    Thank you @Wildcard ! I was able to neatly aggregate a uncompressed size of particular files in big zip archive based on your advice :)

    unzip -lv /appl/tmp/data.lar | grep documentlibrary | awk '{sum += $1} END {print sum/1024/1024}'

    – Pawel Kruszewski Dec 12 '17 at 11:24
45

Another approach is to use awk associative arrays, more info here. This line produces the desired output:

awk -F '|' '{a[$1] += $3} END{print a["smiths"]}' filename.txt

As a side effect, the array stores all other values:

awk -F '|' '{a[$1] += $3} END{for (i in a) print i, a[i]}' filename.txt

Output:

smiths 212
denniss 100
olivert 10
Andrey
  • 820
5

Very good so far. All you need to do is add a selector before the block to add the sum. Here we check that the first argument contains only "smiths":

awk 'BEGIN {FS = "|"} ; $1 ~ /^smiths$/ {sum+=$3} END {print sum}'

You could shorten this a bit by specifying the field separator as an option. In awk it's generally a good idea to initialize variables on the command line:

awk -F'|' '$1 ~ /^smiths$/ {sum+=$3} END {print sum}'
RobertL
  • 6,780
3

I personally would prefer to keep the awk section as simple as possible and do as much as you can without it. Comingled logic doesn't take advantage of the power of Unix pipelines and is thus harder to understand, debug or modify for closely related use cases.

cat filename.txt | perl -pe 's{.*|}{}g' | awk '{sum+=$1} END {print sum}'
Sridhar Sarnobat
  • 1,802
  • 20
  • 27
1
cat filename.txt | grep smiths | awk -F '|' '{sum+=$NF} END {print sum}'
  • -F option to specify separator.
  • $NF is for "last column".
  • 1
    cat and grep are unnecessary here. – Andrey Apr 18 '18 at 13:29
  • Why is grep unneeded @Andrey? OP wants to add only "smiths" rows. You'd need to modify the awk statement, right? – E L Jun 19 '19 at 00:27
  • 1
    @EL yes, the awk statement should be modified to /smiths/{...} if the grep call is not there. This is a trivial modification, but it provides significant benefits: decreases the number of running processes, simplifies the error control, and makes the code clearer. – Andrey Jun 20 '19 at 13:45