9

This is part of my file:

N W N N N N N N N N N
N C N N N N N N N N N
N A N N N N N N N N N
N N N N N N N N N N N
N G N N N N N N N N N
N C N N N C N N N N N
N C C N N N N N N N N

For each line, I want to count the characters that are not "N".

My desired output:

1
1
1
0
1
2
2
Anna1364
  • 1,026
  • Use sed to delete the characters you don't care about and awk to count the remaining length: sed 's/N//g ; s/\s//g' file | awk '{ print length($0); }' – Rolf Oct 10 '17 at 07:19

8 Answers

13

GNU awk solution:

awk -v FPAT='[^N[:space:]]' '{ print NF }' file
  • FPAT='[^N[:space:]]' - the pattern defining a field value (any character except N char and whitespace)

The expected output:

1
1
1
0
1
2
2
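
For readers who want to check the logic outside awk, here is a minimal Python sketch of the same idea: count every match of the character class that FPAT describes. The function name count_non_n is hypothetical, and the rows are copied from the question.

```python
import re

# Sample rows copied from the question.
rows = [
    "N W N N N N N N N N N",
    "N C N N N N N N N N N",
    "N A N N N N N N N N N",
    "N N N N N N N N N N N",
    "N G N N N N N N N N N",
    "N C N N N C N N N N N",
    "N C C N N N N N N N N",
]

def count_non_n(line):
    """Count characters that are neither 'N' nor whitespace."""
    # Each match plays the role of one awk field under FPAT='[^N[:space:]]'.
    return len(re.findall(r"[^N\s]", line))

counts = [count_non_n(row) for row in rows]
print(counts)
```

Each regex match corresponds to one field, so the length of the match list plays the role of NF.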
Jeff Schaller
  • 67,283
9
awk '{ gsub("[ N]",""); print length() }'
Hauke Laging
  • 90,279
7

Another awk approach (will return -1 for empty lines).

awk -F'[^N ]' '$0=NF-1""' infile

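Or a more elaborate variant that also ignores tabs: it returns -1 on empty lines and 0 on lines containing only whitespace (tabs/spaces). Note that because of the +, a run of adjacent non-N characters (e.g. CC) is counted once.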

awk -F'[^N \t]+' '$0=NF-1""' infile
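
The separator trick above can be sketched in Python with re.split: the number of separators is one less than the number of pieces. This is only an illustration under assumptions: count_by_split is a hypothetical helper, the "N CC N" line is made-up input that does not occur in the question, and unlike awk this sketch yields 0 rather than -1 for an empty line.

```python
import re

def count_by_split(line, pattern):
    """Number of separator matches = number of pieces minus one (awk's NF-1)."""
    # Caveat: for an empty line re.split returns [''], so this gives 0,
    # whereas awk sets NF to 0 and the one-liner prints -1.
    return len(re.split(pattern, line)) - 1

question_row = "N C C N N N N N N N N"   # row taken from the question
adjacent_row = "N CC N"                  # hypothetical row with adjacent characters

per_char = count_by_split(question_row, r"[^N ]")
per_run = count_by_split(question_row, r"[^N \t]+")

adjacent_per_char = count_by_split(adjacent_row, r"[^N ]")
adjacent_per_run = count_by_split(adjacent_row, r"[^N \t]+")

print(per_char, per_run, adjacent_per_char, adjacent_per_run)
```

On the space-separated data from the question both patterns agree; they only differ when non-N characters sit next to each other.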
αғsнιη
  • 41,407
  • will print -1 for empty lines... but then that might be desirable to distinguish line made up of only N/space vs empty line... – Sundeep Oct 07 '17 at 04:59
  • 1
    @Sundeep Yes, that's correct. Also see my update, where lines containing only tabs or spaces are reported as 0 – αғsнιη Oct 07 '17 at 05:32
7

Assuming the count needed for each line is of characters other than space and N:

$ perl -lne 'print tr/N //c' ip.txt 
1
1
1
0
1
2
2
  • return value of tr is how many characters were replaced
  • c to complement the set of characters given
  • Note the use of -l option, strips newline character from input line to avoid off-by-one error and also adds newline character for the print statement
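
The off-by-one mentioned in the last point can be illustrated with a small Python sketch (count_complement is a hypothetical name): the trailing newline is neither N nor a space, so it is counted unless it is stripped first.

```python
def count_complement(line, ignored="N "):
    """Count characters that are NOT in the ignored set (like tr/N //c)."""
    return sum(1 for ch in line if ch not in ignored)

raw = "N C N N N C N N N N N\n"   # a row from the question, newline included

with_newline = count_complement(raw)             # newline is counted too
stripped = count_complement(raw.rstrip("\n"))    # what -l effectively gives

print(with_newline, stripped)
```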


A more generic solution

perl -lane 'print scalar grep {$_ ne "N"} @F' ip.txt 
  • -a option to automatically split input line on white-spaces, saved in @F array
  • grep {$_ ne "N"} @F returns a list of all elements in @F which don't match the string N
    • regex equivalent would be grep {!/^N$/} @F
  • use of scalar will give number of elements of the array
Sundeep
  • 12,008
6

Alternative awk solution:

awk '{ print gsub(/[^N[:space:]]/,"") }' file
  • gsub(...) - The gsub() function returns the number of substitutions made.

The output:

1
1
1
0
1
2
2
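
Python's re.subn behaves much like gsub() in this respect, returning the number of substitutions alongside the new string. A minimal sketch, with the hypothetical helper name count_via_subn:

```python
import re

def count_via_subn(line):
    """Delete every non-N, non-whitespace character and report how many were deleted."""
    _new_line, n_subs = re.subn(r"[^N\s]", "", line)
    return n_subs

two = count_via_subn("N C C N N N N N N N N")    # row from the question
zero = count_via_subn("N N N N N N N N N N N")   # row from the question

print(two, zero)
```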
5
  1. tr and POSIX shell script:

    tr -d 'N ' < file | while read -r x ; do echo ${#x} ; done
    
  2. bash, ksh, and zsh:

    while read -r x ; do x="${x//[ N]}" ; echo ${#x} ; done < file
    
agc
  • 7,223
  • 1
    can use awk '{print length()}' to avoid the slower shell looping.. but then one could do it all with awk itself... – Sundeep Oct 07 '17 at 04:54
  • @Sundeep, it's true that, if both are started at the same time, awk looping is faster than shell looping. But the shell is always in memory, and awk might not be. When awk is not already loaded, or has been swapped out, the overhead of loading it (the time lost) can be greater than the advantage of running awk, particularly on a small loop. In such cases (i.e. this case), awk can be slower. – agc Oct 07 '17 at 13:08
  • well, am certainly not worried about time for small stuff... see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice – Sundeep Oct 07 '17 at 13:15
  • 1
    @Sundeep, I do worry. Some time ago I used to use floppy based Linux distros, which could run off a floppy, in a few megs of ram. Needlessly using awk in a shell script could make such a system crawl on all fours. Generally: the same latency drag applies to systems in limited firmware, or any system under heavy load. – agc Oct 07 '17 at 13:33
1

A short combination of tr and awk:

$ tr -d ' N' <file.in | awk '{ print length }'
1
1
1
0
1
2
2

This deletes all spaces and Ns from the input file, and awk then prints the length of each remaining line.
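
The same delete-then-measure idea can be sketched in Python with str.translate. The names DROP and remaining_length are hypothetical, and the newline is stripped explicitly because awk does not count it either.

```python
# Translation table that deletes spaces and the letter N, like tr -d ' N'.
DROP = str.maketrans("", "", " N")

def remaining_length(line):
    """Length of the line after deleting spaces and Ns (newline excluded)."""
    return len(line.rstrip("\n").translate(DROP))

one = remaining_length("N W N N N N N N N N N\n")    # row from the question
none = remaining_length("N N N N N N N N N N N\n")   # row from the question

print(one, none)
```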

Kusalananda
  • 333,661
0

Another easy way is to do it in Python, which comes pre-installed in most Unix environments. Put the following code in a .py file:

with open('geno') as f:
    for line in f:
        count = 0
        for word in line.split():
            if word != 'N':
                count += 1
        print(count)

And then do:

python file.py

from your terminal. What the script does:

  • for each line in a file named "geno"
  • set a counter to 0 and increment it each time we find a value != 'N'
  • when the end of the current line is reached, print the counter and go to the next line