How to count the number of characters in a line, except a specific character?

Question

This is part of the file:

N W N N N N N N N N N
N C N N N N N N N N N
N A N N N N N N N N N
N N N N N N N N N N N
N G N N N N N N N N N
N C N N N C N N N N N
N C C N N N N N N N N

For each line, I want to count the total number of characters that are not "N".

My desired output:

1
1
1
0
1
2
2

Use sed to delete the characters you don't care about and awk to count the remaining length: sed 's/N//g ; s/\s//g' file | awk '{ print length($0); }' – Rolf, Oct 10 '17 at 07:19

score 13 · Accepted Answer · edited Oct 06 '17 at 21:52

GNU awk solution:

awk -v FPAT='[^N[:space:]]' '{ print NF }' file

FPAT='[^N[:space:]]': the pattern defining what a field value looks like (any single character other than N and whitespace). Each such character becomes its own field, so NF is the count.

The expected output:

1
1
1
0
1
2
2
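For comparison, here is a rough Python analogue of the FPAT idea, assuming Python 3 is available. Each match of the character class counts as one "field", so the number of matches plays the role of NF. The helper name count_fields is hypothetical, and Python's \s only approximates POSIX [:space:].

```python
import re

# Hypothetical helper: one "field" per character that is neither N nor whitespace,
# mirroring FPAT='[^N[:space:]]'; the number of matches plays the role of NF.
FIELD = re.compile(r"[^N\s]")


def count_fields(line: str) -> int:
    return len(FIELD.findall(line))


sample = """N W N N N N N N N N N
N C N N N N N N N N N
N A N N N N N N N N N
N N N N N N N N N N N
N G N N N N N N N N N
N C N N N C N N N N N
N C C N N N N N N N N"""

for line in sample.splitlines():
    print(count_fields(line))  # prints 1, 1, 1, 0, 1, 2, 2 on separate lines
```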

edited by Jeff Schaller · answered Oct 06 '17 at 20:45 by RomanPerekhrest

score 9 · Answer 2 · answered Oct 06 '17 at 20:48

awk '{ gsub("[ N]",""); print length() }' file
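The same delete-then-measure idea in Python, as a minimal sketch (the helper name is hypothetical). Lines read from a file keep their trailing newline, unlike an awk record, so the newline is stripped first to avoid counting it.

```python
import re


def count_after_delete(line: str) -> int:
    # Hypothetical helper. Drop the trailing newline (awk's $0 never contains it),
    # then, like gsub("[ N]", ""), delete every space and N and measure what is left.
    return len(re.sub(r"[ N]", "", line.rstrip("\n")))


print(count_after_delete("N C N N N C N N N N N\n"))  # 2
```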

answered by Hauke Laging

can also use awk '{print gsub(/[^ N]/,"")}' – Sundeep Oct 07 '17 at 04:47

score 7 · Answer 3 · edited Oct 07 '17 at 05:31

Another awk approach (will return -1 for empty lines).

awk -F'[^N ]' '$0=NF-1""' infile

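Or, more generally, the following also treats tabs as whitespace: it returns -1 for empty lines and 0 for lines containing only whitespace (tabs/spaces). Because of the +, a run of adjacent non-N characters with no space between them (e.g. CC) is counted once.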

awk -F'[^N \t]+' '$0=NF-1""' infile
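A Python sketch of the separator-counting trick, assuming the same regex as the second command (the helper name is hypothetical). awk reports NF as 0 for an empty record, while re.split returns one empty string, so that case is handled explicitly to reproduce the -1.

```python
import re

# Separator: a run of characters that are not N, space or tab (like -F'[^N \t]+').
SEP = re.compile(r"[^N \t]+")


def count_separators(line: str) -> int:
    # Hypothetical helper mirroring NF-1.
    # awk gives NF == 0 for an empty record, hence -1; re.split("") would
    # return [''] (one element), so the empty case is special-cased.
    if line == "":
        return -1
    return len(SEP.split(line)) - 1


for line in ["", "  \t", "N C C N", "NCCN"]:
    print(count_separators(line))  # -1, 0, 2, 1
```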

answered Oct 06 '17 at 21:30 by αғsнιη

Will print -1 for empty lines... but that might be desirable, to distinguish a line made up of only N/spaces from an empty line... – Sundeep Oct 07 '17 at 04:59

@Sundeep Yes, that's correct. Also see my update, which prints 0 for lines that contain only tabs or spaces. – αғsнιη Oct 07 '17 at 05:32

Answer 4 · by Sundeep · Dec 13 '17 at 04:38

Assuming the count is needed for every character on each line other than the space character and N:

$ perl -lne 'print tr/N //c' ip.txt 
1
1
1
0
1
2
2

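The return value of tr is the number of characters it matched; with an empty replacement list, the line itself is left unchanged.
The c modifier complements the given set of characters, so everything except N and space is counted.
Note the use of the -l option: it strips the newline character from each input line (otherwise the newline would be counted too, an off-by-one error) and also adds a newline to each print statement.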
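A minimal Python analogue of tr/N //c, assuming the line may still carry its newline; rstrip("\n") takes the place of perl's -l. The function and its excluded parameter are hypothetical names.

```python
def count_complement(line: str, excluded: str = "N ") -> int:
    # Hypothetical helper mirroring tr/N //c: count every character NOT in `excluded`.
    # Stripping the newline plays the role of perl's -l; without it the
    # newline itself would be counted (the off-by-one mentioned above).
    line = line.rstrip("\n")
    return sum(1 for ch in line if ch not in excluded)


print(count_complement("N C C N N N N N N N N\n"))  # 2
```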

A more generic solution

perl -lane 'print scalar grep {$_ ne "N"} @F' ip.txt

The -a option automatically splits each input line on whitespace and saves the fields in the @F array.
grep {$_ ne "N"} @F returns all elements of @F that are not equal to the string N
- the regex equivalent would be grep {!/^N$/} @F
scalar puts grep in scalar context, where it returns the number of matching elements instead of the list.
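The character-based and field-based counts only differ when a field is longer than one character. A small Python comparison, using a hypothetical input line that is not part of the question's data (helper names are also hypothetical):

```python
def count_chars(line: str) -> int:
    # Character-based, like tr/N //c: every non-N, non-space character counts.
    return sum(1 for ch in line if ch not in "N ")


def count_non_n_fields(line: str) -> int:
    # Field-based, like perl -lane 'print scalar grep {$_ ne "N"} @F':
    # split on whitespace and count fields that are not exactly "N".
    return sum(1 for field in line.split() if field != "N")


line = "N CC N G"  # hypothetical line with a two-character field
print(count_chars(line), count_non_n_fields(line))  # 3 2
```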

score 6 · Answer 5 · answered Oct 06 '17 at 21:05

Alternative awk solution:

awk '{ print gsub(/[^N[:space:]]/,"") }' file

gsub(...): the gsub() function returns the number of substitutions made, i.e. the number of characters that are neither N nor whitespace.

The output:

1
1
1
0
1
2
2
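Python's re.subn also returns the number of substitutions, so the same counting idea can be sketched as follows (hypothetical helper name; \s approximates [:space:]):

```python
import re


def count_substitutions(line: str) -> int:
    # re.subn returns (new_string, number_of_substitutions), much like the
    # count that awk's gsub() returns; only the count is kept here.
    _, n = re.subn(r"[^N\s]", "", line)
    return n


print(count_substitutions("N C C N N N N N N N N"))  # 2
```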

answered by RomanPerekhrest

score 5 · Answer 6 · edited Oct 07 '17 at 13:39

tr and POSIX shell script:

tr -d 'N ' < file | while read -r x ; do echo "${#x}" ; done

bash, ksh, and zsh:

while read -r x ; do x="${x//[ N]}" ; echo "${#x}" ; done < file
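Both loops delete N and space, then take the length. In Python, str.translate with a deletion table does roughly what tr -d 'N ' does; strip() stands in for the leading/trailing whitespace trimming that read performs with the default IFS. A sketch with hypothetical names:

```python
# Deletion table for str.translate: maps ' ' and 'N' to None, like tr -d 'N '.
DELETE_N_AND_SPACE = str.maketrans("", "", "N ")


def remaining_length(line: str) -> int:
    # Hypothetical helper: delete N and space, then roughly mimic `read`
    # by trimming leading/trailing whitespace (including the newline).
    return len(line.translate(DELETE_N_AND_SPACE).strip())


print(remaining_length("N C N N N C N N N N N\n"))  # 2
```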

answered Oct 07 '17 at 02:19 by agc

Can use awk '{print length()}' to avoid the slower shell loop... but then one could do it all in awk itself... – Sundeep Oct 07 '17 at 04:54
@Sundeep, it's true (if both are started at the same time) that awk looping is faster than shell looping. But the shell is always in memory, and awk might not be: when awk is not already loaded, or has been swapped out, the overhead of loading it can outweigh the advantage of running awk, particularly on a small loop. In such cases (i.e. this case), awk can be slower. – agc Oct 07 '17 at 13:08
Well, I'm certainly not worried about time for small stuff... see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice – Sundeep Oct 07 '17 at 13:15
@Sundeep, I do worry. I used to use floppy-based Linux distros, which could run off a floppy in a few megabytes of RAM. Needlessly using awk in a shell script could make such a system crawl. More generally, the same latency drag applies to systems running from limited firmware, or to any system under heavy load. – agc Oct 07 '17 at 13:33

score 1 · Answer 7 · answered Oct 08 '17 at 08:30

A short combination of tr and awk:

$ tr -d ' N' <file.in | awk '{ print length }'
1
1
1
0
1
2
2

This deletes all spaces and Ns from the input file, and awk then prints the length of each remaining line.

answered by Kusalananda

score 0 · Answer 8 · answered Oct 07 '17 at 11:15

Another easy way is to do it in Python, which comes preinstalled in most Unix environments. Put the following code in a .py file:

with open('geno') as f:
    for line in f:
        count = 0
        for word in line.split():
            if word != 'N':
                count += 1
        print(count)

Then run it from your terminal:

python file.py

What the above does:

for each line in a file named "geno"
set a counter to 0 and increment it each time we find a value != 'N'
when the end of the current line is reached, print the counter and go to the next line
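A variant of the same script that takes file names from the command line instead of hard-coding 'geno', as a sketch (the script name count_non_n.py and the helper names are hypothetical). With no arguments it does nothing.

```python
import sys
from typing import List


def count_non_n(line: str) -> int:
    # Count whitespace-separated values that are not exactly "N".
    return sum(1 for word in line.split() if word != "N")


def counts_for_file(path: str) -> List[int]:
    # One count per line of the file, in order.
    with open(path) as f:
        return [count_non_n(line) for line in f]


if __name__ == "__main__":
    # File names come from the command line instead of the hard-coded 'geno'.
    for path in sys.argv[1:]:
        for count in counts_for_file(path):
            print(count)
```

Usage: python count_non_n.py geno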
