Bash script to find maximum number of a certain character (".") in any single line of a file

Question

There is a file with an unknown number of lines. Each line contains an unknown number of periods (.).

How can I find the maximum number of periods on any single line? I only need the count, not the line that contains the most periods.

For example: Processing the file content below in bash should give the answer "4".

one.one
two.two.two
three.three.three.three
four..four.
five..five..
six...six

Quite related: How to count the number of a specific character in each line?. Take an answer there, sort the results and get the last line, you have the answer. — Quasímodo, Jul 23 '20 at 12:25
tr -dc '\n.' | sort | tail -n1 | wc -m https://stackoverflow.com/q/8629410 — alecxs, Jul 24 '20 at 07:11

score 3 · Answer 1 · answered Jul 23 '20 at 09:25

You could do it with awk:

awk '{gsub(/[^.]/,""); len=length(); if (len>max) {max=len}} END{printf("Largest count of \".\": %d\n",max)}' file.txt

For every line, this replaces all characters that are not . with nothing (i.e. it removes everything that is not a .). It then takes the length of the remaining string and stores the largest value found so far in max. At end-of-file, it prints the result.
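To see those steps in isolation, here is a minimal sketch that runs the same awk logic against the sample from the question. The function name max_dots and the temporary file are assumptions made for illustration; they are not part of the answer.

```shell
#!/bin/sh
# Hypothetical helper: print the largest number of "." found on any one line.
max_dots() {
    awk '{
        gsub(/[^.]/, "")          # delete every character that is not a dot
        len = length($0)          # what remains is one character per dot
        if (len > max) max = len  # remember the largest count so far
    }
    END { print max + 0 }         # "+ 0" prints 0 instead of an empty line for empty input
    ' "$@"
}

# Sample data from the question, written to a temporary file.
sample=$(mktemp)
cat >"$sample" <<'EOF'
one.one
two.two.two
three.three.three.three
four..four.
five..five..
six...six
EOF

max_dots "$sample"    # prints 4
rm -f "$sample"
```

With no file argument the function reads standard input, so it can also sit at the end of a pipeline.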

score 3 · Answer 2 · answered Jul 23 '20 at 10:08

Alternatively, you can count the number of a specific character, and leave the text unchanged for further processing, such as printing the line itself, or counting another character. gsub returns the number of replacements.

awk '{ nDot = gsub ("[.]", "."); etc .. }'
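A possible way to fill in the elided part, as a sketch only: use the return value of gsub to track the maximum and keep the untouched line for later. The function name, the variable best, and the output format are my own assumptions. The replacement "&" stands for the matched text, so each dot is replaced by itself.

```shell
#!/bin/sh
# Hypothetical helper: print the highest dot count and the line it came from.
# gsub() returns the number of replacements; replacing each dot with "&"
# (the matched text) leaves $0 unchanged.
line_with_most_dots() {
    awk '{
        nDot = gsub(/[.]/, "&")
        if (nDot > max) { max = nDot; best = $0 }
    }
    END { printf "%d: %s\n", max, best }
    ' "$@"
}

printf '%s\n' 'this.world.' 'this' '1.2.3.4.5' 'all.is.done' | line_with_most_dots
# prints "4: 1.2.3.4.5"
```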

score 3 · Answer 3 · answered Jul 23 '20 at 10:08

The awk-less answer:

sed 's/[^.]//g' test.dat | wc -L

In other words, keep only the dots, and use the -L option of wc: -L, --max-line-length: print the maximum display width
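Since wc -L is not specified by POSIX, the same idea can be sketched with awk doing the maximum-length step instead. Note that this swaps in awk's length() for wc -L, so it counts characters rather than display width; for lines that contain only dots the two should agree. The function name is an assumption for illustration.

```shell
#!/bin/sh
# Hypothetical helper: same pipeline idea, without relying on wc -L.
# Step 1 (sed): keep only the dots on each line.
# Step 2 (awk): report the length of the longest remaining line.
max_dots_portable() {
    sed 's/[^.]//g' "$@" |
        awk 'length($0) > max { max = length($0) } END { print max + 0 }'
}

printf '%s\n' 'one.one' 'four..four.' 'five..five..' | max_dots_portable    # prints 4
```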

xenoid

Note that wc -L is a GNU extension. – Stéphane Chazelas Jul 23 '20 at 11:24

Chris Davies · Answer 4 · 2020-07-23T13:12:03.697

Let's generate an example,

cat >file <<'X'
this.world.
this
1.2.3.4.5
all.is.done
X

With perl

perl -e 'while (<>) { $x = $n if ($n = ($_ =~ y/.//)) > $x } print "$x\n"' file
4

With awk

awk '{ gsub("[^.]", ""); if ((n = length($0)) > x) { x = n } } END { print x }' file
4

With tr and a non-POSIX extended version of wc

tr -cd '.\n' <file | wc -L
4

edited Jul 23 '20 at 13:12

answered Jul 23 '20 at 10:41

Chris Davies

The stderr output format of dd is only specified in the POSIX locale, and even there, all it says is it shall be "%u+%u records out\n", <number of whole output blocks>, <number of partial output blocks> (note that leading blanks are also allowed). GNU dd doesn't appear to be compliant in that regard. – Stéphane Chazelas Jul 23 '20 at 11:33
And dd reports bytes, not characters, so it won't work if you want to generalize to any character. Only the awk and the wc -L versions will work on characters encoded in more than one byte. – xenoid Jul 23 '20 at 13:07
Ok. Option removed. Thank you both – Chris Davies Jul 23 '20 at 13:11
The version with tr and wc -L works OK for me (at least with French characters, assuming UTF-8 encoded input file). – xenoid Jul 23 '20 at 13:17
In UTF-8, a byte whose upper bit is 0 can only be a 1-byte character, and every byte of a multi-byte character has its upper bit set to 1, so the ASCII byte for . cannot match any byte of a multi-byte character. – xenoid Jul 23 '20 at 13:21
@xenoid, GNU wc -L reports the display width, not the number of characters. See Get the display width of a string of characters – Stéphane Chazelas Jul 23 '20 at 18:12
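The UTF-8 point in these comments can be checked with a small sketch. Even when tr is forced to work byte by byte (LC_ALL=C), it keeps only the dot and newline bytes, so multi-byte characters on the line should not change the dot count. The helper name and the sample string are assumptions; the sample is written with octal escapes so the bytes are the UTF-8 encodings of "é" and "ü" regardless of how the script is saved.

```shell
#!/bin/sh
# Hypothetical helper: count the dots on the first input line, treating input as raw bytes.
count_dots_bytewise() {
    # The first tr keeps only "." and newline bytes; head takes the first line;
    # the second tr drops the newline so that wc -c counts dots only;
    # the last tr removes the padding some wc implementations print.
    LC_ALL=C tr -cd '.\n' | head -n 1 | tr -d '\n' | wc -c | tr -d ' '
}

# \303\251 is "é" and \303\274 is "ü" in UTF-8; the line contains three dots.
printf '\303\251.\303\274..\n' | count_dots_bytewise    # prints 3
```

This only shows that counting an ASCII character is byte-safe in UTF-8. It says nothing about counting a multi-byte character, where bytes and characters differ.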

Rakesh Sharma · Answer 5 · 2020-07-24T02:22:32.027

One way with awk could be as follows. We need to realize that the following equality holds:

number of fields = number of delimiters + 1

Note that adding 0 to the operand in an arithmetic comparison, even though not always necessary, is a good practice to cultivate. At least it gives me one less thing to think about, because it becomes a reflex when coding. Awk does not provide separate operators for arithmetic and string comparisons, so coercion is needed to make clear whether an operand should be treated as a string or as a number.

$ awk -F '[.]' '
    NF>m+0 {m=NF}
    END {print --m}
' file
4

$ awk '
    { gsub(/[^.]+/, "") }
    ! index(t, $0) { t = $0 }
    END { print length(t) }
' file

$ perl -lne '
    my $k = tr/.//;
    $k > $m and $m = $k;
    }{ print $m+0;
' file

GNU sed can also be used in conjunction with the bc calculator. The idea is to strip each line of all non-dots and keep the longest string of pure dots seen so far in the hold space. At the end of the file, we transform those dots into an expression that bc can evaluate to give the number of dots.

$ sed -Ee '
    s/[^.]+//g;G
    /^(.*)..*\n\1$/!ba
    s/\n.*//;h;:a
    $!d;g;s/./1+/g;s/$/0/
'  file | bc -l
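The last line of that sed script is the part that turns dots into arithmetic. A minimal sketch of just that step, using a hypothetical helper name; here the expression is evaluated with shell arithmetic rather than bc, purely so the sketch does not depend on bc being installed.

```shell
#!/bin/sh
# Hypothetical helper: turn a run of dots into an arithmetic expression.
# Each character becomes "1+", and a trailing 0 closes the sum.
dots_to_expr() {
    printf '%s\n' "$1" | sed 's/./1+/g; s/$/0/'
}

expr_text=$(dots_to_expr '....')
echo "$expr_text"          # prints 1+1+1+1+0, the text that would be piped to bc
echo "$(( $expr_text ))"   # prints 4, evaluated by the shell instead of bc
```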

Could you please add an explanation? And is m+0 really needed there? — Quasímodo, Jul 23 '20 at 12:27
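On whether m+0 is required here: as far as I understand POSIX awk, an uninitialized variable has both the string value "" and the numeric value 0, and NF is numeric, so NF>m is already a numeric comparison. A quick sketch comparing the two forms on the same input (the helper names are assumptions, and m - 1 is used in place of --m only for readability):

```shell
#!/bin/sh
# Hypothetical helpers: the field-count approach with and without "+ 0".
with_coercion()    { awk -F '[.]' 'NF > m + 0 { m = NF } END { print m - 1 }' "$@"; }
without_coercion() { awk -F '[.]' 'NF > m     { m = NF } END { print m - 1 }' "$@"; }

input=$(printf '%s\n' 'this.world.' 'this' '1.2.3.4.5' 'all.is.done')
printf '%s\n' "$input" | with_coercion       # prints 4
printf '%s\n' "$input" | without_coercion    # prints 4
```

Both forms agree on this input. The coercion becomes relevant when m could hold a string value, which is not the case in this script.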

score 0 · Answer 6 · answered Jul 24 '20 at 09:53

JAAOV (Just another awk obfuscating variant...)

awk 'gsub(/[^.]/,"") { print | "wc -L" }'

JJoao
