How to find the most common name in passwd file

Question

My /etc/passwd has a list of users in a format that looks like this:

username:password:uid:gid:firstname.lastname, somenumber:/...

Goal: I want to see only the first names and then sort them so that the most common name appears first, the second most common appears second, etc.

I saw some solutions for the second part, although they apply to working with a text file rather than reading from a map.

Regarding the first part, I really don't know how to approach this. I know there are some solutions, but I don't really know how to apply them.

score 6 · Accepted Answer · answered Aug 09 '16 at 07:51


One way to do it:

cut -d: -f5 /etc/passwd | \
    sed 's/\..*//' | \
    sort -f | \
    uniq -ci | \
    sort -rn
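To make each stage visible, here is a minimal sketch of the same pipeline fed from a few hypothetical sample lines instead of /etc/passwd. The sample data and the function name `first_name_counts` are illustrative assumptions, not part of the answer. It uses `sort -f`, which is the case-folding option in both POSIX and GNU sort (`sort -i` means "ignore nonprinting characters" in GNU sort), because `uniq -ci` only merges lines that are adjacent and equal ignoring case. `uniq -i` is not POSIX, but GNU and BSD uniq both accept it.

```shell
#!/bin/sh
# Hypothetical sample lines in the format from the question
# (username:password:uid:gid:firstname.lastname, somenumber:...).
sample='jdoe:x:1001:1001:John.Doe, 101:/home/jdoe:/bin/sh
jsmith:x:1002:1002:john.Smith, 102:/home/jsmith:/bin/sh
mbrown:x:1003:1003:Mary.Brown, 103:/home/mbrown:/bin/sh'

# Hypothetical helper: reads passwd-format lines on stdin and
# prints "count name", most common first.
first_name_counts() {
    cut -d: -f5 |    # field 5, e.g. John.Doe, 101
    sed 's/\..*//' | # drop the first dot and everything after it, leaving John
    sort -f |        # fold case so John and john sort next to each other
    uniq -ci |       # count adjacent lines that are equal ignoring case
    sort -rn         # highest count first
}

counts=$(printf '%s\n' "$sample" | first_name_counts)
printf '%s\n' "$counts"
```

With this sample it should report a count of 2 for the John/john group (which spelling uniq keeps can depend on the locale) and 1 for Mary. Because the function only reads standard input, `ypcat passwd | first_name_counts` would work the same way.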


Satō Katsura


Great answer, but I think he'll need to use uniq without -i, since X and x in a name should be treated as different; we only need the --ignore-case option for sort, as you've used. Also, the sed command you've added seems irrelevant; if there is a reason for it, please explain. – Aug 09 '16 at 08:07
@FarazX Re: -i: John.Doe should be the same as john.doe. Re: sed: from the OP: I want to see only the first names. – Satō Katsura Aug 09 '16 at 08:13
Oh you're right, sorry I didn't notice. So voila! Thanks for your explanation, and your great way of using cut ;) – Aug 09 '16 at 08:14
cut + sed is too much: sed '/\n/{P;d};s/:/\n/4;s/\./\n/;D' or sed 's/[^.]*:\(\w\+\).*/\1/' – Costas Aug 09 '16 at 08:23
@Costas Too much compared to what? For me, total time spent thinking about getting the 5th field portably with sed >> the time gained by not using cut. BTW, your second recipe assumes GNU sed (\w). – Satō Katsura Aug 09 '16 at 08:31
@SatoKatsura The above is just an example. If you like, you can do the same as in your script: sed 's/[^.]*://;s/\..*//'. But my first example is a little quicker. And if you don't like \w, you are free to use [:alnum:] – Costas Aug 09 '16 at 08:38
@Costas sed 's/[^.]*://;s/\..*//' misses any names without a dot. The point of using cut is precisely to avoid going into this kind of detail, you know. – Satō Katsura Aug 09 '16 at 08:44
@SatoKatsura If you insist: s/\([^:]*:\)\{4\}//;s/[:.].*// In any case, if you involve sed, you can easily avoid cut. – Costas Aug 09 '16 at 09:46
Can you briefly explain how this command works (specifically the sed and cut parts)? – asaf92 Aug 09 '16 at 13:29
And by the way, on my system I don't have access to /etc/passwd; I have to type ypcat passwd to read it. – asaf92 Aug 09 '16 at 13:31
@PanthersFan92 cut extracts the 5th field, sed kills the .lastname, somenumber part out of it. You can, of course, do it like this: ypcat passwd | cut -d: -f5 | .... – Satō Katsura Aug 09 '16 at 13:48
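The comment exchange above can be checked on a couple of hypothetical passwd-style lines, one of which (`root`) has no dot in its fifth field. This sketch compares the cut + sed extraction from the accepted answer with a sed-only variant along the lines of Costas's last comment. The sample data and the variable names `via_cut` and `via_sed` are illustrative assumptions. The `\{4\}` interval is a POSIX basic regular expression, so this should not depend on GNU sed.

```shell
#!/bin/sh
# Hypothetical sample: one entry with a dotted name, one without a dot.
sample='jdoe:x:1001:1001:John.Doe, 101:/home/jdoe:/bin/sh
root:x:0:0:root:/root:/bin/sh'

# cut + sed: take field 5, then drop the first dot and what follows.
via_cut=$(printf '%s\n' "$sample" | cut -d: -f5 | sed 's/\..*//')

# sed only: strip the first four colon-terminated fields,
# then cut at the first remaining colon or dot.
via_sed=$(printf '%s\n' "$sample" | sed 's/\([^:]*:\)\{4\}//;s/[:.].*//')

printf '%s\n' "$via_cut"
printf '%s\n' "$via_sed"
```

Both variables should hold `John` followed by `root`. Since both forms read standard input, `ypcat passwd` can replace the sample, as the last comment suggests.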

John1024 · Answer 2 · 2016-08-09T08:15:43.430

Using awk and sorting to have the most common name first:

awk -F: '{sub(/[.].*/, "", $5); a[$5]++} END{for (n in a)print a[n],n}' /etc/passwd | sort -nr

For a case-insensitive version:

awk -F: '{sub(/[. ,].*/, "", $5); a[tolower($5)]++} END{for (n in a)print a[n],n}' /etc/passwd | sort -nr

For those who prefer their commands spread over multiple lines:

awk -F: '
  {
    sub(/[.].*/, "", $5)
    a[$5]++
  }

  END{
    for (n in a)
      print a[n],n
  }
  ' /etc/passwd | sort -nr

How it works

-F:

This makes : the field separator.

sub(/[.].*/, "", $5)

This removes the first period and everything after it from field 5, leaving only the first name.

a[$5]++

The count for the number of times this name has appeared is stored in associative array a. This increments the counter. For the case-insensitive version, this is replaced with a[tolower($5)]++.

END{for (n in a)print a[n],n}

This prints the count and name for all the results that we have in array a.

sort -nr

This sorts the output numerically in descending order.
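To show what the output looks like, here is a sketch that feeds the case-insensitive one-liner from this answer a few hypothetical passwd-style lines on standard input instead of reading /etc/passwd. The sample data and the variable name `counts` are illustrative assumptions; the awk program is the one given above.

```shell
#!/bin/sh
# Hypothetical sample lines in the format from the question.
sample='jdoe:x:1001:1001:John.Doe, 101:/home/jdoe:/bin/sh
jsmith:x:1002:1002:john.Smith, 102:/home/jsmith:/bin/sh
mbrown:x:1003:1003:Mary.Brown, 103:/home/mbrown:/bin/sh'

# Same awk program as the case-insensitive version, reading stdin;
# sort -nr is still needed because "for (n in a)" has no defined order.
counts=$(printf '%s\n' "$sample" |
    awk -F: '{sub(/[. ,].*/, "", $5); a[tolower($5)]++} END{for (n in a) print a[n], n}' |
    sort -nr)

printf '%s\n' "$counts"
```

This should print `2 john` and then `1 mary`: John and john are folded into one key by tolower before counting.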

FWIW, with GNU awk 4+ you can set PROCINFO["sorted_in"]="@val_num_desc" (in the END block, before the for loop) and drop the separate sort. – dave_thompson_085, Oct 27 '19 at 02:46
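A minimal sketch of that suggestion, assuming gawk 4 or later is installed as `gawk`. PROCINFO["sorted_in"] is gawk-specific, so the script falls back to a message when gawk is missing. The sample data is hypothetical, and the awk program is the first one-liner from this answer with the external sort replaced by the scanning order.

```shell
#!/bin/sh
# Hypothetical sample lines in the format from the question.
sample='jdoe:x:1001:1001:John.Doe, 101:/home/jdoe:/bin/sh
jsmith:x:1002:1002:John.Smith, 102:/home/jsmith:/bin/sh
mbrown:x:1003:1003:Mary.Brown, 103:/home/mbrown:/bin/sh'

counts=''
if command -v gawk >/dev/null 2>&1; then
    # gawk 4+ only: this PROCINFO setting makes the for-in loop visit elements
    # by numeric value, largest first, so no external sort is needed.
    counts=$(printf '%s\n' "$sample" | gawk -F: '
        { sub(/[.].*/, "", $5); a[$5]++ }
        END {
            PROCINFO["sorted_in"] = "@val_num_desc"
            for (n in a) print a[n], n
        }')
    printf '%s\n' "$counts"
else
    echo 'gawk not found; fall back to awk ... | sort -nr' >&2
fi
```

With gawk available this should print `2 John` followed by `1 Mary`. The setting has to be in place before the loop starts, which is why it sits at the top of the END block.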
