Consider the following awk script, duplicates.awk:
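#!/usr/bin/awk -f
BEGIN {
    # Accept CR LF, LF CR, CR, or LF as the record (line) separator.
    RS = "(\r\n|\n\r|\r|\n)"
    # Fields are separated by ';' together with any surrounding whitespace.
    FS = "[\t\v\f ]*;[\t\v\f ]*"
    # Portable way to make sure the count array starts out empty.
    split("", count)
}
{
    # Tally how many times each third-field value occurs.
    count[$3]++
}
END {
    # Print every value seen more than once (in unspecified order).
    for (item in count) {
        if (count[item] > 1)
            printf "%s\n", item
    }
}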
Remember to make it executable, for example with chmod a+rx duplicates.awk. You can either pipe the input to the command, or supply one or more input files as command-line parameters (multiple files are treated as if they were concatenated into a single file).
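A minimal sketch of both invocation styles, assuming a POSIX shell, mktemp, and awk installed at /usr/bin/awk (the path the shebang line expects). To keep it self-contained it writes a copy of the script into a temporary directory; the sample files part1.txt and part2.txt are hypothetical. The second run shows that a value split across two files still counts as a duplicate.

```shell
#!/bin/sh
# Sketch only: recreate duplicates.awk in a scratch directory and run it
# both ways. The sample file names are made up for this illustration.
set -e
workdir=$(mktemp -d)

cat > "$workdir/duplicates.awk" <<'EOF'
#!/usr/bin/awk -f
BEGIN {
    RS = "(\r\n|\n\r|\r|\n)"
    FS = "[\t\v\f ]*;[\t\v\f ]*"
    split("", count)
}
{
    count[$3]++
}
END {
    for (item in count) {
        if (count[item] > 1)
            printf "%s\n", item
    }
}
EOF
chmod a+rx "$workdir/duplicates.awk"

# Two input files: "alpha" occurs twice in the first, "beta" once in each.
printf 'a;1;alpha\nb;2;beta\nc;3;alpha\n' > "$workdir/part1.txt"
printf 'd;4;beta\ne;5;gamma\n' > "$workdir/part2.txt"

# 1) Pipe the input in on standard input.
piped=$(printf 'a;1;alpha\nb;2;beta\nc;3;alpha\n' | "$workdir/duplicates.awk")

# 2) Pass files as arguments; awk reads them as one continuous stream.
#    sort is only used to make the unspecified for-in order repeatable.
from_files=$("$workdir/duplicates.awk" "$workdir/part1.txt" "$workdir/part2.txt" | sort)

printf 'piped:\n%s\nfiles:\n%s\n' "$piped" "$from_files"
```

The piped run reports only alpha, while the run over both files reports alpha and beta.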
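The BEGIN rule sets up universal newlines (that is, it accepts every newline convention: CR LF from MS-DOS and Windows, CR from classic Mac OS, and LF from Unix) and makes the semicolon (;) the field separator. A regular-expression RS like this one is supported by GNU awk and mawk; POSIX leaves the behavior of a multi-character RS unspecified, so other awk implementations may not handle it. For illustration, I made the field separator also consume any whitespace surrounding it, so that "x;foo bar ; y" parses into three fields: "x", "foo bar", and "y".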
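To see the field separator at work, here is a small sketch that prints each field in brackets so that any leftover spaces are visible. The helper name split_fields is hypothetical; it simply passes the separator regex to awk via -v and compares the whitespace-eating separator with a plain ";".

```shell
#!/bin/sh
# Sketch: split one line with a given awk FS and print the fields.
# split_fields is a hypothetical helper used only for this illustration.
#   $1 = field-separator regex, $2 = the line to split
split_fields() {
    printf '%s\n' "$2" | awk -v fs="$1" '
        BEGIN { FS = fs }
        {
            # Brackets make surrounding spaces visible.
            for (i = 1; i <= NF; i++)
                printf "%d: [%s]\n", i, $i
        }'
}

# Separator from duplicates.awk: whitespace around ";" is consumed.
split_fields '[\t\v\f ]*;[\t\v\f ]*' 'x;foo bar ; y'
# Plain ";" for comparison: the spaces stay inside the fields.
split_fields ';' 'x;foo bar ; y'
```

The first call yields [x], [foo bar], [y]; the second yields [x], [foo bar ], [ y].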
The record rule (the middle part of the script) is applied to every record (line) of the input. Because awk supports associative arrays, we simply use the third field, a string, as a key into the count array and increment that entry by one. (An array entry that does not exist yet is treated as 0, so the first increment yields 1, and the code works as you'd expect.)
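A quick way to inspect what the record rule builds is to dump the whole count array instead of filtering it. This is a sketch with a hypothetical helper name, tally_third; the output is piped through sort only to make the unspecified for-in order repeatable.

```shell
#!/bin/sh
# Sketch: print "value<TAB>count" for every distinct third-field value.
# tally_third is a hypothetical name used only for this illustration.
tally_third() {
    awk 'BEGIN { FS = "[\t\v\f ]*;[\t\v\f ]*" }
         { count[$3]++ }   # a missing entry starts at 0, so the first ++ gives 1
         END {
             for (item in count)
                 printf "%s\t%d\n", item, count[item]
         }' "$@" | LC_ALL=C sort
}

printf 'a;1;red\nb;2;blue\nc;3;red\nd;4;red\n' | tally_third
```

Here red is tallied three times and blue once.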
The END rule scans the count array and prints the entries that occurred at least twice. Note that the output order is unspecified: for (item in count) visits the keys in whatever order the awk implementation happens to store them. (There are ways to sort the output by the number of occurrences, or even to keep the original order of first occurrences in the file, but the OP did not mention any ordering requirement, so I didn't bother; unspecified order is the simplest to implement.)
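If you do need the original order of first occurrences, one common approach is a second array that records each new value's position. This is a sketch of that extension, not part of the script above; the helper name dups_in_order is hypothetical.

```shell
#!/bin/sh
# Sketch: report duplicated third-field values in order of first occurrence.
# dups_in_order is a hypothetical name used only for this illustration.
dups_in_order() {
    awk 'BEGIN { FS = "[\t\v\f ]*;[\t\v\f ]*" }
         {
             # Remember the position of each value the first time it is seen.
             if (!($3 in count))
                 order[++n] = $3
             count[$3]++
         }
         END {
             # Walk the values in first-seen order instead of using for-in.
             for (i = 1; i <= n; i++)
                 if (count[order[i]] > 1)
                     printf "%s\n", order[i]
         }' "$@"
}

printf 'a;1;zeta\nb;2;alpha\nc;3;zeta\nd;4;mid\ne;5;alpha\n' | dups_in_order
```

This prints zeta before alpha, because zeta appears first in the input.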
If you want to print, for example, the number of occurrences followed by the string (the value from the third column), use the following END rule instead:
END {
    # Print "count value" for every distinct third-field value.
    for (item in count)
        printf "%15d %s\n", count[item], item
}
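The output is formatted so that the first fifteen characters of each line are reserved for the right-aligned number, followed by a space, so the value starts at the 17th character. Unlike the first END rule, this one lists every distinct value, including those that occur only once; put the if (count[item] > 1) test back if you only want the duplicates.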
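Because the count comes first, the output can be handed to sort(1) to rank values by frequency. A minimal sketch, assuming a POSIX sort; the helper name count_third is hypothetical, and ties are broken alphabetically by the value.

```shell
#!/bin/sh
# Sketch: count third-field values and list the most frequent first.
# count_third is a hypothetical name used only for this illustration.
count_third() {
    awk 'BEGIN { FS = "[\t\v\f ]*;[\t\v\f ]*" }
         { count[$3]++ }
         END {
             for (item in count)
                 printf "%15d %s\n", count[item], item
         }' "$@" | LC_ALL=C sort -k1,1nr -k2   # numeric, descending; then by value
}

printf 'a;1;red\nb;2;blue\nc;3;red\nd;4;green\ne;5;red\nf;6;blue\n' | count_third
```

Numeric sorting skips the leading blanks that %15d adds, so red (3) comes first, then blue (2), then green (1).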
"find More duplicate" is unclear... if you can add a better sample with duplicates and expected output, it will help – Sundeep Oct 22 '16 at 06:03
awk -F';' '{print $3}' ip.txt or cut -d';' -f3 ip.txt – Sundeep Oct 22 '16 at 06:07
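Building on the cut suggestion, a common alternative pipeline is to extract the third column and let sort and uniq -d report the repeated values. This is a sketch of that swapped-in technique, not the awk script's method; the helper name dup_column3 is hypothetical, and unlike duplicates.awk it does not trim whitespace around the semicolons or handle CR line endings.

```shell
#!/bin/sh
# Sketch: third column via cut, then sort + uniq -d prints one copy of each
# value that occurs more than once. dup_column3 is a hypothetical name.
dup_column3() {
    cut -d';' -f3 "$@" | sort | uniq -d
}

printf 'a;1;red\nb;2;blue\nc;3;red\nd;4;blue\ne;5;green\n' | dup_column3
```

uniq -d only detects adjacent repeats, which is why the column is sorted first.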