I have a file with the following lines:
1 a
2 a
3 a
1 b
2 b
1 c
2 c
3 c
4 c
1 d
I want to get the result as:
a 1 2 3
b 1 2
c 1 2 3 4
d 1
Using awk:
awk '{ group[$2] = (group[$2] == "" ? $1 : group[$2] OFS $1 ) }
END { for (group_name in group) print group_name, group[group_name] }' inputfile
This stores the groups in an array called group. This array is indexed on the group name (the second column in the input data), and for each line of input from inputfile, the value in the first column is appended to the correct group.
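The END block loops over all collected groups and outputs the group name and the entries of that group. The order in which for (group_name in group) visits the groups is unspecified, so pipe the output through sort if you need the groups sorted by name.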
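To try the program without creating a file, you can feed the sample data on standard input. This is a minimal sketch: it wraps the one-liner in a shell function whose name, group_by_second_column, is hypothetical, and pipes the result through sort so the groups come out in a fixed order.

```shell
# Hypothetical helper wrapping the one-liner above. awk's
# "for (name in array)" loop visits keys in an unspecified order,
# so the result is piped through sort to get a fixed group order.
group_by_second_column() {
    awk '{ group[$2] = (group[$2] == "" ? $1 : group[$2] OFS $1) }
    END { for (group_name in group) print group_name, group[group_name] }' "$@" | sort
}

# With no file arguments, awk reads the sample input from standard input.
printf '%s\n' '1 a' '2 a' '3 a' '1 b' '2 b' '1 c' '2 c' '3 c' '4 c' '1 d' |
    group_by_second_column
```

This prints the four lines shown in the question, a 1 2 3 through d 1.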
This awk program with a nicer layout:
{
    group[$2] = (group[$2] == "" ? $1 : group[$2] OFS $1)
}
END {
    for (group_name in group)
        print group_name, group[group_name]
}
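If you would rather keep the groups in the order in which their names first appear in the input (one of the questions raised in the comments under the question), one option is to record each new name in a second, numerically indexed array. This variant is not part of the original answer; the function name group_in_input_order and the array name order are hypothetical.

```shell
# Hypothetical variant: remember the order in which group names first
# appear, then print the groups in that order in the END block.
group_in_input_order() {
    awk '
        !($2 in group) { order[++n] = $2 }   # first time this name is seen
        { group[$2] = (group[$2] == "" ? $1 : group[$2] OFS $1) }
        END { for (i = 1; i <= n; i++) print order[i], group[order[i]] }
    ' "$@"
}

printf '%s\n' '1 c' '1 a' '2 c' '2 a' | group_in_input_order
# prints "c 1 2" and then "a 1 2"
```

The membership test runs before the assignment on each line, so a name is added to order only on its first occurrence.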
Note that this is not what you'd want to do if you have massive amounts of data, as the group array will end up storing all the input data read from the file.
For huge amounts of data, assuming the input is sorted on the group names (the second column), use:
awk '$2 != group_name { if (group != "") print group_name, group; group = ""; group_name = $2 }
{ group = (group == "" ? $1 : group OFS $1) }
END { if (group != "") print group_name, group }' inputfile
This keeps track of the current group and collects the data for that group. Whenever the value in the second column changes, it outputs the collected group data and starts collecting data for the new group. This means that only the data for one group is stored at any time, rather than the whole input data set.
This last awk program with a nicer layout:
$2 != group_name {
    if (group != "")
        print group_name, group
    group = ""
    group_name = $2
}
{
    group = (group == "" ? $1 : group OFS $1)
}
END {
    # Output last group (only), if there was any data at all.
    if (group != "")
        print group_name, group
}
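The streaming version relies on all lines of a group being contiguous. If your input is not already arranged that way, a minimal sketch is to sort it on the second column first (and numerically on the first, so the numbers also come out in order) and pipe the result into the same script. The function name group_sorted_stream is hypothetical.

```shell
# Hypothetical helper: sort so each group's lines are contiguous
# (by column 2, then numerically by column 1), then run the
# streaming script from the answer unchanged.
group_sorted_stream() {
    sort -k2,2 -k1,1n "$@" |
    awk '$2 != group_name { if (group != "") print group_name, group; group = ""; group_name = $2 }
    { group = (group == "" ? $1 : group OFS $1) }
    END { if (group != "") print group_name, group }'
}

# Unsorted sample: the "a" lines are not contiguous.
printf '%s\n' '2 a' '1 b' '1 a' '2 b' '3 a' | group_sorted_stream
# prints "a 1 2 3" and then "b 1 2"
```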
Try this:
for i in $(awk '!a[$2]++ { print $2 }' file.txt)
do
    echo "$i $(awk -v z="$i" '$2 == z { print $1 }' file.txt | tr '\n' ' ')"
done
awk '!a[$2]++ { print $2 }' prints each unique value of column 2 once, the first time it appears. $2==z{print $1} prints the first column of every line whose second column equals the variable z.
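The two awk programs can be tried separately. This sketch wraps each in a hypothetical shell function and also shows a variant of the loop that reads the unique values with while read instead of word-splitting $(...), joining the numbers with paste so there is no trailing space. It still starts one awk per group.

```shell
# Hypothetical helper names; each wraps one of the awk programs above.

# a[$2]++ evaluates to the count seen so far *before* incrementing it,
# so !a[$2]++ is true only on the first line with a given column-2 value.
unique_col2() { awk '!a[$2]++ { print $2 }' "$1"; }

# Print column 1 of every line in file $2 whose column 2 equals $1.
col1_where_col2_is() { awk -v z="$1" '$2 == z { print $1 }' "$2"; }

# Read the unique values line by line (no word splitting or globbing)
# and join each group's numbers with single spaces.
group_with_loop() {
    unique_col2 "$1" | while IFS= read -r i; do
        printf '%s %s\n' "$i" "$(col1_where_col2_is "$i" "$1" | paste -s -d ' ' -)"
    done
}

sample=$(mktemp)
printf '%s\n' '1 a' '2 a' '3 a' '1 b' '2 b' > "$sample"
group_with_loop "$sample"    # prints "a 1 2 3" and then "b 1 2"
rm -f "$sample"
```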
for i in `awk '!a[$2]++ { print $2}' a.txt`; do echo "$i `awk -v z=$i '$2==z{print $1}' a.txt | tr '\n' ' '`"; done
awk: syntax error near line 1 awk: bailing out near line 1
– syed Mar 18 '19 at 12:09
awk for each unique value in column 2, even though the task at hand could be done with a single awk script. (This is related to Why is using a shell loop to process text considered bad practice?) Also, using for i in $(foo) isn't good either: it breaks in case of whitespace (which the columns admittedly can't contain here), but also on glob characters. (See: Why is looping over find's output bad practice?)
– ilkkachu Mar 18 '19 at 12:16
for i in `gawk '!a[$2]++ { print $2}' a.txt`; do echo "$i `gawk -v z=$i '$2==z{print $1}' a.txt | tr '\n' ' '`"; done
a 1 2 3 b 1 2 c 1 2 3 4 d 1
– syed Mar 18 '19 at 12:18
Command:
for i in a b c d; do echo $i; awk -v i="$i" '$2 == i{print $1}' filename | perl -pne "s/\n/ /g"; echo " " | perl -pne "s/ /\n/g"; done | sed '/^$/d' | sed "N;s/\n/ /g"
Output:
for i in a b c d; do echo $i; awk -v i="$i" '$2 == i{print $1}' l.txt | perl -pne "s/\n/ /g"; echo " " | perl -pne "s/ /\n/g"; done | sed '/^$/d' | sed "N;s/\n/ /g"
a 1 2 3
b 1 2
c 1 2 3 4
d 1
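In this pipeline each group name and its numbers end up on alternating lines (the blank lines produced by the echo/perl step are removed by sed '/^$/d'), and the final sed 'N;s/\n/ /g' joins each pair into one line. A small sketch of just that last step, using a hypothetical function name join_pairs:

```shell
# Hypothetical helper wrapping the last stage of the pipeline above.
# N appends the next input line to the pattern space, separated by an
# embedded newline; s/\n/ /g replaces that newline with a space, so
# every two input lines come out as one.
join_pairs() { sed 'N;s/\n/ /g'; }

# Mimics what reaches the final sed: a group name, then its numbers
# (with the trailing space left behind by the perl stage).
printf '%s\n' 'a' '1 2 3 ' 'b' '1 2 ' | join_pairs
```

The trailing space after each group's numbers survives into the final output.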
2 a, what should the output look like? Are the input lines always sorted already (by the 2nd column, followed by the 1st column)? In the output, should the numbers be sorted, or do you want them to appear in the same order as in the input data? – Bodo Mar 18 '19 at 11:44
1 a 2 a 3 a 1 b 2 b 1 c 2 c 3 c 4 c 1 d? Or is it multiple lines 1 a, 2 a, etc.? – Chris Davies Mar 18 '19 at 11:57