I have a comma-separated csv file with 50 lines. One column is for state names and the other column is for capitals (of the states). How do you make a loop where it counts the number of tokens (2, 3, 4) from those two columns together and groups the result into an array? Is it possible to keep track of how many such states there are while doing this?
2 Answers
This solution uses awk instead. I understood from the question that the output should contain only the names of the states. The previous answer provided an output that was more useful and OP accepted that answer, so this script follows the same format with the same dataset.
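{
    x = $0                # keep the original "State,Capital" line as the key
    gsub(/,/, " ", $0)    # commas become spaces, so $0 is re-split on whitespace
    a[x] = NF             # number of tokens on this line
}
END {
    # count how many lines share each token count
    for (key in a) {
        counter[a[key]] += 1
    }
    # note: awk does not guarantee the iteration order of "for (c in counter)"
    for (c in counter) {
        print counter[c] " values with " c " tokens:"
        for (key in a) {
            if (c == a[key]) {
                print "\t" key
            }
        }
    }
}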
32 values with 2 tokens:
Oregon,Salem
Virginia,Richmond
Montana,Helena
Florida,Tallahassee
Ohio,Columbus
Delaware,Dover
Nebraska,Lincoln
California,Sacramento
Wisconsin,Madison
Alaska,Juneau
Texas,Austin
Tennessee,Nashville
Hawaii,Honolulu
Maryland,Annapolis
Idaho,Boise
Illinois,Springfield
Wyoming,Cheyenne
Georgia,Atlanta
Connecticut,Hartford
Arizona,Phoenix
Indiana,Indianapolis
Colorado,Denver
Mississippi,Jackson
Washington,Olympia
Kentucky,Frankfort
Vermont,Montpelier
Maine,Augusta
Michigan,Lansing
Kansas,Topeka
Alabama,Montgomery
Massachusetts,Boston
Pennsylvania,Harrisburg
16 values with 3 tokens:
South Dakota,Pierre
New Hampshire,Concord
Arkansas,Little Rock
North Carolina,Raleigh
North Dakota,Bismarck
Louisiana,Baton Rouge
Oklahoma,Oklahoma City
New York,Albany
Nevada,Carson City
Iowa,Des Moines
South Carolina,Columbia
Rhode Island,Providence
New Jersey,Trenton
Minnesota,St. Paul
Missouri,Jefferson City
West Virginia,Charleston
2 values with 4 tokens:
Utah,Salt Lake City
New Mexico,Santa Fe
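If only the per-count totals are needed (the question's "how many such states"), the tally step can be sketched on its own. This is a minimal sketch that reuses the same gsub/NF idea; the function name token_counts and the four sample lines are made up for illustration, and it assumes one "State,Capital" pair per line with no stray leading or trailing spaces.

```shell
#!/bin/bash
# token_counts: hypothetical helper, not part of either answer.
# Reads "State,Capital" lines on stdin and prints "<tokens> <how many lines>".
token_counts() {
    awk '{ gsub(/,/, " "); n[NF]++ }              # commas -> spaces, then tally NF
         END { for (k in n) print k, n[k] }' |
    sort -n                                       # awk for-in order is unspecified
}

# Sample input (an assumption; substitute the real CSV file).
token_counts <<'EOF'
Texas,Austin
New York,Albany
Utah,Salt Lake City
Ohio,Columbus
EOF
```

Run against the full file (for example `token_counts < 'State Capitals.csv'`, assuming that file name), the second column should match the totals in the group headers above.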

With State Capitals.csv along the lines of:
Alabama,Montgomery
Alaska,Juneau
Arizona,Phoenix
...
West Virginia,Charleston
Wisconsin,Madison
Wyoming,Cheyenne
The following Bash script (version 4+) does what you're asking (assuming I understand what you're asking):
#!/bin/bash -e
export PATH=/bin:/sbin:/usr/bin:/usr/sbin
declare -A a    # key: token count; value: "declare -p" dump of array b
declare -i i j
while IFS=, read -r state capital; do
    # token count = number of spaces in "State Capital" + 1
    i=$(( $( echo "$state $capital" | tr -cd ' ' | wc -c ) + 1 ))
    if [[ -z ${a[$i]} ]]; then
        declare -a b=()
    else
        eval "${a[$i]}"    # restore b for this token count
    fi
    b+=("$state|$capital")
    a[$i]=$( declare -p b )
# quoted so the newlines survive on Bash versions before 4.4
done <<< "$( sort 'State Capitals.csv' )"
for i in $( IFS=$'\n'; echo "${!a[*]}" | sort -n ); do
    echo "The following \"state capital\" strings have $i tokens:"
    eval "${a[$i]}"
    for (( j = 0; j < ${#b[@]}; ++j )); do
        echo "${b[$j]}"
    done \
    | column -ts '|' \
    | sed -re 's/^/ /'
done
The first loop populates an associative array (a), its indices being the number of words in "State Capital", and its values being a string representation of arrays containing "State|Capital" entries (stringified using declare -p).
The second loop iterates through the sorted keys of a, uses eval to load a's values (stringified with declare -p) into array b, and iterates through b.
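The declare -p / eval round trip is the least obvious part of the script, so here is a minimal sketch of just that technique in isolation. The helper names add_entry and print_group and the sample entries are hypothetical (they do not appear in the script above), and Bash 4+ is assumed for associative arrays.

```shell
#!/bin/bash
# Bash has no nested arrays, so each inner array is stored as the text
# that "declare -p" prints and is rebuilt later with eval.
declare -A groups    # key: token count; value: output of "declare -p b"

add_entry() {        # hypothetical helper: append entry $2 to the group keyed by $1
    local key=$1 entry=$2
    local -a b=()
    if [[ -n ${groups[$key]:-} ]]; then
        eval "${groups[$key]}"    # re-creates local array b from its dump
    fi
    b+=("$entry")
    groups[$key]=$(declare -p b)  # serialise the updated array again
}

print_group() {      # hypothetical helper: print one entry per line for key $1
    local key=$1
    local -a b=()
    if [[ -n ${groups[$key]:-} ]]; then
        eval "${groups[$key]}"
    fi
    printf '%s\n' "${b[@]}"
}

add_entry 2 'Texas|Austin'
add_entry 3 'New York|Albany'
add_entry 2 'Ohio|Columbus'
print_group 2
```

Because eval executes whatever string it is given, this pattern is only reasonable when the stored strings come from declare -p itself, as they do here and in the script above.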

-
If it's appropriate to ask here, why was this question and this answer down-voted? Hoping to become a helpful contributor here. – ahi324 Jul 26 '22 at 23:29
-
I didn't down vote your answer, but I'm pretty sure the reason was due to the use of shell loops, which some people frown upon reflexively. In fact, I just wrote an answer to provide a more nuanced view of this issue here: https://unix.stackexchange.com/a/711524/29793. In my opinion, your script is a bit complex for my taste, but gives a very nice output and solves all the problems requested by OP, so I upvoted this answer anyway. – r_31415 Jul 27 '22 at 20:13
-
@r_31415 Thank you for the feedback, and I very much like your criteria for shell loops. – ahi324 Jul 27 '22 at 20:24
-
@ahi324 I tried to upvote & then I chose your answer as the best one despite not understanding your code since it was too advanced for me but it met all the checkmarks like r_31415 said – usuallystuck Jul 30 '22 at 11:41
awk '{ print NF }' would output the number of whitespace-delimited words on each line of input. What type of array are you needing? An array in some shell script language, or a list in XML or JSON? – Kusalananda Jul 26 '22 at 10:47