-1

I have a comma-separated CSV file with 50 lines. One column holds state names and the other holds the capitals of those states. How can I write a loop that counts the number of tokens (2, 3, 4) in the two columns together and groups the results into an array? Is it also possible to keep track of how many states fall into each group while doing this?

Kusalananda
  • 333,661
  • 1
    An example of the input and the expected output would be good to see. What is a token? If you just want to count words per line, then awk '{ print NF }' would output the number of whitespace-delimited words on each line of input. What type of array do you need? An array in some shell script language, or a list in XML or JSON? – Kusalananda Jul 26 '22 at 10:47

2 Answers

1

This solution uses awk instead. I understood from the question that the output should contain only the names of the states. However, the previous answer provided an output that was more useful and the OP accepted that answer, so this script follows the same format with the same dataset.

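# For every input line: remember the original "State,Capital" text and
# the number of whitespace-separated words it contains.
{
    x = $0                  # original line, used as the array key
    gsub(/,/, " ", $0)      # changing $0 makes awk re-split the fields
    a[x] = NF               # NF is now the word count of state + capital
}

END {
    # Count how many lines share each token count.
    for (key in a) { counter[a[key]] += 1 }

    # Print each group with its members.
    for (c in counter) {
        print counter[c] " values with " c " tokens:"
        for (key in a) {
            if (c == a[key]) {
                print "\t" key
            }
        }
    }
}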

32 values with 2 tokens:
    Oregon,Salem
    Virginia,Richmond
    Montana,Helena
    Florida,Tallahassee
    Ohio,Columbus
    Delaware,Dover
    Nebraska,Lincoln
    California,Sacramento
    Wisconsin,Madison
    Alaska,Juneau
    Texas,Austin
    Tennessee,Nashville
    Hawaii,Honolulu
    Maryland,Annapolis
    Idaho,Boise
    Illinois,Springfield
    Wyoming,Cheyenne
    Georgia,Atlanta
    Connecticut,Hartford
    Arizona,Phoenix
    Indiana,Indianapolis
    Colorado,Denver
    Mississippi,Jackson
    Washington,Olympia
    Kentucky,Frankfort
    Vermont,Montpelier
    Maine,Augusta
    Michigan,Lansing
    Kansas,Topeka
    Alabama,Montgomery
    Massachusetts,Boston
    Pennsylvania,Harrisburg
16 values with 3 tokens:
    South Dakota,Pierre
    New Hampshire,Concord
    Arkansas,Little Rock
    North Carolina,Raleigh
    North Dakota,Bismarck
    Louisiana,Baton Rouge
    Oklahoma,Oklahoma City
    New York,Albany
    Nevada,Carson City
    Iowa,Des Moines
    South Carolina,Columbia
    Rhode Island,Providence
    New Jersey,Trenton
    Minnesota,St. Paul
    Missouri,Jefferson City
    West Virginia,Charleston
2 values with 4 tokens:
    Utah,Salt Lake City
    New Mexico,Santa Fe
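If only the totals per token count are needed, a shorter variant might be enough. The following is a minimal sketch rather than part of the script above: the function name count_tokens is made up for illustration, and the trailing sort -n is there because awk does not guarantee any particular order for a for (key in array) loop.

```shell
#!/bin/sh
# count_tokens (hypothetical helper): reads "State,Capital" lines on stdin
# and prints "<token count> <number of lines>" pairs, sorted numerically.
count_tokens() {
    awk '
        {
            gsub(/,/, " ")      # default target is $0, so fields are re-split
            count[NF]++         # NF = words in state name + capital name
        }
        END {
            for (n in count) print n, count[n]
        }
    ' | sort -n
}

# Usage with a four-line sample instead of the full 50-line file.
printf '%s\n' \
    'Utah,Salt Lake City' \
    'Ohio,Columbus' \
    'New York,Albany' \
    'Texas,Austin' | count_tokens
# prints:
# 2 2
# 3 1
# 4 1
```

With the real file, something like count_tokens < 'State Capitals.csv' should give one line per token count.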

r_31415
  • 516
0

With State Capitals.csv along the lines of:

Alabama,Montgomery
Alaska,Juneau
Arizona,Phoenix
...
West Virginia,Charleston
Wisconsin,Madison
Wyoming,Cheyenne

The following Bash script (version 4+) does what you're asking (assuming I understand what you're asking):

#!/bin/bash -e

export PATH=/bin:/sbin:/usr/bin:/usr/sbin

declare -A a
declare -i i j

# First loop: group "State|Capital" entries by their word count.
while IFS=, read -r state capital; do
    # Word count = number of spaces in "State Capital" + 1.
    i=$(( $( echo "$state $capital" | tr -cd ' ' | wc -c ) + 1 ))
    if [[ -z ${a[$i]} ]]; then
        declare -a b=()
    else
        eval "${a[$i]}"
    fi
    b+=("$state|$capital")
    a[$i]=$( declare -p b )
# Quoted so that older Bash versions do not collapse the newlines.
done <<< "$( sort 'State Capitals.csv' )"

# Second loop: print each group in ascending order of word count.
for i in $( IFS=$'\n'; echo "${!a[*]}" | sort -n ); do
    echo "The following \"state capital\" strings have $i tokens:"
    eval "${a[$i]}"
    for (( j = 0; j < ${#b[@]}; ++j )); do
        echo "${b[$j]}"
    done \
        | column -ts '|' \
        | sed -re 's/^/ /'
done

The first loop populates an associative array (a), its indices being the number of words in "State Capital", and its values being a string representation of arrays containing "State|Capital" entries (stringified using declare -p).

The second loop iterates through the sorted keys of a, uses eval to load each of a's values (stringified with declare -p) back into array b, and then iterates through b.
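Since Bash has no nested arrays, the declare -p / eval round trip is what lets an associative array "contain" other arrays. Here is a minimal sketch of just that step, assuming Bash 4 or later; the names groups and members are made up for illustration and do not appear in the script above. Note that eval is only reasonable here because the string being evaluated was produced by declare -p itself.

```shell
#!/bin/bash
# Hypothetical names: "groups" and "members" are for illustration only.
declare -A groups

# Build an ordinary indexed array and store its declaration as a string.
members=("Ohio|Columbus" "Texas|Austin")
groups[2]=$(declare -p members)

# Clobber the working array, then re-create it from the stored string.
members=()
eval "${groups[2]}"

# The original elements are back.
printf '%s\n' "${members[@]}"
```

The stored value is simply the text of a declare command, so echoing "${groups[2]}" is a quick way to inspect what a group currently holds.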

ahi324
  • 157
  • 5
  • If it's appropriate to ask here, why was this question and this answer down-voted? Hoping to become a helpful contributor here. – ahi324 Jul 26 '22 at 23:29
  • I didn't down vote your answer, but I'm pretty sure the reason was due to the use of shell loops, which some people frown upon reflexively. In fact, I just wrote an answer to provide a more nuanced view of this issue here: https://unix.stackexchange.com/a/711524/29793. In my opinion, your script is a bit complex for my taste, but gives a very nice output and solves all the problems requested by OP, so I upvoted this answer anyway. – r_31415 Jul 27 '22 at 20:13
  • 1
    @r_31415 Thank you for the feedback, and I very much like your criteria for shell loops. – ahi324 Jul 27 '22 at 20:24
  • 1
    @ahi324 I tried to upvote & then I chose your answer as the best one despite not understanding your code since it was too advanced for me but it met all the checkmarks like r_31415 said – usuallystuck Jul 30 '22 at 11:41