I have a bunch of CSV files that I'm importing into a database. I'd like to get a preview of the unique values in each column to help me create the tables. I've written a script that takes an input CSV file and an output text file, and I want it to write each column header and that column's unique values to the output file. Here are some of the criteria I haven't been able to implement:
- I want to skip columns that are all numbers, but allow strings that contain numbers, like "Unit 7".
- I want to skip strings that are only whitespace, like ' ', but allow strings that contain spaces, like "Unit 7".
- I don't want timestamp or time objects like.
#!/usr/bin/env bash
set -o errexit
set -o nounset

main() {
    # ${1:-} and ${2:-} keep nounset from aborting before the checks run.
    if [[ ${1:-} != *.csv ]] ; then
        echo "${1:-} is not a csv file"
        exit 1
    elif [[ -z ${2:-} ]] ; then
        echo "Usage: univals <csvfile.csv> <outputfile.txt>"
        exit 1
    else
        # The input is tab-delimited: read one header per array element.
        mapfile -t headers < <(head -n 1 "$1" | tr '\t' '\n')
        header_length=${#headers[@]}
        for ((i = 1 ; i <= header_length ; i++)) ; do
            # This code facilitates printing unique values on one line: https://stackoverflow.com/questions/19274695/sorting-on-same-line-bash
            # The grep filter drops every value that contains a digit or whitespace.
            mapfile -t b < <(cut -f "$i" "$1" | grep -v '[0-9]\|\s' | sort -u)
            echo "${headers[i-1]}" >> "$2"
            printf "%s " "${b[@]}" >> "$2"
            # End the line so the next header starts on its own line.
            printf "\n" >> "$2"
        done
    fi
}
main "$@"
The grep filter has helped me skip the numbers, but it has clearly taken a toll on everything else: any value that has a number in it or a space in it gets dropped too. Thanks in advance for any help/advice.
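
For reference, here is a minimal sketch of a looser filter that might satisfy the criteria listed above. It rests on an assumption: that "keep values containing at least one letter" is close enough to what I want. That keeps "Unit 7", and drops pure numbers, whitespace-only strings, and digits-and-punctuation times such as 21:47:00. It would not drop timestamps that contain letters (a month name or AM/PM, say). The function name `unique_text_values` and the `|` separator are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the unique values of column $1 from the
# tab-delimited file $2 on one line, separated by '|'.
# Assumption: a value is worth keeping only if it contains a letter.
unique_text_values() {
    local col=$1 file=$2
    cut -f "$col" "$file" |
        tail -n +2 |              # skip the header row
        grep '[[:alpha:]]' |      # keep only values with at least one letter
        sort -u |                 # unique values, sorted
        paste -s -d '|' -         # join them on a single line
}
```

In the loop above, something like `unique_text_values "$i" "$1" >> "$2"` could replace the `cut | grep -v | sort -u` pipeline. A separator other than a space is used because the values themselves may contain spaces.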
sort -u
– TAAPSogeking Jul 27 '22 at 21:47