Don't use a shell loop to process text. Use a text processing utility.
Here, to capitalise names in the 5th field, if the Lingua::EN::NameCase perl module is available:
perl -Mopen=locale -MLingua::EN::NameCase -F, -ae '
$F[4] = nc $F[4] unless @F < 5;
print join ",", @F' < your-file
If not, as an approximation, you could convert to uppercase the first character of every sequence of one or more alphanumeric ones:
perl -Mopen=locale -F, -ae '
$F[4] =~ s/\w+/\u$&/g unless @F < 5;
print join ",", @F' < your-file
That would however not properly handle names such as McGregor, van Dike... or those with combining characters.
(perl also has proper CSV parsing modules, in case your input is not just the simple CSV without quoting shown in your sample.)
The same can be done with standard awk syntax, but it's a lot more cumbersome:
awk -F, -v OFS=, '
  NF >= 5 {
    r = $5; $5 = ""
    while (match(r, "[[:alnum:]]+")) {
      $5 = $5 substr(r, 1, RSTART - 1) \
              toupper(substr(r, RSTART, 1)) \
              substr(r, RSTART + 1, RLENGTH - 1)
      r = substr(r, RSTART + RLENGTH)
    }
    $5 = $5 r
  }
  {print}' < your-file
Slightly easier with GNU awk and its patsplit() function:
gawk -F, -v OFS=, '
  NF >= 5 {
    n = patsplit($5, f, /[[:alnum:]]+/, s)
    $5 = s[0]
    for (i = 1; i <= n; i++)
      $5 = $5 toupper(substr(f[i], 1, 1)) \
              substr(f[i], 2) s[i]
  }
  {print}' < your-file
If you have to use a shell loop, at least use a shell with a capitalisation operator:
#! /bin/zsh -
while IFS=, read -ru3 -A fields; do
  (( $#fields < 5 )) || fields[5]=${(C)fields[5]}
  print -r -- ${(j[,])fields} || exit
done 3< your-file
Note that that one (and the Lingua::EN::NameCase based one) differs from the other ones in that it turns éric serRA into Éric Serra instead of Éric SerRA for instance. You can achieve the same result in perl by changing \u to \u\L and in awk by applying tolower() to the second part of each word.
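As a minimal sketch of the awk change just described, here is the standard awk version with tolower() applied to the remainder of each word. The function name capitalise_field5 is a hypothetical wrapper introduced only for illustration, and the sample line is ASCII-only so the result does not depend on the locale:

```shell
# capitalise_field5 is a hypothetical helper name, not part of any tool.
# It reads CSV-like lines on stdin and rewrites the 5th field so that each
# alphanumeric word starts in upper case and continues in lower case.
capitalise_field5() {
  awk -F, -v OFS=, '
    NF >= 5 {
      r = $5; $5 = ""
      while (match(r, "[[:alnum:]]+")) {
        $5 = $5 substr(r, 1, RSTART - 1) \
                toupper(substr(r, RSTART, 1)) \
                tolower(substr(r, RSTART + 1, RLENGTH - 1))
        r = substr(r, RSTART + RLENGTH)
      }
      $5 = $5 r   # keep whatever non-alphanumeric text follows the last word
    }
    {print}'
}

printf '%s\n' 'a,b,c,d,eric serRA' | capitalise_field5   # a,b,c,d,Eric Serra
```

Lines with fewer than 5 fields pass through unchanged, as in the other variants.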
If you had to only use bash and its builtin commands as you indicate in comments, that would be a lot more cumbersome (in addition to being inefficient), as bash has very limited operators compared to those of zsh or ksh93 for instance, and its read -a can't read separated values.
That would have to be something like (here assuming bash 4.0+ for the ${var^} operator):
#! /bin/bash -
set -o noglob -o nounset
IFS=,
re='^([^[:alnum:]]*)([[:alnum:]]+)(.*)$'
while IFS= read -ru3 line; do
  # the trailing '' preserves an empty last field when splitting on commas
  fields=( $line'' )
  if (( ${#fields[@]} >= 5 )); then
    rest="${fields[4]}" fields[4]=
    while [[ "$rest" =~ $re ]]; do
      fields[4]="${fields[4]}${BASH_REMATCH[1]}${BASH_REMATCH[2]^}"
      rest="${BASH_REMATCH[3]}"
    done
    # append any non-alphanumeric text left after the last word
    fields[4]="${fields[4]}$rest"
  fi
  printf '%s\n' "${fields[*]}" || exit
done 3< your-file
Those assume that the input is valid text encoded in the user's locale charset (for instance, in a UTF-8 locale, that the é above is encoded in UTF-8 (0xc3 0xa9 bytes), not iso8859-1 or some other charset). The bash (and possibly awk) ones will choke on NUL bytes.
As perl's \w is alnums + underscore, you'll also find a difference for strings like jean_pierre, which perl would capitalise as Jean_pierre while the other ones would capitalise it as Jean_Pierre. You may need to adapt to your specific input (also consider combining characters, which would also put a spanner in the works here). See also the Lingua::EN::NameCase perl module to handle even more special cases.
As for which commands are installed by default on which systems: most systems will have perl (possibly with the Text::CSV module, but likely not the Lingua::EN::NameCase one) and POSIX-compliant awk and sh implementations. Many (even some non-GNU systems) have bash (the GNU shell), and several have GNU awk (though not some GNU-based systems such as Ubuntu, which at least in some versions prefer mawk). Few currently have zsh installed by default.
CentOS, being a GNU system, should have bash and gawk installed by default in addition to perl. bash and gawk even provide sh and awk there.
[...] tail nor awk are builtin in bash. Why would you want to only use builtin tools, especially in bash which is among the least efficient of all shells. – Stéphane Chazelas Jun 22 '21 at 09:54
[...] tail. – Kusalananda Jun 22 '21 at 10:58
[...] Ann Sue Smith? Can you ever have single-word names like Cher? Can you have names like john mcloud that should become John McLoud or sue jones-smith that should become Sue Jones-Smith? Can you ever have unusual names like Elon Musks kid X Æ A-12? If your input can contain anything other than just the most basic names as shown right now then please edit your question to include them in your example. – Ed Morton Jun 22 '21 at 18:46