How can I print UTF-8 symbols on a terminal using bash commands?
This works
echo -e '\U2586'
But the following is failing
printf '%s\n' $(tput setaf 118) "\\u2586" $(tput sgr0)
In bash, if you want printf to expand backslash escape sequences in arguments after the format string, you should use %b instead of %s in the format string:
printf '%b\n' "$(tput setaf 118)" "\u2586" "$(tput sgr0)"
Since you have three arguments, perhaps this might be more appropriate:
printf '%s%b%s\n' "$(tput setaf 118)" "\u2586" "$(tput sgr0)"
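To make the difference between the two specifiers concrete, here is a minimal sketch. The variable names are only for illustration, and a tab escape is used for the %b case so that the result does not depend on the locale:

```shell
#!/usr/bin/env bash
# %s prints its argument verbatim: the backslash sequence stays as text.
verbatim=$(printf '%s' '\u2586')

# %b expands backslash escapes found in its argument.
expanded=$(printf '%b' 'a\tb')

printf 'with %%s: %s\n' "$verbatim"
printf 'with %%b: %s\n' "$expanded"

# The escape from the question; what this prints depends on the shell
# version and on the charset of the current locale.
printf '%b\n' '\u2586'
```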
As Stéphane Chazelas pointed out, this will output the encoding of the U+2586 character in the current locale’s character set. If that’s UTF-8, the result will be UTF-8; other character sets will differ. If the character set can’t represent U+2586, the result will be the string “\u2586” (zsh will fail with a “character not in range” error instead).
This produces the behaviour you want in most cases: if possible, it displays “▆”. If you really want to output the UTF-8 representation of the character, in all cases, you can force that by overriding the locale, e.g.
LC_ALL= LC_CTYPE=en_US.UTF-8 printf '%s%b%s\n' "$(tput setaf 118)" "\u2586" "$(tput sgr0)"
(See What is the difference between LANG=C and LC_ALL=C? for an explanation of the variable settings used above.)
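One way to check which bytes actually reach the terminal is to pipe the output through od. The sketch below assumes that od and xargs are available and that the en_US.UTF-8 locale is installed; hex_bytes is a hypothetical helper name, not a standard command:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print stdin as space-separated hex bytes on one line.
hex_bytes() {
  od -An -tx1 | xargs
}

# The UTF-8 encoding of U+2586 written as octal escapes, so this line
# gives the same bytes in any locale: e2 96 86
printf '\342\226\206' | hex_bytes

# What printf emits for the \u escape in the current locale.
printf '%b' '\u2586' | hex_bytes

# The same with the charset forced to UTF-8 for this one command
# (assumes en_US.UTF-8 exists on the system).
LC_ALL= LC_CTYPE=en_US.UTF-8 printf '%b' '\u2586' | hex_bytes
```

If the last two lines print different bytes, the current locale is not using UTF-8.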
printf %b '\u2586' will print the U+2586 (▆) character in the locale charset (0xe2 0x96 0x86 if it's UTF-8, 0xa8 0x7d if it's GB18030, etc) or as \u2586 if the locale has no such character. For it to print it UTF-8 encoded regardless of the user's locale, you'd need LC_CTYPE=en_US.UTF-8 printf... (or any other locale available on the system that uses UTF-8 as the charmap and assuming $LC_ALL is not otherwise set to a non-empty string).
– Stéphane Chazelas Aug 10 '21 at 09:59
always keyword. This kind of issue is what is currently holding POSIX from finalising the specification of $'...'.
– Stéphane Chazelas Aug 10 '21 at 12:30
Note that support for \uxxxx and \UXXXXXXXX was first added to the GNU implementation of the printf standalone utility in 2000, but like other escape sequences, they are only recognised in the format argument or in arguments for %b specifiers, not in arguments for %s, which displays the strings verbatim.
They were later added to the printf builtin of zsh in 2003 (and also for echo/print and $'...' there), ksh93 in 2004, bash in 2010 (4.2) and probably a few more since.
That's not standard though. There is a plan for POSIX to specify the $'...' form of quotes from ksh93 and for them to allow \u/\U sequences, but one of the blocking points at the moment is how to handle the expansion if the charset in the current locale was not UTF-8 at the time the command using those quotes was parsed and/or run.
Still, if your script was started in a locale where the charset is UTF-8 and you have not changed the locale (the LC_CTYPE, LC_ALL and LANG variables) since, using $'\uxxxx' is probably the most portable way to get the UTF-8 encoding of a character based on its Unicode codepoint.
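A script that relies on this can first check that the locale charset really is UTF-8. The following is a rough sketch, assuming the locale utility is available; is_utf8_charmap is a hypothetical helper, and the list of spellings it accepts is a guess at what different systems report:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given charmap name denotes UTF-8.
is_utf8_charmap() {
  case $1 in
    UTF-8|utf-8|UTF8|utf8) return 0 ;;
    *) return 1 ;;
  esac
}

# "locale charmap" reports the charset selected by the current locale.
if is_utf8_charmap "$(locale charmap 2>/dev/null)"; then
  printf '%s\n' $'\u2586'
else
  echo 'current locale is not UTF-8; \u escapes may not yield UTF-8' >&2
fi
```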
Use $'\UXXXXXXXX' for characters with codepoint above 0xFFFF. Note that some shells do require all 4 digits for \u and all 8 digits for \U. So for maximum portability, use $'St\u00E9phane' or $'St\U000000E9phane' for Stéphane, for instance. In any shell, you'll need $'St\u00E9fan' for Stéfan, as $'St\ue9fan' would be treated as St, U+E9FA and n, since f and a are hexadecimal digits. With shells that do support $'\ue9' you can also do $'St\ue9'$'fan' or St$'\ue9'fan and mix-and-match quote operators.
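The digit-counting pitfall can be checked directly. This is a small sketch for bash 4.2 or later (a shell that accepts fewer than four digits after \u); the variable names are made up for the example:

```shell
#!/usr/bin/env bash
# All four digits given: the escape ends before "fan".
padded=$'St\u00E9fan'

# Short form, with the quotes closed before the next hex-looking letter.
split=$'St\ue9'$'fan'

# Short form left open: "f" and "a" are consumed as digits of the escape.
greedy=$'St\ue9fan'

printf '%s\n' "$padded" "$split" "$greedy"

if [ "$padded" = "$split" ] && [ "$padded" != "$greedy" ]; then
  echo 'padded and split forms agree; the greedy form differs'
fi
```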
Then you can pass those expansions to any command, printf or otherwise.
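Applied to the command from the question, that could look like the sketch below in bash. It assumes the script is parsed in a UTF-8 locale; wrap_colour is a hypothetical helper, and the tput calls fall back to empty strings when no terminal type is set:

```shell
#!/usr/bin/env bash
# Expanded by the shell when this line is parsed, not by printf.
block=$'\u2586'

# Hypothetical helper: print text between a colour-on and a reset sequence.
# %s is enough here because no argument needs further escape expansion.
wrap_colour() {
  printf '%s%s%s\n' "$1" "$2" "$3"
}

# tput can fail when TERM is unset, so fall back to empty strings.
green=$(tput setaf 118 2>/dev/null || true)
reset=$(tput sgr0 2>/dev/null || true)

wrap_colour "$green" "$block" "$reset"
```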
In your particular case, you could use zsh's print builtin as:
print -P '%F{118}\u2586%f'
Where -P enables prompt expansion, in which the foreground colour can be set without having to run the tput command. Or:
print -rP '%F{118}'$'\u2586''%f'
Where -r disables the escape sequences, and the U+2586 character is passed literally to print using the $'...' form of quotes.
Or:
arbitrary_text=$'\u2586'' arbitrary text with \backslash and % characters'
print -r -- ${(%):-%F{118}}$arbitrary_text${(%):-%f}
Where print doesn't do any expansion, but the colour escape sequences are generated by the % parameter expansion flag, and the U+2586 character is stored verbatim in the variable.
print -r
printf "%s\n" ☺
– ctrl-alt-delor Aug 10 '21 at 09:10