
How can I print UTF-8 symbols on a terminal using bash commands?

This works:

echo -e '\U2586'

But the following fails:

printf '%s\n' $(tput setaf 118) "\\u2586" $(tput sgr0)
Isabel

2 Answers


In bash, if you want printf to expand backslash escape sequences in arguments after the format string, you should use %b instead of %s in the format string:

printf '%b\n' "$(tput setaf 118)" "\u2586" "$(tput sgr0)"

Since you have three arguments and only the middle one contains an escape sequence, this might be more appropriate:

printf '%s%b%s\n' "$(tput setaf 118)" "\u2586" "$(tput sgr0)"
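To see the difference between the two specifiers in isolation, here is a minimal sketch. The variable names are only for illustration, and the last line assumes bash 4.2 or later running in a UTF-8 locale.

```shell
# %s prints its argument verbatim; %b interprets backslash escapes in it.
verbatim=$(printf '%s' 'a\tb')   # four characters: a, backslash, t, b
expanded=$(printf '%b' 'a\tb')   # three characters: a, TAB, b

printf 'with %%s: %s (%d characters)\n' "$verbatim" "${#verbatim}"
printf 'with %%b: %s (%d characters)\n' "$expanded" "${#expanded}"

# The same rule applies to \u escapes: %s leaves them alone, %b expands them.
printf '%s\n' '\u2586'   # prints the six characters \u2586
printf '%b\n' '\u2586'   # assumption: bash 4.2+ and a UTF-8 locale, where this prints the block character
```

The \t case is used for the character counts because it behaves the same in every locale, whereas the result of \u2586 depends on the locale's character set.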

As Stéphane Chazelas pointed out, this will output the encoding of the U+2586 character in the current locale’s character set. If that’s UTF-8, the result will be UTF-8; other character sets will differ. If the character set can’t represent U+2586, the result will be the string “\u2586” (zsh will fail with a “character not in range” error instead).

This produces the behaviour you want in most cases: if possible, it displays “▆”. If you really want to output the UTF-8 representation of the character, in all cases, you can force that by overriding the locale, e.g.

LC_ALL= LC_CTYPE=en_US.UTF-8 printf '%s%b%s\n' "$(tput setaf 118)" "\u2586" "$(tput sgr0)"

(See What is the difference between LANG=C and LC_ALL=C? for an explanation of the variable settings used above.)
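If no UTF-8 locale is installed, one alternative (not part of the approach above, just a sketch) is to spell out the UTF-8 bytes yourself with octal escapes in the printf format, which does not depend on the locale at all. The bytes 0xe2 0x96 0x86 are the UTF-8 encoding of U+2586; the variable names are hypothetical.

```shell
# U+2586 is E2 96 86 in UTF-8, which is 342 226 206 in octal.
block=$(printf '\342\226\206')
printf '%s\n' "$block"

# Dump the bytes in hex to confirm what was actually written.
hex=$(printf '%s' "$block" | od -An -tx1 | tr -d ' \n')
printf '%s\n' "$hex"   # e29686
```

The drawback is that you have to work out the encoding by hand, so this is mostly useful for a handful of fixed characters.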

Stephen Kitt
  • Technically bash (4.2 or above)'s printf %b '\u2586' will print the U+2586 (▆) character in the locale charset (0xe2 0x96 0x86 if it's UTF-8, 0xa8 0x7d if it's GB18030, etc) or as \u2586 if the locale has no such character. For it to print it UTF-8 encoded regardless of the user's locale, you'd need LC_CTYPE=en_US.UTF-8 printf... (or any other locale available on the system that uses UTF-8 as the charmap and assuming $LC_ALL is not otherwise set to a non-empty string). – Stéphane Chazelas Aug 10 '21 at 09:59
  • Thanks @Stéphane, I’ve added that information to the answer. Do you know if it’s possible to disable the “character not in range” behaviour in zsh? – Stephen Kitt Aug 10 '21 at 12:27
  • I don't think it is, though you can at least catch the exception using the always keyword. This kind of issue is what is currently holding POSIX back from finalising the specification of $'...'. – Stéphane Chazelas Aug 10 '21 at 12:30

Note that support for \uxxxx and \UXXXXXXXX was first added to the GNU implementation of the standalone printf utility in 2000 but, as with other escape sequences, they are only recognised in the format argument or in arguments for %b specifiers, not in arguments for %s, which displays strings verbatim.

They were later added to the printf builtin of zsh in 2003 (and also for echo/print and $'...' there), ksh93 in 2004, bash in 2010 (4.2) and probably a few more since.

That's not standard though. There is a plan for POSIX to specify the $'...' form of quotes from ksh93 and for them to allow \u/\U sequences, but one of the blocking points at the moment is how to handle the expansion if the charset in the current locale is not UTF-8 at the time the command using those quotes is parsed and/or run.

Still, if your script was started in a locale where the charset is UTF-8 and you have not changed the locale (the LC_CTYPE, LC_ALL and LANG variables) since, using $'\uxxxx' is probably the most portable way to get the UTF-8 encoding of a character based on its Unicode codepoint.
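If you want a script to check that precondition rather than assume it, a rough sketch using the POSIX locale utility could look like this. The variable names and the fallback value "unknown" are just for illustration, and the list of spellings matched is an assumption about what systems report.

```shell
# Ask for the character set of the current locale.
charmap=$(locale charmap 2>/dev/null) || charmap=unknown

# Assumption: UTF-8 locales report one of these spellings.
case $charmap in
  UTF-8|UTF8|utf-8|utf8) utf8_locale=yes ;;
  *)                     utf8_locale=no ;;
esac

printf 'charmap=%s utf8=%s\n' "$charmap" "$utf8_locale"
```

A script could then fall back to plain ASCII output when utf8_locale is no.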

Use $'\UXXXXXXXX' for characters with a codepoint above 0xFFFF. Note that some shells require all 4 digits for \u and all 8 digits for \U, so for maximum portability, use $'St\u00E9phane' or $'St\U000000E9phane' for Stéphane, for instance. In any shell, you'll need $'St\u00E9fan' for Stéfan, as $'St\ue9fan' would be treated as St, U+E9FA, n since f and a are hexadecimal digits. With shells that do support $'\ue9', you can also do $'St\ue9'$'fan' or St$'\ue9'fan and mix and match quote operators.
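The padding rules can be checked with a few comparisons. This is a sketch that assumes a shell with $'...' and \u/\U support, such as bash 4.2 or later; the variable names are made up for the example.

```shell
padded4=$'St\u00E9phane'       # \u with all 4 hex digits
padded8=$'St\U000000E9phane'   # \U with all 8 hex digits

stefan=$'St\u00E9fan'          # padding stops \u from swallowing the following "fa"
split=$'St\u00E9'$'fan'        # closing the quotes right after the escape also works
greedy=$'St\ue9fan'            # \u takes up to 4 hex digits, so this is U+E9FA then "n"

[ "$padded4" = "$padded8" ] && printf '%s\n' 'the \u and \U spellings match'
[ "$stefan" = "$split" ]    && printf '%s\n' 'the padded and split spellings match'
[ "$greedy" != "$stefan" ]  && printf '%s\n' 'the unpadded spelling is a different string'
```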

Then you can pass those expansions to any command, printf or otherwise.
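Applied to the command from the question, that could look like the sketch below. It assumes bash 4.2 or later in a UTF-8 locale; the variable names are hypothetical, and the fallbacks to an empty string are only there so the script still runs where tput is missing or the terminal type is unknown.

```shell
# The shell expands \u2586 while parsing the quotes, so printf only needs %s.
block=$'\u2586'

# Colour sequences from tput, or empty strings if tput cannot provide them.
green=$(tput setaf 118 2>/dev/null) || green=''
reset=$(tput sgr0 2>/dev/null) || reset=''

printf '%s%s%s\n' "$green" "$block" "$reset"
```

Because the expansion happens in the shell rather than in printf, the same variable can be handed to echo, to another program, or written to a file.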

In your particular case, you could use zsh's print builtin as:

print -P '%F{118}\u2586%f'

Where -P enables prompt expansion, which lets you set the foreground colour without having to run the tput command. Or:

print -rP '%F{118}'$'\u2586''%f'

Where -r disables processing of escape sequences, and the U+2586 character is passed literally to print using the $'...' form of quotes.

Or:

arbitrary_text=$'\u2586'' arbitrary text with \backslash and % characters'
print -r -- ${(%):-%F{118}}$arbitrary_text${(%):-%f}

Where print doesn't do any expansion, but the colour escape sequences are generated by the % parameter expansion flag, and the U+2586 character is stored verbatim in the variable. print -r