1

I'm trying to escape the following code with the echo command but I keep getting the actual octet and not the emoji.

Also where could I find the octet values of the emojis? I seem to always find the UTF-8 values.

#!/usr/bin/env bash

UNICORN='\360\237\246\204\n'
FIRE=''

# this does not work when I run the script
printf '\360\237\246\204\n'
printf "Riding a ${UNICORN:Q}"  
echo "Riding a ${UNICORN:Q}" #[Fails]: how to extract the actual emoji? 

EDIT_1: Just updating the code after reading comments

#!/usr/bin/env bash
# Note: use hexdump -b to get a one-byte octal display

UNICORN_UTF8=$'\360\237\246\204'


printf "U1F525\n"|hexdump -b  # [ASK]: How to translate the return value to a valid UTF8 ?

FIRE_UTF8=$'\125\061\106\065\062\065\012'

echo "Riding a ${UNICORN_locale_encoding}"
echo "${UNICORN_UTF8} + ${FIRE_UTF8}"

EDIT_2: Posting final code. It sort of works.

#!/usr/bin/env bash

# Author:
# Usage:
# Note: use hexdump -b to get a one-byte octal display of the emoji (needed because different computers use different command-line tools)
# Ex: printf "U1F525\n"|hexdump -v -e '"\\" 1/1 "%03o"' ; echo 

UNICORN_UTF8=$'\360\237\246\204'
FIRE_UTF8=$'\xF0\x9F\x94\xA5'
LEAVE_SPACE=\^[a-zA-Z0-9_]*$\

echo "Riding an ${UNICORN_UTF8} ${LEAVE_SPACE} out of a ${FIRE_UTF8} ${LEAVE_SPACE} house."
  • 3
    Welcome to this site! Any reason why you don't want to use printf? Also see this relevant question https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo – Daniele Santi Sep 06 '18 at 09:51
  • 2
    This is a follow-on question to https://unix.stackexchange.com/questions/466961/ . – JdeBP Sep 06 '18 at 09:53
  • 1
    \octet and even \n are printf syntax. What makes you expect echo to know how to interpret them? – Philippos Sep 06 '18 at 09:54
  • @Philippos thanks, I thought that printf and echo behaved the same. I'm new to bash and scripting.

    @MrShunz, the main reason is that I would like to use it in a script, and my current script has echo "Ready to git-some and sync your local branches to the remote counterparts ?"

    – intercoder Sep 06 '18 at 14:02
  • @JdeBP Correct, that is the follow-up question to pretty much all my questions here. I'm trying to incorporate emojis into my script. – intercoder Sep 06 '18 at 14:16
  • printf U1F525 outputs U1F525, 0125 is the code for U. With bash-4.2+ use printf '\U1F525' as already shown in answers to both your questions. – Stéphane Chazelas Sep 06 '18 at 14:42

2 Answers

9

echo's syntax is different from the standard C escapes as supported by printf/awk/$'...'...

In standard echo syntax, you need a leading 0 in front of the octal sequence (which can have from 1 to 3 digits)¹:

echo '\0360\0237\0246\0204'

Note that for bash's echo builtin to work with that, you need to enable the xpg_echo option²:

$ UNICORN_utf8_printf_format='\360\237\246\204'
$ UNICORN_utf8_echo='\0360\0237\0246\0204'
$ UNICORN_utf8=$'\360\237\246\204'
$ printf "$UNICORN_utf8_printf_format\n"
🦄
$ printf '%s\n' "$UNICORN_utf8"
🦄
$ shopt -s xpg_echo
$ echo "$UNICORN_utf8_echo"
🦄

Above, only $UNICORN_utf8 contains the actual character, encoded in UTF-8. The other two contain sequences of backslashes and digits that are meant to be expanded by the respective tools.

The %b format of the printf utility also understands the same sequences as echo. %b was actually added so we can get rid of echo which is impossible to use portably and reliably.

$ printf '%b\n' "$UNICORN_utf8_echo"
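
As a minimal sketch of the point above, assuming a shell whose printf supports %b (bash's builtin does), the same four bytes can be produced from either spelling. The variable names are made up for illustration:

```shell
# printf format syntax: backslash followed by up to three octal digits.
unicorn_printf_fmt='\360\237\246\204'
# echo / %b syntax: backslash, a 0, then up to three octal digits.
unicorn_echo_fmt='\0360\0237\0246\0204'

# Expand each spelling with the tool that understands it.
from_format=$(printf "$unicorn_printf_fmt")
from_percent_b=$(printf '%b' "$unicorn_echo_fmt")

if [ "$from_format" = "$from_percent_b" ]; then
    echo "both spellings expand to the same bytes"
fi

# Once expanded, the bytes can be passed around as ordinary data.
printf 'Riding a %s\n' "$from_format"
```

Keeping the expanded bytes in a variable and printing them with %s avoids depending on how a particular echo treats backslashes.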

See also (in zsh and bash³):

UNICORN_locale_encoding=$'\U1f984'

This gets you a unicorn encoded in the locale's own encoding, so it works even when that encoding is not UTF-8, provided it has the character. In practice that probably means only GB18030, where 🦄 is encoded as $'\225\60\330\66' and where $'\360\237\246\204' would instead be the encoding of 馃 (\N{CJK UNIFIED IDEOGRAPH-9983}\N{<private use area-E6E9>}).

Some printf implementations, including GNU printf and the printf builtins of zsh, ksh93 and recent versions of bash (4.2 or above), also support those \UXXXXXXXX escape sequences in their format argument (or in arguments to %b, except with ksh93); the GNU one needs all 8 digits.


¹ GNU coreutils echo and busybox echo support \ooo with -e as an extension (not when POSIXLY_CORRECT is in the environment for GNU echo)

² Another option would be to use the non-standard -e option, but then it wouldn't work when both the posix and xpg_echo options are enabled, like when bash is in UNIX compliance mode.

³ ksh93 and mksh also support that syntax, but encode in UTF-8 regardless of the locale's encoding; in current (2018) versions of FreeBSD sh, you need \U0001f984 and it only works in UTF-8 locales.

3
$ echo $'\360\237\230\200\012'

(that's bash's echo, GNU bash, version 4.3.43(1)-release (x86_64-redhat-linux-gnu))

Or you can use the echo binary:

$ /usr/bin/echo -e "\360\237\230\200\012"

How did I get it? I used maulinglawns's answer above to see the emoji's octal:

$ printf "\U1F600\n"

$ printf "\U1F600\n"|hexdump -b
0000000 360 237 230 200 012                                            
0000005

hexdump:
-b, --one-byte-octal one-byte octal display
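
The bytes hexdump shows are simply the UTF-8 encoding of the code point, so they can also be computed without a printf that understands \U. Here is a rough sketch using only shell arithmetic. utf8_octal is a made-up helper name, and it does no validation (surrogates and values above 0x10FFFF are not rejected):

```shell
# Hypothetical helper: print the UTF-8 bytes of a code point as \ooo escapes.
# Usage: utf8_octal 0x1F600
utf8_octal() {
    cp=$(( $1 ))   # accepts decimal or 0x-prefixed hex; not a local variable
    if [ "$cp" -lt 128 ]; then
        # 1 byte: 0xxxxxxx
        printf '\\%03o' "$cp"
    elif [ "$cp" -lt 2048 ]; then
        # 2 bytes: 110xxxxx 10xxxxxx
        printf '\\%03o\\%03o' \
            $(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))
    elif [ "$cp" -lt 65536 ]; then
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        printf '\\%03o\\%03o\\%03o' \
            $(( 0xE0 | (cp >> 12) )) $(( 0x80 | ((cp >> 6) & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    else
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        printf '\\%03o\\%03o\\%03o\\%03o' \
            $(( 0xF0 | (cp >> 18) )) $(( 0x80 | ((cp >> 12) & 0x3F) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
    fi
    printf '\n'
}

utf8_octal 0x1F600   # prints \360\237\230\200
```

The result matches the hexdump output above, minus the trailing \012, because no newline is fed into the conversion.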

EDIT: If you wanted the unicorn emoji:

$ echo $'\360\237\246\204'

$ `which echo` -e '\360\237\246\204'

If you want some generic way of getting the octal in that format:

printf "\U1F600\n"|hexdump -v -e '"\\" 1/1 "%03o"' ; echo
\360\237\230\200\012

The output includes the \n as \012. The trailing "; echo" adds a newline at the end, which is useful when trying this on the command line; otherwise the shell prompt would be shown right after the output.
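
Where hexdump is not available, od (which POSIX specifies) can do the same job. This is a sketch, with to_octal_escapes as a hypothetical helper name; it relies on the shell's word splitting to skip whatever whitespace od puts between the bytes:

```shell
# Hypothetical helper: print each byte of its argument as a \ooo escape.
to_octal_escapes() {
    # -An: no offset column; -v: never collapse repeated lines;
    # -t o1: one octal field per byte.
    for byte in $(printf '%s' "$1" | od -An -v -t o1); do
        printf '\\%s' "$byte"
    done
    printf '\n'
}

# The argument must already hold the real bytes; here printf produces them.
to_octal_escapes "$(printf '\360\237\246\204')"   # prints \360\237\246\204
```

Because the string is passed with %s and without a trailing newline, no \012 shows up in the result.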