
In Bash, to type in a literal TAB character:

  • pressing the Tab key doesn't work; it only invokes the completion provided by readline, which Bash uses
  • Ctrl-V-tab works
  • $(echo $'\t') works

Questions

  • Is it correct that the third way is actually done by Bash interpreting $'\t' as an ANSI C string for the tab character?
  • How is the second way done? Also by readline used by Bash?
  • Generally what are some ways to type in a literal character?

Originated from: https://unix.stackexchange.com/a/458074/674

slm
  • 369,824
Tim
  • 101,790

1 Answer


How it's made

What you call "literal characters" are ordinary Unicode characters. Let's look at how this works for the tab and newline characters. First, check the hex encoding of the tab character:

printf $'\t' | hexdump

The output is

0000000 0009                                   
0000001

The output shows that \t is the ordinary Unicode character U+0009. You can print it this way:

printf '\x00\x09'

or with echo:

echo -e '\u0009'
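
For a byte-level check, here is a minimal sketch, assuming Bash and a POSIX od (the variable names are only illustrative). It dumps one byte per column rather than hexdump's default 16-bit words. It suggests that $'\t' is the single byte 0x09, while printf '\x00\x09' emits a NUL byte followed by the tab:

```shell
#!/usr/bin/env bash
# Illustrative variable names, not from the original answer.

# Bytes of a TAB produced by ANSI-C quoting, one byte per column.
tab_bytes=$(printf '%s' $'\t' | od -An -tx1 | tr -d ' \n')
echo "tab bytes: $tab_bytes"      # prints "tab bytes: 09"

# Bytes produced by the two-escape form: NUL (00) followed by TAB (09).
pair_bytes=$(printf '\x00\x09' | od -An -tx1 | tr -d ' \n')
echo "pair bytes: $pair_bytes"    # prints "pair bytes: 0009"
```

The NUL byte is usually invisible in a terminal, which is why both forms can look the same on screen.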

Consider the following example for the newline character:

bob@alice:~$ printf $'\n' | hexdump
0000000 000a                                   
0000001
bob@alice:~$ printf '\x00\x0A empty lines are above and below'; echo $'\n'

 empty lines are above and below

bob@alice:~$ echo -e '\u000a empty line is above'

 empty line is above
bob@alice:~$ 

How to input Unicode characters

Linux has a so-called Compose key (ComposeKey or MultiKey). It can be defined in the xorg.conf.d/10-keyboard.conf file; just add this line to the file:

Option "xkbOptions" "grp:alt_shift_toggle,terminate:ctrl_alt_bksp,compose:menu"

Hints for the UTF-8 (Unicode) compose sequences can be found in the Compose file:

less /usr/share/X11/locale/en_US.UTF-8/Compose

In GUI terminals the Ctrl+Shift+U keybinding also works: press it and you'll see the letter u. Type 266a and finish with the Space or Enter key, and the eighth note sign (♪, U+266A) appears.
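
The same character can also be produced from a script rather than the keyboard. This is a rough sketch, assuming Bash 4.2 or later for the \u escape and a UTF-8 locale for that line to render as intended; the variable names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical variable names; U+266A is the eighth note mentioned above.

# Locale-independent: spell out the UTF-8 bytes of U+266A explicitly.
note=$'\xe2\x99\xaa'
printf '%s\n' "$note"

# Locale-dependent (assumes Bash 4.2+ and a UTF-8 locale): name the code point.
printf '%s\n' $'\u266a'

# Confirm which bytes the explicit form produced.
note_bytes=$(printf '%s' "$note" | od -An -tx1 | tr -d ' \n')
echo "bytes: $note_bytes"
```

In a non-UTF-8 locale the \u form may print something other than the note, so the explicit byte form is the safer choice in portable scripts.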

Additional information

  1. ANSI-C Quoting
  2. Ubuntu - ComposeKey
  3. Wikipedia - Compose key
  4. How to set a Compose Key in Ubuntu 18.04
Bob
  • 1,155
  • 2
    TAB is encoded as one 0x09 byte in ASCII or UTF-8, not two 0x00 and 0x09 bytes. See the second address in hexdump output is 1 meaning there was only one byte. hexdump dumps 16bit words by default. Use od -vtx1 instead. It would only be encoded as 0009 in UTF-16BE, an encoding that is not Unix-compatible. – Stéphane Chazelas Jul 25 '18 at 16:26
  • @StéphaneChazelas Thank you for your remark. Please note that I intentionally avoided UTF-8-encoded characters and used the encoding-independent Unicode code point U+0009 for the tabulation control code. I intentionally did not use "byte language". Note also that using \x00 isn't a mistake either, since according to section 23.1 of the Unicode Standard v.11, usage of U+0000 is outside the scope of the Unicode Standard, which does not require any particular usage of null (page 858). Read: http://www.unicode.org/versions/Unicode11.0.0/UnicodeStandard-11.0.pdf – Bob Jul 25 '18 at 18:23