Bash script to get ASCII values for alphabet

Question

How do I get the ASCII value of the alphabet?

For example, 97 for a?

score 108 · Accepted Answer · edited Sep 27 '13 at 08:42

Define these two functions (usually available in other languages):

chr() {
  [ "$1" -lt 256 ] || return 1
  printf "\\$(printf '%03o' "$1")"
}

ord() {
  LC_CTYPE=C printf '%d' "'$1"
}

Usage:

chr 65
A

ord A
65
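
As a rough sketch of what the two functions do (the variable names code, octal, char and value are only for illustration): chr() turns the number into a three-digit octal escape and lets printf expand it, while ord() relies on printf treating an argument that starts with a quote as the numeric value of the character that follows.

```shell
#!/bin/sh
# chr direction: decimal code -> three-digit octal -> byte
code=65
octal=$(printf '%03o' "$code")    # "101", the octal form of 65
char=$(printf "\\$octal")         # the format string is \101, which printf expands to A
printf '%s\n' "$char"

# ord direction: a leading quote makes printf use the numeric value
# of the character after it instead of parsing the argument as a number
value=$(LC_CTYPE=C printf '%d' "'$char")
printf '%s\n' "$value"
```

The [ "$1" -lt 256 ] guard in chr() is there because a three-digit octal escape can only describe a single byte.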

Stéphane Chazelas

544,893

answered Sep 26 '13 at 08:02

dsmsk80

3,068

12

@dmsk80: +1. For others like me who think they spot a typo: "'A" is correct, whereas if you use "A" it will say: A: invalid number. It seems this is handled on the printf side: in the shell, "'A" is indeed 2 chars, a ' and an A, and those are passed to printf. In the printf context, it is converted to the ASCII value of A and is finally printed as decimal thanks to the '%d'. Use '0x%x' to show it in hex or '0%o' to show it in octal. – Olivier Dulac Sep 26 '13 at 11:05
5

-1 for not explaining how it works... joking :D, but seriously what do these printf "\\$(printf '%03o' "$1")", '%03o', LC_CTYPE=C and the single quote in "'$1" do? – razzak Dec 04 '14 at 19:42
4

Read all the detail in FAQ 71. An excellent detailed analysis. – Nov 03 '15 at 09:22

score 24 · Answer 2 · answered Sep 26 '13 at 12:14

You can see the entire set with:

$ man ascii

You'll get tables in octal, hex, and decimal.

ford

343

There's also an ascii package for debian-based distros, but (at least now) the question is tagged as bash, so these wouldn't help the OP. In fact, it's installed on my system and all I get from man ascii is its man page. – Joe Sep 27 '13 at 21:47

score 22 · Answer 3 · edited Nov 03 '15 at 12:00

This works well,

echo "A" | tr -d "\n" | od -An -t uC

echo "A"                              ### Emit a character.
         | tr -d "\n"                 ### Remove the "newline" character.
                      | od -An -t uC  ### Use od (octal dump) to print:
                                      ### -An  means Address none
                                      ### -t  select a type
                                      ###  u  type is unsigned decimal.
                                      ###  C  of size (one) char.

exactly equivalent to:

echo -n "A" | od -An -tuC        ### Not all shells honor the '-n'.
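
A possible portable variant, following the printf %s suggestion in the comments below (the variable name code is only for illustration). It assumes od prints the value padded with blanks, which tr then removes:

```shell
#!/bin/sh
# printf %s writes its argument without a trailing newline,
# so neither echo -n nor tr -d "\n" on the input side is needed.
code=$(printf %s "A" | od -An -tuC | tr -d ' \n')
printf '%s\n' "$code"
```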

answered Sep 26 '13 at 11:33

Saravanan

349

4

Can you maybe add a small explanation? – Bernhard Sep 26 '13 at 12:04
tr removes the "\n" (newline) from the input, and od with -t dC prints each character as a decimal value. – Saravanan Sep 26 '13 at 12:42
2

echo -n suppresses trailing newline eliminating the need for tr -d "\n" – Gowtham Sep 26 '13 at 16:59
2

@Gowtham, only with some implementations of echo, not in Unix compliant echos for instance. printf %s A would be the portable one. – Stéphane Chazelas Sep 27 '13 at 08:44

Stéphane Chazelas · Answer 4 · 2019-12-13T16:39:22.067

16

If you want to extend it to UTF-8 characters (assuming you're in a UTF-8 locale):

$ perl -CA -le 'print ord shift' 😈
128520

$ perl -CS -le 'print chr shift' 128520

With bash, ksh or zsh builtins:

$ printf "\U$(printf %08x 128520)\n"
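
To see how the code point 128520 relates to the bytes actually stored, here is a minimal sketch that does not depend on the locale. It writes the four UTF-8 bytes of that character with octal escapes, dumps them with od, and rebuilds the code point with the standard 4-byte UTF-8 bit layout. The names utf8_bytes and codepoint are only for illustration.

```shell
#!/bin/sh
# The four UTF-8 bytes of U+1F608, written as octal escapes.
set -- $(printf '\360\237\230\210' | od -An -tuC)
utf8_bytes="$*"                   # the decimal byte values, space separated

# 4-byte UTF-8: 3 payload bits in the lead byte, 6 in each continuation byte.
codepoint=$(( (($1 & 7) << 18) | (($2 & 63) << 12) | (($3 & 63) << 6) | ($4 & 63) ))
printf '%s -> %s\n' "$utf8_bytes" "$codepoint"
```

The byte values are the same d1..d4 numbers that show up in the ctbl() comment further down.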

edited Dec 13 '19 at 16:39

answered Sep 26 '13 at 12:09

Stéphane Chazelas

544,893

Did you intend to put a square box character there, or is the original character not being displayed in the post and being replaced by a square box? – mtk Oct 02 '13 at 07:11
1

@mtk, You need a browser that displays UTF-8 and a font that has that 128520 character. – Stéphane Chazelas Oct 02 '13 at 07:13
I am on Latest Chrome, and don't think that it doesn't support UTF-8. Would like to know what browser you are on? – mtk Oct 02 '13 at 08:53
@mtk, iceweasel on Debian sid. The font as confirmed by iceweasel's web console is "DejaVu Sans" and I've got ttf-dejavu ttf-dejavu-core ttf-dejavu-extra packages installed which come from Debian with upstream at http://dejavu-fonts.org/ – Stéphane Chazelas Oct 02 '13 at 09:10
what is the base of 128520? my own ctbl() seems to properly enable me to display it, and to slice the char from the head of a string with printf, but it puts 4*((o1=360)>=(d1=240)|(o2=237)>=(d2=159)|(o3=230)>=(d3=152)|(o4=210)>=(d4=136)) in $OPTARG for the byte values. – mikeserv Nov 18 '15 at 05:52

mikeserv · Answer 5 · 2015-11-15T02:23:56.897

ctbl()  for O                   in      0 1 2 3
        do  for o               in      0 1 2 3 4 5 6 7
                do for  _o      in      7 6 5 4 3 2 1 0
                        do      case    $((_o=(_o+=O*100+o*10)?_o:200)) in
                                (*00|*77) set   "${1:+ \"}\\$_o${1:-\"}";;
                                (140|42)  set   '\\'"\\$_o$1"           ;;
                                (*)       set   "\\$_o$1"               ;esac
                        done;   printf   "$1";   shift
                done
        done
eval '
ctbl(){
        ${1:+":"}       return "$((OPTARG=0))"
        set     "" ""   "${1%"${1#?}"}"
        for     c in    ${a+"a=$a"} ${b+"b=$b"} ${c+"c=$c"}\
                        ${LC_ALL+"LC_ALL=$LC_ALL"}
        do      while   case  $c in     (*\'\''*) ;; (*) ! \
                                 set "" "${c%%=*}='\''${c#*=}$1'\'' $2" "$3"
                        esac;do  set    "'"'\''\${c##*\'}"'$@";  c=${c%\'\''*}
        done;   done;   LC_ALL=C a=$3 c=;set "" "$2 OPTARG='\''${#a}*("
        while   [ 0 -ne "${#a}" ]
        do      case $a in      ([[:print:][:cntrl:]]*)
                        case    $a in   (['"$(printf \\1-\\77)"']*)
                                        b=0;;   (*)     b=1
                        esac;;  (['"$(  printf  \\200-\\277)"']*)
                                        b=2;;   (*)     b=3
                esac;    set    '"$(ctbl)"'     "$@"
                eval "   set    \"\${$((b+1))%"'\''"${a%"${a#?}"}"*}" "$6"'\''
                a=${a#?};set    "$((b=b*100+${#1}+${#1}/8*2)))" \
                                "$2(o$((c+=1))=$b)>=(d$c=$((0$b)))|"
        done;   eval "   unset   LC_ALL  a b c;${2%?})'\''"
        return  "$((${OPTARG%%\**}-1))"
}'

The first ctbl() - at the top there - only ever runs the one time. It generates the following output (which has been filtered through sed -n l for printability's sake):

ctbl | sed -n l

 "\200\001\002\003\004\005\006\a\b\t$
\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\
\035\036\037 !\\"#$%&'()*+,-./0123456789:;<=>?" "@ABCDEFGHIJKLMNOPQRS\
TUVWXYZ[\\]^_\\`abcdefghijklmnopqrstuvwxyz{|}~\177" "\200\201\202\203\
\204\205\206\207\210\211\212\213\214\215\216\217\220\221\222\223\224\
\225\226\227\230\231\232\233\234\235\236\237\240\241\242\243\244\245\
\246\247\250\251\252\253\254\255\256\257\260\261\262\263\264\265\266\
\267\270\271\272\273\274\275\276\277" "\300\301\302\303\304\305\306\
\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\327\
\330\331\332\333\334\335\336\337\340\341\342\343\344\345\346\347\350\
\351\352\353\354\355\356\357\360\361\362\363\364\365\366\367\370\371\
\372\373\374\375\376\377"$

...which are all 8-bit bytes (less NUL), divided into four shell-quoted strings split evenly at 64-byte boundaries. The strings might be represented with octal ranges like \200\1-\77,\100-\177,\200-\277,\300-\377, where byte 128 is used as a place-holder for NUL.

The first ctbl()'s entire purpose for existence is to generate those strings so that eval may define the second ctbl() function with them literally embedded thereafter. In that way they can be referenced in the function without needing to generate them again each time they are needed. When eval does define the second ctbl() function the first will cease to be.

The top half of the second ctbl() function is mostly ancillary here - it is designed to portably and safely serialize any current shell state it might affect when it is called. The top loop will quote any quotes in the values of any variables it might want to use, and then stack all of the results in its positional parameters.

The first two lines, though, first immediately return 0 and set $OPTARG to same if the function's first argument does not contain at least one character. And if it does, the second line immediately truncates its first argument to only its first character - because the function only handles a character at a time. Importantly, it does this in the current locale context, which means that if a character might comprise more than a single byte, then, provided the shell properly supports multi-byte chars, it will not discard any bytes except those which are not in the first character of its first argument.

        ${1:+":"}       return "$((OPTARG=0))"
        set     "" ""   "${1%"${1#?}"}"
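
The truncation idiom on that second line can be tried on its own. A minimal sketch, with first_char as a hypothetical helper name that is not part of ctbl():

```shell
#!/bin/sh
first_char() {
    # ${1#?} is $1 without its first character; removing that as a
    # suffix from $1 leaves only the first character. The inner
    # quotes keep the suffix from being treated as a pattern.
    printf '%s\n' "${1%"${1#?}"}"
}

first_char "hello"
```

Whether "first character" means one byte or a whole multi-byte character depends on the shell and the current locale, which is the point made above.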

It then does the save loop if at all necessary, and afterward it redefines the current locale context to the C locale for every category by assigning to the LC_ALL variable. From this point on, a character can only consist of a single byte, and so if there were multiple bytes in the first character of its first argument, these should now be each addressable as individual characters in their own right.

        LC_ALL=C

It is for this reason that the second half of the function is a while loop, as opposed to a singly run sequence. In most cases it will probably execute only once per call, but, if the shell in which ctbl() is defined properly handles multi-byte characters, it might loop.

        while   [ 0 -ne "${#a}" ]
        do      case $a in      ([[:print:][:cntrl:]]*)
                        case    $a in   (['"$(printf \\1-\\77)"']*)
                                        b=0;;   (*)     b=1
                        esac;;  (['"$(  printf  \\200-\\277)"']*)
                                        b=2;;   (*)     b=3
                esac;    set    '"$(ctbl)"'     "$@"

Note that the above $(ctbl) command substitution is only ever evaluated once - by eval when the function is initially defined - and that forever after that token is replaced with the literal output of that command substitution as saved into the shell's memory. The same is true of the two case pattern command substitutions. This function does not ever call a subshell or any other command. It will also never attempt to read or write input/output (except in the case of some shell diagnostic message - which probably indicates a bug).

Also note that the test for loop continuity is not simply [ -n "$a" ], because, as I found to my frustration, for some reason a bash shell does:

char=$(printf \\1)
[ -n "$char" ] || echo but it\'s not null\!

but it's not null!

...and so I explicitly compare $a's len to 0 for each iteration, which, also inexplicably, behaves differently (read: correctly).

The case checks the first byte for inclusion in any of our four strings and stores a reference to the byte's set in $b. Afterward the shell's first four positional parameters are set to the strings embedded by eval and written by ctbl()'s predecessor.

Next, whatever remains of the first argument is again temporarily truncated to its first character - which should now be assured to be a single byte. This first byte is used as a reference to strip from the tail of the string which it matched and the reference in $b is eval'd to represent a positional parameter so everything from the reference byte to the last byte in string can be substituted away. The other three strings are dropped from the positional parameters entirely.

               eval "   set    \"\${$((b+1))%"'\''"${a%"${a#?}"}"*}" "$6"'\''
               a=${a#?};set    "$((b=b*100+${#1}+${#1}/8*2)))" \
                                "$2(o$((c+=1))=$b)>=(d$c=$((0$b)))|"

At this point the byte's value (modulo 64) can be referenced as the string's len:

str=$(printf '\200\1\2\3\4\5\6\7')
ref=$(printf \\4)
str=${str%"$ref"*}
echo "${#str}"
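
The same length-as-value trick can be shown in a much smaller form. This is only a sketch for printable ASCII, not a replacement for ctbl(): ord_by_length is a hypothetical helper, and it gives wrong answers for an empty argument or for a character that is not in its table.

```shell
#!/bin/sh
ord_by_length() {
    # Printable ASCII in code order, starting at 32 (space).
    table=' !"#$%&'\''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~'
    # Drop everything from the target character to the end of the table.
    prefix=${table%%"$1"*}
    # The length of what remains is the character's offset into the table.
    echo "$(( ${#prefix} + 32 ))"
}

ord_by_length A
ord_by_length a
```

ctbl() applies the same idea to four 64-byte strings and then corrects the offset according to which string matched.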

A little math is then done to reconcile the modulus based on the value in $b, the first byte in $a is permanently stripped away, and output for the current cycle is appended to a stack pending completion before the loop recycles to check if $a is actually empty.

    eval "   unset   LC_ALL  a b c;${2%?})'\''"
    return  "$((${OPTARG%%\**}-1))"

When $a definitely is empty, all names and state - with the exception of $OPTARG - that the function affected throughout the course of its execution are restored to their previous state - whether set and not null, set and null, or unset - and the output is saved to $OPTARG as the function returns. The actual return value is one less than the total number of bytes in the first character of its first argument - so any single byte character returns zero and any multi-byte char will return more than zero - and its output format is a little strange.

The value ctbl() saves to $OPTARG is a valid shell arithmetic expression that, if evaluated, will concurrently set variable names of the forms $o1, $d1, $o2, $d2 to the octal and decimal values of all respective bytes in the first character of its first argument, but ultimately evaluate to the total number of bytes in that character. I had a specific kind of workflow in mind when writing this, and I think maybe a demonstration is in order.

I often find a reason to take a string apart with getopts like:

str=some\ string OPTIND=1
while   getopts : na  -"$str"
do      printf %s\\n "$OPTARG"
done

s
o
m
e

s
t
r
i
n
g

I probably do a little more than just print it a char per line, but anything's possible. In any case, I haven't yet found a getopts that will properly do (strike that - dash's getopts does it char by char, but bash definitely doesn't):

str=ŐőŒœŔŕŖŗŘřŚśŜŝŞş  OPTIND=1
while   getopts : na  -"$str"
do      printf %s\\n "$OPTARG"
done|   od -tc

0000000 305  \n 220  \n 305  \n 221  \n 305  \n 222  \n 305  \n 223  \n
0000020 305  \n 224  \n 305  \n 225  \n 305  \n 226  \n 305  \n 227  \n
0000040 305  \n 230  \n 305  \n 231  \n 305  \n 232  \n 305  \n 233  \n
0000060 305  \n 234  \n 305  \n 235  \n 305  \n 236  \n 305  \n 237  \n
0000100

Ok. So I tried...

str=ŐőŒœŔŕŖŗŘřŚśŜŝŞş
while   [ 0 -ne "${#str}" ]
do      printf %c\\n "$str"    #identical results for %.1s
        str=${str#?}
done|   od -tc

#dash
0000000 305  \n 220  \n 305  \n 221  \n 305  \n 222  \n 305  \n 223  \n
0000020 305  \n 224  \n 305  \n 225  \n 305  \n 226  \n 305  \n 227  \n
0000040 305  \n 230  \n 305  \n 231  \n 305  \n 232  \n 305  \n 233  \n
0000060 305  \n 234  \n 305  \n 235  \n 305  \n 236  \n 305  \n 237  \n
0000100

#bash
0000000 305  \n 305  \n 305  \n 305  \n 305  \n 305  \n 305  \n 305  \n
*
0000040

That kind of workflow - the byte for byte/char for char kind - is one I often get into when doing tty stuff. At the leading edge of input you need to know char values as soon as you read them, and you need their sizes (especially when counting columns), and you need characters to be whole characters.

And so now I have ctbl():

str=ŐőŒœŔŕŖŗŘřŚśŜŝŞş
while [ 0 -ne "${#str}" ]
do    ctbl "$str"
      printf "%.$(($OPTARG))s\t::\t$OPTARG\t::\t$?\t::\t\\$o1\\$o2\n" "$str"
      str=${str#?}
done

Ő   ::  2*((o1=305)>=(d1=197)|(o2=220)>=(d2=144))   ::  1   ::  Ő
ő   ::  2*((o1=305)>=(d1=197)|(o2=221)>=(d2=145))   ::  1   ::  ő
Œ   ::  2*((o1=305)>=(d1=197)|(o2=222)>=(d2=146))   ::  1   ::  Œ
œ   ::  2*((o1=305)>=(d1=197)|(o2=223)>=(d2=147))   ::  1   ::  œ
Ŕ   ::  2*((o1=305)>=(d1=197)|(o2=224)>=(d2=148))   ::  1   ::  Ŕ
ŕ   ::  2*((o1=305)>=(d1=197)|(o2=225)>=(d2=149))   ::  1   ::  ŕ
Ŗ   ::  2*((o1=305)>=(d1=197)|(o2=226)>=(d2=150))   ::  1   ::  Ŗ
ŗ   ::  2*((o1=305)>=(d1=197)|(o2=227)>=(d2=151))   ::  1   ::  ŗ
Ř   ::  2*((o1=305)>=(d1=197)|(o2=230)>=(d2=152))   ::  1   ::  Ř
ř   ::  2*((o1=305)>=(d1=197)|(o2=231)>=(d2=153))   ::  1   ::  ř
Ś   ::  2*((o1=305)>=(d1=197)|(o2=232)>=(d2=154))   ::  1   ::  Ś
ś   ::  2*((o1=305)>=(d1=197)|(o2=233)>=(d2=155))   ::  1   ::  ś
Ŝ   ::  2*((o1=305)>=(d1=197)|(o2=234)>=(d2=156))   ::  1   ::  Ŝ
ŝ   ::  2*((o1=305)>=(d1=197)|(o2=235)>=(d2=157))   ::  1   ::  ŝ
Ş   ::  2*((o1=305)>=(d1=197)|(o2=236)>=(d2=158))   ::  1   ::  Ş
ş   ::  2*((o1=305)>=(d1=197)|(o2=237)>=(d2=159))   ::  1   ::  ş

Note that ctbl() doesn't actually define the $[od][12...] variables - it never has any lasting effect on any state but $OPTARG - but only puts the string in $OPTARG that can be used to define them - which is how I get the second copy of each char above by doing printf "\\$o1\\$o2" because they are set each time I evaluate $(($OPTARG)). But where I do it I'm also declaring a field length modifier to printf's %s string argument format, and because the expression always evaluates to the total number of bytes in a character, I get the whole character on output when I do:

printf %.2s "$str"
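
To make the evaluation step concrete without running ctbl() itself, here is a small sketch that takes the first expression from the table above as a literal string (saved is a hypothetical variable standing in for $OPTARG) and evaluates it:

```shell
#!/bin/sh
# One value of $OPTARG copied from the output above.
saved='2*((o1=305)>=(d1=197)|(o2=220)>=(d2=144))'

# Evaluating the string assigns o1, d1, o2 and d2 as a side effect;
# each comparison is true, so the whole expression yields the byte count.
nbytes=$(($saved))
printf '%s\n' "$nbytes $o1 $d1 $o2 $d2"

# The o* values are octal digit strings, usable as printf escapes.
printf "\\$o1\\$o2" | od -An -tuC
```

The od line prints the two byte values again in decimal, which should match d1 and d2.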

@HelloGoodbye this isnt bash code. nor is this obfuscated. to see obfuscation, please refer to [ "$(printf \\1)" ]|| ! echo but its not null! meanwhile, feel free to better acquaint yourself with meaningful comment practice, unless you recommend an actual such contest...? — mikeserv, Nov 09 '18 at 14:20
No, I don't, what I wrote was just another way of saying that your code is very confusing (at least to me), but maybe it wasn't supposed to be easily understandable. If it isn't bash, then what language is it? — HelloGoodbye, Nov 09 '18 at 19:37
@HelloGoodbye - this is POSIX sh command language. bash is a bourne again supraset of same, and in large part a precipitous motivator for much of the care afforded above toward widely portable, self expanding and namespace honorable character sizes of any kind. bash should handle much of this already, but the c language printf was, and maybe is, deficient the capability above provided. — mikeserv, Nov 10 '18 at 09:28
I'm still inclined to use printf "%d" "'$char" for the sake of simplicity and readability. I'm curious what sort of problems this exposes me to that @mikeserv's solution addresses? Is there more than just some control characters affecting the return code (which I believe was his point in the above comment)? — Alex Jansen, Feb 25 '19 at 06:52
@AlexJansen this is a universal fix for a somewhat infamous basic printf problem with multibyte characters. — mikeserv, Feb 27 '19 at 23:49
Funnily enough, I just ran into a similar problem one line above the printf statement in question. Iterating through a string with 'û' in it was looping an extra time on some machines, presumably because of a second byte caused by a locale setting. Do you have a favorite article on handling these sorts of issues in bash? — Alex Jansen, Feb 28 '19 at 01:11

score 8 · Answer 6 · answered Sep 26 '13 at 13:59

I'm going for the simple (and elegant?) Bash solution:

for i in {a..z}; do echo $(printf "%s %d" "$i" "'$i"); done

In a script you can use the following:

CharValue="A"
AscValue=`printf "%d" "'$CharValue"`

Notice the single quote before $CharValue. It is required.
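
The one-liner above relies on Bash brace expansion ({a..z}). For shells without it, a possible alternative is to loop over the codes 97 to 122 and convert each one back to a character with the octal-escape trick from the accepted answer. alphabet_table is a hypothetical function name used only for this sketch.

```shell
#!/bin/sh
alphabet_table() {
    i=97                                  # code of "a"
    while [ "$i" -le 122 ]; do            # 122 is the code of "z"
        # The inner printf builds the octal escape for the character,
        # the outer one prints the character followed by its code.
        printf "\\$(printf '%03o' "$i") %d\n" "$i"
        i=$((i + 1))
    done
}

alphabet_table
```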

phulstaert

91

1

How is your answer different from dsmsk80's answer? – Bernhard Sep 26 '13 at 14:35
1

My interpretation of the question is "how to get the ASCII values for the values of the alphabet". Not how to define a function to retrieve the ASCII value for one character. So my first answer is a short one-line command to get the ASCII values for the alphabet. – phulstaert Sep 27 '13 at 07:26
I get your point, but I still think the bottom line of both answers is printf "%d". – Bernhard Sep 27 '13 at 09:11
2

I agree this is a crucial part of the process to get to the result, yet I didn't want to make the assumption that xmpirate knew about the "for i in" and the use of a range. If he wanted a list, this could be a real time-saver ;-). Also, future readers might find my additions helpful. – phulstaert Sep 27 '13 at 09:44

score 3 · Answer 7 · answered Nov 15 '15 at 01:29

Not a shell script, but works

awk 'BEGIN{for( i=97; i<=122;i++) printf "%c %d\n",i,i }'

Sample output

xieerqi:$ awk 'BEGIN{for( i=97; i<=122;i++) printf "%c %d\n",i,i }' | head -n 5
a 97
b 98
c 99
d 100
e 101

Sergiy Kolodyazhnyy

16,527

score 3 · Answer 8 · edited Dec 09 '15 at 20:19

select the symbol, then press CTRL+C
open konsole
and type: xxd<press enter>
then press <SHIFT+INSERT><CTRL+D>

you get something like:

mariank@dd903c5n1 ~ $ xxd
û0000000: fb

you know the symbol you pasted has hex code 0xfb

Jakuje

21,357

answered Dec 09 '15 at 20:14

Marian Klen

31

score 1 · Answer 9 · answered Jul 09 '20 at 20:23

If you want to print out the decimal representation of the UTF-8 value, I endorse dsmsk80's solution. If, on the other hand, you need to assign the value to a variable, there is a mechanism within Bash's printf that works faster. Let us assume that you want to assign the ASCII value of "A" (which is 65 in decimal, and which we have assigned to a variable, theChar) to a variable myVar. Inlining dsmsk80's ord() function we would get:

LC_CTYPE=C myVar=$(printf "%d" "'$theChar")

In order for this assignment to take place, the value harvested from '$theChar must be formatted in decimal characters and then parsed from decimal to the number 65 that is then stored in myVar. To avoid this formatting and parsing we can take advantage of the -v flag for printf, which assigns the value to be printed directly. The syntax is as follows:

LC_CTYPE=C printf -v myVar "%d" "'$theChar"

I discovered this because I needed to create a Bash script that gave me the Fowler-Noll-Vo hash for each line of a text file, which I quote here:

#!/bin/bash
export LC_CTYPE=C
prime=16777619                                             #FNV prime
ofset=2166136261                                           #FNV offset
mask=0xffffffff                                            #bitmask
while IFS= read -r line || [[ -n $line ]]                  #foreach line in file (w/o end return)
do
    hash=$ofset                                            #set hash to offset for line.
    for (( i=0; i<${#line}; i++ ))                         #foreach char in line
    do
        printf -v charVal "%d" "'${line:$i:1}"             #use printf -v trick.
        hash=$(( ( ( hash ^ charVal ) * prime ) & mask ))  #update FNV1-a hash for char.
    done
    printf "%08X\n" "$hash"                                #print hash result for line.
done < "$1"                                                #read from the file named in $1.

Using the -v option, for assignment with printf, resulted in a 50X performance improvement, when run in Cygwin64.
