How do I get the ASCII value of the alphabet?
For example, 97 for "a"?
Define these two functions (usually available in other languages):
chr() {
[ "$1" -lt 256 ] || return 1
printf "\\$(printf '%03o' "$1")"
}
ord() {
LC_CTYPE=C printf '%d' "'$1"
}
Usage:
chr 65
A
ord A
65
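The two functions rely on two printf behaviours, shown here step by step for the letter a (97). chr formats the number as a three-digit octal escape and hands that back to printf as a format. ord uses printf's rule that a numeric argument starting with a quote is converted to the code of the next character. The variable name oct is only for illustration.

```shell
#!/bin/sh
# chr, step by step: 97 written as three octal digits is 141 ...
oct=$(printf '%03o' 97)
echo "$oct"                      # 141
# ... and printf expands the escape \141 in its format to the byte with that value.
printf "\\$oct\n"                # a
# ord: a leading ' makes printf use the numeric code of the following character.
LC_CTYPE=C printf '%d\n' "'a"    # 97
```

LC_CTYPE=C makes the conversion byte-oriented, so for a non-ASCII character ord reports the first byte rather than the code point.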
You can see the entire set with:
$ man ascii
You'll get tables in octal, hex, and decimal.
This works well:
echo "A" | tr -d "\n" | od -An -t uC
echo "A" |          ### Emit a character.
  tr -d "\n" |      ### Remove the "newline" character.
  od -An -t uC      ### Use od (octal dump) to print:
                    ### -An means Address none
                    ### -t select a type
                    ### u type is unsigned decimal.
                    ### C of size (one) char.
This is equivalent to:
echo -n "A" | od -An -tuC ### Not all shells honor the '-n'.
echo -n suppresses the trailing newline, eliminating the need for tr -d "\n". – Gowtham Sep 26 '13 at 16:59
echo, not in Unix-compliant echos for instance. printf %s A would be the portable one. – Stéphane Chazelas Sep 27 '13 at 08:44
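Following the printf %s suggestion in that comment, a portable version of the od pipeline needs neither echo -n nor tr, because printf %s never appends a newline. It also works on whole strings, printing one value per byte:

```shell
#!/bin/sh
# printf %s writes the string without a trailing newline, so nothing needs stripping.
printf %s A | od -An -tuC        # prints 65 (od pads the column with spaces)
# Several characters at once: one unsigned decimal value per byte.
printf %s abc | od -An -tuC      # prints 97, 98 and 99 (space-padded)
```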
If you want to extend it to UTF-8 characters (assuming you're in a UTF-8 locale):
$ perl -CA -le 'print ord shift' 😈
128520
$ perl -CS -le 'print chr shift' 128520
😈
With bash, ksh or zsh builtins:
$ printf "\U$(printf %08x 128520)\n"
😈
iceweasel on Debian sid. The font as confirmed by iceweasel's web console is "DejaVu Sans", and I've got the ttf-dejavu, ttf-dejavu-core and ttf-dejavu-extra packages installed, which come from Debian with upstream at http://dejavu-fonts.org/ – Stéphane Chazelas Oct 02 '13 at 09:10
ctbl() seems to properly enable me to display it, and to slice the char from the head of a string with printf, but it puts 4*((o1=360)>=(d1=240)|(o2=237)>=(d2=159)|(o3=230)>=(d3=152)|(o4=210)>=(d4=136)) in $OPTARG for the byte values. – mikeserv Nov 18 '15 at 05:52
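Those byte values are the UTF-8 encoding of code point 128520 (U+1F608): octal 360 237 230 210, decimal 240 159 152 136. As a cross-check, here is a minimal POSIX sh sketch of the UTF-8 encoding arithmetic. utf8_octets is a hypothetical helper, not part of any answer here, and it assumes its argument is a valid code point (0 to 0x10FFFF).

```shell
#!/bin/sh
# utf8_octets CP: print the UTF-8 bytes of code point CP as 3-digit octal.
# Hypothetical helper; assumes 0 <= CP <= 0x10FFFF. cp is global to stay POSIX.
utf8_octets() {
    cp=$1
    if [ "$cp" -lt 128 ]; then            # 1 byte:  0xxxxxxx
        printf '%03o\n' "$cp"
    elif [ "$cp" -lt 2048 ]; then         # 2 bytes: 110xxxxx 10xxxxxx
        printf '%03o %03o\n' $(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))
    elif [ "$cp" -lt 65536 ]; then        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        printf '%03o %03o %03o\n' $(( 0xE0 | (cp >> 12) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
    else                                  # 4 bytes: 11110xxx then three 10xxxxxx
        printf '%03o %03o %03o %03o\n' $(( 0xF0 | (cp >> 18) )) \
            $(( 0x80 | ((cp >> 12) & 0x3F) )) $(( 0x80 | ((cp >> 6) & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    fi
}

utf8_octets 128520   # 360 237 230 210
utf8_octets 336      # 305 220 (U+0150, the first character in the ctbl() demonstration)
```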
ctbl() for O in 0 1 2 3
do for o in 0 1 2 3 4 5 6 7
do for _o in 7 6 5 4 3 2 1 0
do case $((_o=(_o+=O*100+o*10)?_o:200)) in
(*00|*77) set "${1:+ \"}\\$_o${1:-\"}";;
(140|42) set '\\'"\\$_o$1" ;;
(*) set "\\$_o$1" ;esac
done; printf "$1"; shift
done
done
eval '
ctbl(){
${1:+":"} return "$((OPTARG=0))"
set "" "" "${1%"${1#?}"}"
for c in ${a+"a=$a"} ${b+"b=$b"} ${c+"c=$c"}\
${LC_ALL+"LC_ALL=$LC_ALL"}
do while case $c in (*\'\''*) ;; (*) ! \
set "" "${c%%=*}='\''${c#*=}$1'\'' $2" "$3"
esac;do set "'"'\''\${c##*\'}"'$@"; c=${c%\'\''*}
done; done; LC_ALL=C a=$3 c=;set "" "$2 OPTARG='\''${#a}*("
while [ 0 -ne "${#a}" ]
do case $a in ([[:print:][:cntrl:]]*)
case $a in (['"$(printf \\1-\\77)"']*)
b=0;; (*) b=1
esac;; (['"$( printf \\200-\\277)"']*)
b=2;; (*) b=3
esac; set '"$(ctbl)"' "$@"
eval " set \"\${$((b+1))%"'\''"${a%"${a#?}"}"*}" "$6"'\''
a=${a#?};set "$((b=b*100+${#1}+${#1}/8*2)))" \
"$2(o$((c+=1))=$b)>=(d$c=$((0$b)))|"
done; eval " unset LC_ALL a b c;${2%?})'\''"
return "$((${OPTARG%%\**}-1))"
}'
The first ctbl() - at the top there - only ever runs the one time. It generates the following output (which has been filtered through sed -n l for printability's sake):
ctbl | sed -n l
"\200\001\002\003\004\005\006\a\b\t$
\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\
\035\036\037 !\\"#$%&'()*+,-./0123456789:;<=>?" "@ABCDEFGHIJKLMNOPQRS\
TUVWXYZ[\\]^_\\`abcdefghijklmnopqrstuvwxyz{|}~\177" "\200\201\202\203\
\204\205\206\207\210\211\212\213\214\215\216\217\220\221\222\223\224\
\225\226\227\230\231\232\233\234\235\236\237\240\241\242\243\244\245\
\246\247\250\251\252\253\254\255\256\257\260\261\262\263\264\265\266\
\267\270\271\272\273\274\275\276\277" "\300\301\302\303\304\305\306\
\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\327\
\330\331\332\333\334\335\336\337\340\341\342\343\344\345\346\347\350\
\351\352\353\354\355\356\357\360\361\362\363\364\365\366\367\370\371\
\372\373\374\375\376\377"$
...which are all 8-bit bytes (less NUL), divided into four shell-quoted strings split evenly at 64-byte boundaries. The strings might be represented with octal ranges like \200\1-\77, \100-\177, \200-\277, \300-\377, where byte 128 is used as a placeholder for NUL.
The first ctbl()'s entire purpose for existence is to generate those strings so that eval may define the second ctbl() function with them literally embedded thereafter. In that way they can be referenced in the function without needing to generate them again each time they are needed. When eval does define the second ctbl() function, the first will cease to be.
The top half of the second ctbl() function is mostly ancillary here - it is designed to portably and safely serialize any current shell state it might affect when it is called. The top loop will quote any quotes in the values of any variables it might want to use, and then stack all of the results in its positional parameters.
The first two lines, though, do something different. The first immediately returns 0 (and sets $OPTARG to the same) if the function's first argument does not contain at least one character. If it does, the second line immediately truncates its first argument to only its first character, because the function only handles one character at a time. Importantly, it does this in the current locale context, which means that if a character comprises more than a single byte, then, provided the shell properly supports multi-byte chars, it will not discard any bytes except those which are not in the first character of its first argument.
${1:+":"} return "$((OPTARG=0))"
set "" "" "${1%"${1#?}"}"
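The ${1%"${1#?}"} expansion on the second line is a portable way to take the first character of a string: ${1#?} is the string minus its first character, and stripping that as a quoted suffix leaves only the first character. A minimal sketch with a hypothetical variable s:

```shell
#!/bin/sh
s=hello
rest=${s#?}               # "ello": s with its first character removed
first=${s%"$rest"}        # "h": s with that remainder stripped from its end
printf '%s %s\n' "$rest" "$first"    # ello h
# The same thing in one expansion, as ctbl() writes it:
printf '%s\n' "${s%"${s#?}"}"        # h
```

The inner quotes matter: they make the shell match the remainder literally, so characters such as * or ? in the string are not treated as patterns.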
It then does the save loop if at all necessary, and afterward it redefines the current locale context to the C locale for every category by assigning to the LC_ALL variable. From this point on, a character can only consist of a single byte, and so if there were multiple bytes in the first character of its first argument, these should now each be addressable as individual characters in their own right.
LC_ALL=C
It is for this reason that the second half of the function is a while loop, as opposed to a singly run sequence. In most cases it will probably execute only once per call, but, if the shell in which ctbl() is defined properly handles multi-byte characters, it might loop.
while [ 0 -ne "${#a}" ]
do case $a in ([[:print:][:cntrl:]]*)
case $a in (['"$(printf \\1-\\77)"']*)
b=0;; (*) b=1
esac;; (['"$( printf \\200-\\277)"']*)
b=2;; (*) b=3
esac; set '"$(ctbl)"' "$@"
Note that the above $(ctbl) command substitution is only ever evaluated once - by eval when the function is initially defined - and that forever after that token is replaced with the literal output of that command substitution as saved into the shell's memory. The same is true of the two case pattern command substitutions. This function never calls a subshell or any other command. It will also never attempt to read or write input/output (except in the case of some shell diagnostic message - which probably indicates a bug).
Also note that the test for loop continuity is not simply [ -n "$a" ], because, as I found to my frustration, for some reason a bash shell does:
char=$(printf \\1)
[ -n "$char" ] || echo but it\'s not null\!
but it's not null!
...and so I explicitly compare $a's len to 0 for each iteration, which, also inexplicably, behaves differently (read: correctly).
The case checks the first byte for inclusion in any of our four strings and stores a reference to the byte's set in $b. Afterward the shell's first four positional parameters are set to the strings embedded by eval and written by ctbl()'s predecessor.
Next, whatever remains of the first argument is again temporarily truncated to its first character - which should now be assured to be a single byte. This first byte is used as a reference to strip from the tail of the string which it matched, and the reference in $b is eval'd to represent a positional parameter, so everything from the reference byte to the last byte in the string can be substituted away. The other three strings are dropped from the positional parameters entirely.
eval " set \"\${$((b+1))%"'\''"${a%"${a#?}"}"*}" "$6"'\''
a=${a#?};set "$((b=b*100+${#1}+${#1}/8*2)))" \
"$2(o$((c+=1))=$b)>=(d$c=$((0$b)))|"
At this point the byte's value (modulo 64) can be referenced as the string's len:
str=$(printf '\200\1\2\3\4\5\6\7')
ref=$(printf \\4)
str=${str%"$ref"*}
echo "${#str}"
4
A little math is then done to reconcile the modulus based on the value in $b, the first byte in $a is permanently stripped away, and output for the current cycle is appended to a stack pending completion before the loop recycles to check if $a is actually empty.
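The "little math" is the b=b*100+${#1}+${#1}/8*2 arithmetic. It turns the block number b (0 to 3, one per 64-byte string) and the byte's offset within that block (the remaining string length ${#1}) into the byte's octal digits read as a decimal number: b is the leading octal digit, and offset + offset/8*2 equals (offset/8)*10 + offset%8. A sketch with a hypothetical function name that takes a byte value directly instead of deriving b and the offset from the embedded strings:

```shell
#!/bin/sh
# byte_octal V: hypothetical helper mirroring ctbl()'s arithmetic for one byte 1-255.
# Prints the octal digits (as ctbl() stores them in oN) and the decimal value (dN).
byte_octal() {
    b=$(( $1 / 64 ))                       # which of the four 64-byte strings
    len=$(( $1 % 64 ))                     # offset in that string, i.e. ${#1} in ctbl()
    o=$(( b * 100 + len + len / 8 * 2 ))   # octal digits, read as a decimal number
    d=$(( 0$o ))                           # leading 0: the shell reads o back as octal
    printf '%s %s\n' "$o" "$d"
}
byte_octal 197   # 305 197
byte_octal 65    # 101 65
```

Because the octal digits read as decimal are never smaller than the byte's decimal value, each (oN=...)>=(dN=...) comparison in ctbl()'s output is always true.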
eval " unset LC_ALL a b c;${2%?})'\''"
return "$((${OPTARG%%\**}-1))"
When $a definitely is empty, all names and state - with the exception of $OPTARG - that the function affected throughout the course of its execution are restored to their previous state - whether set and not null, set and null, or unset - and the output is saved to $OPTARG as the function returns. The actual return value is one less than the total number of bytes in the first character of its first argument - so any single byte character returns zero and any multi-byte char will return more than zero - and its output format is a little strange.
The value ctbl() saves to $OPTARG is a valid shell arithmetic expression that, if evaluated, will concurrently set variables of the forms $o1, $d1, $o2, $d2 to the octal and decimal values of all respective bytes in the first character of its first argument, but ultimately evaluate to the total number of bytes in that character. I had a specific kind of workflow in mind when writing this, and I think maybe a demonstration is in order.
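To see what evaluating such an expression does, here is a sketch that takes the $OPTARG value ctbl() produces for Ő (copied from the demonstration further down) and evaluates it by hand, without calling ctbl(). The comparisons are all true, so the | chain is 1 and the whole expression comes out to the byte count, while the assignments inside it set the oN/dN variables as a side effect.

```shell
#!/bin/sh
OPTARG='2*((o1=305)>=(d1=197)|(o2=220)>=(d2=144))'   # ctbl()'s output for Ő
n=$(($OPTARG))                 # evaluates to 2 and assigns o1 d1 o2 d2
echo "$n $o1 $d1 $o2 $d2"      # 2 305 197 220 144
printf "\\$o1\\$o2\n"          # writes bytes \305\220, i.e. Ő on a UTF-8 terminal
```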
I often find a reason to take a string apart with getopts like:
str=some\ string OPTIND=1
while getopts : na -"$str"
do printf %s\\n "$OPTARG"
done
s
o
m
e
s
t
r
i
n
g
I probably do a little more than just print it a char per line, but anything's possible. In any case, I haven't yet found a getopts that will properly do (strike that - dash's getopts does it char by char, but bash definitely doesn't):
str=ŐőŒœŔŕŖŗŘřŚśŜŝŞş OPTIND=1
while getopts : na -"$str"
do printf %s\\n "$OPTARG"
done| od -tc
0000000 305 \n 220 \n 305 \n 221 \n 305 \n 222 \n 305 \n 223 \n
0000020 305 \n 224 \n 305 \n 225 \n 305 \n 226 \n 305 \n 227 \n
0000040 305 \n 230 \n 305 \n 231 \n 305 \n 232 \n 305 \n 233 \n
0000060 305 \n 234 \n 305 \n 235 \n 305 \n 236 \n 305 \n 237 \n
0000100
Ok. So I tried...
str=ŐőŒœŔŕŖŗŘřŚśŜŝŞş
while [ 0 -ne "${#str}" ]
do printf %c\\n "$str" #identical results for %.1s
str=${str#?}
done| od -tc
#dash
0000000 305 \n 220 \n 305 \n 221 \n 305 \n 222 \n 305 \n 223 \n
0000020 305 \n 224 \n 305 \n 225 \n 305 \n 226 \n 305 \n 227 \n
0000040 305 \n 230 \n 305 \n 231 \n 305 \n 232 \n 305 \n 233 \n
0000060 305 \n 234 \n 305 \n 235 \n 305 \n 236 \n 305 \n 237 \n
0000100
#bash
0000000 305 \n 305 \n 305 \n 305 \n 305 \n 305 \n 305 \n 305 \n
*
0000040
That kind of workflow - the byte for byte/char for char kind - is one I often get into when doing tty stuff. At the leading edge of input you need to know char values as soon as you read them, and you need their sizes (especially when counting columns), and you need characters to be whole characters.
And so now I have ctbl():
str=ŐőŒœŔŕŖŗŘřŚśŜŝŞş
while [ 0 -ne "${#str}" ]
do ctbl "$str"
printf "%.$(($OPTARG))s\t::\t$OPTARG\t::\t$?\t::\t\\$o1\\$o2\n" "$str"
str=${str#?}
done
Ő :: 2*((o1=305)>=(d1=197)|(o2=220)>=(d2=144)) :: 1 :: Ő
ő :: 2*((o1=305)>=(d1=197)|(o2=221)>=(d2=145)) :: 1 :: ő
Œ :: 2*((o1=305)>=(d1=197)|(o2=222)>=(d2=146)) :: 1 :: Œ
œ :: 2*((o1=305)>=(d1=197)|(o2=223)>=(d2=147)) :: 1 :: œ
Ŕ :: 2*((o1=305)>=(d1=197)|(o2=224)>=(d2=148)) :: 1 :: Ŕ
ŕ :: 2*((o1=305)>=(d1=197)|(o2=225)>=(d2=149)) :: 1 :: ŕ
Ŗ :: 2*((o1=305)>=(d1=197)|(o2=226)>=(d2=150)) :: 1 :: Ŗ
ŗ :: 2*((o1=305)>=(d1=197)|(o2=227)>=(d2=151)) :: 1 :: ŗ
Ř :: 2*((o1=305)>=(d1=197)|(o2=230)>=(d2=152)) :: 1 :: Ř
ř :: 2*((o1=305)>=(d1=197)|(o2=231)>=(d2=153)) :: 1 :: ř
Ś :: 2*((o1=305)>=(d1=197)|(o2=232)>=(d2=154)) :: 1 :: Ś
ś :: 2*((o1=305)>=(d1=197)|(o2=233)>=(d2=155)) :: 1 :: ś
Ŝ :: 2*((o1=305)>=(d1=197)|(o2=234)>=(d2=156)) :: 1 :: Ŝ
ŝ :: 2*((o1=305)>=(d1=197)|(o2=235)>=(d2=157)) :: 1 :: ŝ
Ş :: 2*((o1=305)>=(d1=197)|(o2=236)>=(d2=158)) :: 1 :: Ş
ş :: 2*((o1=305)>=(d1=197)|(o2=237)>=(d2=159)) :: 1 :: ş
Note that ctbl() doesn't actually define the $[od][12...] variables - it never has any lasting effect on any state but $OPTARG - but only puts the string in $OPTARG that can be used to define them. That is how I get the second copy of each char above by doing printf "\\$o1\\$o2": they are set each time I evaluate $(($OPTARG)). But where I do it I'm also declaring a precision for printf's %s string argument format, and because the expression always evaluates to the total number of bytes in a character, I get the whole character on output when I do:
printf %.2s "$str"
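The precision trick can be tried on its own. In the C locale a %.Ns precision counts bytes, so %.2s copies exactly the two bytes of a two-byte UTF-8 character. This sketch builds the string from raw bytes (those of "Őő") so that it does not depend on the terminal's encoding:

```shell
#!/bin/sh
str=$(printf '\305\220\305\221')               # UTF-8 bytes of "Őő"
# A precision of 2 copies the first two bytes, which is the whole "Ő".
LC_ALL=C printf '%.2s' "$str" | od -An -tuC    # 197 144 (space-padded)
```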
[ "$(printf \\1)" ]|| ! echo but its not null!
meanwhile, feel free to better acquaint yourself with meaningful comment practice, unless you recommend an actual such contest...? – mikeserv Nov 09 '18 at 14:20
sh command language. bash is a Bourne-again superset of same, and in large part a precipitous motivator for much of the care afforded above toward widely portable, self-expanding and namespace-honorable character sizes of any kind. bash should handle much of this already, but the C language printf was, and maybe is, deficient in the capability provided above. – mikeserv Nov 10 '18 at 09:28
printf problem with multibyte characters. – mikeserv Feb 27 '19 at 23:49
I'm going for the simple (and elegant?) Bash solution:
for i in {a..z}; do echo $(printf "%s %d" "$i" "'$i"); done
For use in a script, you can use the following:
CharValue="A"
AscValue=`printf "%d" "'$CharValue"`
Notice the single quote before the CharValue. It is required...
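The same two ideas without the echo $(...) wrapper: printf can convert and print in one step. ord_of is a hypothetical name that just packages the quote trick for scripts, and {a..z} is bash brace expansion, so this sketch needs bash.

```shell
#!/bin/bash
# Each lowercase letter with its code; printf does the conversion itself.
for c in {a..z}; do
    printf '%s %d\n' "$c" "'$c"
done

# Hypothetical helper: print the code of the character given as $1.
ord_of() {
    LC_CTYPE=C printf '%d\n' "'$1"
}
AscValue=$(ord_of A)
echo "$AscValue"    # 65
```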
printf "%d". – Bernhard Sep 27 '13 at 09:11
Not a shell script, but it works:
awk 'BEGIN{for( i=97; i<=122;i++) printf "%c %d\n",i,i }'
Sample output:
xieerqi:$ awk 'BEGIN{for( i=97; i<=122;i++) printf "%c %d\n",i,i }' | head -n 5
a 97
b 98
c 99
d 100
e 101
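For comparison, roughly the same table in plain sh, going from number to character with the octal-escape trick used by the chr() function earlier on this page. code_table is a hypothetical name; because each character ends up inside printf's format string, it is only meant for letters and other harmless printable characters.

```shell
#!/bin/sh
# code_table FROM TO: hypothetical helper printing "char code" for each code.
code_table() {
    i=$1
    while [ "$i" -le "$2" ]; do
        # %03o turns i into a 3-digit octal escape that the outer printf expands.
        printf "\\$(printf '%03o' "$i") %d\n" "$i"
        i=$((i + 1))
    done
}
code_table 97 122
```

code_table 97 101 prints the same five lines as the awk sample output above.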
In konsole, type xxd<press enter>, then <SHIFT+INSERT><CTRL+D>.
You get something like:
mariank@dd903c5n1 ~ $ xxd
û0000000: fb
so you know the symbol you pasted has hex code 0xfb.
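0xfb is the code of û in ISO 8859-1 (Latin-1), so that terminal was presumably not sending UTF-8; in a UTF-8 terminal the same paste gives two bytes. A sketch with od in place of xxd (od is a POSIX utility, xxd usually ships with vim), feeding the bytes explicitly so the result does not depend on the terminal:

```shell
#!/bin/sh
printf '\373' | od -An -tx1          # fb: û in Latin-1 (ISO 8859-1)
printf '\303\273' | od -An -tx1      # c3 bb: û (U+00FB) in UTF-8
```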
If you want to print out the decimal representation of the UTF-8 value, I endorse dsmsk80's solution. If, on the other hand, you need to assign the value to a variable, there is a mechanism within Bash's printf that works faster. Let us assume that you want to assign the ASCII value of "A" (which is 65 in decimal, and which we have assigned to a variable, theChar) to a variable myVar. Inlining dsmsk80's ord() function we would get:
LC_CTYPE=C myVar=$(printf "%d" "'$theChar")
For this assignment to take place, Bash runs printf inside a command substitution: it forks a subshell, the subshell writes the formatted decimal digits to a pipe, and the parent shell reads them back and stores them in myVar. To avoid the subshell we can take advantage of the -v flag of Bash's printf builtin, which assigns the formatted result directly to the named variable. The syntax is as follows:
LC_CTYPE=C printf -v myVar "%d" "'$theChar"
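The -v flag also makes it easy to write an ord that stores its result in a variable named by the caller, so no subshell is needed at the call site either. ord_into is a hypothetical name, and this needs bash, since printf -v is a Bash extension rather than POSIX.

```shell
#!/bin/bash
# ord_into VAR CHAR: store the (byte) code of CHAR in the variable named VAR.
ord_into() {
    LC_CTYPE=C printf -v "$1" '%d' "'$2"
}
ord_into myVar A
echo "$myVar"   # 65
```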
I discovered this because I needed to create a Bash script that gave me the Fowler-Noll-Vo hash for each line of a text file, which I quote here:
#!/bin/bash
export LC_CTYPE=C
prime=16777619        #FNV prime
offset=2166136261     #FNV offset basis
mask=0xffffffff       #bitmask
while IFS= read -r line || [[ -n $line ]]   #foreach line in file (including a last line w/o trailing newline)
do
    hash=$offset                                        #set hash to offset for line.
    for (( i=0; i<${#line}; i++ ))                      #foreach char in line
    do
        printf -v charVal "%d" "'${line:$i:1}"          #use printf -v trick.
        hash=$(( ( ( hash ^ charVal ) * prime ) & mask ))   #update FNV-1a hash for char.
    done
    printf "%08X\n" "$hash"                             #print hash result for line.
done < "$1"
Using the -v option for assignment with printf resulted in a 50X performance improvement when run in Cygwin64.
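The inner loop of that script can be pulled out into a function for a single string, which also makes it easy to check. A sketch, with fnv1a32 as a hypothetical name; like the script it works byte by byte in the C locale. Working the FNV-1a arithmetic by hand, the hash of "a" comes out to E40C292C, and the empty string gives the offset basis 811C9DC5.

```shell
#!/bin/bash
# fnv1a32 STRING: print the 32-bit FNV-1a hash of STRING in hex.
fnv1a32() {
    local LC_ALL=C                          # make every byte one character
    local str=$1 hash=2166136261 i c        # 2166136261 = FNV offset basis
    for (( i = 0; i < ${#str}; i++ )); do
        printf -v c '%d' "'${str:i:1}"      # byte value, no subshell
        hash=$(( ((hash ^ c) * 16777619) & 0xffffffff ))   # xor, then multiply by FNV prime
    done
    printf '%08X\n' "$hash"
}
fnv1a32 ""    # 811C9DC5
fnv1a32 a     # E40C292C
```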
"'A" is correct, whereas if you use "A" it will say A: invalid number. It seems it's done on the printf side (i.e., in the shell, "'A" is indeed 2 chars, a ' and an A. Those are passed to printf. And in the printf context, it is converted to the ASCII value of A (and is finally printed as a decimal thanks to the '%d'. Use '0x%x' to show it in hex or '0%o' to have it in octal)). – Olivier Dulac Sep 26 '13 at 11:05
printf "\\$(printf '%03o' "$1")", '%03o', LC_CTYPE=C and the single quote in "'$1" do? – razzak Dec 04 '14 at 19:42
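Checking the hex and octal variants from Olivier Dulac's comment on A: the quote prefix works with any of printf's integer conversions, so the same code can be shown in decimal, hex or octal.

```shell
#!/bin/sh
printf '%d\n'   "'A"    # 65
printf '0x%x\n' "'A"    # 0x41
printf '0%o\n'  "'A"    # 0101
```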