14

I am writing a script which needs to calculate the number of characters in a command's output in a single step.

For example, the command readlink -f /etc/fstab should give 10, because its output is 10 characters long.

This is already possible with stored variables using the following code:

variable="somestring";
echo ${#variable};
# 10

Unfortunately, using the same formula with a command-generated string does not work:

${#(readlink -f /etc/fstab)};
# bash: ${#(readlink -f /etc/fstab)}: bad substitution

I understand it is possible to do this by first saving the output to a variable:

variable=$(readlink -f /etc/fstab);
echo ${#variable};

But I would like to remove the extra step.

Is this possible? Compatibility with the Almquist shell (sh) using only in-built or standard utilities is preferable.

Braiam
  • 35,991

5 Answers

9

With GNU expr:

$ expr length + "$(readlink -f /etc/fstab)"
10

The + there is a special feature of GNU expr to make sure the next argument is treated as a string even if it happens to be an expr operator like match, length, +...

The above will strip any trailing newline of output. To work around it:

$ expr length + "$(readlink -f /etc/fstab; printf .)" - 2
10

We subtract 2 from the result to account for the final newline of readlink's output and the . character we added.
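
To see why the sentinel character is needed, here is a minimal sketch. It uses printf as a stand-in for the real command, and the variable name out is only there to make the two lengths easy to compare; both are assumptions for illustration, not part of the answer above.

```shell
# Command substitution drops every trailing newline, so the naive length
# undercounts. printf stands in for the real command here.
out=$(printf 'abc\n\n')
echo "${#out}"               # 3: both trailing newlines were stripped

# Appending a non-newline sentinel protects the newlines; subtract 1 for it.
out=$(printf 'abc\n\n'; printf .)
echo "$(( ${#out} - 1 ))"    # 5: 'abc' plus the two newlines
```

Subtracting one more, as in the expr example, also discounts the single newline that readlink prints after the path.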

With Unicode strings, expr length does not seem to work, because it returns the length of the string in bytes rather than the number of characters (see line 654):

$ LC_ALL=C.UTF-8 expr length ăaa
4

So, you can use:

$ printf "ăaa" | LC_ALL=C.UTF-8 wc -m
3

With POSIX expr:

$ expr " $(readlink -f /etc/fstab; printf .)" : ".*" - 3
10

The space before the command substitution keeps expr from misbehaving when the string starts with -. That adds one more character, so we need to subtract 3: the space, the final newline, and the dot.
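
The same idea can be wrapped in a small function. This is only a sketch: the name output_length is hypothetical, and it assumes the command's output ends in exactly one newline, as readlink's does.

```shell
# Hypothetical helper: length of a command's output, not counting its one
# trailing newline, using only POSIX expr.
output_length() {
  # The leading space guards against output that starts with "-" or looks
  # like an expr keyword; the dot protects the final newline.
  # Subtract 3: the space, the newline, and the dot.
  expr " $("$@"; printf .)" : ".*" - 3
}

output_length printf '%s\n' '-n'       # prints 2
output_length printf '%s\n' 'length'   # prints 6
```

Note that expr exits with status 1 when the result is 0, which matters if the script runs under set -e.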

cuonglm
  • 153,898
  • Thanks! It seems that your third example works even without the LC_ALL=C.UTF-8, which significantly simplifies things if the encoding of the string will not be known in advance. – user339676 Oct 11 '14 at 08:10
  • 2
    expr length $(echo "*") — nope. At least use double quotes: expr length "$(…)". But this strips off trailing newlines from the command, it's an unescapable feature of command substitution. (You can work around it, but then the answer becomes even more complex.) – Gilles 'SO- stop being evil' Oct 11 '14 at 14:12
6

Not sure how to do this with shell builtins (Gnouc is though) but the standard tools can help:

  1. You can use wc -m which counts characters. Unfortunately, it also counts the final newline so you'd have to get rid of that first:

    readlink -f /etc/fstab | tr -d '\n' | wc -m
    
  2. You can of course use awk

    readlink -f /etc/fstab | awk '{print length($0)}'
    
  3. Or Perl

    readlink -f /etc/fstab | perl -lne 'print length'
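
The three pipelines agree for single-line output such as readlink's, but they behave differently when a command prints several lines. A small sketch, with a hypothetical two_lines function standing in for such a command:

```shell
# two_lines is a stand-in for a command that prints two lines.
two_lines() { printf 'ab\ncde\n'; }

# tr + wc: a single total with all newlines removed: 5
# (some wc implementations pad the number with spaces)
two_lines | tr -d '\n' | wc -m

# awk: one length per input line: 2, then 3
two_lines | awk '{print length($0)}'
```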
    
terdon
  • 242,166
5

I usually do it like this:

$ echo -n "$variable" | wc -m
10

To do commands I'd adapt it like so:

$ echo -n "$(readlink -f /etc/fstab)" | wc -m
10

This approach is similar to your two-step version, except that the steps are combined into a single one-liner.
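
A possible variation, sketched here as an assumption rather than as part of the approach above: printf '%s' prints its argument literally, whereas echo -n is not portable and may treat values such as -e as options. The function name string_length is hypothetical.

```shell
# Hypothetical helper: count the characters of a string with printf and wc.
string_length() {
  # $(( ... )) strips the padding that some wc implementations add.
  echo "$(( $(printf '%s' "$1" | wc -m) ))"
}

string_length "somestring"   # 10
string_length "-e"           # 2
```

For a command, string_length "$(readlink -f /etc/fstab)" works the same way, with trailing newlines already removed by the command substitution.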

slm
  • 369,824
  • 2
    You must use -m instead of -c. With unicode characters, your approach will be broken. – cuonglm Oct 11 '14 at 02:48
  • 1
    Why not simply readlink -f /etc/fstab | wc -m ? – Phil Frost Oct 11 '14 at 13:30
  • 1
    Why do you use this unreliable method instead of ${#variable}? At least use double quotes echo -n "$variable", but this still fails if e.g. the value of variable is -e. When you use it in combination with a command substitution, keep in mind that trailing newlines are stripped off. – Gilles 'SO- stop being evil' Oct 11 '14 at 14:10
  • @philfrost b/c what I showed built off of what the op was already thinking. Also it works for any cmds that he may have setup prior in vars and wants their lengths afterwards. Also terdon has that example already. – slm Oct 11 '14 at 14:47
1

You can call external utilities (see other answers), but they will make your script slower, and it's hard to get the plumbing right.

Zsh

In zsh, you can write ${#$(readlink -f /etc/fstab)} to get the length of the command substitution. Note that this isn't the length of the command output, it's the length of the output without any trailing newline.

If you want the exact length of the output, output an extra non-newline character at the end, and subtract one.

$((${#$(readlink -f /etc/fstab; echo .)} - 1))

If what you want is the payload in the command's output, then you need to subtract two here, because the output of readlink -f is the canonical path plus a newline.

$((${#$(readlink -f /etc/fstab; echo .)} - 2))

This differs from ${#$(readlink -f /etc/fstab)} in the rare but possible case where the canonical path itself ends in a newline.

For this specific example, you don't need an external utility at all, because zsh has a built-in construct that's equivalent to readlink -f, through the history modifier A.

echo /etc/fstab(:A)

To get the length, use the history modifier in a parameter expansion:

${#${:-/etc/fstab}:A}

If you have the file name in a variable filename, that would be ${#filename:A}.

Bourne/POSIX-style shells

None of the pure Bourne/POSIX shells (Bourne, ash, mksh, ksh93, bash, yash …) have any similar extension that I know of. If you need to apply a parameter substitution to the output of a command substitution or to nest parameter substitutions, use successive stages.

You can stuff the processing into a function if you like.

command_output_length_sans_trailing_newlines () {
  set -- "$("$@")"
  echo "${#1}"
}

or

command_output_length () {
  set -- "$("$@"; echo .)"
  echo "$((${#1} - 1))"
}

but there's usually no benefit: except in ksh93, using the output of the function requires an extra fork, which makes your script slower, and there's rarely any readability gain.

Once again, the output of readlink -f is the canonical path plus a newline; if you want the length of the canonical path, subtract 2 instead of 1 in command_output_length. Using command_output_length_sans_trailing_newlines gives the right result only when the canonical path itself doesn't end in a newline.
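
As a quick illustration of the difference between the two functions, here they are applied to a printf call that stands in for a command whose output ends in one newline (the stand-in is an assumption for the example):

```shell
command_output_length_sans_trailing_newlines () {
  set -- "$("$@")"
  echo "${#1}"
}

command_output_length () {
  set -- "$("$@"; echo .)"
  echo "$((${#1} - 1))"
}

# The output is 'abc' followed by one newline.
command_output_length_sans_trailing_newlines printf 'abc\n'   # 3
command_output_length printf 'abc\n'                          # 4
```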

Bytes vs characters

${#…} is supposed to be the length in characters, not in bytes, which makes a difference in multibyte locales. Reasonably up-to-date versions of ksh93, bash and zsh calculate the length in characters according to the value of LC_CTYPE at the time the ${#…} construct is expanded. Many other common shells don't really support multibyte locales: as of dash 0.5.7, mksh 46 and posh 0.12.3, ${#…} returns the length in bytes. If you want the length in characters in a reliable way, use the wc utility:

$(readlink -f /etc/fstab | wc -m)

As long as $LC_CTYPE designates a valid locale, you can be confident that this will either error out (on an ancient or restricted platform that doesn't support multibyte locales) or return the correct length in characters. (For Unicode, “length in characters” means the number of code points — number of glyphs is yet another story, due to complications such as combining characters.)

If you want the length in bytes, set LC_CTYPE=C temporarily, or use wc -c instead of wc -m.
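
A small sketch of the difference, using a hypothetical sample function that prints "ăaa". The locale name C.UTF-8 is an assumption; it is not installed on every system, and without a UTF-8 locale wc -m may simply count bytes.

```shell
# "ă" (U+0103) is two bytes in UTF-8; octal escapes keep this script ASCII.
sample() { printf '\304\203aa'; }

# Length in bytes: 4
sample | wc -c

# Length in characters: 3, provided the C.UTF-8 locale is available.
sample | LC_ALL=C.UTF-8 wc -m
```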

Counting bytes or characters with wc includes any trailing newlines from the command. If you want the length of the canonical path in bytes, it's

$(($(readlink -f /etc/fstab | wc -c) - 1))

To get it in characters, use wc -m instead of wc -c; the trailing newline is a single character either way, so still subtract 1.

  • @cuonglm No, you need to subtract 1. echo . adds two characters, but the second character is a trailing newline which is stripped by the command substitution. – Gilles 'SO- stop being evil' Dec 17 '15 at 10:31
  • The newline is from readlink output, plus the . by echo. We both agree that echo . add two characters but the trailing newline was stripped. Try with printf . or see my answer http://unix.stackexchange.com/a/160499/38906. – cuonglm Dec 17 '15 at 10:34
  • @cuonglm The question asked the number of characters in the command's output. The output of readlink is the link target plus a newline. – Gilles 'SO- stop being evil' Dec 17 '15 at 10:42
0

This works in dash but it does require that the targeted var is definitely empty or unset. That is why this is actually two commands - I explicitly empty $l in the first:

l=;printf '%.slen is %d and result is %s\n' \
    "${l:=$(readlink -f /etc/fstab)}" "${#l}" "$l"

OUTPUT

len is 10 and result is /etc/fstab

That's all shell builtins - not including the readlink of course - but evaluating it in the current shell that way implies that you must do the assignment before getting the len, which is why I %.silence the first argument in the printf format string and just add it again for the literal value at the tail of printf's arg list.

With eval:

l=$(readlink -f /etc/fstab) eval 'l=${#l}:$l'
printf %s\\n "$l"

OUTPUT

10:/etc/fstab

You can get close to that same thing, but instead of the output in a variable in the first command you get it on stdout:

PS4='${#0}:$0' dash -cx '2>&1' "$(readlink -f /etc/fstab)"

...which writes...

10:/etc/fstab

...to file descriptor 1 without assigning any value to any vars in the current shell.

mikeserv
  • 58,310
  • 1
    Isn't that exactly what the OP wanted to avoid? "I understand it is possible to do this by first saving the output to a variable: variable=$(readlink -f /etc/fstab); echo ${#variable}; But I would like to remove the extra step." – terdon Oct 11 '14 at 13:35
  • @terdon, probably I misunderstood, but it was my impression that the semicolon was the problem and not the variable. That's why these get the len and output in a single simple command using only shell builtins. The shell doesn't exec readlink then exec expr, for instance. It probably only matters if somehow getting the len occludes the value, which I admit I'm having difficulty understanding why that may be, but I suspect there could be a case in which it mattered. – mikeserv Oct 11 '14 at 17:28
  • 1
    The eval way, by the way, is probably the cleanest here - it assigns the output and the len to the same var name in a single execution - very close to doing l=length(l):out(l). Doing expr length $(command) does occlude the value in favor of the len, by the way. – mikeserv Oct 11 '14 at 17:34