Explanation of `sed` usage in specific shell script

Question

While reading an online tutorial, I came across the following code:

```
#!/bin/bash
# Counting the number of lines in a list of files
# for loop over arguments
# count only those files I am owner of

if [ $# -lt 1 ]
then
  echo "Usage: $0 file ..."
  exit 1
fi

echo "$0 counts the lines of code"
l=0
n=0
s=0
for f in $*
do
  if [ -O $f ] # checks whether file owner is running the script
  then
      l=`wc -l $f | sed 's/^\([0-9]*\).*$/\1/'`
      echo "$f: $l"
      n=$[ $n + 1 ]
      s=$[ $s + $l ]
  else
      continue
  fi
done

echo "$n files in total, with $s lines in total"
```

What is the purpose of the sed call in this example?

Please don't post images of text, copy and paste the text directly into the question in a code block. Also if this is coursework I recommend finding another course. — jesse_b, May 24 '18 at 13:15
Thank you very much for helping me! It was very hard for me to learn Linux from the internet, and I do realise now that this coursework is a little bit inefficient :) — Otilia Domnea, May 24 '18 at 13:49
But I still have a little problem... and I would be very grateful if you could give me one more helping hand. Having read your answer, I understood the output of the sed command, but I would like you to explain to me, step by step, what sed 's/^\([0-9]*\).*$/\1/' means (why did they choose those arguments) :) — Otilia Domnea, May 24 '18 at 13:55

jesse_b · Accepted Answer · 2018-05-24T14:02:33.697

The sed command in example 6 pulls only the number of lines out of the wc -l output.

It's running wc -l on $f (the file, owned by the user running the script, that was passed in as an argument). This would normally produce output like this:

$ wc -l .bashrc
17 .bashrc

The number of lines is in column 1 and the filename in column 2. The sed command grabs only the number of lines, in a pretty unnecessary way.

$ wc -l .bashrc | sed 's/^\([0-9]*\).*$/\1/'
17

The sed expression 's/^\([0-9]*\).*$/\1/' does the following:

s/REGEX/REPLACEMENT/ - Replace the part of the line matched by REGEX with REPLACEMENT
^ - Match the beginning of the line
\([0-9]*\) - Match any digit zero or more times (the escaped parentheses form a capture group)
.* - Match any character zero or more times
$ - Match the end of the line
\1 - Represents the contents of the first capture group.

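Essentially this matches the whole line, captures the run of digits at its start, and replaces the whole line with that first capture group (the number). Because [0-9]* can also match zero digits, a line that does not start with a digit still matches, and is replaced with an empty string.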
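
To see the capture group at work without a real file, the sketch below feeds the same sed expression two sample lines via a small helper, here called extract (a hypothetical name used only for illustration). The second sample assumes space-padded wc output, which some implementations (for example BSD/macOS wc) produce; it shows why [0-9]* capturing zero digits matters.

```shell
#!/bin/sh
# Apply the tutorial's sed expression to a single "wc -l"-style line.
extract() {
    # ^\([0-9]*\) captures the leading run of digits (possibly empty),
    # .*$ consumes the rest of the line, and \1 keeps only the capture.
    printf '%s\n' "$1" | sed 's/^\([0-9]*\).*$/\1/'
}

a=$(extract '17 .bashrc')
printf 'unpadded: [%s]\n' "$a"

# [0-9]* may match zero digits, so a line starting with spaces still
# matches, but the captured group (and therefore the result) is empty.
b=$(extract '      17 .bashrc')
printf 'padded:   [%s]\n' "$b"
```

With unpadded input the result is 17; with padded input it is an empty string, so the script would silently lose the count on such systems.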

Thanks to Stephen Kitt for recommending this:

$ wc -l < .bashrc
17

Otherwise, using cut or awk would be better for something like this:

$ wc -l .bashrc | cut -d' ' -f1
17

$ wc -l .bashrc | awk '{print $1}'
17
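
The three alternatives can be compared side by side. This is a minimal sketch that creates a throwaway three-line file with mktemp (an assumption of the example; any readable file works) and extracts the line count each way.

```shell
#!/bin/sh
# Create a temporary three-line file to compare the approaches on.
tmp=$(mktemp) || exit 1
printf 'one\ntwo\nthree\n' >"$tmp"

# Input redirection: wc never sees a filename, so it prints only the count.
by_redirect=$(wc -l <"$tmp")

# awk splits fields on runs of blanks, so leading padding does not matter.
by_awk=$(wc -l "$tmp" | awk '{ print $1 }')

# cut splits on each single space, so it relies on the count starting in
# column 1 (true for GNU wc given a single filename).
by_cut=$(wc -l "$tmp" | cut -d' ' -f1)

printf 'redirect=%s awk=%s cut=%s\n' "$by_redirect" "$by_awk" "$by_cut"
rm -f "$tmp"
```

The redirection form needs no extra process at all, which is why it is the usual recommendation.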

Kusalananda · Answer 2 · 2018-05-24T17:02:05.077

The use of sed in that piece of code is to parse the output of wc -l to extract the number of lines in the file.

This is usually not needed as

l=$( wc -l <"$f" )

would have done the same thing (you should try this).

The script is using a few constructs that are non-portable and considered "obsolete", and there are details in the script that make it unsafe.

Expansions should be quoted. For example, if [ $# -lt 1 ] is better written as if [ "$#" -eq 0 ], and if [ -O $f ] should be if [ -O "$f" ]. This way we can support filenames that contain any characters, even characters that are part of $IFS (spaces, tabs and newlines). The $# should be quoted in case $IFS contains digits for some reason or other.

For more on this, see the three other questions entitled "Security implications of forgetting to quote a variable in bash/POSIX shells", "Why does my shell script choke on whitespace or other special characters?" and "When is double-quoting necessary?".
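
As a quick illustration of the quoting point, this sketch creates a file whose name contains a space (the name my file.txt and the temporary directory are assumptions of the example) and runs the test both ways.

```shell
#!/bin/sh
# A filename containing a space, inside a fresh temporary directory.
dir=$(mktemp -d) || exit 1
f="$dir/my file.txt"
: >"$f"

# Unquoted: $f is split into two words, so [ receives too many arguments
# and fails with an error instead of testing the file.
if [ -O $f ] 2>/dev/null; then unquoted=yes; else unquoted=no; fi

# Quoted: [ receives the whole filename as a single argument.
if [ -O "$f" ]; then quoted=yes; else quoted=no; fi

printf 'unquoted: %s, quoted: %s\n' "$unquoted" "$quoted"
rm -rf "$dir"
```

Only the quoted test correctly reports that the user running the script owns the file.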
Command substitution using backticks is troublesome under some circumstances. The line saying l=`wc -l ...` could be rewritten as l=$(wc -l ...). The newer $(...) form is better because it nests, because quoting inside it works as expected (compare e.g. echo "`echo "`echo ok`"`", which generates a syntax error, with echo "$(echo "$(echo ok)")"), and because it is easier to read.

For more on this, see e.g. "Have backticks (i.e. `cmd`) in *sh shells been deprecated?"
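
A small sketch of the nesting difference: the $(...) form nests as written, while the backtick form only works if the inner backticks are escaped with backslashes.

```shell
#!/bin/sh
# $(...) nests without escaping, and quotes inside it start fresh.
inner=$(echo "$(echo ok)")
printf '%s\n' "$inner"

# The backtick equivalent needs the inner backticks escaped.
legacy=`echo \`echo ok\``
printf '%s\n' "$legacy"
```

Both print ok, but only the first stays readable as the nesting grows.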
$[ $s + $l ] is a non-portable way of saying $(( s + l )).
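
A minimal sketch of the portable arithmetic form, reusing the tutorial's variable names s and l with made-up values:

```shell
#!/bin/sh
s=10
l=17
# POSIX arithmetic expansion; variable names need no $ inside $(( )).
s=$(( s + l ))
printf '%d\n' "$s"
```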

Variable data should be output using printf rather than echo. For example, that last line,
```
echo "$n files in total, with $s lines in total"
```
may be rewritten as
```
printf '%d files in total, with %d lines in total\n' "$n" "$s"
```
See e.g. "Why is printf better than echo?".
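
For example, data that looks like an option or contains backslashes is printed verbatim by printf's %s, whereas echo "$var" with var='-n' prints nothing in bash and dash, and some echo implementations expand \t. The values below are made up for illustration.

```shell
#!/bin/sh
# Data that some echo implementations would treat as an option or escape.
var='-n'
other='a\tb'
# %s prints each argument exactly as given, without interpreting it.
out=$(printf '%s|%s\n' "$var" "$other")
printf '%s\n' "$out"
```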
Using $* to loop over the command line arguments stops the script from functioning on filenames containing spaces.
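
To make the difference concrete, this sketch simulates two command-line arguments with set -- (the filenames are hypothetical) and counts how many words each loop sees.

```shell
#!/bin/sh
# Simulate two arguments, one of which contains a space.
set -- 'my file.txt' 'notes.txt'

# Unquoted $* is split on whitespace, giving three words.
count_star=0
for f in $*; do count_star=$(( count_star + 1 )); done

# "$@" (or the shorthand "for f do") keeps each argument intact: two words.
count_at=0
for f in "$@"; do count_at=$(( count_at + 1 )); done

printf '$*: %d words, "$@": %d words\n' "$count_star" "$count_at"
```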

The continue statement and the else branch are not needed at all, as the if statement comes last in the loop anyway.

Diagnostic output should be printed to standard error.
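
For instance, the script's usage message can be sent to standard error with >&2. This sketch wraps it in a hypothetical usage() function and captures each stream separately to show where the text ends up.

```shell
#!/bin/sh
# Hypothetical helper: the diagnostic goes to standard error (fd 2).
usage() {
    printf 'Usage: %s file ...\n' "$0" >&2
}

# Capture only stdout: the message is not there.
stdout=$(usage 2>/dev/null)
# Point stderr at the capture and discard stdout: the message is there.
stderr=$(usage 2>&1 >/dev/null)

printf 'stdout=[%s]\nstderr=[%s]\n' "$stdout" "$stderr"
```

Keeping diagnostics on standard error means that redirecting the script's normal output to a file or a pipe does not swallow the error messages.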

"Corrected" version of the script:

```
#!/bin/bash
# Counting the number of lines in a list of files
# for loop over arguments
# count only those files I am owner of

if [ "$#" -eq 0 ]; then
    printf 'Usage: %s file ...\n' "$0" >&2
    exit 1
fi

printf '%s counts the lines of code\n' "$0"
nfiles=0; totlines=0
for name do
    # checks whether it's a regular file and whether the user running the script owns it
    if [ -f "$name" ] && [ -O "$name" ]; then
        nlines=$( wc -l <"$name" )
        printf '%s: %d\n' "$name" "$nlines"
        totlines=$(( totlines + nlines ))
        nfiles=$(( nfiles + 1 ))
    fi
done

printf '%d files in total, with %d lines in total\n' "$nfiles" "$totlines"
```
