
I'd like to count the number of elements inside a folder. I thought of

function lsc { /bin/ls -l $1 | sed 1d | wc -l }

but then I remembered how awk people reduce those kinds of pipes, getting rid of superfluous greps and seds. So, how would you do that in awk?

ychaouche
  • Just use ls -1 | wc -l (that's a number 1 rather than a letter l)? Or if keen on using awk, you could use ls -l | awk 'NR>1{a++}END{print a}' – steve Dec 27 '20 at 10:21
  • Equally, ls -l | awk 'END{print NR-1}' works too. – steve Dec 27 '20 at 10:28
  • @steve Either of those will only work if no name contains embedded newlines. Do touch $'file\n1' $'file\n2' and test it. – Kusalananda Dec 27 '20 at 11:49
  • @Kusalananda when was the last time you saw a filename with a newline? Serious question. I've been in the industry for nearly 15 years and never encountered a single one. – ychaouche Dec 27 '20 at 15:04
  • @ychaouche Since newlines are allowed, I'd rather write software that copes with them than knowingly writing software that would break whenever such a filename is encountered. You may, for example, have to deal with files that have names that were not written by a human, but just taken from some database. Also, since it's easy to deal with all possible filenames, why not do that? – Kusalananda Dec 27 '20 at 15:17
  • @Kusalananda fair enough. Hats off. – ychaouche Dec 27 '20 at 15:19
  • @ychaouche among other, less nefarious things, attackers with appropriate authority might be able to create a file name containing newlines on a system, relying on some otherwise-innocent but buggy software to try to access it and lead to privilege escalation, denial of service, etc. See for example https://nvd.nist.gov/vuln/detail/CVE-2011-1155 which describes a bug that would cause a DoS attack if logrotate tried to handle such a file. Not writing your code to account for all possible file name characters is just asking for trouble. – Ed Morton Dec 27 '20 at 22:25
  • I've always maintained that newlines in filenames are either (a) a bug in some tool, or (b) something malicious or nefarious. Even if they are technically "allowed". – sitaram Dec 28 '20 at 05:06
  • @sitaram I still find it fascinating that everyone finds it "acceptable" and not at least a fundamental design "oddity". Like, what did the persons in charge have in mind when allowing \n in file names, and for what usage? – ychaouche Dec 28 '20 at 10:44
  • It's not about selecting which chars to allow, it's about selecting which ones to disallow. Where would you stop disallowing characters? Maybe disallow backslashes or other white space chars or punctuation chars or...? Why create file names containing any given char - who cares? The only 2 characters disallowed in file names are NUL because C-strings are NUL-terminated and / because directory paths are /-separated. There's simply no reason to disallow any other characters, people just have to write their code correctly and, as @kusalananda mentioned, it's easy to do so. – Ed Morton Dec 28 '20 at 17:30
  • @edmorton tradeoffs are hard to find – ychaouche Dec 29 '20 at 10:31
  • @ychaouche -- Maybe there was divergence of opinion, or maybe in some cases second thoughts. IMO, people that hide behind "the standard allows it, so the script must support it", need to read section 2.2 ("Standards permit the exclusion of bad filenames") of this page or section 4.8 ("Filename portability") of "The Open Group Base Specifications Issue 7 IEEE Std 1003.1, 2013 Edition", General Concepts. –  Dec 30 '20 at 11:50

2 Answers


There is no need for ls, sed, wc or awk.

If you simply want to count how many names a pattern expands to, then you can do that with

set -- *
echo "$#"

The set command sets the positional parameters ($1, $2, etc.) to the names matching the * pattern. As a side effect, the special parameter $# is set to the number of positional parameters, i.e. the number of names matching the given pattern.

In bash or in a shell that has named arrays, you can use

names=(*)
echo "${#names[@]}"

This works similarly, but sets the elements of the names array to the names resulting from the expansion of the * pattern. The variable expansion ${#names[@]} will be the number of elements in the names array.

An issue with both of these approaches is that if the pattern doesn't match anything, it remains unexpanded, so you get a count of 1 even though the directory is empty. To fix this in the bash shell, set the nullglob shell option with shopt -s nullglob. With this option set, patterns that do not match anything are removed completely.
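A minimal sketch of the nullglob behaviour, using a throwaway directory created with mktemp (the directory name itself is irrelevant):

```shell
#!/bin/bash
# Sketch: the effect of nullglob in an empty directory
dir=$(mktemp -d)
cd "$dir" || exit 1

set -- *
echo "$#"        # 1: the non-matching pattern "*" stays as-is

shopt -s nullglob
set -- *
echo "$#"        # 0: the non-matching pattern is removed completely
```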

In bash, if you additionally want to count hidden names, set the dotglob shell option with shopt -s dotglob.
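A similar sketch for dotglob; the file names here are arbitrary examples:

```shell
#!/bin/bash
# Sketch: dotglob makes * match hidden names too (scratch directory)
dir=$(mktemp -d)
cd "$dir" || exit 1
touch .hidden visible

set -- *
echo "$#"        # 1: only "visible" is matched

shopt -s dotglob
set -- *
echo "$#"        # 2: ".hidden" is now matched as well
```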

Your function could look something like this in bash:

lsc () (
    shopt -s nullglob
    set -- "$1"/*
    echo "$#"
)

Note the use of ( ... ) for the function body to avoid setting nullglob in the calling shell.

Or, for /bin/sh:

lsc () {
    set -- "$1"/*
    if [ -e "$1" ] || [ -L "$1" ]; then
        echo "$#"
    else
        echo 0
    fi
}

The if statement here makes sure that the first positional parameter is the name of an actual file and not an unexpanded pattern (due to not matching anything). The -e ("exists") must be true for us to trust the number in $#. If it isn't true, then we additionally check whether the name refers to a symbolic link with the -L test. If this is true, we know that the first thing that the pattern expanded to was a "dead" symbolic link (a symbolic link pointing to a non-existent file), and we trust $# to be correct. If both tests fail, we know that we didn't match anything and therefore output 0.
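A quick sketch exercising that function, using throwaway directories from mktemp and arbitrary file names:

```shell
#!/bin/sh
# Sketch: exercising the POSIX lsc function from the answer above
lsc () {
    set -- "$1"/*
    if [ -e "$1" ] || [ -L "$1" ]; then
        echo "$#"
    else
        echo 0
    fi
}

dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"
lsc "$dir"       # 3: the pattern matched three names

empty=$(mktemp -d)
lsc "$empty"     # 0: both tests failed, nothing matched
```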

Kusalananda

First, your one-liner function definition is incorrect. The presence of the function keyword suggests that you are using Bash, and this is what it says under Shell Function Definitions in man bash:

function name [()] compound-command [redirection]

where one of the forms a compound-command may take is:

{ list; }

So it should be:

function lsc { /bin/ls -l $1 | sed 1d | wc -l; }

But even though it is now syntactically correct, it will not work reliably; see Why not parse ls (and what to do instead)?.
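To see one way line-counting breaks, here is a sketch assuming GNU ls (which prints a file name literally, embedded newline and all, when its output is not a terminal) and a throwaway directory:

```shell
#!/bin/sh
# Sketch: why counting "ls -l" output lines miscounts
dir=$(mktemp -d)
cd "$dir" || exit 1
touch 'file
1'                               # one file whose name contains a newline

/bin/ls -l | sed 1d | wc -l      # reports 2: the single name spans two lines

set -- *
echo "$#"                        # 1: globbing counts names, not lines
```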

You can print the number of files and directories in the current directory using awk like this:

awk 'END {print ARGC - 1}' *

but notice that if at least one of the arguments the shell expands * to is a directory, awk (GNU awk, here) complains:

$ awk 'END {print ARGC - 1}' *
awk: warning: command line argument `dir1' is a directory: skipped

Even so, the result is still correct, and you can redirect the warnings to /dev/null:

awk 'END {print ARGC - 1}' * 2>/dev/null
  • If you change END to BEGIN then you won't get those warnings since awk doesn't have to open any files (in END NF must be set to the number of fields present in the last file opened, for example, while nothing related to file contents is set in BEGIN). In some older awk versions you may need to add ; exit before the } – Ed Morton Dec 27 '20 at 17:26
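A sketch of the BEGIN variant suggested in the comment above; since awk never opens its operands, directory arguments produce no warnings (the ; exit is the safeguard for older awks mentioned there):

```shell
# Count the operands without opening any of them
awk 'BEGIN {print ARGC - 1; exit}' *
```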