2

I want to list directories along with their sizes and print everything with a single command. Using ls and awk works well for controlling the list and what gets displayed:

ls -AlhF /usr | awk '{print $6, $7, $8, "\011", $9 }'

But I would like to go a bit further: to get the size of each folder or file, I want to pass $9 to du -s, with something like this:

ls -AlhF /usr | awk '{print $6, $7, $8, "\011", $9, du -s /usr/$9 }'

I've tried many ways to write it, with backticks and double quotes, but I always end up with errors or unwanted results.

Noam M
  • 451
chistof
  • 21
  • 1
    Don't Parse ls. Also, awk is not sh. backticks and $() aren't supported. Use the system() function instead. – cas Aug 02 '17 at 05:31
  • Thanks for the suggestions. I will think about not parsing ls later, but for now, after trying and searching, I got something almost working: ls -lAhF /usr | awk '{system("du -s /usr/"$9" | cut -f1")}{print $9}' The problem is that the two pieces of information don't appear on the same line... – chistof Aug 02 '17 at 06:22
  • Don't use system as it is prone to the risk of command injection. It is best to pipe through to sh first. See my solution. – Raman Sailopal Aug 02 '17 at 08:25

3 Answers

3

To answer the question, and ignoring for now the issues with parsing the output of ls: to get the output (stdout) of a command in awk, you do:

cmd="that command"
output=""
while ((cmd | getline line) > 0)
  output = output line RS
status = close(cmd)

How the exit status is encoded in the status variable varies from one awk implementation to the next. All you can rely on is that it will be 0 if and only if cmd succeeds.
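
As a minimal sketch of that pattern in a form that can be run directly (run_and_report is a hypothetical helper name, not something from the question), the loop can be wrapped in a BEGIN block. It relies only on the zero / non-zero distinction described above:

```shell
# Hypothetical helper: run a command from awk, collect its stdout,
# and report whether close() signalled success.
run_and_report() {
  awk -v cmd="$1" 'BEGIN {
    output = ""
    while ((cmd | getline line) > 0)
      output = output line RS
    status = close(cmd)
    # Only the zero / non-zero distinction is portable across awk implementations.
    printf "%s", output
    print (status == 0 ? "succeeded" : "failed")
  }'
}

run_and_report 'echo hello'
run_and_report 'false'
```

The command is passed with -v here for brevity, so backslash escapes in it would be interpreted by awk; that is acceptable for a demo but is an assumption to revisit for arbitrary commands.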

cmd | getline starts a shell (usually a POSIX shell, but on some systems, that could be a Bourne shell) to parse that command line, so any data embedded in it must be properly quoted. Single quotes are the safest choice, since the only character that is special inside them is the single quote itself.

Here since the output will be a single line anyway (your approach can't handle file names with any spacing characters, let alone newlines), you only need to do one getline:

 awk -v q="'" '
   function shquote(s) {
     gsub(q, q "\\" q q, s)
     return q s q
   }
   {
     cmd = "du -sk " shquote("/usr/" $9)
     cmd | getline du_output
     status = close(cmd)
   }'
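
To see what shquote() produces, here is a small sketch that reuses the same function and prints the quoted form of a string. The wrapper name shquote_line and the sample file name are made up for illustration:

```shell
# Hypothetical demo: print the shell-quoted form of one string,
# using the same shquote() as in the answer.
shquote_line() {
  printf '%s\n' "$1" | awk -v q="'" '
    function shquote(s) {
      gsub(q, q "\\" q q, s)
      return q s q
    }
    { print shquote($0) }'
}

shquote_line "it's a \$(reboot) file"
```

Each embedded single quote is closed, escaped and reopened, so the shell started by getline sees the name as one literal word and never evaluates the $(reboot) part.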

If you call getline without a variable name, then it sets $0 which can make it easier to extract the disk usage:

DIR=/usr export DIR
LC_ALL=C ls -l -- "$DIR" | awk -v q="'" '
  function shquote(s) {
    gsub(q, q "\\" q q, s)
    return q s q
  }
  {
    date = $6 " " $7 " " $8
    name = $9
    cmd = "du -sk " shquote(ENVIRON["DIR"] (ENVIRON["DIR"] ~ /^\/*$/ ? "" : "/") name)
    cmd | getline
    close(cmd)  # release the pipe; otherwise long listings can run out of file descriptors
    disk_usage = $1
    print date "\t" name "\t" disk_usage
  }'
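
The reason date and name are saved before the getline is that a bare cmd | getline replaces $0 and all the fields. A minimal sketch of that behaviour, with echo standing in for du and a hypothetical function name:

```shell
# Hypothetical demo: shows that a bare "cmd | getline" replaces $0 and the fields.
getline_overwrites() {
  printf '%s\n' "$1" | awk '{
    saved = $2                 # keep what we need from the original record
    cmd = "echo 42 kilobytes"  # stand-in for a real command such as du
    cmd | getline              # $0 is now "42 kilobytes"
    close(cmd)
    print saved "\t" $1
  }'
}

getline_overwrites 'first second third'
```

After the getline, $1 refers to the first field of the command's output rather than of the input line, which is what makes extracting the disk usage convenient.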
1
ls -AlhF /usr | awk '{print "echo "$6" "$7" "$8" \011 "$9" $(du -s /usr/"$9")" }' | sh

Construct an echo and du command with awk and then execute it through piping to sh.

  • Also a command injection vulnerability (try with a file called $(reboot) for instance). Can also cause problems with echo implementations that expand escape sequences. – Stéphane Chazelas Aug 02 '17 at 08:26
0

Here's a perl implementation which doesn't rely on parsing the output of ls -l (which is inherently unreliable), and can easily be modified to use any date or size output format required.

Save it to a file, make it executable and run it as /path/to/script /usr. The script requires one or more directory name arguments.

#!/usr/bin/perl

use strict;
use Date::Format;

if ($#ARGV < 0) { die "Missing directory argument" };

foreach my $dir (@ARGV) {
    $dir =~ s:/+$::;  # strip trailing /
    opendir(my $dh, $dir) || die "Can't opendir $dir: $!";

    while (readdir $dh) {
      next if (m/^\.{1,2}$/); # skip . and .. entries

      my $mtime=(stat("$dir/$_"))[9];   # extract mtime field from stat
      my $mtime_f = time2str('%Y-%m-%d %X', $mtime);

      open(my $fh, "-|", 'du', '-s', '--', "$dir/$_");
      my $du = <$fh>;
      ($du) = split(' ',$du); # only want the first blank-separated field
      close($fh);

      printf "%s\t%s\t%s\n", $mtime_f, "$dir/$_" , $du;
    }

    closedir $dh;
};

There are simpler, shorter ways of doing this (in perl or awk or other languages) but this provides a useful base for reporting other information about directories and their contents. See perldoc -f stat for full details on the information available from the stat() function.

BTW, this could be implemented in awk, but you'd have to implement your own stat() function and a date-formatting function. Easier to use perl.

GNU awk aka gawk supports loadable modules and the filefuncs module provides a stat() which populates a statdata[] array. Use by adding @load "filefuncs" at the top of your awk script.
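
A rough sketch of what that could look like, assuming gawk was built with extension support and the filefuncs module is installed. The function name stat_list and the choice of output columns are illustrative only. Note that the size element is the entry's own size as reported by stat(), not a recursive total, so du would still be needed for directory sizes:

```shell
# Hypothetical helper: print mtime, path and size for each path argument,
# using the stat() function from the gawk filefuncs extension.
stat_list() {
  gawk '@load "filefuncs"
  BEGIN {
    for (i = 1; i < ARGC; i++) {
      # stat() returns a negative value on failure; skip such entries
      if (stat(ARGV[i], st) < 0)
        continue
      printf "%s\t%s\t%d\n", strftime("%Y-%m-%d %H:%M:%S", st["mtime"]), ARGV[i], st["size"]
    }
  }' "$@"
}
```

It could be called as stat_list /usr/*, which avoids parsing ls because the shell glob hands each name to gawk as a separate argument.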

cas
  • 78,579
  • That qx(du -s "$dir/$_") is a command injection vulnerability. You could use the environ: $ENV{FILE}="$dir/$_" and qx'du -sk -- "$FILE"', or use open to also avoid having to run a shell. – Stéphane Chazelas Aug 02 '17 at 07:22
  • yeah, i was being lazy. easily fixed. I'll open a pipe to read from du instead. I thought of using Filesys::DiskUsage, but wanted to avoid uncommon modules. (by "uncommon", i mean "not packaged for debian" :) – cas Aug 02 '17 at 07:43
  • The stripping of trailing / is not valid for /path/to/script / – Stéphane Chazelas Aug 02 '17 at 07:57
  • du -s uses tab as separator on my system, not a single space. du (GNU coreutils) 8.26 on debian sid. – cas Aug 02 '17 at 08:07
  • split(' ', $var) does the default split (like awk -F ' '), not splitting on one space character. split(/\s/) is also wrong (as there can be any amount of blanks), so would split(/\s+/) be (as there could be leading blanks). Other option is to use /(\d+)/. – Stéphane Chazelas Aug 02 '17 at 08:12