
Is there a quicker way of getting a couple of columns of values than futzing with sed and awk?

For instance, if I have the output of ls -hal / and I want to get just the file and directory names and sizes, how can I do that easily and quickly, without having to spend several minutes tweaking my command?

total 16078
drwxr-xr-x    33 root  wheel   1.2K Aug 13 16:57 .
drwxr-xr-x    33 root  wheel   1.2K Aug 13 16:57 ..
-rw-rw-r--     1 root  admin    15K Aug 14 00:41 .DS_Store
d--x--x--x     8 root  wheel   272B Jun 20 16:40 .DocumentRevisions-V100
drwxr-xr-x+    3 root  wheel   102B Mar 27 12:26 .MobileBackups
drwx------     5 root  wheel   170B Jun 20 15:56 .Spotlight-V100
d-wx-wx-wt     2 root  wheel    68B Mar 27 12:26 .Trashes
drwxrwxrwx     4 root  wheel   136B Mar 30 20:00 .bzvol
srwxrwxrwx     1 root  wheel     0B Aug 13 16:57 .dbfseventsd
----------     1 root  admin     0B Aug 16  2012 .file
drwx------  1275 root  wheel    42K Aug 14 00:05 .fseventsd
drwxr-xr-x@    2 root  wheel    68B Jun 20  2012 .vol
drwxrwxr-x+  289 root  admin   9.6K Aug 13 10:29 Applications
drwxrwxr-x     7 root  admin   238B Mar  5 20:47 Developer
drwxr-xr-x+   69 root  wheel   2.3K Aug 12 21:36 Library
drwxr-xr-x@    2 root  wheel    68B Aug 16  2012 Network
drwxr-xr-x+    4 root  wheel   136B Mar 27 12:17 System
drwxr-xr-x     6 root  admin   204B Mar 27 12:22 Users
drwxrwxrwt@    6 root  admin   204B Aug 13 23:57 Volumes
drwxr-xr-x@   39 root  wheel   1.3K Jun 20 15:54 bin
drwxrwxr-t@    2 root  admin    68B Aug 16  2012 cores
dr-xr-xr-x     3 root  wheel   4.8K Jul  6 13:08 dev
lrwxr-xr-x@    1 root  wheel    11B Mar 27 12:09 etc -> private/etc
dr-xr-xr-x     2 root  wheel     1B Aug 12 21:41 home
-rw-r--r--@    1 root  wheel   7.8M May  1 20:57 mach_kernel
dr-xr-xr-x     2 root  wheel     1B Aug 12 21:41 net
drwxr-xr-x@    6 root  wheel   204B Mar 27 12:22 private
drwxr-xr-x@   68 root  wheel   2.3K Jun 20 15:54 sbin
lrwxr-xr-x@    1 root  wheel    11B Mar 27 12:09 tmp -> private/tmp
drwxr-xr-x@   13 root  wheel   442B Mar 29 23:32 usr
lrwxr-xr-x@    1 root  wheel    11B Mar 27 12:09 var -> private/var

I realize there are a bazillion options for ls and I could probably do it for this particular example that way, but this is a general problem and I'd like a general solution to getting specific columns easily and quickly.

cut doesn't cut it because it doesn't take a regular expression, and I virtually never have the situation where there's a single space delimiting columns. This would be perfect if it would work:

ls -hal / | cut -d'\s' -f5,9

awk and sed are more general than I want, basically entire languages unto themselves. I have nothing against them; it's just that unless I've recently been doing a lot with them, it requires a pretty sizable mental shift to start thinking in their terms and write something that works. I'm usually in the middle of thinking about some other problem I'm trying to solve, and suddenly having to solve a sed/awk problem throws off my focus.

Is there a flexible shortcut to achieving what I want?

iconoclast
  • futz |fəts| verb [no obj.] informal: waste time; idle or busy oneself aimlessly. Getting to know sed and awk is in no way futzing, my friend. If it is anything, it is the opposite, as it saves many, many hours. – Ketan Feb 14 '14 at 15:56
  • That's an overly-narrow definition of "futz". Would you prefer I used "fiddle"? I'm in no way disputing the value of sed or awk, just pointing out that I don't want to shift focus from one thing to another. – iconoclast Feb 14 '14 at 19:10
  • As opposed to most others in this thread, I think this is a good idea. awk and sed are hard to get into if you are new to this type of language, especially if you don't use them much. I use both from time to time, not too often but definitely regularly, and they are definitely not easy to handle, especially after a long break. I guess it would be helpful if there were an awk/sed wrapper with a much simpler API. That would be a possible, and not bad, solution, I think. – Akito May 14 '20 at 00:09
  • See also du -ahd1 for this particular task. Disk usage, -a to include files, -h for human-readable, -d1 for max depth 1. – Steve Clay Oct 15 '21 at 14:49

10 Answers


I'm not sure why

ls -hal / | awk '{print $5, $9}'

seems to you to be much more disruptive to your thought processes than

ls -hal / | cut -d'\s' -f5,9

would have been, had it worked. Would you really have to write that down? It only takes a few awk lines before adding the {} becomes automatic. (For me the hardest issue is remembering which field number corresponds to which piece of data, but perhaps you don't have that problem.)

You don't have to use all of awk's features; for simply outputting specific columns, you need to know very little awk.

The irritating issue would have been if you'd wanted to output the symlink as well as the filename, or if your filenames might have spaces in them. (Or, worse, newlines). With the hypothetical regex-aware cut, this is not a problem (except for the newlines); you would just replace -f5,9 with -f5,9-. However, there is no awk syntax for "fields 9 through to the end", and you're left with having to remember how to write a for loop.
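
For reference, that loop is only a little more typing. A sketch that prints field 5 and then fields 9 through the end of the line:

ls -hal / | awk '{printf "%s", $5; for (i = 9; i <= NF; i++) printf " %s", $i; print ""}'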

Here's a little shell script which turns cut-style -f options into an awk program, and then runs the awk program. It needs much better error-checking, but it seems to work. (Added bonus: handles the -d option by passing it to the awk program.)

#!/bin/bash
# cut.sh: translate cut-style -f (and -d) options into an awk program, then run it.
prog=\{
while getopts f:d: opt; do
  case $opt in
    f) # split the comma-separated field list
       IFS=, read -ra fields <<<"$OPTARG"
       for field in "${fields[@]}"; do
         case $field in
           *-*) # a range: 3-7, 3- (to end of line), or -7 (from the start)
                low=${field%-*}; high=${field#*-}
                if [[ -z $low  ]]; then low=1; fi
                if [[ -z $high ]]; then high=NF; fi
                ;;
            "") continue ;;  # skip empty entries such as -f5,,9
             *) low=$field; high=$field ;;
         esac
         if [[ $low == $high ]]; then
           prog+='printf "%s ", $'$low';'
         else
           prog+='for (i='$low';i<='$high';++i) printf "%s ", $i;'
         fi
       done
       prog+='printf "\n"}'
       ;;
    d) sep="-F$OPTARG";;  # pass the delimiter through as awk's -F
    *) exit 1;;
  esac
done
if [[ -n $sep ]]; then
  awk "$sep" "$prog"
else
  awk "$prog"
fi

Quick test:

$ ls -hal / | ./cut.sh -f5,9-
7.0K bin 
5.0K boot 
4.2K dev 
9.0K etc 
1.0K home 
8.0K host 
33 initrd.img -> /boot/initrd.img-3.2.0-51-generic 
33 initrd.img.old -> /boot/initrd.img-3.2.0-49-generic 
...
rici
  • Awk is quite literally another language with different syntax. Would you understand if we were talking about AppleScript instead? Or take human languages: no matter how well you know another human language, switching back and forth requires extra effort, unless you have a lot of practice making that switch frequently. – iconoclast Sep 10 '19 at 04:16

I believe there is no simpler solution than sed or awk, but you can write your own function.

Here is a list function (copy and paste it into your terminal):

function list() { ls -hal "$1" | awk '{printf "%-10s%-30s\n", $5, $9}'; }

then use list function:

list /

list /etc
damphat
  • Actually, the problem here is that the function doesn't take arguments, so it's very narrowly focussed on a specific case, and is not flexible. If you could work out the quoting so that you can pass arguments into it, that would be useful. And of course remove everything before the pipe, so you can use it with any arbitrary input, and specify which columns you want. – iconoclast Feb 14 '14 at 19:27
  • in other words, please re-create awk but give it a different name and fewer features – Zac Thompson Nov 10 '17 at 23:09
  • @ZacThompson: yeah, like 0.1% of the features, and a much simpler API. That would be quite useful. – iconoclast Sep 10 '19 at 04:13

You can't just talk about "columns" without also explaining what a column is!

Very common in Unix text processing is having whitespace as the column (field) separator and (naturally) newline as the row or record separator. Then awk is an excellent tool that is very readable as well:

# for words (columns) 5 and 9:
ls -lah | awk '{print $5 " " $9}'
# or this, for the fifth and the last word:
ls -lah | awk '{print $5 " " $NF}'

If the columns are instead ordered character-wise, perhaps cut -c is better.

ls -lah | cut -c 31-33,46-

You can tell awk to use other field separators with the -F option. If you don't use -c (or -b) with cut, use -f to specify which columns to output.
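
For instance, with a single-character delimiter both tools are straightforward; using /etc/passwd, whose fields are colon-separated, these print the login name and shell:

awk -F: '{print $1, $7}' /etc/passwd

cut -d: -f1,7 /etc/passwd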

The trick is knowledge about the input

Generally speaking, it's not always a good idea to parse output of ls, df, ps and similar tools with text-processing tools, at least not if you wish to be portable/compatible. In those cases, try to force the output in a POSIX-defined format. Sometimes this can be achieved by passing a certain option (-P perhaps) to the command generating the output. Sometimes by setting an environment variable such as POSIXLY_CORRECT or calling a specific binary, such as /usr/xpg4/bin/ls.
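
As a sketch of that approach: df is one such command, and its POSIX-specified -P option pins down the output format, after which the fields are safe to pick apart:

df -P | awk 'NR > 1 {print $5, $6}'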

MattBianco

This is an old question, but at the risk of rocking the boat, I have to say I agree with @iconoclast: there really ought to be a good, simple way of extracting selected columns in Unix.

Now, yes, awk can easily do this, it's true. But I think it's also true that it's "overkill" for a simple, common task. And even if the overkill factor isn't a concern, the extra typing certainly is: given how often I have columns to extract, I'd really rather not have to type print, and those braces, and those dollar signs, and those quotes, every time. And if the existence of awk and sed really implies that we don't need a simple column extractor, then by the same token I guess we don't need grep, either!

The cut utility ought to be the answer, but it's sadly broken. Its default is not "whitespace separated columns", despite the (IMO) overwhelming predominance of that need. And, in fact, it can't do arbitrary whitespace separation at all! (Thus iconoclast's question.)

Something like 35 years ago, before I'd even heard of cut, I wrote my own version. It works well; I'm still using it every day; I commend it to anyone who would like a better cut and who isn't hung up on using only "standard" tools. It's got one significant drawback in that its name, "column", has since been taken by a BSD utility.

Anyway, with this version of column in hand, my answer to iconoclast's question is

ls -hal / | column 5 9

or if you wish

ls -hal / | column 5,9

Man page and source tarball at http://www.eskimo.com/~scs/src/#column . Use it if you're interested; ignore it as the off-topic answer I suppose this is if you're not.


I'm amazed no one has written this already, but if your only objection to cut is that it won't handle repeated spaces as a single delimiter, how about you just squeeze the repeated spaces? That's one of the uses of tr.

ls -hal / | tr -s ' ' | cut -d ' ' -f5,9

Given the ls -hal output shown in your question, the result would be:

1.2K .
1.2K ..
15K .DS_Store
272B .DocumentRevisions-V100
102B .MobileBackups
170B .Spotlight-V100
68B .Trashes
136B .bzvol
0B .dbfseventsd
0B .file
42K .fseventsd
68B .vol
9.6K Applications
238B Developer
2.3K Library
68B Network
136B System
204B Users
204B Volumes
1.3K bin
68B cores
4.8K dev
11B etc
1B home
7.8M mach_kernel
1B net
204B private
2.3K sbin
11B tmp
442B usr
11B var
Wildcard

If you just want to display these two attributes (size and name), you can also use the stat tool (which is designed for just that - querying file attributes):

stat -c "%s  %n" .* *

will display the size and name of all files (including "hidden" files) in the current directory.
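
Note that -c (--format) is the GNU stat syntax. On BSD/macOS (where the question's ls output comes from), the flag is -f with different format letters; a rough equivalent, as a sketch, would be:

stat -f "%z %N" .* *

(%z being the size in bytes and %N the file name.)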

Notice: You chose ls as one application example for the use cases where you want to extract specific columns from the output of a program. Unfortunately, ls is one of the examples where you should really avoid using text-processing tools to parse the output.

AdminBee

Using Raku (formerly known as Perl_6)

ls -hal / | raku -ne '.words[4,8].say;'

OR

ls -hal / | raku -ne '.words[4,8].put unless .words.elems < 8;'

The OP seems to be searching for a language that helps with tabular data. For whatever reason, sed and awk don't fit the bill (which is fine, BTW).

If you want a language that's built from the ground up to handle Unicode text, look no further than Raku. There seems to be the hope that simple regexes can be used to extract columnar-style data, so here's an example in Raku:

For example, having saved the page https://unicode-org.github.io/cldr-staging/charts/39/summary/en.html as text (en.txt):

raku -e '.split(/\t/)[*-1].put if m:g/ "[" .+? "]" / for lines.skip(56);' en.txt

[a b c d e f g h i j k l m n o p q r s t u v w x y z] [A B C D E F G H I J K L M N O P Q R S T U V W X Y Z] [- ‑ , . / % ⓘ [--/] [:∶︓﹕:] [.․。︒﹒.。] ['ʼ՚᾽᾿’'] [%٪﹪%] [؉‰] [$﹩$] [£₤£] [¥¥] [₩₩] [₨₹{Rp}{Rs}] [-‐‒–⁻₋−➖﹣-] [,،٫⹁、︐︑﹐﹑,、] [+⁺₊➕﬩﹢+] [,٫⹁︐﹐,] [.․﹒.。]

Raku can:

1) split on whitespace with .words, and/or
2) split destructively around a regex with .split(/.../), and/or
3) match non-destructively against a defined regex with .comb(/.../),

so there's a good chance you'll be able to extract the columnar data you're looking for (see the .comb sketch below).
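
For instance, a sketch of the .comb route on the same ls output, guarding against short lines as above:

ls -hal / | raku -ne '.comb(/ \S+ /)[4,8].put if .words.elems > 8;'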

https://raku.org/

jubilatious1

It might be trivial, but how about this?

ls -hal / |
    while IFS=' ' read -r x x x x five x x x nine etc; do
        echo "$five" "$nine"
    done

Just take into account the fact that the x, five, nine and etc variables will end up in the scope of a sub-shell... :)
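
If that matters, one bash-specific workaround (a sketch) is to feed the loop through process substitution, so it runs in the current shell and the variables survive:

while IFS=' ' read -r x x x x five x x x nine etc; do
    echo "$five" "$nine"
done < <(ls -hal /)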

Dacav

You can do it with the find command.

ls -hal | awk 'NR>1 {print $(NF-4), $NF}'

Second option

find  . -maxdepth 1 -type f -printf "%s %p\n"
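
(Note that -printf is a GNU find extension; on BSD/macOS a rough substitute, as a sketch, is find . -maxdepth 1 -type f -exec stat -f "%z %N" {} +.)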
  • Your first command doesn’t use find. Your second option misses the point of the question, which is about generic extraction of column information, not the specific file information from ls. – Stephen Kitt Sep 09 '21 at 12:45
  • Also, your awk command does the same thing as commands already mentioned in other answers – αғsнιη Sep 10 '21 at 17:47

The output of ls should never be parsed. Parsing ls is hard: column widths vary, the number of columns can change mid-output, ...
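
To see one failure mode, consider a (hypothetical) file name containing spaces; a field-based extraction only returns its first word:

$ touch 'name with spaces'
$ ls -l | awk 'NR > 1 {print $9}'
name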

Use the stat command (man stat). stat will let you output the data you want, in the format you want.

For example,

stat --format="%s %n" * .*
waltinator