3

I'm trying to write a ls wrapper that uses awk to parse the output of ls -lhF. Right now I've split the program into two files - my_ls.sh and my_ls.awk. my_ls.sh's only purpose is to pipe the output of ls -lhF into my_ls.awk. It looks like:

#!/bin/bash
ls -lhF "$@" | my_ls.awk

I was wondering if there was any way to read the output of ls -lhF through the awk script itself.

EDIT: My main purpose is to write a script which shows the current directory contents in the form of a nice tree. A draft version of my_ls.awk would look like:

#!/usr/bin/awk -f

( NF >= 9 ) {
    print "|-- [" $5 "] " $9
}

This is where I've reached so far.

  • 5
    DON'T parse ls output for your own good. That is almost never the right approach for most tasks. I would suggest telling us instead what you hope to accomplish, so others may suggest better approaches. – jw013 Jul 16 '12 at 01:31
  • 1
    @jw013 That article warns about parsing ls in for loops etc. I don't see why I shouldn't use awk to parse it. AFAIK ls gives a clean output, with fields separated by whitespaces. And I think that less was created to parse exactly that kind of data. – westeros91 Jul 16 '12 at 01:36
  • 2
    I'd suggest re-reading the article, carefully this time. The problem is ls doesn't display filenames reliably when strange characters are present. There's also possible inconsistencies with the time format. There is no reliable way to parse ls output - it doesn't matter what you try to use. Awk does not have special magic that allows it to parse ls. Finally, less is just a pager - it doesn't parse anything. – jw013 Jul 16 '12 at 01:47
  • @jw013 Sure, but what I'm trying to implement is a ls wrapper. I have no intention of working with files based on the output of ls. I merely want to show the output in a different form. I still don't see how special characters would cause a problem for that. And I don't know why I mentioned less in my earlier comment (and I can't edit it now :P). I meant awk alright. https://github.com/trapd00r/ls--/ parses ls using Perl. – westeros91 Jul 16 '12 at 01:56
  • 1
    If your end goal is simply cosmetic reformatting for display only, then most of the caveats don't apply - slightly garbled display in some cases is not a big deal. – jw013 Jul 16 '12 at 02:11
  • @jw013 Yes, my goal is just reformatting the output. This is where I've reached so far - https://i.minus.com/iSggaP2lAlT5W.png – westeros91 Jul 16 '12 at 02:12
  • +1 for an attempt to solve your problem. Nice looking output. So you want to indent the child files in relation to their parent directories? That will be nice. To answer your question, you know about ls -lfH ${@} | awk 'NR>9{print ....}' don't you? You don't need an intervening script. 99% of all unix tools allow piping of stdout from 1st program, into stdin of 2nd program, and chained on to a reasonable infinity stdout|stdin ... Good luck. – shellter Jul 16 '12 at 02:34
  • 1
    Also, before you possibly reinvent an existing wheel, take a look at tree. Most distributions have a package for it. – jw013 Jul 16 '12 at 12:48
  • If you would still like to build your own, you might want to have a look at stat and build the desired output yourself, instead of tweaking the output of ls. – janmoesen Jul 16 '12 at 13:43

3 Answers

4

I'll join the other advice that you shouldn't parse the output of ls, so this is a bad example. But as a more general matter, I would include the awk script directly in the shell script by passing it as an argument to awk.

#!/bin/bash
ls -lhF "$@" | awk '
    ( NF >= 9 ) {
        print "|-- [" $5 "] " $9
    }'

Note that if the awk script must include the ' (single quote) character, you need to quote it: use '\'' (close single quote, literal single quote, open single quote).
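
As a small illustration of that quoting rule, here is a sketch that wraps each input line in literal single quotes. The function name quote_lines is made up for this example and is not part of the original script.

```shell
#!/bin/bash
# quote_lines (hypothetical helper): print each input line wrapped in single quotes.
# Each '\'' closes the shell's single-quoted string, adds an escaped ', and reopens it,
# so the program awk actually receives is:  { print "'" $0 "'" }
quote_lines() {
    awk '{ print "'\''" $0 "'\''" }'
}

printf '%s\n' "hello world" | quote_lines
```

If the escapes get hard to read, passing the quote in as a variable (awk -v q="'" '{ print q $0 q }') is another option.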

To avoid having to quote, you can use a here document instead. But it's awkward because you can't use standard input both to feed input to awk and to feed the script. You need to use an additional file descriptor (see "When would you use an additional file descriptor?" and "File descriptors & shell scripting").

#!/bin/bash
ls -lhF "$@" | awk -f /dev/fd/3 3<<'EOF'
( NF >= 9 ) {
    print "|-- [" $5 "] " $9
}
EOF

Inside awk, you can read input from another command using the getline function and the pipe construct. It's not the way awk is primarily designed to be used, but it can be made to work. You need to quote the file name arguments for the underlying shell, which is highly error-prone. And since the text to be processed doesn't come from the expected sources (standard input or the files named on the command line), you end up with all the code in the BEGIN block.

#!/usr/bin/awk -f
BEGIN {
    command = "ls -lhF"
    for (i = 1; i < ARGC; i++) {
        arg = ARGV[i];
        gsub("'", "'\\''", arg);
        command = command " '" arg "'";
    }
    ARGC = 0; for (i in ARGV) delete ARGV[i];
    while ((command | getline) > 0) {
        if (NF >= 9) { print "|-- [" $5 "] " $9 }
    }
}
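
To see the pipe-into-getline construct in isolation, here is a minimal sketch. The command string is a harmless stand-in chosen for this example, and the function name number_command_output is hypothetical. It uses the `getline var` form, which leaves $0 and NF alone, whereas the plain getline in the script above re-splits the record so that NF and $5 work.

```shell
#!/bin/bash
# number_command_output (hypothetical helper): run a command from inside awk
# and number each line of its output.
number_command_output() {
    awk 'BEGIN {
        cmd = "echo alpha; echo beta; echo gamma"   # stand-in command, run via sh
        n = 0
        # getline returns 1 per line read, 0 at end of output, -1 on error,
        # so test for > 0 rather than just truthiness.
        while ((cmd | getline line) > 0) {
            n++
            print n ": " line
        }
        close(cmd)   # release the pipe; needed if the same string is run again
    }'
}

number_command_output
```

Because the command string is handed to a shell, anything interpolated into it (such as file names) needs the kind of quoting shown in the script above.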

In short, use a shell for what it's good at (such as piping commands together), and awk for what it's good at (such as text processing).

2

I am not quite sure what you are trying to do, but one issue that can come up is getting awk to print what ls considers to be the last field, since awk's default field splitting does not treat it as a single field. E.g.

-rw-r--r-- | 433k | filename-with-no-spaces      
-rw-r--r-- |   1k | link containing  spaces -> /home/user/filename-with-no-spaces

Somehow you need to isolate the last ls field. The approach taken below is to find the combined length of all preceding fields and their delimiters. The rest of the line is the filename field (plus other info, such as a link's target).

The script below determines the maximum width of the variable-width size field (needed for output formatting). There are multiple ways to get this width, e.g. (1) have awk process each line of ls output in the main loop, adding each line to an array for subsequent END{ } processing, or (2) write the output of ls to a temporary file and then have awk process that file. The method shown below uses (2).

Note that ls can send some perhaps unexpected, non-simple output your way, as in the case of a link, so it is generally safer to use find and customize its output to better suit your parsing needs.

f=7               # the number of (multi-space) delimiters before the start of the filename  
myls="$(mktemp)"  # a temp file to hold  output from `ls`
w=$(ls --color=always -lFHk ~/ |tee "$myls" |awk '{print $5}' |wc -L) # max width of size field
h=k               # size unit
awk --re-interval -v"f=$f" -v"w=$w" -v"h=$h" '
  NF >= f {
    regex = "^([^ ]+ +){"f"}" 
    match( $0, regex )  # find start of name field
    printf( "%s | %"w"s%s | %s\n", $1, $5, h, substr( $0, RLENGTH + 1 ))  # +1: skip the last delimiter space
  }' "$myls"
rm "$myls"
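
For comparison, option (1) mentioned above could be sketched roughly as follows: awk buffers the lines it needs in arrays and formats them in END, once the widest size field is known, so no temporary file is involved. The function name format_listing is made up here, and the sketch assumes the default `ls -l` layout where the name starts at field 9. Unlike the regex approach, it collapses runs of spaces inside names.

```shell
#!/bin/bash
# format_listing (hypothetical helper): read `ls -l`-style lines on stdin and
# print "perms | size | name", right-aligning size to the widest value seen.
format_listing() {
    awk '
      NF >= 9 {
          n++
          perms[n] = $1
          size[n]  = $5
          # Rebuild the name from field 9 onward; assumes the name starts at $9.
          name[n] = $9
          for (i = 10; i <= NF; i++) name[n] = name[n] " " $i
          if (length($5) > w) w = length($5)
      }
      END {
          fmt = "%s | %" w "s | %s\n"   # width is only known after all input is read
          for (j = 1; j <= n; j++) printf fmt, perms[j], size[j], name[j]
      }'
}

# Sample input in place of real ls output, so the result does not depend on the machine.
format_listing <<'EOF'
total 8
-rw-r--r-- 1 user user 433K Jul 16 01:00 filename-with-no-spaces
lrwxrwxrwx 1 user user 1K Jul 16 01:00 link containing spaces -> /home/user/filename-with-no-spaces
EOF
```

In real use this would be fed from a pipe, e.g. ls -lhF "$@" | format_listing.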
Peter.O
  • Thanks! But still not what I asked. I wanted to know if I can get rid of all shell scripts, and process the output of ls using an awk script alone. – westeros91 Jul 17 '12 at 14:36
1

I recommend not reinventing the wheel and using tree instead, which lists a directory's files and folders along with the contents of its subdirectories:

tree(1) - Linux man page

Name

tree - list contents of directories in a tree-like format.

Synopsis

tree [-adfghilnopqrstuvxACDFNS] [-L level [-R]] [-H baseHREF] [-T title] [-o filename] [--nolinks] [-P pattern] [-I pattern] [--inodes] [--device] [--noreport] [--dirsfirst] [--version] [--help] [--filelimit #] [directory ...]

Description

Tree is a recursive directory listing program that produces a depth indented listing of files. Color is supported ala dircolors if the LS_COLORS environment variable is set, output is to a tty, and the -C flag is used. With no arguments, tree lists the files in the current directory. When directory arguments are given, tree lists all the files and/or directories found in the given directories each in turn. Upon completion of listing all files/directories found, tree returns the total number of files and/or directories listed.

By default, when a symbolic link is encountered, the path that the symbolic link refers to is printed after the name of the link in the format:

name -> real-path

If the '-l' option is given and the symbolic link refers to an actual directory, then tree will follow the path of the symbolic link as if it were a real directory.

laebshade