104

I am getting output from a program that first produces one line that is a bunch of column headers, and then a bunch of lines of data. I want to cut various columns of this output and view it sorted according to various columns. Without the headers, the cutting and sorting is easily accomplished via the -k option to sort along with cut or awk to view a subset of the columns. However, this method of sorting mixes the column headers in with the rest of the lines of output. Is there an easy way to keep the headers at the top?

Mikel
jonderry
  • I came across the following link. However, I can't get this technique of { head -1; sort; } to work. It always deletes a bunch of the text after the first line. Does anyone know why this happens? – jonderry Apr 23 '11 at 01:02
  • I suspect it's because head is reading more than one line into a buffer and throwing most of it away. My sed idea had the same problem. – Andy Apr 23 '11 at 01:09
  • @jonderry - that technique only works with lseekable input so it won't work when reading from a pipe. It will work if you redirect to a file >outfile and then run { head -n 1; sort; } <outfile – don_crissti Sep 26 '15 at 13:40
  • @jonderry I wonder if a specific line ending is observed in your particular tool. Some "Windows" command line tools are still coded for text processing of Linux line endings – Sun Feb 04 '20 at 03:25
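don_crissti's suggestion (save the output to a regular file first, then let { head -n 1; sort; } read that file) can be sketched as a small function. This assumes a head that, as POSIX requires for seekable input, leaves the file offset just past the last line it printed; GNU coreutils head does. The name sort_via_tmpfile is hypothetical.

```shell
# sort_via_tmpfile [SORT_OPTIONS...]   (hypothetical helper name)
# Copies stdin (possibly a pipe) to a regular, seekable file. head and sort
# then share that file's offset: head prints line 1 and leaves the offset
# just after it, so sort only reads lines 2..end.
sort_via_tmpfile() (
    tmp=$(mktemp) || exit 1
    cat > "$tmp"
    { head -n 1; sort "$@"; } < "$tmp"
    status=$?
    rm -f "$tmp"
    exit "$status"
)
```

Usage: ps -o pid,comm | sort_via_tmpfile -k2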

16 Answers

94

Stealing Andy's idea and making it a function so it's easier to use:

# print the header (the first line of input)
# and then run the specified command on the body (the rest of the input)
# use it in a pipeline, e.g. ps | body grep somepattern
body() {
    IFS= read -r header
    printf '%s\n' "$header"
    "$@"
}

Now I can do:

$ ps -o pid,comm | body sort -k2
  PID COMMAND
24759 bash
31276 bash
31032 less
31177 less
31020 man
31167 man
...

$ ps -o pid,comm | body grep less
  PID COMMAND
31032 less
31177 less
dessert
Mikel
  • ps -C COMMAND may be more appropriate than grep COMMAND, but it's just an example. Also, you can't use -C if you also used another selection option such as -U. – Mikel Apr 23 '11 at 00:51
  • Or maybe it should be called body? As in body sort or body grep. Thoughts? – Mikel Apr 23 '11 at 00:57
  • I tried read in this form first, but noticed that it was eating leading whitespace. Making this a function is a good idea. +1 – Andy Apr 23 '11 at 01:01
  • Renamed from header to body, because you're doing the action on the body. Hopefully that makes more sense. – Mikel Apr 23 '11 at 01:02
  • @Andy, yeah, we should set IFS=. Fixed. – Mikel Apr 23 '11 at 01:04
  • Remember to call body on all subsequent pipeline participants: ps -o pid,comm | body grep less | body sort -k1nr – bishop Nov 07 '16 at 20:02
  • Can you modify the function so that it can act not only on pipes but on files, e.g. body sort -k2 foo and not just cat foo|body sort -k2 – Tim Sep 03 '17 at 09:53
  • @Tim You can just write <foo body sort -k2 or body sort -k2 <foo. Just one extra character from what you wanted. – Mikel Sep 04 '17 at 13:49
  • Cool stuff! As others mentioned, each subsequent command in the pipe should be "body"ed: . . . | body cmd1 | body cmd2. It can also be used in the rarer case where the header has more than one line (for example, mysql output in table format): mysql -t -e "..." | body body body ... – jsxt May 06 '21 at 13:51
  • I've just realized that avoiding multiple body per each command can be reached with eval. For example: ps | body "grep firefox | sort" is a bit simpler than ps | body grep firefox | body sort and is still working. It's just needed to replace "$@" with eval "$@" in the function suggested by @Mikel. – jsxt May 07 '21 at 15:27
  • Slight side note: I know this is a generic solution, but I just wanted to point out that the ps command has the ability to sort (at least in some versions). You can do ps -o pid,comm --sort comm and it'll sort by that column. Also --sort -comm will sort in reverse order. – HerbCSO Aug 23 '22 at 22:13
  • Works fine, but because the body() call is now the first command after the pipe, we lose alias expansion. You can easily parse the alias definition from the output of alias $1, so I added that to conditionally expand the first argument to body() as follows: alias "$1" &> /dev/null && local ali="$(alias "$1")" && ali="${ali#alias $1=\'}" ali="${ali%\'}" && { $ali "${@:2}"; return; } || "$@" . Note that I added a return after the alias expansion, to prevent entering the || branch on error. – db-inf Dec 19 '23 at 15:28
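jsxt's eval idea can be sketched as a separate function, so that a whole pipeline is passed as one string instead of prefixing every stage with body. The name body_eval is hypothetical, and so is the optional leading number for the header line count (my addition; nesting body, as jsxt describes, achieves the same). Because eval runs its argument as shell code, only pass trusted strings.

```shell
# body_eval [N] 'PIPELINE'   (hypothetical helper name)
# Passes the first N lines (default 1) through unchanged, then runs
# PIPELINE, given as one string, on the rest of stdin via eval.
# eval executes its argument as shell code: only pass trusted strings.
body_eval() (
    n=1
    case $1 in
        ''|*[!0-9]*) ;;            # not a whole number: keep the default
        *) n=$1; shift ;;          # leading number: header line count
    esac
    while [ "$n" -gt 0 ] && IFS= read -r header; do
        printf '%s\n' "$header"
        n=$((n - 1))
    done
    eval "$@"
)
```

Usage: ps -o pid,comm | body_eval 'grep bash | sort -k1n'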
60

You can keep the header at the top like this with bash:

command | (read -r; printf "%s\n" "$REPLY"; sort)

Or do it with perl:

command | perl -e 'print scalar (<>); print sort { ... } <>'
Andy
  • (read;...) seems to lose the spacing between the fields of the header for me. Any suggestions? – jonderry Apr 23 '11 at 01:17
  • @jonderry: Change read to IFS= read. – Mikel Apr 23 '11 at 01:25
  • @Mikel: OK, changing to IFS= didn't fix this problem. However, changing to printf '%s\n' "$REPLY" fixed it for this approach. I haven't noticed an effect from setting IFS. What is this fixing? – jonderry Apr 23 '11 at 01:33
  • @jonderry: Any spaces at the start of the line. Without IFS, leading spaces are stripped out. With IFS=, the line is printed verbatim. – Mikel Apr 23 '11 at 01:49
  • IFS= disables word splitting when reading the input. I don't think it's necessary when reading to $REPLY. echo will expand backslash escapes if xpg_echo is set (not the default); printf is safer in that case. echo $REPLY without quotes will condense whitespace; I think echo "$REPLY" should be okay. read -r is needed if the input may contain backslash escapes. Some of this might depend on bash version. – Andy Apr 23 '11 at 01:50
  • @Andy: Wow, you're right, different rules for read REPLY; echo $REPLY (strips leading spaces) and read; echo $REPLY (doesn't). – Mikel Apr 23 '11 at 02:44
  • @Andy: IIRC, the default value of xpg_echo depends on your system, e.g. on Solaris I think it defaults to true. This is why Gilles likes printf so much: it's the only thing with predictable behavior. – Mikel Apr 23 '11 at 02:47
  • Great solution; in POSIX-features-only shells, use IFS= read -r l; printf '%s\n' "$l", since read always requires a variable argument there. – mklement0 May 02 '14 at 14:06
  • @MartinThoma It just means the command you want to sort, it isn't actually command but any command that produces output you want to sort. e.g. ps -o pid,comm would be used as the command. – Elijah Lynn Sep 29 '17 at 19:11
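The leading-whitespace behaviour discussed in these comments can be checked directly. This sketch uses a named variable, as POSIX read requires (bash's special case of an unmodified $REPLY when no name is given is not shown), and the sample header is an assumption modelled on ps output.

```shell
# A header with leading spaces, like the one ps prints.
hdr='  PID COMMAND'

# Default IFS: read strips leading and trailing blanks from the line.
printf '%s\n' "$hdr" | { read -r line; printf '[%s]\n' "$line"; }
# -> [PID COMMAND]

# Empty IFS: the line is kept verbatim.
printf '%s\n' "$hdr" | { IFS= read -r line; printf '[%s]\n' "$line"; }
# -> [  PID COMMAND]
```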
42

I found an awk version that works nicely in scripts:

awk 'NR == 1; NR > 1 {print $0 | "sort -n"}'
  • I like this, but it requires a bit of explanation - the pipe is inside the awk script. How does that work? Is it calling the sort command externally? Does anyone know of at least a link to a page explaining pipe use within awk? – Wildcard Nov 07 '15 at 01:24
  • @Wildcard you can check the official manual page or this primer. – lapo Nov 02 '16 at 19:52
  • This code fails when I use these arguments to sort: sort -n -k 2b,2 -t $'\t'. The problem is nesting '\t' inside 'NR...{print...}'. The explanation of how to escape the 's is here – Josh Mar 28 '20 at 17:30
  • For fixed-width output, use the -b option, as it will make sort ignore leading blanks in the sort key. The default field separator is non-blank-to-blank transitions, so fields will start with leading blanks. For example, this command lists installed Python packages first by location, then by package name: pip list -v | awk 'NR <= 2; NR > 2 { print $0 | "sort -b -k 3,3 -k 1,1" };' – aparkerlue May 13 '21 at 16:38
  • Note, pipes inside awk may need to be followed by close("sort --exact-args...") to prevent buffering from printing this after later prints. – Excalibur Dec 29 '21 at 18:31
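Combining the comments above, a sketch that passes the first h lines through (generalising aparkerlue's NR <= 2), flushes them before sort starts writing, and closes the pipe with exactly the same command string so awk waits for sort to finish. fflush() is assumed to be available (gawk, mawk and BWK awk provide it). The name awk_keep_header is hypothetical.

```shell
# awk_keep_header H SORTCMD   (hypothetical helper name)
# Prints the first H lines as-is and pipes every other line to SORTCMD.
# SORTCMD is held in one variable so close() gets the identical string.
awk_keep_header() {
    awk -v h="$1" -v cmd="$2" '
        NR <= h { print; fflush(); next }  # header: print, flush so it is
                                           # written before sort outputs
        { print | cmd }                    # body: feed the sort process
        END { close(cmd) }                 # wait for sort to finish
    '
}
```

Usage: pip list -v | awk_keep_header 2 'sort -b -k 3,3 -k 1,1'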
7

Hackish but effective: prepend 0 to all header lines (here the first two) and 1 to all other lines before sorting, so the header lines sort first. Strip the prefix after sorting.

… |
awk '{print (NR <= 2 ? "0 " : "1 ") $0}' |
sort -k 1,1 -k… |
cut -b 3-
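A sketch of the same trick with two adjustments that are my own assumptions, not part of the answer: each header line also gets its line number as a second prefix field, so several header lines keep their original order, and the prefix is stripped with cut -d ' ' -f 3- because it is no longer fixed-width. Since two fields are prepended, column k of the original data is field k+2 in the sort keys. The name prefix_sort is hypothetical.

```shell
# prefix_sort N [SORT_KEYS...]   (hypothetical helper name)
# The first N lines get the prefix "0 <line number> ", all other lines
# "1 0 ". Sorting on those two fields first keeps the header lines on top
# and in their original order; the remaining keys sort the body.
# Because two fields are prepended, original column k is field k+2.
prefix_sort() (
    n=$1; shift
    awk -v n="$n" '{ p = (NR <= n) ? "0 " NR : "1 0"; print p " " $0 }' |
        sort -k1,1n -k2,2n "$@" |
        cut -d ' ' -f 3-            # drop the two prefix fields
)
```

Usage: ps -o pid,comm | prefix_sort 1 -k4,4 (sorts the body by COMMAND, which is original column 2).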
7

The pee command from moreutils is designed for tasks like this.

Example:

To keep one header line and sort the rest of stdin numerically by the second column:

<your command> | pee 'head -n 1' 'tail -n +2 | sort -k 2,2 -n'

Explanation:

pee : pipe stdin to one or more commands and concatenate the results.

head -n 1 : Print the first line of stdin.

tail -n +2 : Print the second and following lines from stdin.

sort -k 2,2 -n : Numerically sort by the second column.

Test:

printf "header\na 1\nc 3\nb 2\n" | pee 'head -n 1' 'tail -n +2 | sort -k 2,2 -n'

gives

header
a 1
b 2
c 3
freeB
  • This is a great solution because it's easily memorizable: I just have to remember pee and then use regular commands I already know like head or sort. That also makes it easily adaptable to other use cases. Thanks a lot! – Jens Bannmann Jun 03 '23 at 08:03
4

Here's some magic perl line noise that you can pipe your output through to sort everything but keep the first line at the top: perl -e 'print scalar <>, sort <>;'

2

I think this is easiest.

ps -ef | ( head -n 1 ; sort )

or this, which may be slightly faster because it does not create a subshell:

ps -ef | { head -n 1 ; sort ; }

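Other uses (redirect the file instead of piping it through cat, so that head can leave the file offset just after the first line):

Shuffle the lines after the header row:

( head -n 1 ; shuf ) < file.txt

Reverse the lines after the header row:

( head -n 1 ; tac ) < file.txt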
don_crissti
  • See http://unix.stackexchange.com/questions/11856/sort-but-keep-header-line-at-the-top#comment15824_11856. This is not actually a good solution. – Wildcard Nov 06 '15 at 21:43
  • Not working: cat file | { head -n 1 ; sort ; } > file2 only shows the header. – Peter Krauss Jul 06 '18 at 19:19
2

I tried the command | { head -1; sort; } solution and can confirm that it really screws things up: head reads multiple lines from the pipe, then outputs just the first one. So what sort receives is the part of the output that head did not read, NOT the rest of the output starting from line 2!

The result is that lines from the beginning of your command output go missing (everything head read past the first line, usually ending partway through a line), although you still have the first line. This is easy to confirm by adding wc -l to the end of the above pipeline, but extraordinarily difficult to track down if you don't know about it! I spent at least 20 minutes trying to work out why I had a partial line (first 100 bytes or so cut off) in my output before solving it.

What I ended up doing, which worked beautifully and didn't require running the command twice, was:

myfile=$(mktemp)
whatever command you want to run > "$myfile"

head -n 1 "$myfile"
sed 1d "$myfile" | sort

rm "$myfile"

If you need to put the output into a file, you can modify this to:

myfile=$(mktemp)
whatever command you want to run > "$myfile"

head -n 1 "$myfile" > outputfile
sed 1d "$myfile" | sort >> outputfile

rm "$myfile"
Wildcard
  • You can use ksh93's head builtin, the line utility (on systems that still have one), GNU sed -u q, or IFS= read -r line; printf '%s\n' "$line", which read the input one byte at a time to avoid that. – Stéphane Chazelas Jan 11 '18 at 21:58
0

Simple and straightforward!

<command> | head -n 1; <command> | sed 1d | sort <....>
  • In sed 1d, 1 is a line number and d means delete, so it prints every line except the first.
Jatsui
  • Just as jofel commented a year and a half ago on Sarva's answer, this starts command twice. So not really suitable for use in a pipeline. – Wildcard Nov 06 '15 at 02:36
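One way to address that objection while keeping this answer's head/sed approach is to run the command once and capture its output in a shell variable. A sketch under two caveats: the whole output is held in memory, and $(...) strips trailing newlines (printf '%s\n' puts exactly one back). The name keep_header_run is hypothetical.

```shell
# keep_header_run CMD [ARGS...]   (hypothetical helper name)
# Runs CMD only once, then prints its first line followed by the
# remaining lines sorted.
keep_header_run() (
    out=$("$@") || exit           # propagate CMD's failure status
    printf '%s\n' "$out" | head -n 1
    printf '%s\n' "$out" | sed 1d | sort
)
```

Usage: keep_header_run ps -o pid,comm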
0

I came here looking for a solution for the command w. This command shows details of who is logged in and what they are doing.

To show the results sorted, but with the headers kept at the top (there are 2 lines of headers), I settled on:

w | head -n 2; w | tail -n +3 | sort

Obviously this runs the command w twice and therefore may not be suitable for all situations. However, to its advantage it is substantially easier to remember.

Note that the tail -n +3 means 'show all lines from the 3rd onwards' (see man tail for details).
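If running w twice is a concern, a single pass is possible because the shell's read builtin never consumes more than one line, leaving the rest of the pipe for sort. A sketch; the name keep_two_headers is hypothetical.

```shell
# keep_two_headers [SORT_OPTIONS...]   (hypothetical helper name)
# Passes two header lines through, then sorts everything else.
keep_two_headers() (
    IFS= read -r line1 && printf '%s\n' "$line1"   # w: uptime/load line
    IFS= read -r line2 && printf '%s\n' "$line2"   # w: column headings
    sort "$@"                                      # the remaining lines
)
```

Usage: w | keep_two_headers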

Robert
0

Using Raku (formerly known as Perl_6)

~$ raku -e '.put for "\x0061".."\x07A";' | raku -e 'put get; .put for lines.sort.reverse.head(10);'

#OR

~$ raku -e '.put for "\x0061".."\x07A";' | raku -e 'put lines[0]; .put for lines[1..*].sort.reverse.head(10);'

Sample Input: English alphabet, one letter per line

Sample Output (the sorted body is truncated to its first 10 lines via .head(10)):

a
z
y
x
w
v
u
t
s
r
q

Answering this to complement the Perl answers already posted. The put get call (1) gets a single line and outputs it with put, and (2) advances the read cursor so the first line isn't read again (e.g. by lines). If you need to read a 2-line header (for example), use (put get) xx 2.

When sorting a file, sometimes you want to filter a little first, for example to remove blank lines. That's easy with Raku: simply interpose a call to .map({$_ if .chars}) after the call to lines (and before the call to sort).
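For comparison with the shell answers in this thread, the same filter-then-sort idea in sh might look like this. A sketch only: sort_nonblank_body is a hypothetical name, and it drops empty lines but not whitespace-only ones.

```shell
# sort_nonblank_body [SORT_OPTIONS...]   (hypothetical helper name)
# Prints the header, removes empty lines from the body, sorts the rest.
sort_nonblank_body() (
    IFS= read -r header && printf '%s\n' "$header"
    grep -v '^$' | sort "$@"
)
```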

A nice advantage of Raku is built-in, high-level support for Unicode. A Cyrillic alphabet equivalent of the Raku code at top is as follows:

~$ raku -e '.put for "\x0430".."\x044F";' | raku -e 'put get; .put for lines.sort.reverse.head(10);'

OR, taking input off the command line:

~$ raku -e '.put for "\x0430".."\x044F";' > Cyrillic.txt
~$ raku -e 'put lines[0]; .put for lines[1..*].sort.reverse.head(10);'  Cyrillic.txt

Sample Output (either Cyrillic example above):

а
я
ю
э
ь
ы
ъ
щ
ш
ч
ц

See URLs below for further discussion on the Raku/Perl6 mailing list regarding how to translate Perl(5) file-input idioms into Raku.

https://www.nntp.perl.org/group/perl.perl6.users/2018/11/msg6295.html
https://www.nntp.perl.org/group/perl.perl6.users/2019/07/msg6825.html

https://raku.org

jubilatious1
0

Expanding on @Mikel's answer, here is a version of the body() function that adds a few features:

  1. It detects whether standard input is a terminal (that is, nothing is piped in) and, if so, prints usage information to STDERR.

  2. If no command is given, it uses sort as the default.

  3. If the first parameter is a number, it uses that number as the number of header lines (default 1).

In testing, it works on Linux bash and macOS zsh.

I made a gist on GitHub: https://gist.github.com/alanhoyle/7ec6bd445a790b62567d8b1ff6941c66

Thus:

body() {
    local HEADER_LINES=1
    local COMMAND="sort"
    if [ -t 0 ]; then
        >&2 echo "ERROR:  body requires piped input!"
        >&2 echo "body: prints the header from a STDIN and sends the 'body' to another command for"
        >&2 echo "    additional processing.  Useful for sort/grep when you want to keep headers"
        >&2 echo "USAGE:  COMMAND | body [ N ] [ COMMAND_TO_PROCESS_OUTPUT ]"
        >&2 echo "    if the first parameter N is a whole number, it prints that number of lines"
        >&2 echo "        before proceeding  [ default: skip $HEADER_LINES ]"
        >&2 echo "    if the [ COMMAND_TO PROCESS_OUTPUT ] is omitted, '$COMMAND' is used"
        return 1
    fi

    local re='^[0-9]+$'

    if [[ $1 =~ $re ]]; then
        HEADER_LINES=$1
        shift
        >&2 echo "body: skipping $HEADER_LINES"
    fi

    local THIS_COMMAND="$*"

    if [ -z "$THIS_COMMAND" ]; then
        >&2 echo "body: running $COMMAND by default"
    fi

    # Pass the header lines through unchanged
    for line in $(eval echo "{1..$HEADER_LINES}")
    do
        IFS= read -r header
        printf '%s\n' "$header"
    done

    if [ -z "$THIS_COMMAND" ]; then
        ( $COMMAND )
    else
        "$@"
    fi
}

Example:

$ body
ERROR:  body requires piped input!
body: prints the header from a STDIN and sends the 'body' to another command for
    additional processing.  Useful for sort/grep when you want to keep headers
USAGE:  COMMAND | body [ N ] [ COMMAND_TO_PROCESS_OUTPUT ]
    if the first parameter N is a whole number, it prints that number of lines
        before proceeding  [ default: skip 1 ]
    if the [ COMMAND_TO PROCESS_OUTPUT ] is omitted, 'sort' is used
$ echo -e "header\n30\n33\n20"
header
30
33
20
$ echo -e "header\n30\n33\n20" | body
body: running sort by default
header
20
30
33
$ echo -e "header\n30\n33\n20" | body grep 0
header
30
20
$ echo -e "header\n30\n33\n20" | body 2
body: skipping 2
body: running sort by default
header
30
20
33

Aphoid
0

Basically, you need something that reads one line, and only one line, from the input, outputs it, and then leaves the rest of the input to sort.

There are quite a few utilities that can read one line and print it:

  • head -n 1
  • sed q
  • awk '{print; exit}'

But most implementations of those read their input in chunks, and will generally end up reading more than one line. On seekable input, they're able to rewind upon exit to just after the first line, but they can't do that on pipes or other non-seekable input.

You need a utility that guarantees it won't read past the end of the first line. The options are:

  • line: that used to be a standard utility but was obsoleted by POSIX on the grounds that it was redundant with the read builtin of sh. It read a line one byte at a time and output it. It always output a line, even when there was none or only a non-delimited one on input.
  • sed -u q: some sed implementations support a -u option for unbuffered operation, and some of those also read their input one byte at a time with it. You also need a sed implementation that doesn't read one line in advance when the $ address is not used. That probably doesn't leave many implementations besides GNU sed. GNU sed also outputs a full line if the input only had a non-delimited line.
  • IFS= read -r line: that reads up to one line and is guaranteed not to read past the end of the line. Except for zsh's read builtin, it can't cope with NUL bytes. It doesn't print the line it has read, but you can use printf for that. With zsh, read -re reads the line and echoes it; it adds a newline character if missing on input.

So your best bet in sh-like shells would be:

sort_body() (
  if IFS= read -r line; then
    printf '%s\n' "$line" &&
      exec sort "$@"
  else # no input or only a non-delimited header line
    printf %s "$line"
    # no point in running sort as there's no input left
  fi
)

Then:

cmd | sort_body -nk1,1 ..
<file sort_body -u

(not sort_body -u file: the data to sort has to be passed on sort_body's stdin).
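A quick way to see the guarantee described above: after read has taken the first line from a pipe, every remaining line is still there for the next command, so wc -l counts all of them (with head -n 1 in its place, the count is typically lower, as another answer here describes).

```shell
# read consumes exactly one line from the pipe; the other 4 lines are
# left for the next command in the group, so wc -l reports 4.
printf '1\n2\n3\n4\n5\n' | { IFS= read -r first; printf 'first: %s\n' "$first"; wc -l; }
```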

0

If those are CSV or TSV files (or any of the other formats listed in the manual), that sounds like a job for mlr (Miller).

For example, with a file like:

$ cat /usr/share/distro-info/debian.csv
version,codename,series,created,release,eol,eol-lts,eol-elts
1.1,Buzz,buzz,1993-08-16,1996-06-17,1997-06-05
1.2,Rex,rex,1996-06-17,1996-12-12,1998-06-05
1.3,Bo,bo,1996-12-12,1997-06-05,1999-03-09
2.0,Hamm,hamm,1997-06-05,1998-07-24,2000-03-09
2.1,Slink,slink,1998-07-24,1999-03-09,2000-10-30
2.2,Potato,potato,1999-03-09,2000-08-15,2003-07-30
[...]
$ mlr --ragged --csv cut -f codename,created then sort -f codename /usr/share/distro-info/debian.csv
codename,created
Bo,1996-12-12
Bookworm,2021-08-14
Bullseye,2019-07-06
Buster,2017-06-17
Buzz,1993-08-16
Etch,2005-06-06
[...]

That is, the header line is not only kept at the top, but the field names in it can also be used in the cut and sort specifications.
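Without Miller, a rough approximation for simple CSV is to look up the column positions from the header in awk and then sort the body while keeping the header. This sketch assumes no quoted fields containing commas (Miller handles those; this does not), and cut_by_name is a hypothetical helper name.

```shell
# cut_by_name COL1,COL2,...   (hypothetical helper name)
# Selects CSV columns by header name. Assumes no quoted commas.
cut_by_name() {
    awk -F, -v OFS=, -v want="$1" '
        NR == 1 {
            for (i = 1; i <= NF; i++) pos[$i] = i   # name -> column number
            n = split(want, w, ",")
            for (j = 1; j <= n; j++)
                if (!(w[j] in pos)) {
                    print "cut_by_name: no such column: " w[j] > "/dev/stderr"
                    exit 1
                }
        }
        {
            out = ""
            for (j = 1; j <= n; j++)
                out = out (j > 1 ? OFS : "") $(pos[w[j]])
            print out
        }'
}
```

Usage: cut_by_name codename,created < /usr/share/distro-info/debian.csv | { IFS= read -r h; printf '%s\n' "$h"; sort; }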

-1
command | head -1; command | tail -n +2 | sort
Sarva
  • This starts command two times. Therefore it is limited to some specific commands. However, for the requested ps command in the example, it would work. – jofel May 20 '14 at 12:00
-3

Try doing:

wc -l file_name | tail -n $(awk '{print $1-1}') file_name | sort
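That pipeline prints only the sorted body (lines 2 to the end); the header line itself is not output. When the input is a regular file, a simpler sketch that keeps the header just reads the file twice, which is harmless for a file. sort_file_keep_header is a hypothetical name.

```shell
# sort_file_keep_header FILE [SORT_OPTIONS...]   (hypothetical helper name)
# Prints the header, then lines 2..end sorted; FILE is simply read twice.
sort_file_keep_header() (
    f=$1; shift
    head -n 1 -- "$f"
    tail -n +2 -- "$f" | sort "$@"
)
```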
Kevdog777
Barry