I am getting output from a program that first produces one line that is a bunch of column headers, and then a bunch of lines of data. I want to cut various columns of this output and view it sorted according to various columns. Without the headers, the cutting and sorting is easily accomplished via the -k option to sort along with cut or awk to view a subset of the columns. However, this method of sorting mixes the column headers in with the rest of the lines of output. Is there an easy way to keep the headers at the top?
16 Answers
Stealing Andy's idea and making it a function so it's easier to use:
# print the header (the first line of input)
# and then run the specified command on the body (the rest of the input)
# use it in a pipeline, e.g. ps | body grep somepattern
body() {
    IFS= read -r header
    printf '%s\n' "$header"
    "$@"
}
Now I can do:
$ ps -o pid,comm | body sort -k2
PID COMMAND
24759 bash
31276 bash
31032 less
31177 less
31020 man
31167 man
...
$ ps -o pid,comm | body grep less
PID COMMAND
31032 less
31177 less
ps -C COMMAND may be more appropriate than grep COMMAND, but it's just an example. Also, you can't use -C if you also used another selection option such as -U. – Mikel Apr 23 '11 at 00:51
Or maybe it should be called body? As in body sort or body grep. Thoughts? – Mikel Apr 23 '11 at 00:57
I tried read in this form first, but noticed that it was eating leading whitespace. Making this a function is a good idea. +1 – Andy Apr 23 '11 at 01:01
Renamed from header to body, because you're doing the action on the body. Hopefully that makes more sense. – Mikel Apr 23 '11 at 01:02
Remember to call body on all subsequent pipeline participants: ps -o pid,comm | body grep less | body sort -k1nr – bishop Nov 07 '16 at 20:02
Can you modify the function so that it can act not only on pipes but on files, e.g. body sort -k2 foo and not just cat foo | body sort -k2 – Tim Sep 03 '17 at 09:53
@Tim You can just write <foo body sort -k2 or body sort -k2 <foo. Just one extra character from what you wanted. – Mikel Sep 04 '17 at 13:49
Cool stuff! As others have mentioned, each subsequent command in the pipe should be "body"ed: . . . | body cmd1 | body cmd2. It can also be used in the rare cases when the header contains more than 1 line (for example, mysql output in table format): mysql -t -e "..." | body body body ... – jsxt May 06 '21 at 13:51
I've just realized that repeating body for each command can be avoided with eval. For example: ps | body "grep firefox | sort" is a bit simpler than ps | body grep firefox | body sort and still works. You just need to replace "$@" with eval "$@" in the function suggested by @Mikel. – jsxt May 07 '21 at 15:27
Slight side note: I know this is a generic solution, but I just wanted to point out that the ps command has the ability to sort (at least in some versions). You can do ps -o pid,comm --sort comm and it'll sort by that column. Also --sort -comm will sort in reverse order. – HerbCSO Aug 23 '22 at 22:13
Works fine, but because the body() call is now the first command after the pipe, we lose alias expansion. You can easily parse the alias definition from the output of alias $1, so I added that to conditionally expand the first argument to body() as follows: alias "$1" &> /dev/null && local ali="$(alias "$1")" && ali="${ali#alias $1=\'}" ali="${ali%\'}" && { $ali "${@:2}"; return; } || "$@". Note that I added a return after the alias expansion, to prevent entering the || branch on error. – db-inf Dec 19 '23 at 15:28
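The chaining and eval suggestions from the comments above can be sketched as follows. This is only an illustration, not part of the original answer: body_eval is a hypothetical name for the eval variant jsxt describes, and the printf input is made-up sample data. Since eval re-parses its argument as shell code, body_eval should only ever be given trusted strings.

```shell
# body, as defined in the answer: print the first line, run the command on the rest
body() {
    IFS= read -r header
    printf '%s\n' "$header"
    "$@"
}

# body_eval (hypothetical name): the argument is a string that eval parses,
# so it may contain a whole pipeline
body_eval() {
    IFS= read -r header
    printf '%s\n' "$header"
    eval "$@"
}

# every stage of the pipeline needs its own body,
# otherwise the header is filtered or sorted like any other line
printf 'NAME KIND\nb x\na x\nc y\n' | body grep x | body sort

# the eval variant takes the pipeline as one quoted string
printf 'NAME KIND\nb x\na x\nc y\n' | body_eval 'grep x | sort'
```

Both pipelines print the NAME KIND header followed by the lines a x and b x.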
You can keep the header at the top like this with bash:
command | (read -r; printf "%s\n" "$REPLY"; sort)
Or do it with perl:
command | perl -e 'print scalar (<>); print sort { ... } <>'
(read;...) seems to lose the spacing between the fields of the header for me. Any suggestions? – jonderry Apr 23 '11 at 01:17
@Mikel: OK, changing to IFS= didn't fix this problem. However, changing to printf '%s\n' "$REPLY" fixed it for this approach. I haven't noticed an effect from setting IFS. What is this fixing? – jonderry Apr 23 '11 at 01:33
@jonderry: Any spaces at the start of the line. Without IFS, leading spaces are stripped out. With IFS=, the line is printed verbatim. – Mikel Apr 23 '11 at 01:49
IFS= disables word splitting when reading the input. I don't think it's necessary when reading to $REPLY. echo will expand backslash escapes if xpg_echo is set (not the default); printf is safer in that case. echo $REPLY without quotes will condense whitespace; I think echo "$REPLY" should be okay. read -r is needed if the input may contain backslash escapes. Some of this might depend on bash version. – Andy Apr 23 '11 at 01:50
@Andy: Wow, you're right, different rules for read REPLY; echo $REPLY (strips leading spaces) and read; echo $REPLY (doesn't). – Mikel Apr 23 '11 at 02:44
@Andy: IIRC, the default value of xpg_echo depends on your system, e.g. on Solaris I think it defaults to true. This is why Gilles likes printf so much: it's the only thing with predictable behavior. – Mikel Apr 23 '11 at 02:47
Great solution; in POSIX-features-only shells, use IFS= read -r l; printf '%s\n' "$l", since read always requires a variable argument there. – mklement0 May 02 '14 at 14:06
@MartinThoma It just means the command you want to sort, it isn't actually command but any command that produces output you want to sort. e.g. ps -o pid,comm would be used as the command. – Elijah Lynn Sep 29 '17 at 19:11
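To make the IFS= and -r discussion above concrete, here is a small sketch comparing three ways of reading the same header line into a named variable. The function names (read_plain, read_raw, read_verbatim) and the sample line are invented for this illustration; the behavior shown is that of the standard read builtin with a variable argument.

```shell
# read_plain: default IFS and no -r, so leading blanks and the backslash are lost
read_plain() { read line; printf '[%s]\n' "$line"; }

# read_raw: -r keeps the backslash, but leading blanks are still stripped
read_raw() { read -r line; printf '[%s]\n' "$line"; }

# read_verbatim: an empty IFS plus -r keeps the line exactly as it was
read_verbatim() { IFS= read -r line; printf '[%s]\n' "$line"; }

# sample header: three leading spaces and a literal backslash followed by t
sample='   PID CMD\tX'

printf '%s\n' "$sample" | read_plain      # prints [PID CMDtX]
printf '%s\n' "$sample" | read_raw        # prints [PID CMD\tX]
printf '%s\n' "$sample" | read_verbatim   # prints [   PID CMD\tX]
```

printf '%s\n' is used for output throughout because, unlike echo, it never interprets backslashes in its arguments.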
I found a nice awk version that works nicely in scripts:
awk 'NR == 1; NR > 1 {print $0 | "sort -n"}'
I like this, but it requires a bit of explanation - the pipe is inside the awk script. How does that work? Is it calling the sort command externally? Does anyone know of at least a link to a page explaining pipe use within awk? – Wildcard Nov 07 '15 at 01:24
This code fails when I use these arguments to sort: sort -n -k 2b,2 -t $'\t'. The problem is nesting '\t' inside 'NR...{print...}'. The explanation of how to escape the 's is here – Josh Mar 28 '20 at 17:30
For fixed-width output, use the -b option, as it will make sort ignore leading blanks in the sort key. The default field separator is non-blank-to-blank transitions, so fields will start with leading blanks. For example, this command lists installed Python packages first by location, then by package name: pip list -v | awk 'NR <= 2; NR > 2 { print $0 | "sort -b -k 3,3 -k 1,1" };' – aparkerlue May 13 '21 at 16:38
Note, pipes inside awk may need to be followed by close("sort --exact-args...") to prevent buffering from printing this after later prints. – Excalibur Dec 29 '21 at 18:31
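As a rough answer to the questions in the comments: print | "command" makes awk start the command once through sh and feed it every line printed to that pipe, and close() waits for the command to finish. The sketch below wraps this in a function; sort_body_awk is a hypothetical name, and passing the sort command in with -v is one way (an assumption of this sketch, not something from the answer) to avoid nesting quotes inside the awk program. Note that awk processes backslash escapes in -v values.

```shell
# sort_body_awk SORT_COMMAND   (hypothetical helper name)
sort_body_awk() {
    awk -v cmd="$1" '
        NR == 1 { print; fflush(); next }   # header: print it and flush right away
        { print | cmd }                     # body: cmd is started once and fed each line
        END { close(cmd) }                  # wait until cmd has written its output
    '
}

printf 'size name\n10 b\n9 a\n' | sort_body_awk 'sort -n'
```

This prints the size name header, then 9 a, then 10 b. The fflush() call is there so the header cannot end up after the sorted body when awk's own output is buffered.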
Hackish but effective: prepend 0 to all header lines and 1 to all other lines before sorting. Strip the prefix after sorting.
… |
awk '{print (NR <= 2 ? "0 " : "1 ") $0}' |
sort -k 1 -k… |
cut -b 3-
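A filled-in version of the same trick, as a sketch with made-up sample data and a single header line. sort_decorated is a hypothetical name. The one thing to keep in mind is that the prefix becomes field 1, so every sort key shifts by one: the second column of the data is field 3 here.

```shell
# sort_decorated SORT_KEYS...   (hypothetical helper name)
# tag the header with 0 and the data with 1, sort on the tag first, strip the tag
sort_decorated() {
    awk '{ print (NR <= 1 ? "0 " : "1 ") $0 }' |
        sort -k1,1n "$@" |
        cut -b 3-
}

# sort numerically by the second data column, which is field 3 after tagging
printf 'name size\na 3\nb 1\nc 2\n' | sort_decorated -k3,3n
```

This prints name size, then b 1, c 2 and a 3.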
- 829,060
The pee command from moreutils is designed for tasks like this.
Example:
To keep one header line, and sort the second (numeric) column in stdin:
<your command> | pee 'head -n 1' 'tail -n +2 | sort -k 2,2 -n'
Explanation:
pee : pipe stdin to one or more commands and concatenate the results.
head -n 1 : Print the first line of stdin.
tail -n +2 : Print the second and following lines from stdin.
sort -k 2,2 -n : Numerically sort by the second column.
Test:
printf "header\na 1\nc 3\nb 2\n" | pee 'head -n 1' 'tail -n +2 | sort -k 2,2 -n'
gives
header
a 1
b 2
c 3
This is a great solution because it's easily memorizable: I just have to remember pee and then use regular commands I already know like head or sort. That also makes it easily adaptable to other use cases. Thanks a lot! – Jens Bannmann Jun 03 '23 at 08:03
Here's some magic perl line noise that you can pipe your output through to sort everything but keep the first line at the top:
perl -e 'print scalar <>, sort <>;'
I think this is easiest.
ps -ef | ( head -n 1 ; sort )
or this which is possibly faster as it does not create a sub shell
ps -ef | { head -n 1 ; sort ; }
Other cool uses
shuffle lines after header row
cat file.txt | ( head -n 1 ; shuf )
reverse lines after header row
cat file.txt | ( head -n 1 ; tac )
See http://unix.stackexchange.com/questions/11856/sort-but-keep-header-line-at-the-top#comment15824_11856. This is not actually a good solution. – Wildcard Nov 06 '15 at 21:43
Not working, cat file | { head -n 1 ; sort ; } > file2 only shows head – Peter Krauss Jul 06 '18 at 19:19
I tried the command | { head -1; sort; } solution and can confirm that it really screws things up: head reads in multiple lines from the pipe, then outputs just the first one. So only the part of the output that head did not read is passed to sort, NOT the rest of the output starting from line 2!
The result is that you are missing lines (and one partial line!) from the beginning of your command output (except that you still have the first line). This is easy to confirm by adding a pipe to wc at the end of the above pipeline, but extraordinarily difficult to track down if you don't know about it! I spent at least 20 minutes trying to work out why I had a partial line (first 100 bytes or so cut off) in my output before solving it.
What I ended up doing, which worked beautifully and didn't require running the command twice, was:
myfile=$(mktemp)
whatever command you want to run > "$myfile"
head -1 "$myfile"
sed 1d "$myfile" | sort
rm "$myfile"
If you need to put the output into a file, you can modify this to:
myfile=$(mktemp)
whatever command you want to run > "$myfile"
head -1 "$myfile" > outputfile
sed 1d "$myfile" | sort >> outputfile
rm "$myfile"
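The temporary-file approach can also be wrapped in a function that reads from a pipe, so it is usable like any other filter. This is a sketch only: sort_with_tmpfile is a hypothetical name, and it assumes mktemp is available, as in the snippets above.

```shell
# sort_with_tmpfile SORT_ARGS...   (hypothetical helper name)
sort_with_tmpfile() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                 # save all of stdin; a regular file can be read twice
    head -n 1 "$tmp"             # the header
    sed 1d "$tmp" | sort "$@"    # everything after the header, sorted
    rm -f "$tmp"
}

printf 'header\n3\n1\n2\n' | sort_with_tmpfile -n
```

This prints header followed by 1, 2 and 3. Because head and sed each open the file themselves, whatever head reads ahead is not lost to the next command.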
You can use ksh93's head builtin or the line utility (on systems that still have one) or gnu-sed -u q or IFS= read -r line; printf '%s\n' "$line", that read the input one byte at a time to avoid that. – Stéphane Chazelas Jan 11 '18 at 21:58
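A quick way to check a header/body split for lost lines is to compare line counts, as the answer suggests with wc. The sketch below does that for the read-based variant from the comment above; split_with_read is a hypothetical name and the 5000 numbered lines are arbitrary test input.

```shell
# split_with_read (hypothetical name): header via the read builtin, body via sort
split_with_read() {
    IFS= read -r header
    printf '%s\n' "$header"
    sort -n
}

# 5000 numbered lines go in; count the lines that come out
awk 'BEGIN { for (i = 1; i <= 5000; i++) print i }' | split_with_read | wc -l
```

The count should be 5000, since read consumes exactly one line and sort sees all of the remaining 4999. Running the same check with head -n 1 in place of the read and printf pair is a way to see whether a given head implementation swallows extra input from a pipe.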
Simple and straightforward!
<command> | head -n 1; <command> | sed 1d | sort <....>
- sed nd: 'n' specifies the line number, and 'd' stands for delete.
Just as jofel commented a year and a half ago on Sarva's answer, this starts command twice. So not really suitable for use in a pipeline. – Wildcard Nov 06 '15 at 02:36
I came here looking for a solution for the command w. This command shows details of who is logged in and what they are doing.
To show the results sorted, but with the headers kept at the top (there are 2 lines of headers), I settled on:
w | head -n 2; w | tail -n +3 | sort
Obviously this runs the command w twice and therefore may not be suitable for all situations. However, to its advantage it is substantially easier to remember.
Note that the tail -n +3 means 'show all lines from the 3rd onwards' (see man tail for details).
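If running the command twice is a problem, a header of several lines can also be passed through with the read builtin in a loop. This is a sketch under stated assumptions: keep_header is a hypothetical name, its first argument is the number of header lines, and the sample input stands in for real command output.

```shell
# keep_header N COMMAND...   (hypothetical helper name)
# copy the first N lines through unchanged, then run COMMAND on the rest
keep_header() {
    n=$1
    shift
    while [ "$n" -gt 0 ] && IFS= read -r line; do
        printf '%s\n' "$line"
        n=$((n - 1))
    done
    "$@"
}

printf 'title\ncolumns\nb\na\n' | keep_header 2 sort
```

This prints title and columns in their original order, then a and b. With it, the example from this answer would become w | keep_header 2 sort, which runs w only once.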
Using Raku (formerly known as Perl_6)
~$ raku -e '.put for "\x0061".."\x07A";' | raku -e 'put get; .put for lines.sort.reverse.head(10);'
#OR
~$ raku -e '.put for "\x0061".."\x07A";' | raku -e 'put lines[0]; .put for lines[1..*].sort.reverse.head(10);'
Sample Input: English alphabet, one letter per line
Sample Output (truncated to first 10 lines via .head(10)):
a
z
y
x
w
v
u
t
s
r
q
Answering this to complement Perl answers already posted. The put get call 1. 'gets' a single line and out-'puts' it, then 2. advances the read cursor so the first line isn't read again (e.g. by lines). If you need to read a 2-line header (for example), use (put get) xx 2.
When sorting a file, sometimes you want to filter a little first--an example is removing blank lines. That's easy with Raku, simply interpose a call to .map({$_ if .chars}) after the call to lines (and before the call to sort).
A nice advantage of Raku is built-in, high-level support for Unicode. A Cyrillic alphabet equivalent of the Raku code at top is as follows:
~$ raku -e '.put for "\x0430".."\x044F";' | raku -e 'put get; .put for lines.sort.reverse.head(10);'
OR, taking input off the command line:
~$ raku -e '.put for "\x0430".."\x044F";' > Cyrillic.txt
~$ raku -e 'put lines[0]; .put for lines[1..*].sort.reverse.head(10);' Cyrillic.txt
Sample Output (either Cyrillic example above):
а
я
ю
э
ь
ы
ъ
щ
ш
ч
ц
See URLs below for further discussion on the Raku/Perl6 mailing list regarding how to translate Perl(5) file-input idioms into Raku.
https://www.nntp.perl.org/group/perl.perl6.users/2018/11/msg6295.html
https://www.nntp.perl.org/group/perl.perl6.users/2019/07/msg6825.html
Expanding on @Mikel's answer, here is a version of the body() function that adds a few features:
- It detects if there is input coming in on a pipe, and if not prints out usage information to STDERR.
- If no command is given, it uses sort as the default.
- If the first parameter is a number, it uses that number as the number of header lines (default 1).
In testing, it works on Linux bash and macOS zsh.
I made a gist at github: https://gist.github.com/alanhoyle/7ec6bd445a790b62567d8b1ff6941c66
Thus:
body() {
    local HEADER_LINES=1
    local COMMAND="sort"
    # stdin is a terminal, so nothing was piped in: print usage and stop
    if [ -t 0 ]; then
        >&2 echo "ERROR: body requires piped input!"
        >&2 echo "body: prints the header from a STDIN and sends the 'body' to another command for"
        >&2 echo " additional processing. Useful for sort/grep when you want to keep headers"
        >&2 echo "USAGE: COMMAND | body [ N ] [ COMMAND_TO_PROCESS_OUTPUT ]"
        >&2 echo " if the first parameter N is a whole number, it prints that number of lines"
        >&2 echo " before proceeding [ default: skip $HEADER_LINES ]"
        >&2 echo " if the [ COMMAND_TO PROCESS_OUTPUT ] is omitted, '$COMMAND' is used"
        return 1
    fi
    # an all-digit first argument is the number of header lines
    local re='^[0-9]+$'
    if [[ $1 =~ $re ]] ; then
        HEADER_LINES=$1
        shift
        >&2 echo "body: skipping $HEADER_LINES"
    fi
    # join the remaining arguments only to test whether a command was given
    local THIS_COMMAND="$*"
    if [ -z "$THIS_COMMAND" ] ; then
        >&2 echo "body: running $COMMAND by default"
    fi
    # copy the header lines through unchanged
    for line in $(eval echo "{1..$HEADER_LINES}")
    do
        IFS= read -r header
        printf '%s\n' "$header"
    done
    if [ -z "$THIS_COMMAND" ] ; then
        ( $COMMAND )
    else
        "$@"
    fi
}
Example:
$ body
ERROR: body requires piped input!
body: prints the header from a STDIN and sends the 'body' to another command for
additional processing. Useful for sort/grep when you want to keep headers
USAGE: COMMAND | body [ N ] [ COMMAND_TO_PROCESS_OUTPUT ]
if the first parameter N is a whole number, it prints that number of lines
before proceeding [ default: skip 1 ]
if the [ COMMAND_TO PROCESS_OUTPUT ] is omitted, 'sort' is used
$ echo -e "header\n30\n33\n20"
header
30
33
20
$ echo -e "header\n30\n33\n20" | body
body: running sort by default
header
20
30
33
$ echo -e "header\n30\n33\n20" | body grep 0
header
30
20
$ echo -e "header\n30\n33\n20" | body 2
body: skipping 2
body: running sort by default
header
30
20
33
Basically, you need something that reads one line and only one line from the input, outputs it, and then leaves the rest of the input to sort.
There are quite a few utilities that can read one line and print it:
- head -n 1
- sed q
- awk '{print; exit}'
But most implementations of those read their input in chunks, and will generally end up reading more than one line. On seekable input, they're able to rewind upon exit to just after the first line, but they can't do that on pipes or other non-seekable input.
You need a utility that guarantees it doesn't read past the end of the first line. The options are:
- line: that used to be a standard utility but was obsoleted by POSIX on the ground that it was redundant with the read builtin of sh. It read lines one byte at a time and output them. It always output a line, even when there was none or only a non-delimited one on input.
- sed -u q: some sed implementations support a -u option for unbuffered, and some of those that support it also read their input one byte at a time with it. You also need a sed implementation that doesn't read one line in advance when the $ address is not used, which probably doesn't leave many implementations besides GNU sed. GNU sed also outputs a full line if the input only had a non-delimited line.
- IFS= read -r line: that reads up to one line and is guaranteed not to read past the end of the line. Except for zsh's read builtin, it can't cope with NUL bytes. It doesn't print the line it has read, but you can use printf for that. With zsh, read -re reads the line and echoes it; it adds a newline character if missing on input.
So your best bet in sh-like shells would be:
sort_body() (
    if IFS= read -r line; then
        printf '%s\n' "$line" &&
            exec sort "$@"
    else # no input or only a non-delimited header line
        printf %s "$line"
        # no point in running sort as there's no input left
    fi
)
Then:
cmd | sort_body -nk1,1 ..
<file sort_body -u
(not sort_body -u file, the thing to sort has to be passed on sort_body's stdin).
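For a self-contained trial run, the function can be fed made-up data with printf. The block below repeats sort_body from the answer so that it runs on its own; the sample lines and the -k2,2n key are only illustrative.

```shell
# sort_body as given in the answer: header via read, the rest handed to sort
sort_body() (
    if IFS= read -r line; then
        printf '%s\n' "$line" &&
            exec sort "$@"
    else # no input or only a non-delimited header line
        printf %s "$line"
    fi
)

# all arguments go to sort: numeric sort on the second column
printf 'name size\nb 2\na 10\nc 1\n' | sort_body -k2,2n
```

This prints name size, then c 1, b 2 and a 10. If the input is only a header without a trailing newline, the else branch prints it back as is and sort is never started.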
If those are CSVs or TSVs (or other formats, see the manual), that sounds like a job for mlr (Miller).
For example, with a file looking like:
$ cat /usr/share/distro-info/debian.csv
version,codename,series,created,release,eol,eol-lts,eol-elts
1.1,Buzz,buzz,1993-08-16,1996-06-17,1997-06-05
1.2,Rex,rex,1996-06-17,1996-12-12,1998-06-05
1.3,Bo,bo,1996-12-12,1997-06-05,1999-03-09
2.0,Hamm,hamm,1997-06-05,1998-07-24,2000-03-09
2.1,Slink,slink,1998-07-24,1999-03-09,2000-10-30
2.2,Potato,potato,1999-03-09,2000-08-15,2003-07-30
[...]
$ mlr --ragged --csv cut -f codename,created then sort -f codename /usr/share/distro-info/debian.csv
codename,created
Bo,1996-12-12
Bookworm,2021-08-14
Bullseye,2019-07-06
Buster,2017-06-17
Buzz,1993-08-16
Etch,2005-06-06
[...]
That is, the order is not only preserved, but the field names in there can also be used in the cut or sort specifications.
command | head -1; command | tail -n +2 | sort
This starts command two times. Therefore it is limited to some specific commands. However, for the requested ps command in the example, it would work. – jofel May 20 '14 at 12:00
{ head -1; sort; } to work. It always deletes a bunch of the text after the first line. Does anyone know why this happens? – jonderry Apr 23 '11 at 01:02
head is reading more than one line into a buffer and throwing most of it away. My sed idea had the same problem. – Andy Apr 23 '11 at 01:09
lseekable input so it won't work when reading from a pipe. It will work if you redirect to a file >outfile and then run { head -n 1; sort; } <outfile – don_crissti Sep 26 '15 at 13:40