
So this is for homework, but I will not be asking the specific homework question.

I need to use head and tail to grab different sets of lines from one file, for example lines 6-11 and lines 19-24, and save them both to another file. I know I can do this using append, such as

head -11 file | tail -6 > file1; head -24 file | tail -6 >> file1

But I don't think we are supposed to.
Is there a specific way I could combine the head and tail commands and then save to the file?

  • Are they specifically asking you to use head and tail? If so, your solution is pretty much the best you can do. If you're allowed to use other programs, sed or awk might allow for nicer solutions (i.e. with fewer process invocations). – n.st Jan 27 '15 at 19:44
  • Yea, they are asking for us to use head and tail. Thank you for your answer. – user2709291 Jan 27 '15 at 19:45
  • One more thing I can add: You can get around the appending output redirection (>>) by enclosing the two commands in parentheses to redirect their concatenated output: (head -11 file | tail -6; head -24 file | tail -6) > file1. It really comes down to personal preference which is nicer. – n.st Jan 27 '15 at 19:49
  • Thank you that will work out very well. I really appreciate it. – user2709291 Jan 27 '15 at 19:51

5 Answers


You can do it with head alone and basic arithmetic, if you group commands with { ... ; } using a construct like

{ head -n ...; head -n ...; ...; } < input_file > output_file

where all commands share the same input (thanks @mikeserv).
Getting lines 6-11 and lines 19-24 is equivalent to:

head -n 5 >/dev/null  # dump the first 5 lines to `/dev/null` then
head -n 6             # print the next 6 lines (i.e. from 6 to 11) then
head -n 7 >/dev/null  # dump the next 7 lines to `/dev/null` ( from 12 to 18)
head -n 6             # then print the next 6 lines (19 up to 24)

So, basically, you would run:

{ head -n 5 >/dev/null; head -n 6; head -n 7 >/dev/null; head -n 6; } < input_file > output_file
don_crissti
  • It does not work for me. The input is consumed entirely by the first head – Whimusical May 03 '20 at 11:46
  • Hi @Whimusical, what I found is this - when I use a pipe, as in cat sample.csv | ..., the input is only consumed by the first head. But if I redirect it, as in < sample.csv, both head commands can consume the input csv file. Do you see the same thing? – user3768495 Oct 22 '21 at 19:07
  • Your heads are buffering; that's why the input is consumed... if they were operating line-based, my code above would work. – don_crissti Oct 25 '21 at 11:16
  • @don_crissti While this answer may be necessary if the input is unseekable (and not a cooked tty), it has extremely poor performance on large files, as head needs to make a separate syscall to read each byte, assuming it can even be instructed to do so. It also cannot output the lines in a different order from the input, though it's not clear from the question whether that might be necessary. – Martin Kealey Oct 16 '22 at 23:48

You can use the { … } grouping construct to apply the redirection operator to a compound command.

{ head -n 11 file | tail -n 6; head -n 24 file | tail -n 6; } >file1

Instead of duplicating the first M+N lines and keeping only the last N, you can skip the first M lines and duplicate the next N. This is measurably faster on large files. Beware that the +N argument of tail is not the number of lines to skip, but one plus that — it's the number of the first line to print with lines numbered from 1.

{ tail -n +6 file | head -n 6; tail -n +19 file | head -n 6; } >file1
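As a quick side illustration of that numbering (not part of the solution itself):

seq 10 | tail -n +6    # prints 6 through 10: the first 5 lines are skipped and output starts at line 6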

Either way, the output file is only opened once, but the input file is traversed once for each snippet to extract. How about grouping the inputs?

{ tail -n +6 | head -n 6; tail -n +14 | head -n 6; } <file >file1

In general, this doesn't work. (It might work on some systems, at least when the input is a regular file.) Why? Because of input buffering. Most programs, including tail, don't read their input byte by byte, but a few kilobytes at a time, because it's faster. So tail reads a few kilobytes, skips a bit at the beginning, passes a bit more to head, and stops — but what is read is read, and is not available for the next command.
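For instance, feeding the group from a pipe usually makes the problem visible (the exact output depends on the tail implementation and its buffer size, so treat this as an illustration rather than a guarantee):

seq 30 | { tail -n +6 | head -n 6; tail -n +14 | head -n 6; }

On a typical system this prints only lines 6 through 11: the first tail reads the whole pipe into its buffer, so nothing is left over for the second one.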

Another approach is to use head piped to /dev/null to skip lines.

{ head -n 5 >/dev/null; head -n 6; head -n 7 >/dev/null; head -n 6; } <file >file1

Again, this is not guaranteed to work, due to buffering. It happens to work with the head command from GNU coreutils (the one found on non-embedded Linux systems), when the input is from a regular file. That's because once this implementation of head has read what it wants, it sets the file position to the first byte that it didn't output. This doesn't work if the input is a pipe.
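To see the difference in practice (assuming GNU coreutils head; other implementations may behave differently):

seq 30 > file
{ head -n 5 >/dev/null; head -n 6; } <file        # prints 6 through 11: head puts the file offset back where it stopped
seq 30 | { head -n 5 >/dev/null; head -n 6; }     # may print nothing: the first head drains the pipe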

A simpler way to print several sequences of lines from a file is to call a more generalist tool such as sed or awk. (This can be slower, but it only matters for extremely large files.)

sed -n -e '6,11p' -e '19,24p' <file >file1
sed -e '1,5d' -e '12,18d' -e '24q' <file >file1
awk '6<=NR && NR<=11 || 19<=NR && NR<=24' <file >file1
awk 'NR==6, NR==11; NR==19, NR==24' <file >file1
  • It doesn't happen to work, it is standard, specified behavior - though certainly, as you say, a pipe is not a reliable input source for shared input. UTILITY DESCRIPTION DEFAULTS: When a standard utility reads a seekable input file and terminates without an error before it reaches end-of-file, the utility will ensure that the file offset in the open file description is properly positioned just past the last byte processed by the utility. – mikeserv Jan 28 '15 at 07:18

I know you said that you need to use head and tail, but sed is definitely the simpler tool for the job here.

$ cat foo
a 1 1
a 2 1
b 1 1
a 3 1
c 3 1
c 3 1
$ sed -ne '2,4p;6p' foo
a 2 1
b 1 1
a 3 1
c 3 1

You can even build the blocks in a string with some other process and run that through sed.

$ a="2,4p;6p"
$ sed -ne $a foo
a 2 1
b 1 1
a 3 1
c 3 1

-n suppresses the automatic printing of each line; you then specify the ranges to print with p, giving the first and last line numbers of the range separated by a comma.

That being said, you can either do the command grouping that @don_crissti suggested, or loop through the file a few times with head/tail grabbing a chunk of lines each time you go through.

$ head -4 foo | tail -3; head -6 foo | tail -1
a 2 1
b 1 1
a 3 1
c 3 1

The more lines the file has and the more blocks you extract, the more efficient sed becomes compared with repeated head/tail passes.


With sed you might do:

sed '24q;1,5d;12,18d' <infile >outfile

...Possibly a more efficient solution could be had with head. Don has already demonstrated how that might work very well, but I've been playing around with it as well. Something you might do to handle this specific case:

for   n in 5 6 7 6
do    head -n"$n" >&"$((1+n%2))"
done  <infile >outfile 2>/dev/null

...which would call head 4 times writing either to outfile or to /dev/null depending on whether that iteration's value for $n is an even or odd number.
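As a quick check (this relies on the same offset-repositioning behavior of head discussed in the other answers, so assume GNU head and a regular, seekable infile):

seq 30 > infile
for n in 5 6 7 6; do head -n"$n" >&"$((1+n%2))"; done <infile >outfile 2>/dev/null
cat outfile    # should print 6 through 11, then 19 through 24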

For more general cases, I cobbled this together from some other stuff I already had:

somehead()( 
### call it like:
### somehead -[repeat] [-][numlines]* <infile >outfile
    set -e -- "${1#-}" "$@"                             #-e for arg validation
    r=; cd -- "${TMP:-/tmp}"                            #go to tmp
    dd bs=4096 of="$$$$" <&4 2>&3 &                     #dd <in >tmpfile &bg
    until [ -s "$$$$" ]; do :; done                     #wait while tmpfile empty
    exec <"$$$$" 4<&-;   rm "$$$$"                      #<tmpfile; rm tmpfile
    [ "$3${1}0" -ne "$3${2#?}0" ]          ||           #validate args - chk $1
            shift "$(((r=-${1:--1})||1))"; shift        #shift 1||2
    while [ "$(((r+=(_n=1))-1))" -ne 0 ]   &&           #while ! $rptmax &&
          IFS= read -r l                   &&           #      ! EOF     &&
          printf "%.$(($1>0?${#l}+1:0))s" "$l           #      ? printf  do
";  do    for n do [ "${n#-}" -gt 0 ]      || exit      #args all -[nums>0]
          head "-n$((${n#-}-_n))" >&"$((n>(_n=0)?1:3))" #head -n?$1 >?[+-]
    done; done                                          #done and done
)   4<&0 3>/dev/null                                    #4<for dd 3>for head

This can do your thing like:

 seq 100 | somehead -1 -5 6 -7 6

...which prints...

6
7
8
9
10
11
19
20
21
22
23
24

It expects its first arg to be a repeat count prefixed with a -, or, failing that, just a -. If a count is provided it will repeat the line pattern given in the following args as many times as specified and stop as soon as it has done so.

For each arg that follows it will interpret a negative integer to indicate a line-count which should be written to /dev/null and a positive integer to indicate a line count which should be written to stdout.

So in the above example it prints the first 5 lines to /dev/null, the next 6 to stdout, the next 7 to /dev/null again, and the next 6 once again to stdout. Having reached the last of its args and fully cycled through the -1 repeat count, it then quits. If the first arg had been -2 it would have repeated the process one more time; with a bare - it would repeat for as long as it could.

For each arg cycle the while loop is processed once through. At the top of each loop the first line from stdin is read into the shell variable $l. This is necessary because while head </dev/null; do :; done would repeat indefinitely - head does not indicate in its return value when it has reached end of file. So the check against EOF is dedicated to read, and printf will write $l plus a newline to stdout only if the second argument is a positive integer.

The read check complicates the loop a little because, immediately afterward, another loop is called - a for loop which iterates over args 2 through $#, each represented in $n for an iteration of its parent while loop. This means that, on each iteration, the first arg must be decremented by one from the value specified on the command line, while all the others should retain their original values; so the value of the $_n marker variable is subtracted from each, but it only ever holds a value greater than 0 for the first arg.

That constitutes the main loop of the function, but the bulk of the code is at the top and is intended to enable the function to cleanly buffer even a pipe as input. It works by first calling a backgrounded dd to copy the function's input to a tmpfile in 4k blocks. The function then sets up a hold loop - which should almost never complete even a single full cycle - just to ensure that dd has made at least one write to the file before the function replaces its stdin with a file descriptor linked to the tmpfile and immediately unlinks the file with rm. This lets the function reliably process the stream without requiring traps or other cleanup - as soon as it releases its claim on the fd, the tmpfile ceases to exist because its only named filesystem link has already been removed.

mikeserv

Use a bash function like this:

seq 1 30 > input.txt
f(){ head $1 input.txt | tail $2 >> output.txt ;}; f -11 -2; f -24 -3
cat output.txt
10
11
22
23
24

This is a bit of overkill in this case, but if your filters grow larger it can become a boon.
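For instance, a variant that takes the starting line and a count directly (a hypothetical helper named g, sketched with the skip-then-take idiom from the other answers) keeps the call sites readable as the number of ranges grows:

: > output.txt                                                # start with an empty output file
g(){ tail -n +"$1" input.txt | head -n "$2" >> output.txt ;}
g 6 6; g 19 6                                                 # lines 6-11, then lines 19-24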

mkalkov