29

I would like to remove empty lines from the beginning and the end of file, but not remove empty lines between non-empty lines in the middle. I think sed or awk would be the solution.

Source:

1:
2:
3:line1
4:
5:line2
6:
7:
8:

Output:

1:line1
2:
3:line2
ilkkachu
Feriman

10 Answers

34

Try this,

To remove blank lines from the beginning of a file:

sed -i '/./,$!d' filename

To remove blank lines from the end of a file:

sed -i -e :a -e '/^\n*$/{$d;N;ba' -e '}' file

To remove blank lines from both the beginning and the end of a file:

sed -i -e '/./,$!d' -e :a -e '/^\n*$/{$d;N;ba' -e '}' file

From man sed,

-e script, --expression=script -> add the script to the commands to be executed

b label -> Branch to label; if label is omitted, branch to end of script.

:a -> Define the label a, which is the target of ba. (This is not the a command, which appends text after a line.)

$ -> Match the last line.

N -> Add a newline to the pattern space, then append the next line of input to the pattern space. If there is no more input then sed exits without processing any more commands.
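
Since the combined one-liner is dense, here is the same script spread over several lines with comments, as a sketch. The function name trim_blank_edges is made up for illustration, and -i is left out so the result goes to standard output instead of back into the file.

```shell
# trim_blank_edges is a hypothetical wrapper name, not part of sed.
trim_blank_edges() {
  sed -e '
    # Lines before the first non-empty line fall outside the range
    # "first non-empty line .. last line" and are deleted.
    /./,$!d
    # Label a: the loop below jumps back here, past the range command.
    :a
    # Does the pattern space hold nothing but newlines (a run of empty lines)?
    /^\n*$/{
      # On the last input line the run is trailing, so drop it.
      $d
      # Otherwise append the next input line and test the run again.
      N
      ba
    }
  ' "$@"
}

printf '\n\nline1\n\nline2\n\n\n\n' | trim_blank_edges
```

The first command leaves interior empty lines alone because the range, once opened, runs to the last line. The loop is needed because sed reads one line at a time: N keeps growing the pattern space until either a non-empty line shows the run was interior (everything is printed) or the input ends ($d deletes the run). Lines that contain only spaces or tabs count as non-empty here.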

Stack EG
  • 2
    Note that -i is a non-portable extension to the POSIX sed utility and will not be available on all systems. – Andrew Henle Nov 15 '19 at 11:27
  • I see that these commands work, but I'm not quite sure how. Could you explain them in more detail? In particular, in the second example, why doesn't the first clause delete embedded blank lines? Why does the second clause need to loop? It looks like it gets a bunch of newlines at once. Does any of this work on white space-only lines or are you considering them non-blank? – Joe Nov 16 '19 at 16:41
  • Please explain the individual commands, how they are working and what is the meaning of those flags. – Prvt_Yadav Nov 17 '19 at 10:26
  • 1
    It's better to do something like ^[[:space:]]*$ instead of just a newline since there are DOS, Linux, and Mac kinds of newlines that will mess you up if you just try to strip out one kind of them. – labyrinth Sep 13 '20 at 02:58
  • In regards to @AndrewHenle's caveat, the command works just as well for streaming, if you don't want to worry about the difference between GNU -i and BSD -i '' – Gordon Jul 18 '21 at 17:29
  • 1
    sed '/[^[:space:]]/,$!d', sed -e :a -e '/^[[:space:]]*$/{$d;N;ba' -e '}' and sed -e '/[^[:space:]]/,$!d' -e :a -e '/^[[:space:]]*$/{$d;N;ba' -e '}' can also remove lines with only spaces. (re-comment to fix a bug of the previous one) – Míng Dec 20 '22 at 10:03
10

This little awk program will remove empty lines at the start of a file:

awk 'NF {p=1} p'

So we can combine that with tac that reverses lines and get:

awk 'NF {p=1} p' file | tac | awk 'NF {p=1} p' | tac
line1

line2

Stealing @guillermo chamorro's command substitution trick:

awk 'NF {p=1} p' <<< "$(< file)"
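
For input that cannot be reversed with tac or held in a shell variable, a single-pass variant is also possible. This is only a sketch and not the method shown above: it holds back blank lines and prints them once a later non-blank line proves they were interior. The function name and the variables seen and held are made up for illustration.

```shell
# trim_blank_edges_awk is a hypothetical name chosen for this sketch.
trim_blank_edges_awk() {
  awk '
    NF {                  # line has at least one field, so it is not blank
      printf "%s", held   # flush blank lines held back since the last content
      held = ""
      seen = 1
      print
      next
    }
    seen { held = held $0 "\n" }   # blank after content: hold it back for now
  ' "$@"
}

printf '\n\nline1\n\nline2\n\n\n\n' | trim_blank_edges_awk
```

As with the NF test in the one-liner, lines containing only spaces or tabs count as blank when the default field separator is in use.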
glenn jackman
  • 2
    Does this require that the lines are truly empty, or is it enough that they are blank? – Kusalananda Jul 18 '21 at 17:30
  • That's a good question. It seems that if we use the default FS, blank lines get ignored: echo $' \t \t ' | awk '{print NF}' prints 0, but if we specify a field separator: echo $' \t \t ' | awk -F '\t' '{print NF}' prints 3 – glenn jackman Jul 18 '21 at 18:26
8

If the file is small enough to fit memory requirements:

$ perl -0777 -pe 's/^\n+|\n\K\n+$//g' ip.txt
line1

line2
  • -0777 to slurp entire input file
  • ^\n+ one or more newlines from start of string
  • \n\K to prevent deleting newline character of last non-empty line
  • \n+$ one or more newlines at end of string
Sundeep
7

I propose this:

printf '%s\n' "$(cat file)" | sed '/./,$!d'

It prints the whole text except the leading and trailing blank lines. So, if we extend the example:

(blank)
(blank)
line1

line2
line1

line2
line1

line2
line1

line2
(blank)
(blank)

It will output:

line1

line2
line1

line2
line1

line2
line1

line2
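
This works because command substitution strips every trailing newline from the captured text, and printf '%s\n' then puts exactly one back. A small sketch of that effect, using a temporary file created with mktemp (the file contents and variable names are made up for the demonstration):

```shell
tmp=$(mktemp)
printf '\n\nline1\n\nline2\n\n\n\n' > "$tmp"

# The file has 8 lines; after command substitution and printf only the
# first 5 remain, because the 3 trailing empty lines were stripped.
wc -l < "$tmp"
printf '%s\n' "$(cat "$tmp")" | wc -l

# sed then deletes everything before the first non-empty line.
trimmed=$(printf '%s\n' "$(cat "$tmp")" | sed '/./,$!d')
printf '%s\n' "$trimmed"

rm -f "$tmp"
```

Because the whole file passes through a shell variable, this approach suits small text files best.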

Quasímodo
  • 2
    Clever. The trick here is that command substitution ($(cat file)) strips off trailing newlines. I'd offer 2 suggestions: 1) use the bash builtin $(< file) instead of cat; 2) use a here string: sed '/[^[:blank:]]/,$!d' <<< "$(<file)" – glenn jackman Nov 15 '19 at 18:17
2

Using Raku (formerly known as Perl_6):

If the file is read into Raku with lines, the trim function can be used to clean up blank lines (i.e. whitespace) at the beginning and end of the file:

$ raku -e 'lines.join("\n").trim.put;' start_end.txt
lineX
line1

line2
line1

line2
line1

line2
line1

line2
~$

The input file is the same one used by @schrodigerscatcuriosity (two blank lines at the start of the file, two blank lines at the end of the file). And if you only need to clean up the beginning/end of the file(s), then trim-leading and trim-trailing are your friends.

Alternatively, below is a pretty straightforward translation of @Sundeep's Perl5 code, using a few Raku features:

raku -e 'S:g/ ^\n+ || \n+$ //.put given slurp;' start_end.txt

For the Perl5-to-Raku translation: the file is slurp-ed in and Raku's S/// non-destructive substitution operator is used to return the resultant string. Alternation is accomplished with Raku's || 'first-matching' alternation operator, since Raku's | alternation operator denotes Longest Token Matching (LTM, an improvement).

The Raku equivalent of Perl5's \K is <( , which can be used alone or paired with )>. These capture markers tell the regex engine to exclude from the match anything matched before <( or after )>. [Note, however, that the \K equivalent in Raku appears unnecessary for the problem at hand].

https://raku.org

jubilatious1
2

GNU sed has no limit on line length.

sed -z 's/^\n*\|\n*$//g' file

The -z flag makes sed use the NUL character instead of newline as the line delimiter. Since the file contains no NUL, the entire file is read as a single line.

However, portable scripts should not rely on this: some sed implementations are known to limit the pattern and hold spaces to no more than 4000 bytes. Also:

recursion is used to handle subpatterns and indefinite repetition. This means that the available stack space may limit the size of the buffer that can be processed by certain patterns.
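
One side effect worth knowing: \n*$ also consumes the newline that terminates the last non-empty line, so the output appears to end without a final newline. A possible variation, assuming GNU sed (the -z option and \n in the replacement are GNU features), strips the leading run and replaces the trailing run with a single newline:

```shell
# Requires GNU sed; -z is not available in POSIX sed.
# First s: drop newlines at the very start of the (single) record.
# Second s: collapse all newlines at the very end into exactly one.
printf '\n\nline1\n\nline2\n\n\n\n' | sed -z 's/^\n*//; s/\n*$/\n/'
```

Note that with this variation an input made up only of empty lines is reduced to a single newline rather than to nothing.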

nezabudka
1

A simple 2 pass approach just for completeness:

$ awk 'NR==FNR{if (NF) { if (!beg) beg=NR; end=NR } next} FNR>=beg && FNR<=end' file file
line1

line2

The above treats lines of only blank chars as empty. If instead you only want lines with no chars at all to be considered empty then just change NF to /./.
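
For readers new to the NR==FNR idiom, here is the same program spread over several lines with comments. The wrapper name trim_two_pass and the temporary file are made up for this sketch; the awk logic is unchanged.

```shell
# trim_two_pass is a hypothetical wrapper; it needs a real file, not a pipe,
# because awk reads the file twice.
trim_two_pass() {
  awk '
    NR == FNR {                     # true while reading the first copy
      if (NF) {                     # non-blank line
        if (!beg) beg = NR          # remember the first one
        end = NR                    # keep updating to the last one
      }
      next                          # print nothing during the first pass
    }
    FNR >= beg && FNR <= end        # second pass: print lines in [beg, end]
  ' "$1" "$1"
}

tmp=$(mktemp)
printf '\n\nline1\n\nline2\n\n\n\n' > "$tmp"
result=$(trim_two_pass "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```

If the file has no non-blank lines at all, beg and end stay unset and nothing is printed.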

Ed Morton
1

Expanding on @schrodigerscatcuriosity's command substitution trick:

cat <<< "$(tac <<< "$(tac file)")"

I guess there's still more room for shell magic.

markgraf
0

Ed and Ex are POSIX editors that can handle this task.

They are quite similar, and in the solutions presented here ed and ex are 100% interchangeable¹.

General solution

printf '%s\n' a '' . 0a '' . '?.?+1,$d' '1,/./-1d' w q | ex -s file

If the file is known to have empty lines at the beginning and end

printf '%s\n' '?.?+1,$d' '1,/./-1d' w q | ex -s file

If you actually meant blank² lines

printf '%s\n' a '' . 0a '' . '?[^[:blank:]]?+1,$d' '1,/[^[:blank:]]/-1d' w q | ex -s file

Explanation and breakdown

The manual is always the best explanation, but here is an overview:

Ed and Ex always start with the last line selected — so if we were to issue an unadorned d (delete command), it would delete the last line — and they can look for lines matching regular expressions.

Some commands take addresses ("line numbers"), e.g. 3,6d deletes from lines 3 to 6.

  • /regex/ looks ahead for the first line matching "regex".
  • ?regex? looks behind for the first line matching "regex".

Guess what? A regex can be an address too.

# Insert an empty line at the end
a

.
# Insert an empty line at the beginning
0a

.
# Delete from line L1 up to line L2, where
# L1 is the line below the last non-empty line: ?.?+1
# L2 is the last line: $
?.?+1,$d
# Delete from line L3 up to line L4, where
# L3 is the first line: 1
# L4 is the line above the first non-empty line: /./-1
1,/./-1d
# Write the changes to the file and quit
w
q

Why do we need to temporarily add two empty lines for the general solution? Because otherwise 1,/./-1d would always delete the first line and ?.?+1,$d the last, even if not empty.

1: But IIRC a clean Debian installation lacks Ed, so I'm going with Ex.
2: I.e., lines that are visually empty but may contain spaces and tabs.

Quasímodo
-1

command

sed -n '/[a-zA-Z]/,/[a-zA-Z]/p' file| awk 'OFS=":"{$2=$1;$1=NR;print }'

output

1 line1
2 
3 line2
  • 6
    There's no need to use sed here as awk could easily do the same test. On the other hand, the line numbers are only for illustration in the question, which means that you may not need awk at all. In any case, there's almost never a reason to use both sed and awk in the same pipeline. – Kusalananda Nov 17 '19 at 09:47