39

I want to remove all empty lines from a file. Even if the line contains spaces or tabs it should also be removed.

kenorb
  • 20,988
  • Is it enough to handle the well-known 6 ASCII whitespace characters, or must it handle all 25 Unicode whitespace characters? Also assume you don't care about lines with only non-printable/control characters, either way. Should we assume the input is ASCII, UTF-8, Shift_JIS or could be anything? – smci Feb 20 '24 at 04:37

9 Answers

35

Here is an awk solution:

$ awk NF file

In awk, NF (the number of fields) is non-zero only on non-blank lines. When this condition is true, awk performs its default action, print, which outputs the whole line.
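As a quick check of the behaviour described above, here is a minimal sketch. The sample content and the variable names (`sample`, `result`) are made up for illustration.

```shell
# Build a small sample: one empty line and one line holding only a space and a tab.
sample=$(mktemp)
printf 'line1\n\nline2\n \t\nline3\n' > "$sample"

# NF is 0 on blank lines (a false condition), so only lines with
# at least one field are printed.
result=$(awk NF "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Both the empty line and the whitespace-only line are dropped, since neither contains a field.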

poige
  • 6,231
cuonglm
  • 153,898
32

Just grep for non-blanks:

grep '[^[:blank:]]' < file.in > file.out

[:blank:], inside a bracket expression ([...]), is called a POSIX character class. There are several others, such as [:alpha:] and [:digit:]. [:blank:] matches horizontal white space (in the POSIX locale, that's space and tab, but in other locales there could be more, such as all the Unicode horizontal spacing characters in UTF-8 locales), while [:space:] matches horizontal and vertical white space characters (the same as [:blank:] plus characters such as vertical tab and form feed).

grep '[:blank:]'

Would return the lines that contain any of the characters, :, b, l, a, n or k. Character classes are only recognised within [...], and ^ within [...] negates the set. So [^[:blank:]] means any character but the blank ones.
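To make the [:blank:] versus [:space:] distinction concrete, here is a small sketch that counts the lines each pattern keeps. It assumes the POSIX (C) locale; the sample data and variable names are illustrative only.

```shell
# Use the POSIX locale so [:blank:] means exactly space and tab.
LC_ALL=C
export LC_ALL

sample=$(mktemp)
# Four lines: text, a lone form feed, a space plus a tab, text.
printf 'a\n\f\n \t\nb\n' > "$sample"

# A form feed is not in [:blank:], so 3 lines are kept (a, form feed, b).
blank_kept=$(grep -c '[^[:blank:]]' "$sample")

# A form feed is in [:space:], so only 2 lines are kept (a, b).
space_kept=$(grep -c '[^[:space:]]' "$sample")

echo "kept with [^[:blank:]]: $blank_kept"
echo "kept with [^[:space:]]: $space_kept"

rm -f "$sample"
```

If lines holding only form feeds or vertical tabs should also count as empty, [^[:space:]] is the pattern to use.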

  • 1
    should there also be a $ for end of line? – Michael Durrant Nov 16 '13 at 21:10
  • @MichaelDurrant It's not anchored on either side – jordanm Nov 16 '13 at 21:11
  • 1
    @MichaelDurrant. [^[:blank:]]$ would only match lines that end in a non-blank. We want lines that contain a non-blank anywhere – Stéphane Chazelas Nov 16 '13 at 21:11
  • @StephaneChazelas I tried grep [:blank:] SOURCEFILE even this command is working. I understand [] is for character class can you please give me some idea on how it works ? the snippet :blank: is new to me. – Jamshed Ansari user3000272 Nov 16 '13 at 21:52
  • Are there any cases where grep -E '\S' wouldn't work? – terdon Apr 06 '16 at 14:33
  • 1
    @terdon \S is a perl regex operator, not a standard ERE one. I believe it was also recently added to GNU regexp, but I don't expect many other implementations would support it. In any case, it is for space, not blank. – Stéphane Chazelas Apr 06 '16 at 14:41
  • @StéphaneChazelas \S is for non-whitespace, not space. It matches any non-whitespace character. I thought it was PCRE only and just found out it works on normal GNU grep and sed on my Arch which is why I asked. – terdon Apr 06 '16 at 14:45
  • @terdon I meant it's '[^[:space:]]' not [^[:blank:]]. posting on a smart phone where those characters are a pain to type. Check the opengroup site for the ERE syntax. The GNU implementation is generally not the one you want to try to test for portability as it's generally the one with the most extensions – Stéphane Chazelas Apr 06 '16 at 14:57
9

How about:

sed -e 's/^[[:blank:]]*$//' source_file > newfile

or

sed -e '/^[[:blank:]]*$/d' source_file > newfile

i.e.

For each line, substitute:

  • if it starts ("^")
  • with spaces or tabs ("[[:blank:]]") zero or more times ("*")
  • and then is the end of the line ("$")

More info on [[:blank:]] and other special characters at http://www.zytrax.com/tech/web/regex.htm#special
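The two sed commands above behave differently, which a short sketch can show by counting output lines. The sample data and variable names are hypothetical.

```shell
sample=$(mktemp)
# Four lines: text, a space plus a tab, an empty line, text.
printf 'a\n \t\n\nb\n' > "$sample"

# s///: whitespace-only lines are emptied but still printed,
# so all 4 lines remain in the output.
after_subst=$(sed -e 's/^[[:blank:]]*$//' "$sample" | awk 'END { print NR }')

# d: matching lines are deleted, leaving only the 2 non-blank lines.
after_delete=$(sed -e '/^[[:blank:]]*$/d' "$sample" | awk 'END { print NR }')

echo "lines after substitution: $after_subst"
echo "lines after deletion:     $after_delete"

rm -f "$sample"
```

Only the d form actually removes the lines. The substitution form just strips the whitespace from them.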

  • 5
    [[:space:]] includes tabs. If it didn't your regex would fail if a space followed a tab. – jordanm Nov 16 '13 at 21:06
  • The wctype(3) and isalpha(3) manpages describe what the character classes will match. – jordanm Nov 16 '13 at 21:10
  • 1
    You may want to remove the first one which doesn't answer the question. – Stéphane Chazelas Nov 16 '13 at 21:33
  • @MichaelDurrant can you please write some thing about [[:blank:]] ? – Jamshed Ansari user3000272 Nov 16 '13 at 22:00
  • Added info for [[:blank:]]. Stephane, why doesn't the first work? I thought // at the end would replace the line with nothing. – Michael Durrant Nov 16 '13 at 22:56
  • Generally if I want to strip out blank lines as per OP, I'll use sed, but with the -i option to change "in place" (or "inline"). Worth reading the man page for the full caveat on using -i, but it saves on using a temporary output file if you have a string of inline text replacements that you need to make to a file. – Mark Glossop Nov 17 '13 at 06:34
4

Looks like I've found one not that fast, but funny at last:

| xargs -L1

poige
  • 6,231
2

You can use sed command for removing blank lines:

sed '/^$/d' in > out

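This command deletes all completely empty lines from the file "in" and writes the result to "out". Lines that contain only spaces or tabs are not matched by /^$/ and are kept.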

TPS
  • 2,481
1

Adding to @cuonglm's answer: some lines could contain non-printable color (ANSI escape) sequences, so stripping color before running ...| awk NF could be a good idea. In the command below, sed removes the color sequences from the text, and awk then removes the lines that contain only white space.

...| sed $'s/\e\\[[0-9;:]*[a-zA-Z]//g' | awk NF
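A sketch of the same idea for shells without $'...' quoting: it builds the ESC character with printf instead. The sample text and the variable names (`esc`, `colored`, `result`) are assumptions for illustration.

```shell
# Literal ESC character, built with printf for portability.
esc=$(printf '\033')

colored=$(mktemp)
# Line 1: red text. Line 2: only a color reset sequence.
# Line 3: a single space. Line 4: plain text.
printf '\033[31mred\033[0m\n\033[0m\n \nplain\n' > "$colored"

# Without stripping, the escape-only line counts as non-blank: 3 lines survive.
unstripped=$(awk NF "$colored" | awk 'END { print NR }')

# Strip the color sequences first, then drop blank lines.
result=$(sed "s/${esc}\\[[0-9;:]*[a-zA-Z]//g" "$colored" | awk NF)

echo "lines kept without stripping: $unstripped"
printf '%s\n' "$result"

rm -f "$colored"
```

The line holding only a reset sequence looks empty in a terminal but survives awk NF unless the sequence is removed first.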

0

Try ex-way:

ex -s +'v/\S/d' -cwq test.txt

For multiple files (edit in-place):

ex -s +'bufdo!v/\S/d' -cxa *.txt

Note: The :bufdo command is not POSIX.

Without modifying the file (just print on the standard output):

cat test.txt | ex -s +'v/\S/d' +%p +q! /dev/stdin
kenorb
  • 20,988
  • note bufdo is not POSIX http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ex.html – Zombo Apr 17 '16 at 00:44
0

Use the following command:

grep '\S' FILE

which removes all blank lines, including those that contain only spaces or tabs.

To remove only completely empty lines, keeping lines that contain spaces or tabs, use:

grep . FILE

For example:

$  printf "line1\n\nline2\n \nline3\n" > FILE
$  cat -v FILE
line1

line2

line3
$  grep '\S' FILE
line1
line2
line3
$  grep . FILE
line1
line2

line3

kenorb
  • 20,988
-1

strings looks at line-endings as something not-a-string.

cat file.txt | strings

...and all empty lines are gone.

Kusalananda
  • 333,661
  • 4
    Welcome to the site, and thank you for your contribution. Unfortunately, strings only looks at "printable characters followed by a non-printable character". Space is "printable", so it will print lines that only contain spaces. The only reason it may appear to work is that it has a minimum length for what it considers a string (the default is 4 characters), so any line that contains fewer than 4 spaces and nothing else will be discarded, but so will any line that contains fewer than 4 alphabetic characters. Lines with 4 or more spaces will be printed. – AdminBee Mar 09 '21 at 14:24