1

I have a bunch of text files, some of which contain lines that are empty, i.e. consist only of a newline, or possibly of spaces followed by a newline. I locate the files using a find command.

  • Example file
    #Title 1
    12345678 1234
    

    #Title 2
    12345678 1234
    12345678 1234

  • Expected output
    #Title 1
    12345678 1234
    #Title 2
    12345678 1234
    12345678 1234
    

I want to remove all such empty lines. I tried it with the following command on Debian Linux Stretch:

cat "/path/to/file" | sed '/^\s*$/d' | sponge "/path/to/file";

Some of the files had, for example, 4 or more trailing blank lines, and the above command removed all but one of them.

How could I remove this last trailing blank line? As mentioned: should there be any blank lines further up in the file then these should be removed also.

I am trying to get some consistency between the files as the files are stored in an array of sorts within a BASH variable. The files are then looped over and all the blank lines and trailing blank lines are removed, while some of the files already don't have blank lines or any trailing blank line.

AeroMaxx
  • 189
  • 1
    Please make sure you don't confuse a trailing newline (which is a good thing) with a trailing blank line. – Kamil Maciorowski Nov 07 '22 at 19:17
  • @KamilMaciorowski That depends; a trailing newline is not always a good thing. All the files are text files, and grep or any other command won't be used on them, unless that command removes blank lines or trailing blank lines, obviously. These files also won't be used on Linux beyond removing the blank lines and trailing blank lines. – AeroMaxx Nov 07 '22 at 20:45
  • I'm not clear on what a "trailing blank line" means. What's the difference between a "trailing blank line" and just a "blank line"? Maybe you could share an example input file, the desired output, and the difference between the desired output and the actual output using your command. By the way, most sed versions have a -i flag that allows changing the file in place, so you won't need pipes, just: sed -i '/^\s*$/d' /path/to/file. – aviro Nov 08 '22 at 07:24
  • @aviro see updated question – AeroMaxx Nov 08 '22 at 07:48
  • Your sed command should work. And it does work when I run it on my machine. What is the actual result of your sed command? From what I understand, you're saying that there's always one blank line left in the output, no matter how many blank lines there are in the file? For instance, if there's only one blank line, does it get removed? If there are 2? 3? 10? Always one blank line is not removed? On any file? – aviro Nov 08 '22 at 07:54
  • @aviro It depends on what sed they are using. Some implementations may delete lines that contain zero or more s characters. The \s pattern is definitely not standard. This would explain why empty lines are removed while lines with only spaces or tabs would still be kept. – Kusalananda Nov 08 '22 at 07:58
  • @Kusalananda but the OP says that it does remove all blank lines but one, so from what he says it does partially work. – aviro Nov 08 '22 at 08:00
  • 1
    @aviro Which is what I'm saying too. Lines that are removed contain only zero or more s characters. The lines that are not removed may contain spaces. – Kusalananda Nov 08 '22 at 08:07
  • @Kusalananda but if the line contains s character, it's not a blank line. A blank line won't include any character. – aviro Nov 08 '22 at 08:19
  • @aviro If their sed does not understand \s as a space character, it may understand it as an s. The pattern would then match lines consisting of zero or more s characters (which includes empty lines, but not lines containing spaces). – Kusalananda Nov 08 '22 at 08:24
  • Assuming by 'trailing newline' you're not referring to https://stackoverflow.com/q/729692/7270649 and/or the POSIX definition: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206 (similar question as posed by @KamilMaciorowski). – jubilatious1 Nov 17 '22 at 01:58

7 Answers

5

If I understand your question correctly, you want to remove (truly or visually) empty lines from a text file. This can be done easily using awk.

For a single file, you could call

awk 'NF' /path/to/file

This will only print lines that have at least one "non-blank" character. The idea behind this is that awk by default splits input lines into "fields" at "whitespace", i.e. contiguous runs of space and tab characters. If a line consists only of such characters, the number of fields, stored in the automatic variable NF, is zero. The (rather short) program above uses NF as its condition, so the current line is printed only when NF is non-zero. This effectively removes truly or "visually" empty lines.

Since awk by default will not perform in-place editing, you either have to redirect the output to a temporary file and then rename it, or use an implementation that understands the -i inplace extension (GNU awk 4.1.0 or later):

awk -i inplace 'NF' /path/to/file
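
For awk implementations without -i inplace, the temporary-file route could look roughly like the sketch below. The helper name strip_blank_lines is made up for illustration, and mktemp is assumed to be available (it is on Debian).

```shell
# Hypothetical helper: remove empty and whitespace-only lines from one file.
strip_blank_lines() {
    file=$1
    tmp=$(mktemp) || return 1
    # 'NF' is true only for lines with at least one non-whitespace field
    if awk 'NF' "$file" >"$tmp"; then
        # writing back with cat keeps the original file's inode and permissions
        cat "$tmp" >"$file"
    fi
    rm -f "$tmp"
}

# Demo on a throwaway file with blank lines in the middle and at the end
demo=$(mktemp)
printf '#Title 1\n12345678 1234\n   \n\n#Title 2\n\n\n' >"$demo"
strip_blank_lines "$demo"
cat "$demo"
```

After the call, the demo file should hold only the three non-blank lines, each terminated by a single newline.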
AdminBee
  • 22,803
3

Here's another portable approach, which is to include only lines that contain something other than whitespace:

grep '[^[:space:]]' file

You can use the same approach with other commands too:

sed -n '/[^[:space:]]/p' file

Writing to the same file as the source is a fairly standard procedure. Some commands use -i (or an equivalent) to indicate in place editing, but in practice they actually write to a temporary file and then overwrite the original file with the temporary:

some_command file >file.tmp && mv -f file.tmp file
rm -f file.tmp

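That works well provided that file does not have hard links from other places, because mv replaces the file with a new one and the other links keep pointing at the old content. To cater for such situations you need a double copy, which writes the result back into the existing file: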

some_command file >file.tmp && cat file.tmp >file
rm -f file.tmp
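
A small sketch of the hard-link case, using a scratch directory and made-up file names (file, link) purely for illustration:

```shell
dir=$(mktemp -d)
printf 'a\n\nb\n' >"$dir/file"
ln "$dir/file" "$dir/link"    # second name for the same inode

# Filter into a temporary file, then copy the result back over the original
grep '[^[:space:]]' "$dir/file" >"$dir/file.tmp" && cat "$dir/file.tmp" >"$dir/file"
rm -f "$dir/file.tmp"

cat "$dir/link"               # the hard link shows the cleaned content too
```

One caveat with grep in this pattern: it exits with status 1 when no line matches, so a file made up entirely of blank lines would be left untouched by the && chain.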
Chris Davies
  • 116,213
  • 16
  • 160
  • 287
2

I can unfortunately only reproduce your issue on macOS, where sed understands \s as s. The pattern ^\s*$ would therefore match any line consisting of zero or more s characters. This includes empty lines, but not lines that contain only space-like characters.


A portable way of removing lines that are empty or that only contain spaces or tabs is

grep -v -x '[[:blank:]]*' file

This uses grep to extract only lines that do not match [[:blank:]]*. The [[:blank:]]* pattern matches zero or more spaces or tab characters. If you want to match a larger set of space-like characters (including things like carriage returns and vertical tabs), use [[:space:]]* instead. The -x option to grep forces the pattern to match complete lines (as if you had anchored the expression with both ^ and $).
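
To see the difference between the two character classes, here is a rough sketch on a scratch file (the variable names are only for illustration). The third line holds a lone carriage return, as a DOS-style empty line would:

```shell
sample=$(mktemp)
# line 2: a space and a tab, line 3: a lone carriage return, line 5: empty
printf 'foo\n \t\n\r\nbar\n\n' >"$sample"

# Count the lines each pattern lets through
blank_kept=$(grep -v -x '[[:blank:]]*' "$sample" | wc -l)
space_kept=$(grep -v -x '[[:space:]]*' "$sample" | wc -l)

echo "kept with [[:blank:]]: $blank_kept"
echo "kept with [[:space:]]: $space_kept"
rm -f "$sample"
```

The carriage-return line survives the [[:blank:]] pattern but is removed by the [[:space:]] one, so the second count should be one lower than the first.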

Kusalananda
  • 333,661
  • This works great, thank you ever so much for this! – AeroMaxx Nov 08 '22 at 08:31
  • I am using sed on Debian Linux Stretch. – AeroMaxx Nov 09 '22 at 06:36
  • @AeroMaxx Debian 9 ("Stretch") was discontinued in 2020. You should upgrade. I don't have access to releases that old to test with. (EDIT: Yes I have, now, but I can't actually reproduce your issue.) – Kusalananda Nov 09 '22 at 06:48
  • Yeah I realise that, I just haven't done so in the fear of losing some work, or something that works now and then doesn't after upgrade. The machine is on a local network so not accessible from the internet. But back to the issue, can you try with a file with windows line endings, does that reproduce the issue? – AeroMaxx Nov 09 '22 at 12:42
  • @AeroMaxx Yes, I also tested with a DOS text file, but that does not trigger the behaviour that you see. – Kusalananda Nov 09 '22 at 12:45
  • So when you open the result file after the command in, say, Notepad++, you don't see 5 lines with printable characters and a 6th line with nothing on it, just based on the line numbers shown in Notepad++? – AeroMaxx Nov 09 '22 at 12:54
1

You can just replace the \s with [[:space:]]. In addition, most sed versions have a -i flag which means edit the file in place. So this command should work:

sed -i '/^[[:space:]]*$/d' /path/to/file
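
Since the question locates the files with find, the two can be combined. This is only a sketch: it assumes GNU sed (whose -i takes no mandatory argument, unlike BSD/macOS sed), and the scratch directory and the '*.txt' name pattern are placeholders for the real search.

```shell
dir=$(mktemp -d)
printf 'one\n\n  \ntwo\n\n\n' >"$dir/a.txt"
printf 'three\nfour\n' >"$dir/b.txt"

# Edit every matching file in place; GNU sed -i treats each file separately
find "$dir" -type f -name '*.txt' -exec sed -i '/^[[:space:]]*$/d' {} +

cat "$dir/a.txt" "$dir/b.txt"
```

Files that have no blank lines to begin with, such as b.txt here, pass through with their content unchanged.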
aviro
  • 5,532
1

You could use:

grep '[[:graph:]]'

That reports the lines containing at least one graphical character, and so excludes lines that are empty or consist only of whitespace characters, control characters, or unknown/undefined/invalid characters.
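
As a quick illustration of how this differs from matching "anything that is not whitespace", the sketch below uses a scratch file whose second line contains only the control character SOH (octal 001). The variable names are just for the demo.

```shell
sample=$(mktemp)
# line 2 contains only the control character \001
printf 'foo\n\001\nbar\n' >"$sample"

# grep -c prints the number of matching lines
graph_kept=$(grep -c '[[:graph:]]' "$sample")
nonspace_kept=$(grep -c '[^[:space:]]' "$sample")

echo "kept with [[:graph:]]:   $graph_kept"
echo "kept with [^[:space:]]:  $nonspace_kept"
rm -f "$sample"
```

The control-character line counts as non-whitespace, so only the [[:graph:]] pattern drops it.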

0

Testing that on a system with GNU tools; here, the % is the shell prompt, and the $ the end-of-line marker added by cat -A:

% printf 'foo\n   \nbar\n   \n    \n' > file.txt
% cat -A file.txt
foo$
   $
bar$
   $
    $
% cat file.txt | sed '/^\s*$/d' | sponge file.txt
% cat file.txt  
foo
bar
%

There are no empty lines at the end of the resulting file.

The fact that some editors allow moving the cursor under the last line to make it easier to add a new line at the end is unrelated to how the apparently-empty lines get removed by that sed command.

(Instead of the pipe with sponge, you could just use sed -i '/^\s*$/d' file.txt, but you should probably use [[:space:]]* rather than \s*, as that one is more widely supported.)
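
To check from the shell, rather than from an editor, whether a file really ends in a blank line (as opposed to just ending with the usual final newline), something like the following sketch can help. The function name last_line_is_blank is hypothetical.

```shell
# Hypothetical helper: succeeds if the last line is empty or whitespace-only.
last_line_is_blank() {
    tail -n 1 "$1" | grep -q -x '[[:space:]]*'
}

f=$(mktemp)

# Ends with a normal final newline, no blank line
printf 'foo\nbar\n' >"$f"
if last_line_is_blank "$f"; then r1=blank; else r1=text; fi

# Ends with a whitespace-only line
printf 'foo\nbar\n   \n' >"$f"
if last_line_is_blank "$f"; then r2=blank; else r2=text; fi

echo "$r1 $r2"
rm -f "$f"
```

If this reports "text" for a cleaned file while an editor still shows an extra numbered line at the bottom, that extra line is most likely the editor's way of displaying the final newline.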

ilkkachu
  • 138,973
0

Using Raku (formerly known as Perl_6)

To remove blank (i.e. non-character containing) lines:

raku -ne '.put if .chars;' 

To remove blank (i.e. non-character containing) lines, as well as lines that consist solely of one-or-more \h horizontal-whitespace characters:

raku -ne '.put if .subst(/^ \h+ $/).chars;'  

Sample Input:

#[blank]
#[blank]
Title 1
12345678 1234
#[whitespace] 
Title 2
12345678 1234
12345678 1234
#[blank]
Title 3
12345678 1234
12345678 1234
#[blank]
#[blank]
#[blank]

Sample Output (second code example above):

Title 1
12345678 1234
Title 2
12345678 1234
12345678 1234
Title 3
12345678 1234
12345678 1234

https://raku.org
https://rakudo.org

jubilatious1
  • 3,195
  • 8
  • 17