2

I am trying to get rid of all the empty lines from a file, but I do want to preserve the "\n" after every non-empty line.

Problem: the command works correctly when used on the command line, but as soon as I use it in a bash script it removes all "\n", so I get all my results on one line instead of on separate lines.

Here is my code:

#printing second and third word from every line and remove lines that do not contain any digits
    result=$(cat "$output_file" | awk '{print $2" "$3}' | sed 's/[^0-9]*/\\n/')
    echo -e ""$result"" > "$output_file"

#getting rid of all empty lines but what happens is that the whole file becomes one line
    no_empty_lines=$(cat "$output_file" | awk NF)
    echo -e ""$no_empty_lines"" > "$output_file"

file to edit:

> 135.121.62.246 7.4 
> 135.121.160.65 7.8 
> 135.121.106.56 7.5 
>  
>  
> 135.121.106.96 6.2 
>  
>  
> 135.121.160.106 10 
>   
> 135.121.90.46 commandFailed

desired result:

> 135.121.46.246 7.4 
> 135.121.106.46 7.8 
> 135.121.106.56 7.5 
> 135.121.106.96 6.2  
> 135.121.160.16 10 
> 135.121.90.46 commandFailed
dwt.bar
  • are those > signs part of the file data or just part of the (broken) representation here? Because you're referring to fields $2 and $3 in the awk code? – ilkkachu Dec 29 '20 at 15:20
  • @ilkkachu no, they are not part of the file; it's a quotation I wrapped the data in, which is why the ">" signs are showing. – dwt.bar Dec 29 '20 at 15:34

4 Answers

8

You can match a line that contains at least one character:

grep . {file}

Here it is, put into some code that replaces the file in question: we create a temporary file, and if the creation was successful then we replace the original with the temporary. Finally, we delete the temporary file just in case it didn't successfully replace the original.

file=some_file.txt
grep . "$file" >"$file.tmp.$$" && mv -f "$file.tmp.$$" "$file"
rm -f "$file.tmp.$$"

As an aside, here's why you lose the linebreaks in your original code:

result=$(cat "$output_file" | awk '{print $2" "$3}' | sed 's/[^0-9]*/\\n/')
echo -e ""$result"" > "$output_file"

The $result variable correctly contains the text, including its linebreaks. (It's an inefficient line, but let's ignore that issue as it works.)

However, the echo line is really strange. I don't understand why you have "" there - it represents a zero-length quoted string and could just as usefully be removed, leaving this:

echo -e $result > "$output_file"

The shell then word-splits the unquoted content of $result on whitespace, and echo joins the resulting words with a single space between them. In this context, tabs and newlines count as whitespace. (hello whole\nworld gets read as hello whole world.)

If you double-quoted your variable when you used it, this issue wouldn't occur:

echo -e "$result" > "$output_file"
Chris Davies
4

Your code, improved:

awk -i inplace '$2 ~ /[0-9]/ || $3 ~ /[0-9]/ { print $2, $3 }' "$output_file"

This assumes that you are using GNU awk 4.1.0 or later (for the -i inplace option). The code extracts the 2nd and 3rd field from any line where at least one of these fields contains a digit.

Without GNU awk:

tmpfile=$(mktemp)
cp "$output_file" "$tmpfile"
awk '$2 ~ /[0-9]/ || $3 ~ /[0-9]/ { print $2, $3 }' "$tmpfile" >"$output_file"
rm -f "$tmpfile"

Another formulation of the awk program would be to reset $0 to the 2nd and 3rd fields, and then do the test for digits:

awk -i inplace '{ $0 = $2 " " $3 }; /[0-9]/' "$output_file"

There are a number of issues in your code. The issue you mention yourself, ending up with all the lines on one single line, is due to using the value of $result unquoted with echo. The $result expansion is unquoted because you, for whatever reason, use two sets of double quotes (two empty strings) on either side of the expansion, ""$result"".

When you use a variable expansion unquoted, the shell will take the value of the variable and split it on any space, tab or newline character to create a number of words. Each word will then undergo filename globbing. The resulting words are then used with echo -e in your code, which outputs each argument with spaces in-between them, and a newline at the very end.
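The globbing step in particular can be surprising; a tiny illustration (hypothetical values, best run in a scratch directory):

v='10 *'
echo $v      # unquoted: the * is expanded to the names of the files in the current directory
echo "$v"    # quoted: prints "10 *" literally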

Furthermore, you don't need to put the output of commands into variables. In this case, simply redirecting to files will be fine.

Your sed command inserts the string \n at the start of each line, replacing any run of non-digits that happens to be first on the line. It does not remove lines that do not contain digits. For that, use the sed expression /[0-9]/!d. But that is not needed as long as your awk script only outputs lines that contain digits (which my code above does).
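For reference, removing the digit-free lines with sed would look like this (a minimal sketch, not code from the question):

sed '/[0-9]/!d' "$output_file"    # delete every line that contains no digit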

It is surprisingly uncommon to pipe awk into sed or the other way around. awk is more than capable of doing whatever sed is able to do.

Kusalananda
  • As an alternative to manually creating temporary files, sponge from moreutils is a nice tool that takes care of this, e.g. awk 'commands' file | sponge file. – FelixJN Dec 28 '20 at 23:39
  • "The echo -e does not convert your \n into newlines because when unquoted, \n is just a literal n" -- if the backslashes come from a variable expansion, it doesn't matter if it's quoted or not. res='foo\nbar'; echo -e $res prints foo and bar, on two separate lines. – ilkkachu Dec 29 '20 at 15:23
  • @ilkkachu So it is. You are correct. I will delete that passage. It would be nice to see the original data and not just the pre-processed stuff. – Kusalananda Dec 29 '20 at 15:56
1

The problem with your code is that you save the results in a bash variable:

 no_empty_lines=$(cat "$output_file" | awk NF)

Which (skipping the redundant cat) can be seen as:

 result=$(command that returns multi-line data)

However bash turns multi-line strings into a single line with spaces.

Possible ways are covered here - which I assume is what you need - however, with bash, your result could be an array:

 no_empty_lines=( $(awk 'NF' "$output_file") )

Entries now are ${no_empty_lines[0]}, ${no_empty_lines[1]}, ...

Call them with a loop:

 for ((i=0;i<=${#no_empty_lines[@]}-1;i++)) ; do echo ${no_empty_lines[i]} ; done

Again - this is just to show you where your code failed due to bash, and I'd suggest using one of the options from the thread linked above. ALSO: this array puts every word into a separate element of the array, thus completely losing the newline structure of the input.
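To see that word-level splitting in action (an illustrative snippet; data.txt stands in for the question's file):

no_empty_lines=( $(awk 'NF' data.txt) )
echo "${#no_empty_lines[@]}"    # counts words (IP addresses and values separately), not lines
echo "${no_empty_lines[0]}"     # first IP address
echo "${no_empty_lines[1]}"     # its value, already a separate element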

FelixJN
  • Why would you put the output of the commands into variables at all? – Kusalananda Dec 28 '20 at 22:57
  • @Kusalananda I would not - it does not make sense for the OP's aim. I just wanted to make the OP aware of what happens in his code and where bash behaves differently from his expectations. – FelixJN Dec 28 '20 at 22:59
  • to append more strings to it and then overwrite the output file without empty lines. – dwt.bar Dec 28 '20 at 22:59
  • var=$(...) in bash doesn't turn multi-line strings into a single line with spaces – rowboat Dec 29 '20 at 01:44
  • In your answer, result correctly holds multiline data. The problem comes when it's printed out: the variable isn't quoted (echo ""$result"" is the same as echo $result) so the shell parses the result into multiple words, with newlines having been treated just like any other whitespace – Chris Davies Dec 29 '20 at 08:06
  • @roaima that was it! thanks! – dwt.bar Dec 29 '20 at 14:27
  • @dwt.bar I have added a proper explanation into my answer. Please remember to accept the answer that helped you best, by using the green tickmark alongside the voting score and buttons. – Chris Davies Dec 29 '20 at 14:52
  • "However bash turns multi-line strings into a single line with spaces." -- no, it doesn't. – ilkkachu Dec 29 '20 at 15:24
  • The assignment no_empty_lines=( $(awk 'NF' "$output_file") ) will split the output of awk on any sequence of whitespace equally, so it will lose the positions of the line breaks. Furthermore, it will use the resulting words as filename globs, so a field like * would expand to a list of all files in the directory. The latter issue is there with your later echo ${no_empty_lines[i]} also. – ilkkachu Dec 29 '20 at 15:29
-1

With @roaima's help I was able to narrow down the issue:

In your answer, result correctly holds multiline data. The problem comes when it's printed out: the variable isn't quoted (echo ""$result"" is the same as echo $result) so the shell parses the result into multiple words, with newlines having been treated just like any other whitespace – roaima 6 hours ago

so here is a working solution:

result=$(cat "$output_file"| awk '{print $2" "$3}' | sed 's/[^0-9]*//')
echo -e "$result" | awk NF > "$output_file"

Assuming that the variable was stored correctly, I removed the extra quotes while echoing "$result", then piped it to "awk NF", which removes the empty lines, and redirected the output to the file.

Now the result looks like this:

> 135.121.9.256 6.2
> 135.121.160.50 7.5
> 135.121.106.10 10
> 135.121.9.66 commandFailed 
> 135.121.100.156 commandFailed
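For what it's worth, the same result can also be produced without the intermediate variable by writing to a temporary file first, so the input isn't truncated while it is still being read (a sketch reusing the commands above):

awk '{print $2" "$3}' "$output_file" | sed 's/[^0-9]*//' | awk NF > "$output_file.tmp" &&
    mv "$output_file.tmp" "$output_file"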
dwt.bar