
I have a variable that goes more or less like this:

$ echo "$LIST"
file1: ok
file2: ok
file3:
file4:
file5: ok

Then I need to get the list of files that are not ok:

$ sed '/:\s.\+$/d' <<< "$LIST"
file3:
file4:

That works fine, but when every file in the list is ok, this happens:

$ echo "$LIST"
file1: ok
file2: ok
file3: ok
file4: ok
file5: ok
$ sed '/:\s.\+$/d' <<< "$LIST"
$ NEWLIST=$(sed '/:\s.\+$/d' <<< "$LIST")
$ echo "$NEWLIST"

$ cat -A <<< "$NEWLIST"
$
$ wc -l <<< "$NEWLIST"
1
$ wc -c <<< "$NEWLIST"
1

This newline (I think it is a \n) added to the variable is breaking my program: because I use wc -l to count the files, an empty list is reported as containing one file. I'm not entirely sure whether the \n is added by bash or by sed. Does anyone know a workaround?

3 Answers

4

If you just want to exclude ok items, you could do this:

grep -cv ': ok$' <<< "$LIST"

Or similarly, but with the opposite logic (count the lines that end right after the colon):

grep -c ':$' <<< "$LIST"

EDIT: based on comment from @ilkkachu

If the list is entirely empty, the -vc variant will erroneously report a count of 1, because <<< appends a newline and grep then sees one empty line that does not match.

You could either add a guard to detect an empty LIST or simply use a pipe instead:

printf "%s" "$LIST" | grep -vc ": ok$"

If the list could contain blank lines, they would also cause a miscount when using -vc.

In either case, excluding empty lines as well prevents the miscount:

grep -vcE "^$|: ok$"

But now we are starting to jump through hoops and thus making the code harder to understand.
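One way to keep that complexity in one place is a small helper function. This is only a minimal sketch: the name count_not_ok and the sample data are hypothetical and not part of the original answer. It combines the pipe variant with the blank-line exclusion shown above.

```shell
# count_not_ok is a hypothetical helper name, used here for illustration only.
count_not_ok() {
    # printf '%s' passes the value through without appending a newline,
    # so an empty list gives grep no input lines at all.
    # grep exits non-zero when the count is 0; "|| true" hides that status
    # so the helper does not trip "set -e" in a calling script.
    printf '%s' "$1" | grep -vcE '^$|: ok$' || true
}

# Hypothetical sample data: one ok file and two that are not ok.
sample='file1: ok
file2:
file3:'

count_not_ok "$sample"   # two lines do not end in ": ok"
count_not_ok ''          # empty list: grep sees no lines
```

The count is printed on standard output, so it can be captured with a command substitution such as n=$(count_not_ok "$LIST").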

bxm
  • 4,855
  • @ilkkachu, I don't see grep -c counting the here-string extra empty line. This worked like a charm. In your example it counts as 3 because you did add an extra line with your second \n – Adriano_epifas Jul 26 '23 at 22:31
  • You raise some valid edge cases @ilkkachu, answer updated. – bxm Jul 27 '23 at 07:44
  • what am I missing here. Your original response was correct. It was not going to report with if the list is empty. Now after you edit it, your answer is wrong. The grep -c was just perfect in my case. – Adriano_epifas Jul 27 '23 at 15:51
  • My edit was just to address a technically correct issue highlighted with the -vc variant (though perhaps not relevant with your data). I'd agree that the -c variant would be preferable, and this is still included in my answer. – bxm Jul 27 '23 at 22:43
4

A here-string <<< adds a newline at the end of the string before passing it to the command, and a command substitution removes any trailing newlines after reading the output of the command inside.

Also, echo adds a trailing newline to what it prints, but that's not a major issue here.

So, let's say you have one complete line in the variable, with the newline at the end, so the string is file1: ok<nl>

Now, you run sed '/ok/d' <<< "$LIST", and here, sed gets the input file1: ok<nl><nl>. It removes any lines that contain ok, and outputs the empty line <nl>.

You capture that with a command substitution, which removes the trailing newlines, giving the empty string. That's then assigned to NEWLIST.

Then, echo "$NEWLIST" prints a single newline (because echo adds one), and wc -l <<< "$NEWLIST" gives a single newline as input to wc (because the here-string adds one).

You'd get the same end result if the variable was initially just file1: ok, without the trailing newline; the only difference is that the command substitution would have no newlines to remove from the end of the output of sed.
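The walk-through above can be reproduced step by step. The following is a sketch using the one-line sample value from the explanation; the variable names in_bytes and out_lines are made up for the illustration.

```shell
LIST=$'file1: ok\n'                  # one complete line, newline included (10 bytes)

in_bytes=$(wc -c <<< "$LIST")        # the here-string appends a second newline
NEWLIST=$(sed '/ok/d' <<< "$LIST")   # sed prints one empty line; $(...) strips it
out_lines=$(wc -l <<< "$NEWLIST")    # the here-string adds a newline to the empty value

# $((...)) normalises the padded numbers that some wc implementations print.
echo "bytes seen by wc:  $((in_bytes))"
echo "length of NEWLIST: ${#NEWLIST}"
echo "lines seen by wc:  $((out_lines))"
```

wc sees 11 bytes on its input (the 10 bytes in LIST plus the added newline), NEWLIST ends up with length 0, and the final wc -l still counts 1 line.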

All of that means both command substitutions and here-strings mostly work well for single-line values, where you might not want the newline to appear in a variable but often do want to provide any commands with a complete line as input. As you saw, they do work for multiline strings too (if the final newline is again missing from the variable), but the odd dropping and adding of the newline still happens. It breaks down in the case where there is no newline to remove.

To see why that juggling is done, note that if command substitution didn't remove the newline from the output of date here, this would print a newline just before the period at the end, breaking the line in the middle.

$ weekday=$(date +%A)
$ echo "today is a $weekday."
today is a Thursday.

The simplest workaround is likely to just store multiline data like that in files instead. With inputfile containing the five lines (with the appropriate newlines):

file1: ok
file2: ok
file3: ok
file4: ok
file5: ok

then:

tmpfile=$(mktemp)
sed -e '/ok/d' < inputfile > "$tmpfile"
wc -l < "$tmpfile"
rm -f "$tmpfile"

outputs 0.

(Note that neither \s nor \+ is standard POSIX basic or extended regex syntax, so they don't work on all systems, e.g. with the sed on macOS. That's why I used just /ok/ above.)
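If the stricter pattern from the question is wanted rather than plain /ok/, it can be spelled with POSIX constructs only. This is a sketch, not something from the original answer: [[:space:]] stands in for \s, and ..* (one character, then zero or more) stands in for .\+. The variable name not_ok is hypothetical.

```shell
LIST=$'file1: ok\nfile2: ok\nfile3:\nfile4:\nfile5: ok'

# Delete every line that has a colon, a whitespace character, and at least
# one more character before the end of the line. Only POSIX BRE syntax is used.
not_ok=$(sed '/:[[:space:]]..*$/d' <<< "$LIST")

printf '%s\n' "$not_ok"
```

With the sample list above this prints file3: and file4:, the same lines as the original command in the question.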

ilkkachu
  • 138,973
  • I see your point. Thanks for the answer. But I would rather avoid working with files at this point, bxm above gave an answer that is quite interesting, using grep -c – Adriano_epifas Jul 26 '23 at 22:28
1

I'm not entirely sure if the \n is assigned by bash or by sed.

It is neither at this point. You can check it with for example

$ echo -n "$NEWLIST"  # echo without -n adds a newline
$ wc -l <<< ""
1
$ wc -c <<< ""
1

echo prints your variable (which is empty) followed by a newline, as it is meant to. For the here-string, Bash makes sure the input ends with a newline, because tools tend to behave surprisingly if the last line does not end with a \n.

The simplest way to fix it is probably to just check if NEWLIST is empty. Alternatively, work with files directly. For example:

list_file="$(mktemp)"
new_list_file="$(mktemp)"

# cleanup
trap 'rm "$list_file" "$new_list_file"' EXIT

echo "$LIST" > "$list_file"
sed '/:\s.\+$/d' "$list_file" > "$new_list_file"
wc -l "$new_list_file"

Or use wc -l < "$new_list_file" if you want to prevent it from printing the filename.
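The first suggestion, checking whether NEWLIST is empty before counting, could look roughly like this. It is a sketch under two assumptions: the function name count_lines is hypothetical, and the value comes from a command substitution, so it has no trailing newline of its own.

```shell
# count_lines is a hypothetical helper name.
count_lines() {
    if [[ -n "$1" ]]; then
        # Non-empty: the newline added by the here-string completes the last line.
        wc -l <<< "$1"
    else
        # Empty: skip wc, because the here-string would feed it one empty line.
        echo 0
    fi
}

LIST=$'file1: ok\nfile2: ok'            # sample data where every file is ok
NEWLIST=$(sed '/ok/d' <<< "$LIST")      # /ok/ is used for portability
count=$(count_lines "$NEWLIST")
echo "not ok: $((count))"
```

With a list such as $'file1: ok\nfile2:' the same code reports 1, so the single-item case that echo -n got wrong is handled as well.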

For an example of the unexpected result you would get if the here-string did not add a newline, think about what you expect from the following command, and then run it:

echo -n "Content" | wc -l

Answer is: 0

  • How would you explain wc case? – Arkadiusz Drabczyk Jul 26 '23 at 20:20
  • Which wc case? The wc -c <<< ""? The here-string adds an implicit newline so the input has one character, the \n. – Paul Pazderski Jul 26 '23 at 20:24
  • @PaulPazderski, so, why are you saying the newline is due to "neither" Bash or sed, when you know the shell adds a newline at the end of a here-string? – ilkkachu Jul 26 '23 at 21:05
  • @ilkkachu Was a bit unprecise there. Adriano_epifas assumed that the error happens around the sed command and with my "at this point" I meant that none of the sed invocations introduced the unwanted newline and in fact neither the LIST nor NEWLIST variable had an unwanted newline. In the end here-string is part of Bash so you could say Bash adds the newline. – Paul Pazderski Jul 26 '23 at 21:11
  • @PaulPazderski, I did mention it could be bash doing that. But more than that, using echo -n is not a solution because if my list has 1 item only, it will show as "0" since the last \n is suppressed. echo -n messes up the result regardless of the number of items in the variable – Adriano_epifas Jul 26 '23 at 22:19
  • @Adriano_epifas you're right. Will adapt the answer. Well then you can simply add a [[ -n "$NEWLIST" ]] to check if it was not empty or work more with files: echo "$LIST" > list.tmp; sed '/:\s.\+$/d' list.tmp > newlist.tmp; wc -l newlist.tmp – Paul Pazderski Jul 26 '23 at 22:26
  • ilkkachu below did suggest working with files, but bxm above suggested using grep -c that works just fine – Adriano_epifas Jul 26 '23 at 22:33