
So, a while ago I saw this snippet for extracting text between two "markers":

# Usage: extract file "opening marker" "closing marker"
while IFS=$'\n' read -r line; do
    [[ "$extract" && "$line" != "$3" ]] &&
        printf '%s\n' "$line"
    [[ "$line" == "$2" ]] && extract=1
    [[ "$line" == "$3" ]] && extract=
done < "$1"

(Here I took the liberty of moving it out of the function and into a file called extract.) It works fine for most pairs of markers, but I noticed that it doesn't always work:

Following the original snippet's example, with markers made of N repeated characters (I use "#" instead of "`" because of a formatting issue on SO):

###sh
test
###

works when running extract file '###sh' '###', but with the following markers:

###
test
###

running extract file '###' '###' doesn't work.

Yet I can see that the condition in the script does evaluate correctly (with set -x, the extract variable is set to 1).

What's wrong here?

PS: By "it doesn't work" I mean that it prints nothing at all in the failing case.

In both examples above, the output shouldn't contain the markers, only the text between them.

I prefer a bash/shell solution if possible.
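The symptom can be reproduced with a minimal sketch that wraps the snippet above, unchanged, in a function. The function name extract_orig and the temporary files are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: the question's loop, logic unchanged, inside a function.
extract_orig() {
    local line extract=
    while IFS=$'\n' read -r line; do
        [[ "$extract" && "$line" != "$3" ]] &&
            printf '%s\n' "$line"
        [[ "$line" == "$2" ]] && extract=1
        [[ "$line" == "$3" ]] && extract=
    done < "$1"
}

f1=$(mktemp) f2=$(mktemp)
printf '%s\n' '###sh' 'test' '###' > "$f1"   # different markers
printf '%s\n' '###' 'test' '###' > "$f2"     # identical markers
extract_orig "$f1" '###sh' '###'   # prints: test
extract_orig "$f2" '###' '###'     # prints nothing
rm -f "$f1" "$f2"
```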

2 Answers


As others stated in the comments on your question, your script does not work because when the start condition [[ "$line" == "$2" ]] is met, extract is set to 1, but the very next statement tests the end condition [[ "$line" == "$3" ]] against the same input line. Since both markers are identical, that condition is met too, and extract is reset to the empty string before the next line is even read.
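A minimal way to see this is to replay the two marker tests on a single input line, echoing extract after each one. The variable names start and end are illustrative stand-ins for $2 and $3.

```shell
#!/usr/bin/env bash
# Replay the question's two marker tests on the first input line,
# with identical start and end markers (stand-ins for $2 and $3).
start='###' end='###' line='###' extract=

[[ "$line" == "$start" ]] && extract=1
echo "after start test: extract='$extract'"   # prints: after start test: extract='1'

[[ "$line" == "$end" ]] && extract=
echo "after end test:   extract='$extract'"   # prints: after end test:   extract=''
```

This is also why set -x shows extract=1: the assignment does happen, it is just undone by the following statement in the same iteration.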

Here is your script fixed:

# Usage: extract file "opening marker" "closing marker"
while IFS=$'\n' read -r line; do
    if [ "$extract" ]; then
        if [[ "$line" == "$3" ]]; then
            extract=
        else
            printf '%s\n' "$line"
        fi
    elif [[ "$line" == "$2" ]]; then
        extract=1
    fi
done < "$1"
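To check the fix, here is a sketch that wraps the loop above in a function (the name extract_fixed and the temporary file are illustrative assumptions) and runs it on input containing two blocks delimited by identical markers.

```shell
#!/usr/bin/env bash
# The fixed loop, wrapped in a hypothetical function for testing.
extract_fixed() {
    local line extract=
    while IFS=$'\n' read -r line; do
        if [[ "$extract" ]]; then
            if [[ "$line" == "$3" ]]; then
                extract=                # end marker: stop, print nothing
            else
                printf '%s\n' "$line"   # inside a block: print the line
            fi
        elif [[ "$line" == "$2" ]]; then
            extract=1                   # start marker: extract from the next line
        fi
    done < "$1"
}

tmp=$(mktemp)
printf '%s\n' '###' 'test' '###' 'outside' '###' 'again' '###' > "$tmp"
extract_fixed "$tmp" '###' '###'   # prints: test, then again
rm -f "$tmp"
```

Because the end test is only reached while extracting, and the start test only while not extracting, a single line can never trigger both.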

And, in case you need this, at @Freddy's suggestion, here is a slightly modified version that requires that the end marker be present for the text to be printed:

# Usage: extract file "opening marker" "closing marker"
while IFS=$'\n' read -r line; do
    if [ "$extract" ]; then
        if [[ "$line" == "$3" ]]; then
            # printf with no arguments would still output one empty line
            (( ${#lines[@]} )) && printf '%s\n' "${lines[@]}"
            lines=() extract=
        else
            lines+=( "$line" )
        fi
    elif [[ "$line" == "$2" ]]; then
        extract=1
    fi
done < "$1"

(Lines are accumulated in the lines array and printed only when the end marker is met.)
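A sketch of the buffered variant as a function (the name extract_buffered is hypothetical), run on input with one complete block, one empty block, and a final block whose end marker is missing. It includes a guard for empty blocks, since printf '%s\n' with no arguments still prints a single empty line.

```shell
#!/usr/bin/env bash
# Hypothetical function: buffer each block, print it only at the end marker.
extract_buffered() {
    local line extract=
    local -a lines=()
    while IFS=$'\n' read -r line; do
        if [[ "$extract" ]]; then
            if [[ "$line" == "$3" ]]; then
                # Skip printf for an empty block (it would emit a blank line).
                (( ${#lines[@]} )) && printf '%s\n' "${lines[@]}"
                lines=() extract=
            else
                lines+=( "$line" )
            fi
        elif [[ "$line" == "$2" ]]; then
            extract=1
        fi
    done < "$1"
    # Anything still buffered here had no end marker and is discarded.
}

tmp=$(mktemp)
printf '%s\n' '###' 'kept' '###' '###' '###' '###' 'dropped' > "$tmp"
extract_buffered "$tmp" '###' '###'   # prints only: kept
rm -f "$tmp"
```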

xhienne

Add toggling logic to the extract variable whenever $2 is seen. Thanks to xhienne for pointing it out!

[[ $line == "$2" ]] && case $extract in '') extract=1;; *) extract=; esac

Then remove the line that resets extract when $3 is seen.
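Assembled into a complete loop, the toggling approach might look like the sketch below. This is one reading of the answer: the question's print test is kept, the end-marker reset is dropped, and the function name extract_toggle is hypothetical. As the comments below point out, it is only correct when both markers are identical.

```shell
#!/usr/bin/env bash
# Hypothetical assembly of the toggle answer; assumes $2 and $3 are identical.
extract_toggle() {
    local line extract=
    while IFS=$'\n' read -r line; do
        # Print only while inside a block, and never the closing marker.
        [[ "$extract" && "$line" != "$3" ]] &&
            printf '%s\n' "$line"
        # Flip the flag every time the marker is seen: on, off, on, off...
        if [[ "$line" == "$2" ]]; then
            case $extract in
                '') extract=1 ;;
                *)  extract= ;;
            esac
        fi
    done < "$1"
}

tmp=$(mktemp)
printf '%s\n' '###' 'a' '###' 'b' '###' 'c' '###' > "$tmp"
extract_toggle "$tmp" '###' '###'   # prints: a, then c
rm -f "$tmp"
```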

HTH.

guest_7
    If the two markers are the same, when will you hit the [[ "$line" == "$3" ]] line, then? – xhienne May 09 '21 at 00:13
  • Yes that is correct, it will never enter there. Toggling extract is the way to go in that case. – guest_7 May 09 '21 at 00:16
  • Usually the OP uses the same script with different markers. Now he asks us to fix it when the markers are the same. By toggling like you are doing, the script is now broken when the markers are different. – xhienne May 09 '21 at 00:28
  • There's another flaw (not yours). The input only needs the start marker to print the text. – Freddy May 09 '21 at 00:29