I have the following regex that works for me in sed:

cat <<EOF | sed -E '/^([A-Z][a-z]+){2,}$/Q'
Nothing Relevant
TotallyFake:
- NowWeWant
- TheseLines
- AndAlsoThisLine

ButNotThisLine

- OrThisLine

EOF

This outputs the lines we want... but also the header lines, which is less good. So I looked around and found the /this/,/that/ approach, and thought, cool! I can find the first PascalishCase thing and then break at the first empty line.

So I tried this:

cat <<EOF | sed -En '/^- ([A-Z][a-z]+){2,}$/,/^$/p'

Nothing Relevant
TotallyFake:

- NowWeWant
- TheseLines
- AndAlsoThisLine

ButNotThisLine

- OrThisLine

EOF

However, it also gives me OrThisLine, which is far less desirable.

How can I just use sed to find the first block of PascalText beginning with a - and only print those lines?

[edit]

Since the contents weren't clear enough, the output I want is:

- NowWeWant
- TheseLines
- AndAlsoThisLine

My understanding was that /this/,/that/ would find the first "this" and go to the first "that" after it, but the ^$ pattern doesn't seem to stop at the first blank line; it appears to be matching EOF.

Wayne Werner
  • It's unclear what the expected output should be for the two pipelines (you allude to something about getting the lines from the first PascalCase line to the next empty one, but it's unclear what lines these are). Also, please specify whether the document is a YAML document, in which case there may be much better tools than sed to process it, depending on what it is you're actually wanting to do. – Kusalananda Jan 20 '23 at 07:05

1 Answer

Assuming this is a YAML file like this (the indentation of the array elements is optional):

---
Somesection:
Someothersection:
TotallyFake:
  - NowWeWant
  - TheseLines
  - AndAlsoThisLine

ButNotThisLine:

  - OrThisLine

... and that you want to get the elements of the top-level TotallyFake array.

You would extract the TotallyFake top-level array (as YAML) using Mike Farah's yq (the most commonly available yq on Linux) like so:

$ yq '.TotallyFake' file
- NowWeWant
- TheseLines
- AndAlsoThisLine

If you want the elements of the TotallyFake array as individual lines, expand the array by adding [] at the end:

$ yq '.TotallyFake[]' file
NowWeWant
TheseLines
AndAlsoThisLine

The corresponding commands using Andrey Kislyuk's yq (a wrapper around the well known jq JSON processor):

$ yq -y '.TotallyFake' file
- NowWeWant
- TheseLines
- AndAlsoThisLine

Here, the -y option tells yq to extract the data as YAML. Without it we would get a JSON-encoded array back (the equivalent of ["NowWeWant","TheseLines","AndAlsoThisLine"]).

To get the elements as separate lines:

$ yq -r '.TotallyFake[]' file
NowWeWant
TheseLines
AndAlsoThisLine

The -r option gives us decoded ("raw") strings.


With sed, treating the input as text rather than a document in a structured document format:

$ sed -e '/^TotallyFake:/,/^$/!d' -e '//d' file
- NowWeWant
- TheseLines
- AndAlsoThisLine

This deletes all the lines that are outside of the section that we're interested in, and then deletes the range's start and end lines with a second d command. The empty regular expression is a special syntax that tells sed to reuse the most recently used regular expression.
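The range in the command above opens only once, because TotallyFake: occurs only once in the file. A range whose start pattern can match again re-opens at every later match, which is why a second block can show up in the output. A minimal sketch with made-up input (the sample text and the variable name are assumptions for illustration only):

```shell
# Made-up input: two dash-prefixed blocks, each followed by a blank line.
sample='header
- one
- two

middle

- three
'

# The range opens at every line matching /^- /, not just the first one,
# so both blocks (and the blank line that closes each) are printed.
printf '%s\n' "$sample" | sed -n '/^- /,/^$/p'
```

Here the lines "- one", "- two" and "- three" are all printed, even though only the first block is followed directly by the first blank line.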

Note that this relies on whitespace that is optional in YAML documents (the empty line after the last element of the TotallyFake array).
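If the input is plain text rather than YAML and there is no known header line to anchor on, one option is to start the range at the first dash-prefixed PascalCase line and quit at the blank line that ends it, so that later blocks are never reached. The following is only a sketch under those assumptions: the sample input is modelled on the question's here-document, and first_block is a hypothetical helper name. It assumes a sed that supports -E (both GNU and BSD sed do).

```shell
# Sample input modelled on the question; assumed, not taken from a real file.
input='Nothing Relevant
TotallyFake:
- NowWeWant
- TheseLines
- AndAlsoThisLine

ButNotThisLine

- OrThisLine
'

# first_block: hypothetical helper that prints only the first block.
# Inside the range, quit at the blank line (with -n nothing is printed
# on quit), otherwise strip the leading "- " and print the line.
first_block() {
  sed -n -E '/^- ([A-Z][a-z]+){2,}$/,/^$/{
    /^$/q
    s/^- //
    p
  }'
}

printf '%s\n' "$input" | first_block
```

With this input the output is the three lines NowWeWant, TheseLines and AndAlsoThisLine. Because sed quits at the first blank line after the block, the trailing blank line is not printed and OrThisLine is never read.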

Kusalananda
  • definitely not YAML, also I can't guarantee that there's anything before the first - CamelCaseWord. – Wayne Werner Jan 21 '23 at 15:11
  • Turns out this was close! And apparently I misunderstood how sed does marker patterns - https://unix.stackexchange.com/a/180729/5788 looks like it finds every /A/,/B/ set. sed -E '/- ([A-Z][a-z]+){2,}/,/^$/!d;/^$/q' does exactly what I need. – Wayne Werner Jan 21 '23 at 15:34
  • And more particularly, since I'm trying to also get rid of the leading -: sed -E '/- ([A-Z][a-z]+){2,}/,/^$/!d;s/^- //;/^$/q' is completely what I needed – Wayne Werner Jan 21 '23 at 15:37