How do I remove all specific sub-sections of a specific header in a YAML file?

Question

I'm using bash shell. I have a YAML file from which I want to remove certain blocks of text.

  /image-content:
    post:
      operationId: createEventPublic
      summary: Process events
      description: Process events
      parameters: []
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/Content'
      responses:
        '201':
          description: Created
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Content'
  /text-content:
    post:
      operationId: createStaticText
      summary: Process text events
      description: Process text events
      parameters: []
      requestBody:
    ...

I would like to remove (as an example) the block of text whose path contains "image-content". Normally I can use this to remove a single line containing that text:

sed -i '/image-content/d' ./infile

but I'm less clear on how to remove every line after that, up to the next line that begins with two spaces and a "/" (i.e. "  /"). In the above, I would want to remove everything up to

  /text-content:

Edit: Although this may not be valid OpenAPI 3 (Swagger), I believe it is still a valid YAML file:

openapi: 3.0.0
components:
  /static/image-content:
    post:
      type: hello
  /api/hello:
    post:
      type: hello
  /static/css-content:
    post:
      type: hello

Ultimately, I would like to remove the blocks beginning with "/static". So the ending doc would be

openapi: 3.0.0
components:
  /api/hello:
    post:
      type: hello

Kusalananda · Answer 1 · 2021-05-16T18:15:47.937

yq -y 'del(."/image-content")' file.yml

This uses yq from https://kislyuk.github.io/yq/ to delete the top-level /image-content section from the YAML document using the del() command.

Given the example document in the question, as-is, this would result in the following YAML document being written to the terminal:

/text-content:
  post:
    operationId: createStaticText
    summary: Process text events
    description: Process text events
    parameters: []
    requestBody: null

Redirect this to a new file if you want to save it, or use the --in-place option to do in-place editing (after testing without that option first, of course).

yq is a wrapper around the JSON processor jq, allowing one to use jq expressions to work with YAML files.

If the document in the question is partial and does not show its true structure (the extra two spaces of indentation imply that what we're seeing are sections on a secondary level), then you may need to use

yq -y 'del(.[]."/image-content")' file.yml

The .[]."/image-content" expression refers to "any /image-content section just beneath the top level".

To recursively search for and delete /image-content sections, regardless of where in the document they may occur, use

yq -y 'del(.. | ."/image-content"?)' file.yml

The expression used in del() walks the document structure recursively using .. (comparable to the // operator in XPath queries) and picks out any section called /image-content, wherever there is one; the trailing ? suppresses errors for values that cannot be indexed by a key. These sections are then deleted.

Addressing your updated question:

yq -y '.components |= with_entries(del(select(.key | startswith("/static/"))) // empty)' file.yml

This updates the components section by taking its subsections, temporarily turning them into separate key and value pairs (see the documentation for with_entries() in the jq manual), and selecting and deleting the ones whose keys start with the exact string /static/.

The // empty bit: The del() operation results in null values for the deleted entries. These cannot be turned back from key and value pairs into proper subsections, so I change them to empty instead, which makes them disappear completely. I'm not entirely sure about the inner workings surrounding this, to be honest.

This results in

openapi: 3.0.0
components:
  /api/hello:
    post:
      type: hello

score 1 · Answer 2 · answered May 17 '21 at 01:13

Tested with GNU sed:

sed -n '
    :a
    /^[[:space:]]*\/static/ {
        n
        :c
        /^[[:space:]]*\//! {
            n
            bc
        }
        # re-test this line: it may start another /static section
        ba
    }
    p
' data

The /image-content case from the first part of the question is handled in basically the same way:

sed -n '
    /^[[:space:]]\+\/image-content:$/ {
        n
        :c
            /^[[:space:]]\+\//! {
                n
                bc
            }
    }
    p
' data

The first address looks for the header of the desired section; the loop then reads and discards each of the following lines until the header of a new section is found. Of course, you could add the -i flag for in-place editing.
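
As a variation on the same idea, here is a minimal sketch that uses awk with a skip flag instead of a sed loop. It assumes that every section header is a line whose first non-blank character is "/". The function name remove_static_blocks and the temporary sample file are made up for this illustration. Like the sed scripts above, it only reacts to lines starting with "/", so a less-indented key that follows a removed section would be dropped as well.

```shell
#!/bin/sh
# Sketch only: remove_static_blocks is a hypothetical helper name.
remove_static_blocks() {
    awk '
        # A line whose first non-blank character is "/" starts a new section;
        # decide here whether that whole section is to be skipped.
        /^[ \t]*\// { skip = ($0 ~ /^[ \t]*\/static/) }
        # Print the line unless it belongs to a skipped section.
        !skip
    ' "$1"
}

# Sample document taken from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
openapi: 3.0.0
components:
  /static/image-content:
    post:
      type: hello
  /api/hello:
    post:
      type: hello
  /static/css-content:
    post:
      type: hello
EOF

remove_static_blocks "$sample"
rm -f "$sample"
```

Because the flag is recomputed on every header line, two /static sections that directly follow each other are both removed.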

Philippos · Answer 3 · 2021-05-17T13:41:32.593

Generic solution: Delete matching line and all following lines that are more indented

If you have a file with a given format, it usually is a good idea to use a tool designed for that format. In your case, you seem to have a simple rule based on the spaces by which the lines are indented, so why not give a simple script for a standard tool:

sed -e 'H;x;/^\(  *\)\n\1/{s/\n.*//;x;d;}' -e 's/.*//;x;/\/image-content/{s/^\( *\).*/ \1/;x;d;}' file

What it does: If a line matching the pattern is found, it gets deleted, while its leading spaces plus one additional space are saved in the hold space. Then each following line is checked to see whether it starts with at least as many spaces as the hold space contains; if so, it is deleted too, until a line with less indentation resets the hold space.

Detailed description

H;x appends the current line to the hold space and exchanges the two spaces, so the current line is now saved in the hold space, while in the pattern space we can examine the line appended to the old hold space.
/^\(  *\)\n\1/ is the pattern that checks that there was at least one space in the hold space and that the current line starts with at least as many spaces as the hold space contained. This means the line has to be removed; the commands in {} are only executed in this case:
s/\n.*// clears everything starting from the newline, so we remove the appended line and restore what was in the hold space before. Now we can exchange the spaces again to return to the old state and delete the current pattern space to start a new cycle.
The rest of the script is only executed if no line was removed. s/.*//;x clears the pattern space and exchanges the spaces, so we are back at the initial state: the current line is in the pattern space while the hold space is empty.
Finally, we need the trigger for deleting a section: \/image-content can be any pattern, of course; it could also be \/static, and it can be at any indentation level. Everything after this is only executed for the trigger line. All other lines are simply printed.
s/^\( *\).*/ \1/;x takes all leading spaces from this line, adds another one and places the result in the hold space for future comparison (which we did at the beginning of the script). Then of course we need to delete the pattern space to avoid any output.
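
The rule described above (drop the trigger line, then every following line that is indented more deeply) can also be sketched in awk, which some may find easier to follow than the hold-space bookkeeping. This is an illustration under stated assumptions, not a replacement for the sed script: the helper name delete_indented_block is hypothetical, indentation is assumed to consist of spaces only, and the pattern is passed with -v, so backslashes in it would go through awk's escape processing.

```shell
#!/bin/sh
# Sketch only. Usage: delete_indented_block PATTERN FILE  (hypothetical helper)
delete_indented_block() {
    awk -v pat="$1" '
        {
            match($0, /^ */)    # always matches; RLENGTH = number of leading spaces
            indent = RLENGTH
        }
        # Inside a block: drop lines indented more deeply than the trigger line.
        skipping && indent > base { next }
        # The first line at the same or a shallower level ends the block.
        { skipping = 0 }
        # Trigger line: remember its indentation and drop it.
        $0 ~ pat { skipping = 1; base = indent; next }
        { print }
    ' "$2"
}

# Shortened version of the first sample from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
  /image-content:
    post:
      operationId: createEventPublic
  /text-content:
    post:
      operationId: createStaticText
EOF

delete_indented_block '/image-content' "$sample"
rm -f "$sample"
```

On this sample, only the /text-content section is printed. As with the sed version, the trigger can be any pattern, for example '/static'.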

score 0 · Answer 4 · answered May 20 '21 at 18:05

You can achieve this with PHP. I have created a simple program that does this; you may want to replace the hardcoded variables with command-line arguments, depending on your use case (I used PHP 7.4 for this).

<?php
// Config:
$fileinname="data.yaml"; // file to take data from
$fileoutname="out.yaml"; // file to write output to
$break=['/image-content/','/text-content/']; // Regex patterns for lines to delete between
// End of config
$out="";
$stage=0; // 0: before the block, 1: inside the block, 2: after the block
$file=file($fileinname);
for ($i=0;$i<count($file);$i++){
    if ($stage == 0){
        if (preg_match($break[0],$file[$i])){
            $stage++;
        }
        else {
            $out.=$file[$i];
        }
    }
    elseif ($stage == 1) {
        if (preg_match($break[1],$file[$i])){
            $stage++;
            $out.=$file[$i];
        }
    }
    elseif ($stage == 2){
        $out.=$file[$i];
    }
}
file_put_contents($fileoutname,$out);
?>

lindyang · Answer 5 · 2023-12-21T06:20:13.950

Without comments:

sed '/^ *\/image-content:/{
:sub;
  $b eof;
  N;
/^\( *\)[^ ].*\n\1[^ ][^\n]*$/!b sub;
:eof;
  s/^\( *\)[^ ].*\n\(\1[^ ][^\n]*\)$/\2/;
  t loop;
  d;
:loop; n; b loop;
}' file;

With comments:

sed '/^ *\/image-content:/{
:sub;
  $b eof;  # last line reached: stop collecting
  N;
/^\( *\)[^ ].*\n\1[^ ][^\n]*$/!b sub;  # keep collecting until the newest line has the same indentation (\1) as the first
:eof;
  # if the buffer ends with the first line of the next block, keep only that line
  s/^\( *\)[^ ].*\n\(\1[^ ][^\n]*\)$/\2/;
  t loop;  # the substitution succeeded: go and print the rest of the file
  d;  # otherwise the target block ran to the end of the file: delete it
:loop; n; b loop;  # print all remaining lines without running the script on them again
}' file;

edited Dec 21 '23 at 06:20

answered Dec 20 '23 at 10:28

Welcome to the site, and thank you for your contribution. You may want to add some text to explain how your sed program solves the task described in the question - inline comments are somewhat difficult to read. – AdminBee Dec 20 '23 at 15:14
