File content as below:

Text1:
    text_1: Text1 text_1
    text_2:
    - text
    - file1:\\
    - file2:\\
Text2:
    text_1: Text2 text_1
    text_2:
    - text
    - file3:\\
Text3:
    etc

Desired output: print the "file:\" entries for a given TextN. Any idea how to achieve this using sed/awk commands in Linux?

Example: test.txt file contents as below:

$ cat test.txt 
Text1:
    text_1: Text1 text_1
    text_2:
    - text
    - file1:\\
    - file2:\\
Text2:
    text_1: Text2 text1
    text_2:
    - text
    - file3:\\
Text3:
    etc

I tried the grep command as suggested, and it prints all "file:\" entries in the test.txt file. For a "Text1:" match, all I need is file1:\ and file2:\ as output, and for a "Text2:" match, file3:\ only.

Jeff Schaller
Rang's
  • I'd use grep, something like grep -E 'Test[0-9]+|file[0-9]+:' the_file. UNTESTED – waltinator Jul 28 '23 at 07:09
  • Hi , Thank you for your reply, this will get all the "file:\" entries from the_file. for Text1 it should give only file1:\ and file2:\. – Rang's Jul 28 '23 at 07:33
  • If that's yaml you should be using a yaml-aware parser. For example, yq – Chris Davies Jul 28 '23 at 08:04
  • Yes, It's yaml content. – Rang's Jul 28 '23 at 08:09
  • Like roaima said, use a tool which understands yaml. sed, grep, awk all lack context to extract this. How should they decide that file3:\ does not belong to Text1. If you get something with sed/grep/awk which works you just made a bad yaml parser which breaks on the first slight change on the input format. Also your example appears broken. Anything under Text1 and so on is just one string value. – Paul Pazderski Jul 28 '23 at 13:06
  • as well as yq, it's worth noting that perl, python, and many other languages have yaml parsing libraries....and if you're doing more than just simple extraction from an existing yaml file it would be better to use one of those - bash really isn't suited to data processing itself, it's good at coordinating the execution of other programs to do that. – cas Jul 29 '23 at 01:00
  • Why did you remove the grep command you tried from the question? – AdminBee Aug 03 '23 at 12:41

1 Answer


Using yq on the newly updated YAML file, you can select the file values like this:

yq '.Text1.text_2[] | select(. == "file*")' file.yaml

Output

file1:\\
file2:\\

If you want to be able to pick out the different TextN values, you can do something like this to pass in the appropriate key value:

for key in Text1 Text2
do
    printf 'Key %s:\n' "$key"
    yqText="$key" yq 'eval("." + env(yqText) + ".text_2[]") | select(. == "file*")' file.yaml |
        while IFS= read -r val
        do
            printf 'Value: %s\n' "$val"
        done
    echo
done

If you don't have yq installed, you can either install it yourself from the GitHub repository at https://github.com/mikefarah/yq or, if it's for use in a managed environment, ask your Change Board.
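
Where yq genuinely cannot be installed, the fixed layout shown in the question can be scanned with awk. The following is only a minimal sketch, not a YAML parser. It assumes that top-level keys start in column 1 and end with a colon, and that list items are indented "- value" lines, so it is likely to break as soon as the input format changes. The function name print_files is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the "- file..." list items found under one
# top-level key. Assumes the exact layout from the question; this is a
# line scanner, not a YAML parser.
print_files() {
    key=$1
    file=$2
    awk -v key="$key" '
        # Any line starting in column 1 opens a new top-level block;
        # remember whether it is the block we were asked for.
        /^[^[:space:]]/ { in_block = ($0 == key ":"); next }

        # Inside the wanted block, keep list items whose value starts with "file".
        in_block && /^[[:space:]]*-[[:space:]]*file/ {
            sub(/^[[:space:]]*-[[:space:]]*/, "")   # strip indentation and the "- " marker
            print
        }
    ' "$file"
}
```

With the test.txt from the question, print_files Text1 test.txt would print file1:\\ and file2:\\ on separate lines, and print_files Text2 test.txt would print file3:\\ only. Unlike the yq approach, nothing here checks that the items actually sit under text_2, which is one reason a YAML-aware tool remains the better choice.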

Chris Davies