
I would like to extract the commands from arbitrary shell scripts. I've used morbig (hat tip to Michael Homer for the suggestion!) to generate a JSON file from a shell script.

As an example, this shell script:

#!/bin/sh
echo hi
false || echo something
true && echo something

results in the following JSON:

[
  "Program_LineBreak_CompleteCommands_LineBreak",
  [ "LineBreak_Empty" ],
  [
    "CompleteCommands_CompleteCommands_NewlineList_CompleteCommand",
    [
      "CompleteCommands_CompleteCommands_NewlineList_CompleteCommand",
      [
        "CompleteCommands_CompleteCommand",
        [
          "CompleteCommand_CList",
          [
            "CList_AndOr",
            [
              "AndOr_Pipeline",
              [
                "Pipeline_PipeSequence",
                [
                  "PipeSequence_Command",
                  [
                    "Command_SimpleCommand",
                    [
                      "SimpleCommand_CmdName_CmdSuffix",
                      [
                        "CmdName_Word",
                        [ "Word", "echo", [ [ "WordName", "echo" ] ] ]
                      ],
                      [
                        "CmdSuffix_Word",
                        [ "Word", "hi", [ [ "WordName", "hi" ] ] ]
                      ]
                    ]
                  ]
                ]
              ]
            ]
          ]
        ]
      ],
      [ "NewLineList_NewLine" ],
      [
        "CompleteCommand_CList",
        [
          "CList_AndOr",
          [
            "AndOr_AndOr_OrIf_LineBreak_Pipeline",
            [
              "AndOr_Pipeline",
              [
                "Pipeline_PipeSequence",
                [
                  "PipeSequence_Command",
                  [
                    "Command_SimpleCommand",
                    [
                      "SimpleCommand_CmdName",
                      [
                        "CmdName_Word",
                        [ "Word", "false", [ [ "WordName", "false" ] ] ]
                      ]
                    ]
                  ]
                ]
              ]
            ],
            [ "LineBreak_Empty" ],
            [
              "Pipeline_PipeSequence",
              [
                "PipeSequence_Command",
                [
                  "Command_SimpleCommand",
                  [
                    "SimpleCommand_CmdName_CmdSuffix",
                    [
                      "CmdName_Word",
                      [ "Word", "echo", [ [ "WordName", "echo" ] ] ]
                    ],
                    [
                      "CmdSuffix_Word",
                      [
                        "Word",
                        "something",
                        [ [ "WordName", "something" ] ]
                      ]
                    ]
                  ]
                ]
              ]
            ]
          ]
        ]
      ]
    ],
    [ "NewLineList_NewLine" ],
    [
      "CompleteCommand_CList",
      [
        "CList_AndOr",
        [
          "AndOr_AndOr_AndIf_LineBreak_Pipeline",
          [
            "AndOr_Pipeline",
            [
              "Pipeline_PipeSequence",
              [
                "PipeSequence_Command",
                [
                  "Command_SimpleCommand",
                  [
                    "SimpleCommand_CmdName",
                    [
                      "CmdName_Word",
                      [ "Word", "true", [ [ "WordName", "true" ] ] ]
                    ]
                  ]
                ]
              ]
            ]
          ],
          [ "LineBreak_Empty" ],
          [
            "Pipeline_PipeSequence",
            [
              "PipeSequence_Command",
              [
                "Command_SimpleCommand",
                [
                  "SimpleCommand_CmdName_CmdSuffix",
                  [
                    "CmdName_Word",
                    [ "Word", "echo", [ [ "WordName", "echo" ] ] ]
                  ],
                  [
                    "CmdSuffix_Word",
                    [ "Word", "something", [ [ "WordName", "something" ] ] ]
                  ]
                ]
              ]
            ]
          ]
        ]
      ]
    ]
  ],
  [ "LineBreak_Empty" ]
]

I would like to see output along the lines of:

echo
false
echo
true
echo

... ignoring for now any parameters, options, and arguments to the base commands. The order of the outputted commands does not matter. Bonus points if it's easy to make them unique before being output (saving a |sort -u afterwards).

I've gotten as far as:

< simple.json jq flatten | grep -A2 CmdName_Word

but this feels like the wrong approach. I want to tell jq to give me the word that follows "Word" that follows "CmdName_Word", but I don't know how to do that.


If you'd like to reproduce these steps locally (extracted from https://github.com/colis-anr/morbig):

  1. install Docker as appropriate for your OS

  2. docker pull colisanr/morbig:latest

  3. define a shell function for ease of use:

     morbig () {
       D=$(cd "$(dirname "$1")"; pwd)
       B=$(basename "$1")
       docker run \
         -v "$D":/mnt \
         colisanr/morbig:latest --as simple /mnt/"$B"
     }
    
  4. ensure the directory that contains the shell script is writable by UID 1000 (the container runs as the user "opam", which has UID 1000).

  5. morbig your-shell-script-here.sh

  6. the resulting your-shell-script-here.sh.sjson JSON will be in the same directory as the shell script.

Jeff Schaller

1 Answer

$ jq -r '.. | select(type == "array" and .[0] == "CmdName_Word") | .[1][1]' file
echo
false
echo
true
echo

The jq expression used here recurses over every value in the document and tests whether each one is an array. For each array whose first element is the string CmdName_Word, it extracts the second element of that array's second element (the ["Word", ...] node), which is the sought command name.
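As a quick check of the traversal order, here is a sketch that runs the expression above over a hand-trimmed fragment modelled on the question's JSON. The fragment is not verbatim morbig output (the Pipeline_PipeSequence, PipeSequence_Command and Command_SimpleCommand wrapper nodes are dropped for brevity), and list_cmd_names is a hypothetical helper name.

```shell
#!/bin/sh
# Hand-trimmed fragment modelled on the morbig output in the question
# (NOT verbatim morbig output: wrapper nodes such as Pipeline_PipeSequence
# and Command_SimpleCommand are omitted to keep it short).
sample='["CompleteCommands_CompleteCommands_NewlineList_CompleteCommand",
  ["SimpleCommand_CmdName_CmdSuffix",
    ["CmdName_Word", ["Word", "echo", [["WordName", "echo"]]]],
    ["CmdSuffix_Word", ["Word", "hi", [["WordName", "hi"]]]]],
  ["NewLineList_NewLine"],
  ["AndOr_AndOr_OrIf_LineBreak_Pipeline",
    ["SimpleCommand_CmdName",
      ["CmdName_Word", ["Word", "false", [["WordName", "false"]]]]],
    ["LineBreak_Empty"],
    ["SimpleCommand_CmdName_CmdSuffix",
      ["CmdName_Word", ["Word", "echo", [["WordName", "echo"]]]],
      ["CmdSuffix_Word", ["Word", "something", [["WordName", "something"]]]]]]]'

# Hypothetical helper: read one JSON document on stdin and print the
# command names. '..' walks every value depth-first; only arrays tagged
# CmdName_Word pass select(), and .[1][1] is the name string inside
# their ["Word", NAME, ...] child.
list_cmd_names () {
  jq -r '.. | select(type == "array" and .[0] == "CmdName_Word") | .[1][1]'
}

printf '%s\n' "$sample" | list_cmd_names
# prints echo, false, echo (one per line)
```

The CmdSuffix_Word node for "hi" is visited too, but its tag does not match, so arguments never reach the output. Because `..` is a pre-order walk, the names in this fragment come out in the order the commands appear in the script.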

The expression could be shortened to

jq -r '.. | select(.[0]? == "CmdName_Word")[1][1]' file

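... which applies .[0]? inside the select(). The ? suppresses the error that jq would otherwise raise when indexing a non-array value, such as a string, with a number, so only arrays can match. I've also applied [1][1] directly to the output of select() instead of piping it to .[1][1].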
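For the question's bonus point, one way to let jq de-duplicate the names itself is to collect the matches into an array and apply jq's built-in unique, which sorts and removes duplicates. This sketch also uses -n with inputs, so that several .sjson files are merged into one array before de-duplicating, which would cover a multi-script run like the one described in the comments. unique_cmd_names is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print each command name once, across all morbig
# "simple" JSON files named as arguments (or read from stdin if none are
# given). -n with 'inputs' gathers every document into a single array, so
# unique sorts and de-duplicates across files, replacing a trailing sort -u.
unique_cmd_names () {
  jq -rn '[inputs | .. | select(.[0]? == "CmdName_Word")[1][1]] | unique | .[]' "$@"
}

# Usage (file names are placeholders):
#   unique_cmd_names script1.sh.sjson script2.sh.sjson
```

One difference from sort -u: unique orders strings by Unicode codepoint, whereas sort collates according to the current locale unless run with LC_ALL=C, so the two may list the names in a different order.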

Kusalananda
  • It's safe to say that I was very far from this kind of solution! It looks to me like you've encoded the requirements exactly. I look forward to stress-testing this! – Jeff Schaller Apr 06 '22 at 23:14
    The stress tests passed with flying colors. For amusement/posterity: in total, I collected a bit over 6,000 lines of shell script input; morbig gave 48,318 lines of JSON from that, and the (shortened) jq command from your answer (piped to sort -u) extracted 166 unique commands. This toolchain has been a huge help to this part of my project. – Jeff Schaller Apr 08 '22 at 19:58