TL;DR: There's a lot going on in the sed pattern below, and I'm not sure how the discrete pieces are being composed into an overall command.


Bash version: GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin21)

I'm learning about shell scripting by reading the rbenv codebase file-by-file, and I've encountered the rbenv-help file, which includes this function definition:

extract_initial_comment_block() {
  sed -ne "
    /^#/ !{
      q
    }

    s/^#$/# /

    /^# / {
      s/^# //
      p
    }
  "
}

I see how this function is called further down in the code, so I know that it works on the contents of a file:

extract_initial_comment_block < "$filename" | collect_documentation

From this I can see that the file represented by "$filename" is being fed as the standard input to the sed command. For the purposes of my question, the knock-on function "collect_documentation" is irrelevant.

I also gather from the name of the function that its purpose is to take a file like this, and return its summary and usage comments, i.e. lines 2-14 of the linked file. However, I haven't tested this theory yet, so I may not be 100% correct.

Furthermore, I know from this StackExchange answer that the purpose of the -e flag is to tell sed to interpret the subsequent string as a command (or a collection of commands separated by a newline?). So it looks like the body of extract_initial_comment_block contains 3 separate scripts for sed to interpret, in order. That same StackExchange answer says that {...} are used to group commands together, but I'm not sure if that's what is happening in this regex (these regexes?).

As near as I can tell, there are 3 scripts being fed to sed here:

    /^#/ !{
      q
    }
    s/^#$/# /
    /^# / {
      s/^# //
      p
    }

However, even within each of these scripts, there are patterns being used (such as ^# and !{ q }) that I'm not able to identify, even after availing myself of resources like The Linux Data Project. It seems like there are a lot of moving pieces, and I'm not sure how each script is being composed into a finished product.

I've tried to walk through my thought process in as clear a manner as I can. Is my train of thought correct so far? If it's not, where did I veer off-course? If it is, how can I deduce the meaning of each command that's passed to sed?

Richie Thomas
  • The commands used here are all clearly documented in the sed man page. After reading through that document, which specifically are your questions? – larsks Sep 18 '22 at 18:49
  • My main questions were a) whether my assumption that there are 3 different commands being sent to sed is correct, and b) what the commands are doing, specifically commands 1 and 3. I can deduce what command #2 does, if my first assumption is correct, but at the time I wrote the question, this was not certain. – Richie Thomas Sep 19 '22 at 13:13
  • I wouldn't say the {...} command is "clearly" documented. There are no examples on the man page; indeed, there are only 3 mentions of the { character at all. The last ("Begin a block of commands") is the clearest, but the word "commands" is vague. Now that I've read the answer below, I can deduce that it means "zero- or one-address commands" such as a, i, q, and r mentioned in the command synopsis, but even that synopsis says "This is just a brief synopsis of sed commands to serve as a reminder to those who already know sed". For newer devs, a different guide may be needed. – Richie Thomas Sep 19 '22 at 13:20
  • Now you're asking the sort of specific questions that should have been included in the first place! It's generally better to update your question to add this sort of information (people may not see it in the comments). – larsks Sep 19 '22 at 13:31
  • I feel like the phrase "...there are patterns being used (such as ^# and !{ q }) that I'm not able to identify, even after availing myself of resources like The Linux Data Project" is pretty explicit. Perhaps the question title could reflect that more clearly. If you're OK with the title "What is the function of the {...} pattern passed to "sed -e"?", then I'll update it accordingly. Thanks for your feedback. – Richie Thomas Sep 19 '22 at 16:29

1 Answer

/^#/ !{
  q
}

The part between the slashes is a regular expression, where ^ matches the start of the line and # is just the character itself. The pattern acts as an address: it selects the lines on which to run the associated command. The trailing ! inverts the sense of the match, and q is the command to quit. So this quits the sed program when it sees a line that doesn't start with the comment marker #.
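A minimal sketch of that first command in isolation may help. The helper name leading_comments and the sample input are made up for illustration; an explicit p is added so there is something to see, since -n suppresses the default printing.

```shell
# Hypothetical helper (not part of rbenv): print lines until the first
# one that does not start with '#', then stop reading input.
leading_comments() {
  # First -e: on any line NOT matching /^#/, quit.
  # Second -e: otherwise print the line.
  sed -n -e '/^#/!q' -e 'p'
}

# Made-up input: two comment lines, one code line, one more comment line.
printf '%s\n' '# one' '#two' 'code' '# three' | leading_comments
```

Only the first two lines come out. The later "# three" is never reached, because sed has already quit at "code".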

 s/^#$/# /

s/a/b/ substitutes a with b, ^ is the start of the line, $ is the end of the line, and # is itself. So this changes a line consisting of just a lone # into # followed by a space.
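To see the substitution on its own, here is a small sketch. The helper name pad_bare_hash is hypothetical, and the second sed only appends a | to each line so the trailing space becomes visible.

```shell
# Hypothetical helper (not part of rbenv): turn a line that is exactly "#"
# into "# " and leave every other line untouched.
pad_bare_hash() {
  sed 's/^#$/# /'
}

# Made-up input; the trailing sed marks each end of line with '|'.
printf '%s\n' '#' '#x' '# y' | pad_bare_hash | sed 's/$/|/'
```

Only the first line changes. The other two already have something after the #, so ^#$ does not match them.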

/^# / {
  s/^# //
  p
}

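If the line starts with a # and a space (/^# /), replace the # and space with nothing (s/^# //) and print the line (p). The braces group the two commands so that both run only on lines matching the address. This is where the previous substitution comes in handy: a bare # has become "# ", so it matches here and prints as an empty line.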
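A sketch of the braced block by itself, assuming a made-up helper name strip_marker and made-up input:

```shell
# Hypothetical helper (not part of rbenv): for lines starting with "# ",
# strip that prefix and print; print nothing for any other line.
strip_marker() {
  sed -n '
    /^# / {
      s/^# //
      p
    }
  '
}

printf '%s\n' '# kept' '#dropped' 'also dropped' | strip_marker
```

Both commands inside { } share the single address /^# /. Lines that don't match it skip the whole block, and with -n in effect they produce no output.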

The -n option to sed (at the start of the command) tells sed to not print the line after executing the script on it, as it would do by default.
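The effect of -n is easy to check side by side. In this sketch the variable names and the two-line input are just for illustration:

```shell
# Without -n: each line is printed once by the explicit p command and
# once more by sed's automatic print at the end of the cycle.
default_out=$(printf '%s\n' 'a' 'b' | sed 'p')

# With -n: the automatic print is off, so only the explicit p prints.
quiet_out=$(printf '%s\n' 'a' 'b' | sed -n 'p')

printf 'default:\n%s\n' "$default_out"
printf 'with -n:\n%s\n' "$quiet_out"
```

That is why the function in the question needs an explicit p: with -n, nothing appears unless a command prints it.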

Note that the script ignores lines that start with a # but don't have a space after it, including the hashbang lines starting with #! that tell the OS which interpreter to use for the script. That might be on purpose for hashbang lines, but it could also hide some other comment lines.

E.g.

#!/bin/sh
# some script
#
#this is ignored
# this prints

this doesn't print any more

whatever

turns into

some script

this prints
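To reproduce this, the function from the question can be run against that sample input. This is a sketch rather than rbenv's exact code: the sed script is put in single quotes here (the original uses double quotes, which happen to work because the shell leaves $/ alone), and the temporary file is just a stand-in for "$filename".

```shell
# The function from the question, with the sed script single-quoted so the
# shell cannot touch the '$' in it.
extract_initial_comment_block() {
  sed -ne '
    /^#/ !{
      q
    }

    s/^#$/# /

    /^# / {
      s/^# //
      p
    }
  '
}

# Write the sample input to a temporary file (stand-in for "$filename").
sample=$(mktemp)
cat > "$sample" <<'EOF'
#!/bin/sh
# some script
#
#this is ignored
# this prints

this doesn't print any more

whatever
EOF

# Same call shape as in the question: the file is fed on standard input.
extract_initial_comment_block < "$sample"
rm -f "$sample"
```

The blank line in the output comes from the bare # line, and everything after the first non-comment line is never read.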

ilkkachu
  • Thank you. The syntax inside the quotes with the 3 regexes looks really similar to the subsequent function in that file (code here), which I see uses the awk programming language (according to the man page for awk). Does the sed command in fact use the same language as the awk command, or is the syntax passed to sed here not considered a "language", in the sense used by the awk man page? – Richie Thomas Sep 20 '22 at 13:42