12

I saw sed 's=.*/==' in the context of an sh script and I'm puzzled. I could not find in the sed manual or by web search (for sed s=) how s can be used this way rather than as s///. Apart from s, the only potential command I see here is = (print the current input line number), but in that case, what is the rest doing?

Running the command in a shell produces output identical to the input for e.g. echo 'jkfdsa=335r34', whereas echo 'jkfdsa=335r34' | sed 's/=.*/==/' does the replacement as per the manual. Also, slightly modifying the command to e.g. echo 'jkfdsa=3' | sed 's798=.*/==/' gives
sed: -e expression #1, char 11: unterminated 's' command, so the original must have some valid meaning. What is it?

ilkkachu
Alex Martian

3 Answers

26

The = are alternative delimiters. These are used since the pattern contains a / (which is the more commonly used delimiter). Almost any character can be used as an alternative delimiter, so s@.*/@@ or s_.*/__ would have meant the same thing. With the ordinary delimiter, the sed expression could have been written as

s/.*\///

(the literal / that the expression wants to match needs to be escaped here) or, possibly more readable,

s/.*[/]//

(most characters within a [...] character class are literal [1])

What the sed expression does is to substitute anything that matches .*/ with nothing. This will have the effect of removing everything up to and including the last / character on the line. It will remove up to the last / (not the first) since .* does a greedy match of any sequence of any characters.

Example:

$ echo 'a/b/c' | sed 's/.*[/]//'
c
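
As a quick check of the claim that the delimiter is interchangeable, here is a minimal sketch that runs the same substitution with four different delimiters on the sample input from above. The variable names are made up for illustration.

```shell
#!/bin/sh
# The same "strip everything up to the last /" substitution,
# written with four different delimiters.
path='a/b/c'

r_equals=$(printf '%s\n' "$path" | sed 's=.*/==')    # = as delimiter
r_at=$(printf '%s\n' "$path" | sed 's@.*/@@')        # @ as delimiter
r_under=$(printf '%s\n' "$path" | sed 's_.*/__')     # _ as delimiter
r_slash=$(printf '%s\n' "$path" | sed 's/.*\///')    # / as delimiter, literal / escaped

# Each of the four results is "c".
printf '%s\n' "$r_equals" "$r_at" "$r_under" "$r_slash"
```

Only the last form needs a backslash, because only there does the delimiter also occur in the pattern.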

The unterminated 's' command error that you get when testing

s798=.*/==/

is due to 7 being used as the delimiter for the s command. The expression

s7.*/77

would have worked though.
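
To see both halves of this, the sketch below runs the working expression with 7 as the delimiter and then the broken one from the question. It assumes a sed that exits with a non-zero status on a syntax error, which POSIX requires and which GNU and BSD sed do; the variable names are illustrative only.

```shell
#!/bin/sh
# With 7 as the delimiter, the expression needs exactly three 7s:
# s 7 <regex> 7 <replacement> 7
ok=$(printf '%s\n' 'a/b/c' | sed 's7.*/77')
printf '%s\n' "$ok"    # prints: c

# Here the first 7 opens the regex ("98=.*/==/") and no further 7 follows,
# so sed never sees the end of the s command and refuses to run.
if printf '%s\n' 'a/b/c' | sed 's798=.*/==/' >/dev/null 2>&1; then
    broken_status=0
else
    broken_status=1
fi
printf 'broken expression failed: %s\n' "$broken_status"
```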


[1] ... apart from the characters that have special meaning within [...] such as ^ (at the start) and - (when not first, second after ^, or last). The characters [ and ] also need special treatment within [...], but that is outside the scope of this question.


If this is used to get the filename at the end of a path in some string or shell variable, then the basename utility may do a better job of it (and also does the right thing if the path ends with a slash):

$ basename a/b/c
c
$ basename a/b/c/
c
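
For contrast, a small sketch of what the sed expression does with the same trailing-slash input. Since everything up to and including the last / is removed, nothing is left. The variable names are illustrative.

```shell
#!/bin/sh
# A path with a trailing slash: sed strips through the final /,
# leaving an empty string, while basename still yields "c".
with_sed=$(printf '%s\n' 'a/b/c/' | sed 's=.*/==')
with_basename=$(basename 'a/b/c/')

printf 'sed: [%s]\n' "$with_sed"            # prints: sed: []
printf 'basename: [%s]\n' "$with_basename"  # prints: basename: [c]
```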

Likewise, the standard shell parameter substitution ${variable##*/} would, assuming the variable contains no newlines, be equivalent in its effect to passing the value of $variable through the above sed expression in a command substitution, but many times faster.
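
A minimal sketch of that equivalence for a value without newlines. The path used is a hypothetical example, not taken from the original script.

```shell
#!/bin/sh
# Hypothetical sample value; any newline-free path behaves the same way.
variable='/usr/local/bin/tool'

# ##*/ removes the longest prefix matching "*/", i.e. up to the last slash.
via_expansion=${variable##*/}

# The sed route needs a pipeline and a command substitution for the same result.
via_sed=$(printf '%s\n' "$variable" | sed 's=.*/==')

printf '%s\n' "$via_expansion" "$via_sed"   # prints "tool" twice
```

The expansion is done by the shell itself, so no extra process is started.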

The variable substitution and the basename utility also cope correctly with pathnames containing newlines, which sed would not do (since it processes its input line by line).
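
The difference shows up when a directory name contains a newline. The sketch below uses a made-up path for illustration; sed treats the value as two separate lines and edits each one.

```shell
#!/bin/sh
# Hypothetical path whose middle component ("odd<newline>name") contains a newline.
variable='dir/odd
name/file.txt'

via_expansion=${variable##*/}      # file.txt
via_basename=$(basename "$variable")   # file.txt

# sed sees two lines, "dir/odd" and "name/file.txt", and strips each separately,
# so the result is "odd" and "file.txt" on two lines.
via_sed=$(printf '%s\n' "$variable" | sed 's=.*/==')

printf 'expansion: [%s]\n' "$via_expansion"
printf 'basename:  [%s]\n' "$via_basename"
printf 'sed:       [%s]\n' "$via_sed"
```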

Kusalananda
  • Thank you, very detailed. I wonder if it's on man page somewhere, I could not find by searching for alternative word. – Alex Martian Jan 22 '19 at 12:03
  • @AlexeiMartianov For GNU sed, it is documented in the "info pages" for the s (substitute) command. For BSD sed, it's mentioned in the manual, and the POSIX spec for sed also mentions this in connection with the s command. They may not use the wording "alternative delimiter" though. – Kusalananda Jan 22 '19 at 12:11
  • I don't think sed 's/.*[/]//' is POSIX, but then I think the POSIX specification is defective in this instance. – Stéphane Chazelas Jan 22 '19 at 12:39
  • Yes, now I see: "The / characters may be uniformly replaced by any other single character within any given s command." – Alex Martian Jan 22 '19 at 12:43
  • @StéphaneChazelas Oh? I might have to read up on that later. – Kusalananda Jan 22 '19 at 12:51
  • Note that strictly speaking .*/ matches the first sequence of 0 or more characters, as many as possible that is followed by a / character. A file path on Unix generally can be any sequence of non-nul bytes which may not form valid characters, so .*/ may not match everything up to the last /. For instance, on a file path like $'St\xe9phane/donn\xe9es/index.txt', in a UTF-8 locale, it would match on phane/ because those 0xe9 bytes don't form valid characters in UTF-8 (they would in iso8859-1). You would want to set the locale to C for . to match any byte. – Stéphane Chazelas Jan 22 '19 at 16:07
  • @StéphaneChazelas, was there some standard way to match non-characters, or is reverting to the C locale the only way out? – ilkkachu Jan 22 '19 at 16:20
  • It should be added that in sed, the first character after the command s determines the delimiter for the command. You can also write sa.*/aa or something like this. The following delimiters only have to be the same as the one directly after the command. Neither is / the default delimiter, nor is = a certain alternative. From sed's view, they are just the first after the command. – rexkogitans Jan 22 '19 at 16:31
  • @ilkkachu, the behaviour of text utilities like sed is unspecified by POSIX if the input is not text. That includes sequences of bytes that don't form valid characters, input that doesn't end in a newline, and lines bigger than LINE_MAX. The first can be addressed with LC_ALL=C, the second by adding the missing newline; the third can't be. As PATH_MAX is not guaranteed to be smaller than LINE_MAX, in theory you can't deal with arbitrary file paths with text utilities. – Stéphane Chazelas Jan 22 '19 at 16:41
  • There is one character that is not literal within a [...] construct: -. Since this is a "beginner" question, you might want to mention that exception to the rule. – Christopher Schultz Jan 22 '19 at 17:28
  • @ChristopherSchultz Added that. Will move it into a "footnote" though as it distracts. – Kusalananda Jan 22 '19 at 17:42
  • echo "$variable" | sed s=.*/== and echo "${variable##*/}" are not equivalent. The sed option probably does not do what you want if $variable contains a newline. sed's -z option (separate lines by NUL characters) might help. basename "$variable" seems to have similar results to ${variable##*/}. – 8bittree Jan 22 '19 at 18:06
  • @8bittree Standard sed does not have a -z option, but you are otherwise correct. I will modify that bit ever so slightly. – Kusalananda Jan 22 '19 at 18:08
12

From https://backreference.org/2010/02/20/using-different-delimiters-in-sed/:

It's a not-so-known fact that sed can use any character as separator for the "s" command. Basically, sed takes whatever follows the "s" as the separator.

So the slash in the middle becomes just a normal character. If I'm interpreting it correctly, your expression:

s=.*/==

could also be written as:

s/.*\///

and explained as "remove anything before the last slash, including the slash".

Jeff Schaller
jous
10

It greedily removes everything before the last /, and the / itself.

Example:

 echo "nsi/wnx/cmedwcm" | sed 's=.*/=='

Output :

cmedwcm

Here = serves as the delimiter between the regex (.*/) and the replacement (empty).

Siva