First note that sed is a text utility that by default works on one line at a time, while file names can contain any character (including newline) and even non-characters (byte sequences that do not form valid text).
Also, leaving a variable unquoted has a very special meaning (the expansion undergoes word splitting and globbing). You almost never want that, and it is potentially very dangerous.
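To make the quoting point concrete, here is a small sketch that counts how many arguments an expansion produces with and without quotes. The sample value is made up, and the default $IFS is assumed.

```shell
# Hypothetical sample value containing a space; default IFS assumed.
filename='my file_name.txt'

# Unquoted: the expansion undergoes word splitting (and globbing).
set -- $filename
unquoted_count=$#

# Quoted: the value is passed through as a single argument.
set -- "$filename"
quoted_count=$#

printf 'unquoted: %s argument(s), quoted: %s argument(s)\n' \
  "$unquoted_count" "$quoted_count"
```

With this value the unquoted form yields two arguments and the quoted form one, which is why "$filename" is the form used in the rest of this answer.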
Also, you can't use echo to output arbitrary data; use printf instead.
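As an illustration, the sketch below passes two awkward (hypothetical) values through printf with a fixed format. What echo would do with the same values depends on the implementation: it may treat -n as an option or expand the backslash sequence.

```shell
# Hypothetical awkward values: one looks like an option,
# the other contains a backslash sequence.
name1='-n'
name2='a\tb'

# With a fixed '%s\n' format, printf outputs its argument unchanged.
out1=$(printf '%s\n' "$name1")
out2=$(printf '%s\n' "$name2")

printf '%s\n' "$out1" "$out2"
```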
Also, variable assignment syntax in Bourne-like shells is var=value, not $var=value.
You can load the whole output of echo (or better, printf) into sed's pattern space with:
printf '%s\n' "$filename" | sed -e :1 -e '$!{N;b1' -e '}'
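The effect of that loop can be seen with a file name that contains a newline. The sketch below uses a made-up value and a placeholder replacement, [NL], chosen only for illustration; it compares a plain line-by-line substitution with the same substitution after the loop has appended every line to the pattern space.

```shell
# Hypothetical file name containing a newline.
filename='a_b
c_d_e'

# Line by line: each cycle sees a single line, so \n never matches.
per_line=$(printf '%s\n' "$filename" | sed -e 's/\n/[NL]/')

# Slurped: N appends each following line, so the embedded newline
# is in the pattern space when the substitution runs.
slurped=$(printf '%s\n' "$filename" |
  sed -e :1 -e '$!{N;b1' -e '}' -e 's/\n/[NL]/')

printf '%s\n' "$slurped"
```

The first command leaves the input unchanged, while the second prints a_b[NL]c_d_e.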
Then, you can add the code to extract the part between the second and third _:
var2=$(
  printf '%s\n' "$filename" |
    sed -ne :1 -e '$!{N;b1' -e '}' -e 's/^\([^_]*_\)\{2\}\([^_]*\)_.*/\2/p'
)
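A quick way to try this out is to wrap the pipeline in a function. The name third_field and the sample file names below are hypothetical, used only for this sketch.

```shell
# Hypothetical helper wrapping the pipeline above.
third_field() {
  printf '%s\n' "$1" |
    sed -ne :1 -e '$!{N;b1' -e '}' -e 's/^\([^_]*_\)\{2\}\([^_]*\)_.*/\2/p'
}

var2=$(third_field 'foo_bar_baz_qux.txt')   # baz
none=$(third_field 'foo_bar.txt')           # empty: fewer than 3 underscores

printf 'var2=[%s] none=[%s]\n' "$var2" "$none"
```

Because of -n and the p flag, nothing is printed when the pattern does not match, so the variable ends up empty instead of holding the whole file name.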
The non-greedy part is addressed by using [^_]* (a sequence of non-_ characters) which, contrary to .*, guarantees we don't match past _ boundaries (though it would still choke on non-characters in many implementations).
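The difference is easiest to see on a simpler substitution that keeps only what precedes an underscore. This is a sketch on a made-up value, not part of the extraction command itself.

```shell
# Hypothetical sample value.
filename='a_b_c'

# .* is greedy: it runs up to the LAST underscore.
greedy=$(printf '%s\n' "$filename" | sed -e 's/^\(.*\)_.*/\1/')

# [^_]* cannot cross an underscore: it stops at the FIRST one.
bounded=$(printf '%s\n' "$filename" | sed -e 's/^\([^_]*\)_.*/\1/')

printf 'greedy=%s bounded=%s\n' "$greedy" "$bounded"
```

This prints greedy=a_b bounded=a.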
In this case, you could use shell parameter expansion operators instead:
case $filename in
  (*_*_*_*) var2=${filename#*_*_}; var2=${var2%%_*};;
  (*) var2=;;
esac
This works better if the file name is not text or if the part you want to extract ends in a newline character (command substitution strips trailing newlines), and it is also more efficient.
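The sketch below wraps the case statement in a function (extract_third is a hypothetical name) and runs it on two made-up values: a plain one, and one whose third field ends in a newline. ${1#*_*_} removes the shortest prefix matching *_*_, and ${var2%%_*} then removes the longest suffix starting at an underscore.

```shell
# Hypothetical helper wrapping the case statement above; sets var2.
extract_third() {
  case $1 in
    (*_*_*_*) var2=${1#*_*_}; var2=${var2%%_*};;
    (*) var2=;;
  esac
}

extract_third 'foo_bar_baz_qux.txt'
plain=$var2            # baz

nl='
'
extract_third "a_b_c${nl}_d.txt"
with_newline=$var2     # "c" followed by a newline

printf 'lengths: %s %s\n' "${#plain}" "${#with_newline}"
```

The second length is 2 because the trailing newline is preserved; no command substitution is involved, so nothing gets stripped.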
Some shells like zsh or ksh93 have more advanced operators:
zsh:
Split on _ and get the third field:
var2=${"${(@s:_:)filename}"[3]}
Using the ${var/pattern/replacement} operator with back-references (in that case, first verify that the variable contains at least 3 underscores, or there won't be any substitution):
set -o extendedglob
var2=${filename/(#b)*_*_(*)_*/$match[1]}
ksh93:
var2=${filename/*_*_@(*)_*/\1}