14

I'm trying to set a variable in an sh script to the last 3 characters of the base name of a file (by base name I mean without the path and without the suffix). I've succeeded in doing this but, purely out of curiosity, I'm wondering if there is a shorter, single command I can use. Originally I had a one-liner with awk, but it was rather long. Currently I have this two-line script (assuming a complete filename is in $1):

filebase=`basename "$1"`
lastpart=`echo -n ${filebase%.*} | tail -c3`

So for example, "/path/to/somefile.txt" ends up with "ile" in $lastpart.

Can I somehow combine basename and the bit to strip the suffix into a single command, and is there a way to send it to tail (or something else I can use) without using a pipe? The suffix is unknown so I can't pass it as a parameter to basename.

The main goal isn't actually so much to be as short as possible, as to be as readable at a glance as possible. The actual context of all of this is this question on Superuser, where I'm trying to come up with a reasonably simple answer.

Jason C

7 Answers

14
var=123456
echo "${var#"${var%???}"}"

###OUTPUT###

456

The inner expansion, ${var%???}, strips the last three characters from $var; the outer ${var#...} then strips that result from the front of $var, which leaves only the last three characters. Here are some examples more specifically aimed at demonstrating how you might do such a thing:

touch file.txt
path=${PWD}/file.txt
echo "$path"

/tmp/file.txt

base=${path##*/}
exten=${base#"${base%???}"}
base=${base%."$exten"}
{ 
    echo "$base" 
    echo "$exten" 
    echo "${base}.${exten}" 
    echo "$path"
}

file
txt
file.txt
/tmp/file.txt

You don't have to spread this all out through so many commands. You can compact this:

{
    base=${path##*/} exten= 
    printf %s\\n "${base%.*}" "${exten:=${base#"${base%???}"}}" "$base" "$path"
    echo "$exten"
}

file 
txt 
file.txt 
/tmp/file.txt
txt
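For the goal stated in the question (the last three characters of the name with both the directory and the suffix removed), the same expansions can be chained. What follows is only a sketch: the function name last3 is made up for illustration, and it assumes the suffix is whatever follows the last dot.

```sh
# last3 (hypothetical helper): print the last three characters of a
# file's base name, ignoring the directory and the final ".suffix".
# The body runs in a subshell, so $name does not leak to the caller.
last3() (
    name=${1##*/}       # drop everything up to the last slash
    name=${name%.*}     # drop the last dot and what follows it
    case $name in
        ???*) printf '%s\n' "${name#"${name%???}"}" ;;  # 3+ chars: keep the tail
        *)    printf '%s\n' "$name" ;;                  # shorter: print as-is
    esac
)

last3 /path/to/somefile.txt    # prints "ile"
last3 /path/to/123456.txt      # prints "456"
```

The case guard is there because ${name%???} leaves a name shorter than three characters unchanged, and stripping that from the front would then yield an empty string.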

Combining $IFS with setting shell parameters can also be a very effective means of parsing and drilling through shell variables:

(IFS=. ; set -f; set -- ${path##*/}; printf %s "${1#"${1%???}"}")

That will get you only the three characters immediately preceding the first period following the last / in $path. If you want to retrieve only the three characters immediately preceding the last . in $path (for instance, if there is a possibility of more than one . in the filename):

(IFS=.; set -f; set -- ${path##*/}; [ "$#" -gt 2 ] && shift "$(($# - 2))"; printf %s "${1#"${1%???}"}")

In both cases you can do:

newvar=$(IFS...)

And...

(IFS...;printf %s "$2")

...will print what follows the .
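Spelled out step by step, the $IFS approach might look like the sketch below. The function name stem_tail_and_ext is hypothetical, and the shift is written with an explicit [ "$#" -gt 2 ] test so that it does not depend on how the shell splits words while IFS is set to a dot.

```sh
# stem_tail_and_ext (hypothetical name): split the base name on dots, then
# print the last three characters of the next-to-last field and the last field.
# The body runs in a subshell, so IFS and set -f do not affect the caller.
stem_tail_and_ext() (
    IFS=.                  # split on dots only
    set -f                 # keep the unquoted expansion from globbing
    set -- ${1##*/}        # fields of the base name become $1, $2, ...
    # With more than two fields, keep only the last two.
    [ "$#" -gt 2 ] && shift "$(($# - 2))"
    printf '%s %s\n' "${1#"${1%???}"}" "$2"
)

stem_tail_and_ext /tmp/some.archive.txt   # prints "ive txt"
```

Note that only the field directly before the last dot is examined, so a field shorter than three characters (the c in a.b.c.txt) produces an empty first value.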

If you don't mind using an external program you can do:

printf %s "${path##*/}" | sed 's/.*\(...\)\..*/\1/'

If there is a chance of a \newline character in the filename (not applicable for the native shell solutions - they all handle that anyway):

printf %s "${path##*/}" | sed 'H;$!d;g;s/.*\(...\)\..*/\1/'
mikeserv
  • Thanks; this gives me something to think about, although it doesn't quite work. With this, /path/to/123456.txt becomes txt, but I need to get the bit before the extension (456). I'll have to find out how to combine it with the %.* thing, which is also kind of new to me. – Jason C Jun 23 '14 at 04:32
  • 1
    It is, thanks. I've also found documentation. But to get the last 3 characters from $base there, the best I could do was the three-line name=${var##*/} ; base=${name%%.*} ; lastpart=${base#${base%???}}. On the plus side it's pure bash, but it's still 3 lines. (In your example of "/tmp/file.txt" I'd need "ile" rather than "file".) I did just learn a lot about parameter substitution; I had no idea it could do that... pretty handy. I do find it very readable, as well, personally. – Jason C Jun 23 '14 at 04:46
  • 1
    @JasonC - this is fully portable behavior - it is not bash specific. I recommend reading this. – mikeserv Jun 23 '14 at 04:57
  • 1
    Well, I guess, I can use % instead of %% to remove the suffix, and I don't actually need to strip the path, so I can get a nicer, two line noextn=${var%.*} ; lastpart=${noextn#${noextn%???}}. – Jason C Jun 23 '14 at 04:57
  • 1
    @JasonC - yes, that looks like it would work. It will break if there is $IFS in ${noextn} and you do not quote the expansion. So, this is safer: lastpart=${noextn#"${noextn%???}"} – mikeserv Jun 23 '14 at 04:59
  • 1
    @JasonC - last, if you found the above helpful, you might want to look at this. It deals with other forms of parameter expansion and the other answers to that question are really good too. And there are links to two other answers on the same subject within. If you want. – mikeserv Jun 23 '14 at 05:12
  • 1
    @mikeserv: Your solution will fail when suffix is not 3 characters. – cuonglm Jun 23 '14 at 07:40
  • @Gnouc - I know. But he asked for the last three characters. In fact he doesn't want the extension; if you look at his comments, from file.txt he wants ile. – mikeserv Jun 23 '14 at 07:42
  • @Gnouc - I guess another way could be IFS=/. ; set -- $path ; unset IFS ; shift $((${#}-2)) ; printf %s\\n "${1#"${1%???}"}" "$2" – mikeserv Jun 23 '14 at 07:57
  • @mikeserv: cool! But I like to use perl with parsing text like this, see my answer. – cuonglm Jun 23 '14 at 08:09
  • @mikeserv He clarified that he wanted the last three characters of the filename without the suffix. Maybe you should modify your answer to include the solution from your last comment. – Dubu Jun 23 '14 at 08:22
  • @Dubu - I think I might. But the answer was more about demonstrating how. Still, it's a very good idea. – mikeserv Jun 23 '14 at 08:22
  • 1
    The quotes in lastpart=${noextn#"${noextn%???}"} are not about IFS, it's to prevent ${noextn%???} from being taken as a pattern. – Stéphane Chazelas Jun 23 '14 at 14:07
  • Thanks, @StephaneChazelas. I've never thought about it like that before - just had enough bad experiences nesting expansions to always quote those. But, to clarify - you mean the expansion of ${...}, right? Like in the case p='???' ; ${strip%$p} and ${strip%"$p"}: the first drops the last three characters, whereas the second drops the last three characters only if they're '?'... right? It will always expand, right? – mikeserv Jun 23 '14 at 14:38
  • 1
    yes. It's ${var#$pattern} or ${var#"$string"}. Same goes for case $x in $pattern) "$string") or [[ $x = $pattern ]]/[[ $x = "$string" ]] in ksh or bash. – Stéphane Chazelas Jun 23 '14 at 15:10
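The pattern-versus-string distinction from the comments above can be seen with a value that contains a glob character. This is a small sketch; the value x*abc is made up for the demonstration.

```sh
var='x*abc'
head=${var%???}            # head is the two characters: x*

unquoted=${var#$head}      # $head acts as a pattern; x* matches just "x"
quoted=${var#"$head"}      # "$head" is taken as the literal string x*

printf '%s\n' "$unquoted"  # prints "*abc"
printf '%s\n' "$quoted"    # prints "abc"
```

Only the quoted form reliably returns the last three characters, which is why the inner expansion is quoted in lastpart=${noextn#"${noextn%???}"}.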
6

That's a typical job for expr:

$ file=/path/to/abcdef.txt
$ expr "/$file" : '.*\([^/.]\{3\}\)\.[^/.]*$'
def

If you know your file names have the expected format (contains one and only one dot and at least 3 characters before the dot), that can be simplified to:

expr "/$file" : '.*\(.\{3\}\)\.'

Note that the exit status will be non-zero if there's no match, but also if the matched part is a number that resolves to 0. (like for a000.txt or a-00.txt)
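Because of that exit-status quirk, a script may prefer to test the captured text rather than the status of expr. The sketch below shows one way to do that; the wrapper name last3_expr is hypothetical.

```sh
# last3_expr (hypothetical name): wrap the expr call and decide success
# from the captured text instead of from expr's exit status.
# Note: this sets the global variable $part as a side effect.
last3_expr() {
    # expr exits non-zero both for "no match" and for a match such as 000,
    # so its status is deliberately ignored here.
    part=$(expr "/$1" : '.*\([^/.]\{3\}\)\.[^/.]*$') || :
    [ -n "$part" ] || return 1
    printf '%s\n' "$part"
}

last3_expr /path/to/abcdef.txt   # prints "def"
last3_expr /path/to/a000.txt     # prints "000" even though expr itself exits 1
```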

With zsh:

file=/path/to/abcdef.txt
lastpart=${${file:t:r}[-3,-1]}

(:t for tail (basename), :r for root (with the extension removed)).

  • 2
    Nice. expr is another one I need to get familiar with. I really like the zsh solutions in general (I was just reading about its support for nested substitutions on the left side of a ${} yesterday too and wishing sh had the same), it's just a bummer that it isn't always present by default. – Jason C Jun 23 '14 at 14:49
  • @JasonC - you'll get no hard feelings from me if you change your accepted answer - this is a better one, I think. – mikeserv Jun 23 '14 at 14:54
  • @mikeserv Thanks; your answer taught me a lot too. I am going to change this because this one is short and succinct; but let's blame SE for not giving us runner-ups. – Jason C Jun 23 '14 at 15:09
  • 2
    @JasonC - the information matters most. Make the best of it as accessible as you can - that's the whole point of the system anyway. If rep bought food I might get upset, but more often (than never) info brings home the bacon – mikeserv Jun 23 '14 at 15:11
  • 1
    @mikeserv "Request: Exchange rep for bacon"; look out meta here I come. – Jason C Jun 23 '14 at 15:12
  • 1
    @mikeserv, yours is POSIX, uses builtins only and doesn't fork any process. Not using command substitution also means you avoid problems with trailing newlines, so it's a good answer too. – Stéphane Chazelas Jun 23 '14 at 15:13
  • Well, gracias, Stephane. But are you saying expr isn't POSIX? I thought for sure it was! I take it all back. But anyway, I've often noticed good answers that came a little later don't get noticed so well as they should after an answer has been accepted. Mine's already got 7 votes - that's good enough. This'll get seen now. That's all I cared about. – mikeserv Jun 23 '14 at 15:19
  • 1
    @mikeserv, I didn't mean to imply expr was not POSIX. It certainly is. It's rarely built-in though. – Stéphane Chazelas Jun 23 '14 at 15:22
4

If you can use perl:

lastpart=$(
    perl -e 'print substr((split(/\.[^.]*$/,shift))[0], -3, 3)
            ' -- "$(basename -- "$1")"
)
cuonglm
  • that is cool. got my vote. – mikeserv Jun 23 '14 at 08:19
  • A bit more concise: perl -e 'shift =~ /(.{3})\.[^.]*$/ && print $1' $filename. An additional basename would be needed if the filename might contain no suffix but some directory in the path does. – Dubu Jun 23 '14 at 08:28
  • @Dubu: Your solution always fails if filename has no suffix. – cuonglm Jun 23 '14 at 08:35
  • 1
    @Gnouc This was by intent. But you're right, this could be wrong depending on the purpose. Alternative: perl -e 'shift =~ m#(.{3})(?:\.[^./]*)?$# && print $1' $filename – Dubu Jun 23 '14 at 08:49
2

sed works for this:

[user@host ~]$ echo one.two.txt | sed -r 's|(.*)\..*$|\1|;s|.*(...)$|\1|'
two

Or

[user@host ~]$ sed -r 's|(.*)\..*$|\1|;s|.*(...)$|\1|' <<<one.two.txt
two

If your sed doesn't support -r, just replace the instances of () with \( and \), and then -r isn't needed.
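Written out, the version without -r might look like this. It is a sketch of the substitution just described, wrapped in a hypothetical function named last3_sed.

```sh
# last3_sed (hypothetical name): the same two substitutions in basic
# regular expression syntax, so no -r is needed.
last3_sed() {
    printf '%s\n' "$1" |
        sed 's|\(.*\)\..*$|\1|;s|.*\(...\)$|\1|'
}

last3_sed one.two.txt    # prints "two"
```

The first substitution drops the last dot and everything after it; the second keeps only the final three characters of what remains.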

1

If perl is available, I find it can be more readable than other solutions, specifically because its regex language is more expressive and it has the /x modifier, which allows for writing clearer regexes:

perl -e 'print $1 if shift =~ m{ ( [^/]{3} ) [.] [^./]* \z }x' -- "$file"

This prints nothing if there is no such match (if the basename has no extension or if the root before the extension is too short). Depending on your requirements, you can adjust the regex. This regex enforces the constraints:

  1. It matches the 3 characters before the final extension (the part after and including the last dot). These 3 characters can contain a dot.
  2. The extension can be empty (except for the dot).
  3. The matched part and the extension must be part of the basename (the part after the last slash).

Using this in a command substitution has the normal issues with removing too many trailing newlines, a problem which also affects Stéphane's answer. It can be dealt with in both cases, but is a little easier here:

lastpart=$(
  perl -e 'print "$1x" if shift =~ m{ ( [^/]{3} ) [.] [^./]* \z }x' -- "$file"
)
lastpart=${lastpart%x}  # allow for possible trailing newline
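The effect of the sentinel character can be seen without perl. In this sketch, the value raw is a made-up string that ends in two newlines:

```sh
nl='
'
raw="ab${nl}${nl}"                 # hypothetical value ending in two newlines

stripped=$(printf '%s' "$raw")     # trailing newlines are removed: "ab"

kept=$(printf '%sx' "$raw")        # the sentinel x protects them
kept=${kept%x}                     # drop the sentinel again

printf '%s\n' "${#stripped}"       # prints 2
printf '%s\n' "${#kept}"           # prints 4
```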
jrw32982
0

Python 2.7

$ echo /path/to/somefile.txt | python -c "import sys, os; print '.'.join(os.path.basename(sys.stdin.read()).split('.')[:-1])[-3:]"
ile

$ echo file.one.two.three | python -c "import sys, os; print '.'.join(os.path.basename(sys.stdin.read()).split('.')[:-1])[-3:]"
two
0

I think this bash function, pathStr(), will do what you are looking for.

It doesn't require awk, sed, grep, perl or expr. It uses only bash builtins so it's quite fast.

I've also included the dependent argsNumber and isOption functions but their functionalities could be easily incorporated into pathStr.

The dependent function ifHelpShow is not included as it has numerous subdependencies for outputting the help text either on the terminal commandline or to a GUI dialog box via YAD. The help text passed to it is included for documentation. Advise if you would like ifHelpShow and its dependents.

function  pathStr () {
  ifHelpShow "$1" 'pathStr --OPTION FILENAME
    Given FILENAME, pathStr echos the segment chosen by --OPTION of the
    "absolute-logical" pathname. Only one segment can be retrieved at a time and
    only the FILENAME string is parsed. The filesystem is never accessed, except
    to get the current directory in order to build an absolute path from a relative
    path. Thus, this function may be used on a FILENAME that does not yet exist.
    Path characteristics:
        File paths are "absolute" or "relative", and "logical" or "physical".
        If current directory is "/root", then for "bashtool" in the "sbin" subdirectory ...
            Absolute path:  /root/sbin/bashtool
            Relative path:  sbin/bashtool
        If "/root/sbin" is a symlink to "/initrd/mnt/dev_save/share/sbin", then ...
            Logical  path:  /root/sbin/bashtool
            Physical path:  /initrd/mnt/dev_save/share/sbin/bashtool
                (aka: the "canonical" path)
    Options:
        --path  Absolute-logical path including filename with extension(s)
                  ~/sbin/file.name.ext:     /root/sbin/file.name.ext
        --dir   Absolute-logical path of directory containing FILENAME (which can be a directory).
                  ~/sbin/file.name.ext:     /root/sbin
        --file  Filename only, including extension(s).
                  ~/sbin/file.name.ext:     file.name.ext
        --base  Filename only, up to last dot(.).
                  ~/sbin/file.name.ext:     file.name
        --ext   Filename after last dot(.).
                  ~/sbin/file.name.ext:     ext
    Todo:
        Optimize by using a regex to match --options so getting argument only done once.
    Revised:
        20131231  docsalvage'  && return
  #
  local _option="$1"
  local _optarg="$2"
  local _cwd="$(pwd)"
  local _fullpath=
  local _tmp1=
  local _tmp2=
  #
  # validate there are 2 args and first is an --option
  [[ $(argsNumber "$@") != 2 ]]                        && return 1
  ! isOption "$@"                                      && return 1
  #
  # determine full path of _optarg given
  if [[ ${_optarg:0:1} == "/" ]]
  then
    _fullpath="$_optarg"
  else
    _fullpath="$_cwd/$_optarg"
  fi
  #
  case "$_option" in
   --path)  echo "$_fullpath"                            ; return 0;;
    --dir)  echo "${_fullpath%/*}"                       ; return 0;;
   --file)  echo "${_fullpath##*/}"                      ; return 0;;
   --base)  _tmp1="${_fullpath##*/}"; echo "${_tmp1%.*}" ; return 0;;
    --ext)  _tmp1="${_fullpath##*/}";
            _tmp2="${_tmp1##*.}";
            [[ "$_tmp2" != "$_tmp1" ]]  && { echo "$_tmp2"; }
            return 0;;
  esac
  return 1
}

function argsNumber () {
  ifHelpShow "$1" 'argsNumber "$@"
  Echos number of arguments.
  Wrapper for "$#" or "${#@}" which are equivalent.
  Verified by testing on bash 4.1.0(1):
      20140627 docsalvage
  Replaces:
      argsCount
  Revised:
      20140627 docsalvage'  && return
  #
  echo "$#"
  return 0
}

function isOption () {
  # isOption "$@"
  # Return true (0) if argument has 1 or more leading hyphens.
  # Example:
  #     isOption "$@"  && ...
  # Note:
  #   Cannot use ifHelpShow() here since cannot distinguish 'isOption --help'
  #   from 'isOption "$@"' where first argument in "$@" is '--help'
  # Revised:
  #     20140117 docsalvage
  # 
  # support both short and long options
  [[ "${1:0:1}" == "-" ]]  && return 0
  return 1
}


DocSalvager
  • I don't understand - it's already been demoed here how to do similar fully portably - without bashisms - seemingly simpler than this. Also, what is ${#@}? – mikeserv Jun 27 '14 at 03:30
  • This just packages the functionality into a reusable function. re:${#@}... Manipulating arrays and their elements requires the full variable notation ${}. $@ is the 'array' of arguments. ${#@} is the bash syntax for the number of arguments. – DocSalvager Jun 27 '14 at 10:17
  • No, $# is the syntax for the number of arguments, and it is also used elsewhere here. – mikeserv Jun 27 '14 at 14:52
  • You are right that "$#" is the widely documented syntax for "number of arguments." However, I've just reverified that "${#@}" is equivalent. I wound up with that after experimenting with the differences and similarities between positional arguments and arrays. The latter comes from the array syntax, which apparently is a synonym for the shorter, simpler "$#" syntax. I've altered and documented argsNumber() to use "$#". Thanks! – DocSalvager Jun 28 '14 at 00:09
  • ${#@} is not equivalent in most cases - the POSIX spec states the results of any parameter expansions on either $@ or $* are unspecified, unfortunately. It may work in bash but that is not a reliable feature, I guess is what I'm trying to say. – mikeserv Jun 28 '14 at 00:24
  • Added bash version to internal help for function argsNumber. – DocSalvager Jun 28 '14 at 00:47
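As a small illustration of the point settled in these comments, "$#" gives the argument count both inside a function and at the top level of a script. The function name count_args below is hypothetical.

```sh
# count_args (hypothetical name): report how many arguments were passed.
count_args() {
    printf '%s\n' "$#"
}

count_args a 'b c' d    # prints 3

set -- one two          # replace the script's own positional parameters
printf '%s\n' "$#"      # prints 2
```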