I would like to extract the numeric part of the file names that begin with "hsli" and end with ".h5" in Bash on Ubuntu 14.04.1 64-bit LTS. My ls -l hsli* output is as follows:

-rwxrwxrwx 1 ongun ongun 31392 Feb 26 13:04 hsli0.03.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 13:44 hsli0.042.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 14:24 hsli0.054.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 15:03 hsli0.066.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 15:42 hsli0.078.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 16:22 hsli0.09.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 17:02 hsli0.102.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 17:36 hsli0.114.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 17:58 hsli0.126.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 18:20 hsli0.138.h5
-rwxrwxrwx 1 ongun ongun 31392 Feb 26 18:42 hsli0.15.h5

They are already in ascending order, and after a bit of manipulation I can get the name of the first file. The command and its output follow:

$ ls -l hsli* | head -1 | rev | cut -f 1 -d " " | rev 
hsli0.03.h5

Now my aim is to extract 0.03 from this name. How can I do so? I am not familiar with regular expressions, and this seems like a hard case since there are two dots in the file name.

Vesnog
  • ls hsli* | head -1 | sed 's/[0-9]*\.[0-9]*//' or even ls hsli* | head -1 | sed 's/[0-9.]\+/./' – Costas Feb 26 '15 at 19:56
  • Will try it @Costas, thanks. The command got rid of the digits altogether and the output is hsli.h5. – Vesnog Feb 26 '15 at 19:59
  • If there are definitely no \newlines in the filenames, do: \ls -d ./hsli* | cut -d. -f3 for the whole list - add head -n1 to the end. @Costas - you can drop head if you just add a ;q to the tail of your command. – mikeserv Feb 26 '15 at 20:01
  • Of course, without ls, you can do: set -- hsli*; set -- "${1#*.}"; echo "${1%.*}" – mikeserv Feb 26 '15 at 20:03
  • @mikeserv It gives 03 as the output not 0.03. Can I manually prepend a dot in the beginning, say with sed? – Vesnog Feb 26 '15 at 20:05
  • @Vesnog - ok, so do printf %.02f\\n ".$(earlier cmd)" - it's probably better than echo anyway. Or for the second version just ...;echo "0.${1%.*}". Oh, and maybe add a -s switch to cut so you only work with filenames that definitely contain the right amount of . dots. – mikeserv Feb 26 '15 at 20:07
  • @mike Okay the second version worked like a charm but I could not get the first one to work. – Vesnog Feb 26 '15 at 20:14
  • @Vesnog - well, for the second one, you might want to do a test first before the echo (in case the filename you search for doesn't exist or doesn't have the right number of dots). I'll do an answer. – mikeserv Feb 26 '15 at 20:17
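
The prefix and suffix stripping discussed in these comments can be sketched for a single file name as follows. This is only a minimal illustration, assuming names of the form hsli<number>.h5; the function name hsli_number is hypothetical and not part of the original thread.

```shell
# Hypothetical helper: print the numeric part of a name such as hsli0.03.h5.
hsli_number() {
    name=${1##*/}        # drop any leading directory, e.g. ./hsli0.03.h5
    name=${name#hsli}    # strip the "hsli" prefix
    name=${name%.h5}     # strip the ".h5" suffix
    printf '%s\n' "$name"
}

hsli_number hsli0.03.h5    # prints 0.03
```

Stripping the fixed prefix and suffix, rather than cutting on dots, keeps the leading 0. intact, so nothing has to be prepended afterwards.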

2 Answers

Without ls, since you're just populating its list with shell globs anyway, you can cut out the middle-man like:

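glob_hsli()(
    set -- hsli*.*.h5                   # expand the glob into the positional parameters
    for h5 do
        h5=${h5#hsli}; h5=${h5%.h5}     # strip the "hsli" prefix and the ".h5" suffix
        case $h5 in
            (*[!0-9.]*|*.*.*|.*|*.) ;;  # skip anything that is not digits.digits
            (*) printf '%s\n' "$h5"     # print the first valid match and stop
                return ;;
        esac
    done
    echo 'No Match Found!' >&2          # nothing matched: report on stderr and fail
    return 1
)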

Call it without arguments: it globs the hsli* files in the current directory and prints only the first matching middle part (the number between hsli and .h5). If it cannot find a match, it returns with an error and prints a meaningful message to stderr.

mikeserv
  • Should I save this as a separate file in the same directory? I have never done this before. – Vesnog Feb 26 '15 at 20:34
  • @Vesnog - You can if you like - you can then source it like . ./filename.fn (or whatever you name it). Or you can copy/paste it into your command-line. After doing either thing you'd just call it like glob_hsli. I think I got it ironed out to handle all outside cases now, as well. It will recurse if it needs to get a match for *.*.h5 without also matching *.*.*.h5 - but it will quit as soon as it can regardless. With your above dataset, one iteration should be all it takes. – mikeserv Feb 26 '15 at 20:37
  • Thanks once again. While we are at it, I have another file that has a line with a word like reso=35; how can I extract 35 here? I tried glob_hsli and it returns 0. 03.h5 – Vesnog Feb 26 '15 at 20:43
  • @Vesnog - sed '/\n/P;//!s/reso=\([0-9]\{1,\}\)/\n\1\n/;D' <file - but you might want to use literal newlines rather than the ns in the right hand side of the s///ubstitution. Look here... – mikeserv Feb 26 '15 at 20:47
  • Thanks that works for the reso=35. For the first part I think I will go with the solution in your comment since I am sure that the hsli files exist in that directory. However, I cannot figure out how to use it for the maximum number since it does not use ls. – Vesnog Feb 26 '15 at 21:01
  • For example I have hsli0.15.h5 also in the same directory and would like to extract 0.15, if you look at the ls output in my original post it might illustrate my question better. – Vesnog Feb 26 '15 at 21:11
  • I will try now. What do you think about the 0.15, by the way? – Vesnog Feb 26 '15 at 21:25
  • @Vesnog - oh! I get it. I do think it should glob .15 before in order before .03, but if not we can explicitly test for that. It's not so hard. – mikeserv Feb 26 '15 at 21:28
  • It works like a charm now with the latest version it prints 0.03. I did not understand your last argument though. – Vesnog Feb 26 '15 at 21:34
  • The latest version does not provide any output. – Vesnog Feb 26 '15 at 21:44
  • Yes you are right. – Vesnog Feb 26 '15 at 21:55
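
The question raised in these comments, how to get the largest value as well as the smallest without ls, can be sketched with positional parameters. This is a hedged illustration rather than the answerer's method: the function name hsli_range is hypothetical, and it assumes the names are passed in already sorted, which is how the shell expands a glob such as hsli*.h5.

```shell
# Hypothetical helper: print the numeric parts of the first and last names given.
hsli_range() {
    [ "$#" -gt 0 ] || { echo 'No Match Found!' >&2; return 1; }
    first=$1
    for last in "$@"; do :; done             # after the loop, last holds the final argument
    first=${first#hsli}; first=${first%.h5}  # strip prefix and suffix from the first name
    last=${last#hsli};   last=${last%.h5}    # strip prefix and suffix from the last name
    printf '%s %s\n' "$first" "$last"
}

hsli_range hsli0.03.h5 hsli0.09.h5 hsli0.15.h5    # prints 0.03 0.15
```

In the directory from the question it would be called as hsli_range hsli*.h5. Because the glob sorts lexicographically, the last name is the largest number only as long as every name starts with the same integer part, as it does here.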

Bash makes it relatively easy to apply a transformation like stripping prefixes and suffixes to elements of an array.

shopt -s nullglob                  # if there are no matches, produce an empty list
versions=(hsli*.h5)                # list matches
versions=("${versions[@]#hsli}")   # strip prefix
versions=("${versions[@]%.h5}")    # strip suffix
printf '%s\n' "${versions[@]}"     # print one version per line
for v in "${versions[@]}"; do      # execute a command on each version
  somecommand "$v"
done

Note that the versions (if that's what they are) are sorted in lexicographic order, so e.g. 0.9 comes after 0.10. If you want a numerical order and you have recent enough versions of GNU coreutils, you can use sort -V to sort 0.9 before 0.10. Given that your file names don't contain whitespace or globbing characters, you can sort them with

versions=($(printf '%s\n' "${versions[@]}" | sort -V))
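
If the numbers are decimal fractions (such as the values in the question) rather than version strings, sort -V is probably not what you want: it compares the digit runs after the dot as integers, so 0.15 sorts before 0.042. A minimal sketch using a plain numeric sort instead; the function name sort_numeric is hypothetical, and LC_ALL=C is set on the assumption that the decimal separator is a dot.

```shell
# Hypothetical helper: print the given decimal numbers one per line, in numeric order.
sort_numeric() {
    printf '%s\n' "$@" | LC_ALL=C sort -n
}

sort_numeric 0.15 0.042 0.9 0.10
# prints, one per line: 0.042 0.10 0.15 0.9
```

With the array from above, this would be called as sort_numeric "${versions[@]}".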
  • Thanks, your help is much appreciated. I was too confused and got the job done with ls -l hsli* | tail -1 | rev | cut -f 1 -d " " | rev | sed -e 's/[a-z]*//' -e 's/.h5//' and the same command with head in the second pipe to get the first file's number. The numbers correspond to frequencies in an FDTD (Finite Difference Time Domain) analysis. I would like to learn much more about sed, awk and regexes when I find the time, by the way. Where can I start? – Vesnog Feb 26 '15 at 23:20