When archiving certain data we encode the archive's sha1 HASH within the file name so as to determine the integrity of the archive.

I am trying to find a way to automate the integrity check by extracting the HASH out of the file name:

echo myid123_2019-08-31_b7769c0e22c7f75b2935afad499852630ca83145.tar.xz | sed -n 's/^.*\([[:xdigit:]]{40}\).*$/\1/p'

echo myid123_2019-08-31_b7769c0e22c7f75b2935afad499852630ca83145.tar.xz | sed -n 's/^.*([0-9a-fA-F]{40}).*$/\1/p'

Both tests above return no results. Am I missing something?

I would prefer to test for the HASH explicitly, rather than by elimination or position, as the filename format can vary. In any case, the hash would be delimited by non-hash characters.

Follow-up:

Thanks for the help.

This is the final product I was looking to create:

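function checkhash () {
 local f HASH
 for f in "$@"
  do
   test -f "$f" || continue
   # Take the first run of 32 to 128 hex digits in the name as the hash.
   HASH=$(printf '%s\n' "$f" | grep -o '[0-9a-fA-F]\{32,128\}' | head -n 1)
   # The digest length identifies the algorithm.
   case ${#HASH} in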
    32)
       echo "${HASH} *${f}" | md5sum -c -
    ;;
    40)
       echo "${HASH} *${f}" | sha1sum -c -
    ;;
    56)
       echo "${HASH} *${f}" | sha224sum -c -
    ;;
    64)
       echo "${HASH} *${f}" | sha256sum -c -
    ;;
    96)
       echo "${HASH} *${f}" | sha384sum -c -
    ;;
    128)
       echo "${HASH} *${f}" | sha512sum -c -
    ;;
    *)
       echo "No Identified HASH found in filename: ${f}"
    ;;
   esac
 done
}
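
One limitation of matching any run of 32 to 128 hex digits is that a run of, say, 50 digits is extracted and then rejected by the case statement, and a path with hex-looking directory names can confuse the match. A minimal sketch of a stricter extraction follows; extract_hash is a hypothetical helper name, and it assumes a sed that supports -E (GNU and BSD sed both do). It accepts only the known digest lengths and requires a non-hex character, or the edge of the name, on both sides:

```shell
# extract_hash is a hypothetical helper, not part of the function above.
# It prints a hex run whose length is exactly one of the known digest
# sizes and which is bounded by non-hex characters or the edge of the name.
extract_hash () {
  name=${1##*/}          # ignore any directory part of the path
  for len in 32 40 56 64 96 128; do
    hash=$(printf '%s\n' "$name" |
      sed -nE "s/^(.*[^0-9a-fA-F])?([0-9a-fA-F]{$len})([^0-9a-fA-F].*)?\$/\\2/p")
    if [ -n "$hash" ]; then
      printf '%s\n' "$hash"
      return 0
    fi
  done
  return 1               # no run of an accepted length was found
}

extract_hash 'myid123_2019-08-31_b7769c0e22c7f75b2935afad499852630ca83145.tar.xz'
extract_hash 'report_2019-08-31.tar.xz' || echo 'no hash found'
```

The first call prints the 40-character digest; the second prints "no hash found". The length of the returned string can then select the checksum tool exactly as in the case statement above.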
Mark C

5 Answers

1

Your examples suggest the hash string begins after the right-most underscore (_) character and ends before the left-most dot (.) character.

If you don't mind a two-step process, you can do it in bash like this:

file_name="myid123_2019-08-31_b7769c0e22c7f75b2935afad499852630ca83145.tar.xz"
name_hash="${file_name%%.*}"
hash="${name_hash##*_}"
echo "$hash"

produces

b7769c0e22c7f75b2935afad499852630ca83145
Sotto Voce
  • Again, this is based on testing for something other than the hash/digest. A hash is a fixed-length run of 32, 40, 56, 64, 96, or 128 characters (depending on the algorithm) consisting of case-insensitive hexadecimal characters [0-9a-f]. The only thing I can count on about the delimiters is that they will be non-hexadecimal characters. – Mark C Sep 01 '22 at 18:55
1

Let me offer something in awk:

echo myid123_2019-08-31_b7769c0e22c7f75b2935afad499852630ca83145.tar.xz | awk -F'[_.]' '{print $3}'
Romeo Ninov
  • The problem here is that the test fixes on the delimiters which cannot be relied upon. I need to test for the embedded HASH string in the file name. – Mark C Sep 01 '22 at 16:47
  • @MarkC, provide a REAL example and we can help you extract the hash. – Romeo Ninov Sep 01 '22 at 17:00
  • You seem to have missed the point. I got there eventually. Thanks to everyone for their efforts. – Mark C Sep 01 '22 at 18:45
0

Perhaps using grep would do a cleaner job:

$ a='myid123_2019-08-31_b7769c0e22c7f75b2935afad499852630ca83145.tar.xz'

$ echo "$a" | grep -o '[0-9a-fA-F]\{40\}'

b7769c0e22c7f75b2935afad499852630ca83145

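Please note that in a BRE the braces of an interval must be escaped as \{...\}; unescaped {...} is treated as an interval only in an ERE (grep -E, sed -E). That is why both sed attempts in the question print nothing.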
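
The same rule can be applied to the sed commands from the question. The following is a sketch rather than a definitive fix: the variable names are only for illustration, and the -E form assumes a sed that supports that option (GNU and BSD sed both do).

```shell
name='myid123_2019-08-31_b7769c0e22c7f75b2935afad499852630ca83145.tar.xz'

# BRE (default): both the group \( \) and the interval \{ \} need backslashes.
bre=$(printf '%s\n' "$name" | sed -n 's/^.*\([[:xdigit:]]\{40\}\).*$/\1/p')

# ERE (-E): parentheses and braces are operators without backslashes.
ere=$(printf '%s\n' "$name" | sed -nE 's/^.*([[:xdigit:]]{40}).*$/\1/p')

echo "$bre"
echo "$ere"
```

Both commands print the 40-character hash for this file name. Because the leading .* is greedy, a longer hex run (for example a 64-character sha256 digest) would yield only its last 40 characters, so grep -o with the exact expected length, or a pattern bounded by non-hex characters, is safer when several digest sizes are in use.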

0

Using sed

$ sed -E 's/([^_]*_){2}([^.]*).*/\2/' input_file
b7769c0e22c7f75b2935afad499852630ca83145
sseLtaH
0
echo "myid123_2019-08-31_b7769c0e22c7f75b2935afad499852630ca83145.tar.xz"|awk -F "_" '{gsub(/\..*/,"",$NF);print $NF}'

output

b7769c0e22c7f75b2935afad499852630ca83145