
I am trying to extract part of a filename: everything after the first _. I have found a working solution, shown below:

file=22NGS71294_S191_R1_001.fastq.gz
echo $file
22NGS71294_S191_R1_001.fastq.gz
echo ${file#*[_ ]}
S191_R1_001.fastq.gz

However, when I assign the filename using a wildcard pattern, it stops working:

file2=*R1*
echo $file2
22NGS71294_S191_R1_001.fastq.gz
echo ${file2#*[_ ]}
22NGS71294_S191_R1_001.fastq.gz

I have no idea why this is not working, since echo $file and echo $file2 print exactly the same result. Could someone please explain this behaviour?

ch_esr
  • 3
  • Instead of echo $file2, do echo "$file2". Then you'll see what is actually in file2, and you'll understand why it doesn't work. – Ljm Dullaart Jan 26 '23 at 23:03
  • ah I see, but echo {$file2} gives 22NGS71294_S191_R1_001.fastq.gz too

    still not sure how I can solve that problem though. I am using an array to loop through certain directories and then extracting the realpath of the files that are sitting in the directories, as well as creating new files names (and the above problem relates to that)

    – ch_esr Jan 26 '23 at 23:17

1 Answer


If you're using sh, ksh, bash (or zsh without having set the globassign option), then filename generation (aka "globbing") does not occur on the RHS of a scalar assignment like file2=*R1*, so file2 holds the literal string *R1* rather than the matching filename.

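Your echo $file2 prints 22NGS71294_S191_R1_001.fastq.gz because the unquoted expansion $file2 is subject to filename generation at that point. In echo ${file2#*[_ ]}, however, the prefix removal is applied to the literal string *R1* first, so the command is effectively equivalent to echo ${'*R1*'#*[_ ]}. That string contains no _ or space, so nothing is removed and the result is the same as echo *R1*, which then globs to the full filename. See for example When is double-quoting necessary?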
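The order of operations described above can be checked with a small sketch. It assumes a scratch directory (created here with mktemp, purely for illustration) that contains only the one file from the question; the variable names literal and stripped are hypothetical helpers, not part of the original question.

```shell
#!/usr/bin/env bash
# Scratch directory holding a single file that matches *R1* (illustrative setup).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 22NGS71294_S191_R1_001.fastq.gz

file2=*R1*                  # no globbing on a scalar assignment: file2 is the string *R1*
literal="$file2"            # quoted expansion keeps the pattern unexpanded
stripped="${file2#*[_ ]}"   # no _ or space in *R1*, so nothing is removed

printf '%s\n' "$literal"    # shows the stored pattern, not a filename
printf '%s\n' "$stripped"   # same pattern, unchanged by the prefix removal
echo $file2                 # unquoted: the pattern is globbed to the filename here
```

Quoting the expansions is what makes the stored value visible; the unquoted echo hides it because the glob is expanded only at that point.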

In ksh/bash/zsh you could use an array assignment - which you probably should be doing anyhow, since in general *R1* might generate more than a single filename. So for example in bash:

shopt -s nullglob

file2=(*R1*)
echo "${file2[@]#*_}"
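Since the pattern may match several files, and the comments mention looping over directories to build new names, the same idea could be written as a loop. This is only a sketch: the scratch directory, the second sample filename, and the suffixes array are assumptions made up for illustration.

```shell
#!/usr/bin/env bash
shopt -s nullglob   # an unmatched glob expands to nothing instead of itself

# Illustrative setup: a scratch directory with two hypothetical R1 files.
demo_dir=$(mktemp -d)
touch "$demo_dir/22NGS71294_S191_R1_001.fastq.gz" \
      "$demo_dir/22NGS71295_S192_R1_001.fastq.gz"

suffixes=()
for path in "$demo_dir"/*R1*; do
    name=${path##*/}           # drop the directory part of the path
    suffixes+=("${name#*_}")   # drop everything up to the first underscore
done

printf '%s\n' "${suffixes[@]}"
```

Stripping the directory first matters because the prefix removal works on the whole string, and a directory name may itself contain an underscore.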

steeldriver
  • 81,074
  • Thanks very much for your help - it's working now! I appreciate the links as well; they helped me understand why. – ch_esr Jan 26 '23 at 23:50