I have a set of files whose names vary in some characters. For example:

IDNR19_15_037_S514_L001_R1_001.fastq
IDNR19_02_016_S238_L001_R1_001.fastq

I would like to remove all of the characters up to the point of S514 and S238, while keeping everything that comes after. Is this possible when the files have different numbers, as shown in my example?

There are around 1,100 files, so doing this manually would be pretty time-consuming.

The closest I have come is:

rename 's/IDNR19_//g' *.fastq

to remove the IDNR19_ portion, but this does not solve my problem.

1 Answer

Assuming these are names of files on disk that you want to rename, not strings stored in a variable or in a text file, you may use a simple shell loop:

for name in *.fastq; do
    newname=${name#*_*_*_}
    printf 'Would move "%s" to "%s"\n' "$name" "$newname"
    # mv -i -- "$name" "$newname"
done

This loops over all names that match the pattern *.fastq in the current directory (you may want to be more specific with this pattern by e.g. changing it to IDNR*.fastq). For each filename, it constructs a new name by removing the shortest prefix that matches the filename globbing pattern *_*_*_, i.e. everything up to and including the third underscore. This is done using a standard parameter expansion, ${name#pattern}.
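To see what the expansion does without touching any files, here is a minimal sketch that applies it to the two sample names from the question. The function name strip_prefix is only an illustration and not part of the loop above. The last line uses ## instead of # to show why the single # matters: ## removes the longest matching prefix rather than the shortest.

```shell
# strip_prefix is a hypothetical helper used for illustration only.
# It prints its argument with the shortest prefix matching *_*_*_ removed.
strip_prefix() {
    printf '%s\n' "${1#*_*_*_}"
}

strip_prefix IDNR19_15_037_S514_L001_R1_001.fastq   # S514_L001_R1_001.fastq
strip_prefix IDNR19_02_016_S238_L001_R1_001.fastq   # S238_L001_R1_001.fastq

# With ## the longest matching prefix is removed instead,
# i.e. everything up to and including the last underscore.
name=IDNR19_15_037_S514_L001_R1_001.fastq
printf '%s\n' "${name##*_*_*_}"                     # 001.fastq
```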

For safety, the mv is commented out. You should run the code once to see that it does the right thing before enabling the mv.
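As an additional check before enabling the mv, you could look for new names that would collide, since two different files could in principle end up with the same shortened name. The following is a rough sketch under the assumption that no filename contains a newline; find_collisions is a made-up name for this example.

```shell
# find_collisions is a hypothetical helper for illustration only.
# It prints every shortened name that would be produced more than once.
# Assumes that no filename contains a newline character.
find_collisions() {
    for name in "$@"; do
        printf '%s\n' "${name#*_*_*_}"
    done | sort | uniq -d
}
```

Calling it as find_collisions *.fastq in the directory with the files prints nothing if all new names are unique.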

Alternatively, use the rename utility that is based on Perl's File::Rename module (there are a number of different utilities with this name, see "What's with all the renames: prename, rename, file-rename?"):

rename -n -v 's/.*?_.*?_.*?_//' -- *.fastq

or shorter,

rename -n -v 's/(.*?_){3}//' -- *.fastq

This does more or less the same thing as the shell code above, but using a Perl substitution. The substitution removes the start of the filename by matching the first three underscore-terminated substrings, each with a non-greedy .*? match. With -n, rename only shows what it would do; remove the -n option when you are confident that it does the right thing.

Kusalananda