First of all, avoid parsing the output of ls. Next, even if you had a good reason to parse ls output (you don't, here), there is no reason to pass it through grep: ls *gz will list only the file and directory names ending with gz (but note that it will also list the contents of directories whose names end with gz unless you use ls -d) and, unlike ls | grep .gz, it will not match files like not.a.gz.file, and it will match files with newlines in their names.
In any case, you don't need or want ls at all here. All you want is for file in *gz, which is a far better approach since it can deal with arbitrary file names (as long as you properly quote your variables).
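To make the difference concrete, here is a small sketch that can be run in a scratch directory. The three file names are made up for illustration (one ordinary, one containing a space, one decoy), and the variable names are mine:

```shell
# Work in a throwaway directory so nothing real is touched.
tmpdir=$(mktemp -d)

# Hypothetical sample names: one ordinary, one with a space, one decoy.
touch "$tmpdir/a_V1.fna.gz" "$tmpdir/b c_V2.fna.gz" "$tmpdir/not.a.gz.file"

# The glob expands to exactly one word per matching file, however odd the name.
glob_count=0
for file in "$tmpdir"/*gz; do
    glob_count=$((glob_count + 1))
done

# grep treats ".gz" as an unanchored regex ("any character, then gz"),
# so the decoy name matches as well.
grep_count=$(ls "$tmpdir" | grep -c '.gz')

echo "glob: $glob_count, ls | grep: $grep_count"

rm -rf -- "$tmpdir"
```

The glob sees the two real .gz files, while the ls | grep pipeline also counts not.a.gz.file.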
So, your loop could be much better written as:
for file in *.gz; do
    newfile=$(echo "$file" | awk -F "_" '{print $1"_"$2".gz"}' | sed 's/ //g')
    mv -- "$file" "$newfile"
done
Note how I also fixed your awk and sed commands, since you hadn't closed the awk part before opening the sed part, and how all variables are quoted. Also note how I removed the commas (,) in your awk print statement, since those would add a space (or whatever you set the OFS variable to) between each printed item. The -- is used to indicate the end of command-line options and ensures the command will work with file names starting with -.
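The effect of the commas is easy to check on a single name. This is only an illustrative sketch: it uses one of the file names from this answer, and the variable names are mine:

```shell
name='GCF_000901975.1_ViralProj181986_genomic.fna.gz'

# With commas, awk joins the printed items with OFS (a space by default).
with_commas=$(printf '%s\n' "$name" | awk -F '_' '{print $1, "_", $2, ".gz"}')

# Without commas, the items are simply concatenated.
concatenated=$(printf '%s\n' "$name" | awk -F '_' '{print $1 "_" $2 ".gz"}')

printf '%s\n' "$with_commas" "$concatenated"
# Prints:
#   GCF _ 000901975.1 .gz
#   GCF_000901975.1.gz
```

The first form is why the original command needed sed to strip spaces; with plain concatenation the sed step has nothing left to remove.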
Next, you don't really need awk or sed at all. You could use the shell:
for f in *gz; do
    echo mv -- "$f" "${f%%_V*}"
done
The ${variable%%pattern} syntax (here, "${f%%_V*}") means "return the value of $variable after removing the longest string matching pattern from the end". So, in this case, it means "remove everything from the first _V to the end of the name". The Bash manual describes it as follows:
${parameter%%word}
The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern (the ‘%’ case) or the longest matching pattern (the ‘%%’ case) deleted. If parameter is ‘@’ or ‘*’, the pattern removal operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with ‘@’ or ‘*’, the pattern removal operation is applied to each member of the array in turn, and the expansion is the resultant list.
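As a quick illustration of shortest versus longest removal, here is a sketch using one of the file names from this answer. The variable names are mine, and the last line assumes you would like to keep the .gz extension, as the awk version above does:

```shell
f='GCF_000901975.1_ViralProj181986_genomic.fna.gz'

longest=${f%%_*}      # longest trailing match of "_*":  GCF
shortest=${f%_*}      # shortest trailing match of "_*": GCF_000901975.1_ViralProj181986
up_to_v=${f%%_V*}     # everything from the first "_V":  GCF_000901975.1

# Assumption: keep the extension by appending it again after the expansion.
keep_ext=${f%%_V*}.gz # GCF_000901975.1.gz

printf '%s\n' "$longest" "$shortest" "$up_to_v" "$keep_ext"
```

Because this name contains only one _V, ${f%_V*} and ${f%%_V*} give the same result here; the two forms differ only when the pattern can match at more than one position.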
Once you are satisfied that the loop prints the mv commands you expect, remove the echo and run it again to actually rename the files.
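If you would rather rehearse the whole procedure before touching real data, one option is to try it in a scratch directory first. This is a sketch under that assumption: the empty files are stand-ins created with touch, using two of the names from this answer:

```shell
# Scratch directory with empty stand-in files.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch GCF_000901975.1_ViralProj181986_genomic.fna.gz \
      GCF_000901995.1_ViralProj181990_genomic.fna.gz

# Dry run: only prints the mv commands that would be executed.
for f in *gz; do
    echo mv -- "$f" "${f%%_V*}"
done

# Real run: same loop without the echo.
for f in *gz; do
    mv -- "$f" "${f%%_V*}"
done

# Show what the directory contains now.
printf '%s\n' *
```

Note that the new names no longer end in .gz, which matches the rename output shown in this answer.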
Finally, if you have perl-rename (called rename on Debian-based Linux distributions), you can also do:
$ rename -n -- 's/_V.*//s' *gz
GCF_000901975.1_ViralProj181986_genomic.fna.gz -> GCF_000901975.1
GCF_000901995.1_ViralProj181990_genomic.fna.gz -> GCF_000901995.1
GCF_001041015.1_ViralProj287961_genomic.fna.gz -> GCF_001041015.1
GCF_001885505.1_ViralProj344311_genomic.fna.gz -> GCF_001885505.1
If that looks OK, remove the -n to actually rename the files.