-1

The following files exist in the current directory:

 GCF_000901975.1_ViralProj181986_genomic.fna.gz  
 GCF_001885505.1_ViralProj344311_genomic.fna.gz
 GCF_000901995.1_ViralProj181990_genomic.fna.gz  
 GCF_001041015.1_ViralProj287961_genomic.fna.gz

and I want to rename them like this:

 GCF_000901975.1
 GCF_001885505.1
 GCF_000901995.1
 GCF_001041015.1

I'm using the script below to do it, but it fails:

 for file in `ls | grep .gz`
 do
    newfile=`echo $file | awk -F "_" '{print $1,"_",$2,".gz"| sed 's/ //g"`
    mv $file $newfile
 done

Can anybody give me some advice? Or maybe I should try "split"? I'd appreciate it.

loki
  • 43
  • 1
    Please copy/paste your shell scripts into http://shellcheck.net and fix the issues it tells you about before posting here so we don't waste time telling you about issues a tool can tell you about. – Ed Morton Jul 14 '22 at 00:19

2 Answers

7

First of all, avoid parsing the output of ls. Next, even if you have a good reason to parse ls output (you don't, here), there is no reason to pass it through grep: ls *gz will list only the file and directory names ending with gz (but note that it will also list the contents of directories whose names end with gz unless you use ls -d). Unlike ls | grep .gz, it will not match files like not.a.gz.file, and it will match files with newlines in their names.

In any case, you neither need nor want ls here. All you want is for file in *gz, which is a far better approach since it can deal with arbitrary file names (as long as you properly quote your variables).
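As a rough illustration of the difference, here is a small sketch that counts what each approach picks up. The scratch directory and the three file names are made up for the demonstration.

```shell
# Scratch directory with made-up file names, so nothing real is touched.
tmp=$(mktemp -d)
touch "$tmp/a.fna.gz" "$tmp/not.a.gz.file" "$tmp/with space.gz"

# In a regex, "." matches any character and nothing anchors the match to the
# end of the name, so not.a.gz.file is counted as well.
grep_count=$(ls "$tmp" | grep -c .gz)

# The glob expands only to names ending in "gz", one word per file,
# even when a name contains whitespace.
glob_count=0
for file in "$tmp"/*gz; do
    glob_count=$((glob_count + 1))
done

echo "grep matched $grep_count names, the glob matched $glob_count"
rm -r -- "$tmp"
```

With these three files, grep counts all three names while the glob only sees the two that really end in gz.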

So, your loop could be much better written as:

for file in *.gz; do
    newfile=$(echo "$file" | awk -F "_" '{print $1"_"$2".gz"}' | sed 's/ //g')
    mv -- "$file" "$newfile"
done

Note how I also fixed your awk and sed commands, since you hadn't closed the awk part before opening the sed part, and how all variables are quoted. Also note how I removed the commas in your awk print statement, since those would add a space (or whatever you set the OFS variable to) between each printed item. The -- is used to indicate the end of command line options and ensures the command will work with file names starting with -.
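To see what the commas do, this short sketch runs both forms of the print statement on one of the file names from the question. The variable names are only for illustration.

```shell
name=GCF_000901975.1_ViralProj181986_genomic.fna.gz

# With commas, awk joins the items with OFS, which is a single space by default.
with_commas=$(echo "$name" | awk -F "_" '{print $1,"_",$2,".gz"}')

# Without commas, the strings are simply concatenated.
concatenated=$(echo "$name" | awk -F "_" '{print $1"_"$2".gz"}')

printf '%s\n' "$with_commas" "$concatenated"
```

The first line printed is "GCF _ 000901975.1 .gz" and the second is "GCF_000901975.1.gz", which is why the original script needed sed to strip the spaces.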

Next, you don't really need awk or sed at all. You could use the shell:

for f in *gz; do 
    echo mv -- "$f" "${f%%_V*}"
done

The ${variable%%pattern} syntax ("${f%%_V*}") means "return the value of $variable after removing the longest string matching $pattern from the end". So, in this case, it means "remove everything from the first _V to the end of the name". You can read more about it here:

${parameter%%word}

The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern (the ‘%’ case) or the longest matching pattern (the ‘%%’ case) deleted. If parameter is ‘@’ or ‘*’, the pattern removal operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with ‘@’ or ‘*’, the pattern removal operation is applied to each member of the array in turn, and the expansion is the resultant list.

Once you are satisfied that it works as expected, remove the echo and run it again to actually rename the files.
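To make the difference between % and %% concrete, here is a small sketch using one of the file names from the question. The variable names are only for illustration.

```shell
f=GCF_000901975.1_ViralProj181986_genomic.fna.gz

shortest=${f%_*}    # removes the shortest trailing match of "_*"
longest=${f%%_*}    # removes the longest trailing match of "_*"
target=${f%%_V*}    # removes everything from the first "_V" onward

printf '%s\n' "$shortest" "$longest" "$target"
```

This prints GCF_000901975.1_ViralProj181986, then GCF, then GCF_000901975.1. Anchoring the pattern on _V rather than on _ alone is what keeps the underscore inside the accession number.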

Finally, if you have perl-rename (called rename on Debian-based Linux distributions), you can also do:

$ rename -n -- 's/_V.*//s' *gz
GCF_000901975.1_ViralProj181986_genomic.fna.gz -> GCF_000901975.1
GCF_000901995.1_ViralProj181990_genomic.fna.gz -> GCF_000901995.1
GCF_001041015.1_ViralProj287961_genomic.fna.gz -> GCF_001041015.1
GCF_001885505.1_ViralProj344311_genomic.fna.gz -> GCF_001885505.1

If that looks OK, remove the -n to actually rename the files.
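If perl-rename is not installed, the same preview-then-rename workflow can be approximated in plain shell. This is only a sketch: strip_suffix_rename is a hypothetical helper name, it assumes it is run from the directory that holds the files, and it hard-codes the _V pattern used above.

```shell
# strip_suffix_rename is a hypothetical helper, not part of any existing tool.
# Usage: strip_suffix_rename [-n] file...
#   -n  only print what would be renamed (dry run)
strip_suffix_rename() {
    dry_run=0
    if [ "$1" = "-n" ]; then
        dry_run=1
        shift
    fi
    for f in "$@"; do
        new=${f%%_V*}
        if [ "$new" = "$f" ]; then
            # Nothing to strip, so leave the file alone.
            continue
        fi
        if [ -e "$new" ]; then
            # Refuse to overwrite an existing file.
            echo "skipping $f: $new already exists" >&2
            continue
        fi
        if [ "$dry_run" -eq 1 ]; then
            echo "$f -> $new"
        else
            mv -- "$f" "$new"
        fi
    done
}
```

Running strip_suffix_rename -n *gz previews the changes, and dropping the -n performs them.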

terdon
  • 242,166
  • @stéphane I rolled back your edit because I wanted the first examples to be closer to the OP's original since I do provide safe, robust solutions later. – terdon Jul 13 '22 at 19:38
  • Your statement starting ls *gz is incorrect without my edit. Your rename approach is not robust as you forgot the s flag (some rename variants will also have trouble with filenames starting with -) – Stéphane Chazelas Jul 13 '22 at 19:55
  • 1
    @StéphaneChazelas yes, the ls *gz isn't supposed to be correct, it isn't the right approach. Of course ls -d -- *gz is better since it won't descend into subdirs and can handle names starting with -, but given that the OP is clearly very new to this, I wanted to start slow. I put the s back, I had indeed missed that and it's needed, as you point out. Thanks! – terdon Jul 13 '22 at 20:05
-1
for i in $(ls GCF*.gz); do var=$(echo $i | awk -F "_" '{print $1"_"$2}'); echo $var; mv $i $var; done

awk method

ls -ltr | awk '/GCF.*gz/{print "mv "$NF" "substr($NF,1,15)}'|sh