Shell is the wrong language for processing data. You should use awk, or perl, or python (or almost any language that isn't shell) instead. See "Why is using a shell loop to process text considered bad practice?" and "Why does my shell script choke on whitespace or other special characters?" for some of the many reasons why.
Also, many languages have library modules for processing NetCDF data - for example, perl has PDL::NetCDF and python has netCDF4.
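For instance, a minimal sketch of reading one variable with PDL::NetCDF might look like the following (the file name is taken from your loop, but the variable name t2m is just a guess for illustration - check the real names with cdo sinfo or ncdump -h):

#!/usr/bin/perl
use strict;
use PDL;
use PDL::NetCDF;

# Open the NetCDF file and read one variable into a PDL array.
# 't2m' is a hypothetical variable name - substitute your own.
my $nc  = PDL::NetCDF->new("era_temperature_2016.nc");
my $t2m = $nc->get('t2m');
print $t2m->info, "\n";
$nc->close;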
Even without using NetCDF processing libraries, both awk and perl make it a lot easier to script common tasks that you might otherwise do in shell.
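For instance, pulling a couple of columns out of coords.txt - something that would take a cut/sed/awk pipeline in shell - is a one-liner in perl's autosplit mode. This sketch (assuming whitespace-separated station, lat and lon columns) prints each station name and its longitude:

perl -lane 'print "$F[0] $F[2]" if $. > 1' coords.txt

Here -n loops over the input, -a splits each line into the @F array, -l takes care of the newlines, and $. (the current line number) is used to skip the header.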
Here's a perl version of your script. perl was chosen because it combines many of the features of sed, awk, cut, and tr in one language, because it has the extremely useful split() function, and because its system() function can take a list of arguments rather than a single string (a single string would have the same quoting annoyances and require the same workarounds as shell):
#!/usr/bin/perl

use strict;

my @coords=();

# Read coords.txt into an array, so we don't have to read it
# again for each year.
#
# Yes, you could read coords.txt into an array in bash too - I very
# strongly encourage you to do so if you decide to stick to shell.
# In bash, it's probably best to read coords.txt into three arrays, one
# each for station, lon, and lat.  Or two associative arrays, one each
# for lon and lat (both with station as the key).  Anyway, see
# `help mapfile` in bash.

my $coords = "coords.txt";
open(my $C, "<", $coords) || die "couldn't open $coords for read: $!\n";
while(<$C>) {
  next if /^station/;  # skip header
  chomp;               # get rid of \n, \r, or \r\n line-endings
  push @coords, $_;
};
close($C);

# process each year
foreach my $num (2016..2018) {
  my $infile = "era_temperature_$num.nc";

  # process the coords data for the current year
  foreach (@coords) {
    my ($station, $lat, $lon) = split;
    my $outfile = "${station}${num}${lat}_${lon}_out.nc";
    system("cdo", "-remapnn", "lon=${lon}_lat=${lat}", $infile, $outfile);
  };
};
Note that on the system() line it is completely safe to use $infile and $outfile without quotes, because perl passes each variable as a single argument to cdo no matter what it contains. This is not true in bash: if $infile or $outfile contained any whitespace or shell metacharacters (e.g. ; or &) and were used without double-quotes, they would be subject to shell word-splitting and interpretation and would break the script (so remember to always double-quote your variables in shell).
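To make the difference concrete, here's a small sketch contrasting the list form of system() with the single-string form (the file name containing a space is made up, and cdo's sinfo operator is used only as a harmless stand-in):

#!/usr/bin/perl
use strict;

# A made-up file name containing a space, just to show the difference.
my $infile = "era temperature 2016.nc";

# List form: perl runs cdo directly and passes $infile as one argument,
# spaces and all.
system("cdo", "sinfo", $infile);

# String form: perl hands the whole line to /bin/sh, which word-splits
# $infile into three arguments - the same breakage as an unquoted
# variable in a shell script.
system("cdo sinfo $infile");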
Here's an alternative version that uses two associative arrays. This will probably be slightly faster (because it only has to call split() once for each line of coords.txt), but probably not noticeably so unless coords.txt has many thousands of lines:
#!/usr/bin/perl

use strict;

my %lon = ();
my %lat = ();

# Read coords.txt into two hashes (associative arrays), one
# each for lon and lat.

my $coords = "coords.txt";
open(my $C, "<", $coords) || die "couldn't open $coords for read: $!\n";
while(<$C>) {
  next if /^station/;  # skip header
  chomp;               # get rid of \n, \r, or \r\n
  my ($station, $lat, $lon) = split;
  $lat{$station} = $lat;
  $lon{$station} = $lon;
}
close($C);

foreach my $num (2016..2018) {
  my $infile = "era_temperature_$num.nc";

  foreach my $station (sort keys %lat) {
    # Two different ways of constructing a string from other variables.
    # Simple interpolation, as in the first version above:
    my $outfile = "${station}_${num}_${lat{$station}}_${lon{$station}}";

    # And string concatenation with `.`, which can be easier to read
    # in some cases.
    my $lonlat = "lon=" . $lon{$station} . "_lat=" . $lat{$station};

    # Another method is to use sprintf, which can be even easier to read.
    # For example, use the following instead of the line above:
    #   my $lonlat = sprintf "lon=%s_lat=%s", $lon{$station}, $lat{$station};
    #
    # note: bash has a printf built-in too. I highly recommend using it.
    system("cdo", "-remapnn", $lonlat, $infile, $outfile);
  };
};
BTW, perl also has some very useful quoting operators - e.g. qw(), which splits whatever is between its parentheses on whitespace into a list of literal strings. Note that qw() does not interpolate variables, so it only covers the fixed words of the command; the variables still have to be passed as separate arguments. That would let you write the system() line as:

system(qw(cdo -remapnn), "lon=${lon}_lat=${lat}", $infile, $outfile);

or (for the associative array version):

system(qw(cdo -remapnn), $lonlat, $infile, $outfile);

See perldoc -f qw for details.
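For longer commands, a related pattern (just a sketch, reusing the $lonlat, $infile and $outfile variables from the loop body above) is to build the argument list up in an array, hand the whole array to system(), and check the return value while you're at it:

my @cmd = qw(cdo -remapnn);              # fixed words, so qw() is fine here
push @cmd, $lonlat, $infile, $outfile;   # variables added as separate list elements
system(@cmd) == 0
    or warn "cdo failed for $infile: $?\n";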
Finally, some people claim that perl is hard to read or understand (AFAICT that's largely because perl, like sed, has operators for regular expressions built into the language, and regexes that aren't wrapped in a function call somehow seem scary and unreadable). IMO, both of the perl examples above are far clearer and easier to read and understand than your shell script with its multiple command substitutions. They will also run much faster, because they don't have to fork sed and cut four times for each iteration of the loops (i.e. 3 years times however many lines are in coords.txt).