Shell is the wrong language for processing data. You should use awk, or perl, or python (or almost any language that isn't shell) instead. See "Why is using a shell loop to process text considered bad practice?" and "Why does my shell script choke on whitespace or other special characters?" for some of the many reasons why.
Also, many languages have library modules for processing NetCDF data - for example, perl has PDL::NetCDF and python has netCDF4.
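For instance, a minimal sketch of reading one variable with PDL::NetCDF might look like the following (the file name is taken from your loop, but the variable name t2m is just a guess for illustration - check the real names with cdo sinfo or ncdump -h):

#!/usr/bin/perl
use strict;
use PDL;
use PDL::NetCDF;

# Open the NetCDF file and read one variable into a PDL array.
# 't2m' is a hypothetical variable name - substitute your own.
my $nc  = PDL::NetCDF->new("era_temperature_2016.nc");
my $t2m = $nc->get('t2m');
print $t2m->info, "\n";
$nc->close;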
Even without using NetCDF processing libraries, both awk and perl make it a lot easier to script common tasks that you might otherwise do in shell.
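For instance, pulling a couple of columns out of coords.txt - something that would take a cut/sed/awk pipeline in shell - is a one-liner in perl's autosplit mode. This sketch (assuming whitespace-separated station, lat and lon columns) prints each station name and its longitude:

perl -lane 'print "$F[0] $F[2]" if $. > 1' coords.txt

Here -n loops over the input, -a splits each line into the @F array, -l takes care of the newlines, and $. (the current line number) is used to skip the header.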
Here's a perl version of your script. perl was chosen because it combines many of the features of sed, awk, cut, and tr in one language, because it has the extremely useful split() function, and because its system() function can take a list of arguments rather than a single string (a single string would have the same quoting annoyances and require the same workarounds as shell):
#!/usr/bin/perl

use strict;

my @coords=();

# Read coords.txt into an array, so we don't have to read it
# again for each year.
#
# Yes, you could read coords.txt into an array in bash too - I very
# strongly encourage you to do so if you decide to stick to shell.
# In bash, it's probably best to read coords.txt into three arrays, one
# each for station, lon, and lat.  Or two associative arrays, one each
# for lon and lat (both with station as the key).  Anyway, see
# `help mapfile` in bash.

my $coords = "coords.txt";
open(my $C, "<", $coords) || die "couldn't open $coords for read: $!\n";
while(<$C>) {
  next if /^station/;  # skip header
  chomp;               # get rid of \n, \r, or \r\n line-endings
  push @coords, $_;
};
close($C);

# process each year
foreach my $num (2016..2018) {
  my $infile = "era_temperature_$num.nc";

  # process the coords data for the current year
  foreach (@coords) {
    my ($station, $lat, $lon) = split;
    my $outfile = "${station}${num}${lat}_${lon}_out.nc";
    system("cdo", "-remapnn", "lon=${lon}_lat=${lat}", $infile, $outfile);
  };
};
Note that on the system() line it is completely safe to use $infile and $outfile without quotes, because perl passes each variable as a single argument to cdo no matter what it contains. This is not true in bash: if $infile or $outfile contained any whitespace or shell metacharacters (e.g. ; or &) and were used without double-quotes, they would be subject to shell word-splitting and interpretation and would break the script (so remember to always double-quote your variables in shell).
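To make the difference concrete, here's a small sketch contrasting the list form of system() with the single-string form (the file name containing a space is made up, and cdo's sinfo operator is used only as a harmless stand-in):

#!/usr/bin/perl
use strict;

# A made-up file name containing a space, just to show the difference.
my $infile = "era temperature 2016.nc";

# List form: perl runs cdo directly and passes $infile as one argument,
# spaces and all.
system("cdo", "sinfo", $infile);

# String form: perl hands the whole line to /bin/sh, which word-splits
# $infile into three arguments - the same breakage as an unquoted
# variable in a shell script.
system("cdo sinfo $infile");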
Here's an alternative version that uses two associative arrays. This will probably be slightly faster (because it only has to call split() once for each line of coords.txt), but probably not noticeably so unless coords.txt has many thousands of lines:
#!/usr/bin/perl

use strict;

my %lon = ();
my %lat = ();

# Read coords.txt into two hashes (associative arrays), one
# each for lon and lat.

my $coords = "coords.txt";
open(my $C, "<", $coords) || die "couldn't open $coords for read: $!\n";
while(<$C>) {
  next if /^station/;  # skip header
  chomp;               # get rid of \n, \r, or \r\n
  my ($station, $lat, $lon) = split;
  $lat{$station} = $lat;
  $lon{$station} = $lon;
}
close($C);

foreach my $num (2016..2018) {
  my $infile = "era_temperature_$num.nc";

  foreach my $station (sort keys %lat) {
    # Two different ways of constructing a string from other variables.
    # Simple interpolation, as in the first version above:
    my $outfile = "${station}_${num}_${lat{$station}}_${lon{$station}}";

    # And string concatenation with `.`, which can be easier to read
    # in some cases.
    my $lonlat = "lon=" . $lon{$station} . "_lat=" . $lat{$station};

    # Another method is to use sprintf, which can be even easier to read.
    # For example, use the following instead of the line above:
    #   my $lonlat = sprintf "lon=%s_lat=%s", $lon{$station}, $lat{$station};
    #
    # note: bash has a printf built-in too. I highly recommend using it.
    system("cdo", "-remapnn", $lonlat, $infile, $outfile);
  };
};
BTW, perl also has some very useful quoting operators - e.g. qw(), which splits whatever is between its parentheses on whitespace into a list of literal strings. Note that qw() does not interpolate variables, so it only covers the fixed words of the command; the variables still have to be passed as separate arguments. That would let you write the system() line as:

system(qw(cdo -remapnn), "lon=${lon}_lat=${lat}", $infile, $outfile);

or (for the associative array version):

system(qw(cdo -remapnn), $lonlat, $infile, $outfile);

See perldoc -f qw for details.
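For longer commands, a related pattern (just a sketch, reusing the $lonlat, $infile and $outfile variables from the loop body above) is to build the argument list up in an array, hand the whole array to system(), and check the return value while you're at it:

my @cmd = qw(cdo -remapnn);              # fixed words, so qw() is fine here
push @cmd, $lonlat, $infile, $outfile;   # variables added as separate list elements
system(@cmd) == 0
    or warn "cdo failed for $infile: $?\n";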
Finally, some people claim that perl is hard to read or understand (AFAICT that's largely because perl, like sed, has operators for regular expressions built into the language, and regexes that aren't wrapped in a function call somehow seem scary and unreadable). IMO, both of the perl examples above are far clearer and easier to read and understand than your shell script with its multiple command substitutions. They will also run much faster, because they don't have to fork sed and cut four times for each iteration of the loops (i.e. 3 years times however many lines are in coords.txt).