
I have a job script that is not producing results, and I suspect that some of the input files it references are missing. The relevant part of the job script looks like this:

  echo get_data

  get_fms_data \
    amip1 \
    seaesf \
    albedo \
    lad \
    topog \
    ggrpsst \
    mom4 \
    /data0/home/rslat/GFDL/archive/edg/fms/river_routes_gt74Sto61S=river_destination_field \
    /data0/home/rslat/GFDL/archive/fms/mom4/mom4p1/mom4p1a/mom4_ecosystem/preprocessing/rho0_profile.nc \
    /data0/home/rslat/GFDL/archive/fms/mom4/mom4p0/mom4p0c/mom4_test8/preprocessing/fe_dep_ginoux_gregg_om3_bc.nc=Soluble_Fe_Flux_PI.nc \
    /data0/home/rslat/GFDL/archive/jwd/regression_data/esm2.1/input/cover_type_1860_g_ens=cover_type_field \
    /data0/home/rslat/GFDL/archive/jwd/regression_data/esm2.1/input/soil_color.nc \
    /data0/home/rslat/GFDL/archive/jwd/regression_data/esm2.1/input/biodata.nc \
    /data0/home/rslat/GFDL/archive/jwd/regression_data/esm2.1/input/ground_type.nc \
    /data0/home/rslat/GFDL/archive/jwd/regression_data/esm2.1/input/groundwater_residence.nc \
    /data0/home/rslat/GFDL/archive/ms2/esm2.1/input/max_water.nc \
...

As a first step, I want to copy all these paths into a text file and then check whether they actually exist.

Is there an easy way to do this? I looked at other questions, but most of them cover checking a single file rather than a list of paths read from a file.

Thank you!

  • Is that really the format of your file? Including the \ and the spaces at the beginning of the lines? – terdon Jul 02 '19 at 10:00
  • Yes, because it's a job script which I'm trying to debug. But I can create another text file with another formatting for the paths. – ValientProcess Jul 02 '19 at 10:06
  • Can you show a bit more from the script then? It looks like it's a part of a loop in a script or something. – rush Jul 02 '19 at 10:07
  • I edited my question, although I think the original script is less relevant as I'm trying to construct a basic loop to check if a list of files exists. – ValientProcess Jul 02 '19 at 10:12
  • @ValientProcess ah, the edited file is much easier to deal with. You now have one path per line and don't have file names split onto multiple lines. Is that accurate? – terdon Jul 02 '19 at 10:15
  • True! And there is the "\" separation – ValientProcess Jul 02 '19 at 10:22

2 Answers


Here's one way (assuming GNU tools):

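grep -Po '^\s*\K/.*' file |
    sed 's/\s*\\$//' |
        while IFS= read -r path; do
            [[ -e "$path" ]] &&
                printf 'FOUND: "%s"\n' "$path" ||
                printf "ERROR: '%s' doesn't exist\n" "$path";
        done

Explanation

  • grep -Po '^\s*\K/.*' : match only lines that start with zero or more whitespace characters followed by a /. With -o only the matched text is printed, and \K drops the leading whitespace from the match, so the output is each path (still followed by its trailing backslash).
  • sed 's/\s*\\$//' : remove the backslash at the end of the line together with any whitespace in front of it. The $ anchor keeps the substitution from touching a backslash elsewhere in the path.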
  • while IFS= read -r path; do : read each line (path) into the variable $path.
  • [[ -e "$path" ]] && printf 'FOUND: "%s"\n' "$path" : if this path exists, print the relevant message.
  • || printf "ERROR: '%s' doesn't exist\n" "$path"; : else, if it doesn't exist, print an error message.
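If the paths are first copied into a separate text file, one per line, as the question proposes, the same existence test can be wrapped in a small function. This is only a sketch: check_paths and paths.txt are names made up for illustration, and it assumes the list contains nothing but paths (no leading whitespace, no trailing backslashes).

```bash
#!/usr/bin/env bash
# check_paths LISTFILE
# Hypothetical helper: reads one path per line from LISTFILE and reports
# whether each one exists. Returns 1 if at least one path is missing.
check_paths() {
    local list="$1" path missing=0
    # "|| [ -n "$path" ]" also handles a last line without a final newline.
    while IFS= read -r path || [ -n "$path" ]; do
        # Skip empty lines.
        [ -z "$path" ] && continue
        if [ -e "$path" ]; then
            printf 'FOUND: "%s"\n' "$path"
        else
            printf "ERROR: '%s' doesn't exist\n" "$path"
            missing=1
        fi
    done < "$list"
    return "$missing"
}
```

Calling check_paths paths.txt prints one line per path, and the exit status is non-zero if anything was missing, so it can be used directly in an if statement.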
terdon

I would use ls: not by parsing its output, but by relying on the fact that it reports missing files on stderr. Type the three characters "ls " (l, s, Space), paste the file list, then append > /dev/null. An example with the filenames from the question:

ls  /data0/home/rslat/GFDL/archive/edg/fms/river_routes_gt74Sto61S=river_destination_field \
    /data0/home/rslat/GFDL/archive/fms/mom4/mom4p1/mom4p1a/mom4_ecosystem/preprocessing/rho0_profile.nc \
    /data0/home/rslat/GFDL/archive/fms/mom4/mom4p0/mom4p0c/mom4_test8/preprocessing/fe_dep_ginoux_gregg_om3_bc.nc=Soluble_Fe_Flux_PI.nc \
    /data0/home/rslat/GFDL/archive/jwd/regression_data/esm2.1/input/cover_type_1860_g_ens=cover_type_field \
    /data0/home/rslat/GFDL/archive/jwd/regression_data/esm2.1/input/soil_color.nc \
    /data0/home/rslat/GFDL/archive/jwd/regression_data/esm2.1/input/biodata.nc \
    /data0/home/rslat/GFDL/archive/jwd/regression_data/esm2.1/input/ground_type.nc \
    /data0/home/rslat/GFDL/archive/jwd/regression_data/esm2.1/input/groundwater_residence.nc \
    /data0/home/rslat/GFDL/archive/ms2/esm2.1/input/max_water.nc \
  > /dev/null

You'll get no output if every file exists; for any file that doesn't exist you'll get an error message, because only stdout is discarded and stderr still reaches the terminal. For a made-up example:

ls  /bogus/data0/home/rslat/GFDL/archive/edg/fms/river_routes_gt74Sto61S=river_destination_field \
    /bogus/data0/home/rslat/GFDL/archive/fms/mom4/mom4p1/mom4p1a/mom4_ecosystem/preprocessing/rho0_profile.nc \
    > /dev/null

You'd get:

ls: cannot access /bogus/data0/home/rslat/GFDL/archive/edg/fms/river_routes_gt74Sto61S=river_destination_field: No such file or directory
ls: cannot access /bogus/data0/home/rslat/GFDL/archive/fms/mom4/mom4p1/mom4p1a/mom4_ecosystem/preprocessing/rho0_profile.nc: No such file or directory

This method is easily scripted, too -- just check the return code (and drop stderr, if you'd like):

if ls /data0/exists /bogus/doesnot > /dev/null 2> /dev/null
then
  echo all files exist
else
  echo some files are missing
fi
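The return-code check above can also be packaged as a function. The sketch below is one possible form, not part of the original answer: all_exist is a made-up name, and the -d option is an addition so that ls only examines a directory entry itself instead of reading its contents.

```bash
#!/usr/bin/env bash
# all_exist PATH...
# Hypothetical helper: succeeds only if ls can access every given path.
# -d  : treat directories like files, do not list what is inside them
# --  : end of options, so a path starting with "-" is not taken as a flag
# Note: with no arguments at all, ls checks "." and therefore succeeds.
all_exist() {
    ls -d -- "$@" > /dev/null 2>&1
}
```

It is used the same way as the plain ls call, for example: if all_exist /data0/exists /bogus/doesnot; then echo all files exist; else echo some files are missing; fi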
Jeff Schaller