Loop through subdirectories to get files and do something on them

Question

I am trying to loop through the folders to get the files and do something on them, with output redirected to a text file with the same name as the file. I tried using 'find' -

cd /filepath/orig/v1
for dir in $(find . -type d); do
  cd $dir
  for subdir in $(find . -type d); do
          cd $subdir
          for file in `ls`; do
                  echo $file
                  touch $file.txt
                  cdo info $file > $file.txt
          done
  done
done

But this does not work. The directory structure is like - /filepath/orig/v1/level1/level2/file.nc but subdirectories can have more than two levels.

The first for dir in $(find . -type d); do would already find every directory on all levels of the tree. Do you need to find the directories here, or is it enough to process all the files individually? — ilkkachu, Dec 06 '21 at 12:06

score 2 · Answer 1 · answered Dec 06 '21 at 12:28

Loops are unnecessary for this. Find will do it all.

find . -type f ! -name '*.txt' -print -exec sh -c 'cdo info {} > {}.txt' \;

Note that this will clobber existing .txt files and you might want to use a more specific filename filter than "not *.txt"

user10489

Never embed {} in the shell code. – Kamil Maciorowski Dec 06 '21 at 14:35
This does not control the depth on which the files will be found. – Kusalananda Dec 06 '21 at 15:08
If you want to control depth, find has a -depth option. – user10489 Dec 07 '21 at 00:04
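
The objection to embedding {} in the shell code can be illustrated in a scratch directory. This is only a sketch and not part of the answer above: `wc -c` stands in for `cdo info` (an assumption, so that it runs without cdo installed) and the file name is made up. The pathname is handed to the inline script as a positional argument, so the shell never parses it as code and a name with spaces or metacharacters is treated as plain data.

```shell
# Scratch tree with one awkward but legal file name (hypothetical data).
tmp=$(mktemp -d)
mkdir -p "$tmp/level1/level2"
printf 'data\n' > "$tmp/level1/level2/my file.nc"

# The pathname arrives as "$1" instead of being pasted into the script text.
# wc -c is only a stand-in for "cdo info".
find "$tmp" -type f -name '*.nc' -exec sh -c 'wc -c < "$1" > "$1.txt"' sh {} \;

cat "$tmp/level1/level2/my file.nc.txt"
```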

Kusalananda · Answer 2 · 2021-12-06T15:23:37.067

If you have a fixed directory structure of two levels:

shopt -s dotglob nullglob
for pathname in /filepath/orig/v1/*/*; do
    [[ $pathname == *.txt ]] && continue
    printf 'Processing "%s"\n' "$pathname" >&2
    cdo info "$pathname" >"$pathname.txt"
done

This first enables the dotglob and nullglob shell options. These options allow globbing patterns to match hidden names (dotglob) and ensure that patterns that match nothing are removed completely (nullglob; this means the loop would not run a single iteration if /filepath/orig/v1/*/* does not match any names).

Any name in our loop that already ends with .txt is skipped, and the rest is processed with cdo info to generate a .txt file (note that I don't know what cdo info actually does). Note that there is no need to touch the filename first as the file would be created by virtue of redirecting into it.
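
The effect of the two options can be checked in bash with a small sketch. The scratch directory and the hidden file name are made up for illustration, and the `count` helper is hypothetical; it only counts how many words the glob expands to.

```shell
#!/usr/bin/env bash
# Scratch directory whose only file is hidden (hypothetical data).
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
: > "$tmp/a/b/.hidden.nc"

count() {  # prints how many words the glob expands to
    local n=0 p
    for p in "$tmp"/*/*/*; do n=$((n + 1)); done
    echo "$n"
}

shopt -u dotglob nullglob
default=$(count)         # unmatched pattern is kept as a literal string

shopt -s nullglob
with_nullglob=$(count)   # unmatched pattern is removed, loop body never runs

shopt -s dotglob
with_both=$(count)       # the hidden file now matches

printf 'default=%s nullglob=%s nullglob+dotglob=%s\n' \
    "$default" "$with_nullglob" "$with_both"
```

Without nullglob the loop would run once with the unexpanded pattern as its "pathname", which is why the loop above sets it before globbing.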

If you only want to process names ending in .nc at that level:

shopt -s dotglob nullglob
for pathname in /filepath/orig/v1/*/*.nc; do
    printf 'Processing "%s"\n' "$pathname" >&2
    cdo info "$pathname" >"$pathname.txt"
done

If you want to process all files with names ending in .nc anywhere beneath /filepath/orig/v1:

find /filepath/orig/v1 -type f -name '*.nc' -exec sh -c '
    for pathname do
        printf "Processing \"%s\"\n" "$pathname" >&2
        cdo info "$pathname" >"$pathname.txt"
    done' sh {} +

This calls a short in-line script for batches of found regular files with names ending in .nc.

You could also use /filepath/orig/v1/*/ as the search path with find to only search the subdirectories of /filepath/orig/v1 and not /filepath/orig/v1 itself.
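
What "batches" means can be seen by counting arguments. A minimal sketch, assuming a scratch tree with three made-up file names: with `\;` find starts one shell per file, while with `+` it passes as many pathnames as fit into a single invocation.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/level1/level2"
for i in 1 2 3; do : > "$tmp/level1/level2/file$i.nc"; done

# One sh per file: each prints "1", giving three lines of output.
per_file=$(find "$tmp" -type f -name '*.nc' -exec sh -c 'echo "$#"' sh {} \; |
    wc -l | tr -d ' ')

# One sh for the whole batch: a single line with the argument count.
batched=$(find "$tmp" -type f -name '*.nc' -exec sh -c 'echo "$#"' sh {} +)

printf 'shells started with \\;: %s\n' "$per_file"
printf 'arguments in one batch with +: %s\n' "$batched"
```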

Windy Day · Accepted Answer · 2021-12-07T00:02:54.877

I dumped 'find' because I had trouble understanding its concept, but seems like this worked -

orig_dir='/filepath/orig/v1'
for entry in "$orig_dir"/*/*; do
    cd "$entry" || continue
    x=$(ls *.nc)
    echo "$x"
    name=$(basename "$x" .nc)
    cdo info "$x" > new_path/"$name".txt
done

edited Dec 07 '21 at 00:02

answered Dec 06 '21 at 12:28

Quote properly. – Kamil Maciorowski Dec 06 '21 at 14:39
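
The loop above relies on each directory holding exactly one .nc file. A variant that copes with several files per directory and needs no cd could look like the sketch below. It is not the poster's code: the temporary directories stand in for /filepath/orig/v1 and new_path, the file names are made up, and `wc -c` is a placeholder for `cdo info`.

```shell
# Scratch input tree and output directory (hypothetical stand-ins for
# /filepath/orig/v1 and new_path).
orig_dir=$(mktemp -d)
new_path=$(mktemp -d)
mkdir -p "$orig_dir/level1/level2"
printf 'a\n'  > "$orig_dir/level1/level2/first.nc"
printf 'bb\n' > "$orig_dir/level1/level2/second file.nc"

for file in "$orig_dir"/*/*/*.nc; do
    [ -e "$file" ] || continue            # skip the literal pattern if nothing matched
    name=$(basename "$file" .nc)          # first.nc -> first
    wc -c < "$file" > "$new_path/$name.txt"   # stand-in for: cdo info "$file"
done

ls "$new_path"
```

Since all output lands in one directory, two input files with the same base name in different subdirectories would overwrite each other's .txt file.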

score 0 · Answer 4 · answered Dec 07 '21 at 00:56

With GNU Parallel:

doit() {
  dir="$1"
  file="$2"
  cd "$dir"
  echo "$file"
  touch "$file".txt
  cdo info "$file" > "$file".txt
}
export -f doit
# 2 level only
printf "%s\0" */*/* | parallel -0 doit {//} {/}
# any level
find . -type f -print0 | parallel -0 doit {//} {/}

If you do not need the echo, touch, and if cdo can work on full path it can be shorter:

# 2 level only
printf "%s\0" */*/* | parallel -0 'cdo info {} > {}.txt'
# any level
find . -type f -print0 | parallel -0 'cdo info {} > {}.txt'

Contrary to xargs' shell code, {} is safe here.

If you want foo.nc to result in foo.txt:

# 2 level only
printf "%s\0" */*/*.nc | parallel -0 'cdo info {} > {.}.txt'
# any level
find . -type f -name '*.nc' -print0 | parallel -0 'cdo info {} > {.}.txt'
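
Without GNU Parallel, the same foo.nc to foo.txt renaming can be sketched with the shell's `${var%suffix}` expansion inside a find inline script. This is an illustration under assumptions: the scratch tree is made up and `wc -c` stands in for `cdo info`.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/level1/level2"
printf 'data\n' > "$tmp/level1/level2/foo.nc"

find "$tmp" -type f -name '*.nc' -exec sh -c '
    for pathname do
        out=${pathname%.nc}.txt        # strip the .nc suffix, then append .txt
        wc -c < "$pathname" > "$out"   # stand-in for: cdo info "$pathname"
    done' sh {} +

ls "$tmp/level1/level2"
```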

cas · Answer 5 · 2021-12-07T03:49:03.593

If you're using GNU or BSD find, you can use the -execdir option. It's the same as -exec except that it changes into the directory containing the file(s) first (and if you're using + instead of ; to terminate the -execdir, it batches up the files in the same dir to minimise the amount of forking per directory). e.g.

find . -type f -execdir \
  sh -c 'for f; do printf "%s\n" "$f" ; cdo info "$f" > "$f.txt"; done' sh {} +

Notes:

for f; do is the same as for f in "$@"; do
The first arg to the sh -c '...' command is sh. That's the name that will be used in the process table for the sh -c being executed by -exec or -execdir, i.e. $0. You can use any arbitrary name you like there; sh or find-sh are commonly used. If it's not there, then the shell script will not see the first filename found by find. This is specific to sh -c (and some other commands, usually script interpreters, like bash -c); it is not required for most commands that you might want to run with find -exec or -execdir (e.g. grep and sed don't need it).
This uses -type f because, even though we want find to cd into the directory containing files, we only want to process regular files, not directories (or sockets, named pipes, symlinks, etc). If you want to process regular files and symlinks, use either find's -L option or \( -type f -o -type l \). Note that -L will follow symlinks to directories outside of your search tree, which is not usually what you want.

If using \( -type f -o -type l \), the embedded sh -c script should check each argument to be sure that it (e.g. "$f" in my examples) is either a regular file or a symlink pointing to a regular file (test -f will do this for both because, as documented in help test and man test, "Except for -h and -L, all FILE-related tests dereference symbolic links.").
```
find . \( -type f -o -type l \) -execdir \
  sh -c 'for f; do
           printf "%s\n" "$f"
           [ -f "$f" ] && cdo info "$f" > "$f.txt"
         done' sh {} +
```
All variable expansions in the sh -c script are double-quoted, as they should be (see Why does my shell script choke on whitespace or other special characters? for why).
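
The note about the placeholder for $0 can be tried without find at all. In this small sketch the three file names are made up and are passed to sh -c by hand, the way find would pass them.

```shell
# Without a placeholder, the first argument after the script becomes $0
# and is therefore missing from "$@".
without=$(sh -c 'for f; do printf "%s\n" "$f"; done' a.nc b.nc c.nc)

# With "sh" as the placeholder, all three names arrive in "$@".
with=$(sh -c 'for f; do printf "%s\n" "$f"; done' sh a.nc b.nc c.nc)

printf 'without placeholder:\n%s\n' "$without"
printf 'with placeholder:\n%s\n' "$with"
```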

If you need to limit the search depth, you can use the -maxdepth option. e.g.

find . -maxdepth 2 -type f -execdir \
  sh -c 'for f; do printf "%s\n" "$f" ; cdo info "$f" > "$f.txt"; done' sh {} +

find also has related options like -d or -depth, and -mindepth for controlling how it traverses a directory tree.
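
To make the difference concrete: -maxdepth and -mindepth (extensions available in GNU and BSD find) limit which levels are visited, whereas -depth only changes the order, so that a directory's contents are processed before the directory itself. A sketch with a made-up scratch tree, selecting files exactly two directories below the starting point:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/level1/level2/level3"
: > "$tmp/top.nc"
: > "$tmp/level1/one.nc"
: > "$tmp/level1/level2/two.nc"
: > "$tmp/level1/level2/level3/three.nc"

# Depth 0 is the starting point ".", so level1/level2/two.nc sits at depth 3.
only_two_levels=$(cd "$tmp" && find . -mindepth 3 -maxdepth 3 -type f -name '*.nc')

# -maxdepth 2 alone also picks up the files nearer the top.
up_to_two=$(cd "$tmp" && find . -maxdepth 2 -type f -name '*.nc' | sort)

printf 'exactly two directories down:\n%s\n' "$only_two_levels"
printf 'with -maxdepth 2:\n%s\n' "$up_to_two"
```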

PS: I don't know what the cdo command does or what arguments it takes but if it supports using -- to mark the end of options and the start of filename args, you should include it in the command, otherwise filenames beginning with - may be treated as options to cdo. e.g.

find . -type f -execdir \
  sh -c 'for f; do printf "%s\n" "$f" ; cdo info -- "$f" > "$f.txt"; done' sh {} +

This is (part of) the reason why I used printf instead of echo. See Why is printf better than echo?
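
The printf point is easy to try with a value that looks like an option. A minimal sketch with a made-up file name; what echo prints here depends on the shell and its settings, which is exactly the problem.

```shell
f='-n'                              # a legal file name that looks like an option

with_echo=$(echo "$f")              # many echo implementations treat -n as an option
with_printf=$(printf '%s\n' "$f")   # printf only interprets its format string

printf 'echo gave:   "%s"\n' "$with_echo"
printf 'printf gave: "%s"\n' "$with_printf"
```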
