
This will cause LIST to grow very large (even a couple of GB in a short time):

$ for i in *; do echo $i; cut -d '   ' -f1 $i ; done > LIST

For example, after 10 seconds:

$ wc -l LIST
132654955 LIST
$ ls -hl LIST
-rw-r--r-- 1 user users 2.3G Jan 22 21:35 LIST
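A quick sketch (in a hypothetical scratch directory, with `cut` dropped for brevity) shows what is going on: the `> LIST` redirection creates the file before `*` is expanded, so LIST itself matches the glob:

```shell
# Sketch: the redirection creates LIST before the glob expands,
# so LIST appears in the loop's own file list.
dir=$(mktemp -d)
cd "$dir"
printf 'a\tb\n' > file1
for i in *; do echo "$i"; done > LIST
grep -qx 'LIST' LIST && echo "LIST was globbed"   # prints "LIST was globbed"
```

With `cut` in the loop body, the effect is worse: `cut` reads LIST while the redirection keeps appending to it.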

I think the reason is that LIST itself is added to the list of files to be processed, so cut reads LIST while the loop keeps appending to it and never reaches end-of-file. I found 3 solutions for this problem:

  • exclude LIST from being processed:

    for i in !(LIST); do echo $i; cut -d '     ' -f1 $i ; done > LIST
    
  • use another directory for LIST:

    for i in *; do echo $i; cut -d '     ' -f1 $i ; done > /tmp/LIST
    
  • expand * before running the loop, with C-x * or whatever binding $ bind -p | grep glob-expand-word shows
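Note for the first option: `!(LIST)` is an extended glob, which requires bash with the extglob option enabled before that line is parsed (it is often on in interactive shells, but off in scripts). A minimal sketch under that assumption, in a hypothetical scratch directory:

```shell
# Sketch (bash-specific): !(LIST) needs extglob enabled before the
# line using it is parsed, hence the separate shopt line.
listing=$(bash -c '
  shopt -s extglob
  dir=$(mktemp -d) && cd "$dir"
  echo hello > data
  for i in !(LIST); do echo "$i"; done > LIST
  cat LIST
')
echo "$listing"   # prints only "data": LIST was excluded from the glob
```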

Is my reasoning correct and which way is the best here?

Jeff Schaller

2 Answers


Your reasoning is correct.

Among your proposed solutions, I prefer the first two, especially the second one, as it seems cleaner to write to a file located in another directory.

Here is another option using the GLOBIGNORE variable (assuming your shell supports it, e.g. bash):

GLOBIGNORE=LIST  ## "LIST" file will be ignored while globbing
for i in *; do echo "$i"; cut -d '   ' -f1 "$i"; done >LIST
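A quick sketch of the effect (hypothetical scratch directory, `cut` omitted for brevity): with GLOBIGNORE set, the LIST file exists when `*` expands but is dropped from the match, so it never enters the loop:

```shell
# Sketch: GLOBIGNORE (bash) removes matching names from glob results,
# so LIST stays out of the loop even though the redirection created it.
result=$(bash -c '
  dir=$(mktemp -d) && cd "$dir"
  echo hi > data
  GLOBIGNORE=LIST
  for i in *; do echo "$i"; done > LIST
  cat LIST
')
echo "$result"   # prints only "data"
```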
heemayl

Performing an operation on every file in a folder while putting the output file in the same folder is not a wise approach, in my opinion. My > output redirections always end up in /tmp, unless I know there is not enough space for my output there; then I look for a more suitable filesystem. But I never place them in the same directory I am processing the input files from.

MelBurslan
  • "If you are performing activity on every file in a folder and putting the output file in the same folder, is not a wise approach in my opinion" - rather a broad brush – iruvar Jan 22 '16 at 22:08
  • Less testing against unwanted effects if I do it this way. Simple is better, always. Let it be a broad brush. Unless there is a need for complex code, say for performance reasons because it will run many times every minute or every few seconds, it is a different story. But for a sysadmin script that will run a handful of times at best, putting effort into testing conditions that one can avoid with simple precautions, like placing the log file outside, will always be my preference – MelBurslan Jan 22 '16 at 22:11