
I'm not used to Linux scripting, and this is the first time I'm working with it, so I'm struggling with the following problem:

Code:

while [ $pct -gt 80 ]; do
    flag=1;
    ls -tr | while read file; do
        if [[ $file =~ .+\.log[0-9]+ ]]; then
            printf "File deleted:\n";
            stat "$file";
            rm -r "$file";
            flag=1;
            break;
        else
            flag=0;
        fi;
    done;
    if [ $flag -eq 0 ]; then
        break;
    fi;

    pct= # get the new pct;

done;

The goal is to delete certain log files, as matched by the regular expression above, oldest first; that's why I'm using ls -tr. I iterate over the list of files with the while loop, and if a file matches the regex, I delete it. After every deletion I check the percentage used on the application's file system, and if it is still greater than 80% (the outer while loop's condition), I repeat the process.

Often, however, even after deleting the files, the percentage used doesn't drop below 80%, even though no files matching the regex are left, and I can't delete the other files in the same folder. So the script goes into an infinite loop, which is why I'm trying to use a flag variable to break out of it. However, since I'm using a pipe to iterate over the files, the inner loop runs in a subshell (a child process), so the updated value of flag is not visible in the parent shell (that's what I read in an article about pipes), and the infinite loop is never broken. Can anyone suggest a fix for this, or an alternative approach?
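The subshell behaviour described above can be reproduced in a few lines. This is a minimal sketch, not part of the original question: the right-hand side of a pipe runs in a subshell, so the assignment is lost, while bash's lastpipe shell option (bash 4.2+, effective only when job control is off, as it is in a normal script) runs the last command of the pipeline in the current shell instead. The variable names are purely illustrative.

```shell
#!/bin/bash
# Without lastpipe: the while loop runs in a subshell, so flag=1 is lost.
flag=0
printf '%s\n' a b c | while IFS= read -r line; do
  flag=1
done
echo "after pipe: flag=$flag"      # prints: after pipe: flag=0

# With lastpipe (bash 4.2+, job control off), the last pipeline element
# runs in the current shell, so the assignment survives the loop.
shopt -s lastpipe
flag=0
printf '%s\n' a b c | while IFS= read -r line; do
  flag=1
done
echo "with lastpipe: flag=$flag"   # prints: with lastpipe: flag=1
```

Process substitution (done < <(...)) achieves the same thing without changing shell options; the linked answer in the first comment lists both.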

  • see https://unix.stackexchange.com/a/407802/170373 for workarounds. – ilkkachu Jun 30 '22 at 19:26
  • It looks to me that your ls | while loop only ever touches one file listed by ls, so I think you could just use something like file=$(ls -tr | grep '\.log[0-9]+' | head -1) to get the first matching filename, or an empty string if there are none. Of course that won't work if your filenames are messy enough (in particular, newlines are valid in filenames), but I guess you know you only have simple filenames. In addition, if the files are created so that their names sort in order, you could just let the shell glob the names and pick the first in the list, since globs sort lexicographically. – ilkkachu Jun 30 '22 at 19:32
  • This doesn't work in my scenario, since I want the files sorted by creation/modification time (oldest first), which is why I'm using ls -tr, so that the first file name that matches the regex can be obtained. There are several files in the folder, some matching the regex and some not, and I have to take the oldest file that matches the regex and delete it. So is there any way to get that without iterating over the files? Any logic for it? – Biswadeep Sarkar Jun 30 '22 at 19:41
  • so why wouldn't grep work? It doesn't change the order. (The regex should be .\.[0-9]+, though. Note that the regex match isn't anchored to the start or end of the string.) And yes, if the names of the files don't sort in the same order as their timestamps, then you can't use just that. But if the files are named so that they contain e.g. YYYY-MM-DD dates and times, or just monotonically increasing numbers, then they do sort the same as their timestamps. Well, assuming they're not modified after the next file is created, but that's usually how log files work anyway. – ilkkachu Jun 30 '22 at 19:52
  • Yes, the names of the files don't sort in the same order as their timestamps, and they don't contain the date as part of their names. file=$(ls -tr | grep '\.log[0-9]+' | head -1) seemed like a possible solution, but when I tried it I only got blank output: $file is empty. Sorry, I don't understand how else to make it work since, as I said, this is very new to me. So if there is any other solution, I'd be grateful. Thanks. – Biswadeep Sarkar Jun 30 '22 at 19:59
  • whoops, that should be grep -E '\.log[0-9]+'. – ilkkachu Jun 30 '22 at 20:50
  • Invert the logic: iterate on the files, then break when you have enough space. – ctrl-alt-delor Jun 30 '22 at 20:53
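Combining the suggestions in these comments (pick only the oldest matching file with ls -tr | grep -E | head, and re-check the usage after each deletion) avoids the piped loop entirely, so no variable has to survive a subshell. This is a sketch under stated assumptions: get_pct is a hypothetical stand-in for "get the new pct" that fakes the usage from the number of remaining log files so the script can run anywhere, the scratch files are made up, and filenames are assumed to be simple (no newlines), as the comment says.

```shell
#!/bin/bash
# Scratch directory with made-up files and timestamps (touch -t CCYYMMDDhhmm).
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch -t 202201010000 app.log3   # oldest matching file
touch -t 202201020000 app.log1
touch -t 202201030000 app.log2
touch -t 202112310000 keep.txt   # oldest overall, but never matches

# Hypothetical: pretend usage is 70% plus 10% per remaining matching file.
get_pct() {
  local n
  n=$(ls | grep -cE '\.log[0-9]+')
  echo $(( 70 + 10 * n ))
}

pct=$(get_pct)
while [ "$pct" -gt 80 ]; do
  # Oldest matching file only; empty if nothing matches any more.
  file=$(ls -tr | grep -E '\.log[0-9]+' | head -n 1)
  [ -n "$file" ] || break        # nothing left to delete: leave the loop
  echo "Deleting: $file"
  rm -- "$file"
  pct=$(get_pct)
done
```

In a real script, get_pct would be replaced by whatever actually reports the file system usage, and the cd into a scratch directory by a cd into the log directory.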

4 Answers


With zsh instead of bash:

while
  pct= # get the new pct
  (( pct > 80 )) &&
    oldest_log_file=( *.log<->(N.Om[1]) ) &&
    (( $#oldest_log_file ))
do
  print -r Removing $oldest_log_file
  rm -f -- $oldest_log_file
done

Or:

log_files_from_oldest_to_newest=( *.log<->(N.Om) )
while
  pct= # get the new pct
  (( pct > 80 )) && (( $#log_files_from_oldest_to_newest ))
do
  print -r Removing $log_files_from_oldest_to_newest[1]
  rm -f -- $log_files_from_oldest_to_newest[1]
  shift 1 log_files_from_oldest_to_newest
done

Or:

zmodload zsh/stat
for log_file in *.log<->(N.Om); do
  pct= # get the new pct
  (( pct > 80 )) || break
  stat -F %FT%T%z -LH s -- $log_file || continue
  print -r Removing $log_file of size $s[size] last modified on $s[mtime]
  rm -f -- $log_file
done


Parsing ls is very fragile, and really not a good idea. However, if you are certain that your file names will never contain spaces, tabs, newlines (assuming an unmodified $IFS), nor glob pattern operators (*, ?, [ at least) and won't start with - and are not of type directory, you could do something like this:

#!/bin/bash
pct= # get the current pct
## are there any matching files?
files=( $(ls -tr *.log[0-9]* 2>/dev/null) );

## While pct is above the threshold and there is at least
## one file in the files array
while [[ $pct -gt 80 && ${#files[@]} -gt 0 ]]; do
    printf "Deleting file:\n%s\n" "$(stat -- "${files[0]}")"
    rm -- "${files[0]}"
    ## repopulate the files array
    files=( $(ls -tr *.log[0-9]* 2>/dev/null) );
    pct=85 # get the new pct;
done

The idea here is to store the file names in an array and redefine the array after each deletion. Then the loop runs on two conditions: "are there any files left?" and "is $pct above 80?", so it stops as soon as either condition is no longer true.

Caveat 1: This assumes you have no files named something like foo.log12bar, i.e. that you want to look at all files that contain the string .log followed by a number, and don't need to exclude file names that have non-numeric characters after that number.
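If names like foo.log12bar do exist, one option (my assumption, not something from the answer) is bash's extglob pattern +([0-9]), which requires only digits after .log up to the end of the name. A minimal sketch in a scratch directory with made-up file names:

```shell
#!/bin/bash
# extglob must be enabled on a line before the pattern is parsed;
# nullglob makes a non-matching pattern expand to nothing.
shopt -s extglob nullglob
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch app.log1 app.log12 foo.log12bar app.log

loose=( *.log[0-9]* )     # the answer's glob: also matches foo.log12bar
strict=( *.log+([0-9]) )  # ".log" followed only by digits up to the end
printf 'loose:  %s\n' "${loose[*]}"
printf 'strict: %s\n' "${strict[*]}"
```

Globs sort names alphabetically, not by time, so this only shows which names match; the same pattern could be given to ls -tr in place of *.log[0-9]*.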

Caveat 2: as mentioned at the beginning, this will fail for files with unusual names. See here for why parsing the output of ls is almost always a bad idea:

terdon
  • I don't think you need the "if" there inside the loop: the "while" condition is tested right before entering the loop body, and will be tested again on the next iteration, before entering the loop body again. This is one of the cases where one could put the assignment in the condition part of the while to avoid repeating it, i.e. while files=(...); [[ $pct -gt ... ]]; do ... ; done. Then again, one could just delete the removed filename from the list, and not call ls again. That's easier with the positional parameters, something like: set -- $(ls ...); while ...; do rm "$1"; shift; done – ilkkachu Jun 30 '22 at 21:10
  • Understood, got a good idea. Thanks a lot to both of you terdon and @ilkkachu – Biswadeep Sarkar Jun 30 '22 at 21:21
  • Oh man... You're right, of course, @ilkkachu. For some reason I thought I had the second files=( ) inside the while and outside the if. Which also wouldn't have made much sense but that's what I thought I had and why I had the if. commonly known as a "brainfart". Very good point about using the positional parameters, but I felt that might be a bit too esoteric. It is a better way though, absolutely. Thanks! – terdon Jun 30 '22 at 22:00
  • @terdon, well, while true; do files=(...); if [[ whatever ]]; ...; done would make sense. And would avoid repeating the assignment (or using multiple commands in the condition part, which I guess some might find confusing since it's a bit different from other programming languages) – ilkkachu Jul 01 '22 at 07:32
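The positional-parameter variant suggested in the comments above might look like this. It is a sketch with the answer's own assumptions (no whitespace or glob characters in the names); get_pct is hypothetical and fakes the usage from the number of files still listed, and the scratch files are invented for the demonstration.

```shell
#!/bin/bash
tmp=$(mktemp -d) && cd "$tmp" || exit 1
touch -t 202201010000 b.log7   # oldest
touch -t 202201020000 a.log9
touch -t 202201030000 c.log1   # newest

# Hypothetical: 70% plus 10% per file still in the list.
get_pct() { echo $(( 70 + 10 * $# )); }

# List once, oldest first; unquoted on purpose (relies on simple names).
set -- $(ls -tr *.log[0-9]* 2>/dev/null)
while [ "$#" -gt 0 ] && [ "$(get_pct "$@")" -gt 80 ]; do
  echo "Deleting: $1"
  rm -- "$1"
  shift        # drop the deleted name instead of calling ls again
done
```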

Try something like this:

#!/bin/bash

dir="/path/to/dir"

# read the matching files into array "$files"
mapfile -d '' files < <(find "$dir" -maxdepth 1 -type f -regex '.*\.log[0-9]+' -print0)

for f in "${files[@]}"; do
    # get the current percentage used of the filesystem
    # (df right-aligns the value, so strip the % from the first field)
    pct=$(df --output=pcent "$dir" | awk 'NR==2 { sub(/%/, "", $1); print $1 }')

    [ "$pct" -lt 80 ] && break
    rm "$f"
done

This keeps on deleting the matching files one at a time until either the current percentage used for the filesystem falls below 80% or until there are no more matching filenames (whichever comes first).

Note that this requires GNU df for the --output option.

For non-GNU df, the following will work if the device name ("Filesystem" column) doesn't contain any whitespace or % characters:

pct=$(df "$dir" | awk -F'[[:space:]]+|%' 'NR==2 {print $5}')

If the mount-point ("Mounted on" column) does contain whitespace, but doesn't contain a % character, you could use something like this instead:

pct=$(df "$dir" | awk '
        NR==2 {
          for (i=NF; i > 0; i--) { if ($i ~ /%/) {break} };
          sub("%","",$i);
          print $i
        }')

(yes, extracting data from df's output is harder than it seems at first glance, for pretty much the same reason why parsing ls is a bad idea)


If you need to sort the filenames by timestamp (so you can delete the oldest files first), you can use the method described in my answer to Shell Script to move oldest file from one directory to another directory. That would be something like:

# oldest first: sort ascending on the modification time (%T@, as used by ls -t)
mapfile -d '' files < <(
    find "$dir" -maxdepth 1 \
      -type f \
      -regex '.*\.log[0-9]+' \
      -printf '%T@\t%p\0' |
    sort -z -k1,1 -n |
    cut -z -f2-
)

Note that this also requires the GNU versions of find, sort, and cut.
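Putting the two parts of this answer together, a sketch might look like the following. Assumptions: bash 4.4+ for mapfile -d, GNU find/sort/cut, sorting on modification time (%T@) in ascending order so the oldest file comes first, and a hypothetical get_pct function that fakes the usage from the number of remaining log files instead of calling df. The loop stops once usage is at or below 80%, to mirror the question's -gt 80 test.

```shell
#!/bin/bash
dir=$(mktemp -d)
touch -t 202201030000 "$dir/new.log1"
touch -t 202201010000 "$dir/old.log2"
touch -t 202201020000 "$dir/mid file.log3"   # spaces are fine with -print0/-z
touch -t 202112310000 "$dir/other.txt"       # never matches the regex

# Hypothetical stand-in for the df pipeline: 70% plus 10% per log file left.
get_pct() {
  local n=0 f
  for f in "$dir"/*.log[0-9]*; do [ -e "$f" ] && n=$((n + 1)); done
  echo $(( 70 + 10 * n ))
}

# NUL-delimited "mtime<TAB>path" records, sorted oldest first, times stripped.
mapfile -d '' -t files < <(
  find "$dir" -maxdepth 1 -type f -regex '.*\.log[0-9]+' -printf '%T@\t%p\0' |
    sort -z -k1,1n |
    cut -z -f2-
)

for f in "${files[@]}"; do
  pct=$(get_pct)
  [ "$pct" -le 80 ] && break
  echo "Deleting: $f"
  rm -- "$f"
done
```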

cas

As you will note from the other fine answers, there are many different ways to do what you're attempting, so I won't throw another implementation on the pile.

I'm also disregarding, for the moment, the fact that processing ls output has pitfalls.

The core of the question seems to be about getting the variable value to hold outside of the loop. As such, I think this is what you're looking for:

flag=foo
while IFS= read -r file; do
  echo "do stuff with $file"
  flag=bar
done < <(ls -tr) # preferably do something else to get your list
echo "$flag"
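Following the comment on the done line, here is a sketch that keeps the same process-substitution structure but reads NUL-delimited names from find, so filenames with spaces or newlines are handled. It is only an illustration (the scratch files and the count variable are made up), and unlike ls -tr the list is not sorted by time.

```shell
#!/bin/bash
tmp=$(mktemp -d)
touch "$tmp/a.log1" "$tmp/b c.log2" "$tmp/notes.txt"

count=0
while IFS= read -r -d '' file; do
  echo "do stuff with $file"
  count=$((count + 1))   # runs in the current shell, so the change persists
done < <(find "$tmp" -maxdepth 1 -type f -name '*.log[0-9]*' -print0)
echo "matched $count files"
```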
bxm