
I need to extract information from a log file that is deleted and recreated every time a program runs. After detecting that the file exists (again), I would like to tail it and look for a certain regexp.

The regexp will be matched a few times, but the result is always the same. I want to print it just once and then go back to monitoring until the file is re-created.

I looked at ways of detecting file creation. One way would be via inotifywait, but that requires installing a separate package.

Perhaps a simpler way is to take advantage of the fact that tail -F prints a message to stderr when the file being tailed is deleted and re-created:

tail: '/path/to/debug.log' has become inaccessible: No such file or directory
tail: '/path/to/debug.log' has appeared;  following new file
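Because these notices have a fixed shape, a single loop can tell them apart from ordinary log lines. The following is a minimal sketch. It assumes GNU tail's wording as shown above, and classify_tail_line is a hypothetical helper name, not part of any tool:

```shell
# classify_tail_line (hypothetical helper) prints "deleted", "created" or
# "data" for one line of `tail -F file 2>&1` output.
# Assumption: GNU tail wording. The * between the quoted parts also covers
# the double space GNU tail prints after "appeared;". Depending on timing,
# GNU tail may report "has been replaced;" instead of "has appeared;".
classify_tail_line() {
    case $1 in
        "tail: "*"has become inaccessible"*)
            echo deleted ;;
        "tail: "*"has appeared;"*"following new file"* | \
        "tail: "*"has been replaced;"*"following new file"*)
            echo created ;;
        *)
            # anything else is a line from the log file itself
            echo data ;;
    esac
}
```

One tail process could then feed a loop such as tail -F "$debug_file" 2>&1 | while IFS= read -r line; do ...; done that calls classify_tail_line on each line and switches behaviour based on the result.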

So I applied this solution, which works:

debug_file="/path/to/debug.log"

while true; do
    # Monitor the log file until the 'new file' message appears
    ( tail -F "$debug_file" 2>&1 & ) | grep -q 'has appeared; *following new file'

    # After the new file message appears, switch to monitoring for the regexp
    tail -F "$debug_file" | while read -r line; do
        id=$(echo "$line" | sed -n 's/.* etc \([0-9]\+\),.*/\1/p')
        if [ -n "$id" ]; then
            echo "ID: $id"
            break  # Exit the inner loop after the first match
        fi
    done

done
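The sed extraction step in the loop can be checked on its own. This is a minimal sketch, assuming log lines shaped like the hypothetical "12:00 start etc 4711, done". extract_id is an illustrative name, and [0-9][0-9]* is the POSIX BRE spelling of GNU sed's [0-9]\+:

```shell
# extract_id (hypothetical helper): print the digits between " etc " and
# the next comma, or nothing if the line does not contain that pattern.
extract_id() {
    printf '%s\n' "$1" | sed -n 's/.* etc \([0-9][0-9]*\),.*/\1/p'
}
```

Since .* is greedy, a line containing several " etc N," groups yields the last one.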

But I don't like that this solution starts two different tail processes. Is there a way to achieve the same result using just one tail process?

That one process would then switch 'modes': start by looking for file creation, then look for the regexp, and once that is found, go back to 'standby' mode, waiting for the log file to be deleted and created again.

Is inotifywait a more elegant solution? Ideally I would like a solution I could port easily to Windows CMD.

  • I think this may be similar to the issue posed by this question: https://unix.stackexchange.com/questions/410471/tail-f-but-when-the-file-is-deleted-and-re-created-not-appended#:~:text=Just%20make%20sure,they%E2%80%99ve%20been%20deleted%29. – darnold0714 Mar 19 '24 at 17:47
  • Is there any point in using tail here? You're not watching the output, you seem to only care about a single value, so why bring tail in at all? Why not just grep the file for your pattern? – terdon Mar 19 '24 at 19:42
  • I want to monitor the file over several consecutive cycles of delete file -> create new file -> populate with data, so using tail is nice because it keeps the process active and adds a line to the console every time the cycle repeats. The initial problem was that the script would terminate as soon as sed found a match, and the behaviour I'm looking for is to print the match and go back to listening mode. – user2066480 Mar 19 '24 at 19:51
  • But why would tail be needed over just running grep? Why is it a good thing to keep the process active? – terdon Mar 19 '24 at 19:57
  • Suppose you open the clock app on your computer. Every time you open the app, a log file is deleted, created anew and populated with the time. Now imagine you open and close the clock app repeatedly. At the same time, you want a terminal window open that prints the time saved in the log file. It makes sense to me to use tail, but perhaps there is a way to use grep that also continuously monitors the log file for changes and prints the time every time the clock app is opened? – user2066480 Mar 19 '24 at 20:45
  • Can't you just use tail -F FILE? It works even for files that don't exist yet. – paladin Mar 20 '24 at 00:37
  • Have you checked the --follow=name option? – Archemar Mar 20 '24 at 12:40

3 Answers


It seems to me that you don't need the outer loop. Piping tail output into read might let you run a kind of state machine with two states: a normal tail state, and a new-file state in which you wait for the next match to print your "ID: " line.

I'd imagine you'd want to redirect stderr into stdout while reading, though, so that tail's "has appeared" notices reach the loop. I added a sketch to your sample.

debug_file="/path/to/debug.log"

# After the new file message appears, switch to monitoring for the regexp
state="new_file"

tail -F "$debug_file" 2>&1 | while read -r line; do
    if [ "$state" = "new_file" ]; then
        # Waiting for the first ID in this copy of the file
        id=$(echo "$line" | sed -n 's/.* etc \([0-9]\+\),.*/\1/p')
        if [ -n "$id" ]; then
            echo "ID: $id"
            state="id_printed"
        fi
    else
        # Standby: ignore log lines until tail reports a new file
        case $line in
            *"has appeared;"*"following new file"*)
                state="new_file"
                echo "$line"
                ;;
        esac
    fi
done

Edit: The approach above is needed if the regular expression can match multiple times in the file and you only want the first match. Otherwise, you could probably build a somewhat complicated one-liner using tail -F | grep | sed (you could also use awk instead of sed). Depending on your requirements, you may even be able to get away with just tail and grep, using grep's -o option.

For example, try tail -F /path/to/debug.log | grep -E '.* etc ([0-9]+),.*' and then see how you want to transform it from there.
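One possible shape for that tail -F | grep | sed pipeline is sketched below. It assumes the " etc <digits>," layout from the question. ids_from_stream is a hypothetical name. --line-buffered is a GNU and BSD grep option that makes grep emit each match immediately instead of block-buffering into the pipe:

```shell
# ids_from_stream (hypothetical helper): read log lines on stdin and print
# "ID: <n>" for every " etc <n>," occurrence.
ids_from_stream() {
    # grep -o prints only the matched part, e.g. " etc 4711,"
    grep --line-buffered -oE ' etc [0-9]+,' |
        # sed is the last stage, so its output to a terminal is already
        # line-buffered; it just reformats the match
        sed -E 's/ etc ([0-9]+),/ID: \1/'
}
```

Usage: tail -F /path/to/debug.log | ids_from_stream. Unlike the state-machine version, this prints an ID every time one appears, repeats included.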


Using TXR Lisp:

(open-tail "/path/to/debug.log")  ;; returns a stream that follows rotating log

You just read input from this stream, e.g. using something like:

(with-stream (s (open-tail "/path/to/debug.log"))
  (whilet ((ln (get-line s)))
    (if-match `@nil etc @id,@nil` ln
      (put-line `ID: @id`))))
Kaz

All you need to continuously tail a file across recreations and only output unique lines is:

tail -F "$debug_file" 2>/dev/null | awk '!seen[$0]++'
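The !seen[$0]++ idiom works because seen[$0]++ returns the count before incrementing. That count is 0 (false) the first time a line appears, so the negation is true only for a line's first occurrence. A small sketch follows. unique_lines is a hypothetical wrapper name, and fflush() is included on the assumption that the output may feed another pipe (gawk, mawk and BusyBox awk all provide it):

```shell
# unique_lines (hypothetical helper): print each distinct input line once,
# in order of first appearance, flushing after every line.
unique_lines() {
    awk '!seen[$0]++ { print; fflush() }'
}
```

Usage: tail -F "$debug_file" 2>/dev/null | unique_lines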

and if you only want to consider the parts of lines that the regexp in your sed command sed -n 's/.* etc \([0-9]\+\),.*/\1/p' would produce, you could do this (untested) using GNU awk for the 3rd arg to match():

tail -F "$debug_file" 2>/dev/null |
awk 'match($0,/.* etc ([0-9]+),/,a) && !seen[a[1]]++ { print "ID: " a[1] }'

or this using any awk:

tail -F "$debug_file" 2>/dev/null |
awk '/.* etc [0-9]+,/ && sub(/.* etc /,"") && sub(/,.*/,"") && !seen[$0]++ { print "ID: " $0 }'

and if you wanted to discard any "IDs" seen in the previous iteration of the input then you could do:

tail -F "$debug_file" 2>&1 |
awk '
    /^tail: \047.*\047 has appeared;  following new file$/ { delete seen }
    ... code from above ...
'

assuming the file you're tailing can't itself contain lines matching that ^tail:.* regexp.
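Filling in the "code from above" placeholder with the any-awk version gives roughly the following program. This is a sketch assuming GNU tail's exact notice text. new_ids_per_file is a hypothetical name. split("", seen) is used to empty the array because it works in every awk; delete seen does the same in gawk, mawk and BusyBox awk:

```shell
# new_ids_per_file (hypothetical helper): read `tail -F file 2>&1` on stdin
# and print each ID once per copy of the file.
new_ids_per_file() {
    awk '
        # tail reports a re-created file: forget the IDs seen so far
        /^tail: \047.*\047 has appeared;  following new file$/ { split("", seen); next }
        # any-awk extraction from above: strip everything but the digits
        /.* etc [0-9]+,/ && sub(/.* etc /, "") && sub(/,.*/, "") && !seen[$0]++ {
            print "ID: " $0
            fflush()
        }
    '
}
```

Usage: tail -F "$debug_file" 2>&1 | new_ids_per_file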

There's a good chance that this:

awk '/.* etc [0-9]+,/ && sub(/.* etc /,"") && sub(/,.*/,"") && !seen[$0]++ { print "ID: " $0 }'

could actually be written as something more concise like:

awk '$(X-1) == "etc" && !seen[$X]++ { print "ID: " $X+0 }'

where X is whatever space-separated field number contains the ID number. Without seeing an example of your log file, though, I don't know if that'd work or not and, if it did, what value X would have, nor exactly what the condition would look like.
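If X is not fixed, one option is to search for the "etc" field on each line instead of hard-coding its position. This is a sketch under the same hypothetical line format ("... etc 4711, ..."). ids_after_etc is an illustrative name. Adding 0 makes awk use the numeric prefix, so "4711," becomes 4711:

```shell
# ids_after_etc (hypothetical helper): find the field "etc", take the next
# field as the ID, and print each distinct ID once.
ids_after_etc() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "etc" && $(i+1) ~ /^[0-9]+,/) {
                id = $(i+1) + 0          # "4711," -> 4711
                if (!seen[id]++) { print "ID: " id; fflush() }
                break                    # only the first "etc" per line
            }
    }'
}
```

Usage: tail -F "$debug_file" 2>/dev/null | ids_after_etc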

Ed Morton