13

I am trying to search a log file for logged activities that did not complete. For example, I log a "Starting activity for ID 1234..." and if successful, the next line will be "Activity 1234 Completed."

I'm trying to get the "Starting..." lines that are NOT followed by their corresponding "Completed" lines.

Example Log File

Starting activity for ID 1234
ID 1234 completed successfully
Starting activity for ID 3423
ID 3423 completed successfully
Starting activity for ID 9876
ID 9876 completed successfully
Starting activity for ID 99889
ID 99889 completed successfully
Starting activity for ID 10011
ID 10011 completed successfully
Starting activity for ID 33367
Starting activity for ID 936819
ID 936819 completed successfully

In this example, I would be looking for the output to be:

Starting activity for ID 33367

...because it's not followed by a "completed" line.

I've tried doing this with grep and awk, but have not had much success. I'm assuming it can be done with one of those tools, but my grep and awk chops are not advanced.

Looking for a quick and reliable grep or awk pattern to give the results I need here.

  • I don't think it's easy with grep + awk, but can you explain a bit about why you are doing that? Do you want an output of all running activities, e.g. succeeded or not finished? – daisy Jul 30 '12 at 14:31
  • @warl0ck, I'm looking for the "not finished". – PattMauler Jul 30 '12 at 15:35

4 Answers

11

Here is an awk alternative:

awk '
  /^Starting/ { I[$5] = $0                  }
  /^ID/       { delete I[$2]                }
  END         { for (key in I) print I[key] }
' infile

Output:

Starting activity for ID 33367

The associative array I keeps track of the IDs that have started but not yet completed; whatever is left in it at the end of the input is printed.
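
One thing to keep in mind: awk does not specify the iteration order of for (key in I), so with several unfinished activities the lines may not come out in log order. Below is a minimal sketch of an order-preserving variant. The function name unfinished, the arrays line, order and last, and the sample data (including ID 555) are invented for illustration and are not part of the original answer.

```shell
# Hypothetical sample log; ID 555 is invented to give a second unfinished activity.
sample_log='Starting activity for ID 1234
ID 1234 completed successfully
Starting activity for ID 33367
Starting activity for ID 936819
ID 936819 completed successfully
Starting activity for ID 555'

# unfinished: read a log on stdin and print the "Starting" lines that have
# no matching "completed" line, in the order they appear in the log.
unfinished() {
  awk '
    /^Starting/ { line[$5] = $0; order[++n] = $5; last[$5] = n }  # remember line and position
    /^ID/       { delete line[$2] }                               # completed: forget it
    END {
      for (i = 1; i <= n; i++) {
        id = order[i]
        # print only IDs still pending, at the position of their latest start
        if ((id in line) && last[id] == i) print line[id]
      }
    }
  '
}

result=$(printf '%s\n' "$sample_log" | unfinished)
printf '%s\n' "$result"
```

This costs extra memory for the order array (one entry per "Starting" line), so the original version is probably the better fit when output order does not matter.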

Thor
  • 17,182
  • This works really well, as it even seems to accommodate situations where the "Starting..." and "Completed..." log lines are not adjacent/sequential. Thanks @Thor! – PattMauler Jul 30 '12 at 18:23
  • You're welcome. This should work efficiently with (almost) arbitrary-size input, as it only ever stores entries for activities that have not yet completed, and lookup time is O(1). – Thor Jul 30 '12 at 20:21
  • Nice. Only one thing: as I learned from @RobertL (http://unix.stackexchange.com/a/243550/135943) you don't need to assign a value to create an array element. So instead of I[$5] = 1, you can just use I[$5]. (You don't care about the value, you just want to make the element exist, and simply naming it accomplishes that.) – Wildcard Dec 09 '15 at 01:29
  • @Wildcard: You are right, but after reviewing the OP's question and the grep like output he is after, it is more appropriate to remember the whole line and output that at the end. – Thor Dec 09 '15 at 13:51
3
sed '$!N;/\n.*completed/d;P;D' <input

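This deletes from the output every input line that is followed by a line matching the string completed, along with that completed line, so only the lines that are not followed by a completed line are printed.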
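
To see how this one-liner behaves, here is a small run on an excerpt of the question's log, with each command annotated. It is only a sketch: the variable names sample_log and sed_result are made up, and the approach assumes each "completed" line directly follows its "Starting" line (the pattern does not compare IDs).

```shell
# Excerpt of the sample log from the question.
sample_log='Starting activity for ID 10011
ID 10011 completed successfully
Starting activity for ID 33367
Starting activity for ID 936819
ID 936819 completed successfully'

# $!N               unless on the last line, append the next line to pattern space
# /\n.*completed/d  if the appended line contains "completed", drop both lines
# P                 otherwise print the first line of pattern space
# D                 delete that first line and restart the cycle with the rest
sed_result=$(printf '%s\n' "$sample_log" | sed '$!N;/\n.*completed/d;P;D')
printf '%s\n' "$sed_result"
```

On this input the only line printed should be the one for ID 33367.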

mikeserv
  • 58,310
2

Here's how you could do it with GNU sed:

sed -r 'N; /([0-9]+)\n\w+\s+\1/d; P; D' infile
  • N reads one more line into pattern space.
  • The regex checks whether both lines carry the same ID; if so, pattern space is deleted (d) and the cycle is restarted.
  • If it didn't match, print out the first line in pattern space (P) and delete it (D).
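
The steps above can be tried on the question's log as follows. This is a sketch that assumes GNU sed (-r, \w and \s are GNU extensions) and adjacent Starting/completed lines; the variable names full_log and gnu_result are invented for the example.

```shell
# The sample log from the question.
full_log='Starting activity for ID 1234
ID 1234 completed successfully
Starting activity for ID 3423
ID 3423 completed successfully
Starting activity for ID 9876
ID 9876 completed successfully
Starting activity for ID 99889
ID 99889 completed successfully
Starting activity for ID 10011
ID 10011 completed successfully
Starting activity for ID 33367
Starting activity for ID 936819
ID 936819 completed successfully'

# ([0-9]+) captures the ID at the end of the first line; \1 requires the same
# digits after the first word of the second line, so only pairs with matching
# IDs are deleted.
gnu_result=$(printf '%s\n' "$full_log" | sed -r 'N; /([0-9]+)\n\w+\s+\1/d; P; D')
printf '%s\n' "$gnu_result"
```

Unlike a pattern that only looks for the word completed, this one compares the IDs, at the cost of depending on GNU-specific regex syntax.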
Thor
  • 17,182
1

If your installation supports pcregrep, the multiline (-M) option comes in handy.

pcregrep -M -o '\AStarting activity for ID (\d+)\n(?!ID \1)' t.z

Starting activity for ID 33367

iruvar
  • 16,725