1

I am new to Unix and I have a log file which I need to analyze. Below is my sample log file:

Container:container_e182_1234
=============================
LogType:container-localizer-syslog
Log Upload Time :Thu Jun 25 12:24:45 +0100 2020
LogLength:0
Log Contents:

LogType:stderr Log Upload Time :Thu Jun 25 12:24:52 +0100 2020 LogLength:3000 Log Contents: 20/06/25 12:19:33 INFO datasources.FileScanRDD: Reading File path: hdfs://bpaiddev/dev/data/warehouse/clean/falcon/ukc/ 20/06/25 12:19:39 ERROR Exception found java.io.Exception:Not initiated at.apache.java.org.Exception(132) 20/06/25 12:19:40 INFO executor.EXECUTOR: Finished task 18.0 in stage 0.0 (TID 18),18994 bytes result sent to driver. 20/06/25 12:20:41 WARN Warning as the node is accessed without started

LogType:stdout Log Upload Time :Thu Jun 25 12:24:52 +0100 2020 LogLength:0 Log Contents:

Container:container_e182_1234

LogType:container-localizer-syslog Log Upload Time :Thu Jun 25 12:24:52 +0100 2020 LogLength:0 Log Contents:

LogType:stderr Log Upload Time :Thu Jun 25 12:24:52 +0100 2020 LogLength:3000 Log Contents:

LogType:stdout Log Upload Time :Thu Jun 25 12:24:52 +0100 2020 LogLength:0 Log Contents:

Expected output

stderr
Thu Jun 25 12:24:52 +0100 2020
3000
20/06/25 12:19:39 ERROR Exception found
java.io.Exception:Not initiated
    at.apache.java.org.Exception(132)
20/06/25 12:20:41 WARN Warning as the node is accessed without started

The output must contain only the ERROR and WARN messages, together with the other details shown above (log type, upload time and log length).

Log file:

Container:container_e182_1234
=============================
LogType:container-localizer-syslog
Log Upload Time :Thu Jun 25 12:24:45 +0100 2020
LogLength:0
Log Contents:

LogType:stderr Log Upload Time :Thu Jun 25 12:24:52 +0100 2020 LogLength:3000 Log Contents: 20/06/25 12:19:33 INFO datasources.FileScanRDD: Reading File path: hdfs://bpaiddev/dev/data/warehouse/clean/falcon/ukc/masked_data/parquet/FRAUD_CUSTOMER_INFORMATION/rcd_crt_dttm_yyyymmdd=20200523/part-0042-ed52abc2w.c000.snapp.parquet, range:0-27899, partition values :[20200523] 20/06/25 12:19:39 ERROR Exception found java.io.Exception:Not initated at.apache.java.org........ 20/06/25 12:19:40 INFO executor.EXECUTOR: Finished task 18.0 in stage 0.0 (TID 18),18994 bytes result sent to driver. 20/06/25 12:20:41 WARN Warning as the node is accessed without started

LogType:stdout Log Upload Time :Thu Jun 25 12:24:52 +0100 2020 LogLength:0 Log Contents:

Container:container_e182_1234

LogType:container-localizer-syslog Log Upload Time :Thu Jun 25 12:24:52 +0100 2020 LogLength:0 Log Contents:

LogType:stderr Log Upload Time :Thu Jun 25 12:24:52 +0100 2020 LogLength:3000 Log Contents: 20/06/25 12:19:33 INFO datasources.FileScanRDD: Reading File path: hdfs://bpaiddev/dev/data/warehouse/clean/falcon/ukc/masked_data/parquet/FRAUD_CUSTOMER_INFORMATION/rcd_crt_dttm_yyyymmdd=20200523/part-0042-ed52abc2w.c000.snapp.parquet, range:0-27899, partition values :[20200523] 20/06/25 12:19:34 INFO executor.EXECUTOR: Finished task 18.0 in stage 0.0 (TID 18),18994 bytes result sent to driver.

LogType:stdout Log Upload Time :Thu Jun 25 12:24:52 +0100 2020 LogLength:0 Log Contents:

How can I do this? Please help me solve this issue. Thanks a lot!

Lekshmi
  • 15
  • You asked the same question on the main SE site, but for Python rather than Awk. I don't think you're likely to get a different result here. If an existing solution won't work for you, try something in your preferred language and ask a more specific question when you get stuck. – Brian Z Jul 09 '20 at 16:52
  • I think using awk I can get a solution which I can use in Python, so I have asked here. No, I don't want to use tools. I just need to analyze them using code. I have asked for a different approach here. – Lekshmi Jul 09 '20 at 16:55
  • I got stuck in the above mentioned area. So I have asked for the solution . – Lekshmi Jul 09 '20 at 16:56
  • I thought of going with that approach and I found using awk I can easily create a new file. From that file I can easily analyze them – Lekshmi Jul 09 '20 at 16:57
  • It looks like you could first select the blocks with ERROR and WARN with something like this: https://stackoverflow.com/questions/19257597/find-specific-pattern-and-print-complete-text-block-using-awk-or-sed Then you would just need to remove the beginning of each line up through :. – Brian Z Jul 09 '20 at 16:58
  • But so far you haven't shown us anything except your input. Give it a try and post your code so we have something to start with, and I think you're more likely to get to the solution. – Brian Z Jul 09 '20 at 16:59
  • Is there a reason why you need awk? You can just use sed like this: sed -n 's/^.*LogType:\(.*\)$/\1/p; s/^.*Log Upload Time :\(.*\)/\1/p; s/^.*LogLength:\(.*\)$/\1/p; s/^.*\(ERROR\|WARN\).*$/\0/p' file. If not, I would write it as an answer. – Giuseppe Clemente Jul 09 '20 at 17:01
  • Thank you, I will try the above-mentioned link. Yes, I would like to include my code, but this is the beginning part, so I am out of ideas. Thanks again for your guidance! – Lekshmi Jul 09 '20 at 17:02
  • No, I can use sed as well. Thank you, Clemente. Please post it as a solution and I will try it. – Lekshmi Jul 09 '20 at 17:03
  • Does stdout have some content that we need to worry about? – unxnut Jul 09 '20 at 17:15
  • No there is no content there mate! I just need the stderr logType – Lekshmi Jul 09 '20 at 17:17
  • Is this a tab character at the start of each line, or is this a typo? – Freddy Jul 09 '20 at 17:46
  • No No it's just a typo – Lekshmi Jul 09 '20 at 17:47

4 Answers

3

You can use sed for the same purpose with the following one-liner (assuming your file is called file):

sed -n 's/^.*LogType:\(stderr\)$/\1/p; s/^.*Log Upload Time :\(.*\)/\1/p; s/^.*LogLength:\(.*\)$/\1/p; s/^.*\(ERROR\|WARN\).*$/\0/p' file

Then you can save its output somewhere using a redirection (>) to another file.

Split up to multiple lines for easier reading:

sed -n -e 's/^.*LogType:\(stderr\)$/\1/p' \
       -e 's/^.*Log Upload Time :\(.*\)/\1/p' \
       -e 's/^.*LogLength:\(.*\)$/\1/p' \
       -e 's/^.*\(ERROR\|WARN\).*$/\0/p' file

Update

The solution above doesn't exclude blocks that are not of type 'LogType:stderr', as the OP requested. Doing so requires non-local information (not on the same line), which is awkward to handle with sed alone.

The following script, which uses both awk and sed (with the awk part inspired by this post), does the job:

#!/bin/bash
file=$1
awk '{
  if($0 ~ /LogType/){
    if(hold ~ /LogType:stderr/){
      print hold;
    }
    hold=$0
  }else{
    hold=hold "\n" $0
  }
}END{
  if(hold ~ /LogType:stderr/){
    print hold
  }
}' "$file" | sed -n -e 's/^.*LogType:\(stderr\)$/\1/p' \
                 -e 's/^.*Log Upload Time :\(.*\)/\1/p' \
                 -e 's/^.*LogLength:\(.*\)$/\1/p'       \
                 -e 's/^.*\(ERROR\|WARN\).*$/\0/p'
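
The block-buffering idea can also be carried out in awk alone, without the sed stage. The following is only a sketch under stated assumptions: each field sits on its own line (as in the first block of the question), and the function name stderr_summary is made up for illustration. It buffers the three header fields of a stderr block and prints them only once the block turns out to contain an ERROR or WARN line.

```shell
# Sketch: summarise stderr blocks that contain ERROR/WARN lines.
# Assumption: one field per line; stderr_summary is a hypothetical name.
stderr_summary() {
  awk '
    /LogType:/ {                            # a new block starts here
      in_err = ($0 ~ /LogType:stderr/)      # 1 only inside a stderr block
      header = ""; printed = 0
    }
    !in_err { next }                        # ignore every other block
    /LogType:|Log Upload Time :|LogLength:/ {
      sub(/^[^:]*:/, "")                    # drop the label up to the first colon
      header = header $0 "\n"               # buffer until ERROR/WARN shows up
      next
    }
    / (ERROR|WARN) / {
      if (!printed) { printf "%s", header; printed = 1 }
      print
    }
  ' "$@"
}

# Usage: stderr_summary logfile > summary.txt
```

Because the header is printed lazily, a stderr block with no ERROR or WARN line produces no output at all.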
1

I am able to do it using a short script. The original log is contained in the file logdata.

#!/bin/bash

tmpfile=$(mktemp)   # safer than /tmp/$0.$$, which breaks when $0 contains a path

sed -n '/stderr/,/^ *$/p' logdata > "$tmpfile"
sed -n 's/^.*LogType:\(.*\)/\1/p
        s/^.*Log Upload Time :\(.*\)/\1/p
        s/^.*LogLength:\(.*\)/\1/p' "$tmpfile"
grep -E "(ERROR|WARN)" "$tmpfile"
rm "$tmpfile"

First, we extract the stderr block into a temporary file. Then we take out the three header fields, and finally we grep for the errors and warnings. I tried to connect the last two steps using tee but did not succeed.

I could do it without the temporary file as follows:

sed -n '/stderr/,/^ *$/p' logdata | \
sed -n 's/^.*LogType:\(.*\)/\1/p
        s/^.*Log Upload Time :\(.*\)/\1/p
        s/^.*LogLength:\(.*\)/\1/p
        /ERROR/p
        /WARN/p'
unxnut
  • 6,008
  • Thanks a lot mate! But I was able to get only the error/warning statements '20/06/25 12:19:34 ERROR executor.EXECUTOR: Finished task 18.0 in stage 0.0 (TID 18),18994 bytes result sent to driver'. I was not able to get the other details like LogType, Log Length – Lekshmi Jul 09 '20 at 18:34
  • Do you have the missed fields not starting in column 1? If so, you can change the regex by inserting .* after ^, for example, s/^.*LogType:\(.*\)/\1/p. I'll fix the answers. – unxnut Jul 09 '20 at 19:46
  • Thank you I got the answer .But it is also giving the stdout also in addition to stderr . Can you check with that please ? – Lekshmi Jul 10 '20 at 04:19
  • Yes, you have spaces in your blank line before stdout. You can remove that or look for it. Modifying the answer for that as well. All I do is to replace ^$ with ^ *$. – unxnut Jul 10 '20 at 12:24
  • Thanks a lot for your inputs! – Lekshmi Jul 10 '20 at 14:03
1

With awk:

awk '
  /LogType:stderr/ || (p && /Log( Upload Time|Length)/){
    p=1                    # set flag for stderr block
    sub(/^[^:]+:/, "")     # replace content before `:` including `:`
    print                  # print (modified) line
  }
  p && / (WARN|ERROR) /{ 
    sub(/^[^0-9]*/, "")    # remove unknown prefix
    print
  }  
  /LogType:stdout/{ exit } # exit the script
' file
Freddy
  • 25,565
  • Thanks Freddy for the code. But should I add anything to this code because currently it is not giving any output . – Lekshmi Jul 09 '20 at 18:37
  • Not really. file should be your logfile, that's all. Does it work if you copy & paste the example from your question above (again) into a file? – Freddy Jul 09 '20 at 18:45
  • Yeah I tried that but it is not showing any output . It runs without any error but gives no output – Lekshmi Jul 09 '20 at 18:52
  • Can you try awk '/^LogType:stderr/ || /^Log( Upload Time|Length)/' file just for testing? If that doesn't print anything, there's something wrong with the logfile format. – Freddy Jul 09 '20 at 19:00
  • Yes It doesn't print anything mate – Lekshmi Jul 09 '20 at 19:01
  • Can you upload a sample somewhere? – Freddy Jul 09 '20 at 19:16
  • You need to add .* after ^ as I suggested in my answer. – unxnut Jul 09 '20 at 19:48
  • Removed ^ and added a sub() for the WARN and ERROR lines. It's still not clear how your lines start (whitespace, tab?), please try again. – Freddy Jul 09 '20 at 20:06
  • Thanks Freddy I am able to get the output! But it also prints the logUpload time and loglength of every Type which is coming after the stderr. How to remove that ? – Lekshmi Jul 10 '20 at 04:32
  • I can only guess since your input is different and replaced the last line of the script. Please try again. – Freddy Jul 10 '20 at 13:16
1

Using GNU sed and its extended regex mode:

sed -Ee '
  /LogType:stderr/,/^\s*$/!d
  /Log Contents:/,/^\s*$/!{
    s/^[^:]*://;b
  }
  /\s(ERROR|WARN)\s/!d
' logfile

Explanation:

  • We partition the file into ranges (from a LogType:stderr line to the next blank line) and then subdivide each range into a pre block (before Log Contents:) and a post block (from Log Contents: onward).

  • In the pre block, strip everything up to and including the first colon. The b command then branches to the end of the script, so the stripped line is printed right away.

  • In the post block, delete every line that does not contain ERROR or WARN surrounded by whitespace. The lines that remain are printed.
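
As a minimal illustration of the range-plus-!d idiom used in the first command of the script, here is a sketch on made-up input. The marker START and the function name keep_range are invented for the example; they do not come from the log format.

```shell
# Sketch: keep only the lines inside a "START ... blank line" range.
# keep_range and the START marker are hypothetical, for illustration only.
keep_range() {
  # The address range selects START through the next blank line;
  # "!d" deletes every line that is NOT inside such a range.
  sed '/^START/,/^[[:space:]]*$/!d'
}

printf '%s\n' 'before' 'START header' 'body' '' 'after' | keep_range
```

The lines "before" and "after" are dropped, while "START header", "body" and the closing blank line survive, which is how the script discards everything outside the stderr block.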

Results:

stderr
Thu Jun 25 12:24:52 +0100 2020
3000
20/06/25 12:19:39 ERROR Exception found
20/06/25 12:20:41 WARN Warning as the node is accessed without started
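
The results above contain only the first line of each ERROR entry, whereas the expected output in the question also lists the lines that follow it (the exception and its "at ..." line). One possible way to keep those continuation lines is sketched below. It rests on an assumption that is not confirmed by the thread: every new log entry starts with a "yy/mm/dd hh:mm:ss" timestamp, and any other non-blank line belongs to the entry before it. The function name error_entries is made up.

```shell
# Sketch: print ERROR/WARN entries together with their continuation lines.
# Assumption: new entries start with a "yy/mm/dd hh:mm:ss " timestamp.
# error_entries is a hypothetical name.
error_entries() {
  awk '
    /^[ \t]*$/ || /LogType:/ { keep = 0; next }   # block boundary: stop keeping
    /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] / {
      keep = ($0 ~ / (ERROR|WARN) /)              # a new entry decides afresh
    }
    keep                                          # print kept entries and their continuations
  ' "$@"
}

# Usage: error_entries logfile
```

Header lines such as "Log Upload Time" are not printed by this sketch, so it would have to be combined with one of the header-extracting commands above.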

If you also need the line numbers of the error/warning messages, use the sed commands below, which are modified from the ones above:

sed -Ee '
  /LogType:stderr/,/^\s*$/!d
  /Log Contents:/,/^\s*$/!{
    s/^[^:]*://;b
  }
  /\s(ERROR|WARN)\s/!d
  p;=;d
' logfile |
sed -Ee '/\s(ERROR|WARN)\s/N;s/\n/ on line #/'

You can also use other tools, such as awk or Perl, for this job. Note: remove trailing spaces from blank lines first.

awk '
  BEGIN {
    RS = "\n\n"              
    FS = "\nLog Contents:\n" 
    OFS = "\n"               
    ORS = OFS                
    spc = "[[:blank:]]" 
    str = "(ERROR|WARN)" 
    pat = spc str spc 
  }
  /^LogType:stderr/ &&
  NF == 2 {
    p = $1; gsub(/(^|\n)[^:]+:/, "\n", p);sub(/./, "", p) 
    N = split($2, a, /\n/)
    print p
    for ( i=1; i<=N; i++ ) 
      if ( a[i] ~ pat ) 
        print a[i]
  }
' logfile

perl -F'/^Log\hContents:$/m,$_,2' -00 -ne '
  next if ! /\ALogType:stderr$/m;
  (my $pre = $F[0])=~ s/.*?://gm;
  my $post = join "\n",
    grep { /\s(?:ERROR|WARN)/ }
    split /\n/, $F[1];
  print($pre,$post);
' logfile
  • Thanks a lot Rakesh for your support ! I am able to get the result – Lekshmi Jul 10 '20 at 07:05
  • Hi Rakesh, I was able to print only one block of stderr. Why is that? Should I make any changes to the code? – Lekshmi Jul 10 '20 at 14:22
  • Yes I have shared in the question – Lekshmi Jul 10 '20 at 14:32
  • OK, fine. Now I understand the code. Also, I think it prints only the first line of the error message, not the other lines of that error message. Is it possible to get those too? I have updated my ERROR log in the question. Suppose we have multiple lines for an ERROR message: will this code work? – Lekshmi Jul 10 '20 at 14:53
  • Sorry my bad! I actually took a part of my log file and posted .Sorry Rakesh! I will update in the question – Lekshmi Jul 10 '20 at 15:14
  • Yes I have added. If there is no ERROR /WARN messages it leaves them out. It just prints the stderr which has exactly these messages which was also my requirement. – Lekshmi Jul 10 '20 at 15:42
  • No mate It prints only the first line of the ERROR message and also prints the second stderr block which has no content in it. Your previous code did well compared to this. – Lekshmi Jul 10 '20 at 16:23
  • I have put my expected output in the above question . – Lekshmi Jul 10 '20 at 16:31
  • It is printing only the first line of the ERROR not the next line of error. (i.e) it is printing 'ERROR exception found' not the next lines of that error message (i.e) java.io.exception.... – Lekshmi Jul 10 '20 at 16:39