From this answer I have reduced a log file to this:

Timestamp:1359021601 2013-01-17 15:00:01
size: 10G   /mnt/SolrFiles/solr/api/
Timestamp:1359025201 2013-01-17 16:00:01
size: 11G   /mnt/SolrFiles/solr/api/
...snip hundreds of lines...
Timestamp:1359021601 2013-01-24 10:00:01
size: 11G   /mnt/SolrFiles/solr/api/
Timestamp:1359025201 2013-01-24 11:00:01
size: 11G   /mnt/SolrFiles/solr/api/
Timestamp:1359028801 2013-01-24 12:00:01
size: 11G   /mnt/SolrFiles/solr/api/
Timestamp:1359032401 2013-01-24 13:00:01
size: 12G   /mnt/SolrFiles/solr/api/

That pattern will carry on for hundreds of lines. I would like to reduce the file to only show the Timestamps and sizes when the size changes, like this:

Timestamp:1359021601 2013-01-17 15:00:01
size: 10G   /mnt/SolrFiles/solr/api/
Timestamp:1359025201 2013-01-17 16:00:01
size: 11G   /mnt/SolrFiles/solr/api/
Timestamp:1359032401 2013-01-24 13:00:01
size: 12G   /mnt/SolrFiles/solr/api/

Can this be accomplished using common Linux CLI tools such as grep and sed?

dotancohen
  • What about using a few lines of Perl or Python to get this solved? – wollud1969 Jan 24 '13 at 14:58
  • I'm open to all solutions, but ideally I would be able to add the solution as a bash alias without installing anything to $HOME/bin. Even better would be a one-liner that I could bang out from memory and impress my colleagues with! I'll be using this from a few terminals SSHed into a few different servers. – dotancohen Jan 24 '13 at 15:01

That's a typical job for awk:

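awk '/^Timestamp/{t=$0; next}           # save the latest Timestamp line, print nothing yet
     /^size/ && $2 != last_size {       # a size line whose size (field 2) has changed:
        print t                         # print the saved Timestamp line,
        print                           # then the size line itself,
        last_size = $2                  # and remember the new size
     }'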
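As a quick check, the sketch below feeds a handful of lines copied from the question's sample (the 10G, 11G, 11G, 12G entries) through the same awk program. The variable names `sample` and `result` are only for illustration.

```shell
# A few lines copied from the log sample in the question
sample='Timestamp:1359021601 2013-01-17 15:00:01
size: 10G   /mnt/SolrFiles/solr/api/
Timestamp:1359025201 2013-01-17 16:00:01
size: 11G   /mnt/SolrFiles/solr/api/
Timestamp:1359028801 2013-01-24 12:00:01
size: 11G   /mnt/SolrFiles/solr/api/
Timestamp:1359032401 2013-01-24 13:00:01
size: 12G   /mnt/SolrFiles/solr/api/'

# Run the sample through the awk program from the answer
result=$(printf '%s\n' "$sample" | awk '/^Timestamp/{t=$0; next}
     /^size/ && $2 != last_size {
        print t
        print
        last_size = $2
     }')

printf '%s\n' "$result"
```

This should print the six lines of the desired output from the question: the 12:00:01 entry is dropped because its size is still 11G.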

If you want to make it obscure and concise, you could do:

awk '!(/^T/&&t=$0)&&$2!=l&&(l=$2)&&$0=t RS$0'
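To see how the one-liner works, here is the same program spread over several lines, with one comment per clause. It relies on `&&` short-circuiting, on an assignment evaluating to the assigned value, and on a pattern with no action printing `$0`. The function name `reduce_concise` is hypothetical and only makes the sketch reusable; like the original, it assumes the size field is never a value awk treats as false (such as a bare `0`).

```shell
# Hypothetical wrapper around the concise awk program from the answer
reduce_concise() {
  awk '
    !(/^T/ && t = $0) &&   # a Timestamp line is saved in t; the ! makes the test false, so it is not printed
    $2 != l &&             # on any other line, continue only if the size field changed
    (l = $2) &&            # remember the new size; a non-empty string such as 10G counts as true
    $0 = t RS $0           # prepend the saved Timestamp (RS is a newline); the non-empty result is true
  ' "$@"                   # no action block, so awk prints the modified $0 whenever the test is true
}
```

Usage is the same as for the readable version, e.g. `reduce_concise hourly.log` or piping the reduced log into it.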
  • Cool! I would have done it using Perl, but this is definitely better. – wollud1969 Jan 24 '13 at 15:22
  • Thanks, Stephane! Is there any way to condense this into a single line of bash? If not then no worry. – dotancohen Jan 24 '13 at 15:37
  • @dotancohen you can put the above awk on a single line, but leaving the newlines in won't prevent you from running it in an interactive shell. – jordanm Jan 24 '13 at 15:40
  • When I put it in an alias or shell script, I get this error: awk: line 1: syntax error at or near != awk: line 1: syntax error at or near. For instance: alias grow="cat hourly.log | sed -n '/Timestamp/{p;n;n;n;n;n;n;n;n;n;p}' | awk '/^Timestamp/{t=$0; next} ; /^size/ && $2 != last_size { print t ; print ; last_size = $2 }' | tail". On the CLI it works perfectly. Any idea why? – dotancohen Jan 24 '13 at 16:47
  • @dotancohen, Don't use aliases, use functions. It doesn't work above because the $2, $0 are inside double quotes so get expanded by the shell. Run alias alone to see. – Stéphane Chazelas Jan 24 '13 at 17:29
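Following that advice, the `grow` alias can be written as a shell function, sketched below for bash or another POSIX shell. The name `grow`, the file `hourly.log` and the sed step are taken from the alias; reading the file directly instead of through `cat`, and accepting another file name as an optional first argument, are additions of this sketch. Because there are no outer double quotes, the awk program stays in single quotes and the shell never expands `$2` or `$0`.

```shell
# Function version of the "grow" alias; $1 optionally overrides hourly.log
grow() {
  # The sed step prints each Timestamp line plus the line nine lines below it,
  # which is assumed (as in the alias) to be the size line of the raw log.
  # The awk step keeps only the entries where the size changed, and tail
  # shows the last ten lines of that result.
  sed -n '/Timestamp/{p;n;n;n;n;n;n;n;n;n;p}' "${1:-hourly.log}" |
    awk '/^Timestamp/{t=$0; next}
         /^size/ && $2 != last_size { print t; print; last_size = $2 }' |
    tail
}
```

Defined in `~/.bashrc`, it can then be called as `grow` (or `grow other.log`) from any terminal.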
  • Perfect, thank you, it works as a function. Have a peaceful weekend! – dotancohen Jan 25 '13 at 12:13