
I tried my luck with grep and sed but somehow I can't manage to get it right.

I have a log file which is about 8 GB in size. I need to analyze a 15-minute time period of suspicious activity. I located the part of the log file that I need to look at, and I am trying to extract those lines and save them to a separate file. How would I do that on a regular CentOS machine?

My last try was the following, but it didn't work. I am at a loss when it comes to sed and those types of commands.

sed -n '2762818,2853648w /var/log/output.txt' /var/log/logfile
Kolja

3 Answers

sed -n '2762818,2853648p' /var/log/logfile > /var/log/output.txt

p is for print: it prints the addressed lines, and -n suppresses sed's default output, so only lines 2762818 through 2853648 end up in the redirected file.
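As a quick sanity check of the same pattern on a small file (the /tmp paths here are made up for illustration):

```shell
# Build a 10-line stand-in for the log, then extract lines 4 through 6.
seq 10 > /tmp/demo_log
sed -n '4,6p' /tmp/demo_log > /tmp/demo_out
cat /tmp/demo_out    # prints: 4 5 6
```

The shell's > redirection creates the output file if it does not already exist, and truncates it if it does.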

thiagowfx
  • Thanks. Is there a way to tell sed to make a new file if none exists? Right now I am getting sed: can't read /var/log/output.txt: No such file or directory. I can of course just make a file, but for the sake of learning, I would like to know how to do it automatically. – Kolja Jan 05 '15 at 15:28
  • This command will create the file /var/log/logfile automatically if it doesn't exist. It will even replace it if it already exists. The point is: /var/log/logfile is the file which will have the lines you wanted. Now, the file you want to read from is /var/log/output.txt: I just copied your example. It seems like you are trying to read from a file that doesn't exist. You should replace it with the actual path of the log file you want to read. – thiagowfx Jan 05 '15 at 15:34
  • Oops, you are right. I hadn't bothered to read the names of the files; I just assumed the one on the left was input and the one on the right was output. I'll update my answer. – thiagowfx Jan 05 '15 at 17:06

Probably the best way to do this is with shell redirection, as others have mentioned. sed, though a personal favorite, is probably not going to do this as efficiently as head, which is designed to grab only so many lines from a file.

There are other answers on this site which demonstrate that, for large files, head -n [num] | tail -n [num] will outperform sed every time, but probably even faster still is to eschew the pipe altogether.
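For reference, the piped form that claim refers to, sketched on a tiny file (the /tmp path is hypothetical; the comment shows how it would look with the question's real numbers):

```shell
# On the real log it would be:
#   head -n 2853648 /var/log/logfile | tail -n +2762818 > /var/log/output.txt
# Scaled down: lines 4 through 6 of a 10-line file.
seq 10 > /tmp/tiny
head -n 6 /tmp/tiny | tail -n +4    # prints: 4 5 6
```

head stops reading at the upper bound, so nothing past line 2853648 is ever touched; tail -n +N then keeps everything from line N onward.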

I created a file like:

echo | dd cbs=5000000 conv=block | tr \  \\n >/tmp/5mil_lines

And I ran it through:

{ head -n "$((ignore=2762817))" >&2
  head -n "$((2853648-ignore))" 
} </tmp/5mil_lines 2>/dev/null  |
sed -n '1p;$p'                

I used sed there only to grab the first and last lines, to show you...

2762818
2853648

This works because when you group commands with { ... ; } and redirect input for the group like ... ; } <input, all of them share the same input. Most commands exhaust the whole infile while reading it, so in a { cmd1; cmd2; } <infile case cmd1 usually reads from the head of the infile to its tail and cmd2 is left with nothing.
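Here is a minimal sketch of that sharing (it assumes GNU head, which repositions a seekable descriptor to just past what it printed; the /tmp path is made up):

```shell
seq 6 > /tmp/shared_demo
{ head -n 2     # consumes lines 1-2, leaves the offset after line 2
  head -n 2     # resumes at that offset: prints lines 3-4
} < /tmp/shared_demo
# prints: 1 2 3 4
```

On a pipe rather than a regular file, head may buffer ahead and the second read would lose data, which is why the input here is a redirected file.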

head, however, will always seek only so far through its infile as it is instructed to do, and so in a...

{ head -n [num] >/dev/null
  head -n [num]
} <infile 

...case the first seeks through to [num] and dumps its output to /dev/null and the second is left to begin its read where the first left it.

You can do...

{ head -n "$((ignore=2762817))" >/dev/null
  head -n "$((2853648-ignore))" >/path/to/outfile
} <infile

This construct also works with other kinds of compound commands. For example:

set "$((n=2762817))" "$((2853648-n))"
for n do head "-n$n" >&"$#"; shift
done </tmp/5mil_lines 2>/dev/null | 
sed -n '1p;$p'

...which prints...

2762818
2853648

But it might also work like:

d=$(((  n=$(wc -l </tmp/5mil_lines))/43 ))      &&
until   [ "$(((n-=d)>=(!(s=143-n/d))))" -eq 0 ] &&
        head "-n$d" >>"/tmp/${s#1}.split"
do      head "-n$d" > "/tmp/${s#1}.split"       || ! break
done    </tmp/5mil_lines

Above the shell initially sets the $n and $d variables to ...

  • $n
    • The line count as reported by wc for my test file /tmp/5mil_lines
  • $d
    • The quotient of $n/43 where 43 is just some arbitrarily selected divisor.

It then loops until it has decremented $n by $d to a value less than $d. While doing so, it keeps its split count in $s and uses that value to name the > output file /tmp/[num].split. The result is that each iteration reads an equal number of \newline-delimited lines from its infile out to a new outfile, splitting it equally 43 times over the course of the loop. And it manages this without reading its infile more than 2 times: the first is when wc counts its lines; for the rest of the operation it only reads as many lines as it writes to the outfile each time.
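A simplified sketch of the same idea, with a plain counter instead of the compressed arithmetic, 4 chunks instead of 43, and no remainder handling (the /tmp names are hypothetical):

```shell
seq 100 > /tmp/tiny_lines                  # 100-line test file
d=$(( $(wc -l < /tmp/tiny_lines) / 4 ))    # 25 lines per chunk
i=0
while [ "$i" -lt 4 ]; do
    i=$((i + 1))
    head -n "$d" > "/tmp/chunk_$i"         # each head resumes where the last stopped
done < /tmp/tiny_lines
tail -n1 /tmp/chunk_*                      # last line of each chunk: 25, 50, 75, 100
```

The whole loop shares one redirected descriptor, so each head picks up exactly where the previous one left off, just as in the two-head construct above.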

After running it I checked my results like...

tail -n1 /tmp/*split | grep .

OUTPUT:

==> /tmp/01.split <==
116279  
==> /tmp/02.split <==
232558  
==> /tmp/03.split <==
348837  
==> /tmp/04.split <==
465116  
==> /tmp/05.split <==
581395  
==> /tmp/06.split <==
697674  
==> /tmp/07.split <==
813953  
==> /tmp/08.split <==
930232  
==> /tmp/09.split <==
1046511 
==> /tmp/10.split <==
1162790 
==> /tmp/11.split <==
1279069 
==> /tmp/12.split <==
1395348 
==> /tmp/13.split <==
1511627 
==> /tmp/14.split <==
1627906 
==> /tmp/15.split <==
1744185 
==> /tmp/16.split <==
1860464 
==> /tmp/17.split <==
1976743 
==> /tmp/18.split <==
2093022 
==> /tmp/19.split <==
2209301 
==> /tmp/20.split <==
2325580 
==> /tmp/21.split <==
2441859 
==> /tmp/22.split <==
2558138 
==> /tmp/23.split <==
2674417 
==> /tmp/24.split <==
2790696 
==> /tmp/25.split <==
2906975 
==> /tmp/26.split <==
3023254 
==> /tmp/27.split <==
3139533 
==> /tmp/28.split <==
3255812 
==> /tmp/29.split <==
3372091 
==> /tmp/30.split <==
3488370 
==> /tmp/31.split <==
3604649 
==> /tmp/32.split <==
3720928 
==> /tmp/33.split <==
3837207 
==> /tmp/34.split <==
3953486 
==> /tmp/35.split <==
4069765 
==> /tmp/36.split <==
4186044 
==> /tmp/37.split <==
4302323 
==> /tmp/38.split <==
4418602 
==> /tmp/39.split <==
4534881 
==> /tmp/40.split <==
4651160 
==> /tmp/41.split <==
4767439 
==> /tmp/42.split <==
4883718 
==> /tmp/43.split <==
5000000 
mikeserv
  • @don_crissti - wait, what? tac would have to eat the whole file - just like tail, I would guess - but I would think that if you did the head thing first, you should be able to reverse only the latter portion of the file. Is that not what happens? Sorry, this just caught me by surprise. But the more I look at it, the more interesting a notion it is. – mikeserv Mar 17 '15 at 21:17
  • @don_crissti - more and more interesting... I'm going to try an strace. Oh wait a minute - tac must be testing stdin to check for a seekable input and rewinding the descriptor - it's the only thing that makes sense to me. I'll check it with strace, though. That, by the way, would be bad behavior, I think. – mikeserv Mar 17 '15 at 21:31
  • @don_crissti - Yeah - it's doing lseek(): [pid 6542] lseek(0, 0, SEEK_END) = 551 [pid 6542] ioctl(0, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff51f3a940) = -1 ENOTTY (Inappropriate ioctl for device) [pid 6542] lseek(0, 0, SEEK_END) = 551 [pid 6542] lseek(0, 0, SEEK_SET) = 0 - not bad behavior, per se, considering what tac is supposed to do, but abnormal, and contrary to what the standard utilities should do. – mikeserv Mar 17 '15 at 21:45
  • @don_crissti - not at all. – mikeserv Mar 17 '15 at 21:59
  • Mike, this is totally unrelated to your answer here but I'm just curious... if you read the question again, why wouldn't OP's "last try" work ? Are there sed's out there that don't support w ? The accepted answer does pretty much the same only with p and >... Makes no sense... – don_crissti Jul 05 '15 at 02:59
  • @don_crissti - I think it should have. some old seds - Solaris - will refuse to count that high, though. The line numbers must go that high as well. And you need write permissions. And the required space. All of that's just the obvious. It looks good to me - maybe OP just didn't like to wait for it. So I dunno. I never know with this place. – mikeserv Jul 05 '15 at 03:19
  • @don_crissti - i >= j = 1-line i address; j > i = regular 2-line i,j address; j > $ && i <= $ = 2-line i,$ address; i > $ && j > $ = 2-line no-match address; i < 1 || j < 1 = syntax error (excepting GNU's 0,... address form). Thinking about it, though, I'm not certain if j > $ is definitely equivalent to i,$. One way I can imagine that they might differ is w/ c. Try i,$ctext and i,$+1ctext - i think it might behave differently. Or maybe not. – mikeserv Jul 05 '15 at 17:54

You could accomplish this with a combination of the head and tail commands, as below.

head -n{to_line_number} logfile | tail -n+{from_line_number} > newfile

Replace the from_line_number and to_line_number with the line numbers you desire.

Testing

cat logfile
This is first line.
second
Third
fourth
fifth
sixth
seventh
eighth
ninth
tenth

## Using the command below, I extract from the 4th line to the 10th line.

head -n10 logfile | tail -n+4 > newfile
cat newfile
fourth
fifth
sixth
seventh
eighth
ninth
tenth
Ramesh