fast ways of removing beginning lines from large text file

Question

I have a big text file (>500GB). Every method I can find (sed, tail, and others) requires writing the full 500GB of content back to disk. Is there any way to quickly remove the first few lines in place, without writing 500GB to disk?

There is no way to efficiently remove things from the start of a file. — don_crissti, Feb 16 '17 at 23:44
Good find, don. I was about to suggest ed, but the other Q covers it. — Jeff Schaller, Feb 16 '17 at 23:51
Thank you for pointing that out! How about removing the last line? I see it says removing the last line can be very fast, but it doesn't say how. @don_crissti — 1a1a11a, Feb 17 '17 at 00:22
Well, if you know the size in bytes you can truncate the file. For your actual problem there's also this approach... — don_crissti, Feb 17 '17 at 00:29
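The truncation hint in the last comment can be sketched roughly as follows. This is only an illustration, assuming GNU coreutils (`truncate -s -N` shrinks a file by N bytes) and a regular, seekable file; the function name `remove_last_line` and the scratch file are made up for the example. It handles the last line only and does nothing for lines at the start of the file.

```shell
#!/bin/sh
# Hypothetical helper: drop the last line of a file by shrinking it in place.
remove_last_line() {
    file=$1
    # tail seeks to the end of a regular file, so only the last line is read.
    # The $(( )) wrapper strips any padding some wc implementations emit.
    last_len=$(( $(tail -n 1 "$file" | wc -c) ))
    # A leading "-" tells GNU truncate to shrink the file by that many bytes.
    truncate -s "-$last_len" "$file"
}

# Demo on a small scratch file (a stand-in for the 500GB one).
demo=$(mktemp)
printf '1\n2\n3\n' > "$demo"
remove_last_line "$demo"
cat "$demo"
```

Because only the file length changes, none of the remaining data is rewritten, which is why this is fast at the end of a file and has no equivalent at the beginning.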

guile · Answer 1 · 2017-02-16T23:25:53.553

0

Use the tail command like this:

# tail -n +<first line to keep> filename

For example, to drop the first 1000 lines, start the output at line 1001:

tail -n +1001 hugefile.txt > hugefile-wo-the-first-1000-lines.txt

That's all. For more information: https://es.wikipedia.org/wiki/Tail

BTW: Don't be fooled if someone tells you this is exactly the opposite of what you want to do. I've tested it:

$ tail -n +3 /tmp/test 
3
4
5

$ cat /tmp/test 
1
2
3
4
5

edited Feb 16 '17 at 23:25

answered Feb 16 '17 at 23:17

guile


This is exactly what the OP does not want to do. – don_crissti Feb 16 '17 at 23:20

This method needs to write 500GB of data to the disk; my question is how to remove the first few lines in place without writing so much data. – 1a1a11a Feb 17 '17 at 00:16

edaemon · Accepted Answer · 2017-02-16T23:46:56.710

0

You can use sed to delete lines in place with the -i option:

$ cat foo.txt
bar
baz
lorem
$ sed -i '1d' foo.txt
$ cat foo.txt
baz
lorem

You can also delete a range of lines; for example sed -i '1,4d' foo.txt will remove lines 1-4.

EDIT: as don pointed out in the comments, the -i option still creates a copy.

edited Feb 16 '17 at 23:46

answered Feb 16 '17 at 23:38

edaemon


This will also create a temporary file, write the 500GB minus a few lines to the temporary file then overwrite the original. – don_crissti Feb 16 '17 at 23:39
@don_crissti: does it? It's possible, I'm not 100% familiar with sed's inner workings, but the -i option in the manual says: "edit files in place". I always assumed that meant it would just modify the file without having to create a copy. – edaemon Feb 16 '17 at 23:42

As Don says. sed -i ... is equivalent to sed ... file >tmpfile && mv tmpfile file. Removing lines from a file in place (properly) is not possible as the length of the file changes. – Kusalananda Feb 16 '17 at 23:43
@Kusalananda: huh, okay. Learned something new, I guess. – edaemon Feb 16 '17 at 23:45
Thank you for your answer even though it didn't solve the problem. – 1a1a11a Feb 17 '17 at 00:18
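One way to see the behavior described in these comments is to compare the file's inode number before and after `sed -i`. The sketch below assumes GNU sed (BSD/macOS sed needs `-i ''` instead) and uses a made-up scratch file. A changed inode means sed wrote a new file and renamed it over the original rather than editing the existing one.

```shell
#!/bin/sh
# Assumes GNU sed; the scratch file is only for illustration.
scratch=$(mktemp)
printf 'bar\nbaz\nlorem\n' > "$scratch"

# ls -i prints "<inode> <name>"; keep only the inode number.
inode_before=$(ls -i "$scratch" | awk '{print $1}')
sed -i '1d' "$scratch"
inode_after=$(ls -i "$scratch" | awk '{print $1}')

echo "before: $inode_before"
echo "after:  $inode_after"
cat "$scratch"
```

The two inode numbers differ, so on a 500GB file the same command would write nearly 500GB to a temporary file before replacing the original.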
