I have a big text file (>500GB). Every method I can find (sed, tail and others) requires writing the whole 500GB back to disk. Is there any way to quickly remove the first few lines in place without rewriting 500GB?
There is no way to efficiently remove things from the start of a file. – don_crissti Feb 16 '17 at 23:44
Good find, don. I was about to suggest ed, but the other Q covers it. – Jeff Schaller Feb 16 '17 at 23:51
Thank you for pointing that out! How about removing the last line? The other question says removing the last line can be very fast, but it doesn't say how. @don_crissti – 1a1a11a Feb 17 '17 at 00:22
Well, if you know the size in bytes you can truncate the file. For your actual problem there's also this approach... – don_crissti Feb 17 '17 at 00:29
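The truncate hint can be sketched like this, assuming GNU coreutils `truncate`, where `-s -N` shrinks a file by N bytes. `demo.txt` is a hypothetical scratch file; on a real file you would skip the `printf` line. `tail` seeks to the end of a regular file, so only the last line is read and no other data is rewritten.

```shell
#!/bin/sh
# Sketch: drop the last line of a file in place by shrinking it.
# Assumes GNU coreutils truncate; demo.txt is a hypothetical example file.
set -eu

file=demo.txt
printf '1\n2\n3\n4\n5\n' > "$file"

# Byte length of the last line, including its trailing newline.
# $(( )) strips any leading spaces some wc implementations print.
n=$(( $(tail -n 1 "$file" | wc -c) ))

# Reduce the file size by exactly that many bytes; nothing else is copied.
truncate -s -"$n" "$file"

cat "$file"
```

If the last line has no trailing newline, `wc -c` counts only its characters, so the previous line's newline is still preserved.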
2 Answers
You can use the tail command like this:
tail -n +<first line to keep> filename
For example, to drop the first 1000 lines:
tail -n +1001 hugefile.txt > hugefile-wo-the-first-1000-lines.txt
That's all. For more information: https://es.wikipedia.org/wiki/Tail
BTW: don't be fooled if someone tells you this does the exact opposite of what you want; I've tested it (+3 starts the output at line 3, so the first two lines are skipped):
$ tail -n +3 /tmp/test
3
4
5
$ cat /tmp/test
1
2
3
4
5
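A sketch of the full "remove the first N lines" workflow with tail, assuming a POSIX shell. `hugefile.txt`, `hugefile.tmp` and `N` are placeholders. Note that this still writes every line after line N to a new file (and temporarily needs roughly that much free space), which is exactly the cost the question wants to avoid.

```shell
#!/bin/sh
# Sketch: remove the first N lines with tail, then replace the original file.
# hugefile.txt, hugefile.tmp and N are hypothetical placeholders.
set -eu

N=2
printf '1\n2\n3\n4\n5\n' > hugefile.txt

# "tail -n +K" starts printing at line K, so skipping N lines means K = N + 1.
tail -n +"$((N + 1))" hugefile.txt > hugefile.tmp
mv hugefile.tmp hugefile.txt

cat hugefile.txt
```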

guile
This method needs to write 500GB of data to disk; my question is how to remove the first few lines in place without writing so much data. – 1a1a11a Feb 17 '17 at 00:16
You can use sed to delete lines in place with the -i option:
$ cat foo.txt
bar
baz
lorem
$ sed -i '1d' foo.txt
$ cat foo.txt
baz
lorem
You can also delete a range of lines; for example, sed -i '1,4d' foo.txt will remove lines 1-4.
EDIT: as don pointed out in the comments, the -i option still creates a copy.

edaemon
This will also create a temporary file, write the 500GB minus a few lines to the temporary file, then overwrite the original. – don_crissti Feb 16 '17 at 23:39
@don_crissti: does it? It's possible; I'm not 100% familiar with sed's inner workings, but the manual describes the -i option as "edit files in place". I always assumed that meant it would just modify the file without having to create a copy. – edaemon Feb 16 '17 at 23:42
As Don says, sed -i ... is equivalent to sed ... file >tmpfile && mv tmpfile file. Removing lines from a file in place (properly) is not possible, as the length of the file changes. – Kusalananda Feb 16 '17 at 23:43
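One way to see the copy-and-rename behaviour is to compare the file's inode number before and after `sed -i`. This sketch assumes GNU sed (BSD sed's `-i` needs a suffix argument) and reuses `foo.txt` from the answer as a scratch file. A changed inode means a new file was written and renamed over the original, rather than the original being edited.

```shell
#!/bin/sh
# Sketch: show that GNU "sed -i" replaces the file instead of editing it.
# Assumes GNU sed; foo.txt is a scratch file mirroring the answer's example.
set -eu

printf 'bar\nbaz\nlorem\n' > foo.txt
before=$(ls -i foo.txt | awk '{print $1}')   # inode of the original file

sed -i '1d' foo.txt                           # delete line 1 "in place"

after=$(ls -i foo.txt | awk '{print $1}')    # inode after sed has finished

# The temporary file is created while the original still exists, so it
# necessarily gets a different inode before being renamed over foo.txt.
if [ "$before" != "$after" ]; then
  echo "sed -i wrote a new file (inode $before -> $after)"
fi
```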