
I am trying to rename the column headers of a large file, and I want to know the most efficient way to do so. Files are in the range of 10M to 50M lines, with ~100 characters per line in 10 columns.

A similar question was asked about removing the first line, and the best answer involved tail: Efficient in-place header removing for large files using sed?

My guess is:

bash-4.2$ seq -w 100000000 1 125000000 > bigfile.txt
bash-4.2$ tail -n +2 bigfile.txt > bigfile.tail && sed '1 s/^/This is my first line\n/' bigfile.tail > bigfile.new && mv -f bigfile.new bigfile.txt;

Is there a faster way?

user36302
  • If you have an old heading that is exactly the same size as the new heading, a very fast replacement is possible. – Jeremy Boden Jun 17 '21 at 21:53
  • The heading will have the same number of columns (i.e. separators, e.g. \t) but not the same number of characters. – user36302 Jun 17 '21 at 21:57
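
For reference, a rough sketch of the fixed-size replacement mentioned in the comment above, assuming a hypothetical newhdr that has been padded to exactly the old header's byte length (otherwise the bytes after it would be corrupted):

newhdr='new_col1,new_col2,new_col3'                      # hypothetical; must match the old header's byte length exactly
printf '%s' "$newhdr" | dd of=bigfile.txt conv=notrunc   # overwrite only the first bytes in place, no full copy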

2 Answers

  1. Write the new header to a new file, e.g. printf "This is my first line\n" > bigfile.new.
  2. Use the tail of the big file to supply the rest, using append redirection: >>.

One remark: tail +2 is a "GNUism"; it will work on most Linux distributions, but it is not POSIX-compliant (the portable form is tail -n +2) and will probably not work on other Unices.
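
A minimal sketch of those two steps with the file names from the question, using the portable tail -n +2 (the header text is just a placeholder):

printf 'This is my first line\n' > bigfile.new    # step 1: write the new header line
tail -n +2 bigfile.txt >> bigfile.new             # step 2: append everything after the old header
mv -f bigfile.new bigfile.txt                     # replace the original file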

d.c.

Assuming bash and Linux, this is probably faster than the code in your question:

(echo "New headers";tail +2 bigfile.txt) > newbigfile.txt && mv newbigfile.txt bigfile.txt