
I am trying to rename the column headers of a large file, and I want to know the most efficient way to do so. Files are in the range of 10M to 50M lines, with ~100 characters per line in 10 columns.

A similar question was asked about removing the first line, and the best answer involved tail: Efficient in-place header removing for large files using sed?

My guess is:

bash-4.2$ seq -w 100000000 1 125000000 > bigfile.txt
bash-4.2$ tail -n +2 bigfile.txt > bigfile.tail && sed '1 s/^/This is my first line\n/' bigfile.tail > bigfile.new && mv -f bigfile.new bigfile.txt;

Is there a faster way?

user36302
  • If you have an old heading that is exactly the same size as the new heading, a very fast replacement is possible. – Jeremy Boden Jun 17 '21 at 21:53
  • The heading will have the same number of columns (i.e. separators, e.g. \t) but not the same number of characters. – user36302 Jun 17 '21 at 21:57
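
For reference, a rough sketch of the fixed-size replacement mentioned in the comment above, assuming a hypothetical newhdr that has been padded to exactly the old header's byte length (otherwise the bytes after it would be corrupted):

newhdr='new_col1,new_col2,new_col3'                      # hypothetical; must match the old header's byte length exactly
printf '%s' "$newhdr" | dd of=bigfile.txt conv=notrunc   # overwrite only the first bytes in place, no full copy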

2 Answers

  1. Write the new header to a new file, e.g. printf "This is my first line\n" > bigfile.new.
  2. Use the tail of the big file to supply the rest, using append redirection: >>.

One remark: tail +2 is a "GNUism"; it will work on most Linux distributions, but it is not POSIX-compliant (the portable form is tail -n +2) and will probably not work on other Unices.
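
A minimal sketch of those two steps with the file names from the question, using the portable tail -n +2 (the header text is just a placeholder):

printf 'This is my first line\n' > bigfile.new    # step 1: write the new header line
tail -n +2 bigfile.txt >> bigfile.new             # step 2: append everything after the old header
mv -f bigfile.new bigfile.txt                     # replace the original file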

d.c.

Assuming bash and Linux, this is probably faster than the code in your question:

(echo "New headers";tail +2 bigfile.txt) > newbigfile.txt && mv newbigfile.txt bigfile.txt