
Let's say I have a large file (several gigabytes) with n lines in it. I would like to insert a line at a byte offset of k from the beginning of the file. What would be the fastest way to achieve that?

khan
  • What have you tried so far? – yael Mar 10 '19 at 05:53
  • Please let us know why you want to do this. – yael Mar 10 '19 at 05:54
  • I was thinking of trying something like head -c k file > temp.log; printf 'some data\n' >> temp.log; tail -c +$((k+1)) file >> temp.log; mv temp.log file (see the cleaned-up sketch after these comments), but I'm not sure if it's the best way to do this simple task.. – khan Mar 10 '19 at 05:56
  • Ideally I'd like a way to insert into the file in place.. – khan Mar 10 '19 at 05:58
  • Lines are, well, line-oriented objects, and bytes are raw, binary objects. Do you need to add a line after X number of lines? – RonJohn Mar 10 '19 at 06:37
  • @yael it doesn't really matter why he wants to do it. – RonJohn Mar 10 '19 at 06:38
  • @RonJohn I would ideally like to add after a certain number of bytes, but yeah, lines can work too. I have looked into those sed solutions, but the problem with them is that they are kind of expensive for larger files.. – khan Mar 10 '19 at 06:48
  • "but the problem with them is that they are kind of expensive for larger files." and I was going to give you a sed example... :) – RonJohn Mar 10 '19 at 07:29
  • For Very Large Files, your standard Unix text tools might not be the best solution. It certainly can't hurt to try head -c k file > temp.log; printf 'some data\n' >> temp.log; tail -c +$((k+1)) file >> temp.log; mv temp.log file, especially if it's a one-time operation. It might have already completed by now. – RonJohn Mar 10 '19 at 07:32
  • One option could be to use concatfs which would provide a virtual file that presents the concatenation of the parts. See https://unix.stackexchange.com/questions/94041/a-virtual-file-containing-the-concatenation-of-other-files for more – Ralph Rönnquist Mar 10 '19 at 07:38
  • The fastest way would be to write your own program, e.g. in C, to do that. It will still need to copy all the gigabytes after the line you insert. Because there's no real fast way to do it, the best advice is "don't use a sequential format of several gigabytes for inserts. Use a different format, e.g. a tree". Or split your large file into smaller files. – dirkt Mar 10 '19 at 09:43
  • "Several gigs" does not sound too bad for a sed solution IMHO. I've done complicated sed editing on files that are several hundred gigabytes. – Kusalananda Mar 10 '19 at 12:24

2 Answers


Here's a Python solution that streams the file to stdout, inserting a newline after every N bytes:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""split_bytes.py: copy a file to stdout, inserting a newline after every N bytes."""

import os
import sys

# Reopen stdout in binary mode so raw bytes pass through unmodified.
stdout = os.fdopen(sys.stdout.fileno(), 'wb')

path_to_file = sys.argv[1]
width_in_bytes = int(sys.argv[2])

with open(path_to_file, "rb") as f:
    # Read fixed-size chunks instead of single bytes: the output is the
    # same, but it is far faster on multi-gigabyte files.
    chunk = f.read(width_in_bytes)
    while chunk:
        stdout.write(chunk)
        stdout.write(b"\n")
        chunk = f.read(width_in_bytes)

You could execute it like this:

python3 split_bytes.py path/to/file width > new_file

As a test, I generated a 1GB file of random data:

dd if=/dev/urandom of=data.bin bs=64M count=16 iflag=fullblock

Then ran the script on that file:

python3 split_bytes.py data.bin 10 > split-data.bin
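
As a quick sanity check (assuming xxd is available), a hex dump of the first few rows should show the inserted 0x0a byte after every 10 data bytes, i.e. at offsets 10, 21, 32, and so on:

xxd -l 33 split-data.bin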
igal

A bash-only solution:

Use the split command to break the file into two-line pieces:

split --lines=2 --suffix-length=6 /etc/passwd /tmp/split.passwd.part

Then reassemble the pieces into a new file, appending an empty line after each piece:

(
  for F in /tmp/split.passwd.part* ; do
    cat "$F"    # the two-line piece
    echo        # the inserted empty line
  done
) > /tmp/passwd_emptyline_every_2
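
To spot-check the result (assuming GNU sed, whose first~step addresses are an extension), printing every third line should yield only the inserted blank lines:

sed -n '3~3p' /tmp/passwd_emptyline_every_2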
EchoMike444