
Let's say I have a large file (several gigabytes) with n lines in it. I would like to insert a line at a byte offset of k from the beginning of the file. What would be the fastest way to achieve that?

khan
  • What have you tried so far? – yael Mar 10 '19 at 05:53
  • Please let us know why you want to do this. – yael Mar 10 '19 at 05:54
  • I was thinking of trying something like head -c k file > temp.log; printf 'some data\n' >> temp.log; tail -c +$((k+1)) file >> temp.log; mv temp.log file (see the cleaned-up sketch after these comments), but I'm not sure if it's the best way to do this simple task.. – khan Mar 10 '19 at 05:56
  • Ideally I'd like a way to insert into the file in place.. – khan Mar 10 '19 at 05:58
  • Lines are, well, line-oriented objects, and bytes are raw, binary objects. Do you need to add a line after X number of lines? – RonJohn Mar 10 '19 at 06:37
  • @yael it doesn't really matter why he wants to do it. – RonJohn Mar 10 '19 at 06:38
  • @RonJohn I would ideally like to add after a certain number of bytes, but yeah, lines can work too. I have looked into those sed solutions, but the problem with them is that they are kind of expensive for larger files.. – khan Mar 10 '19 at 06:48
  • "but the problem with them is that they are kind of expensive for larger files." and I was going to give you a sed example... :) – RonJohn Mar 10 '19 at 07:29
  • For Very Large Files, your standard Unix text tools might not be the best solution. It certainly can't hurt to try head -c k file > temp.log; printf 'some data\n' >> temp.log; tail -c +$((k+1)) file >> temp.log; mv temp.log file, especially if it's a one-time operation. It might have already completed by now. – RonJohn Mar 10 '19 at 07:32
  • One option could be to use concatfs which would provide a virtual file that presents the concatenation of the parts. See https://unix.stackexchange.com/questions/94041/a-virtual-file-containing-the-concatenation-of-other-files for more – Ralph Rönnquist Mar 10 '19 at 07:38
  • The fastest way would be to write your own program, e.g. in C, to do that. It will still need to copy all the gigabytes after the line you insert. Because there's no real fast way to do it, the best advice is "don't use a sequential format of several gigabytes for inserts. Use a different format, e.g. a tree". Or split your large file into smaller files. – dirkt Mar 10 '19 at 09:43
  • "Several gigs" does not sound too bad for a sed solution IMHO. I've done complicated sed editing on files that are several hundred gigabytes. – Kusalananda Mar 10 '19 at 12:24

2 Answers


Here's a Python solution that streams the file to stdout, inserting a newline after every N bytes:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""split_bytes.py: copy a file to stdout, inserting a newline after every N bytes."""

import os
import sys

# Reopen stdout in binary mode so raw bytes pass through unmodified.
stdout = os.fdopen(sys.stdout.fileno(), 'wb')

path_to_file = sys.argv[1]
width_in_bytes = int(sys.argv[2])

with open(path_to_file, "rb") as f:
    # Read fixed-size chunks instead of single bytes: the output is the
    # same, but it is far faster on multi-gigabyte files.
    chunk = f.read(width_in_bytes)
    while chunk:
        stdout.write(chunk)
        stdout.write(b"\n")
        chunk = f.read(width_in_bytes)

You could execute it like this:

python3 split_bytes.py path/to/file width > new_file

As a test, I generated a 1GB file of random data:

dd if=/dev/urandom of=data.bin bs=64M count=16 iflag=fullblock

Then ran the script on that file:

python3 split_bytes.py data.bin 10 > split-data.bin
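
As a quick sanity check (assuming xxd is available), a hex dump of the first few rows should show the inserted 0x0a byte after every 10 data bytes, i.e. at offsets 10, 21, 32, and so on:

xxd -l 33 split-data.bin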
igal

A bash-only solution:

Use the split command to break the file into two-line pieces:

split --lines=2 --suffix-length=6 /etc/passwd /tmp/split.passwd.part

Then reassemble the pieces into a new file, appending an empty line after each piece:

(
  for F in /tmp/split.passwd.part* ; do
    cat "$F"    # the two-line piece
    echo        # the inserted empty line
  done
) > /tmp/passwd_emptyline_every_2
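
To spot-check the result (assuming GNU sed, whose first~step addresses are an extension), printing every third line should yield only the inserted blank lines:

sed -n '3~3p' /tmp/passwd_emptyline_every_2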
EchoMike444