Let's say I have a large file (several GiB) with n lines in it. I would like to insert a line at an offset of k bytes from the beginning of the file. What would be the fastest way to achieve that?
Viewed 835 times
0

Kusalananda
- 333,661

khan
- 121
2 Answers
2
Here's a Python solution:
#!/usr/bin/env python3
# -*- encoding: utf-8 -*-
"""split_bytes.py"""
import os
import sys

stdout = os.fdopen(sys.stdout.fileno(), 'wb')

path_to_file = sys.argv[1]
width_in_bytes = int(sys.argv[2])

with open(path_to_file, "rb") as f:
    byte = f.read(1)
    while byte:
        for i in range(width_in_bytes):
            stdout.write(byte)
            byte = f.read(1)
        stdout.write(b"\n")
You could execute it like this:
python split_bytes.py path/to/file offset > new_file
As a test, I generated a 1GB file of random data:
dd if=/dev/urandom of=data.bin bs=64M count=16 iflag=fullblock
Then ran the script on that file:
python split_bytes.py data.bin 10 > split-data.bin
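Note that this script inserts a newline after every k bytes, not a single line at one offset as the question asks. A minimal sketch of the single-insertion variant (the function name and file names are placeholders, not from the answer) could look like this:

```python
#!/usr/bin/env python3
"""insert_at_offset.py -- sketch: insert one line after the first k bytes."""
import shutil
import sys

def insert_line(src, dst, k, line):
    """Copy src to dst, inserting `line` (bytes) after the first k bytes."""
    dst.write(src.read(k))        # first k bytes, unchanged
    dst.write(line + b"\n")       # the inserted line
    shutil.copyfileobj(src, dst)  # stream the remainder in chunks

if __name__ == "__main__" and len(sys.argv) == 4:
    # e.g.: python insert_at_offset.py data.bin 10 'new line' > new_file
    with open(sys.argv[1], "rb") as f:
        insert_line(f, sys.stdout.buffer, int(sys.argv[2]), sys.argv[3].encode())
```

Using `shutil.copyfileobj` for the tail avoids loading a multi-gigabyte remainder into memory, and reading k bytes in one call is far faster than the byte-at-a-time loop above.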

igal
- 9,886
0
A bash-only solution, using the split command:
split --lines=2 --suffix-length=6 /etc/passwd /tmp/split.passwd.part
Then reassemble the pieces into one new file:
(
  for F in /tmp/split.passwd.part* ; do
    cat "$F"
    echo
  done
) > /tmp/passwd_emptyline_every_2
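split works on whole lines, though, not byte offsets. For the byte-offset insertion the question asks about, a head/tail sketch (the sample file, its contents, and k=10 are placeholders) might be:

```shell
# Demo: insert the line "new data" after the first k bytes of a file.
printf 'abcdefghijklmnop' > file       # sample 16-byte input (placeholder)
k=10
head -c "$k" file          >  file.new # first k bytes, unchanged
printf 'new data\n'        >> file.new # the inserted line
tail -c +"$((k + 1))" file >> file.new # remainder, starting at byte k+1
mv file.new file
```

This copies the whole file once, which for a one-time edit on a multi-gigabyte file is usually as fast as anything else, since the data after the offset has to be rewritten in any case.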

EchoMike444
- 3,165
Comments:
- head -c k file > temp.log; echo 'some data\n' >> temp.log; tail -c +(k+1) file > temp.log; mv temp.log file; but not sure if it's the best way to do this simple task. – khan Mar 10 '19 at 05:56
- […] sed solutions, but the problem with them is that they are kind of expensive for larger files. – khan Mar 10 '19 at 06:48
- head -c k file > temp.log; echo 'some data\n' >> temp.log; tail -c +(k+1) file > temp.log; mv temp.log file […] especially if it's a one-time operation. It might have already completed by now. – RonJohn Mar 10 '19 at 07:32
- […] concatfs, which would provide a virtual file that presents the concatenation of the parts. See https://unix.stackexchange.com/questions/94041/a-virtual-file-containing-the-concatenation-of-other-files for more. – Ralph Rönnquist Mar 10 '19 at 07:38
- […] sed solution IMHO. I've done complicated sed editing on files that are several hundred gigabytes. – Kusalananda Mar 10 '19 at 12:24
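The sed approach mentioned in the comments is line-oriented rather than byte-oriented, so it fits when you know the line number instead of the byte offset. A sketch, assuming GNU sed (for -i and the one-line a syntax; the sample file and line number are placeholders):

```shell
# Demo: insert "new data" after line 2 of a sample file (GNU sed).
printf 'one\ntwo\nthree\n' > sample.txt
sed -i '2a new data' sample.txt
```

sed still streams the entire file to apply the edit, which is the cost khan's comment refers to on very large files.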