I have huge files {0..9}.bin which I want to concatenate into out.bin. I don't need the original files afterwards. So I was wondering if this is possible by only modifying the filesystem index, without copying the file contents (see Append huge files to each other without copying them for efficient copy solutions).
On modern file systems (e.g. btrfs), cp --reflink=always exists. FIFOs live at the file system level (at least btrfs send also tracks FIFOs), so they should have information about the actual data blocks used. Therefore, cp --reflink=always should be able to determine the extent numbers on disk and reuse them.
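For regular files this does work; a quick sanity check could look like this (a minimal sketch, assuming a btrfs mount and filefrag from e2fsprogs):

dd if=/dev/urandom of=a.bin bs=1M count=100
cp --reflink=always a.bin b.bin    # returns almost instantly; b.bin shares a.bin's extents
filefrag -v a.bin b.bin            # both files should list the same extents, flagged "shared"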
So I was wondering: is it possible to use mkfifo in combination with cp --reflink=always?
Update: currently, it is not working:
for i in {1..9}; do dd if=/dev/urandom of="in$i.bin" bs=5M count=200; done   # nine 1000 MiB test files
mkfifo fifo
cat in* >fifo &                    # writer blocks until a reader opens the fifo
cp --reflink=always fifo out.bin   # attempt to clone from the fifo
results in
cp: failed to clone 'out.bin' from 'fifo': Invalid argument
Probably it never will, since FIFOs carry no information about the storage origin but instead are just dumb pipes.
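If the real goal is concatenation without data movement, a block-cloning approach that skips the FIFO entirely might look like the following untested sketch. It assumes xfs_io from xfsprogs (whose reflink command issues the FICLONERANGE ioctl that btrfs and XFS implement) and that each input file's size is a multiple of the filesystem block size, since cloning to an unaligned destination offset is rejected:

out=out.bin
: > "$out"                              # start with an empty destination
off=0
for f in in*.bin; do
  len=$(stat -c %s "$f")
  # clone all of $f to offset $off of $out without copying any data
  xfs_io -c "reflink $f 0 $off $len" "$out"
  off=$((off + len))
done

Each iteration only rewrites extent metadata, so the loop should finish almost instantly, and the in*.bin files can be deleted afterwards because the shared blocks are reference-counted.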
Comments:

cat *.bin > myfifo.bin does not actually copy the contents into RAM but instead references them somehow. – darkdragon May 20 '20 at 10:28

cp --reflink=always refers to copy-on-write block operations, meaning blocks are not duplicated until they are written to; however, this applies to files stored on the file system, not a fifo (which isn't actually a file with data blocks). – Pedro May 20 '20 at 11:00

Try cat *.bin > out.bin and check whether its performance and space consumption are acceptable. If they are, you're solving a problem that isn't a problem. I completely understand the logic of your question, and conceptually the concatenation will duplicate all data blocks unnecessarily, but I don't think there's a mechanism that avoids this. Have a look at https://unix.stackexchange.com/questions/118244/fastest-way-to-concatenate-files – Pedro May 20 '20 at 11:04