27

We have seen that the OS performs a copy-on-write optimisation when forking a process. The reason is that most of the time fork is followed by exec, so we don't want to incur the cost of allocating pages and copying data from the caller's address space unnecessarily.

So does this also happen when running cp on Linux with an ext4 or XFS (journaling) file system? If it does not happen, then why not?

sourcejedi
Mridul Verma
  • Copy-on-write is implemented in ZFS, which indeed has very cheap filesystem/volume clones. I believe the on-disk formats of ext4 and XFS are too primitive to support that – myaut Sep 24 '17 at 10:22

4 Answers

20

From the cp man page:

When --reflink[=always] is specified, perform a lightweight copy, where the data blocks are copied only when modified. If this is not possible the copy fails, or if --reflink=auto is specified, fall back to a standard copy.

This works on file systems that support copy-on-write (reflink), mainly BTRFS at the moment. The XFS reflink implementation is in development [1][2].

sebasth
15

The keyword to search for is reflink. It was recently implemented in XFS.

EDIT: the XFS implementation was initially marked EXPERIMENTAL. This warning was removed in kernel release 4.16, a number of months after I wrote the above :-).

sourcejedi
7

Linux lets userspace processes ask the kernel, through the ioctl system call, to make copy-on-write copies of files. The FICLONE request clones a whole file, and FICLONERANGE clones a range within a file.

This is used by cp --reflink to make the copies where the file system supports this.
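As a rough illustration of the ioctl described above, here is a minimal Python sketch that tries FICLONE and falls back to an ordinary copy, similar in spirit to --reflink=auto. The function name reflink_or_copy is made up for this example, and the numeric fallback for FICLONE is an assumption that holds on common architectures such as x86-64; newer Python versions expose the constant in the fcntl module.

```python
import fcntl
import shutil

# fcntl.FICLONE exists in recent Python versions on Linux. The numeric
# fallback is _IOW(0x94, 9, int) and is an assumption for common
# architectures such as x86-64.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def reflink_or_copy(src, dst):
    """Try a copy-on-write clone of src into dst; fall back to a byte copy.

    Returns "reflink" if the kernel cloned the file, "copy" otherwise.
    (Hypothetical helper, not part of any library.)
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the kernel to make dst share src's data blocks.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "reflink"
        except OSError:
            # The file system does not support reflinks, or the two files
            # live on different file systems.
            pass
    shutil.copyfile(src, dst)
    return "copy"
```

On ext4 this should take the fallback path; on BTRFS or a reflink-enabled XFS the clone is expected to succeed without copying any data blocks.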

1

Unless you introduce a syscall for cp (or at least one to copy a block), the OS has a hard time figuring out that the data the cp program is about to write is the same data it just read from another block. On top of that, you'd have additional overhead to manage the "several files share the same blocks" scenario. Large similar files that differ in only a few blocks are rare. So on the whole it's cheaper to just copy those blocks than to add this administrative overhead to all files.
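To make the first point concrete, here is a sketch of what a traditional copy loop looks like from the kernel's side. The name naive_copy and the chunk size are arbitrary choices for illustration. The kernel only sees a series of unrelated read and write calls; nothing tells it that the bytes being written are the ones it just handed out.

```python
def naive_copy(src, dst, chunk_size=64 * 1024):
    """Copy src to dst with a plain read/write loop; return bytes copied."""
    copied = 0
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        while True:
            # The data travels from the kernel into a user-space buffer...
            chunk = fsrc.read(chunk_size)
            if not chunk:
                break
            # ...and then back into the kernel as a fresh write to a
            # different file, so new blocks get allocated for it.
            fdst.write(chunk)
            copied += len(chunk)
    return copied
```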

Now if you "copy" files (lots of them) by adding another clone/snapshot of the file system in, say, BTRFS, the situation is different: you've now "copied" all files in the file system, and any changes to them will be copy-on-write. This exists, but not in ext4.
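The bookkeeping involved in sharing blocks can be sketched with a toy in-memory model. This is not how BTRFS or any real file system is implemented; BlockStore, clone and write_block are hypothetical names, and a "file" here is just a list of block ids. It only shows the idea: a clone bumps reference counts, and a write to a shared block allocates a new block first.

```python
class BlockStore:
    """Toy block allocator with per-block reference counts (hypothetical)."""

    def __init__(self):
        self.data = {}   # block id -> bytes
        self.refs = {}   # block id -> number of files pointing at it
        self._next = 0

    def alloc(self, payload):
        bid = self._next
        self._next += 1
        self.data[bid] = payload
        self.refs[bid] = 1
        return bid


def clone(store, blocks):
    """'Copy' a file by sharing its blocks: only reference counts change."""
    for bid in blocks:
        store.refs[bid] += 1
    return list(blocks)


def write_block(store, blocks, index, payload):
    """Write one block of a file, copying it first if it is shared."""
    bid = blocks[index]
    if store.refs[bid] > 1:
        # Shared block: leave it alone for the other file and allocate
        # a private block for this one.
        store.refs[bid] -= 1
        blocks[index] = store.alloc(payload)
    else:
        # Sole owner: overwrite in place.
        store.data[bid] = payload


store = BlockStore()
original = [store.alloc(b"aaaa"), store.alloc(b"bbbb")]
copy = clone(store, original)          # no data copied yet
write_block(store, copy, 1, b"XXXX")   # triggers the actual copy
print([store.data[b] for b in original])  # [b'aaaa', b'bbbb']
print([store.data[b] for b in copy])      # [b'aaaa', b'XXXX']
print(len(store.data))                    # 3
```

The reference counts are the "administrative overhead" in question: every block needs one, and every write has to check it.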

"Journalling" is a completely independent concept from that; what counts is the administrative structures for the files.

dirkt