
I know that cp has a --reflink option to control full copies vs. copy-on-write "copies".

On btrfs, can I use ls (or some other command) to find out whether a file shares (in a copy-on-write sense) some storage with another file?

EDIT: @StéphaneChazelas points me to filefrag, but that fails for me:

root@void:/tmp/mount# mount | tail -1
/tmp/back on /tmp/mount type btrfs (rw,relatime,space_cache)
root@void:/tmp/mount# df -h | tail -1
/dev/loop0       32M   13M   20M  38% /tmp/mount
root@void:/tmp/mount# ls -lh
total 8.0M
-rw-r--r-- 1 root root 8.0M Jan 19 08:43 one
root@void:/tmp/mount# cp --reflink=always one two
root@void:/tmp/mount# sync
root@void:/tmp/mount# ls -lh
total 16M
-rw-r--r-- 1 root root 8.0M Jan 19 08:43 one
-rw-r--r-- 1 root root 8.0M Jan 19 08:45 two
root@void:/tmp/mount# df -h | tail -1
/dev/loop0       32M   13M   20M  38% /tmp/mount
root@void:/tmp/mount# filefrag -kvx one 
Filesystem type is: 9123683e
File size of one is 8388608 (8192 blocks of 1024 bytes)
FIEMAP failed with unknown flags 2
one: FIBMAP unsupported
root@void:/tmp/mount# uname -a
Linux void 4.1.7+ #817 PREEMPT Sat Sep 19 15:25:36 BST 2015 armv6l GNU/Linux
not-a-user

5 Answers

7

Update (Jan-2021): see comment by @bitinerant: "btrfs-debug-tree is now obsolete; use btrfs inspect-internal dump-tree"


I don't know of a way to find this with ls. But if you really need it, you can use btrfs-debug-tree from btrfs-progs.

With --reflink=always, the files share common data blocks (extents). A shared extent has a reference count ("refs") greater than 1.

  1. First, find the objectids of the files one and two:

     # ./btrfs-debug-tree  /dev/xvdc
     (Check under FS_TREE)
       <snip>
         item 8 key (256 DIR_INDEX 4) itemoff 15842 itemsize 33
             location key (259 INODE_ITEM 0) type FILE
             namelen 3 datalen 0 name: one
         item 9 key (256 DIR_INDEX 5) itemoff 15809 itemsize 33
             location key (260 INODE_ITEM 0) type FILE
             namelen 3 datalen 0 name: two
       </snip>
    

From the above, the objectids are 259 (one) and 260 (two).

  2. Now look up their references in the extent tree. The command below finds the data extent shared between the two files.

     # ./btrfs-debug-tree  /dev/xvdc | grep -A2 "refs 2"
             extent refs 2 gen 9 flags DATA
             extent data backref root 5 objectid 260 offset 0 count 1
             extent data backref root 5 objectid 259 offset 0 count 1
    

Bonus: Create another reference:

# cp --reflink=always one three

Verify that the refcount has increased by 1:

# ./btrfs-debug-tree   /dev/xvdc | grep -A3 "refs 3"
        extent refs 3 gen 9 flags DATA
        extent data backref root 5 objectid 260 offset 0 count 1
        extent data backref root 5 objectid 261 offset 0 count 1
        extent data backref root 5 objectid 259 offset 0 count 1

Here the data extent is shared by three files, with objectids 259, 260 and 261.
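The two lookups above can be folded into a single pass over the dump. What follows is a minimal sketch, not something shipped with btrfs-progs: `shared_extents` is a hypothetical shell function that reads `btrfs inspect-internal dump-tree` (or legacy `btrfs-debug-tree`) output on stdin, prints each data extent whose refcount is at least 2 together with the objectids referencing it, and appends a file name whenever a directory entry for that objectid appears in the dump. It assumes a single subvolume (root 5) and the text layout shown in this answer; the exact wording of dump-tree output differs between btrfs-progs versions, and backrefs printed as `shared data backref` (which carry no objectid) are ignored.

```shell
# Hypothetical helper (sketch): read btrfs dump-tree text on stdin and list
# data extents referenced more than once, with the objectids that use them.
# Assumes one subvolume and the output layout shown above; formats vary
# between btrfs-progs versions.
shared_extents() {
  awk '
    # Directory entry: "location key (259 INODE_ITEM 0) type FILE".
    # Remember the inode number until the following "name:" line.
    /location key \([0-9]+ INODE_ITEM 0\) type FILE/ {
      s = $0
      sub(/.*location key \(/, "", s)
      sub(/ .*/, "", s)
      pending = s
      next
    }
    pending != "" && /name: / {
      n = $0
      sub(/.*name: /, "", n)
      name[pending] = n
      pending = ""
      next
    }
    # Start of a data extent: "[extent] refs N gen G flags DATA".
    /refs [0-9]+ gen [0-9]+ flags DATA/ {
      for (i = 1; i <= NF; i++) if ($i == "refs") r = $(i + 1) + 0
      cur++
      refs[cur] = r
      owners[cur] = ""
      next
    }
    # Backrefs that follow belong to the most recent data extent.
    cur && /extent data backref/ {
      for (i = 1; i <= NF; i++)
        if ($i == "objectid") owners[cur] = owners[cur] " " $(i + 1)
    }
    END {
      for (e = 1; e <= cur; e++) {
        if (refs[e] < 2) continue
        line = "refs " refs[e] ":"
        k = split(owners[e], ids, " ")
        for (j = 1; j <= k; j++)
          line = line " " ids[j] ((ids[j] in name) ? "(" name[ids[j]] ")" : "")
        print line
      }
    }'
}
```

Usage would be something like `btrfs inspect-internal dump-tree /dev/xvdc | shared_extents` (as root, after a `sync`). Fed the FS_TREE and extent-tree lines shown in this answer, it should print `refs 2: 260(two) 259(one)`.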

3

Just use:

$ btrfs filesystem du .
       Total   Exclusive  Set shared  Filename
    1.11GiB     1.11GiB           -  ./file1
    1.12GiB     1.12GiB           -  ./file2
    1.31GiB       0.00B           -  ./file3
    3.54GiB     2.23GiB     1.31GiB  .

In this example, 'file3' is a reflink copy: it consumes no Exclusive space, so all of its data is shared.
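To pick such entries out of a long listing, the Exclusive column can be filtered mechanically. This is a small sketch with a hypothetical function name, `fully_shared`: it reads `btrfs filesystem du` output on stdin and prints the names whose Exclusive size is zero but whose Total is not. It assumes the four-column layout shown above and accepts both the human-readable `0.00B` and the plain `0` that `--raw` prints. A zero Exclusive size only says every extent is shared with *something* (another reflink or a snapshot), not which file it is shared with.

```shell
# Hypothetical helper (sketch): print entries of `btrfs filesystem du` output
# whose Exclusive column is zero while Total is not, i.e. none of their data
# is exclusive to them. Handles "0.00B" (human-readable) and "0" (--raw).
fully_shared() {
  awk '
    function iszero(v) { return v == "0" || v == "0.00B" }
    NR > 1 && iszero($2) && !iszero($1) {
      # Drop the Total, Exclusive and Set shared columns and keep the
      # filename, which may itself contain spaces.
      sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/, "")
      print
    }'
}
```

For example, `btrfs filesystem du . | fully_shared` on the listing above should print `./file3`.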

LuckyDams
2

@pwaller's answer shows that the data extents of each file can be listed and compared to see whether two files share identical extents. filefrag from the e2fsprogs package can (almost) do this: filefrag -v FILE1 FILE2 shows whether FILE1 and FILE2 have the same extents, in which case they are reflinks of each other.

Doing this programmatically in a script is harder because filefrag includes the filename in its output. To work around this, I have a patched copy of filefrag with two changes:

  1. Output the device ID
  2. Do not output the filename if only one filename is specified

With these changes, the outputs from two calls to filefrag can be compared. If identical, then the two files are reflinks of each other.

One final caveat: If the output from filefrag matches the regex inline|unknown_loc|delalloc, then the file cannot be reflinked since it has no data block. To handle that case, I wrap my patched filefrag with a check for that pattern and append the filename itself to the output if I find it (to make the output unique per filename, so that it will not match the output for a different filename). See @StéphaneChazelas's comments here for more details.
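Without a patched filefrag, a rough approximation is to keep only the extent table rows of `filefrag -v`, which do not contain the filename, and compare those. The sketch below is not the patched filefrag: `extent_rows` and `same_extents` are hypothetical names, it assumes the stock `filefrag -v` layout (extent rows start with `N:`), and it applies the inline/unknown_loc/delalloc check described above. It does not compare device IDs, so only use it on files you already know live on the same filesystem.

```shell
# Hypothetical helpers, a sketch of the filefrag comparison described above.
# extent_rows FILE: print only the extent table rows of `filefrag -v`
# (rows whose first field is "N:"), so the output contains no filename.
extent_rows() {
  filefrag -v -- "$1" | awk '$1 ~ /^[0-9]+:$/'
}

# same_extents FILE1 FILE2: succeed if both files have identical, non-empty
# extent rows. Files whose extents are inline, unknown_loc or delalloc have no
# data blocks to share, so they are never reported as reflinks.
# Note: unlike the patched filefrag, this does not compare device IDs.
same_extents() {
  a=$(extent_rows "$1")
  b=$(extent_rows "$2")
  case $a$b in
    *inline*|*unknown_loc*|*delalloc*) return 1 ;;
  esac
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example: `same_extents one two && echo "one and two share their extents"`.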

I submitted a pull request (https://github.com/tytso/e2fsprogs/pull/87) and an issue (https://github.com/tytso/e2fsprogs/issues/88) for this.

jrw32982
1

I have just released a program called fienode, which computes a SHA1 hash of the physical extents of a file. Identical CoW copies have the same hash.
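The same idea can be roughly approximated in the shell. This is a sketch in the spirit of fienode, not fienode itself: fienode queries the extents directly, whereas the hypothetical `extent_hash` below hashes the filename-free extent rows of `filefrag -v` text output (assuming the stock row layout) with `sha1sum`, printing one `<hash>  <file>` line per argument.

```shell
# Hypothetical helper in the spirit of fienode (not fienode itself):
# print "<sha1>  <file>" where the SHA1 covers only the extent rows of
# `filefrag -v`, so reflinked copies with identical extents hash alike.
extent_hash() {
  for f in "$@"; do
    h=$(filefrag -v -- "$f" | awk '$1 ~ /^[0-9]+:$/' | sha1sum | cut -d' ' -f1)
    printf '%s  %s\n' "$h" "$f"
  done
}
```

Running `extent_hash * | sort` puts files with identical extent layouts on adjacent lines. Files with no extent rows all get the hash of empty input, and inline files are not filtered out here, so treat matches as candidates only.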

There is also a more detailed answer here, explaining why this is necessary.

Note, however, that btrfs is free to change the physical extents. I've observed a large reflinked file change its physical extents without provocation, making the fienode output different even though most of the physical extents were still shared.

pwaller
0

On XFS at least, if the files have not been altered, filefrag sets the shared flag on their extents.

For example:

 >filefrag -e foo
 Filesystem type is: 58465342
 File size of filesystems.docker is 1344 (1 block of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
 0:        0..       0:  348117738.. 348117738:      1:             last,eof
 foo: 1 extent found

 >cp --reflink=auto foo bar
 >filefrag -e foo
 Filesystem type is: 58465342
 File size of filesystems.docker is 1344 (1 block of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
 0:        0..       0:  348117738.. 348117738:      1:             last,shared,eof
 foo: 1 extent found

Caveat: I'm not sure what happens if part of a file is altered so that only some blocks are in common.

Caveat 2: I don't know whether this works on btrfs (comment or edit if you do).
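Checking for that flag is easy to script. A minimal sketch, assuming the filesystem reports the shared flag the way XFS does in the output above and that `filefrag -v` prints the flags as the last field of each extent row; `has_shared_extent` is a hypothetical name. It only says that some extent is shared with something, not with which file.

```shell
# Hypothetical helper (sketch): succeed if any extent row of `filefrag -v FILE`
# has "shared" among its comma-separated flags (the last field of the row).
has_shared_extent() {
  filefrag -v -- "$1" |
    awk '$1 ~ /^[0-9]+:$/ && $NF ~ /(^|,)shared(,|$)/ { found = 1 }
         END { exit !found }'
}
```

For example: `has_shared_extent foo && echo "foo shares at least one extent"`.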