
Why would the amount of data in a zfs send stream "fluctuate"? I would expect a send of a specific snapshot to always produce exactly the same data!

The test data below was created with something like dd if=/dev/zero of=filename bs=1024 count=100

The snapshots were taken at intervals while creating the test files. I then repeatedly sent the data to /dev/null, using command history to re-run the same command again and again in quick succession. No cheating - the snapshots were taken before the test, and no amount of "writing" to the file system should affect what goes into an incremental send, or so I would imagine.

sol10-primary:/TST> # ls -l
total 1007
-rw-r--r--   1 root     root      102400 Mar  4 12:01 data1
-rw-r--r--   1 root     root      102400 Mar  4 11:49 data2
-rw-r--r--   1 root     root          30 Mar  4 11:48 data3
-rw-r--r--   1 root     root      102400 Mar  4 11:50 data4
-rw-r--r--   1 root     root      102400 Mar  4 11:52 data5
-rw-r--r--   1 root     root      102400 Mar  4 11:53 data6
sol10-primary:/TST> # rm data5
sol10-primary:/TST> # dd if=/dev/zero of=data5 bs=1024 count=100          
100+0 records in
100+0 records out
sol10-primary:/TST> # zfs send -i s1 rpool/tst@s3 | dd bs=1024 > /dev/null
412+6 records in
412+6 records out
sol10-primary:/TST> # zfs send -i s1 rpool/tst@s3 | dd bs=1024 > /dev/null
412+5 records in
412+5 records out
sol10-primary:/TST> # zfs send -i s1 rpool/tst@s3 | dd bs=1024 > /dev/null
412+6 records in
412+6 records out
sol10-primary:/TST> # zfs send -i s1 rpool/tst@s3 | dd bs=1024 > /dev/null
402+32 records in
402+32 records out
sol10-primary:/TST> # zfs send -i s1 rpool/tst@s3 | dd bs=1024 > /dev/null
405+22 records in
405+22 records out
sol10-primary:/TST> # zfs send -i s1 rpool/tst@s3 | dd bs=1024 > /dev/null
412+6 records in
412+6 records out
sol10-primary:/TST> # zfs send -i s1 rpool/tst@s3 | dd bs=1024 > /dev/null
412+5 records in
412+5 records out
sol10-primary:/TST> # zfs list -o name,mountpoint,used,referenced,usedbychildren,usedbysnapshots -r rpool/tst
NAME          MOUNTPOINT   USED  REFER  USEDCHILD  USEDSNAP
rpool/tst     /TST         892K   532K          0      360K
rpool/tst@s1  -             20K   232K          -         -
rpool/tst@s2  -             20K   432K          -         -
rpool/tst@s3  -            120K   532K          -         -

At some points in the sequence above I created a snapshot and then dd-ed over an existing file, just to get something into the "USEDBYSNAPSHOTS" value.

This is largely an academic question; I suspect the variance is very small, and is only noticeable here because my test data is itself so small.

Johan

1 Answer

Do not use dd on pipes like this. dd is low-level: it is essentially a thin interface to the read and write system calls.

When you do a dd bs=1024 count=1, it does a read(0, buf, 1024).
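
On a Linux box you can watch this happening, assuming strace is available (on Solaris 10, truss -t read would serve the same purpose): each read() shows up with the number of bytes it actually returned, which on a pipe is whatever happened to be buffered at the time, not the full 1024.

$ (printf AAAA; sleep 1; printf BBBBBB) | strace -e trace=read dd bs=1024 count=2 > /dev/null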

If read returns fewer than 1024 bytes (for instance because the pipe only contains 200 bytes at that moment), dd won't retry the read to get the missing 824 bytes; instead it reports an incomplete block at the end (the part after the +). The same thing can happen when writing as well.

That's why it's dangerous to use dd on pipes: unless you can guarantee that the processes writing to or reading from the pipe do so in amounts that are multiples of the block size and divisors of the pipe size, there's no guarantee that you'll get full blocks.

$ (echo AAAA; sleep 1; echo BBBBBB) | dd bs=3 > /dev/null
3+2 records in
3+2 records out

That's not too big a problem here, as we're just writing out whatever we read, but it becomes more problematic when, for instance, you specify a count.
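
As a sketch of that failure mode (mirroring the example above; the exact counts depend on timing), you might expect the next command to copy 4 blocks of 3 bytes, i.e. all 12 bytes of input, but because partial reads also count towards count=4, it typically copies only 11 bytes and silently drops the tail:

$ (echo AAAA; sleep 1; echo BBBBBB) | dd bs=3 count=4 2> /dev/null | wc -c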

GNU dd has iflag=fullblock to work around that:

$ (echo AAAA; sleep 1; echo BBBBBB) | dd bs=3 iflag=fullblock > /dev/null
4+0 records in
4+0 records out
12 bytes (12 B) copied, 1.00068 s, 0.0 kB/s
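
Back to the original question: the send stream itself is not fluctuating, only dd's block accounting is. If all you want is a stable measure of the stream, bypass the block counting entirely, for example (using the dataset and snapshot names from the question):

$ zfs send -i s1 rpool/tst@s3 | wc -c
$ zfs send -i s1 rpool/tst@s3 | cksum

wc -c gives the byte count, and cksum will additionally confirm that repeated sends of the same snapshot produce identical data.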