2

My target device is a hard drive with a 4096-byte block size. I want to bypass caches and write to storage directly or as soon as possible. Speed is not my primary concern: 30 hours is too much, but the difference between 4 and 7 hours is insignificant.


What I understand (might not be 100% correct):

conv=fsync is executed only once, at the end of the dd call.

But I don't want to postpone the sync until the end. I want the data written to storage as soon as possible. There are two other options: oflag=direct and oflag=sync. I don't like oflag=sync, because (1) it's extremely slow when I test with bs=4096, and (2) it still uses the memory cache -- unnecessarily, I think.

oflag=direct bypasses the kernel's page cache (memory cache), writing directly to the storage. But the storage may itself store the data in a write-back (hdd) cache, so conv=fsync will still be required to write hdd cache to the actual storage.

Thus, I hope it is permissible -- I wonder whether it is ideal -- to use the two arguments together, maybe like this:

dd if=/dev/zero of=/dev/sdX bs=4096 status=progress oflag=direct conv=fsync
sgon00
  • 367
  • For SSD: just in case I'd rather use the page block size (often ~ 1M) rather than the write block size. My hunch is that on SSD, if the driver doesn't deal with this and writes a new page block each time, you get up to 256 reallocations (with write amplification) in the 1M written because of the 4k size. Block buffering would probably protect from this. – A.B Jun 29 '21 at 11:43
  • @A.B thanks for the reply. Actually, I am using an external hard drive rather than an SSD. When I run dd without any direct or sync options, bs=4096 is faster than bs=1M. I haven't tested dd with the direct and sync options yet, because I am asking this question and am not sure whether this is the proper way to bypass all caches. Cheers. – sgon00 Jun 29 '21 at 11:47
  • For an external harddrive, just use cat /dev/zero >/dev/sdX and avoid all the fiddle with dd – Chris Davies Jun 29 '21 at 11:52
  • @roaima I think cat will use RAM too. That is not what I want. I expect direct IO without any memory and hdd cache. Cheers. – sgon00 Jun 29 '21 at 11:59
  • 1
    @roaima not really. I am actually from that question. ^_^. You can see my comments in that answer. It doesn't really tell if I should use the two options together to skip all memory and hdd caches. Thanks. – sgon00 Jun 29 '21 at 12:18

2 Answers

3

It sounds like your question is about speed. You are already using conv=fsync, so I see no question of safety or "correctness", unless you have hardware with a specific bug. In that case you would need to tell us exactly what the bug looks like.

The general rule for performance is to test your own situation, and not worry too much about small differences. dd tells you the speed. You can test different options using a small count=. E.g. you can use bs=4k count=100k to test writing 400M.
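
As a rough sketch of such a test: the loop below writes the same small total at a few block sizes and prints dd's summary line for each. The scratch file from mktemp and the 4 MiB total are assumptions chosen so the run is quick and harmless; swap in your own target and a larger total only once you are sure of the device name.

```shell
#!/bin/sh
# Scratch file standing in for the real target (assumption for illustration).
# Do NOT replace this with /dev/sdX unless you intend to overwrite that device.
target=$(mktemp)
total=$((4 * 1024 * 1024))   # 4 MiB per run, deliberately small

for bs in 4096 65536 1048576; do
    # Keep the total constant so the runs are comparable.
    count=$((total / bs))
    echo "bs=$bs count=$count"
    # conv=fsync makes dd flush before it exits, so the flush is part of the reported time.
    dd if=/dev/zero of="$target" bs="$bs" count="$count" conv=fsync 2>&1 | tail -n 1
done

rm -f "$target"
```

The last line of dd's output is its throughput summary, so comparing those lines across block sizes is usually enough to see whether the difference matters for you.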


You are asking about a relatively simple case. /dev/zero will be much faster than your device, so I ignore read performance.

Using oflag=sync will wait for all temporary caches to empty after submitting each block of data, losing the benefit of always keeping data in-flight. Therefore you would need to specify a larger block size to get the best speed, e.g. bs=16M.

In principle, you want to feed the device up to two requests at a time, so that it always has at least one request to work on. Specifically on mechanical drives, if you let the feed run dry, you will have to wait for a full rotation before your next request can be written in the right place. dd itself does not do anything to ensure this. It relies on the writeback cache in the kernel, or in the device.

oflag=direct is a useful in-between option. If you have a problem with kernel cache (see below), it is a great way to bypass that cache. Many devices include a writeback cache of their own, so oflag=direct can be faster than oflag=sync for the same block size.

The kernel cache is intended to work well, without slowing down IO access to other devices. E.g. your system drive that you are using at the same time :-). But this problem does happen sometimes, and people complain about it. So it might depend whether you are expecting or worried about such a problem :-).

If you want to try both options at once, you specify this as oflag=direct,sync.
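
To see how these combinations behave on your own system, something like the following sketch can help. It runs the flag sets discussed here against a scratch file rather than a real device; the mktemp file, bs=1M and count=4 are assumptions picked only to keep the run short. Not every filesystem accepts O_DIRECT, so a failing combination is reported instead of aborting the loop.

```shell
#!/bin/sh
# Scratch file standing in for /dev/sdX (assumption for illustration).
target=$(mktemp)
tried=0

for flags in "conv=fsync" "oflag=direct conv=fsync" "oflag=sync" "oflag=direct,sync"; do
    tried=$((tried + 1))
    # $flags is left unquoted on purpose so "oflag=direct conv=fsync" becomes two operands.
    if out=$(dd if=/dev/zero of="$target" bs=1M count=4 $flags 2>&1); then
        printf '%-26s %s\n' "$flags" "$(printf '%s\n' "$out" | tail -n 1)"
    else
        # Direct I/O can be rejected, for example on some tmpfs setups.
        printf '%-26s failed: %s\n' "$flags" "$(printf '%s\n' "$out" | tail -n 1)"
    fi
done

rm -f "$target"
```

On a real disk you would raise count considerably; a 4 MiB write mostly measures fixed overhead rather than sustained throughput.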

Even without oflag=sync, increasing the block size severalfold might reduce CPU usage, e.g. bs=16k or bs=1M. But bs=4k / bs=4096 is already fairly good, much better than the ancient default bs=512.

sourcejedi
  • 50,249
  • Thanks a lot for the long, detailed reply. But I think you misunderstood my question. Sorry about the confusion. First, I am not asking about speed or performance testing. If I wanted to do a performance test, I would simply use the fio command. I ran into weird issues where dd does NOT in fact write to the physical drive but still reports success at the end, so I want to skip every kind of cache, either directly or as soon as possible. That's why I said conv=fsync is only done once at the end, which is not sufficient. oflag=direct does direct IO, which is what I expect, but it doesn't do an fsync. So should I use them both? – sgon00 Jun 30 '21 at 03:46
  • Sorry about the confusion. I have updated my question. Your answer only talks about speed, but speed is not my concern. I don't really care about the speed. My only concern is to skip every kind of cache. I don't want the memory cache, the HDD's internal cache, etc. Thanks. – sgon00 Jun 30 '21 at 06:12
  • It sounds like your drive is failing, or the Linux driver for your drive is failing, and your question would need to show the failure in detail. – sourcejedi Jun 30 '21 at 08:31
  • oflag=direct will bypass the kernel cache if you need to test a theory about it. However, the theory that your kernel cache is not working correctly is unlikely. conv=fsync should be enough to ensure the kernel cache is written to the disk. – sourcejedi Jun 30 '21 at 08:37
  • conv=fsync can ensure the kernel cache is written to the disk, but it only does so once, at the end of dd, right? I want the data written to the disk directly or asap. Thus, I prefer using oflag=direct. The reason I am asking whether I should use both oflag=direct and conv=fsync is that people say oflag=direct will not deal with the HDD's write-back cache. So using both of them would ensure that the kernel cache and the HDD cache are both skipped. Is my thinking correct? Thanks a lot. – sgon00 Jun 30 '21 at 08:41
  • I don't know what you mean by "skip ... hdd cache". conv=fsync only changes what happens after all the writes have been submitted. If you need to disable hdd writeback cache, you might be able to use the command hdparm -W 0 as detailed here: https://www.f1-consult.com/linux/linux-disk-write-caching/ – sourcejedi Jun 30 '21 at 08:45
  • 2
    If you have a weird rule that you need to avoid the kernel cache, AND regularly flush the cache in the HDD - you know you can use both options at the same time like oflag=direct,sync bs=16M, right? – sourcejedi Jun 30 '21 at 10:12
0

I wouldn't bother - dd input is so slow that you should maximise any cache usage, since output is incredibly slow. Using a large blocksize (in an effort to use more cache) is the best you can do.

With regard to @sgon00's specific question, the dd manpage says:

fdatasync  physically write output file data before finishing
fsync      likewise, but also write metadata

oflag=FLAGS write as per the comma separated symbol list...

oflag=nocache Request to drop cache. This option at least mentions cache. That doesn't necessarily mean the request will be granted, though. This could be combined with the 'direct' option.

dd is an old program and so its many possible parameters may not be interpreted exactly as they were originally intended.

Jeremy Boden
  • 1,320
  • I ran into some weird problems with my external 16TB hard drive connected via USB 3.0. It's not the same as, but similar to, https://abbbi.github.io/dd/. That's why I don't want to use any cache: the data is not actually written to the physical drive, and that causes many weird problems for me. Btw, you replied that you wouldn't bother. But what if I do want to bother? Will oflag=direct conv=fsync achieve what I expect? No cache, direct IO to the HDD. Thanks a lot. – sgon00 Jun 29 '21 at 11:32