96

On occasion I've seen comments online along the lines of "make sure you set 'bs=' because the default value will take too long," and my own extremely unscientific experience of "well, that seemed to take longer than that other time last week" seems to bear that out. So whenever I use 'dd' (typically in the 1-2GB range) I make sure to specify the block-size parameter. About half the time I use the value specified in whatever online guide I'm copying from; the rest of the time I pick some number that makes sense from the 'fdisk -l' listing for what I assume is the slower medium (e.g. the SD card I'm writing to).

For a given situation (media type, bus sizes, or whatever else matters), is there a way to determine a "best" value? Is it easy to determine? If not, is there an easy way to get 90-95% of the way there? Or is "just pick something bigger than 512" even the correct answer?

I've thought of trying the experiment myself, but (in addition to being a lot of work) I'm not sure what factors impact the answer, so I don't know how to design a good experiment.

  • Writing to the same storage medium is different from writing to a different storage medium and would require different optimal settings; there are many variables which will be different for everyone, depending on device type, speed, cache and so on. On my machine bs=256M is optimal. –  Jun 02 '15 at 23:44
  • 1
    http://serverfault.com/questions/147935/how-to-determine-the-best-byte-size-for-the-dd-command || http://unix.stackexchange.com/questions/9432/is-there-a-way-to-determine-the-optimal-value-for-the-bs-parameter-to-dd || http://superuser.com/questions/234199/good-block-size-for-disk-cloning-with-diskdump-dd – Ciro Santilli OurBigBook.com Aug 24 '15 at 15:54

6 Answers

83

There's but one way to determine the optimal block size, and that's a benchmark. I've just run a quick one. The test machine is a PC running Debian GNU/Linux, with kernel 2.6.32 and coreutils 8.5. Both filesystems involved are ext3 on LVM volumes on a hard disk partition. The source file is 2GB (2040000kB to be precise). Caching and buffering are enabled. Before each run, I emptied the cache with sync; echo 1 >|/proc/sys/vm/drop_caches. The run times do not include a final sync to flush the buffers; the final sync takes on the order of 1 second.

The same runs were copies on the same filesystem; the diff runs were copies to a filesystem on a different hard disk. For consistency, the times reported are the wall clock times obtained with the time utility, in seconds. I only ran each command once, so I don't know how much variance there is in the timing.
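
For reference, here is a rough sketch of that benchmark procedure, not the exact script used for the table below: SRC and DST are placeholder paths you would substitute, writing to /proc/sys/vm/drop_caches requires root, and the final sync is left outside the timed region as described above.

SRC=/path/to/source/file     # placeholder: the 2GB source file
DST=/path/to/destination     # placeholder: the copy target
run() {
    rm -f "$DST"
    sync; echo 1 > /proc/sys/vm/drop_caches   # start each run from a cold cache (needs root)
    time sh -c "$1"                           # wall-clock time of the copy itself
    sync                                      # flush buffers, but outside the timing
}
run "dd if=$SRC of=$DST bs=64M"
run "dd if=$SRC of=$DST bs=1M"
run "dd if=$SRC of=$DST bs=4k"
run "dd if=$SRC of=$DST bs=512"
run "cat $SRC > $DST"
run "cp $SRC $DST"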

             same   diff
             t (s)  t (s)
dd bs=64M    71.1   51.3
dd bs=1M     73.9   41.8
dd bs=4k     79.6   48.5
dd bs=512    85.3   48.9
cat          76.2   41.7
cp           77.8   45.3

Conclusion: A large block size (several megabytes) helps, but not dramatically (a lot less than I expected for same-drive copies). And cat and cp don't perform so badly. With these numbers, I don't find dd worth bothering with. Go with cat!

Jonas Stein
  • 4,078
  • 4
  • 36
  • 55
  • I'd recommend the OP to do his own benchmarking, but anyway, nice answer! – ninjalj Mar 17 '11 at 23:00
  • Off-topic, but @Gilles, what is the difference between echo 1 >|/proc/sys/vm/drop_caches and echo 1 > /proc/sys/vm/drop_caches? – Nikhil Mulley Dec 23 '11 at 14:09
  • 8
    @Nikhil >| is the same as > except that under set -o noclobber, the shell will complain that the file exists if you use >. – Gilles 'SO- stop being evil' Dec 23 '11 at 21:29
  • @Gilles Would you also cat a whole 500GB HDD when cloning? Is there a better way? – Léo Léopold Hertz 준영 May 11 '16 at 08:51
  • 3
    @Masi Yes, if I want to clone a whole disk, I'll use cat. Why are you looking for a better way? What's wrong with cat? – Gilles 'SO- stop being evil' May 11 '16 at 11:05
  • @Gilles I am just a newbie whose HDDs have some broken sectors, so I'm worried about copying from them. So what you propose is cat /media/HDD1 > /media/HDD2. Is there any limitation on size? HDD1 is always smaller than or equal to HDD2, which I think must be maintained. – Léo Léopold Hertz 준영 May 11 '16 at 12:26
  • 8
    @Masi cat just copies its input to its output. If you want to copy from unreliable media, and skip over unreadable parts or retry multiple times, that's a different problem, for which ddrescue works pretty nicely. – Gilles 'SO- stop being evil' May 11 '16 at 12:52
  • 1
    Well, dd reports the amount of data copied and the speed both during and after the copy, while cat does not, so it's a lot nicer when you're trying to clone a disk. – sudo May 06 '17 at 01:47
  • 4
    @sudo You can get the amount of data copied with lsof. Instant speed isn't very relevant with a disk copy because it's uniform so you can divide bytes transferred by elapsed time; if you want something better, you can use pv. – Gilles 'SO- stop being evil' May 06 '17 at 18:36
  • @Gilles In my experience, pv significantly bottlenecks the copy, so I only use it for slower things like network transfers. And how is cat better than dd at this point if it's not any faster, and you have to go manually calculate the throughput? – sudo May 06 '17 at 20:57
  • @sudo cat is faster if you don't get the bs parameter right, and unlike using dd with the bs parameter, there's no risk that it'll skip data in the middle. – Gilles 'SO- stop being evil' May 06 '17 at 21:22
  • @Gilles Ah, so if you don't know what bs to use, I now see that cat picks automatically based on the system buffer and filesystem block sizes. I still see dd perform much better in certain cases if you're careful, but I can see why you'd use cat. – sudo May 06 '17 at 22:12
  • 1
    @Gilles'SO-stopbeingevil' If you remember, can you please explain what you meant by the "risk of skipping data in the middle" when using dd? – Silv May 08 '20 at 23:19
  • 1
    PS. I've searched the internet for dd being risky in this context, but with no luck. – Silv May 08 '20 at 23:24
  • 3
    @Silv https://unix.stackexchange.com/questions/32988/why-does-dd-from-dev-random-give-different-file-sizes – Gilles 'SO- stop being evil' May 09 '20 at 11:27
  • 1
    @Gilles'SO-stopbeingevil' Thank you! I didn't expect you to remember what you were about 3 years ago. :) That's a useful reading (along with a couple of other SO questions I encountered reading it). – Silv May 09 '20 at 18:03
  • Which commands were used exactly for dd, cp and cat? – Cadoiz Jan 19 '21 at 07:50
  • 1
    @Cadoiz The commands shown, plus the filename arguments or redirections. In cases where both a file name and redirection could be used (e.g. cat input), I don't remember which one was used, but it doesn't make any difference anyway (they don't use different open flags or anything that could affect the speed of the copy). – Gilles 'SO- stop being evil' Jan 19 '21 at 11:09
  • The more times I return to this answer, the more I question why cat or cp would be faster than dd; what's going on in userland or kernel-land to make this happen? Is dd using a different syscall to write the data? If not, then is there any possibility the benchmark is being unfair? Could dd be executing an fsync() that's not performed by cat? – Philip Couling Oct 29 '21 at 14:13
  • @PhilipCouling I think it's due to the size and sequencing of read and write calls. dd forces a read of bs bytes followed by a write of bs bytes. Other tools can pick more optimal sequences if they want. But I don't know what cat and cp actually do. – Gilles 'SO- stop being evil' Oct 29 '21 at 14:25
  • 3
    @PhilipCouling Also, cp and cat in modern Busybox (≥1.23, 2014 release) use the sendfile system call, and the latest coreutils (≥9.0, Sep 2021 release) uses copy_file_range on Linux. Both give the kernel considerable opportunity to pick optimal buffer sizes and to interleave device reads and writes in an optimal way. – Gilles 'SO- stop being evil' Nov 02 '21 at 08:34
  • @Gilles'SO-stopbeingevil' The different syscall makes more sense to me than the discussion of tuning read/write sizes, especially with sizes >= 1MB. I can imagine that boxing up the data to pass to userland and vice versa comes with overheads (extra copying?). In any case, thanks for the follow-up. – Philip Couling Nov 02 '21 at 08:57
38

dd dates from back when it was needed to translate old IBM mainframe tapes, and the block size had to match the one used to write the tape or data blocks would be skipped or truncated. (9-track tapes were finicky. Be glad they're long dead.) These days, the block size should be a multiple of the device sector size (usually 4KB, though on very recent disks it may be much larger and on very small thumb drives it may be smaller; 4KB is a reasonable middle ground regardless), and the larger the better for performance. I often use 1MB block sizes with hard drives. (We have a lot more memory to throw around these days too.)
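
If you want to check your device's sector sizes before picking a multiple, a quick sketch (commands from util-linux; /dev/sdX is a placeholder for your device, and blockdev needs root):

blockdev --getss /dev/sdX       # logical sector size in bytes
blockdev --getpbsz /dev/sdX     # physical sector size in bytes
lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sdX
cat /sys/block/sdX/queue/physical_block_size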

geekosaur
  • 32,047
  • Hard drives or USB mass storage devices are either 512 or 4096 (newer) bytes. Optical and direct access flash media is 2048 bytes. Can't go wrong with 4096 bytes. – LawrenceC Mar 17 '11 at 13:34
  • 4
    Why the copying program's block size should have anything to do with the underlying device's characteristics (tapes excepted)? The kernel does its own buffering (and sometimes prefetching) anyway. – Gilles 'SO- stop being evil' Mar 17 '11 at 22:43
  • 2
    To minimize fractional buffers: things in general go faster when you use aligned buffers, because the kernel can start buffer reads/writes at sector (or better, track or cylinder, but I think modern drives lie about those) and kernel buffer boundaries, and because the kernel isn't having to skip over stuff or read extra stuff or manage partial buffers. Certainly you can just let the kernel deal with it all, but if you're copying gigabytes of data that extra work can cut the copy time down considerably. – geekosaur Mar 17 '11 at 22:53
  • 1
    You (generally) need to include @Gilles if you want me to be notified of your comment reply, see How do comment @replies work?. Since I happened to be passing by: the kernel will deal with it all anyway. Your claim that “that extra work can cut the copy time down considerably” doesn't agree with my benchmarks, but different systems may have different behaviors, so please contribute timings too! – Gilles 'SO- stop being evil' Mar 17 '11 at 23:07
  • @Gilles: sorry, I had mistaken you for the original asker. – geekosaur Mar 17 '11 at 23:09
10

I agree with geekosaur's answer that the size should be a multiple of the block size, which is often 4K.

If you want to find the block size, stat -c "%o" filename is probably the easiest option.

But say you do dd bs=4K, that means it does read(4096); write(4096); read(4096); write(4096)...

Each system call involves a context switch, which involves some overhead, and depending on the I/O scheduler, reads with interspersed writes could cause the disk to do lots of seeks. (Probably not a major issue with the Linux scheduler, but nonetheless something to think about.)

So if you do bs=8K, you allow the disk to read two blocks at a time, which are probably close together on the disk, before seeking somewhere else to do the write (or to service I/O for another process).

By that logic, bs=16K is even better, etc.

So what I'd like to know is if there is an upper limit where performance starts to get worse, or if it's only bounded by memory.
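
One rough way to probe for such a plateau yourself, sketched under the assumption that SRC and DST are placeholder paths on the devices you care about and that you can write to drop_caches as root:

for bs in 4k 8k 16k 32k 64k 128k 512k 1M 4M 16M; do
    sync; echo 1 > /proc/sys/vm/drop_caches     # cold cache for each trial
    printf 'bs=%s: ' "$bs"
    # dd prints its "bytes copied, seconds, MB/s" summary on stderr;
    # conv=fsync makes the reported rate include flushing to the device
    dd if="$SRC" of="$DST" bs="$bs" conv=fsync 2>&1 | tail -n 1
done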

Cadoiz
  • 276
Mikel
  • 57,299
  • 15
  • 134
  • 153
  • 5
    Profile, don't speculate! – Gilles 'SO- stop being evil' Mar 17 '11 at 23:30
  • Absolutely. I am doing that now. But your benchmarks already illustrate the point. – Mikel Mar 17 '11 at 23:38
  • 1
    The Linux Programming Interface agrees with me. See Chapter 13 - File I/O Buffering. – Mikel Mar 18 '11 at 01:13
  • 4
    Interestingly, their benchmarks suggest there is little benefit above 4K, however. – Mikel Mar 18 '11 at 01:24
  • 4
    Also, apparently the default file read ahead window is 128 KB, so that value might be beneficial. – Mikel Mar 18 '11 at 01:33
  • If you want your data to be hard to recover after you delete it on a hard drive, you should use a larger block size, as smaller block sizes (1M for example) make a hard drive fairly easy to recover. – boulder_ruby Jan 10 '15 at 18:25
  • 10
    I have access to a 24-drive RAID50 here, where bs=8K gets me 197 MB/s but bs=1M gets me 2.2 GB/s, which is close to the theoretical throughput of the RAID. So bs matters a LOT. However, using bs=10M I only get 1.7 GB/s. So it appears to get worse over some threshold, but I'm not sure why. – Joseph Garvin Nov 02 '15 at 18:22
  • 2
    @JosephGarvin Yes, I always find it worthwhile to play around a bit with the block size if I'm doing a large transfer. It seems the optimum block size depends on a lot of different things. – sudo May 06 '17 at 21:01
  • I want to mention that the improvements with larger bs stagnate. Consider this: https://unix.stackexchange.com/a/144177/318461 – Cadoiz Jan 19 '21 at 06:31
  • This article is also interesting: http://blog.tdg5.com/tuning-dd-block-size/ – Igor de Lorenzi May 31 '23 at 18:09
4

As Gilles says, you can determine the optimal parameter for the bs option to dd by benchmarking. This, though, raises the question: how can you conveniently benchmark this parameter?

My tentative answer to this question is: use dd-opt, the utility I've recently started working on to solve precisely this problem :)

Cadoiz
  • 276
3

If not, is there an easy way to get 90-95% of the way there?

Use bs=1M

It'll give you more than 95% of optimal performance on more than 85% of your devices, from slow USB 2/3 flash drives, SD cards and hard drives to NVMe SSDs and even RAM-only devices such as /dev/zero.

Source?

Voices in my head.

And some empirical testing over 10+ years combined with pseudo-scientific benchmarking and biased common sense.

Hey, you asked about the easy way!
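
For what it's worth, a typical invocation following this advice might look like the line below; the paths are placeholders, and status=progress (GNU coreutils ≥ 8.24) and conv=fsync are optional but convenient when writing an image to removable media:

dd if=image.img of=/dev/sdX bs=1M conv=fsync status=progress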

MestreLion
  • 1,418
0

I optimized for an SD card reader on USB 2.0, which seems to run best at bs=10M. I tried block sizes from 4k up to 16M; beyond 8-10M there was no improvement. You can see how the transfer rate measurement degrades... most likely due to filling up the buffers on the device and then waiting for the device to transfer to the actual medium.

angstrom/sdcard# dd if=/dev/zero of=/dev/sdb bs=10M
123+0 records in
123+0 records out
1289748480 bytes (1.3 GB) copied, 21.4684 s, 60.1 MB/s
341+0 records in
341+0 records out
3575644160 bytes (3.6 GB) copied, 117.636 s, 30.4 MB/s
816+0 records in
816+0 records out
8556380160 bytes (8.6 GB) copied, 326.588 s, 26.2 MB/s
955+0 records in
955+0 records out
10013900800 bytes (10 GB) copied, 387.456 s, 25.8 MB/s
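
The intermediate statistics above were presumably obtained by signalling dd while it ran: GNU dd prints its current I/O statistics when it receives SIGUSR1. A sketch, assuming a single dd process is running:

kill -USR1 "$(pgrep -x dd)"    # ask the running dd to report records in/out and throughput so far
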
jasonwryan
  • 73,126
wwright
  • 111
  • The option status=progress can be used for monitoring. You could also consider this: https://unix.stackexchange.com/a/144178/318461 – Cadoiz Jan 19 '21 at 06:13