Optimal blocksizes for dd are around 64k-256k, though humans usually prefer 1M.
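For example (the device names here are placeholders, so adjust them to your setup), a typical whole-disk copy with a human-friendly blocksize looks like this; status=progress is a GNU dd option that just prints a running byte count:

$ dd if=/dev/sdX of=/dev/sdY bs=1M status=progress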
A benchmark without real I/O:
$ for bs in 512 4k 16k 64k 128k 256k 512k 1M 4M 16M 64M 128M 256M 512M
> do
> echo ---- $bs: ----
> dd bs=$bs if=/dev/zero of=/dev/null iflag=count_bytes count=10000M
> done
---- 512: ----
20480000+0 records in
20480000+0 records out
10485760000 bytes (10 GB) copied, 4.2422 s, 2.5 GB/s
---- 4k: ----
2560000+0 records in
2560000+0 records out
10485760000 bytes (10 GB) copied, 0.843686 s, 12.4 GB/s
---- 16k: ----
640000+0 records in
640000+0 records out
10485760000 bytes (10 GB) copied, 0.533373 s, 19.7 GB/s
---- 64k: ----
160000+0 records in
160000+0 records out
10485760000 bytes (10 GB) copied, 0.480879 s, 21.8 GB/s
---- 128k: ----
80000+0 records in
80000+0 records out
10485760000 bytes (10 GB) copied, 0.464556 s, 22.6 GB/s
---- 256k: ----
40000+0 records in
40000+0 records out
10485760000 bytes (10 GB) copied, 0.48516 s, 21.6 GB/s
---- 512k: ----
20000+0 records in
20000+0 records out
10485760000 bytes (10 GB) copied, 0.495087 s, 21.2 GB/s
---- 1M: ----
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 0.494201 s, 21.2 GB/s
---- 4M: ----
2500+0 records in
2500+0 records out
10485760000 bytes (10 GB) copied, 0.496309 s, 21.1 GB/s
---- 16M: ----
625+0 records in
625+0 records out
10485760000 bytes (10 GB) copied, 0.972703 s, 10.8 GB/s
---- 64M: ----
156+1 records in
156+1 records out
10485760000 bytes (10 GB) copied, 1.0409 s, 10.1 GB/s
---- 128M: ----
78+1 records in
78+1 records out
10485760000 bytes (10 GB) copied, 1.04533 s, 10.0 GB/s
---- 256M: ----
39+1 records in
39+1 records out
10485760000 bytes (10 GB) copied, 1.04685 s, 10.0 GB/s
---- 512M: ----
19+1 records in
19+1 records out
10485760000 bytes (10 GB) copied, 1.0436 s, 10.0 GB/s
- The default 512 bytes is slow as hell (two syscalls per 512 bytes is just too much for the CPU)
- 4k is considerably better than 512
- 16k is considerably better than 4k
- 64k-256k is about as good as it gets
- 512k-4M is slightly slower
- 16M-512M cuts the speed in half, worse than 4k.
My guess is that above a certain size you start losing speed due to lack of concurrency. dd is a single process; concurrency is largely provided by the kernel (readahead, cached writes, ...). If it has to read 100M before it can write 100M, there will be moments when one device sits idle, waiting for the other to finish reading or writing. With too small a blocksize you suffer from sheer syscall overhead, but that goes away completely at around 64k.
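If you want to see the syscall overhead for yourself, strace can count the read/write calls dd issues. A quick sketch: both commands below copy the same 1 MiB, but the first needs roughly 2048 reads and 2048 writes while the second gets away with one of each:

$ strace -c -e trace=read,write dd bs=512 if=/dev/zero of=/dev/null count=2048
$ strace -c -e trace=read,write dd bs=1M if=/dev/zero of=/dev/null count=1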
Blocksizes of 100M or larger might help when copying from and to the same device. At least for hard drives this should reduce the time wasted on seeking, since the head can't be in two places at once.
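A sketch of such a same-device copy (both paths are made up; the point is only that source and destination sit on the same spinning disk, so large sequential chunks mean fewer seeks back and forth):

$ dd if=/mnt/disk/source.img of=/mnt/disk/copy.img bs=128M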
Why are you overwriting your SSD like this in the first place? Normally you try to avoid unnecessary writes on SSDs; if the drive considers all of its space used, it will likely also lose some of its performance until you TRIM it free again.
You could use this command instead to TRIM/discard your entire SSD:
blkdiscard /dev/sda
If your SSD has deterministic read zeroes after TRIM (a property you can check with hdparm -I), it will look like it's full of zeroes, but the SSD actually considers all of its blocks free, which should give you the best possible performance.
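To check it, something like this should do (the output lines are an example of what hdparm typically reports; the exact wording varies by drive):

$ hdparm -I /dev/sda | grep -i trim
	   *	Data Set Management TRIM supported
	   *	Deterministic read ZEROs after TRIM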
The downside of TRIM is that you lose all chances at data recovery if the deleted file has already been discarded...
You could also use hdparm --security-erase to direct the drive to wipe itself. – psusi Feb 10 '15 at 14:00
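For reference, a rough sketch of how that usually goes with hdparm (the password and device are placeholders; check that the drive is not reported as "frozen" by hdparm -I first, and be aware this irreversibly wipes everything):

# set a temporary security password, then tell the drive to erase itself
$ hdparm --user-master u --security-set-pass tmppass /dev/sdX
$ hdparm --user-master u --security-erase tmppass /dev/sdX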