44

I've read many guides and forum posts describing how to use dd, but one thing I've noticed is that people always use different values for the bs=, count= and seek= switches.

Please can someone explain what these switches do exactly (the man page isn't very detailed), and explain what the best settings for them are for different tasks, such as creating files from either /dev/random or /dev/zero, and overwriting partitions and external drives.

Manuel Jordan
Eric

3 Answers

37

I really don't know how to explain this better than the manpage does.

bs= sets the block size, i.e. how many bytes dd reads and writes at a time. For example, bs=1M is a block size of 1MiB.

count= copies only this number of blocks (the default is for dd to keep going forever or until the input runs out). Ideally blocks are of bs= size but there may be incomplete reads, so if you use count= in order to copy a specific amount of data (count*bs), you should also supply iflag=fullblock.
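
As a rough illustration of how bs= and count= combine, here is a minimal sketch. The temporary directory and file names are made up for the example. It shows that count= is an upper limit, so dd stops early when the input ends.

```shell
# Work in a throwaway directory (hypothetical file names).
tmp=$(mktemp -d)

# 4 blocks of 1024 bytes from /dev/zero: exactly 4 * 1024 = 4096 bytes.
dd if=/dev/zero of="$tmp/zeros.bin" bs=1024 count=4 2>/dev/null
zeros_size=$(wc -c < "$tmp/zeros.bin")

# count= is only an upper limit: the input here is 5 bytes long,
# so dd stops at end of input instead of producing 100 blocks.
printf 'hello' | dd of="$tmp/short.bin" bs=1 count=100 2>/dev/null
short_size=$(wc -c < "$tmp/short.bin")

echo "zeros.bin: $zeros_size bytes, short.bin: $short_size bytes"
```

The first file should come out at 4096 bytes and the second at 5 bytes.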

seek= seeks this number of blocks in the output, instead of writing to the very beginning of the output device.

So, for example, this copies 1MiB worth of y\n to position 8MiB of the outputfile. So the total filesize will be 9MiB.

$ yes | dd bs=1M count=1 seek=8 iflag=fullblock of=outputfile
$ ls -alh outputfile
9.0M Jun  3 21:02 outputfile
$ hexdump -C outputfile
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00800000  79 0a 79 0a 79 0a 79 0a  79 0a 79 0a 79 0a 79 0a  |y.y.y.y.y.y.y.y.|
*
00900000

Since you mention /dev/random and overwriting partitions... it will take forever since /dev/random (as well as /dev/urandom) is just too slow. You could just use shred -v -n 1 instead, that's fast and usually available anywhere.
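
For a small file (as opposed to a whole partition), reading from /dev/urandom is usually still practical. A minimal sketch, assuming GNU dd (for iflag=fullblock and the 1M suffix) and a made-up file name:

```shell
# Throwaway directory and file name are placeholders for this example.
tmp_rand=$(mktemp -d)

# 2 blocks of 1 MiB of random data; iflag=fullblock (GNU dd) guards
# against short reads, so the result is exactly 2 * 1048576 bytes.
dd if=/dev/urandom of="$tmp_rand/random.bin" bs=1M count=2 iflag=fullblock 2>/dev/null
rand_size=$(wc -c < "$tmp_rand/random.bin")

echo "random.bin: $rand_size bytes"
```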

frostschutz
  • 3
    More precisely, count=N copies at most N blocks. If the input is one block, it won't repeat that single block N times. – MSalters Feb 25 '20 at 17:06
  • First off, in the case of dd your answer provides a better explanation than the man page. One thing I would like to add to help the less experienced is to add that bs in this case is an arbitrary value whose size depends on the application context dd is being used for. The meaning of block has been disambiguated very nicely in difference-between-block-size-and-cluster-size. – darbehdar Aug 19 '21 at 07:42
25

Ok, you said the man pages were not detailed, so I will explain what they mean with an easy to understand metaphor about a moving guy (he goes by the name of dd):

   bs=BYTES
          read and write up to BYTES bytes at a time

dd picks something up (boxes, vases, beds, rice, etc.), moves it to where it needs to be, and drops it off. Until he has dropped off the load in his box, he doesn't pick up anything else, that is, another box with another load of objects.

Now, when you need to tell him exactly how many objects to load into the box per trip, that is what bs does. It sets the amount of data he reads and writes at a time. You will want to set it in almost every common, useful command.

   count=N
          copy only N input blocks

This determines the total number of boxes he will move. Boxes in this context are the blocks on the disk. If you tell him to move 5 boxes, he moves only 5 boxes even if there are more than 5 (if there are fewer than 5 boxes, he will take a vase he found besides the boxes to add it up). If you tell dd to count only 5 and write them somewhere, he copies the first 5 blocks he sees and writes them where you want.

   seek=N
          skip N obs-sized blocks at start of output

The guy normally finds the first available place to drop the load, which is normally the start (of the disk), and continues filling up until the end. With this option you tell dd to start further in: say, instead of the hall, start in one of the rooms further inside. It just skips the starting blocks.
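
To make the "start further in" idea concrete, here is a small sketch on an ordinary file. The file name is a placeholder. conv=notrunc is added so that dd keeps the existing contents after the written region instead of truncating the file.

```shell
# Throwaway directory and file name are placeholders for this example.
tmp_seek=$(mktemp -d)
printf 'AAAAAAAA' > "$tmp_seek/target.txt"

# bs=1 makes each block one byte, so seek=3 skips the first 3 bytes
# of the output before writing. conv=notrunc leaves the rest of the
# existing file in place.
printf 'BB' | dd of="$tmp_seek/target.txt" bs=1 seek=3 conv=notrunc 2>/dev/null

patched=$(cat "$tmp_seek/target.txt")
echo "$patched"   # AAABBAAA
```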

Now, depending on what you are doing, you will need different combinations based on the source and destination, along with the format in which they will be read and written. I recommend searching for each case separately.

Manuel Jordan
Braiam
  • 2
    "he will take a vase he found besides the boxes to add it up". What does this metaphor stand for? – Ini Dec 25 '18 at 19:43
  • 3
    @Ini That if there's an adjacent block that doesn't belong to the if= input being read, dd will read it and move it too. A vase isn't a box, yet dd moves it. – Braiam Dec 25 '18 at 20:46
5

One important thing that hasn't been mentioned here so far is that, e.g., dd bs=16G requires at least 16 gigabytes of free RAM. If you don't have that much, you can use a smaller block size (say, bs=2G) and do multiple rounds by setting count to a number greater than 1. (In this case, 8, to achieve [up to] 16 gigabytes of output [depending on the input].)

Therefore:

 bs x count = data size
2GB x   8   = 16GB
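
The same arithmetic can be checked at a much smaller scale. This is only a sketch with made-up file names: it writes 64 KiB twice, once as 64 small blocks and once as a single large block, and compares the results.

```shell
# Throwaway directory and file names are placeholders for this example.
tmp_bs=$(mktemp -d)

# Same total (64 KiB), split differently between bs= and count=.
dd if=/dev/zero of="$tmp_bs/small_blocks.bin" bs=1024 count=64 2>/dev/null
dd if=/dev/zero of="$tmp_bs/one_block.bin" bs=65536 count=1 2>/dev/null

small_size=$(wc -c < "$tmp_bs/small_blocks.bin")
big_size=$(wc -c < "$tmp_bs/one_block.bin")
expected=$((1024 * 64))

echo "small blocks: $small_size, one block: $big_size, bs x count: $expected"

# cmp exits with status 0 when the two files are byte-for-byte identical.
cmp "$tmp_bs/small_blocks.bin" "$tmp_bs/one_block.bin" && echo "identical"
```

Both files should be 65536 bytes. Only the amount of data buffered per read and write differs.
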
Manuel Jordan
balu
  • 2
    Is there an "optimal" blocksize/count combination or does it not matter as long as blocksize < ram && blocksize*count = size_of_data? – lucidbrot May 23 '20 at 16:39
  • 2
    @lucidbrot I'm not an expert but I would assume that optimality (in terms of performance) strongly depends on the specifics of the input / output device and there's no general answer here. For instance, if the source device is an HDD, the block size should probably account for how much data can be read from the HDD at once without moving the head (i.e. without incurring additional latencies). Clearly, this also depends on what file system is being used and how far the source data is stored contiguously (as opposed to being fragmented / split up). – balu May 24 '20 at 15:06
  • 1
    @lucidbrot You might also be interested in this answer: https://unix.stackexchange.com/questions/26710/whats-the-difference-between-these-two-dd-commands/26766#26766 – balu May 24 '20 at 15:07
  • Thank you. I was expecting something like that but I feel like I don't fully grasp yet how to figure out which bs I need. I might pose that as a question later – lucidbrot May 24 '20 at 16:17