8

Suddenly something got messed up with my partitions, or perhaps just one partition. I have a default Ubuntu installation on a Kingston SSD, with the root file system encrypted with LUKS (using AES, I think). Now I'm trying to mount the partition from a live CD, but without luck.

I am so afraid of doing some additional harm that cannot be undone, so I would like to make an exact copy of the drive. That means all partition tables, whatever kind of metadata the LUKS partition has, and any other kind of metadata that I don't know of. I guess I want all the empty blocks too, to feel absolutely safe.

I know about dd if=/dev/sda of=/dev/sdb, but I don't know if it will include all the data described.

Perhaps I need to specify the block size with bs=, but I don't understand how that works or why it is necessary (if it is). I also don't know how to find the block size of the partition.

Please tell me if it does copy all data, and if not, if there is another way.

  • 1
    Not trying to be mean here, but where do you think metadata could be stored, other than the persistent storage? The term you are looking for is "disk cloning", btw. – Benjamin B. Jul 28 '15 at 14:34
  • 1
    You mean creature ;) I did not believe it was stored anywhere other than on the drive, but maybe somewhere on the disk that is somehow not "represented" in /dev/sda. Perhaps the first 1kB or so is not considered "real data" (or whatever), and /dev/sda begins after that 1kB. Low-level stuff has surprised me in the past. – Mads Skjern Jul 28 '15 at 15:19
  • Alright, I understand it. Fortunately you can expect the device files in Linux to be your interface to the raw data, i.e., the data without the notion of "files", contained on the disks, and dd to be the Swiss army knife that deals with it. Best of luck! – Benjamin B. Jul 28 '15 at 15:26

3 Answers

12

Yes, it does: even the blocks that do not (officially) contain data, and also all information regarding partitions, UUIDs, etc.

For example, recovering data (e.g. deleted files) from the dd-copied drive would still be possible.

You may want to read this regarding the noerror and sync options.

Block size (bs=) doesn't affect the result unless there are read errors, but you should set it to 1M (or at least 4k), or the copy will take longer for no good reason.
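
Putting those pieces together, a full invocation might look like the sketch below (device names are assumed; status=progress needs a reasonably recent GNU dd, and conv=noerror,sync only matters if you expect read errors, since sync pads a failed read with zeros up to the block size):

dd if=/dev/sda of=/dev/sdb bs=1M conv=noerror,sync status=progress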

sourcejedi
  • 50,249
FelixJN
  • 13,566
  • And about block size. Does it matter? – Mads Skjern Jul 28 '15 at 13:55
  • Not likely, no. From what I understand your disk isn't defective, so you'll probably not run into read errors. A larger block size will just speed up the process. This is basically what Fiximan said. – Benjamin B. Jul 28 '15 at 14:36
  • @BenjaminB. A 1M block size is inefficient. dd has to spend more time copying from dev>mem/mem>dev than necessary. And if the block dev is failing and there are read errors, then that block size will not only slow the operation, but also contribute to more lost data per short read. Best - depending on source/target devs and system kernel - is usually somewhere in the 32K-256K realm. – mikeserv Jul 29 '15 at 05:45
  • @mikeserv: Ah! Now that I read your answer below, your comment makes sense. I didn't know it was possible and safe to use cp for cloning block devices. Thanks for that! – Benjamin B. Jul 29 '15 at 08:41
  • @BenjaminB. Not that it much matters, but I think it a little strange - if the answer is worth a thanks, is it not worth a vote? Also... how does the comment about dd block-size relate to the answer about cp ? – mikeserv Jul 29 '15 at 08:56
  • @mikeserv: fishing for upvotes, ay? ;) It relates to it because my first reaction to your comment was: "if 1M block sizes are inefficient, just increase it to 100M", which is probably not what you meant. Your comment about 1M being inefficient is vague, because it doesn't state why that would be so. A larger block size is more efficient if you use dd (period). The explanation is in your answer: don't use a buffer at all (like dd), but simply copy disk-to-disk using cp. – Benjamin B. Jul 29 '15 at 09:01
  • 1
    @BenjaminB. - larger block sizes are not more efficient. This is because the disk can/will only serve up so much data at a time anyway, and dd has to store that in memory between read/write. Tiny block sizes are also inefficient because rather than storing too much in memory, instead dd has to keep going back and reading more. What you want is to match the time it takes for a read to return with the time it takes to store and write. And so dd bs=64k <i >o is faster than dd bs=1M <i >o (see the benchmark sketch after this thread). And I wasn't fishing - but saying thanks is antithetical to the SE model. cp buffers too. – mikeserv Jul 29 '15 at 09:06
  • 1
    @mikeserv: Regarding your comment on upvotes, I want to tell you a very general fact that might be useful for you. I always wait a bit before giving up/down votes, or choosing an answer. I do this because I need time to understand what people write and whether they are right. I don't like voting before I understand :) I could imagine that other askers also don't know how to vote before spending some time trying out answers, but I don't know. – Mads Skjern Jul 29 '15 at 10:41
  • @MadsSkjern - I believe that's true - but my comment was for Benjamin. I just considered it weird that someone would take the time to write thanks for an answer which they didn't upvote. Seems the easier thing would be to click the button. – mikeserv Jul 29 '15 at 10:48
  • @MadsSkjern: and even after five years you did not select one of the answers? It seems that you needed that much time to rethink this, or forgot about it completely... – Marvin Emil Brach Sep 02 '20 at 00:07
  • @Marvin: Apologies, and good that you reminded me. I will read through the answers after work and choose the best one. I still believe that people should not immediately choose a best answer. – Mads Skjern Sep 02 '20 at 04:54
  • @MadsSkjern LOL.. I really did not expect that you would answer that :D but it's good, I think they all deserve the acceptance of their answers :) But better to choose some answer than to forget it for 5 years - at least if it's correct and more than one sentence... Also, your question was "does dd copy everything" -> this was already answered completely and sufficiently in the first post by Fiximan... Better answers will get their points anyway, while getting more upvotes than the accepted one :D – Marvin Emil Brach Sep 02 '20 at 08:45
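
To settle the block-size debate above empirically, one could time a fixed amount of data at several block sizes before committing to one. A rough sketch, assuming GNU dd (iflag=count_bytes makes count a byte count, and iflag=direct bypasses the page cache so repeated runs aren't skewed, where the device supports it):

for bs in 4k 64k 256k 1M; do
    echo "bs=$bs"
    # read 1 GiB from the source and discard it; dd reports throughput on stderr
    dd if=/dev/sda of=/dev/null bs=$bs count=1G iflag=count_bytes,direct 2>&1 | tail -n 1
done
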
8

Just do:

cp /dev/block_device imgfile

If imgfile is located on a file-system which understands such things, GNU cp should default to writing the image sparsely. You can specify your preference explicitly, though, like...

cp --sparse=always /dev/sda imgfile

dd's primary usefulness is its ability to reliably take only a specified portion of a stream, or to apply certain conversions to that stream very efficiently. If you want a 1:1 copy of a whole file, just cp it.
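
To see how much space a sparse image actually occupies, and to restore it later, something like the following works (paths assumed; du --apparent-size is GNU coreutils):

du -h --apparent-size imgfile   # logical size: matches the source device
du -h imgfile                   # physical size: holes take no blocks
cp imgfile /dev/sda             # restoring writes the holes back out as zeros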

mikeserv
  • 58,310
  • What does sparsely mean here? – Mads Skjern Jul 29 '15 at 07:43
  • @MadsSkjern - it means that the filesystem can encode a hole in a file that represents so many null-bytes in such a way that the file need not use all of the disk space allocated to the file. And so you can write a sparse file like: </dev/null dd bs=64k seek=1 of=sparse; ls -slh sparse which prints 0 -rw-r--r-- 1 mikeserv mikeserv 64K Jul 29 01:17 sparse where ls indicates that the sparse file uses 0 bytes of space but represents 64k worth of empty space. To explicitly include empty blocks w/ GNU cp: cp --sparse=never sourcefile targetfile. – mikeserv Jul 29 '15 at 08:20
  • So if I understand correctly, sparsely does the opposite of what I asked for? Not that it's not valuable information. Also, if I make a sparse copy, can I then restore it back to a disk, so that the disk has all those holes? I'm not gonna do the sparse thing, I'm just curious. – Mads Skjern Jul 29 '15 at 08:25
  • 1
    @MadsSkjern - you should do the sparse thing. cp will be looking at the block dev - not the fs's representation of that. And so cp will skip writing the NULs and seek to the offset of the sequence's end-point before writing more bytes - it means there is less direct disk access in the write operation - not the read. And sparse files are handled by the filesystem - and so it only works if the fs supports it. This means writing a sparse file directly to a block-dev - like cp sparse /dev/something will write 64k in NULs - because there is no fs involved there. – mikeserv Jul 29 '15 at 08:33
  • @MadsSkjern - that aside, what you do is your own decision of course. If you would rather not write your image sparsely, then cp --sparse=never sourcefile targetfile is what you should do – mikeserv Jul 29 '15 at 08:34
4

dd doesn't care what the data it copies means. Partition tables, partition contents, file fragments, empty filesystem space, it's all bytes. dd if=/dev/sda of=/dev/sdb makes /dev/sdb an exact copy of /dev/sda, provided that sdb is at least as large as sda (plus some trailing junk that won't be directly accessible if sdb is larger).

All the magic is in the sdX block devices. dd is just a tool to copy bytes around.
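
As a quick sanity check after cloning, the two devices can be compared byte for byte (a sketch; if /dev/sdb is larger, cmp will simply report EOF on the smaller device once the copied region has matched):

cmp /dev/sda /dev/sdb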

This doesn't mean that dd is the best tool for the job, though. It's somewhat error-prone and typically not the fastest thing around; I found cat to be faster when copying between different disks. dd can lose data in somewhat unintuitive ways (though I think a modern Linux system is safe in this respect). Using cat has the additional advantage that there's less risk of destroying your data with a typo (like swapping if and of): the output is specified via the familiar shell redirection operator (you can use this syntax for dd too, by the way).

cat /dev/sda >/dev/sdb

If your other disk is larger, you can make an image of the disk in a file:

cat /dev/sda >/path/to/disk.img

Such a disk image can't be used directly: you can't boot off it. But copying it back to a disk will yield a byte-for-byte copy of the original, since the whole contents were copied both times. You can also do a loopback mount to access files off it. You can make a loop device with partitions, but Ubuntu has only had the tools for that in recent versions. If you just want to preserve your data, it's enough to copy partitions individually, and store them in individual files.
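
For the loopback route, here is a sketch assuming a modern util-linux (losetup -P asks the kernel to scan the image's partition table; the partition number and mapper name below are assumptions for a typical encrypted Ubuntu layout):

sudo losetup -fP --show /path/to/disk.img    # prints the loop device, e.g. /dev/loop0
ls /dev/loop0p*                              # one node per partition in the image
sudo cryptsetup open /dev/loop0p5 cryptroot  # unlock the LUKS partition (number assumed)
sudo mount /dev/mapper/cryptroot /mnt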

  • 1
    cat being faster than dd seems strange. Do you have an explanation for this, and in what circumstances? – Jodka Lemon Jul 29 '15 at 01:30
  • @JodkaLemon I don't know, I didn't study the traces in detail. I'd have expected the chunk size (i.e. the size that's copied at a time) to be the main factor, but dd was slower for cross-disk copies even when tuning its chunk size. – Gilles 'SO- stop being evil' Jul 29 '15 at 01:32