40

I would like to clone a whole partition or a whole hard drive onto a larger external disk, but as a sparse file. I often use dd for cloning, but it doesn't support sparse files. As a workaround I used something like:

cp --sparse=always <(dd if=/dev/sda1 bs=8M) /mount/external/backup/sda1.raw

However, this is a little too tricky for my taste and doesn't allow me to resume the process if it is aborted. It is funny that there is an NTFS tool for this (ntfsclone) but no such tool exists for the native file systems of Linux (EXT2-4).

Is there a better tool for this, e.g. a dd variant with sparse support? I am not looking for proprietary disk backup software; I simply want to make a sparse clone that I can mount as a loop device if required.

maxschlepzig
  • 57,532
  • 7
    +1 for creative use of cp, it never occurred to me that you could sparse-copy a disk image. I always just compressed them if I needed to save space. Now why is that in a question not an answer? – Caleb Jul 20 '11 at 20:05

10 Answers

23

You want dd_rescue.

dd_rescue -a -b 8M /dev/sda1 /mount/external/backup/sda1.raw

The copy may be interrupted at any time with Ctrl-C, which shows the current position. This value can be used when restarting, by adding -s and the position to the original command, e.g.

dd_rescue -a -b 8M -s 42000k /dev/sda1 /mount/external/backup/sda1.raw

Even easier would be to specify a third file name, which acts as a log file. On restart dd_rescue will read that log file and pick up where it left off.

Olaf Dietsche
  • 1,637
  • 14
  • 17
  • 1
    Great! The manual says "If the copying process is interrupted by the user it is possible to continue at any position later." and "-a spArse file writing (default=no)". Exactly what I want! Thanks! – Martin Scharrer Jul 24 '11 at 21:56
  • 3
    Looking for dd_rescue online I found out that there is also a different tool called ddrescue (without the underscore) which was developed independently from dd_rescue but seems to do basically the same. I just mention that here as a general FYI. – Martin Scharrer Jul 24 '11 at 22:11
  • Yeah, dd_rescue and ddrescue aren't the same thing. Theoretically they do the same job, but generally I've had better luck with the older/original dd_rescue. – Steven Pritchard Jul 24 '11 at 22:49
  • 1
    In case anyone is wondering, you can stop the copy at any time with Ctrl-C. It will show you your current position, and you can use that value to restart by adding -s and the position to the original command. (So it would look like dd_rescue -a -b 8M -s 42000k /dev/sda1 /mount/external/backup/sda1.raw.) – Steven Pritchard Jul 24 '11 at 23:05
  • 1
    @Steven Pritchard: No need to remember the position. Specify a third filename, which will be the logfile, and on restart it'll read that and pick up where it left off. – Tanith Rosenbaum Nov 09 '14 at 19:38
  • Rather strangely, I get almost twice as fast image creation with ddrescue compared to dd, even with an increased block size in dd! (FYI: the Debian package name is gddrescue, but the executable and manual are still ddrescue; it took me some time to get that right!) – MrCalvin May 15 '19 at 20:39
  • Just fyi, the ddrescue tool wants the -S option for sparse output, as opposed to -a with dd_rescue – Remember Monica Apr 18 '22 at 00:02
22

Just for completeness, here is the call for ddrescue. The --sparse or -S flag allows the destination to be written sparsely:

$ ddrescue -S -b8M /dev/sda1 /mount/external/backup/sda1.raw

Or with long option:

$ ddrescue --sparse --block-size 8M /dev/sda1 /mount/external/backup/sda1.raw

Or if you prefer MiBs:

$ ddrescue -S -b8Mi /dev/sda1 /mount/external/backup/sda1.raw

To allow the rescue to be interrupted and resumed, you can also make use of a logfile:

$ ddrescue -S -b8Mi /dev/sda1 /mount/external/backup/sda1.raw ~/sda1.rescue.log

Note that GNU ddrescue and dd_rescue are different programs, but GNU ddrescue seems to be more widespread. For example, it is already packaged with GRML.

zaTricky
  • 408
maxschlepzig
  • 57,532
  • Does there need to be any special treatment of the image when restoring, can you provide the command used to restore a ddrescue? – user12439 Apr 06 '16 at 22:46
  • 1
    In theory the storage medium you're using for the rescue is supposed to be more reliable, so you can typically just use dd to write to the replacement disk: dd if=sda1.raw of=/dev/sdb1. However, to use ddrescue for the restore, you just change the source/destination you used for the rescue to the new source/destination, preferably with a new log file. If possible (often not), you can of course use ddrescue to copy data directly from the bad source disk to a replacement disk. – zaTricky May 15 '18 at 07:49
3

A patch was offered in 2007 to provide sparse file support in GNU dd, but it does not appear to have made it into coreutils (at least not as of 8.4). I doubt dd has changed much since then, so the patch might apply against the current version without a lot of work.

I'm also really impressed by the creative use of cp in your question, and it got me on the track of using it to accomplish resuming (here resuming from ~80M into the source):

cp --sparse=always \
  <(dd if=/dev/sda1 bs=8M skip=10) /dev/stdout \
  | dd bs=8M seek=10 of=/mount/external/backup/sda1.raw

Edit: scratch that. The second dd would of course be seeking to the wrong position in the output file, since it's not the same length as the input.
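
The underlying limitation can be seen on a small scratch file. The following is a minimal sketch, assuming GNU coreutils; sparse.img and piped.img are hypothetical stand-ins, not part of the original commands. It shows that holes do not survive a pipe: the reader only receives a stream of zero bytes, so whatever sits at the end of the pipe has to detect the zeros again on its own.

```shell
#!/bin/sh
# Sketch: holes in a sparse file are not carried through a pipe.
# sparse.img / piped.img are scratch files, not real disk images.
tmp=$(mktemp -d)

# An 8 MiB file that consists of a single hole.
truncate -s 8M "$tmp/sparse.img"

# Send it through a pipe; the second cat writes every zero byte it receives.
cat "$tmp/sparse.img" | cat > "$tmp/piped.img"

# Allocated bytes = number of allocated blocks * size of one such block.
alloc() { echo $(( $(stat -c '%b' "$1") * $(stat -c '%B' "$1") )); }

sparse_alloc=$(alloc "$tmp/sparse.img")
piped_alloc=$(alloc "$tmp/piped.img")
piped_size=$(stat -c '%s' "$tmp/piped.img")

echo "original: $sparse_alloc bytes allocated"
echo "piped:    $piped_alloc bytes allocated (apparent size $piped_size)"

rm -rf "$tmp"
```

On most filesystems the piped copy ends up fully allocated while the original occupies almost nothing, although the exact numbers depend on the filesystem.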

Eli Heady
  • 1,234
  • 1
    As is the case with bhinesley's answer, it would be best to log dd's progress for accurate resuming. If you were to use this approach for both the first run and resumes, and log both parallel dd's independently, then you could know how far into the output to seek. If I have time I'll try to work this up. – Eli Heady Jul 22 '11 at 18:48
  • 2
    Thanks for the link to the patch. I was starting to think about programming something like it myself :-) Sparse files can't be piped, so your code won't work. – Martin Scharrer Jul 22 '11 at 18:50
  • Yup, I just discovered that myself. Oh well, it was fun finding crazy new uses of cp - thanks! – Eli Heady Jul 22 '11 at 18:58
  • 2
    dd commit at 2012: http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=4e776faa8482ae630d2ea9bc767298e664f07ba9 "dd: add support for the conv=sparse option" ("(iwrite): Convert a write of a NUL block to a seek if requested.") – osgx May 19 '18 at 05:12
3

This is a very old question, and dd developers added support for sparse files over a decade ago (as was pointed out by osgx in a comment above). Nowadays you can simply run:

dd if=/dev/sda1 of=/mount/external/backup/sda1.raw bs=8M conv=sparse status=progress

Depending on how fragmented your data is, you might also be able to achieve a smaller file by specifying a smaller block size value.
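
To see how much space a given block size actually saves, you can compare the apparent size of the image with the space it occupies on disk. The following is a minimal sketch, assuming GNU coreutils; source.img is a hypothetical scratch file standing in for /dev/sda1 so that it runs without root.

```shell
#!/bin/sh
# Sketch: measure how sparse an image written by dd conv=sparse is.
# source.img is a scratch stand-in for a real partition.
tmp=$(mktemp -d)

# Build a 16 MiB "partition": 1 MiB of random data followed by zeros.
dd if=/dev/urandom of="$tmp/source.img" bs=1M count=1 status=none
truncate -s 16M "$tmp/source.img"

# Copy it the same way a real device would be copied.
dd if="$tmp/source.img" of="$tmp/copy.raw" bs=1M conv=sparse status=none

# Apparent size versus bytes actually allocated on disk.
apparent=$(stat -c '%s' "$tmp/copy.raw")
allocated=$(( $(stat -c '%b' "$tmp/copy.raw") * $(stat -c '%B' "$tmp/copy.raw") ))
echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"

# The sparse copy must still be byte-for-byte identical to the source.
cmp "$tmp/source.img" "$tmp/copy.raw" && identical=yes || identical=no
echo "identical: $identical"

rm -rf "$tmp"
```

Running the same check with different bs= values on a real image shows whether a smaller block size is worth the slower copy.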

Unlike dd_rescue, ddrescue, qemu-img or rsync, dd is typically installed by default on GNU/Linux systems (and is easier to use than cp --sparse=always <(dd ...)), which may make it a more desirable option.

If you cancel the dd command, you will get something like this:

dd if=/dev/sda1 of=sda1.raw bs=512 conv=sparse status=progress
10724033024 bytes (11 GB, 10 GiB) copied, 538 s, 19.9 MB/s^C
20967649+0 records in
20967648+0 records out
10735435776 bytes (11 GB, 10 GiB) copied, 539.375 s, 19.9 MB/s

You can later use the smaller value to resume by adding the seek= and skip= arguments, like so:

dd if=/dev/sda1 of=sda1.raw bs=512 conv=sparse status=progress seek=20967648 skip=20967648

You could also obtain the value via something like:

ls -l --block-size=512 sda1.raw
-rw-r--r-- 1 root root 20967648 Nov 10 21:00 sda1.raw
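
The resume step can also be scripted. The following is a minimal sketch, assuming GNU coreutils; source.img is a hypothetical scratch file standing in for /dev/sda1, and the interruption is simulated with count= instead of Ctrl-C. It derives the block count from the size of the partial image and passes it to both skip= and seek=.

```shell
#!/bin/sh
# Sketch: resume an interrupted dd copy from the size of the partial image.
# source.img stands in for the real device; count= simulates the interruption.
tmp=$(mktemp -d)
src="$tmp/source.img"
dst="$tmp/sda1.raw"
bs=512

dd if=/dev/urandom of="$src" bs=1M count=8 status=none

# "Interrupted" first run: only the first 5000 blocks are copied.
dd if="$src" of="$dst" bs="$bs" count=5000 conv=sparse status=none

# Number of whole blocks already present in the partial image.
done_blocks=$(( $(stat -c '%s' "$dst") / bs ))

# Resume: skip what was already read, seek past what was already written.
dd if="$src" of="$dst" bs="$bs" conv=sparse status=none \
   skip="$done_blocks" seek="$done_blocks"

cmp "$src" "$dst" && resumed_ok=yes || resumed_ok=no
echo "resumed at block $done_blocks, identical: $resumed_ok"

rm -rf "$tmp"
```

If the file size lags behind what dd had actually processed (for example because the last blocks were holes), resuming from the smaller value should only cost re-copying a few blocks.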
  • simple cp --sparse=always /dev/disk disk.img should work, without involving <(dd ...) not sure why that was done. dd conv=sparse is useful if you want to specify the block size. Note that alignment matters as well, so bs=8M could leave a 15M sequence of zeroes as non-sparse if it's not aligned to 8M boundary. For large blocksizes (or when reading from a pipe) you should also add iflag=fullblock. – frostschutz Nov 10 '23 at 11:19
  • conv=sparse is good. @frostschutz dd bs=8M was used for speed improvements on large data. I see now that conv=sparse only works with completely empty output blocks (obs), so dd conv=sparse ibs=8M obs=512 or similar would be a better way. In general, filesystem-aware cloning tools like ntfsclone or e2image would be preferable, I guess. – Martin Scharrer Nov 22 '23 at 10:46
2

Just adding my 2 cents. Another way to create a sparse file from a raw disk is with qemu-img using something like:

qemu-img convert -f raw /dev/sda /tmp/sda.raw

You can use this on a single partition as well. You also have the option to convert the raw disk/partition to any other format that qemu-img supports (QCOW2, VHD[x], vmdk, etc.).

2

Why not simply:

cp --sparse=always /dev/sda1 /mount/external/backup/sda1.raw
Zaz
  • 2,589
2

Another option is rsync. For example:

rsync -SP --copy-devices /dev/sda1 /mount/external/backup/sda1.raw

Explanation:

  • -S/--sparse to skip sparse blocks on write
  • -P/--partial --progress to show progress and keep partially transferred files
  • --copy-devices to copy device contents [1]

You can add --append to resume an interrupted copy (or --append-verify to confirm the checksum over both new and old data matches).

Footnotes
  1. The --copy-devices option is provided by copy-devices.diff from rsync-patches, so it may not be present on some systems. It is included by Fedora, Ubuntu, Debian (until 3.2.0-1, see Bug 992215), and likely others.
Kevinoid
  • 161
2

It appears the OP is specifically looking to clone an EXT2/3/4 filesystem. There is a tool that does exactly this, called e2image, which can be used like so:

e2image -rap <source> <dest>

This creates a raw filesystem image, written as a sparse file on filesystems that support sparse files; there are other options for other formats (e.g. QCOW2).

The problem with using dd or dd_rescue to sparsely copy filesystems is that they aren't that smart. If a filesystem has been in use for a long time, it's likely that many of its free blocks are not zero-filled. These get copied to the image file unnecessarily, so you could end up with an image the size of the filesystem even though df shows, say, 50% utilization. This can be worked around by using TRIM or by zero-filling free blocks before using these commands. Or you could just use e2image and not worry about it.
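
The effect of stale free blocks can be reproduced without a real filesystem. The following is a minimal sketch, assuming GNU coreutils; zeroed.img and stale.img are hypothetical scratch files that stand in for zero-filled free space and for free space still holding old data (random bytes here).

```shell
#!/bin/sh
# Sketch: only zero-filled regions turn into holes in a sparse copy.
# zeroed.img / stale.img are stand-ins for two kinds of "free space".
tmp=$(mktemp -d)

dd if=/dev/zero    of="$tmp/zeroed.img" bs=1M count=4 status=none
dd if=/dev/urandom of="$tmp/stale.img"  bs=1M count=4 status=none

# A block-level sparse copy cannot tell that either region is unused;
# it can only skip blocks that happen to contain nothing but zeros.
cp --sparse=always "$tmp/zeroed.img" "$tmp/zeroed.copy"
cp --sparse=always "$tmp/stale.img"  "$tmp/stale.copy"

# Allocated bytes = number of allocated blocks * size of one such block.
alloc() { echo $(( $(stat -c '%b' "$1") * $(stat -c '%B' "$1") )); }

zeroed_alloc=$(alloc "$tmp/zeroed.copy")
stale_alloc=$(alloc "$tmp/stale.copy")
copy_size=$(stat -c '%s' "$tmp/zeroed.copy")

echo "zero-filled region: $zeroed_alloc of $copy_size bytes allocated"
echo "stale region:       $stale_alloc of $copy_size bytes allocated"

rm -rf "$tmp"
```

The zero-filled region should take up almost no space in the copy, while the stale region is stored in full, which is why zero-filling or a filesystem-aware tool makes such a difference.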

crass
  • 338
  • 1
  • 7
0

Note: this doesn't work for the reasons described in the comments; I'm leaving it here for reference.

Monitor the statistics of dd by using kill -USR1:

$ cp --sparse=always <(dd if=/dev/urandom bs=8M) \
    /mount/external/backup/sda1.raw&
$ watch kill -USR1 `pidof -s /bin/dd`

Resume by using skip/seek:

$ i_bytes= # get from the last dd statistic
$ o_bytes=`du -b /mount/external/backup/sda1.raw | cut -f 1`   
$ cp --sparse=always <(dd if=/dev/urandom bs=8M skip=$i_bytes \
    seek=$o_bytes) /mount/external/backup/sda1.raw&
$ watch kill -USR1 `pidof -s /bin/dd`

Without $i_bytes it would be more difficult to resume. It's probably easiest to log the dd statistics to a file in case the machine crashes or whatever.

bhinesley
  • 598
0

There are xfsdump and xfsrestore for XFS, which has been a native Linux filesystem for quite a long time.