93

I like to create an image backup the first time I back up a system. After that first backup I use rsync to do incremental backups.

My usual image backup is as follows:

  • Mount and zero out the empty space:

    dd if=/dev/zero of=temp.dd bs=1M
    rm temp.dd
    
  • Unmount and dd the drive while compressing it:

    dd if=/dev/hda conv=sync,noerror bs=64K | gzip -c  > /mnt/sda1/hda.ddimg.gz
    
  • To put the system back to normal, I will usually do a

    gzip -dc /mnt/sda1/hda.ddimg.gz | dd of=/dev/hda conv=sync,noerror bs=64K
    

This is really straightforward and lets me save the 'whole drive' while effectively only storing the used space, since the zero-filled free space compresses to almost nothing.

Here is the problem. Let's say I do the above, but not on a clean system, and don't get the rsync backups going soon enough, and there are files on the image that I want to access. Let's also say I don't have the storage space to actually unzip and dd the image to a drive, but I want to mount the image to get individual files off of it. Is this possible?

Normally, one wouldn't compress the dd image, which would allow you to just mount it using -o loop... but this isn't my case...
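
For reference, with an uncompressed partition image, that loop mount would look something like this (a sketch; the image name is a placeholder):

    sudo mount -o loop,ro hda1.ddimg /mnt/image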

Any suggestions for mounting the compressed img on the fly?

Would using AVFS to 'mount' the gz file, then mounting the internal dd.img, work? (I don't think so, but it would need verification...)

AdminBee
  • 22,803
g19fanatic
  • 1,035
  • You should use SquashFS for this kind of thing. It also de-dupes duplicated files. – Avio Oct 24 '12 at 08:56
  • 1
    It looks like this fellow is doing what you are asking about: http://blogs.gnome.org/muelli/2012/10/loopback-monting-huge-gzipped-file/ – Joshua Jan 02 '13 at 20:47
  • I second Avio's suggestion. The only thing squashfs doesn't archive is acls. It archives xattrs, so selinux attributes, etc. If you don't use acls, then squashfs is the way to go IMHO. I've recently had to archive "just in case" some old drives that have already been migrated to new storage, and squashfs was perfect for the job. – Kuba hasn't forgotten Monica Feb 24 '16 at 19:41
  • Also, this is possible without restrictions for .vhd(x) images. These are quite common in the Windows world and the full solution for mounting can be found here: https://www.how2shout.com/linux/mount-virtual-hard-disk-vhd-file-ubuntu-linux/?unapproved=516&moderation-hash=6ecfb4ca6f97ff53d32570f3ba2e28f0#comment-516 – Cadoiz Jan 20 '21 at 00:38
  • You can also consider this short answer: https://askubuntu.com/a/252719/830570 and this Q/A: https://superuser.com/a/1097391/910769 – Cadoiz Jan 20 '21 at 01:02
  • conv=sync,noerror is a really bad combination. Any read error will pad the remainder of the block with zeros, and then dd will attempt to reread from where it left off. You've now got a partial block of zeros somewhere in your data in addition to the data you've successfully read. Just use cat – Chris Davies Jul 04 '23 at 14:58
  • Regarding the OP's original command to decompress to stdout, I think "gunzip -c" and "gzip -dc" operate similarly (but are technically two different programs). I wasn't paying close enough attention and ran "gzip -c" on the decompression step, which is clearly wrong. It writes a further compressed output to the drive. It wouldn't hurt for a naïve individual to decompress even using "gunzip -dc" with the -d just being redundant, but the other way is a problem. For safety reasons, it might be reasonable to just always add the -d parameter, regardless of gzip or gunzip. – mmortal03 Jul 05 '23 at 01:38

7 Answers

94

It depends on whether the disk image is a full disk image, or just a partition.

Washing the partition(s)

If the disk is in good working condition, you will get better compression if you wash the empty space on the disk with zeros. If the disk is failing, skip this step.

If you're imaging an entire disk then you will want to wash each of the partitions on the disk.

CAUTION: Be careful: you want to set of= to a file inside the mounted partition, NOT THE PARTITION ITSELF!

mkdir image_source
sudo mount /dev/sda1 image_source
dd if=/dev/zero of=image_source/wash.tmp bs=4M
rm image_source/wash.tmp
sudo umount image_source

Making a Partition Image

mkdir image
sudo dd if=/dev/sda1 of=image/sda1_backup.img bs=4M

Where sda is the name of the device, and 1 is the partition number. Adjust accordingly for your system if you want to image a different device or partition.
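
If you are unsure of the device and partition names, lsblk can list them (a quick check only; output varies per system):

lsblk -o NAME,SIZE,TYPE,MOUNTPOINT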

Making a Whole Disk Image

mkdir image
sudo dd if=/dev/sda of=image/sda_backup.img bs=4M

Where sda is the name of the device. Adjust accordingly for your system if you want to image a different device.

Compression

Make a "squashfs" image that contains the full uncompressed image.

sudo apt-get install squashfs-tools
mksquashfs image squash.img
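
By default mksquashfs compresses with gzip (zlib); as a commenter notes below, the -comp option selects another compressor, e.g. xz for a better ratio at some speed cost:

mksquashfs image squash.img -comp xz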

Streaming Compression

To avoid making a separate temporary file the full size of the disk, you can stream into a squashfs image.

mkdir empty-dir
mksquashfs empty-dir squash.img -p 'sda_backup.img f 444 root root dd if=/dev/sda bs=4M'
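
The quoted pseudo-file definition has the form name f mode uid gid command: mksquashfs runs command and stores its standard output as the file name inside the image. To stream a single partition instead (producing the sda1_backup.img used in the mounting steps below), the same pattern applies:

mkdir empty-dir
mksquashfs empty-dir squash.img -p 'sda1_backup.img f 444 root root dd if=/dev/sda1 bs=4M'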

Mounting a compressed partition image

  • First mount the squashfs image, then mount the partition image stored in the mounted squashfs image.
    mkdir squash_mount
    sudo mount squash.img squash_mount
    
  • Now you have the compressed image mounted, mount the image itself (that is inside the squashfs image)
    mkdir compressed_image
    sudo mount squash_mount/sda1_backup.img compressed_image
    
  • Now your image is mounted under compressed_image.

EDIT: If you wanted to simply restore the disk image onto a partition at this point (instead of mounting it to browse/read the contents), just dd the image at squash_mount/sda1_backup.img onto the destination instead of doing mount.
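
For example, a sketch of such a restore (this overwrites the destination, so double-check the device name; status=progress needs a reasonably recent GNU dd):

sudo dd if=squash_mount/sda1_backup.img of=/dev/sda1 bs=4M status=progress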

Mounting a compressed full disk image

This requires a package called kpartx. kpartx creates device mappings for the individual partitions inside a full disk image so that they can be mounted.

sudo apt-get install kpartx
  • First, mount your squashed partition that contains the full disk image

    mkdir compressed_image
    sudo mount squash.img compressed_image
    
  • Now you need to create devices for each of the partitions in the full disk image:

    sudo kpartx -a compressed_image/sda_backup.img
    

    This will create devices for the partitions in the full disk image at /dev/mapper/loopNpP where N is the number assigned for the loopback device, and P is the partition number, e.g. /dev/mapper/loop0p1. You can find this number N in the output of losetup --list. The most recently created loopback device should have the largest N number.

  • Now you have a way to mount the individual partitions in the full disk image:

    mkdir fulldisk_part1
    sudo mount /dev/mapper/loop0p1 fulldisk_part1
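
  • When finished, tear everything down in reverse (a sketch, assuming kpartx attached the image on loop0):

    sudo umount fulldisk_part1
    sudo kpartx -d compressed_image/sda_backup.img
    sudo umount compressed_image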
    
AdminBee
  • 22,803
doug65536
  • 1,054
  • interesting take on this problem (squashfs instead of gzip). I am pretty unfamiliar with the squashfs tools... can you pipe the output of dd to create a squash partition on the fly as you can with the gzip partition? What are the compression ratios (gzip is okay/good, especially given the fact that I am clearing 'empty space' with zeros)? – g19fanatic May 13 '13 at 03:12
  • how would you dd the image back to the hard disk? – g19fanatic May 13 '13 at 03:15
  • 2
    @g19fanatic The uncompressed disk image is "inside" the squashfs image. You mount the squashfs image, then dd the image inside it to the destination disk. – doug65536 May 13 '13 at 05:27
  • @g19fanatic The compression was excellent (very nearly the same as gzip in my case). mksquashfs was fast too, it is parallelized. On my 990x (6 core) it was actually limited by the destination disk write speed, around 100MB/sec. – doug65536 May 13 '13 at 05:29
  • @g19 The point of the extra effort of wrapping the uncompressed image in a squashfs is being able to mount, browse, and read from the image without unzipping it and without having to have the space to unzip it. – doug65536 May 13 '13 at 05:36
  • @g19fanatic I don't know a way to "stream" into a squashfs. squashfs tries to eliminate duplicate files and optimizes the directories etc (unnecessary when it holds a single disk image file btw), which it can't do on the fly. Perhaps it is possible to stream into a squashfs, I don't know. – doug65536 May 13 '13 at 05:39
  • 3
    @g19fanatic You can stream into squashfs using the -p or -pf flags to pass it a pseudo-file. A pseudo file can be used for things like making device nodes which you can't otherwise do without root (useful for building images as part of a build process) or for streaming the output of some command into the image. One of the examples given in the docs (/usr/share/doc/squashfs-tools/examples/pseudo-file.example on Debian/Ubuntu) is input f 444 root root dd if=/dev/sda1 bs=1024 count=10 to copy the first 10K from a disk image into a file named "input" in the squashfs image. – Brian Campbell Jul 22 '14 at 03:45
  • I use dd if=/dev/sdX bs=4M iflag=direct | pigz --blocksize $[4*1024] -9 -p 4 > BACKUP.gz to back up any disk/partition. I have replaced gzip with pigz (a parallel fork of gzip) so that compression uses all cores of my CPU. – andras.tim Oct 31 '14 at 12:18
  • Usually mksquashfs uses zlib compression (that is the one gzip uses). But you can change that using the -comp option. Compressors available: gzip (default), lzma (no kernel support), lzo, lz4 and xz. – erik Nov 18 '15 at 10:23
  • Of course you can also send the image over ssh, such as | ssh <USER@HOST> 'cat - > filename.img', and compress it on the fly with gzip -c, but that is another story. – AdamKalisz Aug 16 '17 at 09:11
  • Does washing still have an advantage with encrypted images? The blocks aren't zero but they should all be the same right? – jiggunjer Aug 22 '17 at 07:37
  • @jiggunjer I doubt it would compress. I would expect the encryption to vary for every block. I would also expect the same block to be completely different if you moved it to another cluster, but I am not sure. – doug65536 Aug 22 '17 at 09:25
  • if you're using an SSD, I would advise being sure you can TRIM the drive if you fill it with zeroes like this, to mark all blocks unused. Don't delete the wash file until after DDing, so that you can TRIM it afterwards, because reading unused blocks will cause the SSD to mark the block as all zeroes and in use. – Paul M Sep 14 '19 at 12:40
  • 1
    A good explanation of the mksquashfs command arguments can be found here: https://askubuntu.com/questions/836217/how-to-mount-a-compressed-disk-image – HackerBoss Mar 19 '20 at 15:34
  • An alternative to zero-filling the rest of the partition is to shrink the partition to the smallest size a partition resizing tool will let you shrink it to, then image it (you can resize the partition back up to its previous size after). Zero-filling an entire drive has the potential to prevent some SSD optimizations, plus it requires a lot of writing. – trr Jul 31 '20 at 04:18
  • 1
    mksquashfs procedure worked great for streaming compression for a partition. For unknown reason (got a permission denied error), I needed to specify read-only to mount the inner partition. I.e., using the example above, I needed to use sudo mount -r squash_mount/sda1_backup.img compressed_image. – rickhg12hs Oct 01 '20 at 11:47
  • @HackerBoss Thanks for that link! Was looking for a simple breakdown of the arguments and their meaning for the -p flag. – g19fanatic Jan 20 '21 at 14:11
  • This answer is very interesting, but it would be a lot better if it had an explanation as to why writing an image into a squashfs image is a good idea. – CervEd Apr 17 '21 at 14:31
  • cat could also be considered instead of dd - check this – Cadoiz Jun 15 '21 at 21:08
37

Try archivemount

root@srv1:/backup# archivemount windows-2003-S.gz /target/
Unrecognized archive format

root@srv1:/backup# archivemount -o formatraw windows-2003-S.gz /target/
Calculating uncompressed file size. Please wait.

root@srv1:/backup# ls /target/
data

root@srv1:/backup# file /target/data
/target/data: DOS/MBR boot sector; partition 1 : ID=0x7, start-CHS (0x0,1,1), end-CHS (0x3ff,254,63), startsector 63, 58717512 sectors, extended partition table (last)

archivemount is a FUSE-based file system for Unix variants, including Linux. Its purpose is to mount archives (i.e. tar, tar.gz, etc.) to a mount point where it can be read from or written to as with any other file system. This makes accessing the contents of the archive, which may be compressed, transparent to other programs, without decompressing them.

http://linuxaria.com/howto/how-to-mounts-an-archive-for-access-as-a-file-system

After mounting the archive, you can use its contents like a regular file: read the partition table, for example, or convert or mount the image with the qemu tools.
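
If the exposed file is a raw disk image, a second step can attach it read-only through a loop device; a sketch (-r is read-only, -P scans the partition table; as a commenter notes below, this may fail depending on how archivemount exposes the file):

losetup -rPf --show /target/data
mount -o ro /dev/loop0p1 /mnt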

squashfs is useful for booting from an image, but it is much more complex for backups.

CervEd
  • 174
eri
  • 905
  • 1
    Perfect! This is the easiest and most elegant solution so far. I wonder why there are no votes here. – Neurotransmitter Apr 28 '15 at 20:54
  • I think it is because if you mount an archive like disk.img.gz on a folder with archivemount, say /mnt/, you would get a single file /mnt/disk.img, that you then have to mount elsewhere. The question instead wants something able to unwrap both in a single step (and archivemount seems capable to do that on .tar.gz, but not on gzipped raw images). – p91paul May 21 '15 at 15:04
  • @p91paul The squashfs method is the same as packing into a tar.gz in this case. You can mount a squashfs, but you can also mount a tar.gz, skipping the ten steps of creating an image and needing only one or two steps to see the contents. – eri May 21 '15 at 21:50
  • 1
    Of course, but still the original question doesn't talk about a tar.gz. My point was to tell @TranslucentCloud this answer has no votes because it doesn't actually answer the question, it tells "you should have created a tar.gz instead" – p91paul May 22 '15 at 08:41
  • @p91paul fair enough. But still, eri's answer sheds light on an interesting tool. – Neurotransmitter May 22 '15 at 10:20
  • This answer is not for votes. It shows another point of view on the actions in the question. – eri May 22 '15 at 22:42
  • Right, but do not forget that the Stack Exchange network is designed not merely to help the original question posters, but also strangers who stumble upon a similar problem. If they have a similar but slightly different problem, and the solution is worth considering, they can vote up the corresponding answer. – Neurotransmitter May 24 '15 at 14:55
  • I had similar problem and found this page. – eri May 24 '15 at 21:31
  • 2
    This answer is very interesting too. I believe squashfs gets more love because it has more awareness. I instantly recognized the name but have never heard of archivemount. I will have to give it a shot too! – g19fanatic Nov 16 '15 at 13:37
  • 6
    archivemount does not allow you to mount an image created by the command dd if=/dev/hda conv=sync,noerror bs=64K | gzip -c > /mnt/sda1/hda.ddimg.gz – Serg Kryvonos Nov 06 '16 at 15:58
  • 1
    name@hozd ~ $ archivemount /tmp/test.ddimg.gz /tmp/t Unrecognized archive format name@hozd ~ $ archivemount --version archivemount version 0.8.5 FUSE library version: 2.9.4 fusermount version: 2.9.4 using FUSE kernel interface version 7.19 – Serg Kryvonos Nov 06 '16 at 17:46
  • 3
    Agreed - at time of writing, archivemount supports tar archives that are gzipped, but not plain gzipped files. – mwfearnley Apr 22 '17 at 12:34
  • 1
    @Sergei, @mwfearnley, @g19fanatic - I added an example with a dd ... | gzip > raw.gz image that works – eri Sep 12 '17 at 07:54
  • This should be the accepted answer. – mckenzm May 29 '21 at 05:06
  • @eri it does not work; it mounts a data file which can't be mounted as a loop device. This should not be the accepted answer! – zb' Oct 18 '23 at 03:09
21

If the image is read-only you can also use nbdkit (man page) and its xz filter (xz should provide better compression and random access times than gzip). If you need temporarily write access, the cow (Copy On Write) filter might be useful.

Create the compressed partition image

dd if=/dev/sda1 bs=16M | xz -9 --block-size=16MiB > sda1.img.xz

A --block-size option of 16 MiB should provide good random access performance.

Note: you may use alternative xz compression programs such as pixz, which provides parallel compression; just make sure it splits the output into multiple small blocks, otherwise nbdkit has to decompress a lot of data. For example, as of September 2015, pxz does not support this.
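
You can verify that the output really was split into multiple blocks with xz's list mode (a single-block file would defeat random access):

xz --list --verbose sda1.img.xz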

Serve it with nbdkit

nbdkit --no-fork --user nobody --group nobody -i 127.0.0.1 \
       --filter xz file sda1.img.xz

Connect to the NBD server

nbd-client 127.0.0.1 10809 /dev/nbd0 -nofork
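
If /dev/nbd0 does not exist, the nbd kernel module is probably not loaded yet; loading it first (typically as root) should create the device nodes:

modprobe nbd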

Mount it read-only

mount -o ro /dev/nbd0 sda1

When done

umount /dev/nbd0
nbd-client -d /dev/nbd0

Stop the nbdkit server by pressing Ctrl+C (or with kill).

  • See also for 'Viewing a gz / xz Compressed dd image "on-the-fly"': https://askubuntu.com/questions/836217/how-to-mount-a-compressed-disk-image/859410#859410 – user1742529 Dec 25 '19 at 13:55
  • See also for "Mount zip file as a read-only filesystem: fuse-zip, archivemount, fusermount": https://unix.stackexchange.com/questions/168807/mount-zip-file-as-a-read-only-filesystem – user1742529 Dec 25 '19 at 13:58
  • Also you can see "Archival filesystem or format: virt-sparsify, guestmount, qcow2 gzip, ndbkit xz plugin, read-only mksquashfs": https://stackoverflow.com/questions/6147303/archival-filesystem-or-format – user1742529 Dec 25 '19 at 14:11
  • 1
    Addendum 1 guestfish for mounting: https://unix.stackexchange.com/a/138367/318461 - Addendum 2 nbdkit: "specify the block size": https://unix.stackexchange.com/a/405820/318461 – Cadoiz Jan 20 '21 at 01:08
12

This answer complements Cristian Ciupitu's answer. If you use xz compression with a reasonable block size, you can access the disk image using guestfish or other libguestfs tools like this:

nbdkit xz file=disk.img.xz --run 'guestfish --format=raw -a $nbd -i'

UPDATE: Since xz is not a plugin anymore, but has become a filter, the command is now:

nbdkit file disk.img.xz --filter xz --run 'guestfish --format=raw -a $nbd -i'
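
Inside the interactive guestfish session you can then browse the filesystem and copy individual files out to the host; a short sketch (the paths are placeholders):

><fs> ll /
><fs> copy-out /home/user/some-file /tmp
><fs> exit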
Rich
  • 313
  • 1
    The original answer is better: https://unix.stackexchange.com/questions/31669/is-it-possible-to-mount-a-gzip-compressed-dd-image-on-the-fly/138081#138081 – user1742529 Dec 25 '19 at 14:17
  • Guestfish is nice and can also be used for .vhd(x) files without restrictions, they are pretty common in the Windows world. For the full solution on mounting look here: https://www.how2shout.com/linux/mount-virtual-hard-disk-vhd-file-ubuntu-linux/?unapproved=516&moderation-hash=6ecfb4ca6f97ff53d32570f3ba2e28f0#comment-516 (I expect this to also help here as guestfish is discussed in detail) – Cadoiz Jan 20 '21 at 01:05
11

Not really. You can't seek to a specific block in a gzip-compressed file without first decompressing everything before it, which makes it difficult to use the compressed image as a block device.

You could use something like dump and restore (or tar, really), all of which use a streaming format... so you can access individual files by effectively scanning through the uncompressed stream. It means that if the file you want is at the end of the compressed archive you may have a long wait, but it doesn't require you to actually decompress everything onto disk.

Using tar for backups may seem a bit old fashioned, but you get a lot of flexibility.
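
For completeness, a minimal sketch of that approach (names are placeholders): create a compressed tar of the mounted filesystem, then later list or extract single files by streaming through the archive, without unpacking all of it:

tar -czf /mnt/backup/root.tar.gz -C /mnt/source .    # create
tar -tzf /mnt/backup/root.tar.gz                     # list contents
tar -xzf /mnt/backup/root.tar.gz ./path/to/file      # extract one file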

larsks
  • 34,737
  • 1
    The problem lies in the fact that I do not even know if the file of interest is actually on this compressed backup... Do you know of a file explorer that will go through the whole .gz'd image, keep the file/dir structure in memory, provide a simple view of the structure and allow you to 'pick' files (now that it knows where they exist) to extract? It's a very niche specification... but I could see tons of uses for something like this... if it exists. – g19fanatic Feb 15 '12 at 12:09
  • 1
    If it doesn't, would you be able to point me towards some instruction on how to pull the structure from the gz'd image? I would be able to create such a program (I program for a living...) but am blind on the topic of decompressing image data and the specifics of different filesystems. – g19fanatic Feb 15 '12 at 12:11
  • I suspect that building your own tool is going to be a larger project than you really want to undertake. However...assuming that you have an ext[234] filesystem, I would suggest the e2fsprogs package, or maybe something like fuse-ext2. Both provide user-space tools for interacting with ext[234] filesystems. – larsks Feb 15 '12 at 13:59
  • Also note that what you have doesn't appear to be a filesystem image, it's a whole disk image, which means you'll first have to parse out the partition table and locate the appropriate partition. – larsks Feb 15 '12 at 14:00
  • I mistyped in the above question and will fix it. I usually do a partition based dd image and save a copy of the partition table. I used to do whole disk copies but hated needing to mount with options to get to the proper location. – g19fanatic Feb 15 '12 at 14:51
  • Looking through some of the documentation for tar, e2fsprogs and fuse-ext2, I'm not seeing how I can take the gz img and use these programs to even search for one file in an on-the-fly manner. – g19fanatic Feb 15 '12 at 14:56
  • You can't. You were suggesting you were going to write your own tool. These projects provide examples of code that interacts directly with ext[234] filesystems. I believe your only practical solution is to unpack the image onto disk or onto a tmpfs filesystem if you have the available memory. tar would only help if you had a compressed tar archive instead of a filesystem image -- I had suggested that as a more flexible solution than using filesystem images. – larsks Feb 15 '12 at 15:09
3

Another addendum to Cristian Ciupitu's answer:

If you use nbdkit to mount a full disk image (vs. a partition image), you might need to specify the block size (sector size of the disk) when connecting to the NBD server, as it defaults to 1024 bytes. To use 512 bytes instead:

nbd-client 127.0.0.1 /dev/nbd0 -b 512 -n

After that, the disk will appear as /dev/nbd0, and you should be able to view the partition table using fdisk -l. However, the partitions are not yet mountable. Use kpartx (from doug65536's answer) to create devices for the partitions, e.g.:

kpartx -arv /dev/nbd0

Finally, the partitions will appear in /dev/mapper/, and you can mount them as usual. Make sure to use read-only mode (-o ro), as the xz plugin only supports reads:

mount -o ro /dev/mapper/nbd0p3 /mnt
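
When done, tear things down in reverse order (a sketch, matching the device names above):

umount /mnt
kpartx -d /dev/nbd0
nbd-client -d /dev/nbd0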
0

Yes, AVFS would work.

First you create a "portal" over your filesystem where all operations performed on your files are intercepted through the AVFS FUSE system:

~ $ mkdir avfs                # the AVFS mount-point
~ $ avfsd -o allow_root avfs  # Needs `user_allow_other` option in /etc/fuse.conf
~ $ cd avfs
~/avfs $ ls  # your root filesystem listed

Then attach a loop device (scanning the partition table) to the transparently decompressed disk image located inside the ~/avfs mount-point:

ℹ️ Note: the # character suffix is not a comment but part of the AVFS-syntax that uncompresses the file.

~/avfs $ sudo losetup -Pf --show ./absolute/path/to/full-disk.img.gz#
/dev/loop0

The command above selected and printed the first unused loop-device.

The kernel has now created additional loop-devices for all the partitions contained in the full-disk.img.gz. You can mount them the usual way:

$ sudo mount /dev/loop0p1 /some/mount/point/
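
Cleanup afterwards is the reverse; a sketch, assuming loop0 from above:

$ sudo umount /some/mount/point/
$ sudo losetup -d /dev/loop0
$ fusermount -u ~/avfs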
ankostis
  • 533