5

I accidentally removed the wrong image file in my /var/lib/libvirt/images directory. I'm not sure how to recreate one or to undo my removal. Any hints?

PolkaRon
  • Absent time travel, now might be a good time to look into backup options, though that can be tricky with big binary files that may have open filesystems within them. – thrig Jul 21 '16 at 21:18
  • Is the VM that was backed by that image still running? (If so, do not shut it down). – Stephen Harris Jul 21 '16 at 22:03
  • Yeah, I am not shutting it down. I want to be able to get it to export its image file while it is on – PolkaRon Jul 21 '16 at 22:07
  • /var is where you put stuff that should not be backed up. Therefore I assume that it can be regenerated, or is in the wrong place. – ctrl-alt-delor Jul 21 '16 at 22:58
  • Related: http://stackoverflow.com/questions/4171713/relinking-an-anonymous-unlinked-but-open-file. But that's talking about a file that's being appended only, not random-access. – Peter Cordes Jul 22 '16 at 02:40

2 Answers

16

Since you haven't shut down the VM, the process using that image file still has it open, so the file hasn't actually been deleted yet. As long as that process keeps running, you should be able to recover it.

For this answer I have a kvm image called testdelete. The VM is up, but I have deleted the file.

First you need to find the process using the file. The easiest way is with lsof.

# lsof | grep /var/lib/libvirt/images/testdelete.img
qemu-kvm  29627      qemu    9u      REG                9,0  2147483648     399357 /var/lib/libvirt/images/testdelete.img (deleted)

This tells me it's process 29627 and file descriptor 9. Let's look at this:

# cd /proc/29627/fd
# ls -l 9
lrwx------ 1 qemu qemu 64 Jul 21 18:13 9 -> /var/lib/libvirt/images/testdelete.img (deleted)
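If lsof isn't installed, the same lookup can be sketched with /proc alone, since every open descriptor shows up as a symlink there. This is only a rough sketch: list_deleted_fds is a made-up helper name, and it relies on Linux marking removed targets with a trailing " (deleted)", as in the ls output above.

```shell
# Hypothetical helper (not part of lsof or libvirt): print "FD TARGET" for
# every descriptor of a process whose target file has been deleted.
# Usage: list_deleted_fds PID
list_deleted_fds() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        # readlink fails for descriptors we are not allowed to inspect.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)') printf '%s %s\n' "${fd##*/}" "$target" ;;
        esac
    done
}
```

Run as root, `list_deleted_fds 29627` should report descriptor 9 with the same path that lsof showed.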

OK, good. That matches. Now let's recover it! You need a disk with enough free space to hold the whole image.
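Before copying, it may be worth checking that the destination really has room. The following is a minimal sketch, assuming GNU stat and a POSIX df; enough_space is a hypothetical helper name, and it compares the apparent size of the open file with the free space reported for the target directory.

```shell
# Hypothetical helper: succeed if DESTDIR has room for a full copy of SRC.
# Usage: enough_space SRC DESTDIR
enough_space() {
    # -L follows the /proc/PID/fd/N link to the (deleted) file itself.
    size=$(stat -L -c %s "$1") || return 1
    need_kb=$(( (size + 1023) / 1024 ))
    # df -P guarantees one line per filesystem; column 4 is available KiB.
    avail_kb=$(df -Pk "$2" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge "$need_kb" ]
}
```

With the values from this answer that would be `enough_space /proc/29627/fd/9 /home/sweh && echo "enough room"`.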

Ideally your VM should be as quiescent as possible; because we're copying the raw disk image we do run a risk of corruption if some processes are writing to the disk. We can try to minimise this risk by sending a STOP signal.

# kill -STOP 29627

This effectively "freezes" the process. The backup we're now taking would be the equivalent of what happens after a hard crash; on reboot the OS will fsck (or equivalent) to recover.
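If you would rather let the guest continue afterwards (for example for the "parallel run" testing mentioned later), the frozen process can be woken again with SIGCONT. A small sketch of wrapping a command between the two signals follows; with_paused is an invented name, not a libvirt or qemu tool.

```shell
# Hypothetical wrapper: freeze a process, run a command, then resume it.
# Usage: with_paused PID COMMAND [ARGS...]
with_paused() {
    pid=$1
    shift
    kill -STOP "$pid" || return 1
    "$@"
    status=$?
    # Always resume, even if the command failed.
    kill -CONT "$pid"
    return "$status"
}
```

For instance, `with_paused 29627 dd if=/proc/29627/fd/9 of=/home/sweh/recovered.img bs=1M` would take the copy while the guest is frozen and let it run again afterwards.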

Now we can copy the data:

# dd if=9 of=/home/sweh/recovered.img bs=1M
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 5.74931 s, 374 MB/s

That looks perfect; the disk image was 2 GiB (2147483648 bytes) and that's what it copied.
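The copy and the size comparison can also be scripted. The sketch below assumes GNU dd and GNU stat; recover_fd is a hypothetical name, and conv=sparse (suggested in the comments) is optional: it just avoids writing out blocks that are entirely zero. The size check only makes sense while the source process is paused.

```shell
# Hypothetical helper: copy a deleted-but-open file out of /proc and
# confirm that the copy has the same size as the original.
# Usage: recover_fd PID FD DEST
recover_fd() {
    src=/proc/$1/fd/$2
    dest=$3
    # conv=sparse (GNU dd) seeks over all-zero blocks instead of writing them.
    dd if="$src" of="$dest" bs=1M conv=sparse || return 1
    src_size=$(stat -L -c %s "$src")
    dest_size=$(stat -c %s "$dest")
    [ "$src_size" -eq "$dest_size" ]
}
```

Here that would be `recover_fd 29627 9 /home/sweh/recovered.img`.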

Does this image look good?

# cd /home/sweh
# sfdisk -l recovered.img 
Disk recovered.img: cannot get geometry

Disk recovered.img: 261 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
recovered.img1          0+     65-     66-    524288   82  Linux swap / Solaris
recovered.img2   *     65+    261-    196-   1571840   83  Linux
recovered.img3          0       -       0          0    0  Empty
recovered.img4          0       -       0          0    0  Empty

Yup, that looks like my partition table. At this point you can do other tests to verify the image looks good.
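One cheap extra test, assuming the image carries an MBR-style partition table like the one above, is to look for the 0x55 0xAA boot signature in the last two bytes of the first sector. This is only a sanity check sketched with a made-up helper name (has_mbr_signature); it says nothing about the filesystems inside the image.

```shell
# Hypothetical helper: succeed if bytes 510-511 of FILE are 0x55 0xAA.
# Usage: has_mbr_signature FILE
has_mbr_signature() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}
```

Usage: `has_mbr_signature recovered.img && echo "boot signature present"`.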

And that's it! You have recovered your image file.

NOTE: In this example I'm going to kill the existing qemu process. That step is irrevocable: once the process exits, the last open reference to the deleted file is closed and its disk space is freed. If you want to do some "parallel run" testing then you can create a new image file and virsh define a new VM to use that.

Let's get the VM restarted with this. Destroy the old VM, copy the datafile into place and restart it.

# virsh destroy testdelete
# cp -v recovered.img /var/lib/libvirt/images/testdelete.img
`recovered.img' -> `/var/lib/libvirt/images/testdelete.img'
# virsh start testdelete
Domain testdelete started

Can we connect to the console?

# virsh console testdelete
Connected to domain testdelete
Escape character is ^]

CentOS release 6.8 (Final)
Kernel 2.6.32-642.3.1.el6.x86_64 on an x86_64

dhcp226.spuddy.org login: 

Recovery complete :-)

  • You might want to include kill -STOP in the process so that the recovered image is from a paused guest rather than a running one. (The filesystem will have less chance of corruption through change while being copied). – Chris Davies Jul 21 '16 at 22:44
  • Good point; I meant to add a step about quiescence, but didn't think of using SIGSTOP. That's nice. I've updated the answer with that hint. Thanks! – Stephen Harris Jul 21 '16 at 22:52
  • Well, a data saver anyway ;-) Glad to help! – Stephen Harris Jul 21 '16 at 23:09
  • It should be possible, in theory, to create a hard link on the original filesystem. Then no additional storage is needed, and there are no worries about corruption. (Not sure in practice.) – ctrl-alt-delor Jul 21 '16 at 23:10
  • You can't hard link to /proc because it's a different filesystem. I wondered about trying to create a new link to the inode (399357 from the lsof output). I don't think there's a normal way of doing this. It might be doable via debugfs but that's not something I've really played with and is very filesystem specific (xfs and ext4 may not work the same). – Stephen Harris Jul 21 '16 at 23:16
  • @StephenHarris: Last I looked, it seems to be intentional that you can't link open file descriptors back into the filesystem. linkat(2) will let you link a tmp file into the filesystem if it never had any links (i.e. opened with open("/some/dir", O_TMPFILE|..., 0666)), so it's possible but denied on purpose for security reasons. Interesting idea with debugfs. You might be able to use it on an ext4 without replaying the journal... – Peter Cordes Jul 22 '16 at 02:35
  • Also BTW, you could use dd conv=sparse to save disk space for the output. Or use cp --sparse=always. You could run fstrim inside the VM to issue discards for unused blocks of the disk image (which may result in the file having holes punched in it, depending on the VM host etc). This will make unused parts of it read as zero. – Peter Cordes Jul 22 '16 at 02:39
  • +1, very nice answer. On the VM, I'd stop any services that might write to files (especially binary files, e.g. mysql) and then run sync. I'd also try to make a tar.gz or rsync backup of the filesystems to another machine (the host if nothing else is available). On the host, I'd use virsh suspend rather than kill -STOP (and virsh resume to restart it). – cas Jul 22 '16 at 17:41
  • Yup, virsh suspend may work as well. I just tested; it leaves the image file open so can be used on my machine, but I'm not sure that'll always be true in every version. At least kill -STOP should always work regardless of libvirt version. – Stephen Harris Jul 22 '16 at 18:04
  • virsh suspend will always leave the VM running but suspended. It probably uses SIGSTOP to do it. The advantage is that you don't have to look up the PID yourself and there may be other stuff that needs to be done to safely suspend a VM that the signal alone won't do (I'd have to look at the libvirt source to be sure, and it's too late for that right now). – cas Jul 23 '16 at 15:37
-1

libvirt images are located by default at /var/lib/libvirt/images

You will need to find a way to restore the file using undelete software. This post may help: Unix/Linux undelete/recover deleted files

  • Welcome to U&L. Part of the attraction of this site is that we try to provide useful solutions rather than just "maybe this might work" answers. – Chris Davies Jul 21 '16 at 22:46