I hope that this is not a duplicate question. I have seen several similar questions, where the answer was to blacklist the respective device or partition. But in my case, I can't do that (see below). Having said this:
On a Debian buster x64 host, I have created a VM (based on QEMU). The VM runs on a block device partition, let's say /dev/sdc1. I have installed the Debian system on that partition basically like this (some steps omitted):
#> mkfs.ext4 -j /dev/sdc1
#> mount /dev/sdc1 /mnt/target
#> debootstrap ... bullseye /mnt/target
Then I bind-mounted the necessary directories (/dev, /sys etc.), chrooted into /mnt/target, completed the guest OS installation and booted the VM.
The VM first started without issues. But with every reboot, the VM developed more problems, which I repaired at the GRUB and initramfs prompts, until repair was no longer possible because the ext4 file system had evidently been damaged.
Because I originally thought that I had done something wrong, e.g. forgot to unmount the ext4 partition before starting the VM, I repeated the whole installation from scratch multiple times. The result was the same in every case: After a few restarts, the ext4 file system was so damaged that I couldn't repair it.
By accident, I have found the reason for this, but I have no idea how to solve the problem. I noticed that e2fsck refused to operate on that partition, claiming that it was in use although it was not mounted and the VM was not running. Further investigation showed that there existed a kernel thread jbd2/sdc.
That means that the host kernel accesses the journal on that partition / file system. When I start the VM, the guest kernel of course does the same. I am nearly sure that the corruption of the file system is due to both kernels accessing the file system, notably the journal, at the same time.
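One way to look for such a leftover journal thread on the host is to filter the process list for jbd2 kernel threads that belong to the disk. The following is only a minimal sketch: the function name jbd2_threads_for is made up for illustration, and it assumes that ps reports these kernel threads with names beginning with jbd2/ followed by the device name, as in the thread name mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: read a "PID COMMAND" listing from stdin and print the
# entries whose command name starts with "jbd2/<device prefix>".
jbd2_threads_for() {
    awk -v dev="$1" '
        BEGIN { prefix = "jbd2/" dev }
        index($2, prefix) == 1 { print $1, $2 }
    '
}

# Usage on the host (file system unmounted, VM stopped):
#   ps -e -o pid=,comm= | jbd2_threads_for sdc
# Any output means a journal thread for that disk still exists.
```

The function only filters text, so it can be tried on a saved process listing without touching the disk.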
How can I solve the problem?
I cannot blacklist the respective disk or the respective partition on the host, because I need to mount them there to prepare or complete the guest OS installation in a chroot. On the other hand, it doesn't seem possible to tell the host kernel to release the journal as soon as the VM starts.
I have installed a lot of VMs in the past years exactly the same way, but did not turn on the journal when creating their ext4 file system. Consequently, I didn't have that issue with those VMs.
Edit 1
In case it is relevant, when mounting the partition and chrooting into it to complete the guest OS installation, I use the following commands:
cd /mnt
mkdir target
mount /dev/sdc1 target
mount --rbind /dev target/dev
mount --make-rslave target/dev
mount --rbind /proc target/proc
mount --make-rslave target/proc
mount --rbind /sys target/sys
mount --make-rslave target/sys
LANG=C.UTF-8 chroot target /bin/bash --login
When unmounting, I just do
umount -R target
The umount command does not report any error.
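Because a silent umount does not by itself prove that nothing is left mounted, the mount table can be checked explicitly. A minimal sketch, assuming the Linux /proc/self/mountinfo layout with the mount point in the fifth field; the function name still_mounted_under and its optional second argument are hypothetical and exist only so the check can be tried on a saved copy of the table. Mount points containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given directory, or anything
# below it, still appears as a mount point in a mountinfo-style table.
# $1 = absolute path without trailing slash
# $2 = table file (defaults to the live /proc/self/mountinfo)
still_mounted_under() {
    awk -v t="$1" '
        $5 == t || index($5, t "/") == 1 { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/self/mountinfo}"
}

# Usage after "umount -R target":
#   if still_mounted_under /mnt/target; then
#       echo "something is still mounted below /mnt/target"
#   fi
```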
[...] -o norecovery to mount ? – steve Jun 22 '22 at 06:49
[...] e2fsck on that partition after having it unmounted. I even can't create a new ext4 file system on that partition after having it unmounted! – Binarus Jun 22 '22 at 06:55
[...] tune2fs -O ^has_journal /dev/sdXY – steve Jun 22 '22 at 07:00
[...] e2fsck -p /dev/sdc1, where /dev/sdc1 is the guest's partition, on the host when it is not mounted and not used by QEMU. – Vilinkameni Jun 22 '22 at 07:05
[...] umount did not give any errors. Maybe the problem is due to the rbinds? I'll update my question to show them exactly. – Binarus Jun 22 '22 at 07:06
[...] e2fsck on that partition because e2fsck claims that it is in use after unmounting it. That was what put me on the right track regarding the cause of the problem ... – Binarus Jun 22 '22 at 07:09
[...] jbd2/sdc even after the file system has been dismounted; I was using lsof to find that out. This means that the host kernel accesses the journal of that ext4 file system even after it has been unmounted. – Binarus Jun 22 '22 at 07:18
[...] --rbind and --make-rslave and double calls of mount for each mountpoint are necessary instead of single mount calls per mountpoint with a simple -B? That might be causing the issue. – Vilinkameni Jun 22 '22 at 09:05
[...] mount as well as /proc/self/mountinfo as well as /proc/self/mounts did not output / contain an entry relating to the partition or directory in question. Neither did lsof or fuser. I believe that I am a victim of that kernel bug. Debian buster comes with 4.19 with Debian patches. – Binarus Jun 22 '22 at 10:18
[...] mount -B: It does not bind recursively, and nowadays it does a shared bind (instead of the usual private bind) due to some changes in systemd, which is fatal in use cases like the one described above. For more information, see https://wiki.debian.org/systemd#Known_Issues_and_Workarounds – Binarus Jun 22 '22 at 10:24
[...] -o norecovery, the host kernel does not put its hands on the ext4 partition's journal, and there are no jbd2/sdc entries any more in the output of lsof. If you make your comment an answer, I'll accept it. Besides that, I guess that the Debian kernel is buggy: I still can't even e2fsck that partition as soon as I have mounted and unmounted it, but at least it doesn't damage the file system any more. – Binarus Jun 22 '22 at 10:30
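The workaround reported in the last comment can be sketched roughly as follows: mount the guest partition on the host with -o norecovery for the chroot work, and unmount it recursively before the VM is started. The wrapper names and the DRY_RUN variable are hypothetical; /dev/sdc1 and /mnt/target are the example names used in the question. As far as the ext4 documentation goes, norecovery means the journal is not loaded at mount time, so this presumably only makes sense for a file system that was unmounted or shut down cleanly.

```shell
#!/bin/sh
# Hypothetical wrappers around the commands discussed in the comments.
# With DRY_RUN set to a non-empty value, commands are printed, not executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount the guest file system on the host without loading its journal.
mount_guest_fs() {
    run mount -o norecovery "$1" "$2"
}

# Recursively unmount the target, including the bind mounts below it.
umount_guest_fs() {
    run umount -R "$1"
}

# Usage (as root):
#   mount_guest_fs /dev/sdc1 /mnt/target
#   ... bind mounts and chroot work as shown in the question ...
#   umount_guest_fs /mnt/target
```

Running with DRY_RUN=1 first shows the exact commands before anything touches the partition.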