For over a year I have been fighting a battle with partitions and drives corrupting on me without rhyme or reason. Even from a LiveCD I would hit snags as I struggled to get control and recover data. I was ready to turn to recovery tools, but found hope that gparted with gpart could manage it. What I really needed was just to get my data off to a backup, then start over.
Unfortunately, while gparted correctly identified my folders and files and determined my drive structure, it merely reported that there were discrepancies in the partition table, but did not correct them. Under Devices it offered to do a Data Rescue, but it turns out that you need gpart installed for that. It is not on the LiveCD, and the limited repositories available to the LiveCD do not have it. You have to do a full install with upgrades to get anywhere.
But you do not dare install to a corrupted drive without a complete rebuild. So
I picked /dev/sdb1 to be root, and made sure nothing else was picked other than the swap partition on the removable drive. The installer assumes any existing swap partitions can be used as well, but on a corrupted drive that is a dangerous assumption. The install went well, but I got off my old distro and went with UbuntuGnome 16.04 in its place. I wanted to eliminate the old distro as the cause of my woes, but I was pretty sure
The install went smoothly, but it was different. A couple of hours with it convinced me I liked it far better than my old distro. It was a bit raw and had some buggy parts, but all in all, it was more than decent. I didn't like that it only had Firefox as a browser, and did not have gpart or parted or other recovery tools on it, but looking at the underbelly of the DVD, it had about twice as much bundled software as my old distro, so that was good.
Deciding not to go back to my old distro, and finding it inconvenient to work from a LiveCD, I decided to do a second install to the second new partition. In both installs I had the boot loader put on /dev/sdb, not on /dev/sda, which is the default. /dev/sda was corrupted, so no writes to it until I got my user accounts under /home copied off. Otherwise I might accidentally corrupt it worse. Each time you do a new install or run update-grub in a terminal, the current install becomes primary. Now I was ready to boot up the removable drive. I had previously deleted everything except /home on two of the 3 partitions on /dev/sda. From a LiveCD, you should be able to mount partitions, then you use a terminal and enter these commands:
sudo su root
cd /m*/*/*/home
The first command gives you root id and power permanently while in that terminal session, no timeouts. But it leaves $HOME and $USER as they were. However "~" is changed to /root. You can exit the root id by typing "exit" or by doing a "su" to a different identity.
The second command may have to be modified. It allows for the partition to be mounted under either /mnt or /media or any other folder starting with "m". Normally these are the only two in a Linux system. But even without the "m", the command will only succeed if there is a "home" folder two levels in. If this is not a drive with "home" on it, pick a folder name on the partition to help. If you don't know what is on the partition, you can use this command sequence to get you most of the way there:
cd /m*/$USER; dir *; dir */
This will likely scroll beyond view. To scroll up or down, hold down the Ctrl+Shift keys and use the up and down arrow keys. Identify the partition you want to get to, and do this: "cd ". Again, you only have to enter part of whatever the name is, using * to fill in the rest.
Since we are here to retain user accounts and eliminate everything else, I am assuming you used "home" and are now on the right partition. In fact you are in .../home at this point. So everything we want to delete is one layer back. We identify the present level with just a period (.), and one level back with 2 periods (..). Now "home" is the only folder at that level that starts with "h", and that makes our job easy:
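To see what a wildcard path such as /m*/*/*/home actually matches before you cd into it, you can list the matches first. This is only a rough sketch: the function name list_home_candidates and its optional ROOT argument are my own inventions, added so the pattern can be tried against a scratch folder instead of the real /mnt and /media.

```shell
# list_home_candidates [ROOT]
# Print every directory matching ROOT/m*/*/*/home, one per line.
# ROOT defaults to empty, which gives the /m*/*/*/home pattern from the text.
# (Hypothetical helper, not part of any distro.)
list_home_candidates() {
    root="${1:-}"
    for candidate in "$root"/m*/*/*/home; do
        # An unmatched glob is left as literal text, so check it really exists.
        if [ -d "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
}

# Typical use from a LiveCD terminal:
#   list_home_candidates
```

If more than one line comes back, the wildcard is ambiguous and "cd" would fail, so pick one of the printed paths and cd to it directly.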
rm -r ../[!h]*
That command removes everything recursively one level back that does not start with "h". That includes files and folders, and with the -r, that means what is in those folders regardless of whether they have an "h" in their names or not. That's it. Now you want to try and get your data off if possible. But to do that, I wanted gpart, and that meant booting up one of the two newly installed partitions. So I typed "reboot now", and went through the restart process.
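Since a removal like that cannot be undone, it is worth previewing what the pattern matches before letting rm loose. The sketch below runs entirely inside a throwaway folder made with mktemp. The folder and file names in it (alice, hold, vmlinuz, .hidden) are made up for the demonstration.

```shell
# Build a throwaway tree that mimics the top level of a mounted partition.
scratch="$(mktemp -d)"
mkdir -p "$scratch/home/alice" "$scratch/etc" "$scratch/var/log" "$scratch/hold"
touch "$scratch/.hidden" "$scratch/vmlinuz"

# Stand inside "home", as in the text.
cd "$scratch/home" || exit 1

# Step 1: preview. ls -d lists the matches themselves, not their contents.
ls -d ../[!h]*

# Step 2: only after checking the preview, remove the same matches.
rm -r ../[!h]*

# What survives: home, anything else starting with "h" (hold here),
# and dotfiles, because * does not match names that start with a period.
ls -A ..
```

So the pattern is a little less thorough than it sounds: other names starting with "h" and hidden entries at that level stay behind.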
It came up as planned, but after installing some needed and wanted software via the terminal window using apt-get and some other commands, I decided to go ahead and modify /etc/sudoers with this command:
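echo "$USER ALL=(ALL) NOPASSWD: ALL" | sudo tee -a /etc/sudoers
See, it's not really necessary to edit a file if you are just going to append something to it. The append has to go through "sudo tee -a", because with "sudo echo ... >> /etc/sudoers" the redirection is done by your own shell, not by root, and it fails with a permission error unless you are already root. This does the trick. Now when I use "sudo", I will not be asked for a password.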
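A mistake in /etc/sudoers can lock you out of sudo altogether, so a more cautious route is to build the rule in a separate file and have visudo check its syntax before it goes anywhere near /etc. This is just a sketch of that idea. The drop-in name 90-nopasswd is a hypothetical one of my choosing, and the install step is left commented out.

```shell
# Work out the account name; fall back to id if $USER is not set.
user_name="${USER:-$(id -un)}"

# Write the rule to a temporary file instead of the live sudoers file.
rule_file="$(mktemp)"
printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$user_name" > "$rule_file"

# visudo -c -f FILE only checks the syntax of FILE, it changes nothing.
if command -v visudo >/dev/null 2>&1; then
    if visudo -c -f "$rule_file"; then
        echo "syntax check passed"
    else
        echo "syntax check FAILED, do not install this file"
    fi
fi

cat "$rule_file"

# Only after a clean check (file name is a made-up example):
#   sudo install -m 0440 -o root -g root "$rule_file" /etc/sudoers.d/90-nopasswd
```

On Ubuntu the stock /etc/sudoers already includes /etc/sudoers.d, so a file dropped there takes effect without touching the main file.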
That done, I contemplated how I wanted to get the data off. There was gpart, which did a good job making 2 of my 3 partitions readable again when it was called by gparted. With the -w argument, it could copy what it read to another location, but it would probably overwrite what was already there. rsync allows for synchronization with includes and excludes where you keep what is newest, but so does "cp -purf", without the benefit of the includes and excludes. But there weren't many excludes to worry about except for the trash. And there was ddrescue, and I hadn't even studied the contents of several rescue disk ISOs I had downloaded. I wasn't interested in rescuing a corrupt drive anyway, just getting the data off if possible and starting over.
Then my new install ran into partition problems. Now I knew with a certainty that it was due to one of two causes: either ext4 was bad, or swap. Those were the only two partition types I had been using in a long, long time. Everything else had changed, but these were the two constants. I didn't know what to do about swap, but it's been around for a long time, and its role is reasonably simple, so it was an unlikely candidate. More than likely it was ext4, and that I could change.
I started over on the removable drive with the LiveCD and gparted. I figured I could drop back a version and try ext3 next. Working my way back up, it bombed worse than ext4 did. fsck let me know ext4 was handling ext3, and it was finding an unbelievable amount of errors in a disk it had just formatted, on a new drive, that had been verified as clean. Turns out there is no separate ext? format anymore, and the only way you can get an old copy of ext4 is via an old LiveCD. You would have to go back a year or more, and that might help on the install, but the first upgrade would replace it with what appeared to be a defective version.
Have I opened a bug report? No, and I don't intend to do so. First of all, this is my personal experience, and I can't speak for anyone else. That it has happened on 4 PCs and six hard drives could just be coincidence or reflect a bad mix of software, right? It needs confirmation, so if you have also been having a bad run of luck and you use ext4, that might be worth talking about.
Second, I am fed up with sites that put the burden of proof on the user, or restrict threads and posts to their concept of what is needed. Not everything good comes out of a mold. If there is a bug, they need to get wise to it on their own. Don't put it on me to point fingers at a specific package or combination and say "Here it is! I found it for you!". That's not my role here. I'm just a user, not a maintainer or developer.
That said, I needed to pick a different structure for my partitions, but which one? I searched the Internet, but it's not a big topic, and everybody leaves it to personal choice. I then considered the options in gparted and in the installer when you use "Something else". They don't agree. There are some matches of course, but not that many. You lose the ext2, ext3, and ext4 choices immediately, but you need to rule out FAT16, FAT32, and NTFS as well. I won't explain why, just don't pick them unless you really need to for Windows or DOS compatibility. Well, I will explain in brief: FAT16 is too limited, best for floppies, FAT32 is weak, and NTFS is flawed and has no good recovery tool on either the Windows or Linux side.
Rather than gamble on 1 partition type again, I decided to go with at least two. The overlap between gparted and the installer featured 3, and I picked jfs and xfs. I did them both, one per partition, and so far no problems.
As to the data recovery, the 3rd partition is completely gone. The partition table entry for swap, the 4th partition, jumped from about 6GB to about 58GB, mapping over a significant part of the 3rd partition. It was largely redundant anyway, as I kept things there, but had no real time to use them.
I decided to just use "cp -purf". If the folders and files were intact, I would get them easily enough. If not, I did not want them anyway. I would recover /dev/sda1/* to /dev/sdb1/, and /dev/sda2/* to /dev/sdb2/. I took one further step: I mounted /dev/sda1 and /dev/sda2 as read only. I wasn't going to take any chances on a write operation going bad. Sounds totally unlikely, but I've had a lot of things go bad in recent months, and it had been getting worse. Maybe I can break free of it now.
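To see what the "keep what is newest" behaviour of cp looks like in practice, here is a small scratch test. It assumes GNU cp and GNU touch as shipped on Ubuntu, and the file names are invented for the demonstration.

```shell
demo="$(mktemp -d)"
mkdir -p "$demo/src" "$demo/dest"

# Same file name on both sides; the destination copy is the newer one.
echo "old" > "$demo/src/notes.txt"
echo "new" > "$demo/dest/notes.txt"
touch -d '2020-01-01 00:00:00' "$demo/src/notes.txt"
touch -d '2021-01-01 00:00:00' "$demo/dest/notes.txt"

# A file that exists only on the source side.
echo "only in src" > "$demo/src/extra.txt"

# -p keeps mode, ownership and timestamps; -u skips any file whose
# destination copy is not older than the source; -r recurses;
# -f forces replacement where a copy does take place.
cp -purf "$demo/src/." "$demo/dest/"

cat "$demo/dest/notes.txt"   # the newer destination copy is left alone
cat "$demo/dest/extra.txt"   # the missing file has been copied over
```

The rsync counterpart would be along the lines of "rsync -au --exclude='.local/share/Trash/' SRC/ DEST/", which adds the exclude for the trash that cp cannot do.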
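If you want to be sure a partition really did come up read only before you start copying from it, the mount options the kernel reports can be checked. This is a small sketch. The function name is_mounted_readonly is my own, and its second argument exists only so it can be tried against a sample file rather than the live /proc/mounts.

```shell
# is_mounted_readonly MOUNTPOINT [MOUNTS_FILE]
# Succeeds if the last entry for MOUNTPOINT in MOUNTS_FILE lists "ro"
# among its options. MOUNTS_FILE defaults to /proc/mounts.
# (Hypothetical helper written for this post.)
is_mounted_readonly() {
    awk -v mp="$1" '
        $2 == mp {
            found = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                if (opts[i] == "ro") found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}

# Typical use:
#   is_mounted_readonly /mnt/sda1 && echo "/mnt/sda1 is read only"
```

A mount point that is not mounted at all also counts as "not read only" here, which errs on the side of caution.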
Oh, the commands actually used:
dir /mnt
sudo -i
mkdir /mnt/sda1
mkdir /mnt/sda2
mkdir /mnt/sda3
sudo mount -o ro /dev/sda1 /mnt/sda1
sudo mount -o ro /dev/sda2 /mnt/sda2
sudo mount -o ro /dev/sda3 /mnt/sda3
dir /mnt/sda1
home hope
dir /mnt/sda2
hold home
dir /mnt/sda3
lost+found
dir /mnt/sda3/lost+found
mkdir /mnt/hold1
cp -rfup /mnt/sda1/home/* /media/$USER/sdb1/home/; cp -rfup /mnt/sda2/hold/* /media/$USER/sdb2/home/; cp -rfup /mnt/sda2/home/* /media/$USER/sdb2/home/; cp -rfup /mnt/sda1/hope/* /media/$USER/sdb1/home/
Using this technique, I consolidated 2 folders on /dev/sda1 into one folder on /dev/sdb1, and did the same from /dev/sda2 to /dev/sdb2. Now I will start over with /dev/sda and put gparted to work again. Much faster and more thorough than trying to repair a corrupted drive, which is an uncertain proposition at best.
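Before wiping the old drive, it may be worth a quick check that nothing was left behind. The sketch below lists files that exist under a source folder but have no counterpart under the destination. The function name list_missing is made up for this post, and it only checks that a file exists, not that the contents match.

```shell
# list_missing SRC DEST
# Print the relative path of every regular file under SRC that has no
# counterpart under DEST. Prints nothing if nothing is missing.
# DEST should be an absolute path. File names containing newlines are
# not handled. (Hypothetical helper.)
list_missing() {
    src="$1"
    dest="$2"
    ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
        [ -e "$dest/$rel" ] || printf '%s\n' "${rel#./}"
    done
}

# Typical use with the mount points from the commands above:
#   list_missing /mnt/sda1/home /media/$USER/sdb1/home
```

An empty result means every file on the old partition has a copy, old or new, on the new one.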