2

I don't know what's going on. Yesterday evening I left my computer with about 700MB of free space on the system partition, and today there is no free space on the disk. Moreover, many files that contain data are reported by the system as 0 bytes, although they are actually full of valid data such as text (and they have the icon of a blank document). After deleting some files I see no change: still no free space. These files were deleted permanently and were not in use when I deleted them.

Yesterday I ran the sync command once, for the first time ever. Today I also ran the following as root: sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'.

It looks like something is wrong with the partition, or rather with the file system. The system has not been rebooted for a long time, and a reboot is the last thing I would like to do now.

Moreover, I noticed that I've got no command history in the terminal: the file .bash_history is empty. Is this because of dropping the cache, or is my disk broken? If the former, what else was cleared, among noticeable things like the bash history?

Yesterday I made the disk work hard with things like file searching, but after that everything was fine until the evening.

One more annoying thing I noticed is a process called update-apt-xapian-index-dbus, which is still running in the background but is sleeping now (I couldn't kill it; it comes back every time I try).

I see no other errors. The system is still working and it's stable. I'd like to know what is happening with my system. Do you have any suggestions? How can I diagnose it? How can I display the real free space and file sizes? Is a reboot necessary?

Edit: One more thing is that keystrokes like Shift+Del are not working. I also tried to find files created during the night, but find / -ctime=0 also shows older files, and the same goes for -mtime. I forgot to mention that I use Ubuntu.

**EDIT: I just found a file called .xsession-errors and it's about 650MB, so maybe it's the space I lost (it matches). It was accessed and modified today, one hour ago, but I can't say (or check) when it was created. How about this? Next to that file is .xsession-errors.old, modified on the day of the last reboot, and it's under 0.5MB. Did I just find my "free space"?

And can the sync command cause a problem with a partition? I read so somewhere, but can it really?**

Edit: I opened the file .xsession-errors. Inside I can see some genuine errors with descriptions about displaying windows, thousands of lines with Illegal character <2e> in hex string, and millions of lines with Write error: Unknow error. I guess it kept writing as long as free space on the disk was available. The character is not always <2e>, but that one repeats most often.

  • Neither sync nor echo 3 > /proc/sys/vm/drop_caches are destructive, so they're probably irrelevant. – derobert Oct 28 '13 at 20:47
  • Thanks. Is the bash command history part of the cache? Should I assume that it's empty now because of dropping caches? – user49847 Oct 28 '13 at 20:53
  • The cache that you dropped stores in-memory copies of things that should also be on disk. So it should just reload from disk. It's probably empty because bash tried to write it out when you exited a shell, but couldn't because you were out of disk space. – derobert Oct 28 '13 at 21:00
  • That seems logical, thanks for the explanation. Have you got any ideas about my last edit? – user49847 Oct 28 '13 at 21:04
  • @derobert: While dropping caches is non-destructive, with faulty memory it forces things to be reloaded from disk, which may make heisenbugs disappear temporarily. – ninjalj Oct 29 '13 at 13:08

4 Answers

2

As for finding file sizes: if the partition is corrupted, it is not going to report correct sizes until the file system has been checked with fsck. You'll want to run fsck -P on the root disk. You can identify this disk by running df -h; you should get something like this:

user@server:~> df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              25G   18G  5.9G  76% /
udev                  2.0G  116K  2.0G   1% /dev
/dev/sda1             244M   20M  211M   9% /boot
/dev/sda5             4.0G  1.7G  2.2G  43% /var
/dev/sda6             4.7G  1.2G  3.3G  27% /tmp
/dev/sdb1             197G  127G   61G  68% /data
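If you only want the device name rather than the whole table, something like the sketch below should do it. The function name device_for is my own, not part of any tool, and it assumes a df that supports the POSIX -P layout.

```shell
# Print the device that backs a given mount point (defaults to /).
# -P forces the one-line-per-filesystem POSIX layout so awk can parse it.
device_for() {
    df -P "${1:-/}" | awk 'NR == 2 { print $1 }'
}

device_for /
```

On the example output above this would print /dev/sda2 for /. Inside containers or on unusual setups it may print a pseudo-device name instead of a /dev path.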

You'll want to boot into recovery mode from grub or, preferably, boot live from a disk. The live environment is preferable because if you have file system corruption on /, the fsck utility itself could be corrupted and cause damage. If you boot into a live environment, / will most likely not be mounted by default, so df won't help; running sudo fdisk -l will list the available disks. Once you have identified the device, you can run fsck on it.

Also, you'll want to copy off any logs that have been rolled into archive. Assuming the drive isn't fubar and you can clear some space, having live rolling log output is critical for diagnosing bugs. You may also consider using something like rsyslog to write your logs to a database on another box. This would give you access to logs when your disk is getting wonky.

sean_m
  • 174
1

You likely want to run fsck ASAP, as noted in another answer. I'd try to do a backup first, especially if you are able to access the content of files despite them being listed with a 0 size.

See if a tar backup and restore to another machine fixes the file size.

cd /to/problem/area
tar -zcf - . | ssh user@othermachine.com tar -C /some/safe/dir -zxvf -

If so - back up everything then reboot into Recovery mode and fsck everything.
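Before or after the backup, you can also check which files are affected by comparing the size recorded for each file with the number of bytes that can actually be read from it. This is only a rough sketch: size_mismatches is a made-up helper name, it assumes GNU stat, and it will misbehave on file names containing newlines.

```shell
# Hypothetical helper: list files whose recorded size (from the inode)
# differs from the number of bytes that can actually be read.
size_mismatches() {
    find "$1" -xdev -type f | while IFS= read -r f; do
        recorded=$(stat -c %s "$f")        # size stored in the inode (GNU stat)
        readable=$(cat -- "$f" | wc -c)    # force a real read of the contents
        if [ "$recorded" -ne "$readable" ]; then
            printf '%s\trecorded=%s\treadable=%s\n' "$f" "$recorded" "$readable"
        fi
    done
}

# Demo on a scratch directory; on a healthy filesystem nothing is printed.
demo_dir=$(mktemp -d)
printf 'hello\n' > "$demo_dir/a.txt"
size_mismatches "$demo_dir"
```

Pointing it at /to/problem/area instead of the scratch directory would list the files that claim one size but read as another. Files that are being written to while the check runs can show up as false positives.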

0

For over a year I have been fighting a battle with partitions and drives corrupting on me without rhyme or reason. Even from a LiveCD I would hit snags as I struggled to get control and recover data. I was ready to turn to recovery tools, but found hope that gparted with gpart could manage it. What I really needed was just to get my data off to a backup, then

Unfortunately while gparted correctly identified my folders and files and determined my drive structure, it merely reported there were discrepancies in the partition table, but did not correct them. Under Devices it offered to do a Data Rescue, but turns out that you need gpart installed for that. It is not on the LiveCD, and the limited Repositories allowed the LiveCD does not have it. You have to do a full install with upgrades to get anywhere.

But you do not dare install to a corrupted drive without a complete rebuild. So

I picked /dev/sdb1 to be root, and made sure nothing else was picked other than the swap partition on the removable drive. The installer assumes any existing swap partitions can be used as well, but on a corrupted drive that is a dangerous assumption. The install went well, but I got off my old distro and went with UbuntuGnome 16.04 in its place. I wanted to eliminate the old distro as the cause of my woes, but I was pretty sure

The install went smoothly, but it was different. A couple of hours with it convinced me I liked it far better than my old distro. It was a bit raw and had some buggy parts, but all in all, it was more than decent. I didn't like that it only had Firefox as a browser, and did not have gpart or parted or other recovery tools on it, but looking at the underbelly of the DVD disk, it had about twice as much bundled software as my old distro, so that was good.

Deciding not to go back to my old distro, and finding it inconvenient to work from a LiveCD, I decided to do a second install to the second new partition. In both installs I had the boot process put on /dev/sdb, not on /dev/sda, which is the default. /dev/sda was corrupted, so no writes to it until I get my user accounts under /home copied off. Otherwise I might accidentally corrupt it worse. Each time you do a new install or run update-grub in a terminal, the current install becomes primary. Now I was ready to boot up the removable drive. I had previously deleted everything except /home on two of the 3 partitions on /dev/sda. From a LiveCD, you should be able to mount partitions, then you use a terminal and enter these commands:

    sudo su root
    cd /m*/*/*/home

The first command gives you root identity and power permanently while in that terminal session, with no timeouts. But it leaves $HOME and $USER as they were. However "~" is changed to /root. You can exit the root shell by typing "exit" or by doing a "su" to a different identity.

The second command may have to be modified. It allows for the partition to be mounted under either /mnt or /media or any other folder starting with "m". Normally these are the only two in a Linux system. But even without the "m", the command will only succeed if there is a "home" folder two levels in. If this is not a drive with "home" on it, pick a folder name on the partition to help. If you don't know what is on the partition, you can use this command sequence to get you most of the way there:

   cd /m*/$USER; dir *; dir */

This will likely scroll beyond view. To scroll up or down, hold down the Ctrl+Shift keys and use the up and down arrow keys. Identify the partition you want to get to, and do this: "cd ". Again, you only have to enter part of whatever is, using * to fill in the rest.

Since we are here to retain user accounts and eliminate everything else, I am assuming you used "home" and are now on the right partition. In fact you are in .../home at this point. So everything we want to delete is one layer back. We identify the present level with just a period (.), and one level back with 2 periods (..). Now "home" is the only folder at that level that starts with "h", and that makes our job easy:

    rm -r ../[!h]*

That command removes everything recursively one level back that does not start with "h". That includes files and folders, and with the -r, that means what is in those folders regardless of whether they have an "h" in their names or not. That's it. Now you want to try and get your data off if possible. But to do that, I wanted gpart, and that meant booting up one of the two newly installed partitions. So I typed "reboot now", and went through the restart process.
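Since rm -r gives no second chance, it may be worth previewing what the pattern expands to before running it. The sketch below uses a scratch tree with made-up directory names standing in for a mounted partition. Two things it shows: any other name starting with "h" (here "hold") is skipped as well, and names starting with a dot are not matched by the pattern at all, so hidden entries survive.

```shell
# Scratch tree standing in for a mounted partition (names are made up for the demo).
root=$(mktemp -d)
mkdir "$root/home" "$root/hold" "$root/etc" "$root/var" "$root/.hidden"

# Preview: from inside home, print what ../[!h]* expands to instead of deleting it.
preview=$(cd "$root/home" && printf '%s\n' ../[!h]*)
printf '%s\n' "$preview"
```

Only once the printed list is what you expect would you swap printf for the rm -r from above.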

It came up as planned, but after installing some needed and wanted software via the terminal window using apt-get and some other commands, I decided to go ahead and modify /etc/sudoers with this command:

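    # The ">>" redirection is done by the calling shell, not by sudo, so append via "sudo tee -a" instead.
    echo "$USER ALL=(ALL) NOPASSWD: ALL" | sudo tee -a /etc/sudoers > /dev/null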

See, it's not really necessary to edit a file if you are just going to append something to it. This would do the trick. Now when I use "sudo", I will not be asked for a password.

That done, I contemplated how I wanted to get the data off. There was gpart, which did a good job making 2 of my 3 partitions readable again when it was called by gparted. With the -w argument, it could copy what it read to another location, but it would probably overwrite what was already there. rsync allows for synchronization with includes and excludes where you keep what is newest, but so does "cp -purf" without benefit of the includes and excludes. But there weren't many excludes to worry about except for the trash. And there was ddrescue, and I hadn't even studied the contents of several rescue disk ISOs I had downloaded. I wasn't interested in rescuing a corrupt drive anyway, just getting the data off if possible and starting over.

Then my new install ran into partition problems. Now I knew with a certainty that it was due to one of two causes: either ext4 was bad, or swap. Those were the only two partition types I had been using in a long, long time. Everything else had changed, but these were the two constants. I didn't know what to do about swap, but it's been around for a long time, and its role is reasonably simple, so it was an unlikely candidate. More than likely it was ext4, and that I could change.

I started over on the removable drive with the LiveCD and gparted. I figured I could drop back a version and try ext3 next. Working my way back up, it bombed worse than ext4 did. fsck let me know ext4 was handling ext3, and it was finding an unbelievable number of errors on a disk it had just formatted, on a new drive, that had been verified as clean. Turns out there is no separate ext? format anymore, and the only way you can get an old copy of ext4 is via an old LiveCD. You would have to go back a year or more, and that might help on the install, but the first upgrade would replace it with what appeared to be a defective version.

Have I opened a bug report? No, and I don't intend to do so. First of all, this is my personal experience, I can't speak for anyone else. That it has happened on 4 PCs and six hard drives could just be coincidence or reflect a bad mix of software, right? It needs confirmation, so if you have also been having a bad run of luck and you use ext4, that might be worth talking about.

Second, I am fed up with sites that put the burden of proof on the user, or restrict threads and posts to their concept of what is needed. Not everything good comes out of a mold. If there is a bug, they need to get wise to it on their own. Don't put it on me to point fingers at a specific package or combination and say "Here it is! I found it for you!". That's not my role here. I'm just a user, not a maintainer or developer.

That said, I needed to pick a different structure for my partitions, but which one? I searched the Internet, but it's not a big topic, and everybody leaves it to personal choice. I then considered the options in gparted and the installer when you use "Something else". They don't agree. There are some matches of course, but not that many. You lose the ext2, ext3, and ext4 choices immediately but you need to rule out FAT16, FAT32, and NTFS as well. I won't explain why, just don't pick them unless you really need to for Windows or DOS compatibility. Well, I will explain in brief: FAT16 is too limited, best for floppies, FAT32 is weak, and NTFS is flawed and has no good recovery tool on either the Windows or Linux side.

Rather than gamble on 1 partition type again, I decided to go with at least two. The overlap between gparted and the installer featured 3, and I picked jfs and xfs. I did them both, one per partition, and so far no problems.

As to the data recovery, the 3rd partition is completely gone. The partition table entry for swap, the 4th partition, jumped from about 6GB to about 58GB, mapping over a significant part of the 3rd partition. It was largely redundant anyway, as I kept things there, but had no real time to use it.

I decided to just use "cp -purf". If the folders and files were intact, I would get them easily enough. If not, I did not want them anyway. I would recover /dev/sda1/* to /dev/sdb1/, and /dev/sda2/* to /dev/sdb2/. I took one further step: I mounted /dev/sda1 and /dev/sda2 as read only. I wasn't going to take any chances on a write operation going bad. Sounds totally unlikely, but I've had a lot of things go bad in recent months, and it had been getting worse. Maybe I can break free of it now.

Oh, the commands actually used:

    dir /mnt
    sudo -i
    mkdir /mnt/sda1
    mkdir /mnt/sda2
    mkdir /mnt/sda3
    sudo mount -o ro /dev/sda1 /mnt/sda1
    sudo mount -o ro /dev/sda2 /mnt/sda2
    sudo mount -o ro /dev/sda3 /mnt/sda3
    dir /mnt/sda1
    home  hope
    dir /mnt/sda2
    hold  home
    dir /mnt/sda3
    lost+found
    dir /mnt/sda3/lost+found
    mkdir /mnt/hold1
    cp -rfup /mnt/sda1/home/* /media/$USER/sdb1/home/; cp -rfup /mnt/sda2/hold/* /media/$USER/sda1/home/; cp -rfup /mnt/sda2/home/* /media/$USER/sdb2/home/; cp -rfup /mnt/sda2/hope/* /media/$USER/sda2/home/

Using this technique, I consolidated 2 folders on /dev/sda1 into one folder on /dev/sdb1, and did the same with respects to the other two partitions. Now I will start over with /dev/sda and put gparted to work again. Much faster and more thorough than trying to repair a corrupted drive, which is an uncertain proposition at best.
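If you want to see what the -u in "cp -rfup" does before trusting it with real data, here is a small sketch on scratch directories. All the file names are invented for the demo and it assumes GNU cp. When two folders are merged into one, a destination file that is newer than the source is left alone, and anything missing in the destination is copied over.

```shell
src=$(mktemp -d)
dst=$(mktemp -d)

printf 'old copy\n' > "$src/notes.txt"
printf 'new copy\n' > "$dst/notes.txt"
printf 'only in source\n' > "$src/extra.txt"

# Make the source copy older than the destination copy.
touch -t 202001010000 "$src/notes.txt"
touch -t 202101010000 "$dst/notes.txt"

# -r recurse, -u skip files whose destination copy is newer,
# -p keep timestamps and modes, -f replace destination files that cannot be opened.
cp -rfup "$src"/. "$dst"/

cat "$dst/notes.txt"    # the newer destination file was left alone
cat "$dst/extra.txt"    # the file missing from the destination was copied across
```

The flip side is that -u trusts modification times, so a damaged file with a newer timestamp would win over a good older one.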

0

Here is my guess of what happened.

  1. Your .xsession-errors file got filled up with 650+MB worth of junk.
  2. This filled up your home partition.
  3. Lots of programs that are still running continue to run and attempt to write to files on the home partition.
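To check step 1 on your own machine, you could list the largest files under your home directory. This is a sketch only: largest_files is a name I made up, and it relies on GNU find for -printf. Unlike "du *", it also picks up hidden files such as .xsession-errors.

```shell
# Hypothetical helper: print the N largest regular files under a directory,
# staying on one filesystem (-xdev). Sizes are in bytes. Requires GNU find.
largest_files() {
    find "$1" -xdev -type f -printf '%s\t%p\n' 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Demo on a scratch directory containing two hidden files.
scratch=$(mktemp -d)
printf 'x' > "$scratch/.small"
head -c 4096 /dev/zero > "$scratch/.big-log"
largest_files "$scratch" 1
```

For the real thing you would run largest_files "$HOME" 5 or similar.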

Some pointers below that describe some gotchas with this.

Real Disk Space

To check the disk usage on your machine, use the df command. Here's an example:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        10G  4.5G  4.5G  50% /
/dev/sda5        25G 22.5G    0G 100% /home

This should show you the free space on your disk.

You might notice something weird with the example above. In /dev/sda1, the stated partition size is 10G. However, if you add up the Used and Avail columns, you only get 9G. The same thing with /dev/sda5 - the Used and Avail columns fall a couple gigs under the filesystem size. What gives?

By default, ext partitions are created with reserved space, some small percentage of the filesystem that is not available to regular users, but is available to the root user. In the above, the reserved space for both /dev/sda1 and /dev/sda5 is 10%. You can view the reserved space in /dev/sda1 by using tune2fs -l /dev/sda1.
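If you cannot run tune2fs (it normally needs root), you can get a rough figure from df itself by subtracting Used and Avail from the size. A sketch, with reserved_kib being my own name for it. Treat the result as an estimate of the space ordinary users cannot touch, not an exact reading of the filesystem's reserved block count.

```shell
# Hypothetical helper: estimate the space that is neither used nor available
# to ordinary users on a mount point, in KiB (Size - Used - Avail from df).
reserved_kib() {
    mount_point=${1:-/}
    # Fields 2-4 of the POSIX df layout are total, used and available 1K-blocks.
    set -- $(df -Pk "$mount_point" | awk 'NR == 2 { print $2, $3, $4 }')
    echo $(( $1 - $2 - $3 ))
}

reserved_kib /
```

On filesystems without a reserve (tmpfs, many non-ext filesystems) this will usually print 0.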

What happens when a program writes to disk

The long and short of it is that if your partition is full, programs cannot write to the filesystem.

But what does it look like when a program cannot write to the filesystem? To explain this, you have to recall that programs write to the filesystem in 2 steps:

  1. the program acquires a FILE DESCRIPTOR from the operating system (with the write permissions).
  2. the program sends data to the FILE DESCRIPTOR. The operating system can then queue that data to be written.

Interestingly, this matches the kernel's representation of files, which makes a distinction between the following:

  1. the block on the disk that describes how large the file is, its permissions, etc - aka the inode
  2. the blocks on the disk that hold the file's contents - aka the data blocks

inodes are small, relative to the blocks that hold data. Most filesystems distinguish between the space consumed by the data blocks and the space consumed by the inodes, so that when we do df -h, we are only reporting the space for the data blocks. As far as free space is concerned, inodes don't count.
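You can look at the two kinds of usage separately. The commands below are standard, but the stat format string assumes GNU stat, and the temporary file is just for the demo.

```shell
# Block usage and inode usage are reported separately.
df -Pk /    # data blocks, in 1K units
df -Pi /    # inodes: total, used, free

# Per-file view (GNU stat): inode number, size in bytes, allocated blocks.
demo=$(mktemp)
printf 'hello\n' > "$demo"
stat -c 'inode=%i size=%s blocks=%b' "$demo"
```

A filesystem can also run out of inodes while df -h still shows free space, so it is worth a glance at df -i when things look inconsistent.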

How is this related to your situation?

If the disk is full, as in really 100% full, with no disk space, that doesn't mean that your system can't write to disk anymore. Programs can still acquire FILE DESCRIPTORS, thus creating 0-sized blank files. However, they will not be able to write data to those files. Many programs will either produce an error message or crash if they fail to write to a file due to disk space problems. But at the end of the day, you'll be left with a 0-sized file on your disk.

Example failed write

Let's give bash as an example.

  1. bash starts up and reads the HISTFILE variable, which tells it to open .bash_history for the command history
  2. according to the HISTSIZE variable, bash will only allow 1000 lines of history (or whatever is set in yours)
  3. you run some commands, then exit.
  4. bash now tries to write the last 1000 commands in your history to .bash_history
    1. bash acquires a write FILE DESCRIPTOR for .bash_history
    2. Linux now produces a 0-sized .bash_history file
    3. bash sends 1000 lines of data to the history FILE DESCRIPTOR
    4. Linux gives a write error due to disk space
  5. You are left with a 0-sized .bash_history file

This process easily repeats itself for any programs that try to write a whole file, as opposed to simply appending to the end of the file. So I would not expect logfiles to be zeroed out, although I would not expect them to be updated.
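The difference between rewriting and appending is easy to reproduce without filling a disk. In the sketch below, ":" writes nothing at all, standing in for a write that fails because there is no space left. The file and variable names are made up for the illustration.

```shell
hist=$(mktemp)
printf 'ls\ncd /tmp\n' > "$hist"
before=$(wc -c < "$hist")

# Opening with ">" truncates the file to 0 bytes before any data is written.
: > "$hist"
after_rewrite=$(wc -c < "$hist")

# Opening with ">>" leaves the existing contents in place.
printf 'ls\ncd /tmp\n' > "$hist"
: >> "$hist"
after_append=$(wc -c < "$hist")

echo "before=$before after_rewrite=$after_rewrite after_append=$after_append"
```

The rewritten file ends up empty even though nothing went wrong except the missing write, while the appended one keeps its old contents.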

Any program that attempts to write to the disk will be affected by the above issue.

But wait, your system didn't crash, right?

I highly suspect that your system partition (/) is separate from your home partition (/home), which you should see if you run df. Furthermore, even if it isn't, your system programs that are running as the root user should be able to access the reserved space in your partitions, which should be something from 1-10% of your total partition size.

In all likelihood, you do not need to reboot your machine. However, I would not trust the stability of the system because some programs that intended to write their changes to temporary files on disk would not be able to. Since you're running a desktop, it will be difficult to determine which programs those are. I suggest you free up some space and then do a reboot, just to be sure.

madumlao
  • 1,716