
I have a Linux-based STB (set-top-box) with 64 MB of flash memory and 256 MB of RAM. I wanted to take a backup of some of my settings before flashing it with another image, but I wasn't sure where exactly they were located; I figured I would look into that later on. So I decided to connect to the box over FTP and download all the files and folders. Within the FTP client, I right-clicked on the root of the box and chose to download it to a dedicated folder on my desktop in Windows.

The download just kept going; it seemed like it would never stop... but then the FTP connection was terminated by the FTP server (at least, that's what the log seemed to say). I ended up with 2.97 GB of data. How is this possible? Where is all this data coming from? The box can't even hold more than 256 MB at most!

Why can't you just copy the root of a Linux machine right off and expect all the other files and folders to follow? Isn't it the same as copying C:\ on Windows? Is it because it's a live system? Maybe I have to shut it down first, or log off and stop processes? It was in standby at the time...

Samir

3 Answers


At least 3 different things could explain why you transferred more data than could possibly have been stored on the STB:

  • Sparse files: Files always appear to contain a continuous sequence of bytes from the start of the file to its current length. But you can create a (usually binary) file and only write to certain byte ranges. In this case, the empty holes between these byte ranges (which have never been written to) appear to contain 0-valued bytes when read. File systems usually notice when software creates these "holes" and don't actually store the holes on disk. In this way, you can create a 1000000-byte file, write a single byte at position 999999, and note that the file is almost one megabyte in size but only consumes a single block of disk space. (There is a short demonstration after this list.)

    Certain kinds of database or index files might commonly be sparse, if the file format calls for certain parts of the file to be at certain byte offsets but not everything is filled in.

    File copiers can't tell that a file was sparse at the source location, so they just read the whole file as a stream of bytes from the source and write the same stream of bytes to the destination. Since every byte of the file gets written at the destination, the destination's filesystem doesn't create a sparse file.

    If you suspect that sparse files in the data set are causing it to increase in size, try the --sparse option to rsync. It will opportunistically create sparse files on the destination whenever there were large runs of 0-valued bytes in the source. (It can't tell if the source file was actually sparse, only potentially sparse, but it makes it sparse on the destination anyway.)

    Your STB probably contains an internal database of some kind which could be implemented with one or more sparse files. Look for very large files on the source filesystem, in particular files that are larger than the amount of storage on the STB. Those have to be sparse.

  • Things mounted in more than one place. Embedded systems like STBs often have a strange filesystem layout because they may have a mix of read-only and read-write partitions (for the manufacturer's software distribution and user data, respectively), different types of filesystems designed for use on raw flash (not block devices), bootloader partitions, union-mounted filesystems that allow very easy implementation of factory-reset features, ramdisks in order to gracefully survive power loss without filesystem corruption, etc. As a result, the very same content might appear mounted at several different independent locations (e.g. in factory-original form, as a union mount, as bind mounts for other purposes...).

    To crack this nut, the df command might be helpful, although some embedded systems manufacturers do things that are weird enough that it might not be clear what's going on from df output. But you should at least be able to see what filesystems exist and how full each one of them is.

  • Hard links: FTP doesn't recognize hard links, so if you ask it to copy two links to the same file, it will copy the file twice and it will take up twice the space on the destination side. If the file has more than 2 links, multiply accordingly. (This is also illustrated in the demonstration after this list.)

    To help with this, try rsync's --hard-links option.
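
If you want to see the sparse-file and hard-link effects for yourself, here is a quick sketch you can run in a scratch directory on a Linux machine with GNU coreutils; the file names are arbitrary and the exact disk-usage figures depend on the filesystem:

# Write one byte at offset 999999; most filesystems store the resulting
# 1000000-byte file sparsely, in a single block.
dd if=/dev/zero of=sparse.bin bs=1 count=1 seek=999999 2>/dev/null
ls -l sparse.bin    # apparent size: 1000000 bytes
du -h sparse.bin    # actual disk usage: typically one block, e.g. 4.0K

# Two hard links to the same data: stored once on disk, but a naive
# file-by-file copy (such as an FTP download) transfers it twice.
echo 'some settings' > original.conf
ln original.conf second-name.conf
ls -li original.conf second-name.conf   # same inode number, link count 2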

Note that in two out of three cases, I've recommended that you use rsync to copy the files. This is only possible if you have shell access to the STB and rsync is installed (or you can install it), or if the STB offers rsync as a file transfer protocol (an STB probably doesn't, but some NAS appliances sold for home use do).

If you can use it at all, rsync is a great way to copy large amounts of data from one system to another. Not only does it have options to solve two out of the three problems mentioned above (or maybe all 3? Look at --one-file-system), but it's also very handy for resuming an interrupted copy.
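
For example, a single invocation along these lines would address all three issues at once. This is only a hypothetical sketch: settopbox and the destination path are placeholders, and it assumes the STB accepts SSH logins as root and has rsync available:

rsync -a --sparse --hard-links --one-file-system root@settopbox:/ settopbox.backup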

Celada

In Windows terms, you didn't just copy the C: drive; you also copied all kinds of files that are not disk files but instead hardware devices, and you copied some files many times, probably including the whole disk contents and a dump of the RAM several times over.

On Linux and other unix-like systems, almost everything is a file. In addition to regular files and directories, there are symbolic links (pointers to other files) and device files which represent hardware devices (disks, partitions, the RAM, serial ports, etc.). There are also special filesystems which are not stored on a disk, but let applications access data about the system: /proc (procfs) and /sys (sysfs).

Among the devices in /dev, there are even infinite files — files that you can keep reading forever. There's /dev/zero, which contains as many null bytes as you care to read from it. There's also /dev/urandom, which contains as many random bytes as you care to read from it — so to get n random bytes, you read n bytes from /dev/urandom.
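
You can see this on any Linux machine by reading a small, fixed amount from these devices; they will keep supplying data for as long as you ask (a quick illustration, not specific to the STB):

head -c 16 /dev/urandom | od -An -tx1   # 16 random bytes
head -c 16 /dev/zero | od -An -tx1      # 16 null bytes
ls -l /dev/zero /dev/urandom            # the leading 'c' marks a character device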

If you used an FTP program to transfer the whole filesystem tree, it copied everything it could reach, and was probably either copying the large amount of data that can be obtained from /proc, or, more likely, the infinite amount of data from /dev.


If you have some way to connect to the box other than FTP, for example an SSH command line, use that instead of FTP, which doesn't know about special files. Run the command df to see what filesystems are present. You can make a backup of the root filesystem with the command

rsync -a -x root@settopbox:/ settopbox.backup

(Note the -x option to tell the rsync program not to cross filesystems.)

The root filesystem might not be the interesting one to back up; some devices are set up with a read-only root filesystem and a different read-write filesystem containing the settings. Post the output of the commands df and mount if you need help figuring out which one(s) to back up.
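
Once you have identified the read-write filesystem that holds the settings, you can back up just that part. Purely as an illustration, if df showed such a filesystem mounted at /var (the real mount point on your box will very likely be different):

rsync -a -x root@settopbox:/var/ settopbox-var.backup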

Alternatively, back up the flash memory itself. You'll have to find out what the device name is. Try the following commands to look for block devices, i.e. devices that correspond to a disk, disk partition or other similar device:

find /dev -type b
ls -l /dev /dev/* | grep '^b'

If you aren't sure what the devices mean, post the output of these commands.
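
Once you know which device holds the data, you can stream an image of it over SSH, assuming the box allows SSH logins and has dd available (BusyBox-based firmwares usually do). The device name below is only a guess; substitute whatever the commands above actually report:

ssh root@settopbox 'dd if=/dev/mtdblock0' | gzip > mtdblock0.img.gz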

  • Good point about device files. Using a file transfer program that tries to copy /dev/zero for example by blindly reading from it will certainly be a problem! I should have included those in my three possible explanations! – Celada Sep 23 '12 at 20:38

The reason is that Linux has something called the proc filesystem.

proc is mounted on /proc and represents in-kernel data structures as files. One such object is /proc/kcore, which is a binary image of the kernel's memory, i.e. all memory in use on the system, including virtual memory.

Here's an example on my workstation:

$ cat /proc/meminfo | grep MemTotal
MemTotal:        3507728 kB
$ ls -lh /proc/kcore
-r-------- 1 root root 128T 2012-09-21 17:24 /proc/kcore

As you can see, I only have about 4 GB of RAM, while /proc/kcore is a whopping 128 TB! This is significantly more (approximately 32,000 times more) memory than I have.
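
This is also why a whole-tree copy should skip the pseudo-filesystems. If you do copy the tree file by file, something along these lines keeps /proc and friends out of the transfer (a sketch, assuming rsync and SSH access as in the other answers; the paths after --exclude are the standard mount points):

rsync -a --exclude=/proc --exclude=/sys --exclude=/dev root@settopbox:/ settopbox.backup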

bahamat