16

I want to automatically test if a piece of software reacts as expected if an essential SQLite DB file fails to be read (causing an I/O error). Exactly that happened some days ago at a client. We manually fixed it but now I want to create automatic code to fix it and need access to a broken file to test that.

As everything in Unix is a file, I suspected that there might be a special file (e.g. in /dev) that always causes I/O errors when one tries to read it.

Some similar files (imo) would be:

  • /dev/full, which always reports "No space left on device" when you try to write to it
  • /dev/null and /dev/zero

so I assumed there just has to be a file like that (but haven't found one yet).
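For comparison, the write-side behaviour of /dev/full is easy to confirm from a shell. This is only a minimal sketch, assuming a Linux system with dd available; the helper name write_fails is made up for illustration.

```shell
# Hypothetical helper: returns 0 if a one-byte write to "$1" fails.
write_fails() {
    ! dd if=/dev/zero of="$1" bs=1 count=1 2>/dev/null
}

if write_fails /dev/full; then
    echo "/dev/full rejected the write"
fi
if ! write_fails /dev/null; then
    echo "/dev/null accepted the write"
fi
```

What the question is after is the read-side analogue of this.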

Does anyone know of such a file, or any other method to get the desired result (an intentionally faulty partition image, a wrapper around open() using LD_PRELOAD, ...)?
What's the best way to go here?

mreithub
  • 3,583

5 Answers

20

There's a great set of answers to this on Stack Overflow and Server Fault already, but some techniques were missing. To make life easier, here's a list of VM/Linux block device/Linux filesystem/Linux userspace library I/O fault injection mechanisms:

Bonus fact: SQLite has a VFS driver for simulating errors so it can get good test coverage.

Related:

Anon
  • 3,794
8

You can use dmsetup to create a device-mapper device using either the error or flakey targets to simulate failures.

dmsetup create test --table '0 123 flakey /dev/loop0 0 1 1'

Where 123 is the length of the device in sectors, /dev/loop0 is the original device that you want to simulate errors on, 0 is the offset into that device, and the last two numbers are the up and down intervals in seconds. For error, you don't need any arguments after the target name, as it always returns an error.
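As a sketch of how the two targets differ, the table lines can be generated by small shell functions and piped to dmsetup. The function and mapping names here are made up for illustration. The flakey argument order (device, offset, up interval, down interval, intervals in seconds) follows my reading of the kernel's dm-flakey documentation, and /dev/loop0 is assumed to be a loop device you have already set up.

```shell
# Hypothetical helpers that only print device-mapper tables;
# they need no privileges themselves.

# error target: every I/O to the mapped range fails.
# $1 = device length in 512-byte sectors
error_table() {
    printf '0 %s error\n' "$1"
}

# flakey target: alternates between $3 seconds of normal operation
# and $4 seconds during which I/O fails.
# $1 = length in sectors, $2 = backing device, $3 = up, $4 = down
flakey_table() {
    printf '0 %s flakey %s 0 %s %s\n' "$1" "$2" "$3" "$4"
}

# Usage (as root), assuming /dev/loop0 exists:
#   sectors=$(blockdev --getsz /dev/loop0)
#   error_table "$sectors" | dmsetup create always_fails
#   flakey_table "$sectors" /dev/loop0 5 5 | dmsetup create sometimes_fails
#   dmsetup remove always_fails
#   dmsetup remove sometimes_fails
```

Keeping the table generation separate from the dmsetup call means you can check the table as an ordinary user before anything touches a real device.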

psusi
  • 17,303
  • 1
    I find at least two errors in that command: The missing device name, the quoting typo, and what is "1 0 /dev/null" supposed to mean? – Hauke Laging May 29 '13 at 13:40
  • @HaukeLaging, ahh, yes, I left out the name and somehow hit the wrong quote. The 1 0 /dev/null means 1 target, starting at offset 0, backed by device /dev/null. It is needed for flakey, but apparently is optional for error. – psusi May 29 '13 at 13:44
  • It seems to me that it's not "optional" but simply ignored. You may check with dmsetup table test. You can even write foo bar behind error; it just doesn't care (and thus should be deleted). – Hauke Laging May 29 '13 at 13:48
  • @HaukeLaging, edited. – psusi May 29 '13 at 13:54
  • Thanks for the answer, I think that's the way I'll go for now. The only minor issue I have with this is that it requires root access, but I guess you'll need that anyway for such low-level stuff... (I'll dig into the LD_PRELOAD idea when I have time). – mreithub May 29 '13 at 15:06
5

You want a fault injection mechanism for I/O.

On Linux, here's a method that doesn't require any prior setup and generates an unusual error (not EIO “Input/output error” but ESRCH “No such process”):

cat /proc/1234/mem

where 1234 is the PID of a process running as the same user as the process you're testing, but not that process itself. Credits to rubasov for thinking of /proc/$pid/mem.

If you use the PID of the process itself, you get EIO, but only if you're reading from an area that isn't mapped in the process's memory. The first page is never mapped, so it's ok if you read the file sequentially, but not suitable for a database process that seeks directly to the middle of the file.
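A quick way to try the "own PID" case is to let cat read its own memory file, since /proc/self resolves to the reading process. This is a rough sketch assuming a Linux /proc; the helper name is made up, and the check only tells you that the read failed, so look at cat's message on stderr to see which error your kernel actually returns.

```shell
# Hypothetical helper: returns 0 if cat fails to read its own memory file.
# cat starts at offset 0, which is not mapped in the process.
self_mem_read_fails() {
    ! cat /proc/self/mem >/dev/null
}

if self_mem_read_fails; then
    echo "sequential read of /proc/self/mem failed"
else
    echo "read succeeded; this kernel behaves differently"
fi
```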

With some more setup as root, you can leverage the device mapper to create files with valid sectors and bad sectors.

Another approach would be to implement a small FUSE filesystem. EIO is the default error code when your userspace filesystem driver does something wrong, so it's easy to achieve. Both the Perl and Python bindings come with examples to get started, so you can quickly write a filesystem that mostly mirrors existing files but injects an EIO in carefully chosen places. There's an existing filesystem of this kind: petardfs (article). I don't know how well it works out of the box.

Yet another method is an LD_PRELOAD wrapper. An existing one is Libfiu (fault injection in userspace). It works by preloading a library that overloads the POSIX API calls. You can write simple directives or arbitrary C code to override the normal behavior.

2

The solution is a lot easier if it's OK to use a device file as "file with I/O errors". My proposal is for those cases where a regular file shall have such errors.

> dd if=/dev/zero of=/path/to/ext2.img bs=10M count=10
> losetup /dev/loop0 /path/to/ext2.img
> blockdev --getsz /dev/loop0
204800
> echo "0 204800 linear /dev/loop0 0" | dmsetup create sane_dev
> mke2fs /dev/mapper/sane_dev # ext2 is sufficient
> mount -t ext2 /dev/mapper/sane_dev /some/where
> dd if=/dev/zero of=/some/where/unreadable_file bs=512 count=4
> hdparm --fibmap /some/where/unreadable_file
/some/where/unreadable_file:
 filesystem blocksize 1024, begins at LBA 0; assuming 512 byte sectors.
 byte_offset  begin_LBA    end_LBA    sectors
           0       2050       2053          4
> umount /dev/mapper/sane_dev
> dmsetup remove sane_dev
> start_sector=$((204800-2053-1))
> echo $'0 2053 linear /dev/loop0 0\n2053 1 error\n2054 '"${start_sector} linear /dev/loop0 2054" | 
>   dmsetup create error_dev
> mount -t ext2 /dev/mapper/error_dev /some/where
> cat /some/where/unreadable_file # 4th (last) sector of file is unreadable
cat: /some/where/unreadable_file: Input/output error

I must admit that I am a bit confused because I haven't managed to read single sectors from that file without an error (with dd .. seek=...). Maybe that is a read-ahead problem.
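The three-segment table above can be generalized. Here is a minimal sketch of a helper that prints such a table for any run of bad sectors; the function name and argument order are my own invention, and it assumes sector numbers as reported by blockdev --getsz and hdparm --fibmap.

```shell
# Hypothetical helper: print a dm table that maps a device linearly
# except for one run of sectors that always returns errors.
# $1 = backing device, $2 = total sectors,
# $3 = first bad sector, $4 = number of bad sectors
bad_sector_table() {
    dev=$1
    total=$2
    bad=$3
    count=$4
    after=$((bad + count))

    # Good sectors before the bad run (omitted if the run starts at 0).
    if [ "$bad" -gt 0 ]; then
        printf '0 %s linear %s 0\n' "$bad" "$dev"
    fi

    # The bad run itself.
    printf '%s %s error\n' "$bad" "$count"

    # Good sectors after the bad run (omitted if the run reaches the end).
    if [ "$after" -lt "$total" ]; then
        printf '%s %s linear %s %s\n' "$after" "$((total - after))" "$dev" "$after"
    fi
}

# Usage (as root), reproducing the table from the session above:
#   bad_sector_table /dev/loop0 204800 2053 1 | dmsetup create error_dev
```

To make the whole file unreadable instead of one sector, pass the file's begin_LBA as the first bad sector and its sector count as the length.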

Hauke Laging
  • 90,279
  • Your filesystem's blocks are at least 4096 bytes in size so they will span multiple sectors even if the file is small. – Anon Sep 11 '17 at 06:38
1

You could use CharybdeFS, which was made for exactly this purpose.

It's a passthrough FUSE filesystem like PetardFS, but much more configurable.

See the CharybdeFS cookbook here: http://www.scylladb.com/2016/05/02/fault-injection-filesystem-cookbook/

It's advanced enough to test a database.