
I wrote a simple test.sh script which only does this: `while true; do cp ./target-file ./target-file.tmp; done`, where target-file is created using `dd if=/dev/urandom of=target-file bs=1M count=100` and `du -h target-file` reports `101M target-file`.
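For reference, the whole setup can be assembled into one script (a sketch; the loop is bounded here so it terminates on its own, whereas the original runs forever and is watched from htop in another terminal):

```shell
#!/bin/sh
# Reproduction sketch: create the 100 MiB test file, then run the
# copy loop in the background for a few seconds.
dd if=/dev/urandom of=target-file bs=1M count=100 2>/dev/null
du -h target-file
( while true; do cp ./target-file ./target-file.tmp; done ) &
loop=$!
sleep 3        # observe the loop's row in htop during this window
kill $loop
```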

I attempted to measure the disk I/O of this process using htop. I enabled all of these columns (available in htop via F2 -> select "Columns" -> on the right, scroll to the bottom):

[screenshot: htop columns]

This is sample output at some time (see bottom row):

[screenshot: htop output of the running process]

I am clear on a few parameters: RD_CHAR is greater than WR_CHAR, and RD_SYSC is greater than WR_SYSC. Logically, for cp to work, it has to first read the file and then write it, so this makes sense.

However, DISK_READ is consistently 0.00 B/s, and IO_RBYTES is much, much less than IO_WBYTES. Obviously, assuming this to mean that the process is reading no bytes per second would be incorrect, and assuming that it is writing more data than it has read would be incorrect too.

Question: how should I interpret these very low reported values of IO_RBYTES and DISK_READ?

  • Since you're copying the same file over and over, disk caching might be coming into play. So the file gets read once, is cached, and then written over and over a bunch of times. You can try using a single large file instead, and see if the measurements align with what you expect. – Haxiel Jun 09 '21 at 16:10
  • Thanks for the note @Haxiel I knew about the different caches in a process but didn't know about file system caching. I tried with a 1 GiB file; this time the DISK_READ rate still almost always remains 0 (or spikes up to very low values compared to the corresponding write rates). The IO_RBYTES does increase a lot many times, but it is still mostly zero and less than IO_WBYTES. – Gaurang Tandon Jun 09 '21 at 16:22

1 Answer


Those fields come from /proc/$pid/io, which is documented here: https://man7.org/linux/man-pages/man5/proc.5.html

rchar: characters read
The number of bytes which this task has caused to be read from storage. This is simply the sum of bytes which this process passed to read(2) and similar system calls. It includes things such as terminal I/O and is unaffected by whether or not actual physical disk I/O was required (the read might have been satisfied from pagecache).

wchar: characters written
The number of bytes which this task has caused, or shall cause to be written to disk. Similar caveats apply here as with rchar.

read_bytes: bytes read
Attempt to count the number of bytes which this process really did cause to be fetched from the storage layer. This is accurate for block-backed filesystems.

write_bytes: bytes written
Attempt to count the number of bytes which this process caused to be sent to the storage layer.
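These counters can be inspected directly; for example, this prints the accounting file of the reading process itself (a sketch):

```shell
# Print the four counters discussed above for the grep process itself.
# rchar/wchar move on every read(2)/write(2); read_bytes/write_bytes
# move only when real block-device traffic happens.
grep -E '^(rchar|wchar|read_bytes|write_bytes):' /proc/self/io
```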

There are a few critical subtleties in these definitions.

 

I am clear on a few parameters: RD_CHAR is greater than WR_CHAR, and RD_SYSC is greater than WR_SYSC. Logically, for cp to work, it has to first read the file and then write it, so this makes sense.

Yes, that is correct; however, there's more to it. The rchar and wchar metrics include all reads and writes, including stdin, stdout, stderr, network, etc.
Since you're just using cp and dd, which don't output much, it shouldn't make much difference in this specific scenario. But it is something to be aware of when working with more complex processes.
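For instance, even plain terminal output is counted: here the shell's wchar grows by the six bytes of "hello\n" although no file is involved (a sketch):

```shell
# echo is a shell builtin, so the write(2) of "hello\n" is charged to
# the shell's own wchar counter; grep then reads that same shell's io
# file ($$ expands to the inner shell's PID).
sh -c 'echo hello; grep "^wchar:" /proc/$$/io'
```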

 

However, DISK_READ is consistently 0.00 B/s, and IO_RBYTES is much, much less than IO_WBYTES. Obviously, assuming this to mean that the process is reading no bytes per second would be incorrect.

The critical bit to understanding this behavior is:

read_bytes: bytes read
Attempt to count the number of bytes which this process really did cause to be fetched from the storage layer.

In your case, the input data is likely still in the page cache, so you're not actually hitting the disk, you're reading from memory. But since you're writing to a new file, the output is hitting the disk.
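You can see this with the question's own workload: after the data is first written, the source file sits in the page cache, so subsequent copies read from RAM while every pass still writes out to disk (a sketch with a smaller 10 MiB file):

```shell
# The dd populates the page cache with target-file's data, so both
# copies below read it entirely from RAM: the process's read_bytes
# barely moves, while write_bytes grows by 10 MiB per pass.
dd if=/dev/urandom of=target-file bs=1M count=10 2>/dev/null
cp target-file target-file.tmp     # source already cached
cp target-file target-file.tmp     # served from the page cache again
cmp -s target-file target-file.tmp && echo "copies identical"
```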

 

TL;DR: The *char metrics include all I/O to/from the process (disk, TTY, network, etc.), while the *_bytes metrics count only physical disk I/O.

phemmer
  • 71,831
  • Thanks. Is there a way to drop this file from the fs cache on every iteration of the loop? One way I can think of is doing the same operation on 10 random files in cyclic order, but I do not know how much caching that would alleviate. Another way I just found is this link. Do you have any suggestions on this? – Gaurang Tandon Jun 10 '21 at 03:28
  • It's easy to drop the whole page cache, but not so much dropping individual files. The linked answer is probably worth trying. The 10 random files thing sounds unreliable. – phemmer Jun 10 '21 at 19:38
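As a follow-up to the cache-dropping question in the comments: GNU dd can evict a single file from the page cache via posix_fadvise(POSIX_FADV_DONTNEED), which is less drastic than dropping the whole cache (a sketch; requires GNU coreutils):

```shell
# Pull a file into the page cache, then evict just that file.
# count=0 with iflag=nocache copies no data; dd only issues the
# cache-drop advice over the whole file.
dd if=/dev/urandom of=target-file bs=1M count=10 2>/dev/null
cat target-file > /dev/null            # warm the cache
dd if=target-file iflag=nocache count=0 2>/dev/null && echo "cache dropped"
```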